GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs
The current state of the art in GPU networking relies on a host-centric, kernel-boundary communication model that reduces performance and increases code complexity. Recent research has explored performing network operations from within a GPU kernel itself; however, these approaches keep the CPU in the critical path, which leads to high latency and inefficient utilization of network and/or GPU resources. In this work, we introduce GPU Initiated OpenSHMEM (GIO), a new GPU-centric PGAS programming model and runtime that enables GPUs to act as first-class citizens in network-based systems. GIO leverages tight integration of GPUs and NICs to provide high-performance intra-kernel networking by enabling GPUs to efficiently and directly initiate network operations. This paper explores the GPU's coarse-grained memory model and its mismatch with direct network interaction from within a kernel. GIO also reduces latency through a new template-based design that minimizes the overhead of initiating a network operation. We show that for a regular application such as a Jacobi 2D stencil, GIO improves application performance by up to 40% compared to traditional kernel-boundary networking. Furthermore, on irregular applications such as Sparse Triangular Solve (SpTS), GIO provides up to 44% improvement over existing intra-kernel networking schemes.
Wed 26 Feb, 09:35 - 10:50 (Tijuana, Baja California time zone)
- Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory
- GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs
- No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers