GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs
The current state of the art in GPU networking relies on a host-centric, kernel-boundary communication model that reduces performance and increases code complexity. Recent research has explored performing network operations from within a GPU kernel itself; however, these approaches keep the CPU in the critical path, which leads to high latency and inefficient utilization of network and/or GPU resources. In this work, we introduce GPU Initiated OpenSHMEM (GIO), a new GPU-centric PGAS programming model and runtime that enables GPUs to act as first-class citizens in network-based systems. GIO leverages tight integration of GPUs and NICs to provide high-performance intra-kernel networking by enabling GPUs to efficiently and directly initiate network operations. This paper explores the GPU's coarse-grained memory model and its mismatch with direct network interaction from within a kernel. GIO also reduces latency through a new template-based design that minimizes the overhead of initiating a network operation. We show that for a regular application such as a Jacobi 2D stencil, GIO improves application performance by up to 40% compared to traditional kernel-boundary networking. Furthermore, on irregular applications such as Sparse Triangular Solve (SpTS), GIO provides up to 44% improvement over existing intra-kernel networking schemes.
Wed 26 Feb, 09:35 - 10:50 (Tijuana, Baja California time zone)
- Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory
- GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs
- No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers