Wed 26 Feb 2020 09:35 - 10:00 - Concurrency and GPU (Mediterranean Ballroom) Chair(s): Ang Li

In this paper, we propose a runtime, called HUM, that hides host-to-device memory copy time without any code modification. It overlaps the host-to-device memory copy with host computation or CUDA kernel computation by exploiting Unified Memory and fault mechanisms. HUM provides wrapper functions for CUDA commands and executes host-to-device memory copy commands asynchronously. We also propose two runtime techniques. The first checks whether it is correct to make a synchronous host-to-device memory copy command asynchronous. If it is not, HUM makes the host computation or the kernel computation wait until the memory copy completes. The second subdivides consecutive host-to-device memory copy commands into smaller memory copy requests and schedules the requests from different commands in a round-robin manner. As a result, kernel execution can be scheduled as early as possible to maximize the overlap. We evaluate HUM using 51 applications from Parboil, Rodinia, and CUDA Code Samples and compare their performance under HUM with that of hand-optimized implementations. The evaluation results show that executing the applications under HUM is, on average, 1.21 times faster than executing them under original CUDA. This speedup is comparable to the average speedup of 1.22 achieved by the hand-optimized implementations for Unified Memory.
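The second technique, subdividing copy commands and scheduling the resulting requests round-robin, can be illustrated with a small host-side model. This is a minimal sketch and not HUM's implementation: the names `split_copy` and `round_robin`, the tuple layout, and the chunk size are all assumptions made for illustration, and no actual CUDA calls are issued.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical request record: (command name, byte offset, byte length).
Chunk = Tuple[str, int, int]


def split_copy(name: str, size: int, chunk_size: int) -> List[Chunk]:
    """Subdivide one copy command of `size` bytes into chunk-sized requests."""
    return [(name, off, min(chunk_size, size - off))
            for off in range(0, size, chunk_size)]


def round_robin(commands: List[Tuple[str, int]], chunk_size: int) -> List[Chunk]:
    """Interleave the requests of consecutive copy commands, one per turn."""
    queues = deque(deque(split_copy(n, s, chunk_size)) for n, s in commands)
    order: List[Chunk] = []
    while queues:
        q = queues.popleft()
        if not q:
            continue  # a zero-byte command contributes no requests
        order.append(q.popleft())
        if q:
            queues.append(q)  # command still has requests; rejoin the rotation
    return order


# Two consecutive copy commands: "A" (10 bytes) and "B" (4 bytes).
schedule = round_robin([("A", 10), ("B", 4)], chunk_size=4)
for name, offset, length in schedule:
    print(f"copy {name}[{offset}:{offset + length}]")
```

In this model the single request of the small command "B" is issued second rather than after all of "A", so no command's data waits behind the whole of an earlier, larger copy. How HUM actually sizes its requests and ties them to kernel launches is not stated in the abstract.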

Wed 26 Feb

Displayed time zone: Tijuana, Baja California

09:35 - 10:50
Concurrency and GPU (Mediterranean Ballroom) Main Conference
Chair(s): Ang Li Pacific Northwest National Laboratory
09:35
25m
Talk
Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory
Main Conference
Jaehoon Jung Seoul National University, Daeyoung Park Seoul National University, Youngdong Do Seoul National University, Jungho Park Seoul National University, Jaejin Lee Seoul National University
10:00
25m
Talk
GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs
Main Conference
Khaled Hamidouche Advanced Micro Devices (AMD), Michael LeBeane Advanced Micro Devices (AMD)
10:25
25m
Talk
No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers
Main Conference
Nian Liu Shanghai Jiao Tong University, Binyu Zang Shanghai Jiao Tong University, Haibo Chen Shanghai Jiao Tong University