Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory
In this paper, we propose a runtime, called HUM, that hides host-to-device memory copy time without any code modification. It overlaps the host-to-device memory copy with host computation or CUDA kernel computation by exploiting Unified Memory and fault mechanisms. HUM provides wrapper functions for CUDA commands and executes host-to-device memory copy commands asynchronously. We also propose two runtime techniques. The first checks whether it is correct to make a synchronous host-to-device memory copy command asynchronous. If it is not, HUM makes the host computation or the kernel computation wait until the memory copy completes. The second subdivides consecutive host-to-device memory copy commands into smaller memory copy requests and schedules the requests from different commands in a round-robin manner. As a result, kernel execution can be scheduled as early as possible to maximize the overlap. We evaluate HUM using 51 applications from Parboil, Rodinia, and CUDA Code Samples and compare their performance under HUM with that of hand-optimized implementations. The evaluation results show that the applications run, on average, 1.21 times faster under HUM than under the original CUDA. This speedup is comparable to the average speedup of 1.22 achieved by the hand-optimized implementations for Unified Memory.
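The round-robin technique described above can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not HUM's implementation: the `CopyRequest` type, the `schedule_round_robin` function, and the fixed `chunk_size` parameter are hypothetical, and real copies would be issued to the GPU rather than merely ordered. The sketch only shows how consecutive copy commands can be split into bounded chunks whose requests are interleaved across commands.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CopyRequest:
    """One chunk of a subdivided host-to-device copy command (hypothetical type)."""
    command: int  # index of the original copy command
    offset: int   # byte offset within that command's buffer
    length: int   # bytes transferred by this chunk


def schedule_round_robin(sizes: List[int], chunk_size: int) -> List[CopyRequest]:
    """Split each copy into chunks of at most chunk_size bytes and interleave
    chunks from different commands in round-robin order (assumed policy)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    order: List[CopyRequest] = []
    next_offset = [0] * len(sizes)  # next unscheduled byte of each command
    progress = True
    while progress:
        progress = False
        # One round: take at most one chunk from every unfinished command.
        for cmd, size in enumerate(sizes):
            if next_offset[cmd] < size:
                length = min(chunk_size, size - next_offset[cmd])
                order.append(CopyRequest(cmd, next_offset[cmd], length))
                next_offset[cmd] += length
                progress = True
    return order


if __name__ == "__main__":
    # Three consecutive copies of 10, 4 and 7 bytes with a 4-byte chunk size.
    for req in schedule_round_robin([10, 4, 7], chunk_size=4):
        print(req)
```

With sizes 10, 4 and 7 and a 4-byte chunk, the first round issues the first chunk of every command before any command receives its second chunk, so no later copy waits for an earlier copy to finish in full. How HUM actually chooses chunk sizes and ties this ordering to kernel launch is not specified in the abstract.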
Wed 26 Feb
09:35 - 10:50 | Concurrency and GPU (Mediterranean Ballroom) | Main Conference | Chair(s): Ang Li (Pacific Northwest National Laboratory)
09:35 | 25m Talk | Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory | Main Conference | Jaehoon Jung, Daeyoung Park, Youngdong Do, Jungho Park, Jaejin Lee (Seoul National University)
10:00 | 25m Talk | GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs | Main Conference
10:25 | 25m Talk | No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers | Main Conference | Nian Liu, Binyu Zang, Haibo Chen (Shanghai Jiao Tong University)