
SpMM (multiplication of a sparse matrix and a dense matrix) and SDDMM (sampled dense-dense matrix multiplication) are at the core of many scientific, machine learning, and data mining applications. Because of their irregular memory accesses, the two kernels have poor data locality, and data movement overhead is a bottleneck for their performance. To overcome this issue, previous works have proposed using tiling and data reorganization to enhance data reuse. Despite their success in improving performance for many sparse matrices, we find that the efficacy of existing techniques largely depends on how the non-zeros are distributed in a sparse matrix. In this work, we propose a novel row-reordering technique to improve data locality for SpMM and SDDMM on GPUs. The goal of such row reordering is to place similar rows close to each other, allowing them to be processed together and thus providing better temporal locality for the values of the dense matrix. We focus on performing the row reordering efficiently by using a hierarchical clustering procedure optimized by locality-sensitive hashing. We also investigate when row reordering is useful and which factors the performance gains of our method correlate with. Experimental evaluation using 1084 sparse matrices from the SuiteSparse collection and the Network Repository shows that our technique achieves up to 2.91x speedup for SpMM and up to 3.19x speedup for SDDMM against the state-of-the-art alternatives on an Nvidia P100 GPU.
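
To make the two kernels and the idea of row reordering concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`spmm`, `sddmm`, `greedy_row_order`) are hypothetical, the sparse matrix is assumed to be stored in CSR form, and a simple quadratic-time greedy ordering by Jaccard similarity of column sets stands in for the hierarchical clustering with locality-sensitive hashing described in the abstract.

```python
def spmm(indptr, indices, data, B):
    """SpMM: C = A @ B, with A sparse (CSR arrays) and B dense (list of rows)."""
    n_cols = len(B[0])
    C = []
    for i in range(len(indptr) - 1):
        row = [0.0] * n_cols
        for k in range(indptr[i], indptr[i + 1]):
            b_row = B[indices[k]]  # irregular access: depends on the column index
            for j in range(n_cols):
                row[j] += data[k] * b_row[j]
        C.append(row)
    return C


def sddmm(indptr, indices, data, X, Y):
    """SDDMM: for each non-zero A[i][j], out = A[i][j] * dot(X[i], Y[j])."""
    out = [0.0] * len(data)
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            out[k] = data[k] * sum(x * y for x, y in zip(X[i], Y[j]))
    return out


def jaccard(a, b):
    """Jaccard similarity of two collections of column indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_row_order(indptr, indices):
    """Assumed stand-in for the paper's clustering: start at row 0 and
    repeatedly append the unvisited row most similar to the last one."""
    n = len(indptr) - 1
    if n == 0:
        return []
    cols = [indices[indptr[i]:indptr[i + 1]] for i in range(n)]
    order, remaining = [0], set(range(1, n))
    while remaining:
        last = cols[order[-1]]
        nxt = max(sorted(remaining), key=lambda r: jaccard(last, cols[r]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# A 4x4 example: rows 0 and 2 touch columns {0, 1}; rows 1 and 3 touch {2, 3}.
indptr = [0, 2, 4, 6, 8]
indices = [0, 1, 2, 3, 0, 1, 2, 3]
data = [1.0] * 8
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

C = spmm(indptr, indices, data, B)
S = sddmm(indptr, indices, data, B, B)
order = greedy_row_order(indptr, indices)
```

In this toy matrix the ordering places rows 0 and 2 next to each other, followed by rows 1 and 3, so consecutive rows read the same rows of the dense matrix `B`. That reuse is the temporal locality the abstract refers to.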

Wed 26 Feb

Displayed time zone: Tijuana, Baja California

11:20 - 12:35
Matrix Multiplication and Approximation (Mediterranean Ballroom) Main Conference
Chair(s): Albert Cohen Google
11:20
25m
Talk
spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication Through Lightweight Analysis
Main Conference
Mathias Parger Graz University of Technology, Martin Winter Graz University of Technology, Austria, Daniel Mlakar Graz University of Technology, Austria, Markus Steinberger Graz University of Technology, Austria
11:45
25m
Talk
A Novel Data Transformation and Execution Strategy for Accelerating Sparse Matrix Multiplication on GPUs
Main Conference
Peng Jiang The University of Iowa, Changwan Hong The Ohio State University, Gagan Agrawal The Ohio State University
12:10
25m
Talk
MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation
Main Conference
Bangtian Liu University of Toronto, Kazem Cheshmi University of Toronto, Saeed Soori University of Toronto, Michelle Strout University of Arizona, Maryam Mehri Dehnavi University of Toronto