MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation (PPoPP 2020 - Main Conference)

Who

Bangtian Liu, Kazem Cheshmi, Saeed Soori, Michelle Strout, Maryam Mehri Dehnavi

Track

PPoPP 2020 Main Conference

Time Zone

The program is currently displayed in (GMT-08:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 26 Feb 2020, 12:10 - 12:35, in session Matrix Multiplication and Approximation (Mediterranean Ballroom). Chair(s): Albert Cohen

Abstract

Hierarchical matrix approximations have gained significant traction in the machine learning and scientific communities as they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tunable accuracy in an evaluation phase. Existing implementations of HMatrix evaluations do not preserve locality and often lead to unbalanced parallel execution with high synchronization. Also, current solutions require the compression phase to re-execute if the kernel method or the required accuracy changes. In this work, we describe MatRox, a framework that uses novel structure analysis strategies, blocking and coarsening, with code specialization and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplications. Modularization of the matrix compression phase enables the reuse of computations when there are changes to the input accuracy and the kernel function. The MatRox-generated code for matrix-matrix multiplication is 2.98×, 1.60×, and 5.98× faster than library implementations available in GOFMM, SMASH, and STRUMPACK, respectively. Additionally, the ability to reuse portions of the compression computation for changes to the accuracy leads to up to 2.64× improvement with MatRox over five changes to accuracy using GOFMM.
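The core idea the abstract relies on, replacing a kernel block by low-rank factors so that a block-times-matrix product gets cheaper at a tunable accuracy, can be illustrated with a minimal sketch. This is not MatRox's compression algorithm, storage format, or API. It assumes a 1-D Gaussian kernel and builds the factors from a truncated Taylor series, a technique chosen here only because it fits in a few lines of plain Python. All function names and the point sets are hypothetical.

```python
import math


def gaussian_kernel(x, y):
    """Assumed kernel for this sketch: exp(-(x - y)^2) on 1-D points."""
    return math.exp(-(x - y) ** 2)


def low_rank_factors(xs, ys, rank):
    """Build U (m x rank) and V (rank x n) with K ~= U V.

    Uses exp(-(x-y)^2) = exp(-x^2) * exp(-y^2) * sum_p (2xy)^p / p!,
    truncated after `rank` terms. A higher rank gives a smaller error.
    """
    U = [[math.exp(-x * x) * (2.0 * x) ** p / math.factorial(p)
          for p in range(rank)] for x in xs]
    V = [[math.exp(-y * y) * y ** p for y in ys] for p in range(rank)]
    return U, V


def matmul(A, B):
    """Plain dense matrix product on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def flops_dense(m, n, k):
    """Multiply-add count for (m x n) times (n x k)."""
    return m * n * k


def flops_low_rank(m, n, k, r):
    """Multiply-add count for U (V W): (r x n)(n x k), then (m x r)(r x k)."""
    return r * k * (m + n)


def product_error(rank, m=64, n=64, k=4):
    """Max abs difference between K W and U (V W) for one kernel block."""
    xs = [0.5 * i / (m - 1) for i in range(m)]    # points in [0, 0.5]
    ys = [-0.5 * j / (n - 1) for j in range(n)]   # points in [-0.5, 0]
    W = [[1.0 / (1 + j + c) for c in range(k)] for j in range(n)]

    K = [[gaussian_kernel(x, y) for y in ys] for x in xs]
    exact = matmul(K, W)

    U, V = low_rank_factors(xs, ys, rank)
    approx = matmul(U, matmul(V, W))              # never forms U V explicitly

    return max(abs(a - b) for ra, rb in zip(exact, approx)
               for a, b in zip(ra, rb))


if __name__ == "__main__":
    for r in (2, 4, 8):
        print(r, product_error(r), flops_low_rank(64, 64, 4, r),
              flops_dense(64, 64, 4))
```

Raising `rank` trades more arithmetic for a smaller error, which is the "tunable accuracy" knob in miniature. A hierarchical scheme applies this kind of compression to many blocks arranged in a tree, and the locality and load-balancing work described in the abstract concerns how those blocks are ordered and scheduled.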

Bangtian Liu

University of Toronto

Canada

Kazem Cheshmi

University of Toronto

Canada

Saeed Soori

University of Toronto

Michelle Strout

University of Arizona

United States

Maryam Mehri Dehnavi

University of Toronto

Canada

Session Program

Wed 26 Feb
Displayed time zone: Tijuana, Baja California

11:20 - 12:35	Matrix Multiplication and Approximation (Mediterranean Ballroom), Main Conference. Chair(s): Albert Cohen (Google)

11:20	25m	Talk	spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication Through Lightweight Analysis. Main Conference. Mathias Parger (Graz University of Technology), Martin Winter (Graz University of Technology, Austria), Daniel Mlakar (Graz University of Technology, Austria), Markus Steinberger (Graz University of Technology, Austria)
11:45	25m	Talk	A Novel Data Transformation and Execution Strategy for Accelerating Sparse Matrix Multiplication on GPUs. Main Conference. Peng Jiang (The University of Iowa), Changwan Hong (The Ohio State University), Gagan Agrawal (The Ohio State University)
12:10	25m	Talk	MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation. Main Conference. Bangtian Liu (University of Toronto), Kazem Cheshmi (University of Toronto), Saeed Soori (University of Toronto), Michelle Strout (University of Arizona), Maryam Mehri Dehnavi (University of Toronto)