Scaling out Speculative Execution of Finite-State Machines with Parallel Merge (PPoPP 2020 - Main Conference)

Who

Yang Xia, Peng Jiang, Gagan Agrawal

Track

PPoPP 2020 Main Conference

When

Tue 25 Feb 2020 10:00 - 10:25 (GMT-08:00) - Scaling (Mediterranean Ballroom), Chair(s): Zhijia Zhao

Abstract

Finite-state machines (FSMs) are a key component of many important applications, such as Huffman decoding, regular expression matching, and HTML tokenization. Because of their inherent dependencies and unpredictable memory accesses, FSM computations are considered extremely difficult to parallelize. As such, significant research efforts have been made to accelerate FSM computations. Although these methods achieve promising performance results on multi-core machines, they do not scale to emerging many-core architectures such as GPUs.

Our analysis shows that the bottleneck to achieving scalability on GPUs is the sequential merge inherent to these methods. However, unlike simple reduction loops, parallelization of FSM computations typically incorporates runtime checks and re-execution, which may significantly affect performance. Based on these observations, our parallel merge implementations select efficient runtime check implementations and avoid unnecessary re-executions. Further, based on GPU architectural features, we develop optimization techniques to improve performance.

We evaluate our parallel merge implementations on a set of representative algorithms. Experimental results show that our parallel merge implementation is several times more efficient than sequential merge implementations and achieves linear scalability on different GPUs.
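
To make the terms in the abstract concrete, the following is a minimal sketch of speculative FSM execution followed by a tree-shaped merge. It is not the authors' implementation: the transition table `TABLE`, the fixed `guess` start state, and the names `Segment`, `speculate`, `merge`, `tree_merge`, and `final_state` are all assumptions made for illustration, and the code runs sequentially in plain Python. On a GPU, the chunks in `speculate` and the pairs at each level of `tree_merge` would be the units processed in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List

Table = Dict[int, Dict[str, int]]

# Hypothetical 3-state FSM over {'a', 'b'}: state 2 is reached once "aa" is seen.
TABLE: Table = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 2, "b": 2},
}


@dataclass
class Segment:
    lo: int     # first input index covered
    hi: int     # one past the last input index covered
    start: int  # state the segment was executed from (possibly a guess)
    end: int    # state reached from `start` after consuming data[lo:hi]


def run(table: Table, state: int, data: str, lo: int, hi: int) -> int:
    """Plain sequential FSM execution over data[lo:hi]."""
    for i in range(lo, hi):
        state = table[state][data[i]]
    return state


def speculate(table: Table, data: str, chunk: int, initial: int, guess: int) -> List[Segment]:
    """Run every chunk independently; only the first chunk knows its true start state."""
    segs = []
    for lo in range(0, len(data), chunk):
        hi = min(lo + chunk, len(data))
        start = initial if lo == 0 else guess
        segs.append(Segment(lo, hi, start, run(table, start, data, lo, hi)))
    return segs


def merge(table: Table, data: str, left: Segment, right: Segment) -> Segment:
    """Runtime check: reuse `right` if its guessed start was correct, else re-execute it."""
    if left.end == right.start:
        end = right.end
    else:
        end = run(table, left.end, data, right.lo, right.hi)
    return Segment(left.lo, right.hi, left.start, end)


def tree_merge(table: Table, data: str, segs: List[Segment]) -> Segment:
    """Merge adjacent pairs level by level; pairs within a level are independent."""
    while len(segs) > 1:
        nxt = [merge(table, data, segs[i], segs[i + 1]) for i in range(0, len(segs) - 1, 2)]
        if len(segs) % 2:
            nxt.append(segs[-1])  # odd segment carries over to the next level
        segs = nxt
    return segs[0]


def final_state(table: Table, data: str, initial: int = 0, chunk: int = 4, guess: int = 0) -> int:
    if not data:
        return initial
    return tree_merge(table, data, speculate(table, data, chunk, initial, guess)).end
```

In this naive version, a failed check near the root of the tree re-executes a large part of the input, which illustrates why the cost of runtime checks and re-execution matters for a parallel merge. The techniques the paper uses to avoid unnecessary re-executions are not reproduced here.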

Yang Xia

The Ohio State University

Peng Jiang

The University of Iowa

United States

Gagan Agrawal