No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers
In this paper, we present the first comprehensive performance characterization and optimization of ARM barriers on both mobile and server platforms. We draw a set of observations from several abstracted models and validate them in scenarios where barriers are used intensively. We find that (1) order-preserving approaches that do not involve the bus significantly outperform the others, and (2) most of the overhead comes from barriers that strictly follow remote memory references. Such barriers are usually inserted when threads exchange data, to ensure the relative order between storing the data to a shared buffer and setting a flag that informs the receiver. Based on these observations, we propose a new mechanism, Pilot, which removes such barriers by leveraging single-copy atomicity to piggyback the flag on the data. Applying Pilot yields 10%-380% performance improvements in multiple benchmarks, close to the ideal performance without barriers.
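The data-then-flag pattern and the piggybacking idea described in the abstract can be illustrated with a minimal C++ sketch. This is not the paper's Pilot implementation: the type names `FlagChannel` and `PackedChannel`, the 32-bit payload, and the use of a sequence tag as the flag are all assumptions made for illustration. The sketch also assumes that aligned 64-bit atomic loads and stores are lock-free on the target, so that each one is a single-copy-atomic access.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Baseline: the payload and the flag live in separate words, so the
// producer must order the two stores. The release store typically
// compiles to a barrier or a store-release instruction on ARM.
struct FlagChannel {
    std::uint32_t data = 0;
    std::atomic<bool> ready{false};

    void send(std::uint32_t v) {
        data = v;                                      // store the data
        ready.store(true, std::memory_order_release);  // then publish the flag
    }

    std::uint32_t recv() {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer publishes
        }
        return data;
    }
};

// Piggybacked (hypothetical layout): the payload sits in the low 32 bits
// and a sequence tag, acting as the flag, in the high 32 bits of ONE
// 64-bit word. A single atomic store makes both visible together, so no
// ordering between two separate stores is needed.
struct PackedChannel {
    std::atomic<std::uint64_t> word{0};

    void send(std::uint32_t v, std::uint32_t seq) {
        const std::uint64_t packed =
            (static_cast<std::uint64_t>(seq) << 32) | v;
        word.store(packed, std::memory_order_relaxed);
    }

    // Spins until the tag equals `seq`, then returns the payload that
    // arrived in the same 64-bit load.
    std::uint32_t recv(std::uint32_t seq) {
        std::uint64_t w;
        do {
            w = word.load(std::memory_order_relaxed);
        } while (static_cast<std::uint32_t>(w >> 32) != seq);
        return static_cast<std::uint32_t>(w);
    }
};
```

In real use, `send` and `recv` would run on different threads. The trade-off in this sketch is that the payload must fit in one atomically accessed word alongside the tag, and the sequence tag must change with every message so the receiver can tell a fresh word from a stale one.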
Wed 26 Feb (GMT-07:00)