# Citation (bibtex)

@inproceedings{hishinuma2016vecpar,
title={SIMD parallel sparse matrix-vector and transposed-matrix-vector multiplication in DD precision},
author={Hishinuma, Toshiaki and Hasegawa, Hidehiko and Tanaka, Teruo},
booktitle={International Conference on Vector and Parallel Processing},
pages={21--34},
year={2016},
organization={Springer}
}


# Abstract

We accelerate double-precision sparse matrix and double-double (DD) vector multiplication (DD-SpMV), and its transposed variant (DD-TSpMV), using AVX2 SIMD instructions. AVX2 requires changing the memory access pattern so that four consecutive 64-bit elements can be read at once. In our previous research, DD-SpMV in the CRS format using AVX2 needed non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register; these factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV under AVX2 to eliminate the performance degradation factors of CRS. Our results indicate that BCRS4x1, whose block size matches the AVX2 register's length, is effective for both DD-SpMV and DD-TSpMV.