SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
Citation (bibtex)
@inproceedings{hishinuma2016vecpar,
title={SIMD parallel sparse matrix-vector and transposed-matrix-vector multiplication in DD precision},
author={Hishinuma, Toshiaki and Hasegawa, Hidehiko and Tanaka, Teruo},
booktitle={International Conference on Vector and Parallel Processing},
pages={21--34},
year={2016},
organization={Springer}
}
Abstract
We accelerate double-precision sparse matrix and double-double (DD) precision vector multiplication (DD-SpMV), and the corresponding transposed-matrix and DD vector multiplication (DD-TSpMV), using AVX2 SIMD instructions. AVX2 requires changing the memory access pattern so that four consecutive 64-bit elements can be read at once. In our previous research, DD-SpMV in the CRS format with AVX2 required non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register; these factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV with AVX2 in order to eliminate these performance degradation factors of CRS. Our results indicate that BCRS4x1, whose block size fits the length of the AVX2 register, is effective for both DD-SpMV and DD-TSpMV.
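
To illustrate why a 4x1 block fits AVX2 well, the following is a minimal C sketch (not the authors' code; type and function names such as bcrs4x1 and dd_spmv_bcrs4x1 are ours) of a DD-SpMV kernel over a BCRS 4x1 matrix, assuming AVX2 with FMA support. Each 4x1 block is four contiguous doubles belonging to four consecutive rows of the same column, so it is loaded into one 256-bit register with a single contiguous load, the DD vector element is broadcast, and the DD partial sums of the four rows stay in registers, so no horizontal summation or remainder loop is needed. The DD accumulation uses standard error-free transformations (two-prod via FMA and Knuth's two-sum); the paper's actual kernel may differ in these details.

/*
 * Minimal sketch (not the authors' code): DD-SpMV over a BCRS 4x1 matrix
 * with AVX2 intrinsics. Assumes FMA support; type and function names
 * (bcrs4x1, dd_spmv_bcrs4x1) are illustrative. The matrix is double
 * precision, the vectors are double-double (hi/lo pairs).
 * Compile with, e.g.:  gcc -O2 -mavx2 -mfma dd_spmv.c
 */
#include <immintrin.h>
#include <stdio.h>

typedef struct {
    int           nblock_rows; /* number of 4-row block rows                 */
    const int    *bptr;        /* block-row pointers, length nblock_rows + 1 */
    const int    *bcol;        /* column index of each 4x1 block             */
    const double *val;         /* block values, 4 contiguous doubles each    */
} bcrs4x1;

/* y = A * x, where x and y are DD vectors (hi, lo) and A is double. */
static void dd_spmv_bcrs4x1(const bcrs4x1 *A,
                            const double *x_hi, const double *x_lo,
                            double *y_hi, double *y_lo)
{
    for (int br = 0; br < A->nblock_rows; ++br) {
        __m256d s_hi = _mm256_setzero_pd();  /* DD partial sums for 4 rows */
        __m256d s_lo = _mm256_setzero_pd();
        for (int k = A->bptr[br]; k < A->bptr[br + 1]; ++k) {
            int j = A->bcol[k];
            __m256d a  = _mm256_loadu_pd(A->val + 4 * k); /* contiguous load   */
            __m256d xh = _mm256_set1_pd(x_hi[j]);         /* broadcast x[j].hi */
            __m256d xl = _mm256_set1_pd(x_lo[j]);
            /* two-prod: a * x.hi = ph + pl exactly, using FMA for the error.  */
            __m256d ph = _mm256_mul_pd(a, xh);
            __m256d pl = _mm256_fmsub_pd(a, xh, ph);
            pl = _mm256_fmadd_pd(a, xl, pl);              /* add a * x.lo      */
            /* two-sum: (s_hi, s_lo) += (ph, pl) in DD arithmetic.             */
            __m256d t = _mm256_add_pd(s_hi, ph);
            __m256d v = _mm256_sub_pd(t, s_hi);
            __m256d e = _mm256_add_pd(_mm256_sub_pd(s_hi, _mm256_sub_pd(t, v)),
                                      _mm256_sub_pd(ph, v));
            e    = _mm256_add_pd(e, _mm256_add_pd(s_lo, pl));
            s_hi = _mm256_add_pd(t, e);                   /* renormalize       */
            s_lo = _mm256_sub_pd(e, _mm256_sub_pd(s_hi, t));
        }
        /* Four row results already sit in lanes 0..3: no horizontal reduction. */
        _mm256_storeu_pd(y_hi + 4 * br, s_hi);
        _mm256_storeu_pd(y_lo + 4 * br, s_lo);
    }
}

int main(void)
{
    /* One block row (rows 0..3) with two 4x1 blocks, in columns 0 and 1. */
    int    bptr[] = { 0, 2 };
    int    bcol[] = { 0, 1 };
    double val[]  = { 1, 2, 3, 4,   5, 6, 7, 8 };  /* column 0, then column 1 */
    bcrs4x1 A = { 1, bptr, bcol, val };

    double x_hi[] = { 1.0, 2.0 }, x_lo[] = { 0.0, 0.0 };
    double y_hi[4], y_lo[4];
    dd_spmv_bcrs4x1(&A, x_hi, x_lo, y_hi, y_lo);
    for (int i = 0; i < 4; ++i)                     /* expected: 11 14 17 20 */
        printf("y[%d] = %.1f + %.1e\n", i, y_hi[i], y_lo[i]);
    return 0;
}

In contrast, a CRS kernel would load four matrix values of one row at a time and gather the corresponding vector elements from non-contiguous positions, then reduce the four lanes horizontally and handle rows whose length is not a multiple of four separately, which are exactly the degradation factors the paper identifies.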