# Citation (bibtex)

@inproceedings{hishinuma2016vecpar,
title={SIMD parallel sparse matrix-vector and transposed-matrix-vector multiplication in DD precision},
author={Hishinuma, Toshiaki and Hasegawa, Hidehiko and Tanaka, Teruo},
booktitle={International Conference on Vector and Parallel Processing},
pages={21--34},
year={2016},
organization={Springer}
}


# Abstract

We accelerate double-precision sparse matrix and double-double (DD) vector multiplication (DD-SpMV), and its transposed variant (DD-TSpMV), using AVX2 SIMD instructions. AVX2 requires changing the memory access pattern so that four consecutive 64-bit elements can be read at once. In our previous research, DD-SpMV in the CRS format using AVX2 needed non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register; these factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV under AVX2 to eliminate the performance degradation factors of CRS. Our results indicate that BCRS4x1, whose block size matches the AVX2 register's length, is effective for both DD-SpMV and DD-TSpMV.