Hi everyone,
Just wanted to share some quick baseline benchmark results [3].
I ran LU decomposition on an AMD Ryzen Threadripper 1950X 16C/32T system.
As far as I know, LAPACK currently relies on plain loop-parallel BLAS.
The upstream version of PLASMA [2], in contrast, uses OpenMP tasks [1].
The colored region is the 95% confidence interval.
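
For anyone who wants to reproduce a band like this from their own runs, here is a minimal sketch of a 95% confidence interval for the mean of repeated timings. It assumes a normal approximation, which is not necessarily how the plot was made; the function name mean_ci95 and the timings are hypothetical placeholders, not measurements from this benchmark.

```python
import math
import statistics


def mean_ci95(samples):
    """Return (mean, lower, upper) for a 95% confidence interval of the mean.

    Assumption: normal approximation with the sample standard deviation.
    For very few repetitions a Student-t quantile would give a wider interval.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean.
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided 95% quantile of the standard normal (about 1.96).
    z = statistics.NormalDist().inv_cdf(0.975)
    half_width = z * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical timings in seconds for one thread count (placeholder values).
timings = [1.02, 0.98, 1.01, 0.99, 1.00]
mean, lower, upper = mean_ci95(timings)
print(f"mean = {mean:.3f} s, 95% CI = [{lower:.3f}, {upper:.3f}] s")
```

Running this once per thread count gives the center line and the lower and upper edges of the band.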
Task parallelism seems to scale pretty well on such a small-scale benchmark.
I hope work stealing can improve this even more.
 
Ray Kim
 
[1] YarKhan, Asim, et al. "Porting the PLASMA numerical library to the OpenMP 
standard." International Journal of Parallel Programming 45.3 (2017): 612-633.
[2] https://bitbucket.org/icl/plasma/src/default/
[3] Link to benchmark plot: https://m.imgur.com/ysxs5ol
 
