Hello,


Over the past few days, I’ve looked into using the new Vector API [1] to
accelerate some BLAS operations directly from Java. You can find a gist at [2]
containing most of the changes in
mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala.



To measure performance, I’ve added a BLASBenchmark.scala [3] at
mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala. I do
see some promising speedups, especially compared to F2jBLAS. Unfortunately, I
have not been able to install OpenBLAS locally to compare against native
performance, but I would still expect native to be faster, especially on large
inputs. See [4] for an f2j vs. vector performance comparison.



The primary blocker is that the Vector API is only available as an incubator
module, starting with JDK 16. A run-time check for whether we can use the
vectorized BLAS is easy to add. However, compiling the vectorized BLAS class
requires JDK 16+. Spark 3.0+ does compile with JDK 16 (it works locally), but I
don’t know how to selectively compile sources based on the JDK version used at
compile time.
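
As a minimal sketch of the kind of run-time check meant here (this is not the
code from the gist; the class name VectorSupport and its methods are
hypothetical, chosen only for illustration), one option is to look at the JDK
feature version and then probe for a Vector API class by name. The probe uses
reflection only, so this class itself compiles on older JDKs. Note that
incubator modules are not resolved by default, so the probe is expected to
succeed only when the JVM is started with --add-modules jdk.incubator.vector.

```java
// Hypothetical helper, for illustration only: decides at run time whether
// the incubating Vector API can be used by the current JVM.
final class VectorSupport {
    // A class from the incubator module described in JEP 338.
    static final String VECTOR_CLASS = "jdk.incubator.vector.DoubleVector";

    private VectorSupport() {}

    // Turns a "java.specification.version" value such as "1.8", "11" or "16"
    // into the feature number (8, 11, 16).
    static int parseFeatureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);
        }
        return Integer.parseInt(v);
    }

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, VectorSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True only on JDK 16+ with the incubator module visible to this JVM.
    static boolean isVectorApiUsable() {
        int feature = parseFeatureVersion(System.getProperty("java.specification.version"));
        return feature >= 16 && isClassAvailable(VECTOR_CLASS);
    }

    public static void main(String[] args) {
        System.out.println("Vector API usable: " + isVectorApiUsable());
    }
}
```

A caller could then choose between the vectorized and the f2j implementation
once, at initialization, based on VectorSupport.isVectorApiUsable(). This only
addresses the run-time side; it does not solve the compile-time question above.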



More importantly, I want to get your feedback before I explore this idea
further. Technically, it is feasible, and we would see a speedup whenever the
native BLAS is not installed. I am focusing solely on ML/MLlib for now. Beyond
that, there is still GraphX (I haven’t checked whether anything there is
vectorizable), and even more explicit use of the Vector API in Catalyst, which
would be a much bigger project.



Thank you,

Ludovic Henry



[1] https://openjdk.java.net/jeps/338

[2] 
https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blas-scala

[3] 
https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blasbenchmark-scala

[4] 
https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-f2j-vs-vector-log
