I believe one of the higher-level goals of Spark MLlib should be to improve
the efficiency of the ML algorithms that already exist.  Currently MLlib
has reasonable coverage of the important core algorithms.  The work to reach
feature parity for the DataFrame-based API and model persistence is also
important.
Apache Spark needs to use higher-level BLAS3 and LAPACK routines instead of
BLAS1 and BLAS2.  For a long time we've used the concept of compute intensity
(compute_intensity = FP_operations/word) to help evaluate the performance of
the underlying compute kernels (see the papers referenced below).  It has
been shown in many implementations that better performance, better
scalability, and a large reduction in memory pressure can be achieved by
using higher-level BLAS3 or LAPACK routines in both single-node and
distributed computations.
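To make the compute-intensity argument concrete, here is a small sketch
using idealized flop and memory-traffic counts for representative BLAS1,
BLAS2, and BLAS3 routines (an illustrative model with square n-by-n
operands, not measured numbers):

```python
# Compute intensity = floating-point operations per word of memory traffic,
# using idealized counts for one call to each routine (illustrative model).

def axpy_intensity(n):
    # BLAS1  y := a*x + y
    flops = 2 * n          # n multiplies + n adds
    words = 3 * n          # read x, read y, write y
    return flops / words   # constant ~0.67, independent of n

def gemv_intensity(n):
    # BLAS2  y := A*x + y
    flops = 2 * n * n
    words = n * n + 3 * n  # read A once, read x, read/write y
    return flops / words   # approaches 2 as n grows

def gemm_intensity(n):
    # BLAS3  C := A*B + C
    flops = 2 * n ** 3
    words = 4 * n * n      # read A, B, C; write C
    return flops / words   # grows as n/2 -- e.g. 500.0 at n = 1000
```

BLAS1 and BLAS2 intensity is bounded by a small constant no matter how
large the problem gets, while BLAS3 intensity grows with the block size,
which is why only BLAS3 kernels can stay out of the memory-bandwidth wall.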
I performed a survey of some of Apache Spark's ML algorithms.  Unfortunately,
most of them are implemented with BLAS1 or BLAS2 routines, which have very
low compute intensity.  BLAS1 and BLAS2 routines require far more memory
bandwidth per floating-point operation and will not achieve peak performance
on x86, GPUs, or any other processor.
Apache Spark 2.1.0 ML routines & BLAS routines:

ALS (Alternating Least Squares matrix factorization)
  BLAS2: _SPR, _TPSV
  BLAS1: _AXPY, _DOT, _SCAL, _NRM2
Logistic regression classification
  BLAS2: _GEMV
  BLAS1: _DOT, _SCAL
Generalized linear regression
  BLAS1: _DOT
Gradient-boosted tree regression
  BLAS1: _DOT
GraphX SVD++
  BLAS1: _AXPY, _DOT, _SCAL
Neural Net Multi-layer Perceptron
  BLAS3: _GEMM
  BLAS2: _GEMV
Only the Neural Net Multi-layer Perceptron uses the BLAS3 matrix multiply
(_GEMM).  The leading underscore is replaced by S, D, C, or Z for 32-bit
real, 64-bit real, 32-bit complex, and 64-bit complex operations,
respectively.
Refactoring the algorithms to use BLAS3 routines or higher-level LAPACK
routines will require coding changes to use sub-block algorithms, but the
performance benefits can be great.
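As a minimal sketch of what such a refactoring looks like (illustrative
NumPy code, not Spark's actual implementation): many of the algorithms
above apply a BLAS2 _GEMV or BLAS1 _DOT once per row, when the same result
can be produced by a single BLAS3 _GEMM over a block of rows:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))   # a block of 256 feature rows
W = rng.standard_normal((64, 10))    # coefficient matrix

# BLAS2 style: one matrix-vector product per row (low compute intensity;
# W is streamed from memory 256 times).
out_gemv = np.stack([X[i] @ W for i in range(X.shape[0])])

# BLAS3 style: one matrix-matrix product over the whole block (same flops,
# far fewer passes over memory).
out_gemm = X @ W

assert np.allclose(out_gemv, out_gemm)
```

The sub-blocking work is in restructuring the algorithm so that rows are
gathered into blocks before the kernel is called; the kernel swap itself
is the easy part.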
More at:
https://blogs.oracle.com/BestPerf/entry/improving_algorithms_in_spark_ml
Background:
Brad Carlile.  Parallelism, compute intensity, and data vectorization.
SuperComputing '93, November 1993.
https://blogs.oracle.com/BestPerf/resource/Carlile-app_compute-intensity-1993.pdf

John McCalpin.  Memory Bandwidth and Machine Balance in Current High
Performance Computers.  1995.
https://www.researchgate.net/publication/213876927_Memory_Bandwidth_and_Machine_Balance_in_Current_High_Performance_Computers
  
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-mission-and-goals-tp20715p20754.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.