*Re: performance measurement framework* We (Databricks) used to use spark-perf <https://github.com/databricks/spark-perf>, but that was mainly for the RDD-based API. We've now switched to spark-sql-perf <https://github.com/databricks/spark-sql-perf>, which does include some ML benchmarks despite the project name. I'll see about updating the project README to document how to run MLlib tests.
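Regarding the BLAS discussion in the quoted message below: the compute-intensity gap is easy to quantify. A daxpy does 2n flops while moving about 3n words (intensity ~2/3), whereas a blocked dgemm does 2mnk flops against O(mn + nk + mk) words, so intensity grows with the blocking factor. As a concrete illustration (a rough, untested sketch, not Spark code; it assumes only the netlib-java bindings Spark 2.1 already depends on, and the object name and problem sizes are made up), here is a Gram matrix G = A^T * A computed first with a loop of BLAS1 ddot calls and then with a single BLAS3 dgemm:

    import com.github.fommil.netlib.BLAS

    object GramSketch {
      def main(args: Array[String]): Unit = {
        val blas = BLAS.getInstance()
        val n = 10000 // rows (observations)
        val k = 100   // columns (features)
        val rnd = new scala.util.Random(42)
        // Column-major storage, as Fortran BLAS expects.
        val a = Array.fill(n * k)(rnd.nextDouble())

        // BLAS1 style: k*k ddot calls. Each call streams 2n words to do
        // 2n flops (compute intensity ~1), so it is bandwidth bound.
        val cols = Array.tabulate(k)(j => a.slice(j * n, (j + 1) * n))
        val g1 = new Array[Double](k * k)
        for (i <- 0 until k; j <- 0 until k) {
          g1(i + j * k) = blas.ddot(n, cols(i), 1, cols(j), 1)
        }

        // BLAS3 style: one dgemm computing G = A^T * A. A blocked kernel
        // reuses each panel of A from cache many times, so compute
        // intensity scales with the block size and can approach peak.
        val g2 = new Array[Double](k * k)
        blas.dgemm("T", "N", k, k, n, 1.0, a, n, a, n, 0.0, g2, k)
      }
    }

The two results are identical; the difference is that the ddot loop re-reads each column of A k times from main memory, while dgemm's sub-block algorithm reuses panels of A from cache, which is exactly the kind of refactor proposed below.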
On Tue, Jan 24, 2017 at 6:02 PM, bradc <brad.carl...@oracle.com> wrote:

> I believe one of the higher-level goals of Spark MLlib should be to
> improve the efficiency of the ML algorithms that already exist. ML
> currently has reasonable coverage of the important core algorithms. The
> work to reach feature parity for the DataFrame-based API and model
> persistence is also important.
>
> Apache Spark needs to use higher-level BLAS3 and LAPACK routines instead
> of BLAS1 and BLAS2. For a long time we've used the concept of compute
> intensity (compute_intensity = FP_operations/Word) to analyze the
> performance of the underlying compute kernels (see the papers referenced
> below). Many implementations have shown that better performance, better
> scalability, and a huge reduction in memory pressure can be achieved by
> using higher-level BLAS3 or LAPACK routines, in both single-node and
> distributed computations.
>
> I surveyed some of Apache Spark's ML algorithms. Unfortunately, most of
> them are implemented with BLAS1 or BLAS2 routines, which have very low
> compute intensity. BLAS1 and BLAS2 routines require far more memory
> bandwidth per flop and will not achieve peak performance on x86, GPUs, or
> any other processor.
>
> Apache Spark 2.1.0 ML routines & BLAS routines:
>
> ALS (Alternating Least Squares matrix factorization)
>
> - BLAS2: _SPR, _TPSV
> - BLAS1: _AXPY, _DOT, _SCAL, _NRM2
>
> Logistic regression classification
>
> - BLAS2: _GEMV
> - BLAS1: _DOT, _SCAL
>
> Generalized linear regression
>
> - BLAS1: _DOT
>
> Gradient-boosted tree regression
>
> - BLAS1: _DOT
>
> GraphX SVD++
>
> - BLAS1: _AXPY, _DOT, _SCAL
>
> Neural net multi-layer perceptron
>
> - BLAS3: _GEMM
> - BLAS2: _GEMV
>
> Only the multi-layer perceptron uses the BLAS3 matrix multiply (DGEMM).
> (The underscores are replaced by S, D, C, Z for 32-bit real, 64-bit real,
> 32-bit complex, and 64-bit complex operations, respectively.)
>
> Refactoring the algorithms to use BLAS3 or higher-level LAPACK routines
> will require coding changes to adopt sub-block algorithms, but the
> performance benefits can be great.
>
> More at:
> https://blogs.oracle.com/BestPerf/entry/improving_algorithms_in_spark_ml
>
> Background:
>
> Brad Carlile. Parallelism, compute intensity, and data vectorization.
> SuperComputing '93, November 1993.
> <https://blogs.oracle.com/BestPerf/resource/Carlile-app_compute-intensity-1993.pdf>
>
> John McCalpin. Memory Bandwidth and Machine Balance in Current High
> Performance Computers. 1995.
> <https://www.researchgate.net/publication/213876927_Memory_Bandwidth_and_Machine_Balance_in_Current_High_Performance_Computers>

--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
http://databricks.com