Finally, we solved this problem by building our own netlib-java native .so
files on CentOS. It now works without any warning, but the performance is far
from what we get on the MacBook Pro.

The matrix has 6778 rows and 2487 columns.

The MBP took 10 seconds to get the PCA result, but CentOS took 110 s. Even
the MBP with the pure-Java BLAS implementation takes only 40 s.

The source code caches the input matrix in memory, and only 200+ kB of data
is read by the shuffle.
The only difference in http://localhost:4040/stages/ is in this stage:

  Stage Id:                14
  Description:             aggregate at RowMatrix.scala:211
  Submitted:               2014/06/13 12:18:12
  Duration:                *36 s*
  Tasks: Succeeded/Total:  3/3
  Shuffle Read:            (empty)
  Shuffle Write:           (empty)

The duration of this stage on the Mac is only 10 s.

So why does RowMatrix.scala perform so differently on the Mac and on CentOS?
Is it related to the native BLAS implementation?

Then I reduced the matrix size by half, and the duration dropped to 2 s on
the Mac and 12 s on CentOS.

Is there any benchmark available for isolating the problem to either MLlib
or netlib-java?
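
In the meantime, here is a minimal sketch of the kind of isolation test I have
in mind. It assumes (I have not confirmed this from the MLlib source) that the
slow aggregate stage is dominated by accumulating the Gram matrix A^T A. It
uses only the JDK, with no Spark and no netlib-java, so running it on both
machines gives a pure-JVM baseline to compare against the stage durations.
The class and method names (GramBenchmark, gram, randomMatrix) are made up for
this example, and the input is random data rather than my real matrix.

```java
import java.util.Random;

public class GramBenchmark {

    // Computes G = A^T A for a row-major matrix, one row at a time.
    // Only the upper triangle is accumulated; it is mirrored at the end.
    static double[][] gram(double[][] a) {
        int n = a[0].length;
        double[][] g = new double[n][n];
        for (double[] row : a) {
            for (int i = 0; i < n; i++) {
                double ri = row[i];
                double[] gi = g[i];
                for (int j = i; j < n; j++) {
                    gi[j] += ri * row[j];
                }
            }
        }
        // Copy the upper triangle into the lower triangle.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {
                g[i][j] = g[j][i];
            }
        }
        return g;
    }

    // Fills a rows x cols matrix with reproducible Gaussian noise.
    static double[][] randomMatrix(int rows, int cols, long seed) {
        Random rng = new Random(seed);
        double[][] a = new double[rows][cols];
        for (double[] row : a) {
            for (int j = 0; j < cols; j++) {
                row[j] = rng.nextGaussian();
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Small defaults so a quick run finishes fast; pass "6778 2487"
        // to match the matrix size from this thread.
        int rows = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int cols = args.length > 1 ? Integer.parseInt(args[1]) : 500;
        double[][] a = randomMatrix(rows, cols, 42L);

        // Warm up the JIT on a small input before timing.
        gram(randomMatrix(200, 200, 1L));

        long start = System.nanoTime();
        double[][] g = gram(a);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("gram(%d x %d) took %.2f s, g[0][0] = %.3f%n",
                rows, cols, seconds, g[0][0]);
    }
}
```

If the two machines are close on this baseline but far apart in the Spark
stage, that would point toward the native library rather than the hardware;
if they already differ a lot here, the CPUs themselves explain part of the gap.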

The CPUs are a 3740QM on the Mac and an E5620 on CentOS:
http://ark.intel.com/compare/47925,70847

The log files are attached:
centos.log
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n7551/centos.log>  
mac.log
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n7551/mac.log>  




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-tp7042p7551.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
