For standalone and YARN mode, you need to install the native libraries on all nodes. The best solution is to install them as /usr/lib/libblas.so.3 and /usr/lib/liblapack.so.3. If your matrix is sparse, the native libraries cannot help, because they are for dense linear algebra. You can create an RDD of sparse rows and try k-means directly; it supports sparse input. -Xiangrui
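PS: a rough sketch of that sparse k-means path against the 1.0 API. The index:value parsing is only an assumption about your CSV layout (adapt it to the real format), and k and maxIterations are arbitrary; run it in spark-shell, where sc is predefined:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Build an RDD of sparse rows. Each line is assumed to hold comma-separated
// "index:value" pairs with ascending column indices; 2487 is the column
// count from your message.
val data = sc.textFile("/data/user/dump/user_fav_2014_04_09.csv.head1w")
  .map { line =>
    val (indices, values) = line.split(',').map { kv =>
      val Array(i, v) = kv.split(':')
      (i.toInt, v.toDouble)
    }.unzip
    Vectors.sparse(2487, indices, values)
  }
  .cache()

// Run k-means directly on the sparse vectors; no PCA step needed.
val model = KMeans.train(data, 10, 20)  // k = 10, maxIterations = 20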
Sent from my iPad

> On Jun 5, 2014, at 2:36 AM, yangliuyu <yangli...@163.com> wrote:
>
> Hi,
>
> We're using MLlib (1.0.0 release version) on a k-means clustering problem.
> We want to reduce the number of matrix columns before sending the points to
> the k-means solver.
>
> It works on my Mac in local mode: spark-test-run-assembly-1.0.jar contains
> my application code, the com.github.fommil netlib code, and the
> netlib-native*.so files (including the jnilib and dll files):
>
> spark-submit --class test.TestMllibPCA --master local[4] \
>   --executor-memory 3g --driver-memory 3g \
>   --driver-class-path /data/user/dump/spark-test-run-assembly-1.0.jar \
>   /data/user/dump/spark-test-run-assembly-1.0.jar \
>   /data/user/dump/user_fav_2014_04_09.csv.head1w
>
> But if --driver-class-path is removed, these warnings appear:
>
> 14/06/05 16:36:20 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
> 14/06/05 16:36:20 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
>
> Setting SPARK_CLASSPATH=/data/user/dump/spark-test-run-assembly-1.0.jar can
> also solve the problem.
>
> The matrix contains sparse data with 6778 rows and 2487 columns, and
> computing the PCA takes 10s and 47s respectively (with and without the
> native library), which suggests the native library works well.
>
> Then I wanted to test it on a Spark standalone cluster (on CentOS), but it
> failed again. After changing the JDK logging level to FINEST, I got these
> messages:
>
> 14/06/05 16:19:15 INFO JniLoader: JNI LIB = netlib-native_system-linux-x86_64.so
> 14/06/05 16:19:15 INFO JniLoader: extracting jar:file:/data/user/dump/spark-test-run-assembly-1.0.jar!/netlib-native_system-linux-x86_64.so to /tmp/jniloader6648403281987654682netlib-native_system-linux-x86_64.so
> 14/06/05 16:19:15 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
> 14/06/05 16:19:15 INFO JniLoader: JNI LIB = netlib-native_ref-linux-x86_64.so
> 14/06/05 16:19:15 INFO JniLoader: extracting jar:file:/data/user/dump/spark-test-run-assembly-1.0.jar!/netlib-native_ref-linux-x86_64.so to /tmp/jniloader2298588627398263902netlib-native_ref-linux-x86_64.so
> 14/06/05 16:19:16 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
> 14/06/05 16:19:16 INFO LAPACK: Implementation provided by class com.github.fommil.netlib.F2jLAPACK
>
> libgfortran, atlas, blas, lapack, and arpack are all installed, and all of
> the .so files are located under /usr/lib64. spark.executor.extraLibraryPath
> is set to /usr/lib64 in conf/spark-defaults.conf, but none of this works. I
> also tried adding --jars /data/user/dump/spark-test-run-assembly-1.0.jar,
> with no luck.
>
> What should I try next?
>
> Does the native library need to be visible to both the driver and the
> executors? In local mode the problem seems to be a classpath problem, but
> for standalone and yarn mode it gets more complex. A detailed document
> would be really helpful.
>
> Thanks.
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-tp7042.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
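On the driver/executor visibility question: a quick way to check empirically which implementation netlib-java loaded on each side is to print the loaded class from spark-shell. getInstance() is netlib-java's public accessor; sc is the SparkContext the shell provides, and the two-partition job is just an arbitrary way to run the same check on the executors:

import com.github.fommil.netlib.LAPACK

// Driver side: prints the loaded class, e.g. NativeSystemLAPACK,
// NativeRefLAPACK, or the pure-Java fallback F2jLAPACK.
println(LAPACK.getInstance().getClass.getName)

// Executor side: ship the same check inside a trivial job and
// collect the distinct class names that the executors report.
sc.parallelize(1 to 2, 2)
  .map(_ => LAPACK.getInstance().getClass.getName)
  .collect()
  .distinct
  .foreach(println)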