FWIW, I've run into similar BLAS-related problems before and wrote up a document on how to set this up for Spark EC2 clusters at https://github.com/amplab/ml-matrix/blob/master/EC2.md -- Note that this works with a vanilla Spark build (you only need to link to netlib-lgpl in your app), but it requires the app jar to be present on all the machines.
Thanks
Shivaram

On Tue, Jul 21, 2015 at 7:37 AM, Arun Ahuja <aahuj...@gmail.com> wrote:

> Yes, I imagine it's the driver's classpath - I'm pulling those
> screenshots straight from the Spark UI environment page. Is there
> somewhere else to grab the executor class path?
>
> Also, the warning is only printing once, so it's also not clear whether
> the warning is from the driver or executor, would you know?
>
> Thanks,
> Arun
>
> On Tue, Jul 21, 2015 at 7:52 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Great, and that file exists on HDFS and is world readable? Just
>> double-checking.
>>
>> What classpath is this -- your driver or executor? This is the driver,
>> no? I assume so just because it looks like it references the assembly you
>> built locally and from which you're launching the driver.
>>
>> I think we're concerned with the executors and what they have on the
>> classpath. I suspect there is still a problem somewhere in there.
>>
>> On Mon, Jul 20, 2015 at 4:59 PM, Arun Ahuja <aahuj...@gmail.com> wrote:
>>
>>> Cool, I tried that as well, and it doesn't seem different:
>>>
>>> spark.yarn.jar seems set
>>>
>>> [image: Inline image 1]
>>>
>>> This actually doesn't change the classpath, not sure if it should:
>>>
>>> [image: Inline image 3]
>>>
>>> But same netlib warning.
>>>
>>> Thanks for the help!
>>> - Arun
>>>
>>> On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza <sandy.r...@cloudera.com>
>>> wrote:
>>>
>>>> Can you try setting the spark.yarn.jar property to make sure it points
>>>> to the jar you're thinking of?
>>>>
>>>> -Sandy
>>>>
>>>> On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja <aahuj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, it's a YARN cluster and I'm using spark-submit to run. I have
>>>>> SPARK_HOME set to the directory above and am using the spark-submit
>>>>> script from there.
>>>>>
>>>>> bin/spark-submit --master yarn-client --executor-memory 10g \
>>>>>   --driver-memory 8g --num-executors 400 --executor-cores 1 --class \
>>>>>   org.hammerlab.guacamole.Guacamole --conf spark.default.parallelism=4000 \
>>>>>   --conf spark.storage.memoryFraction=0.15
>>>>>
>>>>> libgfortran.so.3 is also there
>>>>>
>>>>> ls /usr/lib64/libgfortran.so.3
>>>>> /usr/lib64/libgfortran.so.3
>>>>>
>>>>> These are jniloader files in the jar
>>>>>
>>>>> jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar \
>>>>>   | grep jniloader
>>>>> META-INF/maven/com.github.fommil/jniloader/
>>>>> META-INF/maven/com.github.fommil/jniloader/pom.xml
>>>>> META-INF/maven/com.github.fommil/jniloader/pom.properties
>>>>>
>>>>> Thanks,
>>>>> Arun
>>>>>
>>>>> On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>
>>>>>> Make sure /usr/lib64 contains libgfortran.so.3; that's really the
>>>>>> issue.
>>>>>>
>>>>>> I'm pretty sure the answer is 'yes', but, make sure the assembly has
>>>>>> jniloader too. I don't see why it wouldn't, but, that's needed.
>>>>>>
>>>>>> What is your env like -- local, standalone, YARN? how are you running?
>>>>>> Just want to make sure you are using this assembly across your
>>>>>> cluster.
>>>>>>
>>>>>> On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja <aahuj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> Thanks for the reply!
>>>>>>> I did double-check that the jar is the one I think
>>>>>>> I am running:
>>>>>>>
>>>>>>> [image: Inline image 2]
>>>>>>>
>>>>>>> jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar \
>>>>>>>   | grep netlib | grep Native
>>>>>>> com/github/fommil/netlib/NativeRefARPACK.class
>>>>>>> com/github/fommil/netlib/NativeRefBLAS.class
>>>>>>> com/github/fommil/netlib/NativeRefLAPACK.class
>>>>>>> com/github/fommil/netlib/NativeSystemARPACK.class
>>>>>>> com/github/fommil/netlib/NativeSystemBLAS.class
>>>>>>> com/github/fommil/netlib/NativeSystemLAPACK.class
>>>>>>>
>>>>>>> Also, I checked the gfortran version on the cluster nodes and it is
>>>>>>> available and is 5.1
>>>>>>>
>>>>>>> $ gfortran --version
>>>>>>> GNU Fortran (GCC) 5.1.0
>>>>>>> Copyright (C) 2015 Free Software Foundation, Inc.
>>>>>>>
>>>>>>> and I still see:
>>>>>>>
>>>>>>> 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
>>>>>>> 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
>>>>>>> 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
>>>>>>> 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
>>>>>>>
>>>>>>> Does anything need to be adjusted in my application POM?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Arun
>>>>>>>
>>>>>>> On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen <so...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, that's most of the work, just getting the native libs into the
>>>>>>>> assembly. netlib can find them from there even if you don't have BLAS
>>>>>>>> libs on your OS, since it includes a reference implementation as a
>>>>>>>> fallback.
>>>>>>>>
>>>>>>>> One common reason it won't load is not having libgfortran installed on
>>>>>>>> your OSes though. It has to be 4.6+ too. That can't be shipped even in
>>>>>>>> netlib and has to exist on your hosts.
>>>>>>>>
>>>>>>>> The other thing I'd double-check is whether you are really using this
>>>>>>>> assembly you built for your job -- like, it's actually the
>>>>>>>> assembly the executors are using.
>>>>>>>>
>>>>>>>> On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja <aahuj...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Is there more documentation on what is needed to set up BLAS/LAPACK
>>>>>>>> > native support with Spark?
>>>>>>>> >
>>>>>>>> > I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
>>>>>>>> > classes are in the assembly jar.
>>>>>>>> >
>>>>>>>> > jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
>>>>>>>> >   6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
>>>>>>>> >  21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
>>>>>>>> > 178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
>>>>>>>> >   6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
>>>>>>>> >  21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
>>>>>>>> > 178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class
>>>>>>>> >
>>>>>>>> > Also I see the following in /usr/lib64
>>>>>>>> >
>>>>>>>> >> ls /usr/lib64/libblas.
>>>>>>>> > libblas.a  libblas.so  libblas.so.3  libblas.so.3.2
>>>>>>>> > libblas.so.3.2.1
>>>>>>>> >
>>>>>>>> >> ls /usr/lib64/liblapack
>>>>>>>> > liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3
>>>>>>>> > liblapack.so.3.2  liblapack.so.3.2.1
>>>>>>>> >
>>>>>>>> > But I still see the following in the Spark logs:
>>>>>>>> >
>>>>>>>> > 15/07/07 15:36:25 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
>>>>>>>> > 15/07/07 15:36:25 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
>>>>>>>> > 15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
>>>>>>>> > 15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
>>>>>>>> >
>>>>>>>> > Anything in this process I missed?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Arun
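
For anyone retracing the jar checks quoted in this thread, here is a rough shell sketch that summarizes which netlib pieces show up in a jar listing. The function name summarize_netlib_entries is hypothetical (it is not part of Spark or netlib-java), and the JniLoader class path and the .so file names are assumptions about how netlib-java packages its Linux x86_64 natives, so adjust the patterns for your platform.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark or netlib-java.
# Reads a jar listing on stdin (for example the output of `jar tf`) and
# prints one "found"/"missing" line per pattern below.
summarize_netlib_entries() {
  # Capture the whole listing once so it can be searched repeatedly.
  listing=$(cat)

  # The two Native*BLAS classes appear in the listings quoted in this thread.
  # The JniLoader path and the .so names are assumptions (Linux x86_64 only).
  for pattern in \
    'com/github/fommil/netlib/NativeSystemBLAS.class' \
    'com/github/fommil/netlib/NativeRefBLAS.class' \
    'com/github/fommil/jni/JniLoader.class' \
    'netlib-native_system-linux-x86_64.so' \
    'netlib-native_ref-linux-x86_64.so'
  do
    # -F treats the pattern as a fixed string, -q suppresses grep's output.
    if printf '%s\n' "$listing" | grep -qF -- "$pattern"; then
      echo "found   $pattern"
    else
      echo "missing $pattern"
    fi
  done
}
```

Usage would be something like `jar tf /path/to/spark-assembly.jar | summarize_netlib_entries`, run against the jar the executors actually pick up rather than only the one on the driver. A "missing" line is a hint about where to look next, not proof of the cause of the warning.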