Further to my previous emails: when I try to execute this command from the command line:
./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0

- numpy-1.9.2.zip is the downloaded numpy package
- kmeans.py is the default example that ships with Spark 1.4
- kmeans_data.txt is the default data file that ships with Spark 1.4

It fails, saying that it could not find numpy:

  File "kmeans.py", line 31, in <module>
    import numpy
ImportError: No module named numpy

Has anyone run a Python Spark application in yarn-cluster mode (with 3rd-party Python modules to be shipped with it)?

What configurations or installations need to be done before running a Python Spark job with 3rd-party dependencies on yarn-cluster?

Thanks in advance.

On Thu, Jun 25, 2015 at 12:09 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:

> Hi all,
>
> Does Spark 1.4 support Python applications on yarn-cluster?
> (--master yarn-cluster)
>
> Does Spark 1.4 support Python applications with deploy-mode cluster?
> (--deploy-mode cluster)
>
> How can we ship 3rd-party Python dependencies with a Python Spark job
> (running on a Yarn cluster)?
>
> Thanks.
>
> On Wed, Jun 24, 2015 at 3:13 PM, Elkhan Dadashov <elkhan8...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm trying to run the kmeans.py Spark example in Yarn cluster mode. I'm
>> using Spark 1.4.0.
>>
>> I'm passing numpy-1.9.2.zip with the --py-files flag.
>>
>> Here is the command I'm trying to execute, but it fails:
>>
>> ./bin/spark-submit --master yarn-cluster --verbose --py-files
>> mypython/libs/numpy-1.9.2.zip mypython/scripts/kmeans.py
>> /kmeans_data.txt 5 1.0
>>
>> I have kmeans_data.txt in HDFS in the / directory.
>>
>> I receive this error:
>>
>> "
>> ...
>> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status:
>> SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status
>> was reported.)
>> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering
>> ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before
>> final status was reported.)
>> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory
>> .sparkStaging/application_1435182120590_0009
>> 15/06/24 15:08:22 INFO util.Utils: Shutdown hook called
>> (stdout) Traceback (most recent call last):
>>   File "kmeans.py", line 31, in <module>
>>     import numpy as np
>> ImportError: No module named numpy
>> ...
>> "
>>
>> Any idea why it cannot import numpy from numpy-1.9.2.zip while running
>> the kmeans.py example provided with Spark?
>>
>> How can we run a python script that has a 3rd-party python module
>> dependency on yarn-cluster?
>>
>> Thanks.
>
> --
> Best regards,
> Elkhan Dadashov
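For anyone hitting the same ImportError: a likely explanation is that --py-files entries are placed on the workers' sys.path and resolved by Python's zipimport machinery, which can load pure-Python code from a zip but cannot load compiled extension modules (.so/.pyd) such as those inside numpy. The minimal sketch below (plain Python, no Spark needed; mymod is a hypothetical module name) demonstrates the zip-on-sys.path mechanism that --py-files relies on:

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a pure-Python module, mimicking what
# --py-files ships to each executor.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "def answer():\n    return 42\n")

# Spark adds --py-files entries to sys.path on the workers;
# Python's zipimport then resolves imports out of the archive.
sys.path.insert(0, zip_path)
import mymod

print(mymod.answer())  # -> 42

# This only works for pure-Python code: zipimport cannot load
# compiled extension modules, which is why zipping numpy and
# passing it via --py-files fails with ImportError. Packages with
# native code generally need to be installed on every node (or
# shipped as a relocatable environment) instead.
```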