Hi all,

Does Spark 1.4 support Python applications on YARN in cluster mode (--master yarn-cluster)?

Does Spark 1.4 support Python applications with deploy mode cluster (--deploy-mode cluster)?

How can we ship third-party Python dependencies with a PySpark job running on a YARN cluster?
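For reference, here is the pattern I would expect to work for a pure-Python dependency (a minimal sketch; "mylib" and the paths below are just placeholders):

  # Zip a pure-Python package so that mylib/__init__.py sits at the
  # top level of the archive (mylib is a hypothetical package name).
  (cd mypython/libs && zip -r mylib.zip mylib/)

  # Ship the archive with the job; --py-files entries are placed on the
  # PYTHONPATH of the Python workers.
  ./bin/spark-submit --master yarn-cluster \
    --py-files mypython/libs/mylib.zip \
    mypython/scripts/my_job.py <args>

My guess is that this approach breaks for numpy because numpy-1.9.2.zip from PyPI is a source distribution containing compiled C extensions, which Python cannot import directly from a zip archive -- but I would appreciate confirmation.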
Thanks.

On Wed, Jun 24, 2015 at 3:13 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:

> Hi all,
>
> I'm trying to run the kmeans.py Spark example in YARN cluster mode. I'm
> using Spark 1.4.0.
>
> I'm passing numpy-1.9.2.zip with the --py-files flag.
>
> Here is the command I'm trying to execute, but it fails:
>
> ./bin/spark-submit --master yarn-cluster --verbose --py-files
> mypython/libs/numpy-1.9.2.zip mypython/scripts/kmeans.py
> /kmeans_data.txt 5 1.0
>
> - I have kmeans_data.txt in HDFS, in the / directory.
>
> I receive this error:
>
> "
> ...
> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status:
> SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status
> was reported.)
> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering
> ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before
> final status was reported.)
> 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory
> .sparkStaging/application_1435182120590_0009
> 15/06/24 15:08:22 INFO util.Utils: Shutdown hook called
> \00 stdout\00 134Traceback (most recent call last):
>   File "kmeans.py", line 31, in <module>
>     import numpy as np
> ImportError: No module named numpy
> ...
> "
>
> Any idea why numpy cannot be imported from numpy-1.9.2.zip while running
> the kmeans.py example provided with Spark?
>
> How can we run a Python script that depends on a third-party Python
> module on yarn-cluster?
>
> Thanks.

--
Best regards,
Elkhan Dadashov