Hi all, I'm trying to run the kmeans.py Spark example in YARN cluster mode. I'm using Spark 1.4.0.
I'm passing numpy-1.9.2.zip with the --py-files flag. Here is the command I'm trying to execute, which fails:

    ./bin/spark-submit --master yarn-cluster --verbose \
        --py-files mypython/libs/numpy-1.9.2.zip \
        mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0

kmeans_data.txt is in HDFS, in the / directory. I receive this error:

    ...
    15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
    15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
    15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435182120590_0009
    15/06/24 15:08:22 INFO util.Utils: Shutdown hook called

    stdout:
    Traceback (most recent call last):
      File "kmeans.py", line 31, in <module>
        import numpy as np
    ImportError: No module named numpy
    ...

Any idea why numpy cannot be imported from numpy-1.9.2.zip while running the kmeans.py example provided with Spark? More generally, how can we run a Python script that depends on a third-party Python module on yarn-cluster?

Thanks.
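For anyone debugging the same thing: --py-files works by placing each zip on the executors' sys.path, so imports go through Python's zipimport mechanism. A minimal sketch of that mechanism (the module name `mymod` is made up for illustration) shows that pure-Python modules import fine from a zip, while zipimport cannot load compiled C extensions such as numpy's .so files, which may be why the import fails even though the zip ships correctly:

```python
import os
import sys
import tempfile
import zipfile

# Build a small zip containing a pure-Python module, mimicking what
# --py-files distributes to the executors. "mymod" is a hypothetical name.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

# Spark prepends each --py-files entry to sys.path like this.
sys.path.insert(0, zip_path)

import mymod
print(mymod.VALUE)  # a pure-Python module imports cleanly from the zip

# numpy, by contrast, contains compiled extension modules, and CPython's
# zipimport cannot load C extensions from inside a zip archive; the
# package would have to be unpacked or installed on every worker node.
```

So a zip of a pure-Python library is fine with --py-files, but a zip of numpy is not importable as-is.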