Hi all,

I'm trying to run the kmeans.py Spark example in yarn-cluster mode. I'm
using Spark 1.4.0.

I'm passing numpy-1.9.2.zip via the --py-files flag.

Here is the command I'm trying to execute, but it fails:

./bin/spark-submit --master yarn-cluster --verbose \
    --py-files mypython/libs/numpy-1.9.2.zip \
    mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


- I have kmeans_data.txt in the root directory (/) of HDFS.
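
For what it's worth, my understanding is that --py-files works by putting
the zip on the Python path of the driver and executors, relying on Python's
built-in zipimport. As a local sanity check (no Spark involved, module name
mymod is made up for illustration), a pure-Python module imports from a zip
just fine:

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a single pure-Python module, then put the
# zip itself on sys.path -- the same mechanism --py-files relies on.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "mylib.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

sys.path.insert(0, zip_path)
import mymod  # imported via zipimport, straight from the archive

print(mymod.VALUE)
```

The catch, as far as I know, is that zipimport cannot load compiled C
extension modules (.so files) from inside a zip, and numpy is mostly C
extensions, so I'm not sure shipping numpy-1.9.2.zip this way can ever
work. I'd appreciate confirmation either way.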


I receive this error:

"
...
15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0, (reason: Shutdown hook called before final status was
reported.)
15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering
ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before
final status was reported.)
15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory
.sparkStaging/application_1435182120590_0009
15/06/24 15:08:22 INFO util.Utils: Shutdown hook called
Traceback (most recent call last):
  File "kmeans.py", line 31, in <module>
    import numpy as np
ImportError: No module named numpy
...

"

Any idea why numpy cannot be imported from numpy-1.9.2.zip when running the
kmeans.py example shipped with Spark?

More generally, how can we run a Python script that depends on third-party
Python modules in yarn-cluster mode?

Thanks.
