cannot submit python files on EC2 cluster

2014-12-03 Thread chocjy
Hi, I am using Spark 1.1.0 on an EC2 cluster. After I submitted the job, it returned an error saying that a Python module could not be loaded because of missing files. I am using the same command that used to work on a private cluster before for submitting jobs and all the source fil
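The message is cut off before the submit command, so the cause cannot be pinned down from it. One common way to make local Python modules importable on the executors is to ship them as a .zip archive with `spark-submit --py-files` (or `SparkContext.addPyFile`). The helper below is a minimal, hypothetical sketch (`bundle_package` is not a Spark API) that builds such an archive with the standard `zipfile` module. It keeps the top-level package directory in the archive paths so that `import mypkg` resolves from the zip.

```python
import os
import zipfile
from typing import List


def bundle_package(src_dir: str, zip_path: str) -> List[str]:
    """Hypothetical helper: zip every .py file under src_dir.

    Archive paths are taken relative to the parent of src_dir, so the
    package directory itself (e.g. "mypkg/") is kept inside the zip.
    Only .py files are included; data files would need extra handling.
    """
    # abspath normalizes away a trailing slash, so dirname gives the parent.
    parent = os.path.dirname(os.path.abspath(src_dir))
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in sorted(files):
                if fname.endswith(".py"):
                    full = os.path.join(root, fname)
                    arcname = os.path.relpath(full, parent)
                    zf.write(full, arcname)
                    # Report names with "/" as zip archives do.
                    names.append(arcname.replace(os.sep, "/"))
    return names
```

The archive could then be passed along with the driver script, e.g. `spark-submit --py-files mypkg.zip main.py`, where `mypkg` and `main.py` are placeholders.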

using zip gets EOFError

2014-11-15 Thread chocjy
I was trying to zip one RDD with another. I store my matrix in HDFS and load it as

    Ab_rdd = sc.textFile('data/Ab.txt', 100)

If I do

    idx = sc.parallelize(range(m), 100)  # m is the number of records in Ab.txt
    print matrix_Ab.matrix.zip(idx).first()

I get the following error: If I store my matri
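The traceback is not included, so the cause of the EOFError cannot be confirmed from this message. One documented precondition of `RDD.zip` is worth checking, though: both RDDs must have the same number of partitions and the same number of elements in each partition. `sc.textFile(path, 100)` treats 100 only as a minimum partition count and splits the file by input size, so its per-partition record counts need not match those of `sc.parallelize(range(m), 100)`. The pure-Python sketch below simulates that check on lists of partitions; the even-slicing rule in `split_evenly` and the sample partitions are assumptions for illustration, not Spark's exact behavior. `zip_with_index` mirrors what PySpark's `RDD.zipWithIndex()` does (indices ordered by partition, then by position within the partition), which avoids aligning a second RDD at all.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split_evenly(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Assumed slicing rule, in the spirit of sc.parallelize(data, num_slices):
    contiguous chunks whose sizes differ by at most one."""
    n = len(data)
    return [list(data[i * n // num_slices:(i + 1) * n // num_slices])
            for i in range(num_slices)]


def zip_partitions(left: List[List[T]], right: List[List[U]]) -> List[Tuple[T, U]]:
    """Pair elements partition by partition, enforcing RDD.zip's precondition:
    equal partition counts and equal element counts per partition."""
    if len(left) != len(right):
        raise ValueError(f"partition counts differ: {len(left)} vs {len(right)}")
    out: List[Tuple[T, U]] = []
    for i, (lp, rp) in enumerate(zip(left, right)):
        if len(lp) != len(rp):
            raise ValueError(f"partition {i} sizes differ: {len(lp)} vs {len(rp)}")
        out.extend(zip(lp, rp))
    return out


def zip_with_index(partitions: List[List[T]]) -> List[Tuple[T, int]]:
    """Assign global indices from per-partition counts, ordered first by
    partition and then by position within each partition."""
    out: List[Tuple[T, int]] = []
    offset = 0
    for part in partitions:
        out.extend((x, offset + j) for j, x in enumerate(part))
        offset += len(part)
    return out


# Hypothetical input: a text file split unevenly by size, zipped with
# evenly sliced indices. The first partitions hold 3 and 2 records.
text_parts = [["r0", "r1", "r2"], ["r3", "r4"]]
idx_parts = split_evenly(range(5), 2)
try:
    zip_partitions(text_parts, idx_parts)
except ValueError as err:
    print("zip rejected:", err)
print(zip_with_index(text_parts))
```

In PySpark terms, the equivalent of the last line would be something like `Ab_rdd.zipWithIndex()`, which pairs each record with its index instead of zipping against a separately built `idx` RDD.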