Hi,
I am using Spark 1.1.0 on an EC2 cluster. After I submitted the job, it
returned an error saying that a Python module could not be loaded because of
missing files. I am using the same submit command that used to work on a
private cluster before, and all the source files
I was trying to zip the RDD with another RDD. I store my matrix in HDFS and
load it as:

    Ab_rdd = sc.textFile('data/Ab.txt', 100)

If I do:

    idx = sc.parallelize(range(m), 100)  # m is the number of records in Ab.txt
    print matrix_Ab.matrix.zip(idx).first()
I got the following error:
If I store my matrix
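For what it's worth, one common cause of zip() failures in this situation (a guess, sketched in plain Python with no Spark required): RDD.zip requires both RDDs to have the same number of partitions *and* the same number of elements in each partition. sc.textFile splits the input by byte ranges, so the number of lines per partition is generally uneven, while sc.parallelize(range(m), 100) slices the range almost evenly. The partition counts and slicing below are illustrative assumptions, not taken from the actual job:

```python
# Sketch of why zipping a textFile RDD with a parallelize RDD can fail.
# zip() needs identical per-partition element counts in both RDDs.

def parallelize_counts(m, num_partitions):
    # Elements per partition when a range of length m is sliced the way
    # sc.parallelize slices a sequence: partition i gets the elements
    # from i*m//num_partitions up to (i+1)*m//num_partitions.
    return [(i + 1) * m // num_partitions - i * m // num_partitions
            for i in range(num_partitions)]

# Hypothetical line counts that textFile might produce for the same data;
# they depend on byte offsets and line lengths, so they are rarely even.
textfile_counts = [9, 11, 10, 10]

even_counts = parallelize_counts(40, 4)
print(even_counts)                     # [10, 10, 10, 10]
print(textfile_counts == even_counts)  # False -> zip() would raise an error
```

If this is the cause, calling zipWithIndex() on the loaded RDD (instead of zipping with a separately parallelized range) sidesteps the per-partition mismatch, since it assigns indices based on the RDD's own partitioning.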