Hi, I have a piece of code that works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:

    rdd2 = sc.parallelize(list1)
    rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
    list = rdd3.collect()
    assert rdd1.getNumPartitions() == rdd2.getNumPartitions()

My rdd1 has this structure: [(1,2,3),(4,5,6)....]
My rdd2 has this structure: [1,2,3....]

Both RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition). I also tried using repartition(), but it does not resolve the issue.

The above code works fine on one machine but throws the error on another. I tried to look for an explanation but couldn't find any specific reason for this behavior. I have Spark 1.3 on the machine where it runs without any error, and Spark 1.4 on the machine where the error occurs.

Regards,
Abhinav Mishra