Problem solved:
for i in range(1, 6):
    prev = L                          # keep a handle to the cached RDD from the previous pass
    L = prev.cartesian(D)             # pair every element of L with every element of D
    prev.unpersist()                  # release the previously cached copy
    # keep the minimum D-value per element of L, shrink back to 6 partitions,
    # drop the values, and cache the result for the next pass
    L = L.reduceByKey(min).coalesce(6).map(lambda kv: kv[0]).cache()
    L.collect()                       # force evaluation
The number of partitions should stay constant across iterations; that is what the coalesce(6) ensures.
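For reference, here is a minimal sketch (not from the original thread; it assumes a local SparkContext and toy data) of why keeping the partition count constant matters: cartesian() produces one output partition per pair of input partitions, so without coalesce the count multiplies on every pass and each iteration gets slower.

from pyspark import SparkContext

sc = SparkContext("local[*]", "cartesian-partitions")
L = sc.parallelize(range(10), 6)   # hypothetical stand-in for the real L
D = sc.parallelize(range(5), 4)    # hypothetical stand-in for the real D

step = L.cartesian(D)
print(step.getNumPartitions())              # 6 * 4 = 24 partitions
print(step.coalesce(6).getNumPartitions())  # back to 6, as in the loop above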
--
Hi All,
I have a problem with the cartesian product. I build the cartesian product of two RDDs inside a loop, and the result is then reduced back to the original size of one of the participating RDDs. At the end of each iteration this result is assigned back to the original variable. I expect the same running time for each iteration,
b