g1 = pairs1.groupByKey().count()        # number of distinct keys in pairs1
pairs1 = pairs1.groupByKey(g1).cache()  # re-group into g1 partitions and cache
g2 = triples.groupByKey().count()       # number of distinct keys in triples
pairs2 = pairs2.groupByKey(g2)          # re-group into g2 partitions
pairs = pairs2.join(pairs1)             # join the two pre-partitioned RDDs

Hi, I want to implement a hash-partitioned join as shown above, but it is taking very long to run. As I understand it, the join should be performed locally, since both RDDs have already been partitioned by key; after partitioning, matching keys should reside on the same node. So shouldn't the join be fast once both sides are partitioned by key? Thank you.
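For reference, here is the co-partitioned join pattern I am trying to reproduce, sketched standalone in PySpark using partitionBy instead of groupByKey (the RDD contents, the partition count, and the sc variable below are just placeholders, not my real data):

from pyspark import SparkContext

sc = SparkContext(appName="copartitioned-join")

# Placeholder pair RDDs of (key, value); my real pairs1/pairs2 are loaded elsewhere.
pairs1 = sc.parallelize([(i % 100, i) for i in range(10000)])
pairs2 = sc.parallelize([(i % 100, -i) for i in range(10000)])

num_partitions = 16  # placeholder; fixed up front rather than derived from a count()

# partitionBy hash-partitions each RDD by key; cache() keeps the partitioned
# data in memory so the join does not have to recompute and re-shuffle it.
pairs1 = pairs1.partitionBy(num_partitions).cache()
pairs2 = pairs2.partitionBy(num_partitions).cache()

# With both sides partitioned the same way, the join should only need to
# match keys within corresponding partitions.
pairs = pairs1.join(pairs2)
print(pairs.count())

My understanding is that when both RDDs have the same number of partitions and the same hash partitioning, the join should avoid a full shuffle; that is the behavior I expected from my code above.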