Re: Join on Spark too slow.

2015-04-09 Thread ๏̯͡๏
If your data has a special characteristic, such as one dataset being small and the other large, you can consider doing a map-side join in Spark using broadcast values; this will speed things up. Otherwise, as Pitel mentioned, if there is nothing special and it's just a Cartesian product, it might take forever, or you might incre
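
For illustration, here is a rough plain-Python sketch of the lookup logic behind a map-side join. It assumes the small dataset fits in memory on every executor. The names `build_lookup` and `map_side_join` and the sample data are made up for this example. In PySpark the lookup table would typically be shipped with `sc.broadcast(...)` and the function applied through `mapPartitions`, as noted in the comments.

```python
from collections import defaultdict


def build_lookup(small_pairs):
    """Collect the small dataset into an in-memory dict: key -> list of values."""
    lookup = defaultdict(list)
    for key, value in small_pairs:
        lookup[key].append(value)
    return dict(lookup)


def map_side_join(large_partition, lookup):
    """Join one partition of the large dataset against the in-memory lookup.

    The large dataset is never shuffled; each record is matched locally.
    """
    for key, left_value in large_partition:
        for right_value in lookup.get(key, ()):
            yield key, (left_value, right_value)


# Hypothetical sample data standing in for the two datasets.
small = [("hello", 1), ("world", 2)]
large = [("hello", "a"), ("spark", "b"), ("world", "c"), ("hello", "d")]

lookup = build_lookup(small)
joined = list(map_side_join(large, lookup))

# In PySpark this would be wired up roughly as (assumed setup, not run here):
#   bc = sc.broadcast(lookup)
#   large_rdd.mapPartitions(lambda part: map_side_join(part, bc.value))
```

Keys that are missing from the small side are simply dropped, which matches inner-join behaviour.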

Re: Join on Spark too slow.

2015-04-09 Thread Guillaume Pitel
Maybe I'm wrong, but what you are doing here is basically a Cartesian product for each key. So if "hello" appears 100 times in your corpus, it will produce 100*100 elements in the join output. I don't understand what you're doing here, but it's normal that your join takes forever, it makes
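
To make the arithmetic concrete, here is a small plain-Python sketch that estimates how many records an inner join emits from the key counts on each side. The function name `join_output_size` and the sample corpus are made up for this example; the original poster's data is not shown in the thread.

```python
from collections import Counter


def join_output_size(left_keys, right_keys):
    """Number of records an inner join emits: for each key,
    (occurrences on the left) * (occurrences on the right)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Counter returns 0 for keys absent from the right side.
    return sum(n * right_counts[key] for key, n in left_counts.items())


# Hypothetical corpus: "hello" appears 100 times, "world" 3 times.
corpus = ["hello"] * 100 + ["world"] * 3

# Joining the corpus with itself on the word as key.
size = join_output_size(corpus, corpus)
print(size)  # 100*100 + 3*3 = 10009
```

Running a count like this on the real keys before joining shows whether a few frequent keys dominate the output.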