Re: job hangs when using pipe() with reduceByKey()

2015-11-01 Thread hotdog
Yes, the first code path takes only 30 minutes, but with the second method I waited 5 hours and it had only finished 10%.

Re: job hangs when using pipe() with reduceByKey()

2015-11-01 Thread Gylfi
Hi. What exactly is slow? In code base 1, when you run persist() + count() you store the result in RAM, so the subsequent map + reduceByKey runs on in-memory data. In the latter case (all in one line) you are doing both steps at the same time. So you are saying that if you sum up the time to ...
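
For context, here is a minimal sketch of the two code paths being compared. It assumes a plain RDD[String] source (the thread does not show how rdd is built) and a simple sum reducer for reduceByKey; the input path and the reducer are illustrative assumptions, not taken from the original post.

  import org.apache.spark.{SparkConf, SparkContext}

  val sc  = new SparkContext(new SparkConf().setAppName("pipe-reduce-sketch"))
  val rdd = sc.textFile("input.txt")             // hypothetical input; the actual source RDD is not shown in the thread

  // Variant 1 (reported fast): force the pipe() output into memory first, then reduce over cached data.
  val a = rdd.pipe("./my_cpp_program").persist() // mark the piped output for caching
  a.count()                                      // action that actually runs the external program and caches its output
  val b = a.map(s => (s, 1)).reduceByKey(_ + _).count()

  // Variant 2 (reported slow): one chained lineage with no intermediate caching,
  // so running the external program and the reduceByKey shuffle are part of the same job.
  val c = rdd.pipe("./my_cpp_program").map(s => (s, 1)).reduceByKey(_ + _).count()

Gylfi's point is that timing variant 1 only after the count() has finished measures just the in-memory map + reduceByKey, while variant 2's single job also includes the time spent running ./my_cpp_program.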

Re: job hangs when using pipe() with reduceByKey()

2015-10-31 Thread Ted Yu
Which Spark release are you using? Which OS? Thanks.

On Sat, Oct 31, 2015 at 5:18 AM, hotdog wrote:
> I have run into a situation:
> When I use
> val a = rdd.pipe("./my_cpp_program").persist()
> a.count() // just use it to persist a
> val b = a.map(s => (s, 1)).reduceByKey(_ + _).count()
> it's so fast
> ...
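
To answer Ted's questions, a quick snippet (not from the thread, a hedged sketch) that prints the Spark release and host OS from a spark-shell session, assuming sc is the active SparkContext:

  println(s"Spark version: ${sc.version}")                                              // Spark release of the running context
  println(s"OS: ${System.getProperty("os.name")} ${System.getProperty("os.version")}")  // host operating system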