Yes, the first code takes only 30 minutes,
but with the second method I waited for 5 hours and it only finished 10%.
Hi.
What is slow exactly?
In code-base 1:
When you run persist() + count(), you store the result in RAM.
Then the map + reduceByKey is done on in-memory data.
In the latter case (all in one line) you are doing both steps at the same
time.
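To make that concrete, here is a minimal sketch of the two versions (assuming a
spark-shell session where sc is already defined; the input path and the _ + _
combine function are placeholders, since the quoted reduceByKey() call omits its
argument):

  // Placeholder input; replace with the real source of rdd
  val rdd = sc.textFile("hdfs:///path/to/input")

  // Code-base 1: pipe through the external program once, keep the output in RAM
  val a = rdd.pipe("./my_cpp_program").persist()   // default storage level is MEMORY_ONLY
  a.count()                                        // forces the pipe stage to run and fills the cache

  // This job reads a straight from memory, so ./my_cpp_program is not invoked again
  val b = a.map(s => (s, 1)).reduceByKey(_ + _).count()

  // All-in-one variant: nothing is cached; the pipe and the aggregation run as one job
  val c = rdd.pipe("./my_cpp_program").map(s => (s, 1)).reduceByKey(_ + _).count()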
So you are saying that if you sum up the time to run both steps in code-base 1,
it is still much less than the all-in-one version?
Which Spark release are you using?
Which OS?
Thanks
On Sat, Oct 31, 2015 at 5:18 AM, hotdog wrote:
> I ran into a situation:
> When I use
> val a = rdd.pipe("./my_cpp_program").persist()
> a.count() // just used to persist a
> val b = a.map(s => (s, 1)).reduceByKey(_ + _).count()
> it's so fast
>