Re: job hangs when using pipe() with reduceByKey()

Ted Yu Sat, 31 Oct 2015 08:15:21 -0700

Which Spark release are you using?

Which OS?


Thanks

On Sat, Oct 31, 2015 at 5:18 AM, hotdog <lisend...@163.com> wrote:

> I have run into the following situation:
> When I use
> val a = rdd.pipe("./my_cpp_program").persist()
> a.count()  // only called to force a to be computed and persisted
> val b = a.map(s => (s, 1)).reduceByKey(_ + _).count()
> it is very fast,
>
> but when I use
> val b = rdd.pipe("./my_cpp_program").map(s => (s, 1)).reduceByKey(_ + _).count()
> it is very slow,
> and there are many log lines like these in my executors:
> 15/10/31 19:53:58 INFO collection.ExternalSorter: Thread 78 spilling
> in-memory map of 633.1 MB to disk (8 times so far)
> 15/10/31 19:54:14 INFO collection.ExternalSorter: Thread 74 spilling
> in-memory map of 633.1 MB to disk (8 times so far)
> 15/10/31 19:54:17 INFO collection.ExternalSorter: Thread 79 spilling
> in-memory map of 633.1 MB to disk (8 times so far)
> 15/10/31 19:54:29 INFO collection.ExternalSorter: Thread 77 spilling
> in-memory map of 633.1 MB to disk (8 times so far)
> 15/10/31 19:54:50 INFO collection.ExternalSorter: Thread 76 spilling
> in-memory map of 633.1 MB to disk (9 times so far)
