Do you have a minimal example that reproduces the problem and doesn't
depend on Cassandra?

On Mon, Sep 26, 2016 at 4:10 PM, Erwan ALLAIN <eallain.po...@gmail.com> wrote:
> Hi
>
> I'm working with
> - Kafka 0.8.2
> - Spark Streaming (2.0) direct input stream.
> - Cassandra 3.0
>
> My batch interval is 1s.
>
> When I use map, filter, or even saveToCassandra, the processing time is
> around 50 ms on empty batches.
>  => This is fine.
>
> As soon as I add reduceByKey, the processing time increases sharply: between
> 3 and 4 s for 3 reduceByKey calls on empty batches.
> => Not good.
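>
> A rough sketch of that setup (broker and topic names are placeholders, and
> Cassandra is left out to keep it minimal):
>
>   import kafka.serializer.StringDecoder
>   import org.apache.spark.SparkConf
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>   import org.apache.spark.streaming.kafka.KafkaUtils
>
>   val conf = new SparkConf().setAppName("empty-batch-test")
>   val ssc = new StreamingContext(conf, Seconds(1))            // 1 s batch interval
>
>   val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // placeholder broker
>   val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
>     ssc, kafkaParams, Set("events"))                           // placeholder topic
>
>   // map/filter alone: ~50 ms per empty batch
>   // with reduceByKey: 3-4 s per empty batch
>   stream.map { case (_, v) => (v, 1L) }
>     .reduceByKey(_ + _)
>     .print()
>
>   ssc.start()
>   ssc.awaitTermination()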
>
> I've found a workaround: use foreachRDD on the DStream and check whether the
> RDD is empty before executing the reduceByKey, but I find this quite ugly.
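>
> Roughly, building on the stream above (a sketch only):
>
>   stream.map { case (_, v) => (v, 1L) }
>     .foreachRDD { rdd =>
>       if (!rdd.isEmpty()) {                 // skip the shuffle on empty batches
>         rdd.reduceByKey(_ + _)
>            .collect()
>            .foreach(println)
>       }
>     }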
>
> Do I need to check whether the RDD is empty before every shuffle operation?
>
> Thanks for your insights
