Do you have a minimal example that reproduces the problem and doesn't depend on Cassandra?
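For instance, something along the lines of the sketch below might be a starting point. It is only a sketch, not your job: the object name, the local[2] master, and the (String, Int) key type are placeholders I picked, and it swaps the Kafka direct stream for a queueStream that yields empty RDDs and drops Cassandra entirely, so it exercises nothing but the chained reduceByKey calls on empty 1s batches.

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative names (EmptyBatchRepro, local[2], the (String, Int) key type)
// are placeholders, not taken from the reported job.
object EmptyBatchRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("empty-batch-repro")
    val ssc = new StreamingContext(conf, Seconds(1))

    // A queue that is never fed, plus an empty default RDD, so every 1s
    // batch is empty -- standing in for an idle Kafka topic, no Cassandra.
    val queue = mutable.Queue.empty[RDD[(String, Int)]]
    val empty = ssc.sparkContext.emptyRDD[(String, Int)]
    val stream = ssc.queueStream(queue, oneAtATime = true, defaultRDD = empty)

    // Chain a few shuffles, as in the reported job, then watch the
    // processing time of the empty batches in the streaming UI / logs.
    stream
      .reduceByKey(_ + _)
      .reduceByKey(_ + _)
      .reduceByKey(_ + _)
      .foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}

If that alone shows the 3-4s processing time on empty batches, the cost is in the shuffle scheduling itself rather than in Kafka or the Cassandra connector.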
On Mon, Sep 26, 2016 at 4:10 PM, Erwan ALLAIN <eallain.po...@gmail.com> wrote:
> Hi,
>
> I'm working with:
> - Kafka 0.8.2
> - Spark Streaming (2.0) direct input stream
> - Cassandra 3.0
>
> My batch interval is 1s.
>
> When I use some map, filter, or even saveToCassandra functions, the
> processing time is around 50ms on empty batches.
> => This is fine.
>
> As soon as I use some reduceByKey, the processing time increases rapidly:
> between 3 and 4s for 3 calls of reduceByKey on empty batches.
> => Not good.
>
> I've found a workaround: using a foreachRDD on the DStream and checking
> whether the RDD is empty before executing the reduceByKey, but I find this
> quite ugly.
>
> Do I need to check whether the RDD is empty before every shuffle operation?
>
> Thanks for your insights.
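For reference, the foreachRDD-plus-emptiness-check workaround described above could look roughly like this. It is only a sketch: stream and the (String, Int) type stand in for your keyed DStream (e.g. the Kafka direct stream after the map/filter steps), and the output step is a placeholder for the real saveToCassandra call.

import org.apache.spark.streaming.dstream.DStream

// Sketch of the guard: only run the shuffle when the micro-batch has data.
// stream is a stand-in for the real keyed DStream.
def reduceNonEmpty(stream: DStream[(String, Int)]): Unit = {
  stream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {               // skip the reduceByKey on empty batches
      val counts = rdd.reduceByKey(_ + _)
      counts.foreach(println)           // placeholder; saveToCassandra(...) in the real job
    }
  }
}

If the check does turn out to be needed before each shuffle, wrapping it in a small helper like this at least keeps it in one place instead of repeating it throughout the job.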