Hi, I currently have a job that spills to disk and memory because reduceByKey produces a lot of intermediate data that gets shuffled.
How can I use a custom partitioner in Spark Streaming for an intermediate stage so that the next stage's reduceByKey does not have to shuffle again?

Thanks,
Swetha
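For context, here is a minimal sketch of what I have in mind (the partitioner class, partition count, and socket source below are placeholders, not my actual job). My understanding is that PairDStreamFunctions.reduceByKey accepts a Partitioner, and that later stages reusing the same partitioner should be able to skip a re-shuffle:

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder custom partitioner; real logic would key on my domain's keys.
class UserIdPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int =
    math.abs(key.hashCode % partitions)
}

object PartitionerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PartitionerSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source; my real job reads from a different input.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(",")(0), 1L))

    val partitioner = new UserIdPartitioner(64)

    // Pass the partitioner directly to reduceByKey so the resulting DStream's
    // RDDs are partitioned by it.
    val counts = events.reduceByKey(_ + _, partitioner)

    // A later reduceByKey or join that uses the same partitioner should not
    // need to shuffle `counts` again -- is this the right approach?
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}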