Re: Job spilling to disk and memory in Spark Streaming

2015-10-21 Thread Adrian Tanase
, etc) to execute in a single stage.

Hope this helps,
-adrian

From: Tathagata Das
Date: Wednesday, October 21, 2015 at 10:36 AM
To: swetha
Cc: user
Subject: Re: Job spilling to disk and memory in Spark Streaming

Well, reduceByKey needs to shuffle if your intermediate data is not already partitioned

Re: Job spilling to disk and memory in Spark Streaming

2015-10-21 Thread Tathagata Das
Well, reduceByKey needs to shuffle if your intermediate data is not already partitioned in the same way as reduceByKey's partitioning. reduceByKey() has other signatures that take a partitioner, or simply a number of partitions. So you can set the same partitioner as your previous stage. Without
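
To make the point about matching partitioners concrete, here is a rough plain-Python model of it. This is not Spark code and does not use any Spark API: partition_of, partition_by and reduce_by_key are hypothetical helpers written only for this illustration, and the sample records are made up. It counts how many records would have to move to another partition (the "shuffle") when the input is, or is not, already partitioned the way the reduce step expects.

```python
# Plain-Python model of hash partitioning. Not Spark code; all names are
# hypothetical helpers for illustration only.

def partition_of(key, num_partitions):
    """Hash partitioner: a given key always maps to the same partition index."""
    return hash(key) % num_partitions


def partition_by(records, num_partitions):
    """Distribute (key, value) records into partitions by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def reduce_by_key(partitions, func, num_partitions):
    """Reduce values per key.

    Returns (output partitions as dicts, number of records that had to
    move to a different partition than the one they started in).
    """
    moved = 0
    out = [dict() for _ in range(num_partitions)]
    for index, part in enumerate(partitions):
        for key, value in part:
            target = partition_of(key, num_partitions)
            if target != index:
                moved += 1  # this record would cross the network in a shuffle
            bucket = out[target]
            bucket[key] = func(bucket[key], value) if key in bucket else value
    return out, moved


# Made-up sample data with integer keys (so hashing is deterministic).
records = [(1, 10), (2, 20), (1, 5), (3, 7), (2, 1), (4, 2)]
add = lambda a, b: a + b

# Case 1: input split by arrival order, unrelated to the key hash.
unaligned = [records[:3], records[3:]]
result_unaligned, moved_unaligned = reduce_by_key(unaligned, add, 2)

# Case 2: the previous stage already used the same partitioner.
aligned = partition_by(records, 2)
result_aligned, moved_aligned = reduce_by_key(aligned, add, 2)

print(moved_unaligned)  # 4 records change partition
print(moved_aligned)    # 0 records change partition
```

Both cases produce the same per-key sums; only the amount of data movement differs. In Spark itself the analogous step is what the message above describes: passing the previous stage's partitioner (or partition count) to reduceByKey.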