subject:"UpdateStateByKey \: Partitioning and Shuffle"

Re: UpdateStateByKey : Partitioning and Shuffle

2016-01-05 Thread Tathagata Das

Both mapWithState and updateStateByKey by default uses the HashPartitioner, and hashes the key in the key-value DStream on which the state operation is applied. The new data and state is partition in the exact same partitioner, so that same keys from the new data (from the input DStream) get shuffl

UpdateStateByKey : Partitioning and Shuffle

2016-01-05 Thread Soumitra Johri

Hi, I am relatively new to Spark and am using updateStateByKey() operation to maintain state in my Spark Streaming application. The input data is coming through a Kafka topic. 1. I want to understand how are DStreams partitioned? 2. How does the partitioning work with mapWithState() or u