Both mapWithState and updateStateByKey by default uses the HashPartitioner,
and hashes the key in the key-value DStream on which the state operation is
applied. The new data and state is partition in the exact same partitioner,
so that same keys from the new data (from the input DStream) get shuffl
Hi,
I am relatively new to Spark and am using updateStateByKey() operation to
maintain state in my Spark Streaming application. The input data is coming
through a Kafka topic.
1. I want to understand how are DStreams partitioned?
2. How does the partitioning work with mapWithState() or
u