Hi all, I'm using Spark Streaming mapWithState operation to do a stateful operation on my Kafka stream (though I think similar arguments would apply for any source).
Trying to understand a way to control mapWithState's partitioning schema. My transformations are simple: 1) create KafkaDStream 2) mapPartitions to get a key-value stream where `key` corresponds to Kafka message key 3) apply mapWithState operation on key-value stream, the state stream shares keys with the original stream, the resulting streams doesn't change keys either The problem is that, as I understand, mapWithState stream has a different partitioning schema and thus I see shuffles in Spark Web UI. >From the mapWithState implementation I see that: mapwithState uses Partitioner if specified, otherwise partitions data with HashPartitioner(<default-parallelism-conf>). The thing is that original KafkaDStream has a specific partitioning schema: Kafka partitions correspond Spark RDD partitions. Question: is there a way for mapWithState stream to inherit partitioning schema from the original stream (i.e. correspond to Kafka partitions). Thanks, Andrii