Right. And you can specify the partitioner via "StateSpec.partitioner(partitioner: Partitioner)".
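For completeness, a minimal sketch of wiring that up (Spark 1.6, Scala); the socket source, the partition count of 8, and the checkpoint path are illustrative assumptions, not anything prescribed by the API:

import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StatefulWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/state-checkpoint") // mapWithState requires a checkpoint dir

    // Illustrative mapping function: running count per word.
    def trackCount(word: String, one: Option[Int], state: State[Int]): (String, Int) = {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    val partitioner = new HashPartitioner(8) // assumed partition count

    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split(" ")).map((_, 1))

    // Pin the state partitioner explicitly, instead of letting
    // InternalMapWithStateDStream fall back to
    // HashPartitioner(defaultParallelism).
    val spec = StateSpec.function(trackCount _).partitioner(partitioner)
    val counts = pairs.mapWithState(spec)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Note that pinning the partitioner on the StateSpec only fixes the state side; the parent DStream's RDDs must already carry the same partitioner (e.g., via transform(_.partitionBy(partitioner))) for the partitionBy call in compute to be a no-op rather than create a ShuffledRDD.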
On Tue, Nov 29, 2016 at 1:16 PM, Amit Sela <amitsel...@gmail.com> wrote:
> Hi all,
>
> I've been digging into the MapWithState code (branch 1.6), and I came across
> the compute
> <https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159>
> implementation in *InternalMapWithStateDStream*.
>
> Looking at the defined partitioner
> <https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L112>
> it looks like it could be different from the parent RDD's partitioner (if
> defaultParallelism() changed, for instance, or the input partitioning was
> smaller to begin with), which will eventually create
> <https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L537>
> a ShuffledRDD.
>
> Am I reading this right?
>
> Thanks,
> Amit