Hello,

I have encountered some situation just like what is described above. I am
running a Spark Streaming Application with 2 executors, 16 cores and 10G
memory for each executor and the input topic Kafka has 64 partitions.

My code are like this:
--------------------------------------------
KafkaUtils.createDirectStream(...) 
...
.map(s => (k, v))
.mapWithState(...numPartitions(32))
...
.foreachRdd(_.foreachPartition(output))
--------------------------------------------

I was also expecting the 32 partitions of the MapWithStateRDD would be
distributed evenly between the two executors, but it turned out that all the
32 were on one of them.

I noticed that you replyed 'Are you using KafkaUtils.createDirectStream? '
and I was wondering whether this Kafka direct stream lead to this situation.
Or is there something else?

Thanks a lot!




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/How-does-MapWithStateRDD-distribute-the-data-tp18544p21770.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to