Hello, I have run into a situation very similar to the one described above. I am running a Spark Streaming application with two executors, each with 16 cores and 10 GB of memory, and the input Kafka topic has 64 partitions.
My code looks like this:

--------------------------------------------
KafkaUtils.createDirectStream(...)
  ...
  .map(s => (k, v))
  .mapWithState(...numPartitions(32))
  ...
  .foreachRDD(_.foreachPartition(output))
--------------------------------------------

I also expected the 32 partitions of the MapWithStateRDD to be distributed evenly between the two executors, but it turned out that all 32 were on one of them. I noticed that you replied 'Are you using KafkaUtils.createDirectStream?', and I was wondering whether the Kafka direct stream leads to this situation. Or is there something else?

Thanks a lot!

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-does-MapWithStateRDD-distribute-the-data-tp18544p21770.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
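P.S. In case it helps with reproducing this, here is a minimal sketch of how the placement of partitions on executors could be checked. It is not part of my actual job: the object name `PartitionPlacement` and the method `describe` are made up for illustration, and the sketch assumes Spark 2.x or later, where `TaskContext.getPartitionId()` and `SparkEnv.get.executorId` are available.

```scala
import org.apache.spark.{SparkEnv, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical diagnostic helper (not from the original job).
object PartitionPlacement {
  // For every partition of the RDD, returns
  // (partition id, id of the executor that ran the task, number of records).
  def describe[T](rdd: RDD[T]): Array[(Int, String, Long)] =
    rdd.mapPartitions { iter =>
      val partitionId = TaskContext.getPartitionId() // id of the partition this task processes
      val executorId  = SparkEnv.get.executorId      // "driver" when running in local mode
      Iterator((partitionId, executorId, iter.size.toLong))
    }.collect()
}
```

It could be called from inside `foreachRDD`, for example `stateStream.foreachRDD(rdd => PartitionPlacement.describe(rdd).foreach(println))`, where `stateStream` stands for the stream returned by `mapWithState`. Note that this runs one extra job per batch, so it is only meant for debugging.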