I am reading from a Kafka topic that has 8 partitions. My Spark app is given 40 executors (1 core per executor). After reading the data, I repartition the DStream by 500, map it, and save it to Cassandra. However, I see that only 2 executors are being used per batch: even though the stage shows 500 tasks, all of them are scheduled sequentially on the 2 executors that were picked.

My Spark concepts are still forming, and I'm probably missing something obvious. I expected that 8 executors would be used to read from the 8 Kafka partitions, and that the repartition would then distribute the data across all 40 executors before the save to Cassandra. How should I think about this?
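For context, here is a minimal sketch of the pipeline as I've described it. This assumes the Kafka 0.10 direct stream API and the DataStax spark-cassandra-connector; the topic, host, keyspace, table, and group names are all placeholders:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is a placeholder

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092", // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-group")    // placeholder

    // The direct stream creates one RDD partition per Kafka partition,
    // so each batch starts out with 8 partitions (= 8 read tasks) here.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my_topic"), kafkaParams))

    stream
      .map(record => Tuple1(record.value)) // wrap the payload so it maps to one column
      .repartition(500)                    // shuffle into 500 partitions before the write
      .saveToCassandra("my_keyspace", "my_table", SomeColumns("value"))

    ssc.start()
    ssc.awaitTermination()
  }
}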