Spark offers a receiver-based approach and a direct approach with Kafka (https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html), and a note on the receiver-based approach says "topic partitions in Kafka does not correlate to partitions of RDDs generated in Spark Streaming."
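For comparison, here is a minimal sketch of what stream creation looks like with the direct approach in the 0-8 integration (broker addresses, topic name, and batch interval below are placeholders):

import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("DirectKafkaExample")
val ssc = new StreamingContext(conf, Seconds(10))

// No receiver: each Kafka partition maps 1:1 to a partition of the batch RDDs,
// so an 8-partition topic yields 8-partition RDDs.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("my-topic")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)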
A fix might be as simple as switching to the direct approach <https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers>, as sketched above?

Jon Gregg

On Wed, Feb 22, 2017 at 12:37 AM, satishl <satish.la...@gmail.com> wrote:

> I am reading from a Kafka topic which has 8 partitions. My Spark app is
> given 40 executors (1 core per executor). After reading the data, I
> repartition the DStream by 500, map it, and save it to Cassandra.
> However, I see that only 2 executors are being used per batch. Even
> though I see 500 tasks for the stage, all of them are sequentially
> scheduled on the 2 executors picked. My Spark concepts are still forming
> and I am missing something obvious.
> I expected that 8 executors would be picked for reading data from the 8
> partitions in Kafka, and that with the repartition this data would be
> distributed between the 40 executors and then saved to Cassandra.
> How should I think about this?
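For context, the pipeline described above corresponds roughly to the sketch below. The repartition count, transform, and Cassandra keyspace/table/column names are hypothetical, and the save step assumes the DataStax spark-cassandra-connector; with the direct approach the stream starts with one partition per Kafka partition (8 here), and repartition() then shuffles each batch across the remaining executors.

import kafka.serializer.StringDecoder

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToCassandra"), Seconds(10))

// Direct stream: the initial RDDs have one partition per Kafka partition (8 here).
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  Map("metadata.broker.list" -> "broker1:9092"),
  Set("my-topic"))

stream
  .repartition(40)                        // spread each batch over the 40 single-core executors
  .map { case (_, value) => (value, 1) }  // hypothetical transform shaped to the target table
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("value", "count"))  // spark-cassandra-connector

ssc.start()
ssc.awaitTermination()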