Spark offers a receiver-based approach and a direct approach with Kafka (https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html), and a note on the receiver-based approach says "topic partitions in Kafka does not correlate to partitions of RDDs generated in Spark Streaming."
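For comparison, here is a minimal sketch of what stream creation looks like with the direct approach in the 0-8 integration (broker addresses, topic name, and batch interval below are placeholders):

import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("DirectKafkaExample")
val ssc = new StreamingContext(conf, Seconds(10))

// No receiver: each Kafka partition maps 1:1 to a partition of the batch RDDs,
// so an 8-partition topic yields 8-partition RDDs.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("my-topic")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)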
A fix might be as simple as switching to the direct approach <https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers>, as sketched above?

Jon Gregg

On Wed, Feb 22, 2017 at 12:37 AM, satishl <satish.la...@gmail.com> wrote:

> I am reading from a Kafka topic which has 8 partitions. My Spark app is
> given 40 executors (1 core per executor). After reading the data, I
> repartition the DStream by 500, map it, and save it to Cassandra.
> However, I see that only 2 executors are being used per batch. Even
> though I see 500 tasks for the stage, all of them are sequentially
> scheduled on the 2 executors picked. My Spark concepts are still forming
> and I am missing something obvious.
> I expected that 8 executors would be picked for reading data from the 8
> partitions in Kafka, and that with the repartition this data would be
> distributed between the 40 executors and then saved to Cassandra.
> How should I think about this?
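For context, the pipeline described above corresponds roughly to the sketch below. The repartition count, transform, and Cassandra keyspace/table/column names are hypothetical, and the save step assumes the DataStax spark-cassandra-connector; with the direct approach the stream starts with one partition per Kafka partition (8 here), and repartition() then shuffles each batch across the remaining executors.

import kafka.serializer.StringDecoder

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToCassandra"), Seconds(10))

// Direct stream: the initial RDDs have one partition per Kafka partition (8 here).
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  Map("metadata.broker.list" -> "broker1:9092"),
  Set("my-topic"))

stream
  .repartition(40)                        // spread each batch over the 40 single-core executors
  .map { case (_, value) => (value, 1) }  // hypothetical transform shaped to the target table
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("value", "count"))  // spark-cassandra-connector

ssc.start()
ssc.awaitTermination()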