AFAIK, there are two levels of parallelism related to the Spark Kafka
consumer:
- At JVM level: for each receiver, one can specify the number of threads for
a given topic, provided as a map [topic -> nthreads]. This will
effectively start n JVM threads consuming partitions of that Kafka topic
(see the sketch just below).
- At cluster level: one can create several input DStreams, each backed by its
own receiver, and union them, so that consumption is spread across executors.
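As an illustration of the first level, here is a minimal sketch against the
Spark 1.2 Java API; the topic name, ZooKeeper address and group id are made
up, and jssc is assumed to be an existing JavaStreamingContext:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// [topic -> nthreads]: ask this one receiver to run 10 consumer threads
// for "mytopic"
Map<String, Integer> topicThreads = new HashMap<String, Integer>();
topicThreads.put("mytopic", 10);
JavaPairReceiverInputDStream<String, String> stream =
    KafkaUtils.createStream(jssc, "zkhost:2181", "consumer-group", topicThreads);

Note that these are threads inside a single receiver's JVM; they do not add
receivers or cores.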
- You are launching up to 10 threads/topic per Receiver. Are you sure your
receivers can support 10 threads each? (i.e., in the default configuration, do
they have 10 cores?) If they have 2 cores, that would explain why this works
with 20 partitions or fewer.
- If you have 90 partitions, why
I understand that I have to create 10 parallel streams. My code runs
fine when the number of partitions is ~20, but when I increase the number of
partitions I keep running into this issue.
Below is my code to create the Kafka streams, along with the configs used.
Map<String, String> kafkaConf = new HashMap<String, String>();
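The snippet above is cut off, so here is a minimal sketch of how such code
typically continues with the Spark 1.2 Java API (imports as in the sketch
further up, plus kafka.serializer.DefaultDecoder and
org.apache.spark.storage.StorageLevel); the config values, topic name and
byte[]/DefaultDecoder choices are placeholders, not the actual settings from
this thread:

kafkaConf.put("zookeeper.connect", "zkhost:2181");        // placeholder
kafkaConf.put("group.id", "spark-streaming-group");       // placeholder
kafkaConf.put("zookeeper.connection.timeout.ms", "1000"); // placeholder

// one consumer thread per receiver for this topic
Map<String, Integer> topicMap = new HashMap<String, Integer>();
topicMap.put("mytopic", 1);

// kafka.serializer.DefaultDecoder hands back the raw byte[] payloads
JavaPairReceiverInputDStream<byte[], byte[]> stream =
    KafkaUtils.createStream(jssc, byte[].class, byte[].class,
        DefaultDecoder.class, DefaultDecoder.class,
        kafkaConf, topicMap, StorageLevel.MEMORY_AND_DISK_SER());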
Hi,
Could you add the code where you create the Kafka consumer?
-kr, Gerard.
Hi Mukesh,
If my understanding is correct, each Stream only has a single Receiver. So, if
you have each receiver consuming 9 partitions, you need 10 input DStreams to
create 10 concurrent receivers:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism
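For reference, the pattern from that section of the guide looks roughly like
this in Java; the stream count mirrors this thread, and zkQuorum, group and
topicMap stand in for the usual connection settings:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.streaming.api.java.JavaPairDStream;

int numStreams = 10; // one receiver per input DStream
List<JavaPairDStream<String, String>> kafkaStreams =
    new ArrayList<JavaPairDStream<String, String>>(numStreams);
for (int i = 0; i < numStreams; i++) {
  kafkaStreams.add(KafkaUtils.createStream(jssc, zkQuorum, group, topicMap));
}
// Union the streams so downstream transformations see a single DStream.
JavaPairDStream<String, String> unified =
    jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));

Keep in mind that each receiver occupies one core on an executor, so the
application needs more cores than receivers to leave room for processing.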
Hi Guys,
I have a Kafka topic with 90 partitions and I am running
Spark Streaming (1.2.0) to read from Kafka via KafkaUtils, creating 10
kafka-receivers.
My streaming is running fine and there is no delay in processing, just that
data from some partitions is never getting picked up. From the Kafka console I
can see that those partitions still have unconsumed data.