Try setting spark.streaming.kafka.maxRatePerPartition; it caps the number of 
messages read from Kafka per second, per partition, by the Spark Streaming 
consumer.
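
A minimal sketch of where that setting goes, assuming you're on the direct 
stream (KafkaUtils.createDirectStream); the rate value, broker address, and 
topic name below are placeholders, so substitute your own:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("RateLimitedKafkaStream")
      // Cap ingestion at N messages per second, per Kafka partition.
      // Each batch then contains at most
      //   maxRatePerPartition * batchIntervalSeconds * numPartitions
      // records, instead of the entire backlog at once.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")

    val ssc = new StreamingContext(conf, Seconds(5))

    // "broker1:9092" and "mytopic" are stand-ins for your cluster.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

With a 5-second batch interval and, say, 10 partitions, that caps each batch 
at 10000 * 5 * 10 = 500,000 records, so a backlog like yours drains over 
several batches instead of landing in one.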

-S


> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari <vinti.u...@gmail.com> wrote:
> 
> Hello,
> 
> I am trying to figure out why my Kafka + Spark job is running slowly. I found 
> that Spark is consuming all the messages out of Kafka into a single batch 
> and leaving the other batches empty.
> 
> 2016/03/05 21:57:05        0 events     -     -  queued
> 2016/03/05 21:57:00        0 events     -     -  queued
> 2016/03/05 21:56:55        0 events     -     -  queued
> 2016/03/05 21:56:50        0 events     -     -  queued
> 2016/03/05 21:56:45        0 events     -     -  queued
> 2016/03/05 21:56:40  4039573 events  6 ms     -  processing
> 
> Does anyone know how this behavior can be changed so that the messages are 
> load balanced across all the batches?
> 
> Thanks,
> Vinti
