Hi,

You can enable backpressure to handle this:

spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate
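For example, a minimal spark-submit sketch (the class name, jar name, and rate values are placeholders; tune the rates to your cluster). Note that since you are using the direct (receiver-less) Kafka approach, the per-partition cap spark.streaming.kafka.maxRatePerPartition is the one that actually limits your batches; spark.streaming.receiver.maxRate applies only to receiver-based streams:

```shell
# Backpressure lets Spark adapt the ingestion rate to processing speed;
# the max-rate settings bound how much the first (backlog) batch can pull,
# so startup is consumed in many small batches instead of one giant one.
# Class name, jar name, and numeric values below are placeholders.
spark-submit \
  --class com.example.MyStreamingJob \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.receiver.maxRate=10000 \
  --conf spark.streaming.kafka.maxRatePerPartition=10000 \
  my-streaming-job.jar
```

With maxRatePerPartition set, each batch reads at most (partitions x rate x batch interval) records, so the backlog drains in bounded, steady batches rather than one multi-hour batch.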
Thanks,
Edwin

On Mar 18, 2017, 12:53 AM -0400, sagarcasual . <sagarcas...@gmail.com>, wrote:
> Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct
> approach. The streaming part works fine, but when we initially start the job,
> we have to deal with a really huge Kafka message backlog, millions of messages,
> and that first batch runs for over 40 hours; after 12 hours or so it becomes
> very, very slow. It keeps crunching messages, but at a very low speed.
> Any idea how to overcome this issue? Once the job is all caught up,
> subsequent batches are quick and fast, since the load is really tiny to
> process. So any idea how to avoid this problem?