Hi Biplob,

How many partitions are on the topic you are reading from, and have you set
maxRatePerPartition? IIRC, Spark back pressure is calculated as follows:

*Spark back pressure:*

Back pressure is calculated from the following settings (example values):


• maxRatePerPartition = 200 (records/sec, per partition)

• batchInterval = 30s

• 3 partitions on the ingest topic


This results in a maximum ingest rate of 18K records per batch:


• 3 partitions * 30s * 200 records/sec per partition = 18,000 max
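
A minimal sketch of how that might look in code, assuming a direct Kafka
stream and the example values above (the app name and numbers are
placeholders to adapt to your job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Example values only -- match them to your own topic and batch interval.
val conf = new SparkConf()
  .setAppName("ingest-job")  // hypothetical app name
  .set("spark.streaming.backpressure.enabled", "true")
  // 3 partitions * 30s batch * 200 records/sec = 18,000 records per batch, max
  .set("spark.streaming.kafka.maxRatePerPartition", "200")

// 30-second batch interval
val ssc = new StreamingContext(conf, Seconds(30))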

The spark.streaming.backpressure.initialRate setting only applies to the first
batch, per the docs:


> This is the initial maximum receiving rate at which each receiver will
> receive data for the *first batch* when the backpressure mechanism is
> enabled.


If you set maxRatePerPartition and apply the above formula, I believe
you will be able to achieve the results you are looking for.
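
For example, if you launch with spark-submit, the flags would look something
like this (15000 is the initialRate you already set; 200 is just the example
value from above):

spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.backpressure.initialRate=15000 \
  --conf spark.streaming.kafka.maxRatePerPartition=200 \
  ...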

HTH.

-Todd


On Thu, Jul 26, 2018 at 7:21 AM Biplob Biswas <revolutioni...@gmail.com>
wrote:

> Did anyone face a similar issue? And is there any viable way to solve this?
> Thanks & Regards
> Biplob Biswas
>
>
> On Wed, Jul 25, 2018 at 4:23 PM Biplob Biswas <revolutioni...@gmail.com>
> wrote:
>
>> I have enabled the spark.streaming.backpressure.enabled setting and also
>> set spark.streaming.backpressure.initialRate to 15000, but my Spark job
>> is not respecting these settings when reading from Kafka after a failure.
>>
>> In my Kafka topic, around 500k records are waiting to be processed, and
>> they are all taken in one huge batch, which ultimately takes a long time and
>> fails with an executor failure exception. We don't have more resources to
>> give in our test cluster, and we expect backpressure to kick in and take
>> smaller batches.
>>
>> What can I be doing wrong?
>>
>>
>> Thanks & Regards
>> Biplob Biswas
>>
>
