Re: spark.streaming.kafka.maxRatePerPartition for direct stream

Cody Koeninger Fri, 02 Oct 2015 06:56:52 -0700

But turning backpressure on won't stop you from choking on the first batch
if you're doing e.g. some kind of in-memory aggregate that can't handle
that many records at once.


On Fri, Oct 2, 2015 at 1:10 AM, Sourabh Chandak <sourabh3...@gmail.com>
wrote:

> Thanks Cody, will try to do some estimation.
>
> Thanks Nicolae, will try out this config.
>
> Thanks,
> Sourabh
>
> On Thu, Oct 1, 2015 at 11:01 PM, Nicolae Marasoiu <
> nicolae.maras...@adswizz.com> wrote:
>
>> Hi,
>>
>>
>> Set 10ms and spark.streaming.backpressure.enabled=true
>>
>>
>> This should automatically delay the next batch until the current one is
>> processed, or at least create that balance over a few batches/periods
>> between the consume/process rate vs ingestion rate.
>>
>>
>> Nicu
>>
>> ------------------------------
>> *From:* Cody Koeninger <c...@koeninger.org>
>> *Sent:* Thursday, October 1, 2015 11:46 PM
>> *To:* Sourabh Chandak
>> *Cc:* user
>> *Subject:* Re: spark.streaming.kafka.maxRatePerPartition for direct
>> stream
>>
>> That depends on your job, your cluster resources, the number of seconds
>> per batch...
>>
>> You'll need to do some empirical work to figure out how many messages per
>> batch a given executor can handle.  Divide that by the number of seconds
>> per batch.
>>
>>
>>
>> On Thu, Oct 1, 2015 at 3:39 PM, Sourabh Chandak <sourabh3...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am writing a spark streaming job using the direct stream method for
>>> kafka and wanted to handle the case of checkpoint failure when we'll have
>>> to reprocess the entire data from starting. By default for every new
>>> checkpoint it tries to load everything from each partition and that takes a
>>> lot of time for processing. After some searching found out that there
>>> exists a config spark.streaming.kafka.maxRatePerPartition which can be used
>>> to tackle this. My question is what will be a suitable range for this
>>> config if we have ~12 million messages in kafka with maximum message size
>>> ~10 MB.
>>>
>>> Thanks,
>>> Sourabh
>>>
>>
>>
>

Re: spark.streaming.kafka.maxRatePerPartition for direct stream

Reply via email to