Re: [GitHub] kafka pull request: Parallel log-recovery of un-flushed segments o...

Achanta Vamsi Subhash Sat, 19 Mar 2016 04:49:35 -0700

Can someone please review this?

On Fri, Mar 11, 2016 at 12:09 AM, Achanta Vamsi Subhash <
[email protected]> wrote:


> Hi,
> I would like to get this into 0.10.0.0, so could someone look into this and
> review it?
>
> On Wed, Mar 9, 2016 at 10:29 PM, Achanta Vamsi Subhash <
> [email protected]> wrote:
>
>> Hi all,
>>
>> https://github.com/apache/kafka/pull/1035
>> This pull request makes log-segment loading parallel and adds two
>> configurable properties, "log.recovery.threads" and
>> "log.recovery.max.interval.ms".
>>
>> Currently, after an unclean shutdown, the log segments within a logDir are
>> loaded sequentially on startup. Because logSegment.recover(..) is called
>> for every segment, this takes a long time on brokers with many
>> partitions (we have seen ~40 minutes for 2k partitions).
>>
>> Logic:
>> 1. Define a fixed-size thread pool (log.recovery.threads).
>> 2. Submit each logSegment recovery as a job to the thread pool and add the
>> returned future to a job list.
>> 3. Wait for all jobs to finish within the required time (
>> log.recovery.max.interval.ms, default set to Long.Max).
>> 4. If they finish and every future returns null (meaning the jobs
>> completed successfully), recovery is considered done.
>> 5. If any recovery job fails, the failure is logged and
>> LogRecoveryFailedException is thrown.
>> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>>
>> The logic is backward compatible with the current sequential
>> implementation because the default thread count is 1.
>>
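
The six steps above can be sketched in Java roughly as follows. This is an illustrative sketch only, not the code from pull request 1035: the class ParallelRecoverySketch, the method recoverAll and its parameters are hypothetical names, each Runnable stands in for a call to logSegment.recover(..), and LogRecoveryFailedException is redefined locally as a stand-in for the exception named in the email.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the described logic; not the code from the pull request.
class ParallelRecoverySketch {

    // Local stand-in for the LogRecoveryFailedException named in the email.
    static class LogRecoveryFailedException extends RuntimeException {
        LogRecoveryFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // threads       ~ log.recovery.threads
    // maxIntervalMs ~ log.recovery.max.interval.ms
    static void recoverAll(List<Runnable> segmentRecoveries, int threads, long maxIntervalMs) {
        // Step 1: fixed-size thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 2: submit every recovery job and remember its future.
            List<Future<?>> jobs = new ArrayList<>();
            for (Runnable recovery : segmentRecoveries) {
                jobs.add(pool.submit(recovery));
            }

            // Step 3: wait for all jobs within one shared time budget.
            // Elapsed time is subtracted from the budget, so a budget of
            // Long.MAX_VALUE does not overflow.
            long budgetNanos = TimeUnit.MILLISECONDS.toNanos(maxIntervalMs);
            long start = System.nanoTime();
            for (Future<?> job : jobs) {
                long remaining = budgetNanos - (System.nanoTime() - start);
                try {
                    // Step 4: a Runnable's future yields null on success.
                    job.get(Math.max(remaining, 0L), TimeUnit.NANOSECONDS);
                } catch (ExecutionException e) {
                    // Step 5: a recovery job failed.
                    throw new LogRecoveryFailedException("segment recovery failed", e.getCause());
                } catch (TimeoutException e) {
                    // Step 6: the time budget ran out.
                    throw new LogRecoveryFailedException("segment recovery timed out", e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new LogRecoveryFailedException("interrupted while waiting", e);
                }
            }
        } finally {
            // Stop the workers whether recovery succeeded or not.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int segment = i;
            jobs.add(() -> System.out.println("recovering segment " + segment));
        }
        recoverAll(jobs, 2, Long.MAX_VALUE);
        System.out.println("all segments recovered");
    }
}
```

With threads set to 1 the jobs run one after another, which corresponds to the backward-compatible default described above.
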
>> JIRA link is here:
>> https://issues.apache.org/jira/browse/KAFKA-3359
>>
>> Please review and send suggestions; I will incorporate them.
>> Thanks.
>>
>>
>> On Wed, Mar 9, 2016 at 7:57 PM, vamsi-subhash <[email protected]> wrote:
>>
>>> GitHub user vamsi-subhash opened a pull request:
>>>
>>>     https://github.com/apache/kafka/pull/1035
>>>
>>>     Parallel log-recovery of un-flushed segments on startup
>>>
>>>     I did not find any tests for this method; I will add them.
>>>
>>> You can merge this pull request into a Git repository by running:
>>>
>>>     $ git pull https://github.com/vamsi-subhash/kafka trunk
>>>
>>> Alternatively you can review and apply these changes as the patch at:
>>>
>>>     https://github.com/apache/kafka/pull/1035.patch
>>>
>>> To close this pull request, make a commit to your master/trunk branch
>>> with (at least) the following in the commit message:
>>>
>>>     This closes #1035
>>>
>>> ----
>>> commit ecab815203a2b6396703660d5a2f9d9bb00efcf3
>>> Author: Vamsi Subhash Achanta <[email protected]>
>>> Date:   2016-03-09T14:24:37Z
>>>
>>>     Made log-recovery parallel
>>>
>>> ----
>>>
>>>
>>> ---
>>> If your project is set up for it, you can reply to this email and have your
>>> reply appear on GitHub as well. If your project does not have this feature
>>> enabled and wishes so, or if the feature is enabled but not working, please
>>> contact infrastructure at [email protected] or file a JIRA ticket
>>> with INFRA.
>>> ---
>>>
>>
>>
>>
>> --
>> Regards
>> Vamsi Subhash
>>
>
>
>
> --
> Regards
> Vamsi Subhash
>



-- 
Regards
Vamsi Subhash
