Thanks for the write-up. Looks good to me. Minor comments:

Under proposed changes:

"""The proposal is to calculate replica lag as the amount of time not
caught up to the leader. A replica is only in ISR if it is caught
up."""

It may be worth clarifying this on the KIP a bit more, i.e., we
currently have two forms of replica lag (time-based and message-count
based). The latter is hard to do correctly across all topics and the
former is in fact sufficient. That is, we now have only one concept: "For
how long can a replica be out-of-sync before it is explicitly removed
from the ISR? This is the new interpretation of replica.lag.time.max.ms. A
replica that is currently in ISR and happens to fall behind due to a
burst is given this grace period to catch up to the leader without
being removed from ISR."
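
A minimal sketch of that interpretation, in Python for brevity. This is
not the code in the patch: ReplicaState, on_fetch, is_out_of_sync and
max_lag_ms are hypothetical names chosen for illustration. It assumes the
leader records, per follower, the last time a fetch started at or past
the leader's log end offset (captured before the read, as Aditya
describes further down the thread).

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Leader-side view of one follower (hypothetical structure)."""
    last_caught_up_ms: int  # last time the follower was fully caught up


def on_fetch(state: ReplicaState, fetch_offset: int,
             leader_leo_before_read: int, now_ms: int) -> None:
    """Record a follower fetch.

    The follower counts as caught up only if its fetch starts at (or past)
    the leader's log end offset captured before serving the read. Merely
    sending fetch requests often does not refresh the timestamp.
    """
    if fetch_offset >= leader_leo_before_read:
        state.last_caught_up_ms = now_ms


def is_out_of_sync(state: ReplicaState, now_ms: int, max_lag_ms: int) -> bool:
    """True once the follower has gone longer than max_lag_ms (the grace
    period, assumed to be the time-based lag setting) without catching up."""
    return now_ms - state.last_caught_up_ms > max_lag_ms
```

With a 10 s grace period, a follower that last caught up at t=1 s and
then falls behind during a burst stays in the ISR at t=5 s and becomes
eligible for removal only after t=11 s.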

Also, may want to warn if replica.lag.max.messages is provided (doesn't seem
to be there in the current patch).

Joel

On Thu, Mar 12, 2015 at 09:35:24PM +0000, Aditya Auradkar wrote:
> I will change the wording to reflect this. But yes, a broker follower should 
> only enter the ISR once it is fully caught up. 
> 
> Caught up means that the follower has fetched from the leader's log end
> offset. I'm using the log end offset from before the actual read operation to
> avoid these off-by-one errors. In any case, I plan to run this locally with a
> small cluster and see how it performs.
> 
> Aditya
> 
> ________________________________________
> From: Joe Stein [joe.st...@stealth.ly]
> Sent: Thursday, March 12, 2015 1:54 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP 16 - Replica lag tuning
> 
> Hi Aditya, thanks for the writeup.
> 
> Let's say a broker follower goes down. And it is down for an hour or two....
>
> When the broker follower comes back up it will start sending fetch requests
> (let's say every 2ms, which would be under a configured, let's say, 100ms
> (whatever)). Then right away the broker gets added back to the ISR?
> 
> Maybe it is just the wording or how I am reading it... I think/thought that
> once the replica is caught up THEN the setting goes into action and as long
> as (every 100ms ... whatever) the broker leader is seeing the broker
> follower as "caught up" then it is in the ISR.
> 
> Also, what is the definition of "caught up" now without the number of
> messages? If it is ===, I worry about that not happening in some networks
> where it is always off by one or something.
> 
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
> 
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
> 
> On Thu, Mar 12, 2015 at 4:36 PM, Aditya Auradkar <
> aaurad...@linkedin.com.invalid> wrote:
> 
> > I wrote a KIP for this after some discussion on KAFKA-1546.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP+16+:+Automated+Replica+Lag+Tuning
> >
> > The RB is here: https://reviews.apache.org/r/31967/
> >
> > Thanks,
> > Aditya
> >
