Jun,

Thanks for the detailed explanation. Now I understand. Yes, we might not be able to leverage the group commit in this case.
When I was testing the patch, I also found a potential use case for the config (not sure if it is a strong use case, though). When we do a rolling upgrade of a cluster, if the controller is still running on an old version, the brokers that have been bounced will create a bunch of ZK paths that will not be picked up by the controller until the controller is upgraded to the new version. That might be fine for small clusters, but for larger clusters many ISR changes could accumulate before the controller runs on the new version. So if we have the config, when people do a rolling upgrade they can first set the propagation interval to a very large value so that ISR changes are not propagated. After all the brokers are running on the new version, we can set the propagation interval back to a short value and bounce the brokers again.

Thanks,

Jiangjie (Becket) Qin

On Sun, Aug 9, 2015 at 12:22 PM, Jun Rao <j...@confluent.io> wrote:

> Jiangjie,
>
> Related to group commit, I think what Jay says is the following. Suppose
> the propagation of ISR is slow and also creates back pressure; you can just
> propagate the ISR one batch at a time, as fast as you can. After one batch
> is sent, you just collect all the ISR changes that have accumulated since
> the last propagation and propagate them as a batch. This way, the amount of
> batching happens automatically based on the propagation rate, and the delay
> of propagation is minimized. However, this doesn't quite work if we
> propagate the ISR changes by writing a notification path in ZK. There is no
> back pressure: notifications can be written faster than the listener can
> process them. So the explicit batching that you did is essentially a way to
> get around this problem.
>
> As for the config, I can see a couple of use cases: (1) We have unit tests
> that verify the propagation of ISR. To prevent those tests from taking too
> long, we can configure a smaller propagation interval.
> (2) When there are many partitions and brokers, someone may want to
> increase the propagation interval to reduce the ZK overhead. I agree that
> most people likely don't need to change this setting, since the default
> should be good enough.
>
> Thanks,
>
> Jun
>
>
> On Fri, Aug 7, 2015 at 4:06 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Maybe Ashish can supply the use case and tuning advice then :)
> > I'm a -1 on adding new configurations that we can't quite explain.
> >
> > On Fri, Aug 7, 2015 at 3:57 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> > wrote:
> >
> > > Hi Gwen,
> > >
> > > I completely agree with you. I originally just hard-coded it to be 10
> > > seconds. Ashish raised this requirement in KAFKA-2406 because people
> > > might want ISR changes to get propagated quicker.
> > > I don't have a good use case myself. Personally I think hard-coding it
> > > is fine, although I don't object to making it configurable.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
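[Editor's note: the group-commit style batching Jun describes in the thread above can be sketched roughly as follows. This is an illustrative sketch only, not Kafka's actual implementation; the class and method names are made up. The point is that changes keep accumulating while a batch is being sent, and each send drains everything accumulated since the last propagation, so the batch size adapts automatically to the propagation rate.]

```python
# Hypothetical sketch of group-commit style ISR change propagation.
# Brokers record ISR changes cheaply; a propagation loop drains all
# accumulated changes and sends them as one batch, so batching grows
# automatically when propagation is slow and shrinks when it is fast.
import threading

class IsrChangePropagator:
    def __init__(self, send_batch):
        # send_batch is the actual propagation step, e.g. writing one
        # notification path in ZK for the whole batch.
        self._send_batch = send_batch
        self._pending = []
        self._lock = threading.Lock()

    def record_isr_change(self, partition):
        # Called whenever an ISR changes; never blocks on I/O, so new
        # changes can accumulate while a batch is in flight.
        with self._lock:
            self._pending.append(partition)

    def propagate_once(self):
        # Atomically drain everything accumulated since the last
        # propagation and send it as a single batch.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._send_batch(batch)
        return batch
```

As the quoted discussion notes, this self-regulating behavior relies on back pressure from the send step; when notifications can be written faster than the listener processes them, an explicit minimum interval between `propagate_once` calls plays the role that back pressure would otherwise play.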