An easier (although less clean) solution can be a "hidden" configuration:
avoid adding the new configuration to KafkaConfig's ConfigDef and the docs,
but grab a pre-defined parameter anyway (use it if it exists, fall back to a
reasonable default if it doesn't). This will allow us to override the value
in tests (and emergencies) but avoid confusing users with new user-exposed
configurations that they most likely won't need. (The concept is inspired by
Oracle's hidden parameters, which serve the same goal but also have support
implications.)

Gwen

On Mon, Aug 10, 2015 at 9:39 AM, Jay Kreps <j...@confluent.io> wrote:

> I guess the question, which I think is what Gwen was getting at, is if,
> rather than making this configurable, it might be possible to make this
> just work reliably and with the lowest possible latency in some automatic
> fashion? I raised group commit because that is a way to automatically
> batch under load. If that doesn't work, perhaps there is another way? The
> challenge, as we've seen with obscure configs, is that 99% of people can't
> figure out how to set them, so for 99% of people this won't really help
> them.
>
> -Jay
>
> On Mon, Aug 10, 2015 at 9:12 AM, Ashish Singh <asi...@cloudera.com> wrote:
>
> > Hey Guys,
> >
> > Looks like Jun and Jiangjie have covered the motivation behind the
> > config. To me, one would want to set the ISR propagation delay to a
> > high number primarily during a rolling upgrade. I think the delay one
> > would want in ISR propagation is proportional to the cluster size.
> > However, during normal operations faster ISR propagation is desired.
> > Having a config to expose the delay time gives admins a way to control
> > it, and it will come with a default value, so anyone who does not want
> > to play with it can choose not to.
> >
> > @Gwen, does that answer your question?
> >
> > On Sun, Aug 9, 2015 at 3:26 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> > wrote:
> >
> > > Jun,
> > >
> > > Thanks for the detailed explanation. Now I understand.
> > > Yes, we might not be able to leverage the group commit in this case.
> > >
> > > When I was testing the patch, I also found a potential use case for
> > > the config (not sure if that is a strong use case though). When we
> > > rolling upgrade a cluster, if the controller is still running on an
> > > old version, the brokers that got bounced will create a bunch of ZK
> > > paths that will not be picked up by the controller until the
> > > controller is upgraded to the new version. That might be fine for
> > > small clusters, but for larger clusters there could be many ISR
> > > changes accumulated before the controller runs on the new version.
> > > So if we have the config, when people do a rolling upgrade, they can
> > > first set the propagation interval very large so ISR changes will
> > > not be propagated. After all the brokers are running on the new
> > > version, we can set the propagation interval back to a short value
> > > and bounce the brokers again.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Sun, Aug 9, 2015 at 12:22 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Jiangjie,
> > > >
> > > > Related to group commit, I think what Jay says is the following.
> > > > Suppose the propagation of ISR is slow and also creates back
> > > > pressure; you can just propagate the ISR one batch at a time as
> > > > fast as you can. After one batch is sent, you just collect all ISR
> > > > changes that have accumulated since the last propagation and
> > > > propagate them as a batch. This way, the amount of batching happens
> > > > automatically based on the propagation rate, and the delay of
> > > > propagation is minimized. However, this doesn't quite work if we
> > > > propagate the ISR changes by writing a notification path in ZK.
> > > > There is no back pressure: the notifications can be written faster
> > > > than the listener can process them.
> > > > So the explicit batching that you did is essentially a way to get
> > > > around this problem.
> > > >
> > > > As for the config, I can see a couple of use cases: (1) We have
> > > > unit tests that verify the propagation of ISR. To prevent those
> > > > tests from taking too long, we can configure a smaller propagation
> > > > interval. (2) When there are many partitions and brokers, someone
> > > > may want to increase the propagation interval to reduce the ZK
> > > > overhead. I agree that most people likely don't need to change this
> > > > setting, since the default should be good enough.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Aug 7, 2015 at 4:06 PM, Gwen Shapira <g...@confluent.io>
> > > > wrote:
> > > >
> > > > > Maybe Ashish can supply the use case and tuning advice then :)
> > > > > I'm a -1 on adding new configurations that we can't quite
> > > > > explain.
> > > > >
> > > > > On Fri, Aug 7, 2015 at 3:57 PM, Jiangjie Qin
> > > > > <j...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > Hi Gwen,
> > > > > >
> > > > > > Completely agree with you. I originally just hard-coded it to
> > > > > > be 10 seconds. Ashish raised this requirement in KAFKA-2406
> > > > > > because people might want ISR changes to get propagated
> > > > > > quicker.
> > > > > > I don't have a good use case myself. Personally I think
> > > > > > hard-coding it is fine, although I don't object to making it
> > > > > > configurable.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> >
> > --
> > Regards,
> > Ashish
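
[Editor's note] The "hidden configuration" idea Gwen proposes at the top of the thread could be sketched roughly as below. This is a minimal illustration, not actual Kafka code: the key name and the 10-second default are assumptions taken from the discussion (Jiangjie mentions originally hard-coding 10 seconds).

```java
import java.util.Properties;

// Sketch of a "hidden" config: the key is NOT declared in ConfigDef or the
// docs, but if an operator sets it (in tests or emergencies) it is honored;
// otherwise a hard-coded default applies. Key name is hypothetical.
public class HiddenConfigExample {
    static final String HIDDEN_KEY = "isr.change.propagation.interval.ms";
    static final long DEFAULT_INTERVAL_MS = 10_000L; // hard-coded default

    static long propagationIntervalMs(Properties props) {
        String raw = props.getProperty(HIDDEN_KEY);
        if (raw == null) {
            return DEFAULT_INTERVAL_MS; // not set: use the default
        }
        try {
            return Long.parseLong(raw.trim()); // set: honor the override
        } catch (NumberFormatException e) {
            return DEFAULT_INTERVAL_MS; // malformed override: ignore it
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(propagationIntervalMs(props)); // prints 10000
        props.setProperty(HIDDEN_KEY, "60000");
        System.out.println(propagationIntervalMs(props)); // prints 60000
    }
}
```

Because the key never appears in ConfigDef or the documentation, ordinary users never see it, which is exactly the trade-off Gwen notes versus Oracle's hidden parameters.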
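
[Editor's note] The automatic batching Jun describes (after each batch is sent, drain everything that accumulated in the meantime, so batch size adapts to the propagation rate) could be sketched like this. A hypothetical illustration only; class and method names are made up and this is not Kafka's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Group-commit-style batching: ISR changes queue up while the previous
// batch is in flight; the propagation loop drains the whole queue each
// round. Under load, batches grow automatically; when idle, latency is
// just one round trip.
class IsrChangeBatcher {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Called whenever a partition's ISR changes.
    void onIsrChange(String partition) {
        pending.add(partition);
    }

    // Called by the propagation thread after the previous batch completes:
    // returns everything accumulated since the last call.
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        String change;
        while ((change = pending.poll()) != null) {
            batch.add(change);
        }
        return batch;
    }
}
```

As Jun points out, this only self-regulates when sending a batch exerts back pressure on the sender; a fire-and-forget ZK notification path has no such back pressure, which is why an explicit interval was used instead.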