I'm fine if it goes into 0.10.2.0 On Tue, Jan 31, 2017 at 8:53 AM, Onur Karaman <okara...@linkedin.com.invalid > wrote:
> I get where Ewen's coming from but personally, I have trouble understanding > a use case where end users would specifically rely on existing behavior of > group coordination and offset commits succeeding with nondeterministic RF > (which is only later to be manually fixed) during the window of time when a > legitimate non-quickstart cluster is in the process of being setup. > > As Ewen mentioned, server.properties is already blatantly not meant for > production use cases but actually for quickstart. It chooses /tmp for its > log.dirs, localhost for its zookeeper.connect, and references > kafka.server.KafkaConfig for additional details and defaults. It seems that > setting offsets.topic.replication.factor to 1 in server.properties is the > right way to go. > > That being said, if there actually is opposition, then it would be fair to > bump it to 0.10.3.0. > > On Mon, Jan 30, 2017 at 6:12 PM, Ewen Cheslack-Postava <e...@confluent.io> > wrote: > > > So, we have one other blocker bug in system tests that we're trying to > make > > sure can safely be removed, so we've had a bit of slack time with this. > > Obviously having this all happen very last minute isn't really ideal > since > > it didn't allow enough time to address the feedback -- Stevo's questions > > didn't get a response until today. > > > > I'd really like this fix in because I think it actually seems pretty low > > risk, although it would potentially make some consumers on a new cluster > be > > delayed indefinitely if the user is relying on having fewer than the > > default offset.topic.replication.factor. Many users will have there > > server.properties files stored elsewhere and not rely on the version we > > ship (and they shouldn't -- it contains other dangerous settings like > > log.dirs in /tmp). This does mean some users could be negatively > affected, > > although I have to admit it seems like it'd be an extremely small > > population of our users. > > > > All that said, given the late voting and discussion continuing while the > > VOTE thread proceeded, I'd like to suggest that if anyone objects to > > including in 0.10.2.0, we bump it to 0.10.3.0 and also follow up on > > Ismael's question (re: does this need to be in a major version, which > > afaict was only followed up by Jeff's comment that he'd rather not wait). > > The patch is there, so those that desperately want it can easily > > cherry-pick it and build a version themselves if it's that big a problem > > for them, else they would see the fix in the next release. > > > > More generally, I think this KIP (and my experience dealing w/ the KIPs + > > feature freeze for this release) suggest we need a little more clarity on > > how KIPs fit into the release process. I'm going to follow up in a new > > release post-mortem thread further but I think we should a) start to > > distinguish between feature KIPs and bug fix KIPs and possibly apply > > different rules for them (this one is admittedly ambiguous) and b) push > the > > deadline for feature KIPs to a week before feature freeze (giving a more > > realistic review period for all that code) and make bugfix KIPs > acceptable > > up to a week before code freeze (again, providing a realistic review > > period), and c) move our freeze dates off of Fridays since having them > > there implicitly gives people 2 extra days to squeeze things in. > > > > -Ewen > > > > On Mon, Jan 30, 2017 at 1:46 PM, Onur Karaman > > <okara...@linkedin.com.invalid > > > wrote: > > > > > I've updated the KIP title to reflect this distinction: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 115%3A+Enforce+offsets.topic.replication.factor+upon+__ > > > consumer_offsets+auto+topic+creation > > > > > > On Mon, Jan 30, 2017 at 12:52 AM, Onur Karaman < > > > onurkaraman.apa...@gmail.com > > > > wrote: > > > > > > > Regarding Joel's comment: > > > > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com> > wrote: > > > > > > > > > > already voted, but one thing worth considering (since this KIP > speaks > > > of > > > > > *enforcement*) is desired behavior if the topic already exists and > > the > > > > > config != existing RF. > > > > > > > > > > > > > The short answer: The first line of the KIP specifically states that > it > > > > attempts enforcement only upon topic creation and I think anything > else > > > > should be beyond the scope of this KIP. > > > > > > > > The long answer: > > > > Mismatch between existing RF and the "offsets.topic.replication. > > factor" > > > > config happens with: > > > > a. topic creation paths 3-5 as defined in the KIP if the size of the > > > > replicas set resulting from AdminUtils != "offsets.topic.replication. > > > > factor" > > > > b. topic creation path 6 > > > > c. a config change to the broker's "offsets.topic.replication. > factor" > > > > d. partition reassignments that expand the RF > > > > > > > > For all of these scenarios, I believe it all boils down to the intent > > of > > > > the imperfectly named "offsets.topic.replication.factor" and > > > > "default.replication.factor" configs. These configs really only > > represent > > > > the RF to be used upon auto topic creation by the broker and are > never > > > > referenced anywhere else, whether it's "offsets.topic.replication. > > > factor" > > > > for the __consumer_offsets topic or "default.replication.factor" for > > any > > > > other topic. > > > > > > > > I think any RF mismatch after topic creation is beyond the scope of > > this > > > > discussion since the configs anyways were not intended to enforce RF > > > > anywhere other than upon auto topic creation by the broker. > > > > > > > > Regarding Stevo's comment: > > > > > On Thu, Jan 26, 2017 at 4:26 AM, Stevo Slavić <ssla...@gmail.com> > > > wrote: > > > > > test cluster. If this problem of offsets.topic.replication.factor > > not > > > > being > > > > > enforced others also observed only in their tests only, than I > don't > > > like > > > > > the KIP proposed change, of setting offsets.topic.replication. > factor > > > to > > > > 1 > > > > > by default. I understand backward compatibility goals of this, but > I > > > can > > > > > imagine late discovered production issues as consequences of this > > > change. > > > > > > > > It's worth noting that the KafkaConfig default for > > > > "offsets.topic.replication.factor" is still 3. This KIP merely > changes > > > the > > > > config/server.properties default to 1 so that the quickstart > > > > "./bin/kafka-server-start.sh config/server.properties" continues to > run > > > > smoothly. > > > > > > > > On Thu, Jan 26, 2017 at 9:34 AM, Colin McCabe <cmcc...@apache.org> > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > +1 (non-binding) for KIP-115. > > > > > > > > > > On Thu, Jan 26, 2017, at 04:26, Stevo Slavić wrote: > > > > > > If I understood well, this KIP is trying to solve for the problem > > of > > > > > > offsets.topic.replication.factor not being enforced, > particularly > > in > > > > > > context of "when you have clients or tooling running as the > > cluster > > > is > > > > > > getting setup". Assuming that this problem was observed in > > > production, > > > > so > > > > > > in non-testing only conditions, would it make sense to introduce > > > > > > additional > > > > > > property - min number of alive brokers before offsets topic is > > > allowed > > > > to > > > > > > be created? > > > > > > > > > > It's an interesting idea, but... is there a user use-case for a > > > property > > > > > like this? I'm having a hard time thinking of one, but maybe I > > missed > > > > > something. > > > > > > > > > > cheers, > > > > > Colin > > > > > > > > > > > > > > > > > Currently offsets.topic.replication.factor is used for that > > purpose, > > > > so > > > > > > with offsets.topic.replication.factor set to 3 it's enough to > have > > > > just > > > > > 3 > > > > > > brokers up for offsets topic to be created. Then all replicas of > > all > > > > (by > > > > > > default 50) partitions of this topic would be spread out over > just > > > > these > > > > > > 3 > > > > > > brokers, while eventually entire cluster might be much larger in > > size > > > > and > > > > > > would benefit from wider spread of consumer offsets topic > > partitions > > > > > > leadership. > > > > > > > > > > > > One can achieve wider spread later, manually. But that would > first > > > have > > > > > > to > > > > > > be detected, and then use provided CLI/scripts to change replica > > > > > > assignment. IMO it would be better if it was possible to > configure > > > > > > desired > > > > > > spread, even if just indirectly through configuring min number of > > > alive > > > > > > brokers. If not overriden in server.properties, this new property > > can > > > > > > default to offsets.topic.replication.factor > > > > > > > > > > > > I've been bitten by problem of offsets.topic.replication.factor > > not > > > > > being > > > > > > enforced but only in testing, integration tests, it was almost > > > > > > unpredictable when offsets topic is ready, test cluster > > initialized, > > > > > > would > > > > > > get lots of false failures, unstable tests, but eventually got to > > > > > > predictable deterministic test behavior, found ways to fully > > > initialize > > > > > > test cluster. If this problem of offsets.topic.replication. > factor > > > not > > > > > > being > > > > > > enforced others also observed only in their tests only, than I > > don't > > > > like > > > > > > the KIP proposed change, of setting offsets.topic.replication. > > factor > > > > to > > > > > 1 > > > > > > by default. I understand backward compatibility goals of this, > but > > I > > > > can > > > > > > imagine late discovered production issues as consequences of this > > > > change. > > > > > > So I wouldn't like to trade off production issues probability for > > > > testing > > > > > > convenience. > > > > > > > > > > > > Current Kafka documentation has nice note about > > > > > > offsets.topic.replication.factor and related behavior. New note > > > about > > > > > new > > > > > > default would have to be a warning in bold and red in docs, and > > every > > > > > > broker should output proper warning in log if configuration for > > > > > > offsets.topic.replication.factor is on new proposed default of > 1. > > > > > > > > > > > > Kind regards, > > > > > > Stevo Slavic. > > > > > > > > > > > > On Thu, Jan 26, 2017 at 8:43 AM, James Cheng < > wushuja...@gmail.com > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > > already voted, but one thing worth considering (since this > KIP > > > > > speaks of > > > > > > > > *enforcement*) is desired behavior if the topic already > exists > > > and > > > > > the > > > > > > > > config != existing RF. > > > > > > > > > > > > > > > > > > > > > > Yeah, I'm curious about this too. > > > > > > > > > > > > > > -James > > > > > > > > > > > > > > > On Wed, Jan 25, 2017 at 4:30 PM, Dong Lin < > lindon...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > >> +1 > > > > > > > >> > > > > > > > >> On Wed, Jan 25, 2017 at 4:22 PM, Ismael Juma < > > ism...@juma.me.uk > > > > > > > > > wrote: > > > > > > > >> > > > > > > > >>> An important question is if this needs to wait for a major > > > > release > > > > > or > > > > > > > >> not. > > > > > > > >>> > > > > > > > >>> Ismael > > > > > > > >>> > > > > > > > >>> On Thu, Jan 26, 2017 at 12:19 AM, Ismael Juma < > > > ism...@juma.me.uk > > > > > > > > > > > > wrote: > > > > > > > >>> > > > > > > > >>>> +1 from me too. > > > > > > > >>>> > > > > > > > >>>> Ismael > > > > > > > >>>> > > > > > > > >>>> On Thu, Jan 26, 2017 at 12:07 AM, Ewen Cheslack-Postava < > > > > > > > >>> e...@confluent.io > > > > > > > >>>>> wrote: > > > > > > > >>>> > > > > > > > >>>>> +1 > > > > > > > >>>>> > > > > > > > >>>>> Since this is an unusual one, I think it's worth pointing > > out > > > > > that > > > > > > > the > > > > > > > >>> KIP > > > > > > > >>>>> notes it is really a bug fix, but since it has > > compatibility > > > > > > > >>> implications > > > > > > > >>>>> the KIP was worth it. It was a sort of intentional bug, > but > > > > > confusing > > > > > > > >>> and > > > > > > > >>>>> dangerous. > > > > > > > >>>>> > > > > > > > >>>>> Seems important to fix this ASAP since people are hitting > > > this > > > > in > > > > > > > >>> practice > > > > > > > >>>>> and would have to go out of their way to set up > monitoring > > to > > > > > catch > > > > > > > >> the > > > > > > > >>>>> issue. > > > > > > > >>>>> > > > > > > > >>>>> -Ewen > > > > > > > >>>>> > > > > > > > >>>>> On Wed, Jan 25, 2017 at 4:02 PM, Jason Gustafson < > > > > > ja...@confluent.io > > > > > > > > > > > > > > > >>>>> wrote: > > > > > > > >>>>> > > > > > > > >>>>>> +1 from me. The current behavior seems both surprising > and > > > > > > > >> dangerous. > > > > > > > >>>>>> > > > > > > > >>>>>> -Jason > > > > > > > >>>>>> > > > > > > > >>>>>> On Wed, Jan 25, 2017 at 3:58 PM, Onur Karaman < > > > > > > > >>>>>> onurkaraman.apa...@gmail.com> > > > > > > > >>>>>> wrote: > > > > > > > >>>>>> > > > > > > > >>>>>>> Hey everyone. > > > > > > > >>>>>>> > > > > > > > >>>>>>> I made a bug-fix KIP-115 to enforce > > > > offsets.topic.replication. > > > > > > > >>> factor: > > > > > > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > >>>>>>> 115%3A+Enforce+offsets.topic.replication.factor > > > > > > > >>>>>>> > > > > > > > >>>>>>> Comments are welcome. > > > > > > > >>>>>>> > > > > > > > >>>>>>> - Onur > > > > > > > >>>>>>> > > > > > > > >>>>>> > > > > > > > >>>>> > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >