I'm fine if it goes into 0.10.2.0

On Tue, Jan 31, 2017 at 8:53 AM, Onur Karaman <okara...@linkedin.com.invalid
> wrote:

> I get where Ewen's coming from but personally, I have trouble understanding
> a use case where end users would specifically rely on existing behavior of
> group coordination and offset commits succeeding with nondeterministic RF
> (which is only later to be manually fixed) during the window of time when a
> legitimate non-quickstart cluster is in the process of being setup.
>
> As Ewen mentioned, server.properties is already blatantly not meant for
> production use cases but actually for quickstart. It chooses /tmp for its
> log.dirs, localhost for its zookeeper.connect, and references
> kafka.server.KafkaConfig for additional details and defaults. It seems that
> setting offsets.topic.replication.factor to 1 in server.properties is the
> right way to go.
>
> That being said, if there actually is opposition, then it would be fair to
> bump it to 0.10.3.0.
>
> On Mon, Jan 30, 2017 at 6:12 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > So, we have one other blocker bug in system tests that we're trying to
> make
> > sure can safely be removed, so we've had a bit of slack time with this.
> > Obviously having this all happen very last minute isn't really ideal
> since
> > it didn't allow enough time to address the feedback -- Stevo's questions
> > didn't get a response until today.
> >
> > I'd really like this fix in because I think it actually seems pretty low
> > risk, although it would potentially make some consumers on a new cluster
> be
> > delayed indefinitely if the user is relying on having fewer than the
> > default offset.topic.replication.factor. Many users will have there
> > server.properties files stored elsewhere and not rely on the version we
> > ship (and they shouldn't -- it contains other dangerous settings like
> > log.dirs in /tmp). This does mean some users could be negatively
> affected,
> > although I have to admit it seems like it'd be an extremely small
> > population of our users.
> >
> > All that said, given the late voting and discussion continuing while the
> > VOTE thread proceeded, I'd like to suggest that if anyone objects to
> > including in 0.10.2.0, we bump it to 0.10.3.0 and also follow up on
> > Ismael's question (re: does this need to be in a major version, which
> > afaict was only followed up by Jeff's comment that he'd rather not wait).
> > The patch is there, so those that desperately want it can easily
> > cherry-pick it and build a version themselves if it's that big a problem
> > for them, else they would see the fix in the next release.
> >
> > More generally, I think this KIP (and my experience dealing w/ the KIPs +
> > feature freeze for this release) suggest we need a little more clarity on
> > how KIPs fit into the release process. I'm going to follow up in a new
> > release post-mortem thread further but I think we should a) start to
> > distinguish between feature KIPs and bug fix KIPs and possibly apply
> > different rules for them (this one is admittedly ambiguous) and b) push
> the
> > deadline for feature KIPs to a week before feature freeze (giving a more
> > realistic review period for all that code) and make bugfix KIPs
> acceptable
> > up to a week before code freeze (again, providing a realistic review
> > period), and c) move our freeze dates off of Fridays since having them
> > there implicitly gives people 2 extra days to squeeze things in.
> >
> > -Ewen
> >
> > On Mon, Jan 30, 2017 at 1:46 PM, Onur Karaman
> > <okara...@linkedin.com.invalid
> > > wrote:
> >
> > > I've updated the KIP title to reflect this distinction:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 115%3A+Enforce+offsets.topic.replication.factor+upon+__
> > > consumer_offsets+auto+topic+creation
> > >
> > > On Mon, Jan 30, 2017 at 12:52 AM, Onur Karaman <
> > > onurkaraman.apa...@gmail.com
> > > > wrote:
> > >
> > > > Regarding Joel's comment:
> > > > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com>
> wrote:
> > > > >
> > > > > already voted, but one thing worth considering (since this KIP
> speaks
> > > of
> > > > > *enforcement*) is desired behavior if the topic already exists and
> > the
> > > > > config != existing RF.
> > > > >
> > > >
> > > > The short answer: The first line of the KIP specifically states that
> it
> > > > attempts enforcement only upon topic creation and I think anything
> else
> > > > should be beyond the scope of this KIP.
> > > >
> > > > The long answer:
> > > > Mismatch between existing RF and the "offsets.topic.replication.
> > factor"
> > > > config happens with:
> > > > a. topic creation paths 3-5 as defined in the KIP if the size of the
> > > > replicas set resulting from AdminUtils != "offsets.topic.replication.
> > > > factor"
> > > > b. topic creation path 6
> > > > c. a config change to the broker's "offsets.topic.replication.
> factor"
> > > > d. partition reassignments that expand the RF
> > > >
> > > > For all of these scenarios, I believe it all boils down to the intent
> > of
> > > > the imperfectly named "offsets.topic.replication.factor" and
> > > > "default.replication.factor" configs. These configs really only
> > represent
> > > > the RF to be used upon auto topic creation by the broker and are
> never
> > > > referenced anywhere else, whether it's "offsets.topic.replication.
> > > factor"
> > > > for the __consumer_offsets topic or "default.replication.factor" for
> > any
> > > > other topic.
> > > >
> > > > I think any RF mismatch after topic creation is beyond the scope of
> > this
> > > > discussion since the configs anyways were not intended to enforce RF
> > > > anywhere other than upon auto topic creation by the broker.
> > > >
> > > > Regarding Stevo's comment:
> > > > > On Thu, Jan 26, 2017 at 4:26 AM, Stevo Slavić <ssla...@gmail.com>
> > > wrote:
> > > > > test cluster. If this problem of offsets.topic.replication.factor
> > not
> > > > being
> > > > > enforced others also observed only in their tests only, than I
> don't
> > > like
> > > > > the KIP proposed change, of setting offsets.topic.replication.
> factor
> > > to
> > > > 1
> > > > > by default. I understand backward compatibility goals of this, but
> I
> > > can
> > > > > imagine late discovered production issues as consequences of this
> > > change.
> > > >
> > > > It's worth noting that the KafkaConfig default for
> > > > "offsets.topic.replication.factor" is still 3. This KIP merely
> changes
> > > the
> > > > config/server.properties default to 1 so that the quickstart
> > > > "./bin/kafka-server-start.sh config/server.properties" continues to
> run
> > > > smoothly.
> > > >
> > > > On Thu, Jan 26, 2017 at 9:34 AM, Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > +1 (non-binding) for KIP-115.
> > > > >
> > > > > On Thu, Jan 26, 2017, at 04:26, Stevo Slavić wrote:
> > > > > > If I understood well, this KIP is trying to solve for the problem
> > of
> > > > > > offsets.topic.replication.factor not being enforced,
> particularly
> > in
> > > > > > context of  "when you have clients or tooling running as the
> > cluster
> > > is
> > > > > > getting setup". Assuming that this problem was observed in
> > > production,
> > > > so
> > > > > > in non-testing only conditions, would it make sense to introduce
> > > > > > additional
> > > > > > property - min number of alive brokers before offsets topic is
> > > allowed
> > > > to
> > > > > > be created?
> > > > >
> > > > > It's an interesting idea, but... is there a user use-case for a
> > > property
> > > > > like this?  I'm having a hard time thinking of one, but maybe I
> > missed
> > > > > something.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > > >
> > > > > > Currently offsets.topic.replication.factor is used for that
> > purpose,
> > > > so
> > > > > > with offsets.topic.replication.factor set to 3 it's enough to
> have
> > > > just
> > > > > 3
> > > > > > brokers up for offsets topic to be created. Then all replicas of
> > all
> > > > (by
> > > > > > default 50) partitions of this topic would be spread out over
> just
> > > > these
> > > > > > 3
> > > > > > brokers, while eventually entire cluster might be much larger in
> > size
> > > > and
> > > > > > would benefit from wider spread of consumer offsets topic
> > partitions
> > > > > > leadership.
> > > > > >
> > > > > > One can achieve wider spread later, manually. But that would
> first
> > > have
> > > > > > to
> > > > > > be detected, and then use provided CLI/scripts to change replica
> > > > > > assignment. IMO it would be better if it was possible to
> configure
> > > > > > desired
> > > > > > spread, even if just indirectly through configuring min number of
> > > alive
> > > > > > brokers. If not overriden in server.properties, this new property
> > can
> > > > > > default to offsets.topic.replication.factor
> > > > > >
> > > > > > I've been bitten by problem of offsets.topic.replication.factor
> > not
> > > > > being
> > > > > > enforced but only in testing, integration tests, it was almost
> > > > > > unpredictable when offsets topic is ready, test cluster
> > initialized,
> > > > > > would
> > > > > > get lots of false failures, unstable tests, but eventually got to
> > > > > > predictable deterministic test behavior, found ways to fully
> > > initialize
> > > > > > test cluster. If this problem of offsets.topic.replication.
> factor
> > > not
> > > > > > being
> > > > > > enforced others also observed only in their tests only, than I
> > don't
> > > > like
> > > > > > the KIP proposed change, of setting offsets.topic.replication.
> > factor
> > > > to
> > > > > 1
> > > > > > by default. I understand backward compatibility goals of this,
> but
> > I
> > > > can
> > > > > > imagine late discovered production issues as consequences of this
> > > > change.
> > > > > > So I wouldn't like to trade off production issues probability for
> > > > testing
> > > > > > convenience.
> > > > > >
> > > > > > Current Kafka documentation has nice note about
> > > > > > offsets.topic.replication.factor and related behavior. New note
> > > about
> > > > > new
> > > > > > default would have to be a warning in bold and red in docs, and
> > every
> > > > > > broker should output proper warning in log if configuration for
> > > > > > offsets.topic.replication.factor is on new proposed default of
> 1.
> > > > > >
> > > > > > Kind regards,
> > > > > > Stevo Slavic.
> > > > > >
> > > > > > On Thu, Jan 26, 2017 at 8:43 AM, James Cheng <
> wushuja...@gmail.com
> > >
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > already voted, but one thing worth considering (since this
> KIP
> > > > > speaks of
> > > > > > > > *enforcement*) is desired behavior if the topic already
> exists
> > > and
> > > > > the
> > > > > > > > config != existing RF.
> > > > > > > >
> > > > > > >
> > > > > > > Yeah, I'm curious about this too.
> > > > > > >
> > > > > > > -James
> > > > > > >
> > > > > > > > On Wed, Jan 25, 2017 at 4:30 PM, Dong Lin <
> lindon...@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > >> +1
> > > > > > > >>
> > > > > > > >> On Wed, Jan 25, 2017 at 4:22 PM, Ismael Juma <
> > ism...@juma.me.uk
> > > >
> > > > > wrote:
> > > > > > > >>
> > > > > > > >>> An important question is if this needs to wait for a major
> > > > release
> > > > > or
> > > > > > > >> not.
> > > > > > > >>>
> > > > > > > >>> Ismael
> > > > > > > >>>
> > > > > > > >>> On Thu, Jan 26, 2017 at 12:19 AM, Ismael Juma <
> > > ism...@juma.me.uk
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> +1 from me too.
> > > > > > > >>>>
> > > > > > > >>>> Ismael
> > > > > > > >>>>
> > > > > > > >>>> On Thu, Jan 26, 2017 at 12:07 AM, Ewen Cheslack-Postava <
> > > > > > > >>> e...@confluent.io
> > > > > > > >>>>> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> +1
> > > > > > > >>>>>
> > > > > > > >>>>> Since this is an unusual one, I think it's worth pointing
> > out
> > > > > that
> > > > > > > the
> > > > > > > >>> KIP
> > > > > > > >>>>> notes it is really a bug fix, but since it has
> > compatibility
> > > > > > > >>> implications
> > > > > > > >>>>> the KIP was worth it. It was a sort of intentional bug,
> but
> > > > > confusing
> > > > > > > >>> and
> > > > > > > >>>>> dangerous.
> > > > > > > >>>>>
> > > > > > > >>>>> Seems important to fix this ASAP since people are hitting
> > > this
> > > > in
> > > > > > > >>> practice
> > > > > > > >>>>> and would have to go out of their way to set up
> monitoring
> > to
> > > > > catch
> > > > > > > >> the
> > > > > > > >>>>> issue.
> > > > > > > >>>>>
> > > > > > > >>>>> -Ewen
> > > > > > > >>>>>
> > > > > > > >>>>> On Wed, Jan 25, 2017 at 4:02 PM, Jason Gustafson <
> > > > > ja...@confluent.io
> > > > > > > >
> > > > > > > >>>>> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> +1 from me. The current behavior seems both surprising
> and
> > > > > > > >> dangerous.
> > > > > > > >>>>>>
> > > > > > > >>>>>> -Jason
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Wed, Jan 25, 2017 at 3:58 PM, Onur Karaman <
> > > > > > > >>>>>> onurkaraman.apa...@gmail.com>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Hey everyone.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I made a bug-fix KIP-115 to enforce
> > > > offsets.topic.replication.
> > > > > > > >>> factor:
> > > > > > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > >>>>>>> 115%3A+Enforce+offsets.topic.replication.factor
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Comments are welcome.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> - Onur
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to