Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Joe Stein Mon, 19 Jan 2015 09:47:45 -0800

<< For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
and then drop the ack > 1 support in trunk (w/o bumping up the protocol
version)?


+1

On Mon, Jan 19, 2015 at 12:35 PM, Jun Rao <j...@confluent.io> wrote:

> For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
> version)? Thanks,
>
> Jun
>
> On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
>> Overall, agree on point #1, less sure on point #2.
>>
>> 1. Some protocols never ever add new errors, while others add errors
>> without bumping versions. HTTP is a good example of the second type.
>> HTTP-451 was added fairly recently, there are some errors specific to
>> NGINX, etc. No one cares. I think we should properly document in the
>> wire-protocol doc that new errors can be added, and I think we should
>> strongly suggest (and implement ourselves) that unknown error codes
>> should be shown to users (or at least logged), so they can be googled
>> and understood through our documentation.
>> In addition, hierarchy of error codes, so clients will know if an
>> error is retry-able just by looking at the code could be nice. Same
>> for adding an error string to the protocol. These are future
>> enhancements that should be discussed separately.
>>
>> 2. I think we want to allow admins to upgrade their Kafka brokers
>> without having to chase down clients in their organization and without
>> getting blamed if clients break. I think it makes sense to have one
>> version that will support existing behavior, but log warnings, so
>> admins will know about misbehaving clients and can track them down
>> before an upgrade that breaks them (or before the broken config causes
>> them to lose data!). Hopefully this is indeed a very rare behavior and
>> we are taking extra precaution for nothing, but I have customers where
>> one traumatic upgrade means they will never upgrade a Kafka again, so
>> I'm being conservative.
>>
>> Gwen
>>
>>
>> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <j...@confluent.io> wrote:
>> > Overall, I agree with Jay on both points.
>> >
>> > 1. I think it's reasonable to add new error codes w/o bumping up the
>> > protocol version. In most cases, by adding new error codes, we are just
>> > refining the categorization of those unknown errors. So, a client
>> shouldn't
>> > behave worse than before as long as unknown errors have been properly
>> > handled.
>> >
>> > 2. I think it's reasonable to just document that 0.8.2 will be the last
>> > release that will support ack > 1 and remove the support completely in
>> trunk
>> > w/o bumping up the protocol. This is because (a) we never included ack
>> > 1
>> > explicitly in the documentation and so the usage should be limited; (2)
>> ack
>> >> 1 doesn't provide the guarantee that people really want and so it
>> > shouldn't really be used.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <jay.kr...@gmail.com>
>> wrote:
>> >>
>> >> Hey guys,
>> >>
>> >> I really think we are discussing two things here:
>> >>
>> >> How should we generally handle changes to the set of errors? Should
>> >> introducing new errors be considered a protocol change or should we
>> reserve
>> >> the right to introduce new error codes?
>> >> Given that this particular change is possibly incompatible, how should
>> we
>> >> handle it?
>> >>
>> >> I think it would be good for people who are responding here to be
>> specific
>> >> about which they are addressing.
>> >>
>> >> Here is what I think:
>> >>
>> >> 1. Errors should be extensible within a protocol version.
>> >>
>> >> We should change the protocol documentation to list the errors that
>> can be
>> >> given back from each api, their meaning, and how to handle them, BUT we
>> >> should explicitly state that the set of errors are open ended. That is
>> we
>> >> should reserve the right to introduce new errors and explicitly state
>> that
>> >> clients need a blanket "unknown error" handling mechanism. The error
>> can
>> >> link to the protocol definition (something like "Unknown error 42, see
>> >> protocol definition at http://link";). We could make this work really
>> well by
>> >> instructing all the clients to report the error in a very googlable
>> way as
>> >> Oracle does with their error format (e.g. "ORA-32") so that if you
>> ever get
>> >> the raw error google will take you to the definition.
>> >>
>> >> I agree that a more rigid definition seems like "right thing", but
>> having
>> >> just implemented two clients and spent a bunch of time on the server
>> side, I
>> >> think, it will work out poorly in practice. Here is why:
>> >>
>> >> I think we will make a lot of mistakes in nailing down the set of error
>> >> codes up front and we will end up going through 3-4 churns of the
>> protocol
>> >> definition just realizing the set of errors that can be thrown. I
>> think this
>> >> churn will actually make life worse for clients that now have to
>> figure out
>> >> 7 identical versions of the protocol and will be a mess in terms of
>> testing
>> >> on the server side. I actually know this to be true because while
>> >> implementing the clients I tried to guess the errors that could be
>> thrown,
>> >> then checked my guess by close code inspection. It turned out that I
>> always
>> >> missed things in my belief about errors, but more importantly even
>> after
>> >> close code inspection I found tons of other errors in my stress
>> testing.
>> >> In practice error handling always involves calling out one or two
>> >> meaningful failures that have special recovery and then a blanket case
>> that
>> >> just handles everything else. It's true that some clients may not have
>> done
>> >> this well, but I think it is for the best if they fix that.
>> >> Reserving the right to add errors doesn't mean we will do it without
>> care.
>> >> We will think through each change and decide whether giving a little
>> more
>> >> precision in the error is worth the overhead and churn of a protocol
>> version
>> >> bump.
>> >>
>> >> 2. In this case in particular we should not introduce a new protocol
>> >> version
>> >>
>> >> In this particular case we are saying that acks > 1 doesn't make sense
>> and
>> >> we want to give an error to people specifying this so that they change
>> their
>> >> configuration. This is a configuration that few people use and we want
>> to
>> >> just make it an error. The bad behavior will just be that the error
>> will not
>> >> be as good as it could be. I think that is a better tradeoff than
>> >> introducing a separate protocol version (this may be true of the java
>> >> clients too).
>> >>
>> >> We will have lots of cases like this in the future and we aren't going
>> to
>> >> want to churn the protocol for each of them. For example we previously
>> had
>> >> to get more precise about which characters were legal and which
>> illegal in
>> >> topic names.
>> >>
>> >> -Jay
>> >>
>> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gshap...@cloudera.com>
>> >> wrote:
>> >>>
>> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
>> >>> in the broker about client using deprecated behavior (suggested by Joe
>> >>> in the JIRA, and I think it makes sense).
>> >>>
>> >>> Gwen
>> >>>
>> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gshap...@cloudera.com
>> >
>> >>> wrote:
>> >>> > How about we continue the discussion on this thread, so we won't
>> lose
>> >>> > the context of this discussion, and put it up for VOTE when this has
>> >>> > been finalized?
>> >>> >
>> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <n...@confluent.io>
>> >>> > wrote:
>> >>> >> Gwen,
>> >>> >>
>> >>> >> KIP write-up looks good. According to the rest of the KIP process
>> >>> >> proposal,
>> >>> >> would you like to start a DISCUSS/VOTE thread for it?
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Neha
>> >>> >>
>> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>> >>> >> <e...@confluent.io>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
>> needs
>> >>> >>> to be
>> >>> >>> more specific, but I think that discussion probably needs to
>> happen
>> >>> >>> after a
>> >>> >>> solution is agreed upon.
>> >>> >>>
>> >>> >>> Jay -- I think "older clients will get a bad error message
>> instead of
>> >>> >>> a
>> >>> >>> good one" isn't what would be happening with this change.
>> Previously
>> >>> >>> they
>> >>> >>> wouldn't have received an error and they would have been able to
>> >>> >>> produce
>> >>> >>> messages. After the change they'll just receive this new error
>> >>> >>> message
>> >>> >>> which their clients can't possibly handle gracefully since it
>> didn't
>> >>> >>> exist
>> >>> >>> when the client was written. Whether the acks > 1 setting was
>> >>> >>> actually
>> >>> >>> accomplishing what they thought doesn't matter. Someone could have
>> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>> >>> >>> setting for
>> >>> >>> their applications, set it as a default across a bunch of apps,
>> then
>> >>> >>> follow
>> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>> >>> >>> their
>> >>> >>> apps will start failing on produce requests.
>> >>> >>>
>> >>> >>>
>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <jay.kr...@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > This is a good case to discuss.
>> >>> >>> >
>> >>> >>> > Let's figure the general case of how we want to handle errors
>> and
>> >>> >>> > get
>> >>> >>> that
>> >>> >>> > documented in the protocol. The problem right now is that we
>> give
>> >>> >>> > no
>> >>> >>> > guidance on this. I actually thought Gwen's suggestion made
>> sense
>> >>> >>> > on the
>> >>> >>> > guidance we should have given which is that we will enumerate a
>> set
>> >>> >>> > of
>> >>> >>> > errors and their meaning for each API but it is possible that
>> other
>> >>> >>> errors
>> >>> >>> > will occur and they should be handled (maybe poorly) in the same
>> >>> >>> > way
>> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>> >>> >>> > things like
>> >>> >>> > OOMException.
>> >>> >>> >
>> >>> >>> > I really do think we shouldn't be dogmatic here: In considering
>> a
>> >>> >>> > change
>> >>> >>> > to errors we should consider the potential ill-effect vs the
>> >>> >>> > complexity
>> >>> >>> of
>> >>> >>> > yet another protocol version.
>> >>> >>> >
>> >>> >>> > In this case I actually am not sure we need to bump the protocol
>> >>> >>> > because
>> >>> >>> > the whole point of the change was to make a setting we think
>> >>> >>> > doesn't make
>> >>> >>> > sense break, right? Well this will break it. It seems like the
>> only
>> >>> >>> > downside is that older clients will get a bad error message
>> instead
>> >>> >>> > of a
>> >>> >>> > good one. But it isn't like we will have rendered a client
>> >>> >>> > unusable, it
>> >>> >>> is
>> >>> >>> > just that they will need to change their config.
>> >>> >>> >
>> >>> >>> > -Jay
>> >>> >>> >
>> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>> >>> >>> > <gshap...@cloudera.com>
>> >>> >>> > wrote:
>> >>> >>> >
>> >>> >>> >> I created a KIP for this suggestion:
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>> >>> >>> >>
>> >>> >>> >> Basically documenting what was already discussed here.
>> Comments
>> >>> >>> >> will
>> >>> >>> >> be awesome!
>> >>> >>> >>
>> >>> >>> >> Gwen
>> >>> >>> >>
>> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>> >>> >>> >> <gshap...@cloudera.com>
>> >>> >>> >> wrote:
>> >>> >>> >> > The errors are part of the KIP process now, so I think the
>> >>> >>> >> > clients are
>> >>> >>> >> safe :)
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >>> >>> >> >
>> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>> >>> >>> >> > <steve.mo...@gmail.com>
>> >>> >>> >> wrote:
>> >>> >>> >> >> Agree errors should be part of the protocol
>> >>> >>> >> >>
>> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>> >>> >>> >> >>> <gshap...@cloudera.com>
>> >>> >>> >> wrote:
>> >>> >>> >> >>>
>> >>> >>> >> >>> Hi,
>> >>> >>> >> >>>
>> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed
>> part of
>> >>> >>> >> >>> the
>> >>> >>> >> >>> protocol and can't be randomly added.
>> >>> >>> >> >>>
>> >>> >>> >> >>> So, it looks like we need to bump version of
>> ProduceRequest in
>> >>> >>> >> >>> the
>> >>> >>> >> >>> following way:
>> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>> >>> >>> >> >>> existing
>> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
>> surprising
>> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
>> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>> >>> >>> >> >>> something
>> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
>> be
>> >>> >>> >> >>> reused
>> >>> >>> >> >>> in the future and reduce the need to add errors, but will
>> also
>> >>> >>> >> >>> be
>> >>> >>> less
>> >>> >>> >> >>> clear to the client and its users. Maybe even add the error
>> >>> >>> >> >>> message
>> >>> >>> >> >>> string to the protocol in addition to the error code?
>> (since
>> >>> >>> >> >>> we are
>> >>> >>> >> >>> bumping versions....)
>> >>> >>> >> >>>
>> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>> >>> >>> >> >>> sense.
>> >>> >>> IMO
>> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners
>> >>> >>> >> >>> help
>> >>> >>> make
>> >>> >>> >> >>> that call.
>> >>> >>> >> >>>
>> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like
>> a
>> >>> >>> KIP-type
>> >>> >>> >> >>> discussion :)
>> >>> >>> >> >>>
>> >>> >>> >> >>> Gwen
>> >>> >>> >> >>>
>> >>> >>> >> >>>
>> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> >>> >>> >> >>> <e...@confluent.io> wrote:
>> >>> >>> >> >>>> Gwen,
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
>> >>> >>> >> >>>> version
>> >>> >>> >> change is
>> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since
>> it's
>> >>> >>> >> >>>> the
>> >>> >>> only
>> >>> >>> >> one
>> >>> >>> >> >>>> that doesn't potentially break older clients. The protocol
>> >>> >>> >> >>>> guide
>> >>> >>> >> says that
>> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
>> so
>> >>> >>> >> >>>> old
>> >>> >>> >> clients,
>> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
>> should
>> >>> >>> >> >>>> be
>> >>> >>> >> able to
>> >>> >>> >> >>>> work with a new broker version.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
>> change
>> >>> >>> >> >>>> is the
>> >>> >>> >> right
>> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>> >>> >>> >> >>>> think is
>> >>> >>> >> better --
>> >>> >>> >> >>>> the broker should reject requests with invalid values. I
>> >>> >>> >> >>>> think Joe
>> >>> >>> >> and I
>> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
>> >>> >>> >> >>>> missing
>> >>> >>> from
>> >>> >>> >> his
>> >>> >>> >> >>>> description is how long we're going to maintain his "case
>> 0"
>> >>> >>> >> behavior. It's
>> >>> >>> >> >>>> impractical to maintain old versions forever, but it
>> sounds
>> >>> >>> >> >>>> like
>> >>> >>> >> there
>> >>> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
>> >>> >>> >> >>>> that's
>> >>> >>> >> another
>> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior
>> need to
>> >>> >>> >> >>>> be
>> >>> >>> >> listed as
>> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
>> >>> >>> >> >>>> removed
>> >>> >>> >> should be
>> >>> >>> >> >>>> specified so users can understand which versions are
>> >>> >>> >> >>>> guaranteed to
>> >>> >>> be
>> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>> >>> >>> >> >>>> clients.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> -Ewen
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>> >>> >>> >> dana.pow...@gmail.com> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>> clients don't break on unknown errors
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> maybe true for the official java clients, but I dont
>> think
>> >>> >>> >> >>>>> the
>> >>> >>> >> assumption
>> >>> >>> >> >>>>> holds true for community-maintained clients and users of
>> >>> >>> >> >>>>> those
>> >>> >>> >> clients.
>> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
>> and
>> >>> >>> >> >>>>> raises
>> >>> >>> >> an
>> >>> >>> >> >>>>> exception on any unrecognized error code in any server
>> >>> >>> >> >>>>> response.
>> >>> >>> >> in this
>> >>> >>> >> >>>>> case, kafka-python allows users to set their own
>> >>> >>> >> >>>>> required-acks
>> >>> >>> >> policy when
>> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
>> of
>> >>> >>> >> kafka-python
>> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>> >>> >>> production
>> >>> >>> >> >>>>> environments -- and for those users the new error code
>> will
>> >>> >>> >> >>>>> crash
>> >>> >>> >> their
>> >>> >>> >> >>>>> producer code.  I would not be surprised if the same were
>> >>> >>> >> >>>>> true of
>> >>> >>> >> other
>> >>> >>> >> >>>>> community clients.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there
>> isn't
>> >>> >>> >> >>>>> great
>> >>> >>> >> >>>>> documentation on what errors to expect for each request /
>> >>> >>> >> >>>>> response
>> >>> >>> >> -- so
>> >>> >>> >> >>>>> we
>> >>> >>> >> >>>>> use failures to alert that some error case is not handled
>> >>> >>> >> properly.  and
>> >>> >>> >> >>>>> because of that, introducing new error cases without
>> bumping
>> >>> >>> >> >>>>> the
>> >>> >>> api
>> >>> >>> >> >>>>> version is likely to cause those errors to get
>> raised/thrown
>> >>> >>> >> >>>>> all
>> >>> >>> >> the way
>> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
>> can
>> >>> >>> >> >>>>> fix
>> >>> >>> the
>> >>> >>> >> issues
>> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
>> it's
>> >>> >>> >> >>>>> not
>> >>> >>> the
>> >>> >>> >> ideal
>> >>> >>> >> >>>>> situation.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> -Dana
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>> >>> >>> >> gshap...@cloudera.com>
>> >>> >>> >> >>>>> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or
>> the
>> >>> >>> >> >>>>>> new
>> >>> >>> error
>> >>> >>> >> >>>>>> code?
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>> >>> >>> >> >>>>>> receive
>> >>> >>> >> errors
>> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
>> >>> >>> >> >>>>>> clients
>> >>> >>> >> don't
>> >>> >>> >> >>>>>> break on unknown errors, they are simple more
>> challenging
>> >>> >>> >> >>>>>> to
>> >>> >>> >> debug. If
>> >>> >>> >> >>>>>> we document the new behavior, then its definitely
>> >>> >>> >> >>>>>> debuggable and
>> >>> >>> >> >>>>>> fixable.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
>> >>> >>> >> >>>>>> acks > 1
>> >>> >>> >> were
>> >>> >>> >> >>>>>> never documented, and are not supported by Kafka clients
>> >>> >>> >> >>>>>> starting
>> >>> >>> >> with
>> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol
>> bump
>> >>> >>> >> >>>>>> either,
>> >>> >>> >> >>>>>> although its a better case than new error codes.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> Thanks,
>> >>> >>> >> >>>>>> Gwen
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>> >>> >>> joe.st...@stealth.ly>
>> >>> >>> >> >>>>>> wrote:
>> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>> >>> >>> >> >>>>>>> live on
>> >>> >>> >> because
>> >>> >>> >> >>>>>> they
>> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
>> to
>> >>> >>> >> >>>>>>> be
>> >>> >>> >> helping
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> >>>>>>> build the best client libraries they can).
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
>> existing
>> >>> >>> >> >>>>>>> functionality
>> >>> >>> >> >>>>>> of
>> >>> >>> >> >>>>>>> the client protocol, ever.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> There are many reasons for this and we have other
>> threads
>> >>> >>> >> >>>>>>> on the
>> >>> >>> >> >>>>>>> mailing
>> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
>> intended)
>> >>> >>> >> >>>>>>> that I
>> >>> >>> >> don't
>> >>> >>> >> >>>>>> want
>> >>> >>> >> >>>>>>> to re-hash here.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>> >>> >>> >> >>>>>>> format
>> >>> >>> >> (either)
>> >>> >>> >> >>>>>>> we
>> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
>> with
>> >>> >>> >> backward
>> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
>> time
>> >>> >>> >> >>>>>>> for
>> >>> >>> >> folks
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> >>>>>> deal
>> >>> >>> >> >>>>>>> with it.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> match version = {
>> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>> >>> >>> >> >>>>>>> case 1: doNewStuff()
>> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>> >>> >>> >> >>>>>>> }
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>> >>> >>> >> >>>>>>> flags can
>> >>> >>> be
>> >>> >>> >> >>>>>> construed
>> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>> >>> >>> >> different?)
>> >>> >>> >> >>>>>> solution
>> >>> >>> >> >>>>>>> to this.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
>> with
>> >>> >>> >> >>>>>>> this
>> >>> >>> >> change
>> >>> >>> >> >>>>>> then
>> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
>> with a
>> >>> >>> rolling
>> >>> >>> >> >>>>>> restart
>> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer
>> requests
>> >>> >>> >> >>>>>>> start
>> >>> >>> to
>> >>> >>> >> fail
>> >>> >>> >> >>>>>> and
>> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY*
>> >>> >>> >> >>>>>>> need
>> >>> >>> >> people to
>> >>> >>> >> >>>>>> start
>> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
>> going
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> work for
>> >>> >>> >> >>>>>>> everyone.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>> >>> >>> >> >>>>>>> brokers
>> >>> >>> >> and if
>> >>> >>> >> >>>>>>> we
>> >>> >>> >> >>>>>> are
>> >>> >>> >> >>>>>>> not going to support that then I am unclear what the
>> use
>> >>> >>> >> >>>>>>> of
>> >>> >>> >> "version"
>> >>> >>> >> >>>>>>> is
>> >>> >>> >> >>>>>>> based on our original intentions of having it because
>> of
>> >>> >>> >> >>>>>>> the
>> >>> >>> >> >>>>>>> 0.7=>-0.8.
>> >>> >>> >> >>>>>> We
>> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> - Joe Stein
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen
>> Cheslack-Postava <
>> >>> >>> >> >>>>>> e...@confluent.io>
>> >>> >>> >> >>>>>>> wrote:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>> >>> >>> >> >>>>>>>> similar to
>> >>> >>> >> what's
>> >>> >>> >> >>>>>>>> currently being discussed in
>> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649
>> where
>> >>> >>> >> >>>>>>>> users
>> >>> >>> >> now get
>> >>> >>> >> >>>>>>>> errors
>> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>> >>> >>> >> >>>>>>>> clients
>> >>> >>> won't
>> >>> >>> >> >>>>>>>> even
>> >>> >>> >> >>>>>>>> know
>> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
>> even
>> >>> >>> >> >>>>>>>> be
>> >>> >>> able
>> >>> >>> >> to
>> >>> >>> >> >>>>>>>> log
>> >>> >>> >> >>>>>>>> any
>> >>> >>> >> >>>>>>>> meaningful error messages.
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1
>> code,
>> >>> >>> >> >>>>>>>> but
>> >>> >>> >> silently
>> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
>> isn't
>> >>> >>> >> >>>>>>>> the
>> >>> >>> >> same as
>> >>> >>> >> >>>>>>>> other
>> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>> >>> >>> >> >>>>>>>> client
>> >>> >>> and
>> >>> >>> >> >>>>>>>> server
>> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
>> client
>> >>> >>> >> >>>>>>>> might
>> >>> >>> >> just
>> >>> >>> >> >>>>>>>> see
>> >>> >>> >> >>>>>>>> some increased latency since the message might need
>> to be
>> >>> >>> >> replicated
>> >>> >>> >> >>>>>>>> to
>> >>> >>> >> >>>>>>>> more brokers than they requested.
>> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>> >>> >>> >> >>>>>>>> protocol
>> >>> >>> >> version
>> >>> >>> >> >>>>>>>> on
>> >>> >>> >> >>>>>>>> that
>> >>> >>> >> >>>>>>>> message to include the new error code but maintains
>> both
>> >>> >>> >> >>>>>>>> old
>> >>> >>> (now
>> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that
>> would be
>> >>> >>> >> applied in
>> >>> >>> >> >>>>>>>> a
>> >>> >>> >> >>>>>>>> later release that removes the old protocol + code for
>> >>> >>> >> >>>>>>>> handling
>> >>> >>> >> acks
>> >>> >>> >> >>>>>> 1.
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
>> >>> >>> >> >>>>>>>> release
>> >>> >>> >> when
>> >>> >>> >> >>>>>> we'll
>> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
>> deprecation
>> >>> >>> >> >>>>>>>> it
>> >>> >>> >> makes
>> >>> >>> >> >>>>>> things
>> >>> >>> >> >>>>>>>> a
>> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
>> could
>> >>> >>> >> >>>>>>>> give
>> >>> >>> >> users
>> >>> >>> >> >>>>>> better
>> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>> >>> >>> >> >>>>>>>> release
>> >>> >>> >> behind
>> >>> >>> >> >>>>>>>> brokers,
>> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>> >>> >>> features).
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>> >>> >>> >> gshap...@cloudera.com>
>> >>> >>> >> >>>>>>>> wrote:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
>> >>> >>> >> >>>>>>>>> ack>1 on
>> >>> >>> >> the
>> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
>> everyone
>> >>> >>> >> >>>>>>>>> a
>> >>> >>> >> heads up
>> >>> >>> >> >>>>>>>>> on
>> >>> >>> >> >>>>>>>>> what these changes mean.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests
>> that
>> >>> >>> >> >>>>>>>>> include
>> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an
>> exception.
>> >>> >>> >> >>>>>>>>> This
>> >>> >>> >> will be
>> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>> >>> >>> >> >>>>>>>>> assume)
>> >>> >>> and
>> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>> >>> >>> >> >>>>>>>>> can't add
>> >>> >>> >> error
>> >>> >>> >> >>>>>>>>> codes retroactively).
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>> >>> >>> >> >>>>>>>>> (sync and
>> >>> >>> >> >>>>>>>>> new),
>> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
>> that
>> >>> >>> >> >>>>>>>>> relied
>> >>> >>> >> on
>> >>> >>> >> >>>>>>>>> acks
>> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch
>> to
>> >>> >>> >> >>>>>>>>> using
>> >>> >>> >> acks =
>> >>> >>> >> >>>>>>>>> -1
>> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
>> of
>> >>> >>> >> KAFKA-1555
>> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
>> or
>> >>> >>> concerns
>> >>> >>> >> >>>>>>>>> regarding this change.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Gwen
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> --
>> >>> >>> >> >>>>>>>> Thanks,
>> >>> >>> >> >>>>>>>> Ewen
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> --
>> >>> >>> >> >>>>>>> You received this message because you are subscribed to
>> >>> >>> >> >>>>>>> the
>> >>> >>> Google
>> >>> >>> >> >>>>>>> Groups
>> >>> >>> >> >>>>>>> "kafka-clients" group.
>> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving
>> emails
>> >>> >>> >> >>>>>>> from
>> >>> >>> it,
>> >>> >>> >> send
>> >>> >>> >> >>>>>>> an
>> >>> >>> >> >>>>>>> email to kafka-clients+unsubscr...@googlegroups.com.
>> >>> >>> >> >>>>>>> To post to this group, send email to
>> >>> >>> >> kafka-clie...@googlegroups.com.
>> >>> >>> >> >>>>>>> Visit this group at
>> >>> >>> http://groups.google.com/group/kafka-clients.
>> >>> >>> >> >>>>>>> To view this discussion on the web visit
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> >>> >>> >> >>>>>> .
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> For more options, visit
>> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> --
>> >>> >>> >> >>>>>> You received this message because you are subscribed to
>> the
>> >>> >>> Google
>> >>> >>> >> >>>>>> Groups
>> >>> >>> >> >>>>>> "kafka-clients" group.
>> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
>> >>> >>> >> >>>>>> from it,
>> >>> >>> >> send
>> >>> >>> >> >>>>>> an
>> >>> >>> >> >>>>>> email to kafka-clients+unsubscr...@googlegroups.com.
>> >>> >>> >> >>>>>> To post to this group, send email to
>> >>> >>> >> kafka-clie...@googlegroups.com.
>> >>> >>> >> >>>>>> Visit this group at
>> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>> >>> >>> .
>> >>> >>> >> >>>>>> To view this discussion on the web visit
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> >>> >>> >> >>>>>> .
>> >>> >>> >> >>>>>> For more options, visit
>> https://groups.google.com/d/optout.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> --
>> >>> >>> >> >>>> Thanks,
>> >>> >>> >> >>>> Ewen
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Thanks,
>> >>> >>> Ewen
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Thanks,
>> >>> >> Neha
>> >>
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups
>> >> "kafka-clients" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an
>> >> email to kafka-clients+unsubscr...@googlegroups.com.
>> >> To post to this group, send email to kafka-clie...@googlegroups.com.
>> >> Visit this group at http://groups.google.com/group/kafka-clients.
>> >> To view this discussion on the web visit
>> >>
>> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com
>> .
>> >>
>> >> For more options, visit https://groups.google.com/d/optout.
>> >
>> >
>>
>
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Reply via email to