Gwen,

I think the only option that wouldn't require a protocol version change is
the one where acks > 1 is converted to acks = -1 since it's the only one
that doesn't potentially break older clients. The protocol guide says that
the expected upgrade path is servers first, then clients, so old clients,
including non-java clients, that may be using acks > 1 should be able to
work with a new broker version.

It's more work, but I think dealing with the protocol change is the right
thing to do since it eventually gets us to the behavior I think is better
-- the broker should reject requests with invalid values. I think Joe and I
were basically in agreement. In my mind the major piece missing from his
description is how long we're going to maintain his "case 0" behavior. It's
impractical to maintain old versions forever, but it sounds like there
hasn't been a decision on how long to maintain them. Maybe that's another
item to add to KIPs -- protocol versions and behavior need to be listed as
deprecated and the earliest version in which they'll be removed should be
specified so users can understand which versions are guaranteed to be
compatible, even if they're using (well-written) non-java clients.

-Ewen


On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <dana.pow...@gmail.com> wrote:

> > clients don't break on unknown errors
>
> maybe true for the official java clients, but I dont think the assumption
> holds true for community-maintained clients and users of those clients.
>  kafka-python generally follows the fail-fast philosophy and raises an
> exception on any unrecognized error code in any server response.  in this
> case, kafka-python allows users to set their own required-acks policy when
> creating a producer instance.  It is possible that users of kafka-python
> have deployed producer code that uses ack>1 -- perhaps in production
> environments -- and for those users the new error code will crash their
> producer code.  I would not be surprised if the same were true of other
> community clients.
>
> *one reason for the fail-fast approach is that there isn't great
> documentation on what errors to expect for each request / response -- so we
> use failures to alert that some error case is not handled properly.  and
> because of that, introducing new error cases without bumping the api
> version is likely to cause those errors to get raised/thrown all the way
> back up to the user.  of course we (client maintainers) can fix the issues
> in the client libraries and suggest users upgrade, but it's not the ideal
> situation.
>
>
> long-winded way of saying: I agree w/ Joe.
>
> -Dana
>
>
> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > Is the protocol bump caused by the behavior change or the new error code?
> >
> > 1) IMO, error_codes are data, and clients can expect to receive errors
> > that they don't understand (i.e. unknown errors). AFAIK, clients don't
> > break on unknown errors, they are simple more challenging to debug. If
> > we document the new behavior, then its definitely debuggable and
> > fixable.
> >
> > 2) The behavior change is basically a deprecation - i.e. acks > 1 were
> > never documented, and are not supported by Kafka clients starting with
> > version 0.8.2. I'm not sure this requires a protocol bump either,
> > although its a better case than new error codes.
> >
> > Thanks,
> > Gwen
> >
> > On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <joe.st...@stealth.ly>
> wrote:
> > > Looping in the mailing list that the client developers live on because
> > they
> > > are all not on dev (though they should be if they want to be helping to
> > > build the best client libraries they can).
> > >
> > > I whole hardily believe that we need to not break existing
> functionality
> > of
> > > the client protocol, ever.
> > >
> > > There are many reasons for this and we have other threads on the
> mailing
> > > list where we are discussing that topic (no pun intended) that I don't
> > want
> > > to re-hash here.
> > >
> > > If we change wire protocol functionality OR the binary format (either)
> we
> > > must bump version AND treat version as a feature flag with backward
> > > compatibility support until it is deprecated for some time for folks to
> > deal
> > > with it.
> > >
> > > match version = {
> > > case 0: keepDoingWhatWeWereDoing()
> > > case 1: doNewStuff()
> > > case 2: doEvenMoreNewStuff()
> > > }
> > >
> > > has to be a practice we adopt imho ... I know feature flags can be
> > construed
> > > as "messy code" but I am eager to hear another (better? different?)
> > solution
> > > to this.
> > >
> > > If we don't do a feature flag like this specifically with this change
> > then
> > > what happens is that someone upgrades their brokers with a rolling
> > restart
> > > in 0.8.3 and every single one of their producer requests start to fail
> > and
> > > they have a major production outage. eeeek!!!!
> > >
> > > I do 100% agree that > 1 makes no sense and we *REALLY* need people to
> > start
> > > using 0,1,-1 but we need to-do that in a way that is going to work for
> > > everyone.
> > >
> > > Old producers and consumers must keep working with new brokers and if
> we
> > are
> > > not going to support that then I am unclear what the use of "version"
> is
> > > based on our original intentions of having it because of the 0.7=>-0.8.
> > We
> > > said no more breaking changes when we did that.
> > >
> > > - Joe Stein
> > >
> > > On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >>
> > >> Right, so this looks like it could create an issue similar to what's
> > >> currently being discussed in
> > >> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
> > >> errors
> > >> under conditions when they previously wouldn't. Old clients won't even
> > >> know
> > >> about the error code, so besides failing they won't even be able to
> log
> > >> any
> > >> meaningful error messages.
> > >>
> > >> I think there are two options for compatibility:
> > >>
> > >> 1. An alternative change is to remove the ack > 1 code, but silently
> > >> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
> > >> other
> > >> changes to behavior since the interaction between the client and
> server
> > >> remains the same, no error codes change, etc. The client might just
> see
> > >> some increased latency since the message might need to be replicated
> to
> > >> more brokers than they requested.
> > >> 2. Split this into two patches, one that bumps the protocol version on
> > >> that
> > >> message to include the new error code but maintains both old (now
> > >> deprecated) and new behavior, then a second that would be applied in a
> > >> later release that removes the old protocol + code for handling acks >
> > 1.
> > >>
> > >> 2 is probably the right thing to do. If we specify the release when
> > we'll
> > >> remove the deprecated protocol at the time of deprecation it makes
> > things
> > >> a
> > >> lot easier for people writing non-java clients and could give users
> > better
> > >> predictability (e.g. if clients are at most 1 major release behind
> > >> brokers,
> > >> they'll remain compatible but possibly use deprecated features).
> > >>
> > >>
> > >> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gshap...@cloudera.com>
> > >> wrote:
> > >>
> > >> > Hi Kafka Devs,
> > >> >
> > >> > We are working on KAFKA-1697 - remove code related to ack>1 on the
> > >> > broker. Per Neha's suggestion, I'd like to give everyone a heads up
> on
> > >> > what these changes mean.
> > >> >
> > >> > Once this patch is included, any produce requests that include
> > >> > request.required.acks > 1 will result in an exception. This will be
> > >> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> > >> > UnknownException in existing versions (sorry, but I can't add error
> > >> > codes retroactively).
> > >> >
> > >> > This behavior is already enforced by 0.8.2 producers (sync and new),
> > >> > but we expect impact on users with older producers that relied on
> acks
> > >> > > 1 and external clients (i.e python, go, etc).
> > >> >
> > >> > Users who relied on acks > 1 are expected to switch to using acks =
> -1
> > >> > and a min.isr parameter than matches their user case.
> > >> >
> > >> > This change was discussed in the past in the context of KAFKA-1555
> > >> > (min.isr), but let us know if you have any questions or concerns
> > >> > regarding this change.
> > >> >
> > >> > Gwen
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> Ewen
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "kafka-clients" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> an
> > > email to kafka-clients+unsubscr...@googlegroups.com.
> > > To post to this group, send email to kafka-clie...@googlegroups.com.
> > > Visit this group at http://groups.google.com/group/kafka-clients.
> > > To view this discussion on the web visit
> > >
> >
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> > .
> > >
> > > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "kafka-clients" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to kafka-clients+unsubscr...@googlegroups.com.
> > To post to this group, send email to kafka-clie...@googlegroups.com.
> > Visit this group at http://groups.google.com/group/kafka-clients.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> > .
> > For more options, visit https://groups.google.com/d/optout.
> >
>



-- 
Thanks,
Ewen

Reply via email to