There are benefits of moving the error code to the response header.

1) I think it is the right thing to-do from an implementation perspective.
It makes the most sense. You send a request and you get back a response.
The response tells you something is wrong in the header.

2) With such a large change we can make sure we have our solution to solve
these issues (see other thread on Compatibility and KIP) setup and in place
moving forward. If we can make such a large change then smaller ones should
work well too. We could even use this one change as a way to best flush out
the way we want to implement it preserving functionality AND adding the new
response format. When we release 0.8.3 (assuming this was in there)
developers can read KIP-1 (or whatever) and decide if they want to support
the version bump required, if not then fine keep working with 0.8.2 and you
are good to go.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Mon, Jan 12, 2015 at 8:37 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yeah, adding it to the metadata request probably makes sense.
>
> What you describe of making it a per-broker field is technically correct,
> since each broker could be on a different software version. But I wonder if
> it might not be more usable to just give back a single list of api
> versions. This will be more compact and also easier to interpret as a
> client. An easy implementation of this would be for the broker that answers
> the metadata request by just giving whatever versions it supports. A
> slightly better implementation would be for each broker to register what it
> supports in ZK and have the responding broker give back the intersection
> (i.e. apis supported by all brokers).
>
> Since the broker actually supports multiple versions at the same time this
> will need to be in the form [ApiId [ApiVersion]].
>
> -Jay
>
> On Mon, Jan 12, 2015 at 5:19 PM, Dana Powers <dana.pow...@rd.io> wrote:
>
> > Perhaps a bit hacky, but you could also reserve a specific correlationId
> > (maybe -1)
> > to represent errors and send back to the client an UnknownAPIResponse
> like:
> >
> > Response => -1 UnknownAPIResponse
> >
> > UnknownAPIResponse => originalCorrelationId errorCode
> >
> > The benefit here would be that it does not break the current API and
> > current
> > clients should be able to continue operating as usual as long as they
> > ignore
> > unknown correlationIds and don't use the reserved Id.  For clients that
> > want to
> > catch unknownAPI errors, they can handle -1 correlationIds and dispatch
> as
> > needed.
> >
> > Otherwise perhaps bump the Metadata Request / Response API and include
> > the supported api / versions in the Broker metadata:
> >
> > Broker => NodeId Host Port [ApiKey ApiVersion] (any number of brokers may
> > be returned)
> >     NodeId => int32
> >     Host => string
> >     Port => int32
> >     ApiKey => int16
> >     ApiVersion => int16
> >
> > So that Metadata includes the list of all supported API/Versions for each
> > broker
> > in the cluster.
> >
> > And echo the problem with continuing with the current behavior pointed
> out
> > by Jay:
> > clients cannot know the difference between a network error and an unknown
> > API
> > error.  And because network errors require a lot of state resets, that
> can
> > be a
> > big performance hit.  Generally on a network error a client needs to
> assume
> > the
> > worst and reload cluster metadata at least.  And it is difficult to
> prevent
> > this happening
> > every time because the client doesn't know whether to avoid the API in
> the
> > future because it is not supported, or keep retrying because the network
> is
> > flaking.
> >
> >
> > Dana Powers
> > Rdio, Inc.
> > dana.pow...@rd.io
> > rdio.com/people/dpkp/
> >
> > On Mon, Jan 12, 2015 at 3:51 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Yeah I totally agree--using the closed socket to indicate "not
> supported"
> > > does work since any network error could lead to that.
> > >
> > > Arguably we should have a request level error. We discussed this at the
> > > time we were defining the protocols for 0.8 and the conclusion was not
> to
> > > do that. The reasoning was that since almost all the requests end up
> > having
> > > errors at either a per-topic or per-partition level this makes
> correctly
> > > setting/interpreting the global error a bit confusing. I.e. if you are
> > > implementing a client and a given partition gets an error but there is
> no
> > > global error, what do you do? Likewise in most cases it is a bit
> > ambiguous
> > > how to set the global error on the server side (i.e. if some partitions
> > are
> > > unavailable but some are available). The result was that error
> reporting
> > is
> > > defined per-request.
> > >
> > > We could change this now, but it would mean bumping compatibility on
> all
> > > the apis to add the new field which would be annoying to people,
> right? I
> > > actually agree it might have been better to do it this way both for
> this
> > > and also to make generic error handling easier but I'm not sure if it
> is
> > > worth such a big break now. The other proposal, introducing a
> > > get_protocol_versions() method seems almost as good for probing for
> > support
> > > and is much less invasive. That seems better to me because I think
> > > generally clients shouldn't need to do this, they should just build
> > against
> > > a minimum Kafka version and trust it will keep working into the future.
> > >
> > > -Jay
> > >
> > > On Mon, Jan 12, 2015 at 2:24 PM, Gwen Shapira <gshap...@cloudera.com>
> > > wrote:
> > >
> > > > > I think #1 may be tricky in practice. The response format is always
> > > > > dictated by the request format so how do I know if the bytes I got
> > back
> > > > are
> > > > > a valid response to the given request or are the
> > > UnknownRequestResponse?
> > > >
> > > > On the other hand, from the client developer perspective, having to
> > > > figure out that you are looking at a closed socket because you tried
> > > > to use an API that wasn't implemented in a specific version can be
> > > > pretty annoying.
> > > >
> > > > Another way to do it is to move error_code field (currently
> > > > implemented in pretty much every single Response schema) to the
> > > > Response Header, and then we could use it for "meta errors" such as
> > > > UnknownAPI.
> > > >
> > > > Its a much bigger change than adding a new Request type, but possibly
> > > > worth it?
> > > >
> > > > >
> > > > > #2 would be a good fix for the problem I think. This might be a
> good
> > > > > replacement for the echo api and would probably serve the same
> > purpose
> > > > > (checking if the server is alive).
> > > > >
> > > > > #3 is a little dangerous because we actually want clients to only
> pay
> > > > > attention to the protocol versions which are per-api, not the
> server
> > > > > version. I.e. we actually don't want the client to do something
> like
> > > > check
> > > > > serverVersion.equals("0.8.2") because we want to be able to release
> > the
> > > > > server at will and have it keep answering protocols in a backwards
> > > > > compatible way. I.e. a client that uses just metadata request and
> > > produce
> > > > > request should only care about the version of these protocols it
> > > > implements
> > > > > being supported not about the version of the server or the version
> of
> > > any
> > > > > other protocol it doesn't use. This is the rationale behind
> > versioning
> > > > the
> > > > > apis independently rather than having a single protocol version
> that
> > we
> > > > > would have to bump every time an internal broker-broker protocol
> > > changed.
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Mon, Jan 12, 2015 at 1:32 PM, Gwen Shapira <
> gshap...@cloudera.com
> > >
> > > > wrote:
> > > > >
> > > > >> We ran into similar difficulties, both when trying to get Kafka to
> > use
> > > > >> new APIs when available and when testing the wire protocol.
> > > > >>
> > > > >> +1 for all three suggestions.
> > > > >>
> > > > >> #1 sounds like the bare minimum, but I'm not sure how much it will
> > > > >> complicate the clients (now we expect either a response or an
> > Unknown
> > > > >> message and need to be able to distinguish between them from the
> > byte
> > > > >> array).
> > > > >>
> > > > >> #2 and #3 both makes lots of sense.
> > > > >>
> > > > >> Gwen
> > > > >>
> > > > >>
> > > > >> On Mon, Jan 12, 2015 at 1:15 PM, Dana Powers <dana.pow...@rd.io>
> > > wrote:
> > > > >> > Hi all -- continuing on the compatibility discussion:
> > > > >> >
> > > > >> > I've found that it is very difficult to identify when a server
> > does
> > > > not
> > > > >> > recognize an api
> > > > >> > (I'm using kafka-python to submit wire-protocol requests).  For
> > > > example,
> > > > >> > when I
> > > > >> > send a ConsumerMetadataRequest to an 0.8.1.1 server, I get a
> > closed
> > > > >> socket
> > > > >> > *[stacktrace below].  The server raises an error internally, but
> > > does
> > > > not
> > > > >> > send any
> > > > >> > meaningful response.  I'm not sure whether this is the intended
> > > > behavior,
> > > > >> > but
> > > > >> > maintaining clients in an ecosystem of multiple server versions
> > with
> > > > >> > different API
> > > > >> > support it would be great to have a way to determine what the
> > server
> > > > >> > supports
> > > > >> > and what it does not.
> > > > >> >
> > > > >> > Some suggestions:
> > > > >> >
> > > > >> > (1) An UnknownAPIResponse that is returned for any API or API
> > > Version
> > > > >> > request
> > > > >> > that is unsupported.
> > > > >> >
> > > > >> > (2) A server metadata API to get the list of supported APIs
> and/or
> > > API
> > > > >> > versions supported.
> > > > >> >
> > > > >> > (3) A server metadata API to get the published version of the
> > server
> > > > >> (0.8.2
> > > > >> > v. 0.8.1.1, etc).
> > > > >> >
> > > > >> >
> > > > >> > Thoughts?
> > > > >> >
> > > > >> >
> > > > >> > Dana Powers
> > > > >> > Rdio, Inc.
> > > > >> > dana.pow...@rd.io
> > > > >> > rdio.com/people/dpkp/
> > > > >> >
> > > > >> > *stacktrace:
> > > > >> > ```
> > > > >> > [2015-01-12 13:03:55,719] ERROR Closing socket for /127.0.0.1
> > > > because of
> > > > >> > error (kafka.network.Processor)
> > > > >> > kafka.common.KafkaException: Wrong request type 10
> > > > >> >    at
> > > kafka.api.RequestKeys$.deserializerForKey(RequestKeys.scala:57)
> > > > >> >    at
> > > > >>
> kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:53)
> > > > >> >    at kafka.network.Processor.read(SocketServer.scala:353)
> > > > >> >    at kafka.network.Processor.run(SocketServer.scala:245)
> > > > >> >    at java.lang.Thread.run(Thread.java:722)
> > > > >> > ```
> > > > >>
> > > >
> > >
> >
>

Reply via email to