Perhaps a bit hacky, but you could also reserve a specific correlationId
(maybe -1)
to represent errors and send back to the client an UnknownAPIResponse like:

Response => -1 UnknownAPIResponse

UnknownAPIResponse => originalCorrelationId errorCode

The benefit here would be that it does not break the current API and current
clients should be able to continue operating as usual as long as they ignore
unknown correlationIds and don't use the reserved Id.  For clients that
want to
catch unknownAPI errors, they can handle -1 correlationIds and dispatch as
needed.

Otherwise perhaps bump the Metadata Request / Response API and include
the supported api / versions in the Broker metadata:

Broker => NodeId Host Port [ApiKey ApiVersion] (any number of brokers may
be returned)
    NodeId => int32
    Host => string
    Port => int32
    ApiKey => int16
    ApiVersion => int16

So that Metadata includes the list of all supported API/Versions for each
broker
in the cluster.

And echo the problem with continuing with the current behavior pointed out
by Jay:
clients cannot know the difference between a network error and an unknown
API
error.  And because network errors require a lot of state resets, that can
be a
big performance hit.  Generally on a network error a client needs to assume
the
worst and reload cluster metadata at least.  And it is difficult to prevent
this happening
every time because the client doesn't know whether to avoid the API in the
future because it is not supported, or keep retrying because the network is
flaking.


Dana Powers
Rdio, Inc.
dana.pow...@rd.io
rdio.com/people/dpkp/

On Mon, Jan 12, 2015 at 3:51 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yeah I totally agree--using the closed socket to indicate "not supported"
> does work since any network error could lead to that.
>
> Arguably we should have a request level error. We discussed this at the
> time we were defining the protocols for 0.8 and the conclusion was not to
> do that. The reasoning was that since almost all the requests end up having
> errors at either a per-topic or per-partition level this makes correctly
> setting/interpreting the global error a bit confusing. I.e. if you are
> implementing a client and a given partition gets an error but there is no
> global error, what do you do? Likewise in most cases it is a bit ambiguous
> how to set the global error on the server side (i.e. if some partitions are
> unavailable but some are available). The result was that error reporting is
> defined per-request.
>
> We could change this now, but it would mean bumping compatibility on all
> the apis to add the new field which would be annoying to people, right? I
> actually agree it might have been better to do it this way both for this
> and also to make generic error handling easier but I'm not sure if it is
> worth such a big break now. The other proposal, introducing a
> get_protocol_versions() method seems almost as good for probing for support
> and is much less invasive. That seems better to me because I think
> generally clients shouldn't need to do this, they should just build against
> a minimum Kafka version and trust it will keep working into the future.
>
> -Jay
>
> On Mon, Jan 12, 2015 at 2:24 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > > I think #1 may be tricky in practice. The response format is always
> > > dictated by the request format so how do I know if the bytes I got back
> > are
> > > a valid response to the given request or are the
> UnknownRequestResponse?
> >
> > On the other hand, from the client developer perspective, having to
> > figure out that you are looking at a closed socket because you tried
> > to use an API that wasn't implemented in a specific version can be
> > pretty annoying.
> >
> > Another way to do it is to move error_code field (currently
> > implemented in pretty much every single Response schema) to the
> > Response Header, and then we could use it for "meta errors" such as
> > UnknownAPI.
> >
> > Its a much bigger change than adding a new Request type, but possibly
> > worth it?
> >
> > >
> > > #2 would be a good fix for the problem I think. This might be a good
> > > replacement for the echo api and would probably serve the same purpose
> > > (checking if the server is alive).
> > >
> > > #3 is a little dangerous because we actually want clients to only pay
> > > attention to the protocol versions which are per-api, not the server
> > > version. I.e. we actually don't want the client to do something like
> > check
> > > serverVersion.equals("0.8.2") because we want to be able to release the
> > > server at will and have it keep answering protocols in a backwards
> > > compatible way. I.e. a client that uses just metadata request and
> produce
> > > request should only care about the version of these protocols it
> > implements
> > > being supported not about the version of the server or the version of
> any
> > > other protocol it doesn't use. This is the rationale behind versioning
> > the
> > > apis independently rather than having a single protocol version that we
> > > would have to bump every time an internal broker-broker protocol
> changed.
> > >
> > > -Jay
> > >
> > > On Mon, Jan 12, 2015 at 1:32 PM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> > >
> > >> We ran into similar difficulties, both when trying to get Kafka to use
> > >> new APIs when available and when testing the wire protocol.
> > >>
> > >> +1 for all three suggestions.
> > >>
> > >> #1 sounds like the bare minimum, but I'm not sure how much it will
> > >> complicate the clients (now we expect either a response or an Unknown
> > >> message and need to be able to distinguish between them from the byte
> > >> array).
> > >>
> > >> #2 and #3 both makes lots of sense.
> > >>
> > >> Gwen
> > >>
> > >>
> > >> On Mon, Jan 12, 2015 at 1:15 PM, Dana Powers <dana.pow...@rd.io>
> wrote:
> > >> > Hi all -- continuing on the compatibility discussion:
> > >> >
> > >> > I've found that it is very difficult to identify when a server does
> > not
> > >> > recognize an api
> > >> > (I'm using kafka-python to submit wire-protocol requests).  For
> > example,
> > >> > when I
> > >> > send a ConsumerMetadataRequest to an 0.8.1.1 server, I get a closed
> > >> socket
> > >> > *[stacktrace below].  The server raises an error internally, but
> does
> > not
> > >> > send any
> > >> > meaningful response.  I'm not sure whether this is the intended
> > behavior,
> > >> > but
> > >> > maintaining clients in an ecosystem of multiple server versions with
> > >> > different API
> > >> > support it would be great to have a way to determine what the server
> > >> > supports
> > >> > and what it does not.
> > >> >
> > >> > Some suggestions:
> > >> >
> > >> > (1) An UnknownAPIResponse that is returned for any API or API
> Version
> > >> > request
> > >> > that is unsupported.
> > >> >
> > >> > (2) A server metadata API to get the list of supported APIs and/or
> API
> > >> > versions supported.
> > >> >
> > >> > (3) A server metadata API to get the published version of the server
> > >> (0.8.2
> > >> > v. 0.8.1.1, etc).
> > >> >
> > >> >
> > >> > Thoughts?
> > >> >
> > >> >
> > >> > Dana Powers
> > >> > Rdio, Inc.
> > >> > dana.pow...@rd.io
> > >> > rdio.com/people/dpkp/
> > >> >
> > >> > *stacktrace:
> > >> > ```
> > >> > [2015-01-12 13:03:55,719] ERROR Closing socket for /127.0.0.1
> > because of
> > >> > error (kafka.network.Processor)
> > >> > kafka.common.KafkaException: Wrong request type 10
> > >> >    at
> kafka.api.RequestKeys$.deserializerForKey(RequestKeys.scala:57)
> > >> >    at
> > >> kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:53)
> > >> >    at kafka.network.Processor.read(SocketServer.scala:353)
> > >> >    at kafka.network.Processor.run(SocketServer.scala:245)
> > >> >    at java.lang.Thread.run(Thread.java:722)
> > >> > ```
> > >>
> >
>

Reply via email to