Re: Writing a client: Connection pooling

Guozhang Wang Mon, 18 May 2015 10:38:35 -0700

Hi Warren,

Maybe I was a little misleading in my previous email:


1. The statement that "The server guarantees that on a single TCP
connection, requests will be processed in the order" is still valid. On a
single connection, all types of requests (produce/fetch/metadata) are
handled in order.

2. As for the clients, it is just a matter of whether or not you want to
have order preserving across different requests, which is orthogonal to the
protocol/server-side guarantees. The statement that "it should not
generally be necessary to maintain multiple connections to a single broker
from a single client instance" is a generally suggestion, not restriction
on client implementation. In fact, if your client acts as both a producer
and a consumer, and you do not need to make sure the produce and fetch
requests' ordering needs to be preserved, but rather only the ordering
within producer requests and the ordering within fetch requests to be
honored, then you'd better use separate channels for these two types of
requests, otherwise as you noticed a long pooling request can block
subsequent produce requests. As an example, the current Java consumer
client used a single channel for each broker, but people are discussing
about using another separate channel to the broker acting as the consumer
coordinator, etc. Again, this is just the client implementation details, of
how you would like to make use of the protocol guarantees while considering
performance, usability, etc.

Guozhang


On Thu, May 14, 2015 at 3:34 PM, Warren Falk <war...@warrenfalk.com> wrote:

> The C# client (kafka-net) isn't written by me; I'm just working on it.  It
> has a separate producer and consumer, but the client is designed to connect
> to each broker exactly once and then reuse (multiplex over) that one
> connection across all consumers and producers.
>
> It's a bit disappointing to see such a great feature of the Kafka protocol
> be abandoned.  It seems such a shame to implement request/response
> correlation and turn around and incur the latency overhead of additional
> TCP handshakes anyway.  If requests didn't block (if the server guaranteed
> ordering only per partition etc.) then there would seem to be no reason to
> use separate channels.  Are we definitely giving up on that feature?
>
> I fear both C# clients will have to be mostly redesigned in light of this,
> which is doubly unfortunate because the C# clients don't seem to have
> enough development momentum behind them as it is. Subsequently, Kafka use
> in .Net environments is still extremely rare.
>
>
> On Wed, May 13, 2015 at 11:55 AM, Guozhang Wang <wangg...@gmail.com>
> wrote:
>
> > Hello Warren,
> >
> > I seems your C# client is both a producer and a consumer. Then with the
> > behavior of the broker, your suspension is correct that a long pooling
> > fetch using the same TCP connection will block subsequent produce /
> > metadata requests.
> >
> > I think the statement that "it should not generally be necessary to
> > maintain multiple connections ..." is not valid anymore if your client
> acts
> > both as a producer and a consumer. In fact, in the 0.9 Java clients
> > (producer and consumer), we already could possibly maintain multiple
> > connections to a single broker even though the client only send produce
> or
> > fetch requests, because we need a separate channel for consumer
> > coordinator, etc, and we have also once discussed about using a separate
> > channel for metadata refresh.
> >
> > So I think we should modify the above statement in the wiki. Thanks for
> > pointing out.
> >
> > Guozhang
> >
> > On Wed, May 13, 2015 at 7:44 AM, Warren Falk <war...@warrenfalk.com>
> > wrote:
> >
> > > I'm working on the C# client.  The current Kafka Protocol page says
> this:
> > >
> > > "it should not generally be necessary to maintain multiple connections
> > to a
> > > single broker from a single client instance (i.e. connection pooling)"
> > >
> > > But then says this:
> > >
> > > "The server guarantees that on a single TCP connection, requests will
> be
> > > processed in the order they are sent and responses will return in that
> > > order as well".
> > >
> > > (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
> > > )
> > >
> > > Given that fetch requests can be long-polling, these two statements are
> > > mutually exclusive, are they not?  E.g. if I issue a long polling fetch
> > to
> > > one broker for a topic partition and then need to issue another request
> > to
> > > that same broker for any other reason (fetch/produce/metadata), my
> second
> > > request will hang until my long poll times out.  (I either need to use
> an
> > > unacceptably low poll timeout for the first request or I have to accept
> > an
> > > unacceptably high latency for any second request to that broker, or I
> > have
> > > to implement connection pooling and/or multiple connections to a single
> > > broker).
> > >
> > > Three things:
> > >
> > > 1. Am I just missing something obvious?
> > > 2. Is this changing in 0.9?  I know the consumer is getting a redesign,
> > but
> > > is this broker issue addressed in some way?
> > > 3. Is this ordering over all requests even useful?
> > >
> > > On #3 the documentation goes on to say:  "The broker's request
> processing
> > > allows only a single in-flight request per connection in order to
> > guarantee
> > > this ordering"
> > >
> > > As far as I can tell, order preservation is valuable only for produce
> > > requests and only per topic-partition. What else?  But especially once
> a
> > > fetch request goes to purgatory, why would the broker not continue
> > > processing other incoming requests?  (and of what actual use is a
> > > "correlation id" when all responses the server sends are always for the
> > > oldest in-flight request?)
> > >
> > > This problem affects the C# client (kafka-net) among other things.
> > > Brilliantly, after the C# consumer returns fetched messages to its
> > caller,
> > > it immediately reissues another fetch request in the background, but
> this
> > > brilliance ends up backfiring because of the broker's behavior
> mentioned
> > > above.  Any attempt to publish to another topic while processing the
> > > consumed messages will have a mysterious sudden 95% drop in performance
> > if
> > > the two topic partitions happen to be on the same broker.  The only
> > > solution seems to be to implement connection pooling.  This seems
> wrong.
> > >
> > > Despite the note that "it should not generally be necessary to maintain
> > > multiple connections to a single broker", the java (scala)
> SimpleConsumer
> > > appears to make a separate connection for each instance.
> > >
> > > So is the correct solution to have the C# client try to transparently
> > > manage multiple connections to the broker, or is it to have the broker
> > more
> > > intelligently use a single connection?
> > >
> > > Thanks in advance and my apologies if this has been discussed elsewhere
> > on
> > > the list.  I searched but couldn't find anything.
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: Writing a client: Connection pooling

Reply via email to