I'm working on the C# client.  The current Kafka Protocol page says this:

"it should not generally be necessary to maintain multiple connections to a
single broker from a single client instance (i.e. connection pooling)"

But then says this:

"The server guarantees that on a single TCP connection, requests will be
processed in the order they are sent and responses will return in that
order as well".

(https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol)

Given that fetch requests can be long-polling, these two statements are in
tension, are they not?  E.g. if I issue a long-polling fetch to one broker
for a topic partition and then need to issue another request to that same
broker for any other reason (fetch/produce/metadata), my second request
will hang until my long poll times out or returns data.  (I either need to
use an unacceptably low poll timeout for the first request, accept an
unacceptably high latency for any second request to that broker, or
maintain multiple connections to a single broker, i.e. connection pooling.)

Three things:

1. Am I just missing something obvious?
2. Is this changing in 0.9?  I know the consumer is getting a redesign, but
is this broker issue addressed in some way?
3. Is this ordering over all requests even useful?

On #3 the documentation goes on to say:  "The broker's request processing
allows only a single in-flight request per connection in order to guarantee
this ordering"

As far as I can tell, order preservation is valuable only for produce
requests, and only per topic-partition.  What else needs it?  In
particular, once a fetch request goes to purgatory, why would the broker
not continue processing other incoming requests?  (And of what actual use
is a "correlation id" when every response the server sends is always for
the oldest in-flight request?)
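For what it's worth, the correlation id would earn its keep the moment more
than one request could be in flight.  Here's a hedged Java sketch of the
bookkeeping I'd expect a pipelining client to do (the types and method
names are mine, not the Kafka wire format): pending futures are keyed by
id, and responses are matched by id rather than by arrival order:

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch: with pipelining, responses may complete out of order, and the
// correlation id is what routes each response back to the right caller.
// Hypothetical names; not the real Kafka client or wire format.
public class CorrelationDemo {
    static Map<Integer, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();
    static int nextId = 0;

    static CompletableFuture<String> send(String request) {
        int id = nextId++; // stamp the request with a correlation id
        CompletableFuture<String> pending = new CompletableFuture<>();
        inFlight.put(id, pending);
        return pending;
    }

    // Called when a response frame arrives; the id, not arrival order,
    // decides which request it answers.
    static void onResponse(int correlationId, String body) {
        inFlight.remove(correlationId).complete(body);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> fetch = send("fetch");     // id 0
        CompletableFuture<String> produce = send("produce"); // id 1
        // Responses arrive out of order: produce first, then fetch.
        onResponse(1, "produce-response");
        onResponse(0, "fetch-response");
        System.out.println(fetch.get());
        System.out.println(produce.get());
    }
}
```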

This problem affects the C# client (kafka-net), among others.
Brilliantly, after the C# consumer returns fetched messages to its caller,
it immediately reissues another fetch request in the background, but this
brilliance backfires because of the broker behavior described above.  Any
attempt to publish to another topic while processing the consumed messages
will suffer a mysterious, sudden 95% drop in performance if the two topic
partitions happen to live on the same broker.  The only solution seems to
be to implement connection pooling.  This seems wrong.
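If pooling really is the answer, the minimal version I have in mind looks
like this Java sketch: give long-poll fetches their own dedicated
connection per broker, so produce/metadata requests on a second connection
never queue behind a parked fetch.  Again, the single-threaded executors
are stand-ins for real broker sockets, and all names are hypothetical:

```java
import java.util.concurrent.*;

// Minimal per-broker "pool" of two connections: long-poll fetches get a
// dedicated connection so other requests are never stuck behind them.
// Each single-threaded executor models one socket's in-order,
// one-in-flight processing.
public class BrokerConnectionPool {
    private final ExecutorService fetchConnection = Executors.newSingleThreadExecutor();
    private final ExecutorService requestConnection = Executors.newSingleThreadExecutor();

    public Future<String> fetch(long pollMillis) {
        return fetchConnection.submit(() -> {
            Thread.sleep(pollMillis); // long poll parks here, on its own socket
            return "fetch done";
        });
    }

    public Future<String> produce(String payload) {
        // Never queued behind a long poll: separate connection.
        return requestConnection.submit(() -> "acked: " + payload);
    }

    public void close() {
        fetchConnection.shutdown();
        requestConnection.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BrokerConnectionPool pool = new BrokerConnectionPool();
        long start = System.currentTimeMillis();
        pool.fetch(500);                        // long poll on its own socket
        String ack = pool.produce("msg").get(); // completes right away
        System.out.println(ack + " in " + (System.currentTimeMillis() - start) + " ms");
        pool.close();
    }
}
```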

Despite the note that "it should not generally be necessary to maintain
multiple connections to a single broker", the Java (Scala) SimpleConsumer
appears to open a separate connection for each instance anyway.

So is the correct solution to have the C# client try to transparently
manage multiple connections to the broker, or is it to have the broker more
intelligently use a single connection?

Thanks in advance and my apologies if this has been discussed elsewhere on
the list.  I searched but couldn't find anything.
