Hi Jun,

On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:
> The new consumer api will also allow you to do what you want in a
> SimpleConsumer (e.g., subscribe to a static set of partitions, control
> initial offsets, etc), only more conveniently.
Yeah, I have reviewed the available javadocs for the new Kafka 0.9 consumer
APIs. While they would still allow me to do roughly what I want, I fear they
would result in a much worse performing implementation on my side.

The main problem in my scenario is that consumer requests arrive via
stateless HTTP requests (each request is standalone and specifies the
topics+partitions+offsets to read data from), so I need a good way to pool
connections to the Kafka backend for good performance. The SimpleConsumer
allows me to do that; an approach based on the new Kafka 0.9 consumer API
seems to carry a lot more overhead.

Basically, what I am looking for is a way to pool connections per Kafka
broker host, independent of topics/partitions/clients/...: if I have N Kafka
broker hosts, each Tomcat app server would keep N disjoint connection pools.
I would then keep some central metadata telling me which host is the leader
for which topic+partition. For an incoming HTTP client request I'd take a
Kafka connection from the pool for that particular broker host, request the
data and return the connection to the pool. This means that a Kafka broker
host will get requests from lots of different end consumers over the same
TCP connection (sequentially, of course).

With the new Kafka consumer API I would have to subscribe/unsubscribe from
topics every time I take a connection from the pool, and as a request may
need to go to a different broker host than the last one, that wouldn't even
avoid all of the connection/reconnection overhead. I guess I could create
one dedicated connection pool per topic-partition, which should minimize
connection/reconnection overhead, but then I'd end up with hundreds of
connection pools per app server, which isn't a good approach either. All in
all, the planned design of the new consumer API just doesn't seem to fit my
use case well.
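Just to illustrate what I mean, the per-broker pooling could look roughly
like the sketch below. Note that `BrokerConnectionPools` and the connection
factory are placeholders of my own, not an actual Kafka API; the generic
type would be whatever client object wraps the TCP connection (e.g. a
SimpleConsumer instance).

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: one bounded pool of connections per Kafka broker host,
// independent of topics/partitions. C is a placeholder for whatever
// client object wraps the TCP connection (e.g. a SimpleConsumer).
class BrokerConnectionPools<C> {
    private final int poolSize;
    private final Function<String, C> connectionFactory;
    private final Map<String, BlockingQueue<C>> pools = new ConcurrentHashMap<>();

    BrokerConnectionPools(int poolSize, Function<String, C> connectionFactory) {
        this.poolSize = poolSize;
        this.connectionFactory = connectionFactory;
    }

    // Borrow a connection to the broker that leads the requested
    // topic-partition; blocks until one is free.
    C borrow(String brokerHost) throws InterruptedException {
        BlockingQueue<C> pool = pools.computeIfAbsent(brokerHost, host -> {
            BlockingQueue<C> q = new ArrayBlockingQueue<>(poolSize);
            for (int i = 0; i < poolSize; i++) {
                q.add(connectionFactory.apply(host));
            }
            return q;
        });
        return pool.take();
    }

    // Return the connection once the HTTP request has been served.
    void release(String brokerHost, C connection) {
        pools.get(brokerHost).offer(connection);
    }
}
```

With N broker hosts this yields exactly N pools per app server, and a
request only ever touches the pool of the current partition leader.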
Which is why I am a bit anxious about the SimpleConsumer API being
deprecated. Or am I missing something here?

Thanks!

Greetings
Valentin

> On Mon, Sep 22, 2014 at 8:10 AM, Valentin <kafka-9999...@sblk.de> wrote:
>
>> Hello,
>>
>> I am currently working on a Kafka implementation and have a couple of
>> questions concerning the road map for the future.
>> As I am unsure where to put such questions, I decided to try my luck on
>> this mailing list. If this is the wrong place for such inquiries, I
>> apologize. In that case it would be great if someone could offer some
>> pointers as to where to find/get these answers.
>>
>> So, here I go :)
>>
>> 1) Consumer Redesign in Kafka 0.9
>> I found a number of documents explaining planned changes to the consumer
>> APIs for Kafka version 0.9. However, these documents only mention the
>> high level consumer implementations. Does anyone know if the
>> kafka.javaapi.consumer.SimpleConsumer API/implementation will also
>> change with 0.9? Or will that stay more or less as it is now?
>>
>> 2) Pooling of Kafka Connections - SimpleConsumer
>> As I have a use case where the connection between the final consumers
>> and Kafka needs to happen via HTTP, I am concerned about the performance
>> implications of the required HTTP wrapping. I am planning to implement a
>> custom HTTP API for Kafka producers and consumers which will be
>> stateless and where offset tracking will be done on the final consumer
>> side. The question here would be whether anyone has experience with
>> pooling connections to Kafka brokers in order to reuse them effectively
>> for incoming, stateless HTTP REST calls. An idea here would be to have
>> one connection pool per broker host and to keep a set of open
>> consumers/connections for each broker in those pools. Once I know which
>> broker is the leader for the topic partition requested by a REST call, I
>> could then use an already existing consumer/connection from that pool to
>> process the REST call and afterwards return it to the pool. That way I'd
>> be able to handle REST calls completely statelessly without having to
>> open/close Kafka connections all the time.
>>
>> 3) Pooling of Kafka Connections - KafkaConsumer (Kafka 0.9)
>> Now let's assume I want to implement the idea from 2) but with the high
>> level KafkaConsumer (to leave identification of partition leaders and
>> error handling to it). Are any implementation details already
>> known/decided on how the subscribe, unsubscribe and seek methods will
>> work internally? Would I be able to somehow reuse connected
>> KafkaConsumer objects in connection pools? Could I, for example, call
>> subscribe/unsubscribe/seek for each HTTP request on a consumer to switch
>> topics/partitions to the currently needed set, or would this be a very
>> expensive operation (e.g. because it would fetch metadata from Kafka to
>> identify the leader for each partition)?
>>
>> Greetings
>> Valentin
>>
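PS: to make the concern from question 3 concrete, the per-request churn I'm
worried about looks roughly like the sketch below. `PartitionReader` is just
a placeholder interface of mine, not the actual planned 0.9 API (whose
subscribe/seek signatures may well differ); the point is only that
subscription state has to be rebuilt and torn down on every single HTTP
request.

```java
import java.util.List;

// Placeholder stand-in for the planned 0.9 consumer; real method
// names/signatures may differ once the API is finalized.
interface PartitionReader {
    void subscribe(String topic, int partition);
    void seek(String topic, int partition, long offset);
    List<String> poll(long timeoutMs);
    void unsubscribe(String topic, int partition);
}

class StatelessFetchHandler {
    // Serve one stateless HTTP request: topic, partition and offset all
    // come from the request itself, so the pooled reader's subscription
    // state must be set up and torn down on every call.
    static List<String> handle(PartitionReader reader,
                               String topic, int partition, long offset) {
        reader.subscribe(topic, partition);
        reader.seek(topic, partition, offset);
        try {
            return reader.poll(100);
        } finally {
            // Otherwise the next request, possibly for a different
            // topic/partition, would inherit stale subscription state.
            reader.unsubscribe(topic, partition);
        }
    }
}
```

If subscribe/seek each trigger a metadata round-trip to find the partition
leader, this pattern would add per-request latency that the SimpleConsumer
pooling approach avoids.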