Hi Jun,

On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:
> The new consumer api will also allow you to do what you want in a
> SimpleConsumer (e.g., subscribe to a static set of partitions, control
> initial offsets, etc), only more conveniently.
Yeah, I have reviewed the available javadocs for the new Kafka 0.9 consumer
APIs. While they would still allow me to do roughly what I want, I fear they
would result in a much worse performing implementation on my side.

The main problem in my scenario is that consumer requests arrive via
stateless HTTP requests (each request is standalone and specifies the
topics+partitions+offsets to read data from), so I need a good way to pool
connections to the Kafka backend for good performance. The SimpleConsumer
allows me to do that; an approach based on the new Kafka 0.9 consumer API
seems to carry a lot more overhead.

Basically, what I am looking for is a way to pool connections per Kafka
broker host, independent of topics/partitions/clients/...: if I have N Kafka
broker hosts, each Tomcat app server would keep N disjoint connection pools.
I would then keep some central metadata telling me which host is the leader
for which topic+partition. For an incoming HTTP client request I'd take a
Kafka connection from the pool for that particular broker host, request the
data and return the connection to the pool. This means that a Kafka broker
host will get requests from lots of different end consumers over the same
TCP connection (sequentially, of course).

With the new Kafka consumer API I would have to subscribe/unsubscribe from
topics every time I take a connection from the pool, and as a request may
need to go to a different broker host than the last one, that wouldn't even
avoid all of the connection/reconnection overhead. I guess I could create
one dedicated connection pool per topic-partition, which should minimize
connection/reconnection overhead, but then I'd end up with hundreds of
connection pools per app server, which isn't a good approach either. All in
all, the planned design of the new consumer API just doesn't seem to fit my
use case well.
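Just to illustrate what I mean, the per-broker pooling could look roughly
like the sketch below. Note that `BrokerConnectionPools` and the connection
factory are placeholders of my own, not an actual Kafka API; the generic
type would be whatever client object wraps the TCP connection (e.g. a
SimpleConsumer instance).

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: one bounded pool of connections per Kafka broker host,
// independent of topics/partitions. C is a placeholder for whatever
// client object wraps the TCP connection (e.g. a SimpleConsumer).
class BrokerConnectionPools<C> {
    private final int poolSize;
    private final Function<String, C> connectionFactory;
    private final Map<String, BlockingQueue<C>> pools = new ConcurrentHashMap<>();

    BrokerConnectionPools(int poolSize, Function<String, C> connectionFactory) {
        this.poolSize = poolSize;
        this.connectionFactory = connectionFactory;
    }

    // Borrow a connection to the broker that leads the requested
    // topic-partition; blocks until one is free.
    C borrow(String brokerHost) throws InterruptedException {
        BlockingQueue<C> pool = pools.computeIfAbsent(brokerHost, host -> {
            BlockingQueue<C> q = new ArrayBlockingQueue<>(poolSize);
            for (int i = 0; i < poolSize; i++) {
                q.add(connectionFactory.apply(host));
            }
            return q;
        });
        return pool.take();
    }

    // Return the connection once the HTTP request has been served.
    void release(String brokerHost, C connection) {
        pools.get(brokerHost).offer(connection);
    }
}
```

With N broker hosts this yields exactly N pools per app server, and a
request only ever touches the pool of the current partition leader.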
Which is why I am a bit anxious about the SimpleConsumer API being
deprecated. Or am I missing something here?

Thanks!

Greetings
Valentin

> On Mon, Sep 22, 2014 at 8:10 AM, Valentin <kafka-9999...@sblk.de> wrote:
>
>> Hello,
>>
>> I am currently working on a Kafka implementation and have a couple of
>> questions concerning the road map for the future.
>> As I am unsure where to put such questions, I decided to try my luck on
>> this mailing list. If this is the wrong place for such inquiries, I
>> apologize. In that case it would be great if someone could offer some
>> pointers as to where to find/get these answers.
>>
>> So, here I go :)
>>
>> 1) Consumer Redesign in Kafka 0.9
>> I found a number of documents explaining planned changes to the consumer
>> APIs for Kafka version 0.9. However, these documents only mention the
>> high level consumer implementations. Does anyone know if the
>> kafka.javaapi.consumer.SimpleConsumer API/implementation will also
>> change with 0.9? Or will that stay more or less as it is now?
>>
>> 2) Pooling of Kafka Connections - SimpleConsumer
>> As I have a use case where the connection between the final consumers
>> and Kafka needs to happen via HTTP, I am concerned about the performance
>> implications of the required HTTP wrapping. I am planning to implement a
>> custom HTTP API for Kafka producers and consumers which will be
>> stateless and where offset tracking will be done on the final consumer
>> side. The question here would be whether anyone has experience with
>> pooling connections to Kafka brokers in order to reuse them effectively
>> for incoming, stateless HTTP REST calls. An idea here would be to have
>> one connection pool per broker host and to keep a set of open
>> consumers/connections for each broker in those pools. Once I know which
>> broker is the leader for the topic partition requested by a REST call, I
>> could then use an already existing consumer/connection from that pool to
>> process the REST call and afterwards return it to the pool. That way I'd
>> be able to handle REST calls completely statelessly without having to
>> open/close Kafka connections all the time.
>>
>> 3) Pooling of Kafka Connections - KafkaConsumer (Kafka 0.9)
>> Now let's assume I want to implement the idea from 2) but with the high
>> level KafkaConsumer (to leave identification of partition leaders and
>> error handling to it). Are any implementation details already
>> known/decided on how the subscribe, unsubscribe and seek methods will
>> work internally? Would I be able to somehow reuse connected
>> KafkaConsumer objects in connection pools? Could I, for example, call
>> subscribe/unsubscribe/seek for each HTTP request on a consumer to switch
>> topics/partitions to the currently needed set, or would this be a very
>> expensive operation (e.g. because it would fetch metadata from Kafka to
>> identify the leader for each partition)?
>>
>> Greetings
>> Valentin
>>
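PS: to make the concern from question 3 concrete, the per-request churn I'm
worried about looks roughly like the sketch below. `PartitionReader` is just
a placeholder interface of mine, not the actual planned 0.9 API (whose
subscribe/seek signatures may well differ); the point is only that
subscription state has to be rebuilt and torn down on every single HTTP
request.

```java
import java.util.List;

// Placeholder stand-in for the planned 0.9 consumer; real method
// names/signatures may differ once the API is finalized.
interface PartitionReader {
    void subscribe(String topic, int partition);
    void seek(String topic, int partition, long offset);
    List<String> poll(long timeoutMs);
    void unsubscribe(String topic, int partition);
}

class StatelessFetchHandler {
    // Serve one stateless HTTP request: topic, partition and offset all
    // come from the request itself, so the pooled reader's subscription
    // state must be set up and torn down on every call.
    static List<String> handle(PartitionReader reader,
                               String topic, int partition, long offset) {
        reader.subscribe(topic, partition);
        reader.seek(topic, partition, offset);
        try {
            return reader.poll(100);
        } finally {
            // Otherwise the next request, possibly for a different
            // topic/partition, would inherit stale subscription state.
            reader.unsubscribe(topic, partition);
        }
    }
}
```

If subscribe/seek each trigger a metadata round-trip to find the partition
leader, this pattern would add per-request latency that the SimpleConsumer
pooling approach avoids.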