Valentin,

That's a good point. We didn't have this use case in mind when designing the new consumer api. A straightforward implementation could be to remove the locally cached topic metadata for unsubscribed topics. It's probably possible to add a config value to avoid churn in caching the metadata. Could you file a jira so that we can track this?

Thanks,

Jun
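A minimal sketch of the per-request pattern being discussed in this thread, assuming one long-lived consumer whose assignment is swapped on each HTTP request. It is written against the new consumer API as it eventually shipped (assign/seek/poll); the draft javadocs referenced below exposed subscribe/unsubscribe on individual partitions instead, and the broker address, topic, partition and offset values here are made up:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PerRequestFetch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker address
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            // One long-lived consumer; the partition it reads from changes per request.
            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

            // Per HTTP request: point the consumer at the requested partition/offset and poll once.
            TopicPartition tp = new TopicPartition("clicks", 3);  // topic/partition from the request
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 42000L);                            // offset from the request
            ConsumerRecords<byte[], byte[]> records = consumer.poll(500);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                System.out.println(record.offset() + ": " + record.value().length + " bytes");
            }

            consumer.close();
        }
    }

Whether re-pointing a single consumer like this on every request stays cheap is exactly the metadata caching/churn question raised above.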
On Thu, Sep 25, 2014 at 4:19 AM, Valentin <kafka-9999...@sblk.de> wrote:
>
> Hi Jun, Hi Guozhang,
>
> hm, yeah, if the subscribe/unsubscribe is a smart and lightweight operation this might work. But if it needs to do any additional calls to fetch metadata during a subscribe/unsubscribe call, the overhead could get quite problematic. The main issue I still see here is that an additional layer is added which does not really provide any benefit for a use case like mine.
> I.e. the leader discovery and connection handling you mention below don't really offer value in this case, as with the connection pooling approach suggested, I will have to discover and maintain leader metadata in my own code anyway, as well as handle connection pooling. So if I understand the current plans for the Kafka 0.9 consumer correctly, it just doesn't work well for my use case. Sure, there are workarounds to make it work in my scenario, but I doubt any of them would scale as well as my current SimpleConsumer approach :|
> Or am I missing something here?
>
> Greetings
> Valentin
>
> On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <jun...@gmail.com> wrote:
> > Valentin,
> >
> > As Guozhang mentioned, to use the new consumer in the SimpleConsumer way, you would subscribe to a set of topic partitions and then issue poll(). You can change subscriptions on every poll since it's cheap. The benefit you get is that it does things like leader discovery and maintaining connections to the leader automatically for you.
> >
> > In any case, we will leave the old consumer, including the SimpleConsumer, for some time even after the new consumer is out.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Sep 23, 2014 at 12:23 PM, Valentin <kafka-9999...@sblk.de> wrote:
> >
> >> Hi Jun,
> >>
> >> yes, that would theoretically be possible, but it does not scale at all.
> >>
> >> I.e. in the current HTTP REST API use case, I have 5 connection pools on every tomcat server (as I have 5 brokers) and each connection pool holds up to 10 SimpleConsumer connections. So all in all I get a maximum of 50 open connections per web application server. And with that I am able to handle most requests from HTTP consumers without having to open/close any new connections to a broker host.
> >>
> >> If I were now to do the same implementation with the new Kafka 0.9 high level consumer, I would end up with >1000 connection pools (as I have >1000 topic partitions) and each of these connection pools would contain a number of consumer connections. So all in all, I would end up with thousands of connection objects per application server. Not really a viable approach :|
> >>
> >> Currently I am wondering what the rationale is for deprecating the SimpleConsumer API, if there are use cases which just work much better using it.
> >>
> >> Greetings
> >> Valentin
> >>
> >> On 23/09/14 18:16, Guozhang Wang wrote:
> >> > Hello,
> >> >
> >> > For your use case, with the new consumer you can still create a new consumer instance for each topic / partition, and remember the mapping of topic / partition => consumer. Then upon receiving the http request you can decide which consumer to use. Since the new consumer is single-threaded, creating this many new consumers is roughly the same cost as with the old simple consumer.
> >> >
> >> > Guozhang
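A rough sketch of the mapping Guozhang describes: one new-consumer instance per topic/partition, created lazily and looked up when an HTTP request arrives. Class and method names are illustrative, and the props passed in are assumed to carry bootstrap.servers and deserializer settings:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PartitionConsumerRegistry {
        // One single-threaded consumer per topic/partition, created on first use.
        private final Map<TopicPartition, KafkaConsumer<byte[], byte[]>> consumers = new HashMap<>();
        private final Properties props;

        public PartitionConsumerRegistry(Properties props) {
            this.props = props;
        }

        // Look up (or lazily create) the consumer dedicated to one topic/partition.
        public synchronized KafkaConsumer<byte[], byte[]> consumerFor(TopicPartition tp) {
            return consumers.computeIfAbsent(tp, p -> {
                KafkaConsumer<byte[], byte[]> c = new KafkaConsumer<>(props);
                c.assign(Collections.singletonList(p));
                return c;
            });
        }
    }

With the >1000 partitions mentioned above, such a registry would end up holding >1000 consumer objects per app server, which is the scaling concern Valentin raises.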
> >> > On Tue, Sep 23, 2014 at 2:32 AM, Valentin <kafka-9999...@sblk.de> wrote:
> >> >
> >> >> Hi Jun,
> >> >>
> >> >> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:
> >> >>> The new consumer api will also allow you to do what you want in a SimpleConsumer (e.g., subscribe to a static set of partitions, control initial offsets, etc), only more conveniently.
> >> >>
> >> >> Yeah, I have reviewed the available javadocs for the new Kafka 0.9 consumer APIs.
> >> >> However, while they still allow me to do roughly what I want, I fear that they will result in an overall much worse performing implementation on my side.
> >> >> The main problem I have in my scenario is that consumer requests are coming in via stateless HTTP requests (each request is standalone and specifies topics+partitions+offsets to read data from) and I need to find a good way to do connection pooling to the Kafka backend for good performance. The SimpleConsumer would allow me to do that; an approach with the new Kafka 0.9 consumer API seems to have a lot more overhead.
> >> >>
> >> >> Basically, what I am looking for is a way to pool connections per Kafka broker host, independent of the topics/partitions/clients/..., so each Tomcat app server would keep N disjoint connection pools if I have N Kafka broker hosts.
> >> >> I would then keep some central metadata which tells me which hosts are the leaders for which topic+partition, and for an incoming HTTP client request I'd just take a Kafka connection from the pool for that particular broker host, request the data and return the connection to the pool. This means that a Kafka broker host will get requests from lots of different end consumers via the same TCP connection (sequentially, of course).
> >> >>
> >> >> With the new Kafka consumer API I would have to subscribe/unsubscribe from topics every time I take a connection from the pool, and as the request may need to go to a different broker host than the last one, that wouldn't even prevent all the connection/reconnection overhead. I guess I could create one dedicated connection pool per topic-partition; that way the connection/reconnection overhead should be minimized, but then I'd end up with hundreds of connection pools per app server, also not a good approach.
> >> >> All in all, the planned design of the new consumer API just doesn't seem to fit my use case well. Which is why I am a bit anxious about the SimpleConsumer API being deprecated.
> >> >>
> >> >> Or am I missing something here? Thanks!
> >> >>
> >> >> Greetings
> >> >> Valentin
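For contrast, a minimal sketch of the per-broker fetch Valentin describes, using the old SimpleConsumer API (Kafka 0.8.x). The host, client id, topic, partition and offset values are made up, and the connection pool plus the separately maintained leader metadata are omitted:

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class PooledBrokerFetch {
        public static void main(String[] args) {
            // One SimpleConsumer per pooled connection to a given broker host; which
            // broker to use for a request comes from leader metadata kept elsewhere.
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "http-gateway");

            FetchRequest request = new FetchRequestBuilder()
                .clientId("http-gateway")
                .addFetch("clicks", 3, 42000L, 100000)  // topic, partition, offset, maxBytes
                .build();
            FetchResponse response = consumer.fetch(request);

            ByteBufferMessageSet messages = response.messageSet("clicks", 3);
            for (MessageAndOffset messageAndOffset : messages) {
                System.out.println("offset " + messageAndOffset.offset());
            }

            // In the pooled setup the connection would be returned to the per-broker pool
            // rather than closed; close() is shown only to end this standalone example.
            consumer.close();
        }
    }

The same connection can then serve the next request for any topic/partition led by that broker, which is what keeps the total at one small pool per broker host.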