Re: Questions about Kafka 0.9 API changes

Guozhang Wang Mon, 29 Sep 2014 17:53:11 -0700

Thanks Valentin!

Guozhang


On Sun, Sep 28, 2014 at 3:49 PM, Valentin <kafka-9999...@sblk.de> wrote:

>
> Hi Jun,
>
> ok, I created:
> https://issues.apache.org/jira/browse/KAFKA-1655
>
> Greetings
> Valentin
>
> On Sat, 27 Sep 2014 08:31:01 -0700, Jun Rao <jun...@gmail.com> wrote:
> > Valentin,
> >
> > That's a good point. We didn't have this use case in mind when designing
> > the new consumer API. A straightforward implementation could be removing
> > the locally cached topic metadata for unsubscribed topics. It's probably
> > also possible to add a config value to avoid churn in caching the metadata.
> > Could you file a JIRA so that we can track this?
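A minimal sketch of the metadata-retention idea above: cached metadata for a topic is dropped only after the topic has stayed unsubscribed for a configurable retention period, so rapid unsubscribe/resubscribe cycles reuse the cache instead of refetching. `TopicMetadataCache` and `retentionMs` are hypothetical names, not an existing Kafka class or config, and metadata is simplified to one opaque string per topic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical client-side topic metadata cache (not a Kafka class).
 * Metadata for a topic that is no longer subscribed is kept for retentionMs
 * before eviction, so quick unsubscribe/resubscribe cycles avoid a refetch.
 */
class TopicMetadataCache {
    private final long retentionMs;                                      // hypothetical config value
    private final Map<String, String> metadataByTopic = new HashMap<>(); // topic -> opaque metadata
    private final Set<String> subscribed = new HashSet<>();
    private final Map<String, Long> unsubscribedAt = new HashMap<>();    // topic -> time of unsubscribe

    TopicMetadataCache(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    void subscribe(String topic) {
        subscribed.add(topic);
        unsubscribedAt.remove(topic); // cancel any pending eviction
    }

    void unsubscribe(String topic, long nowMs) {
        if (subscribed.remove(topic)) {
            unsubscribedAt.put(topic, nowMs);
        }
    }

    /** Records metadata fetched from a broker. */
    void update(String topic, String metadata) {
        metadataByTopic.put(topic, metadata);
    }

    /** True if no cached metadata exists and a broker round trip is needed. */
    boolean needsFetch(String topic) {
        return !metadataByTopic.containsKey(topic);
    }

    /** Evicts metadata for topics that have stayed unsubscribed for at least retentionMs. */
    void expire(long nowMs) {
        unsubscribedAt.entrySet().removeIf(e -> {
            boolean expired = nowMs - e.getValue() >= retentionMs;
            if (expired) {
                metadataByTopic.remove(e.getKey());
            }
            return expired;
        });
    }
}
```

Setting the retention to zero gives the "straightforward" behaviour of dropping metadata as soon as a topic is unsubscribed.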
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 25, 2014 at 4:19 AM, Valentin <kafka-9999...@sblk.de> wrote:
> >
> >>
> >> Hi Jun, Hi Guozhang,
> >>
> >> hm, yeah, if the subscribe/unsubscribe is a smart and lightweight
> >> operation, this might work. But if it needs to make any additional calls
> >> to fetch metadata during a subscribe/unsubscribe call, the overhead could
> >> get quite problematic. The main issue I still see here is that an
> >> additional layer is added which does not really provide any benefit for a
> >> use case like mine.
> >> I.e. the leader discovery and connection handling you mention below don't
> >> really offer value in this case: with the connection pooling approach I
> >> described, I will have to discover and maintain leader metadata in my own
> >> code anyway, as well as handle connection pooling. So if I understand the
> >> current plans for the Kafka 0.9 consumer correctly, it just doesn't work
> >> well for my use case. Sure, there are workarounds to make it work in my
> >> scenario, but I doubt any of them would scale as well as my current
> >> SimpleConsumer approach :|
> >> Or am I missing something here?
> >>
> >> Greetings
> >> Valentin
> >>
> >> On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <jun...@gmail.com> wrote:
> >> > Valentin,
> >> >
> >> > As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
> >> > you would subscribe to a set of topic partitions and then issue poll().
> >> > You can change subscriptions on every poll since it's cheap. The benefit
> >> > you get is that it does things like leader discovery and maintaining
> >> > connections to the leader automatically for you.
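To make the per-poll reassignment pattern described above concrete, here is a minimal sketch. `PartitionReader`, `InMemoryReader`, `TopicPartition`, `Message` and `RequestHandler` are hypothetical names invented for illustration, not the Kafka client API; the in-memory reader only stands in for a real consumer so the sketch runs without a broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

record TopicPartition(String topic, int partition) {}
record Message(TopicPartition tp, long offset, String value) {}

/** Hypothetical single-threaded consumer: assign partitions, seek, poll. */
interface PartitionReader {
    void assign(Set<TopicPartition> partitions);
    void seek(TopicPartition tp, long offset);
    List<Message> poll();
}

/** In-memory stand-in so the sketch runs without a broker. */
class InMemoryReader implements PartitionReader {
    private final Map<TopicPartition, List<String>> log;
    private final Map<TopicPartition, Long> position = new HashMap<>();

    InMemoryReader(Map<TopicPartition, List<String>> log) {
        this.log = log;
    }

    public void assign(Set<TopicPartition> partitions) {
        position.keySet().retainAll(partitions); // drop state for partitions no longer assigned
        for (TopicPartition tp : partitions) {
            position.putIfAbsent(tp, 0L);
        }
    }

    public void seek(TopicPartition tp, long offset) {
        if (!position.containsKey(tp)) {
            throw new IllegalStateException("not assigned: " + tp);
        }
        position.put(tp, offset);
    }

    public List<Message> poll() {
        List<Message> out = new ArrayList<>();
        for (Map.Entry<TopicPartition, Long> e : position.entrySet()) {
            List<String> entries = log.getOrDefault(e.getKey(), List.of());
            for (long o = e.getValue(); o < entries.size(); o++) {
                out.add(new Message(e.getKey(), o, entries.get((int) o)));
            }
            e.setValue((long) entries.size()); // advance past everything returned
        }
        return out;
    }
}

/** Serves one stateless HTTP-style request: re-assign, seek, poll. */
class RequestHandler {
    private final PartitionReader reader;

    RequestHandler(PartitionReader reader) {
        this.reader = reader;
    }

    synchronized List<Message> fetch(Map<TopicPartition, Long> requestedOffsets) {
        reader.assign(requestedOffsets.keySet()); // change the assignment on every request
        requestedOffsets.forEach(reader::seek);   // start from the client-supplied offsets
        return reader.poll();
    }
}
```

Each `fetch` call replaces the previous assignment, so one reader can serve stateless requests for arbitrary partitions; `fetch` is synchronized on the assumption that the underlying consumer is single-threaded.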
> >> >
> >> > In any case, we will keep the old consumer, including the SimpleConsumer,
> >> > for some time even after the new consumer is out.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Tue, Sep 23, 2014 at 12:23 PM, Valentin <kafka-9999...@sblk.de> wrote:
> >> >
> >> >> Hi Jun,
> >> >>
> >> >> yes, that would theoretically be possible, but it does not scale at all.
> >> >>
> >> >> I.e. in the current HTTP REST API use case, I have 5 connection pools on
> >> >> every Tomcat server (as I have 5 brokers) and each connection pool holds
> >> >> up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
> >> >> open connections per web application server. And with that I am able to
> >> >> handle most requests from HTTP consumers without having to open/close
> >> >> any new connections to a broker host.
> >> >>
> >> >> If I were to do the same implementation with the new Kafka 0.9 high
> >> >> level consumer, I would end up with more than 1000 connection pools (as
> >> >> I have more than 1000 topic partitions) and each of these connection
> >> >> pools would contain a number of consumer connections. So all in all, I
> >> >> would end up with thousands of connection objects per application
> >> >> server. Not really a viable approach :|
> >> >>
> >> >> Currently I am wondering what the rationale is for deprecating the
> >> >> SimpleConsumer API, if there are use cases which just work much better
> >> >> using it.
> >> >>
> >> >> Greetings
> >> >> Valentin
> >> >>
> >> >> On 23/09/14 18:16, Guozhang Wang wrote:
> >> >> > Hello,
> >> >> >
> >> >> > For your use case, with the new consumer you can still create a new
> >> >> > consumer instance for each topic / partition, and remember the mapping
> >> >> > of topic / partition => consumer. Then, upon receiving the HTTP request,
> >> >> > you can decide which consumer to use. Since the new consumer is single
> >> >> > threaded, creating this many new consumers is roughly the same cost as
> >> >> > with the old SimpleConsumer.
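The one-consumer-per-topic/partition suggestion above could be sketched as a registry that is consulted when each HTTP request arrives. `ConsumerRegistry`, `TopicPartition` and the factory function are hypothetical names for illustration; `C` stands for whatever consumer type is actually used. Because each consumer is single-threaded, `withConsumer` serializes access to a given consumer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

record TopicPartition(String topic, int partition) {}

/**
 * Hypothetical registry holding one consumer per topic/partition.
 * C is whatever consumer type is used; the factory that creates it is
 * supplied by the caller.
 */
class ConsumerRegistry<C> {
    private final Map<TopicPartition, C> consumers = new ConcurrentHashMap<>();
    private final Function<TopicPartition, C> factory;

    ConsumerRegistry(Function<TopicPartition, C> factory) {
        this.factory = factory;
    }

    /** Returns the consumer for tp, creating it on first use. */
    C consumerFor(TopicPartition tp) {
        return consumers.computeIfAbsent(tp, factory);
    }

    /** Runs action against tp's consumer; calls are serialized because each consumer is single-threaded. */
    <R> R withConsumer(TopicPartition tp, Function<C, R> action) {
        C consumer = consumerFor(tp);
        synchronized (consumer) {
            return action.apply(consumer);
        }
    }

    int size() {
        return consumers.size();
    }
}
```

The registry grows with the number of distinct partitions requested, which is the scaling concern raised in the reply above.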
> >> >> >
> >> >> > Guozhang
> >> >> >
> >> >> > On Tue, Sep 23, 2014 at 2:32 AM, Valentin <kafka-9999...@sblk.de> wrote:
> >> >> >
> >> >> >>
> >> >> >> Hi Jun,
> >> >> >>
> >> >> >> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:
> >> >> >>> The new consumer API will also allow you to do what you want in a
> >> >> >>> SimpleConsumer (e.g., subscribe to a static set of partitions,
> >> >> >>> control initial offsets, etc), only more conveniently.
> >> >> >>
> >> >> >> Yeah, I have reviewed the available javadocs for the new Kafka 0.9
> >> >> >> consumer APIs.
> >> >> >> However, while they still allow me to do roughly what I want, I fear
> >> >> >> that they will result in a much worse performing implementation
> >> >> >> overall on my side.
> >> >> >> The main problem I have in my scenario is that consumer requests are
> >> >> >> coming in via stateless HTTP requests (each request is standalone and
> >> >> >> specifies topics+partitions+offsets to read data from) and I need to
> >> >> >> find a good way to do connection pooling to the Kafka backend for good
> >> >> >> performance. The SimpleConsumer would allow me to do that; an approach
> >> >> >> with the new Kafka 0.9 consumer API seems to have a lot more overhead.
> >> >> >>
> >> >> >> Basically, what I am looking for is a way to pool connections per
> >> >> >> Kafka broker host, independent of the topics/partitions/clients/..., so
> >> >> >> each Tomcat app server would keep N disjoint connection pools if I have
> >> >> >> N Kafka broker hosts.
> >> >> >> I would then keep some central metadata which tells me which hosts are
> >> >> >> the leaders for which topic+partition, and for an incoming HTTP client
> >> >> >> request I'd just take a Kafka connection from the pool for that
> >> >> >> particular broker host, request the data and return the connection to
> >> >> >> the pool. This means that a Kafka broker host will get requests from
> >> >> >> lots of different end consumers via the same TCP connection
> >> >> >> (sequentially of course).
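The per-broker pooling scheme described above can be sketched roughly as follows, assuming a generic connection type `Conn` (in practice something like a SimpleConsumer) and a leader map that is refreshed elsewhere. `BrokerPools`, `TopicPartition`, `setLeader` and `execute` are hypothetical names for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

record TopicPartition(String topic, int partition) {}

/**
 * Hypothetical per-broker connection pools plus a central
 * topic/partition -> leader map. Conn stands in for a low-level
 * broker connection such as a SimpleConsumer.
 */
class BrokerPools<Conn> {
    private final Map<String, BlockingQueue<Conn>> poolByBroker = new HashMap<>(); // filled once in the constructor
    private final Map<TopicPartition, String> leaderOf = new ConcurrentHashMap<>(); // updated from metadata refreshes

    BrokerPools(List<String> brokers, int connectionsPerBroker, Function<String, Conn> connect) {
        for (String broker : brokers) {
            BlockingQueue<Conn> pool = new ArrayBlockingQueue<>(connectionsPerBroker);
            for (int i = 0; i < connectionsPerBroker; i++) {
                pool.add(connect.apply(broker));
            }
            poolByBroker.put(broker, pool);
        }
    }

    void setLeader(TopicPartition tp, String broker) {
        leaderOf.put(tp, broker);
    }

    /** Borrows a connection to tp's leader, runs the request, and returns the connection to its pool. */
    <R> R execute(TopicPartition tp, Function<Conn, R> request) {
        String leader = leaderOf.get(tp);
        BlockingQueue<Conn> pool = leader == null ? null : poolByBroker.get(leader);
        if (pool == null) {
            throw new IllegalStateException("no known leader pool for " + tp);
        }
        Conn conn;
        try {
            conn = pool.take(); // blocks while every connection to this broker is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        try {
            return request.apply(conn);
        } finally {
            pool.add(conn); // hand the connection back for the next request
        }
    }
}
```

With 5 brokers and 10 connections per broker this gives the 50 connections per app server mentioned in the thread, independent of the number of partitions. For simplicity the sketch opens every connection up front rather than on demand.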
> >> >> >>
> >> >> >> With the new Kafka consumer API I would have to subscribe/unsubscribe
> >> >> >> from topics every time I take a connection from the pool, and as the
> >> >> >> request may need to go to a different broker host than the last one,
> >> >> >> that wouldn't even prevent all the connection/reconnection overhead. I
> >> >> >> guess I could create one dedicated connection pool per topic-partition;
> >> >> >> that way connection/reconnection overhead should be minimized, but I'd
> >> >> >> end up with hundreds of connection pools per app server, also not a
> >> >> >> good approach.
> >> >> >> All in all, the planned design of the new consumer API just doesn't
> >> >> >> seem to fit my use case well. This is why I am a bit anxious about the
> >> >> >> SimpleConsumer API being deprecated.
> >> >> >>
> >> >> >> Or am I missing something here? Thanks!
> >> >> >>
> >> >> >> Greetings
> >> >> >> Valentin
> >> >>
> >> >>
> >>
>



-- 
-- Guozhang
