Re: New Consumer API discussion

Guozhang Wang Tue, 11 Feb 2014 15:37:04 -0800

Hi Imran,

1. I think choosing between a) and b) is really dependent on the consuming
traffic. We decided to make the consumer client single-threaded and let
users to decide using one or multiple clients based on traffic mainly
because with a multi-thread client, the fetcher thread could die silently
while the user thread still works and gets blocked forever.


2. Yes. If the subcription is a list of topics, which means it relies on
Kafka to assign partitions, then the first pool will trigger the group
management protocol and upon receiving the partitions the callback function
will be executed.

3. The wiki page (
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design)
have some example usages of the new consumer API (there might be some minor
function signature differences with the javadoc). Would you want to take a
look at give some thoughts about that?

Guozhang


On Tue, Feb 11, 2014 at 1:50 PM, Imran Rashid <im...@therashids.com> wrote:

> Hi,
>
> thanks for sharing this and getting feedback.  Sorry I am probably missing
> something basic, but I'm not sure how a multi-threaded consumer would
> work.  I can imagine its either:
>
> a) I just have one thread poll kafka.  If I want to process msgs in
> multiple threads, than I deal w/ that after polling, eg. stick them into a
> blocking queue or something, and have more threads that read from the
> queue.
>
> b) each thread creates its own KafkaConsumer.  They are all registered the
> same way, and I leave it to kafka to figure out what data to give to each
> one.
>
>
> (a) certainly makes things simple, but I worry about throughput -- is that
> just as good as having one thread trying to consumer each partition?
>
> (b) makes it a bit of a pain to figure out how many threads to use.  I
> assume there is no point in using more threads than there are partitions,
> so first you've got to figure out how many partitions there are in each
> topic.  Might be nice if there were some util functions to simplify this.
>
>
> Also, since the initial call to subscribe doesn't give the partition
> assignment, does that mean the first call to poll() will always call the
> ConsumerRebalanceCallback?
>
> probably a short code-sample would clear up all my questions.  I'm
> imagining pseudo-code like:
>
>
> int numPartitions = ...
> int numThreads = min(maxThreads, numPartitions);
> //maybe should be something even more complicated, to take into account how
> many other active consumers there are right now for the given group
>
> List<MyConsumer> consumers = new ArrayList<MyConsumer>();
> for (int i = 0; i < numThreads; i++) {
>   MyConsumer c = new MyConsumer();
>   c.subscribe(...);
>   //if subscribe is expensive, then this should already happen in another
> thread
>   consumers.add(c);
> }
>
> // if each subscribe() happened in a different thread, we should put a
> barrier in here, so everybody subscribes before they begin polling
>
> //now launch a thread per consumer, where they each poll
>
>
>
> If I'm on the right track, I'd like to expand this example, showing how
> each "MyConsumer" can keep track of its partitions & offsets, even in the
> face of rebalances.  As Jay said, I think a minimal code example could
> really help us see the utility & faults of the api.
>
> overall I really like what I see, seems like a big improvement!
>
> thanks,
> Imran
>
>
>
> On Mon, Feb 10, 2014 at 12:54 PM, Neha Narkhede <neha.narkh...@gmail.com
> >wrote:
>
> > As mentioned in previous emails, we are also working on a
> re-implementation
> > of the consumer. I would like to use this email thread to discuss the
> > details of the public API. I would also like us to be picky about this
> > public api now so it is as good as possible and we don't need to break it
> > in the future.
> >
> > The best way to get a feel for the API is actually to take a look at the
> > javadoc<
> >
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
> > >,
> > the hope is to get the api docs good enough so that it is
> self-explanatory.
> > You can also take a look at the configs
> > here<
> >
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html
> > >
> >
> > Some background info on implementation:
> >
> > At a high level the primary difference in this consumer is that it
> removes
> > the distinction between the "high-level" and "low-level" consumer. The
> new
> > consumer API is non blocking and instead of returning a blocking
> iterator,
> > the consumer provides a poll() API that returns a list of records. We
> think
> > this is better compared to the blocking iterators since it effectively
> > decouples the threading strategy used for processing messages from the
> > consumer. It is worth noting that the consumer is entirely single
> threaded
> > and runs in the user thread. The advantage is that it can be easily
> > rewritten in less multi-threading-friendly languages. The consumer
> batches
> > data and multiplexes I/O over TCP connections to each of the brokers it
> > communicates with, for high throughput. The consumer also allows long
> poll
> > to reduce the end-to-end message latency for low throughput data.
> >
> > The consumer provides a group management facility that supports the
> concept
> > of a group with multiple consumer instances (just like the current
> > consumer). This is done through a custom heartbeat and group management
> > protocol transparent to the user. At the same time, it allows users the
> > option to subscribe to a fixed set of partitions and not use group
> > management at all. The offset management strategy defaults to Kafka based
> > offset management and the API provides a way for the user to use a
> > customized offset store to manage the consumer's offsets.
> >
> > A key difference in this consumer also is the fact that it does not
> depend
> > on zookeeper at all.
> >
> > More details about the new consumer design are
> > here<
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> > >
> >
> > Please take a look at the new
> > API<
> >
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
> > >and
> > give us any thoughts you may have.
> >
> > Thanks,
> > Neha
> >
>



-- 
-- Guozhang

Re: New Consumer API discussion

Reply via email to