Comments inline:

On Mon, Feb 10, 2014 at 2:31 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Jay,
>
> Thanks for the detailed comments.
>
> 1. Yeah we could discuss a bit more on that.
>
> 2. Since subscribe() is incremental, adding one topic-partition is OK, and
> personally I think it is cleaner than subscribe(String topic,
> int...partition)?
>
I am not too particular. Have you actually tried this? I think writing
actual sample code is important.
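For example, a quick stub comparing the two shapes (everything below is a hypothetical sketch of the API under discussion, not real Kafka code; the stub only records subscriptions so the call sites can be compared):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stub of the two subscribe() shapes under discussion;
// it only records subscriptions so the call sites can be compared.
class SketchConsumer {
    final Set<String> subscriptions = new LinkedHashSet<>();

    // incremental: one topic-partition per call
    void subscribe(String topic, int partition) {
        subscriptions.add(topic + "-" + partition);
    }

    // varargs alternative: several partitions of one topic in one call
    void subscribe(String topic, int... partitions) {
        for (int p : partitions) subscribe(topic, p);
    }
}
```

Written out like this, the varargs form reads naturally at the call site (`subscribe("events", 0, 1, 2)`) while still allowing the incremental one-at-a-time style, since Java resolves the fixed-arity overload first.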


> 3. Originally I was thinking about two interfaces:
>
> getOffsets() // offsets for all partitions that I am consuming now
>
> getOffset(topic-partition) // offset of the specified topic-partition, will
> throw exception if it is not currently consumed.
>
> What do you think about these?
>

The naming needs to distinguish the committed offset position from the fetch
offset position. Also, we aren't using the getX convention.
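For instance, following the no-getX convention the split might look like committed() versus position(). The stub below is purely illustrative; none of these names or shapes are settled:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stub separating the two offsets a consumer tracks; all
// names here are hypothetical, not the proposed API.
class OffsetView {
    final Map<String, Long> lastCommitted = new HashMap<>();
    final Map<String, Long> nextFetch = new HashMap<>();

    // last committed offset for a topic-partition; throws if not consumed
    long committed(String topicPartition) {
        Long offset = lastCommitted.get(topicPartition);
        if (offset == null)
            throw new IllegalStateException("not consuming " + topicPartition);
        return offset;
    }

    // fetch position: offset of the next message that will be returned
    long position(String topicPartition) {
        Long offset = nextFetch.get(topicPartition);
        if (offset == null)
            throw new IllegalStateException("not consuming " + topicPartition);
        return offset;
    }
}
```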


> 4. Yes, that remains a config.
>

Does that make sense given that you change your position via an api now?


> 5. Agree.
>
> 6. If the timeout value is null then it will "logically" return
> immediately with whatever data is available. I think an indefinite poll()
> could be replaced with just
>
> while (true) poll(some-time)?
>

That is fine, but we should provide a no-arg poll() for that; poll(null) isn't
clear. We should also add the TimeUnit as per the post-Java 5 convention, as
that makes the call more readable. E.g.
   poll(5) vs poll(5, TimeUnit.MILLISECONDS)
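A small stub to compare the two shapes side by side (the signatures are hypothetical; the only real logic is the timeout normalization via TimeUnit):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical poll() signatures; the stub just normalizes the timeout
// to milliseconds so the two call shapes can be compared.
class PollSketch {
    long lastTimeoutMs = -1;

    // explicit unit, post-Java-5 style: poll(5, TimeUnit.MILLISECONDS)
    void poll(long timeout, TimeUnit unit) {
        lastTimeoutMs = unit.toMillis(timeout);
    }

    // no-arg form standing in for "poll indefinitely", replacing poll(null)
    void poll() {
        lastTimeoutMs = Long.MAX_VALUE;
    }
}
```

At the call site, `poll(5, TimeUnit.SECONDS)` is unambiguous where `poll(5)` is not, and `poll()` says "block indefinitely" more clearly than `poll(null)`.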


> 7. I am open with either approach.
>

Cool.

> 8. I was thinking about two interfaces for the commit functionality:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
>
> Do those sound better?
>

Well, neither really addresses the common case, which is to commit all
partitions. For that I was thinking of just
   commit();
The advantage of this simpler method is that you don't need to bother about
partitions; you just consume the messages given to you and then commit them.
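The loop I have in mind would read roughly like this. ToyConsumer is a stand-in stub (it serves three canned records, one per poll, then empty batches), so the shape of the no-argument commit() is concrete; nothing here is the real proposed implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy stand-in for the proposed consumer: poll() hands back at most one
// canned record per call, and commit() takes no arguments because it
// commits the current position for everything consumed so far.
class ToyConsumer {
    private final Deque<String> pending =
        new ArrayDeque<>(Arrays.asList("a", "b", "c"));
    long consumed = 0, committed = 0;

    List<String> poll(long timeoutMs) {
        List<String> batch = new ArrayList<>();
        if (!pending.isEmpty()) batch.add(pending.poll());
        consumed += batch.size();
        return batch;
    }

    void commit() { committed = consumed; } // all partitions, no arguments
}
```

The calling code then never mentions partitions at all: `while (!consumer.poll(100).isEmpty()) { /* process */ consumer.commit(); }`.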

> 9. Currently I think about un-subscribe as "close and re-subscribe", and
> would like to hear people's opinion about it.
>

Hmm, I think it is a little weird if there is a subscribe() which can be
called at any time but no unsubscribe(). Would this be hard to do?


> 10. Yes. Position() is an API function, and as an API it can "be called
> at any time" and will change the next fetch starting offset.
>

Cool.


> 11. The ConsumerRecord would have the offset info of the message. Is that
> what you want?
>

But that is only after I have gotten a message. I'm not sure if that covers
all cases or not.
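For example, before the first record arrives (or right after changing position) there is no record to read an offset from, so a position()-style accessor would still be needed. A purely hypothetical stub of the distinction:

```java
// Toy illustration: a record's offset only tells you where you have
// been; a position() accessor gives the next offset even when no record
// has been received yet. All names are hypothetical.
class PositionSketch {
    private long nextOffset = 0;

    // simulate delivery of one record, returning its offset
    long deliver() {
        return nextOffset++;
    }

    // offset of the next message that will be given to the caller
    long position() {
        return nextOffset;
    }
}
```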


> About use cases: great point. I will add some more examples of using the
> API functions in the wiki pages.
>
> Guozhang
>
>
>
>
> On Mon, Feb 10, 2014 at 12:20 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > A few items:
> > 1. ConsumerRebalanceCallback
> >    a. onPartitionsRevoked would be a better name.
> >    b. We should discuss the possibility of splitting this into two
> > interfaces. The motivation would be that in Java 8 single-method
> > interfaces can directly take methods, which might be more intuitive.
> >    c. If we stick with a single interface I would prefer the name
> > RebalanceCallback as it's more concise.
> > 2. Should subscribe(String topic, int partition) be subscribe(String
> > topic, int...partition)?
> > 3. Is the lastCommittedOffset call just a local access? If so, it would
> > be more convenient not to batch it.
> > 4. How are we going to handle the earliest/latest starting position
> > functionality we currently have? Does that remain a config?
> > 5. Do we need to expose the general ability to get known positions from
> > the log? E.g. the functionality in the OffsetRequest...? That would make
> > the ability to change position a little easier.
> > 6. Should poll(java.lang.Long timeout) be poll(long timeout, TimeUnit
> > unit)? Is it Long because it allows null? If so, should we just add a
> > poll() that polls indefinitely?
> > 7. I recommend we remove the boolean parameter from commit as it is
> > really hard to read code that has boolean parameters without named
> > arguments. Can we make it something like commit(...) and commitAsync(...)?
> > 8. What about the common case where you just want to commit the current
> > position for all partitions?
> > 9. How do you unsubscribe?
> > 10. You say in a few places that positions() only impacts the starting
> > position, but surely that isn't the case, right? Surely it controls the
> > fetch position for that partition and can be called at any time?
> > Otherwise it is a pretty weird api, right?
> > 11. How do I get my current position? Not the committed position, but
> > the offset of the next message that will be given to me?
> >
> > One thing that I really found helpful for the API design was writing
> > out actual code for different scenarios against the API. I think it
> > might be good to do that for this too--i.e. enumerate the various use
> > cases and code each use case up to see how it looks. I'm not sure if it
> > would be useful to collect these kinds of scenarios from people. I know
> > they have sporadically popped up on the mailing list.
> >
> > -Jay
> >
> >
> > On Mon, Feb 10, 2014 at 10:54 AM, Neha Narkhede <neha.narkh...@gmail.com>
> > wrote:
> >
> > > As mentioned in previous emails, we are also working on a
> > > re-implementation of the consumer. I would like to use this email
> > > thread to discuss the details of the public API. I would also like us
> > > to be picky about this public API now so it is as good as possible and
> > > we don't need to break it in the future.
> > >
> > > The best way to get a feel for the API is actually to take a look at
> > > the javadoc <
> > > http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
> > > >; the hope is to get the API docs good enough so that they are
> > > self-explanatory. You can also take a look at the configs here <
> > > http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html
> > > >.
> > >
> > > Some background info on implementation:
> > >
> > > At a high level, the primary difference in this consumer is that it
> > > removes the distinction between the "high-level" and "low-level"
> > > consumer. The new consumer API is non-blocking: instead of returning a
> > > blocking iterator, the consumer provides a poll() API that returns a
> > > list of records. We think this is better than the blocking iterators
> > > since it effectively decouples the threading strategy used for
> > > processing messages from the consumer. It is worth noting that the
> > > consumer is entirely single-threaded and runs in the user thread. The
> > > advantage is that it can be easily rewritten in less
> > > multi-threading-friendly languages. The consumer batches data and
> > > multiplexes I/O over TCP connections to each of the brokers it
> > > communicates with, for high throughput. The consumer also allows long
> > > poll to reduce the end-to-end message latency for low-throughput data.
> > >
> > > The consumer provides a group management facility that supports the
> > > concept of a group with multiple consumer instances (just like the
> > > current consumer). This is done through a custom heartbeat and group
> > > management protocol transparent to the user. At the same time, it
> > > allows users the option to subscribe to a fixed set of partitions and
> > > not use group management at all. The offset management strategy
> > > defaults to Kafka-based offset management, and the API provides a way
> > > for the user to use a customized offset store to manage the consumer's
> > > offsets.
> > >
> > > A key difference in this consumer also is the fact that it does not
> > > depend on ZooKeeper at all.
> > >
> > > More details about the new consumer design are here <
> > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> > > >.
> > >
> > > Please take a look at the new API <
> > > http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
> > > > and give us any thoughts you may have.
> > >
> > > Thanks,
> > > Neha
> > >
> >
>
>
>
> --
> -- Guozhang
>
