Thanks for kicking off this discussion, Guozhang.
We might also want to discuss the API to expose the high watermark. Some
discussion has been there in KAFKA-2076.

Thanks,

Jiangjie (Becket) Qin

On 6/9/15, 1:12 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:

>This email is to kick-off some discussion around the changes we want to
>make on the new consumer APIs as well as their semantics. Here are a
>not-comprehensive list of items in my mind:
>
>1. Poll(timeout): current definition of timeout states "The time, in
>milliseconds, spent waiting in poll if data is not available. If 0, waits
>indefinitely." While in the current implementation, we have different
>semantics as stated, for example:
>
>a) poll(timeout) can return before "timeout" elapsed with empty consumed
>data.
>b) poll(timeout) can return after more than "timeout" elapsed due to
>blocking event like join-group, coordinator discovery, etc.
>
>We should think a bit more on what semantics we really want to provide and
>how to provide it in implementation.
>
>2. Thread safeness: currently we have a coarsen-grained locking mechanism
>that provides thread safeness but blocks commit / position / etc calls
>while poll() is in process. We are considering to remove the
>coarsen-grained locking with an additional Consumer.wakeup() call to break
>the polling, and instead suggest users to have one consumer client per
>thread, which aligns with the design of a single-threaded consumer
>(KAFKA-2123).
>
>3. Commit(): we want to improve the async commit calls to add a callback
>handler upon commit completes, and guarantee ordering of commit calls with
>retry policies (KAFKA-2168). In addition, we want to extend the API to
>expose attaching / fetching offset metadata stored in the Kafka offset
>manager.
>
>4. OffsetFetchRequest: currently for handling OffsetCommitRequest we check
>the generation id and the assigned partitions before accepting the request
>if the group is using Kafka for partition management, but for
>OffsetFetchRequest we cannot do this checking since it does not include
>groupId / consumerId information. Do people think this is OK or we should
>add this as we did in OffsetCommitRequest?
>
>5. New APIs: there are some other requests to add:
>
>a) offsetsBeforeTime(timestamp): or would seekToEnd and seekToBeginning
>sufficient?
>
>b) listTopics(): or should we just enforce users to use AdminUtils for
>such
>operations?
>
>There may be other issues that I have missed here, so folks just bring it
>up if you thought about anything else.
>
>-- Guozhang

Reply via email to