Thanks for kicking off this discussion, Guozhang. We might also want to discuss the API for exposing the high watermark. There has already been some discussion of this in KAFKA-2076.
Thanks,

Jiangjie (Becket) Qin

On 6/9/15, 1:12 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:

>This email is to kick off some discussion around the changes we want to
>make to the new consumer APIs as well as their semantics. Here is a
>non-comprehensive list of the items I have in mind:
>
>1. Poll(timeout): the current definition of timeout states "The time, in
>milliseconds, spent waiting in poll if data is not available. If 0, waits
>indefinitely." In the current implementation, however, the semantics
>differ from what is stated. For example:
>
>a) poll(timeout) can return before "timeout" has elapsed with empty
>consumed data.
>b) poll(timeout) can return after more than "timeout" has elapsed due to
>blocking events such as join-group, coordinator discovery, etc.
>
>We should think a bit more about what semantics we really want to provide
>and how to provide them in the implementation.
>
>2. Thread safety: currently we have a coarse-grained locking mechanism
>that provides thread safety but blocks commit / position / etc. calls
>while poll() is in progress. We are considering removing the
>coarse-grained locking, adding a Consumer.wakeup() call to break
>the polling, and instead suggesting that users have one consumer client
>per thread, which aligns with the design of a single-threaded consumer
>(KAFKA-2123).
>
>3. Commit(): we want to improve the async commit calls by adding a
>callback handler that is invoked when the commit completes, and to
>guarantee ordering of commit calls with retry policies (KAFKA-2168). In
>addition, we want to extend the API to expose attaching / fetching offset
>metadata stored in the Kafka offset manager.
>
>4. OffsetFetchRequest: currently, when handling OffsetCommitRequest, we
>check the generation id and the assigned partitions before accepting the
>request if the group is using Kafka for partition management, but for
>OffsetFetchRequest we cannot do this check since it does not include
>groupId / consumerId information. Do people think this is OK, or should
>we add this as we did in OffsetCommitRequest?
>
>5. New APIs: there are some other requests to add:
>
>a) offsetsBeforeTime(timestamp): or would seekToEnd and seekToBeginning
>be sufficient?
>
>b) listTopics(): or should we just require users to use AdminUtils for
>such operations?
>
>There may be other issues that I have missed here, so please bring up
>anything else you have thought of.
>
>-- Guozhang
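
On item 1 in the quoted list, one possible reading of a stricter poll(timeout) contract is "return as soon as data is available, otherwise return only once the timeout has elapsed." Below is a minimal sketch of that reading in plain Java. It does not use the Kafka client; `BoundedPoll`, `pollUntilDeadline`, and `fetchOnce` are hypothetical names made up for illustration. The sketch only addresses case (a), early empty returns. Case (b) is avoided only under the assumption that each underlying fetch respects the remaining time budget it is given.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.LongFunction;

// Hypothetical illustration only; none of these names exist in the Kafka consumer API.
public class BoundedPoll {

    /**
     * Retries fetchOnce until it returns data or timeoutMs has elapsed.
     * fetchOnce receives the remaining wait budget in milliseconds and is
     * allowed to return early with an empty list (case (a) in the thread).
     * Assumes a modest timeoutMs; very large values could overflow the deadline.
     */
    public static List<String> pollUntilDeadline(LongFunction<List<String>> fetchOnce,
                                                 long timeoutMs) {
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            // Recompute the budget on every attempt so retries never extend the deadline.
            long remainingMs = Math.max(0L,
                    TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime()));
            List<String> records = fetchOnce.apply(remainingMs);
            // Return on data, or after a final attempt made with no budget left.
            if (!records.isEmpty() || remainingMs == 0L) {
                return records;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in fetch: comes back empty twice, then yields one record.
        LongFunction<List<String>> flaky = remainingMs -> {
            calls[0]++;
            return calls[0] < 3
                    ? Collections.<String>emptyList()
                    : Collections.singletonList("record-0");
        };
        List<String> got = pollUntilDeadline(flaky, 1000);
        System.out.println(got + " after " + calls[0] + " fetch attempts");
    }
}
```

The stand-in fetch here returns immediately, so the loop retries without waiting. A real fetch would be expected to block for up to the remaining budget, which is what keeps the loop from spinning.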