Yeah, I think it is better to discuss these points in the KIP meeting; otherwise it may become a long thread. Let's do that this Tuesday.
Guozhang

On Thu, Jun 11, 2015 at 5:48 PM, Jun Rao <j...@confluent.io> wrote:

> Guozhang,
>
> Perhaps we can discuss this in our KIP hangout next week?
>
> Thanks,
>
> Jun
>
> On Tue, Jun 9, 2015 at 1:12 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > This email is to kick off some discussion around the changes we want to
> > make to the new consumer APIs as well as their semantics. Here is a
> > non-comprehensive list of items on my mind:
> >
> > 1. poll(timeout): the current definition of timeout states "The time, in
> > milliseconds, spent waiting in poll if data is not available. If 0, waits
> > indefinitely." The current implementation, however, has different
> > semantics than stated, for example:
> >
> > a) poll(timeout) can return before "timeout" has elapsed with empty
> > consumed data.
> > b) poll(timeout) can return after more than "timeout" has elapsed due to
> > blocking events like join-group, coordinator discovery, etc.
> >
> > We should think a bit more about what semantics we really want to
> > provide and how to provide them in the implementation.
> >
> > 2. Thread safety: currently we have a coarse-grained locking mechanism
> > that provides thread safety but blocks commit / position / etc. calls
> > while poll() is in process. We are considering removing the
> > coarse-grained locking in favor of an additional Consumer.wakeup() call
> > to break the polling, and instead suggesting users have one consumer
> > client per thread, which aligns with the design of a single-threaded
> > consumer (KAFKA-2123).
> >
> > 3. Commit(): we want to improve the async commit calls to add a callback
> > handler invoked when the commit completes, and to guarantee ordering of
> > commit calls with retry policies (KAFKA-2168). In addition, we want to
> > extend the API to expose attaching / fetching offset metadata stored in
> > the Kafka offset manager.
> >
> > 4. OffsetFetchRequest: currently, when handling OffsetCommitRequest, we
> > check the generation id and the assigned partitions before accepting the
> > request if the group is using Kafka for partition management, but for
> > OffsetFetchRequest we cannot do this checking since it does not include
> > groupId / consumerId information. Do people think this is OK, or should
> > we add this as we did in OffsetCommitRequest?
> >
> > 5. New APIs: there are some other requests to add:
> >
> > a) offsetsBeforeTime(timestamp): or would seekToEnd and seekToBeginning
> > be sufficient?
> >
> > b) listTopics(): or should we just require users to use AdminUtils for
> > such operations?
> >
> > There may be other issues that I have missed here, so folks, please
> > bring them up if you have thought about anything else.
> >
> > -- Guozhang
>

--
-- Guozhang
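
As a concrete reference for item 2 in the quoted list above, here is a minimal sketch of the one-consumer-per-thread model being suggested: all consumer calls happen on a single thread, and wakeup() is the only call made from another thread, used solely to break out of poll() for shutdown. The class name, topic, and exact signatures (poll(long), WakeupException, subscribe(List<String>)) are assumptions about the eventual API shape under discussion, not a finalized design.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class SingleThreadedConsumerLoop implements Runnable {

    private final KafkaConsumer<String, String> consumer;

    public SingleThreadedConsumerLoop(Properties config) {
        this.consumer = new KafkaConsumer<>(config);
    }

    @Override
    public void run() {
        try {
            // Every consumer call below happens on this thread only,
            // so no cross-call locking is needed inside the client.
            consumer.subscribe(Arrays.asList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        } catch (WakeupException e) {
            // Expected path: another thread called wakeup() to break the poll.
        } finally {
            consumer.close();
        }
    }

    // The one method intended to be safe to call from another thread in this model.
    public void shutdown() {
        consumer.wakeup();
    }
}

With this pattern, commit / position / seek calls never race with poll(), which is what allows dropping the coarse-grained lock described in the email.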
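
For item 3, a rough sketch of what a callback-based async commit (per the KAFKA-2168 proposal) might look like from the application's point of view. The commitAsync / OffsetCommitCallback names and the onComplete(offsets, exception) signature are assumptions about the eventual shape, used here only for illustration.

import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;

public class CommitWithCallback {

    // Fire an async commit for the consumer's current positions and log the
    // outcome once the commit completes (or fails after any retries).
    public static void commitAsyncWithLogging(Consumer<String, String> consumer) {
        consumer.commitAsync(new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets,
                                   Exception exception) {
                if (exception != null)
                    System.err.println("Commit of " + offsets + " failed: " + exception);
                else
                    System.out.println("Committed " + offsets);
            }
        });
    }
}

An overload taking an explicit Map<TopicPartition, OffsetAndMetadata> would also cover the offset-metadata point in the email, since OffsetAndMetadata could carry the metadata string stored in the Kafka offset manager; again, that is a sketch of one possible shape rather than a settled API.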