There are two questions:
1. Who controls the consumer's position in the stream?
2. How is that position stored?

The theory for (1) is that the consumer should control this, so it can choose to move forwards or backwards or wherever it wants, and has full control over when position changes take effect.

For offset storage I think there are really two cases. Most people just want the ability to easily commit their offsets without a lot of fuss, and this API provides that facility. However, there are a number of special cases where you can take advantage of particulars of your destination storage system. Concrete examples in our environment are search indexers storing the offset in their index, so that an indexer picks up from whatever point it has an index for even if those indexes are distributed to other machines. Another example is putting results in an RDBMS and committing the offsets in a transaction with the data.

How can a generic client serve both these needs? Well, it can't really take advantage of the quirks of every possible system. So I think the best practice for a client would be to do the following:

a. Add a commit() method which commits using David's API. This can be called manually or triggered by some kind of "autocommit" setting that commits every N messages. This satisfies the simple use case.
b. Give out the offset and partition with each message. This way the end user can choose to use the commit functionality, or turn off autocommit and put the offsets wherever they like.

So this API makes it possible to implement an optimal method for (a) that all client implementations can use, and doesn't hurt any other possible use. (This could alternatively be exposed as a kind of offset storage "plug-in interface" which defaults to using an implementation based on this API--same difference.)

Let me know if you buy this argument.

-Jay

On Mon, Dec 17, 2012 at 2:27 PM, Milind Parikh <milindpar...@gmail.com> wrote:

> Perhaps I don't understand the motivation well enough and perhaps I am
> misreading the intent.
>
> But I thought that the design principle behind kafka is for state (from a
> consumer standpoint) was to be managed by consumer and not broker. I
> understand that "These APIs are optional, clients can store offsets another
> way if they like."
>
> So three questions:
>
> (a) Is this a change from the original intent of Kafka?
> (b) If it is a change, why not make it such that there is no need for
>     clients to roll their own?
>       autocommit=false -> no storage of offsets
>       autocommit=true  -> store offsets
> (c) I suppose that the use case of multiple sets of consumer groups wanting
>     to use the offsets for different purposes could be one of the use cases
>     for the clients to roll their own. That corner case could be handled
>     through handing out a uuid for a set of consumer group to operate
>     against. Any other use cases for the clients to absolutely roll their
>     own?
>
> Regards
> Milind
>
> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey Guys,
> >
> > David has made a bunch of progress on the offset commit api
> > implementation.
> >
> > Since this is a public API it would be good to do as much thinking
> > up-front as possible to minimize future iterations.
> >
> > It would be great if folks could do the following:
> > 1. Read the wiki here:
> >    https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
> > 2. Check out the code David wrote here:
> >    https://issues.apache.org/jira/browse/KAFKA-657
> >
> > In particular our hope is that this API can act as the first step in
> > scaling the way we store offsets (ZK is not really very appropriate for
> > this). This of course requires having some plan in mind for offset
> > storage. I have written (and then after getting some initial feedback,
> > rewritten) a section in the above wiki on how this might work.
> >
> > If no one says anything I will be taking a slightly modified patch that
> > adds this functionality on trunk as soon as David gets in a few minor
> > tweaks.
> >
> > -Jay
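
For anyone who prefers to see points (a) and (b) from the reply at the top of this message as code, here is a rough sketch of what such a client might look like. Everything in it is hypothetical: SketchConsumer, Message, commit_fn and autocommit_every are made-up names, commit_fn merely stands in for a call to the offset commit API, and committing the last processed offset (rather than the next one to read) is an assumption made for illustration. It is not taken from David's patch or from any real Kafka client.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Message:
    partition: int
    offset: int
    payload: str


class SketchConsumer:
    """Hypothetical client wrapper; none of these names come from Kafka."""

    def __init__(self, source: Iterable[Message],
                 commit_fn: Callable[[Dict[int, int]], None],
                 autocommit_every: Optional[int] = None) -> None:
        self._source = source
        self._commit_fn = commit_fn            # stand-in for the offset commit API
        self._autocommit_every = autocommit_every
        self._positions: Dict[int, int] = {}   # partition -> last processed offset
        self._pending = 0                      # messages processed since last commit

    def messages(self) -> Iterator[Message]:
        for msg in self._source:
            # (b) every message carries its own partition and offset.
            yield msg
            # Execution resumes here only when the caller asks for the next
            # message, so msg is treated as processed at this point.
            self._positions[msg.partition] = msg.offset
            self._pending += 1
            if (self._autocommit_every is not None
                    and self._pending >= self._autocommit_every):
                self.commit()

    def commit(self) -> None:
        """(a) Commit current positions; called manually or by autocommit."""
        if self._positions:
            self._commit_fn(dict(self._positions))
        self._pending = 0


def demo_source():
    # Five fake messages on partition 0 with offsets 0..4.
    return [Message(0, i, f"m{i}") for i in range(5)]


# Simple case: autocommit every 2 messages, then one final manual commit.
commits = []
consumer = SketchConsumer(demo_source(), commits.append, autocommit_every=2)
for msg in consumer.messages():
    pass  # process msg here
consumer.commit()

# Special case: autocommit off; the offset is written in the same
# database transaction as the result it belongs to.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (part INTEGER, msg_offset INTEGER, payload TEXT)")
db.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, last_offset INTEGER)")
unused = []
manual = SketchConsumer(demo_source(), unused.append)
for msg in manual.messages():
    with db:  # commits both statements together, or rolls both back
        db.execute("INSERT INTO results VALUES (?, ?, ?)",
                   (msg.partition, msg.offset, msg.payload))
        db.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                   (msg.partition, msg.offset))
```

The first loop is the "no fuss" case: the client decides when to commit. The second never touches commit() at all and instead uses the offset handed out with each message, which is the RDBMS example from the reply. An in-memory SQLite database is used here only so the sketch runs on its own.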