+1 on limiting the size, but could you do 2k instead of 1k? Using Interval
Tree Clocks gets you a lot for distributed autonomous processing, but most
large-scale ITCs go up to 1.5K.

http://code.google.com/p/itclocks/ (see the conference paper linked there).


Regards
Milind



On Thu, Dec 20, 2012 at 2:04 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Err, to clarify, I meant punt on persisting the metadata, not on
> persisting the offset. Basically that field would be in the protocol but
> would be unused in this phase.
>
> -Jay
>
>
> On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > I actually recommend we just punt on implementing persistence in zk
> > entirely; otherwise we have to have an upgrade path to grandfather
> > existing zk data over to the new format. Let's just add it to the API and
> > only actually store it out when we redo the backend. We can handle the
> > size limit then too.
> >
> > -Jay
> >
> >
> > On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <mum...@gmail.com> wrote:
> >
> >> No particular objection, though in order to support atomic writes of
> >> (offset, metadata), we will need to define a protocol for the ZooKeeper
> >> payloads. Something like:
> >>
> >>   OffsetPayload => Offset [Metadata]
> >>   Metadata => length-prefixed string
> >>
> >> should suffice. Otherwise we would have to rely on the multi-write
> >> mechanism to keep parallel znodes in sync (I generally don't like things
> >> like this).
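> >>
> >> As a rough sketch of that layout (an assumed encoding for discussion,
> >> not the actual Kafka code): an 8-byte offset followed by a
> >> length-prefixed UTF-8 metadata string, written as one znode payload:
> >>
> >>   import java.nio.ByteBuffer;
> >>   import java.nio.charset.StandardCharsets;
> >>
> >>   // Hypothetical layout: [8-byte offset][2-byte metadata length][metadata bytes]
> >>   static byte[] encodeOffsetPayload(long offset, String metadata) {
> >>       byte[] md = metadata == null
> >>           ? new byte[0]
> >>           : metadata.getBytes(StandardCharsets.UTF_8);
> >>       ByteBuffer buf = ByteBuffer.allocate(8 + 2 + md.length);
> >>       buf.putLong(offset);              // committed offset
> >>       buf.putShort((short) md.length);  // length prefix for the metadata string
> >>       buf.put(md);
> >>       return buf.array();               // written in a single setData() call
> >>   }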
> >>
> >> +1 for limiting the size (1kb sounds reasonable)
> >>
> >>
> >> On 12/20/12 4:03 PM, Jay Kreps wrote:
> >>
> >>> Okay, I did some assessment of the use cases we have that aren't using
> >>> the default offset storage API and came up with one generalization. I
> >>> would like to propose adding a generic metadata field to the offset API
> >>> on a per-partition basis. That would leave us with the following:
> >>>
> >>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
> >>>
> >>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
> >>>
> >>>   Metadata => string
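> >>>
> >>> A minimal sketch (class and field names here are assumptions, not the
> >>> actual implementation) of the per-partition value this adds:
> >>>
> >>>   // Hypothetical holder for the new per-partition commit payload.
> >>>   final class OffsetAndMetadata {
> >>>       final long offset;
> >>>       final String metadata;   // opaque to the broker, e.g. an HDFS file name
> >>>
> >>>       OffsetAndMetadata(long offset, String metadata) {
> >>>           this.offset = offset;
> >>>           this.metadata = metadata;
> >>>       }
> >>>   }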
> >>>
> >>> If you want to store a reference to any associated state (say an HDFS
> >>> file name) so that when consumption fails over, the new consumer can
> >>> start up with the same state, this would be the place to store that.
> >>> It would not be intended to support anything large (we could enforce a
> >>> 1k limit or something): just something small, or a reference to where
> >>> to find the state (say a file name).
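> >>>
> >>> A small sketch of what enforcing that cap might look like (the 1024
> >>> figure and the names below are assumptions for illustration only):
> >>>
> >>>   import java.nio.charset.StandardCharsets;
> >>>
> >>>   // Hypothetical server-side check on commit; 1024 is the suggested cap.
> >>>   static final int MAX_METADATA_BYTES = 1024;
> >>>
> >>>   static void validateMetadata(String metadata) {
> >>>       int size = metadata == null
> >>>           ? 0
> >>>           : metadata.getBytes(StandardCharsets.UTF_8).length;
> >>>       if (size > MAX_METADATA_BYTES)
> >>>           throw new IllegalArgumentException(
> >>>               "offset metadata is " + size + " bytes; max is " + MAX_METADATA_BYTES);
> >>>   }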
> >>>
> >>> Objections?
> >>>
> >>> -Jay
> >>>
> >>>
> >>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >>>
> >>>  Hey Guys,
> >>>>
> >>>> David has made a bunch of progress on the offset commit API
> >>>> implementation.
> >>>>
> >>>> Since this is a public API it would be good to do as much thinking
> >>>> up-front as possible to minimize future iterations.
> >>>>
> >>>> It would be great if folks could do the following:
> >>>> 1. Read the wiki here:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
> >>>> 2. Check out the code David wrote here:
> >>>> https://issues.apache.org/jira/browse/KAFKA-657
> >>>>
> >>>> In particular our hope is that this API can act as the first step in
> >>>> scaling the way we store offsets (ZK is not really very appropriate
> >>>> for this). This of course requires having some plan in mind for offset
> >>>> storage. I have written (and then after getting some initial feedback,
> >>>> rewritten) a section in the above wiki on how this might work.
> >>>>
> >>>> If no one says anything, I will be taking a slightly modified patch
> >>>> that adds this functionality on trunk as soon as David gets in a few
> >>>> minor tweaks.
> >>>>
> >>>> -Jay
> >>>>
> >>>>
> >>
> >
>
