Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-08 Thread Becket Qin
I updated the wiki to make the following change: Instead of maintaining a globally monotonically increasing time index, we only make sure the time index for each log segment is monotonically increasing. By doing that everything seems much simpler. We avoid an empty time index. Enforcing log retention
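
A minimal sketch of the per-segment rule, in Python. The class name SegmentTimeIndex, the method maybe_append, and the (timestamp, position) entry layout are assumptions made for illustration; this is not Kafka's implementation.

```python
class SegmentTimeIndex:
    """Hypothetical time index owned by a single log segment."""

    def __init__(self):
        # Entries are (timestamp, position) pairs, kept in insertion order.
        self.entries = []

    def maybe_append(self, timestamp, position):
        """Append only if the timestamp is larger than this segment's last entry.

        Monotonicity is checked against this segment alone, never against
        entries held by other segments.
        """
        if self.entries and timestamp <= self.entries[-1][0]:
            return False
        self.entries.append((timestamp, position))
        return True


older = SegmentTimeIndex()
newer = SegmentTimeIndex()
older.maybe_append(1000, 0)
# A smaller timestamp is still accepted in a different segment,
# because the ordering is not global.
newer.maybe_append(900, 0)
```

Each index stays sorted on its own, so a lookup inside one segment can still use binary search.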

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-07 Thread Becket Qin
Hi Jun, What do you think about the above solution? I am trying to include KIP-33 into 0.10.0 because the log retention has been a long pending issue. Thanks, Jiangjie (Becket) Qin On Tue, Mar 1, 2016 at 8:18 PM, Becket Qin wrote: > Hi Jun, > > I see. If we only use index.interval.bytes, the

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-01 Thread Becket Qin
Hi Jun, I see. If we only use index.interval.bytes, the time index entry will be inserted when (1) the largest timestamp is in this segment AND (2) at least index.interval.bytes have been appended since last time index entry insertion. In this case (1) becomes implicit instead of having an explici
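
The two conditions above can be sketched as a single predicate, in Python. The function name and all parameter names are hypothetical, and reading condition (1) as "the segment's largest timestamp exceeds the largest timestamp in earlier segments" is an assumption, since the message is cut off.

```python
def should_insert_time_index_entry(
    segment_max_timestamp,
    earlier_segments_max_timestamp,
    bytes_since_last_entry,
    index_interval_bytes,
):
    """Return True when both insertion conditions hold."""
    # (1) The largest timestamp seen so far lives in this segment.
    largest_is_here = segment_max_timestamp > earlier_segments_max_timestamp
    # (2) Enough bytes were appended since the last time index entry.
    enough_bytes = bytes_since_last_entry >= index_interval_bytes
    return largest_is_here and enough_bytes
```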

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-01 Thread Jun Rao
Hi, Jiangjie, I was thinking perhaps just reusing index.interval.bytes is enough. Not sure if there is much value in adding an additional time.index.interval.ms. For 1, the timestamp index has entries of timestamp -> file position. So, there is actually no offset in the index, right? For 2, what

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-01 Thread Becket Qin
Hi Jun, Rolling out a new segment when the time index is full sounds good. So both time index and offset index will be sharing the configuration of max index size. If we do that, do you think we still want to reuse index.interval.bytes? If we don't, the risk is that in some corner cases, we might

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-03-01 Thread Jun Rao
Jiangjie, Currently, we roll a new log segment if the index is full. We can probably just do the same on the time index. This will bound the index size. 1. Sounds good. 2. I was wondering about an edge case where the largest timestamp is in the oldest segment and the time index is empty in all newe
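
A rough sketch of rolling a segment when either index is full, in Python. The function names, the per-entry sizes, and the idea of one shared size limit passed as max_index_bytes are illustrative assumptions, not values or names taken from the KIP.

```python
def index_is_full(entry_count, entry_size_bytes, max_index_bytes):
    """An index is full when one more entry would exceed its size limit."""
    return (entry_count + 1) * entry_size_bytes > max_index_bytes


def should_roll_segment(
    offset_entries,
    time_entries,
    offset_entry_size,
    time_entry_size,
    max_index_bytes,
):
    """Roll a new log segment as soon as either index has no room left."""
    return index_is_full(
        offset_entries, offset_entry_size, max_index_bytes
    ) or index_is_full(time_entries, time_entry_size, max_index_bytes)
```

Because a roll is triggered by whichever index fills first, neither index file can grow past the limit.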

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-29 Thread Becket Qin
Hi Jun, I think index.interval.bytes is used to control the density of the offset index. The counterpart of index.interval.bytes for time index is time.index.interval.ms. If we did not change the semantic of log.roll.ms, log.roll.ms/time.index.interval.ms and log.segment.bytes/index.interval.bytes

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-29 Thread Jun Rao
Hi, Becket, I thought that your proposal to build time-based index just based off index.interval.bytes is reasonable. Is there a particular need to also add time.index.interval.bytes? Computing the pre-allocated index file size based on log segment file size can be useful. However, the tricky thin

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-28 Thread Becket Qin
Hi Guozhang, The size of the memory mapped index file was our concern as well. That is why we are suggesting minute level time indexing instead of second level. There are a few thoughts on the extra memory cost of time index. 1. Currently all the index files are loaded as memory mapped files. No

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-25 Thread Guozhang Wang
Jiangjie, I was originally only thinking about the "time.index.size.max.bytes" config in addition to the "offset.index.size.max.bytes". Since the latter's default size is 10MB, and for memory mapped file, we will allocate that much of memory at the start which could be a pressure on RAM if we doub

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-24 Thread Becket Qin
Hi Guozhang, I thought about this again and it seems we still need the time.index.interval.ms configuration to avoid unnecessary frequent time index insertion. I just updated the wiki to add index.interval.bytes as an additional constraint for time index entry insertion. Another slight change m

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-24 Thread Becket Qin
Thanks for the comment Guozhang, I just changed the configuration name to "time.index.interval.ms". It seems the real question here is how big the offset indices will be. Theoretically we can have one time index entry for each message in a log segment. For example, if there is one event per minut

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-24 Thread Guozhang Wang
Thanks Jiangjie, a few comments on the wiki: 1. Config name "time.index.interval" to "time.index.interval.ms" to be consistent. Also do we need a "time.index.size.max.bytes" as well? 2. Will the memory mapped index file for timestamp have the same default initial / max size (10485760) as the offs

Re: [VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-23 Thread Becket Qin
Bump. Per Jun's comments during KIP hangout, I have updated wiki with the upgrade plan for KIP-33. Let's vote! Thanks, Jiangjie (Becket) Qin On Wed, Feb 3, 2016 at 10:32 AM, Becket Qin wrote: > Hi all, > > I would like to initiate the vote for KIP-33. > > https://cwiki.apache.org/confluence/d

[VOTE] KIP-33 - Add a time based log index to Kafka

2016-02-03 Thread Becket Qin
Hi all, I would like to initiate the vote for KIP-33. https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index A good amount of the KIP has been touched during the discussion on KIP-32. So I also put the link to KIP-32 here for reference. https://cwiki.apache.org/co