+1

On Fri, Apr 15, 2016 at 9:37 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> +1 from me. Thanks.
>
> On Fri, Apr 15, 2016 at 9:16 AM, Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Jiangjie,
>>
>> Thanks for the latest update. +1 on the KIP.
>>
>> Jun
>>
>> On Thu, Apr 14, 2016 at 2:36 PM, Becket Qin <becket....@gmail.com> wrote:
>>
>> > Hi Jun,
>> >
>> > 11. Yes, that sounds a reasonable solution. In the latest patch I am
>> doing
>> > the following in order:
>> > a. Create an empty time index for a log segment if there isn't one.
>> > b. For all non-active log segments, append an entry of
>> > (last_modification_time -> next_base_offset) into the index. The time
>> index
>> > of the active segment is left empty. All the in-memory maxTimestamp is
>> set
>> > to -1.
>> > c. If there is a hard failure, the time index and offset index will both
>> be
>> > rebuilt.
>> >
>> > So we do not rebuild the time index during upgrade unless there was a
>> hard
>> > failure. I have updated the wiki to reflect this.
>> >
>> > BTW, it seems that the current code will never hit the case where an
>> index
>> > is missing. I commented on PR.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> > On Thu, Apr 14, 2016 at 10:00 AM, Jun Rao <j...@confluent.io> wrote:
>> >
>> > > Hi, Jiangjie,
>> > >
>> > > 11. Rebuilding all missing time indexes will make the upgrade process
>> > > longer since the log segments pre 0.10.0 don't have the time index.
>> Could
>> > > we only rebuild the missing indexes after the last flush offset? For
>> > other
>> > > segments missing the time index, we just assume lastTimestamp to be -1?
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Wed, Apr 13, 2016 at 9:55 AM, Becket Qin <becket....@gmail.com>
>> > wrote:
>> > >
>> > > > Hi Jun and Guozhang,
>> > > >
>> > > > I have updated the KIP wiki to incorporate your comments. Please let
>> me
>> > > > know if you prefer starting another discussion thread for further
>> > > > discussion.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Mon, Apr 11, 2016 at 12:21 AM, Becket Qin <becket....@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Hi Guozhang and Jun,
>> > > > >
>> > > > > Thanks for the comments. Please see the responses below.
>> > > > >
>> > > > > Regarding to Guozhang's question #1 and Jun's question #12. I was
>> > > > > inserting the time index and offset index entry together mostly for
>> > > > > simplicity as Guozhang mentioned. The purpose of using index
>> interval
>> > > > bytes
>> > > > > for time index was to control the density of the time index, which
>> is
>> > > the
>> > > > > same purpose as offset index. It seems reasonable to make them
>> > aligned.
>> > > > We
>> > > > > can track separately the physical position when we insert the last
>> > time
>> > > > > index entry(my original code did that), but when I wrote the code I
>> > > feel
>> > > > it
>> > > > > seems unnecessary. Another minor benefit is that searching by
>> > timestamp
>> > > > > could be potentially faster if we align the time index and offset
>> > > index.
>> > > > > It is possible that we only have either a corrupted time index or
>> an
>> > > > > offset index, but not both. Although we can choose to only rebuild
>> > the
>> > > > one
>> > > > > which is corrupted, given that we have to scan the entire log
>> segment
>> > > > > anyway, rebuilding both of them seems not much overhead. So the
>> > current
>> > > > > patch I have is rebuilding both of them together.
>> > > > >
>> > > > > 10. Yes, it should only happen after a hard failure. The last time
>> > > index
>> > > > > entry of a normally closed segment has already points to the LEO,
>> so
>> > > > there
>> > > > > is no scan during start up.
>> > > > >
>> > > > > 11. On broker startup, if a time index does not exist, an empty one
>> > > will
>> > > > > be created first. If message format version is 0.9.0, we will
>> append
>> > a
>> > > > time
>> > > > > index entry of (last modification time -> base offset of next
>> > segment)
>> > > to
>> > > > > the time index of each inactive segment. So no actual rebuild will
>> > > happen
>> > > > > during upgrade. However, if message format version is 0.10.0, we
>> will
>> > > > > rebuild the time index if it does not exist. (I actually had a
>> > question
>> > > > > about the how we are loading the log segments, we can discuss it in
>> > the
>> > > > PR)
>> > > > >
>> > > > > I will update the wiki to clarify the question raised in the
>> comments
>> > > and
>> > > > > submit a PR by tomorrow. I am currently cleaning up the
>> > documentation.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Sun, Apr 10, 2016 at 9:25 PM, Jun Rao <j...@confluent.io> wrote:
>> > > > >
>> > > > >> Hi, Jiangjie,
>> > > > >>
>> > > > >> Thanks for the update. Looks good to me overall. Just a few minor
>> > > > comments
>> > > > >> below.
>> > > > >>
>> > > > >> 10. On broker startup, it's not clear to me why we need to scan
>> the
>> > > log
>> > > > >> segment to retrieve the largest timestamp since the time index
>> > always
>> > > > has
>> > > > >> an entry for the largest timestamp. Is that only for restarting
>> > after
>> > > a
>> > > > >> hard failure?
>> > > > >>
>> > > > >> 11. On broker startup, if a log segment misses the time index, do
>> we
>> > > > >> always
>> > > > >> rebuild it? This can happen when the broker is upgraded.
>> > > > >>
>> > > > >> 12. Related to Guozhang's question #1. It seems it's simpler to
>> add
>> > > time
>> > > > >> index entries independent of the offset index since at index entry
>> > may
>> > > > not
>> > > > >> be added to the offset and the time index at the same time. Also,
>> > this
>> > > > >> allows time index to be rebuilt independently if needed.
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> Jun
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Apr 6, 2016 at 5:44 PM, Becket Qin <becket....@gmail.com>
>> > > > wrote:
>> > > > >>
>> > > > >> > Hi all,
>> > > > >> >
>> > > > >> > I updated KIP-33 based on the initial implementation. Per
>> > discussion
>> > > > on
>> > > > >> > yesterday's KIP hangout, I would like to initiate the new vote
>> > > thread
>> > > > >> for
>> > > > >> > KIP-33.
>> > > > >> >
>> > > > >> > The KIP wiki:
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>> > > > >> >
>> > > > >> > Here is a brief summary of the KIP:
>> > > > >> > 1. We propose to add a time index for each log segment.
>> > > > >> > 2. The time indices are going to be used of log retention, log
>> > > rolling
>> > > > >> and
>> > > > >> > message search by timestamp.
>> > > > >> >
>> > > > >> > There was an old voting thread which has some discussions on
>> this
>> > > KIP.
>> > > > >> The
>> > > > >> > mail thread link is following:
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201602.mbox/%3ccabtagwgoebukyapfpchmycjk2tepq3ngtuwnhtr2tjvsnc8...@mail.gmail.com%3E
>> > > > >> >
>> > > > >> > I have the following WIP patch for reference. It needs a few
>> more
>> > > unit
>> > > > >> > tests and documentation. Other than that it should run fine.
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/becketqin/kafka/commit/712357a3fbf1423e05f9eed7d2fed5b6fe6c37b7
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> >
>> > > > >> > Jiangjie (Becket) Qin
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> -- Guozhang

Reply via email to