Re: [VOTE] KIP-33 - Add a time based log index

Guozhang Wang Mon, 13 Jun 2016 14:30:45 -0700

Thanks Jiangjie,

I see the need for sensitive data purging, the above proposed change LGTM.
One minor concern is that a wrongly marked timestamp on the first record
could cause the segment to roll much later / earlier, though it may be rare.


Guozhang

On Fri, Jun 10, 2016 at 10:07 AM, Becket Qin <[email protected]> wrote:

> Hi,
>
> During the implementation of KIP-33, we found it might be useful to have a
> more deterministic time based log rolling than what proposed in the KIP.
>
> The current KIP proposal uses the largest timestamp in the segment for time
> based rolling. The active log segment only rolls when there is no message
> appended in max.roll.ms since the largest timestamp in the segment. i.e.
> the rolling time may change if user keeping appending messages into the
> segment. This may not be a desirable behavior for people who have sensitive
> data and want to make sure they are removed after some time.
>
> To solve the above issue, we want to modify the KIP proposal regarding the
> time based rolling to the following behavior. The time based log rolling
> will be based on the first message with a timestamp in the log segment if
> there is such a message. If no message in the segment has a timestamp, the
> time based log rolling will still be based on log segment create time,
> which is the same as we are doing now. The reasons we don't want to always
> roll based on file create time are because 1) the message timestamp may be
> assigned by clients which can be different from the create time of the log
> segment file. 2) On some Linux, the file create time is not available, so
> using segment file create time may not always work.
>
> Do people have any concern for this change? I will update the KIP if people
> think the change is OK.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Apr 19, 2016 at 6:27 PM, Becket Qin <[email protected]> wrote:
>
> > Thanks Joel and Ismael. I just updated the KIP based on your feedback.
> >
> > KIP-33 has passed with +4 (binding) and +2 (non-binding)
> >
> > Thanks everyone for the reading, feedback and voting!
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Apr 19, 2016 at 5:25 PM, Ismael Juma <[email protected]> wrote:
> >
> >> Thanks Becket. I think it would be nice to update the KIP with regards
> to
> >> point 3 and 4.
> >>
> >> In any case, +1 (non-binding)
> >>
> >> Ismael
> >>
> >> On Tue, Apr 19, 2016 at 2:03 AM, Becket Qin <[email protected]>
> wrote:
> >>
> >> > Thanks for the comments Ismael. Please see the replies inline.
> >> >
> >> > On Mon, Apr 18, 2016 at 6:50 AM, Ismael Juma <[email protected]>
> wrote:
> >> >
> >> > > Hi Jiangjie,
> >> > >
> >> > > Thanks for the KIP, it's a nice improvement. Since it seems like we
> >> have
> >> > > been using the voting thread for discussion, I'll do the same.
> >> > >
> >> > > A few minor comments/questions:
> >> > >
> >> > > 1. The proposed name for the time index file
> >> > `SegmentBaseOffset.timeindex`.
> >> > > Would `SegmentBaseOffset.time-index` be a little better? It would
> >> clearly
> >> > > separate the type of index in case we add additional index types in
> >> the
> >> > > future.
> >> > >
> >> > I have no strong opinion on this, I am not adding any thing separator
> >> > because it is more regex friendly.
> >> > I am not sure about the other indexes, time and space seems to be two
> >> most
> >> > common dimensions.
> >> >
> >> > 2. When describing the time index entry, we say "Offset - the next
> >> offset
> >> > > when the time index entry is inserted". I found the mention of
> `next`
> >> a
> >> > bit
> >> > > confusing as it looks to me like the time index entry has the first
> >> > offset
> >> > > in the message set.
> >> >
> >> > This semantic meaning is a little different from the offset index. The
> >> > offset index information is self-contained by nature. i.e. all the
> >> offsets
> >> > before is smaller than the offset of this message set. So we only need
> >> to
> >> > say "the offset of this message set is OFFSET". This does not quite
> >> apply
> >> > to the time index because the max timestamp may or may not be in the
> >> > message set being appended. So we have to either say, "the max
> timestamp
> >> > before I append this message set is T", or "the max timestamp after I
> >> > appended this message set is T". The former case means that we can
> skip
> >> all
> >> > the previous messages if we are looking for a timestamp > T and start
> >> from
> >> > this offset. The latter one means if we are searching for timestamp >
> >> T, we
> >> > should start after this message set, which is essentially the same as
> >> the
> >> > former case but require an additional interpretation.
> >> >
> >> > 3. We say "The default initial / max size of the time index files is
> the
> >> > > same as the offset index files. (time index entry is 1.5x of the
> size
> >> of
> >> > > offset index entry, user should set the configuration accordingly)".
> >> It
> >> > may
> >> > > be worth elaborating a little on what a user should do with regards
> to
> >> > this
> >> > > configuration when upgrading (ie maybe under "Compatibility,
> >> Deprecation,
> >> > > and Migration Plan").
> >> > >
> >> > Makes sense.
> >> >
> >> >
> >> > > 4. In a previous vote thread, Jun said "The simplest thing is
> probably
> >> > > to change
> >> > > the default index size to 2MB to match the default log segment size"
> >> and
> >> > > you seemed to agree. I couldn't find anything about this in the KIP.
> >> Are
> >> > we
> >> > > still doing it?
> >> > >
> >> > Yes, we can still make the change for default settings. User might
> want
> >> to
> >> > set the index size a little larger if they have a customized size but
> in
> >> > reality it should not cause problems other than rolling out a little
> >> more
> >> > log segments.
> >> >
> >> > 5. We say "Instead, it is only monotonically increasing within each
> time
> >> > > index file. i.e. It is possible that the time index file for a later
> >> log
> >> > > segment contains smaller timestamp than some timestamp in the time
> >> index
> >> > > file of an earlier segment.". I think it would be good to explain
> >> under
> >> > > which scenario a time index file for a later log segment contains a
> >> > smaller
> >> > > timestamp (is this only when CreateTime is used?).
> >> > >
> >> > Yes, it only happens when CreateTime is used.
> >> >
> >> >
> >> > > 6. We say "When searching by timestamp, broker will start from the
> >> > earliest
> >> > > log segment and check the last time index entry.". The existing
> logic
> >> > > searches from newest segment backwards. Is there a reason why we are
> >> > > changing it?
> >> > >
> >> > Suppose segment 0 has max timestamp 100, segment 1 has max timestamp
> 50
> >> and
> >> > segment 3 has max timestamp 90, now user want to search for timestamp
> >> 80.
> >> > If we search backwards, we have to take a look at all the segments. If
> >> we
> >> > search forward, we will stop at the first segment whose max timestamp
> is
> >> > greater than 80 (i.e all the previous segments has smaller timestamps)
> >> and
> >> > start the finer search on that segment.
> >> >
> >> >
> >> > > 7. Do you mind if I fix typos and minor grammar issues directly in
> the
> >> > > wiki? It seems easier than doing that via email.
> >> > >
> >> > Not at all, thanks for help.
> >> >
> >> >
> >> > > Thanks,
> >> > > Ismael
> >> > >
> >> > > On Thu, Apr 7, 2016 at 1:44 AM, Becket Qin <[email protected]>
> >> wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > I updated KIP-33 based on the initial implementation. Per
> >> discussion on
> >> > > > yesterday's KIP hangout, I would like to initiate the new vote
> >> thread
> >> > for
> >> > > > KIP-33.
> >> > > >
> >> > > > The KIP wiki:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> >> > > >
> >> > > > Here is a brief summary of the KIP:
> >> > > > 1. We propose to add a time index for each log segment.
> >> > > > 2. The time indices are going to be used of log retention, log
> >> rolling
> >> > > and
> >> > > > message search by timestamp.
> >> > > >
> >> > > > There was an old voting thread which has some discussions on this
> >> KIP.
> >> > > The
> >> > > > mail thread link is following:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201602.mbox/%3ccabtagwgoebukyapfpchmycjk2tepq3ngtuwnhtr2tjvsnc8...@mail.gmail.com%3E
> >> > > >
> >> > > > I have the following WIP patch for reference. It needs a few more
> >> unit
> >> > > > tests and documentation. Other than that it should run fine.
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/becketqin/kafka/commit/712357a3fbf1423e05f9eed7d2fed5b6fe6c37b7
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-33 - Add a time based log index

Reply via email to