Jay, Bill:

I can think of one general use case for using timestamps rather than offsets
for log deletion: expiration handling in data replication. When the source
data store expires some records based on their timestamps, today we need to
configure the corresponding Kafka changelog topic for compaction and actively
send a tombstone for each expired record. Since expiration usually affects a
large batch of records at once, this can generate a lot of tombstone traffic.
For example, I think LinkedIn's data replication for Espresso is seeing
similar issues, and they are simply not sending tombstones at all.
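To make the tombstone cost concrete, here is a minimal Python sketch; it is
not Kafka client code. The in-memory store and the `expire_with_tombstones`
helper are hypothetical stand-ins for the source data store and the changelog
producer. The only point it illustrates is that a compacted topic needs one
tombstone (a key with a null value) per expired record.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory view of the source store: key -> (timestamp_ms, value).
Store = Dict[str, Tuple[int, str]]


def expire_with_tombstones(
    store: Store, expire_before_ms: int
) -> List[Tuple[str, Optional[str]]]:
    """Remove expired records from the store and return the changelog messages
    a compacted topic would need: one tombstone (key, None) per expired key."""
    expired = [k for k, (ts, _) in store.items() if ts < expire_before_ms]
    tombstones: List[Tuple[str, Optional[str]]] = []
    for key in expired:
        del store[key]
        # A null value marks the key for removal under log compaction.
        tombstones.append((key, None))
    return tombstones


store = {f"user-{i}": (1_000 + i, "profile") for i in range(10_000)}
msgs = expire_with_tombstones(store, expire_before_ms=9_000)
print(len(msgs))  # 8000 tombstones for a single expiration pass
```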

With a timestamp-based log deletion policy, this can be handled easily by
simply setting the config to the current expiration timestamp; ideally,
though, one would configure this topic with both log compaction and log
deletion enabled. From that point of view, I feel that the current KIP still
has value and should be accepted.
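A rough sketch of what a timestamp-based deletion check could look like,
under stated assumptions: each log segment is assumed to know its largest
record timestamp, and the cutoff is a single topic-level timestamp.
`Segment` and `segments_to_delete` are hypothetical names, not Kafka's log
code. Only whole leading segments older than the cutoff are dropped, and
whatever remains can still be compacted, which is the compaction-plus-deletion
combination described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int       # first offset in the segment
    max_timestamp_ms: int  # largest record timestamp in the segment (assumed tracked)


def segments_to_delete(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    """Return the leading segments whose records are all older than the cutoff.

    Segments are assumed to be ordered by base_offset. Deletion stops at the
    first segment that still holds a record at or after the cutoff, and the
    last (active) segment is never deleted. Segments that survive are left
    for compaction, so one topic can be both compacted and time-trimmed.
    """
    deletable: List[Segment] = []
    for seg in segments[:-1]:  # skip the active segment
        if seg.max_timestamp_ms < min_timestamp_ms:
            deletable.append(seg)
        else:
            break
    return deletable


log = [Segment(0, 100), Segment(50, 200), Segment(120, 300), Segment(180, 400)]
print([s.base_offset for s in segments_to_delete(log, min_timestamp_ms=250)])  # [0, 50]
```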

Guozhang


On Mon, May 2, 2016 at 2:37 PM, Bill Warshaw <wdwars...@gmail.com> wrote:

> Yes, I'd agree that offset is a more precise configuration than timestamp.
> If there were a way to set a partition-level configuration, I would rather
> use log.retention.min.offset than timestamp.  If you have an approach in
> mind, I'd be open to investigating it.
>
> On Mon, May 2, 2016 at 5:33 PM, Jay Kreps <j...@confluent.io> wrote:
>
> > Gotcha, good point. But barring that limitation, do you agree that that
> > makes more sense?
> >
> > -Jay
> >
> > On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> >
> > > The problem with offset as a config option is that offsets are
> > > partition-specific, so we'd need a per-partition config.  This would
> > > work for our particular use case, where we have single-partition topics,
> > > but for multiple-partition topics it would delete from all partitions
> > > based on a global topic-level offset.
> > >
> > > On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> wrote:
> > >
> > > > I think you are saying you considered a kind of trim() API that would
> > > > synchronously chop off the tail of the log starting from a given offset.
> > > > That would be one option, but what I was saying was slightly different:
> > > > in the proposal you have, where there is a config that controls
> > > > retention that the user would update, wouldn't it make more sense for
> > > > this config to be based on offset rather than timestamp?
> > > >
> > > > -Jay
> > > >
> > > > On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > >
> > > > > 1.  Initially I looked at using the actual offset, by adding a call to
> > > > > AdminUtils to simply delete everything in a given topic/partition up
> > > > > to a given offset.  I ran into a lot of trouble trying to work out how
> > > > > the system would recognize that every broker had successfully deleted
> > > > > that range from the partition before returning to the client.  If we
> > > > > were ok with treating this as a completely asynchronous operation, I
> > > > > would be open to revisiting this approach.
> > > > >
> > > > > 2.  For our use case, we would be updating the config every few hours
> > > > > for a given topic, and there would not be a sizable number of
> > > > > consumers.  I imagine that this would not scale well if someone were
> > > > > adjusting this config very frequently on a large system, but I don't
> > > > > know of any use cases where that would occur.  I imagine most use
> > > > > cases would involve truncating the log after taking a snapshot or
> > > > > doing some other expensive operation that doesn't happen very
> > > > > frequently.
> > > > >
> > > > > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > >
> > > > > > Two comments:
> > > > > >
> > > > > >    1. Is there a reason to use physical time rather than offset? The
> > > > > >    idea is for the consumer to say when it has consumed something so
> > > > > >    it can be deleted, right? It seems like offset would be a much
> > > > > >    more precise way to do this--i.e. the consumer says "I have
> > > > > >    checkpointed state up to offset X, you can get rid of anything
> > > > > >    prior to that". Doing this by timestamp seems like it is just
> > > > > >    more error-prone...
> > > > > >    2. Is this mechanism practical to use at scale? It requires
> > > > > >    several ZK writes per config change, so I guess that depends on
> > > > > >    how frequently the consumers would update the value and how many
> > > > > >    consumers there are... any thoughts on this?
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > >
> > > > > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has
> > > > > > > been accepted and is in-progress.  I've updated the KIP
> > > > > > > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy).
> > > > > > > I have a commit with the functionality for KIP-47 ready to go once
> > > > > > > KIP-33 is complete; it's a fairly minor change.
> > > > > > >
> > > > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > > > >
> > > > > > > > For convenience, the KIP is here:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > >
> > > > > > > > Do you mind updating the KIP with the time formats we plan on
> > > > > > > > supporting in the configuration?
> > > > > > > >
> > > > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I'd like to initiate the vote for KIP-47.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Bill Warshaw