+1 (binding)

On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> Bumping for visibility.  KIP is here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
>
> On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw <wdwars...@gmail.com> wrote:
>
>> Hello Guozhang,
>>
>> KIP-71 seems unrelated to this KIP.  KIP-47 is just adding a new deletion
>> policy (minimum timestamp), while KIP-71 is allowing deletion and
>> compaction to coexist.
>>
>> They both will touch LogManager, but the change for KIP-47 is very
>> isolated.
>>
>> On Wed, Aug 24, 2016 at 2:21 PM Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> Hi Bill,
>>
>> I would like to check whether there is any overlap between this KIP and
>> KIP-71
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
>>
>> I feel they are orthogonal but would like to double check with you.
>>
>>
>> Guozhang
>>
>>
>> On Wed, Aug 24, 2016 at 11:05 AM, Bill Warshaw <wdwars...@gmail.com>
>> wrote:
>>
>> > I'd like to re-awaken this voting thread now that KIP-33 has merged.
>> This
>> > KIP is now completely unblocked.  I have a working branch off of trunk
>> with
>> > my proposed fix, including testing.
>> >
>> > On Mon, May 9, 2016 at 8:30 PM Guozhang Wang <wangg...@gmail.com> wrote:
>> >
>> > > Jay, Bill:
>> > >
>> > > I'm thinking of one general use case of using timestamp rather than
>> > offset
>> > > for log deletion, which is that for expiration handling in data
>> > > replication, when the source data store decides to expire some data
>> > records
>> > > based on their timestamps, today we need to configure the corresponding
>> > > Kafka changelog topic for compaction, and actively send a tombstone for
>> > > each expired record. Since expiration usually happens with a bunch of
>> > > records, this could generate large tombstone traffic. For example I
>> think
>> > > LI's data replication for Espresso is seeing similar issues and they
>> are
>> > > just not sending tombstones at all.
>> > >
>> > > With timestamp based log deletion policy, this can be easily handled by
>> > > simply setting the current expiration timestamp; but ideally one would
>> > > prefer to configure this topic to be both log compaction enabled as
>> well
>> > as
>> > > log deletion enabled. From that point of view, I feel that current KIP
>> > > still has value to be accepted.
>> > >
>> > > Guozhang
>> > >
>> > >
>> > > On Mon, May 2, 2016 at 2:37 PM, Bill Warshaw <wdwars...@gmail.com>
>> > wrote:
>> > >
>> > > > Yes, I'd agree that offset is a more precise configuration than
>> > > timestamp.
>> > > > If there was a way to set a partition-level configuration, I would
>> > rather
>> > > > use log.retention.min.offset than timestamp.  If you have an approach
>> > in
>> > > > mind I'd be open to investigating it.
>> > > >
>> > > > On Mon, May 2, 2016 at 5:33 PM, Jay Kreps <j...@confluent.io> wrote:
>> > > >
>> > > > > Gotcha, good point. But barring that limitation, you agree that
>> that
>> > > > makes
>> > > > > more sense?
>> > > > >
>> > > > > -Jay
>> > > > >
>> > > > > On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw <wdwars...@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > The problem with offset as a config option is that offsets are
>> > > > > > partition-specific, so we'd need a per-partition config.  This
>> > would
>> > > > work
>> > > > > > for our particular use case, where we have single-partition
>> topics,
>> > > but
>> > > > > for
>> > > > > > multiple-partition topics it would delete from all partitions
>> based
>> > > on
>> > > > a
>> > > > > > global topic-level offset.
>> > > > > >
>> > > > > > On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io>
>> > wrote:
>> > > > > >
>> > > > > > > I think you are saying you considered a kind of trim() api that
>> > > would
>> > > > > > > synchronously chop off the tail of the log starting from a
>> given
>> > > > > offset.
>> > > > > > > That would be one option, but what I was saying was slightly
>> > > > different:
>> > > > > > in
>> > > > > > > the proposal you have where there is a config that controls
>> > > retention
>> > > > > > that
>> > > > > > > the user would update, wouldn't it make more sense for this
>> > config
>> > > to
>> > > > > be
>> > > > > > > based on offset rather than timestamp?
>> > > > > > >
>> > > > > > > -Jay
>> > > > > > >
>> > > > > > > On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <
>> > wdwars...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > 1.  Initially I looked at using the actual offset, by adding
>> a
>> > > call
>> > > > > to
>> > > > > > > > AdminUtils to just delete anything in a given topic/partition
>> > to
>> > > a
>> > > > > > given
>> > > > > > > > offset.  I ran into a lot of trouble here trying to work out
>> > how
>> > > > the
>> > > > > > > system
>> > > > > > > > would recognize that every broker had successfully deleted
>> that
>> > > > range
>> > > > > > > from
>> > > > > > > > the partition before returning to the client.  If we were ok
>> > > > treating
>> > > > > > > this
>> > > > > > > > as a completely asynchronous operation I would be open to
>> > > > revisiting
>> > > > > > this
>> > > > > > > > approach.
>> > > > > > > >
>> > > > > > > > 2.  For our use case, we would be updating the config every
>> few
>> > > > hours
>> > > > > > > for a
>> > > > > > > > given topic, and there would not be a sizable number of
>> > > > > consumers.  I
>> > > > > > > > imagine that this would not scale well if someone was
>> adjusting
>> > > > this
>> > > > > > > config
>> > > > > > > > very frequently on a large system, but I don't know if there
>> > are
>> > > > any
>> > > > > > use
>> > > > > > > > cases where that would occur.  I imagine most use cases would
>> > > > involve
>> > > > > > > > truncating the log after taking a snapshot or doing some
>> other
>> > > > > > expensive
>> > > > > > > > operation that didn't occur very frequently.
>> > > > > > > >
>> > > > > > > > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io>
>> > > > wrote:
>> > > > > > > >
>> > > > > > > > > Two comments:
>> > > > > > > > >
>> > > > > > > > >    1. Is there a reason to use physical time rather than
>> > > offset?
>> > > > > The
>> > > > > > > idea
>> > > > > > > > >    is for the consumer to say when it has consumed
>> something
>> > so
>> > > > it
>> > > > > > can
>> > > > > > > be
>> > > > > > > > >    deleted, right? It seems like offset would be a much
>> more
>> > > > > precise
>> > > > > > > way
>> > > > > > > > > to do
>> > > > > > > > >    this--i.e. the consumer says "I have checkpointed state
>> up
>> > > to
>> > > > > > > offset X
>> > > > > > > > > you
>> > > > > > > > >    can get rid of anything prior to that". Doing this by
>> > > > timestamp
>> > > > > > > seems
>> > > > > > > > > like
>> > > > > > > > >    it is just more error prone...
>> > > > > > > > >    2. Is this mechanism practical to use at scale? It
>> > requires
>> > > > > > several
>> > > > > > > ZK
>> > > > > > > > >    writes per config change, so I guess that depends on how
>> > > > > > frequently
>> > > > > > > > the
>> > > > > > > > >    consumers would update the value and how many consumers
>> > > there
>> > > > > > > > are...any
>> > > > > > > > >    thoughts on this?
>> > > > > > > > >
>> > > > > > > > > -Jay
>> > > > > > > > >
>> > > > > > > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <
>> > > > wdwars...@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > I'd like to re-initiate the vote for KIP-47 now that
>> KIP-33
>> > > has
>> > > > > > been
>> > > > > > > > > > accepted and is in-progress.  I've updated the KIP (
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
>> > > > > > > > > > ).
>> > > > > > > > > > I have a commit with the functionality for KIP-47 ready
>> to
>> > go
>> > > > > once
>> > > > > > > > KIP-33
>> > > > > > > > > > is complete; it's a fairly minor change.
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <
>> > > > g...@confluent.io>
>> > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > For convenience, the KIP is here:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
>> > > > > > > > > > >
>> > > > > > > > > > > Do you mind updating the KIP with the time formats we plan
>> > on
>> > > > > > > supporting
>> > > > > > > > > > > in the configuration?
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <
>> > > > > > wdwars...@gmail.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > > > Hello,
>> > > > > > > > > > > >
>> > > > > > > > > > > > I'd like to initiate the vote for KIP-47.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > Bill Warshaw
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>>
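
For readers skimming the quoted thread: Bill's point that offsets are partition-specific while a timestamp applies uniformly across partitions can be illustrated with a small sketch. This is not Kafka's LogManager code. The Segment class, the two helper functions, and the sample numbers are hypothetical, and the sketch assumes deletion happens a whole segment at a time.

```python
from dataclasses import dataclass
from typing import List

HOUR_MS = 3_600_000


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one log segment of a partition."""
    base_offset: int
    last_offset: int
    max_timestamp_ms: int  # largest record timestamp in the segment


def deletable_by_timestamp(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    # A segment can go only if every record in it is older than the cutoff.
    return [s for s in segments if s.max_timestamp_ms < min_timestamp_ms]


def deletable_by_offset(segments: List[Segment], min_offset: int) -> List[Segment]:
    # A segment can go only if every record in it is below the cutoff offset.
    return [s for s in segments if s.last_offset < min_offset]


# Two partitions of one topic: one receives 100 records per hour, the other 10.
busy = [Segment(0, 99, 1 * HOUR_MS),
        Segment(100, 199, 2 * HOUR_MS),
        Segment(200, 299, 3 * HOUR_MS)]
quiet = [Segment(0, 9, 1 * HOUR_MS),
         Segment(10, 19, 2 * HOUR_MS),
         Segment(20, 29, 3 * HOUR_MS)]

# One topic-level timestamp removes the same time range from both partitions.
cutoff_ms = 2 * HOUR_MS + 1
ts_busy = deletable_by_timestamp(busy, cutoff_ms)
ts_quiet = deletable_by_timestamp(quiet, cutoff_ms)

# One topic-level offset removes two hours from the busy partition
# but everything from the quiet one.
off_busy = deletable_by_offset(busy, 200)
off_quiet = deletable_by_offset(quiet, 200)

print(len(ts_busy), len(ts_quiet), len(off_busy), len(off_quiet))
```

With these made-up numbers the timestamp cutoff keeps the most recent hour in both partitions, whereas the shared offset cutoff empties the quiet partition entirely. That is the behaviour Bill describes for multiple-partition topics when only a topic-level config is available.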



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
