The problem with offset as a config option is that offsets are
partition-specific, so we'd need a per-partition config.  This would work
for our particular use case, where we have single-partition topics, but for
multiple-partition topics it would delete from all partitions based on a
global topic-level offset.
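
To make that concrete, here is a rough sketch of what I mean.  This is not
Kafka code: the partition layout and the records_deleted helper are made up
for illustration, and each partition is modeled as nothing more than a
(log start offset, log end offset) pair.

```python
# Hypothetical model, not Kafka code: each partition is a
# (log_start_offset, log_end_offset) pair, and offsets are counted
# independently in every partition.

def records_deleted(partitions, topic_level_offset):
    """Return how many records each partition would lose if everything
    below topic_level_offset were deleted in every partition."""
    deleted = {}
    for partition, (start, end) in partitions.items():
        # Clamp the cutoff into this partition's own offset range.
        cutoff = min(max(topic_level_offset, start), end)
        deleted[partition] = cutoff - start
    return deleted


# Three partitions of one topic that have received different amounts of data.
partitions = {0: (0, 1000), 1: (0, 120), 2: (0, 40)}

# A consumer has checkpointed partition 0 up to offset 100 and sets the
# topic-level offset config to 100.
print(records_deleted(partitions, 100))  # {0: 100, 1: 100, 2: 40}
```

Partition 0 loses the 100 records the consumer has actually checkpointed,
but partition 2 is emptied completely even though nothing was said about it.
With a single-partition topic the problem goes away, which is why this would
be fine for us but not in general.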

On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> wrote:

> I think you are saying you considered a kind of trim() api that would
> synchronously chop off the tail of the log starting from a given offset.
> That would be one option, but what I was saying was slightly different: in
> the proposal you have where there is a config that controls retention that
> the user would update, wouldn't it make more sense for this config to be
> based on offset rather than timestamp?
>
> -Jay
>
> On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > 1.  Initially I looked at using the actual offset, by adding a call to
> > AdminUtils to just delete anything in a given topic/partition to a given
> > offset.  I ran into a lot of trouble here trying to work out how the system
> > would recognize that every broker had successfully deleted that range from
> > the partition before returning to the client.  If we were ok treating this
> > as a completely asynchronous operation I would be open to revisiting this
> > approach.
> >
> > 2.  For our use case, we would be updating the config every few hours for a
> > given topic, and there would not be a sizable number of consumers.  I
> > imagine that this would not scale well if someone was adjusting this config
> > very frequently on a large system, but I don't know if there are any use
> > cases where that would occur.  I imagine most use cases would involve
> > truncating the log after taking a snapshot or doing some other expensive
> > operation that didn't occur very frequently.
> >
> > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Two comments:
> > >
> > >    1. Is there a reason to use physical time rather than offset? The idea
> > >    is for the consumer to say when it has consumed something so it can be
> > >    deleted, right? It seems like offset would be a much more precise way
> > >    to do this--i.e. the consumer says "I have checkpointed state up to
> > >    offset X you can get rid of anything prior to that". Doing this by
> > >    timestamp seems like it is just more error prone...
> > >    2. Is this mechanism practical to use at scale? It requires several ZK
> > >    writes per config change, so I guess that depends on how frequently the
> > >    consumers would update the value and how many consumers there are...any
> > >    thoughts on this?
> > >
> > > -Jay
> > >
> > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > >
> > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > > > accepted and is in-progress.  I've updated the KIP (
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > ).
> > > > I have a commit with the functionality for KIP-47 ready to go once KIP-33
> > > > is complete; it's a fairly minor change.
> > > >
> > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > > For convenience, the KIP is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > >
> > > > > Do you mind updating the KIP with  time formats we plan on
> supporting
> > > > > in the configuration?
> > > > >
> > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to initiate the vote for KIP-47.
> > > > > >
> > > > > > Thanks,
> > > > > > Bill Warshaw
> > > > >
> > > >
> > >
> >
>
