The problem with making offset a config option is that offsets are partition-specific, so we would need a per-partition config. A single topic-level offset would work for our particular use case, where all of our topics have one partition, but on a multi-partition topic it would delete from every partition based on one global offset, even though each partition's offsets advance independently.
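To make the partition-specific point concrete, here is a small Python sketch. It models each partition as a hypothetical PartitionLog (log start and end offset) and compares two approaches. The first trims each partition against its own checkpointed offset. The second applies one topic-level offset to all partitions. The class, the functions and the numbers are illustrative assumptions only, not Kafka APIs.

```python
from dataclasses import dataclass, replace
from typing import Dict, Mapping


@dataclass(frozen=True)
class PartitionLog:
    """Hypothetical model of one partition: offsets in [log_start_offset, log_end_offset) are retained."""
    log_start_offset: int
    log_end_offset: int

    @property
    def retained(self) -> int:
        return self.log_end_offset - self.log_start_offset


def delete_before(log: PartitionLog, offset: int) -> PartitionLog:
    """Drop every record whose offset is below `offset`, clamped to the partition's current range."""
    new_start = min(max(log.log_start_offset, offset), log.log_end_offset)
    return replace(log, log_start_offset=new_start)


def apply_per_partition(topic: Mapping[int, PartitionLog],
                        cutoffs: Mapping[int, int]) -> Dict[int, PartitionLog]:
    """Per-partition config: each partition is trimmed against its own checkpointed offset."""
    return {p: delete_before(log, cutoffs[p]) if p in cutoffs else log
            for p, log in topic.items()}


def apply_topic_level(topic: Mapping[int, PartitionLog],
                      offset: int) -> Dict[int, PartitionLog]:
    """Topic-level config: one number is applied to every partition's independent offset space."""
    return {p: delete_before(log, offset) for p, log in topic.items()}


# Three partitions whose offsets advance independently (illustrative numbers).
topic = {0: PartitionLog(0, 1000), 1: PartitionLog(0, 300), 2: PartitionLog(0, 800)}
# Suppose the consumer has checkpointed partitions 0/1/2 up to offsets 500/100/650.
per_partition = apply_per_partition(topic, {0: 500, 1: 100, 2: 650})
topic_level = apply_topic_level(topic, 500)
for p in sorted(topic):
    # Columns: partition, records retained (per-partition), records retained (topic-level)
    print(p, per_partition[p].retained, topic_level[p].retained)
```

With a single-partition topic the two approaches coincide, which is why a topic-level offset would be enough for our case. With three partitions, a topic-level value of 500 deletes all of partition 1, including 200 records the consumer never checkpointed. It also keeps 150 already-checkpointed records in partition 2.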

On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> wrote:
> I think you are saying you considered a kind of trim() API that would
> synchronously chop off the tail of the log starting from a given offset.
> That would be one option, but what I was saying was slightly different: in
> the proposal you have where there is a config that controls retention that
> the user would update, wouldn't it make more sense for this config to be
> based on offset rather than timestamp?
>
> -Jay
>
> On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > 1. Initially I looked at using the actual offset, by adding a call to
> > AdminUtils to just delete anything in a given topic/partition to a given
> > offset. I ran into a lot of trouble here trying to work out how the system
> > would recognize that every broker had successfully deleted that range from
> > the partition before returning to the client. If we were OK treating this
> > as a completely asynchronous operation I would be open to revisiting this
> > approach.
> >
> > 2. For our use case, we would be updating the config every few hours for a
> > given topic, and there would not be a sizable number of consumers. I
> > imagine that this would not scale well if someone was adjusting this config
> > very frequently on a large system, but I don't know if there are any use
> > cases where that would occur. I imagine most use cases would involve
> > truncating the log after taking a snapshot or doing some other expensive
> > operation that didn't occur very frequently.
> >
> > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Two comments:
> > >
> > > 1. Is there a reason to use physical time rather than offset? The idea
> > > is for the consumer to say when it has consumed something so it can be
> > > deleted, right? It seems like offset would be a much more precise way
> > > to do this--i.e. the consumer says "I have checkpointed state up to
> > > offset X, you can get rid of anything prior to that". Doing this by
> > > timestamp seems like it is just more error prone...
> > > 2. Is this mechanism practical to use at scale? It requires several ZK
> > > writes per config change, so I guess that depends on how frequently the
> > > consumers would update the value and how many consumers there are...any
> > > thoughts on this?
> > >
> > > -Jay
> > >
> > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > >
> > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > > > accepted and is in progress. I've updated the KIP (
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > ).
> > > > I have a commit with the functionality for KIP-47 ready to go once
> > > > KIP-33 is complete; it's a fairly minor change.
> > > >
> > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > > For convenience, the KIP is here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > >
> > > > > Do you mind updating the KIP with the time formats we plan on
> > > > > supporting in the configuration?
> > > > >
> > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to initiate the vote for KIP-47.
> > > > > >
> > > > > > Thanks,
> > > > > > Bill Warshaw