Hi Grant, I apologise - I seem to have skipped over Joel's email. It is not something we considered, but it seems valid. I'm not sure whether we should do it as part of this KIP or revisit it if/when we have more cleanup policies.
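Joel's comma-separated suggestion (quoted below) could be sketched as a small validator along these lines - an illustrative Python sketch only, not the KIP's or the broker's actual implementation; only "compact" and "delete" are real policy names here, anything else is hypothetical:

```python
def parse_cleanup_policy(value):
    """Parse a comma-separated cleanup.policy value such as "compact,delete".

    Returns the policies in the order given, which (per the suggestion in
    the thread) could indicate precedence. Rejects unknown and duplicate
    policy names.
    """
    valid = {"compact", "delete"}  # more policies could be added later
    policies = [p.strip() for p in value.split(",") if p.strip()]
    if not policies:
        raise ValueError("cleanup.policy must specify at least one policy")
    seen = set()
    for p in policies:
        if p not in valid:
            raise ValueError("unknown cleanup policy: " + p)
        if p in seen:
            raise ValueError("duplicate cleanup policy: " + p)
        seen.add(p)
    return policies
```

Part of the appeal of the list form is visible here: `"compact,delete"` parses to `["compact", "delete"]`, while a combinatorial name like `"compact_and_delete"` is simply rejected, so no `x_and_y_and_z` names are ever needed.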
Thanks,
Damian

On Fri, 19 Aug 2016 at 15:57 Grant Henke <ghe...@cloudera.com> wrote:

> Thanks for this KIP Damian.
>
> I know this vote has passed and I think it is okay as is, but I had
> similar thoughts as Joel about combining compaction policies.
>
> > I'm just wondering if it makes sense to allow specifying multiple
> > comma-separated policies "compact,delete" as opposed to
> > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
> > with more policies. The order could potentially indicate precedence.
>
> Out of curiosity, was the option of supporting a list of policies
> rejected? Is it something we may consider adding later but didn't want
> to include in the scope of this KIP?
>
> Thanks,
> Grant
>
> On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > Thanks for the proposal. I'm +1 overall with a thought somewhat
> > related to Jun's comments.
> >
> > While there may not yet be a sensible use case for it, it should be
> > (in theory) legal to have compact_and_delete with size based retention
> > as well. I'm just wondering if it makes sense to allow specifying
> > multiple comma-separated policies "compact,delete" as opposed to
> > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come
> > up with more policies. The order could potentially indicate
> > precedence. Anyway, it is just a thought - it may end up being very
> > confusing for users.
> >
> > @Jason - I agree this could be used to handle offset expiration as
> > well. We can discuss that separately though; and if we do that we
> > would want to also deprecate the retention field in the commit
> > requests.
> >
> > Joel
> >
> > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy <damian....@gmail.com> wrote:
> >
> > > Thanks Jason.
> > > The log retention.ms will be set to a value greater than the window
> > > retention time. So as windows expire, they eventually get cleaned
> > > up by the broker.
> > > It doesn't matter if old windows are around for some time beyond
> > > their usefulness, more that they do eventually get removed and the
> > > log doesn't grow indefinitely (as it does now).
> > >
> > > Damian
> > >
> > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson <ja...@confluent.io> wrote:
> > >
> > > > Hey Damian,
> > > >
> > > > That's true, but it would avoid the need to write tombstones for
> > > > the expired offsets I guess. I'm actually not sure it's a great
> > > > idea anyway. One thing we've talked about is not expiring any
> > > > offsets as long as a group is alive, which would require some
> > > > custom expiration logic. There's also the fact that we'd risk
> > > > expiring group metadata which is stored in the same log. Having a
> > > > builtin expiration mechanism may be more useful for the compacted
> > > > topics we maintain in Connect, but I think there too we might need
> > > > some custom logic. For example, expiring connector configs purely
> > > > based on time doesn't seem like what we'd want.
> > > >
> > > > By the way, I wonder if you could describe the expected usage in a
> > > > little more detail in the KIP for those of us who are not as
> > > > familiar with Kafka Streams. Is the intention to have the log
> > > > retain only the most recent window? In that case, would you always
> > > > set the log retention time to the window length? And I suppose a
> > > > consumer would do a seek to the start of the window (as soon as
> > > > KIP-33 is available) and consume from there in order to read the
> > > > current state?
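The usage Damian describes - retention.ms set beyond the window retention time, with compaction keeping only the latest value per key until whole segments age out - can be sketched as a toy model (illustrative Python only; the segment layout and timestamps are simplifying assumptions, not broker code):

```python
def clean_log(segments, retention_ms, now_ms):
    """Toy model of compact-and-delete on a windowed changelog.

    `segments` is a list of (last_record_timestamp_ms, records) tuples,
    oldest segment first, where records is a list of (window_key, value)
    pairs. Whole segments whose newest record is older than retention_ms
    are deleted; compaction then keeps only the latest value for each
    surviving key.
    """
    # Time-based deletion: drop a segment once its newest record has
    # fallen outside the retention window.
    live = [recs for last_ts, recs in segments
            if now_ms - last_ts <= retention_ms]
    # Compaction: later records override earlier ones for the same key.
    latest = {}
    for recs in live:
        for key, value in recs:
            latest[key] = value
    return latest
```

With retention set larger than the window retention time, an expired window's keys eventually vanish entirely when their segments are deleted, so the log stops growing without anyone having to write tombstones.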
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy <damian....@gmail.com> wrote:
> > > >
> > > > > Thanks Jun
> > > > >
> > > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Damian,
> > > > > >
> > > > > > I was just wondering if we should disable size-based retention
> > > > > > in the compact_and_delete mode. So, it sounds like there could
> > > > > > be a use case for that. Since by default the size-based
> > > > > > retention is infinite, I think we can just leave the KIP as it
> > > > > > is.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The only concrete example I can think of is a case for
> > > > > > > limiting disk usage. Say I had something like Connect
> > > > > > > running that was tracking changes in a database. Downstream
> > > > > > > I don't really care about every change, I just want the
> > > > > > > latest values, so compaction could be enabled. However, the
> > > > > > > Kafka cluster has limited disk space, so we need to limit
> > > > > > > the size of each partition.
> > > > > > > In a previous life I have done the same, just without
> > > > > > > compaction turned on.
> > > > > > >
> > > > > > > Besides, I don't think it costs us anything in terms of
> > > > > > > added complexity to enable it for time- & size-based
> > > > > > > retention - the code already does this for us.
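The disk-usage case above - compaction for latest values plus a size cap per partition - can be sketched in the same toy terms (illustrative only; `retention.bytes` is the real broker setting for the size cap, everything else here is invented for the sketch):

```python
def enforce_size_retention(segments, retention_bytes):
    """Toy model of size-based deletion combined with compaction.

    `segments` is a list of (size_bytes, records) tuples, oldest first,
    where records is a list of (key, value) pairs. Oldest segments are
    deleted until the total size fits under retention_bytes; compaction
    then keeps the latest value per key in what remains.
    """
    total = sum(size for size, _ in segments)
    kept = list(segments)
    # Delete from the oldest end until the partition fits the cap.
    while kept and total > retention_bytes:
        size, _ = kept.pop(0)
        total -= size
    # Compaction: keep only the most recent value per key.
    latest = {}
    for _, records in kept:
        for key, value in records:
            latest[key] = value
    return latest
```

This matches Damian's point that combining the two costs little extra machinery: size-based deletion and compaction operate independently, deletion on whole old segments and compaction on whatever survives.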
> > > > > > > Thanks,
> > > > > > > Damian
> > > > > > >
> > > > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <n...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Jun,
> > > > > > > >
> > > > > > > > The motivation for this KIP is to handle joins and windows
> > > > > > > > in Kafka Streams better, and since Streams supports
> > > > > > > > time-based windows, the KIP suggests combining time-based
> > > > > > > > deletion and compaction.
> > > > > > > >
> > > > > > > > It might make sense to do the same for size-based windows,
> > > > > > > > but can you think of a concrete use case? If not, perhaps
> > > > > > > > we can come back to it.
> > > > > > > >
> > > > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > >> Hi, Damian,
> > > > > > > >>
> > > > > > > >> Thanks for the proposal. It makes sense to use time-based
> > > > > > > >> deletion retention and compaction together, as you
> > > > > > > >> mentioned in the KStreams case.
> > > > > > > >>
> > > > > > > >> Is there a use case where we want to combine size-based
> > > > > > > >> deletion retention and compaction together?
> > > > > > > >>
> > > > > > > >> Jun
> > > > > > > >>
> > > > > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Jason,
> > > > > > > >> >
> > > > > > > >> > Thanks for your input - appreciated.
> > > > > > > >> >
> > > > > > > >> > > 1. Would it make sense to use this KIP in the
> > > > > > > >> > > consumer coordinator to expire offsets based on the
> > > > > > > >> > > topic's retention time?
> > > > > > > >> > > Currently, we have a periodic task which scans the
> > > > > > > >> > > full cache to check which offsets can be expired, but
> > > > > > > >> > > we might be able to get rid of this if we had a
> > > > > > > >> > > callback to update the cache when a segment was
> > > > > > > >> > > deleted. Technically offsets can be given their own
> > > > > > > >> > > expiration time, but it seems questionable whether we
> > > > > > > >> > > need this going forward (the new consumer doesn't
> > > > > > > >> > > even expose it at the moment).
> > > > > > > >> >
> > > > > > > >> > The KIP in its current form isn't adding a callback. So
> > > > > > > >> > you'd still need to scan the cache and remove any
> > > > > > > >> > expired offsets; however, you wouldn't send the
> > > > > > > >> > tombstone messages.
> > > > > > > >> > Having a callback sounds useful, though it isn't clear
> > > > > > > >> > to me how you would know which offsets to remove from
> > > > > > > >> > the cache on segment deletion. I will look into it.
> > > > > > > >> >
> > > > > > > >> > > 2. This KIP could also be useful for expiration in
> > > > > > > >> > > the case of a cache maintained on the client, but I
> > > > > > > >> > > don't see an obvious way that we'd be able to
> > > > > > > >> > > leverage it since there's no indication to the client
> > > > > > > >> > > when a segment has been deleted (unless they reload
> > > > > > > >> > > the cache from the beginning of the log).
> > > > > > > >> > > One approach I can think of would be to write
> > > > > > > >> > > corresponding tombstones as necessary when a segment
> > > > > > > >> > > is removed, but that seems pretty heavy. Have you
> > > > > > > >> > > considered this problem?
> > > > > > > >> >
> > > > > > > >> > We've not considered this, and I'm not sure we want to
> > > > > > > >> > as part of this KIP.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Damian
> > > > > > > >> >
> > > > > > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi,
> > > > > > > >> > > >
> > > > > > > >> > > > We have created KIP-71: `Enable log compaction and
> > > > > > > >> > > > deletion to co-exist`
> > > > > > > >> > > >
> > > > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > > > > > >> > > >
> > > > > > > >> > > > Please take a look. Feedback is appreciated.
> > > > > > > >> > > >
> > > > > > > >> > > > Thank you
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
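Damian's point in the thread above - that without a callback the coordinator would still scan its offset cache periodically, only without writing tombstones - could look roughly like this (a sketch with invented names and cache shape; the real coordinator logic in the broker differs):

```python
import time

def expire_offsets(offset_cache, retention_ms, now_ms=None):
    """Toy periodic-scan expiration for an offset cache mapping
    (group, topic, partition) -> (offset, commit_timestamp_ms).

    Removes entries older than retention_ms and returns the expired
    keys. With compact-and-delete on the offsets topic, the expired
    entries simply age out of the log, so no tombstones are written.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    expired = [key for key, (_, ts) in offset_cache.items()
               if now_ms - ts > retention_ms]
    for key in expired:
        del offset_cache[key]
    return expired
```

As Jason notes, a deletion callback could replace this scan, but only if the coordinator could map a deleted segment back to the cache entries it held - which is the open question Damian says he will look into.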