Peter, Wesley, thanks for your use cases. There is a KIP discussion about adding a timestamp-based log deletion policy to Kafka alongside compaction, and I'm wondering whether it makes sense to enable both log deletion and log compaction for the general case of changelog data with expirations. Please take a look at the wiki page and discussion thread, and feel free to leave comments on the email thread if you think it could fit your needs.
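[Editor's note: a toy model of what "compaction plus time-based deletion" could mean for a changelog topic, to make the proposal concrete. This is an illustration of the combined semantics under discussion, not Kafka's actual log cleaner; all names here are hypothetical.]

```python
# Keep only the latest record per key (compaction), then drop keys whose
# latest update is older than the retention window (time-based deletion).

def clean(records, now, retention_ms):
    """records: iterable of (key, value, timestamp_ms) in offset order."""
    latest = {}
    for key, value, ts in records:
        latest[key] = (value, ts)           # later offsets win: compaction
    return {
        key: value
        for key, (value, ts) in latest.items()
        if now - ts <= retention_ms         # expired keys are deleted
    }

records = [
    ("s1", "v1", 1_000),
    ("s2", "v1", 2_000),
    ("s1", "v2", 9_000),   # s1 refreshed; s2 never updated again
]
print(clean(records, now=10_000, retention_ms=5_000))  # {'s1': 'v2'}
```

Note how a key stays alive as long as it keeps receiving updates, which is exactly the idle-session expiration behavior described below.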
https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
https://www.mail-archive.com/dev@kafka.apache.org/msg49573.html

Guozhang

On Fri, May 13, 2016 at 6:52 AM, Wesley Chow <w...@chartbeat.com> wrote:

> Yes, also classic caching, where you might use memcache with TTLs.
>
> But a different use case for us is sessionizing. We push a high rate of
> updates coming from browser sessions to our Kafka cluster. If we don't
> see an update for a particular session after some period of time, we say
> that session has expired and want to delete it. Compacted logs seem great
> for this, but without TTLs we'd have to consume these updates ourselves
> just to figure out when to expire each session. I can go into more detail
> if that's not clear.
>
> The general case here is that sometimes you want a KV store that doesn't
> exceed some resource bound. In the case of caching, you may not want to
> exceed some time bound, but you may also not want to exceed some space
> bound. You can certainly enforce these bounds with a consumer, but if the
> rate of updates to the keys is high, that becomes an expensive
> proposition. In my sessionizing case, consuming that data to handle
> expirations can easily add tens of thousands of dollars per year in
> inter-AZ costs (not to mention the servers to run the extra consumers),
> so having it taken care of in the brokers would be very useful.
>
> Wes
>
>
> > On May 12, 2016, at 8:25 PM, Peter Davis <davi...@gmail.com> wrote:
> >
> > One use case is implementing a data retention policy.
> >
> > -Peter
> >
> >
> >> On May 12, 2016, at 17:11, Guozhang Wang <wangg...@gmail.com> wrote:
> >>
> >> Wesley,
> >>
> >> Could you describe your use case a bit more to motivate this? Is your
> >> data source expiring records, and you therefore want to auto-"delete"
> >> the corresponding Kafka records as well?
> >>
> >> Guozhang
> >>
> >>> On Thu, May 12, 2016 at 2:35 PM, Wesley Chow <w...@chartbeat.com> wrote:
> >>>
> >>> Right, I'm trying to avoid explicitly managing TTLs. It's nice being
> >>> able to just produce keys into Kafka without needing an accompanying
> >>> vacuum consumer.
> >>>
> >>> Wes
> >>>
> >>>
> >>>> On May 12, 2016, at 5:15 PM, Benjamin Manns <benma...@gmail.com> wrote:
> >>>>
> >>>> If you send a NULL value to a compacted log, the key will be removed
> >>>> after the retention period. You could run a process that reprocesses
> >>>> the log and sends a NULL to keys you want to purge, based on some
> >>>> custom logic.
> >>>>
> >>>> On Thu, May 12, 2016 at 2:01 PM, Wesley Chow <w...@chartbeat.com> wrote:
> >>>>
> >>>>> Are there any thoughts on supporting TTLs on keys in compacted logs?
> >>>>> In other words, some way to set a time to auto-delete on a per-key
> >>>>> basis.
> >>>>>
> >>>>> Wes
> >>>>
> >>>>
> >>>> --
> >>>> Benjamin Manns
> >>>> benma...@gmail.com
> >>>> (434) 321-8324
> >>
> >>
> >> --
> >> -- Guozhang

--
-- Guozhang
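[Editor's note: a sketch of the "vacuum consumer" Wes describes having to run today — track the last-seen time per session key and emit a tombstone (None value) for any session idle longer than the TTL. The function name, the TTL value, and the in-memory stand-ins for the Kafka consumer/producer are all hypothetical.]

```python
SESSION_TTL_MS = 30 * 60 * 1000  # assume a 30-minute idle timeout

def expire_idle_sessions(updates, now_ms, last_seen):
    """updates: iterable of (session_key, timestamp_ms) consumed from the
    changelog topic; last_seen: dict persisting state between calls.
    Returns tombstones to produce back to the compacted topic."""
    for key, ts in updates:
        last_seen[key] = max(ts, last_seen.get(key, 0))
    tombstones = [(key, None) for key, ts in last_seen.items()
                  if now_ms - ts > SESSION_TTL_MS]
    for key, _ in tombstones:
        del last_seen[key]
    return tombstones

state = {}
print(expire_idle_sessions([("s1", 0), ("s2", 1_500_000)],
                           now_ms=2_000_000, last_seen=state))
# -> [('s1', None)]; s2 was active within the TTL
```

This is the per-key bookkeeping Wes wants pushed into the brokers: every update to the topic must also flow through this process, which is where the inter-AZ transfer cost comes from.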
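[Editor's note: Benjamin's suggestion in miniature — in a compacted topic, a record with a NULL value acts as a tombstone, and compaction eventually drops the key entirely. A toy compactor over (key, value) records in offset order; this models the semantics, not Kafka's segment-based cleaner.]

```python
def compact(log):
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)   # tombstone: key is purged
        else:
            state[key] = value     # latest value wins
    return state

log = [("user1", "a"), ("user2", "b"), ("user1", "c"), ("user2", None)]
print(compact(log))  # {'user1': 'c'}
```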