Hi Matthias, kind of :)
I'm interested in the retention mechanisms. My use case is to keep old windows around for a long time (up to a year or longer) and access them via interactive queries. As I understand from the documentation, the retention mechanism is used to keep changelogs from "growing out of bounds". This is a bit unclear to me: what are the storage costs of using a window store? For example, if data is received at a rate of 1 message per second and messages are aggregated to a single key using a tumbling window of 1 hour, would the size of the compacted changelog (and window store) be 24 records after 24 hours?

Are there other potential tradeoffs when using the window store with a long retention? E.g., looking at the RocksDB implementation, there is something called a segment, which seems to correspond to a single RocksDB instance. Does that have an effect on querying?

Best,
Mikael

On Wed, Dec 14, 2016 at 6:44 PM Matthias J. Sax <matth...@confluent.io> wrote:

I am not sure if I can follow.

However, in Kafka Streams using window aggregation, the windowed KTable uses a key-value store internally -- it's only called a windowed store because it encodes the key for the store as a pair of <record-key:windowId> and also applies a couple of other mechanisms with regard to retention time to delete old windows.

Does this answer your question?

-Matthias

On 12/14/16 6:46 AM, Mikael Högqvist wrote:
> Hi,
>
> I'm wondering about the tradeoffs when implementing a tumbling window with
> a long retention, e.g. 1 year. Is it better to use a normal key value store
> and aggregate the time bucket using a group by instead of a window store?
>
> Best,
> Mikael
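
To make the arithmetic in the storage question above concrete, here is a minimal sketch in plain Java (no Kafka dependency). It only counts the distinct window start times that one key would touch, which is the number of distinct <record-key:windowId> entries. The class and method names are hypothetical. The sketch assumes windows are aligned to epoch 0 and that the first message arrives exactly on a window boundary. It does not model log compaction timing, retention, or RocksDB segments, so it does not by itself answer what the changelog size on disk would be.

```java
import java.util.HashSet;
import java.util.Set;

public class TumblingWindowCount {

    // Start of the tumbling window containing timestampMs.
    // Assumption: windows are aligned to epoch 0 and timestamps are non-negative.
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Number of distinct windows touched by one key when a message arrives
    // every intervalMs, starting at timestamp 0, for durationMs in total.
    public static int distinctWindows(long durationMs, long intervalMs, long windowSizeMs) {
        Set<Long> starts = new HashSet<>();
        for (long ts = 0; ts < durationMs; ts += intervalMs) {
            starts.add(windowStart(ts, windowSizeMs));
        }
        return starts.size();
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60L * 1000L;
        // 1 message per second for 24 hours, 1-hour tumbling windows, single key.
        System.out.println(distinctWindows(24 * hourMs, 1000L, hourMs)); // prints 24
    }
}
```

Under these assumptions the 86,400 messages collapse into 24 window entries for the key. If the stream did not start on a window boundary, the same 24 hours of data would span 25 windows.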