Hi Matthias,

kind of :)

I'm interested in the retention mechanisms, and my use case is to keep old
windows around for a long time (up to a year or longer) and access them via
interactive queries. As I understand from the documentation, the retention
mechanism is used to keep changelogs from "growing out of bounds". This is
still a bit unclear to me: what are the storage costs of using a window store?
For example, if data is received at a rate of 1 message per second and
messages are aggregated to a single key using a tumbling window of 1 hour,
would the size of the compacted changelog (and window store) be 24 records
after 24 hours?
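
The reasoning above can be checked with a back-of-envelope sketch (assuming compaction has fully run and each (key, window) pair collapses to a single record; the function name is just for illustration):

```python
# Back-of-envelope sketch (not Kafka code): estimate how many window
# records remain for a tumbling-window aggregation, assuming the
# changelog is fully compacted to one record per key per window.

def retained_records(window_size_s: int, retention_s: int, num_keys: int = 1) -> int:
    """Number of window records kept once each (key, window) pair
    has been collapsed to its latest value."""
    return (retention_s // window_size_s) * num_keys

HOUR = 3600
DAY = 24 * HOUR
YEAR = 365 * DAY

print(retained_records(HOUR, DAY))   # 24 records after one day
print(retained_records(HOUR, YEAR))  # 8760 records after one year
```

So with 1-hour tumbling windows and a single key, a year of retention would mean on the order of 8760 window records, regardless of the 1 msg/sec input rate.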

Are there other potential tradeoffs when using the window store with a long
retention? E.g., looking at the RocksDB implementation, there is something
called a segment, which seems to correspond to a single RocksDB instance.
Does that have an effect on querying?

Best,
Mikael


On Wed, Dec 14, 2016 at 6:44 PM Matthias J. Sax <matth...@confluent.io>
wrote:

I am not sure if I can follow.

However, in Kafka Streams' window aggregation, the windowed KTable
uses a key-value store internally -- it's only called a windowed store
because it encodes the store key as a pair of
<record-key:windowId>, and it also applies a couple of other mechanisms
with regard to retention time to delete old windows.
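
Conceptually, that key layout could be sketched like this (a hypothetical encoding for illustration only, not Kafka's actual binary format):

```python
import struct

# Hypothetical illustration: model a windowed-store key as the record key
# plus the window start timestamp, so an ordinary key-value store can
# serve (key, window) lookups. Names and layout are assumptions, not
# Kafka's real WindowStore wire format.

def windowed_key(record_key: bytes, window_start_ms: int) -> bytes:
    # Big-endian timestamp keeps entries for the same record key
    # ordered by window start time.
    return record_key + struct.pack(">q", window_start_ms)

store = {}
# One value per (record key, window) pair:
store[windowed_key(b"sensor-1", 1481716800000)] = 42
```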

Does this answer your question?


-Matthias

On 12/14/16 6:46 AM, Mikael Högqvist wrote:
> Hi,
>
> I'm wondering about the tradeoffs when implementing a tumbling window with
> a long retention, e.g. 1 year. Is it better to use a normal key-value store
> and aggregate the time bucket using a group by instead of a window store?
>
> Best,
> Mikael
>
