Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Damian Guy
If you use the versions of the methods that take a store name, they will all be backed by RocksDB.

On Wed, 3 May 2017 at 15:32 Garrett Barton wrote:
> João, yes the stores would hold 90 days and prefer it to be rocksdb backed.
>
> I didn't actually know there wasn't an in memory state store.

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Garrett Barton
João, yes, the stores would hold 90 days, and I'd prefer them to be RocksDB backed.

I didn't actually know there wasn't an in-memory state store. And now that I think about it, how do I verify (or set) what kind of store Streams is using for all the tasks? I have a bunch windowed and not windowed and I d

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Eno Thereska
Just to add to this, there is a JIRA that tracks the fact that we don’t have an in-memory windowed store.

https://issues.apache.org/jira/browse/KAFKA-4730

Eno

> On May 3, 2017, at 12:42 PM, Damian Guy wrote:
>
> The windowed state store is onl

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Damian Guy
The windowed state store is only RocksDB at this point, so it isn't going to all be in memory. If you choose to implement your own windowed store, then you could hold it in memory if it would fit.

On Wed, 3 May 2017 at 04:37 João Peixoto wrote:
> Out of curiosity, would this mean that a state sto
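As a rough illustration of what holding a windowed store in memory involves, the Python sketch below keeps per-key sums bucketed into tumbling windows and evicts windows that fall outside a retention period. It is a conceptual toy, not an implementation of the Kafka Streams store interfaces; the class name `InMemoryWindowStore`, its methods, and the choice of a sum aggregate are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class InMemoryWindowStore:
    """Hypothetical in-memory store of per-key sums, bucketed by tumbling window."""

    def __init__(self, window_size_ms: int, retention_ms: int) -> None:
        self.window_size_ms = window_size_ms
        self.retention_ms = retention_ms
        # Maps (key, window_start_ms) -> running sum for that window.
        self._sums: Dict[Tuple[str, int], float] = defaultdict(float)
        # Highest event time seen so far; retention is measured against it.
        self._max_event_time_ms = 0

    def _window_start(self, event_time_ms: int) -> int:
        # Align the timestamp down to the start of its tumbling window.
        return event_time_ms - (event_time_ms % self.window_size_ms)

    def add(self, key: str, value: float, event_time_ms: int) -> None:
        self._max_event_time_ms = max(self._max_event_time_ms, event_time_ms)
        cutoff = self._max_event_time_ms - self.retention_ms
        start = self._window_start(event_time_ms)
        if start + self.window_size_ms <= cutoff:
            return  # The record's window is already past retention; drop it.
        self._sums[(key, start)] += value
        self._evict(cutoff)

    def _evict(self, cutoff_ms: int) -> None:
        # Linear scan for simplicity; a real store would index windows by time.
        expired = [k for k in self._sums if k[1] + self.window_size_ms <= cutoff_ms]
        for k in expired:
            del self._sums[k]

    def fetch(self, key: str, window_start_ms: int) -> Optional[float]:
        # Returns None when the window is unknown or has been evicted.
        return self._sums.get((key, window_start_ms))


if __name__ == "__main__":
    store = InMemoryWindowStore(window_size_ms=1000, retention_ms=5000)
    store.add("sensor-a", 1.0, 100)
    store.add("sensor-a", 2.0, 900)
    print(store.fetch("sensor-a", 0))
```

Memory use in a store like this grows with the number of keys times the number of retained windows, which is the "if it would fit" caveat: with a 90-day retention, every window in that period stays resident until it expires.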

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread João Peixoto
Out of curiosity, would this mean that a state store for such a window could hold 90 days' worth of data in memory? Or on the filesystem, if we're talking about RocksDB?

On Tue, May 2, 2017 at 10:08 AM Damian Guy wrote:
> Hi Garret,
>
> No, log.retention.hours doesn't impact compacted topics.
>
> Thanks,

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Damian Guy
Hi Garrett,

No, log.retention.hours doesn't impact compacted topics.

Thanks,
Damian

On Tue, 2 May 2017 at 18:06 Garrett Barton wrote:
> Thanks Damian,
>
> Does setting log.retention.hours have anything to do with compacted
> topics? Meaning would a topic not compact now for 90 days? I am think
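To illustrate the distinction drawn in this reply, here is a toy model in Python (not Kafka code; the `Record` layout and both function names are invented for the sketch). Time-based retention drops records older than a cutoff, whereas compaction keeps the latest record for each key regardless of its age. Real log compaction works on segments and has more rules than this, so treat it only as a picture of the idea.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    key: str
    value: str
    timestamp_hours: int  # Hypothetical timestamp, in hours since some epoch.


def apply_time_retention(
    log: List[Record], now_hours: int, retention_hours: int
) -> List[Record]:
    """Keep only records whose timestamp is within the retention period."""
    cutoff = now_hours - retention_hours
    return [r for r in log if r.timestamp_hours >= cutoff]


def apply_compaction(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, preserving log order."""
    last_index = {r.key: i for i, r in enumerate(log)}
    return [r for i, r in enumerate(log) if last_index[r.key] == i]


if __name__ == "__main__":
    log = [
        Record("sensor-a", "1", 0),
        Record("sensor-b", "2", 10),
        Record("sensor-a", "3", 3000),
    ]
    # With 2160 hours of retention at hour 3000, only the newest record survives.
    print(apply_time_retention(log, now_hours=3000, retention_hours=2160))
    # Compaction keeps the old sensor-b record because it is the latest for its key.
    print(apply_compaction(log))
```

In this model the old `sensor-b` record survives compaction even though it is far older than the retention period, which is the sense in which a retention setting does not govern a compacted topic.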

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Thanks Damian,

Does setting log.retention.hours have anything to do with compacted topics? Meaning would a topic not compact now for 90 days? I am thinking all the internal topics that streams creates in the flow. Having recovery through 90 days of logs would take a good while I'd imagine.

Than

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Damian Guy
Hi Garrett,

> I was running into data loss when segments are deleted faster than
> downstream can process. My knee jerk reaction was to set the broker
> configs log.retention.hours=2160 and log.segment.delete.delay.ms=2160
> and that made it go away, but I do not think this is right?
>
> I th
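A quick unit check on the two broker settings quoted above may help. `log.retention.hours` is expressed in hours, while `log.segment.delete.delay.ms` is expressed in milliseconds, so the same number 2160 stands for very different durations. The Python sketch below only does that arithmetic; the helper names are made up for illustration.

```python
from datetime import timedelta

# Values as quoted in the message above.
LOG_RETENTION_HOURS = 2160
LOG_SEGMENT_DELETE_DELAY_MS = 2160


def retention_as_days(hours: int) -> float:
    """Convert a retention period given in hours to days."""
    return timedelta(hours=hours) / timedelta(days=1)


def delay_as_seconds(millis: int) -> float:
    """Convert a delay given in milliseconds to seconds."""
    return timedelta(milliseconds=millis).total_seconds()


def hours_for_days(days: int) -> int:
    """Number of hours to configure for a retention period of `days` days."""
    return days * 24


if __name__ == "__main__":
    print(retention_as_days(LOG_RETENTION_HOURS))
    print(delay_as_seconds(LOG_SEGMENT_DELETE_DELAY_MS))
    print(hours_for_days(90))
```

So 2160 hours corresponds to 90 days of retention, whereas 2160 ms is only about two seconds. That suggests the retention setting, rather than the delete delay, is the one that plausibly changed the behaviour described, though the thread preview does not confirm this.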

Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Greetings all,

I have a use case where I want to calculate some metrics against sensor data using event time semantics (record time is event time) that I already have. I have years of it, but for this POC I'd like to just load the last few months so that we can start deriving trend lines now vs