Re: Micro-batching in Kafka streams - redux

2017-10-20 Thread Matthias J. Sax
Hi, as Kafka Streams focuses on stream processing, micro-batching is something we don't consider. Thus, nothing has changed/improved. About the store question: If you buffer up your writes in a store, you need to delete those values from the store later on to prevent the store from growing unbounded.

Re: micro-batching in kafka streams

2016-09-28 Thread Ara Ebrahimi
Awesome! Thanks. Ara. On Sep 28, 2016, at 3:20 PM, Guozhang Wang wrote: Ara, I'd recommend using the interactive queries feature, available in the upcoming 0.10.1 in a couple of weeks, to query the current snapshot of the state store. We are going to write a b

Re: micro-batching in kafka streams

2016-09-28 Thread Guozhang Wang
Ara, I'd recommend using the interactive queries feature, available in the upcoming 0.10.1 in a couple of weeks, to query the current snapshot of the state store. We are going to write a blog post about step-by-step instructions to leverage this feature for use cases just like yours soon. G

Re: micro-batching in kafka streams

2016-09-28 Thread Ara Ebrahimi
I need this ReadOnlyKeyValueStore. In my use case, I do an aggregateByKey(), so a KTable is formed, backed by a state store. This is then used by the next steps of the pipeline. Now using the word count sample, I try to read the state store. Hence I end up sharing it with the actual pipeline. A

Re: micro-batching in kafka streams

2016-09-28 Thread Guozhang Wang
Ara, Are you using the interactive queries feature but encountered issues due to locking file conflicts? https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams This is not expected to happen; if you are indeed using this feature I'd like to learn more of you

Re: micro-batching in kafka streams

2016-09-27 Thread Ara Ebrahimi
One more thing: Guozhang pointed me towards this sample for micro-batching: https://github.com/apache/kafka/blob/177b2d0bea76f270ec087ebe73431307c1aef5a1/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountProcessorDemo.java This is a good example and successfully

Re: micro-batching in kafka streams

2016-09-26 Thread Ara Ebrahimi
Hi, So, here’s the situation: - for classic batching of writes to external systems, right now I simply hack it. This specific case is writing records to an Accumulo database, and I simply use the batch writer to batch writes, and it flushes every second or so. I’ve added a shutdown hook to the
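The pattern Ara describes (an external batch writer plus a shutdown hook so the final partial batch is not lost) can be sketched generically. This is a minimal, self-contained illustration; `BatchingSink` and its method names are hypothetical stand-ins, not the Accumulo BatchWriter API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching-sink pattern: buffer writes, flush when the
// batch fills, and register a shutdown hook so the last partial batch
// is flushed on exit. BatchingSink is illustrative, not a real API.
public class BatchingSink {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    public BatchingSink(int batchSize) {
        this.batchSize = batchSize;
        // Flush whatever is still buffered when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush));
    }

    public void write(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            // Hand the batch to the external system here.
            flushed.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        BatchingSink sink = new BatchingSink(2);
        sink.write("r1");
        sink.write("r2"); // fills the batch, triggers a flush
        sink.write("r3"); // stays buffered until flush() or shutdown
        System.out.println(sink.flushedBatches()); // [[r1, r2]]
    }
}
```

A real implementation would also flush on a timer (as Accumulo's batch writer does every second or so), but the size trigger and shutdown hook show the core idea.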

Re: micro-batching in kafka streams

2016-09-26 Thread Srikanth
Guozhang, It's a bit hacky but I guess it will work fine as range scan isn't expensive in RocksDB. Michael, One reason is to be able to batch before sinking to an external system. A sink call per record isn't very efficient. This can be used just for the sink processor. I feel I might be stealing th

Re: micro-batching in kafka streams

2016-09-26 Thread Michael Noll
Ara, may I ask why you need to use micro-batching in the first place? Reason why I am asking: Typically, when people talk about micro-batching, they are referring to the way some originally batch-based stream processing tools "bolt on" real-time processing by making their batch sizes really small. H

Re: micro-batching in kafka streams

2016-09-23 Thread Guozhang Wang
One way that I can think of is to add an index suffix on the key to differentiate records with the same keys, so you can still store records not as a list but as separate entries in the KV store, like: ... And then when punctuating, you can still scan the whole store or do a range query based on
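Guozhang's index-suffix idea can be illustrated in plain Java. This is a sketch only: a `TreeMap` stands in for the RocksDB-backed KV store, and `subMap()` plays the role of the store's range query; the key format `key@<seq>` and the class name are illustrative assumptions:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the index-suffix idea: instead of (de)serializing a List
// per key, each record gets its own entry under "key@<seq>", and a
// range scan recovers all records for one logical key. A TreeMap
// stands in for the RocksDB-backed KV store.
public class IndexedKeyDemo {
    private final TreeMap<String, String> store = new TreeMap<>();
    private long seq = 0;

    public void append(String key, String value) {
        // Zero-padded sequence keeps lexicographic order == arrival order.
        store.put(String.format("%s@%06d", key, seq++), value);
    }

    // Range query over all entries for one logical key; in Kafka Streams
    // this would be the store's range() call at punctuate time.
    public Map<String, String> recordsFor(String key) {
        return store.subMap(key + "@", key + "@\uffff");
    }

    public static void main(String[] args) {
        IndexedKeyDemo d = new IndexedKeyDemo();
        d.append("user1", "click");
        d.append("user2", "view");
        d.append("user1", "buy");
        System.out.println(d.recordsFor("user1").values()); // [click, buy]
    }
}
```

This also addresses Srikanth's concern above: no list serialization is needed, and the per-key range scan stays cheap in a sorted store.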

Re: micro-batching in kafka streams

2016-09-23 Thread Srikanth
Guozhang, The example works well for aggregate operations. How can we achieve this if processing has to be done in micro-batches? One way would be to store the incoming records in a List-type KV store and process them in punctuate. With the current KV stores, that would mean (de)serializing a list. Which

Re: micro-batching in kafka streams

2016-09-12 Thread Ara Ebrahimi
Thanks. +1 on the KIP-63 story. I need all of that :) Ara. > On Sep 11, 2016, at 8:19 PM, Guozhang Wang wrote: > > Hello Ara, > > On the processor API, users have the flexibility to do micro-batching with > their own implementation patterns. For example, like you mentioned already: > > 1. Use a state

Re: micro-batching in kafka streams

2016-09-11 Thread Guozhang Wang
Hello Ara, On the processor API, users have the flexibility to do micro-batching with their own implementation patterns. For example, like you mentioned already: 1. Use a state store to bookkeep recently received records, and in the process() function simply put the record into the store. 2. Use punctuate
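The buffer-then-flush pattern Guozhang outlines can be sketched in plain Java. A minimal sketch, assuming a `TreeMap` as a stand-in for the Processor API's `KeyValueStore`; `MicroBatcher`, `process()`, and `punctuate()` mirror the Processor API shape but are not the real Kafka Streams interfaces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the two steps: process() only bookkeeps the record in the
// store; punctuate() scans the store, forwards the batch downstream,
// then deletes the flushed entries so the store stays bounded
// (Matthias' point about unbounded growth). A TreeMap stands in for
// a Kafka Streams KeyValueStore.
public class MicroBatcher {
    private final TreeMap<String, String> store = new TreeMap<>();
    private final List<String> sink = new ArrayList<>();

    // process(): buffer the record in the state store.
    public void process(String key, String value) {
        store.put(key, value);
    }

    // punctuate(): flush the buffered batch, then clear the store.
    public void punctuate() {
        for (Map.Entry<String, String> e : store.entrySet()) {
            sink.add(e.getKey() + "=" + e.getValue());
        }
        store.clear();
    }

    public List<String> drainedBatch() {
        return sink;
    }

    public static void main(String[] args) {
        MicroBatcher b = new MicroBatcher();
        b.process("a", "1");
        b.process("b", "2");
        b.punctuate();
        System.out.println(b.drainedBatch()); // [a=1, b=2]
    }
}
```

In real Kafka Streams code, `punctuate()` would be scheduled via the ProcessorContext and the batch forwarded or written to the external sink; the store cleanup step is the part that keeps the pattern from leaking state.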