Hi,
as Kafka Streams focuses on stream processing, micro-batching is
something we don't consider. Thus, nothing has changed/improved.
About the store question:
If you buffer up your writes in a store, you need to delete those values
from the store later on so that the store does not grow unbounded.
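For example, the flush step could look roughly like the sketch below (a hypothetical
helper, assuming the writes are buffered in a KeyValueStore<String, String>): it
forwards the buffered entries and then deletes them so the store stays bounded.

import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

final class BufferFlush {

    // Hypothetical helper, meant to be called from a punctuation callback after
    // records have been buffered into the store by process().
    static void flushAndClear(KeyValueStore<String, String> buffer, ProcessorContext context) {
        List<String> flushedKeys = new ArrayList<>();
        try (KeyValueIterator<String, String> it = buffer.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                context.forward(entry.key, entry.value);  // or write to the external system
                flushedKeys.add(entry.key);
            }
        }
        // Delete what was flushed so the store does not grow unbounded.
        flushedKeys.forEach(buffer::delete);
    }
}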
Awesome! Thanks.
Ara.
On Sep 28, 2016, at 3:20 PM, Guozhang Wang <wangg...@gmail.com> wrote:
Ara,
I'd recommend using the interactive queries feature, available in the
upcoming 0.10.1 in a couple of weeks, to query the current snapshot of the
state store.
We are going to write a blog post soon with step-by-step instructions on
leveraging this feature for use cases just like yours.
G
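A sketch of that query path, written against the current KafkaStreams API (the
method signatures have changed a little since 0.10.1) and with a made-up store
name "counts-store":

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

final class StoreQuery {

    // Query the current snapshot of the (hypothetically named) "counts-store"
    // from a running KafkaStreams instance.
    static Long lookup(KafkaStreams streams, String key) {
        ReadOnlyKeyValueStore<String, Long> counts = streams.store(
            StoreQueryParameters.fromNameAndType("counts-store",
                QueryableStoreTypes.<String, Long>keyValueStore()));
        return counts.get(key);   // point lookup; range() and all() are also available
    }
}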
I need this ReadOnlyKeyValueStore.
In my use case, I do an aggregateByKey(), so a KTable is formed, backed by a
state store. This is then used by the next steps of the pipeline. Now using the
word count sample, I try to read the state store. Hence I end up sharing it
with the actual pipeline. A
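With interactive queries, that sharing works by giving the store a name in the
aggregation and reading it through KafkaStreams.store() rather than opening the
store directly. A sketch of the aggregation side, using the later
groupByKey()/count() DSL (which replaced 0.10.0's aggregateByKey()) and made-up
topic/store names; default serdes from the StreamsConfig are assumed:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

final class AggregationTopology {

    // Naming the store ("counts-store") is what makes it addressable through
    // the interactive-queries API shown above, while the KTable keeps feeding
    // the rest of the pipeline.
    static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, Long> counts = builder
            .<String, String>stream("events")
            .groupByKey()
            .count(Materialized.as("counts-store"));
        counts.toStream().to("counts-output", Produced.with(Serdes.String(), Serdes.Long()));
        return builder;
    }
}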
Ara,
Are you using the interactive queries feature but encountering issues due to
lock file conflicts?
https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
This is not expected to happen; if you are indeed using this feature I'd
like to learn more of you
One more thing:
Guozhang pointed me towards this sample for micro-batching:
https://github.com/apache/kafka/blob/177b2d0bea76f270ec087ebe73431307c1aef5a1/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountProcessorDemo.java
This is a good example and successfully
Hi,
So, here’s the situation:
- for classic batching of writes to external systems, right now I simply hack
it. This specific case is writing records to an Accumulo database: I simply
use the BatchWriter to batch writes, and it flushes every second or so. I’ve
added a shutdown hook to the
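For reference, that hack looks roughly like the following sketch (not the actual
pipeline code; the table name and tuning values are made up, and the Accumulo
Connector is assumed to be created and authenticated elsewhere):

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

final class AccumuloSink {

    private final BatchWriter writer;

    AccumuloSink(Connector connector) throws Exception {
        BatchWriterConfig config = new BatchWriterConfig()
            .setMaxLatency(1, TimeUnit.SECONDS)   // let the BatchWriter flush about once a second
            .setMaxMemory(10 * 1024 * 1024);      // buffer up to ~10 MB of mutations
        this.writer = connector.createBatchWriter("records", config);
        // Shutdown hook so buffered mutations are flushed when the app stops.
        Runtime.getRuntime().addShutdownHook(new Thread(this::closeQuietly));
    }

    void write(String row, String family, String qualifier, String value) throws Exception {
        Mutation m = new Mutation(new Text(row));
        m.put(new Text(family), new Text(qualifier), new Value(value.getBytes(StandardCharsets.UTF_8)));
        writer.addMutation(m);   // buffered; the BatchWriter batches and flushes for us
    }

    private void closeQuietly() {
        try { writer.close(); } catch (Exception e) { /* log and ignore on shutdown */ }
    }
}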
Guozhang,
It's a bit hacky, but I guess it will work fine since range scans aren't
expensive in RocksDB.
Michael,
One reason is to be able to batch before sinking to an external system.
A sink call per record isn't very efficient.
This can be used just for the sink processor.
I feel I might be stealing th
Ara,
may I ask why you need to use micro-batching in the first place?
Reason why I am asking: typically, when people talk about micro-batching,
they are referring to the way some originally batch-based stream processing
tools "bolt on" real-time processing by making their batch sizes really
small. H
One way that I can think of is to add an index suffix on the key to
differentiate records with the same keys, so you can still store records
not as a list but as separate entries in the KV store, like:
...
And then when punctuating, you can still scan the whole store or do a range
query based on
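A sketch of that idea, with made-up names. Note that range() compares serialized
key bytes, so this relies on the String serde's lexicographic ordering, a "#"
delimiter that does not occur in the keys, and a zero-padded sequence suffix:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

final class SuffixedBuffer {

    private final AtomicLong seq = new AtomicLong();

    // On each incoming record: store it under "<key>#<sequence>" so records that
    // share a key do not overwrite each other and no list (de)serialization is needed.
    void add(KeyValueStore<String, String> buffer, String key, String value) {
        buffer.put(String.format("%s#%020d", key, seq.incrementAndGet()), value);
    }

    // On punctuation: range-scan all buffered entries for one original key,
    // process them, and delete them afterwards.
    void drain(KeyValueStore<String, String> buffer, String key) {
        List<String> drained = new ArrayList<>();
        try (KeyValueIterator<String, String> it = buffer.range(key + "#", key + "#\uffff")) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                // ... process/forward entry.value here ...
                drained.add(entry.key);
            }
        }
        drained.forEach(buffer::delete);
    }
}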
Guozhang,
The example works well for aggregate operations. How can we achieve this if
processing has to be done in micro-batches?
One way would be to store the incoming records in a List-type KV store and
process them in punctuate().
With the current KV stores, that would mean (de)serializing a list. Which
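For comparison, the list-per-key variant would look roughly like this (a custom
Serde<List<String>> for the store would be needed but is not shown), which is
where the per-record (de)serialization overhead comes from:

import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.state.KeyValueStore;

final class ListBuffer {

    // Each append is a full read-deserialize / append / serialize-write cycle of
    // the whole list, which is the overhead discussed above.
    static void append(KeyValueStore<String, List<String>> buffer, String key, String value) {
        List<String> batch = buffer.get(key);      // deserializes the entire list
        if (batch == null) {
            batch = new ArrayList<>();
        }
        batch.add(value);
        buffer.put(key, batch);                    // re-serializes the entire list
    }
}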
Thanks.
+1 on KIP-63 story. I need all of that :)
Ara.
> On Sep 11, 2016, at 8:19 PM, Guozhang Wang wrote:
Hello Ara,
On the processor API, users have the flexibility to do micro-batching with
their own implementation patterns. For example, like you mentioned already:
1. Use a state store to bookkeep recently received records, and in the
process() function simply put the record into the store.
2. Use punctuate() to periodically process the records that were buffered in
the store.
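Put together, the two steps could look like the following sketch. It is written
against a newer Processor API (Duration-based schedule(); in 0.10.x you would
override punctuate(long) and call context.schedule(interval) instead), and the
store name and flush interval are made up. The "batch-buffer" store also has to
be added to the topology and connected to this processor.

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class BatchingProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> buffer;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // "batch-buffer" is a made-up store name; it must be registered with the topology.
        this.buffer = (KeyValueStore<String, String>) context.getStateStore("batch-buffer");
        // Step 2: flush the buffered records once per second (wall-clock time).
        context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, this::flush);
    }

    @Override
    public void process(String key, String value) {
        // Step 1: bookkeep the record in the store; nothing is emitted per record.
        buffer.put(key, value);
    }

    private void flush(long timestamp) {
        List<String> flushedKeys = new ArrayList<>();
        try (KeyValueIterator<String, String> it = buffer.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                context.forward(entry.key, entry.value);  // or write the batch to an external system
                flushedKeys.add(entry.key);
            }
        }
        flushedKeys.forEach(buffer::delete);  // keep the store from growing unbounded
        context.commit();
    }

    @Override
    public void close() { }
}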