Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-10 Thread Dmitriy Vsekhvalnov
Thanks Michael, unfortunately we currently only considering exact counting. But will take a look to your example for sure. On Tue, Apr 10, 2018 at 12:16 PM, Michael Noll wrote: > Also, if you want (or can tolerate) probabilistic counting, with the option > to also do TopN in that manner, we als

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-10 Thread Michael Noll
Also, if you want (or can tolerate) probabilistic counting, with the option to also do TopN in that manner, we also have an example that uses Count Min Sketch: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/test/scala/io/confluent/examples/streams/ProbabilisticCountingSc

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-09 Thread Dmitriy Vsekhvalnov
Thanks again, yeah we saw that example for sure :) Ok, gonna try low-level Transformer and see how it goes. On Mon, Apr 9, 2018 at 9:17 PM, Matthias J. Sax wrote: > For (1), no. > > If you want to do manual put/get you should use a Transformer and > implement a custom operator. > > Btw: here

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-09 Thread Matthias J. Sax
For (1), no. If you want to do manual put/get you should use a Transformer and implement a custom operator. Btw: here is an example of TopN: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/main/java/io/confluent/examples/streams/TopArticlesExampleDriver.java -Matthia

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-09 Thread Dmitriy Vsekhvalnov
Hi Matthias, thanks clarifications below: 1. .aggregate( () -> .. , (k, v, agg) -> { //Can i access KV store here for manual put/get? }); 2. TopN is not hard, we using pretty much same approach you describe, just with bounde

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-07 Thread Matthias J. Sax
>> ok, question then - is it possible to use state store with .aggregate()? Not sure what you exactly mean by this. An aggregations always uses a store; it's a stateful operation and cannot be computed without a store. For TopN, if you get the hit-count as input, you can use a `.aggregate()` oper

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Dmitriy Vsekhvalnov
Thanks guys, ok, question then - is it possible to use state store with .aggregate()? Here are some details on counting, we basically looking for TopN + Remaining calculation. Example: - incoming data: api url -> hit count - we want output: Top 20 urls per each domain per hour + remaining coun

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Guozhang Wang
Hello Dmitriy, You can "simulate" an lower-level processor API by 1) adding the stores you need via the builder#addStore(); 2) do a manual "through" call after "selectKey" (the selected key will be the same as your original groupBy call), and then from the repartitioned stream add the `transform()

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Matthias J. Sax
KGroupedStream and TimeWindowedKStream are only logical representations at DSL level. They don't really "do" anything. Thus, you can mimic them as follows: builder.addStore(...) in.selectKey().through(...).transform(..., "storeName"). selectKey() set's the new key for the grouping and the throug

Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Dmitriy Vsekhvalnov
Hey, good day everyone, another kafka-streams friday question. We hit the wall with DSL implementation and would like to try low-level Processor API. What we looking for is to: - repartition incoming source stream via grouping records by some fields + windowed (hourly, daily, e.t.c). - and