Thanks Michael,
unfortunately we are currently only considering exact counting, but we will
take a look at your example for sure.
On Tue, Apr 10, 2018 at 12:16 PM, Michael Noll wrote:
> Also, if you want (or can tolerate) probabilistic counting, with the option
> to also do TopN in that manner, we also have an example that uses Count Min
> Sketch: [...]
Also, if you want (or can tolerate) probabilistic counting, with the option
to also do TopN in that manner, we also have an example that uses Count Min
Sketch:
https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/test/scala/io/confluent/examples/streams/ProbabilisticCountingSc
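For anyone following along who hasn't met the data structure: the linked example uses a real CMS library inside a Streams aggregation, but the core idea can be shown as a toy sketch in plain Java. The class below is only an illustration (not the linked implementation): counts are approximate, use bounded memory, and can overestimate but never underestimate.

```java
// Toy Count-Min Sketch: approximate counting in bounded memory.
// Each key is hashed into one bucket per row; the estimate is the
// minimum over rows, so collisions can only inflate it.
class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] table;
    private final int[] seeds;

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        for (int i = 0; i < depth; i++) seeds[i] = 31 * (i + 1) + 17;
    }

    private int bucket(String key, int row) {
        // Cheap per-row hash; floorMod keeps negative hashes in range.
        return Math.floorMod(key.hashCode() * seeds[row], width);
    }

    void add(String key, long count) {
        for (int i = 0; i < depth; i++) table[i][bucket(key, i)] += count;
    }

    // Estimated count = min over rows; >= the true count, never below it.
    long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) min = Math.min(min, table[i][bucket(key, i)]);
        return min;
    }
}
```

More depth/width trades memory for a tighter overestimate; the linked example wires the same idea into an aggregator's state.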
Thanks again,
yeah, we saw that example for sure :)
Ok, gonna try the low-level Transformer and see how it goes.
On Mon, Apr 9, 2018 at 9:17 PM, Matthias J. Sax wrote:
> For (1), no.
>
> If you want to do manual put/get you should use a Transformer and
> implement a custom operator.
>
> Btw: here is an example of TopN: [...]
For (1), no.
If you want to do manual put/get you should use a Transformer and
implement a custom operator.
Btw: here is an example of TopN:
https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/main/java/io/confluent/examples/streams/TopArticlesExampleDriver.java
-Matthias
Hi Matthias, thanks,
clarifications below:
1. .aggregate(() -> ..,
       (k, v, agg) -> {
           // Can I access the KV store here for manual put/get?
       });
2. TopN is not hard, we're using pretty much the same approach you describe,
just with bounde
>> ok, question then - is it possible to use state store with .aggregate()?
Not sure what exactly you mean by this. An aggregation always uses a
store; it's a stateful operation and cannot be computed without a store.
For TopN, if you get the hit-count as input, you can use a
`.aggregate()` oper
Thanks guys,
ok, question then - is it possible to use a state store with .aggregate()?
Here are some details on the counting: we're basically looking for a TopN +
Remaining calculation.
Example:
- incoming data: api url -> hit count
- we want output: top 20 urls per domain per hour + remaining count
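The "top 20 + remaining" step itself, independent of Streams, can be sketched in plain Java. This is a hypothetical helper, not from the thread: it assumes per-url counts have already been aggregated for the window, and it reuses a reserved "remaining" key for the rollup (a naming assumption for illustration only).

```java
import java.util.*;

// Sketch of the "Top N + remaining" calculation: keep the N largest
// url -> hitCount entries, sum everything else under "remaining".
class TopNPlusRemaining {
    static Map<String, Long> topN(Map<String, Long> hits, int n) {
        // Sort urls by hit count, descending.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(hits.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        Map<String, Long> out = new LinkedHashMap<>();
        long remaining = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (i < n) out.put(sorted.get(i).getKey(), sorted.get(i).getValue());
            else remaining += sorted.get(i).getValue();
        }
        out.put("remaining", remaining); // reserved key, an assumption
        return out;
    }
}
```

Inside a Streams aggregator you'd run this per (domain, hour-window) group; a bounded priority queue avoids the full sort when the per-group key space is large.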
Hello Dmitriy,
You can "simulate" a lower-level processor API by 1) adding the stores you
need via builder#addStore(); 2) doing a manual "through" call after
"selectKey" (the selected key will be the same as in your original groupBy
call), and then from the repartitioned stream adding the `transform()
KGroupedStream and TimeWindowedKStream are only logical representations
at the DSL level. They don't really "do" anything.
Thus, you can mimic them as follows:
builder.addStore(...)
in.selectKey().through(...).transform(..., "storeName").
selectKey() sets the new key for the grouping, and the throug
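Spelled out, the pattern above might look like the sketch below. This is untested and written against the kafka-streams 1.x/2.x API (where the builder method is spelled addStateStore); the topic names, store name, and the counting logic inside the Transformer are assumptions made up for illustration, only the addStateStore / selectKey / through / transform shape comes from the thread.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

class ManualAggregationSketch {

  static final String STORE = "hit-counts"; // assumed store name

  static void build(StreamsBuilder builder) {
    // 1) register the store with the builder
    builder.addStateStore(
        Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore(STORE),
            Serdes.String(), Serdes.Long()));

    KStream<String, Long> in = builder.stream("api-hits"); // assumed topic

    // 2) selectKey + through = manual repartition on the grouping key
    // 3) transform with the store name attached, for manual put/get
    in.selectKey((url, hits) -> url /* e.g. extract the domain here */)
      .through("api-hits-repartition")              // assumed topic
      .transform(CountingTransformer::new, STORE);
  }

  static class CountingTransformer
      implements Transformer<String, Long, KeyValue<String, Long>> {

    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
      store = (KeyValueStore<String, Long>) context.getStateStore(STORE);
    }

    @Override
    public KeyValue<String, Long> transform(String key, Long value) {
      Long old = store.get(key);                 // manual get
      long updated = (old == null ? 0L : old) + value;
      store.put(key, updated);                   // manual put
      return KeyValue.pair(key, updated);
    }

    @Override
    public void close() {}
  }
}
```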
Hey, good day everyone,
another kafka-streams Friday question.
We hit the wall with the DSL implementation and would like to try the low-level
Processor API.
What we're looking for is to:
- repartition the incoming source stream by grouping records by some fields
+ windowed (hourly, daily, etc.)
- and