Jon,

Damian already answered your direct question, so my comment is just an FYI:

There's a demo example at
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java
(this is for Confluent 3.2 / Kafka 0.10.2.0).

Note that this code is for demonstration purposes.  To make the example
more suitable for production use cases you could, for example, switch to
a window store instead of manually purging expired entries via
`ReadOnlyKeyValueStore#all()` (which can be an expensive operation, since
it iterates over the entire store).

Hope this helps,
Michael




On Mon, Mar 27, 2017 at 3:07 PM, Damian Guy <damian....@gmail.com> wrote:

> Jon,
> You don't need all the data for the topic, as the data is partitioned
> by key. Each state-store instance therefore de-duplicates a subset of
> the key set.
> Thanks,
> Damian
>
> On Mon, 27 Mar 2017 at 13:47 Jon Yeargers <jon.yearg...@cedexis.com>
> wrote:
>
> > I've been (re)reading this document
> > (http://docs.confluent.io/3.2.0/streams/developer-guide.html#state-stores)
> > hoping to better understand StateStores. At the top of the section there
> > is a tantalizing note implying that one could do deduplication using a
> > store.
> >
> > At present we're using Redis for this as it gives us a shared location.
> > I've been of the mind that a given store was local to a streams instance.
> > To truly support deduplication I would think one would need access to
> > _all_ the data for a topic and not just on a per-partition basis.
> >
> > Am I completely misunderstanding this?
> >
>
