Jon, Damian already answered your direct question, so my comment is just an FYI:
There's a demo example at https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java (this is for Confluent 3.2 / Kafka 0.10.2.0).

Note that this code is for demonstration purposes only. To make the example more suitable for production use cases you could, for example, switch to a window store instead of manually purging expired entries via `ReadOnlyKeyValueStore#all()` (which can be an expensive operation/iteration). I've appended a rough sketch of that window-store variant below the quoted thread.

Hope this helps,
Michael

On Mon, Mar 27, 2017 at 3:07 PM, Damian Guy <damian....@gmail.com> wrote:
> Jon,
> You don't need all the data for every topic, as the data is partitioned by
> key. Therefore each state-store instance is de-duplicating a subset of the
> key set.
> Thanks,
> Damian
>
> On Mon, 27 Mar 2017 at 13:47 Jon Yeargers <jon.yearg...@cedexis.com>
> wrote:
>
> > I've been (re)reading this document (
> > http://docs.confluent.io/3.2.0/streams/developer-guide.html#state-stores )
> > hoping to better understand StateStores. At the top of the section there is
> > a tantalizing note implying that one could do deduplication using a store.
> >
> > At present we are using Redis for this, as it gives us a shared location. I've
> > been of the mind that a given store was local to a streams instance. To
> > truly support deduplication I would think one would need access to _all_
> > the data for a topic and not just on a per-partition basis.
> >
> > Am I completely misunderstanding this?
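
---

As promised, here's a minimal sketch of what the window-store variant could look like. Caveats: it's written against a newer Kafka Streams API (the 2.x-style StreamsBuilder/Stores, not the 0.10.2.0 API used in the linked test), and the store/topic names ("dedup-store", "input-topic", "deduped-output-topic") and the 10-minute dedup window are placeholders you'd adapt to your setup:

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;
    import org.apache.kafka.streams.state.WindowStore;
    import org.apache.kafka.streams.state.WindowStoreIterator;

    public class WindowedDeduplicationSketch {

      // Drops records whose key was already seen within +/- windowMs of the
      // record's timestamp. Because a window store is segmented by time, old
      // entries fall out with the store's retention period -- no manual
      // purging via all() needed.
      static class DeduplicationTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {
        private final String storeName;
        private final long windowMs;
        private ProcessorContext context;
        private WindowStore<K, Long> seen;

        DeduplicationTransformer(final String storeName, final long windowMs) {
          this.storeName = storeName;
          this.windowMs = windowMs;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
          this.context = context;
          this.seen = (WindowStore<K, Long>) context.getStateStore(storeName);
        }

        @Override
        public KeyValue<K, V> transform(final K key, final V value) {
          final long ts = context.timestamp();
          // Fetch only the time range around this record instead of
          // iterating the whole store.
          try (final WindowStoreIterator<Long> hits =
                   seen.fetch(key, ts - windowMs, ts + windowMs)) {
            if (hits.hasNext()) {
              return null; // duplicate: forward nothing downstream
            }
          }
          seen.put(key, ts, ts); // remember this key at this timestamp
          return KeyValue.pair(key, value);
        }

        @Override
        public void close() {}
      }

      public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();

        // Persistent window store; retention controls how long keys are
        // remembered before RocksDB drops the old segments for you.
        final StoreBuilder<WindowStore<String, Long>> dedupStore =
            Stores.windowStoreBuilder(
                Stores.persistentWindowStore("dedup-store",
                    Duration.ofHours(1),  // retention period (placeholder)
                    Duration.ofHours(1),  // window size (placeholder)
                    false),               // retainDuplicates
                Serdes.String(), Serdes.Long());
        builder.addStateStore(dedupStore);

        final KStream<String, String> input = builder.stream("input-topic");
        input
            .transform(() -> new DeduplicationTransformer<String, String>(
                "dedup-store", Duration.ofMinutes(10).toMillis()), "dedup-store")
            .to("deduped-output-topic");
        // ... build the Topology and start KafkaStreams as usual ...
      }
    }

The key design point is that expiry becomes the store's job: segments older than the retention period are dropped automatically, so the per-record work is a narrow `fetch(key, from, to)` rather than a full scan over `all()`.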