Jon,

You don't need all the data for a topic in one place, because the data is partitioned by key. All records with a given key land in the same partition, so each state-store instance de-duplicates its own subset of the key set.

Thanks,
Damian
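
To illustrate the point about key partitioning, here is a minimal sketch in plain Python (not Kafka Streams code). It assumes the records are keyed by the identifier being de-duplicated. The names `partition_for` and `dedup_partitioned` are hypothetical, and CRC32 is only a stand-in for the hash a real partitioner would apply to the serialized key. Each partition keeps its own local set of seen keys, and no store ever needs to see another partition's data.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically.

    Assumption: CRC32 stands in for the real partitioner's hash. All that
    matters here is that the same key always maps to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def dedup_partitioned(records, num_partitions):
    """De-duplicate (key, value) records using one local store per partition."""
    # One "state store" per partition, as if each were owned by a separate task.
    stores = [set() for _ in range(num_partitions)]
    emitted = []
    for key, value in records:
        store = stores[partition_for(key, num_partitions)]
        if key in store:
            continue  # duplicate: this key was already seen by this partition's store
        store.add(key)
        emitted.append((key, value))
    return emitted, stores


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
emitted, stores = dedup_partitioned(records, num_partitions=3)
print(emitted)  # [('a', 1), ('b', 2), ('c', 4)]
```

Because every occurrence of a key is routed to the same partition, the local store that first sees the key also sees all of its duplicates. If the input were keyed by something other than the de-duplication identifier, it would need to be re-keyed first so that duplicates end up co-located.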

On Mon, 27 Mar 2017 at 13:47 Jon Yeargers <jon.yearg...@cedexis.com> wrote:

> I've been (re)reading this document
> (http://docs.confluent.io/3.2.0/streams/developer-guide.html#state-stores)
> hoping to better understand StateStores. At the top of the section there is
> a tantalizing note implying that one could do deduplication using a store.
>
> At present we are using Redis for this, as it gives us a shared location. I've
> been of the mind that a given store was local to a streams instance. To
> truly support deduplication I would think one would need access to _all_
> the data for a topic and not just on a per-partition basis.
>
> Am I completely misunderstanding this?