Hi
I run an app where I transform a KTable to a stream, then groupBy and
aggregate, and capture the results in a KTable again. That generates many
duplicates.
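
Roughly, the topology looks like this (a simplified sketch; the topic names,
types, store names and serdes are placeholders, not my real ones):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();

    // source KTable
    KTable<String, String> input = builder.table("input-topic", "input-store");

    // KTable -> stream -> groupBy -> aggregate -> KTable
    KTable<String, Long> aggregated = input
            .toStream()
            .groupBy((key, value) -> value)               // re-key by an attribute of the value
            .aggregate(
                    () -> 0L,                             // initializer
                    (newKey, value, total) -> total + 1,  // aggregator
                    Serdes.Long(),
                    "aggregate-store");

    // the duplicates show up in the downstream topic
    aggregated.toStream().to(Serdes.String(), Serdes.Long(), "output-topic");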

I have played with exactly-once semantics, which seems to reduce duplicates
for records that should be unique. But I still get duplicates on keys that
have two or more records.
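
For the exactly-once experiments I set the processing guarantee in the streams
config, roughly like this (just the relevant line; "props" is the Properties
object I pass to StreamsConfig):

    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);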

I could not reproduce it on a small number of records, so I disabled caching
by setting CACHE_MAX_BYTES_BUFFERING_CONFIG to 0. Sure enough, I got loads of
duplicates, even those previously eliminated by exactly-once semantics. Now I
am having a hard time enabling it again on Confluent 3.3.
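
For reference, this is how I disabled the record cache (again just the relevant
line; as far as I understand, any non-zero value, e.g. the 10 MB default, turns
it back on):

    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);  // 0 disables caching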

But generally, what is the best deduplication strategy for Kafka Streams?

Artur
