Hi
I run an app where I transform a KTable to a stream, then groupBy and
aggregate, and capture the results in a KTable again. That generates many
duplicates.
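
Roughly, the topology looks like this (a simplified sketch; the topic names,
types, store names and serdes are placeholders, not my real ones):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();

    // source KTable
    KTable<String, String> input = builder.table("input-topic", "input-store");

    // KTable -> stream -> groupBy -> aggregate -> KTable
    KTable<String, Long> aggregated = input
            .toStream()
            .groupBy((key, value) -> value)               // re-key by an attribute of the value
            .aggregate(
                    () -> 0L,                             // initializer
                    (newKey, value, total) -> total + 1,  // aggregator
                    Serdes.Long(),
                    "aggregate-store");

    // the duplicates show up in the downstream topic
    aggregated.toStream().to(Serdes.String(), Serdes.Long(), "output-topic");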

I have played with exactly-once semantics, which seems to reduce duplicates
for records that should be unique. But I still get duplicates on keys that
have two or more records.
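
For the exactly-once experiments I set the processing guarantee in the streams
config, roughly like this (just the relevant line; "props" is the Properties
object I pass to StreamsConfig):

    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);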

I could not reproduce it on a small number of records, so I disabled caching
by setting CACHE_MAX_BYTES_BUFFERING_CONFIG to 0. Sure enough, I got loads of
duplicates, even those previously eliminated by exactly-once semantics. Now I
am having a hard time enabling it again on Confluent 3.3.
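
For reference, this is how I disabled the record cache (again just the relevant
line; as far as I understand, any non-zero value, e.g. the 10 MB default, turns
it back on):

    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);  // 0 disables caching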

But generally, what is the best deduplication strategy for Kafka Streams?

Artur
