Hi,

What are duplicate messages in your use case?

1) different messages with the same content
2) the same message that is sent multiple times to the broker due to retries in the producer
3) something else
What do you mean by "identify those duplicates"? What do you want to do with them?

For case 1), you could write all messages to a topic and then identify the duplicates with a Kafka Streams application, process them, and write the results back to a topic. Be aware that identifying duplicate messages lets the state of the Kafka Streams application grow to the sum of the sizes of all unique messages, because you have to keep every message in the state to be able to recognize future duplicates. That is not feasible in most cases.

To bound the state in the Streams application, you can restrict the identification of duplicates to a time window, for example, identify all duplicate messages of the last hour. Within a window of one hour you would only process unique messages, but you could still see duplicates across windows. You can find a sketch of such a windowed deduplication at the end of this mail.

If you want a fail-safe identification of duplicates, you also need to switch on exactly-once semantics in the Streams application (also sketched below). See https://kafka.apache.org/documentation/streams/ and the Streams configuration `processing.guarantee` under https://kafka.apache.org/22/documentation/streams/developer-guide/config-streams.html#id6 for more information on Kafka Streams and exactly-once semantics.

For case 2), if you want to ensure that the same message is only written once to the log, you should look into idempotent producers (see the last sketch below). See https://kafka.apache.org/documentation/#semantics and the producer configuration `enable.idempotence` under https://kafka.apache.org/documentation/#producerconfigs .

Hope that helps.

Best regards,
Bruno

On Fri, Apr 26, 2019 at 9:02 AM saching1...@gmail.com
<saching1...@gmail.com> wrote:
> I have multiple clients who can send duplicate packets multiple time to
> same kafka topic.. Is there a way to identify those duplicate packets.
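Here is a minimal sketch of the windowed deduplication for case 1), using the Kafka Streams Processor API via `transform()`. It assumes that duplicates share the same message key and that `input-topic` and `deduplicated-topic` are placeholder topic names; adapt the window size and serdes to your data:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class DeduplicationSketch {

    static final String STORE = "dedup-store";
    static final Duration WINDOW = Duration.ofHours(1);

    // Forwards a record only if no record with the same key was seen
    // within the last hour; otherwise drops it as a duplicate.
    static class DedupTransformer
            implements Transformer<String, String, KeyValue<String, String>> {
        private ProcessorContext context;
        private WindowStore<String, Long> seen;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            this.context = context;
            this.seen = (WindowStore<String, Long>) context.getStateStore(STORE);
        }

        @Override
        public KeyValue<String, String> transform(final String key, final String value) {
            final long now = context.timestamp();
            // Look for the same key within the last window.
            try (final WindowStoreIterator<Long> hits =
                     seen.fetch(key, now - WINDOW.toMillis(), now)) {
                if (hits.hasNext()) {
                    return null; // duplicate within the window -> drop it
                }
            }
            seen.put(key, now, now);
            return KeyValue.pair(key, value); // first occurrence -> forward it
        }

        @Override
        public void close() {}
    }

    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();

        // Window store that remembers keys for one hour only, so the state
        // stays bounded by the number of unique keys seen per hour.
        builder.addStateStore(Stores.windowStoreBuilder(
            Stores.persistentWindowStore(STORE, WINDOW.multipliedBy(2), WINDOW, false),
            Serdes.String(), Serdes.Long()));

        builder.<String, String>stream("input-topic")
               .transform(DedupTransformer::new, STORE)
               .to("deduplicated-topic");

        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The window store bounds the state as described above: a key dropped out of the store after the retention period can appear again, which is the duplicate-across-windows trade-off.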
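Switching on exactly-once semantics is then a single configuration change. A minimal sketch (application id and bootstrap servers are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    static Properties streamsProps() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // With exactly-once, updates to the deduplication state and the
        // output records are committed atomically, so a failure and restart
        // cannot produce duplicate results from the Streams application.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return props;
    }
}
```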
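And for case 2), a minimal producer sketch with idempotence enabled (topic name, key, and value are placeholders). Note that idempotence only de-duplicates retries within the producer itself; it does not help if your clients deliberately resend the same payload, which is case 1):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerSketch {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // With idempotence enabled, the broker de-duplicates retried produce
        // requests, so internal producer retries cannot write the same
        // message to the log more than once.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (final KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("input-topic", "key", "value"));
        }
    }
}
```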