GitHub user haphut added a comment to the discussion: Allow topic compaction to discard messages with duplicate key
GitHub link: https://github.com/apache/pulsar/discussions/18842#discussioncomment-4350911

Another variant of this problem occurs when an in-order pub-sub API, e.g. an MQTT API, or any other ephemeral event source feeds Pulsar. If we run only one instance of the Pulsar Producer or Pulsar Source and that instance crashes or has network issues, some messages may never reach Pulsar.

An obvious HA solution is to run several identical Producer instances in parallel in different AZs. All of them feed one Pulsar topic, so the topic receives multiple copies of each message. How, then, do we retain only one copy of each unique message? Keep-last topic compaction could easily scramble the original order. It keeps the last-arriving copy of each key. So if a lagging instance delivers a late duplicate of an earlier message after the last copy of a later one, the earlier message ends up after it. Instead we can use a Pulsar Function or a Consumer-Producer to implement something similar to keep-first compaction. The Function needs to keep state for the set of keys or hashes of already-handled messages, perhaps in BookKeeper or in another topic.

The latter part of this HA feeder pattern would feel much more ergonomic with built-in, keep-first, [on-the-fly](https://github.com/apache/pulsar/issues/6230) topic compaction.
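The following Python sketch makes the ordering argument concrete. It is self-contained and is not Pulsar code. It contrasts a simplified model of keep-last compaction with keep-first deduplication over the same interleaved stream. Several pieces are assumptions for illustration only:

- the names `dedup_key`, `keep_last_compact` and `KeepFirstDeduplicator`;
- the SHA-256 fallback for unkeyed messages;
- the bounded in-memory `seen` set.

A real Function would instead persist `seen` in durable state such as BookKeeper or another topic, as described above.

```python
import hashlib
from collections import OrderedDict
from typing import Iterable, Iterator, List, Optional, Tuple

# A message is modeled as (key, payload); key may be None for unkeyed messages.
Message = Tuple[Optional[str], bytes]


def dedup_key(key: Optional[str], payload: bytes) -> str:
    """Identity used for deduplication: the message key if present,
    otherwise a SHA-256 hash of the payload (prefixes keep the two apart)."""
    if key is not None:
        return "key:" + key
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def keep_last_compact(messages: Iterable[Message]) -> List[Message]:
    """Simplified model of keep-last compaction: one copy per identity
    survives, at the position of that identity's LAST occurrence."""
    msgs = list(messages)
    last_index = {}
    for i, (key, payload) in enumerate(msgs):
        last_index[dedup_key(key, payload)] = i
    return [msgs[i] for i in sorted(last_index.values())]


class KeepFirstDeduplicator:
    """Keep-first deduplication: forward only the first copy of each message,
    so the output follows first-arrival order.

    `seen` is a hypothetical stand-in for durable state (e.g. function state
    in BookKeeper or a side topic). Here it is in-memory and bounded: once it
    exceeds `max_keys` entries, the oldest inserted key is forgotten.
    """

    def __init__(self, max_keys: int = 100_000) -> None:
        self.max_keys = max_keys
        self.seen: "OrderedDict[str, None]" = OrderedDict()

    def process(self, key: Optional[str], payload: bytes) -> Optional[Message]:
        ident = dedup_key(key, payload)
        if ident in self.seen:
            return None  # duplicate from a redundant producer: drop it
        self.seen[ident] = None
        if len(self.seen) > self.max_keys:
            self.seen.popitem(last=False)  # evict the oldest inserted key
        return (key, payload)

    def run(self, messages: Iterable[Message]) -> Iterator[Message]:
        for key, payload in messages:
            out = self.process(key, payload)
            if out is not None:
                yield out


if __name__ == "__main__":
    # Instance A delivers m1, m2. Instance B lags, delivers m1 late,
    # then crashes before it can send m2.
    stream = [("m1", b"1"), ("m2", b"2"), ("m1", b"1")]
    print([k for k, _ in keep_last_compact(stream)])  # ['m2', 'm1']
    print([k for k, _ in KeepFirstDeduplicator().run(stream)])  # ['m1', 'm2']
```

Bounding `seen` is a design choice. Without a bound the key set grows forever. With one, a duplicate can slip through if it arrives after its key has been evicted, so the bound should exceed the worst expected lag between redundant instances.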