GitHub user haphut added a comment to the discussion: Allow topic compaction to 
discard messages with duplicate key

Another variant of this problem occurs when we are using an in-order pub-sub 
API, e.g. an MQTT API, or any ephemeral event source to feed Pulsar.

If we are running only one instance of the Pulsar Producer or Pulsar Source and 
the instance crashes or has network issues, some messages might never reach 
Pulsar. An obvious HA solution would have several identical instances of 
Producers running in parallel in different AZs, feeding one Pulsar topic with 
multiple copies of each message.

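How then do we retain only one copy of each unique message? Keep-last topic 
compaction could easily mess up the original order of the messages: if a late 
copy of an earlier message arrives after a later message, the retained copy 
sits out of order.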
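
A toy illustration of the ordering problem, as a sketch in plain Python with no Pulsar client involved. The `keep_last` helper and the message keys are hypothetical. It only models "keep the latest occurrence per key" on a list and is not how Pulsar implements compaction. Feeder A publishes m1, m2, m3 in order, while feeder B lags and crashes right after publishing its copy of m1:

```python
def keep_last(keys):
    """Toy model of keep-last compaction: retain only the latest occurrence of each key."""
    last_index = {key: i for i, key in enumerate(keys)}
    return [key for i, key in enumerate(keys) if last_index[key] == i]


# Arrival order on the topic: A:m1, A:m2, B:m1 (late copy), A:m3.
arrivals = ["m1", "m2", "m1", "m3"]
compacted = keep_last(arrivals)
print(compacted)  # ['m2', 'm1', 'm3']
```

The retained copy of m1 is the late one from feeder B, so it lands after m2 and the source order m1, m2, m3 is lost.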

Instead we can use a Pulsar Function or a Consumer-Producer to implement 
something similar to keep-first compaction. The Function needs to keep state 
somewhere for the set of keys or hashes of handled messages, maybe in 
BookKeeper or another topic.
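
A minimal sketch of that keep-first logic, again in plain Python rather than against the Pulsar Functions API. The class name `KeepFirstDeduplicator`, the `first_seen` method and the `max_keys` bound are all hypothetical. State is held in memory here purely for illustration, whereas a real Function would need durable state as described above.

```python
from collections import OrderedDict


class KeepFirstDeduplicator:
    """Remember handled keys so only the first copy of each message is forwarded.

    Assumption for this sketch: state is an in-memory, bounded set of recent
    keys. A real deployment would persist it somewhere durable.
    """

    def __init__(self, max_keys=100_000):
        self.max_keys = max_keys
        self._seen = OrderedDict()  # used as an insertion-ordered set of keys

    def first_seen(self, key):
        """Return True the first time a key is seen, False for later copies."""
        if key in self._seen:
            return False
        self._seen[key] = True
        if len(self._seen) > self.max_keys:
            self._seen.popitem(last=False)  # evict the oldest remembered key
        return True


dedup = KeepFirstDeduplicator(max_keys=3)
# (key, payload) pairs as they arrive from two redundant feeders.
arrivals = [("m1", "a"), ("m2", "b"), ("m1", "a"), ("m3", "c")]
forwarded = [payload for key, payload in arrivals if dedup.first_seen(key)]
print(forwarded)  # ['a', 'b', 'c']
```

Bounding the key set keeps memory use fixed, at the cost that a duplicate arriving after its key has been evicted would be forwarded again. The bound therefore has to cover the maximum lag we expect between feeders.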

The latter part of this HA feeder pattern would feel much more ergonomic with 
a built-in, keep-first, 
[on-the-fly](https://github.com/apache/pulsar/issues/6230) topic compaction.

GitHub link: 
https://github.com/apache/pulsar/discussions/18842#discussioncomment-4350911

----
This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
