GitHub user volfco created a discussion: geo replicated subscribers

One of the big things missing, in my book, is a way to have multiple subscribers
across datacenters consume messages "exactly" once.

For example, DC A produces 5 messages. DC A has a consumer on a subscription
named TEST, and DC B has a consumer on a subscription with the same name.
Ideally, the consumer in DC A would process messages 1, 2, and 3, while the
consumer in DC B would process messages 4 and 5.

Another example: I'm writing emails to be sent out into a topic from both
us-west and us-east. Each DC can send emails, and each DC should consume the
messages produced there (just by the nature of asynchronous replication). But
if a DC's consumers go down, I would like the other DCs to pick up and consume
the messages that the failed DC should have consumed.

From my admittedly limited research, I don't think this is currently
supported. Subscription cursors are currently per DC, which allows consumers
in each DC to consume the same message.

I could implement this on the client side by using the configuration store
(ZooKeeper) to record a "lock" on a message ID that prevents other consumers
from consuming it, but this seems like something that could be implemented
inside the broker and configured at the namespace level. The consumer would
write a lock into ZooKeeper, which other consumers would check for before doing
the actual processing. If a lock already exists, the consumer acknowledges the
message without processing it.
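A minimal sketch of the client-side locking idea described above, in Python. Assumptions: `InMemoryLockStore` is a hypothetical in-process stand-in for the ZooKeeper configuration store, with `try_create()` modeling a node create that fails if the node already exists; the `/locks/<subscription>/<message-id>` path layout and `handle_message()` are likewise illustrative, not a Pulsar or ZooKeeper API. Rather than a separate check followed by a write, the lock is taken with a single atomic create-if-absent, so two DCs cannot both see "no lock" and both process the same message.

```python
import threading
from typing import Callable, Dict, List


class InMemoryLockStore:
    """Hypothetical stand-in for the ZooKeeper configuration store.

    try_create() mimics creating a node: it succeeds only if the path does
    not exist yet, and the check and the create happen atomically.
    """

    def __init__(self) -> None:
        self._locks: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def try_create(self, path: str, owner: str) -> bool:
        with self._mutex:
            if path in self._locks:
                return False
            self._locks[path] = owner
            return True


def handle_message(
    store: InMemoryLockStore,
    subscription: str,
    message_id: str,
    local_dc: str,
    process: Callable[[str], None],
) -> bool:
    """Process a message only if this DC wins the global lock.

    Returns True if the message was processed here, or False if another DC
    already holds the lock (the caller would then just ack the message).
    """
    # Hypothetical lock path layout: one lock per (subscription, message ID).
    path = f"/locks/{subscription}/{message_id}"
    if store.try_create(path, local_dc):
        process(message_id)
        return True
    # Another DC owns this message: acknowledge without processing.
    return False


# Replay the TEST example: both DCs eventually see the replicated messages.
store = InMemoryLockStore()
processed: Dict[str, List[str]] = {"us-west": [], "us-east": []}
# us-west sees messages 1-3 first and takes their locks.
for mid in ["1", "2", "3"]:
    handle_message(store, "TEST", mid, "us-west", processed["us-west"].append)
# us-east sees all five messages but only wins the locks for 4 and 5.
for mid in ["1", "2", "3", "4", "5"]:
    handle_message(store, "TEST", mid, "us-east", processed["us-east"].append)
print(processed)  # {'us-west': ['1', '2', '3'], 'us-east': ['4', '5']}
```

One caveat of this sketch: if a consumer takes a lock and then fails before processing, the message is skipped everywhere, so a real implementation would need to decide what happens to such a lock.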

At the expense of consumption latency and ZooKeeper I/O, I think this would be
an amazing feature.

I could see this implemented in two ways, both as Consumer Types. The first is
"Lock on Everything", where a lock is acquired for every message. The second is
"Lock on Foreign Messages", where a lock is acquired only for messages not
produced in the local cluster (implying they were replicated in from another
cluster). Messages produced in _us-west_ and consumed in _us-west_ would not
require a global lock, but messages produced in _us-west_ and consumed in
_us-east_ would need a global lock before processing.
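The choice between the two proposed Consumer Types comes down to one decision per message. A minimal Python sketch, assuming the consumer already knows which cluster a message was originally produced in (how that is obtained is left out here); `LockMode` and `needs_global_lock()` are hypothetical names, not part of Pulsar.

```python
from enum import Enum


class LockMode(Enum):
    """Hypothetical names for the two proposed Consumer Types."""

    LOCK_ON_EVERYTHING = "lock-on-everything"
    LOCK_ON_FOREIGN = "lock-on-foreign-messages"


def needs_global_lock(mode: LockMode, origin_cluster: str, local_cluster: str) -> bool:
    """Decide whether a message needs a global lock before processing."""
    if mode is LockMode.LOCK_ON_EVERYTHING:
        # Every message goes through the global lock.
        return True
    # Lock on foreign messages: only messages replicated in from elsewhere.
    return origin_cluster != local_cluster


print(needs_global_lock(LockMode.LOCK_ON_FOREIGN, "us-west", "us-west"))  # False
print(needs_global_lock(LockMode.LOCK_ON_FOREIGN, "us-west", "us-east"))  # True
```

One detail the second mode has to settle: since the local consumer skips the lock, the foreign side's lock can only prevent duplicates if the local side still leaves some record that it handled the message.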

GitHub link: https://github.com/apache/pulsar/discussions/18983

----
This is an automatically sent email for dev@pulsar.apache.org.