[I] [Bug] Inconsistent markDeletePosition replication for geo-replicated shared subscriptions with delayed messages [pulsar]

via GitHub Wed, 04 Jun 2025 14:12:40 -0700


tarmacmonsterg opened a new issue, #24380:
URL: https://github.com/apache/pulsar/issues/24380


   ### Search before reporting
   
   - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [x] I understand that [unsupported 
versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions)
 don't get bug fixes. I will attempt to reproduce the issue on a supported 
version of Pulsar client and Pulsar broker.
   
   
   ### User environment
   
   Pulsar: 4.0.4 official docker image
   Deployed on K8S
   
   ### Issue Description
   
   We have several topics in our Pulsar deployment. For some topics 
(cache-related), we have geo-replication disabled. Others work as expected — 
the subscription cursor is replicated to the backup cluster.
   However, we are seeing inconsistent behavior with topics used for delayed 
messages and shared subscriptions. These topics have geo-replication enabled 
and use individual acknowledgments. According to the documentation, individual 
acknowledgments themselves are not replicated across clusters. However, the 
markDeletePosition should be replicated.
   In our tests, we noticed that the markDeletePosition in the backup cluster 
does not move predictably. In some cases, it remains unchanged for a long time. 
The only time it eventually advances is after the primary cluster stops 
receiving new messages to that topic — and then, after a delay, the 
markDeletePosition is finally updated in the backup cluster.
   
   First check (stats-internal)
   main cluster
   ```
   "delayed_message_10_min" : {
         "markDeletePosition" : "2382448:33718",
   ```
   backup cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "50797:509",
   ```
   Second check
   main cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "2382448:40268",
   ```
   backup cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "50797:509",
   ```
   Third check
   main cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "2382722:21942",
   ```
   backup cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "50797:509",
   ```
   Final check, after the load tests were stopped and the backlog in the main 
cluster was empty
   main cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "2382761:11155",
   ```
   backup cluster
   ```
       "delayed_message_10_min" : {
         "markDeletePosition" : "54807:11139",
   ```
   I also see one difference: on the main cluster, individuallyDeletedMessages 
disappears after the load test is stopped.
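
The four checks above can be compared with a short script. This is a minimal sketch, not tooling from the original report: the names `Position`, `parse_position`, `advanced`, and `report` are hypothetical, and it assumes positions are `ledgerId:entryId` strings ordered by ledger ID and then entry ID. Since ledger IDs are allocated per cluster, each cluster is compared only against its own earlier value, never main against backup.

```python
from typing import NamedTuple


class Position(NamedTuple):
    """A cursor position; tuples compare by ledger_id first, then entry_id."""
    ledger_id: int
    entry_id: int


def parse_position(text: str) -> Position:
    """Parse a 'ledgerId:entryId' string as printed by stats-internal."""
    ledger, entry = text.split(":")
    return Position(int(ledger), int(entry))


# (main, backup) markDeletePosition pairs copied from the checks above.
CHECKS = [
    ("2382448:33718", "50797:509"),
    ("2382448:40268", "50797:509"),
    ("2382722:21942", "50797:509"),
    ("2382761:11155", "54807:11139"),
]


def advanced(previous: str, current: str) -> bool:
    """True if the cursor moved forward within the same cluster."""
    return parse_position(current) > parse_position(previous)


def report(checks):
    """For each consecutive pair of checks, return (main_moved, backup_moved)."""
    rows = []
    for (prev_main, prev_backup), (main, backup) in zip(checks, checks[1:]):
        rows.append((advanced(prev_main, main), advanced(prev_backup, backup)))
    return rows


if __name__ == "__main__":
    for i, (main_moved, backup_moved) in enumerate(report(CHECKS), start=2):
        print(f"check {i}: main advanced={main_moved}, backup advanced={backup_moved}")
```

On the values above, the main cluster advances between every pair of checks, while the backup cluster advances only in the last one.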
   
   
   
   ### Error messages
   
   ```text
   
   ```
   
   ### Reproducing the issue
   
   1. Deploy two Pulsar clusters.
   2. Create the relevant topics.
   3. Configure geo-replication between the clusters.
   4. Enable subscription replication on the client.
   5. Start continuously producing delayed messages to the topic, with delivery 
delays of up to 10 minutes.
   6. On the primary cluster, consume messages selectively (based on delivery 
time).
   
   Expected behavior:
   The markDeletePosition should advance on both the primary and the backup 
clusters.
   
   Actual behavior:
   The markDeletePosition advances only on the primary cluster.
   On the backup cluster, a backlog accumulates and markDeletePosition remains 
stuck for a long time.
   
   
   ### Additional information
   
   Discussion started here: 
https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1748598297819549
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
