lhotari commented on code in PR #24722:
URL: https://github.com/apache/pulsar/pull/24722#discussion_r2335698520


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"
+                    + " stored in metadata store is {}. When you get this log, 
the latest several entries may be"
+                    + " repeated.", name, lh.getId(), 
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+                ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+            } else {
+                log.error("[{}] Going to fence the topic because failed opened 
ledger {} to check the last add"
+                        + " confirmed position when the ledger was concurrent 
modified(it is an unexpected behaviour,"
+                        + " which happens when the load-balancer does not work 
as expected). The add confirmed position"

Review Comment:
   Again, is it necessary to mention the load-balancer here?



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"
+                    + " stored in metadata store is {}. When you get this log, 
the latest several entries may be"
+                    + " repeated.", name, lh.getId(), 
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());

Review Comment:
   What does this mean: "When you get this log, the latest several entries may 
be repeated." ?
   



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {

Review Comment:
   Does this get called multiple times for each failure? Would that result in 
attempts to open the ledger multiple times in a row?
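
   To illustrate the concern, here is a minimal sketch of one way repeated 
failure callbacks could be collapsed into a single open attempt. 
`RecoveryGuard`, `tryStart` and `finish` are hypothetical names used only for 
illustration; they are not part of `ManagedLedgerImpl` or of this PR, and the 
`Runnable` merely stands in for the `asyncOpenLedger` call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of ManagedLedgerImpl or this PR.
public class RecoveryGuard {
    // True while one "open ledger to check LAC" attempt is outstanding.
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    /**
     * Runs the action only if no other attempt is outstanding.
     * Returns true if the action was started, false if it was skipped.
     */
    public boolean tryStart(Runnable openLedgerAction) {
        if (!inProgress.compareAndSet(false, true)) {
            return false; // another failed add already triggered the check
        }
        openLedgerAction.run();
        return true;
    }

    /** To be called from the completion callback, on success or failure. */
    public void finish() {
        inProgress.set(false);
    }

    public static void main(String[] args) {
        RecoveryGuard guard = new RecoveryGuard();
        AtomicInteger opens = new AtomicInteger();
        // Three pending adds fail in a row; only the first one triggers an open.
        for (int i = 0; i < 3; i++) {
            guard.tryStart(opens::incrementAndGet);
        }
        System.out.println(opens.get()); // prints 1
    }
}
```

   Whether a guard like this is needed depends on how the failure path is 
invoked for the pending add operations, which is what the question above is 
trying to establish.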



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"
+                    + " stored in metadata store is {}. When you get this log, 
the latest several entries may be"
+                    + " repeated.", name, lh.getId(), 
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+                ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+            } else {
+                log.error("[{}] Going to fence the topic because failed opened 
ledger {} to check the last add"
+                        + " confirmed position when the ledger was concurrent 
modified(it is an unexpected behaviour,"
+                        + " which happens when the load-balancer does not work 
as expected). The add confirmed position"
+                        + " in memory is {}, and the error code {}. Fecing the 
topic to avoid messages lost.",

Review Comment:
   Typo: Fecing -> Fencing
   The wording could also be more general: "Fencing the topic to ensure 
durability and consistency."



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"
+                    + " stored in metadata store is {}. When you get this log, 
the latest several entries may be"
+                    + " repeated.", name, lh.getId(), 
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+                ledgerClosed(currentLedger, lh.getLastAddConfirmed());

Review Comment:
   `ledgerClosed` doesn't seem to close the new LedgerHandle that is passed to 
this callback in the `lh` parameter. Is it the correct method to call?



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"
+                    + " stored in metadata store is {}. When you get this log, 
the latest several entries may be"
+                    + " repeated.", name, lh.getId(), 
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+                ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+            } else {
+                log.error("[{}] Going to fence the topic because failed opened 
ledger {} to check the last add"
+                        + " confirmed position when the ledger was concurrent 
modified(it is an unexpected behaviour,"
+                        + " which happens when the load-balancer does not work 
as expected). The add confirmed position"
+                        + " in memory is {}, and the error code {}. Fecing the 
topic to avoid messages lost.",
+                        name, lh.getId(), currentLedger.getLastAddConfirmed(), 
rc);
+                // Stop switching ledger and write topic metadata, to avoid 
messages lost. The doc of
+                // LedgerHandle also mentioned this: 
https://github.com/apache/bookkeeper/blob/release-4.17.2/
+                // 
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2047-L2048

Review Comment:
   Just wondering whether the information behind the LedgerHandle check 
`if (getLedgerMetadata().getState() == LedgerMetadata.State.IN_RECOVERY) {` 
is consistent:
   
https://github.com/apache/bookkeeper/blob/75f65df7b487a9ee29817dc849a40f627c9857b7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2046-L2048
   
   For reasons similar to https://github.com/apache/pulsar/pull/24665, is it 
really reliable?
   
   PulsarLedgerManager uses a cache:
   
https://github.com/apache/pulsar/blob/a53279837fae8b1c8bd16bec966cbc39823ed210/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/bookkeeper/PulsarLedgerManager.java#L198-L222
   
   In BK, there's no ZK "sync", but it goes directly to ZK each time:
   
https://github.com/apache/bookkeeper/blob/2eb70b1f8216b2c6621d0e57cd491a2067824316/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L469-L471
   
   Not necessarily relevant here, but it's good to be aware of the 
PulsarLedgerManager impact with Pulsar.



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void 
updateLedgersIdsComplete(@Nullable LedgerHandle orig
         }
     }
 
+    void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle 
currentLedger) {
+        bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType, 
config.getPassword(), (rc, lh, ctx) -> {
+            if (rc == Code.OK) {
+                log.warn("[{}] Successfully opened ledger {} to check the last 
add confirmed position when the ledger"
+                    + " was concurrent modified(it is an unexpected behaviour, 
which happens when the load-balancer"
+                    + " does not work as expected). The add confirmed position 
in memory is {}, and the value"

Review Comment:
   Is it necessary to mention "it is an unexpected behavior, which happens when 
the load-balancer does not work as expected"? What does "the load-balancer 
does not work as expected" mean here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
