lhotari commented on code in PR #24722:
URL: https://github.com/apache/pulsar/pull/24722#discussion_r2335698520
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
+ + " stored in metadata store is {}. When you get this log,
the latest several entries may be"
+ + " repeated.", name, lh.getId(),
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+ ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+ } else {
+ log.error("[{}] Going to fence the topic because failed opened
ledger {} to check the last add"
+ + " confirmed position when the ledger was concurrent
modified(it is an unexpected behaviour,"
+ + " which happens when the load-balancer does not work
as expected). The add confirmed position"
Review Comment:
Again, is it necessary to mention the load-balancer?
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
+ + " stored in metadata store is {}. When you get this log,
the latest several entries may be"
+ + " repeated.", name, lh.getId(),
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
Review Comment:
What does "When you get this log, the latest several entries may be
repeated." mean?
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
Review Comment:
Does this get called multiple times for each failure? Would that result in
attempts to open the ledger multiple times in a row?
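If repeated calls are possible, one option is a compare-and-set flag so that only the first failure triggers the open. This is a minimal sketch, not code from the PR: `SingleFlightGuard`, `tryStart` and `finish` are hypothetical names, and the `Runnable` stands in for the `asyncOpenLedger` call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not part of ManagedLedgerImpl): lets at most one
// "open ledger to check LAC" attempt be in flight at a time.
public class SingleFlightGuard {
    private final AtomicBoolean inFlight = new AtomicBoolean(false);

    /**
     * Runs the action only if no other attempt is in flight.
     * Returns true if the action was started, false if it was skipped.
     */
    public boolean tryStart(Runnable action) {
        if (!inFlight.compareAndSet(false, true)) {
            return false; // another caller already started the open
        }
        try {
            action.run();
        } catch (RuntimeException e) {
            inFlight.set(false); // do not leave the guard stuck on failure
            throw e;
        }
        return true;
    }

    /** To be called from the completion callback so a later attempt can run. */
    public void finish() {
        inFlight.set(false);
    }
}
```

With a guard like this, the callback passed to the open call would invoke `finish()` on both the success and the error path.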
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
+ + " stored in metadata store is {}. When you get this log,
the latest several entries may be"
+ + " repeated.", name, lh.getId(),
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+ ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+ } else {
+ log.error("[{}] Going to fence the topic because failed opened
ledger {} to check the last add"
+ + " confirmed position when the ledger was concurrent
modified(it is an unexpected behaviour,"
+ + " which happens when the load-balancer does not work
as expected). The add confirmed position"
+ + " in memory is {}, and the error code {}. Fecing the
topic to avoid messages lost.",
Review Comment:
Typo: Fecing -> Fencing
The message could also be more general: "Fencing the topic to ensure
durability and consistency."
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
+ + " stored in metadata store is {}. When you get this log,
the latest several entries may be"
+ + " repeated.", name, lh.getId(),
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+ ledgerClosed(currentLedger, lh.getLastAddConfirmed());
Review Comment:
`ledgerClosed` doesn't seem to close the new LedgerHandle that is passed to
this callback in the `lh` parameter. Is it the correct method to call?
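To illustrate the concern, the handle opened only for inspection could be released once the last add confirmed value has been read. This is a rough sketch under stated assumptions: `InspectedLedger` is a hypothetical stand-in interface, not the BookKeeper `LedgerHandle` API, and `readLacAndClose` is an illustrative name.

```java
// Hypothetical sketch: read a value from a handle that was opened only for
// inspection, and always release the handle afterwards.
public class InspectThenClose {

    /** Stand-in for a ledger handle opened just to inspect it (assumption). */
    public interface InspectedLedger extends AutoCloseable {
        long getLastAddConfirmed();

        @Override
        void close();
    }

    /** Returns the last add confirmed value and closes the handle in all cases. */
    public static long readLacAndClose(InspectedLedger handle) {
        try (InspectedLedger h = handle) {
            return h.getLastAddConfirmed();
        }
    }
}
```

In the PR, the equivalent would be to close `lh` in the callback after its last add confirmed value has been passed on.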
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
+ + " stored in metadata store is {}. When you get this log,
the latest several entries may be"
+ + " repeated.", name, lh.getId(),
currentLedger.getLastAddConfirmed(), lh.getLastAddConfirmed());
+ ledgerClosed(currentLedger, lh.getLastAddConfirmed());
+ } else {
+ log.error("[{}] Going to fence the topic because failed opened
ledger {} to check the last add"
+ + " confirmed position when the ledger was concurrent
modified(it is an unexpected behaviour,"
+ + " which happens when the load-balancer does not work
as expected). The add confirmed position"
+ + " in memory is {}, and the error code {}. Fecing the
topic to avoid messages lost.",
+ name, lh.getId(), currentLedger.getLastAddConfirmed(),
rc);
+ // Stop switching ledger and write topic metadata, to avoid
messages lost. The doc of
+ // LedgerHandle also mentioned this:
https://github.com/apache/bookkeeper/blob/release-4.17.2/
+ //
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2047-L2048
Review Comment:
Just wondering whether the state checked in LedgerHandle with `if
(getLedgerMetadata().getState() == LedgerMetadata.State.IN_RECOVERY) {` is
consistent:
https://github.com/apache/bookkeeper/blob/75f65df7b487a9ee29817dc849a40f627c9857b7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2046-L2048
For reasons similar to https://github.com/apache/pulsar/pull/24665, is it
really reliable?
PulsarLedgerManager uses a cache:
https://github.com/apache/pulsar/blob/a53279837fae8b1c8bd16bec966cbc39823ed210/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/bookkeeper/PulsarLedgerManager.java#L198-L222
In BK, there's no ZK "sync", but the read goes directly to ZK each time:
https://github.com/apache/bookkeeper/blob/2eb70b1f8216b2c6621d0e57cd491a2067824316/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L469-L471
This is not necessarily relevant here, but it's good to be aware of the
impact of PulsarLedgerManager in Pulsar.
##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -1878,10 +1878,38 @@ protected synchronized void
updateLedgersIdsComplete(@Nullable LedgerHandle orig
}
}
+ void ledgerAddFailedDueToConcurrentlyModified(final LedgerHandle
currentLedger) {
+ bookKeeper.asyncOpenLedger(currentLedger.getId(), digestType,
config.getPassword(), (rc, lh, ctx) -> {
+ if (rc == Code.OK) {
+ log.warn("[{}] Successfully opened ledger {} to check the last
add confirmed position when the ledger"
+ + " was concurrent modified(it is an unexpected behaviour,
which happens when the load-balancer"
+ + " does not work as expected). The add confirmed position
in memory is {}, and the value"
Review Comment:
Is it necessary to mention "it is an unexpected behavior, which happens when
the load-balancer does not work as expected"? In what situation does "the
load-balancer does not work as expected" apply?