yihua commented on code in PR #18439:
URL: https://github.com/apache/hudi/pull/18439#discussion_r3035616619
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/StorageBasedLockProvider.java:
##########
@@ -480,12 +480,26 @@ private synchronized boolean tryExpireCurrentLock(boolean
fromShutdownHook) {
Pair<LockUpsertResult, Option<StorageLockFile>> result;
long lockExpirationTimeMs = System.currentTimeMillis();
result = this.storageLockClient.tryUpsertLockFile(expiredLockData,
Option.of(this.getLock()));
+ if (result.getLeft() == LockUpsertResult.THROTTLED && !fromShutdownHook) {
+ logger.warn("Owner {}: Lock expiration write was throttled, retrying
after 1 second.", ownerId);
+ try {
+ TimeUnit.SECONDS.sleep(1);
+ } catch (InterruptedException ie) {
+ Thread.currentThread().interrupt();
+ }
+ result = this.storageLockClient.tryUpsertLockFile(expiredLockData,
Option.of(this.getLock()));
+ }
switch (result.getLeft()) {
case UNKNOWN_ERROR:
Review Comment:
🤖 When `fromShutdownHook` is true and the initial result is THROTTLED, we
skip the retry and fall straight into this case, but the log says "after retry"
— which didn't actually happen. Could you make the message conditional so it's
accurate in both paths? Something like "Lock expiration write was throttled (no
retry during shutdown)" vs the current message.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/StorageBasedLockProvider.java:
##########
@@ -480,12 +480,26 @@ private synchronized boolean tryExpireCurrentLock(boolean
fromShutdownHook) {
Pair<LockUpsertResult, Option<StorageLockFile>> result;
long lockExpirationTimeMs = System.currentTimeMillis();
result = this.storageLockClient.tryUpsertLockFile(expiredLockData,
Option.of(this.getLock()));
+ if (result.getLeft() == LockUpsertResult.THROTTLED && !fromShutdownHook) {
+ logger.warn("Owner {}: Lock expiration write was throttled, retrying
after 1 second.", ownerId);
+ try {
Review Comment:
🤖 This method is `synchronized`, so the 1-second sleep will block any
concurrent `tryLock()`, `renewLock()`, or `getLock()` call on the same provider
instance. In practice, is the heartbeat always stopped before we reach this
point? If so, it's probably fine — just want to confirm there's no window where
the heartbeat thread could be waiting on this monitor while we sleep.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]