zclllyybb commented on issue #64682:
URL: https://github.com/apache/doris/issues/64682#issuecomment-4766250213

   Breakwater-GitHub-Analysis-Slot: slot_a0ba233667ad
   This content is generated by AI for reference only.
   
   Initial assessment: this looks like a valid latent bug in 
`MetaLockUtils.commitLockTables` on current `master`.
   
   I checked upstream `master` at `90db340515e88e177bc4e6151ba0b9a84cf04d8c`. 
In `fe/fe-core/src/main/java/org/apache/doris/common/util/MetaLockUtils.java`, 
the failure rollback path for `commitLockTables()` iterates `j = i - 1 ... 0`, 
but still calls `tableList.get(i).commitUnlock()`. That diverges from the 
nearby table-lock helpers, which unlock the already acquired entries with 
`get(j)` and then propagate the failure.
   
   The consequence is not only stylistic. The main caller I found is the 
cloud-mode branch of `BrokerLoadJob`, which calls 
`MetaLockUtils.commitLockTables(tableList)` before entering the later 
transaction `try/finally` that unlocks all tables. Therefore, if commit-lock 
acquisition fails partway through the list, `commitLockTables()` itself is 
responsible for releasing the earlier locks. With the current code, tables 
`0..i-1` are not released on that path.
   
   One nuance: the issue says the original exception is swallowed and the loop 
keeps going. The method indeed does not explicitly rethrow `e`; however, with 
the current `Table.commitUnlock()` implementation, `commitUnlock()` delegates 
to `MonitoredReentrantLock.unlock()`, which delegates to 
`ReentrantLock.unlock()`. If the failed table's lock is not owned by the 
current thread, the wrong rollback call can raise an unlock-side runtime 
exception instead. Either way, the already acquired commit locks remain 
unreleased, and the original acquisition failure is not handled correctly.
   
   PR #64676 appears to address the two important parts of the fix:
   
   - change the rollback target from `tableList.get(i)` to `tableList.get(j)`;
   - rethrow the acquisition failure after rollback;
   - add a focused FE unit test covering rollback and failure propagation.
   
   Suggested maintainer next step: review PR #64676 as the direct fix. I do not 
think extra runtime logs are required to validate the bug because the affected 
branch is visible by code inspection. If maintainers want to assess whether 
this has already occurred in a running deployment, the useful evidence would be 
FE logs around cloud-mode broker-load transaction commit failures plus a thread 
dump or lock-owner evidence showing a table commit lock held after a failed 
commit-lock acquisition.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to