morrySnow opened a new pull request, #61366:
URL: https://github.com/apache/doris/pull/61366
## Problem
Every method in `CatalogRecycleBin.java` is `synchronized` on a single monitor
lock, so lock granularity is extremely coarse. When `erasePartition()` runs
slowly over many partitions, every other `synchronized` method blocks waiting
for the lock. Callers such as `recyclePartition()` hold the table write lock
while they wait, so the blocking cascades and can bring down the entire Doris
metadata service.
## Solution
Two complementary optimizations:
### 1. Replace `synchronized` with `ReentrantReadWriteLock`
- **Lock-free** (8 methods): Simple ConcurrentHashMap lookups
(`isRecyclePartition`, `getRecycleTimeById`, etc.)
- **Read lock** (4 methods): Read-only iterations
(`allTabletsInRecycledStatus`, `getInfo`, `write`, etc.)
- **Write lock** (11 methods): Map mutations
(`recycleDatabase/Table/Partition`, `recover*`, `clearAll`)
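
The three tiers above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RecycleBinSketch`, its two maps, and the method signatures are simplified stand-ins chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for CatalogRecycleBin (not the Doris class).
class RecycleBinSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Assumed layout: partition id -> name, and partition id -> recycle time (ms).
    private final Map<Long, String> idToPartitionName = new ConcurrentHashMap<>();
    private final Map<Long, Long> idToRecycleTime = new ConcurrentHashMap<>();

    // Lock-free tier: a single-key lookup on a ConcurrentHashMap.
    boolean isRecyclePartition(long partitionId) {
        return idToPartitionName.containsKey(partitionId);
    }

    // Read-lock tier: iterates both maps, so it excludes writers but not readers.
    List<String> getInfo() {
        lock.readLock().lock();
        try {
            List<String> rows = new ArrayList<>();
            for (Map.Entry<Long, String> e : idToPartitionName.entrySet()) {
                rows.add(e.getKey() + ":" + e.getValue() + "@"
                        + idToRecycleTime.get(e.getKey()));
            }
            return rows;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write-lock tier: updates both maps as one unit.
    void recyclePartition(long partitionId, String name, long nowMs) {
        lock.writeLock().lock();
        try {
            idToPartitionName.put(partitionId, name);
            idToRecycleTime.put(partitionId, nowMs);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The lock-free tier is only safe for lookups that touch one map and one key; anything that needs a consistent view across maps still takes the read lock.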
### 2. Microbatch Erase Pattern (Critical)
Refactored all 12 erase methods to process items **one at a time** with lock
release between items:
- **Inside write lock (per item)**: cleanup RPCs + map removal + edit log
write
- **Release lock between items**: other operations can proceed
This reduces lock hold time per acquisition from **O(N × T)** (all items) to
**O(T)** (one item), where N is the number of items to erase and T is the cost
of erasing one item.
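
A minimal sketch of the microbatch pattern, under stated assumptions: the class name, the `cleanup` placeholder, and the choice to snapshot candidate IDs under the read lock are illustrative and not taken from the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of per-item erase; names are not from the Doris code.
class MicrobatchEraseSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, Long> idToRecycleTime = new ConcurrentHashMap<>();

    void recycle(long id, long recycleTimeMs) {
        lock.writeLock().lock();
        try {
            idToRecycleTime.put(id, recycleTimeMs);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean contains(long id) {
        return idToRecycleTime.containsKey(id);
    }

    // Erases every entry older than retentionMs and returns how many were erased.
    int eraseExpired(long nowMs, long retentionMs) {
        // Step 1 (assumption): snapshot the candidate ids under the read lock.
        List<Long> candidates = new ArrayList<>();
        lock.readLock().lock();
        try {
            for (Map.Entry<Long, Long> e : idToRecycleTime.entrySet()) {
                if (nowMs - e.getValue() > retentionMs) {
                    candidates.add(e.getKey());
                }
            }
        } finally {
            lock.readLock().unlock();
        }

        // Step 2: one write-lock acquisition per item, released between items
        // so that other callers can get in.
        int erased = 0;
        for (long id : candidates) {
            lock.writeLock().lock();
            try {
                // Re-check: the id may have been recovered or erased since the snapshot.
                Long recycledAt = idToRecycleTime.get(id);
                if (recycledAt == null || nowMs - recycledAt <= retentionMs) {
                    continue;
                }
                cleanup(id);
                idToRecycleTime.remove(id);
                erased++;
            } finally {
                lock.writeLock().unlock();
            }
        }
        return erased;
    }

    // Placeholder for the per-item cleanup RPCs and edit log write.
    private void cleanup(long id) {
    }
}
```

Because the lock is dropped between items, the snapshot can go stale, which is why the sketch re-checks each ID after reacquiring the write lock.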
## Data Structure Changes
Changed 4 internal maps from `HashMap` to `ConcurrentHashMap` to enable
lock-free reads.
## Bug Fixes (found during self-review)
1. **NPE in `getIdListToEraseByRecycleTime`**: Now uses `getOrDefault` so that
stale IDs removed concurrently between the snapshot and processing no longer
cause a null dereference.
2. **DdlException in cascade erase**: Added try-catch in
`eraseDatabaseInstantly`/`eraseTableInstantly` to tolerate partitions/tables
that the daemon erased concurrently.
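
Both fixes guard against an ID disappearing between a snapshot and its use. The sketch below illustrates the idea only: `BugFixSketch`, `NotFoundSketchException`, the default value `0L`, and the method shapes are assumptions for this example and do not mirror the actual Doris signatures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the two fixes; not the Doris implementation.
class BugFixSketch {
    // Stand-in for DdlException in this sketch.
    static class NotFoundSketchException extends Exception {
        NotFoundSketchException(String message) {
            super(message);
        }
    }

    private final Map<Long, String> partitions = new ConcurrentHashMap<>();

    void add(long id, String name) {
        partitions.put(id, name);
    }

    // Fix 1: sort snapshot ids by recycle time. A plain get(id) returns null for
    // an id removed after the snapshot, and unboxing that null throws an NPE.
    // getOrDefault avoids it; using 0L (oldest) as the default is an assumption.
    static List<Long> oldestFirst(List<Long> snapshotIds, Map<Long, Long> idToRecycleTime) {
        Map<Long, Long> times = new HashMap<>();
        for (Long id : snapshotIds) {
            times.put(id, idToRecycleTime.getOrDefault(id, 0L));
        }
        List<Long> sorted = new ArrayList<>(snapshotIds);
        sorted.sort(Comparator.comparingLong(times::get));
        return sorted;
    }

    void erasePartitionInstantly(long id) throws NotFoundSketchException {
        if (partitions.remove(id) == null) {
            throw new NotFoundSketchException("unknown partition id " + id);
        }
    }

    // Fix 2: a cascade erase skips children that something else already erased
    // instead of aborting the whole operation. Returns the number erased here.
    int eraseTableInstantly(List<Long> partitionIdSnapshot) {
        int erased = 0;
        for (long id : partitionIdSnapshot) {
            try {
                erasePartitionInstantly(id);
                erased++;
            } catch (NotFoundSketchException e) {
                // Already gone (for example, erased by the daemon); keep going.
            }
        }
        return erased;
    }
}
```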
## Testing
- All 24 existing unit tests pass
- Added 3 new concurrency tests:
    - `testConcurrentReadsDoNotBlock`: 10 concurrent reader threads
    - `testConcurrentRecycleAndRead`: writer + 5 readers simultaneously
    - `testMicrobatchEraseReleasesLockBetweenItems`: verifies recyclePartition succeeds during erase
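
One way to check that readers do not block each other is to have every reader wait, while holding the read lock, until all readers are inside. The sketch below is a hypothetical helper showing that idea; it is not the test code added in this PR, and the timeouts are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper, not taken from the PR's test class.
class ConcurrentReadCheck {
    // Returns true if all `readers` threads held the read lock at the same time.
    static boolean readersOverlap(ReentrantReadWriteLock lock, int readers) {
        CountDownLatch allInside = new CountDownLatch(readers);
        AtomicInteger overlapped = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            for (int i = 0; i < readers; i++) {
                pool.execute(() -> {
                    lock.readLock().lock();
                    try {
                        allInside.countDown();
                        // Reaches zero only if every reader is inside the lock
                        // at once; an exclusive lock would time out here.
                        if (allInside.await(5, TimeUnit.SECONDS)) {
                            overlapped.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        lock.readLock().unlock();
                    }
                });
            }
            pool.shutdown();
            boolean finished = pool.awaitTermination(10, TimeUnit.SECONDS);
            return finished && overlapped.get() == readers;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```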
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->