dataroaring opened a new pull request, #61289: URL: https://github.com/apache/doris/pull/61289
## Summary

- **Lazy-init ConcurrentHashMaps**: Replace eager `new ConcurrentHashMap<>()` initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty (most replicas never use `secondaryClusterToBackends`).
- **Intern cluster ID strings**: Add a static intern pool for cluster ID strings to eliminate duplicate String instances across CloudReplica objects. During Gson deserialization, each replica gets its own String copy of the same cluster ID (~40-70 bytes each). Interning shares a single instance.

### Memory savings estimate (1M tablets):

| Optimization | Per-replica savings | Total (1M) |
|---|---|---|
| Lazy-init secondaryClusterToBackends | ~56 bytes | ~53 MB |
| Lazy-init primaryClusterToBackend (unassigned) | ~56 bytes | varies |
| Smaller ConcurrentHashMap capacity (2 vs 16) | ~112 bytes | ~107 MB |
| Cluster ID string interning | ~40-70 bytes | ~40-70 MB |
| **Total** | **~160-230 bytes** | **~150-230 MB** |

### Changes:

- `primaryClusterToBackend`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `secondaryClusterToBackends`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `getOrCreatePrimaryMap()` / `getOrCreateSecondaryMap()`: double-checked locking helpers
- `internClusterId()`: heap-based intern pool using static ConcurrentHashMap
- All access sites updated with null-safe patterns
- `gsonPostProcess()`: interns keys from deserialized maps

## Test plan

- [ ] Verify FE compilation passes
- [ ] Run existing CloudReplica-related tests
- [ ] Verify backward compatibility: old checkpoint/editlog with eager-initialized maps deserializes correctly
- [ ] Verify `updateClusterToPrimaryBe` / `updateClusterToSecondaryBe` work correctly with lazy init
- [ ] Verify `clearClusterToBe` handles null maps gracefully

🤖 Generated with [Claude Code](https://claude.com/claude-code)
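
For readers who want to see the shape of the two optimizations from the summary, here is a minimal sketch. It is not the code in this PR. The class name `CloudReplicaSketch`, the `Map<String, Long>` value type, the `-1` "not found" sentinel, and the helpers `internKeys()` and `isPrimaryMapAllocated()` are assumptions made for illustration. Only the field and method names listed under "Changes" come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for CloudReplica; only the primary map is shown.
class CloudReplicaSketch {
    // Static heap-based intern pool: one canonical String per cluster ID.
    private static final ConcurrentHashMap<String, String> CLUSTER_ID_POOL =
            new ConcurrentHashMap<>();

    // Null until the first write; volatile so double-checked locking is safe.
    private volatile Map<String, Long> primaryClusterToBackend;

    // Returns the canonical instance for an equal cluster ID string.
    static String internClusterId(String clusterId) {
        if (clusterId == null) {
            return null;
        }
        String existing = CLUSTER_ID_POOL.putIfAbsent(clusterId, clusterId);
        return existing != null ? existing : clusterId;
    }

    // Double-checked locking: allocate the small map only when first needed.
    private Map<String, Long> getOrCreatePrimaryMap() {
        Map<String, Long> map = primaryClusterToBackend;
        if (map == null) {
            synchronized (this) {
                map = primaryClusterToBackend;
                if (map == null) {
                    map = new ConcurrentHashMap<>(2);
                    primaryClusterToBackend = map;
                }
            }
        }
        return map;
    }

    void updateClusterToPrimaryBe(String clusterId, long backendId) {
        getOrCreatePrimaryMap().put(internClusterId(clusterId), backendId);
    }

    // Null-safe read: a missing map behaves like an empty one.
    long getPrimaryBackendId(String clusterId) {
        Map<String, Long> map = primaryClusterToBackend;
        if (map == null) {
            return -1L; // assumed sentinel for "no backend assigned"
        }
        Long backendId = map.get(clusterId);
        return backendId == null ? -1L : backendId;
    }

    // Null-safe clear: nothing to do if the map was never allocated.
    void clearClusterToBe(String clusterId) {
        Map<String, Long> map = primaryClusterToBackend;
        if (map != null) {
            map.remove(clusterId);
        }
    }

    // What a gsonPostProcess() step might do: re-key a deserialized map with
    // interned keys, and keep null when there is nothing to store.
    // Assumes the input has no null keys or values.
    static Map<String, Long> internKeys(Map<String, Long> deserialized) {
        if (deserialized == null || deserialized.isEmpty()) {
            return null;
        }
        Map<String, Long> result = new ConcurrentHashMap<>(2);
        for (Map.Entry<String, Long> e : deserialized.entrySet()) {
            result.put(internClusterId(e.getKey()), e.getValue());
        }
        return result;
    }

    boolean isPrimaryMapAllocated() {
        return primaryClusterToBackend != null;
    }
}
```

Each accessor reads the `volatile` field into a local variable once. That way a concurrent first write cannot change the reference between the null check and the use.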
