dybyte opened a new pull request, #9696: URL: https://github.com/apache/seatunnel/pull/9696
Fixes https://github.com/apache/seatunnel/issues/9637

### Purpose of this pull request

Fixes three memory leak issues:

1. `RunningJobStateIMap`: Checkpoint-related entries are stored but never removed, growing ~8,000/day.
2. `pendingJobMasterMap`: Entries are not cleaned up when resource allocation fails, growing ~200/day.
3. `metricsImap`: Cleanup is skipped if lock acquisition fails, growing ~40/day.

These changes ensure proper cleanup and retry, reducing memory growth in production.

This PR introduces a background cleanup worker that collects failed metrics removal tasks into a blocking queue and retries them periodically, at an interval set by a new configuration option (`cleanup-retry-interval`).

This is my first time working with the engine codebase, so I might have overlooked some details. I'd appreciate any feedback or suggestions.

### Does this PR introduce _any_ user-facing change?

Yes. A new configuration option is introduced:

`cleanup-retry-interval`: Interval in seconds between attempts to retry metrics cleanup when a previous cleanup fails due to lock contention or other issues. Default: 10 seconds. This helps ensure metrics cleanup eventually succeeds under heavy load.

### How was this patch tested?

- Added E2E tests using Testcontainers.
- Verified cleanup via server logs (direct map inspection is not possible in this environment).
- **For metricsImap cleanup retries, direct verification is challenging** because the test environment (Docker Testcontainers) does not allow internal state inspection and lock contention is non-deterministic. If reviewers have suggestions for reliably simulating lock contention in integration tests, they would be greatly appreciated.

### Check list

* [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
* [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
* [ ] If you are contributing the connector code, please check that the following files are updated:
  1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it
  2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml)
  3. Add ci label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml)
  4. Add e2e testcase in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/)
  5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config)
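
For readers who want a concrete picture of the retry mechanism described under "Purpose of this pull request", here is a minimal sketch of a queue-backed cleanup worker. It is an illustration only and not the code in this PR: the class name `CleanupRetryWorker` and the methods `submitFailed`, `retryPending`, and `pendingCount` are hypothetical, and a cleanup task is modeled as a `BooleanSupplier` that returns `true` once the removal succeeded. The constructor argument stands in for the `cleanup-retry-interval` value in seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a retrying cleanup worker; names are not taken from the PR. */
public class CleanupRetryWorker implements AutoCloseable {

    // Cleanup tasks that failed at least once and are waiting for another attempt.
    private final BlockingQueue<BooleanSupplier> pending = new LinkedBlockingQueue<>();

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cleanup-retry-worker");
                t.setDaemon(true); // do not keep the JVM alive just for retries
                return t;
            });

    /** @param retryIntervalSeconds assumed to mirror cleanup-retry-interval; must be > 0 */
    public CleanupRetryWorker(long retryIntervalSeconds) {
        scheduler.scheduleWithFixedDelay(
                this::retryPending, retryIntervalSeconds, retryIntervalSeconds, TimeUnit.SECONDS);
    }

    /** Called by the normal cleanup path when a removal could not be completed. */
    public void submitFailed(BooleanSupplier task) {
        pending.add(task);
    }

    /** Runs every queued task once; tasks that fail again are put back on the queue. */
    public synchronized void retryPending() {
        List<BooleanSupplier> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (BooleanSupplier task : batch) {
            boolean done;
            try {
                done = task.getAsBoolean();
            } catch (RuntimeException e) {
                done = false; // treat an exception like a failed attempt
            }
            if (!done) {
                pending.add(task);
            }
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this sketch a task that keeps failing is retried indefinitely. A real implementation would probably want a retry cap or some logging so that a permanently failing cleanup does not stay in the queue forever.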
