Izeren opened a new pull request, #27740:
URL: https://github.com/apache/flink/pull/27740
# Reopen of #27719
## What is the purpose of the change
Fix the flaky test class `ExecutionGraphRestartTest`.
**Root cause:** The original test used
`ComponentMainThreadExecutorServiceAdapter.forMainThread()`, which wraps a
`DirectScheduledExecutorService` whose `execute(Runnable)` runs inline on the
**calling** thread. When the `EXECUTOR_RESOURCE` thread ran deployment
callbacks via `mainThreadExecutor.execute(callback)`, the callback executed on
the `EXECUTOR_RESOURCE` thread while the test thread was concurrently mutating
`ExecutionGraph` state, causing a race condition.
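
The difference between an inline executor and a dedicated single-thread executor can be illustrated with plain JDK classes. This is a minimal sketch, not code from this PR: `DirectExecutorDemo`, `threadUsedBy`, `submitFrom`, and the thread names are hypothetical, and `Runnable::run` stands in for the inline behaviour described above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DirectExecutorDemo {

    /** Submits a task to the executor and returns the name of the thread that ran it. */
    public static String threadUsedBy(Executor executor) {
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        executor.execute(() -> ranOn.complete(Thread.currentThread().getName()));
        return ranOn.join();
    }

    /** Performs the submission from a separate caller thread with the given name. */
    public static String submitFrom(String callerName, Executor executor) {
        String[] result = new String[1];
        Thread caller = new Thread(() -> result[0] = threadUsedBy(executor), callerName);
        caller.start();
        try {
            caller.join(); // join() makes the write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Inline executor (assumed stand-in): the task runs on whichever thread calls execute().
        Executor direct = Runnable::run;
        // Dedicated executor: every task runs on the same single thread.
        ExecutorService dedicated =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-thread-executor"));
        try {
            System.out.println(submitFrom("io-thread", direct));    // prints "io-thread"
            System.out.println(submitFrom("io-thread", dedicated)); // prints "main-thread-executor"
        } finally {
            dedicated.shutdownNow();
        }
    }
}
```

With the inline executor, state touched by the callback is accessed from the caller's thread, which is how two threads can end up mutating the same object concurrently.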
## Brief change log
- Use a dedicated single-thread executor instead of `forMainThread()` to
serialize all `ExecutionGraph` state mutations on one thread
- Run main-thread actions through a `runInMainThread` helper that calls `.join()`, so exceptions propagate to the caller
- Add `offerSlotsFromMainThread` / `tryOfferSlotsFromMainThread` methods to
`SlotPoolUtils` for callers already on the main thread (avoids self-deadlock
from re-entrant `CompletableFuture.runAsync().join()`)
- Create slot pool with correct `mainThreadExecutor` via
`DeclarativeSlotPoolBridgeBuilder.setMainThreadExecutor()`
- Move slot pool lifecycle to `@BeforeEach` / `@AfterEach`
- Extract `createSchedulerBuilder` helper to reduce per-test boilerplate
- Split `testFailingExecutionAfterRestart` into two phases to account for
async restart callback queuing
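
The helper pattern and the re-entrancy hazard mentioned in the change log can be sketched with JDK classes only. This is an illustration under stated assumptions, not the code in this PR: `MainThreadHelperSketch` and `nestedSubmissionStalls` are hypothetical, and `runInMainThread` is assumed to wrap `CompletableFuture.runAsync(...).join()`. A timeout replaces the unbounded `join()` in the nested call so the example terminates instead of deadlocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MainThreadHelperSketch {

    // All state mutations are funnelled through this one thread.
    private final ExecutorService mainThreadExecutor =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "test-main-thread"));

    /** Runs the action on the main thread and blocks; join() rethrows failures to the caller. */
    public void runInMainThread(Runnable action) {
        CompletableFuture.runAsync(action, mainThreadExecutor).join();
    }

    /**
     * From inside the main thread, submits a nested task to the same executor and waits for it.
     * The only worker thread is busy waiting, so the nested task cannot start.
     */
    public boolean nestedSubmissionStalls(long timeoutMillis) {
        boolean[] stalled = new boolean[1];
        runInMainThread(() -> {
            CompletableFuture<Void> nested =
                    CompletableFuture.runAsync(() -> {}, mainThreadExecutor);
            try {
                nested.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                stalled[0] = true; // an unbounded join() here would block forever
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        });
        return stalled[0];
    }

    public void close() {
        mainThreadExecutor.shutdownNow();
    }

    public static void main(String[] args) {
        MainThreadHelperSketch sketch = new MainThreadHelperSketch();
        try {
            System.out.println(sketch.nestedSubmissionStalls(200)); // prints "true"
            try {
                sketch.runInMainThread(() -> { throw new IllegalStateException("boom"); });
            } catch (CompletionException e) {
                System.out.println(e.getCause().getMessage()); // prints "boom"
            }
        } finally {
            sketch.close();
        }
    }
}
```

This is why code that is already running on the main thread needs a variant that performs the work directly rather than submitting it and waiting.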
## Verifying this change
This change was verified by running `@RepeatedTest(300)` on all 7 test
methods (2100 total executions) with 0 failures.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **not applicable**