och5351 opened a new pull request, #28434:
URL: https://github.com/apache/flink/pull/28434

   ## What is the purpose of the change
   
   `ExecutionTimeBasedSlowTaskDetectorTest` was flaky due to two issues.
   
   FLINK-38114 introduced 
`thenComposeAsync(tryGetTaskDeploymentDescriptorForSlot, jobMasterMainThreadExecutor)` 
in `Execution.deploy()`. This posts creation of the TaskDeploymentDescriptor (TDD) to 
the `jobMasterMainThreadExecutor` after task restore serialization completes in 
the IO executor. Before FLINK-38114, TDD creation happened synchronously on the 
main thread.
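
The hand-off described above can be illustrated with plain `CompletableFuture` code. This is a minimal sketch, not Flink code: the class name, the thread names, and the placeholder string are all assumptions made for illustration. It shows that a stage completed on an IO thread and followed by `thenComposeAsync(fn, executor)` runs `fn` on the given executor, with the submission to that executor typically made from the IO thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThenComposeAsyncSketch {

    /** Returns the name of the thread that ran the composed stage. */
    static String runComposedStage() {
        // Hypothetical stand-ins for the IO executor and the main-thread executor.
        ExecutorService io =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService main =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-thread"));
        try {
            return CompletableFuture
                    // First stage runs on the IO executor (stands in for serialization).
                    .supplyAsync(() -> "serialized-restore-state", io)
                    // Second stage is posted to the main executor once the first completes.
                    .thenComposeAsync(
                            state -> CompletableFuture.completedFuture(
                                    Thread.currentThread().getName()),
                            main)
                    .join();
        } finally {
            io.shutdown();
            main.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runComposedStage());
    }
}
```

Because the second stage is only scheduled after the first one finishes, a caller that does not wait on the returned future can observe state from before the second stage has run.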
   
   * Issue : https://issues.apache.org/jira/browse/FLINK-38114
   * PR : https://github.com/apache/flink/pull/26821
   
   ### Issue 1 — Wrong ComponentMainThreadExecutor setting
   
   `createExecutionGraph()` used `forMainThread()`, which asserts that 
`execute()` is called from the registered main (test) thread. When the IO 
thread triggered the `thenComposeAsync` callback and called `execute()`, an 
`AssertionError` was thrown, transitioning the execution to `FAILED` and causing:
   
   > java.lang.IllegalStateException: BUG: trying to schedule a region which is 
not in CREATED state
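
The failure mode can be modelled with two small executors. This is a simplified sketch based only on the behavior described above; `CheckingExecutor` and `NoCheckExecutor` are hypothetical names and are not the Flink classes.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

public class MainThreadCheckSketch {

    /** Hypothetical executor that rejects execute() calls from any thread but one. */
    static final class CheckingExecutor implements Executor {
        private final Thread expected;

        CheckingExecutor(Thread expected) {
            this.expected = expected;
        }

        @Override
        public void execute(Runnable command) {
            if (Thread.currentThread() != expected) {
                throw new AssertionError(
                        "execute() called from " + Thread.currentThread().getName());
            }
            command.run();
        }
    }

    /** Hypothetical executor that accepts execute() calls from any thread. */
    static final class NoCheckExecutor implements Executor {
        @Override
        public void execute(Runnable command) {
            command.run();
        }
    }

    /** Calls execute() from a separate thread; returns what was thrown, or null. */
    static Throwable callFromOtherThread(Executor executor) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread other = new Thread(() -> {
            try {
                executor.execute(() -> { });
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "io-thread");
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println(callFromOtherThread(new CheckingExecutor(Thread.currentThread())));
        System.out.println(callFromOtherThread(new NoCheckExecutor()));
    }
}
```

In this model the checking executor fails as soon as another thread submits work, while the non-checking one accepts it, which mirrors the reason for swapping the executor in the test.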
   
   ### Issue 2 — Missing waitForTaskDeploymentDescriptorsCreation()
   
   Without waiting for async TDD creation to complete, 
`switchAllVerticesToRunning()` raced with IO threads still accessing execution 
graph internals, occasionally producing:
   
   > AssertionError: Expected size:<4> but was:<3>
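
The general shape of the fix is to wait for all pending asynchronous work before asserting on its results. The sketch below uses only the JDK; `deployAndCount` and the list of "deployed" ids are hypothetical stand-ins and do not reflect how `waitForTaskDeploymentDescriptorsCreation()` is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WaitForAsyncSketch {

    /** Starts the given number of async tasks and returns how many finished after waiting. */
    static int deployAndCount(int tasks) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        List<Integer> deployed = new CopyOnWriteArrayList<>();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                // Each task records its id from an IO thread.
                pending.add(CompletableFuture.runAsync(() -> deployed.add(id), io));
            }
            // Without this join, reading deployed.size() here would race with the IO threads.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return deployed.size();
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(deployAndCount(4));
    }
}
```

Removing the `join()` call in this sketch would allow the size to be read while some tasks are still running, which is the same kind of off-by-one observation as the `Expected size:<4> but was:<3>` failure.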
   
   ## Brief change log
   
   - Replace `ComponentMainThreadExecutorServiceAdapter.forMainThread()` with 
`NoMainThreadCheckComponentMainThreadExecutor` in `createExecutionGraph()` and 
`testAllTasksInCreatedAndNoSlowTasks()` to allow IO threads to call `execute()` 
without thread assertion failure.
   - Add `ExecutionUtils.waitForTaskDeploymentDescriptorsCreation()` after 
`startScheduling()` in `createExecutionGraph()` and 
`createDynamicExecutionGraph()` to ensure async TDD creation completes before 
`switchAllVerticesToRunning()` is called.
   
   ## Verifying this change
   
   Ran `ExecutionTimeBasedSlowTaskDetectorTest` with `@RepeatedTest(100000)` on 
each test method individually. All 100,000 repetitions passed with no failures.
   
   <img width="1529" height="1348" alt="image" 
src="https://github.com/user-attachments/assets/319406d2-db95-4913-8228-5fb605a694b7" 
/>
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
     - The serializers: (yes / no / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
     - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [ ] Yes (please specify the tool below)
   
   <!--
   Generated-by: [Tool Name and Version]
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
