slfan1989 opened a new pull request, #8266:
URL: https://github.com/apache/hadoop/pull/8266
### Description of PR
JIRA: YARN-11934. Fix testComponentHealthThresholdMonitor race condition.
#### Problem
`TestYarnNativeServices.testComponentHealthThresholdMonitor` test fails
intermittently with the following error:
```
[INFO] Running org.apache.hadoop.yarn.service.TestYarnNativeServices
[ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 953.6 s <<< FAILURE! -- in org.apache.hadoop.yarn.service.TestYarnNativeServices
[ERROR] org.apache.hadoop.yarn.service.TestYarnNativeServices.testComponentHealthThresholdMonitor -- Time elapsed: 72.65 s <<< FAILURE!
org.opentest4j.AssertionFailedError: Service should not be in a stable state. It should throw a timeout exception.
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:138)
at org.apache.hadoop.yarn.service.TestYarnNativeServices.testComponentHealthThresholdMonitor(TestYarnNativeServices.java:799)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
```
#### Root Cause
The test has a race condition after calling `flexByRestService()`. The test
expects `waitForServiceToBeStable()` to time out (because anti-affinity prevents
the 4th container from being allocated), but instead it returns immediately.
The issue occurs because:
1. `flexByRestService()` is called to change the number of containers
2. `waitForServiceToBeStable()` is called immediately after
3. If the flex operation hasn't taken effect yet, the service is still in
the old STABLE state
4. `waitForServiceToBeStable()` returns immediately instead of waiting and
timing out as expected
#### Solution
Introduce a new helper method `waitForServiceToLeaveStable()` that ensures
the service state has transitioned away from STABLE before proceeding with
subsequent assertions. This guarantees the flex operation has taken effect.
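A minimal sketch of what such a helper could look like, for illustration only. The patch's actual implementation is not shown in this description, so the signature, the `State` enum, and the `Supplier`-based state lookup below are assumptions rather than the real test code; only the method name `waitForServiceToLeaveStable` comes from the PR. The simulated supplier reports STABLE for its first three polls to mimic a flex request that has not yet taken effect.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LeaveStableSketch {

  /** Hypothetical stand-in for the service states the test observes. */
  public enum State { STABLE, NOT_STABLE }

  /**
   * Polls the supplier until it reports a state other than STABLE.
   * Throws TimeoutException if the state is still STABLE after timeoutMs.
   * (Assumed signature; the real helper may differ.)
   */
  public static void waitForServiceToLeaveStable(Supplier<State> stateSupplier,
      long pollIntervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long start = System.nanoTime();
    long timeoutNanos = timeoutMs * 1_000_000L;
    while (stateSupplier.get() == State.STABLE) {
      if (System.nanoTime() - start >= timeoutNanos) {
        throw new TimeoutException(
            "Service still STABLE after " + timeoutMs + " ms");
      }
      Thread.sleep(pollIntervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated service: the flex "takes effect" only after three polls.
    AtomicInteger polls = new AtomicInteger();
    Supplier<State> state = () ->
        polls.incrementAndGet() <= 3 ? State.STABLE : State.NOT_STABLE;

    // The race: a check made right after the flex request sees the old state.
    System.out.println("right after flex: " + state.get());

    // Waiting for the transition makes later stability checks meaningful.
    waitForServiceToLeaveStable(state, 10, 5_000);
    System.out.println("left STABLE after " + polls.get() + " polls");
  }
}
```

With a barrier like this between `flexByRestService()` and `waitForServiceToBeStable()`, the second call can no longer observe the stale pre-flex STABLE state, which is the ordering the test relies on.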
### How was this patch tested?
> ./mvnw -pl hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core -Dtest=TestYarnNativeServices#testComponentHealthThresholdMonitor test
```
[INFO] --- surefire:3.5.3:test (default-test) @ hadoop-yarn-services-core ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.yarn.service.TestYarnNativeServices
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 92.31 s -- in org.apache.hadoop.yarn.service.TestYarnNativeServices
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:07 min
[INFO] Finished at: 2026-02-23T10:54:16+08:00
[INFO] ------------------------------------------------------------------------
```
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
If an AI tool was used:
- [ ] The PR includes the phrase "Contains content generated by <tool>"
where <tool> is the name of the AI tool used.
- [ ] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html