slfan1989 opened a new pull request, #8269:
URL: https://github.com/apache/hadoop/pull/8269
### Description of PR
JIRA: HDFS-17885. Fix TestDFSAdmin.testAllDatanodesReconfig flaky test.
#### Problem
`TestDFSAdmin.testAllDatanodesReconfig` test fails with the following error:
```
Expected size:<3> but was:<1> in:
<["Starting of reconfiguration task successful on 0 nodes, failed on 2 nodes."]>
at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testAllDatanodesReconfig(TestDFSAdmin.java:1263)
```
#### Root Cause
The test conflicts with itself: it starts the reconfiguration task twice on
the same DataNodes:
- First call: `admin.startReconfiguration("datanode", "livenodes")` - Successfully starts reconfiguration on 2 DataNodes
- Second call: `reconfigurationOutErrFormatter("startReconfiguration", ...)` - Internally calls `admin.startReconfigurationUtil(...)` again
The problem is that DataNode's `startReconfigurationTask()` does not allow
concurrent reconfiguration. If a reconfiguration task is already running, it
throws `IOException` with message `Another reconfiguration task is running.`
Therefore, the second invocation fails on both DataNodes, resulting in
output containing only the summary line:
```
Starting of reconfiguration task successful on 0 nodes, failed on 2 nodes.
```
This causes the assertion `assertThat(outsForStartReconf).hasSize(3)` to
fail because:
- Expected: 2 "Started reconfiguration task on node" lines + 1 summary line = 3 lines
- Actual: 0 success lines + 1 summary line = 1 line
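
To illustrate the root cause, here is a minimal, self-contained sketch of a "one task at a time" guard. It is not the Hadoop implementation: the class `ReconfigGuardSketch`, the `running` flag, and the helper `countSuccessfulStarts` are hypothetical names chosen for this example. Only the method name `startReconfigurationTask()` and the exception message are taken from the description above.

```java
import java.io.IOException;

/** Hypothetical stand-in for a node that allows one reconfiguration task at a time. */
class ReconfigGuardSketch {
  private final Object lock = new Object();
  private boolean running = false; // assumption: a simple flag models "task in progress"

  /** Rejects a start request while an earlier task is still marked as running. */
  void startReconfigurationTask() throws IOException {
    synchronized (lock) {
      if (running) {
        throw new IOException("Another reconfiguration task is running.");
      }
      running = true;
    }
  }

  /** Issues back-to-back start requests against one node and counts the successes. */
  static int countSuccessfulStarts(int attempts) {
    ReconfigGuardSketch node = new ReconfigGuardSketch();
    int succeeded = 0;
    for (int i = 0; i < attempts; i++) {
      try {
        node.startReconfigurationTask();
        succeeded++;
      } catch (IOException e) {
        // The second and later requests land here because the first task never finished.
      }
    }
    return succeeded;
  }

  public static void main(String[] args) {
    // Two start requests, as in the original test: only the first one is accepted.
    System.out.println(countSuccessfulStarts(2)); // prints 1
  }
}
```

In the failing test, the first request is the direct `admin.startReconfiguration(...)` call, so the request whose output the test inspects is the rejected one.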
#### Solution
Remove the duplicate invocation by:
1. Calling `admin.startReconfigurationUtil()` only once
2. Directly capturing the output to ByteArrayOutputStream
3. Parsing the output for assertions
Additionally, improve the test robustness by:
- Using `NUM_DATANODES` constant instead of hardcoded values
- Using stream filtering to count "Started reconfiguration" lines instead of
relying on fixed positions (which is more resilient to concurrent output
ordering)
- Removing the unnecessary `Thread.sleep(1000)` before
`awaitReconfigurationFinished()`
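
The capture-once and stream-filtering approach described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the patch: `fakeStartReconfiguration` is a hypothetical stand-in for the single admin call, the node addresses are made up, and `NUM_DATANODES` is set to 2 here only to match the two DataNodes mentioned in the failure output.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CaptureOnceSketch {
  static final int NUM_DATANODES = 2; // assumption: matches the 2 DataNodes in the report

  /** Hypothetical stand-in for the admin call; per-node lines may arrive in any order. */
  static void fakeStartReconfiguration(PrintStream out) {
    out.println("Started reconfiguration task on node [127.0.0.1:9868].");
    out.println("Started reconfiguration task on node [127.0.0.1:9867].");
    out.println("Starting of reconfiguration task successful on 2 nodes, failed on 0 nodes.");
  }

  /** Invokes the operation exactly once and returns its non-empty output lines. */
  static List<String> captureLines() {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(buffer, true);
    fakeStartReconfiguration(out);
    out.flush();
    return Arrays.stream(buffer.toString().split("\\R"))
        .filter(line -> !line.isEmpty())
        .collect(Collectors.toList());
  }

  /** Counts per-node success lines by prefix, regardless of their position. */
  static long countStarted(List<String> lines) {
    return lines.stream()
        .filter(line -> line.startsWith("Started reconfiguration task on node"))
        .count();
  }

  public static void main(String[] args) {
    List<String> lines = captureLines();
    System.out.println(lines.size() == NUM_DATANODES + 1);       // prints true
    System.out.println(countStarted(lines) == NUM_DATANODES);    // prints true
  }
}
```

Counting by prefix instead of indexing into the list means the check still holds if the per-node lines are printed in a different order.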
### How was this patch tested?
> ./mvnw -pl hadoop-hdfs-project/hadoop-hdfs -Dtest=TestDFSAdmin#testAllDatanodesReconfig test
```
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.tools.TestDFSAdmin
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.421 s -- in org.apache.hadoop.hdfs.tools.TestDFSAdmin
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
```
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
If an AI tool was used:
- [ ] The PR includes the phrase "Contains content generated by <tool>"
where <tool> is the name of the AI tool used.
- [ ] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]