peterxcli opened a new pull request, #8060:
URL: https://github.com/apache/ozone/pull/8060
## What changes were proposed in this pull request?
```
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 82.54 s <<< FAILURE! -- in org.apache.hadoop.ozone.container.TestContainerReportHandling
org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(LifeCycleState)[2] -- Time elapsed: 33.84 s <<< ERROR!
org.apache.hadoop.hdds.scm.exceptions.SCMException: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
	at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.translateException(SCMHAInvocationHandler.java:164)
	at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:114)
	at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:73)
	at jdk.proxy2/jdk.proxy2.$Proxy42.updateContainerState(Unknown Source)
	at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:283)
	at org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(TestContainerReportHandling.java:100)
	...
Caused by: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
	at org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:780)
	at org.apache.ratis.server.impl.LeaderStateImpl.stop(LeaderStateImpl.java:437)
	at org.apache.ratis.server.impl.RoleInfo.shutdownLeaderState(RoleInfo.java:104)
	at org.apache.ratis.server.impl.RaftServerImpl.lambda$close$1(RaftServerImpl.java:530)
	at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
	at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
	at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
	at org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:512)
	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:207)
```
The same failure also affects `TestContainerReportHandlingWithHA`.
---
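See `Terminating with exit status 1: Invalid event: DELETE at CLOSING state.` in the test result log: SCM's state machine rejects the DELETE event because SCM still has the container in CLOSING state, terminates, and closes its Ratis server, which is what surfaces in the test as `NotLeaderException`.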
```
2025-03-12 20:19:41,402 [scmNode-3-FixedThreadPoolWithAffinityExecutor-0-0] INFO container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server cd03248d-9309-426c-ac05-3168de666b12@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
2025-03-12 20:19:41,402 [scmNode-2-FixedThreadPoolWithAffinityExecutor-0-0] INFO container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server 416483bd-419d-4dc1-bc92-de8c5f70f57c@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Invalid event: DELETE at CLOSING state.
org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
	at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
	at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
	at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
	at java.base/java.lang.Thread.run(Thread.java:829)
2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(206)) - 5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater caught a Throwable.
org.apache.ratis.server.raftlog.RaftLogIOException: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1835)
	at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
	at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:141)
	at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:151)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:176)
	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
	... 3 more
Caused by: org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
	at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
	at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
	... 4 more
```
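The log shows an invalid lifecycle transition: the DELETE event reaches SCM's container state machine while the container is still CLOSING. The sketch below is a simplified, hypothetical model of that lifecycle. The state and event names mirror the log, but the transition table is an assumption for illustration and not a copy of `ContainerStateManagerImpl`; it only shows why DELETE is accepted from CLOSED and rejected from CLOSING.

```java
import java.util.EnumMap;
import java.util.Map;

public class LifecycleSketch {
  // Hypothetical, simplified subset of SCM container states.
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, DELETING, DELETED }

  // Hypothetical, simplified subset of lifecycle events.
  enum Event { FINALIZE, QUASI_CLOSE, CLOSE, FORCE_CLOSE, DELETE, CLEANUP }

  // Assumed transition table: state -> (event -> next state).
  static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

  static void add(State from, Event event, State to) {
    TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(event, to);
  }

  static {
    add(State.OPEN, Event.FINALIZE, State.CLOSING);
    add(State.CLOSING, Event.QUASI_CLOSE, State.QUASI_CLOSED);
    add(State.CLOSING, Event.CLOSE, State.CLOSED);
    add(State.QUASI_CLOSED, Event.FORCE_CLOSE, State.CLOSED);
    add(State.CLOSED, Event.DELETE, State.DELETING);
    add(State.DELETING, Event.CLEANUP, State.DELETED);
  }

  // Looks up the next state; an unknown (state, event) pair is rejected
  // with a message shaped like the one in the log.
  static State nextState(State current, Event event) {
    State next = TRANSITIONS.getOrDefault(current, Map.of()).get(event);
    if (next == null) {
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + current + " state.");
    }
    return next;
  }

  public static void main(String[] args) {
    System.out.println(nextState(State.CLOSED, Event.DELETE)); // DELETING
    try {
      nextState(State.CLOSING, Event.DELETE);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Invalid event: DELETE at CLOSING state.
    }
  }
}
```

In SCM the rejection happens inside `applyTransaction`, so instead of failing a single call it terminates the state machine, which is why one premature DELETE takes the whole SCM leader down. The real lifecycle has more transitions than this table (for example the DELETING/DELETED back to CLOSED path that this test exercises).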
## What has been done?
After the datanodes report their containers as closed, wait until SCM also considers the container CLOSED before updating the container state with the DELETE event.
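A minimal sketch of that ordering, assuming a hypothetical in-memory stand-in for SCM's view of the container state and a hypothetical `waitFor` polling helper (neither is Ozone's API; the actual test code may use a different utility). A background thread plays the role of the asynchronously processed datanode reports; the main thread polls until SCM's state is CLOSED and only then applies DELETE.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class WaitBeforeDeleteSketch {
  enum State { CLOSING, CLOSED, DELETING }

  // Hypothetical stand-in for SCM's view of container #1.
  static final AtomicReference<State> scmState = new AtomicReference<>(State.CLOSING);

  // Hypothetical polling helper: re-checks the condition every intervalMillis
  // and gives up with a TimeoutException after timeoutMillis.
  static void waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline > 0) {
        throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }

  // In this sketch DELETE is only legal from CLOSED.
  static void applyDelete() {
    if (!scmState.compareAndSet(State.CLOSED, State.DELETING)) {
      throw new IllegalStateException(
          "Invalid event: DELETE at " + scmState.get() + " state.");
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulates SCM processing the datanodes' CLOSED reports some time later.
    Thread reports = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      scmState.set(State.CLOSED);
    });
    reports.start();

    // Calling applyDelete() right away would fail while SCM is still CLOSING.
    waitFor(() -> scmState.get() == State.CLOSED, 50, 5_000);
    applyDelete();
    reports.join();
    System.out.println(scmState.get()); // DELETING
  }
}
```

The bounded timeout is the design point: if SCM never reaches CLOSED, the test fails with a clear timeout on the client side instead of pushing an invalid DELETE through Ratis and terminating the SCM state machine.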
## What is the link to the Apache JIRA
CI:
- build-branch: https://github.com/peterxcli/ozone/actions/runs/13812189282
- flaky-check:
  - TestContainerReportHandling: https://github.com/peterxcli/ozone/actions/runs/13812243728
  - TestContainerReportHandlingWithHA: https://github.com/peterxcli/ozone/actions/runs/13812233483