Sergey Soldatov created HDDS-13621:
--------------------------------------
Summary: NPE in OzoneManagerRatisServer.checkRetryCache
Key: HDDS-13621
URL: https://issues.apache.org/jira/browse/HDDS-13621
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Manager
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Under load, the OM periodically fails while checking the RetryCache:
{code:java}
2025-08-27 16:18:09,562 WARN ipc.Server: IPC Server handler 0 on default port 9862, call Call#5998989 Retry#2 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.88.252.12:48376
java.lang.NullPointerException: Cannot invoke "org.apache.ratis.protocol.Message.getContent()" because the return value of "org.apache.ratis.protocol.RaftClientReply.getMessage()" is null
    at org.apache.hadoop.ozone.om.helpers.OMRatisHelper.getOMResponseFromRaftClientReply(OMRatisHelper.java:68)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getOMResponse(OzoneManagerRatisServer.java:570)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.checkRetryCache(OzoneManagerRatisServer.java:495)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.internalProcessRequest(OzoneManagerProtocolServerSideTranslatorPB.java:168)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:124)
    at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:115)
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3198)
{code}
It is not yet clear whether this is an Ozone issue or a Ratis issue. Root cause analysis (RCA) is in progress.
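For illustration only, the failure mode in the trace (calling getContent() on a null result of getMessage()) can be reproduced and guarded against in isolation. This is a minimal sketch, not the Ozone or Ratis code and not a proposed fix, since the root cause is still unknown. The Message and Reply types below are hypothetical stand-ins for the Ratis classes named in the exception, and contentOf is an invented helper; the real Message.getContent() does not return byte[].

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class RetryCacheReplySketch {

    /** Hypothetical stand-in for org.apache.ratis.protocol.Message (assumption). */
    interface Message {
        byte[] getContent();
    }

    /** Hypothetical stand-in for org.apache.ratis.protocol.RaftClientReply (assumption). */
    static final class Reply {
        private final Message message; // may be null, as observed in the trace

        Reply(Message message) {
            this.message = message;
        }

        Message getMessage() {
            return message;
        }
    }

    /**
     * Invented helper: returns the reply payload, or Optional.empty() when the
     * reply or its message is null, instead of throwing a NullPointerException.
     */
    static Optional<byte[]> contentOf(Reply reply) {
        if (reply == null || reply.getMessage() == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(reply.getMessage().getContent());
    }

    public static void main(String[] args) {
        Reply withoutMessage = new Reply(null);
        Reply withMessage = new Reply(() -> "ok".getBytes(StandardCharsets.UTF_8));

        // A reply without a message yields an empty Optional rather than an NPE.
        System.out.println(contentOf(withoutMessage).isPresent()); // prints false

        // A reply with a message yields its content.
        System.out.println(contentOf(withMessage)
            .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
            .orElse("<none>")); // prints ok
    }
}
```

Whether a guard of this kind belongs in OMRatisHelper, or whether a reply without a message indicates a defect on the Ratis side, is exactly what the RCA needs to determine.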