[ https://issues.apache.org/jira/browse/HDFS-15556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoqiao He resolved HDFS-15556. -------------------------------- Resolution: Duplicate Close this issue and linked to HDFS-14042. [~haiyang Hu] Please feel free to reopen if meet other issues which HDFS-14042 can not resolve. Thanks. > Fix NPE in DatanodeDescriptor#updateStorageStats when handle DN Lifeline > ------------------------------------------------------------------------ > > Key: HDFS-15556 > URL: https://issues.apache.org/jira/browse/HDFS-15556 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 3.2.0 > Reporter: huhaiyang > Priority: Critical > Attachments: HDFS-15556.001.patch, NN-CPU.png, NN_DN.LOG > > > In our cluster, the NameNode appears NPE when processing lifeline messages > sent by the DataNode, which will cause an maxLoad exception calculated by NN. > because DataNode is identified as busy and unable to allocate available nodes > in choose DataNode, program loop execution results in high CPU and reduces > the processing performance of the cluster. > *NameNode the exception stack*: > {code:java} > 2020-08-25 00:59:02,977 WARN org.apache.hadoop.ipc.Server: IPC Server handler > 5 on 8022, call Call#20535 Retry#0 > org.apache.hadoop.hdfs.server.protocol.DatanodeLifelineProtocol.sendLifeline > from xxxxx:34766 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.updateStorageStats(DatanodeDescriptor.java:460) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.updateHeartbeatState(DatanodeDescriptor.java:390) > at > org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.updateLifeline(HeartbeatManager.java:254) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleLifeline(DatanodeManager.java:1805) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleLifeline(FSNamesystem.java:4039) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendLifeline(NameNodeRpcServer.java:1761) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeLifelineProtocolServerSideTranslatorPB.sendLifeline(DatanodeLifelineProtocolServerSideTranslatorPB.java:62) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeLifelineProtocolProtos$DatanodeLifelineProtocolService$2.callBlockingMethod(DatanodeLifelineProtocolProtos.java:409) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2717) > {code} > {code:java} > // DatanodeDescriptor#updateStorageStats > ... > for (StorageReport report : reports) { > DatanodeStorageInfo storage = null; > synchronized (storageMap) { > storage = > storageMap.get(report.getStorage().getStorageID()); > } > if (checkFailedStorages) { > failedStorageInfos.remove(storage); > } > storage.receivedHeartbeat(report); // NPE exception occurred here > // skip accounting for capacity of PROVIDED storages! > if (StorageType.PROVIDED.equals(storage.getStorageType())) { > continue; > } > ... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org