[ https://issues.apache.org/jira/browse/HDFS-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayush Saxena resolved HDFS-16697.
---------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Add logs if resources are not available in NameNodeResourcePolicy
> -----------------------------------------------------------------
>
>                 Key: HDFS-16697
>                 URL: https://issues.apache.org/jira/browse/HDFS-16697
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.3
>         Environment: Linux version 4.15.0-142-generic (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12))
>                      java version "1.8.0_162"
>                      Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
>                      Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
>            Reporter: ECFuzz
>            Assignee: ECFuzz
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {code:java}
> <property>
>   <name>dfs.namenode.resource.checked.volumes.minimum</name>
>   <value>1</value>
>   <description>
>     The minimum number of redundant NameNode storage volumes required.
>   </description>
> </property>
> {code}
> I found that when the value of "dfs.namenode.resource.checked.volumes.minimum" is set greater than the total number of storage volumes in the NameNode, it becomes impossible to leave safe mode. While in safe mode, the file system accepts only read requests and rejects delete, modify, and other write requests, so its functionality is severely limited.
> The default value of this configuration item is 1; we set it to 2 as an example. After starting HDFS, the logs and the client show the following messages.
> {code:java}
> 2022-07-27 17:37:31,772 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
> 2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is ON.
> Resources are low on NN.
> Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
> {code}
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /hdfsapi/test. Name node is in safe mode.
> Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:192.168.1.167
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)
> {code}
> According to these messages, one would conclude that there is not enough resource space to satisfy the conditions for leaving safe mode. However, after adding and freeing more resources, and after lowering the resource threshold "dfs.namenode.resource.du.reserved", the NameNode still fails to leave safe mode and throws the same message.
> According to the source code, the NameNode enters safe mode whenever the number of redundant storage volumes with available space is less than the minimum set by "dfs.namenode.resource.checked.volumes.minimum".
> After debugging, *we found that the current NameNode storage volumes have abundant free space, but because the total number of NameNode storage volumes is less than the configured value, the number of volumes with redundant space is necessarily also less than that value, so the NameNode always enters safe mode.*
> In summary, this configuration item lacks a sanity check and an associated warning, which makes it impossible to find the root cause when a misconfiguration occurs.
> The solution I propose is to validate the value of this configuration item and print a warning message in the log when it is greater than the number of NameNode storage volumes, so that the misconfiguration can be diagnosed in time and does not affect subsequent operation of the program.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
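
The failure mode described in the report, and the proposed warning, can be sketched in a minimal self-contained model. This is not the actual NameNodeResourcePolicy code; the Volume class and the areResourcesAvailable signature below are simplified illustrations of the check described above.

```java
import java.util.List;

public class ResourcePolicySketch {

    // Models one NameNode storage volume: whether it is a redundant
    // (non-required) volume, and whether it currently has free space.
    static final class Volume {
        final boolean redundant;
        final boolean available;
        Volume(boolean redundant, boolean available) {
            this.redundant = redundant;
            this.available = available;
        }
    }

    // Mirrors the logic described in the report: resources count as
    // "available" only if at least minimumRedundant redundant volumes have
    // free space. If minimumRedundant exceeds the total number of redundant
    // volumes, the condition can never hold, so the NameNode never leaves
    // safe mode regardless of how much space is freed.
    static boolean areResourcesAvailable(List<Volume> volumes, int minimumRedundant) {
        int redundant = 0;
        int redundantAvailable = 0;
        for (Volume v : volumes) {
            if (v.redundant) {
                redundant++;
                if (v.available) {
                    redundantAvailable++;
                }
            } else if (!v.available) {
                // A required volume without free space fails immediately.
                return false;
            }
        }
        // The proposed sanity check: warn when the configured minimum can
        // never be met by the volumes that actually exist.
        if (minimumRedundant > redundant) {
            System.err.println("WARN: dfs.namenode.resource.checked.volumes.minimum ("
                + minimumRedundant + ") is greater than the number of redundant "
                + "NameNode storage volumes (" + redundant
                + "); safe mode can never be left.");
        }
        return redundantAvailable >= minimumRedundant;
    }

    public static void main(String[] args) {
        // One redundant volume with plenty of free space.
        List<Volume> one = List.of(new Volume(true, true));
        // With minimum = 2 the check can never pass, reproducing the report.
        System.out.println(areResourcesAvailable(one, 2)); // prints false
        // With the default minimum = 1 the same volume satisfies the check.
        System.out.println(areResourcesAvailable(one, 1)); // prints true
    }
}
```

With the warning in place, an operator reading the NameNode log can tell that the persistent safe mode is caused by the configuration value rather than by a genuine shortage of disk space.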