Arpit Agarwal created HDDS-440: ---------------------------------- Summary: Datanode loops forever if it cannot create directories Key: HDDS-440 URL: https://issues.apache.org/jira/browse/HDDS-440 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Arpit Agarwal
Datanode starts but runs in a tight loop forever if it cannot create the DataNode ID directory e.g. due to permissions issues. I encountered this by having a typo in my ozone-site.xml for {{ozone.scm.datanode.id}}. In just a few minutes the DataNode had generated over 20GB of log+out files with the following exception: {code:java} 2018-09-12 17:28:20,649 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 2 63: java.io.IOException: Unable to create datanode ID directories. at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-09-12 17:28:20,648 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in Datanode State Mach ine Thread - 160 2018-09-12 17:28:20,650 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 1 60: java.io.IOException: Unable to create datanode ID directories. at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} We should just exit since this is a fatal issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org