Arpit Agarwal created HDDS-440:
----------------------------------

             Summary: Datanode loops forever if it cannot create directories
                 Key: HDDS-440
                 URL: https://issues.apache.org/jira/browse/HDDS-440
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Arpit Agarwal


The Datanode starts but then runs in a tight loop forever if it cannot create
the Datanode ID directory, e.g. due to a permissions issue. I hit this because
of a typo in the {{ozone.scm.datanode.id}} value in my ozone-site.xml.

Within a few minutes the Datanode had generated over 20 GB of .log and .out
files, repeating the following exception:
{code:java}
2018-09-12 17:28:20,649 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 263:
java.io.IOException: Unable to create datanode ID directories.
    at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-12 17:28:20,648 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in Datanode State Machine Thread - 160
2018-09-12 17:28:20,650 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 160:
java.io.IOException: Unable to create datanode ID directories.
    at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748){code}

The Datanode should exit instead of retrying, since this is a fatal error.
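
A minimal sketch of that fail-fast behavior, using only the JDK. The class and
method names here (DatanodeInitSketch, FatalInitException, ensureIdDirectory,
initOnce) are hypothetical and are not taken from the Ozone code base. The
sketch only illustrates treating a directory-creation failure as fatal and
returning a non-zero exit status rather than rescheduling the init task.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; not the actual Ozone Datanode code. */
final class DatanodeInitSketch {

  /** Signals an initialization error that retrying cannot fix. */
  static final class FatalInitException extends RuntimeException {
    FatalInitException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Creates the parent directory of the ID file, or throws a fatal error. */
  static void ensureIdDirectory(Path idFile) {
    Path parent = idFile.toAbsolutePath().getParent();
    try {
      if (parent != null) {
        // Throws IOException on permission problems or if a path
        // component exists but is not a directory.
        Files.createDirectories(parent);
      }
    } catch (IOException e) {
      throw new FatalInitException(
          "Unable to create datanode ID directory: " + parent, e);
    }
  }

  /** Runs initialization once; returns an exit status instead of retrying. */
  static int initOnce(Path idFile) {
    try {
      ensureIdDirectory(idFile);
      return 0;
    } catch (FatalInitException e) {
      // Log once and give up, so the failure cannot flood the logs.
      System.err.println("FATAL: " + e.getMessage());
      return 1;
    }
  }

  public static void main(String[] args) {
    // The default file name is a placeholder for illustration only.
    Path idFile = Paths.get(args.length > 0 ? args[0] : "datanode.id");
    int status = initOnce(idFile);
    if (status != 0) {
      System.exit(status);
    }
  }
}
```

With this shape the failure is logged once and the process stops with a
non-zero status, so a misconfigured path cannot fill the disk with repeated
stack traces.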
