Arpit Agarwal created HDDS-440:
----------------------------------
Summary: Datanode loops forever if it cannot create directories
Key: HDDS-440
URL: https://issues.apache.org/jira/browse/HDDS-440
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Reporter: Arpit Agarwal
The Datanode starts but then runs in a tight loop forever if it cannot create
the Datanode ID directory, e.g. due to permission issues. I encountered this
because of a typo in the {{ozone.scm.datanode.id}} setting in my ozone-site.xml.
Within just a few minutes the Datanode had generated over 20GB of log and out
files, filled with repetitions of the following exception:
{code:java}
2018-09-12 17:28:20,649 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 263:
java.io.IOException: Unable to create datanode ID directories.
    at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-12 17:28:20,648 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in Datanode State Machine Thread - 160
2018-09-12 17:28:20,650 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 160:
java.io.IOException: Unable to create datanode ID directories.
    at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
    at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
We should just exit since this is a fatal issue.
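To illustrate the fail-fast behaviour suggested here, below is a rough, standalone sketch. It is not the actual Ozone code: the class {{FailFastInit}}, the exception {{FatalInitException}} and the methods {{ensureIdDirectory}} and {{runInit}} are hypothetical names made up for this example. The idea is only that a directory-creation failure is reported once and turned into a non-zero exit status instead of being retried.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch only; none of these names exist in Ozone. */
public class FailFastInit {

  /** Hypothetical marker for errors that retrying cannot fix. */
  static class FatalInitException extends RuntimeException {
    FatalInitException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Creates the parent directory of the ID file, or fails with a fatal error. */
  static void ensureIdDirectory(Path idFile) {
    Path parent = idFile.toAbsolutePath().getParent();
    if (parent == null) {
      return; // the path is a filesystem root, nothing to create
    }
    try {
      Files.createDirectories(parent);
    } catch (IOException e) {
      throw new FatalInitException(
          "Unable to create datanode ID directories: " + parent, e);
    }
  }

  /** Runs the init step exactly once and returns a process exit status. */
  static int runInit(Path idFile) {
    try {
      ensureIdDirectory(idFile);
      return 0;
    } catch (FatalInitException e) {
      // Log a single message and give up; no retry loop.
      System.err.println("Fatal: " + e.getMessage());
      return 1;
    }
  }

  public static void main(String[] args) {
    // The default path is an assumption made for this example.
    Path idFile = Paths.get(args.length > 0 ? args[0] : "datanode.id");
    int status = runInit(idFile);
    if (status != 0) {
      System.exit(status);
    }
  }
}
```

With something along these lines, a bad {{ozone.scm.datanode.id}} value would produce one error message and a non-zero exit rather than an endless stream of stack traces. Where exactly such a check belongs in the Datanode state machine is left open here.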