Soumitra Sulav created HDDS-660:
-----------------------------------

             Summary: StatusRuntimeException : DataNode going dead
                 Key: HDDS-660
                 URL: https://issues.apache.org/jira/browse/HDDS-660
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Filesystem
    Affects Versions: 0.3.0
            Reporter: Soumitra Sulav


Issue 1: HDFS operations fail with *INTERNAL_ERROR* when one of the datanodes is down, because SCM cannot find the minimum number of datanodes needed for replication. _The ERROR log could be more specific._
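To illustrate what a more specific error might look like, here is a minimal Java sketch. It assumes the failure is simply that fewer healthy datanodes are available than the RATIS replication factor requires. The class `AllocationCheck`, the method `checkCanAllocate`, and `InsufficientDatanodesException` are hypothetical and are not part of Ozone's SCM API. The type, factor, and block size come from the SCM log in this report; the healthy-node count of 2 is an assumption (a three-node cluster with one datanode DEAD).

{code:java}
import java.util.Locale;

/**
 * Hypothetical sketch: a descriptive pre-check before allocating a block.
 * Names are illustrative only and are NOT Ozone's actual API.
 */
public class AllocationCheck {

    /** Thrown when too few healthy datanodes exist for the requested replication. */
    static class InsufficientDatanodesException extends RuntimeException {
        InsufficientDatanodesException(String message) {
            super(message);
        }
    }

    /**
     * Fails with a specific message, instead of a generic INTERNAL_ERROR,
     * when a pipeline of the requested replication factor cannot be formed.
     */
    static void checkCanAllocate(String type, int replicationFactor,
                                 int healthyDatanodes, long sizeBytes) {
        if (healthyDatanodes < replicationFactor) {
            throw new InsufficientDatanodesException(String.format(Locale.ROOT,
                "Cannot allocate %d-byte block: %s pipeline with factor %d needs "
                    + "%d healthy datanodes, but only %d are healthy",
                sizeBytes, type, replicationFactor, replicationFactor, healthyDatanodes));
        }
    }

    public static void main(String[] args) {
        // RATIS, factor THREE and 268435456 bytes are taken from the SCM log.
        // ASSUMPTION: 2 healthy datanodes (three-node cluster, one DEAD).
        try {
            checkCanAllocate("RATIS", 3, 2, 268435456L);
        } catch (InsufficientDatanodesException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Running `main` prints a message naming the pipeline type, the replication factor, and the healthy-node shortfall, which is the kind of detail a bare INTERNAL_ERROR hides.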

Issue 2: The Datanode process is running, but SCM reports it as dead. The DataNode logs also contain the exception *StatusRuntimeException: INTERNAL: group-4D3A6FFFBFE2 not found.* Is there a way to fix filesystem corruption, or an fsck utility like the one HDFS provides?

+Steps followed to reproduce the above issues:+

I set up a clean Ozone cluster and tried starting HDP services with o3 as the defaultFS.

YARN failed to start. The logs and UI show that one of the datanodes is transitioning to the DEAD state.

HDFS CLI commands against the Ozone filesystem throw the exception below:
{code:java}
[root@hcatest-1 ~]# ozone fs -put ozone-site.xml /
2018-10-15 09:33:20,385 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-10-15 09:33:21,774 ERROR io.ChunkGroupOutputStream: Try to allocate more blocks for write failed, already allocated 0 blocks for this write.
put: Allocate block failed, error:INTERNAL_ERROR
{code}
Error logs on SCM:
{code:java}
2018-10-15 10:16:54,303 WARN org.apache.hadoop.hdds.scm.block.BlockManagerImpl: Unable to allocate container: {}
org.apache.hadoop.hdds.scm.exceptions.SCMException
at org.apache.hadoop.hdds.scm.pipelines.PipelineSelector.getReplicationPipeline(PipelineSelector.java:268)
at org.apache.hadoop.hdds.scm.container.ContainerStateManager.allocateContainer(ContainerStateManager.java:270)
at org.apache.hadoop.hdds.scm.container.SCMContainerManager.allocateContainer(SCMContainerManager.java:312)
at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.preAllocateContainers(BlockManagerImpl.java:165)
at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:279)
at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2018-10-15 10:16:54,303 ERROR org.apache.hadoop.hdds.scm.block.BlockManagerImpl: Unable to allocate a block for the size: 268435456, type: RATIS, factor: THREE
{code}
DataNode error logs:
{code:java}
2018-10-15 10:33:13,522 INFO org.apache.ratis.server.impl.LeaderElection: 0e4e7c9b-84a9-48a3-b44d-d906231e77b2 got exception when requesting votes: {}
java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 3cf6e2da-4fdb-4198-a24d-5c34ca02fe4d: group-4D3A6FFFBFE2 not found.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 3cf6e2da-4fdb-4198-a24d-5c34ca02fe4d: group-4D3A6FFFBFE2 not found.
at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:222)
at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:203)
at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:132)
at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
at org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:61)
at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:150)
at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-10-15 10:33:13,523 INFO org.apache.ratis.server.impl.LeaderElection: 0e4e7c9b-84a9-48a3-b44d-d906231e77b2: Election REJECTED; received 0 response(s) [] and 2 exception(s); 0e4e7c9b-84a9-48a3-b44d-d906231e77b2:t140, leader=null, voted=0e4e7c9b-84a9-48a3-b44d-d906231e77b2, raftlog=[(t:1, i:1)], conf=0: [76b2ad5f-1a40-4a28-9fc1-b91437fe1398:172.22.119.190:9858, 0e4e7c9b-84a9-48a3-b44d-d906231e77b2:172.22.119.189:9858, 3cf6e2da-4fdb-4198-a24d-5c34ca02fe4d:172.22.119.19:9858], old=null
{code}
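The rejected election can be read through Raft's majority rule. The `conf` in the log lists three peers, so a candidate needs a strict majority of two votes. It votes for itself, but both peer requests ended in exceptions (such as `group-4D3A6FFFBFE2 not found`) rather than granted votes, so it stays at one vote and the election is rejected. The minimal Java sketch below models only that arithmetic under those assumptions; `RaftQuorum` and its methods are hypothetical and are not Apache Ratis code.

{code:java}
/**
 * Hypothetical sketch of Raft majority-vote arithmetic, applied to the
 * election in the DataNode log. This is NOT Apache Ratis code.
 */
public class RaftQuorum {

    /** Smallest number of votes that forms a strict majority of the group. */
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    /** A candidate votes for itself, then needs enough granted votes from peers. */
    static boolean electionWins(int groupSize, int grantedPeerVotes) {
        int votes = 1 + grantedPeerVotes; // self-vote plus peers that granted a vote
        return votes >= majority(groupSize);
    }

    public static void main(String[] args) {
        int groupSize = 3;        // conf lists three peers
        int grantedPeerVotes = 0; // "received 0 response(s) [] and 2 exception(s)"
        System.out.println("majority needed: " + majority(groupSize));
        System.out.println("election wins: " + electionWins(groupSize, grantedPeerVotes));
    }
}
{code}

With the values from the log, `main` reports that two votes are needed and that the election is not won; in a three-peer group, a single granted peer vote would have been enough.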
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)