Pratyush Bhatt created HDDS-10689:
-------------------------------------
Summary: [HBase Ozone] All HBase HMasters/RS down with
"OMException: Unable to allocate a container to the block"
Key: HDDS-10689
URL: https://issues.apache.org/jira/browse/HDDS-10689
Project: Apache Ozone
Issue Type: Bug
Components: SCM
Reporter: Pratyush Bhatt
Both the _HMasters_ and all _RS_ (RegionServers) failed with the same "OMException: Unable to
allocate a container to the block" error at approximately the same time.
Logs from {_}HMaster{_}:
{code:java}
2024-04-10 18:24:23,197 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master Master-1,22001,1712638569318: IOE in log roller *****
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Unable to allocate a container to the block of size: 268435456, replicationConfig: RATIS/THREE. Waiting for one of pipelines to be OPEN failed. Pipeline f1362ba6-ee67-48a9-bdb7-ac80e8d55435,3c2d89bc-935b-424f-8c90-6dcc74933640,040a71f0-fa7d-43ff-baed-37ae3ee87c63,31caf9ea-c145-4d37-91ef-456088158b99,37b5a056-55e0-485a-ad35-53ef27069e39 is not ready in 60000 ms
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:756)
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2293)
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2281)
	at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2115)
	at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:855)
	at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:400)
	at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:304)
	at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createNonRecursive(BasicRootedOzoneFileSystem.java:280)
	at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1382)
	at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1360)
	at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:63)
	at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:190)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:160)
	at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
	at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
	at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
	at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
	at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-10 18:24:23,200 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
{code}
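The OMException shows the block-allocation path waited 60000 ms for one of the five listed pipelines to transition from ALLOCATED to OPEN before giving up and aborting the HMaster. The general wait-and-poll pattern behind that failure can be sketched as below (class, method, and enum names here are illustrative only, not Ozone's actual API):

```java
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch of the "wait for one pipeline to become OPEN" pattern
// behind the OMException. Names are hypothetical, not Ozone's real classes.
public class PipelineWaitSketch {
    enum State { ALLOCATED, OPEN, CLOSED }

    // Re-query the current pipeline states up to maxPolls times; the real
    // client sleeps between attempts until the 60000 ms deadline expires.
    static boolean waitForOpen(Supplier<List<State>> query, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (query.get().contains(State.OPEN)) {
                return true; // an OPEN pipeline appeared; allocation proceeds
            }
        }
        return false; // maps to "Waiting for one of pipelines to be OPEN failed"
    }

    public static void main(String[] args) {
        // All pipelines stuck in ALLOCATED, as in the reported failure.
        Supplier<List<State>> stuck =
            () -> List.of(State.ALLOCATED, State.ALLOCATED, State.ALLOCATED);
        System.out.println(waitForOpen(stuck, 5)); // prints false
    }
}
```

The key point is that the client can only poll: if SCM never moves any pipeline out of ALLOCATED (see the SCM logs below), the timeout is guaranteed to fire on every WAL roll attempt.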
Checked the SCM leader logs; they show WARN messages like the one below:
{code:java}
2024-04-10 18:23:22,106 WARN [IPC Server handler 3 on 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider: Pipeline creation failed for repConfig RATIS/THREE Datanodes may be used up. Try to see if any pipeline is in ALLOCATED state, and then will wait for it to be OPEN
org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7.
	at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.filterViableNodes(PipelinePlacementPolicy.java:167)
	at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.chooseDatanodesInternal(PipelinePlacementPolicy.java:256)
	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:209)
	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:140)
	at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:176)
	at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:56)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:89)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:255)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:241)
	at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:100)
	at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
	at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
	at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:206)
	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:144)
	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:115)
	at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15752)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)