That error usually occurs because the disks are nearly out of space. On your EMR cluster, SSH into one of the nodes and run `df -h` to check disk usage on all of your EBS volumes. HDFS is usually configured to treat a disk as unhealthy once it is more than 90% utilized. When that happens, the DataNode is taken out of the list of available nodes. In your case none of the DataNodes are available, so new blocks are rejected when the NameNode asks for a place to write them (0 available out of 4 nodes).
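If you would rather script the check than eyeball `df -h` on every node, here is a minimal sketch in Python. It only flags filesystems above a utilization threshold. The 90% default is the figure mentioned above, the directory names are hypothetical placeholders, and `find_full_dirs` is a helper name I made up; substitute whatever `dfs.datanode.data.dir` points to on your nodes. The percentage is computed as used/total, so it can differ slightly from the `Use%` column of `df`.

```python
import os
import shutil

# Assumption: 90% is the threshold quoted in this thread; adjust to your cluster.
THRESHOLD_PERCENT = 90.0


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total (0.0 if total is not positive)."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def find_full_dirs(paths, threshold=THRESHOLD_PERCENT, usage_fn=shutil.disk_usage):
    """Return (path, percent) pairs for paths whose filesystem is above threshold.

    usage_fn must return an object with .total and .used attributes, which is
    what shutil.disk_usage does; it is a parameter only to allow testing.
    """
    full = []
    for path in paths:
        usage = usage_fn(path)
        pct = percent_used(usage.total, usage.used)
        if pct > threshold:
            full.append((path, round(pct, 1)))
    return full


if __name__ == "__main__":
    # Hypothetical mount points; replace with your DataNode data directories.
    candidate_dirs = ["/mnt", "/mnt1"]
    existing = [d for d in candidate_dirs if os.path.isdir(d)]
    for path, pct in find_full_dirs(existing):
        print(f"{path}: {pct}% used, above {THRESHOLD_PERCENT}%")
```

Run it on each DataNode. Any directory it prints is a candidate for cleanup (old YARN or job logs, for example) or for a larger volume.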
Even though your cluster reports 120 GB available, that free space might not be on the disks the DataNode is configured to write to, which makes the figure misleading. The same thing happens when YARN and/or MapReduce logs fill up the disks the DataNode is using.

On Wed, Jun 13, 2018 at 8:56 AM Sowjanya Kakarala <sowja...@agrible.com> wrote:

> Hi Sajid,
>
> As this is development environment, we have limited nodes (4datanodes 1masternode) on a unmanaged switch.
> So here each node will be treated as rack (managed by HDFS, which creates block copies) with one replica.
>
> On Wed, Jun 13, 2018 at 1:31 AM, Sajid Mohammed <sajid.had...@gmail.com> wrote:
>
>> what is your rack topology ?
>>
>> On Tue, Jun 12, 2018 at 9:26 PM Sowjanya Kakarala <sowja...@agrible.com> wrote:
>>
>>> Hi Guys,
>>>
>>> I have 4datanodes and one master node EMR cluster with 120GB data storage left. I have been running sqoop jobs which loads data to hive table. After some jobs ran successfully I suddenly see these errors all over the name node logs and datanodes logs.
>>>
>>> I have tried changing so many configurations as suggeted in stackoverflow and hortonworks sites but couldnt find a way for fixing it.
>>>
>>> Here is the error:
>>>
>>> 2018-06-12 15:32:35,933 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/warehouse/monolith.db/tblname/_SCRATCH0.28417629602676764/time_stamp=2018-04-02/_temporary/1/_temporary/attempt_1528318855054_3528_m_000000_1/part-m-00000 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
>>>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1735)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2561)
>>>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
>>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>>>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
>>>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
>>>
>>>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1435)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>>>     at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:444)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>>>     at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>>>     at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1838)
>>>     at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1638)
>>>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
>>>
>>> References I already followed:
>>>
>>> https://community.hortonworks.com/articles/16144/write-or-append-failures-in-very-small-clusters-un.html
>>> https://stackoverflow.com/questions/14288453/writing-to-hdfs-from-java-getting-could-only-be-replicated-to-0-nodes-instead
>>> https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>> https://stackoverflow.com/questions/36015864/hadoop-be-replicated-to-0-nodes-instead-of-minreplication-1-there-are-1/36310025
>>>
>>> Any help is appreciated.
>>>
>>> Thanks
>>>
>>> Sowjanya
>
> --
> Sowjanya Kakarala

--
Thai