HA: hdfs balancer throws StandbyException
-----------------------------------------

                 Key: HDFS-3052
                 URL: https://issues.apache.org/jira/browse/HDFS-3052
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 0.24.0
            Reporter: Stephen Chu
         Attachments: balancer_styx01, balancer_styx02

The hdfs balancer tool throws a StandbyException.

Originally, styx01 hosts the active NN and styx02 hosts the standby NN. After failing over from the styx01 NN to the styx02 NN, the _hdfs balancer_ command throws a StandbyException:

{noformat}
12/03/06 00:34:01 INFO balancer.Balancer: namenodes = {ha-nn-uri={nn1=styx01.sf.cloudera.com/172.29.5.192:12020, nn2=styx02.sf.cloudera.com/172.29.5.193:12020}}
12/03/06 00:34:01 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
org.apache.hadoop.ipc.StandbyException: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1028)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:653)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1522)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:437)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:254)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42590)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:878)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1622)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1618)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1616)
.  Exiting ...
Balancing took 650.0 milliseconds
{noformat}

After failing back so that the active NN is on styx01 and the standby NN is on styx02, the _hdfs balancer_ command runs without an exception. Failing over again results in the same StandbyException.

Service ID nn1 corresponds to node styx01, and nn2 corresponds to styx02.

Console output from styx01 and styx02 is attached.
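
The behavior reported above (success when nn1 is active, failure when nn2 is active) would be consistent with a client that sends its write to the first listed NameNode and does not retry against the other one. The following is only a minimal, self-contained sketch of that retry idea, not the balancer's code and not a proposed patch. It uses no Hadoop classes: the local `StandbyException`, `FakeNameNode`, `cluster`, `createWithFailover`, and the path `/example/marker-file` are all hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FailoverSketch {

    /** Local stand-in for org.apache.hadoop.ipc.StandbyException (hypothetical, not the Hadoop class). */
    static class StandbyException extends Exception {
        StandbyException(String message) {
            super(message);
        }
    }

    /** Hypothetical model of a NameNode that rejects WRITE operations while in standby. */
    static class FakeNameNode {
        final String host;
        final boolean active;

        FakeNameNode(String host, boolean active) {
            this.host = host;
            this.active = active;
        }

        void create(String path) throws StandbyException {
            if (!active) {
                throw new StandbyException(
                        "Operation category WRITE is not supported in state standby");
            }
            // An active NameNode would create the file here; the sketch does nothing.
        }
    }

    /** Builds the two-node layout from the report: nn1 = styx01, nn2 = styx02. */
    static Map<String, FakeNameNode> cluster(boolean styx01Active) {
        Map<String, FakeNameNode> namenodes = new LinkedHashMap<>();
        namenodes.put("nn1", new FakeNameNode("styx01", styx01Active));
        namenodes.put("nn2", new FakeNameNode("styx02", !styx01Active));
        return namenodes;
    }

    /**
     * Tries the write against each NameNode in configuration order and returns
     * the host that accepted it. A standby rejection moves on to the next node
     * instead of aborting the whole operation.
     */
    static String createWithFailover(Map<String, FakeNameNode> namenodes, String path) {
        for (FakeNameNode nn : namenodes.values()) {
            try {
                nn.create(path);
                return nn.host;
            } catch (StandbyException e) {
                // This node is in standby; fall through and try the next one.
            }
        }
        throw new IllegalStateException("no active NameNode among " + namenodes.keySet());
    }

    public static void main(String[] args) {
        String path = "/example/marker-file"; // hypothetical path

        // Initial state: styx01 active, styx02 standby.
        System.out.println(createWithFailover(cluster(true), path));  // prints "styx01"

        // After failover: styx01 standby, styx02 active.
        System.out.println(createWithFailover(cluster(false), path)); // prints "styx02"
    }
}
```

In the second call the first node rejects the write and the loop moves on to styx02, which is the step that appears to be missing in the failing run.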