Dear Chris:
Thank you for your reply. Do you mean how my NameNodes are managed when a failover occurs?
I use ZKFailoverController, a ZooKeeper-based component, for my two NameNodes; it provides automatic NameNode failover in Hadoop.
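For reference, our HA client settings follow the standard Hadoop layout, roughly as in the sketch below (shown as key = value; in hdfs-site.xml and core-site.xml they are <property> entries, and the nameservice name, service ids, and ZooKeeper hosts here are placeholders, not our real values):

# hdfs-site.xml (sketch; "mycluster" and nn1/nn2 are placeholder names)
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1 = n1:8020
dfs.namenode.rpc-address.mycluster.nn2 = n2:8020
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-failover.enabled = true
# core-site.xml (sketch; ZooKeeper hosts are placeholders)
fs.defaultFS = hdfs://mycluster
ha.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181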
What should I do next, then?
Thanks.
On 04/1/2017 19:15, Chris Horrocks <chris@hor.rocks> wrote:
The following error occurs when your Flume agent tries to write to the standby NameNode:
"Operation category WRITE is not supported in state standby"
What failover mechanism are you using for your NameNodes?
--
Chris Horrocks
On Sat, Apr 1, 2017 at 11:31 am, hui....@wbkit.com <hui....@wbkit.com> wrote:
I have a puzzling problem with the Flume Failover Sink Processor, described below:
Our environment uses an HA configuration in HDFS: we have two NameNodes, one active (hostname: n1) and one standby (hostname: n2).
We configure our flume.conf like this:
# Flume agent config
c1.sources = source-cd
c1.channels = c_cd
c1.sinks = s_cd1 s_cd2  # s_cd1's hostname is n1, s_cd2's hostname is n2
c1.sinkgroups = g1
c1.sinkgroups.g1.sinks = s_cd1 s_cd2
c1.sinkgroups.g1.processor.type = failover
c1.sinkgroups.g1.processor.priority.s_cd1 = 5
c1.sinkgroups.g1.processor.priority.s_cd2 = 10
c1.sinkgroups.g1.processor.maxpenalty = 10000
c1.sources.source-cd.type = org.apache.flume.source.taildir.TaildirSource
c1.sources.source-cd.channels = c_cd
c1.channels.c_cd.type = memory
c1.channels.c_cd.capacity = 5000000
c1.channels.c_cd.transactionCapacity = 1000
c1.sinks.s_cd1.type = hdfs
c1.sinks.s_cd1.hdfs.path = hdfs://192.168.1.31:8020/PATH
c1.sinks.s_cd1.channel = c_cd
c1.sinks.s_cd1.hdfs.batchSize = 1000
c1.sinks.s_cd1.hdfs.useLocalTimeStamp = true
c1.sinks.s_cd1.hdfs.rollSize = 0
c1.sinks.s_cd1.hdfs.rollCount = 1000000
c1.sinks.s_cd1.hdfs.rollInterval = 600
c1.sinks.s_cd2.type = hdfs
c1.sinks.s_cd2.hdfs.path = hdfs://192.168.1.32:8020/PATH
c1.sinks.s_cd2.channel = c_cd
c1.sinks.s_cd2.hdfs.batchSize = 1000
c1.sinks.s_cd2.hdfs.useLocalTimeStamp = true
c1.sinks.s_cd2.hdfs.rollSize = 0
c1.sinks.s_cd2.hdfs.rollCount = 1000000
c1.sinks.s_cd2.hdfs.rollInterval = 600
In this flume.conf, the priority of s_cd2 (n2) is higher than that of s_cd1 (n1), since the failover processor prefers the sink with the larger priority value, so data should be written to n2. Unfortunately, n2 broke down under a heavy data load. According to the failover mechanism, the unfinished data should then flow to n1, but that is not what happened. The log looks like this:
2017-04-01 15:31:28,285 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-1862131888_34] for 525 seconds. Will retry shortly ...
java.net.ConnectException: Call From archive.cloudera.com/192.168.1.31 to slave01:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:576)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:922)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
... 16 more
According to this log, n2 is down, yet n1 does not receive any data in the meantime. So we recovered n2, and then the log looks like this:
2017-03-31 21:16:35,783 WARN org.apache.flume.sink.hdfs.HDFSEventSink: HDFS IO error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1826)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1404)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:592)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
So what can we do to make n1 receive data when n2 is down? Is there something wrong with my configuration, or is the failover sink processor simply unsuitable for this situation?
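We also wondered whether we should instead point a single HDFS sink at the logical nameservice and let the HDFS client itself fail over between NameNodes, rather than hard-coding the two NameNode addresses in the sink paths. A minimal sketch of what we mean (it assumes a nameservice named mycluster is defined in an hdfs-site.xml on the Flume agent's classpath; mycluster is a placeholder name):

# Sketch only: one HDFS sink on the logical nameservice URI.
# Assumes hdfs-site.xml defining "mycluster" is visible to the Flume agent.
c1.sinks = s_cd
c1.sinks.s_cd.type = hdfs
c1.sinks.s_cd.channel = c_cd
c1.sinks.s_cd.hdfs.path = hdfs://mycluster/PATH
c1.sinks.s_cd.hdfs.batchSize = 1000
c1.sinks.s_cd.hdfs.useLocalTimeStamp = true
c1.sinks.s_cd.hdfs.rollSize = 0
c1.sinks.s_cd.hdfs.rollCount = 1000000
c1.sinks.s_cd.hdfs.rollInterval = 600

Would that be the more appropriate setup here? Thanks a lot.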
Hollis @ Wbkit, Apr 1, 2017