Any chance that you can use three servers in your ZooKeeper quorum? Cheers
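P.S. If you do add two more ZooKeeper servers, the quorum entry in hbase-site.xml would list all three hosts, roughly like this (the second and third host names below are placeholders, not real nodes in your cluster):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-10-146-188-157.ec2.internal,zk-2.ec2.internal,zk-3.ec2.internal</value>
</property>

With three servers the ensemble can keep serving clients through the loss of any single server, which a one-node setup cannot.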
On Mon, Nov 17, 2014 at 11:21 AM, eluiggi <[email protected]> wrote:
> Hi,
>
> I have an HBase (0.96.1.1-cdh5.0.2) cluster on AWS managed by Cloudera with
> 4 region servers and 1 ZooKeeper server. The ZooKeeper server is running on
> the same node as the HBase master. The problem I'm facing is that 3 of the 4
> region servers are down because they can't connect to ZooKeeper. The only
> region server that stays up is the one running on the same node as the
> master and ZooKeeper. Below is the relevant section of one of the failing
> region server logs.
>
> 2014-11-14 15:46:59,871 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=ip-10-146-188-157.ec2.internal:2181
> sessionTimeout=60000 watcher=regionserver:60020,
> quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase
> 2014-11-14 15:46:59,915 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=regionserver:60020 connecting to ZooKeeper
> ensemble=ip-10-146-188-157.ec2.internal:2181
> 2014-11-14 15:46:59,920 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2014-11-14 15:47:00,649 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook
> thread: Shutdownhook:regionserver60020
> 2014-11-14 15:47:59,948 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 60041ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2014-11-14 15:48:00,067 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2014-11-14 15:48:00,072 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 1000ms before retry #0...
> 2014-11-14 15:48:01,067 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2014-11-14 15:49:00,123 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 60057ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2014-11-14 15:49:00,224 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2014-11-14 15:49:00,224 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 2000ms before retry #1...
> 2014-11-14 15:49:01,224 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2014-11-14 15:50:00,259 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 60035ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2014-11-14 15:50:00,360 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2014-11-14 15:50:00,360 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 4000ms before retry #2...
> 2014-11-14 15:50:01,360 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2014-11-14 15:51:00,408 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 60048ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2014-11-14 15:51:00,509 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2014-11-14 15:51:00,509 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 8000ms before retry #3...
> 2014-11-14 15:51:01,509 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2014-11-14 15:52:00,559 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 60051ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2014-11-14 15:52:00,659 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2014-11-14 15:52:00,660 ERROR
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists
> failed after 4 attempts
> 2014-11-14 15:52:00,661 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020, quorum=ip-10-146-188-157.ec2.internal:2181,
> baseZNode=/hbase Unable to set watcher on znode /hbase/master
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
>     at java.lang.Thread.run(Thread.java:744)
> 2014-11-14 15:52:00,687 ERROR
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020,
> quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase Received
> unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
>     at java.lang.Thread.run(Thread.java:744)
> 2014-11-14 15:52:00,692 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> 0.0.0.0,60020,1415998019646: Unexpected exception during initialization,
> aborting
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772)
>     at java.lang.Thread.run(Thread.java:744)
>
> The section of hbase-site.xml dealing with ZooKeeper is:
>
> <property>
>   <name>zookeeper.znode.parent</name>
>   <value>/hbase</value>
> </property>
> <property>
>   <name>zookeeper.znode.rootserver</name>
>   <value>root-region-server</value>
> </property>
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>ip-10-146-188-157.ec2.internal</value>
> </property>
> <property>
>   <name>hbase.zookeeper.property.clientPort</name>
>   <value>2181</value>
> </property>
>
> The /etc/hosts on each of the nodes is:
>
> 127.0.0.1 localhost.localdomain localhost
> ::1 localhost6.localdomain6 localhost6
>
> Following some other threads I have removed the limit on the number of
> connections, increased the timeout value, and explicitly added the hosts to
> /etc/hosts on the region server and master nodes. None of these have helped
> so far.
>
> Any help will be greatly appreciated.
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/ConnectionLossException-KeeperErrorCode-ConnectionLoss-for-hbase-master-tp4066034.html
> Sent from the HBase User mailing list archive at Nabble.com.
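Also: the failing region servers never hear anything back from ZooKeeper at all (60s session timeouts with sessionid 0x0), which smells like basic network reachability rather than an HBase problem. Before digging further, it may be worth probing raw TCP connectivity to port 2181 from one of the failing nodes. A minimal sketch (generic reachability probe, not an HBase or ZooKeeper API; the host name is the one from your hbase.zookeeper.quorum setting):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake;
        # DNS failures and connection refusals both raise OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Probe the lone ZooKeeper server from a failing region server.
    print("zookeeper reachable:", can_connect("ip-10-146-188-157.ec2.internal", 2181))
```

If this returns False from the failing nodes but True from the master node, the problem is a security group / firewall rule blocking 2181 between the region servers and the ZooKeeper host, not HBase configuration.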
