ZooKeeper is a funny system. This kind of ConnectionLossException is a
normal "state" that a ZooKeeper client can enter. We handle this
condition in Accumulo by retrying the operation (in this case, a
`create()`) after the client reconnects to the ZooKeeper servers in
the background.
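The retry pattern can be sketched roughly like this. This is a generic
illustration, not Accumulo's actual retry code; `FakeConnectionLoss` is a
stand-in for `KeeperException.ConnectionLossException` so the sketch runs
without a ZooKeeper dependency.

```java
import java.util.concurrent.Callable;

public class RetryExample {
    // Retry an operation up to maxAttempts times when it throws the given
    // "transient" exception type, sleeping between attempts to give the
    // client time to reconnect in the background.
    static <T> T retryOn(Class<? extends Exception> retryable,
                         int maxAttempts, long sleepMillis,
                         Callable<T> op) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e)) {
                    throw e; // not the transient condition; propagate
                }
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        throw last; // retries exhausted
    }

    // Stand-in for KeeperException.ConnectionLossException (assumption:
    // used here only so the example is self-contained).
    static class FakeConnectionLoss extends Exception {}

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice with the "transient" exception, then succeeds.
        String result = retryOn(FakeConnectionLoss.class, 5, 10, () -> {
            if (++calls[0] < 3) throw new FakeConnectionLoss();
            return "created";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "created after 3 attempts"
    }
}
```

The key point is that connection loss is treated as retriable, while any
other exception is rethrown immediately.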
ConnectionLossExceptions can be indicative of over-saturation of your
nodes. A ZooKeeper client might lose its connection because it is
starved for CPU time. It can also indicate that the ZooKeeper servers
themselves are starved for resources.
* Check the ZooKeeper server logs for any errors about dropped
connections (maxClientCnxns)
* Make sure your servers running Accumulo are not running at 100% total
CPU usage and that there is free memory (no swapping).
ACCUMULO-3336 is about a different ZooKeeper error condition called a
"session loss". This is when the entire ZooKeeper session needs to be
torn down and recreated. This only happens after prolonged pauses in the
client JVM, or when the ZooKeeper servers actively drop your connections
due to their internal configuration (maxClientCnxns). The stacktrace you
copied is not a session loss error.
Are you saying that when a ZooKeeper server dies, you cannot use
Accumulo? How many are you running?
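For context on why the ensemble size matters: ZooKeeper stays available
only while a strict majority of its servers is up, so a single-server
ensemble tolerates zero failures, while three servers tolerate one. A
small arithmetic sketch (my own illustration, not an Accumulo API):

```java
public class QuorumMath {
    // A ZooKeeper ensemble of n servers stays available as long as a
    // strict majority is up, so it tolerates floor((n - 1) / 2) failures.
    static int tolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5}) {
            System.out.println(n + " server(s) tolerate "
                + tolerableFailures(n) + " failure(s)");
        }
        // prints:
        // 1 server(s) tolerate 0 failure(s)
        // 3 server(s) tolerate 1 failure(s)
        // 5 server(s) tolerate 2 failure(s)
    }
}
```

If losing one ZooKeeper server halts your cluster, that suggests the
ensemble may be too small to maintain a majority.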
mohit.kaushik wrote:
Sent so early...
Another exception I am getting frequently with zookeeper which is a
bigger problem.
ACCUMULO-3336 <https://issues.apache.org/jira/browse/ACCUMULO-3336> says
it is still unresolved.
Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/gc/lock
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
        at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
And the worst case is that whenever a ZooKeeper server goes down, the
cluster becomes unreachable for the time being; until it restarts, the
ingest process halts. What do you suggest? I need to resolve these
problems. I do not want the ingest process to ever stop.
Thanks
Mohit kaushik
On 02/22/2016 12:06 PM, mohit.kaushik wrote:
I am facing the below given exception continuously; the count keeps
increasing every second (current value around 3000 on a server). I can see the
exception for all 3 tablet servers.
ACCUMULO-2420 <https://issues.apache.org/jira/browse/ACCUMULO-2420> says that
this exception comes when a client closes a connection before a scan completes. But the
connection is not closed; every thread uses a common connection object to ingest and
query. So what could cause this exception?
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
        at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
        at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run(CustomNonBlockingServer.java:184)
Regards
Mohit kaushik