Re: TTransportException intermittently in 0.7

Carl Bruecken Tue, 31 Aug 2010 15:56:03 -0700

I've made some progress on narrowing this down and am able to reproduce it easily. I am using pelops as a client, and I configured the policy in pelops to establish only 1 connection to a cassandra node. I'm able to step through the pelops code line by line and see the resulting thrift transport logging in cassandra. It seems that flushing the transport causes the unwanted TTransportException in the server and the subsequent closing of the connection. The connection should stay open after flushing. When many connections are established, the behaviour seems intermittent and many operations succeed.



Here are the details

1) The trigger from the client side is when the framed transport is flushed.
    conn.getAPI().batch_mutate(convertedBatch, cLevel);
    // Flush connection
    conn.flush();

2) In CustomTThreadPoolServer.java in Cassandra I modified the code to log TTransportExceptions.


    catch (TTransportException ttx) {
        LOGGER.error("Transport exception", ttx);
    } catch (TException tx) {
        LOGGER.error("Thrift error occurred during processing of message.", tx);
    } catch (Exception x) {
        LOGGER.error("Error occurred during processing of message.", x);
    }

3) Here is the exception that is ignored in cassandra. Flushing the transport causes the server to believe the client has closed the connection.

org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)

4) The next batch mutate on this connection caused the exception in the client.

WARN [main] 2010-08-31 18:40:06,749 Operand.java (line 72) Operation failed as result of network exception. Connection must be destroyed. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
    at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:38)
    at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:53)
    at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:49)
    at com.aol.data.c7.App.doWork(App.java:41)
    at com.aol.data.c7.App.main(App.java:77)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 15 more
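
For readers following steps 1 and 3, here is a minimal stand-alone sketch of the two mechanisms involved: a framed writer that sends one length-prefixed frame per flush, and a read loop that fails with the same wording as the server-side message when it hits end-of-stream. It uses only the JDK. The names FramedFlushSketch, FramedWriter and readAll are hypothetical stand-ins, not the Thrift or pelops classes, and IOException is used in place of TTransportException. It illustrates what the error message means; it does not claim to reproduce or explain the bug reported above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch; not the Thrift TFramedTransport or TTransport classes.
final class FramedFlushSketch {

    // Client side: buffer writes, then emit one length-prefixed frame per flush.
    static final class FramedWriter {
        private final OutputStream out;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

        FramedWriter(OutputStream out) {
            this.out = out;
        }

        void write(byte[] data) {
            pending.write(data, 0, data.length); // nothing reaches the wire yet
        }

        void flush() throws IOException {
            byte[] payload = pending.toByteArray();
            pending.reset();
            // 4-byte big-endian length header, followed by the payload.
            out.write(ByteBuffer.allocate(4).putInt(payload.length).array());
            out.write(payload);
            out.flush();
        }
    }

    // Server side: read exactly len bytes, or fail if the stream ends first.
    static int readAll(InputStream in, byte[] buf, int off, int len) throws IOException {
        int got = 0;
        while (got < len) {
            int n = in.read(buf, off + got, len - got);
            if (n <= 0) {
                throw new IOException("Cannot read. Remote side has closed. Tried to read "
                        + len + " bytes, but only got " + got + " bytes.");
            }
            got += n;
        }
        return got;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        FramedWriter client = new FramedWriter(wire);
        client.write(new byte[] {1, 2, 3});
        client.flush();

        // An in-memory stream stands in for the socket; its end plays the role of a closed peer.
        InputStream server = new ByteArrayInputStream(wire.toByteArray());
        byte[] header = new byte[4];
        readAll(server, header, 0, 4);
        int frameLength = ByteBuffer.wrap(header).getInt();
        readAll(server, new byte[frameLength], 0, frameLength);
        System.out.println("frame length: " + frameLength);

        try {
            readAll(server, header, 0, 4); // no further bytes: end-of-stream
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One point the sketch makes concrete: a read on an open but idle socket blocks rather than returning zero bytes, so "only got 0 bytes" means the reader saw end-of-stream at the point where it expected the first four bytes of the next message.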




On 8/31/10 4:04 PM, Jonathan Ellis wrote:


No, I don't know that anyone has reproduced that.  TTransportException
always means "something went wrong on the thrift side" in my
experience, it shouldn't be cassandra-version specific.

On Tue, Aug 31, 2010 at 12:53 PM, Carl Bruecken
<carl.bruec...@corp.aol.com> wrote:
>
>  Are there any estimates as to when a fix for this will be checked into
> trunk?
>
> Coincidentally, has anyone tracked down the issue?
>
>  I'm experiencing same issue with nightly build from a week ago.
>
> Thank You
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
