I've made some progress narrowing this down and can now reproduce it easily. I am using Pelops as a client, configured with a policy that establishes only one connection to a Cassandra node. I can step through the Pelops code line by line and watch the resulting Thrift transport logging in Cassandra. It seems that flushing the transport causes the unwanted TTransportException in the server and the subsequent closing of the connection. The connection should stay open after flushing. When many connections are established, the behaviour seems intermittent and many operations succeed.


Here are the details

1) The trigger from the client side is when the framed transport is flushed.

        conn.getAPI().batch_mutate(convertedBatch, cLevel);
        // Flush connection
        conn.flush();
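
For context, a framed transport buffers writes and, on flush, sends a 4-byte big-endian length followed by the buffered bytes. The sketch below illustrates that wire format with plain java.io streams. The class FramedFlushSketch is hypothetical and is not Thrift's TFramedTransport or anything in Pelops. In this sketch a flush with nothing buffered still emits a zero-length frame; whether the real transport does the same is an assumption that has not been verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration of length-prefixed framing; not Thrift code.
public class FramedFlushSketch {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final DataOutputStream out;

    public FramedFlushSketch(OutputStream sink) {
        this.out = new DataOutputStream(sink);
    }

    // Buffer bytes until the next flush.
    public void write(byte[] data) {
        pending.write(data, 0, data.length);
    }

    // Emit one frame: 4-byte big-endian length, then the buffered payload.
    public void flush() throws IOException {
        out.writeInt(pending.size());
        pending.writeTo(out);
        pending.reset();
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        FramedFlushSketch transport = new FramedFlushSketch(wire);
        transport.write(new byte[] {1, 2, 3});
        transport.flush(); // 4 length bytes + 3 payload bytes
        transport.flush(); // nothing buffered: this sketch writes a length of 0
        System.out.println(wire.size()); // prints 11
    }
}
```

If the real transport behaves like this sketch, an extra flush after a completed call would put a second, empty frame on the wire, which is one thing worth checking with a packet capture.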

2) In CustomTThreadPoolServer.java in Cassandra I modified the code to log TTransportExceptions.

        catch (TTransportException ttx) {
            LOGGER.error("Transport exception", ttx);
        } catch (TException tx) {
            LOGGER.error("Thrift error occurred during processing of message.", tx);
        } catch (Exception x) {
            LOGGER.error("Error occurred during processing of message.", x);
        }


3) Here is the exception that is ignored in Cassandra. Flushing the transport causes the server to believe the client has closed the connection.

org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
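
The "Tried to read 4 bytes, but only got 0 bytes" message comes from a read-exactly-N loop that gives up as soon as one underlying read returns no data. The sketch below is a minimal stand-in for that loop using plain java.io, not Thrift's actual source; the class ReadAllSketch and its use of IOException are assumptions made for illustration. An empty stream plays the role of a socket whose peer has nothing more to send.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of a read-exactly-len loop; not Thrift code.
public class ReadAllSketch {

    // Reads exactly len bytes into buf, or throws if a read yields nothing.
    public static int readAll(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int got = 0;
        while (got < len) {
            int ret = in.read(buf, off + got, len - got);
            if (ret <= 0) {
                // Fires on end-of-stream (-1) and also on a read that returns 0.
                throw new IOException("Cannot read. Remote side has closed. "
                        + "Tried to read " + len + " bytes, but only got "
                        + got + " bytes.");
            }
            got += ret;
        }
        return got;
    }

    public static void main(String[] args) {
        // An empty stream stands in for a connection with no bytes available.
        InputStream noData = new ByteArrayInputStream(new byte[0]);
        try {
            readAll(noData, new byte[4], 0, 4);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that in this sketch the check fires on any read that returns no bytes, so the message alone does not distinguish a closed socket from a read that simply produced nothing.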

4) The next batch_mutate on this connection caused the following exception in the client:

WARN [main] 2010-08-31 18:40:06,749 Operand.java (line 72) Operation failed as result of network exception. Connection must be destroyed. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
    at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
    at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:38)
    at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:53)
    at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:49)
    at com.aol.data.c7.App.doWork(App.java:41)
    at com.aol.data.c7.App.main(App.java:77)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 15 more




On 8/31/10 4:04 PM, Jonathan Ellis wrote:

No, I don't know that anyone has reproduced that.  TTransportException
always means "something went wrong on the thrift side" in my
experience, it shouldn't be cassandra-version specific.

On Tue, Aug 31, 2010 at 12:53 PM, Carl Bruecken
<carl.bruec...@corp.aol.com> wrote:
>
>  Are there any estimates as to when a fix for this will be checked into
> trunk?
>
> Coincidentally, has anyone tracked down the issue?
>
>  I'm experiencing same issue with nightly build from a week ago.
>
> Thank You
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

