We recently upgraded from C* 1.0.2 to 1.2.2, and we have started seeing 
errors such as the one below.
Our app collects changes and then flushes them out to C* in a batch.
Sometimes (at high volume) the flush fails.

The log shows the following error repeated for each host in the ring (eight 
in total), all within the same second:

[03/19/13 10:33:37.286 ERROR] Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<someHost.mycompany.com:9160-93> (HThriftClient.java:124) in thread "MessageStorer-thread"
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
        at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
        at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:122)
        at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:38)
        at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:324)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:272)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
        at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
        at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
        at com.mycompany.some.package.DataWriter.persistFixMessages(DataWriter.java:140)
        at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
        ... 12 more
[03/19/13 10:33:37.289 ERROR] MARK HOST AS DOWN TRIGGERED for host someHost.mycompany.com(so.me.ip.add):9160 (HConnectionManager.java:422) in thread "MessageStorer-thread"
[03/19/13 10:33:37.289 ERROR] Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 (HConnectionManager.java:426) in thread "MessageStorer-thread"
[03/19/13 10:33:37.289 INFO ] Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:162) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:170) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Host detected as down was added to retry queue: someHost.mycompany.com(so.me.ip.add):9160 (CassandraHostRetryService.java:68) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Client CassandraClient<someHost.mycompany.com:9160-93> released to inactive or dead pool. Closing. (HConnectionManager.java:408) in thread "MessageStorer-thread"

The application then abandons the batch, because it cannot write the 
changes (the client pool has shut down).
On average, each abandoned batch contains about 20k mutations, totalling about 
14 MB of data.

[03/19/13 10:33:37.302 ERROR] DataWriter write failure -- count:21413 byteSize:14155488 (DataWriter.java:286) in thread "MessageStorer-thread"
me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
        at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
        at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
        at com.mycompany.some.package.DataWriter.persistMessages(DataWriter.java:140)
        at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
        at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
        at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
        at org.apache.cassandra.thrift.Cassandra$Client.send_batch_mutate(Cassandra.java:958)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:949)
        at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
        at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
        ... 7 more
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
        ... 15 more

Immediately after shutting down, the pool restarts, so the application 
continues writing data, but the abandoned batch has been lost.
We have reduced the maximum size of each batch from 14.4 MB to 13.5 MB, but we 
are still seeing the errors.
Should we reduce the batch size further?
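
For reference, capping the batch size amounts to splitting the pending changes into chunks that each stay under a byte budget, and flushing each chunk separately. The sketch below is illustrative only: BatchSplitter, splitBySize and the size function are hypothetical names, not part of our DataWriter or of Hector, and the right budget value is exactly what we are unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper: greedily groups items into chunks whose summed size
// stays at or below maxBytes. An item larger than maxBytes gets its own chunk.
final class BatchSplitter {
    private BatchSplitter() {}

    static <T> List<List<T>> splitBySize(List<T> items,
                                         ToLongFunction<T> sizeOf,
                                         long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        for (T item : items) {
            long size = sizeOf.applyAsLong(item);
            // Close the current chunk if adding this item would exceed the budget.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(item);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Each returned chunk would then be written with its own mutator execute call, so a failure loses at most one chunk rather than the whole batch.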

Our application is using the following JARs:
libthrift-0.7.0.jar
hector-core-1.1-2.jar
cassandra-thrift-1.2.1.jar
cassandra-javautils-0.7.1.jar
cassandra-all-1.2.0.jar

What is causing these errors, and how can we eliminate them?

Best regards
Radu Manolescu
