We recently upgraded from C* 1.0.2 to 1.2.2 and have started seeing errors like the one below. Our app collects changes and then flushes them out to C* in a batch. At high volume, a flush sometimes fails with a broken pipe.
The log shows this error repeated for each host in the ring (eight in total), all within the same second:

[03/19/13 10:33:37.286 ERROR] Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<someHost.mycompany.com:9160-93> (HThriftClient.java:124) in thread "MessageStorer-thread"
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
    at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:122)
    at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:38)
    at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:324)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:272)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
    at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
    at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
    at com.mycompany.some.package.DataWriter.persistFixMessages(DataWriter.java:140)
    at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
    ... 12 more
[03/19/13 10:33:37.289 ERROR] MARK HOST AS DOWN TRIGGERED for host someHost.mycompany.com(so.me.ip.add):9160 (HConnectionManager.java:422) in thread "MessageStorer-thread"
[03/19/13 10:33:37.289 ERROR] Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 (HConnectionManager.java:426) in thread "MessageStorer-thread"
[03/19/13 10:33:37.289 INFO ] Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:162) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:170) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Host detected as down was added to retry queue: someHost.mycompany.com(so.me.ip.add):9160 (CassandraHostRetryService.java:68) in thread "MessageStorer-thread"
[03/19/13 10:33:37.302 INFO ] Client CassandraClient<someHost.mycompany.com:9160-93> released to inactive or dead pool. Closing. (HConnectionManager.java:408) in thread "MessageStorer-thread"

The application then abandons the batch, because it cannot write the changes (the client pool has shut down). On average, this means abandoning 20k mutations, about 14 MB of data.

[03/19/13 10:33:37.302 ERROR] DataWriter write failure -- count:21413 byteSize:14155488 (DataWriter.java:286) in thread "MessageStorer-thread"
me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
    at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
    at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
    at com.mycompany.some.package.DataWriter.persistMessages(DataWriter.java:140)
    at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_batch_mutate(Cassandra.java:958)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:949)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
    ... 7 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
    ... 15 more

Immediately after shutting down, the pool restarts, so the application continues writing data, but the abandoned batch has been lost.

We have reduced the maximum size of each batch from 14.4 MB to 13.5 MB, but we are still seeing the errors. Should we reduce the batch size further?

Our application is using the following JARs:

libthrift-0.7.0.jar
hector-core-1.1-2.jar
cassandra-thrift-1.2.1.jar
cassandra-javautils-0.7.1.jar
cassandra-all-1.2.0.jar

What is causing these errors, and how can we eliminate them?

Best regards
Radu Manolescu
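
If smaller batches turn out to be the answer, one option would be to split each flush by a byte and count budget instead of sending everything in a single batch_mutate call. Below is a minimal sketch in plain Java. BatchSplitter, PendingWrite and the limits passed to split() are hypothetical names and values for illustration only; they are not taken from our DataWriter or from Hector, and the sketch does not claim that batch size is the actual cause of the broken pipe.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {

    /** Hypothetical stand-in for one pending mutation and its serialized size. */
    public static final class PendingWrite {
        final String rowKey;
        final long byteSize;

        public PendingWrite(String rowKey, long byteSize) {
            this.rowKey = rowKey;
            this.byteSize = byteSize;
        }
    }

    /**
     * Groups writes, in their original order, into chunks whose total size
     * stays within maxBytes and whose length stays within maxCount.
     * A single write larger than maxBytes ends up in a chunk of its own.
     */
    public static List<List<PendingWrite>> split(List<PendingWrite> writes,
                                                 long maxBytes, int maxCount) {
        if (maxBytes <= 0 || maxCount <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<PendingWrite>> chunks = new ArrayList<List<PendingWrite>>();
        List<PendingWrite> current = new ArrayList<PendingWrite>();
        long currentBytes = 0;

        for (PendingWrite w : writes) {
            // Close the current chunk if adding this write would exceed a limit.
            boolean full = !current.isEmpty()
                    && (currentBytes + w.byteSize > maxBytes
                        || current.size() >= maxCount);
            if (full) {
                chunks.add(current);
                current = new ArrayList<PendingWrite>();
                currentBytes = 0;
            }
            current.add(w);
            currentBytes += w.byteSize;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Each chunk would then go through its own Mutator and execute() call, so a transport failure would cost one chunk rather than the whole 20k-mutation flush.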