> On average, this involves abandoning 20k mutations, for a total of 14Mb of 
> data.
That's too many mutations to be practical. Each row mutation becomes a single 
task in the mutation thread pool. When you send that many at once you risk 
flooding the mutation thread pool and starving other requests. By default each 
node has 32 write threads (concurrent_writes), so pick a batch size that makes 
sense for the number of nodes, the number of threads, and the number of other 
clients making requests.
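
As a rough illustration of sending smaller batches, here is a minimal sketch 
that splits a list of pending changes into fixed-size chunks, each of which 
would go out as its own batch. BatchSplitter and partition are hypothetical 
names for this example, not part of Hector or Cassandra, and the batch size 
you pass in is something you would have to tune for your cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits pending changes into fixed-size batches. */
public final class BatchSplitter {

    private BatchSplitter() {}

    /**
     * Returns consecutive sub-lists of at most batchSize elements each.
     * The caller would send each sub-list as a separate batch.
     */
    public static <T> List<List<T>> partition(List<T> pending, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < pending.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pending.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(pending.subList(start, end)));
        }
        return batches;
    }
}
```

With the 21413 mutations from the failed write and an assumed batch size of 
100, this would produce 215 batches instead of one.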

I also think you are running into the max message size for a Thrift frame. 
Have a look at thrift_framed_transport_size_in_mb and 
thrift_max_message_length_in_mb in the yaml file. 
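
To keep each request under the frame limit, one option is to cap batches by 
bytes as well as by count. The sketch below groups already-serialized changes 
so that each batch stays under a byte budget you choose. FrameBudget and 
groupByBytes are hypothetical names, and the budget is an assumption you 
would set comfortably below the configured frame size, because the sketch 
counts payload bytes only and not Thrift encoding overhead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups payloads so each batch fits a byte budget. */
public final class FrameBudget {

    private FrameBudget() {}

    /**
     * Packs payloads in order into batches whose total payload size does not
     * exceed maxBytesPerBatch. A single payload larger than the budget is
     * placed in a batch of its own rather than dropped.
     */
    public static List<List<byte[]>> groupByBytes(List<byte[]> payloads,
                                                  long maxBytesPerBatch) {
        if (maxBytesPerBatch <= 0) {
            throw new IllegalArgumentException("maxBytesPerBatch must be positive");
        }
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] payload : payloads) {
            // Close the current batch if this payload would push it over budget.
            if (!current.isEmpty() && currentBytes + payload.length > maxBytesPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += payload.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```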

> Should we reduce the size of the batch?
Yes, yup, sure thing. 
More is not always better. 

> What is causing these errors, and how can we eliminate them?
I would start by using a much smaller batch size. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 6:49 AM, radu.manole...@barclays.com wrote:

> We have recently upgraded to C* 1.2.2 from 1.0.2, and we have started seeing 
> errors such as the one below.
> Our app collects changes and then flushes them out to C* in a batch.
> Sometimes (at high volume) we see the following error:
>  
> The log shows this error repeated for each host in the ring (total: eight) 
> all within the same second:
>  
> [03/19/13 10:33:37.286 ERROR] Could not flush transport (to be expected if 
> the pool is shutting down) in close for client: 
> CassandraClient<someHost.mycompany.com:9160-93> (HThriftClient.java:124) in 
> thread "MessageStorer-thread"
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Broken pipe
>         at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>         at 
> org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
>         at 
> me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:122)
>         at 
> me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:38)
>         at 
> me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:324)
>         at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:272)
>         at 
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
>         at 
> me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>         at 
> com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
>         at 
> com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
>        at 
> com.mycompany.some.package.DataWriter.persistFixMessages(DataWriter.java:140)
>         at 
> com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
>         ... 12 more
> [03/19/13 10:33:37.289 ERROR] MARK HOST AS DOWN TRIGGERED for host 
> someHost.mycompany.com(so.me.ip.add):9160 (HConnectionManager.java:422) in 
> thread "MessageStorer-thread"
> [03/19/13 10:33:37.289 ERROR] Pool state on shutdown: 
> <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160};
>  IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 
> (HConnectionManager.java:426) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.289 INFO ] Shutdown triggered on 
> <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160}
>  (ConcurrentHClientPool.java:162) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Shutdown complete on 
> <ConcurrentCassandraClientPoolByHost>:{someHost.mycompany.com(so.me.ip.add):9160}
>  (ConcurrentHClientPool.java:170) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Host detected as down was added to retry queue: 
> someHost.mycompany.com(so.me.ip.add):9160 (CassandraHostRetryService.java:68) 
> in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Client 
> CassandraClient<someHost.mycompany.com:9160-93> released to inactive or dead 
> pool. Closing. (HConnectionManager.java:408) in thread "MessageStorer-thread"
>  
> Then the application abandons writing the batch, because it cannot write the 
> changes (the client pool has shut down).
> On average, this involves abandoning 20k mutations, for a total of 14Mb of 
> data.
>  
> [03/19/13 10:33:37.302 ERROR] DataWriter write failure -- count:21413 
> byteSize:14155488 (DataWriter.java:286) in thread "MessageStorer-thread"
> me.prettyprint.hector.api.exceptions.HectorTransportException: 
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Broken pipe
>         at 
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
>         at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
>         at 
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
>         at 
> me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>         at 
> com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
>         at 
> com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
>         at 
> com.mycompany.some.package.DataWriter.persistMessages(DataWriter.java:140)
>         at 
> com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Broken pipe
>         at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>         at 
> org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
>         at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
>         at 
> org.apache.cassandra.thrift.Cassandra$Client.send_batch_mutate(Cassandra.java:958)
>         at 
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:949)
>         at 
> me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
>         at 
> me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
>         at 
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
>         at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
>         ... 7 more
> Caused by: java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
>         ... 15 more
>  
> Immediately after shutting down, the pool restarts, so the application 
> continues writing data, but some data has been lost.
> We have reduced the max size of each batch from 14.4Mb to 13.5Mb, but we are 
> still seeing the errors.
> Should we reduce the size of the batch?
>  
> Our application is using the following JARs:
> libthrift-0.7.0.jar
> hector-core-1.1-2.jar
> cassandra-thrift-1.2.1.jar
> cassandra-javautils-0.7.1.jar
> cassandra-all-1.2.0.jar
>  
> What is causing these errors, and how can we eliminate them?
>  
> Best regards
> Radu Manolescu
