It's an inter-node timeout waiting for the read to complete. That normally means the cluster is overloaded in some fashion; check for GC activity and/or overloaded IOPS. Reducing the batch_size should help.
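For a job reading through ColumnFamilyInputFormat, the range batch size and per-job read consistency are set via ConfigHelper. A minimal sketch, assuming the 1.2 Hadoop API; the host, partitioner, keyspace/table names and the numbers are placeholders to show which knobs exist, not recommended values:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;

    public class RangeReadTuning {
        // Sketch only: these calls would normally be applied to the Job's
        // Configuration before submitting a ColumnFamilyInputFormat job.
        public static void tune(Configuration conf) {
            // Placeholder cluster/table coordinates -- substitute your own.
            ConfigHelper.setInputInitialAddress(conf, "cassandra-node1");
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyTable");

            // Fewer rows per get_range_slices call: each request does less
            // work on the coordinator, so it is less likely to time out.
            ConfigHelper.setRangeBatchSize(conf, 1024);

            // Smaller input splits also shorten each mapper's scan.
            ConfigHelper.setInputSplitSize(conf, 16384);

            // Per-job read consistency (ONE vs QUORUM, as discussed).
            ConfigHelper.setReadConsistencyLevel(conf, "ONE");
        }

        public static void main(String[] args) {
            tune(new Configuration());
        }
    }

Dropping the range batch size trades more round trips for shorter individual requests, which is usually the right trade when the coordinator is what's timing out.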
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/06/2013, at 4:10 AM, Brian Jeltema <brian.jelt...@digitalenvoy.net> wrote:

> I'm having problems with Hadoop job failures on a Cassandra 1.2 cluster due to
>
>   Caused by: TimedOutException()
>   2013-06-24 11:29:11,953 INFO Driver - at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
>
> This is running on a 6-node cluster, RF=3. If I run the job with CL=ONE, it usually runs pretty well, with an occasional timeout. But if I run at CL=QUORUM, the number of timeouts is often enough to kill the job. The table being read is effectively read-only while this job runs. It has 5 to 10 million rows, each row has no more than 256 columns, and each column typically holds at most a few hundred bytes of data.
>
> I've fiddled with the batch-range size and with increasing the timeout, without much luck. I see some evidence of GC activity in the Cassandra logs, but it's hard to see a clear correlation with the timeouts.
>
> I could use some suggestions on an approach to pin down the root cause.
>
> TIA
>
> Brian