Cool, so it's a server side timeout, because:
- in the client side stack the thrift code is raising the error
- the server side log has this: DEBUG 22:29:10,318 ... timed out

The TimedOutException is raised when the number of replicas required by your CL (consistency level) has not responded within the time span specified by rpc_timeout in conf/cassandra.yaml. In general this means your cluster cannot keep up, or there is some sort of problem.

There can be a number of reasons why things may be going slow. Look into:

- the logs on the other machines, to see if they have messages like "Dropped {} {} messages in the last {}ms". This means the message was delivered but not processed in time.
- IO performance: http://spyced.blogspot.com/2010/01/linux-performance-basics.html
- the Cassandra thread pools, to see if things are backing up: nodetool tpstats

Hope that helps.
Aaron

On 9/03/2011, at 11:33 AM, A J wrote:

> Client side (it is just a 5th instance in the same EC2 zone, having
> stress.py installed on it) gives the following error:
>
> Process Inserter-4:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-1:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-3:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-8:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-2:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-6:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-5:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-7:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-9:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
> Process Inserter-10:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "stress.py", line 238, in run
>     self.cclient.batch_mutate(cfmap, consistency)
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate
>     self.recv_batch_mutate()
>   File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate
>     raise result.te
> TimedOutException: TimedOutException()
>
> The related server side errors look like:
> DEBUG 22:29:04,407 Deleting CommitLog-1299623301883.log.header
> DEBUG 22:29:04,412 Deleting CommitLog-1299623301883.log
> DEBUG 22:29:04,443 Deleting CommitLog-1299623318627.log.header
> DEBUG 22:29:04,443 Deleting CommitLog-1299623318627.log
> DEBUG 22:29:09,202 ... timed out
> DEBUG 22:29:09,426 ... timed out
> DEBUG 22:29:10,318 ... timed out
> DEBUG 22:29:11,354 logged out: #<User allow_all groups=[]>
> DEBUG 22:29:11,354 logged out: #<User allow_all groups=[]>
> DEBUG 22:29:11,354 logged out: #<User allow_all groups=[]>
> DEBUG 22:29:12,442 Processing response on a callback from 784@/10.253.203.224
> DEBUG 22:29:12,443 Processing response on a callback from 786@/10.253.203.224
> DEBUG 22:29:12,443 Processing response on a callback from 791@/10.253.203.224
>
> On Tue, Mar 8, 2011 at 3:22 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> Is this a client side timeout or a server side one? What does the error
>> stack look like?
>> Also check the server side logs for errors. The thrift API will raise a
>> timeout when fewer nodes than the CL requires return within rpc_timeout.
>> Good luck
>> Aaron
>> On 9/03/2011, at 7:37 AM, ruslan usifov wrote:
>>
>> 2011/3/8 A J <s5a...@gmail.com>
>>>
>>> Trying out stress.py on AWS EC2 environment (4 Large instances. Each
>>> of 2-cores and 7.5GB RAM. All in the same region/zone.)
>>>
>>> python stress.py -o insert -d 10.253.203.224,10.220.203.48,10.220.17.84,10.124.89.81 -l 2 -e ALL -t 10 -n 500 -S 1000000 -k
>>>
>>> (I want to try with a column size of about 1MB. I am assuming the above
>>> gives me 10 parallel threads, each executing 50 inserts sequentially
>>> (500/10).)
>>>
>>> Getting several timeout errors: TimedOutException(). With just 10
>>> concurrent writes spread across 4 nodes, I am kind of surprised to get so
>>> many timeouts. Any suggestions?
>>>
>>
>> It may be EC2 disk speed degradation (the IO speed of EC2 instances isn't
>> constant and can vary widely).
>>
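
To make the timeout rule from the top of this reply concrete, here is a minimal sketch in Python. It is only a toy model, not Cassandra's actual code: the function names, the response times, and the 10000 ms timeout are all assumptions made up for illustration. It assumes the `-l 2 -e ALL` flags in the stress.py command mean a replication factor of 2 with CL ALL.

```python
# Toy model of the rule "a request times out when fewer replicas than the
# CL requires respond within rpc_timeout". Not Cassandra's implementation.


def required_replicas(consistency_level, replication_factor):
    """Replica responses a request needs before it can succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %r" % consistency_level)


def times_out(response_times_ms, consistency_level, replication_factor,
              rpc_timeout_ms):
    """True if fewer than the required replicas answered within rpc_timeout_ms.

    response_times_ms has one entry per replica; None means no response.
    """
    needed = required_replicas(consistency_level, replication_factor)
    answered = sum(
        1 for t in response_times_ms
        if t is not None and t <= rpc_timeout_ms
    )
    return answered < needed


# Hypothetical numbers: RF 2, one replica answers in 120 ms, one never does.
RPC_TIMEOUT_MS = 10000  # assumed value; check conf/cassandra.yaml for yours

print(times_out([120, None], "ALL", 2, RPC_TIMEOUT_MS))  # True
print(times_out([120, None], "ONE", 2, RPC_TIMEOUT_MS))  # False
```

Under CL ALL every replica has to answer in time, so in this model a single slow or overloaded node is enough to fail the write, while the same situation at CL ONE would succeed.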