Is there a reason you are using the trunk and not one of the tagged releases? Official releases are a lot more stable than the trunk.

> 1) thrift timeouts & general degraded response times

For reads or writes? What sort of queries are you running?

Check the local latency on each node using cfstats and cfhistograms, and a bit of iostat http://spyced.blogspot.com/2010/01/linux-performance-basics.html What does nodetool tpstats say, is there a stage backing up?

If the local latency is OK, look at the cross-DC situation. What CL are you using? Are nodes timing out waiting for nodes in other DC's?
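
For the local latency checks, something like the following per node is a quick way to see it (a sketch only; the host, keyspace and column family names are placeholders for your own):

    nodetool -h <node> tpstats                        # any stages with pending / blocked tasks?
    nodetool -h <node> cfstats                        # per-CF local read and write latency
    nodetool -h <node> cfhistograms <keyspace> <cf>   # latency and row size distributions for one CF
    iostat -x 5                                       # disk utilisation / await, as per the blog post above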

> 2) *lots* of exception errors, such as:

Repair is trying to run on a response which is a digest response; this should not be happening. Can you provide some more info on the type of query you are running?

> 3) ring imbalances during a repair (refer to the above nodetool ring output)

You may be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-2280 I think it's a mistake that it is marked as resolved.

> 4) regular failure detection when any node does something only moderately
> stressful, such as a repair or is under light load etc., but the node itself
> thinks it is fine.

What version are you using?

I'd take a look at the exceptions first, then move on to the performance issues.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12 Aug 2011, at 03:16, Anton Winter wrote:

> Hi,
>
> I have recently been migrating to a small 12 node Cassandra cluster spanning across 4 DC's and have been encountering various issues with what I suspect to be a performance tuning issue with my data set. I've learnt a few lessons along the way, but I'm at a bit of a roadblock now where I have been experiencing frequent OutOfMemory exceptions, various other exceptions, poor performance, and my ring appears to become imbalanced during repairs. I've tried various different configurations but haven't been able to get to the bottom of my performance issues. I'm assuming this has something to do with my data and some performance tuning metric that I'm merely overlooking.
>
> My ring was created as documented in the wiki & various other performance tuning guides, calculating the tokens at each DC and incrementing when in conflict. It is as follows:
>
> Address   DC   Rack  Status  State   Load      Owns    Token
>                                                        113427455640312821154458202477256070487
> dc1host1  dc1  1a    Up      Normal  88.62 GB  33.33%  0
> dc2host1  dc2  1     Up      Normal  14.76 GB  0.00%   1
> dc3host1  dc3  1     Up      Normal  15.99 GB  0.00%   2
> dc4host1  cd4  1     Up      Normal  14.52 GB  0.00%   3
> dc1host2  dc1  1a    Up      Normal  18.02 GB  33.33%  56713727820156410577229101238628035242
> dc2host2  dc2  1     Up      Normal  16.5 GB   0.00%   56713727820156410577229101238628035243
> dc3host2  dc3  1     Up      Normal  16.37 GB  0.00%   56713727820156410577229101238628035244
> dc4host2  dc4  1     Up      Normal  13.34 GB  0.00%   56713727820156410577229101238628035245
> dc1host3  dc1  1a    Up      Normal  16.59 GB  33.33%  113427455640312821154458202477256070484
> dc2host3  dc2  1     Up      Normal  15.22 GB  0.00%   113427455640312821154458202477256070485
> dc3host3  dc3  1     Up      Normal  15.59 GB  0.00%   113427455640312821154458202477256070486
> dc4host3  dc4  1     Up      Normal  8.84 GB   0.00%   113427455640312821154458202477256070487
>
> The above ring was freshly created and fairly evenly distributed in load prior to a repair (which is still running at the time of the above command) on dc1host1, with the exception, however, of dc4host3, where a previous bulk data load timed out. dc4host3 was responding poorly, was failing according to other nodes, and judging from its heap usage was rather close to OOM'ing before it was restarted.
>
> I'm also using NTS with RF2.
>
> The primary issues I'm experiencing are:
>
> Light load against nodes in dc1 was causing OutOfMemory exceptions across all Cassandra servers outside of dc1, which were all idle, and eventually, after several hours, on one of the dc1 nodes as well. This issue was produced using svn trunk r1153002 and an in-house written Snitch which effectively combined PropertyFileSnitch with some components of Ec2Snitch. While trying to resolve these issues I have moved to an r1156490 snapshot, switched across to just the PropertyFileSnitch, and am simply utilising the broadcast_address configuration option available in trunk, which seems to work quite well.
>
> Since moving to r1156490 we have stopped getting OOM's, but that may actually be because we have been unable to send traffic to the cluster to be able to produce one.
>
> The most current issues I have been experiencing are the following:
>
> 1) thrift timeouts & general degraded response times
> 2) *lots* of exception errors, such as:
>
> ERROR [ReadRepairStage:1076] 2011-08-11 13:33:41,266 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:1076,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> 3) ring imbalances during a repair (refer to the above nodetool ring output)
> 4) regular failure detection when any node does something only moderately stressful, such as a repair, or is under light load etc., but the node itself thinks it is fine.
>
> My hosts are all 32GB with either 4 or 16 cores. I've set heaps appropriately to half physical memory (16G) and, for the purpose of cluster simplicity, set all young gen sizes to 400Mb. JNA is in use, commitlogs and data have been split onto different filesystems, and so on.
>
> My data set as described by a dev is essentially as follows:
>
> 3 column families (tables):
>
> cf1. The RowKey is the user id. This is the primary column family queried on and always just looked up by RowKey. It has 1 supercolumn called "seg". The column names in this supercolumn are the segment_id's that the user belongs to and the value is just "1". This should have about 150mm rows. Each row will have an average of 2-3 columns in the "seg" supercolumn. The column values have TTL's set on them.
>
> cf2. This is a CounterColumnFamily. There's only a single "cnt" column which stores a counter of the number of cf1 rows having that segment. This was only updated during the import and is not read at all.
>
> cf3. This is a lookup between its RowKey, which is an external ID, and the RowKey to be used to find the user in the cf1 CF.
>
> Does anyone have any ideas or suggestions about where I should be focusing to get to the bottom of these issues, or any recommendations on where to direct my efforts?
>
> Thanks,
> Anton
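
One note on the heap settings described above: assuming they are applied through the stock cassandra-env.sh rather than a custom startup script, the sizes quoted would correspond to roughly the following. This is just a restatement of the configuration as described, not a recommendation:

    # conf/cassandra-env.sh -- values as described in the mail above
    MAX_HEAP_SIZE="16G"     # total JVM heap (half of the 32GB of RAM)
    HEAP_NEWSIZE="400M"     # young generation size

If the sizes are set somewhere else, the same two knobs map onto -Xms/-Xmx and -Xmn on the JVM command line.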