Is there a reason you are using the trunk and not one of the tagged releases? 
Official releases are a lot more stable than the trunk. 

> 1) thrift timeouts & general degraded response times
For reads or writes? What sort of queries are you running? Check the local 
latency on each node using cfstats and cfhistograms, and a bit of iostat 
(see http://spyced.blogspot.com/2010/01/linux-performance-basics.html). What does 
nodetool tpstats say, is there a stage backing up?

If the local latency is OK, look at the cross-DC situation. What CL are you 
using? Are nodes timing out waiting for nodes in other DCs? 
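
If it's easier, here's a rough Python helper to pull those numbers from each 
node in one go; the host, keyspace and CF names are placeholders and it assumes 
nodetool is on the PATH:

import subprocess

HOSTS = ["dc1host1", "dc2host1"]        # placeholder node names
KEYSPACE, CF = "ks1", "cf1"             # placeholder keyspace / column family

def run(cmd):
    # Run a command and return its output (or the error text if it fails).
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        return e.output

for host in HOSTS:
    print("=== %s ===" % host)
    print(run(["nodetool", "-h", host, "tpstats"]))    # any stage backing up?
    print(run(["nodetool", "-h", host, "cfstats"]))    # local read/write latency
    print(run(["nodetool", "-h", host, "cfhistograms", KEYSPACE, CF]))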

> 2) *lots* of exception errors, such as:
Read repair is trying to resolve a response that is a digest response, which 
should not be happening. Can you provide some more info on the type of query you 
are running? 

> 3) ring imbalances during a repair (refer to the above nodetool ring output)
You may be seeing this
https://issues.apache.org/jira/browse/CASSANDRA-2280
I think it's a mistake that it is marked as resolved. 

> 4) regular failure detection when any node does something only moderately 
> stressful, such as running a repair or handling light load, but the node itself 
> thinks it is fine.
What version are you using? 

I'd take a look at the exceptions first, then move on to the performance issues. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12 Aug 2011, at 03:16, Anton Winter wrote:

> Hi,
> 
> I have recently been migrating to a small 12 node Cassandra cluster spanning 
> 4 DCs and have been encountering various issues with what I suspect to be a 
> performance tuning problem with my data set.  I've learnt a few lessons along 
> the way, but I'm now at a bit of a roadblock: I have been experiencing frequent 
> OutOfMemory exceptions, various other exceptions, poor performance, and my ring 
> appears to become imbalanced during repairs.  I've tried various configurations 
> but haven't been able to get to the bottom of my performance issues.  I'm 
> assuming this has something to do with my data and some performance tuning 
> parameter that I'm merely overlooking.
> 
> My ring was created as documented in the wiki & various other performance 
> tuning guides, calculating the tokens within each DC and incrementing them by 
> one when in conflict.  It is as follows:
> 
> Address    DC    Rack  Status  State   Load       Owns     Token
>                                                             113427455640312821154458202477256070487
> dc1host1   dc1   1a    Up      Normal  88.62 GB   33.33%   0
> dc2host1   dc2   1     Up      Normal  14.76 GB   0.00%    1
> dc3host1   dc3   1     Up      Normal  15.99 GB   0.00%    2
> dc4host1   cd4   1     Up      Normal  14.52 GB   0.00%    3
> dc1host2   dc1   1a    Up      Normal  18.02 GB   33.33%   56713727820156410577229101238628035242
> dc2host2   dc2   1     Up      Normal  16.5 GB    0.00%    56713727820156410577229101238628035243
> dc3host2   dc3   1     Up      Normal  16.37 GB   0.00%    56713727820156410577229101238628035244
> dc4host2   dc4   1     Up      Normal  13.34 GB   0.00%    56713727820156410577229101238628035245
> dc1host3   dc1   1a    Up      Normal  16.59 GB   33.33%   113427455640312821154458202477256070484
> dc2host3   dc2   1     Up      Normal  15.22 GB   0.00%    113427455640312821154458202477256070485
> dc3host3   dc3   1     Up      Normal  15.59 GB   0.00%    113427455640312821154458202477256070486
> dc4host3   dc4   1     Up      Normal  8.84 GB    0.00%    113427455640312821154458202477256070487
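> 
> For what it's worth, the tokens above work out from the usual per-DC spacing 
> (i * 2**127 / nodes_per_dc) plus a small per-DC offset to avoid conflicts; a 
> minimal Python sketch that reproduces them:
> 
> nodes_per_dc = 3
> dcs = ["dc1", "dc2", "dc3", "dc4"]
> 
> for i in range(nodes_per_dc):
>     for offset, dc in enumerate(dcs):
>         # evenly spaced tokens within each DC, +offset per DC to avoid conflicts
>         token = i * (2 ** 127 // nodes_per_dc) + offset
>         print("%s node %d: %d" % (dc, i + 1, token))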
> 
> The above ring was freshly created and the load was fairly evenly distributed 
> prior to the repair on dc1host1 (which was still running at the time of the 
> above command), with the exception of dc4host3, where a previous bulk data load 
> had timed out.  dc4host3 was responding poorly, was being marked as failed by 
> other nodes, and judging from its heap usage was rather close to OOMing before 
> it was restarted.
> 
> I'm also using NTS (NetworkTopologyStrategy) with RF 2.
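> 
> (For reference, roughly how such a keyspace would be defined via pycassa's 
> SystemManager; the keyspace name is a placeholder and this sketch assumes 
> "RF 2" means 2 replicas in each DC:)
> 
> from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY
> 
> sys_mgr = SystemManager("dc1host1:9160")
> sys_mgr.create_keyspace("ks1",
>                         replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
>                         strategy_options={"dc1": "2", "dc2": "2",
>                                           "dc3": "2", "dc4": "2"})
> sys_mgr.close()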
> 
> The primary issues I'm experiencing are:
> 
> Light load against nodes in dc1 was causing OutOfMemory exceptions across all 
> Cassandra servers outside of dc1, which were all idle, and eventually, after 
> several hours, on one of the dc1 nodes as well.  This issue was produced using 
> svn trunk r1153002 and an in-house snitch which effectively combined 
> PropertyFileSnitch with some components of Ec2Snitch.  While trying to resolve 
> these issues I have moved to an r1156490 snapshot, switched across to just the 
> PropertyFileSnitch, and am simply utilising the broadcast_address configuration 
> option available in trunk, which seems to work quite well.
> 
> Since moving to r1156490 we have stopped getting OOMs, but that may actually be 
> because we have been unable to send enough traffic to the cluster to produce 
> one.
> 
> The most current issues I have been experiencing are the following:
> 
> 1) thrift timeouts & general degraded response times
> 2) *lots* of exception errors, such as:
> 
> ERROR [ReadRepairStage:1076] 2011-08-11 13:33:41,266 AbstractCassandraDaemon.java 
> (line 133) Fatal exception in thread Thread[ReadRepairStage:1076,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
>        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 3) ring imbalances during a repair (refer to the above nodetool ring output)
> 4) regular failure detection when any node does something only moderately 
> stressful, such as running a repair or handling light load, but the node itself 
> thinks it is fine.
> 
> My hosts all have 32GB of RAM and either 4 or 16 cores.  I've set heaps to half 
> of physical memory (16GB) and, for the sake of cluster simplicity, set all young 
> gen sizes to 400MB.  JNA is in use, commitlogs and data have been split onto 
> different filesystems, and so on.
> 
> My data set as described by a dev is essentially as follows:
> 
> 3 column families (tables):
> 
> cf1.  The RowKey is the user id.  This is the primary column family queried 
> and it is always looked up by RowKey.  It has one supercolumn called "seg".  
> The column names in this supercolumn are the segment_ids that the user belongs 
> to and the value is just "1".  This should have about 150 million rows.  Each 
> row will have an average of 2-3 columns in the "seg" supercolumn.  The column 
> values have TTLs set on them.
> 
> cf2.  This is a CounterColumnFamily.  There's only a single "cnt" column, which 
> stores a count of the number of cf1 rows having that segment.  This was only 
> updated during the import and is not read at all.
> 
> cf3.  This is a lookup from the RowKey, which is an external ID, to the RowKey 
> used to find the user in the cf1 CF.
> 
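> To make the access pattern concrete, here is a rough pycassa sketch of how 
> these CFs are read and written (keyspace, host, column and key names are 
> placeholders):
> 
> import pycassa
> 
> pool = pycassa.ConnectionPool("ks1", server_list=["dc1host1:9160"])
> cf1 = pycassa.ColumnFamily(pool, "cf1")   # super CF: RowKey = user id
> cf2 = pycassa.ColumnFamily(pool, "cf2")   # counter CF: single "cnt" column
> cf3 = pycassa.ColumnFamily(pool, "cf3")   # external ID -> cf1 RowKey
> 
> # Write path (import): record a user's segments with a TTL, bump the counter.
> cf1.insert("user123", {"seg": {"segment42": "1"}}, ttl=86400)
> cf2.add("segment42", "cnt")               # counter increment
> cf3.insert("external-abc", {"userkey": "user123"})
> 
> # Read path: external ID -> user RowKey -> the user's segments.
> user_key = cf3.get("external-abc")["userkey"]
> segments = cf1.get(user_key, super_column="seg")   # e.g. {"segment42": "1"}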
> 
> Does anyone have any ideas or suggestions about where I should be focusing on 
> to get to the bottom of these issues or any recommendations on where I should 
> be focusing my efforts on?
> 
> Thanks,
> Anton
> 
> 
> 
> 
