Hi,
I have recently been migrating to a small 12-node Cassandra cluster spanning
4 DCs and have been running into what I suspect is a performance tuning
issue with my data set. I've learnt a few lessons along the way, but I'm at
a bit of a roadblock now: I have been experiencing frequent OutOfMemory
exceptions, various other exceptions, poor performance, and my ring appears
to become imbalanced during repairs. I've tried a number of different
configurations but haven't been able to get to the bottom of it. I'm
assuming it comes down to something about my data and a performance tuning
setting I'm simply overlooking.
My ring was created as documented in the wiki and various performance
tuning guides: tokens were calculated per DC and then incremented by one
where they conflicted (a sketch of the calculation follows the ring output
below). The ring is as follows:
Address   DC   Rack  Status  State   Load      Owns    Token
                                                        113427455640312821154458202477256070487
dc1host1  dc1  1a    Up      Normal  88.62 GB  33.33%  0
dc2host1  dc2  1     Up      Normal  14.76 GB  0.00%   1
dc3host1  dc3  1     Up      Normal  15.99 GB  0.00%   2
dc4host1  cd4  1     Up      Normal  14.52 GB  0.00%   3
dc1host2  dc1  1a    Up      Normal  18.02 GB  33.33%  56713727820156410577229101238628035242
dc2host2  dc2  1     Up      Normal  16.5 GB   0.00%   56713727820156410577229101238628035243
dc3host2  dc3  1     Up      Normal  16.37 GB  0.00%   56713727820156410577229101238628035244
dc4host2  dc4  1     Up      Normal  13.34 GB  0.00%   56713727820156410577229101238628035245
dc1host3  dc1  1a    Up      Normal  16.59 GB  33.33%  113427455640312821154458202477256070484
dc2host3  dc2  1     Up      Normal  15.22 GB  0.00%   113427455640312821154458202477256070485
dc3host3  dc3  1     Up      Normal  15.59 GB  0.00%   113427455640312821154458202477256070486
dc4host3  dc4  1     Up      Normal  8.84 GB   0.00%   113427455640312821154458202477256070487
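For reference, the initial tokens above were generated with essentially the
following calculation: each DC's tokens are evenly spaced around the ring,
then offset by the DC's index so no two nodes share a token. This is just a
sketch (Python), but it reproduces the tokens shown above:

    # Sketch of the initial_token calculation used for the ring above:
    # evenly spaced tokens within each DC, offset by the DC's index to
    # avoid collisions between DCs.
    nodes_per_dc = 3
    dcs = ['dc1', 'dc2', 'dc3', 'dc4']
    for dc_index, dc in enumerate(dcs):
        for i in range(nodes_per_dc):
            token = (2 ** 127 // nodes_per_dc) * i + dc_index
            print('%s node %d: %d' % (dc, i + 1, token))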
The ring above was freshly created and the load was fairly evenly
distributed before a repair was started on dc1host1 (that repair was still
running when the above command was run). The exception is dc4host3, where a
previous bulk data load timed out; that node was responding poorly, was
being marked as failed by the other nodes, and judging by its heap usage was
rather close to OOM'ing before it was restarted.
I'm also using NetworkTopologyStrategy with RF 2.
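For completeness, the keyspace definition is roughly equivalent to the
following pycassa call (shown for illustration only; the keyspace name and
host are placeholders, and RF 2 here is two replicas in each DC):

    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sysmgr = SystemManager('dc1host1:9160')
    sysmgr.create_keyspace(
        'my_keyspace',
        replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
        # two replicas in each of the four DCs
        strategy_options={'dc1': '2', 'dc2': '2', 'dc3': '2', 'dc4': '2'})
    sysmgr.close()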
The primary issues I'm experiencing are:
Light load against the nodes in dc1 was causing OutOfMemory exceptions on
all of the Cassandra servers outside dc1, which were otherwise idle, and
after several hours it eventually happened on one of the dc1 nodes as well.
This was with svn trunk r1153002 and an in-house snitch that effectively
combined PropertyFileSnitch with some components of Ec2Snitch. While trying
to resolve this I have since moved to an r1156490 snapshot, switched to the
plain PropertyFileSnitch, and am simply using the broadcast_address option
available in trunk, which seems to work quite well. Since moving to r1156490
we have stopped getting OOMs, but that may only be because we have not been
able to push enough traffic at the cluster to trigger one.
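For reference, the relevant parts of the current configuration look roughly
like this (the addresses below are placeholders):

    # cassandra.yaml (per node)
    listen_address: 10.0.1.10          # internal address
    broadcast_address: 203.0.113.10    # address advertised to the other DCs
    endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch

    # cassandra-topology.properties (same file on every node, IP=DC:RACK)
    203.0.113.10=dc1:1a
    198.51.100.10=dc2:1
    default=dc1:1a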
The issues I am currently experiencing are the following:
1) thrift timeouts & generally degraded response times
2) *lots* of exceptions in the logs, such as:
ERROR [ReadRepairStage:1076] 2011-08-11 13:33:41,266 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:1076,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
3) ring imbalances during a repair (refer to the above nodetool ring output)
4) nodes regularly being marked as failed by the rest of the cluster
whenever they do something only moderately stressful, such as running a
repair or taking light load, even though the node itself thinks it is fine.
My hosts all have 32 GB of RAM and either 4 or 16 cores. I've set the heap
to half of physical memory (16 GB) and, to keep the cluster configuration
simple, set the young generation to 400 MB on every node. JNA is in use, the
commitlogs and data have been split onto different filesystems, and so on.
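In other words, the heap settings amount to the following in
cassandra-env.sh on each node:

    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="400M"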
My data set, as described by one of our devs, is essentially as follows:
3 column families (tables):
cf1. The row key is the user id. This is the primary column family queried
and it is always looked up by row key. It has one supercolumn called "seg".
The column names within that supercolumn are the segment_ids the user
belongs to, and the value is just "1". This should have about 150 million
rows, with an average of 2-3 columns per row in the "seg" supercolumn. The
column values have TTLs set on them.
cf2. This is a CounterColumnFamily. There is only a single "cnt" column,
which stores a count of the cf1 rows having that segment. This was only
updated during the import and is not read at all.
cf3. This is a lookup from an external ID (the row key) to the row key used
to find the user in cf1.
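To make that concrete, the read/write pattern is roughly the following
(pycassa used purely for illustration; the keyspace, host, TTL value and the
"user_id" column name in cf3 are placeholders):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('my_keyspace', ['dc1host1:9160'])

    # cf1: super CF keyed by user id, one "seg" supercolumn whose column
    # names are the segment_ids the user belongs to (values are just "1",
    # written with a TTL).
    cf1 = ColumnFamily(pool, 'cf1')
    cf1.insert('user123', {'seg': {'segment42': '1'}}, ttl=86400)
    segments = cf1.get('user123', super_column='seg')   # the main read path

    # cf2: counter CF with a single "cnt" column per segment; only written
    # during the import, never read.
    cf2 = ColumnFamily(pool, 'cf2')
    cf2.add('segment42', 'cnt', 1)

    # cf3: maps an external ID (row key) to the cf1 row key for the user.
    cf3 = ColumnFamily(pool, 'cf3')
    cf3.insert('external-abc', {'user_id': 'user123'})
    user_key = cf3.get('external-abc')['user_id']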
Does anyone have any ideas or suggestions about where I should focus my
efforts to get to the bottom of these issues?
Thanks,
Anton