Is there a reason you are using the trunk and not one of the tagged
releases? Official releases are a lot more stable than the trunk.
Yes. As we are using a combination of EC2 and colo servers, we need to
use broadcast_address from CASSANDRA-2491. The patch associated with
that JIRA does not apply cleanly against 0.8, which is why we are using
trunk.
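For reference, the intent is roughly the following on an EC2 node
(illustrative addresses only, and assuming the broadcast_address option
as it appears in the CASSANDRA-2491 patch; property names may differ in
other builds):

    # cassandra.yaml (sketch, not our exact config)
    listen_address: 10.0.1.12         # private IP the node binds to inside EC2
    broadcast_address: 203.0.113.12   # public IP gossiped to the colo nodes

The idea is that the colo DC reaches the EC2 nodes via the broadcast
(public) address while local traffic stays on the private interface.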
1) thrift timeouts & general degraded response times
For reads or writes? What sort of queries are you running? Check the
local latency on each node using cfstats and cfhistograms, and a bit of
iostat (see
http://spyced.blogspot.com/2010/01/linux-performance-basics.html). What
does nodetool tpstats say? Is there a stage backing up?
If the local latency is OK, look at the cross-DC situation. What CL are
you using? Are nodes timing out waiting for nodes in other DCs?
iostat doesn't show a request queue bottleneck. The timeouts we are
seeing are for reads. The latency on the nodes I have temporarily used
for reads is around 2-45ms. The next token in the ring, at an alternate
DC, is showing ~4ms, with everything else around 0.05ms. tpstats
doesn't show anything active/pending. Reads are at CL.ONE and writes
use CL.ANY.
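For reference, those checks were run along these lines (standard
nodetool/iostat invocations, flags from memory; host and keyspace names
are placeholders):

    nodetool -h <node> cfstats
    nodetool -h <node> cfhistograms <keyspace> <column_family>
    nodetool -h <node> tpstats
    iostat -x 2 10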
2) *lots* of exception errors, such as:
Repair is trying to run on a response that is a digest response; this
should not be happening. Can you provide some more info on the type of
query you are running?
The query being run is get cf1['user-id']['seg']
3) ring imbalances during a repair (refer to the above nodetool ring
output)
You may be seeing this:
https://issues.apache.org/jira/browse/CASSANDRA-2280
I think it's a mistake that it is marked as resolved.
What can I do to confirm that this issue is still outstanding and/or
that we are affected by it?
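The only thing I can think of so far is to snapshot the ring before,
during and after a repair and check whether streaming and validation
compactions account for the growth, e.g. something along these lines
(subcommand names from memory, they may differ between versions):

    nodetool -h <node> ring              # compare the reported Load at each stage
    nodetool -h <node> netstats          # any repair streams still in flight?
    nodetool -h <node> compactionstats   # validation/compaction backlog?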
4) regular failure detection when any node does something only
moderately stressful, such as a repair, or is under light load, etc.,
but the node itself thinks it is fine.
What version are you using?
Version of failure detection? I've not seen anything on this, so I
suspect this is the default.
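If it helps, the only failure-detector setting I know of is
phi_convict_threshold in cassandra.yaml, which we have not changed, so
it should still be at its default (I believe 8):

    # cassandra.yaml - failure detector sensitivity; higher values make the
    # detector slower to mark a node down (often raised a little on EC2)
    phi_convict_threshold: 8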
Thanks,
Anton