OK, node latency is fine and you are using a pretty low
consistency level. You said NTS with RF 2; is that RF 2 for each DC?

Correct, I'm using RF 2 for each DC.
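For reference, a keyspace with that layout would be created in the 0.8-era cassandra-cli roughly as below; the data centre names are placeholders for whatever the snitch actually reports, so treat it as a sketch rather than my exact schema:

    create keyspace ks1
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = [{DC1:2, DC2:2, DC3:2, DC4:2}];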



I was able to reproduce the CLI timeouts on the non-replica nodes.

The debug log output from dc1host1 (a non-replica node):

DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,183 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .


If the query is run again on the same node (dc1host1), 0 rows are returned and the following DEBUG messages are logged:


DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 546) reading data from /dc1host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 562) reading digest from /dc1host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:06,022 StorageProxy.java (line 588) Read: 508 ms.
ERROR [ReadRepairStage:2112] 2011-08-15 05:32:06,404 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:2112,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


Subsequent queries show "reading data from /dc1host2", but the result returned is still 0 rows.


If I run the same query on a replica node, I get the correct result returned, but with two exceptions, as follows:


DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,792 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 541) reading data locally
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from /dc1host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host2/dc3host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host3/dc3host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host2/dc2host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host3/dc2host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host2/dc4host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host3/dc4host3
DEBUG [ReadStage:20102] 2011-08-15 05:45:49,793 StorageProxy.java (line 690) LocalReadRunnable reading SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,795 StorageProxy.java (line 588) Read: 1 ms.
ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,890 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,892 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

The above behaviour has been reproduced in each DC.


The testing that has been performing poorly happens to be against the non-replica nodes. Initial testing against one of the replicas suggests they do not exhibit the performance problems that the non-replica nodes do.

Regards,
Anton



On Fri, 12 Aug 2011 14:11:06 +1200, aaron morton wrote:

iostat doesn't show a request queue bottleneck. The timeouts we are seeing are for reads. The latency on the nodes I have temporarily used for reads is around 2-45ms. The next token in the ring at an alternate DC is showing ~4ms, with everything else around 0.05ms. tpstats doesn't show any active/pending. Reads are at CL.ONE and writes use CL.ANY.

OK, node latency is fine and you are using a pretty low
consistency level. You said NTS with RF 2; is that RF 2 for each DC?

The steps below may help get an idea of what's going on; a rough
sketch of the commands follows the list.

1) Use nodetool getendpoints to determine which nodes are replicas
for a key.
2) Connect directly to one of the endpoints with the CLI, ensure the
CL is ONE, and run your test query.
3) Connect to another node in the same DC that is not a replica and
do the same.
4) Connect to another node in a different DC and do the same.
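
As a rough sketch of steps 1 and 2, reusing the keyspace, column family,
key and host names from your logs (substitute your own; the CLI's default
read consistency level is ONE in this era, and newer CLIs also accept an
explicit "consistencylevel as ONE;"):

    nodetool -h dc1host1 getendpoints ks1 cf1 userid1

    cassandra-cli -h <one-of-the-returned-endpoints> -p 9160
    [default@unknown] use ks1;
    [default@ks1] get cf1['userid1'];

Steps 3 and 4 are the same CLI session repeated against a non-replica
node in the same DC and then against a node in another DC, comparing
latency and results each time.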

Once you can repro it, try turning the logging on the coordinator up
to DEBUG; you can do this via JConsole. Look for these lines….

* Command/ConsistencyLevel is….
* reading data locally... or reading data from…
* reading digest locally… or reading digest from…
* Read timeout:…

You'll also see some lines about receiving messages from other nodes.
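
If JConsole is not handy, the same effect can usually be had by raising
the log4j level for StorageProxy (the read coordinator path) in
conf/log4j-server.properties on the coordinator and restarting the node;
a minimal sketch, assuming the stock properties layout:

    # log coordinator read decisions at DEBUG
    log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG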

Hopefully you can get an idea of which nodes are involved in a
failing query. Getting a thrift TimedOutException on a read with CL
ONE is pretty odd.

What can I do to confirm this issue is still outstanding and/or that we are affected by it?
It's in 0.8 and will not be fixed. My unscientific approach was to
repair a single CF at a time, hoping that the differences would be
smaller and less data would be streamed.
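For example, something along these lines, one CF at a time (the
keyspace/CF names are the ones from your logs, and per-CF arguments
assume a reasonably recent nodetool):

    nodetool -h dc1host1 repair ks1 cf1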
Minor compaction should help squish things down. If you want to get
more aggressive, reduce the min compaction threshold and trigger a
minor compaction with nodetool flush.
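A sketch of that, again with placeholder host/keyspace/CF names and
example thresholds (exact nodetool syntax can vary a little between
versions):

    nodetool -h dc1host1 setcompactionthreshold ks1 cf1 2 32
    nodetool -h dc1host1 flush ks1 cf1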

Version of failure detection? I've not seen anything on this, so I suspect this is the default.
I was asking so I could see if there were any fixes in Gossip or the
FailureDetector that you were missing. Check the CHANGES.txt file.

Hope that helps.
