OK, node latency is fine and you are using a pretty low
consistency level. You said NTS with RF 2; is that RF 2 for each DC?

Correct, I'm using RF 2 for each DC.
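For reference, a keyspace with that layout would be created in the 0.8-era cassandra-cli roughly as below; the data centre names are placeholders for whatever the snitch actually reports, so treat it as a sketch rather than my exact schema:

    create keyspace ks1
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = [{DC1:2, DC2:2, DC3:2, DC4:2}];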



I was able to reproduce the CLI timeouts on the non-replica nodes.

The debug log output from dc1host1 (a non-replica node):

DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,183 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .


If the query is run again on the same node (dc1host1), 0 rows are returned and the following DEBUG messages are logged:


DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 546) reading data from /dc1host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 562) reading digest from /dc1host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host3
DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host2
DEBUG [pool-2-thread-14] 2011-08-15 05:32:06,022 StorageProxy.java (line 588) Read: 508 ms.
ERROR [ReadRepairStage:2112] 2011-08-15 05:32:06,404 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:2112,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


Subsequent queries show "reading data from /dc1host2", but the result returned is still 0 rows.


If I run the same query on a replica node, I get the correct result returned, but with two exceptions, as follows:


DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,792 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 541) reading data locally
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from /dc1host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host2/dc3host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host3/dc3host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host2/dc2host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host3/dc2host3
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host2/dc4host2
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host3/dc4host3
DEBUG [ReadStage:20102] 2011-08-15 05:45:49,793 StorageProxy.java (line 690) LocalReadRunnable reading SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,795 StorageProxy.java (line 588) Read: 1 ms.
ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,890 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,892 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

The above behaviour has been reproduced in each DC.


The testing that has been performing poorly happens to be against the non-replica nodes. Initial testing against one of the replicas suggests they do not exhibit the performance problems that the non-replica nodes do.

Regards,
Anton



On Fri, 12 Aug 2011 14:11:06 +1200, aaron morton wrote:

iostat doesn't show a request queue bottleneck. The timeouts we are seeing are for reads. The latency on the nodes I have temporarily used for reads is around 2-45ms. The next token in the ring at an alternate DC is showing ~4ms, with everything else around 0.05ms. tpstats doesn't show any active/pending. Reads are at CL.ONE and writes use CL.ANY.

OK, node latency is fine and you are using a pretty low
consistency level. You said NTS with RF 2; is that RF 2 for each DC?

The steps below may help get an idea of what's going on; a rough
sketch of the commands follows the list.

1) Use nodetool getendpoints to determine which nodes are replicas
for a key.
2) Connect directly to one of the endpoints with the CLI, ensure the
CL is ONE, and run your test query.
3) Connect to another node in the same DC that is not a replica and
do the same.
4) Connect to another node in a different DC and do the same.
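
As a rough sketch of steps 1 and 2, reusing the keyspace, column family,
key and host names from your logs (substitute your own; the CLI's default
read consistency level is ONE in this era, and newer CLIs also accept an
explicit "consistencylevel as ONE;"):

    nodetool -h dc1host1 getendpoints ks1 cf1 userid1

    cassandra-cli -h <one-of-the-returned-endpoints> -p 9160
    [default@unknown] use ks1;
    [default@ks1] get cf1['userid1'];

Steps 3 and 4 are the same CLI session repeated against a non-replica
node in the same DC and then against a node in another DC, comparing
latency and results each time.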

Once you can repro it, try turning the logging on the coordinator up
to DEBUG; you can do this via JConsole. Look for these lines….

* Command/ConsistencyLevel is….
* reading data locally... or reading data from…
* reading digest locally… or reading digest from…
* Read timeout:…

You'll also see some lines about receiving messages from other nodes.
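
If JConsole is not handy, the same effect can usually be had by raising
the log4j level for StorageProxy (the read coordinator path) in
conf/log4j-server.properties on the coordinator and restarting the node;
a minimal sketch, assuming the stock properties layout:

    # log coordinator read decisions at DEBUG
    log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG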

Hopefully you can get an idea of which nodes are involved in a
failing query. Getting a thrift TimedOutException on a read with CL
ONE is pretty odd.

What can I do to confirm this issue is still outstanding and/or that we are affected by it?
It's in 0.8 and will not be fixed. My unscientific approach was to
repair a single CF at a time, hoping that the differences would be
smaller and less data would be streamed.
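For example, something along these lines, one CF at a time (the
keyspace/CF names are the ones from your logs, and per-CF arguments
assume a reasonably recent nodetool):

    nodetool -h dc1host1 repair ks1 cf1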
Minor compaction should help squish things down. If you want to get
more aggressive, reduce the min compaction threshold and trigger a
minor compaction with nodetool flush.
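A sketch of that, again with placeholder host/keyspace/CF names and
example thresholds (exact nodetool syntax can vary a little between
versions):

    nodetool -h dc1host1 setcompactionthreshold ks1 cf1 2 32
    nodetool -h dc1host1 flush ks1 cf1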

Version of failure detection? I've not seen anything on this, so I suspect this is the default.
I was asking so I could see if there were any fixes in Gossip or the
FailureDetector that you were missing. Check the CHANGES.txt file.

Hope that helps.
