How did you eventually solve your problem? I am experiencing something similar. Did you run cleanup on the node that has 80GB of data?
-Raj

On Mon, Aug 15, 2011 at 10:12 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Just checking, do you have read_repair_chance set to something? The second
> request is going to all replicas, which should only happen with CL ONE if
> read repair is running for the request.
>
> The exceptions are happening during read repair, which runs async to the
> main read request. It occurs after we have detected a digest mismatch,
> when the process is trying to reconcile the full data responses from the
> replicas. The AssertionError is happening because the replica sent a
> digest response. The NPE is probably happening because the response did
> not include a row; how / why the response is not marked as a digest is a
> mystery.
>
> This may be related to the main problem. If not, don't forget to come back
> to it.
>
> In your first log with the timeout something is not right…
>
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .
>
> The "reading…" log messages are written before the inter-node messages are
> sent. For this CL ONE read only node dc1host3 is involved, and it has
> been asked for the data response. That makes sense if read repair is not
> running for the request.
>
> *But* the timeout error says we got a response from dc1host3. One way I
> can see that happening is dc1host3 returning a digest instead of a data
> response (see o.a.c.service.ReadCallback.response(Message)). Which kind of
> matches what we saw above.
>
> We need some more extensive logging and probably a trip to
> https://issues.apache.org/jira/browse/CASSANDRA
>
> It would be good to see full DEBUG logs from both dc1host1 and dc1host3
> if you can reproduce a fault like the first one. Turn off read repair to
> make things a little simpler. If that's too much, we need DEBUG from
> StorageProxy, ReadCallback and ReadVerbHandler.
>
> Can you update the email thread with the ticket?
>
> Thanks
> A
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/08/2011, at 7:34 PM, Anton Winter wrote:
>
> >> OK, node latency is fine and you are using some pretty low
> >> consistency. You said NTS with RF 2, is that RF 2 for each DC ?
> >
> > Correct, I'm using RF 2 for each DC.
> >
> > I was able to reproduce the cli timeouts on the non-replica nodes.
> >
> > The debug log output from dc1host1 (a non-replica node):
> >
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,183 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .
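>
> To make the digest-vs-data theory above concrete, here is a toy model of
> the coordinator-side wait (my own names and a deliberate simplification,
> not the real 0.8 ReadCallback source). At CL ONE the read blocks for one
> *data* response, so a replica that wrongly answers with a digest leaves
> the caller to time out even though a response was counted:
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.TimeUnit;
>
>     public class ReadCallbackSketch {
>         static final int BLOCK_FOR = 1;                  // CL ONE
>         final CountDownLatch dataArrived = new CountDownLatch(BLOCK_FOR);
>         volatile int received = 0;                       // counts every response
>
>         void response(boolean isDigest) {
>             received++;                                  // digest or data, it is "received"
>             if (!isDigest)
>                 dataArrived.countDown();                 // only data unblocks the read
>         }
>
>         String get(long timeoutMs) throws Exception {
>             if (!dataArrived.await(timeoutMs, TimeUnit.MILLISECONDS))
>                 throw new Exception("Operation timed out - received only "
>                         + received + " responses");
>             return "row";
>         }
>
>         public static void main(String[] args) throws Exception {
>             ReadCallbackSketch cb = new ReadCallbackSketch();
>             cb.response(true);   // replica sends a digest instead of data
>             cb.get(2000);        // times out, yet the message reports 1 response
>         }
>     }
>
> That shape would line up with a timeout whose message still says one
> response arrived from dc1host3.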
> > If the query is run again on the same node (dc1host1), 0 rows are
> > returned and the following DEBUG messages are logged:
> >
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 546) reading data from /dc1host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 562) reading digest from /dc1host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:06,022 StorageProxy.java (line 588) Read: 508 ms.
> > ERROR [ReadRepairStage:2112] 2011-08-15 05:32:06,404 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:2112,5,main]
> > java.lang.AssertionError
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > Subsequent queries afterwards show "reading data from /dc1host2",
> > however the result returned remains 0 rows.
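>
> For reference, this AssertionError and the NPEs further down both come
> out of the same reconcile step in read repair. Very roughly (illustrative
> names, not the literal 0.8 RowRepairResolver source; run with -ea to
> enable the assert), it does something like:
>
>     import java.util.Arrays;
>     import java.util.List;
>
>     class ResolveSketch {
>         static class ColumnFamily { /* columns elided */ }
>         static class Row { ColumnFamily cf; }
>         static class ReadResponse {
>             boolean digest; Row row;
>             boolean isDigestQuery() { return digest; }
>             Row row() { return row; }
>         }
>
>         static ColumnFamily resolve(List<ReadResponse> responses) {
>             ColumnFamily merged = null;
>             for (ReadResponse r : responses) {
>                 // AssertionError: a replica answered the full-data
>                 // read with a digest response.
>                 assert !r.isDigestQuery();
>                 // NPE: a non-digest response that carries no row, so
>                 // this dereference blows up.
>                 ColumnFamily cf = r.row().cf;
>                 merged = (merged == null) ? cf : merge(merged, cf);
>             }
>             return merged;
>         }
>
>         static ColumnFamily merge(ColumnFamily a, ColumnFamily b) {
>             return a; // the real code merges column by column
>         }
>
>         public static void main(String[] args) {
>             ReadResponse bad = new ReadResponse(); // no row attached
>             resolve(Arrays.asList(bad));           // NPE, as in the trace
>         }
>     }
>
> Either way the root cause is the same: responses arriving in a shape the
> resolver does not expect.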
> > If I run the same query on a replica I get the correct result returned,
> > but with 2 exceptions, as follows:
> >
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,792 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 541) reading data locally
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from /dc1host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host2/dc3host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host3/dc3host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host2/dc2host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host3/dc2host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host2/dc4host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host3/dc4host3
> > DEBUG [ReadStage:20102] 2011-08-15 05:45:49,793 StorageProxy.java (line 690) LocalReadRunnable reading SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,795 StorageProxy.java (line 588) Read: 1 ms.
> > ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,890 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> > java.lang.RuntimeException: java.lang.NullPointerException
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.lang.NullPointerException
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         ... 3 more
> > ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,892 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> > java.lang.RuntimeException: java.lang.NullPointerException
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.lang.NullPointerException
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         ... 3 more
> >
> > The above have been reproduced in each DC.
> >
> > The testing that has been performing poorly happens to be against the
> > non-replica nodes. Initial testing against one of the replicas suggests
> > they do not exhibit the performance problems that the non-replicas do.
> >
> > Regards,
> > Anton
> >
> > On Fri, 12 Aug 2011 14:11:06 +1200, aaron morton wrote:
> >>> iostat doesn't show a request queue bottleneck. The timeouts we are
> >>> seeing are for reads. The latency on the nodes I have temporarily
> >>> used for reads is around 2-45ms. The next token in the ring at an
> >>> alternate DC is showing ~4ms, with everything else around 0.05ms.
> >>> tpstats doesn't show any active/pending. Reads are at CL.ONE and
> >>> writes use CL.ANY.
> >>
> >> OK, node latency is fine and you are using some pretty low
> >> consistency. You said NTS with RF 2, is that RF 2 for each DC ?
> >>
> >> The steps below may help get an idea of what's going on…
> >>
> >> 1) use nodetool getendpoints to determine which replicas a key is on.
> >> 2) connect directly to one of the endpoints with the CLI, ensure CL
> >> is ONE and do your test query.
> >> 3) connect to another node in the same DC that is not a replica and
> >> do the same.
> >> 4) connect to another node in a different DC and do the same.
> >>
> >> Once you can repro it, try turning up the logging on the coordinator
> >> to DEBUG; you can do this via JConsole. Look for these lines….
> >>
> >> * Command/ConsistencyLevel is….
> >> * reading data locally... or reading data from…
> >> * reading digest locally… or reading digest from…
> >> * Read timeout:…
> >>
> >> You'll also see some lines about receiving messages from other nodes.
> >>
> >> Hopefully you can get an idea of which nodes are involved in a
> >> failing query. Getting a thrift TimedOutException on a read with CL
> >> ONE is pretty odd.
> >>
> >>> What can I do in regards to confirming this issue is still
> >>> outstanding and/or whether we are affected by it?
> >>
> >> It's in 0.8 and will not be fixed. My unscientific approach was to
> >> repair a single CF at a time, hoping that the differences would be
> >> smaller and less data would be streamed.
> >> Minor compaction should help squish things down. If you want to get
> >> more aggressive, reduce the min compaction threshold and trigger a
> >> minor compaction with nodetool flush.
> >>
> >>> Version of failure detection? I've not seen anything on this so I
> >>> suspect this is the default.
> >>
> >> Was asking so I could see if there were any fixes in Gossip or the
> >> FailureDetector that you were missing. Check the CHANGES.txt file.
> >>
> >> Hope that helps.
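>
> PS: if flipping the log level by hand in JConsole gets tedious while you
> reproduce this, the same thing can be scripted over JMX. A minimal
> sketch, assuming your 0.8 build exposes setLog4jLevel on the
> StorageService MBean and uses the default JMX port of 7199 (verify both
> in JConsole first; the host name is just a placeholder):
>
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class SetDebugLevel {
>         public static void main(String[] args) throws Exception {
>             String host = args.length > 0 ? args[0] : "dc1host1"; // coordinator under test
>             JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
>             JMXConnector jmxc = JMXConnectorFactory.connect(url);
>             try {
>                 MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
>                 ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
>                 // turn StorageProxy up to DEBUG on the coordinator
>                 mbs.invoke(ss, "setLog4jLevel",
>                     new Object[] { "org.apache.cassandra.service.StorageProxy", "DEBUG" },
>                     new String[] { String.class.getName(), String.class.getName() });
>             } finally {
>                 jmxc.close();
>             }
>         }
>     }
>
> Remember to set it back to INFO afterwards, or the DEBUG volume will
> skew your latency numbers.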