How did you eventually solve your problem? I am experiencing something similar. Did you run cleanup on the node that has 80GB of data?
-Raj

On Mon, Aug 15, 2011 at 10:12 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Just checking, do you have read_repair_chance set to something? The second
> request is going to all replicas, which should only happen with CL ONE if
> read repair is running for the request.
>
> The exceptions are happening during read repair, which runs async to the
> main read request. It occurs after we have detected a digest mismatch,
> when the process is trying to reconcile the full data responses from the
> replicas. The AssertionError is happening because the replica sent a
> digest response. The NPE is probably happening because the response did
> not include a row; how / why the response is not marked as a digest is a
> mystery.
>
> This may be related to the main problem. If not, don't forget to come back
> to it.
>
> In your first log with the timeout something is not right…
>
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .
>
> The "reading…" log messages are written before the inter-node messages are
> sent. For this CL ONE read only node dc1host3 is involved, and it has
> been asked for the data response. That makes sense if read repair is not
> running for the request.
>
> *But* the timeout error says we got a response from dc1host3. One way I
> can see that happening is dc1host3 returning a digest instead of a data
> response (see o.a.c.service.ReadCallback.response(Message)). Which kind of
> matches what we saw above.
>
> We need some more extensive logging and probably a trip to
> https://issues.apache.org/jira/browse/CASSANDRA
>
> It would be good to see full DEBUG logs from both dc1host1 and dc1host3
> if you can reproduce a fault like the first one. Turn off read repair to
> make things a little simpler. If that's too much, we need DEBUG from
> StorageProxy, ReadCallback and ReadVerbHandler.
>
> Can you update the email thread with the ticket?
>
> Thanks
> A
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/08/2011, at 7:34 PM, Anton Winter wrote:
>
> >> OK, node latency is fine and you are using some pretty low
> >> consistency. You said NTS with RF 2, is that RF 2 for each DC ?
> >
> > Correct, I'm using RF 2 for each DC.
> >
> > I was able to reproduce the cli timeouts on the non-replica nodes.
> >
> > The debug log output from dc1host1 (a non-replica node):
> >
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,183 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3, .
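>
> To make the digest-vs-data theory above concrete, here is a toy model of
> the coordinator-side wait (my own names and a deliberate simplification,
> not the real 0.8 ReadCallback source). At CL ONE the read blocks for one
> *data* response, so a replica that wrongly answers with a digest leaves
> the caller to time out even though a response was counted:
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.TimeUnit;
>
>     public class ReadCallbackSketch {
>         static final int BLOCK_FOR = 1;                  // CL ONE
>         final CountDownLatch dataArrived = new CountDownLatch(BLOCK_FOR);
>         volatile int received = 0;                       // counts every response
>
>         void response(boolean isDigest) {
>             received++;                                  // digest or data, it is "received"
>             if (!isDigest)
>                 dataArrived.countDown();                 // only data unblocks the read
>         }
>
>         String get(long timeoutMs) throws Exception {
>             if (!dataArrived.await(timeoutMs, TimeUnit.MILLISECONDS))
>                 throw new Exception("Operation timed out - received only "
>                         + received + " responses");
>             return "row";
>         }
>
>         public static void main(String[] args) throws Exception {
>             ReadCallbackSketch cb = new ReadCallbackSketch();
>             cb.response(true);   // replica sends a digest instead of data
>             cb.get(2000);        // times out, yet the message reports 1 response
>         }
>     }
>
> That shape would line up with a timeout whose message still says one
> response arrived from dc1host3.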
> > If the query is run again on the same node (dc1host1), 0 rows are
> > returned and the following DEBUG messages are logged:
> >
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 546) reading data from /dc1host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 562) reading digest from /dc1host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host3
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host2
> > DEBUG [pool-2-thread-14] 2011-08-15 05:32:06,022 StorageProxy.java (line 588) Read: 508 ms.
> > ERROR [ReadRepairStage:2112] 2011-08-15 05:32:06,404 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:2112,5,main]
> > java.lang.AssertionError
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > Subsequent queries afterwards show "reading data from /dc1host2",
> > however the result returned remains 0 rows.
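>
> For reference, this AssertionError and the NPEs further down both come
> out of the same reconcile step in read repair. Very roughly (illustrative
> names, not the literal 0.8 RowRepairResolver source; run with -ea to
> enable the assert), it does something like:
>
>     import java.util.Arrays;
>     import java.util.List;
>
>     class ResolveSketch {
>         static class ColumnFamily { /* columns elided */ }
>         static class Row { ColumnFamily cf; }
>         static class ReadResponse {
>             boolean digest; Row row;
>             boolean isDigestQuery() { return digest; }
>             Row row() { return row; }
>         }
>
>         static ColumnFamily resolve(List<ReadResponse> responses) {
>             ColumnFamily merged = null;
>             for (ReadResponse r : responses) {
>                 // AssertionError: a replica answered the full-data
>                 // read with a digest response.
>                 assert !r.isDigestQuery();
>                 // NPE: a non-digest response that carries no row, so
>                 // this dereference blows up.
>                 ColumnFamily cf = r.row().cf;
>                 merged = (merged == null) ? cf : merge(merged, cf);
>             }
>             return merged;
>         }
>
>         static ColumnFamily merge(ColumnFamily a, ColumnFamily b) {
>             return a; // the real code merges column by column
>         }
>
>         public static void main(String[] args) {
>             ReadResponse bad = new ReadResponse(); // no row attached
>             resolve(Arrays.asList(bad));           // NPE, as in the trace
>         }
>     }
>
> Either way the root cause is the same: responses arriving in a shape the
> resolver does not expect.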
> > If I run the same query on a replica I get the correct result returned,
> > but with 2 exceptions, as follows:
> >
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,792 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 541) reading data locally
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from /dc1host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host2/dc3host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host3/dc3host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host2/dc2host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host3/dc2host3
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host2/dc4host2
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host3/dc4host3
> > DEBUG [ReadStage:20102] 2011-08-15 05:45:49,793 StorageProxy.java (line 690) LocalReadRunnable reading SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)
> > DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,795 StorageProxy.java (line 588) Read: 1 ms.
> > ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,890 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> > java.lang.RuntimeException: java.lang.NullPointerException
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.lang.NullPointerException
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         ... 3 more
> > ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,892 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> > java.lang.RuntimeException: java.lang.NullPointerException
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.lang.NullPointerException
> >         at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
> >         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >         ... 3 more
> >
> > The above have been reproduced in each DC.
> >
> > The testing that has been performing poorly happens to be against the
> > non-replica nodes. Initial testing against one of the replicas suggests
> > they do not exhibit the performance problems that the non-replicas do.
> >
> > Regards,
> > Anton
> >
> > On Fri, 12 Aug 2011 14:11:06 +1200, aaron morton wrote:
> >>> iostat doesn't show a request queue bottleneck. The timeouts we are
> >>> seeing are for reads. The latency on the nodes I have temporarily
> >>> used for reads is around 2-45ms. The next token in the ring at an
> >>> alternate DC is showing ~4ms, with everything else around 0.05ms.
> >>> tpstats doesn't show any active/pending. Reads are at CL.ONE and
> >>> writes use CL.ANY.
> >>
> >> OK, node latency is fine and you are using some pretty low
> >> consistency. You said NTS with RF 2, is that RF 2 for each DC ?
> >>
> >> The steps below may help get an idea of what's going on…
> >>
> >> 1) use nodetool getendpoints to determine which replicas a key is on.
> >> 2) connect directly to one of the endpoints with the CLI, ensure CL
> >> is ONE and do your test query.
> >> 3) connect to another node in the same DC that is not a replica and
> >> do the same.
> >> 4) connect to another node in a different DC and do the same.
> >>
> >> Once you can repro it, try turning up the logging on the coordinator
> >> to DEBUG; you can do this via JConsole. Look for these lines….
> >>
> >> * Command/ConsistencyLevel is….
> >> * reading data locally... or reading data from…
> >> * reading digest locally… or reading digest from…
> >> * Read timeout:…
> >>
> >> You'll also see some lines about receiving messages from other nodes.
> >>
> >> Hopefully you can get an idea of which nodes are involved in a
> >> failing query. Getting a thrift TimedOutException on a read with CL
> >> ONE is pretty odd.
> >>
> >>> What can I do in regards to confirming this issue is still
> >>> outstanding and/or whether we are affected by it?
> >>
> >> It's in 0.8 and will not be fixed. My unscientific approach was to
> >> repair a single CF at a time, hoping that the differences would be
> >> smaller and less data would be streamed.
> >> Minor compaction should help squish things down. If you want to get
> >> more aggressive, reduce the min compaction threshold and trigger a
> >> minor compaction with nodetool flush.
> >>
> >>> Version of failure detection? I've not seen anything on this so I
> >>> suspect this is the default.
> >>
> >> Was asking so I could see if there were any fixes in Gossip or the
> >> FailureDetector that you were missing. Check the CHANGES.txt file.
> >>
> >> Hope that helps.
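>
> PS: if flipping the log level by hand in JConsole gets tedious while you
> reproduce this, the same thing can be scripted over JMX. A minimal
> sketch, assuming your 0.8 build exposes setLog4jLevel on the
> StorageService MBean and uses the default JMX port of 7199 (verify both
> in JConsole first; the host name is just a placeholder):
>
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class SetDebugLevel {
>         public static void main(String[] args) throws Exception {
>             String host = args.length > 0 ? args[0] : "dc1host1"; // coordinator under test
>             JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
>             JMXConnector jmxc = JMXConnectorFactory.connect(url);
>             try {
>                 MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
>                 ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
>                 // turn StorageProxy up to DEBUG on the coordinator
>                 mbs.invoke(ss, "setLog4jLevel",
>                     new Object[] { "org.apache.cassandra.service.StorageProxy", "DEBUG" },
>                     new String[] { String.class.getName(), String.class.getName() });
>             } finally {
>                 jmxc.close();
>             }
>         }
>     }
>
> Remember to set it back to INFO afterwards, or the DEBUG volume will
> skew your latency numbers.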