[ 
https://issues.apache.org/jira/browse/CASSANDRA-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786585#comment-13786585
 ] 

Li Zou commented on CASSANDRA-5932:
-----------------------------------

I have done some testing using today's trunk and have observed the following issues.

*Issue 1* -- The first {{MessagingService.addCallback()}} method (i.e. the one 
without the ConsistencyLevel argument) fails its assertion.

Commenting out the assert statement seems to work. However, the Cassandra 
servers themselves periodically produce a 10-second outage (i.e. zero 
transactions from the client's point of view).

*Issue 2* -- Speculative Retry seems to stop retrying during the outage window.

During the outage window, whether triggered by killing one of the Cassandra 
nodes or produced by the Cassandra servers themselves, JConsole shows that the 
SpeculativeRetry counter in the JMX stats stops incrementing until gossip 
detects the outage.

What is the reason for this? The Speculative Retry is meant to help during the 
outage period. This observed behavior is consistent with Cassandra 2.0.0-rc2.
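
To make the expectation concrete, here is a rough sketch of the behavior I would expect from a speculative retry. It is only an illustration and not Cassandra's actual code: the class name {{SpeculativeReadSketch}}, the {{read()}} method, the {{speculativeRetries}} counter and the fixed threshold are all hypothetical, and the replicas are simulated with plain suppliers. The point is that a stalled replica should push reads onto the slow path, so the counter should keep climbing during an outage rather than stop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class SpeculativeReadSketch {
    // Hypothetical counter, loosely analogous to the SpeculativeRetry JMX stat.
    public static final AtomicLong speculativeRetries = new AtomicLong();

    // Sends the read to the primary replica; if no answer arrives within
    // thresholdMillis, sends the same read to a backup replica and returns
    // whichever response completes first.
    public static <T> T read(Supplier<T> primary, Supplier<T> backup,
                             long thresholdMillis, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary, pool);
        try {
            // Fast path: the primary answers within the threshold.
            return first.get(thresholdMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            // Slow path: count the retry and race the backup against the primary.
            speculativeRetries.incrementAndGet();
            CompletableFuture<T> second = CompletableFuture.supplyAsync(backup, pool);
            @SuppressWarnings("unchecked")
            T result = (T) CompletableFuture.anyOf(first, second).get();
            return result;
        }
    }

    // Helper that simulates a stalled replica.
    public static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Supplier<String> stalled = () -> { sleep(2000); return "from stalled replica"; };
            Supplier<String> healthy = () -> "from healthy replica";
            System.out.println(read(stalled, healthy, 50, pool));
            System.out.println("speculative retries: " + speculativeRetries.get());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the stalled supplier misses the 50 ms threshold, so the read falls through to the backup and the counter is incremented. What I see in JConsole during the outage window is the opposite: the counter stays flat.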


> Speculative read performance data show unexpected results
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-5932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5932
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan McGuire
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.0.2
>
>         Attachments: 5932.6692c50412ef7d.compaction.png, 
> 5932-6692c50412ef7d.png, 5932.6692c50412ef7d.rr0.png, 
> 5932.6692c50412ef7d.rr1.png, 5932.ded39c7e1c2fa.logs.tar.gz, 5932.txt, 
> 5933-128_and_200rc1.png, 5933-7a87fc11.png, 5933-logs.tar.gz, 
> 5933-randomized-dsnitch-replica.2.png, 5933-randomized-dsnitch-replica.3.png, 
> 5933-randomized-dsnitch-replica.png, compaction-makes-slow.png, 
> compaction-makes-slow-stats.png, eager-read-looks-promising.png, 
> eager-read-looks-promising-stats.png, eager-read-not-consistent.png, 
> eager-read-not-consistent-stats.png, node-down-increase-performance.png
>
>
> I've done a series of stress tests with eager retries enabled that show 
> undesirable behavior. I'm grouping these behaviours into one ticket as they 
> are most likely related.
> 1) Killing off a node in a 4 node cluster actually increases performance.
> 2) Compactions make nodes slow, even after the compaction is done.
> 3) Eager Reads tend to lessen the *immediate* performance impact of a node 
> going down, but not consistently.
> My Environment:
> 1 stress machine: node0
> 4 C* nodes: node4, node5, node6, node7
> My script:
> node0 writes some data: stress -d node4 -F 30000000 -n 30000000 -i 5 -l 2 -K 20
> node0 reads some data: stress -d node4 -n 30000000 -o read -i 5 -K 20
> h3. Examples:
> h5. A node going down increases performance:
> !node-down-increase-performance.png!
> [Data for this test 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> At 450s, I kill -9 one of the nodes. There is a brief decrease in performance 
> as the snitch adapts, but then it recovers... to even higher performance than 
> before.
> h5. Compactions make nodes permanently slow:
> !compaction-makes-slow.png!
> !compaction-makes-slow-stats.png!
> The green and orange lines represent trials with eager retry enabled. Unlike 
> the red and blue lines, they never recover the op-rate they had before the 
> compaction.
> [Data for this test 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.compaction.2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. Speculative Read tends to lessen the *immediate* impact:
> !eager-read-looks-promising.png!
> !eager-read-looks-promising-stats.png!
> This graph looked the most promising to me: the two trials with eager retry 
> (the green and orange lines) showed the smallest dip in performance at 450s. 
> [Data for this test 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
> h5. But not always:
> !eager-read-not-consistent.png!
> !eager-read-not-consistent-stats.png!
> This is a retrial with the same settings as above, yet the 95th-percentile 
> eager retry (red line) did poorly this time at 450s.
> [Data for this test 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.just_20.rc1.try2.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



--
This message was sent by Atlassian JIRA
(v6.1#6144)
