[ 
https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805928#comment-13805928
 ] 

Brandon Williams commented on CASSANDRA-6127:
---------------------------------------------

Patch #3 will make it take much longer for a rebooted node to know who's 
actually up or down, exacerbating CASSANDRA-4288.  I'd still like to know *why* 
things are taking longer with vnodes, and I'm especially hesitant to make any 
adjustments to the gossiper or failure detector (FD): we know they work fine 
with single tokens, and they *have no knowledge about tokens*, which are just 
another opaque state to them.  I suspect something in StorageService is 
blocking the gossiper long enough to cause this, perhaps CASSANDRA-6244 or 
something similar.

> vnodes don't scale to hundreds of nodes
> ---------------------------------------
>
>                 Key: CASSANDRA-6127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6127
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Any cluster that has vnodes and consists of hundreds of 
> physical nodes.
>            Reporter: Tupshin Harper
>            Assignee: Jonathan Ellis
>         Attachments: 6000vnodes.patch, AdjustableGossipPeriod.patch, 
> delayEstimatorUntilStatisticallyValid.patch
>
>
> There are a lot of gossip-related issues in very wide clusters that 
> also have vnodes enabled. Let's use this ticket as a master in case there are 
> sub-tickets.
> The most obvious symptom I've seen is with 1000 nodes in EC2 on m1.xlarge 
> instances, each node configured with 32 vnodes.
> Without vnodes, the cluster spins up fine and is ready to handle requests 
> within 30 minutes or less. 
> With vnodes, nodes are reporting constant up/down flapping messages with no 
> external load on the cluster. After a couple of hours, they were still 
> flapping, had very high cpu load, and the cluster never looked like it was 
> going to stabilize or be useful for traffic.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
