We’ve used 32 tokens pre 3.0. Results have been mixed because of the randomness: there’s going to be some imbalance, and how much imbalance you get depends on luck, unfortunately.
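To make the luck factor concrete, here is a rough, purely illustrative simulation sketch in Python (the 50-node cluster size, the token counts tried, and the Murmur3-sized ring are assumptions for illustration, not measurements of any real cluster). It assigns tokens at random, which is what the thread describes as "rand() under the hood" for pre-3.0 allocation, and prints the smallest and largest primary-ownership fraction per node:

import random

def ownership_spread(num_nodes=50, num_tokens=32, trials=3, seed=42):
    # Place num_tokens random tokens per node on a Murmur3-sized ring and
    # report the smallest and largest primary ownership fraction per trial.
    rng = random.Random(seed)
    ring_min, ring_max = -2**63, 2**63 - 1
    ring_size = ring_max - ring_min
    for trial in range(trials):
        ring = sorted((rng.randint(ring_min, ring_max), node)
                      for node in range(num_nodes)
                      for _ in range(num_tokens))
        owned = [0] * num_nodes
        for i, (token, node) in enumerate(ring):
            prev = ring[i - 1][0] if i else ring[-1][0] - ring_size
            owned[node] += token - prev        # width of this primary range
        fractions = [o / ring_size for o in owned]
        print("num_tokens=%d trial=%d: min=%.2f%% max=%.2f%% (ideal %.2f%%)"
              % (num_tokens, trial, 100 * min(fractions),
                 100 * max(fractions), 100 / num_nodes))

for nt in (4, 32, 256):
    ownership_spread(num_tokens=nt)

Typically the min/max spread widens as num_tokens shrinks, which is the balance side of the HA-vs-balance trade-off discussed in this thread.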
I’m interested to hear your results using 4 tokens. Would you mind letting the ML know your experience when you’ve done it?

Jon

> On Jan 16, 2018, at 9:40 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>
> Agree with you, Jon.
> Actually, this cluster was configured by my 'predecessor' and [fortunately for him] we've never met :)
> We're using version 2.1.15 and can't upgrade because of the legacy Netflix Astyanax client in use.
>
> Below in the thread Alex mentioned that setting vnodes to a value lower than 256 is recommended only for C* versions > 3.0 (the token allocation algorithm was improved in C* 3.0).
>
> Jon,
> Do you have positive experience setting up clusters with vnodes < 256 for C* 2.1?
>
> vnodes=32 is also too high, in my view (we would need far more than 32 servers per AZ in order to get a 'reliable' cluster).
> vnodes=4 seems better from the HA + balancing trade-off.
>
> Thanks,
> Kyrill
>
> From: Jon Haddad <jonathan.had...@gmail.com> on behalf of Jon Haddad <j...@jonhaddad.com>
> Sent: Tuesday, January 16, 2018 6:44:53 PM
> To: user
> Subject: Re: vnodes: high availability
>
> While all the token math is helpful, I have to also call out the elephant in the room:
>
> You have not correctly configured Cassandra for production.
>
> If you had used the correct endpoint snitch & network topology strategy, you would be able to withstand the complete failure of an entire availability zone at QUORUM, or two if you queried at CL=ONE.
>
> You are correct about 256 tokens causing issues; it’s one of the reasons why we recommend 32. I’m curious how things behave going as low as 4, personally, but I haven’t done the math / tested it yet.
>
>
>> On Jan 16, 2018, at 2:02 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>>
>> ...to me it sounds like 'C* isn't as highly available by design as it's declared to be'.
>> More nodes in a cluster means a higher probability of simultaneous node failures.
>> And from a high-availability standpoint, it looks like the situation is made even worse by the recommended setting vnodes=256.
>>
>> I need to do some math to get numbers/formulas, but right now the situation doesn't seem promising.
>> In case somebody from the C* developers/architects is reading this message, I'd be grateful for links to the C* reliability calculations on which these decisions were based.
>>
>> Regards,
>> Kyrill
>>
>> From: kurt greaves <k...@instaclustr.com>
>> Sent: Tuesday, January 16, 2018 2:16:34 AM
>> To: User
>> Subject: Re: vnodes: high availability
>>
>> Yeah, it's very unlikely that you will have 2 nodes in the cluster with NO intersecting token ranges (vnodes) for an RF of 3 (probably even 2).
>>
>> If node A goes down, all 256 of its ranges go down, and considering there are only 49 other nodes, each with 256 vnodes, it's very likely that every node is responsible for some range A was also responsible for. I'm not sure what the exact math is, but think of it this way: if any of a node's 256 token ranges overlaps a token range on node A on the ring (i.e., it falls within the next RF-1 or previous RF-1 token ranges), those token ranges will be down at QUORUM.
>>
>> Because token range assignment just uses rand() under the hood, I'm sure you could prove that it's always going to be the case that any 2 nodes going down result in a loss of QUORUM for some token range.
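To sanity-check that intuition, here is a rough Monte Carlo sketch (Python, purely illustrative: random token placement plus a SimpleStrategy-style rule of "the owning node plus the next RF-1 distinct nodes clockwise on the ring", not Cassandra's actual code; the 50 nodes and RF=3 are taken from the scenario in this thread):

import random
from itertools import combinations

def replica_sets(num_nodes=50, num_tokens=256, rf=3, seed=1):
    # Build a random token ring and compute SimpleStrategy-style replica
    # sets: each range lives on the node owning its token plus the next
    # rf - 1 distinct nodes walking clockwise.
    rng = random.Random(seed)
    ring = sorted((rng.random(), node)
                  for node in range(num_nodes)
                  for _ in range(num_tokens))
    owners = [node for _, node in ring]
    sets = set()
    for i in range(len(owners)):
        replicas, j = [], i
        while len(replicas) < rf:
            node = owners[j % len(owners)]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.add(frozenset(replicas))
    return sets

def pairs_losing_quorum(sets, num_nodes, rf=3):
    # A range is down at QUORUM when fewer than rf//2 + 1 replicas survive.
    quorum = rf // 2 + 1
    return sum(1 for a, b in combinations(range(num_nodes), 2)
               if any(len(s - {a, b}) < quorum for s in sets))

for nt in (4, 32, 256):
    sets = replica_sets(num_tokens=nt)
    print("num_tokens=%d: %d of %d node pairs lose QUORUM for some range"
          % (nt, pairs_losing_quorum(sets, 50), 50 * 49 // 2))

If the reasoning above holds, the 256-token run should report that (nearly) all 1225 pairs lose QUORUM somewhere, while lower token counts should report noticeably fewer.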
>> On 15 January 2018 at 19:59, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>> Thanks Alexander!
>>
>> I'm not an MS in math either) Unfortunately.
>>
>> Not sure, but it seems to me that the probability of 2/49 in your explanation doesn't take into account that vnode endpoints are almost evenly distributed across all nodes (at least that's what I can see from "nodetool ring" output).
>>
>> http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
>> Of course this vnodes illustration is a theoretical one, but there are no 2 nodes on that diagram that can be switched off without losing a key range (at CL=QUORUM).
>>
>> That's because vnodes_per_node=8 > Nnodes=6.
>> As far as I understand, the situation gets worse as the vnodes_per_node/Nnodes ratio increases.
>> Please correct me if I'm wrong.
>>
>> How would the situation differ from this example by DataStax if we had a real-life 6-node cluster with 8 vnodes on each node?
>>
>> Regards,
>> Kyrill
>>
>> From: Alexander Dejanovski <a...@thelastpickle.com>
>> Sent: Monday, January 15, 2018 8:14:21 PM
>> To: user@cassandra.apache.org
>> Subject: Re: vnodes: high availability
>>
>> I was corrected off list that the odds of losing data when 2 nodes are down don't depend on the number of vnodes, only on the number of nodes.
>> The more vnodes, the smaller the chunks of data you may lose, and vice versa.
>> I officially suck at statistics, as expected :)
>>
>> On Mon, Jan 15, 2018 at 17:55, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>> Hi Kyrylo,
>>
>> The situation is a bit more nuanced than shown by the DataStax diagram, which is fairly theoretical.
>> If you're using SimpleStrategy, there is no rack awareness. Since vnode distribution is purely random, and the replica for a vnode will be placed on the node that owns the next vnode in token order (yeah, that's not easy to formulate), you end up with statistics only.
>>
>> I kinda suck at maths but I'm going to risk making a fool of myself :)
>>
>> The odds for one vnode to be replicated on another node are, in your case, 2/49 (out of 49 remaining nodes, 2 replicas need to be placed).
>> Given you have 256 vnodes, the odds for at least one vnode of a single node to exist on another one are 256*(2/49) = 10.4%.
>> Since the relationship is bi-directional (the odds for node B to have a vnode replicated on node A are the same as the opposite), that doubles the odds of 2 nodes both being replicas for at least one vnode: 20.8%.
>>
>> Having a smaller number of vnodes will decrease the odds, just as having more nodes in the cluster will.
>> (Now once again, I hope my maths aren't fully wrong; I'm pretty rusty in that area...)
>>
>> How many queries that will affect is a different question, as it depends on which partitions currently exist and are queried in the unavailable token ranges.
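For reference, here is the same back-of-the-envelope arithmetic in a few lines of Python. The 2/49 per-vnode figure is taken from the message above; the complement form at the end is a standard independence-based refinement added here, and still only an approximation, since vnode placements are not truly independent:

nodes, vnodes, rf = 50, 256, 3

# Per-vnode odds from the message above: each of node A's vnodes has
# rf - 1 = 2 extra replicas spread over the other 49 nodes, so a specific
# node B holds one of them with probability roughly 2/49.
p_single = (rf - 1) / (nodes - 1)

# Summing over 256 vnodes is a union bound; it exceeds 1, so it can only
# say "very likely", not give an actual probability.
print("union bound over 256 vnodes:", vnodes * p_single)

# Treating the vnodes as (roughly) independent and taking the complement
# gives a usable approximation instead.
print("approx P(A and B share a range):", 1 - (1 - p_single) ** vnodes)

The second number comes out very close to 1 for 256 vnodes, which points in the same direction as Kurt's remark earlier in the thread and the nodetool ring check quoted below.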
>> Then you have rack awareness, which comes with NetworkTopologyStrategy:
>> If the number of replicas (3 in your case) is proportional to the number of racks, Cassandra will spread replicas across different ones.
>> In that situation, you can theoretically lose as many nodes as you want in a single rack; you will still have two other replicas available to satisfy quorum in the remaining racks.
>> If you start losing nodes in different racks, we're back to doing maths (but the odds will be slightly different).
>>
>> That makes maintenance predictable, because you can shut down as many nodes as you want in a single rack without losing QUORUM.
>>
>> Feel free to correct my numbers if I'm wrong.
>>
>> Cheers,
>>
>>
>> On Mon, Jan 15, 2018 at 5:27 PM Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>> Thanks, Rahul.
>> But in your example, simultaneous loss of Node3 and Node6 leads to loss of ranges N, C, J at consistency level QUORUM.
>>
>> As far as I understand, in the case vnodes > N_nodes_in_cluster and endpoint_snitch=SimpleSnitch, since:
>>
>> 1) "secondary" replicas are placed on the two nodes 'next' to the node responsible for a range (in case of RF=3),
>> 2) there are a lot of vnodes on each node,
>> 3) ranges are evenly distributed between vnodes in case of SimpleSnitch,
>>
>> we get all physical nodes (servers) having mutually adjacent token ranges.
>> Is that correct?
>>
>> At least in the case of my real-world ~50-node cluster with vnodes=256 and RF=3, this command:
>>
>> nodetool ring | grep '^<ip-prefix>' | awk '{print $1}' | uniq | grep -B2 -A2 '<ip_of_a_node>' | grep -v '<ip_of_a_node>' | grep -v '^--' | sort | uniq | wc -l
>>
>> returned a number equal to Nnodes - 1, which means that I can't switch off any 2 nodes at the same time without losing some keyrange at CL=QUORUM.
>>
>> Thanks,
>> Kyrill
>>
>> From: Rahul Neelakantan <ra...@rahul.be>
>> Sent: Monday, January 15, 2018 5:20:20 PM
>> To: user@cassandra.apache.org
>> Subject: Re: vnodes: high availability
>>
>> Not necessarily. It depends on how the token ranges for the vNodes are assigned to them. For example, take a look at this diagram:
>> http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
>>
>> In the vNode part of the diagram, you will see that the loss of Node 3 and Node 6 will still not have any effect on Token Range A. But yes, if you lose two nodes that both have Token Range A assigned to them (say Node 1 and Node 2), you will have unavailability with your specified configuration.
>>
>> You can sort of circumvent this by using the DataStax Java Driver and having the client recognize a degraded cluster and operate temporarily in a downgraded consistency mode:
>>
>> http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
>>
>> - Rahul
>>
>> On Mon, Jan 15, 2018 at 10:04 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>> Hi,
>>
>> Let's say we have a C* cluster with the following parameters:
>> - 50 nodes in the cluster
>> - RF=3
>> - vnodes=256 per node
>> - CL for some queries = QUORUM
>> - endpoint_snitch = SimpleSnitch
>>
>> Is it correct that any 2 nodes down will cause unavailability of a keyrange at CL=QUORUM?
>>
>> Regards,
>> Kyrill
>>
>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com