On Jan 16, 2018, at 2:02 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
...to me it sounds like 'C* isn't as highly available by
design as it's declared to be'.
More nodes in a cluster means higher probability of simultaneous
node failures.
And from a high-availability standpoint, it looks like the situation is
made even worse by the recommended setting vnodes=256.
I need to do some math to get numbers/formulas, but right now the situation
doesn't seem promising.
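As a rough illustration of that first point (a hedged sketch with made-up figures, not a real calculation): assuming every node is independently unavailable with some probability p, the chance of at least two simultaneous failures does grow with cluster size.

def p_at_least_two_down(n_nodes, p_node_down):
    """Probability that >= 2 of n_nodes are down at the same time,
    assuming independent failures with probability p_node_down each
    (a simplifying assumption made only for illustration)."""
    p_zero = (1 - p_node_down) ** n_nodes
    p_one = n_nodes * p_node_down * (1 - p_node_down) ** (n_nodes - 1)
    return 1 - p_zero - p_one

# Illustrative figure only: assume a 0.1% chance of any single node being down.
for n in (6, 12, 50, 100):
    print(n, round(p_at_least_two_down(n, 0.001), 6))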
In case somebody from the C* developers/architects is reading this
message, I'd be grateful for links to the C* reliability calculations
on which these decisions were based.
Regards,
Kyrill
------------------------------------------------------------------------
*From:* kurt greaves <k...@instaclustr.com>
*Sent:* Tuesday, January 16, 2018 2:16:34 AM
*To:* User
*Subject:* Re: vnodes: high availability
Yeah it's very unlikely that you will have 2 nodes in the
cluster with NO intersecting token ranges (vnodes) for an RF of
3 (probably even 2).
If node A goes down all 256 ranges will go down, and considering
there are only 49 other nodes all with 256 vnodes each, it's
very likely that every node will be responsible for some range A
was also responsible for. I'm not sure what the exact math is,
but think of it this way: if, on any other node, one of its 256 token
ranges overlaps (i.e. it's within the next RF-1 or previous RF-1 token
ranges on the ring) with a token range on node A, those token
ranges will be down at QUORUM.
Because token range assignment just uses rand() under the hood,
I'm sure you could prove that it's always going to be the case
that any 2 nodes going down result in a loss of QUORUM for some
token range.
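A rough way to sanity-check that intuition (a hedged sketch of mine, not exact math): the snippet below assigns random tokens to each node, places replicas SimpleStrategy-style on the next RF-1 distinct nodes around the ring, and measures what fraction of node pairs share at least one token range; losing both nodes of such a pair takes some range below QUORUM at RF=3. The 50-node/256-vnode parameters mirror this thread, and the 6-node/8-vnode case discussed below can be checked by changing them.

import random
from itertools import combinations

def replica_sets(n_nodes, vnodes, rf=3):
    """SimpleStrategy-style placement: each token range is replicated on the
    node that owns it plus the next rf-1 distinct nodes clockwise on the ring."""
    ring = sorted((random.random(), node)
                  for node in range(n_nodes) for _ in range(vnodes))
    sets = []
    for i in range(len(ring)):
        replicas, j = [], i
        while len(replicas) < rf:
            node = ring[j % len(ring)][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.append(replicas)
    return sets

def fraction_of_unsafe_pairs(n_nodes, vnodes, rf=3):
    """Fraction of node pairs sharing at least one range; losing both nodes of
    such a pair makes some range unavailable at QUORUM when rf=3."""
    risky = {frozenset(pair)
             for replicas in replica_sets(n_nodes, vnodes, rf)
             for pair in combinations(replicas, 2)}
    return len(risky) / (n_nodes * (n_nodes - 1) // 2)

random.seed(42)
print(fraction_of_unsafe_pairs(n_nodes=50, vnodes=256))  # effectively 1.0
print(fraction_of_unsafe_pairs(n_nodes=6, vnodes=8))     # also very close to 1.0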
On 15 January 2018 at 19:59, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
Thanks Alexander!
I don't have an MS in math either, unfortunately.
I'm not sure, but it seems to me that the 2/49 probability in
your explanation doesn't take into account that vnode
endpoints are almost evenly distributed across all nodes (at
least that's what I can see in "nodetool ring" output).
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
Of course this vnodes illustration is a theoretical one, but
there are no 2 nodes on that diagram that can be switched off
without losing a key range (at CL=QUORUM).
That's because vnodes_per_node=8 > Nnodes=6.
As far as I understand, the situation gets worse as the
vnodes_per_node/Nnodes ratio increases.
Please correct me if I'm wrong.
How would the situation differ from this DataStax example
if we had a real-life 6-node cluster with 8 vnodes on each node?
Regards,
Kyrill
------------------------------------------------------------------------
*From:* Alexander Dejanovski <a...@thelastpickle.com>
*Sent:* Monday, January 15, 2018 8:14:21 PM
*To:* user@cassandra.apache.org
*Subject:* Re: vnodes: high availability
I was corrected off list that the odds of losing data when 2
nodes are down aren't dependent on the number of vnodes, but
only on the number of nodes.
The more vnodes, the smaller the chunks of data you may
lose, and vice versa.
I officially suck at statistics, as expected :)
On Mon, Jan 15, 2018 at 5:55 PM, Alexander Dejanovski
<a...@thelastpickle.com> wrote:
Hi Kyrylo,
the situation is a bit more nuanced than shown by the
Datastax diagram, which is fairly theoretical.
If you're using SimpleStrategy, there is no rack
awareness. Since vnode distribution is purely random,
and the replica for a vnode will be placed on the node
that owns the next vnode in token order (yeah, that's
not easy to formulate), you end up with statistics only.
I kinda suck at maths but I'm going to risk making a
fool of myself :)
The odds for one vnode to be replicated on another node
are, in your case, 2/49 (out of 49 remaining nodes, 2
replicas need to be placed).
Given you have 256 vnodes, the odds for at least one
vnode of a single node to exist on another one is
256*(2/49) = 10.4%
Since the relationship is bidirectional (the odds of node B
having a vnode replicated on node A are the same as the
opposite), that doubles the odds of 2 nodes both being
replicas for at least one vnode: 20.8%.
Having a smaller number of vnodes will decrease the
odds, just as having more nodes in the cluster.
(now once again, I hope my maths aren't fully wrong, I'm
pretty rusty in that area...)
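For anyone redoing this arithmetic, here is a hedged back-of-the-envelope variant that keeps the 2/49 per-vnode figure above but compounds it over the 256 vnodes instead of multiplying (treating the vnodes as independent is my simplification, not the author's); as the later message in this thread notes, the exact odds were corrected off list.

# Back-of-the-envelope only: assumes each of the 256 vnodes independently
# has a 2/49 chance of placing a replica on one specific other node.
p_per_vnode = 2 / 49
vnodes = 256

# Probability that at least one of node A's vnodes is replicated on node B,
# under that independence assumption.
p_a_on_b = 1 - (1 - p_per_vnode) ** vnodes
print(f"{p_a_on_b:.3%}")  # very close to 100% with these numbers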
How many queries that will affect is a different
question, as it depends on which partitions currently
exist and are queried in the unavailable token ranges.
Then you have the rack awareness that comes with
NetworkTopologyStrategy:
If the number of replicas (3 in your case) is
proportional to the number of racks, Cassandra will
spread replicas across different ones.
In that situation, you can theoretically lose as many
nodes as you want in a single rack and still have
two other replicas available to satisfy quorum in the
remaining racks.
If you start losing nodes in different racks, we're back
to doing maths (but the odds will get slightly different).
That makes maintenance predictable because you can shut
down as many nodes as you want in a single rack without
losing QUORUM.
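A minimal sketch of that rack argument (my own illustration, with hypothetical rack names): with RF equal to the number of racks, rack-aware placement puts one replica of every range in each rack, so losing an entire rack removes at most one replica per range and QUORUM (2 of 3) survives.

RF = 3
replica_racks = ("rack1", "rack2", "rack3")  # one replica of a range per rack

def quorum_survives(failed_racks):
    """True if a range with one replica per rack still has >= 2 live replicas."""
    alive = sum(1 for r in replica_racks if r not in failed_racks)
    return alive >= RF // 2 + 1

print(quorum_survives({"rack1"}))            # True: a whole rack can go down
print(quorum_survives({"rack1", "rack2"}))   # False: nodes lost in two racks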
Feel free to correct my numbers if I'm wrong.
Cheers,
On Mon, Jan 15, 2018 at 5:27 PM Kyrylo Lebediev
<kyrylo_lebed...@epam.com> wrote:
Thanks, Rahul.
But in your example, simultaneous loss of Node3
and Node6 leads to the loss of ranges N, C, J at
consistency level QUORUM.
As far as I understand, in case vnodes >
N_nodes_in_cluster and endpoint_snitch=SimpleSnitch,
since:
1) "secondary" replicas are placed on the two nodes
'next' to the node responsible for a range (in case
of RF=3)
2) there are a lot of vnodes on each node
3) ranges are evenly distributed between vnodes in
case of SimpleSnitch,
we get all physical nodes (servers) having mutually
adjacent token ranges.
Is that correct?
At least in the case of my real-world ~50-node cluster
with vnodes=256 and RF=3, this command:
nodetool ring | grep '^<ip-prefix>' | awk '{print $1}' | uniq | grep -B2 -A2 '<ip_of_a_node>' | grep -v '<ip_of_a_node>' | grep -v '^--' | sort | uniq | wc -l
returned a number equal to Nnodes - 1, which means
that I can't switch off 2 nodes at the same
time without losing some keyrange at CL=QUORUM.
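For what it's worth, here is a hedged Python equivalent of that pipeline (my own sketch; like the pipeline, it assumes `nodetool ring` prints the node address as the first column and lists rows in token order):

import sys

def ring_neighbours(lines, ip_prefix, target_ip, rf=3):
    # Mirror `grep '^<ip-prefix>' | awk '{print $1}'`: keep address rows only.
    owners = [ln.split()[0] for ln in lines if ln.startswith(ip_prefix)]
    # Collapse consecutive duplicates, mirroring `... | uniq`.
    owners = [ip for i, ip in enumerate(owners) if i == 0 or ip != owners[i - 1]]
    neighbours = set()
    for i, ip in enumerate(owners):
        if ip == target_ip:
            # Previous RF-1 and next RF-1 ring positions, like grep -B2 -A2.
            for j in range(i - (rf - 1), i + rf):
                neighbours.add(owners[j % len(owners)])
    neighbours.discard(target_ip)
    return neighbours

if __name__ == "__main__":
    # Usage: nodetool ring | python ring_neighbours.py <ip-prefix> <ip_of_a_node>
    ips = ring_neighbours(sys.stdin.readlines(), sys.argv[1], sys.argv[2])
    print(len(ips))  # Nnodes - 1 means there is no safe second node to stop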
Thanks,
Kyrill
------------------------------------------------------------------------
*From:* Rahul Neelakantan <ra...@rahul.be>
*Sent:* Monday, January 15, 2018 5:20:20 PM
*To:* user@cassandra.apache.org
*Subject:* Re: vnodes: high availability
Not necessarily. It depends on how the token ranges
for the vNodes are assigned to them. For example
take a look at this diagram
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
In the vNode part of the diagram, you will see that
the loss of Node 3 and Node 6 will still not have any
effect on Token Range A. But yes, if you lose two
nodes that both have Token Range A assigned to them
(say Node 1 and Node 2), you will have
unavailability with your specified configuration.
You can sort of circumvent this by using the
DataStax Java Driver and having the client recognize
a degraded cluster and operate temporarily in
downgraded consistency mode
http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
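The link above is for the Java driver; for completeness, here is a hedged sketch of the same idea with the DataStax Python driver (cassandra-driver), which ships its own DowngradingConsistencyRetryPolicy. The contact point, keyspace and table names below are placeholders.

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DowngradingConsistencyRetryPolicy
from cassandra import ConsistencyLevel

profile = ExecutionProfile(
    consistency_level=ConsistencyLevel.QUORUM,
    # Retries at a lower consistency level when too few replicas respond,
    # trading consistency for availability while the cluster is degraded.
    retry_policy=DowngradingConsistencyRetryPolicy(),
)
cluster = Cluster(["10.0.0.1"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("my_keyspace")
rows = session.execute("SELECT key FROM my_table LIMIT 1")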
- Rahul
On Mon, Jan 15, 2018 at 10:04 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
Hi,
Let's say we have a C* cluster with the following
parameters:
- 50 nodes in the cluster
- RF=3
- vnodes=256 per node
- CL for some queries = QUORUM
- endpoint_snitch = SimpleSnitch
Is it correct that any 2 nodes down will cause
unavailability of a keyrange at CL=QUORUM?
Regards,
Kyrill
--
-----------------
Alexander Dejanovski
France
@alexanderdeja
Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com