Modifying NTS in place would not be possible if it changes rack placement in a way that breaks existing clusters on upgrade. A strategy that changes placement like this would need a new name; a new strategy would be fine in trunk.
Logging a warning seems appropriate if RF > rack count. A discuss thread seems fine for this rather than a CEP to me.

— Scott

> On Mar 6, 2023, at 2:51 AM, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
>
> Hi all,
>
> some time ago we identified an issue with NetworkTopologyStrategy. The problem is that when RF > number of racks, it may happen that NTS places replicas in such a way that when a whole rack is lost, we lose QUORUM and data are not available anymore if QUORUM CL is used.
>
> To illustrate this problem, let's have this setup:
>
> 9 nodes in 1 DC, 3 racks, 3 nodes per rack, RF = 5. Then NTS could place replicas like this: 3 replicas in rack1, 1 replica in rack2, 1 replica in rack3. Hence, when rack1 is lost, we do not have QUORUM.
>
> It seems to us that there is already some logic around this scenario (1), but the implementation is not entirely correct: it does not compute the replica placement in a way that would address the above problem.
>
> We created a draft here (2, 3) which fixes it.
>
> There is also a test which simulates this scenario. When I assign 256 tokens to each node randomly (by the same means as the generatetokens command uses), compute natural replicas for 1 billion random tokens, and count the cases where 3 replicas out of 5 land in the same rack (so that losing it would lose QUORUM), for the above setup I get around 6%.
>
> For 12 nodes, 3 racks, 4 nodes per rack, RF = 5, this happens in 10% of cases.
>
> To interpret this number: with such a topology, RF and CL, when a random rack fails completely, a random read has a 6% chance (or 10%, respectively) that data will not be available.
>
> One caveat here is that NTS is not compatible with this new strategy anymore because it will place replicas differently. So I guess that fixing this in NTS will not be possible because of upgrades.
> I think people would need to set up a completely new keyspace and somehow migrate data if they wish, or they just start from scratch with this strategy.
>
> Questions:
>
> 1) Do you think this is meaningful to fix, and might it end up in trunk?
>
> 2) Should we not just ban this scenario entirely? It might be possible to check the configuration upon keyspace creation (rf > num of racks) and, if we see it is problematic, just fail that query. A guardrail, maybe?
>
> 3) People in the ticket mention writing a "CEP" for this, but I do not see any reason to do so. It is just a strategy like any other. What would that CEP even be about? Is this necessary?
>
> Regards
>
> (1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L126-L128
> (2) https://github.com/apache/cassandra/pull/2191
> (3) https://issues.apache.org/jira/browse/CASSANDRA-16203
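For anyone skimming the thread, the quorum arithmetic behind the 9-node example can be sketched in a few lines of Java. This is a toy illustration only, not Cassandra's actual NTS placement code; the class and method names here are made up for the sketch. It shows why the skewed placement {3, 1, 1} loses QUORUM on a single rack failure while an even placement {2, 2, 1} does not:

```java
import java.util.Arrays;

// Toy model of rack-level quorum availability (not Cassandra source code).
// An int[] holds the number of replicas placed in each rack.
public class RackQuorumDemo {

    // QUORUM for a replication factor rf is floor(rf / 2) + 1.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Replicas still reachable after losing the rack that holds the most replicas
    // (the worst case for a single-rack outage).
    static int survivorsAfterWorstRackLoss(int[] replicasPerRack) {
        int total = Arrays.stream(replicasPerRack).sum();
        int worst = Arrays.stream(replicasPerRack).max().orElse(0);
        return total - worst;
    }

    // True if QUORUM reads/writes can still succeed after losing any single rack.
    static boolean quorumSurvivesRackLoss(int[] replicasPerRack) {
        int rf = Arrays.stream(replicasPerRack).sum();
        return survivorsAfterWorstRackLoss(replicasPerRack) >= quorum(rf);
    }

    public static void main(String[] args) {
        int[] skewed = {3, 1, 1};  // placement NTS can produce when RF (5) > rack count (3)
        int[] even   = {2, 2, 1};  // placement a rack-aware strategy would choose
        System.out.println("quorum(5) = " + quorum(5));                           // 3
        System.out.println("skewed survives: " + quorumSurvivesRackLoss(skewed)); // false
        System.out.println("even survives:   " + quorumSurvivesRackLoss(even));   // true
    }
}
```

With the skewed placement, losing rack1 leaves only 2 of 5 replicas, below the quorum of 3; the even placement leaves at least 3 after any single rack loss, which is exactly the property the proposed strategy guarantees.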