RE: Trouble After Changing Replication Factor

Isaeed Mohanna Wed, 13 Oct 2021 05:08:01 -0700

Hi again
I did run repair -full without any parameters, which I understood would run
repair for all keyspaces, but I do not recall seeing validation tasks running
on one of my two main keyspaces, which hold most of the data. Maybe it failed
or didn't run.
Anyhow, I tested with a small app on a small table that I have: the app would
fail before the repair, and after running repair -full on that specific table it
runs fine. So I am now running a full repair on the problematic keyspace, and
hopefully all will be fine when the repair is done.
I am left wondering, though, why Cassandra allows this to happen. Most other
operations are somewhat guarded, so one would expect the RF change not to
complete until the actual changes have been carried out. I was surprised that
CL1 reads fail, which could cause serious data inconsistencies. Maybe waiting
for the changes is not realistic with large datasets, but I think the
documentation should warn that reads with CL1 may fail (return empty results)
until a full repair is completed.
Thanks everyone for the help,
Isaeed Mohanna

From: Jeff Jirsa <[email protected]>
Sent: Tuesday, October 12, 2021 4:59 PM
To: cassandra <[email protected]>
Subject: Re: Trouble After Changing Replication Factor

The most likely explanation is that repair failed and you didn't notice.
Or that you didn't actually repair every host / every range.

Which version are you using?
How did you run repair?

On Tue, Oct 12, 2021 at 4:33 AM Isaeed Mohanna
<[email protected]> wrote:
Hi
Yes, I am sacrificing consistency to gain higher availability and speed, but
my problem is not with newly inserted data that is missing for a very short
period of time. My problem is that data which existed before the RF change
still does not exist on all replicas, even after repair.
It looks like my cluster configuration is RF3 but the data itself is still
replicated as RF2, and when the data is requested from the 3rd (new) replica,
it is not there and an empty record is returned with read CL1.
What can I do to force this data to be synced to all replicas as it should be,
so that a CL1 read request actually returns the correct result?

Thanks

From: Bowen Song <[email protected]>
Sent: Monday, October 11, 2021 5:13 PM
To: [email protected]
Subject: Re: Trouble After Changing Replication Factor

You have RF=3 and both read & write CL=1, which means you are asking Cassandra
to give up strong consistency in order to gain higher availability and perhaps
slightly faster speed, and that's what you get. If you want strong
consistency, you need to make sure that (read CL + write CL) > RF.
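Bowen's condition is a pigeonhole argument: if a read waits for R replicas and
a write was acknowledged by W replicas out of RF, then R + W > RF forces the two
sets to share at least one node, so the read sees the write. A minimal Python
sketch, assuming consistency levels are counted as numbers of replicas and a
single data center (so QUORUM is RF // 2 + 1); the function names are
hypothetical and not part of any Cassandra driver:

```python
def quorum(rf: int) -> int:
    """Replicas a QUORUM request waits for (single data center assumed)."""
    return rf // 2 + 1


def overlaps(read_cl: int, write_cl: int, rf: int) -> bool:
    """True if every possible read replica set intersects every write set.

    Pigeonhole: if R + W > RF, any R replicas and any W replicas chosen out
    of RF must share at least one node, so the read reaches the latest write.
    """
    return read_cl + write_cl > rf


if __name__ == "__main__":
    rf = 3
    for name, r, w in [
        ("ONE/ONE", 1, 1),
        ("QUORUM/QUORUM", quorum(rf), quorum(rf)),
        ("ONE/ALL", 1, rf),
    ]:
        print(f"RF={rf} read/write {name}: strong={overlaps(r, w, rf)}")
```

With RF=3, ONE/ONE gives 1 + 1 = 2, which is not greater than 3, matching
Bowen's point, while QUORUM/QUORUM gives 2 + 2 = 4. Note that the argument
assumes each write already reached W of the current RF replicas, which is
exactly what does not hold for rows written before the RF change and never
successfully repaired.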
On 10/10/2021 11:55, Isaeed Mohanna wrote:
Hi
We had a cluster with 3 nodes and replication factor 2, and we were reading
with consistency level ONE.
We recently added a 4th node and changed the replication factor to 3. Once
this was done, apps reading from the DB with CL1 would receive an empty record.
Looking around, I was surprised to learn that after changing the replication
factor, if the read request is sent to a node that should own the record
according to the new replication factor but doesn't have it yet, an empty
record will be returned because of CL1; the record is only written to that node
once the repair operation is over.
We ran the repair operation, which took days in our case (we had to switch the
apps to CL2 to avoid serious data inconsistencies).
Now the repair operations are over, and if I revert to CL1 we still get errors
that records do not exist in the DB while they do; using CL2 again, it works
fine.
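The situation described above (rows written under RF2 that are still missing
from the third replica) can be illustrated by enumerating which replica sets a
read could contact. This is a simplified Python sketch with hypothetical node
names: it ignores how the coordinator and snitch actually pick replicas,
ignores read repair, and treats a replica without the row as returning an
empty result.

```python
from itertools import combinations


def reads_that_miss(
    replicas: list[str], holders: set[str], read_cl: int
) -> list[tuple[str, ...]]:
    """Return every set of `read_cl` replicas that contains no copy of the row."""
    return [
        contacted
        for contacted in combinations(replicas, read_cl)
        if not holders.intersection(contacted)
    ]


if __name__ == "__main__":
    # Hypothetical placement after raising RF from 2 to 3 but before a
    # successful full repair: the row still lives only on two of the replicas.
    replicas = ["node1", "node2", "node3"]  # replicas under the new RF=3
    holders = {"node1", "node2"}            # replicas that actually hold the row
    for cl in (1, 2, 3):
        misses = reads_that_miss(replicas, holders, cl)
        print(f"CL{cl}: replica sets that miss the row: {misses}")
```

Under these assumptions only a CL1 read can land on the empty replica, while
any 2 of the 3 replicas always include one that holds the row. That is
consistent with CL2 working while CL1 still misses data; in this model, a
repair that actually completes for the range is what adds node3 to the holders.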
Any ideas what I am missing?
Is there a way to validate that the repair task has actually done what is
needed and that the data is now actually replicated with RF3?
Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get
the record, but I cannot tell whether I am hitting the replica that doesn't
hold the record.
Thanks for your help
