Thanks Kurt. 

I think the main scenario which MUST be addressed by snapshots is Backup/Restore, 
so that a node can be restored with minimal time and the lengthy procedure of 
bootstrapping with join_ring=false followed by a full repair can be avoided. The 
plain "restore snapshot + repair" scenario seems to be broken; the situation is 
less critical when you use join_ring=false.
Changing the consistency level to ALL is not an optimal solution or workaround 
because it may impact performance. Moreover, it is an unreasonable and unstated 
assumption that Cassandra users can dynamically change the CL and then revert it 
after the repair.
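For reference, this is roughly what that workaround looks like from the 
application side with the DataStax Python driver (the contact point, keyspace 
and table names below are just placeholders): every read has to be issued at 
ALL while the repair runs, and then the level has to be reverted afterwards.

    # Sketch of the "read at ALL until repair completes" workaround.
    # Contact point, keyspace and table are hypothetical placeholders.
    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.0.0.1'])
    session = cluster.connect('my_keyspace')

    # While the restored node is still being repaired, force reads to ALL so
    # a stale replica on that node cannot be returned on its own.
    read_all = SimpleStatement("SELECT * FROM my_table WHERE id = %s",
                               consistency_level=ConsistencyLevel.ALL)
    row = session.execute(read_all, (42,)).one()

    # Once the repair has finished, every statement has to be switched back
    # to the normal level (e.g. LOCAL_QUORUM), which is exactly the
    # operational burden described above.
    read_all.consistency_level = ConsistencyLevel.LOCAL_QUORUM
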
The ideal restore process should be (see the sketch after this list):
1. Restore the snapshot.
2. Start the node with join_ring=false.
3. Cassandra should ACCEPT writes in this phase, just like bootstrap with join_ring=false.
4. Repair the node.
5. Join the node to the ring.
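For illustration, the flow I have in mind looks roughly like the outline below. 
Treat it as a sketch only: how the JVM flag is passed and which repair options 
apply depend on your packaging and version, and step 3 is precisely the 
behaviour that appears to be missing.

    # Outline of the restore flow above (not a ready-to-run script).
    import subprocess

    def run(*cmd):
        # Run a command and fail loudly if it exits non-zero.
        subprocess.run(cmd, check=True)

    # 1. Restore the snapshot by copying the SSTables back into the data
    #    directories (elided here).

    # 2. Start the node outside the ring (how the flag is passed varies
    #    with the install).
    run('cassandra', '-Dcassandra.join_ring=false')

    # 3. (The missing piece) the node should keep accepting writes at this
    #    point, just as it would during bootstrap, so it does not fall
    #    behind while the repair runs.

    # 4. Repair the restored node.
    run('nodetool', 'repair', '--full')

    # 5. Only now join the ring.
    run('nodetool', 'join')
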
Point 3 seems to be missing in the current implementation of join_ring. Thus, at 
step 5, when the node joins the ring, it will NOT lead to inconsistent writes, as 
all the data updates made after the snapshot was taken and before the snapshot 
was restored are consistent on all the nodes. BUT now the node has missed 
important updates done while the repair was going on. So the full repair didn't 
sync the entire data: it fixed inconsistencies and prevented inconsistent reads, 
but led to NEW inconsistencies. You need another full repair on the node :(
I will conduct a test to be 100% sure that join_ring is not accepting writes, 
and if I get the same results, I will create a JIRA.
We are updating the file system on the nodes, one node at a time, to avoid 
downtime. Snapshot restore cuts down on excessive streaming and the lengthy 
procedure (bootstrap + repair), so we were evaluating it as an option.

Thanks
Anuj


   

On Wednesday, 28 June 2017 5:56 PM, kurt greaves <k...@instaclustr.com> wrote:

There are many scenarios where it can be useful, but to address what seems to 
be your main concern: you could simply restore and then read only at ALL until 
your repair completes.
If you use snapshot restore with commitlog archiving you're in a better state, 
but granted, the case you described can still occur. To some extent, if you have 
to restore a snapshot you will have to perform some kind of repair. It's not 
really possible to restore to an older point and expect strong consistency.

Snapshots are also useful for creating a clone of a cluster/node.
But really, why are you only restoring a snapshot on one node? If you lost all 
the data, it would be much simpler to just replace the node.

   
