Good morning all.
Hypothetical Setup:
1 data center
RF = 3
Total nodes > 3
Problem:
Suppose I need maximum consistency for one critical operation, so I
specify CL = ALL for reads. However, the read will fail if even one
replica endpoint is down. I don't see why this failure is always
necessary, since the data could have been updated after the node became
unavailable, in which case its copy is stale anyway. If only one node
goes down and it holds a replica of the key I need, then the app is no
longer 100% available, and it could take some time to bring the node
back.
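
To make the problem concrete, here is a minimal sketch in Python (not
Cassandra code) of how many replicas have to answer a read at each
level. The function names are made up for illustration; the counts are
the usual ones (ONE = 1, QUORUM = RF/2 + 1 rounded down, ALL = RF).

```python
def required_replicas(cl: str, rf: int) -> int:
    """Number of replicas that must answer a read at consistency level cl."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {cl}")


def read_can_succeed(cl: str, rf: int, live_replicas: int) -> bool:
    """True if enough replicas of the key are up to satisfy the read."""
    return live_replicas >= required_replicas(cl, rf)


# RF = 3 with one replica of the key down: QUORUM still works, ALL does not.
print(read_can_succeed("QUORUM", rf=3, live_replicas=2))  # True
print(read_can_succeed("ALL", rf=3, live_replicas=2))     # False
```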
Proposal:
If all of the *available* replica nodes answer the read operation and
the latest value timestamp is clearly AFTER the time the down node
became unavailable, then the read can meet the requirements for
*near* 100% consistency, since the value on the down node must be
outdated: the value was clearly written some time *after* the node
went down or became unreachable. This way, you get maximum availability
when reading with CL.ALL, or with some CL close in meaning to ALL.
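
Roughly, the rule I have in mind could be sketched like this. It is
plain Python, not a patch: ReplicaResponse, UnavailableError,
read_all_avail, the down_since map (the time at which the coordinator
decided each replica was down) and the margin parameter are all
hypothetical names, and I am assuming the coordinator can know those
times in the same units as the write timestamps.

```python
from typing import Dict, List, NamedTuple


class ReplicaResponse(NamedTuple):
    replica: str
    value: str
    write_ts: int  # write timestamp of the value this replica returned


class UnavailableError(Exception):
    """Raised when the proposed ALL_AVAIL rule cannot be satisfied."""


def read_all_avail(
    responses: List[ReplicaResponse],
    down_since: Dict[str, int],
    margin: int = 0,
) -> str:
    """Return the newest value if it is newer than every down replica's outage.

    responses:  answers from all replicas that are currently available
    down_since: hypothetical map of down replica -> time it was marked down
    margin:     hypothetical safety margin so "clearly AFTER" is not a tie
    """
    if not responses:
        raise UnavailableError("no live replica answered")
    newest = max(responses, key=lambda r: r.write_ts)
    for replica, went_down_at in down_since.items():
        # The down replica could hold something newer than what we saw,
        # so the read must fail just as CL.ALL would.
        if newest.write_ts <= went_down_at + margin:
            raise UnavailableError(f"{replica} may hold a newer value")
    return newest.value


live = [ReplicaResponse("n1", "v2", 200), ReplicaResponse("n2", "v2", 200)]
print(read_all_avail(live, {"n3": 150}))  # v2 (written after n3 went down)
```

With the same responses and down_since = {"n3": 250} the call raises
UnavailableError, which is the same outcome CL.ALL gives today.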
I say "near" 100% consistency to leave room for the case where the
unavailable node was only unavailable to the coordinating node, for
example because of a network issue, and so still received an update by
some other route after it "appeared" unavailable to the current
coordinator. In that case there is a chance the read will still not
return the latest value, so this is not the true 100% consistency that
CL.ALL guarantees. However, I think this logic could justify a new
consistency level slightly lower than ALL, such as ALL_AVAIL.
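
Here is that corner case as a toy example, again in plain Python with
made-up node names and timestamps. Node n3 only looks down to
coordinator A, and it has accepted a newer write that n1 and n2 have
not received yet (how that could happen is exactly the assumption
here).

```python
# Coordinator A's view: replica n3 looked down from t=100 onward.
down_since_seen_by_a = 100

# What each replica actually holds: (value, write timestamp).
replicas = {
    "n1": ("v2", 120),
    "n2": ("v2", 120),
    "n3": ("v3", 150),  # arrived by another route; n1 and n2 lack it so far
}
reachable_from_a = ["n1", "n2"]


def newest(names):
    """Newest (value, timestamp) pair among the named replicas."""
    return max((replicas[n] for n in names), key=lambda pair: pair[1])


seen_value, seen_ts = newest(reachable_from_a)
rule_passes = seen_ts > down_since_seen_by_a  # the ALL_AVAIL check
true_value, _ = newest(replicas)              # what CL.ALL would have seen

print(rule_passes)             # True: the read is allowed
print(seen_value, true_value)  # v2 v3: but it returned a stale value
```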
What do you think? Is my logic correct? Is there a conflict with the
architecture or base principles? This fits with the tunable consistency
principle for sure.
Thanks for listening