Handling uncommitted paxos state

Nicholas Wilson Thu, 25 Feb 2016 01:33:12 -0800

Hi,

I have some questions about the behaviour of 'uncommitted paxos state', as 
described here:


http://www.datastax.com/dev/blog/cassandra-error-handling-done-right

If a WriteTimeoutException with WriteType.SIMPLE is thrown for a CAS write, 
that means that the paxos phase was successful, but the data couldn't be 
committed during the final 'commit/reset' phase. On the next SERIAL write or 
read, any other node can commit the write on behalf of the original proposer, 
and must do so in fact before forming a new ballot. This stops the columns from 
getting 'stuck' if the coordinator experiences a network partition after 
forming the ballot, but before committing.
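
To make that understanding concrete, here is a toy Python model of the 
behaviour as I read it. It is not Cassandra code: the Replica class, its 
methods and serial_read are names invented for illustration, and the key is 
deliberately opaque, since the real granularity of the paxos state is one of 
the things I'm unsure about.

```python
# Toy model only; NOT Cassandra's implementation. All names are hypothetical.

class Replica:
    def __init__(self):
        self.committed = {}    # key -> value visible to ordinary reads
        self.in_progress = {}  # key -> (ballot, value) accepted but not yet committed

    def accept(self, key, ballot, value):
        # The paxos phase succeeded: remember the winning proposal.
        self.in_progress[key] = (ballot, value)

    def commit(self, key, ballot):
        # The commit phase: apply the accepted value and clear the paxos state.
        pending = self.in_progress.get(key)
        if pending is not None and pending[0] == ballot:
            self.committed[key] = pending[1]
            del self.in_progress[key]


def serial_read(replica, key):
    # A SERIAL operation first finishes any accepted-but-uncommitted proposal
    # on behalf of the original proposer, and only then does its own work.
    pending = replica.in_progress.get(key)
    if pending is not None:
        replica.commit(key, pending[0])
    return replica.committed.get(key)


replica = Replica()
replica.accept("row1", ballot=1, value="new")
# The coordinator is partitioned here, so commit() is never called by it.

plain = replica.committed.get("row1")  # the write is not yet visible
serial = serial_read(replica, "row1")  # the SERIAL read completes the write first
```

In this model the accepted proposal simply sits in in_progress until some 
later SERIAL operation on the same key completes it.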

My questions are on the durability of the uncommitted state:

Suppose CAS writes are infrequent, and it takes weeks before another write is 
done to that column; will the paxos state still be there, waiting forever until 
the next commit, or does it get automatically committed during GC if you wait 
long enough? (I don't see how it could be cleaned up by a GC though, since the 
nodes holding the paxos state don't know if the ballot was won or not.)

Or, what if all the nodes are switched off (briefly); is the uncommitted paxos 
state persisted to disk in the log/journal, so the write can still be completed 
when the cluster comes back online?

Finally, how granular is the paxos state? Will the uncommitted write be 
completed on the next SERIAL write that touches the same exact combination of 
cells, or is it per-column across the partition, or something else? If the CAS write 
touches two or three cells in the row, will a subsequent SERIAL read of any 
one of those columns complete the uncommitted state, presumably for the 
other columns as well?

Thanks for your help,
Nick

---
Nick Wilson
Software engineer, RealVNC
