Hi, I have some questions about the behaviour of 'uncommitted paxos state', as described here:
http://www.datastax.com/dev/blog/cassandra-error-handling-done-right

If a WriteTimeoutException with WriteType.SIMPLE is thrown for a CAS write, that means that the paxos phase was successful, but the data couldn't be committed during the final 'commit/reset' phase. On the next SERIAL write or read, any other node can commit the write on behalf of the original proposer, and in fact must do so before forming a new ballot. This stops the columns from getting 'stuck' if the coordinator experiences a network partition after forming the ballot, but before committing.

My questions are on the durability of the uncommitted state:

Suppose CAS writes are infrequent, and it takes weeks before another write is done to that column; will the paxos state still be there, waiting forever until the next commit, or does it get automatically committed during GC if you wait long enough? (I don't see how it could be cleaned up by a GC though, since the nodes holding the paxos state don't know if the ballot was won or not.)

Or, what if all the nodes are switched off (briefly); is the uncommitted paxos state persisted to disk in the log/journal, so the write can still be completed when the cluster comes back online?

Finally, how granular is the paxos state? Will the uncommitted write be completed on the next SERIAL write that touches the same exact combination of cells, or is it per-column across the partition, or...? If the CAS write touches two or three cells in the row, will a subsequent SERIAL read from any one of those columns complete the uncommitted state, presumably on the other columns as well?

Thanks for your help,
Nick

---
Nick Wilson
Software engineer, RealVNC