On 6/22/2011 9:18 AM, Trevor Smith wrote:
Right -- that's the part that I am more interested in fleshing out in this post.


Here is one way: use MVCC <http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. A single global clean-up process would be acceptable because it is not a single point of failure, only a single point where back-logged work accumulates. It will not affect availability, or the validity of subsequent reads, as long as you are notified when that process terminates and restart it in a reasonable amount of time.

So, you would have a "balance" column, and each update would create a "balance_<timestamp>" column with a positive or negative value indicating a credit or debit. Subsequent clients read the latest value by doing a slice from "balance" to "balance_~" (i.e., all "balance*" columns). (You would have to work out your column naming conventions so that your slices return only the pertinent columns.) The clients then apply all the credits and debits to the balance to get the current balance.
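As a rough illustration of the read path, here is a minimal in-memory sketch in Python. It assumes a row can be modeled as a dict of column name to value and that timestamps are fixed-width digit strings; slice_columns and current_balance are hypothetical helper names, not part of any Cassandra client API.

```python
from decimal import Decimal

def slice_columns(row, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in name order.

    Stands in for a column slice; a real client call would look different.
    """
    return [(name, row[name]) for name in sorted(row) if start <= name <= finish]

def current_balance(row):
    """Sum the base balance and every credit/debit column in the slice."""
    columns = slice_columns(row, "balance", "balance_~")
    return sum((value for _, value in columns), Decimal("0"))

# Hypothetical account row: a base balance, one debit, one credit,
# plus an unrelated column that the slice must not pick up.
row = {
    "balance": Decimal("100.00"),
    "balance_1308748680000": Decimal("-25.00"),
    "balance_1308748690000": Decimal("10.50"),
    "owner": "alice",
}

print(current_balance(row))  # 85.50
```

The "~" end marker works here because it sorts after digits and letters in ASCII, so every "balance_<timestamp>" column falls inside the range.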

This handles the lost update problem.

For the dirty read and incorrect summary problems, where others read data that is in the middle of a transaction that hasn't committed yet, I would add a final transaction column to a Transactions CF. The key would be <cf>.<key>.<column>, e.g., Accounts.1234.balance, with 1234 being the account # and Accounts being the CF owning the balance column. A new column would then be added for each successful transaction (e.g., after debiting and crediting the two accounts) using the same timestamp used in balance_<timestamp>. So, now, a client wanting the current balance would have to do a slice for all of the transactions for that column and only apply the balance updates up to the latest transaction. Note, you might have to do something else with the transaction naming scheme to make sure the names are guaranteed to be unique, but you get the idea. If the transaction fails, the client simply does not add a transaction column to Transactions and deletes any "balance_<timestamp>" columns it added to the Accounts CF (or lets the clean-up process do it... carefully).
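The committed-read rule above could look roughly like this. It is an in-memory sketch under assumptions: the Transactions row for Accounts.1234.balance is modeled as a dict whose column names are timestamps, and committed_balance is a hypothetical name.

```python
from decimal import Decimal

PREFIX = "balance_"

def committed_balance(account_row, txn_row):
    """Apply the base balance plus deltas up to the latest committed timestamp.

    account_row: columns of the account row (name -> Decimal).
    txn_row: columns of the Transactions row, named by timestamp (assumed layout).
    """
    total = account_row.get("balance", Decimal("0"))
    if not txn_row:
        return total  # nothing has committed yet, so ignore every delta
    latest = max(int(ts) for ts in txn_row)
    for name, value in account_row.items():
        if name.startswith(PREFIX) and int(name[len(PREFIX):]) <= latest:
            total += value
    return total

account_row = {
    "balance": Decimal("100"),
    "balance_1000": Decimal("-25"),  # committed debit
    "balance_2000": Decimal("10"),   # still in flight, no transaction column yet
}
txn_row = {"1000": ""}  # one committed transaction, same timestamp as the debit

print(committed_balance(account_row, txn_row))  # 75
```

The in-flight credit at timestamp 2000 is skipped until a matching transaction column shows up, which is what keeps readers from seeing half of a transfer.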

This should avoid the need for locks, and as long as each account doesn't have a crazy amount of updates, the slices shouldn't be so large as to be a significant perf hit.

A note about the updates: you have to make sure the clean-up process processes the updates in order and only once. If you can't guarantee both, then you'll have to make sure your updates are idempotent and commutative.
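One way to picture a clean-up pass that tolerates re-runs and reordering is sketched below. It is in-memory only and assumes the fold into "balance" and the removal of the folded columns happen as one step, which a real clean-up process would have to arrange itself; compact is a hypothetical name.

```python
from decimal import Decimal

PREFIX = "balance_"

def compact(row, up_to):
    """Fold deltas with timestamp <= up_to into 'balance' and drop them.

    Addition is commutative, so the order in which deltas are visited does
    not matter. A second run finds nothing left to fold, so it is a no-op.
    """
    new_row = dict(row)
    total = new_row.get("balance", Decimal("0"))
    for name in list(new_row):
        if name.startswith(PREFIX) and int(name[len(PREFIX):]) <= up_to:
            total += new_row.pop(name)
    new_row["balance"] = total
    return new_row

row = {
    "balance": Decimal("100"),
    "balance_1000": Decimal("-25"),
    "balance_1200": Decimal("5"),
    "balance_2000": Decimal("10"),  # newer than the cut-off, left alone
}

once = compact(row, 1500)
twice = compact(once, 1500)
print(once["balance"], once == twice)  # 80 True
```

If the fold and the delete cannot be made a single step, a crash between them would apply a delta twice on the next run, which is exactly the case the in-order, only-once requirement is guarding against.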

Oh yeah, and you must use QUORUM reads/writes, of course.
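The reason QUORUM matters is the usual overlap arithmetic: with N replicas, a read of R replicas and a write to W replicas are guaranteed to share at least one replica when R + W > N. A small sketch, with hypothetical helper names:

```python
def quorum(replication_factor):
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1

def overlaps(r, w, n):
    """True if any r-replica read must intersect any w-replica write."""
    return r + w > n

n = 3
print(quorum(n))                           # 2
print(overlaps(quorum(n), quorum(n), n))   # True: QUORUM read + QUORUM write
print(overlaps(1, 1, n))                   # False: ONE read + ONE write
```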

Any critiques?

aj
