I assume it's a hash to detect read/write races. As an example:

1. Actor 1 reads key = (1, 'whatever') and gets value = V0.
2. Actor 2 writes to key (1, 'whatever') with new value V1.
3. Actor 1 writes an anti-column with key = (0, 'whatever') and value = md5(V0).
4. Later, if someone wants to read the value of 'whatever', they first read key = (0, 'whatever'), getting md5(V0), and then read key = (1, 'whatever'), getting V1. They can then check "does the hash of the value match the anti-column value?" It doesn't, so they know the anti-column value is old and they can ignore it (the value has been updated more recently than the delete).

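To make the steps above concrete, here is a rough Python sketch of the check as I understand it. It is only my guess at the idea, not Instagram's actual code: a plain dict stands in for the Cassandra row, and the helper names (`write_value`, `write_undo`, `read_live`) are made up for illustration.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Hex MD5 digest of a value (used only as a fingerprint here)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def write_value(store: dict, name: str, value: str) -> None:
    """Write the data column, key = (1, name). Hypothetical helper."""
    store[(1, name)] = value


def write_undo(store: dict, name: str, value_seen: str) -> None:
    """Write the anti-column, key = (0, name), holding md5 of the value
    the deleting actor saw when it read. Hypothetical helper."""
    store[(0, name)] = md5_hex(value_seen)


def read_live(store: dict, name: str):
    """Return the current value, or None if it is missing or deleted."""
    value = store.get((1, name))
    if value is None:
        return None
    anti = store.get((0, name))
    if anti is not None and anti == md5_hex(value):
        # The undo was written against exactly this value: treat as deleted.
        return None
    # No anti-column, or a stale one (hash mismatch): the value is live.
    return value


store = {}
write_value(store, "whatever", "V0")
seen = store[(1, "whatever")]          # step 1: actor 1 reads V0
write_value(store, "whatever", "V1")   # step 2: actor 2 writes V1
write_undo(store, "whatever", seen)    # step 3: actor 1 writes md5(V0)
print(read_live(store, "whatever"))    # step 4: prints V1, the undo is stale
```

If you drop the step 2 write, the data column still holds V0, the hashes match, and `read_live` returns None.
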
If step 2 did not happen, then the reader in step 4 would instead obtain V0 as the value of key = (1, 'whatever'), and the anti-column value *would* match the hash of the data column value. In that case they would know that the data has been deleted and they should ignore it.

Cheers,
Ian

On Sun, Sep 7, 2014 at 4:25 AM, Raymond Lau <raymond.lau...@gmail.com> wrote:
> So I watched Instagram’s presentation about Cassandra and how they handle
> undos/deletes (http://youtu.be/xDtclzE4ydA?t=12m55s) and how to get
> around the race condition that a get-before-write causes.
>
> They use this anti-column that stores an action where the first component
> of the composite column is a 0 or 1, 0 if it’s an undo, 1 if it’s the
> action. The second component of the composite column is the md5 hash of
> the activity if it is an undo (anti-column = 0), and the actual data
> pertaining to the activity if anti-column = 1. Why is the undo activity
> stored as an md5 hash? Do they md5 hash everything (both anti column = 1
> and anti column = 0), compare the two lists, and negate everything where
> the md5 hashes match?
>
> -Raymond
>