> Atomicity
> All individual writes are atomic at the row level.  So, a batch mutate for
> one specific key will apply updates to all the columns for that one specific
> row atomically.  If part of the single-key batch update fails, then all of
> the updates will be reverted since they all pertained to one key/row.
> Notice, I said 'reverted' not 'rolled back'.  Note: atomicity and isolation
> are related to the topic of transactions but one does not imply the other.
> Even though row updates are atomic, they are not isolated from other users'
> updates or reads.
> Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Atomicity is sort of provided, but there's no reversion going on.
Cassandra validates batch mutations prior to applying them, and then
tries to apply them. In the absence of bugs in Cassandra, it should
generally be safe to say that the writes are then guaranteed to
succeed. However, I wouldn't necessarily rely on this type of
atomicity to the same degree that I would in e.g. PostgreSQL.
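To make the "validate, then apply, no rollback" shape concrete, here is a
toy sketch in plain Python (not Cassandra internals; the store is just a
dict): nothing touches the row until the whole batch has passed
validation, so a rejected batch leaves no partial state to revert.

```python
def apply_batch(store, key, columns):
    """Toy model of validate-then-apply batch semantics."""
    # Validation pass: the entire batch is checked before any write.
    for name, value in columns.items():
        if not isinstance(name, str) or value is None:
            raise ValueError("invalid mutation: %r" % (name,))
    # Apply pass: all columns for this one key land in a single update.
    store.setdefault(key, {}).update(columns)

db = {}
apply_batch(db, "row1", {"a": 1, "b": 2})
try:
    apply_batch(db, "row1", {"c": 3, "bad": None})
except ValueError:
    pass
# The failed batch left nothing behind: "c" was never applied,
# so there was nothing to roll back.
```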

One example of violated atomicity is when you run with the periodic
commit log mode instead of batch. If, for example, you perform a write
at CL.ONE but the node that took the write is killed (e.g. SIGKILL)
before the periodic commit log flush, you will have acknowledged a
write that then gets dropped. If someone read the changes that the
write entails, the application-visible behavior will be that the write
is "undone" rather than eventually done.
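A toy model of that failure mode (plain Python, not Cassandra code; the
flush trigger is simplified to "every N writes" rather than a timer):
writes are acknowledged from memory, but only the flushed portion
survives a crash.

```python
# Toy model of periodic commit-log sync: acknowledged writes live in
# memory and are only made crash-safe at the next periodic flush.
class PeriodicCommitLog:
    def __init__(self, flush_every):
        self.flush_every = flush_every
        self.memtable = {}   # acknowledged, possibly unflushed
        self.disk = {}       # survives a crash
        self.pending = 0

    def write(self, key, value):
        self.memtable[key] = value   # the client sees success here
        self.pending += 1
        if self.pending >= self.flush_every:
            self.disk.update(self.memtable)
            self.pending = 0

    def crash_and_restart(self):
        # SIGKILL before the periodic flush: unflushed writes vanish.
        self.memtable = dict(self.disk)
        self.pending = 0

node = PeriodicCommitLog(flush_every=3)
node.write("k", "v1")              # acknowledged at CL.ONE...
reader_saw = node.memtable["k"]    # ...and visible to a reader
node.crash_and_restart()
# "k" is gone: an acknowledged, already-read write was "undone".
```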

> Consistency
> If you want 100% consistency, use consistency level QUORUM for both reads
> and writes and EACH_QUORUM in a multi-dc scenario.
> Refs: http://wiki.apache.org/cassandra/ArchitectureOverview

For the limited definition of consistency it provides, yes. One thing
to be aware of is that *failed* writes at QUORUM followed by
*succeeding* reads at QUORUM may have readers see inconsistent results
across requests (see
https://issues.apache.org/jira/browse/CASSANDRA-2494 although I still
think it's a designed-for behavior rather than a bug). And of course
the usual bits about concurrent updates and updates spanning multiple
rows.

I'm just a bit hesitant to agree to the term "100% consistency" since
it sounds very all-encompassing :)
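For intuition on why QUORUM writes followed by QUORUM reads behave this
way at all, here is the generic quorum-overlap argument checked by brute
force (not Cassandra code): with N replicas and quorum = N // 2 + 1,
every read quorum intersects every write quorum, so R + W > N and the
reader can pick the newest timestamp among the replies.

```python
from itertools import combinations

N = 5
quorum = N // 2 + 1          # 3 of 5
replicas = range(N)

# Every possible write quorum shares at least one replica with every
# possible read quorum, because quorum + quorum > N.
for write_set in combinations(replicas, quorum):
    for read_set in combinations(replicas, quorum):
        assert set(write_set) & set(read_set), "no overlap!"
```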

> Isolation
> NOTHING is isolated; because there is no transaction support in the first
> place.  This means that two or more clients can update the same row at the
> same time.  Their updates of the same or different columns may be
> interleaved and leave the row in a state that may not make sense depending
> on your application.  Note: this doesn't mean to say that two updates of the
> same column will be corrupted, obviously; columns are the smallest atomic
> unit ('atomic' in the more general thread-safe context).
> Refs: None that directly address this explicitly and clearly and in one
> place.

Yes, but the relevant lack of isolation is on reads. Due to
Cassandra's conflict resolution model, given two updates with certain
timestamps associated with them, the actual timing of the writes will
not change the eventual result in the data (absent read-before-write
logic operating on that data concurrently).

The lack of isolation is thus mostly of concern to readers.
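The order-independence of that conflict resolution can be sketched as a
toy last-write-wins cell (plain Python, not Cassandra code): each value
carries a client-supplied timestamp, the highest timestamp wins, and so
the arrival order of the writes does not change the eventual result.

```python
def resolve(current, incoming):
    # current/incoming are (timestamp, value) pairs; keep the newer one.
    # (Ties fall back to comparing values, which keeps max() deterministic.)
    return max(current, incoming) if current is not None else incoming

updates = [(2, "from-client-B"), (1, "from-client-A")]

cell = None
for u in updates:               # one arrival order...
    cell = resolve(cell, u)

cell2 = None
for u in reversed(updates):     # ...and the opposite order
    cell2 = resolve(cell2, u)

# Both orders converge on the update with the highest timestamp.
assert cell == cell2 == (2, "from-client-B")
```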

> Durability
> Updates are made durable by the use of the commit log.  No worries here.

But be careful to choose the batch commit log sync mode instead of
periodic if single-node durability, or durability beyond a quorum
write, is a concern.
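Concretely, that choice lives in cassandra.yaml (option names as found
in cassandra.yaml; check the file shipped with your version, and tune
the windows to taste):

```yaml
# Periodic (the default): fsync the commit log every N ms. Writes
# acknowledged between flushes can be lost if the node dies first.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Batch: fsync before acknowledging, grouping writes that arrive
# within the window. Stronger single-node durability, higher latency.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```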

-- 
/ Peter Schuller
