On 6/30/2011 1:57 PM, Jeremiah Jordan wrote:
For your Consistency case, it is actually an ALL read that is needed,
not an ALL write. An ALL read, with whatever consistency level of write
that you need (to support machines dying), is the only way to get
consistent results in the face of a failed write at a consistency level
> ONE that went to one node, but not the others.
True, an ALL read is the best and final test for consistency for that
read. I think an ALL write is more of a preemptive measure. If you
know you'll be needing consistency later, better to get it in while you
can. But, this leads to a whole other set of complex topics. I like
the flexibility, however.
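
To make Jeremiah's failed-write case concrete, here is a minimal toy model in plain Python. It is not Cassandra code: the three-replica list, the `read` helper, and the timestamps are all made up for illustration, and it ignores read repair and hinted handoff, which would eventually spread the new value.

```python
from itertools import combinations

def read(replicas, nodes_contacted):
    """Return the newest (timestamp, value) pair among the contacted replicas."""
    return max(replicas[i] for i in nodes_contacted)

# Three replicas, each holding (timestamp, value). The old value is everywhere.
replicas = [(1, "old"), (1, "old"), (1, "old")]

# A QUORUM write (needs 2 acks) reaches only replica 0, so the client is told
# it failed. Replica 0 still keeps the new value.
replicas[0] = (2, "new")

n = len(replicas)
quorum = n // 2 + 1

# Every possible set of replicas a QUORUM read or an ALL read might contact.
quorum_results = {read(replicas, combo)[1] for combo in combinations(range(n), quorum)}
all_results = {read(replicas, combo)[1] for combo in combinations(range(n), n)}

print("QUORUM read can return:", sorted(quorum_results))
print("ALL read can return:", sorted(all_results))
```

In this model a QUORUM read returns either value depending on which two replicas answer, while an ALL read always sees the value that landed on the one node.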
*Atomicity*
All individual writes are atomic at the row level. So, a batch mutate
for one specific key will apply updates to all the columns for that one
specific row atomically. If part of the single-key batch update fails,
then all of the updates will be reverted since they all pertained to one
key/row. Notice, I said 'reverted' not 'rolled back'. Note: atomicity
and isolation are related to the topic of transactions but one does not
imply the other. Even though row updates are atomic, they are not
isolated from other users' updates or reads.
Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
*Consistency*
Cassandra does not provide the same scope of Consistency as defined in
the ACID standard. Consistency in C* does not include referential
integrity since C* is not a relational database. Any referential
integrity required would have to be handled by the client. Also, even
though the official docs say that QUORUM writes/reads is the minimal
consistency_level setting to guarantee full consistency, this assumes
that the write preceding the read does not fail (see comments below).
What to do in this case is not fully understood by this author.
Refs: http://wiki.apache.org/cassandra/ArchitectureOverview
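
For reference, the QUORUM claim rests on the usual overlap arithmetic: with N replicas, a read that contacts R nodes is guaranteed to touch at least one node from a successful write to W nodes whenever R + W > N. A small sketch of that check follows; the function names are my own, not part of any Cassandra API, and it only covers writes that succeeded.

```python
def quorum(n):
    """Size of a majority of n replicas."""
    return n // 2 + 1

def overlaps(n, r, w):
    """True if any r-node read set must intersect any w-node write set."""
    return r + w > n

n = 3
print(overlaps(n, quorum(n), quorum(n)))  # QUORUM write + QUORUM read
print(overlaps(n, 1, 1))                  # ONE write + ONE read
print(overlaps(n, n, 1))                  # ONE write + ALL read
```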
*Isolation*
NOTHING is isolated, because there is no transaction support in the
first place. This means that two or more clients can update the same
row at the same time. Their updates of the same or different columns
may be interleaved and leave the row in a state that may not make sense
depending on your application. Note: this doesn't mean to say that two
updates of the same column will be corrupted, obviously; columns are the
smallest atomic unit ('atomic' in the more general thread-safe context).
Refs: None that directly address this explicitly and clearly and in one
place.
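
A toy illustration of the interleaving described above, in plain Python. It assumes each client sends its two columns as separate writes and that the newest timestamp wins per column; the `apply_write` helper, column names, and timestamps are hypothetical and only model the idea.

```python
def apply_write(row, column, value, timestamp):
    """Last-write-wins per column: keep the value with the highest timestamp."""
    current = row.get(column)
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)

row = {}

# Client A wants first/last = Alice/Anderson, client B wants Bob/Brown.
# Their single-column writes arrive interleaved.
apply_write(row, "first", "Alice", 10)    # client A
apply_write(row, "first", "Bob", 11)      # client B
apply_write(row, "last", "Brown", 12)     # client B
apply_write(row, "last", "Anderson", 13)  # client A

result = {column: value for column, (value, _) in row.items()}
print(result)
```

Each column ends up holding one client's complete value, so nothing is corrupted at the column level, but the row as a whole is a mix that neither client wrote.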
*Durability*
Updates are made highly durable, at a level comparable to a DBMS, by the
use of the commit log. However, this requires "commitlog_sync: batch"
in cassandra.yaml. For "some" performance improvement with "some" cost
in durability you can specify "commitlog_sync: periodic". See
discussion below for more details.
Refs: Plenty + this thread.
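
For reference, the two settings would look roughly like this in cassandra.yaml. The companion option names are the ones I recall from the sample config shipped around this time, and the values are illustrative, so check them against the cassandra.yaml for your version.

```yaml
# Option 1: durability first. A write is not acknowledged until the commit
# log has been synced to disk; the window lets nearby writes share one sync.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 50

# Option 2: throughput first. Writes are acknowledged immediately and the
# commit log is synced every period, so a crash can lose up to one period.
# commitlog_sync: periodic
# commitlog_sync_period_in_ms: 10000
```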
------------------------------------------------------------------------
*From:* AJ [mailto:a...@dude.podzone.net]
*Sent:* Friday, June 24, 2011 11:28 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Cassandra ACID
Ok, here it is reworked; consider it a summary of the thread. If I
left out an important point that you think is 100% correct even if you
already mentioned it, then make some noise about it and provide some
evidence so it's captured sufficiently. And, if you're in a debate,
please try and get to a resolution; all will appreciate it.
It will be evident below that Consistency is not the only thing that
is "tunable", at least indirectly. Unfortunately, you still can't
tunafish. Ar ar ar.
*Atomicity*
All individual writes are atomic at the row level. So, a batch mutate
for one specific key will apply updates to all the columns for that
one specific row atomically. If part of the single-key batch update
fails, then all of the updates will be reverted since they all
pertained to one key/row. Notice, I said 'reverted' not 'rolled
back'. Note: atomicity and isolation are related to the topic of
transactions but one does not imply the other. Even though row
updates are atomic, they are not isolated from other users' updates or
reads.
Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
*Consistency*
Cassandra does not provide the same scope of Consistency as defined in
the ACID standard. Consistency in C* does not include referential
integrity since C* is not a relational database. Any referential
integrity required would have to be handled by the client. Also, even
though the official docs say that QUORUM writes/reads is the minimal
consistency_level setting to guarantee full consistency, this assumes
that the write preceding the read does not fail (see comments below).
Therefore, an ALL write would be necessary prior to a QUORUM read of
the same data. For a multi-dc scenario use an ALL write followed by an
EACH_QUORUM read.
Refs: http://wiki.apache.org/cassandra/ArchitectureOverview
*Isolation*
NOTHING is isolated, because there is no transaction support in the
first place. This means that two or more clients can update the same
row at the same time. Their updates of the same or different columns
may be interleaved and leave the row in a state that may not make
sense depending on your application. Note: this doesn't mean to say
that two updates of the same column will be corrupted, obviously;
columns are the smallest atomic unit ('atomic' in the more general
thread-safe context).
Refs: None that directly address this explicitly and clearly and in
one place.
*Durability*
Updates are made highly durable, at a level comparable to a DBMS, by
the use of the commit log. However, this requires "commitlog_sync:
batch" in cassandra.yaml. For "some" performance improvement with
"some" cost in durability you can specify "commitlog_sync: periodic".
See discussion below for more details.
Refs: Plenty + this thread.
On 6/24/2011 1:46 PM, Jim Newsham wrote:
On 6/23/2011 8:55 PM, AJ wrote:
Can any Cassandra contributors/gurus confirm my understanding of
Cassandra's degree of support for the ACID properties?
I provide official references when known. Please let me know if I
missed some good official documentation.
*Atomicity*
All individual writes are atomic at the row level. So, a batch
mutate for one specific key will apply updates to all the columns
for that one specific row atomically. If part of the single-key
batch update fails, then all of the updates will be reverted since
they all pertained to one key/row. Notice, I said 'reverted' not
'rolled back'. Note: atomicity and isolation are related to the
topic of transactions but one does not imply the other. Even though
row updates are atomic, they are not isolated from other users'
updates or reads.
Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
*Consistency*
If you want 100% consistency, use consistency level QUORUM for both
reads and writes and EACH_QUORUM in a multi-dc scenario.
Refs: http://wiki.apache.org/cassandra/ArchitectureOverview
This is a pretty narrow interpretation of consistency. In a
traditional database, consistency prevents you from getting into a
logically inconsistent state, where records in one table do not agree
with records in another table. This includes referential integrity,
cascading deletes, etc. It seems to me Cassandra has no support for
this concept whatsoever.
*Isolation*
NOTHING is isolated, because there is no transaction support in the
first place. This means that two or more clients can update the
same row at the same time. Their updates of the same or different
columns may be interleaved and leave the row in a state that may not
make sense depending on your application. Note: this doesn't mean
to say that two updates of the same column will be corrupted,
obviously; columns are the smallest atomic unit ('atomic' in the
more general thread-safe context).
Refs: None that directly address this explicitly and clearly and in
one place.
*Durability*
Updates are made durable by the use of the commit log. No worries here.
Refs: Plenty.
Jim