This is my second attempt at a summary of Cassandra vs HBase consistency and performance for an HBase-acceptable workload. I think these tricky subtleties are hard to understand, yet it's helpful for the community to understand them. I'm not trying to state my own facts (or opinions) but merely to summarize what I've read.
Again, please correct any facts which are wrong. Thanks for the kind and thoughtful responses!

*1) Cassandra can't replicate the consistency situation of HBase.* Namely, that once a write is finished, the new value will either always appear or never appear.

[In Cassandra] "Provided at least one node receives the write, it will eventually be written to all replicas. A failure to meet the requested ConsistencyLevel is just that; not a failure to write the data itself. Once the write is received by a node, it will eventually reach all replicas; there is no rollback." - Nick Telford [ref <http://www.mail-archive.com/user@cassandra.apache.org/msg07398.html>]

In Cassandra (N3/W3/R1, N3/W2/R2, or N3/W3/R3), a write can occur on a single node, fail to meet the requested write consistency level, and a read-back can show the old value, but later show the new value once the write that did occur is propagated.

[In HBase] Once a region server accepts a write, it has been flushed to the HDFS log. If the region server goes down while writing: if the write was finished to any copy of the HDFS log, the new region server will accept and propagate the write; if not, the write will never appear.

*2) Cassandra has a less efficient use of memory, particularly for data pinned in memory.* With 3 replicas in Cassandra, each element of data pinned in memory is kept on 3 servers, whereas in HBase only the region server keeps the data in memory, so there is only one copy of each data element. CASSANDRA-1314 <https://issues.apache.org/jira/browse/CASSANDRA-1314> provides an opportunity to allow a 'soft master', where reads prefer a particular replica. Combined with disabling read repair, this should allow more efficient memory usage for data pinned or cached in memory. #1 is still true, namely that a write may occur only on a node which is not the soft master, and that new value may not appear for a while and then eventually appear.
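To make the N/W/R notation in point 1 concrete, here's a minimal sketch (my own illustration, not Cassandra code) of the quorum-overlap rule R + W > N that these configurations are chosen around. Note that this rule only covers *successful* writes; point 1's caveat is exactly what it does not capture: a write that fails to meet level W can still later appear.

```python
# Hypothetical helper, not Cassandra code: the quorum-overlap rule.
def quorum_overlap(n: int, w: int, r: int) -> bool:
    """True when any read quorum of size r must intersect any write quorum
    of size w among n replicas, so a read at level R is guaranteed to reach
    at least one replica holding the latest *successful* write at level W."""
    return r + w > n

# The configurations discussed in point 1 all satisfy the rule:
for n, w, r in [(3, 3, 1), (3, 2, 2), (3, 3, 3)]:
    print(f"N{n}/W{w}/R{r}: quorums overlap = {quorum_overlap(n, w, r)}")
```

For contrast, N3/W1/R1 does not satisfy it, which is why that configuration offers no read-your-writes guarantee at all.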
However, with N3/W3/R1, once a write appears at the soft master it will remain, so as long as the soft-master preference can be honored, this will be closer to HBase's consistency.

*3) HBase can't match the row-availability situation of Cassandra (N3/W2/R2).* In the face of a single machine failure, if it is a region server, those keys are offline in HBase until its regions are reassigned to another region server and brought online. In Cassandra, no single node failure causes the data to become unavailable.

*4) Two Cassandra configurations are closest to the consistency situation of HBase, and provide slightly different node-failure characteristics.* (Note: #1 above means Cassandra can't truly reach the same consistency situation as HBase.) In Cassandra (N3/W3/R1), a node failure will disallow writes to a key range during the replica rebuild, while still allowing reads. In Cassandra (N3/W2-3/R2), a node failure will allow both reads and writes to continue, while requiring uncached reads to contact two servers. (Requiring a response from two servers may increase common-case latency, but may hide latency from GC spikes, since any two of the three may respond.) In HBase, if an HDFS node fails, both reads and writes continue; when a region server fails, both reads and writes are stalled until its regions are reassigned.

Was that a better summary? Is it closer to correct?
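The failure behaviors in point 4 fall out of simple replica counting; here's a small sketch (again my own illustration, not Cassandra code) of whether a request at a given consistency level can still be served with one node down:

```python
# Hypothetical sketch, not Cassandra code: a request at a given consistency
# level can succeed only if at least that many replicas remain alive.
def level_available(n: int, level: int, failed: int) -> bool:
    """A read at level R (or a write at level W) needs `level` live replicas
    out of n total, so it survives up to n - level failures."""
    return n - failed >= level

# One node down, the two configurations from point 4:
for label, n, w, r in [("N3/W3/R1", 3, 3, 1), ("N3/W2/R2", 3, 2, 2)]:
    print(f"{label}: reads ok = {level_available(n, r, 1)}, "
          f"writes ok = {level_available(n, w, 1)}")
```

This reproduces the claim above: with one node down, N3/W3/R1 keeps serving reads but rejects writes until the replica is rebuilt, while N3/W2/R2 keeps serving both.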