I've tried to post the below comment twice at
The problems with ACID, and how to fix them without going NoSQL
http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html
For whatever reason, it appeared briefly in the comments section both times
and then disappeared, so I will just post it here, because HBase is mentioned
in the article a few times, and ... well, just read. :-)
>>>
Many earlier comments have covered much of what I would say. However, nobody to
date has raised an objection to the mildly offensive contention that "the NoSQL
decision to give up on ACID is the lazy solution to these scalability and
replication issues." Possibly this was not meant in the pejorative sense, but
it reads that way. I would argue the correct term of art here is pragmatism,
not laziness.
I am a contributor to the HBase project. HBase is an open source implementation
of the BigTable architecture. Indeed our system does scale out by substantially
relaxing the scope of ACID guarantees. But it is a gross generalization to
suggest "NoSQL" is "NoACID", and somehow lazy in the pejorative sense, and this
mars the argument of the authors. HBase in particular provides durability and
row-level atomicity (I agree this is a nice, convenient partition), and favors
strong consistency in its design choices. In this
regard, I would also like to bring to your attention that the authors made an
error describing the scope of transactional atomicity available in BigTable --
the scope is actually the row, not each individual KV.
Also, HBase in particular is a big project with several interesting
design/research directions, and so does not reduce to a convenient stereotype: a
transactional layer that provides global ACID properties at user option (that
does not scale out like the underlying system but is nonetheless available),
exploration of notions of referential integrity, even consideration of optional
relaxed consistency (read replicas) in the other direction.
Back to the matter of pragmatism: While most structured data store users are
likely not building systems on the scale of a globally distributed search
engine, that is actually not too far off the mark for the design targets of
some HBase installations. We indeed do need to work with very large mutating
data sets today and nothing in the manner of a traditional relational database
system is up to the task. The discussion here, while intriguing, is also
rendered fairly academic by the "horrible" performance if spinning media is
used. Flash will not be competitive with spinning media at high tera- or
peta-scale for at least several years yet. Other commenters have also noticed
apparent bottlenecks in the presented design which suggest a high-scale
implementation will be problematic.
Anyway, it is my belief we are attacking the same set of problems but are
starting from opposing sides of a continuum and, ultimately, we shall meet
up somewhere in the middle.
September 2, 2010 10:55 AM
<<<
- Andy