Hi R. Verlangen!

 On 2011.12.27 at 15:50:24 +0100, R. Verlangen wrote:

> You might consider a hybrid solution with a transactional db for all data
> that should be ACID compliant and Cassandra for the huge amounts of data
> you want to store.
> 
> 2011/12/27 Radim Kolar <h...@sendmail.cz>
> 
> >
> >> makes me feel disappointed about consistency in Cassandra, but I wonder if
> >> there is a way to work around it.
> >>
> > cassandra is not suitable for this kind of programs. CouchDB is slightly
> > better, it has transactions but no locking and i am not sure if transaction
> > isolation is supported now. mongodb has some kind of atomic operations -
> > http://www.mongodb.org/display/DOCS/Atomic+Operations
> >  but no locking or rollbacks either.
> >
> > Standard RDBMS like DB2 and Oracle are best for your kind of applications,
> > they can scale well too. In DB2 you can choose between shared disk and
> > shared nothing cluster architecture.

I see. But is there any way of implementing the minimum required ACID
subset on top of Cassandra? For example, I don't care about rollbacks and
can live without transactions, as long as there is some way to ensure
that readers see changes to the data in the same order as the writer
submitted them.
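
What I had in mind is roughly the following - just a rough, untested
sketch with pycassa; the keyspace and column family names ('MyKeyspace',
'StateData', 'CommitMarker') are made up, and points in time are assumed
to be plain integer timestamps:

    import pycassa
    from pycassa import ConsistencyLevel

    # Made-up keyspace / column family names, just to illustrate the idea.
    # QUORUM on both reads and writes so the marker can never be seen
    # before the data it points at.
    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    data_cf = pycassa.ColumnFamily(pool, 'StateData',
                                   write_consistency_level=ConsistencyLevel.QUORUM,
                                   read_consistency_level=ConsistencyLevel.QUORUM)
    marker_cf = pycassa.ColumnFamily(pool, 'CommitMarker',
                                     write_consistency_level=ConsistencyLevel.QUORUM,
                                     read_consistency_level=ConsistencyLevel.QUORUM)

    def write_point(point_in_time, states):
        """Writer: store all states for one point in time, then advance the marker."""
        # one row per point in time, columns are the pre-calculated states
        data_cf.insert(str(point_in_time), states)
        # The marker only moves after the data is written, so a reader that
        # never looks past the marker never sees a half-written point.
        marker_cf.insert('watermark', {'last_committed': str(point_in_time)})

    def latest_consistent_point():
        """Reader: the newest point in time that is safe to read."""
        return int(marker_cf.get('watermark')['last_committed'])

Since the marker only advances after the data it points at is
acknowledged, readers that never go past it see commits in writer order.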

I've read about Cages & ZooKeeper, but that seems too heavyweight and
complicated for something as simple as this, plus it doesn't seem to be
usable outside Java. (The current writer client implementation is in
Python, though it might be rewritten in C in the future.)

DB2 & Oracle might get very expensive here. We do use some Oracle, but
for this solution we'd need RAC & partitioning for sure, and that's just
for a start; it's still unclear whether it would be able to scale in the
future. As for DB2, while I haven't checked prices, it's probably not
cheap either, and the limitations of the free versions simply don't cut
it here.

We are still evaluating how much we can cut the amount of data inserted
into the database and how much consistency we could sacrifice. The
amount of data is huge because it's pre-calculated; inserting more raw
data for some parts would place more burden on clients during reads, as
they would have to recalculate it, but it might be possible.

However, since the data is pre-calculated, reads from the database will
always be very simple; we don't need any processing - just "read all
states for this point in time", "read previous states", "read next
states" kind of queries. This, too, suggests looking for solutions more
optimal than SQL databases.
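
Continuing the sketch above (again with made-up names, and assuming a
'Timeline' column family whose single row has the known points in time
as column names under a LongType comparator), all three read patterns
stay trivial:

    timeline_cf = pycassa.ColumnFamily(pool, 'Timeline')  # made-up CF, LongType comparator

    def record_point(point_in_time):
        # writer side: append the point to the timeline row once it is committed
        timeline_cf.insert('timeline', {point_in_time: ''})

    def states_at(point_in_time):
        # "read all states for this point in time": a single whole-row read
        return data_cf.get(str(point_in_time))

    def previous_point(point_in_time):
        # the timeline row's column names are the known points, sorted
        # numerically; a reversed slice of length 1 gives the previous point
        cols = timeline_cf.get('timeline',
                               column_start=point_in_time - 1,
                               column_count=1,
                               column_reversed=True)
        return next(iter(cols))

    def next_point(point_in_time):
        # a forward slice of length 1 gives the next point
        cols = timeline_cf.get('timeline',
                               column_start=point_in_time + 1,
                               column_count=1)
        return next(iter(cols))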

As for consistency, something like time-based consistency would satisfy
us too - for example, by saying "during reads, ignore data inserted in
the last second, as only data older than a second is guaranteed to be
consistent" - but then we'd need some mechanism to ensure that data
which is N seconds old really *IS* consistent no matter what. Generally,
we don't need very up-to-date data in the database; missing the latest
parts is okay, as long as there is some way to cut off the end "cleanly"
so that the point which the reader sees is 100% consistent.
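
The writer-side part of that could be as simple as holding the watermark
back by N seconds (continuing the sketch above; SAFETY_WINDOW is a
made-up value for N):

    import time

    SAFETY_WINDOW = 5  # seconds; made-up value, the "N" from above

    def publish_watermark(written_points):
        """Writer: only advance the marker past points older than the safety window."""
        cutoff = time.time() - SAFETY_WINDOW
        settled = [p for p in written_points if p <= cutoff]
        if settled:
            # Everything at or before the published watermark has been sitting
            # untouched for at least SAFETY_WINDOW seconds, so readers using
            # latest_consistent_point() get a clean, fully consistent cut-off.
            marker_cf.insert('watermark', {'last_committed': str(max(settled))})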
