I suggest that the reasons you list (which are certainly great reasons!) are also the reasons Cassandra has no referential integrity or transaction support. The common practice of running a coordination system such as ZooKeeper alongside Cassandra to handle the synchronization parts seems applicable here. Have you investigated that?
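A minimal sketch of that pattern follows. It is not Cassandra or ZooKeeper code. Two in-memory dicts stand in for two Cassandra column families holding the "maps" of hash values, and a `threading.Lock` stands in for the distributed lock a ZooKeeper lock recipe would provide. The class and attribute names (`CoordinatedMaps`, `by_key`, `by_hash`) are hypothetical. The point is only that every write touching both maps happens while the shared lock is held, so the two maps keep referring to each other.

```python
import threading


class CoordinatedMaps:
    """Hypothetical illustration: two dicts stand in for two Cassandra
    column families, and a threading.Lock stands in for a distributed
    lock obtained from a coordination service such as ZooKeeper."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for a ZooKeeper lock
        self.by_key = {}               # hypothetical map: key -> hash
        self.by_hash = {}              # hypothetical map: hash -> key

    def put(self, key, digest):
        # Hold the lock across both writes so that writers are serialized
        # and lock-holding readers never see one map without the other.
        with self._lock:
            owner = self.by_hash.get(digest)
            if owner is not None and owner != key:
                # Validate before mutating anything, so a rejected write
                # leaves both maps untouched.
                raise ValueError("hash already belongs to another key")
            old = self.by_key.get(key)
            if old is not None and old != digest:
                self.by_hash.pop(old, None)  # drop the stale back-reference
            self.by_key[key] = digest
            self.by_hash[digest] = key

    def delete(self, key):
        # Remove the entry from both maps under the same lock.
        with self._lock:
            digest = self.by_key.pop(key, None)
            if digest is not None:
                self.by_hash.pop(digest, None)

    def consistent(self):
        # True when every key -> hash entry has a matching hash -> key entry
        # and neither map holds extra entries.
        with self._lock:
            return len(self.by_key) == len(self.by_hash) and all(
                self.by_hash.get(h) == k for k, h in self.by_key.items()
            )
```

Holding the lock serializes writers, but it does not make the pair of writes atomic. If a writer dies between the two writes, the maps diverge until something repairs them, so a real deployment would still want a check-and-repair pass.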
b

On Tue, Apr 6, 2010 at 11:47 AM, Steve <sjh_cassan...@shic.co.uk> wrote:
> On 06/04/2010 18:50, Benjamin Black wrote:
>>
>> I'm finding this exchange very confusing. What exactly about
>> Cassandra 'looks absolutely ideal' to you for your project? The write
>> performance, the symmetric, peer to peer architecture, etc?
>
> Reasons I like Cassandra for this project:
>
> Columnar rather than tabular data structures with an extensible 'schemata' -
> permitting evolution of back-end data structures to support new features
> without down-time.
>
> Decentralised architecture with fault tolerance/redundancy permitting high
> availability on shoestring budget hardware in an easily scalable pool - in
> spite of needing to track rapidly changing data that precludes meaningful
> backup.
>
> Easy to establish that data will be efficiently sharded - allowing many
> concurrent reads and writes - i.e. systemic IO bandwidth is scalable - both
> for reading and writing.
>
> Lightweight, free and open-source physical data model that minimises risk of
> vendor lock-in or insurmountable problems with glitches in commercial
> closed-source libraries.
>
> A shorter answer might be that, in all ways other than depending upon
> 'referential integrity' between two 'maps' of hash-values, the data for the
> rest of my application looks remarkably like that of large sites that we
> know already use Cassandra.
>
> I'm trying to establish the most effective Cassandra approach to achieve the
> logical 'referential integrity' while minimising resource (memory/disk/CPU)
> use in order to minimise hardware costs for any given deployment scale - all
> the while, retaining the above advantages.