On 06/04/2010 18:50, Benjamin Black wrote:
> I'm finding this exchange very confusing. What exactly about
> Cassandra 'looks absolutely ideal' to you for your project? The write
> performance, the symmetric, peer to peer architecture, etc?
>
Reasons I like Cassandra for this project:

* Columnar rather than tabular data structures, with an extensible 'schema', so the back-end data structures can evolve to support new features without downtime.

* Decentralised architecture with fault tolerance and redundancy, allowing high availability on shoestring-budget hardware in an easily scalable pool, even though the application must track rapidly changing data that makes meaningful backups impractical.

* It is easy to ensure that data will be sharded efficiently, allowing many concurrent reads and writes; i.e. system-wide I/O bandwidth scales for both reading and writing.

* A lightweight, free and open-source physical data model that minimises the risk of vendor lock-in, or of insurmountable problems caused by glitches in commercial closed-source libraries.

A shorter answer: apart from its dependence on 'referential integrity' between two 'maps' of hash values, the data for the rest of my application looks remarkably like that of large sites we already know use Cassandra.

I'm trying to establish the most effective Cassandra approach to achieving this logical 'referential integrity' while minimising resource use (memory, disk and CPU), and therefore hardware costs at any given deployment scale, all while retaining the advantages above.