On Sat, Apr 24, 2010 at 10:20 AM, dir dir <sikerasa...@gmail.com> wrote:
> In general what is the difference between Cassandra and HBase??
>
> Thanks.
>

Others have already said it ... Cassandra has a peer architecture, with all peers being essentially equivalent (apart from the concept of a "seed," as far as I can tell). This is a great architectural advantage of Cassandra and Cassandra-like systems. It wasn't really possible to build practical systems like this in earlier times because computing limitations (memory, CPU, disk) made the characteristic times (expected response, recovery, replication, etc.) and the system dynamics almost impossible to deal with. This problem persists but has become far more manageable, because expected response times have not tightened any faster than computational capabilities have grown.

HBase, on the other hand, is already a layered system. It relies on the underlying HDFS, above and beyond the OS. As a more layered system, it has a better service architecture, in a sense, but it relies on, and is limited by, the capabilities of those "services" ... say, the distributed file service.

Cassandra rolls its own partitioning and replication mechanisms at the level of its peers. It does not rely on some underlying system service for these capabilities.

Cassandra is definitely easier to provision and use, from an operational point of view, and this is a great advantage -- although installations that support scanning (through ordered partitioning) would become more involved.

(As suggested by others, reading the BigTable and Dynamo papers will help you establish the difference between HBase and Cassandra in clearer, architectural terms.)

- m.
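
P.S. To make the "rolls its own partitioning and replication at the level of its peers" point a bit more concrete, here is a minimal Python sketch of Dynamo-style placement on a hash ring. This is not Cassandra's actual code: the `Ring` class, the `token` helper, the node names, and the use of MD5 as the hash are all assumptions chosen for illustration. The idea it shows is only that any peer, knowing the list of nodes, can work out by itself which nodes should hold a key, without asking an underlying file service.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to an integer position on the ring (MD5 is just a convenient hash here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring of equivalent peers (hypothetical; not Cassandra's real classes)."""

    def __init__(self, nodes, replication_factor=3):
        unique_nodes = set(nodes)
        if not unique_nodes:
            raise ValueError("ring needs at least one node")
        # Never ask for more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(unique_nodes))
        # Each node owns one token; sorting by token gives the ring order.
        self._ring = sorted((token(n), n) for n in unique_nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        """Return the nodes for `key`: the first node at or after the key's
        token (wrapping around the ring), followed by its successors."""
        size = len(self._ring)
        start = bisect.bisect_left(self._tokens, token(key)) % size
        return [self._ring[(start + i) % size][1]
                for i in range(self.replication_factor)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
```

Every peer that builds a `Ring` from the same node list gets the same `owners` for the same key, which is what lets the peers stay equivalent. In HBase, by contrast, replication of the stored data is delegated to HDFS rather than computed this way by the database layer.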