Hello, Jonathan,

Thank you. I understand the situation now.

If you have a strong requirement that data cannot be unavailable for more than one second, I think Cassandra would be the clear winner here. Is this a requirement just for reads, just for writes, or both?

Perhaps just for reads, but I'm not sure yet. Front-end caching may help; however, additional caching costs more money and makes the system more complex.

The flip side to this is that Cassandra carries the same in-memory data on every replica (because data can be read and written from multiple nodes, it must live on all of those nodes), whereas HBase carries it only once, on one server; replication in HBase happens at the DFS level, not the DB level. So across a cluster with replication factor 3, you effectively have only 1/3 of the total memory available for unique data with Cassandra, if that makes sense.
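The arithmetic above can be sketched as a small back-of-envelope model (my own illustration, not from either project's documentation; the node counts and memory sizes are made-up numbers):

```python
def unique_cache_capacity(nodes, mem_per_node_gb, replication_factor):
    """Cluster memory usable for *unique* cached data.

    With replication factor R, every cached item occupies memory on R
    replicas, so only total_memory / R holds distinct data.
    """
    total = nodes * mem_per_node_gb
    return total / replication_factor

# Cassandra-style: data (and its cache) lives on all R = 3 replicas
print(unique_cache_capacity(12, 32, 3))  # 128.0 GB unique out of 384 GB total

# HBase-style: each region is served (and cached) by one RegionServer;
# replication happens at the HDFS layer, below the cache
print(unique_cache_capacity(12, 32, 1))  # 384.0 GB unique
```

This is why, for the same hardware, HBase can keep roughly three times as much distinct data hot in memory when the DFS replication factor is 3.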

I hadn't noticed this point. It is appealing that HBase could cache more data effectively.

Quite honestly, the requirement that data not be unavailable for more than 1 second likely takes HBase out of the running, because under a hard RegionServer failure you will almost certainly have regions offline for longer than that. We'll continue improving here, and if you are not including the time for fault detection, it is feasible that we could get down into the realm of 1 second, though in that case you'd likely have a period of "eventual consistency" in which you would be able to access a region while the log replay was going on.

Accessing data during log replay sounds interesting as an option when transactional RegionServers are not in use.

Regards,
Maumau
