Re: Hbase vs Cassandra

Ajay Mon, 08 Jun 2015 02:17:27 -0700

Hi Jens,

Not all of the points listed were mine. I posted the HBase vs Cassandra
question in both forums and consolidated the responses here for discussion.



On Mon, Jun 8, 2015 at 2:27 PM, Jens Rantil <jens.ran...@tink.se> wrote:

> Hi,
>
> Some minor comments:
>
> > 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
> for Cassandra but it doesn't support vnodes.
>
> Not entirely sure what you mean here, but we ran Cloudera for a while and
> Cloudera Manager was buggy and hard to debug. Overall, our experience
> wasn't very good. This was definitely also due to us not knowing how all
> the Cloudera packages were configured.
>

*>>> This is one of the responses I got from the HBase forum. DataStax
OpsCenter exists, but it doesn't seem to support the latest Cassandra
versions (we tried it a couple of times and hit bugs too).*

>
> > HBase is always consistent. Machine outages lead to inability to read
> or write data on that machine. With Cassandra you can always write.
>
> Sort of true. You can choose the write consistency level and get an exception
> if the write didn't go through at that level. However, do note that Cassandra
> will never roll back failed writes, which means writes aren't atomic (as in ACID).
>
*>>> If I understand correctly, you mean that when we write with QUORUM,
Cassandra may write to some replicas and fail on others, throw an exception
because QUORUM was not satisfied, and leave the data inconsistent without
rolling back?*
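
To make the scenario concrete, here is a toy model in plain Python. It is not the Cassandra driver or Cassandra's actual write path; the `Replica` class, `quorum_write` function and `QuorumNotMet` exception are hypothetical names made up for illustration. It only shows the behaviour being asked about: the coordinator reports failure, yet the replica that accepted the write keeps it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica node."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


class QuorumNotMet(Exception):
    """Raised when fewer than a majority of replicas acknowledge."""


def quorum_write(replicas, key, value):
    needed = len(replicas) // 2 + 1
    # Every reachable replica applies the write; nothing is undone later.
    acks = sum(1 for r in replicas if r.write(key, value))
    if acks < needed:
        raise QuorumNotMet(f"{acks} acks, {needed} needed")
    return acks


# Replication factor 3, two replicas down: QUORUM (2) cannot be met.
replicas = [Replica("a"), Replica("b", up=False), Replica("c", up=False)]
try:
    quorum_write(replicas, "k", "v1")
    failed = False
except QuorumNotMet:
    failed = True

# The client saw a failure, but replica "a" still holds the value.
survivors = [r.name for r in replicas if r.data.get("k") == "v1"]
print(failed, survivors)
```

In this sketch the client gets an exception while replica "a" still stores the value. To my understanding, in a real cluster such a partially applied write can later spread to the other replicas through mechanisms like hinted handoff or read repair, so a "failed" write should not be assumed absent.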


> We chose Cassandra over HBase mostly due to ease of manageability. We are a
> small team, and my feeling is that you will want dedicated people taking
> care of a Hadoop cluster if you are going down the HBase path. A Cassandra
> cluster can be handled by a single engineer and is, in my opinion, easier
> to maintain.
>

*>>> This is the most commonly cited reason for choosing Cassandra over
HBase, but on its own it is not a sufficient driver.*


> Cheers,
> Jens
>
> On Mon, Jun 8, 2015 at 9:59 AM, Ajay <ajay.ga...@gmail.com> wrote:
>
>> Hi All,
>>
>> Thanks for all the input. I posted the same question in the HBase forum and
>> got more responses.
>>
>> Posting the consolidated list here.
>>
>> Our case is that a central team builds and maintains the platform
>> (Cassandra as a service). We have a couple of use cases that fit Cassandra,
>> such as time-series data. But as a platform team, we need to know more about
>> the features and use cases that fit, or are best handled by, Cassandra. We
>> also want to understand the use cases where HBase performs better (we might
>> need to offer it as a service too).
>>
>> *Cassandra:*
>>
>> 1) From 2013, but can still be relevant:
>> http://www.pythian.com/blog/watch-hbase-vs-cassandra/
>>
>> 2) Here are some use cases from PlanetCassandra.org of companies who
>> chose Cassandra over HBase after evaluation, or migrated to Cassandra from
>> HBase.
>> The eComNext interview cited on the page touches on time-series data;
>> http://planetcassandra.org/hbase-to-cassandra-migration/
>>
>> 3) From googling, the most commonly cited advantages of Cassandra over HBase
>> are that it is easy to deploy, maintain and monitor, and that it has no
>> single point of failure.
>>
>> 4) From our six months of research and POC experience with Cassandra, CQL is
>> pretty limited. Though CQL is targeted at real-time reads and writes, there
>> are cases where we need to pull out data differently and are OK with a little
>> more latency. Cassandra doesn't support that, so we need MapReduce or
>> Spark for those cases. Then the debate starts: why Cassandra and not HBase, if
>> we need Hadoop/Spark for those jobs anyway?
>>
>> I expected a few more technical features/use cases that are best handled by
>> Cassandra (and how they work).
>>
>> *HBase:*
>>
>> 1) As for #4, you might be interested in reading
>> https://aphyr.com/posts/294-call-me-maybe-cassandra
>> Not sure if there is a comparable article about HBase (does anybody know?),
>> but it can give you another perspective on what else to keep an eye on
>> regarding these systems.
>>
>> 2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe
>>
>> 3) http://blog.parsely.com/post/1928/cass/
>> *Anyone have any comments on this?*
>>
>> 4) 1. No killer features compared to HBase.
>>    2. terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>>       for Cassandra but it doesn't support vnodes.
>>    3. Rumors say it's fast when it works ;) The reason: it can silently drop
>>       data you try to write.
>>    4. Time series is a nightmare. The easiest approach is to just replicate
>>       data to HDFS, partition it by hour/day and run
>>       Spark/Scalding/Pig/Hive/Impala.
>>
>> 5) Migrated from Cassandra to HBase.
>> Reasons:
>> * Scans are fast with HBase. It fits better with a time-series data model.
>> Please look at OpenTSDB. Cassandra models it with large rows.
>> * Server-side filtering. You can use it to filter some of your time-series
>> data on the server side.
>> * HBase has better integration with Hadoop in general. We had to write
>> our own bulk loader using MapReduce for Cassandra; HBase already had a
>> tool for that. There is also nice integration with Flume and Kite.
>> * High availability didn't matter for us. 10 seconds of downtime is fine for
>> our use cases. HBase has also started to support eventually consistent reads.
>>
>> 6) Coprocessor framework (custom code inside the Region Servers and
>> Master servers), which Cassandra is missing, afaik.
>>    Coprocessors have been widely used by HBase users (Phoenix SQL, for
>> example) since their introduction (in 0.92).
>> * The HBase security model is more mature and aligns well with Hadoop/HDFS
>> security. Cassandra provides just basic authentication/authorization/SSL
>> encryption: no Kerberos, no end-to-end data encryption,
>> no cell-level security.
>>
>> 7) Another point to add is the new "HBase read high-availability using
>> timeline-consistent region replicas" feature from HBase 1.0 onward, which
>> brings HBase closer to Cassandra in terms of read availability during
>> node failures. You now have a choice for read availability.
>> https://issues.apache.org/jira/browse/HBASE-10070
>>
>> 8) HBase can do range scans, and one can attack many problems with range
>> scans. Cassandra can't do range scans.
>>
>> 9) HBase is a distributed, consistent, sorted key value store. The
>> "sorted" bit allows for range scans in addition to the point gets that all
>> K/V stores support. Nothing more, nothing less.
>> It happens to store its data in HDFS by default, and we provide
>> convenient input and output formats for map reduce.
>>
>> *Neutral:*
>> 1)
>> http://khangaonkar.blogspot.com/2013/09/cassandra-vs-hbase-which-nosql-store-do.html
>>
>> 2) The fundamental differences that come to mind are:
>> * HBase is always consistent. Machine outages lead to inability to read
>> or write data on that machine. With Cassandra you can always write.
>>
>> * Cassandra defaults to a random partitioner, so range scans are not
>> possible (by default).
>> * HBase has a range partitioner (if you don't want that, the client has to
>> prefix the rowkey with a prefix of a hash of the rowkey). The main feature
>> that sets HBase apart is range scans.
>>
>> * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
>> You can map reduce directly into HFiles and map those into HBase instantly.
>>
>> * Cassandra has a dedicated company supporting (and promoting) it.
>> * Getting started is easier with Cassandra. For HBase you need to run
>> HDFS and Zookeeper, etc.
>> * I've heard lots of anecdotes about Cassandra working nicely with small
>> clusters (< 50 nodes) and quickly degrading above that.
>> * HBase does not have a query language (but you can use Phoenix for full
>> SQL support)
>> * HBase does not have secondary indexes (having an eventually consistent
>> index, similar to what Cassandra has, is easy in HBase, but making it as
>> consistent as the rest of HBase is hard)
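
The partitioner points in the list above can be sketched in a few lines of plain Python. This is only an illustration, not HBase or Cassandra code: MD5 is used as a stand-in hash, and the key format, the bucket count and the names `md5_token`, `range_scan`, `salted` and `salted_scan` are all assumptions made up for the example. It shows that sorted keys allow a contiguous range scan, that hashing scatters neighbouring keys, and that a hash prefix on the rowkey turns one scan into one scan per bucket.

```python
import hashlib


def md5_token(key: str) -> int:
    # Stand-in for a hash partitioner's token function.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


# Hypothetical time-series rowkeys: series id plus hour.
keys = [f"sensor1#2015-06-08T{h:02d}" for h in range(6)]

range_order = sorted(keys)                 # range partitioning: key order
hash_order = sorted(keys, key=md5_token)   # hash partitioning: token order


def range_scan(sorted_keys, start, stop):
    # With sorted storage this is one contiguous slice.
    return [k for k in sorted_keys if start <= k < stop]


def salted(key: str, buckets: int = 4) -> str:
    # Prefix the rowkey with a bucket derived from its hash.
    return f"{md5_token(key) % buckets:02d}|{key}"


def salted_scan(stored, start, stop, buckets: int = 4):
    # One range scan per bucket, then merge the results.
    out = []
    for b in range(buckets):
        prefix = f"{b:02d}|"
        out += [k[len(prefix):] for k in stored
                if prefix + start <= k < prefix + stop]
    return sorted(out)


start, stop = "sensor1#2015-06-08T02", "sensor1#2015-06-08T04"
stored = sorted(salted(k) for k in keys)

print(range_scan(range_order, start, stop))
print(hash_order)  # same keys, scattered by token
print(salted_scan(stored, start, stop))
```

The salted scan returns the same rows as the plain range scan, at the cost of one scan per bucket. Note that the limitation discussed here concerns ordering across partition keys; within a single Cassandra partition, rows are stored sorted by clustering column, which is what the "large rows" time-series model relies on.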
>>
>> Thanks
>> Ajay
>>
>>
>>>
>>> On May 29, 2015, at 12:09 PM, Ajay <ajay.ga...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I need some info on HBase vs Cassandra as a data store (in general plus
>>> specific to time series data).
>>>
>>> A comparison on the following would help:
>>> 1: features
>>> 2: deployment and monitoring
>>> 3: performance
>>> 4: anything else
>>>
>>> Thanks
>>> Ajay
>>>
>>>
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
