“Is Cassandra only for use cases with data load > 100TB and massive user counts?”
I wouldn’t make that extreme a statement! There are plenty of more moderate use cases for Cassandra. For example, a dozen nodes with 300 GB per node for just a few million users and their interactions and transactions.

I would say, as a rough rule of thumb, that a traditional RDBMS is great for up to low millions of rows, and Cassandra is clearly needed when you have more than a few hundred million rows. In between, it becomes a more subjective choice. Tens of millions of rows can probably be dealt with effectively by an RDBMS, but... you’re starting to have to be careful and configure high-end systems and manage them carefully. 100 million rows? Sure, you could still do that on an RDBMS if you are motivated and put in the effort. For example, some relational databases may require manual partitioning when you have more than 25 million rows or so. And then you have to pay attention to query latency as well.

First big question: It may be 100 million rows today, but what growth rate do you anticipate?

-- Jack Krupansky

From: Matthias Hübner
Sent: Saturday, July 5, 2014 5:49 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra use cases/Strengths/Weakness

Hi,

I am a bit confused about whether Cassandra is a choice for my use case, especially after reading this thread. Is Cassandra only for use cases with data load > 100TB and massive user counts? What about all the other features of Cassandra, are they not usable to avoid limitations of relational databases, even for smaller use cases?

What do you think about my use case: I need to manage data for around 1000 retail stores to produce each day a delivery plan (including predictions several weeks into the future) to refill the stores. For each store I have to collect data about every single store item. A store has some 10 thousand items. This makes around 100 million items to manage. Each day I have to store some updates for every single store item. I also receive sales predictions for all items day by day.
Every day I have to produce one or more delivery plans. Most data will replace old data, so it’s not increasing that much.

I thought I could handle the data load more easily with Cassandra than with MariaDB. I don’t have to care about locking, I could write all incoming data and merge into my tables. And I could use aggregations. So I would be able to add together all the store item related data that I need to compute my delivery plans. Finally, I would be able to use commodity hardware and scale more easily.

Have a nice weekend,
Matthias

2014-07-05 0:37 GMT+02:00 Jack Krupansky <j...@basetechnology.com>:

Elasticsearch and Solr are “search platforms”, not “databases”.

The best description for Cassandra, especially for a CTO, is its home page: http://cassandra.apache.org/

Even if you have seen it before, please read it again. There is a lot packed into a few words.

DataStax Enterprise (DSE) combines Cassandra, Hadoop and Spark for analytics, and tightly integrated Solr for rich search of the Cassandra data.

The main, biggest benefit of Cassandra is that it is a master-free distributed real-time database designed for scale, including support for multiple data centers, so that it is ready for managing mission critical operational data, for applications that need low latency and high availability for real-time data access.

And OpsCenter is great for managing a Cassandra or DSE cluster. I’m sure a CTO would appreciate it: http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

Here’s a feature comparison of some NoSQL databases: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

-- Jack Krupansky

From: Prem Yadav
Sent: Friday, July 4, 2014 10:37 AM
To: user@cassandra.apache.org
Subject: Cassandra use cases/Strengths/Weakness

Hi,

I have seen this in a lot of replies that Cassandra is not designed for this and that. I don't want to sound rude, I just need some info about this so that I can compare it to technologies like HBase, Mongo, Elasticsearch, Solr, etc.
1) What is Cassandra designed for? Heavy writes, yes. So is HBase. Or Elasticsearch. What are the use cases that suit Cassandra?

2) What kind of queries are best suited for Cassandra? I ask this because I have seen people asking about queries and getting replies that it's not suited for Cassandra. For example: queries where a large number of rows are requested and a timeout happens. Or range queries or aggregate queries.

3) Where does Cassandra excel compared to other technologies?

I have been working on Cassandra for some time. I know how it works and I like it very much. We are moving towards building a big cluster. But at this point, I am not sure if it's the right decision. A lot of people in my company, including me, like Cassandra. But it has more to do with CQL and not the internals or the use cases. Until now, there have been small PoCs and people enjoyed it. But for a large scale project, we are not so sure. Please guide us.

Please note that the drawbacks of other technologies do not interest me; it's the strengths/weaknesses of Cassandra I am interested in.

Thanks
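
A rough sizing check on the numbers quoted in this thread, as a minimal Python sketch. The two thresholds are assumptions picked to match "low millions" and "a few hundred million" in the rule of thumb above; the thread gives no exact cut-offs. `suggest_store` and `total_rows` are hypothetical helper names used only for illustration.

```python
# Illustrative thresholds only: the thread says "low millions" and
# "a few hundred million" without giving exact numbers.
RDBMS_COMFORT_ROWS = 5_000_000
CASSANDRA_CLEAR_ROWS = 300_000_000


def suggest_store(rows: int) -> str:
    """Map a row count onto the three bands of the rule of thumb."""
    if rows <= RDBMS_COMFORT_ROWS:
        return "rdbms"
    if rows >= CASSANDRA_CLEAR_ROWS:
        return "cassandra"
    return "judgment call"


def total_rows(stores: int, items_per_store: int, rows_per_item: int = 1) -> int:
    """Row count if every store item occupies rows_per_item rows."""
    return stores * items_per_store * rows_per_item


if __name__ == "__main__":
    # 1000 stores with 10 thousand items each, read literally.
    literal = total_rows(1_000, 10_000)
    print(literal, suggest_store(literal))

    # The figure stated in the message.
    stated = 100_000_000
    print(stated, suggest_store(stated))

    # "A dozen nodes with 300 GB per node", in GB.
    print(12 * 300)
```

Read literally, 1,000 stores with 10,000 items each gives 10 million rows. The 100 million figure in the message would correspond to more items per store or to several rows per item, for example one row per item per prediction day. Under the assumed thresholds, both counts fall in the middle band that the rule of thumb calls a more subjective choice.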