Yes, Cassandra's big win is that once you get your data and applications adapted to the platform, you have a clear path to very large scale and resiliency. That assumes you have the dollars. It scales out on commodity hardware, but it isn't exactly efficient in its use of that hardware. I like to say that Cassandra makes big data "bigger data" because of the timestamp-per-cell overhead, the column name overhead, and the replication factor.
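To put rough numbers on the "bigger data" point, here is a back-of-the-envelope sketch in Python. Every figure in it is an assumption I picked for illustration (8 bytes per timestamp, 10 bytes of column name cost per cell, 20-byte values, replication factor 3), and the function names are made up. Real overhead depends on the Cassandra version, the storage format, and compression, so treat this as arithmetic, not a measurement.

```python
# Back-of-the-envelope estimate only. Every default below is an assumed,
# illustrative number, not a measurement of any Cassandra version.

def raw_bytes(rows, cols, avg_value_bytes):
    """Size of the bare values: no per-cell metadata, a single copy."""
    return rows * cols * avg_value_bytes


def cluster_bytes(rows, cols, avg_value_bytes,
                  timestamp_bytes=8,      # assumed: one 64-bit write timestamp per cell
                  column_name_bytes=10,   # assumed: average column name cost per cell
                  replication_factor=3):  # assumed: three copies of every cell
    """Values plus per-cell overhead, stored once per replica."""
    per_cell = avg_value_bytes + timestamp_bytes + column_name_bytes
    return rows * cols * per_cell * replication_factor


if __name__ == "__main__":
    rows, cols, avg = 1_000_000_000, 10, 20  # hypothetical table shape
    raw = raw_bytes(rows, cols, avg)
    total = cluster_bytes(rows, cols, avg)
    print(f"raw values: {raw / 1e12:.2f} TB")
    print(f"cluster:    {total / 1e12:.2f} TB ({total / raw:.1f}x)")
```

With those made-up inputs, 0.2 TB of raw values becomes about 1.14 TB across the cluster, roughly 5.7x. The multiplier shrinks as values get larger relative to the per-cell overhead, and compression (ignored here) pulls it down further.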

On Tue, Mar 20, 2018 at 2:54 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> I suspect you're approaching this problem from the wrong side.
>
> The decision of MySQL vs Cassandra isn't usually about performance, it's
> about the other features that may impact/enable that performance.
>
> - Will you have a data set that won't fit on any single MySQL Server?
> - Will you want to write into two different hot datacenters at the same
>   time?
> - Do you want to be able to restart any single server without impacting
>   the cluster?
>
> If you answer yes to those, then cassandra has an option to do so
> trivially, where you'd have to build tooling with MySQL.
>
> - Do you want to do arbitrary text searches?
> - Do you need JOINs?
> - Do you want to build indices on a lot of the columns and do ad-hoc
>   querying?
>
> If you answer yes to those, they're far easier in MySQL than Cassandra.
>
> If you're just looking for "Cassandra can do X writes per second and MySQL
> can do Y writes per second", those types of benchmarks are rarely relevant,
> because in both cases they tend to require expert tuning to get the full
> potential (and very few people are experts in both) and data dependent (and
> your data probably doesn't match the benchmarker's dataset).
>
> If I had a dataset that was ~10-20gb and wanted to do arbitrary reads on
> the data, I'd choose MySQL unless I absolutely positively could not
> tolerate downtime, in which case I'd go with Cassandra spanning multiple
> datacenters. If I had a dataset that was 200TB, or 200PB, I'd choose
> Cassandra, even if I could theoretically make MySQL do it faster, because
> the extra effort in building the tooling to manage that many shards of
> MySQL would be prohibitive to most organizations.
>
> On Tue, Mar 20, 2018 at 11:44 AM, Oliver Ruebenacker <cur...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Thanks for all the responses.
>>
>> I do know some SQL and CQL, so I know the main differences. You can do
>> joins in MySQL, but the bigger your data, the less likely you want to do
>> that.
>>
>> If you are a team that wants to consider migrating from MySQL to
>> Cassandra, you need some reason to believe that it is going to be faster.
>> What evidence is there?
>>
>> Even the Cassandra home page has references to benchmarks to make the
>> case for Cassandra. Unfortunately, they seem to be about five to six years
>> old. It doesn't make sense to keep them there if you just can't compare.
>>
>> Best, Oliver
>>
>> On Tue, Mar 20, 2018 at 1:13 PM, Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>>> I’m not sure there is a fair comparison. MySQL and Cassandra have
>>> different ways of solving related (but not necessarily the same) problems
>>> of storing and retrieving data.
>>>
>>> The data model between MySQL and Cassandra is likely to be very
>>> different. The key for Cassandra is that you need to model for the queries
>>> that will be executed. If you cannot know the queries ahead of time,
>>> Cassandra is not the best choice. If table scans are typically required,
>>> Cassandra is not a good choice. If you need more than a few hundred tables
>>> in a cluster, Cassandra is not a good choice.
>>>
>>> If multi-datacenter replication is required, Cassandra is an awesome
>>> choice. If you are going to always query by a partition key (or primary
>>> key), Cassandra is a great choice. The nice thing is that the performance
>>> scales linearly, so additional data is fine (as long as you add nodes) –
>>> again, if your data model is designed for Cassandra. If you like
>>> no-downtime upgrades and extreme reliability and availability, Cassandra is
>>> a great choice.
>>>
>>> Personally, I hope to never have to use/support MySQL again, and I love
>>> working with Cassandra. But, Cassandra is not the choice for all data
>>> problems.
>>>
>>> Sean Durity
>>>
>>> *From:* Oliver Ruebenacker [mailto:cur...@gmail.com]
>>> *Sent:* Monday, March 12, 2018 3:58 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] Cassandra vs MySQL
>>>
>>> Hello,
>>>
>>> We have a project currently using MySQL single-node with 5-6TB of data
>>> and some performance issues, and we plan to add data up to a total size of
>>> maybe 25-30TB.
>>>
>>> We are thinking of migrating to Cassandra. I have been trying to find
>>> benchmarks or other guidelines to compare MySQL and Cassandra, but most of
>>> them seem to be five years old or older.
>>>
>>> Is there some good more recent material?
>>>
>>> Thanks!
>>>
>>> Best, Oliver
>>>
>>> --
>>> Oliver Ruebenacker
>>> Senior Software Engineer, Diabetes Portal
>>> <http://www.type2diabetesgenetics.org/>, Broad Institute
>>> <http://www.broadinstitute.org/>