Hello Oliver,

The first thing I check when deciding whether a workload will work well
in Cassandra is its read patterns. Once the read patterns are written
down on paper, we need to figure out how the write patterns will populate
the required tables. Since you already know CQL, it's mainly a matter of
checking how denormalization will work out for on-disk read requests.
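
As a rough illustration of that read-first process, here is a minimal
in-memory sketch in Python. It is not Cassandra code and uses no driver:
the sensor/day read patterns, the table names, and the helper functions
are all hypothetical, chosen only to show one table per read pattern with
each write fanning out to every table.

```python
from collections import defaultdict

# Hypothetical read patterns, written down first:
#   Q1: all readings for a given sensor
#   Q2: all readings taken on a given day
# Each dict stands in for one denormalized table, keyed by its partition key.
readings_by_sensor = defaultdict(list)  # partition key: sensor_id
readings_by_day = defaultdict(list)     # partition key: day


def record_reading(sensor_id, day, value):
    """One logical write fans out to every table that serves a read pattern."""
    row = {"sensor_id": sensor_id, "day": day, "value": value}
    readings_by_sensor[sensor_id].append(row)
    readings_by_day[day].append(row)


def readings_for_sensor(sensor_id):
    """Q1: a single partition lookup, no scan and no join."""
    return readings_by_sensor.get(sensor_id, [])


def readings_for_day(day):
    """Q2: also a single partition lookup, against the second table."""
    return readings_by_day.get(day, [])


record_reading("s1", "2018-03-20", 4.2)
record_reading("s2", "2018-03-20", 3.9)
record_reading("s1", "2018-03-21", 4.4)
```

If every read you wrote down can be served this way, by one lookup on a
known key, and the extra writes are acceptable, that is a sign the
denormalization is going to work out.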

Once the read and write patterns are known, we can see whether Cassandra
is a good fit for denormalizing your workflow, so that you benefit from a
datastore that can scale out horizontally. If your datastore can scale out
horizontally, then Cassandra should be faster than a single-node MySQL
setup. If your datastore has too many relational requirements, is built
for a queue-like purpose, or hits other edge cases, then it doesn't matter
how fast Cassandra is, because it's not the correct tool for the job.

I hope that helps align your discovery/investigation process. :)

Cheers,

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Mar 20, 2018 at 1:44 PM, Oliver Ruebenacker <cur...@gmail.com>
wrote:

>
>      Hello,
>
>   Thanks for all the responses.
>
>   I do know some SQL and CQL, so I know the main differences. You can do
> joins in MySQL, but the bigger your data, the less likely you want to do
> that.
>
>   If you are a team that wants to consider migrating from MySQL to
> Cassandra, you need some reason to believe that it is going to be faster.
> What evidence is there?
>
>   Even the Cassandra home page has references to benchmarks to make the
> case for Cassandra. Unfortunately, they seem to be about five to six years
> old. It doesn't make sense to keep them there if you just can't compare.
>
>      Best, Oliver
>
> On Tue, Mar 20, 2018 at 1:13 PM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> I’m not sure there is a fair comparison. MySQL and Cassandra have
>> different ways of solving related (but not necessarily the same) problems
>> of storing and retrieving data.
>>
>>
>>
>> The data model between MySQL and Cassandra is likely to be very
>> different. The key for Cassandra is that you need to model for the queries
>> that will be executed. If you cannot know the queries ahead of time,
>> Cassandra is not the best choice. If table scans are typically required,
>> Cassandra is not a good choice. If you need more than a few hundred tables
>> in a cluster, Cassandra is not a good choice.
>>
>>
>>
>> If multi-datacenter replication is required, Cassandra is an awesome
>> choice. If you are going to always query by a partition key (or primary
>> key), Cassandra is a great choice. The nice thing is that the performance
>> scales linearly, so additional data is fine (as long as you add nodes) –
>> again, if your data model is designed for Cassandra. If you like
>> no-downtime upgrades and extreme reliability and availability, Cassandra is
>> a great choice.
>>
>>
>>
>> Personally, I hope to never have to use/support MySQL again, and I love
>> working with Cassandra. But, Cassandra is not the choice for all data
>> problems.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Oliver Ruebenacker [mailto:cur...@gmail.com]
>> *Sent:* Monday, March 12, 2018 3:58 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] Cassandra vs MySQL
>>
>>
>>
>>
>>
>>      Hello,
>>
>>   We have a project currently using MySQL single-node with 5-6TB of data
>> and some performance issues, and we plan to add data up to a total size of
>> maybe 25-30TB.
>>
>>   We are thinking of migrating to Cassandra. I have been trying to find
>> benchmarks or other guidelines to compare MySQL and Cassandra, but most of
>> them seem to be five years old or older.
>>
>>   Is there any good, more recent material?
>>
>>   Thanks!
>>
>>      Best, Oliver
>>
>>
>> --
>>
>> Oliver Ruebenacker
>>
>> Senior Software Engineer, Diabetes Portal
>> <http://www.type2diabetesgenetics.org/>, Broad Institute
>> <http://www.broadinstitute.org/>
>>
>>
>>
>>
>
>
>
> --
> Oliver Ruebenacker
> Senior Software Engineer, Diabetes Portal
> <http://www.type2diabetesgenetics.org/>, Broad Institute
> <http://www.broadinstitute.org/>
>
>
