This is my personal experience. MySQL is faster than Cassandra in
most normal use cases.
You should understand why you are choosing Cassandra instead of MySQL.
If one central MySQL server can handle your workload, MySQL is better
than Cassandra. BUT if you are overloading one MySQL server and want
multiple boxes, Cassandra can be a cheap solution. Cassandra provides
a fault-tolerant, decentralized, durable design and a rich data model.
It will not give you high performance; read performance in particular
is poor.
Digg did not succeed with Cassandra. You can check
http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/
This doesn't mean Cassandra is bad. You need to design carefully
around Cassandra for your application and business model to succeed.
On Sep 15, 2010, at 12:06 PM, Wayne wrote:
If MySQL is faster, then use it. I struggled to do side-by-side
comparisons with MySQL for months until finally realizing they are
too different to compare side by side. MySQL is always faster out of
the gate when you come at the problem thinking in terms of relational
databases. Then add in replication factor, wider rows, databases that
are 2-3 terabytes, tables with 3+ billion rows, etc., and the picture
changes. The NoSQL "noise" out there should be ignored, and a solution
like Cassandra should be evaluated for what it brings to the table as
a technology that can solve the problems of big data, not for how it
does individual queries relative to MySQL. If a "normal" database
works for you, use it!!
We have tested real loads using a 6-node cluster and consistently
get 5 ms reads under load. That is 200 reads/second (1 thread). MySQL
is 10x faster per read, but we also have wide rows, and in that 5 ms
we get 6 months of many different time series, which in the end means
Cassandra is 10x faster than MySQL (1 thread). By embracing wide rows
we turn slower into faster. Add in multiple threads/processes and the
ability of a 20-node cluster to support concurrent reads, and MySQL
is left in the dust. Also, we don't have 300 GB compressed backup
files, we can easily add new nodes and grow, we can actually add
columns dynamically without the dreaded DDL deadlock nightmare in
MySQL, and for once we have replication that just works.
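
To make the arithmetic above concrete, here is a rough sketch in
Python. The 5 ms latency and the two "10x" figures come from the
message; everything else is an assumption for illustration: the
function names are made up, a MySQL read is assumed to return one
data point, and the figure of 100 data points per wide-row read is
simply the ratio needed for the numbers to line up, not a measured
value.

```python
def reads_per_second(latency_ms: float, threads: int = 1) -> float:
    """Throughput of synchronous clients issuing back-to-back reads."""
    return threads * 1000.0 / latency_ms


# Figures taken from the message above.
cassandra_latency_ms = 5.0
mysql_latency_ms = cassandra_latency_ms / 10  # "MySQL is 10x faster" per read

cassandra_rps = reads_per_second(cassandra_latency_ms)  # 200 reads/s, 1 thread
mysql_rps = reads_per_second(mysql_latency_ms)          # 2000 reads/s, 1 thread


def effective_speedup(points_per_wide_row: float,
                      points_per_mysql_read: float = 1.0) -> float:
    """Ratio of data points fetched per second, Cassandra vs. MySQL.

    Both per-read point counts are hypothetical inputs; the thread
    does not state them.
    """
    cassandra_points = cassandra_rps * points_per_wide_row
    mysql_points = mysql_rps * points_per_mysql_read
    return cassandra_points / mysql_points


if __name__ == "__main__":
    print(f"Cassandra: {cassandra_rps:.0f} reads/s (1 thread)")
    print(f"MySQL:     {mysql_rps:.0f} reads/s (1 thread)")
    # A wide row must carry about 100x the data of one MySQL read
    # for the "10x faster overall" claim to hold.
    print(f"Speedup at 100 points/row: {effective_speedup(100):.1f}x")
```

In other words, the per-read latency only tells half the story; how
much useful data each read returns matters just as much.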
On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev
<olega...@gmail.com> wrote:
Kamil Gorlo <kgs4242 <at> gmail.com> writes:
>
> So I've got more reads from single MySQL with 400GB of data than from
> 8 machines storing about 266GB. This doesn't look good. What am I
> doing wrong? :)
The worst case for Cassandra is random reads. You should ask yourself
a question: do you really have this kind of workload in production?
If you really do, that means Cassandra is not the right tool for the
job. A product based on Berkeley DB, e.g. Voldemort, should work
better. A plain old filesystem is also good for 100% random reads (if
you don't need backups, of course).