Hello,
I have some experience in benchmarking Cassandra against Oracle and in
running on a VM cluster.
While the VM solution will work for many applications, it simply won't
cut it for all. In particular, I observed a large difference in insert
performance when I moved from VMs to real hardware. This could be due to
a bazillion factors, including the high core count on my "real" machines
and vastly better I/O. The CPU is crucial for inserts in Cassandra, and
it may not be for an RDBMS.
Another factor is the potential bottleneck in the client. There are
cases when you won't have enough muscle to handle the data in the client
itself.
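One rough way to check for a client-side bottleneck is to run the same insert loop against a sink that does nothing and see what rate the client alone can sustain. Below is a minimal Python sketch of that idea (not the C# client used in the original test); `measure_client_ceiling` and `noop_send` are made-up names, and the payload built in `noop_send` is only a placeholder.

```python
import threading
import time
from queue import Queue, Empty


def measure_client_ceiling(n_items, n_threads, send):
    """Push n_items through `send` from n_threads workers.

    Returns (items_sent, items_per_second).
    """
    work = Queue()
    for i in range(n_items):
        work.put(i)
    sent = [0] * n_threads  # one counter per worker, so no lock is needed

    def worker(slot):
        while True:
            try:
                item = work.get_nowait()
            except Empty:
                return
            send(item)
            sent[slot] += 1

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - start, 1e-9)
    total = sum(sent)
    return total, total / elapsed


def noop_send(item):
    # Stand-in for the real insert: builds a placeholder payload, sends nothing.
    return str(item).encode()
```

If the rate measured this way is not far above the rate seen against the real cluster, the client is a likely suspect.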
None of this is definitive, but I'm just throwing in a bit of my
experience from the past 12 months. Right now I'm able to sink data at
insane speeds, far beyond those of Oracle.
Maxim
On 1/22/2012 8:10 AM, Gustavo Gustavo wrote:
Hello,
I've set up a testing environment for Cassandra and MySQL, to compare
the two regarding *performance only*. And I must admit that I was
expecting Cassandra to beat MySQL. But I've not seen this happen so
far.
My application/use case is INSERT intensive, since I'm not updating
anything, just inserting all the time.
To compare both I created virtual machines with Ubuntu 11.10, and
installed the latest versions of each datastore. Each VM has 1GB of
RAM. I've used VMs as a way to give both datastores an equal sandbox.
MySQL is set up as sharded, with 2 databases, meaning that records are
inserted into a specific instance based on key % 2. The engine is
MyISAM (InnoDB was really slow and not really needed for my case).
There's a compound primary key (integer and datetime columns) in this
test table.
Let's name the "nodes" MySQL1 and MySQL2.
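A minimal Python sketch of the key % 2 routing described above. It assumes the sharding key is the integer VehicleID; the function name `shard_for` is made up for illustration.

```python
from collections import Counter

SHARDS = ["MySQL1", "MySQL2"]


def shard_for(key, shards=SHARDS):
    """Pick the instance for a record: key modulo the number of shards."""
    return shards[key % len(shards)]


# With sequential integer keys the split is even.
distribution = Counter(shard_for(vehicle_id) for vehicle_id in range(1000))
```

With 1000 sequential IDs, each instance receives 500 of them; skewed or non-sequential keys would not necessarily split this evenly.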
Cassandra is set up with 4 nodes, with keys (tokens) chosen to
distribute records evenly across the 4 nodes (nodetool ring reports
25% for each node), replication factor 1 and RandomPartitioner; the
other settings are left at their defaults. Let's name the nodes
Cassandra1, Cassandra2, Cassandra3 and Cassandra4.
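For reference, evenly spaced tokens for RandomPartitioner are usually computed by dividing its 2**127 token space by the node count. The sketch below shows that arithmetic in Python; `balanced_tokens` is a made-up helper name, and the actual tokens used in this test are not given in the post.

```python
def balanced_tokens(node_count):
    """Evenly spaced initial tokens over a 2**127 token space."""
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


# One token per node, in ring order, for a 4-node cluster.
for node, token in zip(["Cassandra1", "Cassandra2", "Cassandra3", "Cassandra4"],
                       balanced_tokens(4)):
    print(node, token)
```

Each node then owns a quarter of the ring, which matches the 25% per node that nodetool ring reports.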
I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or
2 (MySQL) virtual machines, this way:
Machine1: MySQL1, Cassandra1, Cassandra3
Machine2: MySQL2, Cassandra2, Cassandra4
The machines have enough CPU and RAM to host either the Cassandra
cluster or the MySQL "cluster", one at a time.
The client test application is running on a third physical machine,
with 8 threads doing inserts. The test application is written in C#
(Windows7) using the Aquiles high-level client.
My use case is a vehicle tracking system. So, let's suppose, from
minute to minute, the vehicle sends its position together with some
other GPS data and vehicle status information. The column names in my
Cassandra cluster are just the DateTime (long value) of a position for
a specific vehicle, and the value is all the other data serialized to
binary format. Therefore, my CF grows mainly in number of columns. All
data is inserted into a single CF/table named Positions. The key in
Cassandra is the VehicleID, and in MySQL it is VehicleID +
PositionDateTime (MySQL creates an index on this automatically). It is
important to note that MySQL threw tons of connection exceptions; even
so, each insert was retried until it got through.
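To make the layout concrete, here is a small in-memory Python sketch of the wide-row model described above: one row per VehicleID, one column per position, with the timestamp as the column name. It does not talk to Cassandra. The payload fields (latitude, longitude, speed) and the use of epoch milliseconds for the long value are assumptions; the post does not say what the binary format contains.

```python
import struct
from collections import defaultdict
from datetime import datetime, timezone

# Positions "column family": row key (VehicleID) -> {column name -> value}
positions = defaultdict(dict)


def pack_position(lat, lon, speed_kmh):
    # Hypothetical payload: two doubles and an unsigned short, big-endian.
    return struct.pack(">ddH", lat, lon, speed_kmh)


def insert_position(cf, vehicle_id, when, payload):
    """Add one column to the vehicle's row; the timestamp is the column name."""
    column_name = int(when.timestamp() * 1000)  # assumed: epoch milliseconds
    cf[vehicle_id][column_name] = payload
    return column_name


t1 = datetime(2012, 1, 21, 11, 45, tzinfo=timezone.utc)
t2 = datetime(2012, 1, 21, 11, 46, tzinfo=timezone.utc)
insert_position(positions, 42, t1, pack_position(-23.55, -46.63, 60))
insert_position(positions, 42, t2, pack_position(-23.56, -46.64, 62))
```

After the two inserts, vehicle 42 still has a single row, now holding two columns, which is why the CF grows in columns rather than in rows.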
My test case was to insert 1k positions a day for each of 1k vehicles
over 10 days, which gives 10,000,000 inserts.
The final throughput that my application had for this scenario was:
Cassandra x 4
2012-01-21 11:45:38,044 #6 [Logger.Log]
INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
2012-01-21 11:45:38,082 #6 [Logger.Log]
INFO - >> Total Time: 2:37:03,359
2012-01-21 11:45:38,085 #6 [Logger.Log]
INFO - >> Throughput: 1061 inserts/s
And for MySQL x 2
2012-01-21 14:26:25,197 #6 [Logger.Log]
INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
2012-01-21 14:26:25,250 #6 [Logger.Log]
INFO - >> Total Time: 2:06:25,914
2012-01-21 14:26:25,263 #6 [Logger.Log]
INFO - >> Throughput: 1318 inserts/s
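As a sanity check, the reported throughput figures can be recomputed from the total times in the logs. A short Python sketch follows; the helper names are made up, and the time format is assumed to be H:MM:SS,mmm as printed above.

```python
def elapsed_seconds(text):
    """Parse a duration like '2:37:03,359' (H:MM:SS,mmm) into seconds."""
    clock, millis = text.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def inserts_per_second(total_inserts, elapsed_text):
    return total_inserts / elapsed_seconds(elapsed_text)


cassandra_rate = inserts_per_second(10_000_000, "2:37:03,359")
mysql_rate = inserts_per_second(10_000_000, "2:06:25,914")
```

Truncated to whole numbers, these come out to 1061 and 1318 inserts/s, matching the logs, so MySQL x 2 was roughly 24% faster than Cassandra x 4 in this run.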
Is there something that I'm missing here? Is this expected? Or is the
problem somewhere else, and hard to pin down from this description
alone?
Cheers,
Gustavo