Howdy Gustavo,

One thing that jumped out at me is that you've put two Cassandra images on the 
same box.  There may be enough CPU and memory for the two images combined but 
you may be seeing some other resource not being shared so nicely - network card 
bandwidth, for example.

More generally, the real question is what the bottleneck is (for both db's, 
actually).  Start with Cassandra running in that configuration and with 
one client thread sending one request a second.  Look at the CPU, network and 
memory metrics for all boxes (including the client).  Nothing should be even 
close to maxing out at that throughput.  Now incrementally increase one of 
the test parameters (number of clients or number of inserts per second) just a 
bit (say from one transaction to 5) and note the above metrics.  Keep slowly 
increasing the test parameters, one at a time, until one of the metrics maxes 
out.  That's the bottleneck you're wondering about.  Fix that and the db (be it 
Cassandra or MySQL) will move ahead of the other performance-wise.  Turn your 
attention to the other db and repeat.
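
If it helps, here's a rough Python sketch of that ramp-up loop.  It's only an 
illustration under assumptions: find_bottleneck, run_step and the 0.9 limit are 
names and values made up for this example.  run_step stands in for whatever 
drives your client at a given load and returns the peak utilization (0.0 to 
1.0) of each resource on each box.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def find_bottleneck(
    run_step: Callable[[int, int], Dict[str, float]],
    steps: Iterable[Tuple[int, int]],
    limit: float = 0.9,
) -> Optional[Tuple[int, int, str]]:
    """Walk the load steps in order and report the first saturated metric.

    run_step(threads, rate) is a hypothetical hook: it should run the test
    with that many client threads and that many inserts per second per
    thread, then return utilization per resource, e.g. {"box1.cpu": 0.4}.
    Returns (threads, rate, metric_name), or None if nothing reached limit.
    """
    for threads, rate in steps:
        metrics = run_step(threads, rate)
        print(f"threads={threads} rate={rate}/s metrics={metrics}")
        saturated = [name for name, used in metrics.items() if used >= limit]
        if saturated:
            # Report the most heavily used resource among those over the limit.
            worst = max(saturated, key=lambda name: metrics[name])
            return threads, rate, worst
    return None
```

You'd feed it steps that change one parameter at a time, something like 
[(1, 1), (1, 5), (1, 25), (2, 25), (4, 25), (8, 25)], and the first metric it 
reports is the one to look at.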

- Chris Gerken

On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:

> Hello,
> 
> I've set up a testing environment for Cassandra and MySQL, to compare both, 
> regarding *performance only*. And I must admit that I was expecting Cassandra 
> to beat MySQL. But I've not seen this happening up to now.
> My application/use case is INSERT intensive, since I'm not updating anything, 
> just inserting all the time.
> To compare both I created virtual machines with Ubuntu 11.10, and installed 
> the latest versions of each datastore. Each VM has 1GB of RAM. I've used VMs 
> as a way to give both datastores an equal sandbox.
> MySQL is set up to work as sharded, with 2 databases, that means that records 
> are inserted to a specific instance based on key % 2. The engine is MyISAM 
> (InnoDB was really slow and not really needed for my case). There's a primary 
> compound key (integer and datetime columns) in this test table.
> Let's name the "nodes" MySQL1 and MySQL2.
> Cassandra is set up to work with 4 nodes, with keys (tokens) set up to 
> distribute records evenly across the 4 nodes (nodetool ring reports 25% to 
> each node), replication factor 1 and RandomPartitioner, the other configs are 
> left to default. Let's name the nodes Cassandra1, Cassandra2, Cassandra3 and 
> Cassandra4.
> 
> I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 2 
> (MySQL) virtual machines, this way:
> Machine1: MySQL1, Cassandra1, Cassandra3
> Machine2: MySQL2, Cassandra2, Cassandra4
> The machines have CPU and RAM enough to host Cassandra Cluster or MySQL 
> "Cluster" at a time.
> 
> The client test application is running in a third physical machine, with 8 
> threads doing inserts. The test application is written in C# (Windows7) using 
> Aquiles high-level client.
> 
> My use case is a vehicle tracking system. So, let's suppose, from minute to 
> minute, the vehicle sends its position together with some other GPS data and 
> vehicle status information. The columns in my Cassandra cluster are just the 
> DateTime (long value) of a position for a specific vehicle, and the value is 
> all the other data serialized to binary format. Therefore, my CF really grows 
> in columns number. So all data is inserted only to one CF/Table named 
> Positions. The key to Cassandra is the VehicleID and to MySQL VehicleID + 
> PositionDateTime (MySQL creates an index on this automatically). Important to 
> note that MySQL threw tons of connection exceptions; even so, each insert 
> was retried until it went through.
> 
> My test case was to insert 1k positions per day for 1k vehicles over 10 days, 
> which gives 10,000,000 inserts.
> 
> The final throughput that my application had for this scenario was:
> 
> Cassandra x 4
> 2012-01-21 11:45:38,044 #6         [Logger.Log] INFO  - >> Inserted 10000 
> positions for 1000 vehicles (10000000 inserts): 
> 2012-01-21 11:45:38,082 #6         [Logger.Log] INFO  - >> Total Time: 
> 2:37:03,359
> 2012-01-21 11:45:38,085 #6         [Logger.Log] INFO  - >> Throughput: 1061 
> inserts/s
> 
> And for MySQL x 2
> 2012-01-21 14:26:25,197 #6         [Logger.Log] INFO  - >> Inserted 10000 
> positions for 1000 vehicles (10000000 inserts): 
> 2012-01-21 14:26:25,250 #6         [Logger.Log] INFO  - >> Total Time: 
> 2:06:25,914
> 2012-01-21 14:26:25,263 #6         [Logger.Log] INFO  - >> Throughput: 1318 
> inserts/s
> 
> Is there something that I'm missing here? Is this expected? Or is the problem 
> somewhere else, and hard to pin down from this description?
> 
> Cheers,
> Gustavo
> 
