In some sense, one-for-one performance "almost" does not matter, though I bet you can get Cassandra to perform better (I remember old-school YCSB white paper benchmarks against a sharded MySQL).
One of the main selling points of Cassandra is elasticity: if you want to grow from 4 nodes to 8 nodes, to 14 nodes, and so on, Cassandra supports adding and removing nodes online. A do-it-yourself "hash mod N" algorithm really has no upgrade path.

Edward

On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken <chrisger...@mindspring.com> wrote:

> Howdy Gustavo,
>
> One thing that jumped out at me is your having put two Cassandra images on
> the same box. There may be enough CPU and memory for the two images
> combined but you may be seeing some other resource not being shared so
> nicely - network card bandwidth, for example.
>
> More generally, the real question is what the bottleneck is (for both
> db's, actually). Start with Cassandra running in that configuration and
> start with one client thread sending one request a second. Look at the
> CPU, network and memory metrics for all boxes (including the client).
> Nothing should be even close to maxing out at that throughput. Now
> incrementally increase one of the test parameters (number of clients or
> number of inserts per second) just a bit (say from one transaction to 5)
> and note the above metrics. Keep slowly increasing the test parameters,
> one at a time, until one of the metrics maxes out. That's the bottleneck
> you're wondering about. Fix that and the db (be it Cassandra or MySQL)
> will move ahead of the other performance-wise. Turn your attention to the
> other db and repeat.
>
> - Chris Gerken
>
> On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:
>
> Hello,
>
> I've set up a testing environment for Cassandra and MySQL, to compare both,
> regarding *performance only*. And I must admit that I was expecting
> Cassandra to beat MySQL. But I've not seen this happening up to now.
> My application/use case is INSERT intensive, since I'm not updating
> anything, just inserting all the time.
> To compare both I created virtual machines with Ubuntu 11.10, and
> installed the latest versions of each datastore.
> Each VM has 1GB of RAM.
> I've used VMs as a way to give both datastores an equal sandbox.
> MySQL is set up as sharded, with 2 databases, meaning that
> records are inserted into a specific instance based on key % 2. The engine is
> MyISAM (InnoDB was really slow and not really needed for my case). There's a
> primary compound key (integer and datetime columns) in this test table.
> Let's name the "nodes" MySQL1 and MySQL2.
> Cassandra is set up to work with 4 nodes, with keys (tokens) set up to
> distribute records evenly across the 4 nodes (nodetool ring reports 25% to
> each node), replication factor 1 and RandomPartitioner; the other configs
> are left at their defaults. Let's name the nodes Cassandra1, Cassandra2,
> Cassandra3 and Cassandra4.
>
> I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 2
> (MySQL) virtual machines, this way:
> Machine1: MySQL1, Cassandra1, Cassandra3
> Machine2: MySQL2, Cassandra2, Cassandra4
> The machines have enough CPU and RAM to host the Cassandra cluster or the
> MySQL "cluster", one at a time.
>
> The client test application is running on a third physical machine, with 8
> threads doing inserts. The test application is written in C# (Windows7)
> using the Aquiles high-level client.
>
> My use case is a vehicle tracking system. So, let's suppose, from minute
> to minute, the vehicle sends its position together with some other GPS data
> and vehicle status information. The columns in my Cassandra cluster are
> just the DateTime (long value) of a position for a specific vehicle, and
> the value is all the other data serialized to binary format. Therefore, my
> CF really grows in number of columns. So all data is inserted only into one
> CF/Table named Positions. The key in Cassandra is the VehicleID and in
> MySQL VehicleID + PositionDateTime (MySQL creates an index on this
> automatically). Important to note that MySQL threw tons of connection
> exceptions; even so, each insert was retried until it got through.
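An aside on the key % 2 sharding described above: as a rough illustration of why a do-it-yourself modulo scheme is hard to grow, the Python sketch below counts how many keys would land on a different shard if the shard count changed. The function name `moved_fraction` and the integer ID range are made up for the illustration; nothing here is taken from the actual test application.

```python
def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys whose shard (key % shard_count) changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


# Hypothetical vehicle IDs 0..5999; the real IDs are not given in the post.
vehicle_ids = range(6000)

# Growing the 2-shard setup to 3 or 4 shards under plain modulo routing.
print(moved_fraction(vehicle_ids, 2, 3))
print(moved_fraction(vehicle_ids, 2, 4))
```

With these made-up IDs, two thirds of the keys change shard when going from 2 to 3 shards, and half of them when going from 2 to 4, so most existing rows would have to be moved by hand.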
>
> My test case was to insert 1k positions for 1k vehicles over 10 days - which
> gives 10,000,000 inserts.
>
> The final throughput that my application had for this scenario was:
>
> Cassandra x 4
> 2012-01-21 11:45:38,044 #6 [Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
> 2012-01-21 11:45:38,082 #6 [Logger.Log] INFO - >> Total Time: 2:37:03,359
> 2012-01-21 11:45:38,085 #6 [Logger.Log] INFO - >> Throughput: 1061 inserts/s
>
> And for MySQL x 2
> 2012-01-21 14:26:25,197 #6 [Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
> 2012-01-21 14:26:25,250 #6 [Logger.Log] INFO - >> Total Time: 2:06:25,914
> 2012-01-21 14:26:25,263 #6 [Logger.Log] INFO - >> Throughput: 1318 inserts/s
>
> Is there something that I'm missing here? Is this expected? Or is the problem
> somewhere else, and that's hard to say looking at this description?
>
> Cheers,
> Gustavo
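As a sanity check on the figures above, the reported throughput is just total inserts divided by elapsed time. The short Python sketch below redoes that arithmetic from the logged totals; `parse_elapsed` and `throughput` are names invented here, and the H:MM:SS,mmm format (comma as decimal separator) is assumed from the log lines.

```python
def parse_elapsed(text):
    """Convert an 'H:MM:SS,mmm' string (comma as decimal separator) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds.replace(",", "."))


def throughput(inserts, elapsed):
    """Average inserts per second over the whole run."""
    return inserts / parse_elapsed(elapsed)


TOTAL_INSERTS = 10_000_000  # from the "(10000000 inserts)" log lines

cassandra = throughput(TOTAL_INSERTS, "2:37:03,359")
mysql = throughput(TOTAL_INSERTS, "2:06:25,914")

print(int(cassandra), int(mysql))    # whole inserts/s, as the logs report them
print(round(mysql / cassandra, 2))   # ratio of MySQL to Cassandra throughput
```

Both logged values check out (1061 and 1318 inserts/s), and the MySQL run comes out roughly 24% faster overall. These are averages over the full run, so they say nothing yet about which resource was the bottleneck.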