I was able to make Cassandra beat MySQL MyISAM (~10k inserts/s against 6k inserts/s) using two physical machines (laptops), one as the client and the other as the server, with 50 inserting threads. I don't know exactly why yet, but the high-level C# client that I was using (Aquiles) was taking a lot of CPU. I switched to fluent-cassandra and things started to go pretty fast. I suspect this was the real problem.

Yep, dual boot is a good idea. I'll give it a try and see if I can push both datastores further. But I think the client won't have enough CPU to handle much more than 50 threads.
/Gustavo

2012/1/24 Maxim Potekhin <potek...@bnl.gov>

> a) I hate to break it to you, but 6GB x 4 cores != 'high-end machine'.
> It's pretty much middle of the road consumer level these days.
>
> b) Hosting the client and Cassandra on the same node is a Bad Idea. It
> will depend on what exactly the client will do, but in my experience it
> won't work too well in general.
>
> c) Have you considered dual boot, so you can have a "good operating
> system" (as per Cassandra folks) in addition to Windows?
>
> Maxim
>
> On 1/22/2012 8:22 PM, Gustavo Gustavo wrote:
>
> Ok guys, thank you for the valuable hints you gave me.
> For sure, things will perform much better on real hardware. But my
> objective isn't really to see what the max throughput of the
> datastores is. It is more like: given equal conditions, which
> one would perform better?
> So I'll do it this way: I'm going to use a high-end machine (6GB RAM, 4
> cores) and run Cassandra, MySQL and the Client Test Application on the same
> machine. Unfortunately, I'll have to use Windows 7 as a host to the
> datastores.
> From your experience, do you think that, even on a single node,
> Cassandra can beat an RDBMS in inserts? I've seen that InnoDB (something that
> compares to the other databases' relational engines) is pretty slow. But when
> it comes to MyISAM, things are much faster.
>
> /Gustavo
>
> 2012/1/22 Chris Gerken <chrisger...@mindspring.com>
>
>> Edward (and Maxim),
>>
>> I agree. I was just recalling previous performance bake-offs (for
>> other technologies, long time ago, galaxy far far away) in which the
>> customer had put together a mockup of the high throughput expected in
>> production and wanted to make a decision against that one set of numbers.
>> We always found that both/all competing products could be made to run
>> faster due to unexpected factors in the non-production test build.
>> For our side, we always started simple and built up the throughput until we
>> found a bottleneck. We fixed the bottleneck. Rinse and repeat.
>>
>> Chris Gerken
>>
>> chrisger...@mindspring.com
>> 512.587.5261
>> http://www.linkedin.com/in/chgerken
>>
>> On Jan 22, 2012, at 8:51 AM, Edward Capriolo wrote:
>>
>> In some sense one-for-one performance "almost" does not matter. Though I bet
>> you can get Cassandra better (I remember old school YCSB white paper
>> benches against a sharded MySQL).
>>
>> One of the main bullet points of Cassandra is that if you want to grow from 4
>> nodes, to 8 nodes, to 14 nodes, and so on, Cassandra is elastic and
>> supports online adding and removing of nodes. A do-it-yourself "hash mod
>> this" algorithm really has no upgrade path.
>>
>> Edward
>>
>> On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken <chrisger...@mindspring.com>
>> wrote:
>>
>>> Howdy Gustavo,
>>>
>>> One thing that jumped out at me is your having put two Cassandra
>>> images on the same box. There may be enough CPU and memory for the two
>>> images combined but you may be seeing some other resource not being shared
>>> so nicely - network card bandwidth, for example.
>>>
>>> More generally, the real question is what the bottleneck is (for both
>>> db's, actually). Start with Cassandra running in that configuration and
>>> start with one client thread sending one request a second. Look at the
>>> CPU, network and memory metrics for all boxes (including the client).
>>> Nothing should be even close to maxing out at that throughput. Now
>>> incrementally increase one of the test parameters (number of clients or
>>> number of inserts per second) just a bit (say from one transaction to 5)
>>> and note the above metrics. Keep slowly increasing the test parameters,
>>> one at a time, until one of the metrics maxes out. That's the bottleneck
>>> you're wondering about. Fix that and the db (be it Cassandra or MySQL)
>>> will move ahead of the other performance-wise.
>>> Turn your attention to the other db and repeat.
>>>
>>> - Chris Gerken
>>>
>>> On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:
>>>
>>> Hello,
>>>
>>> I've set up a testing environment for Cassandra and MySQL, to compare
>>> both, regarding *performance only*. And I must admit that I was expecting
>>> Cassandra to beat MySQL. But I've not seen this happening up to now.
>>> My application/use case is INSERT intensive, since I'm not updating
>>> anything, just inserting all the time.
>>> To compare both I created virtual machines with Ubuntu 11.10, and
>>> installed the latest versions of each datastore. Each VM has 1GB of RAM.
>>> I've used VMs as a way to give both datastores an equal sandbox.
>>> MySQL is set up as sharded, with 2 databases, meaning that
>>> records are inserted into a specific instance based on key % 2. The engine is
>>> MyISAM (InnoDB was really slow and not really needed for my case). There's a
>>> compound primary key (integer and datetime columns) in this test table.
>>> Let's name the "nodes" MySQL1 and MySQL2.
>>> Cassandra is set up to work with 4 nodes, with keys (tokens) set up to
>>> distribute records evenly across the 4 nodes (nodetool ring reports 25% to
>>> each node), replication factor 1 and RandomPartitioner; the other configs
>>> are left at their defaults. Let's name the nodes Cassandra1, Cassandra2,
>>> Cassandra3 and Cassandra4.
>>>
>>> I'm using 2 physical machines (Windows 7) to host the 4 (Cassandra) or 2
>>> (MySQL) virtual machines, this way:
>>> Machine1: MySQL1, Cassandra1, Cassandra3
>>> Machine2: MySQL2, Cassandra2, Cassandra4
>>> The machines have enough CPU and RAM to host either the Cassandra cluster
>>> or the MySQL "cluster" at a time.
>>>
>>> The client test application is running on a third physical machine, with
>>> 8 threads doing inserts. The test application is written in C# (Windows 7)
>>> using the Aquiles high-level client.
>>>
>>> My use case is a vehicle tracking system.
>>> So, let's suppose, from minute to minute, the vehicle sends its position
>>> together with some other GPS data and vehicle status information. The
>>> columns in my Cassandra cluster are just the DateTime (long value) of a
>>> position for a specific vehicle, and the value is all the other data
>>> serialized to binary format. Therefore, my CF really grows in number of
>>> columns. So all data is inserted into only one CF/table named Positions.
>>> The key in Cassandra is the VehicleID, and in MySQL it is VehicleID +
>>> PositionDateTime (MySQL creates an index on this automatically). It's
>>> important to note that MySQL threw tons of connection exceptions; even so,
>>> each insert was retried until it got through.
>>>
>>> My test case was to insert 1k positions for 1k vehicles over 10 days,
>>> which gives 10,000,000 inserts.
>>>
>>> The final throughput that my application had for this scenario was:
>>>
>>> Cassandra x 4
>>> 2012-01-21 11:45:38,044 #6 [Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
>>> 2012-01-21 11:45:38,082 #6 [Logger.Log] INFO - >> Total Time: 2:37:03,359
>>> 2012-01-21 11:45:38,085 #6 [Logger.Log] INFO - >> Throughput: 1061 inserts/s
>>>
>>> And for MySQL x 2
>>> 2012-01-21 14:26:25,197 #6 [Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
>>> 2012-01-21 14:26:25,250 #6 [Logger.Log] INFO - >> Total Time: 2:06:25,914
>>> 2012-01-21 14:26:25,263 #6 [Logger.Log] INFO - >> Throughput: 1318 inserts/s
>>>
>>> Is there something that I'm missing here? Is this expected? Or is the
>>> problem somewhere else, and it's hard to say from this description?
>>>
>>> Cheers,
>>> Gustavo
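
As a sanity check on the two log excerpts above, the reported throughput can be recomputed from the logged total time. This is only a minimal Python sketch: the function names are made up for illustration (they are not from the test application), and it assumes the "Total Time" field is H:MM:SS with a comma as the decimal separator.

```python
def parse_duration(text):
    """Convert 'H:MM:SS,mmm' (comma as decimal separator) into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds.replace(",", "."))


def throughput(inserts, duration_text):
    """Average inserts per second over the whole run."""
    return inserts / parse_duration(duration_text)


# 10000 positions for each of 1000 vehicles, as stated in the log lines.
TOTAL_INSERTS = 10_000 * 1_000

cassandra = throughput(TOTAL_INSERTS, "2:37:03,359")  # Cassandra x 4 run
mysql = throughput(TOTAL_INSERTS, "2:06:25,914")      # MySQL x 2 run

print(f"Cassandra x 4: {cassandra:.2f} inserts/s")
print(f"MySQL x 2:     {mysql:.2f} inserts/s")
print(f"MySQL / Cassandra ratio: {mysql / cassandra:.2f}")
```

Dropping the fractional part gives the same 1061 and 1318 inserts/s that the logs report, so the gap between the two runs (roughly 24% in MySQL's favour) comes from the elapsed times themselves and not from how the throughput figure was derived.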