Re: Cassandra x MySQL Sharded - Insert Comparison

Maxim Potekhin Tue, 24 Jan 2012 14:23:18 -0800

a) I hate to break it to you, but 6GB x 4 cores != 'high-end machine'. It's pretty much middle-of-the-road consumer level these days.

b) Hosting the client and Cassandra on the same node is a Bad Idea. It will depend on what exactly the client does, but in my experience it won't work too well in general.

c) Have you considered dual boot, so you can have a "good operating system" (as per Cassandra folks) in addition to Windows?


Maxim


On 1/22/2012 8:22 PM, Gustavo Gustavo wrote:

Ok guys, thank you for the valuable hints you gave me.

For sure, things will perform much better on real hardware. But my objective isn't really to see what the maximum throughput of the datastores is. It is more or less: given equal conditions, which one would perform better? So I'll do it this way: I'm going to use a high-end machine (6GB RAM, 4 cores) and run Cassandra, MySQL and the client test application on the same machine. Unfortunately, I'll have to use Windows 7 as the host for the datastores.

From your experience, do you think that, even on a single node, Cassandra can beat an RDBMS at inserts? I've seen that InnoDB (which is comparable to the relational engines of other databases) is pretty slow. But when it comes to MyISAM, things are much faster.


/Gustavo

2012/1/22 Chris Gerken <chrisger...@mindspring.com>


    Edward (and Maxim),

    I agree.  I was just recalling previous performance bake-offs (for
    other technologies, long time ago, galaxy far far away) in which
    the customer had put together a mockup of the high throughput
    expected in production and wanted to make a decision against that
    one set of numbers.  We always found that both/all competing
    products could be made to run faster due to unexpected factors in
    the non-production test build.  For our side, we always started
    simple and built up the throughput until we found a bottleneck.
    We fixed the bottleneck. Rinse and repeat.

    Chris Gerken

    chrisger...@mindspring.com
    512.587.5261
    http://www.linkedin.com/in/chgerken



    On Jan 22, 2012, at 8:51 AM, Edward Capriolo wrote:

    In some sense, one-for-one performance "almost" does not matter.
    Though I bet you can get Cassandra to do better (I remember the
    old-school YCSB white paper benchmarks against a sharded MySQL).

    One of the main selling points of Cassandra is that if you want to
    grow from 4 nodes, to 8 nodes, to 14 nodes, and so on, Cassandra is
    elastic and supports adding and removing nodes online. A
    do-it-yourself hash-mod algorithm like this one really has no
    upgrade path.
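To make the hash-mod point concrete, here is a minimal Python sketch comparing how many keys change owner when a cluster grows from 4 to 5 members under plain key % N versus a consistent-hash ring. The ring is a toy: the names (`Ring`, `modulo_owner`, nodes n1..n5) are hypothetical, and giving each node 100 MD5-hashed tokens is an assumption made for smoother balance, not how a Cassandra 1.0 cluster assigns its single initial_token per node.

```python
import bisect
import hashlib


def modulo_owner(key: int, n_shards: int) -> int:
    # Do-it-yourself sharding: the shard is simply key mod shard count.
    return key % n_shards


def _hash(value: str) -> int:
    # Stable 128-bit hash (MD5) so placement is reproducible across runs.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring; each node gets several tokens (illustrative only)."""

    def __init__(self, nodes, tokens_per_node=100):
        pairs = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [t for t, _ in pairs]
        self.owners = [n for _, n in pairs]

    def owner(self, key: int) -> str:
        # A key belongs to the first token at or after its hash, wrapping around.
        i = bisect.bisect_left(self.tokens, _hash(str(key)))
        return self.owners[i % len(self.owners)]


def moved_fraction(before, after, keys) -> float:
    keys = list(keys)
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = range(100_000)
mod_moved = moved_fraction(lambda k: modulo_owner(k, 4), lambda k: modulo_owner(k, 5), keys)

old_ring = Ring(["n1", "n2", "n3", "n4"])
new_ring = Ring(["n1", "n2", "n3", "n4", "n5"])
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)

print(f"modulo 4 -> 5 shards: {mod_moved:.0%} of keys move")
print(f"hash ring 4 -> 5 nodes: {ring_moved:.0%} of keys move")
```

With key % N, 80% of the keys land on a different shard after adding one shard, so nearly all data has to be reshuffled. On the ring, only keys that fall into the new node's ranges move, and every one of them moves to the new node.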

    Edward

    On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken
    <chrisger...@mindspring.com>
    wrote:

        Howdy Gustavo,

        One thing that jumped out at me is that you put two
        Cassandra images on the same box. There may be enough CPU
        and memory for the two images combined, but some other
        resource may not be shared so nicely - network card
        bandwidth, for example.

        More generally, the real question is what the bottleneck is
        (for both DBs, actually). Start with Cassandra running in
        that configuration, with one client thread sending one
        request a second. Look at the CPU, network and memory
        metrics for all boxes (including the client). Nothing should
        be even close to maxing out at that throughput. Now
        incrementally increase one of the test parameters (number of
        clients or number of inserts per second) just a bit (say from
        one transaction to 5) and note the above metrics. Keep
        slowly increasing the test parameters, one at a time, until
        one of the metrics maxes out. That's the bottleneck you're
        wondering about. Fix that and the DB (be it Cassandra or
        MySQL) will move ahead of the other performance-wise. Turn
        your attention to the other DB and repeat.
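The ramp-up procedure above can be sketched as a loop that raises one load parameter at a time and stops at the first resource that saturates. This is a hypothetical Python sketch: `run_at` stands in for whatever drives the client at a given insert rate and collects CPU, network and memory utilisation from every box (including the client), and the 90% saturation threshold is an arbitrary assumption.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Utilisation of each resource as a fraction of capacity, e.g. {"cpu": 0.35, ...}
Metrics = Dict[str, float]


def find_bottleneck(
    levels: Iterable[int],
    run_at: Callable[[int], Metrics],
    limit: float = 0.9,
) -> Optional[Tuple[int, str]]:
    """Raise the load step by step; return (level, resource) at the first saturation."""
    for level in levels:
        metrics = run_at(level)  # drive the test at this load, gather metrics from all boxes
        saturated = [name for name, used in metrics.items() if used >= limit]
        if saturated:
            # Report the most heavily used resource at the first saturating level.
            return level, max(saturated, key=lambda name: metrics[name])
    return None  # nothing maxed out within the tested range


def fake_run(rate: int) -> Metrics:
    # Hypothetical linear model: each insert/s costs 0.2% CPU and 0.5% NIC bandwidth.
    return {"cpu": rate * 0.002, "network": rate * 0.005, "memory": 0.4}


print(find_bottleneck([1, 5, 25, 50, 100, 200, 400], fake_run))  # (200, 'network')
```

`fake_run` is only there to exercise the loop; in a real test each level would be held long enough for the metrics to settle before moving on.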

        - Chris Gerken

        On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:

        Hello,

        I've set up a testing environment for Cassandra and MySQL to
        compare the two with regard to *performance only*. I must admit
        that I was expecting Cassandra to beat MySQL, but I haven't
        seen that happen so far.
        My application/use case is INSERT-intensive: I'm not
        updating anything, just inserting all the time.
        To compare both I created virtual machines with Ubuntu
        11.10, and installed the latest versions of each datastore.
        Each VM has 1GB of RAM. I've used VMs as a way to give both
        datastores an equal sandbox.
        MySQL is set up as sharded, with 2 databases, which
        means that records are inserted into a specific instance based
        on key % 2. The engine is MyISAM (InnoDB was really slow and
        not really needed for my case). There's a compound primary
        key (integer and datetime columns) on this test table.
        Let's name the "nodes" MySQL1 and MySQL2.
        Cassandra is set up with 4 nodes, with keys (tokens)
        chosen to distribute records evenly across the 4 nodes
        (nodetool ring reports 25% for each node), replication factor
        1 and RandomPartitioner; all other settings are left at their
        defaults. Let's name the nodes Cassandra1, Cassandra2,
        Cassandra3 and Cassandra4.
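For reference, RandomPartitioner places row keys by MD5 hash on a token ring from 0 to 2^127, and the usual way to get the even 25% split that nodetool ring reports is to give node i the initial_token i * 2^127 / N. A small Python sketch of that calculation (the function names, and which Cassandra node gets which token, are illustrative assumptions):

```python
from typing import List

RING_SIZE = 2 ** 127  # RandomPartitioner token space: 0 .. 2**127 (MD5-based)


def initial_tokens(n_nodes: int) -> List[int]:
    # Evenly spaced tokens: node i gets i * 2**127 / N.
    return [i * RING_SIZE // n_nodes for i in range(n_nodes)]


def ownership(tokens: List[int]) -> List[float]:
    # A node owns the range after the previous token up to and including its own,
    # wrapping around the ring for the first node.
    return [((t - tokens[i - 1]) % RING_SIZE) / RING_SIZE for i, t in enumerate(tokens)]


tokens = initial_tokens(4)
# Hypothetical node-to-token assignment, in ring order.
for name, token, share in zip(["Cassandra1", "Cassandra2", "Cassandra3", "Cassandra4"],
                              tokens, ownership(tokens)):
    print(f"{name}: initial_token={token} owns {share:.0%}")
```

Even token ranges only balance the load if row keys hash evenly; with VehicleID as the row key and 1k vehicles, the MD5 spread should be close to uniform.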

        I'm using 2 physical machines (Windows7) to host the 4
        (Cassandra) or 2 (MySQL) virtual machines, this way:
        Machine1: MySQL1, Cassandra1, Cassandra3
        Machine2: MySQL2, Cassandra2, Cassandra4
        The machines have enough CPU and RAM to host either the Cassandra
        cluster or the MySQL "cluster", one at a time.

        The client test application runs on a third physical
        machine, with 8 threads doing inserts. The test application
        is written in C# (Windows 7) using the Aquiles high-level client.

        My use case is a vehicle tracking system. So, let's suppose
        that every minute the vehicle sends its position together
        with some other GPS data and vehicle status information.
        The column names in my Cassandra cluster are just the
        DateTime (as a long value) of a position for a specific
        vehicle, and the column value is all the other data
        serialized to binary format. Therefore, my CF really grows
        in number of columns. All data is inserted into only one
        CF/table, named Positions. The key in Cassandra is the
        VehicleID, and in MySQL it is VehicleID + PositionDateTime
        (MySQL creates an index for this automatically). It is
        important to note that MySQL threw tons of connection
        exceptions; nevertheless, each insert was retried until it
        got through to MySQL.

        My test case was to insert 1k positions per day for each of
        1k vehicles over 10 days, which gives 10,000,000 inserts.

        The final throughput that my application achieved for this
        scenario was:

        Cassandra x 4

        2012-01-21 11:45:38,044 #6[Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
        2012-01-21 11:45:38,082 #6[Logger.Log] INFO - >> Total Time: 2:37:03,359
        2012-01-21 11:45:38,085 #6[Logger.Log] INFO - >> Throughput: 1061 inserts/s


        And for MySQL x 2

        2012-01-21 14:26:25,197 #6[Logger.Log] INFO - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
        2012-01-21 14:26:25,250 #6[Logger.Log] INFO - >> Total Time: 2:06:25,914
        2012-01-21 14:26:25,263 #6[Logger.Log] INFO - >> Throughput: 1318 inserts/s
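As a sanity check, the throughput figures in the logs follow directly from the insert count and the reported total times. A small Python sketch of that arithmetic (the variable names are illustrative):

```python
from datetime import timedelta

# 1k positions/day x 1k vehicles x 10 days
TOTAL_INSERTS = 1_000 * 1_000 * 10


def throughput(total_time: timedelta) -> float:
    # Average inserts per second over the whole run.
    return TOTAL_INSERTS / total_time.total_seconds()


cassandra_time = timedelta(hours=2, minutes=37, seconds=3, milliseconds=359)  # Total Time: 2:37:03,359
mysql_time = timedelta(hours=2, minutes=6, seconds=25, milliseconds=914)      # Total Time: 2:06:25,914

print(f"Cassandra x 4: {throughput(cassandra_time):.0f} inserts/s")  # 1061
print(f"MySQL x 2:     {throughput(mysql_time):.0f} inserts/s")      # 1318
```

That puts the sharded MySQL setup roughly 24% ahead in this particular run, averaged over the whole test rather than measured at steady state.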


        Is there something that I'm missing here? Is this expected?
        Or is the problem somewhere else, in a way that's hard to tell
        from this description?

        Cheers,
        Gustavo
