Just to follow up: I repeated the test with a multi-threaded Java (Hector) client and got much better performance, 10,000 rows in just over a second. So it looks like client latency was the killer. I have since read that the Ruby Thrift implementation is not the fastest.
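
For anyone who wants to see the effect without a cluster, here is a minimal Python sketch of serial versus multi-threaded writes. It is not the benchmark script from this thread and it does not talk to Cassandra: `fake_insert` is a hypothetical stand-in that just sleeps, and the 2 ms round trip and 16 workers are assumed values chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed per-write round-trip latency (illustrative, not measured).
ROUND_TRIP_SECONDS = 0.002


def fake_insert(row):
    """Hypothetical blocking write: sleeps to mimic one network round trip."""
    time.sleep(ROUND_TRIP_SECONDS)
    return row


def write_serial(rows):
    """Issue one write at a time, waiting for each to finish."""
    return [fake_insert(row) for row in rows]


def write_threaded(rows, workers=16):
    """Issue writes from a pool of threads so round trips overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(fake_insert, rows))


def timed(fn, rows):
    """Return (number of rows written, elapsed seconds) for one run."""
    start = time.perf_counter()
    written = fn(rows)
    return len(written), time.perf_counter() - start


if __name__ == "__main__":
    rows = list(range(500))
    for name, fn in (("serial", write_serial), ("threaded", write_threaded)):
        count, elapsed = timed(fn, rows)
        print(f"{name}: {count} rows in {elapsed:.3f}s")
```

With a fixed per-write latency, the serial loop takes roughly rows times latency, while the threaded version overlaps the waits. That is the same reason several concurrent writers help against a real server.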

On Apr 4, 2012, at 9:11 AM, Jeff Williams wrote:

> On three machines on the same subnet as the two cassandra nodes.
>
> On Apr 3, 2012, at 6:40 PM, Collard, David L (Dave) wrote:
>
>> Where is your client running?
>>
>> -----Original Message-----
>> From: Jeff Williams [mailto:je...@wherethebitsroam.com]
>> Sent: Tuesday, April 03, 2012 11:09 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Write performance compared to Postgresql
>>
>> Vitalii,
>>
>> Yep, that sounds like a good idea. Do you have any more information about
>> how you're doing that? Which client?
>>
>> Because even with 3 concurrent client nodes, my single postgresql server is
>> still outperforming my 2 node cassandra cluster, although the gap is
>> narrowing.
>>
>> Jeff
>>
>> On Apr 3, 2012, at 4:08 PM, Vitalii Tymchyshyn wrote:
>>
>>> Note that having tons of TCP connections is not good. We are using an async
>>> client to issue multiple calls over a single connection at the same time.
>>> You can do the same.
>>>
>>> Best regards, Vitalii Tymchyshyn.
>>>
>>> 03.04.12 16:18, Jeff Williams wrote:
>>>> Ok, so you think the write speed is limited by the client and protocol,
>>>> rather than the cassandra backend? This sounds reasonable, and fits with
>>>> our use case, as we will have several servers writing. However, a bit
>>>> harder to test!
>>>>
>>>> Jeff
>>>>
>>>> On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> Writing serially over one connection will be slower. If you run many
>>>>> threads hitting the server at once you will see throughput improve.
>>>>>
>>>>> Jake
>>>>>
>>>>> On Apr 3, 2012, at 7:08 AM, Jeff Williams <je...@wherethebitsroam.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am looking at cassandra for a logging application. We currently log to
>>>>>> a Postgresql database.
>>>>>>
>>>>>> I set up 2 cassandra servers for testing. I did a benchmark where I had
>>>>>> 100 hashes representing log entries, read from a json file. I then
>>>>>> looped over these to do 10,000 log inserts. I repeated the same writing
>>>>>> to a postgresql instance on one of the cassandra servers. The script is
>>>>>> attached. The cassandra writes appear to perform a lot worse. Is this
>>>>>> expected?
>>>>>>
>>>>>> jeff@transcoder01:~$ ruby cassandra-bm.rb
>>>>>> cassandra
>>>>>> 3.170000 0.480000 3.650000 ( 12.032212)
>>>>>> jeff@transcoder01:~$ ruby cassandra-bm.rb
>>>>>> postgres
>>>>>> 2.140000 0.330000 2.470000 ( 7.002601)
>>>>>>
>>>>>> Regards,
>>>>>> Jeff
>>>>>>
>>>>>> <cassandra-bm.rb>
>>>
>>
>