On Wed, 07 Apr 2010 13:19:26 -0700 Mike Gallamore <mike.e.gallam...@googlemail.com> wrote:
MG> As an aside I modified some other code to use Net::Cassandra instead
MG> of Net::Cassandra::Easy and noticed that it seems to run 3-4X
MG> slower. Both aren't stunningly fast. The test clients are running on
MG> the same machine as Cassandra, and I'm only getting somewhere between
MG> 100-400 (huge variance) with N::C::Easy and 30-90 with N::C. This test
MG> is writing key value pairs, with the keys being an incrementing
MG> number, and the values being a log line from one of our systems (~200
MG> character string). I'm surprised there is such a huge difference in
MG> speed between the two modules and that the transactions per second are
MG> so low even on my 3.2GHz P4 2GB RAM box. I tried dropping the
MG> consistency level down to zero but it had a negligible effect.

First of all, Thrift and the way it's implemented in pure Perl (Inline::C or XS would have been much better, plus the data structures are horrible) are IMO the most annoying thing about working with Cassandra. I proposed a pluggable API mechanism so users don't have to depend on Thrift but the proposal was rejected, so for now Thrift (with the crash-on-demand feature) is the only actively developed Cassandra API. Avro is supposed to be happening soon and I look forward to that.

You should benchmark your code; make sure you're comparing apples to apples. N::C::Easy wraps the operations for you, always using multigets and mutations on the backend. I don't know how your Net::Cassandra test is implemented. It may be that you're making multiple requests when you only need one. But more importantly, unless you fork multiple processes you won't be winning any speed races. Use Tie::ShareLite, for example, to synchronize your data structures through shared memory.

If you can put together benchmarks that run against the default (Keyspace1) configuration, I can try to optimize things.
I won't be rewriting the Thrift side, so it will still be slow on serialize/deserialize operations, but everything else will be fixed if it's suboptimal.

Ted
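
For reference, a forked benchmark along the lines suggested above might look roughly like the sketch below. It only uses core Perl (fork, pipe, Time::HiRes). The write_pair and run_benchmark subs are hypothetical names made up for this illustration, and write_pair is a no-op stub that you would replace with a single real write through whichever client you are timing. Note that it collects the per-child counts over plain pipes rather than Tie::ShareLite, just to stay dependency-free.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical placeholder: replace the body with one real write through
# the client under test. It is a no-op here so the script runs standalone.
sub write_pair {
    my ($key, $value) = @_;
    return 1;
}

# Fork $workers children, each writing $per_worker pairs. Returns the total
# number of successful writes and the elapsed wall-clock time in seconds.
sub run_benchmark {
    my ($workers, $per_worker) = @_;
    my $line  = 'x' x 200;    # stand-in for a ~200 character log line
    my $start = time;
    my @readers;

    for my $w (0 .. $workers - 1) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {      # child: write its own range of keys
            close $reader;
            my $done = 0;
            for my $i (1 .. $per_worker) {
                $done++ if write_pair($w * $per_worker + $i, $line);
            }
            print {$writer} "$done\n";   # report the count to the parent
            close $writer;
            exit 0;
        }

        close $writer;        # parent keeps only the read end
        push @readers, $reader;
    }

    # Collect one count per child; each read blocks until that child reports.
    my $total = 0;
    for my $reader (@readers) {
        my $count = <$reader>;
        close $reader;
        next unless defined $count;
        chomp $count;
        $total += $count;
    }
    1 while wait() != -1;     # reap all children

    return ($total, time - $start);
}

my ($total, $elapsed) = run_benchmark(4, 1000);
my $rate = $elapsed > 0 ? $total / $elapsed : 0;
printf "%d writes in %.3f s (%.0f writes/s)\n", $total, $elapsed, $rate;
```

Running the same harness with the same worker count and key range against each module is one way to keep the comparison apples to apples; with the stub in place the numbers only measure fork and loop overhead.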