Hi,

I've been trying to benchmark Cassandra for our use case and have been
seeing poor performance on writes and extremely poor performance on
reads.

I'm using Cassandra 0.5.1 stable and thrift-0.2.0.

It turns out all the CPU time is going to the PHP client process - the
JVM operating the Cassandra server isn't breaking much of a sweat.

For reads the latency is often up to 1 second to fetch a row
containing ~2000 columns, or around 300ms to fetch a 500-column wide
row.  This is with get_slice(), and a predicate specifying the start &
finish range.

Using cachegrind and inspecting the code inside the Thrift bindings
makes it pretty clear why the performance is so bad, particularly on
reads. The biggest culprit is the translation code which casts data
back and forth into binary representations for sending over the wire
to the Cassandra server.
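
To illustrate the kind of per-column work involved, here is a minimal
sketch in Python. It is not the actual Thrift code. It assumes a
simplified wire layout (length-prefixed name and value, then a 64-bit
timestamp) and leaves out the field headers a real Thrift struct
carries. The function names are hypothetical. The point is that every
column costs several pack/unpack calls in interpreted code, so a
2000-column row adds up quickly.

```python
import struct


def encode_column(name: bytes, value: bytes, timestamp: int) -> bytes:
    """Length-prefix the name and value, then append a big-endian i64 timestamp."""
    return (
        struct.pack(">i", len(name)) + name
        + struct.pack(">i", len(value)) + value
        + struct.pack(">q", timestamp)
    )


def decode_column(buf: bytes, offset: int = 0):
    """Decode one column starting at offset; return (column, next_offset)."""
    (n,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    name = buf[offset:offset + n]
    offset += n
    (n,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    value = buf[offset:offset + n]
    offset += n
    (timestamp,) = struct.unpack_from(">q", buf, offset)
    offset += 8
    return (name, value, timestamp), offset


def decode_row(buf: bytes, count: int):
    """Decode `count` consecutive columns from buf."""
    columns = []
    offset = 0
    for _ in range(count):
        column, offset = decode_column(buf, offset)
        columns.append(column)
    return columns
```

Timing decode_row() on a 2000-column buffer gives a rough idea of how
much of the latency is pure client-side decoding.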

There seems to be some 32-bit-specific code that iterates heavily,
apparently because of a limitation in PHP's handling of 64-bit
integers (longs).
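
For context, the Thrift binary protocol sends a 64-bit integer as 8
big-endian bytes. A runtime whose native integers are 32 bits wide has
to build that from two 32-bit halves. Below is a minimal sketch of the
arithmetic in Python, not the code from the Thrift bindings. The
function names are hypothetical, and Python's arbitrary-precision
integers make the split trivial here, whereas a 32-bit PHP build has
to work around overflow.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def encode_i64_split(value: int) -> bytes:
    """Encode a signed 64-bit integer as two big-endian 32-bit words."""
    unsigned = value & MASK64          # two's-complement view of the value
    hi = (unsigned >> 32) & MASK32     # upper 32 bits
    lo = unsigned & MASK32             # lower 32 bits
    return struct.pack(">II", hi, lo)


def decode_i64_split(data: bytes) -> int:
    """Rebuild the signed 64-bit integer from its two 32-bit words."""
    hi, lo = struct.unpack(">II", data)
    unsigned = (hi << 32) | lo
    if unsigned >= 1 << 63:            # top bit set means negative
        unsigned -= 1 << 64
    return unsigned
```

The two-word encoding produces the same bytes as a direct 8-byte pack,
so both approaches are interchangeable on the wire.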

However, testing on a 64-bit host doesn't yield any performance improvement.

More surprisingly, if I compile and enable the PHP native thrift
bindings (following this guide
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
read performance actually degrades by another 50%.  I have verified
that the Thrift code is recognizing and using the native PHP functions
provided by the library.

I've tested all of this on both 32-bit and 64-bit installations of
both PHP 5.1 & 5.2.  Results are the same in all cases.

My environment is vanilla CentOS 5.4 server installations inside
VMware on a 4-core 64-bit host with plenty of RAM and fast disks.

Has anyone been able to produce decent performance with PHP &
Cassandra?  If so, how have you done it?

Thanks,
Jules
