Hi, I've been trying to benchmark Cassandra for our use case and have been seeing poor performance on writes and extremely poor performance on reads.
I'm using Cassandra 0.5.1 stable and thrift-0.2.0. It turns out all the CPU time is going to the PHP client process; the JVM running the Cassandra server isn't breaking much of a sweat.

For reads, the latency is often up to 1 second to fetch a row containing ~2000 columns, or around 300ms to fetch a 500-column row. This is with get_slice() and a predicate specifying the start and finish range.

Profiling with cachegrind and inspecting the code inside the Thrift bindings makes it pretty clear why the performance is so bad, particularly on reads. The biggest culprit is the translation code that converts data to and from its binary representation for sending over the wire to the Cassandra server. There seems to be some 32-bit-specific code that iterates heavily, apparently due to a limitation in PHP's implementation of LONGs. However, testing on a 64-bit host doesn't yield any performance improvement.

More surprisingly, if I compile and enable the PHP native Thrift bindings (following this guide: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), read performance actually degrades by another 50%. I have verified that the Thrift code is recognizing and using the native PHP functions provided by the library.

I've tested all of this on both 32-bit and 64-bit installations of both PHP 5.1 and 5.2. Results are the same in all cases. My environment is vanilla CentOS 5.4 server installations inside VMware on a 4-core 64-bit host with plenty of RAM and fast disks.

Has anyone been able to get decent performance with PHP and Cassandra? If so, how have you done it?

Thanks,
Jules
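
For anyone unfamiliar with the 64-bit issue mentioned above, here is a minimal sketch of the general idea: when a runtime cannot hold a 64-bit integer natively, the value has to be split into two 32-bit halves before it goes on the wire, and stitched back together on the way in. This is written in Python purely as an illustration and is not the Thrift PHP source; the function names are hypothetical. A native one-call version is included for comparison.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack_i64_split(value):
    """Encode a signed 64-bit integer as 8 big-endian bytes via two 32-bit halves."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value out of signed 64-bit range")
    unsigned = value & MASK64          # two's-complement view of the value
    hi = (unsigned >> 32) & MASK32     # upper 32 bits
    lo = unsigned & MASK32             # lower 32 bits
    return struct.pack(">II", hi, lo)


def unpack_i64_split(data):
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    hi, lo = struct.unpack(">II", data)
    unsigned = (hi << 32) | lo
    # Values with the top bit set are negative in two's complement.
    if unsigned >= 1 << 63:
        unsigned -= 1 << 64
    return unsigned


def pack_i64_native(value):
    """Same encoding in a single call, for runtimes with native 64-bit support."""
    return struct.pack(">q", value)
```

Both packers should produce identical bytes; the split version simply does more work per integer, which adds up when a row holds thousands of columns that each carry a 64-bit timestamp.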