Julian Simon <jsimon <at> jules.com.au> writes:

>
> Hi,
>
> I've been trying to benchmark Cassandra for our use case and have been
> seeing poor performance on both writes and (extremely) poor
> performance on reads.
>
> Using Cassandra 0.51 stable & thrift-0.2.0.
>
> It turns out all the CPU time is going to the PHP client process - the
> JVM operating the Cassandra server isn't breaking much of a sweat.
>
> For reads the latency is often up to 1 second to fetch a row
> containing ~2000 columns, or around 300ms to fetch a 500-column wide
> row. This is with get_slice(), and a predicate specifying the start &
> finish range.
>
> Using cachegrind and inspecting the code inside the Thrift bindings
> makes it pretty clear why the performance is so bad, particularly on
> reads. The biggest culprit is the translation code which casts data
> back and forth into binary representations for sending over the wire
> to the Cassandra server.
>
> There seems to be some 32-bit specific code which iterates heavily
> apparently due to a limitation in PHPs implementation of LONGs.
>
> However, testing on a 64-bit host doesn't yield any performance improvement.
>
> More surprisingly, if I compile and enable the PHP native thrift
> bindings (following this guide
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> read performance actually degrades by another 50%. I have verified
> that the Thrift code is recognizing and using the native PHP functions
> provided by the library.
>
> I've tested all of this on both 32-bit and 64-bit installations of
> both PHP 5.1 & 5.2. Results are the same in all cases.
>
> My environment is on vanilla CentOS 5.4 server installations inside
> VMWare on a 4 core 64bit host with plenty of RAM and fast disks.
>
> Has anyone been able to produce decent performance with PHP &
> Cassandra? If so, how have you done it?
>
> Thanks,
> Jules
>

I had exactly the same problem: without the native Thrift bindings, performance was poor and PHP used too much CPU. But when I compiled and enabled the native Thrift bindings (following this guide https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), performance got even worse: it became SEVERAL TIMES slower (although CPU usage decreased too).

After several random tries I discovered that the buffer size matters. I mean the second and third arguments to "new TBufferedTransport($socket, X, Y)". But the most surprising fact is that it matters much more with the native Thrift bindings than without them. I.e.:

- get_range_slices without native Thrift bindings (either small or large buffer size): ~1 sec.
- get_range_slices with native Thrift bindings and small buffer size (1024): ~5 sec!
- get_range_slices with native Thrift bindings and large buffer size (40960): ~0.1 sec.

I don't know why!!

P.S.: Cassandra 0.6.3.
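
For anyone who wants to reason about the buffer-size effect, here is a toy sketch (in Python, not the Thrift code itself) that counts how often a buffered reader has to go back to the underlying source for a given buffer size. The class names, the 200,000-byte payload and the 8-byte field size are all made up for illustration. This only shows that a smaller buffer means more refills; it is not a confirmed explanation of the timings above.

```python
import io


class CountingSource:
    """Hypothetical stand-in for the socket: counts calls to read()."""

    def __init__(self, payload: bytes):
        self._stream = io.BytesIO(payload)
        self.calls = 0

    def read(self, n: int) -> bytes:
        self.calls += 1
        return self._stream.read(n)


class ToyBufferedReader:
    """Toy read buffer: refills from the source in chunks of at most buf_size bytes."""

    def __init__(self, source: CountingSource, buf_size: int):
        self._source = source
        self._buf_size = buf_size
        self._buf = b""

    def read_all(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if not self._buf:
                # Buffer is empty: go back to the underlying source.
                self._buf = self._source.read(self._buf_size)
                if not self._buf:
                    raise EOFError("source exhausted")
            take = n - len(out)
            out += self._buf[:take]
            self._buf = self._buf[take:]
        return bytes(out)


def refills_needed(payload_size: int, buf_size: int, field_size: int = 8) -> int:
    """Consume payload_size bytes in small field-sized reads; return source read count."""
    source = CountingSource(bytes(payload_size))
    reader = ToyBufferedReader(source, buf_size)
    remaining = payload_size
    while remaining > 0:
        chunk = min(field_size, remaining)
        reader.read_all(chunk)
        remaining -= chunk
    return source.calls


if __name__ == "__main__":
    for size in (1024, 40960):
        print(size, refills_needed(200_000, size))
```

With the assumed 200,000-byte response, the 1024-byte buffer goes back to the source 196 times and the 40960-byte buffer only 5 times. If each refill is expensive in the native bindings, that would fit the numbers above, but I have not verified that this is what actually happens.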