> How does this connection pooling fit in with the TSocketPool.php classes?
> Or am I off the wicket here?

Right now, TSocketPool is not being used (in pycassa or phpcassa); individual
TSockets are managed within the library. TSocketPool may be a good
alternative to this in the future, but I haven't investigated it fully yet.

> These are just a few of my observations in relation to what I have seen so
> far when working with PHP and Cassandra. I have been working with Cassandra
> / PHP for the last 8 months now in a project, and while not using phpcassa,
> it strikes me that the Thrift layer in PHP may need some energy directed at
> it. Reads in particular do seem noticeably slow and I am not sure if this is
> tied in with the PHP socket implementation, how my test cluster is currently
> set up or how I am currently working with and structuring my data.

I've noticed this as well for large reads (multigets, in particular). It
seems to be a cache issue, as later portions of the frame are processed much
more quickly. I observed this using the C extension, for what it's worth.

> I also wonder if there are other aspects of the Thrift layer that could be
> pushed into a native module, as there still seems to be a lot of PHP code
> present in the Thrift classes.

Presumably so, and I think it would be great if anybody could tackle this. I
have a feeling that the existing C code needs some looking over. I've seen
and tried to fix a few bugs there recently, and it wouldn't surprise me at
all if there were some inefficiencies there as well.

> Another observation I have made during this work is that xdebug has a
> significant effect on performance, which can make profiling a little more
> challenging.

Collecting timings manually worked pretty well for my purposes (a long loop
of the same operation).

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library
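
P.S. For anyone who wants to try the manual timing approach mentioned above, here is a rough sketch of the idea in Python: run the same operation in a long loop and measure wall-clock time around it, with no profiler attached. The names `time_operation` and `fake_read` are made up for illustration; `fake_read` is only a placeholder for a real client call such as a multiget. In PHP the same pattern should work with `microtime(true)` in place of `time.perf_counter()`.

```python
import time


def time_operation(operation, iterations=1000):
    """Call `operation` `iterations` times; return (total_seconds, mean_seconds)."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    total = time.perf_counter() - start
    return total, total / iterations


def fake_read():
    # Hypothetical stand-in for a real read (e.g. a multiget against a cluster).
    sum(range(1000))


if __name__ == "__main__":
    total, mean = time_operation(fake_read, iterations=10_000)
    print(f"{total:.4f}s total, {mean * 1e6:.1f}us per call")
```

Averaging over many iterations smooths out per-call noise, and because nothing hooks into the interpreter, the numbers are not skewed the way they can be with a debugger or profiler extension loaded.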