I did regenerate the Thrift bindings. What I have found in testing is that batch_mutate occasionally sends bad data to Thrift when I try to insert a set of items with too many columns. I don't know whether this is a problem with PHP or with the Thrift PHP library. I have found that a limit of 1000 columns per call is fast enough for my needs and stable.

Previously, I was regularly sending 6000 columns (500 rows with about 12 columns each). Most of the columns in each row were fairly small, but some of the rows had a rather large block of text. When this was happening, the output of the TBinaryProtocol would actually be incorrect at seemingly random times. This would then cause an error from Cassandra saying that I was missing my timestamp. Enough of these errors and Cassandra would crash with an out of memory error. If enough data was on the servers when this happened, Cassandra couldn't recover from the error because I didn't have enough memory on the machines. I have now upgraded to larger machines, and that has cleared up the real memory issues.
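A minimal sketch of that kind of column-capped batching, written in Python rather than PHP for brevity. The names `chunk_rows` and `send_in_batches` and the `send` callback are hypothetical; `send` stands in for whatever wraps the real batch_mutate call, and the row layout (a key plus a dict of columns) is an assumption for illustration only.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed row shape: (row key, {column name: value}).
Row = Tuple[str, Dict[str, str]]

# The per-call column limit described above as stable.
MAX_COLUMNS_PER_BATCH = 1000


def chunk_rows(rows: Iterable[Row],
               max_columns: int = MAX_COLUMNS_PER_BATCH) -> Iterator[List[Row]]:
    """Group whole rows into batches holding at most max_columns columns.

    Rows are never split: a single row with more than max_columns
    columns is emitted alone in its own batch.
    """
    batch: List[Row] = []
    count = 0
    for key, columns in rows:
        n = len(columns)
        # Flush the current batch if adding this row would exceed the cap.
        if batch and count + n > max_columns:
            yield batch
            batch, count = [], 0
        batch.append((key, columns))
        count += n
    if batch:
        yield batch


def send_in_batches(rows: Iterable[Row],
                    send: Callable[[List[Row]], None],
                    max_columns: int = MAX_COLUMNS_PER_BATCH) -> int:
    """Call send() once per batch and return the number of calls made."""
    calls = 0
    for batch in chunk_rows(rows, max_columns):
        send(batch)  # hypothetical wrapper around the real batch_mutate call
        calls += 1
    return calls
```

With 500 rows of 12 columns each, this yields batches of 83 rows (996 columns), so the 6000 columns go out in seven calls instead of one.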
Lee Parker

On Sun, Apr 18, 2010 at 6:51 PM, Brandon Williams <dri...@gmail.com> wrote:

> On Fri, Apr 16, 2010 at 12:50 PM, Lee Parker <l...@socialagency.com> wrote:
>
>> This process is running on two clients each working on a separate part of
>> the mysql data which totals to about 70G. Each time I start it up, it will
>> work fine for about 1 hour and then it will crash the servers. The error
>> message on the servers is usually an out of memory error. I will get
>> several time out errors on the clients and occasionally get an error telling
>> me that i was missing the timestamp. The timestamp error is accompanied by
>> a server crashing if I use framed transport instead of buffered. I wasn't
>> having the out of memory errors with 0.5.0, but had lots of timeouts and
>> some "unknown result" errors. So we upgraded to 0.6.0 when it became the
>> stable release.
>>
>
> Did you regenerate the php thrift bindings between 0.5 and 0.6? There's a
> decent chance that thrift made some kind of backwards incompatible change
> between those revisions (look in the lib dir of each cassandra version to
> determine the thrift svn revision you need.) If that happened, then it's
> possible the old bindings are sending something the newer version does not
> understand, and causing you to run into THRIFT-601, crashing the server.
>
> -Brandon
>