I don't think it is a hardware issue. This is happening on multiple servers and clients on EC2 instances and on my local development VM. I think you are right that the timestamp errors are likely being caused by the Thrift PHP bindings. The frustrating part is that I can't get the error to reproduce consistently when I have debugging systems in place.
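One lightweight way to catch the bad writes without a full debugging setup might be to validate every column on the client just before it is handed to Thrift, and to log and skip any that lack a timestamp. The sketch below is in Python rather than PHP and only illustrates the idea; the dict layout and the names `validate_column`, `checked_insert`, and `send` are hypothetical and are not part of the Thrift or Cassandra client API.

```python
import logging
import time

log = logging.getLogger("timestamp_check")


def now_micros():
    """Current time in microseconds since the epoch (a common timestamp convention)."""
    return int(time.time() * 1_000_000)


def validate_column(column):
    """Return a list of problems found in a column dict (hypothetical layout)."""
    problems = []
    ts = column.get("timestamp")
    if ts is None:
        problems.append("missing timestamp")
    elif isinstance(ts, bool) or not isinstance(ts, int):
        problems.append("timestamp is not an integer: %r" % (ts,))
    elif ts <= 0:
        problems.append("timestamp is not positive: %r" % (ts,))
    return problems


def checked_insert(send, column):
    """Call send(column) only if the column passes validation.

    `send` stands in for whatever function actually performs the write.
    Returns True if the column was sent, False if it was rejected and logged.
    """
    problems = validate_column(column)
    if problems:
        log.error("refusing to send column %r: %s",
                  column.get("name"), "; ".join(problems))
        return False
    send(column)
    return True
```

If the log stays empty while the server still reports a missing timestamp, that would point at the serialization layer rather than at the application code building the columns.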
As for the Memtable thresholds, when I ran with lower thresholds, the server would thrash with compaction runs because of the dramatically increased number of sstable files. That was when I was running 0.5.0. Has 0.6.0 improved compaction performance enough that this shouldn't be an issue?

Lee Parker

On Fri, Apr 16, 2010 at 1:13 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Fri, Apr 16, 2010 at 12:50 PM, Lee Parker <l...@socialagency.com> wrote:
> > Each time I start it up, it will
> > work fine for about 1 hour and then it will crash the servers. The error
> > message on the servers is usually an out of memory error.
>
> Sounds like
> http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
> to me.
>
> > I will get
> > several time out errors on the clients
>
> Symptomatic of running out of memory.
>
> > and occasionally get an error telling
> > me that i was missing the timestamp.
>
> This is an entirely different problem. Your client is sending
> garbage, plain and simple. Why that is, I don't know. The PHP Thrift
> binding is virtually unmaintained, so it could be a bug there, but
> Digg uses PHP against Cassandra extensively and hasn't hit this to my
> knowledge. As I said in another thread, I wouldn't rule out bad
> hardware.
>
> > The timestamp error is accompanied by
> > a server crashing if I use framed transport instead of buffered.
>
> Thrift is fragile when the client sends it garbage.
> (https://issues.apache.org/jira/browse/THRIFT-601)
>
> > One of the reasons we
> > were trying cassandra was to scale out with smaller nodes rather than having
> > to run larger instances for mysql.
>
> 2 x 1GB isn't a whole lot to do a bulk load with. You may have to
> throttle your clients to fix the OOM completely.
>
> -Jonathan