I have seen something similar. Of course correlation is not causation...
Like you, I am doing testing with heavy writes. I was using a Python client to drive the writes through the cql module, which is Thrift based. The correlation I eventually tracked down was that whichever node my Python client(s) connected to eventually ran out of memory, because it could not gain enough back by flushing memtables. It was just a matter of time. I switched to the new python-driver client and the problem disappeared. I have since been able to return almost all parameters to their defaults and get out of the business of manually managing the JVM heap, to my great relief! Currently I have to retool my test harness, as I have been unable to drive C* 2.0.0 to destruction (yet). A rough sketch of what the python-driver write loop looks like is after the quoted message below.

Michael

On Mon, Sep 9, 2013 at 8:11 PM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:

> I have a strange pattern: in a cluster with three equally dimensioned and
> configured nodes, I keep losing one because it apparently fails to flush
> its memtables:
>
> http://twitpic.com/dcrtel
>
> It is a different node every time.
>
> So far I understand that I should expect to see the chain-saw graph when
> memtables build up and then get flushed. But what about that third node?
> Has anyone seen something similar?
>
> Jan
>
> C* dsc 2.0, 3x 4 GB, 2-CPU nodes with heavy writes of 70-column rows
> (approx. 10 of those rows per wide row).
>
> I have turned off caches, reduced overall memtable size, and set flush
> writers to 2 and the rpc reader and writer threads to 1.
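For reference, this is roughly the shape of my test harness after the switch; the keyspace, table, and column names are just placeholders from my setup, not anything from Jan's schema:

    # Sketch only (pip install cassandra-driver); names are placeholders.
    from cassandra.cluster import Cluster

    # Hand the driver all three nodes; it keeps its own connection pool and
    # spreads requests across the cluster instead of funnelling every write
    # through the single node a Thrift connection happens to be attached to.
    cluster = Cluster(['node1', 'node2', 'node3'])
    session = cluster.connect('stress_ks')

    insert = session.prepare(
        "INSERT INTO wide_rows (pk, ck, payload) VALUES (?, ?, ?)")

    for i in range(100000):
        # roughly 10 rows per partition key, mimicking the wide-row layout
        session.execute(insert, ('key-%d' % (i // 10), i % 10, 'x' * 512))

    cluster.shutdown()

My working theory, for what it's worth, is that the balanced request routing is why the OOMs on the coordinator stopped, but I have not proven that beyond "old client: node dies, new client: node doesn't".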