[moving to u...@]

0.6 fixes the restart case, where the node replays the commit log faster than it can flush.
As for why it backs up in the first place, before the restart: you can
either (a) throttle writes [set your timeout lower, make your clients back
off temporarily when they get a timeoutexception] or (b) add capacity.
(b) is recommended.

https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this, but
there is still no substitute for adding capacity to match demand.

On Tue, Apr 20, 2010 at 4:57 PM, Anthony Molinaro
<antho...@alumni.caltech.edu> wrote:
> Hi,
>
> I have a cassandra cluster where a couple things are happening. Every
> once in a while a node will start to get backed up. Checking tpstats I
> see a very large value for ROW-MUTATION-STAGE. Sometimes it will be able
> to clear it if I give it enough time, other times the vm OOMs. With some
> nodes I also see this happen during restarts, I'll restart and have to
> wait 6-12 hours for the node to not be marked as 'Down'.
> I've seen
> http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
> and ended up with the following settings.
>
> KeysCachedFraction : 0.01
> MemtableSizeInMB : 100
> MemtableObjectCountInMillions : 0.5
> Heap : -Xmx5G
>
> I only have 2 CFs in this instance and entries are small so in most cases
> I hit MemtableObjectCountInMillions first and total MemtableSizeInMB is
> about 60MB-120MB for the 2 CFs combined.
>
> Anyone have any pointers on where to look next? These are m1.large EC2
> instances (I want to move to xlarge to get more memory, but haven't yet
> gotten clarification on the best process for node replacement, per my
> other thread).
>
> Thanks,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <antho...@alumni.caltech.edu>
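
For option (a) in the reply, a rough sketch of what "make your clients back off temporarily" could look like on the client side. This is only an illustration under assumptions: `TimeoutException` here is a hypothetical stand-in for whatever timeout error your client library actually raises, and `write_with_backoff`, `do_write`, and the delay values are made-up names and numbers, not part of Cassandra or any client API.

```python
import random
import time


class TimeoutException(Exception):
    """Hypothetical stand-in for the timeout error your client library raises."""


def write_with_backoff(do_write, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call do_write(), backing off exponentially (with jitter) after a timeout.

    do_write is any zero-argument callable that performs one write and raises
    TimeoutException when the cluster does not answer in time.
    """
    for attempt in range(max_attempts):
        try:
            return do_write()
        except TimeoutException:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            # Double the ceiling on each retry, capped at max_delay, and pick
            # a random point below it so clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

Wrapping each insert in something like `write_with_backoff(lambda: client_insert(...))`, where `client_insert` is whatever your own write call is, lets an overloaded node drain its mutation queue instead of being hit again immediately. It only smooths bursts; it does not replace adding capacity.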