On Wed, Apr 21, 2010 at 12:21:31PM -0500, Jonathan Ellis wrote:
> [moving to u...@]
>
> 0.6 fixes replaying faster than it can flush.

Yeah, I noticed some of those fixes, and will probably take the leap into
0.6 if I can keep my cluster running (it's not doing too bad, I do about
400K reads and 250K writes per minute spread over 23 nodes), however some
of the m1.large instances get into this backed up state frequently. So I
need to keep the cluster running first.

> as for why it backs up in the first place before the restart, you can
> either (a) throttle writes [set your timeout lower, make your clients
> back off temporarily when it gets a timeoutexception]

What timeout is this? Something in the thrift API or a cassandra
configuration?

> or (b) add capacity. (b) is recommended.

Yeah I've been doing that adding xlarge instances with raid0 disks which
work better, but I keep running into issues with the old instances which
hold up this work. I'll keep chugging along and hopefully get things
sorted.

-Anthony

>
> https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this
> but there is still no substitute for adding capacity to match demand.
>
> On Tue, Apr 20, 2010 at 4:57 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
> > Hi,
> >
> > I have a cassandra cluster where a couple things are happening. Every
> > once in a while a node will start to get backed up. Checking tpstats I
> > see a very large value for ROW-MUTATION-STAGE. Sometimes it will be able
> > to clear it if I give it enough time, other times the vm OOMs. With some
> > nodes I also see this happen during restarts, I'll restart and have to
> > wait 6-12 hours for the node to not be marked as 'Down'.
> > I've seen
> > http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
> > and ended up with the following settings.
> >
> > KeysCachedFraction : 0.01
> > MemtableSizeInMB : 100
> > MemtableObjectCountInMillions : 0.5
> > Heap : -Xmx5G
> >
> > I only have 2 CFs in this instance and entries are small so in most cases
> > I hit MemtableObjectCountInMillions first and total MemtableSizeInMB is
> > about 60MB-120MB for the 2 CFs combined.
> >
> > Anyone have any pointers on where to look next? These are m1.large EC2
> > instances (I want to move to xlarge to get more memory, but haven't yet
> > gotten clarification on the best process for node replacement, per my
> > other thread).
> >
> > Thanks,
> >
> > -Anthony
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro <antho...@alumni.caltech.edu>
> >

--
------------------------------------------------------------------------
Anthony Molinaro <antho...@alumni.caltech.edu>
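
Option (a) in the quoted reply, having clients back off temporarily when
a write times out, might look roughly like the Python sketch below. It is
only an illustration under stated assumptions, not Cassandra or Thrift
client code: WriteTimeout, send_write and write_with_backoff are
hypothetical names, and the attempt count and delay values are arbitrary.

```python
import random
import time


class WriteTimeout(Exception):
    """Hypothetical stand-in for whatever timeout error the client raises."""


def write_with_backoff(send_write, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call send_write(), pausing for a growing random delay after each timeout.

    send_write is a hypothetical zero-argument callable that performs one
    write and raises WriteTimeout when the server does not answer in time.
    """
    for attempt in range(max_attempts):
        try:
            return send_write()
        except WriteTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            # The ceiling doubles on every timeout, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep a random amount below the ceiling so that many clients
            # do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))
```

A caller would wrap each write, for example
write_with_backoff(lambda: do_insert(row)), where do_insert is again a
hypothetical function. The effect is that a node which is falling behind
sees fewer writes per second from each client until it catches up.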