Hi,

I have a Cassandra cluster where a couple of things are happening. Every once in a while a node starts to get backed up. Checking tpstats, I see a very large value for ROW-MUTATION-STAGE. Sometimes the node is able to clear it if I give it enough time; other times the VM OOMs. On some nodes I also see this happen during restarts: I restart and then have to wait 6-12 hours before the node is no longer marked as 'Down'.

I've seen http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts and ended up with the following settings.

KeysCachedFraction : 0.01
MemtableSizeInMB : 100
MemtableObjectCountInMillions : 0.5
Heap : -Xmx5G

I only have 2 CFs in this instance and entries are small, so in most cases I hit MemtableObjectCountInMillions first, and total MemtableSizeInMB is about 60MB-120MB for the 2 CFs combined.

Does anyone have any pointers on where to look next? These are m1.large EC2 instances (I want to move to xlarge to get more memory, but haven't yet gotten clarification on the best process for node replacement, per my other thread).

Thanks,

-Anthony

--
------------------------------------------------------------------------
Anthony Molinaro <antho...@alumni.caltech.edu>