We were running a load test against a single Cassandra 0.6.2 node. 24 hours into the test, Cassandra appeared to be nearly frozen for 10 minutes. Our write rate dropped to almost 0, and we saw a large number of write timeouts. The node wasn't swapping or garbage collecting at the time.
It looks like the problems were caused by our memtables flushing after 24 hours (we have MemtableFlushAfterMinutes=1440). Some of our column families are written to so infrequently that they never hit the flush thresholds in MemtableOperationsInMillions and MemtableThroughputInMB, so they only flush when the time limit expires. After 24 hours we had ~3000 commit log files. Is this flushing causing Cassandra to become unresponsive? I would have thought Cassandra could flush in the background without blocking new writes.

Thanks,
Sean
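
P.S. In case it helps frame the question, here is a minimal Python sketch of how I understand the three flush triggers to interact. It is a toy model, not Cassandra's code: the Memtable class, the flush_reason function, and the operations and throughput values are made up for illustration. Only the 1440-minute setting comes from our config.

```python
from dataclasses import dataclass
from typing import Optional

# Only FLUSH_AFTER_MINUTES matches our real config; the other two
# values are assumed purely for illustration.
FLUSH_AFTER_MINUTES = 1440
OPS_THRESHOLD = 300_000                        # assumed
THROUGHPUT_THRESHOLD_BYTES = 64 * 1024 * 1024  # assumed


@dataclass
class Memtable:
    """Hypothetical stand-in for one column family's memtable."""
    name: str
    ops: int = 0
    bytes_written: int = 0
    age_minutes: int = 0


def flush_reason(mt: Memtable) -> Optional[str]:
    """Return which threshold would trigger a flush, or None if none is hit."""
    if mt.ops >= OPS_THRESHOLD:
        return "operations"
    if mt.bytes_written >= THROUGHPUT_THRESHOLD_BYTES:
        return "throughput"
    if mt.age_minutes >= FLUSH_AFTER_MINUTES:
        return "age"
    return None


# A rarely written column family never reaches the size thresholds,
# so it is flushed only once the 24-hour age limit expires.
quiet = Memtable("quiet_cf", ops=5_000, bytes_written=2 * 1024 * 1024,
                 age_minutes=1440)
# A busy column family flushes early on the operations threshold.
busy = Memtable("busy_cf", ops=300_000, bytes_written=10 * 1024 * 1024,
                age_minutes=12)

for mt in (quiet, busy):
    print(mt.name, "->", flush_reason(mt))
```

If this model is roughly right, every quiet column family created at startup reaches the age limit at about the same moment, which would match the stall we saw at the 24-hour mark.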