On Fri, Nov 19, 2010 at 5:30 AM, Jake Maizel <j...@soundcloud.com> wrote:
> We see that after starting repair on one node, we get lots of GC
> (However, we are not swapping and disk io seems fine). We also see
> increases in the pending queue for AE stages (Seems normal, on the
> order of 40-80 pending stages). What doesn't seem normal is that we
> see large increase in the AE pending queue on all other nodes not
> running repair (I would expect this on neighbors, but not all nodes)
> and it seems to take forever for these queues to drain (Forever = over
> 24 hrs).

Sounds like https://issues.apache.org/jira/browse/CASSANDRA-1674.
(Fixed for 0.6.9.)

> Here are some questions I have (I can provide any additional info required):
>
> 1. If a node we run repair on finishes, indicated by compaction and AE
> being 0, but the next node we want to repair still has non-zero queues
> for C and AE, can we still start up the repair?

I think having AE empty is the important one, but I'd wait for
everything to be quiesced to be safe.

> 2. What is the effect of running repair on more than one node at a
> time under 0.6.6? I realize it's not recommended but I accidentally
> did this and am curious of the effect.

Often the repairs will stomp on each other's internal state and
neither will finish.

> 3. Is large GC activity normal during a repair outside the documented
> "GC Storm" cases?

Yes. Repair does a lot of object allocation.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
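
The advice in the answer to question 1 (wait until AE, and ideally everything else, is quiesced before repairing the next node) could be checked with something like the sketch below. It is only an illustration, not part of Cassandra: it assumes you have captured the text printed by `nodetool tpstats`, that each data row has the form "name active pending completed", and that the relevant rows are named AE-SERVICE-STAGE and COMPACTION-POOL. Those row names and the column layout are assumptions, so compare them against the output of your own version. The function names are hypothetical.

```python
def pending_by_stage(tpstats_text):
    """Map each pool/stage name to its Pending count.

    Assumes rows of the form: <name> <active> <pending> <completed>.
    Lines that do not match (such as the header) are ignored.
    """
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            pending[fields[0]] = int(fields[2])
    return pending


def is_quiesced(tpstats_text, stages=("AE-SERVICE-STAGE", "COMPACTION-POOL")):
    """Return True only if every listed stage reports zero pending tasks.

    The default stage names are assumptions; pass your own if they differ.
    A stage missing from the output counts as not quiesced, to err on the
    safe side.
    """
    pending = pending_by_stage(tpstats_text)
    return all(pending.get(stage, 1) == 0 for stage in stages)


# Made-up sample text in the assumed layout, for illustration only.
SAMPLE_BUSY = """\
Pool Name                    Active   Pending      Completed
AE-SERVICE-STAGE                  1        42            310
COMPACTION-POOL                   0         0             57
"""

SAMPLE_QUIET = """\
Pool Name                    Active   Pending      Completed
AE-SERVICE-STAGE                  0         0            352
COMPACTION-POOL                   0         0             57
"""

if __name__ == "__main__":
    print(is_quiesced(SAMPLE_BUSY))
    print(is_quiesced(SAMPLE_QUIET))
```

Run against each node you care about, this gives a simple yes/no on whether it looks safe to start the next repair; treating a missing row as "busy" is a deliberate choice so that a naming mismatch never reads as "idle".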