On Fri, Nov 19, 2010 at 5:30 AM, Jake Maizel <j...@soundcloud.com> wrote:
> We see that after starting repair on one node, we get lots of GC
> (However, we are not swapping and disk io seems fine).  We also see
> increases in the pending queue for AE stages (Seems normal, on the
> order of 40-80 pending stages).  What doesn't seem normal is that we
> see a large increase in the AE pending queue on all other nodes not
> running repair (I would expect this on neighbors, but not all nodes)
> and it seems to take forever for these queues to drain (Forever = over
> 24 hrs).

Sounds like https://issues.apache.org/jira/browse/CASSANDRA-1674.
(Fixed for 0.6.9.)

> Here are some questions I have (I can provide any additional info required):
>
> 1. If a node we run repair on finishes, indicated by compaction and AE
> being 0, but the next node we want to repair still has non-zero queues
> for C and AE, can we still start up the repair?

I think having AE empty is the important one, but I'd wait for
everything to be quiesced to be safe.
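
A rough sketch of that "wait until quiesced" check, in case it helps.
It assumes you capture the text printed by nodetool tpstats for the
node, and that the rows of interest are named AE-SERVICE-STAGE and
COMPACTION-POOL. Both names are assumptions, so compare them with
what your version actually prints. The function names are made up for
illustration.

```python
def pending_by_pool(tpstats_text):
    """Map pool name -> pending count from tpstats-style text."""
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Data rows look like: <pool name> <active> <pending> <completed>.
        # The header row has five words, so it is skipped here.
        if len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            pending[fields[0]] = int(fields[2])
    return pending


def is_quiesced(tpstats_text,
                pools=("AE-SERVICE-STAGE", "COMPACTION-POOL")):
    """True only if every listed pool reports zero pending tasks."""
    pending = pending_by_pool(tpstats_text)
    # A pool missing from the output counts as not quiesced, to be safe.
    return all(pending.get(pool, 1) == 0 for pool in pools)


# Hypothetical sample, not real output from any cluster.
SAMPLE = """Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    0         0          12345
AE-SERVICE-STAGE                  1        42            310
COMPACTION-POOL                   0         0             57
"""

if __name__ == "__main__":
    print(is_quiesced(SAMPLE))
```

With the sample above the check fails, because AE still has pending
work. Only start the next repair once it passes on the node you are
about to repair.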

> 2. What is the effect of running repair on more than one node at a
> time under 0.6.6?  I realize it's not recommended but I accidentally
> did this and am curious about the effect.

Often the repairs will stomp on each other's internal state and
neither will finish.

> 3. Is large GC activity normal during a repair outside the documented
> "GC Storm" cases?

Yes.  Repair does a lot of object allocation.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
