
Re: Stalled streams during repairs

2014-04-17 Thread Robert Coli

On Wed, Apr 16, 2014 at 8:39 PM, Andrew Cooper wrote:
> It is becoming more and more evident that the most reliable option at
> this point would be to do an out-of-band rsync of a snapshot on dc1, with a
> custom sstable id de-duplication script paired with a
> refresh/compaction/cleanup on dc2 n

RE: Stalled streams during repairs

2014-04-16 Thread Andrew Cooper

First, thanks for the quick reply and JIRA links! It's helpful to know we are not the only ones experiencing these issues. "Are you sure you actually want/need to run repair as frequently as you currently are? Reducing the frequency won't make it work any better, but it will reduce the number o

Re: Stalled streams during repairs

2014-04-16 Thread Robert Coli

On Wed, Apr 16, 2014 at 3:17 PM, Andrew Cooper wrote:
> We are running cassandra 1.2.5. I have checked through the change logs up
> to 1.2.16 and do not see any indications of this being a known (and fixed)
> issue.
>
Repair has been re-written in 2.0, because it was broken; that's why you don't

Stalled streams during repairs

2014-04-16 Thread Andrew Cooper

We are running into a reproducible issue in one of our Cassandra clusters. We are seeing that during an anti-entropy repair, if a particular sstable is streaming to multiple endpoints and the two streams happen to hit the same section of the sstable, it stalls all streams indefinitely on the so