Hi,

Using Cassandra 1.2.18, we are experiencing an issue in our 2-DC
(EC2MultiRegionSnitch) cluster.

I have seen some weird* inconsistencies between our 2 DCs. I tried to run
repair on all the nodes of both DCs (we tried running several repairs at the
same time as well as rolling repairs, and tried with and without the -pr
option). It has been running for days (the last run started 3 days ago on
several machines) and seems to hang, since I can't see any validation
compaction or any stream running. Yet I don't see any error either...
The CF I am currently trying to repair is 350 MB large (per node), and I am
quite sure it shouldn't take that long... Repairing other CFs also gets stuck.
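
In case it helps, this is roughly how I launch the repairs and check for
activity (the keyspace name below is just an example, not our real one):

    nodetool repair -pr my_keyspace mytable   # also tried without -pr
    nodetool compactionstats                  # no "Validation" compaction listed
    nodetool netstats                         # no repair stream listed
    nodetool tpstats                          # AntiEntropyStage looks idle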

The behaviour is quite strange since it seems to work at the start. I see
this kind of log line:

    INFO [AntiEntropyStage:1] 2014-10-18 06:01:58,991 AntiEntropyService.java (line 213) [repair #44563cb0-568c-11e4-83c0-4dae0987c5d6] Received merkle tree for mytable from /xxx.xxx.xxx.xxx

and I see some streams. But then the load on the nodes goes down, the
streams finish, and there is no more validation. When I check my data it
appears I still have discrepancies, and the "nodetool repair" command does
not return.
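
To follow a given repair session I grep for its id in system.log, with
something like this (log path is the package default, adjust as needed);
after the "Received merkle tree" lines there is simply nothing more for that
session:

    grep "44563cb0-568c-11e4-83c0-4dae0987c5d6" /var/log/cassandra/system.log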

I know that 2.1 fixes all of this. We are going to migrate to C* 2.0 soon
(asap) and then to 2.1, but we first need to run some tests, which will
take us some time. Is repair officially broken on 1.2.18? Is there any
known workaround or solution to get data repaired on this version?

Any insight is very welcome. And if you need more information, let me know.

Alain

*That's weird since nodetool rebuild worked just fine on all the nodes
joining while building the new DC (except one that got stuck somehow). And
since I have RF 3 and use CL LOCAL_QUORUM, I should see exactly the same
result in both DCs for any value written before the new DC joined the
cluster and not updated since. Any idea why I have those discrepancies in
the first place?
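
For what it's worth, this is roughly how I spot the discrepancies: I read
the same row from a node in each DC and compare the results (keyspace, table
and key names below are made up; our application reads at LOCAL_QUORUM):

    echo "SELECT * FROM my_keyspace.mytable WHERE key = 'some_key';" | cqlsh node_in_dc1
    echo "SELECT * FROM my_keyspace.mytable WHERE key = 'some_key';" | cqlsh node_in_dc2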
