Hi,

We are experiencing an issue in our 2 DC (EC2MultiRegionSnitch) Cassandra 1.2.18 cluster.
We have 2 DCs and I am seeing some weird* inconsistencies between them. I tried to run repair on all the nodes of both DCs (we tried running several repairs at the same time, running them in a rolling fashion, and with and without the -pr option). It has been running for days (the last run started 3 days ago on several machines) and it seems to hang: I can't see any validation compaction or any streams running. I don't see any error either, though.

The CF I am trying to repair right now is 350 MB large (per node); I am quite sure it shouldn't take that long. Repairing other CFs also gets stuck.

The behaviour is quite strange since it seems to work at the start. I see logs like:

INFO [AntiEntropyStage:1] 2014-10-18 06:01:58,991 AntiEntropyService.java (line 213) [repair #44563cb0-568c-11e4-83c0-4dae0987c5d6] Received merkle tree for mytable from /xxx.xxx.xxx.xxx

and I see some streams. But then the load on the nodes goes down, the streams finish, and there are no more validations. When I check my data I still have discrepancies, and the "nodetool repair" command does not return.

I know that 2.1 fixes all this. We are going to migrate to C* 2.0 soon (asap) and then to 2.1, but we first need to run some tests, which will take us some time.

Is repair officially broken on 1.2.18? Is there any known workaround or solution to get data repaired on this version?

Any insight is very welcome, and if you need more information, let me know.

Alain

*That's weird because nodetool rebuild worked just fine on all the nodes joining while building the new DC (except one that got stuck somehow). Since I have RF 3 and CL LOCAL_QUORUM, I should see exactly the same result on both DCs for any value written before the new DC joined the cluster and not updated since. Any idea why I have these discrepancies in the first place?
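
P.S. In case it helps, this is roughly what we have been running on each node and how we check whether repair is progressing (the keyspace name "mykeyspace" below is just a placeholder, not our real schema):

    nodetool repair -pr mykeyspace mytable   # primary-range repair; also tried without -pr
    nodetool compactionstats                 # to look for pending/active Validation compactions
    nodetool netstats                        # to look for active streams

After the first few minutes, compactionstats shows no validation compactions and netstats shows no streams, yet the repair command never returns.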