Hi Vincent,

Most people handle repair with one of the following:
- pain (running nodetool commands by hand)
- cassandra_range_repair:
https://github.com/BrianGallew/cassandra_range_repair
- Spotify Reaper
- the OpsCenter repair service, for DSE users

Reaper is a good option I think and you should stick to it. If it cannot do
the job here then no other tool will.

You have several options from here:

   - Try to break up your repair table by table and see which ones actually
   get stuck
   - Check your logs for any repair/streaming error
   - Avoid repairing everything:
      - you may have expendable tables
      - you may have tables that rely on TTLs only, with no deletes, and
      that are accessed at QUORUM consistency level only
   - You can try to relieve repair pressure in Reaper by lowering the repair
   intensity (on the tables that get stuck)
   - You can try adding steps to your repair process by setting a higher
   segment count in Reaper (on the tables that get stuck)
   - And lastly, you can turn to incremental repair. As you're already
   familiar with Reaper, you might want to take a look at our Reaper fork,
   which handles incremental repair:
   https://github.com/thelastpickle/cassandra-reaper
   If you go down that path, make sure you mark all SSTables as repaired
   before you run your first incremental repair, otherwise you'll end up
   in anticompaction hell (a bad, bad place):
   https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
   Even if people say that's not necessary anymore, it'll save you from a
   very bad first experience with incremental repair.
   Furthermore, make sure you run repair daily after your first
   incremental repair run, so that each run only has a small amount of
   data to work on.
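
To make the "table by table" and "more segments" ideas concrete, here is a minimal sketch of what a range repair tool does in spirit. It is not how Reaper is implemented. It assumes you already know a token range owned by the node (for example from `nodetool ring`), and it only builds `nodetool repair -st ... -et ...` command lines without running them. The function names, keyspace, table and token values are all hypothetical.

```python
def split_range(start, end, steps):
    """Split the token range (start, end] into `steps` contiguous sub-ranges.

    Simplification: wrap-around ranges (start >= end) are not handled.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    if start >= end:
        raise ValueError("wrap-around ranges are not handled in this sketch")
    width = end - start
    if steps > width:
        raise ValueError("more steps than tokens in the range")
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + (width * i) // steps for i in range(steps)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, table, start, end, steps):
    """Build one subrange repair command per segment, for a single table."""
    return [
        ["nodetool", "repair", "-st", str(st), "-et", str(et), keyspace, table]
        for st, et in split_range(start, end, steps)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace, table and token range, for illustration only.
    for cmd in repair_commands("my_keyspace", "big_table", -1000, 1000, 4):
        print(" ".join(cmd))
```

Running the commands one table at a time shows which table gets stuck, and raising `steps` makes each individual repair session smaller.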


Cheers,


On Thu, Oct 27, 2016 at 4:27 PM Vincent Rischmann <m...@vrischmann.me> wrote:

Hi,

We have two Cassandra 2.1.15 clusters at work and are having some trouble
with repairs.

Each cluster has 9 nodes, and the amount of data is not gigantic, but some
column families have 300+ GB of data.
We tried to use `nodetool repair` for these tables, but when we tested it,
it put too much load on the whole cluster and impacted our production
apps.

Next we saw https://github.com/spotify/cassandra-reaper , tried it and had
some success until recently. For the past 2 to 3 weeks it has never
completed a repair run, deadlocking itself somehow.

I know DSE includes a repair service, but I'm wondering how other
Cassandra users manage repairs.

Vincent.

-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
