Re: Nodetool repair

Alain RODRIGUEZ Mon, 19 Sep 2016 06:08:25 -0700

Hi Lokesh,

Repair is a regular, very common and yet non-trivial operation in
Cassandra. A lot of people struggle with it.

Some good talks about repairs were given during the summit; you might want
to have a look at the DataStax YouTube channel in a few days :-).
https://www.youtube.com/user/DataStaxMedia

> Is there a way to know in advance the ETA of manual repair before
> triggering it
>

There is no such thing, probably because the duration of the repair
depends on:

- The size of your data
- The number of vnodes
- The compaction throughput
- The streaming throughput
- The hardware available
- The load of the cluster
- ...

So the best thing to do is to benchmark it in your own environment. You can
track repairs using logs. I used something like that in the past:

# For each table of the keyspace, count the "fully synced" repair sessions in the logs.
for i in $(echo "SELECT columnfamily_name FROM system.schema_columns WHERE keyspace_name = 'my_keyspace';" \
    | cqlsh | uniq | tail -n +4 | head -n -2); do
  echo "Sessions synced for $i: $(grep -i "$i is fully synced" /var/log/cassandra/system.log* | wc -l)"
done

Depending on your version of Cassandra and the path to your logs, this
might or might not work as is; you might need to adjust it. The number of
"sessions" depends on the number of nodes and of vnodes, but it will be the
same for all the tables, on all the nodes, if you are using the same number
of vnodes.

So you will soon have a good idea of how long it takes to repair a table /
a keyspace, and some information about the completeness of the repairs (be
aware of log rotation and of log entries from previous repairs if using the
command above).
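
Once a first complete repair has told you how many sessions to expect per
table, the synced-session count can be turned into a rough ETA. This is only
a sketch: the function name is made up, and it assumes sessions complete at
a roughly constant pace, which will not hold if the cluster load changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the remaining repair time from progress so far.
# Arguments: sessions synced so far, total sessions expected, seconds elapsed.
repair_eta_seconds() {
  local done_sessions=$1 total_sessions=$2 elapsed_seconds=$3
  if [ "$done_sessions" -le 0 ]; then
    # No session finished yet, so there is nothing to extrapolate from.
    echo "unknown"
    return 1
  fi
  # Linear extrapolation: remaining sessions at the average pace observed so far.
  echo $(( (total_sessions - done_sessions) * elapsed_seconds / done_sessions ))
}

# Example: 64 of 256 sessions synced after 30 minutes (1800 seconds).
repair_eta_seconds 64 256 1800   # prints 5400
```

The first argument can come from the grep count above, restricted to the
log lines of the current repair.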

How fast repair can go will also depend on the options and techniques you
are using:

- Subranges: https://github.com/BrianGallew/cassandra_range_repair ?
- Incremental / Full repairs ?
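
Subranges also give a cheap way to benchmark: time the repair of a small
token range and extrapolate, as Jens suggests in the quoted message below.
A minimal sketch, where the function name, the token variables and the
keyspace name are placeholders, and which assumes data is spread evenly over
the ring:

```shell
#!/usr/bin/env bash
# Sketch: extrapolate a full-ring repair time from one timed subrange repair.
# fraction_denominator = 1000 means the sample covered 1/1000 of the ring.
extrapolate_repair_seconds() {
  local sample_seconds=$1 fraction_denominator=$2
  echo $(( sample_seconds * fraction_denominator ))
}

# Hypothetical usage against a live node (tokens and keyspace are placeholders):
#   start=$(date +%s)
#   nodetool repair -st "$START_TOKEN" -et "$END_TOKEN" my_keyspace
#   sample=$(( $(date +%s) - start ))
#   extrapolate_repair_seconds "$sample" 1000

# Example: a 1/1000 sample that took 90 seconds.
extrapolate_repair_seconds 90 1000   # prints 90000
```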

> I believe repair performs following operations -
>
> 1) Major compaction
> 2) Exchange of merkle trees with neighbouring nodes.
>

> AFAIK, a repair doesn't trigger a major compaction, but I might be wrong
> here.

Jens is right, there is no major compaction in there. This is (roughly) how
repairs work. There are 2 main steps:

- Compare / exchange merkle trees (done through a VALIDATION compaction,
like a compaction, but without the write phase)
- Streaming: Any mismatch detected in the previous validation is fixed by
streaming a larger block of data (read more about that:
http://www.datastax.com/dev/blog/advanced-repair-techniques)

To monitor those operations use

- validation: nodetool compactionstats -H (Look for "VALIDATION COMPACTION"
off the top of my head)
- streaming: watch -d 'nodetool netstats -H | grep -v 100%'

You should think about what would be a good repair strategy for your use
case and workload (run repairs at night? Use subranges?). Keep in mind that
"nodetool repair" is useful to reduce entropy in your cluster, and so
reduces the risk of inconsistencies. Repair also prevents deleted data from
reappearing (zombies) as long as it is run cluster-wide within
gc_grace_seconds (a per-table option).
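
As a rough sanity check of the gc_grace_seconds constraint, the sketch below
compares the duration of a full cluster-wide repair cycle with
gc_grace_seconds. The function name and the example durations are made up
for illustration; 864000 seconds (10 days) is the Cassandra default for
gc_grace_seconds and is used only when no value is passed.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds if a full cluster-wide repair cycle
# completes within gc_grace_seconds (default assumed: 864000 s = 10 days).
repair_fits_gc_grace() {
  local cycle_seconds=$1 gc_grace_seconds=${2:-864000}
  [ "$cycle_seconds" -lt "$gc_grace_seconds" ]
}

# Example: a repair cycle that completes once a week (604800 seconds).
if repair_fits_gc_grace 604800; then
  echo "repair cycle fits within gc_grace_seconds"
else
  echo "repair cycle is too slow: deleted data may reappear"
fi
```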

What if I kill the process in the middle?

This is safe, some parts of the data will not be repaired on this node,
that's it. You can either restart the node or find the right JMX command.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-19 11:18 GMT+02:00 Jens Rantil <jens.ran...@tink.se>:

> Hi Lokesh,
>
> Which version of Cassandra are you using? Which compaction strategy are
> you using?
>
> AFAIK, a repair doesn't trigger a major compaction, but I might be wrong
> here.
>
> What you could do is to run a repair for a subset of the ring (see `-st`
> and `-et` `nodetool repair` parameters). If you repair 1/1000 of the ring,
> repairing the whole ring will take ~1000 times longer than your sample.
>
> Also, you might want to look at incremental repairs.
>
> If you kill the process in the middle the repair will not start again. You
> will need to reissue it.
>
> Cheers,
> Jens
>
> On Sun, Sep 18, 2016 at 2:58 PM Lokesh Shrivastava <
> lokesh.shrivast...@gmail.com> wrote:
>
>> Hi,
>>
>> I tried to run nodetool repair command on one of my keyspaces and found
>> that it took lot more time than I anticipated. Is there a way to know in
>> advance the ETA of manual repair before triggering it? I believe repair
>> performs following operations -
>>
>> 1) Major compaction
>> 2) Exchange of merkle trees with neighbouring nodes.
>>
>> Is there any other operation performed during manual repair? What if I
>> kill the process in the middle?
>>
>> Thanks.
>> Lokesh
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>
