There should be no migration needed, but if you have a lot of data, anticompaction could take a while the first time. The only way to make it fast would be to mark all sstables as repaired up front, and then run incremental repair every day or so, so that only small amounts of data need anticompaction. The catch is that the data you marked would never actually have been repaired, and no subsequent incremental repair would touch it unless you run a full repair.
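If you did want to go that route, here is a rough sketch of marking sstables as repaired with the sstablerepairedset tool (keyspace, table and paths below are placeholders, and the node has to be down while the tool runs):

  # stop compactions on the table and flush, then stop the node
  nodetool disableautocompaction my_keyspace my_table
  nodetool flush my_keyspace my_table
  sudo service cassandra stop

  # mark every sstable of that table as repaired
  find /var/lib/cassandra/data/my_keyspace/my_table-*/ -name "*Data.db" > sstables.txt
  sudo sstablerepairedset --really-set --is-repaired -f sstables.txt

  # bring the node back up
  sudo service cassandra start

Only do that right after a full repair, otherwise you're flagging data as repaired that never was, which is exactly the caveat above.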
Do note that full repair performs anticompaction too; only subrange repair skips that phase.
You should never use "-pr" with incremental repair, as it won't mark all sstables as repaired. It's unnecessary anyway: incremental repair skips already repaired data, so the redundant work that "-pr" is meant to avoid doesn't happen in the first place.
Just run "nodetool repair" on one node and wait for it to finish (check the logs for repair completion if nodetool loses the connection at some point). Only when it is fully finished on that node, move on to the next one.

Cheers,

On Thu, Sep 14, 2017 at 11:52 AM Micha <mich...@fantasymail.de> wrote:

> ok, I have restarted the cluster to stop all repairs.
> There is no "migration" process to move to incremental repair in 3.11?
> So I can start "nodetool repair -pr" node after node, or just "nodetool
> repair" on one node?
>
> Cheers,
> Michael
>
>
> On 14.09.2017 10:47, Alexander Dejanovski wrote:
> > Hi Micha,
> >
> > Are you running incremental repair?
> > If so, validation fails when 2 repair sessions are running at the same
> > time, with one anticompacting an SSTable and the other trying to run a
> > validation compaction on it.
> >
> > If you check the logs of the node that is referred to in the "Validation
> > failed in ...", you should see error messages stating that an SSTable
> > can't be part of 2 different repair sessions.
> >
> > If that happens (and you're indeed running incremental repair), you
> > should do a rolling restart of the cluster to stop all repairs and then
> > process one node at a time only.
> > Reaper does that, but you can handle it manually if you prefer. The plan
> > here is to wait for all anticompactions to be over before starting a
> > repair on the next node.
> >
> > In any case, check the logs of the node that failed to run validation
> > compaction in order to understand what failed.
> >
> > Cheers,
> >
> > On Thu, Sep 14, 2017 at 10:18 AM Micha <mich...@fantasymail.de> wrote:
> >
> >> Hi,
> >>
> >> I started a repair (7 nodes, C* 3.11), but right away I get an
> >> exception in the log:
> >> "RepairException: [#.... on keyspace/table, [....],
> >> Validation failed in /ip"
> >>
> >> The started nodetool repair hangs (the whole day...), strace shows it's
> >> waiting...
> >>
> >> What's the reason for this exception, and what to do now? If this is an
> >> error, why doesn't nodetool abort the command and show the error?
> >>
> >> thanks,
> >> Michael
> >>
> > -----------------
> > Alexander Dejanovski
> > France
> > @alexanderdeja
> >
> > Consultant
> > Apache Cassandra Consulting
> > http://www.thelastpickle.com
>
--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
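For completeness, a rough sketch of that node-by-node sequence as a shell loop (hostnames are placeholders, and it assumes nodetool can reach each node's JMX port remotely; otherwise run the same commands locally on each node):

  # one node at a time; don't start the next node while the previous
  # repair or its anticompaction is still running
  for host in node1 node2 node3 node4 node5 node6 node7; do
    nodetool -h "$host" repair
    # anticompaction is listed by compactionstats; poll until it's done
    while nodetool -h "$host" compactionstats | grep -qi anticompaction; do
      sleep 60
    done
  done

  # if nodetool drops the connection mid-repair, check that node's
  # system.log for the repair command finishing before moving on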