Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Alexander Dejanovski Thu, 29 Sep 2016 04:10:15 -0700

Atul,

since you're using 3.6, by default you're running incremental repair, which
doesn't like concurrency very much.
Validation errors are not occurring on a partition or partition range base,
but if you're trying to run both anticompaction and validation compaction
on the same SSTable.


Like advised to Robert yesterday, and if you want to keep on running
incremental repair, I'd suggest the following :

   - run nodetool tpstats on all nodes in search for running/pending repair
   sessions
   - If you have some, and to be sure you will avoid conflicts, roll
   restart your cluster (all nodes)
   - Then, run "nodetool repair" on one node.
   - When repair has finished on this node (track messages in the log and
   nodetool tpstats), check if other nodes are running anticompactions
   - If so, wait until they are over
   - If not, move on to the other node

You should be able to run concurrent incremental compactions on different
tables if you wish to speed up the complete repair of the cluster, but do
not try to repair the same table/full keyspace from two nodes at the same
time.

If you do not want to keep on using incremental repair, and fallback to
classic full repair, I think the only way in 3.6 to avoid anticompaction
will be to use subrange repair (Paulo mentioned that in 3.x full repair
also triggers anticompaction).

You have two options here : cassandra_range_repair (
https://github.com/BrianGallew/cassandra_range_repair) and Spotify Reaper (
https://github.com/spotify/cassandra-reaper)

cassandra_range_repair might scream about subrange + incremental not being
compatible (not sure here), but you can modify the repair_range() method by
adding a --full switch to the command line used to run repair.

We have a fork of Reaper that handles both full subrange repair and
incremental repair here : https://github.com/thelastpickle/cassandra-reaper
It comes with a tweaked version of the UI made by Stephan Podkowinski (
https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
interactions to schedule, run and track repair - which adds fields to run
incremental repair (accessible via ...:8080/webui/ in your browser).

Cheers,



On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> Hi,
>
> We are not sure whether this issue is linked to that node or not. Our
> application does frequent delete and insert.
>
> May be our approach is not correct for nodetool repair. Yes, we generally
> fire repair on all boxes at same time. Till now, it was manual with default
> configuration ( command: "nodetool repair").
> Yes, we saw validation error but that is linked to already running repair
> of  same partition on other box for same partition range. We saw error
> validation failed with some ip as repair in already running for the same
> SSTable.
> Just few days back, we had 2 DCs with 3 nodes each and replication was
> also 3. It means all data on each node.
>
> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Atul,
>>
>> could you be more specific on how you are running repair ? What's the
>> precise command line for that, does it run on several nodes at the same
>> time, etc...
>> What is your gc_grace_seconds ?
>> Do you see errors in your logs that would be linked to repairs
>> (Validation failure or failure to create a merkle tree)?
>>
>> You seem to mention a single node that went down but say the whole
>> cluster seem to have zombie data.
>> What is the connection you see between the node that went down and the
>> fact that deleted data comes back to life ?
>> What is your strategy for cyclic maintenance repair (schedule, command
>> line or tool, etc...) ?
>>
>> Thanks,
>>
>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <atul.sar...@snapdeal.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We have seen a weird behaviour in cassandra 3.6.
>>> Once our node was went down more than 10 hrs. After that, we had ran
>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>> over the cluster. On day- today basis, on expiry of every grace period,
>>> deleted records start surfacing again in cassandra.
>>>
>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>> FYI, we have 3 data centres now.
>>>
>>> Just want the help how to verify and debug this issue. Help will be
>>> appreciated.
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>
>>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Reply via email to