> 
> I can't recommend *anyone* use incremental repair, as there are some pretty 
> horrible bugs in it that can cause Merkle trees to wildly mismatch and result 
> in massive overstreaming.  Check out 
> https://issues.apache.org/jira/browse/CASSANDRA-9143.
> 
> TL;DR: Do not use incremental repair before 4.0.

Hi Jonathan,

Thanks for your reply - this is a slightly scary message for us! 2.2 has been 
out for nearly 2 years and incremental repairs are the default, and they have 
horrible bugs!?
I guess massive overstreaming, while a performance issue, does not affect data 
integrity?
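
In the meantime, I assume the safest fallback for us is to switch back to full, 
non-incremental repairs - something roughly like this on each node (2.2 syntax 
as I understand it; <keyspace> is just a placeholder, and -full is needed 
because 2.2 defaults to incremental):

    nodetool repair -full -pr <keyspace>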

Are there any plans to backport this to 3.x or, ideally, 2.2?

Chris



> On Tue, Jun 6, 2017 at 9:54 AM Anuj Wadehra <anujw_2...@yahoo.co.in.invalid> 
> wrote:
> Hi Chris,
> 
> Can you share the following info:
> 
> 1. The exact repair commands you use for inc repair and pr repair.
> 
> 2. Repair time should be measured at cluster level for inc repair. So, what's 
> the total time it takes to run repair on all nodes for incremental vs pr 
> repairs?
> 
> 3. You are repairing one DC, DC3. How many DCs are there in total, and what's 
> the RF for the keyspaces? Running pr on a specific DC would not repair the 
> entire data set.
> 
> 4. 885 ranges? Where did you get this number - the logs? Can you share the 
> number of ranges printed in the logs for both the inc and pr cases?
> 
> 
> Thanks
> Anuj
> 
> 
> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
> <chris.elsm...@demandlogic.co> wrote:
> Thank you for the excellent and clear description of the different versions 
> of repair Anuj, that has cleared up what I expect to be happening.
> 
> The problem now is that in our cluster we are running repairs with the options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885), and 
> these repairs are taking over a day to complete, whereas previously, when 
> running with the partitioner range option, they were taking more like 8-9 hours.
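> 
> For reference, the command we now run is roughly the following (on 2.2, so 
> incremental and parallel are already the defaults; the keyspace argument is 
> omitted here):
> 
>     nodetool repair -dc DC3
> 
> whereas the earlier, faster runs were along the lines of 'nodetool repair -pr'.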
> 
> As I understand it, using incremental should have sped this process up, as all 
> three replicas of the data touched by each repair job should be marked as 
> repaired; however, this does not seem to be the case. Any ideas?
> 
> Chris
> 
>> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID> wrote:
>> 
>> Hi Chris,
>> 
>> Using pr with incremental repairs does not make sense. Primary range repair 
>> is an optimization over full repair. If you run full repair on an n-node 
>> cluster with RF=3, you would be repairing each piece of data three times. 
>> E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C. 
>> When full repair is run on node A, the entire data in that range gets synced 
>> with the replicas on nodes B and C. Now, when you run full repair on nodes B 
>> and C, you are wasting resources repairing data which is already repaired. 
>> 
>> Primary range repair ensures that when you run repair on a node, it ONLY 
>> repairs the data which is owned by that node. Thus, no node repairs data 
>> which is not owned by it and must be repaired by another node. Redundant work 
>> is eliminated. 
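>> 
>> As a rough sketch in 2.2 nodetool syntax (where incremental is the default, 
>> so -full is needed to get the old full-repair behaviour, and <keyspace> is 
>> just a placeholder), the two approaches are, run on every node in turn:
>> 
>>     # full repair: each range is repaired once per replica (3x the work with RF=3)
>>     nodetool repair -full <keyspace>
>> 
>>     # full repair limited to the node's primary range: each range repaired once cluster-wide
>>     nodetool repair -full -pr <keyspace>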
>> 
>> Even with pr, each time you run pr on all nodes you repair 100% of the data. 
>> Why repair the complete data set in each cycle - even data which has not 
>> changed since the last repair cycle?
>> 
>> This is where incremental repair comes in as an improvement. Once repaired, 
>> data is marked as repaired so that the next repair cycle can focus on just 
>> repairing the delta. Now, let's go back to the example of the 5-node cluster 
>> with RF=3, this time running incremental repair on all nodes. When you repair 
>> the entire data set on node A, all 3 replicas are marked as repaired. Even if 
>> you then run inc repair on all ranges on the second node, you would not 
>> re-repair the already-repaired data. Thus, there is no advantage to repairing 
>> only the data owned by the node (its primary range). You can run inc repair 
>> on all the data present on a node, and Cassandra will make sure that when you 
>> repair data on the other nodes, you only repair unrepaired data.
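>> 
>> So with incremental repair the per-node command can stay simple - a rough 
>> sketch on 2.2, where incremental is already the default and <keyspace> is a 
>> placeholder:
>> 
>>     nodetool repair <keyspace>
>> 
>> run node by node; SSTables already marked repaired are skipped on the later 
>> nodes, so adding -pr buys you nothing here.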
>> 
>> Thanks
>> Anuj
>> 
>> 
>> 
>> On Tue, Jun 6, 2017 at 4:27 PM, Chris Stokesmore
>> <chris.elsm...@demandlogic.co> wrote:
>> Hi all,
>> 
>> Wondering if anyone had any thoughts on this? At the moment the long-running 
>> repairs mean we end up running them on two nodes at once for a period of time, 
>> which obviously increases the cluster load.
>> 
>> On 2017-05-25 16:18 (+0100), Chris Stokesmore <c...@demandlogic.co> wrote: 
>> > Hi,
>> > 
>> > We are running a 7-node Cassandra 2.2.8 cluster, RF=3, and had been 
>> > running repairs with the -pr option, via a cron job that runs on each node 
>> > once per week.
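>> > 
>> > The cron entries were roughly of this shape - one per node, with the day 
>> > and time staggered so only one node repaired at a time (keyspace argument 
>> > omitted here):
>> > 
>> >     0 2 * * 1  nodetool repair -pr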
>> > 
>> > We changed that as some advice on the Cassandra IRC channel said it would 
>> > cause more anticompaction, and 
>> > http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
>> > says 'Performing partitioner range repairs by using the -pr option is 
>> > generally considered a good choice for doing manual repairs. However, this 
>> > option cannot be used with incremental repairs (default for Cassandra 2.2 
>> > and later)'.
>> > 
>> > The only problem is our -pr repairs were taking about 8 hours, and now the 
>> > non-pr repairs are taking 24+ hours - I guess this makes sense, as repairing 
>> > 1/7 of the data has increased to 3/7, except I was hoping to see a speed-up 
>> > after the first loop through the cluster, as each repair will be marking 
>> > much more data as repaired, right?
>> > 
>> > 
>> > Is running -pr with incremental repairs really that bad?
> 
