We don't have enough time before our major infrastructure upgrade to do the
Cassandra upgrade.


Is there any insight on the questions below?

If I run the repair with the DC option (-dc localdatacenter) for the local
datacenters, then all repairs complete successfully. Is this an indication that
the repairs are good? Can we proceed with adding new nodes and decommissioning
nodes even when the individual repairs fail but the DC repairs work?

Is there anything else other than scrub that can be performed to fix the repair 
issues?

Thanks



________________________________
From: Anup Shirolkar <anup.shirol...@instaclustr.com>
Sent: Thursday, April 19, 2018 11:28 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 3.7 - Problem with Repairs - all nodes failing

Hi Leena,

The repairs are most likely failing because of a bug in Cassandra 3.7. I
don't have a JIRA reference handy, but there are quite a few issues in this
version.

Considering your scenario, I highly recommend that you upgrade to 3.11.1.
Although, you have mentioned that upgrading is not an option, I would like to 
tell you that

On 19 April 2018 at 23:19, Leena Ghatpande <lghatpa...@hotmail.com> wrote:

We have an 8-node prod cluster running Cassandra 3.7. Our 2 largest tables have
around 100M and 30M rows respectively, while all others are relatively small.

We have been running repairs on alternate days on 2 of our keyspaces.
We run repair on each node in the cluster with the -pr option on every table
within each keyspace individually. Repairs are run sequentially on each node.
These were working fine, but with no change to the systems, they have been
failing since last month.

The repairs have started failing for each table on every node with no specific 
error.

I have tried running scrub on every table and then running repair, but the
repair still fails for all tables.

Our smallest table with only 100 rows also fails on repair.

But if I run the repair with the DC option (-dc localdatacenter) for the local
datacenters, then the repairs complete successfully. Is this an indication that
the repairs are good?
We would still want the repairs to work on individual tables as expected.
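
For readers following the thread, the two repair modes described in this
message (per-table repair with -pr on each node, and repair restricted with
-dc) can be sketched as command lines. This is only an illustration: the host
names and the helper function are hypothetical, the keyspace/table pair is
taken from the log further down, and the sketch builds the commands rather
than running them.

```python
from typing import Dict, List, Optional


def build_repair_commands(
    hosts: List[str],
    tables_by_keyspace: Dict[str, List[str]],
    local_dc: Optional[str] = None,
) -> List[List[str]]:
    """Build one `nodetool repair` argv per (host, keyspace, table).

    With local_dc=None the primary-range option (-pr) is used, as in the
    per-table schedule described above. With local_dc set, the repair is
    restricted to that datacenter via -dc instead.
    """
    commands = []
    for host in hosts:  # nodes are visited one after another (sequentially)
        for keyspace, tables in tables_by_keyspace.items():
            for table in tables:
                cmd = ["nodetool", "-h", host, "repair"]
                if local_dc is None:
                    cmd.append("-pr")
                else:
                    cmd += ["-dc", local_dc]
                cmd += [keyspace, table]
                commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical host names; "secure"/"clients" appear in the log below.
    for argv in build_repair_commands(["node1", "node2"], {"secure": ["clients"]}):
        print(" ".join(argv))
```

Each returned list could be passed to subprocess.run one at a time to keep the
repairs sequential; that part is left out here so the sketch has no side
effects.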

We need help getting the repairs to work properly, as we have a big migration
planned for June.

Upgrading Cassandra is not an option right now.


Here are some of the errors:
INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181 - 
[repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree for clients 
from / IP
ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461 Validator.java:261 - 
Failed creating a merkle tree for [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d 
on secure/clients, [(1849652111528073119,1856811324137977760], 
(3733211856223440695,3737790228588239952], 
(-2500456349659149537,-2498953852677197491], 
(1735271399836012489,1735412813423041471], 
(1871725370007007817,1890457592856328448], 
(4316163881057906640,4323247409810431754], 
(4286141602946572160,4308169130179803373], 
(5189663040558066167,5193871822490506231], 
(7160723554094225326,7161133449395023060], 
(-4363807597425543488,-4361416517953194804], 
(7008956720664744733,7022523551326267501], 
(-5742986989228874052,-5734436401879059890], 
(1828335330499002859,1849652111528073119], 
(7072368932695202361,7144087505892848370], 
(-5791935107311742541,-5781988493712029404], 
(7754917992280096132,7754953485457609099]]], 
/130.5.123.234 (see log for details)
ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461 CassandraDaemon.java:217 
- Exception in thread Thread[ValidationExecutor:213,1,main]
java.lang.NullPointerException: null
INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181 - 
[repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree for clients 
from /IP
ERROR [Repair#113:12] 2018-04-18 20:36:51,461 CassandraDaemon.java:217 - 
Exception in thread Thread[Repair#113:12,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#223c73c2-4372-11e8-8749-89fc1dde5b7d on secure/clients, 
[(1849652111528073119,1856811324137977760], 
(3733211856223440695,3737790228588239952], 
(-2500456349659149537,-2498953852677197491], 
(1735271399836012489,1735412813423041471], 
(1871725370007007817,1890457592856328448], 
(4316163881057906640,4323247409810431754], 
(4286141602946572160,4308169130179803373], 
(5189663040558066167,5193871822490506231], 
(7160723554094225326,7161133449395023060], 
(-4363807597425543488,-4361416517953194804], 
(7008956720664744733,7022523551326267501], 
(-5742986989228874052,-5734436401879059890], 
(1828335330499002859,1849652111528073119], 
(7072368932695202361,7144087505892848370], 
(-5791935107311742541,-5781988493712029404], 
(7754917992280096132,7754953485457609099]]] Validation failed in 
/130.5.127.60
        at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1525)
 ~[guava-18.0.jar:na]
        at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1511) 
~[guava-18.0.jar:na]
        at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160) 
~[apache-cassandra-3.7.jar:3.7]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_45]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[na:1.8.0_45]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#223c73c2-4372-11e8-8749-89fc1dde5b7d on clients, 
[(1849652111528073119,1856811324137977760], 
(3733211856223440695,3737790228588239952], 
(-2500456349659149537,-2498953852677197491], 
(1735271399836012489,1735412813423041471], 
(1871725370007007817,1890457592856328448], 
(4316163881057906640,4323247409810431754], 
(4286141602946572160,4308169130179803373], 
(5189663040558066167,5193871822490506231], 
(7160723554094225326,7161133449395023060], 
(-4363807597425543488,-4361416517953194804], 
(7008956720664744733,7022523551326267501], 
(-5742986989228874052,-5734436401879059890], 
(1828335330499002859,1849652111528073119], 
(7072368932695202361,7144087505892848370], 
(-5791935107311742541,-5781988493712029404], 
(7754917992280096132,7754953485457609099]]] Validation failed in 
/130.5.127.60
        at 
org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68)
 ~[apache-cassandra-3.7.jar:3.7]
        at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
 ~[apache-cassandra-3.7.jar:3.7]
        at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:439)
 ~[apache-cassandra-3.7.jar:3.7]
        at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:169)
 ~[apache-cassandra-3.7.jar:3.7]
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
 ~[apache-cassandra-3.7.jar:3.7]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_45]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_45]
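
When many repairs fail, it can help to summarize which endpoint reported each
"Validation failed" entry. The following is a minimal sketch, assuming the log
text looks like the excerpt above (including mail-style line wrapping); the
function name and the regular expression are illustrative and not part of
Cassandra.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches "[repair #<uuid> on <table>, [<token ranges>]] Validation failed in /<ip>".
# Whitespace (including wrapped lines) is tolerated between the parts.
VALIDATION_FAILED = re.compile(
    r"\[repair\s+#(?P<session>[0-9a-f-]{36})\s+on\s+(?P<table>[\w/]+),\s*"
    r"\[[\d\s(),\[\]-]*?\]\s+Validation failed in\s+/(?P<endpoint>\d+(?:\.\d+){3})"
)


def failed_validations(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (session id, table, endpoint) for each validation failure found."""
    return [
        (m.group("session"), m.group("table"), m.group("endpoint"))
        for m in VALIDATION_FAILED.finditer(log_text)
    ]


def failures_per_endpoint(log_text: str) -> Counter:
    """Count validation failures by the endpoint that reported them."""
    return Counter(endpoint for _, _, endpoint in failed_validations(log_text))
```

If one endpoint accounts for most of the failures, the ValidationExecutor
errors in that node's own system.log (such as the NullPointerException above)
are the natural next place to look.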





