We have run restarts on the cluster and that doesn’t seem to help at all.
We ran repair separately for each table, and that usually goes through, but running a repair on the whole keyspace doesn’t. Any ideas, anyone?

Hannu

> On 3 Jan 2018, at 23:24, Hannu Kröger <[email protected]> wrote:
>
> I can certainly try that. No problem there.
>
> However, wouldn’t we then get this kind of error if that were the case:
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
> ?
>
> Hannu
>
>> On 3 Jan 2018, at 20:50, Nandakishore Tokala <[email protected]> wrote:
>>
>> Hi Hannu,
>>
>> I think some of the repairs are hanging there. Please restart all the nodes in the cluster and start the repair.
>>
>> Thanks
>> Nanda
>>
>> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger <[email protected]> wrote:
>> Additional notes:
>>
>> 1) If I run the repair just on those tables, it works fine
>> 2) Those tables are empty
>>
>> Hannu
>>
>> > On 3 Jan 2018, at 18:23, Hannu Kröger <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > Situation is as follows:
>> >
>> > Repair was started on node X on this keyspace with -full -pr. Repair fails on node Y.
>> >
>> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m looking at the debug.log.
>> > I see the following messages related to this repair request:
>> >
>> > -----------
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1511473932} org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
>> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
>> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in mytable
>> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - Failed creating a merkle tree for [repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 (see log for details)
>> > -----------
>> >
>> > Then the same for another table, and after that the following, which basically indicates that the repair “master” has told the node to abort, right?
>> >
>> > -----------
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:142 - Got anticompaction request AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33} org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8beea
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - Exception in thread Thread[AntiEntropyStage:1,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> >     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>> >     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>> >     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
>> > Caused by: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> >     at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     ... 7 common frames omitted
>> > -----------
>> >
>> > But that is almost everything in the log, and I don’t really see what the original problem is.
>> >
>> > Cassandra flushes the table to start building the merkle tree, and in the next millisecond it already fails the repair, without a proper exception or error log about the cause.
>> >
>> > Cassandra version is 3.11.0.
>> >
>> > Any ideas?
>> >
>> > Cheers,
>> > Hannu
>>
>> --
>> Thanks & Regards,
>> Nanda Kishore
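
For anyone triaging a similar failure, here is a minimal sketch of pulling the ERROR entries out of a debug.log and grouping them by the repair or session UUID they mention. The line layout is only inferred from the excerpt quoted in this thread, and the function name `errors_by_repair_id` and both regular expressions are hypothetical helpers, not part of Cassandra or nodetool. It assumes each log entry sits on a single line, as in the log file itself rather than the wrapped email.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the log excerpt in this thread:
# LEVEL [Thread] YYYY-MM-DD HH:MM:SS,mmm Source.java:NNN - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+:\d+)\s+-\s+(?P<msg>.*)$"
)

# Repair and parent-session ids appear as lowercase UUIDs in the messages.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def errors_by_repair_id(lines):
    """Group ERROR entries by each UUID found in their message.

    Returns a dict mapping UUID -> list of (timestamp, source, message).
    ERROR lines that mention no UUID, and lines that do not match the
    assumed layout (for example stack-trace frames), are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue
        for repair_id in UUID_RE.findall(match.group("msg")):
            grouped[repair_id].append(
                (match.group("ts"), match.group("source"), match.group("msg"))
            )
    return dict(grouped)
```

Passing an open file handle for the node's debug.log to `errors_by_repair_id` yields one entry per repair id, which makes it easier to see which validation failed first before the parent session was aborted.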
