[ 
https://issues.apache.org/jira/browse/CASSANDRA-20348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931880#comment-17931880
 ] 

Aayush Gupta commented on CASSANDRA-20348:
------------------------------------------

Hi [~aratnofsky],

We are seeing two errors on all nodes. After these errors appear, we see the 
Merkle tree errors reported in this issue. Please help us resolve them.

*1st error*

[ERROR] [Repair-Task:1] 2025-02-28 06:59:54,357 RepairRunnable.java:178 - Repair 7fd0c7d0-f5cb-11ef-9e89-67d429cb20c9 failed:
java.lang.RuntimeException: Did not get replies from all endpoints.
    at org.apache.cassandra.service.ActiveRepairService.failRepair(ActiveRepairService.java:665)
    at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:605)
    at org.apache.cassandra.repair.RepairRunnable.prepare(RepairRunnable.java:393)
    at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
    at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
    at java.util.concurrent.FutureTask.run(FutureTask.java:277)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
    at java.util.concurrent.FutureTask.run(FutureTask.java:277)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:825)


*2nd error*

[ERROR] [Repair#6412:1] 2025-02-28 07:00:15,226 RepairRunnable.java:178 - Repair 92123e10-f5cb-11ef-9e89-67d429cb20c9 failed:
java.lang.RuntimeException: Repair session 9228ac40-f5cb-11ef-9e89-67d429cb20c9 for range [(7560435422318582316,7621922760098450692]] failed with error [repair #9228ac40-f5cb-11ef-9e89-67d429cb20c9 on reaper_db/repair_schedule_by_cluster_and_keyspace, [(7560435422318582316,7621922760098450692]]] Got VALIDATION_REQ failure from /10.X.X.X:7000: UNKNOWN
    at org.apache.cassandra.repair.RepairRunnable$RepairSessionCallback.onFailure(RepairRunnable.java:698)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
    at org.apache.cassandra.repair.RepairSession.forceShutdown(RepairSession.java:342)
    at org.apache.cassandra.repair.RepairSession$1.onFailure(RepairSession.java:323)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:825)
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9228ac40-f5cb-11ef-9e89-67d429cb20c9 on reaper_db/repair_schedule_by_cluster_and_keyspace, [(7560435422318582316,7621922760098450692]]] Got VALIDATION_REQ failure from /10.X.X.X:7000: UNKNOWN
    at org.apache.cassandra.repair.messages.RepairMessage$1.onFailure(RepairMessage.java:81)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
    at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
    at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    ... 2 common frames omitted
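
To see which repairs failed and which endpoint rejected which verb across all nodes, the two errors above can be tallied from the collected logs. The following is only a minimal sketch and is not part of Cassandra: the function name {{summarize}} is hypothetical, and both regular expressions are assumptions derived solely from the excerpts in this comment, so they may need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: a failed repair is logged as "Repair <uuid> failed", as in the
# RepairRunnable lines quoted in this comment.
REPAIR_FAILED = re.compile(r"Repair ([0-9a-f-]{36}) failed")

# Assumption: a verb failure is logged as
# "Got <VERB> failure from /<host>:<port>: <REASON>".
VERB_FAILURE = re.compile(r"Got (\w+) failure from\s+/(\S+):\s+(\w+)")


def summarize(log_text):
    """Return (sorted failed repair ids, Counter of (verb, endpoint, reason))."""
    # Pasted logs may be hard-wrapped, so collapse all whitespace runs into
    # single spaces before matching.
    flat = " ".join(log_text.split())
    sessions = sorted(set(REPAIR_FAILED.findall(flat)))
    failures = Counter(VERB_FAILURE.findall(flat))
    return sessions, failures


if __name__ == "__main__":
    import sys

    # Usage: python summarize_repair_errors.py < system.log
    sessions, failures = summarize(sys.stdin.read())
    for session in sessions:
        print("failed repair:", session)
    for (verb, endpoint, reason), count in failures.items():
        print(f"{verb} failure from {endpoint} ({reason}): {count}")
```

Note that the same failure message appears both in the top-level exception and in its "Caused by" section, so the counts here are occurrences of the message, not distinct failures.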
    

> Issue with Merkle Tree Creation and Parent Repair Session Failure
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-20348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20348
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Aayush Gupta
>            Priority: Normal
>
> We encountered an error while attempting to validate a repair session on our 
> Cassandra cluster. The following issues were observed:
>  * The validation failed during the creation of a Merkle tree for the repair 
> session with ID {{a36dbed0-d787-11ef-84a0-59e4de22a9ca}} on the 
> {{mailbox/messages_by_id}} table. Several ranges of SSTables were involved in 
> the failure, and the logs indicate that the session could not be successfully 
> validated due to issues with these SSTables.
>  * The logs also show a failure in the parent repair session with ID 
> {{a3695200-d787-11ef-84a0-59e4de22a9ca}}. The error traceback reveals 
> that the {{ActiveRepairService}} could not retrieve the parent repair 
> session, leading to the failure of the repair process. This caused further 
> issues with repair message handling in the system.
>  * The CassandraDaemon logs indicate that the error was related to an invalid 
> or incomplete parent repair session, which was being handled by the 
> {{RepairMessageVerbHandler}}. The repair operation failed, and the system 
> attempted to remove the parent repair session, but was unable to proceed 
> successfully.
> *Logs*:
>  # The validation failed due to the inability to create a Merkle tree for a 
> set of SSTables, and the parent repair session was marked as failed.
>  # The {{RepairMessageVerbHandler}} encountered an exception, leading to the 
> failure of the repair process, as seen in the attached stack traces.
> *Request*: We would appreciate assistance in understanding the cause of 
> this failure, as well as any recommendations for resolving it. Specifically, 
> any guidance on recovering the parent repair session or resolving issues with 
> the Merkle tree creation would be helpful.
> Logs:
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 - Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573], (5013853684854274868,5016711782125970873], (2418110930062086372,2421594217423380863], (3333294399526748440,3333803051887680609], (-7673852896118720379,-7668669570613527038], (-8735570610439541581,-8735399615559719989], (3467842041551014709,3480644019042625019]]], /10.X.X.X:7000 (see log for details)
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 ValidationManager.java:173 - Validation failed.
> java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
> at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
> at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
> at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
> at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
> at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
> at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
>  
>  
>  
> Logs from 10.X.X.X:7000:
>  
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,724 RepairMessageVerbHandler.java:212 - Got error, removing parent repair session
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,725 CassandraDaemon.java:581 - Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:215)
> at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
> Caused by: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.previewKind(RepairMessageVerbHandler.java:55)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
> ... 10 common frames omitted
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
