Please help. I keep getting repair failures that look like the one
below, and I don't know how to proceed.
The text below comes from the system log. The cluster was fully up
before and after the attempted repair, so I don't quite understand why
Cassandra declared a node dead. Was it a timeout? How do I fix that?
Thanks,
Maxim
INFO [GossipStage:1] 2011-12-02 17:12:07,293 Gossiper.java (line 683) InetAddress /130.199.185.194 is now UP
ERROR [AntiEntropySessions:1] 2011-12-02 17:12:07,354 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[AntiEntropySessions:1,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-618fad49-387f-44df-a25e-aa57b314768a, endpoint /130.199.185.194 died
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Problem during repair session manual-repair-618fad49-387f-44df-a25e-aa57b314768a, endpoint /130.199.185.194 died
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:712)
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:749)
        at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:155)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:527)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
        at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        ... 3 more
INFO [AntiEntropyStage:1] 2011-12-02 17:12:07,392 AntiEntropyService.java (line 215) Sending AEService tree for #<TreeRequest manual-repair-c721c217-4b70-4a15-91fc-374b39b8b053, cassandra03.usatlas.bnl.gov/130.199.185.195, (PANDA,files), (56713727820156410577229101238628035242,113427455640312821154458202477256070484]>