At some point the gossip system on the node this log is from decided that 
130.199.185.195 was DOWN. This was based on how often the node was gossiping to 
the cluster. 

The active repair session was informed. And to avoid failing the job 
unnecessarily it tested that the errant nodes phi value was twice the 
configured phi_convict_threshold. It was and the repair was killed. 

Take a look at the logs on 130.199.185.195 and see if anything was happening on 
the node at the same time. Could  be GC or an overloaded node (it would log 
about dropped messages). 

Perhaps other nodes also saw 130.199.185.195 as down? it only needed to be down 
for a few seconds. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/04/2012, at 2:25 PM, Maxim Potekhin wrote:

> Server log below. Mind you that all the nodes are still up -- even though 
> reported as "dead" in this log.
> What's going on here?
> 
> Thanks!
> 
>  INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java (line 719) 
> InetAddress /130.199.185.193 is now dead.
>  INFO [ScheduledTasks:1] 2012-04-18 22:18:26,487 StatusLogger.java (line 50) 
> Pool Name                    Active   Pending   Blocked
> ERROR [GossipTasks:1] 2012-04-18 22:18:26,488 AntiEntropyService.java (line 
> 722) Problem during repair session manual-repair-1b3453b
> 6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
> ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,488 
> StorageService.java (line 1607) Repair session org.apache.cas
> sandra.service.AntiEntropyService$RepairSession@4cc9e2bc failed.
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.io.IOException: Problem during repair session manual-repai
> r-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at 
> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1603)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>         at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>         at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>         at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>         at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>         at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: java.io.IOException: Problem during 
> repair session manual-repair-43545b22-ffe8-4243-8a98-509b
> bfec9872, endpoint /130.199.185.195 died
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         ... 3 more
> Caused by: java.io.IOException: Problem during repair session 
> manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.
> 185.195 died
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>         at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>         at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>         at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         ... 3 more
>  INFO [FlushWriter:4] 2012-04-18 22:18:26,489 Memtable.java (line 237) 
> Writing Memtable-files.pandaid@988533693(1754325/140346000 se
> rialized/live bytes, 70173 ops)
> ERROR [AntiEntropySessions:8] 2012-04-18 22:18:26,488 AntiEntropyService.java 
> (line 691) Repair session manual-repair-1b3453b6-28b5-
> 4abd-84ce-0326b5468064 (on cfs [Ljava.lang.String;@5a44cc06, range 
> (113427455640312821154458202477256070484,0]) failed with the foll
> owing error
> java.io.IOException: Problem during repair session 
> manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 
> die
> d
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>         at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>         at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>         at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>  INFO [ScheduledTasks:1] 2012-04-18 22:18:26,488 StatusLogger.java (line 65) 
> ReadStage                         0         0         0
> ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,491 
> StorageService.java (line 1607) Repair session org.apache.cas
> sandra.service.AntiEntropyService$RepairSession@41bb342c failed.
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.io.IOException: Problem during repair session manual-repai
> r-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at 
> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1603)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>         at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>         at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>         at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>         at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>         at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>         at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: java.io.IOException: Problem during 
> repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326
> b5468064, endpoint /130.199.185.193 died
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         ... 3 more
> Caused by: java.io.IOException: Problem during repair session 
> manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.
> 185.193 died
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>         at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>         at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>         at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>         at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> 
> 
> 
> On 4/12/2012 10:03 PM, aaron morton wrote:
>> 
>> Look at the server side logs for errors. 
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote:
>> 
>>> Hello,
>>> 
>>> I'm doing compactions under 0.8.8.
>>> 
>>> Recently, I started seeing a stack trace like one below, and I can't figure 
>>> out what causes this to appear.
>>> The cluster has been in operation for mode than half a year w/o errors like 
>>> this one.
>>> 
>>> Any help will be appreciated,
>>> Thanks
>>> 
>>> Maxim
>>> 
>>> 
>>> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
>>> Read timed out
>>> Exception in thread "main" java.io.IOException: Repair command #1: some 
>>> repair session(s) failed (see log for details).
>>>        at 
>>> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>        at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at 
>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>        at 
>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>        at 
>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>        at 
>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>        at 
>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>        at 
>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>        at 
>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>        at 
>>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>        at 
>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>        at 
>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>        at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at 
>>> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>>>        at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>        at 
>>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>        at 
>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>        at 
>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:662)
>>> 
>> 
> 

Reply via email to