Russell,
The hinted handoff manager is checking for hints to see if it needs to pass 
them off during the decommission, so that the hints don't get lost. You most 
likely have a lot of hints, a lot of tombstones, or something else in the hints 
table that is causing the query to time out. You aren't seeing any other 
exceptions in your logs before the timeout, are you? Raising the read timeout 
on your nodes before you decommission them, or manually deleting the hints CF, 
should get you past this. If you delete the hints, you would then want to run 
a full cluster repair when you are done with all of your decommissions, to 
propagate the data from any hints you deleted.
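To make the timeout mechanism concrete, here is a toy Java model (not Cassandra code): the hint scan has to touch every cell in the hints table, live hints and tombstones alike, and it fails if that takes longer than the timeout. The class name, the per-cell cost, and all of the counts are hypothetical values chosen only for illustration.

```java
// Toy model (hypothetical, NOT Cassandra code) of a hint scan bounded by a timeout.
// Every cell the scan touches, live hint or tombstone, adds to the elapsed time.
final class HintScanModel {
    /** Thrown when the simulated scan would run past its timeout. */
    static final class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(long elapsedMs, long timeoutMs) {
            super("simulated scan needs " + elapsedMs + " ms, timeout is " + timeoutMs + " ms");
        }
    }

    /**
     * Returns the number of live hints found, or throws if reading every cell
     * (live hints plus tombstones) at costPerCellMicros would exceed timeoutMs.
     * The per-cell cost is an assumed, illustrative number.
     */
    static int scanHints(int liveHints, int tombstones, long costPerCellMicros, long timeoutMs) {
        long cellsScanned = (long) liveHints + tombstones;
        long elapsedMs = cellsScanned * costPerCellMicros / 1_000;
        if (elapsedMs > timeoutMs) {
            throw new ScanTimeoutException(elapsedMs, timeoutMs);
        }
        return liveHints;
    }

    public static void main(String[] args) {
        long costPerCellMicros = 10;  // hypothetical cost of reading one cell
        int liveHints = 50_000;       // hypothetical backlog of undelivered hints
        int tombstones = 2_000_000;   // hypothetical tombstones from already-delivered hints

        try {
            scanHints(liveHints, tombstones, costPerCellMicros, 10_000);
        } catch (ScanTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // Workaround 1: a longer timeout lets the same scan finish.
        System.out.println("found " + scanHints(liveHints, tombstones, costPerCellMicros, 30_000) + " hints");

        // Workaround 2: an emptied hints table leaves nothing to scan
        // (the data those hints carried then has to be recovered by repair).
        System.out.println("found " + scanHints(0, 0, costPerCellMicros, 10_000) + " hints");
    }
}
```

With these made-up numbers the first scan times out, while either a longer timeout or an emptied hints table lets it complete, which mirrors the two workarounds described above.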

-Jeremiah Jordan

On Apr 10, 2014, at 1:08 PM, Russell Bradberry <rbradbe...@gmail.com> wrote:

> We have a cluster of about 30 nodes running the latest DSE release in the 
> C* 1.2 series. One datacenter uses VNodes and the other datacenter has VNodes 
> disabled (because it is running DSE Search).
> 
> We have been replacing nodes in the VNode datacenter with faster ones, and we 
> have yet to have a successful decommission. Every time we attempt to 
> decommission a node, we get an “Operation Timed Out” error and the 
> decommission fails. We keep retrying; sometimes it works, and other times we 
> just give up and force the node removal. It seems, though, that all the data 
> has streamed out of the node before the decommission fails.
> 
> What exactly does it need to read before leaving that would cause this? We 
> have also noticed that, on several nodes, ghost entries for the removed node 
> remain in the system.peers table after the removal, and they don’t go away 
> until we restart Cassandra on the affected node.
> 
> Also, we have noticed that repairs run considerably slower with VNodes. Is 
> this a misconfiguration, or is it expected that repairs with VNodes will be 
> slow?
> 
> 
> Here is the stack trace from the decommission failure:
> 
> Exception in thread "main" java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>         at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:578)
>         at org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(HintedHandOffManager.java:528)
>         at org.apache.cassandra.service.StorageService.streamHints(StorageService.java:2925)
>         at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2905)
>         at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2866)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>         at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
>         at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
>         at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
>         at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>         at sun.rmi.transport.Transport$1.run(Transport.java:177)
>         at sun.rmi.transport.Transport$1.run(Transport.java:174)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>         at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
>         at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1213)
>         at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:573)
>         ... 39 more
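The "received only 0 responses" message in the Caused-by section comes from a read that blocks until enough responses arrive or a deadline passes. The sketch below is a hypothetical, simplified model of that wait-with-deadline pattern built on a CountDownLatch; it is not the actual ReadCallback implementation, and the class, method, and exception names are invented for illustration. Since the hints live in a node-local system table, the awaited response is presumably the node's own read, so 0 responses suggests the local hint scan itself did not finish in time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a read that waits for replica responses.
// This is NOT Cassandra's ReadCallback; names and behavior are illustrative only.
final class ResponseWaiter {
    /** Signals that fewer responses than required arrived before the deadline. */
    static final class SimulatedReadTimeout extends RuntimeException {
        SimulatedReadTimeout(int received) {
            super("Operation timed out - received only " + received + " responses.");
        }
    }

    private final CountDownLatch latch;                    // counts down once per response
    private final AtomicInteger received = new AtomicInteger();

    ResponseWaiter(int blockFor) {
        this.latch = new CountDownLatch(blockFor);         // responses needed to succeed
    }

    /** Called, e.g. from a network thread, whenever a replica answers. */
    void onResponse() {
        received.incrementAndGet();
        latch.countDown();
    }

    /** Waits up to timeoutMs for the required responses; throws if they do not all arrive. */
    int get(long timeoutMs) {
        boolean completed;
        try {
            completed = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for responses", e);
        }
        if (!completed) {
            throw new SimulatedReadTimeout(received.get());
        }
        return received.get();
    }

    public static void main(String[] args) {
        // A replica answers promptly, so get() returns normally.
        ResponseWaiter answered = new ResponseWaiter(1);
        new Thread(answered::onResponse).start();
        System.out.println("received " + answered.get(1_000) + " response(s)");

        // Nobody answers within 100 ms (think: a local scan still grinding
        // through hints and tombstones), so get() throws the timeout.
        ResponseWaiter silent = new ResponseWaiter(1);
        try {
            silent.get(100);
        } catch (SimulatedReadTimeout e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, raising the timeout only helps if the awaited read eventually finishes; it does nothing to make the underlying scan cheaper.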
