Re: Errors when decommissioning - 0.7 RC1

Nick Bailey Wed, 15 Dec 2010 06:35:29 -0800

Just realized the ring output is included in the logs for both of those
nodes.  Disregard my earlier request :).
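
For anyone comparing the ring as seen from two different nodes, here is a
minimal sketch of one way to diff two saved ring reports. It is not part of
nodetool. The line layout (address, status, state, load, owns, token on one
line) is an assumption taken from the output pasted in this thread, and the
names parse_ring and diff_views are hypothetical.

```python
import re

# Assumed line layout, based only on the ring output pasted in this thread:
#   address  status  state  load unit  owns%  token
RING_LINE = re.compile(
    r"^(?P<address>\d+\.\d+\.\d+\.\d+)\s+(?P<status>Up|Down)\s+"
    r"(?P<state>\w+)\s+(?P<load>[\d.]+\s+\w+)\s+(?P<owns>[\d.]+%)\s+"
    r"(?P<token>\d+)\s*$"
)


def parse_ring(text):
    """Return {address: (status, state)} for each line that looks like a ring entry."""
    view = {}
    for line in text.splitlines():
        match = RING_LINE.match(line.strip())
        if match:
            view[match["address"]] = (match["status"], match["state"])
    return view


def diff_views(view_a, view_b):
    """Return {address: (entry_in_a, entry_in_b)} for nodes the two views disagree on."""
    addresses = sorted(set(view_a) | set(view_b))
    return {
        addr: (view_a.get(addr), view_b.get(addr))
        for addr in addresses
        if view_a.get(addr) != view_b.get(addr)
    }
```

Feeding it the report from a healthy node and the report from the suspect
node lists only the addresses whose status or state differ between the two.
A node missing from one view shows up with None on that side.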


On Wed, Dec 15, 2010 at 8:27 AM, Nick Bailey <n...@riptano.com> wrote:

> This is rc2 I am assuming?
>
> One thing about remove: the removetoken force command is meant to be run on
> the node that originally started a remove, and it doesn't take a token
> parameter.  Not relevant to your problem, though.
>
> Is this a test cluster and have you tried to reproduce the error? I would
> be interested to know what the ring command looks like on both *.19 and *.17
> after the decommission is run.  I assume you were running the ring command
> on another node?  I'll look into the logs more and see if anything jumps
> out.
>
>
> On Wed, Dec 15, 2010 at 6:37 AM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>
>> I am seeing very strange things when trying to decommission a node in my
>> cluster (detailed logs attached). Here is a nodetool ring report **after**
>> decommissioning of node 192.168.4.19  (as seen by any other, properly
>> functioning node).
>>
>>
>>
>> 192.168.4.15    Up     Normal  49.9 GB         25.00%   42535295865117307932921825928971026431
>> 192.168.4.20    Up     Normal  42.56 GB        8.33%    56713727820156410577229101238628035242
>> 192.168.4.16    Up     Normal  29.17 GB        16.67%   85070591730234615865843651857942052863
>> 192.168.4.19    Down   Leaving 54.11 GB        16.67%   113427455640312821154458202477256070484
>> 192.168.4.17    Down   Normal  48.88 GB        8.33%    127605887595351923798765477786913079295
>> 192.168.4.18    Up     Normal  59.44 GB        25.00%   170141183460469231731687303715884105726
>> 192.168.4.12    Up     Normal  52.3 GB         0.00%    170141183460469231731687303715884105727
>>
>>
>>
>>
>>
>> What I am seeing is that after nodetool decommission completes on
>> 192.168.4.19, the next node in the ring (192.168.4.17) ‘dies’ (see attached
>> log, its nodetool ring report is quite different). By ‘dies’ I mean that it
>> stops communicating with other nodes (but the Cassandra process is still
>> running and, among other things, compaction continues). After restarting
>> Cassandra on 192.168.4.17, the ring state gets stuck and the decommissioned
>> node (192.168.4.19) does not get removed (at least from the nodetool ring
>> report):
>>
>>
>>
>> 192.168.4.15    Up     Normal  49.9 GB         25.00%   42535295865117307932921825928971026431
>> 192.168.4.20    Up     Normal  42.56 GB        8.33%    56713727820156410577229101238628035242
>> 192.168.4.16    Up     Normal  29.17 GB        16.67%   85070591730234615865843651857942052863
>> 192.168.4.19    Down   Leaving 54.11 GB        16.67%   113427455640312821154458202477256070484
>> 192.168.4.17    Up     Normal  69.12 GB        8.33%    127605887595351923798765477786913079295
>> 192.168.4.18    Up     Normal  58.88 GB        25.00%   170141183460469231731687303715884105726
>> 192.168.4.12    Up     Normal  52.3 GB         0.00%    170141183460469231731687303715884105727
>>
>>
>>
>>
>>
>> Furthermore, when I try running “nodetool removetoken
>> 113427455640312821154458202477256070484”, I get:
>>
>>
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: Node /192.168.4.19 is already being removed.
>>         at org.apache.cassandra.service.StorageService.removeToken(StorageService.java:1731)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>
>>
>>
>>
>>
>> And when I try running “nodetool removetoken force
>> 113427455640312821154458202477256070484”, I get:
>>
>>
>>
>> RemovalStatus: No token removals in process.
>>
>> Exception in thread "main" java.lang.NullPointerException
>>         at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:1703)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>
>>
>>
>> ?!?!?!?
>>
>>
>>
>> I think I have seen this type of behaviour once or twice before (I believe
>> in 0.7 b1 or later) but wrote it off as being caused by my misguided
>> tinkering and/or other Cassandra bugs. This time around, I have done very
>> little with JMX/CLI/nodetool and I can find no related Cassandra bugs.
>>
>>
>>
>> Help/suggestions?
>>
>>
>>
>> Dan Hendry
>>
>> (403) 660-2297
>>
>>
>>
>
>
