Just realized the ring output is included in the logs for both of those nodes. Disregard my earlier request :).
On Wed, Dec 15, 2010 at 8:27 AM, Nick Bailey <n...@riptano.com> wrote: > This is rc2 I am assuming? > > One thing about remove, the removetoken force command is meant to be run on > the node that originally started a remove and doesn't take a token > parameter. Not relevant to you problem though. > > Is this a test cluster and have you tried to reproduce the error? I would > be interested to know what the ring command looks like on both *.19 and *.17 > after the decommission is run. I assume you were running the ring command > on another node? I'll look into the logs more and see if anything jumps > out. > > > On Wed, Dec 15, 2010 at 6:37 AM, Dan Hendry <dan.hendry.j...@gmail.com>wrote: > >> I am seeing very strange things when trying to decommission a node in my >> cluster (detailed logs attached). Here is a nodetool ring report **after** >> decommissioning of node 192.168.4.19 (as seen by any other, properly >> functioning node). >> >> >> >> 192.168.4.15 Up Normal 49.9 GB 25.00% >> 42535295865117307932921825928971026431 >> >> 192.168.4.20 Up Normal 42.56 GB 8.33% >> 56713727820156410577229101238628035242 >> >> 192.168.4.16 Up Normal 29.17 GB 16.67% >> 85070591730234615865843651857942052863 >> >> 192.168.4.19 Down Leaving 54.11 GB 16.67% >> 113427455640312821154458202477256070484 >> >> 192.168.4.17 Down Normal 48.88 GB 8.33% >> 127605887595351923798765477786913079295 >> >> 192.168.4.18 Up Normal 59.44 GB 25.00% >> 170141183460469231731687303715884105726 >> >> 192.168.4.12 Up Normal 52.3 GB 0.00% >> 170141183460469231731687303715884105727 >> >> >> >> >> >> What I am seeing is that after nodetool decommission completes on >> 192.168.4.19, the next node in the ring (192.168.4.17) ‘dies’ (see attached >> log, its nodetool ring report is quite different). By ‘dies’ I mean that it >> stops communicating with other nodes (but the Cassandra process is still >> running and, among other things, compaction continues). After restarting >> Cassandra on 192.168.4.17, the ring state gets stuck and the decommissioned >> node (192.168.4.19) does not get removed (at least from the nodetool ring >> report): >> >> >> >> 192.168.4.15 Up Normal 49.9 GB 25.00% >> 42535295865117307932921825928971026431 >> >> 192.168.4.20 Up Normal 42.56 GB 8.33% >> 56713727820156410577229101238628035242 >> >> 192.168.4.16 Up Normal 29.17 GB 16.67% >> 85070591730234615865843651857942052863 >> >> 192.168.4.19 Down Leaving 54.11 GB 16.67% >> 113427455640312821154458202477256070484 >> >> 192.168.4.17 Up Normal 69.12 GB 8.33% >> 127605887595351923798765477786913079295 >> >> 192.168.4.18 Up Normal 58.88 GB 25.00% >> 170141183460469231731687303715884105726 >> >> 192.168.4.12 Up Normal 52.3 GB 0.00% >> 170141183460469231731687303715884105727 >> >> >> >> >> >> Furthermore, when I try running “nodetool removetoken >> 113427455640312821154458202477256070484”, I get: >> >> >> >> Exception in thread "main" java.lang.UnsupportedOperationException: Node / >> 192.168.4.19 is already being removed. >> >> at >> org.apache.cassandra.service.StorageService.removeToken(StorageService.java:1731) >> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >> Method) >> >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> >> at java.lang.reflect.Method.invoke(Method.java:597) >> >> at >> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) >> >> >> >> >> >> And when I try running “nodetool removetoken force >> 113427455640312821154458202477256070484”, I get: >> >> >> >> RemovalStatus: No token removals in process. >> >> Exception in thread "main" java.lang.NullPointerException >> >> at >> org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:1703) >> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >> Method) >> >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> >> at java.lang.reflect.Method.invoke(Method.java:597) >> >> at >> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) >> >> >> >> ?!?!?!? >> >> >> >> I think have seen this type of behaviour once or twice before (I believe >> 0.7 b1 or later) but wrote it off as being caused by my misguided tinkering >> and/or other Cassandra bugs. This time around, I have done very little with >> JMX/CLI/nodetool and I can find no related Cassandra bugs. >> >> >> >> Help/suggestions? >> >> >> >> Dan Hendry >> >> (403) 660-2297 >> >> >> > >