Hello Alain,

I solved this with a brute-force solution, but I didn't understand exactly what happened behind the scenes. What I did was:
a) Removed the failed node from the ring with the unsafeAssassinate JMX operation.

b) This caused requests for that node's range to be routed to the following node, which didn't have the data. To fix that, I inserted a new dummy node with the same token as the failed node, but with auto_bootstrap set to false.

c) After the dummy node joined the ring, I did a clean shutdown with:

   nodetool -h localhost disablethrift
   nodetool -h localhost disablegossip && sleep 10
   nodetool -h localhost drain

d) Restarted the bootstrap process on the new node.

Our cluster was not using vnodes, though, so this workaround will probably not work in your case, since you cannot specify the 256 tokens from the old node.

This really seems like some kind of metadata inconsistency in gossip, so you should probably check whether nodetool gossipinfo shows a node that's not supposed to be in the ring, and unsafeAssassinate it. This post has more info about it:

http://nartax.com/2012/09/assassinate-cassandra-node/

Make sure you know what you're doing, though, as this can be a dangerous operation.

Good luck!

Cheers,

Paulo

On Fri, Feb 14, 2014 at 11:17 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Paulo,
>
> Did you find out how to fix this issue ? I am experimenting the exact same
> issue after trying to help you on this exact subject a few days ago :).
>
> Config : 32 C*1.2.11 nodes, Vnodes enabled, RF=3, 1 DC, On AWS EC2
> m1.xlarge.
>
> We added a few nodes (4) and it seems that this occurs on one node out of
> two...
>
> INFO 12:52:16,889 Finished streaming session d5e4d014-9558-11e3-950d-cd6aba92807e from /xxx.xxx.xxx.xxx
> java.lang.RuntimeException: Unable to fetch range
> [(20078703525355016727168231761171377180,20105424945623564908585534414693308183],
> (129753652951782325468767616123724624016,129754698153613057562227134647005586420],
> (4499106157406300244131405400767888838,4524540663392564361402125588359485564],
> (122461441134035840782923349842361962551,122462803389597917496737056756119104930],
> (107970238065835199457922160357012606207,107987706615224138615506976884972465320],
> (129754698153613057562227134647005586420,129760990520285412763184172827801136526],
> (38338043252657275110873170917842646549,38368318768493907804399955985800320618],
> (42022774431506526693485667522039962965,42053289032932587102300879230918436885],
> (66836265760288088017242608238099612345,66844191330959602627129212011239690831],
> (52540232739182066369547232798226785314,52559117354438503565212218200939569114],
> (145046787539667961591986998676504957238,145057153206926436867917708334845130444],
> (108279691586280658015556401795266720050,108305470056478513440634738885678702409],
> (40039571254531814244837067525035822613,40053379084508254942645157728035688263],
> (132027653159543236812527609067336099062,132029648290617316887203744857701890860],
> (52516518106546460227349801041398186304,52540232739182066369547232798226785314],
> (151797253868519929321029931533765036527,151828244658375264200603444399788004805],
> (145057153206926436867917708334845130444,145084033851007428646660791831082771964],
> (107963567982152736714636832273817259428,107970238065835199457922160357012606207]]
> for keyspace foo_bar from any hosts
>
>     at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:260)
>     at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
>     at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:973)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:740)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
>     at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
>
> Cannot load daemon
>
> Service exit with a return value of 3
>
> Hope you'll be able to help me on this one :)
>
>
> 2014-02-07 19:24 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Fri, Feb 7, 2014 at 4:41 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> From changelog :
>>>
>>> 1.2.15
>>> * Move handling of migration event source to solve bootstrap race
>>> (CASSANDRA-6648)
>>>
>>> Maybe should you give this new version a try, if you suspect your issue to
>>> be related to CASSANDRA-6648.
>>
>> 6648 appears to have been introduced in 1.2.14, by :
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6615
>>
>> So it should only affect 1.2.14.
>>
>> =Rob
>>

-- 
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
+55 83 9690-1314