Hello Alain,

I solved this with a brute-force workaround, but I never understood exactly
what was happening behind the scenes. What I did was:

a) Removed the failed node from the ring with the unsafeAssassinate JMX
operation.
b) This caused requests for that node to be routed to the following node,
which didn't have the data. To fix that, I inserted a new dummy node with
the same token as the failed node, but with "autobootstrap=false".
c) After the node joined the ring again, I did a clean shutdown with:
nodetool -h localhost disablethrift
nodetool -h localhost disablegossip && sleep 10
nodetool -h localhost drain
d) Restarted the bootstrap process on the new node.
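
For reference, the clean shutdown in step (c) can be wrapped in a small
function that stops at the first failing command. This is only a sketch:
the NODETOOL, HOST and GOSSIP_WAIT variables and the clean_shutdown name
are made up for illustration, and it assumes nodetool is on the PATH.

```shell
#!/bin/sh
# Sketch of the clean shutdown sequence from step (c).
# NODETOOL, HOST and GOSSIP_WAIT are illustrative variables, not Cassandra settings.
NODETOOL="${NODETOOL:-nodetool}"
HOST="${HOST:-localhost}"

clean_shutdown() {
    # Stop serving Thrift client requests.
    "$NODETOOL" -h "$HOST" disablethrift || return 1
    # Stop gossiping, then wait so the other nodes notice.
    "$NODETOOL" -h "$HOST" disablegossip || return 1
    sleep "${GOSSIP_WAIT:-10}"
    # Flush memtables and stop accepting writes.
    "$NODETOOL" -h "$HOST" drain || return 1
}
```

Calling clean_shutdown runs the three commands in order and returns
non-zero as soon as one of them fails, so the drain is never attempted on
a node where the earlier steps did not go through.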

Note that our cluster was not using VNodes. This workaround will probably
not work with VNodes, since you cannot specify the 256 tokens from the old
node.

This really seems like some kind of metadata inconsistency in gossip, so you
should probably check whether your nodetool gossipinfo shows a node that's
not supposed to be in the ring and, if so, unsafeAssassinate it. This post
has more info about it: http://nartax.com/2012/09/assassinate-cassandra-node/

But make sure you know what you're doing, as this can be a dangerous
operation.
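
As a rough sketch of the gossipinfo check, the function below lists gossip
endpoints that are missing from a list of nodes you expect to be in the
ring. The function name and the expected_nodes.txt file are hypothetical,
and it assumes each endpoint entry in the gossipinfo output starts with a
line like "/10.0.0.1", which may differ between versions.

```shell
# Print gossip endpoints that are not in the list of expected node addresses.
# Assumption: each endpoint entry in gossipinfo begins with a line "/<address>".
# Reads gossipinfo text on stdin; $1 is a file with one expected address per line.
unexpected_gossip_endpoints() {
    # Keep only the "/<address>" header lines, strip the slash, de-duplicate,
    # then drop every address that exactly matches a line in the expected file.
    sed -n 's|^/\([^ ]*\).*$|\1|p' | sort -u | grep -v -x -F -f "$1"
}
```

It would be used as:
nodetool -h localhost gossipinfo | unexpected_gossip_endpoints expected_nodes.txt
Anything it prints is only a candidate for a closer look, not something to
assassinate automatically.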

Good luck!

Cheers,

Paulo




On Fri, Feb 14, 2014 at 11:17 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Paulo,
>
> Did you find out how to fix this issue? I am experiencing the exact same
> issue after trying to help you on this exact subject a few days ago :).
>
> Config: 32 C* 1.2.11 nodes, vnodes enabled, RF=3, 1 DC, on AWS EC2
> m1.xlarge.
>
> We added a few nodes (4) and it seems that this occurs on one node out of
> two...
>
> INFO 12:52:16,889 Finished streaming session d5e4d014-9558-11e3-950d-cd6aba92807e from /xxx.xxx.xxx.xxx
> java.lang.RuntimeException: Unable to fetch range
> [(20078703525355016727168231761171377180,20105424945623564908585534414693308183],
> (129753652951782325468767616123724624016,129754698153613057562227134647005586420],
> (4499106157406300244131405400767888838,4524540663392564361402125588359485564],
> (122461441134035840782923349842361962551,122462803389597917496737056756119104930],
> (107970238065835199457922160357012606207,107987706615224138615506976884972465320],
> (129754698153613057562227134647005586420,129760990520285412763184172827801136526],
> (38338043252657275110873170917842646549,38368318768493907804399955985800320618],
> (42022774431506526693485667522039962965,42053289032932587102300879230918436885],
> (66836265760288088017242608238099612345,66844191330959602627129212011239690831],
> (52540232739182066369547232798226785314,52559117354438503565212218200939569114],
> (145046787539667961591986998676504957238,145057153206926436867917708334845130444],
> (108279691586280658015556401795266720050,108305470056478513440634738885678702409],
> (40039571254531814244837067525035822613,40053379084508254942645157728035688263],
> (132027653159543236812527609067336099062,132029648290617316887203744857701890860],
> (52516518106546460227349801041398186304,52540232739182066369547232798226785314],
> (151797253868519929321029931533765036527,151828244658375264200603444399788004805],
> (145057153206926436867917708334845130444,145084033851007428646660791831082771964],
> (107963567982152736714636832273817259428,107970238065835199457922160357012606207]]
> for keyspace foo_bar from any hosts
>
> at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:260)
> at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
> at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:973)
> at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:740)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
> at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
>
> Cannot load daemon
>
> Service exit with a return value of 3
>
> Hope you'll be able to help me on this one :)
>
>
> 2014-02-07 19:24 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>
> On Fri, Feb 7, 2014 at 4:41 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> From changelog :
>>>
>>>
>>>
>>> 1.2.15
>>>  * Move handling of migration event source to solve bootstrap race 
>>> (CASSANDRA-6648)
>>>
>>> Maybe you should give this new version a try, if you suspect your issue
>>> is related to CASSANDRA-6648.
>>>
>>> 6648 appears to have been introduced in 1.2.14, by :
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6615
>>
>> So it should only affect 1.2.14.
>>
>> =Rob
>>
>>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
+55 83 9690-1314
