Hi Aaron,

Thanks for your reply. Unfortunately, we still haven't solved this.
> How did you start the bootstrap for the .18 node?

The standard way: we set "AutoBootstrap" to true and added all the servers from the working ring as seeds.

> Was it the .18 or the .17 node you tried to add?

We first tried adding .17. It streamed for a while, took on 50 GB of load and stopped streaming, but then never joined the ring. We left it for a few days to see if it would come in, but no luck. After that we ran the decommission and removeToken operations (in that order).

Since we couldn't get .17 in, we tried again with .18. Before doing so we increased RpcTimeoutInMillis from 1000 to 10000, having read that too low a timeout can stop nodes from joining the ring. It has been running since Friday and, like .17, it still won't join the ring.

> Does it have a token in the config or did you use nodetool move to set it?

No, we didn't set the token manually in the config; we were relying on the token being assigned during bootstrap from the RandomPartitioner.

Again, thanks for the help.
Dimitry

On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Dimitry, did you get anywhere with this?
>
> Was it the .18 or the .17 node you tried to add? How did you start the
> bootstrap for the .18 node? Does it have a token in the config or did you
> use nodetool move to set it?
>
> I had a quick look at the code. AFAIK the message about removing the fat
> client is logged when the node does not have a record of the token the
> other node has.
>
> Aaron
>
> On 26 Oct, 2010, at 10:42 PM, Dimitry Lvovsky <dimi...@reviewpro.com>
> wrote:
>
> Hi All,
> We recently upgraded from .65 to .66, after which we tried adding a new
> node to our cluster. We left it bootstrapping, and after 3 days it still
> refused to join the ring. The strange thing is that nodetool info shows
> 50 GB of load and nodetool ring shows that it sees the rest of the ring,
> which it is not part of.
> We tried the process again with another server, with the same result:
>
> // from machine 192.168.2.18
>
> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
> 131373516047318302934572185119435768941
> Load             : 52.85 GB
> Generation No    : 1287761987
> Uptime (seconds) : 323157
> Heap Memory (MB) : 795.42 / 1945.63
>
> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
> Address       Status  Load      Range                                      Ring
>                                 158573510920250391466717289405976537674
> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202     |<--|
> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055     |   |
> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065    |   |
> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674    |-->|
>
> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
> Mode: Bootstrapping
> Not sending any streams.
> Not receiving any streams.
>
> What's more, in the log of one of the nodes I see gossip messages about
> 192.168.2.17, the first node we tried to add to the cluster, even though
> it was not running at the time of the log message:
>
> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>
> Thanks in advance for the help,
> Dimitry
>
> --
> Dimitry Lvovsky
> Director of Engineering
> ReviewPro
> www.reviewpro.com
> +34 616 337 103
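One quick check is to place the bootstrapping node's token on the ring that nodetool printed, to see whose range it was meant to take over. The Python below is a minimal sketch, not a Cassandra API: `owner_of`, `RING`, `LOAD_GB` and `NEW_TOKEN` are hypothetical names, the token and load values are copied from the nodetool output in this thread, and it assumes each node owns the range (predecessor's token, own token], with the lowest-token node's range wrapping past the highest token.

```python
# Hypothetical helper, not part of Cassandra: find the ring range that holds
# a given RandomPartitioner token. Values are copied from `nodetool ring` and
# `nodetool info` in this thread.

# Assumption: each node owns (predecessor's token, own token]; the node with
# the lowest token also owns everything above the highest token (wraparound).
RING = {
    "192.168.2.22": 28203205416427384773583427414698832202,
    "192.168.2.23": 60562227403709245514637766500430120055,
    "192.168.2.20": 104160057322065544623939416372654814065,
    "192.168.2.21": 158573510920250391466717289405976537674,
}

# Load per node in GB, from the `nodetool ring` output.
LOAD_GB = {
    "192.168.2.22": 59.45,
    "192.168.2.23": 44.95,
    "192.168.2.20": 47.15,
    "192.168.2.21": 61.04,
}

# Token reported by `nodetool info` on the bootstrapping node (.18).
NEW_TOKEN = 131373516047318302934572185119435768941


def owner_of(token, ring):
    """Return (address, start, end) for the range (start, end] holding token."""
    if not ring:
        raise ValueError("empty ring")
    nodes = sorted(ring.items(), key=lambda item: item[1])
    for i, (address, end) in enumerate(nodes):
        start = nodes[i - 1][1]  # for i == 0 this is the highest token
        if i == 0:
            # Wrapping range: above the highest token, or up to the lowest.
            if token > start or token <= end:
                return address, start, end
        elif start < token <= end:
            return address, start, end
    raise ValueError("token not covered by ring")  # unreachable if ring is non-empty


address, start, end = owner_of(NEW_TOKEN, RING)
print(address)                        # 192.168.2.21
print(max(LOAD_GB, key=LOAD_GB.get))  # 192.168.2.21
print((start + end) // 2)             # midpoint of that range, for comparison
```

Both lines point at .21: the new token lies in .21's range, and .21 is the most heavily loaded node (61.04 GB). That is consistent with, though it does not prove, the auto-assigned token having been chosen to take load off the busiest node; my understanding is that 0.6 bootstrap picks tokens this way, but that is an assumption worth checking against the bootstrap code in the Cassandra source.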