On 8 June 2012 at 20:02, Samuel CARRIERE wrote:
> I'm on the train, but just a guess: maybe it's hinted handoff. A look in the
> logs of the new nodes could confirm that: look for the IP of an old node and
> maybe you'll find hinted handoff related messages.

I grepped the logs of every node for every old node, and I found nothing since the "crash".
If it can be of some help, here are some grepped log lines from the crash:

system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,241 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,894 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,895 StorageService.java (line 1157) Removing token 127605887595351923798765477786913079296 for /10.10.0.24
system.log.1: INFO [GossipStage:1] 2012-05-09 04:26:25,015 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.

Maybe it's the way I removed the nodes? AFAIR I didn't use the decommission command. For each node, I took the node down and then issued a remove token command.
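
For anyone who wants to repeat this kind of search across several log files, here is a minimal Python sketch of the "look for the IP of an old node" check. It assumes the log layout seen in the lines quoted in this thread (level, thread, timestamp, source file, message, with an optional "system.log.1:" prefix added by grep). The names LOG_LINE, events_for and summarize are hypothetical helpers written for this example and are not part of Cassandra or nodetool.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the log lines quoted in this thread (an assumption).
LOG_LINE = re.compile(
    r"^(?:(?P<file>\S+):\s+)?"            # optional "system.log.1:" prefix from grep
    r"(?P<level>[A-Z]+)\s+"               # WARN, INFO, ...
    r"\[(?P<thread>.+?)\]\s+"             # e.g. [GossipStage:1]
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+\.java) \(line \d+\)\s+"
    r"(?P<message>.*)$"
)

SAMPLE = [
    "system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,241 "
    "StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not "
    "receive data for re-replication of /10.10.0.22",
    "system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 "
    "Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.",
    "system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 "
    "HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24",
]


def events_for(lines, ip):
    """Return (timestamp, source, message) for lines whose message mentions ip."""
    # Guard both sides so that 10.10.0.2 does not match inside 10.10.0.24.
    ip_pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and ip_pattern.search(match.group("message")):
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group("source"), match.group("message")))
    return events


def summarize(events):
    """Count events per source file and report the most recent timestamp."""
    counts = Counter(source for _, source, _ in events)
    last_seen = max((ts for ts, _, _ in events), default=None)
    return counts, last_seen


if __name__ == "__main__":
    counts, last_seen = summarize(events_for(SAMPLE, "10.10.0.24"))
    print("last mention:", last_seen)
    for source, count in counts.most_common():
        print(source, count)
```

Only the message part is searched, so an IP that appears solely in the thread name (such as the 10.10.0.26 in the RMI thread above) is not counted as a mention. Whether that is the right choice depends on what you are looking for. The last_seen value makes it easy to check whether anything about an old node was logged after the crash.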

Here is what I can find in the log about when I removed one of them:

system.log.1: INFO [GossipTasks:1] 2012-05-02 17:21:10,281 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:21:21,496 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [GossipStage:1] 2012-05-02 17:21:59,307 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:31:20,336 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:41:06,177 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:51:18,148 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:00:31,709 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:11:02,521 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:20:38,282 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:31:09,513 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:40:31,565 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:51:10,566 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:00:32,197 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:11:17,018 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:21:21,759 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
system.log.1: INFO [OptionalTasks:1] 2012-05-02 20:05:57,281 HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 StorageService.java (line 1157) Removing token 145835300108973619103103718265651724288 for /10.10.0.24

Nicolas

> ----- Original message -----
> From: Nicolas Lalevée [nicolas.lale...@hibnet.org]
> Sent: 08/06/2012 19:26 ZE2
> To: user@cassandra.apache.org
> Subject: Re: Dead node still being pinged
>
> On 8 June 2012 at 15:17, Samuel CARRIERE wrote:
>
>> What does nodetool ring say? (Ask every node.)
>
> Currently, each new node sees only the tokens of the new nodes.
>
>> Have you checked that the list of seeds in every yaml is correct?
>
> Yes, it is correct: every one of my new nodes points to the first of my new nodes.
>
>> What version of cassandra are you using?
>
> Sorry, I should have written this in my first mail.
> I use 1.0.9.
>
> Nicolas
>
>>
>> Samuel
>>
>> Nicolas Lalevée <nicolas.lale...@hibnet.org>
>> 08/06/2012 14:10
>> Please reply to
>> user@cassandra.apache.org
>>
>> To
>> user@cassandra.apache.org
>> cc
>> Subject
>> Dead node still being pinged
>>
>> I had a configuration where I had 4 nodes, data-1,4. We then bought 3 bigger
>> machines, data-5,7. And we moved all data from data-1,4 to data-5,7.
>> To move all the data without interruption of service, I added one new node
>> at a time. And then I removed the old machines one by one via a "remove
>> token".
>>
>> Everything was working fine until there was an unexpected load on our
>> cluster: the machines started to swap and became unresponsive. We fixed the
>> unexpected load and the three new machines were restarted. After that, the
>> new cassandra machines were stating that some old tokens were not assigned,
>> namely those from data-2 and data-4. To fix this, I issued some "remove
>> token" commands again.
>>
>> Everything seems to be back to normal, but on the network I still see some
>> packets from the new cluster to the old machines, on port 7000.
>> How can I tell cassandra to completely forget about the old machines?
>>
>> Nicolas