Hi,

we had some really weird issues during the weekend, with our Cassandra nodes starting to mark other (working) nodes in the cluster as dead. It happened all through Sunday, and it's still happening: nodes are marked dead and then UP again all the time…
Some example logs:

    INFO [GossipTasks:1] 2012-07-02 06:55:01,804 Gossiper.java (line 818) InetAddress /xx.xx.xx.233 is now dead.
    INFO [GossipTasks:1] 2012-07-02 06:55:01,805 Gossiper.java (line 818) InetAddress /xx.xx.xx.235 is now dead.
    INFO [GossipStage:1] 2012-07-02 06:55:21,748 Gossiper.java (line 804) InetAddress /xx.xx.xx.233 is now UP
    INFO [GossipStage:1] 2012-07-02 06:55:21,893 Gossiper.java (line 804) InetAddress /xx.xx.xx.235 is now UP
    INFO [GossipTasks:1] 2012-07-02 06:56:03,877 Gossiper.java (line 818) InetAddress /xx.xx.xx.235 is now dead.
    INFO [GossipTasks:1] 2012-07-02 06:57:58,537 Gossiper.java (line 818) InetAddress /xx.xx.xx.233 is now dead.
    INFO [GossipStage:1] 2012-07-02 06:59:06,444 Gossiper.java (line 804) InetAddress /xx.xx.xx.233 is now UP

I couldn't find any real exception in the logs, but I noticed that the first error occurred at

    INFO [GossipTasks:1] 2012-07-01 02:00:31,169 Gossiper.java (line 818) InetAddress /xx.xx.xx.234 is now dead.

2012-07-01 02:00:31,169 in the German timezone where the machine is hosted (CEST, UTC+2) is 00:00:31 UTC on July 1st, only about half a minute after June 30th 23:59:60 UTC, the leap second that caused quite a few issues this weekend. Could it be the cause of the cluster failure? Has anybody noticed similar issues?
(also see https://twitter.com/redditstatus/status/219244389044731904)

I'm running Ubuntu 10.04.3 LTS.

Many thanks,
--
Filippo Diotalevi
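For anyone who wants to double-check the timing: a minimal Python sketch that converts the timestamp of the first "is now dead" message to UTC. It assumes the Cassandra log timestamps are written in German summer time (CEST, a fixed UTC+2 offset); the variable names are only illustrative. Python's datetime cannot represent second 60 and ignores leap seconds, so the comparison is made against 00:00:00 UTC, the instant right after the leap second.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine logs in CEST, i.e. a fixed UTC+2 offset in July.
CEST = timezone(timedelta(hours=2), "CEST")

# Timestamp of the first "is now dead" message (02:00:31,169 local time).
first_dead_local = datetime(2012, 7, 1, 2, 0, 31, 169000, tzinfo=CEST)
first_dead_utc = first_dead_local.astimezone(timezone.utc)

# The leap second was 2012-06-30 23:59:60 UTC; datetime cannot hold second=60,
# so use the instant right after it, 2012-07-01 00:00:00 UTC.
end_of_leap_second = datetime(2012, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# Seconds between the end of the leap second and the first "dead" message.
gap_seconds = (first_dead_utc - end_of_leap_second).total_seconds()

print(first_dead_utc.isoformat())  # 2012-07-01T00:00:31.169000+00:00
print(gap_seconds)                 # 31.169
```

So the first node was marked dead roughly 31 seconds after the leap second ended.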