Nodes marked dead…. leap second?

Filippo Diotalevi Mon, 02 Jul 2012 02:35:45 -0700

Hi,  
we had some really weird issues over the weekend: our Cassandra nodes started 
marking other (working) nodes in the cluster as dead. That happened all 
Sunday, and it's still happening. Nodes are marked dead and then up again all 
the time.


Some example logs:

INFO [GossipTasks:1] 2012-07-02 06:55:01,804 Gossiper.java (line 818) 
InetAddress /xx.xx.xx.233 is now dead.
INFO [GossipTasks:1] 2012-07-02 06:55:01,805 Gossiper.java (line 818) 
InetAddress /xx.xx.xx.235 is now dead.
INFO [GossipStage:1] 2012-07-02 06:55:21,748 Gossiper.java (line 804) 
InetAddress /xx.xx.xx.233 is now UP
INFO [GossipStage:1] 2012-07-02 06:55:21,893 Gossiper.java (line 804) 
InetAddress /xx.xx.xx.235 is now UP
INFO [GossipTasks:1] 2012-07-02 06:56:03,877 Gossiper.java (line 818) 
InetAddress /xx.xx.xx.235 is now dead.
INFO [GossipTasks:1] 2012-07-02 06:57:58,537 Gossiper.java (line 818) 
InetAddress /xx.xx.xx.233 is now dead.
INFO [GossipStage:1] 2012-07-02 06:59:06,444 Gossiper.java (line 804) 
InetAddress /xx.xx.xx.233 is now UP
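
One way to quantify the flapping is to count the dead/UP transitions per node. 
Below is a minimal Python sketch, assuming the Gossiper message format shown 
in the lines above; the sample text is copied from those lines, and 
count_transitions is a hypothetical helper name, not part of Cassandra.

```python
import re
from collections import Counter

# Sample copied from the log excerpt above (line wrapping does not matter,
# because the pattern only looks at the "InetAddress ... is now ..." part).
SAMPLE = """\
INFO [GossipTasks:1] 2012-07-02 06:55:01,804 Gossiper.java (line 818)
InetAddress /xx.xx.xx.233 is now dead.
INFO [GossipTasks:1] 2012-07-02 06:55:01,805 Gossiper.java (line 818)
InetAddress /xx.xx.xx.235 is now dead.
INFO [GossipStage:1] 2012-07-02 06:55:21,748 Gossiper.java (line 804)
InetAddress /xx.xx.xx.233 is now UP
INFO [GossipStage:1] 2012-07-02 06:55:21,893 Gossiper.java (line 804)
InetAddress /xx.xx.xx.235 is now UP
INFO [GossipTasks:1] 2012-07-02 06:56:03,877 Gossiper.java (line 818)
InetAddress /xx.xx.xx.235 is now dead.
INFO [GossipTasks:1] 2012-07-02 06:57:58,537 Gossiper.java (line 818)
InetAddress /xx.xx.xx.233 is now dead.
INFO [GossipStage:1] 2012-07-02 06:59:06,444 Gossiper.java (line 804)
InetAddress /xx.xx.xx.233 is now UP
"""

# Captures the (masked) address and the reported state.
PATTERN = re.compile(r"InetAddress /(\S+) is now (dead|UP)")


def count_transitions(text):
    """Count how often each address was reported dead or UP."""
    counts = Counter()
    for addr, state in PATTERN.findall(text):
        counts[(addr, state)] += 1
    return counts


counts = count_transitions(SAMPLE)
for (addr, state), n in sorted(counts.items()):
    print(addr, state, n)
```

To run it against a real log, the text read from the log file could be passed 
to count_transitions instead of SAMPLE.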



I couldn't find any real exception in the logs, but I noticed that the first 
error occurred at  
 INFO [GossipTasks:1] 2012-07-01 02:00:31,169 Gossiper.java (line 818) 
InetAddress /xx.xx.xx.234 is now dead.

2012-07-01 02:00:31,169, in the German time zone (CEST, UTC+2) where the 
machine is hosted, is 00:00:31 UTC on July 1st, about half a minute after 
June 30th 23:59:60 UTC, the leap second that caused quite a few issues this 
weekend.  
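
The time zone arithmetic can be double-checked with a short Python sketch. It 
assumes the log timestamps are local time at UTC+2 (CEST, the German summer 
offset); the variable names are only for illustration. Python's datetime 
cannot represent 23:59:60, so the comparison uses the instant right after the 
leap second, 2012-07-01 00:00:00 UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the node logs in German local time, which is UTC+2 in July.
CEST = timezone(timedelta(hours=2), "CEST")

# Timestamp of the first "is now dead" message, as logged.
first_dead_local = datetime(2012, 7, 1, 2, 0, 31, 169000, tzinfo=CEST)
first_dead_utc = first_dead_local.astimezone(timezone.utc)

# The instant immediately following the leap second (23:59:60 UTC on June 30).
after_leap = datetime(2012, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# How long after the leap second the first node was marked dead.
delay_seconds = (first_dead_utc - after_leap).total_seconds()

print(first_dead_utc.isoformat())
print(delay_seconds)
```

Under that assumption, the first node was marked dead roughly 31 seconds after 
the leap second.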

Can it be the cause of the cluster failure? Has anybody noticed similar issues? 
( also see https://twitter.com/redditstatus/status/219244389044731904 )

I'm running Ubuntu 10.04.3 LTS.  

Many thanks,
--  
Filippo Diotalevi
