Konstantin,

Have you checked the weekly cron job list on the servers, or looked at the
system logs around those times, to see what the servers are doing? I doubt
Cassandra has any time-sensitive code in it that kills off connections at
14:50, so my guess is that something on the host is causing the problem.
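
To line the Cassandra side up against the host logs, it may help to pull the
PHI readings out of the Cassandra log first. Below is a minimal sketch, not a
tested tool: it assumes each log entry sits on a single line in the real file
(the wrapping in the quoted snippets comes from the mail client), and the
function name `phi_spikes` and the threshold of 8.0 are my own assumptions,
the latter picked because the quoted log marks nodes dead just above that
value.

```python
import re
from collections import defaultdict

# Matches FailureDetector trace lines such as:
# TRACE [GossipTasks:1] 2011-12-02 15:12:51,829 FailureDetector.java (line 149) PHI for /192.168.68.228 : 38.15
PHI_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ FailureDetector\.java "
    r"\(line \d+\) PHI for /(?P<ip>[\d.]+) : (?P<phi>-?[\d.]+(?:[eE]-?\d+)?)"
)


def phi_spikes(lines, threshold=8.0):
    """Return {ip: [(timestamp, phi), ...]} for PHI readings above threshold.

    The threshold default is an assumption; adjust it to match the cluster.
    """
    spikes = defaultdict(list)
    for line in lines:
        match = PHI_LINE.search(line)
        if match is None:
            continue  # not a PHI trace line
        phi = float(match.group("phi"))
        if phi > threshold:
            spikes[match.group("ip")].append((match.group("ts"), phi))
    return dict(spikes)
```

Feeding it the lines of the log file (for example `phi_spikes(open(path))`,
with `path` pointing at wherever the node writes its log) gives per-node
timestamps that can be compared with whatever the host was doing at the time.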

-R

On Mon, Dec 5, 2011 at 6:08 AM, Konstantin Chernyakov
<kossof...@gmail.com> wrote:

> Hi.
>
> We are facing a strange problem: Cassandra nodes lose each other on only
> one day of the week, Friday, at exactly 14:50. This has been happening for
> several months.
>
> At that time, each node periodically reports that other nodes are dead.
>
> At the same time, the nodes are working fine.
>
> This continues for about an hour, after which the cluster stabilizes.
>
> CPU load is low.
>
>
>
> Here are several snippets of the log file from one node:
>
>
>
> TRACE [GossipTasks:1] 2011-12-02 15:12:51,829 FailureDetector.java (line
> 149) PHI for /192.168.68.228 : 38.154333610365036
>
> INFO [GossipTasks:1] 2011-12-02 15:12:51,829 Gossiper.java (line 229)
> InetAddress /192.168.68.228 is now dead.
>
>
>
> ...
>
>
>
> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java
> (line 819) forceFlush requested but everything is clean
>
> INFO [ScheduledTasks:1] 2011-12-02 15:12:51,829 StatusLogger.java (line
> 66) ReadRepairStage                   0         0         0
>
> TRACE [GossipTasks:1] 2011-12-02 15:12:51,829 FailureDetector.java (line
> 149) PHI for /192.168.68.227 : -0.0
>
> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java
> (line 819) forceFlush requested but everything is clean
>
> TRACE [GossipStage:1] 2011-12-02 15:12:51,845 FailureDetector.java (line
> 128) reporting /192.168.68.229
>
> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java
> (line 819) forceFlush requested but everything is clean
>
> TRACE [GossipTasks:1] 2011-12-02 15:12:51,845 FailureDetector.java (line
> 149) PHI for /192.168.68.224 : 0.019569070233147485
>
> INFO [ScheduledTasks:1] 2011-12-02 15:12:51,845 StatusLogger.java (line
> 66) MutationStage                     0         0         0
>
> TRACE [GossipTasks:1] 2011-12-02 15:12:51,845 FailureDetector.java (line
> 149) PHI for /192.168.68.226 : 37.966339304199074
>
> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java
> (line 819) forceFlush requested but everything is clean
>
> TRACE [GossipStage:1] 2011-12-02 15:12:51,845 FailureDetector.java (line
> 128) reporting /192.168.68.228
>
> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java
> (line 819) forceFlush requested but everything is clean
>
> INFO [GossipTasks:1] 2011-12-02 15:12:51,845 Gossiper.java (line 229)
> InetAddress /192.168.68.226 is now dead.
>
>
>
> ...
>
>
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line
> 149) PHI for /192.168.68.228 : 7.7043961801903045
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line
> 149) PHI for /192.168.68.223 : 7.585990557120916
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line
> 149) PHI for /192.168.68.227 : 7.922553972766636
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line
> 149) PHI for /192.168.68.224 : 7.798568512691048
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line
> 149) PHI for /192.168.68.226 : 7.8425064901177715
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line
> 149) PHI for /192.168.68.225 : 4.592224429445155
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,900 FailureDetector.java (line
> 149) PHI for /192.168.68.222 : 8.06856164053645
>
> INFO [GossipTasks:1] 2011-12-02 15:13:03,900 Gossiper.java (line 229)
> InetAddress /192.168.68.222 is now dead.
>
> DEBUG [GossipTasks:1] 2011-12-02 15:13:03,900 MessagingService.java (line
> 153) Resetting pool for /192.168.68.222
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line
> 149) PHI for /192.168.68.229 : 7.645354417332889
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line
> 149) PHI for /192.168.68.230 : 7.775610031554557
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 307)
> Gossip Digests are : /192.168.68.221:1322136327:682506
> /192.168.68.223:1322116132:702923 /192.168.68.222:1322116089:702938
> /192.168.68.228:1322116156:702981 /192.168.68.225:1322817130:31
> /192.168.68.230:1322116110:702870 /192.168.68.226:1322116095:702557
> /192.168.68.221:1322136327:682506 /192.168.68.224:1322116106:702922
> /192.168.68.227:1322116098:702974 /192.168.68.229:1322116107:702950
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 360)
> Sending a GossipDigestSynMessage to /192.168.68.224 ...
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 360)
> Sending a GossipDigestSynMessage to /192.168.68.228 ...
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 101)
> Performing status check ...
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,904 FailureDetector.java (line
> 149) PHI for /192.168.68.228 : 8.350335221549706
>
> TRACE [GossipTasks:1] 2011-12-02 15:13:04,904 FailureDetector.java (line
> 149) PHI for /192.168.68.223 : 8.222055442973863
>
> INFO [GossipTasks:1] 2011-12-02 15:13:04,904 Gossiper.java (line 229)
> InetAddress /192.168.68.223 is now dead.
>
>
>
> The picture is the same on the other nodes.
>
>
>
> Cassandra version 0.7.8.
>
> OS: Windows Server 2008 R2.
>
> Cluster size 10 nodes.
>
> Replication factor 5.
>
>
>
> Best regards,
>
> Konstantin Chernyakov.
>
>
