Thank you for the reply. Yes, we have checked the Windows event logs and Task Scheduler, but haven't seen anything unusual. In addition, we deployed one more cluster on the same machines without any data, and the problem did not reproduce there at the same time. It's very strange.
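To compare the Friday window against a quiet period, a small script along these lines could tally the failure-detector output per node. This is only a rough sketch: it assumes one log entry per line in the same format as the snippets quoted below, and `summarize`, `PHI_RE` and `DEAD_RE` are hypothetical names made up for illustration, not part of Cassandra.

```python
import re
from collections import defaultdict

# Patterns assume the entry layout seen in the quoted log snippets.
PHI_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ FailureDetector\.java "
    r"\(line \d+\) PHI for /(\S+) : (-?[\d.]+(?:[eE][-+]?\d+)?)"
)
DEAD_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ Gossiper\.java "
    r"\(line \d+\) InetAddress /(\S+) is now dead"
)


def summarize(lines):
    """Return (highest PHI seen per node, timestamps of 'now dead' per node)."""
    max_phi = defaultdict(float)
    dead = defaultdict(list)
    for line in lines:
        m = PHI_RE.search(line)
        if m:
            _ts, node, phi = m.groups()
            max_phi[node] = max(max_phi[node], float(phi))
            continue
        m = DEAD_RE.search(line)
        if m:
            ts, node = m.groups()
            dead[node].append(ts)
    return dict(max_phi), dict(dead)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python phi_summary.py system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        phi, dead = summarize(fh)
    for node in sorted(phi):
        print(node, "max PHI", phi[node], "dead events", len(dead.get(node, [])))
```

Running it once over the Friday log and once over a log from another day would show whether the PHI spike hits every peer at the same second or only some of them.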
On Mon, Dec 5, 2011 at 7:45 PM, Riyad Kalla <rka...@gmail.com> wrote:
> Konstantin,
>
> Have you checked the weekly cron job list on the servers or looked at the
> system logs at those rough times to see what the servers are doing? I doubt
> Cassandra has any time-sensitive code in it to kill off connections at
> 14:50pm, so my guess is something on the host causing the problem.
>
> -R
>
> On Mon, Dec 5, 2011 at 6:08 AM, Konstantin Chernyakov <kossof...@gmail.com> wrote:
>
>> Hi.
>>
>> We are faced with strange problem where Cassandra nodes lose each other
>> only one day of week, on friday, in exactly 14:50 PM, within several months.
>>
>> On that time each node periodically reports that other nodes are dead.
>>
>> At same time nodes are working fine.
>>
>> This continues about one hour, after that cluster stabilizes.
>>
>> Low CPU load.
>>
>> There are several snippets of log file from one node:
>>
>> TRACE [GossipTasks:1] 2011-12-02 15:12:51,829 FailureDetector.java (line 149) PHI for /192.168.68.228 : 38.154333610365036
>> INFO [GossipTasks:1] 2011-12-02 15:12:51,829 Gossiper.java (line 229) InetAddress /192.168.68.228 is now dead.
>>
>> ...
>>
>> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java (line 819) forceFlush requested but everything is clean
>> INFO [ScheduledTasks:1] 2011-12-02 15:12:51,829 StatusLogger.java (line 66) ReadRepairStage 0 0 0
>> TRACE [GossipTasks:1] 2011-12-02 15:12:51,829 FailureDetector.java (line 149) PHI for /192.168.68.227 : -0.0
>> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java (line 819) forceFlush requested but everything is clean
>> TRACE [GossipStage:1] 2011-12-02 15:12:51,845 FailureDetector.java (line 128) reporting /192.168.68.229
>> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java (line 819) forceFlush requested but everything is clean
>> TRACE [GossipTasks:1] 2011-12-02 15:12:51,845 FailureDetector.java (line 149) PHI for /192.168.68.224 : 0.019569070233147485
>> INFO [ScheduledTasks:1] 2011-12-02 15:12:51,845 StatusLogger.java (line 66) MutationStage 0 0 0
>> TRACE [GossipTasks:1] 2011-12-02 15:12:51,845 FailureDetector.java (line 149) PHI for /192.168.68.226 : 37.966339304199074
>> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java (line 819) forceFlush requested but everything is clean
>> TRACE [GossipStage:1] 2011-12-02 15:12:51,845 FailureDetector.java (line 128) reporting /192.168.68.228
>> DEBUG [NonPeriodicTasks:1] 2011-12-02 15:12:51,845 ColumnFamilyStore.java (line 819) forceFlush requested but everything is clean
>> INFO [GossipTasks:1] 2011-12-02 15:12:51,845 Gossiper.java (line 229) InetAddress /192.168.68.226 is now dead.
>>
>> ...
>>
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line 149) PHI for /192.168.68.228 : 7.7043961801903045
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line 149) PHI for /192.168.68.223 : 7.585990557120916
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.227 : 7.922553972766636
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.224 : 7.798568512691048
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.226 : 7.8425064901177715
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.225 : 4.592224429445155
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,900 FailureDetector.java (line 149) PHI for /192.168.68.222 : 8.06856164053645
>> INFO [GossipTasks:1] 2011-12-02 15:13:03,900 Gossiper.java (line 229) InetAddress /192.168.68.222 is now dead.
>> DEBUG [GossipTasks:1] 2011-12-02 15:13:03,900 MessagingService.java (line 153) Resetting pool for /192.168.68.222
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line 149) PHI for /192.168.68.229 : 7.645354417332889
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line 149) PHI for /192.168.68.230 : 7.775610031554557
>>
>> ...
>>
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line 149) PHI for /192.168.68.228 : 7.7043961801903045
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,898 FailureDetector.java (line 149) PHI for /192.168.68.223 : 7.585990557120916
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.227 : 7.922553972766636
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.224 : 7.798568512691048
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.226 : 7.8425064901177715
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,899 FailureDetector.java (line 149) PHI for /192.168.68.225 : 4.592224429445155
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,900 FailureDetector.java (line 149) PHI for /192.168.68.222 : 8.06856164053645
>> INFO [GossipTasks:1] 2011-12-02 15:13:03,900 Gossiper.java (line 229) InetAddress /192.168.68.222 is now dead.
>> DEBUG [GossipTasks:1] 2011-12-02 15:13:03,900 MessagingService.java (line 153) Resetting pool for /192.168.68.222
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line 149) PHI for /192.168.68.229 : 7.645354417332889
>> TRACE [GossipTasks:1] 2011-12-02 15:13:03,901 FailureDetector.java (line 149) PHI for /192.168.68.230 : 7.775610031554557
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 307) Gossip Digests are : /192.168.68.221:1322136327:682506 /192.168.68.223:1322116132:702923 /192.168.68.222:1322116089:702938 /192.168.68.228:1322116156:702981 /192.168.68.225:1322817130:31 /192.168.68.230:1322116110:702870 /192.168.68.226:1322116095:702557 /192.168.68.221:1322136327:682506 /192.168.68.224:1322116106:702922 /192.168.68.227:1322116098:702974 /192.168.68.229:1322116107:702950
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 360) Sending a GossipDigestSynMessage to /192.168.68.224 ...
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 360) Sending a GossipDigestSynMessage to /192.168.68.228 ...
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,903 Gossiper.java (line 101) Performing status check ...
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,904 FailureDetector.java (line 149) PHI for /192.168.68.228 : 8.350335221549706
>> TRACE [GossipTasks:1] 2011-12-02 15:13:04,904 FailureDetector.java (line 149) PHI for /192.168.68.223 : 8.222055442973863
>> INFO [GossipTasks:1] 2011-12-02 15:13:04,904 Gossiper.java (line 229) InetAddress /192.168.68.223 is now dead.
>>
>> The same picture on other nodes.
>>
>> Cassandra version 7.8.
>> OS Windows server 2008R2.
>> Cluster size 10 nodes.
>> Replication factor 5.
>>
>> Best regards,
>> Konstantin Chernyakov.