> I don't see errors in the logs, but I do see
> a lot of dropped mutations and reads. Any correlation?

Yes. The dropped messages mean the server is overloaded.
Look for log messages from the GCInspector in /var/log/cassandra/system.log, and/or check for an overloaded IO system; see http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/10/2012, at 1:27 PM, Jason Hill <jasonhill...@gmail.com> wrote:

> thanks for the replies.
>
> I'll check the load on the node that is reported as DOWN/UP. At first
> glance it does not appear to be overloaded. But, I will dig in deeper,
> is there a specific indicator on an ubuntu server that would be useful
> to me?
>
> Also, I didn't make it clear, but in my original post, there are logs
> from 2 different nodes: 10.21 and 10.25. They are each reporting that
> the other is DOWN/UP at the same time. Would that still point me to
> the suggestions you made? I don't see errors in the logs, but I do see
> a lot of dropped mutations and reads. Any correlation?
>
> thanks again,
> Jason
>
> On Tue, Oct 23, 2012 at 12:49 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>> check 10.50.10.21 for what is the system load.
>>
>> +1
>>
>> And take a look in the logs on 10.21.
>>
>> 10.21 is being seen as down by the other nodes. It could be:
>>
>> * 10.21 failing to gossip fast enough, say by being overloaded or in long
>> ParNew GC pauses.
>> * This node failing to process gossip fast enough, say by being overloaded
>> or in long ParNew GC pauses.
>> * Problems with the tubes used to connect the nodes.
>>
>> (It's probably the first one.)
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 23/10/2012, at 8:19 PM, Jason Wee <peich...@gmail.com> wrote:
>>
>> check 10.50.10.21 for what is the system load.
>>
>> On Tue, Oct 23, 2012 at 10:41 AM, Jason Hill <jasonhill...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm on version 1.0.11.
>>>
>>> I'm seeing this in my system log with occasional frequency:
>>>
>>> INFO [GossipTasks:1] 2012-10-23 02:26:34,449 Gossiper.java (line 818)
>>> InetAddress /10.50.10.21 is now dead.
>>> INFO [GossipStage:1] 2012-10-23 02:26:34,620 Gossiper.java (line 804)
>>> InetAddress /10.50.10.21 is now UP
>>>
>>>
>>> INFO [StreamStage:1] 2012-10-23 02:24:38,763 StreamOutSession.java
>>> (line 228) Streaming to /10.50.10.25 <--this line included for context
>>> INFO [GossipTasks:1] 2012-10-23 02:26:30,603 Gossiper.java (line 818)
>>> InetAddress /10.50.10.25 is now dead.
>>> INFO [GossipStage:1] 2012-10-23 02:26:40,763 Gossiper.java (line 804)
>>> InetAddress /10.50.10.25 is now UP
>>> INFO [AntiEntropyStage:1] 2012-10-23 02:27:30,249
>>> AntiEntropyService.java (line 233) [repair
>>> #5a3383c0-1cb5-11e2-0000-56b66459adef] Sending completed merkle tree
>>> to /10.50.10.25 for (Innovari,TICCompressedLoad) <--this line included
>>> for context
>>>
>>> What is this telling me? Is my network dropping for less than a
>>> second? Are my nodes really dead and then up? Can someone shed some
>>> light on this for me?
>>>
>>> cheers,
>>> Jason
>>
>>
>>
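
As a rough illustration of the log check suggested at the top of this message, the sketch below scans system.log lines for Gossiper dead/UP pairs in the format quoted in this thread and reports how long each node was marked down. It also counts lines that mention GCInspector. The function name summarize and the regex are illustrative assumptions, not part of Cassandra, and the GCInspector match is a plain substring check because the exact layout of those messages is not shown in this thread.

```python
import re
from datetime import datetime

# Matches Gossiper lines in the format quoted in this thread, e.g.
# INFO [GossipTasks:1] 2012-10-23 02:26:34,449 Gossiper.java (line 818) InetAddress /10.50.10.21 is now dead.
GOSSIP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) Gossiper\.java \(line \d+\)\s+"
    r"InetAddress /(\S+) is now (dead|UP)"
)


def summarize(lines):
    """Return (flaps, gc_lines).

    flaps is a list of (address, seconds_marked_dead) tuples, one per
    dead -> UP pair; gc_lines is the number of lines mentioning GCInspector.
    """
    down_since = {}  # address -> timestamp of the latest "is now dead"
    flaps = []
    gc_lines = 0
    for line in lines:
        if "GCInspector" in line:  # substring check only; format not assumed
            gc_lines += 1
        m = GOSSIP_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        addr, state = m.group(2), m.group(3)
        if state == "dead":
            down_since[addr] = ts
        elif addr in down_since:
            flaps.append((addr, (ts - down_since.pop(addr)).total_seconds()))
    return flaps, gc_lines


# Example use (path taken from the advice above):
# with open("/var/log/cassandra/system.log") as f:
#     flaps, gc_lines = summarize(f)
```

Applied to the Gossiper lines quoted above, joined back into single lines, this gives roughly 0.17 s for /10.50.10.21 and roughly 10.2 s for /10.50.10.25. A large GCInspector count around the same timestamps would fit the overload explanation, but that correlation still has to be checked by eye.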