On 20/09/2013, at 10:46 AM, Lists <li...@benjamindsmith.com> wrote:

> On 09/19/2013 04:50 PM, Andrew Beekhof wrote:
>> From this we can infer that corosync has gotten horribly confused and, as a
>> consequence, pacemaker can't talk to its peers anymore.
>>
>>> this is a test cluster and not being monitored by a netmon. Any other
>>> details I could provide that would be useful/helpful?
>>
>> Shortly before this, Corosync claims:
>>
>> Sep 19 00:47:07 corosync [TOTEM ] A processor joined or left the membership
>> and a new membership was formed.
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com crmd: info:
>> pcmk_cpg_membership: Left[2.0] crmd.1
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com crmd: info:
>> crm_update_peer_proc: pcmk_cpg_membership: Node
>> bender.schoolpathways.com[1] - corosync-cpg is now offline
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com crmd: info:
>> peer_update_callback: Client bender.schoolpathways.com/peer now has status
>> [offline] (DC=true)
>>
>> Is this true?
>> If not, perhaps some timeouts need to be adjusted. A switch to udpu
>> (instead of multicast) may also be helpful.
>
> Although the times you specifically mention were probably due to
> intentionally created failures, later, similar messages would have been
> clearly outside the range of time where I was testing. I've updated
> corosync.conf to use udpu from an example config and continue testing.
>
> What timeout values might be useful to consider?

Try 'man corosync.conf' and look for 'milliseconds' :)

> These two machines are next to each other, on the same gigabit switch and no
> packet loss has ever been detected. Truth is that I'm unsure what would be
> waiting.

It's quite possibly an algorithm issue. We've seen a few like this in the past.
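
For reference, a rough sketch of the sort of totem section this refers to. It assumes corosync 1.4-style syntax, where udpu members are listed inside the interface block (corosync 2.x uses a nodelist section instead). The addresses are placeholders, not taken from this cluster, and the timeout values are only there to show where the knobs live, not recommendations. Per the man page, token defaults to 1000 ms and consensus defaults to 1.2 * token when unset.

```
totem {
        version: 2

        # Unicast UDP instead of multicast
        transport: udpu

        # Token timeout in milliseconds (default 1000).
        # Illustrative value only.
        token: 5000

        # Consensus timeout in milliseconds; must be at least 1.2 * token.
        # Left unset, it is derived from token automatically.
        consensus: 6000

        interface {
                ringnumber: 0
                # Placeholder network and member addresses
                bindnetaddr: 192.168.1.0
                mcastport: 5405
                member {
                        memberaddr: 192.168.1.1
                }
                member {
                        memberaddr: 192.168.1.2
                }
        }
}
```

The same file has to be present on both nodes, and corosync needs a restart to pick up transport changes.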
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org