On 20/09/2013, at 10:46 AM, Lists <li...@benjamindsmith.com> wrote:

> On 09/19/2013 04:50 PM, Andrew Beekhof wrote:
>> From this we can infer that corosync has gotten horribly confused and, as a 
>> consequence, pacemaker can't talk to its peers anymore.
>> 
>>> This is a test cluster and is not being monitored by a netmon. Are there
>>> any other details I could provide that would be helpful?
>> Shortly before this, Corosync claims:
>> 
>> Sep 19 00:47:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: pcmk_cpg_membership:   Left[2.0] crmd.1
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: crm_update_peer_proc:  pcmk_cpg_membership: Node bender.schoolpathways.com[1] - corosync-cpg is now offline
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: peer_update_callback:  Client bender.schoolpathways.com/peer now has status [offline] (DC=true)
>> 
>> Is this true? Did bender really go offline at that point?
>> If not, perhaps some timeouts need to be adjusted. A switch to udpu
>> (UDP unicast, instead of multicast) may also be helpful.
> 
> Although the times you specifically mention were probably due to
> failures I created intentionally, later messages of the same kind appeared
> well outside the periods when I was testing. I've updated corosync.conf
> to use udpu, based on an example config, and am continuing to test.
> 
> What timeout values might be useful to consider?

Try 'man corosync.conf' and look for 'milliseconds' :)
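As a small illustration of how two of those millisecond settings relate: corosync.conf(5) in the 1.x/2.x releases states that `consensus` must be at least 1.2 × `token`, and that it is calculated as 1.2 × `token` when left unset (worth confirming against the man page for the installed version). The helper names below are hypothetical, the rounding up is this sketch's own choice, and the numbers are examples rather than recommendations.

```python
from typing import Optional


def default_consensus_ms(token_ms: int) -> int:
    """Smallest whole-millisecond consensus value satisfying consensus >= 1.2 * token.

    Uses integer arithmetic (ceil(token * 6 / 5)) to avoid float rounding;
    hypothetical helper, not part of corosync.
    """
    return (token_ms * 6 + 4) // 5


def check_totem_timeouts(token_ms: int, consensus_ms: Optional[int] = None) -> int:
    """Return the effective consensus timeout, or raise if it is below 1.2 * token."""
    minimum = default_consensus_ms(token_ms)
    if consensus_ms is None:
        # Unset: corosync derives consensus from token (per the man page rule above).
        return minimum
    if consensus_ms < minimum:
        raise ValueError(
            f"consensus {consensus_ms} ms is below 1.2 * token ({minimum} ms)"
        )
    return consensus_ms


print(check_totem_timeouts(5000))        # derived consensus for a 5000 ms token
print(check_totem_timeouts(5000, 7500))  # explicit value that satisfies the rule
```

Raising `token` on a quiet, low-latency LAN therefore drags `consensus` up with it, which lengthens how long membership changes take to settle.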

> These two machines are next to each other on the same gigabit switch, and no
> packet loss has ever been detected. Truth is, I'm unsure what it would be
> waiting for.

It's quite possibly an algorithm issue. We've seen a few like this in the past.
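To check whether membership changes like the TOTEM message quoted above happen outside the periods of deliberate failure testing, a short log scan can list their timestamps and drop those inside known test windows. This is a minimal sketch, assuming syslog-style lines in the form shown in this thread (one entry per line, no year in the timestamp, English month names); the function names and the default year are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Sep 19 00:47:07 corosync [TOTEM ] A processor joined or left the membership ..."
MEMBERSHIP_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) corosync \[TOTEM \] "
    r"A processor joined or left the membership"
)


def membership_changes(lines, year=2013):
    """Return datetimes of TOTEM membership-change messages.

    Syslog timestamps carry no year, so the caller supplies one (assumption).
    """
    events = []
    for line in lines:
        m = MEMBERSHIP_RE.match(line)
        if m:
            ts = " ".join(m.group("ts").split())  # collapse padding such as "Sep  9"
            events.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return events


def outside_windows(events, windows):
    """Keep only events that fall outside every (start, end) test window."""
    return [e for e in events if not any(start <= e <= end for start, end in windows)]
```

Any events left after filtering are membership changes that no deliberate test explains, which is the case worth raising on the list.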


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
