On 06.03.2013, at 05:14, Andrew Beekhof <and...@beekhof.net> wrote:

> On Tue, Mar 5, 2013 at 4:20 AM, Leon Fauster <leonfaus...@googlemail.com> wrote:
>>
>> So far all good. I am running a stress test now and noticed that after
>> rebooting one node (n2), that node (n2) is marked as standby in the CIB
>> (as shown on the other node (n1)).
>>
>> After the reboot, crm_mon on n2 shows the other node (n1) as offline and
>> begins to start the resources, while n1, which was not rebooted, still
>> shows n2 as standby. At that point both nodes are running the "same"
>> resources. After a couple of minutes this situation is detected and both
>> nodes renegotiate the current state; one node then takes over
>> responsibility for providing the resources. On both nodes the previously
>> rebooted node is still listed as standby.
>>
>>
>> cat /var/log/messages | grep error
>> Mar  4 17:32:33 cn1 pengine[1378]:    error: native_create_actions: Resource resIP (ocf::IPaddr2) is active on 2 nodes attempting recovery
>> Mar  4 17:32:33 cn1 pengine[1378]:    error: native_create_actions: Resource resApache (ocf::apache) is active on 2 nodes attempting recovery
>> Mar  4 17:32:33 cn1 pengine[1378]:    error: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-error-6.bz2
>> Mar  4 17:32:48 cn1 crmd[1379]:   notice: run_graph: Transition 1 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-6.bz2): Complete
>>
>>
>> crm_mon -1
>> Last updated: Mon Mar  4 17:49:08 2013
>> Last change: Mon Mar  4 10:22:53 2013 via crm_resource on cn1.localdomain
>> Stack: cman
>> Current DC: cn1.localdomain - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 2 Nodes configured, 2 expected votes
>> 2 Resources configured.
>>
>> Node cn2.localdomain: standby
>> Online: [ cn1.localdomain ]
>>
>> resIP      (ocf::heartbeat:IPaddr2):  Started cn1.localdomain
>> resApache  (ocf::heartbeat:apache):   Started cn1.localdomain
>>
>>
>> I checked the init scripts and found that the standby "behavior" comes
>> from a function that is called on "service pacemaker stop" (added in
>> RHEL 6.4):
>>
>> cman_pre_stop()
>> {
>>     cname=`crm_node --name`
>>     crm_attribute -N $cname -n standby -v true -l reboot
>>     echo -n "Waiting for shutdown of managed resources"
>>     ...
>
> That will only last until the node comes back (the cluster will remove
> it automatically); the core problem is that it appears not to have.
> Can you file a bug and attach a crm_report for the period covered by
> the restart?
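For the record, until this is fixed the attribute that cman_pre_stop() leaves
behind can be inspected and removed by hand. A sketch, assuming the standard
Pacemaker 1.1.x CLI tools are installed and run on the affected node:

```shell
#!/bin/sh
# Look up this node's name, the same way cman_pre_stop() does.
cname=$(crm_node --name)

# Show the transient (reboot-lifetime) standby attribute;
# -l reboot matches the lifetime used when it was set.
crm_attribute -N "$cname" -n standby -l reboot --query

# Delete it manually if the cluster failed to clear it after the reboot.
crm_attribute -N "$cname" -n standby -l reboot --delete
```

After the --delete, crm_mon should again report the node as online rather
than standby.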
I used Red Hat's Bugzilla, https://bugzilla.redhat.com/show_bug.cgi?id=918502,
since you are also the maintainer of the corresponding RPM.

--
Thanks
Leon

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org