On 6 Feb 2014, at 7:45 pm, Keisuke MORI <keisuke.mori...@gmail.com> wrote:
> Hi, > > I observed that pre_notify_demote is issued twice when a master > resource is migrating. > I'm wondering if this is the correct behavior. > > Steps to reproduce: > > - Start up 2 nodes cluster configured for the PostgreSQL streaming > replication using pgsql RA as a master/slave resource. > - kill the postgresql process on the master node to induce a fail-over. > - The fail-over succeeds as expected, but pre_notify_demote was > executed twice on each node before demoting on the master resource. > > 100% reproducible on my cluster. > > Pacemaker version: 1.1.11-rc4 (source build from the repo) > OS: RHEL6.4 > > I have never seen this on Pacemaker-1.0.* cluster with the same configuration. > > The relevant logs and pe-inputs are attached. > > > Diagnostics: > > (1) The first transition caused by the process failure (pe-input-160) > initiates pre_notify_demote on both nodes and cancelling slave monitor > on the slave node. > {{{ > 171 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command: > Initiating action 9: cancel prmPostgresql_cancel_10000 on rhel64-2 > 172 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command: > Initiating action 79: notify prmPostgresql_pre_notify_demote_0 on > rhel64-1 (local) > > 175 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command: > Initiating action 81: notify prmPostgresql_pre_notify_demote_0 on > rhel64-2 > }}} > > (2) When cancelling slave monitor completes, the transition is aborted > by Resource op removal. > {{{ > 176 Jan 30 16:08:59 rhel64-1 crmd[8143]: info: match_graph_event: > Action prmPostgresql_monitor_10000 (9) confirmed on rhel64-2 (rc=0) > 177 Jan 30 16:08:59 rhel64-1 cib[8138]: info: cib_process_request: > Completed cib_delete operation for section status: OK (rc=0, > origin=rhel64-2/crmd/21, version=0.37.9) > 178 Jan 30 16:08:59 rhel64-1 crmd[8143]: info: > abort_transition_graph: te_update_diff:258 - Triggered transition > abort (complete=0, node=rhel64-2, tag=lrm_rsc_op, > id=prmPostgresql_monitor_10000, > magic=0:0;26:12:0:acf9a2a3-307c-460b-b786-fc20e6b8aad5, cib=0.37.9) : > Resource op removal > }}} > > (3) The second transition is calculated by the abort (pe-input-161) > which results initiating pre_notify_demote again. If the demote didn't complete (or wasn't even attempted), then we must send the pre_notify_demote again unfortunately. The real bug may well be that the transition shouldn't have been aborted. > {{{ > 227 Jan 30 16:09:01 rhel64-1 pengine[8142]: notice: > process_pe_message: Calculated Transition 15: > /var/lib/pacemaker/pengine/pe-input-161.bz2 > 229 Jan 30 16:09:01 rhel64-1 crmd[8143]: notice: te_rsc_command: > Initiating action 78: notify prmPostgresql_pre_notify_demote_0 on > rhel64-1 (local) > 232 Jan 30 16:09:01 rhel64-1 crmd[8143]: notice: te_rsc_command: > Initiating action 80: notify prmPostgresql_pre_notify_demote_0 on > rhel64-2 > }}} > > I think that the transition abort at (2) should not happen. > > Regards, > -- > Keisuke MORI > <logs-pre-notify-20140206.tar.bz2>_______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org
signature.asc
Description: Message signed with OpenPGP using GPGMail
_______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org