05.07.2011 10:05, Andrew Beekhof wrote:
> On Tue, Jul 5, 2011 at 2:37 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>> 05.07.2011 04:44, Andrew Beekhof wrote:
>>> Looks like the VirtualDomain RA isn't correctly implementing stop.
>>> Stop of an undefined domain shouldn't produce an error.
>>
>> Nope, it just cries in logs, nothing more.
>
> Hmmm. Better log a bug then and include a crm_report.
> I've almost caught up on emails now, so I should be able to start
> looking at tarballs and bugs soon.

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2615
with hb_report, not crm_report because of
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2614

Best,
Vladislav

>
>>
>> process_lrm_event: LRM operation
>> mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 (call=1006, rc=0,
>> cib-update=1031, confirmed=true) ok
>>
>> And, that stop operation is fired a little bit after lrmd made its verdict.
>>
>>>
>>> On Mon, Jul 4, 2011 at 9:51 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>>> Hi all,
>>>>
>>>> There is feeling that race condition is possible during live migration
>>>> of resources.
>>>>
>>>> I put one node to standby mode, that made all resources migrate to
>>>> another one.
>>>> Virtual machines were successfully live-migrated, but then marked as
>>>> FAILED almost immediately.
>>>> Logs show some interesting details:
>>>> =========
>>>> Jul 4 10:21:48 s01-1 VirtualDomain[22988]: INFO:
>>>> mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration to s01-0 succeeded.
>>>> Jul 4 10:21:48 s01-1 lrmd: [7741]: info: RA output:
>>>> (mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:migrate_to:stdout) Domain
>>>> mgmt01.c01.ttc.prague.cz.vds-ok.com has been undefined
>>>> Jul 4 10:21:48 s01-0 VirtualDomain[4641]: INFO:
>>>> mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration from s01-1 succeeded.
>>>> Jul 4 10:21:49 s01-0 lrmd: [1927]: info: RA output:
>>>> (mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:migrate_from:stderr)
>>>> mgmt01.c01.ttc.prague.cz.vds-ok.com-vm is active on more than one node,
>>>> returning the default value for <null>
>>>> Jul 4 10:21:49 s01-1 crmd: [7744]: info: do_lrm_rsc_op: Performing
>>>> key=110:695:0:7ae65826-5d35-41c0-945a-8336ecb0bc3c
>>>> op=mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 )
>>>> Jul 4 10:21:49 s01-1 lrmd: [7741]: info:
>>>> rsc:mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:1006: stop
>>>> Jul 4 10:21:49 s01-1 VirtualDomain[24062]: ERROR: Virtual domain
>>>> mgmt01.c01.ttc.prague.cz.vds-ok.com has no state during stop operation,
>>>> bailing out.
>>>> Jul 4 10:21:49 s01-1 crmd: [7744]: info: process_lrm_event: LRM
>>>> operation mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 (call=1006,
>>>> rc=0, cib-update=1031, confirmed=true) ok
>>>> =========
>>>> Note that line with "is active on more than one node" follows "migration
>>>> from s01-1 succeeded" immediately in syslog (in both local and remote
>>>> files), so it was put into syslog queue immediately after former one.
>>>>
>>>> From what I understand, lrmd made decision to fail resource just because
>>>> 'stop' operation was not yet run on another node.
>>>>
>>>> What else can it be if my feeling is wrong?
>>>>
>>>> Version of pacemaker is 'almost' 1.1-devel tip.
>>>> cluster-glue is 1.0.7
>>>> I use own version of VirtualDomain RA, but it has the same migration
>>>> logic as a stock one.
>>>>
>>>> Best,
>>>> Vladislav
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs:
>>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
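
The point made earlier in the thread, that a stop of an undefined domain shouldn't produce an error, can be sketched roughly as below. This is only an illustration in Python, not the VirtualDomain RA's code (the real agent is a shell script). The names `stop`, `get_state` and `shutdown` are hypothetical, and the state strings are assumptions; only the return codes 0 (success) and 1 (generic error) follow the OCF resource agent convention.

```python
import logging

# OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

log = logging.getLogger("virtualdomain-sketch")


def stop(domain, get_state, shutdown):
    """Idempotent stop: a domain that is already gone counts as stopped.

    get_state(domain) is a hypothetical callback returning a state string,
    or None when the domain has no state (for example, it was undefined
    after a successful live migration).
    shutdown(domain) is a hypothetical callback returning True on success.
    """
    state = get_state(domain)
    if state is None or state == "shut off":
        # Nothing to stop: report success and log at info level, not error.
        log.info("Domain %s is already stopped or undefined.", domain)
        return OCF_SUCCESS
    if shutdown(domain):
        return OCF_SUCCESS
    log.error("Failed to stop domain %s (state: %s).", domain, state)
    return OCF_ERR_GENERIC
```

With this shape, the stop that follows a successful migrate_to would still return 0, as in the log above, but without an ERROR line in syslog.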