Re: [Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-07 Thread Vladislav Bogdanov
05.07.2011 10:05, Andrew Beekhof wrote: > On Tue, Jul 5, 2011 at 2:37 PM, Vladislav Bogdanov > wrote: >> 05.07.2011 04:44, Andrew Beekhof wrote: >>> Looks like the VirtualDomain RA isn't correctly implementing stop. >>> Stop of an undefined domain shouldn't produce an error. >> >> Nope, it just c

Re: [Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-05 Thread Andrew Beekhof
On Tue, Jul 5, 2011 at 2:37 PM, Vladislav Bogdanov wrote: > 05.07.2011 04:44, Andrew Beekhof wrote: >> Looks like the VirtualDomain RA isn't correctly implementing stop. >> Stop of an undefined domain shouldn't produce an error. > > Nope, it just cries in logs, nothing more. Hmmm. Better log a b

Re: [Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-04 Thread Vladislav Bogdanov
05.07.2011 04:44, Andrew Beekhof wrote: > Looks like the VirtualDomain RA isn't correctly implementing stop. > Stop of an undefined domain shouldn't produce an error. Nope, it just cries in logs, nothing more. process_lrm_event: LRM operation mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 (call=10

Re: [Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-04 Thread Andrew Beekhof
Looks like the VirtualDomain RA isn't correctly implementing stop. Stop of an undefined domain shouldn't produce an error. On Mon, Jul 4, 2011 at 9:51 PM, Vladislav Bogdanov wrote: > Hi all, > > There is feeling that race condition is possible during live migration > of resources. > > I put one n

[Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-04 Thread Vladislav Bogdanov
Hi all, There is a feeling that a race condition is possible during live migration of resources. I put one node into standby mode, which made all resources migrate to the other one. Virtual machines were successfully live-migrated, but then marked as FAILED almost immediately. Logs show some interesting d