On Fri, Mar 22, 2013 at 7:13 PM, Roman Haefeli <[email protected]> wrote:
> Hi,
>
> I encountered a problem when performing a live migration of some OpenVZ
> CTs. Although the migration didn't trigger any messages in 'crm_mon' and
> was initially performed without any troubles, the resource was restarted
> on the target node 'unnecessarily'. From the logs it looks as if after
> the actual migration pacemaker detected the resource to be running on
> both nodes. Why did it detect that?

Probably the PE couldn't match up the partially completed migration.
I would bet a pacemaker upgrade prevents this from happening again.
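
If you want to check whether other migrations hit the same thing, here is a
rough Python sketch that scans a log for resources reported as "active on N
nodes" after their migrate_to had already been logged. The two patterns are
assumptions based only on the log lines quoted in this thread (other
pacemaker versions may word them differently), the function name is made up
for illustration, and it expects each log entry on a single line, unlike the
wrapped lines in this mail.

```python
import re

# Patterns derived from the log lines quoted in this thread (an assumption;
# other Pacemaker versions may format these messages differently).
MIGRATE_DONE = re.compile(r"operation migrate_to\[\d+\] on (\S+)")
TOO_ACTIVE = re.compile(r"Resource\s+(\S+)\s+\(\S+\)\s+is active on (\d+) nodes")


def find_too_active_after_migration(lines):
    """Return names of resources flagged as 'active on N nodes' after a
    migrate_to for the same resource appeared earlier in the log.

    `lines` is any iterable of unwrapped log lines, e.g. an open file.
    """
    migrated = set()   # resources whose migrate_to has been seen so far
    flagged = []       # resources later reported as active on several nodes
    for line in lines:
        match = MIGRATE_DONE.search(line)
        if match:
            migrated.add(match.group(1))
            continue
        match = TOO_ACTIVE.search(line)
        if match and match.group(1) in migrated:
            flagged.append(match.group(1))
    return flagged
```

It only looks at the order of lines, not timestamps, since everything in the
logs below happens within the same second.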

> Could it be that it checked too
> early on the source node? Might that be a problem with the RA ManageVE
> returning too early?
>
> (For details see below)
>
> Roman
>
>
>
>
> The setup:
> * Nodes are running Debian Squeeze with the current pve kernel
> * Our CTs are running on an NFS share mounted on both nodes
> * pacemaker 1.1.7
>
> Action:
> Migration of resource 'netpd' from vice1 to vice0
>
> Log of the source node (vice1)
> ------------------------------
> Mar 20 16:30:57 vice1 ManageVE[107511]: INFO: Setting up checkpoint... 
> suspend... dump... kill... Container is unmounted Checkpointing completed 
> succesfully
> Mar 20 16:30:57 vice1 lrmd: [1523]: info: operation migrate_to[66] on netpd 
> for client 1526: pid 107511 exited with return code 0
> Mar 20 16:30:57 vice1 crmd: [1526]: info: process_lrm_event: LRM operation 
> netpd_migrate_to_0 (call=66, rc=0, cib-update=172, confirmed=true) ok
> [...]
> Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: Resource 
> netpd (ocf::ManageVE) is active on 2 nodes attempting recovery
> Mar 20 16:30:57 vice1 pengine: [1525]: WARN: native_create_actions: See 
> http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> [...]
> Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart 
> netpd#011(Started vice0)
>
>
> Log of the target node (vice0)
> -------------------------------
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd stop[23] (pid 2191)
> [...]
> Mar 20 16:30:57 vice0 ManageVE[2191]: INFO: VE 3025 already stopped.
> [...]
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: operation stop[23] on netpd for 
> client 1546: pid 2191 exited with return code 0
> [...]
> Mar 20 16:30:57 vice0 crmd: [1546]: info: process_lrm_event: LRM operation 
> netpd_stop_0 (call=23, rc=0, cib-update=28, confirmed=true) ok
> [...]
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd start[27] (pid 2275)
> [...]
> Mar 20 16:30:57 vice0 kernel: CT: 3025: started
>
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems