Hello,

On 03/01/2012 09:13 PM, Jean-Francois Malouin wrote:
> * Florian Haas <flor...@hastexo.com> [20120229 08:12]:
>> Jean-François,
>>
>> I realize I'm late to this discussion, but allow me to chime in here anyhow:
>>
>> On Mon, Feb 27, 2012 at 11:45 PM, Jean-Francois Malouin
>> <jean-francois.malo...@bic.mni.mcgill.ca> wrote:
>>>> Have you looked at fence_virt?
>>>> http://www.clusterlabs.org/wiki/Guest_Fencing
>>>
>>> Yes, I did.
>>>
>>> I had a quick go last week at compiling it on Debian/Squeeze with
>>> backports, but with no luck.
>>
>> Seeing as you're on Debian, there really is no need to use fence_virt.
>> Instead, you should just be able to use the "external/libvirt" STONITH
>> plugin that ships with cluster-glue (in squeeze-backports). That
>> plugin works like a charm and I've used it in testing many times. No
>> need to compile anything.
>>
>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
>> may be a helpful resource.
>
> Thanks, Florian! Exactly what I needed!
>
> I set it up as you explained above. I can virsh from the guests to the
> physical host, but I'm experiencing a few oddities...
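[For readers following along: a minimal sketch of the configuration Florian describes, one external/libvirt fencing resource per guest. The resource names match the fence_node1/fence_node2 seen in the logs below; the hypervisor URI is a placeholder you must adapt to your setup.]

```shell
# Sketch only -- hypervisor_uri value is an assumed example, not from this thread.
crm configure primitive fence_node1 stonith:external/libvirt \
    params hostlist="node1" \
           hypervisor_uri="xen+ssh://hypervisor.example.com/" \
    op monitor interval="60s"
# Keep a node's own fencing device away from that node:
crm configure location l-fence_node1 fence_node1 -inf: node1
```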
>
> If I manually stonith node1 from node2 (or killall -9 corosync on
> node1) I get repeated console messages:
>
> node2 stonith: [31734]: CRIT: external_reset_req: 'libvirt reset' for host node1 failed with rc 1
>
> and syslog shows:
>
> Mar 1 14:00:51 node2 pengine: [991]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 1 14:00:51 node2 pengine: [991]: WARN: determine_online_status: Node node1 is unclean
> Mar 1 14:00:51 node2 pengine: [991]: notice: unpack_rsc_op: Operation fence_node1_last_failure_0 found resource fence_node1 active on node2
> Mar 1 14:00:51 node2 pengine: [991]: notice: unpack_rsc_op: Operation fence_node2_last_failure_0 found resource fence_node2 active on node1
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Action resPing:0_stop_0 on node1 is unrunnable (offline)
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Marking node node1 unclean
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Action fence_node2_stop_0 on node1 is unrunnable (offline)
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Marking node node1 unclean
> Mar 1 14:00:51 node2 pengine: [991]: WARN: stage6: Scheduling Node node1 for STONITH
> ...
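[A failure like the "external_reset_req ... failed with rc 1" above can be reproduced outside the cluster with cluster-glue's stonith(8) utility, which is often easier to debug. A hedged sketch; parameter values are placeholders for your own setup:]

```shell
# Query the device status through the plugin (does libvirt answer at all?):
stonith -t external/libvirt hostlist="node1" hypervisor_uri="xen:///" -S
# Attempt an actual reset -- WARNING: this really reboots the guest:
stonith -t external/libvirt hostlist="node1" hypervisor_uri="xen:///" -T reset node1
```

If the manual reset fails the same way, the problem is in the plugin/libvirt layer, not in Pacemaker.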
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 339d69d4-7d46-46a0-8256-e2c9a6637f09
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: Refreshing port list for fence_node1
> Mar 1 14:00:52 node2 stonith-ng: [987]: WARN: parse_host_line: Could not parse (0 0):
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: fence_node1 can fence node1: dynamic-list
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: call_remote_stonith: Requesting that node2 perform op reboot node1
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="339d69d4-7d46-46a0-8256-e2c9a6637f09" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="339d69d4-7d46-46a0-8256-e2c9a6637f09" st_target="node1" st_device_action="reboot" st_timeout="54000" src="node2" seq="3" />
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: fence_node1 can fence node1: dynamic-list
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_fence: Found 1 matching devices for 'node1'
> ...
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_command: Processed st_fence from node2: rc=-1
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: make_args: reboot-ing node 'node1' as 'port=node1'
> Mar 1 14:00:52 node2 pengine: [991]: WARN: process_pe_message: Transition 1: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-8.bz2
> Mar 1 14:00:52 node2 pengine: [991]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 1 14:00:57 node2 external/libvirt[31741]: [31769]: notice: Domain node1 was stopped
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31783]: ERROR: Failed to start domain node1
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31789]: ERROR: error: failed to get domain 'node1'
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31789]: error: Domain not found: xenUnifiedDomainLookupByName
Do you already use libvirt to manage your Xen VMs? Is there a chance you
manage them only with Xen's native "xm" command, so that you only have xm
config files in /etc/xen/ and no libvirt XML definition files for the VMs
in /etc/libvirt/xen/? Without its XML definition file, libvirt won't be
able to start a Xen VM ... have a look at
http://libvirt.org/drvxen.html#xmlimport if you want to create them easily.

You could also try the external/xen0 STONITH resource agent.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> At this point I can't restart the stonith'ed node1; the CIB lists it as
> UNCLEAN. First I have to manually wipe it clean with
>
> 'crm node clearstate node1'
>
> as otherwise the surviving node2 just keeps shooting it, and some dummy
> resources (and an IP resource colocated with a ping to the
> hypervisor) don't restart properly by themselves.
>
> Must be something simple that I overlooked...
>
> Any ideas?
>
> jf
>
>> Cheers,
>> Florian
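[The xm-to-libvirt import Andreas points to can be done with virsh's domxml-from-native command. A sketch, assuming the guest's xm config lives at /etc/xen/node1 (the path is a placeholder for your actual config file):]

```shell
# Convert the native xm config into a libvirt domain XML definition:
virsh -c xen:/// domxml-from-native xen-xm /etc/xen/node1 > /tmp/node1.xml
# Register the domain with libvirt so external/libvirt can find and start it:
virsh -c xen:/// define /tmp/node1.xml
```

Once defined, `virsh -c xen:/// list --all` should show the domain, and the "Domain not found: xenUnifiedDomainLookupByName" error above should go away.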
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org