Hello,

On 03/01/2012 09:13 PM, Jean-Francois Malouin wrote:
> * Florian Haas <flor...@hastexo.com> [20120229 08:12]:
>> Jean-François,
>>
>> I realize I'm late to this discussion, but allow me to chime in here anyhow:
>>
>> On Mon, Feb 27, 2012 at 11:45 PM, Jean-Francois Malouin
>> <jean-francois.malo...@bic.mni.mcgill.ca> wrote:
>>>> Have you looked at fence_virt?
>>>> http://www.clusterlabs.org/wiki/Guest_Fencing
>>>
>>> Yes, I did.
>>>
>>> I had a quick go last week at compiling it on Debian/Squeeze with
>>> backports, but with no luck.
>>
>> Seeing as you're on Debian, there really is no need to use fence_virt.
>> Instead, you should just be able to use the "external/libvirt" STONITH
>> plugin that ships with cluster-glue (in squeeze-backports). That
>> plugin works like a charm and I've used it in testing many times. No
>> need to compile anything.
>>
>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
>> may be a helpful resource.
>
> Thanks, Florian! Exactly what I needed!
>
> I set it up as you explained above. I can virsh from the guests to the
> physical host, but I'm experiencing a few oddities...
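[For readers following along: a minimal sketch of the configuration Florian describes, one external/libvirt fencing resource per guest. The resource names match the fence_node1/fence_node2 seen in the logs below; the hypervisor URI is a placeholder you must adapt to your setup.]

```shell
# Sketch only -- hypervisor_uri value is an assumed example, not from this thread.
crm configure primitive fence_node1 stonith:external/libvirt \
    params hostlist="node1" \
           hypervisor_uri="xen+ssh://hypervisor.example.com/" \
    op monitor interval="60s"
# Keep a node's own fencing device away from that node:
crm configure location l-fence_node1 fence_node1 -inf: node1
```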
>
> If I manually stonith node1 from node2 (or killall -9 corosync on
> node1) I get repeated console messages:
>
> node2 stonith: [31734]: CRIT: external_reset_req: 'libvirt reset' for host node1 failed with rc 1
>
> and syslog shows:
>
> Mar 1 14:00:51 node2 pengine: [991]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 1 14:00:51 node2 pengine: [991]: WARN: determine_online_status: Node node1 is unclean
> Mar 1 14:00:51 node2 pengine: [991]: notice: unpack_rsc_op: Operation fence_node1_last_failure_0 found resource fence_node1 active on node2
> Mar 1 14:00:51 node2 pengine: [991]: notice: unpack_rsc_op: Operation fence_node2_last_failure_0 found resource fence_node2 active on node1
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Action resPing:0_stop_0 on node1 is unrunnable (offline)
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Marking node node1 unclean
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Action fence_node2_stop_0 on node1 is unrunnable (offline)
> Mar 1 14:00:51 node2 pengine: [991]: WARN: custom_action: Marking node node1 unclean
> Mar 1 14:00:51 node2 pengine: [991]: WARN: stage6: Scheduling Node node1 for STONITH
> ...
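[A failure like the "external_reset_req ... failed with rc 1" above can be reproduced outside the cluster with cluster-glue's stonith(8) utility, which is often easier to debug. A hedged sketch; parameter values are placeholders for your own setup:]

```shell
# Query the device status through the plugin (does libvirt answer at all?):
stonith -t external/libvirt hostlist="node1" hypervisor_uri="xen:///" -S
# Attempt an actual reset -- WARNING: this really reboots the guest:
stonith -t external/libvirt hostlist="node1" hypervisor_uri="xen:///" -T reset node1
```

If the manual reset fails the same way, the problem is in the plugin/libvirt layer, not in Pacemaker.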
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 339d69d4-7d46-46a0-8256-e2c9a6637f09
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: Refreshing port list for fence_node1
> Mar 1 14:00:52 node2 stonith-ng: [987]: WARN: parse_host_line: Could not parse (0 0):
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: fence_node1 can fence node1: dynamic-list
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: call_remote_stonith: Requesting that node2 perform op reboot node1
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="339d69d4-7d46-46a0-8256-e2c9a6637f09" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="339d69d4-7d46-46a0-8256-e2c9a6637f09" st_target="node1" st_device_action="reboot" st_timeout="54000" src="node2" seq="3" />
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: can_fence_host_with_device: fence_node1 can fence node1: dynamic-list
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_fence: Found 1 matching devices for 'node1'
> ...
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: stonith_command: Processed st_fence from node2: rc=-1
> Mar 1 14:00:52 node2 stonith-ng: [987]: info: make_args: reboot-ing node 'node1' as 'port=node1'
> Mar 1 14:00:52 node2 pengine: [991]: WARN: process_pe_message: Transition 1: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-8.bz2
> Mar 1 14:00:52 node2 pengine: [991]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 1 14:00:57 node2 external/libvirt[31741]: [31769]: notice: Domain node1 was stopped
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31783]: ERROR: Failed to start domain node1
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31789]: ERROR: error: failed to get domain 'node1'
> Mar 1 14:01:02 node2 external/libvirt[31741]: [31789]: error: Domain not found: xenUnifiedDomainLookupByName
Do you already use libvirt to manage your Xen VMs? Is there a chance you
manage them only with Xen's native "xm" command, so that you only have xm
config files in /etc/xen/ and no libvirt XML definition files for the VMs
in /etc/libvirt/xen/? Without its XML definition file, libvirt won't be
able to start a Xen VM ... have a look at
http://libvirt.org/drvxen.html#xmlimport if you want to create them easily.

You could also try the external/xen0 STONITH resource agent.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> At this point I can't restart the stonith'ed node1; the CIB lists it as
> UNCLEAN. First I have to manually wipe it clean with
>
> 'crm node clearstate node1'
>
> as otherwise the surviving node2 just keeps shooting it, and some dummy
> resources (and an IP resource colocated with a ping to the
> hypervisor) don't restart properly by themselves.
>
> Must be something simple that I overlooked...
>
> Any ideas?
>
> jf
>
>> Cheers,
>> Florian
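[The xm-to-libvirt import Andreas points to can be done with virsh's domxml-from-native command. A sketch, assuming the guest's xm config lives at /etc/xen/node1 (the path is a placeholder for your actual config file):]

```shell
# Convert the native xm config into a libvirt domain XML definition:
virsh -c xen:/// domxml-from-native xen-xm /etc/xen/node1 > /tmp/node1.xml
# Register the domain with libvirt so external/libvirt can find and start it:
virsh -c xen:/// define /tmp/node1.xml
```

Once defined, `virsh -c xen:/// list --all` should show the domain, and the "Domain not found: xenUnifiedDomainLookupByName" error above should go away.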
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org