Re: [Pacemaker] A caveat in the VirtualDomain resource agent

Cédric Dufour - Idiap Research Institute Fri, 22 Aug 2014 08:02:20 -0700

Hello,

On 22/08/14 15:32, Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Aug 22, 2014 at 10:23:29AM +0200, Cédric Dufour - Idiap Research 
> Institute wrote:
>> Hello,
>>
>> Is this the right place to report this issue? (please redirect me if not)
> Yes. Though bugs/issues/fixes are nowadays mostly handled at
> github.com/ClusterLabs/resource-agents and reports there
> certainly have more visibility.
>
>> As we were experiencing/demonstrating our new cluster yesterday, we stumbled 
>> on a caveat in our LibvirtQemu resource agent (derived from VirtualDomain). 
>> Since the caveat is the same in the VirtualDomain resource agent, I thought 
>> I had better report it. Please see the patch below (for LibvirtQemu), whose 
>> comments should allow you to understand where the problem lies.
> Perhaps I missed something, but may I ask why you decided to
> create a new RA instead of improving the existing one? Was there
> anything in VirtualDomain making it unsuitable for your use
> case?


Long story:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-August/022432.html
[2] http://oss.clusterlabs.org/pipermail/pacemaker/2014-August/022477.html

In short:
[1] "I sized it [CIB] down from 444 to 277 resources by merging 'VirtualDomain' 
and 'MailTo' RA/primitives into a custom/single 'LibvirtQemu' one."
[2] "any error in the "MailTo" primitive would be considered "critical" [...] 
by Pacemaker, resulting in node fencing" and "I simplified the code by assuming 
a local qemu hypervisor [...]; I did so because I experienced strange delays 
when the "VirtualDomain" RA was running the "virsh ... uri" command for the 
sake of acquiring a sensible default value for the "hypervisor" parameter 
(which always resulted in "qemu:///system")."

The modifications I made are thus quite (?) specific (?) to my use case.
Also, I've been using "custom" RAs since Heartbeat V.1, then V.2 and now 
Pacemaker 1.1, in order to rely on RAs that are thoroughly tested in my setup 
rather than ones that may change according to distro whims, in ways that may be 
incompatible with my setup (my experience with HA being: set up, test, test, 
test, test, freeze... don't touch anything!)


>
>> --- LibvirtQemu.orig    2014-08-22 09:39:21.997201000 +0200
>> +++ LibvirtQemu    2014-08-22 09:50:32.440969000 +0200
>> @@ -154,11 +154,10 @@
>>    local virsh_output
>>    local domain_name
>>  
>> -  # Note: passing in the domain name from outside the script is
>> -  # intended for testing and debugging purposes only. Don't do this
>> -  # in production, instead let the script figure out the domain name
>> -  # from the config file. You have been warned.
>> -  if [ -z "${DOMAIN_NAME}" ]; then
>> +  # NOTE: Re-defining an already defined domain is dangerous! It shall be done only
>> +  # if we can reasonably assume the configuration file hasn't changed since the last
>> +  # time the domain has been defined.
>> +  if [ -z "${DOMAIN_NAME}" ] || [ "${OCF_RESKEY_config}" -ot "${STATEFILE}" ]; then
>>      # Spin until we have a domain name
>>      while true; do
>>        virsh_output="$(virsh ${VIRSH_OPTIONS} define ${OCF_RESKEY_config})"
>> @@ -170,7 +169,7 @@
>>      echo "${domain_name}" > "${STATEFILE}"
>>      ocf_log info "Domain name '${domain_name}' saved to state file '${STATEFILE}'."
>>    else
>> -    ocf_log warn "Domain name '${DOMAIN_NAME}' already defined; overriding configuration file '${OCF_RESKEY_config}' (this should NOT ne done in production!)."
>> +    ocf_log warn "Domain name '${DOMAIN_NAME}' already defined; overriding by newer configuration file will NOT be done!"
>>    fi
>>  }
> Under which circumstances did you run into these issues?

1. Stop the resource
2. Undefine the corresponding libvirtd domain from all nodes (without deleting 
the state file)
3. Start the resource

As I said, this is an edge case (which I stumbled on as I was demonstrating the 
cluster; I most likely would never have a reason to execute steps 2 and 3 
otherwise... but one never knows)
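
A minimal sketch of the three steps above, for anyone who wants to reproduce the situation on a test cluster. The resource name, domain name and node list are hypothetical, and the use of the crm shell and of ssh to reach each node is an assumption about the setup. By default the script only prints the commands; set RUN to an empty string to really execute them.

```shell
#!/bin/sh
# Hypothetical names: adjust RESOURCE, DOMAIN and NODES to your cluster.
RESOURCE="${RESOURCE:-vm-example}"
DOMAIN="${DOMAIN:-example}"
NODES="${NODES:-node1 node2}"
# Dry run by default: commands are echoed, not executed.
# Export RUN="" to execute them for real.
RUN="${RUN-echo}"

reproduce() {
  # 1. Stop the resource (the state file is deleted only on the node
  #    where the resource was running)
  ${RUN} crm resource stop "${RESOURCE}"
  # 2. Undefine the libvirt domain on every node, leaving state files alone
  for node in ${NODES}; do
    ${RUN} ssh "${node}" virsh undefine "${DOMAIN}"
  done
  # 3. Start the resource again
  ${RUN} crm resource start "${RESOURCE}"
}

reproduce
```

After step 3, the nodes that were not running the resource still have a state file but no domain definition, which is the case the patch tries to handle.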

> There were some recent additions which enable saving the changes
> back to the configuration file. Would that help?

I just had a look at 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/VirtualDomain

I see VirtualDomain has evolved quite a lot compared to the Debian/Wheezy one 
on which I based my custom RA. You now rely on 'virsh undefine' and 'virsh 
create' rather than 'virsh define' and 'virsh start' to manage/start VMs. From 
what I quickly gathered, the latest VirtualDomain should be immune to the 
issue/circumstances at hand (the changes introduced for 'save_config_on_stop' 
are thus not relevant here).
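
To illustrate the difference between the two styles, here is a rough sketch. It is not taken from either agent's code: the function names and the VIRSH override variable are made up for the example, and only the four virsh subcommands named above are used.

```shell
#!/bin/sh
# VIRSH may be overridden (e.g. by a stub) for testing; hypothetical knob.
VIRSH="${VIRSH:-virsh}"

# Older style: keep a persistent definition, then start it by name.
# Usage: start_persistent CONFIG_FILE DOMAIN_NAME
start_persistent() {
  ${VIRSH} define "$1" && ${VIRSH} start "$2"
}

# Newer style: drop any persistent definition, then boot a transient
# domain straight from the configuration file.
# Usage: start_transient CONFIG_FILE DOMAIN_NAME
start_transient() {
  # It is fine if there was nothing to undefine.
  ${VIRSH} undefine "$2" >/dev/null 2>&1 || true
  ${VIRSH} create "$1"
}
```

With the second style the domain is always created from the configuration file, so the start no longer depends on a definition that may or may not still exist in libvirtd.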

PS: Being under deadline pressure, I had no time to investigate why the 
'virsh uri' command would take 20 seconds to complete (as mentioned in my reply 
above). I see this would also be a non-issue in the latest VirtualDomain, 
provided the "hypervisor" parameter is set by the user.
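
A sketch of the idea, under the assumption that the agent only needs to ask libvirt for a default when the parameter is empty. OCF_RESKEY_hypervisor is the environment variable through which an OCF agent receives the "hypervisor" parameter; the function name and the VIRSH override are hypothetical, and this is not a copy of the agent's code.

```shell
#!/bin/sh
# VIRSH may be overridden (e.g. by a stub) for testing; hypothetical knob.
VIRSH="${VIRSH:-virsh}"

# Print the hypervisor URI to use.
hypervisor_uri() {
  if [ -n "${OCF_RESKEY_hypervisor:-}" ]; then
    # Parameter set by the user: no call to virsh at all
    printf '%s\n' "${OCF_RESKEY_hypervisor}"
  else
    # Fall back to asking libvirt (the call that was slow in my setup)
    ${VIRSH} uri
  fi
}
```

Setting hypervisor="qemu:///system" explicitly on the primitive thus avoids the 'virsh uri' call altogether.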

So, apparently, all the problems I experienced are SOLVED in the latest 
VirtualDomain.
Thanks for your comments, which pointed me in the right direction.

Unfortunately, the VirtualDomain distributed by Debian/Wheezy (current stable) 
is prone to the issue/circumstances at hand.
But that is another story (not yours to deal with)... :-)
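
For those staying on the older agent, the extra safeguard mentioned in the PS of the original report (deleting the state file whenever the domain is undefined) could look roughly like the sketch below. The function name and the VIRSH override are hypothetical, and the state file path is passed in as an argument because its location depends on the agent and the distribution.

```shell
#!/bin/sh
# VIRSH may be overridden (e.g. by a stub) for testing; hypothetical knob.
VIRSH="${VIRSH:-virsh}"

# Undefine a domain and drop the agent's state file along with it, so
# that the next start re-defines the domain from the configuration file.
# Usage: undefine_and_forget DOMAIN_NAME STATE_FILE
undefine_and_forget() {
  domain="$1"
  statefile="$2"
  if ${VIRSH} undefine "${domain}"; then
    rm -f "${statefile}"
  else
    echo "Failed to undefine '${domain}'; state file left untouched." >&2
    return 1
  fi
}
```

It has to be run on every node where the domain is undefined, since each node keeps its own state file.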

Cédric


>
> Cheers,
>
> Dejan
>
>> @@ -205,12 +204,12 @@
>>          ;;
>>        ''|'no state')
>>          # Empty string may be returned when virsh does not
>> -        # receive a reply from libvirtd.
>> +        # receive a reply from libvirtd or after the domain has
>> +        # been undefined.
>>          # "no state" may occur when the domain is currently
>>          # being migrated (on the migration target only), or
>>          # whenever virsh can't reliably obtain the domain
>>          # state.
>> -        status='no state'
>>          if [ "${__OCF_ACTION}" == 'stop' ] && [ ${try} -ge 3 ]; then
>>            # During the stop operation, we want to bail out
>>            # quickly, so as to be able to force-stop (destroy)
>> @@ -224,6 +223,17 @@
>>            ocf_log info "Domain '${DOMAIN_NAME}' currently has no state; retrying."
>>            sleep 1
>>          fi
>> +        if [ "${status}" == '' ] && [ $(( ${try} % 10 )) -eq 0 ]; then
>> +          # Could it be that libvirtd is running healthily but the domain
>> +          # has been undefined? In that case, let's attempt to re-define it.
>> +          # If libvirtd IS running, it can not hurt (given the safeguards in
>> +          # LibvirtQemu_Define). If libvirtd is NOT running, then something is
>> +          # definitely wrong (and the monitor operation will time-out in
>> +          # LibvirtQemu_Define the same way as it would here).
>> +          ocf_log warn "Has domain '${DOMAIN_NAME}' been undefined? attempting to re-define it."
>> +          LibvirtQemu_Define
>> +        fi
>> +        status='no state'
>>          ;;
>>        *)
>>          # any other output is unexpected.
>> @@ -487,6 +497,11 @@
>>  
>>  # Define the domain on startup, and re-define whenever someone deleted
>>  # the state file, or touched the config.
>> +# WARNING: There is a caveat here! When the resource is stopped, the state file
>> +# is deleted ONLY on the node where it was running. In case the domain is then
>> +# undefined (from libvirtd), on all nodes, we will end-up with a state file but no
>> +# domain definition on those nodes that were not running the resource. The monitor
>> +# operation MUST handle that situation, should the resource be restarted.
>>  if [ ! -e "${STATEFILE}" ] || [ "${OCF_RESKEY_config}" -nt "${STATEFILE}" ]; then
>>    LibvirtQemu_Define
>>  fi
>>
>> One could ask "why undefine a libvirt domain and then restart it?". The 
>> answer is two-fold: 1. experience showed us that we should undefine a 
>> decommissioned domain from libvirt to prevent potential UUID conflicts when 
>> defining a new domain (which is likely in our setup, since UUIDs are built 
>> from the domain IP address); 2. the "demo-effect" (or potential legitimate 
>> reasons), where one would "decommission" a domain and restart it right 
>> afterwards ( :-/ ).
>>
>> PS: we now also make sure to delete the VirtualDomain/LibvirtQemu state file 
>> when undefining the domain. But it is best to have multiple safeguards as far 
>> as this caveat is concerned (thus the patch above).
>>
>> Hope it helps,
>>
>> Cédric
>>
>> -- 
>>
>> Cédric Dufour @ Idiap Research Institute
>>
>


