Hi! I'm no expert, but check out the previous commit, apply your patch, then do "git add --interactive" and you can pick each hunk for the next commit. The rest is still in the working tree, but won't be committed. You can then repeat "git add --interactive" for the following commit.
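A minimal sketch of that workflow. The file names and commit messages below are invented for the demo; hunk-level picking needs the interactive prompt of "git add --interactive" (or "git add -p"), so this sketch stages per file instead:

```shell
#!/bin/sh
# Sketch of splitting one mixed change into two commits.
# All paths and messages are hypothetical.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email tom@example.com
git config user.name "Tom Parker"
git commit -q --allow-empty -m "state before the mixed commit"

# Re-apply the combined change; here we just fake its two unrelated edits:
echo 'ocf_log info "..."' > logging.sh      # the logging cleanup
echo '<parameter .../>'   > metadata.xml    # the meta-data change

# "git add --interactive" would let you pick hunks even inside one file;
# staging per file keeps this demo non-interactive:
git add logging.sh
git commit -q -m "Xen: make log messages consistent"
git add metadata.xml
git commit -q -m "Xen: meta-data changes"

git log --oneline   # the one mixed change is now two commits
```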
Regards,
Ulrich

>>> Tom Parker <[email protected]> schrieb am 21.10.2013 um 17:14 in
Nachricht <[email protected]>:
> Hi Dejan.
>
> How can I revert my commits so that they do not include multiple
> things? I will submit one patch with the logging cleanup and then, if
> needed, another with my changes to the meta-data.
>
> Tom
>
> On 10/21/2013 09:39 AM, Dejan Muhamedagic wrote:
>> Hi Ulrich!
>>
>> On Mon, Oct 21, 2013 at 09:28:50AM +0200, Ulrich Windl wrote:
>>> Hi!
>>>
>>> Basically I think there should be no hard-coded constants whose value
>>> depends on some performance measurements, like 5s for rebooting a VM.
>> It's actually not 5s, but the status is run 5 times. If the load
>> is high, my guess is that the Xen tools used by the RA would
>> suffer proportionally.
>>
>>> So I support Tom's changes.
>>>
>>> However I noticed:
>>>
>>> +running; apparently, this period lasts only for a second or
>>> +two
>>>
>>> (missing full stop at end of sentence)
>> That's at the end of the comment and, typically, comments end
>> with a carriage return (as is here the case).
>>
>>> Actually I'd rephrase the description:
>>>
>>> "When the guest is rebooting, there is a short interval where the guest
>>> completely disappears from "xm list", which, in turn, will cause the
>>> monitor operation to return a "not running" status. If the guest cannot
>>> be found, this value will cause some extra delay in the monitor
>>> operation to work around the problem."
>>>
>>> (I.e. try to describe the effect, not the implementation)
>> That's the code, so the implementation is described. The very
>> top of the comment says:
>>
>> # If the guest is rebooting, it may completely disappear from the
>> # list of defined guests
>>
>> I was hoping that that was enough of an explanation. Look for
>> a more thorough description of the cause in the changelog. BTW,
>> note that this is a _workaround_ and that the thing should
>> eventually be fixed in Xen.
>>
>>> And yes, I appreciate consistent log formats also ;-)
>> That's always welcome, of course. It should also go in a
>> separate commit.
>>
>> Thanks,
>>
>> Dejan
>>
>>> Regards,
>>> Ulrich
>>>
>>>>>> Tom Parker <[email protected]> schrieb am 18.10.2013 um 19:30 in
>>> Nachricht <[email protected]>:
>>>> Hi Dejan. Sorry to be slow to respond to this. I have done some
>>>> testing and everything looks good.
>>>>
>>>> I spent some time tweaking the RA and I added a parameter called
>>>> wait_for_reboot (default 5s) to allow us to override the reboot sleep
>>>> times (in case it's more than 5 seconds on really loaded hypervisors).
>>>> I also cleaned up a few log entries to make them consistent in the RA
>>>> and edited your entries for xen status to be a little clearer as to
>>>> why we think we should be waiting.
>>>>
>>>> I have attached a patch here because I have NO idea how to create a
>>>> branch and pull request. If there are links to a good place to start I
>>>> may be able to contribute occasionally to some other RAs that I use.
>>>>
>>>> Please let me know what you think.
>>>>
>>>> Thanks for your help
>>>>
>>>> Tom
>>>>
>>>>
>>>> On 10/17/2013 06:10 AM, Dejan Muhamedagic wrote:
>>>>> On Thu, Oct 17, 2013 at 11:45:17AM +0200, Dejan Muhamedagic wrote:
>>>>>> Hi Tom,
>>>>>>
>>>>>> On Wed, Oct 16, 2013 at 05:28:28PM -0400, Tom Parker wrote:
>>>>>>> Some more reading of the source code makes me think the
>>>>>>> '|| [ "$__OCF_ACTION" != "stop" ];' is not needed.
>>>>>> Yes, you're right. I'll drop that part of the if statement. Many
>>>>>> thanks for testing.
>>>>> Fixed now. The if statement, which was obviously hard to follow,
>>>>> got relegated to the monitor function. Which makes
>>>>> Xen_Status_with_Retry really stand for what's happening in there ;-)
>>>>>
>>>>> Tom, hope you can test again.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Dejan
>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Dejan
>>>>>>
>>>>>>> Xen_Status_with_Retry() is only called from Stop and Monitor, so we
>>>>>>> only need to check if it's a probe. Everything else should be
>>>>>>> handled in the case statement in the loop.
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On 10/16/2013 05:16 PM, Tom Parker wrote:
>>>>>>>> Hi. I think there is an issue with the updated Xen RA.
>>>>>>>>
>>>>>>>> I think there is an issue with the if statement here, but I am not
>>>>>>>> sure. I may be confused about how bash || works, but I don't see
>>>>>>>> my servers ever entering the loop on a VM disappearing.
>>>>>>>>
>>>>>>>> if ocf_is_probe || [ "$__OCF_ACTION" != "stop" ]; then
>>>>>>>>     return $rc
>>>>>>>> fi
>>>>>>>>
>>>>>>>> Does this not mean that if we run a monitor operation that is not
>>>>>>>> a probe we will have:
>>>>>>>>
>>>>>>>> (ocf_is_probe) return false
>>>>>>>> (stop != monitor) return true
>>>>>>>> (false || true) return true
>>>>>>>>
>>>>>>>> which will cause the if statement to return $rc and never enter
>>>>>>>> the loop?
>>>>>>>>
>>>>>>>> Xen_Status_with_Retry() {
>>>>>>>>     local rc cnt=5
>>>>>>>>
>>>>>>>>     Xen_Status $1
>>>>>>>>     rc=$?
>>>>>>>>     if ocf_is_probe || [ "$__OCF_ACTION" != "stop" ]; then
>>>>>>>>         return $rc
>>>>>>>>     fi
>>>>>>>>     while [ $rc -eq $OCF_NOT_RUNNING -a $cnt -gt 0 ]; do
>>>>>>>>         case "$__OCF_ACTION" in
>>>>>>>>         stop)
>>>>>>>>             ocf_log debug "domain $1 reported as not running, waiting $cnt seconds ..."
>>>>>>>>             ;;
>>>>>>>>         monitor)
>>>>>>>>             ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ..."
>>>>>>>>             ;;
>>>>>>>>         *) : not reachable
>>>>>>>>             ;;
>>>>>>>>         esac
>>>>>>>>         sleep 1
>>>>>>>>         Xen_Status $1
>>>>>>>>         rc=$?
>>>>>>>>         let cnt=$((cnt-1))
>>>>>>>>     done
>>>>>>>>     return $rc
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/16/2013 12:12 PM, Dejan Muhamedagic wrote:
>>>>>>>>> Hi Tom,
>>>>>>>>>
>>>>>>>>> On Tue, Oct 15, 2013 at 07:55:11PM -0400, Tom Parker wrote:
>>>>>>>>>> Hi Dejan
>>>>>>>>>>
>>>>>>>>>> Just a quick question. I cannot see your new log messages being
>>>>>>>>>> logged to syslog:
>>>>>>>>>>
>>>>>>>>>> ocf_log warn "domain $1 reported as not running, but it is
>>>>>>>>>> expected to be running! Retrying for $cnt seconds ..."
>>>>>>>>>>
>>>>>>>>>> Do you know where I can set my logging to see warn-level
>>>>>>>>>> messages? I expected to see them in my testing by default, but
>>>>>>>>>> that does not seem to be true.
>>>>>>>>> You should see them by default. But note that these warnings may
>>>>>>>>> not happen, depending on the circumstances on your host. In my
>>>>>>>>> experiments they were logged only while the guest was rebooting,
>>>>>>>>> and then just once or maybe twice. If you have recent
>>>>>>>>> resource-agents and crmsh, you can enable operation tracing (with
>>>>>>>>> crm resource trace <rsc> monitor <interval>).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dejan
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Tom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/08/2013 05:04 PM, Dejan Muhamedagic wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I thought I'd never be bitten by this bug, but I actually was!
>>>>>>>>>>>> Now I'm wondering whether the Xen RA sees the guest if you use
>>>>>>>>>>>> pygrub, and pygrub is still counting down for the actual
>>>>>>>>>>>> boot...
>>>>>>>>>>>>
>>>>>>>>>>>> But the reason why I'm writing is that I think I've discovered
>>>>>>>>>>>> another bug in the RA:
>>>>>>>>>>>>
>>>>>>>>>>>> CRM decided to "recover" the guest VM "v02":
>>>>>>>>>>>> [...]
>>>>>>>>>>>> lrmd: [14903]: info: operation monitor[28] on prm_xen_v02 for client 14906: pid 19516 exited with return code 7
>>>>>>>>>>>> [...]
>>>>>>>>>>>> pengine: [14905]: notice: LogActions: Recover prm_xen_v02 (Started h05)
>>>>>>>>>>>> [...]
>>>>>>>>>>>> crmd: [14906]: info: te_rsc_command: Initiating action 5: stop prm_xen_v02_stop_0 on h05 (local)
>>>>>>>>>>>> [...]
>>>>>>>>>>>> Xen(prm_xen_v02)[19552]: INFO: Xen domain v02 already stopped.
>>>>>>>>>>>> [...]
>>>>>>>>>>>> lrmd: [14903]: info: operation stop[31] on prm_xen_v02 for client 14906: pid 19552 exited with return code 0
>>>>>>>>>>>> [...]
>>>>>>>>>>>> crmd: [14906]: info: te_rsc_command: Initiating action 78: start prm_xen_v02_start_0 on h05 (local)
>>>>>>>>>>>> lrmd: [14903]: info: rsc:prm_xen_v02 start[32] (pid 19686)
>>>>>>>>>>>> [...]
>>>>>>>>>>>> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stderr) Error: Domain 'v02' already exists with ID '3'
>>>>>>>>>>>> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stdout) Using config file "/etc/xen/vm/v02".
>>>>>>>>>>>> [...]
>>>>>>>>>>>> lrmd: [14903]: info: operation start[32] on prm_xen_v02 for client 14906: pid 19686 exited with return code 1
>>>>>>>>>>>> [...]
>>>>>>>>>>>> crmd: [14906]: info: process_lrm_event: LRM operation prm_xen_v02_start_0 (call=32, rc=1, cib-update=5271, confirmed=true) unknown error
>>>>>>>>>>>> crmd: [14906]: WARN: status_from_rc: Action 78 (prm_xen_v02_start_0) on h05 failed (target: 0 vs. rc: 1): Error
>>>>>>>>>>>> [...]
>>>>>>>>>>>>
>>>>>>>>>>>> As you can clearly see, "start" failed because the guest was
>>>>>>>>>>>> already up! IMHO this is a bug in the RA (SLES11 SP2:
>>>>>>>>>>>> resource-agents-3.9.4-0.26.84).
>>>>>>>>>>> Yes, I've seen that. It's basically the same issue, i.e.
>>>>>>>>>>> the domain being gone for a while and then reappearing.
>>>>>>>>>>>
>>>>>>>>>>>> I guess the following test is problematic:
>>>>>>>>>>>> ---
>>>>>>>>>>>> xm create ${OCF_RESKEY_xmfile} name=$DOMAIN_NAME
>>>>>>>>>>>> rc=$?
>>>>>>>>>>>> if [ $rc -ne 0 ]; then
>>>>>>>>>>>>     return $OCF_ERR_GENERIC
>>>>>>>>>>>> ---
>>>>>>>>>>>> Here "xm create" probably fails if the guest is already created...
>>>>>>>>>>> It should fail too. Note that this is a race, but the race is
>>>>>>>>>>> anyway caused by the strange behaviour of xen. With the recent
>>>>>>>>>>> fix (or workaround) in the RA, this shouldn't be happening.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Dejan
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ulrich
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dejan Muhamedagic <[email protected]> schrieb am 01.10.2013 um 12:24 in
>>>>>>>>>>>> Nachricht <[email protected]>:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 01, 2013 at 12:13:02PM +0200, Lars Marowsky-Bree wrote:
>>>>>>>>>>>>>> On 2013-10-01T00:53:15, Tom Parker <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for paying attention to this issue (not really a bug),
>>>>>>>>>>>>>>> as I am sure I am not the only one with this issue. For now I
>>>>>>>>>>>>>>> have set all my VMs to destroy so that the cluster is the only
>>>>>>>>>>>>>>> thing managing them, but this is not super clean as I get
>>>>>>>>>>>>>>> failures in my logs that are not really failures.
>>>>>>>>>>>>>> It is very much a severe bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Xen RA has gained a workaround for this now, but we're also pushing
>>>>>>>>>>>>> Take a look here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/ClusterLabs/resource-agents/pull/314
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dejan
>>>>>>>>>>>>>
>>>>>>>>>>>>>> the Xen team (where the real problem is) to investigate and fix.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Lars
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Architect Storage/HA
>>>>>>>>>>>>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild,
>>>>>>>>>>>>>> Felix Imendörffer, HRB 21284 (AG Nürnberg)
>>>>>>>>>>>>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems
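Tom's truth-table reading of the early-return condition above can be checked with a small stand-alone sketch. ocf_is_probe, the OCF status codes, and the helper below are stubbed or invented here; in the real RA they come from the shared resource-agents shell functions:

```shell
#!/bin/sh
# Minimal model of the condition Tom questioned; not the actual RA code.
# ocf_is_probe is stubbed to "false" to model a regular (non-probe) op.
ocf_is_probe() { false; }

# Hypothetical helper: reports whether the retry loop would be reached
# for a given value of $__OCF_ACTION.
enters_retry_loop() {
    __OCF_ACTION=$1
    if ocf_is_probe || [ "$__OCF_ACTION" != "stop" ]; then
        echo "no"       # early return: $rc is returned immediately
    else
        echo "yes"      # falls through to the retry loop
    fi
}

enters_retry_loop monitor   # prints "no": a non-probe monitor never retries
enters_retry_loop stop      # prints "yes": only stop reaches the loop
```

This reproduces Tom's analysis: for a non-probe monitor the second clause is true, the || short-circuits to true, and the function returns before the loop, which is exactly why Dejan dropped that part of the if statement.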
