Re: [Linux-HA] stonithd doesn't use "power off" action

Andrew Beekhof Wed, 20 Feb 2013 13:38:44 -0800

On Wed, Feb 20, 2013 at 10:39 PM, Bernd Schubert
<[email protected]> wrote:
> On 02/20/2013 10:48 AM, Andrew Beekhof wrote:
>> On Wed, Feb 20, 2013 at 8:07 PM, Bernd Schubert
>> <[email protected]> wrote:
>>> On 02/20/2013 09:52 AM, Lukas Grossar wrote:
>>>> On 20.02.2013 09:28, Bernd Schubert wrote:
>>>>> On 02/19/2013 10:58 PM, Andrew Beekhof wrote:
>>>>>> On Tue, Feb 19, 2013 at 11:26 PM, Bernd Schubert
>>>>>> <[email protected]> wrote:
>>>>>>> On 02/19/2013 06:53 AM, Andrew Beekhof wrote:
>>>>>>>> On Mon, Feb 18, 2013 at 7:34 PM, Bruce Ford <[email protected]> wrote:
>>>>>>>>> Lukas,
>>>>>>>>>
>>>>>>>>> thanks for the quick reply.
>>>>>>>>>
>>>>>>>>> On Fri, Feb 15, 2013 at 4:54 PM, Lukas Grossar
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> On 15.02.2013 16:43, Bruce Ford wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm running pacemaker 1.1.7 on RedHat 6.3 using the fence_ipmilan
>>>>>>>>>>> fence agent from the "fence-agents" 3.1.5 package.
>>>>>>>>>>>
>>>>>>>>>>> I found that although I have chosen the action "off", this doesn't
>>>>>>>>>>> power off the target node but reboots it with a graceful shutdown. So
>>>>>>>>>>> I investigated on the commandline:
>>>>>>>>>>
>>>>>>>>>> I ran into the same problem when setting up a cluster using CentOS 6.3
>>>>>>>>>> and sent a mail to the mailing list about a week ago and got the
>>>>>>>>>> following reaction from Andrew Beekhof:
>>>>>>>>>>
>>>>>>>>>>> Prior to 6.4 there was some inconsistency between the various agents
>>>>>>>>>>> and whether they supported "action" or "option".
>>>>>>>>>>> An upgrade to 6.4 in the next few weeks should solve this for you.
>>>>>>>>>
>>>>>>>>> Does 6.4 mean RedHat/Centos 6.4? What a pity, this is currently not
>>>>>>>>> an option. Will we face serious problems trying to backport the new
>>>>>>>>> fence-agents package?
>>>>>>>>
>>>>>>>> No, should be pretty straightforward
>>>>>>>
>>>>>>> So that will introduce another serious change of behaviour in RHEL 6.4?
>>>>>>
>>>>>> No. All agents now support "action".  Anything that used to support
>>>>>> "option" will continue to do so.
>>>>>
>>>>> Hmm, I'm still not sure if I understand it correctly. So with 6.4 one
>>>>> has to set (in crm syntax):
>>>>>
>>>>> property stonith-action="reboot"
>>>>> ?
>>>>>
>>>>> Right now we have:
>>>>>
>>>>> property stonith-action="poweroff"
>>>>>
>>>>> and the fence_ipmilan option: action=off
>>>>>
>>>>> And that leads to a reboot, as it is supposed to do for this
>>>>> installation.
>>>>
>>>> That may be what it is supposed to do for this installation, but it is
>>>> not what it is supposed to do according to the documentation/man page.
>>>
>>>
>>> I'm aware of that. However, it was tested that way, 'reboot' was not
>>> accepted by the agent as parameter and retesting on an upgrade will be
>>> difficult. So if it was a bug, then it was at least a tested bug and
>>> fixing it will break existing installations. From my point of view that
>>> is fine with upstream, but wrong for stable distributions. Distribution
>>> packages instead should document it and add another option such as
>>> "really-poweroff".
>>
>> No. No no no no.  This is all kinds of wrong.
>>
>> That the agent did not accept the reboot command (and yet was clearly
>> capable of supporting it) is a bug which should have been reported.
>> Instead you chose a path that relied on a behaviour that was quite
>> obviously the complete opposite of the documented one.
>>
>> Two wrongs do not make a right.
>>
>> It is very unfortunate that you will be affected by this, but you
>> should, at the very least, have queried _someone_ about it before
>> betting the farm.
>> No distro I know of claims to be bug-for-bug compatible with previous
>> releases, even in a stable series.
>
> The only solution would have been to write our own agent, as I always
> did before. But except for the weird config options we were rather happy
> with the default agent this time. About reporting a bug - certainly, we
> should have done that. But what would have been the solution?


Err, a patch and an updated package would be the most likely one.

> The
> customer wanted packages from upstream RHEL6 and bugfixed packages were
> not ready at that time.

How can there be a bugfixed package ready if no-one reports it?
Even when developers find the issue, not reported == no-one is
affected == wait for the next x.y release.

> The only solution would have been to add new
> agents to our FhGFS packages, which we indeed should have done.

Please pay attention.  "Reporting the original bug" is what you
_should_ have done.
Even reporting it upstream and applying the patch yourself would be
infinitely better than going it alone.

Don't wait for other people to report your bugs.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
