Re: [Pacemaker] resource with colocation rule doesn't fail

Andrew Beekhof Tue, 06 Aug 2013 20:58:47 -0700

I believe this patch should help:

    https://github.com/beekhof/pacemaker/commit/7a0a6f8


Can you give it a try?

On 07/08/2013, at 12:28 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> 
> On 02/08/2013, at 5:56 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:
> 
>> Hi Andrew,
>> 
>> Thanks for the fix.
>> I tried it on my setup, and now when a cloned resource fails, the group
>> moves to the other node as expected.
>> 
>> However, I noticed something strange.
>> If a cloned resource is failing, I see this in the logs:
>> pengine[12178]:  warning: unpack_rsc_op: Processing failed op monitor for d_bird:1 on DEM-2: not running (7)
>> 
>> If that same cloned resource recovers, I still see the same message
>> appear in the log file, although crm_mon shows it correctly and it
>> functions correctly.
>> 
>> However, when I restart the other node (or restart only the pacemaker
>> service), it reappears as failed in crm_mon and the cluster behaves as if
>> it were failing, even though it isn't.
> 
> It comes down to this:
> 
> # PCMK_trace_functions=unpack_rsc_op tools/crm_mon -x pcmk-vr-02-aug-2013/DEM-1/pengine/pe-input-8.bz2 -V 2>&1 | grep -v -e d_bird_subnet_state -e d_bird6 | grep "Unpacking task.*d_bird.*DEM-2"
> 
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428811) on DEM-2 (role=Started)
> 
> vs.
> 
> # PCMK_trace_functions=unpack_rsc_op tools/crm_mon -x pcmk-vr-02-aug-2013/DEM-2/pengine/pe-input-0.bz2 -V 2>&1 | grep -v -e d_bird_subnet_state -e d_bird6 | grep "Unpacking task.*d_bird.*DEM-2"
> 
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428240) on DEM-2 (role=Started)
> (    unpack.c:2100  )   trace: unpack_rsc_op:         Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
> 
> Note the value of 'time' for d_bird_monitor_10000 in the two cases.
> 
> Now I just need to figure out why the value '1375428811' got lost.
> 
>> I have to run "crm resource cleanup <resource>" to clear this
>> behaviour.
>> 
>> I captured this in the attached crm_report.
>> 
>> gr.
>> Johan
>> 
>> On 02-08-13 05:14, Andrew Beekhof wrote:
>>> On 02/08/2013, at 11:42 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>> 
>>>> On 02/08/2013, at 11:33 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>> 
>>>>> On 01/08/2013, at 5:38 PM, Johan Huysmans <johan.huysm...@inuits.be> 
>>>>> wrote:
>>>>> 
>>>>>> I forgot to mention:
>>>>>> 
>>>>>> I'm using a build from git (Version: 1.1.11-1.el6-42f2063).
>>>>>> I used the same config on an old 1.1.10 rc (rc6 or earlier) and that
>>>>>> worked; as of rc7 it no longer works.
>>>>> I will have a look, but why are you setting on-fail=block for everything?
>>>> Ironically the log message for the commit which broke this was:
>>>> 
>>>> commit faa883cf7927d84f61f29211fe6e2980de645620
>>>>   Bug: cl#5170 - Correctly support on-fail=block for clones
>>> Fixed in:
>>> 
>>>     https://github.com/beekhof/pacemaker/commit/66a3ea6
>>> 
>>> + Andrew Beekhof (2 minutes ago) 66a3ea6: Fix: PE: Do not allow colocation with blocked clone instances  (HEAD, master)
>>> + Andrew Beekhof (21 minutes ago) b2c105b: Fix: PE: Do not re-allocate clone instances that are blocked in the Stopped state
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
>> <pcmk-vr-02-aug-2013.tar.bz2>
> 
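The two unpack_rsc_op traces quoted above differ only in the time recorded for d_bird_monitor_10000: 1375428811 (after the failed monitor at 1375428590) in DEM-1's pe-input-8.bz2, but 1375428240 (before it) in DEM-2's pe-input-0.bz2. Both ops carry call_id=51, and in both traces they are listed in time order, so once the newer timestamp is lost the failed monitor (rc=7, "not running") becomes the last op applied. The sketch below models that reading only. `winning_rc` is a hypothetical helper name, and ordering by (call_id, time) is an assumption inferred from these two traces, not a description of what Pacemaker's unpack.c actually does.

```shell
#!/bin/sh
# winning_rc: hypothetical helper, a rough model of which recorded op "wins".
# Input: unpack_rsc_op trace lines on stdin, one op per (unwrapped) line.
# ASSUMPTION: ops are applied in (call_id, time) order and the last one
# decides the reported state. This is inferred from the two traces above,
# not taken from Pacemaker's source.
#
# Pipeline steps:
#   sed  - keep only trace lines, rewritten as "call_id time rc task"
#   sort - order numerically by call_id, then by time
#   tail - take the last op applied
#   awk  - print its rc (0 = ok, 7 = not running)
winning_rc() {
    sed -n 's/.*Unpacking task \([^ ]*\) (call_id=\([0-9]*\), status=[0-9]*, rc=\([0-9]*\), time=\([0-9]*\)).*/\2 \4 \3 \1/p' |
        sort -k1,1n -k2,2n |
        tail -n 1 |
        awk '{ print $3 }'
}
```

Under the same assumption, appending `| winning_rc` to each crm_mon pipeline above (after defining the function in that shell) would print 0 for the DEM-1 trace and 7 for the DEM-2 trace, matching a healthy resource in one case and a failed one in the other.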

