I believe this patch should help: https://github.com/beekhof/pacemaker/commit/7a0a6f8
Can you give it a try?

On 07/08/2013, at 12:28 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> On 02/08/2013, at 5:56 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:
>
>> Hi Andrew,
>>
>> Thanks for the fix.
>> I tried it on my setup and now when a cloned resource fails the group will
>> move to the other node as expected.
>>
>> However I noticed something strange.
>> If a cloned resource is failing I see this in the logs:
>> pengine[12178]: warning: unpack_rsc_op: Processing failed op monitor for d_bird:1 on DEM-2: not running (7)
>>
>> If that same cloned resource is recovered I still see that same message
>> appear in the logfile.
>> But crm_mon shows it correctly and it functions correctly.
>>
>> However when I restart the other node (or restart only the pacemaker
>> service) it reappears as failed in the crm_mon and the cluster behaves as it
>> is failing, however it isn't.
>
> It comes down to this:
>
> # PCMK_trace_functions=unpack_rsc_op tools/crm_mon -x pcmk-vr-02-aug-2013/DEM-1/pengine/pe-input-8.bz2 -V 2>&1 | grep -v -e d_bird_subnet_state -e d_bird6 | grep "Unpacking task.*d_bird.*DEM-2"
>
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428811) on DEM-2 (role=Started)
>
> vs.
>
> # PCMK_trace_functions=unpack_rsc_op tools/crm_mon -x pcmk-vr-02-aug-2013/DEM-2/pengine/pe-input-0.bz2 -V 2>&1 | grep -v -e d_bird_subnet_state -e d_bird6 | grep "Unpacking task.*d_bird.*DEM-2"
>
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428240) on DEM-2 (role=Started)
> ( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
>
> Note the value of 'time' for d_bird_monitor_10000 in the two cases.
>
> Now I just need to figure out why the value '1375428811' got lost.
>
>> I have to perform a "crm resource cleanup <resource>" to clear this
>> behaviour.
>>
>> I captured this in the attached crm_report.
>>
>> gr.
>> Johan
>>
>> On 02-08-13 05:14, Andrew Beekhof wrote:
>>> On 02/08/2013, at 11:42 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>
>>>> On 02/08/2013, at 11:33 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>>
>>>>> On 01/08/2013, at 5:38 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:
>>>>>
>>>>>> I forgot to mention:
>>>>>>
>>>>>> I'm using a build from git (Version: 1.1.11-1.el6-42f2063).
>>>>>> I used the same config on an old 1.1.10 rc (rc6 or before) and that
>>>>>> worked, as of rc7 it didn't work anymore.
>>>>> I will have a look, but why are you setting on-fail=block for everything?
>>>> Ironically the log message for the commit which broke this was:
>>>>
>>>> commit faa883cf7927d84f61f29211fe6e2980de645620
>>>> Bug: cl#5170 - Correctly support on-fail=block for clones
>>> Fixed in:
>>>
>>> https://github.com/beekhof/pacemaker/commit/66a3ea6
>>>
>>> + Andrew Beekhof (2 minutes ago) 66a3ea6: Fix: PE: Do not allow colocation with blocked clone instances (HEAD, master)
>>> + Andrew Beekhof (21 minutes ago) b2c105b: Fix: PE: Do not re-allocate clone instances that are blocked in the Stopped state
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>> <pcmk-vr-02-aug-2013.tar.bz2>
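
For anyone who wants to repeat the comparison of the two pe-input traces quoted above without reading them by eye, here is a rough sketch. It is not part of pacemaker: the helper names parse_trace and differing_times are made up for illustration, and the regular expression only assumes the "Unpacking task" line format shown in the quoted output. It reports every operation whose 'time' value differs between the two traces.

```python
import re

# Matches the "Unpacking task" trace lines quoted in the thread, e.g.
# "... Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0,
#  rc=0, time=1375428811) on DEM-2 (role=Started)"
TRACE_RE = re.compile(
    r"Unpacking task (?P<task>\S+) "
    r"\(call_id=(?P<call_id>\d+), status=(?P<status>\d+), "
    r"rc=(?P<rc>\d+), time=(?P<time>\d+)\) on (?P<node>\S+)"
)


def parse_trace(text):
    """Map (node, task) to the 'time' value found on each trace line."""
    times = {}
    for line in text.splitlines():
        match = TRACE_RE.search(line)
        if match:
            times[(match["node"], match["task"])] = int(match["time"])
    return times


def differing_times(trace_a, trace_b):
    """Return {(node, task): (time_a, time_b)} where the two traces disagree."""
    a, b = parse_trace(trace_a), parse_trace(trace_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys() if a[key] != b[key]}


# Trace from DEM-1/pengine/pe-input-8.bz2, copied from the thread.
DEM1_TRACE = """\
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428811) on DEM-2 (role=Started)
"""

# Trace from DEM-2/pengine/pe-input-0.bz2, copied from the thread.
DEM2_TRACE = """\
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_0/start (call_id=41, status=0, rc=0, time=1375428240) on DEM-2 (role=Unknown)
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_monitor_10000/monitor (call_id=51, status=0, rc=0, time=1375428240) on DEM-2 (role=Started)
( unpack.c:2100 ) trace: unpack_rsc_op: Unpacking task d_bird_last_failure_0/monitor (call_id=51, status=0, rc=7, time=1375428590) on DEM-2 (role=Started)
"""

if __name__ == "__main__":
    diffs = differing_times(DEM1_TRACE, DEM2_TRACE)
    for (node, task), (t1, t2) in sorted(diffs.items()):
        print(f"{task} on {node}: {t1} vs {t2}")
```

On the two traces from this thread it flags only d_bird_monitor_10000/monitor on DEM-2 (1375428811 vs 1375428240), which is the lost value mentioned above. To use it on other inputs, paste the grep output from each crm_mon run in place of the two sample strings.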