Does nobody have an idea? Or can someone at least tell me whether it is even possible?
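
In case it helps: the only special part of the setup is the pair of meta attributes on resClamd, as shown in the quoted pcs config output below. With pcs they can be set roughly like this (just a sketch, the exact syntax may vary between pcs versions):

# pcs resource meta resClamd failure-timeout=120s migration-threshold=2

and the failcount can be inspected or cleared by hand with:

# pcs resource failcount show resClamd
# pcs resource cleanup resClamd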
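
One thing I am wondering about is whether cluster-recheck-interval plays a role here. As far as I can tell, an expired failure-timeout is only honoured the next time the policy engine runs, which by default happens every 15 minutes (cluster-recheck-interval=15min) unless some other event triggers a transition first. If that is right (it is only a guess on my part), a failure-timeout of 120s could still be pending 7 minutes after the first failure. Lowering the interval, e.g.

# pcs property set cluster-recheck-interval=2min

would be one way to test that theory.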

Thanks
Frank

On 23.01.2014 10:50, Frank Brendel wrote:
> Hi list,
>
> I have some trouble configuring a resource that is allowed to fail
> once in two minutes.
> The documentation states that I have to configure migration-threshold
> and failure-timeout to achieve this.
> Here is the configuration for the resource.
>
> # pcs config
> Cluster Name: mycluster
> Corosync Nodes:
>
> Pacemaker Nodes:
>  Node1 Node2 Node3
>
> Resources:
>  Clone: resClamd-clone
>   Meta Attrs: clone-max=3 clone-node-max=1 interleave=true
>   Resource: resClamd (class=lsb type=clamd)
>    Meta Attrs: failure-timeout=120s migration-threshold=2
>    Operations: monitor on-fail=restart interval=60s
>                (resClamd-monitor-on-fail-restart)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
>
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.10-14.el6_5.1-368c726
>  last-lrm-refresh: 1390468150
>  stonith-enabled: false
>
> # pcs resource defaults
> resource-stickiness: INFINITY
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:12:49 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node2
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
>  Clone Set: resClamd-clone [resClamd]
>      Started: [ Node1 Node2 Node3 ]
>
>
> Stopping the clamd daemon sets the failcount to 1 and the daemon is
> started again. Ok.
>
> # service clamd stop
> Stopping Clam AntiVirus Daemon:                            [ OK ]
>
> /var/log/messages
> Jan 23 10:15:20 Node1 crmd[6075]: notice: process_lrm_event:
>  Node1-resClamd_monitor_60000:305 [ clamd is stopped\n ]
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update
>  relayed from Node2
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_trigger_update:
>  Sending flush op to all hosts for: fail-count-resClamd (1)
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_perform_update:
>  Sent update 177: fail-count-resClamd=1
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update
>  relayed from Node2
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_trigger_update:
>  Sending flush op to all hosts for: last-failure-resClamd (1390468520)
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_perform_update:
>  Sent update 179: last-failure-resClamd=1390468520
> Jan 23 10:15:20 Node1 crmd[6075]: notice: process_lrm_event:
>  Node1-resClamd_monitor_60000:305 [ clamd is stopped\n ]
> Jan 23 10:15:21 Node1 crmd[6075]: notice: process_lrm_event: LRM
>  operation resClamd_stop_0 (call=310, rc=0, cib-update=110,
>  confirmed=true) ok
> Jan 23 10:15:30 elmailtst1 crmd[6075]: notice: process_lrm_event:
>  LRM operation resClamd_start_0 (call=314, rc=0, cib-update=111,
>  confirmed=true) ok
> Jan 23 10:15:30 elmailtst1 crmd[6075]: notice: process_lrm_event:
>  LRM operation resClamd_monitor_60000 (call=317, rc=0, cib-update=112,
>  confirmed=false) ok
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:16:48 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node1
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
>  Clone Set: resClamd-clone [resClamd]
>      Started: [ Node1 Node2 Node3 ]
>
> Failed actions:
>     resClamd_monitor_60000 on Node1 'not running' (7): call=305,
>     status=complete, last-rc-change='Thu Jan 23 10:15:20 2014',
>     queued=0ms, exec=0ms
>
> # pcs resource failcount show resClamd
> Failcounts for resClamd
>  Node1: 1
>
>
> After 7 minutes I let it fail again and, as I understood it, it should
> be started again as well. But it isn't.
>
> # service clamd stop
> Stopping Clam AntiVirus Daemon:                            [ OK ]
>
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: LRM
>  operation resClamd_monitor_60000 (call=317, rc=7, cib-update=113,
>  confirmed=false) not running
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event:
>  Node1-resClamd_monitor_60000:317 [ clamd is stopped\n ]
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update
>  relayed from Node2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_trigger_update:
>  Sending flush op to all hosts for: fail-count-resClamd (2)
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_perform_update:
>  Sent update 181: fail-count-resClamd=2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update
>  relayed from Node2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_trigger_update:
>  Sending flush op to all hosts for: last-failure-resClamd (1390468950)
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_perform_update:
>  Sent update 183: last-failure-resClamd=1390468950
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event:
>  Node1-resClamd_monitor_60000:317 [ clamd is stopped\n ]
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: LRM
>  operation resClamd_stop_0 (call=322, rc=0, cib-update=114,
>  confirmed=true) ok
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:22:41 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node1
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
>  Clone Set: resClamd-clone [resClamd]
>      Started: [ Node2 Node3 ]
>      Stopped: [ Node1 ]
>
> Failed actions:
>     resClamd_monitor_60000 on Node1 'not running' (7): call=317,
>     status=complete, last-rc-change='Thu Jan 23 10:22:30 2014',
>     queued=0ms, exec=0ms
>
>
> What's wrong with my configuration?
>
>
> Thanks in advance
> Frank

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org