Hi Junko-san,

On Thu, May 21, 2009 at 06:32:52PM +0900, Junko IKEDA wrote:
> Hi,
>
> I have 4 nodes (dl380g5a, dl380g5b, dl380g5c, dl380g5d),
> and run 1 clone resource with the following configuration.
> clone_max="2"
> clone_node_max="1"
>
> (1) Initial state
> dummy:0 dl380g5a
> dummy:1 dl380g5b
>
> (2) dummy:1 breaks down and moves to dl380g5c
> dummy:0 dl380g5a
> dummy:1 dl380g5c
>
> (3) dummy:1 breaks down again and moves to dl380g5d
> dummy:0 dl380g5a
> dummy:1 dl380g5d
>
> (4) Now the failcounts for dummy:1 are:
> dl380g5c = 1
> dl380g5d = 1
>
> I tried to delete the failcount using crm,
> but it seems that the delete switch doesn't work for clone resources.
>
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=1
> crm(live)resource# failcount dummy:1 delete dl380g5c

I can see this in the logs:

  crm_attribute -N dl380g5c -n fail-count-dummy:1 -D -t status -d 0

Well, that should've deleted the failcount. Unfortunately, I can't see anything else in the logs. I think that you should file a bug.
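For what it's worth, the same attribute can be queried and removed directly with crm_attribute, outside the crm shell. This is only a sketch to help narrow down where the delete gets lost; the -G query option is assumed to be available in your build, the other options are the ones shown above:

  # read the current fail-count for dummy:1 on dl380g5c from the status section
  crm_attribute -N dl380g5c -n fail-count-dummy:1 -G -t status -d 0

  # delete it directly (the same command the shell runs)
  crm_attribute -N dl380g5c -n fail-count-dummy:1 -D -t status -d 0

  # query again; if the value is still 1, the attribute never left the CIB
  crm_attribute -N dl380g5c -n fail-count-dummy:1 -G -t status -d 0

If the value is still there after the direct delete, the output of these commands and the surrounding log lines would be useful to attach to the bug report.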
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=1
>
> Setting the value to "0" worked.
> crm(live)resource# failcount dummy:1 set dl380g5c 0
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=0
>
> Is this the case only with clone resources?

Not sure what you mean.

> And another thing.
> After setting the value to "0",
> the failcount was deleted not only for dl380g5c but also for dl380g5d.

The set value command I can see in the logs is this:

  crm_attribute -N dl380g5c -n fail-count-dummy:1 -v 0 -t status -d 0

That worked fine. In dl380g5d/pengine/pe-input-4.bz2 I can still see that the fail-count for dummy:1 at 5b is set to 1. Then, in dl380g5d/pengine/pe-input-5.bz2 it is not set to 0 but gone. I'm really not sure what triggered the latter transition. Andrew?
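If it helps, the fail-count values the policy engine actually saw can be checked directly in those saved inputs from the hb_report. A rough sketch, assuming the file layout above:

  # show every fail-count nvpair in each transition input
  bzcat dl380g5d/pengine/pe-input-4.bz2 | grep fail-count-dummy
  bzcat dl380g5d/pengine/pe-input-5.bz2 | grep fail-count-dummy

Comparing the two should at least show whether the attributes for 5b and 5c disappeared in the same transition or one at a time.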
> I expected that "failcount <rsc> show <node>" could specify one node.
> Is there anything wrong in the configuration?

Sorry, you lost me here as well.

BTW, I can't find the changeset id from the hb_report in the repository:

CRM Version: 1.0.3 (2e35b8ac90a327c77ff869e1189fc70234213906)

Thanks,

Dejan

> See also the attached hb_report.
>
> Best Regards,
> Junko Ikeda
>
> NTT DATA INTELLILINK CORPORATION
>
> (1) Initial state
> dummy:0 dl380g5a
> dummy:1 dl380g5b
>
> ============
> Last updated: Thu May 21 17:45:16 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
>     Started: [ dl380g5a dl380g5b ]
>
> Operations:
> * Node dl380g5a:
>    dummy:0: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> * Node dl380g5b:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5c:
>
>
> (2) dummy:1 breaks down, and dummy:1 moves to dl380g5c
> dummy:0 dl380g5a
> dummy:1 dl380g5c
>
> ============
> Last updated: Thu May 21 17:46:21 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
>     Started: [ dl380g5a dl380g5c ]
>
> Operations:
> * Node dl380g5a:
>    dummy:0: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> * Node dl380g5b:
>    dummy:1: migration-threshold=1 fail-count=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=7 (not running)
>     + (5) stop: rc=0 (ok)
> * Node dl380g5c:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
>
> Failed actions:
>     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
>
>
> (3) dummy:1 breaks down again, and dummy:1 moves to dl380g5d
> dummy:0 dl380g5a
> dummy:1 dl380g5d
>
> ============
> Last updated: Thu May 21 17:46:51 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
>     Started: [ dl380g5a dl380g5d ]
>
> Operations:
> * Node dl380g5a:
>    dummy:0: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5b:
>    dummy:1: migration-threshold=1 fail-count=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=7 (not running)
>     + (5) stop: rc=0 (ok)
> * Node dl380g5c:
>    dummy:1: migration-threshold=1 fail-count=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=7 (not running)
>     + (5) stop: rc=0 (ok)
>
> Failed actions:
>     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
>     dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running
>
>
> (4) Now the failcounts for dummy:1 are:
> dl380g5c = 1
> dl380g5d = 1
>
> ============
> Last updated: Thu May 21 17:48:06 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
>     Started: [ dl380g5a dl380g5d ]
>
> Operations:
> * Node dl380g5a:
>    dummy:0: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5b:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=7 (not running)
>     + (5) stop: rc=0 (ok)
> * Node dl380g5c:
>    dummy:1: migration-threshold=1
>     + (3) start: rc=0 (ok)
>     + (4) monitor: interval=10000ms rc=7 (not running)
>     + (5) stop: rc=0 (ok)
>
> Failed actions:
>     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
>     dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker