On Tue, May 25, 2010 at 3:39 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Thu, May 20, 2010 at 06:09:01PM +0200, Gianluca Cecchi wrote:
>> Hello,
>> the manual for 1.0 (and 1.1) reports this for Advisory Ordering:
>>
>> On the other hand, when score="0" is specified for a constraint, the
>> constraint is considered optional and only has an effect when both resources
>> are stopping and/or starting. Any change in state by the first resource will
>> have no effect on the "then" resource.
>>
>> (there is also a link to
>> http://www.clusterlabs.org/mediawiki/images/d/d6/Ordering_Explained.pdf to
>> go deeper into constraints, but it seems broken right now...)
>>
>> Is this also true for an order defined between a group and a clone, and not
>> between plain resources?
>> Because I have this config:
>>
>> order apache_after_nfsd 0: nfs-group apache_clone
>>
>> where
>>
>> group nfs-group lv_drbd0 ClusterIP NfsFS nfssrv \
>>     meta target-role="Started"
>>
>> group apache_group nfsclient apache \
>>     meta target-role="Started"
>>
>> clone apache_clone apache_group \
>>     meta target-role="Started"
>>
>> And when I have both nodes up but with corosync stopped on both, and I start
>> corosync on one node, I see in the logs that:
>> - inside nfs-group, the lv_drbd0 (linbit drbd) resource is just promoted, but
>>   the following components (nfssrv in particular) have not started yet
>> - the nfsclient part of apache_clone tries to start, but fails because
>>   nfssrv is not in place yet
>>
>> I get the same problem if I change it to
>> order apache_after_nfsd 0: nfssrv apache_clone
>>
>> So I presume the problem could be caused by the fact that the second part is
>> a clone and not a plain resource? Or is it a bug?
>> I can send the whole config if needed.
>
> Looks like a bug to me. Clone or resource, constraints should be
> observed. Perhaps it's a duplicate of this one:
> http://developerbugs.linux-foundation.org/show_bug.cgi?id=2422
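For reference, the score is what makes the constraint quoted above advisory; in crm shell syntax the mandatory form differs only in the score. A sketch (not tested against this configuration):

```
# advisory: only ordered when both sides happen to start/stop in the same transition
order apache_after_nfsd 0: nfs-group apache_clone

# mandatory: apache_clone instances must wait for nfs-group to be started
order apache_after_nfsd inf: nfs-group apache_clone
```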
No. That one only applies to interleaved clone-to-clone ordering, when one
clone contains a group and the group is partially active. Quite a specific
scenario.

>
>> Setting a value different from 0 for the interval parameter of op start
>> for nfsclient doesn't make sense, correct?
>
> Correct.
>
>> What would it determine?
>> A start of the resource every x seconds?
>
> Yes. crmd wouldn't even allow it.
>
> Thanks,
>
> Dejan
>
>> At the end of the process I have:
>> [r...@webtest1 ]# crm_mon -fr1
>> ============
>> Last updated: Thu May 20 17:58:38 2010
>> Stack: openais
>> Current DC: webtest1. - partition WITHOUT quorum
>> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ webtest1. ]
>> OFFLINE: [ webtest2. ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: NfsData
>>     Masters: [ webtest1. ]
>>     Stopped: [ nfsdrbd:1 ]
>> Resource Group: nfs-group
>>     lv_nfsdata_drbd (ocf::heartbeat:LVM): Started webtest1.
>>     NfsFS (ocf::heartbeat:Filesystem): Started webtest1.
>>     VIPlbtest (ocf::heartbeat:IPaddr2): Started webtest1.
>>     nfssrv (ocf::heartbeat:nfsserver): Started webtest1.
>> Clone Set: cl-pinggw
>>     Started: [ webtest1. ]
>>     Stopped: [ pinggw:1 ]
>> Clone Set: apache_clone
>>     Stopped: [ apache_group:0 apache_group:1 ]
>>
>> Migration summary:
>> * Node webtest1.: pingd=200
>>    nfsclient:0: migration-threshold=1000000 fail-count=1000000
>>
>> Failed actions:
>>     nfsclient:0_start_0 (node=webtest1., call=15, rc=1, status=complete): unknown error
>>
>> Example logs for the second case:
>>
>> May 20 17:33:55 webtest1 pengine: [14080]: info: determine_online_status: Node webtest1. is online
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print: Master/Slave Set: NfsData
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print: Stopped: [ nfsdrbd:0 nfsdrbd:1 ]
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: group_print: Resource Group: nfs-group
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: lv_nfsdata_drbd (ocf::heartbeat:LVM): Stopped
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: NfsFS (ocf::heartbeat:Filesystem): Stopped
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: VIPlbtest (ocf::heartbeat:IPaddr2): Stopped
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: nfssrv (ocf::heartbeat:nfsserver): Stopped
>> ...
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print: Clone Set: apache_clone
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print: Stopped: [ apache_group:0 apache_group:1 ]
>> ...
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsdrbd:0 (webtest1.)
>> ...
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsclient:0 (webtest1.)
>> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start apache:0 (webtest1.)
>> ...
>> May 20 17:33:57 webtest1 kernel: block drbd0: Starting worker thread (from cqueue/0 [68])
>> May 20 17:33:57 webtest1 kernel: block drbd0: disk( Diskless -> Attaching )
>> May 20 17:33:57 webtest1 kernel: block drbd0: Found 4 transactions (7 active extents) in activity log.
>> May 20 17:33:57 webtest1 kernel: block drbd0: Method to ensure write ordering: barrier
>> May 20 17:33:57 webtest1 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
>> May 20 17:33:57 webtest1 kernel: block drbd0: drbd_bm_resize called with capacity == 8388280
>> May 20 17:33:57 webtest1 kernel: block drbd0: resync bitmap: bits=1048535 words=32768
>> May 20 17:33:57 webtest1 kernel: block drbd0: size = 4096 MB (4194140 KB)
>> May 20 17:33:57 webtest1 kernel: block drbd0: recounting of set bits took additional 0 jiffies
>> May 20 17:33:57 webtest1 kernel: block drbd0: 144 KB (36 bits) marked out-of-sync by on disk bit-map.
>> May 20 17:33:57 webtest1 kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
>> May 20 17:33:57 webtest1 kernel: block drbd0: conn( StandAlone -> Unconnected )
>> May 20 17:33:57 webtest1 kernel: block drbd0: Starting receiver thread (from drbd0_worker [14378])
>> May 20 17:33:57 webtest1 kernel: block drbd0: receiver (re)started
>> May 20 17:33:57 webtest1 kernel: block drbd0: conn( Unconnected -> WFConnection )
>> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
>> May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: master-nfsdrbd:0 (10000)
>> May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 11: master-nfsdrbd:0=10000
>> May 20 17:33:57 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.11) : Transient attribute: update
>> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
>> May 20 17:33:57 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_start_0 (call=10, rc=0, cib-update=37, confirmed=true) ok
>> May 20 17:33:57 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_start_0 (12) confirmed on webtest1. (rc=0)
>> May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
>> May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
>> May 20 17:33:57 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 90: notify nfsdrbd:0_post_notify_start_0 on webtest1. (local)
>> May 20 17:33:57 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=90:1:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
>> May 20 17:33:57 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:12: notify
>> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
>> ...
>> May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: Promoting nfsdrbd:0 (Slave webtest1.)
>> May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: NfsData: Promoted 1 instances of a possible 1 to master
>> ...
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 85: notify nfsdrbd:0_pre_notify_promote_0 on webtest1. (local)
>> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=85:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
>> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:14: notify
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 47 fired and confirmed
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 43: start nfsclient:0_start_0 on webtest1. (local)
>> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=43:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsclient:0_start_0 )
>> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsclient:0:15: start
>> May 20 17:34:01 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=14, rc=0, cib-update=41, confirmed=true) ok
>> May 20 17:34:01 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_pre_notify_promote_0 (85) confirmed on webtest1. (rc=0)
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 23 fired and confirmed
>> ...
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 7: promote nfsdrbd:0_promote_0 on webtest1. (local)
>> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=7:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_promote_0 )
>> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:16: promote
>> May 20 17:34:02 webtest1 kernel: block drbd0: role( Secondary -> Primary )
>> May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:promote:stdout)
>> May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_promote_0 (call=16, rc=0, cib-update=42, confirmed=true) ok
>> May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_promote_0 (7) confirmed on webtest1. (rc=0)
>> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
>> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> May 20 17:34:02 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 86: notify nfsdrbd:0_post_notify_promote_0 on webtest1. (local)
>> May 20 17:34:02 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=86:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
>> May 20 17:34:02 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:17: notify
>> May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
>> May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=17, rc=0, cib-update=43, confirmed=true) ok
>> May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_post_notify_promote_0 (86) confirmed on webtest1. (rc=0)
>> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> May 20 17:34:02 webtest1 Filesystem[14438]: INFO: Running start for viplbtest.:/nfsdata/web on /usr/local/data
>> May 20 17:34:06 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation pinggw:0_monitor_10000 (call=13, rc=0, cib-update=44, confirmed=false) ok
>> May 20 17:34:06 webtest1 crmd: [14081]: info: match_graph_event: Action pinggw:0_monitor_10000 (38) confirmed on webtest1. (rc=0)
>> May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (200)
>> May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 14: pingd=200
>> May 20 17:34:11 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.19) : Transient attribute: update
>> May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
>> May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort action done superceeded by restart
>> May 20 17:34:14 webtest1 lrmd: [14078]: info: RA output: (nfsclient:0:start:stderr) mount: mount to NFS server 'viplbtest.'
>> failed: System Error: No route to host.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf