Hi,

On Thu, May 20, 2010 at 06:09:01PM +0200, Gianluca Cecchi wrote:
> Hello,
> the manual for 1.0 (and 1.1) says this about Advisory Ordering:
>
> "On the other hand, when score="0" is specified for a constraint, the constraint is considered optional and only has an effect when both resources are stopping and/or starting. Any change in state by the first resource will have no effect on the then resource."
>
> (There is also a link to http://www.clusterlabs.org/mediawiki/images/d/d6/Ordering_Explained.pdf for going deeper into constraints, but it seems broken right now...)
>
> Does this also hold for an order defined between a group and a clone, rather than between plain resources?
> I ask because I have this config:
>
> order apache_after_nfsd 0: nfs-group apache_clone
>
> where
>
> group nfs-group lv_drbd0 ClusterIP NfsFS nfssrv \
>     meta target-role="Started"
>
> group apache_group nfsclient apache \
>     meta target-role="Started"
>
> clone apache_clone apache_group \
>     meta target-role="Started"
>
> When both nodes are up but corosync is stopped on both, and I then start corosync on one node, the logs show that:
> - inside nfs-group, lv_drbd0 (the Linbit drbd resource) is promoted, but the subsequent members (nfssrv in particular) have not started yet
> - the nfsclient part of apache_clone tries to start, but fails because nfssrv is not in place yet
>
> I get the same problem if I change the constraint to
>
> order apache_after_nfsd 0: nfssrv apache_clone
>
> So I presume the problem could be caused by the second operand being a clone rather than a plain resource? Or is it a bug?
> I can send the whole config if needed.
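For context on the constraint under discussion: an advisory (score="0") constraint only orders actions that the cluster happens to schedule in the same transition, whereas an INFINITY score makes the start of the second resource actually depend on the first. A sketch of the mandatory variant, using the same resource names as above (whether it fits this setup is an assumption on the desired behaviour, not a fix confirmed in this thread):

```
# Mandatory ordering: apache_clone instances may start only after
# nfs-group has started completely (and, because order constraints
# are symmetrical by default, they stop before nfs-group stops).
order apache_after_nfsd inf: nfs-group apache_clone
```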
Looks like a bug to me. Clone or resource, constraints should be observed. Perhaps it's a duplicate of this one:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2422

> Setting a value different from 0 for the interval parameter of op start for nfsclient doesn't make sense, correct?

Correct.

> What would it determine? A start every x seconds of the resource?

Yes. crmd wouldn't even allow it.

Thanks,

Dejan

> At the end of the process I have:
>
> [r...@webtest1 ]# crm_mon -fr1
> ============
> Last updated: Thu May 20 17:58:38 2010
> Stack: openais
> Current DC: webtest1. - partition WITHOUT quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ webtest1. ]
> OFFLINE: [ webtest2. ]
>
> Full list of resources:
>
> Master/Slave Set: NfsData
>     Masters: [ webtest1. ]
>     Stopped: [ nfsdrbd:1 ]
> Resource Group: nfs-group
>     lv_nfsdata_drbd (ocf::heartbeat:LVM):   Started webtest1.
>     NfsFS (ocf::heartbeat:Filesystem):      Started webtest1.
>     VIPlbtest (ocf::heartbeat:IPaddr2):     Started webtest1.
>     nfssrv (ocf::heartbeat:nfsserver):      Started webtest1.
> Clone Set: cl-pinggw
>     Started: [ webtest1. ]
>     Stopped: [ pinggw:1 ]
> Clone Set: apache_clone
>     Stopped: [ apache_group:0 apache_group:1 ]
>
> Migration summary:
> * Node webtest1.: pingd=200
>     nfsclient:0: migration-threshold=1000000 fail-count=1000000
>
> Failed actions:
>     nfsclient:0_start_0 (node=webtest1., call=15, rc=1, status=complete): unknown error
>
> Example logs for the second case:
>
> May 20 17:33:55 webtest1 pengine: [14080]: info: determine_online_status: Node webtest1. is online
> May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print: Master/Slave Set: NfsData
> May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print: Stopped: [ nfsdrbd:0 nfsdrbd:1 ]
> May 20 17:33:55 webtest1 pengine: [14080]: notice: group_print: Resource Group: nfs-group
> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: lv_nfsdata_drbd (ocf::heartbeat:LVM): Stopped
> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: NfsFS (ocf::heartbeat:Filesystem): Stopped
> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: VIPlbtest (ocf::heartbeat:IPaddr2): Stopped
> May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print: nfssrv (ocf::heartbeat:nfsserver): Stopped
> ...
> May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print: Clone Set: apache_clone
> May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print: Stopped: [ apache_group:0 apache_group:1 ]
> ...
> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsdrbd:0 (webtest1.)
> ...
> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsclient:0 (webtest1.)
> May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start apache:0 (webtest1.)
> ...
> May 20 17:33:57 webtest1 kernel: block drbd0: Starting worker thread (from cqueue/0 [68])
> May 20 17:33:57 webtest1 kernel: block drbd0: disk( Diskless -> Attaching )
> May 20 17:33:57 webtest1 kernel: block drbd0: Found 4 transactions (7 active extents) in activity log.
> May 20 17:33:57 webtest1 kernel: block drbd0: Method to ensure write ordering: barrier
> May 20 17:33:57 webtest1 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
> May 20 17:33:57 webtest1 kernel: block drbd0: drbd_bm_resize called with capacity == 8388280
> May 20 17:33:57 webtest1 kernel: block drbd0: resync bitmap: bits=1048535 words=32768
> May 20 17:33:57 webtest1 kernel: block drbd0: size = 4096 MB (4194140 KB)
> May 20 17:33:57 webtest1 kernel: block drbd0: recounting of set bits took additional 0 jiffies
> May 20 17:33:57 webtest1 kernel: block drbd0: 144 KB (36 bits) marked out-of-sync by on disk bit-map.
> May 20 17:33:57 webtest1 kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
> May 20 17:33:57 webtest1 kernel: block drbd0: conn( StandAlone -> Unconnected )
> May 20 17:33:57 webtest1 kernel: block drbd0: Starting receiver thread (from drbd0_worker [14378])
> May 20 17:33:57 webtest1 kernel: block drbd0: receiver (re)started
> May 20 17:33:57 webtest1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
> May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: master-nfsdrbd:0 (10000)
> May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 11: master-nfsdrbd:0=10000
> May 20 17:33:57 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.11) : Transient attribute: update
> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
> May 20 17:33:57 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_start_0 (call=10, rc=0, cib-update=37, confirmed=true) ok
> May 20 17:33:57 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_start_0 (12) confirmed on webtest1. (rc=0)
> May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
> May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> May 20 17:33:57 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 90: notify nfsdrbd:0_post_notify_start_0 on webtest1. (local)
> May 20 17:33:57 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=90:1:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
> May 20 17:33:57 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:12: notify
> May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
> ...
> May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: Promoting nfsdrbd:0 (Slave webtest1.)
> May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: NfsData: Promoted 1 instances of a possible 1 to master
> ...
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 85: notify nfsdrbd:0_pre_notify_promote_0 on webtest1. (local)
> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=85:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:14: notify
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 47 fired and confirmed
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 43: start nfsclient:0_start_0 on webtest1. (local)
> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=43:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsclient:0_start_0 )
> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsclient:0:15: start
> May 20 17:34:01 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=14, rc=0, cib-update=41, confirmed=true) ok
> May 20 17:34:01 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_pre_notify_promote_0 (85) confirmed on webtest1. (rc=0)
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 23 fired and confirmed
> ...
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 7: promote nfsdrbd:0_promote_0 on webtest1. (local)
> May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=7:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_promote_0 )
> May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:16: promote
> May 20 17:34:02 webtest1 kernel: block drbd0: role( Secondary -> Primary )
> May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:promote:stdout)
> May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_promote_0 (call=16, rc=0, cib-update=42, confirmed=true) ok
> May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_promote_0 (7) confirmed on webtest1. (rc=0)
> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> May 20 17:34:02 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 86: notify nfsdrbd:0_post_notify_promote_0 on webtest1. (local)
> May 20 17:34:02 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=86:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
> May 20 17:34:02 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:17: notify
> May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
> May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=17, rc=0, cib-update=43, confirmed=true) ok
> May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_post_notify_promote_0 (86) confirmed on webtest1. (rc=0)
> May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> May 20 17:34:02 webtest1 Filesystem[14438]: INFO: Running start for viplbtest.:/nfsdata/web on /usr/local/data
> May 20 17:34:06 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation pinggw:0_monitor_10000 (call=13, rc=0, cib-update=44, confirmed=false) ok
> May 20 17:34:06 webtest1 crmd: [14081]: info: match_graph_event: Action pinggw:0_monitor_10000 (38) confirmed on webtest1. (rc=0)
> May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (200)
> May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 14: pingd=200
> May 20 17:34:11 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.19) : Transient attribute: update
> May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort action done superceeded by restart
> May 20 17:34:14 webtest1 lrmd: [14078]: info: RA output: (nfsclient:0:start:stderr) mount: mount to NFS server 'viplbtest.' failed: System Error: No route to host.
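On the op interval question answered above: a start operation is one-shot, so its interval stays at 0; only recurring monitor operations take a non-zero interval. A minimal crm shell sketch of the usual shape (the agent, parameters, and timeout values are illustrative assumptions, not taken from the poster's configuration):

```
# Hypothetical NFS-client mount resource; placeholders in <angle brackets>
# and all timeouts are assumptions for illustration only.
primitive nfsclient ocf:heartbeat:Filesystem \
    params device="<nfs-server>:/<export>" directory="<mountpoint>" fstype="nfs" \
    op start interval="0" timeout="60s" \
    op monitor interval="20s" timeout="40s"
```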
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf