Sorry, but I have one stupid question: do you have a DLM clone resource for using GFS2 in your cluster?
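If not, that would be my first guess: GFS2 with lock_dlm needs dlm_controld (and, for the gfs2 tools, gfs_controld) available on every node that mounts the filesystem. One common way to handle that under Pacemaker is a cloned ocf:pacemaker:controld resource ordered before the filesystem clone. This is just a rough sketch; the resource and constraint names (dlm, dlm-clone, fs-with-dlm, fs-after-dlm) are placeholders you would adapt to your own configuration:

    # minimal sketch - manages dlm_controld; names are placeholders
    primitive dlm ocf:pacemaker:controld \
            op monitor interval="60s" timeout="60s"
    clone dlm-clone dlm \
            meta interleave="true"
    # make sure DLM is up on a node before GFS2 is mounted there
    colocation fs-with-dlm inf: ClusterFSClone dlm-clone
    order fs-after-dlm inf: dlm-clone ClusterFSClone

Keep in mind that with cluster-infrastructure="cman" the controld daemons are usually started by the cman init script itself, so first check whether dlm_controld and gfs_controld are actually running on the node before adding resources for them.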
2012/2/16 Богомолов Дмитрий Викторович <beats...@mail.ru>

> Hi,
> this is $ cat /proc/drbd:
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: DA5A13F16DE6553FC7CE9B2
>
>  1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:1616 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:327516
>
> I tried to mount the DRBD resource by hand with:
> mount /dev/drbd/by-res/clusterdata /mnt/cluster
> and with
> mount /dev/drbd/by-disk/mapper/turrel-cluster_storage /mnt/cluster
> and with
> mount /dev/drbd1 /mnt/cluster
> Each produces these log entries:
> Feb 16 15:00:52 turrel kernel: [80365.686822] dlm_new_lockspace error -512
> Feb 16 15:00:52 turrel kernel: [80539.590344] GFS2: fsid=: Trying to join cluster "lock_dlm", "tumba:data"
> Feb 16 15:00:52 turrel kernel: [80539.603545] dlm: Using TCP for communications
> Feb 16 15:00:52 turrel dlm_controld[855]: process_uevent online@ error -17 errno 11
>
> Both tasks hang; only kill -9 helps.
> After killing the task I get this log entry:
>
> Feb 16 15:02:50 turrel kernel: [80657.576111] dlm: data: group join failed -512 0
>
> I can check the GFS2 filesystem with:
> fsck.gfs2 /dev/drbd1
>
> Initializing fsck
> Validating Resource Group index.
> Level 1 RG check.
> (level 1 passed)
> Starting pass1
> Pass1 complete
> Starting pass1b
> Pass1b complete
> Starting pass1c
> Pass1c complete
> Starting pass2
> Pass2 complete
> Starting pass3
> Pass3 complete
> Starting pass4
> Pass4 complete
> Starting pass5
> Pass5 complete
> gfs2_fsck complete
>
> So, what is going wrong? I can't get it.
>
>
> > Hi,
> > I have a trouble with my test configuration.
> > I have built an Active/Active cluster Ubuntu(11.10)+DRBD+Cman+Pacemaker+gfs2+Xen for test purposes.
> > Now I am doing some availability tests. I am trying to start the cluster on one node.
> >
> > The trouble is that the Filesystem primitive ClusterFS (fstype=gfs2) does not start when one of the two nodes is switched off.
> >
> > Here is my configuration:
> >
> > node blaster \
> >         attributes standby="off"
> > node turrel \
> >         attributes standby="off"
> > primitive ClusterData ocf:linbit:drbd \
> >         params drbd_resource="clusterdata" \
> >         op monitor interval="60s"
> > primitive ClusterFS ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd/by-res/clusterdata" directory="/mnt/cluster" fstype="gfs2" \
> >         op start interval="0" timeout="60s" \
> >         op stop interval="0" timeout="60s" \
> >         op monitor interval="60s" timeout="60s"
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.122.252" cidr_netmask="32" clusterip_hash="sourceip" \
> >         op monitor interval="30s"
> > primitive SSH-stonith stonith:ssh \
> >         params hostlist="turrel blaster" \
> >         op monitor interval="60s"
> > primitive XenDom ocf:heartbeat:Xen \
> >         params xmfile="/etc/xen/xen1.example.com.cfg" \
> >         meta allow-migrate="true" is-managed="true" target-role="Stopped" \
> >         utilization cores="1" mem="512" \
> >         op monitor interval="30s" timeout="30s" \
> >         op start interval="0" timeout="90s" \
> >         op stop interval="0" timeout="300s"
> > ms ClusterDataClone ClusterData \
> >         meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > clone ClusterFSClone ClusterFS \
> >         meta target-role="Started" is-managed="true"
> > clone IP ClusterIP \
> >         meta globally-unique="true" clone-max="2" clone-node-max="2"
> > clone SSH-stonithClone SSH-stonith
> > location prefere-blaster XenDom 50: blaster
> > colocation XenDom-with-ClusterFS inf: XenDom ClusterFSClone
> > colocation fs_on_drbd inf: ClusterFSClone ClusterDataClone:Master
> > order ClusterFS-after-ClusterData inf: ClusterDataClone:promote ClusterFSClone:start
> > order XenDom-after-ClusterFS inf: ClusterFSClone XenDom
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
> >         cluster-infrastructure="cman" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="true" \
> >         no-quorum-policy="ignore" \
> >         last-lrm-refresh="1329194925"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> > Here is the output of $ crm resource show:
> >
> >  Master/Slave Set: ClusterDataClone [ClusterData]
> >      Masters: [ turrel ]
> >      Stopped: [ ClusterData:1 ]
> >  Clone Set: IP [ClusterIP] (unique)
> >      ClusterIP:0        (ocf::heartbeat:IPaddr2) Started
> >      ClusterIP:1        (ocf::heartbeat:IPaddr2) Started
> >  Clone Set: ClusterFSClone [ClusterFS]
> >      Stopped: [ ClusterFS:0 ClusterFS:1 ]
> >  Clone Set: SSH-stonithClone [SSH-stonith]
> >      Started: [ turrel ]
> >      Stopped: [ SSH-stonith:1 ]
> >  XenDom (ocf::heartbeat:Xen) Stopped
> >
> > I tried:
> > crm(live)resource# cleanup ClusterFSClone
> > Cleaning up ClusterFS:0 on turrel
> > Cleaning up ClusterFS:1 on turrel
> > Waiting for 3 replies from the CRMd... OK
> >
> > I can only see warning messages in /var/log/cluster/corosync.log:
> > Feb 14 16:25:56 turrel pengine: [1640]: WARN: unpack_rsc_op: Processing failed op ClusterFS:0_start_0 on turrel: unknown exec error (-2)
> > and
> > Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)
> > Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)
> >
> > Please point me to what I need to check, or anything else.
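A few things I would check first on the single surviving node (just a sketch of commands, the output will of course differ on your setup): whether cman still considers the lone node quorate (in a two-node cluster that usually means two_node="1" expected_votes="1" in cluster.conf), whether fenced has finished dealing with the missing node, and whether a stale DLM lockspace is left over from the hung mounts:

    # cluster membership and quorum as seen by cman
    cman_tool status
    cman_tool nodes

    # are the daemons GFS2 depends on running, and is the fence domain idle?
    ps ax | grep -E 'dlm_controld|gfs_controld|fenced'
    fence_tool ls

    # existing DLM lockspaces (dlm_controld's "error -17" looks like EEXIST,
    # i.e. a leftover "data" lockspace may still be hanging around)
    dlm_tool ls

    # what Pacemaker recorded for the failed ClusterFS start, with fail counts
    crm_mon -1 -f

If the node is inquorate or fencing of the absent node never completes, DLM will block joins, which would be consistent with the dlm_new_lockspace error -512 you see; but that is only my suspicion, so check the above first.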
> >
> > Best regards,
> > Dmitriy Bogomolov
>
> Best regards,
> Dmitriy Bogomolov

--
this is my life and I live it as long as God wills
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org