On Tue, Oct 30, 2012 at 12:20 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Mon, Oct 29, 2012 at 4:22 PM, Soni Maula Harriz
> <soni.har...@sangkuriang.co.id> wrote:
> > dear all,
> > I configured Pacemaker and Corosync on two CentOS 6.3 servers by
> > following the instructions in 'Clusters from Scratch'. I started with
> > edition 5, but since I use CentOS I switched to edition 3 to configure
> > the active/active servers.
> > Now, on the 1st server (cluster1), the Filesystem resource cannot
> > start: the gfs2 filesystem can't be mounted.
> >
> > This is the crm configuration:
> >
> > [root@cluster2 ~]# crm configure show
> > node cluster1 \
> >         attributes standby="off"
> > node cluster2 \
> >         attributes standby="off"
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="xxx.xxx.xxx.229" cidr_netmask="32" clusterip_hash="sourceip" \
> >         op monitor interval="30s"
> > primitive WebData ocf:linbit:drbd \
> >         params drbd_resource="wwwdata" \
> >         op monitor interval="60s"
> > primitive WebFS ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
> > primitive WebSite ocf:heartbeat:apache \
> >         params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://localhost/server-status" \
> >         op monitor interval="1min"
> > ms WebDataClone WebData \
> >         meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > clone WebFSClone WebFS
> > clone WebIP ClusterIP \
> >         meta globally-unique="true" clone-max="2" clone-node-max="1" interleave="false"
> > clone WebSiteClone WebSite \
> >         meta interleave="false"
> > colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
> > colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSiteClone WebIP
> > colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
> > order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
> > order WebSite-after-WebFS inf: WebFSClone WebSiteClone
> > order order-ClusterIP-WebSite-mandatory : WebIP:start WebSiteClone:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> >         cluster-infrastructure="cman" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
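Re-reading the configuration above while replying, I notice WebFS is the
only primitive with no op settings at all: no monitor operation, and no
explicit start/stop timeouts, so those operations fall back to the 20s
default -- which matches the "Timed Out (timeout=20000ms)" errors in the
logs below. If it helps, this is the shape of what I plan to try for
WebFS; the monitor interval and the 40s/60s timeouts are my own guesses,
not values from any documentation:

primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" \
                fstype="gfs2" \
        op monitor interval="20s" timeout="40s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"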
> > When I try to mount the filesystem manually, this message appears:
> >
> > [root@cluster1 ~]# mount /dev/drbd1 /mnt/
> > mount point already used or other mount in progress
> > error mounting lockproto lock_dlm
> >
> > but when I check the mounts, there is no mount from drbd.
>
> what does "ps axf" say? Is there another mount process running?

This is what the system told me:

[root@cluster2 ~]# ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [migration/0]
    4 ?        S      0:00  \_ [ksoftirqd/0]
    5 ?        S      0:00  \_ [migration/0]
    6 ?        S      0:00  \_ [watchdog/0]
    7 ?        S      0:03  \_ [events/0]
    8 ?        S      0:00  \_ [cgroup]
    9 ?        S      0:00  \_ [khelper]
   10 ?        S      0:00  \_ [netns]
   11 ?        S      0:00  \_ [async/mgr]
   12 ?        S      0:00  \_ [pm]
   13 ?        S      0:00  \_ [sync_supers]
   14 ?        S      0:00  \_ [bdi-default]
   15 ?        S      0:00  \_ [kintegrityd/0]
   16 ?        S      0:03  \_ [kblockd/0]
   17 ?        S      0:00  \_ [kacpid]
   18 ?        S      0:00  \_ [kacpi_notify]
   19 ?        S      0:00  \_ [kacpi_hotplug]
   20 ?        S      0:00  \_ [ata/0]
   21 ?        S      0:00  \_ [ata_aux]
   22 ?        S      0:00  \_ [ksuspend_usbd]
   23 ?        S      0:00  \_ [khubd]
   24 ?        S      0:00  \_ [kseriod]
   25 ?        S      0:00  \_ [md/0]
   26 ?        S      0:00  \_ [md_misc/0]
   27 ?        S      0:00  \_ [khungtaskd]
   28 ?        S      0:00  \_ [kswapd0]
   29 ?        SN     0:00  \_ [ksmd]
   30 ?        SN     0:00  \_ [khugepaged]
   31 ?        S      0:00  \_ [aio/0]
   32 ?        S      0:00  \_ [crypto/0]
   37 ?        S      0:00  \_ [kthrotld/0]
   39 ?        S      0:00  \_ [kpsmoused]
   40 ?        S      0:00  \_ [usbhid_resumer]
   71 ?        S      0:00  \_ [kstriped]
  188 ?        S      0:00  \_ [scsi_eh_0]
  190 ?        S      0:00  \_ [scsi_eh_1]
  220 ?        S      0:00  \_ [scsi_eh_2]
  272 ?        S      0:00  \_ [kdmflush]
  273 ?        S      0:00  \_ [kdmflush]
  293 ?        S      0:00  \_ [jbd2/dm-0-8]
  294 ?        S      0:00  \_ [ext4-dio-unwrit]
  853 ?        S      0:00  \_ [kdmflush]
  877 ?        S      0:00  \_ [flush-253:0]
  890 ?        S      0:00  \_ [jbd2/sda1-8]
  891 ?        S      0:00  \_ [ext4-dio-unwrit]
  949 ?        S      0:00  \_ [kauditd]
 1602 ?        S      0:00  \_ [rpciod/0]
 2344 ?        S      0:00  \_ [cqueue]
 2456 ?        S      0:00  \_ [drbd1_worker]
 2831 ?        S      0:00  \_ [glock_workqueue]
 2832 ?        S      0:00  \_ [delete_workqueu]
 2833 ?        S<     0:00  \_ [kslowd001]
 2834 ?        S<     0:00  \_ [kslowd000]
 2846 ?        S      0:00  \_ [dlm_astd]
 2847 ?        S      0:00  \_ [dlm_scand]
 2848 ?        S      0:00  \_ [dlm_recv/0]
 2849 ?        S      0:00  \_ [dlm_send]
 2850 ?        S      0:00  \_ [dlm_recoverd]
    1 ?        Ss     0:01 /sbin/init
  377 ?        S<s    0:00 /sbin/udevd -d
  840 ?        S<     0:00  \_ /sbin/udevd -d
  842 ?        S<     0:00  \_ /sbin/udevd -d
 1182 ?        S<sl   0:00 auditd
 1208 ?        Sl     0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
 1250 ?        Ss     0:00 rpcbind
 1351 ?        SLsl   0:06 corosync -f
 1394 ?        Ssl    0:00 fenced
 1420 ?        Ssl    0:00 dlm_controld
 1467 ?        Ssl    0:00 gfs_controld
 1539 ?        Ss     0:00 dbus-daemon --system
 1550 ?        S      0:00 avahi-daemon: running [cluster2.local]
 1551 ?        Ss     0:00  \_ avahi-daemon: chroot helper
 1568 ?        Ss     0:00 rpc.statd
 1606 ?        Ss     0:00 rpc.idmapd
 1616 ?        Ss     0:00 cupsd -C /etc/cups/cupsd.conf
 1641 ?        Ss     0:00 /usr/sbin/acpid
 1650 ?        Ss     0:00 hald
 1651 ?        S      0:00  \_ hald-runner
 1692 ?        S      0:00      \_ hald-addon-input: Listening on /dev/input/event3 /dev/input/event1 /dev/input/event0
 1695 ?        S      0:00      \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
 1715 ?        Ssl    0:00 automount --pid-file /var/run/autofs.pid
 1740 ?        Ss     0:00 /usr/sbin/sshd
 1979 ?        Ss     0:00  \_ sshd: root@pts/0
 2207 pts/0    Ss     0:00      \_ -bash
 8528 pts/0    R+     0:00          \_ ps axf
 1748 ?        Ss     0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
 1828 ?        Ss     0:00 /usr/libexec/postfix/master
 1834 ?        S      0:00  \_ pickup -l -t fifo -u
 1835 ?        S      0:00  \_ qmgr -l -t fifo -u
 1852 ?        Ss     0:00 /usr/sbin/abrtd
 1860 ?        Ss     0:00 abrt-dump-oops -d /var/spool/abrt -rwx /var/log/messages
 1890 ?        Ss     0:00 crond
 1901 ?        Ss     0:00 /usr/sbin/atd
 1913 ?        Ss     0:00 /usr/sbin/certmonger -S -p /var/run/certmonger.pid
 1939 ?        S      0:00 pacemakerd
 1943 ?        Ss     0:02  \_ /usr/libexec/pacemaker/cib
 1944 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
 1945 ?        Ss     0:01  \_ /usr/lib64/heartbeat/lrmd
 1946 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
 1947 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
 1948 ?        Ss     0:01  \_ /usr/libexec/pacemaker/crmd
 2005 ?        Ss     0:00 /usr/sbin/gdm-binary -nodaemon
 2136 ?        S      0:00  \_ /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt
 2157 tty1     Ss+    0:02      \_ /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-nrpPGF/database -nolisten tcp vt1
 2485 ?        Ssl    0:00      \_ /usr/bin/gnome-session --autostart=/usr/share/gdm/autostart/LoginWindow/
 2595 ?        S      0:00      |   \_ /usr/libexec/at-spi-registryd
 2683 ?        S      0:00      |   \_ metacity
 2705 ?        S      0:00      |   \_ gnome-power-manager
 2706 ?        S      0:00      |   \_ /usr/libexec/gdm-simple-greeter
 2708 ?        S      0:00      |   \_ /usr/libexec/polkit-gnome-authentication-agent-1
 2788 ?        S      0:00      \_ pam: gdm-password
 2028 tty2     Ss+    0:00 /sbin/mingetty /dev/tty2
 2037 tty3     Ss+    0:00 /sbin/mingetty /dev/tty3
 2050 tty4     Ss+    0:00 /sbin/mingetty /dev/tty4
 2062 tty5     Ss+    0:00 /sbin/mingetty /dev/tty5
 2071 tty6     Ss+    0:00 /sbin/mingetty /dev/tty6
 2346 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon
 2474 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session
 2482 ?        Ss     0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
 2527 ?        S      0:00 /usr/libexec/devkit-power-daemon
 2546 ?        S      0:00 /usr/libexec/gconfd-2
 2609 ?        Ssl    0:00 /usr/libexec/gnome-settings-daemon --gconf-prefix=/apps/gdm/simple-greeter/settings-manager-plugins
 2615 ?        Ssl    0:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=12
 2672 ?        S      0:00 /usr/libexec/gvfsd
 2728 ?        S      0:00 /usr/libexec/polkit-1/polkitd
 2744 ?        S<sl   0:00 /usr/bin/pulseaudio --start --log-target=syslog
 2748 ?        SNl    0:00 /usr/libexec/rtkit-daemon
 2843 ?        D      0:00 /sbin/mount.gfs2 /dev/drbd1 /var/www/html -o rw
 3049 ?        D      0:00 blockdev --flushbufs /dev/drbd/by-res/wwwdata
 7881 ?        Ss     0:00 /usr/sbin/anacron -s
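Looking through that listing, PID 2843 (/sbin/mount.gfs2 /dev/drbd1
/var/www/html -o rw) and PID 3049 (blockdev --flushbufs) are both stuck in
state "D" (uninterruptible sleep), so I suspect that stuck mount.gfs2 is
the "other mount in progress" the manual mount complains about. For
reference, a one-liner like this (plain ps/awk, nothing cluster-specific)
filters out just the D-state processes:

ps axo pid,stat,wchan:30,cmd | awk 'NR==1 || $2 ~ /^D/'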
> Did crm_mon report any errors?

[root@cluster2 ~]# crm status
============
Last updated: Wed Oct 31 12:10:31 2012
Last change: Mon Oct 29 17:01:09 2012 via cibadmin on cluster1
Stack: cman
Current DC: cluster2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
8 Resources configured.
============

Online: [ cluster1 cluster2 ]

 Master/Slave Set: WebDataClone [WebData]
     Masters: [ cluster1 cluster2 ]
 Clone Set: WebIP [ClusterIP] (unique)
     ClusterIP:0 (ocf::heartbeat:IPaddr2):    Started cluster1
     ClusterIP:1 (ocf::heartbeat:IPaddr2):    Started cluster2
 Clone Set: WebFSClone [WebFS]
     WebFS:0 (ocf::heartbeat:Filesystem):    Started cluster2 (unmanaged) FAILED
     Stopped: [ WebFS:1 ]

Failed actions:
    WebFS:1_start_0 (node=cluster1, call=14, rc=-2, status=Timed Out): unknown exec error
    WebFS:0_stop_0 (node=cluster2, call=16, rc=-2, status=Timed Out): unknown exec error

> Did you check the system logs?

[root@cluster2 ~]# crm_verify -L -V
warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
warning: common_apply_stickiness: Forcing WebFSClone away from cluster1 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from cluster1 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from cluster2 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from cluster2 after 1000000 failures (max=1000000)
warning: should_dump_input: Ignoring requirement that WebFS:0_stop_0 comeplete before WebFSClone_stopped_0: unmanaged failed resources cannot prevent clone shutdown

[root@cluster2 ~]# grep -i error /var/log/messages
Oct 31 11:12:25 cluster2 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 31 11:12:29 cluster2 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 31 11:12:56 cluster2 crmd[1948]: error: process_lrm_event: LRM operation WebFS:0_start_0 (15) Timed Out (timeout=20000ms)
Oct 31 11:13:17 cluster2 crmd[1948]: error: process_lrm_event: LRM operation WebFS:0_stop_0 (16) Timed Out (timeout=20000ms)
Oct 31 11:15:51 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:16:16 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:31:16 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:05 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:30 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:42 cluster2 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 31 11:39:44 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:44 cluster2 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 31 11:39:53 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:40:13 cluster2 crmd[1948]: warning: status_from_rc: Action 49 (WebFS:1_start_0) on cluster1 failed (target: 0 vs. rc: -2): Error
Oct 31 11:40:13 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:40:13 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:40:14 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:40:14 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:55:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:55:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 12:10:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 12:10:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
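One more thing I take from the crm_verify output: WebFSClone has reached
1000000 failures on both nodes, so even once the mount problem itself is
fixed, Pacemaker will keep the clone away from cluster1 and cluster2 until
the failcounts are cleared. My understanding is that a resource cleanup
does that, along these lines:

[root@cluster2 ~]# crm resource cleanup WebFSClone
[root@cluster2 ~]# crm_mon -1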
> > there is another strange thing: the 1st server (cluster1) cannot
> > reboot. It hangs with the message 'please standby while rebooting the
> > system'. During the reboot process there are 2 failed actions related
> > to fencing, although I haven't configured any fencing yet. One of the
> > failed actions is:
> >
> > 'stopping cluster
> > leaving fence domain .... found dlm lockspace /sys/kernel/dlm/web
> > fence_tool : cannot leave due to active system [FAILED]'
> >
> > please help me with this problem
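Since sending the original mail I have read that the gfs2/dlm stack simply
does not work without fencing: dlm_controld blocks forever waiting for a
fence that can never happen, which would explain both the mount.gfs2
process stuck in D state and fence_tool refusing to leave the fence domain
at shutdown. So I think the real fix is to configure stonith instead of
leaving it disabled. This is only a sketch of what I intend to try -- the
fence_ipmilan agent, the BMC addresses and the credentials are
placeholders for whatever our hardware actually provides:

crm configure primitive fence-cluster1 stonith:fence_ipmilan \
        params pcmk_host_list="cluster1" ipaddr="192.168.122.201" \
                login="admin" passwd="secret" \
        op monitor interval="60s"
crm configure primitive fence-cluster2 stonith:fence_ipmilan \
        params pcmk_host_list="cluster2" ipaddr="192.168.122.202" \
                login="admin" passwd="secret" \
        op monitor interval="60s"
crm configure location fence-cluster1-placement fence-cluster1 -inf: cluster1
crm configure location fence-cluster2-placement fence-cluster2 -inf: cluster2
crm configure property stonith-enabled="true"

(The location constraints are there so a node never runs its own fence
device. On a cman-based setup like this one, 'Clusters from Scratch' also
has cluster.conf hand fencing over to Pacemaker via fence_pcmk; I will
re-check that section as well.)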
> > --
> > Best Regards,
> >
> > Soni Maula Harriz
> > Database Administrator
> > PT. Data Aksara Sangkuriang

--
Best Regards,

Soni Maula Harriz
Database Administrator
PT. Data Aksara Sangkuriang
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org