Hi,

The people who look after the dlm are on the linux-clus...@redhat.com mailing list. Best to direct this issue there.
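When posting there, it usually helps to send just the oops rather than the whole log. A minimal sketch for pulling it out, assuming each report is bracketed by the "cut here" and "end trace" markers as in the log quoted below. The function name `extract_oops_blocks` is made up for illustration and is not part of any cluster tool:

```python
import re
from typing import Iterable, List, Optional

# Markers the kernel prints around a BUG report, as seen in the quoted log.
START_MARKER = "[ cut here ]"
END_MARKER = re.compile(r"\[ end trace [0-9a-f]+ \]")


def extract_oops_blocks(lines: Iterable[str]) -> List[List[str]]:
    """Collect each run of lines from a 'cut here' marker to its 'end trace' marker."""
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None  # None means we are outside a report
    for raw in lines:
        line = raw.rstrip("\n")
        if START_MARKER in line:
            # A new report begins; an unfinished previous one is discarded.
            current = [line]
        elif current is not None:
            current.append(line)
            if END_MARKER.search(line):
                blocks.append(current)
                current = None
    return blocks
```

Feeding it the open file, e.g. `extract_oops_blocks(open("/var/log/messages"))`, should give one list of lines per report, ready to paste into a mail.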
-- Andrew

On Sun, Jan 3, 2010 at 8:30 PM, Daniel Qian <dan...@bestningning.com> wrote:
> I came a long way to set up this two-node cluster of pacemaker +
> openais/corosync + ocfs2 + DLM + drbd on Fedora 12. I resolved issues one
> after another until I hit this last hurdle which is beyond my power to
> overcome. All other components are working fine.
>
> [r...@ilo150 ~]# crm_mon -1
>
> ============
> Last updated: Sun Jan 3 12:17:17 2010
> Stack: openais
> Current DC: ilo143 - partition with quorum
> Version: 1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ ilo143 ilo150 ]
>
> Master/Slave Set: drbd_clone0
>     Masters: [ ilo143 ilo150 ]
> Clone Set: dlm-clone
>     Started: [ ilo143 ilo150 ]
> Clone Set: o2cb-clone
>     Started: [ ilo143 ilo150 ]
> Clone Set: ip-clone (unique)
>     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started ilo143
>     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started ilo143
>
> However I start having this problem when I try to mount the ocfs2 file
> system by typing "crm resource start fs0-clone". Snippet from
> /var/log/messages
>
> Jan 2 17:46:13 ilo150 kernel: ------------[ cut here ]------------
> Jan 2 17:46:13 ilo150 kernel: kernel BUG at fs/dlm/lowcomms.c:861!
> Jan 2 17:46:13 ilo150 kernel: invalid opcode: 0000 [#1] SMP
> Jan 2 17:46:13 ilo150 kernel: last sysfs file:
> /sys/kernel/dlm/5316FDFD93BB4F7E97B296FC513FA149/event_done
> Jan 2 17:46:13 ilo150 kernel: CPU 1
> Jan 2 17:46:13 ilo150 kernel: Modules linked in: sctp libcrc32c ocfs2
> ocfs2_nodemanager ocfs2_stack_user ocfs2_stackglue dlm drbd configfs ipv6
> bnx2 ipmi_si serio_raw ipmi_msghandler hpwdt iTCO_wdt iTCO_vendor_support
> cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded:
> scsi_wait_scan]
> Jan 2 17:46:13 ilo150 kernel: Pid: 2918, comm: dlm_send Not tainted
> 2.6.31.9-174.fc12.x86_64 #1 ProLiant DL360 G6
> Jan 2 17:46:13 ilo150 kernel: RIP: 0010:[<ffffffffa01d75c9>]
> [<ffffffffa01d75c9>] sctp_init_assoc+0x13e/0x2c1 [dlm]
> Jan 2 17:46:13 ilo150 kernel: RSP: 0018:ffff8808e9bdbc20 EFLAGS: 00010246
> Jan 2 17:46:13 ilo150 kernel: RAX: ffff8808e9920038 RBX: ffff8808e9920000
> RCX: 0000000000000000
> Jan 2 17:46:13 ilo150 kernel: RDX: 0000000000000000 RSI: 0000000000524852
> RDI: ffff8808e9920048
> Jan 2 17:46:13 ilo150 kernel: RBP: ffff8808e9bdbe00 R08: 0000000000000000
> R09: ffff88091f804200
> Jan 2 17:46:13 ilo150 kernel: R10: ffff88091f804200 R11: 0000000000000000
> R12: ffff8808e9920038
> Jan 2 17:46:13 ilo150 kernel: R13: ffff8808e9920048 R14: ffff8808eed9a000
> R15: ffff8808e9bdbe80
> Jan 2 17:46:13 ilo150 kernel: FS: 0000000000000000(0000)
> GS:ffff880028053000(0000) knlGS:0000000000000000
> Jan 2 17:46:13 ilo150 kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> Jan 2 17:46:13 ilo150 kernel: CR2: 00007fc4485c9000 CR3: 0000000001001000
> CR4: 00000000000006e0
> Jan 2 17:46:13 ilo150 kernel: DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> Jan 2 17:46:13 ilo150 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0
> DR7: 0000000000000400
> Jan 2 17:46:13 ilo150 kernel: Process dlm_send (pid: 2918, threadinfo
> ffff8808e9bda000, task ffff8808e9be0000)
> Jan 2 17:46:13 ilo150 kernel: Stack:
> Jan 2 17:46:13 ilo150 kernel: 0000000000000000 0000000000000000
> ffff8808e9bdbd10 0000000000000010
> Jan 2 17:46:13 ilo150 kernel: <0> 0000000000000000 0000000000000000
> ffff8808e9bdbd90 0000000000000030
> Jan 2 17:46:13 ilo150 kernel: <0> 0000000000000080 0000000000000000
> 0000000000000000 0000000000000000
> Jan 2 17:46:13 ilo150 kernel: Call Trace:
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff810106c5>] ?
> __switch_to+0x18b/0x217
> Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ?
> process_send_sockets+0x0/0x17c [dlm]
> Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d72b0>]
> process_send_sockets+0x34/0x17c [dlm]
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff810b272d>] ?
> probe_workqueue_execution+0xb1/0xcd
> Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ?
> process_send_sockets+0x0/0x17c [dlm]
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff810635a0>]
> worker_thread+0x18a/0x224
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff81067b37>] ?
> autoremove_wake_function+0x0/0x39
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff81063416>] ?
> worker_thread+0x0/0x224
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff810677b5>] kthread+0x91/0x99
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff81012daa>] child_rip+0xa/0x20
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff81067724>] ? kthread+0x0/0x99
> Jan 2 17:46:13 ilo150 kernel: [<ffffffff81012da0>] ? child_rip+0x0/0x20
> Jan 2 17:46:13 ilo150 kernel: Code: 60 fe ff ff 80 00 00 00 89 85 38 fe ff
> ff 48 8d 45 90 48 89 85 50 fe ff ff e8 88 5f 24 e1 4c 8b 63 38 48 8d 43 38
> 49 39 c4 75 04 <0f> 0b eb fe 4d 63 44 24 1c 41 8b 54 24 18 66 ff 43 48 45 31
> ff
> Jan 2 17:46:13 ilo150 kernel: RIP [<ffffffffa01d75c9>]
> sctp_init_assoc+0x13e/0x2c1 [dlm]
> Jan 2 17:46:13 ilo150 kernel: RSP <ffff8808e9bdbc20>
> Jan 2 17:46:13 ilo150 kernel: ---[ end trace d3844af31bca174b ]---
>
> I am wondering if this is a Fedora specific bug. I have the full messages
> logs from both nodes if anyone is interested and here is my config
>
> [r...@ilo150 ~]# crm configure show
> node ilo143
> node ilo150
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="xx.xx.xx.xx" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive dlm ocf:pacemaker:controld \
>     op monitor interval="120s"
> primitive drbd_r0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="20" role="Master" timeout="20" \
>     op monitor interval="30" role="Slave" timeout="20"
> primitive fs0 ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/mnt" fstype="ocfs2" \
>     meta target-role="Stopped"
> primitive o2cb ocf:ocfs2:o2cb \
>     op monitor interval="120s"
> ms drbd_clone0 drbd_r0 \
>     meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone dlm-clone dlm \
>     meta interleave="true"
> clone fs0-clone fs0
> clone ip-clone ClusterIP \
>     meta globally-unique="true" clone-max="2" clone-node-max="2"
> clone o2cb-clone o2cb \
>     meta interleave="true"
> colocation fs0-with-o2cb inf: fs0-clone o2cb-clone
> colocation fs0_on_drbd inf: fs0-clone drbd_clone0:Master
> colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
> order fs0-after-drbd inf: drbd_clone0:promote fs0-clone:start
> order fs0-after-o2cb inf: o2cb-clone fs0-clone
> order o2cb-after-dlm inf: dlm-clone o2cb-clone
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1262472066"
>
> Thanks,
> Daniel
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker