The umount and the hb stop threads are deadlocking on the s_umount lock.
This problem is due to the local heartbeat scheme employed, in which the heartbeat device is the same as the mounted one. umount triggers hb stop, which calls open() => ... => rescan_partitions() => ... => get_super() => down_read(). That same lock is already held by the umount thread. Unfortunately there is no fix for this other than using a different heartbeat scheme. Later this year we will be releasing global heartbeat as part of the o2cb stack, which will allow users to specify different heartbeat devices. Another option is to move to SLES11 and make use of the pacemaker cluster stack.

On 07/25/2011 06:58 AM, Simon Hargrave wrote:
A further update, which simplifies the situation. It appears to be more fundamental, and not actually anything to do with the online resize. It appears that simply resizing the LUN and performing the scsi rescan is enough to make the next unmount fail, i.e.:

* create filesystem
* mount filesystem
* unmounts and mounts fine
* extend LUN on storage
* echo 1 to /sys/block/sdb/device/rescan
* unmount filesystem, which hangs

The above happens even if only one node is in the cluster, so it doesn't appear to be a locking issue between the hosts. I have tried exactly the same with ext3 (one node, obviously!) and the same resize doesn't cause a hang. I have also configured ocfs2 on a single physical machine (to rule out VMware), and the symptoms are identical. So for whatever reason, the umount() system call for an ocfs2 filesystem hangs if the underlying block device has changed size?

Simon

--
Simon Hargrave                    szhargr...@ybs.co.uk
Enterprise Systems Team Leader    x2831
Yorkshire Building Society        01274 472831
http://wwwtech/sysint/tsgcore.asp
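For reference, the bullet steps above can be condensed into one script. This is only a sketch: the device and mountpoint names are placeholders, and with DRY_RUN=1 (the default) it just prints each step rather than touching a real LUN.

```shell
#!/bin/sh
# Sketch of the failing sequence; DEV/MNT are hypothetical placeholders.
DEV=${DEV:-/dev/sdb}
MNT=${MNT:-/ocfstest}
DRY_RUN=${DRY_RUN:-1}

# With DRY_RUN=1 this only prints the command it would execute.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run mkfs.ocfs2 -L ocfstest "$DEV"      # 1. create filesystem
run mount -L ocfstest "$MNT"           # 2. mount filesystem
run umount "$MNT"                      # 3. unmount/remount works fine here
run mount -L ocfstest "$MNT"
# 4. extend the LUN on the storage array (out-of-band step on the EVA)
run sh -c "echo 1 > /sys/block/$(basename "$DEV")/device/rescan"  # 5. rescan
run umount "$MNT"                      # 6. this unmount now hangs
```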
*From:* Simon Hargrave
*Sent:* 25 July 2011 13:50
*To:* ocfs2-users@oss.oracle.com
*Subject:* Re: [Ocfs2-users] OCFS2 unmount problems after online resize

Further to this, I get the following in dmesg every 120 seconds after the attempted unmount:

INFO: task ocfs2_hb_ctl:3794 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ocfs2_hb_ctl  D ffff810003db6420     0  3794   3793                     (NOTLB)
 ffff8100b9d05cf8 0000000000000086 00000000f000020a ffffffff8002d0ee
 0000000000000000 0000000000000007 ffff8100d801e820 ffffffff80310b60
 000000887c712d88 000000000000791a ffff8100d801ea08 0000000080009852
Call Trace:
 [<ffffffff8002d0ee>] wake_up_bit+0x11/0x22
 [<ffffffff8006466c>] __down_read+0x7a/0x92
 [<ffffffff800e68aa>] get_super+0x48/0x95
 [<ffffffff800e387b>] fsync_bdev+0xe/0x3b
 [<ffffffff8014a6f8>] invalidate_partition+0x28/0x40
 [<ffffffff8010d6e7>] rescan_partitions+0x37/0x279
 [<ffffffff800e78ec>] do_open+0x231/0x30f
 [<ffffffff800e7c1e>] blkdev_open+0x0/0x4f
 [<ffffffff800e7c41>] blkdev_open+0x23/0x4f
 [<ffffffff8001eab6>] __dentry_open+0xd9/0x1dc
 [<ffffffff8002751f>] do_filp_open+0x2a/0x38
 [<ffffffff8002ae16>] iput+0x4b/0x84
 [<ffffffff800dddf3>] alternate_node_alloc+0x70/0x8c
 [<ffffffff80019f7e>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Simon
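On a 2.6.18 kernel /proc/<pid>/stack is not available, but stack traces like the one above can be requested on demand with SysRq. A minimal sketch of collecting them (run as root on the hung node; with DRY_RUN=1, the default here, it only prints the commands):

```shell
#!/bin/sh
# Sketch: dump the stacks of all uninterruptible (D-state) tasks to dmesg
# via SysRq 'w'. DRY_RUN=1 (the default) only prints what would be executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run sysctl -w kernel.sysrq=1                      # make sure SysRq is enabled
run sh -c 'echo w > /proc/sysrq-trigger'          # log blocked tasks + stacks
run sh -c 'dmesg | grep -B 2 -A 20 ocfs2_hb_ctl'  # pull out the hb_ctl trace
```

If the diagnosis above is right, ocfs2_hb_ctl should show up blocked in __down_read on s_umount (as in the trace), while the umount task sits waiting for ocfs2_hb_ctl to exit with that same rwsem held.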
*From:* Simon Hargrave
*Sent:* 25 July 2011 13:26
*To:* ocfs2-users@oss.oracle.com
*Subject:* [Ocfs2-users] OCFS2 unmount problems after online resize

Hi

I'm doing some experimentation with OCFS2 (1.4 on RHEL5) with a view to using it as a 2-node clustered filesystem. I seem to be having issues with online resize (which the documentation suggests is supported under 1.4). I'm creating a LUN, publishing it from an HP EVA6400 storage array to the 2 nodes, and creating a filesystem, which works fine. However, it appears that if I online-increase the size of the LUN and subsequently the filesystem, it hangs indefinitely on unmount.
Full transcript of issue is as below:

/etc/ocfs2/cluster.conf (created via ocfs2console)
--------------------------------------------------
node:
        ip_port = 7777
        ip_address = 10.34.8.90
        number = 0
        name = ybsxlx45
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.34.8.91
        number = 1
        name = ybsxlx46
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

/etc/sysconfig/o2cb (created via ocfs2console)
----------------------------------------------
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=

2GB LUN published to both nodes and appears as /dev/sdb
-------------------------------------------------------
# grep sdb /proc/partitions
   8    16    2097152 sdb

Operating System
----------------
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Linux ybsxlx45 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

OCFS2 Packages
--------------
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5

Create and exercise filesystem
------------------------------
# mkfs.ocfs2 -L "ocfstest" /dev/sdb
# mount -L ocfstest /ocfstest
# dd if=/dev/zero of=/ocfstest/file1 bs=1024k count=500    (on first node)
# dd if=/dev/zero of=/ocfstest/file2 bs=1024k count=500    (on second node)
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               2097152   1320836    776316  63% /ocfstest

Test unmount and remount
------------------------
# strace -f -o before.txt umount /ocfstest
# mount -L ocfstest /ocfstest

LUN resized to 3GB and rescan on each host
------------------------------------------
# echo "1" > /sys/block/sdb/device/rescan
# grep sdb /proc/partitions
   8    16    3145728 sdb
(new device size showing)

Online resize of filesystem
---------------------------
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               2097152   1312644    784508  63% /ocfstest
# tunefs.ocfs2 -S /dev/sdb
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               3145728   1312676   1833052  42% /ocfstest
(new filesystem size shows on both nodes)

Exercise filesystem
-------------------
# dd if=/dev/zero of=/ocfstest/file3 bs=1024k count=500    (on first node)
# dd if=/dev/zero of=/ocfstest/file4 bs=1024k count=500    (on second node)
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               3145728   2340772    804956  75% /ocfstest
(filesystem continues to function and can be filled past old size)

Unmount filesystem
------------------
# strace -f -o after.txt umount /ocfstest

At this point, the unmount hangs forever and only a reboot will clear it. Comparing the strace output, the second run hangs in the umount() system call, after having checked that umount.ocfs2 doesn't exist. Whilst hung, the filesystem still "appears" in /etc/mtab and df output, but it is not mounted according to the kernel (/proc/mounts). The other node continues to function whilst in this state; the filesystem does not hang.

So the question is: is this a bug, or am I doing something wrong? The OCFS2 1.4 user guide does state:

9. Online File system Resize
Users can now grow the file system without having to unmount it. This feature requires a compatible clustered logical volume manager. Compatible volume managers will be announced when support is available.
However, since I'm using the raw device, not LVM, this should work, provided the scsi device rescan has been performed on all nodes prior to running tunefs.ocfs2? I should finally point out that this is being performed on 2 VMware guests, but the LUN is published directly to the guests as a Raw Device Mapping in Physical Compatibility Mode (passthru), as per the various VMware whitepapers. I don't have 2 spare SAN-attached crash-and-burn hosts to test this out physically, but I don't believe this should be a factor.

Any help appreciated, as online resize is a must in a 24x7 clustered environment!

Thanks

Simon
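For what it's worth, the rescan-everywhere-then-grow-once ordering I'm describing can be sketched as a short script. The node names are taken from the cluster.conf above, passwordless root ssh is an assumption, and with DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/bin/sh
# Sketch of the cluster-wide online-resize order. NODES/DEV are placeholders;
# DRY_RUN=1 (the default) only prints what would be executed.
NODES=${NODES:-"ybsxlx45 ybsxlx46"}
DEV=${DEV:-sdb}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. After growing the LUN on the array, rescan it on EVERY node first,
#    so each kernel sees the new device size before the filesystem grows.
for node in $NODES; do
    run ssh "root@$node" "echo 1 > /sys/block/$DEV/device/rescan"
    run ssh "root@$node" "grep $DEV /proc/partitions"
done

# 2. Only then grow the filesystem, once, from any single node.
run tunefs.ocfs2 -S "/dev/$DEV"
```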
_______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users