Thanks! Srinivas was helping me troubleshoot this last night with a series of strace runs that started to point to kernel issues - and then he realized it was the wrong kernel version! I updated to 2.6.39-400.215.10.el6uek.x86_64 on both of my OCFS2 boxes, and service o2cb enable brought the global heartbeat online with no issues!
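
For the archives, here is roughly the check I ran on each node after installing the newer UEK kernel and rebooting. It only reuses commands already shown in this thread; the exact version strings will of course depend on your kernel and tools:

# uname -r
2.6.39-400.215.10.el6uek.x86_64

# service o2cb enable

# dmesg | grep -i ocfs2
(should now report the 1.8-era modules instead of the "OCFS2 Node Manager 1.6.3" lines quoted below)

# mounted.ocfs2 -d
(both global heartbeat devices show up with the "G" flag, same as before)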
Warm regards and thanks!

Jon

> On Nov 13, 2014, at 12:00 PM, ocfs2-users-requ...@oss.oracle.com wrote:
>
> Message: 1
> Date: Thu, 13 Nov 2014 14:16:05 +0000
> From: Richard Sibthorp <richard.sibth...@oracle.com>
> Subject: Re: [Ocfs2-users] Ocfs2-users Digest, Vol 130, Issue 1
> To: ocfs2-users@oss.oracle.com
>
> Hi Jon,
>
> The kernel you are using includes the ocfs2 kernel modules at version
> 1.6.3. The global heartbeat feature was introduced in ocfs2 1.8.
>
> I haven't checked whether any of the 2.6.32-based UEKs include ocfs2 1.8,
> but the 2.6.39 and later (aka UEK2, UEK3) kernels certainly do.
>
> I assume from the message below that you have an Oracle support license - at
> least for the RDBMS if not for Oracle Linux. When using ocfs2 for RDBMS
> resources, your RDBMS license entitles you to ocfs2 support via MOS; for
> general-purpose ocfs2 issues, an Oracle Linux Support contract needs to be
> in place. This would have a separate CSI from that of your licensed
> products - open-source products are obviously not licensed, but if you
> require support you need a support contract.
>
> You may also want to review MOS documents 1552519.1 and 1553162.1.
>
> Best regards,
> Richard.
>
> On 13/11/2014 02:27, ocfs2-users-requ...@oss.oracle.com wrote:
>>
>> Message: 1
>> Date: Wed, 12 Nov 2014 18:26:51 -0800
>> From: Jon Norris <jon_nor...@apple.com>
>> Subject: [Ocfs2-users] OCFS2 v1.8 on VMware VMs global heartbeat woes
>> To: ocfs2-users@oss.oracle.com
>>
>> Running two VMs on ESXi 5.1.0 and trying to get global heartbeat (HB)
>> working, with no luck (on about my 20th rebuild and redo).
>>
>> Environment:
>>
>> Two VMware-based VMs running:
>>
>> # cat /etc/oracle-release
>> Oracle Linux Server release 6.5
>>
>> # uname -r
>> 2.6.32-400.36.8.el6uek.x86_64
>>
>> # yum list installed | grep ocfs
>> ocfs2-tools.x86_64            1.8.0-11.el6             @oel-latest
>>
>> # yum list installed | grep uek
>> kernel-uek.x86_64             2.6.32-400.36.8.el6uek   @oel-latest
>> kernel-uek-firmware.noarch    2.6.32-400.36.8.el6uek   @oel-latest
>> kernel-uek-headers.x86_64     2.6.32-400.36.8.el6uek   @oel-latest
>>
>> Configuration:
>>
>> The shared data stores (HB and mounted OCFS2) are set up in a way similar to
>> that described by VMware and Oracle for shared RAC VMware-based data stores.
>> All blogs, wikis and VMware KB docs show a similar setup: VM shared SCSI
>> settings [multi-writer], shared disk [independent + persistent], etc., such as:
>>
>> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1034165
>>
>> The devices can be seen by both VMs in the OS. I have used the same
>> configuration to run an OCFS2 setup with local heartbeat, and that works
>> fine (the cluster starts up and the OCFS2 file system mounts with no issues).
>>
>> I followed procedures similar to those shown in the Oracle docs and blog:
>> https://docs.oracle.com/cd/E37670_01/E37355/html/ol_instcfg_ocfs2.html and
>> https://blogs.oracle.com/wim/entry/ocfs2_global_heartbeat - with no luck.
>>
>> The shared SCSI controllers are VMware paravirtual and set to "shared none"
>> as suggested by the VMware RAC shared-disk KB mentioned above.
>>
>> After the shared Linux devices have been added to both VMs and are seen by
>> both VMs in the OS (ls /dev/sd* shows the devices on each), I format the
>> global HB devices from one VM, similar to the following:
>>
>> # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol1 --cluster-name=test --cluster-stack=o2cb --global-heartbeat /dev/sdc
>> # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol2 --cluster-name=test --cluster-stack=o2cb --global-heartbeat /dev/sdd
>>
>> From both VMs you can then run the following and see:
>>
>> # mounted.ocfs2 -d
>> Device     Stack  Cluster  F  UUID                              Label
>> /dev/sdc   o2cb   test     G  5620F19D43D840C7A46523019AE15A96  ocfs2vol1
>> /dev/sdd   o2cb   test     G  9B9182279ABD4FD99F695F91488C94C1  ocfs2vol2
>>
>> I then add the global HB devices to the ocfs2 config file with commands like:
>>
>> # o2cb add-heartbeat test 5620F19D43D840C7A46523019AE15A96
>> # o2cb add-heartbeat test 9B9182279ABD4FD99F695F91488C94C1
>>
>> Thus far looking good (heh, but then all we've done is format ocfs2 with
>> options and updated a text file) - then I do the following:
>>
>> # o2cb heartbeat-mode test global
>>
>> All this is done on one node in the cluster; I then copy the following to the
>> other node (with hostnames changed here, though the actual hostname = output
>> of the hostname command on each node):
>>
>> # cat /etc/ocfs2/cluster.conf
>>
>> node:
>>         name = clusterhost1.mydomain.com
>>         cluster = test
>>         number = 0
>>         ip_address = 10.143.144.12
>>         ip_port = 7777
>>
>> node:
>>         name = clusterhost2.mydomain.com
>>         cluster = test
>>         number = 1
>>         ip_address = 10.143.144.13
>>         ip_port = 7777
>>
>> cluster:
>>         name = test
>>         heartbeat_mode = global
>>         node_count = 2
>>
>> heartbeat:
>>         cluster = test
>>         region = 5620F19D43D840C7A46523019AE15A96
>>
>> heartbeat:
>>         cluster = test
>>         region = 9B9182279ABD4FD99F695F91488C94C1
>>
>> The same config works fine with heartbeat_mode set to local and the global
>> heartbeat devices removed, and I can mount a shared FS. The local HB
>> interfaces are IPv4 on a private L2 non-routed VLAN, are up, and the nodes
>> can ping each other.
>>
>> The config is copied to each node, and I have already run the following on
>> both (it completes fine in local heartbeat mode, so the cluster will start on
>> boot and the timeout parameters etc. are at their defaults):
>>
>> # service o2cb configure
>>
>> I then check that the service on both nodes unloads and loads modules with no
>> issues:
>>
>> # service o2cb unload
>> Clean userdlm domains: OK
>> Unmounting ocfs2_dlmfs filesystem: OK
>> Unloading module "ocfs2_dlmfs": OK
>> Unloading module "ocfs2_stack_o2cb": OK
>> Unmounting configfs filesystem: OK
>> Unloading module "configfs": OK
>>
>> # service o2cb load
>> Loading filesystem "configfs": OK
>> Mounting configfs filesystem at /sys/kernel/config: OK
>> Loading stack plugin "o2cb": OK
>> Loading filesystem "ocfs2_dlmfs": OK
>> Mounting ocfs2_dlmfs filesystem at /dlm: OK
>>
>> # mount -v
>> ...
>> debugfs on /sys/kernel/debug type debugfs (rw)
>> ...
>> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
>>
>> # lsmod | grep ocfs
>> ocfs2_dlmfs            18026  1
>> ocfs2_stack_o2cb        3606  0
>> ocfs2_dlm             196778  1 ocfs2_stack_o2cb
>> ocfs2_nodemanager     202856  3 ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
>> ocfs2_stackglue        11283  2 ocfs2_dlmfs,ocfs2_stack_o2cb
>> configfs               25853  2 ocfs2_nodemanager
>>
>> Looks good on both nodes...
>> then (sigh):
>>
>> # service o2cb enable
>> Writing O2CB configuration: OK
>> Setting cluster stack "o2cb": OK
>> Registering O2CB cluster "test": Failed
>> o2cb: Unable to access cluster service while registering heartbeat mode 'global'
>> Unregistering O2CB cluster "test": OK
>>
>> I have searched for the error string and have come up with a huge ZERO on
>> help - and the local OS log messages are equally unhelpful:
>>
>> # tail /var/log/messages
>> Nov 12 21:54:53 clusterhost1 o2cb.init: online test
>> Nov 13 00:58:38 clusterhost1 o2cb.init: online test
>> Nov 13 01:00:06 clusterhost1 o2cb.init: offline test 0
>> Nov 13 01:00:06 clusterhost1 kernel: ocfs2: Unregistered cluster interface o2cb
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 Node Manager 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 DLM 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: ocfs2: Registered cluster interface o2cb
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 DLMFS 1.6.3
>> Nov 13 01:01:14 clusterhost1 kernel: OCFS2 User DLM kernel interface loaded
>> Nov 13 01:03:32 clusterhost1 o2cb.init: online test
>>
>> dmesg shows the same:
>>
>> # dmesg
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>> Slow work thread pool: Starting up
>> Slow work thread pool: Ready
>> FS-Cache: Loaded
>> FS-Cache: Netfs 'nfs' registered for caching
>> eth0: no IPv6 routers present
>> eth1: no IPv6 routers present
>> ocfs2: Unregistered cluster interface o2cb
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>> ocfs2: Unregistered cluster interface o2cb
>> OCFS2 Node Manager 1.6.3
>> OCFS2 DLM 1.6.3
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.6.3
>> OCFS2 User DLM kernel interface loaded
>>
>> The filesystem looks fine, and this can be run from both hosts in the cluster:
>>
>> # fsck.ocfs2 -n /dev/sdc
>> fsck.ocfs2 1.8.0
>> Checking OCFS2 filesystem in /dev/sdc:
>>   Label:              ocfs2vol1
>>   UUID:               5620F19D43D840C7A46523019AE15A96
>>   Number of blocks:   524288
>>   Block size:         4096
>>   Number of clusters: 524288
>>   Cluster size:       4096
>>   Number of slots:    4
>>
>> # fsck.ocfs2 -n /dev/sdd
>> fsck.ocfs2 1.8.0
>> Checking OCFS2 filesystem in /dev/sdd:
>>   Label:              ocfs2vol2
>>   UUID:               9B9182279ABD4FD99F695F91488C94C1
>>   Number of blocks:   524288
>>   Block size:         4096
>>   Number of clusters: 524288
>>   Cluster size:       4096
>>   Number of slots:    4
>>
>> What am I missing? I've redone this and re-created the devices a few too many
>> times (thinking I may have missed something), but I am mystified. From all
>> outer appearances I have two VMs that can see - and, in local heartbeat mode,
>> mount - a shared OCFS2 filesystem and access it (I have it running in local
>> heartbeat mode for a cluster of rsyslog servers that are being load-balanced
>> by an F5 LTM VS with no issues). I am stumped on how to get global HB devices
>> set up, though I have read and re-read the user guides, troubleshooting
>> guides and wikis/blogs on how to make this work until my eyes hurt.
>>
>> I mounted debugfs and ran the debugfs.ocfs2 utility, but I am unfamiliar with
>> what I should be looking for there (or whether this is where I would look for
>> cluster-not-coming-online errors).
>>
>> As the o2cb/ocfs2 modules are all kernel based, I am not 100% sure how to
>> increase debug information without digging into the source code and mucking
>> around there.
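
(Replying inline now that this is solved: the "OCFS2 Node Manager 1.6.3" lines above were the giveaway. Per Richard's note, global heartbeat needs the 1.8 modules, which only ship with the newer UEK kernels. A quick way to see which ocfs2 your running kernel actually ships - just reusing commands from this thread, so nothing new here - is:

# service o2cb load
# dmesg | grep "OCFS2 Node Manager"
OCFS2 Node Manager 1.6.3
(anything before 1.8 failed for me with exactly the "Unable to access cluster service" error above)

At least that is how it behaved on my two boxes; I have not tried other kernel/tools combinations.)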
>>
>> Any guidance or lessons learned (or things to check) would be super :) and
>> if it works it will warrant a happy scream of joy from my frustrated cube!
>>
>> Warm regards,
>>
>> Jon
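
For anyone who finds this thread in the archives later, the short version of what worked for me (the same commands as quoted above, only on the newer uek2 kernel - your device names, UUIDs and cluster name will differ):

# mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol1 --cluster-name=test --cluster-stack=o2cb --global-heartbeat /dev/sdc
# o2cb add-heartbeat test 5620F19D43D840C7A46523019AE15A96
# o2cb heartbeat-mode test global
(copy /etc/ocfs2/cluster.conf to all nodes)
# service o2cb configure
# service o2cb enable
# mounted.ocfs2 -d

The only thing that actually changed between "Failed" and "OK" was the kernel.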
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users