I will see what I can do. How large would an o2image be?

Just to reiterate, these are not new file systems. They were created with ocfs2-2.6.9-55.ELsmp-1.2.9-1.el4 and ocfs2-tools-1.2.7-1.el4 under RHEL 4. The primary user of these volumes is a 6-node cluster running RHEL 5.8 with ocfs2-2.6.18-308.11.1.el5-1.4.10-1 and ocfs2-tools-1.6.3-2.el5. Another machine, which still runs the same EL4 binaries, mounts these snapcloned volumes daily, does operations on the DB files, and then copies the data off.

From: Herbert van den Bergh [mailto:herbert.van.den.be...@oracle.com]
Sent: Wednesday, July 10, 2013 09:54
To: Mihail Daskalov
Cc: Sunil Mushran; Ulf Zimmermann; ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Problems with volumes coming from RHEL5 going to OEL6 (slighly OT)

It's possible that the 1.8.0 tag was never created in the ocfs2-tools git repository. But it's not of any use anyway. If you check the changelog of the ocfs2-tools rpm, you'll see that there were many patches since 1.8.0, so the 1.8.0-10 version that Ulf is using would be very different from a 1.8.0 tag in git.

Ulf, I suggest you create an o2image of the "bad" filesystem, and see if the problem can be reproduced with that image. If it can, then you may want to make that o2image available to the OCFS2 developers so they can debug ocfs2-tools to see what is causing the malloc/free error. You may also want to include the exact steps to take to reproduce this, starting from the mkfs up to the failure, indicating exactly what versions of kernel and tools were used along the way.

Thanks,
Herbert.

On 7/10/13 7:55 AM, Mihail Daskalov wrote:

Hi Sunil,

Regarding ocfs2-tools version 1.8.0, you should know best what it was meant to be (maybe not true for 1.8.0-10 in OEL6U3). Is it possible that the tag for 1.8.0 disappeared from the git repository? Or was there never a tag for 1.8.0?

Below is the link to the commit in the 1.8.2 tag that brings the version to 1.8.0:
https://oss.oracle.com/git/?p=ocfs2-tools.git;a=commitdiff;h=2480a215a600050d2bf923044dffac91439d982a;hp=8b5f4ad727e019cb557c4b516ab401c15c5c317e
and later on another commit that brings the version to 1.8.2:
https://oss.oracle.com/git/?p=ocfs2-tools.git;a=commitdiff;h=560a1e60936fe868b00cfc9cad5def726e10828e

I am sorry I am not actually helping with Ulf's problem. Ulf, maybe you can really follow the head version and try to see an explanation of the error message.
Anyway, I think it would be best to open an SR with Oracle if you have a Linux support contract.

Does anyone know how to find the git repository, at least for some packages in Oracle Linux? I know the source for each package is available as .src.rpm, but how could I see the changes, or the tag from which each version was built? I remember Wim talking about something like that a while ago (saying Oracle is not like Red Hat mangling changelogs), but I can't find the article right now. If you find out what is behind ocfs2-tools 1.8.0-10, it would be easier to track the problem.

Regards,
Mihail Daskalov

From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Sunil Mushran
Sent: Wednesday, July 10, 2013 2:11 AM
To: Ulf Zimmermann
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Problems with volumes coming from RHEL5 going to OEL6

The error does not make sense. Also, I don't know what 1.8.0 tools means. I cannot see that label in the src tree.
https://oss.oracle.com/git/?p=ocfs2-tools.git;a=summary

One option is to build the tools from the head.

On Tue, Jul 9, 2013 at 2:25 PM, Ulf Zimmermann <u...@openlane.com> wrote:

Sunil, any suggestions on this?

From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Ulf Zimmermann
Sent: Saturday, June 22, 2013 15:20
To: Sunil Mushran
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Problems with volumes coming from RHEL5 going to OEL6

[root@co-db03 ulf]# debugfs.ocfs2 -R "stats" /dev/mapper/aucp_data_bk_2_x
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Sun Sep 25 05:32:29 2011
Creator OS: 0
Feature Compat: 0
Feature Incompat: 0
Tunefs Incomplete: 0
Feature RO compat: 0
Root Blknum: 513   System Dir Blknum: 514
First Cluster Group Blknum: 256
Block Size Bits: 12   Cluster Size Bits: 20
Max Node Slots: 10
Extended Attributes Inline Size: 0
Label: /export/backuprecovery.AUCP
UUID: 5F9C2727159743529200CE9C5E155562
Hash: 0 (0x0)
DX Seeds: 0 0 0 (0x00000000 0x00000000 0x00000000)
Cluster stack: classic o2cb
Cluster flags: 0
Inode: 2   Mode: 00   Generation: 3147295185 (0xbb97e9d1)
FS Generation: 3147295185 (0xbb97e9d1)
CRC32: 00000000   ECC: 0000
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 1572864
ctime: 0x4e7f1f5d 0x0 -- Sun Sep 25 05:32:29.0 2011
atime: 0x0 0x0 -- Wed Dec 31 16:00:00.0 1969
mtime: 0x4e7f1f5d 0x0 -- Sun Sep 25 05:32:29.0 2011
dtime: 0x0 -- Wed Dec 31 16:00:00 1969
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535

From: Sunil Mushran [mailto:sunil.mush...@gmail.com]
Sent: Friday, June 21, 2013 11:11
To: Ulf Zimmermann
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Problems with volumes coming from RHEL5 going to OEL6

Can you dump the following using the 1.8 binary.

debugfs.ocfs2 -R "stats" /dev/mapper/.....

On Fri, Jun 21, 2013 at 6:17 AM, Ulf Zimmermann <u...@openlane.com> wrote:

We have a production cluster of 6 nodes, which are currently running RHEL 5.8 with OCFS2 1.4.10. We snapclone these volumes to multiple destinations; one of them is a RHEL4 machine with OCFS2 1.2.9. Because of that, the volumes are set so that we can read them there.

We are now trying to bring up a new server; this one has OEL 6.3 on it and comes with OCFS2 1.8.0 and tools 1.8.0-10. I can use tunefs.ocfs2 --cloned-volume to reset the UUID, but when I try to change the label I get:

[root@co-db03 ulf]# tunefs.ocfs2 -L /export/backuprecovery.AUCP /dev/mapper/aucp_data_bk_2_x
tunefs.ocfs2: Invalid name for a cluster while opening device "/dev/mapper/aucp_data_bk_2_x"

fsck.ocfs2 core dumps with the following; I also filed a bug on Bugzilla for that:

[root@co-db03 ulf]# fsck.ocfs2 /dev/mapper/aucp_data_bk_2_x
fsck.ocfs2 1.8.0
*** glibc detected *** fsck.ocfs2: double free or corruption (fasttop): 0x000000000197f320 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3656475366]
fsck.ocfs2[0x434c31]
fsck.ocfs2[0x403bc2]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x365641ecdd]
fsck.ocfs2[0x402879]
======= Memory map: ========
00400000-00450000 r-xp 00000000 fc:00 12489 /sbin/fsck.ocfs2
0064f000-00651000 rw-p 0004f000 fc:00 12489 /sbin/fsck.ocfs2
00651000-00652000 rw-p 00000000 00:00 0
00850000-00851000 rw-p 00050000 fc:00 12489 /sbin/fsck.ocfs2
0197e000-0199f000 rw-p 00000000 00:00 0 [heap]
3655c00000-3655c20000 r-xp 00000000 fc:00 8797 /lib64/ld-2.12.so
3655e1f000-3655e20000 r--p 0001f000 fc:00 8797 /lib64/ld-2.12.so
3655e20000-3655e21000 rw-p 00020000 fc:00 8797 /lib64/ld-2.12.so
3655e21000-3655e22000 rw-p 00000000 00:00 0
3656400000-3656589000 r-xp 00000000 fc:00 8798 /lib64/libc-2.12.so
3656589000-3656788000 ---p 00189000 fc:00 8798 /lib64/libc-2.12.so
3656788000-365678c000 r--p 00188000 fc:00 8798 /lib64/libc-2.12.so
365678c000-365678d000 rw-p 0018c000 fc:00 8798 /lib64/libc-2.12.so
365678d000-3656792000 rw-p 00000000 00:00 0
3659c00000-3659c16000 r-xp 00000000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
3659c16000-3659e15000 ---p 00016000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
3659e15000-3659e16000 rw-p 00015000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
3d3e800000-3d3e817000 r-xp 00000000 fc:00 12028 /lib64/libpthread-2.12.so
3d3e817000-3d3ea17000 ---p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
3d3ea17000-3d3ea18000 r--p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
3d3ea18000-3d3ea19000 rw-p 00018000 fc:00 12028 /lib64/libpthread-2.12.so
3d3ea19000-3d3ea1d000 rw-p 00000000 00:00 0
3e26600000-3e26603000 r-xp 00000000 fc:00 426 /lib64/libcom_err.so.2.1
3e26603000-3e26802000 ---p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
3e26802000-3e26803000 r--p 00002000 fc:00 426 /lib64/libcom_err.so.2.1
3e26803000-3e26804000 rw-p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
7fb063711000-7fb063714000 rw-p 00000000 00:00 0
7fb06371d000-7fb063720000 rw-p 00000000 00:00 0
7fffd5b95000-7fffd5bb6000 rw-p 00000000 00:00 0 [stack]
7fffd5bc5000-7fffd5bc6000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Abort (core dumped)

I think one of the main questions is what "Invalid name for a cluster while trying to join the group" or "Invalid name for a cluster while opening device" means. I am pretty sure that /etc/sysconfig/o2cb and /etc/ocfs2/cluster.conf are correct.

Ulf.
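
For anyone reading the debugfs.ocfs2 "stats" output quoted in this thread, the short Python sketch below shows how a few of the superblock fields can be turned into more familiar units. It is only an illustration: the values are copied by hand from that output, the variable names are made up for this note, and it assumes that the "Clusters" field counts the clusters of the whole volume.

```python
from datetime import datetime, timezone

# Values copied by hand from the debugfs.ocfs2 "stats" output in this thread.
block_size_bits = 12      # "Block Size Bits: 12"
cluster_size_bits = 20    # "Cluster Size Bits: 20"
clusters = 1572864        # "Clusters: 1572864"
ctime_hex = "0x4e7f1f5d"  # first field of the "ctime:" line

# The *_bits fields are base-2 logarithms of the sizes in bytes.
block_size = 1 << block_size_bits
cluster_size = 1 << cluster_size_bits

# Assumption: "Clusters" is the total cluster count of the volume.
volume_bytes = clusters * cluster_size
volume_tib = volume_bytes / 2**40

# The ctime field is a Unix timestamp in hex; decode it as UTC.
ctime_utc = datetime.fromtimestamp(int(ctime_hex, 16), tz=timezone.utc)

print(f"block size:   {block_size} bytes")
print(f"cluster size: {cluster_size} bytes")
print(f"volume size:  {volume_bytes} bytes ({volume_tib} TiB)")
print(f"ctime (UTC):  {ctime_utc.isoformat()}")
```

Under those assumptions this works out to 4 KiB blocks, 1 MiB clusters, and a 1.5 TiB volume. The ctime decodes to 12:32:29 UTC on 2011-09-25, which agrees with the 05:32:29 shown by debugfs.ocfs2 if the host's local time was UTC-7 at that point.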

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users