Hi,

On 05/24/2016 02:50 PM, Mailer Regs wrote:
> Hi Eric,
> Thanks for your reply.
> Actually my boss requested me to bring back the file system yesterday, at the
> cost of our media data (and half my monthly payment) to continue our service.
> Can you show me how to get debug information for this case (or documentation
> about it) ? Actually I'm not very familiar with GDB or debugging techniques,
> but I will try to reproduce the situation and solve it to prevent future
> problems like this to happen.
>
You're welcome.

1. What's your Linux distribution?
2. Which DLM stack do you use, o2cb or pacemaker?

Take openSUSE Leap 42.1 for example:

1. Ensure you have the relevant repos (for software/debuginfo/source packages):

   $ zypper lr -u
   # | Alias                       | Name                      | Enabled | GPG Check | Refresh | Type  | URI
   3 | download.opensuse.org-oss   | Main Repository (DEBUG)   | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/debug/distribution/leap/42.1/repo/oss/
   4 | download.opensuse.org-oss_1 | Main Repository (OSS)     | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/distribution/leap/42.1/repo/oss/
   5 | download.opensuse.org-oss_2 | Main Repository (Sources) | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/source/distribution/leap/42.1/repo/oss/

2. $ zypper search ocfs2-tools
   S | Name                       | Summary                                                      | Type
   --+----------------------------+--------------------------------------------------------------+-----------
     | ocfs2-tools                | Oracle Cluster File System 2 Core Tools                      | package
     | ocfs2-tools                | Oracle Cluster File System 2 Core Tools                      | srcpackage
     | ocfs2-tools-debuginfo      | Debug information for package ocfs2-tools                    | package
     | ocfs2-tools-debugsource    | Debug sources for package ocfs2-tools                        | package
     | ocfs2-tools-devel          | Oracle Cluster File System 2 Development files               | package
     | ocfs2-tools-devel-static   | Oracle Cluster File System 2 static libraries                | package
     | ocfs2-tools-o2cb           | Oracle Cluster File System 2 tools for the native o2cb stack | package
     | ocfs2-tools-o2cb-debuginfo | Debug information for package ocfs2-tools-o2cb               | package

3. $ sudo zypper install ocfs2-tools ocfs2-tools-debuginfo
   $ sudo zypper source-install ocfs2-tools

4. gdb --args mount -t ocfs2 /dev/mapper/mpath3p1 /test

   Now you're in gdb... Learn these commands: start, break, run, continue,
   next, step, list. They should be enough for you. For reference:
   https://sourceware.org/gdb/current/onlinedocs/gdb/

5.
Grep in the ocfs2-tools source:

   $:~/ocfs2-tools> grep -rn "while trying to determine heartbeat information"
   mount.ocfs2/mount.ocfs2.c:385: "while trying to determine heartbeat information");
   $ vim mount.ocfs2/mount.ocfs2.c +385

   We can see that something bad happened in ocfs2_fill_heartbeat_desc(), so
   set a breakpoint on it and step through it (e.g. with `nexti`).

6. Do the similar steps for fsck.ocfs2.

BTW, there's an ocfs2 IRC channel you can find here:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-April/011934.html

Eric

> Sent from my BlackBerry 10 smartphone.
> Original Message
> From: Eric Ren
> Sent: 13:38 Tuesday, May 24, 2016
> To: Mailer Regs; ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] OCFS2 - Bad magic number
>
> Hello,
>
> I don't encounter this so far. You can install relative ocfs2-tools
> debug packages and gdb to find out what's happening. And get your
> findings back to us;-)
>
> To me, it look like a DLM issue, not super block.
>
> Eric
>
> On 05/22/2016 05:44 AM, Mailer Regs wrote:
>> Hi LQ friends,
>>
>> I have a problem with our OCFS2 cluster, which I couldn't solve by myself.
>> In short, I have a OCFS2 cluster with 3 nodes and a shared storage LUN. I
>> have mapped the LUN to all 3 of the nodes, and split the LUN into 2
>> partitions, formatted them as OCFS2 filesystems and mounted them
>> successfully. The system has been running OK for nearly 2 years, but today
>> the partition 1 suddenly is not accessible. I have to reboot 1 node. After
>> rebooting, the partition 2 is mounted OK, but the partition 1 cannot be
>> mounted.
>> The error is below:
>>
>> # mount -t ocfs2 /dev/mapper/mpath3p1 /test
>> mount.ocfs2: Bad magic number in inode while trying to determine
>> heartbeat information
>>
>> # fsck.ocfs2 /dev/mapper/mpath3p1
>> fsck.ocfs2 1.6.3
>> fsck.ocfs2: Bad magic number in inode while initializing the DLM
>>
>> # fsck.ocfs2 -r 2 /dev/mapper/mpath3p1
>> fsck.ocfs2 1.6.3
>> [RECOVER_BACKUP_SUPERBLOCK] Recover superblock information from backup
>> block#1048576? <n> y
>> fsck.ocfs2: Bad magic number in inode while initializing the DLM
>>
>> # parted /dev/mapper/mpath3
>> GNU Parted 1.8.1
>> Using /dev/mapper/mpath3
>> Welcome to GNU Parted! Type 'help' to view a list of commands.
>> (parted) print
>>
>> Model: Linux device-mapper (dm)
>> Disk /dev/mapper/mpath3: 20.0TB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: gpt
>>
>> Number  Start   End     Size    File system  Name     Flags
>>  1      17.4kB  10.2TB  10.2TB               primary
>>  2      10.2TB  20.0TB  9749GB               primary
>>
>>
>>
>> Usually, the bad magic number happens when the super block is corrupted,
>> and I have experienced several similar cases before, which can be solved
>> quickly by using backup super blocks. But this case is different, I cannot
>> fix the problem by simply replacing the super block, thus I'm out of ideas.
>>
>> Please take a look and suggest me how to solve this problem, as I need to
>> recover the data, it's the most important goal now.
>>
>> Thanks in advance.
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>
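
Steps 4 and 5 of the reply above (start gdb on the mount command, then stop in ocfs2_fill_heartbeat_desc()) could be scripted roughly as below. This is only a sketch under assumptions not stated in the thread: the file name mount-debug.gdb is made up, and it assumes that mount hands the work to the mount.ocfs2 helper in a child process, which is why gdb is told to follow the child and to accept a breakpoint it cannot resolve yet. The script only writes the gdb command file and prints the command to run; it does not touch the device.

```shell
#!/bin/sh
# Sketch only: mount-debug.gdb is a hypothetical file name.
cat > mount-debug.gdb <<'EOF'
# Accept a breakpoint on a function gdb cannot resolve yet.
set breakpoint pending on
# Assumption: mount forks and executes the mount.ocfs2 helper, so follow the child.
set follow-fork-mode child
break ocfs2_fill_heartbeat_desc
run
# Once stopped: show the call chain, then run to the end of the
# function so gdb reports the value it returns.
backtrace
finish
EOF

# Print the command to run (as root) against the affected device.
echo "gdb -x mount-debug.gdb --args mount -t ocfs2 /dev/mapper/mpath3p1 /test"
```

The same command file should work for step 6 if the breakpoint is moved to whichever function the fsck.ocfs2 source reports next to "while initializing the DLM"; that function name is not given in this thread, so it has to be found with grep first.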