On Mon, Nov 12, 2012 at 01:24:30AM +0200, Laurentiu Gosu wrote:
> We managed to track down the problem: the inodes which hold the
> RootDirectory and System Directory(and probably others ..like hb)
> were overwritten somehow(!?).
> Using debugfs and a lot of detective work Marian found the inode
> number of one of the sub-folders and then we cd .. until the most
> top level reachable folder...and then used rdump to recover the
> data.

Nicely done recovery!

> Now the question is why the critical blocks were overwritten. Maybe
> you can help to track this down and correct it(if that's the case).
> So some facts from 2 days ago:
> 1. ocfs2 cluster started becoming unresponsive(could not ls on some folders)
> 2. we unmounted the device from all nodes and run a fscheck -y on
> it(few months ago we did this successfully)
> 3. after successfully finished fscheck i remounted the device on all 5 nodes.
> 4. after 1 hour all nodes started reporting in syslog something like:
> Nov 9 15:40:17 ro02xsrv003 kernel:
> (o2hb-B4CF8D4667,6098,9):o2hb_check_last_timestamp:576 ERROR:
> Another node is heartbeating on device (dm-5):
> expected(2:0xdfd1f518e3333501, 0x509d07bf),
> ondisk(1:0xd81cb80a00020069, 0xac1bf00000001db8)
> Nov 9 15:40:17 ro02xsrv003 kernel:
> (o2hb-B4CF8D4667,6098,9):o2hb_check_slot:802 ERROR: Node 0 has
> written a bad crc to dm-5
> Nov 9 15:40:17 ro02xsrv003 kernel:
> (o2hb-B4CF8D4667,6098,9):o2hb_dump_slot:526 ERROR: Dump slot
> information: seq = 0x2c2527fa66646f6d, node = 37, cksum = 0xda52,
> generation 0xf7a004a5a8c00000
> Nov 9 15:40:17 ro02xsrv003 kernel:
> (o2hb-B4CF8D4667,6098,9):o2hb_check_slot:802 ERROR: Node 3 has
> written a bad crc to dm-5
>
> So i believe the fscheck marked somehow the meta-data blocks as
> writable and when they were used....kaboom.
> Hope it helps somebody to find the root cause. If additional info
> are needed for debugging let me know.

That's really weird. The fsck code handles the system blocks first
and doesn't have an easy way to clear them. I think you are right
that something overwrote the front of your disk. I'm unsure whether
this evidence matches your supposition (metadata blocks were
re-allocated to regular files) or just a straight dd over the device.
If you haven't overwritten the whole disk yet, can you find a file
that has the heartbeat blocks in its metadata chain?

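One rough way to get a hint from the log alone is to look at the dumped slot fields as raw bytes: if they come out as readable text, file contents may have landed on the heartbeat blocks. The sketch below is only an illustration, not an ocfs2 tool. It assumes the slot fields are stored little-endian on disk, and the names parse_slot, as_disk_bytes and printable are made up for this example.

```python
import re

# Sample line copied from the syslog excerpt quoted above (re-joined).
LINE = ("(o2hb-B4CF8D4667,6098,9):o2hb_dump_slot:526 ERROR: Dump slot "
        "information: seq = 0x2c2527fa66646f6d, node = 37, cksum = 0xda52, "
        "generation 0xf7a004a5a8c00000")

# Matches the field layout printed in the "Dump slot information" message.
SLOT_RE = re.compile(
    r"seq = 0x(?P<seq>[0-9a-f]+), node = (?P<node>\d+), "
    r"cksum = 0x(?P<cksum>[0-9a-f]+), generation 0x(?P<gen>[0-9a-f]+)"
)


def parse_slot(line):
    """Return the dumped slot fields as integers, or None if no match."""
    m = SLOT_RE.search(line)
    if m is None:
        return None
    return {
        "seq": int(m.group("seq"), 16),
        "node": int(m.group("node")),
        "cksum": int(m.group("cksum"), 16),
        "generation": int(m.group("gen"), 16),
    }


def as_disk_bytes(value, width=8):
    """Assumption: the field is stored little-endian on disk."""
    return value.to_bytes(width, "little")


def printable(raw):
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


slot = parse_slot(LINE)
seq_text = printable(as_disk_bytes(slot["seq"]))
gen_text = printable(as_disk_bytes(slot["generation"]))
print("node:", slot["node"])
print("seq bytes:", repr(seq_text))
print("generation bytes:", repr(gen_text))
```

Readable bytes would only be a hint, not proof of where the data came from. A node number of 37 on a 5-node cluster already says the slot no longer holds heartbeat data.
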
Joel

-- 

"But then she looks me in the eye
 And says, 'We're going to last forever,'
 And man you know I can't begin to doubt it.
 Cause it just feels so good and so free and so right,
 I know we ain't never going to change our minds about it, Hey!
 Here comes my girl."

http://www.jlbec.org/
jl...@evilplan.org

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users