On 19/07/13 18:34, Joseph Salisbury wrote:
> The commit (b0dd6b7) you mention in the upstream bug report is in the
> 3.2 stable tree as commit 76f4fa4:
>
> * 76f4fa4 - ext4: fix the free blocks calculation for ext3 file
>   systems w/ uninit_bg (1 year, 1 month ago) <Theodore Ts'o>
>
> It was available as of 3.2.20, as you say:
>
>   git describe --contains 76f4fa4
>   v3.2.20~1
>
> This means that patch is in the 3.2.0-49 Ubuntu kernel, since it
> contains all the upstream 3.2.46 updates.
>
> The patch from Darrick J. Wong that you mention is still being
> discussed on the linux-ext4 mailing list and is not yet available in
> the mainline kernel tree:
>
>   ext4: Prevent massive fs corruption if verifying the block bitmap fails
>
> Do you have a way to easily reproduce this bug? If so, I can build a
> test kernel with Darrick's patch for you to test.
'Fraid not -- it's a one-off event (I hope!).

The filesystem in question (/export/share -- mostly used for backups of other machines and ISO boot images) had originally been created on a logical volume of ~640Gb in a volume group of just under 1Tb, on a single PV composed of a RAID10 array of two 1Tb partitions, one on each of two 2Tb SATA disks. *At some later time* this LV was expanded to use the rest of the free space in that volume group, making it 800Gb, and *the filesystem was resized to match* -- this may have been a contributing factor.

This week, because the FS was getting quite full (~97%, with only *~30Gb left, i.e. within the last ~40Gb reserved for root -- could this be part of the trigger?*), I decided to install two spare disks so that I could migrate this VG onto them. This involved a power cycle, reboot, and lots of playing around with mdadm -- but I don't think any of this was significant. After the reboot, I had all 4 disks accessible, with no errors. One of the new disks was virgin, and I had created a new RAID10 mirror using it:

# mdadm --create /dev/md/scratch --bitmap=internal --level=10 --parity=f2 --raid-devices=2 --name=new missing /dev/sdd1

The other was recycled from another machine, and already had MD/LVM volumes on it, which were correctly recognised as "foreign" arrays/volumes. I mounted the one that still contained the system image from the other machine and copied it into a subdirectory of /export/share (specifically, Backups/Galaxy/suse-11.4/ -- see below) using rsync -- about 15Gb of data, using up about half the remaining (reserved) space. *This was the last write operation on the FS.* (I ran rsync again immediately afterwards, to verify that all files had been transferred with no errors, and all seemed OK. Nonetheless, *I think this is where the corruption occurred*.)

Then I dismantled the foreign LV/MD stack, wiped that disk, and made it part of the new RAID10 array, triggering a resync. Then I added the new array to the existing VG and migrated the LVs in it to the new array using pvmove. The pvmove completed without errors, so I then removed the original array from the VG. (The RAID remirroring completed without errors too, but I'm not sure when -- probably later.)

Now that the VG was on a bigger disk, I decided to expand each of the LVs on it. Then, when I tried to resize /export/share to use the expanded space, I was told I should run e2fsck first -- which reported many errors, starting with:

e2fsck 1.42 (29-Nov-2011)
e2fsck: Group descriptors look bad... trying backup blocks...
One or more block group descriptor checksums are invalid.  Fix<y>? yes
Group descriptor 0 checksum is invalid.  FIXED.
Group descriptor 1 checksum is invalid.  FIXED.
Group descriptor 2 checksum is invalid.  FIXED.
Group descriptor 3 checksum is invalid.  FIXED.
... etc etc ...
Group descriptor 6397 checksum is invalid.  FIXED.
Group descriptor 6398 checksum is invalid.  FIXED.
Group descriptor 6399 checksum is invalid.  FIXED.
Pass 1: Checking inodes, blocks, and sizes
Group 2968's block bitmap at 97248129 conflicts with some other fs block.
Relocate<y>? yes
Relocating group 2968's block bitmap from 97248129 to 96998147...
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 24248332:
97255511 97255512 97255513 97255514 97255515 97255516 97255517 97255518
97255519 97255520 97255521 97255522 97255523 97255524 97255525 97255526
97255527 97255528 97255529 97255530 97255531 97255532 97255533 97255534
97255535 97255536 97255537 97255538 97255539 97255540 97255541 97255542
97255543 97255544 97255545 97255546 97255547 97255548 97255549 97255550
97255551 97255552 97255553 97255554 97255555 97255556 97255557 97255558
97255559 97255560 97255561 97255562 97255563 97255564 97255565 97255566
97255567 97255568 97255569 97255570 97255571 97255572 97255573 97255574
97255575 97255576 97255577 97255578 97255579 97255580 97255581 97255582
97255583 97255584 97255585 97255586 97255587 97255588 97255589 97255590
97255591 97255592 97255593 97255594 97255595 97255596 97255597 97255598
97255599 97255600 97255601 97255602 97255603 97255604 97255605 97255606
97255607 97255608 97255609 97255610 97255611 97255612 97255613 97255614
97255615 97255616 97255617 97255618 97255619 97255620 97255621 97255622
97255623 97255624 97255625 97255626 97255627 97255628 97255629 97255630
97255631 97255632 97255633 97255634 97255635 97255636 97255637 97255638
97255639 97255640 97255641 97255642 97255643 97255644 97255645 97255646
... etc etc ...
Multiply-claimed block(s) in inode 24270904: 97263482 97263483
Multiply-claimed block(s) in inode 24270909: 97263574 97263575
Multiply-claimed block(s) in inode 24270931: 97263606 97263607
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1334 inodes containing multiply-claimed blocks.)

File /Backups/Tesseract/DrivingLicenceReverse_300dpi.bmp (inode #24248332, mod time Thu Mar 25 01:34:37 2010)
  has 136 multiply-claimed block(s), shared with 7 file(s):
        /Backups/Galaxy/suse-11.4/bin/bash (inode #24269252, mod time Thu Jul 12 20:04:07 2012)
        /Backups/Galaxy/suse-11.4/bin/basename (inode #24269251, mod time Wed Sep 21 16:30:45 2011)
        /Backups/Galaxy/suse-11.4/bin/arch (inode #24269250, mod time Wed Sep 21 16:30:45 2011)
        /Backups/Galaxy/suse-11.4/.local/share/applications/defaults.list (inode #24269249, mod time Mon Sep 12 19:44:00 2011)
        /Backups/Galaxy/suse-11.4/.config/Trolltech.conf (inode #24269248, mod time Wed Oct 26 13:59:14 2011)
        /Backups/Galaxy/suse-11.4/profilerc (inode #24269247, mod time Mon Sep 12 19:44:00 2011)
        /Backups/Galaxy/suse-11.4/C:\nppdf32Log\debuglog.txt (inode #24269246, mod time Sun Sep 9 14:37:47 2012)
Clone multiply-claimed blocks<y>? yes

File /Backups/Tesseract/wla_user_guide.pdf (inode #24248352, mod time Thu Nov 13 12:18:26 2003)
  has 1310 multiply-claimed block(s), shared with 107 file(s):
        /Backups/Galaxy/suse-11.4/bin/tcsh (inode #24269354, mod time Sat Feb 19 02:49:24 2011)
        /Backups/Galaxy/suse-11.4/bin/tar (inode #24269353, mod time Tue Jan 3 00:33:47 2012)
        /Backups/Galaxy/suse-11.4/bin/sync (inode #24269352, mod time Wed Sep 21 16:30:49 2011)
        /Backups/Galaxy/suse-11.4/bin/su (inode #24269351, mod time Wed Sep 21 16:30:49 2011)
        /Backups/Galaxy/suse-11.4/bin/stty (inode #24269350, mod time Wed Sep 21 16:30:48 2011)
        /Backups/Galaxy/suse-11.4/bin/stat (inode #24269349, mod time Wed Sep 21 16:30:48 2011)
        /Backups/Galaxy/suse-11.4/bin/spawn_login (inode #24269348, mod time Sat Feb 19 02:46:10 2011)
        /Backups/Galaxy/suse-11.4/bin/spawn_console (inode #24269347, mod time Sat Feb 19 02:46:10 2011)
... etc etc ...
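(As an aside, in case it's useful: the clashing block and inode numbers above can be mapped back to their owners with debugfs, run against the preserved copy of the FS -- the "replay" LV I describe further down. A sketch, using a couple of the numbers from the output above as examples:

# debugfs -R "icheck 97255511 97263482" /dev/test_vg/replay
# debugfs -R "ncheck 24248332 24270904" /dev/test_vg/replay

icheck reports the inode that claims each block, and ncheck resolves inode numbers back to pathnames. I haven't pasted that output here, since the e2fsck transcript already names the clashing files.)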
On examining the contents of these files, it became evident that in each case the newly copied files in Backups/Galaxy/suse-11.4/ were correct, while the named files in Backups/Tesseract/... were corrupted. Hence my conclusion that some of the blocks already allocated to the latter were erroneously taken to be free and used for the new files copied in by rsync.

...
File /Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-oc.xml (inode #24270909, mod time Sun Aug 14 21:50:15 2011)
  has 2 multiply-claimed block(s), shared with 2 file(s):
        <filesystem metadata>
        /Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.

File /Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-wa.xml (inode #24270931, mod time Sun Aug 14 21:50:20 2011)
  has 2 multiply-claimed block(s), shared with 2 file(s):
        <filesystem metadata>
        /Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +96998147
Fix<y>? yes
Free blocks count wrong for group #1133 (0, counted=156).
Fix<y>? yes
Free blocks count wrong for group #1134 (0, counted=943).
Fix<y>? yes
... etc etc ...
Free blocks count wrong for group #6019 (32768, counted=0).
Fix<y>? yes
Free blocks count wrong for group #6020 (32768, counted=0).
Fix<y>? yes
...
Directories count wrong for group #4465 (0, counted=29).
Fix<y>? yes
Free inodes count wrong (52421173, counted=51433277).
Fix<y>? yes

share: ***** FILE SYSTEM WAS MODIFIED *****

  995523 inodes used (1.90%)
    1231 non-contiguous files (0.1%)
     980 non-contiguous directories (0.1%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 955338/210/3
195882827 blocks used (93.40%)
       0 bad blocks
      38 large files

  859488 regular files
   90714 directories
      94 character device files
      64 block device files
      16 fifos
   79548 links
   44961 symbolic links (39613 fast symbolic links)
     177 sockets
--------
 1075062 files

Because I suspected the FS might have been corrupted by pvmove shuffling its data between volumes (or even by the md remirroring process going on underneath that!), I put the old PV that I had recently removed from the VG into a new VG of its own, and used lvcreate/lvextend to resurrect the original copy of the FS:

# lvcreate --verbose --name replay --extents 171751 --zero n test_vg /dev/md126:65536-
# lvextend --verbose --extents 204800 /dev/test_vg/replay /dev/md126:30720-63768

Running

# e2fsck -f -n /dev/test_vg/replay

on that copy showed exactly the same corruption. Thus it seems that the FS was already damaged before it was mirrored onto the new volume, which is why I suspect the problem lies in EXT4 rather than in LVM or md.
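Incidentally, if a metadata-only image of the corrupted FS would be useful for debugging, I believe e2image can produce one from the preserved copy -- something along the lines of the man page's suggested invocation (I haven't actually run this yet, and the output filename is just illustrative):

# e2image -r /dev/test_vg/replay - | bzip2 > share-metadata.e2i.bz2

That should capture the superblock, group descriptors, bitmaps and inode tables without the (mostly private) file data itself. Just say the word.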
Here's the output of dumpe2fs -h as it was after the corruption but before letting e2fsck fix it:

Filesystem volume name:   share
Last mounted on:          /export/share
Filesystem UUID:          80477518-0fea-447a-bece-f77fe26193bb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              52428800
Block count:              209715200
Reserved block count:     10484660
Free blocks:              13897914
Free inodes:              51433277
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      974
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              128
RAID stripe width:        256
Flex block group size:    16
Filesystem created:       Wed Feb 6 15:50:31 2013
Last mount time:          Mon Jul 15 17:51:37 2013
Last write time:          Mon Jul 15 18:01:03 2013
Mount count:              24
Maximum mount count:      -1
Last checked:             Thu Feb 7 18:33:49 2013
Check interval:           0 (<none>)
Lifetime writes:          480 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      5ff8295f-3988-40e0-b195-998d6e67aa31
Journal backup:           inode blocks
FS Error count:           1
First error time:         Mon Jul 15 18:01:03 2013
First error function:     ext4_mb_generate_buddy
First error line #:       739
First error inode #:      0
First error block #:      0
Last error time:          Mon Jul 15 18:01:03 2013
Last error function:      ext4_mb_generate_buddy
Last error line #:        739
Last error inode #:       0
Last error block #:       0
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0000645d
Journal start:            0

As it happens, only 13 existing files (containing a total of 65Mb of data between them) were damaged, and they were mostly large but ancient and not very important content backed up from other machines. So I've had something of a lucky escape; and I've subsequently changed all live volumes to use errors=remount-ro rather than errors=continue, which I had never realised was the default!

I can provide any information you'd like about the corrupted FS, as I've preserved it in that state since (modulo anything that might have been changed by mounting it read-only). But I don't have any way of finding out what its internal state was when it was last mounted, or immediately before the corruption occurred.

Hope this helps -- and let me know if there's anything you'd like me to extract from the corrupted FS.

Ciao,
Dave

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1202994

Title:
  EXT4 filesystem corruption with uninit_bg and error=continue

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions