On 19/07/13 18:34, Joseph Salisbury wrote:
> The commit (b0dd6b7) you mention in the upstream bug report is in the 3.2 
> stable tree as commit 76f4fa4:
> * 76f4fa4 - ext4: fix the free blocks calculation for ext3 file systems w/ 
> uninit_bg (1 year, 1 month ago) <Theodore Ts'o>
>
> It was available as of 3.2.20 as you say:
>   git describe --contains 76f4fa4
> v3.2.20~1
>
> This means that patch is in the 3.2.0-49 Ubuntu kernel, since it
> contains all the upstream 3.2.46 updates.
>
> The patch from Darrick J. Wong that you mention is still being discussed on the 
> linux-ext4 mailing list and is not yet available in the mainline kernel tree:
>   ext4: Prevent massive fs corruption if verifying the block bitmap fails
>
> Do you have a way to easily reproduce this bug?  If so, I can build a
> test kernel with Darrick's patch for you to test.

'Fraid not -- it's a one-off event (I hope!).

The filesystem in question (/export/share - mostly used for backups of 
other machines and ISO boot images) had originally been created on a 
logical volume of ~640Gb in a volume group of just under 1Tb, on a single 
PV composed of a RAID10 array of two 1Tb partitions, one on each of two 
2Tb SATA disks. *At some later time* this LV was expanded to use the 
rest of the free space in that volume group, making it 800Gb, and *the 
filesystem was resized to match* -- this may have been a contributing 
factor.
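
For reference, that expansion was the usual grow-the-LV-then-grow-the-FS 
sequence, something along these lines (the VG name is a placeholder, not 
the exact command):

    # lvextend -l +100%FREE /dev/<vg>/share    # give the LV the VG's remaining free extents
    # resize2fs /dev/<vg>/share                # then grow the ext4 filesystem to fill the LV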

This week, because the FS was getting quite full (~97%, with only *~30Gb 
left, i.e. within the last ~40Gb reserved for root - could this be part 
of the trigger?*), I decided to install two spare disks so that I could 
migrate this VG onto them. This involved a power cycle, reboot, and lots 
of playing around with mdadm -- but I don't think any of this was 
significant.
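
For what it's worth, the root-reserved figure is just the superblock's 
reserved block count, which can be checked with tune2fs; a quick sketch 
(device path is a placeholder):

    # df -h /export/share                      # space as seen by non-root users
    # tune2fs -l /dev/<vg>/share | grep -i 'reserved block count'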

After reboot, I had all 4 disks accessible, with no errors. One of the 
new disks was virgin, and I had created a new RAID10 mirror using it:

    # mdadm --create /dev/md/scratch --bitmap=internal --level=10 \
        --parity=f2 --raid-devices=2 --name=new missing /dev/sdd1


The other was recycled from another machine, and already had MD/LVM 
volumes on it, which were correctly recognised as "foreign" 
arrays/volumes. I mounted the one that still contained the system image 
from the other machine and copied it into a subdirectory of 
/export/share (specifically, Backups/Galaxy/suse-11.4/ -- see below) 
using rsync -- about 15Gb of data, using up about half the remaining 
(reserved) space. *This was the last write operation on the FS.* (I ran 
rsync again immediately afterwards, to verify that all files had been 
transferred with no errors, and all seemed OK. Nonetheless, *I think 
this is where the corruption occurred*.)
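
The copy and the verification pass were along these lines (the source 
mount point is a placeholder, and I'm not quoting the exact options; the 
second run adds -c/-n/-i, i.e. compare by checksum, dry-run, and itemise, 
so any mismatch would have been listed):

    # rsync -aHAX /mnt/foreign-root/ /export/share/Backups/Galaxy/suse-11.4/
    # rsync -aHAXcni /mnt/foreign-root/ /export/share/Backups/Galaxy/suse-11.4/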

Then I dismantled the foreign LV/MD stack, wiped that disk, and made it 
part of the new RAID10 array, triggering a resync. Then I added the new 
array to the existing VG and migrated the LVs in it to the new array 
using pvmove.
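
The migration itself was the standard vgextend/pvmove sequence, roughly 
as follows (the VG name is a placeholder; /dev/md126 is the original 
array, as it appears in the lvcreate commands further down):

    # vgextend <vg> /dev/md/scratch        # add the new array to the volume group
    # pvmove /dev/md126 /dev/md/scratch    # move every allocated extent off the old PV
    # vgreduce <vg> /dev/md126             # then drop the old array (next paragraph)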

The pvmove completed without errors, so I then removed the original 
array from the VG. (The RAID remirroring completed without errors too, 
though I'm not sure exactly when -- probably later.) Now that the VG was 
on a bigger disk, I decided to expand each of the LVs on it. But when I 
tried to resize /export/share to use the expanded space, I was told I 
should run e2fsck first - which reported many errors, starting with:

    e2fsck 1.42 (29-Nov-2011)
    e2fsck: Group descriptors look bad... trying backup blocks...
    One or more block group descriptor checksums are invalid.  Fix<y>? yes

    Group descriptor 0 checksum is invalid.  FIXED.
    Group descriptor 1 checksum is invalid.  FIXED.
    Group descriptor 2 checksum is invalid.  FIXED.
    Group descriptor 3 checksum is invalid.  FIXED.
    ... etc etc ...
    Group descriptor 6397 checksum is invalid.  FIXED.
    Group descriptor 6398 checksum is invalid.  FIXED.
    Group descriptor 6399 checksum is invalid.  FIXED.
    Pass 1: Checking inodes, blocks, and sizes
    Group 2968's block bitmap at 97248129 conflicts with some other fs block.
    Relocate<y>? yes

    Relocating group 2968's block bitmap from 97248129 to 96998147...

    Running additional passes to resolve blocks claimed by more than one 
inode...
    Pass 1B: Rescanning for multiply-claimed blocks
    Multiply-claimed block(s) in inode 24248332: 97255511 97255512 97255513 
97255514 97255515 97255516 97255517 97255518 97255519 97255520 97255521 
97255522 97255523 97255524 97255525 97255526 97255527 97255528 97255529 
97255530 97255531 97255532 97255533 97255534 97255535 97255536 97255537 
97255538 97255539 97255540 97255541 97255542 97255543 97255544 97255545 
97255546 97255547 97255548 97255549 97255550 97255551 97255552 97255553 
97255554 97255555 97255556 97255557 97255558 97255559 97255560 97255561 
97255562 97255563 97255564 97255565 97255566 97255567 97255568 97255569 
97255570 97255571 97255572 97255573 97255574 97255575 97255576 97255577 
97255578 97255579 97255580 97255581 97255582 97255583 97255584 97255585 
97255586 97255587 97255588 97255589 97255590 97255591 97255592 97255593 
97255594 97255595 97255596 97255597 97255598 97255599 97255600 97255601 
97255602 97255603 97255604 97255605 97255606 97255607 97255608 97255609 
97255610 97255611 97255612 97255613 97255614 97255615 97255616 97255617 97255618
    97255619 97255620 97255621 97255622 97255623 97255624 97255625 97255626 
97255627 97255628 97255629 97255630 97255631 97255632 97255633 97255634 
97255635 97255636 97255637 97255638 97255639 97255640 97255641 97255642 
97255643 97255644 97255645 97255646
    ... etc etc ...
    Multiply-claimed block(s) in inode 24270904: 97263482 97263483
    Multiply-claimed block(s) in inode 24270909: 97263574 97263575
    Multiply-claimed block(s) in inode 24270931: 97263606 97263607
    Pass 1C: Scanning directories for inodes with multiply-claimed blocks
    Pass 1D: Reconciling multiply-claimed blocks
    (There are 1334 inodes containing multiply-claimed blocks.)

    File /Backups/Tesseract/DrivingLicenceReverse_300dpi.bmp (inode #24248332, 
mod time Thu Mar 25 01:34:37 2010)
       has 136 multiply-claimed block(s), shared with 7 file(s):
             /Backups/Galaxy/suse-11.4/bin/bash (inode #24269252, mod time Thu 
Jul 12 20:04:07 2012)
             /Backups/Galaxy/suse-11.4/bin/basename (inode #24269251, mod time 
Wed Sep 21 16:30:45 2011)
             /Backups/Galaxy/suse-11.4/bin/arch (inode #24269250, mod time Wed 
Sep 21 16:30:45 2011)
             /Backups/Galaxy/suse-11.4/.local/share/applications/defaults.list 
(inode #24269249, mod time Mon Sep 12 19:44:00 2011)
             /Backups/Galaxy/suse-11.4/.config/Trolltech.conf (inode #24269248, 
mod time Wed Oct 26 13:59:14 2011)
             /Backups/Galaxy/suse-11.4/profilerc (inode #24269247, mod time Mon 
Sep 12 19:44:00 2011)
             /Backups/Galaxy/suse-11.4/C:\nppdf32Log\debuglog.txt (inode 
#24269246, mod time Sun Sep  9 14:37:47 2012)
    Clone multiply-claimed blocks<y>? yes

    File /Backups/Tesseract/wla_user_guide.pdf (inode #24248352, mod time Thu 
Nov 13 12:18:26 2003)
       has 1310 multiply-claimed block(s), shared with 107 file(s):
             /Backups/Galaxy/suse-11.4/bin/tcsh (inode #24269354, mod time Sat 
Feb 19 02:49:24 2011)
             /Backups/Galaxy/suse-11.4/bin/tar (inode #24269353, mod time Tue 
Jan  3 00:33:47 2012)
             /Backups/Galaxy/suse-11.4/bin/sync (inode #24269352, mod time Wed 
Sep 21 16:30:49 2011)
             /Backups/Galaxy/suse-11.4/bin/su (inode #24269351, mod time Wed 
Sep 21 16:30:49 2011)
             /Backups/Galaxy/suse-11.4/bin/stty (inode #24269350, mod time Wed 
Sep 21 16:30:48 2011)
             /Backups/Galaxy/suse-11.4/bin/stat (inode #24269349, mod time Wed 
Sep 21 16:30:48 2011)
             /Backups/Galaxy/suse-11.4/bin/spawn_login (inode #24269348, mod 
time Sat Feb 19 02:46:10 2011)
             /Backups/Galaxy/suse-11.4/bin/spawn_console (inode #24269347, mod 
time Sat Feb 19 02:46:10 2011)
    ... etc etc ...

On examining the contents of these files, it became evident that in each 
case the newly copied files in Backups/Galaxy/suse-11.4/ were correct, 
while the named files in Backups/Tesseract/... were corrupted. Hence my 
conclusion that some of the blocks already allocated to the latter were 
erroneously taken to be free and used for the new files copied in by rsync.

    ...
    File 
/Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-oc.xml (inode 
#24270909, mod time Sun Aug 14 21:50:15 2011)
       has 2 multiply-claimed block(s), shared with 2 file(s):
             <filesystem metadata>
             /Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the 
Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time 
Fri Feb  4 22:53:03 2011)
    Multiply-claimed blocks already reassigned or cloned.

    File 
/Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-wa.xml (inode 
#24270931, mod time Sun Aug 14 21:50:20 2011)
       has 2 multiply-claimed block(s), shared with 2 file(s):
             <filesystem metadata>
             /Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the 
Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time 
Fri Feb  4 22:53:03 2011)
    Multiply-claimed blocks already reassigned or cloned.

    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences:  +96998147
    Fix<y>? yes

    Free blocks count wrong for group #1133 (0, counted=156).
    Fix<y>? yes

    Free blocks count wrong for group #1134 (0, counted=943).
    Fix<y>? yes

    ... etc etc ...

    Free blocks count wrong for group #6019 (32768, counted=0).
    Fix<y>? yes

    Free blocks count wrong for group #6020 (32768, counted=0).
    Fix<y>? yes

    ...

    Directories count wrong for group #4465 (0, counted=29).
    Fix<y>? yes

    Free inodes count wrong (52421173, counted=51433277).
    Fix<y>? yes


    share: ***** FILE SYSTEM WAS MODIFIED *****

       995523 inodes used (1.90%)
         1231 non-contiguous files (0.1%)
          980 non-contiguous directories (0.1%)
              # of inodes with ind/dind/tind blocks: 0/0/0
              Extent depth histogram: 955338/210/3
    195882827 blocks used (93.40%)
            0 bad blocks
           38 large files

       859488 regular files
        90714 directories
           94 character device files
           64 block device files
           16 fifos
        79548 links
        44961 symbolic links (39613 fast symbolic links)
          177 sockets
    --------
      1075062 files

Because I suspected the FS might have been corrupted by pvmove shuffling 
its data between volumes (or even by the md remirroring process going on 
underneath that!), I put the old PV that I had recently removed from the 
VG into a new VG of its own, and used lvcreate/lvextend to resurrect the 
original copy of the FS:

    # lvcreate --verbose --name replay --extents 171751 --zero n test_vg /dev/md126:65536-
    # lvextend --verbose --extents 204800 /dev/test_vg/replay /dev/md126:30720-63768

Running

    # e2fsck -f -n /dev/test_vg/replay

showed exactly the same corruption. Thus it seems that the FS was 
already damaged before it was mirrored onto the new volume, which is why 
I suspect the problem lies in EXT4 rather than LVM or md.

Here's the output of dumpe2fs -h as it was after the corruption but 
before letting e2fsck fix it:

Filesystem volume name:   share
Last mounted on:          /export/share
Filesystem UUID:          80477518-0fea-447a-bece-f77fe26193bb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype 
extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              52428800
Block count:              209715200
Reserved block count:     10484660
Free blocks:              13897914
Free inodes:              51433277
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      974
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              128
RAID stripe width:        256
Flex block group size:    16
Filesystem created:       Wed Feb  6 15:50:31 2013
Last mount time:          Mon Jul 15 17:51:37 2013
Last write time:          Mon Jul 15 18:01:03 2013
Mount count:              24
Maximum mount count:      -1
Last checked:             Thu Feb  7 18:33:49 2013
Check interval:           0 (<none>)
Lifetime writes:          480 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      5ff8295f-3988-40e0-b195-998d6e67aa31
Journal backup:           inode blocks
FS Error count:           1
First error time:         Mon Jul 15 18:01:03 2013
First error function:     ext4_mb_generate_buddy
First error line #:       739
First error inode #:      0
First error block #:      0
Last error time:          Mon Jul 15 18:01:03 2013
Last error function:      ext4_mb_generate_buddy
Last error line #:        739
Last error inode #:       0
Last error block #:       0
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0000645d
Journal start:            0

As it happens, only 13 existing files (containing a total of 65Mb of data 
between them) were damaged, and they were mostly large but ancient and not 
very important content backed up from other machines. So I've had something 
of a lucky escape; and I've subsequently changed all live volumes to use 
errors=remount-ro rather than errors=continue, which I had never realised 
was the default!
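
For anyone wanting to do the same, the error behaviour can be changed 
either in the superblock or per mount; a sketch (device path is a 
placeholder):

    # tune2fs -e remount-ro /dev/<vg>/share    # set the superblock's default error behaviour

or, per mount, in /etc/fstab:

    /dev/<vg>/share  /export/share  ext4  defaults,errors=remount-ro  0  2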

I can provide any information you'd like about the corrupted FS, as I've 
preserved it in that state since (modulo anything that might have been 
changed by mounting it read-only). But I don't have any way of finding out 
what the internal state was when it was last mounted or immediately before 
the corruption occurred.
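
If it would help, individual inodes and other metadata can be pulled from 
the preserved copy with debugfs (which opens the device read-only unless 
given -w); e.g. something like this, with the inode number taken from the 
e2fsck output above:

    # debugfs -R 'stat <24248332>' /dev/test_vg/replay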

Hope this helps -- and let me know if there's anything you'd like me to
extract from the corrupted FS.

Ciao,
Dave

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1202994

Title:
  EXT4 filesystem corruption with uninit_bg and error=continue
