Public bug reported:

There was a long and complicated sequence of activities involving mdadm,
lvm, and specifically pvmove leading up to the point where the
corruption was discovered, but I suspect most were irrelevant. AFAICT,
the bug was triggered by the following simple operations:

* the FS was unmounted and remounted, so the journal was fresh and had not
wrapped (other reports appear to indicate that a wrapped journal would have
prevented the bug from showing up)
* the FS has the uninit_bg feature AND errors=continue set
* a bunch of files were then copied onto the FS; this was the last write
operation on the FS.

Later, e2fsck reported a bunch of problems, including corrupted group
descriptors. Specifically, it found that many blocks were now claimed by
two files; in each case, one was an old file and the other was one of
those newly copied, and the contents matched the expected data for the
latter.

So I think this starts with an instance of the miscalculation of
checksums in uninit_bg block groups (fixed by Ted Ts'o last June),
followed by the (invalid or uninitialised) bitmap being used anyway
(because of errors=continue), and the blocks it appeared to show as free
then being allocated to new files.
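For context, a minimal Python sketch of the group descriptor checksum that uninit_bg (EXT4_FEATURE_RO_COMPAT_GDT_CSUM in the kernel) relies on, and which the ext4_check_descriptors messages below report as failing. It is modelled on ext4_group_desc_csum() in fs/ext4/super.c of this era: a CRC16 (polynomial 0x8005, processed LSB-first as in the kernel's lib/crc16.c) seeded with 0xFFFF and run over the filesystem UUID, the little-endian group number, and the descriptor bytes with the 2-byte bg_checksum field skipped. It assumes a 32-byte descriptor (no 64bit feature, bg_checksum at offset 30); the helper names, UUID and descriptor values are hypothetical, so treat it as an illustration rather than the kernel code.

```python
import struct

# Layout assumption: a 32-byte ext4_group_desc (no 64bit feature), where
# bg_checksum is the 16-bit field at byte offset 30.
DESC_SIZE = 32
BG_CHECKSUM_OFFSET = 30


def crc16(crc: int, data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x8005 processed LSB-first (0xA001)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc & 0xFFFF


def group_desc_csum(fs_uuid: bytes, group: int, desc: bytes) -> int:
    """Recompute a group descriptor checksum, skipping bg_checksum itself."""
    crc = crc16(0xFFFF, fs_uuid)                  # seed ~0, then the 16-byte UUID
    crc = crc16(crc, struct.pack("<I", group))    # group number, little-endian
    crc = crc16(crc, desc[:BG_CHECKSUM_OFFSET])   # fields before bg_checksum
    # Bytes after bg_checksum: empty for a 32-byte descriptor; the kernel
    # only covers them when the 64bit feature gives a larger descriptor.
    crc = crc16(crc, desc[BG_CHECKSUM_OFFSET + 2:])
    return crc


def stored_csum(desc: bytes) -> int:
    """Read the bg_checksum field as stored in the descriptor."""
    return struct.unpack_from("<H", desc, BG_CHECKSUM_OFFSET)[0]


# Hypothetical UUID and descriptor contents, for illustration only.
fs_uuid = bytes(range(16))
desc = bytearray(DESC_SIZE)
# bg_block_bitmap_lo, bg_inode_bitmap_lo, bg_inode_table_lo, bg_free_blocks_count_lo
struct.pack_into("<IIIH", desc, 0, 1025, 1041, 1057, 8105)
struct.pack_into("<H", desc, BG_CHECKSUM_OFFSET,
                 group_desc_csum(fs_uuid, 2968, bytes(desc)))
print(stored_csum(bytes(desc)) == group_desc_csum(fs_uuid, 2968, bytes(desc)))  # True

struct.pack_into("<H", desc, 12, 0)  # change bg_free_blocks_count_lo only
print(stored_csum(bytes(desc)) == group_desc_csum(fs_uuid, 2968, bytes(desc)))  # False
```

Because the checksum covers the free-blocks count, any descriptor update that is not followed by a matching checksum update shows up as a mismatch on the next check.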

Jul 15 18:01:03 redshift kernel: [ 9332.021245] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 2968, 8105 clusters in bitmap, 0 in gd
...
Jul 16 18:05:14 redshift kernel: [95982.560034] EXT4-fs (dm-1): error count: 1
Jul 16 18:05:14 redshift kernel: [95982.560044] EXT4-fs (dm-1): initial error at 1373907663: ext4_mb_generate_buddy:739
Jul 16 18:05:14 redshift kernel: [95982.560053] EXT4-fs (dm-1): last error at 1373907663: ext4_mb_generate_buddy:739
...
Jul 16 20:53:19 redshift kernel: [106068.077526] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 0 failed (47831!=4825)
Jul 16 20:53:19 redshift kernel: [106068.077540] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 1 failed (14670!=8882)
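The "initial error at" / "last error at" values above are Unix timestamps (seconds since the epoch) that ext4 keeps in the superblock (s_first_error_time / s_last_error_time). A quick Python check converts 1373907663 to 17:01:03 UTC on 15 Jul 2013, which lines up with the first ext4_mb_generate_buddy line logged at 18:01:03, assuming the syslog clock was on UTC+1 (BST, plausible given the en_GB locale). Together with "error count: 1", this suggests that single buddy-generation error was the only one recorded before the descriptor checksum failures.

```python
from datetime import datetime, timezone

# Value taken from the "initial error at" / "last error at" lines above.
first_error = 1373907663
print(datetime.fromtimestamp(first_error, tz=timezone.utc).isoformat())
# 2013-07-15T17:01:03+00:00
```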

I see that, in an astonishing display of synchronicity, Darrick J. Wong
posted a patch on 17 Jul 2013 at 04:02, the very next day or maybe even
the same day depending on timezone, to prevent the knock-on effects
(see "[PATCH] ext4: Prevent massive fs corruption if verifying the block
bitmap fails" at
http://permalink.gmane.org/gmane.comp.file-systems.ext4/39535 ).

But what puzzles me is that the initial triggering bug appears to still
be present in this kernel (vmlinuz-3.2.0-49-generic, which is based on
3.2.46 according to ProcVersionSignature below), even though the
discussion at https://bugzilla.kernel.org/show_bug.cgi?id=42723#c8 says
the fix was backported to 3.2.20. Is it possible that there is another
way of triggering the "ext4_mb_generate_buddy:739" error?
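A simplified model of the check behind the "ext4_mb_generate_buddy:739" message, assuming it works as ext4_mb_generate_buddy() in 3.2's fs/ext4/mballoc.c appears to: it counts the free clusters in the group's block bitmap and compares that with the free count taken from the group descriptor. Any mismatch produces the message, whatever caused it, so in principle any bug that leaves the bitmap and descriptor disagreeing would trigger it, not only the uninit_bg miscalculation. The function names, the 32768-clusters-per-group figure (4 KiB blocks, no bigalloc) and the example data are hypothetical.

```python
# Assumption: 32768 clusters per group (4 KiB blocks, no bigalloc).
CLUSTERS_PER_GROUP = 32768


def count_free_clusters(bitmap: bytes, nbits: int = CLUSTERS_PER_GROUP) -> int:
    """Count clear bits (free clusters); bit i lives in byte i // 8, LSB first."""
    return sum(1 for i in range(nbits) if not (bitmap[i // 8] >> (i % 8)) & 1)


def check_group(group: int, bitmap: bytes, gd_free: int) -> int:
    """Hypothetical model of the comparison; returns the free count kept."""
    free = count_free_clusters(bitmap)
    if free != gd_free:
        # The kernel reports this via ext4_grp_locked_error(); under
        # errors=continue it carries on and adopts the bitmap's count.
        print(f"group {group}, {free} clusters in bitmap, {gd_free} in gd")
        return free
    return gd_free


# Example: a bitmap with 8105 free clusters but a descriptor claiming 0.
bitmap = bytearray(b"\xff" * (CLUSTERS_PER_GROUP // 8))
for i in range(8105):
    bitmap[i // 8] &= ~(1 << (i % 8)) & 0xFF
print(check_group(2968, bytes(bitmap), 0))
# group 2968, 8105 clusters in bitmap, 0 in gd
# 8105
```

Because the check only compares two counts, it says nothing about which one is wrong or why they diverged; in this model, as in the code it mimics, the bitmap simply wins.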

I have kept an e2image dump of the corrupted FS in case it is of any use
to EXT4 developers; it is not attached because, even in QCOW2 format,
it is ~1 GB.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-49-generic 3.2.0-49.75
ProcVersionSignature: Ubuntu 3.2.0-49.75-generic 3.2.46
Uname: Linux 3.2.0-49-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.3
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC1:  dsg        7005 F.... pulseaudio
 /dev/snd/controlC0:  dsg        7005 F.... pulseaudio
CRDA:
 country AW:
        (2402 - 2482 @ 40), (N/A, 20)
        (5170 - 5250 @ 40), (N/A, 20)
        (5250 - 5330 @ 40), (N/A, 20), DFS
        (5490 - 5710 @ 40), (N/A, 27), DFS
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name   : 'Realtek ALC892'
   Components   : 'HDA:10ec0892,1458a102,00100302'
   Controls      : 46
   Simple ctrls  : 21
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdefc000 irq 19'
   Mixer name   : 'ATI RS690/780 HDMI'
   Components   : 'HDA:1002791a,00791a00,00100000'
   Controls      : 4
   Simple ctrls  : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Thu Jul 18 19:04:57 2013
HibernationDevice: RESUME=UUID=2ab26064-3b90-475d-b3c2-51a70c2d990a
InstallationMedia: Kubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120822.2)
MachineType: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H
MarkForUpload: True
ProcEnviron:
 LANGUAGE=en_GB
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-49-generic root=/dev/mapper/system-kubuntu ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-49-generic N/A
 linux-backports-modules-3.2.0-49-generic  N/A
 linux-firmware                            1.79.4
RfKill:
 0: phy0: Wireless LAN
        Soft blocked: yes
        Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: FD
dmi.board.name: GA-890GPA-UD3H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrFD:bd07/23/2010:svnGigabyteTechnologyCo.,Ltd.:pnGA-890GPA-UD3H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-890GPA-UD3H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-890GPA-UD3H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug precise

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1202994

Title:
  EXT4 filesystem corruption with uninit_bg and error=continue

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
