Public bug reported:

There was a long and complicated sequence of activities involving mdadm, lvm, and specifically pvmove leading up to the point where the corruption was discovered, but I suspect most were irrelevant. AFAICT, the bug was triggered by the following simple operations:
* the FS was unmounted & remounted -- thus, the journal was fresh and hadn't wrapped (which other reports appear to indicate would have prevented the bug showing up)
* the FS options include uninit_bg AND errors=continue
* a bunch of files were then copied onto the FS -- this was the last write operation on the FS.

Later, e2fsck indicated a bunch of problems, including corrupted group descriptors. Specifically, it found that many blocks were now claimed by two files; in each case, one was an old file and one was one of those newly copied, and the contents matched the expected data for the latter.

So I think this starts with an instance of the miscalculation of checksums in uninit_bg blocks (fixed by Ted Ts'o last June), followed by the (invalid or uninitialised) bitmap being used anyway (because errors=continue) and the blocks it appeared to show as free then being allocated to new files.

Jul 15 18:01:03 redshift kernel: [ 9332.021245] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 2968, 8105 clusters in bitmap, 0 in gd
...
Jul 16 18:05:14 redshift kernel: [95982.560034] EXT4-fs (dm-1): error count: 1
Jul 16 18:05:14 redshift kernel: [95982.560044] EXT4-fs (dm-1): initial error at 1373907663: ext4_mb_generate_buddy:739
Jul 16 18:05:14 redshift kernel: [95982.560053] EXT4-fs (dm-1): last error at 1373907663: ext4_mb_generate_buddy:739
...
Jul 16 20:53:19 redshift kernel: [106068.077526] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 0 failed (47831!=4825)
Jul 16 20:53:19 redshift kernel: [106068.077540] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 1 failed (14670!=8882)

I see that in an astonishing display of synchronicity, Darrick J Wong filed a patch at 17 Jul 2013 04:02 -- the very next day, or maybe even the same day, depending on timezone -- to prevent the knock-on effects (see "[PATCH] ext4: Prevent massive fs corruption if verifying the block bitmap fails" at http://permalink.gmane.org/gmane.comp.file-systems.ext4/39535 ).
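For anyone wanting to cross-check the log excerpt above: the "initial error at" and "last error at" values are Unix epoch seconds, so they can be converted and compared with the syslog timestamp on the ext4_mb_generate_buddy message. The following is only a minimal Python sketch; the regular expression and the function name parse_error_stamps are made up for illustration and are not part of any ext4 tooling.

```python
import re
from datetime import datetime, timezone

# Two lines copied verbatim from the kernel log excerpt in this report.
LOG_LINES = [
    "Jul 16 18:05:14 redshift kernel: [95982.560044] EXT4-fs (dm-1): "
    "initial error at 1373907663: ext4_mb_generate_buddy:739",
    "Jul 16 18:05:14 redshift kernel: [95982.560053] EXT4-fs (dm-1): "
    "last error at 1373907663: ext4_mb_generate_buddy:739",
]

# Hypothetical pattern, written only to match the two message shapes above.
ERROR_AT = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<which>initial|last) error at "
    r"(?P<epoch>\d+): (?P<where>\S+)"
)


def parse_error_stamps(lines):
    """Return (device, 'initial'|'last', UTC datetime, location) per matching line."""
    stamps = []
    for line in lines:
        match = ERROR_AT.search(line)
        if match is None:
            continue  # not an "error at" summary line
        when = datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc)
        stamps.append((match.group("dev"), match.group("which"), when, match.group("where")))
    return stamps


if __name__ == "__main__":
    for dev, which, when, where in parse_error_stamps(LOG_LINES):
        print(dev, which, when.isoformat(), where)
```

Both entries convert to 2013-07-15 17:01:03 UTC. That lines up with the "Jul 15 18:01:03" syslog stamp if the machine was running on British Summer Time (UTC+1), which is an assumption based on the en_GB locale rather than something the logs state. If so, the recorded first and last error are the same single ext4_mb_generate_buddy event.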
But what puzzles me is that the initial triggering bug still seems to be present in this kernel (vmlinuz-3.2.0-49-generic, which is based on 3.2.46), even though, according to this conversation https://bugzilla.kernel.org/show_bug.cgi?id=42723#c8 the fix was backported to 3.2.20. Is it possible that there is another way of getting the "ext4_mb_generate_buddy:739" error?

I have kept an e2image dump of the corrupted FS in case it's of any use to EXT4 developers, but it's not attached, as even in QCOW2 format it's ~1GB.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-49-generic 3.2.0-49.75
ProcVersionSignature: Ubuntu 3.2.0-49.75-generic 3.2.46
Uname: Linux 3.2.0-49-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.3
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC1:  dsg        7005 F.... pulseaudio
 /dev/snd/controlC0:  dsg        7005 F.... pulseaudio
CRDA:
 country AW:
  (2402 - 2482 @ 40), (N/A, 20)
  (5170 - 5250 @ 40), (N/A, 20)
  (5250 - 5330 @ 40), (N/A, 20), DFS
  (5490 - 5710 @ 40), (N/A, 27), DFS
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name   : 'Realtek ALC892'
   Components   : 'HDA:10ec0892,1458a102,00100302'
   Controls     : 46
   Simple ctrls : 21
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdefc000 irq 19'
   Mixer name   : 'ATI RS690/780 HDMI'
   Components   : 'HDA:1002791a,00791a00,00100000'
   Controls     : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Thu Jul 18 19:04:57 2013
HibernationDevice: RESUME=UUID=2ab26064-3b90-475d-b3c2-51a70c2d990a
InstallationMedia: Kubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120822.2)
MachineType: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H
MarkForUpload: True
ProcEnviron:
 LANGUAGE=en_GB
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-49-generic root=/dev/mapper/system-kubuntu ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-49-generic N/A
 linux-backports-modules-3.2.0-49-generic  N/A
 linux-firmware                            1.79.4
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: yes
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: FD
dmi.board.name: GA-890GPA-UD3H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrFD:bd07/23/2010:svnGigabyteTechnologyCo.,Ltd.:pnGA-890GPA-UD3H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-890GPA-UD3H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-890GPA-UD3H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug precise

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1202994

Title:
  EXT4 filesystem corruption with uninit_bg and error=continue

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs