On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <[email protected]> wrote:
> On 06/11/2014 10:35 PM, John Stultz wrote:
>> I've been seeing some ext4 corruption with recent kernels under
>> qemu-system-arm.
>>
>> This issue seems to crop up after shutting down uncleanly (terminating
>> qemu), shortly after booting about 50% of the time.
>>
>> ext4/mmc related dmesg details are:
>> [ 3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at 0x10005000 irq 41,42 (pio)
>> [ 3.268316] mmc0: new SDHC card at address 4567
>> [ 3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
>> [ 3.315699] mmcblk0: p1 p2 p3 p4 < p5 p6 >
>> ...
>> [ 11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
>> [ 11.904714] EXT4-fs (mmcblk0p5): recovery complete
>> [ 11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: nomblk_io_submit,errors=panic
>> ...
>> [ 91.558824] EXT4-fs error (device mmcblk0p5): ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in gd; block bitmap corrupt.
>> [ 91.560641] Aborting journal on device mmcblk0p5-8.
>> [ 91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): panic forced after error
>> [ 91.562589]
>> [ 91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
>> [ 91.564616] [<c00116e5>] (unwind_backtrace) from [<c000f3b1>] (show_stack+0x11/0x14)
>> [ 91.565154] [<c000f3b1>] (show_stack) from [<c04262b1>] (dump_stack+0x59/0x7c)
>> [ 91.565666] [<c04262b1>] (dump_stack) from [<c0423297>] (panic+0x67/0x178)
>> [ 91.566147] [<c0423297>] (panic) from [<c0134bb1>] (ext4_handle_error+0x69/0x74)
>> [ 91.566659] [<c0134bb1>] (ext4_handle_error) from [<c0135437>] (__ext4_grp_locked_error+0x6b/0x160)
>> [ 91.567223] [<c0135437>] (__ext4_grp_locked_error) from [<c0143079>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>> [ 91.567860] [<c0143079>] (ext4_mb_generate_buddy) from [<c01447e5>] (ext4_mb_init_cache+0x219/0x4e0)
>> [ 91.568473] [<c01447e5>] (ext4_mb_init_cache) from [<c0144b67>] (ext4_mb_init_group+0xbb/0x138)
>> [ 91.569021] [<c0144b67>] (ext4_mb_init_group) from [<c0144cd7>] (ext4_mb_good_group+0xf3/0xfc)
>> [ 91.569659] [<c0144cd7>] (ext4_mb_good_group) from [<c0145c8f>] (ext4_mb_regular_allocator+0x153/0x2c4)
>> [ 91.570250] [<c0145c8f>] (ext4_mb_regular_allocator) from [<c0148095>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>> [ 91.570868] [<c0148095>] (ext4_mb_new_blocks) from [<c013f931>] (ext4_ext_map_blocks+0x965/0x10bc)
>> [ 91.571444] [<c013f931>] (ext4_ext_map_blocks) from [<c0122c8b>] (ext4_map_blocks+0xfb/0x36c)
>> [ 91.571992] [<c0122c8b>] (ext4_map_blocks) from [<c01263b1>] (mpage_map_and_submit_extent+0x99/0x5f0)
>> [ 91.572614] [<c01263b1>] (mpage_map_and_submit_extent) from [<c0126bc1>] (ext4_writepages+0x2b9/0x4e8)
>> [ 91.573201] [<c0126bc1>] (ext4_writepages) from [<c0094ae9>] (do_writepages+0x19/0x28)
>> [ 91.573709] [<c0094ae9>] (do_writepages) from [<c008c811>] (__filemap_fdatawrite_range+0x3d/0x44)
>> [ 91.574265] [<c008c811>] (__filemap_fdatawrite_range) from [<c008c883>] (filemap_flush+0x23/0x28)
>> [ 91.574854] [<c008c883>] (filemap_flush) from [<c012bf75>] (ext4_rename+0x2f9/0x3e4)
>> [ 91.575360] [<c012bf75>] (ext4_rename) from [<c00c3363>] (vfs_rename+0x183/0x45c)
>> [ 91.575911] [<c00c3363>] (vfs_rename) from [<c00c3867>] (SyS_renameat2+0x22b/0x26c)
>> [ 91.576460] [<c00c3867>] (SyS_renameat2) from [<c00c38df>] (SyS_rename+0x1f/0x24)
>> [ 91.576961] [<c00c38df>] (SyS_rename) from [<c000cd01>] (ret_fast_syscall+0x1/0x5c)
>>
>>
>> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
>> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
>> be surprising, as I saw problems with that patch earlier in the
>> 3.15-rc cycle:
>> https://lkml.org/lkml/2014/4/14/824
>>
>> However that discussion petered out (possibly my fault for not
>> following up) as to whether it was an issue with the patch or an issue
>> with qemu. Then the original issue disappeared for me, which I figured
>> was due to a fix upstream, but now I'm guessing coincided with me
>> updating my system and getting qemu v2.0 (whereas previously I was on
>> 1.5).
>>
>> $ qemu-system-arm -version
>> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright (c) 2003-2008 Fabrice Bellard
>>
>> While the previous behavior was annoying and kept my emulated
>> environments from booting, this, while a bit more rare and subtle, eats
>> the disks, which is much more painful for my testing.
>>
>> Unfortunately reverting the change (manually, as it doesn't revert
>> cleanly anymore) doesn't seem to completely avoid the issue, so the
>> bisection may have gone slightly astray (though it is interesting it
>> landed on the same commit I earlier had trouble with). So I'll
>> back-track and double check some of the last few "good" results to
>> validate I didn't just luck into 3 good boots accidentally. I'll also
>> review my revert in case I missed something subtle in doing it
>> manually.
>>
>> Anyway, if there are any thoughts on how to better chase this down and
>> debug it, I'd appreciate it! I can also provide reproduction
>> instructions with a pre-built Linaro android disk image and hand-built
>> kernel if anyone wants to debug this themselves.
>
> So I just wanted to check if anyone else tried looking into this issue?
> I'd be happy to share my qemu environment, config, etc.
>
> I sunk a couple of weeks bisecting to try to narrow down the more
> sporadic issue, but was unsuccessful past the initial commit above.
> Since then I've been far too swamped to spend any more time on it. Even
> so, it's a *major* pain for testing but it seems like no one else really
> cares?

I'm in the same boat as far as poor bisection results. :( However, I
keep using the 3-patch mmci fix series from Ulf, and haven't hit any
trouble with them. Though perhaps I'm just getting lucky?

http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci

-Kees

--
Kees Cook
Chrome OS Security
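
On the "just getting lucky" question, here is a rough sketch of the arithmetic. It assumes each boot of a bad kernel shows the corruption independently, at roughly the 50% rate reported in the quoted message; that independence is an assumption, not something measured in this thread. The function names and the 1% target are illustrative only.

```python
import math


def false_good_probability(repro_rate: float, clean_boots: int) -> float:
    """Chance that a bad kernel gets through `clean_boots` boots with no failure.

    Assumes every boot fails independently with probability `repro_rate`.
    """
    if not 0.0 < repro_rate < 1.0:
        raise ValueError("repro_rate must be strictly between 0 and 1")
    return (1.0 - repro_rate) ** clean_boots


def boots_needed(repro_rate: float, max_false_good: float) -> int:
    """Smallest number of clean boots so that a bad kernel is mislabeled
    "good" with probability at most `max_false_good`."""
    if not 0.0 < repro_rate < 1.0:
        raise ValueError("repro_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - repro_rate))


if __name__ == "__main__":
    # Three clean boots at a 50% reproduction rate.
    print(false_good_probability(0.5, 3))  # 0.125
    # Clean boots needed to get the mislabel chance down to 1%.
    print(boots_needed(0.5, 0.01))         # 7
```

Under those assumptions, three clean boots still leave a one-in-eight chance of marking a bad commit "good" at any given bisection step, which may be enough to explain a bisect drifting off target.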

