On Wednesday, 7 of November 2007, Romano Giannetti wrote:
>
> On Tue, 2007-11-06 at 23:17 +0100, Romano Giannetti wrote:
> > Well, I started bisecting it. It will be a long shot, I suspect...
>
> Well, I spent the last 36 hours (more or less) trying to bisect the SD
> problem. The method I used was to insert the card, umount it, and run dd 8
> times in a row; the kernel is "bad" if the reads differ, "good" if they are
> the same.
>
> I could not finish the bisect. The last good/bad pair was:
>
> bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9]
>      [BLOCK] blk_rq_map_sg: force clear termination bit
> good: [e38f981758118d829cd40cfe9c09e3fa81e422aa]
>      exportfs: update documentation
>
> The problem with concluding the bisect is that there is a whole series of
> commits, named [SG] something, that seem to matter; but my three attempts at
> a commit between the previous two ended with the MMC layer not working, with
> this oops:

Can you please update the Bugzilla entry at
http://bugzilla.kernel.org/show_bug.cgi?id=9286 with this information?

> [ 81.738991] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 81.739003] printing eip: c01db437 *pde = 00000000
> [ 81.739010] Oops: 0000 [#1] SMP
> [ 81.739016] Modules linked in: mmc_block binfmt_misc rfcomm l2cap bluetooth ppdev i915 drm acpi_cpufreq cpufreq_conservative cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_powersave dock container sbs sbshc af_packet nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs dm_crypt dm_mod sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss iTCO_wdt iTCO_vendor_support serio_raw sdhci snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse pcspkr mmc_core snd_seq snd_timer snd_seq_device snd soundcore video output battery snd_page_alloc ac button intel_agp agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix ehci_hcd ata_generic ohci1394 uhci_hcd ieee1394 libata scsi_mod generic usbcore r8169 thermal processor fan
> [ 81.739122]
> [ 81.739127] Pid: 6075, comm: mmcqd Not tainted (2.6.23-bisect #19)
> [ 81.739132] EIP: 0060:[<c01db437>] EFLAGS: 00010246 CPU: 0
> [ 81.739141] EIP is at blk_rq_map_sg+0xd7/0x190
> [ 81.739145] EAX: 03619000 EBX: 00000000 ECX: c3464198 EDX: c3464698
> [ 81.739150] ESI: 0361a000 EDI: 00001000 EBP: cb82fe24 ESP: cb82fdec
> [ 81.739154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 81.739159] Process mmcqd (pid: 6075, ti=cb82e000 task=cb2a5550 task.ti=cb82e000)
> [ 81.739163] Stack: 00000292 c366c530 cb839a70 00002000 0361b000 c3464698 00000001 00000001
> [ 81.739176] 00000000 c34e0848 01ae4698 c33ef2b0 c33ef2b0 cb2ec870 cb82fe3c f8e81e6c
> [ 81.739188] 00200200 c3342580 c33ef2b0 cb2ec870 cb82ffb8 f8e816f9 7898775f 5f6f5965
> [ 81.739200] Call Trace:
> [ 81.739204] [<c01052fa>] show_trace_log_lvl+0x1a/0x30
> [ 81.739213] [<c01053c1>] show_stack_log_lvl+0xb1/0xe0
> [ 81.739220] [<c01054b1>] show_registers+0xc1/0x1d0
> [ 81.739226] [<c01056da>] die+0x11a/0x230
> [ 81.739232] [<c011d7e9>] do_page_fault+0x269/0x5f0
> [ 81.739239] [<c02f3eea>] error_code+0x72/0x78
> [ 81.739247] [<f8e81e6c>] mmc_queue_map_sg+0x2c/0xe0 [mmc_block]
> [ 81.739258] [<f8e816f9>] mmc_blk_issue_rq+0x199/0x750 [mmc_block]
> [ 81.739267] [<f8e821a0>] mmc_queue_thread+0x80/0xf0 [mmc_block]
> [ 81.739275] [<c013d862>] kthread+0x42/0x70
> [ 81.739282] [<c0104ee7>] kernel_thread_helper+0x7/0x10
> [ 81.739289] =======================
> [ 81.739292] Code: f0 89 45 d8 8b 01 2b 05 80 aa 67 c0 c1 f8 02 69 c0 c5 4e ec c4 c1 e0 0c 03 41 08 39 45 d8 0f 84 8e 00 00 00 f6 03 02 74 52 31 db <8b> 03 c7 43 0c 00 00 00 00 c7 43 08 00 00 00 00 83 e0 03 0b 01
> [ 81.739358] EIP: [<c01db437>] blk_rq_map_sg+0xd7/0x190 SS:ESP 0068:cb82fdec
>
> It seems to me that the two commits:
>
> [BLOCK] blk_rq_map_sg: force clear termination bit
> [BLOCK] Don't clear sg_dma_len/addr() in blk_rq_map_sg()
>
> have the potential to fix the aforementioned oops, but in a way that creates
> the reported problem for the mmc layer. It's just a gut feeling, I do not
> have the knowledge of the kernel needed to debug this, but this comment:
>
> + * If the driver previously mapped a shorter
> + * list, we could see a termination bit
> + * prematurely unless it fully inits the sg
> + * table on each mapping. We KNOW that there
> + * must be more entries here or the driver
> + * would be buggy, so force clear the
> + * termination bit to avoid doing a full
> + * sg_init_table() in drivers for each command.
> + */
>
> rang a bell. When the bug occurs, it seems that some random page is mapped
> into the device, so that... maybe the list was not supposed to continue in
> this case?
>
> Well, I hope this helps someone find the bug. I am available to
> test/try whatever patches you send me.
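
As an aside for readers of the thread, the situation the quoted comment describes can be sketched with a toy model. This is Python, not kernel code: `Entry`, `next_entry` and `map_segments` are hypothetical names invented for illustration, and the `end` flag merely stands in for the termination bit. The sketch assumes a table that is reused across mappings without being reinitialized.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Toy stand-in for one scatter-gather entry (not the kernel struct)."""
    addr: int = 0
    length: int = 0
    end: bool = False  # models the "termination bit" from the quoted comment


def next_entry(table: List[Entry], i: int) -> Optional[int]:
    """Return the index after i, or None if entry i is marked as the end."""
    if table[i].end or i + 1 >= len(table):
        return None
    return i + 1


def map_segments(table: List[Entry],
                 segments: List[Tuple[int, int]],
                 force_clear: bool) -> int:
    """Fill the table with (addr, length) segments and mark the last one.

    With force_clear=False a stale end mark left by an earlier, shorter
    mapping stops the walk early; with force_clear=True the mark is
    cleared before stepping to the next entry.
    """
    i: Optional[int] = None
    count = 0
    for addr, length in segments:
        if i is None:
            i = 0
        else:
            if force_clear:
                table[i].end = False
            i = next_entry(table, i)
            if i is None:
                raise RuntimeError("walk ended before all segments were mapped")
        table[i].addr, table[i].length = addr, length
        count += 1
    if i is not None:
        table[i].end = True  # mark the new end of the list
    return count
```

Mapping two segments and then three into the same table fails on the second call unless `force_clear` is set, because the end mark left on the second entry stops the walk. Whether clearing the mark is always safe is exactly the question raised in the quoted text; the model does not answer it.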
>
> Romano
>
> Complete git bisect log:
>
> git-bisect start
> # bad: [2655e2cee2d77459fcb7e10228259e4ee0328697] ata_piix: Add additional PCI identifier for 40 wire short cable
> git-bisect bad 2655e2cee2d77459fcb7e10228259e4ee0328697
> # good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
> git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
> # good: [f4921aff5b174349bc36551f142a5dbac782ea3f] Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
> git-bisect good f4921aff5b174349bc36551f142a5dbac782ea3f
> # good: [9cf52b2921fbe62566b6b2ee79f71203749c9e5e] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
> git-bisect good 9cf52b2921fbe62566b6b2ee79f71203749c9e5e
> # bad: [a98ce5c6feead6bfedefabd46cb3d7f5be148d9a] Fix synchronize_irq races with IRQ handler
> git-bisect bad a98ce5c6feead6bfedefabd46cb3d7f5be148d9a
> # good: [e9a404580ccaeb31dd2a976f9929c4f9eb6f3540] nfs: Fix build break with CONFIG_NFS_V4=n
> git-bisect good e9a404580ccaeb31dd2a976f9929c4f9eb6f3540
> # good: [668f895a85b0c3a62a690425145f13dabebebd7a] [NET]: Hide the queue_mapping field inside netif_subqueue_stopped
> git-bisect good 668f895a85b0c3a62a690425145f13dabebebd7a
> # bad: [ba1c28a94322865457ad59f80474615156065123] Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
> git-bisect bad ba1c28a94322865457ad59f80474615156065123
> # good: [e38f981758118d829cd40cfe9c09e3fa81e422aa] exportfs: update documentation
> git-bisect good e38f981758118d829cd40cfe9c09e3fa81e422aa
> # bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9] [BLOCK] blk_rq_map_sg: force clear termination bit
> git-bisect bad 7aeacf982203fb4dea2f3434eefdc268cfd5d6d9
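
For anyone who wants to repeat the test described in the quoted message (read the card several times and compare), here is a minimal sketch of how it could be scripted. It is not the script that was actually used: the function names, the read size, and the use of SHA-256 are assumptions, and it does nothing to bypass caching between reads. The return values follow the `git bisect run` convention, where 0 means good, 1 means bad, and 125 means the commit cannot be tested and should be skipped.

```python
import hashlib


def digest_of(path: str, nbytes: int, chunk: int = 1 << 20) -> str:
    """Read up to nbytes from path and return the SHA-256 hex digest."""
    h = hashlib.sha256()
    remaining = nbytes
    with open(path, "rb", buffering=0) as f:
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                break  # end of device or file reached
            h.update(data)
            remaining -= len(data)
    return h.hexdigest()


def reads_agree(path: str, runs: int = 8, nbytes: int = 8 << 20) -> bool:
    """Return True if `runs` consecutive reads of path give identical data."""
    digests = {digest_of(path, nbytes) for _ in range(runs)}
    return len(digests) == 1


def bisect_exit_code(path: str, runs: int = 8, nbytes: int = 8 << 20) -> int:
    """Map the comparison onto exit codes understood by `git bisect run`."""
    try:
        return 0 if reads_agree(path, runs, nbytes) else 1
    except OSError:
        # Device missing or unreadable: treat the commit as untestable.
        return 125
```

A wrapper could pass `bisect_exit_code("/dev/mmcblk0")` to `sys.exit()`; the device path here is only an example. Returning 125 is one way to step over commits where the MMC layer does not work at all, though an oops like the one quoted will not necessarily show up as a read error.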