On Tue, Jan 29 2008, Olof Johansson wrote:
> On Tue, Jan 29, 2008 at 09:16:48PM +0100, Jens Axboe wrote:
> >
> > Actually, can you try this? It has a known race but nothing to worry
> > about, and it removes ioc->lock from irq context.
>
> I just tried this myself, since I saw hangs within seconds of running
> 'aiostress' from autotest on this morning's kernel as well (g0ba6c33).
>
> It didn't help. My config is 2 cores (powerpc), built with
> arch/powerpc/configs/pasemi_defconfig. Hardware is sata disk on marvell
> 7042.
Please try this.

diff --git a/block/as-iosched.c b/block/as-iosched.c
index b201d16..9603684 100644
--- a/block/as-iosched.c
+++ b/block/as-iosched.c
@@ -1275,9 +1275,13 @@ static void as_merged_requests(struct request_queue *q, struct request *req,
 			 * Don't copy here but swap, because when anext is
 			 * removed below, it must contain the unused context
 			 */
-			double_spin_lock(&rioc->lock, &nioc->lock, rioc < nioc);
-			swap_io_context(&rioc, &nioc);
-			double_spin_unlock(&rioc->lock, &nioc->lock, rioc < nioc);
+			if (rioc != nioc) {
+				double_spin_lock(&rioc->lock, &nioc->lock,
+								rioc < nioc);
+				swap_io_context(&rioc, &nioc);
+				double_spin_unlock(&rioc->lock, &nioc->lock,
+								rioc < nioc);
+			}
 		}
 	}
 

> Stacktraces:
>
> 0:mon> t
> [link register ] c000000000221984 .as_merged_requests+0x164/0x1e0
> [c00000007e8ff750] c00000007e8ff800 (unreliable)
> [c00000007e8ff800] c000000000212f20 .elv_merge_requests+0x50/0xb0
> [c00000007e8ff890] c000000000218c98 .attempt_merge+0x318/0x3d0
> [c00000007e8ff940] c00000000021af98 .__make_request+0x2f8/0x720
> [c00000007e8ffa20] c000000000216e6c .generic_make_request+0x22c/0x2f0
> [c00000007e8ffad0] c000000000216fd8 .submit_bio+0xa8/0x150
> [c00000007e8ffb90] c0000000000efa7c .submit_bh+0x15c/0x1e0
> [c00000007e8ffc20] c000000000145d5c .journal_do_submit_data+0x6c/0xb0
> [c00000007e8ffcc0] c000000000147460 .journal_commit_transaction+0x1640/0x16e0
> [c00000007e8ffe10] c00000000014c238 .kjournald+0x108/0x2f0
> [c00000007e8fff00] c000000000067ca8 .kthread+0xc8/0xe0
> [c00000007e8fff90] c0000000000237bc .kernel_thread+0x4c/0x68
> 0:mon> c1
> 1:mon> t
> [link register ] c00000000033c20c .scsi_device_unbusy+0x8c/0x120
> [c00000007e10b6f0] c00000000033c1a8 .scsi_device_unbusy+0x28/0x120 (unreliable)
> [c00000007e10b780] c000000000333078 .scsi_finish_command+0x38/0x130
> [c00000007e10b810] c00000000033c478 .scsi_softirq_done+0xd8/0x1a0
> [c00000007e10b8b0] c00000000021a780 .blk_done_softirq+0xb0/0xe0
> [c00000007e10b940] c000000000051e58 .__do_softirq+0xe8/0x1d0
> [c00000007e10ba00] c00000000000ac64 .do_softirq+0x64/0xa0
> [c00000007e10ba80] c000000000051b84 .irq_exit+0xc4/0xd0
> [c00000007e10bb00] c00000000000b3c0 .do_IRQ+0xe0/0x120
> [c00000007e10bb80] c00000000000405c hardware_interrupt_entry+0x18/0x3c
> --- Exception: 501 (Hardware Interrupt) at c000000000010564 .cpu_idle+0xb4/0x160
> [c00000007e10be70] c000000000010524 .cpu_idle+0x74/0x160 (unreliable)
> [c00000007e10bf00] c000000000540e6c
> [c00000007e10bf90] c000000000007364 .start_secondary_prolog+0xc/0x10
>
>
> scsi_device_unbusy is sitting on:
>
>	spin_lock(sdev->request_queue->queue_lock);
>
> and as_merged_requests on the second lock at:
>
>	double_spin_lock(&rioc->lock, &nioc->lock, rioc < nioc);

The latter is the problem: when both requests share the same io context
(rioc == nioc), double_spin_lock() ends up taking the same spinlock twice
and spins on itself forever. And since the queue lock is already held
when double_spin_lock() is called, everything else that wants the queue
lock, such as scsi_device_unbusy() on the other CPU, hangs as well.

The locking hierarchy itself is fine (it's always queue lock -> io context
locks), so the patch above should be all that is needed to fix this. My
initial analysis was wrong, that's all :/

-- 
Jens Axboe
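For anyone following along outside the block layer, here is a minimal user-space sketch of why an unguarded double-lock hangs when both arguments are the same lock, and what a "same context, skip it" check buys. It is only an illustration under stated assumptions: the lock is a toy non-recursive spinlock built on a C11 atomic_flag, and toy_spinlock, toy_double_lock(), struct toy_ioc and toy_swap() are hypothetical names, not kernel APIs. A single trylock attempt stands in for spinning, so the would-be self-deadlock is reported instead of actually hit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy spinlock on a C11 atomic_flag. Like a kernel
 * spinlock it is not recursive: the holder cannot take it again. */
typedef struct {
	atomic_flag held;
} toy_spinlock;

/* One acquisition attempt instead of a spin loop, so the demo reports
 * a would-be deadlock rather than hanging. True if the lock was taken. */
static bool toy_trylock(toy_spinlock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(toy_spinlock *l)
{
	atomic_flag_clear(&l->held);
}

/* Hypothetical analogue of double_spin_lock(): take both locks in
 * address order (the kernel passes rioc < nioc for the same purpose).
 * If a == b, the second attempt finds the lock already held by us;
 * a real spinning lock would wait on itself forever at that point. */
static bool toy_double_lock(toy_spinlock *a, toy_spinlock *b)
{
	toy_spinlock *first = (uintptr_t)a < (uintptr_t)b ? a : b;
	toy_spinlock *second = (first == a) ? b : a;

	if (!toy_trylock(first))
		return false;
	if (!toy_trylock(second)) {
		toy_unlock(first);	/* back out of the self-deadlock */
		return false;
	}
	return true;
}

static void toy_double_unlock(toy_spinlock *a, toy_spinlock *b)
{
	assert(a != b);		/* toy_double_lock() never succeeds for a == b */
	toy_unlock(a);
	toy_unlock(b);
}

/* Hypothetical stand-in for struct io_context: a lock plus some state. */
struct toy_ioc {
	toy_spinlock lock;
	int data;
};

static void toy_ioc_init(struct toy_ioc *c, int data)
{
	atomic_flag_clear(&c->lock.held);
	c->data = data;
}

/* Swap state between two contexts under both locks. With guard set, the
 * whole locked section is skipped when both sides are the same context,
 * in the spirit of the "if (rioc != nioc)" check in the patch. Returns
 * false when the unguarded path would have deadlocked. */
static bool toy_swap(struct toy_ioc *r, struct toy_ioc *n, bool guard)
{
	if (guard && r == n)
		return true;	/* same context: nothing to swap or lock */
	if (!toy_double_lock(&r->lock, &n->lock))
		return false;
	int tmp = r->data;
	r->data = n->data;
	n->data = tmp;
	toy_double_unlock(&r->lock, &n->lock);
	return true;
}
```

With two distinct contexts both variants swap normally; passing the same context twice fails without the guard and is a harmless no-op with it.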