Hiya, list.

I think I've found a rather nasty bug in the kernel, and I need some clues as to where to look for the issue.

Stats:
  Quad Xeon (PIII core) 700MHz machine (1MB cache on each)
  4GB RAM
  5x36GB SCSI disks on a DAC1100 RAID controller
  3 EEPro 100 cards

The box functions as a database server that runs at about 40% load on each CPU, with about 1.5GB memory usage.

Kernel: 2.2.18pre11-va2.0smp (although completely reproducible on stock 2.2.18)

If dmesg output from the kernel, or any other info, is required, I'd be more than happy to provide it.

Problem:
The box appears to stop responding to network requests for 30 seconds at a time. It appears to happen whenever we get this error:

wait_on_bh, CPU 0:
irq: 0 [0 0]
bh: 1 [0 0]
<[c010bb29]> <[c011d07b]> <[c011d1ed]> <[c0116658]> <[c01099fc]>

It LOOKS like the virtual addresses provided are a call trace; however, I can't be sure, as SOME of the addresses don't show up as a ksym.

The problem LOOKS to be, from my perspective (and some light code reading), that synchronize_bh() is called SOMEWHERE, and wait_on_bh() is then called from that. It also appears that wait_on_bh() loops MAXCOUNT times (100000000 times) and fails, therefore exiting the function. (It also appears that the 1 global interrupt that is OPEN is a TIMER interrupt.)

Can SOMEONE give me a clue as to where to start looking for this problem, and/or perhaps some input from people who work on this code? (The really confusing thing about this is that wait_on_bh() should NOT take 30 seconds to get through the maximum loop count.)

Thanks in advance,
Chad
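
P.S. For anyone who wants to decode the bracketed addresses: a minimal sketch, assuming they are return addresses inside kernel text and that a System.map matching the running kernel is available. A return address normally falls in the middle of a function, so it will not match any symbol exactly; the usual lookup is the nearest symbol at or below the address. The helper names (parse_system_map, resolve) and the sample map entries are made up for illustration and are not taken from this box. Addresses inside loadable modules will not resolve this way.

```python
import bisect

def parse_system_map(text):
    """Parse System.map-style lines ("<hex addr> <type> <name>") into a
    sorted list of (address, name) tuples."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr,
    or None if addr is below every known symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

# Hypothetical map contents, for illustration only.
SAMPLE_MAP = """\
c0100000 T _stext
c0100100 T example_func
c0100200 T other_func
"""

if __name__ == "__main__":
    syms = parse_system_map(SAMPLE_MAP)
    name, off = resolve(syms, 0xc0100150)
    print("%s+0x%x" % (name, off))
```

To use it for real, read the System.map for the exact kernel build into parse_system_map() and pass each address from the trace to resolve().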