On Tue, Mar 22, 2016 at 04:53:26PM -0700, Bart Van Assche wrote: > On 03/22/2016 01:45 PM, Paul E. McKenney wrote: > >You are getting a soft lockup as well as an RCU CPU stall warning, so > >it looks like something is taking a very long time in blk_done_softirq(). > > > >You have multiple occurrences at different times, so it looks to be > >a long time as opposed to an infinite time. Are you perhaps doing > >something that would make a huge amount of work for blk_done_softirq()? > > > >See Documentation/RCU/stallwarn.txt in the kernel source tree for more > >info on how to debug this sort of thing. > > Hello Paul, > > None of the drivers involved in the test I ran contain RCU code that > has been changed recently. The block and SCSI subsystems processes > I/O completions in softirq context but until last week I hadn't seen > any RCU lockup complaints when I ran an SRP test against a kernel > with lockdep and several other kernel debugging options enabled. > This is why I sent an e-mail to you. I have read > Documentation/RCU/stallwarn.txt after I received your reply but this > didn't provide me any clue about where to look for the root cause. > Any further help would be appreciated.
My suggestion would be to check the block/SCSI softirq handler for event traces. If there are some, enable them and see what the loop is doing. Documentation/trace/ftrace.txt describes how to enable existing event tracing. If there is no event tracing, consider adding some in your local view. Failing that, there is always printk(). ;-) Or perhaps you have some sort of debug setup. Either way, the next step is to work out why that CPU is spending so much time in that loop. Thanx, Paul