Re: 2.6.13-rc6-rt6

Ingo Molnar Wed, 17 Aug 2005 09:23:50 -0700

* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> On Wed, 2005-08-17 at 10:24 -0400, Steven Rostedt wrote:
> 
> > OK the output from netconsole still seems like netconsole itself is
> > causing some problems.  But I think it is also showing this lockup. I'll
> > recompile my kernel as UP and see if netconsole works fine.
> 
> Well, the UP kernel boots on my laptop, but netconsole gives strange 
> warnings.
> 
> OK, what's the scoop with the illegal_API_call?  What is it about, and 
> what is the expected work around?


this is a recent change: i've started flagging "naked" use of 
local_irq_disable(), because it's a problem on PREEMPT_RT and a 
potential SMP bug on upstream kernels. Each local_irq_disable() is 
either converted to raw_local_irq_disable() when justified (mostly 
only for low-level arch code), or eliminated entirely, either by 
merging it into a nearby spin_lock API call or by removing it 
altogether after making sure the code doesn't break.
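
to illustrate the "merge it into a nearby spin_lock API call" case, here 
is a rough userspace model, not code from the -rt patch. The stubs only 
imitate the calling shape of the kernel primitives, and counter_lock, 
counter and both counter_inc_*() functions are made up for the example:

```c
/* Userspace model only: these stubs stand in for the kernel primitives. */
#include <assert.h>

typedef struct { int locked; } spinlock_t;   /* stand-in, not the kernel type */

static int irqs_off;                         /* models this CPU's irq-disabled state */

static void local_irq_disable(void) { irqs_off = 1; }
static void local_irq_enable(void)  { irqs_off = 0; }

static void spin_lock(spinlock_t *l)   { assert(!l->locked); l->locked = 1; }
static void spin_unlock(spinlock_t *l) { assert(l->locked);  l->locked = 0; }

/* Same calling shape as the kernel macros: flags is passed by name. */
#define spin_lock_irqsave(l, flags) \
	do { (flags) = (unsigned long)irqs_off; irqs_off = 1; spin_lock(l); } while (0)
#define spin_unlock_irqrestore(l, flags) \
	do { spin_unlock(l); irqs_off = (int)(flags); } while (0)

static spinlock_t counter_lock;              /* hypothetical lock */
static int counter;                          /* hypothetical shared data */

/* Before: a "naked" irq-off section wrapped around a lock. */
static void counter_inc_naked(void)
{
	local_irq_disable();
	spin_lock(&counter_lock);
	counter++;
	spin_unlock(&counter_lock);
	local_irq_enable();
}

/* After: the irq handling is folded into the spin_lock API call. */
static void counter_inc_merged(void)
{
	unsigned long flags;

	spin_lock_irqsave(&counter_lock, flags);
	counter++;
	spin_unlock_irqrestore(&counter_lock, flags);
}
```

in the merged form the irq state always travels together with a lock, so 
there is no bare local_irq_disable() left to flag.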

Right now we print a warning on the first such API use, and then shut up 
about it. All local_irq_*() APIs map to NOPs. (we keep the PF_IRQSOFF 
flag for compatibility, but only to get irqs_off() right, which in turn 
shuts off a number of BUG_ON(!irqs_disabled()) warnings; it doesn't 
have any other functional purpose.)
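
the warn-once behaviour can be pictured with a small userspace sketch. 
This is only a model under my own naming: model_illegal_api_call(), the 
model_local_irq_*() wrappers and both counters are hypothetical and are 
not the symbols used in the patch:

```c
#include <stdio.h>

static int api_warned;        /* set once the first report has been printed */
static int warnings_printed;  /* only here so the behaviour can be checked */

/* Report the first flagged call, stay silent afterwards. */
static void model_illegal_api_call(const char *name)
{
	if (api_warned)
		return;
	api_warned = 1;
	warnings_printed++;
	fprintf(stderr, "model: illegal API call: %s\n", name);
}

/* The flagged calls warn (once) and otherwise do nothing. */
static void model_local_irq_disable(void)
{
	model_illegal_api_call("local_irq_disable");
	/* no interrupt state is touched here: the call is a NOP */
}

static void model_local_irq_enable(void)
{
	model_illegal_api_call("local_irq_enable");
	/* likewise a NOP */
}
```

however many times the two wrappers are called, only the first call 
prints anything.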

the desired end-result would be the total elimination of local_irq_*() 
API calls.

> I'm also getting the following output on shutdown:
> 
> NET: Registered protocol family 31
> Bluetooth: HCI device and connection manager initialized
> Bluetooth: HCI socket layer initialized
> Bluetooth: L2CAP ver 2.7
> Bluetooth: L2CAP socket layer initialized
> Bluetooth: RFCOMM ver 1.5
> Bluetooth: RFCOMM socket layer initialized
> Bluetooth: RFCOMM TTY layer initialized
> BUG: nonzero lock count 1 at exit time?
>             nfsd: 4696 [f7183830, 115]
>  [<c0136922>] check_no_held_locks+0x62/0x330 (8)
>  [<c011df67>] do_exit+0x257/0x480 (32)
>  [<c013d052>] __module_put_and_exit+0x52/0x70 (40)
>  [<f8d54583>] nfsd+0x2b3/0x340 [nfsd] (12)
>  [<f8d542d0>] nfsd+0x0/0x340 [nfsd] (48)
>  [<c010140d>] kernel_thread_helper+0x5/0x18 (16)
> ---------------------------
> | preempt count: 00000000 ]
> | 0-level deep critical section nesting:
> ----------------------------------------
> 
> ------------------------------
> | showing all locks held by: |  (nfsd/4696 [f7183830, 116]):
> ------------------------------
> 
> #001:             [c038e184] {kernel_sem.lock}
> ... acquired at:               lock_kernel+0x21/0x40
> 
> BUG: nfsd/4696, BKL held at task exit time!

hm, it seems nfsd forgets to do an unlock_kernel() in some exit path? 
We are enforcing strictly balanced lock use in PREEMPT_RT; the 
upstream kernel is more relaxed about it.
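
as a sketch of the kind of imbalance i mean (a userspace model, not the 
actual nfsd code, which i have not checked): struct task, the model_*() 
helpers and both worker functions below are hypothetical, and the real 
lock_kernel()/unlock_kernel() take no arguments:

```c
#include <stdio.h>

struct task {                 /* hypothetical stand-in for a task */
	const char *comm;
	int lock_depth;       /* how many times the big lock is held */
};

static void model_lock_kernel(struct task *t)   { t->lock_depth++; }
static void model_unlock_kernel(struct task *t) { t->lock_depth--; }

/* Exit-time check: returns the number of locks still held. */
static int model_check_exit(struct task *t)
{
	if (t->lock_depth != 0)
		fprintf(stderr, "model: %s: lock count %d at exit time\n",
			t->comm, t->lock_depth);
	return t->lock_depth;
}

/* Unbalanced: the error path returns with the lock still held. */
static int worker_leaky(struct task *t, int fail)
{
	model_lock_kernel(t);
	if (fail)
		return -1;    /* missing unlock on this path */
	model_unlock_kernel(t);
	return 0;
}

/* Balanced: every path leaves through the single unlock. */
static int worker_balanced(struct task *t, int fail)
{
	int ret = 0;

	model_lock_kernel(t);
	if (fail) {
		ret = -1;
		goto out;
	}
	/* normal work would go here */
out:
	model_unlock_kernel(t);
	return ret;
}
```

with a strict exit-time check the leaky variant is reported as soon as 
its error path is taken, while the balanced one exits with a lock count 
of zero.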

> And it goes on and on. This happens every time. Without netconsole, I
> only get the nonzero lock count error. Also, one of my lockups on SMP
> had to do with the kernel_thread_helper:
> 
> Using IPI Shortcut mode
> khelper/794[CPU#0]: BUG in set_new_owner at kernel/rt.c:916

this is a 'must not happen'. Somehow the lock->held list became 
non-empty. Maybe some use-after-free thing? Haven't seen it myself.

        Ingo