Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> Yes, this is the type of explicit-ness I was thinking of. Note that > you can now drop further tests for up == 0 later in the qlock() text. Hm, spoke too quickly. The tests on "up" have to remain, sadly. Sorry about the misleading noise. ++L

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> /n/dump/2010/1118/sys/src/9/port/qlock.c:18,23 - port/qlock.c:18,25
> {
> 	Proc *p;
>
> +	if(up == nil && conf.postdawn)
> +		panic("qlock: %#p: postdawn up nil\n", getcallerpc(&q));
> 	if(m->ilockdepth != 0)
> 		print("qlock: %#p: ilockdepth %d\n", getca

Re: [9fans] That deadlock, again

2010-11-18 Thread erik quanstrom
on second thought, conf.postdawn should be set in schedinit(). - erik

Re: [9fans] That deadlock, again

2010-11-18 Thread erik quanstrom
> I suggest you fix ether82598: it is OK to call qlock() and qunlock() > without "up", but only if sure that the qlock() will succeed. If it > has to wait, it will panic. yes. that's it. > >If it has to wait, it will panic. Given that, why do the locking at all? > > i assume the intention is

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> after reset, it's illegal to call qlock without a process (notably > in an interrupt function), as it previously was. That suggests that the (hopefully) few instances of qlock() invocations that may occur in this space should be burdened with the need to check for the value of "up" and altogethe

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> it's to allow the use during reset of a given driver's > standard functions that normally must qlock, to avoid requiring two copies > of them, with and without the qlock. > > after reset, it's illegal to call qlock without a process (notably > in an interrupt function), as it previously was. I'

Re: [9fans] That deadlock, again

2010-11-18 Thread C H Forsyth
>If it has to wait, it will panic. Given that, why do the locking at all? i assume the intention is along these lines: it's to allow the use during reset of a given driver's standard functions that normally must qlock, to avoid requiring two copies of them, with and without the qlock. after res

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> but i have a feeling that there is a mistake in your > modification to qlock. you didn't have this panic > before you modified qlock. qlock() is broken, or at the very least ambivalent. Someone ought to put it out of its misery: is it legal or is it not to call qlock() in a up == 0 context? +

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
> and i'm just wrong. intentionally or not, devsd does > qlock things with no up from sdreset(). ether82598 > does too (my fault). I suggest you fix ether82598: it is OK to call qlock() and qunlock() without "up", but only if sure that the qlock() will succeed. If it has to wait, it will panic.
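The rule stated above (an uncontended qlock() is harmless without "up", a contended one panics) can be sketched as a toy model. This is not the kernel's qlock.c; the function tryqlock(), the outcome names and the one-field QLock are hypothetical stand-ins that only classify what the real qlock() would do.

```c
#include <stddef.h>

/* Toy stand-ins, not the Plan 9 definitions. */
typedef struct Proc { int pid; } Proc;
typedef struct QLock { int locked; } QLock;

static Proc *up;	/* current process; NULL models "no process context" */

enum { Acquired, WouldSleep, WouldPanic };

/*
 * Classify what qlock() would do, per the description in the thread:
 * a free lock is simply taken, with or without a process;
 * a held lock needs a process to put to sleep, so without one
 * the only outcome is a panic.
 */
int
tryqlock(QLock *q)
{
	if(!q->locked){
		q->locked = 1;
		return Acquired;
	}
	if(up == NULL)
		return WouldPanic;
	return WouldSleep;
}
```

During driver reset nothing else can hold the lock, so only the first branch is ever taken; that is the case the thread describes as tolerated.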

Re: [9fans] That deadlock, again

2010-11-18 Thread erik quanstrom
On Thu Nov 18 10:23:20 EST 2010, quans...@quanstro.net wrote: > > if(up != nil && up->nlocks.ref) > > print("qlock: %#p: nlocks %lud\n", getcallerpc(&q), > > up->nlocks.ref); > > > > will no longer need the up != nil test. > > that's just wrong. if the kernel is qlocking without

Re: [9fans] That deadlock, again

2010-11-18 Thread erik quanstrom
> if(up != nil && up->nlocks.ref) > print("qlock: %#p: nlocks %lud\n", getcallerpc(&q), > up->nlocks.ref); > > will no longer need the up != nil test. that's just wrong. if the kernel is qlocking without a up, there's a bug. stack dumps are your friend. but i have a feelin

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
On Thu, Nov 18, 2010 at 10:20:33AM +0100, cinap_len...@gmx.de wrote: > > hm... thinking about it... does the kernel assume (maybe in early > initialization) that calling qlock() without a proc is ok as long as > it can make sure it will not be held by another proc? > That's a question for Bell

Re: [9fans] That deadlock, again

2010-11-18 Thread cinap_lenrek
hm... thinking about it... does the kernel assume (maybe in early initialization) that calling qlock() without a proc is ok as long as it can make sure it will not be held by another proc? -- cinap --- Begin Message --- On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote: > > you mus

Re: [9fans] That deadlock, again

2010-11-18 Thread cinap_lenrek
was 0xf01e739e really the code that accesses up->qpctry? -- cinap --- Begin Message --- On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote: > > you must be in process context to qlock, because only > processes can sleep. > There's obviously at least one exception, because otherwise I

Re: [9fans] That deadlock, again

2010-11-18 Thread Lucio De Re
On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote: > > you must be in process context to qlock, because only > processes can sleep. > There's obviously at least one exception, because otherwise I would not have got a panic at startup. Or, for that matter there would not be active co

Re: [9fans] That deadlock, again

2010-11-17 Thread erik quanstrom
> Strangely, later in the qlock() code "up" is checked and a panic > issued if zero. I'm missing something here: it is possible to execute > this code you must be in process context to qlock, because only processes can sleep. - erik

Re: [9fans] That deadlock, again

2010-11-17 Thread Lucio De Re
> Anyway, I have moved the assignment to "qpctry" to after "up" is > tested. Let's see what happens. I'll have to get back to you once > the system is back up. The system is working now. I have to wait for a problem to arise, next. ++L

Re: [9fans] That deadlock, again

2010-11-17 Thread Lucio De Re
> one could move: > > up->qpc = getcallerpc(&q); > > from qlock() before the lock(&q->use); so we can see from where that > qlock gets called that hangs the exportfs call, or add another magic > debug pointer (qpctry) to the proc structure and print it in dumpaproc(). Cinap, I tried your

Re: [9fans] That deadlock, again

2010-11-17 Thread Russ Cox
lock loops are about Locks (spin locks), not QLocks. the relative ordering of any two calls to qlock and qunlock is irrelevant. russ

Re: [9fans] That deadlock, again

2010-11-16 Thread erik quanstrom
> #I0tcpack pc f01ff12a dbgpc ... and what's at that pc? - erik

Re: [9fans] That deadlock, again

2010-11-16 Thread erik quanstrom
> On Wed, Nov 17, 2010 at 06:33:13AM +0100, cinap_len...@gmx.de wrote: > > sorry for not being clear. what i meant was that qpc is for the last > > qlock we succeeded to acquire. it's *not* the one we are spinning on. > > also, qpc is not set to nil on unlock. > > > Ok, so we set qpctry (qpcdbg?)

Re: [9fans] That deadlock, again

2010-11-16 Thread Lucio De Re
On Wed, Nov 17, 2010 at 08:45:00AM +0200, Lucio De Re wrote: > ... and from whatever the other proc is that also contributes to this > jam. I don't have the name right in front of me, but I will post it > separately. As far as I know it's always those two that interfere with > exportfs and usually

Re: [9fans] That deadlock, again

2010-11-16 Thread Lucio De Re
On Wed, Nov 17, 2010 at 06:33:13AM +0100, cinap_len...@gmx.de wrote: > sorry for not being clear. what i meant was that qpc is for the last > qlock we succeeded to acquire. it's *not* the one we are spinning on. > also, qpc is not set to nil on unlock. > Ok, so we set qpctry (qpcdbg?) to qpc befor

Re: [9fans] That deadlock, again

2010-11-16 Thread Lucio De Re
On Wed, Nov 17, 2010 at 06:22:33AM +0100, cinap_len...@gmx.de wrote: > > qpc is just the caller of the last successfully *acquired* qlock. > what we know is that the exportfs proc spins in the q->use taslock > called by qlock() right? this already seems weird... q->use is held > just long eno

Re: [9fans] That deadlock, again

2010-11-16 Thread cinap_lenrek
sorry for not being clear. what i meant was that qpc is for the last qlock we succeeded to acquire. it's *not* the one we are spinning on. also, qpc is not set to nil on unlock. -- cinap --- Begin Message --- > > acid: src(0xf0148c8a) > > /sys/src/9/ip/tcp.c:2096 > > 2091 if(waserro
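A small model may make the qpc/qpctry distinction concrete. It assumes the two debug fields discussed in the thread (qpc, and the proposed qpctry); the functions qlock_record() and qunlock_record() and the simplified structures are hypothetical, and the real qlock() would spin or sleep where this sketch simply returns.

```c
/* Toy stand-ins, not the Plan 9 definitions. */
typedef struct Proc {
	unsigned long qpc;	/* caller of the last qlock that was acquired */
	unsigned long qpctry;	/* caller of the last qlock that was attempted */
} Proc;

typedef struct QLock { int locked; } QLock;

/* Returns 1 if the lock was taken, 0 if the caller would have to wait. */
int
qlock_record(Proc *p, QLock *q, unsigned long callerpc)
{
	p->qpctry = callerpc;	/* recorded before the attempt */
	if(q->locked)
		return 0;	/* stuck here: qpc still names an older call */
	q->locked = 1;
	p->qpc = callerpc;	/* recorded only on success */
	return 1;
}

void
qunlock_record(QLock *q)
{
	q->locked = 0;		/* qpc is deliberately left untouched */
}
```

With only qpc, a process stuck on a second lock still reports the pc of an earlier, already released qlock; qpctry is what would point at the call that is actually hanging.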

Re: [9fans] That deadlock, again

2010-11-16 Thread cinap_lenrek
qpc is just the caller of the last successfully *acquired* qlock. what we know is that the exportfs proc spins in the q->use taslock called by qlock() right? this already seems weird... q->use is held just long enough to test q->locked and manipulate the queue. also sched() will avoid switch

Re: [9fans] That deadlock, again

2010-11-16 Thread erik quanstrom
> Hm, I thought I understood waserror(), but now I'm sure I don't. What > condition is waserror() attempting to handle here? waserror() sets up an entry in the error stack. if there is a call to error() before poperror(), then that entry is popped and waserror() returns 1. it's just like set_jmp
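The mechanism described above can be imitated in portable C with setjmp/longjmp. This is a user-space sketch only: the kernel keeps the error stack per process and uses its own label routines, and the names errlab, nerrlab, guarded() and work() here are illustrative assumptions, as is taking the lock before waserror().

```c
#include <assert.h>
#include <setjmp.h>

enum { NERR = 8 };

static jmp_buf errlab[NERR];	/* the error stack */
static int nerrlab;		/* number of live entries */

/* Push an entry: yields 0 when set up, 1 when error() unwinds to it. */
#define waserror()	(setjmp(errlab[nerrlab++]) != 0)
/* Discard the most recent entry once the guarded code has finished. */
#define poperror()	(nerrlab--)

/* Pop the most recent entry and resume at its waserror(). */
static void
error(void)
{
	assert(nerrlab > 0);
	longjmp(errlab[--nerrlab], 1);
}

static int locked;	/* stand-in for the state of a QLock */

static void
work(int fail)
{
	if(fail)
		error();
}

/* Returns 0 on success, -1 if work() raised an error; unlocks either way. */
int
guarded(int fail)
{
	locked = 1;		/* qlock(s) */
	if(waserror()){
		locked = 0;	/* qunlock(s) */
		return -1;	/* kernel code would call nexterror() here */
	}
	work(fail);
	poperror();
	locked = 0;		/* qunlock(s) */
	return 0;
}
```

The body of the if is not run when waserror() is first called; it runs only if error() is raised later, which is why a qunlock() written textually above the qlock() does not execute before it.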

Re: [9fans] That deadlock, again

2010-11-16 Thread Lucio De Re
>> Now, the qunlock(s) should not precede the qlock(s), this is the first >> case in this procedure: > > it doesn't. waserror() can't be executed before the code > following it. perhaps it could be more carefully written > as > >> > 2095 qlock(s); >> > 2091 if(waserr

Re: [9fans] That deadlock, again

2010-11-16 Thread erik quanstrom
> > acid: src(0xf0148c8a)
> > /sys/src/9/ip/tcp.c:2096
> >  2091	if(waserror()){
> >  2092		qunlock(s);
> >  2093		nexterror();
> >  2094	}
> >  2095	qlock(s);
> > >2096	qunlock(tcp);
> >  2097

Re: [9fans] That deadlock, again

2010-11-16 Thread Lucio De Re
> Well, here is an acid dump, I'll inspect it in detail, but I'm hoping > someone will beat me to it (not hard at all, I have to confess): > > rumble# acid /sys/src/9/pc/9pccpuf > /sys/src/9/pc/9pccpuf:386 plan 9 boot image > /sys/lib/acid/port > /sys/lib/acid/386 > [ ... ] This bit looks suspic

Re: [9fans] That deadlock, again

2010-11-16 Thread lucio
> cinap is right, the bug is in the kernel. we know > that because it's a lock loop. that can only happen > if the kernel screws up. also, the address is a kernel > address (starts with 0xf). Well, here is an acid dump, I'll inspect it in detail, but I'm hoping someone will beat me to it (not h

Re: [9fans] That deadlock, again

2010-11-16 Thread erik quanstrom
> I tried acid, but I'm just not familiar enough with it to make it > work. I tried > > rumble% acid 2052 /bin/exportfs > /bin/exportfs:386 plan 9 executable > /sys/lib/acid/port > /sys/lib/acid/386 > acid: src(0xf01e7377) > no source for ?file? cinap is right

Re: [9fans] That deadlock, again

2010-11-15 Thread Lucio De Re
On Tue, Nov 16, 2010 at 06:28:07AM +0100, cinap_len...@gmx.de wrote: > > if your kernel image is uncompressed and is unstripped, you can > just load it with acid: > > acid /386/9pc > > if you build it yourself, then there should be such a kernel in /sys/src/9/pc > OK, will try this evening, I sp

Re: [9fans] That deadlock, again

2010-11-15 Thread cinap_lenrek
you almost had it with your acid approach :) if your kernel image is uncompressed and is unstripped, you can just load it with acid: acid /386/9pc if you build it yourself, then there should be such a kernel in /sys/src/9/pc -- cinap --- Begin Message --- > the pc's printed by the lock loop mess

Re: [9fans] That deadlock, again

2010-11-15 Thread lucio
> the pc's printed by the lock loop message are kernel code. you > have to load a debug kernel into acid. Thanks, I'll do some digging in the Acid document(s), try to familiarise myself with the details... ++L

Re: [9fans] That deadlock, again

2010-11-15 Thread cinap_lenrek
the pc's printed by the lock loop message are kernel code. you have to load a debug kernel into acid. -- cinap --- Begin Message --- > i assume you've fixed this? (not yet fixed on sources.) Yes, before I did that the errors occurred much more frequently; there's definitely something in that, bu

Re: [9fans] That deadlock, again

2010-11-15 Thread lucio
> i assume you've fixed this? (not yet fixed on sources.) Yes, before I did that the errors occurred much more frequently; there's definitely something in that, but as Russ points out, the fix prevents panics and I have yet to see a panic. I have a suspicion that we're looking at the wrong probl

Re: [9fans] That deadlock, again

2010-11-15 Thread erik quanstrom
On Mon Nov 15 23:23:12 EST 2010, lu...@proxima.alt.za wrote: > Regarding the "deadlock" report that I occasionally see on my CPU > server console, I won't bore anyone with PC addresses or anything like > that, but I will recommend something I believe to be a possible > trigger: the failure always s

[9fans] That deadlock, again

2010-11-15 Thread lucio
Regarding the "deadlock" report that I occasionally see on my CPU server console, I won't bore anyone with PC addresses or anything like that, but I will recommend something I believe to be a possible trigger: the failure always seems to occur within "exportfs", which in this case is used exclusive