> Yes, this is the type of explicit-ness I was thinking of. Note that
> you can now drop further tests for up == 0 later in the qlock() text.
Hm, spoke too quickly. The tests on "up" have to remain, sadly.
Sorry about the misleading noise.
++L
> /n/dump/2010/1118/sys/src/9/port/qlock.c:18,23 - port/qlock.c:18,25
> {
> Proc *p;
>
> + if(up == nil && conf.postdawn)
> + panic("qlock: %#p: postdawn up nil\n", getcallerpc(&q));
> if(m->ilockdepth != 0)
> print("qlock: %#p: ilockdepth %d\n", getca
on second thought, conf.postdawn should be set in schedinit().
- erik
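A minimal sketch of that idea, assuming a new postdawn flag added to Conf; this is an illustration of the suggestion, not the change as committed:

	/* portdat.h: proposed addition to Conf */
	int	postdawn;	/* set once the scheduler is running processes */

	/* port/proc.c */
	void
	schedinit(void)		/* never returns */
	{
		conf.postdawn = 1;	/* from here on, qlock() with up == nil is an error */
		/* ... existing schedinit() body ... */
	}

	/* port/qlock.c, as in the diff above */
	if(up == nil && conf.postdawn)
		panic("qlock: %#p: postdawn up nil\n", getcallerpc(&q));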
> I suggest you fix ether82598: it is OK to call qlock() and qunlock()
> without "up", but only if sure that the qlock() will succeed. If it
> has to wait, it will panic.
yes. that's it.
> >If it has to wait, it will panic. Given that, why do the locking at all?
>
> i assume the intention is
> after reset, it's illegal to call qlock without a process (notably
> in an interrupt function), as it previously was.
That suggests that the (hopefully) few instances of qlock()
invocations that may occur in this space should be burdened with the
need to check for the value of "up" and altogethe
> it's to allow the use during reset of a given driver's
> standard functions that normally must qlock, to avoid requiring two copies
> of them, with and without the qlock.
>
> after reset, it's illegal to call qlock without a process (notably
> in an interrupt function), as it previously was.
I'
>If it has to wait, it will panic. Given that, why do the locking at all?
i assume the intention is along these lines:
it's to allow the use during reset of a given driver's
standard functions that normally must qlock, to avoid requiring two copies
of them, with and without the qlock.
after reset, it's illegal to call qlock without a process (notably
in an interrupt function), as it previously was.
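A rough sketch of the pattern being described: a driver init path that reuses a qlocking helper before any process exists. The names here (Ctlr, Ether, cmdlock, ctlrcmd, reset, Cinit) are invented for illustration:

	/* helper used both at reset time (up == nil) and at run time */
	static void
	ctlrcmd(Ctlr *ctlr, int cmd)
	{
		qlock(&ctlr->cmdlock);	/* with up == nil this is only safe if it cannot have to wait */
		/* ... program the hardware ... */
		ctlr->lastcmd = cmd;
		qunlock(&ctlr->cmdlock);
	}

	static void
	reset(Ether *edev)		/* runs before processes are scheduled */
	{
		Ctlr *ctlr;

		ctlr = edev->ctlr;
		ctlrcmd(ctlr, Cinit);	/* nothing else can hold cmdlock yet, so qlock() succeeds at once */
	}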
> but i have a feeling that there is a mistake in your
> modification to qlock. you didn't have this panic
> before you modified qlock.
qlock() is broken, or at the very least ambiguous. Someone ought to
put it out of its misery: is it legal or is it not to call qlock() in
an up == 0 context?
++L
> and i'm just wrong. intentionally or not, devsd does
> qlock things with no up from sdreset(). ether82598
> does too (my fault).
I suggest you fix ether82598: it is OK to call qlock() and qunlock()
without "up", but only if sure that the qlock() will succeed. If it
has to wait, it will panic.
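That behaviour falls out of the shape of qlock() itself; roughly (abridged sketch, queueing details elided):

	void
	qlock(QLock *q)
	{
		lock(&q->use);		/* short spin lock guarding the QLock's state */
		if(!q->locked){
			q->locked = 1;	/* uncontended: works even with up == nil */
			unlock(&q->use);
			return;
		}
		if(up == nil)
			panic("qlock");	/* contended: only a process can queue and sleep */
		/* append up to q->head/q->tail, record up->qpc, ... */
		up->state = Queueing;
		unlock(&q->use);
		sched();		/* sleep until qunlock() readies us */
	}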
On Thu Nov 18 10:23:20 EST 2010, quans...@quanstro.net wrote:
> > if(up != nil && up->nlocks.ref)
> > print("qlock: %#p: nlocks %lud\n", getcallerpc(&q),
> > up->nlocks.ref);
> >
> > will no longer need the up != nil test.
>
> that's just wrong. if the kernel is qlocking without
> an up, there's a bug. stack dumps are your friend.
> if(up != nil && up->nlocks.ref)
> print("qlock: %#p: nlocks %lud\n", getcallerpc(&q),
> up->nlocks.ref);
>
> will no longer need the up != nil test.
that's just wrong. if the kernel is qlocking without
an up, there's a bug. stack dumps are your friend.
but i have a feeling that there is a mistake in your
modification to qlock. you didn't have this panic
before you modified qlock.
On Thu, Nov 18, 2010 at 10:20:33AM +0100, cinap_len...@gmx.de wrote:
>
> hm... thinking about it... does the kernel assume (maybe in early
> initialization) that calling qlock() without a proc is ok as long as
> it can make sure it will not be held by another proc?
>
That's a question for Bell
hm... thinking about it... does the kernel assume (maybe in early
initialization) that calling qlock() without a proc is ok as long as
it can make sure it will not be held by another proc?
--
cinap
On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote:
>
> you must be in process context to qlock, because only
> processes can sleep.
was 0xf01e739e really the code that accesses up->qpctry?
--
cinap
On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote:
>
> you must be in process context to qlock, because only
> processes can sleep.
>
There's obviously at least one exception, because otherwise I would not
have got a panic at startup. Or, for that matter there would not be
active co
> Strangely, later in the qlock() code "up" is checked and a panic
> issued if zero. I'm missing something here: it is possible to execute
> this code
you must be in process context to qlock, because only
processes can sleep.
- erik
> Anyway, I have moved the assignment to "qpctry" to after "up" is
> tested. Let's see what happens. I'll have to get back to you once
> the system is back up.
The system is working now. I have to wait for a problem to arise, next.
++L
> one could move:
>
> up->qpc = getcallerpc(&q);
>
> from qlock() before the lock(&q->use); so we can see from where that
> qlock gets called that hangs the exportfs call, or add another magic
> debug pointer (qpctry) to the proc structure and print it in dumpaproc().
Cinap, I tried your
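For reference, a sketch of what cinap's suggestion amounts to: an extra debug pointer recorded on every qlock() attempt, not only on the last successful acquire. qpctry is the proposed name; the up != nil guard reflects the fix described above (the first attempt faulted because up was still nil):

	/* portdat.h, in Proc, next to qpc */
	uintptr	qpctry;		/* pc of the most recent qlock() attempt (same idea as qpc) */

	/* port/qlock.c */
	void
	qlock(QLock *q)
	{
		if(up != nil)
			up->qpctry = getcallerpc(&q);	/* recorded before spinning on q->use */
		lock(&q->use);
		/* ... rest of qlock() unchanged ... */
	}

	/* port/proc.c: dumpaproc() prints p->qpctry alongside p->qpc */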
lock loops are about Locks (spin locks), not QLocks.
the relative ordering of any two calls to qlock and qunlock
is irrelevant.
russ
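Spelling out the distinction (a sketch; the relevant code is port/taslock.c for Locks and port/qlock.c for QLocks):

	Lock l;		/* spin lock: lock() busy-waits, and it is this code that
			 * prints the "lock loop" diagnostic when a spin runs too long */
	QLock q;	/* queued lock: qlock() puts the calling process to sleep
			 * until qunlock() readies it; holding it a long time is legal */

	lock(&l);	/* hold only across short, non-sleeping critical sections */
	unlock(&l);
	qlock(&q);	/* the loop reported in this thread is on the q->use Lock
			 * taken inside qlock(), not on the QLock itself */
	qunlock(&q);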
> #I0tcpack pc f01ff12a dbgpc ...
and what's at that pc?
- erik
> On Wed, Nov 17, 2010 at 06:33:13AM +0100, cinap_len...@gmx.de wrote:
> > sorry for not being clear. what i meant was that qpc is for the last
> > qlock we succeeded to acquire. it's *not* the one we are spinning on.
> > also, qpc is not set to nil on unlock.
> >
> Ok, so we set qpctry (qpcdbg?)
On Wed, Nov 17, 2010 at 08:45:00AM +0200, Lucio De Re wrote:
> ... and from whatever the other proc is that also contributes to this
> jam. I don't have the name right in front of me, but I will post it
> separately. As far as I know it's always those two that interfere with
> exportfs and usually
On Wed, Nov 17, 2010 at 06:33:13AM +0100, cinap_len...@gmx.de wrote:
> sorry for not being clear. what i meant was that qpc is for the last
> qlock we succeeded to acquire. it's *not* the one we are spinning on.
> also, qpc is not set to nil on unlock.
>
Ok, so we set qpctry (qpcdbg?) to qpc befor
On Wed, Nov 17, 2010 at 06:22:33AM +0100, cinap_len...@gmx.de wrote:
>
> qpc is just the caller of the last successfully *acquired* qlock.
> what we know is that the exportfs proc spins in the q->use taslock
> called by qlock() right? this already seems weird... q->use is held
> just long enough to test q->locked and manipulate the queue.
sorry for not being clear. what i meant was that qpc is for the last
qlock we succeeded to acquire. it's *not* the one we are spinning on.
also, qpc is not set to nil on unlock.
--
cinap
qpc is just the caller of the last successfully *acquired* qlock.
what we know is that the exportfs proc spins in the q->use taslock
called by qlock() right? this already seems weird... q->use is held
just long enough to test q->locked and manipulate the queue. also
sched() will avoid switch
> Hm, I thought I understood waserror(), but now I'm sure I don't. What
> condition is waserror() attempting to handle here?
waserror() sets up an entry in the error stack.
if there is a call to error() before poperror(),
then that entry is popped and waserror() returns
1. it's just like setjmp.
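A minimal sketch of the idiom, in the ordering erik suggests elsewhere in the thread (waserror() behaves like setjmp: 0 when the handler is pushed, 1 when error() unwinds back to it):

	qlock(s);
	if(waserror()){		/* reached again only if error() is raised below */
		qunlock(s);
		nexterror();	/* pass the error to the next handler on the stack */
	}
	/* ... code that may call error() ... */
	poperror();		/* pop the handler once the risky section is done */
	qunlock(s);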
>> Now, the qunlock(s) should not precede the qlock(s), this is the first
>> case in this procedure:
>
> it doesn't. waserror() can't be executed before the code
> following it. perhaps it could be more carefully written
> as
>
>> > 2095 qlock(s);
>> > 2091 if(waserror()){
> > acid: src(0xf0148c8a)
> > /sys/src/9/ip/tcp.c:2096
> > 2091 if(waserror()){
> > 2092 qunlock(s);
> > 2093 nexterror();
> > 2094 }
> > 2095 qlock(s);
> > >2096 qunlock(tcp);
> > 2097
> Well, here is an acid dump, I'll inspect it in detail, but I'm hoping
> someone will beat me to it (not hard at all, I have to confess):
>
> rumble# acid /sys/src/9/pc/9pccpuf
> /sys/src/9/pc/9pccpuf:386 plan 9 boot image
> /sys/lib/acid/port
> /sys/lib/acid/386
>
[ ... ]
This bit looks suspic
> cinap is right, the bug is in the kernel. we know
> that because it's a lock loop. that can only happen
> if the kernel screws up. also, the address is a kernel
> address (starts with 0xf).
Well, here is an acid dump, I'll inspect it in detail, but I'm hoping
someone will beat me to it (not hard at all, I have to confess):
> I tried acid, but I'm just not familiar enough with it to make it
> work. I tried
>
> rumble% acid 2052 /bin/exportfs
> /bin/exportfs:386 plan 9 executable
> /sys/lib/acid/port
> /sys/lib/acid/386
> acid: src(0xf01e7377)
> no source for ?file?
cinap is right, the bug is in the kernel. we know
that because it's a lock loop. that can only happen
if the kernel screws up. also, the address is a kernel
address (starts with 0xf).
On Tue, Nov 16, 2010 at 06:28:07AM +0100, cinap_len...@gmx.de wrote:
>
> if your kernel image is uncompressed and unstripped, you can
> just load it with acid:
>
> acid /386/9pc
>
> if you build it yourself, then there should be such a kernel in /sys/src/9/pc
>
OK, will try this evening, I sp
you almost had it with your acid approach :)
if your kernel image is uncompressed and unstripped, you can
just load it with acid:
acid /386/9pc
if you build it yourself, then there should be such a kernel in /sys/src/9/pc
--
cinap
> the pc's printed by the lock loop message are kernel code. you
> have to load a debug kernel into acid.
Thanks, I'll do some digging in the Acid document(s), try to
familiarise myself with the details...
++L
the pc's printed by the lock loop message are kernel code. you
have to load a debug kernel into acid.
--
cinap
> i assume you've fixed this? (not yet fixed on sources.)
Yes, before I did that the errors occurred much more frequently;
there's definitely something in that, but as Russ points out, the fix
prevents panics and I have yet to see a panic.
I have a suspicion that we're looking at the wrong probl
On Mon Nov 15 23:23:12 EST 2010, lu...@proxima.alt.za wrote:
> Regarding the "deadlock" report that I occasionally see on my CPU
> server console, I won't bore anyone with PC addresses or anything like
> that, but I will recommend something I believe to be a possible
> trigger: the failure always s
Regarding the "deadlock" report that I occasionally see on my CPU
server console, I won't bore anyone with PC addresses or anything like
that, but I will recommend something I believe to be a possible
trigger: the failure always seems to occur within "exportfs", which in
this case is used exclusive