Hello Mark,

On Mon, May 02, 2011 at 05:42:10PM +0200, Mark Kettenis wrote:
> > Date: Tue, 19 Apr 2011 15:43:01 +0200
> > From: Christian Ehrhardt <[email protected]>
> > 
> > Hi,
> > 
> > On Mon, Apr 18, 2011 at 10:00:02PM -0700, Philip Guenther wrote:
> > > > The sti was introduced in revision 1.97 of locore.s in March 2006 by
> > > > mickey@. Commit message:
> > > >
> > > > | prevent the faults on iret to run w/ disabled intrs and cause
> > > > | deadlocks; niklas toby tom ok
> > > >
> > > > Maybe mickey or one of the people giving oks back then want to comment?
> > > 
> > > That predates my involvement with OpenBSD, but I think get the gist.
> > > iret can fault if, for example, the user corrupts certain members of
> > > the sigcontext passed to a signal handler.  If that happens the the
> > > CPU generates an exception on the iret and you end up in trap() where
> > > it peeks at the instruction that triggered the trap and, seeing it was
> > > iret, arranges to 'return' to the resume_iret label to generate what
> > > looks like a normal fault against the process.  The goal of that sti
> > > is to keep that 'normal' fault from being processed with interrupts
> > > blocked.
> > 
> > Point taken, I missed the sigreturn path. However, note that iret can
> > only fault if the values of the segment registers stored on the kernel
> > stack are bogus. In particular iret will not fault on the new EIP address.
> > (The address can page fault but the fault will be on the "instruction"
> > pointed to by the new CS:EIP and not in iret).
> > 
> > With that in mind, the question is: Is it really a problem if the trap
> > handler runs with interrupts disabled? See below.
> 
> Yes, that will be a problem.  IPIs will be blocked.  So if that trap
> requires us to grab the kernel lock, and another process (that holds
> the kernel lock) tries to do a TLB shootdown, we'll deadlock.

Ok, but ...

> > > Possibility two would be to try to handle this from, uh, inside
> > > resume_iret I guess.  I'm not 100% sure whether it can be unilateral
> > > there though.
> 
> This seems to be what amd64 does, and since calling uvm_fault() with
> interrupts disabled is really really bad, I don't think this would be
> a problem.
> 
> > This would be my preferred solution. In particular, I see two possibilities:
> > 
> > * Maybe it is not even a problem if irqs are sometimes disabled during
> >   trap.
> 
> naddy@ is seeing hangs on the i386 ports building machine, and I
> suspect that is because interrupts are disabled while handling a page
> fault trap.

... I really don't believe this, unless there is some path that I still
miss. Faulting on iret itself should IMHO be a very rare thing. The one
and only path that is see, where this can happen, is sigreturn with
bogus segment registers provided by userland.

In particular, a simple page fault on the user IP or SP should not fault
on iret in kernel mode but on the target CS:IP in user mode with IRQs
enabled no matter what.

I assume it is not possible to get a stack trace from the hanging machine?

> > * It is a problem if trap runs with irqs disabled. In this case, userland
> >   can easily trigger this by trapping on one of the pop instructions for
> >   ds, es, fs or gs. Those instructions in the interrupt return path run
> >   with irqs disabled. Thus moving the sti in the interupt return path down
> >   one instruction immediately before the iret does not really help.
> >   The proper solution would probably be to add explicit calls to sti in all
> >   of the resume paths (or even explicitly before calling "trap").
> 
> So here is a diff that adds the sti to both resume paths.  Like I
> said, amd64 already does this (but only has the one resume path).  Now
> the i386 ports machine is a bad machine to test this on.  So it would
> be nice if somebody with the appropriate hardware (an 8-way i386
> machine I presume) could:

In fact it might be a good idea to try this patch instead (i.e.
unconditionally enable irqs before calling trap):

Index: i386/locore.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.130
diff -u -r1.130 locore.s
--- i386/locore.s       3 Jul 2010 04:54:32 -0000       1.130
+++ i386/locore.s       2 May 2011 16:46:14 -0000
@@ -1511,6 +1511,7 @@
 #ifdef DIAGNOSTIC
        movl    CPL,%ebx
 #endif /* DIAGNOSTIC */
+       sti
        pushl   %esp
        call    _C_LABEL(trap)
        addl    $4,%esp


   regards     Christian

Reply via email to