On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote: [skip] > But I also would like to use this opportunity to discuss how we can > make it easier to debug such issue as this. I think that this problem > demonstrates that when we treat certain junk in kernel address value > as a userland address value, we throw additional heaps of irrelevant > stuff on top of an actual problem. One solution could be to use a > special flag that would mark all actual attempts to access userland > address (e.g. setting the flag on entrance to copyin and clearing it > upon return), so that in the page fault handler we could distinguish > actual faults on userland addresses from faults on garbage kernel > addresses. I am sure that there could be other clever techniques to > catch such garbage addresses early.
We already have such mechanism, the kernel code aware of the usermode page access sets pcb_onfault. See the end of trap_pfault() handler. In fact, we can catch it earlier, before even calling vm_fault(). BTW, I think this is esp. useful in the combination with the support for the SMEP in recent Intel CPUs. commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536 Author: Konstantin Belousov <kos...@pooma.home> Date: Thu Aug 18 00:08:50 2011 +0300 Assert that the exiting process does not return to usermode. On x86, do not call vm_fault() when the kernel is not prepared to handle unsuccessful page fault. diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c index 4e5f8b8..55e1e5a 100644 --- a/sys/amd64/amd64/trap.c +++ b/sys/amd64/amd64/trap.c @@ -674,6 +674,19 @@ trap_pfault(frame, usermode) goto nogo; map = &vm->vm_map; + + /* + * When accessing a usermode address, kernel must be + * ready to accept the page fault, and provide a + * handling routine. Since accessing the address + * without the handler is a bug, do not try to handle + * it normally, and panic immediately. + */ + if (!usermode && (td->td_intr_nesting_level != 0 || + PCPU_GET(curpcb)->pcb_onfault == NULL)) { + trap_fatal(frame, eva); + return (-1); + } } /* diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c index 5a8016c..e6d2b5a 100644 --- a/sys/i386/i386/trap.c +++ b/sys/i386/i386/trap.c @@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva) goto nogo; map = &vm->vm_map; + if (!usermode && (td->td_intr_nesting_level != 0 || + PCPU_GET(curpcb)->pcb_onfault == NULL)) { + trap_fatal(frame, eva); + return (-1); + } } /* diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c index 3527ed1..a69b7b8 100644 --- a/sys/kern/subr_trap.c +++ b/sys/kern/subr_trap.c @@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame) CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid, td->td_name); + KASSERT((p->p_flag & P_WEXIT) == 0, + ("Exiting process returns to usermode")); #if 0 #ifdef DIAGNOSTIC /* Check that we called signotify() enough. */
pgpLAoAJLdIjZ.pgp
Description: PGP signature