On 2 April 2014 23:04, Andrei E. Warkentin <andrey.warken...@gmail.com> wrote: > Hiya, > > I found a couple of corner cases where signal handling fails in QEMU > linux-user support. "Signal handling" here being just a symptom - > actual problems are in TB / page management. > > Here are a couple of simple tests > (https://github.com/andreiw/andreiw-wip/tree/master/qemu/tests). The > test: > 1) Creates a page with two instructions - an illegal instruction and a 'ret'. > 2) Sets up two handlers - SIGILL and SIGSEGV. On SIGILL, the page with > the instructions is unmapped (in sigtest.c, and set to PROT_NONE in > sigtest_mprotect.c), and the instruction pointer is set to point to > the next instruction - the ret. On SIGSEGV, the page is mapped back in > and the execution is supposed to continue successfully. > > The first problem is a bug in page_check_range that cuts the loop > after the first page that needs to be unprotected (whereas all pages > need to be checked in the range). > > https://github.com/andreiw/andreiw-wip/blob/master/qemu/0001-qemu-fix-page_check_range.patch > > With this patch, the above tests now hang instead of segfaulting QEMU.
This patch looks good to me. Reviewed-by: Peter Maydell <peter.mayd...@linaro.org> > The second problem is a bit more involved - the code generator assumes > that the translated code is always present. Jumping to NULL to land in > a SIGSEGV handler, or expecting to "fault in" code like the tests > above do, results in a deadlock condition. On a signal, QEMU tries > delivering the signal, longjmps back to cpu_loop, but the > tcg_ctx.tb_ctx.tb_lock is still held. So tb_gen_code must allow > backing out gracefully if the memory is unreadable. > > I took the simpler approach - making sure the page didn't look > unmapped or mprotected to PROT_NONE. Checking explicit access bits is > a bad idea because different arches have different implicit access > right (i.e. X implies R, or W implies R, or R implies X, etc). Another > alternative is to have a safe memory probe function - i.e. take a > SIGSEGV and patch the return to return EFAULT (like a kernel ;-)), but > this would need an implementation for every arch QEMU runs on. > > Verified both on AArch64 and i386 linux-user targets. Both tests > successfully exit. > > https://github.com/andreiw/andreiw-wip/blob/master/qemu/0002-qemu-handle-tb_gen_code-getting-called-for-unmapped-.patch > https://github.com/andreiw/andreiw-wip/blob/master/qemu/0003-x86-implement-EXCP_TB_EFAULT.patch I'm not so convinced about these two. I think the correct approach is to make sure we unlock the tb_lock in the code path where we longjumped out of the frontend decoder. I'll try to cook up a patch for this. thanks -- PMM