On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
> After much poking around I concluded that we really need to implement
> allocation and probing via a "moving sp" strategy.  Probing into
> unallocated areas runs afoul of valgrind, so that's a non-starter.
>
> Allocating stack space, then probing the pages within the space is
> vulnerable to async signal delivery between the allocation point and the
> probe point.  If that occurs the signal handler could end up running on
> a stack that has collided with the heap.
>
> Ideally we would allocate and probe a page as an atomic unit (which is
> feasible on PPC).  Alternatively, due to ISA restrictions, allocate a
> page, then probe the page as distinct instructions.  The latter still
> has a race, but we'd have to take the async signal in a single
> instruction window.
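For illustration only, here is a minimal C simulation of the page-at-a-time "moving sp" scheme quoted above. The page size, the 8-page simulated stack and the names `probe` and `allocate_moving_sp` are hypothetical and are not taken from the patches. Because sp moves down by at most one page before each probe, the guard page (page 0 here) is always probed rather than skipped, and the only gap between allocation and probe is a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; not taken from any GCC target. */
#define PAGE_SIZE 4096u
#define STACK_PAGES 8u
#define MAX_PROBES 64u

/* Simulated stack addresses 0 .. STACK_PAGES * PAGE_SIZE; page 0 is the
   guard page and the stack grows down toward it. */
static size_t probe_order[MAX_PROBES]; /* pages probed, in order */
static size_t probe_count;

/* Hypothetical probe of the word at SP.  Returns false when SP lies in
   the guard page, where a real probe would fault. */
static bool probe(size_t sp)
{
    size_t page = sp / PAGE_SIZE;
    if (page == 0)
        return false;
    if (probe_count < MAX_PROBES)
        probe_order[probe_count] = page;
    probe_count++;
    return true;
}

/* Allocate SIZE bytes by moving *SP down at most one page at a time and
   probing the new *SP after every step, so no page is ever skipped. */
static bool allocate_moving_sp(size_t *sp, size_t size)
{
    while (size > 0) {
        size_t step = size < PAGE_SIZE ? size : PAGE_SIZE;
        assert(*sp >= step);  /* sp is above the guard page, so no wrap */
        *sp -= step;          /* allocate: sp now covers the new bytes */
        if (!probe(*sp))      /* probe the freshly allocated page */
            return false;     /* would collide with the guard page */
        size -= step;
    }
    return true;
}
```

Starting from the top of this 8-page stack, allocating `3 * PAGE_SIZE + 100` bytes probes pages 7, 6, 5 and 4 in that order; a further 10-page request then probes pages 3, 2 and 1 and stops with `false` at the guard page instead of jumping past it.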
And if the allocation is only a page at a time, the single-insn race
window can be mitigated in the kernel, by probing the word at the stack
pointer (a read-only probe is fine) when setting up the signal frame for
an async signal.

> So, time to open the discussion to questions & comments.
>
> I've got patches I need to clean up and post for comments that implement
> this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in good
> shape.  There's an unhandled case for s390.  I've got evaluation still
> to do on aarch64.

In the patches Jeff is going to post, we have (at least for
-fasynchronous-unwind-tables, which is on by default on e.g. x86)
precise unwind info even with the new stack check mode.

ira.c currently has:

      /* We need the frame pointer to catch stack overflow exceptions if
         the stack pointer is moving (as for the alloca case just above).  */
      || (STACK_CHECK_MOVING_SP
          && flag_stack_check
          && flag_exceptions
          && cfun->can_throw_non_call_exceptions)

For alloca we have a frame pointer for other reasons; the question is
whether we really need this hunk if we provided proper unwind info even
for the Ada -fstack-check mode.  Or, if we provide proper unwind info for
-fasynchronous-unwind-tables, whether the above condition could also get
&& !flag_asynchronous_unwind_tables.

Eric, what exactly is the reason for the above: is it just the lack of
proper CFI notes, or something different?

Also, on i?86 the stack is probed with orq $0, (%rsp) or orl $0, (%esp).
While that is shorter, is it actually faster than, or just as slow as,
movq $0, (%rsp) or movl $0, (%esp)?

	Jakub