Ping.
On 07/11/2016 16:59, Adhemerval Zanella wrote: > > > On 14/10/2016 15:59, Wilco Dijkstra wrote: >> Hi, >> > > Thanks for the thoughtful review and sorry for late response. > >>> Split-stack prologue on function entry is as follow (this goes before the >>> usual function prologue): >> >>> mrs x9, tpidr_el0 >>> mov x10, -<required stack allocation> >> >> As Jiong already remarked, the nop won't work. Do we know the maximum >> adjustment >> that the linker is allowed to make? If so, and we can limit the adjustment >> to 16MB in >> most cases, emitting 2 subtracts is best. Larger offset need mov/movk/sub >> but that >> should be extremely rare. > > There is no limit afaik on gold split stack allocation handling, > and I think one could be added for each backend (in the method > override require to implement it). > > In fact it is not really required to tie the nop generation with the > instruction generated by 'aarch64_internal_mov_immediate', it is > just a matter to simplify linker code. > > And although 16MB should be rare, nilptr2.go tests allocates 134217824 > so this test fails with this low stack limit. I am not sure how well > is the stack usage on 'go', but I think we should at least support > current testcase scenario. So for current iteration I kept my > current approach, but I am open to suggestions. > > >> >>> nop/movk >> >>> add x10, sp, x10 >>> ldr x9, [x9, 16] >> >> Is there any need to detect underflow of x10 or is there a guarantee that >> stacks are >> never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's >> safe >> to do a signed comparison. > > I do not think so, at least none of current backend that implements > split stack do so. > >> >>> cmp x10, x9 >>> b.cs enough >> >> Why save/restore x30 and the call x30+8 trick when we could pass the >> continuation address and use a tailcall? That also avoids emitting extra >> unwind info. >> >>> stp x30, [sp, -16] >>> bl __morestack >>> ldp x30, [sp], 16 >>> ret >> >> This part doesn't make any sense - both x28 and carry flag as an input, and >> spread >> across the prolog - why??? >> >>> enough: >>> mov x10, sp >> [prolog] >>> b.cs continue >>> mov x10, x28 >> continue: >> [rest of function] >> >> Why not do this? >> >> function: >> mrs x9, tpidr_el0 >> sub x10, sp, N & 0xfff000 >> sub x10, x10, N & 0xfff >> ldr x9, [x9, 16] >> adr x12, main_fn_entry >> mov x11, sp [if function has stacked arguments] >> cmp x10, x9 >> b.ge main_fn_entry >> b __morestack >> main_fn_entry: [x11 is argument pointer] >> [prolog] >> [rest of function] >> >> In __morestack you need to save x8 as well (another argument register!) and >> x12 (the >> continuation address). After returning from the call x8 doesn't need to be >> preserved. > > Indeed this strategy is way better and I adjusted the code follow it. > The only change is I am using a: > > [...] > cmp x9, x10 > b.lt main_fn_entr > b __morestack. > [...] > > So I can issue a 'cmp <continuation address>, 0' on __morestack to indicate > the function was called. > >> >> There are several issues with unwinding in __morestack. x28 is not described >> as a callee-save >> so will be corrupted if unwinding across a __morestack call. This won't >> unwind correctly after >> the ldp as the unwinder will use the restored frame pointer to try to >> restore x29/x30: >> >> + ldp x29, x30, [x28, STACKFRAME_BASE] >> + ldr x28, [x28, STACKFRAME_BASE + 80] >> + >> + .cfi_remember_state >> + .cfi_restore 30 >> + .cfi_restore 29 >> + .cfi_def_cfa 31, 0 > > Indeed, it misses x28 save/restore. I think I have added the missing bits, > but I > must confess that I am not well versed in CFI directives. I will appreciate > if > you could help me on this new version. > >> >> This stores a random x30 value on the stack, what is the purpose of this? >> Nothing can unwind >> to here: >> >> + # Start using new stack >> + stp x29, x30, [x0, -16]! >> + mov sp, x0 >> >> Also we no longer need split_stack_arg_pointer_used_p () or any code that >> uses it (functions >> that don't have any arguments passed on the stack could omit the mov x11, >> sp). > > Right, we new strategy you proposed to do a branch this is indeed not > really required. I remove it from on this new patch. > >> >> Wilco >>