On Jun 19, 2012, at 6:47 PM, Richard Henderson wrote:

> On 2012-06-18 05:22, Tristan Gingold wrote:
>> +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
>> +     allocation is 4GB (add a safety guard for saved registers).  */
>> +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
>> +    return true;
>
> Elsewhere you say this is an upper bound for stack use by the prologue.
> It's clearly a wild guess.  The maximum stack use is 10*sse + 8*int
> registers saved, which is a lot less than 4096.
>
> That said, I'm ok with *using* 4096 so long that the comment clearly
> states that it's a large over-estimate.  I do suggest, however, folding
> this into the SEH_MAX_FRAME_SIZE value, and expanding on the comment
> there.  I see no practical difference between 0x80000000 and 0x7fffe000
> being the limit.
Here is the new comment.  I have reduced the estimate to 256.

/* According to Windows x64 software convention, the maximum stack allocatable
   in the prologue is 4G - 8 bytes.  Furthermore, there is a limited set of
   instructions allowed to adjust the stack pointer in the epilog, forcing the
   use of frame pointer for frames larger than 2 GB.  This theoretical limit
   is reduced by 256, an over-estimated upper bound for the stack use by the
   prologue.
   We define only one threshold for both the prolog and the epilog.  When the
   frame size is larger than this threshold, we allocate the area to save SSE
   regs, then save them, and then allocate the remaining.  There is no SEH
   unwind info for this latter allocation.  */
#define SEH_MAX_FRAME_SIZE ((2U << 30) - 256)

>
>> +/* Output assembly code to get the establisher frame (Windows x64 only).
>> +   This corresponds to what will be computed by Windows from Frame Register
>> +   and Frame Register Offset fields of the UNWIND_INFO structure.  Since
>> +   these values are computed very late (by ix86_expand_prologue), we cannot
>> +   express this using only RTL.  */
>> +
>> +const char *
>> +ix86_output_establisher_frame (rtx target)
>> +{
>> +  if (!frame_pointer_needed)
>> +    {
>> +      /* Note that we have advertized an lea operation.  */
>> +      output_asm_insn ("lea{q}\t{0(%%rsp), %0|%0, 0[rsp]}", &target);
>> +    }
>> +  else
>> +    {
>> +      rtx xops[3];
>> +      struct ix86_frame frame;
>> +
>> +      /* Recompute the frame layout here.  */
>> +      ix86_compute_frame_layout (&frame);
>> +
>> +      /* Closely follow how the frame pointer is set in
>> +         ix86_expand_prologue.  */
>> +      xops[0] = target;
>> +      xops[1] = hard_frame_pointer_rtx;
>> +      if (frame.hard_frame_pointer_offset == frame.reg_save_offset)
>> +        xops[2] = GEN_INT (0);
>> +      else
>> +        xops[2] = GEN_INT (-(frame.stack_pointer_offset
>> +                             - frame.hard_frame_pointer_offset));
>> +      output_asm_insn ("lea{q}\t{%a2(%1), %0|%0, %a2[%1]}", xops);
>
> This is what register elimination is for; the value substitution happens
> during reload.
>
> Now, one *could* add a new pseudo-hard-register for this (we support as
> many register eliminations as needed), but before we do that we need to
> decide if we can adjust the soft frame pointer to be the value required.
> If so, you can then rely on the existing __builtin_frame_address.  Which
> is a very attractive sounding solution.  I'm 99% moving the sfp will work.

Thank you for this idea.  I am trying to implement it.

Tristan.
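
For reference, the arithmetic behind the 256-byte guard can be checked with a
small standalone sketch.  It is not part of the patch.  The register counts
(10 SSE, 8 integer) come from the review comment quoted above; the per-register
sizes (16 and 8 bytes) are the usual x86-64 widths and are assumed here.  The
enum names and both helper functions are hypothetical, introduced only for
illustration.

```c
#include <assert.h>

/* Assumed sizes and counts; only the counts appear in the thread.  */
enum
{
  SSE_REG_BYTES = 16,       /* one XMM register */
  INT_REG_BYTES = 8,        /* one 64-bit integer register */
  MAX_SAVED_SSE_REGS = 10,  /* "10*sse" in the review */
  MAX_SAVED_INT_REGS = 8,   /* "8*int" in the review */
  PROLOGUE_GUARD = 256      /* guard folded into SEH_MAX_FRAME_SIZE */
};

/* Same definition as in the new comment.  */
#define SEH_MAX_FRAME_SIZE ((2U << 30) - 256)

/* Hypothetical helper: largest register save area the prologue could need
   under the counts above.  */
static unsigned int
max_register_save_area (void)
{
  return MAX_SAVED_SSE_REGS * SSE_REG_BYTES
         + MAX_SAVED_INT_REGS * INT_REG_BYTES;
}

/* Hypothetical helper: check that the guard covers the save area and that
   the threshold has the expected value.  */
static void
check_guard (void)
{
  assert (max_register_save_area () <= (unsigned int) PROLOGUE_GUARD);
  assert (SEH_MAX_FRAME_SIZE == 0x7fffff00u);
}
```

Under these assumptions the save area is 10*16 + 8*8 = 224 bytes, so 256 is
still an over-estimate, and the threshold works out to 0x7fffff00.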