http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59086
--- Comment #3 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
First of all, reload also cannot generate this code with the mentioned i386.c
change.

When we use -maccumulate-outgoing-args, we have before reload/LRA:

(insn 6 10 7 2 (parallel [
            (set (reg:SI 87 [ x ])
                (asm_operands:SI ("1: ") ("=d") 0 [
                        (mem/c:SI (reg/f:SI 16 argp) [0 count+0 S4 A32])
                        (mem/c:V2DI (plus:SI (reg/f:SI 20 frame)
                                (const_int -32 [0xffffffffffffffe0])) [0 tmp1+0 S16 A128])
                        (reg:SI 87 [ x ])
                    ]
                     [
                        (asm_input:SI ("m") (null):0)
                        (asm_input:V2DI ("m") (null):0)
                        (asm_input:SI ("0") (null):0)
                    ]
                     [] b2.c:7))
            (clobber (reg:QI 18 fpsr))
            (clobber (reg:QI 17 flags))
            (clobber (reg:QI 0 ax))
            (clobber (reg:QI 5 di))
            (clobber (reg:QI 4 si))
            (clobber (reg:QI 2 cx))
        ]) b2.c:7 -1

and argp is transformed after LRA/reload into fp+const:

(insn 6 10 7 2 (parallel [
            (set (reg:SI 1 dx [orig:87 x ] [87])
                (asm_operands:SI ("1: ") ("=d") 0 [
                        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                                (const_int 8 [0x8])) [0 count+0 S4 A32])
                        (mem/c:V2DI (reg/f:SI 7 sp) [0 tmp1+0 S16 A128])
                        (reg:SI 1 dx [orig:87 x ] [87])
                    ]

If we don't use -maccumulate-outgoing-args, we have before reload/LRA:

(insn/f 10 3 2 2 (set (reg:SI 89)
        (reg:SI 2 cx)) 86 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 2 cx)
        (expr_list:REG_CFA_SET_VDRAP (reg:SI 89)
            (nil))))
...
(insn 6 11 7 2 (parallel [
            (set (reg:SI 87 [ x ])
                (asm_operands:SI ("1: ") ("=d") 0 [
                        (mem/c:SI (reg:SI 89) [0 count+0 S4 A32])
                        (mem/c:V2DI (plus:SI (reg/f:SI 20 frame)
                                (const_int -32 [0xffffffffffffffe0])) [0 tmp1+0 S16 A128])
                        (reg:SI 87 [ x ])
                    ]

As we have only 1 free reg for the asm (4 regs are clobbered in the asm, ebx is
taken for -fPIC, ebp is always needed for -mstackrealign), we cannot use it for
both p87 (it has a + constraint) and p89.  The generated code also has no equiv
for p89 that could be used in its place.  So reload/LRA can do nothing in this
situation.
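The register count above can be checked with a small sketch.  It is only an
illustration of the arithmetic, not GCC code: the helper names are
hypothetical, the bit positions follow the hard register numbers visible in
the dumps (ax=0, dx=1, cx=2, si=4, di=5, bp=6, sp=7) with bx assumed to be 3,
and excluding sp as the stack pointer is an assumption added here that the
comment does not spell out.

```c
#include <assert.h>
#include <stdio.h>

/* The eight 32-bit x86 general registers, one bit each. */
enum {
    R_AX = 1 << 0, R_DX = 1 << 1, R_CX = 1 << 2, R_BX = 1 << 3,
    R_SI = 1 << 4, R_DI = 1 << 5, R_BP = 1 << 6, R_SP = 1 << 7,
    R_ALL = 0xff
};

/* Count the registers present in a mask. */
int count_regs(unsigned mask)
{
    int n = 0;
    assert((mask & ~(unsigned)R_ALL) == 0);
    for (; mask != 0; mask &= mask - 1)   /* clear lowest set bit */
        n++;
    return n;
}

/* Registers left for the asm operands (hypothetical helper). */
unsigned free_regs_for_asm(void)
{
    unsigned clobbered = R_AX | R_DI | R_SI | R_CX; /* asm clobber list */
    unsigned reserved  = R_BX    /* taken for -fPIC */
                       | R_BP    /* needed for -mstackrealign */
                       | R_SP;   /* stack pointer (assumption, see above) */
    return R_ALL & ~(clobbered | reserved);
}

/* Print the budget; returns nonzero when the pseudos cannot all fit. */
int report_budget(void)
{
    int avail  = count_regs(free_regs_for_asm());
    int needed = 2;   /* p87 (tied to "d") and p89 (address of count) */
    printf("free: %d, needed: %d\n", avail, needed);
    return avail < needed;
}
```

Under these assumptions only dx remains, and it is already claimed by p87
through the "=d"/"0" operand pair, so nothing is left for p89.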
Even if I implement fp elimination in the presence of sp changes in RTL (and I
am working on it, as it is needed for other existing PRs and is very important
for generated code performance), bp will not be free (again because of
-mstackrealign).

So if we want to compile the code, we should revert the original change

-  /* ??? Unwind info is not correct around the CFG unless either a frame
-     pointer is present or M_A_O_A is set.  Fixing this requires rewriting
-     unwind info generation to be aware of the CFG and propagating states
-     around edges.  */
-  if ((flag_unwind_tables || flag_asynchronous_unwind_tables
-       || flag_exceptions || flag_non_call_exceptions)
-      && flag_omit_frame_pointer
-      && !(target_flags & MASK_ACCUMULATE_OUTGOING_ARGS))
-    {
-      if (target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
-        warning (0, "unwind tables currently require either a frame pointer "
-                 "or %saccumulate-outgoing-args%s for correctness",
-                 prefix, suffix);
-      target_flags |= MASK_ACCUMULATE_OUTGOING_ARGS;
-    }
-

although we could modify the comment if it is not true anymore.  Still, the
code will be broken for some tunings with -mno-accumulate-args by default.

To really solve the problem, we should free bp somehow.  Probably it can be
done by a smarter stack realign implementation or by saving/restoring bp around
the asm.  The latter is a very complicated task and cannot be done in the
gcc-4.9 time frame.