On 11/14/14 03:44, Eric Botcazou wrote:
I wonder how many other problems of this nature are lurking in reorg.c.
For example steal_delay_list_from_{target,fallthrough} or the code
which searches for arithmetic at the branch target, and puts the
opposite insn in a delay slot.

Right, and the latter has already been dealt with by Richard:

2012-02-11  Richard Sandiford  <rdsandif...@googlemail.com>

        PR rtl-optimization/52175
        * reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
        to frame-related instructions.

Ahhh, good.


In fact, I really wonder if we should be allowing anything frame related
outside fill_simple_delay_slots.

That could well be the end result after a few more years of tweaking. :-)
IIRC, fill_eager and its related friends are all speculative in some way, and aren't those precisely the ones that are causing us problems? Also note we have backends working around this stuff in fairly blunt ways:

;; For conditional branches. Frame related instructions are not allowed
;; because they confuse the unwind support.
(define_attr "in_branch_delay" "false,true"
  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                     (eq_attr "length" "4")
                     (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                (const_string "true")
                (const_string "false")))

;; Disallow instructions which use the FPU since they will tie up the FPU
;; even if the instruction is nullified.
(define_attr "in_nullified_branch_delay" "false,true"
  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpdivdbl,fpsqrtsgl,fpsqrtdbl,parallel_branch,trap")
                     (eq_attr "length" "4")
                     (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                (const_string "true")
                (const_string "false")))

;; For calls and millicode calls.
(define_attr "in_call_delay" "false,true"
  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                     (eq_attr "length" "4")
                     (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                (const_string "true")
                (const_string "false")))


Given the architectural difficulties of delay slots on modern processors, would it be that painful to just not allow filling slots with frame-related insns and let dbr try to find something else or drop in a nop? I wouldn't be all that surprised if there were no measurable performance difference on something like a modern Sparc.
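
To make that policy concrete, here is a rough toy model in C. It is not reorg.c code and does not use GCC's rtx types: struct toy_insn, pick_delay_insn and both flags are hypothetical names made up for illustration. The frame_related flag stands in for RTX_FRAME_RELATED_P (insn), and otherwise_ok stands in for every other eligibility check dbr would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an insn; hypothetical, not GCC's rtx.  */
struct toy_insn {
    const char *name;
    bool frame_related;  /* models RTX_FRAME_RELATED_P (insn) */
    bool otherwise_ok;   /* models all other delay-slot eligibility checks */
};

/* Return the index of the first candidate allowed in the delay slot,
   or -1 when nothing qualifies and the slot would get a nop.
   Frame-related candidates are skipped unconditionally.  */
int pick_delay_insn(const struct toy_insn *cands, size_t n)
{
    assert(cands != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (cands[i].frame_related)
            continue;            /* never put frame insns in a slot */
        if (cands[i].otherwise_ok)
            return (int) i;      /* first acceptable non-frame insn */
    }
    return -1;                   /* caller drops in a nop */
}
```

The only point of the sketch is the ordering: the frame-related test comes first and is unconditional, so the search simply moves on to the next candidate instead of relying on each backend's attributes to reject the insn.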


Jeff
