https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534

--- Comment #12 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
So while there's nothing buggy about the if-conversion which causes the
performance issue, it does show an interesting missed optimization that ifcvt
can't handle.

We make the transform through find_if_case_2, which looks for things of the
form:

        /* TEST BLOCK  */
        if (test)
           goto E; // x not live
        /* FALLTHRU  */
        /* ELSE BLOCK  */
        x = big();
        goto L;
        E:
        /* THEN BLOCK  */
        x = b;
        goto M;

And transforms them to:

        /* Unconditional copy of THEN BLOCK */
        x = b;
        /* TEST BLOCK  */
        if (test)
          goto M;
        /* ELSE BLOCK  */
        x = big();
        goto L;

In the testcase, using the naming conventions above, and snipping irrelevant
details, this looks like:

TEST BLOCK (309)

    ;; basic block 309, loop depth 4, count 0, freq 3153, maybe hot
    ;;  prev block 308, next block 311, flags: (REACHABLE, RTL, MODIFIED)
    ;;  pred:       219 [98.0%] 

    <snip more insns>
    (insn 1915 1914 1916 309 (set (reg:CC 66 cc)
            (compare:CC (reg:SI 1117)
                (const_int 0 [0]))) "foo.cpp":59 391 {cmpsi}
         (expr_list:REG_DEAD (reg:SI 1117)
            (nil)))
    (jump_insn 1916 1915 1922 309 (set (pc)
            (if_then_else (ne (reg:CC 66 cc)
                    (const_int 0 [0]))
                (label_ref 1905)
                (pc))) "foo.cpp":59 9 {condjump}
         (expr_list:REG_DEAD (reg:CC 66 cc)
            (int_list:REG_BR_PROB 9558 (nil)))
     -> 1905)

    ;;  succ:       227 [95.6%] 
    ;;              226 [4.4%]  (FALLTHRU)

ELSE BLOCK (226)

    ;; basic block 226, loop depth 4, count 0, freq 201, maybe hot
    ;;  prev block 224, next block 227, flags: (REACHABLE, RTL, MODIFIED)
    ;;  pred:       309 [4.4%]  (FALLTHRU)
    ;;              311 [3.8%]  (FALLTHRU)

    (code_label 1917 1413 1417 226 141 (nil) [0 uses])
    (note 1417 1917 1418 226 [bb 226] NOTE_INSN_BASIC_BLOCK)
    (insn 1418 1417 1905 226 (set (reg:SI 690 [ _1517 ])
            (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
                (const_int -3 [0xfffffffffffffffd]))) 95 {*addsi3_aarch64}
         (nil))

    ;;  succ:       237 [100.0%]  (FALLTHRU)

THEN BLOCK (227)

    ;; basic block 227, loop depth 4, count 0, freq 3013, maybe hot
    ;;  prev block 226, next block 228, flags: (REACHABLE, RTL, MODIFIED)
    ;;  pred:       309 [95.6%] 

    (code_label 1905 1418 1421 227 140 (nil) [1 uses])
    (note 1421 1905 1422 227 [bb 227] NOTE_INSN_BASIC_BLOCK)
    (insn 1422 1421 1802 227 (set (reg:SI 690 [ _1517 ])
            (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
                (const_int -3 [0xfffffffffffffffd]))) 95 {*addsi3_aarch64}
         (nil))

    ;;  succ:       237 [100.0%]  (FALLTHRU)

So the interesting thing is that the THEN block and the ELSE block are as good
as identical! Both compute (plus (reg 703) (const_int -3)) and both fall
through to block 237.

The normal if-convert machinery won't catch this because basic block 226 (the
ELSE block) has multiple predecessors. But the transformation we make through
find_if_case_2 ends up looking silly! (again, snipping some unrelated
details/insns):

TEST BLOCK (279)

    ;; basic block 279, loop depth 4, count 0, freq 3153, maybe hot
    ;;  prev block 278, next block 280, flags: (REACHABLE, RTL, MODIFIED)
    ;;  pred:       203 [98.0%] 
    <snip some insns>

    /* Unconditional copy of THEN BLOCK.  */

    (insn 1422 1914 1915 279 (set (reg:SI 690 [ _1517 ])
            (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
                (const_int -3 [0xfffffffffffffffd]))) 95 {*addsi3_aarch64}
         (nil))
    (insn 1915 1422 1916 279 (set (reg:CC 66 cc)
            (compare:CC (reg:SI 1117)
              (const_int 0 [0]))) "foo.cpp":59 391 {cmpsi}
         (expr_list:REG_DEAD (reg:SI 1117)
            (nil)))
    (jump_insn 1916 1915 1922 279 (set (pc)
            (if_then_else (ne (reg:CC 66 cc)
                    (const_int 0 [0]))
                (label_ref:DI 1470)
                (pc))) "foo.cpp":59 9 {condjump}
         (expr_list:REG_DEAD (reg:CC 66 cc)
            (int_list:REG_BR_PROB 9558 (nil)))
     -> 1470)
    ;;  succ:       218 [95.6%] 
    ;;              209 [4.4%]  (FALLTHRU)

ELSE BLOCK (209):

    ;; basic block 209, loop depth 4, count 0, freq 201, maybe hot
    ;;  prev block 208, next block 210, flags: (REACHABLE, RTL, MODIFIED)
    ;;  pred:       279 [4.4%]  (FALLTHRU)
    ;;              280 [3.8%]  (FALLTHRU)

    (code_label 1917 1413 1417 209 141 (nil) [0 uses])
    (note 1417 1917 1418 209 [bb 209] NOTE_INSN_BASIC_BLOCK)
    (insn 1418 1417 1802 209 (set (reg:SI 690 [ _1517 ])
            (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ])
                (const_int -3 [0xfffffffffffffffd]))) 95 {*addsi3_aarch64}
         (nil))
    ;;  succ:       218 [100.0%]  (FALLTHRU)

Note that if we are on the "else" path, we now we compute (plus (reg 703)
(const_int -3)) twice! The missed optimisation is that this should be an
unconditional branch! There is probably a route to getting your performance
back by improving ifcvt to handle this case.

Reply via email to