On Wed, Sep 29, 2010 at 2:16 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> Hello,
> I have been examining a significant performance regression
> between 4.5 and 4.4 in our port. I found that Partial Redundancy
> Elimination introduced in 4.5 causes the issue. The following
> pseudo code explains the problem:
>
> BB 3:
> r118 <-  r114 + 2
>
> BB 4:
> r114 <-  r114 + 2
> ...
> Conditional jump to BB 4
>
> After PRE
>
> BB 3:
> r123 <-  r114 + 2
> r118 <-  r123
>
> BB 4:
> r114 <- r123
> conditional jump to BB 5
>
> BB 5:
> r123 <- r114 + 2
> jump to BB 4
>
>
> A simple loop (BB 4) is divided into two basic blocks (BB 4 & 5).
> An extra jump instruction is introduced. On some targets, this
> jump can be removed by the bb-reorder pass. On our target, it cannot
> be reordered due to the complex doloop_end pattern we generate later.
> Additionally, since bb-reorder runs in a very late phase, the code
> misses some optimization opportunities such as auto_inc_dec. I don't
> see any benefit in doing PRE like this here. Maybe we should exclude
> such cases in the first place? I read the relevant text in
> "Advanced Compiler Design and Implementation"; the example used is a
> linear CFG and it doesn't mention how to handle the loop case.

PRE basically sinks the computation into the latch block, creating
that block if it does not exist yet.  Note that without a testcase it's
hard to tell whether this is ok in general.  PRE tries to avoid generating
new induction variables and cross-iteration data dependences; see
insert_into_preds_of_block.
Note that 4.4 in principle performs the same optimization.  You might
find that PRE is generally disabled at -Os in 4.4, whereas 4.5 enables
it at -Os, but only for hot execution traces, following the existing
practice of tuning code size versus performance on a fine-grained basis.
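
For illustration only, here is a source-level sketch of the shape in the
quoted pseudo code. It is not a reduced testcase from the port and not
actual compiler output. The function names, the buffer store and the step
counter are hypothetical; the register names in the comments map back to
the pseudo code.

```c
/* Hypothetical model of the loop before PRE.
   i plays the role of r114, first the role of r118. */
int walk_before(int *buf, int i, int limit)
{
    int first = i + 2;          /* BB 3: r118 <- r114 + 2 */
    int steps = 0;
    do {                        /* BB 4 */
        i = i + 2;              /* r114 <- r114 + 2 */
        buf[i] = first;
        steps++;
    } while (i < limit);        /* conditional jump to BB 4 */
    return steps;
}

/* The same loop with the computation moved into a latch block.
   t plays the role of r123. */
int walk_after_pre(int *buf, int i, int limit)
{
    int t = i + 2;              /* BB 3: r123 <- r114 + 2 */
    int first = t;              /*       r118 <- r123     */
    int steps = 0;
    for (;;) {
        i = t;                  /* BB 4: r114 <- r123 */
        buf[i] = first;
        steps++;
        if (!(i < limit))       /* loop exit test */
            break;
        t = i + 2;              /* BB 5 (latch): r123 <- r114 + 2 */
    }                           /* jump to BB 4 */
    return steps;
}
```

Both functions perform the same stores and return the same count; the
second form only makes the extra latch block and its jump visible.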

Richard.

> Any suggestion is greatly appreciated.
>
> Thanks,
> Bingfeng Mei
