Hello, 
I have been examining a significant performance regression in our port
between 4.4 and 4.5. I found that the Partial Redundancy Elimination
(PRE) introduced in 4.5 causes it. The following pseudocode illustrates
the problem:

Before PRE

BB 3:
r118 <-  r114 + 2

BB 4:
r114 <-  r114 + 2
...
conditional jump to BB 4

After PRE

BB 3: 
r123 <-  r114 + 2
r118 <-  r123

BB 4:
r114 <- r123
conditional jump to BB 5

BB 5:
r123 <- r114 + 2
jump to BB 4
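To make the control flow concrete, here is a rough C model of the two
shapes. It is only a sketch: the elided loop body ("...") and the loop
condition are replaced by a hypothetical trip counter, and the register
names are reused as C variables. Both versions compute the same values,
but after PRE every iteration passes through BB 5 and its unconditional
jump back to BB 4.

```c
#include <assert.h>

/* Before PRE: BB 3 computes r114 + 2 into r118, and BB 4 recomputes
   r114 + 2 on every iteration. "trips" is a hypothetical stand-in for
   the real (elided) loop condition. */
static int before_pre(int r114, int trips, int *r118_out)
{
    assert(trips >= 1);             /* BB 4 is always entered once */
    int r118 = r114 + 2;            /* BB 3 */
    do {
        r114 = r114 + 2;            /* BB 4 */
    } while (--trips > 0);          /* conditional jump to BB 4 */
    *r118_out = r118;
    return r114;
}

/* After PRE: the add is done once in BB 3 and then on the back edge,
   which has become a separate block (BB 5) with its own jump. */
static int after_pre(int r114, int trips, int *r118_out)
{
    assert(trips >= 1);
    int r123 = r114 + 2;            /* BB 3 */
    int r118 = r123;
bb4:
    r114 = r123;                    /* BB 4: a copy instead of the add */
    if (--trips > 0)
        goto bb5;                   /* conditional jump to BB 5 */
    *r118_out = r118;
    return r114;
bb5:
    r123 = r114 + 2;                /* BB 5: the split back edge */
    goto bb4;                       /* the extra unconditional jump */
}
```

For example, with r114 = 10 and five iterations, both functions return
20 and leave 12 in r118.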


A simple loop (BB 4) is divided into two basic blocks (BB 4 and BB 5),
and an extra jump instruction is introduced. On some targets, this
jump can be removed by the bb-reorder pass. On our target, it cannot,
because the complex doloop_end pattern we generate later prevents the
blocks from being reordered. Additionally, since bb-reorder runs very
late, the split loop is still present during earlier passes, and the
code misses some optimization opportunities such as auto_inc_dec. I
don't see any benefit in doing PRE like this. Maybe we should exclude
such cases in the first place? I read the relevant text in "Advanced
Compiler Design and Implementation", but the example used there is a
linear CFG, and it doesn't mention how to handle the loop case.

Any suggestion is greatly appreciated. 

Thanks,
Bingfeng Mei