Hello,

I have been examining a significant performance regression from 4.4 to 4.5 in our port. I found that Partial Redundancy Elimination (PRE), introduced in 4.5, causes the issue. The following pseudo code explains the problem:

Before PRE:

    BB 3:
      r118 <- r114 + 2
    BB 4:
      r114 <- r114 + 2
      ...
      conditional jump to BB 4

After PRE:

    BB 3:
      r123 <- r114 + 2
      r118 <- r123
    BB 4:
      r114 <- r123
      conditional jump to BB 5
    BB 5:
      r123 <- r114 + 2
      jump to BB 4

A simple loop (BB 4) is split into two basic blocks (BB 4 and BB 5), and an extra jump instruction is introduced. On some targets this jump can be removed by the bb-reorder pass. On our target the blocks cannot be reordered because of the complex doloop_end pattern we generate later. Additionally, since bb-reorder runs in a very late phase, the code misses some optimization opportunities such as auto_inc_dec.

I don't see any benefit in doing PRE like this. Maybe we should exclude such cases in the first place? I read the relevant text in "Advanced Compiler Design and Implementation"; the example used there has a linear CFG, and it doesn't mention how to handle the loop case.

Any suggestions are greatly appreciated.

Thanks,
Bingfeng Mei
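
For anyone who wants to check the two loop shapes in the pseudo code above, here is a minimal Python sketch that simulates them. It is not GCC code. The function names, the `trips` parameter (a made-up trip count standing in for the elided loop condition) and the way jumps are counted are all assumptions for illustration only.

```python
def run_before(r114, trips):
    """Loop shape before PRE: BB 3 falls through into a one-block loop (BB 4).

    `trips` is a hypothetical trip count (>= 1) replacing the elided exit test.
    Returns (r114, r118, number of jump instructions executed).
    """
    jumps = 0
    r118 = r114 + 2          # BB 3
    i = 0
    while True:
        r114 = r114 + 2      # BB 4
        i += 1
        jumps += 1           # conditional jump to BB 4
        if i >= trips:
            break
    return r114, r118, jumps


def run_after(r114, trips):
    """Loop shape after PRE: the loop body is split across BB 4 and BB 5."""
    jumps = 0
    r123 = r114 + 2          # BB 3
    r118 = r123
    i = 0
    while True:
        r114 = r123          # BB 4
        i += 1
        jumps += 1           # conditional jump to BB 5
        if i >= trips:
            break
        r123 = r114 + 2      # BB 5
        jumps += 1           # unconditional jump to BB 4
    return r114, r118, jumps


if __name__ == "__main__":
    for trips in (1, 3, 10):
        print(trips, run_before(10, trips), run_after(10, trips))
```

Under these assumptions both versions compute the same r114 and r118, but the post-PRE shape executes one additional unconditional jump for every iteration that stays in the loop, which is the overhead described above when bb-reorder cannot merge the blocks again.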