https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534

--- Comment #7 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
I'm not sure there are any bugs here to fix, though I can still reproduce the
performance differences.

First up, basic block reordering causes an issue on every microarchitecture
I've looked at. Basic block reordering is kicking in because the
static estimates of the execution profile make it look like a good idea. I'm
struggling to understand exactly what the execution profile of the testcase is
intended to be, as I'm finding both the source and the generated assembly/perf
reports hard to follow. Because I'm struggling to follow it, I can't tell if
the basic block reorganisation is sensible, but it doesn't look buggy.

Turning basic block reordering off (with -fno-reorder-blocks) removes the
performance difference for me: with it off, the builds before and after
r245151 have similar performance on Cortex-A53 and Cortex-A72.

However, Cortex-A57 still shows a performance regression, which I believe is
related to an extra conditional branch in the code after r245151. I tried to
find which pass previously removed this branch and narrowed it down to jump2,
but I haven't figured out why there is such a change in jump2.

I'm on vacation now, so I won't be able to look at this for the next week, in
case anyone else wants to dig.
