On 11/15/14 14:37, Matthew Fortune wrote:
Eric Botcazou <ebotca...@adacore.com> writes:
IIRC, fill_eager and its related friends are all speculative in some
way, and aren't those precisely the ones that are causing us problems?
Also note we have backends working around this stuff in fairly blunt ways:
I'd say that the PA back-end went a bit too far here, especially if it
marks some insns of the epilogue as frame-related. dwarf2cfi.c has
special code to handle delay slots (SEQUENCEs) so it's not an all-or-
nothing game.
Given the architectural difficulties of delay slots on modern processors,
would it be that painful to just not allow filling slots with frame
insns and let dbr try to find something else or drop in a nop? I
wouldn't be all that surprised if there wasn't a measurable
performance difference on something like a modern Sparc.
Yes, modern SPARCs have (short) branches without delay slots. But the
other big contender is MIPS here and the story might be different for
it.
MIPSr6 introduces 'compact' branches which do not have delay slots.
So the issues of filling delay slots will be less important from R6
onwards. However, delay slots remain important for now.
I haven't thought about the problem much but instinctively I'd be surprised
if a blanket restriction on frame-related instructions would lead to lots
of NOPs in delay slots.
Possibly. I'd be surprised if frame-related stuff is used that often
for filling slots... Combine that with the decrease in importance for
filling delay slots when they exist, I wouldn't be terribly surprised if
nobody could actually measure the change if we were to make it.
The PA port may have gone too far, but it's certainly conservatively
correct and on every PA processor that "matters" (for a very liberal
definition of matters), I doubt the difference is measurable due to the
depth of the reorder buffers and the fact that a nop can retire anytime
that's convenient.
Jeff