On 04/19/2013 03:53 PM, Steven Bosscher wrote:
On Thu, Apr 18, 2013 at 6:22 AM, Jeff Law wrote:
On 04/17/2013 03:52 PM, Steven Bosscher wrote:
First of all: What is still important to handle?
It's clear that the expectations in reorg.c are "anything goes" but
modern RISCs (everything since the PA-8000, say) probably have some
limitations on what is helpful to have, or not have, in a delay slot.
According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
jumps in delay slots of other jumps is one such thing: They don't
bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
looks like it doesn't allow it either, and CRIS can do it for short
branches but doesn't do so because the trade-off between benefit and
machine description complexity comes out negative.
Note that sparc and/or mips might use the adjust-the-return-pointer trick.
I know it wasn't my idea when I added it to the PA.
After further research, it was the m88k I took the idea from -- 20 years
ago this summer.
This shouldn't be very difficult to support if the target models this
as a jump in the delay slot of calls only. I can let the delay slot
filler allow jumps in delay slots of calls but not in delay slots of
other jumps. But for the moment I'm going to ignore this case unless
someone knows a target in the FSF tree that would benefit from it.
I'd say drop it given we now know the only other architecture that was
supporting it is also dead.
So I collected some stats myself, for a small number (31) of files of gcc
itself, mostly from libcpp and various generator files, compiled at
-O2 for sparc64:
              pass 1                   pass 2
total    simple  eager   skip     simple  eager   skip
insns      9743   3488     22       1297    525      0
filled     5918   2980     22         21      0      0
hit%        61%    31%     0%         0%     0%     0%

total    pass 1  pass 2
insns      9743    1297
filled     8920      21
hit%        92%      2%
Seems like reasonable numbers. I can't say I recall fill slot statistics
from the past, but those are in line with what I'd expect.
So the first fill_simple_delay_slots pass fills ~60% of the slots, and
the first fill_eager_delay_slots fills another ~30%. The second pass
is not very effective.
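
As a quick sanity check on the arithmetic behind those figures, here is a minimal Python sketch that recomputes the hit rates from the counts quoted above. The constant and function names are made up for illustration. The denominators are an assumption inferred from the numbers, not something stated in the thread: the per-strategy percentages only come out as quoted when taken relative to the 9743 pass 1 total, and the pass 2 summary rate relative to 1297.

```python
# Counts copied from the tables above (31 files, -O2, sparc64).
TOTAL_PASS1 = 9743  # assumed denominator for all pass 1 percentages
TOTAL_PASS2 = 1297  # assumed denominator for the pass 2 summary percentage

# Slots filled by each strategy, per pass.
PASS1_FILLED = {"simple": 5918, "eager": 2980, "skip": 22}
PASS2_FILLED = {"simple": 21, "eager": 0, "skip": 0}


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def summarize():
    """Recompute per-strategy and per-pass hit rates from the raw counts."""
    pass1_filled = sum(PASS1_FILLED.values())
    pass2_filled = sum(PASS2_FILLED.values())
    return {
        "pass1_by_strategy": {
            name: pct(count, TOTAL_PASS1) for name, count in PASS1_FILLED.items()
        },
        "pass1_filled": pass1_filled,
        "pass1_hit": pct(pass1_filled, TOTAL_PASS1),
        "pass2_filled": pass2_filled,
        "pass2_hit": pct(pass2_filled, TOTAL_PASS2),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Under those assumptions the three pass 1 fill counts sum to the 8920 shown in the summary table, and the rounded percentages match the quoted ones.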
Certainly doesn't look terribly effective. One could certainly ask
whether it's worth running at all, or whether we would be better off
having relax_delay_slots record things that are worth a second look.
Also note that fill_eager and optimize_skip do nothing useful in your
test during the 2nd pass.
The 60% number also tells me there'd be a lot to be gained by using the
scheduler's dependency information to drive filling. We'd end up
looking at far fewer insns.
Jeff