On 11/19/23 19:23, Xi Ruoyao wrote:
On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
This is work originally started by Joern @ Embecosm.
There's been a long standing sense that we're generating too many
sign/zero extensions on the RISC-V port. REE is useful, but it's really
focused on a relatively narrow part of the extension problem.
What Joern's patch does is introduce a new pass which tracks liveness of
chunks of pseudo regs. Specifically it tracks bits 0..7, 8..15, 16..31
and 32..63.
If it encounters a sign/zero extend that sets bits that are never read,
then it replaces the sign/zero extension with a narrowing subreg. The
narrowing subreg usually gets eliminated by subsequent passes (it's just
a copy after all).
Jivan has done some analysis and found that it eliminates roughly 1% of
the dynamic instruction stream for x264 as well as some redundant
extensions in the coremark benchmark (both on rv64). In my own testing
as I worked through issues on other architectures I clearly saw it
helping in various places within GCC itself or in the testsuite.
The basic structure is to first do a fairly standard liveness analysis
on the chunks, seeding original state with the liveness data from DF.
Once that's stable, we do a final pass to identify the useless
extensions and transform them into narrowing subregs.
A few key points to remember.
For destination processing it is always safe to ignore a destination.
Ignoring a destination merely means that whatever was live after the
given insn will continue to be live before the insn. What is not safe
is to clear a bit in the LIVENOW bitmap for a destination chunk that is
not set. This comes into play with things like STRICT_LOW_PART.
For source processing the safe thing to do is to set all the chunks in a
register as live. It is never safe to fail to process a source operand.
When a destination object is not fully live, we try to transfer that
limited liveness to the source operands. So for example if bits 16..63
are dead in a destination of a PLUS, we need not mark bits 16..63 as
live for the source operands. We have to be careful -- consider a shift
count on a target without SHIFT_COUNT_TRUNCATED set. So we have both a
list of RTL codes where we can transfer liveness and a few codes where
one of the operands may need to be fully live (ex, a shift count) while
the other input may not need to be fully live (value left shifted).
Locally we have had this enabled at -O1 and above to encourage testing,
but I'm thinking that for the trunk enabling at -O2 and above is the
right thing to do.
This has (of course) been tested on rv64. It's also been bootstrapped
and regression tested on x86. Bootstrap and regression tested (C only)
for m68k, sh4, sh4eb, alpha. Earlier versions were also bootstrapped
and regression tested on ppc, hppa and s390x (C only for those as well).
It's also been tested on the various crosses in my tester. So we've
got reasonable coverage of 16, 32 and 64 bit targets, big and little
endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other
oddities.
The included tests are for RISC-V only because not all targets are going
to have extraneous extensions. There's tests from coremark, x264 and
GCC's bz database. It probably wouldn't be hard to add aarch64
testscases. The BZs listed are improved by this patch for aarch64.
Given the amount of work Jivan and I have done, I'm not comfortable
self-approving at this time. I'd much rather have another set of eyes
on the code. Hopefully the code is documented well enough for that to
be useful exercise.
So, no need to work from Pago Pago for this patch. I may make another
attempt at the eswin conditional move work while working virtually in
Pago Pago though.
Thoughts, comments, recommendations?
Unfortunately, I get some ICE building stage 1 libgcc with this patch on
loongarch64-linux-gnu:
during RTL pass: ext_dce
../../../gcc/libgcc/libgcc2.c: In function ‘__absvdi2’:
../../../gcc/libgcc/libgcc2.c:224:1: internal compiler error: Segmentation fault
224 | }
| ^
0x120baa477 crash_signal
../../gcc/gcc/toplev.cc:316
0x1216aeeb4 ext_dce_process_sets
../../gcc/gcc/ext-dce.cc:128
0x1216afbaf ext_dce_process_bb
../../gcc/gcc/ext-dce.cc:647
0x1216afbaf ext_dce
../../gcc/gcc/ext-dce.cc:802
0x1216afbaf execute
../../gcc/gcc/ext-dce.cc:868
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
I think I know what's going on here.
jeff