On 11/23/19 5:34 PM, Segher Boessenkool wrote:
Hi!
On Mon, Nov 18, 2019 at 05:55:13PM +0000, Richard Sandiford wrote:
Richard Sandiford <richard.sandif...@arm.com> writes:
(It's 23:35 local time, so it's still just about stage 1. :-))
Or actually, just under 1 day after end of stage 1. Oops.
Could have sworn stage 1 ended on the 17th :-( Only realised
I'd got it wrong when catching up on Saturday's email traffic.
And inevitably, I introduced a couple of stupid mistakes while
trying to clean the patch up for submission by that (non-)deadline.
Here's a version that fixes an inverted overlapping memref check
and that correctly prunes the use list for combined instructions.
(This last one is just a compile-time saving -- the old code was
correct, just suboptimal.)
I've built the Linux kernel with the previous version, as well as this
one. R0 is unmodified GCC, R1 is the first patch, R2 is this one:
(I've forced --param=run-combine=6 for R1 and R2):
(Percentages are relative to R0):
                    R0        R1        R2        R1        R2
alpha          6107088   6101088   6101088   99.902%   99.902%
arc            4008224   4006568   4006568   99.959%   99.959%
arm            9206728   9200936   9201000   99.937%   99.938%
arm64         13056174  13018174  13018194   99.709%   99.709%
armhf                0         0         0         0         0
c6x            2337237   2337077   2337077   99.993%   99.993%
csky           3356602         0         0         0         0
h8300          1166996   1166776   1166776   99.981%   99.981%
i386          11352159         0         0         0         0
ia64          18230640  18167000  18167000   99.651%   99.651%
m68k           3714271         0         0         0         0
microblaze     4982749   4979945   4979945   99.944%   99.944%
mips           8499309   8495205   8495205   99.952%   99.952%
mips64         7042036   7039816   7039816   99.968%   99.968%
nds32          4486663         0         0         0         0
nios2          3680001   3679417   3679417   99.984%   99.984%
openrisc       4226076   4225868   4225868   99.995%   99.995%
parisc         7681895   7680063   7680063   99.976%   99.976%
parisc64       8677077   8676581   8676581   99.994%   99.994%
powerpc       10687611  10682199  10682199   99.949%   99.949%
powerpc64     17671082  17658570  17658570   99.929%   99.929%
powerpc64le   17671082  17658570  17658570   99.929%   99.929%
riscv32        1554938   1554758   1554758   99.988%   99.988%
riscv64        6634342   6632788   6632788   99.977%   99.977%
s390          13049643  13014939  13014939   99.734%   99.734%
sh             3254743         0         0         0         0
shnommu        1632364   1632124   1632124   99.985%   99.985%
sparc          4404993   4399593   4399593   99.877%   99.877%
sparc64        6796711   6797491   6797491  100.011%  100.011%
x86_64        19713174  19712817  19712817   99.998%   99.998%
xtensa               0         0         0         0         0
0 means it didn't build.
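
For anyone who wants to re-derive the percentage columns, here is a minimal Python sketch. The function name relative_size is made up for illustration, and treating a size of 0 as "didn't build" simply follows the convention stated above; the sizes are copied from the R0 and R2 columns of the table.

```python
def relative_size(baseline, patched):
    """Return the patched size as a percentage of the baseline.

    Returns None when either build failed (recorded as size 0).
    """
    if baseline == 0 or patched == 0:
        return None
    return 100.0 * patched / baseline


# (R0, R2) sizes copied from the table; only a few rows, for illustration.
sizes = {
    "arm64": (13056174, 13018194),
    "csky": (3356602, 0),
    "sparc64": (6796711, 6797491),
}

for arch, (r0, r2) in sizes.items():
    pct = relative_size(r0, r2)
    print(arch, "didn't build" if pct is None else f"{pct:.3f}%")
```

Values below 100% mean the patched compiler produced a smaller kernel than unmodified GCC.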
armhf is probably my own problem, not sure yet.
xtensa starts with
/tmp/ccmJoY7l.s: Assembler messages:
/tmp/ccmJoY7l.s:407: Error: cannot represent `BFD_RELOC_8' relocation in object
file
and it doesn't get better.
My powerpc64 config actually built the powerpc64le config, since for a
while now the kernel has looked at what the host system is when picking
its defconfig. Oh well, fixed now.
There are five new failures, with either of the combine2 patches. And
all five are actually different (different symptoms, at least):
- csky fails on libgcc build:
/home/segher/src/gcc/libgcc/fp-bit.c: In function '__fixdfsi':
/home/segher/src/gcc/libgcc/fp-bit.c:1405:1: error: unable to generate reloads
for:
1405 | }
| ^
(insn 199 86 87 8 (parallel [
(set (reg:SI 101)
(plus:SI (reg:SI 98)
(const_int -32 [0xffffffffffffffe0])))
(set (reg:CC 33 c)
(lt:CC (plus:SI (reg:SI 98)
(const_int -32 [0xffffffffffffffe0]))
(const_int 0 [0])))
]) "/home/segher/src/gcc/libgcc/fp-bit.c":1403:23 207 {*cskyv2_declt}
(nil))
during RTL pass: reload
Target problem?
- i386 goes into an infinite loop compiling, or at least an hour or so...
Erm, I forgot to record what it was compiling. I did attach a GDB... It
is something from lra_create_live_ranges.
- m68k:
/home/segher/src/kernel/fs/exec.c: In function 'copy_strings':
/home/segher/src/kernel/fs/exec.c:590:1: internal compiler error: in
final_scan_insn_1, at final.c:3048
590 | }
| ^
0x10408307 final_scan_insn_1
/home/segher/src/gcc/gcc/final.c:3048
0x10408383 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
/home/segher/src/gcc/gcc/final.c:3152
0x10408797 final_1
/home/segher/src/gcc/gcc/final.c:2020
0x104091f7 rest_of_handle_final
/home/segher/src/gcc/gcc/final.c:4658
0x104091f7 execute
/home/segher/src/gcc/gcc/final.c:4736
and that line is
gcc_assert (prev_nonnote_insn (insn) == last_ignored_compare);
- nds32:
/tmp/ccC8Czca.s: Assembler messages:
/tmp/ccC8Czca.s:3144: Error: Unrecognized operand/register, lmw.bi
[$fp+(-60)],[$fp],$r11,0x0.
/tmp/ccl8o20c.s: Assembler messages:
/tmp/ccl8o20c.s:2449: Error: Unrecognized operand/register, lmw.bi
$r9,[$fp],[$fp+(-132)],0x0.
/tmp/ccZxjwHd.s: Assembler messages:
/tmp/ccZxjwHd.s:4776: Error: Unrecognized operand/register, lmw.bi
[$fp+(-52)],[$fp],[$fp+(-56)],0x0.
/tmp/cczjOS3d.s: Assembler messages:
/tmp/cczjOS3d.s:2336: Error: Unrecognized operand/register, lmw.bi
$r16,[$fp],$r7,0x0.
and more. All lmw.bi... target issue?
- sh (that's sh4-linux):
/home/segher/src/kernel/net/ipv4/af_inet.c: In function 'snmp_get_cpu_field':
/home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find a
register to spill in class 'R0_REGS'
1638 | }
| ^
/home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the insn:
(insn 18 17 19 2 (set (reg:SI 0 r0)
(mem:SI (plus:SI (reg:SI 4 r4 [178])
(reg:SI 6 r6 [171])) [17 *_3+0 S4 A32]))
"/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
(expr_list:REG_DEAD (reg:SI 4 r4 [178])
(expr_list:REG_DEAD (reg:SI 6 r6 [171])
(nil))))
/home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier errors,
bailing out
Looking at just binary size, which is a good stand-in for how many insns
it combined:
               R2
arm64     99.709%
ia64      99.651%
s390      99.734%
sparc     99.877%
sparc64  100.011%
(These are those that are not between 99.9% and 100.0%).
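
The selection above can be reproduced with a short Python sketch. The helper name outside_band and its default thresholds are assumptions chosen to match the 99.9% to 100.0% band mentioned here; the sizes are the R0 and R2 columns from the earlier table, restricted to a handful of architectures.

```python
def outside_band(sizes, low=99.9, high=100.0):
    """Return {arch: percentage} for builds whose R2/R0 ratio is outside [low, high].

    Architectures with a size of 0 (failed builds) are skipped.
    """
    result = {}
    for arch, (r0, r2) in sizes.items():
        if r0 == 0 or r2 == 0:
            continue  # didn't build, nothing to compare
        pct = 100.0 * r2 / r0
        if not (low <= pct <= high):
            result[arch] = round(pct, 3)
    return result


# (R0, R2) sizes copied from the table; a subset, for illustration.
sizes = {
    "alpha": (6107088, 6101088),
    "arm64": (13056174, 13018194),
    "csky": (3356602, 0),
    "ia64": (18230640, 18167000),
    "s390": (13049643, 13014939),
    "sparc": (4404993, 4399593),
    "sparc64": (6796711, 6797491),
    "x86_64": (19713174, 19712817),
}

for arch, pct in sorted(outside_band(sizes).items()):
    print(f"{arch:10s} {pct:.3f}%")
```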
So only sparc64 regressed, and just a tiny bit (I can look at what that
is, if there is interest). But 32-bit sparc improved, and s390, arm64,
and ia64 got actual benefit.
Again this is just code size, not analysing the actually changed code.
I did look at the powerpc64le changes. It is almost completely load-
with-update (and store-with-update) insns that make the difference, but
there are also some dot insns. The extra mr. are usually not a good
idea, but the extsw. are. Sometimes this causes *more* insns in the end
(register move insns), but that is the exception.
This mr. problem is there with combine already, btw. In the end it is
caused by this just not being something good to do on pseudos; it would
be better to do this after RA, in a peephole or similar. OTOH it isn't
actually really important for performance either way.
Btw, does the new pass use TARGET_LEGITIMATE_COMBINED_INSN? It probably
should. (That would be the hook where we would probably want to prevent
generating mr. insns).
Segher
Segher,
Please just CC me on this conversation, as I keep getting removed.
Thanks,
Nick