On 28/11/2023 12:52, Surya Kumari Jangala wrote:
Hi Richard,
Thanks a lot for your response!
Another failure reported by the Linaro CI is as follows :
(Note: I am planning to send a separate mail for each failure, as this will make
the discussion easy to track)
FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve
-moverride=tune=none check-function-bodies dup_x0_m
Expected code:
...
add (x[0-9]+), x0, #?1
mov (p[0-7])\.b, p15\.b
mov z0\.d, \2/m, \1
...
ret
Code obtained w/o patch:
addvl sp, sp, #-1
str p15, [sp]
add x0, x0, 1
mov p3.b, p15.b
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl sp, sp, #1
ret
Code obtained w/ patch:
addvl sp, sp, #-1
str p15, [sp]
mov p3.b, p15.b
add x0, x0, 1
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl sp, sp, #1
ret
As we can see, with the patch, the following two instructions are interchanged:
add x0, x0, 1
mov p3.b, p15.b
Indeed, both look acceptable results to me, especially given that we
don't schedule results at -O1.
There's two ways of fixing this:
1) Simply swap the order to what the compiler currently generates (which
is a little fragile, since it might flip back someday).
2) Write the test as
** (
** add (x[0-9]+), x0, #?1
** mov (p[0-7])\.b, p15\.b
** mov z0\.d, \2/m, \1
** |
** mov (p[0-7])\.b, p15\.b
** add (x[0-9]+), x0, #?1
** mov z0\.d, \1/m, \2
** )
Note, we need to swap the match names in the third insn to account for
the different order of the earlier instructions.
Neither is ideal, but the second is perhaps a little more bomb proof.
I don't really have a strong feeling either way, but perhaps the second
is slightly preferable.
Richard S: thoughts?
R.
I believe that this is fine and the test can be modified to allow it to pass on
aarch64. Please let me know what you think.
Regards,
Surya
On 24/11/23 4:18 pm, Richard Earnshaw wrote:
On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
Hi Richard,
Ping. Please let me know if the test failure that I mentioned in the mail below
can be handled by changing the expected generated code. I am not conversant
with arm, and hence would appreciate your help.
Regards,
Surya
On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
Hi Richard,
I had submitted a patch for review
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
regarding scaling save/restore costs of callee save registers with block
frequency in the IRA pass (PR111673).
This patch has been approved by VMakarov
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
With this patch, we are seeing performance improvements with spec on x86
(exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
I received a mail from Linaro about some failures seen in the CI pipeline with
this patch. I have analyzed the failures and I wish to discuss the analysis
with you.
One failure reported by the Linaro CI is:
FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+,
\\[r[0-9]+\\] 2
The diff in the assembly between trunk and patch is:
93c93
< push {r4, r5}
---
push {fp}
95c95
< ldrexd r4, r5, [r0]
---
ldrexd fp, ip, [r0]
99c99
< pop {r4, r5}
---
ldr fp, [sp], #4
The test fails with patch because the ldrexd insn uses fp & ip registers instead
of r[0-9]+
But the code produced by patch is better because it is pushing and restoring
only
one register (fp) instead of two registers (r4, r5). Hence, this test can be
modified to allow it to pass on arm. Please let me know what you think.
If you need more information, please let me know. I will be sending separate
mails
for the other test failures.
Thanks for looking at this.
The key part of this test is that the compiler generates LDREXD. The registers used for
that are pretty much irrelevant as we don't match them to any other operations within the
test. So I'd recommend just testing for the mnemonic and not for any of the operands (ie
just match "ldrexd\t").
R.
Regards,
Surya