Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

Richard Earnshaw via Gcc Tue, 28 Nov 2023 05:49:39 -0800



On 28/11/2023 12:52, Surya Kumari Jangala wrote:

Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows :
(Note: I am planning to send a separate mail for each failure, as this will make
the discussion easy to track)

FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none  check-function-bodies dup_x0_m

Expected code:

       ...
       add     (x[0-9]+), x0, #?1
       mov     (p[0-7])\.b, p15\.b
       mov     z0\.d, \2/m, \1
       ...
       ret


Code obtained w/o patch:
         addvl   sp, sp, #-1
         str     p15, [sp]
         add     x0, x0, 1
         mov     p3.b, p15.b
         mov     z0.d, p3/m, x0
         ldr     p15, [sp]
         addvl   sp, sp, #1
         ret

Code obtained w/ patch:
         addvl   sp, sp, #-1
         str     p15, [sp]
         mov     p3.b, p15.b
         add     x0, x0, 1
         mov     z0.d, p3/m, x0
         ldr     p15, [sp]
         addvl   sp, sp, #1
         ret

As we can see, with the patch, the following two instructions are interchanged:
         add     x0, x0, 1
         mov     p3.b, p15.b

Indeed, both look like acceptable results to me, especially given that we don't schedule results at -O1.


There are two ways of fixing this:

1) Simply swap the order to what the compiler currently generates (which is a little fragile, since it might flip back someday).

2) Write the test as


** (
**       add     (x[0-9]+), x0, #?1
**       mov     (p[0-7])\.b, p15\.b
**       mov     z0\.d, \2/m, \1
** |
**       mov     (p[0-7])\.b, p15\.b
**       add     (x[0-9]+), x0, #?1
**       mov     z0\.d, \1/m, \2
** )

Note, we need to swap the match names in the third insn to account for the different order of the earlier instructions.


Neither is ideal, but the second is perhaps a little more bomb proof.

I don't really have a strong feeling either way, but perhaps the second is slightly preferable.
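
A minimal sketch of how the alternation in option 2 behaves, using Python's re module as a stand-in for the testsuite matcher (an assumption: the real check-function-bodies machinery is Tcl, and the tab-separated sample strings below are reconstructed from the listings in this thread). One thing the sketch makes visible: Python numbers capture groups left to right across the whole pattern, so the second branch has to refer to groups 3 and 4. Whether the DejaGnu matcher numbers them the same way is worth checking before committing the test.

```python
import re

# The three relevant instructions in both orders, as quoted in this thread.
# Tabs between mnemonic and operands are an assumption about the real output.
WITHOUT_PATCH = (
    "\tadd\tx0, x0, 1\n"
    "\tmov\tp3.b, p15.b\n"
    "\tmov\tz0.d, p3/m, x0\n"
)
WITH_PATCH = (
    "\tmov\tp3.b, p15.b\n"
    "\tadd\tx0, x0, 1\n"
    "\tmov\tz0.d, p3/m, x0\n"
)

# Two alternatives inside a non-capturing group. In Python the captures of
# the second branch are groups 3 and 4, not 1 and 2.
PATTERN = re.compile(
    r"(?:"
    r"\tadd\t(x[0-9]+), x0, #?1\n"
    r"\tmov\t(p[0-7])\.b, p15\.b\n"
    r"\tmov\tz0\.d, \2/m, \1\n"
    r"|"
    r"\tmov\t(p[0-7])\.b, p15\.b\n"
    r"\tadd\t(x[0-9]+), x0, #?1\n"
    r"\tmov\tz0\.d, \3/m, \4\n"
    r")"
)


def body_matches(asm: str) -> bool:
    """Return True if the whole snippet matches either ordering."""
    return PATTERN.fullmatch(asm) is not None
```

Both WITHOUT_PATCH and WITH_PATCH are accepted by body_matches, while a snippet whose final mov uses a different predicate register than the one copied from p15 is rejected, which is the property the back-references are there to enforce.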


Richard S: thoughts?

R.

I believe that this is fine and the test can be modified to allow it to pass on
aarch64. Please let me know what you think.

Regards,
Surya


On 24/11/23 4:18 pm, Richard Earnshaw wrote:



On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:

Hi Richard,
Ping. Please let me know if the test failure that I mentioned in the mail below 
can be handled by changing the expected generated code. I am not conversant 
with arm, and hence would appreciate your help.

Regards,
Surya

On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:

Hi Richard,
I had submitted a patch for review 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
regarding scaling save/restore costs of callee save registers with block
frequency in the IRA pass (PR111673).

This patch has been approved by VMakarov
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).

With this patch, we are seeing performance improvements with spec on x86
(exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).

I received a mail from Linaro about some failures seen in the CI pipeline with
this patch. I have analyzed the failures and I wish to discuss the analysis 
with you.

One failure reported by the Linaro CI is:

FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2

The diff in the assembly between trunk and patch is:

93c93
<       push    {r4, r5}
---
>       push    {fp}
95c95
<       ldrexd  r4, r5, [r0]
---
>       ldrexd  fp, ip, [r0]
99c99
<       pop     {r4, r5}
---
>       ldr     fp, [sp], #4



The test fails with patch because the ldrexd insn uses fp & ip registers instead
of r[0-9]+.

But the code produced by patch is better because it is pushing and restoring
only one register (fp) instead of two registers (r4, r5). Hence, this test can be
modified to allow it to pass on arm. Please let me know what you think.

If you need more information, please let me know. I will be sending separate
mails for the other test failures.


Thanks for looking at this.


The key part of this test is that the compiler generates LDREXD.  The registers used for 
that are pretty much irrelevant as we don't match them to any other operations within the 
test.  So I'd recommend just testing for the mnemonic and not for any of the operands (ie 
just match "ldrexd\t").
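
A small sketch of why the operand-specific pattern stops matching while a mnemonic-only pattern keeps working. It uses Python's re module rather than the DejaGnu scan-assembler-times machinery (an assumption made for illustration), and the two sample lines are taken from the diff in this thread with a tab assumed after the mnemonic.

```python
import re

# One ldrexd line from trunk and one from the patched compiler.
TRUNK_LINE = "\tldrexd\tr4, r5, [r0]\n"
PATCHED_LINE = "\tldrexd\tfp, ip, [r0]\n"

# The current test pattern insists on rN register names for every operand.
STRICT = re.compile(r"ldrexd\tr[0-9]+, r[0-9]+, \[r[0-9]+\]")
# The suggested pattern only checks that the instruction is emitted.
LOOSE = re.compile(r"ldrexd\t")


def count(pattern: re.Pattern, asm: str) -> int:
    """Count non-overlapping matches, similar in spirit to scan-assembler-times."""
    return len(pattern.findall(asm))
```

With these inputs, STRICT counts the trunk line but not the patched one, since fp and ip are not spelled as r[0-9]+, whereas LOOSE counts both.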

R.

Regards,
Surya
