Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows :
(Note: I am planning to send a separate mail for each failure, as this will make
the discussion easy to track)

FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve 
-moverride=tune=none  check-function-bodies dup_x0_m 

Expected code:

      ...
      add     (x[0-9]+), x0, #?1
      mov     (p[0-7])\.b, p15\.b
      mov     z0\.d, \2/m, \1
      ...
      ret


Code obtained w/o patch:
        addvl   sp, sp, #-1
        str     p15, [sp]
        add     x0, x0, 1
        mov     p3.b, p15.b
        mov     z0.d, p3/m, x0
        ldr     p15, [sp]
        addvl   sp, sp, #1
        ret

Code obtained w/ patch:
        addvl   sp, sp, #-1
        str     p15, [sp]
        mov     p3.b, p15.b
        add     x0, x0, 1
        mov     z0.d, p3/m, x0
        ldr     p15, [sp]
        addvl   sp, sp, #1
        ret

As we can see, with the patch, the following two instructions are interchanged:
        add     x0, x0, 1
        mov     p3.b, p15.b

I believe that this is fine and the test can be modified to allow it to pass on
aarch64. Please let me know what you think.

Regards,
Surya


On 24/11/23 4:18 pm, Richard Earnshaw wrote:
> 
> 
> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>> Hi Richard,
>> Ping. Please let me know if the test failure that I mentioned in the mail 
>> below can be handled by changing the expected generated code. I am not 
>> conversant with arm, and hence would appreciate your help.
>>
>> Regards,
>> Surya
>>
>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>> Hi Richard,
>>> I had submitted a patch for review 
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>> regarding scaling save/restore costs of callee save registers with block
>>> frequency in the IRA pass (PR111673).
>>>
>>> This patch has been approved by VMakarov
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>
>>> With this patch, we are seeing performance improvements with spec on x86
>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>
>>> I received a mail from Linaro about some failures seen in the CI pipeline 
>>> with
>>> this patch. I have analyzed the failures and I wish to discuss the analysis 
>>> with you.
>>>
>>> One failure reported by the Linaro CI is:
>>>
>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, 
>>> r[0-9]+, \\[r[0-9]+\\] 2
>>>
>>> The diff in the assembly between trunk and patch is:
>>>
>>> 93c93
>>> <       push    {r4, r5}
>>> ---
>>>>        push    {fp}
>>> 95c95
>>> <       ldrexd  r4, r5, [r0]
>>> ---
>>>>        ldrexd  fp, ip, [r0]
>>> 99c99
>>> <       pop     {r4, r5}
>>> ---
>>>>        ldr     fp, [sp], #4
>>>
>>>
>>> The test fails with patch because the ldrexd insn uses fp & ip registers 
>>> instead
>>> of r[0-9]+
>>>
>>> But the code produced by patch is better because it is pushing and 
>>> restoring only
>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>> modified to allow it to pass on arm. Please let me know what you think.
>>>
>>> If you need more information, please let me know. I will be sending 
>>> separate mails
>>> for the other test failures.
>>>
> 
> Thanks for looking at this.
> 
> 
> The key part of this test is that the compiler generates LDREXD.  The 
> registers used for that are pretty much irrelevant as we don't match them to 
> any other operations within the test.  So I'd recommend just testing for the 
> mnemonic and not for any of the operands (ie just match "ldrexd\t").
> 
> R.
> 
>>> Regards,
>>> Surya
>>>
>>>
>>>

Reply via email to