Richard Earnshaw <richard.earns...@foss.arm.com> writes: > On 28/11/2023 12:52, Surya Kumari Jangala wrote: >> Hi Richard, >> Thanks a lot for your response! >> >> Another failure reported by the Linaro CI is as follows : >> (Note: I am planning to send a separate mail for each failure, as this will >> make >> the discussion easy to track) >> >> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve >> -moverride=tune=none check-function-bodies dup_x0_m >> >> Expected code: >> >> ... >> add (x[0-9]+), x0, #?1 >> mov (p[0-7])\.b, p15\.b >> mov z0\.d, \2/m, \1 >> ... >> ret >> >> >> Code obtained w/o patch: >> addvl sp, sp, #-1 >> str p15, [sp] >> add x0, x0, 1 >> mov p3.b, p15.b >> mov z0.d, p3/m, x0 >> ldr p15, [sp] >> addvl sp, sp, #1 >> ret >> >> Code obtained w/ patch: >> addvl sp, sp, #-1 >> str p15, [sp] >> mov p3.b, p15.b >> add x0, x0, 1 >> mov z0.d, p3/m, x0 >> ldr p15, [sp] >> addvl sp, sp, #1 >> ret >> >> As we can see, with the patch, the following two instructions are >> interchanged: >> add x0, x0, 1 >> mov p3.b, p15.b > > Indeed, both look acceptable results to me, especially given that we > don't schedule results at -O1. > > There's two ways of fixing this: > 1) Simply swap the order to what the compiler currently generates (which > is a little fragile, since it might flip back someday). > 2) Write the test as > > > ** ( > ** add (x[0-9]+), x0, #?1 > ** mov (p[0-7])\.b, p15\.b > ** mov z0\.d, \2/m, \1 > ** | > ** mov (p[0-7])\.b, p15\.b > ** add (x[0-9]+), x0, #?1 > ** mov z0\.d, \1/m, \2 > ** ) > > Note, we need to swap the match names in the third insn to account for > the different order of the earlier instructions. > > Neither is ideal, but the second is perhaps a little more bomb proof. > > I don't really have a strong feeling either way, but perhaps the second > is slightly preferable. > > Richard S: thoughts?
Yeah, I agree the second is probably better. The | doesn't reset the capture numbers, so I think the final instruction needs to be: ** mov z0\.d, \3/m, \4 Thanks, Richard > > R. > >> I believe that this is fine and the test can be modified to allow it to pass >> on >> aarch64. Please let me know what you think. >> >> Regards, >> Surya >> >> >> On 24/11/23 4:18 pm, Richard Earnshaw wrote: >>> >>> >>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote: >>>> Hi Richard, >>>> Ping. Please let me know if the test failure that I mentioned in the mail >>>> below can be handled by changing the expected generated code. I am not >>>> conversant with arm, and hence would appreciate your help. >>>> >>>> Regards, >>>> Surya >>>> >>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote: >>>>> Hi Richard, >>>>> I had submitted a patch for review >>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html) >>>>> regarding scaling save/restore costs of callee save registers with block >>>>> frequency in the IRA pass (PR111673). >>>>> >>>>> This patch has been approved by VMakarov >>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html). >>>>> >>>>> With this patch, we are seeing performance improvements with spec on x86 >>>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%). >>>>> >>>>> I received a mail from Linaro about some failures seen in the CI pipeline >>>>> with >>>>> this patch. I have analyzed the failures and I wish to discuss the >>>>> analysis with you. >>>>> >>>>> One failure reported by the Linaro CI is: >>>>> >>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, >>>>> r[0-9]+, \\[r[0-9]+\\] 2 >>>>> >>>>> The diff in the assembly between trunk and patch is: >>>>> >>>>> 93c93 >>>>> < push {r4, r5} >>>>> --- >>>>>> push {fp} >>>>> 95c95 >>>>> < ldrexd r4, r5, [r0] >>>>> --- >>>>>> ldrexd fp, ip, [r0] >>>>> 99c99 >>>>> < pop {r4, r5} >>>>> --- >>>>>> ldr fp, [sp], #4 >>>>> >>>>> >>>>> The test fails with patch because the ldrexd insn uses fp & ip registers >>>>> instead >>>>> of r[0-9]+ >>>>> >>>>> But the code produced by patch is better because it is pushing and >>>>> restoring only >>>>> one register (fp) instead of two registers (r4, r5). Hence, this test can >>>>> be >>>>> modified to allow it to pass on arm. Please let me know what you think. >>>>> >>>>> If you need more information, please let me know. I will be sending >>>>> separate mails >>>>> for the other test failures. >>>>> >>> >>> Thanks for looking at this. >>> >>> >>> The key part of this test is that the compiler generates LDREXD. The >>> registers used for that are pretty much irrelevant as we don't match them >>> to any other operations within the test. So I'd recommend just testing for >>> the mnemonic and not for any of the operands (ie just match "ldrexd\t"). >>> >>> R. >>> >>>>> Regards, >>>>> Surya >>>>> >>>>> >>>>>