On 14/12/2023 07:17, Surya Kumari Jangala via Gcc wrote: > Hi Richard, > Thanks a lot for your response! > > Another failure reported by the Linaro CI is as follows: > > Running gcc:gcc.dg/dg.exp ... > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue > "Performing shrink-wrapping" > FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing > shrink-wrapping" > > I analyzed the failures and the root cause is the same for both the failures. > > The test pr10474.c is as follows: > > void f(int *i) > { > if (!i) > return; > else > { > __builtin_printf("Hi"); > *i=0; > } > } > > > With the patch (for PR111673), x1 (volatile) is being assigned to hold value > of > x0 (first parameter). Since it is a volatile, x1 is saved to the stack as > there > is a call later on. The save to the stack is generated in the LRA pass. The > save > is generated in the entry basic block. Due to the usage of the stack pointer > in > the entry bb, the testcase fails to be shrink wrapped.
I'm not entirely sure I understand what you mean from a quick glance. Do you mean that X1 has the /v flag marked on it (ie it's printed in dumps as "reg/v")? If so, that's not volatile, it just means that the register is associated with a user variable (as opposed to a compiler-generated temporary variable): >From the manual: @item REG_USERVAR_P (@var{x}) In a @code{reg}, nonzero if it corresponds to a variable present in the user's source code. Zero for temporaries generated internally by the compiler. Stored in the @code{volatil} field and printed as @samp{/v}. There are several other cases where we re-use this bit on different RTL constructs to mean things other than 'volatile': it pretty much only has the conventional meaning on MEM objects. > > The reason why LRA generates the store insn in the entry bb is as follows: > LRA emits insns to save volatile registers in the inheritance/splitting pass. > In this pass, LRA builds EBBs (Extended Basic Block) and traverses the insns > in > the EBBs in reverse order from the last insn to the first insn. When LRA sees > a > write to a pseudo (that has been assigned a volatile register), and there is a > read following the write, with an intervening call insn between the write and > read, > then LRA generates a spill immediately after the write and a restore > immediately > before the read. In the above test, there is an EBB containing the entry bb > and > the bb with the printf call. In the entry bb, there is a write to x1 > (basically > a copy from x0 to x1) and in the printf bb, there is a read of x1 after the > call > insn. So LRA generates a spill in the entry bb. > > Without patch, x19 is chosen to hold the value of x0. Since x19 is a > non-volatile, > the input RTL to the shrink wrap pass does not have any code to save x19 to > the > stack. Only the insn that copies x0 to x19 is present in the entry bb. In the > shrink wrap pass, this insn is moved down the cfg to the bb containing the > call > to printf, thereby allowing prolog to be allocated only where needed. Thus > shrink > wrap succeeds. > > > Shrink wrap can be made to succeed if the save of x1 occurs just before the > call > insn, instead of generating it after the write to x1. This will ensure that > the > spill does not occur in the entry bb. In fact, it is more efficient if the > save > occurs only in the path containing the printf call instead of occurring in the > entry bb. > > I have a patch (bootstrapped and regtested on powerpc) that makes changes in > LRA to save volatile registers before a call instead of after the write to the > volatile. With this patch, both the above tests pass. > > Since the patch for PR111673 has been approved by Vladimir, I plan to > commit the patch to trunk. And I will fix the test failures after doing the > commit. > I think I'd probably understand this better if you could give some example RTL (before and after). Do you have that? R. > Regards, > Surya > > > > On 28/11/23 7:18 pm, Richard Earnshaw wrote: >> >> >> On 28/11/2023 12:52, Surya Kumari Jangala wrote: >>> Hi Richard, >>> Thanks a lot for your response! >>> >>> Another failure reported by the Linaro CI is as follows : >>> (Note: I am planning to send a separate mail for each failure, as this will >>> make >>> the discussion easy to track) >>> >>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve >>> -moverride=tune=none check-function-bodies dup_x0_m >>> >>> Expected code: >>> >>> ... >>> add (x[0-9]+), x0, #?1 >>> mov (p[0-7])\.b, p15\.b >>> mov z0\.d, \2/m, \1 >>> ... >>> ret >>> >>> >>> Code obtained w/o patch: >>> addvl sp, sp, #-1 >>> str p15, [sp] >>> add x0, x0, 1 >>> mov p3.b, p15.b >>> mov z0.d, p3/m, x0 >>> ldr p15, [sp] >>> addvl sp, sp, #1 >>> ret >>> >>> Code obtained w/ patch: >>> addvl sp, sp, #-1 >>> str p15, [sp] >>> mov p3.b, p15.b >>> add x0, x0, 1 >>> mov z0.d, p3/m, x0 >>> ldr p15, [sp] >>> addvl sp, sp, #1 >>> ret >>> >>> As we can see, with the patch, the following two instructions are >>> interchanged: >>> add x0, x0, 1 >>> mov p3.b, p15.b >> >> Indeed, both look acceptable results to me, especially given that we don't >> schedule results at -O1. >> >> There's two ways of fixing this: >> 1) Simply swap the order to what the compiler currently generates (which is >> a little fragile, since it might flip back someday). >> 2) Write the test as >> >> >> ** ( >> ** add (x[0-9]+), x0, #?1 >> ** mov (p[0-7])\.b, p15\.b >> ** mov z0\.d, \2/m, \1 >> ** | >> ** mov (p[0-7])\.b, p15\.b >> ** add (x[0-9]+), x0, #?1 >> ** mov z0\.d, \1/m, \2 >> ** ) >> >> Note, we need to swap the match names in the third insn to account for the >> different order of the earlier instructions. >> >> Neither is ideal, but the second is perhaps a little more bomb proof. >> >> I don't really have a strong feeling either way, but perhaps the second is >> slightly preferable. >> >> Richard S: thoughts? >> >> R. >> >>> I believe that this is fine and the test can be modified to allow it to >>> pass on >>> aarch64. Please let me know what you think. >>> >>> Regards, >>> Surya >>> >>> >>> On 24/11/23 4:18 pm, Richard Earnshaw wrote: >>>> >>>> >>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote: >>>>> Hi Richard, >>>>> Ping. Please let me know if the test failure that I mentioned in the mail >>>>> below can be handled by changing the expected generated code. I am not >>>>> conversant with arm, and hence would appreciate your help. >>>>> >>>>> Regards, >>>>> Surya >>>>> >>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote: >>>>>> Hi Richard, >>>>>> I had submitted a patch for review >>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html) >>>>>> regarding scaling save/restore costs of callee save registers with block >>>>>> frequency in the IRA pass (PR111673). >>>>>> >>>>>> This patch has been approved by VMakarov >>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html). >>>>>> >>>>>> With this patch, we are seeing performance improvements with spec on x86 >>>>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%). >>>>>> >>>>>> I received a mail from Linaro about some failures seen in the CI >>>>>> pipeline with >>>>>> this patch. I have analyzed the failures and I wish to discuss the >>>>>> analysis with you. >>>>>> >>>>>> One failure reported by the Linaro CI is: >>>>>> >>>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, >>>>>> r[0-9]+, \\[r[0-9]+\\] 2 >>>>>> >>>>>> The diff in the assembly between trunk and patch is: >>>>>> >>>>>> 93c93 >>>>>> < push {r4, r5} >>>>>> --- >>>>>>> push {fp} >>>>>> 95c95 >>>>>> < ldrexd r4, r5, [r0] >>>>>> --- >>>>>>> ldrexd fp, ip, [r0] >>>>>> 99c99 >>>>>> < pop {r4, r5} >>>>>> --- >>>>>>> ldr fp, [sp], #4 >>>>>> >>>>>> >>>>>> The test fails with patch because the ldrexd insn uses fp & ip registers >>>>>> instead >>>>>> of r[0-9]+ >>>>>> >>>>>> But the code produced by patch is better because it is pushing and >>>>>> restoring only >>>>>> one register (fp) instead of two registers (r4, r5). Hence, this test >>>>>> can be >>>>>> modified to allow it to pass on arm. Please let me know what you think. >>>>>> >>>>>> If you need more information, please let me know. I will be sending >>>>>> separate mails >>>>>> for the other test failures. >>>>>> >>>> >>>> Thanks for looking at this. >>>> >>>> >>>> The key part of this test is that the compiler generates LDREXD. The >>>> registers used for that are pretty much irrelevant as we don't match them >>>> to any other operations within the test. So I'd recommend just testing >>>> for the mnemonic and not for any of the operands (ie just match >>>> "ldrexd\t"). >>>> >>>> R. >>>> >>>>>> Regards, >>>>>> Surya >>>>>> >>>>>> >>>>>>