Iain Sandoe <i...@sandoe.co.uk> writes:
> Hi Richard,
>
> (hopefully, my take won’t cloud the issue ….)
>
>> On 14 Feb 2022, at 16:00, Richard Sandiford via Gcc-patches 
>> <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi Vlad,
>> 
>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>>>> Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> Two issues resulted in this PR, which manifests when we force a constant 
>>>>> into
>>>>> memory in LRA (in PIC code on Darwin).  The presence of such forced 
>>>>> constants
>>>>> is quite dependent on other RTL optimisations, and it is easy for the 
>>>>> issue to
>>>>> become latent for a specific case.
>>>>> 
>>>>> First, in the Darwin-specific rs6000 backend code, we were not being 
>>>>> careful
>>>>> enough in rejecting invalid symbolic addresses.  Specifically, when 
>>>>> generating
>>>>> PIC code, we require a SYMBOL_REF to be wrapped in an 
>>>>> UNSPEC_MACHOPIC_OFFSET.
>>>>> 
>>>>> Second, LRA was attempting to load a register using an invalid lo_sum 
>>>>> address.
>>>>> 
>>>>> The LRA changes are approved in the PR by Vladimir, and the RS6000 
>>>>> changes are
>>>>> Darwin-specific (although, of course, any observations are welcome).
>>>>> 
>>>>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>>>>> powerpc64-linux (m32/m64) -D
>>>>> powerpc64le-linux  -D
>>>>> powerpc64-aix -Ada -Go -D
>>>>> aarch64-linux -Ada -D
>>>>> x86_64-linux all langs -D
>>>>> powerpc-darwin9 (master and 11.2) -D -Go.
>>>>> 
>>>>> pushed to master, thanks,
>>>>> Iain
>>>>> 
>>>>> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
>>>>> Co-authored-by: Vladimir Makarov <vmaka...@redhat.com>
>>>>> 
>>>>>   PR target/104117
>>>>> 
>>>>> gcc/ChangeLog:
>>>>> 
>>>>>   * config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>>>>>   Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>>>>>   emitting PIC code.
>>>>>   (legitimate_lo_sum_address_p): Likewise.
>>>>>   * lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>>>>>   load from an invalid lo_sum address.
>>>>> ---
>>>>>  gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>>>>>  gcc/lra-constraints.cc      | 17 ++---------------
>>>>>  2 files changed, 38 insertions(+), 17 deletions(-)
>>>>> 
>>>>> […]
>>>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>>>> index fdff9e0720a..c700c3f4578 100644
>>>>> --- a/gcc/lra-constraints.cc
>>>>> +++ b/gcc/lra-constraints.cc
>>>>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>>>>             *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>>>>             if (!valid_address_p (op, &ad, cn))
>>>>>               {
>>>>> -               /* Try to put lo_sum into register.  */
>>>>> -               insn = emit_insn (gen_rtx_SET
>>>>> -                                 (new_reg,
>>>>> -                                  gen_rtx_LO_SUM (Pmode, new_reg, 
>>>>> addr)));
>>>>> -               code = recog_memoized (insn);
>>>>> -               if (code >= 0)
>>>>> -                 {
>>>>> -                   *ad.inner = new_reg;
>>>>> -                   if (!valid_address_p (op, &ad, cn))
>>>>> -                     {
>>>>> -                       *ad.inner = addr;
>>>>> -                       code = -1;
>>>>> -                     }
>>>>> -                 }
>>>>> -
>>>>> +               *ad.inner = addr; /* Punt.  */
>>>>> +               code = -1;
>>>>>               }
>>>>>           }
>>>>>         if (code < 0)
>>>> Could you go into more details about this?  Why is it OK to continue
>>>> to try:
>>>> 
>>>>   (lo_sum new_reg addr)
>>>> 
>>>> directly as an address (the context at the top of the hunk), but not try
>>>> moving the lo_sum into a register?  They should be semantically equivalent,
>>>> so it seems that if one is wrong, the other would be too.
>>>> 
>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>> 
>>> I think there is no need for this code and it is misleading.  If 
>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
>>> will help for any existing target.  As machine-dependent code for any 
>>> target most probably (for ppc64 darwin it is exactly the case) checks 
>>> address only in memory, it can wrongly accept wrong address by reloading 
>>> it into reg and use it in memory. So these are my arguments for the 
>>> remove this code from process_address_1.
>> 
>> I'm probably making too much of this, but:
>> 
>> I think the code is potentially useful in that existing targets do forbid
>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>> while still wanting lo_sum to be used to be load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting an
>> unfortunate precedent.
>> 
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>
> If X is an invalid address (in this case for PIC code a SYMBOL_REF is not
> valid without an UNSPEC wrapper) - then (high X) and (lo_sum (high X) X) 
> are also invalid.  AFAICT the target never (this late in the process) gets
> the opportunity to apply a transform to validate the address.
>
>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>
> once again if X is plain invalid - loading it into a register or splitting it
> does not solve that - we’d need to call the target’s address legitimizer.
>
> Of course, if we conclude that it is a target bug, I’ll try to fix it up .

But if the target treats:

   (set R2 (high X))
   (set R1 (lo_sum R2 X))

as an invalid way of setting R1 to X (e.g. because X is plain invalid),
then recog should fail on at least one instruction, preferably both.
So yeah, this does feel like a target bug to me.

We shouldn't assume that lo_sum/high splits will only be generated by
the target.  Target-independent code can generate them “spontaneously”,
provided that it validates the result.  (Combine does this too,
for example.)

It looks like the LRA code might have been missing a HIGH in the
fallback case, but it doesn't sound like that was the bug that was
being fixed.

Thanks,
Richard

Reply via email to