Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Richard Sandiford Fri, 21 Aug 2020 02:02:54 -0700

xiezhiheng <xiezhih...@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Thursday, August 20, 2020 4:55 PM
>> To: xiezhiheng <xiezhih...@huawei.com>
>> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng <xiezhih...@huawei.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> Sent: Wednesday, August 19, 2020 6:06 PM
>> >> To: xiezhiheng <xiezhih...@huawei.com>
>> >> Cc: Richard Biener <richard.guent...@gmail.com>;
>> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >>
>> >> xiezhiheng <xiezhih...@huawei.com> writes:
>> >> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first 
>> >> > for a
>> try,
>> >> > including all the add/sub arithmetic intrinsics.
>> >> >
>> >> > Something like faddp intrinsic which only handles floating-point
>> operations,
>> >> > both FP and NONE flags are suitable for it because FLAG_FP will be
>> added
>> >> > later if the intrinsic handles floating-point operations.  And I prefer 
>> >> > FP
>> >> since
>> >> > it would be more clear.
>> >>
>> >> Sounds good to me.
>> >>
>> >> > But for qadd intrinsics, they would modify FPSR register which is a
>> scenario
>> >> > I missed before.  And I consider to add an additional flag
>> >> FLAG_WRITE_FPSR
>> >> > to represent it.
>> >>
>> >> I don't think we make any attempt to guarantee that the Q flag is
>> >> meaningful after saturating intrinsics.  To do that, we'd need to model
>> >> the modification of the flag in the .md patterns too.
>> >>
>> >> So my preference would be to leave this out and just use NONE for the
>> >> saturating forms too.
>> >
>> > The problem is that the test case in the attachment has different results
>> under -O0 and -O2.
>> 
>> Right.  But my point was that I don't think that use case is supported.
>> If you want to use saturating instructions and read the Q flag afterwards,
>> the saturating instructions need to be inline asm too.
>> 
>> > In gimple phase statement:
>> >   _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
>> > would be treated as dead code if we set NONE flag for saturating 
>> > intrinsics.
>> > Adding FLAG_WRITE_FPSR would help fix this problem.
>> >
>> > Even when we set FLAG_WRITE_FPSR, the uqadd insn:
>> >   (insn 11 10 12 2 (set (reg:V2SI 97)
>> >         (us_plus:V2SI (reg:V2SI 98)
>> >             (reg:V2SI 99))) {aarch64_uqaddv2si}
>> >      (nil))
>> > could also be eliminated in RTL phase because this insn will be treated as
>> dead insn.
>> > So I think we might also need to modify saturating instruction patterns
>> adding the side effect of set the FPSR register.
>> 
>> The problem is that FPSR is global state and we don't in general
>> know who might read it.  So if we modelled the modification of the FPSR,
>> we'd never be able to fold away saturating arithmetic that does actually
>> saturate at compile time, because we'd never know whether the program
>> wanted the effect on the Q flag result to be visible (perhaps to another
>> function that the compiler can't see).  We'd also be unable to remove
>> results that really are dead.
>> 
>> So I think this is one of those situations in which we can't keep all
>> constituents happy.  Catering for people who want to read the Q flag
>> would make things worse for those who want saturating arithmetic to be
>> optimised as aggressively as possible.  And the same holds in reverse.
>
> I agree.  The test case is extracted from 
> gcc.target/aarch64/advsimd-intrinsics/vqadd.c
> If we set NONE flag for saturating intrinsics, it would fail in regression 
> because some qadd
> intrinsics would be treated as dead code and be eliminated.
>   Running target unix
>   Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ...
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  (test for excess 
> errors)
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  (test for excess 
> errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  (test for excess 
> errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  (test for 
> excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  (test for excess 
> errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  (test for 
> excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
>   PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
>   FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects  execution test


Ah, OK.

> So maybe this test case should only be tested at -O0 level?

Looks like the saturating intrinsics might need a bit more thought.
Would you mind submitting the patch with just the other parts?
Those were uncontroversial and it would be a shame to hold them
up over this.

Thanks,
Richard

Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Reply via email to