xiezhiheng <xiezhih...@huawei.com> writes: >> -----Original Message----- >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> Sent: Thursday, August 20, 2020 4:55 PM >> To: xiezhiheng <xiezhih...@huawei.com> >> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> emitted at -O3 >> >> xiezhiheng <xiezhih...@huawei.com> writes: >> >> -----Original Message----- >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> >> Sent: Wednesday, August 19, 2020 6:06 PM >> >> To: xiezhiheng <xiezhih...@huawei.com> >> >> Cc: Richard Biener <richard.guent...@gmail.com>; >> gcc-patches@gcc.gnu.org >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> >> emitted at -O3 >> >> >> >> xiezhiheng <xiezhih...@huawei.com> writes: >> >> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first >> >> > for a >> try, >> >> > including all the add/sub arithmetic intrinsics. >> >> > >> >> > Something like faddp intrinsic which only handles floating-point >> operations, >> >> > both FP and NONE flags are suitable for it because FLAG_FP will be >> added >> >> > later if the intrinsic handles floating-point operations. And I prefer >> >> > FP >> >> since >> >> > it would be more clear. >> >> >> >> Sounds good to me. >> >> >> >> > But for qadd intrinsics, they would modify FPSR register which is a >> scenario >> >> > I missed before. And I consider to add an additional flag >> >> FLAG_WRITE_FPSR >> >> > to represent it. >> >> >> >> I don't think we make any attempt to guarantee that the Q flag is >> >> meaningful after saturating intrinsics. To do that, we'd need to model >> >> the modification of the flag in the .md patterns too. >> >> >> >> So my preference would be to leave this out and just use NONE for the >> >> saturating forms too. >> > >> > The problem is that the test case in the attachment has different results >> under -O0 and -O2. >> >> Right. But my point was that I don't think that use case is supported. >> If you want to use saturating instructions and read the Q flag afterwards, >> the saturating instructions need to be inline asm too. >> >> > In gimple phase statement: >> > _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6); >> > would be treated as dead code if we set NONE flag for saturating >> > intrinsics. >> > Adding FLAG_WRITE_FPSR would help fix this problem. >> > >> > Even when we set FLAG_WRITE_FPSR, the uqadd insn: >> > (insn 11 10 12 2 (set (reg:V2SI 97) >> > (us_plus:V2SI (reg:V2SI 98) >> > (reg:V2SI 99))) {aarch64_uqaddv2si} >> > (nil)) >> > could also be eliminated in RTL phase because this insn will be treated as >> dead insn. >> > So I think we might also need to modify saturating instruction patterns >> adding the side effect of set the FPSR register. >> >> The problem is that FPSR is global state and we don't in general >> know who might read it. So if we modelled the modification of the FPSR, >> we'd never be able to fold away saturating arithmetic that does actually >> saturate at compile time, because we'd never know whether the program >> wanted the effect on the Q flag result to be visible (perhaps to another >> function that the compiler can't see). We'd also be unable to remove >> results that really are dead. >> >> So I think this is one of those situations in which we can't keep all >> constituents happy. Catering for people who want to read the Q flag >> would make things worse for those who want saturating arithmetic to be >> optimised as aggressively as possible. And the same holds in reverse. > > I agree. The test case is extracted from > gcc.target/aarch64/advsimd-intrinsics/vqadd.c > If we set NONE flag for saturating intrinsics, it would fail in regression > because some qadd > intrinsics would be treated as dead code and be eliminated. > Running target unix > Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ... > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O0 (test for excess > errors) > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O0 execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O1 (test for excess > errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O1 execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 (test for excess > errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O3 -g (test for > excess errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O3 -g execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Os (test for excess > errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Os execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Og -g (test for > excess errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Og -g execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto > -fno-use-linker-plugin -flto-partition=none (test for excess errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto > -fno-use-linker-plugin -flto-partition=none execution test > PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects execution test
Ah, OK. > So maybe this test case should only be tested at -O0 level? Looks like the saturating intrinsics might need a bit more thought. Would you mind submitting the patch with just the other parts? Those were uncontroversial and it would be a shame to hold them up over this. Thanks, Richard