On 8/16/24 9:43 PM, pan2...@intel.com wrote:
From: Pan Li <pan2...@intel.com>
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected. For example as
below code.
signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
b[0] = -40;
c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9;
__builtin_printf("%d\n", c);
}
After expanding we have:
;; _6 = .SAT_ADD (_3, 9);
(insn 8 7 9 (set (reg:DI 143)
(high:DI (symbol_ref:DI ("d") [flags 0x86] <var_decl d>)))
(nil))
(insn 9 8 10 (set (reg/f:DI 142)
(mem/f/c:DI (lo_sum:DI (reg:DI 143)
(symbol_ref:DI ("d") [flags 0x86] <var_decl d>)) [1 d+0 S8
A64]))
(nil))
(insn 10 9 11 (set (reg:HI 144 [ _3 ])
(sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8])))
"test.c":7:10 -1
(nil))
The convert from signed char to unsigned short will have sign_extend rtl
as above. And finally become the lb insn as below:
lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8
lui a0,0x1a
addi a5,a1,9
slli a5,a5,0x30
srli a5,a5,0x30 // a5 is 65505
sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE
The sltu try to compare 65505 and 0xffffffffffffffd8 here, but we
actually want to compare 65505 and 65496 (0xffd8). Thus we need to
clean up the high bits to ensure this.
The below test suites are passed for this patch:
* The rv64gcv fully regression test.
PR target/116278
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands
and sum.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.
PR 116278
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands 0
and remove the special handing for SImode in RV64.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_u_add-11.c: Adjust asm check body.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.
OK. And I think this shows the basic approach we want to use if there
are other builtins that accept sub-word modes. ie, get the operands
into X mode (by extending them as appropriate), then do as much work in
X mode as possible, then truncate the result if needed.
Thanks for your patience on this.
Jeff