Jeff Law <jeffreya...@gmail.com> writes: > On 11/8/23 23:08, pan2...@intel.com wrote: >> From: Pan Li <pan2...@intel.com> >> >> Update in v2: >> * Move vector type support to get_stored_val. >> >> Original log: >> >> This patch would like to allow the vector mode in the >> get_stored_val in the DSE. It is valid for the read >> rtx if and only if the read bitsize is less than the >> stored bitsize. >> >> Given below example code with >> --param=riscv-autovec-preference=fixed-vlmax. >> >> vuint8m1_t test () { >> uint8_t arr[32] = { >> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, >> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, >> }; >> >> return __riscv_vle8_v_u8m1(arr, 32); >> } >> >> Before this patch: >> test: >> lui a5,%hi(.LANCHOR0) >> addi sp,sp,-32 >> addi a5,a5,%lo(.LANCHOR0) >> li a3,32 >> vl2re64.v v2,0(a5) >> vsetvli zero,a3,e8,m1,ta,ma >> vs2r.v v2,0(sp) <== Unnecessary store to stack >> vle8.v v1,0(sp) <== Ditto >> vs1r.v v1,0(a0) >> addi sp,sp,32 >> jr ra >> >> After this patch: >> test: >> lui a5,%hi(.LANCHOR0) >> addi a5,a5,%lo(.LANCHOR0) >> li a4,32 >> addi sp,sp,-32 >> vsetvli zero,a4,e8,m1,ta,ma >> vle8.v v1,0(a5) >> vs1r.v v1,0(a0) >> addi sp,sp,32 >> jr ra >> >> Below tests are passed within this patch: >> >> * The x86 bootstrap and regression test. >> * The aarch64 regression test. >> * The risc-v regression test. >> >> PR target/111720 >> >> gcc/ChangeLog: >> >> * dse.cc (get_stored_val): Allow vector mode if the read >> bitsize is less than stored bitsize. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/base/pr111720-0.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-1.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-10.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-2.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-3.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-4.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-5.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-6.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-7.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-8.c: New test. >> * gcc.target/riscv/rvv/base/pr111720-9.c: New test. > We're always getting the lowpart here AFAICT and it appears that all the > right thing should happen if gen_lowpart_common fails (it returns NULL, > which bubbles up and is the right return value from get_stored_val if it > can't be optimized).
Yeah, we should always be operating on the lowpart, but it looks like there's a latent bug. This check: if (gap.is_constant () && maybe_ne (gap, 0)) { ... } else ... means that we ignore the gap if it's a nonzero runtime value. I guess it should be: if (maybe_ne (gap, 0)) { if (!gap.is_constant ()) return NULL_RTX; ... } instead. Alternatively, we could remove the is_constant condition and fix PR87815 in a different way, e.g. by protecting the smallest_int_mode_for_size with a tighter condition. That might allow a similar DSE optimisation to this patch for nonzero offsets, thanks to: if (multiple_p (shift, GET_MODE_BITSIZE (new_mode)) && known_le (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))) { /* Try to implement the shift using a subreg. */ ... > Did you want to use known_le so that you'd pick up the case when the two > modes are the same size? Or was known_lt the test you really wanted > (and if so, why). Agree it should be known_le FWIW. Thanks, Richard