On 11/12/23 20:22, pan2...@intel.com wrote:
From: Pan Li <pan2...@intel.com>

Update in v4:
* Merge upstream and removed some independent changes.

Update in v3:
* Take known_le instead of known_lt for vector size.
* Return NULL_RTX when gap is not equal 0 and not constant.

Update in v2:
* Move vector type support to get_stored_val.

Original log:

This patch would like to allow the vector mode in the
get_stored_val in the DSE. It is valid for the read
rtx if and only if the read bitsize is less than the
stored bitsize.

Given below example code with
--param=riscv-autovec-preference=fixed-vlmax.

vuint8m1_t test () {
   uint8_t arr[32] = {
     1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
     1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
   };

   return __riscv_vle8_v_u8m1(arr, 32);
}

Before this patch:
test:
   lui     a5,%hi(.LANCHOR0)
   addi    sp,sp,-32
   addi    a5,a5,%lo(.LANCHOR0)
   li      a3,32
   vl2re64.v       v2,0(a5)
   vsetvli zero,a3,e8,m1,ta,ma
   vs2r.v  v2,0(sp)             <== Unnecessary store to stack
   vle8.v  v1,0(sp)             <== Ditto
   vs1r.v  v1,0(a0)
   addi    sp,sp,32
   jr      ra

After this patch:
test:
   lui     a5,%hi(.LANCHOR0)
   addi    a5,a5,%lo(.LANCHOR0)
   li      a4,32
   addi    sp,sp,-32
   vsetvli zero,a4,e8,m1,ta,ma
   vle8.v  v1,0(a5)
   vs1r.v  v1,0(a0)
   addi    sp,sp,32
   jr      ra

Below tests are passed within this patch:
* The risc-v regression test.
* The x86 bootstrap and regression test.
* The aarch64 regression test.

        PR target/111720

gcc/ChangeLog:

        * dse.cc (get_stored_val): Allow vector mode if read size is
        less than or equal to stored size.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/pr111720-0.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-1.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-10.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-2.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-3.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-4.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-5.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-6.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-7.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-8.c: New test.
        * gcc.target/riscv/rvv/base/pr111720-9.c: New test.
OK for the trunk.




+  else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
+    && known_le (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
+    && targetm.modes_tieable_p (read_mode, store_mode))
+    read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
    else
      read_reg = extract_low_bits (read_mode, store_mode,
                                 copy_rtx (store_info->rhs));
It may not matter, especially for RV, but we could possibly have a mixture of scalar and vector modes in the RTL. Say a vector store followed by a scalar read or vice-versa.

I wouldn't try to handle that case unless we had actual evidence it was useful to do so. Just wanted to point out that unlike pseudos we can have multiple modes referencing the same memory location.

Jeff

Reply via email to