[PATCH v3] DSE: Allow vector type for get_stored_val when read < store

2023-11-12 Thread pan2 . li
From: Pan Li Update in v3: * Take known_le instead of known_lt for vector size. * Return NULL_RTX when gap is not equal 0 and not constant. Update in v2: * Move vector type support to get_stored_val. Original log: This patch would like to allow the vector mode in the get_stored_val in the DSE.

[PATCH v1] RISC-V: Support FP l/ll round and rint HF mode autovec

2023-11-12 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization of the FP APIs below with different type sizes.
+-----------+----------+----------+
| API       | RV64     | RV32     |
+-----------+----------+----------+
| lrintf16  | HF => DI | HF => SI |
| llrintf16 | HF => DI | HF => DI |

[PATCH v1] RISC-V: Fix RVV dynamic frm tests failure

2023-11-12 Thread pan2 . li
From: Pan Li The enhancement of mode-switching performs some optimization when emitting the frm backup insn, so some redundant fsrm insns are removed for the following test cases. This patch would like to adjust the asm check for the above optimization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rv

[PATCH v4] DSE: Allow vector type for get_stored_val when read < store

2023-11-12 Thread pan2 . li
From: Pan Li Update in v4: * Merge upstream and removed some independent changes. Update in v3: * Take known_le instead of known_lt for vector size. * Return NULL_RTX when gap is not equal 0 and not constant. Update in v2: * Move vector type support to get_stored_val. Original log: This patch

[PATCH v1] RISC-V: Refine the mask generation for vec_init case 2

2023-11-14 Thread pan2 . li
From: Pan Li We take vec_init element int mode when generate the mask for case 2. But actually we don't need as many bits as the element. The extra bigger mode may introduce some unnecessary insns. For example as below code: typedef int64_t v16di __attribute__ ((vector_size (16 * 8))); void __a

[PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f

2023-11-28 Thread pan2 . li
From: Pan Li The exact_div requires the dividend to be an exact multiple of the divisor. Unfortunately, this condition is broken in some cases with zve32f. For example, potential_ew is 8 BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4] This patch would like to ensure the precondition of exact_div when get_vec_mode

[PATCH v1] RISC-V: Bugfix for legitimize move when get vec mode in zve32f

2023-11-29 Thread pan2 . li
From: Pan Li When requiring the mode after get_vec_mode in riscv_legitimize_move, there is a precondition that the mode exists. Otherwise we will have E_VOIDmode and of course hit an ICE when required. Typically we should first check whether the mode exists before require, or more friendly like leverage e

[PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-19 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for both the ceil and ceilf of math.h. It depends on the -ffast-math option. When we would like to call ceil/ceilf like v2 = ceil (v1), we will convert it into below insn (reference the implementation of llvm). * vfcvt.x.f v3, v1,

[PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for both the ceil and ceilf of math.h. It depends on the -ffast-math option. When we would like to call ceil/ceilf like v2 = ceil (v1), we will convert it into below insn (reference the implementation of llvm). * vfcvt.x.f v3, v1,

[PATCH v3] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for both the ceil and ceilf of math.h. It depends on the -ffast-math option. When we would like to call ceil/ceilf like v2 = ceil (v1), we will convert it into below insn (reference the implementation of llvm). * vfcvt.x.f v3, v1,

[PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread pan2 . li
From: Pan Li Update in v4: * Add test for _Float16. * Remove unnecessary macro in def.h for test. Original log: This patch would like to support auto-vectorization for both the ceil and ceilf of math.h. It depends on the -ffast-math option. When we would like to call ceil/ceilf like v2 = ceil

[PATCH v1] RISC-V: Leverage __builtin_xx instead of math.h for test

2023-09-21 Thread pan2 . li
From: Pan Li The math.h may have problems in some environments, so take __builtin_xx instead for testing. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: Remove reference to math.h. * gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.

[PATCH v1] RISC-V: Remove arch and abi option for run test case.

2023-09-21 Thread pan2 . li
From: Pan Li Remove the -march and -mabi. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Remove arch and abi. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto. Signed-off-by: Pan

[PATCH v1] RISC-V: Rename the test macro for math autovec test

2023-09-21 Thread pan2 . li
From: Pan Li Rename TEST_CEIL to TEST_UNARY_CALL for the underlying function autovec patch testing. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/test-math.h: Rename. * gcc.target/riscv/rvv/autovec/math-ceil-0.c: Ditto. * gcc.target/riscv/rvv/autovec/math-ceil-

[PATCH v1] RISCV-V: Suport FP floor auto-vectorization

2023-09-21 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the floor API in math.h. It depends on the -ffast-math option. When we would like to call floor/floorf like v2 = floor (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1, RDN *

[PATCH v1] RISC-V: Move ceil test cases to unop folder

2023-09-22 Thread pan2 . li
From: Pan Li gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c:

[PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread pan2 . li
From: Pan Li We vectorized below ceil code already. void test_ceil (float *out, float *in, int count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_ceilf (in[i]); } Before this patch: vfmv.v.x v4,fa0 // can be removed vfabs.v v0,v1 vmv1r.v v2,v1 // can be r

[PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread pan2 . li
From: Pan Li We vectorized below ceil code already. void test_ceil (float *out, float *in, int count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_ceilf (in[i]); } Before this patch: vfmv.v.x v4,fa0 // can be removed vfabs.v v0,v1 vmv1r.v v2,v1 // can be r

[PATCH v2] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the floor API in math.h. It depends on the -ffast-math option. When we would like to call floor/floorf like v2 = floor (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1, RDN *

[PATCH v1] RISC-V: Remove FP run test for ceil.

2023-09-22 Thread pan2 . li
From: Pan Li FP16 is not well reconciled when linking. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove. Signed-off-by: Pan Li --- .../riscv/rvv/autovec/unop/math-ceil-run-0.c | 39 --- 1 file changed, 39 deletions(-) delete mod

[PATCH v3] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the floor API in math.h. It depends on the -ffast-math option. When we would like to call floor/floorf like v2 = floor (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1, RDN *

[PATCH v1] RISC-V: Fix fortran ICE/PR111546 when RV32 vec_init

2023-09-23 Thread pan2 . li
From: Pan Li When broadcasting the repeated element, we take the mask machine mode by mistake. This patch would like to fix it by leveraging the machine mode of the element. The below test case in RV32 will be fixed. * gcc/testsuite/gfortran.dg/overload_5.f90 PR target/111546 gcc/Change

[PATCH v2] RISC-V: Fix fortran ICE/PR111546 when RV32 vec_init

2023-09-23 Thread pan2 . li
From: Pan Li When broadcasting the repeated element, we take the mask_int_mode by mistake. This patch would like to fix it by leveraging the machine mode of the element. The below test case in RV32 will be fixed. * gcc/testsuite/gfortran.dg/overload_5.f90 PR target/111546 gcc/ChangeLog:

[PATCH v1] RISC-V: Support FP nearbyint auto-vectorization

2023-09-25 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the nearbyint API in math.h. It depends on the -ffast-math option. When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1), we will convert it into below insns (reference the implementation of llvm). * frflags

[PATCH v1] RISC-V: Rename rounding const fp function for refactor

2023-09-25 Thread pan2 . li
From: Pan Li The rounding related API shared one const, rename it to avoid unnecessary redundant code. gcc/ChangeLog: * config/riscv/riscv-v.cc (gen_ceil_const_fp): Remove. (get_fp_rounding_coefficient): Rename. (gen_floor_const_fp): Remove. (expand_vec_ceil): Ta

[PATCH v2] RISC-V: Support FP nearbyint auto-vectorization

2023-09-26 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the nearbyint API in math.h. It depends on the -ffast-math option. When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1), we will convert it into below insns (reference the implementation of llvm). * frflags

[PATCH v1] RISC-V: Support FP rint auto-vectorization

2023-09-26 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the rint API in math.h. It depends on the -ffast-math option. When we would like to call rint/rintf like v2 = rint (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1 * vfcvt.f.

[PATCH v1] RISC-V: Support FP round auto-vectorization

2023-09-26 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the round API in math.h. It depends on the -ffast-math option. When we would like to call round/roundf like v2 = round (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1, RMM *

[PATCH v1] RISC-V: Support FP trunc auto-vectorization

2023-09-26 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the trunc API in math.h. It depends on the -ffast-math option. When we would like to call trunc/truncf like v2 = trunc (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.rtz.x.f v3, v1 *

[PATCH v1] RISC-V: Support FP roundeven auto-vectorization

2023-09-27 Thread pan2 . li
From: Pan Li This patch would like to support auto-vectorization for the roundeven API in math.h. It depends on the -ffast-math option. When we would like to call roundeven like v2 = roundeven (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1,

[PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization

2023-09-27 Thread pan2 . li
From: Pan Li This patch would like to support the auto-vectorization from the INT64 to FP16. We take below steps for the conversion. * INT64 to FP32. * FP32 to FP16. Given sample code as below: void test_func (int64_t * __restrict a, _Float16 *b, unsigned n) { for (unsigned i = 0; i < n; i++)

[PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-26 Thread pan2 . li
From: Pan Li The zip benchmark of coremark-pro have one SAT_SUB like pattern but truncated as below: void test (uint16_t *x, unsigned b, unsigned n) { unsigned a = 0; register uint16_t *p = x; do { a = *--p; *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB } while
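The truncate-after-saturating-subtract expression quoted above can be sketched for a single element as follows. This is only an illustration of the arithmetic; the helper name `sat_sub_then_trunc` is hypothetical and comes neither from the patch nor from coremark-pro.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: widen the 16-bit value,
   do an unsigned saturating subtract in the wide type, then truncate.  */
uint16_t
sat_sub_then_trunc (uint16_t value, unsigned b)
{
  unsigned a = value;                      /* widen, as `a = *--p` does */
  return (uint16_t) (a >= b ? a - b : 0);  /* clamp at 0, then truncate */
}
```

Because `a` starts out as a 16-bit value and the saturating subtract never increases it, the final truncation in this sketch cannot lose bits.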

[PATCH v2] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-06-26 Thread pan2 . li
From: Pan Li This patch would like to add the middle-end presentation for the saturation truncation. Aka set the result of truncated value to the max value when overflow. It will take the pattern similar as below. Form 1: #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \ NT __attribute__((noinline))

[PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

2024-06-27 Thread pan2 . li
From: Pan Li This patch would like to support the form of unsigned scalar .SAT_ADD when one of the op is IMM. For example as below: Form IMM: #define DEF_SAT_U_ADD_IMM_FMT_1(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ {

[PATCH v1] Vect: Distribute truncation into .SAT_SUB operands

2024-06-29 Thread pan2 . li
From: Pan Li To get better vectorized code of .SAT_SUB, we would like to avoid the truncated operation for the assignment. For example, as below. unsigned int _1; unsigned int _2; _9 = (unsigned short int).SAT_SUB (_1, _2); If we make sure that the _1 is in the range of unsigned short int. S

[PATCH v1 3/4] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3

2024-06-30 Thread pan2 . li
From: Pan Li This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 3. Aka: Form 3: #define DEF_SAT_U_ADD_IMM_FMT_3(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_3 (T x) \

[PATCH v1 2/4] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2

2024-06-30 Thread pan2 . li
From: Pan Li This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 2. Aka: Form 2: #define DEF_SAT_U_ADD_IMM_FMT_2(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ { \ retu

[PATCH v1 1/4] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1

2024-06-30 Thread pan2 . li
From: Pan Li This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 1. Aka: Form 1: #define DEF_SAT_U_ADD_IMM_FMT_1(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ {\

[PATCH v1 4/4] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4

2024-06-30 Thread pan2 . li
From: Pan Li This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 4. Aka: Form 4: #define DEF_SAT_U_ADD_IMM_FMT_4(T)\ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_4 (T x)

[PATCH v1] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-01 Thread pan2 . li
From: Pan Li This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ {\

[PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

2024-07-01 Thread pan2 . li
From: Pan Li The .SAT_TRUNC has the input and output types, aka cvt from itype to otype and the sizeof (otype) < sizeof (itype). The previous patch only allows the sizeof (otype) == sizeof (itype) / 2. But actually we have 1/4 and 1/8 truncation. This patch would like to support more types tru
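As a rough illustration of the 1/2, 1/4 and 1/8 truncation widths mentioned above, the following C sketch clamps a wide unsigned value to the maximum of the narrow type. The macro `DEF_SAT_TRUNC_SKETCH` and the generated function names are assumptions made for this example; they are not the forms matched by the patch.

```c
#include <stdint.h>

/* Illustrative clamp-style saturating truncation.  The macro and the
   generated function names are assumptions for this sketch only.  */
#define DEF_SAT_TRUNC_SKETCH(NT, WT, NT_MAX)       \
  NT sat_trunc_##WT##_to_##NT (WT x)               \
  {                                                \
    return x > (WT) NT_MAX ? (NT) NT_MAX : (NT) x; \
  }

DEF_SAT_TRUNC_SKETCH (uint32_t, uint64_t, UINT32_MAX) /* 1/2 width */
DEF_SAT_TRUNC_SKETCH (uint16_t, uint64_t, UINT16_MAX) /* 1/4 width */
DEF_SAT_TRUNC_SKETCH (uint8_t, uint64_t, UINT8_MAX)   /* 1/8 width */
```

Values that fit in the narrow type pass through unchanged; anything larger saturates to the narrow type's maximum.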

[PATCH v2] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-01 Thread pan2 . li
From: Pan Li Update in v2: Rebase the upstream. Log in v1: This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \

[PATCH v1] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-02 Thread pan2 . li
From: Pan Li This patch would like to support the .SAT_TRUNC for the unsigned vector int. Given we have below example code: Form 1 #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \ void __attribute__((noinline)) \ vec_sat_u_truc_#

[PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-02 Thread pan2 . li
From: Pan Li This patch would like to support the .SAT_TRUNC for the unsigned vector int. Given we have below example code: Form 1 #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \ void __attribute__((noinline)) \ vec_sat_u_truc_#

[PATCH v2] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-02 Thread pan2 . li
From: Pan Li This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ {\

[PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB

2024-07-02 Thread pan2 . li
From: Pan Li It seems that the asm check is incorrect for truncated after SAT_SUB, we should take the vx check for vssubu instead of vv check. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: Update vssubu check from vv to vx. * gcc.

[PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread pan2 . li
From: Pan Li According to the ISA, the zvfhmin sub extension should only contain conversion insns. Thus, the vfmv insn acting on FP16 should not be present when only the zvfhmin option is given. This patch would like to fix it by splitting the pred_broadcast define_insn into zvfhmin and zvfh parts.

[PATCH v1] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-04 Thread pan2 . li
From: Pan Li This patch would like to implement the .SAT_TRUNC for the RISC-V backend. With the help of the RVV Vector Narrowing Fixed-Point Clip Instructions. The below SEW(S) are supported: * e64 => e32 * e64 => e16 * e64 => e8 * e32 => e16 * e32 => e8 * e16 => e8 Take below example to see

[PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-05 Thread pan2 . li
From: Pan Li This patch would like to add form 2 support for the .SAT_TRUNC. Aka: Form 2: #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ {\ bool overflow = x > (

[PATCH v2] Vect: Distribute truncation into .SAT_SUB operands

2024-07-05 Thread pan2 . li
From: Pan Li To get better vectorized code of .SAT_SUB, we would like to avoid the truncated operation for the assignment. For example, as below. unsigned int _1; unsigned int _2; _9 = (unsigned short int).SAT_SUB (_1, _2); If we make sure that the _1 is in the range of unsigned short int. S

[PATCH v2] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread pan2 . li
From: Pan Li This patch would like to implement the .SAT_TRUNC for the RISC-V backend. With the help of the RVV Vector Narrowing Fixed-Point Clip Instructions. The below SEW(S) are supported: * e64 => e32 * e64 => e16 * e64 => e8 * e32 => e16 * e32 => e8 * e16 => e8 Take below example to see

[PATCH v3] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread pan2 . li
From: Pan Li This patch would like to implement the .SAT_TRUNC for the RISC-V backend. With the help of the RVV Vector Narrowing Fixed-Point Clip Instructions. The below SEW(S) are supported: * e64 => e32 * e64 => e16 * e64 => e8 * e32 => e16 * e32 => e8 * e16 => e8 Take below example to see

[PATCH v1 2/2] RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 2

2024-07-08 Thread pan2 . li
From: Pan Li After the middle-end supported the vector mode of .SAT_ADD, add more testcases to ensure the correctness of RISC-V backend for form 2. Aka: Form 2: #define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM) \ T __attribute__((noinline))

[PATCH v1 1/2] RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 1

2024-07-08 Thread pan2 . li
From: Pan Li After the middle-end supported the vector mode of .SAT_ADD, add more testcases to ensure the correctness of RISC-V backend for form 1. Aka: Form 1: #define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM) \ T __attribute__((noinline))

[PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

2024-07-08 Thread pan2 . li
From: Pan Li To get better vectorized code of .SAT_SUB, we would like to avoid the truncated operation for the assignment. For example, as below. unsigned int _1; unsigned int _2; unsigned short int _4; _9 = (unsigned short int).SAT_SUB (_1, _2); If we make sure that the _1 is in the range of

[PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-10 Thread pan2 . li
From: Pan Li The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST. For example _1 = .SAT_ADD (_2, 9) comes from below sample code. Form 3: #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline))

[PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread pan2 . li
From: Pan Li This patch would like to add the test cases for the vector .SAT_SUB in the zip benchmark. Aka: Form in zip benchmark: #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \ void __attribute__((noinline))\ vec_sat_u_sub_##T1##_#

[PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-15 Thread pan2 . li
From: Pan Li Update in v3: * Rebase the upstream. * Adjust asm check. Original log: This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_t

[PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread pan2 . li
From: Pan Li The .SAT_TRUNC matching doesn't check the type has mode precision. Thus when bitfield like below will be recog as .SAT_TRUNC. struct e { unsigned pre : 12; unsigned a : 4; }; __attribute__((noipa)) void bug (e * v, unsigned def, unsigned use) { e & defE = *v; defE.a = min_

[PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes

2024-07-17 Thread pan2 . li
From: Pan Li This patch would like to add the doc for the Standard-Names ustrunc and sstrunc, include both the scalar and vector integer modes. gcc/ChangeLog: * doc/md.texi: Add Standard-Names ustrunc and sstrunc. Signed-off-by: Pan Li --- gcc/doc/md.texi | 12 1 file c

[PATCH v2] Doc: Add Standard-Names ustrunc and sstrunc for integer modes

2024-07-17 Thread pan2 . li
From: Pan Li This patch would like to add the doc for the Standard-Names ustrunc and sstrunc, include both the scalar and vector integer modes. gcc/ChangeLog: * doc/md.texi: Add Standard-Names ustrunc and sstrunc. Signed-off-by: Pan Li --- gcc/doc/md.texi | 12 1 file c

[PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread pan2 . li
From: Pan Li The SAT_TRUNC form 2 has below pattern matching. From: _18 = MIN_EXPR ; iftmp.0_11 = (unsigned int) _18; To: _18 = MIN_EXPR ; iftmp.0_11 = .SAT_TRUNC (_18); But if there is another use of _18 like below, the transform to the .SAT_TRUNC may bring no benefit. For example:

[PATCH v1] Internal-fn: Only allow modes describe types for internal fn[PR115961]

2024-07-18 Thread pan2 . li
From: Pan Li The direct_internal_fn_supported_p has no restrictions for the type modes. For example the bitfield like below will be recog as .SAT_TRUNC. struct e { unsigned pre : 12; unsigned a : 4; }; __attribute__((noipa)) void bug (e * v, unsigned def, unsigned use) { e & defE = *v;

[PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-19 Thread pan2 . li
From: Pan Li The direct_internal_fn_supported_p has no restrictions for the type modes. For example the bitfield like below will be recog as .SAT_TRUNC. struct e { unsigned pre : 12; unsigned a : 4; }; __attribute__((noipa)) void bug (e * v, unsigned def, unsigned use) { e & defE = *v;

[PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-14 Thread pan2 . li
From: Pan Li This patch would like to add the middle-end presentation for the saturation add. Aka set the result of add to the max when overflow. It will take the pattern similar as below. SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) Take uint8_t as example, we will have: * SAT_AD
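The formula quoted above, SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)), can be written out for uint8_t as a small C sketch. The function name `sat_add_u8` is chosen for this example only.

```c
#include <stdint.h>

/* Branchless unsigned saturating add for uint8_t, following the quoted
   formula.  The function name is an assumption for this sketch.  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);              /* wraps modulo 256 */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x); /* 0xff if wrapped, else 0 */
  return sum | mask;
}
```

When the addition wraps, the sum is smaller than `x`, so the mask becomes all ones and the result is forced to 255; otherwise the mask is zero and the plain sum is returned.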

[PATCH v5 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-14 Thread pan2 . li
From: Pan Li The patch implement the SAT_ADD in the riscv backend as the sample for both the scalar and vector. Given below vector as example: void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) { unsigned i; for (i = 0; i < n; i++) out[i] = (x[i] + y[i]) | (- (u

[PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int

2024-05-14 Thread pan2 . li
From: Pan Li For vectorize, we leverage the existing vect pattern recog to find the pattern similar to scalar and let the vectorizer to perform the rest part for standard name usadd3 in vector mode. The riscv vector backend have insn "Vector Single-Width Saturating Add and Subtract" which can be

[PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-15 Thread pan2 . li
From: Pan Li This patch adds early break auto-vectorization support for target which use length on partial vectorization. Consider this following example: unsigned vect_a[802]; unsigned vect_b[802]; void test (unsigned x, int n) { for (int i = 0; i < n; i++) { vect_b[i] = x + i; i

[PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-15 Thread pan2 . li
From: Pan Li After we supported vectorizable early exit in RISC-V, we would like to enable the gcc vect test for vectorizable early exit. The vect-early-break_124-pr114403.c fails to vectorize for now, because the __builtin_memcpy with 8 bytes failed to be folded into an int64 assignment during

[PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-15 Thread pan2 . li
From: Pan Li After we support the loop lens for the vectorizable, we would like to implement the feature for the RISC-V target. Given below example: unsigned vect_a[1923]; unsigned vect_b[1923]; void test (unsigned limit, int n) { for (int i = 0; i < n; i++) { vect_b[i] = limit +

[PATCH v6] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-17 Thread pan2 . li
From: Pan Li Update in v6: * Rebase upstream for conflict. Log for v5: The patch implement the SAT_ADD in the riscv backend as the sample for both the scalar and vector. Given below vector as example: void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) { unsigned i;

[PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-18 Thread pan2 . li
From: Pan Li There are several match patterns for SAT related cases, and there will be some duplicated code to check that the dest, op_0, op_1 have the same tree types. Aka ternary tree type matches. Thus, extract one helper function to do this and avoid match code duplication. The below test suites are pas

[PATCH v1 1/2] Match: Support __builtin_add_overflow for branchless unsigned SAT_ADD

2024-05-18 Thread pan2 . li
From: Pan Li This patch would like to support the branchless form for unsigned SAT_ADD when leverage __builtin_add_overflow. For example as below: uint64_t sat_add_u(uint64_t x, uint64_t y) { uint64_t ret; uint64_t overflow = __builtin_add_overflow (x, y, &ret); return (T)(-overflow) | r

[PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branchless unsigned SAT_ADD

2024-05-18 Thread pan2 . li
From: Pan Li After we support branchless __builtin_add_overflow unsigned SAT_ADD from the middle end, add more test cases to cover the functionalities. The below test suites are passed. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add

[PATCH v2] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread pan2 . li
From: Pan Li There are several match patterns for SAT related cases, and there will be some duplicated code to check that the dest, op_0, op_1 have the same tree types. Aka ternary tree type matches. Thus, extract one helper function to do this and avoid match code duplication. The below test suites are pas

[PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD

2024-05-20 Thread pan2 . li
From: Pan Li This patch would like to support the branch form for unsigned SAT_ADD. For example as below: uint64_t sat_add (uint64_t x, uint64_t y) { return (uint64_t) (x + y) >= x ? (x + y) : -1; } Different to the branchless version, we leverage the simplify to convert the branch version
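To make the relation between the two forms concrete, here is a small C sketch that places the branch form quoted above next to the branchless formula (x + y) | (-(TYPE)((TYPE)(x + y) < x)) from the SAT_ADD internal-fn patch. The function names are assumptions for this example; the sketch only shows that both forms compute the same value, not how the middle end matches them.

```c
#include <stdint.h>

/* Branch form, as quoted in the patch description.  */
uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : -1;
}

/* Branchless form, for comparison.  */
uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}
```

In the branch form the `-1` is converted to uint64_t by the usual arithmetic conversions, so the overflow case yields the maximum uint64_t value, the same as the all-ones mask in the branchless form.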

[PATCH v1 2/2] RISC-V: Add test cases for branch form unsigned SAT_ADD

2024-05-20 Thread pan2 . li
From: Pan Li After we support branch form unsigned SAT_ADD from the middle end, add more test cases to cover the functionalities. The below test suites are passed. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add branch form test macro

[PATCH v3] Match: Extract ternary_integer_types_match_p helper func [NFC]

2024-05-20 Thread pan2 . li
From: Pan Li There are several match patterns for SAT related cases, and there will be some duplicated code to check that the dest, op_0, op_1 have the same tree types. Aka ternary tree type matches. Thus, extract one helper function to do this and avoid match code duplication. The below test suites are pas

[PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branch form unsigned SAT_ADD

2024-05-21 Thread pan2 . li
From: Pan Li After we support __builtin_add_overflow branch form unsigned SAT_ADD from the middle end, add more test cases to cover the functionalities. The below test suites are passed. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Ad

[PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread pan2 . li
From: Pan Li This patch would like to support the __builtin_add_overflow branch form for unsigned SAT_ADD. For example as below: uint64_t sat_add (uint64_t x, uint64_t y) { uint64_t ret; return __builtin_add_overflow (x, y, &ret) ? -1 : ret; } Different to the branchless version, we lever
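The __builtin_add_overflow branch form quoted above can be tried as a standalone C function. This sketch assumes a compiler that provides `__builtin_add_overflow` (GCC and Clang do); the function name is chosen for this example, and `UINT64_MAX` is written out where the quoted code uses `-1`.

```c
#include <stdint.h>

/* Branch form built on __builtin_add_overflow, which returns true when
   the addition overflows and stores the wrapped sum in *ret.  */
uint64_t
sat_add_overflow_branch (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? UINT64_MAX : ret;
}
```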

[PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread pan2 . li
From: Pan Li This patch would like to support the __builtin_add_overflow branch form for unsigned SAT_ADD. For example as below: uint64_t sat_add (uint64_t x, uint64_t y) { uint64_t ret; return __builtin_add_overflow (x, y, &ret) ? -1 : ret; } Different to the branchless version, we lever

[PATCH v2] Match: Support branch form for unsigned SAT_ADD

2024-05-21 Thread pan2 . li
From: Pan Li This patch would like to support the branch form for unsigned SAT_ADD. For example as below: uint64_t sat_add (uint64_t x, uint64_t y) { return (uint64_t) (x + y) >= x ? (x + y) : -1; } Different to the branchless version, we leverage the simplify to convert the branch version

[PATCH v4] Match: Add overloaded types_match to avoid code dup [NFC]

2024-05-22 Thread pan2 . li
From: Pan Li There are several match patterns for SAT related cases, and there will be some duplicated code to check that the dest, op_0, op_1 have the same tree types. Aka ternary tree type matches. Thus, add an overloaded types_match func to do this and avoid match code duplication. The below test suites are p

[PATCH v1] Gen-Match: Fix gen_kids_1 right hand braces mis-alignment

2024-05-25 Thread pan2 . li
From: Pan Li Notice some mis-alignment for gen_kids_1 right hand braces as below: if ((_q50 == _q20 && ! TREE_SIDE_EFFECTS (... { if ((_q51 == _q21 && ! TREE_SIDE_EFFECTS (... {

[PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

2024-05-26 Thread pan2 . li
From: Pan Li After we support one gassign form of the unsigned .SAT_ADD, we would like to support more forms including both the branch and branchless. There are 5 other forms of .SAT_ADD, list as below: Form 1: #define SAT_ADD_U_1(T) \ T sat_add_u_1_##T(T x, T y) \ { \ return (T)(x

[PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

2024-05-27 Thread pan2 . li
From: Pan Li This patch would like to add new internal functions for the below 2 IFNs. * mask_len_strided_load * mask_len_strided_store The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias). The GIMPLE MASK_LE

[PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-05-28 Thread pan2 . li
From: Pan Li This patch would like to add the middle-end representation for the saturating sub. Aka set the result of the sub to the min on underflow. It will take the pattern similar as below. SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y)); For example for uint8_t, we have * SAT_SUB (255, 0) =>
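
A minimal sketch of the pattern quoted above, instantiated for `uint8_t`. The names `sat_sub_u8` and `sat_sub_u8_ref` are hypothetical. The reference version with an explicit branch is added here only for comparison and is not taken from the patch.

```c
#include <stdint.h>

/* Branchless pattern from the message: (x - y) & (-(TYPE)(x >= y)).
   When x >= y the mask is all ones and the difference passes through;
   otherwise the mask is zero and the result saturates to 0.  */
uint8_t sat_sub_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x - y) & (uint8_t) -(uint8_t) (x >= y);
}

/* Hypothetical reference with an explicit branch, for comparison.  */
uint8_t sat_sub_u8_ref (uint8_t x, uint8_t y)
{
  return x >= y ? (uint8_t) (x - y) : 0;
}
```

The two functions agree for every pair of `uint8_t` inputs, which can be checked exhaustively with a 256 by 256 loop.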

[PATCH v1] Vect: Support IFN SAT_SUB for unsigned vector int

2024-05-29 Thread pan2 . li
From: Pan Li This patch would like to support the .SAT_SUB for the unsigned vector int. Given we have below example code: void vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) { for (unsigned i = 0; i < n; i++) out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i])); }
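
The loop from the message, reformatted, next to a hypothetical element-wise reference (`ref_sat_sub_u64`, not part of the patch) that spells out the intended saturating behaviour. This is only a sketch of the semantics. Whether the loop is actually vectorized depends on the compiler version, target, and flags.

```c
#include <stdint.h>

/* Loop as quoted in the message: branchless saturating subtraction
   applied to each element.  */
void vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t) (x[i] >= y[i]));
}

/* Hypothetical reference: the same result written with an explicit
   branch, clamping to 0 when y[i] exceeds x[i].  */
void ref_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = x[i] >= y[i] ? x[i] - y[i] : 0;
}
```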

[PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li After we support one gassign form of the unsigned .SAT_ADD, we would like to support more forms including both the branch and branchless. There are 5 other forms of .SAT_ADD, list as below: Form 1: #define SAT_ADD_U_1(T) \ T sat_add_u_1_##T(T x, T y) \ { \ return (T)(x

[PATCH v5] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li Update in v5 * Fix some doc build error. Log in v4: After we support one gassign form of the unsigned .SAT_ADD, we would like to support more forms including both the branch and branchless. There are 5 other forms of .SAT_ADD, list as below: Form 1: #define SAT_ADD_U_1(T) \

[PATCH v6] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li Update in v6 * Fix more doc build error. Update in v5 * Fix some doc build error. Log in v4: After we support one gassign form of the unsigned .SAT_ADD, we would like to support more forms including both the branch and branchless. There are 5 other forms of .SAT_ADD, list as bel

[PATCH v1 5/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 5

2024-06-02 Thread pan2 . li
From: Pan Li After the middle-end supports form 5 of unsigned SAT_ADD and the RISC-V backend implements the scalar .SAT_ADD, add more test cases to cover form 5 of unsigned .SAT_ADD. Form 5: #define SAT_ADD_U_5(T) \ T sat_add_u_5_##T(T x, T y) \ { \ return (T)(x + y) < x ? -1 : (x

[PATCH v1 2/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 2

2024-06-02 Thread pan2 . li
From: Pan Li After the middle-end supports form 2 of unsigned SAT_ADD and the RISC-V backend implements the scalar .SAT_ADD, add more test cases to cover form 2 of unsigned .SAT_ADD. Form 2: #define SAT_ADD_U_2(T) \ T sat_add_u_2_##T(T x, T y) \ { \ T ret; \ T overflow = __bu

[PATCH v1 4/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 4

2024-06-02 Thread pan2 . li
From: Pan Li After the middle-end supports form 4 of unsigned SAT_ADD and the RISC-V backend implements the scalar .SAT_ADD, add more test cases to cover form 4 of unsigned .SAT_ADD. Form 4: #define SAT_ADD_U_4(T) \ T sat_add_u_4_##T (T x, T y) \ { \ T ret; \ return __builtin_

[PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

2024-06-02 Thread pan2 . li
From: Pan Li After the middle-end supports form 1 of unsigned SAT_ADD and the RISC-V backend implements the scalar .SAT_ADD, add more test cases to cover form 1 of unsigned .SAT_ADD. Form 1: #define SAT_ADD_U_1(T) \ T sat_add_u_1_##T(T x, T y) \ {

[PATCH v1 3/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 3

2024-06-02 Thread pan2 . li
From: Pan Li After the middle-end supports form 3 of unsigned SAT_ADD and the RISC-V backend implements the scalar .SAT_ADD, add more test cases to cover form 3 of unsigned .SAT_ADD. Form 3: #define SAT_ADD_U_3(T) \ T sat_add_u_3_##T (T x, T y) \ { \ T ret; \ return __builtin_

[PATCH v1] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-08 Thread pan2 . li
From: Pan Li For QI/HImode of .SAT_ADD, the operands may be sign-extended and the high bits of Xmode may be all 1 which is not expected. For example as below code. signed char b[1]; unsigned short c; signed char *d = b; int main() { b[0] = -40; c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig

[PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect

2024-08-10 Thread pan2 . li
From: Pan Li This patch would like to fix one ICE when rv64gcv_zvbb for vwsll. Consider below example. void vwsll_vv_test (short *restrict dst, char *restrict a, int *restrict b, int n) { for (int i = 0; i < n; i++) dst[i] = a[i] << b[i]; } It will hit the vwsll patter

[PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-11 Thread pan2 . li
From: Pan Li For QI/HImode of .SAT_ADD, the operands may be sign-extended and the high bits of Xmode may be all 1 which is not expected. For example as below code. signed char b[1]; unsigned short c; signed char *d = b; int main() { b[0] = -40; c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig

[PATCH v3] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-12 Thread pan2 . li
From: Pan Li For QI/HImode of .SAT_ADD, the operands may be sign-extended and the high bits of Xmode may be all 1 which is not expected. For example as below code. signed char b[1]; unsigned short c; signed char *d = b; int main() { b[0] = -40; c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig
