From: Pan Li
Update in v3:
* Take known_le instead of known_lt for vector size.
* Return NULL_RTX when gap is not equal 0 and not constant.
Update in v2:
* Move vector type support to get_stored_val.
Original log:
This patch would like to allow the vector mode in the
get_stored_val in the DSE.
From: Pan Li
This patch would like to support auto-vectorization of the FP APIs
below with different type sizes:
+-----------+----------+----------+
| API       | RV64     | RV32     |
+-----------+----------+----------+
| lrintf16  | HF => DI | HF => SI |
| llrintf16 | HF => DI | HF => DI |
From: Pan Li
The enhancement of mode-switching performs some optimization when
emitting the frm backup insn, so some redundant fsrm insns are removed
for the following test cases.
This patch would like to adjust the asm check for the above optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv
From: Pan Li
Update in v4:
* Merge upstream and removed some independent changes.
Update in v3:
* Take known_le instead of known_lt for vector size.
* Return NULL_RTX when gap is not equal 0 and not constant.
Update in v2:
* Move vector type support to get_stored_val.
Original log:
This patch
From: Pan Li
We take the vec_init element int mode when generating the mask for
case 2, but we actually don't need as many bits as the element.
The larger mode may introduce some unnecessary insns.
For example, given the code below:
typedef int64_t v16di __attribute__ ((vector_size (16 * 8)));
void __a
From: Pan Li
The exact_div requires the dividend to be an exact multiple of the
divisor. Unfortunately, this condition can be broken with zve32f in
some cases. For example,
potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]
This patch would like to ensure the precondition of exact_div
when get_vec_mode
From: Pan Li
When requiring the mode after get_vec_mode in riscv_legitimize_move,
there is a precondition that the mode exists. Otherwise we will
have E_VOIDmode and of course an ICE when required.
Typically we should first check the mode exists or not before
require, or more friendly like leverage e
From: Pan Li
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
* vfcvt.x.f v3, v1,
From: Pan Li
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
* vfcvt.x.f v3, v1,
From: Pan Li
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
* vfcvt.x.f v3, v1,
From: Pan Li
Update in v4:
* Add test for _Float16.
* Remove unnecessary macro in def.h for test.
Original log:
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
When we would like to call ceil/ceilf like v2 = ceil
From: Pan Li
The math.h may have problems in some environments, so take __builtin_xx
instead for testing.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c:
Remove reference to math.h.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.
From: Pan Li
Remove the -march and -mabi.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Remove arch and abi.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto.
Signed-off-by: Pan
From: Pan Li
Rename TEST_CEIL to TEST_UNARY_CALL for the underlying function
autovec patch testing.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/test-math.h: Rename.
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-
From: Pan Li
This patch would like to support auto-vectorization for the
floor API in math.h. It depends on the -ffast-math option.
When we would like to call floor/floorf like v2 = floor (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RDN
*
From: Pan Li
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c:
From: Pan Li
We vectorized below ceil code already.
void
test_ceil (float *out, float *in, int count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_ceilf (in[i]);
}
Before this patch:
vfmv.v.xv4,fa0 // can be removed
vfabs.v v0,v1
vmv1r.v v2,v1 // can be r
From: Pan Li
We vectorized below ceil code already.
void
test_ceil (float *out, float *in, int count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_ceilf (in[i]);
}
Before this patch:
vfmv.v.xv4,fa0 // can be removed
vfabs.v v0,v1
vmv1r.v v2,v1 // can be r
From: Pan Li
This patch would like to support auto-vectorization for the
floor API in math.h. It depends on the -ffast-math option.
When we would like to call floor/floorf like v2 = floor (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RDN
*
From: Pan Li
FP16 is not well reconciled when linking.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove.
Signed-off-by: Pan Li
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c | 39 ---
1 file changed, 39 deletions(-)
delete mod
From: Pan Li
This patch would like to support auto-vectorization for the
floor API in math.h. It depends on the -ffast-math option.
When we would like to call floor/floorf like v2 = floor (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RDN
*
From: Pan Li
When broadcasting the repeated element, we take the mask machine mode
by mistake. This patch would like to fix it by leveraging the machine
mode of the element.
The below test case in RV32 will be fixed.
* gcc/testsuite/gfortran.dg/overload_5.f90
PR target/111546
gcc/Change
From: Pan Li
When broadcasting the repeated element, we take the mask_int_mode
by mistake. This patch would like to fix it by leveraging the machine
mode of the element.
The below test case in RV32 will be fixed.
* gcc/testsuite/gfortran.dg/overload_5.f90
PR target/111546
gcc/ChangeLog:
From: Pan Li
This patch would like to support auto-vectorization for the
nearbyint API in math.h. It depends on the -ffast-math option.
When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1),
we will convert it into below insns (reference the implementation of llvm).
* frflags
From: Pan Li
The rounding related API shared one const, rename it to avoid
unnecessary redundant code.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (gen_ceil_const_fp): Remove.
(get_fp_rounding_coefficient): Rename.
(gen_floor_const_fp): Remove.
(expand_vec_ceil): Ta
From: Pan Li
This patch would like to support auto-vectorization for the
nearbyint API in math.h. It depends on the -ffast-math option.
When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1),
we will convert it into below insns (reference the implementation of llvm).
* frflags
From: Pan Li
This patch would like to support auto-vectorization for the
rint API in math.h. It depends on the -ffast-math option.
When we would like to call rint/rintf like v2 = rint (v1),
we will convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1
* vfcvt.f.
From: Pan Li
This patch would like to support auto-vectorization for the
round API in math.h. It depends on the -ffast-math option.
When we would like to call round/roundf like v2 = round (v1),
we will convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RMM
*
From: Pan Li
This patch would like to support auto-vectorization for the
trunc API in math.h. It depends on the -ffast-math option.
When we would like to call trunc/truncf like v2 = trunc (v1),
we will convert it into below insns (reference the implementation of
llvm).
* vfcvt.rtz.x.f v3, v1
*
From: Pan Li
This patch would like to support auto-vectorization for the
roundeven API in math.h. It depends on the -ffast-math option.
When we would like to call roundeven like v2 = roundeven (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1,
From: Pan Li
This patch would like to support the auto-vectorization from
the INT64 to FP16. We take below steps for the conversion.
* INT64 to FP32.
* FP32 to FP16.
Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
for (unsigned i = 0; i < n; i++)
From: Pan Li
The zip benchmark of coremark-pro has one SAT_SUB like pattern but
truncated as below:
void test (uint16_t *x, unsigned b, unsigned n)
{
unsigned a = 0;
register uint16_t *p = x;
do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
} while
From: Pan Li
This patch would like to add the middle-end presentation for the
saturation truncation. Aka set the result of truncated value to
the max value when overflow. It will take the pattern similar
as below.
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
NT __attribute__((noinline))
From: Pan Li
This patch would like to support the form of unsigned scalar .SAT_ADD
when one of the op is IMM. For example as below:
Form IMM:
#define DEF_SAT_U_ADD_IMM_FMT_1(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_1 (T x) \
{
From: Pan Li
To get better vectorized code of .SAT_SUB, we would like to avoid the
truncated operation for the assignment. For example, as below.
unsigned int _1;
unsigned int _2;
_9 = (unsigned short int).SAT_SUB (_1, _2);
If we make sure that the _1 is in the range of unsigned short int. S
From: Pan Li
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 3. Aka:
Form 3:
#define DEF_SAT_U_ADD_IMM_FMT_3(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_3 (T x) \
From: Pan Li
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 2. Aka:
Form 2:
#define DEF_SAT_U_ADD_IMM_FMT_2(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_2 (T x) \
{ \
retu
From: Pan Li
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 1. Aka:
Form 1:
#define DEF_SAT_U_ADD_IMM_FMT_1(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_1 (T x) \
{\
From: Pan Li
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 4. Aka:
Form 4:
#define DEF_SAT_U_ADD_IMM_FMT_4(T)\
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_4 (T x)
From: Pan Li
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
{\
From: Pan Li
The .SAT_TRUNC has the input and output types, aka cvt from
itype to otype and the sizeof (otype) < sizeof (itype). The
previous patch only allows the sizeof (otype) == sizeof (itype) / 2.
But actually we have 1/4 and 1/8 truncation.
This patch would like to support more types tru
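To make the 1/8 case concrete, here is a minimal C sketch of what a
saturating truncation from a 64-bit to an 8-bit unsigned type computes.
The function name sat_u_trunc_u64_to_u8 is hypothetical and only
illustrates the semantics; it is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): saturating truncation
   from uint64_t to uint8_t, i.e. sizeof (otype) == sizeof (itype) / 8.
   Values that do not fit in the narrow type clamp to its maximum.  */
uint8_t
sat_u_trunc_u64_to_u8 (uint64_t x)
{
  bool overflow = x > (uint64_t) UINT8_MAX;
  return overflow ? UINT8_MAX : (uint8_t) x;
}
```

Values up to 255 pass through unchanged; anything larger is clamped to
255 rather than wrapped.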
From: Pan Li
Update in v2:
Rebase the upstream.
Log in v1:
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
From: Pan Li
This patch would like to support the .SAT_TRUNC for the unsigned
vector int. Given we have below example code:
Form 1
#define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
void __attribute__((noinline)) \
vec_sat_u_truc_#
From: Pan Li
This patch would like to support the .SAT_TRUNC for the unsigned
vector int. Given we have below example code:
Form 1
#define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
void __attribute__((noinline)) \
vec_sat_u_truc_#
From: Pan Li
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
{\
From: Pan Li
It seems that the asm check is incorrect for truncated after SAT_SUB,
we should take the vx check for vssubu instead of vv check.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
Update vssubu check from vv to vx.
* gcc.
From: Pan Li
According to the ISA, the zvfhmin sub extension should only contain
conversion insns. Thus, the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.
This patch would like to fix it by splitting the pred_broadcast define_insn
into zvfhmin and zvfh parts.
From: Pan Li
This patch would like to implement the .SAT_TRUNC for the RISC-V
backend. With the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions. The below SEW(S) are supported:
* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8
Take below example to see
From: Pan Li
This patch would like to add form 2 support for the .SAT_TRUNC. Aka:
Form 2:
#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
{\
bool overflow = x > (
From: Pan Li
To get better vectorized code of .SAT_SUB, we would like to avoid the
truncated operation for the assignment. For example, as below.
unsigned int _1;
unsigned int _2;
_9 = (unsigned short int).SAT_SUB (_1, _2);
If we make sure that the _1 is in the range of unsigned short int. S
From: Pan Li
This patch would like to implement the .SAT_TRUNC for the RISC-V
backend. With the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions. The below SEW(S) are supported:
* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8
Take below example to see
From: Pan Li
This patch would like to implement the .SAT_TRUNC for the RISC-V
backend. With the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions. The below SEW(S) are supported:
* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8
Take below example to see
From: Pan Li
After the middle-end supported the vector mode of .SAT_ADD, add more
testcases to ensure the correctness of RISC-V backend for form 2. Aka:
Form 2:
#define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM) \
T __attribute__((noinline))
From: Pan Li
After the middle-end supported the vector mode of .SAT_ADD, add more
testcases to ensure the correctness of RISC-V backend for form 1. Aka:
Form 1:
#define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM) \
T __attribute__((noinline))
From: Pan Li
To get better vectorized code of .SAT_SUB, we would like to avoid the
truncated operation for the assignment. For example, as below.
unsigned int _1;
unsigned int _2;
unsigned short int _4;
_9 = (unsigned short int).SAT_SUB (_1, _2);
If we make sure that the _1 is in the range of
From: Pan Li
The .SAT_ADD has 2 operands and one of the operands may be INTEGER_CST.
For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.
Form 3:
#define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \
T __attribute__((noinline))
From: Pan Li
This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark. Aka:
Form in zip benchmark:
#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
void __attribute__((noinline))\
vec_sat_u_sub_##T1##_#
From: Pan Li
Update in v3:
* Rebase the upstream.
* Adjust asm check.
Original log:
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_t
From: Pan Li
The .SAT_TRUNC matching doesn't check that the type has mode precision.
Thus a bitfield like the one below will be recognized as .SAT_TRUNC.
struct e
{
unsigned pre : 12;
unsigned a : 4;
};
__attribute__((noipa))
void bug (e * v, unsigned def, unsigned use) {
e & defE = *v;
defE.a = min_
From: Pan Li
This patch would like to add the doc for the Standard-Names
ustrunc and sstrunc, include both the scalar and vector integer
modes.
gcc/ChangeLog:
* doc/md.texi: Add Standard-Names ustrunc and sstrunc.
Signed-off-by: Pan Li
---
gcc/doc/md.texi | 12
1 file c
From: Pan Li
This patch would like to add the doc for the Standard-Names
ustrunc and sstrunc, include both the scalar and vector integer
modes.
gcc/ChangeLog:
* doc/md.texi: Add Standard-Names ustrunc and sstrunc.
Signed-off-by: Pan Li
---
gcc/doc/md.texi | 12
1 file c
From: Pan Li
The SAT_TRUNC form 2 has below pattern matching.
From:
_18 = MIN_EXPR ;
iftmp.0_11 = (unsigned int) _18;
To:
_18 = MIN_EXPR ;
iftmp.0_11 = .SAT_TRUNC (_18);
But if there is another use of _18 like below, the transform to the
.SAT_TRUNC may bring no benefit. For example:
From: Pan Li
The direct_internal_fn_supported_p has no restrictions for the type
modes. For example, a bitfield like the one below will be recognized as .SAT_TRUNC.
struct e
{
unsigned pre : 12;
unsigned a : 4;
};
__attribute__((noipa))
void bug (e * v, unsigned def, unsigned use) {
e & defE = *v;
From: Pan Li
The direct_internal_fn_supported_p has no restrictions for the type
modes. For example, a bitfield like the one below will be recognized as .SAT_TRUNC.
struct e
{
unsigned pre : 12;
unsigned a : 4;
};
__attribute__((noipa))
void bug (e * v, unsigned def, unsigned use) {
e & defE = *v;
From: Pan Li
This patch would like to add the middle-end presentation for the
saturation add. Aka set the result of add to the max when overflow.
It will take the pattern similar as below.
SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
Take uint8_t as example, we will have:
* SAT_AD
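As a rough illustration of the branchless pattern above, the following
C sketch instantiates it for uint8_t. The function name sat_add_u8 is
hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): unsigned saturating add
   for uint8_t, following
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);      /* Wraps modulo 256.  */
  uint8_t mask = (uint8_t) -(sum < x);  /* 0xFF on overflow, else 0.  */
  return sum | mask;
}
```

When the wrapped sum is smaller than x the addition overflowed, so the
all-ones mask forces the result to the type maximum; otherwise the mask
is zero and the plain sum is returned.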
From: Pan Li
This patch implements SAT_ADD in the riscv backend as the
sample for both the scalar and vector. Take the vector code below
as an example:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
unsigned i;
for (i = 0; i < n; i++)
out[i] = (x[i] + y[i]) | (- (u
From: Pan Li
For vectorization, we leverage the existing vect pattern recog to find
the pattern similar to scalar and let the vectorizer perform
the rest for standard name usadd3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract" which can be
From: Pan Li
This patch adds early break auto-vectorization support for target which
use length on partial vectorization. Consider this following example:
unsigned vect_a[802];
unsigned vect_b[802];
void test (unsigned x, int n)
{
for (int i = 0; i < n; i++)
{
vect_b[i] = x + i;
i
From: Pan Li
After we supported vectorizable early exit in RISC-V, we would like to
enable the gcc vect tests for vectorizable early exit.
The vect-early-break_124-pr114403.c fails to vectorize for now,
because the __builtin_memcpy with 8 bytes failed to be folded into an
int64 assignment during
From: Pan Li
After we support the loop lens for the vectorizable, we would like to
implement the feature for the RISC-V target. Given below example:
unsigned vect_a[1923];
unsigned vect_b[1923];
void test (unsigned limit, int n)
{
for (int i = 0; i < n; i++)
{
vect_b[i] = limit +
From: Pan Li
Update in v6:
* Rebase upstream for conflict.
Log for v5:
This patch implements SAT_ADD in the riscv backend as the
sample for both the scalar and vector. Take the vector code below
as an example:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
unsigned i;
From: Pan Li
There are several kinds of match patterns for SAT related cases, so there
will be some duplicated code to check that the dest, op_0, and op_1 have the
same tree type, aka ternary tree type matches. Thus, extract one helper
function to do this and avoid match code duplication.
The below test suites are pas
From: Pan Li
This patch would like to support the branchless form for unsigned
SAT_ADD when leverage __builtin_add_overflow. For example as below:
uint64_t sat_add_u(uint64_t x, uint64_t y)
{
uint64_t ret;
uint64_t overflow = __builtin_add_overflow (x, y, &ret);
return (uint64_t)(-overflow) | r
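For reference, a self-contained sketch of this branchless form is shown
below. It assumes a compiler that provides __builtin_add_overflow (GCC
or Clang); the function name sat_add_u64_branchless is hypothetical and
the body is an illustration, not the exact test code from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): branchless unsigned
   saturating add built on __builtin_add_overflow.  */
uint64_t
sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  uint64_t ret;
  /* The builtin stores the wrapped sum in ret and returns true on
     overflow, which converts to 1 here.  */
  uint64_t overflow = __builtin_add_overflow (x, y, &ret);
  /* -overflow is all ones on overflow and 0 otherwise.  */
  return (uint64_t) (-overflow) | ret;
}
```

Negating the 0/1 overflow flag yields either zero or an all-ones mask,
so the OR returns the plain sum or UINT64_MAX without a branch.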
From: Pan Li
After we support branchless __builtin_add_overflow unsigned SAT_ADD from
the middle end, add more test cases to cover the functionalities.
The below test suites are passed.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add
From: Pan Li
There are several kinds of match patterns for SAT related cases, so there
will be some duplicated code to check that the dest, op_0, and op_1 have the
same tree type, aka ternary tree type matches. Thus, extract one helper
function to do this and avoid match code duplication.
The below test suites are pas
From: Pan Li
This patch would like to support the branch form for unsigned
SAT_ADD. For example as below:
uint64_t
sat_add (uint64_t x, uint64_t y)
{
return (uint64_t) (x + y) >= x ? (x + y) : -1;
}
Different to the branchless version, we leverage the simplify to
convert the branch version
From: Pan Li
After we support branch form unsigned SAT_ADD from the
middle end, add more test cases to cover the functionalities.
The below test suites are passed.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add branch form test macro
From: Pan Li
There are several kinds of match patterns for SAT related cases, so there
will be some duplicated code to check that the dest, op_0, and op_1 have the
same tree type, aka ternary tree type matches. Thus, extract one helper
function to do this and avoid match code duplication.
The below test suites are pas
From: Pan Li
After we support __builtin_add_overflow branch form unsigned SAT_ADD
from the middle end, add more test cases to cover the functionalities.
The below test suites are passed.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Ad
From: Pan Li
This patch would like to support the __builtin_add_overflow branch form for
unsigned SAT_ADD. For example as below:
uint64_t
sat_add (uint64_t x, uint64_t y)
{
uint64_t ret;
return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}
Different to the branchless version, we lever
From: Pan Li
This patch would like to support the __builtin_add_overflow branch form for
unsigned SAT_ADD. For example as below:
uint64_t
sat_add (uint64_t x, uint64_t y)
{
uint64_t ret;
return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}
Different to the branchless version, we lever
From: Pan Li
This patch would like to support the branch form for unsigned
SAT_ADD. For example as below:
uint64_t
sat_add (uint64_t x, uint64_t y)
{
return (uint64_t) (x + y) >= x ? (x + y) : -1;
}
Different to the branchless version, we leverage the simplify to
convert the branch version
From: Pan Li
There are several kinds of match patterns for SAT related cases, so there
will be some duplicated code to check that the dest, op_0, and op_1 have the
same tree type, aka ternary tree type matches. Thus, add an overloaded
types_match func to do this and avoid match code duplication.
The below test suites are p
From: Pan Li
Notice some mis-alignment for gen_kids_1 right hand braces as below:
if ((_q50 == _q20 && ! TREE_SIDE_EFFECTS (...
{
if ((_q51 == _q21 && ! TREE_SIDE_EFFECTS (...
{
From: Pan Li
After we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms including both the branch and
branchless. There are 5 other forms of .SAT_ADD, list as below:
Form 1:
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
return (T)(x
From: Pan Li
This patch would like to add new internal functions for the 2 IFNs below.
* mask_len_strided_load
* mask_len_strided_store
The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
The GIMPLE MASK_LE
From: Pan Li
This patch would like to add the middle-end presentation for the
saturation sub. Aka set the result of sub to the min when underflow.
It will take the pattern similar as below.
SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
For example for uint8_t, we have
* SAT_SUB (255, 0) =>
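As a rough illustration of the branchless pattern above, the following
C sketch instantiates it for uint8_t. The function name sat_sub_u8 is
hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): unsigned saturating sub
   for uint8_t, following
   SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y)).  */
uint8_t
sat_sub_u8 (uint8_t x, uint8_t y)
{
  uint8_t diff = (uint8_t) (x - y);     /* Wraps modulo 256.  */
  uint8_t mask = (uint8_t) -(x >= y);   /* 0xFF if no borrow, else 0.  */
  return diff & mask;
}
```

When x is smaller than y the mask is zero, so the result clamps to 0
instead of wrapping around.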
From: Pan Li
This patch would like to support the .SAT_SUB for the unsigned
vector int. Given we have below example code:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
for (unsigned i = 0; i < n; i++)
out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}
From: Pan Li
After we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms including both the branch and
branchless. There are 5 other forms of .SAT_ADD, list as below:
Form 1:
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
return (T)(x
From: Pan Li
Update in v5
* Fix some doc build error.
Log in v4:
After we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms including both the branch and
branchless. There are 5 other forms of .SAT_ADD, list as below:
Form 1:
#define SAT_ADD_U_1(T) \
From: Pan Li
Update in v6
* Fix more doc build error.
Update in v5
* Fix some doc build error.
Log in v4:
After we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms including both the branch and
branchless. There are 5 other forms of .SAT_ADD, list as bel
From: Pan Li
After the middle-end supports form 5 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 5 of unsigned .SAT_ADD.
Form 5:
#define SAT_ADD_U_5(T) \
T sat_add_u_5_##T(T x, T y) \
{ \
return (T)(x + y) < x ? -1 : (x
From: Pan Li
After the middle-end supports form 2 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 2 of unsigned .SAT_ADD.
Form 2:
#define SAT_ADD_U_2(T) \
T sat_add_u_2_##T(T x, T y) \
{ \
T ret; \
T overflow = __bu
From: Pan Li
After the middle-end supports form 4 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 4 of unsigned .SAT_ADD.
Form 4:
#define SAT_ADD_U_4(T) \
T sat_add_u_4_##T (T x, T y) \
{ \
T ret; \
return __builtin_
From: Pan Li
After the middle-end supports form 1 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 1 of unsigned .SAT_ADD.
Form 1:
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{
From: Pan Li
After the middle-end supports form 3 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 3 of unsigned .SAT_ADD.
Form 3:
#define SAT_ADD_U_3(T) \
T sat_add_u_3_##T (T x, T y) \
{ \
T ret; \
return __builtin_
From: Pan Li
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected. For example as
below code.
signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
b[0] = -40;
c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig
From: Pan Li
This patch would like to fix one ICE when rv64gcv_zvbb for vwsll.
Consider below example.
void vwsll_vv_test (short *restrict dst, char *restrict a,
int *restrict b, int n)
{
for (int i = 0; i < n; i++)
dst[i] = a[i] << b[i];
}
It will hit the vwsll patter
From: Pan Li
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected. For example as
below code.
signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
b[0] = -40;
c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig
From: Pan Li
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected. For example as
below code.
signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
b[0] = -40;
c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsig