[PATCH V2 0/3] RISC-V: Add an experimental vector calling convention

2023-08-10 Thread Lehua Ding
Hi RISC-V folks,

This patch implements the proposal of the RISC-V vector calling convention[1];
the feature can be enabled with the `--param=riscv-vector-abi` option.
Currently, all vector type arguments and return values are passed by
reference. With this patch, they can instead be passed in vector registers.
Only the vector types defined in the RISC-V Vector Extension Intrinsic
Document[2] are supported for now. GNU-ext vector types are unsupported since
the corresponding proposal has not been presented yet.

The proposal introduces a new calling convention variant; functions that
follow this variant must follow the vector register convention below.

| Name    | ABI Mnemonic | Meaning                | Preserved across calls?
==========================================================================
| v0      |              | Argument register      | No
| v1-v7   |              | Callee-saved registers | Yes
| v8-v23  |              | Argument registers     | No
| v24-v31 |              | Callee-saved registers | Yes

If a function follows this vector calling convention, then its symbol must be
annotated with the .variant_cc directive[3] (used to indicate that it follows
a calling convention variant).
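
As a quick sketch of what this looks like in practice, a testsuite-style
example might be (the option spelling and the directives here are assumptions
based on this cover letter, not the actual committed test):

/* { dg-do compile } */
/* { dg-options "-O2 -march=rv64gcv -mabi=lp64d --param=riscv-vector-abi" } */

#include "riscv_vector.h"

/* With the vector ABI enabled, `a` and `b` are passed in v8 and v9, the
   result is returned in v8, and the symbol is marked as a CC variant.  */
vint32m1_t
foo (vint32m1_t a, vint32m1_t b)
{
  return b;
}

/* { dg-final { scan-assembler {\.variant_cc\tfoo} } } */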

This implementation is split into three parts, each corresponding to a
sub-patch.

- Part-1: Select suitable vector registers for vector type arguments and
  return values according to the proposal.
- Part-2: Allocate frame area for callee-saved vector registers and save/restore
  them in prologue and epilogue.
- Part-3: Generate .variant_cc directive for vector function in assembly code.

Best,
Lehua

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389
[2] 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system
[3] 
https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops

Lehua Ding (3):
  RISC-V: Part-1: Select suitable vector registers for vector type args
and returns
  RISC-V: Part-2: Save/Restore vector registers which need to be
preserved
  RISC-V: Part-3: Output .variant_cc directive for vector function

 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  10 +
 gcc/config/riscv/riscv.cc | 505 --
 gcc/config/riscv/riscv.h  |  40 ++
 gcc/config/riscv/riscv.md |  43 +-
 gcc/config/riscv/riscv.opt|   5 +
 .../riscv/rvv/base/abi-call-args-1-run.c  | 127 +
 .../riscv/rvv/base/abi-call-args-1.c  | 197 +++
 .../riscv/rvv/base/abi-call-args-2-run.c  |  34 ++
 .../riscv/rvv/base/abi-call-args-2.c  |  27 +
 .../riscv/rvv/base/abi-call-args-3-run.c  | 260 +
 .../riscv/rvv/base/abi-call-args-3.c  | 116 
 .../riscv/rvv/base/abi-call-args-4-run.c  | 145 +
 .../riscv/rvv/base/abi-call-args-4.c  | 111 
 .../riscv/rvv/base/abi-call-error-1.c |  11 +
 .../riscv/rvv/base/abi-call-return-run.c  | 127 +
 .../riscv/rvv/base/abi-call-return.c  | 197 +++
 .../riscv/rvv/base/abi-call-variant_cc.c  |  39 ++
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
 .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 
 23 files changed, 2322 insertions(+), 62 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-error-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

-- 
2.36.3



[PATCH V2 1/3] RISC-V: Part-1: Select suitable vector registers for vector type args and returns

2023-08-10 Thread Lehua Ding
Below I have posted the vector register calling convention rules from the
proposal[1]:

v0 is used to pass the first vector mask argument to a function and to return
a vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the remaining vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL gives the number of
vector registers in the group and requires that the first register number in
the group be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so
the v8-v15 register group can be allocated to this type, but v9-v16 cannot,
because the v9 register number is not a multiple of 8. If LMUL is less than 1,
it is treated as 1. If the type is a vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or the remaining vector mask arguments, starting
from the v8 register: if a vector register group between v8-v23 that has not
been allocated can be found and its first register number is a multiple of
LMUL, then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for suitable vector register
groups starts at v8 each time; it does not start at the next register after
those allocated to the previous vector argument. Therefore, the vector
register number allocated to a vector argument can be lower than the vector
register numbers allocated to previous vector arguments. For example, for the
function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to
the allocation rules, v8 will be allocated to `a`, v10-v11 will be allocated
to `b`, and v9 will be allocated to `c`. This approach allows more vector
registers to be allocated to arguments in some cases; a first-fit sketch of
the search is shown below.
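
A first-fit sketch of this search, assuming a value needs LMUL x NFIELDS
consecutive registers aligned to LMUL (the helper and its names are
illustrative, not the patch's actual code):

/* Return the first register of a free, LMUL-aligned group in v8-v23 that
   can hold LMUL * NFIELDS registers, or -1 to pass by reference.  USED is
   a bitmask over v0-v31 of registers already allocated.  */
static int
allocate_v_regs (unsigned *used, int lmul, int nfields)
{
  int nregs = lmul * nfields;
  /* The search restarts at v8 for every argument; stepping by LMUL keeps
     the first register number a multiple of LMUL.  */
  for (int regno = 8; regno + nregs - 1 <= 23; regno += lmul)
    {
      unsigned mask = ((1u << nregs) - 1) << regno;
      if ((*used & mask) == 0)
        {
          *used |= mask;
          return regno;
        }
    }
  return -1;
}

For `foo` above this yields v8 for `a`, v10-v11 for `b`, and then v9 for
`c`, because the search for `c` restarts at v8.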

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for 
checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for checking vector
return values.
(riscv_arguments_is_vector_type_p): New function for checking vector
arguments.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.

---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc |  10 +
 gcc/config/riscv/riscv.cc | 235 ++--
 gcc/config/riscv/riscv.h  |  25 ++
 gcc/config/riscv/riscv.opt|   5 +
 .../riscv/rvv/base/abi-call-args-1-run.c  | 127 +
 .../riscv/rvv/base/abi-call-args-1.c  | 197 +
 .../riscv/rvv/base/abi-call-args-2-run.c  |  34 +++
 .../riscv/rvv/base/abi-call-args-2.c  |  27 ++
 .../riscv/rvv/base/abi-call-args-3-run.c  | 260 ++
 .../riscv/rvv/base/abi-call-args-3.c  | 116 
 .../riscv/rvv/base/abi-call-a

[PATCH V2 2/3] RISC-V: Part-2: Save/Restore vector registers which need to be preserved

2023-08-10 Thread Lehua Ding
Functions that follow the vector calling convention variant have callee-saved
vector registers, while functions that follow the standard calling convention
do not. We need to distinguish which convention a callee follows so that we
can tell GCC exactly which vector registers the callee will clobber. So I
encode the callee's calling convention information into the call rtx pattern,
as AArch64 does. The old operands 2 and 3 of the call pattern, which were
copied from the MIPS target, are useless according to my analysis and have
been removed; a sketch of how the callee's ABI is recovered from the call
pattern follows below.
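
A sketch of the resulting TARGET_INSN_CALLEE_ABI hook, assuming the same
PARALLEL/UNSPEC_CALLEE_CC layout that the riscv-sr.cc hunk below relies on
(the body here is illustrative, not a verbatim copy of the patch):

static const predefined_function_abi &
riscv_insn_callee_abi (const rtx_insn *insn)
{
  /* The callee's calling convention travels as a CONST_INT wrapped in an
     UNSPEC_CALLEE_CC element of the call PARALLEL.  */
  rtx unspec = XVECEXP (PATTERN (insn), 0, 1);
  gcc_assert (GET_CODE (unspec) == UNSPEC
              && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
  return function_abis[INTVAL (XVECEXP (unspec, 0, 0))];
}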

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): 
Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
(riscv_frame_info::reset): Reset new fields.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call pattern.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_save_libcall_count): Add vector save area.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function to save vector registers.
(riscv_first_stack_step): Handle functions with vector callee-saved
registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.
---
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv.cc | 222 +++---
 gcc/config/riscv/riscv.md |  43 +++-
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
 .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +
 7 files changed, 606 insertions(+), 45 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

diff --git a/gcc/config/riscv/riscv-sr.cc b/gcc/config/riscv/riscv-sr.cc
index 7248f04d68f..e6e17685df5 100644
--- a/gcc/config/riscv/riscv-sr.cc
+++ b/gcc/config/riscv/riscv-sr.cc
@@ -447,12 +447,18 @@ riscv_remove_unneeded_save_restore_calls (void)
   && !SIBCALL_REG_P (REGNO (target)))
 return;
 
+  /* Extract RISCV CC from the UNSPEC rtx.  */
+  rtx unspec = XVECEXP (callpat, 0, 1);
+  gcc_assert (GET_CODE (unspec) == UNSPEC
+	      && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
+  riscv_cc cc = (riscv_cc) INTVAL (XVECEXP (unspec, 0, 0));
   rtx sibcall = NULL;
   if (set_target != NULL)
-    sibcall
-      = gen_sibcall_value_internal (set_target, target, const0_rtx);
+    sibcall = gen_sibcall_value_internal (set_target, target, const0_rtx,
+					  gen_int_mode (cc, SImode));
   else
-    sibcall = gen_sibcall_internal (target, const0_rtx);
+    sibcall
+      = gen_sibcall_internal (target, const0_rtx, gen_int_mode (cc, SImode));
 
   rtx_insn *before_call = PREV_INSN (call);
   remove_insn (call);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index aa6b46d7611..09c9e09e83a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -108,6 +108,9 @@ struct GTY(())  riscv_frame_info {
   /* Likewise FPR X.  */
   unsigned int fmask;
 
+  /* Likewise for vector registers.  */
+  unsigned int vmask;
+
   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
   unsigned save_libcall_adjustment;
 
@@ -115,6 +118,10 @@ struct GTY(())  riscv_frame_info {
   poly_int64 gp_sp_offset;
   poly_int64 fp_sp_offset;
 
+  /* Top and bottom offsets of vector save areas from frame bottom.  */
+  poly_int64 v_sp_offset_top;
+  poly_int64 v_sp_offset_bottom;
+
   /* Offset of virtual frame pointer from stack pointer/frame bottom */
   poly_int64 frame_pointer_offset;
 
@@ -265,7 +272,7 @@ unsigned riscv_stack_boundary;
 /* If non-zero, this is an offset to be added to SP to redefine the CFA
when restoring the FP register from the stack.  Only valid when generating
the epilogue.  */
-static int epilogue_cfa_sp_offset;
+static poly_int64 epilogue_cfa_sp_offset;
 
 /* Which tuning parameters to use.  */
 static const struct riscv_tune_param *tune_param;

[PATCH V2 3/3] RISC-V: Part-3: Output .variant_cc directive for vector function

2023-08-10 Thread Lehua Ding
Functions that follow the vector calling convention variant need to be
annotated with the .variant_cc directive according to the RISC-V Assembly
Programmer's Manual[1] and the RISC-V ELF Specification[2].

[1] 
https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):  Output 
.variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME): Implement 
ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.
---
 gcc/config/riscv/riscv-protos.h   |  3 ++
 gcc/config/riscv/riscv.cc | 48 +++
 gcc/config/riscv/riscv.h  | 15 ++
 .../riscv/rvv/base/abi-call-variant_cc.c  | 39 +++
 4 files changed, 105 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4a67297173d..260ba2f9a49 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -101,6 +101,9 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern void riscv_declare_function_name (FILE *, const char *, tree);
+extern void riscv_asm_output_alias (FILE *, const tree, const tree);
+extern void riscv_asm_output_external (FILE *, const tree, const char *);
 
 #ifdef RTX_CODE
extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 09c9e09e83a..83cf7a5da82 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7013,6 +7013,54 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Output .variant_cc for function symbol which follows vector calling
+   convention.  */
+
+static void
+riscv_asm_output_variant_cc (FILE *stream, const tree decl, const char *name)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+    {
+      riscv_cc cc = (riscv_cc) fndecl_abi (decl).id ();
+      if (cc == RISCV_CC_V)
+	{
+	  fprintf (stream, "\t.variant_cc\t");
+	  assemble_name (stream, name);
+	  fprintf (stream, "\n");
+	}
+    }
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  */
+
+void
+riscv_declare_function_name (FILE *stream, const char *name, tree fndecl)
+{
+  riscv_asm_output_variant_cc (stream, fndecl, name);
+  ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
+  ASM_OUTPUT_LABEL (stream, name);
+}
+
+/* Implement ASM_OUTPUT_DEF_FROM_DECLS.  */
+
+void
+riscv_asm_output_alias (FILE *stream, const tree decl, const tree target)
+{
+  const char *name = XSTR (XEXP (DECL_RTL (decl), 0), 0);
+  const char *value = IDENTIFIER_POINTER (target);
+  riscv_asm_output_variant_cc (stream, decl, name);
+  ASM_OUTPUT_DEF (stream, name, value);
+}
+
+/* Implement ASM_OUTPUT_EXTERNAL.  */
+
+void
+riscv_asm_output_external (FILE *stream, tree decl, const char *name)
+{
+  default_elf_asm_output_external (stream, decl, name);
+  riscv_asm_output_variant_cc (stream, decl, name);
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index b24b240dd75..1820593bab5 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1021,6 +1021,21 @@ while (0)
 
 #define ASM_COMMENT_START "#"
 
+/* Add output .variant_cc directive for specific function definition.  */
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL) \
+  riscv_declare_function_name (STR, NAME, DECL)
+
+/* Add output .variant_cc directive for specific alias definition.  */
+#undef ASM_OUTPUT_DEF_FROM_DECLS
+#define ASM_OUTPUT_DEF_FROM_DECLS(STR, DECL, TARGET) \
+  riscv_asm_output_alias (STR, DECL, TARGET)
+
+/* Add output .variant_cc directive for specific extern function.  */
+#undef ASM_OUTPUT_EXTERNAL
+#define ASM_OUTPUT_EXTERNAL(STR, DECL, NAME) \
+  riscv_asm_output_external (STR, DECL, NAME)
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (POINTER_SIZE == 64 ? "long unsigned int" : "unsigned int")
 
diff --git a/gcc/testsu

Re: [PATCH 1/3] RISC-V: Part-1: Select suitable vector registers for vector type args and returns

2023-08-10 Thread Lehua Ding
Thanks so much for Kito's online and offline comments.
I have uploaded the V2 patches, which address all comments.


https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626935.html


Best,
Lehua





Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 06:55:05AM +, Richard Biener wrote:
> On Wed, 9 Aug 2023, Joseph Myers wrote:
> 
> > On Wed, 9 Aug 2023, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > - _Complex _BitInt(N) isn't supported; again mainly because none of the 
> > > psABIs
> > >   mention how those should be passed/returned; in a limited way they are
> > >   supported internally because the internal functions into which
> > >   __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as 
> > > a
> > >   hack to return 2 values without using references/pointers
> > 
> > What happens when the usual arithmetic conversions are applied to 
> > operands, one of which is a complex integer type and the other of which is 
> > a wider _BitInt type?  I don't see anything in the code to disallow this 
> > case (which would produce an expression with a _Complex _BitInt type), or 
> > any testcases for it.
> > 
> > Other testcases I think should be present (along with any corresponding 
> > changes needed to the code itself):
> > 
> > * Verifying that the new integer constant suffix is rejected for C++.
> > 
> > * Verifying appropriate pedwarn-if-pedantic for the new constant suffix 
> > for versions of C before C2x (and probably for use of _BitInt type 
> > specifiers before C2x as well) - along with the expected -Wc11-c2x-compat 
> > handling (in C2x mode) / -pedantic -Wno-c11-c2x-compat in older modes.
> 
> Can we go as far as deprecating our _Complex int extension for
> C17 and make it unavailable for C2x, side-stepping the issue?
> Or maybe at least considering that for C2x?

I can just sorry at it for now.  And now that I search through the x86-64
psABI again, it doesn't mention complex integers at all, so we are there on
our own.  And it seems we don't have anything for complex integers on the
library side and the complex lowering is before bitint lowering, so it might
just work with < 10 lines of changes in code + testsuite, but if we do
enable it, let's do it incrementally.
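
(A minimal illustration of the combination in question; the wb literal
suffix and __auto_type are used only for exposition, and GCC currently
reports a sorry for the complex result type:)

void
f (void)
{
  _Complex int ci = 1;
  _BitInt(575) wide = 1wb;
  /* The usual arithmetic conversions would make this sum a
     _Complex _BitInt(575), which is the unsupported case.  */
  __auto_type sum = ci + wide;
}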

Jakub



Re: [PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-10 Thread Richard Biener via Gcc-patches
On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
 wrote:
>
> If `A` has a range of `[0,0][100,INF]` and we have the comparison
> `A < 50`, this should be optimized to `A <= 0` (which will then
> be optimized to just `A == 0`).
> This patch implements this via a new function which checks whether
> the constant of a comparison lies in the middle of two range pairs,
> and changes the constant to either the upper bound of the first pair
> or the lower bound of the second pair, depending on the comparison.
>
> This is the first step in fixing the following PRs:
> PR 110131, PR 108360, and PR 108397.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.



> gcc/ChangeLog:
>
> * vr-values.cc (simplify_compare_using_range_pairs): New function.
> (simplify_using_ranges::simplify_compare_using_ranges_1): Call
> it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/vrp124.c: New test.
> * gcc.dg/pr21643.c: Disable VRP.
> ---
>  gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
>  gcc/vr-values.cc   | 65 ++
>  3 files changed, 114 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
> index 4e7f93d351a..7f121d7006f 100644
> --- a/gcc/testsuite/gcc.dg/pr21643.c
> +++ b/gcc/testsuite/gcc.dg/pr21643.c
> @@ -1,6 +1,10 @@
>  /* PR tree-optimization/21643 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-reassoc1-details --param logical-op-non-short-circuit=1" } */
> +/* Note VRP is able to transform `c >= 0x20` in f7 to `c >= 0x21`;
> +   since we want to test reassociation and not VRP, turn it off.  */
> +
> +/* { dg-options "-O2 -fdump-tree-reassoc1-details --param logical-op-non-short-circuit=1 -fno-tree-vrp" } */
>
>  int
>  f1 (unsigned char c)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> new file mode 100644
> index 000..6ccbda35d1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* Should be optimized to a == -100 */
> +int g(int a)
> +{
> +  if (a == -100 || a >= 0)
> +;
> +  else
> +return 0;
> +  return a < 0;
> +}
> +
> +/* Should optimize to a == 0 */
> +int f(int a)
> +{
> +  if (a == 0 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 50;
> +}
> +
> +/* Should be optimized to a == 0. */
> +int f2(int a)
> +{
> +  if (a == 0 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 100;
> +}
> +
> +/* Should optimize to a == 100 */
> +int f1(int a)
> +{
> +  if (a < 0 || a == 100)
> +;
> +  else
> +return 0;
> +  return a > 50;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
> diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
> index a4fddd62841..1262e7cf9f0 100644
> --- a/gcc/vr-values.cc
> +++ b/gcc/vr-values.cc
> @@ -968,9 +968,72 @@ test_for_singularity (enum tree_code cond_code, tree op0,
>if (operand_equal_p (min, max, 0) && is_gimple_min_invariant (min))
> return min;
>  }
> +
>return NULL;
>  }
>
> +/* Simplify integer comparisons such that the constant is one of the
> +   range pairs.  For example,
> +   A has a range of [0,0][100,INF]
> +   and the comparison of `A < 50`.
> +   This should be optimized to `A <= 0`
> +   and then test_for_singularity can optimize it to `A == 0`.  */
> +
> +static bool
> +simplify_compare_using_range_pairs (tree_code &cond_code, tree &op0,
> +				    tree &op1, const value_range *vr)
> +{
> +  if (TREE_CODE (op1) != INTEGER_CST
> +      || vr->num_pairs () < 2)
> +    return false;
> +  auto val_op1 = wi::to_wide (op1);
> +  tree type = TREE_TYPE (op0);
> +  auto sign = TYPE_SIGN (type);
> +  auto p = vr->num_pairs ();
> +  /* Find the value range pair where op1
> +     is in the middle of, if one exists.  */
> +  for (unsigned i = 1; i < p; i++)
> +    {
> +      auto lower = vr->upper_bound (i - 1);
> +      auto upper = vr->lower_bound (i);
> +      if (wi::lt_p (val_op1, lower, sign))
> +	continue;
> +      if (wi::gt_p (val_op1, upper, sign))
> +	continue;

That looks like a linear search - it looks like m_base[] is
a sorted array of values so we should be able to
binary search here?  array_slice::bsearch could be
used if it existed (simply port it over from vec<> and
use array_slice from that)?

In the linear search above I'm missing an earlier out
if op1 falls inside a sub-range, you seem to walk the whole
array?
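
A rough sketch of what such a binary search could look like, assuming the
sub-ranges are sorted and disjoint (a hypothetical helper, since
array_slice::bsearch does not exist yet):

/* Return the index of the first sub-range whose upper bound is >= VAL,
   i.e. the first pair VAL could fall into or before.  */
static unsigned
first_pair_not_below (const value_range *vr, const wide_int &val, signop sign)
{
  unsigned lo = 0, hi = vr->num_pairs ();
  while (lo < hi)
    {
      unsigned mid = (lo + hi) / 2;
      if (wi::lt_p (vr->upper_bound (mid), val, sign))
	lo = mid + 1;
      else
	hi = mid;
    }
  return lo;
}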

When is this transform profitable on its own and why would
it enable followup simplifications?  The example you quote
is for turning the compare into a compare with zero which
is usually cheap but this case would also be easy to special
case.  Tra

Re: [PATCH 1/12] expr: Small optimization [PR102989]

2023-08-10 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> Small optimization to avoid testing modifier multiple times.

OK.

Richard.

> 2023-08-09  Jakub Jelinek  
> 
>   PR c/102989
>   * expr.cc (expand_expr_real_1) : Add an early return for
>   EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple
>   times.
> 
> --- gcc/expr.cc.jj2023-08-08 15:55:06.499164554 +0200
> +++ gcc/expr.cc   2023-08-08 15:59:36.594382141 +0200
> @@ -11248,17 +11248,15 @@ expand_expr_real_1 (tree exp, rtx target
> 	  set_mem_addr_space (temp, as);
> 	  if (TREE_THIS_VOLATILE (exp))
> 	    MEM_VOLATILE_P (temp) = 1;
> -	  if (modifier != EXPAND_WRITE
> -	      && modifier != EXPAND_MEMORY
> -	      && !inner_reference_p
> +	  if (modifier == EXPAND_WRITE || modifier == EXPAND_MEMORY)
> +	    return temp;
> +	  if (!inner_reference_p
> 	      && mode != BLKmode
> 	      && align < GET_MODE_ALIGNMENT (mode))
> 	    temp = expand_misaligned_mem_ref (temp, mode, unsignedp, align,
> 					      modifier == EXPAND_STACK_PARM
> 					      ? NULL_RTX : target, alt_rtl);
> -	  if (reverse
> -	      && modifier != EXPAND_MEMORY
> -	      && modifier != EXPAND_WRITE)
> +	  if (reverse)
> 	    temp = flip_storage_order (mode, temp);
> 	  return temp;
> 	}
> 
>   Jakub
> 
> 


Re: [PATCH 2/12] lto-streamer-in: Adjust assert [PR102989]

2023-08-10 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> With _BitInt(575) or any other _BitInt(513) or larger constants we can
> run into this assertion.  MAX_BITSIZE_MODE_ANY_INT is just a value from
> which WIDE_INT_MAX_PRECISION is derived.

OK.

Richard.

> 2023-08-09  Jakub Jelinek  
> 
>   PR c/102989
>   * lto-streamer-in.cc (lto_input_tree_1): Assert TYPE_PRECISION
>   is up to WIDE_INT_MAX_PRECISION rather than MAX_BITSIZE_MODE_ANY_INT.
> 
> --- gcc/lto-streamer-in.cc.jj 2023-07-17 09:07:42.078283882 +0200
> +++ gcc/lto-streamer-in.cc2023-07-27 15:03:24.255234159 +0200
> @@ -1888,7 +1888,7 @@ lto_input_tree_1 (class lto_input_block
>  
>for (i = 0; i < len; i++)
>   a[i] = streamer_read_hwi (ib);
> -  gcc_assert (TYPE_PRECISION (type) <= MAX_BITSIZE_MODE_ANY_INT);
> +  gcc_assert (TYPE_PRECISION (type) <= WIDE_INT_MAX_PRECISION);
>result = wide_int_to_tree (type, wide_int::from_array
>(a, len, TYPE_PRECISION (type)));
>streamer_tree_cache_append (data_in->reader_cache, result, hash);
> 
>   Jakub
> 


Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-10 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 10, 2023 at 12:13 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Aug 10, 2023 at 06:55:05AM +, Richard Biener wrote:
> > On Wed, 9 Aug 2023, Joseph Myers wrote:
> >
> > > On Wed, 9 Aug 2023, Jakub Jelinek via Gcc-patches wrote:
> > >
> > > > - _Complex _BitInt(N) isn't supported; again mainly because none of the 
> > > > psABIs
> > > >   mention how those should be passed/returned; in a limited way they are
> > > >   supported internally because the internal functions into which
> > > >   __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE 
> > > > as a
> > > >   hack to return 2 values without using references/pointers
> > >
> > > What happens when the usual arithmetic conversions are applied to
> > > operands, one of which is a complex integer type and the other of which is
> > > a wider _BitInt type?  I don't see anything in the code to disallow this
> > > case (which would produce an expression with a _Complex _BitInt type), or
> > > any testcases for it.
> > >
> > > Other testcases I think should be present (along with any corresponding
> > > changes needed to the code itself):
> > >
> > > * Verifying that the new integer constant suffix is rejected for C++.
> > >
> > > * Verifying appropriate pedwarn-if-pedantic for the new constant suffix
> > > for versions of C before C2x (and probably for use of _BitInt type
> > > specifiers before C2x as well) - along with the expected -Wc11-c2x-compat
> > > handling (in C2x mode) / -pedantic -Wno-c11-c2x-compat in older modes.
> >
> > Can we go as far as deprecating our _Complex int extension for
> > C17 and make it unavailable for C2x, side-stepping the issue?
> > Or maybe at least considering that for C2x?
>
> I can just sorry at it for now.  And now that I search through the x86-64
> psABI again, it doesn't mention complex integers at all, so we are there on
> our own.  And it seems we don't have anything for complex integers on the
> library side and the complex lowering is before bitint lowering, so it might
> just work with < 10 lines of changes in code + testsuite, but if we do
> enable it, let's do it incrementally.

_Complex int division also has issues which is another reason to
deprecate/remove it; see PR 104937 for that and
https://gcc.gnu.org/legacy-ml/gcc/2001-11/msg00790.html (which was the
first time to deprecate _Complex int;
https://gcc.gnu.org/legacy-ml/gcc/2001-11/msg00863.html).

Thanks,
Andrew


>
> Jakub
>


Re: [PATCH] Fix PR 110954: wrong code with cmp | !cmp

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 2:21 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This was an oversight on my part: I did not realize that
> comparisons in GENERIC can have a non-boolean type.
> This means that for `(f < 0) | !(f < 0)` we would
> optimize it to -1 rather than just 1.
> This patch adds a check that the type of the comparisons
> is boolean, to keep the optimization in that case.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR 110954
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (bitwise_inverted_equal_p): Check
> the type of the comparison to be boolean too.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr110954-1.c: New test.
> ---
>  gcc/generic-match-head.cc|  3 ++-
>  gcc/testsuite/gcc.c-torture/execute/pr110954-1.c | 10 ++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110954-1.c
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index ddaf22f2179..ac2119bfdd0 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -146,7 +146,8 @@ bitwise_inverted_equal_p (tree expr1, tree expr2)
>&& bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
>  return true;
>if (COMPARISON_CLASS_P (expr1)
> -  && COMPARISON_CLASS_P (expr2))
> +  && COMPARISON_CLASS_P (expr2)
> +  && TREE_CODE (TREE_TYPE (expr1)) == BOOLEAN_TYPE)

in other places we restrict this to single-bit integral types instead which
covers a few more cases and also would handle BOOLEAN_TYPE
with either padding or non-padding extra bits correctly (IIRC fortran
has only padding bits but Ada has BOOLEAN_TYPEs with possibly
> 1 bit precision and arbitrary signedness - maybe even with custom
true/false values).
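
The single-bit check used elsewhere looks roughly like this (a sketch of the
suggested change, not the committed fix):

  if (COMPARISON_CLASS_P (expr1)
      && COMPARISON_CLASS_P (expr2)
      && INTEGRAL_TYPE_P (TREE_TYPE (expr1))
      && TYPE_PRECISION (TREE_TYPE (expr1)) == 1)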

Richard.

>  {
>tree op10 = TREE_OPERAND (expr1, 0);
>tree op20 = TREE_OPERAND (expr2, 0);
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c
> new file mode 100644
> index 000..8aad758e10f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c
> @@ -0,0 +1,10 @@
> +
> +#define comparison (f < 0)
> +int main() {
> +  int f = 0;
> +  int d = comparison | !comparison;
> +  if (d != 1)
> +__builtin_abort();
> +  return 0;
> +}
> +
> --
> 2.31.1
>


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
>
> Currently we have 3 different independent tunes for gather,
> "use_gather,use_gather_2parts,use_gather_4parts";
> similarly for scatter, there are
> "use_scatter,use_scatter_2parts,use_scatter_4parts".
>
> The patch adds 2 standardized options to enable/disable
> vectorization for all gather/scatter instructions. The options are
> interpreted by the driver into the 3 tunes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

I think -mgather/-mscatter are too close to -mfma suggesting they
enable part of an ISA but they won't disable the use of intrinsics
or enable gather/scatter on CPUs where the ISA doesn't have them.

May I suggest to invent a more generic "short-cut" to
-mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
tunables add ^use_gather_any to cover all cases?  (or
change what use_gather controls - it seems we changed its
meaning before, and instead add use_gather_8parts and
use_gather_16parts)

That is, what's the point of this?

Richard.

> gcc/ChangeLog:
>
> * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> GATHER_SCATTER_DRIVER_SELF_SPECS.
> (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> * config/i386/i386.opt (mgather): New option.
> (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++-
>  gcc/config/i386/i386.opt |  8 
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
> an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
 wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
>
> I think -mgather/-mscatter are too close to -mfma suggesting they
> enable part of an ISA but they won't disable the use of intrinsics
> or enable gather/scatter on CPUs where the ISA doesn't have them.
>
> May I suggest to invent a more generic "short-cut" to
> -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> tunables add ^use_gather_any to cover all cases?  (or
> change what use_gather controls - it seems we changed its
> meaning before, and instead add use_gather_8parts and
> use_gather_16parts)
>
> That is, what's the point of this?

https://www.phoronix.com/review/downfall

that caused:

https://www.phoronix.com/review/intel-downfall-benchmarks

Uros.


Re: [PATCH V2 0/3] RISC-V: Add an experimental vector calling convention

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 9:04 AM Lehua Ding  wrote:
>
> Hi RISC-V folks,
>
> This patch implement the proposal of RISC-V vector calling convention[1] and
> this feature can be enabled by `--param=riscv-vector-abi` option. Currently,
> all vector type arguments and return values are pass by reference. With this
> patch, these arguments and return values can pass through vector registers.
> Currently only vector types defined in the RISC-V Vector Extension Intrinsic 
> Document[2]
> are supported. GNU-ext vector types are unsupported for now since the
> corresponding proposal was not presented.
>
> The proposal introduce a new calling convention variant, functions which 
> follow
> this variant need follow the bellow vector register convention.
>
> | Name    | ABI Mnemonic | Meaning                | Preserved across calls?
> ==========================================================================
> | v0      |              | Argument register      | No
> | v1-v7   |              | Callee-saved registers | Yes
> | v8-v23  |              | Argument registers     | No
> | v24-v31 |              | Callee-saved registers | Yes
>
> If a functions follow this vector calling convention, then the function 
> symbole
> must be annotated with .variant_cc directive[3] (used to indicate that it is a
> calling convention variant).
>
> This implementation split into three parts, each part corresponds to a 
> sub-patch.
>
> - Part-1: Select suitable vector regsiters for vector type arguments and 
> return
>   values according to the proposal.
> - Part-2: Allocate frame area for callee-saved vector registers and 
> save/restore
>   them in prologue and epilogue.
> - Part-3: Generate .variant_cc directive for vector function in assembly code.

Just to mention, at some point you want to think about the OpenMP SIMD ABI,
which includes a mangling scheme but would also open up having different
calling conventions.  So please keep that use case in mind, possibly allowing
the vector calling convention to be required for this.  Also note there are
'inbranch' variants which require passing a mask - your table above doesn't
list any mask registers (in case those exist in RISC-V).
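
For instance, an 'inbranch' SIMD clone receives the guarding condition as a
mask argument (standard OpenMP declare simd; how that mask would be passed
under the RVV convention is exactly the open question):

#pragma omp declare simd inbranch
int foo (int x);

void
bar (int *a, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      a[i] = foo (a[i]);  /* the vectorized clone takes a mask */
}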

Richard.


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
>
> On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
>  wrote:
> >
> > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> > >
> > > Currently we have 3 different independent tunes for gather
> > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > similar for scatter, there're
> > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > >
> > > The patch support 2 standardizing options to enable/disable
> > > vectorization for all gather/scatter instructions. The options is
> > > interpreted by driver to 3 tunes.
> > >
> > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > Ok for trunk?
> >
> > I think -mgather/-mscatter are too close to -mfma suggesting they
> > enable part of an ISA but they won't disable the use of intrinsics
> > or enable gather/scatter on CPUs where the ISA doesn't have them.
> >
> > May I suggest to invent a more generic "short-cut" to
> > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > tunables add ^use_gather_any to cover all cases?  (or
> > change what use_gather controls - it seems we changed its
> > meaning before, and instead add use_gather_8parts and
> > use_gather_16parts)
> >
> > That is, what's the point of this?
>
> https://www.phoronix.com/review/downfall
>
> that caused:
>
> https://www.phoronix.com/review/intel-downfall-benchmarks

Yes, I know.  But there's -mtune-ctrl= doing the trick.
GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
resurrecting that behavior and adding use_gather_8+parts (or two, IIRC
gather works only on SI/SFmode or larger).

Then -mtune-ctrl=^use_gather works, which I think is nice enough?

Richard.

> Uros.


[PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds support for live vectorization by VEC_EXTRACT under LEN loop control.

Consider this following case:

#include 

#define EXTRACT_LAST(TYPE)  \
  TYPE __attribute__ ((noinline, noclone))  \
  test_##TYPE (TYPE *x, int n, TYPE value)  \
  { \
TYPE last;  \
for (int j = 0; j < n; ++j) \
  { \
last = x[j];\
x[j] = last * value;\
  } \
return last;\
  }

#define TEST_ALL(T) \
  T (uint8_t)   \

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI 
  ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

For RVV since we prefer len in loop control, after this patch for RVV:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .VEC_EXTRACT (vect_last_12.8_23, loop_len_22 - 1 - bias);

Details of this approach:

1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live
vectorization for LEN loop control.

   In this function we check whether the target supports:
     - Use of LEN as the loop control.
     - The VEC_EXTRACT optab.

2. Step 2 - Record LEN for loop control if
'vect_can_vectorize_extract_last_with_len_p' is true.

3. Step 3 - Generate VEC_EXTRACT (v, LEN - 1 - BIAS); see the sketch below.
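
A rough sketch of the gimple built for step 3 (the variable names are
illustrative; the real construction happens inside
vectorizable_live_operation):

  /* scalar_res = .VEC_EXTRACT (vec_lhs, loop_len - 1 - bias);  */
  gimple_seq stmts = NULL;
  tree len_type = TREE_TYPE (loop_len);
  tree idx = gimple_build (&stmts, MINUS_EXPR, len_type, loop_len,
			   build_int_cst (len_type, 1 + bias));
  gcall *call = gimple_build_call_internal (IFN_VEC_EXTRACT, 2,
					    vec_lhs, idx);
  gimple_call_set_lhs (call, scalar_res);
  gimple_seq_add_stmt (&stmts, call);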

NOTE: This patch sets 'vinfo->any_known_not_updated_vssa = true;' since the
original STMT is a simple assignment whereas VEC_EXTRACT is neither a pure
nor a const function according to internal-fn.def:

  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, 0, vec_extract, vec_extract)

  If we don't set 'vinfo->any_known_not_updated_vssa' to true, it will
cause an ICE in:

    if (need_ssa_update_p (cfun))
      {
	gcc_assert (loop_vinfo->any_known_not_updated_vssa);  <-- assertion fails here
	fun->gimple_df->ssa_renaming_needed = false;
	todo |= TODO_update_ssa_only_virtuals;
      }
  
  I saw there are 2 places that set 'vinfo->any_known_not_updated_vssa' to true:

- The one is in 'vectorizable_simd_clone_call':

/* When the original call is pure or const but the SIMD ABI dictates
 an aggregate return we will have to use a virtual definition and
 in a loop eventually even need to add a virtual PHI.  That's
 not straight-forward so allow to fix this up via renaming.  */
  if (gimple_call_lhs (stmt)
  && !gimple_vdef (stmt)
  && TREE_CODE (TREE_TYPE (TREE_TYPE (bestn->decl))) == ARRAY_TYPE)
vinfo->any_known_not_updated_vssa = true;
   
   - The other is in 'vectorizable_load':
   
if (memory_access_type == VMAT_LOAD_STORE_LANES)
  vinfo->any_known_not_updated_vssa = true;

  It seems they are set for the same reason as my change in
'vectorizable_live_operation'.
  Feel free to correct me if I am wrong.

  Bootstrap and Regression on X86 passed.

gcc/ChangeLog:

* tree-vect-loop.cc (vect_can_vectorize_extract_last_with_len_p): New 
function.
(vectorizable_live_operation): Add loop LEN control.

---
 gcc/tree-vect-loop.cc | 74 +++
 1 file changed, 68 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 00058c3c13e..208918f53fb 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8964,6 +8964,24 @@ vect_can_vectorize_without_simd_p (code_helper code)
  && vect_can_vectorize_without_simd_p (tree_code (code)));
 }
 
+/* Return true if target supports extract last vectorization with LEN.  */
+
+static bool
+vect_can_vectorize_extract_last_with_len_p (tree vectype)
+{
+  /* Return false if target doesn't support LEN in loop control.  */
+  machine_mode vmode;
+  if (!get_len_load_store_mode (TYPE_MODE (vectype), true).exists (&vmode)
+      || !get_len_load_store_mode (TYPE_MODE (vectype), false).exists (&vmode))
+    return false;
+
+  /* The target needs to support VEC_EXTRACT to extract the last active
+     element.  */
+  return convert_optab_handler (vec_extract_optab,
+				TYPE_MODE (vectype),
+				TYPE_MODE (TREE_TYPE (vectype)))
+	 != CODE_FOR_nothing;
+}
+
 /* 

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
>
> Yes, I know.  But there's -mtune-ctl= doing the trick.
> GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
>
> Then -mtune-ctl=^use_gather works which I think is nice enough?
So basically, -mtune-ctrl=^use_gather is used to turn off all gather
vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
We don't have an extra explicit flag for target tune, just the single
bit ix86_tune_features[X86_TUNE_USE_GATHER].
>
> Richard.
>
> > Uros.



-- 
BR,
Hongtao


Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, 10 Aug 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch add support live vectorization by VEC_EXTRACT for LEN loop control.
> 
> Consider this following case:
> 
> #include 
> 
> #define EXTRACT_LAST(TYPE)\
>   TYPE __attribute__ ((noinline, noclone))\
>   test_##TYPE (TYPE *x, int n, TYPE value)\
>   {   \
> TYPE last;\
> for (int j = 0; j < n; ++j)   \
>   {   \
>   last = x[j];\
>   x[j] = last * value;\
>   }   \
> return last;  \
>   }
> 
> #define TEST_ALL(T)   \
>   T (uint8_t) \
> 
> TEST_ALL (EXTRACT_LAST)
> 
> ARM SVE IR:
> 
> Preheader:
>   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> 
> Loop:
>   ...
>   # loop_mask_22 = PHI 
>   ...
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
>   ...
>   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> 
> For RVV since we prefer len in loop control, after this patch for RVV:
> 
> Loop:
>   ...
>   loop_len_22 = SELECT_VL;
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
>   ...
> 
> Epilogue:
>   _25 = .VEC_EXTRACT (loop_len_22 - 1 - bias, vect_last_12.8_23);
> 
> Details of this approach:
> 
> 1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p'  to enable live 
> vectorization
> for LEN loop control.
>
>This function we check whether target support:
> - Use LEN as the loop control.
> - Support VEC_EXTRACT optab.
> 
> 2. Step 2 - Record LEN for loop control if 
> 'vect_can_vectorize_extract_last_with_len_p' is true.
> 
> 3. Step 3 - Gerenate VEC_EXTRACT (v, LEN - 1 - BIAS).
> 
> NOTE: This patch set 'vinfo->any_known_not_updated_vssa = true;' since the 
> original STMT is a simple
>   assignment wheras VEC_EXTRACT is neither pure nor const function 
> according to internal-fn.def:
> 
>   DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, 0, vec_extract, vec_extract)
> 
>   If we don't set 'vinfo->any_known_not_updated_vssa' as true, it will 
> cause ICE in:
> 
> if (need_ssa_update_p (cfun))
>   {
> gcc_assert (loop_vinfo->any_known_not_updated_vssa);  > 
> Report assertion fail here.
> fun->gimple_df->ssa_renaming_needed = false;
> todo |= TODO_update_ssa_only_virtuals;
>   }
>   
>   I saw there are 2 places set 'vinfo->any_known_not_updated_vssa' as 
> true:
> 
>   - The one is in 'vectorizable_simd_clone_call':
> 
>   /* When the original call is pure or const but the SIMD ABI dictates
>an aggregate return we will have to use a virtual definition and
>in a loop eventually even need to add a virtual PHI.  That's
>not straight-forward so allow to fix this up via renaming.  */
>   if (gimple_call_lhs (stmt)
> && !gimple_vdef (stmt)
> && TREE_CODE (TREE_TYPE (TREE_TYPE (bestn->decl))) == ARRAY_TYPE)
>   vinfo->any_known_not_updated_vssa = true;
>
>- The other is in 'vectorizable_load':
>
> if (memory_access_type == VMAT_LOAD_STORE_LANES)
> vinfo->any_known_not_updated_vssa = true;
> 
>   It seems that they are the same reason as me doing in 
> 'vectorizable_live_operation'.
>   Feel free to correct me if I am wrong.

You should always manually update things.  Did you verify the mask
case is handled by this?

There's the odd

  if (stmts)
{
  gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);

  /* Remove existing phi from lhs and create one copy from 
new_tree.  */
  tree lhs_phi = NULL_TREE;
  gimple_stmt_iterator gsi;
  for (gsi = gsi_start_phis (exit_bb);
   !gsi_end_p (gsi); gsi_next (&gsi))
{
  gimple *phi = gsi_stmt (gsi);
  if ((gimple_phi_arg_def (phi, 0) == lhs))
{
  remove_phi_node (&gsi, false);
  lhs_phi = gimple_phi_result (phi);
  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
  break;
}
}

code but I don't think it will create new LC PHIs for the mask, instead
it will break LC SSA as well by removing a PHI?

I guess as a 

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu  wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch support 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options is
> > > > > interpreted by driver to 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctrl= doing the trick.
> > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctrl=^use_gather works, which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> We don't have an extra explicit flag for the target tune, just a single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]
Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
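
For reference, the workaround discussed in this thread can already be spelled
out on the command line; a hedged example, assuming the three existing gather
tunable names (scatter has analogous ones):

  gcc -O2 -mtune-ctrl=^use_gather,^use_gather_2parts,^use_gather_4parts foo.c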


Re: [PATCH V2 0/3] RISC-V: Add an experimental vector calling convention

2023-08-10 Thread Lehua Ding
Hi Richard,


Thanks for the review.


> Just to mention at some point you want to think about the OpenMP SIMD ABI
> which includes a mangling scheme but would also open up to have different
> calling conventions.  So please keep that usage case in mind, possibly
> allowing the vector calling convention to be required for this.


Thanks for the reminder. A new function attribute `riscv_vector_cc` will be
introduced later to specify that a function adheres to the vector calling
convention.
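
A hedged sketch of what a declaration using the planned attribute might look
like; `riscv_vector_cc` does not exist in GCC yet, so the spelling below is
only the proposed one:

#include <riscv_vector.h>

/* Hypothetical: 'riscv_vector_cc' is the planned attribute name.  */
vint32m1_t __attribute__ ((riscv_vector_cc))
scale (vint32m1_t v, int32_t s, size_t vl);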

> Also note there are 'inbranch' variants which require passing a mask -
> your table above doesn't list any mask registers (in case those exist
> in RISC-V).


Separate mask registers do not exist in RISC-V;
mask arguments share vector registers.
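
For example, the mask operand of a masked intrinsic is an ordinary vector
value; under the proposed convention it would travel in v0 (a minimal sketch
using the existing intrinsics):

#include <riscv_vector.h>

/* The vbool32_t mask argument is passed in a vector register (v0 under
   the proposal), not in a separate mask register file.  */
vint32m1_t
masked_add (vbool32_t m, vint32m1_t a, vint32m1_t b, size_t vl)
{
  return __riscv_vadd_vv_i32m1_m (m, a, b, vl);
}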

Best,
Lehua

Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe.zh...@rivai.ai
Hi, Richi.  Thanks so much for the review.


>> You should always manually update things.  Did you verify the mask
>> case is handled by this?
When we enable LEN as the loop control, the only mask case left is the
condition mask case.

Consider this following case:

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v, int n)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < n; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

It will be handled in 'vectorizable_condition'
by FOLD_EXTRACT_LAST (that's why I said we will need to add
LEN_FOLD_EXTRACT_LAST).

I have tried many cases and it turns out 'vectorizable_live_operation' only
handles either loop LEN control or loop MASK control.
And all the mask cases under loop LEN control are handled by
'vectorizable_condition' (with LEN_FOLD_EXTRACT_LAST in my downstream).

IMHO, no mask case comes into 'vectorizable_live_operation' when we are using 
LEN as the loop control.

>> code but I don't think it will create new LC PHIs for the mask, instead
>> it will break LC SSA as well by removing a PHI?
I have tried several debug cases on ARM SVE; it turns out that the LC SSA is
created by 'rewrite_into_loop_closed_ssa'.


Thanks.


juzhe.zh...@rivai.ai
 

[PATCH v1] RISC-V: Support RVV VFNMACC rounding mode intrinsic API

2023-08-10 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for the
VFNMACC for the below samples.

* __riscv_vfnmacc_vv_f32m1_rm
* __riscv_vfnmacc_vv_f32m1_rm_m
* __riscv_vfnmacc_vf_f32m1_rm
* __riscv_vfnmacc_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmacc_frm): New class for vfnmacc.
(vfnmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmacc_frm): New function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-nmacc.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 24 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 .../riscv/rvv/base/float-point-nmacc.c| 47 +++
 4 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 1695d77e8bd..1d4a5a18bf9 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -379,6 +379,28 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfnmacc
+*/
+class vfnmacc_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+   true, code_for_pred_mul_neg_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+   true, code_for_pred_mul_neg (MINUS, e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2144,6 +2166,7 @@ static CONSTEXPR const vfnmsac vfnmsac_obj;
 static CONSTEXPR const vfmadd vfmadd_obj;
 static CONSTEXPR const vfnmsub vfnmsub_obj;
 static CONSTEXPR const vfnmacc vfnmacc_obj;
+static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
 static CONSTEXPR const vfmsac vfmsac_obj;
 static CONSTEXPR const vfnmadd vfnmadd_obj;
 static CONSTEXPR const vfmsub vfmsub_obj;
@@ -2380,6 +2403,7 @@ BASE (vfnmsac)
 BASE (vfmadd)
 BASE (vfnmsub)
 BASE (vfnmacc)
+BASE (vfnmacc_frm)
 BASE (vfmsac)
 BASE (vfnmadd)
 BASE (vfmsub)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 67d18412b4c..247074d0868 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -165,6 +165,7 @@ extern const function_base *const vfnmsac;
 extern const function_base *const vfmadd;
 extern const function_base *const vfnmsub;
 extern const function_base *const vfnmacc;
+extern const function_base *const vfnmacc_frm;
 extern const function_base *const vfmsac;
 extern const function_base *const vfnmadd;
 extern const function_base *const vfmsub;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 92ecf8a9065..7aae0665520 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -351,6 +351,8 @@ DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
 
 DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvvv_ops)
 DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvvv_ops)
+DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvfv_ops)
 
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c
new file mode 100644
index 000..fca378b7a8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfnmacc_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+   vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmacc_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfnmacc_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+   vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmacc_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfnmacc_vf_f32m1_rm (vfloat32m1_t vd, float32_t op1, vfloat32m1_t op2,
+ size_t vl) {
+  return __riscv_vfnmacc_vf_f32

Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe.zh...@rivai.ai
>> I guess as a temporary thing your approach is OK but we shouldn't
>> add these as part of new code - it's supposed to handle legacy
>> cases that we didn't fixup yet.

Do you mean we need to fix the LC SSA PHI flow so that we don't need to
set vinfo->any_known_not_updated_vssa = true?

After that's fixed, this patch is OK for trunk with
'vinfo->any_known_not_updated_vssa = true' removed, am I right?

Thanks.


juzhe.zh...@rivai.ai
 

Re: [PATCH v1] RISC-V: Support RVV VFNMACC rounding mode intrinsic API

2023-08-10 Thread Kito Cheng via Gcc-patches
LGTM


[PATCH] RISC-V: Add missing modes to the iterators

2023-08-10 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/vector-iterators.md: Add missing modes.

---
 gcc/config/riscv/vector-iterators.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 14829989e09..30808ceb241 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -468,6 +468,7 @@
   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
 
+  (V1HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16")
   (V2HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16")
   (V4HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16")
   (V8HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16")
@@ -479,6 +480,7 @@
   (V512HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 1024")
   (V1024HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 2048")
   (V2048HF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 4096")
+  (V1SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32")
   (V2SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32")
   (V4SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32")
   (V8SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32")
@@ -489,6 +491,7 @@
   (V256SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 1024")
   (V512SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 2048")
   (V1024SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
+  (V1DF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_64")
   (V2DF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_64")
   (V4DF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_64")
   (V8DF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 64")
-- 
2.36.3



Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu  wrote:
>
> On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu  wrote:
> >
> > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  
> > > > > wrote:
> > > > > >
> > > > > > Currently we have 3 different independent tunes for gather
> > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > similar for scatter, there're
> > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > >
> > > > > > The patch supports 2 standardizing options to enable/disable
> > > > > > vectorization for all gather/scatter instructions. The options are
> > > > > > interpreted by the driver into the 3 tunes.
> > > > > >
> > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > Ok for trunk?
> > > > >
> > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > >
> > > > > May I suggest to invent a more generic "short-cut" to
> > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > change what use_gather controls - it seems we changed its
> > > > > meaning before, and instead add use_gather_8parts and
> > > > > use_gather_16parts)
> > > > >
> > > > > That is, what's the point of this?
The point of this is to keep behavior consistent between GCC, LLVM, and
ICX (Intel® oneAPI DPC++/C++ Compiler).
LLVM and ICX will support that option.



-- 
BR,
Hongtao


[PATCH] RISC-V: Support TU for integer ternary OP[PR110964]

2023-08-10 Thread Juzhe-Zhong
PR target/110964

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_cond_len_ternop): Add integer ternary.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr110964.c: New test.

---
 gcc/config/riscv/riscv-v.cc |  3 +--
 .../gcc.target/riscv/rvv/autovec/pr110964.c | 13 +
 2 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110964.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c9f0a4a9e7b..a3062c90618 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3604,8 +3604,7 @@ expand_cond_len_ternop (unsigned icode, rtx *ops)
   if (FLOAT_MODE_P (mode))
emit_nonvlmax_fp_ternary_tu_insn (icode, RVV_TERNOP_TU, ops, len);
   else
-   /* FIXME: Enable this case when we support it in the middle-end.  */
-   gcc_unreachable ();
+   emit_nonvlmax_tu_insn (icode, RVV_TERNOP_TU, ops, len);
 }
   else
 {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110964.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110964.c
new file mode 100644
index 000..cf2d1fb5f1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110964.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast" } */
+
+int *a;
+long b, c;
+
+int d ()
+{
+  const int e;
+  for (; a < e; a++) /* { dg-warning "comparison between pointer and integer" } */
+c += *a * b;
+}
+
-- 
2.36.3
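
As a hedged scalar model of the tail-undisturbed (TU) semantics the expander
now emits for integer ternary operations: elements below the active length
are updated with the multiply-accumulate result, the tail keeps its old value.

/* Scalar model only; not the vector code GCC generates.  */
void
cond_len_fma_model (long *d, const int *a, long b, int len, int n)
{
  for (int i = 0; i < n; i++)
    if (i < len)
      d[i] += (long) a[i] * b;  /* active element: multiply-accumulate */
    /* else: d[i] is left untouched (tail undisturbed) */
}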



Re: [PATCH v4] Implement new RTL optimizations pass: fold-mem-offsets.

2023-08-10 Thread Manolis Tsamis
Hi Jeff,

Thanks a lot for providing all this information and testcase! I have
been able to reproduce the issue with it.

I have investigated the cause of the issue and it's not what you mention;
the uses of all intermediate calculations are properly taken into account.
In this case it would be fine to alter the runtime value of insn 58
because its uses are also memory accesses that can have a folded
offset. The real issue is that the offset for these two (insns 60/61)
is not updated.

I think this can be seen by using -fdump-rtl-fold_mem_offsets-all
which also shows root memory instructions and when instructions are
marked for propagation. Here's the relevant dump (a bit long but
should help with this):

Starting analysis from root: (insn 2 95 3 2 (set (reg/v/f:SI 3 %d3
[orig:44 pD.1867 ] [44])
(mem/f/c:SI (plus:SI (reg/f:SI 15 %sp)
(const_int 60 [0x3c])) [1 pD.1867+0 S4 A32]))
"/home/mtsamis/temp/fmo-bug-2/a.c":23:1 55 {*movsi_m68k2}
 (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 %sp)
(const_int 60 [0x3c])) [1 pD.1867+0 S4 A32])
(nil)))
Starting analysis from root: (insn 3 2 4 2 (set (reg/v/f:SI 2 %d2
[orig:45 rD.1868 ] [45])
(mem/f/c:SI (plus:SI (reg/f:SI 15 %sp)
(const_int 64 [0x40])) [1 rD.1868+0 S4 A32]))
"/home/mtsamis/temp/fmo-bug-2/a.c":23:1 55 {*movsi_m68k2}
 (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 %sp)
(const_int 64 [0x40])) [1 rD.1868+0 S4 A32])
(nil)))
Starting analysis from root: (insn 14 12 16 2 (set (mem/f:SI (reg/f:SI
15 %sp) [1  S4 A16])
(reg/v/f:SI 2 %d2 [orig:45 rD.1868 ] [45]))
"/home/mtsamis/temp/fmo-bug-2/a.c":25:21 discrim 1 54 {*movsi_m68k}
 (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
(nil)))
Starting analysis from root: (insn 42 41 43 3 (set (mem:SI (reg:SI 8
%a0 [61]) [0 MEM  [(voidD.37 *)_15]+0 S4 A8])
(const_int 1633837924 [0x61626364]))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 discrim 1 55 {*movsi_m68k2}
 (nil))
Starting analysis from root: (insn 43 42 45 3 (set (mem:QI (plus:SI
(reg:SI 8 %a0 [61])
(const_int 4 [0x4])) [0 MEM  [(voidD.37
*)_15]+4 S1 A8])
(const_int 101 [0x65]))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 discrim 1 62 {*m68k.md:1130}
 (expr_list:REG_DEAD (reg:SI 8 %a0 [61])
(nil)))
Instruction marked for propagation: (insn 41 39 42 3 (set (reg:SI 8 %a0 [61])
(plus:SI (reg/f:SI 12 %a4 [52])
(reg:SI 13 %a5 [orig:36 _14 ] [36])))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 discrim 1 150
{*addsi3_internal}
 (nil))
Starting analysis from root: (insn 60 59 61 3 (set (mem:HI (reg:SI 8
%a0 [73]) [0 MEM  [(voidD.37 *)_19]+0 S2 A8])
(const_int 26215 [0x6667]))
"/home/mtsamis/temp/fmo-bug-2/a.c":31:4 discrim 1 58 {*m68k.md:1084}
 (nil))
Starting analysis from root: (insn 61 60 63 3 (set (mem:QI (plus:SI
(reg:SI 8 %a0 [73])
(const_int 2 [0x2])) [0 MEM  [(voidD.37
*)_19]+2 S1 A8])
(const_int 0 [0])) "/home/mtsamis/temp/fmo-bug-2/a.c":31:4
discrim 1 62 {*m68k.md:1130}
 (expr_list:REG_DEAD (reg:SI 8 %a0 [73])
(nil)))
Instruction marked for propagation: (insn 59 58 60 3 (set (reg:SI 8 %a0 [73])
(plus:SI (reg/f:SI 12 %a4 [52])
(reg:SI 8 %a0 [72])))
"/home/mtsamis/temp/fmo-bug-2/a.c":31:4 discrim 1 150
{*addsi3_internal}
 (nil))
Instruction marked for propagation: (insn 58 57 59 3 (set (reg:SI 8 %a0 [72])
(plus:SI (plus:SI (reg:SI 13 %a5 [orig:36 _14 ] [36])
(reg:SI 11 %a3 [49]))
(const_int 5 [0x5])))
"/home/mtsamis/temp/fmo-bug-2/a.c":31:4 407 {*lea}
 (expr_list:REG_DEAD (reg:SI 13 %a5 [orig:36 _14 ] [36])
(expr_list:REG_DEAD (reg:SI 11 %a3 [49])
(nil
Instruction marked for propagation: (insn 39 38 41 3 (set (reg:SI 13
%a5 [orig:36 _14 ] [36])
(plus:SI (reg:SI 10 %a2 [47])
(const_int 1 [0x1])))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 150 {*addsi3_internal}
 (nil))
Memory offset changed from 0 to 1 for instruction:
(insn 42 41 43 3 (set (mem:SI (reg:SI 8 %a0 [61]) [0 MEM
 [(voidD.37 *)_15]+0 S4 A8])
(const_int 1633837924 [0x61626364]))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 discrim 1 55 {*movsi_m68k2}
 (nil))
deferring rescan insn with uid = 42.
Memory offset changed from 4 to 5 for instruction:
(insn 43 42 45 3 (set (mem:QI (plus:SI (reg:SI 8 %a0 [61])
(const_int 4 [0x4])) [0 MEM  [(voidD.37
*)_15]+4 S1 A8])
(const_int 101 [0x65]))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 discrim 1 62 {*m68k.md:1130}
 (expr_list:REG_DEAD (reg:SI 8 %a0 [61])
(nil)))
deferring rescan insn with uid = 43.
Instruction folded:(insn 39 38 41 3 (set (reg:SI 13 %a5 [orig:36 _14 ] [36])
(plus:SI (reg:SI 10 %a2 [47])
(const_int 1 [0x1])))
"/home/mtsamis/temp/fmo-bug-2/a.c":29:4 150 {*addsi3_internal}
 (nil))

Here you can see that insn 39 is the last to be marked for propagation.

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-10 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

>> Why would HWCAP_USCAT not be set by the kernel?
>> 
>> Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.
>>
> Answering my own question, N1 does not officially have FEAT_LSE2.

It doesn't indeed. However most cores support atomic 128-bit load/store
(part of LSE2), so we can still use the LSE2 ifunc for those cores. Since there
isn't a feature bit for this in the CPU or HWCAP, I check the CPUID register.

Cheers,
Wilco
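
A hedged sketch of that kind of CPUID check on Linux, assuming HWCAP_CPUID
is set so the kernel traps and emulates the MIDR_EL1 read from EL0:

#include <sys/auxv.h>

#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1 << 11)
#endif

/* Return MIDR_EL1, or 0 if it cannot be read from user space.  */
static unsigned long
read_midr (void)
{
  unsigned long midr;
  if (!(getauxval (AT_HWCAP) & HWCAP_CPUID))
    return 0;
  /* Trapped and emulated by the Linux kernel when HWCAP_CPUID is set.  */
  __asm__ volatile ("mrs %0, midr_el1" : "=r" (midr));
  return midr;
}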

[PATCH 13/12] C _BitInt incremental fixes [PR102989]

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 09:17:57PM +, Joseph Myers wrote:
> > - _Complex _BitInt(N) isn't supported; again mainly because none of the 
> > psABIs
> >   mention how those should be passed/returned; in a limited way they are
> >   supported internally because the internal functions into which
> >   __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
> >   hack to return 2 values without using references/pointers
> 
> What happens when the usual arithmetic conversions are applied to 
> operands, one of which is a complex integer type and the other of which is 
> a wider _BitInt type?  I don't see anything in the code to disallow this 
> case (which would produce an expression with a _Complex _BitInt type), or 
> any testcases for it.

I've added a sorry for that case (+ return the narrower COMPLEX_TYPE).
Also added testcase to verify we don't create VECTOR_TYPEs of BITINT_TYPE
even if they have mode precision and suitable size (others were rejected
already before).
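
A hedged illustration of the conversion case in question (in the spirit of
the new bitint-*.c tests, not copied from them):

/* Usual arithmetic conversions would need _Complex _BitInt(135) here;
   the patch now reports a sorry and returns the narrower complex type.  */
void
f (_Complex int ci, unsigned _BitInt(135) wb)
{
  __auto_type t = ci + wb;  /* sorry: _Complex _BitInt(135) unsupported */
  (void) t;
}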

> Other testcases I think should be present (along with any corresponding 
> changes needed to the code itself):
> 
> * Verifying that the new integer constant suffix is rejected for C++.

Done.

> * Verifying appropriate pedwarn-if-pedantic for the new constant suffix 
> for versions of C before C2x (and probably for use of _BitInt type 
> specifiers before C2x as well) - along with the expected -Wc11-c2x-compat 
> handling (in C2x mode) / -pedantic -Wno-c11-c2x-compat in older modes.

Done.

Here is an incremental patch which does that:

2023-08-10  Jakub Jelinek  

PR c/102989
gcc/c/
* c-decl.cc (finish_declspecs): Emit pedwarn_c11 on _BitInt.
* c-typeck.cc (c_common_type): Emit sorry for common type between
_Complex integer and larger _BitInt and return the _Complex integer.
gcc/c-family/
* c-attribs.cc (type_valid_for_vector_size): Reject vector types
with BITINT_TYPE elements even if they have mode precision and
suitable size.
gcc/testsuite/
* gcc.dg/bitint-19.c: New test.
* gcc.dg/bitint-20.c: New test.
* gcc.dg/bitint-21.c: New test.
* gcc.dg/bitint-22.c: New test.
* gcc.dg/bitint-23.c: New test.
* gcc.dg/bitint-24.c: New test.
* gcc.dg/bitint-25.c: New test.
* gcc.dg/bitint-26.c: New test.
* gcc.dg/bitint-27.c: New test.
* g++.dg/ext/bitint1.C: New test.
* g++.dg/ext/bitint2.C: New test.
* g++.dg/ext/bitint3.C: New test.
* g++.dg/ext/bitint4.C: New test.
libcpp/
* expr.cc (cpp_classify_number): Diagnose wb literal suffixes
for -pedantic* before C2X or -Wc11-c2x-compat.

--- gcc/c/c-decl.cc.jj  2023-08-10 09:26:39.776509713 +0200
+++ gcc/c/c-decl.cc 2023-08-10 11:14:12.686238299 +0200
@@ -12933,8 +12933,15 @@ finish_declspecs (struct c_declspecs *sp
   if (specs->u.bitint_prec == -1)
specs->type = integer_type_node;
   else
-   specs->type = build_bitint_type (specs->u.bitint_prec,
-specs->unsigned_p);
+   {
+ pedwarn_c11 (specs->locations[cdw_typespec], OPT_Wpedantic,
+  "ISO C does not support %<%s_BitInt(%d)%> before C2X",
+  specs->unsigned_p ? "unsigned "
+  : specs->signed_p ? "signed " : "",
+  specs->u.bitint_prec);
+ specs->type = build_bitint_type (specs->u.bitint_prec,
+  specs->unsigned_p);
+   }
   break;
 default:
   gcc_unreachable ();
--- gcc/c/c-typeck.cc.jj2023-08-10 09:26:39.781509641 +0200
+++ gcc/c/c-typeck.cc   2023-08-10 10:03:00.722917789 +0200
@@ -819,6 +819,12 @@ c_common_type (tree t1, tree t2)
return t1;
   else if (code2 == COMPLEX_TYPE && TREE_TYPE (t2) == subtype)
return t2;
+  else if (TREE_CODE (subtype) == BITINT_TYPE)
+   {
+ sorry ("%<_Complex _BitInt(%d)%> unsupported",
+TYPE_PRECISION (subtype));
+ return code1 == COMPLEX_TYPE ? t1 : t2;
+   }
   else
return build_complex_type (subtype);
 }
--- gcc/c-family/c-attribs.cc.jj2023-06-03 15:32:04.311412926 +0200
+++ gcc/c-family/c-attribs.cc   2023-08-10 10:07:05.222377604 +0200
@@ -4366,7 +4366,8 @@ type_valid_for_vector_size (tree type, t
  && GET_MODE_CLASS (orig_mode) != MODE_INT
  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
   || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
-  || TREE_CODE (type) == BOOLEAN_TYPE)
+  || TREE_CODE (type) == BOOLEAN_TYPE
+  || TREE_CODE (type) == BITINT_TYPE)
 {
   if (error_p)
error ("invalid vector type for attribute %qE", atname);
--- gcc/testsuite/gcc.dg/bitint-19.c.jj 2023-08-10 09:33:49.205287806 +0200
+++ gcc/testsuite/gcc.dg/bitint-19.c2023-08-10 09:36:43.312765194 +0200
@@ -0,0 +1,16 @@
+/* PR c/102989 */
+/* { dg-do compile { target bitint } } */
+/* { 

Re: [PATCH] match.pd: Implement missed optimization ((x ^ y) & z) | x -> (z & y) | x [PR109938]

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 08, 2023 at 03:18:51PM +0200, Richard Biener via Gcc-patches wrote:
> On Fri, Aug 4, 2023 at 11:49 PM Drew Ross via Gcc-patches
>  wrote:
> >
> > Adds a simplification for ((x ^ y) & z) | x to be folded into
> > (z & y) | x. Merges this simplification with ((x | y) & z) | x -> (z & y) | 
> > x
> > to prevent duplicate pattern. Tested successfully on x86_64 and x86 targets.
> 
> OK.

Shouldn't
  (bit_ior:c (bit_and:cs (bit_ior:cs @0 @1) @2) @0)
be changed to
  (bit_ior:c (nop_convert1?:s
   (bit_and:cs (nop_convert2?:s (op:cs @0 @1)) @2)) @3)
rather than
  (bit_ior:c (nop_convert1? (bit_and:c (nop_convert2? (op:c @0 @1)) @2)) @3)
in the patch?
I mean the :s modifiers were there for a reason: if some of the
intermediates aren't single use, then the simplification doesn't simplify
anything and can even make things larger.

Jakub
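
The underlying identity is easy to check bit by bit: where a bit of x is 1
both sides give 1, and where it is 0 both reduce to y & z.  A minimal sketch
of the before/after forms:

/* Both functions compute the same value for all inputs.  */
unsigned lhs (unsigned x, unsigned y, unsigned z) { return ((x ^ y) & z) | x; }
unsigned rhs (unsigned x, unsigned y, unsigned z) { return (z & y) | x; }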



[PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread Juzhe-Zhong
This patch fixes bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110962

SUBROUTINE a(b,c,d)
  LOGICAL,DIMENSION(:),INTENT(INOUT)  :: b
  LOGICAL e
  REAL, DIMENSION(:), INTENT(IN) ::  c
  REAL, DIMENSION(:), INTENT(INOUT)  ::  d
  REAL, DIMENSION(SIZE(c))   :: f
  WHERE (b.AND.e)
     WHERE (f>=0.)
        d = g
     ENDWHERE
  ENDWHERE
END SUBROUTINE a

   PR target/110962

gcc/ChangeLog:

* config/riscv/autovec.md (vec_duplicate): New pattern.

---
 gcc/config/riscv/autovec.md | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 6cb5fa3ed27..3b396a9a990 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -287,6 +287,27 @@
 ;; == Vector creation
 ;; =
 
+;; -
+;;  [BOOL] Duplicate element
+;; -
+;; The patterns in this section are synthetic.
+;; -
+
+;; Implement a predicate broadcast by duplicating the scalar input in a
+;; vector register and comparing it against zero.
+(define_expand "vec_duplicate<mode>"
+  [(set (match_operand:VB 0 "register_operand")
+   (vec_duplicate:VB (match_operand:QI 1 "register_operand")))]
+  "TARGET_VECTOR"
+  {
+    poly_int64 nunits = GET_MODE_NUNITS (<MODE>mode);
+    machine_mode mode = riscv_vector::get_vector_mode (QImode, nunits).require ();
+    rtx dup = expand_vector_broadcast (mode, operands[1]);
+    riscv_vector::expand_vec_cmp (operands[0], NE, dup, CONST0_RTX (mode));
+    DONE;
+  }
+)
+
 ;; -
 ;;  [INT] Linear series
 ;; -
-- 
2.36.3
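
A hedged scalar model of what the expander synthesizes: broadcast the scalar
into a vector register, then compare the result against zero to form the
predicate.

/* Scalar model only (roughly vmv.v.x followed by vmsne.vi).  */
void
bool_broadcast_model (unsigned char s, int n, _Bool *mask)
{
  unsigned char tmp[n];
  for (int i = 0; i < n; i++)
    tmp[i] = s;                /* expand_vector_broadcast */
  for (int i = 0; i < n; i++)
    mask[i] = (tmp[i] != 0);   /* expand_vec_cmp (..., NE, dup, 0) */
}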



Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, 10 Aug 2023, juzhe.zh...@rivai.ai wrote:

> >> I guess as a temporary thing your approach is OK but we shouldn't
> >> add these as part of new code - it's supposed to handle legacy
> >> cases that we didn't fixup yet.
> 
> Do you mean we need to fix LC SSA PHI flow so that we don't need to 
> set vinfo->any_known_not_updated_vssa = true ?
> 
> After it's fixed then this patch with removing 
> 'vinfo->any_known_not_updated_vssa = true' is ok for trunk, am I right?

I want to know why we don't need this for SVE fully masked loops.  What
inserts the required LC SSA PHI in that case?


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 9:55 AM Hongtao Liu  wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch supports 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options are
> > > > > interpreted by the driver into the 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctrl= doing the trick.
> > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctrl=^use_gather works, which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?

No, -mtune-ctrl=use_gather should turn them all on as well.

> We don't have an extra explicit flag for the target tune, just a single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]

GCC 11 just had that single bit for all.  I'm not sure how awkward it is
to have use_gather alias use_gather_2_parts, use_gather_4_parts ...
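
A hedged sketch of such an alias inside parse_mtune_ctrl_str (not the
committed change; "token" and "enable" are placeholder names for the
function's locals, while the X86_TUNE_* bits are the existing ones):

  /* Fan a plain "use_gather" token out to every gather tune bit.  */
  if (!strcmp (token, "use_gather"))
    {
      ix86_tune_features[X86_TUNE_USE_GATHER] = enable;
      ix86_tune_features[X86_TUNE_USE_GATHER_2PARTS] = enable;
      ix86_tune_features[X86_TUNE_USE_GATHER_4PARTS] = enable;
    }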



Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu  wrote:
>
> On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu  wrote:
> >
> > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > >  wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  
> > > > > > wrote:
> > > > > > >
> > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > similar for scatter, there're
> > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > >
> > > > > > > The patch supports 2 standardizing options to enable/disable
> > > > > > > vectorization for all gather/scatter instructions. The options are
> > > > > > > interpreted by the driver into the 3 tunes.
> > > > > > >
> > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > >
> > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > change what use_gather controls - it seems we changed its
> > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > use_gather_16parts)
> > > > > >
> > > > > > That is, what's the point of this?
> The point of this is to keep behavior consistent between GCC, LLVM, and
> ICX (Intel® oneAPI DPC++/C++ Compiler).
> LLVM and ICX will support that option.

GCC has very many options that are not the same as in LLVM or ICX;
I don't see a good reason to special-case this one.  As said, it's
a very bad name IMHO.

Richard.



[PATCH] preserve base pointer for __deregister_frame [PR110956]

2023-08-10 Thread Thomas Neumann via Gcc-patches

Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956
Rainer Orth successfully tested the patch on Solaris with a full bootstrap.



Some uncommon unwinding table encodings need to access the base pointer
for address computations. We do not have that information in calls to
__deregister_frame_info_bases, and previously simply used nullptr as
base pointer. That is usually fine, but for some Solaris i386 shared
libraries that results in wrong address computations.

To fix this problem we now associate the unwinding object with
the table pointer itself, which is always known, in addition to
the PC range. When deregistering a frame, we first locate the object
using the table pointer, and then use the base pointer stored within
the object to compute the PC range.

libgcc/ChangeLog:
PR libgcc/110956
* unwind-dw2-fde.c: Associate object with address of unwinding
table.
---
 libgcc/unwind-dw2-fde.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index d7c4a467754..ae4530179f3 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -124,6 +124,9 @@ __register_frame_info_bases (const void *begin, struct 
object *ob,
 #endif
 
 #ifdef ATOMIC_FDE_FAST_PATH

+  // Register the object itself to know the base pointer on deregistration.
+  btree_insert (&registered_frames, (uintptr_type) begin, 1, ob);
+
   // Register the frame in the b-tree
   uintptr_type range[2];
   get_pc_range (ob, range);
@@ -175,6 +178,9 @@ __register_frame_info_table_bases (void *begin, struct 
object *ob,
   ob->s.b.encoding = DW_EH_PE_omit;
 
 #ifdef ATOMIC_FDE_FAST_PATH

+  // Register the object itself to know the base pointer on deregistration.
+  btree_insert (&registered_frames, (uintptr_type) begin, 1, ob);
+
   // Register the frame in the b-tree
   uintptr_type range[2];
   get_pc_range (ob, range);
@@ -225,22 +231,17 @@ __deregister_frame_info_bases (const void *begin)
 return ob;
 
 #ifdef ATOMIC_FDE_FAST_PATH

-  // Find the corresponding PC range
-  struct object lookupob;
-  lookupob.tbase = 0;
-  lookupob.dbase = 0;
-  lookupob.u.single = begin;
-  lookupob.s.i = 0;
-  lookupob.s.b.encoding = DW_EH_PE_omit;
-#ifdef DWARF2_OBJECT_END_PTR_EXTENSION
-  lookupob.fde_end = NULL;
-#endif
-  uintptr_type range[2];
-  get_pc_range (&lookupob, range);
+  // Find the originally registered object to get the base pointer.
+  ob = btree_remove (&registered_frames, (uintptr_type) begin);
 
-  // And remove

-  ob = btree_remove (&registered_frames, range[0]);
-  bool empty_table = (range[1] - range[0]) == 0;
+  // Remove the corresponding PC range.
+  if (ob)
+{
+  uintptr_type range[2];
+  get_pc_range (ob, range);
+  if (range[0] != range[1])
+   btree_remove (&registered_frames, range[0]);
+}
 
   // Deallocate the sort array if any.

   if (ob && ob->s.b.sorted)
@@ -283,12 +284,11 @@ __deregister_frame_info_bases (const void *begin)
 
  out:

   __gthread_mutex_unlock (&object_mutex);
-  const int empty_table = 0; // The non-atomic path stores all tables.
 #endif
 
   // If we didn't find anything in the lookup data structures then they

   // were either already destroyed or we tried to remove an empty range.
-  gcc_assert (in_shutdown || (empty_table || ob));
+  gcc_assert (in_shutdown || ob);
   return (void *) ob;
 }
 
--

2.39.2



Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> What inserts the required LC SSA PHI in that case?

Here is how GCC inserts the LC SSA PHI for ARM SVE.
You can see the details in the following 'vect' dump:
https://godbolt.org/z/564o87oz3

You can see the following information:

;; Created LCSSA PHI: loop_mask_36 = PHI 
  # loop_mask_36 = PHI 
  _25 = .EXTRACT_LAST (loop_mask_36, vect_last_12.8_24);
  last_17 = _25;

The '# loop_mask_36 = PHI ' is inserted as follows:

Step 1 - Enter file tree-vectorizer.cc.
In the function pass_vectorize::execute (function *fun), line 1358,
'rewrite_into_loop_closed_ssa' is the key function that inserts the LC SSA PHI
for ARM SVE in the masked-loop case.

Step 2 - Investigate more into 'rewrite_into_loop_closed_ssa':
In file tree-ssa-loop-manip.cc:628, 'rewrite_into_loop_closed_ssa' is directly 
calling 'rewrite_into_loop_closed_ssa_1'.
Step 3 - Investigate 'rewrite_into_loop_closed_ssa_1':
In file tree-ssa-loop-manip.cc:588 which is the function 'find_uses_to_rename' 
that:
/* Marks names matching USE_FLAGS that are used outside of the loop they are
   defined in for rewrite.  Records the set of blocks in which the ssa names are
   used to USE_BLOCKS.  Record the SSA names that will need exit PHIs in
   NEED_PHIS.  If CHANGED_BBS is not NULL, scan only blocks in this set.  */

static void
find_uses_to_rename (bitmap changed_bbs, bitmap *use_blocks, bitmap need_phis,
 int use_flags)
{
  basic_block bb;
  unsigned index;
  bitmap_iterator bi;

  if (changed_bbs)
EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
  {
  bb = BASIC_BLOCK_FOR_FN (cfun, index);
  if (bb)
find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
  }
  else
FOR_EACH_BB_FN (bb, cfun)
  find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
}

This function iterates over all blocks of the function to set the bitmap of
SSA names that need to be renamed; a later function then inserts the LC SSA
PHIs for them.

In file tree-ssa-loop-manip.cc:606, the function 'add_exit_phis' is the one
that actually adds the LC SSA PHIs, eventually calling:
/* Add a loop-closing PHI for VAR in basic block EXIT.  */

static void
add_exit_phi (basic_block exit, tree var)
{
  gphi *phi;
  edge e;
  edge_iterator ei;

  /* Check that at least one of the edges entering the EXIT block exits
 the loop, or a superloop of that loop, that VAR is defined in.  */
  if (flag_checking)
{
  gimple *def_stmt = SSA_NAME_DEF_STMT (var);
  basic_block def_bb = gimple_bb (def_stmt);
  FOR_EACH_EDGE (e, ei, exit->preds)
  {
class loop *aloop = find_common_loop (def_bb->loop_father,
 e->src->loop_father);
if (!flow_bb_inside_loop_p (aloop, e->dest))
  break;
  }
  gcc_assert (e);
}

  phi = create_phi_node (NULL_TREE, exit);
  create_new_def_for (var, phi, gimple_phi_result_ptr (phi));
  FOR_EACH_EDGE (e, ei, exit->preds)
add_phi_arg (phi, var, e, UNKNOWN_LOCATION);

  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, ";; Created LCSSA PHI: ");
  print_gimple_stmt (dump_file, phi, 0, dump_flags);
}
}


This is how it works for ARM SVE with EXTRACT_LAST. The same flow
(rewrite_into_loop_closed_ssa) can always insert the LC SSA PHI for RVV,
which uses a length-controlled loop.
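
To make this concrete, here is a minimal C sketch (my own hypothetical
example, not from any patch) of the kind of loop this is about: the scalar
assigned inside the loop is live after the loop, which is what requires the
LC SSA PHI at the exit and the .EXTRACT_LAST (or .VEC_EXTRACT) of the last
active element:

int
last_element (int *a, int n)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    last = a[i];   /* 'last' is live outside the loop */
  return last;     /* needs a loop-closed PHI at the loop exit */
}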

However,

>> I want to know why we don't need this for SVE fully masked loops. 

Before entering 'rewrite_into_loop_closed_ssa', there is a check whose
assertion fails for RVV but passes for ARM SVE:

  /* We should not have to update virtual SSA form here but some
 transforms involve creating new virtual definitions which makes
 updating difficult.
 We delay the actual update to the end of the pass but avoid
 confusing ourselves by forcing need_ssa_update_p () to false.  */
  unsigned todo = 0;
  if (need_ssa_update_p (cfun))
{
  gcc_assert (loop_vinfo->any_known_not_updated_vssa);
  fun->gimple_df->ssa_renaming_needed = false;
  todo |= TODO_update_ssa_only_virtuals;
}

in tree-vectorizer.cc, function 'vect_transform_loops'.
The assertion (gcc_assert (loop_vinfo->any_known_not_updated_vssa);)
fails for RVV since the flag is false.

The reason ARM SVE passes is that STMT1 before 'vectorizable_live_operation'
and STMT2 after vectorization by 'vectorizable_live_operation' are both CONST
or PURE, since ARM SVE uses EXTRACT_LAST. Here is the definition of the
'EXTRACT_LAST' internal function:
/* Extract the last active element from a vector.  */
DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
   extract_last, fold_left)

You can see 'EXTRACT_LAST' is ECF_CONST.
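
For reference, here is a scalar sketch (my own illustration, not from the
sources) of what .EXTRACT_LAST (mask, vec) computes:

/* Scalar reference semantics for .EXTRACT_LAST: the element of VEC at the
   position of the last set bit of MASK, i.e. the last active element under
   the loop mask.  */
static int
extract_last_ref (const _Bool *mask, const int *vec, int n)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    if (mask[i])
      last = vec[i];
  return last;
}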

Whereas RVV fails since it uses 'VEC_EXTRACT', which is not ECF_CONST:
DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, 0, vec_extract, vec_extract)

When I changed VEC_EXTRACT to ECF_CONST, we no longer need
'vinfo->any_known_not_updated_vssa = true'.
The flow works perfectly, no different from ARM SVE.

However, I found we can't make 'VEC_EXTRACT' as ECF_CONST since I found some 
targets us

c: Support for -Wuseless-cast [PR84510]

2023-08-10 Thread Martin Uecker via Gcc-patches



This patch adds the missing support for -Wuseless-cast
to the C FE as requested by some users. It found about 
50 useless casts in one of my projects without false 
positives.

(I also implemented detection of various unneeded pointer casts in
convert_for_assignment, such as unneeded casts from/to void or casts
followed by an implicit conversion to the original type, but I did not
figure out how to reliably identify such casts there... This would be
a potential future enhancement.)


Bootstrapped and regression tested on x86_64-pc-linux-gnu.



c: Support for -Wuseless-cast [PR84510]

Add support for -Wuseless-cast to C (and ObjC).

PR c/84510

gcc/c/:
* c-typeck.cc (build_c_cast): Add warning.

gcc/doc/:
* invoke.texi: Update.

gcc/testsuite/:
* gcc.dg/Wuseless-cast.c: New test.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 0ed87fcc7be..c7b567ba7ab 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1490,7 +1490,7 @@ C++ ObjC++ Var(warn_zero_as_null_pointer_constant) Warning
 Warn when a literal '0' is used as null pointer.
 
 Wuseless-cast
-C++ ObjC++ Var(warn_useless_cast) Warning
+C ObjC C++ ObjC++ Var(warn_useless_cast) Warning
 Warn about useless casts.
 
 Wsubobject-linkage
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..6f2fff51683 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -6062,9 +6062,13 @@ build_c_cast (location_t loc, tree type, tree expr)
 
   if (type == TYPE_MAIN_VARIANT (TREE_TYPE (value)))
 {
-  if (RECORD_OR_UNION_TYPE_P (type))
-   pedwarn (loc, OPT_Wpedantic,
-"ISO C forbids casting nonscalar to the same type");
+  if (RECORD_OR_UNION_TYPE_P (type)
+ && pedwarn (loc, OPT_Wpedantic,
+ "ISO C forbids casting nonscalar to the same type"))
+ ;
+  else if (warn_useless_cast)
+   warning_at (loc, OPT_Wuseless_cast,
+   "useless cast to type %qT", type);
 
   /* Convert to remove any qualifiers from VALUE's type.  */
   value = convert (type, value);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..75ca72f3190 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4772,7 +4772,7 @@ pointers after reallocation.
 
 @opindex Wuseless-cast
 @opindex Wno-useless-cast
-@item -Wuseless-cast @r{(C++ and Objective-C++ only)}
+@item -Wuseless-cast @r{(C, Objective-C, C++ and Objective-C++ only)}
 Warn when an expression is cast to its own type.  This warning does not
 occur when a class object is converted to a non-reference type as that
 is a way to create a temporary:
diff --git a/gcc/testsuite/gcc.dg/Wuseless-cast.c b/gcc/testsuite/gcc.dg/Wuseless-cast.c
new file mode 100644
index 000..86e87584b87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wuseless-cast.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-Wuseless-cast" } */
+
+void foo(void)
+{  
+   // casts to the same type
+   int i = 0;
+   const int ic = 0;
+   struct foo { int x; } x = { 0 };
+   int q[3];
+   (int)ic;/* { dg-warning "useless cast" } */
+   (int)i; /* { dg-warning "useless cast" } */
+   (const int)ic;  /* { dg-warning "useless cast" } */
+   (const int)i;   /* { dg-warning "useless cast" } */
+   (struct foo)x;  /* { dg-warning "useless cast" } */
+   (int(*)[3])&q;  /* { dg-warning "useless cast" } */
+   (_Atomic(int))i;/* { dg-warning "useless cast" } */
+
+   // not the same
+   int n = 3;
+   (int(*)[n])&q;  // no warning
+   int j = (int)0UL;
+   enum X { A = 1 } xx = { A };
+   enum Y { B = 1 } yy = (enum Y)xx;
+}
+




Re: [PATCH] RISC-V: Add missing modes to the iterators

2023-08-10 Thread Robin Dapp via Gcc-patches
Yeah, thanks, better in this separate patch.

OK.

Regards
 Robin



Re: [PATCH] testsuite: Fix gcc.dg/analyzer/allocation-size-multiline-[123].c [PR 110426]

2023-08-10 Thread Christophe Lyon via Gcc-patches
Hi!

On Wed, 9 Aug 2023 at 22:30, David Malcolm  wrote:

> On Tue, 2023-08-08 at 15:01 +, Christophe Lyon wrote:
> > For 32-bit newlib targets (e.g. arm-eabi)  int32_t is "long int".
> >
> > Like previous patches in these tests, update the matching regexps to
> > match "aka (long )?int".
> >
> > Tested on arm-eabi and aarch64-linux-gnu.
>
> Sorry about this breakage.
>
> These tests used to emit the infomation as multiple messages, but were
> consolidated as a side-effect of r14-3001-g021077b94741c9.
>
> I've just committed r14-3114-g73da34a538ddc2, a cleanup of the analyzer
> code, which has a side-effect of splitting the messages back up.  I
> believe that r14-3114 restores these tests to their pre-r14-3001 state,
> but I might have messed up.
>
> Does r14-3114-g73da34a538ddc2 fix the issues for you, or is some
> patching still needed?
>
>
Thanks, indeed the tests pass again (both aarch64 and arm targets)

Christophe


> Dave
>
>
> >
> > 2023-08-08  Christophe Lyon  
> >
> > gcc/testsuite/
> > PR analyzer/110426
> > * gcc.dg/analyzer/allocation-size-multiline-1.c: Handle
> > int32_t being "long int".
> > * gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise.
> > * gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c | 6 +++-
> > --
> >  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c | 6 +++-
> > --
> >  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-3.c | 4 ++--
> >  3 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-
> > 1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> > index 9938ba237a0..b56e4b4e8e1 100644
> > --- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> > @@ -16,7 +16,7 @@ void test_constant_1 (void)
> >  |   int32_t *ptr = __builtin_malloc (1);
> >  |  ^~~~
> >  |  |
> > -|  (1) allocated 1 bytes and assigned to
> > 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> > +|  (1) allocated 1 bytes and assigned to
> > 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> > {re:long :re?}int})' is '4'
> >  |
> > { dg-end-multiline-output "" } */
> >
> > @@ -34,7 +34,7 @@ void test_constant_2 (void)
> >  |   int32_t *ptr = __builtin_malloc (2);
> >  |  ^~~~
> >  |  |
> > -|  (1) allocated 2 bytes and assigned to
> > 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> > +|  (1) allocated 2 bytes and assigned to
> > 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> > {re:long :re?}int})' is '4'
> >  |
> > { dg-end-multiline-output "" } */
> >
> > @@ -52,6 +52,6 @@ void test_symbolic (int n)
> >  |   int32_t *ptr = __builtin_malloc (n * 2);
> >  |  ^~~~
> >  |  |
> > -|  (1) allocated 'n * 2' bytes and assigned to
> > 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> > +|  (1) allocated 'n * 2' bytes and assigned to
> > 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> > {re:long :re?}int})' is '4'
> >  |
> > { dg-end-multiline-output "" } */
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-
> > 2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> > index 9e1269cbb7a..8912913a78c 100644
> > --- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> > @@ -16,7 +16,7 @@ void test_constant_1 (void)
> >  |   int32_t *ptr = __builtin_alloca (1);
> >  |  ^~~~
> >  |  |
> > -|  (1) allocated 1 bytes and assigned to
> > 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> > +|  (1) allocated 1 bytes and assigned to
> > 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> > {re:long :re?}int})' is '4'
> >  |
> > { dg-end-multiline-output "" } */
> >
> > @@ -33,7 +33,7 @@ void test_constant_2 (void)
> >  |   int32_t *ptr = __builtin_alloca (2);
> >  |  ^~~~
> >  |  |
> > -|  (1) allocated 2 bytes and assigned to
> > 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> > +|  (1) allocated 2 bytes and assigned to
> > 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> > {re:long :re?}int})' is '4'
> >  |
> > { dg-end-multiline-output "" } */
> >
> > @@ -50,7

Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread Robin Dapp via Gcc-patches
Is the testcase already in the test suite?  If not we should add it.
Apart from that LGTM. 

Regards
 Robin


Re: Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread juzhe.zh...@rivai.ai
I didn't add it since I don't know how to add a target-specific Fortran
testcase.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-10 19:55
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]
Is the testcase already in the test suite?  If not we should add it.
Apart from that LGTM. 
 
Regards
Robin
 


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Jan Hubicka via Gcc-patches
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
> 
> Yes, I know.  But there's -mtune-ctl= doing the trick.
> GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
> 
> Then -mtune-ctl=^use_gather works which I think is nice enough?

-mtune-ctrl is really intended for GCC developers.  It is neither backward
compatible nor fully documented, and bad sets of values may trigger ICEs.
If gathers became very slow, I think normal users may want to disable
them, and in such a situation a specialized command line option makes sense
to me.

Honza
> 
> Richard.
> 
> > Uros.


Re: [PATCH] RISC-V: Support TU for integer ternary OP[PR110964]

2023-08-10 Thread Robin Dapp via Gcc-patches
OK.

Regards
 Robin



Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, 10 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> What inserts the required LC SSA PHI in that case?
> 
> Here is the flow how GCC insert LC SSA PHI flow for ARM SVE.
> You can see this following 'vect' dump details:
> https://godbolt.org/z/564o87oz3 
> 
> You can see this following information:
> 
> ;; Created LCSSA PHI: loop_mask_36 = PHI 
>   # loop_mask_36 = PHI 
>   _25 = .EXTRACT_LAST (loop_mask_36, vect_last_12.8_24);
>   last_17 = _25;
> 
> The '# loop_mask_36 = PHI ' is inserted as follows:
> 
> Step 1 - Enter file tree-vectorizer.cc
> In the function pass_vectorize::execute (function *fun): 1358
> 'rewrite_into_loop_closed_ssa' is the key function insert the LC SSA PHI for 
> ARM SVE in mask loop case.
> 
> Step 2 - Investigate more into 'rewrite_into_loop_closed_ssa':
> In file tree-ssa-loop-manip.cc:628, 'rewrite_into_loop_closed_ssa' is 
> directly calling 'rewrite_into_loop_closed_ssa_1'.
> Step 3 - Investigate 'rewrite_into_loop_closed_ssa_1':
> In file tree-ssa-loop-manip.cc:588 which is the function 
> 'find_uses_to_rename' that:
> /* Marks names matching USE_FLAGS that are used outside of the loop they are
>defined in for rewrite.  Records the set of blocks in which the ssa names 
> are
>used to USE_BLOCKS.  Record the SSA names that will need exit PHIs in
>NEED_PHIS.  If CHANGED_BBS is not NULL, scan only blocks in this set.  */
> 
> static void
> find_uses_to_rename (bitmap changed_bbs, bitmap *use_blocks, bitmap need_phis,
>  int use_flags)
> {
>   basic_block bb;
>   unsigned index;
>   bitmap_iterator bi;
> 
>   if (changed_bbs)
> EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
>   {
>   bb = BASIC_BLOCK_FOR_FN (cfun, index);
>   if (bb)
> find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
>   }
>   else
> FOR_EACH_BB_FN (bb, cfun)
>   find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
> }
> 
> This function is iterating all blocks of the function to set the BITMAP which 
> SSA need to be renamed then the later function will insert LC SSA for it.
> 
> In file tree-ssa-loop-manip.cc:606 which is the function 'add_exit_phis' that 
> is the real function that is adding LC SSA by calling
> this eventually:
> /* Add a loop-closing PHI for VAR in basic block EXIT.  */
> 
> static void
> add_exit_phi (basic_block exit, tree var)
> {
>   gphi *phi;
>   edge e;
>   edge_iterator ei;
> 
>   /* Check that at least one of the edges entering the EXIT block exits
>  the loop, or a superloop of that loop, that VAR is defined in.  */
>   if (flag_checking)
> {
>   gimple *def_stmt = SSA_NAME_DEF_STMT (var);
>   basic_block def_bb = gimple_bb (def_stmt);
>   FOR_EACH_EDGE (e, ei, exit->preds)
>   {
> class loop *aloop = find_common_loop (def_bb->loop_father,
>  e->src->loop_father);
> if (!flow_bb_inside_loop_p (aloop, e->dest))
>   break;
>   }
>   gcc_assert (e);
> }
> 
>   phi = create_phi_node (NULL_TREE, exit);
>   create_new_def_for (var, phi, gimple_phi_result_ptr (phi));
>   FOR_EACH_EDGE (e, ei, exit->preds)
> add_phi_arg (phi, var, e, UNKNOWN_LOCATION);
> 
>   if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   fprintf (dump_file, ";; Created LCSSA PHI: ");
>   print_gimple_stmt (dump_file, phi, 0, dump_flags);
> }
> }
> 
> 
> This is how it works for ARM SVE in EXTRACT_LAST. Such flow 
> (rewrite_into_loop_closed_ssa) can always insert LC SSA for RVV which is 
> using length loop.
> 
> However,
> 
> >> I want to know why we don't need this for SVE fully masked loops. 
> 
> Before entering 'rewrite_into_loop_closed_ssa', there is a check here that 
> RVV assertion failed but ARM SVE passed:
> 
>   /* We should not have to update virtual SSA form here but some
>  transforms involve creating new virtual definitions which makes
>  updating difficult.
>  We delay the actual update to the end of the pass but avoid
>  confusing ourselves by forcing need_ssa_update_p () to false.  */
>   unsigned todo = 0;
>   if (need_ssa_update_p (cfun))
> {
>   gcc_assert (loop_vinfo->any_known_not_updated_vssa);
>   fun->gimple_df->ssa_renaming_needed = false;
>   todo |= TODO_update_ssa_only_virtuals;
> }
> 
> in tree-vectorizer.cc, function 'vect_transform_loops'
> The assertion (gcc_assert (loop_vinfo->any_known_not_updated_vssa);)
> failed for RVV since it is false.
> 
> The reason why ARM SVE can pass is that the STMT1 before 
> 'vectorizable_live_operation' and STMT2 after vectorization of 
> 'vectorizable_live_operation'
> are both CONST or PURE since ARM SVE is using EXTRACT_LAST, here is the 
> define of 'EXTRACT_LAST' internal function:
> /* Extract the last active element from a vector.  */
> DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
>extract_last, fold_left)
> 
> You can see 'EXTRACT_LAST' is ECF_CONST.
> 
> Wheras, RVV will fail since it is 'VEC_EXTRACT' which is not

[PATCH V2] RISC-V: Fix error combine of pred_mov pattern

2023-08-10 Thread Lehua Ding
Hi,

This patch fixes PR110943, where wrong code was produced because of an
incorrect combine of the pred_mov pattern. Consider this code:

```

void foo9 (void *base, void *out, size_t vl)
{
  int64_t scalar = *(int64_t *)(base + 100);
  vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
  *(vint64m2_t *)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t 
*)out_4(D)]+0 S[32, 32] A128])
(reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 
{*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 
S[32, 32] A128])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combine changes the semantics of insn 14. I refined the condition of the
@pred_mov pattern to be more restrictive. Is it OK for trunk?

Best,
Lehua

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
  New.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::function_expander):
  force_reg mem operand.
* config/riscv/vector.md (@pred_mov): Wrapper.
(*pred_mov): Remove imm -> reg pattern.
(*pred_broadcast_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Update.
* gcc.target/riscv/rvv/base/pr110943.c: New test.

---
 gcc/config/riscv/predicates.md|  5 +
 gcc/config/riscv/riscv-vector-builtins.cc |  8 +-
 gcc/config/riscv/vector.md| 97 +++
 .../gcc.target/riscv/rvv/base/pr110943.c  | 33 +++
 .../riscv/rvv/base/zvfhmin-intrinsic.c| 10 +-
 5 files changed, 104 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110943.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 9db28c2def7..f2e406c718a 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -295,6 +295,11 @@
   (ior (match_operand 0 "register_operand")
(match_operand 0 "const_int_operand")))
 
+(define_predicate "vector_const_int_or_double_0_operand"
+  (and (match_code "const_vector")
+   (match_test "satisfies_constraint_vi (op)
+|| satisfies_constraint_Wc0 (op)")))
+
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
(and (match_code "const_vector")
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index abab06c00ed..2da542585a8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3471,7 +3471,13 @@ function_expander::function_expander (const function_instance &instance,
 exp (exp_in), target (target_in), opno (0)
 {
   if (!function_returns_void_p ())
-create_output_operand (&m_ops[opno++], target, TYPE_MODE (TREE_TYPE 
(exp)));
+{
+  if (target != NULL_RTX && MEM_P (target))
+   /* Use force_reg to prevent illegal mem-to-mem pattern on -O0.  */
+   target = force_reg (GET_MODE (target), target);
+  create_output_operand (&m_ops[opno++], target,
+TYPE_MODE (TREE_TYPE (exp)));
+}
 }
 
 /* Take argument ARGNO from EXP's argument list and convert it into
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cf37b472930..508a3074080 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1446,69 +1446,60 @@
 ;; - 15.1 Vector Mask-Register Logical Instructions
 ;; 
--

RE: [PATCH] RISC-V: Add missing modes to the iterators

2023-08-10 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, August 10, 2023 7:50 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@gmail.com; kito.ch...@sifive.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Add missing modes to the iterators

Yeah, thanks, better in this separate patch.

OK.

Regards
 Robin



[PATCH] Make ISEL used internal functions const/nothrow where appropriate

2023-08-10 Thread Richard Biener via Gcc-patches
Both .VEC_SET and .VEC_EXTRACT and the various .VCOND internal functions
operate on registers only and they are not supposed to raise
any exceptions.  The following makes them const/nothrow.  I've
verified this avoids useless SSA updates in ISEL.
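
As a concrete trigger (my own illustration, not from the patch), a
variable-index subscript on a generic vector is what gimple-isel expands
into .VEC_EXTRACT:

typedef int v4si __attribute__ ((vector_size (16)));

int
get (v4si v, int i)
{
  return v[i];   /* isel: _3 = .VEC_EXTRACT (v, i); */
}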

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

* internal-fn.def (VCOND, VCONDU, VCONDEQ, VCOND_MASK,
VEC_SET, VEC_EXTRACT): Make ECF_CONST | ECF_NOTHROW.
---
 gcc/internal-fn.def | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bf6825c5d00..b3c410f4b6a 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -209,13 +209,15 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, 
vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
   vec_mask_store_lanes, mask_store_lanes)
 
-DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
-DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_cond)
-DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_cond)
-DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
-
-DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
-DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, 0, vec_extract, vec_extract)
+DEF_INTERNAL_OPTAB_FN (VCOND, ECF_CONST | ECF_NOTHROW, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
+  vcond_mask, vec_cond_mask)
+
+DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
+  vec_extract, vec_extract)
 
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_STORE, 0, mask_len_store, mask_len_store)
-- 
2.35.3


RE: [PATCH] RISC-V: Support TU for integer ternary OP[PR110964]

2023-08-10 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, August 10, 2023 8:09 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@gmail.com; kito.ch...@sifive.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Support TU for integer ternary OP[PR110964]

OK.

Regards
 Robin



Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread juzhe.zh...@rivai.ai
Hi,Richi.

>> comments do not match the implementation, but then that should be fixed?

You mean you allow me to change VEC_EXTRACT into ECF_CONST?
If I can change VEC_EXTRACT into ECF_CONST, then this patch can definitely
work with no need for 'vinfo->any_known_not_updated_vssa = true'.

So, let me conclude:

I can remove 'vinfo->any_known_not_updated_vssa = true' and
set VEC_EXTRACT as ECF_CONST, then bootstrap and regtest on x86 and send the
V3 patch if that passes.

Am I right?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-10 20:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Thu, 10 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> >> What inserts the required LC SSA PHI in that case?
> 
> Here is the flow how GCC insert LC SSA PHI flow for ARM SVE.
> You can see this following 'vect' dump details:
> https://godbolt.org/z/564o87oz3 
> 
> You can see this following information:
> 
> ;; Created LCSSA PHI: loop_mask_36 = PHI 
>   # loop_mask_36 = PHI 
>   _25 = .EXTRACT_LAST (loop_mask_36, vect_last_12.8_24);
>   last_17 = _25;
> 
> The '# loop_mask_36 = PHI ' is inserted as follows:
> 
> Step 1 - Enter file tree-vectorizer.cc
> In the function pass_vectorize::execute (function *fun): 1358
> 'rewrite_into_loop_closed_ssa' is the key function insert the LC SSA PHI for 
> ARM SVE in mask loop case.
> 
> Step 2 - Investigate more into 'rewrite_into_loop_closed_ssa':
> In file tree-ssa-loop-manip.cc:628, 'rewrite_into_loop_closed_ssa' is 
> directly calling 'rewrite_into_loop_closed_ssa_1'.
> Step 3 - Investigate 'rewrite_into_loop_closed_ssa_1':
> In file tree-ssa-loop-manip.cc:588 which is the function 
> 'find_uses_to_rename' that:
> /* Marks names matching USE_FLAGS that are used outside of the loop they are
>defined in for rewrite.  Records the set of blocks in which the ssa names 
> are
>used to USE_BLOCKS.  Record the SSA names that will need exit PHIs in
>NEED_PHIS.  If CHANGED_BBS is not NULL, scan only blocks in this set.  */
> 
> static void
> find_uses_to_rename (bitmap changed_bbs, bitmap *use_blocks, bitmap need_phis,
>  int use_flags)
> {
>   basic_block bb;
>   unsigned index;
>   bitmap_iterator bi;
> 
>   if (changed_bbs)
> EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
>   {
>   bb = BASIC_BLOCK_FOR_FN (cfun, index);
>   if (bb)
> find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
>   }
>   else
> FOR_EACH_BB_FN (bb, cfun)
>   find_uses_to_rename_bb (bb, use_blocks, need_phis, use_flags);
> }
> 
> This function is iterating all blocks of the function to set the BITMAP which 
> SSA need to be renamed then the later function will insert LC SSA for it.
> 
> In file tree-ssa-loop-manip.cc:606 which is the function 'add_exit_phis' that 
> is the real function that is adding LC SSA by calling
> this eventually:
> /* Add a loop-closing PHI for VAR in basic block EXIT.  */
> 
> static void
> add_exit_phi (basic_block exit, tree var)
> {
>   gphi *phi;
>   edge e;
>   edge_iterator ei;
> 
>   /* Check that at least one of the edges entering the EXIT block exits
>  the loop, or a superloop of that loop, that VAR is defined in.  */
>   if (flag_checking)
> {
>   gimple *def_stmt = SSA_NAME_DEF_STMT (var);
>   basic_block def_bb = gimple_bb (def_stmt);
>   FOR_EACH_EDGE (e, ei, exit->preds)
>   {
> class loop *aloop = find_common_loop (def_bb->loop_father,
>  e->src->loop_father);
> if (!flow_bb_inside_loop_p (aloop, e->dest))
>   break;
>   }
>   gcc_assert (e);
> }
> 
>   phi = create_phi_node (NULL_TREE, exit);
>   create_new_def_for (var, phi, gimple_phi_result_ptr (phi));
>   FOR_EACH_EDGE (e, ei, exit->preds)
> add_phi_arg (phi, var, e, UNKNOWN_LOCATION);
> 
>   if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   fprintf (dump_file, ";; Created LCSSA PHI: ");
>   print_gimple_stmt (dump_file, phi, 0, dump_flags);
> }
> }
> 
> 
> This is how it works for ARM SVE in EXTRACT_LAST. Such flow 
> (rewrite_into_loop_closed_ssa) can always insert LC SSA for RVV which is 
> using length loop.
> 
> However,
> 
> >> I want to know why we don't need this for SVE fully masked loops. 
> 
> Before entering 'rewrite_into_loop_closed_ssa', there is a check here that 
> RVV assertion failed but ARM SVE passed:
> 
>   /* We should not have to update virtual SSA form here but some
>  transforms involve creating new virtual definitions which makes
>  updating difficult.
>  We delay the actual update to the end of the pass but avoid
>  confusing ourselves by forcing need_ssa_update_p () to false.  */
>   unsigned todo = 0;
>   if (need_ssa_update_p (cfun))
> {
>   gcc_assert (loop_vinfo->any_known_not_updated_vssa);
>   fun->gimple_df->ssa_renaming_needed = false;
>   todo |= TODO_update_ssa_only_virtuals;
> }
> 
> in tree-vectorizer.cc,

Re: [PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API

2023-08-10 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-10 13:09
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API
From: Pan Li 
 
This patch would like to support the rounding mode API for the
VFMACC for the below samples.
 
* __riscv_vfmacc_vv_f32m1_rm
* __riscv_vfmacc_vv_f32m1_rm_m
* __riscv_vfmacc_vf_f32m1_rm
* __riscv_vfmacc_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class vfmacc_frm): New class for vfmacc frm.
(vfmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmacc_frm): New function definition.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-macc.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  3 ++
.../riscv/rvv/base/float-point-macc.c | 47 +++
4 files changed, 76 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index afe3735f5ee..1695d77e8bd 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -356,6 +356,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfmacc
+*/
+class vfmacc_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (true,
+ code_for_pred_mul_scalar (PLUS,
+   e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (true,
+ code_for_pred_mul (PLUS, e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2116,6 +2139,7 @@ static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
static CONSTEXPR const widen_binop_frm vfwmul_frm_obj;
static CONSTEXPR const vfmacc vfmacc_obj;
+static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
static CONSTEXPR const vfnmsac vfnmsac_obj;
static CONSTEXPR const vfmadd vfmadd_obj;
static CONSTEXPR const vfnmsub vfnmsub_obj;
@@ -2351,6 +2375,7 @@ BASE (vfrdiv_frm)
BASE (vfwmul)
BASE (vfwmul_frm)
BASE (vfmacc)
+BASE (vfmacc_frm)
BASE (vfnmsac)
BASE (vfmadd)
BASE (vfnmsub)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 2d2b52a312c..67d18412b4c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -160,6 +160,7 @@ extern const function_base *const vfrdiv_frm;
extern const function_base *const vfwmul;
extern const function_base *const vfwmul_frm;
extern const function_base *const vfmacc;
+extern const function_base *const vfmacc_frm;
extern const function_base *const vfnmsac;
extern const function_base *const vfmadd;
extern const function_base *const vfnmsub;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index d43b33ded17..92ecf8a9065 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -349,6 +349,9 @@ DEF_RVV_FUNCTION (vfnmadd, alu, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvvv_ops)
DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvvv_ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
+
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwfv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
new file mode 100644
index 000..df29f4d240f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmacc_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmacc_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmacc_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmacc_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfmacc

Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-10 Thread Lehua Ding
Hi Jeff,

After reconsidering, I think the split of the pattern you mentioned makes
sense to me. I have split the `@pred_mov` pattern; the new version of the
patch is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626981.html


Best,
Lehua

Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread Robin Dapp via Gcc-patches
> Hmm, I think VEC_EXTRACT and VEC_SET should be ECF_CONST.  Maybe the 
> GIMPLE ISEL
> comments do not match the implementation, but then that should be fixed?
> 
> /* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls 
> to
>internal function based on vector type of selected expansion.
> 
>For vec_set:
> 
>  VIEW_CONVERT_EXPR(u)[_1] = i_4(D);
>=>
>  _7 = u;
>  _8 = .VEC_SET (_7, i_4(D), _1);
>  u = _8;
>   
>For vec_extract:
> 
>   _3 = VIEW_CONVERT_EXPR(vD.2208)[idx_2(D)];
>=>
>   _4 = vD.2208;
>   _3 = .VEC_EXTRACT (_4, idx_2(D));  */
> 

I probably just forgot to set ECF_CONST in the recent isel patch
for vec_extract.

Regards
 Robin


Re: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, 10 Aug 2023, Robin Dapp wrote:

> > Hmm, I think VEC_EXTRACT and VEC_SET should be ECF_CONST.  Maybe the 
> > GIMPLE ISEL
> > comments do not match the implementation, but then that should be fixed?
> > 
> > /* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls 
> > to
> >internal function based on vector type of selected expansion.
> > 
> >For vec_set:
> > 
> >  VIEW_CONVERT_EXPR(u)[_1] = i_4(D);
> >=>
> >  _7 = u;
> >  _8 = .VEC_SET (_7, i_4(D), _1);
> >  u = _8;
> >   
> >For vec_extract:
> > 
> >   _3 = VIEW_CONVERT_EXPR(vD.2208)[idx_2(D)];
> >=>
> >   _4 = vD.2208;
> >   _3 = .VEC_EXTRACT (_4, idx_2(D));  */
> > 
> 
> I probably just forgot to set ECF_CONST in the recent isel patch
> for vec_extract.

I'm testing a patch adjusting a few IFNs where that was missed.

Richard.


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Phoebe Wang via Gcc-patches
>  Changing ABIs like that for existing code that has worked for some time on
>  existing hardware is a bad idea.

I agree, so Proposal 3 is the last choice.

The goal of the proposals is to solve the ABI incompatibility issue between
AVX10-256 and AVX10-512 when passing/returning 512-bit vectors. So we are
discussing the default ABI rather than other vector variants.

If you believe that changing the 512-bit ABI (the 512-bit version) is a bad
idea, how about Proposals 1 and 2? I don't want to call the non-512-bit
version an ABI because it doesn't provide the interaction between 256-bit
and 512-bit targets. Besides, LLVM also behaves differently from GCC on
non-512-bit targets. It is a good time to solve the problem together if we
make the 512-bit ABI consistent and target independent. WDYT?

Thanks
Phoebe

Joseph Myers wrote on Thu, Aug 10, 2023 at 04:43:

> On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
>
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
>
> Changing ABIs like that for existing code that has worked for some time on
> existing hardware is a bad idea.
>
> At this point it seems appropriate to remind people of another ABI
> consideration for vector extensions.  glibc's libmvec defines vector
> versions of various functions, including AVX512 ones (of course those
> function versions only work on hardware with the relevant instructions).
> glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> including those headers, what function variants are available in glibc.
>
> Existing glibc versions need to continue to work with new compiler
> versions.  That is, it's part of the ABI, which must remain stable,
> exactly which function versions the above pragma and attribute imply are
> available - and of course the details of how those functions versions take
> arguments / return results are also part of the ABI (it would be OK for a
> new compiler to choose not to use some of those vector versions, but not
> to start calling them with a different ABI).
>
> Maybe you'll want to add new vector function versions, with different
> interfaces, to libmvec in future.  If so, you need a *different* pragma or
> attribute to declare to the compiler that the libmvec version using that
> pragma or attribute has the additional functions - so new compilers using
> the existing header will not try to generate calls to new function
> versions that don't exist in that glibc version (but new compilers using a
> new header version from new glibc will see the new pragma or attribute and
> so be able to generate the relevant calls to new functions).  And once
> you've defined the ABI for such a new pragma or attribute, that itself
> then becomes a stable interface - so if you end up with vector extensions
> involving yet another set of interfaces, they need another corresponding
> new pragma / attribute for libmvec to declare to the compiler that the new
> interfaces exist.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
>
> --
> You received this message because you are subscribed to the Google Groups
> "X86-64 System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to x86-64-abi+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/x86-64-abi/8fb470de-d2a3-3e71-be6a-ccc7f4f31a31%40codesourcery.com
> .
>


Re: Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread juzhe.zh...@rivai.ai
Is this patch OK? Maybe we can find a way to add a target-specific Fortran
test, but that should not block this bug fix.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-10 19:55
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]
Is the testcase already in the test suite?  If not we should add it.
Apart from that LGTM. 
 
Regards
Robin
 


RE: [PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API

2023-08-10 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 10, 2023 8:27 PM
To: Li, Pan2 ; gcc-patches 
Cc: jeffreyalaw ; Li, Pan2 ; Wang, 
Yanzhang ; kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-10 13:09
To: gcc-patches
CC: juzhe.zhong; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch would like to support the rounding mode API for the
VFMACC for the below samples.

* __riscv_vfmacc_vv_f32m1_rm
* __riscv_vfmacc_vv_f32m1_rm_m
* __riscv_vfmacc_vf_f32m1_rm
* __riscv_vfmacc_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmacc_frm): New class for vfmacc frm.
(vfmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmacc_frm): New function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-macc.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  3 ++
.../riscv/rvv/base/float-point-macc.c | 47 +++
4 files changed, 76 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index afe3735f5ee..1695d77e8bd 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -356,6 +356,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfmacc
+*/
+class vfmacc_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (true,
+ code_for_pred_mul_scalar (PLUS,
+   e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (true,
+ code_for_pred_mul (PLUS, e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2116,6 +2139,7 @@ static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
static CONSTEXPR const widen_binop_frm vfwmul_frm_obj;
static CONSTEXPR const vfmacc vfmacc_obj;
+static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
static CONSTEXPR const vfnmsac vfnmsac_obj;
static CONSTEXPR const vfmadd vfmadd_obj;
static CONSTEXPR const vfnmsub vfnmsub_obj;
@@ -2351,6 +2375,7 @@ BASE (vfrdiv_frm)
BASE (vfwmul)
BASE (vfwmul_frm)
BASE (vfmacc)
+BASE (vfmacc_frm)
BASE (vfnmsac)
BASE (vfmadd)
BASE (vfnmsub)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 2d2b52a312c..67d18412b4c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -160,6 +160,7 @@ extern const function_base *const vfrdiv_frm;
extern const function_base *const vfwmul;
extern const function_base *const vfwmul_frm;
extern const function_base *const vfmacc;
+extern const function_base *const vfmacc_frm;
extern const function_base *const vfnmsac;
extern const function_base *const vfmadd;
extern const function_base *const vfnmsub;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index d43b33ded17..92ecf8a9065 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -349,6 +349,9 @@ DEF_RVV_FUNCTION (vfnmadd, alu, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvvv_ops)
DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvvv_ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
+
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwfv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
new file mode 100644
index 000..df29f4d240f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
@@ -

RE: [PATCH v1] RISC-V: Support RVV VFNMACC rounding mode intrinsic API

2023-08-10 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 10, 2023 4:54 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFNMACC rounding mode intrinsic API

LGTM

On Thu, Aug 10, 2023 at 4:20 PM Pan Li via Gcc-patches wrote:
>
> From: Pan Li 
>
> This patch would like to support the rounding mode API for the
> VFNMACC for the below samples.
>
> * __riscv_vfnmacc_vv_f32m1_rm
> * __riscv_vfnmacc_vv_f32m1_rm_m
> * __riscv_vfnmacc_vf_f32m1_rm
> * __riscv_vfnmacc_vf_f32m1_rm_m
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc
> (class vfnmacc_frm): New class for vfnmacc.
> (vfnmacc_frm_obj): New declaration.
> (BASE): Ditto.
> * config/riscv/riscv-vector-builtins-bases.h: Ditto.
> * config/riscv/riscv-vector-builtins-functions.def
> (vfnmacc_frm): New function definition.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-nmacc.c: New test.
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 24 ++
>  .../riscv/riscv-vector-builtins-bases.h   |  1 +
>  .../riscv/riscv-vector-builtins-functions.def |  2 +
>  .../riscv/rvv/base/float-point-nmacc.c| 47 +++
>  4 files changed, 74 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 1695d77e8bd..1d4a5a18bf9 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -379,6 +379,28 @@ public:
>}
>  };
>
> +/* Implements below instructions for frm
> +   - vfnmacc
> +*/
> +class vfnmacc_frm : public function_base
> +{
> +public:
> +  bool has_rounding_mode_operand_p () const override { return true; }
> +
> +  bool has_merge_operand_p () const override { return false; }
> +
> +  rtx expand (function_expander &e) const override
> +  {
> +if (e.op_info->op == OP_TYPE_vf)
> +  return e.use_ternop_insn (
> +   true, code_for_pred_mul_neg_scalar (MINUS, e.vector_mode ()));
> +if (e.op_info->op == OP_TYPE_vv)
> +  return e.use_ternop_insn (
> +   true, code_for_pred_mul_neg (MINUS, e.vector_mode ()));
> +gcc_unreachable ();
> +  }
> +};
> +
>  /* Implements vrsub.  */
>  class vrsub : public function_base
>  {
> @@ -2144,6 +2166,7 @@ static CONSTEXPR const vfnmsac vfnmsac_obj;
>  static CONSTEXPR const vfmadd vfmadd_obj;
>  static CONSTEXPR const vfnmsub vfnmsub_obj;
>  static CONSTEXPR const vfnmacc vfnmacc_obj;
> +static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
>  static CONSTEXPR const vfmsac vfmsac_obj;
>  static CONSTEXPR const vfnmadd vfnmadd_obj;
>  static CONSTEXPR const vfmsub vfmsub_obj;
> @@ -2380,6 +2403,7 @@ BASE (vfnmsac)
>  BASE (vfmadd)
>  BASE (vfnmsub)
>  BASE (vfnmacc)
> +BASE (vfnmacc_frm)
>  BASE (vfmsac)
>  BASE (vfnmadd)
>  BASE (vfmsub)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 67d18412b4c..247074d0868 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -165,6 +165,7 @@ extern const function_base *const vfnmsac;
>  extern const function_base *const vfmadd;
>  extern const function_base *const vfnmsub;
>  extern const function_base *const vfnmacc;
> +extern const function_base *const vfnmacc_frm;
>  extern const function_base *const vfmsac;
>  extern const function_base *const vfnmadd;
>  extern const function_base *const vfmsub;
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
> b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index 92ecf8a9065..7aae0665520 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -351,6 +351,8 @@ DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
>
>  DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvvv_ops)
>  DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
> +DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvvv_ops)
> +DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvfv_ops)
>
>  // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
>  DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c
> new file mode 100644
> index 000..fca378b7a8f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmacc.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +typedef float float32_t;

[PATCH] tree-optimization/110963 - more PRE when optimizing for size

2023-08-10 Thread Richard Biener via Gcc-patches
The following adjusts the heuristic when we perform PHI insertion
during GIMPLE PRE from requiring at least one edge that is supposed
to be optimized for speed to also doing insertion when the expression
is available on all edges (but possibly with different value) and
we'd at most have one copy from a constant.  The first ensures
we optimize two computations on all paths to one plus a possible
copy due to the PHI, the second makes sure we do not need to insert
many possibly large copies from constants, disregarding the
cumulative size cost of the register copies when they are not
coalesced.

The case in the testcase is

  <bb 5>:
  _14 = h;
  if (_14 == 0B)
    goto <bb 7>;
  else
    goto <bb 6>;

  <bb 6>:
  h = 0B;

  <bb 7>:
  h.6_12 = h;

and we want to optimize that to

  <bb 7>:
  # h.6_12 = PHI <_14(5), 0B(6)>
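
At the source level the pattern boils down to something like this (my own
hand-distilled sketch, not the reduced testcase itself):

extern int *h;

int *
sketch (void)
{
  int *p = h;   /* _14 = h */
  if (p != 0)
    h = 0;      /* h = 0B on one path */
  return h;     /* the reload of h becomes PHI <_14, 0B> after PRE */
}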

If we want to consider the cost of the register copies I think the
only simplistic enough way would be to restrict the special-case to
two incoming edges - we'd assume one register copy is coalesced
leaving one copy from a register or from a constant.

As with every optimization the downstream effects are probably
bigger than what we can locally estimate.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

PR tree-optimization/110963
* tree-ssa-pre.cc (do_pre_regular_insertion): Also insert
a PHI node when the expression is available on all edges
and we insert at most one copy from a constant.

* gcc.dg/tree-ssa/ssa-pre-34.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c | 56 ++
 gcc/tree-ssa-pre.cc| 11 +
 2 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
new file mode 100644
index 000..9ac37c44336
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-pre-stats -fdump-tree-optimized" } */
+
+void foo(void);
+static int c = 76, f, g;
+static int *h, *j, *k = &g;
+static int **i = &h;
+static short a;
+static signed char(l)(signed char b) {
+if (!(((b) >= 77) && ((b) <= 77))) {
+__builtin_unreachable();
+}
+return 0;
+}
+static short(m)(short d, short e) { return d + e; }
+static short n(signed char) {
+j = *i;
+if (j == 0)
+;
+else
+*i = 0;
+*k = 0;
+return 0;
+}
+static signed char o() {
+l(0);
+return 0;
+}
+static signed char p(int ad) {
+a = m(!0, ad);
+l(a);
+if (f) {
+*i &&n(o());
+*i = 0;
+} else
+n(0);
+if (h == &f || h == 0)
+;
+else
+foo();
+return 0;
+}
+int main() {
+p(c);
+c = 8;
+}
+
+/* Even with main being cold we should optimize the redundant load of h
+   which is available on all incoming edges (but none considered worth
+   optimizing for speed) when doing that doesn't needlessly increase
+   code size.  */
+
+/* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
+/* { dg-final { scan-tree-dump "HOIST inserted: 1" "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 3" "pre" } } */
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 0f2e458395c..07fb165b2a8 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -3314,6 +3314,8 @@ do_pre_regular_insertion (basic_block block, basic_block dom,
  bool by_some = false;
  bool cant_insert = false;
  bool all_same = true;
+ unsigned num_inserts = 0;
+ unsigned num_const = 0;
  pre_expr first_s = NULL;
  edge pred;
  basic_block bprime;
@@ -3370,11 +3372,14 @@ do_pre_regular_insertion (basic_block block, basic_block dom,
{
  avail[pred->dest_idx] = eprime;
  all_same = false;
+ num_inserts++;
}
  else
{
  avail[pred->dest_idx] = edoubleprime;
  by_some = true;
+ if (edoubleprime->kind == CONSTANT)
+   num_const++;
  /* We want to perform insertions to remove a redundancy on
 a path in the CFG we want to optimize for speed.  */
  if (optimize_edge_for_speed_p (pred))
@@ -3391,6 +3396,12 @@ do_pre_regular_insertion (basic_block block, basic_block 
dom,
 partially redundant.  */
  if (!cant_insert && !all_same && by_some)
{
+ /* If the expression is redundant on all edges and we need
+to at most insert one copy from a constant do the PHI
+insertion even when not optimizing a path that's to be
+optimized for speed.  */
+ if (num_inserts == 0 && num_const <= 1)
+   do_inserti

Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread Robin Dapp via Gcc-patches
> Is this patch ok ? Maybe we can find a way to add a target specific
> fortran test but should not block this bug fix.

It's not much different from adding a C testcase actually, apart from 
starting comments with a !

But well, LGTM.  The test doesn't look that complicated and quite likely
is covered by the Fortran testsuite already.

Regards
 Robin


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
 wrote:
>
> >  Changing ABIs like that for existing code that has worked for some time
> on
> >  existing hardware is a bad idea.
>
> I agree, so Proposal 3 is the last choice.
>
> The target of the proposals is to solve the ABI incompatible issue between
> AVX10-256 and AVX10-512 when passing/returning 512 vectors. So we are
> discussing the default ABI rather than other vector variants.
>
> If you believe that changing 512-bit ABI (the 512-bit version) is a bad
> idea, how about Proposal 1 and 2? I don't want to call the non 512-bit
> version an ABI because it doesn't provide the interaction between 256-bit
> and 512-bit targets. Besides, LLVM also behaves differently with GCC on non
> 512-bit targets. It is a good time to solve the problem together if we make
> the 512-bit ABI consistent and target independent. WDYT?

Isn't this situation similar to the ABI being left undefined when passing
generic vectors (via __attribute__((vector_size))) that do not map to vectors
supported by the current ISA?  There are cases like vector<2> char or
vector<1> double to consider, for example, which would fit in a lowpart of a
supported vector register, and, as in the AVX512 case, vectors that are larger
than any supported vector register.

The psABI should have some simple rule covering all of the above I think.

Richard.

> Thanks
> Phoebe
>
> Joseph Myers  于2023年8月10日周四 04:43写道:
>
> > On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
> >
> > > Proposal 3: Change the ABI of 512-bit vector and always be
> > > passed/returned from memory.
> >
> > Changing ABIs like that for existing code that has worked for some time on
> > existing hardware is a bad idea.
> >
> > At this point it seems appropriate to remind people of another ABI
> > consideration for vector extensions.  glibc's libmvec defines vector
> > versions of various functions, including AVX512 ones (of course those
> > function versions only work on hardware with the relevant instructions).
> > glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> > __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> > including those headers, what function variants are available in glibc.
> >
> > Existing glibc versions need to continue to work with new compiler
> > versions.  That is, it's part of the ABI, which must remain stable,
> > exactly which function versions the above pragma and attribute imply are
> > available - and of course the details of how those functions versions take
> > arguments / return results are also part of the ABI (it would be OK for a
> > new compiler to choose not to use some of those vector versions, but not
> > to start calling them with a different ABI).
> >
> > Maybe you'll want to add new vector function versions, with different
> > interfaces, to libmvec in future.  If so, you need a *different* pragma or
> > attribute to declare to the compiler that the libmvec version using that
> > pragma or attribute has the additional functions - so new compilers using
> > the existing header will not try to generate calls to new function
> > versions that don't exist in that glibc version (but new compilers using a
> > new header version from new glibc will see the new pragma or attribute and
> > so be able to generate the relevant calls to new functions).  And once
> > you've defined the ABI for such a new pragma or attribute, that itself
> > then becomes a stable interface - so if you end up with vector extensions
> > involving yet another set of interfaces, they need another corresponding
> > new pragma / attribute for libmvec to declare to the compiler that the new
> > interfaces exist.
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "X86-64 System V Application Binary Interface" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to x86-64-abi+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/x86-64-abi/8fb470de-d2a3-3e71-be6a-ccc7f4f31a31%40codesourcery.com
> > .
> >


Re: [PATCHv2] Use toplevel configure for GMP and MPFR for gdb

2023-08-10 Thread Matthias Klose via Gcc-patches

On 10.11.22 20:05, apinski--- via Binutils wrote:

From: Andrew Pinski 

This patch uses the toplevel configure parts for GMP/MPFR for
gdb. The only thing is that gdb now requires MPFR for building.
Before it was a recommended but not required library.
Also this allows building of GMP and MPFR with the toplevel
directory just like how it is done for GCC.
We now error out in the toplevel configure if the version
of GMP or MPFR is wrong.

OK after GDB 13 branches? Build gdb 3 ways:
with GMP and MPFR in the toplevel (static library used at that point for both)
With only MPFR in the toplevel (GMP distro library used and MPFR built from 
source)
With neither GMP and MPFR in the toplevel (distro libraries used)


this still seems to be broken for a gdb trunk build, using GMP and MPFR system 
libraries:


linking gdb:

[...]
../gnulib/import/libgnu.a   -Lyes/lib -lmpfr -lgmp -lsource-highlight 
-lboost_regex  -lxxhash  -ldebuginfod   -ldl 
-Wl,--dynamic-list=/<>/gdb/proc-service.list

./libtool: line 5209: cd: yes/lib: No such file or directory
libtool: link: cannot determine absolute directory name of `yes/lib'
make[3]: *** [Makefile:2174: gdb] Error 1
make[3]: Leaving directory '/<>/build/default/gdb'

full build log at
https://launchpad.net/~doko/+archive/ubuntu/toolchain/+sourcepub/15065515/+listing-archive-extra


the toplevel config.log has (note the literal `yes' being used as an
installation prefix -- -Iyes/include, -Lyes/lib -- which suggests
--with-gmp/--with-mpfr ended up set to plain `yes' rather than a directory):

configure:8183: checking for the correct version of gmp.h
configure:8202: x86_64-linux-gnu-gcc -c  -Iyes/include  -fPIC conftest.c >&5
configure:8202: $? = 0
configure:8220: x86_64-linux-gnu-gcc -c  -Iyes/include  -fPIC conftest.c >&5
configure:8220: $? = 0
configure:8221: result: yes
configure:8237: checking for the correct version of mpfr.h
configure:8255: x86_64-linux-gnu-gcc -c  -Iyes/include  -fPIC conftest.c >&5
configure:8255: $? = 0
configure:8272: x86_64-linux-gnu-gcc -c  -Iyes/include  -fPIC conftest.c >&5
configure:8272: $? = 0
configure:8273: result: yes
configure:8342: checking for the correct version of the gmp/mpfr libraries
configure:8366: x86_64-linux-gnu-gcc -o conftest  -Iyes/include  -fPIC 
conftest.c  -Lyes/lib -lmpfr -lgmp >&5

configure:8366: $? = 0
configure:8367: result: yes
configure:8615: checking for isl 0.15 or later
configure:8628: x86_64-linux-gnu-gcc -o conftest   -Iyes/include  -fPIC   -lisl 
-Lyes/lib -lmpfr -lgmp conftest.c  -lisl -lgmp >&5

configure:8628: $? = 0



Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Phoebe Wang via Gcc-patches
>  The psABI should have some simple rule covering all of the above I think.

That the psABI has a rule for the case doesn't mean the rule is a
well-defined ABI in practice. A well-defined ABI should guarantee
1) interlinkability across different compile options within the same
compiler; 2) interlinkability across different compilers. Both aspects
fail in the non-512-bit version.

1) is more important than 2) and becomes more critical on AVX10 targets,
because we expect AVX10-256 to be a general setting for binaries that can
run on both AVX10-256 and AVX10-512. It would be common for binaries
compiled with AVX10-256 to link with natively built binaries on AVX10-512
targets.

Both 1) and 2) show the problem with the current rule in the psABI, so I
think the psABI should be updated to solve them.

Thanks
Phoebe

Richard Biener  于2023年8月10日周四 20:46写道:

> On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
>  wrote:
> >
> > >  Changing ABIs like that for existing code that has worked for some
> time
> > on
> > >  existing hardware is a bad idea.
> >
> > I agree, so Proposal 3 is the last choice.
> >
> > The target of the proposals is to solve the ABI incompatible issue
> between
> > AVX10-256 and AVX10-512 when passing/returning 512 vectors. So we are
> > discussing the default ABI rather than other vector variants.
> >
> > If you believe that changing 512-bit ABI (the 512-bit version) is a bad
> > idea, how about Proposal 1 and 2? I don't want to call the non 512-bit
> > version an ABI because it doesn't provide the interaction between 256-bit
> > and 512-bit targets. Besides, LLVM also behaves differently with GCC on
> non
> > 512-bit targets. It is a good time to solve the problem together if we
> make
> > the 512-bit ABI consistent and target independent. WDYT?
>
> Isn't this situation similar to the ABI being left undefined when passing
> generic vectors (via __attribute__((vector_size))) that do not map to
> vectors supported by the current ISA?  There are cases like vector<2> char
> or vector<1> double to consider, for example, which would fit in a lowpart
> of a supported vector register, and, as in the AVX512 case, vectors that
> are larger than any supported vector register.
>
> The psABI should have some simple rule covering all of the above I think.
>
> Richard.
>
> > Thanks
> > Phoebe
> >
> > Joseph Myers  于2023年8月10日周四 04:43写道:
> >
> > > On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:
> > >
> > > > Proposal 3: Change the ABI of 512-bit vector and always be
> > > > passed/returned from memory.
> > >
> > > Changing ABIs like that for existing code that has worked for some
> time on
> > > existing hardware is a bad idea.
> > >
> > > At this point it seems appropriate to remind people of another ABI
> > > consideration for vector extensions.  glibc's libmvec defines vector
> > > versions of various functions, including AVX512 ones (of course those
> > > function versions only work on hardware with the relevant
> instructions).
> > > glibc's headers use both _Pragma ("omp declare simd notinbranch") and
> > > __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
> > > including those headers, what function variants are available in glibc.
> > >
> > > Existing glibc versions need to continue to work with new compiler
> > > versions.  That is, it's part of the ABI, which must remain stable,
> > > exactly which function versions the above pragma and attribute imply
> are
> > > available - and of course the details of how those functions versions
> take
> > > arguments / return results are also part of the ABI (it would be OK
> for a
> > > new compiler to choose not to use some of those vector versions, but
> not
> > > to start calling them with a different ABI).
> > >
> > > Maybe you'll want to add new vector function versions, with different
> > > interfaces, to libmvec in future.  If so, you need a *different*
> pragma or
> > > attribute to declare to the compiler that the libmvec version using
> that
> > > pragma or attribute has the additional functions - so new compilers
> using
> > > the existing header will not try to generate calls to new function
> > > versions that don't exist in that glibc version (but new compilers
> using a
> > > new header version from new glibc will see the new pragma or attribute
> and
> > > so be able to generate the relevant calls to new functions).  And once
> > > you've defined the ABI for such a new pragma or attribute, that itself
> > > then becomes a stable interface - so if you end up with vector
> extensions
> > > involving yet another set of interfaces, they need another
> corresponding
> > > new pragma / attribute for libmvec to declare to the compiler that the
> new
> > > interfaces exist.
> > >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "X86-64 System V Application Binary Interface" group.
> > > To unsubscribe from this group and stop 

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 10, 2023 at 7:13 PM Richard Biener
 wrote:
>
> On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu  wrote:
> >
> > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu  wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > > similar for scatter, there're
> > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > > >
> > > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > > vectorization for all gather/scatter instructions. The options 
> > > > > > > > is
> > > > > > > > interpreted by driver to 3 tunes.
> > > > > > > >
> > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > > Ok for trunk?
> > > > > > >
> > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > > >
> > > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > > change what use_gather controls - it seems we changed its
> > > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > > use_gather_16parts)
> > > > > > >
> > > > > > > That is, what's the point of this?
> > The point of this is to keep things consistent between GCC, LLVM, and
> > ICX (Intel® oneAPI DPC++/C++ Compiler).
> > LLVM and ICX will support that option.
>
> GCC has very many options that are not the same as LLVM or ICX,
> I don't see a good reason to special case this one.  As said, it's
> a very bad name IMHO.
In general terms, yes.
But this is a new option, so shouldn't it be better to be consistent?
And the problem with -mfma is mainly that the CPUID bit is just called
fma, but we don't have a CPUID bit called gather/scatter; with clear
documentation that the option is only for auto-vectorization,
-m{no-,}{gather,scatter} looks fine to me.
As Honza mentioned, users need an option to turn gather/scatter
auto-vectorization on/off; I don't think they will expect the option to
also affect intrinsics.
If -mtune-ctrl= is not suitable for direct exposure to users,
then the original proposal should be ok?
Developers will maintain the relation between -mgather/-mscatter and
-mtune-ctrl=XXX to keep it consistent between GCC versions.
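
For reference, the driver expansion described in the patch would
presumably amount to (illustrative only; the tune names are the ones
listed in the original posting):

  -mno-gather   ->  -mtune-ctrl=^use_gather,^use_gather_2parts,^use_gather_4parts
  -mno-scatter  ->  -mtune-ctrl=^use_scatter,^use_scatter_2parts,^use_scatter_4parts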
>
> Richard.
>
> > > > > >
> > > > > > https://www.phoronix.com/review/downfall
> > > > > >
> > > > > > that caused:
> > > > > >
> > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > > >
> > > > > Yes, I know.  But there's -mtune-ctl= doing the trick.
> > > > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > > gather works only on SI/SFmode or larger).
> > > > >
> > > > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > > We don't have an extrat explicit flag for target tune, just single bit
> > > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Uros.
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


RE: Machine Mode ICE in RISC-V when LTO

2023-08-10 Thread Thomas Schwinge
Hi!

On 2023-08-10T12:25:36+, "Li, Pan2"  wrote:
> Thanks Richard for comment, let me try to promote the table to unsigned short.

I have WIP work for this issue -- which I'd already raised a month ago:

On 2023-06-30T13:46:07+0200, Thomas Schwinge  wrote:
> In particular, the 'lto_mode_identity_table' changes would seem necessary
> to keep standard LTO ('-flto') functional for large 'machine_mode' size?

... which is exactly the problem you've now run into?

However, a simple:

-GTY(()) const unsigned char *lto_mode_identity_table;
+GTY(()) const unsigned short *lto_mode_identity_table;

..., or:

-GTY(()) const unsigned char *lto_mode_identity_table;
+GTY(()) const machine_mode *lto_mode_identity_table;

... is not sufficient: that runs into GTY issues, as the current
'unsigned char *lto_mode_identity_table' is (mis-)classified by
'gengtype' as a C string.  This happens to work for this case, but still
isn't right, and only works for 'char *' but not 'short *' etc.  I have
WIP work to tighten that.  ..., which got me into other GTY issues, and
so on...  ;-) (Richard already ACKed and I pushed some of the
prerequisite changes, but there's more to come.)  I'm still planning on
resolving all that mess, but I'm tight on time right now.

However, I have a different proposal, which should address your current
issue: simply, get rid of the 'lto_mode_identity_table', which is just
that: a 1-to-1 mapping of array index to value.  Instead, in
'gcc/lto/lto-common.cc:lto_file_finalize', for '!ACCEL_COMPILER', set
'file_data->mode_table = NULL', and in the users (only
'gcc/tree-streamer.h:bp_unpack_machine_mode'?), replace (untested):

-return (machine_mode) ib->file_data->mode_table[ix];
+return (machine_mode) (ib->file_data->mode_table ? ib->file_data->mode_table[ix] : ix);
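
As a self-contained illustration (a sketch with an invented name, not
the actual tree-streamer.h code), the decode logic would then amount to:

  /* Decode a machine_mode given the per-file remapping table, where a
     NULL table now means the identity mapping (host compilation); the
     accel compiler still installs a real table.  */
  static inline machine_mode
  unpack_machine_mode (const unsigned short *mode_table, unsigned ix)
  {
    return (machine_mode) (mode_table ? mode_table[ix] : ix);
  }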

Jakub, as the original author of 'lto_mode_identity_table' (see
commit db847fa8f2cca6139188b8dfa0a7064319b19193 (Subversion r221005)), is
there any reason not to do it this way?


Grüße
 Thomas


> -Original Message-
> From: Richard Biener 
> Sent: Thursday, August 10, 2023 7:08 PM
> To: Li, Pan2 
> Cc: richard.sandif...@arm.com; Thomas Schwinge ; 
> ja...@redhat.com; kito.ch...@gmail.com; Jeff Law ; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang 
> Subject: Re: Machine Mode ICE in RISC-V when LTO
>
> On Thu, Aug 10, 2023 at 10:19 AM Li, Pan2  wrote:
>>
>> Hi all,
>>
>>
>>
>> Recently I found there is still some issues for the machine mode with LTO 
>> part by fixing one
>>
>> ICE (only when compile with LTO) in RISC-V backend in , aka below case.
>>
>>
>>
>> >> ../__RISC-V_INSTALL___/bin/riscv64-unknown-elf-g++ -O2 -flto 
>> >> gcc/testsuite/g++.dg/torture/vshuf-v4df.C -o test.elf
>>
>> during RTL pass: expand
>>
>> gcc/testsuite/g++.dg/torture/vshuf-main.inc: In function 'main':
>>
>> gcc/testsuite/g++.dg/torture/vshuf-main.inc:15:9: internal compiler error: 
>> in as_a, at machmode.h:381
>>
>>15 |   V r = __builtin_shuffle(in1[i], mask1[i]);
>>
>>   | ^
>>
>> 0x7e5b8e scalar_int_mode as_a<scalar_int_mode>(machine_mode)
>>
>> ../.././gcc/gcc/machmode.h:381
>>
>> 0x7eabdb scalar_mode as_a<scalar_mode>(machine_mode)
>>
>> ../.././gcc/gcc/expr.cc:332
>>
>> 0x7eabdb convert_mode_scalar
>>
>> ../.././gcc/gcc/expr.cc:325
>>
>> 0xb8485b store_expr(tree_node*, rtx_def*, int, bool, bool)
>>
>> ../.././gcc/gcc/expr.cc:6413
>>
>> 0xb8a556 store_field
>>
>> ../.././gcc/gcc/expr.cc:7648
>>
>> 0xb88f27 store_constructor(tree_node*, rtx_def*, int, poly_int<2u, long>, 
>> bool)
>>
>> ../.././gcc/gcc/expr.cc:7588
>>
>> 0xb8b8b8 expand_constructor
>>
>> ../.././gcc/gcc/expr.cc:8931
>>
>> 0xb76bc7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>>
>> ../.././gcc/gcc/expr.cc:11170
>>
>> 0xb77ef7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>>
>> ../.././gcc/gcc/expr.cc:10809
>>
>> 0xb83a80 store_expr(tree_node*, rtx_def*, int, bool, bool)
>>
>> ../.././gcc/gcc/expr.cc:6325
>>
>> 0xb851d9 expand_assignment(tree_node*, tree_node*, bool)
>>
>> ../.././gcc/gcc/expr.cc:6043
>>
>> 0xa48717 expand_gimple_stmt_1
>>
>> ../.././gcc/gcc/cfgexpand.cc:3946
>>
>> 0xa48717 expand_gimple_stmt
>>
>> ../.././gcc/gcc/cfgexpand.cc:4044
>>
>> 0xa4d030 expand_gimple_basic_block
>>
>> ../.././gcc/gcc/cfgexpand.cc:6096
>>
>> 0xa4efd6 execute
>>
>> ../.././gcc/gcc/cfgexpand.cc:6831
>>
>>
>>
>> I double checked the reason that comes from we add even more machine modes 
>> in the RISC-V backend,
>>
>> and then did some investigation for the root cause. It should be related to 
>> the mode_table, as well as the
>>
>> bp_unpack_machine_mode.
>>
>>
>>
>> In lto_fe_init:
>>
>>unsigned char *table
>>  = ggc_vec_alloc<unsigned char> (MAX_MACHINE_MODE);
>>
>>
>>
>>for (int m = 0; m < MAX_MACHINE_MODE; m++)
>>
>> t

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> >> Jakub: do you remember what the reason was?  I don't mind dropping
>> >> "function", but it feels weird to drop the quotes around "simd".
>> >> Seems like, if we do that, there'll one day be a patch to add
>> >> them back. :)
>> >
>> > Because in OpenMP there are %<declare simd%> functions, not %<simd%>
>> > functions, but we also have the %<simd%>/%<__simd__%> attribute as
>> > extension.
>> 
>> Yeah, I can understand dropping the "function" bit.  But why
>> s/unsupported ... for %<simd%>/unsupported ... for simd/?
>> Even if it's only a partial syntax quote, it is still a syntax quote.
>
> %<simd%> in OpenMP is something very different though, so I think it is
> better to use it as a generic term which covers the different syntax cases.

OK, I won't press it further.

Richard


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jan Beulich via Gcc-patches
On 10.08.2023 15:12, Phoebe Wang wrote:
>>  The psABI should have some simple rule covering all of the above I think.
> 
> That the psABI has a rule for the case doesn't mean the rule is a
> well-defined ABI in practice. A well-defined ABI should guarantee
> 1) interlinkability across different compile options within the same
> compiler; 2) interlinkability across different compilers. Both aspects
> fail in the non-512-bit version.
> 
> 1) is more important than 2) and becomes more critical on AVX10 targets,
> because we expect AVX10-256 to be a general setting for binaries that can
> run on both AVX10-256 and AVX10-512. It would be common for binaries
> compiled with AVX10-256 to link with natively built binaries on AVX10-512
> targets.

But you're only describing a pre-existing problem here afaict. Code compiled
with -mavx512f passing __m512 type data to a function compiled with only,
say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
doesn't sufficiently define what __m256 etc actually are. After all these
aren't types defined by the C standard (as opposed to at least most other
types in the respective table there), and you can't really make assumptions
like "this is what certain compilers think this is".

Jan


[PATCH 0/5] [og13] OpenMP: Implement 'declare mapper' for 'target update' directives

2023-08-10 Thread Julian Brown
This series (for the og13 branch) implements 'declare mapper' support for
'target update' directives, and improves diagnostic behaviour relating
to mapper expansion (mostly for Fortran) in several ways.

Tested with offloading to AMD GCN.  Further comments on individual
patches.  I will apply (to the og13 branch) shortly.

Julian Brown (5):
  OpenMP: Move Fortran 'declare mapper' instantiation code
  OpenMP: Reprocess expanded clauses after 'declare mapper'
instantiation
  OpenMP: Introduce C_ORT_{,OMP_}DECLARE_MAPPER c_omp_region_type types
  OpenMP: Look up 'declare mapper' definitions at resolution time not
parse time
  OpenMP: Enable 'declare mapper' mappers for 'target update' directives

 gcc/c-family/c-common.h   |4 +
 gcc/c-family/c-omp.cc |  117 +-
 gcc/c/c-parser.cc |  152 +-
 gcc/cp/parser.cc  |  160 +-
 gcc/cp/pt.cc  |4 +-
 gcc/fortran/gfortran.h|   20 +
 gcc/fortran/match.cc  |4 +-
 gcc/fortran/module.cc |6 +
 gcc/fortran/openmp.cc | 1803 +++--
 gcc/fortran/trans-openmp.cc   |  408 +---
 .../c-c++-common/gomp/declare-mapper-17.c |   38 +
 .../c-c++-common/gomp/declare-mapper-19.c |   40 +
 .../gfortran.dg/gomp/declare-mapper-24.f90|   43 +
 .../gfortran.dg/gomp/declare-mapper-26.f90|   28 +
 .../gfortran.dg/gomp/declare-mapper-27.f90|   25 +
 .../gfortran.dg/gomp/declare-mapper-29.f90|   22 +
 .../gfortran.dg/gomp/declare-mapper-31.f90|   34 +
 .../libgomp.c-c++-common/declare-mapper-18.c  |   33 +
 .../libgomp.fortran/declare-mapper-25.f90 |   44 +
 .../libgomp.fortran/declare-mapper-28.f90 |   38 +
 .../libgomp.fortran/declare-mapper-30.f90 |   24 +
 .../libgomp.fortran/declare-mapper-4.f90  |   18 +-
 22 files changed, 2031 insertions(+), 1034 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-17.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-19.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-24.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-26.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-27.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-29.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-31.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-mapper-18.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-mapper-25.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-mapper-28.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-mapper-30.f90

-- 
2.25.1



[PATCH 3/5] OpenMP: Introduce C_ORT_{, OMP_}DECLARE_MAPPER c_omp_region_type types

2023-08-10 Thread Julian Brown
This patch adds C_ORT_DECLARE_MAPPER and C_ORT_OMP_DECLARE_MAPPER
region types to the c_omp_region_type enum, and uses them in cp/pt.cc.
Previously the C_ORT_DECLARE_SIMD code was being abused to inhibit calling
finish_omp_clauses within mapper definitions, but this patch uses one
of the new enumeration values for that purpose instead.  This shouldn't
result in any behaviour change, but improves self-documentation.

2023-08-10  Julian Brown  

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_DECLARE_MAPPER and
C_ORT_OMP_DECLARE_MAPPER codes.

gcc/cp/
* pt.cc (tsubst_omp_clauses): Use C_ORT_OMP_DECLARE_MAPPER.
(tsubst_expr): Likewise.
---
 gcc/c-family/c-common.h | 2 ++
 gcc/cp/pt.cc| 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index c805c8b2f7e..079d1eaafaa 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1271,9 +1271,11 @@ enum c_omp_region_type
   C_ORT_DECLARE_SIMD   = 1 << 2,
   C_ORT_TARGET = 1 << 3,
   C_ORT_EXIT_DATA  = 1 << 4,
+  C_ORT_DECLARE_MAPPER = 1 << 6,
   C_ORT_OMP_DECLARE_SIMD   = C_ORT_OMP | C_ORT_DECLARE_SIMD,
   C_ORT_OMP_TARGET = C_ORT_OMP | C_ORT_TARGET,
   C_ORT_OMP_EXIT_DATA  = C_ORT_OMP | C_ORT_EXIT_DATA,
+  C_ORT_OMP_DECLARE_MAPPER = C_ORT_OMP | C_ORT_DECLARE_MAPPER,
   C_ORT_ACC_TARGET = C_ORT_ACC | C_ORT_TARGET
 };
 
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index fb50c5ac48d..2794c0ebecb 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18328,7 +18328,7 @@ tsubst_omp_clauses (tree clauses, enum 
c_omp_region_type ort,
 }
 
   new_clauses = nreverse (new_clauses);
-  if (ort != C_ORT_OMP_DECLARE_SIMD)
+  if (ort != C_ORT_OMP_DECLARE_SIMD && ort != C_ORT_OMP_DECLARE_MAPPER)
 {
   if (ort & C_ORT_OMP)
new_clauses = c_omp_instantiate_mappers (new_clauses, ort);
@@ -19905,7 +19905,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
decl = tsubst (decl, args, complain, in_decl);
tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
tree clauses = OMP_DECLARE_MAPPER_CLAUSES (t);
-   clauses = tsubst_omp_clauses (clauses, C_ORT_OMP_DECLARE_SIMD, args,
+   clauses = tsubst_omp_clauses (clauses, C_ORT_OMP_DECLARE_MAPPER, args,
  complain, in_decl);
TREE_TYPE (t) = type;
OMP_DECLARE_MAPPER_DECL (t) = decl;
-- 
2.25.1



[PATCH 1/5] OpenMP: Move Fortran 'declare mapper' instantiation code

2023-08-10 Thread Julian Brown
This patch moves the code for explicit 'declare mapper' directive
instantiation in the Fortran front-end to openmp.cc from trans-openmp.cc.
The transformation takes place entirely in the front end's own
representation and doesn't involve middle-end trees at all. Also, having
the code in openmp.cc is more convenient for the following patch that
introduces the 'resolve_omp_mapper_clauses' function.

2023-08-10  Julian Brown  

gcc/fortran/
* gfortran.h (toc_directive): Move here.
(gfc_omp_instantiate_mappers, gfc_get_location): Add prototypes.
* openmp.cc (omp_split_map_op, omp_join_map_op, omp_map_decayed_kind,
omp_basic_map_kind_name, gfc_subst_replace, gfc_subst_prepend_ref,
gfc_subst_in_expr_1, gfc_subst_in_expr, gfc_subst_mapper_var): Move
here.
(gfc_omp_instantiate_mapper, gfc_omp_instantiate_mappers): Move here
and rename.
* trans-openmp.cc (toc_directive, omp_split_map_op, omp_join_map_op,
omp_map_decayed_kind, gfc_subst_replace, gfc_subst_prepend_ref,
gfc_subst_in_expr_1, gfc_subst_in_expr, gfc_subst_mapper_var,
gfc_trans_omp_instantiate_mapper, gfc_trans_omp_instantiate_mappers):
Remove from here.
(gfc_trans_omp_target, gfc_trans_omp_target_data,
gfc_trans_omp_target_enter_data, gfc_trans_omp_target_exit_data):
Rename calls to gfc_omp_instantiate_mappers.
---
 gcc/fortran/gfortran.h  |  16 ++
 gcc/fortran/openmp.cc   | 435 
 gcc/fortran/trans-openmp.cc | 388 +---
 3 files changed, 456 insertions(+), 383 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 0e7e80e4bf1..788b3797893 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3246,6 +3246,18 @@ typedef struct gfc_finalizer
 gfc_finalizer;
 #define gfc_get_finalizer() XCNEW (gfc_finalizer)
 
+/* Control clause translation per-directive for gfc_trans_omp_clauses.  Also
+   used for gfc_omp_instantiate_mappers.  */
+
+enum toc_directive
+{
+  TOC_OPENMP,
+  TOC_OPENMP_DECLARE_SIMD,
+  TOC_OPENMP_DECLARE_MAPPER,
+  TOC_OPENMP_EXIT_DATA,
+  TOC_OPENACC,
+  TOC_OPENACC_DECLARE
+};
 
 /* Function prototypes */
 
@@ -3707,6 +3719,9 @@ void gfc_resolve_omp_do_blocks (gfc_code *, gfc_namespace 
*);
 void gfc_resolve_omp_declare_simd (gfc_namespace *);
 void gfc_resolve_omp_udrs (gfc_symtree *);
 void gfc_resolve_omp_udms (gfc_symtree *);
+void gfc_omp_instantiate_mappers (gfc_code *, gfc_omp_clauses *,
+ toc_directive = TOC_OPENMP,
+ int = OMP_LIST_MAP);
 void gfc_omp_save_and_clear_state (struct gfc_omp_saved_state *);
 void gfc_omp_restore_state (struct gfc_omp_saved_state *);
 void gfc_free_expr_list (gfc_expr_list *);
@@ -3956,6 +3971,7 @@ bool gfc_convert_to_structure_constructor (gfc_expr *, 
gfc_symbol *,
 /* trans.cc */
 void gfc_generate_code (gfc_namespace *);
 void gfc_generate_module_code (gfc_namespace *);
+location_t gfc_get_location (locus *);
 
 /* trans-intrinsic.cc */
 bool gfc_inline_intrinsic_function_p (gfc_expr *);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index deccb14a525..0f715a6f997 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -12584,6 +12584,441 @@ gfc_resolve_omp_udrs (gfc_symtree *st)
 gfc_resolve_omp_udr (omp_udr);
 }
 
+static enum gfc_omp_map_op
+omp_split_map_op (enum gfc_omp_map_op op, bool *force_p, bool *always_p,
+ bool *present_p)
+{
+  *force_p = *always_p = *present_p = false;
+
+  switch (op)
+{
+case OMP_MAP_FORCE_ALLOC:
+case OMP_MAP_FORCE_TO:
+case OMP_MAP_FORCE_FROM:
+case OMP_MAP_FORCE_TOFROM:
+case OMP_MAP_FORCE_PRESENT:
+  *force_p = true;
+  break;
+case OMP_MAP_ALWAYS_TO:
+case OMP_MAP_ALWAYS_FROM:
+case OMP_MAP_ALWAYS_TOFROM:
+  *always_p = true;
+  break;
+case OMP_MAP_ALWAYS_PRESENT_TO:
+case OMP_MAP_ALWAYS_PRESENT_FROM:
+case OMP_MAP_ALWAYS_PRESENT_TOFROM:
+  *always_p = true;
+  /* Fallthrough.  */
+case OMP_MAP_PRESENT_ALLOC:
+case OMP_MAP_PRESENT_TO:
+case OMP_MAP_PRESENT_FROM:
+case OMP_MAP_PRESENT_TOFROM:
+  *present_p = true;
+  break;
+default:
+  ;
+}
+
+  switch (op)
+{
+case OMP_MAP_ALLOC:
+case OMP_MAP_FORCE_ALLOC:
+case OMP_MAP_PRESENT_ALLOC:
+  return OMP_MAP_ALLOC;
+case OMP_MAP_TO:
+case OMP_MAP_FORCE_TO:
+case OMP_MAP_ALWAYS_TO:
+case OMP_MAP_PRESENT_TO:
+case OMP_MAP_ALWAYS_PRESENT_TO:
+  return OMP_MAP_TO;
+case OMP_MAP_FROM:
+case OMP_MAP_FORCE_FROM:
+case OMP_MAP_ALWAYS_FROM:
+case OMP_MAP_PRESENT_FROM:
+case OMP_MAP_ALWAYS_PRESENT_FROM:
+  return OMP_MAP_FROM;
+case OMP_MAP_TOFROM:
+case OMP_MAP_FORCE_TOFROM:
+case OMP_MAP_ALWAYS_TOFROM:
+case OMP_MAP_PRESENT_TOFROM:
+case OMP_MAP_

[PATCH 4/5] OpenMP: Look up 'declare mapper' definitions at resolution time not parse time

2023-08-10 Thread Julian Brown
This patch moves 'declare mapper' lookup for OpenMP clauses from parse
time to resolution time for Fortran, and adds diagnostics for missing
named mappers.  This changes clause lookup in a particular case -- where
several 'declare mapper's are defined in a context, mappers declared
earlier may now instantiate mappers declared later, whereas previously
they would not.  I think the new behaviour makes more sense -- at an
invocation site, all mappers are visible no matter the declaration order
in some particular block.  I've adjusted tests to account for this.

I think the new arrangement better matches the Fortran FE's usual way of
doing things -- mapper lookup is a semantic concept, not a syntactical
one, so shouldn't be handled in the syntax-handling code.

The patch also fixes a case where the user explicitly writes 'default'
as the name on the mapper modifier for a clause.

2023-08-10  Julian Brown  

gcc/fortran/
* gfortran.h (gfc_omp_namelist_udm): Add MAPPER_ID field to store the
mapper name to use for lookup during resolution.
* match.cc (gfc_free_omp_namelist): Handle OMP_LIST_TO and
OMP_LIST_FROM when freeing mapper references.
* module.cc (load_omp_udms, write_omp_udm): Handle MAPPER_ID field.
* openmp.cc (gfc_match_omp_clauses): Handle explicitly-specified
'default' name.  Don't do mapper lookup here, but record mapper name if
the user specifies one.
(resolve_omp_clauses): Do mapper lookup here instead.  Report error for
missing named mapper.

gcc/testsuite/
* gfortran.dg/gomp/declare-mapper-31.f90: New test.

libgomp/
* testsuite/libgomp.fortran/declare-mapper-30.f90: New test.
* testsuite/libgomp.fortran/declare-mapper-4.f90: Adjust test for new
lookup behaviour.
---
 gcc/fortran/gfortran.h|  3 ++
 gcc/fortran/match.cc  |  4 +-
 gcc/fortran/module.cc |  6 +++
 gcc/fortran/openmp.cc | 46 ++-
 .../gfortran.dg/gomp/declare-mapper-31.f90| 34 ++
 .../libgomp.fortran/declare-mapper-30.f90 | 24 ++
 .../libgomp.fortran/declare-mapper-4.f90  | 18 +---
 7 files changed, 116 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-31.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-mapper-30.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index a98424b3263..3b854e14d47 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1784,6 +1784,9 @@ gfc_omp_udm;
 
 typedef struct gfc_omp_namelist_udm
 {
+  /* Used to store mapper_id before resolution.  */
+  const char *mapper_id;
+
   bool multiple_elems_p;
   struct gfc_omp_udm *udm;
 }
diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 53367ab2a0b..3db8e0f0969 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -5537,7 +5537,9 @@ void
 gfc_free_omp_namelist (gfc_omp_namelist *name, int list)
 {
   bool free_ns = (list == OMP_LIST_AFFINITY || list == OMP_LIST_DEPEND);
-  bool free_mapper = (list == OMP_LIST_MAP);
+  bool free_mapper = (list == OMP_LIST_MAP
+ || list == OMP_LIST_TO
+ || list == OMP_LIST_FROM);
   bool free_align = (list == OMP_LIST_ALLOCATE);
   gfc_omp_namelist *n;
 
diff --git a/gcc/fortran/module.cc b/gcc/fortran/module.cc
index 5cd52e7729b..acdbfa7924f 100644
--- a/gcc/fortran/module.cc
+++ b/gcc/fortran/module.cc
@@ -5238,6 +5238,11 @@ load_omp_udms (void)
  if (peek_atom () != ATOM_RPAREN)
{
  n->u2.udm = gfc_get_omp_namelist_udm ();
+ mio_pool_string (&n->u2.udm->mapper_id);
+
+ if (n->u2.udm->mapper_id == NULL)
+   n->u2.udm->mapper_id = gfc_get_string ("%s", "");
+
  n->u2.udm->multiple_elems_p = mio_name (0, omp_map_cardinality);
  mio_pointer_ref (&n->u2.udm->udm);
}
@@ -6314,6 +6319,7 @@ write_omp_udm (gfc_omp_udm *udm)
 
   if (n->u2.udm)
{
+ mio_pool_string (&n->u2.udm->mapper_id);
  mio_name (n->u2.udm->multiple_elems_p, omp_map_cardinality);
  mio_pointer_ref (&n->u2.udm->udm);
}
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0109df4dfce..ba2a8221b96 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3615,6 +3615,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
omp_mask mask,
  m = gfc_match (" %n ) ", mapper_id);
  if (m != MATCH_YES)
goto error;
+ if (strcmp (mapper_id, "default") == 0)
+   mapper_id[0] = '\0';
}
  else
break;
@@ -3689,19 +3691,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
omp_mask mask,
  for (n = *head; n; n = n->next)
   

[PATCH 2/5] OpenMP: Reprocess expanded clauses after 'declare mapper' instantiation

2023-08-10 Thread Julian Brown
This patch reprocesses expanded clauses after 'declare mapper'
instantiation -- checking things such as duplicated clauses, illegal
use of strided accesses, and so forth.  Two functions are broken out
of the 'resolve_omp_clauses' function and reused in a new function
'resolve_omp_mapper_clauses', called after mapper instantiation.

This improves diagnostic output.

2023-08-10  Julian Brown  

gcc/fortran/
* gfortran.h (gfc_omp_clauses): Add NS field.
* openmp.cc (verify_omp_clauses_symbol_dups,
omp_verify_map_motion_clauses): New functions, broken out of...
(resolve_omp_clauses): Here.  Record namespace containing clauses.
Call above functions.
(resolve_omp_mapper_clauses): New function, using helper functions
broken out above.
(gfc_resolve_omp_directive): Add NS parameter to resolve_omp_clauses
calls.
(gfc_omp_instantiate_mappers): Call resolve_omp_mapper_clauses if we
instantiate any mappers.

gcc/testsuite/
* gfortran.dg/gomp/declare-mapper-26.f90: New test.
* gfortran.dg/gomp/declare-mapper-29.f90: New test.
---
 gcc/fortran/gfortran.h|1 +
 gcc/fortran/openmp.cc | 1250 +
 .../gfortran.dg/gomp/declare-mapper-26.f90|   28 +
 .../gfortran.dg/gomp/declare-mapper-29.f90|   22 +
 4 files changed, 718 insertions(+), 583 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-26.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-29.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 788b3797893..a98424b3263 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1577,6 +1577,7 @@ typedef struct gfc_omp_clauses
   struct gfc_omp_assumptions *assume;
   struct gfc_expr_list *tile_sizes;
   const char *critical_name;
+  gfc_namespace *ns;
   enum gfc_omp_default_sharing default_sharing;
   enum gfc_omp_atomic_op atomic_op;
   enum gfc_omp_defaultmap defaultmap[OMP_DEFAULTMAP_CAT_NUM];
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0f715a6f997..0109df4dfce 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8123,6 +8123,611 @@ gfc_resolve_omp_assumptions (gfc_omp_assumptions 
*assume)
 &el->expr->where);
 }
 
+/* Check OMP_CLAUSES for duplicate symbols and various other constraints.
+   Helper function for resolve_omp_clauses and resolve_omp_mapper_clauses.  */
+
+static void
+verify_omp_clauses_symbol_dups (gfc_code *code, gfc_omp_clauses *omp_clauses,
+   gfc_namespace *ns, bool openacc)
+{
+  gfc_omp_namelist *n;
+  int list;
+
+  /* Check that no symbol appears on multiple clauses, except that a symbol
+ can appear on both firstprivate and lastprivate.  */
+  for (list = 0; list < OMP_LIST_NUM; list++)
+for (n = omp_clauses->lists[list]; n; n = n->next)
+  {
+   if (!n->sym)  /* omp_all_memory.  */
+ continue;
+   n->sym->mark = 0;
+   n->sym->comp_mark = 0;
+   n->sym->data_mark = 0;
+   n->sym->dev_mark = 0;
+   n->sym->gen_mark = 0;
+   n->sym->reduc_mark = 0;
+   if (n->sym->attr.flavor == FL_VARIABLE
+   || n->sym->attr.proc_pointer
+   || (!code
+   && !ns->omp_udm_ns
+   && (!n->sym->attr.dummy || n->sym->ns != ns)))
+ {
+   if (!code
+   && !ns->omp_udm_ns
+   && (!n->sym->attr.dummy || n->sym->ns != ns))
+ gfc_error ("Variable %qs is not a dummy argument at %L",
+n->sym->name, &n->where);
+   continue;
+ }
+   if (n->sym->attr.flavor == FL_PROCEDURE
+   && n->sym->result == n->sym
+   && n->sym->attr.function)
+ {
+   if (gfc_current_ns->proc_name == n->sym
+   || (gfc_current_ns->parent
+   && gfc_current_ns->parent->proc_name == n->sym))
+ continue;
+   if (gfc_current_ns->proc_name->attr.entry_master)
+ {
+   gfc_entry_list *el = gfc_current_ns->entries;
+   for (; el; el = el->next)
+ if (el->sym == n->sym)
+   break;
+   if (el)
+ continue;
+ }
+   if (gfc_current_ns->parent
+   && gfc_current_ns->parent->proc_name->attr.entry_master)
+ {
+   gfc_entry_list *el = gfc_current_ns->parent->entries;
+   for (; el; el = el->next)
+ if (el->sym == n->sym)
+   break;
+   if (el)
+ continue;
+ }
+ }
+   if (list == OMP_LIST_MAP
+   && n->sym->attr.flavor == FL_PARAMETER)
+ {
+   if (openacc)
+ gfc_error ("Object %qs is not a variable at %L; parameters"
+" cannot be and need not be copied", n->sym->name,
+ 

[PATCH 5/5] OpenMP: Enable 'declare mapper' mappers for 'target update' directives

2023-08-10 Thread Julian Brown
This patch enables use of 'declare mapper' for 'target update' directives,
for each of C, C++ and Fortran.

There are some implementation choices here and some
"read-between-the-lines" consequences regarding this functionality,
as follows:

 * It is possible to invoke a mapper which contains clauses that
   don't make sense for a given 'target update' operation.  E.g. if a
   mapper definition specifies a "from:" mapping and the user does "target
   update to(...)" which triggers that mapper, the resulting map kind
   (OpenMP 5.2, "Table 5.3: Map-Type Decay of Map Type Combinations")
   is "alloc" (and for the inverse case "release").  For such cases,
   an unconditional warning is issued and the map clause in question is
   dropped from the mapper expansion.  (Other choices might be to make
   this an error, or to do the same thing but silently, or warn only
   given some special option.)

 * The array-shaping operator is *permitted* for map clauses within
   'declare mapper' definitions.  That is because such mappers may be used
   for 'target update' directives, where the array-shaping operator is
   permitted.  I think that makes sense, depending on the semantic model
   of how and when substitution is supposed to take place, but I couldn't
   find such behaviour explicitly mentioned in the spec (as of 5.2).
   If the mapper is triggered by a different directive ("omp target",
   "omp target data", etc.), an error will be raised.

Support is also added for the "mapper" modifier on to/from clauses for
all three base languages.
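
As a concrete illustration of the decay case from the first bullet
above (hypothetical C/C++ syntax, not one of the new testcases):

  struct S { int *p; int n; };
  #pragma omp declare mapper (S s) map(from: s.p[0:s.n])

  void f (struct S *s)
  {
    /* The mapper's "from" map combined with the "to" motion below
       decays to "alloc" (OpenMP 5.2, Table 5.3), so the map clause is
       dropped from the mapper expansion with a warning.  */
  #pragma omp target update to(mapper(default): *s)
  }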

2023-08-10  Julian Brown  

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_UPDATE and C_ORT_OMP_UPDATE
codes.
* c-omp.cc (omp_basic_map_kind_name): New function.
(omp_instantiate_mapper): Add LOC parameter.  Add 'target update'
support.
(c_omp_instantiate_mappers): Add 'target update' support.

gcc/c/
* c-parser.cc (c_parser_omp_variable_list): Support array-shaping
operator in 'declare mapper' definitions.
(c_parser_omp_clause_map): Pass C_ORT_OMP_DECLARE_MAPPER to
c_parser_omp_variable_list in mapper definitions.
(c_parser_omp_clause_from_to): Add parsing for mapper modifier.
(c_parser_omp_target_update): Instantiate mappers.

gcc/cp/
* parser.cc (cp_parser_omp_var_list_no_open): Support array-shaping
operator in 'declare mapper' definitions.
(cp_parser_omp_clause_from_to): Add parsing for mapper modifier.
(cp_parser_omp_clause_map): Pass C_ORT_OMP_DECLARE_MAPPER to
cp_parser_omp_var_list_no_open in mapper definitions.
(cp_parser_omp_target_update): Instantiate mappers.

gcc/fortran/
* openmp.cc (gfc_match_motion_var_list): Add parsing for mapper
modifier.
(gfc_match_omp_clauses): Adjust error handling for changes to
gfc_match_motion_var_list.
* trans-openmp.cc (gfc_trans_omp_clauses): Use correct ref for update
operations.
(gfc_trans_omp_target_update): Instantiate mappers.

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-17.c: New test.
* c-c++-common/gomp/declare-mapper-19.c: New test.
* gfortran.dg/gomp/declare-mapper-24.f90: New test.
* gfortran.dg/gomp/declare-mapper-26.f90: Uncomment 'target update' part
of test.
* gfortran.dg/gomp/declare-mapper-27.f90: New test.

libgomp/
* testsuite/libgomp.c-c++-common/declare-mapper-18.c: New test.
* testsuite/libgomp.fortran/declare-mapper-25.f90: New test.
* testsuite/libgomp.fortran/declare-mapper-28.f90: New test.
---
 gcc/c-family/c-common.h   |   2 +
 gcc/c-family/c-omp.cc | 117 +++--
 gcc/c/c-parser.cc | 152 +++--
 gcc/cp/parser.cc  | 160 --
 gcc/fortran/openmp.cc |  86 --
 gcc/fortran/trans-openmp.cc   |  20 ++-
 .../c-c++-common/gomp/declare-mapper-17.c |  38 +
 .../c-c++-common/gomp/declare-mapper-19.c |  40 +
 .../gfortran.dg/gomp/declare-mapper-24.f90|  43 +
 .../gfortran.dg/gomp/declare-mapper-26.f90|   4 +-
 .../gfortran.dg/gomp/declare-mapper-27.f90|  25 +++
 .../libgomp.c-c++-common/declare-mapper-18.c  |  33 
 .../libgomp.fortran/declare-mapper-25.f90 |  44 +
 .../libgomp.fortran/declare-mapper-28.f90 |  38 +
 14 files changed, 746 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-17.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-19.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-24.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-27.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-mapper-18.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-mapper-25.f90
 creat

Re: [PATCH] MATCH: [PR110937/PR100798] (a ? ~b : b) should be optimized to b ^ -(a)

2023-08-10 Thread Christophe Lyon via Gcc-patches
Hi Andrew,


On Wed, 9 Aug 2023 at 21:20, Andrew Pinski via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> This adds a simple match pattern for this case.
> I noticed it a couple of different places.
> One while I was looking at code generation of a parser and
> also while I was looking at locations where bitwise_inverted_equal_p
> should be used more.
>
> Committed as approved after bootstrapped and tested on x86_64-linux-gnu
> with no regressions.
>
> PR tree-optimization/110937
> PR tree-optimization/100798
>
> gcc/ChangeLog:
>
> * match.pd (`a ? ~b : b`): Handle this
> case.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bool-14.c: New test.
> * gcc.dg/tree-ssa/bool-15.c: New test.
> * gcc.dg/tree-ssa/phi-opt-33.c: New test.
> * gcc.dg/tree-ssa/20030709-2.c: Update testcase
> so `a ? -1 : 0` is not used to hit the match
> pattern.
>

Our CI noticed that your patch introduced regressions as follows on aarch64:

 Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/cond_op_imm_1.c scan-assembler csinv\tw[0-9]*.*
FAIL: gcc.target/aarch64/cond_op_imm_1.c scan-assembler csinv\tx[0-9]*.*

Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-not \\tmov\\tz
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tneg\\tz[0-9]+\\.b, p[0-7]/m, 3
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tneg\\tz[0-9]+\\.h, p[0-7]/m, 2
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tneg\\tz[0-9]+\\.s, p[0-7]/m, 1
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tnot\\tz[0-9]+\\.b, p[0-7]/m, 3
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tnot\\tz[0-9]+\\.h, p[0-7]/m, 2
FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
\\tnot\\tz[0-9]+\\.s, p[0-7]/m, 1

Hopefully you'll just need to update the testcases (I didn't check
manually, I think you can easily reproduce this on aarch64?)

Thanks,

Christophe




> ---
>  gcc/match.pd   | 14 ++
>  gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c |  5 +++--
>  gcc/testsuite/gcc.dg/tree-ssa/bool-14.c| 15 +++
>  gcc/testsuite/gcc.dg/tree-ssa/bool-15.c| 18 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 13 +
>  5 files changed, 63 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 9b4819e5be7..fc630b63563 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6460,6 +6460,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (cmp == NE_EXPR)
> { constant_boolean_node (true, type); })))
>
> +#if GIMPLE
> +/* a?~t:t -> (-(a))^t */
> +(simplify
> + (cond @0 @1 @2)
> + (if (INTEGRAL_TYPE_P (type)
> +  && bitwise_inverted_equal_p (@1, @2))
> +  (with {
> +auto prec = TYPE_PRECISION (type);
> +auto unsign = TYPE_UNSIGNED (type);
> +tree inttype = build_nonstandard_integer_type (prec, unsign);
> +   }
> +   (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype
> @2))
> +#endif
> +
>  /* Simplify pointer equality compares using PTA.  */
>  (for neeq (ne eq)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> index 5009cd69cfe..78938f919d4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> @@ -29,15 +29,16 @@ union tree_node
>  };
>  int make_decl_rtl (tree, int);
>  void *
> -get_alias_set (t)
> +get_alias_set (t, t1)
>   tree t;
> + void *t1;
>  {
>long set;
>if (t->decl.rtl)
>  return (t->decl.rtl->fld[1].rtmem
> ? 0
> : (((t->decl.rtl ? t->decl.rtl: (make_decl_rtl (t, 0),
> t->decl.rtl)))->fld[1]).rtmem);
> -  return (void*)-1;
> +  return t1;
>  }
>
>  /* There should be precisely one load of ->decl.rtl.  If there is
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
> b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
> new file mode 100644
> index 000..0149380a63b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/110937 */
> +
> +_Bool f2(_Bool a, _Bool b)
> +{
> +if (a)
> +  return !b;
> +return b;
> +}
> +
> +/* We should be able to remove the conditional and convert it to an xor.
> */
> +/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-

RE: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

2023-08-10 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, August 10, 2023 8:45 PM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: rdapp@gmail.com; kito.cheng ; Kito.cheng 
; jeffreyalaw 
Subject: Re: [PATCH] RISC-V: Add MASK vec_duplicate pattern[PR110962]

> Is this patch ok ? Maybe we can find a way to add a target specific
> fortran test but should not block this bug fix.

It's not much different from adding a C testcase actually, apart from 
starting comments with a !

But well, LGTM.  The test doesn't look that complicated and quite likely
is covered by the Fortran testsuite already.

Regards
 Robin


Re: [PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-10 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
>  wrote:
>>
>> If `A` has a range of `[0,0][100,INF]` and the comparison
>> of `A < 50`. This should be optimized to `A <= 0` (which then
>> will be optimized to just `A == 0`).
>> This patch implement this via a new function which sees if
>> the constant of a comparison is in the middle of 2 range pairs
>> and change the constant to the either upper bound of the first pair
>> or the lower bound of the second pair depending on the comparison.
>>
>> This is the first step in fixing the following PRS:
>> PR 110131, PR 108360, and PR 108397.
>>
>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>
>
>> gcc/ChangeLog:
>>
>> * vr-values.cc (simplify_compare_using_range_pairs): New function.
>> (simplify_using_ranges::simplify_compare_using_ranges_1): Call
>> it.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/vrp124.c: New test.
>> * gcc.dg/pr21643.c: Disable VRP.
>> ---
>>  gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
>>  gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
>>  gcc/vr-values.cc   | 65 ++
>>  3 files changed, 114 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
>> index 4e7f93d351a..7f121d7006f 100644
>> --- a/gcc/testsuite/gcc.dg/pr21643.c
>> +++ b/gcc/testsuite/gcc.dg/pr21643.c
>> @@ -1,6 +1,10 @@
>>  /* PR tree-optimization/21643 */
>>  /* { dg-do compile } */
>> -/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
>> logical-op-non-short-circuit=1" } */
>> +/* Note VRP is able to transform `c >= 0x20` in f7
>> +   to `c >= 0x21` since we want to test
>> +   reassociation and not VRP, turn it off. */
>> +
>> +/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
>> logical-op-non-short-circuit=1 -fno-tree-vrp" } */
>>
>>  int
>>  f1 (unsigned char c)
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
>> new file mode 100644
>> index 000..6ccbda35d1b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
>> @@ -0,0 +1,44 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +/* Should be optimized to a == -100 */
>> +int g(int a)
>> +{
>> +  if (a == -100 || a >= 0)
>> +;
>> +  else
>> +return 0;
>> +  return a < 0;
>> +}
>> +
>> +/* Should optimize to a == 0 */
>> +int f(int a)
>> +{
>> +  if (a == 0 || a > 100)
>> +;
>> +  else
>> +return 0;
>> +  return a < 50;
>> +}
>> +
>> +/* Should be optimized to a == 0. */
>> +int f2(int a)
>> +{
>> +  if (a == 0 || a > 100)
>> +;
>> +  else
>> +return 0;
>> +  return a < 100;
>> +}
>> +
>> +/* Should optimize to a == 100 */
>> +int f1(int a)
>> +{
>> +  if (a < 0 || a == 100)
>> +;
>> +  else
>> +return 0;
>> +  return a > 50;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
>> diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
>> index a4fddd62841..1262e7cf9f0 100644
>> --- a/gcc/vr-values.cc
>> +++ b/gcc/vr-values.cc
>> @@ -968,9 +968,72 @@ test_for_singularity (enum tree_code cond_code, tree 
>> op0,
>>if (operand_equal_p (min, max, 0) && is_gimple_min_invariant (min))
>> return min;
>>  }
>> +
>>return NULL;
>>  }
>>
>> +/* Simplify integer comparisons such that the constant is one of the range 
>> pairs.
>> +   For an example,
>> +   A has a range of [0,0][100,INF]
>> +   and the comparison of `A < 50`.
>> +   This should be optimized to `A <= 0`
>> +   and then test_for_singularity can optimize it to `A == 0`.   */
>> +
>> +static bool
>> +simplify_compare_using_range_pairs (tree_code &cond_code, tree &op0, tree 
>> &op1,
>> +   const value_range *vr)
>> +{
>> +  if (TREE_CODE (op1) != INTEGER_CST
>> +  || vr->num_pairs () < 2)
>> +return false;
>> +  auto val_op1 = wi::to_wide (op1);
>> +  tree type = TREE_TYPE (op0);
>> +  auto sign = TYPE_SIGN (type);
>> +  auto p = vr->num_pairs ();
>> +  /* Find the value range pair where op1
>> + is in the middle of, if one exists. */
>> +  for (unsigned i = 1; i < p; i++)
>> +{
>> +  auto lower = vr->upper_bound (i - 1);
>> +  auto upper = vr->lower_bound (i);
>> +  if (wi::lt_p (val_op1, lower, sign))
>> +   continue;
>> +  if (wi::gt_p (val_op1, upper, sign))
>> +   continue;
>
> That looks like a linear search - it looks like m_base[] is
> a sorted array of values so we should be able to
> binary search here?  array_slice::bsearch could be
> used if it existed (simply port it over from vec<> and
> use array_slice from that)?

Better to use std::lower_bound IMO, rather than implement our
own custom bsearch.

Thanks,
Richard
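
For illustration, a minimal standalone sketch of the suggested
std::lower_bound approach.  It models the sorted, disjoint range pairs as a
plain vector instead of GCC's value_range/wide_int API, so all names and
types here are placeholders, not the actual implementation:

#include <algorithm>
#include <cstdio>
#include <vector>

struct range_pair { int lo, hi; };

/* Return the index I >= 1 such that PAIRS[I-1].hi <= VAL <= PAIRS[I].lo,
   i.e. VAL falls in the gap between two adjacent pairs, or -1 if none.  */
static int
find_gap (const std::vector<range_pair> &pairs, int val)
{
  auto it = std::lower_bound (pairs.begin (), pairs.end (), val,
			      [] (const range_pair &r, int v)
			      { return r.lo < v; });
  if (it == pairs.begin () || it == pairs.end ())
    return -1;
  int i = it - pairs.begin ();
  return pairs[i - 1].hi <= val ? i : -1;
}

int
main ()
{
  std::vector<range_pair> vr = { { 0, 0 }, { 100, 1000 } };
  std::printf ("%d\n", find_gap (vr, 50));	/* 1: between the two pairs */
  std::printf ("%d\n", find_gap (vr, 150));	/* -1: inside a pair */
}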


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 3:31 PM Jan Beulich  wrote:
>
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined ABI
> > in practice. A well defined ABI should guarantee 1) interlinkable across
> > different compile options within the same compiler; 2) interlinkable across
> > different compilers. Both aspects are failed in the non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can run
> > on both AVX10-256 and AVX10-512. It would be common that binaries compiled
> > with AVX10-256 may link with native built binaries on AVX10-512 targets.
>
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only,
> say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
> doesn't sufficiently define what __m256 etc actually are. After all these
> aren't types defined by the C standard (as opposed to at least most other
> types in the respective table there), and you can't really make assumptions
> like "this is what certain compilers think this is".

You might be able to speak in terms of OpenMP SIMD with simdlen?

Richard.

> Jan
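
As a hedged illustration of that suggestion (plain OpenMP C, not a psABI
proposal): a simd clone states its vector length explicitly via simdlen, so
the interface contract is a lane count rather than a register-sized type
like __m256 or __m512.

/* The vector variant of this function is defined by its simdlen, not by
   whether a compiler maps it to ymm or zmm registers.  */
#pragma omp declare simd simdlen(8) notinbranch
float scale (float x, float y)
{
  return x * y;
}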



Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> > So, should the equivalent FAM struct also have this sizeof()?  If no: 
> > there should be a good argument why it shouldn't be similar to the non-FAM 
> > one.
> 
> The sizeof() of a structure with a FAM is defined as follows (after searching
> online, I think that the definition in Wikipedia is the most reasonable one):
> https://en.wikipedia.org/wiki/Flexible_array_member

Well, Wikipedia has its uses.  Here we are in language-lawyering land 
together with a discussion of what makes most sense in many circumstances.  
FWIW, in this case I think the cited text from Wikipedia is correct in the 
sense of "not wrong" but not helpful in the sense of "good advice".

> By definition, the sizeof() of a struct with FAM might not be the same 
> as the non-FAM one. i.e, for the following two structures, one with FAM, 
> the other with fixed array:
> 
> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> struct foo_fix {int a; short b; char t[3]; } 
> 
> With current GCC:
> sizeof(foo_flex) == 8
> sizeof(foo_fix) == 12
> 
> I think that the current behavior of sizeof for structure with FAM in 
> GCC is correct.

It is, yes.

> The major issue is what was pointed out by Martin in the previous email:
> 
> Whether using the following formula is correct to compute the 
> allocation?
> 
> sizeof(struct foo_flex) + N * sizeof(foo->t);
> 
> As pointed out  in the wikipedia, the value computed by this formula might
> be bigger than the actual size since “sizeof(struct foo_flex)” might include 
> paddings that are used as part of the array.

That doesn't make the formula incorrect, but rather conservatively 
correct.  If you don't want to be conservative, then yes, you can use a 
different formula if you happen to know the layout rules your compiler at 
hand uses (or the ones prescribed by an ABI, if it does that).  I think 
it would be bad advice to the general population to advertise this scheme 
as better.

> So the more accurate formula should be
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);

"* sizeof(foo->t[0])", but yes.

> For the question whether the compiler needs to allocate padding after 
> the FAM field, I don’t know the answer, and it’s not specified in the 
> standard either. Does it matter?

It matters for two things:

1) Abstract reasons: is there a reason to deviate 
from the normal rules?  If not: it shouldn't deviate.  Future 
extensibility: while it right now is not possible to form an array 
of FAM-structs (in C!), who's to say that it may not eventually be added?  
It seems a natural enough extension of an extension, and while it has 
certain implementation problems (the "real" size of the elements needs to 
be computable, and hence be part of the array type) those could be 
overcome.  At that point you _really_ want to have the elements aligned 
naturally, be compatible with sizeof, and be the same as an individual 
object.

2) Practical reasons: code generation works better if the accessible sizes 
of objects are a multiple of their alignment, as often you have 
instructions that can move around alignment-sized blobs (say, words).  If 
you don't pad out objects you have to be careful to use only byte accesses 
when you get to the end of an object.

Let me ask the question in the opposite way: what would you _gain_ by not 
allocating padding?  And is that enough to deviate from the most obvious 
choices?  (Do note that e.g. global or local FAM-typed objects don't exist 
in isolation, and their surrounding objects have their own alignment 
rules, which often will lead to tail padding anyway, even if you would use 
the non-conservative size calculation; the same applies for malloc'ed 
objects).

> > Note that if one choses to allocate less space than sizeof implies that 
> > this will have quite some consequences for code generation, in that 
> > sometimes the instruction sequences (e.g. for copying) need to be careful 
> > to never access tail padding that should be there in array context, but 
> > isn't there in single-object context.  I think this alone should make it 
> > clear that it's advisable that sizeof() and allocated size agree.
> 
> Sizeof by definition returns the size of the TYPE, not the size of the 
> allocated object.

Sure.  Outside special cases like FAM it's the same, though.  And there 
sizeof does include tail padding.

> > And then the next question is what __builtin_object_size should do with 
> > these: should it return the size with or without padding at end (i.e. 
> > could/should it return 9 even if sizeof is 12).  I can see arguments for 
> > both.
> 
> Currently, GCC’s __builtin_object_size use the following formula to 
> compute the object size for The structure with FAM:
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
> 
> I think it’s correct.

See above.  It's non-conservatively correct.  And that may be the right 
choice for this builtin, considering its intended use-cases (strict 
checking).

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Qing Zhao via Gcc-patches


> On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:
> 
> On Wednesday, 2023-08-09 at 20:10 +, Qing Zhao wrote:
>> 
>>> On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> 
> ...
>> 
>> By definition, the sizeof() of a struct with FAM might not be the same as 
>> the non-FAM one. 
>> i.e, for the following two structures, one with FAM, the other with fixed 
>> array:
>> 
>> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
>> struct foo_fix {int a; short b; char t[3]; } 
>> 
>> With current GCC:
>> sizeof(foo_flex) == 8
>> sizeof(foo_fix) == 12
>> 
>> I think that the current behavior of sizeof for structure with FAM in GCC is 
>> correct. 
> 
> Yes, sadly the sizeof has to be like this as required by ISO C.
Agreed. Yes, if the size information of the FAM could be integrated into the 
type system in the C standard, we would not have such issues anymore. 

> 
>> 
>> The major issue is what was pointed out by Martin in the previous email:
>> 
>> Whether using the following formula is correct to compute the allocation?
>> 
>> sizeof(struct foo_flex) + N * sizeof(foo->t);
> 
> That formula is safe for allocation, but might allocate more padding
> than the minimum amount and
Yes. 
> it might allocate less storage than a
> similar array with fixed array.
Yes. 
> 
>> As pointed out  in the wikipedia, the value computed by this formula might
>>  be bigger than the actual size since “sizeof(struct foo_flex)” might 
>> include 
>> paddings that are used as part of the array.
>> 
>> So the more accurate formula should be
>> 
>> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
>> 
>> With GCC, offset(struct foo_flex,t[0]) == 6, which is also correct. 
> 
> This formula might be considered incorrect / dangerous because
> it might allocate less storage than sizeof(struct foo_flex). 
> 
> 
> https://godbolt.org/z/8accq75f3

I see, thanks.
>> 
> ...
>>> As in: I think sizeof for both structs should return 12, and 12 bytes 
>>> should be reserved for objects of such types.
>>> 
>>> And then the next question is what __builtin_object_size should do with 
>>> these: should it return the size with or without padding at end (i.e. 
>>> could/should it return 9 even if sizeof is 12).  I can see arguments for 
>>> both.
>> 
>> Currently, GCC’s __builtin_object_size use the following formula to compute 
>> the object size for
>> The structure with FAM:
>> 
>> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
>> 
>> I think it’s correct. 
>> 
>> I think that the users might need to use this formula to compute the 
>> allocation size for a structure with FAM too.
> 
> I am not sure for the reason given above. The following
> code would not work:
> 
> struct foo_flex { int a; short b; char t[]; } x;
> x.a = 1;
> struct foo_flex *p = malloc(sizeof(x) + x.a);
> if (!p) abort();
> memcpy(p, &x, sizeof(x)); // initialize struct
> 
Okay. 
Then, the user still should use the sizeof(struct foo_flex) + N * 
sizeof(foo->t) for the allocation, even though this might allocate more bytes 
than necessary. (But this is safe)

Let me know if I still miss anything.

Thanks a lot for the explanation.

Qing
> Martin
> 
> 
> 



Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hey,

On Thu, 10 Aug 2023, Martin Uecker wrote:

> > offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
> > 
> > With GCC, offset(struct foo_flex,t[0]) == 6, which is also correct. 
> 
> This formula might be considered incorrect / dangerous because
> it might allocate less storage than sizeof(struct foo_flex). 

Oh indeed.  I hadn't even considered that.  That could be "fixed" with 
another max(theabove, sizeof(struct foo_flex)), but that starts to become 
silly when the obvious choice works fine.


Ciao,
Michael.
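
To make the numbers concrete, a small self-contained example of the three
candidate formulas (assuming, as elsewhere in this thread, a 4-byte int and
a 2-byte short, so sizeof is 8 and offsetof(t) is 6):

#include <stddef.h>
#include <stdio.h>

struct foo_flex { int a; short b; char t[]; };

int
main (void)
{
  size_t n = 1;	/* number of elements in t */
  /* Conservative formula: also counts the tail padding, although the
     array could live there.  */
  size_t conservative = sizeof (struct foo_flex) + n * sizeof (char); /* 9 */
  /* Accurate formula: can be smaller than sizeof (struct foo_flex).  */
  size_t accurate = offsetof (struct foo_flex, t) + n * sizeof (char); /* 7 */
  /* The max() fix mentioned above: never smaller than the struct itself.  */
  size_t fixed = accurate > sizeof (struct foo_flex)
		 ? accurate : sizeof (struct foo_flex); /* 8 */
  printf ("%zu %zu %zu\n", conservative, accurate, fixed);
  return 0;
}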



RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Jan Beulich 
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang 
> Cc: Joseph Myers ; Wang, Phoebe
> ; Hongtao Liu ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu,
> Hongtao ; Zhang, Annita ;
> x86-64-abi ; llvm-dev  d...@lists.llvm.org>; Craig Topper ; Richard Biener
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the 
> > non 512-
> bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> targets.

IMO it is not acceptable for AVX10-256 to generate zmm registers.

If I have to choose among the three proposals, the second is better.

But the best choice I suppose is to keep what we are doing currently, which is
passing them in memory and emitting a warning. It is a reasonable behavior.

Thx,
Haochen

> 
> But you're only describing a pre-existing problem here afaict. Code compiled 
> with
> -mavx512f passing __m512 type data to a function compiled with only, say, 
> -mavx2
> won't interoperate properly either. What's worse, imo the psABI doesn't
> sufficiently define what __m256 etc actually are. After all these aren't types
> defined by the C standard (as opposed to at least most other types in the
> respective table there), and you can't really make assumptions like "this is 
> what
> certain compilers think this is".
> 
> Jan


RE: Machine Mode ICE in RISC-V when LTO

2023-08-10 Thread Li, Pan2 via Gcc-patches
Thanks Thomas for the information, great to learn you have a fix WIP.

> ... is not sufficient: that runs into GTY issues, as the current
> 'unsigned char *lto_mode_identity_table' is (mis-)classified by
> 'gengtype' as a C string.  This happens to work for this case, but still
> isn't right, and only works for 'char *' but not 'short *' etc

Does it report something like "gcc/lto-streamer.h:599: field 
`(*x).mode_table' is pointer to unimplemented type" when changing to short *?

>-return (machine_mode) ib->file_data->mode_table[ix];
>+return ib->file_data->mode_table ? ib->file_data->mode_table[ix] : ix;

Got the point and the mode_table is constant up to a point.

Pan

-Original Message-
From: Thomas Schwinge  
Sent: Thursday, August 10, 2023 9:24 PM
To: Li, Pan2 ; Richard Biener ; 
Jakub Jelinek 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; kito.ch...@gmail.com; 
Jeff Law ; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: RE: Machine Mode ICE in RISC-V when LTO

Hi!

On 2023-08-10T12:25:36+, "Li, Pan2"  wrote:
> Thanks Richard for comment, let me try to promote the table to unsigned short.

I have WIP work for this issue -- which I'd already raised a month ago:

On 2023-06-30T13:46:07+0200, Thomas Schwinge  wrote:
> In particular, the 'lto_mode_identity_table' changes would seem necessary
> to keep standard LTO ('-flto') functional for large 'machine_mode' size?

... which is exactly the problem you've now run into?

However, a simple:

-GTY(()) const unsigned char *lto_mode_identity_table;
+GTY(()) const unsigned short *lto_mode_identity_table;

..., or:

-GTY(()) const unsigned char *lto_mode_identity_table;
+GTY(()) const machine_mode *lto_mode_identity_table;

... is not sufficient: that runs into GTY issues, as the current
'unsigned char *lto_mode_identity_table' is (mis-)classified by
'gengtype' as a C string.  This happens to work for this case, but still
isn't right, and only works for 'char *' but not 'short *' etc.  I have
WIP work to tighten that.  ..., which got me into other GTY issues, and
so on...  ;-) (Richard already ACKed and I pushed some of the
prerequisite changes, but there's more to come.)  I'm still planning on
resolving all that mess, but I'm tight on time right now.

However, I have a different proposal, which should address your current
issue: simply, get rid of the 'lto_mode_identity_table', which is just
that: a 1-to-1 mapping of array index to value.  Instead, in
'gcc/lto/lto-common.cc:lto_file_finalize', for '!ACCEL_COMPILER', set
'file_data->mode_table = NULL', and in the users (only
'gcc/tree-streamer.h:bp_unpack_machine_mode'?), replace (untested):

-return (machine_mode) ib->file_data->mode_table[ix];
+return ib->file_data->mode_table ? ib->file_data->mode_table[ix] : ix;

Jakub, as the original author of 'lto_mode_identity_table' (see
commit db847fa8f2cca6139188b8dfa0a7064319b19193 (Subversion r221005)), is
there any reason not to do it this way?


Regards
 Thomas


> -Original Message-
> From: Richard Biener 
> Sent: Thursday, August 10, 2023 7:08 PM
> To: Li, Pan2 
> Cc: richard.sandif...@arm.com; Thomas Schwinge ; 
> ja...@redhat.com; kito.ch...@gmail.com; Jeff Law ; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang 
> Subject: Re: Machine Mode ICE in RISC-V when LTO
>
> On Thu, Aug 10, 2023 at 10:19 AM Li, Pan2  wrote:
>>
>> Hi all,
>>
>> Recently I found there are still some issues in the machine mode handling
>> with LTO, while fixing one ICE (which occurs only when compiling with LTO)
>> in the RISC-V backend, aka the case below.
>>
>> >> ../__RISC-V_INSTALL___/bin/riscv64-unknown-elf-g++ -O2 -flto gcc/testsuite/g++.dg/torture/vshuf-v4df.C -o test.elf
>>
>> during RTL pass: expand
>> gcc/testsuite/g++.dg/torture/vshuf-main.inc: In function 'main':
>> gcc/testsuite/g++.dg/torture/vshuf-main.inc:15:9: internal compiler error: in as_a, at machmode.h:381
>>    15 |   V r = __builtin_shuffle(in1[i], mask1[i]);
>>       |         ^
>> 0x7e5b8e scalar_int_mode as_a(machine_mode)
>>         ../.././gcc/gcc/machmode.h:381
>> 0x7eabdb scalar_mode as_a(machine_mode)
>>         ../.././gcc/gcc/expr.cc:332
>> 0x7eabdb convert_mode_scalar
>>         ../.././gcc/gcc/expr.cc:325
>> 0xb8485b store_expr(tree_node*, rtx_def*, int, bool, bool)
>>         ../.././gcc/gcc/expr.cc:6413
>> 0xb8a556 store_field
>>         ../.././gcc/gcc/expr.cc:7648
>> 0xb88f27 store_constructor(tree_node*, rtx_def*, int, poly_int<2u, long>, bool)
>>         ../.././gcc/gcc/expr.cc:7588
>> 0xb8b8b8 expand_constructor
>>         ../.././gcc/gcc/expr.cc:8931
>> 0xb76bc7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>>         ../.././gcc/gcc/expr.cc:11170
>> 0xb77ef7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>>

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-10 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 8 Aug 2023 at 15:27, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Fri, 4 Aug 2023 at 20:36, Richard Sandiford
> >  wrote:
> >>
> >> Full review this time, sorry for skipping the tests earlier.
> > Thanks for the detailed review! Please find my responses inline below.
> >>
> >> Prathamesh Kulkarni  writes:
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 7e5494dfd39..680d0e54fd4 100644
> >> > --- a/gcc/fold-const.cc
> >> > +++ b/gcc/fold-const.cc
> >> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
> >> >  #include "vec-perm-indices.h"
> >> >  #include "asan.h"
> >> >  #include "gimple-range.h"
> >> > +#include <algorithm>
> >>
> >> This should be included by defining INCLUDE_ALGORITHM instead.
> > Done. Just curious, why do we use this macro instead of directly
> > including <algorithm>?
>
> AIUI, one of the reasons for having every file start with includes
> of config.h and (b)system.h, in that order, is to ensure that a small
> and predictable amount of GCC-specific stuff happens before including
> the system header files.  That helps to avoid OS-specific clashes between
> GCC code and system headers.
>
> But another major reason is that system.h ends by poisoning a lot of
> stuff that system headers would be entitled to use.
Ah OK, thanks for the clarification!
>
> >> > +  tree_vector_builder builder (vectype, npatterns, nelts_per_pattern);
> >> > +
> >> > +  // Fill a0 for each pattern
> >> > +  for (unsigned i = 0; i < npatterns; i++)
> >> > +builder.quick_push (build_int_cst (inner_type, rand () % 100));
> >> > +
> >> > +  if (nelts_per_pattern == 1)
> >> > +return builder.build ();
> >> > +
> >> > +  // Fill a1 for each pattern
> >> > +  for (unsigned i = 0; i < npatterns; i++)
> >> > +builder.quick_push (build_int_cst (inner_type, rand () % 100));
> >> > +
> >> > +  if (nelts_per_pattern == 2)
> >> > +return builder.build ();
> >> > +
> >> > +  for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; 
> >> > i++)
> >> > +{
> >> > +  tree prev_elem = builder[i - npatterns];
> >> > +  int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> >> > +  int val = prev_elem_val + S;
> >> > +  builder.quick_push (build_int_cst (inner_type, val));
> >> > +}
> >> > +
> >> > +  return builder.build ();
> >> > +}
> >> > +
> >> > +static void
> >> > +validate_res (unsigned npatterns, unsigned nelts_per_pattern,
> >> > +   tree res, tree *expected_res)
> >> > +{
> >> > +  ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) == npatterns);
> >> > +  ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) == nelts_per_pattern);
> >>
> >> I don't think this is safe when the inputs are randomised.  E.g. we
> >> could by chance end up with a vector of all zeros, which would have
> >> a single pattern and a single element per pattern, regardless of the
> >> shapes of the inputs.
> >>
> >> Given the way that vector_builder::finalize
> >> canonicalises the encoding, it should be safe to use:
> >>
> >> * VECTOR_CST_NPATTERNS (res) <= npatterns
> >> * vector_cst_encoded_nelts (res) <= npatterns * nelts_per_pattern
> >>
> >> If we do that then...
> >>
> >> > +
> >> > +  for (unsigned i = 0; i < vector_cst_encoded_nelts (res); i++)
> >>
> >> ...this loop bound should be npatterns * nelts_per_pattern instead.
> > Ah indeed. Fixed, thanks.
>
> The patch instead does:
>
>   ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) <= npatterns);
>   ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) <= nelts_per_pattern);
>
> I think the version I suggested is safer.  It's not the goal of the
> canonicalisation algorithm to reduce both npattners and nelts_per_pattern
> individually.  The algorithm can increase nelts_per_pattern in order
> to decrease npatterns.
Oops, sorry I misread, will fix in the next patch.
>
> >> > +  {
> >> > +tree arg0 = build_vec_cst_rand (integer_type_node, 1, 3, 2);
> >> > +tree arg1 = build_vec_cst_rand (integer_type_node, 1, 3, 2);
> >> > +poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > +
> >> > +vec_perm_builder builder (arg0_len, 1, 3);
> >> > +builder.quick_push (arg0_len);
> >> > +builder.quick_push (arg0_len + 1);
> >> > +builder.quick_push (arg0_len + 2);
> >> > +
> >> > +vec_perm_indices sel (builder, 2, arg0_len);
> >> > +tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, 
> >> > NULL, true);
> >> > +tree expected_res[] = { vector_cst_elt (arg1, 0), vector_cst_elt 
> >> > (arg1, 1),
> >> > + vector_cst_elt (arg1, 2) };
> >> > +validate_res (1, 3, res, expected_res);
> >> > +  }
> >> > +
> >> > +  /* Case 3: Leading element of arg1, stepped sequence: pattern 0 of 
> >> > arg0.
> >> > + sel = {len, 0, 0, 0, 2, 0, ...}
> >> > + npatterns = 2, nelts_per_pattern = 3.
> >> > + Use extra pattern {0, ...} to lower number of elements per 
> >> > pattern.  */
> >> > +  {
> >> > +tree arg0 = build_vec_cst_rand (cha
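
As a standalone sketch of the npatterns/nelts_per_pattern encoding being
tested above (hedged: plain C++, not GCC's tree_vector_builder API): each of
NPATTERNS interleaved patterns is encoded by its first NELTS_PER_PATTERN
elements; beyond those, a pattern repeats its last element, or, with three
encoded elements, continues stepping by the difference of the last two.

#include <cstdio>
#include <vector>

static std::vector<int>
decode_encoding (unsigned npatterns, unsigned nelts_per_pattern,
		 const std::vector<int> &encoded, unsigned nelts)
{
  std::vector<int> out (nelts);
  for (unsigned i = 0; i < nelts; ++i)
    {
      if (i < npatterns * nelts_per_pattern)
	out[i] = encoded[i];			/* explicitly encoded */
      else if (nelts_per_pattern < 3)
	out[i] = out[i - npatterns];		/* pattern repeats */
      else					/* stepped sequence */
	out[i] = out[i - npatterns]
		 + (out[i - npatterns] - out[i - 2 * npatterns]);
    }
  return out;
}

int
main ()
{
  /* npatterns == 2, nelts_per_pattern == 3: { 1, 11, 2, 12, 3, 13 }
     decodes to { 1, 11, 2, 12, 3, 13, 4, 14, 5, 15 }.  */
  for (int v : decode_encoding (2, 3, { 1, 11, 2, 12, 3, 13 }, 10))
    std::printf ("%d ", v);
  std::printf ("\n");
}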

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Martin Uecker
On Thursday, 2023-08-10 at 13:59 +, Qing Zhao wrote:
> 
> > On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:
> > 
> > On Wednesday, 2023-08-09 at 20:10 +, Qing Zhao wrote:
> > > 
> > > > On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> > 

> > I am not sure for the reason given above. The following
> > code would not work:
> > 
> > struct foo_flex { int a; short b; char t[]; } x;
> > x.a = 1;
> > struct foo_flex *p = malloc(sizeof(x) + x.a);
> > if (!p) abort();
> > memcpy(p, &x, sizeof(x)); // initialize struct
> > 
> Okay. 
> Then, the user still should use the sizeof(struct foo_flex) + N * 
> sizeof(foo->t) for the allocation, even though this might allocate more bytes 
> than necessary. (But this is safe)
> 
> Let me know if I still miss anything.

The question is not only what the user should use to
allocate, but also what BDOS should return.  In my
example the user uses the sizeof() + N * sizeof
formula and the memcpy is safe, but it would be flagged
as a buffer overrun if BDOS uses the offsetof formula.

Martin




Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote:
> On Thursday, 2023-08-10 at 13:59 +, Qing Zhao wrote:
> > 
> > > On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:
> > > 
> > > On Wednesday, 2023-08-09 at 20:10 +, Qing Zhao wrote:
> > > > 
> > > > > On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> > > 
> 
> > > I am not sure for the reason given above. The following
> > > code would not work:
> > > 
> > > struct foo_flex { int a; short b; char t[]; } x;
> > > x.a = 1;
> > > struct foo_flex *p = malloc(sizeof(x) + x.a);
> > > if (!p) abort();
> > > memcpy(p, &x, sizeof(x)); // initialize struct
> > > 
> > Okay. 
> > Then, the user still should use the sizeof(struct foo_flex) + N * 
> > sizeof(foo->t) for the allocation, even though this might allocate more 
> > bytes than necessary. (But this is safe)
> > 
> > Let me know if I still miss anything.
> 
> The question is not only what the user should use to
> allocate, but also what BDOS should return.  In my
> example the user uses the sizeof() + N * sizeof
> formula and the memcpy is safe, but it would be flagged
> as a buffer overrun if BDOS uses the offsetof formula.

BDOS/BOS (at least the 0 level) should return what is actually
allocated for the var, what size was passed to malloc and if it
is a var with flex array member with initialization what is actually the
size on the stack or in .data/.rodata etc.
And for 1 level the same unless it is just access to some element, then
it should be capped by the size of that element.

Jakub
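
A hedged example of that level-0 behavior when the allocation is visible
(sizes assume the 4-byte int/2-byte short layout used in this thread):

#include <stdio.h>
#include <stdlib.h>

struct foo_flex { int a; short b; char t[]; };

int
main (void)
{
  /* sizeof (struct foo_flex) == 8; we ask malloc for 8 + 3 bytes.  */
  struct foo_flex *p = malloc (sizeof (struct foo_flex) + 3);
  if (!p)
    return 1;
  /* Level 0 tracks the size passed to malloc, so this prints 11.  */
  printf ("%zu\n", __builtin_dynamic_object_size (p, 0));
  free (p);
  return 0;
}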



Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Martin Uecker
On Thursday, 2023-08-10 at 16:42 +0200, Jakub Jelinek wrote:
> On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote:
> > On Thursday, 2023-08-10 at 13:59 +, Qing Zhao wrote:
> > > 
> > > > On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:
> > > > 
> > > > On Wednesday, 2023-08-09 at 20:10 +, Qing Zhao wrote:
> > > > > 
> > > > > > On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> > > > 
> > 
> > > > I am not sure for the reason given above. The following
> > > > code would not work:
> > > > 
> > > > struct foo_flex { int a; short b; char t[]; } x;
> > > > x.a = 1;
> > > > struct foo_flex *p = malloc(sizeof(x) + x.a);
> > > > if (!p) abort();
> > > > memcpy(p, &x, sizeof(x)); // initialize struct
> > > > 
> > > Okay. 
> > > Then, the user still should use the sizeof(struct foo_flex) + N * 
> > > sizeof(foo->t) for the allocation, even though this might allocate more 
> > > bytes than necessary. (But this is safe)
> > > 
> > > Let me know if I still miss anything.
> > 
> > The question is not only what the user should use to
> > allocate, but also what BDOS should return.  In my
> > example the user uses the sizeof() + N * sizeof
> > formula and the memcpy is safe, but it would be flagged
> > as a buffer overrun if BDOS uses the offsetof formula.
> 
> BDOS/BOS (at least the 0 level) should return what is actually
> allocated for the var, what size was passed to malloc and if it
> is a var with flex array member with initialization what is actually the
> size on the stack or in .data/.rodata etc.

Agreed.

But what about a struct with FAM with the new "counted_by" attribute
if the original allocation is not visible?

Martin

> And for 1 level the same unless it is just access to some element, then
> it should be capped by the size of that element.
> 






[PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-10 Thread Stefan Schulze Frielinghaus via Gcc-patches
In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
completely missed the fact that the normal form of a generated constant for a
mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
actual constant.  This even holds true for unsigned constants.

Fixed by masking out the upper bits for the incoming constant and sign
extending the resulting unsigned constant.

Bootstrapped and regtested on x64 and s390x.  Ok for mainline?

While reading existing optimizations in combine I stumbled across two
optimizations where either my intuition about the representation of
unsigned integers via a const_int rtx is wrong, which then in turn would
probably also mean that this patch is wrong, or that the optimizations
are missed sometimes.  In other words in the following I would assume
that the upper bits are masked out:

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 468b7fde911..80c4ff0fbaf 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   /* (unsigned) < 0x80000000 is equivalent to >= 0.  */
   else if (is_a  (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
-  && ((unsigned HOST_WIDE_INT) const_op
+  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
(int_mode))
   == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1)))
{
  const_op = 0;
@@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   /* (unsigned) >= 0x80000000 is equivalent to < 0.  */
   else if (is_a  (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
-  && ((unsigned HOST_WIDE_INT) const_op
+  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
(int_mode))
   == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1)))
{
  const_op = 0;

For example, while bootstrapping on x64 the optimization is missed since
an LTU comparison in QImode is done and the constant equals
0xffffffffffffff80.

Sorry for inlining another patch, but I would really like to make sure
that my understanding is correct, now, before I come up with another
patch.  Thus it would be great if someone could shed some light on this.
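
A small standalone illustration of the normal form in question (plain C,
independent of combine.cc): a QImode-sized constant is kept sign-extended
in a 64-bit HOST_WIDE_INT even when it is conceptually unsigned, so it has
to be masked before being compared against mode-sized bounds:

#include <stdio.h>

int
main (void)
{
  /* The QImode constant 0x80 in canonical (sign-extended) form.  */
  long long const_op = (signed char) 0x80;	  /* 0xffffffffffffff80 */
  unsigned long long mode_mask = 0xff;		  /* GET_MODE_MASK (QImode) */
  unsigned long long masked
    = (unsigned long long) const_op & mode_mask;  /* back to 0x80 */
  printf ("%llx -> %llx\n", (unsigned long long) const_op, masked);
  return 0;
}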

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Properly handle unsigned
constants while narrowing comparison of memory and constants.
---
 gcc/combine.cc | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index e46d202d0a7..468b7fde911 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   && !MEM_VOLATILE_P (op0)
   /* The optimization makes only sense for constants which are big enough
 so that we have a chance to chop off something at all.  */
-  && (unsigned HOST_WIDE_INT) const_op > 0xff
-  /* Bail out, if the constant does not fit into INT_MODE.  */
-  && (unsigned HOST_WIDE_INT) const_op
-< ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - 1)
+  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 0xff
   /* Ensure that we do not overflow during normalization.  */
-  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
HOST_WIDE_INT_M1U))
+  && (code != GTU
+ || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
+< HOST_WIDE_INT_M1U)
+  && trunc_int_for_mode (const_op, int_mode) == const_op)
 {
-  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
+  unsigned HOST_WIDE_INT n
+   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
   enum rtx_code adjusted_code;
 
   /* Normalize code to either LEU or GEU.  */
@@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
-   (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME (adjusted_code),
-   n);
+   (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
+   GET_RTX_NAME (adjusted_code), n);
}
  poly_int64 offset = (BYTES_BIG_ENDIAN
   ? 0
   : (GET_MODE_SIZE (int_mode)
  - GET_MODE_SIZE (narrow_mode_iter)));
  *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
- *pop1 = GEN_INT (n);
+ *pop1 = gen_int_mode (n, narrow_mode_iter);
  return adjusted_code;
}
 }
-- 
2.41.0



Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Siddhesh Poyarekar

On 2023-08-10 10:47, Martin Uecker wrote:
> On Thursday, 2023-08-10 at 16:42 +0200, Jakub Jelinek wrote:
>> BDOS/BOS (at least the 0 level) should return what is actually
>> allocated for the var, what size was passed to malloc and if it
>> is a var with flex array member with initialization what is actually the
>> size on the stack or in .data/.rodata etc.
> 
> Agreed.
> 
> But what about a struct with FAM with the new "counted_by" attribute
> if the original allocation is not visible?

There's precedent for this through the __access__ attribute; __bos 
trusts what the attribute says about the allocation.


Sid


Re: [PATCH] tree-optimization/110963 - more PRE when optimizing for size

2023-08-10 Thread Jeff Law via Gcc-patches




On 8/10/23 06:41, Richard Biener via Gcc-patches wrote:

The following adjusts the heuristic when we perform PHI insertion
during GIMPLE PRE from requiring at least one edge that is supposed
to be optimized for speed to also doing insertion when the expression
is available on all edges (but possibly with different value) and
we'd at most have one copy from a constant.  The first ensures
we optimize two computations on all paths to one plus a possible
copy due to the PHI, the second makes sure we do not need to insert
many possibly large copies from constants, disregarding the
cumulative size cost of the register copies when they are not
coalesced.

The case in the testcase is

   <bb 5>:
   _14 = h;
   if (_14 == 0B)
     goto <bb 6>;
   else
     goto <bb 7>;

   <bb 6>:
   h = 0B;

   <bb 7>:
   h.6_12 = h;

and we want to optimize that to

   <bb 7>:
   # h.6_12 = PHI <_14(5), 0B(6)>

If we want to consider the cost of the register copies I think the
only simplistic enough way would be to restrict the special-case to
two incoming edges - we'd assume one register copy is coalesced
leaving one copy from a register or from a constant.

As with every optimization the downstream effects are probably
bigger than what we can locally estimate.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

PR tree-optimization/110963
* tree-ssa-pre.cc (do_pre_regular_insertion): Also insert
a PHI node when the expression is available on all edges
and we insert at most one copy from a constant.

* gcc.dg/tree-ssa/ssa-pre-34.c: New testcase.
The other thing in this space is to extend it to the case where multiple 
phi args have the same constant.  My recollection is we had some bits in 
the out-of-ssa code to factor those into a single path -- if that still 
works in the more modern expansion approach, then it'd likely be a win to 
support as well.


jeff
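
A hypothetical C function with the shape of the GIMPLE above (a guess at
the kind of source involved, not the actual PR110963 testcase):

void *h;

void *
f (void)
{
  if (h == 0)	/* _14 = h; compare against 0B */
    h = 0;	/* store of the constant on one path */
  return h;	/* second load of h: becomes PHI <_14, 0B> */
}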


Re: [PATCH] tree-optimization/110963 - more PRE when optimizing for size

2023-08-10 Thread Richard Biener via Gcc-patches



> On 10.08.2023 at 17:01, Jeff Law via Gcc-patches 
> wrote:
> 
> 
> 
> The other thing in this space is to extend it to the case where multiple phi 
> args have the same constant.  My recollection is we had some bits in the 
> out-of-ssa code to factor those into a single path -- if that still works in 
> the more modern expansion approach, the it'd likely be a win to support as 
> well.

Yes, though it comes at the cost of another branch, no?  The other thing to 
consider is that undoing the transform is much more difficult for constants, we 
usually have no idea what expression to re-materialize for it.

Richard.

> jeff


RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Zhang, Annita via Gcc-patches
For the ABI change proposal, I'd suggest raising a discussion in the x86-64-abi group. 

Thx,
Annita

> -Original Message-
> From: Jiang, Haochen 
> Sent: Thursday, August 10, 2023 10:15 PM
> To: Beulich, Jan ; Phoebe Wang
> 
> Cc: Joseph Myers ; Wang, Phoebe
> ; Hongtao Liu ; gcc-
> patc...@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao
> ; Zhang, Annita ; x86-64-
> abi ; llvm-dev ;
> Craig Topper ; Richard Biener
> 
> Subject: RE: Intel AVX10.1 Compiler Design and Support
> 
> > -Original Message-
> > From: Jan Beulich 
> > Sent: Thursday, August 10, 2023 9:31 PM
> > To: Phoebe Wang 
> > Cc: Joseph Myers ; Wang, Phoebe
> > ; Hongtao Liu ; Jiang,
> > Haochen ; gcc-patches@gcc.gnu.org;
> > ubiz...@gmail.com; Liu, Hongtao ; Zhang, Annita
> > ; x86-64-abi ;
> > llvm-dev ; Craig Topper
> > ; Richard Biener 
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On 10.08.2023 15:12, Phoebe Wang wrote:
> > >>  The psABI should have some simple rule covering all of the above I 
> > >> think.
> > >
> > > psABI has a rule for the case doesn't mean the rule is a well
> > > defined ABI in practice. A well defined ABI should guarantee 1)
> > > interlinkable across different compile options within the same
> > > compiler; 2) interlinkable across different compilers. Both aspects
> > > are failed in the non 512-
> > bit version.
> > >
> > > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > > Because we expect AVX10-256 is a general setting for binaries that
> > > can run on both AVX10-256 and AVX10-512. It would be common that
> > > binaries compiled with AVX10-256 may link with native built binaries
> > > on AVX10-512
> > targets.
> 
> IMO it is not acceptable for AVX10-256 to generate zmm registers.
> 
> If I have to choose among the three proposals, the second is better.
> 
> But the best choice I suppose is to keep what we are doing currently, which is
> passing them in memory and emitting a warning. It is a reasonable behavior.
> 
> Thx,
> Haochen
> 
> >
> > But you're only describing a pre-existing problem here afaict. Code
> > compiled with -mavx512f passing __m512 type data to a function compiled
> > with only, say, -mavx2 won't interoperate properly either. What's
> > worse, imo the psABI doesn't sufficiently define what __m256 etc
> > actually are. After all these aren't types defined by the C standard
> > (as opposed to at least most other types in the respective table
> > there), and you can't really make assumptions like "this is what certain
> compilers think this is".
> >
> > Jan


RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
Hi all,

There are lots of discussions on the arch level and ABIs, and I really
appreciate that.

For the arch-level issue, it might be a little early to discuss, and it
should not block these patches.

For the ABI issue, the problem actually comes from the fact that GCC and
clang/LLVM currently behave differently when returning __m512 w/o 512-bit
support. Then it becomes a question of how to unify them, and we get the
whole discussion. However, it is a corner case.

So let's first focus on the options design and the behavior on that. We could
continue to discuss those two issues after the main behavior is settled down.
Richard has raised some concerns in option combinations. Any other concerns?

Thx,
Haochen

> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
> converged vector instruction set across all Intel architectures, including
> Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit 
> is
> optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
> instructions will be under AVX10 umbrella.
>   - AVX10 will be version-based ISA instead of tons of different CPUIDs like
> AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
> (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to have a closed look at the details, please follow the 
> links
> below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
> It
> describes the Intel Advanced Vector Extensions 10 Instruction Set 
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper 
> It
> provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
> We will not provide -m[no-]xxx to enable/disable each single vector 
> feature
> in one version as we used to before. Instead, a simple option 
> -m[no-]avx10.x
> is used. If 512 bit version is needed, -mavx10.x-512 is all you need. 
> Also,
> maximum vector width should be the same when different version of AVX10 is
> used. For example, enabling AVX10.1 with 512 bit vector width while 
> enabling
> AVX10.2 with only 256 bit vector width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
> Every feature showed up in the current version will always show up in 
> future
> version.
>   - AVX10 is an independent ISA feature set.
> Although sharing the same instructions and encodings, AVX10 and AVX512 are
> conceptual independent features, which means they are orthogonal.
> 
> Since AVX10 will have several benefits like bringing AVX512 features on Atom
> Server and Clients and getting rid of tons of AVX512 CPUIDs but a simple AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler 
> options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
> 256 bit vector width to make sure the compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 
> bit
> vector width. “-mno-avx10.x-512” option will not be provided to avoid
> confusion of disabling 512 vector width or avx10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 
> bit
> vector width. But it will disable 512 bit vector width since the vector 
> size
> is indicated in option. “-mno-avx10.x-256” option will not be provided to
> keep align with the 512 ones.
>   - -mno-avx10.x: The option will disable all the features introduced 
> >=avx10.x
> (both 256 and 512 bit) and keep features <avx10.x, which is how
> -mno- options behaved previously.
> 
> When there comes an option combination with various vector sizes indicated
> (e.g. -mavx10.x-512 -mavx10.y-256), we would like to emit a warning.

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-10 Thread Richard Henderson via Gcc-patches

On 8/10/23 02:50, Wilco Dijkstra wrote:

Hi Richard,


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


It doesn't indeed. However, most cores support atomic 128-bit load/store
(part of LSE2), so we can still use the LSE2 ifunc for those cores. Since
there isn't a feature bit for this in the CPU or HWCAP, I check the CPUID
register.
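
For reference, the architectural check Richard suggests would look roughly
like this from userspace (an AArch64-only sketch, assuming Linux's MRS
emulation of ID registers; field layout per the Arm ARM):

#include <stdint.h>

/* Read ID_AA64MMFR2_EL1 (Linux traps and emulates this MRS from EL0) and
   extract the AT field, bits [35:32]; nonzero indicates FEAT_LSE2, i.e.
   single-copy atomic 16-byte load/store.  */
static inline int
have_lse2 (void)
{
  uint64_t mmfr2;
  asm volatile ("mrs %0, id_aa64mmfr2_el1" : "=r" (mmfr2));
  return ((mmfr2 >> 32) & 0xf) != 0;
}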


That would be a really nice bit to add to HWCAP, then, to consolidate this knowledge in 
one place.  Certainly I would use it in QEMU as well.



r~



Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Martin Uecker
On Thursday, 2023-08-10 at 10:58 -0400, Siddhesh Poyarekar wrote:
> On 2023-08-10 10:47, Martin Uecker wrote:
> > On Thursday, 2023-08-10 at 16:42 +0200, Jakub Jelinek wrote:
> > > On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote:
> > > > On Thursday, 2023-08-10 at 13:59 +0000, Qing Zhao wrote:
> > > > > 
> > > > > > On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:
> > > > > > 
> > > > > > On Wednesday, 2023-08-09 at 20:10 +0000, Qing Zhao wrote:
> > > > > > > 
> > > > > > > > On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> > > > > > 
> > > > 
> > > > > > I am not sure for the reason given above. The following
> > > > > > code would not work:
> > > > > > 
> > > > > > struct foo_flex { int a; short b; char t[]; } x;
> > > > > > x.a = 1;
> > > > > > struct foo_flex *p = malloc(sizeof(x) + x.a);
> > > > > > if (!p) abort();
> > > > > > memcpy(p, &x, sizeof(x)); // initialize struct
> > > > > > 
> > > > > Okay.
> > > > > Then, the user should still use sizeof(struct foo_flex) + N * 
> > > > > sizeof(foo->t) for the allocation, even though this might allocate 
> > > > > more bytes than necessary. (But this is safe.)
> > > > > 
> > > > > Let me know if I still miss anything.
> > > > 
> > > > The question is not only what the user should use to
> > > > allocate, but also what BDOS should return.  In my
> > > > example the user uses the sizeof() + N * sizeof
> > > > formula and the memcpy is safe, but it would be flagged
> > > > as a buffer overrun if BDOS uses the offsetof formula.
> > > 
> > > BDOS/BOS (at least at level 0) should return what is actually
> > > allocated for the var: what size was passed to malloc, and, for a var
> > > with a flex array member with initialization, what the actual size is
> > > on the stack or in .data/.rodata etc.
> > 
> > Agreed.
> > 
> > But what about a struct with FAM with the new "counted_by" attribute
> > if the original allocation is not visible?
> 
> There's precedent for this through the __access__ attribute; __bos 
> trusts what the attribute says about the allocation.

The access attribute gives the size directly. The counted_by attribute gives
a length for the array, which needs to be translated into a size via a
formula. There are different formulas in use. The question is which formula
bdos should trust.
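
Concretely, the two formulas in play differ (a sketch, assuming a typical
x86-64 layout for the struct from the example above):

#include <stddef.h>
#include <stdlib.h>

struct foo_flex { int a; short b; char t[]; };

/* sizeof formula: 8 + n here, since sizeof includes the tail padding
   that rounds the struct up to its alignment.  */
void *
alloc_by_sizeof (size_t n)
{
  return malloc (sizeof (struct foo_flex) + n * sizeof (char));
}

/* offsetof formula: 6 + n here; the array starts inside what the sizeof
   formula counts as padding.  */
void *
alloc_by_offsetof (size_t n)
{
  return malloc (offsetof (struct foo_flex, t) + n * sizeof (char));
}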

Whatever you pick, if this is not consistent with the actual allocation or
use, then it will cause problems, either by breaking code or by not
detecting buffer overruns.

So it needs to be consistent with what GCC allocates for a var with a FAM
and initialization, and the user also needs to be told what the right choice
is, so that they can use the right size for the allocation and for the
arguments to memcpy / memset etc.

Martin


Re: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 03:08:11PM +0000, Zhang, Annita via Gcc-patches wrote:
> > IMO it is not acceptable for AVX10-256 to generate zmm registers.
> > 
> > If I have to choose among the three proposals, the second is better.
> > 
> > But the best choice I suppose is to keep what we are doing currently,
> > which is passing them in memory and emitting a warning. It is a
> > reasonable behavior.

Completely agree on this.  If anything in the psABI should be changed, that
IMHO would be just a clarification, if it is not clear enough, that when
__m256 and/or __m512 are passed on ISAs which do not support them, they are
passed in memory.  That is what the psABI was effectively saying before
__m256 resp. __m512 support was added there.
So yes, warn and use memory if the ISA doesn't support those.

Jakub
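
For context, the behavior being endorsed can be seen with a sketch like the
following: compiled without AVX-512 enabled (e.g. plain -O2 on x86-64), GCC
emits a -Wpsabi warning that the ABI changes and passes the argument in
memory (exact diagnostic wording may vary between releases):

#include <immintrin.h>

/* Without -mavx512f, passing a 64-byte vector argument changes the ABI:
   "AVX512F vector argument without AVX512F enabled changes the ABI".  */
__m512i
pass_through (__m512i x)
{
  return x;
}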



[PATCH 13/12 v2] C _BitInt incremental fixes [PR102989]

2023-08-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 12:10:07PM +0200, Jakub Jelinek via Gcc-patches wrote:
> Here is an incremental patch which does that:

Bootstrap/regtest on i686-linux (next to x86_64-linux where it went fine)
revealed I forgot to add { target bitint } to dg-do compile lines.

Here is an updated patch which does that and passes even on i686-linux.

2023-08-10  Jakub Jelinek  

PR c/102989
gcc/c/
* c-decl.cc (finish_declspecs): Emit pedwarn_c11 on _BitInt.
* c-typeck.cc (c_common_type): Emit sorry for common type between
_Complex integer and larger _BitInt and return the _Complex integer.
gcc/c-family/
* c-attribs.cc (type_valid_for_vector_size): Reject vector types
with BITINT_TYPE elements even if they have mode precision and
suitable size.
gcc/testsuite/
* gcc.dg/bitint-19.c: New test.
* gcc.dg/bitint-20.c: New test.
* gcc.dg/bitint-21.c: New test.
* gcc.dg/bitint-22.c: New test.
* gcc.dg/bitint-23.c: New test.
* gcc.dg/bitint-24.c: New test.
* gcc.dg/bitint-25.c: New test.
* gcc.dg/bitint-26.c: New test.
* gcc.dg/bitint-27.c: New test.
* g++.dg/ext/bitint1.C: New test.
* g++.dg/ext/bitint2.C: New test.
* g++.dg/ext/bitint3.C: New test.
* g++.dg/ext/bitint4.C: New test.
libcpp/
* expr.cc (cpp_classify_number): Diagnose wb literal suffixes
for -pedantic* before C2X or -Wc11-c2x-compat.
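
As a quick illustration (not part of the patch), the new diagnostics fire on
code like the following sketch:

/* -std=c11 -Wpedantic, on a target with _BitInt support: */
_BitInt(32) i;  /* pedwarn: ISO C does not support '_BitInt(32)' before C2X */

/* vector types with _BitInt elements are now rejected: */
typedef _BitInt(32) vb
  __attribute__ ((vector_size (16)));  /* error: invalid vector type */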

--- gcc/c/c-decl.cc.jj  2023-08-10 09:26:39.776509713 +0200
+++ gcc/c/c-decl.cc 2023-08-10 11:14:12.686238299 +0200
@@ -12933,8 +12933,15 @@ finish_declspecs (struct c_declspecs *sp
   if (specs->u.bitint_prec == -1)
specs->type = integer_type_node;
   else
-   specs->type = build_bitint_type (specs->u.bitint_prec,
-specs->unsigned_p);
+   {
+ pedwarn_c11 (specs->locations[cdw_typespec], OPT_Wpedantic,
+  "ISO C does not support %<%s_BitInt(%d)%> before C2X",
+  specs->unsigned_p ? "unsigned "
+  : specs->signed_p ? "signed " : "",
+  specs->u.bitint_prec);
+ specs->type = build_bitint_type (specs->u.bitint_prec,
+  specs->unsigned_p);
+   }
   break;
 default:
   gcc_unreachable ();
--- gcc/c/c-typeck.cc.jj2023-08-10 09:26:39.781509641 +0200
+++ gcc/c/c-typeck.cc   2023-08-10 10:03:00.722917789 +0200
@@ -819,6 +819,12 @@ c_common_type (tree t1, tree t2)
return t1;
   else if (code2 == COMPLEX_TYPE && TREE_TYPE (t2) == subtype)
return t2;
+  else if (TREE_CODE (subtype) == BITINT_TYPE)
+   {
+ sorry ("%<_Complex _BitInt(%d)%> unsupported",
+TYPE_PRECISION (subtype));
+ return code1 == COMPLEX_TYPE ? t1 : t2;
+   }
   else
return build_complex_type (subtype);
 }
--- gcc/c-family/c-attribs.cc.jj2023-06-03 15:32:04.311412926 +0200
+++ gcc/c-family/c-attribs.cc   2023-08-10 10:07:05.222377604 +0200
@@ -4366,7 +4366,8 @@ type_valid_for_vector_size (tree type, t
  && GET_MODE_CLASS (orig_mode) != MODE_INT
  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
   || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
-  || TREE_CODE (type) == BOOLEAN_TYPE)
+  || TREE_CODE (type) == BOOLEAN_TYPE
+  || TREE_CODE (type) == BITINT_TYPE)
 {
   if (error_p)
error ("invalid vector type for attribute %qE", atname);
--- gcc/testsuite/gcc.dg/bitint-19.c.jj 2023-08-10 09:33:49.205287806 +0200
+++ gcc/testsuite/gcc.dg/bitint-19.c2023-08-10 09:36:43.312765194 +0200
@@ -0,0 +1,16 @@
+/* PR c/102989 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=gnu2x" } */
+
+#define expr_has_type(e, t) _Generic (e, default : 0, t : 1)
+
+void
+foo (_Complex int ci, _Complex long long cl)
+{
+  _BitInt(__SIZEOF_INT__ * __CHAR_BIT__ - 1) bi = 0wb;
+  _BitInt(__SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 1) bl = 0wb;
+  static_assert (expr_has_type (ci + bi, _Complex int));
+  static_assert (expr_has_type (cl + bl, _Complex long long));
+  static_assert (expr_has_type (bi + ci, _Complex int));
+  static_assert (expr_has_type (bl + cl, _Complex long long));
+}
--- gcc/testsuite/gcc.dg/bitint-20.c.jj 2023-08-10 09:40:14.340707650 +0200
+++ gcc/testsuite/gcc.dg/bitint-20.c2023-08-10 10:04:35.306548279 +0200
@@ -0,0 +1,16 @@
+/* PR c/102989 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=gnu2x" } */
+
+void
+foo (_Complex int ci, _Complex long long cl)
+{
+  _BitInt(__SIZEOF_INT__ * __CHAR_BIT__ + 1) bi = 0wb;
+  ci + bi; /* { dg-message "unsupported" } */
+  bi + ci; /* { dg-message "unsupported" } */
+#if __BITINT_MAXWIDTH__ >= 575
+  _BitInt(575) bw = 0wb;
+  cl + bw; /* { dg-message "unsupported" "" { target bitint575 } } */
+  bw + cl;

Re: [PATCH v4] Implement new RTL optimizations pass: fold-mem-offsets.

2023-08-10 Thread Jeff Law via Gcc-patches

On 8/10/23 03:28, Manolis Tsamis wrote:

Hi Jeff,

Thanks a lot for providing all this information and testcase! I have
been able to reproduce the issue with it.

I have investigated the cause of the issue, and it's not what you mention;
the uses of all intermediate calculations are properly taken into account.
In this case it would be fine to alter the runtime value of insn 58, because
its uses are also memory accesses that can have a folded offset. The real
issue is that the offset for these two (insns 60/61) is not updated.

[ ... ]




This instruction doesn't match any of these, since the if with the CONST_INT
only accepts arg1 being a single REG (that could be extended, but that's not
the point now), and as a result we do `return 0;`.
But returning 0 at this point loses the offset 1 calculated from arg1
previously, which is stored in `offset`. And that's our bug :)

Changing that return 0 to return offset (i.e. return what we have up to now)
fixes this testcase, with the insn being folded and all offsets updated
properly. But it got me thinking that this is a more general issue that I
need to address differently.
There are more `return 0;` cases in the code which say "we know how to
handle this rtx code, but don't know how to propagate", and returning 0 is
not enough for those.
It's also not correct to propagate an offset from just one argument and punt
on the other, because the argument we punt on might contain references to
the other argument.
Funny, I'd looked at that as well (return 0 signaling two different 
things), but from the standpoint of the analysis phase it didn't matter 
as we don't use the returned value.  So I set it aside.




So the general solution that solves all the issues is: if we don't fully
understand how to handle an instruction and its arguments in fold_offsets,
then we need to mark it in one of the bitsets (either set it in cannot_fold
or don't set it in can_fold), whereas currently an insn whose uses are all
transitively foldable is considered foldable.
I'm still struggling a bit with using the transitive set as a global 
like we do.  I haven't come up with a case where it fails, but every 
time I look at it I wonder if it's going to go awry at some point.


Basically we'd be looking for a case where we have two MEMs which share 
some bit of address calculation, where one of the MEMs is foldable, but 
the other is not for some reason.


If we adjust the address calculations and the foldable MEM, then do we 
run the risk of needing to change the non-foldable MEM?


Jeff
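
A contrived C sketch of the shape in question (hypothetical, not from the
testcase): both accesses below derive from the same intermediate address
calculation, so if the pass folds the constant into one access and rewrites
the shared calculation, the sibling access's offset has to be adjusted
consistently as well.

void
shared_base (int *a, int *out)
{
  int *p = a + 2;   /* shared address calculation (like insn 58) */
  out[0] = p[0];    /* one use the pass can fold */
  out[1] = p[1];    /* sibling use whose offset must stay consistent */
}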

