[PATCH] vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]

2022-11-02 Thread Kewen.Lin via Gcc-patches
Hi,

As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
STORE} into normal vector load/store if the given length is known
to be equal to the length of the whole vector.  This should help
reduce overall cycles, since a vector access with length in bytes
normally has higher latency than a normal vector access, and it
also saves preparing the length when a constant length cannot
be encoded into the instruction (such as on Power).
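
A minimal sketch of the fold condition, assuming it mirrors the patch as posted: the access covers the whole vector when the constant length, adjusted by the bias, equals the vector size in bytes.  The function name and the plain-integer parameters below are hypothetical stand-ins for GCC trees, not GCC API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the check this patch adds for IFN LEN_{LOAD,STORE}.
// LEN_IS_CONSTANT is false when the length operand is not known at compile
// time; VECTOR_BYTES is the size of the whole vector in bytes.
bool
len_covers_whole_vector (bool len_is_constant, std::uint64_t len,
                         std::int64_t bias, std::uint64_t vector_bytes)
{
  assert (vector_bytes > 0);
  if (!len_is_constant)
    return false;               // non-constant length: nothing to fold
  // As in the posted patch, the biased length is LEN minus BIAS.
  std::int64_t biased = static_cast<std::int64_t> (len) - bias;
  return biased >= 0 && static_cast<std::uint64_t> (biased) == vector_bytes;
}
```

With a 16-byte vector and zero bias, only a constant length of 16 qualifies; anything shorter keeps the length-based access.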

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/107412

gcc/ChangeLog:

* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
mask_p indicating it's for mask or length, and add some handlings for
IFN LEN_{LOAD,STORE}.
(gimple_fold_mask_load): Rename to ...
(gimple_fold_partial_load): ... this, add one parameter mask_p.
(gimple_fold_mask_store): Rename to ...
(gimple_fold_partial_store): ... this, add one parameter mask_p.
(gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
and adjust calls on gimple_fold_mask_load_store_mem_ref to
gimple_fold_partial_load_store_mem_ref.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr107412.c: New test.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
folded LEN_LOAD.
---
 gcc/gimple-fold.cc| 57 ++-
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++
 3 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index a1704784bc9..e3a087defa6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
   return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
 }

-/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
+/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
for the memory it references, otherwise return null.  VECTYPE is the
-   type of the memory vector.  */
+   type of the memory vector.  MASK_P indicates it's for MASK if true,
+   otherwise it's for LEN.  */

 static tree
-gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
+gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
 {
   tree ptr = gimple_call_arg (call, 0);
   tree alias_align = gimple_call_arg (call, 1);
-  tree mask = gimple_call_arg (call, 2);
-  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
+  if (!tree_fits_uhwi_p (alias_align))
 return NULL_TREE;

+  if (mask_p)
+    {
+      tree mask = gimple_call_arg (call, 2);
+      if (!integer_all_onesp (mask))
+        return NULL_TREE;
+    } else {
+      tree basic_len = gimple_call_arg (call, 2);
+      if (!tree_fits_uhwi_p (basic_len))
+        return NULL_TREE;
+      unsigned int nargs = gimple_call_num_args (call);
+      tree bias = gimple_call_arg (call, nargs - 1);
+      gcc_assert (tree_fits_uhwi_p (bias));
+      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
+      unsigned int len = tree_to_uhwi (biased_len);
+      unsigned int vect_len
+        = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
+      if (vect_len != len)
+        return NULL_TREE;
+    }
+
   unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
   if (TYPE_ALIGN (vectype) != align)
 vectype = build_aligned_type (vectype, align);
@@ -5390,16 +5410,18 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
   return fold_build2 (MEM_REF, vectype, ptr, offset);
 }

-/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
 {
   tree lhs = gimple_call_lhs (call);
   if (!lhs)
 return false;

-  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
+  if (tree rhs
+  = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
 {
   gassign *new_stmt = gimple_build_assign (lhs, rhs);
   gimple_set_location (new_stmt, gimple_location (call));
@@ -5410,13 +5432,16 @@ gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
   return false;
 }

-/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_ma

[PATCH] testsuite: Fix gen-vect-34.c with vect_masked_load [PR106806]

2022-11-02 Thread Kewen.Lin via Gcc-patches
Hi,

This is to fix the failure on powerpc as reported in PR106806,
the test case requires tree ifcvt pass to perform on that loop,
and it relies on masked_load support.  The fix is to guard the
expected scan with vect_masked_load effective target.

As tested on powerpc64{,le}-linux-gnu and aarch64-linux-gnu
(cfarm machine), the failures are gone.  But on
x86_64-redhat-linux (cfarm machine) the result changes from
PASS to N/A.  I think that's expected, since that machine doesn't
support AVX by default, so both check_avx_available and
vect_masked_load fail.  It should work fine on machines with
default AVX support, or if we adjust the current
check_avx_available with current_compiler_flags.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/106806

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-34.c: Adjust with vect_masked_load
effective target.
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
index 41877e05efd..c2e5dfea35f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
@@ -13,4 +13,4 @@ float summul(int n, float *arg1, float *arg2)
 return res1;
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { avr-*-* pru-*-* riscv*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_masked_load } } } */
--
2.27.0


Re: [COMMITTED] Allow ranger queries on exit block.

2022-11-02 Thread Richard Biener via Gcc-patches
On Tue, Nov 1, 2022 at 2:20 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> Ranger was not allowing the exit block to be queried for range_on_entry
> or exit, for no good reason.  This removes that restriction.
>
> Interestingly, it seems that when we calculate dominance info, GCC does
> not set the dominators for the EXIT_BLOCK?  I worked around it by
> starting with a single pred of the exit block for my queries, but as a
> result it doesn't support multiple exit blocks.
>
> For the record:
>
>get_immediate_dominator (CDI_DOMINATORS, EXIT_BLOCK_PTR_FOR_FN (cfun))
>
> returns NULL.   Is this actually working as intended?  It was unexpected
> on my part.

Yes, working as "intended".  EXIT and ENTRY
basic-blocks are artificial so having dominance info for them would be
somewhat odd.

Richard.

>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.
>
> Andrew


Re: [PATCH] testsuite: Fix gen-vect-34.c with vect_masked_load [PR106806]

2022-11-02 Thread Richard Biener via Gcc-patches
On Wed, Nov 2, 2022 at 9:03 AM Kewen.Lin  wrote:
>
> Hi,
>
> This is to fix the failure on powerpc as reported in PR106806,
> the test case requires tree ifcvt pass to perform on that loop,
> and it relies on masked_load support.  The fix is to guard the
> expected scan with vect_masked_load effective target.
>
> As tested on powerpc64{,le}-linux-gnu and aarch64-linux-gnu
> (cfarm machine), the failures were gone.  But on
> x86_64-redhat-linux (cfarm machine) the result becomes from
> PASS to N/A.  I think it's expected since that machine doesn't
> support AVX by default so both check_avx_available and
> vect_masked_load fail, it should work fine on machines with
> default AVX support, or if we adjust the current
> check_avx_available with current_compiler_flags.
>
> Is it ok for trunk?

OK

> BR,
> Kewen
> -
> PR testsuite/106806
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/gen-vect-34.c: Adjust with vect_masked_load
> effective target.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> index 41877e05efd..c2e5dfea35f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> @@ -13,4 +13,4 @@ float summul(int n, float *arg1, float *arg2)
>  return res1;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { ! { avr-*-* pru-*-* riscv*-*-* } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> vect_masked_load } } } */
> --
> 2.27.0


Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2022-11-02 Thread Dimitrije Milosevic
Hi Jeff,

> This is exactly what I was trying to get to.   If the addressing mode
> isn't supported, then we shouldn't be picking it as a candidate.  If it
> is, then we've probably got a problem somewhere else in this code and
> this patch is likely papering over it.

I'll take a deeper look into the candidate selection algorithm then. Will
get back to you.

Regards,
Dimitrije


From: Jeff Law 
Sent: Tuesday, November 1, 2022 7:46 PM
To: Richard Biener; Dimitrije Milosevic
Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.


On 10/28/22 01:00, Richard Biener wrote:
> On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
>  wrote:
>> Hi Jeff,
>>
>>> THe part I don't understand is, if you only have BASE+OFF, why does
>>> preventing the calculation of more complex addressing modes matter?  ie,
>>> what's the point of computing the cost of something like base + off +
>>> scaled index when the target can't utilize it?
>> Well, the complexities of all addressing modes other than BASE + OFFSET are
>> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would 
>> still
>> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
>> than a candidate with BASE + INDEX, for example, as it has to compensate
>> the lack of other addressing modes somehow. If complexities for both of
>> those are equal to 0, in cases where complexities decide which candidate is
>> to be chosen, a more complex candidate may be picked.
> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?  So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?

This is exactly what I was trying to get to.   If the addressing mode
isn't supported, then we shouldn't be picking it as a candidate.  If it
is, then we've probably got a problem somewhere else in this code and
this patch is likely papering over it.


Jeff



[PATCH] builtins: Guard builtins.cc against HUGE_VAL and NAN definitions

2022-11-02 Thread Rainer Orth
trunk bootstrap recently broke on Solaris like this:

/vol/gcc/src/hg/master/local/gcc/builtins.cc:2104:8: error: pasting "CFN_BUILT_IN_" and "(" does not give a valid preprocessing token
 2104 |   case CFN_BUILT_IN_##MATHFN:   \
  |^
/vol/gcc/src/hg/master/local/gcc/builtins.cc:2112:3: note: in expansion of macro 'CASE_MATHFN'
 2112 |   CASE_MATHFN(MATHFN)\
  |   ^~~
/vol/gcc/src/hg/master/local/gcc/builtins.cc:1967:5: note: in expansion of macro 'CASE_MATHFN_FLOATN'
 1967 | CASE_MATHFN_FLOATN (HUGE_VAL)  \

and similarly for NAN.

It turns out this happens because  is included at some point,
which (in ) defines

#define HUGE_VAL (__builtin_huge_val())
#define NAN (__builtin_nanf(""))

While this only happens on Solaris right now, the same issue would be
present on other targets when  gets included somehow.

To avoid this, this patch #undef's both macros.
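
A minimal sketch of the mechanism, assuming a much simplified version of the CASE_MATHFN macros: a macro argument that is forwarded to another macro is fully expanded first, so an object-like HUGE_VAL or NAN macro gets replaced before the token paste happens.  The enum and macro names below are hypothetical and only model the structure; they are not the ones in builtins.cc.

```cpp
#include <cassert>
#include <cmath>    // may define HUGE_VAL and NAN as object-like macros
#include <cstring>

// Without these two lines, CASE_FN_ALL (HUGE_VAL) below may see its argument
// replaced by the library's definition before FN_ is pasted onto it.
#undef HUGE_VAL
#undef NAN

enum fn_code { FN_HUGE_VAL, FN_HUGE_VALF, FN_NAN, FN_NANF, FN_OTHER };

#define CASE_FN(NAME) case FN_##NAME:
// NAME is macro-expanded where it is forwarded to CASE_FN, because there it
// is not an operand of ##; only the second use is pasted unexpanded.
#define CASE_FN_ALL(NAME) CASE_FN (NAME) case FN_##NAME##F:

const char *
classify (fn_code code)
{
  switch (code)
    {
    CASE_FN_ALL (HUGE_VAL)
      return "huge_val";
    CASE_FN_ALL (NAN)
      return "nan";
    default:
      return "other";
    }
}
```

Placing the #undef lines right before the macro uses keeps the fix local, regardless of which header introduced the definitions.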

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2022-11-01  Rainer Orth  

gcc:
* builtins.cc (mathfn_built_in_2): #undef HUGE_VAL, NAN.

# HG changeset patch
# Parent  3e5ba66da20edf52cdfe371ea2244c91d770f64a
builtins: Guard builtins.cc against HUGE_VAL and NAN definitions

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -1931,6 +1931,11 @@ mathfn_built_in_2 (tree type, combined_f
   built_in_function fcodef64x = END_BUILTINS;
   built_in_function fcodef128x = END_BUILTINS;
 
+  /* If  has been included somehow, HUGE_VAL and NAN definitions
+ break the uses below.  */
+#undef HUGE_VAL
+#undef NAN
+
   switch (fn)
 {
 #define SEQ_OF_CASE_MATHFN			\


[PATCH] libstdc++: Add _Float128 to_chars/from_chars support for x86, ia64 and ppc64le with glibc

2022-11-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch adds std::{to,from}_chars support for std::float128_t
on glibc 2.26+ for {i?86,x86_64,ia64,powerpc64le}-linux.
When long double is already IEEE quad, previous changes already handle
it by using long double overloads in _Float128 overloads.
The powerpc64le case (with explicit or implicit -mabi=ibmlongdouble)
is handled by using the __float128/__ieee128 entrypoints which are
already in the library and used for -mabi=ieeelongdouble.
For i?86, x86_64 and ia64 this patch adds new library entrypoints,
mostly by enabling the code that was already there for powerpc64le-linux.
Those use __float128 or __ieee128, the patch uses _Float128 for the
exported overloads and internally as template parameter.  While
powerpc64le-linux uses __sprintfieee128 and __strtoieee128,
for _Float128 the patch uses the glibc 2.26 strfromf128 and strtof128
APIs.  So that one can build gcc against older glibc and then compile
user programs on newer glibc, the patch uses weak references unless
gcc is compiled against glibc 2.26+.  Unfortunately strfromf128 can't
handle the %.0Lf and %.*Le, %.*Lf, %.*Lg format strings that
sprintf/__sprintfieee128 use, so we need to remove the L from those and
replace the * by printing the precision directly into the format string
(i.e. it can handle %.0f and %.27f; the floating point type is implied
by the function name).
Unlike the std::{,b}float16_t support, this one actually exports APIs
with std::float128_t aka _Float128 in the mangled name, because no
standard format is a superset of it.  On the other hand, e.g. on i?86/x86_64
it doesn't have restrictions like those for _Float16/__bf16 on which ISAs
need to be enabled in order to use it.
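
A minimal sketch of the format string rewriting described above, assuming the precision and conversion character are already known.  The helper name is hypothetical and is not the code in floating_to_chars.cc.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: build a format such as "%.27f" from a precision and
// a conversion character ('e', 'f' or 'g').
std::string
make_strfrom_format (int precision, char conv)
{
  assert (precision >= 0);
  assert (conv == 'e' || conv == 'f' || conv == 'g');
  char buf[32];
  // No 'L' length modifier and no '*': the precision is printed directly
  // into the format string.
  std::snprintf (buf, sizeof buf, "%%.%d%c", precision, conv);
  return buf;
}
```

The resulting string can then be handed to a strfrom-style function together with the value.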

The denorm_min case in the testcase is temporarily commented out because
of the ERANGE subnormal issue Patrick posted a patch for.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-02  Jakub Jelinek  

* include/std/charconv (from_chars, to_chars): Add _Float128
overloads if _GLIBCXX_HAVE_FLOAT128_MATH is defined.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.31): Export
_ZSt8to_charsPcS_DF128_, _ZSt8to_charsPcS_DF128_St12chars_format,
_ZSt8to_charsPcS_DF128_St12chars_formati and
_ZSt10from_charsPKcS0_RDF128_St12chars_format.
* src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
Define if needed.
(__strtof128): Declare.
(from_chars_impl): Handle _Float128.
(from_chars): New _Float128 overload if USE_STRTOF128_FOR_FROM_CHARS
is defined.
* src/c++17/floating_to_chars.cc (__strfromf128): Declare.
(FLOAT128_TO_CHARS): Define even when _Float128 is supported and
wider than long double.
(F128_type): Use _Float128 for that case.
(floating_type_traits): Specialize for F128_type rather than
__float128.
(sprintf_ld): Add length argument.  Handle _Float128.
(__floating_to_chars_shortest, __floating_to_chars_precision):
Pass length to sprintf_ld.
(to_chars): Add _Float128 overloads for the F128_type being
_Float128 cases.
* testsuite/20_util/to_chars/float128_c++23.cc: New test.

--- libstdc++-v3/include/std/charconv.jj 2022-10-31 22:20:39.475072806 +0100
+++ libstdc++-v3/include/std/charconv   2022-11-01 16:48:50.693196228 +0100
@@ -736,6 +736,27 @@ namespace __detail
   __value = __val;
 return __res;
   }
+#elif defined(__STDCPP_FLOAT128_T__) && defined(_GLIBCXX_HAVE_FLOAT128_MATH)
+#ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
+  __extension__ from_chars_result
+  from_chars(const char* __first, const char* __last, __ieee128& __value,
+chars_format __fmt = chars_format::general) noexcept;
+
+  inline from_chars_result
+  from_chars(const char* __first, const char* __last, _Float128& __value,
+chars_format __fmt = chars_format::general) noexcept
+  {
+__extension__ __ieee128 __val;
+from_chars_result __res = from_chars(__first, __last, __val, __fmt);
+if (__res.ec == errc{})
+  __value = __val;
+return __res;
+  }
+#else
+  from_chars_result
+  from_chars(const char* __first, const char* __last, _Float128& __value,
+chars_format __fmt = chars_format::general) noexcept;
+#endif
 #endif
 
 #if defined(__STDCPP_BFLOAT16_T__) && defined(_GLIBCXX_FLOAT_IS_IEEE_BINARY32) \
@@ -851,6 +872,46 @@ namespace __detail
 return to_chars(__first, __last, static_cast(__value), __fmt,
__precision);
   }
+#elif defined(__STDCPP_FLOAT128_T__) && defined(_GLIBCXX_HAVE_FLOAT128_MATH)
+#ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
+  __extension__ to_chars_result
+  to_chars(char* __first, char* __last, __float128 __value) noexcept;
+  __extension__ to_chars_result
+  to_chars(char* __first, char* __last, __float128 __value,
+  chars_format __fmt) noexcept;
+  __extension__ to_chars_result
+  to_chars(char* __first, char* __last, __float128 __value,
+  chars_format _

Re: [PATCH] builtins: Guard builtins.cc against HUGE_VAL and NAN definitions

2022-11-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 02, 2022 at 10:13:44AM +0100, Rainer Orth wrote:
> trunk bootstrap recently broke on Solaris like this:
> 
> /vol/gcc/src/hg/master/local/gcc/builtins.cc:2104:8: error: pasting 
> "CFN_BUILT_IN_" and "(" does not give a valid preprocessing token
>  2104 |   case CFN_BUILT_IN_##MATHFN:   \
>   |^
> /vol/gcc/src/hg/master/local/gcc/builtins.cc:2112:3: note: in expansion of 
> macro 'CASE_MATHFN'
>  2112 |   CASE_MATHFN(MATHFN)\
>   |   ^~~
> /vol/gcc/src/hg/master/local/gcc/builtins.cc:1967:5: note: in expansion of 
> macro 'CASE_MATHFN_FLOATN'
>  1967 | CASE_MATHFN_FLOATN (HUGE_VAL)  \
> 
> and similarly for NAN.
> 
> It turns out this happens because  is included at some point,
> which (in ) defines
> 
> #define   HUGE_VAL(__builtin_huge_val())
> #define   NAN (__builtin_nanf(""))
> 
> While this only happpens on Solaris right now, the same issue would be
> present on other targets when  gets included somehow.
> 
> To avoid this, this patch #undef's both macros.
> 
> Bootstrapped without regressions on i386-pc-solaris2.11 and
> sparc-sun-solaris2.11.
> 
> Ok for trunk?
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2022-11-01  Rainer Orth  
> 
>   gcc:
>   * builtins.cc (mathfn_built_in_2): #undef HUGE_VAL, NAN.

LGTM, thanks.

Jakub



RE: [PATCH] ix86: Suggest unroll factor for loop vectorization

2022-11-02 Thread Cui, Lili via Gcc-patches
> > > +@item x86-vect-unroll-min-ldst-threshold
> > > +The vectorizer will check with target information to determine
> > > +whether unroll it. This parameter is used to limit the mininum of
> > > +loads and stores in the main loop.
> > >
> > > It's odd to "limit" the minimum number of something.  I think this
> > > warrants clarification that for some (unknow to me ;)) reason we
> > > think that when we have many loads and (or?) stores it is beneficial
> > > to unroll to get even more loads and stores in a single iteration.
> > > Btw, does the parameter limit the number of loads and stores _after_
> unrolling or before?
> > >
> > When the number of loads/stores exceeds the threshold, the loads/stores
> are more likely to conflict with loop itself in the L1 cache(Assuming that
> address of loads are scattered).
> > Unroll + software scheduling will make 2 or 4 address contiguous
> loads/stores closer together, it will reduce cache miss rate.
> 
> Ah, nice.  Can we express the default as a function of L1 data cache size, L1
> cache line size and more importantly, the size of the vector memory access?
> 
> Btw, I was looking into making a more meaningful cost modeling for loop
> distribution.  Similar reasoning might apply there - try to _reduce_ the
> number of memory streams so L1 cache utilization allows re-use of a cache
> line in the next [next N] iteration[s]?  OTOH given L1D is quite large I'd 
> expect
> the loops affected to be either quite huge or bottlenecked by load/store
> bandwith (there are 1024 L1D cache lines in zen2 for
> example) - what's the effective L1D load you are keying off?.
> Btw, how does L1D allocation on stores play a role here?
> 
Hi Richard,
To answer your question, I rechecked 549 and found that its improvement
comes from load reduction.  It has a 3-level loop nest, and 8 scalar loads in
the inner loop are loop invariants (due to high register pressure, these loop
invariants are all spilled to the stack).
After unrolling the inner loop, those scalar loads are not doubled, so
unrolling reduces the number of load instructions and L1/L2/L3 accesses.  The
inner loop accesses 8 different three-dimensional arrays, each sized like
"a[128][480][128]".  Although these three-dimensional arrays are very large,
this does not support the theory I described before; sorry for that.  I need to
hold this patch to see if we can do something about this scenario.

Thanks,
Lili.




[PATCH] libstdc++: _Bfloat16 for

2022-11-02 Thread Jakub Jelinek via Gcc-patches
Hi!

Jon pointed out that we have TODO: _Bfloat16 in .
Right now _S_fp_fmt() returns _Binary16 for _Float16, __fp16 as well
as __bf16 and it actually works because we don't have a special handling
of _Binary16.  So we could just document that, but I'm a little bit
afraid that HPPA or MIPS might start supporting _Float16 and/or __bf16.
If they do, we have the
#if defined __hppa__ || (defined __mips__ && !defined __mips_nan2008)
  // IEEE 754-1985 allowed the meaning of the quiet/signaling
  // bit to be reversed. Flip that to give desired ordering.
  if (__builtin_isnan(__x) && __builtin_isnan(__y))
    {
      using _Int = decltype(__ix);

      constexpr int __nantype = __fmt == _Binary32  ?  22
                              : __fmt == _Binary64  ?  51
                              : __fmt == _Binary128 ? 111
                              : -1;
      constexpr _Int __bit = _Int(1) << __nantype;
      __ix ^= __bit;
      __iy ^= __bit;
    }
#endif
code, the only one where we actually care whether something is
_Binary{32,64,128} (elsewhere we just care about the x86 and m68k 80bits
or double double or just floating point type's sizeof) and we'd need
to handle there _Binary16 and/or _Bfloat16.
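
For reference, a small sketch of where the bit positions in that snippet come from: the quiet/signaling bit is the most significant stored significand bit, so its index is the number of stored significand bits minus one.  The helper name is hypothetical, and the binary16 and bfloat16 values are derived from the formats' layouts rather than taken from this patch.

```cpp
#include <cassert>

// Hypothetical helper: index of the quiet/signaling NaN bit for a binary
// format that stores STORED_SIGNIFICAND_BITS fraction bits.
constexpr int
quiet_nan_bit (int stored_significand_bits)
{
  return stored_significand_bits - 1;
}

// The three values used in the snippet above.
static_assert (quiet_nan_bit (23) == 22, "binary32");
static_assert (quiet_nan_bit (52) == 51, "binary64");
static_assert (quiet_nan_bit (112) == 111, "binary128");
// Not in the original snippet: these follow from the 10 and 7 stored
// significand bits of binary16 and bfloat16 respectively.
static_assert (quiet_nan_bit (10) == 9, "binary16");
static_assert (quiet_nan_bit (7) == 6, "bfloat16");
```

Because the two 16-bit formats differ here, a separate enumerator would let that code distinguish them if it ever needs to.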

So this patch uses a different enumerator for it even though it isn't needed
right now; after all, _Binary16 isn't needed either and we could just use
_Binary32...

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-02  Jakub Jelinek  

* libsupc++/compare (_Strong_order::_Fp_fmt): Add _Bfloat16.
(_Strong_order::_Bfloat16): New static data member.
(_Strong_order::_S_fp_fmt): Return _Bfloat16 for std::bfloat16_t.

--- libstdc++-v3/libsupc++/compare.jj   2022-05-09 09:09:21.196461093 +0200
+++ libstdc++-v3/libsupc++/compare  2022-11-01 22:13:16.771219615 +0100
@@ -672,7 +672,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
_X86_80bit,  // x86 80-bit extended precision
_M68k_80bit, // m68k 80-bit extended precision
_Dbldbl, // IBM 128-bit double-double
-   // TODO: _Bfloat16,
+   _Bfloat16,   // std::bfloat16_t
   };
 
 #ifndef __cpp_using_enum
@@ -684,6 +684,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   static constexpr _Fp_fmt _X86_80bit = _Fp_fmt::_X86_80bit;
   static constexpr _Fp_fmt _M68k_80bit = _Fp_fmt::_M68k_80bit;
   static constexpr _Fp_fmt _Dbldbl = _Fp_fmt::_Dbldbl;
+  static constexpr _Fp_fmt _Bfloat16 = _Fp_fmt::_Bfloat16;
 #endif
 
   // Identify the format used by a floating-point type.
@@ -714,6 +715,10 @@ namespace std _GLIBCXX_VISIBILITY(defaul
  if constexpr (__is_same(_Tp, __float80))
return _X86_80bit;
 #endif
+#ifdef __STDCPP_BFLOAT16_T__
+ if constexpr (__is_same(_Tp, decltype(0.0bf16)))
+   return _Bfloat16;
+#endif
 
  constexpr int __width = sizeof(_Tp) * __CHAR_BIT__;
 

Jakub



RE: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-11-02 Thread Tamar Christina via Gcc-patches
Hi Aldy,

I'm trying to use Ranger to determine if a range of an expression is a single
bit.

If possible, in the case of a mask, I'd also like the position of the bit that's
being checked by the mask (or the mask itself).

Do you have any pointers/existing code I can look at to do this?

Kind regards,
Tamar

> -----Original Message-----
> From: Jeff Law 
> Sent: Tuesday, November 1, 2022 5:00 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de
> Subject: Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support
> for bit-test-and-branch operations
> 
> 
> On 11/1/22 09:53, Tamar Christina wrote:
> >>
> >>>from the machine description.
> >>>
> >>> +@cindex @code{tbranch@var{mode}4} instruction pattern @item
> >>> +@samp{tbranch@var{mode}4} Conditional branch instruction
> combined
> >>> +with a bit test-and-compare instruction. Operand 0 is a comparison
> >>> +operator.  Operand 1 is the operand of the comparison. Operand 2 is
> >>> +the bit position of Operand 1 to test.
> >>> +Operand 3 is the @code{code_label} to jump to.
> >> Should we refine/document the set of comparison operators allowed?
> >> Is operand 1 an arbitrary RTL expression or more limited?  I'm
> >> guessing its relatively arbitrary given how you've massaged the
> >> existing branch-on-bit patterns from the aarch backend.
> > It can be any expression in theory. However in practical terms we
> > usually force the values to registers before calling the expansion.
> > My assumption is that this is for CSE purposes but that's only a guess.
> 
> Understood.  And generally yes, forcing expressions into regs is good for CSE.
> 
> 
> >
> >> Do we have enough information lying around from Ranger to avoid the
> need
> >> to walk the def-use chain to discover that we're masking off all but one
> bit?
> >>
> > That's an interesting thought.  I'll try to see if I can figure out how to 
> > query
> > Ranger here.  It would be nice to do so here.
> 
> Reach out to Aldy, I suspect he can probably give you the necessary
> pseudocode pretty quickly.
> 
> 
> Jeff
> 



Re: [PATCH] config/rs6000/t-float128: Don't encode full build paths into headers

2022-11-02 Thread Richard Purdie via Gcc-patches
On Wed, 2022-08-17 at 13:10 +0100, Richard Purdie via Gcc-patches
wrote:
> Avoid encoding full build paths into headers, just use the basename of the 
> file.
> This aids build reproducibility where the build paths vary and source is saved
> for debugging purposes.
> 
> libgcc/ChangeLog:
> 
> * config/rs6000/t-float128: Don't encode full build paths into headers
> 

I think this patch is at risk of being lost. It is a simple change
which aids reproducibility so I'm hoping someone might be able to help
with review/merging?

Thanks!

Richard


> Signed-off-by: Richard Purdie 
> ---
>  libgcc/config/rs6000/t-float128 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
> index b09b5664af0..513e63748f1 100644
> --- a/libgcc/config/rs6000/t-float128
> +++ b/libgcc/config/rs6000/t-float128
> @@ -103,7 +103,7 @@ $(ibm128_dec_objs): INTERNAL_CFLAGS += 
> $(IBM128_CFLAGS_DECIMAL)
>  $(fp128_softfp_src) : $(srcdir)/soft-fp/$(subst -sw,,$(subst kf,tf,$@)) 
> $(fp128_dep)
>   @src="$(srcdir)/soft-fp/$(subst -sw,,$(subst kf,tf,$@))"; \
>   echo "Create $@"; \
> - (echo "/* file created from $$src */"; \
> + (echo "/* file created from `basename $$src` */"; \
>echo; \
>sed -f $(fp128_sed) < $$src) > $@
>  





[PATCH v2] libcpp: Avoid remapping filenames within directives

2022-11-02 Thread Richard Purdie via Gcc-patches
Code such as:

 #include __FILE__

can interact poorly with the *-prefix-map options when cross compiling. In
general you want to remap filenames for use in the target context, but the
local paths should be used to find include files at compile time. Ignoring
filename remapping for directives avoids such failures.

Fix this to improve such usage and then document this against file-prefix-map
(referenced by the other *-prefix-map options) to make the behaviour clear
and defined.
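
A minimal sketch of the intended behaviour, assuming a single old=new prefix map entry and plain string prefix matching.  The struct and function names are hypothetical models, not the libcpp callback interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of one old=new entry given via a *-prefix-map option.
struct prefix_map
{
  std::string old_prefix, new_prefix;
};

// Replace OLD_PREFIX at the start of NAME with NEW_PREFIX, if it matches.
std::string
remap_filename (const prefix_map &map, const std::string &name)
{
  if (name.compare (0, map.old_prefix.size (), map.old_prefix) == 0)
    return map.new_prefix + name.substr (map.old_prefix.size ());
  return name;
}

// Text that __FILE__ would expand to: remapped in ordinary code, but left
// as the local path while a directive such as #include is being processed.
std::string
file_macro_text (const prefix_map &map, const std::string &name,
                 bool in_directive)
{
  return in_directive ? name : remap_filename (map, name);
}
```

This way `#include __FILE__` still resolves against the build tree, while strings embedded in the output carry the remapped path.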

libcpp/ChangeLog:

* macro.cc (_cpp_builtin_macro_text): Don't remap filenames within directives.

gcc/ChangeLog:

* doc/invoke.texi: Document prefix-maps don't affect directives

Signed-off-by: Richard Purdie 
---
 gcc/doc/invoke.texi | 3 ++-
 libcpp/macro.cc | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c6323a53ad2..9d5dd3e20b7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2213,7 +2213,8 @@ any references to them in the result of the compilation as if the
 files resided in directory @file{@var{new}} instead.  Specifying this
 option is equivalent to specifying all the individual
 @option{-f*-prefix-map} options.  This can be used to make reproducible
-builds that are location independent.  See also
+builds that are location independent.  Directories referenced by
+directives are not affected by these options. See also
 @option{-fmacro-prefix-map}, @option{-fdebug-prefix-map} and
 @option{-fprofile-prefix-map}.
 
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index 8ebf360c03c..7d5a0d0fd2e 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -563,7 +563,7 @@ _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node,
if (!name)
  abort ();
  }
-   if (pfile->cb.remap_filename)
+   if (pfile->cb.remap_filename && !pfile->state.in_directive)
  name = pfile->cb.remap_filename (name);
len = strlen (name);
buf = _cpp_unaligned_alloc (pfile, len * 2 + 3);
-- 
2.34.1



Re: [PATCH 2/2] libcpp: Avoid remapping filenames within directives

2022-11-02 Thread Richard Purdie via Gcc-patches
On Tue, 2022-11-01 at 13:32 -0600, Jeff Law wrote:
> On 8/17/22 06:15, Richard Purdie via Gcc-patches wrote:
> > Code such as:
> #include __FILE__
> > 
> > can interact poorly with file-prefix-map options when cross compiling. In
> > general you're after to remap filenames for use in target context but the
> > local paths should be used to find include files at compile time. Ingoring
> > filename remapping for directives is one way to avoid such failures.
> > 
> > libcpp/ChangeLog:
> > 
> >  * macro.cc (_cpp_builtin_macro_text): Don't remap filenames within 
> > directives
> 
> So I went back and reviewed the old PR which introduced this code.  It 
> was actually the Yocto project that got this code in to begin with :-)

Thanks for the review!

That sounds right, we use it heavily and originally had a few issues in
this area. It now generally works really well, we just found this
corner case :)
  
> There wasn't really any discussion AFAICT about whether or not to remap 
> in directives that I saw in the PR.

I don't think we'd realised there was this corner case. Now we have
found code doing it, I think the behaviour we should have is fairly
clear which is why we're sending the patch.

> ISTM that given the change in behavior, we should probably document that 
> we don't remap in directives.  Probably doc/invoke.texi.
> 
> With suitable documentation, this should be fine.  It seems like it 
> ought to be independent of the first patch in this series which adds 
> support for remapping relative paths.

Thanks for merging 1/2, it was independent, just related as a path
mapping issue we found. I'll keep the coding style in mind in future.

I've sent a new version of this patch which updates doc/invoke.texi,
just adding to file-prefix-map as the others all reference it.

Cheers,

Richard
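
As an illustration of the behaviour discussed above, here is a minimal standalone sketch, not the libcpp code itself. The names remap_prefix and expand_file_macro are hypothetical; they only model the idea that __FILE__ is remapped for output but left alone while a directive such as #include is being processed, so that the local path can still be found on disk.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a remap_filename callback: replace a leading
// OLD_PREFIX with NEW_PREFIX, roughly what -ffile-prefix-map=OLD=NEW asks for.
std::string
remap_prefix (const std::string &name, const std::string &old_prefix,
              const std::string &new_prefix)
{
  if (name.compare (0, old_prefix.size (), old_prefix) == 0)
    return new_prefix + name.substr (old_prefix.size ());
  return name;
}

// Hypothetical model of expanding __FILE__: remap only when we are not
// inside a directive, mirroring the in_directive check in the patch.
std::string
expand_file_macro (const std::string &name, bool in_directive,
                   const std::string &old_prefix,
                   const std::string &new_prefix)
{
  if (!in_directive)
    return remap_prefix (name, old_prefix, new_prefix);
  return name;
}
```

With this model, a use of __FILE__ in ordinary code yields the remapped path, while "#include __FILE__" still sees the build-host path.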






Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-11-02 Thread Aldy Hernandez via Gcc-patches
On Wed, Nov 2, 2022 at 10:55 AM Tamar Christina  wrote:
>
> Hi Aldy,
>
> I'm trying to use Ranger to determine if a range of an expression is a single 
> bit.
>
> If possible in case of a mask then also the position of the bit that's being 
> checked by the mask (or the mask itself).

Just instantiate a ranger, and ask for the range of an SSA name (or an
arbitrary tree expression) at a particular gimple statement (or an
edge):

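gimple_ranger ranger;
int_range_max r;
// EXPR and STMT are placeholders: the SSA name (or tree expression)
// being queried and the gimple statement at which to query it.
if (ranger.range_of_expr (r, expr, stmt)) {
  // do stuff with range "r"
  if (r.singleton_p ()) {
    wide_int num = r.lower_bound ();
    // Check the bits in NUM, etc...
  }
}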

You can see the full ranger API in gimple-range.h.

Note that instantiating a new ranger is relatively lightweight, but
it's not free.  So unless you're calling range_of_expr sporadically,
you probably want to have one instance for your pass.  You can pass
the gimple_ranger around your pass.  Another way of doing this
is calling enable_ranger () at pass start, and then doing:

  get_range_query (cfun)->range_of_expr (r, expr, stmt);

gimple-loop-versioning.cc has an example of using enable_ranger /
disable_ranger.

I am assuming you are interested in ranges for integers / pointers.
Otherwise (floats, etc) you'd have to use "Value_Range" instead of
int_range_max.  I can give you examples on that if necessary.

Let me know if that helps.
Aldy
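
For the "check the bits in NUM" step, a minimal sketch on plain integers may help. This is an assumption-laden illustration: it uses std::uint64_t rather than GCC's wide_int, and the helper names single_bit_p and lowest_set_bit are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Return true if MASK has exactly one bit set.  Clearing the lowest set
// bit with MASK & (MASK - 1) leaves zero only for a power of two.
bool
single_bit_p (std::uint64_t mask)
{
  return mask != 0 && (mask & (mask - 1)) == 0;
}

// Return the position of the lowest set bit in MASK, or -1 if MASK is 0.
int
lowest_set_bit (std::uint64_t mask)
{
  if (mask == 0)
    return -1;
  int pos = 0;
  while ((mask & 1) == 0)
    {
      mask >>= 1;
      ++pos;
    }
  return pos;
}
```

If the mask turns out to be a single bit, lowest_set_bit gives the bit position that a bit-test-and-branch pattern would need as its operand.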

>
> Do you have any pointers/existing code I can look at to do this?
>
> Kind regards,
> Tamar
>
> > -Original Message-
> > From: Jeff Law 
> > Sent: Tuesday, November 1, 2022 5:00 PM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de
> > Subject: Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support
> > for bit-test-and-branch operations
> >
> >
> > On 11/1/22 09:53, Tamar Christina wrote:
> > >>
> > >>>from the machine description.
> > >>>
> > >>> +@cindex @code{tbranch@var{mode}4} instruction pattern @item
> > >>> +@samp{tbranch@var{mode}4} Conditional branch instruction
> > combined
> > >>> +with a bit test-and-compare instruction. Operand 0 is a comparison
> > >>> +operator.  Operand 1 is the operand of the comparison. Operand 2 is
> > >>> +the bit position of Operand 1 to test.
> > >>> +Operand 3 is the @code{code_label} to jump to.
> > >> Should we refine/document the set of comparison operators allowed?
> > >> Is operand 1 an arbitrary RTL expression or more limited?  I'm
> > >> guessing its relatively arbitrary given how you've massaged the
> > >> existing branch-on-bit patterns from the aarch backend.
> > > It can be any expression in theory. However in practical terms we
> > > usually force the values to registers before calling the expansion.
> > > My assumption is that this is for CSE purposes but that's only a guess.
> >
> > Understood.  And generally yes, forcing expressions into regs is good for 
> > CSE.
> >
> >
> > >
> > >> Do we have enough information lying around from Ranger to avoid the
> > need
> > >> to walk the def-use chain to discover that we're masking off all but one
> > bit?
> > >>
> > > That's an interesting thought.  I'll try to see if I can figure out how 
> > > to query
> > > Ranger here.  It would be nice to do so here.
> >
> > Reach out to Aldy, I suspect he can probably give you the necessary
> > pseudocode pretty quickly.
> >
> >
> > Jeff
> >
>



Re: [PATCH] libstdc++: _Bfloat16 for <compare>

2022-11-02 Thread Jonathan Wakely via Gcc-patches
On Wed, 2 Nov 2022 at 09:39, Jakub Jelinek  wrote:
>
> Hi!
>
> Jon pointed out that we have TODO: _Bfloat16 in <compare>.
> Right now _S_fp_fmt() returns _Binary16 for _Float16, __fp16 as well
> as __bf16 and it actually works because we don't have a special handling
> of _Binary16.  So, we could just document that, but I'm a little bit
> afraid that HPPA or MIPS might start supporting _Float16 and/or __bf16.
> If they do, we have the
> #if defined __hppa__ || (defined __mips__ && !defined __mips_nan2008)
>   // IEEE 754-1985 allowed the meaning of the quiet/signaling
>   // bit to be reversed. Flip that to give desired ordering.
>   if (__builtin_isnan(__x) && __builtin_isnan(__y))
> {
>   using _Int = decltype(__ix);
>
>   constexpr int __nantype = __fmt == _Binary32  ?  22
>   : __fmt == _Binary64  ?  51
>   : __fmt == _Binary128 ? 111
>   : -1;
>   constexpr _Int __bit = _Int(1) << __nantype;
>   __ix ^= __bit;
>   __iy ^= __bit;
> }
> #endif
> code, the only one where we actually care whether something is
> _Binary{32,64,128} (elsewhere we just care about the x86 and m68k 80bits
> or double double or just floating point type's sizeof) and we'd need
> to handle there _Binary16 and/or _Bfloat16.
>
> So this patch uses different enum for it even when it isn't needed right
> now, after all _Binary16 isn't needed either and we could just use
> _Binary32...
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks for taking care of it.


>
> 2022-11-02  Jakub Jelinek  
>
> * libsupc++/compare (_Strong_order::_Fp_fmt): Add _Bfloat16.
> (_Strong_order::_Bfloat16): New static data member.
> (_Strong_order::_S_fp_fmt): Return _Bfloat16 for std::bfloat16_t.
>
> --- libstdc++-v3/libsupc++/compare.jj   2022-05-09 09:09:21.196461093 +0200
> +++ libstdc++-v3/libsupc++/compare  2022-11-01 22:13:16.771219615 +0100
> @@ -672,7 +672,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
> _X86_80bit,  // x86 80-bit extended precision
> _M68k_80bit, // m68k 80-bit extended precision
> _Dbldbl, // IBM 128-bit double-double
> -   // TODO: _Bfloat16,
> +   _Bfloat16,   // std::bfloat16_t
>};
>
>  #ifndef __cpp_using_enum
> @@ -684,6 +684,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>static constexpr _Fp_fmt _X86_80bit = _Fp_fmt::_X86_80bit;
>static constexpr _Fp_fmt _M68k_80bit = _Fp_fmt::_M68k_80bit;
>static constexpr _Fp_fmt _Dbldbl = _Fp_fmt::_Dbldbl;
> +  static constexpr _Fp_fmt _Bfloat16 = _Fp_fmt::_Bfloat16;
>  #endif
>
>// Identify the format used by a floating-point type.
> @@ -714,6 +715,10 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>   if constexpr (__is_same(_Tp, __float80))
> return _X86_80bit;
>  #endif
> +#ifdef __STDCPP_BFLOAT16_T__
> + if constexpr (__is_same(_Tp, decltype(0.0bf16)))
> +   return _Bfloat16;
> +#endif
>
>   constexpr int __width = sizeof(_Tp) * __CHAR_BIT__;
>
>
> Jakub
>
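
The constants 22, 51 and 111 in the quoted HPPA/MIPS block are the index of the most significant stored significand bit, which is the quiet/signaling bit of a NaN. A small sketch of that relationship follows; quiet_nan_bit is a hypothetical helper, and the 16-bit values are derived from the format layouts (10 stored bits for binary16, 7 for bfloat16) rather than taken from the patch.

```cpp
#include <cassert>

// Hypothetical helper: index of the quiet/signaling NaN bit, given the
// number of explicitly stored significand bits of an IEEE-style format.
constexpr int
quiet_nan_bit (int stored_significand_bits)
{
  return stored_significand_bits - 1;
}

// These match the constants in the quoted libsupc++/compare code.
static_assert (quiet_nan_bit (23) == 22, "binary32");
static_assert (quiet_nan_bit (52) == 51, "binary64");
static_assert (quiet_nan_bit (112) == 111, "binary128");
```

By the same rule, a _Binary16 entry would be bit 9 and a _Bfloat16 entry bit 6, should such a target ever need them.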



Re: [PATCH] libstdc++: Fix ERANGE behavior for fallback FP std::from_chars

2022-11-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 1 Nov 2022 at 21:30, Patrick Palka via Libstdc++
 wrote:
>
> The fallback implementation of floating-point std::from_chars for e.g.
> float80 just calls the C library's strtod family of functions.  In case
> of overflow of the parsed result, the behavior of these functions is
> rigidly specified:
>
>   If the correct value overflows and default rounding is in effect, plus
>   or minus HUGE_VAL, HUGE_VALF, or HUGE_VALL is returned (according to
>   the return type and sign of the value), and the value of the macro
>   ERANGE is stored in errno.
>
> But in case of underflow, implementations are given more leeway:
>
>   If the result underflows the functions return a value whose magnitude
>   is no greater than the smallest normalized positive number in the
>   return type; whether errno acquires the value ERANGE is
>   implementation-defined.
>
> Thus we can (and do) portably detect overflow, but we can't portably
> detect underflow.  However, glibc (and presumably other high-quality C
> library implementations) will reliably set errno to ERANGE in case of
> underflow too, and it will also return the nearest denormal number to
> the parsed result (including zero in case of true underflow).
>
> Since we can't be perfect here, this patch takes the best effort
> approach of assuming a high quality C library implementation that
> allows us to distinguish between a denormal parsed result and true
> underflow by inspecting the return value.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Dunno

OK for trunk.

> if we should backport this too.  No test because we can't portably
> test this IIUC.

I think it's worth backporting to 11 and 12 because this is a C++17
feature and that's our default mode since GCC 11.
But give it some soak time on trunk first please.


>
> libstdc++-v3/ChangeLog:
>
> * src/c++17/floating_from_chars.cc (from_chars_impl): In the
> ERANGE case, also check for a 0 return value before returning
> result_out_of_range, otherwise assume it's a denormal
> number.
> ---
>  libstdc++-v3/src/c++17/floating_from_chars.cc | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
> b/libstdc++-v3/src/c++17/floating_from_chars.cc
> index a25ac5ce3aa..939c751f861 100644
> --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> @@ -637,8 +637,13 @@ namespace
>   {
> if (__builtin_isinf(tmpval)) // overflow
>   ec = errc::result_out_of_range;
> -   else // underflow (LWG 3081 wants to set value = tmpval here)
> +   else if (tmpval == 0) // underflow (LWG 3081 wants to set value = 
> tmpval here)
>   ec = errc::result_out_of_range;
> +   else // denormal value
> + {
> +   value = tmpval;
> +   ec = errc();
> + }
>   }
> else if (n)
>   {
> --
> 2.38.1.381.gc03801e19c
>
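
The decision logic in the patch can be sketched outside libstdc++ as follows. This is only an illustration under the same assumption the patch makes (a C library that sets ERANGE on underflow and returns the nearest denormal or zero). The names parse_status, classify and parse_double are invented for this example, and the sketch ignores the "no conversion performed" case.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>
#include <limits>

enum class parse_status { ok, out_of_range };

// Classify a strtod result.  With ERANGE set: infinity means overflow,
// zero means true underflow, anything else is taken to be a denormal
// value that should be returned to the caller.
parse_status
classify (double tmpval, int saved_errno)
{
  if (saved_errno != ERANGE)
    return parse_status::ok;
  if (std::isinf (tmpval))
    return parse_status::out_of_range;   // overflow
  if (tmpval == 0)
    return parse_status::out_of_range;   // true underflow
  return parse_status::ok;               // denormal value
}

// Parse STR with strtod and store the result in VALUE only on success.
parse_status
parse_double (const char *str, double &value)
{
  errno = 0;
  char *end = nullptr;
  double tmpval = std::strtod (str, &end);
  parse_status st = classify (tmpval, errno);
  if (st == parse_status::ok)
    value = tmpval;
  return st;
}
```

Keeping classify separate from the strtod call makes the three ERANGE outcomes easy to check without depending on a particular C library.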



[og12] OpenACC: Fix reduction tree-sharing issue [PR106982] (was: [gcc/devel/omp/gcc-12] Merge branch 'releases/gcc-12' into devel/omp/gcc-12)

2022-11-02 Thread Thomas Schwinge
Hi Tobias!

On 2022-09-29T14:45:03+, Tobias Burnus via Gcc-cvs  
wrote:
> https://gcc.gnu.org/g:c455181c13a7b00ee09777287bcf0c8b9de9d1fe
>
> commit c455181c13a7b00ee09777287bcf0c8b9de9d1fe
> Merge: d21bfef9867 85adc2ec2b0
> Author: Tobias Burnus 
> Date:   Thu Sep 29 16:37:52 2022 +0200
>
> Merge branch 'releases/gcc-12' into devel/omp/gcc-12
>
> Merged up to r12-8794-g85adc2ec2b0736d07c0df35ad9a450f97ff59a7c (29th 
> Sept 2022)
>
> This includes r12-8793-gafea1ae84f0 (cherry-picked from 
> r13-2868-gd3df98807b5)
> "OpenACC: Fix reduction tree-sharing issue [PR106982]".  However, due to
> omp-low.cc changes, it neither applies cleanly nor is it required to make the
> testcases pass. This merge adds the testcases - but due to conflicts 
> under a
> different filename: gcc/testsuite/c-c++-common/goacc/reduction-7.c added 
> as
> ...-9.c and ...-8.c added as ...-10.c.

Hmm, it seems that something needs to be done in og12 'gcc/omp-low.cc',
too -- I do confirm:

+PASS: c-c++-common/goacc/reduction-9.c (test for excess errors)

..., but:

+FAIL: c-c++-common/goacc/reduction-10.c (internal compiler error: 
verify_gimple failed)
+FAIL: c-c++-common/goacc/reduction-10.c (test for excess errors)

[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c: In 
function 'test1':
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c:10:9: 
error: incorrect sharing of tree nodes
MEM  [(double *)&reduced]
MEM  [(double *)&reduced] = .GOACC_REDUCTION (INIT, 0, MEM 
 [(double *)&reduced], -1, 73, 0);
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c:10:9: 
error: incorrect sharing of tree nodes
MEM  [(double *)&reduced]
#pragma acc loop reduction(*:MEM  [(double *)&reduced]) worker 
private(y)
for (y = 0; y < 5; y = y + 1)
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c:10:9: 
error: incorrect sharing of tree nodes
MEM  [(double *)&reduced]
MEM  [(double *)&reduced] = .GOACC_REDUCTION (FINI, 0, MEM 
 [(double *)&reduced], -1, 73, 0);
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c:10:9: 
error: incorrect sharing of tree nodes
MEM  [(double *)&reduced]
MEM  [(double *)&reduced] = .GOACC_REDUCTION (TEARDOWN, 0, MEM 
 [(double *)&reduced], -1, 73, 0);
during GIMPLE pass: cfg
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/reduction-10.c:10:9: 
internal compiler error: verify_gimple failed

(Same for C++ testing.)


Grüße
 Thomas


> Diff:
>
>  gcc/ChangeLog  | 24 +
>  gcc/DATESTAMP  |  2 +-
>  gcc/config/aarch64/aarch64-cores.def   |  3 +-
>  gcc/config/aarch64/aarch64-tune.md |  2 +-
>  gcc/config/aarch64/aarch64.cc  | 40 
> +++---
>  gcc/doc/invoke.texi|  2 +-
>  gcc/omp-low.cc |  3 +-
>  gcc/testsuite/c-c++-common/goacc/reduction-10.c| 12 +++
>  gcc/testsuite/c-c++-common/goacc/reduction-9.c | 22 
>  libstdc++-v3/doc/html/index.html   |  2 +-
>  libstdc++-v3/doc/html/manual/api.html  |  5 +++
>  libstdc++-v3/doc/html/manual/appendix.html |  2 +-
>  libstdc++-v3/doc/html/manual/appendix_porting.html |  2 +-
>  libstdc++-v3/doc/html/manual/bugs.html |  6 
>  libstdc++-v3/doc/html/manual/index.html|  2 +-
>  libstdc++-v3/doc/html/manual/using_macros.html |  5 +--
>  libstdc++-v3/doc/xml/manual/evolution.xml  | 13 +++
>  libstdc++-v3/doc/xml/manual/intro.xml  |  9 +
>  libstdc++-v3/doc/xml/manual/using.xml  |  5 +--
>  libstdc++-v3/include/std/functional| 32 -
>  libstdc++-v3/testsuite/20_util/bind/cv_quals.cc| 25 +++---
>  libstdc++-v3/testsuite/20_util/bind/cv_quals_2.cc  | 12 ---
>  22 files changed, 172 insertions(+), 58 deletions(-)
>
> diff --cc gcc/testsuite/c-c++-common/goacc/reduction-10.c
> index 000,000..2c3ed499d5b
> new file mode 100644
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
> @@@ -1,0 -1,0 +1,12 @@@
> ++/* { dg-do compile } */
> ++
> ++/* PR middle-end/106982 */
> ++
> ++void test1(double *c)
> ++{
> ++double reduced[5];
> ++#pragma acc parallel loop gang private(reduced)
> ++for (int x = 0; x < 5; ++x)
> ++#pragma acc loop worker reduction(*:reduced)
> ++  for (int y = 0; y < 5; ++y) { }
> ++}
> diff --cc gcc/testsuite/c-c++-common/goacc/reduction-9.c
> index 000,000..482b0ab1984
> new file mode 100644
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-9.c
> @@@ -1,0 -1,0 +1,22 @@@
> ++/* { dg-do compile } */
> ++
> ++/* PR middle-end/106982 */
> ++
> ++long long n = 100;
> ++int multiplicitive_n = 128;
> ++
> ++void test1(double *rand, double *a, double *b,

Re: [PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)

2022-11-02 Thread Jakub Jelinek via Gcc-patches
Hi!

Thanks for working on this!

On Tue, Nov 01, 2022 at 09:50:38PM +, Julian Brown wrote:
> > I think we should figure out when we should temporarily disable
> >   parser->omp_array_section_p = false;
> > and restore it afterwards to a saved value.  E.g.
> > cp_parser_lambda_expression seems like a good candidate, the fact that
> > OpenMP array sections are allowed say in map clause doesn't mean they
> > are allowed inside of lambdas and it would be especially hard when
> > the lambda is defining a separate function and the search for
> > OMP_ARRAY_SECTION probably wouldn't be able to discover those.
> > Other spots to consider might be statement expressions, perhaps type
> > definitions etc.
> 
> I've had a go at doing this -- several expression types now forbid
> array-section syntax (see new "bad-array-section-*" tests added). I'm
> afraid my C++ isn't quite up to figuring out how it's possible to
> define a type inside an expression (inside a map clause) if we forbid
> lambdas and statement expressions though -- can you give an example?

But we can't forbid lambdas inside of the map clause expressions,
they are certainly valid in OpenMP, and IMNSHO shouldn't disallow statement
expressions, people might not even know they use a statement expression,
they could just use some standard macro which uses a statement expression
under the hood.  Though your testcases look good.
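
The "temporarily disable and restore to a saved value" pattern suggested in the earlier review can be sketched with a small RAII guard. This is a standalone illustration, not the parser code: parser_state, array_section_disabler and parse_nested are hypothetical names, and only the omp_array_section_p flag comes from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant bit of parser state.
struct parser_state
{
  bool omp_array_section_p = false;
};

// Clear the flag on construction and restore the saved value on
// destruction, so every exit path from the nested parse restores it.
class array_section_disabler
{
  bool &flag_;
  bool saved_;

public:
  explicit array_section_disabler (bool &flag)
    : flag_ (flag), saved_ (flag)
  {
    flag_ = false;
  }
  ~array_section_disabler () { flag_ = saved_; }
  array_section_disabler (const array_section_disabler &) = delete;
  array_section_disabler &operator= (const array_section_disabler &) = delete;
};

// Model parsing a nested construct (say, a lambda body) and report
// whether array sections were permitted while inside it.
bool
parse_nested (parser_state &parser)
{
  array_section_disabler guard (parser.omp_array_section_p);
  return parser.omp_array_section_p;
}
```

The guard keeps the lambda itself valid inside a map clause while rejecting array-section syntax within its body.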

> > This shouldn't be done just for OMP_CLAUSE_MAP, but for all the
> > other clauses that accept array sections, including
> > OMP_CLAUSE_DEPEND, OMP_CLAUSE_AFFINITY, OMP_CLAUSE_MAP, OMP_CLAUSE_TO,
> > OMP_CLAUSE_FROM, OMP_CLAUSE_INCLUSIVE, OMP_CLAUSE_EXCLUSIVE,
> > OMP_CLAUSE_USE_DEVICE_ADDR, OMP_CLAUSE_HAS_DEVICE_ADDR,
> > OMP_CLAUSE_*REDUCTION.
> 
> I'm not too sure about all of those -- Tobias points out that
> "INCLUSIVE", "EXCLUSIVE", *DEVICE* and *REDUCTION* take "variable list"
> item types, not "locator list", though sometimes with an array section
> being permitted (in OpenMP 5.2+).

That is true.  For the clauses that don't use locator lists but variable
lists but accept array sections there are strict restrictions on what one
can use, basically one can only have varname or varname[...] or
varname[...][...] etc. where ... is the normal array element or array
section syntax.  So, we probably should continue to parse them as now,
but we can use OMP_ARRAY_SECTION to hold what we've parsed or even share
code with parsing array sections and the [...] on those clauses.

> Tested (alongside next patch) with offloading to NVPTX -- with my
> previously-posted "address tokenization" patch also applied.

> 2022-11-01  Julian Brown  
> 
> gcc/c-family/
> * c-omp.cc (c_omp_address_inspector::map_supported_p): Handle
>   OMP_ARRAY_SECTION.
> 
> gcc/cp/
>   * constexpr.cc (potential_constant_expression_1): Handle
>   OMP_ARRAY_SECTION.
> * error.cc (dump_expr): Handle OMP_ARRAY_SECTION.
> * parser.cc (cp_parser_new): Initialize parser->omp_array_section_p.
>   (cp_parser_statement_expr): Disallow array sections.
> (cp_parser_postfix_open_square_expression): Support OMP_ARRAY_SECTION
> parsing.
>   (cp_parser_parenthesized_expression_list, cp_parser_lambda_expression,
>   cp_parser_braced_list): Disallow array sections.
> (cp_parser_omp_var_list_no_open): Remove ALLOW_DEREF parameter, add
> MAP_LVALUE in its place.  Support generalised lvalue parsing for
>   OpenMP map, to and from clauses.
> (cp_parser_omp_var_list): Remove ALLOW_DEREF parameter, add 
> MAP_LVALUE.
> Pass to cp_parser_omp_var_list_no_open.
> (cp_parser_oacc_data_clause, cp_parser_omp_all_clauses): Update calls
> to cp_parser_omp_var_list.
>   (cp_parser_omp_clause_map): Add sk_omp scope around
>   cp_parser_omp_var_list_no_open call.
> * parser.h (cp_parser): Add omp_array_section_p field.
> * semantics.cc (handle_omp_array_sections_1): Handle more types of map
> expression.
> (handle_omp_array_section): Handle non-DECL_P attachment points.
> (finish_omp_clauses): Check for supported types of expression.
> 
> gcc/
> * tree-pretty-print.cc (dump_generic_node): Support OMP_ARRAY_SECTION.
> * tree.def (OMP_ARRAY_SECTION): New tree code.
> 
> gcc/testsuite/
> * c-c++-common/gomp/map-6.c: Update expected output.
>   * g++.dg/gomp/bad-array-section-1.C: New test.
>   * g++.dg/gomp/bad-array-section-2.C: New test.
>   * g++.dg/gomp/bad-array-section-3.C: New test.
>   * g++.dg/gomp/bad-array-section-4.C: New test.
>   * g++.dg/gomp/bad-array-section-5.C: New test.
>   * g++.dg/gomp/bad-array-section-6.C: New test.
>   * g++.dg/gomp/bad-array-section-7.C: New test.
>   * g++.dg/gomp/bad-array-section-8.C: New test.
>   * g++.dg/gomp/bad-array-section-9.C: New test.
>   * g++.dg/gomp/has_device_addr-non-lvalue-1.C: New test.
> * g++.dg/gomp/pr

Re: Adding a new thread model to GCC

2022-11-02 Thread i.nixman--- via Gcc-patches



hi Eric, Jonathan,

I was able to successfully build gcc-trunk using the provided patch.
moreover, I was able to successfully build all of the packages used in 
the toolchain!
(gmp, mpfr, mpc, isl, libgnurx, bzip2, termcap, libffi, expat, ncurses, 
readline, gdbm, tcl, tk, openssl, xz-utils, sqlite, python3, binutils, 
gdb, make)


at first glance everything seems to be working as before!
I posted the information about this and the link to the archive on the 
project page: https://github.com/niXman/mingw-builds/issues/622




best!


Re: optabs: Variable index vec_set

2022-11-02 Thread Robin Dapp via Gcc-patches
Hi,

> With the patch my local changes to make better use of vec_set work
> nicely even though I haven't done a full bootstrap yet.  Were there
> other issues with the patch or can it still be applied?

I performed a bootstrap as well as a regtest with -march=z16 on s390.
There is no new fallout.

Regards
 Robin


Re: [PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)

2022-11-02 Thread Julian Brown
On Wed, 2 Nov 2022 12:58:37 +0100
Jakub Jelinek via Fortran  wrote:

> On Tue, Nov 01, 2022 at 09:50:38PM +, Julian Brown wrote:
> > > I think we should figure out when we should temporarily disable
> > >   parser->omp_array_section_p = false;
> > > and restore it afterwards to a saved value.  E.g.
> > > cp_parser_lambda_expression seems like a good candidate, the fact
> > > that OpenMP array sections are allowed say in map clause doesn't
> > > mean they are allowed inside of lambdas and it would be
> > > especially hard when the lambda is defining a separate function
> > > and the search for OMP_ARRAY_SECTION probably wouldn't be able to
> > > discover those. Other spots to consider might be statement
> > > expressions, perhaps type definitions etc.  
> > 
> > I've had a go at doing this -- several expression types now forbid
> > array-section syntax (see new "bad-array-section-*" tests added).
> > I'm afraid my C++ isn't quite up to figuring out how it's possible
> > to define a type inside an expression (inside a map clause) if we
> > forbid lambdas and statement expressions though -- can you give an
> > example?  
> 
> But we can't forbid lambdas inside of the map clause expressions,
> they are certainly valid in OpenMP, and IMNSHO shouldn't disallow
> statement expressions, people might not even know they use a
> statement expression, they could just use some standard macro which
> uses a statement expression under the hood.  Though your testcases
> look good.

I meant "forbid array sections within lambdas and statement
expressions" -- FAOD, does that seem reasonable? Technically it might
not be that hard to support e.g. a statement expression with an array
section on the final expression, but that doesn't work at the moment.
Maybe a follow-on patch could support that if we want it?

I'll take a look at addressing your other review comments, thanks!

Cheers,

Julian


Re: [PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)

2022-11-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 02, 2022 at 12:20:11PM +, Julian Brown wrote:
> > But we can't forbid lambdas inside of the map clause expressions,
> > they are certainly valid in OpenMP, and IMNSHO shouldn't disallow
> > statement expressions, people might not even know they use a
> > statement expression, they could just use some standard macro which
> > uses a statement expression under the hood.  Though your testcases
> > look good.
> 
> I meant "forbid array sections within lambdas and statement
> expressions" -- FAOD, does that seem reasonable? Technically it might

Yeah, my response was to the wording you wrote above the patch, not what
is inside of the patch which looked ok.

> not be that hard to support e.g. a statement expression with an array
> section on the final expression, but that doesn't work at the moment.

And I think we want to keep it that way.

Jakub



Re: [PATCH] gcc: honour -ffile-prefix-map in ASM_MAP [PR93371]

2022-11-02 Thread Rasmus Villemoes
On 01/11/2022 21.11, Jeff Law wrote:
> 
> On 8/29/22 03:29, Rasmus Villemoes wrote:
>> -ffile-prefix-map is supposed to be a superset of -fmacro-prefix-map
>> and -fdebug-prefix-map. However, when building .S or .s files, gas is
>> not called with the appropriate --debug-prefix-map option when
>> -ffile-prefix-map is used.
>>
>> While the user can specify -fdebug-prefix-map when building assembly
>> files via gcc, it's more ergonomic to also support -ffile-prefix-map;
>> especially since for .S files that could contain the __FILE__ macro,
>> one would then also have to specify -fmacro-prefix-map.
>>
>> gcc:
>> PR driver/93371
>> * gcc.cc (ASM_MAP): Honour -ffile-prefix-map.
> 
> OK.  Sorry for the long delay.

Thanks, and no problem.

However, when I try to push the new master branch I get

$ git push origin master
fatal: remote error: service not enabled: /git/gcc.git

I do gcc patches sufficiently rarely that I may have forgotten the right
procedure, but this is what I think I've done previously (along with
running a "git gcc-verify HEAD" to ensure there's a proper changelog
fragment to extract, with gcc-verify being a suitable alias).

Have I simply lost my commit bit?

Rasmus



Re: optabs: Variable index vec_set

2022-11-02 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 2, 2022 at 1:12 PM Robin Dapp  wrote:
>
> Hi,
>
> > With the patch my local changes to make better use of vec_set work
> > nicely even though I haven't done a full bootstrap yet.  Were there
> > other issues with the patch or can it still be applied?
>
> I performed a bootstrap as well as a regtest with -march=z16 on s390.
> There is no new fallout.

IIRC, I was trying to "fix" modeless operand by giving it a mode, but
since it made no difference for x86, I later dropped the patch.
However, operand with a known mode is preferred, so if it works for
you, just include my patch in your submission. My patch is somewhat
trivial if we want the operand to have a known mode.

Uros.


Re: optabs: Variable index vec_set

2022-11-02 Thread Robin Dapp via Gcc-patches
> IIRC, I was trying to "fix" modeless operand by giving it a mode, but
> since it made no difference for x86, I later dropped the patch.
> However, operand with a known mode is preferred, so if it works for
> you, just include my patch in your submission. My patch is somewhat
> trivial if we want the operand to have a known mode.

I'd prefer to push it separately as my patch changes several things in
the s390 backend that are kind of unrelated.  Is it OK to do an x86
bootstrap and regtest and push it if everything looks good?  You can of
course also do it yourself :)

Thanks
 Robin


Re: optabs: Variable index vec_set

2022-11-02 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 2, 2022 at 1:45 PM Robin Dapp  wrote:
>
> > IIRC, I was trying to "fix" modeless operand by giving it a mode, but
> > since it made no difference for x86, I later dropped the patch.
> > However, operand with a known mode is preferred, so if it works for
> > you, just include my patch in your submission. My patch is somewhat
> > trivial if we want the operand to have a known mode.
>
> I'd prefer to push it separately as my patch changes several things in
> the s390 backend that are kind of unrelated.  Is it OK to do an x86
> bootstrap and regtest and push it if everything looks good?  You can of
> course also do it yourself :)

It is a middle-end patch, someone will have to approve it.

Uros.


[RFC] RISC-V: Add profile supports.

2022-11-02 Thread jiawei
Support RISC-V profiles [1] in the -march option, and add minimal support
for the new extension names.

The profile is expected to come first in the input, before any other
extensions.

Tested with -march=RV[I/M/A]2[0/2][U/M/S][64/32]+otherextensions.

[1]https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc


jiawei (2):
  RISC-V: Add minimal supports for new extension in profile.
  RISC-V: Add profile supports.

 gcc/common/config/riscv/riscv-common.cc | 115 ++--
 gcc/config/riscv/riscv-opts.h   |  15 
 gcc/config/riscv/riscv-subset.h |   5 +-
 3 files changed, 129 insertions(+), 6 deletions(-)

-- 
2.25.1



[RFC] RISC-V: Minimal supports for new extensions in profile.

2022-11-02 Thread jiawei
This patch only adds support for the extension names contained in the
profiles.  The extension versions are set to 0.1.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extensions.
* config/riscv/riscv-opts.h (MASK_ZICCAMOA): New mask.
(MASK_ZICCIF): Ditto.
(MASK_ZICCLSM): Ditto.
(MASK_ZICCRSE): Ditto.
(MASK_ZICNTR): Ditto.
(MASK_ZIHINTPAUSE): Ditto.
(MASK_ZIHPM): Ditto.
(TARGET_ZICCAMOA): New target.
(TARGET_ZICCIF): Ditto.
(TARGET_ZICCLSM): Ditto.
(TARGET_ZICCRSE): Ditto.
(TARGET_ZICNTR): Ditto.
(TARGET_ZIHINTPAUSE): Ditto.
(TARGET_ZIHPM): Ditto.
(MASK_SVPBMT): New mask.

---
 gcc/common/config/riscv/riscv-common.cc | 20 
 gcc/config/riscv/riscv-opts.h   | 15 +++
 2 files changed, 35 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index d6404a01205..602491c638d 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -163,6 +163,15 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+  {"ziccamoa", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"ziccif", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"zicclsm", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"ziccrse", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"zicntr", ISA_SPEC_CLASS_NONE, 0, 1},
+
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"zihpm", ISA_SPEC_CLASS_NONE, 0, 1},
+
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -219,6 +228,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svpbmt", ISA_SPEC_CLASS_NONE, 0, 1},
 
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
@@ -1179,6 +1189,14 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
+  {"ziccamoa", &gcc_options::x_riscv_zi_subext, MASK_ZICCAMOA},
+  {"ziccif", &gcc_options::x_riscv_zi_subext, MASK_ZICCIF},
+  {"zicclsm", &gcc_options::x_riscv_zi_subext, MASK_ZICCLSM},
+  {"ziccrse", &gcc_options::x_riscv_zi_subext, MASK_ZICCRSE},
+  {"zicntr", &gcc_options::x_riscv_zi_subext, MASK_ZICNTR},
+
+  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
+  {"zihpm", &gcc_options::x_riscv_zi_subext, MASK_ZIHPM},
 
   {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
   {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
@@ -1230,6 +1248,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zvl1024b",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL1024B},
   {"zvl2048b",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL2048B},
   {"zvl4096b",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL4096B},
+
   {"zvl8192b",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL8192B},
   {"zvl16384b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL16384B},
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
@@ -1242,6 +1261,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", &gcc_options::x_riscv_sv_subext, MASK_SVNAPOT},
+  {"svpbmt", &gcc_options::x_riscv_sv_subext, MASK_SVPBMT},
 
   {NULL, NULL, 0}
 };
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1dfe8c89209..906b6280188 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -69,9 +69,23 @@ enum stack_protector_guard {
 
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
+#define MASK_ZICCAMOA (1 << 2)
+#define MASK_ZICCIF   (1 << 3)
+#define MASK_ZICCLSM  (1 << 4)
+#define MASK_ZICCRSE  (1 << 5)
+#define MASK_ZICNTR   (1 << 6)
+#define MASK_ZIHINTPAUSE (1 << 7)
+#define MASK_ZIHPM(1 << 8)
 
 #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
+#define TARGET_ZICCAMOA ((riscv_zi_subext & MASK_ZICCAMOA) != 0)
+#define TARGET_ZICCIF   ((riscv_zi_subext & MASK_ZICCIF) != 0)
+#define TARGET_ZICCLSM  ((riscv_zi_subext & MASK_ZICCLSM) != 0)
+#define TARGET_ZICCRSE  ((riscv_zi_subext & MASK_ZICCRSE) != 0)
+#define TARGET_ZICNTR   ((riscv_zi_subext & MASK_ZICNTR) != 0)
+#define TARGET_ZIHINTPAUSE ((riscv_zi_subext & MASK_ZIHINTPAUSE) != 0)
+#define TARGET_ZIHPM    ((riscv_zi_subext & MASK_ZIHPM) != 0)
 
 #define MASK_ZBA  (1 << 0)
 #define MASK_ZBB  (1 << 1)
@@ -174,6 +188,7 @@ enum stack_protector_guard {
 
 #define MASK_SVINVAL (1 << 0)
 #define MASK_SVNAPOT (1 << 1)
+#define MASK_SVPBMT   (1 << 2)
 
 #define TARGET_SVINVAL ((riscv_sv_subext & MASK_SVINVAL) != 0)
 #define TARGET_SVNAPOT ((riscv_sv_subext & MASK_SVNAPOT) != 0)
-- 
2

[RFC] RISC-V: Add profile supports.

2022-11-02 Thread jiawei
Add two new functions to handle profile input. "parse_profile"
checks whether the input to -march is a legal profile name; if it
is, "handle_profile" examines the profile's type [I/M/A], year
[20/22] and mode [U/S/M] and adds the corresponding set of
extensions. Only the mandatory extensions are handled for now.
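
To make the naming scheme described above concrete, here is a minimal standalone sketch (not the code from this patch) that splits a name such as "RVA22U64" into its four fields. The struct ProfileName and the function parse_profile_name are hypothetical names used only for illustration, and, as a simplifying assumption, every field is required here.

```cpp
#include <optional>
#include <string_view>

// Hypothetical plain-data view of a profile name such as "RVA22U64".
struct ProfileName {
  char type;  // 'I', 'M' or 'A'
  int year;   // 20 or 22
  char mode;  // 'U', 'S' or 'M'
  int xlen;   // 32 or 64
};

// Parse a complete profile name; return std::nullopt if any field is
// malformed.  Assumption of this sketch: all four fields are mandatory.
inline std::optional<ProfileName> parse_profile_name(std::string_view s) {
  if (s.size() != 8 || s.substr(0, 2) != "RV")
    return std::nullopt;

  ProfileName p{};

  p.type = s[2];
  if (p.type != 'I' && p.type != 'M' && p.type != 'A')
    return std::nullopt;

  const std::string_view year = s.substr(3, 2);
  if (year == "20")
    p.year = 20;
  else if (year == "22")
    p.year = 22;
  else
    return std::nullopt;

  p.mode = s[5];
  if (p.mode != 'U' && p.mode != 'S' && p.mode != 'M')
    return std::nullopt;

  const std::string_view xlen = s.substr(6, 2);
  if (xlen == "32")
    p.xlen = 32;
  else if (xlen == "64")
    p.xlen = 64;
  else
    return std::nullopt;

  return p;
}
```

Returning an empty optional for any malformed field keeps the "is this a legal profile?" question separate from the later step of expanding the profile into extensions.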

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_profile): Check if profile name is valid or
not.
(riscv_subset_list::parse_std_ext): If input of -march option is
a profile, skip first ISA check.
(riscv_subset_list::parse): Handle profile input in -march.
(riscv_subset_list::handle_profile): Expand different profiles
into extensions.
* config/riscv/riscv-subset.h: New function prototypes.


---
 gcc/common/config/riscv/riscv-common.cc | 95 +++--
 gcc/config/riscv/riscv-subset.h |  5 +-
 2 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 602491c638d..da06bd89144 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -777,6 +777,35 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   return p;
 }
 
+/* Parsing function for profile.
+
+   Return Value:
+ Points to the end of profile.
+
+   Arguments:
+ `p`: Current parsing position.  */
+
+const char *
+riscv_subset_list::parse_profile (const char *p)
+{
+  if(*p == 'I' || *p == 'M' || *p == 'A'){
+p++;
+if(startswith (p, "20") || startswith (p, "22"))
+  p += 2;
+if (*p == 'U' || *p == 'S' || *p == 'M')
+  p++;
+if(startswith (p, "64") || startswith (p, "32")){
+   p += 2;
+   riscv_subset_list::handle_profile(p-6, p-4, p-3);
+   return p;
+}
+  }
+  else
+error_at (m_loc, "%<-march=%s%>: Invalid profile.", m_arch);
+  return NULL;
+}
+
+
 /* Parsing function for standard extensions.
 
Return Value:
@@ -786,7 +815,7 @@ riscv_subset_list::parsing_subset_version (const char *ext,
  `p`: Current parsing position.  */
 
 const char *
-riscv_subset_list::parse_std_ext (const char *p)
+riscv_subset_list::parse_std_ext (const char *p, bool isprofile)
 {
   const char *all_std_exts = riscv_supported_std_ext ();
   const char *std_exts = all_std_exts;
@@ -795,8 +824,8 @@ riscv_subset_list::parse_std_ext (const char *p)
   unsigned minor_version = 0;
   char std_ext = '\0';
   bool explicit_version_p = false;
-
-  /* First letter must start with i, e or g.  */
+  if (!isprofile){
+/* First letter must start with i, e or g.  */
   switch (*p)
 {
 case 'i':
@@ -850,6 +879,7 @@ riscv_subset_list::parse_std_ext (const char *p)
"% or %", m_arch);
   return NULL;
 }
+}
 
   while (p != NULL && *p)
 {
@@ -1093,6 +1123,7 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   riscv_subset_list *subset_list = new riscv_subset_list (arch, loc);
   riscv_subset_t *itr;
   const char *p = arch;
+  bool isprofile = false;
   if (startswith (p, "rv32"))
 {
   subset_list->m_xlen = 32;
@@ -1103,15 +1134,26 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   subset_list->m_xlen = 64;
   p += 4;
 }
+  else if (startswith (p, "RV"))
+{
+  if (startswith (p+6, "64"))
+   subset_list->m_xlen = 64;
+  else
+   subset_list->m_xlen = 32;
+  p += 2;
+  /* Parsing profile name.  */
+  p = subset_list->parse_profile (p);
+  isprofile = true;
+}
   else
 {
-  error_at (loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64",
+  error_at (loc, "%<-march=%s%>: ISA string must begin with rv32 , rv64 or a profile",
arch);
   goto fail;
 }
 
   /* Parsing standard extension.  */
-  p = subset_list->parse_std_ext (p);
+  p = subset_list->parse_std_ext (p,isprofile);
 
   if (p == NULL)
 goto fail;
@@ -1349,6 +1391,49 @@ riscv_handle_option (struct gcc_options *opts,
 }
 }
 
+/* Expand profile with defined mandatory extensions,
+   M-type/mode is empty and set as base right now.  */
+void riscv_subset_list::handle_profile(const char *profile_type,
+   const char *profile_year,
+   const char *profile_mode)
+{
+  add ("i", false);
+  if(*profile_type == 'A'){
+add ("m", false);
+add ("a", false);
+add ("f", false);
+add ("d", false);
+add ("c", false);
+add ("ziccamoa", false);
+add ("ziccif", false);
+add ("zicclsm", false);
+add ("ziccrse", false);
+add ("zicntr", false);
+add ("zicsr", false);
+
+if(*profile_mode == 'S')
+  add ("zifencei", false);
+  
+if(*profile_year == '2')
+{
+  add ("zihintpause", false);
+  add ("zihpm", false);
+  add ("zba", false);
+  add ("zbb", false);
+  add ("zbs", false);
+  add ("zicbom",

[committed] libstdc++: Ignore -Wignored-qualifiers warning in

2022-11-02 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The warning is wrong here, the qualifier serves a purpose and is not
ignored (cf. PR c++/107492).

libstdc++-v3/ChangeLog:

* include/std/variant (__variant::_Multi_array::__untag_result):
Use pragma to suppress warning.
---
 libstdc++-v3/include/std/variant | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c234b54421e..ba8492b6985 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -831,10 +831,13 @@ namespace __variant
: false_type
{ using element_type = _Tp; };
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wignored-qualifiers"
   template 
struct __untag_result
: false_type
{ using element_type = void(*)(_Args...); };
+#pragma GCC diagnostic pop
 
   template 
struct __untag_result<__variant_cookie(*)(_Args...)>
-- 
2.38.1



[committed] libstdc++: Remove unnecessary variant member in std::expected

2022-11-02 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Hui Xie pointed out that we don't need a dummy member in the union,
because all constructors always initialize either _M_val or _M_unex.

We still need the _M_void member of the expected<void, E>
specialization, because the constructor has to initialize something when
not using the _M_unex member.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::_M_invalid): Remove.
---
 libstdc++-v3/include/std/expected | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/expected 
b/libstdc++-v3/include/std/expected
index 3ee13aa95f6..e491ce41591 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -359,7 +359,7 @@ namespace __expected
   requires is_copy_constructible_v<_Tp> && is_copy_constructible_v<_Er>
   && (!is_trivially_copy_constructible_v<_Tp>
  || !is_trivially_copy_constructible_v<_Er>)
-  : _M_invalid(), _M_has_value(__x._M_has_value)
+  : _M_has_value(__x._M_has_value)
   {
if (_M_has_value)
  std::construct_at(__builtin_addressof(_M_val), __x._M_val);
@@ -376,7 +376,7 @@ namespace __expected
   requires is_move_constructible_v<_Tp> && is_move_constructible_v<_Er>
   && (!is_trivially_move_constructible_v<_Tp>
  || !is_trivially_move_constructible_v<_Er>)
-  : _M_invalid(), _M_has_value(__x._M_has_value)
+  : _M_has_value(__x._M_has_value)
   {
if (_M_has_value)
  std::construct_at(__builtin_addressof(_M_val),
@@ -394,7 +394,7 @@ namespace __expected
expected(const expected<_Up, _Gr>& __x)
noexcept(__and_v,
 is_nothrow_constructible<_Er, const _Gr&>>)
-   : _M_invalid(), _M_has_value(__x._M_has_value)
+   : _M_has_value(__x._M_has_value)
{
  if (_M_has_value)
std::construct_at(__builtin_addressof(_M_val), __x._M_val);
@@ -410,7 +410,7 @@ namespace __expected
expected(expected<_Up, _Gr>&& __x)
noexcept(__and_v,
 is_nothrow_constructible<_Er, _Gr>>)
-   : _M_invalid(), _M_has_value(__x._M_has_value)
+   : _M_has_value(__x._M_has_value)
{
  if (_M_has_value)
std::construct_at(__builtin_addressof(_M_val),
@@ -890,7 +890,6 @@ namespace __expected
   }
 
   union {
-   struct { } _M_invalid;
_Tp _M_val;
_Er _M_unex;
   };
-- 
2.38.1



[PATCH] genmultilib: Add sanity check

2022-11-02 Thread Christophe Lyon via Gcc-patches
When a list of dirnames is provided to genmultilib, its length is
expected to match the number of options.  If this is not the case, the
build fails later for reasons not obviously related to this mistake.
This patch adds a sanity check to help diagnose such cases.

Tested by adding an option to t-aarch64 and no corresponding dirname,
with both bash and dash.

OK for trunk?

gcc/ChangeLog:

* genmultilib: Add sanity check.
---
 gcc/genmultilib | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/genmultilib b/gcc/genmultilib
index 1e387fb1589..ef121e77d17 100644
--- a/gcc/genmultilib
+++ b/gcc/genmultilib
@@ -141,6 +141,20 @@ multiarch=$9
 multilib_reuse=${10}
 enable_multilib=${11}
 
+# Sanity check: make sure we have as many dirnames as options
+if [ -n "${dirnames}" ]; then
+options_arr=($options)
+dirnames_arr=($dirnames)
+nboptions=${#options_arr[@]}
+nbdirnames=${#dirnames_arr[@]}
+if [ $nbdirnames -ne $nboptions ]; then
+   echo 1>&2 "Error calling $0: Number of dirnames ($nbdirnames) does not match number of options ($nboptions)"
+   echo 1>&2 "options: ${options}"
+   echo 1>&2 "dirnames: ${dirnames}"
+   exit 1
+fi
+fi
+
 echo "static const char *const multilib_raw[] = {"
 
 mkdir tmpmultilib.$$ || exit 1
-- 
2.34.1



Re: [PATCH] Rewrite NAN and sign handling in frange

2022-11-02 Thread Aldy Hernandez via Gcc-patches



On 9/27/22 15:00, Mikael Morin wrote:

Hello,

On 16/09/2022 at 15:26, Aldy Hernandez via Gcc-patches wrote:

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index d759fcf178c..55a216efd8b 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -617,21 +602,24 @@ frange::contains_p (tree cst) const
   if (varying_p ())
 return true;


  (...)


   if (real_compare (GE_EXPR, rv, &m_min) && real_compare (LE_EXPR, 
rv, &m_max))

 {
+  // Make sure the signs are equal for signed zeros.
   if (HONOR_SIGNED_ZEROS (m_type) && real_iszero (rv))
-    {
-  // FIXME: This is still using get_signbit() instead of
-  // known_signbit() because the latter bails on possible NANs
-  // (for now).
-  if (get_signbit ().yes_p ())
-    return real_isneg (rv);
-  else if (get_signbit ().no_p ())
-    return !real_isneg (rv);
-  else
-    return true;
-    }
+    return m_min.sign == m_max.sign && m_min.sign == rv->sign;
   return true;
 }
   return false;


It seems that this won't report any range with mismatching bound signs 
as containing zero.

Maybe a selftest explains it better: the following fails.


My apologies for only seeing this now.  You did not CC me in the 
response, and it got lost amongst my other list mail.


You are absolutely right.

The attached patch fixes this problem.  It has been tested on x86-64 
Linux and pushed.


Thanks for pointing this out.
Aldy



diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 9ca442478c9..8fc909171bc 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -3780,6 +3780,14 @@ range_tests_signed_zeros ()
    ASSERT_TRUE (r0.contains_p (neg_zero));
    ASSERT_FALSE (r0.contains_p (zero));

+  r0 = frange_float ("-3", "5");
+  ASSERT_TRUE (r0.contains_p (neg_zero));
+  ASSERT_TRUE (r0.contains_p (zero));
+
+  r0 = frange (neg_zero, zero);
+  ASSERT_TRUE (r0.contains_p (neg_zero));
+  ASSERT_TRUE (r0.contains_p (zero));
+
    // The intersection of zeros that differ in sign is a NAN (or
    // undefined if not honoring NANs).
    r0 = frange (neg_zero, neg_zero);
From da2d128a87b2f3359f6f38f29624387094875a60 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 2 Nov 2022 12:39:45 +0100
Subject: [PATCH] Fix bug in frange::contains_p() for signed zeros.

The contains_p() code wasn't returning true for non-singleton ranges
containing signed zeros.  With this patch we now handle:

	-0.0 exists in [-3, +5.0]
	+0.0 exists in [-3, +5.0]

gcc/ChangeLog:

	* value-range.cc (frange::contains_p): Fix signed zero handling.
	(range_tests_signed_zeros): New test.
---
 gcc/value-range.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 3743ec714b3..a855aaf626c 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -661,7 +661,7 @@ frange::contains_p (tree cst) const
 {
   // Make sure the signs are equal for signed zeros.
   if (HONOR_SIGNED_ZEROS (m_type) && real_iszero (rv))
-	return m_min.sign == m_max.sign && m_min.sign == rv->sign;
+	return rv->sign == m_min.sign || rv->sign == m_max.sign;
   return true;
 }
   return false;
@@ -3859,6 +3859,14 @@ range_tests_signed_zeros ()
   ASSERT_TRUE (r0.contains_p (neg_zero));
   ASSERT_FALSE (r0.contains_p (zero));
 
+  r0 = frange (neg_zero, zero);
+  ASSERT_TRUE (r0.contains_p (neg_zero));
+  ASSERT_TRUE (r0.contains_p (zero));
+
+  r0 = frange_float ("-3", "5");
+  ASSERT_TRUE (r0.contains_p (neg_zero));
+  ASSERT_TRUE (r0.contains_p (zero));
+
   // The intersection of zeros that differ in sign is a NAN (or
   // undefined if not honoring NANs).
   r0 = frange (neg_zero, neg_zero);
-- 
2.38.1
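
For readers following the thread, the difference between the two sign checks can be reproduced outside of GCC with a toy model. The sketch below is an illustration only: ToyRange, contains_zero_old and contains_zero_new are hypothetical names, plain doubles stand in for REAL_VALUE_TYPE, and the functions model just the sign comparison that runs after the min <= value <= max test has already passed.

```cpp
#include <cmath>

// Toy stand-in for a floating-point range [min, max]; this is not GCC's
// frange, just enough state to compare the two sign checks.
struct ToyRange {
  double min;
  double max;
};

// Sign check from the earlier hunk: both bounds must carry the same sign
// as the zero being tested.
inline bool contains_zero_old(const ToyRange& r, double zero) {
  const bool zs = std::signbit(zero);
  return std::signbit(r.min) == std::signbit(r.max)
         && std::signbit(r.min) == zs;
}

// Sign check from the pushed fix: the zero's sign only has to match one
// of the two bounds.
inline bool contains_zero_new(const ToyRange& r, double zero) {
  const bool zs = std::signbit(zero);
  return zs == std::signbit(r.min) || zs == std::signbit(r.max);
}
```

With the range [-3, 5] the old check rejects both -0.0 and +0.0 because the bounds differ in sign, while the new check accepts both; for the singleton [-0.0, -0.0] both versions still reject +0.0.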



[PATCH] libstdc++: Declare const global variables inline

2022-11-02 Thread Patrick Palka via Gcc-patches
IIUC such variables should be declared inline to avoid potential ODR
violations since they're otherwise considered to be distinct (internal
linkage) entities across TUs.

The changes inside the regex_constants and execution namespace seem to
be unimplemented parts of P0607R0; the rest of the changes touch only
implementation details.
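
As a small illustration of the concern (a sketch with made-up names, not code from libstdc++): a namespace-scope constexpr variable without "inline" has internal linkage, so an inline function that takes its address refers to a different object in each translation unit. The SKETCH_INLINE macro below mirrors the idea behind _GLIBCXX17_INLINE by expanding to "inline" only for C++17 and later.

```cpp
#include <cstddef>

// Hypothetical macro for this sketch: `inline` in C++17 and later,
// nothing before that.
#if __cplusplus >= 201703L
# define SKETCH_INLINE inline
#else
# define SKETCH_INLINE
#endif

namespace sketch {

// Implicitly const, hence internal linkage: every translation unit that
// includes this header gets its own distinct object.
constexpr std::size_t wait_alignment_internal = 4;

// With C++17 `inline`, all translation units share one entity.
SKETCH_INLINE constexpr std::size_t wait_alignment_shared = 4;

// This inline function odr-uses an internal-linkage variable, so its
// definition would name a different object in each translation unit.
inline const std::size_t* address_of_internal() {
  return &wait_alignment_internal;
}

// Here every definition names the same object when compiled as C++17.
inline const std::size_t* address_of_shared() {
  return &wait_alignment_shared;
}

} // namespace sketch
```

In a single translation unit both functions behave identically; the difference only shows up once the header is included from several of them.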

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__detail::__platform_wait_alignment):
Declare inline.  Remove redundant static specifier.
(__detail::__atomic_spin_count_relax): Declare inline.
(__detail::__atomic_spin_count): Likewise.
* include/bits/regex_automaton.h (__detail::_S_invalid_state_id):
Conditionally declare inline.  Declare constexpr.  Remove
redundant const and static specifiers.
* include/bits/regex_error.h (regex_constants::error_collate):
Conditionally declare inline.
(regex_constants::error_ctype): Likewise.
(regex_constants::error_escape): Likewise.
(regex_constants::error_backref): Likewise.
(regex_constants::error_brack): Likewise.
(regex_constants::error_paren): Likewise.
(regex_constants::error_brace): Likewise.
(regex_constants::error_badbrace): Likewise.
(regex_constants::error_range): Likewise.
(regex_constants::error_space): Likewise.
(regex_constants::error_badrepeat): Likewise.
(regex_constants::error_complexity): Likewise.
(regex_constants::error_stack): Likewise.
* include/ext/concurrence.h (__gnu_cxx::__default_lock_policy):
Likewise.  Remove redundant static specifier.
* include/pstl/execution_defs.h (execution::seq): Conditionally declare
inline.
(execution::par): Likewise.
(execution::par_unseq): Likewise.
(execution::unseq): Likewise.
---
 libstdc++-v3/include/bits/atomic_wait.h |  8 +++
 libstdc++-v3/include/bits/regex_automaton.h |  2 +-
 libstdc++-v3/include/bits/regex_error.h | 26 ++---
 libstdc++-v3/include/ext/concurrence.h  |  2 +-
 libstdc++-v3/include/pstl/execution_defs.h  |  8 +++
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 76ed7409937..bd1ed56d157 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -58,14 +58,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
 #define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 using __platform_wait_t = int;
-static constexpr size_t __platform_wait_alignment = 4;
+inline constexpr size_t __platform_wait_alignment = 4;
 #else
 // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
 // and __platform_notify() if there is a more efficient primitive supported
 // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
 // a mutex/condvar based wait.
 using __platform_wait_t = uint64_t;
-static constexpr size_t __platform_wait_alignment
+inline constexpr size_t __platform_wait_alignment
   = __alignof__(__platform_wait_t);
 #endif
   } // namespace __detail
@@ -142,8 +142,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 }
 
-constexpr auto __atomic_spin_count_relax = 12;
-constexpr auto __atomic_spin_count = 16;
+inline constexpr auto __atomic_spin_count_relax = 12;
+inline constexpr auto __atomic_spin_count = 16;
 
 struct __default_spin_policy
 {
diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index f95eb7dad6d..44bde42e212 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -46,7 +46,7 @@ namespace __detail
*/
 
   typedef long _StateIdT;
-  static const _StateIdT _S_invalid_state_id  = -1;
+  _GLIBCXX17_INLINE constexpr _StateIdT _S_invalid_state_id  = -1;
 
   template
 using _Matcher = std::function;
diff --git a/libstdc++-v3/include/bits/regex_error.h 
b/libstdc++-v3/include/bits/regex_error.h
index 74a1428c2c7..ab207650d44 100644
--- a/libstdc++-v3/include/bits/regex_error.h
+++ b/libstdc++-v3/include/bits/regex_error.h
@@ -66,60 +66,60 @@ namespace regex_constants
 };
 
   /** The expression contained an invalid collating element name. */
-  constexpr error_type error_collate(_S_error_collate);
+  _GLIBCXX17_INLINE constexpr error_type error_collate(_S_error_collate);
 
   /** The expression contained an invalid character class name. */
-  constexpr error_type error_ctype(_S_error_ctype);
+  _GLIBCXX17_INLINE constexpr error_type error_ctype(_S_error_ctype);
 
   /**
* The expression contained an invalid escaped character, or a trailing
* escape.
*/
-  constexpr error_type error_escape(_S_error_escape);
+  _GLIBCXX17_INLINE constexpr error_type error_escape(_S_error_escape);
 
   /

Re: [PATCH] Enable shrink wrapping for the RISC-V target.

2022-11-02 Thread Manolis Tsamis
On Tue, Oct 18, 2022 at 8:35 PM Palmer Dabbelt  wrote:
>
> On Tue, 18 Oct 2022 08:57:37 PDT (-0700), j...@ventanamicro.com wrote:
> >
> > Just a couple more comments in-line.
> >
> > On 10/18/22 09:18, Manolis Tsamis wrote:
> >>
>  +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
>  +
>  +static sbitmap
>  +riscv_get_separate_components (void)
>  +{
>  +  HOST_WIDE_INT offset;
>  +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
>  +  bitmap_clear (components);
>  +
>  +  if (riscv_use_save_libcall (&cfun->machine->frame)
>  +  || cfun->machine->interrupt_handler_p)
> >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's
> >>> redundant.  That said, I'm not sure riscv_use_save_libcall() is the
> >>> right check here as unless I'm missing something we don't have all those
> >>> other constraints when shrink-wrapping.
> >>>
> >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so 
> >> the
> >> check for interrupt_handler_p in the branch is not redundant in this case.
> >>
> >> I encountered some issues when shrink wrapping and libcall was used in the 
> >> same
> >> function. Thinking that libcall replaces the prologue/epilogue I didn't 
> >> see a
> >> reason to have both at the same time and hence I opted to disable
> >> shrink wrapping in that case. From my understanding this should be 
> >> harmless?
> >
> > I would have expected things to work fine with libcalls, perhaps with
> > the exception of the save/restore libcalls.  So that needs deeper
> > investigation.
>
> The save/restore libcalls only support saving/restoring a handful of
> register configurations (just the saved X registers in the order they're
> usually saved in by GCC).  It should be OK for correctness to over-save
> registers, but it kind of just un-does the shrink wrapping so not sure
> it's worth worrying about at that point.
>
> There's also some oddness around the save/restore libcall ABI, it's not
> the standard function ABI but instead a GCC-internal one.  IIRC it just
> uses the alternate link register (ie, t0 instead of ra) but I may have
> forgotten something else.
>
> >>> It seems kind of clunky to have two copies of all these loops (and we'll
> >>> need a third to make this work with the V stuff), but we've got that
> >>> issue elsewhere in the port so I don't think you need to fix it here
> >>> (though the V stuff will be there for the v2, so you'll need the third
> >>> copy of each loop).
> >>>
> >> Indeed, I was following the other ports here. Do you think it would be
> >> better to refactor this when the code for the V extension is added?
> >> By taking into account what code will be needed for V, a proper refactored
> >> function could be made to handle all cases.
> >
> > I think refactoring when V gets added would be fine.  While we could
> > probably refactor it correctly now (it isn't terribly complex code after
> > all), but we're more likely to get it right with the least amount of
> > work if we do it when V is submitted.
>
> Some of the V register blocks are already there, but ya I agree we can
> just wait.  There's going to be a bunch of V-related churn for a bit,
> juggling those patches is already enough of a headache ;)
>
> >>> Either way, this deserves a test case.  I think it should be possible to
> >>> write one by introducing some register pressure around a
> >>> shrink-wrappable block that needs a long stack offset and making sure
> >>> in-flight registers don't get trashed.
> >>>
> >> I tried to think of some way to introduce a test like that but couldn't and
> >> I don't see how it would be done. Shrink wrapping only affects saved 
> >> registers
> >> so there are always available temporaries that are not affected by
> >> shrink wrapping.
> >> (Register pressure should be irrelevant in this case if I understand 
> >> correctly).
> >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping
> >> should be unaffected from long stack offsets. If you see some way to write
> >> a test for that based on what I explained please explain how I could do 
> >> that.
> >
> > I think the register pressure was just to ensure that some saves were
> > needed to trigger an attempt to shrink wrap something.  You'd also need
> > something to eat stack space (local array which gets referenced as an
> > asm operand, but where the asm doesn't generate any code perhaps)?
> > Whether or not that works depends on stack layout though which I don't
> > know well enough for riscv.
>
> Sorry for being a bit vague, but it's because I always find it takes a
> bit of time to write up tests like this.  I think something like this
> might do it, but that almost certainly won't work as-is:
>
> // Some extern bits to try and trip up the optimizer.
> extern long helper(long *sa, long a, long b, long c, ...);
> extern long glob_array[1024];
>
> // The function takes a bunch of arguments to fi

Re: [PATCH] Enable shrink wrapping for the RISC-V target.

2022-11-02 Thread Manolis Tsamis
On Wed, Oct 19, 2022 at 8:16 PM Jeff Law via Gcc-patches
 wrote:
>
>
> On 10/18/22 11:35, Palmer Dabbelt wrote:
> >
> >> I would have expected things to work fine with libcalls, perhaps with
> >> the exception of the save/restore libcalls.  So that needs deeper
> >> investigation.
> >
> > The save/restore libcalls only support saving/restoring a handful of
> > register configurations (just the saved X registers in the order
> > they're usually saved in by GCC).  It should be OK for correctness to
> > over-save registers, but it kind of just un-does the shrink wrapping
> > so not sure it's worth worrying about at that point.
> >
> > There's also some oddness around the save/restore libcall ABI, it's
> > not the standard function ABI but instead a GCC-internal one.  IIRC it
> > just uses the alternate link register (ie, t0 instead of ra) but I may
> > have forgotten something else.
>
> I hadn't really dug into it -- I was pretty sure they weren't following
> the standard ABI based on its name and how I've used similar routines to
> save space on some targets in the past.  So if we're having problems
> with shrink-wrapping and libcalls, those two might be worth investigating.
>
>
> But I think the most important takeaway is that shrink wrapping should
> work with libcalls, there's nothing radically different about libcalls
> that would make them inherently interact poorly with shrink-wrapping.
> So that aspect of the shrink-wrapping patch needs deeper investigation.
>
> Jeff

I think I miscommunicated the issue previously because my understanding
of libcalls wasn't very solid. The guard is against the save/restore libcalls
specifically; other than that shrink wrapping and libcalls are fine. I think it
makes sense to leave this check because the prologue/epilogue does
something similar when using libcall save/restore:
  frame->mask = 0; /* Temporarily fib that we need not save GPRs. */

Since shrink wrap components are marked by testing frame->mask then
no registers should be wrapped with the libcall save/restore if I understand
correctly.

Nonetheless, I tested what happens if this guard condition is removed
and the result is that a RISCV test fails (riscv/pr95252.c). In that case
an unnecessary save/restore of a register is emitted together with
inconsistent cfi notes that make dwarf2cfi abort.

To conclude, I believe that this makes the code in the commit fine since
it only guards against the libcall save/restore case. But I may be still
missing something about this.

Manolis
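
The guard discussed above can be pictured with a toy model. This is only a sketch under simplifying assumptions: ToyFrame, toy_use_save_libcall and toy_separate_components are hypothetical names, and the real riscv_use_save_libcall looks at more state than the two flags modeled here.

```cpp
#include <bitset>
#include <cstdint>

constexpr int kNumRegs = 32;

// Hypothetical, heavily simplified stand-in for the per-function frame
// information.
struct ToyFrame {
  std::uint32_t mask = 0;          // callee-saved GPRs that need saving
  bool save_libcall = false;       // prologue would use save/restore libcalls
  bool interrupt_handler = false;  // function is an interrupt handler
};

// Modeled after the behaviour described in the thread: the helper reports
// false for interrupt handlers, so callers must test that flag separately.
inline bool toy_use_save_libcall(const ToyFrame& f) {
  return !f.interrupt_handler && f.save_libcall;
}

// Registers that may be saved and restored as separate shrink-wrap
// components; empty when the guard applies.
inline std::bitset<kNumRegs> toy_separate_components(const ToyFrame& f) {
  if (toy_use_save_libcall(f) || f.interrupt_handler)
    return std::bitset<kNumRegs>();  // leave everything to the prologue
  return std::bitset<kNumRegs>(f.mask);
}
```

In this model a frame that saves x8 and x9 yields two components normally, and none once either the libcall flag or the interrupt-handler flag is set, which is why both conditions appear in the guard.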


Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Christoph Müllner
On Thu, Oct 27, 2022 at 10:51 PM Palmer Dabbelt  wrote:

> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu
> wrote:
> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
> > christoph.muell...@vrull.eu> wrote:
> >
> >> From: Christoph Muellner 
> >>
> >> This patch adds support for the Zawrs ISA extension.
> >> The patch depends on the corresponding Binutils patch
> >> to be usable (see [1])
> >>
> >> The specification can be found here:
> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> >>
> >> Note, that the Zawrs extension is not frozen or ratified yet.
> >> Therefore this patch is an RFC and not intended to get merged.
> >>
> >
> > Sorry, forgot to update this part:
> > The Zawrs extension is frozen but not ratified.
> > Let me know if I should send a v2 for this change of the commit msg.
>
> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
> the NEWS entry too.  I also don't see any build/test results.
>

I ran the GCC regression test suite with rv32 and rv64 toolchains
using the riscv-gnu-toolchain repo and did not see any regressions.

Where can I create the news entry?


>
> Thanks!
>
> > Binutils support has been merged recently:
> >
> >
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
> >
> >
> >>
> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
> >>
> >> gcc/ChangeLog:
> >>
> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
> >> (TARGET_ZAWRS): New.
> >> * config/riscv/riscv.opt: New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/zawrs.c: New test.
> >>
> >> Signed-off-by: Christoph Muellner 
> >> ---
> >>  gcc/common/config/riscv/riscv-common.cc |  4 
> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
> >>  gcc/config/riscv/riscv.opt  |  3 +++
> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
> >>  4 files changed, 23 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
> >>
> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> >> b/gcc/common/config/riscv/riscv-common.cc
> >> index d6404a01205..4b7f777c103 100644
> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
> >> riscv_ext_version_table[] =
> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >>
> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
> >> +
> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
> >> riscv_ext_flag_table[] =
> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> >>
> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
> >> +
> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
> >> diff --git a/gcc/config/riscv/riscv-opts.h
> b/gcc/config/riscv/riscv-opts.h
> >> index 1dfe8c89209..25fd85b09b1 100644
> >> --- a/gcc/config/riscv/riscv-opts.h
> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >>
> >> +#define MASK_ZAWRS   (1 << 0)
> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> >> +
> >>  #define MASK_ZBA  (1 << 0)
> >>  #define MASK_ZBB  (1 << 1)
> >>  #define MASK_ZBC  (1 << 2)
> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> index 426ea95cd14..7c3ca48d1cc 100644
> >> --- a/gcc/config/riscv/riscv.opt
> >> +++ b/gcc/config/riscv/riscv.opt
> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
> >>  TargetVariable
> >>  int riscv_zi_subext
> >>
> >> +TargetVariable
> >> +int riscv_za_subext
> >> +
> >>  TargetVariable
> >>  int riscv_zb_subext
> >>
> >> diff --git a/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> new file mode 100644
> >> index 000..0b7e2662343
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc_zawrs" { target { rv64 } } } */
> >> +/* { dg-options "-march=rv32gc_zawrs" { target { rv32 } } } */
> >> +
> >> +#ifndef __riscv_zawrs
> >> +#error Feature macro not defined
> >> +#endif
> >> +
> >> +int
> >> +foo (int a)
> >> +{
> >> +  return a;
> >> +}
> >> --
> >> 2.37.3
> >>
> >>
>


Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Philipp Tomsich
On Wed, 2 Nov 2022 at 15:21, Christoph Müllner
 wrote:
>
>
>
> On Thu, Oct 27, 2022 at 10:51 PM Palmer Dabbelt  wrote:
>>
>> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu wrote:
>> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
>> > christoph.muell...@vrull.eu> wrote:
>> >
>> >> From: Christoph Muellner 
>> >>
>> >> This patch adds support for the Zawrs ISA extension.
>> >> The patch depends on the corresponding Binutils patch
>> >> to be usable (see [1])
>> >>
>> >> The specification can be found here:
>> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
>> >>
>> >> Note, that the Zawrs extension is not frozen or ratified yet.
>> >> Therefore this patch is an RFC and not intended to get merged.
>> >>
>> >
>> > Sorry, forgot to update this part:
>> > The Zawrs extension is frozen but not ratified.
>> > Let me know if I should send a v2 for this change of the commit msg.
>>
>> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
>> the NEWS entry too.  I also don't see any build/test results.
>
>
> I ran the GCC regression test suite with rv32 and rv64 toolchains
> using the riscv-gnu-toolchain repo and did not see any regressions.
>
> Where can I create the news entry?

News are generated from
  git://gcc.gnu.org/git/gcc-wwwdocs.git

You'll want to add to
  htdocs/gcc-13/changes.html

Thanks,
Philipp.

>>
>>
>> Thanks!
>>
>> > Binutils support has been merged recently:
>> >
>> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
>> >
>> >
>> >>
>> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
>> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
>> >> (TARGET_ZAWRS): New.
>> >> * config/riscv/riscv.opt: New.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >>
>> >> * gcc.target/riscv/zawrs.c: New test.
>> >>
>> >> Signed-off-by: Christoph Muellner 
>> >> ---
>> >>  gcc/common/config/riscv/riscv-common.cc |  4 
>> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
>> >>  gcc/config/riscv/riscv.opt  |  3 +++
>> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
>> >>  4 files changed, 23 insertions(+)
>> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
>> >>
>> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
>> >> b/gcc/common/config/riscv/riscv-common.cc
>> >> index d6404a01205..4b7f777c103 100644
>> >> --- a/gcc/common/config/riscv/riscv-common.cc
>> >> +++ b/gcc/common/config/riscv/riscv-common.cc
>> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
>> >> riscv_ext_version_table[] =
>> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
>> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
>> >>
>> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
>> >> +
>> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
>> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
>> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
>> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
>> >> riscv_ext_flag_table[] =
>> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
>> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
>> >>
>> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
>> >> +
>> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
>> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
>> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
>> >> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
>> >> index 1dfe8c89209..25fd85b09b1 100644
>> >> --- a/gcc/config/riscv/riscv-opts.h
>> >> +++ b/gcc/config/riscv/riscv-opts.h
>> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
>> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
>> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
>> >>
>> >> +#define MASK_ZAWRS   (1 << 0)
>> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
>> >> +
>> >>  #define MASK_ZBA  (1 << 0)
>> >>  #define MASK_ZBB  (1 << 1)
>> >>  #define MASK_ZBC  (1 << 2)
>> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
>> >> index 426ea95cd14..7c3ca48d1cc 100644
>> >> --- a/gcc/config/riscv/riscv.opt
>> >> +++ b/gcc/config/riscv/riscv.opt
>> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
>> >>  TargetVariable
>> >>  int riscv_zi_subext
>> >>
>> >> +TargetVariable
>> >> +int riscv_za_subext
>> >> +
>> >>  TargetVariable
>> >>  int riscv_zb_subext
>> >>
>> >> diff --git a/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> b/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> new file mode 100644
>> >> index 000..0b7e2662343
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> @@ -0,0 +1,13 @@
>> >> +/* { dg-do compile } */
>> >> +/* { dg-options "-march=rv64gc_zawrs" { target { rv64 } } } */
>> 

[PATCH 2/2]AArch64 Add implementation for vector cbranch.

2022-11-02 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for AArch64.

For 128-bit vectors we generate:

cmhi    v1.4s, v1.4s, v0.4s
umaxp   v1.4s, v1.4s, v1.4s
fmov    x3, d1
cbnz    x3, .L8

and for 64-bit vectors we can omit the compression:

cmhi    v1.2s, v1.2s, v0.2s
fmov    x2, d1
cbz     x2, .L13
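
As an illustration only (not part of the patch), the reason the 128-bit case needs the pairwise reduction can be modelled in scalar C++. The sketch assumes each 32-bit lane of the compare result is either all-ones or zero, and that umaxp with both sources equal leaves max(lane0, lane1) and max(lane2, lane3) in the low 64 bits of the result. The names Mask128, umaxp_low64 and any_lane_set are made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a 128-bit compare result: four 32-bit lanes, each either
// all-ones (comparison true) or zero (comparison false).
using Mask128 = std::array<std::uint32_t, 4>;

// Models the low 64 bits of "umaxp vD.4s, vN.4s, vN.4s" (assumption: with
// both sources equal, the two low result lanes are the maxima of the
// adjacent lane pairs).  This is the value that fmov moves to a GPR.
constexpr std::uint64_t umaxp_low64(const Mask128 &m) {
  const std::uint64_t lo = std::max(m[0], m[1]);
  const std::uint64_t hi = std::max(m[2], m[3]);
  return lo | (hi << 32);
}

// The branch is taken when any lane of the mask is set, which is exactly
// when the reduced 64-bit value is non-zero.
constexpr bool any_lane_set(const Mask128 &m) { return umaxp_low64(m) != 0; }

// A set lane in the upper half would be lost by reading d1 directly;
// the reduction folds it into the low 64 bits.
static_assert(any_lane_set(Mask128{0, 0, 0, 0xFFFFFFFFu}), "upper lane seen");
static_assert(!any_lane_set(Mask128{0, 0, 0, 0}), "all-zero mask");
```

For a 64-bit vector the whole mask already fits in d1, so the model degenerates to a plain non-zero test and no reduction is needed.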

I did also want to provide a version that mixes SVE and NEON so I can use the
SVE CMHI instructions with a NEON register.

So concretely for a 128-bit vector you'd get:

ptrue   p0.s, vl4
.L3:
...
cmplo   p2.s, p0/z, z0.s, z2.s
b.any   .L6
...
cmp w2, 200
bne .L3

However I ran into an issue where cbranch is not the thing that does the
comparison.  And if I use combine to do it then the resulting ptrue wouldn't be
floated outside the loop.

Is there currently a way to do this, or does a mid-end pass need to be
changed for this?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable AArch64 generically.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5386043739a9b2e328bfb2fc9067da8feeac1a92..e53d339ea20492812a3faa7c20ed945255321b11 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3795,6 +3795,41 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+;; Patterns comparing two vectors to produce a sets flagsi.
+
+(define_expand "cbranch4"
+  [(set (pc)
+(if_then_else
+  (match_operator 0 "aarch64_equality_operator"
+[(match_operand:VDQ_BHSI 1 "register_operand")
+ (match_operand:VDQ_BHSI 2 "aarch64_simd_reg_or_zero")])
+  (label_ref (match_operand 3 ""))
+  (pc)))]
+  "TARGET_SIMD"
+{
+  rtx tmp = gen_reg_rtx (mode);
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+{
+  /* Always reduce using a V4SI.  */
+  rtx reduc = simplify_gen_subreg (V4SImode, operands[1], mode, 0);
+  rtx res = gen_reg_rtx (V4SImode);
+  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+  emit_move_insn (tmp, simplify_gen_subreg (mode, res, V4SImode, 0));
+}
+  else
+tmp = operands[1];
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, simplify_gen_subreg (DImode, tmp, mode, 0));
+
+  rtx cc_reg = aarch64_gen_compare_reg (NE, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (NE, DImode, cc_reg, operands[2]);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp"
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5cbf54bd2a23dfdc5dc7b148b0dc6ed4c63814ae..8964cbd6610a718711546d312e89cee937d210e8 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3653,8 +3653,7 @@ proc check_effective_target_vect_int { } {
 proc check_effective_target_vect_early_break { } {
 return [check_cached_effective_target_indexed vect_early_break {
   expr {
-   ([istarget aarch64*-*-*]
-&& [check_effective_target_aarch64_sve])
+   [istarget aarch64*-*-*]
}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of





[wwwdocs] gcc-13: riscv: Document the Zawrs support

2022-11-02 Thread Christoph Muellner
From: Christoph Müllner 

This patch documents the new RISC-V Zawrs support.

Signed-off-by: Christoph Müllner 
---
 htdocs/gcc-13/changes.html | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 7c6bfa6e..5e6e054b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -261,7 +261,10 @@ a work-in-progress.
 
 
 
-
+RISC-V
+
+New ISA extension support for zawrs.
+
 
 
 
-- 
2.38.1



Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Christoph Müllner
On Wed, Nov 2, 2022 at 3:38 PM Philipp Tomsich  wrote:
>
> On Wed, 2 Nov 2022 at 15:21, Christoph Müllner
>  wrote:
> >
> >
> >
> > On Thu, Oct 27, 2022 at 10:51 PM Palmer Dabbelt  wrote:
> >>
> >> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu 
> >> wrote:
> >> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
> >> > christoph.muell...@vrull.eu> wrote:
> >> >
> >> >> From: Christoph Muellner 
> >> >>
> >> >> This patch adds support for the Zawrs ISA extension.
> >> >> The patch depends on the corresponding Binutils patch
> >> >> to be usable (see [1])
> >> >>
> >> >> The specification can be found here:
> >> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> >> >>
> >> >> Note, that the Zawrs extension is not frozen or ratified yet.
> >> >> Therefore this patch is an RFC and not intended to get merged.
> >> >>
> >> >
> >> > Sorry, forgot to update this part:
> >> > The Zawrs extension is frozen but not ratified.
> >> > Let me know if I should send a v2 for this change of the commit msg.
> >>
> >> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
> >> the NEWS entry too.  I also don't see any build/test results.
> >
> >
> > I ran the GCC regression test suite with rv32 and rv64 toolchains
> > using the riscv-gnu-toolchain repo and did not see any regressions.
> >
> > Where can I create the news entry?
>
> News are generated from
>   git://gcc.gnu.org/git/gcc-wwwdocs.git
>
> You'll want to add to
>   htdocs/gcc-13/changes.html

The patch can be found here:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604882.html

Thanks,
Christoph

>
>
> Thanks,
> Philipp.
>
> >>
> >>
> >> Thanks!
> >>
> >> > Binutils support has been merged recently:
> >> >
> >> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
> >> >
> >> >
> >> >>
> >> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
> >> >>
> >> >> gcc/ChangeLog:
> >> >>
> >> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
> >> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
> >> >> (TARGET_ZAWRS): New.
> >> >> * config/riscv/riscv.opt: New.
> >> >>
> >> >> gcc/testsuite/ChangeLog:
> >> >>
> >> >> * gcc.target/riscv/zawrs.c: New test.
> >> >>
> >> >> Signed-off-by: Christoph Muellner 
> >> >> ---
> >> >>  gcc/common/config/riscv/riscv-common.cc |  4 
> >> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
> >> >>  gcc/config/riscv/riscv.opt  |  3 +++
> >> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
> >> >>  4 files changed, 23 insertions(+)
> >> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
> >> >>
> >> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> >> >> b/gcc/common/config/riscv/riscv-common.cc
> >> >> index d6404a01205..4b7f777c103 100644
> >> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
> >> >> riscv_ext_version_table[] =
> >> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >> >>
> >> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >> +
> >> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
> >> >> riscv_ext_flag_table[] =
> >> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> >> >>
> >> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
> >> >> +
> >> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
> >> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
> >> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
> >> >> diff --git a/gcc/config/riscv/riscv-opts.h 
> >> >> b/gcc/config/riscv/riscv-opts.h
> >> >> index 1dfe8c89209..25fd85b09b1 100644
> >> >> --- a/gcc/config/riscv/riscv-opts.h
> >> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
> >> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >> >>
> >> >> +#define MASK_ZAWRS   (1 << 0)
> >> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> >> >> +
> >> >>  #define MASK_ZBA  (1 << 0)
> >> >>  #define MASK_ZBB  (1 << 1)
> >> >>  #define MASK_ZBC  (1 << 2)
> >> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> >> index 426ea95cd14..7c3ca48d1cc 100644
> >> >> --- a/gcc/config/riscv/riscv.opt
> >> >> +++ b/gcc/config/riscv/riscv.opt
> >> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
> >> >>  TargetVariable
> >> >>  int riscv_zi_subext
> >> >>
> >> >> +TargetVariable
> >> >> +int ri

Re: [PATCH] Enable shrink wrapping for the RISC-V target.

2022-11-02 Thread Jeff Law



On 11/2/22 08:12, Manolis Tsamis wrote:

On Wed, Oct 19, 2022 at 8:16 PM Jeff Law via Gcc-patches
 wrote:


On 10/18/22 11:35, Palmer Dabbelt wrote:

I would have expected things to work fine with libcalls, perhaps with
the exception of the save/restore libcalls.  So that needs deeper
investigation.

The save/restore libcalls only support saving/restoring a handful of
register configurations (just the saved X registers in the order
they're usually saved in by GCC).  It should be OK for correctness to
over-save registers, but it kind of just un-does the shrink wrapping
so not sure it's worth worrying about at that point.

There's also some oddness around the save/restore libcall ABI, it's
not the standard function ABI but instead a GCC-internal one.  IIRC it
just uses the alternate link register (ie, t0 instead of ra) but I may
have forgotten something else.

I hadn't really dug into it -- I was pretty sure they weren't following
the standard ABI based on its name and how I've used similar routines to
save space on some targets in the past.  So if we're having problems
with shrink-wrapping and libcalls, those two might be worth investigating.


But I think the most important takeaway is that shrink wrapping should
work with libcalls, there's nothing radically different about libcalls
that would make them inherently interact poorly with shrink-wrapping.
So that aspect of the shrink-wrapping patch needs deeper investigation.

Jeff

I think I miscommunicated the issue previously because my understanding
of libcalls wasn't very solid. The guard is against the save/restore libcalls
specifically; other than that shrink wrapping and libcalls are fine. I think it
makes sense to leave this check because the prologue/epilogue does
something similar when using libcall save/restore:
   frame->mask = 0; /* Temporarily fib that we need not save GPRs. */


Looking more closely, yea, it might have been a miscommunication between 
us WRT libcalls.


 You're testing riscv_use_save_libcall, which is only going to kick in 
when we're using that special function to do register saves/restores.  
While we could, in theory, shrink-wrap that as well, I don't think it's 
worth the additional headache.




Since shrink wrap components are marked by testing frame->mask then
no registers should be wrapped with the libcall save/restore if I understand
correctly.


That's my understanding as well after looking at the code more closely.  
Essentially the mask is set to zero for the duration of the call to 
riscv_for_each_saved_regs which will effectively avoid shrink wrapping 
in that case.





Nonetheless, I tested what happens if this guard condition is removed
and the result is that a RISCV test fails (riscv/pr95252.c). In that case
an unnecessary save/restore of a register is emitted together with
inconsistent cfi notes that make dwarf2cfi abort.


ACK.  This is the kind of thing I was referring to above when I said we 
probably could shrink wrap the call to save/restore registers, but it's 
probably not worth the headache right now. I could see someone in the 
embedded space one day trying to tackle this problem, but I don't think 
we need to for this patch to go forward.





To conclude, I believe that this makes the code in the commit fine since
it only guards against the libcall save/restore case. But I may be still
missing something about this.


I think you've got it figured out reasonably well.


So the final conclusion is libcalls are resolved to my satisfaction.



Jeff


Re: [PATCH] Enable shrink wrapping for the RISC-V target.

2022-11-02 Thread Jeff Law



On 11/2/22 07:54, Manolis Tsamis wrote:


I've revisited this testcase and I think it's not possible to make it
work with the current implementation.
It's not possible to trigger shrink wrapping in this case since the
wrapping of registers is guarded by
  if (SMALL_OPERAND (offset)) { bitmap_set_bit (components, regno); }
Hence if a long stack is generated we get no shrink wrapping.
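
For illustration only, the effect of that guard can be sketched in standalone C++. The sketch assumes that SMALL_OPERAND tests whether an offset fits the signed 12-bit immediate of a RISC-V load/store; the names kImmReach, small_operand and wrappable_regs are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: a RISC-V load/store takes a signed 12-bit offset, so the
// reachable window is 4096 bytes wide and centred on zero.
constexpr std::int64_t kImmReach = 1 << 12;

// True when VALUE lies in [-2048, 2047].  The unsigned add turns the
// two-sided range check into a single comparison.
constexpr bool small_operand(std::int64_t value) {
  return static_cast<std::uint64_t>(value)
             + static_cast<std::uint64_t>(kImmReach / 2)
         < static_cast<std::uint64_t>(kImmReach);
}

static_assert(small_operand(-2048) && small_operand(2047), "in range");
static_assert(!small_operand(-2049) && !small_operand(2048), "out of range");

// Hypothetical helper: given (regno, frame offset) pairs, keep only the
// registers whose save slot is reachable with one load/store.  Only those
// would be offered as shrink-wrap components.
std::vector<int>
wrappable_regs(const std::vector<std::pair<int, std::int64_t>> &saves) {
  std::vector<int> components;
  for (const auto &save : saves)
    if (small_operand(save.second))
      components.push_back(save.first);
  return components;
}
```

With a long stack frame every save slot offset falls outside the window, so the list of components comes back empty and nothing is wrapped.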

I also tried to remove that restriction but it looks like it can't
work because we can't create
pseudo-registers during shrink wrapping and shrink wrapping can't work either.

I believe this means that shrink wrapping cannot interfere with a long
stack frame
so there is nothing to test against in this case?


It'd be marginally better to have such a test case to ensure we don't 
shrink wrap it -- that would ensure that someone doesn't accidentally 
introduce shrink wrapping with large offsets.   Just a bit of future 
proofing.



Jeff




Re: [PATCH] gcc: honour -ffile-prefix-map in ASM_MAP [PR93371]

2022-11-02 Thread Jeff Law via Gcc-patches



On 11/2/22 06:35, Rasmus Villemoes wrote:

On 01/11/2022 21.11, Jeff Law wrote:

On 8/29/22 03:29, Rasmus Villemoes wrote:

-ffile-prefix-map is supposed to be a superset of -fmacro-prefix-map
and -fdebug-prefix-map. However, when building .S or .s files, gas is
not called with the appropriate --debug-prefix-map option when
-ffile-prefix-map is used.

While the user can specify -fdebug-prefix-map when building assembly
files via gcc, it's more ergonomic to also support -ffile-prefix-map;
especially since for .S files that could contain the __FILE__ macro,
one would then also have to specify -fmacro-prefix-map.

gcc:
 PR driver/93371
 * gcc.cc (ASM_MAP): Honour -ffile-prefix-map.

OK.  Sorry for the long delay.

Thanks, and no problem.

However, when I try to push the new master branch I get

$ git push origin master
fatal: remote error: service not enabled: /git/gcc.git

I do gcc patches sufficiently rarely that I may have forgotten the right
procedure, but this is what I think I've done previously (along with
running a "git gcc-verify HEAD" to ensure there's a proper changelog
fragment to extract, with gcc-verify being a suitable alias).

Have I simply lost my commit bit?


No idea what that error means.  If I had to guess, it'd be that you've 
got an anonymous checkout tree which is obviously unsuitable for pushing 
or something of that nature.


It's probably just faster/easier for me to push it for you.  I'll take 
care of it momentarily.


Jeff


[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr

2022-11-02 Thread Tobias Burnus

This fixes an issue with 'alloc:' found when working on the patch
'[Patch] OpenMP/Fortran: 'target update' with strides + DT components'
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604687.html
(BTW: This one is still pending review.)

OK for mainline?

 * * *

I think the patch is a great improvement.

However, again, by writing a testcase, more issues have been found:
* one generic Fortran one, worked around by adding '(:)',
  Cf. https://gcc.gnu.org/PR107508 "Invalid bounds due to bogus reallocation
  on assignment with KIND=4 characters".
* Some other string issues, some might be generic Fortran issues
* Some issue with pointers - where exit data give an error as
  0x00 and 0x01 kinds are not known by target exit data
  Those also showed up with the 'target update' patch mentioned above.

For the last two, I used '#if 0' followed by a comment with the current
error message. I do intent to look into those - or at least file a PR.
Likewise for the remaining issues mentioned in the 'tagret update' patch.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr

Using 'map(alloc: var, dt%comp)' requires a 'to' mapping of the array
descriptor, as otherwise the bounds are not available in the target
region. The same applies to character strings.

This patch implements this; however, some additional issues are exposed
by the testcase; those are '#if 0'ed and will be handled later.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): Ensure DT struct-comp with
	array descriptor and 'alloc:' have the descriptor mapped with 'to:'.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.

 gcc/fortran/trans-openmp.cc   |3 
 libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 |  567 ++
 2 files changed, 569 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4bfdf85cd9b..4eb9d4c9edc 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3507,7 +3507,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 			= gfc_full_array_size (block, inner, rank);
 			  tree elemsz
 			= TYPE_SIZE_UNIT (gfc_get_element_type (type));
-			  if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node)))
+			  if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node))
+			  || OMP_CLAUSE_MAP_KIND (node) == GOMP_MAP_ALLOC)
 			map_kind = GOMP_MAP_TO;
 			  else if (n->u.map_op == OMP_MAP_RELEASE
    || n->u.map_op == OMP_MAP_DELETE)
diff --git a/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90
new file mode 100644
index 000..1fe3f03c7b8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90
@@ -0,0 +1,567 @@
+! { dg-additional-options "-cpp" }
+
+! FIXME: Some tests do not work yet. Those are for now in '#if 0'
+
+! Check that 'map(alloc:' properly works with
+! - deferred-length character strings
+! - arrays with array descriptors
+! For those, the array descriptor / string length must be mapped with 'to:'
+
+program main
+implicit none
+
+type t
+  integer :: ic(2:5), ic2
+  character(len=11) :: ccstr(3:4), ccstr2
+  character(len=11,kind=4) :: cc4str(3:7), cc4str2
+  integer, pointer :: pc(:), pc2
+  character(len=:), pointer :: pcstr(:), pcstr2
+  character(len=:,kind=4), pointer :: pc4str(:), pc4str2
+end type t
+
+type(t) :: dt
+
+integer :: ii(5), ii2
+character(len=11) :: clstr(-1:1), clstr2
+character(len=11,kind=4) :: cl4str(0:3), cl4str2
+integer, pointer :: ip(:), ip2
+integer, allocatable :: ia(:), ia2
+character(len=:), pointer :: pstr(:), pstr2
+character(len=:), allocatable :: astr(:), astr2
+character(len=:,kind=4), pointer :: p4str(:), p4str2
+character(len=:,kind=4), allocatable :: a4str(:), a4str2
+
+
+allocate(dt%pc(5), dt%pc2)
+allocate(character(len=2) :: dt%pcstr(2))
+allocate(character(len=4) :: dt%pcstr2)
+
+allocate(character(len=3,kind=4) :: dt%pc4str(2:3))
+allocate(character(len=5,kind=4) :: dt%pc4str2)
+
+allocate(ip(5), ip2, ia(8), ia2)
+allocate(character(len=2) :: pstr(-2:0))
+allocate(character(len=4) :: pstr2)
+allocate(character(len=6) :: astr(3:5))
+allocate(character(len=8) :: astr2)
+
+allocate(character(len=3,kind=4) :: p4str(2:4))
+allocate(character(len=5,kind=4) :: p4str2)
+allocate(character(len=7,kind=4) :: a4str(-2:3))
+allocate(character(len=9,kind=4) :: a4str2)
+
+
+! integer :: ic(2:5), ic2
+
+!$omp target enter data map(alloc: dt%ic)
+!$omp target map(alloc: dt%ic)
+  if (size(dt%ic) /= 4) error stop
+  if (lbound(dt%ic, 1) /= 2) error stop
+  if (ubound(dt%ic, 1) /= 5) error stop
+  dt%ic = [22,

[PATCH v2] c++: Allow module name to be a single letter on Windows

2022-11-02 Thread Torbjörn SVENSSON via Gcc-patches
v1 -> v2:
Paths without "C:" part can still be absolute if they start with / or
\ on Windows.

Ok for trunk?

-

On Windows, the ':' character is special and when the module name is
a single character, like 'A', then the flatname would be (for
example) 'A:Foo'. On Windows, 'A:Foo' is treated as an absolute
path by the module loader and is likely not found.
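
As a rough sketch of the distinction (illustration only, not the code in the patch), the DOS-style checks can be modelled with small constexpr functions. The function names below are hypothetical stand-ins for the filenames.h macros, and the drive-spec test assumes it only looks for a ':' in the second position.

```cpp
#include <cassert>

// Both '/' and '\' separate directories on DOS-like hosts.
constexpr bool is_dos_dir_separator(char c) { return c == '/' || c == '\\'; }

// Assumption: a drive spec is any non-empty name whose second character
// is ':', so the module flatname "A:Foo" looks like drive A.
constexpr bool has_dos_drive_spec(const char *f) {
  return f[0] != '\0' && f[1] == ':';
}

// Existing behaviour: a drive spec alone makes the path "absolute".
constexpr bool is_dos_absolute(const char *f) {
  return is_dos_dir_separator(f[0]) || has_dos_drive_spec(f);
}

// Stricter check: a drive spec only counts when a separator follows it.
// f[2] is safe to read here because f[0] and f[1] are known to be non-NUL.
constexpr bool is_dos_real_absolute(const char *f) {
  return is_dos_dir_separator(f[0])
         || (has_dos_drive_spec(f) && is_dos_dir_separator(f[2]));
}

static_assert(is_dos_absolute("A:Foo"), "flatname misread as a path");
static_assert(!is_dos_real_absolute("A:Foo"), "flatname is not a path");
static_assert(is_dos_real_absolute("C:\\src\\a.h"), "drive plus separator");
static_assert(is_dos_real_absolute("/src/a.h"), "leading separator");
```

Under the stricter check 'A:Foo' is no longer classified as a header path, while 'C:\src\a.h' and paths starting with '/' or '\' still are.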

Without this patch, the test case pr98944_c.C fails with:

In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_b.C:7:1,
of module A:Foo, imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:
A:Internals: error: header module expected, module 'A:Internals' found
A:Internals: error: failed to read compiled module: Bad file data
A:Internals: note: compiled module file is 'gcm.cache/A-Internals.gcm'
In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:8:
A:Foo: error: failed to read compiled module: Bad import dependency
A:Foo: note: compiled module file is 'gcm.cache/A-Foo.gcm'
A:Foo: fatal error: returning to the gate for a mechanical issue
compilation terminated.

include/ChangeLog:

* filenames.h: Added IS_REAL_ABSOLUTE_PATH macro to check if
path is absolute and not semi-absolute on Windows.

gcc/cp/ChangeLog:

* module.cc: Use IS_REAL_ABSOLUTE_PATH macro.

Tested on Windows with arm-none-eabi for Cortex-M3 in gcc-11 tree.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
 gcc/cp/module.cc|  2 +-
 include/filenames.h | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9957df510e6..84680e183b7 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13958,7 +13958,7 @@ get_module (tree name, module_state *parent, bool partition)
 static module_state *
 get_module (const char *ptr)
 {
-  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_ABSOLUTE_PATH (ptr))
+  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_REAL_ABSOLUTE_PATH (ptr))
 /* A header name.  */
 return get_module (build_string (strlen (ptr), ptr));
 
diff --git a/include/filenames.h b/include/filenames.h
index 6c72c422edd..5e08033ff36 100644
--- a/include/filenames.h
+++ b/include/filenames.h
@@ -43,6 +43,7 @@ extern "C" {
 #  define HAS_DRIVE_SPEC(f) HAS_DOS_DRIVE_SPEC (f)
 #  define IS_DIR_SEPARATOR(c) IS_DOS_DIR_SEPARATOR (c)
 #  define IS_ABSOLUTE_PATH(f) IS_DOS_ABSOLUTE_PATH (f)
+#  define IS_REAL_ABSOLUTE_PATH(f) IS_DOS_REAL_ABSOLUTE_PATH (f)
 #else /* not DOSish */
 #  if defined(__APPLE__)
 #ifndef HAVE_CASE_INSENSITIVE_FILE_SYSTEM
@@ -52,6 +53,7 @@ extern "C" {
 #  define HAS_DRIVE_SPEC(f) (0)
 #  define IS_DIR_SEPARATOR(c) IS_UNIX_DIR_SEPARATOR (c)
 #  define IS_ABSOLUTE_PATH(f) IS_UNIX_ABSOLUTE_PATH (f)
+#  define IS_REAL_ABSOLUTE_PATH(f) IS_ABSOLUTE_PATH (f)
 #endif
 
 #define IS_DIR_SEPARATOR_1(dos_based, c)   \
@@ -67,6 +69,7 @@ extern "C" {
 
 #define IS_DOS_DIR_SEPARATOR(c) IS_DIR_SEPARATOR_1 (1, c)
 #define IS_DOS_ABSOLUTE_PATH(f) IS_ABSOLUTE_PATH_1 (1, f)
+#define IS_DOS_REAL_ABSOLUTE_PATH(f) IS_ABSOLUTE_PATH_2 (1, f)
 #define HAS_DOS_DRIVE_SPEC(f) HAS_DRIVE_SPEC_1 (1, f)
 
 #define IS_UNIX_DIR_SEPARATOR(c) IS_DIR_SEPARATOR_1 (0, c)
@@ -81,6 +84,13 @@ extern "C" {
   (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])   \
|| HAS_DRIVE_SPEC_1 (dos_based, f))
 
+/* Identical to IS_ABSOLUTE_PATH_1, but do not allow semi-absolute paths
+   when DOS_BASED is true.  */
+#define IS_ABSOLUTE_PATH_2(dos_based, f)\
+  (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])   \
+   || (HAS_DRIVE_SPEC_1 (dos_based, f)  \
+   && IS_DIR_SEPARATOR_1 (dos_based, (f)[2])))
+
 extern int filename_cmp (const char *s1, const char *s2);
 #define FILENAME_CMP(s1, s2)   filename_cmp(s1, s2)
 
-- 
2.25.1



Re: [wwwdocs] gcc-13: riscv: Document the Zawrs support

2022-11-02 Thread Kito Cheng
LGTM, thanks!

On Wed, Nov 2, 2022 at 7:59 AM Christoph Muellner
 wrote:
>
> From: Christoph Müllner 
>
> This patch documents the new RISC-V Zawrs support.
>
> Signed-off-by: Christoph Müllner 
> ---
>  htdocs/gcc-13/changes.html | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 7c6bfa6e..5e6e054b 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -261,7 +261,10 @@ a work-in-progress.
>
>  
>
> -
> +RISC-V
> +
> +New ISA extension support for zawrs.
> +
>
>  
>
> --
> 2.38.1
>


Re: [RFC] RISC-V: Add profile supports.

2022-11-02 Thread Kito Cheng via Gcc-patches
Could you add some test cases?

---

The parsing logic is rather ad hoc; I would prefer something
like the following to avoid magic pointer arithmetic like p+6:

Table of all profile names = {"RVA20U64", riscv_profile::RVA20U64, ...}

const char *rva20u64[] = {"m", "a", "f", "d",... NULL};

table of profile content =
{
  {riscv_profile::RVA20U64, rva20u64},
   ..
}

parse march ()
{
  if march is startswith
  else if ((profile = parse_profile(march)) != riscv_profile::NOT_PROFILE)
 handle_profile (profile)
  else
 error
}

handle_profile (profile)
{
  use table of profile content to update ext.
}


On Wed, Nov 2, 2022 at 5:54 AM jiawei  wrote:
> Add two new functions to handle profile input:
> "parse_profile" checks whether the input to -march is
> legal; if it is, "handle_profile" checks the
> profile's type [I/M/A], year [20/22] and mode [U/S/M]
> and sets the corresponding combination of extensions.
> Only the mandatory part is handled currently.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_subset_list::parse_profile): Check if profile name is valid or
> not.
> (riscv_subset_list::parse_std_ext): If input of -march option is
> a profile, skip first ISA check.
> (riscv_subset_list::parse): Handle profile input in -march.
> (riscv_subset_list::handle_profile): Expand different profiles
>  to extensions.
> * config/riscv/riscv-subset.h: New function prototypes.
>
>
> ---
>  gcc/common/config/riscv/riscv-common.cc | 95 +++--
>  gcc/config/riscv/riscv-subset.h |  5 +-
>  2 files changed, 94 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 602491c638d..da06bd89144 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -777,6 +777,35 @@ riscv_subset_list::parsing_subset_version (const char 
> *ext,
>return p;
>  }
>
> +/* Parsing function for profile.
> +
> +   Return Value:
> + Points to the end of profile.
> +
> +   Arguments:
> + `p`: Current parsing position.  */
> +
> +const char *
> +riscv_subset_list::parse_profile (const char *p)
> +{
> +  if(*p == 'I' || *p == 'M' || *p == 'A'){
> +p++;
> +if(startswith (p, "20") || startswith (p, "22"))
> +  p += 2;
> +if (*p == 'U' || *p == 'S' || *p == 'M')
> +  p++;
> +if(startswith (p, "64") || startswith (p, "32")){
> +   p += 2;
> +   riscv_subset_list::handle_profile(p-6, p-4, p-3);
> +   return p;
> +}
> +  }
> +  else
> +error_at (m_loc, "%<-march=%s%>: Invalid profile.", m_arch);
> +  return NULL;
> +}
> +
> +
>  /* Parsing function for standard extensions.parse_std_ext
>

> Return Value:
> @@ -786,7 +815,7 @@ riscv_subset_list::parsing_subset_version (const char 
> *ext,
>   `p`: Current parsing position.  */
>
>  const char *
> -riscv_subset_list::parse_std_ext (const char *p)
> +riscv_subset_list::parse_std_ext (const char *p, bool isprofile)
>  {
>const char *all_std_exts = riscv_supported_std_ext ();
>const char *std_exts = all_std_exts;
> @@ -795,8 +824,8 @@ riscv_subset_list::parse_std_ext (const char *p)
>unsigned minor_version = 0;
>char std_ext = '\0';
>bool explicit_version_p = false;
> -
> -  /* First letter must start with i, e or g.  */
> +  if (!isprofile){
> +/* First letter must start with i, e or g.  */
>switch (*p)
>  {
>  case 'i':
> @@ -850,6 +879,7 @@ riscv_subset_list::parse_std_ext (const char *p)
> "% or %", m_arch);
>return NULL;
>  }
> +}
>
>while (p != NULL && *p)
>  {
> @@ -1093,6 +1123,7 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>riscv_subset_list *subset_list = new riscv_subset_list (arch, loc);
>riscv_subset_t *itr;
>const char *p = arch;
> +  bool isprofile = false;
>if (startswith (p, "rv32"))
>  {
>subset_list->m_xlen = 32;
> @@ -1103,15 +1134,26 @@ riscv_subset_list::parse (const char *arch, 
> location_t loc)
>subset_list->m_xlen = 64;
>p += 4;
>  }
> +  else if (startswith (p, "RV"))
> +{
> +  if (startswith (p+6, "64"))
> +   subset_list->m_xlen = 64;
> +  else
> +   subset_li

Re: [PATCH, v2] Fortran: ordering of hidden procedure arguments [PR107441]

2022-11-02 Thread Mikael Morin

On 31/10/2022 at 21:29, Harald Anlauf via Fortran wrote:

Hi Mikael,

thanks a lot, your testcases broke my initial (and incorrect) patch
in multiple ways.  I understand now that the right solution is much
simpler and smaller.

I've added your testcases, see attached, with a simple scan of the
dump for the generated order of hidden arguments in the function decl
for the last testcase.

Regtested again on x86_64-pc-linux-gnu.  OK now?


Unfortunately no: the coarray case works, but the other problem remains.
The type problem is not visible in the definition of S; it is in the
declaration of S's prototype in P.


S is defined as:

void s (character(kind=1)[1:_c] & restrict c, integer(kind=4) o, 
logical(kind=1) _o, integer(kind=8) _c)

{
...
}

but P has:

void p ()
{
  static void s (character(kind=1)[1:] & restrict, integer(kind=4), 
integer(kind=8), logical(kind=1));
  void (*) (character(kind=1)[1:] & restrict, integer(kind=4), 
integer(kind=8), logical(kind=1)) pp;


  pp = s;
...
}





Re: [PATCH] genmultilib: Add sanity check

2022-11-02 Thread Joseph Myers
On Wed, 2 Nov 2022, Christophe Lyon via Gcc-patches wrote:

> +# Sanity check: make sure we have as many dirnames as options
> +if [ -n "${dirnames}" ]; then
> +options_arr=($options)

This is an sh script; arrays are a bash feature.  Building GCC isn't 
supposed to need bash (or to rely on $(SHELL) being bash, even when bash 
is available - many GNU/Linux systems use dash for /bin/sh), only a POSIX 
shell.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] genmultilib: Add sanity check

2022-11-02 Thread Christophe Lyon via Gcc-patches




On 11/2/22 18:29, Joseph Myers wrote:

On Wed, 2 Nov 2022, Christophe Lyon via Gcc-patches wrote:


+# Sanity check: make sure we have as many dirnames as options
+if [ -n "${dirnames}" ]; then
+options_arr=($options)


This is an sh script; arrays are a bash feature.  Building GCC isn't
supposed to need bash (or to rely on $(SHELL) being bash, even when bash
is available - many GNU/Linux systems use dash for /bin/sh), only a POSIX
shell.



That's what I feared, and I did "try to try" to build with dash, but I 
realize now that changing SHELL in the generated gcc/Makefile is not 
enough since it's defined by the higher level Makefile/config.status. 
Indeed rebuilding from scratch with CONFIG_SHELL=/bin/dash fails with my 
patch.


We have lived with that behavior for years, so it's not that bad anyway :-)

Thanks,

Christophe


Re: [RFC] RISC-V: Add profile supports.

2022-11-02 Thread Palmer Dabbelt

On Wed, 02 Nov 2022 10:19:15 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Could you add some test cases?


Also documentation, and ideally some sort of spec for what this should 
do so we can maintain compatibility with LLVM as well as we can.


IIUC this also allows for profiles in the arch function attributes, 
which would end up plumbing through to the assembler so we'd need 
support there?  Probably best to just expand these out for the rest of 
the tools so we don't need the profile->extension mappings everywhere, 
IMO it's the same as the -mcpu discussion.




---

The parsing logic is rather ad hoc; I would prefer something like the
following code, which avoids magic pointer arithmetic like p+6:

Table of all profile names = {"RVA20U64", riscv_profile::RVA20U64, ...}

const char *rva20u64[] = {"m", "a", "f", "d",... NULL};

table of profile content =
{
  {riscv_profile::RVA20U64, rva20u64},
   ..
}

parse march ()
{
  if march is startswith
  else if ((profile = parse_profile(march)) != riscv_profile::NOT_PROFILE)
 handle_profile (profile)
  else
 error
}

handle_profile (profile)
{
  use table of profile content to update ext.
}
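
A minimal C++ sketch of the table-driven approach outlined above. All names here (profile_names, profile_contents, lookup_profile, profile_extensions) are hypothetical and not part of the posted patch, and the RVA20U64 extension list is a placeholder that follows the abbreviated list in this mail rather than the full mandatory set:

```cpp
#include <cstring>

// Hypothetical enum; only one profile is listed for illustration.
enum class riscv_profile { NOT_PROFILE, RVA20U64 };

struct profile_name_entry
{
  const char *name;
  riscv_profile profile;
};

struct profile_content_entry
{
  riscv_profile profile;
  const char *const *exts;   // null-terminated list of extension names
};

// Placeholder content, abbreviated as in the mail above.
static const char *const rva20u64_exts[] = {"m", "a", "f", "d", nullptr};

static const profile_name_entry profile_names[] = {
  {"RVA20U64", riscv_profile::RVA20U64},
};

static const profile_content_entry profile_contents[] = {
  {riscv_profile::RVA20U64, rva20u64_exts},
};

// Map an -march string to a profile by exact name match (an assumption;
// a real parser may need to accept extensions following the profile name).
riscv_profile
lookup_profile (const char *march)
{
  for (const profile_name_entry &e : profile_names)
    if (std::strcmp (march, e.name) == 0)
      return e.profile;
  return riscv_profile::NOT_PROFILE;
}

// Return the extension list for a profile, or nullptr if it has no entry.
const char *const *
profile_extensions (riscv_profile profile)
{
  for (const profile_content_entry &e : profile_contents)
    if (e.profile == profile)
      return e.exts;
  return nullptr;
}
```

With this shape, adding a profile means adding one row to each table, and no offsets into the profile name are computed by hand.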


On Wed, Nov 2, 2022 at 5:54 AM jiawei  wrote:

handle_profile
Add two new functions to handle profile input:
"parse_profile" checks whether the input to -march is a
legal profile; if it is, "handle_profile" checks the
profile's type [I/M/A], year [20/22] and mode [U/S/M] and
sets the corresponding combination of extensions.  Only the
mandatory part is handled currently.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_profile): Check if profile name is valid or 
not.
(riscv_subset_list::parse_std_ext): If input of -march option is
a profile, skip first ISA check.
(riscv_subset_list::parse): Handle profile input in -march.
(riscv_subset_list::handle_profile): Handle expansion of the
 different profiles to extensions.
* config/riscv/riscv-subset.h: New function prototypes.


---
 gcc/common/config/riscv/riscv-common.cc | 95 +++--
 gcc/config/riscv/riscv-subset.h |  5 +-
 2 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 602491c638d..da06bd89144 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -777,6 +777,35 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   return p;
 }

+/* Parsing function for profile.
+
+   Return Value:
+ Points to the end of profile.
+
+   Arguments:
+ `p`: Current parsing position.  */
+
+const char *
+riscv_subset_list::parse_profile (const char *p)
+{
+  if(*p == 'I' || *p == 'M' || *p == 'A'){
+p++;
+if(startswith (p, "20") || startswith (p, "22"))
+  p += 2;
+if (*p == 'U' || *p == 'S' || *p == 'M')
+  p++;
+if(startswith (p, "64") || startswith (p, "32")){
+   p += 2;
+   riscv_subset_list::handle_profile(p-6, p-4, p-3);
+   return p;
+}
+  }
+  else
+error_at (m_loc, "%<-march=%s%>: Invalid profile.", m_arch);
+  return NULL;
+}
+
+
 /* Parsing function for standard extensions.parse_std_ext




Return Value:
@@ -786,7 +815,7 @@ riscv_subset_list::parsing_subset_version (const char *ext,
  `p`: Current parsing position.  */

 const char *
-riscv_subset_list::parse_std_ext (const char *p)
+riscv_subset_list::parse_std_ext (const char *p, bool isprofile)
 {
   const char *all_std_exts = riscv_supported_std_ext ();
   const char *std_exts = all_std_exts;
@@ -795,8 +824,8 @@ riscv_subset_list::parse_std_ext (const char *p)
   unsigned minor_version = 0;
   char std_ext = '\0';
   bool explicit_version_p = false;
-
-  /* First letter must start with i, e or g.  */
+  if (!isprofile){
+/* First letter must start with i, e or g.  */
   switch (*p)
 {
 case 'i':
@@ -850,6 +879,7 @@ riscv_subset_list::parse_std_ext (const char *p)
"% or %", m_arch);
   return NULL;
 }
+}

   while (p != NULL && *p)
 {
@@ -1093,6 +1123,7 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   riscv_subset_list *subset_list = new riscv_subset_list (arch, loc);
   riscv_subset_t *itr;
   const char *p = arch;
+  bool isprofile = false;
   if (startswith (p, "rv32"))

Re: PING^4 [PATCH] testsuite: Verify that module-mapper is available

2022-11-02 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602844.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 16:24, Torbjorn SVENSSON via Gcc-patches wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603544.html

Kind regards,
Torbjörn

On 2022-10-14 09:42, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602843.html

Kind regards,
Torbjörn

On 2022-10-05 11:17, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602111.html


Kind regards,
Torbjörn

On 2022-09-23 14:03, Torbjörn SVENSSON wrote:

For some test cases, it's required that the optional module mapper
"g++-mapper-server" is built. As the server is not required, the
test cases will fail if it can't be found.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_is_prog_name_available):
New.
* lib/target-supports-dg.exp
(dg-require-prog-name-available): New.
* g++.dg/modules/modules.exp: Verify availability of module
mapper.

Signed-off-by: Torbjörn SVENSSON  
---
  gcc/testsuite/g++.dg/modules/modules.exp | 31
  gcc/testsuite/lib/target-supports-dg.exp | 15
  gcc/testsuite/lib/target-supports.exp    | 15
  3 files changed, 61 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
b/gcc/testsuite/g++.dg/modules/modules.exp

index afb323d0efd..4784803742a 100644
--- a/gcc/testsuite/g++.dg/modules/modules.exp
+++ b/gcc/testsuite/g++.dg/modules/modules.exp
@@ -279,6 +279,29 @@ proc module-init { src } {
  return $option_list
  }
+# Return 1 if requirements are met
+proc module-check-requirements { tests } {
+    foreach test $tests {
+    set tmp [dg-get-options $test]
+    foreach op $tmp {
+    switch [lindex $op 0] {
+    "dg-additional-options" {
+    # Example strings to match:
+    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server\\ 
-t\\ [srcdir]/inc-xlate-1.map

+    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server
+    if [regexp -- {(^| )-fmodule-mapper=\|@([^\\ ]*)} 
[lindex $op 2] dummy dummy2 prog] {

+    verbose "Checking that mapper exist: $prog"
+    if { ![ check_is_prog_name_available $prog ] } {
+    return 0
+    }
+    }
+    }
+    }
+    }
+    }
+    return 1
+}
+
  # cleanup any detritus from previous run
  cleanup_module_files [find $DEFAULT_REPO *.gcm]
@@ -307,6 +330,14 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX]}]] {
  set tests [lsort [find [file dirname $src] \
    [regsub {_a.[CHX]$} [file tail $src] 
{_[a-z].[CHX]}]]]

+    if { ![module-check-requirements $tests] } {
+    set testcase [regsub {_a.[CH]} $src {}]
+    set testcase \
+    [string range $testcase [string length "$srcdir/"] end]
+    unsupported $testcase
+    continue
+    }
+
  set std_list [module-init $src]
  foreach std $std_list {
  set mod_files {}
diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp

index aa2164bc789..6ce3b2b1a1b 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -683,3 +683,18 @@ proc dg-require-symver { args } {
  set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
  }
  }
+
+# If this target does not provide prog named "$args", skip this test.
+
+proc dg-require-prog-name-available { args } {
+    # The args are within another list; pull them out.
+    set args [lindex $args 0]
+
+    set prog [lindex $args 1]
+
+    if { ![ check_is_prog_name_available $prog ] } {
+    upvar dg-do-what dg-do-what
+    set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp

index 703aba412a6..c3b7a6c17b3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11928,3 +11928,18 @@ main:
  .byte 0
    } ""]
  }
+
+# Return 1 if this target has prog named "$prog", 0 otherwise.
+
+proc check_is_prog_name_available { prog } {
+    global tool
+
+    set options [list "additional_flags=-print-prog-name=$prog"]
+    set output [lindex [${tool}_target_compile "" "" "none" 
$options] 0]

+
+    if { $output == $prog } {
+    return 0
+    }
+
+    return 1
+}


PING^1 [PATCH] testsuite: Windows paths use \ and not /

2022-11-02 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604312.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 17:15, Torbjörn SVENSSON wrote:

Without this patch, the following error is reported on Windows:

In file included from t:\build\arm-none-eabi\include\c++\11.3.1\string:54,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\locale_classes.h:40,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\ios_base.h:41,
   from t:\build\arm-none-eabi\include\c++\11.3.1\ios:42,
   from t:\build\arm-none-eabi\include\c++\11.3.1\ostream:38,
   from t:\build\arm-none-eabi\include\c++\11.3.1\iostream:39:
t:\build\arm-none-eabi\include\c++\11.3.1\bits\range_access.h:36:10: note: 
include 't:\build\arm-none-eabi\include\c++\11.3.1\initializer_list' translated 
to import
arm-none-eabi-g++.exe: warning: .../gcc/testsuite/g++.dg/modules/pr99023_b.X: 
linker input file unused because linking not done
FAIL: g++.dg/modules/pr99023_b.X -std=c++2a  dg-regexp 6 not found: "[^\n]*: note: 
include '[^\n]*/initializer_list' translated to import\n"

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99023_b.X: Match Windows paths too.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/modules/pr99023_b.X | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/modules/pr99023_b.X 
b/gcc/testsuite/g++.dg/modules/pr99023_b.X
index 3d82f34868b..ca5f32e5bcc 100644
--- a/gcc/testsuite/g++.dg/modules/pr99023_b.X
+++ b/gcc/testsuite/g++.dg/modules/pr99023_b.X
@@ -3,5 +3,5 @@
  
  // { dg-prune-output {linker input file unused} }
  
-// { dg-regexp {[^\n]*: note: include '[^\n]*/initializer_list' translated to import\n} }

+// { dg-regexp {[^\n]*: note: include '[^\n]*[/\\]initializer_list' translated 
to import\n} }
  NO DO NOT COMPILE
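
To illustrate the effect of the pattern change, here is a small C++ sketch that uses std::regex (ECMAScript syntax) as a stand-in for the Tcl regexp engine that dg-regexp actually uses; the helper name matches_translated_note is hypothetical. The character class [/\\] accepts either directory separator before "initializer_list":

```cpp
#include <regex>
#include <string>

// Returns true if LINE contains the "translated to import" note with either
// a '/' or a '\' directly in front of "initializer_list".
// Assumption: ECMAScript regex semantics are close enough to Tcl's here.
bool
matches_translated_note (const std::string &line)
{
  static const std::regex re (
    R"([^\n]*: note: include '[^\n]*[/\\]initializer_list' translated to import\n)");
  return std::regex_search (line, re);
}
```

A POSIX-style path and a Windows-style path both match, whereas the old pattern with a literal '/' would only accept the former.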


PING^1 [PATCH] arm: Allow to override location of .gnu.sgstubs section

2022-11-02 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603878.html

Kind regards,
Torbjörn

On 2022-10-19 11:42, Torbjörn SVENSSON wrote:

Depending on the DejaGNU board definition, the .gnu.sgstubs section
might be placed at different locations in order to suit the target.
With this patch, the start location of the section can be overridden
from the board definition, falling back to the previously hardcoded
location.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/bitfield-1.c: Use overridable location.
* gcc.target/arm/cmse/bitfield-2.c: Likewise.
* gcc.target/arm/cmse/bitfield-3.c: Likewise.
* gcc.target/arm/cmse/cmse-20.c: Likewise.
* gcc.target/arm/cmse/struct-1.c: Likewise.
* gcc.target/arm/cmse/cmse.exp (cmse_sgstubs): New.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/cmse/bitfield-1.c |  2 +-
  gcc/testsuite/gcc.target/arm/cmse/bitfield-2.c |  2 +-
  gcc/testsuite/gcc.target/arm/cmse/bitfield-3.c |  2 +-
  gcc/testsuite/gcc.target/arm/cmse/cmse-20.c|  2 +-
  gcc/testsuite/gcc.target/arm/cmse/cmse.exp | 11 +++
  gcc/testsuite/gcc.target/arm/cmse/struct-1.c   |  2 +-
  6 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/bitfield-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/bitfield-1.c
index 5685f744435..c1221bef29f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/bitfield-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/bitfield-1.c
@@ -1,5 +1,5 @@
  /* This test is executed only if the execution engine supports CMSE 
instructions.  */
-/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=[cmse_sgstubs]" } */
  
  typedef struct

  {
diff --git a/gcc/testsuite/gcc.target/arm/cmse/bitfield-2.c 
b/gcc/testsuite/gcc.target/arm/cmse/bitfield-2.c
index 7a794d44644..79e9a3efc93 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/bitfield-2.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/bitfield-2.c
@@ -1,5 +1,5 @@
  /* This test is executed only if the execution engine supports CMSE 
instructions.  */
-/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=[cmse_sgstubs]" } */
  
  typedef struct

  {
diff --git a/gcc/testsuite/gcc.target/arm/cmse/bitfield-3.c 
b/gcc/testsuite/gcc.target/arm/cmse/bitfield-3.c
index 5875f8dff48..d621a802ee1 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/bitfield-3.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/bitfield-3.c
@@ -1,5 +1,5 @@
  /* This test is executed only if the execution engine supports CMSE 
instructions.  */
-/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=[cmse_sgstubs]" } */
  
  typedef struct

  {
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
index 08e89bff637..bbea9358870 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
@@ -1,5 +1,5 @@
  /* This test is executed only if the execution engine supports CMSE 
instructions.  */
-/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=[cmse_sgstubs]" } */
  
  #include 

  #include 
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse.exp 
b/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
index 436dd71ef89..1df5d56c6d5 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
@@ -44,6 +44,17 @@ if {[is-effective-target arm_cmse_hw]} then {
  set saved-lto_torture_options ${LTO_TORTURE_OPTIONS}
  set LTO_TORTURE_OPTIONS ""
  
+# Return the start address of the .gnu.sgstubs section.

+proc cmse_sgstubs {} {
+# Allow to override the location of .gnu.sgstubs section.
+set tboard [target_info name]
+if {[board_info $tboard exists cmse_sgstubs]} {
+   return [board_info $tboard cmse_sgstubs]
+}
+
+return "0x0040"
+}
+
  # These are for both baseline and mainline.
  gcc-dg-runtest [lsort [glob $srcdir/$subdir/*.c]] \
"" $DEFAULT_CFLAGS
diff --git a/gcc/testsuite/gcc.target/arm/cmse/struct-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/struct-1.c
index 75a99f487e7..bebd059b13f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/struct-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/struct-1.c
@@ -1,5 +1,5 @@
  /* This test is executed only if the execution engine supports CMSE 
instructions.  */
-/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=[cmse_sgstubs]" } */
  
  typedef struct

  {


PING^1 [PATCH] cpp/remap: Only override if string matched

2022-11-02 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604062.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-20 22:48, Torbjörn SVENSSON wrote:

For systems with HAVE_DOS_BASED_FILE_SYSTEM set, only override the
pointer if the backslash pattern matches.

Output without this patch:
.../gcc/testsuite/gcc.dg/cpp/pr71681-2.c:5:10: fatal error: a/t2.h: No such 
file or directory

With patch applied, no output and the test case succeeds.

libcpp/ChangeLog

* files.cc: Ensure pattern matches before use.

Signed-off-by: Torbjörn SVENSSON 
---
  libcpp/files.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 24208f7b0f8..a18b1caf48d 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -1833,7 +1833,7 @@ remap_filename (cpp_reader *pfile, _cpp_file *file)
  #ifdef HAVE_DOS_BASED_FILE_SYSTEM
{
const char *p2 = strchr (fname, '\\');
-   if (!p || (p > p2))
+   if (!p || (p2 && p > p2))
  p = p2;
}
  #endif
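
The logic of the fix can be sketched in isolation as follows. This is a simplified stand-alone C++ helper, not the actual remap_filename code; the name first_dir_separator is hypothetical. Without the p2 check, a file name that contains '/' but no backslash would have its valid match replaced by a null pointer:

```cpp
#include <cstring>

// Return a pointer to the first directory separator ('/' or '\\') in FNAME,
// or nullptr if FNAME contains neither.
const char *
first_dir_separator (const char *fname)
{
  const char *p = std::strchr (fname, '/');
  const char *p2 = std::strchr (fname, '\\');
  // Let the backslash match replace P only when it actually matched and
  // comes earlier in the string, or when there was no '/' at all.
  if (!p || (p2 && p > p2))
    p = p2;
  return p;
}
```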


Re: [wwwdocs] gcc-13: riscv: Document the Zawrs support

2022-11-02 Thread Philipp Tomsich
Applied to gcc-wwwdocs/master. Thanks!
Philipp.

On Wed, 2 Nov 2022 at 17:12, Kito Cheng  wrote:
>
> LGTM, thanks!
>
> On Wed, Nov 2, 2022 at 7:59 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch documents the new RISC-V Zawrs support.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  htdocs/gcc-13/changes.html | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 7c6bfa6e..5e6e054b 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -261,7 +261,10 @@ a work-in-progress.
> >
> >  
> >
> > -
> > +RISC-V
> > +
> > +New ISA extension support for zawrs.
> > +
> >
> >  
> >
> > --
> > 2.38.1
> >


[PATCH] Extend optimization for integer bit test on __atomic_fetch_[or|and]_*

2022-11-02 Thread H.J. Lu via Gcc-patches
Extend optimization for

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
_4 = _5 >= 0;

to

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
if (_5 >= 0)
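
For context, a source-level pattern of the kind that may lead to GIMPLE like the above is a fetch_or on bit 31 whose result is tested for that same bit. Testing the sign bit of a 32-bit value is equivalent to a signed comparison against zero, which is presumably why the dumps compare _5 with 0. The sketch below is an illustration only; the names SIGN_BIT and try_set_sign_bit are hypothetical and do not come from the patch:

```cpp
#include <atomic>

// Bit 31 of a 32-bit unsigned value (assumes a 32-bit unsigned int).
constexpr unsigned SIGN_BIT = 1U << 31;

// Atomically set bit 31 and report whether it was clear beforehand.
bool
try_set_sign_bit (std::atomic<unsigned> &a)
{
  return !(a.fetch_or (SIGN_BIT) & SIGN_BIT);
}
```

When such a helper is used directly as a loop or branch condition, the comparison appears as a GIMPLE_COND rather than as an assignment, which is the case the patch adds.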

gcc/

PR middle-end/102566
* tree-ssa-ccp.cc (optimize_atomic_bit_test_and): Also handle
if (_5 < 0) and if (_5 >= 0).

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-7.C
---
 gcc/testsuite/g++.target/i386/pr102566-7.C | 22 ++
 gcc/tree-ssa-ccp.cc| 84 ++
 2 files changed, 91 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr102566-7.C

diff --git a/gcc/testsuite/g++.target/i386/pr102566-7.C 
b/gcc/testsuite/g++.target/i386/pr102566-7.C
new file mode 100644
index 000..ce90214f33d
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr102566-7.C
@@ -0,0 +1,22 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2" } */
+
+#include <atomic>
+
+template
+void lock_bts(std::atomic &a) { while (!(a.fetch_or(b) & b)); }
+template
+void lock_btr(std::atomic &a) { while (a.fetch_and(~b) & b); }
+template
+void lock_btc(std::atomic &a) { while (a.fetch_xor(b) & b); }
+template void lock_bts<1U<<30>(std::atomic &a);
+template void lock_btr<1U<<30>(std::atomic &a);
+template void lock_btc<1U<<30>(std::atomic &a);
+template void lock_bts<1U<<31>(std::atomic &a);
+template void lock_btr<1U<<31>(std::atomic &a);
+template void lock_btc<1U<<31>(std::atomic &a);
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 2 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 2 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcl" 2 } } */
+/* { dg-final { scan-assembler-not "cmpxchg" } } */
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 9778e776cf2..3a4b6bc1118 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -3471,17 +3471,35 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator 
*gsip,
{
  gimple *use_nop_stmt;
  if (!single_imm_use (use_lhs, &use_p, &use_nop_stmt)
- || !is_gimple_assign (use_nop_stmt))
+ || (!is_gimple_assign (use_nop_stmt)
+ && gimple_code (use_nop_stmt) != GIMPLE_COND))
return false;
- tree use_nop_lhs = gimple_assign_lhs (use_nop_stmt);
- rhs_code = gimple_assign_rhs_code (use_nop_stmt);
- if (rhs_code != BIT_AND_EXPR)
+ /* Handle both
+_4 = _5 < 0;
+and
+if (_5 < 0)
+  */
+ tree use_nop_lhs = nullptr;
+ rhs_code = ERROR_MARK;
+ if (is_gimple_assign (use_nop_stmt))
{
- if (TREE_CODE (use_nop_lhs) == SSA_NAME
+ use_nop_lhs = gimple_assign_lhs (use_nop_stmt);
+ rhs_code = gimple_assign_rhs_code (use_nop_stmt);
+   }
+ if (!use_nop_lhs || rhs_code != BIT_AND_EXPR)
+   {
+ /* Also handle
+if (_5 < 0)
+  */
+ if (use_nop_lhs
+ && TREE_CODE (use_nop_lhs) == SSA_NAME
  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use_nop_lhs))
return false;
- if (rhs_code == BIT_NOT_EXPR)
+ if (use_nop_lhs && rhs_code == BIT_NOT_EXPR)
{
+ /* Handle
+_7 = ~_2;
+  */
  g = convert_atomic_bit_not (fn, use_nop_stmt, lhs,
  mask);
  if (!g)
@@ -3512,14 +3530,31 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator 
*gsip,
}
  else
{
- if (TREE_CODE (TREE_TYPE (use_nop_lhs)) != BOOLEAN_TYPE)
-   return false;
+ tree cmp_rhs1, cmp_rhs2;
+ if (use_nop_lhs)
+   {
+ /* Handle
+_4 = _5 < 0;
+  */
+ if (TREE_CODE (TREE_TYPE (use_nop_lhs))
+ != BOOLEAN_TYPE)
+   return false;
+ cmp_rhs1 = gimple_assign_rhs1 (use_nop_stmt);
+ cmp_rhs2 = gimple_assign_rhs2 (use_nop_stmt);
+   }
+ else
+   {
+ /* Handle
+if (_5 < 0)
+  */
+ rhs_code = gimple_cond_code (use_nop_stmt);
+ cmp_rhs1 = gimple_cond_lhs (use_nop_stmt);
+ cmp_rhs2 = gimple_cond_rhs (use_nop_stmt);
+   }
  if (rhs_code != GE_EXPR && rhs_code != LT_EXPR)
return false;
- tree cmp_rhs1 = gimple_assign_rhs1 (use_nop_stmt);
  if (use_lhs != cmp_rhs1)
return false;
- tree cmp_rhs2 = gimple_assign_rhs2 (use_nop_stmt);
   

Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Philipp Tomsich
Applied to master (with a fixed-up commit message), thanks!
Note that Zawrs was approved for ratification by the RISC-V
BoD on Oct 20th.

--Philipp.


On Thu, 27 Oct 2022 at 22:51, Palmer Dabbelt  wrote:
>
> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu wrote:
> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
> > christoph.muell...@vrull.eu> wrote:
> >
> >> From: Christoph Muellner 
> >>
> >> This patch adds support for the Zawrs ISA extension.
> >> The patch depends on the corresponding Binutils patch
> >> to be usable (see [1])
> >>
> >> The specification can be found here:
> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> >>
> >> Note, that the Zawrs extension is not frozen or ratified yet.
> >> Therefore this patch is an RFC and not intended to get merged.
> >>
> >
> > Sorry, forgot to update this part:
> > The Zawrs extension is frozen but not ratified.
> > Let me know if I should send a v2 for this change of the commit msg.
>
> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
> the NEWS entry too.  I also don't see any build/test results.
>
> Thanks!
>
> > Binutils support has been merged recently:
> >
> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
> >
> >
> >>
> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
> >>
> >> gcc/ChangeLog:
> >>
> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
> >> (TARGET_ZAWRS): New.
> >> * config/riscv/riscv.opt: New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/zawrs.c: New test.
> >>
> >> Signed-off-by: Christoph Muellner 
> >> ---
> >>  gcc/common/config/riscv/riscv-common.cc |  4 
> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
> >>  gcc/config/riscv/riscv.opt  |  3 +++
> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
> >>  4 files changed, 23 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
> >>
> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> >> b/gcc/common/config/riscv/riscv-common.cc
> >> index d6404a01205..4b7f777c103 100644
> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
> >> riscv_ext_version_table[] =
> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >>
> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
> >> +
> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
> >> riscv_ext_flag_table[] =
> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> >>
> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
> >> +
> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
> >> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> >> index 1dfe8c89209..25fd85b09b1 100644
> >> --- a/gcc/config/riscv/riscv-opts.h
> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >>
> >> +#define MASK_ZAWRS   (1 << 0)
> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> >> +
> >>  #define MASK_ZBA  (1 << 0)
> >>  #define MASK_ZBB  (1 << 1)
> >>  #define MASK_ZBC  (1 << 2)
> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> index 426ea95cd14..7c3ca48d1cc 100644
> >> --- a/gcc/config/riscv/riscv.opt
> >> +++ b/gcc/config/riscv/riscv.opt
> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
> >>  TargetVariable
> >>  int riscv_zi_subext
> >>
> >> +TargetVariable
> >> +int riscv_za_subext
> >> +
> >>  TargetVariable
> >>  int riscv_zb_subext
> >>
> >> diff --git a/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> new file mode 100644
> >> index 000..0b7e2662343
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc_zawrs" { target { rv64 } } } */
> >> +/* { dg-options "-march=rv32gc_zawrs" { target { rv32 } } } */
> >> +
> >> +#ifndef __riscv_zawrs
> >> +#error Feature macro not defined
> >> +#endif
> >> +
> >> +int
> >> +foo (int a)
> >> +{
> >> +  return a;
> >> +}
> >> --
> >> 2.37.3
> >>
> >>


Add 'libgomp.oacc-fortran/declare-allocatable-1.f90' (was: [gomp4] add support for fortran allocate support with declare create)

2022-11-02 Thread Thomas Schwinge
Hi!

On 2017-04-05T08:23:58-0700, Cesar Philippidis  wrote:
> This patch implements the OpenACC 2.5 behavior of fortran allocate on
> variables marked with declare create as defined in Section 2.13.2 in the
> OpenACC spec.

That functionality is still missing in the GCC master branch; however, a test
case included in that submission here:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
> @@ -0,0 +1,211 @@
> +! Test declare create with allocatable arrays.

... is useful in a different (though related) context that I'm currently
working on.  Having applied the following changes:

  - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
changes).
  - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
  - Add scanning for OpenACC compiler diagnostics.
  - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).

..., I've then pushed to master branch
commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
"Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 5 Apr 2017 08:23:58 -0700
Subject: [PATCH] Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New.

Co-authored-by: Thomas Schwinge 
---
 .../declare-allocatable-1.f90 | 268 ++
 1 file changed, 268 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
new file mode 100644
index 000..1c8ccd9f61f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
@@ -0,0 +1,268 @@
+! Test OpenACC 'declare create' with allocatable arrays.
+
+! { dg-do run }
+
+!TODO-OpenACC-declare-allocate
+! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
+! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
+! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
+! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
+
+!TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
+
+! { dg-additional-options -fopt-info-all-omp }
+! { dg-additional-options -foffload=-fopt-info-all-omp }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+! { dg-additional-options -foffload=--param=openacc-privatization=noisy }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable '[Di]\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
+! { dg-additional-options -Wopenacc-parallelism }
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c 0] }
+! { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
+
+module vars
+  implicit none
+  integer, parameter :: n = 100
+  real*8, allocatable :: b(:)
+ !$acc declare create (b)
+end module vars
+
+program test
+  use vars
+  use openacc
+  implicit none
+  real*8 :: a
+  integer :: i
+
+  interface
+ subroutine sub1
+   !$acc routine gang
+ end subroutine sub1
+
+ subroutine sub2
+ end subroutine sub2
+
+ real*8 function fun1 (ix)
+   integer ix
+   !$acc routine seq
+ end function fun1
+
+ real*8 function fun2 (ix)
+   integer ix
+   !$acc routine seq
+ end function fun2
+  end interface
+
+  if (allocated (b)) error stop
+
+  ! Test local usage of an allocated declared array.
+
+  allocate (b(n))
+
+  if (.not.allocated (b)) error stop
+  if (.not.acc_is_present (b)) error stop
+
+  a = 2.0
+
+  !$acc parallel loop ! { dg-line l[incr c] }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l$c }
+  !   { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } l$c }
+  !   { dg-note {variable 'i' adjusted for OpenACC privatization level: 'vector'} {} { target { ! openacc_host_selected } } l$c }
+  ! { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l$c }
+  ! { dg-optimized {as

Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90' (was: Add 'libgomp.oacc-fortran/declare-allocatable-1.f90')

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:04:56+0100, I wrote:
> On 2017-04-05T08:23:58-0700, Cesar Philippidis  wrote:
>> This patch implements the OpenACC 2.5 behavior of fortran allocate on
>> variables marked with declare create as defined in Section 2.13.2 in the
>> OpenACC spec.
>
> That functionality is still missing in GCC master branch, however a test
> case included in that submission here:
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>> @@ -0,0 +1,211 @@
>> +! Test declare create with allocatable arrays.
>
> ... is useful in a different (though related) context that I'm currently
> working on.  Having applied the following changes:
>
>   - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
> changes).
>   - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
>   - Add scanning for OpenACC compiler diagnostics.
>   - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).
>
> ..., I've then pushed to master branch
> commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
> "Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
> @@ -0,0 +1,268 @@
> +! Test OpenACC 'declare create' with allocatable arrays.
> +
> +! { dg-do run }
> +
> +!TODO-OpenACC-declare-allocate
> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
> behavior".
> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
> +
> +[...]

Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
work around (as seen in real-world code), I've pushed to master branch
commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
"Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'", see
attached.


Grüße
 Thomas


From 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 14 Oct 2022 17:36:51 +0200
Subject: [PATCH] Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'

... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
for missing support for OpenACC "Changes from Version 2.0 to 2.5":
"The 'declare create' directive with a Fortran 'allocatable' has new behavior".
Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
manually.

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90:
	New.
---
 ...ble-1.f90 => declare-allocatable-1-runtime.f90} | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-1.f90 => declare-allocatable-1-runtime.f90} (96%)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
similarity index 96%
copy from libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
copy to libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
index 1c8ccd9f61f..e4cb9c378a3 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
@@ -3,10 +3,10 @@
 ! { dg-do run }
 
 !TODO-OpenACC-declare-allocate
-! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
 ! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
 ! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
-! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
+! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
+! manually.
 
 !TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
 
@@ -67,6 +67,7 @@ program test
   ! Test local usage of an allocated declared array.
 
   allocate (b(n))
+  call acc_create (b)
 
   if (.not.allocated (b)) error stop
   if (.not.acc_is_present (b)) error stop
@@ -91,12 +92,14 @@ program test
  if (b(i) /= i*a) error stop
   end do
 
+  call acc_delete (b)
   deallocate (b)
 
   ! Test the usage of an allocated declared array inside an acc
   ! routine subroutine.
 
   allocate (b(n))
+  call acc_create (b)
 
   if (.not.allocated (b)) error stop
   if (.not.acc_is_present (b)) error stop
@@ -114,6 +117,7 @@ program test
  if (b(i) /= i*2) error stop
   end do
 
+  call acc_delete (b)
   deallocate (b)
 
   ! Test the usage of an allocated declared array inside a host
@@ -129,6 +133,7 @@ program test
  if (b(i) /= 1.0) error stop
   end do
 
+  call acc_delete (b)
   deallocate (b)
 
   if (allocated (b)) error stop
@@ -137,6 +142

Add 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:10:54+0100, I wrote:
> On 2022-11-02T21:04:56+0100, I wrote:
>> On 2017-04-05T08:23:58-0700, Cesar Philippidis  
>> wrote:
>>> This patch implements the OpenACC 2.5 behavior of fortran allocate on
>>> variables marked with declare create as defined in Section 2.13.2 in the
>>> OpenACC spec.
>>
>> That functionality is still missing in GCC master branch, however a test
>> case included in that submission here:
>>
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>> @@ -0,0 +1,211 @@
>>> +! Test declare create with allocatable arrays.
>>
>> ... is useful in a different (though related) context that I'm currently
>> working on.  Having applied the following changes:
>>
>>   - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
>> changes).
>>   - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
>>   - Add scanning for OpenACC compiler diagnostics.
>>   - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).
>>
>> ..., I've then pushed to master branch
>> commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
>> "Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>> @@ -0,0 +1,268 @@
>> +! Test OpenACC 'declare create' with allocatable arrays.
>> +
>> +! { dg-do run }
>> +
>> +!TODO-OpenACC-declare-allocate
>> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>> behavior".
>> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
>> +
>> +[...]
>
> Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
> work around (as seen in real-world code), I've pushed to master branch
> commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
> "Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'"

> ... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
> for missing support for OpenACC "Changes from Version 2.0 to 2.5":
> "The 'declare create' directive with a Fortran 'allocatable' has new 
> behavior".
> Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
> manually.

A similar test case, but with different focus, I've pushed to master
branch in commit abeaf3735fe2568b9d5b8096318da866b1fe1e5c
"Add 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'",
see attached.


Grüße
 Thomas


From abeaf3735fe2568b9d5b8096318da866b1fe1e5c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 26 Oct 2022 23:47:29 +0200
Subject: [PATCH] Add
 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90:
	New.
---
 ...allocatable-array_descriptor-1-runtime.f90 | 402 ++
 1 file changed, 402 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
new file mode 100644
index 000..b27f312631d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
@@ -0,0 +1,402 @@
+! Test OpenACC 'declare create' with allocatable arrays.
+
+! { dg-do run }
+
+! Note that we're not testing OpenACC semantics here, but rather documenting
+! current GCC behavior, specifically, behavior concerning updating of
+! host/device array descriptors.
+! { dg-skip-if n/a { *-*-* } { -DACC_MEM_SHARED=1 } }
+
+!TODO-OpenACC-declare-allocate
+! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
+! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
+! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
+! manually.
+
+
+!TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
+
+
+!TODO OpenACC 'serial' vs. GCC/nvptx:
+!TODO { dg-prune-output {using 'vector_length \(32\)', ignoring 1} }
+
+
+! { dg-additional-options -fdump-tree-original }
+! { dg-additional-options -fdump-tree-gimple }
+
+
+module vars
+  implicit none
+  integer, parameter :: n1_lb = -3
+  integer, parameter :: n1_ub = 6
+  integer, parameter :: n2_lb = -
+  integer, parameter :: n2_ub = 2
+
+  integer, allocatable :: b(:)
+  !$acc declare create (b)
+
+end module vars
+
+program test
+  use vars
+  use openacc
+  implicit none
+  integer :: i
+
+  ! Identif

Support OpenACC 'declare create' with Fortran allocatable arrays, part I [PR106643]

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:15:31+0100, I wrote:
> On 2022-11-02T21:10:54+0100, I wrote:
>> On 2022-11-02T21:04:56+0100, I wrote:
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>> @@ -0,0 +1,268 @@
>>> +! Test OpenACC 'declare create' with allocatable arrays.
>>> +
>>> +! { dg-do run }
>>> +
>>> +!TODO-OpenACC-declare-allocate
>>> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
>>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>>> behavior".
>>> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
>>> +
>>> +[...]
>>
>> Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
>> work around (as seen in real-world code), I've pushed to master branch
>> commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
>> "Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'"
>
>> ... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
>> for missing support for OpenACC "Changes from Version 2.0 to 2.5":
>> "The 'declare create' directive with a Fortran 'allocatable' has new 
>> behavior".
>> Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
>> manually.
>
> A similar test case, but with different focus, I've pushed to master
> branch in commit abeaf3735fe2568b9d5b8096318da866b1fe1e5c
> "Add 
> 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'",
> see attached.

> --- /dev/null
> +++ 
> b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
> @@ -0,0 +1,402 @@
> +! Test OpenACC 'declare create' with allocatable arrays.
> +
> +! { dg-do run }
> +
> +! Note that we're not testing OpenACC semantics here, but rather documenting
> +! current GCC behavior, specifically, behavior concerning updating of
> +! host/device array descriptors.
> +! { dg-skip-if n/a { *-*-* } { -DACC_MEM_SHARED=1 } }
> +
> +!TODO-OpenACC-declare-allocate
> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
> behavior".
> +! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
> +! manually.

If instead of calling 'acc_create'/'acc_delete' we'd like to use
'!$acc enter data create'/'!$acc exit data delete', we run into

"[gfortran + OpenACC] Allocate in module causes refcount error".
Pushed to master branch commit da8e0e1191c5512244a752b30dea0eba83e3d10c
"Support OpenACC 'declare create' with Fortran allocatable arrays, part I 
[PR106643]",
see attached.


Grüße
 Thomas


From da8e0e1191c5512244a752b30dea0eba83e3d10c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 27 Oct 2022 21:52:07 +0200
Subject: [PATCH] Support OpenACC 'declare create' with Fortran allocatable
 arrays, part I [PR106643]

	PR libgomp/106643
	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Support
	OpenACC 'declare create' with Fortran allocatable arrays, part I.
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1-directive.f90:
	New.
	* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90:
	New.
---
 libgomp/oacc-mem.c| 28 +--
 ...90 => declare-allocatable-1-directive.f90} | 14 --
 ...ocatable-array_descriptor-1-directive.f90} | 12 
 3 files changed, 44 insertions(+), 10 deletions(-)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-1.f90 => declare-allocatable-1-directive.f90} (95%)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-array_descriptor-1-runtime.f90 => declare-allocatable-array_descriptor-1-directive.f90} (98%)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 73b2710c2b8..ba010fddbb3 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1150,8 +1150,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	}
   else if (n && groupnum > 1)
 	{
-	  assert (n->refcount != REFCOUNT_INFINITY
-		  && n->refcount != REFCOUNT_LINK);
+	  assert (n->refcount != REFCOUNT_LINK);
 
 	  for (size_t j = i + 1; j <= group_last; j++)
 	if ((kinds[j] & 0xff) == GOMP_MAP_ATTACH)
@@ -1166,6 +1165,31 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  bool processed = false;
 
 	  struct target_mem_desc *tgt = n->tgt;
+
+	  /* Arrange so that OpenACC 'declare' code à la PR106643
+	 "[gfortran + OpenACC] Allocate in module causes refcount error"
+	 has a chance to work.  */
+	  if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
+	  && tgt->list_count == 0)
+	{
+	  /* 'declare target'

Re: [PATCH] libstdc++: Refactor implementation of operator+ for std::string

2022-11-02 Thread Will Hawkins
Just wanted to see if there was anything else I can do to help move
this over the finish line! Thanks for all the work that you all do!

Sincerely,
Will
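
For readers skimming the thread, the single-allocation idea in the quoted
patch below can be sketched outside libstdc++ roughly as follows.  This is
only an illustration: 'concat_once' is a hypothetical name, it uses a
default-constructed allocator, and it is not the '__str_concat' helper from
the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the helper described in the patch: reserve the
// final size once, so that neither append() has to reallocate.
template<typename Str>
Str
concat_once(const typename Str::value_type* lhs,
            typename Str::size_type lhs_len,
            const typename Str::value_type* rhs,
            typename Str::size_type rhs_len)
{
  Str result;                          // default allocator (an assumption)
  result.reserve(lhs_len + rhs_len);   // the single allocation
  result.append(lhs, lhs_len);
  result.append(rhs, rhs_len);
  return result;
}
```

Calling 'concat_once<std::string>("foo", 3, "bar", 3)' yields "foobar"; the
point is only that the capacity is settled before either operand is copied.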

On Wed, Oct 19, 2022 at 8:06 PM Will Hawkins  wrote:
>
> Sorry for the delay. Tested on x86-64 Linux.
>
> -->8--
>
> After consultation with Jonathan, it seemed like a good idea to create a
> single function that performed one-allocation string concatenation that
> could be used by various different versions of operator+. This patch adds
> such a function and calls it from the relevant implementations.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h:
> Add common function that performs single-allocation string
> concatenation. (__str_concat)
> Use __str_concat to perform optimized operator+, where relevant.
> * include/bits/basic_string.tcc:
> Remove single-allocation implementation of operator+.
>
> Signed-off-by: Will Hawkins 
> ---
>  libstdc++-v3/include/bits/basic_string.h   | 66 --
>  libstdc++-v3/include/bits/basic_string.tcc | 41 --
>  2 files changed, 49 insertions(+), 58 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h 
> b/libstdc++-v3/include/bits/basic_string.h
> index cd244191df4..9c2b57f5a1d 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -3485,6 +3485,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>  _GLIBCXX_END_NAMESPACE_CXX11
>  #endif
>
> +  template<typename _Str>
> +_GLIBCXX20_CONSTEXPR
> +inline _Str
> +__str_concat(typename _Str::value_type const* __lhs,
> +typename _Str::size_type __lhs_len,
> +typename _Str::value_type const* __rhs,
> +typename _Str::size_type __rhs_len,
> +typename _Str::allocator_type const& __a)
> +{
> +  typedef typename _Str::allocator_type allocator_type;
> +  typedef __gnu_cxx::__alloc_traits<allocator_type> _Alloc_traits;
> +  _Str __str(_Alloc_traits::_S_select_on_copy(__a));
> +  __str.reserve(__lhs_len + __rhs_len);
> +  __str.append(__lhs, __lhs_len);
> +  __str.append(__rhs, __rhs_len);
> +  return __str;
> +}
> +
>// operator+
>/**
> *  @brief  Concatenate two strings.
> @@ -3494,13 +3512,14 @@ _GLIBCXX_END_NAMESPACE_CXX11
> */
>template<typename _CharT, typename _Traits, typename _Alloc>
>  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> -basic_string<_CharT, _Traits, _Alloc>
> +inline basic_string<_CharT, _Traits, _Alloc>
>  operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
>   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
>  {
> -  basic_string<_CharT, _Traits, _Alloc> __str(__lhs);
> -  __str.append(__rhs);
> -  return __str;
> +  typedef basic_string<_CharT, _Traits, _Alloc> _Str;
> +  return std::__str_concat<_Str>(__lhs.c_str(), __lhs.size(),
> +__rhs.c_str(), __rhs.size(),
> +__lhs.get_allocator());
>  }
>
>/**
> @@ -3511,9 +3530,16 @@ _GLIBCXX_END_NAMESPACE_CXX11
> */
>template<typename _CharT, typename _Traits, typename _Alloc>
>  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> -basic_string<_CharT,_Traits,_Alloc>
> +inline basic_string<_CharT,_Traits,_Alloc>
>  operator+(const _CharT* __lhs,
> - const basic_string<_CharT,_Traits,_Alloc>& __rhs);
> + const basic_string<_CharT,_Traits,_Alloc>& __rhs)
> +{
> +  __glibcxx_requires_string(__lhs);
> +  typedef basic_string<_CharT, _Traits, _Alloc> _Str;
> +  return std::__str_concat<_Str>(__lhs, _Traits::length(__lhs),
> +__rhs.c_str(), __rhs.size(),
> +__rhs.get_allocator());
> +}
>
>/**
> *  @brief  Concatenate character and string.
> @@ -3523,8 +3549,14 @@ _GLIBCXX_END_NAMESPACE_CXX11
> */
>template<typename _CharT, typename _Traits, typename _Alloc>
>  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> -basic_string<_CharT,_Traits,_Alloc>
> -operator+(_CharT __lhs, const basic_string<_CharT,_Traits,_Alloc>& 
> __rhs);
> +inline basic_string<_CharT,_Traits,_Alloc>
> +operator+(_CharT __lhs, const basic_string<_CharT,_Traits,_Alloc>& __rhs)
> +{
> +  typedef basic_string<_CharT, _Traits, _Alloc> _Str;
> +  return std::__str_concat<_Str>(__builtin_addressof(__lhs), 1,
> +__rhs.c_str(), __rhs.size(),
> +__rhs.get_allocator());
> +}
>
>/**
> *  @brief  Concatenate string and C string.
> @@ -3538,11 +3570,12 @@ _GLIBCXX_END_NAMESPACE_CXX11
>  operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
>   const _CharT* __rhs)
>  {
> -  basic_string<_CharT, _Traits, _Alloc> __str(__lhs);
> -  __str.append(__rhs);
> -  return __str;
> +  __glibcxx_requires_string(__rhs);
> +  typedef basic_string<_CharT, _Traits, _Alloc> _Str;
> +  return std::__str_concat<_Str>(__lhs.c_str(), __lhs.size(),
> +   

[committed] libstdc++: Remove more redundant union members

2022-11-02 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

We don't need these 'unused' members because they're never used, and a
union with a single variant member is fine.
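
As a rough sketch of why the single-member union is enough (the 'Counter'
type and the destructor comment are illustrative assumptions, not copied
from the files touched by this patch): the constructor activates 'obj', and
because the wrapper's destructor is user-provided and empty, 'obj' is never
destroyed.

```cpp
#include <cassert>

// Hypothetical payload type; counts how often its destructor runs.
struct Counter
{
  int value;
  constexpr Counter() : value(42) { }
  ~Counter() { ++destroyed; }
  static int destroyed;
};
int Counter::destroyed = 0;

template<typename T>
  struct constant_init
  {
    union {
      T obj;   // the only variant member; no dummy 'unused' member needed
    };
    constexpr constant_init() : obj() { }
    ~constant_init() { /* deliberately does not destroy obj */ }
  };
```

Creating and destroying a 'constant_init<Counter>' leaves
'Counter::destroyed' at zero, which is the behavior the wrapper exists for.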

libstdc++-v3/ChangeLog:

* libsupc++/eh_globals.cc (constant_init::unused): Remove.
* src/c++11/system_error.cc (constant_init::unused): Remove.
* src/c++17/memory_resource.cc (constant_init::unused): Remove.
---
 libstdc++-v3/libsupc++/eh_globals.cc  | 1 -
 libstdc++-v3/src/c++11/system_error.cc| 1 -
 libstdc++-v3/src/c++17/memory_resource.cc | 1 -
 3 files changed, 3 deletions(-)

diff --git a/libstdc++-v3/libsupc++/eh_globals.cc 
b/libstdc++-v3/libsupc++/eh_globals.cc
index 0aadb692a96..12abfc10521 100644
--- a/libstdc++-v3/libsupc++/eh_globals.cc
+++ b/libstdc++-v3/libsupc++/eh_globals.cc
@@ -73,7 +73,6 @@ namespace
   struct constant_init
   {
 union {
-  unsigned char unused;
   __cxa_eh_globals obj;
 };
 constexpr constant_init() : obj() { }
diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index 8c13642408d..5707e6b61d6 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -49,7 +49,6 @@ namespace
 struct constant_init
 {
   union {
-   unsigned char unused;
T obj;
   };
   constexpr constant_init() : obj() { }
diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 8bc55a69f1f..651d07489aa 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -82,7 +82,6 @@ namespace pmr
   struct constant_init
   {
union {
- unsigned char unused;
  T obj;
};
constexpr constant_init() : obj() { }
-- 
2.38.1



Support OpenACC 'declare create' with Fortran allocatable arrays, part II [PR106643, PR96668] (was: Support OpenACC 'declare create' with Fortran allocatable arrays, part I [PR106643])

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:22:25+0100, I wrote:
> On 2022-11-02T21:15:31+0100, I wrote:
>> On 2022-11-02T21:10:54+0100, I wrote:
>>> On 2022-11-02T21:04:56+0100, I wrote:
>>>> --- /dev/null
>>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>>> @@ -0,0 +1,268 @@
>>>> +! Test OpenACC 'declare create' with allocatable arrays.
>>>> +
>>>> +! { dg-do run }
>>>> +
>>>> +!TODO-OpenACC-declare-allocate
>>>> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
>>>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>>>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>>>> behavior".
>>>> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
>>>> +
>>>> +[...]
>>>
>>> Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
>>> work around (as seen in real-world code), I've pushed to master branch
>>> commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
>>> "Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'"
>>
>>> ... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
>>> for missing support for OpenACC "Changes from Version 2.0 to 2.5":
>>> "The 'declare create' directive with a Fortran 'allocatable' has new 
>>> behavior".
>>> Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
>>> manually.
>>
>> A similar test case, but with different focus, I've pushed to master
>> branch in commit abeaf3735fe2568b9d5b8096318da866b1fe1e5c
>> "Add 
>> 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'",
>> see attached.
>
>> --- /dev/null
>> +++ 
>> b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
>> @@ -0,0 +1,402 @@
>> +! Test OpenACC 'declare create' with allocatable arrays.
>> +
>> +! { dg-do run }
>> +
>> +! Note that we're not testing OpenACC semantics here, but rather documenting
>> +! current GCC behavior, specifically, behavior concerning updating of
>> +! host/device array descriptors.
>> +! { dg-skip-if n/a { *-*-* } { -DACC_MEM_SHARED=1 } }
>> +
>> +!TODO-OpenACC-declare-allocate
>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>> behavior".
>> +! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
>> +! manually.
>
> If instead of calling 'acc_create'/'acc_delete' we'd like to use
> '!$acc enter data create'/'!$acc exit data delete', we run into
> 
> "[gfortran + OpenACC] Allocate in module causes refcount error".
> Pushed to master branch commit da8e0e1191c5512244a752b30dea0eba83e3d10c
> "Support OpenACC 'declare create' with Fortran allocatable arrays, part I 
> [PR106643]",
> see attached.

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -1166,6 +1165,31 @@ goacc_enter_data_internal (struct gomp_device_descr 
> *acc_dev, size_t mapnum,
> bool processed = false;
>
> struct target_mem_desc *tgt = n->tgt;
> +
> +   /* Arrange so that OpenACC 'declare' code à la PR106643
> +  "[gfortran + OpenACC] Allocate in module causes refcount error"
> +  has a chance to work.  */
> +   if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
> +   && tgt->list_count == 0)
> + {
> +   /* 'declare target'.  */
> +   assert (n->refcount == REFCOUNT_INFINITY);
> +
> +   for (size_t k = 1; k < groupnum; k++)
> + {
> +   /* The only thing we expect to see here.  */
> +   assert ((kinds[i + k] & 0xff) == GOMP_MAP_POINTER);
> + }
> +
> +   /* Given that 'goacc_exit_data_internal'/'goacc_exit_datum_1'
> +  will always see 'n->refcount == REFCOUNT_INFINITY',
> +  there's no need to adjust 'n->dynamic_refcount' here.  */
> +
> +   processed = true;
> + }

To make slightly more interesting (real-world) test cases work, we also
have to process the 'GOMP_MAP_TO_PSET', 'GOMP_MAP_POINTER' here.
Tobias had implemented such a thing in context of OpenMP PR96668
"[OpenMP] Re-mapping allocated but previously unallocated allocatable does not 
work"
a while ago, and we may do similar here.  Side note: in the first version
of my changes, I had actually here in
'libgomp/oacc-mem.c:goacc_enter_data_internal' re-implemented the
corresponding -- "somewhat ugly" -- logic, when at some point I realized
that I instead could simply call into the existing code, greatly reducing
the complexity here...  Pushed to master branch
commit f6ce1e77bbf5d3a096f52e674bfd7354c6537d10
"Support OpenACC 'declare create' with Fortran allocatable arrays, part II 
[PR106643, PR96668]",
see attached.


Grüße
 Thomas



Re: [PATCH, v2] Fortran: ordering of hidden procedure arguments [PR107441]

2022-11-02 Thread Harald Anlauf via Gcc-patches

On 02.11.22 at 18:20, Mikael Morin wrote:

> Unfortunately no, the coarray case works, but the other problem remains.
> The type problem is not visible in the definition of S, it is in the
> declaration of S's prototype in P.
>
> S is defined as:
>
> void s (character(kind=1)[1:_c] & restrict c, integer(kind=4) o,
> logical(kind=1) _o, integer(kind=8) _c)
> {
> ...
> }
>
> but P has:
>
> void p ()
> {
>    static void s (character(kind=1)[1:] & restrict, integer(kind=4),
> integer(kind=8), logical(kind=1));
>    void (*) (character(kind=1)[1:] & restrict, integer(kind=4),
> integer(kind=8), logical(kind=1)) pp;
>
>    pp = s;
> ...
> }


Right, now I see it too.  Simplified case:

program p
  call s ("abcd")
contains
  subroutine s(c, o)
    character(*) :: c
    integer, optional, value :: o
  end subroutine s
end

I do see what needs to be done in gfc_get_function_type, which seems
in fact very simple.  But I get really lost in create_function_arglist
when trying to get the typelist right.

One thing is I really don't understand how the (hidden_)typelist is
managed here.  How does that macro TREE_CHAIN work?  Can we somehow
chain two typelists the same way we chain arguments?

(Failing that, I tried to split the loop over the dummy arguments in
create_function_arglist into two passes, one for the optional+value
variant, and one for the rest.  It turned out to be a bad idea...)

Harald



Re: Adding a new thread model to GCC

2022-11-02 Thread Eric Botcazou via Gcc-patches
> I was able to successfully build gcc-trunk using the provided patch.
> moreover, I was able to successfully build all of the packages used in
> the toolchain!
> (gmp, mpfr, mpc, isl, libgnurx, bzip2, termcap, libffi, expat, ncurses,
> readline, gdbm, tcl, tk, openssl, xz-utils, sqlite, python3, binutils,
> gdb, make)

Great!  Did you check that C++ threads are enabled in your build?  If they 
are, you must be able to run the attached C++ test; if they are not (because 
the MinGW64 build is configured for older versions of Windows), you need to 
configure the compiler with the option --enable-libstdcxx-threads.

-- 
Eric Botcazou

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

#define NUM_THREADS 4

std::condition_variable cond;
std::mutex mx;
int started = 0;

void
do_thread ()
{
  std::unique_lock<std::mutex> lock(mx);
  std::cout << "Start thread " << started << std::endl;
  if(++started >= NUM_THREADS)
    cond.notify_all();
  else
    cond.wait(lock);
}

int
main ()
{
  std::vector<std::thread> vec;
  for (int i = 0; i < NUM_THREADS; ++i)
    vec.emplace_back(&do_thread);
  for (int i = 0; i < NUM_THREADS; ++i)
    vec[i].join();
  vec.clear();
  return 0;
}


Re: Adding a new thread model to GCC

2022-11-02 Thread i.nixman--- via Gcc-patches

On 2022-11-02 21:27, Eric Botcazou wrote:

> Great!  Did you check that C++ threads are enabled in your build?  If they
> are, you must be able to run the attached C++ test; if they are not (because
> the MinGW64 build is configured for older versions of Windows), you need to
> configure the compiler with the option --enable-libstdcxx-threads.


I had already checked everything before, but I have now re-checked it, and
everything works!


the output:
$ ./t
Start thread 0
Start thread 1
Start thread 2
Start thread 3


thank you!


Re: [PATCH v3] Add gcc/make-unique.h

2022-11-02 Thread Jason Merrill via Gcc-patches

On 10/26/22 16:40, David Malcolm via Gcc-patches wrote:

Changed in v3: added include of <type_traits>
v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604137.html
v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598189.html

On Tue, 2022-07-12 at 07:48 +0100, Jonathan Wakely wrote:

On Tue, 12 Jul 2022, 01:25 David Malcolm, 
wrote:


On Fri, 2022-07-08 at 22:16 +0100, Jonathan Wakely wrote:

On Fri, 8 Jul 2022 at 21:47, David Malcolm via Gcc

wrote:


std::unique_ptr is C++11, and I'd like to use it in the gcc/analyzer
subdirectory, at least.  The following patch eliminates a bunch of
"takes ownership" comments and manual "delete" invocations in favor
of simply using std::unique_ptr.

The problem is that the patch makes use of std::make_unique, but that
was added in C++14.

I've heard that it's reasonably easy to reimplement std::make_unique,
but I'm not sure that my C++11 skills are up to it.


You know we have an implementation of std::make_unique in GCC, with a
GCC-compatible licence that you can look at, right? :-)

But it's not really necessary. There are only two reasons to prefer
make_unique over just allocating an object with new and constructing a
unique_ptr from it:

1) avoid a "naked" new in your code (some coding styles like this, but
it's not really important as long as the 'delete' is managed
automatically by unique_ptr).

2) exception-safety when allocating multiple objects as args to a
function, see https://herbsutter.com/gotw/_102/ for details.
Irrelevant for GCC, because we build without exceptions.


[moving from gcc to gcc-patches mailing list]

Also, I *think* it's a lot less typing, since I can write just:

   std::make_unique<name_of_type_which_could_be_long> (args)

rather than

   std::unique_ptr<name_of_type_which_could_be_long>
     (new name_of_type_which_could_be_long (args));






Is there:
(a) an easy way to implement a std::make_unique replacement
 (e.g. in system.h? what to call it?), or


If you don't care about using it to create unique_ptr arrays,
it's trivial:

   template<typename T, typename... Args>
     inline typename std::enable_if<!std::is_array<T>::value,
                                    std::unique_ptr<T>>::type
     make_unique(Args&&... args)
     { return std::unique_ptr<T>(new T(std::forward<Args>(args)...)); }

To add the overload that works for arrays is a little trickier.
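
For illustration only, here is one way the array overload could be added
in C++11.  This is a sketch, not the code from the patch or from
libstdc++: the namespace "sketch" and the typedef name are made up, and it
only covers the unknown-bound form T[] (a fixed-bound T[N] is rejected by
both overloads).

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of the patch

// Single-object form: only participates when T is not an array type.
template<typename T, typename... Args>
inline typename std::enable_if<!std::is_array<T>::value,
                               std::unique_ptr<T>>::type
make_unique(Args&&... args)
{
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Array form: only participates for T[] (array of unknown bound).
// The n elements are value-initialized.
template<typename T>
inline typename std::enable_if<std::is_array<T>::value
                               && std::extent<T>::value == 0,
                               std::unique_ptr<T>>::type
make_unique(std::size_t n)
{
  typedef typename std::remove_extent<T>::type elem_type;
  return std::unique_ptr<T>(new elem_type[n]());
}

} // namespace sketch
```

With this, sketch::make_unique<int>(42) picks the first overload and
sketch::make_unique<int[]>(3) picks the second; the enable_if conditions
are mutually exclusive, so the two never compete.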


Thanks!

I tried adding it to gcc/system.h, but anything that uses it needs to
have std::unique_ptr declared, which meant forcibly including <memory>
from gcc/system.h.

So instead, here's a patch that adds a new gcc/make-unique.h header,
containing just the template decl above (in the root namespace, rather
than std::, which saves a bit more typing).



Adding things to std isn't allowed anyway, so that's correct.



I've successfully bootstrapped & regression-tested a version of my
earlier analyzer patch that uses this patch (see patch 2 of the kit,
which has lots of usage examples).

OK for trunk?

Dave


[...snip...]


+#ifndef INCLUDE_MEMORY
+# error "You must define INCLUDE_MEMORY before including system.h to use make-unique.h"
+#endif



You also need <type_traits> for the enable_if and is_array traits.
With libstdc++ that gets included by <memory>, but that's not guaranteed
for other library implementations.

I don't know if that has the same kind of issues as other system
headers or if it can just be included here.


I've added an include of <type_traits> in this version of the patch.




+

+/* Minimal implementation of make_unique for C++11 compatibility
+   (std::make_unique is C++14).  */
+
+template<typename T, typename... Args>
+inline typename std::enable_if<!std::is_array<T>::value, std::unique_ptr<T>>::type
+make_unique(Args&&... args)
+{
+  return std::unique_ptr<T> (new T (std::forward<Args> (args)...));
+}
+
+#endif /* ! GCC_MAKE_UNIQUE */
--
2.26.3




This patch adds gcc/make-unique.h, containing a minimal C++11
implementation of make_unique (std::make_unique is C++14).

Successfully bootstrapped & regression-tested on x86_64-pc-linux-gnu
in conjunction with a followup series of patches which use this
in dozens of places in the analyzer.

OK for trunk?


OK.


gcc/ChangeLog:
* make-unique.h: New file.
---
  gcc/make-unique.h | 44 
  1 file changed, 44 insertions(+)
  create mode 100644 gcc/make-unique.h

diff --git a/gcc/make-unique.h b/gcc/make-unique.h
new file mode 100644
index 000..c9a7d6ef6ce
--- /dev/null
+++ b/gcc/make-unique.h
@@ -0,0 +1,44 @@
+/* Minimal implementation of make_unique for C++11 compatibility.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+

[PATCH 1/2] Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector types

2022-11-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Even though this PR was reported as a ubsan issue, the underlying problem
is that tree_nonzero_bits is being called with an expression of vector type.
This fixes three patterns I noticed that do that,
and adds a testcase for one of the patterns.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions

gcc/ChangeLog:

PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/vector-shift-1.c: New test.
---
 gcc/match.pd  | 25 +++
 .../gcc.c-torture/compile/vector-shift-1.c|  8 ++
 2 files changed, 22 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..5833e05a926 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1371,7 +1371,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
/* For logical right shifts, this is possible only if @0 doesn't
   have MSB set and the logical right shift is changed into
   arithmetic shift.  */
-   (if (!wi::neg_p (tree_nonzero_bits (@0)))
+   (if (INTEGRAL_TYPE_P (type)
+&& !wi::neg_p (tree_nonzero_bits (@0)))
 (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
  (convert (rshift (bit_not! (convert:stype @0)) @1))
 
@@ -7518,7 +7519,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
 (simplify
   (plus (POPCOUNT:s @0) (POPCOUNT:s @1))
-  (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
+  (if (INTEGRAL_TYPE_P (type)
+   && wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
 (POPCOUNT (bit_ior @0 @1
 
 /* popcount(X) == 0 is X == 0, and related (in)equalities.  */
@@ -7550,15 +7552,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for pfun (POPCOUNT PARITY)
   (simplify
 (pfun @0)
-(with { wide_int nz = tree_nonzero_bits (@0); }
-  (switch
-   (if (nz == 1)
- (convert @0))
-   (if (wi::popcount (nz) == 1)
- (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
-   (convert (rshift:utype (convert:utype @0)
-  { build_int_cst (integer_type_node,
-   wi::ctz (nz)); }
+(if (INTEGRAL_TYPE_P (type))
+ (with { wide_int nz = tree_nonzero_bits (@0); }
+   (switch
+(if (nz == 1)
+  (convert @0))
+(if (wi::popcount (nz) == 1)
+  (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+(convert (rshift:utype (convert:utype @0)
+   { build_int_cst (integer_type_node,
+wi::ctz (nz)); })
 
 #if GIMPLE
 /* 64- and 32-bits branchless implementations of popcount are detected:
diff --git a/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c 
b/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c
new file mode 100644
index 000..142ea56d5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c
@@ -0,0 +1,8 @@
+typedef unsigned char __attribute__((__vector_size__ (1))) U;
+
+U
+foo (U u)
+{
+  u = u == u;
+  return (~(u >> 255));
+}
-- 
2.17.1



[PATCH 0/2] tree_nonzero_bits vs vector and complex types

2022-11-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 


While looking at older unconfirmed bug reports, I noticed an issue found
by ubsan in which tree_nonzero_bits was being called with a vector type.
ubsan flagged it because the end of tree_nonzero_bits does
"return wi::shwi (-1, TYPE_PRECISION (TREE_TYPE (t)));" and the type was
a vector of 1 element, which meant the precision was 0, as for vector
types the precision stores the log2 of the number of elements.

Anyway, we want to catch these kinds of errors, where tree_nonzero_bits is
called with a vector or a complex type, and fix the places where that happens.
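
To make the arithmetic concrete, here is a small standalone sketch.  It
does not use GCC's tree or wide_int APIs; both helper names are made up,
and it only models the statement above that the stored value is the log2
of the element count (assumed to be a power of two).

```cpp
#include <cstdint>

// Hypothetical stand-in for the value stored in the precision field of a
// vector type: log2 of the number of elements (nelts must be a power of 2).
inline unsigned vector_precision_field(unsigned nelts)
{
  unsigned log2 = 0;
  while ((1u << log2) < nelts)
    ++log2;
  return log2;
}

// Hypothetical helper: an all-ones mask that is `precision` bits wide.
// A width of 0 has no meaningful answer, so it is special-cased here.
inline std::uint64_t all_ones_mask(unsigned precision)
{
  if (precision == 0)
    return 0;
  if (precision >= 64)
    return ~std::uint64_t(0);
  return (std::uint64_t(1) << precision) - 1;
}
```

A one-element vector gives vector_precision_field (1) == 0, so any code
that treats that field as an integer bit width ends up working with a
zero-width value, which is the degenerate case described above.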

Thanks,
Andrew Pinski


Andrew Pinski (2):
  Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector
types
  Add assert for type on tree_nonzero_bits

 gcc/fold-const.cc |  3 +++
 gcc/match.pd  | 25 +++
 .../gcc.c-torture/compile/vector-shift-1.c|  8 ++
 3 files changed, 25 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/vector-shift-1.c

-- 
2.17.1



[PATCH 2/2] Add assert for type on tree_nonzero_bits

2022-11-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Right now anyone could call tree_nonzero_bits with
either complex or vector types, and it would return
the wrong thing. So just assert that nobody calls
it with these types.

OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.

gcc/ChangeLog:

* fold-const.cc (tree_nonzero_bits): Add
assert.
---
 gcc/fold-const.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 7e1ea58518b..3ccac9b28df 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -16567,6 +16567,9 @@ c_getstr (tree str)
 wide_int
 tree_nonzero_bits (const_tree t)
 {
+  gcc_assert (TREE_CODE (TREE_TYPE (t)) != VECTOR_TYPE
+ && TREE_CODE (TREE_TYPE (t)) != COMPLEX_TYPE);
+
   switch (TREE_CODE (t))
 {
 case INTEGER_CST:
-- 
2.17.1



Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-02 Thread Bernhard Reutner-Fischer via Gcc-patches
On 2 November 2022 15:45:39 CET, Tamar Christina via Gcc-patches wrote:
>Hi All,
>
>This patch adds initial support for early break vectorization in GCC.
>The support is added for any target that implements a vector cbranch optab.
>
>Concretely the kind of loops supported are of the forms:
>
> for (int i = 0; i < N; i++)
> {
>   <statements1>
>   if (<condition>)
>     <action>;
>   <statements2>
> }
>
>where <action> can be:
> - break
> - return
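
As a concrete instance of the loop shape quoted above, here is a
hypothetical example (the function name and its semantics are made up for
illustration; whether the vectorizer actually handles this exact loop is
not claimed here):

```cpp
#include <cstddef>

// Doubles each element into dst, stops at the first negative input, and
// returns the index at which the loop stopped (n if it ran to the end).
inline std::size_t copy_until_negative(const int *src, int *dst, std::size_t n)
{
  std::size_t i;
  for (i = 0; i < n; i++)
    {
      dst[i] = src[i] * 2;   // work before the exit test
      if (src[i] < 0)        // exit condition
        break;               // early exit
      dst[i] += 1;           // work after the exit test
    }
  return i;
}
```

The store before the test still happens for the element that triggers the
break, while the update after the test does not, which is what makes such
loops awkward for a vectorizer that processes several iterations at once.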

Just curious, but don't we have graphite for splitting loops on control flow, 
respectively reflow loops to help vectorization like in this case? Did you 
compare, and if so, what's missing?

thanks and cheers,


Re: [RFC] RISC-V: Minimal supports for new extensions in profile.

2022-11-02 Thread Palmer Dabbelt

On Wed, 02 Nov 2022 05:52:34 PDT (-0700), jia...@iscas.ac.cn wrote:

This patch just add name support contain in profiles.
Set the extension version as 0.1.


Or maybe v0.8, as they're in the v0.8 profile spec?  I doubt it really 
matters, though.  Either way we'll need a -mprofile-spec-version (or 
whatever) for these, as these one-phrase definitions will almost 
certainly change.


This also doesn't couple these new extensions to the profiles in any 
way.  IMO that's a sane thing to do, but they're only defined as part of 
the mandatory profile section so I'm just double-checking here 
.


We'll also need news entries and I don't see any testing results, though 
those are probably pretty easy here.




gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extensions.
* config/riscv/riscv-opts.h (MASK_ZICCAMOA): New mask.
(MASK_ZICCIF): Ditto.
(MASK_ZICCLSM): Ditto.
(MASK_ZICCRSE): Ditto.
(MASK_ZICNTR): Ditto.
(MASK_ZIHINTPAUSE): Ditto.
(MASK_ZIHPM): Ditto.
(TARGET_ZICCAMOA): New target.
(TARGET_ZICCIF): Ditto.
(TARGET_ZICCLSM): Ditto.
(TARGET_ZICCRSE): Ditto.
(TARGET_ZICNTR): Ditto.
(TARGET_ZIHINTPAUSE): Ditto.
(TARGET_ZIHPM): Ditto.
(MASK_SVPBMT): New mask.

---
 gcc/common/config/riscv/riscv-common.cc | 20 
 gcc/config/riscv/riscv-opts.h   | 15 +++
 2 files changed, 35 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index d6404a01205..602491c638d 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -163,6 +163,15 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},

+  {"ziccamoa", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"ziccif", ISA_SPEC_CLASS_NONE, 0, 1},


IMO Ziccif should be sufficiently visible in the object that we can 
reject running binaries that require that on systems that don't support 
it.  It's essentially the same as Ztso, we're adding more constraints to 
existing instructions.



+  {"zicclsm", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"ziccrse", ISA_SPEC_CLASS_NONE, 0, 1},
+  {"zicntr", ISA_SPEC_CLASS_NONE, 0, 1},


As per Andrew's post here 
, 
Zicntr and Zihpm should be ignored by software.


I think you could make that compatibility argument for Zicclsm and 
Ziccrse as well, but given that the core of the Zicntr/Zihpm argument is 
based on userspace not knowing about priv-spec details such as PMAs I'm 
guessing it'd go that way too.  That said, these are all listed in the 
"features available to user-mode execution environments" section.



+
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 0, 1},


We should probably have a builtin for this, there's a handful of 
userspace cpu_relax()-type calls and having something to select the 
flavor of pause instruction based on the target seems generally useful.
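
For reference, a userspace cpu_relax()-style helper along those lines
might look like the sketch below.  The names cpu_relax and spin_until_set
are hypothetical, not an existing GCC builtin or library API.  The RISC-V
branch emits the PAUSE encoding as raw bytes, on the assumption that the
assembler may not know the mnemonic; the fallback is only a compiler
barrier.

```cpp
#include <atomic>

// Hypothetical helper: hint to the CPU that we are in a spin-wait loop.
static inline void cpu_relax()
{
#if defined(__riscv)
  // Zihintpause PAUSE hint (a FENCE encoding), emitted as a raw word.
  __asm__ __volatile__ (".4byte 0x0100000f" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
  __builtin_ia32_pause ();
#else
  __asm__ __volatile__ ("" ::: "memory");  // compiler barrier only
#endif
}

// Spin until the flag becomes true; returns how many times we relaxed.
static inline unsigned long spin_until_set(const std::atomic<bool> &flag)
{
  unsigned long spins = 0;
  while (!flag.load (std::memory_order_acquire))
    {
      cpu_relax ();
      ++spins;
    }
  return spins;
}
```

A builtin would let the compiler pick the right instruction per target
instead of each project carrying its own #if ladder like this one.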



+  {"zihpm", ISA_SPEC_CLASS_NONE, 0, 1},


See above.


+
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},


There's some missing ones, just poking through the profile I can find: 
Za64rs and Zic64b, but there's a lot in there and I'm kind of getting my 
eyes crossed already.


I'd argue that Za64rs should be handled like Ziccif, but we don't have a 
lot of bits left in the header.  I just sent some patches to the ELF 
psABI spec: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/351



@@ -219,6 +228,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =

   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svpbmt", ISA_SPEC_CLASS_NONE, 0, 1},

   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
@@ -1179,6 +1189,14 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =

   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
+  {"ziccamoa", &gcc_options::x_riscv_zi_subext, MASK_ZICCAMOA},
+  {"ziccif", &gcc_options::x_riscv_zi_subext, MASK_ZICCIF},
+  {"zicclsm", &gcc_options::x_riscv_zi_subext, MASK_ZICCLSM},
+  {"ziccrse", &gcc_options::x_riscv_zi_subext, MASK_ZICCRSE},
+  {"zicntr", &gcc_options::x_riscv_zi_subext, MASK_ZICNTR},
+
+  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
+  {"zihpm", &gcc_options::x_riscv_zi_subext, MASK_ZIHPM},

   {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
   {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
@@ -1230,6 +1248,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zvl1024b",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL1024B},
   {"zvl2048b",  &gcc_options::x_ris

Re: [RFC] RISC-V: Minimal supports for new extensions in profile.

2022-11-02 Thread Philipp Tomsich
On Wed, 2 Nov 2022 at 23:06, Palmer Dabbelt  wrote:
>
> On Wed, 02 Nov 2022 05:52:34 PDT (-0700), jia...@iscas.ac.cn wrote:
> > This patch just add name support contain in profiles.
> > Set the extension version as 0.1.
>
> Or maybe v0.8, as they're in the v0.8 profile spec?  I doubt it really
> matters, though.  Either way we'll need a -mprofile-spec-version (or
> whatever) for these, as these one-phrase definitions will almost
> certainly change.
>
> This also doesn't couple these new extensions to the profiles in any
> way.  IMO that's a sane thing to do, but they're only defined as part of
> the mandatory profile section so I'm just double-checking here
> .
>
> We'll also need news entries and I don't see any testing results, though
> those are probably pretty easy here.
>
> >
> > gcc/ChangeLog:
> >
> > * common/config/riscv/riscv-common.cc: New extensions.
> > * config/riscv/riscv-opts.h (MASK_ZICCAMOA): New mask.
> > (MASK_ZICCIF): Ditto.
> > (MASK_ZICCLSM): Ditto.
> > (MASK_ZICCRSE): Ditto.
> > (MASK_ZICNTR): Ditto.
> > (MASK_ZIHINTPAUSE): Ditto.
> > (MASK_ZIHPM): Ditto.
> > (TARGET_ZICCAMOA): New target.
> > (TARGET_ZICCIF): Ditto.
> > (TARGET_ZICCLSM): Ditto.
> > (TARGET_ZICCRSE): Ditto.
> > (TARGET_ZICNTR): Ditto.
> > (TARGET_ZIHINTPAUSE): Ditto.
> > (TARGET_ZIHPM): Ditto.
> > (MASK_SVPBMT): New mask.
> >
> > ---
> >  gcc/common/config/riscv/riscv-common.cc | 20 
> >  gcc/config/riscv/riscv-opts.h   | 15 +++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/gcc/common/config/riscv/riscv-common.cc 
> > b/gcc/common/config/riscv/riscv-common.cc
> > index d6404a01205..602491c638d 100644
> > --- a/gcc/common/config/riscv/riscv-common.cc
> > +++ b/gcc/common/config/riscv/riscv-common.cc
> > @@ -163,6 +163,15 @@ static const struct riscv_ext_version 
> > riscv_ext_version_table[] =
> >{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >
> > +  {"ziccamoa", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"ziccif", ISA_SPEC_CLASS_NONE, 0, 1},
>
> IMO Ziccif should be sufficiently visible in the object that we can
> reject running binaries that require that on systems that don't support
> it.  It's essentially the same as Ztso, we're adding more constraints to
> existing instructions.
>
> > +  {"zicclsm", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"ziccrse", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"zicntr", ISA_SPEC_CLASS_NONE, 0, 1},
>
> As per Andrew's post here
> ,
> Zicntr and Zihpm should be ignored by software.
>
> I think you could make that compatibility argument for Zicclsm and
> Ziccrse as well, but given that the core of the Zicntr/Zihpm argument is
> based on userspace not knowing about priv-spec details such as PMAs I'm
> guessing it'd go that way too.  That said, these are all listed in the
> "features available to user-mode execution environments" section.
>
> > +
> > +  {"zihintpause", ISA_SPEC_CLASS_NONE, 0, 1},
>
> We should probably have a builtin for this, there's a handful of
> userspace cpu_relax()-type calls and having something to select the
> flavor of pause instruction based on the target seems generally useful.

I had originally submitted this in early 2021 (including a builtin),
but we never agreed on details (e.g. whether this should be gated, as
it is a true hint):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562936.html

Let me know what behavior we want and I'll submit a v2.

Philipp.

> > +  {"zihpm", ISA_SPEC_CLASS_NONE, 0, 1},
>
> See above.
>
> > +
> >{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
>
> There's some missing ones, just poking through the profile I can find:
> Za64rs and Zic64b, but there's a lot in there and I'm kind of getting my
> eyes crossed already.
>
> I'd argue that Za64rs should be handled like Ziccif, but we don't have a
> lot of bits left in the header.  I just sent some patches to the ELF
> psABI spec: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/351
>
> > @@ -219,6 +228,7 @@ static const struct riscv_ext_version 
> > riscv_ext_version_table[] =
> >
> >{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
> > +  {"svpbmt", ISA_SPEC_CLASS_NONE, 0, 1},
> >
> >/* Terminate the list.  */
> >{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
> > @@ -1179,6 +1189,14 @@ static const riscv_ext_flag_table_t 
> > riscv_ext_flag_table[] =
> >
> >{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> > +  {"ziccamoa", &gcc_options::x_riscv_zi_subext, MASK_ZICCAMOA},
> > +  {"ziccif", &gcc_options::x_riscv_zi_subext

Re: [RFC] RISC-V: Minimal supports for new extensions in profile.

2022-11-02 Thread Palmer Dabbelt

On Wed, 02 Nov 2022 15:20:57 PDT (-0700), philipp.toms...@vrull.eu wrote:

On Wed, 2 Nov 2022 at 23:06, Palmer Dabbelt  wrote:


On Wed, 02 Nov 2022 05:52:34 PDT (-0700), jia...@iscas.ac.cn wrote:
> This patch just add name support contain in profiles.
> Set the extension version as 0.1.

Or maybe v0.8, as they're in the v0.8 profile spec?  I doubt it really
matters, though.  Either way we'll need a -mprofile-spec-version (or
whatever) for these, as these one-phrase definitions will almost
certainly change.

This also doesn't couple these new extensions to the profiles in any
way.  IMO that's a sane thing to do, but they're only defined as part of
the mandatory profile section so I'm just double-checking here
.

We'll also need news entries and I don't see any testing results, though
those are probably pretty easy here.

>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: New extensions.
> * config/riscv/riscv-opts.h (MASK_ZICCAMOA): New mask.
> (MASK_ZICCIF): Ditto.
> (MASK_ZICCLSM): Ditto.
> (MASK_ZICCRSE): Ditto.
> (MASK_ZICNTR): Ditto.
> (MASK_ZIHINTPAUSE): Ditto.
> (MASK_ZIHPM): Ditto.
> (TARGET_ZICCAMOA): New target.
> (TARGET_ZICCIF): Ditto.
> (TARGET_ZICCLSM): Ditto.
> (TARGET_ZICCRSE): Ditto.
> (TARGET_ZICNTR): Ditto.
> (TARGET_ZIHINTPAUSE): Ditto.
> (TARGET_ZIHPM): Ditto.
> (MASK_SVPBMT): New mask.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc | 20 
>  gcc/config/riscv/riscv-opts.h   | 15 +++
>  2 files changed, 35 insertions(+)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
> index d6404a01205..602491c638d 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -163,6 +163,15 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
>
> +  {"ziccamoa", ISA_SPEC_CLASS_NONE, 0, 1},
> +  {"ziccif", ISA_SPEC_CLASS_NONE, 0, 1},

IMO Ziccif should be sufficiently visible in the object that we can
reject running binaries that require that on systems that don't support
it.  It's essentially the same as Ztso, we're adding more constraints to
existing instructions.

> +  {"zicclsm", ISA_SPEC_CLASS_NONE, 0, 1},
> +  {"ziccrse", ISA_SPEC_CLASS_NONE, 0, 1},
> +  {"zicntr", ISA_SPEC_CLASS_NONE, 0, 1},

As per Andrew's post here
,
Zicntr and Zihpm should be ignored by software.

I think you could make that compatibility argument for Zicclsm and
Ziccrse as well, but given that the core of the Zicntr/Zihpm argument is
based on userspace not knowing about priv-spec details such as PMAs I'm
guessing it'd go that way too.  That said, these are all listed in the
"features available to user-mode execution environments" section.

> +
> +  {"zihintpause", ISA_SPEC_CLASS_NONE, 0, 1},

We should probably have a builtin for this, there's a handful of
userspace cpu_relax()-type calls and having something to select the
flavor of pause instruction based on the target seems generally useful.


I had originally submitted this in early 2021 (including a builtin),
but we never agreed on details (e.g. whether this should be gated, as
it is a true hint):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562936.html

Let me know what behavior we want and I'll submit a v2.


Ah, sorry, I guess I forgot.  I don't know if we ever talked about it in 
GCC land, but at least in QEMU and Linux we ended up just ignoring the 
ISA manual here and pretending it's a hint -- the assumption is that 
vendors will do so too.  So IMO we can just document that somewhere in 
GCC as well.


I guess we could add some sort of Xsifive_x0div_relax extension to cover 
the div-based go-slow instructions in some SiFive machines as well, but 
that's sort of a different discussion.



Philipp.


> +  {"zihpm", ISA_SPEC_CLASS_NONE, 0, 1},

See above.

> +
>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},

There's some missing ones, just poking through the profile I can find:
Za64rs and Zic64b, but there's a lot in there and I'm kind of getting my
eyes crossed already.

I'd argue that Za64rs should be handled like Ziccif, but we don't have a
lot of bits left in the header.  I just sent some patches to the ELF
psABI spec: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/351

> @@ -219,6 +228,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
>
>{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
>{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svpbmt", ISA_SPEC_CLASS_NONE, 0, 1},
>
>/* Terminate the list.  */
>{NULL, ISA_SPEC_

Re: [PATCH] tree-object-size: Support strndup and strdup

2022-11-02 Thread Siddhesh Poyarekar

On 2022-09-23 09:02, Jakub Jelinek wrote:

Oh, so can addr_object_size be simplified to use get_base_address too?


You can try.  As you can see in get_base_address, that function
handles something that the above doesn't (looking through some MEM_REFs too).



I went down this rabbit hole and it actually simplifies some cases, but I
got sucked into flex array related issues that I need more time to 
figure out.  I'll stick to using get_base_address for now since I want 
to make sure this makes the stage 1 deadline.


Thanks,
Sid


Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-02 Thread Jeff Law via Gcc-patches



On 11/2/22 15:50, Bernhard Reutner-Fischer via Gcc-patches wrote:

On 2 November 2022 15:45:39 CET, Tamar Christina via Gcc-patches wrote:

Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab.

Concretely the kind of loops supported are of the forms:

for (int i = 0; i < N; i++)
{
   <statements1>
   if (<condition>)
     <action>;
   <statements2>
}

where <action> can be:
- break
- return

Just curious, but don't we have graphite for splitting loops on control flow, 
respectively reflow loops to help vectorization like in this case? Did you 
compare, and if so, what's missing?


Graphite isn't generally enabled, is largely unmaintained and often 
makes things worse rather than better.



jeff




Re: [PATCH] Enable shrink wrapping for the RISC-V target.

2022-11-02 Thread Palmer Dabbelt

On Wed, 02 Nov 2022 08:06:36 PDT (-0700), j...@ventanamicro.com wrote:


On 11/2/22 07:54, Manolis Tsamis wrote:


I've revisited this testcase and I think it's not possible to make it
work with the current implementation.
It's not possible to trigger shrink wrapping in this case since the
wrapping of registers is guarded by
  if (SMALL_OPERAND (offset)) { bitmap_set_bit (components, regno); }
Hence if a long stack is generated we get no shrink wrapping.
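
To spell out what that guard amounts to: as far as I understand it,
SMALL_OPERAND tests whether a value fits the signed 12-bit immediate of a
RISC-V load/store.  The sketch below models that with made-up names
(fits_simm12, SaveSlot, wrappable_regs); it is not the backend code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the guard: true if the offset fits a signed
// 12-bit immediate, i.e. lies in [-2048, 2047].
inline bool fits_simm12(std::int64_t offset)
{
  return offset >= -2048 && offset <= 2047;
}

// Hypothetical description of one callee-saved register's save slot.
struct SaveSlot
{
  unsigned regno;
  std::int64_t offset;  // offset of the slot from the stack pointer
};

// Only registers whose slot is directly addressable become candidates;
// the rest are left to the normal prologue/epilogue.
inline std::vector<unsigned> wrappable_regs(const std::vector<SaveSlot> &slots)
{
  std::vector<unsigned> out;
  for (const SaveSlot &s : slots)
    if (fits_simm12 (s.offset))
      out.push_back (s.regno);
  return out;
}
```

With a frame large enough that every save slot sits beyond offset 2047,
the candidate list is empty, which matches the "no shrink wrapping for a
long stack" behaviour described above.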


Ah, sorry, I must have just missed that when reading the code.  In that 
case we're essentially just doing what the other port was, so we're 
safe.



I also tried to remove that restriction but it looks like it can't
work because we can't create
pseudo-registers during shrink wrapping and shrink wrapping can't work either.

I believe this means that shrink wrapping cannot interfere with a long
stack frame
so there is nothing to test against in this case?


It'd be marginally better to have such a test case to ensure we don't
shrink wrap it -- that would ensure that someone doesn't accidentally
introduce shrink wrapping with large offsets.   Just a bit of future
proofing.


If there's passing test cases that fail with that check removed then 
it's probably good enough, though I think in this case just having a 
comment there saying why the short-stack check is necessary should be 
fine.


[committed] c: C2x auto

2022-11-02 Thread Joseph Myers
Implement C2x auto, a more restricted version of the C++ feature
(closer to GNU C __auto_type in terms of what's supported).

Since the feature is very close to GNU C __auto_type, much of the
implementation can be shared.  The main differences are:

* Any prior declaration of the identifier in an outer scope is
  shadowed during the initializer (whereas __auto_type leaves any such
  declaration visible until the initializer ends and the scope of the
  __auto_type declaration itself starts).  (A prior declaration in the
  same scope is undefined behavior.)

* The standard feature supports braced initializers (containing a
  single expression, optionally followed by a comma).

* The standard feature disallows the declaration from declaring
  anything that's not an ordinary identifier (thus, the initializer
  cannot declare a tag or the members of a structure or union), while
  making it undefined behavior for it to declare more than one
  ordinary identifier.  (For the latter, while I keep the existing
  error from __auto_type in the case of more than one declarator, I
  don't restrict other ordinary identifiers from being declared in
  inner scopes such as GNU statement expressions.  I do however
  disallow defining the members of an enumeration inside the
  initializer (if the enum definition has no tag, that doesn't
  actually violate a constraint), to avoid an enum type becoming
  accessible beyond where it would have been without auto.)
  (Preventing new types from escaping the initializer - thus, ensuring
  that anything written with auto corresponds to something that could
  have been written without auto, modulo multiple evaluation of VLA
  size expressions when not using auto - is a key motivation for some
  restrictions on what can be declared in the initializer.)

The rule on shadowing and restrictions on other declarations in the
initializer are actually general rules for what C2x calls
underspecified declarations, a description that covers constexpr as
well as auto (in particular, this disallows a constexpr initializer
from referencing the variable being initialized).  Thus, some of the
code added for those restrictions will also be of use in implementing
C2x constexpr.

auto with a type specifier remains a storage class specifier with the
same meaning as before (i.e. a redundant storage class specifier for
use at block scope).

Note that the feature is only enabled in C2x mode (-std=c2x or
-std=gnu2x); in older modes, a declaration with auto and no type is
treated as a case of implicit int (only accepted at block scope).

Since many of the restrictions on C2x auto are specified as undefined
behavior rather than constraint violations, it would be possible to
support more features from C++ auto without requiring diagnostics (but
maybe not a good idea, if it isn't clear exactly what semantics might
be given to such a feature in a future revision of C; and
-Wc23-c2y-compat should arguably warn for any such future feature
anyway).  For now the features are limited to something close to
what's supported with __auto_type, with the differences as discussed
above between the two features.
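
As a rough illustration of the type inference described above, here is a
minimal sketch.  INFER is a hypothetical helper macro (it is not part of
GCC or of this patch): it expands to the standard auto spelling only when
the compiler reports a final C23 __STDC_VERSION__, and otherwise falls
back to GNU C __auto_type, so the same source builds in older modes.  The
function names are likewise made up for the example.

```c
/* Hypothetical helper macro, for illustration only: use the standard
   spelling where the compiler claims final C23 support, otherwise
   fall back to the GNU C __auto_type extension.  */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define INFER auto
#else
# define INFER __auto_type
#endif

/* Both declarations take their type (int) from their initializers.  */
int
sum_to (int n)
{
  INFER total = 0;
  for (INFER i = 1; i <= n; i++)
    total += i;
  return total;
}

/* The initializer has type double, so H is deduced as double.  */
double
half (int n)
{
  INFER h = n / 2.0;
  return h;
}

/* Return 1 if the deduced type is double, checked with C11 _Generic.  */
int
deduced_is_double (void)
{
  INFER h = 1 / 2.0;
  return _Generic (h, double: 1, default: 0);
}
```

Each declaration here declares exactly one ordinary identifier with a
plain (unbraced) initializer, which is the subset the two features have
in common; braced initializers and the shadowing rule are where they
differ, as listed above.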

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (in_underspecified_init, start_underspecified_init)
(finish_underspecified_init): New.
(shadow_tag_warned, parser_xref_tag, start_struct, start_enum):
Give errors inside initializers of underspecified declarations.
(grokdeclarator): Handle (erroneous) case of C2X auto on a
parameter.
(declspecs_add_type): Handle c2x_auto_p case.
(declspecs_add_scspec): Handle auto possibly setting c2x_auto_p in
C2X mode.
(finish_declspecs): Handle c2x_auto_p.
* c-parser.cc (c_parser_declaration_or_fndef): Handle C2X auto.
* c-tree.h (C_DECL_UNDERSPECIFIED): New macro.
(struct c_declspecs): Add c2x_auto_p.
(start_underspecified_init, finish_underspecified_init): New
prototypes.
* c-typeck.cc (build_external_ref): Give error for underspecified
declaration referenced in its initializer.

gcc/testsuite/
* gcc.dg/c2x-auto-1.c, gcc.dg/c2x-auto-2.c, gcc.dg/c2x-auto-3.c,
gcc.dg/c2x-auto-4.c, gcc.dg/gnu2x-auto-1.c: New tests.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 795c97134f2..a99b7456055 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -1472,6 +1472,67 @@ pop_file_scope (void)
   maybe_apply_pending_pragma_weaks ();
 }
 
+/* Whether we are currently inside the initializer for an
+   underspecified object definition (C2x auto or constexpr).  */
+static bool in_underspecified_init;
+
+/* Start an underspecified object definition for NAME at LOC.  This
+   means that NAME is shadowed inside its initializer, so neither the
+   definition being initialized, nor any definition from an outer
+   scope, may be referenced during that initializer.  Return state to
+   be passed to finish_underspecified_init.

[PATCH] Support Intel CMPccXADD

2022-11-02 Thread Haochen Jiang via Gcc-patches
Hi all,

I just revised the patch according to the review. The changes compared to
the previous version are mentioned below.

Ok for trunk?

BRs,
Haochen

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect cmpccxadd.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_CMPCCXADD_SET,
OPTION_MASK_ISA2_CMPCCXADD_UNSET): New.
(ix86_handle_option): Handle -mcmpccxadd.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_CMPCCXADD.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
cmpccxadd.
* config.gcc: Add cmpccxaddintrin.h.
* config/i386/cpuid.h (bit_CMPCCXADD): New.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE(INT, PINT, INT, INT, INT)
and DEF_FUNCTION_TYPE(LONGLONG, PLONGLONG, LONGLONG, LONGLONG, INT).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__CMPCCXADD__.
* config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
Add new parameter to indicate constant position.
Handle INT_FTYPE_PINT_INT_INT_INT
and LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT.
* config/i386/i386-isa.def (CMPCCXADD): Add DEF_PTA(CMPCCXADD).
* config/i386/i386-options.cc (isa2_opts): Add -mcmpccxadd.
(ix86_valid_target_attribute_inner_p): Handle cmpccxadd.
* config/i386/i386.opt: Add option -mcmpccxadd.
* config/i386/sync.md (cmpccxadd_): New define insn.
* config/i386/x86gprintrin.h: Include cmpccxaddintrin.h.
* doc/extend.texi: Document cmpccxadd.
* doc/invoke.texi: Document -mcmpccxadd.
* doc/sourcebuild.texi: Document target cmpccxadd.
* config/i386/cmpccxaddintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mcmpccxadd.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-13.c: Add -mcmpccxadd.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/x86gprintrin-1.c: Ditto.
* gcc.target/i386/x86gprintrin-2.c: Ditto.
* gcc.target/i386/x86gprintrin-3.c: Ditto.
* gcc.target/i386/x86gprintrin-4.c: Ditto.
* gcc.target/i386/x86gprintrin-5.c: Ditto.
* gcc.target/i386/cmpccxadd-1.c: New test.
* gcc.target/i386/cmpccxadd-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h  |   2 +
 gcc/common/config/i386/i386-common.cc |  15 ++
 gcc/common/config/i386/i386-cpuinfo.h |   1 +
 gcc/common/config/i386/i386-isas.h|   1 +
 gcc/config.gcc|   3 +-
 gcc/config/i386/cmpccxaddintrin.h |  89 +++
 gcc/config/i386/cpuid.h   |   1 +
 gcc/config/i386/i386-builtin-types.def|   4 +
 gcc/config/i386/i386-builtin.def  |   4 +
 gcc/config/i386/i386-c.cc |   2 +
 gcc/config/i386/i386-expand.cc|  22 ++-
 gcc/config/i386/i386-isa.def  |   1 +
 gcc/config/i386/i386-options.cc   |   4 +-
 gcc/config/i386/i386.opt  |   5 +
 gcc/config/i386/sync.md   |  29 
 gcc/config/i386/x86gprintrin.h|   2 +
 gcc/doc/extend.texi   |   5 +
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/sourcebuild.texi  |   3 +
 gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 gcc/testsuite/gcc.target/i386/cmpccxadd-1.c   |  61 
 gcc/testsuite/gcc.target/i386/cmpccxadd-2.c   | 138 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +-
 .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-2.c  |   6 +-
 .../gcc.target/i386/x86gprintrin-3.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-4.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-5.c  |   6 +-
 gcc/testsuite/lib/target-supports.exp |  10 ++
 33 files changed, 437 insertions(+), 15 deletions(-)
 create mode 100644 gcc/config/i386/cmpccxaddintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/cmpccxadd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/cmpccxadd-2.c

diff --git a/gcc/config/i386/cmpccxaddintrin.h 
b/gcc/config/i386/cmpccxaddintrin.h
--- /dev/null
+++ b/gcc/config/i386/cmpccxaddintrin.h
+#define __cmpccxadd_epi64(A,B,C,D) \
+  __builtin_ia32_cmpccxadd64 ((long long *) (A), (long long) (B), \
+ (long long) (C), (_CMPCCX_ENUM) (D))
+#endif


Re: [PATCH 6/6] Initial Sierra Forest Support

2022-11-02 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_intel_cpu):
> Add Sierra Forest.
> * common/config/i386/i386-common.cc
> (processor_names): Add Sierra Forest.
> (processor_alias_table): Ditto.
> * common/config/i386/i386-cpuinfo.h
> (enum processor_types): Add INTEL_SIERRAFOREST.
> * config.gcc: Add -march=sierraforest.
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Handle Sierra Forest.
> * config/i386/i386-c.cc (ix86_target_macros_internal):
> Ditto.
> * config/i386/i386-options.cc (m_SIERRAFOREST): New define.
> (processor_cost_table): Add sierra forest.
> * config/i386/i386.h (enum processor_type):
> Add PROCESSOR_SIERRAFOREST.
> (PTA_SIERRAFOREST): Ditto.
> * doc/extend.texi: Add sierra forest.
> * doc/invoke.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/mv16.C: Add sierra forest.
> * gcc.target/i386/funcspec-56.inc: Handle new march.
Ok, please commit this patch after CMPCCXADD patch.
> ---
>  gcc/common/config/i386/cpuinfo.h  | 6 ++
>  gcc/common/config/i386/i386-common.cc | 3 +++
>  gcc/common/config/i386/i386-cpuinfo.h | 1 +
>  gcc/config.gcc| 3 ++-
>  gcc/config/i386/driver-i386.cc| 5 -
>  gcc/config/i386/i386-c.cc | 7 +++
>  gcc/config/i386/i386-options.cc   | 2 ++
>  gcc/config/i386/i386.h| 3 +++
>  gcc/doc/extend.texi   | 3 +++
>  gcc/doc/invoke.texi   | 8 
>  gcc/testsuite/g++.target/i386/mv16.C  | 6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc | 1 +
>  12 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index f73834b086c..cc499c46ed0 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -516,6 +516,12 @@ get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_SAPPHIRERAPIDS;
>break;
> +case 0xaf:
> +  /* Sierra Forest.  */
> +  cpu = "sierraforest";
> +  CHECK___builtin_cpu_is ("sierraforest");
> +  cpu_model->__cpu_type = INTEL_SIERRAFOREST;
> +  break;
>  case 0x17:
>  case 0x1d:
>/* Penryn.  */
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 75966779d82..6ccc4d2f03c 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1874,6 +1874,7 @@ const char *const processor_names[] =
>"goldmont",
>"goldmont-plus",
>"tremont",
> +  "sierraforest",
>"knl",
>"knm",
>"skylake",
> @@ -2019,6 +2020,8 @@ const pta processor_alias_table[] =
>  M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2},
>{"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT,
>  M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
> +  {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
> +M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
>{"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,
>  M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F},
>{"knm", PROCESSOR_KNM, CPU_SLM, PTA_KNM,
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
> b/gcc/common/config/i386/i386-cpuinfo.h
> index 5a61d817007..a71a10ebbd7 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -58,6 +58,7 @@ enum processor_types
>INTEL_TREMONT,
>AMDFAM19H,
>ZHAOXIN_FAM7H,
> +  INTEL_SIERRAFOREST,
>CPU_TYPE_MAX,
>BUILTIN_CPU_TYPE_MAX = CPU_TYPE_MAX
>  };
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index fe063bfbb26..c0e10a72bd5 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -665,7 +665,8 @@ slm nehalem westmere sandybridge ivybridge haswell 
> broadwell bonnell \
>  silvermont knl knm skylake-avx512 cannonlake icelake-client icelake-server \
>  skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
>  sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 
> nano-3000 \
> -nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 native"
> +nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
> +sierraforest native"
>
>  # Additional x86 processors supported by --with-cpu=.  Each processor
>  # MUST be separated by exactly one space.
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index ef567045c67..be205a56ea2 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -589,8 +589,11 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
> 

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-02 Thread Kong, Lingling via Gcc-patches
> > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > > index 7c6bfa6e..cd0282f1 100644
> > > --- a/htdocs/gcc-13/changes.html
> > > +++ b/htdocs/gcc-13/changes.html
> > > @@ -230,6 +230,8 @@ a work-in-progress.
> > >For both C and C++ the __bf16 type is supported on
> > >x86 systems with SSE2 and above enabled.
> > >
> > > +  Use __bf16 type for AVX512BF16 intrinsics.
> > Could you add more explanations. Like originally it's ..., now it's
> > ..., and what's the difference when users compile the same source
> > code(which contains
> > avx512bf16 intrinsics) with gcc12(and before) and GCC13.
> > > +  
> > >  
> > >
> > >  
> > > --
> > > 2.18.2
> > >
> Yes,  changed it. Thanks a lot!
> 
> Subject: [PATCH] Mention Intel __bf16 support in AVX512BF16 intrinsics.
> 
> ---
>  htdocs/gcc-13/changes.html | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
> 7c6bfa6e..a35f4fab 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -230,6 +230,12 @@ a work-in-progress.
>For both C and C++ the __bf16 type is supported on
>x86 systems with SSE2 and above enabled.
>
> +  Use __bf16 type for AVX512BF16 intrinsics.
> + Previously we use  short to represent bf16. Now we introduced
> __bf16 to x86 psABI.
> +  So we switch intrinsics in AVX512BF16 to the new type __bf16.
> +  When users compile the same source code contains AVX512BF16
> + intrinsics with
> +  GCC13 need to support SSE2, which is different to GCC12 (and before).
> +  
>  
> 
>  
> --
> 2.18.2
> 
> BRs,
> Lingling

Sorry, modified again. New patch is as below.

htdocs/gcc-13/changes.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 
7c6bfa6e..7a5d2ab6 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -230,6 +230,11 @@ a work-in-progress.
   For both C and C++ the __bf16 type is supported on
   x86 systems with SSE2 and above enabled.
   
+  Use real __bf16 type for AVX512BF16 intrinsics.
+ Previously we used __bfloat16, which is a typedef of short. Now we have
+ introduced a real __bf16 type to the x86 psABI. Users need to
+ adjust their AVX512BF16-related source code when upgrading from GCC 12 to GCC 13.
+  
 
 
 
--
2.18.2

BRs,
Lingling
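
To make the motivation for the note above concrete, here is a small
sketch of what it means for a bfloat16 value to live in an integer
typedef.  It assumes float is IEEE binary32 and that a bfloat16 value is
the upper 16 bits of that encoding; conversion is by plain truncation,
which is a simplification.  The names bf16_bits, float_to_bf16_bits and
bf16_bits_to_float are hypothetical, and uint16_t is used instead of
short only to keep the shifts well defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the old representation: the raw bfloat16
   bit pattern held in a 16-bit integer.  */
typedef uint16_t bf16_bits;

/* Keep the upper 16 bits of the binary32 encoding (truncation, no
   rounding).  */
bf16_bits
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (bf16_bits) (u >> 16);
}

/* Widen a bfloat16 bit pattern back to float by zero-filling the low
   16 bits.  */
float
bf16_bits_to_float (bf16_bits b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```

With an integer typedef, an expression such as a + b on two such values
is integer arithmetic on the bit patterns rather than floating-point
addition, so adding the patterns for 1.0 and 1.0 does not produce the
pattern for 2.0.  A distinct type lets the compiler tell the two apart.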