Re: [PATCH] i386: Cleanup i386/i386elf.h and align its return convention with the SVR4 ABI

2020-11-04 Thread Uros Bizjak via Gcc-patches
On Fri, Oct 30, 2020 at 9:05 PM Uros Bizjak  wrote:
>
> > As observed a number of years ago in the following thread, i386/i386elf.h
> > has not been kept up to date:
> >
> > https://gcc.gnu.org/pipermail/gcc/2013-August/209981.html
> >
> > This patch does the following cleanup:
> >
> > 1. The return convention now follows the i386 and x86_64 SVR4 ABIs again.
> > As discussed in the above thread, the current return convention does not
> > match any other target or existing ABI, which is problematic since the
> > current approach is inefficient (particularly on x86_64-elf) and confuses
> > other tools like GDB (unfortunately that thread did not lead to any fix
> > at the time).
> >
> > 2. The default version of ASM_OUTPUT_ASCII from elfos.h is used. As
> > mentioned in the cleanup of i386/sysv4.h [1], the ASM_OUTPUT_ASCII
> > implementation then used by sysv4.h, and currently used by i386elf.h,
> > has a significantly higher computational complexity than the default
> > version provided by elfos.h.
> >
> > The patch has been tested on i386-elf and x86_64-elf hosted on
> > x86_64-linux, fixing a number of failing tests that were expecting the
> > SVR4 ABI return convention. It has also been bootstrapped and tested on
> > x86_64-pc-linux-gnu without regression.
> >
> > If approved, I'll need a maintainer to kindly commit on my behalf.
> >
> > Thanks,
> >
> > Pat Bernardi
> > Senior Software Engineer, AdaCore
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2011-February/305559.html
>
> Looking at [1], it looks like i386elf.h has suffered some bitrot.
> Probably nobody cares much for {i386,x86_64}-elf nowadays.
>
> So I think, for the reasons explained in [1], and based on your testing,
> that the patch should be committed to the mainline to fix the ABI
> issues. However, I wonder if the ABI change is severe enough to
> warrant a compile-time warning?

The difference is with the following testcase:

--cut here--
typedef int v2si __attribute__((__vector_size__(8)));

v2si test2 (void)
{
  v2si z = { 123, 456 };

  return z;
}
--cut here--

currently gcc for the i386-elf target crashes when compiled w/o -mmx, and
returns in memory when compiled w/ -mmx. The latter violates the i386
psABI, which specifies %mm0 as the return location for __m64 values.

The compiler patched with your patch returns in %mm0 when compiled w/
-mmx and returns in memory when compiled w/o -mmx, in the same way as the
unpatched compiler, but it also emits a warning about the psABI violation.

So, since the unpatched compiler crashes with an example that would
make a difference, I think the patch is OK as it is.

Thanks,
Uros.


RE: [PATCH] SLP: Move load/store-lanes check till late

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> We decided to take the regression in any code-gen this could
> give and fix it properly next stage-1.  As such here's a new
> patch based on your previous feedback.
> 
> Ok for master?

Looks good so far but be aware that you elide the

- && vect_store_lanes_supported
-  (STMT_VINFO_VECTYPE (scalar_stmts[0]), group_size, false))

part of the check - that is, you don't verify the store part
of the instance can use store-lanes.  Btw, this means the original
code cancelled an instance only when the SLP graph entry is a
store-lane capable store but your variant would also cancel in
case there's a load-lane capable reduction.

I think that you eventually want to re-instantiate the
store-lane check but treat it the same as any of the load checks
(thus not require all instances to be stores for the cancellation).
But at least when a store cannot use store-lanes we probably shouldn't
cancel the SLP.

Anyway, the patch is OK for master.  The store-lane check part can
be re-added as followup.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes
>   check to ...
>   * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-11b.c: Update output scan.
>   * gcc.dg/vect/slp-perm-6.c: Likewise.
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Thursday, October 22, 2020 9:44 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH] SLP: Move load/store-lanes check till late
> > 
> > On Wed, 21 Oct 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This moves the code that checks for load/store lanes further in the
> > > pipeline and places it after slp_optimize.  This would allow us to
> > > perform optimizations on the SLP tree and only bail out if we really
> > > have a permute.
> > >
> > > With this change it allows us to handle permutes such as {1,1,1,1}
> > > which should be handled by a load and replicate.
> > >
> > > This change however makes it all or nothing. Either all instances can
> > > be handled or none at all.  This is why some of the test cases have
> > > been adjusted.
> > 
> > So this possibly leaves a loop unvectorized in case there's a ldN/stN
> > opportunity but another SLP instance with a permutation not handled by
> > interleaving is present.  What I was originally suggesting is to only
> > cancel the SLP build if _all_ instances can be handled with ldN/stN.
> > 
> > Of course I'm also happy with completely removing this heuristic.
> > 
> > Note some of the comments look off now, also the assignment to ok before
> > the goto is pointless and you should probably turn this into a dump print
> > instead.
> > 
> > Thanks,
> > Richard.
> > 
> > > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > > x86_64-pc-linux-gnu with no issues.
> > >
> > > Ok for master?
> > 
> > 
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store
> > lanes
> > >   check to ...
> > >   * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/vect/slp-11b.c: Update output scan.
> > >   * gcc.dg/vect/slp-perm-6.c: Likewise.
> > >
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


RE: [PATCH] SLP: Move load/store-lanes check till late

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 8:07 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: RE: [PATCH] SLP: Move load/store-lanes check till late
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > We decided to take the regression in any code-gen this could give and
> > fix it properly next stage-1.  As such here's a new patch based on
> > your previous feedback.
> >
> > Ok for master?
> 
> Looks good sofar but be aware that you elide the
> 
> - && vect_store_lanes_supported
> -  (STMT_VINFO_VECTYPE (scalar_stmts[0]), group_size, false))
> 
> part of the check - that is, you don't verify the store part of the
> instance can use store-lanes.  Btw, this means the original code
> cancelled an instance only when the SLP graph entry is a store-lane
> capable store but your variant would also cancel in case there's a
> load-lane capable reduction.
> 

I do still have it,

  if (loads_permuted
  && vect_store_lanes_supported (vectype, group_size, false))

I just grab the type from SLP_TREE_VECTYPE (slp_root), which should be
the store if one exists.

> I think that you eventually want to re-instantiate the store-lane check but
> treat it the same as any of the load checks (thus not require all instances to
> be stores for the cancellation).
> But at least when a store cannot use store-lanes we probably shouldn't
> cancel the SLP.

I did however elide the kind check that was added as part of the rebase;
it looked like kind wasn't being stored inside the SLP instance and I'd
have to redo the analysis to find it.

Does it seem reasonable to include kind as a field in the SLP instance?

> 
> Anyway, the patch is OK for master.  The store-lane check part can be re-
> added as followup.
> 

Thanks! Will do.

> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store
> lanes
> > check to ...
> > * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/slp-11b.c: Update output scan.
> > * gcc.dg/vect/slp-perm-6.c: Likewise.
> >
> > > -Original Message-
> > > From: rguent...@c653.arch.suse.de  On
> > > Behalf Of Richard Biener
> > > Sent: Thursday, October 22, 2020 9:44 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > > Subject: Re: [PATCH] SLP: Move load/store-lanes check till late
> > >
> > > On Wed, 21 Oct 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This moves the code that checks for load/store lanes further in
> > > > the pipeline and places it after slp_optimize.  This would allow
> > > > us to perform optimizations on the SLP tree and only bail out if
> > > > we really have a permute.
> > > >
> > > > With this change it allows us to handle permutes such as {1,1,1,1}
> > > > which should be handled by a load and replicate.
> > > >
> > > > This change however makes it all or nothing. Either all instances
> > > > can be handled or none at all.  This is why some of the test cases
> > > > have been adjusted.
> > >
> > > So this possibly leaves a loop unvectorized in case there's a
> > > ldN/stN opportunity but another SLP instance with a permutation not
> > > handled by interleaving is present.  What I was originally
> > > suggesting is to only cancel the SLP build if _all_ instances can
> > > be handled with ldN/stN.
> > >
> > > Of course I'm also happy with completely removing this heuristic.
> > >
> > > Note some of the comments look off now, also the assignment to ok
> > > before the goto is pointless and you should probably turn this into
> > > a dump print instead.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > > > x86_64-pc-linux-gnu with no issues.
> > > >
> > > > Ok for master?
> > >
> > >
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store
> > > lanes
> > > > check to ...
> > > > * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/vect/slp-11b.c: Update output scan.
> > > > * gcc.dg/vect/slp-perm-6.c: Likewise.
> > > >
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imend
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend


[PATCH] bootstrap/97666 - really fix sizeof (bool) issue

2020-11-04 Thread Richard Biener
Pasted the previous fix too quickly; the following fixes the
correct spot - the memset, not the allocation.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-11-04  Richard Biener  

PR bootstrap/97666
* tree-vect-slp.c (vect_build_slp_tree_2): Revert previous
fix and instead adjust the memset.
---
 gcc/tree-vect-slp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 08018a1d799..11fe685bab8 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1428,8 +1428,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 
   /* If the SLP node is a PHI (induction or reduction), terminate
  the recursion.  */
-  bool *skip_args = XALLOCAVEC (bool, sizeof (bool) * nops);
-  memset (skip_args, 0, nops);
+  bool *skip_args = XALLOCAVEC (bool, nops);
+  memset (skip_args, 0, sizeof (bool) * nops);
   if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
 if (gphi *stmt = dyn_cast  (stmt_info->stmt))
   {
-- 
2.26.2


Re: [committed] libstdc++: Allow Lemire's algorithm to be used in more cases

2020-11-04 Thread Stephan Bergmann via Gcc-patches

On 03/11/2020 23:25, Jonathan Wakely wrote:

On 03/11/20 22:28 +0100, Stephan Bergmann via Libstdc++ wrote:

On 29/10/2020 15:59, Jonathan Wakely via Gcc-patches wrote:

This extends the fast path to also work when the URBG's range of
possible values is not the entire range of its result_type. Previously,
the slow path would be used for engines with a uint_fast32_t result type
if that type is actually a typedef for uint64_t rather than uint32_t.
After this change, the generator's result_type is not important, only
the range of possible value that generator can produce. If the
generator's range is exactly UINT64_MAX then the calculation will be
done using 128-bit and 64-bit integers, and if the range is UINT32_MAX
it will be done using 64-bit and 32-bit integers.

In practice, this benefits most of the engines and engine adaptors
defined in [rand.predef] on x86_64-linux and other 64-bit targets. This
is because std::minstd_rand0 and std::mt19937 and others use
uint_fast32_t, which is a typedef for uint64_t.

The code now makes use of the recently-clarified requirement that the
generator's min() and max() functions are usable in constant
expressions (see LWG 2154).

libstdc++-v3/ChangeLog:

* include/bits/uniform_int_dist.h (_Power_of_two): Add
constexpr.
(uniform_int_distribution::_S_nd): Add static_assert to ensure
the wider type is twice as wide as the result type.
(uniform_int_distribution::__generate_impl): Add static_assert
and declare variables as constexpr where appropriate.
(uniform_int_distribution::operator()): Likewise. Only consider
the uniform random bit generator's range of possible results
when deciding whether _S_nd can be used, not the __uctype type.

Tested x86_64-linux. Committed to trunk.


At least with recent Clang trunk, this causes e.g.


$ cat test.cc
#include <random>
void f(std::default_random_engine e) { std::uniform_int_distribution{0, 1}(e); }


to fail with

$ clang++ --gcc-toolchain=~/gcc/trunk/inst -std=c++17 -fsyntax-only test.cc

In file included from test.cc:1:
In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/random:40: 

In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/string:52: 

In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/stl_algo.h:66: 

~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: 
error: static_assert expression is not an integral constant expression

   static_assert( __urng.min() < __urng.max(),
  ^~~
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:190:24: 
note: in instantiation of function template specialization 
'std::uniform_int_distribution::operator()long, 16807, 0, 2147483647>>' requested here

   { return this->operator()(__urng, _M_param); }
  ^
test.cc:2:80: note: in instantiation of function template 
specialization 
'std::uniform_int_distribution::operator()long, 16807, 0, 2147483647>>' requested here
void f(std::default_random_engine e) { 
std::uniform_int_distribution{0, 1}(e); }
  
^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: 
note: function parameter '__urng' with unknown value cannot be used 
in a constant expression

   static_assert( __urng.min() < __urng.max(),
  ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: 
note: declared here

   operator()(_UniformRandomBitGenerator& __urng,
  ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:21: 
error: constexpr variable '__urngmin' must be initialized by a 
constant expression

   constexpr __uctype __urngmin = __urng.min();
  ^   
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:33: 
note: function parameter '__urng' with unknown value cannot be used 
in a constant expression

   constexpr __uctype __urngmin = __urng.min();
  ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: 
note: declared here

   operator()(_UniformRandomBitGenerator& __urng,
  ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:285:21: 
error: constexpr variable '__urngmax' must be initialized by a 
constant expression

   constexpr __uctype __urngmax = __urng

Re: [committed] libstdc++: Allow Lemire's algorithm to be used in more cases

2020-11-04 Thread Ville Voutilainen via Gcc-patches
On Wed, 4 Nov 2020 at 10:46, Stephan Bergmann via Libstdc++
 wrote:
> To me it looks like it boils down to disagreement between g++ and
> clang++ over
>
> > struct S { static constexpr int f() { return 0; } };
> > void f(S & s) { static_assert(s.f(), ""); }
>
> where I think Clang might be right in rejecting it based on [expr.const]
> "An expression e is a core constant expression unless [...] an
> id-expression that refers to a variable or data member of reference type
> unless the reference has a preceding initialization [...]"

There's more to it than that. It's a disagreement over [expr.ref]/1.
For a static member call, gcc just plain doesn't evaluate the s in
s.f(). But [expr.ref]/1 says it's evaluated, and since it's not a
constant expression, clang rejects it while gcc accepts it. That's why
your fix works; it removes the use of the otherwise-mostly-ignored
object expression for a call to a static member function.

So, I think gcc is accepting-invalid here, and we should just apply
the fix you suggested.


[committed] openmp: allocate clause vs. *reduction array sections [PR97670]

2020-11-04 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch finds the base expression of reduction array sections and uses
it in checks of whether the allocate clause lists only variables that have
been privatized.  It also fixes a pasto that caused an ICE.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-11-04  Jakub Jelinek  

PR c++/97670
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Look through array reductions to find
underlying decl to clear in the allocate_head bitmap.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Look through array reductions to
find underlying decl to clear in the aligned_head bitmap.
gcc/cp/
* semantics.c (finish_omp_clauses): Look through array reductions to
find underlying decl to clear in the aligned_head bitmap.  Use
DECL_UID (t) instead of DECL_UID (OMP_CLAUSE_DECL (c)) when clearing
in the bitmap.  Only diagnose errors about allocate vars not being
privatized on the same construct on allocate clause if it has
a DECL_P OMP_CLAUSE_DECL.
gcc/testsuite/
* c-c++-common/gomp/allocate-4.c: New test.
* g++.dg/gomp/allocate-2.C: New test.
* g++.dg/gomp/allocate-3.C: New test.

--- gcc/c-family/c-omp.c.jj 2020-10-28 10:37:50.490608344 +0100
+++ gcc/c-family/c-omp.c2020-11-03 12:38:12.152143848 +0100
@@ -2289,13 +2289,36 @@ c_omp_split_clauses (location_t loc, enu
for (c = cclauses[i]; c; c = OMP_CLAUSE_CHAIN (c))
  switch (OMP_CLAUSE_CODE (c))
{
+   case OMP_CLAUSE_REDUCTION:
+   case OMP_CLAUSE_IN_REDUCTION:
+   case OMP_CLAUSE_TASK_REDUCTION:
+ if (TREE_CODE (OMP_CLAUSE_DECL (c)) == MEM_REF)
+   {
+ tree t = TREE_OPERAND (OMP_CLAUSE_DECL (c), 0);
+ if (TREE_CODE (t) == POINTER_PLUS_EXPR)
+   t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) == ADDR_EXPR
+ || TREE_CODE (t) == INDIRECT_REF)
+   t = TREE_OPERAND (t, 0);
+ if (DECL_P (t))
+   bitmap_clear_bit (&allocate_head, DECL_UID (t));
+ break;
+   }
+ else if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST)
+   {
+ tree t;
+ for (t = OMP_CLAUSE_DECL (c);
+  TREE_CODE (t) == TREE_LIST; t = TREE_CHAIN (t))
+   ;
+ if (DECL_P (t))
+   bitmap_clear_bit (&allocate_head, DECL_UID (t));
+ break;
+   }
+ /* FALLTHRU */
case OMP_CLAUSE_PRIVATE:
case OMP_CLAUSE_FIRSTPRIVATE:
case OMP_CLAUSE_LASTPRIVATE:
case OMP_CLAUSE_LINEAR:
-   case OMP_CLAUSE_REDUCTION:
-   case OMP_CLAUSE_IN_REDUCTION:
-   case OMP_CLAUSE_TASK_REDUCTION:
  if (DECL_P (OMP_CLAUSE_DECL (c)))
bitmap_clear_bit (&allocate_head,
  DECL_UID (OMP_CLAUSE_DECL (c)));
--- gcc/c/c-typeck.c.jj 2020-10-30 08:59:57.024496901 +0100
+++ gcc/c/c-typeck.c2020-11-03 12:10:41.436201867 +0100
@@ -15072,13 +15072,26 @@ c_finish_omp_clauses (tree clauses, enum
if (allocate_seen)
  switch (OMP_CLAUSE_CODE (c))
{
+   case OMP_CLAUSE_REDUCTION:
+   case OMP_CLAUSE_IN_REDUCTION:
+   case OMP_CLAUSE_TASK_REDUCTION:
+ if (TREE_CODE (OMP_CLAUSE_DECL (c)) == MEM_REF)
+   {
+ t = TREE_OPERAND (OMP_CLAUSE_DECL (c), 0);
+ if (TREE_CODE (t) == POINTER_PLUS_EXPR)
+   t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) == ADDR_EXPR
+ || TREE_CODE (t) == INDIRECT_REF)
+   t = TREE_OPERAND (t, 0);
+ if (DECL_P (t))
+   bitmap_clear_bit (&aligned_head, DECL_UID (t));
+ break;
+   }
+ /* FALLTHRU */
case OMP_CLAUSE_PRIVATE:
case OMP_CLAUSE_FIRSTPRIVATE:
case OMP_CLAUSE_LASTPRIVATE:
case OMP_CLAUSE_LINEAR:
-   case OMP_CLAUSE_REDUCTION:
-   case OMP_CLAUSE_IN_REDUCTION:
-   case OMP_CLAUSE_TASK_REDUCTION:
  if (DECL_P (OMP_CLAUSE_DECL (c)))
bitmap_clear_bit (&aligned_head,
  DECL_UID (OMP_CLAUSE_DECL (c)));
--- gcc/cp/semantics.c.jj   2020-11-03 11:15:07.302679556 +0100
+++ gcc/cp/semantics.c  2020-11-03 15:35:41.900552419 +0100
@@ -8190,17 +8190,11 @@ finish_omp_clauses (tree clauses, enum c
}
 
   t = OMP_CLAUSE_DECL (c);
-  if (processing_template_decl
- && !VAR_P (t) && TREE_CODE (t) != PARM_DECL)
-   {
- pc = &OMP_CLAUSE_CHAIN 

Re: [Patch + RFC][contrib] gcc-changelog/git_commit.py: Check for missing description

2020-11-04 Thread Martin Liška

On 11/3/20 7:46 PM, Tobias Burnus wrote:

On 03.11.20 17:28, Martin Liška wrote:


I really think the check should support situations where a description
is provided on the next line (first after '\t', so not '\t*') as
you see in the failing test:

That was supposed to happen, but obviously didn't (first condition wrong).
Now done more properly, also because I did find an existing check, which
I missed before.

Successfully tested with --flake8 removed and running flake8 *.py.


Yes, it works nicely! I've just verified that for
git gcc-verify misc/first-auto-changelog..HEAD
it found 6 errors that are all valid.

Thanks for working on that. Please install the patch.
Martin



Better?

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter




[PATCH] i386: Fix Intel MCU psABI comment w.r.t DEFAULT_PCC_STRUCT_RETURN

2020-11-04 Thread Uros Bizjak via Gcc-patches
2020-11-04  Uroš Bizjak  

gcc/

   * config/i386/i386-options.c (ix86_recompute_optlev_based_flags):
   Fix Intel MCU psABI comment w.r.t DEFAULT_PCC_STRUCT_RETURN.

Pushed.

Uros.
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 4e1dd7ccc93..4128e933291 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1734,7 +1734,7 @@ ix86_recompute_optlev_based_flags (struct gcc_options 
*opts,
   if (opts->x_flag_pcc_struct_return == 2)
{
  /* Intel MCU psABI specifies that -freg-struct-return should
-be on.  Instead of setting DEFAULT_PCC_STRUCT_RETURN to 1,
+be on.  Instead of setting DEFAULT_PCC_STRUCT_RETURN to 0,
 we check -miamcu so that -freg-struct-return is always
 turned on if -miamcu is used.  */
  if (TARGET_IAMCU_P (opts->x_target_flags))


[PATCH][PR target/97642] Fix incorrect replacement of vmovdqu32 with vpblendd.

2020-11-04 Thread Hongtao Liu via Gcc-patches
Hi:
  When programmers explicitly use masked load intrinsics, don't
transform the instruction to vpblend{b,w,d,q}: if mem_addr points to a
memory region with less than a whole vector size of accessible memory,
the mask would prevent reading the inaccessible bytes and thus avoid a
fault.

  Bootstrap is ok, gcc regress test for i386/x86_64 backend is ok.
  Ok for trunk?

gcc/ChangeLog:

PR target/97642
* config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
(*_load_mask): New define_insns for masked load
instructions.
(_load_mask): Changed to define_expands which
specifically handle memory operands.
(_blendm): Changed to define_insns which are same
as original _load_mask with adjustment of
operands order.
(*_load): New define_insn_and_split which is
used to optimize for masked load with all one mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
make sure only masked load instruction is generated.
* gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
* gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
* gcc.target/i386/pr97642-1.c: New test.
* gcc.target/i386/pr97642-2.c: New test.

-- 
BR,
Hongtao
From 48cf0adcd55395653891888f4768b8bdc19786f2 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 3 Nov 2020 17:26:43 +0800
Subject: [PATCH] Fix incorrect replacement of vmovdqu32 with vpblendd which
 can cause fault.

gcc/ChangeLog:

	PR target/97642
	* config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
	(*_load_mask): New define_insns for masked load
	instructions.
	(_load_mask): Changed to define_expands which
	specifically handle memory operands.
	(_blendm): Changed to define_insns which are same
	as original _load_mask with adjustment of
	operands order.
	(*_load): New define_insn_and_split which is
	used to optimize for masked load with all one mask.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
	make sure only masked load instruction is generated.
	* gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
	* gcc.target/i386/pr97642-1.c: New test.
	* gcc.target/i386/pr97642-2.c: New test.
---
 gcc/config/i386/sse.md| 138 ++
 .../gcc.target/i386/avx512bw-vmovdqu16-1.c|   6 +-
 .../gcc.target/i386/avx512bw-vmovdqu8-1.c |   6 +-
 .../gcc.target/i386/avx512f-vmovapd-1.c   |   2 +-
 .../gcc.target/i386/avx512f-vmovaps-1.c   |   2 +-
 .../gcc.target/i386/avx512f-vmovdqa32-1.c |   2 +-
 .../gcc.target/i386/avx512f-vmovdqa64-1.c |   2 +-
 .../gcc.target/i386/avx512vl-vmovapd-1.c  |   4 +-
 .../gcc.target/i386/avx512vl-vmovaps-1.c  |   4 +-
 .../gcc.target/i386/avx512vl-vmovdqa32-1.c|   4 +-
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c|   4 +-
 gcc/testsuite/gcc.target/i386/pr97642-1.c |  23 +++
 gcc/testsuite/gcc.target/i386/pr97642-2.c |  77 ++
 13 files changed, 228 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97642-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97642-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 12e83df3010..0025aba4ad1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -111,6 +111,8 @@ (define_c_enum "unspec" [
   UNSPEC_MASKOP
   UNSPEC_KORTEST
   UNSPEC_KTEST
+  ;; Mask load
+  UNSPEC_MASKLOAD
 
   ;; For embed. rounding feature
   UNSPEC_EMBEDDED_ROUNDING
@@ -1065,18 +1067,34 @@ (define_insn "mov_internal"
 	  ]
 	  (symbol_ref "true")))])
 
-(define_insn "_load_mask"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
+;; If mem_addr points to a memory region with less than whole vector size bytes
+;; of accessible memory and k is a mask that would prevent reading the inaccessible
+;; bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it to be transformed to vpblendd
+;; See pr97642.
+(define_expand "_load_mask"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
 	(vec_merge:V48_AVX512VL
-	  (match_operand:V48_AVX512VL 1 "nonimmediate_operand" "vm,vm")
-	  (match_operand:V48_AVX512VL 2 "nonimm_or_0_operand" "0C,v")
-	  (

Re: Move pass_oacc_device_lower after pass_graphite

2020-11-04 Thread Richard Biener via Gcc-patches
On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
 wrote:
>
>
> Hi,
>
> as a first step towards enabling the use of Graphite for optimizing
> OpenACC loops this patch moves the OpenACC device lowering after the
> Graphite pass.  This means that the device lowering now takes place
> after some crucial optimization passes. Thus new instances of those
> passes are added inside of a new pass pass_oacc_functions which ensures
> that they run on OpenACC functions only. The choice of the new position
> for pass_oacc_device_lower is further constrained by the need to
> execute it before pass_vectorize.  This means that
> pass_oacc_device_lower now runs inside of pass_tree_loop. A further
> instance of the pass that handles functions without loops is added
> inside of pass_tree_no_loop. Yet another pass instance that executes if
> optimizations are disabled is included inside of a new
> pass_no_optimizations.
>
> The patch has been bootstrapped on x86_64-linux-gnu and tested with the
> GCC testsuite and with the libgomp testsuite with nvptx and gcn
> offloading.
>
> The patch should have no impact on non-OpenACC user code. However the
> new pass instances have changed the pass instance numbering and hence
> the dump scanning commands in several tests had to be adjusted. I hope

What's on my TODO list (or on the list of things to explore) is to make
the dump file names/suffixes explicit in passes.def like via

  NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")

and we'd get a dump named .ccp_oacc or so.  Or stick with explicit
numbers by specifying , 5.  If just the number is fixed this could
eventually be done with just tweaks to gen-pass-instances.awk

Now, what does oacc_device_lower actually do that you need to
re-run complex lowering?  What does cunrolli do at this point that
the complete_unroll pass later does not do?

What's special about oacc_device lower that doesn't also apply
to omp_device_lower?

Is all this targeted at code compiled exclusively for the offload
target?  Thus we're in lto1 here?  Does it make eventually more
sense to have a completely custom pass pipeline for the
offload compilation?  Maybe even per offload target?  See how
we have a custom pipeline for -Og (pass_all_optimizations_g).

> that I found all that needed adjustment, but it is well possible that I
> missed some tests that execute for particular targets or non-default
> languages only. The resulting UNRESOLVED tests are usually easily fixed
> by appending a pass number to the name of a pass that previously had no
> number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
> number (e.g. "dce6" becomes "dce7") in a dump scanning command.
>
> The patch leads to several new unresolved tests in the libgomp testsuite
> which are caused by the combination of torture testing, missing cleanup
> of the offload dump files, and the new pass numbering.  If a test that
> uses, for instance, "-foffload=fdump-tree-oaccdevlow" gets compiled with
> "-O0" and afterwards with "-O2", each run of the test executes different
> instances of pass_oacc_device_lower and produces dumps whose names
> differ only in the pass instance number.  The dump scanning command in
> the second run fails, because the dump files do not get removed after
> the first run and the command consequently matches two different dump
> files.  This seems to be a known issue.  I am going to submit a patch
> that implements the cleanup of the offload dumps soon.
>
> I have tried to rule out performance regressions by running different
> benchmark suites with nvptx and gcn offloading. Nevertheless, I think
> that it makes sense to keep an eye on OpenACC performance in the near
> future and revisit the optimizations that run on the device-lowered
> function if necessary.
>
> Ok to include the patch in master?
>
> Best regards,
> Frederik
>
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter


[RFC PATCH] phiopt: Optimize x ? 1024 : 0 to (int) x << 10 [PR97690]

2020-11-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch generalizes the x ? 1 : 0 -> (int) x optimization
to handle also left shifts by constant.

During x86_64-linux and i686-linux bootstraps + regtests it triggered
in 1514 unique non-LTO -m64 cases (sort -u on log mentioning
filename, function name and shift count) and 1866 -m32 cases.

Unfortunately, the patch regresses:
+FAIL: gcc.dg/tree-ssa/ssa-ccp-11.c scan-tree-dump-times optimized "if " 0
+FAIL: gcc.dg/vect/bb-slp-pattern-2.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "optimized: basic block" 1
+FAIL: gcc.dg/vect/bb-slp-pattern-2.c scan-tree-dump-times slp1 "optimized: basic block" 1
and in both cases it actually results in worse code.

In ssa-ccp-11.c since phiopt2 it results in smaller IL due to the
optimization, e.g.
-  if (_1 != 0)
-goto ; [21.72%]
-  else
-goto ; [78.28%]
-
-   [local count: 233216728]:
-
-   [local count: 1073741824]:
-  # _4 = PHI <2(5), 0(4)>
-  return _4;
+  _7 = (int) _1;
+  _8 = _7 << 1;
+  return _8;
but dom2 actually manages to optimize it only without this optimization:
-  # a_7 = PHI <0(3), 1(2)>
-  # b_8 = PHI <1(3), 0(2)>
-  _9 = a_7 & b_8;
-  return 0;
+  # a_2 = PHI <1(2), 0(3)>
+  # b_3 = PHI <0(2), 1(3)>
+  _1 = a_2 & b_3;
+  _7 = (int) _1;
+  _8 = _7 << 1;
+  return _8;
We'd need some optimization that would go through all PHI edges and
compute whether some uses of the PHI results actually compute a constant
across all the PHI edges - 1 & 0 and 0 & 1 is always 0.  Similarly in the
other function
+  # a_1 = PHI <3(2), 2(3)>
+  # b_2 = PHI <2(2), 3(3)>
+  c_5 = a_1 + b_2;
is always c_5 = 5;
Similarly, in the slp vectorization test there is:
 a[0] = b[0] ? 1 : 7;
 a[1] = b[1] ? 2 : 0;
 a[2] = b[2] ? 3 : 0;
 a[3] = b[3] ? 4 : 0;
 a[4] = b[4] ? 5 : 0;
 a[5] = b[5] ? 6 : 0;
 a[6] = b[6] ? 7 : 0;
 a[7] = b[7] ? 8 : 0;
and obviously if the ? 2 : 0 and ? 4 : 0 and ? 8 : 0 are optimized
into shifts, it doesn't match anymore.

So, I wonder if we shouldn't perform this optimization only in the last
phiopt pass (i.e. change the bool early argument to an int late, where it
would be 0 (early), 1 (late) and 2 (very late)) and perform this only if
very late.

Thoughts on this?

2020-11-03  Jakub Jelinek  

PR tree-optimization/97690
* tree-ssa-phiopt.c (conditional_replacement): Also optimize
cond ? pow2p_cst : 0 as ((type) cond) << cst.

* gcc.dg/tree-ssa/phi-opt-22.c: New test.

--- gcc/tree-ssa-phiopt.c.jj2020-10-22 09:36:25.602484491 +0200
+++ gcc/tree-ssa-phiopt.c   2020-11-03 17:59:18.133662581 +0100
@@ -752,7 +752,9 @@ conditional_replacement (basic_block con
   gimple_stmt_iterator gsi;
   edge true_edge, false_edge;
   tree new_var, new_var2;
-  bool neg;
+  bool neg = false;
+  int shift = 0;
+  tree nonzero_arg;
 
   /* FIXME: Gimplification of complex type is too hard for now.  */
   /* We aren't prepared to handle vectors either (and it is a question
@@ -763,14 +765,22 @@ conditional_replacement (basic_block con
   || POINTER_TYPE_P (TREE_TYPE (arg1
 return false;
 
-  /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
- convert it to the conditional.  */
-  if ((integer_zerop (arg0) && integer_onep (arg1))
-  || (integer_zerop (arg1) && integer_onep (arg0)))
-neg = false;
-  else if ((integer_zerop (arg0) && integer_all_onesp (arg1))
-  || (integer_zerop (arg1) && integer_all_onesp (arg0)))
+  /* The PHI arguments have the constants 0 and 1, or 0 and -1 or
+ 0 and (1 << cst), then convert it to the conditional.  */
+  if (integer_zerop (arg0))
+nonzero_arg = arg1;
+  else if (integer_zerop (arg1))
+nonzero_arg = arg0;
+  else
+return false;
+  if (integer_all_onesp (nonzero_arg))
 neg = true;
+  else if (integer_pow2p (nonzero_arg))
+{
+  shift = tree_log2 (nonzero_arg);
+  if (shift && POINTER_TYPE_P (TREE_TYPE (nonzero_arg)))
+   return false;
+}
   else
 return false;
 
@@ -782,12 +792,12 @@ conditional_replacement (basic_block con
  falls through into BB.
 
  There is a single PHI node at the join point (BB) and its arguments
- are constants (0, 1) or (0, -1).
+ are constants (0, 1) or (0, -1) or (0, (1 << shift)).
 
  So, given the condition COND, and the two PHI arguments, we can
  rewrite this PHI into non-branching code:
 
-   dest = (COND) or dest = COND'
+   dest = (COND) or dest = COND' or dest = (COND) << shift
 
  We use the condition as-is if the argument associated with the
  true edge has the value one or the argument associated with the
@@ -822,6 +832,14 @@ conditional_replacement (basic_block con
   cond = fold_build1_loc (gimple_location (stmt),
   NEGATE_EXPR, TREE_TYPE (cond), cond);
 }
+  else if (shift)
+{
+  cond = fold_convert_loc (gimple_location (stmt),
+  TREE_TYPE (result), cond);
+  cond = fold_build2_loc (gimple_loc

Re: [PATCH v5] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Raoni Fassina Firmino wrote:

> I am repeating the "changelog" from v3 and v4 here because v4 and v5
> have just minor changes since v3.
> 
> Changes since v4[1]:
>   - Fixed more spelling and code style.
>   - Added more clarification in comments for the feraiseexcept and
> feclearexcept expanders;
> 
> Changes since v3[2]:
>   - Fixed fegetround bug on powerpc64 (big endian) that Segher
> spotted;
> 
> Changes since v2[3]:
>   - Added documentation for the new optabs;
>   - Remove use of non portable __builtin_clz;
>   - Changed feclearexcept and feraiseexcept to accept all 4 valid
> flags at the same time and added more test for that case;
>   - Extended feclearexcept and feraiseexcept testcases to match
> accepting multiple flags;
>   - Fixed builtin-feclearexcept-feraiseexcept-2.c testcase comparison
> after feclearexcept tests;
>   - Updated commit message to reflect the change in feclearexcept and
> feraiseexcept from the glibc counterpart;
>   - Fixed English spelling and typos;
>   - Fixed code-style;
>   - Changed subject line tag to make clear it is not just rs6000 code.
> 
> Tested on top of master (23ac7a009ecfeec3eab79136abed8aac9768b458)
> on the following platforms with no regression:
>   - powerpc64le-linux-gnu (Power 9)
>   - powerpc64le-linux-gnu (Power 8)
>   - powerpc64-linux-gnu (Power 8)
>   - powerpc-linux-gnu (Power 8)
> 
> Documentation changes tested on x86_64-redhat-linux.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557349.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557109.html
> [3] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553297.html
> 
>  8< 
> 
> These optimizations were originally in glibc, but they were removed
> there with the suggestion that they were a good fit as gcc builtins[1].
> 
> feclearexcept and feraiseexcept were extended (in comparison to the
> glibc version) to accept any combination of the accepted flags, not
> limited to just one flag bit at a time anymore.
> 
> The associated bugreport: PR target/94193
> 
> [1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
> https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html
> 
> 2020-08-13  Raoni Fassina Firmino  
> 
> gcc/ChangeLog:
> 
> * builtins.c (expand_builtin_fegetround): New function.
> (expand_builtin_feclear_feraise_except): New function.
> (expand_builtin): Add cases for BUILT_IN_FEGETROUND,
> BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT
> * config/rs6000/rs6000.md (fegetroundsi): New pattern.
> (feclearexceptsi): New Pattern.
> (feraiseexceptsi): New Pattern.
> * optabs.def (fegetround_optab): New optab.
> (feclearexcept_optab): New optab.
> (feraiseexcept_optab): New optab.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New 
> test.
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New 
> test.
> * gcc.target/powerpc/builtin-fegetround.c: New test.
> 
> Signed-off-by: Raoni Fassina Firmino 
> ---
>  gcc/builtins.c|  76 +++
>  gcc/config/rs6000/rs6000.md   |  83 +++
>  gcc/doc/md.texi   |  17 ++
>  gcc/optabs.def|   4 +
>  .../builtin-feclearexcept-feraiseexcept-1.c   |  76 +++
>  .../builtin-feclearexcept-feraiseexcept-2.c   | 203 ++
>  .../gcc.target/powerpc/builtin-fegetround.c   |  36 
>  7 files changed, 495 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index da25343beb1..4d80f34a110 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -116,6 +116,9 @@ static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
>  static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
>  static rtx expand_builtin_interclass_mathfn (tree, rtx);
>  static rtx expand_builtin_sincos (tree);
> +static rtx expand_builtin_fegetround (tree, rtx, machine_mode);
> +static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode,
> +   optab);
>  static rtx expand_builtin_cexpi (tree, rtx);
>  static rtx expand_builtin_int_roundingfn (tree, rtx);
>  static rtx expand_builtin_int_roundingfn_2 (tree, rtx);
> @@ -2893,6 +2896,59 @@ expand_builtin_sincos (tree exp)
>return const0_rtx;
>  }
>  
> +/* Expand call EXP to the fegetround builtin (from C99 fenv.h), returning the
> +   result and setting it in TARGET.  Otherwise return NULL_RTX on failure.  
> */
> +static rtx
> +expand_builtin_fegetround (tree exp, rtx target, machine_mode target_mode)
> +{
> +  if (!valid

RE: [PATCH] SLP: Move load/store-lanes check till late

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 8:07 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: RE: [PATCH] SLP: Move load/store-lanes check till late
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > We decided to take the regression in any code-gen this could give and
> > > fix it properly next stage-1.  As such here's a new patch based on
> > > your previous feedback.
> > >
> > > Ok for master?
> > 
> > Looks good so far but be aware that you elide the
> > 
> > - && vect_store_lanes_supported
> > -  (STMT_VINFO_VECTYPE (scalar_stmts[0]), group_size,
> > false))
> > 
> > part of the check - that is, you don't verify the store part of the 
> > instance can
> > use store-lanes.  Btw, this means the original code cancelled an instance 
> > only
> > when the SLP graph entry is a store-lane capable store but your variant
> > would also cancel in case there's a load-lane capable reduction.
> > 
> 
> I do still have it,
> 
> if (loads_permuted
> && vect_store_lanes_supported (vectype, group_size, false))
> 
> I just grab the type from SLP_TREE_VECTYPE (slp_root), which should be
> the store's vectype if one exists.

Ah, I see.

> > I think that you eventually want to re-instantiate the store-lane check but
> > treat it the same as any of the load checks (thus not require all instances 
> > to
> > be stores for the cancellation).
> > But at least when a store cannot use store-lanes we probably shouldn't
> > cancel the SLP.
> 
> I did however elide the kind check; that was added as part of the rebase.
> It looked like kind wasn't being stored inside the SLP instance and I'd
> have to redo the analysis to find it.
> 
> Does it seem reasonable to include kind as a field in the SLP instance?

Yeah, it's on my list of things - I just didn't (yet) have a use outside
of the current narrow scope.  So feel free to add a field to the SLP
instance and move the enum to tree-vectorizer.h.

> > 
> > Anyway, the patch is OK for master.  The store-lane check part can be re-
> > added as followup.
> > 
> 
> Thanks! Will do.
> 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store
> > lanes
> > >   check to ...
> > >   * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/vect/slp-11b.c: Update output scan.
> > >   * gcc.dg/vect/slp-perm-6.c: Likewise.
> > >
> > > > -Original Message-
> > > > From: rguent...@c653.arch.suse.de  On
> > > > Behalf Of Richard Biener
> > > > Sent: Thursday, October 22, 2020 9:44 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > > > Subject: Re: [PATCH] SLP: Move load/store-lanes check till late
> > > >
> > > > On Wed, 21 Oct 2020, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This moves the code that checks for load/store lanes further in
> > > > > the pipeline and places it after slp_optimize.  This would allow
> > > > > us to perform optimizations on the SLP tree and only bail out if
> > > > > we really have a
> > > > permute.
> > > > >
> > > > > With this change it allows us to handle permutes such as {1,1,1,1}
> > > > > which should be handled by a load and replicate.
> > > > >
> > > > > This change however makes it all or nothing. Either all instances
> > > > > can be handled or none at all.  This is why some of the test cases
> > > > > have been
> > > > adjusted.
> > > >
> > > > So this possibly leaves a loop unvectorized in case there's a
> > > > ldN/stN opportunity but another SLP instance with a permutation not
> > > > handled by interleaving is present.  What I was originally
> > > > suggesting is to only cancel the SLP build if _all_ instances can be 
> > > > handled
> > with ldN/stN.
> > > >
> > > > Of course I'm also happy with completely removing this heuristics.
> > > >
> > > > Note some of the comments look off now, also the assignment to ok
> > > > before the goto is pointless and you should probably turn this into
> > > > a dump print instead.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > -x86_64-pc-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store
> > > > lanes
> > > > >   check to ...
> > > > >   * tree-vect-loop.c (vect_analyze_loop_2): ..Here
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >   * gcc.dg/vect/slp-11b.c: Update output scan.
> > > > >   * gcc.dg/vect/slp-perm-6.c: Likewise.

[PATCH] testsuite: fix arm/pure-code/no-literal-pool-* tests

2020-11-04 Thread Christophe Lyon via Gcc-patches
Hi,

Add -mfloat-abi=soft and skip the tests if -mfloat-abi=hard is
supplied.

This avoids failures when testing with overridden flags such as
mthumb/-mcpu=cortex-m4/-mfloat-abi=hard

Pushed as obvious.

2020-11-04  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/pure-code/no-literal-pool-m0.c: Add dg-skip-if
and -mfloat-abi=soft option.
* gcc.target/arm/pure-code/no-literal-pool-m23.c: Likewise.
From 14ddf41acb96f28815b9fffe9a408be255e1ca2c Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Wed, 4 Nov 2020 09:33:42 +
Subject: [PATCH] testsuite: fix arm/pure-code/no-literal-pool-* tests

Add -mfloat-abi=soft and skip the tests if -mfloat-abi=hard is
supplied.

This avoids failures when testing with overridden flags such as
mthumb/-mcpu=cortex-m4/-mfloat-abi=hard

2020-11-04  Christophe Lyon  

	gcc/testsuite/
	* gcc.target/arm/pure-code/no-literal-pool-m0.c: Add dg-skip-if
	and -mfloat-abi=soft option.
	* gcc.target/arm/pure-code/no-literal-pool-m23.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c  | 3 ++-
 gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m23.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
index 787a61a..bd6f4af 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
+++ b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-mpure-code -mcpu=cortex-m0 -march=armv6s-m -mthumb" } */
+/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
+/* { dg-options "-mpure-code -mcpu=cortex-m0 -march=armv6s-m -mthumb -mfloat-abi=soft" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 /* Does not use thumb1_gen_const_int.
diff --git a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m23.c b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m23.c
index 67d63d2..9537012 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m23.c
+++ b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m23.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb" } */
+/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
+/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb -mfloat-abi=soft" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 /*
-- 
2.7.4



[PATCH] Re-instantiate SLP induction IV CSE

2020-11-04 Thread Richard Biener
This re-instantiates the previously removed CSE, fixing the
FAIL of gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
It turns out the previous approach still works.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-04  Richard Biener  

* tree-vect-loop.c (vectorizable_induction): Re-instantiate
previously removed CSE of SLP IVs.
---
 gcc/tree-vect-loop.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 41e2e2ade20..c09aa392419 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7874,8 +7874,16 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   if (nested_in_vect_loop)
nivs = nvects;
   else
-   nivs = least_common_multiple (group_size,
- const_nunits) / const_nunits;
+   {
+ /* Compute the number of distinct IVs we need.  First reduce
+group_size if it is a multiple of const_nunits so we get
+one IV for a group_size of 4 but const_nunits 2.  */
+ unsigned group_sizep = group_size;
+ if (group_sizep % const_nunits == 0)
+   group_sizep = group_sizep / const_nunits;
+ nivs = least_common_multiple (group_sizep,
+   const_nunits) / const_nunits;
+   }
   tree stept = TREE_TYPE (step_vectype);
   tree lupdate_mul = NULL_TREE;
   if (!nested_in_vect_loop)
@@ -7975,6 +7983,15 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 
  SLP_TREE_VEC_STMTS (slp_node).quick_push (induction_phi);
}
+  if (!nested_in_vect_loop)
+   {
+ /* Fill up to the number of vectors we need for the whole group.  */
+ nivs = least_common_multiple (group_size,
+   const_nunits) / const_nunits;
+ for (; ivn < nivs; ++ivn)
+   SLP_TREE_VEC_STMTS (slp_node)
+ .quick_push (SLP_TREE_VEC_STMTS (slp_node)[0]);
+   }
 
   /* Re-use IVs when we can.  We are generating further vector
 stmts by adding VF' * stride to the IVs generated above.  */
-- 
2.26.2


Re: [RFC PATCH] phiopt: Optimize x ? 1024 : 0 to (int) x << 10 [PR97690]

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch generalizes the x ? 1 : 0 -> (int) x optimization
> to handle also left shifts by constant.
> 
> During x86_64-linux and i686-linux bootstraps + regtests it triggered
> in 1514 unique non-LTO -m64 cases (sort -u on log mentioning
> filename, function name and shift count) and 1866 -m32 cases.
> 
> Unfortunately, the patch regresses:
> +FAIL: gcc.dg/tree-ssa/ssa-ccp-11.c scan-tree-dump-times optimized "if " 0
> +FAIL: gcc.dg/vect/bb-slp-pattern-2.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "optimized: basic block" 1
> +FAIL: gcc.dg/vect/bb-slp-pattern-2.c scan-tree-dump-times slp1 "optimized: basic block" 1
> and in both cases it actually results in worse code.
> 
> In ssa-ccp-11.c since phiopt2 it results in smaller IL due to the
> optimization, e.g.
> -  if (_1 != 0)
> -goto ; [21.72%]
> -  else
> -goto ; [78.28%]
> -
> -   [local count: 233216728]:
> -
> -   [local count: 1073741824]:
> -  # _4 = PHI <2(5), 0(4)>
> -  return _4;
> +  _7 = (int) _1;
> +  _8 = _7 << 1;
> +  return _8;
> but dom2 actually manages to optimize it only without this optimization:
> -  # a_7 = PHI <0(3), 1(2)>
> -  # b_8 = PHI <1(3), 0(2)>
> -  _9 = a_7 & b_8;
> -  return 0;
> +  # a_2 = PHI <1(2), 0(3)>
> +  # b_3 = PHI <0(2), 1(3)>
> +  _1 = a_2 & b_3;
> +  _7 = (int) _1;
> +  _8 = _7 << 1;
> +  return _8;
> We'd need some optimization that would go through all PHI edges and
> compute whether some uses of the PHI results actually compute a constant
> across all the PHI edges - 1 & 0 and 0 & 1 is always 0.

PRE should do this, IMHO only optimizing it at -O2 is fine.  Can you
check?

>  Similarly in the
> other function
> +  # a_1 = PHI <3(2), 2(3)>
> +  # b_2 = PHI <2(2), 3(3)>
> +  c_5 = a_1 + b_2;
> is always c_5 = 5;
> Similarly, in the slp vectorization test there is:
>  a[0] = b[0] ? 1 : 7;

note this, carefully avoiding the already "optimized" b[0] ? 1 : 0 ...

>  a[1] = b[1] ? 2 : 0;
>  a[2] = b[2] ? 3 : 0;
>  a[3] = b[3] ? 4 : 0;
>  a[4] = b[4] ? 5 : 0;
>  a[5] = b[5] ? 6 : 0;
>  a[6] = b[6] ? 7 : 0;
>  a[7] = b[7] ? 8 : 0;
> and obviously if the ? 2 : 0 and ? 4 : 0 and ? 8 : 0 are optimized
> into shifts, it doesn't match anymore.

So the option is to put : 7 in the 2, 4 and 8 cases as well.  The testcase
wasn't added for any real-world case but is artificial, I guess for
COND_EXPR handling of invariants.

> So, I wonder if we shouldn't perform this optimization only in the last
> phiopt pass (i.e. change the bool early argument to an int late, where it
> would be 0 (early), 1 (late) and 2 (very late)) and perform this only if
> very late.

Well, we always have the issue that a more "complex" expression might
be more easily canonical.  But removing control flow is important,
and if we decide that we want to preserve it in a more "canonical"
(general) form then we should consider replacing

  if (_1 != 0)

  # _2 = PHI <0, 1>

with

  _2 = _1 ? 0 : 1;

in general and doing fancy expansion late.  But we're already doing
the other thing so ...

But yeah, for things like SLP it means we eventually have to
implement reverse transforms for all of this to make the lanes
matching.  But that's true anyway for things like x + 1 vs. x + 0
or x / 3 vs. x / 2 or other simplifications we do.

> Thoughts on this?

OK with the FAILing testcases adjusted (use -O2 / different constants).

Thanks,
Richard.

> 2020-11-03  Jakub Jelinek  
> 
>   PR tree-optimization/97690
>   * tree-ssa-phiopt.c (conditional_replacement): Also optimize
>   cond ? pow2p_cst : 0 as ((type) cond) << cst.
> 
>   * gcc.dg/tree-ssa/phi-opt-22.c: New test.
> 
> --- gcc/tree-ssa-phiopt.c.jj  2020-10-22 09:36:25.602484491 +0200
> +++ gcc/tree-ssa-phiopt.c 2020-11-03 17:59:18.133662581 +0100
> @@ -752,7 +752,9 @@ conditional_replacement (basic_block con
>gimple_stmt_iterator gsi;
>edge true_edge, false_edge;
>tree new_var, new_var2;
> -  bool neg;
> +  bool neg = false;
> +  int shift = 0;
> +  tree nonzero_arg;
>  
>/* FIXME: Gimplification of complex type is too hard for now.  */
>/* We aren't prepared to handle vectors either (and it is a question
> @@ -763,14 +765,22 @@ conditional_replacement (basic_block con
>  || POINTER_TYPE_P (TREE_TYPE (arg1
>  return false;
>  
> -  /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
> - convert it to the conditional.  */
> -  if ((integer_zerop (arg0) && integer_onep (arg1))
> -  || (integer_zerop (arg1) && integer_onep (arg0)))
> -neg = false;
> -  else if ((integer_zerop (arg0) && integer_all_onesp (arg1))
> -|| (integer_zerop (arg1) && integer_all_onesp (arg0)))
> +  /* The PHI arguments have the constants 0 and 1, or 0 and -1 or
> + 0 and (1 << cst), then convert it to the conditional.  */
> +  if (integer_zerop (arg0))
> +nonzero_arg = arg1;
> +  else if (integer_zerop (arg1))
> +nonzero_arg = arg

[PATCH][pushed] gcc-changelog: Change parse_git_revisions strict argument to True.

2020-11-04 Thread Martin Liška

Change the default that is used by the GIT server hook and also
by git_update_version.py. Both should now use True.

Right now the server hook uses:
/home/gccadmin/hooks-bin/commit_checker

commits = parse_git_revisions(os.environ['GIT_DIR'], commit_rev)
errs = []
for commit in commits:
    if not commit.success:
        errs.extend(commit.errors)
if errs:
    message = 'ChangeLog format failed:\n'
    for err in errs:
        message += 'ERR: %s\n' % err
    message += '\nPlease see: https://gcc.gnu.org/codingconventions.html#ChangeLogs\n'
    error(message)

which means it now uses non-strict mode. That's bad.

Martin

contrib/ChangeLog:

* gcc-changelog/git_repository.py: Set strict=True
for parse_git_revisions as a default.
---
 contrib/gcc-changelog/git_repository.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_repository.py 
b/contrib/gcc-changelog/git_repository.py
index 90edc3ce3d8..8edcff91ad6 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -29,7 +29,7 @@ except ImportError:
 from git_commit import GitCommit, GitInfo
 
 
-def parse_git_revisions(repo_path, revisions, strict=False):
+def parse_git_revisions(repo_path, revisions, strict=True):
 repo = Repo(repo_path)
 
 def commit_to_info(commit):

--
2.29.1



Re: [committed] libstdc++: Allow Lemire's algorithm to be used in more cases

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 09:45 +0100, Stephan Bergmann via Libstdc++ wrote:

On 03/11/2020 23:25, Jonathan Wakely wrote:

On 03/11/20 22:28 +0100, Stephan Bergmann via Libstdc++ wrote:

On 29/10/2020 15:59, Jonathan Wakely via Gcc-patches wrote:

This extends the fast path to also work when the URBG's range of
possible values is not the entire range of its result_type. Previously,
the slow path would be used for engines with a uint_fast32_t result type
if that type is actually a typedef for uint64_t rather than uint32_t.
After this change, the generator's result_type is not important, only
the range of possible values that the generator can produce. If the
generator's range is exactly UINT64_MAX then the calculation will be
done using 128-bit and 64-bit integers, and if the range is UINT32_MAX
it will be done using 64-bit and 32-bit integers.

In practice, this benefits most of the engines and engine adaptors
defined in [rand.predef] on x86_64-linux and other 64-bit targets. This
is because std::minstd_rand0 and std::mt19937 and others use
uint_fast32_t, which is a typedef for uint64_t.

The code now makes use of the recently-clarified requirement that the
generator's min() and max() functions are usable in constant
expressions (see LWG 2154).

libstdc++-v3/ChangeLog:

    * include/bits/uniform_int_dist.h (_Power_of_two): Add
    constexpr.
    (uniform_int_distribution::_S_nd): Add static_assert to ensure
    the wider type is twice as wide as the result type.
    (uniform_int_distribution::__generate_impl): Add static_assert
    and declare variables as constexpr where appropriate.
    (uniform_int_distribution::operator()): Likewise. Only consider
    the uniform random bit generator's range of possible results
    when deciding whether _S_nd can be used, not the __uctype type.

Tested x86_64-linux. Committed to trunk.


At least with recent Clang trunk, this causes e.g.


$ cat test.cc
#include <random>
void f(std::default_random_engine e) { std::uniform_int_distribution{0, 1}(e); }


to fail with

$ clang++ --gcc-toolchain=~/gcc/trunk/inst -std=c++17 -fsyntax-only test.cc

In file included from test.cc:1:
In file included from ~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/random:40:
In file included from ~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/string:52:
In file included from ~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/stl_algo.h:66:
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: error: static_assert expression is not an integral constant expression
       static_assert( __urng.min() < __urng.max(),
                      ^~~
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:190:24: note: in instantiation of function template specialization 'std::uniform_int_distribution::operator()long, 16807, 0, 2147483647>>' requested here
       { return this->operator()(__urng, _M_param); }
                      ^
test.cc:2:80: note: in instantiation of function template specialization 'std::uniform_int_distribution::operator()long, 16807, 0, 2147483647>>' requested here
void f(std::default_random_engine e) { std::uniform_int_distribution{0, 1}(e); }
                                                                               ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: note: function parameter '__urng' with unknown value cannot be used in a constant expression
       static_assert( __urng.min() < __urng.max(),
                      ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: note: declared here
       operator()(_UniformRandomBitGenerator& __urng,
                                              ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:21: error: constexpr variable '__urngmin' must be initialized by a constant expression
       constexpr __uctype __urngmin = __urng.min();
                          ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:33: note: function parameter '__urng' with unknown value cannot be used in a constant expression
       constexpr __uctype __urngmin = __urng.min();
                                      ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: note: declared here
       operator()(_UniformRandomBitGenerator& __urng,

Re: [PATCH][PR target/97540] Don't extract memory from operand for normal memory constraint.

2020-11-04 Thread Richard Sandiford via Gcc-patches
Hongtao Liu  writes:
> On Tue, Nov 3, 2020 at 9:51 PM Richard Sandiford
>  wrote:
>>
>> Vladimir Makarov via Gcc-patches  writes:
>> > On 2020-10-27 2:53 a.m., Hongtao Liu wrote:
>> >> Hi:
>> >>For inline asm, there could be an operand like (not (mem:)), it's
>> >> not a valid operand for normal memory constraint.
>> >>Bootstrap is ok, regression test is ok for make check
>> >> RUNTESTFLAGS="--target_board='unix{-m32,}'"
>> >>
>> >> gcc/ChangeLog
>> >>  PR target/97540
>> >>  * ira.c (ira_setup_alts): Extract memory from operand only
>> >>  for special memory constraint.
>> >>  * recog.c (asm_operand_ok): Ditto.
>> >>  * lra-constraints.c (process_alt_operands): MEM_P is
>> >>  required for normal memory constraint.
>> >>
>> >> gcc/testsuite/ChangeLog
>> >>  * gcc.target/i386/pr97540.c: New test.
>> >>
>> > I understand Richard's concerns and actually these concerns were my
>> > motivations to constraint possible cases for extract_mem_from_operand in
>> > the original patch introducing the function.
>> >
>> > If Richard proposes a better solution we will reconsider the current
>> > approach and revert the changes if it is necessary.
>> >
>> > Meanwhile I am approving this patch.  I hope it will not demotivate
>> > Richard's attempt to find a better solution.
>>
>> OK, that's fine with me.  I might come back to this next stage 1,
>> depending on how things turn out.
>>
>> Richard
>
> Thanks for all your comments, patch committed.
> And I'm not going to add "Br" to more patterns until the final
> solution is in place.

Please don't hold off on my account.  I think any future update is
likely to be mechanical and having more example uses will be helpful.

Thanks,
Richard


Re: [PATCH V2] aarch64: Add vcopy(q)__lane(q)_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Richard Sandiford  writes:

[...]

>> OK with that change if it works (for trunk and for whichever
>> branches need it).
>>
>> Thanks,
>> Richard
>
> Hi Richard,
>
> I've applied the suggestions and have now installed it into master as
> 8eb8dcac6ed.  I'll follow up with the backports.
>
> Thanks!
>
>   Andrea


Hi all,

applied to releases/gcc-10 as b768eef488a.

  Andrea


Re: [PATCH V3] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Andrea Corallo via Gcc-patches  writes:
> [...]
>
>> Hi all,
>>
>> third version of this patch following the suggestions got for its sister
>> patch 
>>
>> Regtested and bootstrapped.
>>
>> Okay for trunk and 10?
>>
>> Thanks!
>>
>>   Andrea
>
> Installed into master as 292c812a27c (okay given here
> )
>
> Will follow-up for the backport.
>
>   Andrea


Hi all,

installed into releases/gcc-10 as 73be6bd1433.

  Andrea


Re: [PATCH V4] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Richard Sandiford  writes:
> [...]
>> OK for both.  Thanks for doing this.
>>
>> Richard
>
> Welcome, installed into master as 44e570d9fb0.
>
> Will follow-up for the backport.
>
> Thanks!
>
>   Andrea


Hi all,

installed into releases/gcc-10 as aa97379d802.

Bests

  Andrea


Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64

2020-11-04 Thread Richard Sandiford via Gcc-patches
"yangyang (ET)"  writes:
> Hi, 
>
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Wednesday, November 4, 2020 12:15 AM
>> To: yangyang (ET) 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into
>> poly_uint64
>> 
>> "yangyang (ET)"  writes:
>> > Hi,
>> >
>> > I have revised the patch based on your suggestions. I use multiple_p 
>> > instead
>> of !multiple_p if the eq situation is OK to make it easier to understand.
>> >
>> >> >> > if (n->simdclone->inbranch)
>> >> >> >   this_badness += 2048;
>> >> >> > int target_badness = targetm.simd_clone.usable (n); @@
>> >> >> > -3988,19
>> >> >> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> >> >> > +stmt_vec_info
>> >> >> stmt_info,
>> >> >> > arginfo[i].vectype = get_vectype_for_scalar_type (vinfo,
>> >> >> > arg_type,
>> >> >> >
>> >> slp_node);
>> >> >> > if (arginfo[i].vectype == NULL
>> >> >> > -   || (simd_clone_subparts (arginfo[i].vectype)
>> >> >> > -   > bestn->simdclone->simdlen))
>> >> >> > +   || (known_gt (simd_clone_subparts (arginfo[i].vectype),
>> >> >> > + bestn->simdclone->simdlen)))
>> >> >>
>> >> >> Here too I think we want constant_multiple_p:
>> >> >>
>> >> >>   || !constant_multiple_p (bestn->simdclone->simdlen,
>> >> >>simd_clone_subparts
>> >> >> (arginfo[i].vectype))
>> >> >>
>> >> >
>> >> > Use multiple_p here since the multiple is not needed.
>> >>
>> >> True, but in the case of vectorisation, we need to generate a
>> >> constant number of copies at compile time.  If we don't enforce a
>> >> constant multiple, we might end up trying to use an Advanced SIMD routine
>> when vectorising for SVE.
>> >>
>> >> The reason we don't have a two-argument version of
>> >> constant_multiple_p is that so far nothing has needed it (at least
>> >> AFAIK).  There's no conceptual problem with adding it though.  I'm happy
>> to do that if it would help.
>> >>
>> >
>> > Two-argument versions of constant_multiple_p are added in the v3-patch.
>> Could you please check if the added versions are OK ?
>> >
>> > Bootstrap and tested on both aarch64 and x86 Linux platform, no new
>> regression witnessed.
>> 
>> Looks great, thanks.  Pushed to trunk.
>> 
>> Richard
>
> Thanks for installing the patch. As you mentioned in the PR, stage1 of GCC 11 
> is going to close in a few weeks, and
> GCC Development Plan describes the stage3 as " During this two-month period, 
> the only (non-documentation) changes
> that may be made are changes that fix bugs or new ports which do not require 
> changes to other parts of the compiler.
> New functionality may not be introduced during this period. ". So does it 
> mean that the rest four patches of this feature
> need to wait for the GCC 12 stage1 to get installed?

Any target-independent patches would need to be posted by the end of next
week to get into GCC 11.  There's a bit more leeway for SVE-specific
pieces in config/aarch64, since those have a lower impact.

Thanks,
Richard
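
As a side note on the two-argument constant_multiple_p discussed earlier in the
thread: a poly_uint64 has the shape c0 + c1 * X, where X is unknown at compile
time (e.g. the SVE vector-length multiplier), and the predicate asks whether
one value is a single compile-time multiple of the other for every X.  A rough
stand-alone model (illustrative only, not GCC's poly-int implementation):

```cpp
#include <cassert>

// Hypothetical two-coefficient model of poly_uint64: value = c0 + c1 * X,
// where X is unknown at compile time (e.g. the SVE length multiplier).
struct poly2 { unsigned long c0, c1; };

// Sketch of the two-argument constant_multiple_p: is there one compile-time
// constant M such that A == M * B holds for every runtime X?
bool constant_multiple_p (poly2 a, poly2 b, unsigned long *m)
{
  unsigned long q;
  if (b.c0)       q = a.c0 / b.c0;  // candidate multiple from the constant term
  else if (b.c1)  q = a.c1 / b.c1;  // B is a pure multiple of X
  else            return false;     // B == 0: no meaningful multiple
  if (a.c0 != q * b.c0 || a.c1 != q * b.c1)
    return false;                   // must match coefficient by coefficient
  *m = q;
  return true;
}
```

This also shows why the check prevents mixing ABIs: a fixed Advanced SIMD
length (c1 == 0) can never be a constant multiple of an SVE-style length with
a nonzero X coefficient, or vice versa.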


Re: [committed] libstdc++: Allow Lemire's algorithm to be used in more cases

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 10:15 +, Jonathan Wakely wrote:

On 04/11/20 09:45 +0100, Stephan Bergmann via Libstdc++ wrote:

On 03/11/2020 23:25, Jonathan Wakely wrote:

On 03/11/20 22:28 +0100, Stephan Bergmann via Libstdc++ wrote:

On 29/10/2020 15:59, Jonathan Wakely via Gcc-patches wrote:

This extends the fast path to also work when the URBG's range of
possible values is not the entire range of its result_type. Previously,
the slow path would be used for engines with a uint_fast32_t result type
if that type is actually a typedef for uint64_t rather than uint32_t.
After this change, the generator's result_type is not important, only
the range of possible values that generator can produce. If the
generator's range is exactly UINT64_MAX then the calculation will be
done using 128-bit and 64-bit integers, and if the range is UINT32_MAX
it will be done using 64-bit and 32-bit integers.

In practice, this benefits most of the engines and engine adaptors
defined in [rand.predef] on x86_64-linux and other 64-bit targets. This
is because std::minstd_rand0 and std::mt19937 and others use
uint_fast32_t, which is a typedef for uint64_t.

The code now makes use of the recently-clarified requirement that the
generator's min() and max() functions are usable in constant
expressions (see LWG 2154).
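
For readers following along, the fast path described above boils down to
Lemire's multiply-and-reject technique.  In rough stand-alone form (names and
structure are illustrative, not libstdc++'s internals):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Lemire's bounded-random scheme: one widening multiply maps a full-range
// 64-bit draw onto [0, s); a rare rejection loop removes the bias.
uint64_t lemire_bounded (const std::function<uint64_t()> &next, uint64_t s)
{
  unsigned __int128 m = (unsigned __int128) next () * s;
  uint64_t lo = (uint64_t) m;
  if (lo < s)                        // slow path, probability below s / 2^64
    {
      uint64_t t = (0 - s) % s;      // 2^64 mod s, via unsigned wrap-around
      while (lo < t)                 // reject draws that would bias the result
        {
          m = (unsigned __int128) next () * s;
          lo = (uint64_t) m;
        }
    }
  return (uint64_t) (m >> 64);       // the high half is uniform in [0, s)
}
```

When the generator's range is UINT32_MAX rather than UINT64_MAX, the same
shape works one level down, with 64-bit and 32-bit integers.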

libstdc++-v3/ChangeLog:

    * include/bits/uniform_int_dist.h (_Power_of_two): Add
    constexpr.
    (uniform_int_distribution::_S_nd): Add static_assert to ensure
    the wider type is twice as wide as the result type.
    (uniform_int_distribution::__generate_impl): Add static_assert
    and declare variables as constexpr where appropriate.
    (uniform_int_distribution::operator()): Likewise. Only consider
    the uniform random bit generator's range of possible results
    when deciding whether _S_nd can be used, not the __uctype type.

Tested x86_64-linux. Committed to trunk.


At least with recent Clang trunk, this causes e.g.


$ cat test.cc
#include <random>
void f(std::default_random_engine e) { 
std::uniform_int_distribution{0, 1}(e); }


to fail with

$ clang++ --gcc-toolchain=~/gcc/trunk/inst -std=c++17 
-fsyntax-only test.cc

In file included from test.cc:1:
In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/random:40:

In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/string:52:

In file included from 
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/stl_algo.h:66:

~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: 
error: static_assert expression is not an integral constant 
expression

       static_assert( __urng.min() < __urng.max(),
                      ^~~
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:190:24: 
note: in instantiation of function template specialization 'std::uniform_int_distribution<int>::operator()<std::linear_congruential_engine<unsigned long, 16807, 0, 2147483647>>' requested here

       { return this->operator()(__urng, _M_param); }
                      ^
test.cc:2:80: note: in instantiation of function template 
specialization 'std::uniform_int_distribution<int>::operator()<std::linear_congruential_engine<unsigned long, 16807, 0, 2147483647>>' requested here
void f(std::default_random_engine e) { 
std::uniform_int_distribution{0, 1}(e); }

^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:281:17: 
note: function parameter '__urng' with unknown value cannot be 
used in a constant expression

       static_assert( __urng.min() < __urng.max(),
                      ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: 
note: declared here

       operator()(_UniformRandomBitGenerator& __urng,
                                        
      ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:21: 
error: constexpr variable '__urngmin' must be initialized by a 
constant expression

       constexpr __uctype __urngmin = __urng.min();
                          ^           

~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:284:33: 
note: function parameter '__urng' with unknown value cannot be 
used in a constant expression

       constexpr __uctype __urngmin = __urng.min();
                                      ^
~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/uniform_int_dist.h:194:41: 
note: declared here

       operator()(_UniformRandomBitGenerator& __urng,
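
The diagnostics above reduce to a small language-mode disagreement: before the
P2280 relaxations, naming a reference parameter inside a constant expression
was ill-formed even when only a static constexpr member is called through it,
and clang enforced that while GCC did not.  Calling through the type sidesteps
the problem.  A minimal illustration (hypothetical names, not the libstdc++
fix verbatim):

```cpp
#include <cassert>

// G stands in for a URBG whose min()/max() are static constexpr,
// as LWG 2154 requires.
struct G
{
  static constexpr unsigned min () { return 0; }
  static constexpr unsigned max () { return 7; }
  unsigned operator() () { return 3; }
};

template <typename U>
unsigned draw (U &urng)
{
  // Clang rejects the member-call spelling because the reference
  // parameter itself appears in a constant expression:
  //   static_assert (urng.min () < urng.max (), "");
  // Spelling the call through the type never names the parameter:
  static_assert (U::min () < U::max (), "URBG must have a non-empty range");
  return urng ();
}
```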
 

Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Tobias Burnus

Three of the testcases fail on PowerPC: 
gcc.target/i386/zero-scratch-regs-{9,10,11}.c
  powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
unimplemented: '-fzero-call-used_regs' not supported on this target

Did you miss some dg-require-effective-target ?

powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++2a (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++2a (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-std=gnu++2a (test for excess errors)

Tobias

On 30.10.20 20:50, Qing Zhao via Gcc-patches wrote:


FYI.

I just committed the patch to gcc11 as:

https://gcc.gnu.org/pipermail/gcc-cvs/2020-October/336263.html 


Qing

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH v9] genemit.c (main): split insn-emit.c for compiling parallelly

2020-11-04 Thread Richard Sandiford via Gcc-patches
Jojo R  writes:
> gcc/ChangeLog:
>
>   * genemit.c (main): Print 'split line'.
>   * Makefile.in (insn-emit.c): Define split count and file

Looks good, thanks.  Will commit once the copyright situation is
sorted out.

Richard

>
> ---
>  gcc/Makefile.in |  35 +++-
>  gcc/genemit.c   | 104 +---
>  2 files changed, 90 insertions(+), 49 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 978a08f7b04..de846c0fcd4 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1154,6 +1154,15 @@ export STRIP_FOR_TARGET
>  export RANLIB_FOR_TARGET
>  export libsubdir
>  
> +number_series0:=1 2 3 4 5 6 7 8 9
> +number_series1:=0 $(number_series0)
> +number_series2:=$(foreach i,$(number_series0),$(addprefix 
> $(i),$(number_series1)))
> +number_series3:=$(addprefix 0,$(number_series1)) $(number_series2)
> +number_series4:=$(foreach i,$(number_series0),$(addprefix 
> $(i),$(number_series3)))
> +number_series5:=$(addprefix 0,$(number_series3)) $(number_series4)
> +number_series6:=$(foreach i,$(number_series0),$(addprefix 
> $(i),$(number_series5)))
> +number_series:=$(number_series0) $(number_series2) $(number_series4) 
> $(number_series6)
> +
>  FLAGS_TO_PASS = \
>   "ADA_CFLAGS=$(ADA_CFLAGS)" \
>   "BISON=$(BISON)" \
> @@ -1259,6 +1268,18 @@ ANALYZER_OBJS = \
>  # We put the *-match.o and insn-*.o files first so that a parallel make
>  # will build them sooner, because they are large and otherwise tend to be
>  # the last objects to finish building.
> +
> +# target overrides
> +-include $(tmake_file)
> +
> +INSN-GENERATED-SPLIT-NUM ?= 0
> +
> +insn-generated-split-num = $(wordlist 1,$(shell expr 
> $(INSN-GENERATED-SPLIT-NUM) + 1),$(number_series))
> +
> +insn-emit-split-c := $(foreach o, $(insn-generated-split-num), 
> insn-emit$(o).c)
> +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
> +$(insn-emit-split-c): insn-emit.c
> +
>  OBJS = \
>   gimple-match.o \
>   generic-match.o \
> @@ -1266,6 +1287,7 @@ OBJS = \
>   insn-automata.o \
>   insn-dfatab.o \
>   insn-emit.o \
> + $(insn-emit-split-obj) \
>   insn-extract.o \
>   insn-latencytab.o \
>   insn-modes.o \
> @@ -2376,6 +2398,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
> build/gen%$(build_exeext)
>   $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
> $(filter insn-conditions.md,$^) > tmp-$*.c
>   $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
> + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s 
> {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
> + [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
> + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch 
> insn-$*$$i.c; done && echo "" > insn-$*.c)
>   $(STAMP) s-$*
>  
>  # gencheck doesn't read the machine description, and the file produced
> @@ -4096,18 +4121,10 @@ $(patsubst %,%-subtargets,$(lang_checks)): 
> check-%-subtargets:
>  check_p_tool=$(firstword $(subst _, ,$*))
>  check_p_count=$(check_$(check_p_tool)_parallelize)
>  check_p_subno=$(word 2,$(subst _, ,$*))
> -check_p_numbers0:=1 2 3 4 5 6 7 8 9
> -check_p_numbers1:=0 $(check_p_numbers0)
> -check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix 
> $(i),$(check_p_numbers1)))
> -check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
> -check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix 
> $(i),$(check_p_numbers3)))
> -check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
> -check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix 
> $(i),$(check_p_numbers5)))
> -check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) 
> $(check_p_numbers6)
>  check_p_subdir=$(subst _,,$*)
>  check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
>   $(if 
> $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
> - $(check_p_numbers)))
> + $(number_series)))
>  
>  # For parallelized check-% targets, this decides whether parallelization
>  # is desirable (if -jN is used).  If desirable, recursive make is run with
> diff --git a/gcc/genemit.c b/gcc/genemit.c
> index 84d07d388ee..54a0d909d9d 100644
> --- a/gcc/genemit.c
> +++ b/gcc/genemit.c
> @@ -847,24 +847,13 @@ handle_overloaded_gen (overloaded_name *oname)
>  }
>  }
>  
> -int
> -main (int argc, const char **argv)
> -{
> -  progname = "genemit";
> -
> -  if (!init_rtx_reader_args (argc, argv))
> -return (FATAL_EXIT_CODE);
> -
> -#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> -  nofail_optabs[OPTAB##_optab] = true;
> -#include "internal-fn.def"
> -
> -  /* Assign sequential codes to all entries in the machine description
> - in parallel with the tables in insn-output.c.  */
> -
> -  printf ("/* Generated automatically by the program `genemit'\n\
> -from the machine description file `md'.  */\n\n");
> +/* Print include header.  */
> 

RE: [PATCH v2 2/16]middle-end: Refactor and expose some vectorizer helper functions.

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch is a respin of the previous one defining a new helper
> function add_pattern_stmt.
> 
> Ok for master?

OK if the rest is approved.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static inline.
>   * tree-vect-slp.c (vect_create_new_slp_node): Remove static and only
>   set stmts if valid.
>   * tree-vectorizer.c (vec_info::add_pattern_stmt): New.
>   (vec_info::set_vinfo_for_stmt): Optionally enforce read-only.
>   * tree-vectorizer.h (struct _slp_tree): Use new types.
>   (lane_permutation_t, load_permutation_t): New.
>   (vect_create_new_slp_node, vect_mark_pattern_stmts): New.
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of Tamar
> > Christina
> > Sent: Friday, September 25, 2020 3:28 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de; o...@ucw.cz
> > Subject: [PATCH v2 2/16]middle-end: Refactor and expose some vectorizer
> > helper functions.
> > 
> > Hi All,
> > 
> > This is a small refactoring which exposes some helper functions in the
> > vectorizer so they can be used in other places.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static.
> > * tree-vect-slp.c (vect_free_slp_tree,
> > vect_build_slp_tree): Remove static.
> > (struct bst_traits, bst_traits::hash, bst_traits::equal): Move...
> > * tree-vectorizer.h (struct bst_traits, bst_traits::hash,
> > bst_traits::equal): ... to here.
> > (vect_mark_pattern_stmts, vect_free_slp_tree,
> > vect_build_slp_tree): Declare.
> > 
> > --
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


[PATCH] tree-optimization/97709 - set abnormal flag when vectorizing live lanes

2020-11-04 Thread Richard Biener
This properly sets the abnormal flag when vectorizing live lanes
when the original scalar was live across an abnormal edge.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-04  Richard Biener  

PR tree-optimization/97709
* tree-vect-loop.c (vectorizable_live_operation): Set
SSA_NAME_OCCURS_IN_ABNORMAL_PHI when necessary.

* gcc.dg/vect/bb-slp-pr97709.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr97709.c | 26 ++
 gcc/tree-vect-loop.c   |  3 +++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr97709.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr97709.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97709.c
new file mode 100644
index 000..672807f167c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97709.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+
+int a;
+struct b {
+  int c;
+  int d;
+};
+void k (struct b);
+struct b
+e()
+{
+  void *f[] = {&&g, &&h, &&i, &&j};
+  int d, c;
+j:
+  goto *a;
+g:
+  d = 0;
+h:
+  c = 1;
+  goto *a;
+i:
+  {
+struct b b = {c, d};
+k(b);
+  }
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c09aa392419..6cb2286d5c8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8568,6 +8568,9 @@ vectorizable_live_operation (vec_info *vinfo,
   gimple_seq stmts = NULL;
   new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
   &stmts, true, NULL_TREE);
+  if (TREE_CODE (new_tree) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+   SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_tree) = 1;
   if (is_a  (vec_stmt))
{
  gimple_stmt_iterator si = gsi_after_labels (gimple_bb (vec_stmt));
-- 
2.26.2


Re: [00/32] C++ 20 Modules

2020-11-04 Thread Nathan Sidwell

On 11/3/20 10:14 PM, Hans-Peter Nilsson wrote:

On Tue, 3 Nov 2020, Nathan Sidwell wrote:



I have bootstrapped and tested on:
x86_64-linux
aarch64-linux
powerpc8le-linux
powerpc8-aix

Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph Myers
graciously built for i686-mingw host.  We eventually ran into compilation
errors in the analyzer, as it seemed unprepared for an IL32P64 host.


(So not actually tested there.)

Are any of the powerpc targets you tested ILP32, such that the
patchset is completely tested for such a target?


No.  I tried building on one of the compile farm mips machines but it 
ran out of memory compiling some of the generated expanders (or something).


rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try 
that.


nathan

--
Nathan Sidwell


RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> This is a respin which includes the changes you requested.

Comments randomly ordered, I'm pasting in pieces of the patch -
sending it inline would help to get pieces properly quoted and in-order.

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 
4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9aa42ffe0ce0dd
 
100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
...

I miss comments in this file, see tree-vectorizer.h where we try
to document purpose of classes and fields.

Things that sticks out to me:

+uint8_t m_arity;
+uint8_t m_num_args;

why uint8_t and not simply unsigned int?  Not knowing what
arity / num_args should be here ;)

+vec_info *m_vinfo;
...
+vect_pattern (slp_tree *node, vec_info *vinfo)

so this looks like something I freed stmt_vec_info of - back-pointers
in the "wrong" direction of the logical hierarchy.  I suppose it's
just to avoid passing down vinfo where we need it?  Please do that
instead - pass down vinfo as everything else does.

The class seems to expose both very high-level (build () it!)
and very low level details (get_ifn).  The high-level one suggests
that a pattern _not_ being represented by an ifn is possible
but there's too much implementation detail already in the
vect_pattern class to make that impossible.  I guess the IFN
details could be pushed down to the simple matching class
(and that be called vect_ifn_pattern or so).
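
A rough sketch of the split being suggested — vinfo passed down rather than
stored as a back-pointer, and the IFN details pushed into a subclass so a
pattern need not be backed by an internal function (all names hypothetical):

```cpp
#include <cassert>

struct vec_info;       // stand-ins for the vectorizer's types
struct slp_tree_node;

class vect_pattern
{
public:
  // High-level interface only; vinfo is passed down like everywhere else
  // instead of living in the object.
  virtual bool matches (vec_info *vinfo) = 0;
  virtual void build (vec_info *vinfo) = 0;
  virtual ~vect_pattern () {}
};

// The simple matching case: a pattern implemented by an internal function.
// get_ifn / optab-support checks would live here, called from matches (),
// as an implementation detail invisible to the base class.
class vect_ifn_pattern : public vect_pattern
{
protected:
  int m_ifn = -1;  // code of the internal function, once matched
};
```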

+static bool
+vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
+  bool found_p = false;
+
+  if (dump_enabled_p ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match 
--\n");
+  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
+  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
+}

we dumped all instances after their analysis.  Maybe just
refer to the instance with its address (dump_print %p) so
lookup in the (already large) dump file is easy.

+  hash_set *visited = new hash_set ();
+  for (unsigned x = 0; x < num__slp_patterns; x++)
+{
+  visited->empty ();
+  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, 
slp_patterns[x],
+   visited);
+}
+
+  delete visited;

no need to new / delete, just do

  hash_set visited;

like everyone else.  Btw, do you really want to scan
pieces of the SLP graph (with instances being graph entries)
multiple times?  If not then you should move the visited
set to the caller instead.

+  /* TODO: Remove in final version, only here for generating debug dot 
graphs
+  from SLP tree.  */
+
+  if (dump_enabled_p ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
+  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
+  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
+}

now, if there was some pattern matched it is probably useful
to dump the graph (entry) again.  But only conditional on that
I think.  So can you instead make the dump conditional on
found_p and remove the start dot/end dot markers as said in the comment?

+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"transformation for %s not valid due to post 
"
+"condition\n",

not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
MSG_MISSED_OPTIMIZATION should be used for things (likely) making
vectorization fail.

+  /* Perform recursive matching, it's important to do this after matching 
things

before matching things?

+ in the current node as the matches here may re-order the nodes below 
it.
+ As such the pattern that needs to be subsequently match may change.  

and this is no longer true?

*/
+
+  if (SLP_TREE_CHILDREN (node).exists ()) {

elide this check, the loop will simply not run if empty

+slp_tree child;
+FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)

I think you want to perform the recursion in the caller so you
do it only once and not once for each pattern kind now that you
do post-order processing rather than pre-order.

+  vect_pattern *pattern = patt_fn (ref_node, vinfo);
+
+  if (pattern->matches ())

this suggests you get number of SLP nodes times number of pattern
matchers allocations & inits of vect_pattern.  If you move

+  if (pattern->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
+   {

into ->matches () then whether this is a IFN or multi-node pattern
becomes an implementation detail which would be IMHO better.

+  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
+{
+  /* Calculate the location of the statement in NODE to replace.  */
+  stmt_info = SLP_TREE_REPRESENTATIVE (node);
+  gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
+  tree type = gimple_expr_type (old_stmt);
+
+  /* Create the argu

[PATCH] add costing to SLP vectorized PHIs

2020-11-04 Thread Richard Biener
I forgot to cost vectorized PHIs.  Scalar PHIs are just costed
as scalar_stmt so the following costs vector PHIs as vector_stmt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-04  Richard Biener  

* tree-vectorizer.h (vectorizable_phi): Adjust prototype.
* tree-vect-stmts.c (vect_transform_stmt): Adjust.
(vect_analyze_stmt): Pass cost_vec to vectorizable_phi.
* tree-vect-loop.c (vectorizable_phi): Do costing.
---
 gcc/tree-vect-loop.c  | 4 +++-
 gcc/tree-vect-stmts.c | 4 ++--
 gcc/tree-vectorizer.h | 3 ++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6cb2286d5c8..5e7188ab87a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7548,7 +7548,7 @@ vectorizable_lc_phi (loop_vec_info loop_vinfo,
 bool
 vectorizable_phi (vec_info *,
  stmt_vec_info stmt_info, gimple **vec_stmt,
- slp_tree slp_node)
+ slp_tree slp_node, stmt_vector_for_cost *cost_vec)
 {
   if (!is_a  (stmt_info->stmt) || !slp_node)
 return false;
@@ -7577,6 +7577,8 @@ vectorizable_phi (vec_info *,
   "incompatible vector types for invariants\n");
return false;
  }
+  record_stmt_cost (cost_vec, SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
+   vector_stmt, stmt_info, vectype, 0, vect_body);
   STMT_VINFO_TYPE (stmt_info) = phi_info_type;
   return true;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9cf85a0cd51..2c7a8a70913 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10727,7 +10727,7 @@ vect_analyze_stmt (vec_info *vinfo,
 NULL, NULL, node, cost_vec)
  || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
  cost_vec)
- || vectorizable_phi (vinfo, stmt_info, NULL, node));
+ || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
 }
 
   if (!ok)
@@ -10868,7 +10868,7 @@ vect_transform_stmt (vec_info *vinfo,
   break;
 
 case phi_info_type:
-  done = vectorizable_phi (vinfo, stmt_info, &vec_stmt, slp_node);
+  done = vectorizable_phi (vinfo, stmt_info, &vec_stmt, slp_node, NULL);
   gcc_assert (done);
   break;
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index fbf5291cf06..0252d799561 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1940,7 +1940,8 @@ extern bool vect_transform_cycle_phi (loop_vec_info, 
stmt_vec_info,
  slp_tree, slp_instance);
 extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info,
 gimple **, slp_tree);
-extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree);
+extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
+ stmt_vector_for_cost *);
 extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
 extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
stmt_vector_for_cost *,
-- 
2.26.2


Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This optimizes sequential permutes. i.e. if there are two permutes back to 
> back
> this function applies the permute of the parent to the child and removes the
> parent.
> 
> If the resulting permute in the child is now a no-op, then the child is also
> dropped from the graph and the parent's parent attached to the child's child.
> 
> This relies on the materialization point calculation in optimize SLP.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Tests are included as part of the final patch as they need the SLP pattern
> matcher to insert permutes in between.
> 
> This allows us to remove useless permutes such as
> 
>   ldr q0, [x0, x3]
>   ldr q2, [x1, x3]
>   trn1v1.4s, v0.4s, v0.4s
>   trn2v0.4s, v0.4s, v0.4s
>   trn1v0.4s, v1.4s, v0.4s
>   mov v1.16b, v3.16b
>   fcmla   v1.4s, v0.4s, v2.4s, #0
>   fcmla   v1.4s, v0.4s, v2.4s, #90
>   str q1, [x2, x3]
> 
> from the sequence the vectorizer puts out and give
> 
>   ldr q0, [x0, x3]
>   ldr q2, [x1, x3]
>   mov v1.16b, v3.16b
>   fcmla   v1.4s, v0.4s, v2.4s, #0
>   fcmla   v1.4s, v0.4s, v2.4s, #90
>   str q1, [x2, x3]
> 
> instead
> 
> Ok for master?

+	  /* If the remaining permute is a no-op then we can just drop the
+	     node instead of materializing it.  */
+	  if (vect_slp_tree_permute_noop_p (node))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "removing unneeded permute node %p\n",
+				 node);
+
+	      unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first;
+	      slp_tree value = SLP_TREE_CHILDREN (node)[idx];
+	      unsigned src = slpg->vertices[node->vertex].pred->src;
+	      slp_tree prev = vertices[src];
+	      unsigned dest;
+	      slp_tree tmp;
+	      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
+		if (tmp == node)
+		  {
+		    SLP_TREE_CHILDREN (prev)[dest] = value;
+		    break;
+		  }

so I don't think this will work reliably since we do not update
the graph when inserting permute nodes and thus the "parent"
can refer to a permute rather than the original node now (we're
just walking over all vertices in no specific order during
materialization - guess using IPO might fix this apart from in
cycles).  You would also need to iterate over preds here (pred_next).
I guess removing no-op permutes is only important for costing?
They should not cause any actual code generation?

You also need to adjust reference counts when you change
SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE
and then slp_tree_free node itself (which might be tricky
at this point).

+static bool
+vect_slp_tree_permute_noop_p (slp_tree node)
+{
+  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
+
+  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
+    return true;
+
+  unsigned x, seed;
+  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
+  seed = perms[0].second;
+  for (x = 1; x < perms.length (); x++)
+    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
+      return false;

'seed' needs to be zero to be a noop permute and
SLP_TREE_LANES (SLP_TREE_CHILDREN (node)[perms[0].first]) needs
to be the same as SLP_TREE_LANES (node).  Otherwise you'll make
permutes that select parts of a vector no-op.

Maybe simplify the patch and do the vect_slp_tree_permute_noop_p
check in vectorizable_slp_permutation instead?

The permute node adjustment part is OK, thus up to

+	  else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
+	    {
+	      /* If the node is already a permute node we just need to apply
+		 the permutation to the permute node itself.  */
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "simplifying permute node %p\n",
+				 node);
+
+	      vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION (node),
+				true);

in case you want to split up, independently of the rest of the
patches.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
>   (vect_optimize_slp): Optimize permutes.
>   (vectorizable_slp_permutation): Fix typo.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


[committed] libstdc++: Document istreambuf_iterator base class change [PR 92285]

2020-11-04 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/92285
* doc/xml/manual/evolution.xml: Document change to base class.
* doc/html/manual/api.html: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 3ef33e756a65484a17abb95ef0d4133f80c014b1
Author: Jonathan Wakely 
Date:   Wed Nov 4 12:45:32 2020

libstdc++: Document istreambuf_iterator base class change [PR 92285]

libstdc++-v3/ChangeLog:

PR libstdc++/92285
* doc/xml/manual/evolution.xml: Document change to base class.
* doc/html/manual/api.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml 
b/libstdc++-v3/doc/xml/manual/evolution.xml
index 38f11b0300d4..55b8903baff5 100644
--- a/libstdc++-v3/doc/xml/manual/evolution.xml
+++ b/libstdc++-v3/doc/xml/manual/evolution.xml
@@ -972,6 +972,15 @@ now defaults to zero.
   be used instead.
 
 
+
+  The type of the std::iterator base class of
+  std::istreambuf_iterator was changed to be
+  consistent for all -std modes.
+  Before GCC 10.1 the base class had one type in C++98 mode and a
+  different type in C++11 and later modes. The type in C++98 mode
+  was changed to be the same as for C++11 and later.
+
+
 
   Experimental C++2a support improved, with new headers
   ,


[wwwdocs] Document std::istreambuf_iterator change in GCC 10 [PR 92285]

2020-11-04 Thread Jonathan Wakely via Gcc-patches
I'm adding this caveat to the gcc-10 release notes, as well as to the
libstdc++ manual.

Pushed to wwwdocs.


commit 6ffde10eba0811d1223eaba7e2a8daefe26276aa
Author: Jonathan Wakely 
Date:   Wed Nov 4 12:58:19 2020 +

Document std::istreambuf_iterator change in GCC 10 [PR 92285]

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 759e4fd7..d40a633c 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -65,6 +65,12 @@ You may also want to check out our
 Language (HSAIL) has been deprecated and will likely be removed in
 a future release.
   
+  
+The type of the std::iterator base class of
+std::istreambuf_iterator was changed in C++98 mode
+to be consistent with C++11 and later standards.
+See the libstdc++ notes below for more details.
+  
 
 
 
@@ -504,6 +510,18 @@ int get_naïve_pi() {
   
 Reduced header dependencies, leading to faster compilation for some code.
   
+  
+The std::iterator base class of
+std::istreambuf_iterator was changed in C++98 mode
+to be consistent with C++11 and later standards.
+This is expected to have no noticeable effect except in the unlikely case
+of a class which has potentially overlapping subobjects of type
+std::istreambuf_iterator and another iterator type
+with a std::iterator
+base class. The layout of such a type might change when compiled as C++98.
+<a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92285">Bug 92285</a>
+has more details and concrete examples.
+  
 
 
 D


ping [PATCH 0/2] arm: "noinit" and "persistent" attributes

2020-11-04 Thread Jozef Lawrynowicz
Ping for below
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557184.html

On Tue, Oct 27, 2020 at 11:40:33AM +, Jozef Lawrynowicz wrote:
> This patch series fixes behavior related to the "noinit" attribute, and
> makes the MSP430 "persistent" attribute generic, so it can be used for
> ARM.
> These attributes are related because they are both used to mark
> variables that should not be initialized by the target's runtime
> startup code.
> 
> The "noinit" attribute is used for variables that are not initialized
> to any value by the program loader, or the runtime startup code.
> This attribute was made generic for GCC 10, whilst previously it was
> only supported for MSP430.
> There are a couple of issues when using it for arm-eabi:
> - It does not work at -O0.
>   The test for it is in the torture directory but only runs at -O2,
>   which is why this bug was undetected.
> - It does not work with -fdata-sections.
> Patch 1 fixes these issues.
> 
> The "persistent" attribute is used for variables that *are* initialized
> by the program loader, but are not initialized by the runtime startup
> code. "persistent" variables are placed in a non-volatile area of
> memory, which allows their value to "persist" between processor resets.
> 
> The "persistent" attribute is already implemented for msp430-elf, but
> patch 2 makes it generic so it can be leveraged by ARM targets. The
> ".persistent" section is pervasive in linker scripts distributed ARM
> devices by manufacturers such as ST and TI.
> 
> I've attached a Binutils patch that adds the ".persistent" section to
> the default ARM linker script. I'll apply it alongside this GCC patch.
> 
> Side note: There is handling of a ".persistent.bss" section, however
> this is Ada-specific and unrelated to the "noinit" and "persistent"
> attributes. The handling of the "noinit" and "persistent" attributes
> does not interfere with it.
> 
> Successfully bootstrapped/regtested x86_64-pc-linux-gnu and regtested
> for arm-none-eabi.
> 
> Ok for trunk?
> 
> Jozef Lawrynowicz (2):
>   Fix "noinit" attribute being ignored for -O0 and -fdata-sections
>   Implement the "persistent" attribute
> 
>  gcc/c-family/c-attribs.c  | 146 --
>  gcc/cgraph.h  |   6 +-
>  gcc/cgraphunit.c  |   2 +
>  gcc/doc/extend.texi   |  20 ++-
>  gcc/lto-cgraph.c  |   2 +
>  .../c-c++-common/torture/attr-noinit-1.c  |   7 +
>  .../c-c++-common/torture/attr-noinit-2.c  |   8 +
>  .../c-c++-common/torture/attr-noinit-3.c  |  11 ++
>  .../torture/attr-noinit-invalid.c |  12 ++
>  .../torture/attr-noinit-main.inc} |  37 ++---
>  .../c-c++-common/torture/attr-persistent-1.c  |   8 +
>  .../c-c++-common/torture/attr-persistent-2.c  |   8 +
>  .../c-c++-common/torture/attr-persistent-3.c  |  10 ++
>  .../torture/attr-persistent-invalid.c |  11 ++
>  .../torture/attr-persistent-main.inc  |  58 +++
>  gcc/testsuite/lib/target-supports.exp |  15 +-
>  gcc/tree-core.h   |   1 +
>  gcc/tree.h|   7 +
>  gcc/varasm.c  |  30 +++-
>  19 files changed, 325 insertions(+), 74 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-invalid.c
>  rename gcc/testsuite/{gcc.c-torture/execute/noinit-attribute.c => 
> c-c++-common/torture/attr-noinit-main.inc} (56%)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-3.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/torture/attr-persistent-invalid.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/torture/attr-persistent-main.inc
> 
> -- 
> 2.28.0
> 
From 965de1985a21ef449d1b1477be566efcf3405f7e Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 26 Oct 2020 14:11:08 +
Subject: [PATCH 1/2] Fix "noinit" attribute being ignored for -O0 and
 -fdata-sections

Variables with the "noinit" attribute are ignored at -O0 because they
are treated like a regular .bss variable and placed in the .bss section.

With -fdata-sections they are ignored because they are not handled in
resolve_unique_section.

gcc/c-family/ChangeLog:

* c-attribs.c (handle_noinit_attribute): Set DECL_NOINIT_P.

gcc/ChangeLog:

* cgraph.h (symtab_node): Add noinit flag.
* cgraphunit.c (process_function_and_variable_attributes): Set
noinit flag of varpool node for DECL_NOINIT_P decls.
* lto-cgraph.c (lto_output_varpool_node): Pack 

Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Wed, 4 Nov 2020 at 11:54, Tobias Burnus  wrote:
>
> Three of the testcases fail on PowerPC: 
> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> unimplemented: '-fzero-call-used_regs' not supported on this target
>
> Did you miss some dg-require-effective-target ?
>
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++2a (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++2a (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++2a (test for excess errors)
>

This was reported as PR97680; see also PR97699 for arm.

> Tobias
>
> On 30.10.20 20:50, Qing Zhao via Gcc-patches wrote:
>
> > FYI.
> >
> > I just committed the patch to gcc11 as:
> >
> > https://gcc.gnu.org/pipermail/gcc-cvs/2020-October/336263.html 
> > 
> >
> > Qing
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter


RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 1:00 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> permutes.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This optimizes sequential permutes, i.e. if there are two permutes
> > back to back this function applies the permute of the parent to the
> > child and removes the parent.
> >
> > If the resulting permute in the child is now a no-op, then the child
> > is also dropped from the graph and the parent's parent attached to the
> > child's child.
> >
> > This relies on the materialization point calculation in optimize SLP.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Tests are included as part of the final patch as they need the SLP
> > pattern matcher to insert permutes in between.
> >
> > This allows us to remove useless permutes such as
> >
> > ldr q0, [x0, x3]
> > ldr q2, [x1, x3]
> > trn1    v1.4s, v0.4s, v0.4s
> > trn2    v0.4s, v0.4s, v0.4s
> > trn1    v0.4s, v1.4s, v0.4s
> > mov v1.16b, v3.16b
> > fcmla   v1.4s, v0.4s, v2.4s, #0
> > fcmla   v1.4s, v0.4s, v2.4s, #90
> > str q1, [x2, x3]
> >
> > from the sequence the vectorizer puts out and give
> >
> > ldr q0, [x0, x3]
> > ldr q2, [x1, x3]
> > mov v1.16b, v3.16b
> > fcmla   v1.4s, v0.4s, v2.4s, #0
> > fcmla   v1.4s, v0.4s, v2.4s, #90
> > str q1, [x2, x3]
> >
> > instead
> >
> > Ok for master?
> 
> + /* If the remaining permute is a no-op then we can just
> + drop
> the
> +node instead of materializing it.  */
> + if (vect_slp_tree_permute_noop_p (node))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"removing unneeded permute node
> %p\n",
> +node);
> +
> +  unsigned idx = SLP_TREE_LANE_PERMUTATION
> (node)[0].first;
> +  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> +  unsigned src = slpg->vertices[node->vertex].pred->src;
> +  slp_tree prev = vertices[src];
> +  unsigned dest;
> +  slp_tree tmp;
> +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> +if (tmp == node)
> +  {
> + SLP_TREE_CHILDREN (prev)[dest] = value;
> + break;
> +   }
> 
> so I don't think this will work reliably since we do not update the graph when
> inserting permute nodes and thus the "parent"
> can refer to a permute rather than the original node now (we're just walking
> over all vertices in no specific order during materialization - guess using 
> IPO
> might fix this apart from in cycles).  You would also need to iterate over 
> preds
> here (pred_next).
> I guess removing no-op permutes is only important for costing?
> They should not cause any actual code generation?

Yeah, it's just for costing; the simplification of the permute part is the
one fixing the codegen.  I could just remove the lane permute (as in, clear
it) and change the costing function to not cost VEC_PERMs with no lane
permutes (if it doesn't already do that).

> 
> You also need to adjust reference counts when you change
> SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> slp_tree_free node itself (which might be tricky at this point).
> 
> +static bool
> +vect_slp_tree_permute_noop_p (slp_tree node) {
> +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> +
> +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> +return true;
> +
> +  unsigned x, seed;
> +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> +  seed = perms[0].second;
> +  for (x = 1; x < perms.length (); x++)
> +    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> +      return false;
> 
> 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> of a vector no-op.
> 
> Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check
> in vectorizable_slp_permutation instead?
> 
> The permute node adjustment part is OK, thus up to
> 
> + else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
> +   {
> +	      /* If the node is already a permute node we just need to apply
> +		 the permutation to the permute node itself.  */
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"simplifying permute node 

RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 1:00 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> > permutes.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This optimizes sequential permutes, i.e. if there are two permutes
> > > back to back this function applies the permute of the parent to the
> > > child and removes the parent.
> > >
> > > If the resulting permute in the child is now a no-op, then the child
> > > is also dropped from the graph and the parent's parent attached to the
> > > child's child.
> > >
> > > This relies on the materialization point calculation in optimize SLP.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Tests are included as part of the final patch as they need the SLP
> > > pattern matcher to insert permutes in between.
> > >
> > > This allows us to remove useless permutes such as
> > >
> > >   ldr q0, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   trn1    v1.4s, v0.4s, v0.4s
> > >   trn2    v0.4s, v0.4s, v0.4s
> > >   trn1    v0.4s, v1.4s, v0.4s
> > >   mov v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str q1, [x2, x3]
> > >
> > > from the sequence the vectorizer puts out and give
> > >
> > >   ldr q0, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   mov v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str q1, [x2, x3]
> > >
> > > instead
> > >
> > > Ok for master?
> > 
> > + /* If the remaining permute is a no-op then we can just
> > + drop
> > the
> > +node instead of materializing it.  */
> > + if (vect_slp_tree_permute_noop_p (node))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"removing unneeded permute node
> > %p\n",
> > +node);
> > +
> > +  unsigned idx = SLP_TREE_LANE_PERMUTATION
> > (node)[0].first;
> > +  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> > +  unsigned src = slpg->vertices[node->vertex].pred->src;
> > +  slp_tree prev = vertices[src];
> > +  unsigned dest;
> > +  slp_tree tmp;
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> > +if (tmp == node)
> > +  {
> > + SLP_TREE_CHILDREN (prev)[dest] = value;
> > + break;
> > +   }
> > 
> > so I don't think this will work reliably since we do not update the graph 
> > when
> > inserting permute nodes and thus the "parent"
> > can refer to a permute rather than the original node now (we're just walking
> > over all vertices in no specific order during materialization - guess using 
> > IPO
> > might fix this apart from in cycles).  You would also need to iterate over 
> > preds
> > here (pred_next).
> > I guess removing no-op permutes is only important for costing?
> > They should not cause any actual code generation?
> 
> Yeah, it's just for costing, the simplification of the permute part is the 
> one fixing
> the codegen. I could just remove the lane permute (as in, clear it) and 
> change the
> costing function to not cost VEC_PERMS with no lane permutes (if it doesn't 
> already do that).

I think clearing the lane permute is even not necessary.  The vec
perm code generation should already not cost anything here
since it is also able to elide costs when the permute aligns
naturally with vector boundaries as in { [0, 2], [0, 3], [0, 0], [0, 1] }
for two-element vectors. 

> > 
> > You also need to adjust reference counts when you change
> > SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> > slp_tree_free node itself (which might be tricky at this point).
> > 
> > +static bool
> > +vect_slp_tree_permute_noop_p (slp_tree node) {
> > +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> > +
> > +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> > +return true;
> > +
> > +  unsigned x, seed;
> > +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> > +  seed = perms[0].second;
> > +  for (x = 1; x < perms.length (); x++)
> > +    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> > +      return false;
> > 
> > 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> > (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> > SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> > of a vector no-op.
> > 
> > Maybe simpli

Re: [PATCH v3] pass: Run cleanup passes before SLP [PR96789]

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Tue, 3 Nov 2020 at 07:39, Kewen.Lin via Gcc-patches
 wrote:
>
> Hi Richard,
>
> Thanks again for your review!
>
> on 2020/11/2 6:23 PM, Richard Sandiford wrote:
> > "Kewen.Lin"  writes:
> >> diff --git a/gcc/function.c b/gcc/function.c
> >> index 2c8fa217f1f..3e92ee9c665 100644
> >> --- a/gcc/function.c
> >> +++ b/gcc/function.c
> >> @@ -4841,6 +4841,8 @@ allocate_struct_function (tree fndecl, bool 
> >> abstract_p)
> >>   binding annotations among them.  */
> >>cfun->debug_nonbind_markers = lang_hooks.emits_begin_stmt
> >>  && MAY_HAVE_DEBUG_MARKER_STMTS;
> >> +
> >> +  cfun->pending_TODOs = 0;
> >
> > The field is cleared on allocation.  I think it would be better
> > to drop this, to avoid questions about why other fields aren't
> > similarly zero-initialised.
> >
> >>  }
> >>
> >>  /* This is like allocate_struct_function, but pushes a new cfun for FNDECL
> >> diff --git a/gcc/function.h b/gcc/function.h
> >> index d55cbddd0b5..ffed6520bf9 100644
> >> --- a/gcc/function.h
> >> +++ b/gcc/function.h
> >> @@ -269,6 +269,13 @@ struct GTY(()) function {
> >>/* Value histograms attached to particular statements.  */
> >>htab_t GTY((skip)) value_histograms;
> >>
> >> +  /* Different from normal TODO_flags which are handled right at the
> >> + begin or the end of one pass execution, the pending_TODOs are
> >
> > beginning
> >
> >> + passed down in the pipeline until one of its consumers can
> >> + perform the requested action.  Consumers should then clear the
> >> + flags for the actions that they have taken.  */
> >> +  unsigned int pending_TODOs;
> >> +
> >>/* For function.c.  */
> >>
> >>/* Points to the FUNCTION_DECL of this function.  */
> >> […]
> >> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> >> index 298ab215530..9a9076cee67 100644
> >> --- a/gcc/tree-ssa-loop-ivcanon.c
> >> +++ b/gcc/tree-ssa-loop-ivcanon.c
> >> @@ -1411,6 +1411,13 @@ tree_unroll_loops_completely_1 (bool 
> >> may_increase_size, bool unroll_outer,
> >>bitmap_clear (father_bbs);
> >>bitmap_set_bit (father_bbs, loop_father->header->index);
> >>  }
> >> +  else if (unroll_outer
> >> +   && !(cfun->pending_TODOs
> >> +& PENDING_TODO_force_next_scalar_cleanup))
> >> +{
> >> +  /* Trigger scalar cleanup once any outermost loop gets unrolled.  */
> >> +  cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
> >> +}
> >
> > I can see it would make sense to test whether the flag is already set
> > if we were worried about polluting the cache line.  But this test and
> > set isn't performance-sensitive, so I think it would be clearer to
> > remove the “&& …” part of the condition.
> >
> > Nit: there should be no braces around the block, since it's a single
> > statement.
> >
> > OK with those changes, thanks.
>
> The patch was updated as your comments above, re-tested on Power8
> and committed in r11-4637.
>

The new test gcc.dg/tree-ssa/pr96789.c fails on arm:
FAIL: gcc.dg/tree-ssa/pr96789.c scan-tree-dump dse3 "Deleted dead store:.*tmp"

Can you check?


> BR,
> Kewen


Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Richard Sandiford via Gcc-patches
Tobias Burnus  writes:
> Three of the testcases fail on PowerPC: 
> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> unimplemented: '-fzero-call-used_regs' not supported on this target
>
> Did you miss some dg-require-effective-target ?

No, these are a signal to target maintainers that they need
to decide whether to add support or accept the status quo
(in which case a new effective-target will be needed).  See:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html:

The new tests are likely to fail on some targets with the sorry()
message, but I think target maintainers are best placed to decide
whether (a) that's a fundamental restriction of the target and the
tests should just be skipped or (b) the target needs to implement
the new hook.

Thanks,
Richard


[0/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

Hi,

I've been working on several implementations of data layout
optimizations for GCC, and I am again kindly requesting a review of
the type-escape-based dead field elimination and field reorg.


This patchset is organized in the following way:

* Adds a link-time warning if dead fields are detected
* Allows for the dead-field elimination transformation to be applied
* Reorganizes fields in structures.
* Adds some documentation
* Gracefully does not apply transformation if unknown syntax is detected.
* Adds a heuristic to handle void* casts

I have tested these transformations as extensively as I can. The way to
trigger them is with the following flags:


-fipa-field-reorder and -fipa-type-escape-analysis

Having said that, I welcome all criticism and will try to address what
I can. Please let me know if you have any questions or comments; I will
try to answer in a timely manner.


There has been some initial discussion on the GCC mailing list but I'm 
submitting the patches to the patches mailing list now. Some of the 
initial criticisms mentioned on the GCC mailing list previously will be 
addressed in the following days, and I believe there is definitely 
enough time to address them all during Stage 1.


I had to add one last commit to account for some differences in the
build script on master. I will be working today to squash it, but I
still wanted to submit these patches in order to start the review process.


I have bootstrapped on aarch64-linux


[3/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 91947eea01a41bd7b17e501ad7d53dfb6499eefc Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Sun, 9 Aug 2020 10:22:49 +0200
Subject: [PATCH 3/7] Add Field Reordering

Field reordering of structs at link-time

2020-11-04  Erick Ochoa  

	* gcc/Makefile.in: Add new file to list of sources.
	* gcc/common.opt: Add new flag for field reordering.
	* gcc/passes.def: Add new pass.
	* gcc/tree-pass.h: Same.
	* gcc/ipa-field-reorder.c: New file.
	* gcc/ipa-type-escape-analysis.c: Export common functions.
	* gcc/ipa-type-escape-analysis.h: Same.

---
 gcc/Makefile.in                |   1 +
 gcc/common.opt                 |   4 +
 gcc/ipa-dfe.c                  |  84 -
 gcc/ipa-dfe.h                  |  26 +-
 gcc/ipa-field-reorder.c        | 625 +
 gcc/ipa-type-escape-analysis.c |  44 ++-
 gcc/ipa-type-escape-analysis.h |  12 +-
 gcc/passes.def                 |   1 +
 gcc/tree-pass.h                |   2 +
 9 files changed, 751 insertions(+), 48 deletions(-)
 create mode 100644 gcc/ipa-field-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ef6047870b..2184bd0fc3d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1417,6 +1417,7 @@ OBJS = \
internal-fn.o \
ipa-type-escape-analysis.o \
ipa-dfe.o \
+   ipa-field-reorder.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 39bb6e100c3..035c1e8850f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3484,4 +3484,8 @@ fprint-access-analysis
 Common Report Var(flag_print_access_analysis) Optimization
 This flag is used to print the access analysis (if field is read or written to).


+fipa-field-reorder
+Common Report Var(flag_ipa_field_reorder) Optimization
+Reorder fields.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index c048fac8621..16f594a36b9 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -242,9 +242,9 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,

  */
 void
 substitute_types_in_program (reorg_record_map_t map,
-reorg_field_map_t field_map)
+reorg_field_map_t field_map, bool _delete)
 {
-  GimpleTypeRewriter rewriter (map, field_map);
+  GimpleTypeRewriter rewriter (map, field_map, _delete);
   rewriter.walk ();
   rewriter._rewrite_function_decl ();
 }
@@ -358,8 +358,11 @@ TypeReconstructor::set_is_not_modified_yet (const_tree t)

 return;

   tree type = _reorg_map[tt];
-  const bool is_modified
+  bool is_modified
 = strstr (TypeStringifier::get_type_identifier (type).c_str (), 
".reorg");

+  is_modified
+|= (bool) strstr (TypeStringifier::get_type_identifier (type).c_str (),
+ ".reorder");
   if (!is_modified)
 return;

@@ -405,14 +408,20 @@ TypeReconstructor::is_memoized (const_tree t)
   return already_changed;
 }

-static tree
-get_new_identifier (const_tree type)
+const char *
+TypeReconstructor::get_new_suffix ()
+{
+  return _suffix;
+}
+
+tree
+get_new_identifier (const_tree type, const char *suffix)
 {
   const char *identifier = TypeStringifier::get_type_identifier (type).c_str ();

-  const bool is_new_type = strstr (identifier, "reorg");
+  const bool is_new_type = strstr (identifier, suffix);
   gcc_assert (!is_new_type);
   char *new_name;
-  asprintf (&new_name, "%s.reorg", identifier);
+  asprintf (&new_name, "%s.%s", identifier, suffix);
   return get_identifier (new_name);
 }

@@ -468,7 +477,9 @@ TypeReconstructor::_walk_ARRAY_TYPE_post (const_tree t)
   TREE_TYPE (copy) = build_variant_type_copy (TREE_TYPE (copy));
   copy = is_modified ? build_distinct_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   // This is useful so that we go again through type layout
   TYPE_SIZE (copy) = is_modified ? NULL : TYPE_SIZE (copy);
   tree domain = TYPE_DOMAIN (t);
@@ -521,7 +532,9 @@ TypeReconstructor::_walk_POINTER_TYPE_post (const_tree t)


   copy = is_modified ? build_variant_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   TYPE_CACHED_VALUES_P (copy) = false;

   tree _t = const_tree_to_tree (t);
@@ -616,7 +629,8 @@ TypeReconstructor::_walk_RECORD_TYPE_post (const_tree t)
   tree main = TYPE_MAIN_VARIANT (t);
   tree main_reorg = _reorg_map[main];
   tree copy_variant = bui

[4/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From a8c4d5b99d5c4168ede79054396cba514fdf23b5 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Mon, 10 Aug 2020 09:10:37 +0200
Subject: [PATCH 4/7] Add documentation for dead field elimination

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: Add file to documentation sources
* gcc/doc/dfe.texi: New section
* gcc/doc/gccint.texi: Include new section

---
 gcc/Makefile.in |   3 +-
 gcc/doc/dfe.texi| 187 
 gcc/doc/gccint.texi |   2 +
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 gcc/doc/dfe.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2184bd0fc3d..7e4c442416d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3275,7 +3275,8 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi		\

 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi   \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
-match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
+match-and-simplify.texi analyzer.texi ux.texi poly-int.texi\
+dfe.texi

 TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi \
 gcc-common.texi gcc-vers.texi
diff --git a/gcc/doc/dfe.texi b/gcc/doc/dfe.texi
new file mode 100644
index 000..e8d01d817d3
--- /dev/null
+++ b/gcc/doc/dfe.texi
@@ -0,0 +1,187 @@
+@c Copyright (C) 2001 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+
+@node Dead Field Elimination
+@chapter Dead Field Elimination
+
+@node Dead Field Elimination Internals
+@section Dead Field Elimination Internals
+
+@subsection Introduction
+
+Dead field elimination is a compiler transformation that removes fields
from structs. There are several challenges to removing fields from
structs at link time, but, depending on the workload of the compiled
program and the architecture it runs on, dead field elimination can be
a worthwhile transformation to apply. Generally speaking, when the
bottleneck of an application is the memory bandwidth of the host system
and the memory being requested holds structs that can be reduced in
size, then that combination of workload, program, and architecture can
benefit from applying dead field elimination. The benefit comes from
removing unnecessary fields from structures and thus reducing the
memory and cache requirements needed to represent a structure.

+
+
+
+While challenges exist to fully automate a dead field elimination
transformation, similar and more powerful optimizations have been
implemented in the past. Chakrabarti et al. [0] implement struct
peeling, splitting structures into hot and cold parts, and field
reordering. Golovanevsky et al. [1] also describe efforts to implement
data layout optimizations at link time. Unlike the work of Chakrabarti
and Golovanevsky, this text only covers dead field elimination. This
doesn't mean that the implementation can't be extended to perform other
link-time layout optimizations; it just means that dead field
elimination is the only transformation implemented at the time of this
writing.

+
+[0] Chakrabarti, Gautam, Fred Chow, and L. PathScale. "Structure layout 
optimizations in the open64 compiler: Design, implementation and 
measurements." Open64 Workshop at the International Symposium on Code 
Generation and Optimization. 2008.

+
+[1] Golovanevsky, Olga, and Ayal Zaks. "Struct-reorg: current status 
and future perspectives." Proceedings of the GCC Developers’ Summit. 2007.

+
+@subsection Overview
+
+The dead field elimination implementation is structured as follows:
+
+
+@itemize @bullet
+@item
+Collect all types which can refer to a @code{RECORD_TYPE}. This means
that if we have a pointer to a record, we also collect that pointer
type, and likewise for arrays and unions.

+@item
+Mark types as escaping. More on this in the following section.
+@item
+Find fields which can be deleted. (Iterate over all gimple code and 
find which fields are read.)

+@item
+Create new types with removed fields (and reference these types in 
pointers, arrays, etc.)

+@item
+Modify gimple to include these types.
+@end itemize
+
+
+Most of this code relies on the visitor pattern. Types, expressions,
and gimple statements are visited using this pattern. You can find the
base classes in @file{type-walker.c}, @file{expr-walker.c} and
@file{gimple-walker.c}. There are assertions in place wherever a type,
expression, or gimple code is encountered which had not been
encountered before during the testing of this transformation. This
facilitates fuzzing of the transformation.

+
+@subsubsection Implementation Details: Is a global variable escaping?
+
+How does the analysis determine whether a global variable is visible to 
code outside the current linking unit? In the file 
@file{gimple-escaper.c} we have a simple functio

[6/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 1609f4713b6d0ab2e84e52b4fbd6f645f10a95e7 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Fri, 16 Oct 2020 08:49:08 +0200
Subject: [PATCH 6/7] Add heuristic to take into account void* pattern.

We add a heuristic in order to be able to transform functions which
receive void* arguments as a way to generalize over arguments. An
example of this is qsort. The heuristic works by first inspecting
leaves in the call graph. If the leaves only contain a reference
to a single RECORD_TYPE then we color the nodes in the call graph
as "casts are safe in this function, and it does not call externally
visible functions". We propagate this property up the call graph
until a fixed point is reached. This will later be changed to
use ipa-modref.

2020-11-04  Erick Ochoa  

* ipa-type-escape-analysis.c: Add new heuristic.
* ipa-field-reorder.c: Use heuristic.
* ipa-type-escape-analysis.h: Change signatures.
---
 gcc/ipa-field-reorder.c|   3 +-
 gcc/ipa-type-escape-analysis.c | 186 +++--
 gcc/ipa-type-escape-analysis.h |  72 -
 3 files changed, 246 insertions(+), 15 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index c23e6a3f818..5dcc5a38958 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -591,8 +591,9 @@ lto_fr_execute ()
   log ("here in field reordering \n");
   // Analysis.
   detected_incompatible_syntax = false;
+  std::map<tree, bool> whitelisted = get_whitelisted_nodes ();
   tpartitions_t escaping_nonescaping_sets
-= partition_types_into_escaping_nonescaping ();
+= partition_types_into_escaping_nonescaping (whitelisted);
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index b06f33e24fb..fe68eaf70c7 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -166,6 +166,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 

 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"
@@ -249,6 +250,99 @@ lto_dfe_execute ()
   return 0;
 }

+/* Heuristic to determine if casting is allowed in a function.
+ * This heuristic attempts to allow casting in functions which follow the
+ * pattern where a struct pointer or array pointer is cast to void* or
+ * char*.  The heuristic works as follows:
+ *
+ * There is a simple per-function analysis that determines whether there
+ * is more than 1 type of struct referenced in the body of the method.
+ * If there is more than 1 type of struct referenced in the body,
+ * then the layout of the structures referenced within the body
+ * cannot be cast.  However, if there's only one type of struct referenced
+ * in the body of the function, casting is allowed in the function itself.
+ * The logic behind this is that if the code follows good programming
+ * practices, the only way the memory should be accessed is via a singular
+ * type. There is also another prerequisite to this per-function analysis, and
+ * that is that the function can only call colored functions or functions
+ * which are available in the linking unit.
+ *
+ * Using this per-function analysis, we then start coloring leaf nodes in the
+ * call graph as ``safe'' or ``unsafe''.  The color is propagated to the
+ * callers of the functions until a fixed point is reached.
+ */
+std::map<tree, bool>
+get_whitelisted_nodes ()
+{
+  cgraph_node *node = NULL;
+  std::set<cgraph_node *> nodes;
+  std::set<cgraph_node *> leaf_nodes;
+  std::set<tree> leaf_nodes_decl;
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  {
+node->get_untransformed_body ();
+nodes.insert(node);
+if (node->callees) continue;
+
+leaf_nodes.insert (node);
+leaf_nodes_decl.insert (node->decl);
+  }
+
+  std::queue<cgraph_node *> worklist;
+  for (std::set<cgraph_node *>::iterator i = leaf_nodes.begin (),
+e = leaf_nodes.end (); i != e; ++i)
+  {
+if (dump_file) fprintf (dump_file, "is a leaf node %s\n", (*i)->name ());
+worklist.push (*i);
+  }
+
+  for (std::set<cgraph_node *>::iterator i = nodes.begin (),
+e = nodes.end (); i != e; ++i)
+  {
+worklist.push (*i);
+  }
+
+  std::map<tree, bool> map;
+  while (!worklist.empty ())
+  {
+
+if (detected_incompatible_syntax) return map;
+cgraph_node *i = worklist.front ();
+worklist.pop ();
+if (dump_file) fprintf (dump_file, "analyzing %s %p\n", i->name (), i);
+GimpleWhiteLister whitelister;
+whitelister._walk_cnode (i);
+bool no_external = whitelister.does_not_call_external_functions (i, map);
+bool before_in_map = map.find (i->decl) != map.end ();
+bool place_callers_in_worklist = !before_in_map;
+if (!before_in_map)
+{
+  map.insert (std::pair<tree, bool> (i->decl, no_external));
+} else
+{
+  map[i->decl] = no_external;
+}
+bool previous_value = map[i->decl];
+place_callers_in_worklist |= previous

[5/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From bad08833616e9dd7a212e55b93503200393da942 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Sun, 30 Aug 2020 10:21:35 +0200
Subject: [PATCH 5/7] Abort if Gimple produced from C++ or Fortran sources is
 found.

2020-11-04  Erick Ochoa  

* gcc/ipa-field-reorder.c: Add flag to exit transformation.
* gcc/ipa-type-escape-analysis.c: Same.

---
 gcc/ipa-field-reorder.c|  3 +-
 gcc/ipa-type-escape-analysis.c | 53 --
 gcc/ipa-type-escape-analysis.h |  2 ++
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index 611089ecf24..c23e6a3f818 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -590,6 +590,7 @@ lto_fr_execute ()
 {
   log ("here in field reordering \n");
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
@@ -597,7 +598,7 @@ lto_fr_execute ()
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, 0);

-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return 0;

   // Prepare for transformation.
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index 9944580da6c..b06f33e24fb 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -170,6 +170,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"

+#define ABORT_IF_NOT_C true
+
+bool detected_incompatible_syntax = false;
+
 // Main function that drives dfe.
 static unsigned int
 lto_dfe_execute ();
@@ -256,13 +260,14 @@ static void
 lto_dead_field_elimination ()
 {
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, OPT_Wdfa);
-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return;

 // Prepare for transformation.
@@ -589,6 +594,7 @@ TypeWalker::_walk (const_tree type)
   // Improve, verify that having a type is an invariant.
   // I think there was a specific example which didn't
   // allow for it
+  if (detected_incompatible_syntax) return;
   if (!type)
 return;

@@ -642,9 +648,9 @@ TypeWalker::_walk (const_tree type)
 case POINTER_TYPE:
   this->walk_POINTER_TYPE (type);
   break;
-case REFERENCE_TYPE:
-  this->walk_REFERENCE_TYPE (type);
-  break;
+//case REFERENCE_TYPE:
+//  this->walk_REFERENCE_TYPE (type);
+//  break;
 case ARRAY_TYPE:
   this->walk_ARRAY_TYPE (type);
   break;
@@ -654,18 +660,24 @@ TypeWalker::_walk (const_tree type)
 case FUNCTION_TYPE:
   this->walk_FUNCTION_TYPE (type);
   break;
-case METHOD_TYPE:
-  this->walk_METHOD_TYPE (type);
-  break;
+//case METHOD_TYPE:
+  //this->walk_METHOD_TYPE (type);
+  //break;
 // Since we are dealing only with C at the moment,
 // we don't care about QUAL_UNION_TYPE nor LANG_TYPEs
 // So fail early.
+case REFERENCE_TYPE:
+case METHOD_TYPE:
 case QUAL_UNION_TYPE:
 case LANG_TYPE:
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -848,6 +860,7 @@ TypeWalker::_walk_arg (const_tree t)
 void
 ExprWalker::walk (const_tree e)
 {
+  if (detected_incompatible_syntax) return;
   _walk_pre (e);
   _walk (e);
   _walk_post (e);
@@ -932,7 +945,11 @@ ExprWalker::_walk (const_tree e)
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -1165,6 +1182,7 @@ GimpleWalker::walk ()
   cgraph_node *node = NULL;
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
+  if (detected_incompatible_syntax) return;
   node->get_untransformed_body ();
   tree decl = node->decl;
   gcc_assert (decl);
@@ -1411,7 +1429,11 @@ GimpleWalker::_walk_gimple (gimple *stmt)
   // Break if something is unexpected.
   const char *name = gimple_code_name[code];
   log ("gimple code name %s\n", name);
+#ifdef ABORT_IF_NOT_C
+  detected_incompatible_syntax = true;
+#else
   gcc_unreachable ();
+#endif
 }

 void
@@ -2947,6 +2969,8 @@ TypeStringifier::stringify (const_tree t)
 return std::

[7/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 747b13bf2c6f5b17bc46316998f01483f8039548 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Wed, 4 Nov 2020 13:42:35 +0100
Subject: [PATCH 7/7] Getting rid of warnings


2020-11-04  Erick Ochoa  

* gcc/ipa-dfe.c: Change const_tree to tree.
* gcc/ipa-dfe.h: Same.
* gcc/ipa-field-reorder.c: Same.
* gcc/ipa-type-escape-analysis.c: Same; add unused attribute.
* gcc/ipa-type-escape-analysis.h: Same; add unused attribute.

---
 gcc/ipa-dfe.c  | 164 -
 gcc/ipa-dfe.h  |  80 ++---
 gcc/ipa-field-reorder.c|  72 ++--
 gcc/ipa-type-escape-analysis.c | 612 -
 gcc/ipa-type-escape-analysis.h | 312 -
 5 files changed, 621 insertions(+), 619 deletions(-)

diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index 16f594a36b9..e163a32617c 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -126,22 +126,22 @@ along with GCC; see the file COPYING3.  If not see
  * Find all non_escaping types which point to RECORD_TYPEs in
  * record_field_offset_map.
  */
-std::set<const_tree>
+std::set<tree>
 get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,
   tpartitions_t casting)
 {
   const tset_t &non_escaping = casting.non_escaping;

-  std::set<const_tree> specific_types;
+  std::set<tree> specific_types;
   TypeStringifier stringifier;

   // Here we are just placing the types of interest in a set.
-  for (std::map::const_iterator i
+  for (std::map::const_iterator i
= record_field_offset_map.begin (),
e = record_field_offset_map.end ();
i != e; ++i)
 {
-  const_tree record = i->first;
+  tree record = i->first;
   std::string name = stringifier.stringify (record);
   specific_types.insert (record);
 }
@@ -150,16 +150,16 @@ get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,


   // SpecificTypeCollector will collect all types which point to the types in
   // the set.
-  for (std::set<const_tree>::const_iterator i = non_escaping.begin (),
+  for (std::set<tree>::const_iterator i = non_escaping.begin (),
e = non_escaping.end ();
i != e; ++i)
 {
-  const_tree type = *i;
+  tree type = *i;
   specifier.walk (type);
 }

   // These are all the types which need modifications.
-  std::set<const_tree> to_modify = specifier.get_set ();
+  std::set<tree> to_modify = specifier.get_set ();
   return to_modify;
 }

@@ -178,24 +178,24 @@ get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,

  */
 reorg_maps_t
 get_types_replacement (record_field_offset_map_t record_field_offset_map,
-  std::set<const_tree> to_modify)
+  std::set<tree> to_modify)
 {
   TypeStringifier stringifier;

   TypeReconstructor reconstructor (record_field_offset_map, "reorg");
-  for (std::set<const_tree>::const_iterator i = to_modify.begin (),
+  for (std::set<tree>::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
-  const_tree record = *i;
+  tree record = *i;
   reconstructor.walk (TYPE_MAIN_VARIANT (record));
 }

-  for (std::set<const_tree>::const_iterator i = to_modify.begin (),
+  for (std::set<tree>::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
-  const_tree record = *i;
+  tree record = *i;
   reconstructor.walk (record);
 }

@@ -205,11 +205,11 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,
   // Here, we are just making sure that we are not doing anything too crazy.
   // Also, we found some types for which TYPE_CACHED_VALUES_P is not being
   // rewritten.  This is probably indicative of a bug in TypeReconstructor.
-  for (std::map<const_tree, tree>::const_iterator i = map.begin (),
+  for (std::map<tree, tree>::const_iterator i = map.begin (),
  e = map.end ();
i != e; ++i)
 {
-  const_tree o_record = i->first;
+  tree o_record = i->first;
   std::string o_name = stringifier.stringify (o_record);
   log ("original: %s\n", o_name.c_str ());
   tree r_record = i->second;
@@ -220,7 +220,7 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,

continue;
   tree m_record = TYPE_MAIN_VARIANT (r_record);
   // Info: We had a bug where some TYPED_CACHED_VALUES were preserved?
-  tree _o_record = const_tree_to_tree (o_record);
+  tree _o_record = tree_to_tree (o_record);
   TYPE_CACHED_VALUES_P (_o_record) = false;
   TYPE_CACHED_VALUES_P (m_record) = false;

@@ -252,44 +252,44 @@ substitute_types_in_program (reorg_record_map_t map,
 /* Return a set of trees which point to the set of trees
  * that can be modified.
  */
-std::set<const_tree>
+std::set<tree>
 SpecificTypeCollector::get_set ()
 {
   return to_return;
 }

 void
-SpecificTypeCollector::_walk_POINTER_TYPE_pre (const_

[2/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 09feb1cc82a5d9851a6b524e37c32554b923b1c4 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Thu, 6 Aug 2020 14:07:20 +0200
Subject: [PATCH 2/7] Add Dead Field Elimination

Using the Dead Field Analysis, Dead Field Elimination
automatically transforms gimple to eliminate fields that
are never read.

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: add file to list of sources
* gcc/ipa-dfe.c: New
* gcc/ipa-dfe.h: Same
* gcc/ipa-type-escape-analysis.h: Export code used in dfe.
* gcc/ipa-type-escape-analysis.c: Call transformation

---
 gcc/Makefile.in|1 +
 gcc/ipa-dfe.c  | 1280 
 gcc/ipa-dfe.h  |  250 +++
 gcc/ipa-type-escape-analysis.c |   21 +-
 gcc/ipa-type-escape-analysis.h |   10 +
 5 files changed, 1553 insertions(+), 9 deletions(-)
 create mode 100644 gcc/ipa-dfe.c
 create mode 100644 gcc/ipa-dfe.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8b18c9217a2..8ef6047870b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-type-escape-analysis.o \
+   ipa-dfe.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
new file mode 100644
index 000..c048fac8621
--- /dev/null
+++ b/gcc/ipa-dfe.c
@@ -0,0 +1,1280 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field elimination (IPA-DFE)
+
+   The goal of this transformation is to
+
+   1) Create new types to replace RECORD_TYPEs which hold dead fields.
+   2) Substitute instances of old RECORD_TYPEs for new RECORD_TYPEs.
+   3) Substitute instances of old FIELD_DECLs for new FIELD_DECLs.
+   4) Fix some instances of pointer arithmetic.
+   5) Relayout where needed.
+
+   First stage - DFA
+   =
+
+   Use DFA to compute the set of FIELD_DECLs which can be deleted.
+
+   Second stage - Reconstruct Types
+   
+
+   This stage is done by two family of classes, the SpecificTypeCollector
+   and the TypeReconstructor.
+
+   The SpecificTypeCollector collects all TYPE_P trees which point to
+   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
+   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
+   trees with the new RECORD_TYPE trees.
+
+   Third stage - Substitute Types and Relayout
+   ===
+
+   This stage is handled by ExprRewriter and GimpleRewriter.
+   Some pointer arithmetic is fixed here to take into account those eliminated
+   FIELD_DECLS.
+ */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "basic-block.h" //needed for gimple.h
+#include "function.h"//needed for gimple.h
+#include "gimple.h"
+#include "stor-layout.h"
+#include "cfg.h" // needed for gimple-iterator.h
+#include "gimple-iterator.h"
+#include "gimplify.h"  //unshare_expr
+#include "value-range.h"   // make_ssa_name dependency
+#include "tree-ssanames.h" // make_ssa_name
+#include "ssa.h"
+#include "tree-into-ssa.h"
+#include "gimple-ssa.h" // update_stmt
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "gimple.h"
+#incl

Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
 wrote:
>
> Hi Andrea,
>
> > -Original Message-
> > From: Andrea Corallo 
> > Sent: 26 October 2020 15:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > ; nd 
> > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
> >
> > Hi all,
> >
> > I'd like to submit the following patch implementing the bfloat16_t
> > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
> >
> > Please see refer to:
> > ACLE 
> > ISA  
> >
> > Regtested and bootstrapped.
> >
> > Okay for trunk?
>

I think you need to add -mfloat-abi=hard to the dg-additional-options
otherwise vld1_lane_bf16_1.c
fails on targets with a soft float-abi default (eg arm-linux-gnueabi).

See bf16_vldn_1.c.

BTW, why did you use a different naming scheme for the tests?
(bf16_vldn_1.c vs vld1_lane_bf16_1.c)

Christophe

> Ok.
> Thanks,
> Kyrill
>
>
> >
> >   Andrea
>


Re: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This change allows one to simplify lane permutes that select from multiple 
> load
> leafs that load from the same DR group by promoting the VEC_PERM node into a
> load itself and pushing the lane permute into it as a load permute.
> 
> This saves us from having to calculate where to materialize a new load node.
> If the resulting loads are now unused they are freed and are removed from the
> graph.
> 
> This allows us to handle cases where we would have generated:
> 
>   moviv4.4s, 0
>   adrpx3, .LC0
>   ldr q5, [x3, #:lo12:.LC0]
>   mov x3, 0
>   .p2align 3,,7
> .L2:
>   mov v0.16b, v4.16b
>   mov v3.16b, v4.16b
>   ldr q1, [x1, x3]
>   ldr q2, [x0, x3]
>   fcmla   v0.4s, v2.4s, v1.4s, #0
>   fcmla   v3.4s, v1.4s, v2.4s, #0
>   fcmla   v0.4s, v2.4s, v1.4s, #270
>   fcmla   v3.4s, v1.4s, v2.4s, #270
>   mov v1.16b, v3.16b
>   tbl v0.16b, {v0.16b - v1.16b}, v5.16b
>   str q0, [x2, x3]
>   add x3, x3, 16
>   cmp x3, 1600
>   bne .L2
>   ret
> 
> and instead generate
> 
>   mov x3, 0
>   .p2align 3,,7
> .L27:
>   ldr q0, [x2, x3]
>   ldr q1, [x0, x3]
>   ldr q2, [x1, x3]
>   fcmla   v0.2d, v1.2d, v2.2d, #0
>   fcmla   v0.2d, v1.2d, v2.2d, #270
>   str q0, [x2, x3]
>   add x3, x3, 16
>   cmp x3, 512
>   bne .L27
>   ret
> 
> This runs as a pre step such that permute simplification can still inspect 
> this
> permute is needed
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Tests are included as part of the final patch as they need the SLP pattern
> matcher to insert permutes in between.
> 
> Ok for master?

So I think this is too specialized for the general issue that we're
doing a bad job in CSEing the load part of different permutes of
the same group.  I've played with fixing this half a year ago (again)
in multiple general ways but they all caused some regressions.

So you're now adding some heuristics as to when to anticipate
"CSE" (or merging with followup permutes).

To quickly recap what I did, consider two loads (V2DF),
one { a[0], a[1] } and the other { a[1], a[0] }.  They
are currently two SLP nodes, one with a load_permutation.
My original attempts focused on trying to get rid of load_permutation
in favor of lane_permute nodes and thus during SLP discovery
I turned the second into { a[0], a[1] } (magically unified with
the other load) and a followup lane-permute node.

So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] }
which eventually will (due to patterns) be lane-permuted
into { a[0], a[1] }, right?  So generalizing this as
a single { a[0], a[1] } plus two lane-permute nodes  { 0, 0 }
and { 1, 1 } early would solve the issue as well?  Now,
in general it might be more profitable to generate the
{ a[0], a[0] } and { a[1], a[1] } via scalar-load-and-splat
rather than vector load and permute so we have to be careful
to not over-optimize here or be prepared to do the reverse
transform.

The patch itself is a bit ugly since it modifies the SLP
graph when we already produced the graphds graph so I
would do any of this before.  I did consider gathering
all loads nodes loading from a group and then trying to
apply some heuristic to alter the SLP graph so it can
be better optimized.  In fact when we want to generate
the same code as the non-SLP interleaving scheme does
we do have to look at those since we have to unify
loads there.

I'd put this after vect_slp_build_vertices but before
the new_graph call - altering 'vertices' / 'leafs' should
be more easily possible and the 'leafs' array contains
all loads already (vect_slp_build_vertices could be massaged
to provide a map from DR_GROUP_FIRST_ELEMENT to slp_tree,
giving us the meta we want).

That said, I'd like to see something more forward-looking
rather than the ad-hoc special-casing of what you run into
with the pattern matching.

In case we want to still go with the special-casing it
should IMHO be done in a pre-order walk simply
looking for lane permute nodes with children that all
load from the same group performing what you do before
any of the vertices/graph stuff is built.  That's
probably easiest at this point and it can be done
when then bst_map is still around so you can properly
CSE the new load you build.

Thanks,
Richard.



> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.c (vect_optimize_slp): Promote permutes.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


[patch, committed] targhooks.c: Fix -fzero-call-used-regs 'sorry' typo

2020-11-04 Thread Tobias Burnus

As also remarked by Christophe in PR97699.
Committed as obvious.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 243492e2c69741b91dbfe3bba9b772f65fc9354c
Author: Tobias Burnus 
Date:   Wed Nov 4 14:31:34 2020 +0100

targhooks.c: Fix -fzero-call-used-regs 'sorry' typo

gcc/ChangeLog:

* targhooks.c (default_zero_call_used_regs): Fix flag-name typo
in sorry.

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 4e4d100c547..5b68a2ad7d4 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1011,7 +1011,7 @@ default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	  {
 		issued_error = true;
 		sorry ("%qs not supported on this target",
-			"-fzero-call-used_regs");
+			"-fzero-call-used-regs");
 	  }
 	delete_insns_since (last_insn);
 	  }


[committed] libstdc++: Define new C++17 std::search overload for Parallel Mode [PR 94971]

2020-11-04 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/94971
* include/bits/stl_algo.h (search(FIter, FIter, const Searcher):
Adjust #if condition.
* include/parallel/algo.h (search(FIter, FIter, const Searcher&):
Define new overload for C++17.

Tested powerpc64le-linux. Committed to trunk.

commit e0af865ab9d9d5b6b3ac7fdde26cf9bbf635b6b4
Author: Jonathan Wakely 
Date:   Wed Nov 4 13:36:32 2020

libstdc++: Define new C++17 std::search overload for Parallel Mode [PR 
94971]

libstdc++-v3/ChangeLog:

PR libstdc++/94971
* include/bits/stl_algo.h (search(FIter, FIter, const Searcher):
Adjust #if condition.
* include/parallel/algo.h (search(FIter, FIter, const Searcher&):
Define new overload for C++17.

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 621c6331422e..6efc99035b7d 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -4243,7 +4243,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
__gnu_cxx::__ops::__iter_comp_val(__binary_pred, __val));
 }
 
-#if __cplusplus > 201402L
+#if __cplusplus >= 201703L
   /** @brief Search a sequence using a Searcher object.
*
*  @param  __firstA forward iterator.
diff --git a/libstdc++-v3/include/parallel/algo.h 
b/libstdc++-v3/include/parallel/algo.h
index cec6fd003c38..4b6dcc841191 100644
--- a/libstdc++-v3/include/parallel/algo.h
+++ b/libstdc++-v3/include/parallel/algo.h
@@ -1049,6 +1049,21 @@ namespace __parallel
 std::__iterator_category(__begin2));
 }
 
+#if __cplusplus >= 201703L
+  /** @brief Search a sequence using a Searcher object.
+   *
+   *  @param  __firstA forward iterator.
+   *  @param  __last A forward iterator.
+   *  @param  __searcher A callable object.
+   *  @return @p __searcher(__first,__last).first
+  */
+  template
+inline _ForwardIterator
+search(_ForwardIterator __first, _ForwardIterator __last,
+  const _Searcher& __searcher)
+{ return __searcher(__first, __last).first; }
+#endif
+
   // Sequential fallback
   template
 inline _FIterator


Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Wed, 4 Nov 2020 at 14:29, Christophe Lyon  wrote:
>
> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>  wrote:
> >
> > Hi Andrea,
> >
> > > -Original Message-
> > > From: Andrea Corallo 
> > > Sent: 26 October 2020 15:59
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > > ; nd 
> > > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
> > >
> > > Hi all,
> > >
> > > I'd like to submit the following patch implementing the bfloat16_t
> > > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
> > >
> > > Please see refer to:
> > > ACLE 
> > > ISA  
> > >
> > > Regtested and bootstrapped.
> > >
> > > Okay for trunk?
> >
>
> I think you need to add -mfloat-abi=hard to the dg-additional-options
> otherwise vld1_lane_bf16_1.c
> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>
> See bf16_vldn_1.c.

Actually that's not sufficient because in turn we get:
/sysroot-arm-none-linux-gnueabi/usr/include/gnu/stubs.h:10:11: fatal
error: gnu/stubs-hard.h: No such file or directory

So you should check that -mfloat-abi=hard is supported.

Ditto for the vst tests.
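In test-header terms, the combination Christophe describes would look roughly like the following (a hedged sketch, not taken verbatim from the thread; the effective-target name follows the usual arm testsuite conventions):

```c
/* Gate the test on -mfloat-abi=hard being usable on this target,
   then add the option itself.  */
/* { dg-require-effective-target arm_hard_ok } */
/* { dg-additional-options "-mfloat-abi=hard" } */
```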

>
> BTW, why did you use a different naming scheme for the tests?
> (bf16_vldn_1.c vs vld1_lane_bf16_1.c)
>
> Christophe
>
> > Ok.
> > Thanks,
> > Kyrill
> >
> >
> > >
> > >   Andrea
> >


RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 12:41 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; nd ;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > This is a respin which includes the changes you requested.
> 
> Comments randomly ordered, I'm pasting in pieces of the patch - sending it
> inline would help to get pieces properly quoted and in-order.
> 
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9
> aa42ffe0ce0dd
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> ...
> 
> I miss comments in this file, see tree-vectorizer.h where we try to document
> purpose of classes and fields.
> 
> Things that sticks out to me:
> 
> +uint8_t m_arity;
> +uint8_t m_num_args;
> 
> why uint8_t and not simply unsigned int?  Not knowing what arity /
> num_args should be here ;)

I think I can remove arity, but num_args is how many operands the created
internal function call should take.  Since we can't vectorize calls with more 
than
4 arguments at the moment it seemed like 255 would be a safe limit :).

> 
> +vec_info *m_vinfo;
> ...
> +vect_pattern (slp_tree *node, vec_info *vinfo)
> 
> so this looks like something I freed stmt_vec_info of - back-pointers in the
> "wrong" direction of the logical hierarchy.  I suppose it's just to avoid 
> passing
> down vinfo where we need it?  Please do that instead - pass down vinfo as
> everything else does.
> 
> The class seems to expose both very high-level (build () it!) and very low
> level details (get_ifn).  The high-level one suggests that a pattern _not_
> being represented by an ifn is possible but there's too much implementation
> detail already in the vect_pattern class to make that impossible.  I guess the
> IFN details could be pushed down to the simple matching class (and that be
> called vect_ifn_pattern or so).
> 
> +static bool
> +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> +  bool found_p = false;
> +
> +  if (dump_enabled_p ())
> +{
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match
> --\n");
> +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> +}
> 
> we dumped all instances after their analysis.  Maybe just refer to the
> instance with its address (dump_print %p) so lookup in the (already large)
> dump file is easy.
> 
> +  hash_set *visited = new hash_set ();  for
> + (unsigned x = 0; x < num__slp_patterns; x++)
> +{
> +  visited->empty ();
> +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> slp_patterns[x],
> +   visited);
> +}
> +
> +  delete visited;
> 
> no need to new / delete, just do
> 
>   hash_set visited;
> 
> like everyone else.  Btw, do you really want to scan pieces of the SLP graph
> (with instances being graph entries) multiple times?  If not then you should
> move the visited set to the caller instead.
> 
> +  /* TODO: Remove in final version, only here for generating debug dot
> graphs
> +  from SLP tree.  */
> +
> +  if (dump_enabled_p ())
> +{
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> +}
> 
> now, if there was some pattern matched it is probably useful to dump the
> graph (entry) again.  But only conditional on that I think.  So can you 
> instead
> make the dump conditional on found_p and remove the start dot/end dot
> markers as said in the comment?
> 
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"transformation for %s not valid due to
> + post
> "
> +"condition\n",
> 
> not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> vectorization fail.
> 
> +  /* Perform recursive matching, it's important to do this after
> + matching
> things
> 
> before matching things?
> 
> + in the current node as the matches here may re-order the nodes
> + below
> it.
> + As such the pattern that needs to be subsequently match may change.
> 
> and this is no longer true?
> 
> */
> +
> +  if (SLP_TREE_CHILDREN (node).exists ()) {
> 
> elide this check, the loop will simply not run if empty
> 
> +slp_tree child;
> +FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> 
> I think you want to perform the recursion in the caller so you do it only once
> and not once for each pat

[PATCH] testsuite: Clean up lto and offload dump files

2020-11-04 Thread Frederik Harwath

Hi,

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects leftover
dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is used
for matching the "ltrans" files is also changed since the existing
pattern failed to remove some LTO ("ltrans0.ltrans.") dump files.


This patch gets rid of at least one unresolved libgomp test result that
would otherwise be introduced by the patch discussed in this thread:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557889.html


diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {

[...]

-   lappend tfiles "$stem.{$basename_ext,exe}"

I do not understand why "exe" should be included here. I have removed it
and did not notice any files matching the resulting pattern being left
behind by "make check-gcc".


Best regards,
Frederik

>From 9eb5da60e8822e1f6fa90b32bff6123ed62c146c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 4 Nov 2020 14:09:46 +0100
Subject: [PATCH] testsuite: Clean up lto and offload dump files

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects
leftover dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is
used for matching the "ltrans" files is also changed since the
existing pattern failed to match some dump files.

2020-11-04  Frederik Harwath  

gcc/testsuite/ChangeLog:

	* lib/gcc-dg.exp (proc schedule-cleanups): Adapt "-flto" handling,
	add "-foffload" handling.
---
 gcc/testsuite/lib/gcc-dg.exp | 50 
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {
 # stem.ext..
 # (tree)passes can have multiple instances, thus optional trailing *
 set ptn "\[0-9\]\[0-9\]\[0-9\]$ptn.*"
+set ltrans no
+set mkoffload no
+
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
 	verbose "Cleanup -flto seen" 4
-	set ltrans "{ltrans\[0-9\]*.,}"
-} else {
-	set ltrans ""
+	set ltrans yes
+}
+
+if [regexp -- {(^|\s+)-foffload=} $opts] {
+	verbose "Cleanup -foffload seen" 4
+	set mkoffload yes
 }
-set ptn "$ltrans$ptn"
+
 verbose "Cleanup final ptn: $ptn" 4
 set tfiles {}
 foreach src $testcases {
-	set basename [file tail $src]
-	if { $ltrans != "" } {
-	# ??? should we use upvar 1 output_file instead of this (dup ?)
-	set stem [file rootname $basename]
-	set basename_ext [file extension $basename]
-	if {$basename_ext != ""} {
-		regsub -- {^.*\.} $basename_ext {} basename_ext
-	}
-	lappend tfiles "$stem.{$basename_ext,exe}"
-	unset basename_ext
-	} else {
-	lappend tfiles $basename
-	}
+set basename [file tail $src]
+set stem [file rootname $basename]
+set basename_ext [file extension $basename]
+if {$basename_ext != ""} {
+regsub -- {^.*\.} $basename_ext {} basename_ext
+}
+set extensions [list $basename_ext]
+
+if { $ltrans == yes } {
+lappend extensions "ltrans\[0-9\]*.ltrans"
+}
+if { $mkoffload == yes} {
+# The * matches the offloading target's name, e.g. "xnvptx-none".
+lappend extensions "*.mkoffload"
+}
+
+set extensions_ptn [join $extensions ","]
+if { [llength $extensions] > 1 } {
+set extensions_ptn "{$extensions_ptn}"
+}
+
+  	lappend tfiles "$stem.$extensions_ptn"
 }
+
 if { [llength $tfiles] > 1 } {
 	set tfiles [join $tfiles ","]
 	set tfiles "{$tfiles}"
-- 
2.17.1
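To illustrate what the new brace pattern is meant to match, here is a hedged stand-alone sketch (bash required for brace expansion; the file names and dump suffixes are made up, real ones vary by pass and offload target):

```shell
# Work in a scratch directory and simulate dump files for a test foo.c
# built with -flto and -foffload=... .
cd "$(mktemp -d)"
touch foo.c.123t.optimized \
      foo.ltrans0.ltrans.123t.optimized \
      foo.xnvptx-none.mkoffload.123t.optimized

# The cleanup glob built by the patch expands (roughly) to the three
# patterns foo.c.*, foo.ltrans[0-9]*.ltrans.* and foo.*.mkoffload.*,
# so all three files are matched:
ls foo.{c,ltrans[0-9]*.ltrans,*.mkoffload}.*
```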



Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
>
> On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
>  wrote:
> >
> > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > >
> > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > >  wrote:
> > > > >
> > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > wrote:
> > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via Gcc-patches 
> > > > > > > wrote:
> > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > The attached patch implements TARGET_ASM_MARK_DECL_PRESERVED 
> > > > > > > > > for ELF GNU
> > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > attribute
> > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > >
> > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > ".retain"
> > > > > > > >
> > > > > > > > Can you use the "R" flag instead?
> > > > > > > >
> > > > > > >
> > > > > > > For the benefit of this mailing list, I have copied my response 
> > > > > > > from the
> > > > > > > Binutils mailing list regarding this.
> > > > > > > The "comm_section" example I gave is actually inaccurate, but you 
> > > > > > > can
> > > > > > > see the examples of the variety of sections that would need to be
> > > > > > > handled by doing
> > > > > > >
> > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > >
> > > > > > > > ... snip ...
> > > > > > > > Secondly, for seamless integration with the "used" attribute, 
> > > > > > > > we must be
> > > > > > > > able to to mark the symbol with the used attribute applied as 
> > > > > > > > "retained"
> > > > > > > > without changing its section name. For GCC "named" sections, 
> > > > > > > > this is
> > > > > > > > straightforward, but for "unnamed" sections it is a giant mess.
> > > > > > > >
> > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > available,
> > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > switch to one
> > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded in 
> > > > > > > > the
> > > > > > > > structure.
> > > > > > > >
> > > > > > > > Backends define the assembly code to switch to these sections 
> > > > > > > > (some
> > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > unnamed section
> > > > > > > > "comm_section", might correspond to a .bss section, or emit a 
> > > > > > > > .comm
> > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > what the
> > > > > > > > name of a section will be, but it would be very messy and not 
> > > > > > > > robust.
> > > > > > > >
> > > > > > > > Meanwhile, having a .retain  directive is a very 
> > > > > > > > simple
> > > > > > > > solution, and keeps the GCC implementation really concise (patch
> > > > > > > > attached). The assembler will know for sure what the section 
> > > > > > > > containing
> > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > directly.
> > > > > > > >
> > > > > >
> > > > > > Please take a look at
> > > > > >
> > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > >
> > > > > > which is built in top of
> > > > > >
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > >
> > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > want, you extract my flags2 change and use it for SHF_GNU_RETAIN.
> > > > >
> > > > > In your patch you have to make the assumption that data_section, 
> > > > > always
> > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > (which
> > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > >
> > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > >
> > > > > data_section for c6x corresponds to .fardata, not .data. So the use of
> > > > > "used" on a data declaration would place it in a different section, 
> > > > > that
> > > > > if the "used" attribute was not applied.
> > > > >
> > > > > For c6x and mips, readonly_data_section does not correspond to 
> > > > > .rodata,
> > > > > so that assumption cannot be made either:
> > > > > > c6x/elf-common.h:#define READONLY_DATA_SECTION_ASM_OP 
> > > > > > "\t.section\t\".const\",\"a\",@progbits"
> > > > > > mips/mips.h:#define READONLY_DATA_SECTION_ASM_OP"\t.rdata"  
> > > > > > /* read-only data */
> > > > >
> > > > > The same can be said for bss_section for c6x as well.
> > > >
> > > > Just add and use named_xxx_section.
> > > >
> >
> > I
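As a concrete illustration of the two spellings being debated in this thread (a hedged sketch only; the section name and symbol are examples, and whether either spelling was accepted upstream is outside this excerpt):

```asm
        # (a) Jozef's proposal: a directive that marks the symbol as
        #     retained, leaving section placement entirely to the
        #     compiler -- no section name needed.
        .retain keep_me

        # (b) The section-flag spelling of SHF_GNU_RETAIN ("R"), which
        #     requires knowing the section name up front -- exactly what
        #     is hard for GCC's "unnamed" sections.
        .section .data.keep_me,"awR",%progbits
keep_me:
        .word 1
```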

Re: [00/32] C++ 20 Modules

2020-11-04 Thread Nathan Sidwell

On 11/4/20 7:30 AM, Nathan Sidwell wrote:

rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try 
that.


yeah, that didn't work.  There are compilation errors in
../../../src/gcc/config/i386/x86-tune-costs.h about missing
initializers, and then ...


In file included from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
 from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
 from 
/usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,

 from /usr/include/c++/4.9/random:50,
 from /usr/include/c++/4.9/bits/stl_algo.h:66,
 from /usr/include/c++/4.9/algorithm:62,
 from ../../../src/gcc/cp/mapper-resolver.cc:26:
./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
 return malloc (__size);
^
Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed

it's a little unfortunate we can't use the standard library :(  I'll see 
what I can do about avoiding <algorithm>.


nathan

--
Nathan Sidwell
make[2]: Entering directory '/home/nathan/egcs/modules/obj/i686/gcc'
g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../../src/gcc -I../../../src/gcc/. -I../../../src/gcc/../include -I../../../src/gcc/../libcpp/include -I../../../src/gcc/../libcody -I/home/nathan/egcs/modules/obj/i686/./gmp -I/home/nathan/egcs/modules/src/gmp -I/home/nathan/egcs/modules/obj/i686/./mpfr/src -I/home/nathan/egcs/modules/src/mpfr/src -I/home/nathan/egcs/modules/src/mpc/src  -I../../../src/gcc/../libdecnumber -I../../../src/gcc/../libdecnumber/bid -I../libdecnumber -I../../../src/gcc/../libbacktrace -I/home/nathan/egcs/modules/obj/i686/./isl/include -I/home/nathan/egcs/modules/src/isl/include  -o i386-options.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo ../../../src/gcc/config/i386/i386-options.c
In file included from ../../../src/gcc/config/i386/i386-options.c:94:0:
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
   {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false}}}};
^
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../..

RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 1:36 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> selects from loads from the same DR.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This change allows one to simplify lane permutes that select from
> > multiple load leafs that load from the same DR group by promoting the
> > VEC_PERM node into a load itself and pushing the lane permute into it as a
> load permute.
> >
> > This saves us from having to calculate where to materialize a new load node.
> > If the resulting loads are now unused they are freed and are removed
> > from the graph.
> >
> > This allows us to handle cases where we would have generated:
> >
> > moviv4.4s, 0
> > adrpx3, .LC0
> > ldr q5, [x3, #:lo12:.LC0]
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > mov v0.16b, v4.16b
> > mov v3.16b, v4.16b
> > ldr q1, [x1, x3]
> > ldr q2, [x0, x3]
> > fcmla   v0.4s, v2.4s, v1.4s, #0
> > fcmla   v3.4s, v1.4s, v2.4s, #0
> > fcmla   v0.4s, v2.4s, v1.4s, #270
> > fcmla   v3.4s, v1.4s, v2.4s, #270
> > mov v1.16b, v3.16b
> > tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 1600
> > bne .L2
> > ret
> >
> > and instead generate
> >
> > mov x3, 0
> > .p2align 3,,7
> > .L27:
> > ldr q0, [x2, x3]
> > ldr q1, [x0, x3]
> > ldr q2, [x1, x3]
> > fcmla   v0.2d, v1.2d, v2.2d, #0
> > fcmla   v0.2d, v1.2d, v2.2d, #270
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 512
> > bne .L27
> > ret
> >
> > This runs as a pre step such that permute simplification can still
> > inspect this permute is needed
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Tests are included as part of the final patch as they need the SLP
> > pattern matcher to insert permutes in between.
> >
> > Ok for master?
> 
> So I think this is too specialized for the general issue that we're doing a 
> bad
> job in CSEing the load part of different permutes of the same group.  I've
> played with fixing this half a year ago (again) in multiple general ways but
> they all caused some regressions.
> 
> So you're now adding some heuristics as to when to anticipate "CSE" (or
> merging with followup permutes).
> 
> To quickly recap what I did, consider two loads (V2DF): one { a[0], a[1] } and
> the other { a[1], a[0] }.  They currently are two SLP nodes, one of them with a
> load_permutation.
> My original attempts focused on trying to get rid of load_permutation in
> favor of lane_permute nodes and thus during SLP discovery I turned the
> second into { a[0], a[1] } (magically unified with the other load) and a
> followup lane-permute node.
> 
> So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] } which 
> eventually
> will (due to patterns) be lane-permuted into { a[0], a[1] }, right?  So
> generalizing this as a single { a[0], a[1] } plus two lane-permute nodes  { 
> 0, 0 }
> and { 1, 1 } early would solve the issue as well?

Correct, I did wonder why it was generating two different nodes instead of a
lane permute, but I didn't pay much attention to it, assuming it was just a
shortcoming.

> Now, in general it might be
> more profitable to generate the { a[0], a[0] } and { a[1], a[1] } via 
> scalar-load-
> and-splat rather than vector load and permute so we have to be careful to
> not over-optimize here or be prepared to do the reverse transform.

This can in principle be done in optimize_slp then, right? It would already do
a lot of the same work and find the materialization points. 

> 
> The patch itself is a bit ugly since it modifies the SLP graph when we already
> produced the graphds graph so I would do any of this before.  I did consider
> gathering all loads nodes loading from a group and then trying to apply some
> heuristic to alter the SLP graph so it can be better optimized.  In fact when 
> we
> want to generate the same code as the non-SLP interleaving scheme does
> we do have to look at those since we have to unify loads there.
> 

Yes... I will concede the patch isn't my finest work. I also don't like the fact
that I had to keep the leafs intact lest I break things later. But I wanted
feedback :) 

> I'd put this after vect_slp_build_vertices but before the new_graph call -
> altering 'vertices' / 'leafs' should be more easily possible and the 'leafs' 
> array
> contains all loads already (vect_slp_build_vertices could be massaged to
> provide a map from DR_GROUP_FIRST_ELEMENT to slp_tree, giving us the
> meta we want).
> 
> That said, I'd like to see something more forward-looking rather than the ad-
> hoc special-casing of what you run into with the 

RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 12:41 PM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; nd ;
> > gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > This is a respin which includes the changes you requested.
> > 
> > Comments randomly ordered, I'm pasting in pieces of the patch - sending it
> > inline would help to get pieces properly quoted and in-order.
> > 
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9
> > aa42ffe0ce0dd
> > 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > ...
> > 
> > I miss comments in this file, see tree-vectorizer.h where we try to document
> > purpose of classes and fields.
> > 
> > Things that sticks out to me:
> > 
> > +uint8_t m_arity;
> > +uint8_t m_num_args;
> > 
> > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > num_args should be here ;)
> 
> I think I can remove arity, but num_args is how many operands the created
> internal function call should take.  Since we can't vectorize calls with more 
> than
> 4 arguments at the moment it seemed like 255 would be a safe limit :).
> 
> > 
> > +vec_info *m_vinfo;
> > ...
> > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > 
> > so this looks like something I freed stmt_vec_info of - back-pointers in the
> > "wrong" direction of the logical hierarchy.  I suppose it's just to avoid 
> > passing
> > down vinfo where we need it?  Please do that instead - pass down vinfo as
> > everything else does.
> > 
> > The class seems to expose both very high-level (build () it!) and very low
> > level details (get_ifn).  The high-level one suggests that a pattern _not_
> > being represented by an ifn is possible but there's too much implementation
> > detail already in the vect_pattern class to make that impossible.  I guess 
> > the
> > IFN details could be pushed down to the simple matching class (and that be
> > called vect_ifn_pattern or so).
> > 
> > +static bool
> > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > +  bool found_p = false;
> > +
> > +  if (dump_enabled_p ())
> > +{
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match
> > --\n");
> > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > +}
> > 
> > we dumped all instances after their analysis.  Maybe just refer to the
> > instance with its address (dump_print %p) so lookup in the (already large)
> > dump file is easy.
> > 
> > +  hash_set *visited = new hash_set ();  for
> > + (unsigned x = 0; x < num__slp_patterns; x++)
> > +{
> > +  visited->empty ();
> > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> > slp_patterns[x],
> > +   visited);
> > +}
> > +
> > +  delete visited;
> > 
> > no need to new / delete, just do
> > 
> >   hash_set visited;
> > 
> > like everyone else.  Btw, do you really want to scan pieces of the SLP graph
> > (with instances being graph entries) multiple times?  If not then you should
> > move the visited set to the caller instead.
> > 
> > +  /* TODO: Remove in final version, only here for generating debug dot
> > graphs
> > +  from SLP tree.  */
> > +
> > +  if (dump_enabled_p ())
> > +{
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > +}
> > 
> > now, if there was some pattern matched it is probably useful to dump the
> > graph (entry) again.  But only conditional on that I think.  So can you 
> > instead
> > make the dump conditional on found_p and remove the start dot/end dot
> > markers as said in the comment?
> > 
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"transformation for %s not valid due to
> > + post
> > "
> > +"condition\n",
> > 
> > not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> > MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> > vectorization fail.
> > 
> > +  /* Perform recursive matching, it's important to do this after
> > + matching
> > things
> > 
> > before matching things?
> > 
> > + in the current node as the matches here may re-order the nodes
> > + below
> > it.
> > + As such the pattern that needs to be subsequently match may change.
> > 
> > and this is no longer t

Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> On Wed, 4 Nov 2020 at 14:29, Christophe Lyon  
> wrote:
>>
>> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>>  wrote:
>> >
>> > Hi Andrea,
>> >
>> > > -Original Message-
>> > > From: Andrea Corallo 
>> > > Sent: 26 October 2020 15:59
>> > > To: gcc-patches@gcc.gnu.org
>> > > Cc: Kyrylo Tkachov ; Richard Earnshaw
>> > > ; nd 
>> > > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
>> > >
>> > > Hi all,
>> > >
>> > > I'd like to submit the following patch implementing the bfloat16_t
>> > > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
>> > >
>> > > Please see refer to:
>> > > ACLE 
>> > > ISA  
>> > >
>> > > Regtested and bootstrapped.
>> > >
>> > > Okay for trunk?
>> >
>>
>> I think you need to add -mfloat-abi=hard to the dg-additional-options
>> otherwise vld1_lane_bf16_1.c
>> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>>
>> See bf16_vldn_1.c.
>
> Actually that's not sufficient because in turn we get:
> /sysroot-arm-none-linux-gnueabi/usr/include/gnu/stubs.h:10:11: fatal
> error: gnu/stubs-hard.h: No such file or directory
>
> So you should check that -mfloat-abi=hard is supported.
>
> Ditto for the vst tests.
>

Hi Christophe,

thanks for catching this, I'll prepare a patch.

  Andrea


Re: [00/32] C++ 20 Modules

2020-11-04 Thread Jason Merrill via Gcc-patches
On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell  wrote:

> On 11/4/20 7:30 AM, Nathan Sidwell wrote:
>
> > rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try
> > that.
>
> yeah, that didn't work.  There's compilation errors in
> ../../../src/gcc/config/i386/x86-tune-costs.h about missing
> initializers.  and then ...
>
> In file included from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
>   from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
>   from
> /usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,
>   from /usr/include/c++/4.9/random:50,
>   from /usr/include/c++/4.9/bits/stl_algo.h:66,
>   from /usr/include/c++/4.9/algorithm:62,
>   from ../../../src/gcc/cp/mapper-resolver.cc:26:
> ./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
>   return malloc (__size);
>  ^
> Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed
>
> it's a little unfortunate we can't use the standard library :(  I'll see
> what I can do about avoiding algorithm.
>

We can; apparently the necessary incantation is to

#define INCLUDE_ALGORITHM

before

#include "system.h"

Jason


Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>  wrote:
>>
>> Hi Andrea,
>>
>> > -Original Message-
>> > From: Andrea Corallo 
>> > Sent: 26 October 2020 15:59
>> > To: gcc-patches@gcc.gnu.org
>> > Cc: Kyrylo Tkachov ; Richard Earnshaw
>> > ; nd 
>> > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
>> >
>> > Hi all,
>> >
>> > I'd like to submit the following patch implementing the bfloat16_t
>> > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
>> >
>> > Please see refer to:
>> > ACLE 
>> > ISA  
>> >
>> > Regtested and bootstrapped.
>> >
>> > Okay for trunk?
>>
>
> I think you need to add -mfloat-abi=hard to the dg-additional-options
> otherwise vld1_lane_bf16_1.c
> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>
> See bf16_vldn_1.c.
>
> BTW, why did you use a different naming scheme for the tests?
> (bf16_vldn_1.c vs vld1_lane_bf16_1.c)

Nothing special, it made more sense to me to use the name of the
intrinsic directly, as it already includes the bf16 information.  I
believe we have both schemes in the aarch64 & arm backends.  I've no
problem with renaming the tests if we feel it is important.

  Andrea


Re: deprecations in OpenMP 5.0

2020-11-04 Thread Kwok Cheung Yeung

On 28/10/2020 4:06 pm, Jakub Jelinek wrote:

On Wed, Oct 28, 2020 at 03:41:25PM +, Kwok Cheung Yeung wrote:

What if we made the definition of __GOMP_DEPRECATED in the original patch
conditional on the current value of _OPENMP? i.e. something like:

+#if defined(__GNUC__) && _OPENMP >= 201811L
+# define __GOMP_DEPRECATED __attribute__((__deprecated__))
+#else
+# define __GOMP_DEPRECATED
+#endif

In that case, __GOMP_DEPRECATED will not do anything until _OPENMP is
updated to reflect OpenMP 5.0, but when it is, the functions will
immediately be marked deprecated without any further work.


That could work, but the macro name would need to incorporate the exact
OpenMP version.
Because some APIs can be deprecated in OpenMP 5.0, others in 5.1 or in 5.2
(all to be removed in 6.0), others in 6.0/6.1 etc. to be removed in 7.0 etc.


I've renamed __GOMP_DEPRECATED to __GOMP_DEPRECATED_5_0.



However, GFortran does not support the deprecated attribute, so how should
it behave? My first thought would be to print out a warning message at
runtime the first time a deprecated function is called (printing it out
every time would probably be too annoying), and maybe add an environment
variable that can be set to disable the warning. A similar runtime warning
could also be printed if the OMP_NESTED environment variable is set. Again,
printing these warnings could be suppressed until the value of _OPENMP is
bumped up.


I'm against such runtime diagnostics, that is perhaps good for some
sanitization, but not normal usage.  Perhaps better implement deprecated
attribute in gfortran?



I have used Tobias' recently added patch for Fortran deprecation support to mark 
omp_get_nested and omp_set_nested as deprecated. If the omp_lock_hint_* integer 
parameters are marked though, then the deprecation warnings will fire the moment 
omp_lib is used from a Fortran program, even if they are not referenced in the 
program itself - a bug perhaps?


I have added '-cpp' (for preprocessor support) and '-fopenmp' (for the _OPENMP 
define) to the Makefile when compiling the omp_lib.f90.


Would a warning message be acceptable if OMP_NESTED is used? Obviously this 
cannot be done at compile-time.


Is this patch okay for trunk? We could add the deprecations for omp_lock_hint_* 
later when the deprecations for parameters are fixed. I have checked that it 
bootstraps on x86_64.


Kwok
From 6e8fc46bdcaf44da11d46968a488fdd990ae Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 4 Nov 2020 03:59:44 -0800
Subject: [PATCH] openmp: Mark deprecated symbols in OpenMP 5.0

2020-11-04  Ulrich Drepper  
Kwok Cheung Yeung  

libgomp/
* Makefile.am (%.mod): Add -cpp and -fopenmp to compile flags.
* Makefile.in: Regenerate.
* fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
pragmas to ignore -Wdeprecated-declarations warnings.
* icv.c: Likewise.
* omp.h.in (__GOMP_DEPRECATED_5_0): Define.
Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
and omp_get_nested with __GOMP_DEPRECATED_5_0.
* omp_lib.f90.in: Mark omp_get_nested and omp_set_nested as
deprecated.
---
 libgomp/Makefile.am|  2 +-
 libgomp/Makefile.in|  2 +-
 libgomp/fortran.c  | 13 +++--
 libgomp/icv.c  | 10 --
 libgomp/omp.h.in   | 22 ++
 libgomp/omp_lib.f90.in |  4 
 6 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 586c930..4cf1f58 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -92,7 +92,7 @@ openacc_kinds.mod: openacc.mod
 openacc.mod: openacc.lo
:
 %.mod: %.f90
-   $(FC) $(FCFLAGS) -fsyntax-only $<
+   $(FC) $(FCFLAGS) -cpp -fopenmp -fsyntax-only $<
 fortran.lo: libgomp_f.h
 fortran.o: libgomp_f.h
 env.lo: libgomp_f.h
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 00d5e29..eb868b3 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -1382,7 +1382,7 @@ openacc_kinds.mod: openacc.mod
 openacc.mod: openacc.lo
:
 %.mod: %.f90
-   $(FC) $(FCFLAGS) -fsyntax-only $<
+   $(FC) $(FCFLAGS) -cpp -fopenmp -fsyntax-only $<
 fortran.lo: libgomp_f.h
 fortran.o: libgomp_f.h
 env.lo: libgomp_f.h
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 029dec1..cd719f9 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -47,10 +47,13 @@ ialias_redirect (omp_test_lock)
 ialias_redirect (omp_test_nest_lock)
 # endif
 ialias_redirect (omp_set_dynamic)
-ialias_redirect (omp_set_nested)
-ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_get_dynamic)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+ialias_redirect (omp_set_nested)
 ialias_redirect (omp_get_nested)
+#pragma GCC diagnostic pop
+ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_in_parallel)
 ialias_redirect (omp_get_max_threads)
 ialias_re

Re: deprecations in OpenMP 5.0

2020-11-04 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 04, 2020 at 02:23:17PM +, Kwok Cheung Yeung wrote:
> I have used Tobias' recently added patch for Fortran deprecation support to
> mark omp_get_nested and omp_set_nested as deprecated. If the omp_lock_hint_*
> integer parameters are marked though, then the deprecation warnings will
> fire the moment omp_lib is used from a Fortran program, even if they are not
> referenced in the program itself - a bug perhaps?
> 
> I have added '-cpp' (for preprocessor support) and '-fopenmp' (for the
> _OPENMP define) to the Makefile when compiling the omp_lib.f90.
> 
> Would a warning message be acceptable if OMP_NESTED is used? Obviously this
> cannot be done at compile-time.

I'd strongly prefer no runtime warnings.

> 2020-11-04  Ulrich Drepper  
>   Kwok Cheung Yeung  
> 
>   libgomp/
>   * Makefile.am (%.mod): Add -cpp and -fopenmp to compile flags.
>   * Makefile.in: Regenerate.
>   * fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
>   pragmas to ignore -Wdeprecated-declarations warnings.
>   * icv.c: Likewise.
>   * omp.h.in (__GOMP_DEPRECATED_5_0): Define.
>   Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
>   and omp_get_nested with __GOMP_DEPRECATED_5_0.
>   * omp_lib.f90.in: Mark omp_get_nested and omp_set_nested as
>   deprecated.

LGTM, except:

> +  omp_lock_hint_contended __GOMP_DEPRECATED_5_0 = omp_sync_hint_contended,
>omp_sync_hint_nonspeculative = 4,
> -  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
> +  omp_lock_hint_nonspeculative __GOMP_DEPRECATED_5_0 = 
> omp_sync_hint_nonspeculative,

The above line is too long and needs wrapping.

But it would be nice to also add -Wno-deprecated to dg-additional-options of
tests that do use those.
Perhaps for testing replace the 201811 temporarily with 201511 and run make
check.

> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in
> @@ -644,4 +644,8 @@
>end function
>  end interface
>  
> +#if _OPENMP >= 201811
> +!GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
> +#endif
> +
>end module omp_lib

Also, what about omp_lib.h?  Do you plan to change it only when we switch
_OPENMP macro?  I mean, we can't rely on preprocessing in that case...

Jakub



RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 2:04 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; nd ;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
> 
> On Wed, 4 Nov 2020, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: rguent...@c653.arch.suse.de  On
> > > Behalf Of Richard Biener
> > > Sent: Wednesday, November 4, 2020 12:41 PM
> > > To: Tamar Christina 
> > > Cc: Richard Sandiford ; nd ;
> > > gcc-patches@gcc.gnu.org
> > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > matching scaffolding.
> > >
> > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > >
> > > > Hi Richi,
> > > >
> > > > This is a respin which includes the changes you requested.
> > >
> > > Comments randomly ordered, I'm pasting in pieces of the patch -
> > > sending it inline would help to get pieces properly quoted and in-order.
> > >
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f
> > > 9
> > > aa42ffe0ce0dd
> > > 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > ...
> > >
> > > I miss comments in this file, see tree-vectorizer.h where we try to
> > > document purpose of classes and fields.
> > >
> > > Things that sticks out to me:
> > >
> > > +uint8_t m_arity;
> > > +uint8_t m_num_args;
> > >
> > > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > > num_args should be here ;)
> >
> > I think I can remove arity, but num_args is how many operands the
> > created internal function call should take.  Since we can't vectorize
> > calls with more than
> > 4 arguments at the moment it seemed like 255 would be a safe limit :).
> >
> > >
> > > +vec_info *m_vinfo;
> > > ...
> > > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > >
> > > so this looks like something I freed stmt_vec_info of -
> > > back-pointers in the "wrong" direction of the logical hierarchy.  I
> > > suppose it's just to avoid passing down vinfo where we need it?
> > > Please do that instead - pass down vinfo as everything else does.
> > >
> > > The class seems to expose both very high-level (build () it!) and
> > > very low level details (get_ifn).  The high-level one suggests that
> > > a pattern _not_ being represented by an ifn is possible but there's
> > > too much implementation detail already in the vect_pattern class to
> > > make that impossible.  I guess the IFN details could be pushed down
> > > to the simple matching class (and that be called vect_ifn_pattern or so).
> > >
> > > +static bool
> > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > +  bool found_p = false;
> > > +
> > > +  if (dump_enabled_p ())
> > > +{
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt
> > > + match
> > > --\n");
> > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > > +}
> > >
> > > we dumped all instances after their analysis.  Maybe just refer to
> > > the instance with its address (dump_print %p) so lookup in the
> > > (already large) dump file is easy.
> > >
> > > +  hash_set *visited = new hash_set ();  for
> > > + (unsigned x = 0; x < num__slp_patterns; x++)
> > > +{
> > > +  visited->empty ();
> > > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> > > slp_patterns[x],
> > > +   visited);
> > > +}
> > > +
> > > +  delete visited;
> > >
> > > no need to new / delete, just do
> > >
> > >   hash_set visited;
> > >
> > > like everyone else.  Btw, do you really want to scan pieces of the
> > > SLP graph (with instances being graph entries) multiple times?  If
> > > not then you should move the visited set to the caller instead.
> > >
> > > +  /* TODO: Remove in final version, only here for generating debug
> > > + dot
> > > graphs
> > > +  from SLP tree.  */
> > > +
> > > +  if (dump_enabled_p ())
> > > +{
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > > +}
> > >
> > > now, if there was some pattern matched it is probably useful to dump
> > > the graph (entry) again.  But only conditional on that I think.  So
> > > can you instead make the dump conditional on found_p and remove the
> > > start dot/end dot markers as said in the comment?
> > >
> > > + if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +"transformation for %s not valid due to
> > > + post
> > > "
> > > +

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Jozef Lawrynowicz
On Wed, Nov 04, 2020 at 05:47:28AM -0800, H.J. Lu wrote:
> On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
> >
> > On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
> >  wrote:
> > >
> > > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > > wrote:
> > > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via 
> > > > > > > > Gcc-patches wrote:
> > > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > The attached patch implements 
> > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED for ELF GNU
> > > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > > attribute
> > > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > > >
> > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > > ".retain"
> > > > > > > > >
> > > > > > > > > Can you use the "R" flag instead?
> > > > > > > > >
> > > > > > > >
> > > > > > > > For the benefit of this mailing list, I have copied my response 
> > > > > > > > from the
> > > > > > > > Binutils mailing list regarding this.
> > > > > > > > The "comm_section" example I gave is actually inaccurate, but 
> > > > > > > > you can
> > > > > > > > see the examples of the variety of sections that would need to 
> > > > > > > > be
> > > > > > > > handled by doing
> > > > > > > >
> > > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > > >
> > > > > > > > > ... snip ...
> > > > > > > > > Secondly, for seamless integration with the "used" attribute, 
> > > > > > > > > we must be
> > > > > > > > > able to mark the symbol with the used attribute applied as 
> > > > > > > > > "retained"
> > > > > > > > > without changing its section name. For GCC "named" sections, 
> > > > > > > > > this is
> > > > > > > > > straightforward, but for "unnamed" sections it is a giant 
> > > > > > > > > mess.
> > > > > > > > >
> > > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > > available,
> > > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > > switch to one
> > > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded 
> > > > > > > > > in the
> > > > > > > > > structure.
> > > > > > > > >
> > > > > > > > > Backends define the assembly code to switch to these sections 
> > > > > > > > > (some
> > > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > > unnamed section
> > > > > > > > > "comm_section", might correspond to a .bss section, or emit a 
> > > > > > > > > .comm
> > > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > > what the
> > > > > > > > > name of a section will be, but it would be very messy and not 
> > > > > > > > > robust.
> > > > > > > > >
> > > > > > > > > Meanwhile, having a .retain  directive is a very 
> > > > > > > > > simple
> > > > > > > > > solution, and keeps the GCC implementation really concise 
> > > > > > > > > (patch
> > > > > > > > > attached). The assembler will know for sure what the section 
> > > > > > > > > containing
> > > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > > directly.
> > > > > > > > >
> > > > > > >
> > > > > > > Please take a look at
> > > > > > >
> > > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > > >
> > > > > > > which is built on top of
> > > > > > >
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > > >
> > > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > > want, you extract my flags2 change and use it for SHF_GNU_RETAIN.
> > > > > >
> > > > > > In your patch you have to make the assumption that data_section, 
> > > > > > always
> > > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > > (which
> > > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > > >
> > > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > > >
> > > > > > data_section for c6x corresponds to .fardata, not .data. So the use 
> > > > > > of
> > > > > > "used" on a data declaration would place it in a different section, 
> > > > > > than
> > > > > > if the "used" attribute was not applied.
> > > > > >
> > > > > > For c6x and mips, readonly_data_section does not correspond to 
> > > > > > .rodata,
> > > > > > so that assumption cannot be made either:
> > > > > > > c6x/elf-common.h:#define R

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 6:41 AM Jozef Lawrynowicz
 wrote:
>
> On Wed, Nov 04, 2020 at 05:47:28AM -0800, H.J. Lu wrote:
> > On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
> > >
> > > On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
> > >  wrote:
> > > >
> > > > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > > > wrote:
> > > > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via 
> > > > > > > > > Gcc-patches wrote:
> > > > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > The attached patch implements 
> > > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED for ELF GNU
> > > > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > > > attribute
> > > > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > > > >
> > > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > > > ".retain"
> > > > > > > > > >
> > > > > > > > > > Can you use the "R" flag instead?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > For the benefit of this mailing list, I have copied my 
> > > > > > > > > response from the
> > > > > > > > > Binutils mailing list regarding this.
> > > > > > > > > The "comm_section" example I gave is actually inaccurate, but 
> > > > > > > > > you can
> > > > > > > > > see the examples of the variety of sections that would need 
> > > > > > > > > to be
> > > > > > > > > handled by doing
> > > > > > > > >
> > > > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > > > >
> > > > > > > > > > ... snip ...
> > > > > > > > > > Secondly, for seamless integration with the "used" 
> > > > > > > > > > attribute, we must be
> > > > > > > > > > able to mark the symbol with the used attribute applied 
> > > > > > > > > > as "retained"
> > > > > > > > > > without changing its section name. For GCC "named" 
> > > > > > > > > > sections, this is
> > > > > > > > > > straightforward, but for "unnamed" sections it is a giant 
> > > > > > > > > > mess.
> > > > > > > > > >
> > > > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > > > available,
> > > > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > > > switch to one
> > > > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded 
> > > > > > > > > > in the
> > > > > > > > > > structure.
> > > > > > > > > >
> > > > > > > > > > Backends define the assembly code to switch to these 
> > > > > > > > > > sections (some
> > > > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > > > unnamed section
> > > > > > > > > > "comm_section", might correspond to a .bss section, or emit 
> > > > > > > > > > a .comm
> > > > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > > > what the
> > > > > > > > > > name of a section will be, but it would be very messy and 
> > > > > > > > > > not robust.
> > > > > > > > > >
> > > > > > > > > > Meanwhile, having a .retain  directive is a 
> > > > > > > > > > very simple
> > > > > > > > > > solution, and keeps the GCC implementation really concise 
> > > > > > > > > > (patch
> > > > > > > > > > attached). The assembler will know for sure what the 
> > > > > > > > > > section containing
> > > > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > > > directly.
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Please take a look at
> > > > > > > >
> > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > > > >
> > > > > > > > which is built on top of
> > > > > > > >
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > > > >
> > > > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > > > want, you extract my flags2 change and use it for 
> > > > > > > > SHF_GNU_RETAIN.
> > > > > > >
> > > > > > > In your patch you have to make the assumption that data_section, 
> > > > > > > always
> > > > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > > > (which
> > > > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > > > >
> > > > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > > > >
> > > > > > > data_section for c6x corresponds to .fardata, not .data. So the 
> > > > > > > use of
> > > > > > > "used" on a data declaration would place it in a dif

Re: [00/32] C++ 20 Modules

2020-11-04 Thread Nathan Sidwell

On 11/4/20 9:15 AM, Jason Merrill wrote:
On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell > wrote:



We can; apparently the necessary incantation is to

#define INCLUDE_ALGORITHM


thanks, that's fixed the build problem.  And working around the i386 
error I get a working toolchain.  The modules tests all pass except a 
trivial one detecting that va_list looks different.  I must have messed 
up the target check there.


so i686-linux is now known good

nathan

--
Nathan Sidwell


Re: [PATCH v5] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-11-04 Thread Raoni Fassina Firmino via Gcc-patches
On Wed, Nov 04, 2020 at 10:35:03AM +0100, Richard Biener wrote:
> > +/* Expand call EXP to the fegetround builtin (from C99 fenv.h), returning 
> > the
> > +   result and setting it in TARGET.  Otherwise return NULL_RTX on failure. 
> >  */
> > +static rtx
> > +expand_builtin_fegetround (tree exp, rtx target, machine_mode target_mode)
> > +{
> > +  if (!validate_arglist (exp, VOID_TYPE))
> > +return NULL_RTX;
> > +
> > +  insn_code icode = direct_optab_handler (fegetround_optab, SImode);
> > +  if (icode == CODE_FOR_nothing)
> > +return NULL_RTX;
> > +
> > +  if (target == 0
> > +  || GET_MODE (target) != target_mode
> > +  || !(*insn_data[icode].operand[0].predicate) (target, target_mode))
> > +target = gen_reg_rtx (target_mode);
> > +
> > +  rtx pat = GEN_FCN (icode) (target);
> > +  if (!pat)
> > +return NULL_RTX;
> > +  emit_insn (pat);
> 
> I think you need to verify whether the expansion ended up in 'target'
> and otherwise emit a move since usually 'target' is just a hint.

I thought the "if (target == 0 ..." check took care of that.  The
expanders do emit a move, if that helps.

For feclearexcept and feraiseexcept I included tests for various
'target' values, including none, but now I see that I did not do the
same for fegetround.  I can add the same if it is necessary, but the
tests do check that the return value is correct, so I don't know.


> > +@cindex @code{fegetround@var{m}} instruction pattern
> > +@item @samp{fegetround@var{m}}
> > +Store the current machine floating-point rounding mode into operand 0.
> > +Operand 0 has mode @var{m}, which is scalar.  This pattern is used to
> > +implement the @code{fegetround} function from the ISO C99 standard.
> 
> I think this needs to elaborate on the format of the "rounding mode".
> 
> AFAICS you do nothing to marshall with the actually used libc
> implementation which AFAIU can choose arbitrary values for
> the FE_* macros.  I'm not sure we require the compiler to be
> configured for one specific C library and for example require
> matching FE_* macro definitions for all uses of the built
> compiler.
> 
> For the patch at hand you seem to assume the libc "format"
> matches the hardware one (which would of course be reasonable).
> 
> Does that actually hold up when looking at libcs other than 
> glibc supporting powerpc?

I checked some other libc implementations that have POWER support, and
all have the same values as glibc for the four rounding modes and the
five exception flags.  The libc implementations I checked are:

 - musl
 - uclibc & uclibc-ng
 - freebsd

Is there any other I am missing?


> If all of these are non-issues then the middle-end pices look OK.
> If we need any such "translation" layer then I guess we need
> to either have additional operands to the optabs specifying
> all of the FE_* values relevant for the respective call or
> provide a side-channel (target hook) to implement the
> translation on the expansion side.

IMHO, it seems like it is not necessary if there is no libc that has
different values for the FE_* macros.  I didn't check other archs, but
if that is the case for some other arch, I think it could be changed if
and when that arch implements expanders for these builtins.


o/
Raoni


RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 1:36 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> > selects from loads from the same DR.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This change allows one to simplify lane permutes that select from
> > > multiple load leafs that load from the same DR group by promoting the
> > > VEC_PERM node into a load itself and pushing the lane permute into it as a
> > load permute.
> > >
> > > This saves us from having to calculate where to materialize a new load 
> > > node.
> > > If the resulting loads are now unused they are freed and are removed
> > > from the graph.
> > >
> > > This allows us to handle cases where we would have generated:
> > >
> > >   moviv4.4s, 0
> > >   adrpx3, .LC0
> > >   ldr q5, [x3, #:lo12:.LC0]
> > >   mov x3, 0
> > >   .p2align 3,,7
> > > .L2:
> > >   mov v0.16b, v4.16b
> > >   mov v3.16b, v4.16b
> > >   ldr q1, [x1, x3]
> > >   ldr q2, [x0, x3]
> > >   fcmla   v0.4s, v2.4s, v1.4s, #0
> > >   fcmla   v3.4s, v1.4s, v2.4s, #0
> > >   fcmla   v0.4s, v2.4s, v1.4s, #270
> > >   fcmla   v3.4s, v1.4s, v2.4s, #270
> > >   mov v1.16b, v3.16b
> > >   tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > >   str q0, [x2, x3]
> > >   add x3, x3, 16
> > >   cmp x3, 1600
> > >   bne .L2
> > >   ret
> > >
> > > and instead generate
> > >
> > >   mov x3, 0
> > >   .p2align 3,,7
> > > .L27:
> > >   ldr q0, [x2, x3]
> > >   ldr q1, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   fcmla   v0.2d, v1.2d, v2.2d, #0
> > >   fcmla   v0.2d, v1.2d, v2.2d, #270
> > >   str q0, [x2, x3]
> > >   add x3, x3, 16
> > >   cmp x3, 512
> > >   bne .L27
> > >   ret
> > >
> > > This runs as a pre step such that permute simplification can still
> > > inspect whether this permute is needed
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Tests are included as part of the final patch as they need the SLP
> > > pattern matcher to insert permutes in between.
> > >
> > > Ok for master?
> > 
> > So I think this is too specialized for the general issue that we're doing a 
> > bad
> > job in CSEing the load part of different permutes of the same group.  I've
> > played with fixing this half a year ago (again) in multiple general ways but
> > they all caused some regressions.
> > 
> > So you're now adding some heuristics as to when to anticipate "CSE" (or
> > merging with followup permutes).
> > 
> > To quickly recap what I did consider two loads (V2DF) one { a[0], a[1] } and
> > the other { a[1], a[0] }.  They currently are two SLP nodes and one with a
> > load_permutation.
> > My original attempts focused on trying to get rid of load_permutation in
> > favor of lane_permute nodes and thus during SLP discovery I turned the
> > second into { a[0], a[1] } (magically unified with the other load) and a
> > followup lane-permute node.
> > 
> > So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] } which 
> > eventually
> > will (due to patterns) be lane-permuted into { a[0], a[1] }, right?  So
> > generalizing this as a single { a[0], a[1] } plus two lane-permute nodes  { 
> > 0, 0 }
> > and { 1, 1 } early would solve the issue as well?
> 
> Correct, I did wonder why it was generating two different nodes instead of a
> lane permute, but didn't pay much attention as it was just a shortcoming.
> 
> > Now, in general it might be
> > more profitable to generate the { a[0], a[0] } and { a[1], a[1] } via 
> > scalar-load-
> > and-splat rather than vector load and permute so we have to be careful to
> > not over-optimize here or be prepared to do the reverse transform.
> 
> This in principle can be done in optimize_slp then right? Since it would do
> a lot of the same work already and find the materialization points. 
> 
> > 
> > The patch itself is a bit ugly since it modifies the SLP graph when we 
> > already
> > produced the graphds graph so I would do any of this before.  I did consider
> > gathering all loads nodes loading from a group and then trying to apply some
> > heuristic to alter the SLP graph so it can be better optimized.  In fact 
> > when we
> > want to generate the same code as the non-SLP interleaving scheme does
> > we do have to look at those since we have to unify loads there.
> > 
> 
> Yes.. I will concede the patch isn't my finest work.. I also don't like the
> fact that I had to keep leafs intact lest I break things later. But wanted
> feedback :)
> 
> > I'd put this after vect_slp_build_vertices but before the new_graph call -
> > altering 'vertices' / 'leafs' should be more easily possible and the 
> > 'leafs' array
> > contains all loads already (vect_slp_build_vert

RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 2:04 PM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; nd ;
> > gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Wed, 4 Nov 2020, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: rguent...@c653.arch.suse.de  On
> > > > Behalf Of Richard Biener
> > > > Sent: Wednesday, November 4, 2020 12:41 PM
> > > > To: Tamar Christina 
> > > > Cc: Richard Sandiford ; nd ;
> > > > gcc-patches@gcc.gnu.org
> > > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > > matching scaffolding.
> > > >
> > > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > > >
> > > > > Hi Richi,
> > > > >
> > > > > This is a respin which includes the changes you requested.
> > > >
> > > > Comments randomly ordered, I'm pasting in pieces of the patch -
> > > > sending it inline would help to get pieces properly quoted and in-order.
> > > >
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > > >
> > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f
> > > > 9
> > > > aa42ffe0ce0dd
> > > > 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > ...
> > > >
> > > > I miss comments in this file, see tree-vectorizer.h where we try to
> > > > document purpose of classes and fields.
> > > >
> > > > Things that sticks out to me:
> > > >
> > > > +uint8_t m_arity;
> > > > +uint8_t m_num_args;
> > > >
> > > > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > > > num_args should be here ;)
> > >
> > > I think I can remove arity, but num_args is how many operands the
> > > created internal function call should take.  Since we can't vectorize
> > > calls with more than
> > > 4 arguments at the moment it seemed like 255 would be a safe limit :).
> > >
> > > >
> > > > +vec_info *m_vinfo;
> > > > ...
> > > > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > > >
> > > > so this looks like something I freed stmt_vec_info of -
> > > > back-pointers in the "wrong" direction of the logical hierarchy.  I
> > > > suppose it's just to avoid passing down vinfo where we need it?
> > > > Please do that instead - pass down vinfo as everything else does.
> > > >
> > > > The class seems to expose both very high-level (build () it!) and
> > > > very low level details (get_ifn).  The high-level one suggests that
> > > > a pattern _not_ being represented by an ifn is possible but there's
> > > > too much implementation detail already in the vect_pattern class to
> > > > make that impossible.  I guess the IFN details could be pushed down
> > > > to the simple matching class (and that be called vect_ifn_pattern or 
> > > > so).
> > > >
> > > > +static bool
> > > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > > +  bool found_p = false;
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +{
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt
> > > > + match
> > > > --\n");
> > > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > > > +}
> > > >
> > > > we dumped all instances after their analysis.  Maybe just refer to
> > > > the instance with its address (dump_print %p) so lookup in the
> > > > (already large) dump file is easy.
> > > >
> > > > +  hash_set *visited = new hash_set ();  for
> > > > + (unsigned x = 0; x < num__slp_patterns; x++)
> > > > +{
> > > > +  visited->empty ();
> > > > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> > > > slp_patterns[x],
> > > > +   visited);
> > > > +}
> > > > +
> > > > +  delete visited;
> > > >
> > > > no need to new / delete, just do
> > > >
> > > >   has_set visited;
> > > >
> > > > like everyone else.  Btw, do you really want to scan pieces of the
> > > > SLP graph (with instances being graph entries) multiple times?  If
> > > > not then you should move the visited set to the caller instead.
> > > >
> > > > +  /* TODO: Remove in final version, only here for generating debug
> > > > + dot
> > > > graphs
> > > > +  from SLP tree.  */
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +{
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > > > +}
> > > >
> > > > now, if there was some pattern matched it is probably useful to dump
> > > > the graph (entry) again.  But only conditional on that I think.  So
> > > > can you instead make the dump conditional on f

RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Tamar Christina via Gcc-patches



> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 3:12 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: RE: [PATCH v2 10/18]middle-end simplify lane permutes which
> selects from loads from the same DR.
> 
> On Wed, 4 Nov 2020, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -Original Message-
> > > From: rguent...@c653.arch.suse.de  On
> > > Behalf Of Richard Biener
> > > Sent: Wednesday, November 4, 2020 1:36 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > > Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> > > selects from loads from the same DR.
> > >
> > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This change allows one to simplify lane permutes that select from
> > > > multiple load leafs that load from the same DR group by promoting
> > > > the VEC_PERM node into a load itself and pushing the lane permute
> > > > into it as a
> > > load permute.
> > > >
> > > > This saves us from having to calculate where to materialize a new load
> node.
> > > > If the resulting loads are now unused they are freed and are
> > > > removed from the graph.
> > > >
> > > > This allows us to handle cases where we would have generated:
> > > >
> > > > moviv4.4s, 0
> > > > adrpx3, .LC0
> > > > ldr q5, [x3, #:lo12:.LC0]
> > > > mov x3, 0
> > > > .p2align 3,,7
> > > > .L2:
> > > > mov v0.16b, v4.16b
> > > > mov v3.16b, v4.16b
> > > > ldr q1, [x1, x3]
> > > > ldr q2, [x0, x3]
> > > > fcmla   v0.4s, v2.4s, v1.4s, #0
> > > > fcmla   v3.4s, v1.4s, v2.4s, #0
> > > > fcmla   v0.4s, v2.4s, v1.4s, #270
> > > > fcmla   v3.4s, v1.4s, v2.4s, #270
> > > > mov v1.16b, v3.16b
> > > > tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > > > str q0, [x2, x3]
> > > > add x3, x3, 16
> > > > cmp x3, 1600
> > > > bne .L2
> > > > ret
> > > >
> > > > and instead generate
> > > >
> > > > mov x3, 0
> > > > .p2align 3,,7
> > > > .L27:
> > > > ldr q0, [x2, x3]
> > > > ldr q1, [x0, x3]
> > > > ldr q2, [x1, x3]
> > > > fcmla   v0.2d, v1.2d, v2.2d, #0
> > > > fcmla   v0.2d, v1.2d, v2.2d, #270
> > > > str q0, [x2, x3]
> > > > add x3, x3, 16
> > > > cmp x3, 512
> > > > bne .L27
> > > > ret
> > > >
> > > > This runs as a pre step such that permute simplification can still
> > > > inspect whether this permute is needed
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > Tests are included as part of the final patch as they need the SLP
> > > > pattern matcher to insert permutes in between.
> > > >
> > > > Ok for master?
> > >
> > > So I think this is too specialized for the general issue that we're
> > > doing a bad job in CSEing the load part of different permutes of the
> > > same group.  I've played with fixing this half a year ago (again) in
> > > multiple general ways but they all caused some regressions.
> > >
> > > So you're now adding some heuristics as to when to anticipate "CSE"
> > > (or merging with followup permutes).
> > >
> > > To quickly recap what I did consider two loads (V2DF) one { a[0],
> > > a[1] } and the other { a[1], a[0] }.  They currently are two SLP
> > > nodes and one with a load_permutation.
> > > My original attempts focused on trying to get rid of
> > > load_permutation in favor of lane_permute nodes and thus during SLP
> > > discovery I turned the second into { a[0], a[1] } (magically unified
> > > with the other load) and a followup lane-permute node.
> > >
> > > So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] }
> > > which eventually will (due to patterns) be lane-permuted into {
> > > a[0], a[1] }, right?  So generalizing this as a single { a[0], a[1]
> > > } plus two lane-permute nodes  { 0, 0 } and { 1, 1 } early would solve the
> issue as well?
> >
> > Correct, I did wonder why it was generating two different nodes
> > instead of a lane permute, but didn't pay much attention as it was just a
> shortcoming.
> >
> > > Now, in general it might be
> > > more profitable to generate the { a[0], a[0] } and { a[1], a[1] }
> > > via scalar-load- and-splat rather than vector load and permute so we
> > > have to be careful to not over-optimize here or be prepared to do the
> reverse transform.
> >
> > This in principle can be done in optimize_slp then right? Since it
> > would do a lot of the same work already and find the materialization points.
> >
> > >
> > > The patch itself is a bit ugly since it modifies the SLP graph when
> > > we already produced the graphds graph so I would do any of this
> > > before.  I did consider gather

[committed] libstdc++: Fix test failure with --disable-linux-futex

2020-11-04 Thread Jonathan Wakely via Gcc-patches
As noted in PR 96817 this new test fails if the library is built without
futexes. That's expected of course, but we might as well fail more
obviously than a deadlock that eventually times out.

libstdc++-v3/ChangeLog:

* testsuite/18_support/96817.cc: Fail fail if the library is
configured to not use futexes.

Tested powerpc64le-linux. Committed to trunk.

I've just realised the changelog above should say "Fail fast", I'll
fix that in the ChangeLog tomorrow.


commit 9c1125c121423a9948fa39e71ef89ba4059a2fad
Author: Jonathan Wakely 
Date:   Wed Nov 4 15:24:47 2020

libstdc++: Fix test failure with --disable-linux-futex

As noted in PR 96817 this new test fails if the library is built without
futexes. That's expected of course, but we might as well fail more
obviously than a deadlock that eventually times out.

libstdc++-v3/ChangeLog:

* testsuite/18_support/96817.cc: Fail fail if the library is
configured to not use futexes.

diff --git a/libstdc++-v3/testsuite/18_support/96817.cc 
b/libstdc++-v3/testsuite/18_support/96817.cc
index f03329678313..4591a7288a57 100644
--- a/libstdc++-v3/testsuite/18_support/96817.cc
+++ b/libstdc++-v3/testsuite/18_support/96817.cc
@@ -24,6 +24,10 @@
 #include 
 #include 
 
+#ifndef _GLIBCXX_HAVE_LINUX_FUTEX
+# error "This test requries futex support in the library"
+#endif
+
 int init()
 {
 #if __has_include()


Re: [PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-11-04 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> Hi Richard,
>
> I'm sending up the revised patch 5 (introducing stack variable handling)
> without the other changes to other patches.
>
> I figure there's been quite a lot of changes to this patch and I wanted
> to give you time to review them while I worked on finishing the less
> widespread changes in patch 6 and before I ran the more exhaustive (and
> time-consuming) tests in case you didn't like the changes and those
> exhaustive tests would just have to get repeated.

Thanks, the new approach looks good to me.  Most of the comments below
are just minor.

> […]
> @@ -75,6 +89,26 @@ extern hash_set  *asan_used_labels;
>  
>  #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE   "use after scope memory"
>  
> +/* NOTE: The values below and the hooks under targetm.memtag define an ABI 
> and
> +   are hard-coded to these values in libhwasan, hence they can't be changed
> +   independently here.  */
> +/* How many bits are used to store a tag in a pointer.
> +   The default version uses the entire top byte of a pointer (i.e. 8 bits).  
> */
> +#define HWASAN_TAG_SIZE targetm.memtag.tag_size ()
> +/* Tag Granule of HWASAN shadow stack.
> +   This is the size in real memory that each byte in the shadow memory refers
> +   to.  I.e. if a variable is X bytes long in memory then it's tag in shadow

s/it's/its/

> +   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
> +   Most variables will need to be aligned to this amount since two variables
> +   that are neighbours in memory and share a tag granule would need to share

s/neighbours/neighbors/

> +   the same tag (the shared tag granule can only store one tag).  */
> +#define HWASAN_TAG_GRANULE_SIZE targetm.memtag.granule_size ()
> +/* Define the tag for the stack background.
> +   This defines what tag the stack pointer will be and hence what tag all
> +   variables that are not given special tags are (e.g. spilled registers,
> +   and parameters passed on the stack).  */
> +#define HWASAN_STACK_BACKGROUND gen_int_mode (0, QImode)
> +
>  /* Various flags for Asan builtins.  */
>  enum asan_check_flags
>  {
> […]
> @@ -1352,6 +1393,28 @@ asan_redzone_buffer::flush_if_full (void)
>  flush_redzone_payload ();
>  }
>  
> +/* Returns whether we are tagging pointers and checking those tags on memory
> +   access.  */
> +bool
> +hwasan_sanitize_p ()
> +{
> +  return sanitize_flags_p (SANITIZE_HWADDRESS);
> +}
> +
> +/* Are we tagging the stack?  */
> +bool
> +hwasan_sanitize_stack_p ()
> +{
> +  return (hwasan_sanitize_p () && param_hwasan_instrument_stack);
> +}
> +
> +/* Are we protecting alloca objects?  */

Same comment as before about avoiding the word “protect”, both in the
comment and the option name.  Maybe s/protect/sanitize/ or s/protect/tag/.

> +bool
> +hwasan_sanitize_allocas_p (void)
> +{
> +  return (hwasan_sanitize_stack_p () && param_hwasan_protect_allocas);
> +}
> +
>  /* Insert code to protect stack vars.  The prologue sequence should be 
> emitted
> directly, epilogue sequence returned.  BASE is the register holding the
> stack base, against which OFFSETS array offsets are relative to, OFFSETS
> […]
> @@ -3702,4 +3772,330 @@ make_pass_asan_O0 (gcc::context *ctxt)
>return new pass_asan_O0 (ctxt);
>  }
>  
> +/* For stack tagging:
> +
> +   Return the offset from the frame base tag that the "next" expanded object
> +   should have.  */
> +uint8_t
> +hwasan_current_frame_tag ()
> +{
> +  return hwasan_frame_tag_offset;
> +}
> +
> +/* For stack tagging:
> +
> +   Return the 'base pointer' for this function.  If that base pointer has not
> +   yet been created then we create a register to hold it and initialise that
> +   value with a possibly random tag and the value of the
> +   virtual_stack_vars_rtx.  */

As discussed offline, I think the old approach of generating the
initialisation in hwasan_emit_prologue was safer, although I agree
there doesn't seem to be a specific problem with doing things this way.

> +rtx
> +hwasan_frame_base ()
> +{
> +  if (! hwasan_frame_base_ptr)
> +{
> +  hwasan_frame_base_ptr
> + = targetm.memtag.insert_random_tag (virtual_stack_vars_rtx);
> +}

Nit: should be no braces around single statements, even if they span
multiple lines.

> +
> +  return hwasan_frame_base_ptr;
> +}
> +
> +/* Record a compile-time constant size stack variable that HWASAN will need 
> to
> +   tag.  This record of the range of a stack variable will be used by
> +   `hwasan_emit_prologue` to emit the RTL at the start of each frame which 
> will
> +   set tags in the shadow memory according to the assigned tag for each 
> object.
> +
> +   The range that the object spans in stack space should be described by the
> +   bounds `untagged_base + nearest` and `untagged_base + farthest`.
> +   `tagged_base` is the base address which contains the "base frame tag" for
> +   this frame, and from which the value to address this object with will be
> +   calculated.
> +
> +   We record the `untagged_

RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-04 Thread Carl Love via Gcc-patches
David:

I have reworked the patch moving the new vector instruction patterns to
vsx.md.  Also, cleaned up the vector division instructions.  The
div3 pattern definitions are the only ones that should be
defined.  

I have retested the patch on:

   powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

--

2020-11-02  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI,
VMULHS_V2DI, VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI):
Add builtin define.
(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_VDIVEU_V4SI,
P10V_BUILTIN_VDIVEU_V2DI, P10V_BUILTIN_VDIVU_V4SI,
P10V_BUILTIN_VDIVU_V2DI, P10V_BUILTIN_VMODU_V2DI,
P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case
statement for builtins.
* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
(UNSPEC_VDIVES, UNSPEC_VDIVEU,
UNSPEC_VMULHS, UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(vdives_, vdiveu_, vdiv3, uuvdiv3,
vmods_, vmodu_, vmulhs_, vmulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  23 ++
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/vsx.md  | 205 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 378 ++
 7 files changed, 730 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..d8f1d2cfc55 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index a58102c3785..7663465b755 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, 
vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
+BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
+BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
+BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
+BU_P10V_AV_

Re: [PATCH 1/4] IBM Z: Remove unused RRe and RXe mode_attrs

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:36, Ilya Leoshkevich wrote:
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390.md (RRe): Remove.
>   (RXe): Remove.

Ok. Thanks!

Andreas


Re: [PATCH 2/4] IBM Z: Unhardcode NR_C_MODES

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:45, Ilya Leoshkevich wrote:
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390.c (NR_C_MODES): Unhardcode.
>   (s390_alloc_pool): Use size_t for iterating from 0 to
>   NR_C_MODES.
>   (s390_add_constant): Likewise.
>   (s390_find_constant): Likewise.
>   (s390_dump_pool): Likewise.
>   (s390_free_pool): Likewise.

Ok. Thanks!

Andreas



Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:45, Ilya Leoshkevich wrote:
> On z14+, there are instructions for working with 128-bit floats (long
> doubles) in vector registers.  It's beneficial to use them instead of
> instructions that operate on floating point register pairs, because it
> allows storing 4 times more data in registers at a time, relieving
> register pressure.  The performance of the new instructions is almost the
> same.
> 
> Implement by storing TFmode values in vector registers on z14+.  Since
> not all operations are available with the new instructions, keep the old
> ones using the new FPRX2 mode, and convert between it and TFmode when
> necessary (this is called "forwarder" expanders below).  Change the
> existing TFmode expanders to call either new- or old-style ones
> depending on whether we are on z14+ or older machines ("dispatcher"
> expanders).
> 
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390-modes.def (FPRX2): New mode.
>   * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
>   * config/s390/s390.c (s390_fma_allowed_p): Likewise.
>   (s390_build_signbit_mask): Support 128-bit masks.
>   (print_operand): Support printing the second word of a TFmode
>   operand as vector register.
>   (constant_modes): Add FPRX2mode.
>   (s390_class_max_nregs): Return 1 for TFmode on z14+.
>   (s390_is_fpr128): New function.
>   (s390_is_vr128): Likewise.
>   (s390_can_change_mode_class): Use s390_is_fpr128 and
>   s390_is_vr128 in order to determine whether mode refers to a FPR
>   pair or to a VR.
>   * config/s390/s390.h (EXPAND_MOVTF): New macro.
>   (EXPAND_TF): Likewise.
>   * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
>   alias.
>   (ALL): Add FPRX2.
>   (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (FP): Likewise.
>   (FP_ANYTF): New mode iterator.
>   (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (TD_TF): Likewise.
>   (xde): Add FPRX2.
>   (nBFP): Likewise.
>   (nDFP): Likewise.
>   (DSF): Likewise.
>   (DFDI): Likewise.
>   (SFSI): Likewise.
>   (DF): Likewise.
>   (SF): Likewise.
>   (fT0): Likewise.
>   (bt): Likewise.
>   (_d): Likewise.
>   (HALF_TMODE): Likewise.
>   (tf_fpr): New mode_attr.
>   (type): New mode_attr.
>   (*cmp_ccz_0): Use type instead of mode with fsimp.
>   (*cmp_ccs_0_fastmath): Likewise.
>   (*cmptf_ccs): New pattern for wfcxb.
>   (*cmptf_ccsfps): New pattern for wfkxb.
>   (mov): Rename to mov.
>   (signbit2): Rename to signbit2.
>   (isinf2): Renamed to isinf2.
>   (*TDC_insn_): Use type instead of mode with fsimp.
>   (fixuns_trunc2): Rename to
>   fixuns_trunc2.
>   (fix_trunctf2): Rename to fix_trunctf2_fpr.
>   (floatdi2): Rename to floatdi2, use type
>   instead of mode with itof.
>   (floatsi2): Rename to floatsi2, use type
>   instead of mode with itof.
>   (*floatuns2): Use type instead of mode for
>   itof.
>   (floatuns2): Rename to
>   floatuns2.
>   (trunctf2): Rename to trunctf2_fpr, use type instead
>   of mode with fsimp.
>   (extend2): Rename to
>   extend2.
>   (2): Rename to
>   2, use type instead of
>   mode with fsimp.
>   (rint2): Rename to rint2, use
>   type instead of mode with fsimp.
>   (2): Use type instead of mode for
>   fsimp.
>   (rint2): Likewise.
>   (trunc2): Rename to
>   trunc2.
>   (trunc2): Rename to
>   trunc2.
>   (extend2): Rename to
>   extend2.
>   (extend2): Rename to
>   extend2.
>   (add3): Rename to add3, use type instead of
>   mode with fsimp.
>   (*add3_cc): Use type instead of mode with fsimp.
>   (*add3_cconly): Likewise.
>   (sub3): Rename to sub3, use type instead of
>   mode with fsimp.
>   (*sub3_cc): Use type instead of mode with fsimp.
>   (*sub3_cconly): Likewise.
>   (mul3): Rename to mul3, use type instead of
>   mode with fsimp.
>   (fma4): Restrict using s390_fma_allowed_p.
>   (fms4): Restrict using s390_fma_allowed_p.
>   (div3): Rename to div3, use type instead of
>   mode with fdiv.
>   (neg2): Rename to neg2.
>   (*neg2_cc): Use type instead of mode with fsimp.
>   (*neg2_cconly): Likewise.
>   (*neg2_nocc): Likewise.
>   (*neg2): Likeiwse.
>   (abs2): Rename to abs2, use type instead of
>   mode with fdiv.
>   (*abs2_cc): Use type instead of mode with fsimp.
>   (*abs2_cconly): Likewise.
>   (*abs2_nocc): Likewise.
>   (*abs2): Likewise.
>   (*negabs2_cc): Likewise.
>   (*negabs2_cconly): Likewise.
>   (*negabs2_nocc): Likewise.
>   (*negabs2): Likewise.
>   (sqrt2): Rename to sqrt2, use type instead
>   of mode with fsqrt.
>   (cbranch4): Use FP_ANYTF instead of FP.
>   (copysign3): Rename to copysign3, use 

Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers

2020-11-04 Thread Andreas Krebbel via Gcc-patches
These tests all use the -mzvector option but do not appear to make use of the
z vector language extensions. I think that option could be removed. Then these
tests should be moved to the vector subdir.

You could also do the asm scanning in dg-do run tests.

Andreas


On 03.11.20 22:46, Ilya Leoshkevich wrote:
> gcc/testsuite/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-caller-abi-run.c: New test.
>   * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-copysign-run.c: New test.
>   * gcc.target/s390/zvector/long-double-copysign-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-fprx2-constant.c: New test.
>   * gcc.target/s390/zvector/long-double-from-double-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-double-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-float-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-float-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-double-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-double-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-float-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-float-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-vec-duplicate.c: New test.
>   * gcc.target/s390/zvector/long-double-wf.h: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfixb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-0111.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-1011.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-1101.c: New test.
>   * gcc.target/s390/zvector/long-double-wflcxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wflpxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfmaxb-2.c: New test.
> 

[PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers
From: Thomas Rodgers 

Adds 

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_thread/barrier/1.cc: New test.
* testsuite/30_thread/barrier/2.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_thread/barrier/arrive.cc: Likewise.
* testsuite/30_thread/barrier/completion.cc: Likewise.
* testsuite/30_thread/barrier/max.cc: Likewise.
---
 libstdc++-v3/include/std/barrier  | 248 ++
 .../testsuite/30_threads/barrier/1.cc |  27 ++
 .../testsuite/30_threads/barrier/2.cc |  27 ++
 .../testsuite/30_threads/barrier/arrive.cc|  51 
 .../30_threads/barrier/arrive_and_drop.cc |  49 
 .../30_threads/barrier/arrive_and_wait.cc |  51 
 .../30_threads/barrier/completion.cc  |  54 
 .../testsuite/30_threads/barrier/max.cc   |  44 
 8 files changed, 551 insertions(+)
 create mode 100644 libstdc++-v3/include/std/barrier
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..80e6d668cf5
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L
+
+#include 
+
+#if defined(_GLIBCXX_HAS_GTHREADS)
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __empty_completion
+  {
+_GLIBCXX_ALWAYS_INLINE void
+operator()() noexcept
+{ }
+  };
+
+/*
+
+The default implementation of __barrier_base is a classic tree barrier.
+
+It looks different from literature pseudocode for two main reasons:
+ 1. Threads that call into std::barrier functions do not provide indices,
+so a numbering step is added before the actual barrier algorithm,
+appearing as an N+1 round to the N rounds of the tree barrier.
+ 2. A great deal of attention has been paid to avoid cache line thrashing
+by flattening the tree structure into cache-line sized arrays, that
+are indexed in an efficient way.
+
+*/
+
+  using __barrier_phase_t = uint8_t;
+
+  template
+class __barrier_base
+{
+  struct alignas(64) /* naturally-align the heap state */ __state_t
+  {
+   struct
+   {
+ __atomic_base<__barrier_phase_t> __phase = ATOMIC_VAR_INIT(0);
+   } __tickets[64];
+  };
+
+  ptrdiff_t _M_expected;
+  unique_ptr _M_state_allocation;
+  __state_t*   _M_state;
+  __atomic_base _M_expected_adjustment;
+  _CompletionF _M_completion;
+  __atomic_base<__barrier_phase_t> _M_phase;
+
+  static __gthread_t
+  _S_get_tid() noexcept
+  {
+#ifdef __GLIBC__
+   // For the GNU C library pthread_self() is usable without linking to
+   // libpthread.so but returns 0, so we cannot use it in single-threaded
+   // programs, because this_thread::get_id() != thread::id{} must be true.
+  

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Hans-Peter Nilsson
On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> I personally do not see the problem with the .retain attribute, however
> if it is going to be a barrier to getting the functionality committed, I
> am happy to change it, since I really just want the functionality in
> upstream sources.
>
> If a global maintainer would comment on whether any of the proposed
> approaches are acceptable, then I will try to block out time from other
> deadlines so I can work on the fixups and submit a patch in time for the
> GCC 11 freeze.
>
> Thanks,
> Jozef

I'm not much more than a random voice, but an assembly directive
that specifies the symbol (IIUC your .retain directive) to
adjust a symbol attribute sounds cleaner to me, than requiring
gcc to know that this requires it to adjust what it knows about
section flags (again, IIUC).

brgds, H-P


Re: [PATCH, rs6000] Optimize pcrel access of globals (updated, ping)

2020-11-04 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

Ping, as it has been a while.
This also includes a slight fix to make sure that all references can get
optimized.

This patch implements a RTL pass that looks for pc-relative loads of the
address of an external variable using the PCREL_GOT relocation and a
single load or store that uses that external address.
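
An illustrative sketch of the transformation (pseudo-assembly; mnemonics and register numbers are for exposition only, not taken from the patch):

```
# before the optimization: two instructions and a GOT entry
        pld   r9, ext_var@got@pcrel   # load ext_var's address from the GOT
        lwz   r3, 0(r9)               # load the value through that address

# the pass marks such a pair with a relocation so that, when ext_var
# turns out to be locally resolvable, the linker can replace the GOT
# load with a direct pc-relative load of the value and nop the second
# instruction:
        plwz  r3, ext_var@pcrel
        nop
```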

Produced by a cast of thousands:
 * Michael Meissner
 * Peter Bergner
 * Bill Schmidt
 * Alan Modra
 * Segher Boessenkool
 * Aaron Sawdey

Passes bootstrap/regtest on ppc64le power10. OK for trunk?

gcc/ChangeLog:

* config.gcc: Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md: Add d_form_memory predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
and make_pass_pcrel_opt().
* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
(rs6000_option_override_internal): Add pcrel-opt.
(rs6000_delegitimize_address): Support pcrel-opt.
(rs6000_opt_masks): Add pcrel-opt.
(offsettable_non_prefixed_memory): New function.
(reg_to_non_prefixed): Make global.
(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
(output_pcrel_opt_reloc): New function.
* config/rs6000/rs6000.md (loads_extern_addr): New attr.
(pcrel_extern_addr): Set loads_extern_addr.
Add include for pcrel-opt.md.
* config/rs6000/rs6000.opt: Add -mpcrel-opt.
* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
pcrel-opt.md.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
---
 gcc/config.gcc|   6 +-
 gcc/config/rs6000/pcrel-opt.c | 888 ++
 gcc/config/rs6000/pcrel-opt.md| 386 
 gcc/config/rs6000/predicates.md   |  23 +
 gcc/config/rs6000/rs6000-cpus.def |   2 +
 gcc/config/rs6000/rs6000-passes.def   |   8 +
 gcc/config/rs6000/rs6000-protos.h |   4 +
 gcc/config/rs6000/rs6000.c| 116 ++-
 gcc/config/rs6000/rs6000.md   |   8 +-
 gcc/config/rs6000/rs6000.opt  |   4 +
 gcc/config/rs6000/t-rs6000|   7 +-
 .../gcc.target/powerpc/pcrel-opt-inc-di.c |  18 +
 .../gcc.target/powerpc/pcrel-opt-ld-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-ld-di.c  |  43 +
 .../gcc.target/powerpc/pcrel-opt-ld-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-sf.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-di.c  |  37 +
 .../gcc.target/powerpc/pcrel-opt-st-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-sf.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c  |  36 +
 26 files changed, 2013 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerp

Re: [PATCH,rs6000] Add patterns for combine to support p10 fusion

2020-11-04 Thread Aaron Sawdey via Gcc-patches
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> This patch adds the first couple patterns to support p10 fusion. These
> will allow combine to create a single insn for a pair of instructions
> that power10 can fuse and execute. These particular ones have the
> requirement that only cr0 can be used when fusing a load with a compare
> immediate of -1/0/1, so we want combine to put that requirement in, and
> if it doesn't work out later the splitter can get used.
> 
> This also adds option -mpower10-fusion which defaults on for power10 and
> will gate all these fusion patterns. In addition I have added an
> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
> that just controls the load+compare-immediate patterns. I have make
> these default on for power10 but they are not disallowed for earlier
> processors because it is still valid code. This allows us to test the
> correctness of fusion code generation by turning it on explicitly.
> 
> The intention is to work through more patterns of this style to support
> the rest of the power10 fusion pairs.
> 
> Bootstrap and regtest looks good on ppc64le power9 with these patterns
> enabled in stage2/stage3 and for regtest. Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/predicates.md: Add const_m1_to_1_operand.
>   * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
>   OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER.
>   * config/rs6000/rs6000-protos.h (address_ok_for_form): Add
>   prototype.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
>   if target is power10.  (rs6000_opt_masks): Allow -mpower10-fusion
>   in function attributes.  (address_ok_for_form): New function.
>   * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
>   * config/rs6000/rs6000.md (*ld_cmpi_cr0): New
>   define_insn_and_split.
>   (*lwa_cmpdi_cr0): New define_insn_and_split.
>   (*lwa_cmpwi_cr0): New define_insn_and_split.
>   * config/rs6000/rs6000.opt: Add -mpower10-fusion
>   and -mpower10-fusion-ld-cmpi.
> ---
> gcc/config/rs6000/predicates.md   |  5 +++
> gcc/config/rs6000/rs6000-cpus.def |  6 ++-
> gcc/config/rs6000/rs6000-protos.h |  2 +
> gcc/config/rs6000/rs6000.c| 34 
> gcc/config/rs6000/rs6000.h|  1 +
> gcc/config/rs6000/rs6000.md   | 68 +++
> gcc/config/rs6000/rs6000.opt  |  8 
> 7 files changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 4c2fe7fa312..b75c1ddfb69 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand"
>   (and (match_code "const_int")
>(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
> 
> +;; Match op = -1, op = 0, or op = 1.
> +(define_predicate "const_m1_to_1_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
> +
> ;; Match op = 0..3.
> (define_predicate "const_0_to_3_operand"
>   (and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index 8d2c1ffd6cf..3e65289d8df 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -82,7 +82,9 @@
> 
> #define ISA_3_1_MASKS_SERVER  (ISA_3_0_MASKS_SERVER   \
>| OPTION_MASK_POWER10  \
> -  | OTHER_POWER10_MASKS)
> +  | OTHER_POWER10_MASKS  \
> +  | OPTION_MASK_P10_FUSION   \
> +  | OPTION_MASK_P10_FUSION_LD_CMPI)
> 
> /* Flags that need to be turned off if -mno-power9-vector.  */
> #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\
> @@ -129,6 +131,8 @@
>| OPTION_MASK_FLOAT128_KEYWORD \
>| OPTION_MASK_FPRND\
>| OPTION_MASK_POWER10  \
> +  | OPTION_MASK_P10_FUSION   \
> +  | OPTION_MASK_P10_FUSION_LD_CMPI   \
>| OPTION_MASK_HTM  \
>| OPTION_MASK_ISEL \
>| OPTION_MASK_MFCRF\
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 25fa5dd57cd..d8a344245e6 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -190,6 +190,8 @@ enum non_prefixed_form

[PATCH] Add Ranger temporal cache

2020-11-04 Thread Andrew MacLeod via Gcc-patches
PR 97515 highlighted a bit of silliness that results when we calculate a 
bunch of ranges by traversing a back edge, and set some values.  Then we 
eventually visit that block during the DOM walk, and discover the value 
can be improved, sometimes dramatically.  It is already cached, so 
unfortunately we don't visit it again...


The situation is described in comment 4 : 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97515#c4


I have created a temporal cache for the ranger that basically adds a 
timestamp to the global cache.


The timestamp records the time a global range was calculated 
(monotonically increasing based on "set"), along with a list of up to 2 
directly dependent ssa-names. Whenever we access the global value, 
the timestamps of those dependents are checked to see if they are newer. 
Any time the global value of a dependent is newer, the current global 
value is considered stale and the ranger recalculates it using the newer 
values of the dependents.


What does that mean?

Using the PR testcase,  a back edge calculation for a PHI requires a 
range for ui_8:

  ui_8 = ~xe_3;
and at this time, we only know that xe_3 is <=0 based on the branch 
feeding the statement.  Thus ui_8 is calculated as [-1, +INF] globally 
and stored.

As the calculation continues, we actually discover that xe_3 has to be -1.

When the EVRP dom walk eventually gets to this statement, we know xe_3 
evaluates to [-1, -1] and we fold this statement to

  ui_8 = -1
Unfortunately, the global cache still has [-1, +INF] for ui_8 and has 
no way to know to reevaluate.


With the temporal cache operating, when we figure out that xe_3 
evaluates to [-1,-1], xe_3 gets a timestamp that is newer than that of ui_8.
When range_of_stmt is now called on ui_8, it fails the "current" check, 
and the ranger proceeds to recalculate ui_8 using the new value of xe_3, 
and we get the proper result of [-1, -1] stored as the global value.
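
The mechanism can be sketched as a tiny model (illustrative names and structure only, not the actual ranger_cache/temporal_cache classes from the patch, and simplified to one dependent per value): each stored value carries a timestamp, and a value is stale when its dependent has been recomputed more recently.

```cpp
#include <cstdint>
#include <map>
#include <string>

// One cached "global range" entry: a stand-in range value, the time it
// was computed, and the name of the value it was computed from.
struct cached_value
{
  int range;            // stand-in for the cached range
  std::uint64_t stamp;  // time the value was computed
  std::string dep;      // one dependent ssa-name ("" if none)
};

struct temporal_cache
{
  std::uint64_t clock = 0;
  std::map<std::string, cached_value> vals;

  // Record a value: it gets a timestamp newer than everything before it.
  void set (const std::string &name, int range, const std::string &dep = "")
  { vals[name] = cached_value{ range, ++clock, dep }; }

  // A cached value is current only if its dependent has not been
  // recomputed since the value was stored.
  bool current_p (const std::string &name) const
  {
    const cached_value &v = vals.at (name);
    if (v.dep.empty () || !vals.count (v.dep))
      return true;
    return vals.at (v.dep).stamp <= v.stamp;
  }
};
```

In the PR 97515 terms: ui_8 is stored with a dependency on xe_3; when xe_3 is later improved, ui_8's cached value fails the current_p check and gets recalculated.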

With this patch, this testcase now comes out of EVRP  looking like:

e7 (int gg)
{
  int ui;
  int xe;
  _Bool _1;
  int _2;

   :

   :
  _1 = 0;
  _2 = (int) _1;
  goto ; [INV]

}

Time-wise it's pretty good.  It basically consumes the time I saved with 
the previous cache tweaks, and the overall time now is almost exactly 
the same as it was before.  So we're still pretty zippy.


This integrates with the previous cache changes so that when the 
global value for ui_8 is updated, any changes are automatically 
propagated into the global cache as well.


I have also updated the testcase to ensure that it now produces the 
above code with a single goto.



This bootstraps on x86_64-pc-linux-gnu, no regressions, and pushed.

Andrew

PS.  Before the next stage 1, I intend to use the preexisting dependency 
chains in the GORI engine instead of this one-or-two-name timestamp entry.
Currently the drawback is that only direct dependents are checked, so 
intervening calculations will not trigger a recalculation.  If we use 
the GORI dependency chains, then everything in the dependency chain will 
be recognized as stale, and we'll catch even more cases.  Combined with 
improvements planned for how dependency-chain ranges are calculated by 
GORI, we could get even more interesting results.
commit e86fd6a17cdb26710d1f13c9a47a3878c76028f9
Author: Andrew MacLeod 
Date:   Wed Nov 4 12:59:15 2020 -0500

Add Ranger temporal cache

Add a timestamp to supplement the global range cache to detect when a value
may become stale.

gcc/
PR tree-optimization/97515
* gimple-range-cache.h (class ranger_cache): New prototypes plus
temporal cache pointer.
* gimple-range-cache.cc (struct range_timestamp): New.
(class temporal_cache): New.
(temporal_cache::temporal_cache): New.
(temporal_cache::~temporal_cache): New.
(temporal_cache::get_timestamp): New.
(temporal_cache::set_dependency): New.
(temporal_cache::temporal_value): New.
(temporal_cache::current_p): New.
(temporal_cache::set_timestamp): New.
(temporal_cache::set_always_current): New.
(ranger_cache::ranger_cache): Allocate the temporal cache.
(ranger_cache::~ranger_cache): Free temporal cache.
(ranger_cache::get_non_stale_global_range): New.
(ranger_cache::set_global_range): Add a timestamp.
(ranger_cache::register_dependency): New.  Add timestamp dependency.
* gimple-range.cc (gimple_ranger::range_of_range_op): Add operand
dependencies.
(gimple_ranger::range_of_phi): Ditto.
(gimple_ranger::range_of_stmt): Check if global range is stale, and
recalculate if so.
gcc/testsuite/
* gcc.dg/pr97515.c: Check listing for folding of entire function.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index cca9025abba..b01563c83f9 100644
--- a/gcc/gimple-range-c

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  wrote:
>
> On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > I personally do not see the problem with the .retain attribute, however
> > if it is going to be a barrier to getting the functionality committed, I
> > am happy to change it, since I really just want the functionality in
> > upstream sources.
> >
> > If a global maintainer would comment on whether any of the proposed
> > approaches are acceptable, then I will try to block out time from other
> > deadlines so I can work on the fixups and submit a patch in time for the
> > GCC 11 freeze.
> >
> > Thanks,
> > Jozef
>
> I'm not much more than a random voice, but an assembly directive
> that specifies the symbol (IIUC your .retain directive) to

But the .retain directive DOES NOT adjust a symbol attribute.  Instead, it
sets the SHF_GNU_RETAIN bit on the section which contains the symbol
definition.  The same section can have many unrelated symbols.

> adjust a symbol attribute sounds cleaner to me, than requiring
> gcc to know that this requires it to adjust what it knows about
> section flags (again, IIUC).
>
> brgds, H-P



-- 
H.J.
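
A minimal sketch of the semantics under discussion (names are illustrative): GCC's existing "used" attribute forces the compiler to emit a definition even with no references; whether the *linker* then keeps it across --gc-sections is exactly what SHF_GNU_RETAIN and the proposed .retain directive are about.

```cpp
// Never referenced, but the "used" attribute makes the compiler emit it
// anyway.  To observe the linker side, build with -ffunction-sections
// -Wl,--gc-sections and inspect the output with nm.
__attribute__((used))
static int retained_helper ()
{ return 42; }

// An ordinary, referenced function for comparison.
int referenced_entry ()
{ return 7; }
```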


[PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers
From: Thomas Rodgers 

IGNORE the previous version of this patch please.

Adds 

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_thread/barrier/1.cc: New test.
* testsuite/30_thread/barrier/2.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_thread/barrier/arrive.cc: Likewise.
* testsuite/30_thread/barrier/completion.cc: Likewise.
* testsuite/30_thread/barrier/max.cc: Likewise.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/atomic_base.h   |  11 +-
 libstdc++-v3/include/std/barrier  | 248 ++
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/30_threads/barrier/1.cc |  27 ++
 .../testsuite/30_threads/barrier/2.cc |  27 ++
 .../testsuite/30_threads/barrier/arrive.cc|  51 
 .../30_threads/barrier/arrive_and_drop.cc |  49 
 .../30_threads/barrier/arrive_and_wait.cc |  51 
 .../30_threads/barrier/completion.cc  |  54 
 .../testsuite/30_threads/barrier/max.cc   |  44 
 12 files changed, 562 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/include/std/barrier
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 382e94322c1..9e497835ee0 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -30,6 +30,7 @@ std_headers = \
${std_srcdir}/any \
${std_srcdir}/array \
${std_srcdir}/atomic \
+   ${std_srcdir}/barrier \
${std_srcdir}/bit \
${std_srcdir}/bitset \
${std_srcdir}/charconv \
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index dd4db926592..1ad34719d3e 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 #if __cplusplus > 201703L
+  template
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, const _Func& __fn) const noexcept
+   { std::__atomic_wait(&_M_i, __old, __fn); }
+
   _GLIBCXX_ALWAYS_INLINE void
   wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
   {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old,
+   [__m, this, __old]
+   { return this->load(__m) != __old; });
   }
 
   // TODO add const volatile overload
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..50654b00a0c
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L
+
+#include 
+
+#if defined(_GLIBCXX_HAS_GT

[ping*n] aarch64: move and adjust PROBE_STACK_*_REG

2020-11-04 Thread Olivier Hainque
Hello,

Another ping for this as a new end of stage1 approaches,
please ?

While this may ring the bell of a more involved issue
with ABIs and the use of R18, this particular change doesn't
have that kind of implication.

Thanks a lot in advance!

With Kind Regards,

Olivier

> On 26 Oct 2020, at 12:08, Olivier Hainque  wrote:
> 
>> On 15 Oct 2020, at 08:38, Olivier Hainque  wrote:
>> 
>>> On 24 Sep 2020, at 11:46, Olivier Hainque  wrote:
>>> 
>>> Re-proposing this patch after re-testing with a recent
>>> mainline on on aarch64-linux (bootstrap and regression test
>>> with --enable-languages=all), and more than a year of in-house
>>> use in production for a few aarch64 ports on a gcc-9 base.
>>> 
>>> The change moves the definitions of PROBE_STACK_FIRST_REG
>>> and PROBE_STACK_SECOND_REG to a more appropriate place for such
>>> items (here, in aarch64.md as suggested by Richard), and adjusts
>>> their value from r9/r10 to r10/r11 to free r9 for a possibly
>>> more general purpose (e.g. as a static chain at least on targets
>>> which have a private use of r18, such as Windows or Vxworks).
>>> 
>>> OK to commit?
>>> 
>>> Thanks in advance,
>>> 
>>> With Kind Regards,
>>> 
>>> Olivier
>>> 
>>> 2020-11-07  Olivier Hainque  
>>> 
>>> * config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
>>> and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
>>> Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
>>> aarch64.c.
>>> * config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
>>> (PROBE_STACK_SECOND_REG): Remove.
>>> (aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
>>> suffix update for PROBE_STACK register numbers.
>> 
>> 
> 



Re: [PATCH][AArch64] Use intrinsics for upper saturating shift right

2020-11-04 Thread Richard Sandiford via Gcc-patches
Thanks for the patch, looks good.

David Candler  writes:
> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
> index 4f33dd936c7..f93f4e29c89 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -254,6 +254,10 @@ aarch64_types_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define TYPES_GETREG (aarch64_types_binop_imm_qualifiers)
>  #define TYPES_SHIFTIMM (aarch64_types_binop_imm_qualifiers)
>  static enum aarch64_type_qualifiers
> +aarch64_types_ternop_s_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate};
> +#define TYPES_SHIFT2IMM (aarch64_types_ternop_s_imm_qualifiers)
> +static enum aarch64_type_qualifiers
>  aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_none, qualifier_immediate };
>  #define TYPES_SHIFTIMM_USS (aarch64_types_shift_to_unsigned_qualifiers)
> @@ -265,14 +269,16 @@ static enum aarch64_type_qualifiers
>  aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
>  #define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers)
> +#define TYPES_USHIFT2IMM (aarch64_types_ternopu_imm_qualifiers)
> +static enum aarch64_type_qualifiers
> +aarch64_types_shift2_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_unsigned, qualifier_none, qualifier_immediate };
> +#define TYPES_SHIFT2IMM_UUSS (aarch64_types_shift2_to_unsigned_qualifiers)
>  
>  static enum aarch64_type_qualifiers
>  aarch64_types_ternop_s_imm_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_none, qualifier_none, qualifier_poly, qualifier_immediate};
>  #define TYPES_SETREGP (aarch64_types_ternop_s_imm_p_qualifiers)
> -static enum aarch64_type_qualifiers
> -aarch64_types_ternop_s_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> -  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate};
>  #define TYPES_SETREG (aarch64_types_ternop_s_imm_qualifiers)
>  #define TYPES_SHIFTINSERT (aarch64_types_ternop_s_imm_qualifiers)
>  #define TYPES_SHIFTACC (aarch64_types_ternop_s_imm_qualifiers)

Very minor, but I think it would be better to keep
aarch64_types_ternop_s_imm_qualifiers where it is and define
TYPES_SHIFT2IMM here rather than above.  For better or worse,
the current style seems to be to keep the defines next to the
associated arrays, rather than group them based on the TYPES_* name.

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index d1b21102b2f..0b82b9c072b 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -285,6 +285,13 @@
>BUILTIN_VSQN_HSDI (USHIFTIMM, uqshrn_n, 0, ALL)
>BUILTIN_VSQN_HSDI (SHIFTIMM, sqrshrn_n, 0, ALL)
>BUILTIN_VSQN_HSDI (USHIFTIMM, uqrshrn_n, 0, ALL)
> +  /* Implemented by aarch64_qshrn2_n.  */
> +  BUILTIN_VQN (SHIFT2IMM_UUSS, sqshrun2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM_UUSS, sqrshrun2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM, sqshrn2_n, 0, ALL)
> +  BUILTIN_VQN (USHIFT2IMM, uqshrn2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM, sqrshrn2_n, 0, ALL)
> +  BUILTIN_VQN (USHIFT2IMM, uqrshrn2_n, 0, ALL)

Using ALL is a holdover from the time (until a few weeks ago) when we
didn't record function attributes.  New intrinsics should therefore
have something more specific than ALL.

We discussed offline whether the Q flag side effect of the intrinsics
should be observable or not, and the conclusion was that it shouldn't.
I think we can therefore treat these functions as pure functions,
meaning that they should have flags NONE rather than ALL.

For that reason, I think we should also remove the Set_Neon_Cumulative_Sat
and CHECK_CUMULATIVE_SAT parts of the test (sorry).

Other than that, the patch looks good to go.

Thanks,
Richard


Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 09:29 -0800, Thomas Rodgers wrote:

From: Thomas Rodgers 

Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_thread/barrier/1.cc: New test.
* testsuite/30_thread/barrier/2.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_thread/barrier/arrive.cc: Likewise.
* testsuite/30_thread/barrier/completion.cc: Likewise.
* testsuite/30_thread/barrier/max.cc: Likewise.
---
libstdc++-v3/include/std/barrier  | 248 ++
.../testsuite/30_threads/barrier/1.cc |  27 ++
.../testsuite/30_threads/barrier/2.cc |  27 ++
.../testsuite/30_threads/barrier/arrive.cc|  51 
.../30_threads/barrier/arrive_and_drop.cc |  49 
.../30_threads/barrier/arrive_and_wait.cc |  51 
.../30_threads/barrier/completion.cc  |  54 
.../testsuite/30_threads/barrier/max.cc   |  44 
8 files changed, 551 insertions(+)
create mode 100644 libstdc++-v3/include/std/barrier
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..80e6d668cf5
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+// <barrier> -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L


This feature test macro will be defined unconditionally, even if
_GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
for gthreads.

You're also missing an edit to <version> (which should depend on the
same conditions).



+#include 
+
+#if defined(_GLIBCXX_HAS_GTHREADS)
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __empty_completion
+  {
+_GLIBCXX_ALWAYS_INLINE void
+operator()() noexcept
+{ }
+  };
+
+/*
+
+The default implementation of __barrier_base is a classic tree barrier.
+
+It looks different from literature pseudocode for two main reasons:
+ 1. Threads that call into std::barrier functions do not provide indices,
+so a numbering step is added before the actual barrier algorithm,
+appearing as an N+1 round to the N rounds of the tree barrier.
+ 2. A great deal of attention has been paid to avoid cache line thrashing
+by flattening the tree structure into cache-line sized arrays, that
+are indexed in an efficient way.
+
+*/
+
+  using __barrier_phase_t = uint8_t;


Please add <cstdint> or <stdint.h> since you're using uint8_t
(it's currently included by  but that could
change).

Would it work to use a scoped enumeration type here instead? That
would prevent people accidentally doing arithmetic on it, or passing
it to functions taking an integer (and prevent it promoting to int in
arithmetic).

e.g. define it similar to std::byte:

enum class __barrier_phase_t : unsigned char { };

and then cast to an integer on the way in and the way out, so that the
implementation works with its numeric value, but users have a
non-arithmetic type that the

Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 10:41 -0800, Thomas Rodgers wrote:

From: Thomas Rodgers 

IGNORE the previous version of this patch please.


OK, but all my comments seem to apply to this one too.


Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_thread/barrier/1.cc: New test.
* testsuite/30_thread/barrier/2.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_thread/barrier/arrive.cc: Likewise.
* testsuite/30_thread/barrier/completion.cc: Likewise.
* testsuite/30_thread/barrier/max.cc: Likewise.
---
libstdc++-v3/include/Makefile.am  |   1 +
libstdc++-v3/include/Makefile.in  |   1 +
libstdc++-v3/include/bits/atomic_base.h   |  11 +-
libstdc++-v3/include/std/barrier  | 248 ++
libstdc++-v3/include/std/version  |   1 +
.../testsuite/30_threads/barrier/1.cc |  27 ++
.../testsuite/30_threads/barrier/2.cc |  27 ++
.../testsuite/30_threads/barrier/arrive.cc|  51 
.../30_threads/barrier/arrive_and_drop.cc |  49 
.../30_threads/barrier/arrive_and_wait.cc |  51 
.../30_threads/barrier/completion.cc  |  54 
.../testsuite/30_threads/barrier/max.cc   |  44 
12 files changed, 562 insertions(+), 3 deletions(-)
create mode 100644 libstdc++-v3/include/std/barrier
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 382e94322c1..9e497835ee0 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -30,6 +30,7 @@ std_headers = \
${std_srcdir}/any \
${std_srcdir}/array \
${std_srcdir}/atomic \
+   ${std_srcdir}/barrier \
${std_srcdir}/bit \
${std_srcdir}/bitset \
${std_srcdir}/charconv \
diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index dd4db926592..1ad34719d3e 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }

#if __cplusplus > 201703L
+  template
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, const _Func& __fn) const noexcept
+   { std::__atomic_wait(&_M_i, __old, __fn); }
+
  _GLIBCXX_ALWAYS_INLINE void
  wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
  {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old,
+   [__m, this, __old]
+   { return this->load(__m) != __old; });
  }


This looks like it's not meant to be part of this patch.

It also looks wrong for any patch, because it adds _M_wait as a public
member.

Not sure what this piece is for :-)




Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers



> On Nov 4, 2020, at 10:52 AM, Jonathan Wakely  wrote:
> 
> On 04/11/20 10:41 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers 
>> 
>> IGNORE the previous version of this patch please.
> 
> OK, but all my comments seem to apply to this one too.
> 

Sure :)

>> Adds <barrier>
>> 
>> libstdc++/ChangeLog:
>> 
>>  * include/Makefile.am (std_headers): Add new header.
>>  * include/Makefile.in: Regenerate.
>>  * include/std/barrier: New file.
>>  * testsuite/30_thread/barrier/1.cc: New test.
>>  * testsuite/30_thread/barrier/2.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive.cc: Likewise.
>>  * testsuite/30_thread/barrier/completion.cc: Likewise.
>>  * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/Makefile.am  |   1 +
>> libstdc++-v3/include/Makefile.in  |   1 +
>> libstdc++-v3/include/bits/atomic_base.h   |  11 +-
>> libstdc++-v3/include/std/barrier  | 248 ++
>> libstdc++-v3/include/std/version  |   1 +
>> .../testsuite/30_threads/barrier/1.cc |  27 ++
>> .../testsuite/30_threads/barrier/2.cc |  27 ++
>> .../testsuite/30_threads/barrier/arrive.cc|  51 
>> .../30_threads/barrier/arrive_and_drop.cc |  49 
>> .../30_threads/barrier/arrive_and_wait.cc |  51 
>> .../30_threads/barrier/completion.cc  |  54 
>> .../testsuite/30_threads/barrier/max.cc   |  44 
>> 12 files changed, 562 insertions(+), 3 deletions(-)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>> 
>> diff --git a/libstdc++-v3/include/Makefile.am 
>> b/libstdc++-v3/include/Makefile.am
>> index 382e94322c1..9e497835ee0 100644
>> --- a/libstdc++-v3/include/Makefile.am
>> +++ b/libstdc++-v3/include/Makefile.am
>> @@ -30,6 +30,7 @@ std_headers = \
>>  ${std_srcdir}/any \
>>  ${std_srcdir}/array \
>>  ${std_srcdir}/atomic \
>> +${std_srcdir}/barrier \
>>  ${std_srcdir}/bit \
>>  ${std_srcdir}/bitset \
>>  ${std_srcdir}/charconv \
>> diff --git a/libstdc++-v3/include/bits/atomic_base.h 
>> b/libstdc++-v3/include/bits/atomic_base.h
>> index dd4db926592..1ad34719d3e 100644
>> --- a/libstdc++-v3/include/bits/atomic_base.h
>> +++ b/libstdc++-v3/include/bits/atomic_base.h
>> @@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>  }
>> 
>> #if __cplusplus > 201703L
>> +  template
>> +_GLIBCXX_ALWAYS_INLINE void
>> +_M_wait(__int_type __old, const _Func& __fn) const noexcept
>> +{ std::__atomic_wait(&_M_i, __old, __fn); }
>> +
>>  _GLIBCXX_ALWAYS_INLINE void
>>  wait(__int_type __old,
>>memory_order __m = memory_order_seq_cst) const noexcept
>>  {
>> -std::__atomic_wait(&_M_i, __old,
>> -   [__m, this, __old]
>> -   { return this->load(__m) != __old; });
>> +_M_wait(__old,
>> +[__m, this, __old]
>> +{ return this->load(__m) != __old; });
>>  }
> 
> This looks like it's not meant to be part of this patch.
> 
> It also looks wrong for any patch, because it adds _M_wait as a public
> member.
> 
> Not sure what this piece is for :-)
> 

It is used at include/std/barrier:197 to keep the implementation as close as 
possible to the libc++ version upon which it is based.


> 



Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Segher Boessenkool
On Wed, Nov 04, 2020 at 01:20:58PM +, Richard Sandiford wrote:
> Tobias Burnus  writes:
> > Three of the testcases fail on PowerPC: 
> > gcc.target/i386/zero-scratch-regs-{9,10,11}.c
> >powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> > unimplemented: '-fzero-call-used_regs' not supported on this target
> >
> > Did you miss some dg-require-effective-target ?
> 
> No, these are a signal to target maintainers that they need
> to decide whether to add support or accept the status quo
> (in which case a new effective-target will be needed).  See:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html:
> 
> The new tests are likely to fail on some targets with the sorry()
> message, but I think target maintainers are best placed to decide
> whether (a) that's a fundamental restriction of the target and the
> tests should just be skipped or (b) the target needs to implement
> the new hook.

But why are tests in gcc.target/i386/ run for other targets at all?!


Segher


Re: [ping] aarch64: move and adjust PROBE_STACK_*_REG

2020-11-04 Thread Richard Sandiford via Gcc-patches
Olivier Hainque  writes:
> Ping, please ?
>
> Patch re-attached for convenience.

Looks OK to me, and I assume Richard would have spoken up by now if
he didn't think the patch did what he wanted.

> +;; The pair of scratch registers used for stack probing with 
> -fstack-check.
> +;; Leave R9 alone as a possible choice for the static chain.
> +(PROBE_STACK_FIRST_REGNUM  10)
> +(PROBE_STACK_SECOND_REGNUM 11)
>  ;; Scratch register used by stack clash protection to calculate
>  ;; SVE CFA offsets during probing.
>  (STACK_CLASH_SVE_CFA_REGNUM 11)

It's a bit concerning that the second register now overlaps
STACK_CLASH_SVE_CFA_REGNUM, but I agree that isn't a problem
in practice, since the two uses are currently mutually-exclusive.
I think it might be worth having a comment about that,  So maybe add:

;; Note that the use of these registers is mutually exclusive with the use
;; of STACK_CLASH_SVE_CFA_REGNUM, which is for -fstack-clash-protection
;; rather than -fstack-check.

to the new comment above.

OK with that change, thanks.  Sorry for the long delay in the review.

Richard


Re: [patch] Add dg-require-effective-target fpic to an aarch64 specific test in gcc.dg

2020-11-04 Thread Richard Sandiford via Gcc-patches
Olivier Hainque  writes:
> Hello,
>
> This patch adds dg-require-effective-target fpic
> to an aarch64 specific gcc.dg test using -fPIC,
> which helps circumvent a failure we observed while
> testing the aarch64 port for VxWorks.
>
> ok to commit ?

OK, thanks.  Also OK for any other current or future aarch64 test that
has -fpic or -fPIC in the options and forgets to do this.

Richard


[PATCH] c++: Use two levels of caching in satisfy_atom

2020-11-04 Thread Patrick Palka via Gcc-patches
[ This patch depends on

  c++: Reuse identical ATOMIC_CONSTRs during normalization

  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557929.html  ]

This improves the effectiveness of caching in satisfy_atom by querying
the cache again after we've instantiated the atom's parameter mapping.

Before instantiating its mapping, the identity of an (atom,args) pair
within the satisfaction cache is determined by idiosyncratic things such
as the level and index of each template parameter used in targets of the
parameter mapping.  For example, the associated constraints of foo in

  template <class T> concept range = range_v<T>;
  template <class U, class V> void foo () requires range<U> && range<V>;

are range_v<T> (with mapping T -> U) /\ range_v<T> (with mapping T -> V).
If during satisfaction the template arguments supplied for U and V are
the same, then the satisfaction value of these two atoms will be the
same (despite their uninstantiated parameter mappings being different).

But sat_cache doesn't see this because it compares the uninstantiated
parameter mapping and the supplied template arguments of sat_entry's
independently.  So satisfy_atom currently will end up fully evaluating
the latter atom instead of reusing the satisfaction value of the former.

But there is a point when the two atoms do look the same to sat_cache,
and that's after instantiating their parameter mappings.  By querying
the cache again at this point, we're at least able to avoid substituting
the instantiated mapping into the second atom's expression.

With this patch, compile time and memory usage for the cmcstl2 test
test/algorithm/set_symmetric_diference4.cpp drops from 11s/1.4GB to
8.5s/1.2GB with an --enable-checking=release compiler.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* cp-tree.h (ATOMIC_CONSTR_MAP_INSTANTIATED_P): Define this flag
for ATOMIC_CONSTRs.
* constraint.cc (sat_hasher::hash): Use hash_atomic_constraint
if the flag is set, otherwise keep using a pointer hash.
(sat_hasher::equal): Return false if the flag's setting differs
on two atoms.  Call atomic_constraints_identical_p if the flag
is set, otherwise keep using a pointer equality test.
(satisfy_atom): After instantiating the parameter mapping, form
another ATOMIC_CONSTR using the instantiated mapping and query
the cache again.  Cache the satisfaction value of both atoms.
(diagnose_atomic_constraint): Simplify now that the supplied
atom has an instantiated mapping.
---
 gcc/cp/constraint.cc | 47 +++-
 gcc/cp/cp-tree.h |  6 ++
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 55dba362ca5..c612bfba13b 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2315,12 +2315,32 @@ struct sat_hasher : ggc_ptr_hash
 {
   static hashval_t hash (sat_entry *e)
   {
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e->constr))
+  {
+   gcc_assert (!e->args);
+   return hash_atomic_constraint (e->constr);
+  }
+
 hashval_t value = htab_hash_pointer (e->constr);
 return iterative_hash_template_arg (e->args, value);
   }
 
   static bool equal (sat_entry *e1, sat_entry *e2)
   {
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e1->constr)
+   != ATOMIC_CONSTR_MAP_INSTANTIATED_P (e2->constr))
+  return false;
+
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e1->constr))
+  {
+   /* Atoms with instantiated mappings are built in satisfy_atom.  */
+   gcc_assert (!e1->args && !e2->args);
+   return atomic_constraints_identical_p (e1->constr, e2->constr);
+  }
+
+/* Atoms with uninstantiated mappings are built in normalize_atom.
+   Their identity is determined by their pointer value due to
+   the caching of ATOMIC_CONSTRs performed therein.  */
 if (e1->constr != e2->constr)
   return false;
 return template_args_equal (e1->args, e2->args);
@@ -2614,6 +2634,18 @@ satisfy_atom (tree t, tree args, subst_info info)
   return cache.save (boolean_false_node);
 }
 
+  /* Now build a new atom using the instantiated mapping.  We use
+ this atom as a second key to the satisfaction cache, and we
+ also pass it to diagnose_atomic_constraint so that diagnostics
+ which refer to the atom display the instantiated mapping.  */
+  t = copy_node (t);
+  ATOMIC_CONSTR_MAP (t) = map;
+  gcc_assert (!ATOMIC_CONSTR_MAP_INSTANTIATED_P (t));
+  ATOMIC_CONSTR_MAP_INSTANTIATED_P (t) = true;
+  satisfaction_cache inst_cache (t, /*args=*/NULL_TREE, info.complain);
+  if (tree r = inst_cache.get ())
+return cache.save (r);
+
   /* Rebuild the argument vector from the parameter mapping.  */
   args = get_mapped_args (map);
 
@@ -2626,19 +2658,19 @@ satisfy_atom (tree t, tree args, subst_info info)
 is not satisfied. Replay the substitution.  */
   if (info.noisy ())
tsub

[10/32] config

2020-11-04 Thread Nathan Sidwell

I managed to flub sending this yesterday.

This is the gcc/configure.ac changes (rebuild configure and config.h.in 
after applying).  Generally just checking for network-related 
functionality.  If it's not available, those features of the module 
mapper will be unavailable.


nathan
--
Nathan Sidwell
diff --git c/gcc/configure.ac w/gcc/configure.ac
index 73034bb902b..168a3bc3625 100644
--- c/gcc/configure.ac
+++ w/gcc/configure.ac
@@ -1417,8 +1419,8 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked feof_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
-	gettimeofday mbstowcs wcswidth mmap setlocale \
-	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2)
+	gettimeofday mbstowcs wcswidth mmap memrchr posix_fallocate setlocale \
+	gcc_UNLOCKED_FUNCS madvise mallinfo execv mallinfo2 fstatat)
 
 if test x$ac_cv_func_mbstowcs = xyes; then
   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
@@ -1440,6 +1442,10 @@ fi
 
 AC_CHECK_TYPE(ssize_t, int)
 AC_CHECK_TYPE(caddr_t, char *)
+AC_CHECK_TYPE(sighandler_t,
+  AC_DEFINE(HAVE_SIGHANDLER_T, 1,
+[Define if  defines sighandler_t]),
+,signal.h)
 
 GCC_AC_FUNC_MMAP_BLACKLIST
 
@@ -1585,6 +1591,146 @@ if test $ac_cv_f_setlkw = yes; then
   [Define if F_SETLKW supported by fcntl.])
 fi
 
+# Check if O_CLOEXEC is defined by fcntl
+AC_CACHE_CHECK(for O_CLOEXEC, ac_cv_o_cloexec, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include ]], [[
+return open ("/dev/null", O_RDONLY | O_CLOEXEC);]])],
+[ac_cv_o_cloexec=yes],[ac_cv_o_cloexec=no])])
+if test $ac_cv_o_cloexec = yes; then
+  AC_DEFINE(HOST_HAS_O_CLOEXEC, 1,
+  [Define if O_CLOEXEC supported by fcntl.])
+fi
+
+# C++ Modules would like some networking features to provide the mapping
+# server.  You can still use modules without them though.
+# The following network-related checks could probably do with some
+# Windows and other non-linux defenses and checking.
+
+# Local socket connectivity wants AF_UNIX networking
+# Check for AF_UNIX networking
+AC_CACHE_CHECK(for AF_UNIX, ac_cv_af_unix, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_un un;
+un.sun_family = AF_UNSPEC;
+int fd = socket (AF_UNIX, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&un, sizeof (un));]])],
+[ac_cv_af_unix=yes],
+[ac_cv_af_unix=no])])
+if test $ac_cv_af_unix = yes; then
+  AC_DEFINE(HAVE_AF_UNIX, 1,
+  [Define if AF_UNIX supported.])
+fi
+
+# Remote socket connectivity wants AF_INET6 networking
+# Check for AF_INET6 networking
+AC_CACHE_CHECK(for AF_INET6, ac_cv_af_inet6, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_in6 in6;
+in6.sin6_family = AF_UNSPEC;
+struct addrinfo *addrs = 0;
+struct addrinfo hints;
+hints.ai_flags = 0;
+hints.ai_family = AF_INET6;
+hints.ai_socktype = SOCK_STREAM;
+hints.ai_protocol = 0;
+hints.ai_canonname = 0;
+hints.ai_addr = 0;
+hints.ai_next = 0;
+int e = getaddrinfo ("localhost", 0, &hints, &addrs);
+const char *str = gai_strerror (e);
+freeaddrinfo (addrs);
+int fd = socket (AF_INET6, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&in6, sizeof (in6));]])],
+[ac_cv_af_inet6=yes],
+[ac_cv_af_inet6=no])])
+if test $ac_cv_af_inet6 = yes; then
+  AC_DEFINE(HAVE_AF_INET6, 1,
+  [Define if AF_INET6 supported.])
+fi
+
+# Efficient server response wants epoll
+# Check for epoll_create, epoll_ctl, epoll_pwait
+AC_CACHE_CHECK(for epoll, ac_cv_epoll, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include ]],[[
+int fd = epoll_create (1);
+epoll_event ev;
+ev.events = EPOLLIN;
+ev.data.fd = 0;
+epoll_ctl (fd, EPOLL_CTL_ADD, 0, &ev);
+epoll_pwait (fd, 0, 0, -1, 0);]])],
+[ac_cv_epoll=yes],
+[ac_cv_epoll=no])])
+if test $ac_cv_epoll = yes; then
+  AC_DEFINE(HAVE_EPOLL, 1,
+  [Define if epoll_create, epoll_ctl, epoll_pwait provided.])
+fi
+
+# If we can't use epoll, try pselect.
+# Check for pselect
+AC_CACHE_CHECK(for pselect, ac_cv_pselect, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include ]],[[
+pselect (0, 0, 0, 0, 0, 0);]])],
+[ac_cv_pselect=yes],
+[ac_cv_pselect=no])])
+if test $ac_cv_pselect = yes; then
+  AC_DEFINE(HAVE_PSELECT, 1,
+  [Define if pselect provided.])
+fi
+
+# And failing that, use good old select.
+# If we can't even use this, the server is serialized.
+# Check for select
+AC_CACHE_CHECK(for select, ac_cv_select, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include ]],[[
+select (0, 0, 0, 0, 0);]])],
+[ac_cv_select=yes],
+[ac_cv_select=no])])
+if test $ac_cv_select = yes; then
+  AC_DEFINE(HAVE_SELECT, 1,
+  [Define if select provided.])
+fi
+
+# Avoid some fnctl calls by using accept4, when available.
+# Check for accept4
+AC_CACHE_CHECK(for accept4, ac_cv_accept4, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include ]],[[
+int err = accept4 (1, 0, 0, SOCK_NONBLOCK);]])],
+[ac_cv_accept4=yes],
+[ac_cv_accept4=no])])
+if test $ac_cv_accept4 = yes; then
+  AC_DEFINE(HAVE_ACCEPT4, 1,
+  [Define i

Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers



> On Nov 4, 2020, at 10:50 AM, Jonathan Wakely  wrote:
> 
> On 04/11/20 09:29 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers 
>> 
>> Adds <barrier>
>> 
>> libstdc++/ChangeLog:
>> 
>>  * include/Makefile.am (std_headers): Add new header.
>>  * include/Makefile.in: Regenerate.
>>  * include/std/barrier: New file.
>>  * testsuite/30_thread/barrier/1.cc: New test.
>>  * testsuite/30_thread/barrier/2.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive.cc: Likewise.
>>  * testsuite/30_thread/barrier/completion.cc: Likewise.
>>  * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/std/barrier  | 248 ++
>> .../testsuite/30_threads/barrier/1.cc |  27 ++
>> .../testsuite/30_threads/barrier/2.cc |  27 ++
>> .../testsuite/30_threads/barrier/arrive.cc|  51 
>> .../30_threads/barrier/arrive_and_drop.cc |  49 
>> .../30_threads/barrier/arrive_and_wait.cc |  51 
>> .../30_threads/barrier/completion.cc  |  54 
>> .../testsuite/30_threads/barrier/max.cc   |  44 
>> 8 files changed, 551 insertions(+)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>> 
>> diff --git a/libstdc++-v3/include/std/barrier 
>> b/libstdc++-v3/include/std/barrier
>> new file mode 100644
>> index 000..80e6d668cf5
>> --- /dev/null
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -0,0 +1,248 @@
>> +// <barrier> -*- C++ -*-
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// .
>> +
>> +// This implementation is based on libcxx/include/barrier
>> +//===-- barrier.h --===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===---===//
>> +
>> +#ifndef _GLIBCXX_BARRIER
>> +#define _GLIBCXX_BARRIER 1
>> +
>> +#pragma GCC system_header
>> +
>> +#if __cplusplus > 201703L
>> +#define __cpp_lib_barrier 201907L
> 
> This feature test macro will be defined unconditionally, even if
> _GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
> for gthreads.
> 
> You're also missing an edit to  (which should depend on the
> same conditions).
> 
> 
>> +#include 
>> +
>> +#if defined(_GLIBCXX_HAS_GTHREADS)
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +namespace std _GLIBCXX_VISIBILITY(default)
>> +{
>> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +
>> +  struct __empty_completion
>> +  {
>> +_GLIBCXX_ALWAYS_INLINE void
>> +operator()() noexcept
>> +{ }
>> +  };
>> +
>> +/*
>> +
>> +The default implementation of __barrier_base is a classic tree barrier.
>> +
>> +It looks different from literature pseudocode for two main reasons:
>> + 1. Threads that call into std::barrier functions do not provide indices,
>> +so a numbering step is added before the actual barrier algorithm,
>> +appearing as an N+1 round to the N rounds of the tree barrier.
>> + 2. A great deal of attention has been paid to avoid cache line thrashing
>> +by flattening the tree structure into cache-line sized arrays, that
>> +are indexed in an efficient way.
>> +
>> +*/
>> +
>> +  using __barrier_phase_t = uint8_t;
> 
> Please add <cstdint> or <stdint.h> since you're using uint8_t
> (it's currently included by  but that could
> change).
> 
> Would it work to use a scoped enumeration type here instead?

Re: [PATCH] libstdc++: Implement C++20 features for <sstream>

2020-11-04 Thread Stephan Bergmann via Gcc-patches

On 07/10/2020 18:55, Thomas Rodgers wrote:

From: Thomas Rodgers 

New ctors and ::view() accessor for -
   * basic_stringbuf
   * basic_istringstream
   * basic_ostringstream
   * basic_stringstream

New ::get_allocator() accessor for basic_stringbuf.
I found that this commit,
"libstdc++: Implement C++20 features for <sstream>", changed the behavior of



$ cat test.cc
#include <sstream>
#include <iostream>
#include <iterator>
int main() {
  std::stringstream s("a");
  std::istreambuf_iterator<char> i(s);
  if (i != std::istreambuf_iterator<char>()) std::cout << *i << '\n';
}

$ g++ -std=c++20 test.cc
$ ./a.out


from printing "a" to printing nothing.  (The `i != ...` comparison 
appears to change i from pointing at "a" to pointing to null, and 
returns false.)


I ran into this when building LibreOffice, and I hope test.cc is a 
faithfully minimized reproducer.  However, I know little about 
std::istreambuf_iterator, so it may well be that the code isn't even valid.




Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Qing Zhao via Gcc-patches



> On Nov 4, 2020, at 1:00 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Nov 04, 2020 at 01:20:58PM +, Richard Sandiford wrote:
>> Tobias Burnus  writes:
>>> Three of the testcases fail on PowerPC: 
>>> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>>>   powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
>>> unimplemented: '-fzero-call-used_regs' not supported on this target
>>> 
>>> Did you miss some dg-require-effective-target ?
>> 
>> No, these are a signal to target maintainers that they need
>> to decide whether to add support or accept the status quo
>> (in which case a new effective-target will be needed).  See:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html :
>> 
>>The new tests are likely to fail on some targets with the sorry()
>>message, but I think target maintainers are best placed to decide
>>whether (a) that's a fundamental restriction of the target and the
>>tests should just be skipped or (b) the target needs to implement
>>the new hook.
> 
> But why are tests in gcc.target/i386/ run for other targets at all?!

No,  tests in gcc.target/i386 should not run for PowerPC.

What Tobias Burnus mentioned are the following tests:

powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++2a (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++98 (test for excess errors)


They are under c-c++-common, not gcc.target/i386. 

These test cases are added intentionally on all platforms in order to check 
whether the current middle-end default implementation for
-fzero-call-used-regs works on the specific platform.

If the default implementation doesn’t work for a specific platform, for 
example, on PowerPC, it’s better for the maintainer of PowerPC to decide
whether to skip these test cases on this platform or add a PowerPC 
implementation.
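
Option (a) would amount to guarding each test with an effective-target check. A sketch of what that could look like — the effective-target name below is hypothetical and would first need a matching proc in lib/target-supports.exp:

```c
/* Hypothetical sketch of option (a): skip the test on targets whose
   back end does not implement the zeroing hook.  The effective-target
   name "zero_call_used_regs" does not exist yet; it would need to be
   defined in lib/target-supports.exp before this directive works.  */
/* { dg-do compile } */
/* { dg-require-effective-target zero_call_used_regs } */
/* { dg-options "-fzero-call-used-regs=used" } */

int
foo (int x)
{
  return x + 1;
}
```

Option (b) instead keeps the tests as they are and requires the target to implement the new hook, which is the decision being left to the PowerPC maintainers here.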

Qing
> 
> 
> Segher



Go patch committed: Turn off -fipa-icf-functions

2020-11-04 Thread Ian Lance Taylor via Gcc-patches
Go code expects to be able to do a reliable backtrace and get correct
file/line information of callers.  This is broken by
-fipa-icf-functions, so this Go frontend patch disables that option by
default.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

* go-lang.c (go_langhook_post_options): Disable
-fipa-icf-functions if it was not explicitly enabled.
diff --git a/gcc/go/go-lang.c b/gcc/go/go-lang.c
index 2cfb41042bd..08c1f38a2c1 100644
--- a/gcc/go/go-lang.c
+++ b/gcc/go/go-lang.c
@@ -306,6 +306,12 @@ go_langhook_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   flag_partial_inlining, 0);
 
+  /* Go programs expect runtime.Callers to give the right answers,
+ which means that we can't combine functions even if they look the
+ same.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  flag_ipa_icf_functions, 0);
+
   /* If the debug info level is still 1, as set in init_options, make
  sure that some debugging type is selected.  */
   if (global_options.x_debug_info_level == DINFO_LEVEL_TERSE

