Re: [PATCH] testsuite: Update some vect cases for partial vectors

2020-08-06 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Thanks for the review!

on 2020/8/6 1:52 PM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
>> index 5200ed1cd94..da6fb12eb0d 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
>> @@ -48,6 +48,9 @@ int main (void)
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_unpack && {! vect_partial_vectors_usage_1 } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_unpack && { vect_partial_vectors_usage_1 } } } } } */
> 
> I don't understand this bit: don't these two lines reduce back to the
> original vect_unpack one?

Yes, we don't need them.  Sorry, I duplicated them for the different
conditions, like the lines after them, but forgot to change them back when
I found they were useless.  Thanks for catching that!

> 
>> +/* The epilogues are vectorized using partial vectors.  */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { vect_unpack && {! vect_partial_vectors_usage_1 } } xfail { vect_variable_length && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  { target { vect_unpack && { vect_partial_vectors_usage_1 } } xfail { vect_variable_length && vect_load_lanes } } } } */
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> index ca7803ec1a9..af6fe08856f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> @@ -80,8 +80,10 @@ int main (int argc, const char* argv[])
>>  }
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_perm } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm3_int && {! vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm3_int && { {! vect_load_lanes } && {! vect_partial_vectors_usage_1 } } } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
>> +/* The epilogues are vectorized using partial vectors.  */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && { {! vect_load_lanes } && { vect_partial_vectors_usage_1 } } } } } } */
> 
> Very minor, but I think it'd be better to put this immediately after the
> line you changed above.  Same for the other slp-perm-* changes.
> 

OK, will fix.

>> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
>> index 57eed3012b9..b571e84d20e 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -7055,6 +7055,27 @@ proc check_effective_target_vect_check_ptrs { } {
>>  return [check_effective_target_aarch64_sve2]
>>  }
>>  
>> +# Return true if loops using partial vectors are supported.
>> +
>> +proc check_effective_target_vect_partial_vectors { } {
>> +    return [expr { [check_effective_target_vect_partial_vectors_usage_1]
>> +                   || [check_effective_target_vect_partial_vectors_usage_2] }]
>> +}
>> +
>> +# Return true if loops using partial vectors are supported and the default
>> +# value of --param=vect-partial-vector-usage is 1.
>> +
>> +proc check_effective_target_vect_partial_vectors_usage_1 { } {
>> +    return 0
>> +}
>> +
>> +# Return true if loops using partial vectors are supported and the default
>> +# value of --param=vect-partial-vector-usage is 2.
>> +
>> +proc check_effective_target_vect_partial_vectors_usage_2 { } {
>> +    return [expr { [check_effective_target_vect_fully_masked] }]
>> +}
>> +
> 
> Could we auto-detect this?  What we really care about isn't the default,
> but what's currently being tested.

Yeah, the comments were confusing.  The intent is to check which targets
support partial vectors and which usage kind is in effect.

How about updating them to something like:

"Return true if loops using partial vectors are supported and the usage
kind is 1/2."

> 
> E.g. maybe use check_compile to run gcc with “-Q --help=params” and an
> arbitrary output type (probably assembly).  Then use “regexp” on the
> lines to parse the --param=vect-partial-vector-usage value.  At that
> point it would be worth caching the result.
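For illustration only, a minimal C sketch of the parsing step described above. It assumes each line of "-Q --help=params" output has the form "--param=NAME=<range>" followed by the current value as the last whitespace-separated token; that layout and the helper name parse_param_value are assumptions, not taken from target-supports.exp, where the real check would be written in Tcl with "regexp".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of PARAM (for example
   "--param=vect-partial-vector-usage") found in OUTPUT, the text printed
   by "gcc -Q --help=params", or -1 if it is absent or unparsable.
   Assumes (unverified) that a matching line looks like
   "  --param=NAME=<range>   VALUE", with VALUE as the last token.  */
static int
parse_param_value (const char *output, const char *param)
{
  size_t plen = strlen (param);
  const char *line = output;

  while (line != NULL && *line != '\0')
    {
      const char *eol = strchr (line, '\n');
      const char *end = eol ? eol : line + strlen (line);
      const char *p = line;

      /* Skip leading indentation.  */
      while (p < end && isspace ((unsigned char) *p))
        p++;

      /* Require "PARAM=" so that a longer name sharing the prefix
         is not matched by mistake.  */
      if ((size_t) (end - p) > plen
          && strncmp (p, param, plen) == 0
          && p[plen] == '=')
        {
          /* Trim trailing blanks, then back up to the last token.  */
          while (end > p && isspace ((unsigned char) end[-1]))
            end--;
          const char *tok = end;
          while (tok > p && !isspace ((unsigned char) tok[-1]))
            tok--;

          char *stop;
          long v = strtol (tok, &stop, 10);
          return (stop == end && stop != tok) ? (int) v : -1;
        }
      line = eol ? eol + 1 : NULL;
    }
  return -1;
}
```

Called on the captured compiler output with "--param=vect-partial-vector-usage", it yields the value the testsuite is actually running with, which is what the effective-target check cares about; caching that result, as suggested, avoids re-running the compiler for every test.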

Currently the default value of this parameter is 2, even for targets
that don't support partial vectors.  Since
Re: [PATCH] libgccjit: Improve doc and comments regarding type casts

2020-08-06 Thread Andrea Corallo
Andrea Corallo  writes:

> Hi Alex,
>
> Looking at the code I believe all these casts are meant to be supported
> (read your intuition was correct).
>
> Also IMO source of confusion is that the doc is mentioning 'int' and
> 'float' but I believe would be better to have like 'integral' and
> 'floating-point' to clearly disambiguates with respect to the C
> types.
>
> AFAIU the set of supported casts should be like:
>
>  integral       <-> integral
>  floating-point <-> floating-point
>  integral       <-> floating-point
>  integral       <-> bool
>  P*             <-> Q*   for pointer types P and Q.
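A small C model of the rule set listed above, which may help when checking the doc and comments for consistency. The enum and cast_allowed are hypothetical simplifications, not libgccjit's actual is_valid_cast or its recording::type hierarchy; the model follows the list literally, so combinations the list does not mention (for example bool <-> floating-point) are rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified type kinds; libgccjit's real type
   hierarchy is richer than this.  */
enum type_kind { TK_INTEGRAL, TK_FLOAT, TK_BOOL, TK_POINTER, TK_STRUCT };

/* Model of the documented rules:
     integral       <-> integral
     floating-point <-> floating-point
     integral       <-> floating-point
     integral       <-> bool
     P*             <-> Q*  */
static bool
cast_allowed (enum type_kind src, enum type_kind dst)
{
  bool src_num = src == TK_INTEGRAL || src == TK_FLOAT;
  bool dst_num = dst == TK_INTEGRAL || dst == TK_FLOAT;

  /* First three rules: any mix of integral and floating-point.  */
  if (src_num && dst_num)
    return true;
  /* integral <-> bool, in either direction.  */
  if ((src == TK_INTEGRAL && dst == TK_BOOL)
      || (src == TK_BOOL && dst == TK_INTEGRAL))
    return true;
  /* Any pointer type to any other pointer type.  */
  if (src == TK_POINTER && dst == TK_POINTER)
    return true;
  return false;
}
```

Writing the rules as one predicate makes it easy to see that the list is symmetric and that anything outside it (structs, pointer <-> integral) would be reported as an invalid cast.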
>
> I'd propose installing the following patch to make the doc and comments
> consistent in documenting what we accept, and I guess we should just
> consider it a bug if any of these conversions is not handled correctly or
> leads to an ICE.
>
> Bests
>
>   Andrea
>
> gcc/jit/ChangeLog
>
> 2020-07-21  Andrea Corallo  
>
>   * docs/_build/texinfo/libgccjit.texi (Type-coercion): Improve doc
>   on allowed type casting.
>   * docs/topics/expressions.rst (gccjit::context::new_cast)
>   (gcc_jit_context_new_cast): Likewise.
>   * libgccjit.c: Improve comment on allowed type casting.
>   * libgccjit.h: Likewise
>
> From 914b9e86808c947d4bb2b06c6960fd8031125f67 Mon Sep 17 00:00:00 2001
> From: Andrea Corallo 
> Date: Tue, 21 Jul 2020 20:12:23 +0200
> Subject: [PATCH] libgccjit: improve documentation on type conversions
>
> gcc/jit/ChangeLog
>
> 2020-07-21  Andrea Corallo  
>
>   * docs/_build/texinfo/libgccjit.texi (Type-coercion): Improve doc
>   on allowed type casting.
>   * docs/topics/expressions.rst (gccjit::context::new_cast)
>   (gcc_jit_context_new_cast): Likewise.
>   * libgccjit.c: Improve comment on allowed type casting.
>   * libgccjit.h: Likewise
> ---
>  gcc/jit/docs/_build/texinfo/libgccjit.texi | 30 +++---
>  gcc/jit/docs/topics/expressions.rst|  8 +++---
>  gcc/jit/libgccjit.c|  8 +++---
>  gcc/jit/libgccjit.h|  7 +++--
>  4 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/jit/docs/_build/texinfo/libgccjit.texi b/gcc/jit/docs/_build/texinfo/libgccjit.texi
> index 1e14be010426..b170f24d1bb1 100644
> --- a/gcc/jit/docs/_build/texinfo/libgccjit.texi
> +++ b/gcc/jit/docs/_build/texinfo/libgccjit.texi
> @@ -6685,13 +6685,19 @@ Currently only a limited set of conversions are possible:
>  @itemize *
>  
>  @item 
> -int <-> float
> +integral   <-> integral
>  
>  @item 
> -int <-> bool
> +floating-point <-> floating-point
>  
>  @item 
> -P*  <-> Q*, for pointer types P and Q
> +integral   <-> floating-point
> +
> +@item 
> +integral   <-> bool
> +
> +@item 
> +P* <-> Q*   for pointer types P and Q
>  @end itemize
>  @end quotation
>  @end deffn
> @@ -12964,14 +12970,20 @@ Currently only a limited set of conversions are possible:
>  
>  @itemize *
>  
> -@item 
> -int <-> float
> +@item
> +integral   <-> integral
>  
> -@item 
> -int <-> bool
> +@item
> +floating-point <-> floating-point
>  
> -@item 
> -P*  <-> Q*, for pointer types P and Q
> +@item
> +integral   <-> floating-point
> +
> +@item
> +integral   <-> bool
> +
> +@item
> +P* <-> Q*, for pointer types P and Q
>  @end itemize
>  @end quotation
>  @end deffn
> diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst
> index d783ceea51a8..051cee5db211 100644
> --- a/gcc/jit/docs/topics/expressions.rst
> +++ b/gcc/jit/docs/topics/expressions.rst
> @@ -504,9 +504,11 @@ Type-coercion
>  
> Currently only a limited set of conversions are possible:
>  
> - * int <-> float
> - * int <-> bool
> - * P*  <-> Q*, for pointer types P and Q
> + * integral   <-> integral
> + * floating-point <-> floating-point
> + * integral   <-> floating-point
> + * integral   <-> bool
> + * P* <-> Q*   for pointer types P and Q
>  
>  Lvalues
>  ---
> diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
> --- a/gcc/jit/libgccjit.c
> +++ b/gcc/jit/libgccjit.c
> @@ -1629,9 +1629,11 @@ gcc_jit_context_new_call_through_ptr (gcc_jit_context *ctxt,
>  
> We only permit these kinds of cast:
>  
> - int <-> float
> - int <-> bool
> - P*  <-> Q*   for pointer types P and Q.  */
> + integral   <-> integral
> + floating-point <-> floating-point
> + integral   <-> floating-point
> + integral   <-> bool
> + P* <-> Q*   for pointer types P and Q.  */
>  
>  static bool
>  is_valid_cast (gcc::jit::recording::type *src_type,
> diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
> index 1c5a12e9c015..228befa896d7 100644
> --- a/gcc/jit/libgccjit.h
> +++ b/gcc/jit/libgccjit.h
> @@ -996,8 +996,11 @@ gcc_jit_context_new_call_through_ptr (gcc_jit_context *ctxt,
>  /* Type-coercion.
>

Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Christophe Lyon via Gcc-patches
Hi,


On Wed, 5 Aug 2020 at 16:24, Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse  wrote:
> >
> > New version that passed bootstrap+regtest during the night.
> >
> > When vector comparisons were forced to use vec_cond_expr, we lost a number
> > of
> > optimizations (my fault for not adding enough testcases to prevent that).
> > This patch tries to unwrap vec_cond_expr a bit so some optimizations can
> > still happen.
> >
> > I wasn't planning to add all those transformations together, but adding one
> > caused a regression, whose fix introduced a second regression, etc.
> >
> > Restricting to constant folding would not be sufficient; we also need at
> > least things like X|0 or X&X. The transformations are quite conservative,
> > with :s and folding only if everything simplifies; we may want to relax
> > this later. And of course we are going to miss things like a?b:c + a?c:b
> > -> b+c.
> >
> > In terms of number of operations, some transformations turning 2
> > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
> > look like a gain... I expect the bit_not to disappear in most cases, and
> > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
> >
> > I am a bit confused that with avx512 we get types like "vector(4)
> > " with :2 and not :1 (is it a hack so true is 1 and not
> > -1?), but that doesn't matter for this patch.
>
> OK.
>
> Thanks,
> Richard.
>
> > 2020-08-05  Marc Glisse  
> >
> > PR tree-optimization/95906
> > PR target/70314
> > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> > (op (c ? a : b)): Update to match the new transformations.
> >
> > * gcc.dg/tree-ssa/andnot-2.c: New file.
> > * gcc.dg/tree-ssa/pr95906.c: Likewise.
> > * gcc.target/i386/pr70314.c: Likewise.
> >
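To make the "c1 ? c2 ? a : b : b" entry in the ChangeLog concrete, the C sketch below models a VEC_COND_EXPR lane by lane on masks whose lanes are all-ones or all-zeros, and checks on one input that c1 ? (c2 ? a : b) : b equals (c1 & c2) ? a : b. The four-lane struct and the helper names are hypothetical, and the rewritten form is an assumption about what the match.pd rule produces, not a quote of it.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector; boolean lanes are encoded as -1
   (all ones) or 0, as vector comparisons produce on most targets.  */
typedef struct { int32_t v[LANES]; } vec;

/* VEC_COND_EXPR <c, a, b> as a per-lane bitwise select.  This form is
   only a valid select because every lane of C is 0 or -1.  */
static vec
vec_cond (vec c, vec a, vec b)
{
  vec r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = (c.v[i] & a.v[i]) | (~c.v[i] & b.v[i]);
  return r;
}

/* BIT_AND_EXPR on two vectors.  */
static vec
vec_and (vec x, vec y)
{
  vec r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = x.v[i] & y.v[i];
  return r;
}

/* Lane-wise equality of two vectors.  */
static int
vec_eq (vec x, vec y)
{
  for (int i = 0; i < LANES; i++)
    if (x.v[i] != y.v[i])
      return 0;
  return 1;
}
```

With mask lanes, the nested select picks A only where both C1 and C2 are set, which is exactly one VEC_COND_EXPR on C1 & C2; that is why such a rewrite can remove a VEC_COND_EXPR at the cost of a cheap bitwise operation.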

I think this patch is causing several ICEs on arm-none-linux-gnueabihf
--with-cpu cortex-a9 --with-fpu neon-fp16:
  Executed from: gcc.c-torture/compile/compile.exp
    gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
    gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
  Executed from: gcc.dg/dg.exp
    gcc.dg/pr87746.c (internal compiler error)
  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
    gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/pr59591-1.c (internal compiler error)
    gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/pr86927.c (internal compiler error)
    gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/slp-cond-5.c (internal compiler error)
    gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-23.c (internal compiler error)
    gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-24.c (internal compiler error)
    gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal compiler error)

Backtrace for gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
during RTL pass: expand
/gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal compiler error: in do_store_flag, at expr.c:12259
0x8feb26 do_store_flag
	/gcc/expr.c:12259
0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	/gcc/expr.c:9617
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	/gcc/expr.c:10159
0x91174e expand_expr
	/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
	/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	/gcc/expr.c:10159
0x91174e expand_expr
	/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
	/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	/gcc/expr.c:10159
0x91174e expand_expr
	/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
	/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tr

[PATCH] Remove std::map use from graphite

2020-08-06 Thread Richard Biener
This replaces the use of std::map with hash_map for mapping
ISL ids to SSA names.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-06  Richard Biener  

* graphite-isl-ast-to-gimple.c (ivs_params): Use hash_map instead
of std::map.
(ivs_params_clear): Adjust.
(gcc_expression_from_isl_ast_expr_id): Likewise.
(graphite_create_new_loop): Likewise.
(add_parameters_to_ivs_params): Likewise.
---
 gcc/graphite-isl-ast-to-gimple.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 81f7b48887c..5fa70ff2d4e 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -24,7 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifdef HAVE_isl
 
-#define INCLUDE_MAP
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -69,18 +68,14 @@ struct ast_build_info
 /* IVS_PARAMS maps isl's scattering and parameter identifiers
to corresponding trees.  */
 
-typedef std::map ivs_params;
+typedef hash_map ivs_params;
 
 /* Free all memory allocated for isl's identifiers.  */
 
 static void ivs_params_clear (ivs_params &ip)
 {
-  std::map::iterator it;
-  for (it = ip.begin ();
-       it != ip.end (); it++)
-    {
-      isl_id_free (it->first);
-    }
+  for (auto it = ip.begin (); it != ip.end (); ++it)
+    isl_id_free ((*it).first);
 }
 
 /* Set the "separate" option for the schedule node.  */
@@ -256,13 +251,11 @@ gcc_expression_from_isl_ast_expr_id (tree type,
 {
   gcc_assert (isl_ast_expr_get_type (expr_id) == isl_ast_expr_id);
   isl_id *tmp_isl_id = isl_ast_expr_get_id (expr_id);
-  std::map::iterator res;
-  res = ip.find (tmp_isl_id);
+  tree *tp = ip.get (tmp_isl_id);
   isl_id_free (tmp_isl_id);
-  gcc_assert (res != ip.end () &&
-              "Could not map isl_id to tree expression");
+  gcc_assert (tp && "Could not map isl_id to tree expression");
   isl_ast_expr_free (expr_id);
-  tree t = res->second;
+  tree t = *tp;
   if (useless_type_conversion_p (type, TREE_TYPE (t)))
     return t;
   if (POINTER_TYPE_P (TREE_TYPE (t))
@@ -596,11 +589,9 @@ graphite_create_new_loop (edge entry_edge, __isl_keep isl_ast_node *node_for,
 
   isl_ast_expr *for_iterator = isl_ast_node_for_get_iterator (node_for);
   isl_id *id = isl_ast_expr_get_id (for_iterator);
-  std::map::iterator res;
-  res = ip.find (id);
-  if (ip.count (id))
-    isl_id_free (res->first);
-  ip[id] = iv;
+  bool existed_p = ip.put (id, iv);
+  if (existed_p)
+    isl_id_free (id);
   isl_ast_expr_free (for_iterator);
   return loop;
 }
@@ -1347,7 +1338,8 @@ add_parameters_to_ivs_params (scop_p scop, ivs_params &ip)
     {
       isl_id *tmp_id = isl_set_get_dim_id (scop->param_context,
                                            isl_dim_param, i);
-      ip[tmp_id] = param;
+      bool existed_p = ip.put (tmp_id, param);
+      gcc_assert (!existed_p);
     }
 }
 
-- 
2.26.2
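A minimal C model of the key-ownership rule the new put() calls rely on, as assumed here: the table keeps the key it already holds, overwrites the value, and reports whether the key existed, so the caller must drop the extra reference it passed in. toy_id and toy_map are hypothetical stand-ins, not isl or GCC types, and the model assumes (as the patch does) that the id obtained again for the same iterator is a new reference to the same object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted isl_id.  */
typedef struct { int refs; } toy_id;

static toy_id *
toy_id_copy (toy_id *id)
{
  id->refs++;
  return id;
}

static void
toy_id_free (toy_id *id)
{
  id->refs--;
}

/* Hypothetical fixed-size map mimicking the put() contract assumed
   above: existing keys are kept, values are overwritten.  */
#define TOY_MAP_SIZE 8
typedef struct
{
  toy_id *key[TOY_MAP_SIZE];
  int val[TOY_MAP_SIZE];
  size_t n;
} toy_map;

/* Return true if K was already present.  */
static bool
toy_map_put (toy_map *m, toy_id *k, int v)
{
  for (size_t i = 0; i < m->n; i++)
    if (m->key[i] == k)
      {
        m->val[i] = v;          /* Keep the stored key; replace the value.  */
        return true;
      }
  assert (m->n < TOY_MAP_SIZE);
  m->key[m->n] = k;             /* The map now owns this reference.  */
  m->val[m->n] = v;
  m->n++;
  return false;
}

/* Caller-side pattern mirroring graphite_create_new_loop after the
   patch: if the key was already present, the map did not take the new
   reference, so release it.  */
static void
record_iv (toy_map *m, toy_id *id, int iv)
{
  if (toy_map_put (m, id, iv))
    toy_id_free (id);
}
```

After recording the same id twice, the map holds exactly one reference and the latest value, which is the invariant ivs_params_clear depends on when it frees each key once.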


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Richard Biener
On Wed, 5 Aug 2020, Qing Zhao wrote:

> Hi, Richard,
> 
> Thanks a lot for your careful review and detailed comments.  
> 
> 
> > On Aug 4, 2020, at 2:35 AM, Richard Biener  wrote:
> > 
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> 
> Could you let me know who would be better qualified to fully review
> the rest of the middle-end changes?

Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
it would be nice for other target maintainers to chime in (Segher for
power maybe) for the question below...

> > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > command-line option and
> > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function
> > attribute:
> > 
> > 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > 
> > Don't zero call-used registers upon function return.
> > 
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> > 
> > In general I am missing reasoning as to why to use -fzero-call-used-regs=
> > in the documentation, that is, what is the threat model and what are
> > the guarantees?  Is there any point in zeroing registers when spill slots
> > are left populated with stale register contents?  How do I (and why
> > would I want to?) ensure that there's no information leak from the
> > implementation of 'foo' to its callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to protect with
> > zero_call_used_regs("...")?
> > 
> > Again - what's the intended use (and how does it fulfill anything useful
> > for that case)?
> 
> The major question in the above is: what’s the motivation for the whole
> patch?  H.J. Lu and I have answered this question in separate emails;
> let’s continue this high-level discussion in that thread.
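As a usage sketch of the attribute form proposed in this patch, the C fragment below annotates one function at an API boundary. It assumes the spelling zero_call_used_regs ("used-gpr") from the patch description and guards it with __has_attribute, so it still compiles (without the attribute) on compilers that lack it; mix_secret is a hypothetical example function, and whether a given compiler and target accept this exact spelling should be checked against that compiler's documentation.

```c
#include <assert.h>

/* Apply the attribute only when the compiler advertises it.  The
   nested #if is not evaluated when __has_attribute is undefined.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPRS __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPRS
# define ZERO_USED_GPRS
#endif

/* Hypothetical function at an API boundary whose scratch registers may
   hold key-derived values.  With the attribute, the call-used general
   registers it used are meant to be zeroed on return.  */
ZERO_USED_GPRS static unsigned
mix_secret (unsigned key, unsigned data)
{
  unsigned t = key * 2654435761u;
  return (t ^ data) + (key >> 3);
}
```

Annotating only boundary functions like this is the per-function alternative to compiling everything with -fzero-call-used-regs=, which is one of the questions raised above; the attribute does not change the function's result, only what is left in registers afterwards.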
> 
> 
> > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, 
> > tree name,
> > return NULL_TREE;
> > }
> > 
> > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > +                                      int ARG_UNUSED (flags),
> > +                                      bool *no_add_attris)
> > +{
> > +  tree decl = *node;
> > +  tree id = TREE_VALUE (args);
> > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > +
> > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +                "%qE attribute applies only to functions", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +  else if (DECL_INITIAL (decl))
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +                "cannot set %qE attribute after definition", name);
> > 
> > Why's that?
> This might not be needed; I will fix this in the next update.
> 
> > 
> > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > index 81bd2ee..ded1880 100644
> > --- a/gcc/c/c-decl.c
> > +++ b/gcc/c/c-decl.c
> > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> >DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> >  }
> > 
> > +  /* Merge the zero_call_used_regs_type information.  */
> > +  if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > +    DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > +
> > +
> > 
> > If you need this (see below) then likely cp/* needs similar adjustment
> > so do other places in the middle-end (function cloning, etc)
> 
> I will check cp/* and function cloning etc. to see whether the copying
> and merging are needed in other places.
> 
> Another thought: if I use “lookup_attribute” on the function decl instead
> of checking these bits, as you suggested later, all this copying and
> merging might not be necessary anymore.  I will check on that.
> > 
> > 
> > +
> > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > + * function.  */
> > 
> > No '*' on the continuation line
> 
> Okay, will fix this.
> 
> > +
> > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > +    zero_rtx[i] = NULL_RTX;
> > +
> > +  for (unsigned int regno = 0

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Richard Biener
On Wed, 5 Aug 2020, Qing Zhao wrote:

> >> 
>> From "The SECURE project and GCC" at GCC Cauldron 2018:
> >> 
> >> Speaker: Graham Markall
> >> 
> >> The SECURE project is a 15 month program funded by Innovate UK, to
> >> take well known security techniques from academia and make them
>> generally available in standard compilers, specifically GCC and LLVM.
> >> An explicit objective is for those techniques to be incorporated in
> >> the upstream versions of compilers. The Cauldron takes place in the
> >> final month of the project and this talk will present the technical
> >> details of some of the techniques implemented, and review those that
> >> are yet to be implemented. A particular focus of this talk will be on
>> verifying that the implementation is correct, which can be a bigger
> >> challenge than the implementation.
> >> 
> >> Techniques to be covered in the project include the following:
> >> 
> >> Stack and register erasure. Ensuring that on return from a function,
> >> no data is left lying on the stack or in registers. Particular
> >> challenges are in dealing with inlining, shrink wrapping and caching.
> >> 
>> This patch implements register erasure.
> > 
> > Part of it, yes.  While I can see that abnormal transfer of control is
> > difficult, exception handling is too widespread to be ignored.  What's
> > the plan there?
> > 
> > So can we also see the other parts?  In particular, I wonder whether
> > exposing just register clearing (in this fine-grained manner) is required
> > and useful, rather than thinking of a better interface for the whole thing?
> 
> Do you mean providing an integrated interface for both stack and register
> erasure for security purposes?
>
> However, is stack erasure at function return really a better idea than
> zero-initializing automatic variables at the beginning of the function?
> 
> We had some discussion with Kees Cook several weeks ago on the idea of
> stack erasure at function return; Kees provided the following comments:
> 
> "But back to why I don't think it's the right approach:
> 
> Based on the performance measurements of pattern-init and zero-init
> in Clang, MSVC, and the kernel plugin, it's clear that adding these
> initializations has measurable performance cost. Doing it at function
> exit means performing large unconditional wipes. Doing it at function
> entry means initializations can be dead-store eliminated and highly
> optimized. Given the current debates on the measurable performance
> difference between pattern and zero initialization (even in the face of
> existing dead-store elimination), I would expect wipe-on-function-exit to
> be outside the acceptable tolerance for performance impact. (Additionally,
> we've seen negative cache effects on wiping memory when the CPU is done
> using it, though this is more pronounced in heap wiping. Zeroing at
> free is about twice as expensive as zeroing at free time due to cache
> temporality. This is true for the stack as well, but it's not as high.)”
> 
> From my understanding, the major issue with stack erasure at function
> return is the big performance overhead, and this overhead cannot be
> reduced by compiler optimizations, since the additional wiping insns
> are inserted at the end of the routine.
>
> Based on the previous discussion with Kees, I don’t think that stack
> erasure at function return is a good idea.  Instead, we might provide an
> alternative approach: zero/pattern initialization of automatic variables
> (this functionality is already available in LLVM).  This will be another
> patch we want to add to GCC for security purposes in general.
> 
> So, I think for the current patch, -fzero-call-used-regs should be good 
> enough.
> 
> Any comments?

OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
it sounded more like a mitigation against information leaks, which
would then be highly incomplete without spill slot clearing, much like
the discussion we had on secure erasure of memory that should not
be DSEd.
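For comparison with the DSE discussion mentioned here, a common way to write a memory wipe that dead-store elimination may not remove is to store through a volatile-qualified pointer. The sketch below is illustrative (secure_wipe is a hypothetical name); glibc's explicit_bzero and the optional C11 Annex K memset_s exist for the same purpose. It only clears the object it is given and does nothing about spill slots the compiler chose on its own, which is the gap pointed out above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clear N bytes at P.  Each store goes through a
   volatile-qualified lvalue, so the compiler must perform it even if
   the buffer is never read again, unlike a plain memset that
   dead-store elimination is allowed to delete.  */
static void
secure_wipe (void *p, size_t n)
{
  volatile unsigned char *vp = (volatile unsigned char *) p;
  while (n--)
    *vp++ = 0;
}
```

The byte-at-a-time loop is deliberately simple; library versions such as explicit_bzero are usually faster while still resisting DSE.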

This needs to be reflected in the documentation and possibly in
the option naming?  Something like -frop-protection=..., similar in spirit
to how we have -fcf-protection=... (though that is also supposed
to provide ROP mitigation).

I'm not very familiar with ROP [mitigation] techniques, so I'm no
longer questioning the usefulness of this patch but leave that to others
(and thus final approval).  I'm continuing to question the plethora
of target hooks you add and will ask for better user-level documentation.

Richard.


RE: [PATCH] x86_64: Integer min/max improvements.

2020-08-06 Thread Roger Sayle

Hi Uros,

Many thanks for the review and feedback.  Here's the final version as committed,
with both the test cases requested by Richard Biener and your suggestion/request
to use ix86_expand_clear.  Tested again on x86_64-pc-linux-gnu.

Thank you again for the fantastic ix86_expand_clear pointer, which cleared up one
of two technical questions I had, and allowed this peephole2 to now also apply to
QImode and HImode MOV0s, where my original version was limited to SImode and
DImode.

My two questions were: (i) why isn't a QImode set of 0 with a flags clobber a
recognized instruction?  I'd assume that on some architectures "xorb dl,dl" might
be an appropriate sequence to use.  This is mostly answered by the use of
ix86_expand_clear, which intelligently selects the correct form, but the lack of
a *movqi_xor was previously odd.  (ii) My other question was that, despite my best
efforts, I couldn't seem to convince GCC to generate/use a *movsi_or to load the
constant -1.  It was just a curiosity, but this would affect/benefit the smaxm1
and sminm1 examples in the new i386/minmax-10.c test program.

Many thanks again for your help.
Roger
--

-----Original Message-----
From: Uros Bizjak  
Sent: 03 August 2020 11:29
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] x86_64: Integer min/max improvements.

On Thu, Jul 30, 2020 at 1:23 PM Roger Sayle  wrote:
>
>
> This patch tweaks the way that min and max are expanded, so that the 
> semantics of these operations are visible to the early RTL 
> optimization passes, until split into explicit comparison and 
> conditional move instructions. The good news is that i386.md already 
> contains all of the required logic (many thanks to Richard Biener and 
> Uros Bizjak), but this is currently only to enable scalar-to-vector 
> (STV) synthesis of min/max instructions.  This change enables this 
> functionality for all TARGET_CMOVE architectures for SImode, HImode and 
> DImode.
>
> My first attempt to support "min(reg,reg)" as an instruction revealed 
> why this functionality isn't already enabled: In PR rtl-optimization 
> 91154 we end up generating a cmp instead of a test in 
> gcc.target/i386/minmax-3.c which has poor performance on AMD Opteron.  
> My solution to this is to actually support "min(reg,general_operand)" 
> allowing us to inspect any immediate constant at the point this 
> operation is split.  This allows us to use "test" instructions for min/max 
> against 0, 1 and -1.
> As an added benefit it allows us to CSE large immediate constants, 
> reducing code size.
>
> Previously, "int foo(int x) { return max(x,12345); }" would generate:
>
> foo:	cmpl	$12345, %edi
> 	movl	$12345, %eax
> 	cmovge	%edi, %eax
> 	ret
>
> with this patch we instead generate:
>
> foo:	movl	$12345, %eax
> 	cmpl	%eax, %edi
> 	cmovge	%edi, %eax
> 	ret
>
>
> I've also included a peephole2 to avoid the "movl $0,%eax" 
> instructions being produced by the register allocator.  Materializing 
> constants as late as possible reduces register pressure, but for 
> const0_rtx on x86, it's preferable to use "xor" by moving this set 
> from between a flags setting operation and its use.  This should also 
> help instruction macro fusion on some microarchitectures, where 
> test/cmp and the following instruction can sometimes be combined.
>
> Previously, "int foo(int x) { return max(x,0); }" would generate:
>
> foo:	testl	%edi, %edi
> 	movl	$0, %eax
> 	cmovns	%edi, %eax
> 	ret
>
> with this patch we instead generate:
>
> foo:	xorl	%eax, %eax
> 	testl	%edi, %edi
> 	cmovns	%edi, %eax
> 	ret
>
> The benefits of having min/max explicit at the RTL level can be seen 
> from compiling the following example with "-O2 -fno-tree-reassoc".
>
>
> #define max(a,b) (((a) > (b))? (a) : (b))
>
> int foo(int x)
> {
>   int y = max(x,5);
>   return max(y,10);
> }
>
> where GCC currently produces:
>
> foo:	cmpl	$5, %edi
> 	movl	$5, %eax
> 	movl	$10, %edx
> 	cmovge	%edi, %eax
> 	cmpl	$10, %eax
> 	cmovl	%edx, %eax
> 	ret
>
> and with this patch it instead now produces:
>
> foo:	movl	$10, %eax
> 	cmpl	%eax, %edi
> 	cmovge	%edi, %eax
> 	ret
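The single-comparison output in this example relies on max (max (x, 5), 10) being equal to max (x, 10) for every int x: for x >= 10 both yield x, and otherwise max (x, 5) is below 10, so both yield 10. The C sketch below spot-checks that identity over a few ranges, including both ends of the int range; the function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define max(a,b) (((a) > (b))? (a) : (b))

/* The two-step form from the example above.  */
static int
foo_two_step (int x)
{
  int y = max (x, 5);
  return max (y, 10);
}

/* The single-comparison form the patched compiler emits.  */
static int
foo_one_step (int x)
{
  return max (x, 10);
}

/* Check that both forms agree for every int in [LO, HI].  A wider loop
   counter avoids overflow when HI is INT_MAX.  */
static bool
agree_on (int lo, int hi)
{
  for (long long x = lo; x <= hi; x++)
    if (foo_two_step ((int) x) != foo_one_step ((int) x))
      return false;
  return true;
}
```

Keeping MAX_EXPR visible in RTL is what lets this kind of constant-on-constant simplification happen after expansion, instead of being hidden behind two separate compare-and-cmov sequences.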
>
>
> The original motivation was from looking at a performance critical 
> function in a quantum mechanics code, that performed MIN_EXPR and 
> MAX_EXPR of the same arguments (effectively a two-element sort), where 
> GCC was performing the comparison twice.  I'd hoped that it might be 
> possible to fuse these together, perhaps in combine, but this patch is 
> an intermediate step towards that goal.
>
> This patch has been tested on x86_64-pc-linux-gnu with a make 
> bootstrap followed by make -k check with no new regressions.
> Ok for mainline?
>
>
> 2020-07-30  Roger Sayle  
>
> * config/i386/i386.md (MAXMIN_IMODE): No longer needed.
> (3):  Support S

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > For x86, for example, even though the GPR registers are 64-bit, we only 
> > need to zero the lower 32-bit. etc.
> 
> That's an optimization, yes.

But, does the code need to care?
If one compiles:
void
foo ()
{
  register unsigned long long a __asm ("rax");
  register unsigned long long b __asm ("rsi");
  register unsigned long long c __asm ("r8");
  register unsigned long long d __asm ("r9");
  a = 0;
  b = 0;
  c = 0;
  d = 0;
  asm volatile ("" : : "r" (a), "r" (b), "r" (c), "r" (d));
}
then the backend uses *movdi_xor patterns which are emitted
as xorl instructions (i.e. just 32-bit).  If you need to emit them
at a spot where the flags register is or might be live, then
*movdi_internal is used instead, but that one will also emit
a 32-bit movl $0, %r8d etc. instruction (because (const_int 0) is
zero extended 32-bit integer).

Jakub
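To see why a 32-bit clear is enough for a 64-bit register, here is a minimal sketch, assuming x86-64 and GCC-style extended asm (the function names are hypothetical). Both instructions write only the 32-bit half of the register selected by the "k" operand modifier, yet the whole 64-bit value reads back as zero, because writes to a 32-bit general register zero-extend into the full register. The xorl form clobbers the flags, which is why a movl form is needed where the flags may be live.

```c
#include <stdint.h>

/* xorl on the 32-bit view (%k0, e.g. %eax for %rax): clears all 64 bits
   but clobbers the flags.  */
uint64_t
clear_with_xorl (uint64_t v)
{
#if defined (__x86_64__)
  __asm__ ("xorl %k0, %k0" : "+r" (v) : : "cc");
#else
  v = 0;  /* fallback so the sketch still compiles on other targets */
#endif
  return v;
}

/* movl $0 on the 32-bit view: also clears all 64 bits, flags untouched.  */
uint64_t
clear_with_movl (uint64_t v)
{
#if defined (__x86_64__)
  __asm__ ("movl $0, %k0" : "+r" (v));
#else
  v = 0;
#endif
  return v;
}
```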



Re: [PATCH] testsuite: Update some vect cases for partial vectors

2020-08-06 Thread Richard Sandiford
"Kewen.Lin"  writes:
>>> +# Return true if loops using partial vectors are supported.
>>> +
>>> +proc check_effective_target_vect_partial_vectors { } {
>>> +    return [expr { [check_effective_target_vect_partial_vectors_usage_1]
>>> +                   || [check_effective_target_vect_partial_vectors_usage_2] }]
>>> +}
>>> +
>>> +# Return true if loops using partial vectors are supported and the default
>>> +# value of --param=vect-partial-vector-usage is 1.
>>> +
>>> +proc check_effective_target_vect_partial_vectors_usage_1 { } {
>>> +    return 0
>>> +}
>>> +
>>> +# Return true if loops using partial vectors are supported and the default
>>> +# value of --param=vect-partial-vector-usage is 2.
>>> +
>>> +proc check_effective_target_vect_partial_vectors_usage_2 { } {
>>> +    return [expr { [check_effective_target_vect_fully_masked] }]
>>> +}
>>> +
>> 
>> Could we auto-detect this?  What we really care about isn't the default,
>> but what's currently being tested.
>
> Yeah, the comments were confusing, its intent is to check which targets
> support partial vectors and which usage to be used.
>
> How about to update them like:
>
> "Return true if loops using partial vectors are supported and usage kind is
> 1/2".

I wasn't really commenting on the comment so much as the intent.
It should be possible to run the testsuite with:

  --target_board unix/--param=vect-partial-vector-usage=1

and get the right results.

>> E.g. maybe use check_compile to run gcc with “-Q --help=params” and an
>> arbitrary output type (probably assembly).  Then use “regexp” on the
>> lines to parse the --param=vect-partial-vector-usage value.  At that
>> point it would be worth caching the result.
>
> Now the default value of this parameter is 2, even for those targets which
> don't have the supports with partial vectors.  Since we will get the value
> 2 on those unsupported targets, it looks like we have to set it manually?

I think that just means we want:

vect_len_load_store
  the len_load_store equivalent of vect_fully_masked, i.e. whether
  the target supports len load/store (regardless of whether the
  --param enables it)

vect_partial_vectors
  (vect_fully_masked || vect_len_load_store) && param != 0

vect_partial_vectors_usage_1
  (vect_fully_masked || vect_len_load_store) && param == 1

vect_partial_vectors_usage_2
  (vect_fully_masked || vect_len_load_store) && param == 2
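The four predicates above are plain boolean combinations. As a minimal C model of the proposed logic (not the Tcl in target-supports.exp; struct vect_target and its fields are hypothetical), note that a target with neither full masking nor len load/store reports false for every predicate even when the --param value is 2:

```c
#include <stdbool.h>

/* Hypothetical inputs: what the target supports, plus the value of
   --param=vect-partial-vector-usage in effect for this test run
   (e.g. as set through --target_board).  */
struct vect_target
{
  bool fully_masked;     /* vect_fully_masked */
  bool len_load_store;   /* vect_len_load_store */
  int usage_param;       /* 0, 1 or 2 */
};

static bool
supports_partial_vectors (const struct vect_target *t)
{
  return t->fully_masked || t->len_load_store;
}

bool
vect_partial_vectors (const struct vect_target *t)
{
  return supports_partial_vectors (t) && t->usage_param != 0;
}

bool
vect_partial_vectors_usage_1 (const struct vect_target *t)
{
  return supports_partial_vectors (t) && t->usage_param == 1;
}

bool
vect_partial_vectors_usage_2 (const struct vect_target *t)
{
  return supports_partial_vectors (t) && t->usage_param == 2;
}
```

Keeping hardware support and the --param value as separate inputs is what lets the same testsuite give correct answers under --param=vect-partial-vector-usage=1.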

Thanks,
Richard


Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Marc Glisse

On Thu, 6 Aug 2020, Christophe Lyon wrote:


2020-08-05  Marc Glisse  

PR tree-optimization/95906
PR target/70314
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
(op (c ? a : b)): Update to match the new transformations.

* gcc.dg/tree-ssa/andnot-2.c: New file.
* gcc.dg/tree-ssa/pr95906.c: Likewise.
* gcc.target/i386/pr70314.c: Likewise.
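Two of the new transformations listed above are lane-wise identities on selects: distributing an operation into both arms, (c ? a : b) op d to c ? (a op d) : (b op d), and merging nested selects that share an else arm, c1 ? (c2 ? a : b) : b to (c1 & c2) ? a : b. The sketch below checks them on plain C arrays standing in for vectors. It is an illustration, not the match.pd code; the choice of + as the operation and of 4 int lanes are assumptions, and vsel, vadd and the check_* helpers are hypothetical.

```c
#include <stdbool.h>

#define LANES 4

/* Lane-wise select, i.e. a VEC_COND_EXPR: r[i] = c[i] ? a[i] : b[i].  */
static void
vsel (const bool c[LANES], const int a[LANES], const int b[LANES],
      int r[LANES])
{
  for (int i = 0; i < LANES; i++)
    r[i] = c[i] ? a[i] : b[i];
}

static void
vadd (const int a[LANES], const int b[LANES], int r[LANES])
{
  for (int i = 0; i < LANES; i++)
    r[i] = a[i] + b[i];
}

/* (c ? a : b) + d  ==  c ? (a + d) : (b + d), lane by lane.  */
bool
check_distribute (const bool c[LANES], const int a[LANES],
                  const int b[LANES], const int d[LANES])
{
  int t[LANES], lhs[LANES], ad[LANES], bd[LANES], rhs[LANES];
  vsel (c, a, b, t);
  vadd (t, d, lhs);
  vadd (a, d, ad);
  vadd (b, d, bd);
  vsel (c, ad, bd, rhs);
  for (int i = 0; i < LANES; i++)
    if (lhs[i] != rhs[i])
      return false;
  return true;
}

/* c1 ? (c2 ? a : b) : b  ==  (c1 & c2) ? a : b, lane by lane.  */
bool
check_nested (const bool c1[LANES], const bool c2[LANES],
              const int a[LANES], const int b[LANES])
{
  int inner[LANES], lhs[LANES], rhs[LANES];
  bool both[LANES];
  vsel (c2, a, b, inner);
  vsel (c1, inner, b, lhs);
  for (int i = 0; i < LANES; i++)
    both[i] = c1[i] && c2[i];
  vsel (both, a, b, rhs);
  for (int i = 0; i < LANES; i++)
    if (lhs[i] != rhs[i])
      return false;
  return true;
}
```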



I think this patch is causing several ICEs on arm-none-linux-gnueabihf
--with-cpu cortex-a9 --with-fpu neon-fp16:
 Executed from: gcc.c-torture/compile/compile.exp
   gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
   gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
 Executed from: gcc.dg/dg.exp
   gcc.dg/pr87746.c (internal compiler error)
 Executed from: gcc.dg/tree-ssa/tree-ssa.exp
   gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)


I tried a cross from x86_64-linux with current master

.../configure --target=arm-none-linux-gnueabihf --enable-languages=c,c++ 
--with-system-zlib --disable-nls --with-cpu=cortex-a9 --with-fpu=neon-fp16
make

it stops at some point with an error, but I have xgcc and cc1 in
build/gcc.

I copied 2 of the testcases and compiled

./xgcc pr87746.c -Ofast -S -B.
./xgcc -O3 -fdump-tree-ifcvt-details-blocks-details ifc-cd.c -S -B.

without getting any ICE.

Is there a machine on the compile farm where this is easy to reproduce?
Or could you attach the .optimized dump that corresponds to the
backtrace below? It looks like we end up with a comparison with an
unexpected return type.


 Executed from: gcc.dg/vect/vect.exp
   gcc.dg/vect/pr59591-1.c (internal compiler error)
   gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
   gcc.dg/vect/pr86927.c (internal compiler error)
   gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
   gcc.dg/vect/slp-cond-5.c (internal compiler error)
   gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
   gcc.dg/vect/vect-23.c (internal compiler error)
   gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
   gcc.dg/vect/vect-24.c (internal compiler error)
   gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
   gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
   gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal compiler error)

Backtrace for gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
during RTL pass: expand
/gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal compiler error: in do_store_flag, at expr.c:12259
0x8feb26 do_store_flag
   /gcc/expr.c:12259
0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
   /gcc/expr.c:9617
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
   /gcc/expr.c:10159
0x91174e expand_expr
   /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
   /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
   /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
   /gcc/expr.c:10159
0x91174e expand_expr
   /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
   /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
   /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
   /gcc/expr.c:10159
0x91174e expand_expr
   /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
   /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
   /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
   /gcc/expr.c:10159
0x91174e expand_expr
   /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
   /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
   /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
   /gcc/expr.c:10159
0x91174e expand_expr
   /gcc/expr.h:282

Christophe


--
Marc Glisse


RE: [PATCH] arm: Clear canary value after stack_protect_test [PR96191]

2020-08-06 Thread Kyrylo Tkachov
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 05 August 2020 15:33
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Clear canary value after stack_protect_test [PR96191]
> 
> The stack_protect_test patterns were leaving the canary value in the
> temporary register, meaning that it was often still in registers on
> return from the function.  An attacker might therefore have been
> able to use it to defeat stack-smash protection for a later function.
> 
> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
> I tested the thumb1.md part using arm-linux-gnueabi with the
> test flags -march=armv5t -mthumb.  OK for trunk and branches?
> 
> As I mentioned in the corresponding aarch64 patch, this is needed
> to make arm conform to GCC's current -fstack-protector implementation.
> However, I think we should reconsider whether the zeroing is actually
> necessary and what it's actually protecting against.  I'll send a
> separate message about that to gcc@.  But since the port isn't even
> self-consistent (the *set patterns do clear the registers), I think
> we should do this first rather than wait for any outcome of that
> discussion.

That makes sense.
Ok.
Thanks,
Kyrill

> 
> Richard
> 
> 
> gcc/
>   PR target/96191
>   * config/arm/arm.md (arm_stack_protect_test_insn): Zero out
>   operand 2 after use.
>   * config/arm/thumb1.md (thumb1_stack_protect_test_insn):
> Likewise.
> 
> gcc/testsuite/
>   * gcc.target/arm/stack-protector-1.c: New test.
>   * gcc.target/arm/stack-protector-2.c: Likewise.
> ---
>  gcc/config/arm/arm.md |  6 +-
>  gcc/config/arm/thumb1.md  |  8 ++-
>  .../gcc.target/arm/stack-protector-1.c| 63 +++
>  .../gcc.target/arm/stack-protector-2.c|  6 ++
>  4 files changed, 78 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-2.c
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index a6a31f8f4ef..dd13c77e889 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -9320,6 +9320,8 @@ (define_insn_and_split
> "*stack_protect_combined_test_insn"
>[(set_attr "arch" "t1,32")]
>  )
> 
> +;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
> +;; canary value does not live beyond the end of this sequence.
>  (define_insn "arm_stack_protect_test_insn"
>[(set (reg:CC_Z CC_REGNUM)
>   (compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand"
> "m,m")
> @@ -9329,8 +9331,8 @@ (define_insn "arm_stack_protect_test_insn"
> (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
> (clobber (match_dup 2))]
>"TARGET_32BIT"
> -  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
> -  [(set_attr "length" "8,12")
> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0\;mov\t%2, #0"
> +  [(set_attr "length" "12,16")
> (set_attr "conds" "set")
> (set_attr "type" "multiple")
> (set_attr "arch" "t,32")]
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 24861635fa5..0ff819090d9 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -2020,6 +2020,8 @@ (define_insn_and_split "thumb_eh_return"
>[(set_attr "type" "mov_reg")]
>  )
> 
> +;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
> +;; canary value does not live beyond the end of this sequence.
>  (define_insn "thumb1_stack_protect_test_insn"
>[(set (match_operand:SI 0 "register_operand" "=&l")
>   (unspec:SI [(match_operand:SI 1 "memory_operand" "m")
> @@ -2027,9 +2029,9 @@ (define_insn "thumb1_stack_protect_test_insn"
>UNSPEC_SP_TEST))
> (clobber (match_dup 2))]
>"TARGET_THUMB1"
> -  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
> -  [(set_attr "length" "8")
> -   (set_attr "conds" "set")
> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0\;movs\t%2, #0"
> +  [(set_attr "length" "10")
> +   (set_attr "conds" "clob")
> (set_attr "type" "multiple")]
>  )
> 
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-1.c
> b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
> new file mode 100644
> index 000..b03ea14c4e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
> @@ -0,0 +1,63 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fstack_protector } */
> +/* { dg-options "-fstack-protector-all -O2" } */
> +
> +extern volatile long *stack_chk_guard_ptr;
> +
> +volatile long *
> +get_ptr (void)
> +{
> +  return stack_chk_guard_ptr;
> +}
> +
> +void __attribute__ ((noipa))
> +f (void)
> +{
> +  volatile int x;
> +  x = 1;
> +  x += 1;
> +}
> +
> +#define CHECK(REG) "\tcmp\tr0, " #REG "\n\tbeq\t1f\n"
> +
> +asm (
> +".data\n"
> +".align  3\n"
> +".globl  stack_chk_guard_ptr\n"

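The data flow of the patched stack_protect_test sequence can be modelled in C, purely as an illustration: canary_intact and guard are hypothetical stand-ins for the test pattern and __stack_chk_guard. C gives no control over which registers hold these values, so a C version cannot enforce the clearing; that is why the patch does it inside the insn patterns themselves and marks them as not to be split.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __stack_chk_guard.  */
static volatile uintptr_t guard = 0x5a5aa5a5u;

/* Returns true if SAVED (the copy stored in the frame on entry) still
   matches the guard.  Loosely mirrors the patched sequence: load both
   values, XOR them (zero means intact), then overwrite the temporary
   that held a copy of the canary so it does not outlive the check.  */
bool
canary_intact (uintptr_t saved)
{
  uintptr_t live = guard;
  volatile uintptr_t copy = saved;
  uintptr_t diff = live ^ copy;
  copy = 0;   /* analogous to the added "mov %2, #0" */
  return diff == 0;
}
```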
Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 08:35 +0200, François Dumont wrote:

On 17/07/20 1:35 pm, Jonathan Wakely wrote:

I really like the general idea of getting rid of some of the
complexity and not supporting infinite customization. But we can do
that without changing mangled names of the _Hashtable specializations.



I didn't think we needed to keep ABI compatibility for extensions.


These aren't extensions though, they're part of std::unordered_map
etc.

Just because something like _Vector_base is an internal type rather
than something defined in the standard doesn't mean we can just change
its ABI, because that would change the ABI of std::vector. It's the same
here.

Changing _Hashtable affects all users of std::unordered_map etc.




RE: [PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-06 Thread Kyrylo Tkachov
Hi Omar,

> -Original Message-
> From: Omar Tahir 
> Sent: 05 August 2020 12:42
> To: Kyrylo Tkachov ; ni...@redhat.com;
> Ramana Radhakrishnan ; Richard
> Earnshaw ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH 2/5][Arm] New pattern for CSINV instructions
> 
> Hi Kyrill,
> 
> > -/* Only thumb1 can't support conditional execution, so return true if
> > -   the target is not thumb1.  */
> > static bool
> >
> >
> > Functions should have comments in GCC. Can you please write something
> describing the new logic of the function.
> >
> > arm_have_conditional_execution (void)
> > {
> > -  return !TARGET_THUMB1;
> > +  bool has_cond_exec, enable_ifcvt_trans;
> > +
> > +  /* Only THUMB1 cannot support conditional execution. */
> > +  has_cond_exec = !TARGET_THUMB1;
> > +
> > +  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
> > + transformations before reload. */
> > +  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
> > +
> > +  /* The ifcvt transformations are only turned on if we return false. */
> > +  return has_cond_exec && !enable_ifcvt_trans;
> >
> > I don't think that comment is very useful. Perhaps "Enable ifcvt
> transformations only if..."
> >
> > }
> 
> Fixed, let me know if the new comments are a bit clearer now.
> 
> > +(define_constraint "Z"
> > +  "@internal
> > +   Integer constant zero."
> > +  (match_test "op == const0_rtx"))
> >
> >
> > We're usually wary of adding more constraints unless necessary as it gets
> complicated to read patterns quickly (especially once we get into multi-letter
> constraints).
> > I think you can reuse the existing "Pz" constraint for your purposes.
> 
> Yes Pz works, I'll replace Z with Pz in the other patches as well. In patch 5 I
> introduce UM (-1) and U1 (1); I don't think there's any existing combination
> of constraints that can be used instead.

Great!

> 
> >
> > Ok with those changes.
> > If you'd like to commit it yourself please apply for write access at
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my email address
> from MAINTAINERS as the approver.
> 
> Excellent, thanks. If the other three patches are okay, I'll commit them as
> well?

Please wait for review before committing them, but once they're ok'ed feel free 
to push them (please make sure proper testing is done so that trunk is not left 
in a broken state).

Thanks,
Kyrill

> 
> Thanks,
> Omar
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index dac9a6fb5c4..e1bb2db9c8a 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29833,12 +29833,23 @@ arm_frame_pointer_required (void)
>return false;
>  }
> 
> -/* Only thumb1 can't support conditional execution, so return true if
> -   the target is not thumb1.  */
> +/* Implement the TARGET_HAVE_CONDITIONAL_EXECUTION hook.
> +   All modes except THUMB1 have conditional execution.
> +   If we have conditional arithmetic, return false before reload to
> +   enable some ifcvt transformations. */
>  static bool
>  arm_have_conditional_execution (void)
>  {
> -  return !TARGET_THUMB1;
> +  bool has_cond_exec, enable_ifcvt_trans;
> +
> +  /* Only THUMB1 cannot support conditional execution. */
> +  has_cond_exec = !TARGET_THUMB1;
> +
> +  /* Enable ifcvt transformations if we have conditional arithmetic, but only
> + before reload. */
> +  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
> +
> +  return has_cond_exec && !enable_ifcvt_trans;
>  }
> 
>  /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 30e1d6dc994..d67c91796e4 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */
> 
>  #define TARGET_CRC32 (arm_arch_crc)
> 
> +/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
> +   csinv, etc. */
> +#define TARGET_COND_ARITH (arm_arch8_1m_main)
> +
>  /* The following two macros concern the ability to execute coprocessor
> instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are
> currently
> only ever tested when we know we are generating for VFP hardware; we
> need
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 981eec520ba..2144520829c 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -485,6 +485,18 @@
>(and (match_operand 0 "expandable_comparison_operator")
> (match_test "maybe_get_arm_condition_code (op) != ARM_NV")))
> 
> +(define_special_predicate "arm_comparison_operation"
> +  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
> + ordered,unlt,unle,unge,ungt")
> +{
> +  if (XEXP (op, 1) != const0_rtx)
> +    return false;
> +  rtx op0 = XEXP (op, 0);
> +  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
> +    return false;
> +  return maybe_get_arm_condition_code (op) != ARM_NV;
> +})
> +
>  (define_spe

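The decision made by arm_have_conditional_execution in the CSINV patch above can be tabulated by lifting its three inputs into parameters. In this C sketch, have_conditional_execution and its parameters are hypothetical stand-ins for the hook, TARGET_THUMB1, TARGET_COND_ARITH and reload_completed. The hook reports false for a 32-bit target only when conditional arithmetic is available and reload has not yet completed:

```c
#include <stdbool.h>

bool
have_conditional_execution (bool thumb1, bool cond_arith,
                            bool reload_completed)
{
  /* Only Thumb-1 lacks conditional execution.  */
  bool has_cond_exec = !thumb1;
  /* With conditional arithmetic (Armv8.1-M Mainline), claim "no
     conditional execution" before reload so that ifcvt attempts its
     non-predicated transformations.  */
  bool enable_ifcvt_trans = cond_arith && !reload_completed;
  return has_cond_exec && !enable_ifcvt_trans;
}
```

After reload the hook reverts to the old answer, so post-reload if-conversion still sees conditional execution as available.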
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 6, 2020 at 10:42 AM Jakub Jelinek  wrote:
>
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > > For x86, for example, even though the GPR registers are 64-bit, we only 
> > > need to zero the lower 32-bit. etc.
> >
> > That's an optimization, yes.
>
> But, does the code need to care?

No, because this is only an implementation detail. The RTL code should
still use DImode clears. These are emitted using 32-bit insns,
implicitly zero-extended to 64 bits, so in effect they implement DImode
clears.

Uros.

> If one compiles:
> void
> foo ()
> {
>   register unsigned long long a __asm ("rax");
>   register unsigned long long b __asm ("rsi");
>   register unsigned long long c __asm ("r8");
>   register unsigned long long d __asm ("r9");
>   a = 0;
>   b = 0;
>   c = 0;
>   d = 0;
>   asm volatile ("" : : "r" (a), "r" (b), "r" (c), "r" (d));
> }
> then the backend uses *movdi_xor patterns which are emitted
> as xorl instructions (i.e. just 32-bit).  If you need to emit them
> at a spot where the flags register is or might be live, then
> *movdi_internal is used instead, but that one will also emit
> a 32-bit movl $0, %r8d etc. instruction (because (const_int 0) is
> zero extended 32-bit integer).
>
> Jakub
>


RE: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved registers with CMSE

2020-08-06 Thread Kyrylo Tkachov
Hi Andre,

> -Original Message-
> From: Andre Vieira (lists) 
> Sent: 06 July 2020 15:31
> To: gcc-patches@gcc.gnu.org; Christophe Lyon 
> Cc: Kyrylo Tkachov 
> Subject: Re: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee
> saved registers with CMSE
> 
> 
> On 30/06/2020 14:50, Andre Vieira (lists) wrote:
> >
> > On 29/06/2020 11:15, Christophe Lyon wrote:
> >> On Mon, 29 Jun 2020 at 10:56, Andre Vieira (lists)
> >>  wrote:
> >>>
> >>> On 23/06/2020 21:52, Christophe Lyon wrote:
>  On Tue, 23 Jun 2020 at 15:28, Andre Vieira (lists)
>   wrote:
> > On 23/06/2020 13:10, Kyrylo Tkachov wrote:
> >>> -Original Message-
> >>> From: Andre Vieira (lists) 
> >>> Sent: 22 June 2020 09:52
> >>> To: gcc-patches@gcc.gnu.org
> >>> Cc: Kyrylo Tkachov 
> >>> Subject: [PATCH][GCC][Arm] PR target/95646: Do not clobber
> >>> callee saved
> >>> registers with CMSE
> >>>
> >>> Hi,
> >>>
> >>> As reported in bugzilla when the -mcmse option is used while
> >>> compiling
> >>> for size (-Os) with a thumb-1 target the generated code will
> >>> clear the
> >>> registers r7-r10. These however are callee saved and should be
> >>> preserved
> >>> across ABI boundaries. The reason this happens is because these
> >>> registers are made "fixed" when optimising for size with Thumb-1
> >>> in a
> >>> way to make sure they are not used, as pushing and popping
> >>> hi-registers
> >>> requires extra moves to and from LO_REGS.
> >>>
> >>> To fix this, this patch uses 'callee_saved_reg_p', which
> >>> accounts for
> >>> this optimisation, instead of 'call_used_or_fixed_reg_p'. Be
> >>> aware of
> >>> 'callee_saved_reg_p''s definition, as it does still take call used
> >>> registers into account, which aren't callee_saved in my opinion,
> >>> so it
> >>> is rather a misnomer; it works to our advantage here though, as it
> >>> does
> >>> exactly what we need.
> >>>
> >>> Regression tested on arm-none-eabi.
> >>>
> >>> Is this OK for trunk? (Will eventually backport to previous
> >>> versions if
> >>> stable.)
> >> Ok.
> >> Thanks,
> >> Kyrill
> > As I was getting ready to push this I noticed I didn't add any
> > skip-ifs
> > to prevent this failing with specific target options. So here's a new
> > version with those.
> >
> > Still OK?
> >
>  Hi,
> 
>  This is not sufficient to skip arm-linux-gnueabi* configs built with
>  non-default cpu/fpu.
> 
>  For instance, with arm-linux-gnueabihf --with-cpu=cortex-a9
>  --with-fpu=neon-fp16 --with-float=hard
>  I see:
>  FAIL: gcc.target/arm/pr95646.c (test for excess errors)
>  Excess errors:
>  cc1: error: ARMv8-M Security Extensions incompatible with selected
> FPU
>  cc1: error: target CPU does not support ARM mode
> 
>  and the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os
> >>> Resending as I don't think my earlier one made it to the lists
> >>> (sorry if
> >>> you are receiving this double!)
> >>>
> >>> I'm not following this, before I go off and try to reproduce it,
> >>> what do
> >>> you mean by 'the testcase is compiled with -mcpu=cortex-m23 -mcmse
> >>> -Os'?
> >>> These are the options you are seeing in the log file? Surely they
> >>> should
> >>> override the default options? Only thing I can think of is this might
> >>> need an extra -mfloat-abi=soft to make sure it overrides the default
> >>> float-abi.  Could you give that a try?
> >> No it doesn't make a difference alone.
> >>
> >> I also had to add:
> >> -mfpu=auto (that clears the above warning)
> >> -mthumb otherwise we now get cc1: error: target CPU does not support
> >> ARM mode
> >>
> >> Looks like some effective-target machinery is needed
> > So I had a look at this,  I was pretty sure that -mfloat-abi=soft
> > overwrote -mfpu=<>, which by and large it does, as in no FP instructions
> > will be generated but the error you see only checks for the right
> > number of FP registers. Which doesn't check whether
> > 'TARGET_HARD_FLOAT' is set or not. I'll fix this too and use the
> > check-effective-target for armv8-m.base for this test as it is indeed
> > a better approach than my bag of skip-ifs. I'm testing it locally to
> > make sure my changes don't break anything.
> >
> > Cheers,
> > Andre
> Hi,
> 
> Sorry for the delay. So I changed the test to use the effective-target
> machinery as you suggested and I also made sure that you don't get the
> "ARMv8-M Security Extensions incompatible with selected FPU" when
> -mfloat-abi=soft.
> Further changed 'asm' to '__asm__' to avoid failures with '-std=' options.
> 
> Regression tested on arm-none-eabi.
> @Christophe: could you test this for your configuration, shouldn't fail
> anymore!
> 
> Is this OK for trunk?

Sorry for the delay, this is ok.
Thanks,
Kyrill

> 
> Cheers,
> Andre
> 
> gcc/ChangeLog:

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 05, 2020 at 04:31:08PM -0600, Martin Sebor via Gcc-patches wrote:
> I've always found the second argument to __builtin_object_size
> confusing for types above 1.  I don't see anything wrong in
> the diff but I believe the most useful results are with type 1
> for string functions and type 0 for raw memory functions like
> memcpy (that's what _FORTIFY_SOURCE uses for the two sets of
> functions).  In type 2 when the result is zero it means one of
> two things: either the size of the array couldn't be determined
> or it really is zero.  That's less than helpful in cases like:
> 
>   char a[8];
>   strcpy (a + 8, s);
> 
> where it prevents detecting the buffer overflow.

I don't know what is confusing about it.
With the 0/1 arguments bos returns an upper bound for the object size
(and the don't know value is the maximum in that case, i.e. (size_t)-1),
while with 2/3 arguments bos returns a lower bound for the object size
(and thus the don't know value is the minimum value, i.e. 0).
The 2/3 modes are obviously not something you want to use in strcpy etc.
implementation, in those cases you want to abort the program only when
it is guaranteed to be invalid, i.e. when it will certainly overflow
the available size in any case, while with the 2/3 modes it would abort already
if there is a possibility the object might not be big enough.
One can e.g. use both modes to check if the object is known to have exactly
a particular size, when
__builtin_object_size (ptr, 0) == __builtin_object_size (ptr, 2)
and the bos returns say 25, then you know it is exactly 25 bytes.
E.g. if one has:
  ptr = flag ? malloc (32) : malloc (64);
  x[0] = __builtin_object_size (ptr, 0);
  x[1] = __builtin_object_size (ptr, 2);
then x[0] will be 64 as the maximum and x[1] will be 32 as the minimum (of course
unless flag can be folded to constant, then both would be the same depending
on to which constant it is folded).
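The exact-size check described above can be packaged as a small pattern. A sketch, assuming GCC's (or Clang's) __builtin_object_size; the BOS_* macros, struct bos_bounds and alloc_bounds are hypothetical names. The values reported depend on the optimization level, so the only thing relied on in general is that the lower bound never exceeds the upper bound:

```c
#include <stddef.h>
#include <stdlib.h>

/* Mode 0 gives an upper bound ("don't know" is (size_t) -1);
   mode 2 gives a lower bound ("don't know" is 0).  */
#define BOS_MAX(p) __builtin_object_size ((p), 0)
#define BOS_MIN(p) __builtin_object_size ((p), 2)

/* Hypothetical helper: true if both bounds agree, i.e. the remaining
   size of the object P points into is known exactly.  */
#define BOS_EXACT(p) (BOS_MAX (p) == BOS_MIN (p))

struct bos_bounds
{
  size_t max, min;
};

/* The malloc example from above.  With optimization and a non-constant
   FLAG, GCC is expected to report max 64 and min 32; without
   optimization both are typically the "don't know" values.  */
struct bos_bounds
alloc_bounds (int flag)
{
  char *ptr = flag ? malloc (32) : malloc (64);
  struct bos_bounds b = { BOS_MAX (ptr), BOS_MIN (ptr) };
  free (ptr);
  return b;
}
```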

Jakub



[PATCH] tree-optimization/96483 - fix ICE in PRE with POLY_INT_CST

2020-08-06 Thread Richard Biener
This adds a missing case for PRE expression re-materialization.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-06  Richard Biener  

PR tree-optimization/96483
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Handle
POLY_INT_CST.
---
 gcc/tree-ssa-pre.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 7d67305bf4b..0a94f4e3355 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -2644,6 +2644,7 @@ create_component_ref_by_pieces_1 (basic_block block, 
vn_reference_t ref,
   }
 case STRING_CST:
 case INTEGER_CST:
+case POLY_INT_CST:
 case COMPLEX_CST:
 case VECTOR_CST:
 case REAL_CST:
-- 
2.26.2


Re: std:vec for classes with constructor?

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 06:16 +0100, Richard Sandiford wrote:

Andrew MacLeod via Gcc-patches  writes:

On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:

On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they
aren't in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with
calloc.
Ummm... I did not object but I will save the URL of this message in the
archive so that I can wave it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be a generally accepted thing to do when someone wants to have
a vector of constructed items?

It's definitely not what we want. You have to find another solution to this 
problem.

Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of fields
initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor, std::vec
does.


I realise you weren't claiming otherwise, but: that could be fixed :-)


It really should be.

Artificial limitations like that are just a booby trap for the unwary.


I quizzed some libstdc++ folks, and there has been a lot of
optimizations done on std::vec over the last few years.  They think it's
pretty good now, and we were encouraged to use it.

We can visit the question tho...  What is the rationale for not using
std::vec in the compiler?  We currently use std::swap, std::pair,
std::map, std::sort, and a few others.
Is there some aspect of using std::vec I am not aware of that makes it
something we need to avoid?


One reason to prefer vec<> for general interfaces is that it
works with auto_vec<…, N>, making it possible to pre-allocate a
reasonably-sized buffer on the stack without needing a round-trip
through the allocators.

FWIW, that isn't simply a GCC thing.  LLVM (which is obviously much
more C++-intensive than GCC) still makes heavy use of SmallVector for
automatic variables.  And the reason we have things like memory_block.h
is that malloc did used to show up high in profiles.


Yes, LLVM's SmallVector is very useful. You can achieve a similar
thing with a custom allocator in std::vector, but it's more cumbersome
and it alters the type from std::vector<T> to std::vector<T, Alloc>.

The beauty of the LLVM design is that the common base class for
SmallVector<T, N> is the same for all N, so you can pass it to APIs
that don't care about the size and just work with the base interface.
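The small-buffer idea behind auto_vec<…, N> and SmallVector, including a base type that does not depend on N, can be sketched in C. This is a minimal illustration, not GCC's or LLVM's implementation; struct ivec, IVEC_DECL, ivec_push, ivec_sum and ivec_release are hypothetical names, and the element type is fixed to int for brevity. Elements live in an inline buffer until it fills, and only then is the heap involved:

```c
#include <stdlib.h>
#include <string.h>

/* The shared "base" that functions accept, whatever the inline
   capacity; loosely analogous to SmallVectorImpl<T>.  */
struct ivec
{
  int *data;             /* inline buffer or heap block */
  size_t len, cap;
  const int *inline_buf; /* tells us whether DATA is on the heap */
};

/* Hypothetical macro: declare V with N (> 0) ints of inline storage,
   e.g. on the stack, similar in spirit to auto_vec<int, N>.  */
#define IVEC_DECL(v, n)                 \
  int v##_storage[n];                   \
  struct ivec v = { v##_storage, 0, (n), v##_storage }

/* Append X, spilling to the heap only when the buffer is full.
   Returns 0 on allocation failure.  */
int
ivec_push (struct ivec *v, int x)
{
  if (v->len == v->cap)
    {
      size_t ncap = v->cap ? 2 * v->cap : 4;
      int *p;
      if (v->data == v->inline_buf)
        {
          /* First spill: move the inline elements to the heap.  */
          p = malloc (ncap * sizeof *p);
          if (p && v->len)
            memcpy (p, v->data, v->len * sizeof *p);
        }
      else
        p = realloc (v->data, ncap * sizeof *p);
      if (!p)
        return 0;
      v->data = p;
      v->cap = ncap;
    }
  v->data[v->len++] = x;
  return 1;
}

/* Works on any ivec regardless of its inline capacity.  */
long
ivec_sum (const struct ivec *v)
{
  long s = 0;
  for (size_t i = 0; i < v->len; i++)
    s += v->data[i];
  return s;
}

/* Frees heap storage, if any.  */
void
ivec_release (struct ivec *v)
{
  if (v->data != v->inline_buf)
    free (v->data);
  v->data = NULL;
  v->len = v->cap = 0;
}
```

Because ivec_sum takes a plain struct ivec *, the same function serves vectors declared with any inline capacity, which is the property described for SmallVector's base class.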


(FWIW, I'm not saying that's an argument in favour of avoiding
std::vector completely.  It's just a reason why it might not always
be the right choice.)

Thanks,
Richard





[PATCH] tree-optimization/96491 - avoid store commoning across abnormal edges

2020-08-06 Thread Richard Biener
This avoids store commoning across abnormal edges, since that can
easily disrupt abnormal coalescing: it might create overlapping
lifetimes of variables.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-06  Richard Biener  

PR tree-optimization/96491
* tree-ssa-sink.c (sink_common_stores_to_bb): Avoid
sinking across abnormal edges.

* gcc.dg/torture/pr96491.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr96491.c | 29 ++
 gcc/tree-ssa-sink.c|  3 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr96491.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr96491.c 
b/gcc/testsuite/gcc.dg/torture/pr96491.c
new file mode 100644
index 000..784559f4754
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr96491.c
@@ -0,0 +1,29 @@
+/* { dg-do compile }  */
+
+int rj;
+
+void __attribute__ ((returns_twice))
+da (void)
+{
+  rj = 1;
+}
+
+void
+c5 (void)
+{
+  for (;;)
+    ++rj;
+}
+
+void
+ls (int kz)
+{
+  if (kz == 0)
+    {
+      rj = 0;
+      c5 ();
+    }
+
+  da ();
+  c5 ();
+}
diff --git a/gcc/tree-ssa-sink.c b/gcc/tree-ssa-sink.c
index 962ad076968..4cc5195f2f8 100644
--- a/gcc/tree-ssa-sink.c
+++ b/gcc/tree-ssa-sink.c
@@ -503,7 +503,8 @@ sink_common_stores_to_bb (basic_block bb)
  tree arg = gimple_phi_arg_def (phi, i);
  gimple *def = SSA_NAME_DEF_STMT (arg);
  if (! is_gimple_assign (def)
- || stmt_can_throw_internal (cfun, def))
+ || stmt_can_throw_internal (cfun, def)
+ || (gimple_phi_arg_edge (phi, i)->flags & EDGE_ABNORMAL))
{
  /* ???  We could handle some cascading with the def being
 another PHI.  We'd have to insert multiple PHIs for
-- 
2.26.2


Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 08:57 +0200, Richard Biener wrote:

On Thu, Aug 6, 2020 at 3:07 AM Andrew MacLeod  wrote:


On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  wrote:
>> On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>> [...]
>>
>>> * ipa-cp changes from vec to std::vec.
>>>
>>> We are using std::vec to ensure constructors are run, which they
>>> aren't in our internal vec<> implementation.  Although we usually steer
>>> away from using std::vec because of interactions with our GC system,
>>> ipcp_param_lattices is only live within the pass and allocated with
>>> calloc.
>> Ummm... I did not object but I will save the URL of this message in the
>> archive so that I can wave it in front of anyone complaining why I
>> don't use our internal vec's in IPA data structures.
>>
>> But it actually raises a broader question: was this supposed to be an
>> exception, allowed only not to complicate the irange patch further, or
>> will this be a generally accepted thing to do when someone wants to have
>> a vector of constructed items?
> It's definitely not what we want. You have to find another solution to this
> problem.
>
> Richard.
>

Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of fields
initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor; std::vec
does.


Other places in the compiler use placement new here.  The only
complication is re-allocation which would require move CTORs which
up to a few weeks ago were not available.  But of course we want
our own re-allocation policy, thus vec<>, not std::vector.
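For reference, the placement-new pattern looks roughly like this in a self-contained sketch. `lattice_like`, `create_lattices` and `destroy_lattices` are hypothetical stand-ins, not the actual ipa-cp code; the point is that the constructor runs on each calloc'd slot and the destructor has to be invoked by hand before the storage is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical element type: calloc zero-fills, but the constructor must
// still run to set the size limit to something other than zero.
struct lattice_like {
  int *values;
  unsigned max_size;
  explicit lattice_like(unsigned max) : values(nullptr), max_size(max) {}
};

// Allocate raw storage for N elements and construct each one in place.
inline lattice_like *create_lattices(std::size_t n, unsigned max) {
  void *raw = std::calloc(n, sizeof(lattice_like));
  if (!raw && n != 0)
    throw std::bad_alloc();
  lattice_like *elts = static_cast<lattice_like *>(raw);
  for (std::size_t i = 0; i < n; i++)
    ::new (static_cast<void *>(elts + i)) lattice_like(max);  // placement new
  return elts;
}

// The mirror image: run the destructors explicitly, then free the storage.
inline void destroy_lattices(lattice_like *elts, std::size_t n) {
  assert(elts != nullptr || n == 0);
  for (std::size_t i = 0; i < n; i++)
    elts[i].~lattice_like();
  std::free(elts);
}
```

Growing such an array is where it gets awkward: every live element has to be copied or moved into the new storage and the old one destroyed, which is the re-allocation complication referred to in the paragraph above.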

Also you do

#include <vector>

inside ipa-fnsummary.c - you should know better than to include
system headers directly from .c files - they need to go in system.h.
You should do a

#define INCLUDE_VECTOR

before the system.h include in ipa-fnsummary.c instead.
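A single-file sketch of that opt-in arrangement follows. The `#ifdef` block is a simplified stand-in for the corresponding part of gcc/system.h (its shape is assumed here, not copied), and `count_positive` is just a hypothetical user of the header.

```cpp
// 1. In the .c file: request the header before the central include.
#define INCLUDE_VECTOR

// 2. Simplified stand-in for gcc/system.h: system headers are pulled in
//    only on request, and only from this one central place.
#ifdef INCLUDE_VECTOR
# include <vector>
#endif

#include <cassert>
#include <cstddef>

// 3. The rest of the file can then use std::vector as usual.
inline std::size_t count_positive(const std::vector<int> &v) {
  std::size_t n = 0;
  for (int x : v)
    if (x > 0)
      n++;
  return n;
}
```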

I find it only mildly amusing that review & approval of this happened
behind the scenes ...

I scanned the code but didn't spot the point where you'd need to run
CTORs? Can you point me to that?


I quizzed some libstdc++ folks, and there have been a lot of
optimizations done on std::vec over the last few years.  They think it's
pretty good now, and we were encouraged to use it.

We can visit the question tho...  What is the rationale for not using
std::vec in the compiler?  We currently use std::swap, std::pair,
std::map, std::sort, and a few others.


We're using std::swap and std::pair throughout because those are lean.
One of the main motivations for not using the STL is the TU size
explosion when including lots of those large headers.  Bootstrap times
are still an issue.

There are also code-size issues when we have both std::vector
and vec instantiations.  And as you say vec<> isn't going away
because there's no way to make std::vector & friends GC aware.

Then there's consistency across the code-base.  A big mess will
scare off new contributors.


So do weird types that don't run constructors and require manually
constructing things into buffers with placement new, or only using
types with trivial default constructors. It's not 1994.

Containers should own and manage their contents. Maybe adding support
for non-trivial construction to vec and making it movable (and not
copyable) would make it more useful.
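As a rough sketch of what that could look like, assuming the element type has a non-throwing move constructor, here is a minimal owning, move-only vector that constructs elements in place and destroys them itself. `owning_vec` and `tracked` are made-up names for illustration; this is not a proposal for vec<>'s actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical owning vector: constructs and destroys its elements, can be
// moved, cannot be copied.  Growth assumes T's move constructor won't throw.
template <typename T>
class owning_vec {
public:
  owning_vec() = default;
  owning_vec(const owning_vec &) = delete;
  owning_vec &operator=(const owning_vec &) = delete;
  owning_vec(owning_vec &&o) noexcept
    : m_data(o.m_data), m_size(o.m_size), m_cap(o.m_cap) {
    o.m_data = nullptr;
    o.m_size = o.m_cap = 0;
  }
  owning_vec &operator=(owning_vec &&o) noexcept {
    if (this != &o) {
      release();                 // leaves *this empty
      std::swap(m_data, o.m_data);
      std::swap(m_size, o.m_size);
      std::swap(m_cap, o.m_cap);
    }
    return *this;
  }
  ~owning_vec() { release(); }

  std::size_t size() const { return m_size; }
  T &operator[](std::size_t i) { assert(i < m_size); return m_data[i]; }

  // Construct a new element in place from ARGS.
  template <typename... Args>
  T &emplace_back(Args &&...args) {
    if (m_size < m_cap) {
      ::new (static_cast<void *>(m_data + m_size)) T(std::forward<Args>(args)...);
    } else {
      std::size_t new_cap = m_cap ? 2 * m_cap : 4;
      T *p = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
      if (!p)
        throw std::bad_alloc();
      try {
        // Build the new element first: ARGS may refer into the old buffer.
        ::new (static_cast<void *>(p + m_size)) T(std::forward<Args>(args)...);
      } catch (...) {
        std::free(p);
        throw;
      }
      for (std::size_t i = 0; i < m_size; i++) {
        ::new (static_cast<void *>(p + i)) T(std::move(m_data[i]));
        m_data[i].~T();
      }
      std::free(m_data);
      m_data = p;
      m_cap = new_cap;
    }
    return m_data[m_size++];
  }

private:
  void release() {
    for (std::size_t i = 0; i < m_size; i++)
      m_data[i].~T();
    std::free(m_data);
    m_data = nullptr;
    m_size = m_cap = 0;
  }

  T *m_data = nullptr;
  std::size_t m_size = 0, m_cap = 0;
};

// Hypothetical element type that counts live objects, to make it visible
// that constructors and destructors really run.
struct tracked {
  static int live;
  int value;
  explicit tracked(int v) : value(v) { ++live; }
  tracked(tracked &&o) noexcept : value(o.value) { ++live; }
  ~tracked() { --live; }
};
int tracked::live = 0;
```

Deleting the copy operations keeps accidental deep copies out, while the move operations let a vector be returned or handed over without touching its elements.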


Oh, and existing uses are mostly mistakes and are _not_ a reason
to add more "exceptions" - instead they are a reason to rectify those
mistakes.

Richard.


Is there some aspect of using std::vec I am not aware of that makes it
something we need to avoid?

Andrew











Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Richard Biener via Gcc-patches
On Thu, Aug 6, 2020 at 10:17 AM Christophe Lyon
 wrote:
>
> Hi,
>
>
> On Wed, 5 Aug 2020 at 16:24, Richard Biener via Gcc-patches
>  wrote:
> >
> > On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse  wrote:
> > >
> > > New version that passed bootstrap+regtest during the night.
> > >
> > > When vector comparisons were forced to use vec_cond_expr, we lost a number of
> > > optimizations (my fault for not adding enough testcases to prevent that).
> > > This patch tries to unwrap vec_cond_expr a bit so some optimizations can
> > > still happen.
> > >
> > > I wasn't planning to add all those transformations together, but adding one
> > > caused a regression, whose fix introduced a second regression, etc.
> > >
> > > Restricting to constant folding would not be sufficient, we also need at
> > > least things like X|0 or X&X. The transformations are quite conservative
> > > with :s and folding only if everything simplifies, we may want to relax
> > > this later. And of course we are going to miss things like a?b:c + a?c:b
> > > -> b+c.
> > >
> > > In terms of number of operations, some transformations turning 2
> > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look
> > > like a gain... I expect the bit_not disappears in most cases, and
> > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
> > >
> > > I am a bit confused that with avx512 we get types like "vector(4)
> > > " with :2 and not :1 (is it a hack so true is 1 and not
> > > -1?), but that doesn't matter for this patch.
> >
> > OK.
> >
> > Thanks,
> > Richard.
> >
> > > 2020-08-05  Marc Glisse  
> > >
> > > PR tree-optimization/95906
> > > PR target/70314
> > > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> > > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> > > (op (c ? a : b)): Update to match the new transformations.
> > >
> > > * gcc.dg/tree-ssa/andnot-2.c: New file.
> > > * gcc.dg/tree-ssa/pr95906.c: Likewise.
> > > * gcc.target/i386/pr70314.c: Likewise.
> > >
>
> I think this patch is causing several ICEs on arm-none-linux-gnueabihf
> --with-cpu cortex-a9 --with-fpu neon-fp16:
>   Executed from: gcc.c-torture/compile/compile.exp
> gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
> gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
>   Executed from: gcc.dg/dg.exp
> gcc.dg/pr87746.c (internal compiler error)
>   Executed from: gcc.dg/tree-ssa/tree-ssa.exp
> gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
>   Executed from: gcc.dg/vect/vect.exp
> gcc.dg/vect/pr59591-1.c (internal compiler error)
> gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
> gcc.dg/vect/pr86927.c (internal compiler error)
> gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
> gcc.dg/vect/slp-cond-5.c (internal compiler error)
> gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
> gcc.dg/vect/vect-23.c (internal compiler error)
> gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
> gcc.dg/vect/vect-24.c (internal compiler error)
> gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
> gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
> gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal
> compiler error)
>
> Backtrace for gcc.c-torture/compile/20160205-1.c   -O3
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions
> during RTL pass: expand
> /gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal
> compiler error: in do_store_flag, at expr.c:12259
> 0x8feb26 do_store_flag
> /gcc/expr.c:12259
> 0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
> /gcc/expr.c:9617
> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
> /gcc/expr.c:10159
> 0x91174e expand_expr
> /gcc/expr.h:282
> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> rtx_def**, expand_modifier)
> /gcc/expr.c:8065
> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
> /gcc/expr.c:9950
> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
> /gcc/expr.c:10159
> 0x91174e expand_expr
> /gcc/expr.h:282
> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> rtx_def**, expand_modifier)
> /gcc/expr.c:8065
> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
> /gcc/expr.c:9950
> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
> /gcc/expr.c:10159
> 0x91174e expan

Re: std:vec for classes with constructor?

2020-08-06 Thread Richard Biener via Gcc-patches
On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely  wrote:
>
> On 06/08/20 06:16 +0100, Richard Sandiford wrote:
> >Andrew MacLeod via Gcc-patches  writes:
> >> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
> >>> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor wrote:
>  On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>  [...]
> 
> > * ipa-cp changes from vec to std::vec.
> >
> > We are using std::vec to ensure constructors are run, which they
> > aren't in our internal vec<> implementation.  Although we usually steer
> > away from using std::vec because of interactions with our GC system,
> > ipcp_param_lattices is only live within the pass and allocated with
> > calloc.
>  Ummm... I did not object but I will save the URL of this message in the
>  archive so that I can wave it in front of anyone complaining why I
>  don't use our internal vec's in IPA data structures.
> 
>  But it actually raises a broader question: was this supposed to be an
>  exception, allowed only not to complicate the irange patch further, or
>  will this be a generally accepted thing to do when someone wants to have
>  a vector of constructed items?
> >>> It's definitely not what we want. You have to find another solution to 
> >>> this problem.
> >>>
> >>> Richard.
> >>>
> >>
> >> Why isn't it what we want?
> >>
> >> This is a small vector local to the pass so it doesn't interfere with
> >> our PITA GTY.
> >> The class is pretty straightforward, but we do need a constructor to
> >> initialize the pointer and the max-size field.  There is no allocation
> >> done per element, so a small number of elements have a couple of fields
> >> initialized per element. We'd have to loop to do that anyway.
> >>
> >> GCC's vec<> does not provide the ability to run a constructor; std::vec
> >> does.
> >
> >I realise you weren't claiming otherwise, but: that could be fixed :-)
>
> It really should be.
>
> Artificial limitations like that are just a booby trap for the unwary.

It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?

> >> I quizzed some libstdc++ folks, and there have been a lot of
> >> optimizations done on std::vec over the last few years.  They think it's
> >> pretty good now, and we were encouraged to use it.
> >>
> >> We can visit the question tho...  What is the rationale for not using
> >> std::vec in the compiler?  We currently use std::swap, std::pair,
> >> std::map, std::sort, and a few others.
> >> Is there some aspect of using std::vec I am not aware of that makes it
> >> something we need to avoid?
> >
> >One reason to prefer vec<> for general interfaces is that it
> >works with auto_vec<…, N>, making it possible to pre-allocate a
> >reasonably-sized buffer on the stack without needing a round-trip
> >through the allocators.
> >
> >FWIW, that isn't simply a GCC thing.  LLVM (which is obviously much
> >more C++-intensive than GCC) still makes heavy use of SmallVector for
> >automatic variables.  And the reason we have things like memory_block.h
> >is that malloc used to show up high in profiles.
>
> Yes, LLVM's SmallVector is very useful. You can achieve a similar
> thing with a custom allocator in std::vector, but it's more cumbersome
> and it alters the type from std::vector<T> to std::vector<T, Alloc>.
>
> The beauty of the LLVM design is the common base class for
> SmallVector is the same for all N, so you can pass it to APIs
> that don't care about the size and just work with the base interface.
>
> >(FWIW, I'm not saying that's an argument in favour of avoiding
> >std::vector completely.  It's just a reason why it might not always
> >be the right choice.)
> >
> >Thanks,
> >Richard
> >
>


Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 12:12 +0200, Jakub Jelinek wrote:

On Wed, Aug 05, 2020 at 04:31:08PM -0600, Martin Sebor via Gcc-patches wrote:

I've always found the second argument to __builtin_object_size
confusing for types above 1.  I don't see anything wrong in
the diff but I believe the most useful results are with type 1
for string functions and type 0 for raw memory functions like
memcpy (that's what _FORTIFY_SOURCE uses for the two sets of
functions).  In type 2 when the result is zero it means one of
two things: either the size of the array couldn't be determined
or it really is zero.  That's less than helpful in cases like:

  char a[8];
  strcpy (a + 8, s);

where it prevents detecting the buffer overflow.


I don't know what is confusing about it.


Personally I find the docs very confusing.

"The second bit determines if maximum or minimum of remaining bytes is
computed. "

OK, so is it maximum when the bit is set, or maximum when the bit is
clear?

To answer that question I have to go back to the middle of the
previous paragraph and carefully parse it. 


"the returned number is the maximum of remaining byte counts in those
objects if type & 2 is 0 and minimum if nonzero."

This part talks about "if type & 2 is 0" and "nonzero", could we be
consistent and talk about a bit being clear/set, or use bitwise
operator notation, but not flip between the two? And use zero/nonzero
rather than 0/nonzero?

The inconsistency in presentation increases the mental load of parsing
it. I'll propose a patch for those docs when I get time.
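One way to pin the two bits down is a small C++ sketch that follows the GCC manual's description of the `type` argument. The helper `decode_bos_type` and the struct `bos_mode` are hypothetical, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode the TYPE argument (0..3) of __builtin_object_size.
struct bos_mode {
  bool subobject;       // type & 1 set: closest surrounding subobject; clear: whole object
  bool minimum;         // type & 2 set: minimum (lower bound); clear: maximum (upper bound)
  std::size_t unknown;  // value returned when the size cannot be determined
};

constexpr bos_mode decode_bos_type(int type) {
  return bos_mode{(type & 1) != 0, (type & 2) != 0,
                  (type & 2) != 0 ? std::size_t(0) : SIZE_MAX};
}

static_assert(!decode_bos_type(0).minimum && decode_bos_type(0).unknown == SIZE_MAX,
              "type 0: whole object, maximum, unknown is (size_t)-1");
static_assert(decode_bos_type(1).subobject && !decode_bos_type(1).minimum,
              "type 1: closest subobject, maximum");
static_assert(decode_bos_type(2).minimum && decode_bos_type(2).unknown == 0,
              "type 2: whole object, minimum, unknown is 0");
```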



With the 0/1 arguments bos returns an upper bound for the object size
(and the "don't know" value is the maximum in that case, i.e. (size_t)-1),
while with the 2/3 arguments bos returns a lower bound for the object size
(and thus the "don't know" value is the minimum value, i.e. 0).
The 2/3 modes are obviously not something you want to use in a strcpy etc.
implementation: there you want to abort the program only when it is
guaranteed to be invalid, i.e. when it will certainly overflow the
available size, whereas the 2/3 modes would already abort if there is
merely a possibility that the object might not be big enough.


For my case I'm not aborting, I'm deciding whether to use the result
from __builtin_object_size or just assume the array is as large as the
entire address space (which is the old behaviour).

I think Martin's right that I should use 0. Technically I could
probably use 1, because for struct S { char buf1[2]; char buf2[2]; };
it would be undefined to write 4 bytes into it, but it "worked" with
previous versions and so I'm choosing to let it keep "working". This
doesn't need to be 100% safe, because the API has been replaced by a
safer one for C++20 anyway.
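A minimal sketch of that decision, assuming GCC or Clang for `__builtin_object_size`. `copy_word` and `COPY_WORD` are hypothetical helpers written for illustration; this is not the libstdc++ implementation of `operator>>`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Copy one whitespace-delimited word from SRC into DST, writing at most
// CAP - 1 characters followed by a NUL (nothing at all if CAP is 0).
inline std::size_t copy_word(const char *src, char *dst, std::size_t cap) {
  std::size_t n = 0;
  while (src[n] != '\0'
         && !std::isspace(static_cast<unsigned char>(src[n]))
         && n + 1 < cap) {
    dst[n] = src[n];
    n++;
  }
  if (cap != 0)
    dst[n] = '\0';
  return n;
}

// Mode 0 gives an upper bound, and its "don't know" answer is (size_t)-1,
// which behaves like the old "as large as the address space" assumption.
// A macro is used so __builtin_object_size sees the caller's own pointer
// expression; note that DST is evaluated twice.
#define COPY_WORD(src, dst) \
  copy_word((src), (dst), __builtin_object_size((dst), 0))
```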


One can, e.g., use both modes to check whether the object is known to have
exactly a particular size: when
__builtin_object_size (ptr, 0) == __builtin_object_size (ptr, 2)
and bos returns, say, 25, then you know it is exactly 25 bytes.
E.g. if one has:
 ptr = flag ? malloc (32) : malloc (64);
 x[0] = __builtin_object_size (ptr, 0);
 x[1] = __builtin_object_size (ptr, 2);
then x[0] will be 64 as the maximum and x[1] will be 32 as the minimum (of course
unless flag can be folded to constant, then both would be the same depending
on to which constant it is folded).
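The two-mode check can be wrapped up as below. `size_bounds`, `OBJECT_SIZE_BOUNDS` and `either_allocation` are hypothetical names; the exact numbers depend on the optimization level, so the only property relied on is that the mode-2 result never exceeds the mode-0 result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Mode 2 is a lower bound and mode 0 an upper bound, so lo <= hi always
// holds, and lo == hi means the remaining size is known exactly.
struct size_bounds {
  std::size_t lo, hi;
  bool exact() const { return lo == hi; }
};

#define OBJECT_SIZE_BOUNDS(p) \
  (size_bounds{__builtin_object_size((p), 2), __builtin_object_size((p), 0)})

// Jakub's example: PTR may point to a 32-byte or a 64-byte allocation.
// With optimization GCC is expected to report lo == 32 and hi == 64 here;
// without it, both queries typically return their "don't know" values.
static __attribute__((noinline)) size_bounds either_allocation(bool flag) {
  void *ptr = flag ? std::malloc(32) : std::malloc(64);
  size_bounds b = OBJECT_SIZE_BOUNDS(ptr);
  std::free(ptr);
  return b;
}
```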

Jakub




Re: std:vec for classes with constructor?

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 12:31 +0200, Richard Biener wrote:

On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely  wrote:


On 06/08/20 06:16 +0100, Richard Sandiford wrote:
>Andrew MacLeod via Gcc-patches  writes:
>> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
>>> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor wrote:
 On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
 [...]

> * ipa-cp changes from vec to std::vec.
>
> We are using std::vec to ensure constructors are run, which they
> aren't in our internal vec<> implementation.  Although we usually steer
> away from using std::vec because of interactions with our GC system,
> ipcp_param_lattices is only live within the pass and allocated with
> calloc.
 Ummm... I did not object but I will save the URL of this message in the
 archive so that I can wave it in front of anyone complaining why I
 don't use our internal vec's in IPA data structures.

 But it actually raises a broader question: was this supposed to be an
 exception, allowed only not to complicate the irange patch further, or
 will this be a generally accepted thing to do when someone wants to have
 a vector of constructed items?
>>> It's definitely not what we want. You have to find another solution to this
>>> problem.
>>>
>>> Richard.
>>>
>>
>> Why isn't it what we want?
>>
>> This is a small vector local to the pass so it doesn't interfere with
>> our PITA GTY.
>> The class is pretty straightforward, but we do need a constructor to
>> initialize the pointer and the max-size field.  There is no allocation
>> done per element, so a small number of elements have a couple of fields
>> initialized per element. We'd have to loop to do that anyway.
>>
>> GCC's vec<> does not provide the ability to run a constructor; std::vec
>> does.
>
>I realise you weren't claiming otherwise, but: that could be fixed :-)

It really should be.

Artificial limitations like that are just a booby trap for the unwary.


It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?


I don't see why not. std::vector worked fine without std::move, it's
just more efficient with std::move, and can be used with a wider set
of element types.

When reallocating you can just copy each element to the new storage
and destroy the old element. If your type is non-copyable then you
need std::move, but I don't think the types I see used with vec<> are
non-copyable. Most of them are trivially-copyable.
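A sketch of copy-based reallocation without std::move, in C++. The function name `reallocate_by_copy` is hypothetical, and the buffer is assumed to have been obtained from malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Grow a malloc'd buffer holding N constructed objects to NEW_CAP slots
// using only T's copy constructor, then destroy the old elements.
template <typename T>
T *reallocate_by_copy(T *old, std::size_t n, std::size_t new_cap) {
  assert(new_cap >= n && new_cap > 0);
  T *p = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
  if (!p)
    throw std::bad_alloc();
  std::size_t done = 0;
  try {
    for (; done < n; done++)
      ::new (static_cast<void *>(p + done)) T(old[done]);  // copy-construct
  } catch (...) {
    while (done > 0)
      p[--done].~T();  // undo the copies made so far
    std::free(p);
    throw;             // OLD is still intact
  }
  for (std::size_t i = 0; i < n; i++)
    old[i].~T();
  std::free(old);
  return p;
}
```

Copying first and destroying afterwards also means the old buffer stays valid if a copy constructor throws.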

I think the benefit of std::move to GCC is likely to be permitting
cheap copies to be made where previously they were banned for
performance reasons, but not because those copies were impossible.



Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-06 Thread Aldy Hernandez via Gcc-patches




On 8/6/20 8:57 AM, Richard Biener wrote:

On Thu, Aug 6, 2020 at 3:07 AM Andrew MacLeod  wrote:


On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:

On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they
aren't in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with
calloc.
Ummm... I did not object but I will save the URL of this message in the
archive so that I can wave it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be a generally accepted thing to do when someone wants to have
a vector of constructed items?

It's definitely not what we want. You have to find another solution to this 
problem.

Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of fields
initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor; std::vec
does.


Other places in the compiler use placement new here.  The only
complication is re-allocation which would require move CTORs which
up to a few weeks ago were not available.  But of course we want
our own re-allocation policy, thus vec<>, not std::vector.

Also you do

#include <vector>

inside ipa-fnsummary.c - you should know better than to include
system headers directly from .c files - they need to go in system.h.
You should do a

#define INCLUDE_VECTOR

before the system.h include in ipa-fnsummary.c instead.


This has already been fixed by Gerald (thanks!).

Aldy



Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Marc Glisse

On Thu, 6 Aug 2020, Richard Biener wrote:


On Thu, Aug 6, 2020 at 10:17 AM Christophe Lyon
 wrote:


Hi,


On Wed, 5 Aug 2020 at 16:24, Richard Biener via Gcc-patches
 wrote:


On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse  wrote:


New version that passed bootstrap+regtest during the night.

When vector comparisons were forced to use vec_cond_expr, we lost a number of
optimizations (my fault for not adding enough testcases to prevent that).
This patch tries to unwrap vec_cond_expr a bit so some optimizations can
still happen.

I wasn't planning to add all those transformations together, but adding one
caused a regression, whose fix introduced a second regression, etc.

Restricting to constant folding would not be sufficient, we also need at
least things like X|0 or X&X. The transformations are quite conservative
with :s and folding only if everything simplifies, we may want to relax
this later. And of course we are going to miss things like a?b:c + a?c:b
-> b+c.

In terms of number of operations, some transformations turning 2
VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look
like a gain... I expect the bit_not disappears in most cases, and
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
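Identities of this kind can be checked lane by lane. The C++ sketch below models VEC_COND_EXPR on four 32-bit lanes whose masks are all-ones or all-zeros, as vector comparisons produce. `v4` and the helper names are made up, and the third rewrite is our own example of the VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR shape, not text copied from match.pd.

```cpp
#include <cassert>
#include <cstdint>

// Four-lane model; mask lanes hold -1 (true) or 0 (false).
struct v4 { std::int32_t e[4]; };

inline v4 vcond(const v4 &m, const v4 &a, const v4 &b) {
  v4 r{};
  for (int i = 0; i < 4; i++) r.e[i] = m.e[i] ? a.e[i] : b.e[i];
  return r;
}
inline v4 vand(const v4 &x, const v4 &y) {
  v4 r{};
  for (int i = 0; i < 4; i++) r.e[i] = x.e[i] & y.e[i];
  return r;
}
inline v4 vior(const v4 &x, const v4 &y) {
  v4 r{};
  for (int i = 0; i < 4; i++) r.e[i] = x.e[i] | y.e[i];
  return r;
}
inline v4 vnot(const v4 &x) {
  v4 r{};
  for (int i = 0; i < 4; i++) r.e[i] = ~x.e[i];
  return r;
}
inline v4 vadd(const v4 &x, const v4 &y) {
  v4 r{};
  for (int i = 0; i < 4; i++) r.e[i] = x.e[i] + y.e[i];
  return r;
}
inline bool same(const v4 &x, const v4 &y) {
  for (int i = 0; i < 4; i++)
    if (x.e[i] != y.e[i]) return false;
  return true;
}

// Lane-wise checks of rewrites of the shapes named in the ChangeLog.
inline bool identities_hold(const v4 &c1, const v4 &c2, const v4 &a,
                            const v4 &b, const v4 &d) {
  // (c ? a : b) + d  ==>  c ? (a + d) : (b + d)
  bool sink_op = same(vadd(vcond(c1, a, b), d),
                      vcond(c1, vadd(a, d), vadd(b, d)));
  // c1 ? (c2 ? a : b) : b  ==>  (c1 & c2) ? a : b
  bool nest_and = same(vcond(c1, vcond(c2, a, b), b),
                       vcond(vand(c1, c2), a, b));
  // c1 ? (c2 ? a : b) : a  ==>  (~c1 | c2) ? a : b
  bool nest_ior_not = same(vcond(c1, vcond(c2, a, b), a),
                           vcond(vior(vnot(c1), c2), a, b));
  return sink_op && nest_and && nest_ior_not;
}
```

The model also shows why the mask form matters: `~` only turns true into false (and back) when each lane is exactly -1 or 0.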

I am a bit confused that with avx512 we get types like "vector(4)
" with :2 and not :1 (is it a hack so true is 1 and not
-1?), but that doesn't matter for this patch.


OK.

Thanks,
Richard.


2020-08-05  Marc Glisse  

PR tree-optimization/95906
PR target/70314
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
(op (c ? a : b)): Update to match the new transformations.

* gcc.dg/tree-ssa/andnot-2.c: New file.
* gcc.dg/tree-ssa/pr95906.c: Likewise.
* gcc.target/i386/pr70314.c: Likewise.



I think this patch is causing several ICEs on arm-none-linux-gnueabihf
--with-cpu cortex-a9 --with-fpu neon-fp16:
  Executed from: gcc.c-torture/compile/compile.exp
gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
  Executed from: gcc.dg/dg.exp
gcc.dg/pr87746.c (internal compiler error)
  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
  Executed from: gcc.dg/vect/vect.exp
gcc.dg/vect/pr59591-1.c (internal compiler error)
gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
gcc.dg/vect/pr86927.c (internal compiler error)
gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
gcc.dg/vect/slp-cond-5.c (internal compiler error)
gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
gcc.dg/vect/vect-23.c (internal compiler error)
gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
gcc.dg/vect/vect-24.c (internal compiler error)
gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal
compiler error)

Backtrace for gcc.c-torture/compile/20160205-1.c   -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions
during RTL pass: expand
/gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal
compiler error: in do_store_flag, at expr.c:12259
0x8feb26 do_store_flag
/gcc/expr.c:12259
0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/gcc/expr.c:9617
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/gcc/expr.c:10159
0x91174e expand_expr
/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
rtx_def**, expand_modifier)
/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/gcc/expr.c:10159
0x91174e expand_expr
/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
rtx_def**, expand_modifier)
/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/gcc/expr.c:10159
0x91174e expand_expr
/gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
rtx_def**, expand_modifier)
/gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rt

Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Christophe Lyon via Gcc-patches
On Thu, 6 Aug 2020 at 11:06, Marc Glisse  wrote:
>
> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>
> >>> 2020-08-05  Marc Glisse  
> >>>
> >>> PR tree-optimization/95906
> >>> PR target/70314
> >>> * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> >>> (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> >>> (op (c ? a : b)): Update to match the new transformations.
> >>>
> >>> * gcc.dg/tree-ssa/andnot-2.c: New file.
> >>> * gcc.dg/tree-ssa/pr95906.c: Likewise.
> >>> * gcc.target/i386/pr70314.c: Likewise.
> >>>
> >
> > I think this patch is causing several ICEs on arm-none-linux-gnueabihf
> > --with-cpu cortex-a9 --with-fpu neon-fp16:
> >  Executed from: gcc.c-torture/compile/compile.exp
> >gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> > compiler error)
> >gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
> >  Executed from: gcc.dg/dg.exp
> >gcc.dg/pr87746.c (internal compiler error)
> >  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
> >gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
>
> I tried a cross from x86_64-linux with current master
>
> .../configure --target=arm-none-linux-gnueabihf --enable-languages=c,c++ 
> --with-system-zlib --disable-nls --with-cpu=cortex-a9 --with-fpu=neon-fp16
> make
>
> it stops at some point with an error, but I have xgcc and cc1 in
> build/gcc.
>
> I copied 2 of the testcases and compiled
>
> ./xgcc pr87746.c -Ofast -S -B.
> ./xgcc -O3 -fdump-tree-ifcvt-details-blocks-details ifc-cd.c -S -B.
>
> without getting any ICE.

Sorry for the delay, I had to reproduce the problem manually.
>
> Is there a machine on the compile farm where this is easy to reproduce?
I don't think there is any arm machine in the compile farm.

> Or could you attach the .optimized dump that corresponds to the
> backtrace below? It looks like we end up with a comparison with an
> unexpected return type.
>

I've compiled pr87746.c with -fdump-tree-ifcvt-details-blocks-details,
here is the log.
Is that what you need?

Thanks,

Christophe

> >  Executed from: gcc.dg/vect/vect.exp
> >gcc.dg/vect/pr59591-1.c (internal compiler error)
> >gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
> >gcc.dg/vect/pr86927.c (internal compiler error)
> >gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
> >gcc.dg/vect/slp-cond-5.c (internal compiler error)
> >gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
> >gcc.dg/vect/vect-23.c (internal compiler error)
> >gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
> >gcc.dg/vect/vect-24.c (internal compiler error)
> >gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
> >gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
> >gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal
> > compiler error)
> >
> > Backtrace for gcc.c-torture/compile/20160205-1.c   -O3
> > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> > -finline-functions
> > during RTL pass: expand
> > /gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal
> > compiler error: in do_store_flag, at expr.c:12259
> > 0x8feb26 do_store_flag
> >/gcc/expr.c:12259
> > 0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> > expand_modifier)
> >/gcc/expr.c:9617
> > 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> >/gcc/expr.c:10159
> > 0x91174e expand_expr
> >/gcc/expr.h:282
> > 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> > rtx_def**, expand_modifier)
> >/gcc/expr.c:8065
> > 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> > expand_modifier)
> >/gcc/expr.c:9950
> > 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> >/gcc/expr.c:10159
> > 0x91174e expand_expr
> >/gcc/expr.h:282
> > 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> > rtx_def**, expand_modifier)
> >/gcc/expr.c:8065
> > 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> > expand_modifier)
> >/gcc/expr.c:9950
> > 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> >/gcc/expr.c:10159
> > 0x91174e expand_expr
> >/gcc/expr.h:282
> > 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> > rtx_def**, expand_modifier)
> >/gcc/expr.c:8065
> > 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> > expand_modifier)
> >/gcc/expr.c:9950
> > 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> >/gcc/expr.c:

RE: [PATCH 3/5][Arm] New pattern for CSINC instructions

2020-08-06 Thread Kyrylo Tkachov
Hi Omar,

From: Omar Tahir  
Sent: 04 August 2020 17:13
To: Kyrylo Tkachov ; ni...@redhat.com; Ramana 
Radhakrishnan ; Richard Earnshaw 
; gcc-patches@gcc.gnu.org
Subject: [PATCH 3/5][Arm] New pattern for CSINC instructions

This patch adds a new pattern, *thumb2_csinc, for generating CSINC
instructions. It also modifies an existing pattern, *thumb2_cond_arith, to
output CINC when the operation is an addition and TARGET_COND_ARITH is true.

Regression tested on arm-none-eabi.


2020-07-30  Sudakshina Das
            Omar Tahir

* config/arm/thumb2.md (*thumb2_csinc): New.
(*thumb2_cond_arith): Generate CINC where possible.

gcc/testsuite/ChangeLog:

2020-07-30  Sudakshina Das
            Omar Tahir

* gcc.target/arm/csinc-1.c: New test.


diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 0b00aef7ef7..79cf684e5cb 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -743,6 +743,9 @@
 if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
   return \"%i5\\t%0, %1, %2, lsr #31\";

+if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
+  return \"cinc\\t%0, %1, %d4\";
+
 output_asm_insn (\"cmp\\t%2, %3\", operands);


Hmmm, this looks wrong. The pattern needs to perform the comparison (setting 
the CC reg) as well as do the conditional increment.
Emitting a cinc without a cmp won't set the CC flags.
Also, cinc increments only by 1, whereas the "arm_rhs_operand" predicate 
accepts a wider variety of immediates, so just checking for GET_CODE 
(operands[5]) == PLUS isn't enough.

Thanks,
Kyrill


 if (GET_CODE (operands[5]) == AND)
   {
@@ -952,6 +955,21 @@
(set_attr "predicable" "no")]
)

+(define_insn "*thumb2_csinc"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+        (if_then_else:SI
+          (match_operand 1 "arm_comparison_operation" "")
+          (plus:SI (match_operand:SI 2 "arm_general_register_operand" "r, r")
+                   (const_int 1))
+          (match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinc\\t%0, %3, %2, %D1
+   csinc\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
(if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csinc-1.c b/gcc/testsuite/gcc.target/arm/csinc-1.c
new file mode 100644
index 000..b9928493862
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csinc-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csinc32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? (w2 + 1) : w3;
+  return w4;
+}
+
+int
+test_csinc32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : (w2 + 1);
+  return w4;
+}


Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Marc Glisse

On Thu, 6 Aug 2020, Christophe Lyon wrote:

> On Thu, 6 Aug 2020 at 11:06, Marc Glisse  wrote:
>>
>> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>>
>>>> 2020-08-05  Marc Glisse
>>>>
>>>> PR tree-optimization/95906
>>>> PR target/70314
>>>> * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
>>>> (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
>>>> (op (c ? a : b)): Update to match the new transformations.
>>>>
>>>> * gcc.dg/tree-ssa/andnot-2.c: New file.
>>>> * gcc.dg/tree-ssa/pr95906.c: Likewise.
>>>> * gcc.target/i386/pr70314.c: Likewise.
>>>>
>>>
>>> I think this patch is causing several ICEs on arm-none-linux-gnueabihf
>>> --with-cpu cortex-a9 --with-fpu neon-fp16:
>>>  Executed from: gcc.c-torture/compile/compile.exp
>>>    gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer
>>> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
>>> compiler error)
>>>    gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
>>>  Executed from: gcc.dg/dg.exp
>>>    gcc.dg/pr87746.c (internal compiler error)
>>>  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
>>>    gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
>>
>> I tried a cross from x86_64-linux with current master
>>
>> .../configure --target=arm-none-linux-gnueabihf --enable-languages=c,c++
>> --with-system-zlib --disable-nls --with-cpu=cortex-a9 --with-fpu=neon-fp16
>> make
>>
>> it stops at some point with an error, but I have xgcc and cc1 in
>> build/gcc.
>>
>> I copied 2 of the testcases and compiled
>>
>> ./xgcc pr87746.c -Ofast -S -B.
>> ./xgcc -O3 -fdump-tree-ifcvt-details-blocks-details ifc-cd.c -S -B.
>>
>> without getting any ICE.
>
> Sorry for the delay, I had to reproduce the problem manually.
>>
>> Is there a machine on the compile farm where this is easy to reproduce?
> I don't think there is any arm machine in the compile farm.
>
>> Or could you attach the .optimized dump that corresponds to the
>> backtrace below? It looks like we end up with a comparison with an
>> unexpected return type.
>>
>
> I've compiled pr87746.c with -fdump-tree-ifcvt-details-blocks-details,
> here is the log.
> Is that what you need?

Thanks.
The one from -fdump-tree-optimized would be closer to the ICE.
Though it would also be convenient to know which stmt is being expanded
when we ICE, etc.

Was I on the right track configuring with
--target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16
then compiling without any special option?

> Thanks,
>
> Christophe
>
>>>  Executed from: gcc.dg/vect/vect.exp
>>>    gcc.dg/vect/pr59591-1.c (internal compiler error)
>>>    gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
>>>    gcc.dg/vect/pr86927.c (internal compiler error)
>>>    gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
>>>    gcc.dg/vect/slp-cond-5.c (internal compiler error)
>>>    gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
>>>    gcc.dg/vect/vect-23.c (internal compiler error)
>>>    gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
>>>    gcc.dg/vect/vect-24.c (internal compiler error)
>>>    gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
>>>    gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
>>>    gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal
>>> compiler error)
>>>
>>> Backtrace for gcc.c-torture/compile/20160205-1.c   -O3
>>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
>>> -finline-functions
>>> during RTL pass: expand
>>> /gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal
>>> compiler error: in do_store_flag, at expr.c:12259
>>> 0x8feb26 do_store_flag
>>>    /gcc/expr.c:12259
>>> 0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
>>> expand_modifier)
>>>    /gcc/expr.c:9617
>>> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>>> expand_modifier, rtx_def**, bool)
>>>    /gcc/expr.c:10159
>>> 0x91174e expand_expr
>>>    /gcc/expr.h:282
>>> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
>>> rtx_def**, expand_modifier)
>>>    /gcc/expr.c:8065
>>> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
>>> expand_modifier)
>>>    /gcc/expr.c:9950
>>> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>>> expand_modifier, rtx_def**, bool)
>>>    /gcc/expr.c:10159
>>> 0x91174e expand_expr
>>>    /gcc/expr.h:282
>>> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
>>> rtx_def**, expand_modifier)
>>>    /gcc/expr.c:8065
>>> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
>>> expand_modifier)
>>>    /gcc/expr.c:9950
>>> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>>> expand_modifier, rtx_def**, bool)
>>>    /gcc/expr.c:10159
>>> 0x91174e expand_expr
>>>    /gcc/expr.h:282
>>> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
>>> rtx_def**, expand_modifier)
>>>    /gcc/expr.c:8065
>>> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
>>> expand_modifier)
>>>    /gcc/expr.c:9950
>>> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>>> expand_modifier, rtx_def**, bool)
>>>    /gcc/expr.c:10159
>>> 0x91174e expand

Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Christophe Lyon via Gcc-patches
On Thu, 6 Aug 2020 at 13:42, Marc Glisse  wrote:
>
> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>
> > On Thu, 6 Aug 2020 at 11:06, Marc Glisse  wrote:
> >>
> >> On Thu, 6 Aug 2020, Christophe Lyon wrote:
> >>
> > 2020-08-05  Marc Glisse  
> >
> > PR tree-optimization/95906
> > PR target/70314
> > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> > (op (c ? a : b)): Update to match the new transformations.
> >
> > * gcc.dg/tree-ssa/andnot-2.c: New file.
> > * gcc.dg/tree-ssa/pr95906.c: Likewise.
> > * gcc.target/i386/pr70314.c: Likewise.
> >
> >>>
> >>> I think this patch is causing several ICEs on arm-none-linux-gnueabihf
> >>> --with-cpu cortex-a9 --with-fpu neon-fp16:
> >>>  Executed from: gcc.c-torture/compile/compile.exp
> >>>gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer
> >>> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> >>> compiler error)
> >>>gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
> >>>  Executed from: gcc.dg/dg.exp
> >>>gcc.dg/pr87746.c (internal compiler error)
> >>>  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
> >>>gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
> >>
> >> I tried a cross from x86_64-linux with current master
> >>
> >> .../configure --target=arm-none-linux-gnueabihf --enable-languages=c,c++ 
> >> --with-system-zlib --disable-nls --with-cpu=cortex-a9 --with-fpu=neon-fp16
> >> make
> >>
> >> it stops at some point with an error, but I have xgcc and cc1 in
> >> build/gcc.
> >>
> >> I copied 2 of the testcases and compiled
> >>
> >> ./xgcc pr87746.c -Ofast -S -B.
> >> ./xgcc -O3 -fdump-tree-ifcvt-details-blocks-details ifc-cd.c -S -B.
> >>
> >> without getting any ICE.
> >
> > Sorry for the delay, I had to reproduce the problem manually.
> >>
> >> Is there a machine on the compile farm where this is easy to reproduce?
> > I don't think there is any arm machine in the compile farm.
> >
> >> Or could you attach the .optimized dump that corresponds to the
> >> backtrace below? It looks like we end up with a comparison with an
> >> unexpected return type.
> >>
> >
> > I've compiled pr87746.c with -fdump-tree-ifcvt-details-blocks-details,
> > here is the log.
> > Is that what you need?
>
> Thanks.
> The one from -fdump-tree-optimized would be closer to the ICE.
Here it is.

> Though it would also be convenient to know which stmt is being expanded
> when we ICE, etc.
I think it's when expanding
_96 = _86 | _95;
(that's the value of "stmt" in expand_gimple_stmt_1 when we enter do_store_flag)

> Was I on the right track configuring with
> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
> --with-fpu=neon-fp16
> then compiling without any special option?
>
Maybe you also need --with-float=hard; I don't remember whether it's
implied by the 'hf' target suffix
(I saw similar problems with arm-none-linux-gnueabi anyway).

> > Thanks,
> >
> > Christophe
> >
> >>>  Executed from: gcc.dg/vect/vect.exp
> >>>gcc.dg/vect/pr59591-1.c (internal compiler error)
> >>>gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler 
> >>> error)
> >>>gcc.dg/vect/pr86927.c (internal compiler error)
> >>>gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
> >>>gcc.dg/vect/slp-cond-5.c (internal compiler error)
> >>>gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler 
> >>> error)
> >>>gcc.dg/vect/vect-23.c (internal compiler error)
> >>>gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
> >>>gcc.dg/vect/vect-24.c (internal compiler error)
> >>>gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
> >>>gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
> >>>gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal
> >>> compiler error)
> >>>
> >>> Backtrace for gcc.c-torture/compile/20160205-1.c   -O3
> >>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> >>> -finline-functions
> >>> during RTL pass: expand
> >>> /gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal
> >>> compiler error: in do_store_flag, at expr.c:12259
> >>> 0x8feb26 do_store_flag
> >>>/gcc/expr.c:12259
> >>> 0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> >>> expand_modifier)
> >>>/gcc/expr.c:9617
> >>> 0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >>> expand_modifier, rtx_def**, bool)
> >>>/gcc/expr.c:10159
> >>> 0x91174e expand_expr
> >>>/gcc/expr.h:282
> >>> 0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**,
> >>> rtx_def**, expand_modifier)
> >>>/gcc/expr.c:8065
> >>> 0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> >>> expand_modifier)
> >>>/gcc/expr.c:9950
> >>> 0x908cd0 expand_e

RE: [PATCH 3/5][Arm] New pattern for CSINC instructions

2020-08-06 Thread Omar Tahir
> Hi Omar,
> 
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 0b00aef7ef7..79cf684e5cb 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -743,6 +743,9 @@
>  if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
>return \"%i5\\t%0, %1, %2, lsr #31\";
> 
> +if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
> +  return \"cinc\\t%0, %1, %d4\";
> +
>  output_asm_insn (\"cmp\\t%2, %3\", operands);
> 
> 
> Hmmm, this looks wrong. The pattern needs to perform the comparison (setting 
> the CC reg) as well as do the conditional increment.
> Emitting a cinc without a cmp won't set the CC flags.
> Also, cinc increments only by 1, whereas the "arm_rhs_operand" predicate 
> accepts a wider variety of immediates, so just checking for GET_CODE 
> (operands[5]) == PLUS isn't enough.
> 
> Thanks,
> Kyrill
> 

My bad, the following line

output_asm_insn (\"cmp\\t%2, %3\", operands);

should come before my change rather than after it; that will generate the cmp needed.

As for the predicate accepting other immediates, I don't think that's an issue. 
From what I understand, the pattern represents

r0 = f5 (f4 (r2, r3), r1)

where f5 is a shiftable operator and f4 is a comparison operator. For 
simplicity let's just assume f4 is ==. Then we have

r0 = f5 (r1, r2 == r3)

If f5 is PLUS then we get

r0 = r1 + (r2 == r3)

which is

r0 = (r2 == r3) ? r1 + 1 : r1

i.e. cmp r2, r3 followed by cinc r0, r1, eq. Since all comparisons return
either 0 (comparison failed) or 1 (comparison passed), a cinc should always
work as long as the shiftable operator is PLUS.
Operand 3 being an "arm_rhs_operand" shouldn't matter, since it is only
compared against operand 2 to produce that 0 or 1.

Thanks,
Omar


[COMMITTED] bpf: more flexible support for kernel helpers

2020-08-06 Thread Jose E. Marchesi via Gcc-patches
This patch changes the existing support for BPF kernel helpers to be
more flexible, in two main ways.

First, there is no longer a hardcoded list of kernel helpers defined
in the compiler internals.  This is replaced by a new target-specific
attribute `kernel_helper' that the user can use to define her own
helpers, annotating function prototypes.

Second, following feedback from the kernel hackers, the pre-defined
helpers in the distributed bpf-helpers.h are no longer available
conditionally depending on the kernel version used in -mkernel.  The
command-line option stays for now, as it may be useful for other
things.

Target tests and documentation updated.

2020-08-06  Jose E. Marchesi  

gcc/
* config/bpf/bpf-helpers.h (KERNEL_HELPER): Define.
(KERNEL_VERSION): Remove.
* config/bpf/bpf-helpers.def: Delete.
* config/bpf/bpf.c (bpf_handle_fndecl_attribute): New function.
(bpf_attribute_table): Define.
(bpf_helper_names): Delete.
(bpf_helper_code): Likewise.
(enum bpf_builtins): Adjust to new helpers mechanism.
(bpf_output_call): Likewise.
(bpf_init_builtins): Likewise.
(bpf_init_builtins): Likewise.
* doc/extend.texi (BPF Function Attributes): New section.
(BPF Kernel Helpers): Delete section.

gcc/testsuite/
* gcc.target/bpf/helper-bind.c: Adjust to new kernel helpers
mechanism.
* gcc.target/bpf/helper-bpf-redirect.c: Likewise.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewis

[PATCH][testsuite] Add gcc.dg/ia64-sync-5.c

2020-08-06 Thread Tom de Vries
Hi,

There currently is no sync_char_short-enabled run test that tests
__sync_val_compare_and_swap.

Fix this by copying ia64-sync-3.c and modifying it for char/short.

Tested on x86_64.

OK for trunk?

Thanks,
- Tom

[testsuite] Add gcc.dg/ia64-sync-5.c

2020-08-06  Kwok Cheung Yeung  
Tom de Vries  

gcc/testsuite/ChangeLog:

* gcc.dg/ia64-sync-5.c: New test.

---
 gcc/testsuite/gcc.dg/ia64-sync-5.c | 83 ++
 1 file changed, 83 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/ia64-sync-5.c b/gcc/testsuite/gcc.dg/ia64-sync-5.c
new file mode 100644
index 000..8b16b29b20e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ia64-sync-5.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_char_short } */
+/* { dg-options } */
+/* { dg-options "-march=i486" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
+/* { dg-options "-mcpu=v9" { target sparc*-*-* } } */
+
+/* Test basic functionality of the intrinsics.  */
+
+/* This is a copy of gcc.dg/ia64-sync-3.c, for 8-bit and 16-bit.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+static char AC[4];
+static char init_qi[4] = { -30,-30,-50,-50 };
+static char test_qi[4] = { -115,-115,25,25 };
+
+static void
+do_qi (void)
+{
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -30)
+abort ();
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -115)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 0)
+abort ();
+
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != -50)
+abort ();
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != 25)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+abort ();
+}
+
+static short AS[4];
+static short init_hi[4] = { -30,-30,-50,-50 };
+static short test_hi[4] = { -115,-115,25,25 };
+
+static void
+do_hi (void)
+{
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -30)
+abort ();
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -115)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 0)
+abort ();
+
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != -50)
+abort ();
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != 25)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+abort ();
+}
+
+int main()
+{
+  memcpy(AC, init_qi, sizeof(init_qi));
+  memcpy(AS, init_hi, sizeof(init_hi));
+
+  do_qi ();
+  do_hi ();
+
+  if (memcmp (AC, test_qi, sizeof(test_qi)))
+abort ();
+  if (memcmp (AS, test_hi, sizeof(test_hi)))
+abort ();
+
+  return 0;
+}


Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Marek Polacek via Gcc-patches
On Wed, Aug 05, 2020 at 01:01:32PM -0700, Mike Stump wrote:
> On Aug 4, 2020, at 5:54 PM, Marek Polacek  wrote:
> >> As you find it difficult to express a test using the existing mechanisms, 
> >> let's talk about those and see if anyone has a good idea on how to express 
> >> it.  I think ICEs are the most annoying to manage, but, I think excess and 
> >> prune should be able to handle them.  I think should get an error or 
> >> warning, or should not get an error or warning are more trivial to manage.
> > 
> > I experimented with
> > // { dg-prune-output ".*internal compiler error.*" }
> > // { dg-xfail-if "" { *-*-* } }
> > but it's a mouthful and the results were poor (when the ICE is fixed but we
> > generate errors instead).  dg-ice is convenient, handles even the different
> > kind of ICE (when the diagnostic routines were re-entered), and generates
> > nice XPASSes when the ICE goes away.
> > 
> > I've also played games with dg-regexp but it was too ugly.
> > 
> > (I honestly don't see why new directives are such a big deal, if they're
> > properly documented.)
> 
> I don't see a bogus here?  I think that can't be skipped.

A dg-bogus?  Where?  Putting it on the line where the ICE happens results in
FAILs.  A complete example of what I had in mind:

// PR c++/88003
// { dg-do compile { target c++14 } }
// { dg-prune-output ".*internal compiler error.*" }
// { dg-xfail-if "" { *-*-* } }

auto test() {
  struct O {
    struct N;
  };
  return O();
}

typedef decltype(test()) TN;
struct TN::N {};


I think I'd still prefer a single dg-ice.

Marek



Re: [PATCH] x86_64: Integer min/max improvements.

2020-08-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 06, 2020 at 09:40:49AM +0100, Roger Sayle wrote:

This test fails on i686-linux (or x86_64-linux when testing with -m32).
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
i386.exp=minmax-9.c'
Running /usr/src/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
FAIL: gcc.target/i386/minmax-9.c scan-assembler-times test 3

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/minmax-9.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int foo(int x)
> +{
> +  return max(x,0);
> +}
> +
> +int bar(int x)
> +{
> +  return min(x,0);
> +}
> +
> +unsigned int baz(unsigned int x)
> +{
> +  return min(x,1);
> +}
> +
> +/* { dg-final { scan-assembler-times "xor" 3 } } */
> +/* { dg-final { scan-assembler-times "test" 3 } } */

Jakub



Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 06, 2020 at 08:27:05AM -0400, Marek Polacek via Gcc-patches wrote:
> // PR c++/88003
> // { dg-do compile { target c++14 } }
> // { dg-prune-output ".*internal compiler error.*" }
> // { dg-xfail-if "" { *-*-* } }

Would that XPASS if the ICE is fixed though?

Jakub



Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Marek Polacek via Gcc-patches
On Thu, Aug 06, 2020 at 02:30:11PM +0200, Jakub Jelinek wrote:
> On Thu, Aug 06, 2020 at 08:27:05AM -0400, Marek Polacek via Gcc-patches wrote:
> > // PR c++/88003
> > // { dg-do compile { target c++14 } }
> > // { dg-prune-output ".*internal compiler error.*" }
> > // { dg-xfail-if "" { *-*-* } }
> 
> Would that XPASS if the ICE is fixed though?

Yes, but if the ICE is fixed and we issue an error instead, we won't get
an XPASS, just expected failures, so you'd never notice that the ICE is
gone.  Therefore, dg-prune-output + dg-xfail-if is not a viable solution.

Marek



[PATCH/RFC] options: Make --help= emit values post-override

2020-08-06 Thread Kewen.Lin via Gcc-patches
Hi,

While updating the patch according to Richard's review comments
here https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551474.html,
I noticed that "-Q --help=params" doesn't show the final values after
target option overriding; instead, it emits the default values from
params.opt (when no explicit param settings are given).

I guess it's more meaningful to make it emit the post-override values,
to avoid possible confusion for users.  Does that make sense, or are
there any concerns?

BTW, I'm not sure whether it's a good idea to move the
target_option_override_hook call into print_specific_help and use a
function-local static variable to ensure it's called only once for all
kinds of help dumping (including combinations); then the calls in
function common_handle_option could be removed.


BR,
Kewen
-

gcc/ChangeLog:

* opts-global.c (decode_options): Adjust call to print_help.
* opts.c (print_help): Add a function pointer argument
target_option_override_hook and call it before print_specific_help.
* opts.h (print_help): Add one more argument to the function declaration.

diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..ec960c87c9a 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -328,7 +328,7 @@ decode_options (struct gcc_options *opts, struct gcc_options *opts_set,
   const char *arg;
 
   FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-print_help (opts, lang_mask, arg);
+print_help (opts, lang_mask, arg, target_option_override_hook);
 }
 
 /* Hold command-line options associated with stack limitation.  */
diff --git a/gcc/opts.c b/gcc/opts.c
index 499eb900643..df184f909e6 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2017,7 +2017,8 @@ check_alignment_argument (location_t loc, const char *flag, const char *name)
 
 void
 print_help (struct gcc_options *opts, unsigned int lang_mask,
-   const char *help_option_argument)
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void))
 {
   const char *a = help_option_argument;
   unsigned int include_flags = 0;
@@ -2145,9 +2146,11 @@ print_help (struct gcc_options *opts, unsigned int lang_mask,
   if (!(include_flags & CL_PARAMS))
 exclude_flags |= CL_PARAMS;
 
-  if (include_flags)
+  if (include_flags) {
+target_option_override_hook ();
 print_specific_help (include_flags, exclude_flags, 0, opts,
 lang_mask);
+  }
 }
 
 /* Handle target- and language-independent options.  Return zero to
diff --git a/gcc/opts.h b/gcc/opts.h
index 8f594b46e33..9a837305af1 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -419,8 +419,9 @@ extern bool target_handle_option (struct gcc_options *opts,
 extern void finish_options (struct gcc_options *opts,
struct gcc_options *opts_set,
location_t loc);
-extern void print_help (struct gcc_options *opts, unsigned int lang_mask, const
-   char *help_option_argument);
+extern void print_help (struct gcc_options *opts, unsigned int lang_mask,
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void));
 extern void default_options_optimization (struct gcc_options *opts,
  struct gcc_options *opts_set,
  struct cl_decoded_option *decoded_options,


[PATCH] cmpelim: recognize extra clobbers in insns

2020-08-06 Thread Pip Cet via Gcc-patches
I'm working on the AVR cc0 -> CCmode conversion (bug#92729). One
problem is that the cmpelim pass is currently very strict in requiring
insns of the form

(parallel [(set (reg:SI) (op:SI ... ...))
   (clobber (reg:CC REG_CC))])

when in fact AVR's insns often have the form

(parallel [(set (reg:SI) (op:SI ... ...))
   (clobber (scratch:QI))
   (clobber (reg:CC REG_CC))])

The attached patch relaxes checks in the cmpelim code to recognize
such insns, and makes it attempt to recognize

(parallel [(set (reg:CC REG_CC) (compare:CC ... ...))
   (set (reg:SI) (op:SI ... ...))
   (clobber (scratch:QI))])

as a new insn for that example. This appears to work.

I've bootstrapped and run the test suite with the patch, without differences.
From 788bf691aed9f27e1719a6c2e61b12f2a24e6b5d Mon Sep 17 00:00:00 2001
From: Pip Cet 
Date: Tue, 4 Aug 2020 18:44:26 +
Subject: [PATCH] cmpelim: handle extra clobbers

Handle extra clobbers in CC-clobbering insns when attempting to
recognize the corresponding CC-setting insn.

This is for the AVR CCmode conversion. AVR has insns like

(define_insn "andhi3"
  [(set (match_operand:HI 0 ...)
        (and:HI (match_operand:HI 1 ...)
                (match_operand:HI 2 ...)))
   (clobber (match_scratch:QI 3 ...))
   (clobber (reg:CC REG_CC))] ...)

which can be profitably converted into CC-setting variants.

2020-08-04  Pip Cet  

gcc/ChangeLog:

	* compare-elim.c (arithmetic_flags_clobber_p): Allow extra
	clobbers.
	(try_validate_parallel, try_eliminate_compare): Likewise.
---
 gcc/compare-elim.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 85f3e344074..295c32b0953 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -202,7 +202,7 @@ arithmetic_flags_clobber_p (rtx_insn *insn)
   if (asm_noperands (pat) >= 0)
 return false;
 
-  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
+  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) >= 2)
 {
   x = XVECEXP (pat, 0, 0);
   if (GET_CODE (x) != SET)
@@ -211,7 +211,7 @@ arithmetic_flags_clobber_p (rtx_insn *insn)
   if (!REG_P (x))
 	return false;
 
-  x = XVECEXP (pat, 0, 1);
+  x = XVECEXP (pat, 0, XVECLEN (pat, 0) - 1);
   if (GET_CODE (x) == CLOBBER)
 	{
 	  x = XEXP (x, 0);
@@ -663,6 +663,16 @@ static rtx
 try_validate_parallel (rtx set_a, rtx set_b)
 {
   rtx par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+  if (GET_CODE (set_b) == PARALLEL)
+{
+  int len = XVECLEN (set_b, 0);
+  rtvec v = rtvec_alloc (len);
+  RTVEC_ELT (v, 0) = set_a;
+  for (int i = 0; i < len - 1; i++)
+	RTVEC_ELT (v, i + 1) = XVECEXP (set_b, 0, i);
+
+  par = gen_rtx_PARALLEL (VOIDmode, v);
+}
   rtx_insn *insn = make_insn_raw (par);
 
   if (insn_invalid_p (insn, false))
@@ -873,10 +883,25 @@ try_eliminate_compare (struct comparison *cmp)
  [(set (reg:CCM) (compare:CCM (operation) (immediate)))
   (set (reg) (operation)]  */
 
-  rtvec v = rtvec_alloc (2);
+  /* Rotate
+ [(set A B)
+  (clobber (scratch))
+  ...
+  (clobber (reg:CCM))]
+
+ to
+
+ [(set (reg:CCM) (compare:CCM (operation) (immediate)))
+  (set A B)
+  (clobber (scratch))
+  ...]  */
+
+  int len = XVECLEN (PATTERN (insn), 0);
+  rtvec v = rtvec_alloc (len);
   RTVEC_ELT (v, 0) = y;
-  RTVEC_ELT (v, 1) = x;
-  
+  for (int i = 0; i < len - 1; i++)
+RTVEC_ELT (v, i + 1) = XVECEXP (PATTERN (insn), 0, i);
+
   rtx pat = gen_rtx_PARALLEL (VOIDmode, v);
   
   /* Succeed if the new instruction is valid.  Note that we may have started
-- 
2.28.0



[PATCH] reassoc: Improve maybe_optimize_range_tests [PR96480]

2020-08-06 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase, if the IL before reassoc would be:
...
   [local count: 354334800]:
  if (x_3(D) == 2)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 233860967]:
  if (x_3(D) == 3)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 79512730]:

   [local count: 1073741824]:
  # prephitmp_7 = PHI <1(3), 1(4), 1(5), 1(2), 0(6)>
then we would optimize it properly, but because bbs 5-7 are instead:
   [local count: 233860967]:
  if (x_3(D) == 3)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 79512730]:

   [local count: 1073741824]:
  # prephitmp_7 = PHI <1(3), 1(4), 0(5), 1(2), 1(6)>
(i.e. the true/false edges on the last bb with condition swapped
and ditto for the phi args), we don't recognize it.  If bb 6
is empty, there should be no functional difference between the two IL
representations.

This patch handles those special cases.
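For illustration only, here is a C++ sketch of why the two IL shapes are interchangeable and what range-test merging can then do. The function names are hypothetical and this is not the testcase from the PR. Inverting the last condition and swapping its PHI arguments gives the same function, and both forms are equivalent to a single unsigned range check, roughly the kind of test reassoc's range-test optimization emits.

```cpp
#include <cassert>
#include <climits>

// Hypothetical source-level analogue of the first IL shape: a chain of
// equality tests on x whose result feeds a 1/0 PHI.
int
chain (int x)
{
  if (x == 2)
    return 1;
  if (x == 3)
    return 1;
  return 0;
}

// Same chain with the last condition inverted, i.e. the true/false edges
// and the corresponding PHI arguments swapped.  Semantically identical.
int
chain_inverted (int x)
{
  if (x == 2)
    return 1;
  if (x != 3)
    return 0;
  return 1;
}

// Roughly what merging the tests yields for either form: one unsigned
// comparison covering the range [2, 3].
int
merged (int x)
{
  return static_cast<unsigned> (x) - 2u <= 1u;
}
```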

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-08-06  Jakub Jelinek  

PR tree-optimization/96480
* tree-ssa-reassoc.c (suitable_cond_bb): Add TEST_SWAPPED_P argument.
If TEST_BB ends in cond and has one edge to *OTHER_BB and another
through an empty bb to that block too, if PHI args don't match, retry
them through the other path from TEST_BB.
(maybe_optimize_range_tests): Adjust callers.  Handle such LAST_BB
through inversion of the condition.

* gcc.dg/tree-ssa/pr96480.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2020-07-30 15:04:38.0 +0200
+++ gcc/tree-ssa-reassoc.c  2020-08-06 11:19:30.942825436 +0200
@@ -4127,11 +4127,14 @@ final_range_test_p (gimple *stmt)
of TEST_BB, and *OTHER_BB is either NULL and filled by the routine,
or compared with to find a common basic block to which all conditions
branch to if true resp. false.  If BACKWARD is false, TEST_BB should
-   be the only predecessor of BB.  */
+   be the only predecessor of BB.  *TEST_SWAPPED_P is set to true if
+   TEST_BB is a bb ending in condition where the edge to non-*OTHER_BB
+   block points to an empty block that falls through into *OTHER_BB and
+   the phi args match that path.  */
 
 static bool
 suitable_cond_bb (basic_block bb, basic_block test_bb, basic_block *other_bb,
- bool backward)
+ bool *test_swapped_p, bool backward)
 {
   edge_iterator ei, ei2;
   edge e, e2;
@@ -4196,6 +4199,7 @@ suitable_cond_bb (basic_block bb, basic_
   /* Now check all PHIs of *OTHER_BB.  */
   e = find_edge (bb, *other_bb);
   e2 = find_edge (test_bb, *other_bb);
+ retry:;
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gphi *phi = gsi.phi ();
@@ -4221,11 +4225,52 @@ suitable_cond_bb (basic_block bb, basic_
  else
{
  gimple *test_last = last_stmt (test_bb);
- if (gimple_code (test_last) != GIMPLE_COND
- && gimple_phi_arg_def (phi, e2->dest_idx)
-== gimple_assign_lhs (test_last)
- && (integer_zerop (gimple_phi_arg_def (phi, e->dest_idx))
- || integer_onep (gimple_phi_arg_def (phi, e->dest_idx
+ if (gimple_code (test_last) == GIMPLE_COND)
+   {
+ if (backward ? e2->src != test_bb : e->src != bb)
+   return false;
+
+ /* For last_bb, handle also:
+if (x_3(D) == 3)
+  goto ; [34.00%]
+else
+  goto ; [66.00%]
+
+ [local count: 79512730]:
+
+ [local count: 1073741824]:
+# prephitmp_7 = PHI <1(3), 1(4), 0(5), 1(2), 1(6)>
+where bb 7 is *OTHER_BB, but the PHI values from the
+earlier bbs match the path through the empty bb
+in between.  */
+ edge e3;
+ if (backward)
+   e3 = EDGE_SUCC (test_bb,
+   e2 == EDGE_SUCC (test_bb, 0) ? 1 : 0);
+ else
+   e3 = EDGE_SUCC (bb,
+   e == EDGE_SUCC (bb, 0) ? 1 : 0);
+ if (empty_block_p (e3->dest)
+ && single_succ_p (e3->dest)
+ && single_succ (e3->dest) == *other_bb
+ && single_pred_p (e3->dest)
+ && single_succ_edge (e3->dest)->flags == EDGE_FALLTHRU)
+   {
+ if (backward)
+   e2 = single_succ_edge (e3->dest);
+ else
+   e = single_succ_edge (e3->dest);
+ if (test_swapped_p)
+   *test_swapped_p = true;
+ goto retry;
+   }
+   }
+ else if (gimple_phi_arg_def (phi, e2->dest_idx)
+  == gimple_assign_lhs (test_last)
+  

[PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-08-06 Thread Richard Biener
This adds a move CTOR to auto_vec and makes use of a
auto_vec return value for get_loop_exit_edges denoting
that lifetime management of the vector is handed to the caller.

The move CTOR prompted the hash_table change because it apparently
makes the copy CTOR implicitly deleted (good), and hash-table
expansion of the odr_enum_map, which is
hash_map  where odr_enum has an
auto_vec member, triggers this.  Not sure if
there was a latent bug there before this (I think we're not
invoking DTORs, but we're invoking copy-CTORs).
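A minimal C++ sketch of the ownership pattern described above, assuming a simplified owning vector. The `owning_vec` class and its helpers are hypothetical, not GCC's auto_vec. Declaring a move constructor implicitly deletes the copy constructor. Returning by value hands the buffer to the caller without an explicit release (). Relocating an element into raw storage with placement new, as hash_table::expand does, then compiles only with std::move.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical owning vector of unsigned values (not GCC's auto_vec).
// It frees its buffer in the destructor, so a member-wise copy would
// double-free.
class owning_vec
{
public:
  owning_vec () = default;

  // Move constructor: take over the buffer and leave the source empty.
  // Declaring it makes the implicit copy constructor deleted.
  owning_vec (owning_vec &&other) noexcept
    : m_data (other.m_data), m_len (other.m_len), m_cap (other.m_cap)
  {
    other.m_data = nullptr;
    other.m_len = other.m_cap = 0;
  }

  ~owning_vec () { std::free (m_data); }

  void push (unsigned v)
  {
    if (m_len == m_cap)
      {
        m_cap = m_cap ? 2 * m_cap : 4;
        void *p = std::realloc (m_data, m_cap * sizeof (unsigned));
        if (!p)
          std::abort ();
        m_data = static_cast<unsigned *> (p);
      }
    m_data[m_len++] = v;
  }

  unsigned length () const { return m_len; }
  unsigned operator[] (unsigned i) const { return m_data[i]; }

private:
  unsigned *m_data = nullptr;
  unsigned m_len = 0, m_cap = 0;
};

static_assert (!std::is_copy_constructible<owning_vec>::value,
               "copy constructor is implicitly deleted");
static_assert (std::is_move_constructible<owning_vec>::value,
               "move constructor is available");

// Returning by value hands ownership to the caller, in the spirit of
// get_loop_exit_edges returning auto_vec: no release () call needed.
owning_vec
even_numbers_below (unsigned n)
{
  owning_vec v;
  for (unsigned i = 0; i < n; i += 2)
    v.push (i);
  return v;  // moved (or elided), never copied
}

// Relocating an element into raw storage, as a table expansion does.
// With the copy constructor deleted, this only compiles with std::move.
owning_vec *
relocate (void *slot, owning_vec &x)
{
  return new (slot) owning_vec (std::move (x));
}
```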

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Does this all look sensible and is it a good change
(the get_loop_exit_edges one)?

Thanks,
Richard.

2020-08-06  Richard Biener  

* vec.h (auto_vec::auto_vec (auto_vec &&)):
New move CTOR.
* hash-table.h (hash_table::expand): Use std::move when expanding.
* cfgloop.h (get_loop_exit_edges): Return auto_vec.
* cfgloop.c (get_loop_exit_edges): Adjust.
---
 gcc/cfgloop.c|  4 ++--
 gcc/cfgloop.h|  2 +-
 gcc/cfgloopmanip.c   |  3 +--
 gcc/hash-table.h |  2 +-
 gcc/ipa-fnsummary.c  |  4 +---
 gcc/ira-build.c  | 12 +++-
 gcc/ira-color.c  |  4 +---
 gcc/loop-unroll.c|  3 +--
 gcc/predict.c|  9 ++---
 gcc/tree-predcom.c   |  3 +--
 gcc/tree-ssa-loop-ch.c   |  3 +--
 gcc/tree-ssa-loop-im.c   |  3 +--
 gcc/tree-ssa-loop-ivcanon.c  |  9 ++---
 gcc/tree-ssa-loop-manip.c|  3 +--
 gcc/tree-ssa-loop-niter.c| 20 +---
 gcc/tree-ssa-loop-prefetch.c |  7 ++-
 gcc/vec.h|  6 ++
 17 files changed, 32 insertions(+), 65 deletions(-)

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 7720e6e5d2c..33a26cca6a4 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1202,10 +1202,10 @@ release_recorded_exits (function *fn)
 
 /* Returns the list of the exit edges of a LOOP.  */
 
-vec 
+auto_vec
 get_loop_exit_edges (const class loop *loop, basic_block *body)
 {
-  vec edges = vNULL;
+  auto_vec edges;
   edge e;
   unsigned i;
   edge_iterator ei;
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..f1687f37401 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -383,7 +383,7 @@ extern basic_block *get_loop_body_in_custom_order (const class loop *,
 extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
   int (*) (const void *, const void *, void *));
 
-extern vec get_loop_exit_edges (const class loop *, basic_block * = NULL);
+extern auto_vec get_loop_exit_edges (const class loop *, basic_block * = NULL);
 extern edge single_exit (const class loop *);
 extern edge single_likely_exit (class loop *loop, vec);
 extern unsigned num_loop_branches (const class loop *);
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 73134a20e33..3c9e2a0a99c 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -126,7 +126,7 @@ fix_loop_placement (class loop *loop, bool *irred_invalidated)
 {
   unsigned i;
   edge e;
-  vec exits = get_loop_exit_edges (loop);
+  auto_vec exits = get_loop_exit_edges (loop);
   class loop *father = current_loops->tree_root, *act;
   bool ret = false;
 
@@ -157,7 +157,6 @@ fix_loop_placement (class loop *loop, bool *irred_invalidated)
   ret = true;
 }
 
-  exits.release ();
   return ret;
 }
 
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 32f3a634e1e..487003c3acf 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -819,7 +819,7 @@ hash_table::expand ()
   if (!is_empty (x) && !is_deleted (x))
 {
   value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
- new ((void*) q) value_type (x);
+ new ((void*) q) value_type (std::move (x));
 }
 
   p++;
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 59e52927151..f750ec1725c 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -2767,7 +2767,6 @@ analyze_function_body (struct cgraph_node *node, bool early)
   scev_initialize ();
   FOR_EACH_LOOP (loop, 0)
{
- vec exits;
  edge ex;
  unsigned int j;
  class tree_niter_desc niter_desc;
@@ -2776,7 +2775,7 @@ analyze_function_body (struct cgraph_node *node, bool early)
  else
bb_predicate = false;
 
- exits = get_loop_exit_edges (loop);
+ auto_vec exits = get_loop_exit_edges (loop);
  FOR_EACH_VEC_ELT (exits, j, ex)
if (number_of_iterations_exit (loop, ex, &niter_desc, false)
&& !is_gimple_min_invariant (niter_desc.niter))
@@ -2794,7 +2793,6 @@ analyze_function_body (struct cgraph_node *node, bool early)
   loop with independent predicate.  */
loop_iterations &= will_be_nonconstant;
}
- exits.release ();
}
 
   /* To avoid quadratic behavior we analyze stride predicates only
diff --git a/gcc/i

[PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-06 Thread qiaopeixin
Hi,

The test case vector-subscript-2.c in the GCC testsuite triggers an ICE in
the expand pass, because '-mgeneral-regs-only' is incompatible with the use of
V4SI mode. I propose emitting a diagnostic instead of ICEing; the problem
has been discussed in PR 96479.

I attached the patch to solve the problem. Bootstrapped and tested on 
aarch64-linux-gnu. Any suggestions?

All the best,
Peixin


0001-AArch64-Add-if-condition-in-aarch64_function_value-P.patch


[PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-06 Thread Maciej W. Rozycki via Gcc-patches
Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374, 
, and replace 
`-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
for the affected functions while not pulling the unwinder proper, which 
is not required here.

Remove the ARM overrides accordingly, retaining the hook infrastructure 
however, and make the ARM test case a generic one.

Besides saving program space, this fixes a RISC-V glibc build error:
unsatisfied `malloc' and `free' references from the unwinder cause
link errors with `ld.so' where libgcc has been built at -O0.

gcc/
* testsuite/gcc.target/arm/div64-unwinding.c: Rename to...
* testsuite/gcc.dg/div64-unwinding.c: ... this.

libgcc/
* Makefile.in [!LIB2_DIVMOD_EXCEPTION_FLAGS]
(LIB2_DIVMOD_EXCEPTION_FLAGS): Replace `-fexceptions
-fnon-call-exceptions' with `-fasynchronous-unwind-tables'.
* config/arm/t-bpabi (LIB2_DIVMOD_EXCEPTION_FLAGS): Remove
variable.
* config/arm/t-netbsd-eabi (LIB2_DIVMOD_EXCEPTION_FLAGS):
Likewise.
---
Hi,

 I realised we still use handwritten ChangeLog entries (I got confused
by the now differing policies of the various pieces of the GNU
toolchain), so here's v2 of the change, with a fix for that problem
being the only update.

 Also, I have since run verification with the `riscv64-linux-gnu' target
and the ilp32d multilib, as being more representative of the change being made.
No problems were observed, although the now enabled test case scored:

UNSUPPORTED: gcc.dg/div64-unwinding.c

of course with the target failing the `! *-*-linux*' condition.

 Given that for the `riscv64-linux-gnu' target and the ilp32d multilib
glibc currently fails to link against libgcc.a built at -O0, I first ran
reference testing with target libraries built at -O2, but comparing that
to the change-under-test -O2 results revealed another issue with GCC target
libraries built at -O0 causing link failures across testsuites, namely
libgcov.a referring to atomic primitives where libatomic.a has not been
linked in.  I haven't figured out yet if the issue is in libgcov, the 
testsuite or the specs.  Examples of failures:

.../bin/riscv64-linux-gnu-ld: .../gcc/testsuite/g++/../../lib32/ilp32d/libgcov.a(_gcov_indirect_call_profiler_v4.o): in function `__gcov_topn_values_profiler_body':
.../libgcc/libgcov-profiler.c:116: undefined reference to `__atomic_fetch_add_8'
.../bin/riscv64-linux-gnu-ld: .../libgcc/libgcov-profiler.c:129: undefined reference to `__atomic_fetch_add_8'
.../bin/riscv64-linux-gnu-ld: .../libgcc/libgcov-profiler.c:150: undefined reference to `__atomic_fetch_sub_8'
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: g++.dg/other/pr55650.C  -std=gnu++98 (test for excess errors)

There were some odd Fortran failures too, with test cases failing to link, 
making the results difficult to interpret.  Therefore I decided to arrange 
for a special build with first stage GCC built with its target libraries 
at -O2, so that first stage glibc builds, and then second stage GCC built 
with its target libraries at -O0 and second stage glibc omitted.  That 
removed the extra Fortran failures regardless of whether this change has 
been applied or not, but we may consider looking overall into why a full 
`riscv64-linux-gnu' build at -O0 has regressions against -O2 at least in 
the ilp32d multilib.

 Meanwhile, OK to apply?

  Maciej

Changes from v1:

- ChangeLog entries added.
---
 gcc/testsuite/gcc.dg/div64-unwinding.c |   25 +
 gcc/testsuite/gcc.target/arm/div64-unwinding.c |   25 -
 libgcc/Makefile.in |2 +-
 libgcc/config/arm/t-bpabi  |5 -
 libgcc/config/arm/t-netbsd-eabi|5 -
 5 files changed, 26 insertions(+), 36 deletions(-)

gcc-libgcc-divmod-asynchronous-unwind-tables.diff
Index: gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
@@ -0,0 +1,25 @@
+/* Performing a 64-bit division should not pull in the unwinder.  */
+
+/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
+/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
+/* { dg-options "-O0" } */
+
+#include <stdlib.h>
+
+long long
+foo (long long c, long long d)
+{
+  return c/d;
+}
+
+long long x = 0;
+long long y = 1;
+
+extern int (*_Unwind_RaiseException) (void *) __attribute__((weak));
+
+int main(void)
+{
+  if (&_Unwind_RaiseException != NULL)
+abort ();;
+  return foo (x, y);
+}
Index: gcc/gcc/testsuite/gcc.target/arm/div64-unwinding.c
===
--- gcc.orig/gcc/testsuite/gcc.

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 05/08/20 16:31 -0600, Martin Sebor via Libstdc++ wrote:

On 8/5/20 3:25 PM, Jonathan Wakely wrote:

P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.
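As a sketch of the same idea in user code: taking the buffer as a reference to an array lets the template deduce N, so the bound cannot disagree with the real buffer size. The `bounded_extract` wrapper is hypothetical and not part of libstdc++. Here std::setw supplies the limit, which works whether `in >> buf` resolves to the pre-C++20 pointer overload or the C++20 array overload.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper, not the libstdc++ code: N is deduced from the
// array type, so the bound always matches the real buffer.  std::setw (N)
// limits the extraction to N - 1 characters plus the terminating null.
template<std::size_t N>
std::istream &
bounded_extract (std::istream &in, char (&buf)[N])
{
  static_assert (N > 0, "need room for the terminator");
  return in >> std::setw (static_cast<int> (N)) >> buf;
}
```

A word longer than the buffer is split across successive extractions instead of overflowing it.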

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size when possible (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.
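The limit computed by the old pointer overloads can be modeled in a few lines. This is a simplified C++ sketch, not the committed code. `extraction_limit` is a hypothetical name, and its `bos` parameter stands for the __builtin_object_size (s, 0) result in bytes, where (size_t)-1 means "unknown". The function returns 0 where the library would instead set failbit because there is no room even for the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <limits>

// Hypothetical model of the maximum number of CharT elements the old
// operator>>(istream&, CharT*) overloads may store.
template<typename CharT>
std::streamsize
extraction_limit (std::size_t bos)
{
  if (bos < sizeof (CharT))
    return 0;  // not even space for the null terminator
  if (bos == static_cast<std::size_t> (-1))
    // Size unknown (e.g. without optimization): fall back to the maximum.
    bos = static_cast<std::size_t> (std::numeric_limits<std::streamsize>::max ());
  return static_cast<std::streamsize> (bos / sizeof (CharT));
}
```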

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
  template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/c++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(&)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then we just assume it's larger than the string to be

Re: [PATCH] reassoc: Improve maybe_optimize_range_tests [PR96480]

2020-08-06 Thread Richard Biener
On Thu, 6 Aug 2020, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, if the IL before reassoc would be:
> ...
>[local count: 354334800]:
>   if (x_3(D) == 2)
> goto ; [34.00%]
>   else
> goto ; [66.00%]
> 
>[local count: 233860967]:
>   if (x_3(D) == 3)
> goto ; [34.00%]
>   else
> goto ; [66.00%]
> 
>[local count: 79512730]:
> 
>[local count: 1073741824]:
>   # prephitmp_7 = PHI <1(3), 1(4), 1(5), 1(2), 0(6)>
> then we'd optimize it properly, but as bb 5-7 is instead:
>[local count: 233860967]:
>   if (x_3(D) == 3)
> goto ; [34.00%]
>   else
> goto ; [66.00%]
> 
>[local count: 79512730]:
> 
>[local count: 1073741824]:
>   # prephitmp_7 = PHI <1(3), 1(4), 0(5), 1(2), 1(6)>
> (i.e. the true/false edges on the last bb with condition swapped
> and ditto for the phi args), we don't recognize it.  If bb 6
> is empty, there should be no functional difference between the two IL
> representations.
> 
> This patch handles those special cases.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-08-06  Jakub Jelinek  
> 
>   PR tree-optimization/96480
>   * tree-ssa-reassoc.c (suitable_cond_bb): Add TEST_SWAPPED_P argument.
>   If TEST_BB ends in cond and has one edge to *OTHER_BB and another
>   through an empty bb to that block too, if PHI args don't match, retry
>   them through the other path from TEST_BB.
>   (maybe_optimize_range_tests): Adjust callers.  Handle such LAST_BB
>   through inversion of the condition.
> 
>   * gcc.dg/tree-ssa/pr96480.c: New test.
> 
> --- gcc/tree-ssa-reassoc.c.jj 2020-07-30 15:04:38.0 +0200
> +++ gcc/tree-ssa-reassoc.c2020-08-06 11:19:30.942825436 +0200
> @@ -4127,11 +4127,14 @@ final_range_test_p (gimple *stmt)
> of TEST_BB, and *OTHER_BB is either NULL and filled by the routine,
> or compared with to find a common basic block to which all conditions
> branch to if true resp. false.  If BACKWARD is false, TEST_BB should
> -   be the only predecessor of BB.  */
> +   be the only predecessor of BB.  *TEST_SWAPPED_P is set to true if
> +   TEST_BB is a bb ending in condition where the edge to non-*OTHER_BB
> +   block points to an empty block that falls through into *OTHER_BB and
> +   the phi args match that path.  */
>  
>  static bool
>  suitable_cond_bb (basic_block bb, basic_block test_bb, basic_block *other_bb,
> -   bool backward)
> +   bool *test_swapped_p, bool backward)
>  {
>edge_iterator ei, ei2;
>edge e, e2;
> @@ -4196,6 +4199,7 @@ suitable_cond_bb (basic_block bb, basic_
>/* Now check all PHIs of *OTHER_BB.  */
>e = find_edge (bb, *other_bb);
>e2 = find_edge (test_bb, *other_bb);
> + retry:;
>for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
>  {
>gphi *phi = gsi.phi ();
> @@ -4221,11 +4225,52 @@ suitable_cond_bb (basic_block bb, basic_
> else
>   {
> gimple *test_last = last_stmt (test_bb);
> -   if (gimple_code (test_last) != GIMPLE_COND
> -   && gimple_phi_arg_def (phi, e2->dest_idx)
> -  == gimple_assign_lhs (test_last)
> -   && (integer_zerop (gimple_phi_arg_def (phi, e->dest_idx))
> -   || integer_onep (gimple_phi_arg_def (phi, e->dest_idx
> +   if (gimple_code (test_last) == GIMPLE_COND)
> + {
> +   if (backward ? e2->src != test_bb : e->src != bb)
> + return false;
> +
> +   /* For last_bb, handle also:
> +  if (x_3(D) == 3)
> +goto ; [34.00%]
> +  else
> +goto ; [66.00%]
> +
> +   [local count: 79512730]:
> +
> +   [local count: 1073741824]:
> +  # prephitmp_7 = PHI <1(3), 1(4), 0(5), 1(2), 1(6)>
> +  where bb 7 is *OTHER_BB, but the PHI values from the
> +  earlier bbs match the path through the empty bb
> +  in between.  */
> +   edge e3;
> +   if (backward)
> + e3 = EDGE_SUCC (test_bb,
> + e2 == EDGE_SUCC (test_bb, 0) ? 1 : 0);
> +   else
> + e3 = EDGE_SUCC (bb,
> + e == EDGE_SUCC (bb, 0) ? 1 : 0);
> +   if (empty_block_p (e3->dest)
> +   && single_succ_p (e3->dest)
> +   && single_succ (e3->dest) == *other_bb
> +   && single_pred_p (e3->dest)
> +   && single_succ_edge (e3->dest)->flags == EDGE_FALLTHRU)
> + {
> +   if (backward)
> + e2 = single_succ_edge (e3->dest);
> +   else
> + e = single_succ_edge (e3->dest);
> +   if (test_swapped_p)
> + *test_swapped_p = true

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 06, 2020 at 02:14:48PM +0100, Jonathan Wakely wrote:
>   template
> __attribute__((__nonnull__(2), __access__(__write_only__, 2)))
> inline basic_istream<_CharT, _Traits>&
> operator>>(basic_istream<_CharT, _Traits>& __in, _CharT* __s)
> {
>   size_t __n = __builtin_object_size(__s, 0);
>   if (__builtin_expect(__n < sizeof(_CharT), false))
>   {
> // not even space for null terminator
> __glibcxx_assert(__n >= sizeof(_CharT));
> __in.width(0);
> __in.setstate(ios_base::failbit);
>   }
>   else
>   {
> if (__n == (size_t)-1)
>   __n = __gnu_cxx::__numeric_traits::__max;
> std::__istream_extract(__in, __s, __n / sizeof(_CharT));
>   }
>   return __in;
> }
> 
> This will give a -Wstringop-overflow warning at -O0 and then overflow
> the buffer, with undefined behaviour. And it will give no warning but
> avoid the overflow when optimising. This isn't my preferred outcome,
> I'd prefer to always get a warning, *and* be able to avoid the
> overflow when optimising and the size is known.

A way to get a warning even at -O2 would be to call some external function
in the if (__bos0 < sizeof(_CharT)) block, which wouldn't be optimized away
and would have __attribute__((warning ("..."))) on it.
See e.g. how glibc uses __warndecl in
/usr/include/bits/string_fortified.h.
One can use the alias attribute to have different warnings for the same external
call (which could do, e.g., what part of __glibcxx_assert does: call vprintf
+ abort).
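A rough C++ sketch of that suggestion, assuming GCC's `warning` function attribute. The `report_no_room` and `dest_has_room` names are hypothetical, and the hook is defined locally rather than being a real external function so that the example stays self-contained. The call survives only when __builtin_object_size shows the destination to be too small, so an optimized build with an adequate buffer drops both the branch and the diagnostic. Without optimization the branch may not be removed, so the warning can appear regardless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical out-of-line hook.  GCC's `warning` attribute diagnoses any
// call to it that is not removed by dead-code elimination; noinline keeps
// the call itself from being inlined away.
__attribute__ ((noinline, warning ("destination has no room for a terminator")))
void
report_no_room ()
{
  std::fputs ("destination has no room for a terminator\n", stderr);
}

// Sketch of the suggested pattern: the hook is reachable only when
// __builtin_object_size proves the destination too small.  Returns false
// in that case (where the real code would set failbit).
__attribute__ ((always_inline)) inline bool
dest_has_room (char *s)
{
  std::size_t n = __builtin_object_size (s, 0);
  if (n < sizeof (char))
    {
      report_no_room ();
      return false;
    }
  return true;
}
```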

Jakub



[RS6000] PR96493, powerpc local call linkage failure

2020-08-06 Thread Alan Modra via Gcc-patches
This corrects current_file_function_operand, an operand predicate used
to determine whether a symbol_ref is safe to use with the local_call
patterns.  Calls between pcrel and non-pcrel code need to go via
linker stubs.  In the case of non-pcrel code to pcrel the stub saves
r2 but there needs to be a nop after the branch for the r2 restore.
So the local_call patterns can't be used there.  For pcrel code to
non-pcrel the local_call patterns could still be used, but I thought
it better to not use them since the call isn't direct.  Code generated
by the corresponding call_nonlocal_aix for pcrel is identical anyway.

Incidentally, without the TREE_CODE () == FUNCTION_DECL test,
gcc.c-torture/compile/pr37433.c and pr37433-1.c ICE.  Also, if you
make the test more strict by disallowing an op without a
SYMBOL_REF_DECL then a bunch of go and split-stack tests fail.  That's
because a prologue call to __morestack can't have a following nop.
(__morestack calls its caller at a fixed offset from the __morestack
call!)
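The effect of the new clause can be summarized with a small truth-table model in C++. This is not the predicate itself: `callee_info` and `rejected_by_pcrel_mismatch` are hypothetical stand-ins for the SYMBOL_REF_DECL, TREE_CODE, rs6000_fndecl_pcrel_p and rs6000_pcrel_p queries, and the existing external/weak conditions of current_file_function_operand are left out. A symbol without a decl, such as the __morestack call mentioned above, is not rejected by this clause.

```cpp
#include <cassert>

// Hypothetical stand-ins for the properties the predicate queries.
struct callee_info
{
  bool has_decl;          // SYMBOL_REF_DECL (op) != NULL
  bool is_function_decl;  // TREE_CODE (decl) == FUNCTION_DECL
  bool pcrel;             // rs6000_fndecl_pcrel_p (decl)
};

// True if the new clause rejects the symbol: under ELFv2, a known function
// whose pcrel-ness differs from the caller's is reached via a linker stub,
// so the local_call patterns are not used for it.
bool
rejected_by_pcrel_mismatch (bool abi_elfv2, const callee_info &callee,
                            bool caller_pcrel)
{
  return abi_elfv2
         && callee.has_decl
         && callee.is_function_decl
         && callee.pcrel != caller_pcrel;
}
```

In pr96493.c, p10_func (pcrel) calls p8_func (non-pcrel), which is exactly the mismatch this clause rejects.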

Bootstrapped and regression tested powerpc64le-linux.  OK to apply?

Should I rename current_file_function_operand to something more
meaningful before committing?  direct_local_call_operand perhaps?

gcc/
PR target/96493
* config/rs6000/predicates.md (current_file_function_operand): Don't
accept functions that differ in r2 usage.
gcc/testsuite/
* gcc.target/powerpc/pr96493.c: New file.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index afb7c02f129..2709e46f7e5 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1051,7 +1051,12 @@
&& !((DEFAULT_ABI == ABI_AIX
  || DEFAULT_ABI == ABI_ELFv2)
 && (SYMBOL_REF_EXTERNAL_P (op)
-|| SYMBOL_REF_WEAK (op)))")))
+|| SYMBOL_REF_WEAK (op)))
+   && !(DEFAULT_ABI == ABI_ELFv2
+&& SYMBOL_REF_DECL (op) != NULL
+&& TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL
+&& (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op))
+!= rs6000_pcrel_p (cfun)))")))
 
 ;; Return 1 if this operand is a valid input for a move insn.
 (define_predicate "input_operand"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96493.c b/gcc/testsuite/gcc.target/powerpc/pr96493.c
new file mode 100644
index 000..3ee0fc9fe45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96493.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-mdejagnu-cpu=powerpc64 -O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+
+/* Test local calls between pcrel and non-pcrel code.
+
+   Despite the cpu=power10 option, the code generated here should just
+   be plain powerpc64, even the necessary linker stubs.  */
+
+int one = 1;
+
+int __attribute__ ((target("cpu=power8"),noclone,noinline))
+p8_func (int x)
+{
+  return x - one;
+}
+
+int __attribute__ ((target("cpu=power10"),noclone,noinline))
+p10_func (int x)
+{
+  return p8_func (x);
+}
+
+int
+main (void)
+{
+  return p10_func (1);
+}

-- 
Alan Modra
Australia Development Lab, IBM


Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 14:14 +0100, Jonathan Wakely via Libstdc++ wrote:

On 05/08/20 16:31 -0600, Martin Sebor via Libstdc++ wrote:

On 8/5/20 3:25 PM, Jonathan Wakely wrote:

P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size when possible (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
 template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/c++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(&)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.
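The user-visible effect of LWG 2499 is that extraction into a character array is bounded by the array's size. This sketch (the helper `read_word` is hypothetical) gets the same bound portably by setting the stream width with `std::setw`. Extraction stores at most `width() - 1` characters plus the terminating null, then resets the width to zero. It should behave the same under C++11 through C++20, because in C++20 the new array overload uses the smaller of `width()` and the array size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extract one whitespace-delimited word into buf
// without overflowing it.  std::setw(N) limits extraction to N - 1
// characters, leaving room for the terminating '\0'; the stream's width
// is reset to zero afterwards.
template<std::size_t N>
std::istream& read_word(std::istream& in, char (&buf)[N])
{
  return in >> std::setw(static_cast<int>(N)) >> buf;
}
```

Without the setw, pre-C++20 code extracting "abcdefg" into a four-element array through the char* overload would write past its end. That is the overflow LWG 2499 removes.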

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell 

Re: [PATCH] c++: dependent constraint on placeholder return type [PR96443]

2020-08-06 Thread Patrick Palka via Gcc-patches
On Wed, 5 Aug 2020, Jason Merrill wrote:

> On 8/4/20 8:00 PM, Patrick Palka wrote:
> > On Tue, 4 Aug 2020, Patrick Palka wrote:
> > 
> > > In the testcase below, we never substitute function-template arguments
> > > into f15's placeholder-return-type constraint, which leads to us
> > > incorrectly rejecting this instantiation in do_auto_deduction due to
> > > satisfaction failure (of the constraint SameAs).
> > > 
> > > The fact that we incorrectly reject this testcase is masked by the
> > > other instantiation f15, which we correctly reject and diagnose
> > > (by accident).
> > > 
> > > A good place to do this missing substitution seems to be during
> > > TEMPLATE_TYPE_PARM level lowering.  So this patch adds a call to
> > > tsubst_constraint there, and also adds dg-bogus directives to this
> > > testcase wherever we expect instantiation to succeed. (So without the
> > > substitution fix, this last dg-bogus would FAIL).
> 
> > > Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
> > > range-v3 projects.  Does this look OK to commit?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/96443
> > >   * pt.c (tsubst) : Substitute into
> > >   the constraints on a placeholder type when reducing its level.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/96443
> > >   * g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus wherever we expect
> > >   instantiation to succeed.
> > 
> > Looking back at this patch with fresh eyes, I realized that the commit
> > message is not the best.  I rewrote the commit message to hopefully be
> > more coherent below:
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: dependent constraint on placeholder return type
> >   [PR96443]
> > 
> > In the testcase concepts-ts1.C, we're incorrectly rejecting the call to
> > 'f15(0)' due to satisfaction failure of the function's
> > placeholder-return-type constraint.
> > 
> > The testcase doesn't spot this rejection because the error we emit for
> > the constraint failure points to f15's return statement instead of the
> > call site, and we already have a dg-error at the return statement to
> > verify the (correct) rejection of the call f15('a').  So in order to
> > verify that we indeed accept the call 'f15(0)', we need to add a
> > dg-bogus directive at the call site to look for the "required from here"
> > diagnostic line that generally accompanies an instantiation failure.
> > 
> > As for why satisfaction failure occurs, it turns out that we never
> > substitute the template arguments of a function template specialization
> > in to its placeholder-return-type constraint.  So in this case during
> > do_auto_deduction, we end up checking satisfaction of the still-dependent
> > constraint SameAs from do_auto_deduction, which fails
> > because it's dependent.
> > 
> > A good place to do this missing substitution seems to be during
> > TEMPLATE_TYPE_PARM level lowering; so this patch adds a call to
> > tsubst_constraint there.
> 
> Doing substitution seems like the wrong approach here; requirements should
> never be substituted except as part of satisfaction calculation (or, rarely,
> for declaration matching).  If we aren't communicating all the necessary
> template arguments to the later satisfaction, that's what we need to fix.

Ah okay, that makes sense.

It also looks like the question of performing a single full substitution
(during auto deduction) vs two substitutions may also be a correctness
issue in light of SFINAE.  Consider the following testcase:

template
concept C = true;

auto f(auto x) -> C auto { return 0; }
auto f(auto x, ...) { return 1; }

int a = f(0);

If we do a single substitution, then the substitution failure is a hard
error and we'll reject this testcase.  If we do two substitutions, then
it's a SFINAE error and we select the second overload.  Would we be
correct to issue a hard error here?
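The SFINAE-versus-hard-error distinction can be illustrated without concepts by using a C++11 trailing return type. This is an analogy, not the concepts-ts1.C testcase. The overloads `pick` are hypothetical, and the extra `int`/`...` parameter is an added assumption that keeps the two candidates from being ambiguous when both are viable.

```cpp
#include <cassert>
#include <string>

// Viable only when x.size() is well-formed.  For any other T, substitution
// into the trailing return type fails and this candidate is silently
// dropped from the overload set (SFINAE) instead of causing a hard error.
template<typename T>
auto pick(T x, int) -> decltype((void)x.size(), 0)
{ return 0; }

// Always-viable fallback.  Passing the literal 0 through "..." ranks worse
// than the exact match against int, so this overload is chosen only when
// the first one has been removed.
template<typename T>
int pick(T, ...)
{ return 1; }
```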

> 
> > Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
> > range-v3 projects.  Does this look OK to commit?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/96443
> > * pt.c (tsubst) : Substitute into
> > the constraints on a placeholder type when reducing its level.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/96443
> > * g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus to the call to f15
> > that we expect to accept.
> > ---
> >   gcc/cp/pt.c   | 7 ---
> >   gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C | 2 +-
> >   2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index e7496002c1c..9f3426f8249 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -15524,10 +15524,11 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
> > if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
> >   {
> > -   /* Propagate constraints on placeholders since they are
> > -  only instantiated during satisfaction.  */
> > +  

Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Nathan Sidwell

On 8/5/20 7:29 PM, Marek Polacek wrote:

On Wed, Aug 05, 2020 at 11:03:08AM -0400, Nathan Sidwell wrote:

On 8/4/20 8:54 PM, Marek Polacek via Gcc-patches wrote:

On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:

I think the read of the room is that people think it would be generally useful, 
so let's approve the general plan.


Cool.


So, now we are down to the fine details.  Please do see just how far you can 
stretch the existing mechanisms to cover what you need to do.  I think the 
existing mechanisms should be able to cover it all; but the devil is in the 
details and those matter.


At this point I'm only proposing one new directive, dg-ice.  I think we can't
really do without it.  The other one was a matter of convenience.


I've realized I have a concern.  Grepping (or searching in an editor buffer)
the log file for 'internal compiler error' to find actual regressions is a
thing I want to still be able to do (perhaps with alternative spelling, I
don't care).  I don't want to see the ICEs of tests that are expected to
ICE.

I think that means there has to be a positive marker on the unexpected ICEs,
rather than lack of an expected marker on them.


Hmm, by the log file you mean g++.log?  Currently, if you run a dg-ice test,
and the test still ICEs, the g++.log file (but not the stdout of make
check-c++!) will have:

Executing on host: ... xg++ with options ...
spawn -ignore SIGHUP ... xg++ with options ...
.../foo.C:14:15: internal compiler error: in poplevel_class, at cp/name-lookup.c:4225

compiler exited with status 1
XFAIL: g++.dg/foo.C  -std=c++17 (internal compiler error)
PASS: g++.dg/foo.C  -std=c++17 (test for excess errors)

Which one of these would you not like to see?


Neither of these is solving the issue.  How do I find the ICEs that are
unexpected, without tripping over the ICEs that are expected?



Can you give me more details?  Hopefully we'll work something out that doesn't
break your workflow.


sure.
* develop patch
* run testsuite
* observe unexpected ICEs
* load g++.log into editor
* ^sinternal comp
* gets to first unexpected ICE
* debug it.

What does '^sinternal comp' become?  As there could be many expected 
ICEs it'll be painful to determine whether any particular utterance of 
'internal compiler' is expected or not.
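One possible answer rests on an assumption: unexpected ICEs are summarised on result lines beginning with `FAIL:`, while expected ones appear as `XFAIL:`, as in the g++.log excerpt quoted above. If so, a search keyed on the summary line rather than on the raw diagnostic can tell them apart. The function `unexpected_ices` is hypothetical, not part of DejaGnu or the GCC testsuite.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical filter over a DejaGnu log such as g++.log.  Assumes an
// expected ICE (dg-ice) is summarised as "XFAIL: ... (internal compiler
// error)" and an unexpected one as "FAIL: ... (internal compiler error)".
std::vector<std::string> unexpected_ices(std::istream& log)
{
  const std::string marker = "(internal compiler error)";
  std::vector<std::string> hits;
  std::string line;
  while (std::getline(log, line))
    // rfind(prefix, 0) == 0 is a "starts with" test, so "XFAIL:" lines
    // are rejected.
    if (line.rfind("FAIL: ", 0) == 0 && line.find(marker) != std::string::npos)
      hits.push_back(line);
  return hits;
}
```

In an editor, a regexp search for `^FAIL: .*(internal compiler error)` would be the rough equivalent of '^sinternal comp'. The ICE output itself precedes that summary line in the log.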


nathan

--
Nathan Sidwell


Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 15:26 +0200, Jakub Jelinek via Libstdc++ wrote:

On Thu, Aug 06, 2020 at 02:14:48PM +0100, Jonathan Wakely wrote:

  template
__attribute__((__nonnull__(2), __access__(__write_only__, 2)))
inline basic_istream<_CharT, _Traits>&
operator>>(basic_istream<_CharT, _Traits>& __in, _CharT* __s)
{
  size_t __n = __builtin_object_size(__s, 0);
  if (__builtin_expect(__n < sizeof(_CharT), false))
{
  // not even space for null terminator
  __glibcxx_assert(__n >= sizeof(_CharT));
  __in.width(0);
  __in.setstate(ios_base::failbit);
}
  else
{
  if (__n == (size_t)-1)
__n = __gnu_cxx::__numeric_traits::__max;
  std::__istream_extract(__in, __s, __n / sizeof(_CharT));
}
  return __in;
}

This will give a -Wstringop-overflow warning at -O0 and then overflow
the buffer, with undefined behaviour. And it will give no warning but
avoid the overflow when optimising. This isn't my preferred outcome,
I'd prefer to always get a warning, *and* be able to avoid the
overflow when optimising and the size is known.
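The size handling in the operator>> quoted above can be isolated as a pure function, which makes its three cases testable separately from __builtin_object_size itself. Here is a sketch with the hypothetical name `extraction_limit`. Zero means there is no room even for the null terminator (the failbit case), `(size_t)-1` means the object size is unknown, and any other value is converted from bytes to characters.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>      // std::streamsize
#include <limits>

// Hypothetical helper mirroring the quoted operator>>.  'bytes' is what
// __builtin_object_size(s, 0) reported; (size_t)-1 means the size is
// unknown (typically the case at -O0, where the call is not inlined).
// Returns the character count to pass on for extraction, or 0 when there
// is not even room for the terminating null character.
template<typename CharT>
constexpr std::streamsize extraction_limit(std::size_t bytes)
{
  return bytes < sizeof(CharT)
    ? 0
    : bytes == static_cast<std::size_t>(-1)
      ? std::numeric_limits<std::streamsize>::max()
          / static_cast<std::streamsize>(sizeof(CharT))
      : static_cast<std::streamsize>(bytes / sizeof(CharT));
}
```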


A way to get a warning even at -O2 would be to call some external function
in the if (__bos0 < sizeof(_CharT)) block, which wouldn't be optimized away
and would have __attribute__((warning ("..."))) on it.
See e.g. how glibc uses __warndecl e.g. in
/usr/include/bits/string_fortified.h.
One can use the alias attribute to have different warnings for the same external
call (which could do e.g. what part of __glibcxx_assert does, call vprintf
+ abort).


Every time I've tried that I've found the requirement for an external
function to be frustrating. It means adding a new symbol to the
library, because it doesn't work for inline functions or function
templates, even with __attribute__((noinline)).

And we don't necessarily want it to abort, because that depends on a
macro defined by users, which isn't visible inside the library.

It shouldn't be this hard.




Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-08-06 Thread Richard Biener
On Thu, 6 Aug 2020, Richard Biener wrote:

> This adds a move CTOR to auto_vec and makes use of a
> auto_vec return value for get_loop_exit_edges denoting
> that lifetime management of the vector is handed to the caller.
> 
> The move CTOR prompted the hash_table change because it apparently
> makes the copy CTOR implicitly deleted (good) and hash-table
> expansion of the odr_enum_map which is
> hash_map  where odr_enum has an
> auto_vec member triggers this.  Not sure if
> there's a latent bug there before this (I think we're not
> invoking DTORs, but we're invoking copy-CTORs).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> 
> Does this all look sensible and is it a good change
> (the get_loop_exit_edges one)?

Regtest went OK, here's an update with a complete ChangeLog
(how useful..) plus the move assign operator deleted, copy
assign wouldn't work as auto-generated and at the moment
there's no use of assigning.  I guess if we'd have functions
that take an auto_vec<> argument meaning they will destroy
the vector that will become useful and we can implement it.
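The ownership pattern described here can be sketched with a hypothetical `owning_vec`. It is not GCC's auto_vec, and it is restricted to trivially copyable elements because it grows its buffer with realloc. A user-declared move constructor causes the implicit copy constructor to be defined as deleted, and move assignment is deleted as in this patch. A function can still return the vector by value, handing the buffer to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical owning vector (not GCC's auto_vec).
template<typename T>
class owning_vec
{
  static_assert(std::is_trivially_copyable<T>::value,
                "the buffer is grown with realloc");
  T *m_data = nullptr;
  std::size_t m_len = 0;
  std::size_t m_cap = 0;

public:
  owning_vec() = default;

  // Move constructor: steal the buffer and leave the source empty.  Because
  // it is user-declared, the implicit copy constructor is deleted.
  owning_vec(owning_vec &&other) noexcept
    : m_data(other.m_data), m_len(other.m_len), m_cap(other.m_cap)
  {
    other.m_data = nullptr;
    other.m_len = other.m_cap = 0;
  }

  // No user needs it yet, so it is deleted rather than left to a default.
  owning_vec &operator=(owning_vec &&) = delete;

  ~owning_vec() { std::free(m_data); }

  void push(const T &x)
  {
    if (m_len == m_cap)
      {
        std::size_t cap = m_cap ? 2 * m_cap : 4;
        void *p = std::realloc(m_data, cap * sizeof(T));
        if (!p)
          throw std::bad_alloc();
        m_data = static_cast<T *>(p);
        m_cap = cap;
      }
    m_data[m_len++] = x;
  }

  std::size_t length() const { return m_len; }
  const T &operator[](std::size_t i) const { return m_data[i]; }
};

// Returning by value transfers ownership to the caller, either through the
// move constructor or with the copy elided entirely.
owning_vec<int> first_n(int n)
{
  owning_vec<int> v;
  for (int i = 0; i < n; ++i)
    v.push(i);
  return v;
}
```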

OK for trunk?

Thanks,
Richard.


From d74c346e95ff967d930b7c83daabc26b0227aea3 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 6 Aug 2020 14:50:56 +0200
Subject: [PATCH] add move CTOR to auto_vec, use auto_vec for
 get_loop_exit_edges

This adds a move CTOR to auto_vec and makes use of a
auto_vec return value for get_loop_exit_edges denoting
that lifetime management of the vector is handed to the caller.

The move CTOR prompted the hash_table change because it apparently
makes the copy CTOR implicitly deleted (good) and hash-table
expansion of the odr_enum_map which is
hash_map  where odr_enum has an
auto_vec member triggers this.  Not sure if
there's a latent bug there before this (I think we're not
invoking DTORs, but we're invoking copy-CTORs).

2020-08-06  Richard Biener  

* vec.h (auto_vec::auto_vec (auto_vec &&)): New move CTOR.
(auto_vec::operator=(auto_vec &&)): Delete.
* hash-table.h (hash_table::expand): Use std::move when expanding.
* cfgloop.h (get_loop_exit_edges): Return auto_vec.
* cfgloop.c (get_loop_exit_edges): Adjust.
* cfgloopmanip.c (fix_loop_placement): Likewise.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ira-build.c (create_loop_tree_nodes): Likewise.
(create_loop_tree_node_allocnos): Likewise.
(loop_with_complex_edge_p): Likewise.
* ira-color.c (ira_loop_edge_freq): Likewise.
* loop-unroll.c (analyze_insns_in_loop): Likewise.
* predict.c (predict_loops): Likewise.
* tree-predcom.c (last_always_executed_block): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (store_motion_loop): Likewise.
* tree-ssa-loop-ivcanon.c (loop_edge_to_cancel): Likewise.
(canonicalize_loop_induction_variables): Likewise.
* tree-ssa-loop-manip.c (get_loops_exits): Likewise.
* tree-ssa-loop-niter.c (find_loop_niter): Likewise.
(finite_loop_p): Likewise.
(find_loop_niter_by_eval): Likewise.
(estimate_numbers_of_iterations): Likewise.
* tree-ssa-loop-prefetch.c (emit_mfence_after_loop): Likewise.
(may_use_storent_in_loop_p): Likewise.
---
 gcc/cfgloop.c|  4 ++--
 gcc/cfgloop.h|  2 +-
 gcc/cfgloopmanip.c   |  3 +--
 gcc/hash-table.h |  2 +-
 gcc/ipa-fnsummary.c  |  4 +---
 gcc/ira-build.c  | 12 +++-
 gcc/ira-color.c  |  4 +---
 gcc/loop-unroll.c|  3 +--
 gcc/predict.c|  9 ++---
 gcc/tree-predcom.c   |  3 +--
 gcc/tree-ssa-loop-ch.c   |  3 +--
 gcc/tree-ssa-loop-im.c   |  3 +--
 gcc/tree-ssa-loop-ivcanon.c  |  9 ++---
 gcc/tree-ssa-loop-manip.c|  3 +--
 gcc/tree-ssa-loop-niter.c| 20 +---
 gcc/tree-ssa-loop-prefetch.c |  7 ++-
 gcc/vec.h|  7 +++
 17 files changed, 33 insertions(+), 65 deletions(-)

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 7720e6e5d2c..33a26cca6a4 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1202,10 +1202,10 @@ release_recorded_exits (function *fn)
 
 /* Returns the list of the exit edges of a LOOP.  */
 
-vec 
+auto_vec
 get_loop_exit_edges (const class loop *loop, basic_block *body)
 {
-  vec edges = vNULL;
+  auto_vec edges;
   edge e;
   unsigned i;
   edge_iterator ei;
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..f1687f37401 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -383,7 +383,7 @@ extern basic_block *get_loop_body_in_custom_order (const class loop *,
 extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
   int (*) (const void *, const void *, void *));
 
-extern vec get_loop_exit_edges (const class loop *, basic_block * = NULL);
+extern auto_vec get_loop_exit_edges (co

Re: std:vec for classes with constructor?

2020-08-06 Thread Aldy Hernandez via Gcc-patches



On 8/6/20 12:48 PM, Jonathan Wakely wrote:

On 06/08/20 12:31 +0200, Richard Biener wrote:
On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely wrote:


On 06/08/20 06:16 +0100, Richard Sandiford wrote:
>Andrew MacLeod via Gcc-patches  writes:
>> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
>>> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor wrote:

 On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
 [...]

> * ipa-cp changes from vec to std::vec.
>
> We are using std::vec to ensure constructors are run, which they aren't
> in our internal vec<> implementation.  Although we usually steer away
> from using std::vec because of interactions with our GC system,
> ipcp_param_lattices is only live within the pass and allocated with
> calloc.

 Ummm... I did not object but I will save the URL of this message in the
 archive so that I can wave it in front of anyone complaining why I
 don't use our internal vec's in IPA data structures.

 But it actually raises a broader question: was this supposed to be an
 exception, allowed only not to complicate the irange patch further, or
 will this be generally accepted thing to do when someone wants to have a
 vector of constructed items?
>>> It's definitely not what we want. You have to find another
>>> solution to this problem.

>>>
>>> Richard.
>>>
>>
>> Why isn't it what we want?
>>
>> This is a small vector local to the pass so it doesn't interfere with
>> our PITA GTY.
>> The class is pretty straightforward, but we do need a constructor to
>> initialize the pointer and the max-size field.  There is no allocation
>> done per element, so a small number of elements have a couple of fields
>> initialized per element. We'd have to loop to do that anyway.
>>
>> GCC's vec<> does not provide the ability to run a constructor, std::vec
>> does.
>
>I realise you weren't claiming otherwise, but: that could be fixed :-)

It really should be.

Artificial limitations like that are just a booby trap for the unwary.


It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?


I don't see why not. std::vector worked fine without std::move, it's
just more efficient with std::move, and can be used with a wider set
of element types.

When reallocating you can just copy each element to the new storage
and destroy the old element. If your type is non-copyable then you
need std::move, but I don't think the types I see used with vec<> are
non-copyable. Most of them are trivially-copyable.

I think the benefit of std::move to GCC is likely to be permitting
cheap copies to be made where previously they were banned for
performance reasons, but not because those copies were impossible.
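The copy-then-destroy reallocation described above can be sketched as follows. The helpers `relocate` and `destroy_n_elems` are hypothetical and are not part of vec.h. Raw storage comes from `::operator new`, so element lifetimes are managed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Construct each element in raw new storage from the old one, then end the
// old element's lifetime.  For a type with a copy constructor but no move
// constructor, std::move(from[i]) simply selects the copy constructor, so
// this is the copy-then-destroy scheme that works without move semantics.
// Exception safety is ignored in this sketch.
template<typename T>
T *relocate(T *from, std::size_t n, void *raw_to)
{
  T *to = static_cast<T *>(raw_to);
  for (std::size_t i = 0; i < n; ++i)
    {
      ::new (static_cast<void *>(to + i)) T(std::move(from[i]));
      from[i].~T();
    }
  return to;
}

// Destroy n live elements without freeing their storage.
template<typename T>
void destroy_n_elems(T *p, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    p[i].~T();
}
```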


For the record, neither value_range nor int_range require any 
allocations.  The sub-range storage resides in the stack or wherever it 
was defined.  However, it is definitely not a POD.
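A hypothetical type can show why zero-filling storage (as safe_grow_cleared does) cannot stand in for a constructor. Like the description above, it keeps its storage inline plus a pointer to that storage. An all-zero object has a null pointer rather than one to its own storage. The sketch constructs elements in place in raw storage. `inline_ranges` and `construct_n` are illustrations only, not int_range or any vec.h API.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical non-POD element: m_base must point at the object's own
// inline storage, so an all-zero bit pattern is not a valid object.
struct inline_ranges
{
  int m_storage[2];
  int *m_base;
  unsigned m_max;

  inline_ranges() : m_storage(), m_base(m_storage), m_max(1) {}

  // Copies must not take o.m_base, so both copy operations are
  // user-provided and keep m_base pointing at this object's storage.
  inline_ranges(const inline_ranges &o)
    : m_storage(), m_base(m_storage), m_max(o.m_max)
  {
    for (int i = 0; i < 2; ++i)
      m_storage[i] = o.m_storage[i];
  }

  inline_ranges &operator=(const inline_ranges &o)
  {
    for (int i = 0; i < 2; ++i)
      m_storage[i] = o.m_storage[i];
    m_max = o.m_max;
    return *this;
  }
};

// Run the default constructor for n elements in uninitialised storage,
// instead of zero-filling it.
template<typename T>
T *construct_n(void *raw, std::size_t n)
{
  T *p = static_cast<T *>(raw);
  for (std::size_t i = 0; i < n; ++i)
    ::new (static_cast<void *>(p + i)) T();
  return p;
}
```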


Digging deeper, I notice that the original issue that caused us to use 
std::vector was not in-place new but the safe_grow_cleared.  The 
original code had:



auto_vec known_value_ranges;
...
...
if (!vr.undefined_p () && !vr.varying_p ())
  {
if (!known_value_ranges.length ())
  known_value_ranges.safe_grow_cleared (count);
  known_value_ranges[i] = vr;
  }


I would've gladly kept the auto_vec, had I been able to call the
constructor by doing an in-place new:



if (!vr.undefined_p () && !vr.varying_p ())
  {
if (!known_value_ranges.length ())
- known_value_ranges.safe_grow_cleared (count);
+ {
+   known_value_ranges.safe_grow_cleared (count);
+   for (int i = 0; i < count; ++i)
+ new (&known_value_ranges[i]) value_range ();
+ }
known_value_ranges[i] = vr;
  }
  }


But alas, compiling yields:


In file included from /usr/include/wchar.h:35,
 from /usr/include/c++/10/cwchar:44,
 from /usr/include/c++/10/bits/postypes.h:40,
 from /usr/include/c++/10/iosfwd:40,
 from /usr/include/gmp-x86_64.h:34,
 from /usr/include/gmp.h:59,
 from /home/aldyh/src/gcc/gcc/system.h:687,
 from /home/aldyh/src/gcc/gcc/ipa-fnsummary.c:55:
/home/aldyh/src/gcc/gcc/vec.h: In instantiation of ‘static size_t vec::embedded_size(unsigned int) [with T = int_range<1>; A = va_heap; size_t 
= long unsigned int]’:
/home/aldyh/src/gcc/gcc/vec.h:288:58:   required from ‘static void va_heap::reserve(vec*&, unsigned int, bool) [with T = int_range<1>]’
/home/aldyh/src/gcc/gcc/vec.h:1746:20:   required from ‘bool vec::reserve(uns

RE: [PATCH] x86_64: Integer min/max improvements.

2020-08-06 Thread Roger Sayle


Sorry for the inconvenience.  I've just added the obligatory
/* { dg-do compile { target { ! ia32 } } } */
to this new gcc.target/i386 test to resolve this failure.
Please let me know if there's a better fix.

2020-08-06  Roger Sayle  

gcc/testsuite/ChangeLog
* gcc.target/i386/minmax-9.c: Restrict test to !ia32.

My apologies again.
Roger
--

-Original Message-
From: Jakub Jelinek  
Sent: 06 August 2020 13:28
To: Roger Sayle 
Cc: 'Uros Bizjak' ; 'GCC Patches'

Subject: Re: [PATCH] x86_64: Integer min/max improvements.

On Thu, Aug 06, 2020 at 09:40:49AM +0100, Roger Sayle wrote:

This test fails on i686-linux (or x86_64-linux when testing with -m32).
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=minmax-9.c'
Running /usr/src/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
FAIL: gcc.target/i386/minmax-9.c scan-assembler-times test 3

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/minmax-9.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int foo(int x)
> +{
> +  return max(x,0);
> +}
> +
> +int bar(int x)
> +{
> +  return min(x,0);
> +}
> +
> +unsigned int baz(unsigned int x)
> +{
> +  return min(x,1);
> +}
> +
> +/* { dg-final { scan-assembler-times "xor" 3 } } */
> +/* { dg-final { scan-assembler-times "test" 3 } } */

Jakub




Backporting streaming and enum changes

2020-08-06 Thread Jan Hubicka
Hello,
as discussed some time ago, I would like to discuss the possibility of
backporting the streaming and enum improvements.  The motivation is that
this brings quite noticeable improvements to builds of very large
projects where we currently have nonlinearity problem with anonymous
namespaces (which is solved by first set of patches) and also there is
quite noticeable overhead of streaming of enums that I noticed too late
for gcc 10.1. This is the second combined patch.

There is also noticeable reduction of .o files (especially before
compression as hit to WPA->ltrans streaming) and some memory use
benefits.

This is an optional thing to do, but I believe it may be helpful for
distro builds and those using LTO for large projects.  

For Firefox the reduction in the global stream (the slowest part of WPA)
is from 25678391 tree bodies to 20821629, from 11160520 SCC hash collisions
to 6002, and from 392382523 overall section size to 287891470 (both
compressed).

For Firefox streaming is under control, but other projects like Chromium
hit bigger issues. The reason is that Firefox has a "unified build" that
#includes multiple cpp sources into one, so it consists of only about 8k
source files, while Chromium has over 25k, and it was tested on a project
with over 250k sources. The more (and smaller) the sources, the more
noticeable the streaming bottleneck becomes.

The patches are not completely trivial, but they affect code that is
heavily executed during streaming and was in mainline for several
months, so I hope they are safe.

Honza
From 7edbd8fada3055d55cd238754cd70770d62b10de Mon Sep 17 00:00:00 2001
From: Jan Hubicka 
Date: Wed, 20 May 2020 15:58:22 +0200
Subject: [PATCH 1/5] Avoid SCC hashing on unmergeable trees

This is a new incarnation of the patch to identify unmergeable trees at
streaming-out time rather than streaming-in time, and to avoid pickling them
into SCCs with hash codes.

Building cc1 plus this patch reduces:

[WPA] read 4452927 SCCs of average size 1.986030
[WPA] 8843646 tree bodies read in total
[WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 947551 SCCs, 780270 collisions (0.823460)
[WPA] Merged 944038 SCCs
[WPA] Merged 5253521 tree bodies
[WPA] Merged 590027 types
...
[WPA] Size of mmap'd section decls: 99229066 bytes
[WPA] Size of mmap'd section function_body: 18398837 bytes
[WPA] Size of mmap'd section refs: 733678 bytes
[WPA] Size of mmap'd section jmpfuncs: 2965981 bytes
[WPA] Size of mmap'd section pureconst: 170248 bytes
[WPA] Size of mmap'd section profile: 17985 bytes
[WPA] Size of mmap'd section symbol_nodes: 3392736 bytes
[WPA] Size of mmap'd section inline: 2693920 bytes
[WPA] Size of mmap'd section icf: 435557 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651660 bytes

... to ...

[WPA] read 3312246 unshared trees
[WPA] read 1144381 mergeable SCCs of average size 4.833785
[WPA] 8843938 tree bodies read in total
[WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 946614 SCCs, 775077 collisions (0.818789)
[WPA] Merged 943798 SCCs
[WPA] Merged 5253336 tree bodies
[WPA] Merged 590105 types

[WPA] Size of mmap'd section decls: 81262144 bytes
[WPA] Size of mmap'd section function_body: 14702611 bytes
[WPA] Size of mmap'd section ext_symtab: 0 bytes
[WPA] Size of mmap'd section refs: 733695 bytes
[WPA] Size of mmap'd section jmpfuncs: 2332150 bytes
[WPA] Size of mmap'd section pureconst: 170292 bytes
[WPA] Size of mmap'd section profile: 17986 bytes
[WPA] Size of mmap'd section symbol_nodes: 3393358 bytes
[WPA] Size of mmap'd section inline: 2567939 bytes
[WPA] Size of mmap'd section icf: 435633 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651824 bytes

so results in about 22% reduction in global decl stream and 24% reduction on
function bodies stream (which is read mostly by ICF)

Martin, the zstd compression breaks the compression statistics (it works when
GCC is configured for zlib)

At first ltrans I get:

[LTRANS] Size of mmap'd section decls: 3734248 bytes
[LTRANS] Size of mmap'd section function_body: 4895962 bytes

... to ...

[LTRANS] Size of mmap'd section decls: 3479850 bytes
[LTRANS] Size of mmap'd section function_body: 3722935 bytes

So 7% reduction of global stream and 31% reduction of function bodies.
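The percentages above can be recomputed from the quoted section sizes, and the result depends on the baseline. Here is a small sketch with hypothetical helper names. Measured against the size before the patch, WPA decls shrink by about 18% and function_body by about 20%. At LTRANS the reductions are about 7% and 24%. Expressed as how much larger the old size is than the new one (before/after - 1), the WPA decls figure is about 22% and the LTRANS function_body figure about 31.5%, which seems to be how some of the quoted numbers were derived.

```cpp
#include <cassert>
#include <cmath>   // std::lround, for rounding the results

// Percentage saved, measured against the size before the patch.
double reduction_percent(double before, double after)
{ return 100.0 * (before - after) / before; }

// How much larger the old size is than the new one: before/after - 1.
double growth_percent(double before, double after)
{ return 100.0 * (before / after - 1.0); }
```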

Stream in seems to get about 3% faster and stream out about 5% but it is
close to noise factor of my experiment.  I expect bigger speedups on
Firefox but I did not test it today since my Firefox setup broke again.
GCC is not a very good example of the problem with anonymous namespace
types since we do not have so many of them.

Size of object files in the gcc directory is reduced by 11% (because hash
numbers do not compress well, I guess).

The patch makes DFS walk to recognize trees that are not merged

Re: std:vec for classes with constructor?

2020-08-06 Thread Richard Biener via Gcc-patches
On Thu, Aug 6, 2020 at 4:17 PM Aldy Hernandez  wrote:
>
>
>
> On 8/6/20 12:48 PM, Jonathan Wakely wrote:
> > On 06/08/20 12:31 +0200, Richard Biener wrote:
> >> On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely 
> >> wrote:
> >>>
> >>> On 06/08/20 06:16 +0100, Richard Sandiford wrote:
> >>> >Andrew MacLeod via Gcc-patches  writes:
> >>> >> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
> >>> >>> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor
> >>>  wrote:
> >>>  On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
> >>>  [...]
> >>> 
> >>> > * ipa-cp changes from vec to std::vec.
> >>> >
> >>> > We are using std::vec to ensure constructors are run, which they
> >>>  aren't
> >>> > in our internal vec<> implementation.  Although we usually
> >>> steer away
> >>> > from using std::vec because of interactions with our GC system,
> >>> > ipcp_param_lattices is only live within the pass and allocated
> >>> with
> >>>  calloc.
> >>>  Ummm... I did not object but I will save the URL of this message
> >>> in the
> >>>  archive so that I can wave it in front of anyone complaining why I
> >>>  don't use our internal vec's in IPA data structures.
> >>> 
> >>>  But it actually raises a broader question: was this supposed to
> >>> be an
> >>>  exception, allowed only not to complicate the irange patch
> >>> further, or
> >>>  will this be generally accepted thing to do when someone wants
> >>> to have
> >>>  a
> >>>  vector of constructed items?
> >>> >>> It's definitely not what we want. You have to find another
> >>> solution to this problem.
> >>> >>>
> >>> >>> Richard.
> >>> >>>
> >>> >>
> >>> >> Why isn't it what we want?
> >>> >>
> >>> >> This is a small vector local to the pass so it doesn't interfere with
> >>> >> our PITA GTY.
> >>> >> The class is pretty straightforward, but we do need a constructor to
> >>> >> initialize the pointer and the max-size field.  There is no
> >>> allocation
> >>> >> done per element, so a small number of elements have a couple of
> >>> fields
> >>> >> initialized per element. We'd have to loop to do that anyway.
> >>> >>
> >>> >> GCC's vec<> does not provide the ability to run a constructor,
> >>> std::vec
> >>> >> does.
> >>> >
> >>> >I realise you weren't claiming otherwise, but: that could be fixed :-)
> >>>
> >>> It really should be.
> >>>
> >>> Artificial limitations like that are just a booby trap for the unwary.
> >>
> >> It's probably also historic because we couldn't even implement
> >> the case of re-allocation correctly without std::move, could we?
> >
> > I don't see why not. std::vector worked fine without std::move, it's
> > just more efficient with std::move, and can be used with a wider set
> > of element types.
> >
> > When reallocating you can just copy each element to the new storage
> > and destroy the old element. If your type is non-copyable then you
> > need std::move, but I don't think the types I see used with vec<> are
> > non-copyable. Most of them are trivially-copyable.
> >
> > I think the benefit of std::move to GCC is likely to be permitting
> > cheap copies to be made where previously they were banned for
> > performance reasons, but not because those copies were impossible.
>
> For the record, neither value_range nor int_range require any
> allocations.  The sub-range storage resides in the stack or wherever it
> was defined.  However, it is definitely not a POD.
>
> Digging deeper, I notice that the original issue that caused us to use
> std::vector was not in-place new but the safe_grow_cleared.  The
> original code had:
>
> >   auto_vec known_value_ranges;
> > ...
> > ...
> >   if (!vr.undefined_p () && !vr.varying_p ())
> >   {
> > if (!known_value_ranges.length ())
> >   known_value_ranges.safe_grow_cleared (count);
> >   known_value_ranges[i] = vr;
> >   }
>
> I would've gladly kept the auto_vec, had I been able to call the
> constructor by doing an in-place new:
>
> > if (!vr.undefined_p () && !vr.varying_p ())
> >   {
> > if (!known_value_ranges.length ())
> > - known_value_ranges.safe_grow_cleared (count);
> > + {
> > +   known_value_ranges.safe_grow_cleared (count);
> > +   for (int i = 0; i < count; ++i)
> > + new (&known_value_ranges[i]) value_range ();

With your placement new loop you should only need .safe_grow (count)
which should then make it work(?)

> > + }
> > known_value_ranges[i] = vr;
> >   }
> >   }
>
> But alas, compiling yields:
>
> > In file included from /usr/include/wchar.h:35,
> >  from /usr/include/c++/10/cwchar:44,
> >  from /usr/include/c++/10/bits/postyp

Re: Backporting streaming and enum changes

2020-08-06 Thread Richard Biener
On Thu, 6 Aug 2020, Jan Hubicka wrote:

> Hello,
> as discussed some time ago, I would like to discuss possibility to
> backport the streaming and enum improvements.  The motivation is that
> this brings quite noticeable improvements to builds of very large
> projects where we currently have nonlinearity problem with anonymous
> namespaces (which is solved by first set of patches) and also there is
> quite noticeable overhead of streaming of enums that I noticed too late
> for gcc 10.1. This is the second combined patch.
> 
> There is also noticeable reduction of .o files (especially before
> compression as hit to WPA->ltrans streaming) and some memory use
> benefits.
> 
> This is an optional thing to do, but I believe it may be helpful for
> distro builds and those using LTO for large projects.  
> 
> For Firefox, the global stream (the slowest part of WPA) shrinks from
> 25678391 tree bodies to 20821629, SCC hash collisions drop from 11160520
> to 6002, and the overall section size goes from 392382523 to 287891470
> (both compressed).
> 
> For Firefox, streaming is under control, but other projects like Chromium
> hit bigger issues. The reason is that Firefox has a "unified build" that
> #includes multiple cpp sources into one, so it consists of only about 8k
> source files, while Chromium has over 25k, and this was tested on a project
> with over 250k sources. The more (and smaller) sources one has, the more
> noticeable the streaming bottleneck becomes.
> 
> The patches are not completely trivial, but they affect code that is
> heavily executed during streaming and was in mainline for several
> months, so I hope they are safe.

So we've built the core of openSUSE (~3000 packages) on x86_64
and i586 with these backported and so far found no issues.

I'm fine with backporting but I'll give Jakub the chance to
object.

Honza - please make sure to bump the LTO stream version minor
together with the streaming change (I think the enum change
doesn't require bumping).

Thanks,
Richard.

> Honza
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 15:01 +0100, Jonathan Wakely wrote:

On 06/08/20 15:26 +0200, Jakub Jelinek via Libstdc++ wrote:

On Thu, Aug 06, 2020 at 02:14:48PM +0100, Jonathan Wakely wrote:

 template
   __attribute__((__nonnull__(2), __access__(__write_only__, 2)))
   inline basic_istream<_CharT, _Traits>&
   operator>>(basic_istream<_CharT, _Traits>& __in, _CharT* __s)
   {
 size_t __n = __builtin_object_size(__s, 0);
 if (__builtin_expect(__n < sizeof(_CharT), false))
{
  // not even space for null terminator
  __glibcxx_assert(__n >= sizeof(_CharT));
  __in.width(0);
  __in.setstate(ios_base::failbit);
}
 else
{
  if (__n == (size_t)-1)
__n = __gnu_cxx::__numeric_traits::__max;
  std::__istream_extract(__in, __s, __n / sizeof(_CharT));
}
 return __in;
   }

This will give a -Wstringop-overflow warning at -O0 and then overflow
the buffer, with undefined behaviour. And it will give no warning but
avoid the overflow when optimising. This isn't my preferred outcome,
I'd prefer to always get a warning, *and* be able to avoid the
overflow when optimising and the size is known.


A way to get a warning even at -O2 would be to call some external function
in the if (__bos0 < sizeof(_CharT)) block, which wouldn't be optimized away
and would have __attribute__((warning ("..."))) on it.
See e.g. how glibc uses __warndecl in
/usr/include/bits/string_fortified.h.
One can use the alias attribute to have different warnings for the same external
call (which could do e.g. what part of __glibcxx_assert does: call vprintf + abort).


Every time I've tried that I've found the requirement for an external
function to be frustrating. It means adding a new symbol to the
library, because it doesn't work for inline functions or function
templates, even with __attribute__((noinline)).

And we don't necessarily want it to abort, because that depends on a
macro defined by users, which isn't visible inside the library.

It shouldn't be this hard.


The function with __attribute__((__warning__(""))) only warns when
-Wsystem-headers is on, which makes it useless. And when it's on, it
warns twice for a single call:

In file included from /home/jwakely/gcc/11/include/c++/11.0.0/sstream:38,
                 from of.cc:1:
In function 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]',
    inlined from 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]' at /home/jwakely/gcc/11/include/c++/11.0.0/istream:808:5,
    inlined from 'void test01(std::istream&)' at of.cc:7:16:
/home/jwakely/gcc/11/include/c++/11.0.0/istream:814:26: warning: call to 'std::__diag_overflow' declared with attribute warning: buffer overflow detected [-Wattribute-warning]
  814 |   __diag_overflow();
      |   ~~~^~
In function 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]',
    inlined from 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]' at /home/jwakely/gcc/11/include/c++/11.0.0/istream:808:5,
    inlined from 'void test01(std::istream&)' at of.cc:7:16,
    inlined from 'int main()' at of.cc:13:9:
/home/jwakely/gcc/11/include/c++/11.0.0/istream:814:26: warning: call to 'std::__diag_overflow' declared with attribute warning: buffer overflow detected [-Wattribute-warning]
  814 |   __diag_overflow();
      |   ~~~^~


Adding attributes to __istream_extract is useless, because that's only
called by the library, so again, needs -Wsystem-headers to do
anything.

Adding attributes to operator>> works well, but only at -O0 because
otherwise it gets inlined and the attributes are ignored. The
functions that get called by the inlined function don't warn because
they're in system headers.

This is unusable, and a waste of a day.





[PATCH] Rewrite get_size_range for irange API.

2020-08-06 Thread Aldy Hernandez via Gcc-patches
[Martin, does this sound reasonable to you?]

The following patch converts get_size_range to the irange API, thus
removing the use of VR_ANTI_RANGE.

This was a bit tricky because of the gymnastics we do in get_size_range
to ignore negatives and all that.  I didn't convert the function for
multi-ranges.  The code still returns a pair of trees indicating the
massaged range.  But I do believe the code is cleaner and smaller.

I'm not sure the current code (or my adaptation) gets all cases, but
the goal was to keep to the existing functionality, nothing more.
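To make the new control flow easier to follow, here is a standalone sketch of what the rewritten get_size_range computes. It uses a toy multi-range (a sorted vector of [lo, hi] pairs) in place of irange, and every name in it is hypothetical. The diff below is cut off before the function's final return, so the sketch assumes that the bounds of the trimmed positive range are what gets reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy multi-range: sorted, disjoint [lo, hi] sub-ranges.  A stand-in for
// irange/value_range; none of these names are GCC's.
using Pair = std::pair<std::int64_t, std::int64_t>;
using MultiRange = std::vector<Pair>;

// Intersect every sub-range of R with the interval [LO, HI].
static MultiRange
intersect_with (const MultiRange &r, std::int64_t lo, std::int64_t hi)
{
  MultiRange out;
  for (const Pair &p : r)
    {
      std::int64_t a = std::max (p.first, lo);
      std::int64_t b = std::min (p.second, hi);
      if (a <= b)
        out.emplace_back (a, b);
    }
  return out;
}

// Model of the new bounds computation.  VR must be non-empty; TYPE_MAX
// plays the role of TYPE_MAX_VALUE (exptype).
static Pair
size_bounds (const MultiRange &vr, std::int64_t type_max, bool allow_zero)
{
  // A single sub-range (including "varying") is reported as-is.
  if (vr.size () == 1)
    return vr.front ();

  // Keep only the positive part; zero counts when ALLOW_ZERO.
  MultiRange positives = intersect_with (vr, allow_zero ? 0 : 1, type_max);

  // All negative: report the original bounds and let the caller decide.
  if (positives.empty ())
    return Pair (vr.front ().first, vr.back ().second);

  // Drop a trailing sub-range that runs up to the type's maximum,
  // e.g. [5,10][20,MAX] becomes [5,10].
  if (positives.size () > 1 && positives.back ().second == type_max)
    positives.pop_back ();

  return Pair (positives.front ().first, positives.back ().second);
}
```

For example, with a type maximum of 100, [5,10][20,100] yields [5,10], while [-3,0][50,60] yields [0,60] when zero is allowed and [50,60] when it is not.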

OK?

gcc/ChangeLog:

* calls.c (range_remove_non_positives): New.
(set_bounds_from_range): New.
(get_size_range): Rewrite for irange API.
* tree-affine.c (expr_to_aff_combination): Call determine_value_range
with a range.
* tree-vrp.c (determine_value_range_1): Rename to...
(determine_value_range): ...this.
* tree-vrp.h (determine_value_range): Adjust prototype.
---
 gcc/calls.c       | 139 ++
 gcc/tree-affine.c |   5 +-
 gcc/tree-vrp.c    |  44 ++-
 gcc/tree-vrp.h    |   2 +-
 4 files changed, 73 insertions(+), 117 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 44401e6350d..4aeeb36a2be 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "builtins.h"
 #include "gimple-fold.h"
+#include "range.h"
 
 /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits.  */
 #define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)
@@ -1237,6 +1238,31 @@ alloc_max_size (void)
   return alloc_object_size_limit;
 }
 
+// Remove non-positive numbers from a range.  ALLOW_ZERO is TRUE if 0
+// is considered positive.
+
+static void
+range_remove_non_positives (irange *vr, bool allow_zero)
+{
+  tree floor, type = vr->type ();
+  if (allow_zero)
+    floor = build_zero_cst (type);
+  else
+    floor = build_one_cst (type);
+  value_range positives (floor, TYPE_MAX_VALUE (type));
+  vr->intersect (positives);
+}
+
+// Set the extreme bounds of range VR into range[].
+
+static bool
+set_bounds_from_range (const irange *vr, tree range[2])
+{
+  range[0] = wide_int_to_tree (vr->type (), vr->lower_bound ());
+  range[1] = wide_int_to_tree (vr->type (), vr->upper_bound ());
+  return true;
+}
+
 /* Return true when EXP's range can be determined and set RANGE[] to it
after adjusting it if necessary to make EXP a represents a valid size
of object, or a valid size argument to an allocation function declared
@@ -1250,9 +1276,11 @@ alloc_max_size (void)
 bool
 get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
 {
-  if (!exp)
-    return false;
-
+  if (!exp || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+    {
+      range[0] = range[1] = NULL_TREE;
+      return false;
+    }
   if (tree_fits_uhwi_p (exp))
     {
       /* EXP is a constant.  */
@@ -1261,91 +1289,30 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
     }
 
   tree exptype = TREE_TYPE (exp);
-  bool integral = INTEGRAL_TYPE_P (exptype);
-
-  wide_int min, max;
-  enum value_range_kind range_type;
-
-  if (integral)
-    range_type = determine_value_range (exp, &min, &max);
-  else
-    range_type = VR_VARYING;
-
-  if (range_type == VR_VARYING)
+  value_range vr;
+  determine_value_range (&vr, exp);
+  if (vr.num_pairs () == 1)
+    return set_bounds_from_range (&vr, range);
+
+  widest_irange positives (vr);
+  range_remove_non_positives (&positives, allow_zero);
+
+  // If all numbers are negative, let the caller sort it out.
+  if (positives.undefined_p ())
+    return set_bounds_from_range (&vr, range);
+
+  // Remove the unknown parts of a multi-range.
+  // This will transform [5,10][20,MAX] into [5,10].
+  int pairs = positives.num_pairs ();
+  if (pairs > 1
+      && positives.upper_bound () == wi::to_wide (TYPE_MAX_VALUE (exptype)))
     {
-      if (integral)
-        {
-          /* Use the full range of the type of the expression when
-             no value range information is available.  */
-          range[0] = TYPE_MIN_VALUE (exptype);
-          range[1] = TYPE_MAX_VALUE (exptype);
-          return true;
-        }
-
-      range[0] = NULL_TREE;
-      range[1] = NULL_TREE;
-      return false;
+      value_range last_range (exptype,
+                              positives.lower_bound (pairs - 1),
+                              positives.upper_bound (pairs - 1), VR_ANTI_RANGE);
+      positives.intersect (last_range);
     }
-
-  unsigned expprec = TYPE_PRECISION (exptype);
-
-  bool signed_p = !TYPE_UNSIGNED (exptype);
-
-  if (range_type == VR_ANTI_RANGE)
-    {
-      if (signed_p)
-        {
-          if (wi::les_p (max, 0))
-            {
-              /* EXP is not in a strictly negative range.  That means
-                 it must be in some (not necessarily strictly) positive
-                 range which includes zero.  Si

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Qing Zhao via Gcc-patches



> On Aug 6, 2020, at 3:31 AM, Richard Biener  wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> Thanks a lot for your careful review and detailed comments.  
>> 
>> 
>>> On Aug 4, 2020, at 2:35 AM, Richard Biener wrote:
>>> 
>>> I have a few comments below - I'm not sure I'm qualified to fully
>>> review the rest though.
>> 
>> Could you let me know who would be more qualified to fully review
>> the rest of the middle-end change?
> 
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...
> 
>>> +  if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>> + continue;
>>> 
>>> Why does the target need some extra say here?
>> 
>> Only the target can decide which hard regs should be zeroed, and which hard regs
>> are general-purpose registers.
> 
> I'm mostly questioning the plethora of target hooks added and whether
> these details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

Yes, I agree that there might be too many details exposed to the middle end in the
current design.

A single target hook as you suggested:
 targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);

might be a cleaner design.
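A sketch of the proposed split of responsibilities, with std::bitset standing in for GCC's HARD_REG_SET and an invented 16-register toy target: the middle end only computes "used and not live at return", and a single hypothetical hook decides what to zero. None of these names are GCC's, and having the hook return the set it actually zeroed is an assumption made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy register file (invented): 16 hard registers, 0-7 are GPRs and
// 8-15 are FP/vector registers; call-used registers are 0-5 and 8-15.
constexpr std::size_t num_hard_regs = 16;
using hard_reg_set = std::bitset<num_hard_regs>;

const hard_reg_set toy_gprs (0x00ffu);
const hard_reg_set toy_call_used (0xff3fu);

// Middle-end side: registers the function used that are not live at
// the return (so they do not carry the return value).
hard_reg_set
used_not_live (const hard_reg_set &used, const hard_reg_set &live_at_return)
{
  return used & ~live_at_return;
}

// Target side: one hook in the shape of
// targetm.calls.zero_regs (used-not-live-hardregset, gpr_only).
// This toy version zeroes only call-used candidates, optionally only
// GPRs, and returns the set it chose.
hard_reg_set
toy_zero_regs (const hard_reg_set &candidates, bool gpr_only)
{
  hard_reg_set to_zero = candidates & toy_call_used;
  if (gpr_only)
    to_zero &= toy_gprs;
  return to_zero;
}
```

With registers 0, 1, 2, 6 and 9 used and register 0 holding the return value, the toy hook zeroes 1, 2 and 9 (register 6 is not call-used), or just 1 and 2 when restricted to GPRs.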


Thanks.

Qing



Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Martin Sebor via Gcc-patches

On 8/6/20 7:30 AM, Jonathan Wakely via Libstdc++ wrote:

On 06/08/20 14:14 +0100, Jonathan Wakely via Libstdc++ wrote:

On 05/08/20 16:31 -0600, Martin Sebor via Libstdc++ wrote:

On 8/5/20 3:25 PM, Jonathan Wakely wrote:
P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have a high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.
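As an illustration of bounded extraction (not code from the patch): std::setw has long been the portable way to cap operator>> into a character buffer, and the made-up helper read_word below relies on it so that it behaves the same with the pre-C++20 CharT* overload and the C++20 CharT(&)[N] overload.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Read one whitespace-delimited word into a fixed-size buffer without
// overflowing it: std::setw(N) limits operator>> to N-1 characters plus
// the terminating null.  read_word is a hypothetical helper.
template<std::size_t N>
std::string
read_word (std::istream &in)
{
  static_assert (N > 1, "need room for one character and the null");
  char buf[N] = {};
  in >> std::setw (static_cast<int> (N)) >> buf;
  return std::string (buf);
}
```

Reading "abcdefgh xy" with read_word<4> four times yields "abc", "def", "gh" and "xy"; the longer word is split instead of overflowing the buffer.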

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size if available (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.
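The length computation described above can be sketched as a pure function. It takes as a parameter the byte count that __builtin_object_size would report (the GCC/Clang builtin yields (size_t)-1 when the size is unknown), so the arithmetic can be checked without optimisation. extract_limit is a hypothetical name and this is not the committed libstdc++ code; streamsize is assumed as the limit type.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <limits>

// Given the byte size of the destination buffer ((size_t)-1 if unknown),
// return the maximum number of CharT elements extraction may store,
// including the null terminator.  0 means "no room even for the null",
// in which case the caller would set failbit.
template<typename CharT>
std::streamsize
extract_limit (std::size_t object_size)
{
  if (object_size == static_cast<std::size_t> (-1))
    // Size unknown: assume the buffer is huge, as the old code did.
    return std::numeric_limits<std::streamsize>::max ()
           / static_cast<std::streamsize> (sizeof (CharT));
  if (object_size < sizeof (CharT))
    return 0;
  return static_cast<std::streamsize> (object_size / sizeof (CharT));
}
```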

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
 template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/c++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.
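The "had no effect" observation rests on a core-language rule: an explicit instantiation that follows a declaration of an explicit specialization for the same template arguments has no effect. A self-contained sketch, with an invented function template which_impl standing in for operator>>:

```cpp
#include <cassert>
#include <string>

// Primary template (stand-in for operator>>(basic_istream&, CharT*)).
template<typename T>
std::string
which_impl (T *)
{
  return "generic";
}

// Explicit specialization for char, analogous to the old char*
// specialization declared in the header and defined in the library.
template<>
std::string
which_impl<char> (char *)
{
  return "specialized";
}

// This explicit instantiation follows the explicit specialization for the
// same arguments, so it has no effect: the specialization is still used.
template std::string which_impl<char> (char *);
```

Once the specialization is removed, as in this patch, the same explicit instantiation definition is what makes the compiler emit the char version of the primary template.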

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(&)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then we just assume it's larger than the string to be extracted, which
is what the old code did anyway

Re: std:vec for classes with constructor?

2020-08-06 Thread Aldy Hernandez via Gcc-patches




On 8/6/20 4:35 PM, Richard Biener wrote:

On Thu, Aug 6, 2020 at 4:17 PM Aldy Hernandez  wrote:




On 8/6/20 12:48 PM, Jonathan Wakely wrote:

On 06/08/20 12:31 +0200, Richard Biener wrote:

On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely wrote:


On 06/08/20 06:16 +0100, Richard Sandiford wrote:

Andrew MacLeod via Gcc-patches  writes:

On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:

On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they
aren't in our internal vec<> implementation.  Although we usually
steer away from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with
calloc.

Ummm... I did not object but I will save the URL of this message in the
archive so that I can wave it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only so as not to complicate the irange patch further, or
will this be a generally accepted thing to do when someone wants to have a
vector of constructed items?

It's definitely not what we want. You have to find another
solution to this problem.


Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of fields
initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor, std::vec
does.


I realise you weren't claiming otherwise, but: that could be fixed :-)


It really should be.

Artificial limitations like that are just a booby trap for the unwary.


It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?


I don't see why not. std::vector worked fine without std::move, it's
just more efficient with std::move, and can be used with a wider set
of element types.

When reallocating you can just copy each element to the new storage
and destroy the old element. If your type is non-copyable then you
need std::move, but I don't think the types I see used with vec<> are
non-copyable. Most of them are trivially-copyable.

I think the benefit of std::move to GCC is likely to be permitting
cheap copies to be made where previously they were banned for
performance reasons, but not because those copies were impossible.
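The reallocation argument can be illustrated with a deliberately minimal growable array (hypothetical, not GCC's vec<>). Elements are relocated by copy-constructing into new storage and then destroying the originals, with no std::move anywhere, and placement new runs each element's constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Minimal growable array for copyable T.  A sketch only: it is not
// exception-safe if a copy constructor throws during relocation.
template<typename T>
class simple_vec
{
public:
  simple_vec () : m_data (nullptr), m_len (0), m_cap (0) {}
  simple_vec (const simple_vec &) = delete;
  simple_vec &operator= (const simple_vec &) = delete;

  ~simple_vec ()
  {
    for (std::size_t i = 0; i < m_len; ++i)
      m_data[i].~T ();
    std::free (m_data);
  }

  void
  push_back (const T &x)
  {
    T copy (x);  // X may alias an element that grow () is about to destroy.
    if (m_len == m_cap)
      grow (m_cap ? 2 * m_cap : 4);
    new (m_data + m_len) T (copy);  // The element's constructor runs here.
    ++m_len;
  }

  std::size_t size () const { return m_len; }
  const T &operator[] (std::size_t i) const { return m_data[i]; }

private:
  void
  grow (std::size_t new_cap)
  {
    T *p = static_cast<T *> (std::malloc (new_cap * sizeof (T)));
    if (!p)
      throw std::bad_alloc ();
    for (std::size_t i = 0; i < m_len; ++i)
      {
        new (p + i) T (m_data[i]);  // Copy into the new storage...
        m_data[i].~T ();            // ...then destroy the old element.
      }
    std::free (m_data);
    m_data = p;
    m_cap = new_cap;
  }

  T *m_data;
  std::size_t m_len;
  std::size_t m_cap;
};
```

std::move would only make the relocation loop cheaper for types like std::string; it is not needed for correctness here.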


For the record, neither value_range nor int_range require any
allocations.  The sub-range storage resides in the stack or wherever it
was defined.  However, it is definitely not a POD.

Digging deeper, I notice that the original issue that caused us to use
std::vector was not in-place new but the safe_grow_cleared.  The
original code had:


   auto_vec known_value_ranges;
...
...
   if (!vr.undefined_p () && !vr.varying_p ())
   {
 if (!known_value_ranges.length ())
   known_value_ranges.safe_grow_cleared (count);
   known_value_ranges[i] = vr;
   }


I would've gladly kept the auto_vec, had I been able to call the
constructor by doing an in-place new:


 if (!vr.undefined_p () && !vr.varying_p ())
   {
 if (!known_value_ranges.length ())
- known_value_ranges.safe_grow_cleared (count);
+ {
+   known_value_ranges.safe_grow_cleared (count);
+   for (int i = 0; i < count; ++i)
+ new (&known_value_ranges[i]) value_range ();


With your placement new loop you should only need .safe_grow (count)
which should then make it work(?)


Ah yes, no clearing is necessary, as the in-place new would initialize the
chunk.  But as discussed below, the compiler still barfs.
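A standalone sketch of why the _cleared step is redundant, and why zero-filling alone is not a substitute for a constructor. toy_range is a hypothetical stand-in for int_range whose constructor points m_base at the object's own storage, which a memset to zero cannot do. std::malloc stands in for the uninitialized storage that safe_grow (unlike safe_grow_cleared) is assumed to leave. This does not reproduce or address the vec.h error quoted below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical range-like type: zeroed memory would leave m_base null,
// so only running the constructor yields a valid object.
struct toy_range
{
  long m_storage[2];
  long *m_base;
  unsigned m_max_pairs;

  toy_range () : m_storage (), m_base (m_storage), m_max_pairs (1) {}
  // Copying would need to re-point m_base; omitted in this sketch.
  toy_range (const toy_range &) = delete;
  toy_range &operator= (const toy_range &) = delete;
};

// Grow-then-construct: uninitialized storage, then placement new on each
// element.  No zeroing is needed first.
template<typename T>
T *
alloc_constructed (std::size_t count)
{
  T *elts = static_cast<T *> (std::malloc (count * sizeof (T)));
  if (!elts)
    throw std::bad_alloc ();
  for (std::size_t i = 0; i < count; ++i)
    new (elts + i) T ();
  return elts;
}

// Run destructors, then release the storage.
template<typename T>
void
free_constructed (T *elts, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i)
    elts[i].~T ();
  std::free (elts);
}
```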





+ }
 known_value_ranges[i] = vr;
   }
   }


But alas, compiling yields:


In file included from /usr/include/wchar.h:35,
  from /usr/include/c++/10/cwchar:44,
  from /usr/include/c++/10/bits/postypes.h:40,
  from /usr/include/c++/10/iosfwd:40,
  from /usr/include/gmp-x86_64.h:34,
  from /usr/include/gmp.h:59,
  from /home/aldyh/src/gcc/gcc/system.h:687,
  from /home/aldyh/src/gcc/gcc/ipa-fnsummary.c:55:
/home/aldyh/src/gcc/gcc/vec.h: In instantiation of ‘static size_t vec::embedded_size(unsigned int) [with T = int_range<1>; A = va_heap; size_t = long unsigned int]’:
/home/a

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 08:56 -0600, Martin Sebor via Libstdc++ wrote:

For this specific use case, I saw __istream_extract defined
as an ordinary (non-template) function in a .tcc file in
the patch so I thought it was out of line.  If it's inline


It's overloaded. One is a function template defined inline, the other
is a non-inline function defined in the library.


or if it's a template the only workaround I can think of
to retain the warning is to have it make a call to (no-op)
function with the attribute that is not inlined.  It's too
bad there is no attribute to tell the expander to avoid
emitting such a function (which would be the equivalent of
the idea I outlined in my second paragraph above).


That will still fail to warn because of -Wsystem-headers.

Attempting to use attributes here achieves absolutely nothing for the
problem scenarios I was concerned about.

The only case where it helps is passing a null pointer or a pointer to
a zero-sized buffer to operator>> and not optimising. All other
problematic cases fail to warn, and it's those other cases where a
warning would be helpful.




Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Martin Sebor via Gcc-patches

On 8/6/20 8:45 AM, Jonathan Wakely via Libstdc++ wrote:

On 06/08/20 15:01 +0100, Jonathan Wakely wrote:

On 06/08/20 15:26 +0200, Jakub Jelinek via Libstdc++ wrote:

On Thu, Aug 06, 2020 at 02:14:48PM +0100, Jonathan Wakely wrote:

 template
   __attribute__((__nonnull__(2), __access__(__write_only__, 2)))
   inline basic_istream<_CharT, _Traits>&
   operator>>(basic_istream<_CharT, _Traits>& __in, _CharT* __s)
   {
 size_t __n = __builtin_object_size(__s, 0);
 if (__builtin_expect(__n < sizeof(_CharT), false))
{
  // not even space for null terminator
  __glibcxx_assert(__n >= sizeof(_CharT));
  __in.width(0);
  __in.setstate(ios_base::failbit);
}
 else
{
  if (__n == (size_t)-1)
    __n = __gnu_cxx::__numeric_traits::__max;
  std::__istream_extract(__in, __s, __n / sizeof(_CharT));
}
 return __in;
   }

This will give a -Wstringop-overflow warning at -O0 and then overflow
the buffer, with undefined behaviour. And it will give no warning but
avoid the overflow when optimising. This isn't my preferred outcome,
I'd prefer to always get a warning, *and* be able to avoid the
overflow when optimising and the size is known.


A way to get warning even at -O2 would be to call some external function
in the if (__bos0 < sizeof(_CharT)) block, which wouldn't be 
optimized away

and would have __attribute__((warning ("..."))) on it.
See e.g. how glibc uses __warndecl e.g. in
/usr/include/bits/string_fortified.h.
One can use alias attribute to have different warnings for the same 
external
call (which could do e.g. what part of __glibcxx_assert does, call 
vprintf

+ abort.


Every time I've tried that I've found the requirement for an external
function to be frustrating. It means adding a new symbol to the
library, because it doesn't work for inline functions or function
templates, even with __attribute__((noinline)).

And we don't necessarily want it to abort, because that depends on a
macro defined by users, which isn't visible inside the library.

It shouldn't be this hard.


The function with __attribute__((__warning__(""))) only warns when
-Wsystem-headers is on, which makes it useless. And when it's on, it
warns twice for a single call:

In file included from /home/jwakely/gcc/11/include/c++/11.0.0/sstream:38,
                 from of.cc:1:
In function 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]',
    inlined from 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]' at /home/jwakely/gcc/11/include/c++/11.0.0/istream:808:5,
    inlined from 'void test01(std::istream&)' at of.cc:7:16:
/home/jwakely/gcc/11/include/c++/11.0.0/istream:814:26: warning: call to 'std::__diag_overflow' declared with attribute warning: buffer overflow detected [-Wattribute-warning]
  814 |   __diag_overflow();
      |   ~~~^~
In function 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]',
    inlined from 'std::basic_istream<_CharT, _Traits>& std::operator>>(std::basic_istream<_CharT, _Traits>&, _CharT*) [with _CharT = char; _Traits = std::char_traits]' at /home/jwakely/gcc/11/include/c++/11.0.0/istream:808:5,
    inlined from 'void test01(std::istream&)' at of.cc:7:16,
    inlined from 'int main()' at of.cc:13:9:
/home/jwakely/gcc/11/include/c++/11.0.0/istream:814:26: warning: call to 'std::__diag_overflow' declared with attribute warning: buffer overflow detected [-Wattribute-warning]
  814 |   __diag_overflow();
      |   ~~~^~


Adding attributes to __istream_extract is useless, because that's only
called by the library, so again, needs -Wsystem-headers to do
anything.

Adding attributes to operator>> works well, but only at -O0 because
otherwise it gets inlined and the attributes are ignored. The
functions that get called by the inlined function don't warn because
they're in system headers.

This is unusable, and a waste of a day.


Sorry.  I don't see this exercise as a complete waste of time
(but I understand why it feels like that to you).

What it highlights is the fact that the warning infrastructure
we have in place is far from optimal for C++ in general (with
its heavy reliance on inlining and templates) and the standard
library in particular (especially with -Wno-system-headers).
We should make an effort to do better.

Setting aside the effort to clean up the library so that it can
be used even with -Wsystem-headers, warnings about out of bounds
accesses should trigger even with -Wno-system-headers.  If one
doesn't I'd tend to view it as a bug.  I added code to have some
trigger despite it but I'm pretty sure there are more places where
the middle end needs to do the same gymnastics t

Re: [PATCH] rs6000: Don't ICE when spilling an MMA accumulator

2020-08-06 Thread Peter Bergner via Gcc-patches
On 8/5/20 6:06 PM, Segher Boessenkool wrote:
> You can just say
> 
>&& reload_completed
>&& !(fpr_reg_operand (operands[0], PXImode) && operands[1] == const0_rtx)
> 
> afaics?

Agreed.  I made that change and retested which was clean.


> Okay (for trunk, and later 10) with or without such a change.  Thanks!

Ok, updated patch pushed to trunk.  I'll push to GCC10 after a day or two.
Thanks!

Peter



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Qing Zhao via Gcc-patches



> On Aug 6, 2020, at 3:37 AM, Richard Biener  wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
 
 From The SECURE project and GCC in GCC Cauldron 2018:
 
 Speaker: Graham Markall
 
 The SECURE project is a 15 month program funded by Innovate UK, to
 take well known security techniques from academia and make them
 generally available in standard compilers, specifically GCC and LLVM.
 An explicit objective is for those techniques to be incorporated in
 the upstream versions of compilers. The Cauldron takes place in the
 final month of the project and this talk will present the technical
 details of some of the techniques implemented, and review those that
 are yet to be implemented. A particular focus of this talk will be on
 verifying that the implementation is correct, which can be a bigger
 challenge than the implementation.
 
 Techniques to be covered in the project include the following:
 
 Stack and register erasure. Ensuring that on return from a function,
 no data is left lying on the stack or in registers. Particular
 challenges are in dealing with inlining, shrink wrapping and caching.
 
 This patch implements register erasure.
>>> 
>>> Part of it, yes. While I can see that abnormal transfer of control is difficult,
>>> exception handling is too widespread to be ignored. What's the plan
>>> there?
>>> 
>>> So can we also see the other parts? In particular I wonder whether exposing 
>>> just register clearing (in this fine-grained manner) is required and useful 
>>> rather than thinking of a better interface for the whole thing?
>> 
>> You mean to provide an integrated interface for both stack and register 
>> erasure for security purpose?
>> 
>> However, is stack erasure at function return really a better idea than
>> zero-initializing auto variables at the beginning of the function?
>> 
>> We had some discussion with Kees Cook several weeks ago on the idea of 
>> stack erasure at function return, Kees provided the following comments:
>> 
>> "But back to why I don't think it's the right approach:
>> 
>> Based on the performance measurements of pattern-init and zero-init
>> in Clang, MSVC, and the kernel plugin, it's clear that adding these
>> initializations has measurable performance cost. Doing it at function
>> exit means performing large unconditional wipes. Doing it at function
>> entry means initializations can be dead-store eliminated and highly
>> optimized. Given the current debates on the measurable performance
>> difference between pattern and zero initialization (even in the face of
>> existing dead-store elimination), I would expect wipe-on-function-exit to
>> be outside the acceptable tolerance for performance impact. (Additionally,
>> we've seen negative cache effects on wiping memory when the CPU is done
>> using it, though this is more pronounced in heap wiping. Zeroing at
>> free is about twice as expensive as zeroing at free time due to cache
>> temporality. This is true for the stack as well, but it's not as high.)”
>> 
>> From my understanding, the major issue with stack erasure at function
>> return is the big performance overhead, and this overhead
>> cannot be reduced with compiler optimizations since the additional
>> wiping insns are inserted at the end of the routine.
>> 
>> Based on the previous discussion with Kees, I don’t think that stack
>> erasure at function return is a good idea. Instead, we might provide an
>> alternative approach: zero/pattern initialization of auto variables (this
>> functionality is already available in LLVM). This will be another
>> patch we want to add to GCC for security purposes in general.
>> 
>> So, I think for the current patch, -fzero-call-used-regs should be good 
>> enough.
>> 
>> Any comments?
> 
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.

By “spill slot clearing”, do you mean “stack erasure” or something
else?

From the paper 

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
Attacks"

https://ieeexplore.ieee.org/document/8445132 


The call-used registers are used by the ROP hackers as follows:

"Based on the practical experience of reading and writing ROP code, we find the
features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system
call or system function to perform malicious behaviour such as file access,
network access and W ⊕ X disable. In most cases, the adversary would like to
disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed
directly instead of rewriting shellcode to ROP chains which may cause some
troubles for the adversary. In upper example, the system call is number 59
which is “exec

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 09:17 -0600, Martin Sebor via Libstdc++ wrote:

Sorry.  I don't see this exercise as a complete waste of time
(but I understand why it feels like that to you).

What it highlights is the fact that the warning infrastructure
we have in place is far from optimal for C++ in general (with
its heavy reliance on inlining and templates) and the standard
library in particular (especially with -Wno-system-headers).
We should make an effort to do better.

Setting aside the effort to clean up the library so that it can
be used even with -Wsystem-headers,


Yeah, it's an ongoing effort.


warnings about out of bounds
accesses should trigger even with -Wno-system-headers.  If one
doesn't I'd tend to view it as a bug.


I agree. And __attribute__((__warning__(""))) too.


I added code to have some
trigger despite it but I'm pretty sure there are more places where
the middle end needs to do the same gymnastics to enable it.

Martin






Re: std:vec for classes with constructor?

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 16:17 +0200, Aldy Hernandez wrote:



On 8/6/20 12:48 PM, Jonathan Wakely wrote:

On 06/08/20 12:31 +0200, Richard Biener wrote:
On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely wrote:


On 06/08/20 06:16 +0100, Richard Sandiford wrote:

Andrew MacLeod via Gcc-patches  writes:

On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they aren't
in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with
calloc.
Ummm... I did not object but I will save the URL of this message in the
archive so that I can wave it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be a generally accepted thing to do when someone wants to have
a vector of constructed items?
It's definitely not what we want. You have to find another solution to this problem.

Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of fields
initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor, std::vec
does.
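Why a constructor matters here can be shown with a small sketch. small_range below is a hypothetical stand-in, not GCC's int_range: it has the pointer and max-size field mentioned above, and it additionally assumes (for illustration only) that the pointer refers to the object's own embedded storage. A zero-filling grow would leave such a pointer null, whereas std::vector<small_range> v(n) runs the constructor for every element, and a user-written copy constructor keeps copies pointing at their own storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical range-like class (not GCC's int_range): it keeps a pointer
// into its own embedded storage plus a max-size field, so zero-filled bytes
// are not a valid object; the constructor has to run.
class small_range
{
public:
  small_range () : m_base (m_storage), m_num (0), m_max (2) {}

  // A copy must point at its own storage, not at the source's.
  small_range (const small_range &o)
    : m_base (m_storage), m_num (o.m_num), m_max (2)
  {
    for (unsigned i = 0; i < m_num; ++i)
      m_storage[i] = o.m_storage[i];
  }

  small_range &operator= (const small_range &o)
  {
    m_num = o.m_num;
    for (unsigned i = 0; i < m_num; ++i)
      m_storage[i] = o.m_storage[i];
    return *this;
  }

  void add (int v) { if (m_num < m_max) m_base[m_num++] = v; }
  unsigned size () const { return m_num; }

  // True if the invariants established by the constructor still hold.
  bool consistent () const { return m_base == m_storage && m_max == 2; }

private:
  int *m_base;
  unsigned m_num, m_max;
  int m_storage[2];
};
```

Because the copy constructor is user-declared, std::vector falls back to copying when it reallocates, which is still correct for this type.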


I realise you weren't claiming otherwise, but: that could be fixed :-)


It really should be.

Artificial limitations like that are just a booby trap for the unwary.


It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?


I don't see why not. std::vector worked fine without std::move, it's
just more efficient with std::move, and can be used with a wider set
of element types.

When reallocating you can just copy each element to the new storage
and destroy the old element. If your type is non-copyable then you
need std::move, but I don't think the types I see used with vec<> are
non-copyable. Most of them are trivially-copyable.

I think the benefit of std::move to GCC is likely to be permitting
cheap copies to be made where previously they were banned for
performance reasons, but not because those copies were impossible.
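The copy-based reallocation described above can be sketched as follows. copy_vec is a hypothetical container, not GCC's vec<>: grow() placement-news each new element (roughly what a constructor-aware safe_grow_cleared would have to do), and reallocation copy-constructs every element into fresh storage before destroying the old one, so std::move is never needed. Exception safety is deliberately ignored to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical minimal vector (not GCC's vec<>). Elements are constructed
// with placement new and destroyed explicitly; growth copies, never moves.
template <typename T>
class copy_vec
{
public:
  copy_vec () : m_data (nullptr), m_len (0), m_cap (0) {}
  ~copy_vec () { release (m_data, m_len); }
  copy_vec (const copy_vec &) = delete;
  copy_vec &operator= (const copy_vec &) = delete;

  // Grow to N elements, running T's default constructor on each new one.
  // Does nothing if N is not larger than the current length.
  void grow (std::size_t n)
  {
    if (n > m_cap)
      reallocate (n);
    for (; m_len < n; ++m_len)
      ::new (static_cast<void *> (m_data + m_len)) T ();
  }

  T &operator[] (std::size_t i) { return m_data[i]; }
  std::size_t length () const { return m_len; }

private:
  void reallocate (std::size_t cap)
  {
    T *fresh = static_cast<T *> (::operator new (cap * sizeof (T)));
    for (std::size_t i = 0; i < m_len; ++i)
      ::new (static_cast<void *> (fresh + i)) T (m_data[i]);  // copy, not move
    release (m_data, m_len);
    m_data = fresh;
    m_cap = cap;
  }

  // Destroy LEN live elements, then free the raw storage.
  static void release (T *p, std::size_t len)
  {
    for (std::size_t i = 0; i < len; ++i)
      p[i].~T ();
    ::operator delete (p);
  }

  T *m_data;
  std::size_t m_len, m_cap;
};
```

For trivially-copyable element types the copy loop is equivalent to a memcpy, which is why the restriction matters mainly for types with non-trivial constructors.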


For the record, neither value_range nor int_range require any 
allocations.  The sub-range storage resides in the stack or wherever 
it was defined.  However, it is definitely not a POD.


Digging deeper, I notice that the original issue that caused us to use 
std::vector was not in-place new but the safe_grow_cleared.  The 
original code had:



auto_vec known_value_ranges;
...
...
if (!vr.undefined_p () && !vr.varying_p ())
  {
    if (!known_value_ranges.length ())
      known_value_ranges.safe_grow_cleared (count);
    known_value_ranges[i] = vr;
  }


I would've gladly kept the auto_vec, had I been able to call the
constructor by doing an in-place new:



   if (!vr.undefined_p () && !vr.varying_p ())
     {
       if (!known_value_ranges.length ())
-        known_value_ranges.safe_grow_cleared (count);
+        {
+          known_value_ranges.safe_grow_cleared (count);
+          for (int i = 0; i < count; ++i)
+            new (&known_value_ranges[i]) value_range ();
+        }
       known_value_ranges[i] = vr;
     }
 }


But alas, compiling yields:


In file included from /usr/include/wchar.h:35,
from /usr/include/c++/10/cwchar:44,
from /usr/include/c++/10/bits/postypes.h:40,
from /usr/include/c++/10/iosfwd:40,
from /usr/include/gmp-x86_64.h:34,
from /usr/include/gmp.h:59,
from /home/aldyh/src/gcc/gcc/system.h:687,
from /home/aldyh/src/gcc/gcc/ipa-fnsummary.c:55:
/home/aldyh/src/gcc/gcc/vec.h: In instantiation of ‘static size_t vec::embedded_size(unsigned int) [with T = int_range<1>; A = va_heap; size_t = long unsigned int]’:
/home/aldyh/src/gcc/gcc/vec.h:288:58:   required from ‘static void va_heap::reserve(vec*&, unsigned int, bool) [with T = int_range<1>]’
/home/aldyh/src/gcc/gcc/vec.h:1746:20:   required from ‘bool vec::reserve(unsigned int, bool) [with T = int_range<1>]’
/home/aldyh/src/gcc/gcc/vec.h:1766:10:   required from ‘bool vec::re

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 14:14 +0100, Jonathan Wakely wrote:

On 05/08/20 16:31 -0600, Martin Sebor via Libstdc++ wrote:

On 8/5/20 3:25 PM, Jonathan Wakely wrote:

P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size if available (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.
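A minimal sketch of the idea behind the new overloads. extract_word is a hypothetical helper standing in for __istream_extract (it is not the libstdc++ code and ignores width()): the length limit is passed explicitly, and the array overload derives it from N, so at most N-1 characters plus the terminating null are ever written.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not libstdc++'s __istream_extract: read one
// whitespace-delimited word into BUF, storing at most N-1 characters and a
// terminating '\0'. Unlike the real extractor it ignores in.width().
inline std::istream &
extract_word (std::istream &in, char *buf, std::streamsize n)
{
  typedef std::char_traits<char> traits;
  std::ios_base::iostate err = std::ios_base::goodbit;
  std::streamsize extracted = 0;
  std::istream::sentry ok (in);  // skips leading whitespace
  if (ok && n > 0)
    {
      std::streambuf *sb = in.rdbuf ();
      traits::int_type c = sb->sgetc ();
      while (extracted < n - 1
             && !traits::eq_int_type (c, traits::eof ())
             && !std::isspace (static_cast<unsigned char> (traits::to_char_type (c))))
        {
          buf[extracted++] = traits::to_char_type (c);
          c = sb->snextc ();
        }
      // Report EOF only if it, rather than the size limit, ended the word.
      if (extracted < n - 1 && traits::eq_int_type (c, traits::eof ()))
        err |= std::ios_base::eofbit;
      buf[extracted] = '\0';
    }
  if (extracted == 0)
    err |= std::ios_base::failbit;
  if (err)
    in.setstate (err);
  return in;
}

// C++20-style overload: the limit comes from the array type itself, so
// the caller cannot overflow the buffer.
template <std::size_t N>
inline std::istream &
read_word (std::istream &in, char (&buf)[N])
{
  return extract_word (in, buf, static_cast<std::streamsize> (N));
}
```

With a plain char* there is no N to forward, which is why the old overloads have to fall back to __builtin_object_size or an unbounded limit.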

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
 template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/c++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(&)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then 

[committed] libstdc++: Fix unnecessary allocations in read_symlink [PR 96484]

2020-08-06 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/96484
* src/c++17/fs_ops.cc (fs::read_symlink): Return an error
immediately for non-symlinks.
* src/filesystem/ops.cc (fs::read_symlink): Likewise.

Tested x86_64-linux. Committed to trunk.

Backports to follow.
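The shape of the fix, checking the file type before sizing a buffer for readlink(2), can be sketched with plain POSIX calls. read_link_checked is a hypothetical function, not the libstdc++ implementation, and the sketch assumes a POSIX system providing lstat, readlink and S_ISLNK.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>

// Hypothetical helper (not the libstdc++ implementation): return the target
// of the symlink PATH, reporting failures through EC.
std::string
read_link_checked (const char *path, std::error_code &ec)
{
  std::string result;
  struct stat st;
  if (::lstat (path, &st) != 0)
    {
      ec.assign (errno, std::generic_category ());
      return result;
    }
  if (!S_ISLNK (st.st_mode))
    {
      // Not a symlink: fail with EINVAL before allocating any buffer.
      ec.assign (EINVAL, std::generic_category ());
      return result;
    }
  // For a symlink, st_size is normally the length of the target, but it
  // may be 0 on some file systems, hence the fallback size.
  std::size_t size
    = st.st_size > 0 ? static_cast<std::size_t> (st.st_size) + 1 : 128;
  std::string buf (size, '\0');
  for (;;)
    {
      ssize_t len = ::readlink (path, &buf[0], buf.size ());
      if (len == -1)
        {
          ec.assign (errno, std::generic_category ());
          return result;
        }
      if (static_cast<std::size_t> (len) < buf.size ())
        {
          // The whole target fitted; readlink does not null-terminate.
          result.assign (buf, 0, static_cast<std::size_t> (len));
          ec.clear ();
          return result;
        }
      buf.resize (buf.size () * 2);  // possibly truncated: retry larger
    }
}
```

Without the early check, calling this on a large regular file would allocate a buffer of st_size bytes only to have readlink fail with EINVAL.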

commit 6a13a4e3f29fc4ce5eff96d74ba965c9fdc02184
Author: Jonathan Wakely 
Date:   Thu Aug 6 18:44:50 2020

libstdc++: Fix unnecessary allocations in read_symlink [PR 96484]

libstdc++-v3/ChangeLog:

PR libstdc++/96484
* src/c++17/fs_ops.cc (fs::read_symlink): Return an error
immediately for non-symlinks.
* src/filesystem/ops.cc (fs::read_symlink): Likewise.

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 873f93aacfc..c685b1824f9 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -1180,6 +1180,12 @@ fs::path fs::read_symlink(const path& p, error_code& ec)
   ec.assign(errno, std::generic_category());
   return result;
 }
+  else if (!fs::is_symlink(make_file_status(st)))
+{
+  ec.assign(EINVAL, std::generic_category());
+  return result;
+}
+
   std::string buf(st.st_size ? st.st_size + 1 : 128, '\0');
   do
 {
diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 29ea9c0ce87..8c8854bf28e 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -998,6 +998,12 @@ fs::path fs::read_symlink(const path& p [[gnu::unused]], error_code& ec)
   ec.assign(errno, std::generic_category());
   return result;
 }
+  else if (!fs::is_symlink(make_file_status(st)))
+{
+  ec.assign(EINVAL, std::generic_category());
+  return result;
+}
+
   std::string buf(st.st_size ? st.st_size + 1 : 128, '\0');
   do
 {


[PATCH] c++: Improve RANGE_EXPR optimization in cxx_eval_vec_init

2020-08-06 Thread Patrick Palka via Gcc-patches
This patch eliminates an exponential dependence in cxx_eval_vec_init on
the array dimension of a VEC_INIT_EXPR when the RANGE_EXPR optimization
applies.  This is achieved by using a single constructor_elt (with index
RANGE_EXPR 0...max-1) per dimension instead of two constructor_elts
(with index 0 and RANGE_EXPR 1...max-1 respectively).  In doing so, we
can also get rid of the call to unshare_constructor since the element
initializer now gets used in exactly one spot.
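Assuming the cost is roughly proportional to the number of constructor elements built and copied, a toy model shows where the exponential comes from: two elements per dimension, one of them an unshared copy, give 2^(d+1) - 2 elements for d dimensions, while one RANGE element per dimension gives d. The init type below is hypothetical and only mimics the shape of the CONSTRUCTORs described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only (not GCC's CONSTRUCTOR/RANGE_EXPR trees): an array
// initializer is a list of elements, each covering indices [lo, hi] and
// holding the initializer for the next dimension (null at the leaves).
struct init
{
  struct elt
  {
    std::size_t lo, hi;
    std::shared_ptr<init> value;
  };
  std::vector<elt> elts;
};

// Deep copy, playing the role of unshare_constructor.
std::shared_ptr<init> deep_copy (const std::shared_ptr<init> &p)
{
  if (!p)
    return nullptr;
  auto q = std::make_shared<init> ();
  for (const init::elt &e : p->elts)
    q->elts.push_back ({e.lo, e.hi, deep_copy (e.value)});
  return q;
}

// Old shape: element 0, then RANGE 1...max-1 holding an unshared copy.
std::shared_ptr<init> build_old (int dims, std::size_t max)
{
  if (dims == 0)
    return nullptr;
  std::shared_ptr<init> sub = build_old (dims - 1, max);
  auto node = std::make_shared<init> ();
  node->elts.push_back ({0, 0, sub});
  node->elts.push_back ({1, max - 1, deep_copy (sub)});
  return node;
}

// New shape: a single RANGE 0...max-1 element per dimension.
std::shared_ptr<init> build_new (int dims, std::size_t max)
{
  if (dims == 0)
    return nullptr;
  auto node = std::make_shared<init> ();
  node->elts.push_back ({0, max - 1, build_new (dims - 1, max)});
  return node;
}

// Total number of elements reachable from P.
std::size_t count_elts (const std::shared_ptr<init> &p)
{
  if (!p)
    return 0;
  std::size_t n = 0;
  for (const init::elt &e : p->elts)
    n += 1 + count_elts (e.value);
  return n;
}
```

The element count is independent of max in both shapes; only the number of dimensions drives the growth.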

The patch also removes the 'eltinit = new_ctx.ctor' assignment within the
RANGE_EXPR optimization since eltinit should already always be equal to
new_ctx.ctor here (modulo encountering an error when computing eltinit).
This was verified by running the testsuite against an appropriate assert.

Finally, this patch reverses the sense of the ctx->quiet test that
controls whether to short-circuit evaluation upon seeing an error.  This
should speed up speculative evaluation of non-constant VEC_INIT_EXPRs
(since ctx->quiet is true then).  I'm not sure why we were testing
!ctx->quiet originally; it's inconsistent with how we short-circuit in
other spots.  I contrived the testcase array60.C below which verifies
that we now short-circuit quickly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_vec_init_1): Move the i == 0 test to the
if statement that guards the RANGE_EXPR optimization.  Invert
the ctx->quiet test. Apply the RANGE_EXPR optimization before we
append the first element initializer.  Truncate ctx->ctor when
performing the RANGE_EXPR optimization.  Make the built
RANGE_EXPR start at index 0 instead of 1.  Don't call
unshare_constructor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array28.C: New test.
* g++.dg/init/array60.C: New test.
---
 gcc/cp/constexpr.c| 34 ++-
 .../g++.dg/cpp0x/constexpr-array28.C  | 14 
 gcc/testsuite/g++.dg/init/array60.C   | 13 +++
 3 files changed, 45 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array28.C
 create mode 100644 gcc/testsuite/g++.dg/init/array60.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ab747a58fa0..e67ce5da355 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4205,7 +4205,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
  if (value_init || init == NULL_TREE)
{
  eltinit = NULL_TREE;
- reuse = i == 0;
+ reuse = true;
}
  else
eltinit = cp_build_array_ref (input_location, init, idx, complain);
@@ -4222,7 +4222,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
return ctx->ctor;
  eltinit = cxx_eval_constant_expression (&new_ctx, init, lval,
  non_constant_p, overflow_p);
- reuse = i == 0;
+ reuse = true;
}
   else
{
@@ -4236,35 +4236,37 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
  eltinit = cxx_eval_constant_expression (&new_ctx, eltinit, lval,
  non_constant_p, overflow_p);
}
-  if (*non_constant_p && !ctx->quiet)
+  if (*non_constant_p && ctx->quiet)
break;
-  if (new_ctx.ctor != ctx->ctor)
-   {
- /* We appended this element above; update the value.  */
- gcc_assert ((*p)->last().index == idx);
- (*p)->last().value = eltinit;
-   }
-  else
-   CONSTRUCTOR_APPEND_ELT (*p, idx, eltinit);
+
   /* Reuse the result of cxx_eval_constant_expression call
 from the first iteration to all others if it is a constant
 initializer that doesn't require relocations.  */
-  if (reuse
+  if (i == 0
+ && reuse
  && max > 1
  && (eltinit == NULL_TREE
  || (initializer_constant_valid_p (eltinit, TREE_TYPE (eltinit))
  == null_pointer_node)))
{
- if (new_ctx.ctor != ctx->ctor)
-   eltinit = new_ctx.ctor;
  tree range = build2 (RANGE_EXPR, size_type_node,
-  build_int_cst (size_type_node, 1),
+  build_int_cst (size_type_node, 0),
   build_int_cst (size_type_node, max - 1));
- CONSTRUCTOR_APPEND_ELT (*p, range, unshare_constructor (eltinit));
+ vec_safe_truncate (*p, 0);
+ CONSTRUCTOR_APPEND_ELT (*p, range, eltinit);
  break;
}
   else if (i == 0)
vec_safe_reserve (*p, max);
+
+  if (new_ctx.ctor != ctx->ctor)
+   {
+ /* We appended this element above; update the value.  */
+ gcc_assert ((*p)->last().index == idx);
+ (*p)->last().value = eltinit;
+   }
+

[PATCH] c++: constraints and address of template-id

2020-08-06 Thread Patrick Palka via Gcc-patches
When resolving the address of a template-id, we need to drop functions
whose associated constraints are not satisfied, as per [over.over].  We
do so in resolve_address_of_overloaded_function, but not in
resolve_overloaded_unification or resolve_nondeduced_context, which
seems like an oversight.
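A small example of the deduced case the patch covers: F is deduced from &g<int>, so the overload with an unsatisfied constraint has to be discarded during deduction, not only in resolve_address_of_overloaded_function. The names g and call are hypothetical; the example assumes C++20 and a compiler that already includes this fix, and the constrained overload is guarded so the code still builds when concepts are unavailable.

```cpp
#include <cassert>

// Hypothetical overloads of g. For T = int only the first is viable,
// because the second one's constraint (sizeof (T) > 100) is not satisfied.
template <typename T>
int g () { return 1; }

#if defined (__cpp_concepts) && __cpp_concepts >= 201907L
template <typename T>
  requires (sizeof (T) > 100)
long g () { return 2; }
#endif

// F is deduced from the address of the template-id g<int>, which goes
// through overload resolution during template argument deduction.
template <typename F>
int call (F fp) { return static_cast<int> (fp ()); }
```

If both candidates were kept, F could be int (*) () or long (*) (), and deduction would fail as ambiguous.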

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?

gcc/cp/ChangeLog:

* pt.c (resolve_overloaded_unification): Drop functions with
unsatisfied constraints.
(resolve_nondeduced_context): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn5.C: New test.
* g++.dg/concepts/fn8.C: Generalize dg-error directive to accept
"no matching function ..." diagnostic.
* g++.dg/cpp2a/concepts-fn1.C: Likewise.
* g++.dg/cpp2a/concepts-ts2.C: Likewise.
* g++.dg/cpp2a/concepts-ts3.C: Likewise.
---
 gcc/cp/pt.c   |  5 -
 gcc/testsuite/g++.dg/concepts/fn8.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-fn1.C |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-fn5.C | 16 
 gcc/testsuite/g++.dg/cpp2a/concepts-ts2.C |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-ts3.C |  2 +-
 6 files changed, 24 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e7496002c1c..bcfe8d146b1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22122,6 +22122,8 @@ resolve_overloaded_unification (tree tparms,
  && !any_dependent_template_arguments_p (subargs))
{
  fn = instantiate_template (fn, subargs, tf_none);
+ if (!constraints_satisfied_p (fn))
+   continue;
  if (undeduced_auto_decl (fn))
{
  /* Instantiate the function to deduce its return type.  */
@@ -22268,7 +22270,8 @@ resolve_nondeduced_context (tree orig_expr, tsubst_flags_t complain)
  badfn = fn;
  badargs = subargs;
}
- else if (elem && (!goodfn || !decls_match (goodfn, elem)))
+ else if (elem && (!goodfn || !decls_match (goodfn, elem))
+  && constraints_satisfied_p (elem))
{
  goodfn = elem;
  ++good;
diff --git a/gcc/testsuite/g++.dg/concepts/fn8.C b/gcc/testsuite/g++.dg/concepts/fn8.C
index ed900809908..32df5a556c0 100644
--- a/gcc/testsuite/g++.dg/concepts/fn8.C
+++ b/gcc/testsuite/g++.dg/concepts/fn8.C
@@ -24,5 +24,5 @@ template
   void g(T x) { }
 
 int main () {
-  g(&f); // { dg-error "no matches" }
+  g(&f); // { dg-error "no match" }
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn1.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn1.C
index 238eb819e90..b31675d255c 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-fn1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn1.C
@@ -170,7 +170,7 @@ template void g(T x) { }
 void driver_3 () 
 {
   g(&ok);
-  g(&err); // { dg-error "no matches" }
+  g(&err); // { dg-error "no match" }
 }
 
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn5.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn5.C
new file mode 100644
index 000..c01cedde28e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn5.C
@@ -0,0 +1,16 @@
+// Verify we check constraints when resolving the address of a template-id.
+// { dg-do compile { target c++20 } }
+
+void id(auto) { }
+
+template 
+int f() { return 0; }
+
+template  requires requires { T::fail(); }
+auto f() { T::fail(); }
+
+int main() {
+  using U = decltype(&f);
+  (void)&f;
+  id(&f);
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ts2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ts2.C
index d28002c035a..5942ff19327 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-ts2.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ts2.C
@@ -173,7 +173,7 @@ template void g(T x) { }
 void driver_3 () 
 {
   g(&ok);
-  g(&err); // { dg-error "no matches" }
+  g(&err); // { dg-error "no match" }
 }
 
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ts3.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ts3.C
index 9d47a7a083d..6f7ed1ffee4 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-ts3.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ts3.C
@@ -173,7 +173,7 @@ template void g(T x) { }
 void driver_3 () 
 {
   g(&ok);
-  g(&err); // { dg-error "no matches" }
+  g(&err); // { dg-error "no match" }
 }
 
 
-- 
2.28.0.97.gdc04167d37



Re: std:vec for classes with constructor?

2020-08-06 Thread Aldy Hernandez via Gcc-patches




On 8/6/20 6:30 PM, Jonathan Wakely wrote:


Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Marc Glisse

On Thu, 6 Aug 2020, Christophe Lyon wrote:


Was I on the right track configuring with
--target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16
then compiling without any special option?


Maybe you also need --with-float=hard, I don't remember if it's
implied by the 'hf' target suffix


Thanks! That's what I was missing to reproduce the issue. Now I can
reproduce it with just

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
return a==5 | b==7;
}

with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1

  _1 = a_5(D) == { 5, 5, 5, 5 };
  _3 = b_6(D) == { 7, 7, 7, 7 };
  _9 = _1 | _3;
  _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);

we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
false), while with -fdisable-tree-forwprop4 we do manage to expand

  _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);

It doesn't make much sense to me that we can expand the more complicated
form and not the simpler form of the same operation (both compare a to 5
and produce a vector of -1 or 0 of the same size), especially when the
target has an instruction (vceq) that does just what we want.
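For reference, the lane-wise result the reproducer is meant to compute can be checked with the same GCC/Clang vector extension. This is only a sketch of the semantics, not of either expansion path, and it compares against explicit splat vectors instead of relying on scalar broadcasting: each comparison yields -1 in lanes where it holds and 0 elsewhere, and | combines the two masks lane by lane.

```cpp
#include <cassert>

// GCC/Clang vector extension types, as in the reproducer above.
typedef unsigned int vec __attribute__ ((vector_size (16)));
typedef int vi __attribute__ ((vector_size (16)));

// Each comparison produces a signed mask vector (-1 where true, 0 where
// false); OR-ing the masks gives the element-wise logical OR.
vi f (vec a, vec b)
{
  const vec five = {5, 5, 5, 5};
  const vec seven = {7, 7, 7, 7};
  return (a == five) | (b == seven);
}
```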

Introducing boolean vectors was fine, but I think they should be real 
types, that we can operate on, not be forced to appear only as the first 
argument of a vcond.


I can think of 2 natural ways to improve things: either implement vector 
comparisons in the ARM backend (possibly by forwarding to their existing 
code for vcond), or in the generic expansion code try using vcond if the 
direct comparison opcode is not provided.


We can temporarily revert my patch, but I would like it to be temporary. 
Since aarch64 seems to handle the same code just fine, maybe someone who 
knows arm could copy the relevant code over?


Does my message make sense, do people have comments?

--
Marc Glisse


Re: [PATCH] Power10: Add BRH, BRW, BRD support.

2020-08-06 Thread Segher Boessenkool
On Tue, Aug 04, 2020 at 10:40:15PM -0400, Michael Meissner wrote:
> The power10 processor adds 3 new instructions (BRH, BRW, BRD) that byte swaps
> half-words, words, and double-words within a GPR register.

The brh insn reverses the bytes in each of four 16-bit words in a GPR,
but this patch only does it for HImode.  Similar for brw.  Okay.
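A portable model of what the three instructions compute may help when reading the patterns. reverse_bytes_in_lanes is a hypothetical helper, not GCC code; it models brh, brw and brd as reversing the bytes inside every 2-, 4- or 8-byte lane of a 64-bit GPR, and a HImode bswap only cares about the low halfword of the brh result.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of the byte-reverse semantics (not GCC's expansion):
// reverse the byte order inside every LANE_BYTES-wide lane of a 64-bit
// value. LANE_BYTES = 2, 4 and 8 correspond to brh, brw and brd; it must
// divide 8.
inline std::uint64_t
reverse_bytes_in_lanes (std::uint64_t x, unsigned lane_bytes)
{
  std::uint64_t r = 0;
  for (unsigned i = 0; i < 8; ++i)
    {
      unsigned lane = i / lane_bytes, pos = i % lane_bytes;
      unsigned dst = lane * lane_bytes + (lane_bytes - 1 - pos);
      r |= ((x >> (8 * i)) & 0xFF) << (8 * dst);
    }
  return r;
}
```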

> 2020-08-04  Michael Meissner  
> 
>   * config/rs6000/rs6000.md (bswaphi2_reg): Generate the BRH
>   instruction on ISA 3.1.

The changelog should just describe the change, not the effect of the
change, so just "New define_insn." or "New pattern." or "New." here.
All other info goes in the commit message.

This patch is okay for trunk, and all backports later.  Thanks Mike!


Segher


[committed] libstdc++: Do not set eofbit eagerly in operator>>(istream&, char(&)[N])

2020-08-06 Thread Jonathan Wakely via Gcc-patches
Similar to the bugs I fixed recently in istream::ignore, we incorrectly
set eofbit too often in operator>>(istream&, string&) and
operator>>(istream&, char(&)[N]).

We should only set eofbit if we reach EOF but would have kept going
otherwise. If we've already extracted the maximum number of characters
(whether that's because of the buffer size or the istream's width())
then we should not set eofbit.
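The rule above (set eofbit only when EOF, not the extraction limit, stopped extraction) can be sketched for the std::string case. read_word_limited is hypothetical and much simpler than the libstdc++ operator>>; it uses width() as the limit when it is positive.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical simplified word extractor (not libstdc++ code).
std::istream &
read_word_limited (std::istream &in, std::string &out)
{
  typedef std::char_traits<char> traits;
  std::ios_base::iostate err = std::ios_base::goodbit;
  std::streamsize extracted = 0;
  std::istream::sentry ok (in);  // skips leading whitespace
  if (ok)
    {
      out.clear ();
      std::streamsize n = in.width () > 0
                          ? in.width ()
                          : std::numeric_limits<std::streamsize>::max ();
      std::streambuf *sb = in.rdbuf ();
      traits::int_type c = sb->sgetc ();
      while (extracted < n
             && !traits::eq_int_type (c, traits::eof ())
             && !std::isspace (static_cast<unsigned char> (traits::to_char_type (c))))
        {
          out += traits::to_char_type (c);
          ++extracted;
          c = sb->snextc ();
        }
      // Stopping because width() characters were read is not EOF, even
      // if the next character happens to be the end of the input.
      if (extracted < n && traits::eq_int_type (c, traits::eof ()))
        err |= std::ios_base::eofbit;
      in.width (0);
    }
  if (extracted == 0)
    err |= std::ios_base::failbit;
  if (err)
    in.setstate (err);
  return in;
}
```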

libstdc++-v3/ChangeLog:

* include/bits/basic_string.tcc
(operator>>(basic_istream&, basic_string&)): Do not set eofbit
if extraction stopped after in.width() characters.
* src/c++98/istream-string.cc (operator>>(istream&, string&)):
Likewise.
* include/bits/istream.tcc (__istream_extract): Do not set
eofbit if extraction stopped after n-1 characters.
* src/c++98/istream.cc (__istream_extract): Likewise.
* testsuite/21_strings/basic_string/inserters_extractors/char/13.cc: 
New test.
* testsuite/21_strings/basic_string/inserters_extractors/wchar_t/13.cc: 
New test.
* testsuite/27_io/basic_istream/extractors_character/char/5.cc: New 
test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/5.cc: New 
test.

Tested powerpc64le-linux. Committed to trunk.

commit 4e39f563c0cd25401f689c2093cb8c13692156ef
Author: Jonathan Wakely 
Date:   Thu Aug 6 19:23:14 2020

libstdc++: Do not set eofbit eagerly in operator>>(istream&, char(&)[N])

Similar to the bugs I fixed recently in istream::ignore, we incorrectly
set eofbit too often in operator>>(istream&, string&) and
operator>>(istream&, char(&)[N]).

We should only set eofbit if we reach EOF but would have kept going
otherwise. If we've already extracted the maximum number of characters
(whether that's because of the buffer size or the istream's width())
then we should not set eofbit.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.tcc
(operator>>(basic_istream&, basic_string&)): Do not set eofbit
if extraction stopped after in.width() characters.
* src/c++98/istream-string.cc (operator>>(istream&, string&)):
Likewise.
* include/bits/istream.tcc (__istream_extract): Do not set
eofbit if extraction stopped after n-1 characters.
* src/c++98/istream.cc (__istream_extract): Likewise.
* testsuite/21_strings/basic_string/inserters_extractors/char/13.cc: New test.
* testsuite/21_strings/basic_string/inserters_extractors/wchar_t/13.cc: New test.
* testsuite/27_io/basic_istream/extractors_character/char/5.cc: New 
test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/5.cc: 
New test.

diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index e370f439390..75218a40610 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -1518,7 +1518,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
  __str.append(__buf, __len);
 
- if (_Traits::eq_int_type(__c, __eof))
+ if (__extracted < __n && _Traits::eq_int_type(__c, __eof))
__err |= __ios_base::eofbit;
  __in.width(0);
}
diff --git a/libstdc++-v3/include/bits/istream.tcc 
b/libstdc++-v3/include/bits/istream.tcc
index b8f530f6ef5..022db9383e9 100644
--- a/libstdc++-v3/include/bits/istream.tcc
+++ b/libstdc++-v3/include/bits/istream.tcc
@@ -1023,7 +1023,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  ++__extracted;
  __c = __sb->snextc();
}
- if (_Traits::eq_int_type(__c, __eof))
+
+ if (__extracted < __num - 1
+ && _Traits::eq_int_type(__c, __eof))
__err |= ios_base::eofbit;
 
  // _GLIBCXX_RESOLVE_LIB_DEFECTS
diff --git a/libstdc++-v3/src/c++98/istream-string.cc 
b/libstdc++-v3/src/c++98/istream-string.cc
index c59f2ce0b34..bfd7389e2e2 100644
--- a/libstdc++-v3/src/c++98/istream-string.cc
+++ b/libstdc++-v3/src/c++98/istream-string.cc
@@ -93,7 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
 
- if (__traits_type::eq_int_type(__c, __eof))
+ if (__extracted < __n && __traits_type::eq_int_type(__c, __eof))
__err |= ios_base::eofbit;
  __in.width(0);
}
diff --git a/libstdc++-v3/src/c++98/istream.cc 
b/libstdc++-v3/src/c++98/istream.cc
index 7a48779d337..79a77b8475a 100644
--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -261,7 +261,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
 
- if (__traits_type::eq_int_type(__c, __eof))
+ if (__extracted < __num - 1
+ && __traits_type::eq_int_type(__c, __eof))
__err |= ios_base::eofbi

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Martin Sebor via Gcc-patches

On 8/6/20 10:00 AM, Jonathan Wakely wrote:

On 06/08/20 09:17 -0600, Martin Sebor via Libstdc++ wrote:

Sorry.  I don't see this exercise as a complete waste of time
(but I understand why it feels like that to you).

What it highlights is the fact that the warning infrastructure
we have in place is far from optimal for C++ in general (with
its heavy reliance on inlining and templates) and the standard
library in particular (especially with -Wno-system-headers).
We should make an effort to do better.

Setting aside the effort to clean up the library so that it can
be used even with -Wsystem-headers,


Yeah, it's an ongoing effort.


warnings about out of bounds
accesses should trigger even with -Wno-system-headers.  If one
doesn't I'd tend to view it as a bug.


I agree. And __attribute__((__warning__(""))) too.


I've opened four bugs to track some of the issues we've discussed:
96502, 96503, and 96505 for the lost attribute effect after
inlining, and 96508 for the system header interaction.  I wasn't
able to reproduce the problem you're having with the attribute
(calling an out-of-line function declared in a system header
does produce a warning) so if you're not completely put off
by your experience so far please take a look at it and see
what I may have missed.

Thanks
Martin




I added code to have some
trigger despite it but I'm pretty sure there are more places where
the middle end needs to do the same gymnastics to enable it.

Martin








Re: [PATCH] Implement P0966 std::string::reserve should not shrink

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 30/07/20 16:39 +0100, Jonathan Wakely wrote:

On 30/07/20 15:29 +0100, Jonathan Wakely wrote:

On 12/05/19 21:22 +, Andrew Luo wrote:

It's been a while, but thought I'd check in again now that GCC 9 has been 
branched, and master is now on GCC 10.

I've merged my changes with the latest code and attached a diff... Let me know 
if there's any thoughts on this.


Well I failed to get around to this for GCC 10, so let's try again
now.

I've done another review of the patch, and I'm now wondering why the
new _M_shrink function takes a requested capacity, when the caller
always passes zero. We can simplify both implementations of _M_shrink
if we remove the parameter and change the callers from _M_shrink(0) to
_M_shrink().

Was there a reason to make it take an argument? Did you anticipate
future uses of _M_shrink(n) for n > 0?

If we simplify it then we need fewer branches in _M_shrink, because we
don't need to do:

  // Make sure we don't shrink below the current size.
  if (__requested_capacity < length())
    __requested_capacity = length();

We only need to check whether length() < capacity() (and whether the
string is shared, for the COW implementation).

And if we do that, we can get rid of _M_shrink() because it's now
identical to the zero-argument form of reserve(). So we can just put
the body of _M_shrink() straight in reserve(). The reserve() function
is deprecated, but when we need to remove it we can just make it
private, so that shrink_to_fit() can still call it.

The only downside of this I see is that when the deprecated reserve()
eventually gets removed from the standard, our users will get a
"reserve() is private" error rather than a "wrong number of arguments"
error. But that might actually be better, since they can go to the
location of the private member and see the comments and attribute
describing its status in different standard versions.

I've attached a relative diff showing my suggested changes to your
most recent patch. This also fixes some regressions, because the
_M_shrink function was not swallowing exceptions that result from a
failure to reallocate, which shrink_to_fit() was doing previously.
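The control flow proposed above can be sketched with a toy model. `toy_string` and its members are hypothetical and this is not the libstdc++ code: there is no COW "is it shared?" check and no small-string buffer. `reserve(n)` only ever grows. The zero-argument `reserve()` reallocates only when `length() < capacity()`. `shrink_to_fit()` calls it and swallows allocation failures.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical toy string; illustrates the proposed reserve() policy only.
class toy_string {
public:
  explicit toy_string(const char* s)
    : len_(std::strlen(s)), cap_(len_), buf_(new char[len_ + 1]) {
    std::memcpy(buf_.get(), s, len_ + 1);  // copy including the NUL
  }

  std::size_t length() const { return len_; }
  std::size_t capacity() const { return cap_; }

  // reserve(n): may grow the buffer, never shrinks it (P0966).
  void reserve(std::size_t n) {
    if (n > cap_)
      reallocate(n);
  }

  // reserve(): shrink-to-fit request; only acts if length() < capacity().
  void reserve() {
    if (len_ < cap_)
      reallocate(len_);
  }

  // shrink_to_fit(): forwards to reserve() and swallows allocation failures.
  void shrink_to_fit() noexcept {
    try {
      reserve();
    } catch (...) {
      // Keep the current buffer if reallocation throws.
    }
  }

private:
  void reallocate(std::size_t new_cap) {
    std::unique_ptr<char[]> p(new char[new_cap + 1]);  // may throw
    std::memcpy(p.get(), buf_.get(), len_ + 1);
    buf_ = std::move(p);
    cap_ = new_cap;
  }

  std::size_t len_;
  std::size_t cap_;
  std::unique_ptr<char[]> buf_;
};
```

For example, after `reserve(100)` a later `reserve(10)` leaves the capacity at 100. `shrink_to_fit()` then brings it down to `length()`.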

What do you think?


Here's the combined patch, based on your original with my proposed
simplifications applied.


I've now pushed that combined patch to master.

Sorry it took so long to integrate your changes, but thanks very much
for the contribution to GCC!


commit 1dbff6ffd71247a099028d4407d745dc0e5cf720
Author: Andrew Luo 
Date:   Thu Aug 6 19:35:43 2020

libstdc++: Implement P0966 std::string::reserve should not shrink

Remove ability for reserve(n) to reduce a string's capacity. Add a new
reserve() overload that makes a shrink-to-fit request, and make
shrink_to_fit() use that.

libstdc++-v3/ChangeLog:

2020-07-30  Andrew Luo  
Jonathan Wakely  

Implement C++20 P0966
* config/abi/pre/gnu.ver (GLIBCXX_3.4): Use less greedy
patterns for basic_string members.
(GLIBCXX_3.4.29): Export new basic_string::reserve symbols.
* doc/xml/manual/status_cxx2020.xml: Update P0966 status.
* include/bits/basic_string.h (shrink_to_fit()): Call reserve().
(reserve(size_type)): Remove default argument.
(reserve()): Declare new overload.
[!_GLIBCXX_USE_CXX11_ABI] (shrink_to_fit, reserve): Likewise.
* include/bits/basic_string.tcc (reserve(size_type)): Remove
support for shrinking capacity.
(reserve()): Perform shrink-to-fit operation.
[!_GLIBCXX_USE_CXX11_ABI] (reserve): Likewise.
* testsuite/21_strings/basic_string/capacity/1.cc: Adjust to
reflect new behavior.
* testsuite/21_strings/basic_string/capacity/char/1.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/char/18654.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/1.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/18654.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/2.cc:
Likewise.
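The "less greedy patterns" change can be checked by hand. The sketch below rests on three assumptions. First, it approximates the linker's version-script glob matching with POSIX `fnmatch(3)`; ld has its own matcher. Second, it uses x86_64 mangled names, where `size_type` is `unsigned long` ('m'). Third, `_ZNSs7reserveEm` is taken to be old-ABI `reserve(size_type)` and `_ZNSs7reserveEv` the new zero-argument `reserve()`. Under those assumptions, the old `_ZNSs[67][j-z]*E[PRcjmvy]*` glob from the hunk below would also claim the new symbol for GLIBCXX_3.4. The narrower `_ZNSs7reserveE[jmy]` does not.

```cpp
#include <fnmatch.h>

// Returns true if the glob `pattern` matches `symbol`.
// fnmatch() returns 0 on a match.
inline bool glob_matches(const char* pattern, const char* symbol) {
  return fnmatch(pattern, symbol, 0) == 0;
}

// Patterns from the gnu.ver hunk; symbol names assume x86_64.
constexpr const char* old_pattern = "_ZNSs[67][j-z]*E[PRcjmvy]*";
constexpr const char* new_pattern = "_ZNSs7reserveE[jmy]";
constexpr const char* reserve_n   = "_ZNSs7reserveEm";  // reserve(size_type)
constexpr const char* reserve_0   = "_ZNSs7reserveEv";  // reserve()
```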

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index b6ce76c1f20..b582f53e363 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -223,7 +223,10 @@ GLIBCXX_3.4 {
 _ZNSs6assignE[PRcjmvy]*;
 _ZNSs6insertE[PRcjmvy]*;
 _ZNSs6insertEN9__gnu_cxx17__normal_iteratorIPcSsEE[PRcjmvy]*;
-_ZNSs[67][j-z]*E[PRcjmvy]*;
+_ZNSs6rbeginEv;
+_ZNSs6resizeE[jmy]*;
+_ZNSs7replaceE[jmy]*;
+_ZNSs7reserveE[jmy];
 _ZNSs7[a-z]*EES2_[NPRjmy]*;
 _ZNSs7[a-z]*EES2_S[12]*;
 _ZNSs12_Alloc_hiderC*;
@@ -290,7 +293,10 @@ GLIBCXX_3.4 {
 _ZNSbIwSt11char_traits

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 12:45 -0600, Martin Sebor via Libstdc++ wrote:

On 8/6/20 10:00 AM, Jonathan Wakely wrote:

On 06/08/20 09:17 -0600, Martin Sebor via Libstdc++ wrote:

Sorry.  I don't see this exercise as a complete waste of time
(but I understand why it feels like that to you).

What it highlights is the fact that the warning infrastructure
we have in place is far from optimal for C++ in general (with
its heavy reliance on inlining and templates) and the standard
library in particular (especially with -Wno-system-headers).
We should make an effort to do better.

Setting aside the effort to clean up the library so that it can
be used even with -Wsystem-headers,


Yeah, it's an ongoing effort.


warnings about out of bounds
accesses should trigger even with -Wno-system-headers.  If one
doesn't I'd tend to view it as a bug.


I agree. And __attribute__((__warning__(""))) too.


I've opened four bugs to track some of the issues we've discussed:
96502, 96503, and 96505 for the lost attribute effect after
inlining, and 96508 for the system header interaction.  I wasn't
able to reproduce the problem you're having with the attribute
(calling an out-of-line function declared in a system header
does produce a warning) so if you're not completely put off
by your experience so far please take a look at it and see
what I may have missed.


This preprocessed code fails to warn without -Wsystem-headers:

# 1 "user.C"
# 1 "sys.h" 1

# 2 "sys.h" 3


# 3 "sys.h" 3
__attribute__((warning("badness")))
void diagnose_badness();

template
void foo(T t)
{
  if (t < 0)
diagnose_badness();
}
# 2 "user.C" 2


# 3 "user.C"
int main()
{
  int i = -1;
  foo(i);
}



Re: std:vec for classes with constructor?

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 19:58 +0200, Aldy Hernandez wrote:



On 8/6/20 6:30 PM, Jonathan Wakely wrote:

On 06/08/20 16:17 +0200, Aldy Hernandez wrote:



On 8/6/20 12:48 PM, Jonathan Wakely wrote:

On 06/08/20 12:31 +0200, Richard Biener wrote:
On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely 
 wrote:


On 06/08/20 06:16 +0100, Richard Sandiford wrote:

Andrew MacLeod via Gcc-patches  writes:

On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:

On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor

 wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they
aren't in our internal vec<> implementation.  Although we usually
steer away from using std::vec because of interactions with our GC
system, ipcp_param_lattices is only live within the pass and
allocated with calloc.
Ummm... I did not object but I will save the URL of this message in
the archive so that I can wave it in front of anyone complaining why
I don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be a generally accepted thing to do when someone wants to
have a vector of constructed items?
It's definitely not what we want. You have to find another solution
to this problem.


Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to
initialize the pointer and the max-size field.  There is no allocation
done per element, so a small number of elements have a couple of
fields initialized per element. We'd have to loop to do that anyway.

GCC's vec<> does not provide the ability to run a constructor,
std::vec does.


I realise you weren't claiming otherwise, but: that could be fixed :-)


It really should be.

Artificial limitations like that are just a booby trap for the unwary.


It's probably also historic because we couldn't even implement
the case of re-allocation correctly without std::move, could we?


I don't see why not. std::vector worked fine without std::move, it's
just more efficient with std::move, and can be used with a wider set
of element types.

When reallocating you can just copy each element to the new storage
and destroy the old element. If your type is non-copyable then you
need std::move, but I don't think the types I see used with vec<> are
non-copyable. Most of them are trivially-copyable.

I think the benefit of std::move to GCC is likely to be permitting
cheap copies to be made where previously they were banned for
performance reasons, but not because those copies were impossible.
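The point that reallocation needs only copy construction plus destruction can be sketched without std::move. `grow_by_copy` is a hypothetical helper and is not part of GCC's vec.h. It copy-constructs each element into new raw storage with placement new, then destroys the old elements and frees the old block. Exception safety is simplified: if a copy throws, only the partially built new block is cleaned up.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical helper: relocate `count` live elements from `old_data`
// (allocated with ::operator new) into a new block of `new_cap` slots,
// using only copy construction.
template <typename T>
T* grow_by_copy(T* old_data, std::size_t count, std::size_t new_cap) {
  T* new_data = static_cast<T*>(::operator new(new_cap * sizeof(T)));
  std::size_t built = 0;
  try {
    for (; built < count; ++built)
      ::new (static_cast<void*>(new_data + built)) T(old_data[built]);
  } catch (...) {
    // Undo the partial copy and release the new block.
    for (std::size_t i = 0; i < built; ++i)
      new_data[i].~T();
    ::operator delete(new_data);
    throw;
  }
  // Destroy the old elements and release the old block.
  for (std::size_t i = 0; i < count; ++i)
    old_data[i].~T();
  ::operator delete(old_data);
  return new_data;
}

// Example: grow a two-element block of std::string to four slots.
inline bool demo_grow_strings() {
  auto* p = static_cast<std::string*>(::operator new(2 * sizeof(std::string)));
  ::new (static_cast<void*>(p)) std::string("alpha");
  ::new (static_cast<void*>(p + 1)) std::string("beta");
  std::string* q = grow_by_copy(p, 2, 4);
  bool ok = q[0] == "alpha" && q[1] == "beta";
  q[0].~basic_string();
  q[1].~basic_string();
  ::operator delete(q);
  return ok;
}
```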


For the record, neither value_range nor int_range require any 
allocations.  The sub-range storage resides in the stack or 
wherever it was defined.  However, it is definitely not a POD.
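A toy stand-in shows why zero-filled storage is not enough for such a type. `toy_range` is hypothetical and is not GCC's int_range. It only mimics the shape described earlier in the thread: inline sub-range storage plus a pointer and a max-size field set by the constructor. Because the default constructor is user-provided, the type is not trivially default-constructible. Memory cleared by calloc or safe_grow_cleared therefore does not hold a constructed object until placement new runs the constructor.

```cpp
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical range type with inline sub-range storage.
// Copying is deleted to keep the sketch short.
template <unsigned N>
class toy_range {
public:
  toy_range() : m_base(m_storage), m_max_pairs(N), m_num_pairs(0) {}
  toy_range(const toy_range&) = delete;
  toy_range& operator=(const toy_range&) = delete;

  unsigned max_pairs() const { return m_max_pairs; }
  unsigned num_pairs() const { return m_num_pairs; }
  bool uses_inline_storage() const { return m_base == m_storage; }

private:
  long* m_base;          // points into m_storage: no heap allocation
  unsigned m_max_pairs;  // capacity, in [lo, hi] pairs
  unsigned m_num_pairs;  // pairs currently in use
  long m_storage[2 * N];
};

using range1 = toy_range<1>;

// calloc'd bytes are all zero: on common platforms m_base reads as null and
// m_max_pairs as 0, so placement new is needed to run the constructor.
inline bool construct_in_zeroed_storage() {
  void* raw = std::calloc(1, sizeof(range1));
  if (!raw)
    return false;
  range1* r = ::new (raw) range1();
  bool ok = r->uses_inline_storage() && r->max_pairs() == 1
            && r->num_pairs() == 0;
  r->~range1();
  std::free(raw);
  return ok;
}
```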


Digging deeper, I notice that the original issue that caused us to 
use std::vector was not in-place new but the safe_grow_cleared.  
The original code had:



auto_vec known_value_ranges;
...
...
if (!vr.undefined_p () && !vr.varying_p ())
 {
   if (!known_value_ranges.length ())
 known_value_ranges.safe_grow_cleared (count);
 known_value_ranges[i] = vr;
 }


I would've gladly kept the auto_vec, had I been able to call the
constructor by doing an in-place new:



   if (!vr.undefined_p () && !vr.varying_p ())
 {
   if (!known_value_ranges.length ())
- known_value_ranges.safe_grow_cleared (count);
+ {
+   known_value_ranges.safe_grow_cleared (count);
+   for (int i = 0; i < count; ++i)
+     new (&known_value_ranges[i]) value_range ();
+ }
   known_value_ranges[i] = vr;
 }
 }


But alas, compiling yields:


In file included from /usr/include/wchar.h:35,
    from /usr/include/c++/10/cwchar:44,
    from /usr/include/c++/10/bits/postypes.h:40,
    from /usr/include/c++/10/iosfwd:40,
    from /usr/include/gmp-x86_64.h:34,
    from /usr/include/gmp.h:59,
    from /home/aldyh/src/gcc/gcc/system.h:687,
    from /home/aldyh/src/gcc/gcc/ipa-fnsummary.c:55:
/home/aldyh/src/gcc/gcc/vec.h: In instantiation of ‘static 
size_t vec::embedded_size(unsigned int) [with T 
= int_range<1>; A = va_heap; size_t = long unsigned int]’:
/home/aldyh/src/gcc/gcc/vec.h:288:58:   required from ‘static 
void va_heap::reserve(vec*&, unsigned int, 
bool) [with T = int_range<1>]’
/home/aldyh/src/gcc/gcc/vec.h:1746:20:   required from ‘bool 
vec::reserve(unsigned int, bool)

Re: std::vector code cleanup fixes optimizations

2020-08-06 Thread François Dumont via Gcc-patches
I wonder if following the application of this patch we shouldn't bump 
versioned namespace through _GLIBCXX_BEGIN_NAMESPACE_VERSION ?


Unless it is still considered as experimental.

François

On 31/07/20 11:03 pm, François Dumont wrote:

On 17/07/20 12:36 pm, Jonathan Wakely wrote:

On 16/12/19 08:18 +0100, François Dumont wrote:
A small refresh on this patch now tested also for versioned 
namespace which require printers.py to be updated.


Note that this simplification works also for normal mode so I can 
apply it independently from the stl_bvector.h part.



    * include/bits/stl_bvector.h
    [_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as
    _Bit_type*.
    (_Bvector_impl_data(const _Bvector_impl_data&)): Default.
    (_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter.
    (_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default.
    (_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter.
    (_Bvector_impl_data::_M_reset()): Likewise.
    (_Bvector_impl_data::_M_swap_data): New.
    (_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement explicitly.
    (_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, _Bvector_impl&&)): New.
    (_Bvector_base::_Bvector_base(_Bvector_base&&, const allocator_type&)):
    New, use latter.
    (vector::vector(vector&&, const allocator_type&, true_type)): New, use
    latter.
    (vector::vector(vector&&, const allocator_type&, false_type)): New.
    (vector::vector(vector&&, const allocator_type&)): Use latters.
    (vector::vector(const vector&, const allocator_type&)): Adapt.
    [__cplusplus >= 201103](vector::vector(_InputIt, _InputIt,
    const allocator_type&)): Use _M_initialize_range.
    (vector::operator[](size_type)): Use iterator operator[].
    (vector::operator[](size_type) const): Use const_iterator operator[].
    (vector::swap(vector&)): Add assertions on allocators. Use _M_swap_data.
    [__cplusplus >= 201103](vector::insert(const_iterator, _InputIt,
    _InputIt)): Use _M_insert_range.
    (vector::_M_initialize(size_type)): Adapt.
    [__cplusplus >= 201103](vector::_M_initialize_dispatch): Remove.
    [__cplusplus >= 201103](vector::_M_insert_dispatch): Remove.
    * python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Stop
    using start _M_offset.
    (StdVectorPrinter.to_string): Likewise.
    * testsuite/23_containers/vector/bool/allocator/swap.cc: Adapt.
    * testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc:
    Add check.

François


On 6/24/19 9:31 PM, François Dumont wrote:

Hi

    Any feedback regarding this patch ?

Thanks

On 5/14/19 7:46 AM, François Dumont wrote:

Hi

    This is the patch on vector to:

- Optimize sizeof in Versioned namespace mode. We could go one 
step further by removing _M_p from _M_finish and just transform it 
into an offset but it is a little bit more impacting for the code.


- Implement the swap optimization already done on main std::vector 
template class.


- Fix move constructor so that it is noexcept no matter allocator 
move constructor noexcept qualification


- Optimize move constructor with allocator when allocator type is 
always equal.


- Use shortcuts in C++11 by skipping the _M_XXX_dispatch methods.
Those are now defined only in pre-C++11 mode; I can't see any ABI
issue in doing so.
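The "allocator type is always equal" optimisation for an allocator-extended move constructor is commonly written as tag dispatch on `std::allocator_traits<A>::is_always_equal`. This is a minimal sketch with a hypothetical `toy_bitvec` and is not the stl_bvector.h code. When allocators always compare equal, the storage can be stolen unconditionally and the constructor can be noexcept. Otherwise the allocators must be compared, and the elements copied if they differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical allocator-aware container holding `size_` words.
template <typename Alloc = std::allocator<unsigned long>>
class toy_bitvec {
  using traits = std::allocator_traits<Alloc>;

public:
  using word = typename traits::value_type;

  explicit toy_bitvec(std::size_t n, const Alloc& a = Alloc())
    : alloc_(a), data_(traits::allocate(alloc_, n)), size_(n) {
    std::fill(data_, data_ + n, word(0));
  }

  toy_bitvec(const toy_bitvec&) = delete;
  toy_bitvec& operator=(const toy_bitvec&) = delete;

  // Allocator-extended move: pick an implementation at compile time.
  toy_bitvec(toy_bitvec&& other, const Alloc& a)
      noexcept(traits::is_always_equal::value)
    : toy_bitvec(std::move(other), a, typename traits::is_always_equal{}) {}

  ~toy_bitvec() {
    if (data_)
      traits::deallocate(alloc_, data_, size_);
  }

  std::size_t size() const { return size_; }
  word* data() { return data_; }

private:
  // Allocators always compare equal: steal the storage, cannot throw.
  toy_bitvec(toy_bitvec&& other, const Alloc& a, std::true_type) noexcept
    : alloc_(a), data_(other.data_), size_(other.size_) {
    other.data_ = nullptr;
    other.size_ = 0;
  }

  // Allocators may differ: steal only if they compare equal, else copy.
  toy_bitvec(toy_bitvec&& other, const Alloc& a, std::false_type)
    : alloc_(a), data_(nullptr), size_(other.size_) {
    if (alloc_ == other.alloc_) {
      data_ = other.data_;
      other.data_ = nullptr;
      other.size_ = 0;
    } else {
      data_ = traits::allocate(alloc_, size_);
      std::copy(other.data_, other.data_ + size_, data_);
    }
  }

  Alloc alloc_;
  word* data_;
  std::size_t size_;
};
```

With `std::allocator`, `is_always_equal` is `true_type`, so only the stealing overload is instantiated and the public constructor is noexcept.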


    * include/bits/stl_bvector.h
    [_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as
    _Bit_type*.
    (_Bvector_impl_data(const _Bvector_impl_data&)): Default.
    (_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter.
    (_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default.
    (_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter.
    (_Bvector_impl_data::_M_reset()): Likewise.
    (_Bvector_impl_data::_M_swap_data): New.
    (_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement explicitly.
    (_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, _Bvector_impl&&)): New.
    (_Bvector_base::_Bvector_base(_Bvector_base&&, const allocator_type&)):
    New, use latter.
    (vector::vector(vector&&, const allocator_type&, true_type)): New, use
    latter.
    (vector::vector(vector&&, const allocator_type&, false_type)): New.
    (vector::vector(vector&&, const allocator_type&)): Use latters.
    (vector::vector(const vector&, const allocator_type&)): Adapt.
    [__cplusplus >= 201103](vector::vector(_InputIt, _InputIt,
    const allocator_type&)): Use _M_initialize_range.
    (vector::operator[](size_type)): Use iterator operator[].
    (vector::operator[](size_type) const): Use const_iterator operator[].
    (vector::swap(vector&)): Adapt.
    (vector::_M_initialize(size_type)): Add assertions on allocators.
    Use _M_swap_data.
    [__cplusplus >= 201103]

Re: [PATCH] Rewrite get_size_range for irange API.

2020-08-06 Thread Martin Sebor via Gcc-patches

On 8/6/20 8:53 AM, Aldy Hernandez via Gcc-patches wrote:

[Martin, does this sound reasonable to you?]


It mostly makes sense to me except one part:



The following patch converts get_size_range to the irange API, thus
removing the use of VR_ANTI_RANGE.

This was a bit tricky because of the gymnastics we do in get_size_range
to ignore negatives and all that.  I didn't convert the function for
multi-ranges.  The code still returns a pair of trees indicating the
massaged range.  But I do believe the code is cleaner and smaller.

I'm not sure the current code (or my adaptation) gets all cases, but
the goal was to keep to the existing functionality, nothing more.

OK?
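To make the multi-range discussion concrete, here is a toy model of the step `range_remove_non_positives` performs. It intersects a list of [lo, hi] pairs with [floor, MAX], where floor is 0 when ALLOW_ZERO is true and 1 otherwise. `toy_irange` and `remove_non_positives` are hypothetical names. The per-pair `lower_bound (i)` accessor only loosely mirrors the irange API. The sketch models the intersection alone; it does not reproduce what get_size_range then returns for a multi-range.

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical multi-range: sorted, disjoint [lo, hi] pairs.
struct toy_irange {
  std::vector<std::pair<std::int64_t, std::int64_t>> pairs;

  bool undefined_p() const { return pairs.empty(); }
  unsigned num_pairs() const { return static_cast<unsigned>(pairs.size()); }
  std::int64_t lower_bound(unsigned i = 0) const { return pairs[i].first; }
  std::int64_t upper_bound() const { return pairs.back().second; }
};

// Intersect every pair with [floor, MAX]; pairs entirely below floor vanish.
inline void remove_non_positives(toy_irange& r, bool allow_zero) {
  const std::int64_t floor = allow_zero ? 0 : 1;
  std::vector<std::pair<std::int64_t, std::int64_t>> kept;
  for (const auto& p : r.pairs) {
    if (p.second < floor)
      continue;  // the whole pair is below floor
    kept.emplace_back(p.first < floor ? floor : p.first, p.second);
  }
  r.pairs = std::move(kept);
}
```

For [-8,-2][5,10][20,MAX] with ALLOW_ZERO false, the result is [5,10][20,MAX]. The last pair's lower bound, `lower_bound (num_pairs () - 1)`, is 20, and the overall `upper_bound ()` is MAX.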

gcc/ChangeLog:

* calls.c (range_remove_non_positives): New.
(set_bounds_from_range): New.
(get_size_range): Rewrite for irange API.
* tree-affine.c (expr_to_aff_combination): Call determine_value_range
with a range.
* tree-vrp.c (determine_value_range_1): Rename to...
(determine_value_range): ...this.
* tree-vrp.h (determine_value_range): Adjust prototype.
---
  gcc/calls.c   | 139 ++
  gcc/tree-affine.c |   5 +-
  gcc/tree-vrp.c|  44 ++-
  gcc/tree-vrp.h|   2 +-
  4 files changed, 73 insertions(+), 117 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 44401e6350d..4aeeb36a2be 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "attribs.h"
  #include "builtins.h"
  #include "gimple-fold.h"
+#include "range.h"
  
  /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits.  */

  #define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)
@@ -1237,6 +1238,31 @@ alloc_max_size (void)
return alloc_object_size_limit;
  }
  
+// Remove non-positive numbers from a range.  ALLOW_ZERO is TRUE if 0
+// is considered positive.
+
+static void
+range_remove_non_positives (irange *vr, bool allow_zero)
+{
+  tree floor, type = vr->type ();
+  if (allow_zero)
+    floor = build_zero_cst (type);
+  else
+    floor = build_one_cst (type);
+  value_range positives (floor, TYPE_MAX_VALUE (type));
+  vr->intersect (positives);
+}
+
+// Set the extreme bounds of range VR into range[].
+
+static bool
+set_bounds_from_range (const irange *vr, tree range[2])
+{
+  range[0] = wide_int_to_tree (vr->type (), vr->lower_bound ());
+  range[1] = wide_int_to_tree (vr->type (), vr->upper_bound ());
+  return true;
+}
+
  /* Return true when EXP's range can be determined and set RANGE[] to it
 after adjusting it if necessary to make EXP a represents a valid size
 of object, or a valid size argument to an allocation function declared
@@ -1250,9 +1276,11 @@ alloc_max_size (void)
  bool
  get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
  {
-  if (!exp)
-return false;
-
+  if (!exp || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+    {
+      range[0] = range[1] = NULL_TREE;
+      return false;
+    }
if (tree_fits_uhwi_p (exp))
  {
/* EXP is a constant.  */
@@ -1261,91 +1289,30 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
  }
  
tree exptype = TREE_TYPE (exp);

-  bool integral = INTEGRAL_TYPE_P (exptype);
-
-  wide_int min, max;
-  enum value_range_kind range_type;
-
-  if (integral)
-range_type = determine_value_range (exp, &min, &max);
-  else
-range_type = VR_VARYING;
-
-  if (range_type == VR_VARYING)
+  value_range vr;
+  determine_value_range (&vr, exp);
+  if (vr.num_pairs () == 1)
+return set_bounds_from_range (&vr, range);
+
+  widest_irange positives (vr);
+  range_remove_non_positives (&positives, allow_zero);
+
+  // If all numbers are negative, let the caller sort it out.
+  if (positives.undefined_p ())
+return set_bounds_from_range (&vr, range);
+
+  // Remove the unknown parts of a multi-range.
+  // This will transform [5,10][20,MAX] into [5,10].


Is this comment correct?  Wouldn't this result in returning smaller
sizes than the actual value allows?  If so, I'd expect that to cause
false positives (and in that case, if none of our tests fail we need
to add some that would).

By my reading of the code below it seems to return the upper range
(i.e., [20, MAX]) but I'm not fully acquainted with the new ranger
APIs yet.

Thanks
Martin



+  int pairs = positives.num_pairs ();
+  if (pairs > 1
+  && positives.upper_bound () == wi::to_wide (TYPE_MAX_VALUE (exptype)))
  {
-  if (integral)
-   {
- /* Use the full range of the type of the expression when
-no value range information is available.  */
- range[0] = TYPE_MIN_VALUE (exptype);
- range[1] = TYPE_MAX_VALUE (exptype);
- return true;
-   }
-
-  range[0] = NULL_TREE;
-  range[1] = NULL_TREE;
-  return false;
+  value_range last_range (exptype,
+ positives.lower_bound (pairs - 1),
+

Re: std::vector code cleanup fixes optimizations

2020-08-06 Thread Jonathan Wakely via Gcc-patches

On 06/08/20 21:30 +0200, François Dumont via Libstdc++ wrote:
I wonder if following the application of this patch we shouldn't bump 
versioned namespace through _GLIBCXX_BEGIN_NAMESPACE_VERSION ?


Definitely not.


Unless it is still considered as experimental.


Yes, it is. And it doesn't claim to provide any stability or backwards
compatibility.

A small change like this doesn't justify changing it.

We changed it last time because I was hoping to change the std::string
ABI used by the versioned namespace, but that never happened.




Re: [PATCH] libgccjit: Add new gcc_jit_context_new_blob entry point

2020-08-06 Thread David Malcolm via Gcc-patches
On Mon, 2020-08-03 at 10:07 +0200, Andrea Corallo wrote:
> David Malcolm  writes:
> 
> > On Fri, 2020-07-24 at 18:05 -0400, David Malcolm via Gcc-patches wrote:
> >
> > [...]
> >
> >> I haven't thought this through in detail, and I'm not sure exactly
> >> how
> >> it would work for arbitrary types, but I thought it worth sharing. 
> >> (For example I can think of nasty issues if we ever want to support
> >> cross-compilation, e.g. where sizeof types or  endianness differs
> >> between host and target).
> >
> > ...which is an argument in favor of retaining the name "blob", perhaps
> > as the name of the argument in the header file e.g.:
> >
> > extern void
> > gcc_jit_global_set_initializer (gcc_jit_lvalue *global,
> >  const void *blob,
> >  size_t num_bytes);
> >  
> >
> > as a subtle hint to the user that they need to be wary about binary
> > layouts ("here be dragons").
> >
> > [...]
> 
> Hi Dave & all,
> 
> following up this is my take on the implementation of:
> 
> gcc_jit_global_set_initializer (gcc_jit_lvalue *global,
> const void *blob,
> size_t num_bytes);
> 
> 'global' must be an array but for the sake of generality it now supports
> all the various integral types and is not limited to char[].
> 
> As you anticipated, the implementation I came up with is currently not safe
> for cross-compilation; not sure that is a requirement though.
> 
> make check-jit is clean
> 
> Feedback very welcome
> 
> Thanks!
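The cross-compilation concern arises because the blob is raw host bytes. Here is a small illustration; the helper names are hypothetical and this is not libgccjit API. Copying a 32-bit integer with `memcpy` yields host byte order, while encoding it explicitly as little-endian gives the same bytes on every host. A target whose endianness or `sizeof` differs from the host would need the second kind of conversion.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Raw host representation of a 32-bit value (what a memcpy'd blob holds).
inline std::array<unsigned char, 4> host_bytes(std::uint32_t v) {
  std::array<unsigned char, 4> out{};
  std::memcpy(out.data(), &v, sizeof v);
  return out;
}

// Explicit little-endian encoding, independent of host byte order.
inline std::array<unsigned char, 4> le_bytes(std::uint32_t v) {
  return {{static_cast<unsigned char>(v & 0xff),
           static_cast<unsigned char>((v >> 8) & 0xff),
           static_cast<unsigned char>((v >> 16) & 0xff),
           static_cast<unsigned char>((v >> 24) & 0xff)}};
}

// True when the host stores integers least-significant byte first.
inline bool host_is_little_endian() {
  return host_bytes(1u)[0] == 1;
}
```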

Thanks for the updated patch.  Comments inline below.

>   Andrea
> 
> gcc/jit/ChangeLog
> 
> 2020-08-01  Andrea Corallo  
> 
> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_14): New ABI tag.
> * docs/topics/expressions.rst (gcc_jit_global_set_initializer):
> Document new entry point in section 'Global variables'.
> * jit-playback.c (global_new_decl, global_finalize_lvalue): New
> method.
> (playback::context::new_global): Make use of global_new_decl,
> global_finalize_lvalue.
> (load_blob_in_ctor): New template function in use by the
> following.
> (playback::context::new_global_initialized): New method.
> * jit-playback.h (class context): Decl 'new_global_initialized',
> 'global_new_decl', 'global_finalize_lvalue'.
> (lvalue::set_initializer): Add implementation.
> * jit-recording.c (recording::memento_of_get_pointer::get_size)
> (recording::memento_of_get_type::get_size): Add implementation.
> (recording::global::write_initializer_reproducer): New function in
> use by 'recording::global::write_reproducer'.
> (recording::global::replay_into)
> (recording::global::write_to_dump)
> (recording::global::write_reproducer): Handle
> initialized case.
> * jit-recording.h (class type): Decl 'get_size' and
> 'num_elements'.
> * libgccjit++.h (class lvalue): Declare new 'set_initializer'
> method.
> (class lvalue): Decl 'is_global' and 'set_initializer'.
> (class global): Decl 'write_initializer_reproducer'. Add
> 'm_initializer', 'm_initializer_num_bytes' fields.  Implement
> 'set_initializer'.
> * libgccjit.c (gcc_jit_global_set_initializer): New function.
> * libgccjit.h (gcc_jit_global_set_initializer): New function
> declaration.
> * libgccjit.map (LIBGCCJIT_ABI_14): New ABI tag.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-01  Andrea Corallo  
> 
> * jit.dg/all-non-failing-tests.h: Add test-blob.c.
> * jit.dg/test-global-set-initializer.c: New testcase.

[...]

> diff --git a/gcc/jit/docs/topics/expressions.rst 
> b/gcc/jit/docs/topics/expressions.rst
> index d783ceea51a8..7699dcfd27be 100644
> --- a/gcc/jit/docs/topics/expressions.rst
> +++ b/gcc/jit/docs/topics/expressions.rst
> @@ -582,6 +582,27 @@ Global variables
>referring to it.  Analogous to using an "extern" global from a
>header file.
>  
> +.. function:: gcc_jit_lvalue *\
> +  gcc_jit_global_set_initializer (gcc_jit_lvalue *global,\
> +  const void *blob,\
> +  size_t num_bytes)
> +
> +   Set an initializer for an object using the memory content pointed
> +   by ``blob`` for ``num_bytes``.  ``global`` must be an arrays of an
  
Typo: "arrays" -> "array"

> +   integral type.

Why the non-void return type?  Looking at libgccjit.c I see it returns
"global" if it succeeds, or NULL if it fails.  Wouldn't it be better to
simply have a void return type, and rely on the normal error-handling
mechanisms?
Or is this inspired by the inline asm patch? (for PR 87291)

[...]

> --- a/gcc/jit/jit-playback.h
> +++ b/gcc/jit/jit-playback.h
> @@ -111,6 +111,15 @@ public:
> type *type,
> const cha

Re: [PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-06 Thread Joseph Myers
On Thu, 6 Aug 2020, Maciej W. Rozycki via Gcc-patches wrote:

>  Given that for the `riscv64-linux-gnu' target and the ilp32d multilib 
> glibc currently fails to link against libgcc.a built at -O0 I first ran 
> reference testing with target libraries built at -O2, but comparing that 
> to change-under-test -O2 results revealed another issue with GCC target 
> libraries built at -O0 causing link failures across testsuites, namely 
> libgcov.a referring atomic primitives where libatomic.a has not been 
> linked in.  I haven't figured out yet if the issue is in libgcov, the 
> testsuite or the specs.  Examples of failures:

That --as-needed -latomic --no-as-needed should be used by default to link 
in libatomic when required (with consequent changes needed for all 
testsuites) is a known issue; see bug 81358.  Having such references in 
libgcov simply makes that known issue more visible.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Kees Cook via Gcc-patches
On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.  Like
> we had that discussion on secure erase of memory that should not
> be DSEd.

I've viewed stack erasure as separate from register clearing. The
"when" of stack erasure tends to define which things are being defended
against. If the stack is being erased on function entry, you're defending
against all the various "uninitialized" variable attacks (which can be
info exposures, flow control redirection, etc). If it's on function exit,
this is more aimed at avoiding stale data and minimizing what's available
during an attack (and it also provides similar "uninit" defenses, just
in a different way). And FWIW, past benchmarks on this appear to indicate
erase-on-entry is more cache-friendly.

-- 
Kees Cook


Re: [PATCH/RFC] options: Make --help= to emit values post-overrided

2020-08-06 Thread Segher Boessenkool
Hi!

On Thu, Aug 06, 2020 at 08:37:23PM +0800, Kewen.Lin wrote:
> When I was working to update patch as Richard's review comments
> here https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551474.html,
> I noticed that the options "-Q --help=params" don't show the final values
> after target option overriding, instead it emits the default values in
> params.opt (without any explicit param settings).
> 
> I guess it's more meaningful to get it to emit values post-overrided,
> to avoid possible confusion for users.  Does it make sense?
> Or are there any concerns?

I think this makes a lot of sense.

> btw, not sure whether it's a good idea to move target_option_override_hook
> call into print_specific_help and use one function local static
> variable to control it's called once for all kinds of help dumping
> (possible combination), then can remove the calls in function 
> common_handle_option.

I cannot easily imagine what that will look like...  it could easily be
worse than what you have here (callbacks aren't so nice, but there are
worse things).

> @@ -2145,9 +2146,11 @@ print_help (struct gcc_options *opts, unsigned int 
> lang_mask,
>if (!(include_flags & CL_PARAMS))
>  exclude_flags |= CL_PARAMS;
>  
> -  if (include_flags)
> +  if (include_flags) {
> +target_option_override_hook ();
>  print_specific_help (include_flags, exclude_flags, 0, opts,
>lang_mask);
> +  }
>  }

Indenting should be like

  if (include_flags)
    {
      target_option_override_hook ();
      print_specific_help (include_flags, exclude_flags, 0, opts, lang_mask);
    }


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Qing Zhao via Gcc-patches
Hi, Richard,


> On Aug 5, 2020, at 4:35 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
>> 
>> + continue;
>> +  if (fixed_regs[regno])
>> + continue;
>> +  if (is_live_reg_at_exit (regno))
>> + continue;
>> 
>> How can a call-used reg be live at exit?
> 
> Yes, this might not be needed, I will double check on this.

Just double checked this.  It turned out that this condition cannot be
deleted.

A call-used register might be the register that holds the value returned
to the caller (so it is live at exit).  For example, EAX on i386 is a
call-used register and, at the same time, the register that holds the
return value.

Hope this is clear.
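A minimal sketch of the selection rule described above, assuming a toy model in which hard registers are bit positions in a std::bitset; reg_set, regs_to_zero and the register numbering are hypothetical, not GCC code.  It shows why the live-at-exit check cannot be dropped: the return-value register is call-used, but its value must survive the return.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy model (hypothetical, not GCC code): hard register N is bit N.
constexpr std::size_t kNumRegs = 8;
using reg_set = std::bitset<kNumRegs>;

// Registers to zero on return: call-used (volatile), not fixed, and not
// live at exit.  The live-at-exit exclusion keeps the register that
// carries the return value (e.g. EAX on i386) intact.
reg_set
regs_to_zero (const reg_set &call_used, const reg_set &fixed,
              const reg_set &live_at_exit)
{
  return call_used & ~fixed & ~live_at_exit;
}
```

With register 0 as the return-value register, it is call-used but stays out of the result.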

thanks.

Qing



Re: [RS6000] PR96493, powerpc local call linkage failure

2020-08-06 Thread Segher Boessenkool
Hi!

On Thu, Aug 06, 2020 at 10:58:18PM +0930, Alan Modra wrote:
> This corrects current_file_function_operand, an operand predicate used
> to determine whether a symbol_ref is safe to use with the local_call
> patterns.  Calls between pcrel and non-pcrel code need to go via
> linker stubs.  In the case of non-pcrel code to pcrel the stub saves
> r2 but there needs to be a nop after the branch for the r2 restore.
> So the local_call patterns can't be used there.

Okay.

> For pcrel code to
> non-pcrel the local_call patterns could still be used, but I thought
> it better to not use them since the call isn't direct.  Code generated
> by the corresponding call_nonlocal_aix for pcrel is identical anyway.

Hrm.

> Should I rename current_file_function_operand to something more
> meaningful before committing?  direct_local_call_operand perhaps?

As a separate patch, either before or after this one.  And maybe a
better name than that as well, direct_local_call_operand isn't great?

In the same vein, maybe the local_call pattern names should be changed?
Because this isn't used just for local calls anymore; instead, the
defining characteristic is whether there is a restore of r2 after the
call (whether there might be any such restore needed).  The pattern
names and the operand name ideally would be obviously related?

> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1051,7 +1051,12 @@
>   && !((DEFAULT_ABI == ABI_AIX
> || DEFAULT_ABI == ABI_ELFv2)
>&& (SYMBOL_REF_EXTERNAL_P (op)
> -  || SYMBOL_REF_WEAK (op)))")))
> +  || SYMBOL_REF_WEAK (op)))
> + && !(DEFAULT_ABI == ABI_ELFv2
> +  && SYMBOL_REF_DECL (op) != NULL
> +  && TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL
> +  && (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op))
> +  != rs6000_pcrel_p (cfun)))")))

This condition is much too complex like that...  can you factor it out
to a code block, perhaps?  Or maybe there should be an actual helper
function.
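One possible shape for the suggested helper, written as a self-contained model rather than against the real RTL API: abi_kind, callee_info and crosses_pcrel_boundary are hypothetical names standing in for DEFAULT_ABI, SYMBOL_REF_DECL, rs6000_fndecl_pcrel_p and rs6000_pcrel_p.  The operand predicate would then reject the symbol whenever the helper returns true.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ABI check and the callee's decl.
enum class abi_kind { aix, elfv2, other };

struct callee_info
{
  bool is_function_decl;  // SYMBOL_REF_DECL is a FUNCTION_DECL
  bool callee_pcrel;      // callee is compiled as pcrel
};

// True if a direct call would cross a pcrel/non-pcrel boundary, so it
// must go via a linker stub and the local_call patterns cannot be used.
bool
crosses_pcrel_boundary (abi_kind abi, const callee_info *callee,
                        bool caller_pcrel)
{
  return abi == abi_kind::elfv2
         && callee != nullptr
         && callee->is_function_decl
         && callee->callee_pcrel != caller_pcrel;
}
```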

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96493.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-mdejagnu-cpu=powerpc64 -O2" } */

That is not a -mcpu= value you should ever use.  Please just pick a real
existing CPU, maybe p7 or p8 since this requires ELFv2 anyway?  Or, what
does it need here?  It isn't clear to me.  But you don't want a pseudo-
POWER3 with ELFv2 :-)

The patch is okay for trunk (and backports later) if you fix the
testcase (the renames and other improvements can be done later, and do
not need backporting).  Thanks!


Segher


Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Marek Polacek via Gcc-patches
On Thu, Aug 06, 2020 at 10:01:37AM -0400, Nathan Sidwell wrote:
> On 8/5/20 7:29 PM, Marek Polacek wrote:
> > On Wed, Aug 05, 2020 at 11:03:08AM -0400, Nathan Sidwell wrote:
> > > On 8/4/20 8:54 PM, Marek Polacek via Gcc-patches wrote:
> > > > On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:
> > > > > I think the read of the room is that people think it would be 
> > > > > generally useful, so let approve the general plan.
> > > > 
> > > > Cool.
> > > > 
> > > > > So, now we are down to the fine details.  Please do see just how far 
> > > > > you can stretch the existing mechanisms to cover what you need to do. 
> > > > >  I think the existing mechanisms should be able to cover it all; but 
> > > > > the devil is in the details and those matter.
> > > > 
> > > > At this point I'm only proposing one new directive, dg-ice.  I think we
> > > > can't really do without it.  The other one was a matter of convenience.
> > > 
> > > I've realized I have a concern.  Grepping (or searching in an editor
> > > buffer) the log file for 'internal compiler error' to find actual
> > > regressions is a thing I want to still be able to do (perhaps with
> > > alternative spelling, I don't care).  I don't want to see the ICEs of
> > > tests that are expected to ICE.
> > >
> > > I think that means there has to be a positive marker on the unexpected
> > > ICEs, rather than lack of an expected marker on them.
> > 
> > Hmm, by the log file you mean g++.log?  Currently, if you run a dg-ice test,
> > and the test still ICEs, the g++.log file (but not the stdout of make
> > check-c++!) will have:
> > 
> > Executing on host: ... xg++ with options ...
> > spawn -ignore SIGHUP ... xg++ with options ...
> > .../foo.C:14:15: internal compiler error: in poplevel_class, at cp/name-lookup.c:4225
> > 
> > compiler exited with status 1
> > XFAIL: g++.dg/foo.C  -std=c++17 (internal compiler error)
> > PASS: g++.dg/foo.C  -std=c++17 (test for excess errors)
> > 
> > Which one of these would you not like to see?
> 
> Neither of these is solving the issue.  How do I find the ICES that are
> unexpected, without tripping over the ICEs that are expected?
> 
> > Can you give me more details?  Hopefully we'll work something out that
> > doesn't break your workflow.
> 
> sure.
> * develop patch
> * run testsuite
> * observe unexpected ICEs
> * load g++.log into editor
> * ^sinternal comp
> * gets to first unexpected ICE
> * debug it.
> 
> What does '^sinternal comp' become?  As there could be many expected ICEs
> it'll be painful to determine whether any particular utterance of 'internal
> compiler' is expected or not.

That is a problem I don't know how to deal with.  I know how to pass
additional options to the compiler from dejagnu.  I thought maybe I could use
-pass-exit-codes, redirect stderr to /dev/null, and check if the exit code is
ICE_EXIT_CODE, but there seems to be no way to do that redirection.  So I'm
stuck.

Though, you could just grep for '^FAIL.*internal comp', which will find the
first unexpected ICE.  Unlike the expected ICEs, the unexpected ICEs will
be shown in 'Excess errors:'.  Won't that work?
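A sketch of that grep as a small C++ filter, assuming unexpected ICEs are reported as "FAIL: ... (internal compiler error)" summary lines; unexpected_ices is a hypothetical name and the log lines in the test are made up for illustration.  Because each line is matched separately and the pattern is anchored with ^, XFAIL lines for expected ICEs and the raw compiler diagnostics are skipped.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Return every log line that starts with FAIL and mentions an ICE.
std::vector<std::string>
unexpected_ices (const std::string &log)
{
  static const std::regex fail_ice ("^FAIL.*internal comp");
  std::vector<std::string> hits;
  std::istringstream in (log);
  std::string line;
  while (std::getline (in, line))
    if (std::regex_search (line, fail_ice))  // ^ anchors at line start
      hits.push_back (line);
  return hits;
}
```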

Marek



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Segher Boessenkool
Hi!

On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...

It would be nice if it were described anywhere what the benefit of this
is, including actual hard numbers.  I only see that it is very costly, and
I see no benefit whatsoever.

> > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > command-line option and
> > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function
> > > attribute:

"call-used" is such a bad name.  "call-clobbered" is better already, but
"volatile" (over calls) is most obvious I think.

There are at least four different kinds of volatile registers:

1) Argument registers are volatile, on most ABIs.
2) The *linker* (or dynamic linker!) may insert code that needs some
   registers for itself;
3) Registers only used for scratch space;
4) Registers used for returning the function value.

And these can overlap, and differ per function.

> > > Again - what's the intended use (and how does it fulful anything useful
> > > for that case)?

Yes, exactly.

> > > +  if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > + continue;
> > > 
> > > Why does the target need some extra say here?
> > 
> > Only target can decide which hard regs should be zeroed, and which hard 
> > regs are general purpose register. 
> 
> I'm mostly questioning the plethora of target hooks added and whether
> this details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

It probably would be much easier to just have the target do *all* of
this, in one hook, or maybe even in the existing epilogue stuff.  The
resulting binary code will be very slow no matter what, so this should
not matter much at all.

> > > +  machine_mode mode
> > > + = targetm.calls.zero_call_used_regno_mode (regno,
> > > +reg_raw_mode[regno]);
> > > 
> > > In what case does the target ever need to adjust this (we're dealing
> > > with hard-regs only?)?
> > 
> > For x86, for example, even though the GPR registers are 64-bit, we only 
> > need to zero the lower 32-bit. etc.
> 
> That's an optimization, yes.

I guess what is meant here is that the usual x86-64 insns to clear the
low 32 bits of a register actually clear the whole register?  It is a
huge security leak otherwise.  And the generic code has nothing to do
with this; why not define hooks that ask the target to clear stuff instead?

> > > +  reg = gen_rtx_REG (mode, regno);
> > > +  if (zero_rtx[(int)mode] == NULL_RTX)
> > > + {
> > > +   zero_rtx[(int)mode] = reg;
> > > +   tmp = gen_rtx_SET (reg, const0_rtx);
> > > +   emit_insn (tmp);
> > > + }
> > > +  else
> > > + emit_move_insn (reg, zero_rtx[(int)mode]);
> > > 
> > > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > > but I may be wrong.  
> > 
> > You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
> > I will check on this.

If it is a CONST_INT, you should use const0_rtx; otherwise,
CONST0_RTX (mode).  I have no idea what zero_rtx is, but there is
const_tiny_rtx already, and you shouldn't use that directly either.

> But why not simplify it all to a single hook
> 
>   targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> 
> ?

Yeah.  With a much better name though (it should say what it is for, or
describe at a *high level* what it does).
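A sketch of the single-hook shape proposed above: the middle end passes one register set, and the target decides how (and whether) to clear each register.  Everything here is hypothetical (target_hooks, zero_regs, toy_zero_regs, and a toy layout with GPRs 0..3 and vector registers 4..7); a real hook would need a better name, as noted.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRegs = 8;
using hard_reg_set = std::bitset<kNumRegs>;

// Hypothetical single hook: gets the used-but-not-live set and returns
// the registers it actually cleared.
struct target_hooks
{
  hard_reg_set (*zero_regs) (const hard_reg_set &used_not_live,
                             bool gpr_only);
};

// Toy target: registers 0..3 are GPRs, 4..7 are vector registers.
// "Clearing" is recorded in a set instead of emitting instructions.
hard_reg_set
toy_zero_regs (const hard_reg_set &used_not_live, bool gpr_only)
{
  hard_reg_set cleared;
  for (std::size_t r = 0; r < kNumRegs; ++r)
    {
      bool is_gpr = r < 4;
      if (used_not_live.test (r) && (is_gpr || !gpr_only))
        cleared.set (r);
    }
  return cleared;
}

const target_hooks toy_target = { toy_zero_regs };
```

The design keeps the per-register decisions (which registers count as GPRs, which mode to clear them in) entirely in the target.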

> > > start_sequence ();
> > > emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > +
> > > +  gen_call_used_regs_seq ();
> > > +
> > > 
> > > The caller eventually performs shrink-wrapping - are you sure that
> > > doesn't mess up things?
> > 
> > My understanding is, in the standard epilogue, there is no handling of 
> > “call-used” registers.  Therefore, shrink-wrapping will not impact
> > “call-used” registers as well. 
> > Our patch only handles call-used registers, so, there should be no any 
> > interaction between this patch and shrink-wrapping.
> 
> I don't know (CCed Segher, he should eventually).

Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series.  But the epilogue can use
some volatile registers as well, including to hold sensitive info.  And
of course everything is different if you use separate shrink-wrapping,
but that work is done already when you get here (so it is too late?)


Anyway.  This all needs a good description in the user manual (is there?
I couldn't find any), explaining what exactly it does (user-visible),
and when you would want to use it, etc.  We need that before we can
review anything else in this patch sanely.


Segher


Re: [PATCH][testsuite] Add gcc.dg/ia64-sync-5.c

2020-08-06 Thread Mike Stump via Gcc-patches
On Aug 6, 2020, at 5:23 AM, Tom de Vries  wrote:
> 
> There currently is no sync_char_short-enabled run test that tests
> __sync_val_compare_and_swap.
> 
> OK for trunk?

Ok.


mmix: fix gcc.dg/loop-9.c by more accurate move insns

2020-08-06 Thread Hans-Peter Nilsson
Committed.

It looks like gcc.dg/loop-9.c kind-of works as a sentinel for sane
move-instruction generation for a port.

Looking at the
FAIL: gcc.dg/loop-9.c scan-rtl-dump loop2_invariant "Decided"
FAIL: gcc.dg/loop-9.c scan-rtl-dump loop2_invariant "without introducing a new temporary register"
it seems the problem is that in the loop:

  for (i = 0; i < 100; i++)
    a[i] = 18.4242;

the move insn corresponding to "a[i] = 18.4242" happens to be
generated as a move of a constant to a memory address, using no
registers except for the address (edited):

(insn 9 8 10 3 (set (mem:DF (reg:DI 269 [ ivtmp::9 ]))
        (const_double:DF 1.84241999e+1)) "x/loop-9.c":9:10 6 {movdf})

To wit, at the loop2 pass there's no register-initialization to move
out of the loop!  The insn above isn't accurate and has to be fixed up
at register allocation time to make constraints match.  While there
are insns to set memory to constant in MMIX, that's limited to 64-bit
moves corresponding to the integer bit-patterns for 0..255, and
18.4242 isn't one of them.  (Only 0.0 matches; the bit-patterns for
0..255 would IIUC be interpreted as denormal floating-point numbers
a.k.a. subnormal numbers and don't seem worthwhile to handle.)
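To make the constraint concrete: the store-constant form only covers 64-bit patterns 0..255, so what matters for a DFmode constant is its IEEE-754 bit pattern.  A small sketch, assuming double is a 64-bit IEEE-754 type; storable_as_small_int is a hypothetical helper, not part of the port.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if the double's raw 64-bit pattern is one of the integers 0..255.
bool
storable_as_small_int (double d)
{
  std::uint64_t bits;
  static_assert (sizeof bits == sizeof d, "expects a 64-bit double");
  std::memcpy (&bits, &d, sizeof bits);  // reinterpret the bits safely
  return bits <= 255;
}
```

Among ordinary values only +0.0 qualifies; -0.0 has the sign bit set, and the patterns 1..255 are the subnormals mentioned above.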

The fault is with the port, for not requiring a register for an
operand that actually requires an intermediate register, in order to
enable pre-register-allocation passes to do their job.  The movdf
pattern (actually, all MMIX movM), only required the destination to be
a non-immediate operand and the source to be a general_operand,
i.e. anything-to-anything.

Better force the source to be a register, when asked to generate such
a move insn.  Also, make operands stay sane by using the matching insn
condition to require one of the operands to be a register
pre-register-allocation (for sake of combine-like passes that cook up
"simplified" insns, possibly losing the use of a register).  Looking
no deeper than at the results of test-runs with different variants, I
see that the latter "safety latch" has no effect on the test-results
(at 919c9d4bd3db7da0), but it just feels like the right thing to do.
Similarly, doing the same for all moves, not just for movdf, has no
effect on test-suite results.
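A simplified model of the decision the new expander makes, with each operand reduced to a few flags; move_operands and must_force_src_to_reg are hypothetical names, and the flags stand in for REG_P, memory_operand and satisfies_constraint_I in the real expander.

```cpp
#include <cassert>

// Hypothetical flag summary of a move's operands.
struct move_operands
{
  bool dest_is_reg;
  bool src_is_reg;
  bool is_dimode;           // the move is DImode
  bool dest_is_mem;         // memory_operand (operands[0], DImode)
  bool src_is_const_0_255;  // satisfies_constraint_I (operands[1])
};

// Must the source be forced into a register before emitting the move?
bool
must_force_src_to_reg (const move_operands &m)
{
  if (m.dest_is_reg || m.src_is_reg)
    return false;  // already uses at least one register
  // The only memory-from-constant move kept: a DImode store of 0..255.
  bool stco_ok = m.is_dimode && m.dest_is_mem && m.src_is_const_0_255;
  return !stco_ok;
}
```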

gcc:
* config/mmix/mmix.md (MM): New mode_iterator.
("mov<mode>"): New expander to expand for all MM-modes.
("*movqi_expanded", "*movhi_expanded", "*movsi_expanded")
("*movsf_expanded", "*movdf_expanded"): Rename from the
corresponding mov named pattern.  Add to the condition that
either operand must be a register_operand.
("*movdi_expanded"): Similar, but also allow STCO in the condition.

--- gcc/gcc/config/mmix/mmix.md.orig	Mon Jan 13 22:30:46 2020
+++ gcc/gcc/config/mmix/mmix.md Wed Aug  5 02:55:54 2020
@@ -38,6 +38,8 @@
(MMIX_rR_REGNUM 260)
(MMIX_fp_rO_OFFSET -24)]
 )
+
+(define_mode_iterator MM [QI HI SI DI SF DF])

 ;; Operand and operator predicates.

@@ -46,10 +48,25 @@

 ;; FIXME: Can we remove the reg-to-reg for smaller modes?  Shouldn't they
 ;; be synthesized ok?
-(define_insn "movqi"
+(define_expand "mov<mode>"
+  [(set (match_operand:MM 0 "nonimmediate_operand")
+   (match_operand:MM 1 "general_operand"))]
+  ""
+{
+  /*  Help pre-register-allocation to use at least one register in a move.
+  FIXME: support STCO also for DFmode (storing 0.0).  */
+  if (!REG_P (operands[0]) && !REG_P (operands[1])
+      && (<MODE>mode != DImode
+          || !memory_operand (operands[0], DImode)
+          || !satisfies_constraint_I (operands[1])))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*movqi_expanded"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r ,r,x ,r,r,m,??r")
(match_operand:QI 1 "general_operand"   "r,LS,K,rI,x,m,r,n"))]
-  ""
+  "register_operand (operands[0], QImode)
+   || register_operand (operands[1], QImode)"
   "@
SET %0,%1
%s1 %0,%v1
@@ -60,10 +77,11 @@
STBU %1,%0
%r0%I1")

-(define_insn "movhi"
+(define_insn "*movhi_expanded"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,x,r,r,m,??r")
(match_operand:HI 1 "general_operand"   "r,LS,K,r,x,m,r,n"))]
-  ""
+  "register_operand (operands[0], HImode)
+   || register_operand (operands[1], HImode)"
   "@
SET %0,%1
%s1 %0,%v1
@@ -75,10 +93,11 @@
%r0%I1")

 ;; gcc.c-torture/compile/920428-2.c fails if there's no "n".
-(define_insn "movsi"
+(define_insn "*movsi_expanded"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r ,r,x,r,r,m,??r")
(match_operand:SI 1 "general_operand"   "r,LS,K,r,x,m,r,n"))]
-  ""
+  "register_operand (operands[0], SImode)
+   || register_operand (operands[1], SImode)"
   "@
SET %0,%1
%s1 %0,%v1
@@ -90,10 +109,13 @@
%r0%I1")

 ;; We assume all "s" are addresses.  Does that hold?
-(define_insn "movdi"
+(define_insn "*movdi_expanded"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r ,r,x,r,m,r,m,r,r,??r")
(match_operand:DI 1 "

Re: RFC: Monitoring old PRs, new dg directives

2020-08-06 Thread Mike Stump via Gcc-patches
On Aug 6, 2020, at 7:01 AM, Nathan Sidwell  wrote:
> 
>> XFAIL: g++.dg/foo.C  -std=c++17 (internal compiler error)
>> PASS: g++.dg/foo.C  -std=c++17 (test for excess errors)
>> Which one of these would you not like to see?
> 
> Neither of these is solving the issue.  How do I find the ICES that are 
> unexpected, without tripping over the ICEs that are expected?

Don't you already have this issue for current xfailed ICEs?  ^FAIL.*internal 
compiler error I think finds them, doesn't it?

[PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-06 Thread Segher Boessenkool
When the compgotos pass copies the tail of blocks ending in an indirect
jump, there is a micro-optimization to not copy the last one, since the
original block will then just be deleted.  This does not work properly
if cleanup_cfg does not merge all pairs of blocks we expect it to.

2020-08-07  Segher Boessenkool  

PR rtl-optimization/96475
* bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p
micro-optimization.
---
 gcc/bb-reorder.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index c635010..f7a7de8 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt)
 static bool
 maybe_duplicate_computed_goto (basic_block bb, int max_size)
 {
-  if (single_pred_p (bb))
-return false;
-
   /* Make sure that the block is small enough.  */
   rtx_insn *insn;
   FOR_BB_INSNS (bb, insn)
-- 
1.8.3.1



Re: [PATCH/RFC] options: Make --help= to emit values post-overrided

2020-08-06 Thread Kewen.Lin via Gcc-patches
Hi Segher!

Thanks for the comments!

on 2020/8/7 上午6:04, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 06, 2020 at 08:37:23PM +0800, Kewen.Lin wrote:
>> When I was working to update patch as Richard's review comments
>> here https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551474.html,
>> I noticed that the options "-Q --help=params" don't show the final values
>> after target option overriding, instead it emits the default values in
>> params.opt (without any explicit param settings).
>>
>> I guess it's more meaningful to get it to emit values post-overrided,
>> to avoid possible confusion for users.  Does it make sense?
>> Or are there any concerns?
> 
> I think this makes a lot of sense.
> 
>> btw, not sure whether it's a good idea to move target_option_override_hook
>> call into print_specific_help and use one function local static
>> variable to control it's called once for all kinds of help dumping
>> (possible combination), then can remove the calls in function 
>> common_handle_option.
> 
> I cannot easily imagine what that will look like...  it could easily be
> worse than what you have here (callbacks aren't so nice, but there are
> worse things).
> 

I attached opts_alt2.diff to make this more concrete.  Both alt1 and alt2
follow the existing callback scheme; alt2 aims to avoid calling
target_option_override_hook multiple times when there are several --help=
or similar options, but I guess alt1 is also fine, since the hook should
be allowed to be called more than once.
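A sketch of the alt2 idea (run the override hook at most once across several --help= options) using a function-local static flag; run_override_hook_once, count_hook and hook_calls are hypothetical names for illustration, not the code in the attached diff.

```cpp
#include <cassert>

using override_hook_fn = void (*) ();

// Runs HOOK the first time it is called with a non-null hook; later
// calls are no-ops for the rest of the process.
void
run_override_hook_once (override_hook_fn hook)
{
  static bool done = false;
  if (!done && hook)
    {
      hook ();
      done = true;
    }
}

// Demonstration hook that just counts its invocations.
int hook_calls = 0;
void count_hook () { ++hook_calls; }
```

If the hook is safe to call repeatedly (the alt1 assumption), the flag only saves redundant work.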

>> @@ -2145,9 +2146,11 @@ print_help (struct gcc_options *opts, unsigned int 
>> lang_mask,
>>if (!(include_flags & CL_PARAMS))
>>  exclude_flags |= CL_PARAMS;
>>  
>> -  if (include_flags)
>> +  if (include_flags) {
>> +target_option_override_hook ();
>>  print_specific_help (include_flags, exclude_flags, 0, opts,
>>   lang_mask);
>> +  }
>>  }
> 
> Indenting should be like
> 
>   if (include_flags)
>     {
>       target_option_override_hook ();
>       print_specific_help (include_flags, exclude_flags, 0, opts, lang_mask);
>     }
> 

Thanks for catching!  Updated.

BR,
Kewen
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..ec960c87c9a 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -328,7 +328,7 @@ decode_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   const char *arg;
 
   FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-print_help (opts, lang_mask, arg);
+print_help (opts, lang_mask, arg, target_option_override_hook);
 }
 
 /* Hold command-line options associated with stack limitation.  */
diff --git a/gcc/opts.c b/gcc/opts.c
index 499eb900643..fb4d4b8aa43 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2017,7 +2017,8 @@ check_alignment_argument (location_t loc, const char 
*flag, const char *name)
 
 void
 print_help (struct gcc_options *opts, unsigned int lang_mask,
-   const char *help_option_argument)
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void))
 {
   const char *a = help_option_argument;
   unsigned int include_flags = 0;
@@ -2146,8 +2147,11 @@ print_help (struct gcc_options *opts, unsigned int 
lang_mask,
 exclude_flags |= CL_PARAMS;
 
   if (include_flags)
-print_specific_help (include_flags, exclude_flags, 0, opts,
-lang_mask);
+{
+  gcc_assert (target_option_override_hook);
+  target_option_override_hook ();
+  print_specific_help (include_flags, exclude_flags, 0, opts, lang_mask);
+}
 }
 
 /* Handle target- and language-independent options.  Return zero to
diff --git a/gcc/opts.h b/gcc/opts.h
index 8f594b46e33..9a837305af1 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -419,8 +419,9 @@ extern bool target_handle_option (struct gcc_options *opts,
 extern void finish_options (struct gcc_options *opts,
struct gcc_options *opts_set,
location_t loc);
-extern void print_help (struct gcc_options *opts, unsigned int lang_mask, const
-   char *help_option_argument);
+extern void print_help (struct gcc_options *opts, unsigned int lang_mask,
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void));
 extern void default_options_optimization (struct gcc_options *opts,
  struct gcc_options *opts_set,
  struct cl_decoded_option 
*decoded_options,
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..ec960c87c9a 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -328,7 +328,7 @@ decode_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   const char *arg;
 
   FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-print_help (opts, lang_mask, arg);
+print_help (opts, lang_mask, arg, target_option_override_hook);
 }
 
 /* Hold command-line options associated wi

Re: [RS6000] PR96493, powerpc local call linkage failure

2020-08-06 Thread Alan Modra via Gcc-patches
On Thu, Aug 06, 2020 at 05:34:03PM -0500, Segher Boessenkool wrote:
> > +/* { dg-do run } */
> > +/* { dg-options "-mdejagnu-cpu=powerpc64 -O2" } */
> 
> That is not a -mcpu= value you should ever use.  Please just pick a real
> existing CPU, maybe p7 or p8 since this requires ELFv2 anyway?  Or, what
> does it need here?  It isn't clear to me.  But you don't want a pseudo-
> POWER3 with ELFv2 :-)

What I was trying to say by using cpu=powerpc64 is that this test has
minimal requirements on the required cpu level to run the test.
That's all.  Changed to power8 and committed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-06 Thread Richard Biener
On Thu, 6 Aug 2020, Kees Cook wrote:

> On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> > OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> > it sounded more like a mitigation against information leaks which
> > then would be highly incomplete w/o spill slot clearing.  Like
> > we had that discussion on secure erase of memory that should not
> > be DSEd.
> 
> I've viewed stack erasure as separate from register clearing. The
> "when" of stack erasure tends to define which things are being defended
> against. If the stack is being erased on function entry, you're defending
> against all the various "uninitialized" variable attacks (which can be
> info exposures, flow control redirection, etc). If it's on function exit,
> this is more aimed at avoiding stale data and minimizing what's available
> during an attack (and it also provides similar "uninit" defenses, just
> in a different way). And FWIW, past benchmarks on this appear to indicate
> erase-on-entry is more cache-friendly.

So I originally thought this was about leaking security sensitive data
to callers and thus we want to define API entries to not leak any
data from callees other than via the ABI defined return values or
global memory the callee chooses to populate.  Clearing registers
not involved in returning data is one part but then contents of such
registers could also reside in spill slots which means you have to
clear those as well.  And yes, even local automatic variables of the
callee fall into the category and thus 'stack-erasure' would be
required.  To appropriately have such a "security boundary" at
function return you _do_ have to do the clearing at function return
though.

But it's a completely different topic, and it seems the patch was
not intended to help the folks who also ask for a "secure" memset
that the compiler isn't supposed to optimize away as dead.
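For reference, a common idiom for the kind of "secure" memset mentioned here: storing through a volatile-qualified pointer, so each store is treated as an observable side effect that compilers in practice do not remove as dead.  secure_zero is a hypothetical name; explicit_bzero (glibc, BSDs) and C11's optional memset_s are platform alternatives.  Note that this clears only the buffer, not copies that may remain in registers or spill slots, which is exactly the gap discussed above.

```cpp
#include <cassert>
#include <cstddef>

// Zero LEN bytes at BUF through a volatile pointer so the stores are
// kept even if BUF is never read again.
void
secure_zero (void *buf, std::size_t len)
{
  volatile unsigned char *p = static_cast<volatile unsigned char *> (buf);
  while (len--)
    *p++ = 0;
}
```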

Richard.


Re: VEC_COND_EXPR optimizations v2

2020-08-06 Thread Richard Biener via Gcc-patches
On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse  wrote:
>
> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>
> >> Was I on the right track configuring with
> >> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
> >> --with-fpu=neon-fp16
> >> then compiling without any special option?
> >
> > Maybe you also need --with-float=hard, I don't remember if it's
> > implied by the 'hf' target suffix
>
> Thanks! That's what I was missing to reproduce the issue. Now I can
> reproduce it with just
>
> typedef unsigned int vec __attribute__((vector_size(16)));
> typedef int vi __attribute__((vector_size(16)));
> vi f(vec a,vec b){
>  return a==5 | b==7;
> }
>
> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1
>
>_1 = a_5(D) == { 5, 5, 5, 5 };
>_3 = b_6(D) == { 7, 7, 7, 7 };
>_9 = _1 | _3;
>_7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);
>
> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
> false), while with -fdisable-tree-forwprop4 we do manage to expand
>
>_2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
>
> It doesn't make much sense to me that we can expand the more complicated
> form and not the simpler form of the same operation (both compare a to 5
> and produce a vector of -1 or 0 of the same size), especially when the
> target has an instruction (vceq) that does just what we want.
>
> Introducing boolean vectors was fine, but I think they should be real
> types, that we can operate on, not be forced to appear only as the first
> argument of a vcond.
>
> I can think of 2 natural ways to improve things: either implement vector
> comparisons in the ARM backend (possibly by forwarding to their existing
> code for vcond), or in the generic expansion code try using vcond if the
> direct comparison opcode is not provided.
>
> We can temporarily revert my patch, but I would like it to be temporary.
> Since aarch64 seems to handle the same code just fine, maybe someone who
> knows arm could copy the relevant code over?
>
> Does my message make sense, do people have comments?
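A small check of the semantics described above, using the reproducer's GCC vector-extension types (Clang accepts them too): each comparison yields -1 in true lanes and 0 in false ones, and | combines the masks lane-wise.  The parentheses only silence -Wparentheses; since == binds tighter than |, the meaning is unchanged.

```cpp
#include <cassert>

typedef unsigned int vec __attribute__ ((vector_size (16)));
typedef int vi __attribute__ ((vector_size (16)));

// Lane i is -1 if a[i] == 5 or b[i] == 7, else 0.
vi
f (vec a, vec b)
{
  return (a == 5) | (b == 7);
}
```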

So what complicates things now (and to some extent pre-existed when you
used AVX512 which _could_ operate on boolean vectors) is that we
have split out the condition from VEC_COND_EXPR to separate stmts
but we do not expect backends to be able to code-generate the separate
form; instead we rely on the ISEL pass to transform VEC_COND_EXPRs
to .VCOND[U] "merging" the compares again.  Now that process breaks
down once we have things like _9 = _1 | _3;  -  at some point I argued
that we should handle vector compares [and operations on boolean vectors]
as well in ISEL but then when it came up again for some reason I
disregarded that again.

Thus - we don't want to go back to fixing up the generic expansion code
(which looks at one instruction at a time and is restricted by TER single-use
restrictions).  Instead we want to deal with this in ISEL which should
behave more intelligently.  In the above case it might involve turning
the _1 and _3 defs into .VCOND [with different result type], doing
_9 in that type and then somehow dealing with _7 ... but this eventually
means undoing the match simplification that introduced the code?

Not sure if that helps though.

Richard.

> --
> Marc Glisse


Re: std:vec for classes with constructor?

2020-08-06 Thread Richard Biener via Gcc-patches
On Thu, Aug 6, 2020 at 9:24 PM Jonathan Wakely  wrote:
>
> On 06/08/20 19:58 +0200, Aldy Hernandez wrote:
> >
> >
> >On 8/6/20 6:30 PM, Jonathan Wakely wrote:
> >>On 06/08/20 16:17 +0200, Aldy Hernandez wrote:
> >>>
> >>>
> >>>On 8/6/20 12:48 PM, Jonathan Wakely wrote:
> On 06/08/20 12:31 +0200, Richard Biener wrote:
> >On Thu, Aug 6, 2020 at 12:19 PM Jonathan Wakely
> > wrote:
> >>
> >>On 06/08/20 06:16 +0100, Richard Sandiford wrote:
> >>>Andrew MacLeod via Gcc-patches  writes:
> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
> >On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor
> >> wrote:
> >>On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
> >>[...]
> >>
> >>>* ipa-cp changes from vec to std::vec.
> >>>
> >>>We are using std::vec to ensure constructors are run, which they
> >>>aren't in our internal vec<> implementation.  Although we usually
> >>>steer away from using std::vec because of interactions with our GC
> >>>system, ipcp_param_lattices is only live within the pass and
> >>>allocated with calloc.
> >>Ummm... I did not object but I will save the URL of this message in
> >>the archive so that I can wave it in front of anyone complaining why
> >>I don't use our internal vec's in IPA data structures.
> >>
> >>But it actually raises a broader question: was this supposed to be an
> >>exception, allowed only not to complicate the irange patch further, or
> >>will this be a generally accepted thing to do when someone wants to
> >>have a vector of constructed items?
> >It's definitely not what we want. You have to find another
> >solution to this problem.
> >
> >Richard.
> >
> 
> Why isn't it what we want?
> 
> This is a small vector local to the pass so it doesn't interfere with
> our PITA GTY.
> The class is pretty straightforward, but we do need a constructor to
> initialize the pointer and the max-size field.  There is no allocation
> done per element, so a small number of elements have a couple of fields
> initialized per element. We'd have to loop to do that anyway.
> 
> GCC's vec<> does not provide the ability to run a constructor, std::vec
> does.
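A minimal standalone sketch of the difference being discussed, under stated assumptions: `range_like` is a hypothetical element type (not the real ipa-cp lattice or irange class) whose constructor must set a capacity field, so all-zero bytes are not a valid state. `std::vector<T> v (n)` runs `T`'s default constructor for every element. Storage that is only zero-filled (here via `calloc`, standing in for a grow-and-clear operation that does not construct elements) holds no constructed objects until each one is built in place, for example with placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical element type (not the real ipa-cp or irange class):
// all-zero bytes are not a valid state, because the constructor must
// record a non-zero capacity.
struct range_like
{
  unsigned m_num;   // sub-ranges currently in use
  unsigned m_max;   // capacity; only the constructor sets it
  int m_bounds[8];  // inline storage, no per-element heap allocation
  range_like () : m_num (0), m_max (4) {}
};

// std::vector default-constructs every element it creates.
bool vector_runs_ctors (std::size_t count)
{
  std::vector<range_like> v (count);
  for (const range_like &r : v)
    if (r.m_num != 0 || r.m_max != 4)
      return false;
  return true;
}

// Zero-filled storage holds no constructed objects; each element has to
// be constructed in place (placement new) before it is used.
bool placement_new_after_clear (std::size_t count)
{
  void *raw = std::calloc (count, sizeof (range_like));
  if (raw == nullptr)
    return false;
  range_like *elts = static_cast<range_like *> (raw);
  for (std::size_t i = 0; i < count; ++i)
    new (&elts[i]) range_like ();
  bool ok = true;
  for (std::size_t i = 0; i < count; ++i)
    ok = ok && elts[i].m_max == 4;
  // range_like has a trivial destructor, so freeing the memory suffices.
  std::free (raw);
  return ok;
}
```

Both `vector_runs_ctors (8)` and `placement_new_after_clear (8)` should return true. Without the placement-new loop, the calloc'd memory would still contain only zero bytes, with no constructor having set `m_max`.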
> >>>
> >>>I realise you weren't claiming otherwise, but: that could
> >>>be fixed :-)
> >>
> >>It really should be.
> >>
> >>Artificial limitations like that are just a booby trap for the unwary.
> >
> >It's probably also historic because we couldn't even implement
> >the case of re-allocation correctly without std::move, could we?
> 
> I don't see why not. std::vector worked fine without std::move, it's
> just more efficient with std::move, and can be used with a wider set
> of element types.
> 
> When reallocating you can just copy each element to the new storage
> and destroy the old element. If your type is non-copyable then you
> need std::move, but I don't think the types I see used with vec<> are
> non-copyable. Most of them are trivially-copyable.
> 
> I think the benefit of std::move to GCC is likely to be permitting
> cheap copies to be made where previously they were banned for
> performance reasons, but not because those copies were impossible.
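The copy-then-destroy reallocation described above can be sketched without any `std::move`. `toy_vec` and `counted` are hypothetical illustrations, not GCC's `vec<>` or the libstdc++ `std::vector` implementation, and exception safety while copying is ignored for brevity. Because `counted` declares its own copy constructor and destructor, it gets no implicitly declared move constructor, so every relocation below really is a copy followed by a destroy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical minimal growable array that constructs its elements.
template <typename T>
struct toy_vec
{
  T *m_data;
  std::size_t m_len;
  std::size_t m_cap;

  toy_vec () : m_data (nullptr), m_len (0), m_cap (0) {}
  toy_vec (const toy_vec &) = delete;  // owns a raw buffer: no copies
  toy_vec &operator= (const toy_vec &) = delete;
  ~toy_vec ()
  {
    for (std::size_t i = 0; i < m_len; ++i)
      m_data[i].~T ();
    std::free (m_data);
  }

  // Reallocate by copying: only T's copy constructor is required.
  void reserve (std::size_t cap)
  {
    if (cap <= m_cap)
      return;
    T *fresh = static_cast<T *> (std::malloc (cap * sizeof (T)));
    if (fresh == nullptr)
      throw std::bad_alloc ();
    for (std::size_t i = 0; i < m_len; ++i)
      {
        new (&fresh[i]) T (m_data[i]);  // copy into the new storage
        m_data[i].~T ();                // then destroy the old element
      }
    std::free (m_data);
    m_data = fresh;
    m_cap = cap;
  }

  void push_back (const T &x)
  {
    T tmp (x);  // copy first in case x lives in the buffer being replaced
    if (m_len == m_cap)
      reserve (m_cap ? 2 * m_cap : 1);
    new (&m_data[m_len]) T (tmp);
    ++m_len;
  }
};

// Hypothetical element type that counts live objects.
struct counted
{
  static int live;
  int value;
  explicit counted (int v) : value (v) { ++live; }
  counted (const counted &o) : value (o.value) { ++live; }
  ~counted () { --live; }
};
int counted::live = 0;
```

Pushing ten `counted` values leaves `counted::live` at 10 despite several reallocations, and it drops back to 0 once the `toy_vec` is destroyed, which is the bookkeeping the copy-based scheme relies on.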
> >>>
> >>>For the record, neither value_range nor int_range require any
> >>>allocations.  The sub-range storage resides in the stack or
> >>>wherever it was defined.  However, it is definitely not a POD.
> >>>
> >>>Digging deeper, I notice that the original issue that caused us to
> >>>use std::vector was not in-place new but the safe_grow_cleared.
> >>>The original code had:
> >>>
> auto_vec known_value_ranges;
> ...
> ...
> if (!vr.undefined_p () && !vr.varying_p ())
>  {
>    if (!known_value_ranges.length ())
>  known_value_ranges.safe_grow_cleared (count);
>  known_value_ranges[i] = vr;
>  }
> >>>
> >>>I would've gladly kept the auto_vec, had I been able to call
> >>>the constructor by doing an in-place new:
> >>>
>    if (!vr.undefined_p () && !vr.varying_p ())
>  {
>    if (!known_value_ranges.length ())
> - known_value_ranges.safe_grow_cleared (count);
> + {
> +   known_value_ranges.safe_grow_cleared (count);
> +   for (int i = 0; i < count; ++i)
> + 
