Re: [PATCH] Allow entry point markers without debug support in accelerator compiler

2019-12-09 Thread Richard Biener
On Fri, Dec 6, 2019 at 6:27 PM Kwok Cheung Yeung  wrote:
>
> Hello
>
> A number of the libgomp tests running with AMD GCN offloading fail with
> the following internal compiler error:
>
> during RTL pass: final
> /scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/libgomp/testsuite/libgomp.fortran/examples-4/async_target-1.f90: In function 'pipedf_._omp_fn.2':
> /scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/libgomp/testsuite/libgomp.fortran/examples-4/async_target-1.f90:49: internal compiler error: in dwarf2out_inline_entry, at dwarf2out.c:27682
> 0x626210 dwarf2out_inline_entry
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/dwarf2out.c:27682
> 0x9692c4 final_scan_insn_1
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/final.c:2435
> 0x969f4b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/final.c:3152
> 0x96a214 final_1
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/final.c:2020
> 0x96ac7f rest_of_handle_final
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/final.c:4658
> 0x96ac7f execute
> 	/scratch/ci-cs/amdtest/upstream-offload/src/gcc-mainline/gcc/final.c:4736
>
> The ICE is due to an assert for debug_inline_points firing. The test
> case does not explicitly set this (using -ginline-points), so it is
> auto-detected.
>
> The problem arises because the host compiler enables it by default, but
> the offload compiler disables it. The host compiler generates the Gimple
> debug statements for inlined functions, then streams them out using the
> LTO mechanism. The accelerator compiler streams them in, encounters the
> unexpected debug statements and ICEs due to a failed assertion.
>
> It is possible to make GCN enable support for inline points by default, but
> I think it would be better to fix it for the general case where the host
> and accelerator disagree?  This patch makes dwarf2out_inline_entry ignore
> the inline entry note if debug_inline_points is not set while the
> compiler is in LTO mode.  This effectively relaxes the assertion
> condition by allowing an exception for LTO.
>
> Bootstrapped on x86_64, and tested using GCN as an offload accelerator.
> Okay for trunk?

The stream-in code has

  /* If we're recompiling LTO objects with debug stmts but
     we're not supposed to have debug stmts, remove them now.
     We can't remove them earlier because this would cause uid
     mismatches in fixups, but we can do it at this point, as
     long as debug stmts don't require fixups.
     Similarly remove all IFN_*SAN_* internal calls   */
  if (!flag_wpa)
    {
      if (is_gimple_debug (stmt)
          && (gimple_debug_nonbind_marker_p (stmt)
              ? !MAY_HAVE_DEBUG_MARKER_STMTS
              : !MAY_HAVE_DEBUG_BIND_STMTS))
        remove = true;
      /* In case the linemap overflows locations can be dropped
         to zero.  Thus do not keep nonsensical inline entry markers
         we'd later ICE on.  */
      tree block;
      if (gimple_debug_inline_entry_p (stmt)
          && (block = gimple_block (stmt))
          && !inlined_function_outer_scope_p (block))
        remove = true;

so can you please instead amend that or figure why it doesn't work?
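A toy model may help make the suggested amendment concrete. The sketch below is self-contained C++ with hypothetical names (`stmt_kind`, `receiver_config`, `remove_on_stream_in`); it is not GCC's LTO stream-in code, and the extra clause keyed on `debug_inline_points` is only one assumed way the quoted condition could be extended. It also assumes the offload compiler accepts debug marker statements in general but has inline points disabled, which appears to be the situation in the report.

```cpp
#include <cassert>

// Hypothetical, simplified statement kinds; not GCC's gimple codes.
enum stmt_kind
{
  STMT_ASSIGN,             // an ordinary statement
  STMT_DEBUG_BIND,         // a debug bind statement
  STMT_DEBUG_BEGIN_STMT,   // a non-bind marker: statement frontier
  STMT_DEBUG_INLINE_ENTRY  // a non-bind marker: inline entry point
};

// Hypothetical settings of the compiler that streams the body in (here the
// offload compiler); they may differ from the host compiler's settings.
struct receiver_config
{
  bool may_have_debug_bind_stmts;
  bool may_have_debug_marker_stmts;
  bool debug_inline_points;
};

static bool
is_debug (stmt_kind k)
{
  return k != STMT_ASSIGN;
}

static bool
is_nonbind_marker (stmt_kind k)
{
  return k == STMT_DEBUG_BEGIN_STMT || k == STMT_DEBUG_INLINE_ENTRY;
}

// Same shape as the quoted stream-in condition, plus an assumed extra clause
// that drops inline entry markers when inline points are disabled.
static bool
remove_on_stream_in (stmt_kind k, const receiver_config &cfg)
{
  if (is_debug (k)
      && (is_nonbind_marker (k)
          ? !cfg.may_have_debug_marker_stmts
          : !cfg.may_have_debug_bind_stmts))
    return true;
  if (k == STMT_DEBUG_INLINE_ENTRY && !cfg.debug_inline_points)
    return true;
  return false;
}

// The assumed offload setting: markers enabled, inline points disabled.
static void
check_offload_scenario ()
{
  const receiver_config offload = { true, true, false };
  assert (!remove_on_stream_in (STMT_ASSIGN, offload));
  assert (!remove_on_stream_in (STMT_DEBUG_BEGIN_STMT, offload));
  assert (remove_on_stream_in (STMT_DEBUG_INLINE_ENTRY, offload));
}
```

Under these assumed settings, statement-frontier markers survive stream-in while inline entry markers are dropped before anything like `dwarf2out_inline_entry` could assert on them.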

Thanks,
Richard.

> Kwok
>
>
> It is possible for the host compiler to emit entry point markers in the
> GIMPLE code while the accelerator compiler has them disabled, causing an
> assertion to fire when the accelerator compiler processes them.  This is
> fixed by allowing the markers to be ignored in LTO mode only.
>
> 2019-12-06  Kwok Cheung Yeung  
>
> gcc/
> * dwarf2out.c (dwarf2out_inline_entry): Return early if in LTO and
> debug_inline_points not set.
> ---
>   gcc/dwarf2out.c | 7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 6fb345b..44fa071 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -27679,6 +27679,13 @@ block_within_block_p (tree block, tree outer, bool bothways)
>  static void
>  dwarf2out_inline_entry (tree block)
>  {
> +  /* In an offloading configuration, it is possible for the host toolchain but
> +     not the offload toolchain to support extended debug information for inlined
> +     functions.  In that case, we can just ignore any entry point markers
> +     read from the LTO stream.  */
> +  if (in_lto_p && !debug_inline_points)
> +    return;
> +
>    gcc_assert (debug_inline_points);
>
>    /* If we can't represent it, don't bother.  */
> --
> 2.8.1
>


Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-09 Thread Richard Biener
On Fri, Dec 6, 2019 at 11:31 PM Jeff Law  wrote:
>
> On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > This patch kit introduces a static analysis pass for GCC that can diagnose
> > > various kinds of problems in C code at compile-time (e.g. double-free,
> > > use-after-free, etc).
> >
> > I haven't looked at the analyzer bits in any detail yet so I have
> > just some very high-level questions.  But first let me say I'm
> > excited to see this project! :)
> >
> > It looks like the analyzer detects some of the same problems as
> > some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
> > and some that I have been working toward implementing (invalid uses
> > of freed pointers such as returning them from functions or passing
> > them to others), and others still that I have been thinking about
> > as possible future projects (e.g., detecting uses of uninitialized
> > arrays in string functions).
> >
> > What are your thoughts about this sort of overlap?  Do you expect
> > us to enhance both sets of warnings in parallel, or do you see us
> > moving away from issuing warnings in the middle-end and toward
> > making the analyzer the main source of these kinds of diagnostics?
> > Maybe even replace some of the problematic middle-end warnings
> > with the analyzer?  What (if anything) should we do about warnings
> > issued for the same problems by both the middle-end and the analyzer?
> > Or about false negatives?  E.g., a bug detected by the middle-end
> > but not the analyzer or vice versa.
> >
> > What do you see as the biggest pros and cons of either approach?
> > (Middle-end vs analyzer.)  What limitations is the analyzer
> > approach inherently subject to that the middle-end warnings aren't,
> > and vice versa?
> >
> > How do we prioritize between the two approaches (e.g., choose
> > where to add a new warning)?
> Given the cost of David's analyzer, I would tend to prioritize the more
> localized analysis.  Also note that because of the compile-time
> complexities we end up pruning paths from the search space and lose
> precision when we have to merge nodes.   These issues are inherent in
> the depth of analysis we're looking to do.
>
> So the way to think about things is David's work is a slower, deeper
> analysis than what we usually do.  So things that are reasonable
> candidates for -Wall would need to use the traditional mechanisms.
> Things that require deeper analysis would be done in David's framework.
>
> Also note that part of David's work is to bring a fairly generic engine
> that we can expand with different domain specific analyzers.  It just
> happens to be the case that the first place he's focused is on
> double-free and use-after-free.  But (IMHO) the gem is really the generic
> engine.

So if the "generic engine" lives inside GCC can the actual analyzers
be plugins on a (stable) "analyzer plugin API"?

Does the analyzer work with LTO at whole-program scope btw?

Richard.

> jeff
>


Re: [patch, fortran] Introduce -finline-pack

2019-12-09 Thread Richard Biener
On Sat, Dec 7, 2019 at 2:53 PM Thomas Koenig  wrote:
>
> Hello world,
>
> the attached patch introduces a new option, -finline-pack.
>
> Since the fix for PR88821, we now do inline packing of
> arguments (if required) via the scalarizer, instead of
> using _gfortran_internal_[un]pack when optimizing, but
> not when optimizing for size.
>
> This introduces (really) large performance gains for some test
> cases because now the middle end can see through the packing.
> On the other hand, for test cases which do a _lot_ of this,
> compile time and code size can increase by quite a bit.
>
> So, this patch introduces an option to control that behavior,
> so that people can turn it off on a by-file basis if they
> don't want it.
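As a rough illustration of what packing an argument involves, here is a conceptual C++ sketch; it is not libgfortran's `_gfortran_internal_pack`/`_gfortran_internal_unpack`, and all helper names are hypothetical. A strided section is copied into a contiguous temporary, the callee works on the temporary, and the results are copied back. Expanding this copying inline at each call site is what lets the middle end see through it, and also why code that passes many such sections can grow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conceptual sketch only.  Copy n elements spaced `stride` apart into a
// contiguous temporary.  (A real implementation can skip the copy when the
// section is already contiguous.)
static std::vector<double>
pack (const double *base, std::size_t n, std::size_t stride)
{
  std::vector<double> tmp (n);
  for (std::size_t i = 0; i < n; ++i)
    tmp[i] = base[i * stride];
  return tmp;
}

// Copy the (possibly modified) temporary back into the strided section.
static void
unpack (double *base, std::size_t n, std::size_t stride,
        const std::vector<double> &tmp)
{
  for (std::size_t i = 0; i < n; ++i)
    base[i * stride] = tmp[i];
}

// Stand-in for a procedure whose dummy argument must be contiguous.
static void
scale_contiguous (double *x, std::size_t n, double factor)
{
  for (std::size_t i = 0; i < n; ++i)
    x[i] *= factor;
}

// Roughly what passing a strided section to such a procedure involves.
static void
call_with_section (double *a, std::size_t n, std::size_t stride,
                   double factor)
{
  assert (stride > 0);
  std::vector<double> tmp = pack (a, n, stride);
  scale_contiguous (tmp.data (), n, factor);
  unpack (a, n, stride, tmp);
}
```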
>
> OK for trunk?

Just as a suggestion, maybe we'd want to extend this
to other intrinsics in future so a -fno-inline-intrinsic=pack[,...]
is more future proof? (I'd inline all intrinsics by default thus
only provide the negative form).  You can avoid the extra
option parsing complexity by only literally adding
-fno-inline-intrinsic=pack for now.

Richard.

> Regards
>
> Thomas
>
> Introduce -finline-pack.
>
> 2019-12-07  Thomas Koenig  
>
> PR middle-end/91512
> PR fortran/92738
> * invoke.texi: Document -finline-pack.
> * lang.opt: Add -finline-pack.
> * options.c (gfc_post_options): Handle -finline-pack.
> * trans-array.c (gfc_conv_array_parameter): Use flag_inline_pack
> instead of checking for optimize and optimize_size.
>
> 2019-12-07  Thomas Koenig  
>
> PR middle-end/91512
> PR fortran/92738
> * gfortran.dg/inline_pack_25.f90: New test.


Re: [PATCH 09/49] gimple const-correctness fixes

2019-12-09 Thread Richard Biener
On Sat, Dec 7, 2019 at 3:28 PM Jeff Law  wrote:
>
> On Fri, 2019-11-15 at 20:22 -0500, David Malcolm wrote:
> > This patch converts various "gimple *" to "const gimple *" and similar
> > fixes for gimple subclasses, adding is_a_helper for gimple subclasses
> > to support the const form of as_a, and adding a few "const" overloads
> > of accessors.
> >
> > This is enough to make pp_gimple_stmt_1's stmt const.
> >
> > gcc/ChangeLog:
> >   * gimple-predict.h (gimple_predict_predictor): Make "gs" param
> >   const.
> >   (gimple_predict_outcome): Likewise.
> >   * gimple-pretty-print.c (do_niy): Likewise.
> >   (dump_unary_rhs): Likewise.
> >   (dump_binary_rhs): Likewise.
> >   (dump_ternary_rhs): Likewise.
> >   (dump_gimple_assign): Likewise.
> >   (dump_gimple_return): Likewise.
> >   (dump_gimple_call_args): Likewise.
> >   (pp_points_to_solution): Make "pt" param const.
> >   (dump_gimple_call): Make "gs" param const.
> >   (dump_gimple_switch): Likewise.
> >   (dump_gimple_cond): Likewise.
> >   (dump_gimple_label): Likewise.
> >   (dump_gimple_goto): Likewise.
> >   (dump_gimple_bind): Likewise.
> >   (dump_gimple_try): Likewise.
> >   (dump_gimple_catch): Likewise.
> >   (dump_gimple_eh_filter): Likewise.
> >   (dump_gimple_eh_must_not_throw): Likewise.
> >   (dump_gimple_eh_else): Likewise.
> >   (dump_gimple_resx): Likewise.
> >   (dump_gimple_eh_dispatch): Likewise.
> >   (dump_gimple_debug): Likewise.
> >   (dump_gimple_omp_for): Likewise.
> >   (dump_gimple_omp_continue): Likewise.
> >   (dump_gimple_omp_single): Likewise.
> >   (dump_gimple_omp_taskgroup): Likewise.
> >   (dump_gimple_omp_target): Likewise.
> >   (dump_gimple_omp_teams): Likewise.
> >   (dump_gimple_omp_sections): Likewise.
> >   (dump_gimple_omp_block): Likewise.
> >   (dump_gimple_omp_critical): Likewise.
> >   (dump_gimple_omp_ordered): Likewise.
> >   (dump_gimple_omp_scan): Likewise.
> >   (dump_gimple_omp_return): Likewise.
> >   (dump_gimple_transaction): Likewise.
> >   (dump_gimple_asm): Likewise.
> >   (dump_gimple_phi): Make "phi" param const.
> >   (dump_gimple_omp_parallel): Make "gs" param const.
> >   (dump_gimple_omp_task): Likewise.
> >   (dump_gimple_omp_atomic_load): Likewise.
> >   (dump_gimple_omp_atomic_store): Likewise.
> >   (dump_gimple_mem_ops): Likewise.
> >   (pp_gimple_stmt_1): Likewise.  Add "const" to the various as_a <>
> >   casts throughout.
> >   * gimple-pretty-print.h (gimple_stmt_1): Make gimple * param const.
> >   * gimple.h (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (is_a_helper ::test): New.
> >   (gimple_call_tail_p): Make param const.
> >   (gimple_call_return_slot_opt_p): Likewise.
> >   (gimple_call_va_arg_pack_p): Likewise.
> >   (gimple_call_use_set): Add const overload.
> >   (gimple_call_clobber_set): Likewise.
> >   (gimple_has_lhs): Make param const.
> >   (gimple_bind_body): Likewise.
> >   (gimple_catch_handler): Likewise.
> >   (gimple_eh_filter_failure): Likewise.
> >   (gimple_eh_must_not_throw_fndecl): Likewise.
> >   (gimple_eh_else_n_body): Likewise.
> >   (gimple_eh_else_e_body): Likewise.
> >   (gimple_try_eval): Likewise.
> >   (gimple_try_cleanup): Likewise.
> >   (gimple_phi_arg): Add const overload.
> >   (gimple_phi_arg_def): Make param const.
> >   (gimple_phi_arg_edge): Likewise.
> >   (gimple_phi_arg_location): Likewise.
> >   (gimple_phi_arg_has_location): Likewise.
> >   (gimple_debug_bind_get_var): Likewise.
> >   (gimple_debug_bind_get_value): Likewise.
> >   (gimple_debug_source_bind_get_var): Likewise.
> >   (gimple_debug_source_bind_get_value): Likewise.
> >   (gimple_omp_body): Likewise.
> >   (gimple_omp_for_collapse): Likewise.
> >   (gimple_omp_for_pre_body): Likewise.
> >   (gimple_transaction_body): Likewise.
> >   * tree-eh.c (lookup_stmt_eh_lp_fn): Make param "t" const.
> >   (lookup_stmt_eh_lp): Likewise.
> >   * tree-eh.h (lookup_stmt_eh_lp_fn): Make param const.
> >   (lookup_stmt_eh_lp): Likewise.
> >   * tree-ssa-alias.h (pt_solution_empty_p): Make param const.
> >   * tree-ssa-structalias.c (pt_solution_empty_p): Likewise.
> OK.  And more generally, adding "const" to generally improve our
> const-correctness shouldn't require a review cycle.  Just do the normal
> testing and install 'em.
>
> Similarly for dropping unnecessary "struct", "class" or "union" like we
> see in the tree-ssa-structalias.c changes. I'm terrible about adding
> extraneous struct/class keywords when they're not needed.  An

Re: [PATCH 09/49] gimple const-correctness fixes

2019-12-09 Thread Richard Biener
On Fri, Dec 6, 2019 at 6:47 PM David Malcolm  wrote:
>
> On Fri, 2019-12-06 at 11:52 +0100, Richard Biener wrote:
> > On Sat, Nov 16, 2019 at 2:20 AM David Malcolm  wrote:
> > > This patch converts various "gimple *" to "const gimple *" and similar
> > > fixes for gimple subclasses, adding is_a_helper for gimple subclasses
> > > to support the const form of as_a, and adding a few "const" overloads
> > > of accessors.
> > >
> > > This is enough to make pp_gimple_stmt_1's stmt const.
> >
> > Hum.  Can't the const is-a variants be somehow magically implemented
> > generally?  If something is a T then it is also a const T, no?  I guess
> > if something is a const T it isn't a T though?
> >
> > Richard.
>
> It is something of a wart to need new is_a_helper<>::test functions for
> the const variants.
>
> I tried poking at is-a.h to do this in a more generic way, but I'm not
> sure it's doable without an invasive change:  is_a_helper's T is
> already a pointer, so AIUI, if we apply "const" to it, we're making the
> pointer const, rather than the thing being pointed to.
>
> Maybe someone else can see a way?

I guess is_a_helper should always be specialized for CV-unqualified
_non_-pointer types and is_a can then transparently provide
indirect and qualified variants?

We can also do some pointer strippers like

template <typename T>
struct strip_pointer { typedef T type; typedef const T const_type; };
template <typename T>
struct strip_pointer<T *> { typedef T type; typedef const T * const_type; };

and use

template <typename T>
struct is_a_helper<typename strip_pointer<T>::const_type> : is_a_helper<T> {};

?  Just my quick idea as a non-C++ literate.
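The direction suggested above, specializing `is_a_helper` only for CV-unqualified class types and letting `is_a` strip the pointer and `const`, can be sketched in standalone C++. All names here (`stmt`, `call_stmt`, `strip_ptr_cv`) are hypothetical miniatures, not GCC's `is-a.h`; the target type is still written as a pointer type, as in GCC's `as_a <gcall *> (...)` calls. Because `const T *` is a more specialized pattern than `T *`, `strip_ptr_cv<const call_stmt *>` selects the second partial specialization and yields `call_stmt`, so one helper serves both queries. Adopting this in GCC would presumably mean re-keying the existing pointer-typed `is_a_helper` specializations, which may be the invasive part.

```cpp
#include <cassert>

// Hypothetical miniature hierarchy standing in for gimple and gcall.
enum stmt_code { CODE_ASSIGN, CODE_CALL };
struct stmt { stmt_code code; };
struct call_stmt : stmt {};

// Helpers are specialized once, on the CV-unqualified class type.
template <typename T> struct is_a_helper;

template <>
struct is_a_helper<call_stmt>
{
  static bool test (const stmt *s) { return s->code == CODE_CALL; }
};

// Map "T *" and "const T *" back to the unqualified class type T.
template <typename T> struct strip_ptr_cv;
template <typename T> struct strip_ptr_cv<T *> { typedef T type; };
template <typename T> struct strip_ptr_cv<const T *> { typedef T type; };

template <typename T, typename U>
inline bool
is_a (U *p)
{
  return is_a_helper<typename strip_ptr_cv<T>::type>::test (p);
}

// static_cast keeps const-correctness: as_a<call_stmt *> applied to a
// "const stmt *" fails to compile instead of silently dropping const.
template <typename T, typename U>
inline T
as_a (U *p)
{
  assert (is_a<T> (p));
  return static_cast<T> (p);
}
```

With this arrangement no separate const `is_a_helper` specializations are needed, and asking for a non-const result from a const pointer is still rejected at compile time.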

> In the meantime, is this patch OK?  (I use "const gimple *" etc
> throughout the analyzer, to emphasize that I'm not changing them)
>
> Thanks
> Dave
>
>
> > > gcc/ChangeLog:
> > > * gimple-predict.h (gimple_predict_predictor): Make "gs" param const.
> > > (gimple_predict_outcome): Likewise.
> > > * gimple-pretty-print.c (do_niy): Likewise.
> > > (dump_unary_rhs): Likewise.
> > > (dump_binary_rhs): Likewise.
> > > (dump_ternary_rhs): Likewise.
> > > (dump_gimple_assign): Likewise.
> > > (dump_gimple_return): Likewise.
> > > (dump_gimple_call_args): Likewise.
> > > (pp_points_to_solution): Make "pt" param const.
> > > (dump_gimple_call): Make "gs" param const.
> > > (dump_gimple_switch): Likewise.
> > > (dump_gimple_cond): Likewise.
> > > (dump_gimple_label): Likewise.
> > > (dump_gimple_goto): Likewise.
> > > (dump_gimple_bind): Likewise.
> > > (dump_gimple_try): Likewise.
> > > (dump_gimple_catch): Likewise.
> > > (dump_gimple_eh_filter): Likewise.
> > > (dump_gimple_eh_must_not_throw): Likewise.
> > > (dump_gimple_eh_else): Likewise.
> > > (dump_gimple_resx): Likewise.
> > > (dump_gimple_eh_dispatch): Likewise.
> > > (dump_gimple_debug): Likewise.
> > > (dump_gimple_omp_for): Likewise.
> > > (dump_gimple_omp_continue): Likewise.
> > > (dump_gimple_omp_single): Likewise.
> > > (dump_gimple_omp_taskgroup): Likewise.
> > > (dump_gimple_omp_target): Likewise.
> > > (dump_gimple_omp_teams): Likewise.
> > > (dump_gimple_omp_sections): Likewise.
> > > (dump_gimple_omp_block): Likewise.
> > > (dump_gimple_omp_critical): Likewise.
> > > (dump_gimple_omp_ordered): Likewise.
> > > (dump_gimple_omp_scan): Likewise.
> > > (dump_gimple_omp_return): Likewise.
> > > (dump_gimple_transaction): Likewise.
> > > (dump_gimple_asm): Likewise.
> > > (dump_gimple_phi): Make "phi" param const.
> > > (dump_gimple_omp_parallel): Make "gs" param const.
> > > (dump_gimple_omp_task): Likewise.
> > > (dump_gimple_omp_atomic_load): Likewise.
> > > (dump_gimple_omp_atomic_store): Likewise.
> > > (dump_gimple_mem_ops): Likewise.
> > > (pp_gimple_stmt_1): Likewise.  Add "const" to the various as_a <>
> > > casts throughout.
> > > * gimple-pretty-print.h (gimple_stmt_1): Make gimple * param const.
> > > * gimple.h (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (is_a_helper ::test): New.
> > > (gimple_call_tail_p): Make param const.
> > > (gimple_call_return_slot_opt_p): Likewise.
> > > (gimple_call_va_arg_pack_p): Likewise.
> > > (gimple_call_use_set): Add const overload.
> > > (gimple_call_clobber_set): Likewise.
> > > (gimple_has_lhs): Make param const.
> > > (gimple_bind_body): Likewise.
> > > (gimple_catch_handler): Likewise.
> > 

[PATCH] Fix _GLIBCXX_DEBUG tests static_assert lines

2019-12-09 Thread François Dumont

When applying:

2019-11-26  François Dumont  

    * include/debug/array (array<>::fill): Add C++20 constexpr.
    (array<>::swap): Likewise.

I forgot to adapt some line numbers.

Committed as trivial.

François

diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
index 3c60a435491..a4b199c23e3 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
@@ -27,6 +27,6 @@ int n1 = std::get<1>(a);
 int n2 = std::get<1>(std::move(a));
 int n3 = std::get<1>(ca);
 
-// { dg-error "static assertion failed" "" { target *-*-* } 294 }
-// { dg-error "static assertion failed" "" { target *-*-* } 303 }
-// { dg-error "static assertion failed" "" { target *-*-* } 311 }
+// { dg-error "static assertion failed" "" { target *-*-* } 295 }
+// { dg-error "static assertion failed" "" { target *-*-* } 304 }
+// { dg-error "static assertion failed" "" { target *-*-* } 312 }
diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
index a6b44eb57fe..59e728c4a37 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
@@ -22,4 +22,4 @@
 
 typedef std::tuple_element<1, std::array>::type type;
 
-// { dg-error "static assertion failed" "" { target *-*-* } 376 }
+// { dg-error "static assertion failed" "" { target *-*-* } 377 }


Re: [PATCH] Come up with constructors of symtab_node, cgraph_node and varpool_node.

2019-12-09 Thread Martin Liška

On 12/7/19 12:49 AM, Bernhard Reutner-Fischer wrote:

On 5 December 2019 16:24:53 CET, "Martin Liška"  wrote:

-/* Allocate new callgraph node.  */
-
-inline cgraph_node *
-symbol_table::allocate_cgraph_symbol (void)
-{
-  cgraph_node *node;
-
-  node = ggc_cleared_alloc ();
-  node->type = SYMTAB_FUNCTION;
-  node->m_summary_id = -1;
-  node->m_uid = cgraph_max_uid++;
-  return node;
-}

Just because I don't see it in the patch, how is cgraph_max_uid++ maintained 
after that patch?


It's moved to function symbol_table::create_empty (void) where we call:
  return new (ggc_alloc ()) cgraph_node (cgraph_max_uid++);
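The pattern in `create_empty`, obtaining raw storage and then running the constructor with placement `new`, can be sketched in plain C++. Here `raw_alloc` is a hypothetical stand-in for the GC allocator (it simply uses `::operator new`), `node` stands in for `cgraph_node`, and `max_uid` for `cgraph_max_uid`; a GC-managed node would not be freed with anything like `destroy`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for cgraph_node: the constructor, not a cleared
// allocation, sets every member's initial value.
struct node
{
  explicit node (int uid) : m_uid (uid), count (0) {}
  int m_uid;
  int count;
};

static int max_uid;  // stand-in for cgraph_max_uid (zero-initialized)

// Stand-in for a GC allocator: returns uninitialized storage.
static void *
raw_alloc (std::size_t size)
{
  return ::operator new (size);
}

// The uid counter is advanced in the one place that creates nodes.
static node *
create_empty ()
{
  return new (raw_alloc (sizeof (node))) node (max_uid++);
}

// Only needed in this sketch, where the storage is not GC-managed.
static void
destroy (node *n)
{
  assert (n != nullptr);
  n->~node ();
  ::operator delete (n);
}
```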

Thanks,
Martin




thanks,





Re: [PATCH] Enhance _GLIBCXX_DEBUG constexpr support

2019-12-09 Thread François Dumont
After running all the tests, I had to fix the implementation of
__check_singular so that it has only one return statement (as a C++11
constexpr function requires).




On 12/2/19 8:31 PM, François Dumont wrote:

Hi

    Here is a patch to enhance constexpr support in _GLIBCXX_DEBUG. I
worked on std::lower_bound/upper_bound to find out whether Debug mode is
well prepared. I'll continue with the other algorithms later.


    I initially hoped that I could rely on the compiler for the
valid_range check. But for lower_bound/upper_bound there is no dedicated
constexpr implementation, as there is for copy/copy_backward, so it
implies changing the existing implementation. So I tried changing the
while (__len > 0) into while (__len != 0) and it gave:


constexpr_valid_range_neg.cc:35:20: error: non-constant condition for static assertion
   35 | static_assert(test1()); // { dg-error "" }
      |   ~^~
In file included from /home/fdt/dev/gcc/install/include/c++/10.0.0/algorithm:61,
                 from constexpr_valid_range_neg.cc:22:
constexpr_valid_range_neg.cc:35:20:   in ‘constexpr’ expansion of ‘test1()’
constexpr_valid_range_neg.cc:30:38:   in ‘constexpr’ expansion of ‘std::lower_bound(ca0.std::array::end(), ca0.std::array::begin(), 6)’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1484:32:   in ‘constexpr’ expansion of ‘std::__lower_bound__gnu_cxx::__ops::_Iter_less_val>(__first, __last, (* & __val), __gnu_cxx::__ops::__iter_less_val())’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1444:7: error: ‘constexpr’ loop iteration count exceeds limit of 262144 (use ‘-fconstexpr-loop-limit=’ to increase the limit)
 1444 |   while (__len != 0)
      |   ^

    It seems OK, but it isn't. The compiler had to loop 262144 times
before giving up with this diagnostic, which does not even make clear
that begin and end have been inverted. That is quite a heavy operation
for a limited result.


    So this patch instead enables the _GLIBCXX_DEBUG valid_range check,
which gives:


constexpr_valid_range_neg.cc:35:20: error: non-constant condition for static assertion
   35 | static_assert(test1()); // { dg-error "" }
      |   ~^~
In file included from /home/fdt/dev/gcc/install/include/c++/10.0.0/debug/debug.h:90,
                 from /home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:69,
                 from /home/fdt/dev/gcc/install/include/c++/10.0.0/algorithm:61,
                 from constexpr_valid_range_neg.cc:22:
constexpr_valid_range_neg.cc:35:20:   in ‘constexpr’ expansion of ‘test1()’
constexpr_valid_range_neg.cc:30:38:   in ‘constexpr’ expansion of ‘std::lower_bound(ca0.std::__debug::array12>::end(), ca0.std::__debug::array::begin(), 6)’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1482:7: error: inline assembly is not a constant expression
 1482 |   __glibcxx_requires_partitioned_lower(__first, __last, __val);
      |   ^~~~
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1482:7: note: only unevaluated inline assembly is allowed in a ‘constexpr’ function in C++2a


This time the error is reported almost immediately. Of course, as you can
see, the asm trick used to produce a non-constant condition is not very nice.


We don't see the asm call's parameter, but showing
__glibcxx_requires_partitioned_lower is not so bad. For this reason the
tests in this patch do not check for any particular failure message.
We'll see how to adapt them once the front end provides the help needed
to generate this compilation error.


Of course, I could get a nicer compilation error by calling
__glibcxx_requires_valid_range directly in the algorithm and no longer
doing it within __glibcxx_requires_partitioned_lower. Do you want all the
macros exposed this way?
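The idea of rejecting an invalid range up front, instead of letting the search loop run into the constexpr iteration limit, can be sketched in C++14 with a call to an ordinary non-constexpr function on the failure path in place of the asm trick. This is a hypothetical illustration, not libstdc++'s `__glibcxx_requires_valid_range`: `range_check_failed` and `checked_lower_bound` are made-up names, and the sketch only handles raw pointers. During constant evaluation, reaching the non-constexpr call makes the expression non-constant, so the diagnostic should point at that call; at run time the same call aborts.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, deliberately not constexpr.  If constant evaluation
// reaches this call, the enclosing expression is not a constant expression
// and compilation fails there; at run time it prints a message and aborts.
inline void
range_check_failed (const char *what)
{
  std::fprintf (stderr, "precondition failed: %s\n", what);
  std::abort ();
}

// lower_bound over raw pointers with an up-front valid-range check, so an
// inverted [first, last) is rejected immediately rather than after the loop
// has run into the constexpr iteration limit.
template <typename T>
constexpr const T *
checked_lower_bound (const T *first, const T *last, const T &val)
{
  if (first > last)
    range_check_failed ("valid range");
  auto len = last - first;
  while (len > 0)
    {
      auto half = len / 2;
      const T *mid = first + half;
      if (*mid < val)
        {
          first = mid + 1;
          len = len - half - 1;
        }
      else
        len = half;
    }
  return first;
}

constexpr int sorted_vals[] = { 1, 2, 4, 6, 6, 9 };

// Valid range: evaluated entirely at compile time.
static_assert (checked_lower_bound (sorted_vals, sorted_vals + 6, 6)
               == sorted_vals + 3, "first element not less than 6");

// Inverted range: uncommenting this is expected to be rejected, because
// evaluation reaches the non-constexpr range_check_failed call.
// static_assert (checked_lower_bound (sorted_vals + 6, sorted_vals, 6)
//                == sorted_vals, "");
```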



    * include/debug/formatter.h (__check_singular): Add C++11 constexpr
    qualification.
    * include/debug/helper_functions.h (__check_singular): Likewise. Skip
    check if constant evaluated.
    (__valid_range): Remove check skip if constant evaluated.
    * include/debug/macros.h [_GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED]
    (_GLIBCXX_DEBUG_VERIFY_COND_AT): Define.
    * testsuite/25_algorithms/lower_bound/constexpr.cc (test): Add checks on
    lower_bound results.
    * testsuite/25_algorithms/upper_bound/constexpr.cc (test): Likewise.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_partitioned_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_partitioned_pred_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_valid_range_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/partitioned_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/partitioned_pred_neg.cc:
    New.
    * testsuite/25_algorithms/upper_bound/debug/
    constexpr_partitioned_neg.cc: New.
    * testsuite/25_algorithms/upper_bound/debug/
    constexpr_partitioned_pred_neg.cc: New.
    * testsuite/25_algorithms/upper_bound/debug/
 

Re: [PATCH] Come up with constructors of symtab_node, cgraph_node and varpool_node.

2019-12-09 Thread Martin Liška

On 12/5/19 5:25 PM, Martin Sebor wrote:

On 12/5/19 8:13 AM, Martin Liška wrote:

On 12/5/19 2:03 PM, Jan Hubicka wrote:

Hi.

As mentioned in the PR, there are classes in cgraph.h that are
not PODs and are initialized with ggc_alloc_cleared, so I'm suggesting
using proper constructors. I added a ggc_new function that can be used
at other locations as well.

I'm attaching an optimized dump file showing what the ctor expansion looks like.
The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


index 9c086fedaef..b7dea696782 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -109,6 +109,23 @@ struct GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),
  public:
    friend class symbol_table;


Hello.



+  /* Default constructor.  */
+  symtab_node (symtab_type t)


I've adjusted this in the patch.



Since it takes an argument the above is not the default ctor.
It can be made one by providing a default argument for t, such
as NULL_TREE if that's valid.

I don't know the details of how these things are defined now
(POD vs non-POD) or how they're used (where POD is expected)
but having been bitten a bunch of times recently by various
GCC container templates making assumptions about their
elements being PODs, it seems worth pointing it out here.
If without this change symtab_node is a POD, adding one
could cause problems.  Same for the other structs.

+    : type (t), resolution (LDPR_UNKNOWN), definition (false), alias (false),
+  transparent_alias (false), weakref (false), cpp_implicit_alias (false),
+  symver (false), analyzed (false), writeonly (false),
+  refuse_visibility_changes (false), externally_visible (false),
+  no_reorder (false), force_output (false), forced_by_abi (false),
+  unique_name (false), implicit_section (false), body_removed (false),
+  used_from_other_partition (false), in_other_partition (false),
+  address_taken (false), in_init_priority_hash (false),
+  need_lto_streaming (false), offloadable (false), ifunc_resolver (false),
+  order (false), next_sharing_asm_name (NULL),
+  previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list (),
+  alias_target (NULL), lto_file_data (NULL), aux (NULL),
+  x_comdat_group (NULL_TREE), x_section (NULL)
+  {}

This is a matter of personal preference but for whatever it's
worth, I like using default zero initialization for zero-
initialized members, e.g.,

   definition (), alias (), ... alias_target (), ...

Besides being less verbose it has the advantage that changing
the type of the member doesn't need to require changing its
initializer.  (An argument for the more verbose form might
be that it makes the initial value immediately clear.)
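Both of Martin Sebor's points can be illustrated with a small hypothetical class (`sym_node` and its members are stand-ins, not the real `symtab_node`). An empty `()` mem-initializer value-initializes its member, which for `bool`, `int` and pointer members means `false`, `0` and a null pointer; a default argument turns a one-argument constructor into the default constructor. The `static_assert`s record the side effect he warns about: once the class has a user-provided default constructor it is no longer trivial, and hence no longer a POD.

```cpp
#include <cassert>
#include <type_traits>

enum node_kind { KIND_SYMBOL, KIND_FUNCTION, KIND_VARIABLE };

// Hypothetical miniature of a node class with many zero-valued members.
struct sym_node
{
  // The default argument lets this constructor serve as the default
  // constructor.  Each empty "()" initializer value-initializes its member:
  // false for bool, 0 for int, a null pointer for pointers.
  sym_node (node_kind t = KIND_SYMBOL)
    : type (t), definition (), alias (), order (), next (), aux ()
  {}

  node_kind type;
  bool definition;
  bool alias;
  int order;
  sym_node *next;
  void *aux;
};

// With a user-provided default constructor the class is no longer trivial,
// and therefore no longer a POD.
static_assert (std::is_default_constructible<sym_node>::value,
               "usable as a default constructor");
static_assert (!std::is_trivial<sym_node>::value, "not trivial");

// True if every member other than type holds its zero value.
static bool
all_clear (const sym_node &n)
{
  return !n.definition && !n.alias && n.order == 0
         && n.next == nullptr && n.aux == nullptr;
}
```

The terse `()` form keeps working if a member's type changes, while the explicit `false` form, which the patch keeps, states each initial value at a glance.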


Well, I do prefer the more explicit form where we will use 'false'
as default value for various flag member variables.

I'm going to install the following patch.
Martin



Martin


>From 15994371176a7e8a2efcfe94e6f6dcf9bb3a80e0 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 5 Dec 2019 11:27:43 +0100
Subject: [PATCH] Come up with constructors of symtab_node, cgraph_node and
 varpool_node.

gcc/ChangeLog:

2019-12-05  Martin Liska  

	PR ipa/92737
	* cgraph.c (symbol_table_test::symbol_table_test): Fix
	coding style.
	* cgraph.h (symtab_node::symtab_node): New constructor.
	(cgraph_node::cgraph_node): Likewise.
	(varpool_node::varpool_node): Likewise.
	(symbol_table::allocate_cgraph_symbol): Use newly
	created constructor.
	(symbol_table::allocate_cgraph_symbol): Remove.
	* cgraphunit.c (symtab_terminator): Likewise.
	* varpool.c (varpool_node::create_empty): Use newly
	created constructor.
---
 gcc/cgraph.c | 12 ++---
 gcc/cgraph.h | 63 ++--
 gcc/cgraphunit.c |  2 +-
 gcc/varpool.c|  6 ++---
 4 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 5c72e832a23..5c7a03d61be 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -283,14 +283,8 @@ symbol_table::initialize (void)
 cgraph_node *
 symbol_table::create_empty (void)
 {
-  cgraph_node *node = allocate_cgraph_symbol ();
-
-  node->type = SYMTAB_FUNCTION;
-  node->frequency = NODE_FREQUENCY_NORMAL;
-  node->count_materialization_scale = REG_BR_PROB_BASE;
   cgraph_count++;
-
-  return node;
+  return new (ggc_alloc ()) cgraph_node (cgraph_max_uid++);
 }
 
 /* Register HOOK to be called with DATA on each removed edge.  */
@@ -510,8 +504,6 @@ cgraph_node::create (tree decl)
 
   node->decl = decl;
 
-  node->count = profile_count::uninitialized ();
-
   if ((flag_openacc || flag_openmp)
   && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
 {
@@ -3750,7 +3742,7 @@ symbol_table_test::symbol_table_test ()
 {
   gcc_assert (saved_symtab == NULL);
   saved_symtab = symtab;
-  symtab = new (ggc_alloc  ()) symbol_table ();
+  symtab = new (ggc_alloc ()) symbol_table ();
 }
 
 /* Destructor.  Restore the old value of symtab.  */
diff --git a/gcc/cg

Re: [PATCH] Canonicalize fancy ways of expressing blend operation into COND_EXPR (PR tree-optimization/92834)

2019-12-09 Thread Richard Biener
On Fri, 6 Dec 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following patch canonicalizes fancy ways of writing cond ? A : B
> into COND_EXPR, which is what we expect people to write and are thus able
> to optimize better.  If in some case we wouldn't optimize it better,
> the right way would be to improve the COND_EXPR handling, as that is what
> people use in the wild most of the time.
> E.g. on the testcase in the patch on x86_64-linux with -O2, the difference
> is that test used to be 519 bytes long and now is 223; with -O2
> -march=skylake it used to be the same 519 bytes long and now is 275 bytes
> (in that case it uses the SSE4.1+ min/max).
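The identity behind the new patterns can be checked with plain integer arithmetic: `-(c < d)` is either 0 or -1 (all bits set), so the `&` either discards `a - b` or keeps it intact, leaving `a` or `b` respectively. The sketch below is a standalone C++ check with hypothetical function names; it uses `int` over a small range so that `a - b` cannot overflow, and it does not model the type and precision conditions that the patterns test before firing.

```cpp
#include <cassert>

// The subtraction form from the match.pd comment.  (c < d) is 0 or 1, so
// -(c < d) is 0 or -1 (all bits set): the AND either clears (a - b) or keeps
// it, giving a - 0 == a or a - (a - b) == b.
static int
fancy_sub_form (int a, int b, int c, int d)
{
  return a - ((a - b) & -(c < d));
}

// The addition form: a + 0 == a or a + (b - a) == b.
static int
fancy_add_form (int a, int b, int c, int d)
{
  return a + ((b - a) & -(c < d));
}

// The canonical form both simplifications produce: (c < d) ? b : a.
static int
cond_form (int a, int b, int c, int d)
{
  return (c < d) ? b : a;
}

// Compare the forms over [lo, hi], a range small enough that a - b and
// b - a cannot overflow.
static void
check_range (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      {
        // Comparing a with b itself gives max (a, b), as in the comment.
        assert (fancy_sub_form (a, b, a, b) == cond_form (a, b, a, b));
        assert (fancy_add_form (a, b, a, b) == cond_form (a, b, a, b));
        // The comparison operands need not be a and b.
        assert (fancy_sub_form (a, b, b, 0) == cond_form (a, b, b, 0));
        assert (fancy_add_form (a, b, 0, a) == cond_form (a, b, 0, a));
      }
}
```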
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-12-06  Jakub Jelinek  
> 
>   PR tree-optimization/92834
>   * match.pd (A - ((A - B) & -(C cmp D)) -> (C cmp D) ? B : A,
>   A + ((B - A) & -(C cmp D)) -> (C cmp D) ? B : A): New simplifications.
> 
>   * gcc.dg/tree-ssa/pr92834.c: New test.
> 
> --- gcc/match.pd.jj   2019-12-06 14:07:26.877749065 +0100
> +++ gcc/match.pd  2019-12-06 15:06:08.042953309 +0100
> @@ -2697,6 +2697,31 @@ (define_operator_list COND_TERNARY
>(cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> >(comb (cmp @0 @2) (cmp @1 @2))))
>  
> +/* Undo fancy way of writing max/min or other ?: expressions,
> +   like a - ((a - b) & -(a < b)), in this case into (a < b) ? b : a.
> +   People normally use ?: and that is what we actually try to optimize.  */
> +(for cmp (simple_comparison)
> + (simplify
> +  (minus @0 (bit_and:c (minus @0 @1)
> +                       (convert? (negate@4 (convert? (cmp@5 @2 @3))))))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@4))
> +   && TREE_CODE (TREE_TYPE (@4)) != BOOLEAN_TYPE
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@5))
> +   && (TYPE_PRECISION (TREE_TYPE (@4)) >= TYPE_PRECISION (type)
> +       || !TYPE_UNSIGNED (TREE_TYPE (@4))))
> +   (cond (cmp @2 @3) @1 @0)))
> + (simplify
> +  (plus:c @0 (bit_and:c (minus @1 @0)
> +                        (convert? (negate@4 (convert? (cmp@5 @2 @3))))))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@4))
> +   && TREE_CODE (TREE_TYPE (@4)) != BOOLEAN_TYPE
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@5))
> +   && (TYPE_PRECISION (TREE_TYPE (@4)) >= TYPE_PRECISION (type)
> +       || !TYPE_UNSIGNED (TREE_TYPE (@4))))
> +   (cond (cmp @2 @3) @1 @0))))
> +
>  /* Simplifications of shift and rotates.  */
>  
>  (for rotate (lrotate rrotate)
> --- gcc/testsuite/gcc.dg/tree-ssa/pr92834.c.jj	2019-12-06 15:24:56.817353747 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr92834.c	2019-12-06 15:24:08.921100518 +0100
> @@ -0,0 +1,122 @@
> +/* PR tree-optimization/92834 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR <" 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR <" 8 "optimized" } } */
> +
> +static inline unsigned
> +umax1 (unsigned a, unsigned b)
> +{
> +  return a - ((a - b) & -(a < b));
> +}
> +
> +static inline unsigned
> +umin1 (unsigned a, unsigned b)
> +{
> +  return a - ((a - b) & -(a > b));
> +}
> +
> +static inline int
> +smax1 (int a, int b)
> +{
> +  return a - ((a - b) & -(a < b));
> +}
> +
> +static inline int
> +smin1 (int a, int b)
> +{
> +  return a - ((a - b) & -(a > b));
> +}
> +
> +static inline unsigned long long
> +umax2 (unsigned long long a, unsigned long long b)
> +{
> +  return a - ((a - b) & -(a <= b));
> +}
> +
> +static inline unsigned long long
> +umin2 (unsigned long long a, unsigned long long b)
> +{
> +  return a - ((a - b) & -(a >= b));
> +}
> +
> +static inline long long
> +smax2 (long long a, long long b)
> +{
> +  return a - ((a - b) & -(a <= b));
> +}
> +
> +static inline long long
> +smin2 (long long a, long long b)
> +{
> +  return a - ((a - b) & -(a >= b));
> +}
> +
> +static inline unsigned
> +umax3 (unsigned a, unsigned b)
> +{
> +  return a + ((b - a) & -(a < b));
> +}
> +
> +static inline unsigned
> +umin3 (unsigned a, unsigned b)
> +{
> +  return a + ((b - a) & -(a > b));
> +}
> +
> +static inline int
> +smax3 (int a, int b)
> +{
> +  return a + ((b - a) & -(a < b));
> +}
> +
> +static inline int
> +smin3 (int a, int b)
> +{
> +  return a + ((b - a) & -(a > b));
> +}
> +
> +static inline unsigned long long
> +umax4 (unsigned long long a, unsigned long long b)
> +{
> +  return a + ((b - a) & -(a <= b));
> +}
> +
> +static inline unsigned long long
> +umin4 (unsigned long long a, unsigned long long b)
> +{
> +  return a + ((b - a) & -(a >= b));
> +}
> +
> +static inline long long
> +smax4 (long long a, long long b)
> +{
> +  return a + ((b - a) & -(a <= b));
> +}
> +
> +static inline long long
> +smin4 (long long a, long long b)
> +{
> +  return a + ((b - a) & -(a >= b));
> +}
> +
> +void
> +test (unsigned *x, int *y, unsigned long long *z, long long *w)
> +{
> +  x[2] = umax1 (x[0], 

Re: copy/copy_backward/fill/fill_n/equal rework

2019-12-09 Thread François Dumont
After completing this work and running more tests I realized that the 
declaration of algos was still not ideal.


So here is another version in which the algos are not re-declared in
stl_deque.h; instead, I include stl_algobase.h in deque.tcc. The problem
was spotted by another patch I am going to submit afterward.


Note that this patch applies on top of this one:

https://gcc.gnu.org/ml/libstdc++/2019-10/msg00072.html

François

On 9/25/19 6:44 AM, François Dumont wrote:

Ping?

On 9/9/19 8:34 PM, François Dumont wrote:

Hi

    This patch improves the stl_algobase.h implementations of
copy/copy_backward/fill/fill_n/equal. The improvements are:


- activation of algo specialization for __gnu_debug::_Safe_iterator 
(w/o _GLIBCXX_DEBUG mode)


- activation of algo specialization for _Deque_iterator even if mixed 
with another kind of iterator.


- activation of the __copy_move_a2 algo specializations for iterators
other than pointers. For example, this code:


std::vector<char> v { 'a', 'b',  };

ostreambuf_iterator<char> out(std::cout);

std::copy(v.begin(), v.end(), out);

currently (without this patch) does not call the specialization
__copy_move_a2(const char*, const char*, ostreambuf_iterator<>).
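The _Deque_iterator item above boils down to copying one contiguous segment at a time, so that each inner copy sees plain pointers. A minimal sketch of that idea, assuming a hypothetical `Segmented` container built from fixed-size blocks and a hypothetical `segmented_copy` helper (this is not libstdc++'s `_Deque_iterator` or `__copy_move_a2` code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical segmented container: elements live in fixed-size blocks,
// loosely like a deque's map of buffers.  Illustration only.
struct Segmented
{
  static constexpr std::size_t block_size = 4;
  std::vector<std::vector<int>> blocks; // each block holds up to block_size ints
  std::size_t size = 0;                 // total number of elements

  void push_back(int v)
  {
    if (size % block_size == 0)
      blocks.emplace_back();            // current block is full (or none yet)
    blocks.back().push_back(v);
    ++size;
  }
};

// Copy every element of s to out, one contiguous block at a time.  Each
// inner std::copy gets a pointer range, which can let the library take its
// pointer fast path (e.g. memmove for trivially copyable types).
template<typename OutIt>
OutIt segmented_copy(const Segmented& s, OutIt out)
{
  for (const std::vector<int>& block : s.blocks)
    out = std::copy(block.data(), block.data() + block.size(), out);
  return out;
}
```

The outer loop runs once per block rather than once per element, which is where the deque-to-vector speedups in the figures below plausibly come from.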


It also fixes a _GLIBCXX_DEBUG issue where the __niter_base
specialization was wrongly removing the _Safe_iterator<> layer. The
testsuite/25_algorithms/copy/debug/1_neg.cc test case was failing on
a debug assertion because, _after_ the copy, we were trying to
increment the vector iterator past the end. Of course the problem is
the _after_: Debug mode should detect this _before_ it takes place,
which it now does.


Note that std::fill_n is now making use of std::fill for some 
optimizations dealing with random access iterators.
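A rough sketch of the fill_n-to-fill forwarding described above, using a hypothetical `my_fill_n` and C++17 `if constexpr` for the dispatch (libstdc++ uses its own internal helpers, not this code):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical fill_n that hands random-access ranges to std::fill, so any
// fill optimizations (e.g. memset for byte types) are shared with fill_n.
template<typename OutIt, typename Size, typename T>
OutIt my_fill_n(OutIt first, Size n, const T& value)
{
  if (n <= 0)
    return first;

  using Cat = typename std::iterator_traits<OutIt>::iterator_category;
  if constexpr (std::is_base_of<std::random_access_iterator_tag, Cat>::value)
    {
      // The end position is computable in O(1), so reuse std::fill.
      OutIt last = first + n;
      std::fill(first, last, value);
      return last;
    }
  else
    {
      // Forward, bidirectional and output iterators: plain loop.
      for (; n > 0; --n, ++first)
        *first = value;
      return first;
    }
}
```

With a std::vector the first branch is taken; with a std::list the loop is used, and `first + n` is never instantiated.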


The performance results are very good:

Before:

copy_backward_deque_iterators.cc  deque 2 deque              1084r   1084u  0s  0mem  0pf
copy_backward_deque_iterators.cc  deque 2 vector             3373r   3372u  0s  0mem  0pf
copy_backward_deque_iterators.cc  vector 2 deque             3316r   3316u  0s  0mem  0pf
copy_backward_deque_iterators.cc  int deque 2 char vector    3610r   3609u  0s  0mem  0pf
copy_backward_deque_iterators.cc  char vector 2 int deque    3552r   3552u  0s  0mem  0pf
copy_backward_deque_iterators.cc  deque 2 list              10528r  10528u  0s  0mem  0pf
copy_backward_deque_iterators.cc  list 2 deque               2161r   2162u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 deque               752r    751u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 vector             3300r   3299u  0s  0mem  0pf
copy_deque_iterators.cc           vector 2 deque             3144r   3140u  0s  0mem  0pf
copy_deque_iterators.cc           int deque 2 char vector    3340r   3338u  1s  0mem  0pf
copy_deque_iterators.cc           char vector 2 int deque    3132r   3132u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 list              10013r  10012u  0s  0mem  0pf
copy_deque_iterators.cc           list 2 deque               2274r   2275u  0s  0mem  0pf
equal_deque_iterators.cc          deque vs deque             8676r   8675u  0s  0mem  0pf
equal_deque_iterators.cc          deque vs vector            5870r   5870u  0s  0mem  0pf
equal_deque_iterators.cc          vector vs deque            3163r   3163u  0s  0mem  0pf
equal_deque_iterators.cc          int deque vs char vector   5845r   5845u  0s  0mem  0pf
equal_deque_iterators.cc          char vector vs int deque   3307r   3307u  0s  0mem  0pf


After:

copy_backward_deque_iterators.cc  deque 2 deque               697r    697u  0s  0mem  0pf
copy_backward_deque_iterators.cc  deque 2 vector              219r    218u  0s  0mem  0pf
copy_backward_deque_iterators.cc  vector 2 deque              453r    453u  0s  0mem  0pf
copy_backward_deque_iterators.cc  int deque 2 char vector    1914r   1915u  0s  0mem  0pf
copy_backward_deque_iterators.cc  char vector 2 int deque    2112r   2111u  0s  0mem  0pf
copy_backward_deque_iterators.cc  deque 2 list               7770r   7771u  0s  0mem  0pf
copy_backward_deque_iterators.cc  list 2 deque               2194r   2193u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 deque               505r    504u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 vector              221r    221u  0s  0mem  0pf
copy_deque_iterators.cc           vector 2 deque              398r    397u  0s  0mem  0pf
copy_deque_iterators.cc           int deque 2 char vector    1770r   1767u  0s  0mem  0pf
copy_deque_iterators.cc           char vector 2 int deque    1995r   1993u  0s  0mem  0pf
copy_deque_iterators.cc           deque 2 list               7650r   7641u  2s  0mem  0pf
copy_deque_iterators.cc           list 2 deque               2270r   2270u  0s  0mem  0pf
equal_deque_iterators.cc          deque vs deque              769r    768u  0s  0mem  0pf
equal_deque_iterators.cc     deque vs vector  

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-09 Thread Christophe Lyon
On Fri, 6 Dec 2019 at 19:47, Wilco Dijkstra  wrote:
>
> Hi Christophe,
>
> > In practice, how do you activate it when running the GCC testsuite? Do
> > you plan to send a GCC patch to enable this assembler flag, or do you
> > locally enable that option by default in your binutils?
>
> The warning is off by default so there is no need to do anything in the
> testsuite, you just need a fixed binutils.
>
Don't we want to fix GCC to stop generating the offending sequence?

> > FWIW, I've also noticed that the whole libstdc++ testsuite is somehow
> > "deactivated" (I have 0 pass, 0 fail etc...)  after your GCC patch
> > when configuring GCC
> > --target arm-none-linux-gnueabihf
> > --with-mode thumb
> > --with-cpu cortex-a57
> > --with-fpu crypto-neon-fp-armv8
>
> Well it's possible a configure check failed somehow.
>
Yes, it fails when compiling testsuite_abi.cc, resulting in tcl errors.


> Cheers,
> Wilco


Re: introduce -fcallgraph-info option

2019-12-09 Thread Richard Biener
On Tue, 3 Dec 2019, Alexandre Oliva wrote:

> On Nov 14, 2019, Alexandre Oliva  wrote:
> 
> > In order to address this, I propose we add an internal option (not for
> > the driver), -dumpbase-ext, that names the extension to be discarded
> > from dumpbase to form aux output names.
> 
> Here's a WIP patch that implements much of the desired semantics.
> 
> I'm still struggling a bit with -gdwarf-split and -save-temps; -dumpbase
> and multiple inputs, -dumpdir as a prefix, and -flto + -dump*.
> 
> -gdwarf-split uses %b to strip debug info into the .dwo file, so it
> lands in the same location as the .o, rather than in a named -dumpdir as
> specified in the .o debug info skeleton.  I'm thinking of arranging for
> -dump* flags to affect %b and %B, just like -save-temps does.  I've
> reviewed all uses of %b and %B, and it looks like this would enable us
> to fix the .dwo naming mismatch without significant complication.
> 
> Which brings us to the next issue.  This would cause -dumpdir to
> override the -save-temps location.  This is arguably an improvement.  It
> might conflict with -save-temps=cwd, however.
> 
> I'm considering rejecting command lines that specify both an explicit
> -dumpdir and -save-temps=cwd, and in the absence of an explicit
> -dumpdir, arranging for -save-temps=cwd or -save-temps=obj to override
> what would otherwise be the default -dumpdir.
> 
> Or, for the sake of simplifying and bringing more sanity to the logic of
> naming extra output files, we could just discontinue -save-temps=*, and
> require -dumpdir ./ along with plain -save-temps to get the effects of
> -save-temps=cwd.

Making -save-temps=cwd essentially a shortcut for -save-temps -dumpdir ./
is fine, I guess (we usually do not start rejecting previously accepted
options).  Automatically splitting this via the 'Alias' mechanism
isn't (yet) supported, I think (splitting one option into two others).

> 
> When compiling multiple inputs with a single -dumpbase, the current
> implementation arranges for each compilation to take an adjusted
> -dumpbase appending - to the given dumpbase, minus extension.  An
> alternative would be to reject such compilations, just as we reject
> multiple compilations with a single object file named as output.  That
> feels excessive for -dumpbase, however.  OTOH, adjusting -dumpbase only
> when there are multiple inputs causes different behavior comparing:
> 
>   gcc -c foo.c -dumpbase foobar && gcc -c bar.c -dumpbase foobar
> 
> and
> 
>   gcc -c foo.c bar.c -dumpbase foobar
> 
> The latter will name outputs after foobar-foo and foobar-bar,
> respectively, whereas the former will overwrite outputs named foobar
> when compiling bar.c.  Under the proposal to modify %b according to
> -dump*, even object files would be named after an explicit -dumpbase,
> when -o is not explicitly specified.

I think rejecting option combinations that do not make much sense
or would introduce inconsistencies like this is better than trying
to invent creative things second-guessing what the user meant.
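To make the inconsistency concrete, here is a hypothetical model (not GCC driver code) of the naming rule described above: a single input keeps -dumpbase as-is, while multiple inputs each get "<dumpbase>-<input stem>". It assumes the input's directory and extension are dropped; `stem` and `aux_bases` are invented names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model only.  stem("dir/foo.c") == "foo": drop any
// directory part and the last extension.
std::string stem(const std::string& file)
{
  std::string::size_type slash = file.find_last_of('/');
  std::string base = slash == std::string::npos ? file : file.substr(slash + 1);
  std::string::size_type dot = base.find_last_of('.');
  return dot == std::string::npos ? base : base.substr(0, dot);
}

// One input: use -dumpbase unchanged.  Several inputs: "<dumpbase>-<stem>".
std::vector<std::string>
aux_bases(const std::string& dumpbase, const std::vector<std::string>& inputs)
{
  std::vector<std::string> result;
  for (const std::string& in : inputs)
    result.push_back(inputs.size() == 1 ? dumpbase : dumpbase + "-" + stem(in));
  return result;
}
```

Under this model, `aux_bases("foobar", {"foo.c"})` and `aux_bases("foobar", {"bar.c"})` both yield "foobar", so the second of two separate compilations overwrites the first, whereas one invocation with both inputs yields "foobar-foo" and "foobar-bar".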

> Yet another thing I'm not so sure about is -dumpdir as a prefix, e.g.,
> in cases we're compiling multiple files and then linking them together,
> say 'gcc foo.c bar.c -o foobar', the proposal was to name dumps of the
> compilations after foobar-foo and foobar-bar, respectively.
> 
> If we use -dumpdir as a prefix to dump names, as we historically have,
> if it doesn't end with a slash (or any dir separator) then it could be
> used to specify the prefix for multiple outputs, as in the above.  So
> gcc -dumpdir foobar- foo.c bar.c -o foobar *could* be equivalent to the
> above.  I.e., an executable output name would affect the -dumpdir, but
> not the -dumpbase passed to the compiler, whereas -dumpbase would be
> derived from an asm or obj output or from input.
> 
> In the end, they're pasted together one way or the other, the difference
> is the ability to override one or the other.  E.g.,
> 
>   gcc -dumpbase foobar foo.c bar.c -c
> 
> could then be rejected, just as -o foo+bar.o would be, or foobar could
> be appended to the implicit -dumpdir and then override -dumpbase to
> foo.c or bar.c in each compilation, to get foobar-foo.o and foobar-bar.o
> outputs, getting the same as:
> 
>   gcc -dumpdir foobar- foo.c bar.c -c
> 
> and then
> 
>   gcc -dumpdir temp/foobar- foo.c bar.c -o foo+bar -save-temps
> 
> would still create and preserve .o (and .i and .s) named after
> foobar-foo and foobar-bar within temp, rather than foo+bar-foo and
> foo+bar-bar.

Hum.  I didn't notice -dumpdir is just a prefix, and I wouldn't object
to making it erroneous if it doesn't specify an actual directory.
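Because -dumpdir is pasted in front of the dump name as a plain string, whether it behaves as a directory depends only on its trailing separator. A minimal sketch with hypothetical helpers `aux_name` and `dumpdir_names_directory`, assuming '/' is the only separator; the second helper approximates the stricter "actual directory" rule with a trailing-slash check and does not look at the file system:

```cpp
#include <cassert>
#include <string>

// Hypothetical model: the dump name is simply dumpdir + dumpbase + suffix,
// so "foobar-" acts as a name prefix and "temp/" as a directory.
std::string aux_name(const std::string& dumpdir, const std::string& dumpbase,
                     const std::string& suffix)
{
  return dumpdir + dumpbase + suffix;
}

// Approximation of "-dumpdir must specify an actual directory": accept only
// values ending in '/' (POSIX separator assumed; existence is not checked).
bool dumpdir_names_directory(const std::string& dumpdir)
{
  return !dumpdir.empty() && dumpdir.back() == '/';
}
```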

I also note that neither -dumpdir nor -dumpbase are documented
in invoke.texi (as opposed to -auxbase and -auxbase-strip which
are not user-accessible as they are rejected by the driver).
Not sure if all this means we should document the altered behavior
or if we should take it as a hint we can alter behavior at will
(in future) ;)

> Now, t

Re: [committed, amdgcn] Enable QI/HImode vector moves

2019-12-09 Thread Andrew Stubbs

On 06/12/2019 18:21, Richard Sandiford wrote:

Andrew Stubbs  writes:

Hi all,

This patch re-enables the V64QImode and V64HImode for GCN.

GCC does not make these easy to work with because there is (was?) an
assumption that vector registers have no excess bits, and therefore that
it does not need to worry about truncating or extending smaller types
when vectorized. This is not true on GCN, where each vector lane is
always at least 32 bits wide, so we only really implement loading and
storing these vector modes (for now).


FWIW, partial SVE modes work the same way, and this is supposed to be
supported now.  E.g. SVE's VNx4QI is a vector of QIs stored in SI
containers; in other words, it's a VNx4SI in which only the low 8 bits
of each SI are used.

sext_optab, zext_optab and trunc_optab now support vector modes,
so e.g. extendv64qiv64si2 provides sign extension from V64QI to V64SI.
At the moment, in-register truncations like truncv64siv16qi2 have to
be provided as patterns, even though they're no-ops for the target
machine, since they're not no-ops in rtl terms.

And the main snag is rtl, because this isn't the way GCC expects vector
registers to be laid out.  It looks like you already handle that in
TARGET_CAN_CHANGE_MODE_CLASS and TARGET_SECONDARY_RELOAD though.

For SVE, partial vector loads are actually extending loads and partial
vector stores are truncating stores.  Maybe it's the same for amdgcn.
If so, there's a benefit to providing both native movv64qis
and V64QI->V64SI extending loads, i.e. a combine pattern that fuses
movv64qi with a sign_extend or zero_extend.
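A minimal scalar model of the layout described above, assuming 64 lanes of 32-bit containers with only the low 8 bits of each lane meaningful. The names `VReg`, `load_zext`, `load_sext` and `store_trunc` are hypothetical, not GCC or GCN interfaces; the point is that a partial-vector load is naturally an extending load and a partial-vector store a truncating store:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A V64QI value held in a register whose 64 lanes are each 32 bits wide.
constexpr std::size_t lanes = 64;
using VReg = std::array<std::uint32_t, lanes>;

// Zero-extending load: byte i goes into lane i, upper 24 bits cleared.
VReg load_zext(const std::uint8_t* mem)
{
  VReg r{};
  for (std::size_t i = 0; i < lanes; ++i)
    r[i] = mem[i];
  return r;
}

// Sign-extending load: upper 24 bits are copies of bit 7.
VReg load_sext(const std::uint8_t* mem)
{
  VReg r{};
  for (std::size_t i = 0; i < lanes; ++i)
    r[i] = static_cast<std::uint32_t>(
        static_cast<std::int32_t>(static_cast<std::int8_t>(mem[i])));
  return r;
}

// Truncating store: only the low 8 bits of each lane reach memory.
void store_trunc(const VReg& r, std::uint8_t* mem)
{
  for (std::size_t i = 0; i < lanes; ++i)
    mem[i] = static_cast<std::uint8_t>(r[i] & 0xff);
}
```

Either extension followed by a truncating store round-trips the bytes, which is why the in-register truncation is a no-op for the hardware even though it is not one in RTL terms.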

(Probably none of that is news, sorry, just saying in case.)


Thanks, Richard.

That it's now supposed to work is news to me; good news! :-)

GCN has both unsigned and signed subword loads, so we should be able to 
have both independent and combined loads.


How does the middle end know that QImode and HImode should be extended 
before use? Is there a hook for that?


I suppose I need to go read what you changed in the internals documentation.

Andrew


[PATCH] [ARC] Use hardware support for double-precision compare instructions.

2019-12-09 Thread Claudiu Zissulescu
Although FDCMP (the double-precision floating-point compare instruction) has
been added to the compiler, it is not properly used via the cstoredi pattern. Fix it.

OK to apply?
Claudiu

-xx-xx  Claudiu Zissulescu  

* config/arc/arc.md (iterator SDF): Check TARGET_FP_DP_BASE.
(cstoredi4): Use TARGET_HARD_FLOAT.
---
 gcc/config/arc/arc.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index b592f25afce..bd44030b409 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -3749,7 +3749,7 @@ archs4x, archs4xd"
 })
 
 (define_mode_iterator SDF [(SF "TARGET_FP_SP_BASE || TARGET_OPTFPE")
-  (DF "TARGET_OPTFPE")])
+  (DF "TARGET_FP_DP_BASE || TARGET_OPTFPE")])
 
 (define_expand "cstore4"
   [(set (reg:CC CC_REG)
@@ -3759,7 +3759,7 @@ archs4x, archs4xd"
(match_operator:SI 1 "comparison_operator" [(reg CC_REG)
(const_int 0)]))]
 
-  "TARGET_FP_SP_BASE || TARGET_OPTFPE"
+  "TARGET_HARD_FLOAT || TARGET_OPTFPE"
 {
   gcc_assert (XEXP (operands[1], 0) == operands[2]);
   gcc_assert (XEXP (operands[1], 1) == operands[3]);
-- 
2.23.0



Re: [PATCH] Canonicalize fancy ways of expressing blend operation into COND_EXPR (PR tree-optimization/92834)

2019-12-09 Thread Marc Glisse

On Fri, 6 Dec 2019, Jakub Jelinek wrote:


--- gcc/match.pd.jj	2019-12-06 14:07:26.877749065 +0100
+++ gcc/match.pd	2019-12-06 15:06:08.042953309 +0100
@@ -2697,6 +2697,31 @@ (define_operator_list COND_TERNARY
  (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
  (comb (cmp @0 @2) (cmp @1 @2))))

+/* Undo fancy way of writing max/min or other ?: expressions,
+   like a - ((a - b) & -(a < b)), in this case into (a < b) ? b : a.
+   People normally use ?: and that is what we actually try to optimize.  */
+(for cmp (simple_comparison)
+ (simplify
+  (minus @0 (bit_and:c (minus @0 @1)
+                       (convert? (negate@4 (convert? (cmp@5 @2 @3))))))
+  (if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (TREE_TYPE (@4))
+   && TREE_CODE (TREE_TYPE (@4)) != BOOLEAN_TYPE
+   && INTEGRAL_TYPE_P (TREE_TYPE (@5))
+   && (TYPE_PRECISION (TREE_TYPE (@4)) >= TYPE_PRECISION (type)
+       || !TYPE_UNSIGNED (TREE_TYPE (@4))))
+   (cond (cmp @2 @3) @1 @0)))


I was going to suggest
 (cond @5 @1 @0)

and possibly replacing (cmp@5 @2 @3) with truth_valued_p@5, before 
remembering that COND_EXPR embeds the comparison, and that not 
transforming when we don't see the comparison is likely on purpose. Plus, 
if @5 was in a signed 1-bit type, it may look more like -1 than 1 and 
break the transformation (is that forbidden as return type of a 
comparison?).
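The rewrite relies on -(C cmp D) being all-zeros or all-ones, i.e. on the comparison yielding 0 or 1 before negation, which is the property the 1-bit signed concern above is about. A quick sketch that checks the unsigned forms from pr92834.c against the ?: forms they are rewritten to, over a handful of arbitrarily chosen edge-case operands (the signed variants are left out because a - b can overflow there):

```cpp
#include <cassert>
#include <climits>

// Same expressions as umax1, umin1 and umax3 in pr92834.c.  In C++ the
// comparison is a bool that promotes to int 0 or 1, so -(a < b) is 0 or -1,
// which becomes 0 or UINT_MAX when combined with an unsigned operand.
unsigned umax1(unsigned a, unsigned b) { return a - ((a - b) & -(a < b)); }
unsigned umin1(unsigned a, unsigned b) { return a - ((a - b) & -(a > b)); }
unsigned umax3(unsigned a, unsigned b) { return a + ((b - a) & -(a < b)); }

// Compare each "fancy" form with the ?: form the new patterns produce.
bool check_blend_identities()
{
  const unsigned samples[] = { 0u, 1u, 2u, 100u, 0x7fffffffu, 0x80000000u,
                               UINT_MAX - 1, UINT_MAX };
  for (unsigned a : samples)
    for (unsigned b : samples)
      if (umax1(a, b) != ((a < b) ? b : a)
          || umin1(a, b) != ((a > b) ? b : a)
          || umax3(a, b) != ((a < b) ? b : a))
        return false;
  return true;
}
```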


--
Marc Glisse


[PATCH] libstdc++: fix buffer overflow in path::operator+= (PR92853)

2019-12-09 Thread Jonathan Wakely

When concatenating a path ending in a root-directory onto another path,
we added an empty filename to the end of the path twice, but only
reserved space for one. That meant the second write went past the end of
the allocated buffer.

PR libstdc++/92853
* src/c++17/fs_path.cc (filesystem::path::operator+=(const path&)):
Do not process a trailing directory separator twice.
* testsuite/27_io/filesystem/path/concat/92853.cc: New test.
* testsuite/27_io/filesystem/path/concat/path.cc: Test more cases.

Tested powerpc64le-linux, committed to trunk. I'll backport to
gcc-9-branch too.
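A minimal sketch of the failing scenario from the new 92853.cc test, written against the public std::filesystem API and assuming a POSIX path format. On a fixed library the concatenation is a plain string append, the copy succeeds, and the trailing slash shows up as exactly one empty filename component; `concat_dot_slash_ok` is a hypothetical name:

```cpp
#include <cassert>
#include <filesystem>
#include <iterator>
#include <string>

// Scenario from the PR 92853 test (POSIX path format assumed).
bool concat_dot_slash_ok()
{
  using std::filesystem::path;
  path p1{"."};
  const path p2{"/"};
  const std::string prior = p1.native();
  p1 += p2;          // operator+=(const path&) appends the native strings
  path p3{p1};       // copying walks the component list built by +=
  return p1.native() == prior + p2.native()        // "./"
      && p3 == p1
      && std::distance(p3.begin(), p3.end()) == 2; // "." plus an empty filename
}
```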


commit ac0d55229433ddd9609684e56474ed2335dd98d8
Author: Jonathan Wakely 
Date:   Mon Dec 9 09:12:26 2019 +

libstdc++: fix buffer overflow in path::operator+= (PR92853)

When concatenating a path ending in a root-directory onto another path,
we added an empty filename to the end of the path twice, but only
reserved space for one. That meant the second write went past the end of
the allocated buffer.

PR libstdc++/92853
* src/c++17/fs_path.cc (filesystem::path::operator+=(const path&)):
Do not process a trailing directory separator twice.
* testsuite/27_io/filesystem/path/concat/92853.cc: New test.
* testsuite/27_io/filesystem/path/concat/path.cc: Test more cases.

diff --git a/libstdc++-v3/src/c++17/fs_path.cc b/libstdc++-v3/src/c++17/fs_path.cc
index 5fba971fef6..3aefef271fa 100644
--- a/libstdc++-v3/src/c++17/fs_path.cc
+++ b/libstdc++-v3/src/c++17/fs_path.cc
@@ -975,16 +975,7 @@ path::operator+=(const path& p)
}
 
   if (it != last && it->_M_type() == _Type::_Root_dir)
-   {
- ++it;
- if (it == last)
-   {
- // This root-dir becomes a trailing slash
- auto pos = _M_pathname.length() + p._M_pathname.length();
- ::new(output++) _Cmpt({}, _Type::_Filename, pos);
- ++_M_cmpts._M_impl->_M_size;
-   }
-   }
+   ++it;
 
   while (it != last)
{
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/92853.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/92853.cc
new file mode 100644
index 000..62bde05c3ad
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/92853.cc
@@ -0,0 +1,61 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++17 } }
+
+#include 
+#include 
+
+void
+test01()
+{
+  // PR libstdc++/92853
+  using std::filesystem::path;
+  path p1{ "." }, p2{ "/" };
+  p1 += p2;// corrupts heap
+  path p3{ p1 };   // CRASH!
+  __gnu_test::compare_paths( p3, "./" );
+}
+
+void
+test02()
+{
+  using std::filesystem::path;
+  path p1{ "." }, p2{ "" };
+  p1 += p2;
+  path p3{ p1 };
+  __gnu_test::compare_paths( p3, "." );
+}
+
+void
+test03()
+{
+  using std::filesystem::path;
+  path p1{ "./" }, p2{ "/" };
+  p1 += p2;
+  path p3{ p1 };
+  __gnu_test::compare_paths( p3, ".//" );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+  test03();
+}
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/path.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/path.cc
index 9f534e64cb7..16e668c0163 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/path.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/path.cc
@@ -55,6 +55,8 @@ test02()
 path x("//blah/di/blah");
 p += x;
 VERIFY( p.native() == prior_native + x.native() );
+path copy(p);
+compare_paths( copy, p );
   }
 }
 
@@ -66,10 +68,28 @@ test03()
   compare_paths(p, "a//b");
 }
 
+void
+test04()
+{
+  // Concat every test path onto every test path.
+  for (path p : __gnu_test::test_paths)
+  {
+for (path x : __gnu_test::test_paths)
+{
+  auto prior_native = p.native();
+  p += x;
+  VERIFY( p.native() == prior_native + x.native() );
+  path copy(p); // PR libstdc++/92853
+  compare_paths( copy, p );
+}
+  }
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }


Re: [PATCH] libstdc++: fix buffer overflow in path::operator+= (PR92853)

2019-12-09 Thread Jonathan Wakely

On 09/12/19 09:55 +, Jonathan Wakely wrote:

When concatenating a path ending in a root-directory onto another path,
we added an empty filename to the end of the path twice, but only
reserved space for one. That meant the second write went past the end of
the allocated buffer.

PR libstdc++/92853
* src/c++17/fs_path.cc (filesystem::path::operator+=(const path&)):
Do not process a trailing directory separator twice.
* testsuite/27_io/filesystem/path/concat/92853.cc: New test.
* testsuite/27_io/filesystem/path/concat/path.cc: Test more cases.


This adds similar improvements to the test for operator+= for strings,
rather than operator+=(const path&).

Tested x86_64-linux, committed to trunk. I'll backport this to
gcc-9-branch too.
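A small sketch of the property the extended strings.cc test checks, assuming POSIX (where path::string_type is std::string): appending via operator+=(const path&) or via the string overload should give the same native string, and copying the result should be safe. The helper name `concat_overloads_agree` is hypothetical:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Both operator+= overloads should simply append the native string.
bool concat_overloads_agree(const std::string& lhs, const std::string& rhs)
{
  using std::filesystem::path;
  path by_path{lhs};
  path by_string{lhs};
  by_path += path{rhs};   // operator+=(const path&)
  by_string += rhs;       // operator+=(const string_type&)
  path copy{by_string};   // copying re-reads the parsed components
  return by_path.native() == lhs + rhs
      && by_string.native() == lhs + rhs
      && copy == by_path;
}
```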


commit 43a2985ffbff0a4c80da0dcc01162a02f9158cdb
Author: Jonathan Wakely 
Date:   Mon Dec 9 09:57:48 2019 +

libstdc++: Improve testing for path::operator+=(const string&)

* testsuite/27_io/filesystem/path/concat/strings.cc: Test more cases.

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
index 80ce25ef119..f51707b171c 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
@@ -113,10 +113,29 @@ test03()
   p4 += s;
   compare_paths(p4, path(s0+'/'+s));
 }
+
+void
+test04()
+{
+  // Concat every test path onto every test path.
+  for (path p : __gnu_test::test_paths)
+  {
+for (path x : __gnu_test::test_paths)
+{
+  auto prior_native = p.native();
+  p += x.native();
+  VERIFY( p.native() == prior_native + x.native() );
+  path copy(p);
+  compare_paths( copy, p );
+}
+  }
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }


[PATCH] Extend std::lexicographical_compare optimizations

2019-12-09 Thread François Dumont

Following:

https://gcc.gnu.org/ml/libstdc++/2019-12/msg00028.html

I've done the same kind of work on std::lexicographical_compare algo.

I had to make the internal lexicographical_compare functions return int
rather than bool because with bool you can't use a chunk-based approach
unless you double the number of comparisons (once for a < b and again
for b < a).
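A minimal sketch of why an int result helps, with hypothetical helpers `lex_cmp3` and `chunked_lex_cmp` and the first range modelled as a vector of blocks (not libstdc++'s deque internals). A per-block three-way result of 0 means "equal so far, continue", which a bool "less than" answer cannot distinguish from "greater" without a second comparison:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Three-way lexicographical compare of [f1, l1) and [f2, l2):
// negative if the first range is less, positive if greater, 0 if equal.
template<typename It1, typename It2>
int lex_cmp3(It1 f1, It1 l1, It2 f2, It2 l2)
{
  for (; f1 != l1 && f2 != l2; ++f1, ++f2)
    {
      if (*f1 < *f2) return -1;
      if (*f2 < *f1) return 1;
    }
  if (f1 != l1) return 1;   // first range is longer
  if (f2 != l2) return -1;  // second range is longer
  return 0;
}

// Compare block by block against matching slices of `other`; stop at the
// first nonzero result, otherwise carry on with the next block.
int chunked_lex_cmp(const std::vector<std::vector<int>>& blocks,
                    const std::vector<int>& other)
{
  std::size_t pos = 0;
  for (const std::vector<int>& blk : blocks)
    {
      std::size_t n = std::min(blk.size(), other.size() - pos);
      if (int r = lex_cmp3(blk.begin(), blk.begin() + n,
                           other.begin() + pos, other.begin() + pos + n))
        return r;
      if (n < blk.size())
        return 1;  // other ran out inside this block: first range is longer
      pos += n;
    }
  return pos < other.size() ? -1 : 0;
}
```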


    * include/bits/stl_algobase.h
    (__lexicographical_compare_impl): Return int.
    (__lexicographical_compare::__lc): Likewise.
    (__lexicographical_compare_aux1(_II1, _II1, _II2, _II2)): New.
    (__lexicographical_compare_aux1(_Deque_iterator<>, _Deque_iterator<>,
    _II2, _II2)): New.
    (__lexicographical_compare_aux1(_II1, _II1,
    _Deque_iterator<>, _Deque_iterator<>)): New.
    (__lexicographical_compare_aux1(_Deque_iterator<>, _Deque_iterator<>,
    _Deque_iterator<>, _Deque_iterator<>)): New.
    (__lexicographical_compare_aux): Adapt, call latter.
    (__lexicographical_compare_aux(_Safe_iterator<>, _Safe_iterator<>,
    _II2, _II2)): New.
    (__lexicographical_compare_aux(_II1, _II1,
    _Safe_iterator<>, _Safe_iterator<>)): New.
    (__lexicographical_compare_aux(_Safe_iterator<>, _Safe_iterator<>,
    _Safe_iterator<>, _Safe_iterator<>)): New.
    (std::lexicographical_compare): Adapt, call latter.
    * include/bits/deque.tcc (__lex_cmp_dit): New.
    (__lexicographical_compare_aux1): Add definitions.
    * include/debug/safe_iterator.tcc (__lexicographical_compare_aux): New.
    * testsuite/25_algorithms/lexicographical_compare/1.cc (test6, test7):
    New.
    * testsuite/25_algorithms/lexicographical_compare/deque_iterators/1.cc:
    New.

Tested under Linux x86_64 normal and debug modes.

François

diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index ae5366d6208..ef32d2d19dd 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -1210,6 +1210,98 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   return true;
 }
 
+  template
+int
+__lex_cmp_dit(
+	const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr>& __first1,
+	const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr>& __last1,
+	_II __first2, _II __last2)
+{
+  typedef _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr> _Iter;
+  typedef typename _Iter::difference_type difference_type;
+
+  if (__first1._M_node != __last1._M_node)
+	{
+	  difference_type __len = __last2 - __first2;
+	  difference_type __flen
+	= std::min(__len, __first1._M_last - __first1._M_cur);
+	  if (int __ret = std::__lexicographical_compare_aux1(
+	  __first1._M_cur, __first1._M_last, __first2, __first2 + __flen))
+	return __ret;
+
+	  __first2 += __flen;
+	  __len -= __flen;
+	  __flen = std::min(__len, _Iter::_S_buffer_size());
+	  for (typename _Iter::_Map_pointer __node = __first1._M_node + 1;
+	   __node != __last1._M_node;
+	   __first2 += __flen, __len -= __flen,
+	   __flen = std::min(__len, _Iter::_S_buffer_size()),
+	   ++__node)
+	if (int __ret = std::__lexicographical_compare_aux1(
+		  *__node, *__node + _Iter::_S_buffer_size(),
+		  __first2, __first2 + __flen))
+	  return __ret;
+
+	  return std::__lexicographical_compare_aux1(
+	__last1._M_first, __last1._M_cur, __first2, __last2);
+	}
+
+  return std::__lexicographical_compare_aux1(
+	  __first1._M_cur, __last1._M_cur, __first2, __last2);
+}
+
+  template
+typename __gnu_cxx::__enable_if<
+  __is_random_access_iter<_II2>::__value, int>::__type
+__lexicographical_compare_aux1(
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __first1,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __last1,
+		_II2 __first2, _II2 __last2)
+{ return std::__lex_cmp_dit(__first1, __last1, __first2, __last2); }
+
+  template
+int
+__lexicographical_compare_aux1(
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __first1,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __last1,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __first2,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __last2)
+{ return std::__lex_cmp_dit(__first1, __last1, __first2, __last2); }
+
+  template
+typename __gnu_cxx::__enable_if<
+  __is_random_access_iter<_II1>::__value, int>::__type
+__lexicographical_compare_aux1(
+		_II1 __first1, _II1 __last1,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __first2,
+		_GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __last2)
+{
+  typedef _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> _Iter;
+  typedef typename _Iter::difference_type difference_type;
+
+  difference_type __len = __last1 - __first1;
+  while (__len > 0)
+	{
+	  const difference_type __flen = __first2._M_node == __last2._M_node
+	? __last2._M_cur - __first2._M_cur
+	: __first2._M_last - __first2._M_cur;
+	  const difference_type __clen = std::min(__len, __flen);
+	  if (int __ret = std::__lexicographical_

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-12-09 Thread Prathamesh Kulkarni
On Thu, 5 Dec 2019 at 18:17, Richard Biener  wrote:
>
> On Thu, 5 Dec 2019, Prathamesh Kulkarni wrote:
>
> > On Fri, 29 Nov 2019 at 15:41, Richard Biener  wrote:
> > >
> > > On Fri, Nov 22, 2019 at 12:40 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 20 Nov 2019 at 16:54, Richard Biener  wrote:
> > > > >
> > > > > On Wed, 20 Nov 2019, Richard Sandiford wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> > > > > > tag might have put him off.  There's not really anything SVE-specific
> > > > > > here apart from the testcase.
> > > > >
> > > > > Ah.
> > > > >
> > > > > > > 2019-11-19  Prathamesh Kulkarni  
> > > > > > >
> > > > > > > PR tree-optimization/89007
> > > > > > > * tree-vect-patterns.c (vect_recog_average_pattern): If there is no
> > > > > > > target support available, generate code to distribute rshift over plus
> > > > > > > and add one depending upon floor or ceil rounding.
> > > > > > >
> > > > > > > testsuite/
> > > > > > > * gcc.target/aarch64/sve/pr89007.c: New test.
> > > > > > >
> > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > > > new file mode 100644
> > > > > > > index 000..32095c63c61
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > > > @@ -0,0 +1,29 @@
> > > > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> > > > > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > > > > +
> > > > > > > +#define N 1024
> > > > > > > +unsigned char dst[N];
> > > > > > > +unsigned char in1[N];
> > > > > > > +unsigned char in2[N];
> > > > > > > +
> > > > > > > +/*
> > > > > > > +**  foo:
> > > > > > > +** ...
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** add (z[0-9]+\.b), \1, \2
> > > > > > > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > > > > +** add z0.b, \3, \5
> > > > > >
> > > > > > It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> > > > > > (\3, \5|\5, \3).  Same for the other testcase.
> > > > > >
> > > > > > > +** ...
> > > > > > > +*/
> > > > > > > +void
> > > > > > > +foo ()
> > > > > > > +{
> > > > > > > +  for( int x = 0; x < N; x++ )
> > > > > > > +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > > > new file mode 100644
> > > > > > > index 000..cc40f45046b
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > > > @@ -0,0 +1,29 @@
> > > > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> > > > > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > > > > +
> > > > > > > +#define N 1024
> > > > > > > +unsigned char dst[N];
> > > > > > > +unsigned char in1[N];
> > > > > > > +unsigned char in2[N];
> > > > > > > +
> > > > > > > +/*
> > > > > > > +**  foo:
> > > > > > > +** ...
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** add (z[0-9]+\.b), \1, \2
> > > > > > > +** and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > > > > +** add z0.b, \3, \5
> > > > > > > +** ...
> > > > > > > +*/
> > > > > > > +void
> > > > > > > +foo ()
> > > > > > > +{
> > > > > > > +  for( int x = 0; x < N; x++ )
> > > > > > > +dst[x] = (in1[x] + in2[x]) >> 1;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > index 8ebbcd76b64..7025a3b4dc2 100644
> > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info last_stmt_info, tree *type_out)
> > > > > >
> > > > > > >/* Check for target support.  */
> > > > > > >tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> > > > > > > -  if (!new_vectype
> > > > > > > -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> > > > > > > - 
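The ChangeLog above describes distributing the right shift over the addition when the target has no averaging instruction. For unsigned 8-bit lanes, the identity the two new tests expect (two `lsr #1`, an `add`, an `orr` or `and` of the inputs masked with `#0x1`, and a final `add`) can be checked exhaustively in scalar C++. This is a sketch of the arithmetic only, not the vectorizer's gimple; `avg_floor`, `avg_ceil` and `check_all_pairs` are illustrative names.

```cpp
#include <cassert>  // assert, for quick checks of the helpers
#include <cstdint>

// Floor average (a + b) >> 1 without a wider intermediate type: shift each
// input first, then add back the carry lost from the two low bits, which
// is 1 only when both inputs are odd (the "and" variant in pr89007-2.c).
std::uint8_t avg_floor(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>((a >> 1) + (b >> 1) + (a & b & 1));
}

// Ceil average (a + b + 1) >> 1: the correction is 1 when at least one
// input is odd (the "orr" variant in pr89007-1.c).
std::uint8_t avg_ceil(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>((a >> 1) + (b >> 1) + ((a | b) & 1));
}

// Compare both forms against the widened reference for all 65536 pairs.
bool check_all_pairs()
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        const std::uint8_t x = static_cast<std::uint8_t>(a);
        const std::uint8_t y = static_cast<std::uint8_t>(b);
        if (avg_floor(x, y) != (a + b) >> 1)
          return false;
        if (avg_ceil(x, y) != (a + b + 1) >> 1)
          return false;
      }
  return true;
}
```

Because no intermediate value exceeds 255, each lane can stay in its 8-bit element, which is why the tests also check that no `uunpklo`/`uunpkhi` widening appears.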

Re: [committed, amdgcn] Enable QI/HImode vector moves

2019-12-09 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 06/12/2019 18:21, Richard Sandiford wrote:
>> Andrew Stubbs  writes:
>>> Hi all,
>>>
>>> This patch re-enables the V64QImode and V64HImode for GCN.
>>>
>>> GCC does not make these easy to work with because it assumes (assumed?)
>>> that vector registers have no excess bits, and therefore does not need
>>> to worry about truncating or extending smaller types when they are
>>> vectorized. This is not true on GCN, where each vector lane is always at
>>> least 32 bits wide, so we only really implement loading and storing
>>> these vector modes (for now).
>> 
>> FWIW, partial SVE modes work the same way, and this is supposed to be
>> supported now.  E.g. SVE's VNx4QI is a vector of QIs stored in SI
>> containers; in other words, it's a VNx4SI in which only the low 8 bits
>> of each SI are used.
>> 
>> sext_optab, zext_optab and trunc_optab now support vector modes,
>> so e.g. extendv64qiv64si2 provides sign extension from V64QI to V64SI.
>> At the moment, in-register truncations like truncv64siv64qi2 have to
>> be provided as patterns, even though they're no-ops for the target
>> machine, since they're not no-ops in rtl terms.
>> 
>> And the main snag is rtl, because this isn't the way GCC expects vector
>> registers to be laid out.  It looks like you already handle that in
>> TARGET_CAN_CHANGE_MODE_CLASS and TARGET_SECONDARY_RELOAD though.
>> 
>> For SVE, partial vector loads are actually extending loads and partial
>> vector stores are truncating stores.  Maybe it's the same for amdgcn.
>> If so, there's a benefit to providing both native movv64qis
>> and V64QI->V64SI extending loads, i.e. a combine pattern that fuses
>> movv64qi with a sign_extend or zero_extend.
>> 
>> (Probably none of that is news, sorry, just saying in case.)
>
> Thanks, Richard.
>
> That it's now supposed to work is news to me; good news! :-)
>
> GCN has both unsigned and signed subword loads, so we should be able to 
> have both independent and combined loads.

Yeah, SVE supports both signed and unsigned too.  We used unsigned
for "pure" QI moves.

> How does the middle end know that QImode and HImode should be extended 
> before use? Is there a hook for that?

For SVE we just provide .md patterns for all modes and hide any adjustment
there.  This means that we can decide on a case-by-case basis whether to
use the narrow "element" mode or the wide "container" mode.

E.g. rshifts by VNx2QI would still use QImode shifts and just ignore the
extra elements.  But other operations use the container mode instead.  E.g.:

(define_insn "vec_series"
  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w")
(vec_series:SVE_I
  (match_operand: 1 "aarch64_sve_index_operand" "Usi, r, r")
  (match_operand: 2 "aarch64_sve_index_operand" "r, Usi, r")))]
  "TARGET_SVE"
  "@
   index\t%0., #%1, %2
   index\t%0., %1, #%2
   index\t%0., %1, %2"
)

(define_mode_attr Vctype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "s") (VNx2QI "d")
  ...)

So VNx2QI is actually a 64-bit ("d") operation.

For things like addition and logic ops it doesn't matter whether we pick the
element mode or the container mode.

I guess if the wide mode is the only option, the .md patterns for things
like rshifts would need to extend the inputs first.  There's currently
no specific option to force the vectoriser to do this itself.  (In most
cases, you might get that effect if you don't provide QI rshift patterns,
since rshifts are usually still int operations on entry to the vectoriser.
That doesn't sound very robust though.)
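Why addition and logic ops are insensitive to the choice of mode, while right shifts need extended inputs, can be seen in a one-lane scalar model. The `Lane` struct and the helper names below are hypothetical; they model an 8-bit element in a 32-bit container whose upper bits are unspecified, not GCC's rtl or any real .md pattern.

```cpp
#include <cassert>  // assert, for quick checks of the helpers
#include <cstdint>

// One lane of a "partial" vector: an 8-bit element held in a 32-bit
// container whose upper 24 bits are unspecified.
struct Lane
{
  std::uint32_t container;
  std::uint8_t element() const { return static_cast<std::uint8_t>(container); }
};

// Addition in the container width: the low 8 bits of a 32-bit sum depend
// only on the low 8 bits of the inputs, so upper-bit garbage is harmless.
Lane add_container(Lane a, Lane b) { return Lane{a.container + b.container}; }

// Logical right shift in the container width without extending first:
// garbage from bit 8 upwards shifts down into the element.
Lane lsr_naive(Lane a, unsigned n) { return Lane{a.container >> n}; }

// Zero-extend the element first (the unsigned case), then shift.
Lane lsr_extended(Lane a, unsigned n)
{
  const std::uint32_t extended = a.element();
  return Lane{extended >> n};
}
```

With a container of `0xFFFFFF80`, the naive shift by one yields an element of `0xC0` rather than the correct `0x40`, while the sum of two such lanes still has the right low byte.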

Thanks,
Richard


Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-09 Thread Sudakshina Das
Hi Jeff

On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for the following example, we realized
>> that even though the vectorizable_shift function was distinguishing vector
>> shifted by vector from vector shifted by scalar, when modeling the cost
>> it would always add the cost of building a vector constant, despite not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether we
>> need to build a vector for the second operand or not. This reduces the
>> prologue cost, as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>  Richard Sandiford  
>>
>>  * tree-vect-stmts.c (vectorizable_shift): Condition ndts for
>>  vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.

Thanks :) Committed as r279114.

Sudi

> 
> jeff
> 
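The ChangeLog's fix (making the `ndts` passed to `vect_model_simple_cost` depend on `scalar_shift_arg`) can be illustrated with a toy cost model. Everything here (the `Operand` enum, the cost constants, `model_simple_cost` and `shift_cost`) is hypothetical and far simpler than GCC's real cost hooks. It only shows why counting one operand instead of two drops the prologue cost of building a vector of shift amounts.

```cpp
#include <cassert>  // assert, for quick checks of the helpers

// Hypothetical cost constants; real targets supply their own costs.
constexpr int kVecStmtCost = 1;
constexpr int kVecConstructCost = 2;

// Loosely mirrors the vectorizer's classification of operand definitions.
enum class Operand { Internal, Constant, External };

struct Cost { int prologue; int body; };

// Toy "simple" cost: each constant or loop-invariant operand among the
// first ndts operands needs a vector built once in the loop prologue.
Cost model_simple_cost(int ncopies, const Operand *dt, int ndts)
{
  Cost c{0, ncopies * kVecStmtCost};
  for (int i = 0; i < ndts; ++i)
    if (dt[i] == Operand::Constant || dt[i] == Operand::External)
      c.prologue += kVecConstructCost;
  return c;
}

// For a shift, only look at the shift amount when it has to become a
// vector (vector shifted by vector); a scalar amount is used directly.
Cost shift_cost(int ncopies, const Operand dt[2], bool scalar_shift_arg)
{
  return model_simple_cost(ncopies, dt, scalar_shift_arg ? 1 : 2);
}
```

For `x << 3` with a constant shift amount, the toy model now charges no prologue cost when the amount can stay scalar.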



[PATCH] Use OPTION_MASK_ISA2_$target_[SET, UNSET, ] to indicate those for x_ix86_isa_flags2

2019-12-09 Thread Hongtao Liu
Hi uros:
  This patch renames OPTION_MASK_ISA_$target_[SET,UNSET, ]
to OPTION_MASK_ISA2_$target_[SET,UNSET, ] for those targets that set
x_ix86_isa_flags2.
  The target list is below:
-
static struct ix86_target_opts isa2_opts[] =
{
  { "-mcx16",               OPTION_MASK_ISA2_CX16 },
  { "-mvaes",               OPTION_MASK_ISA2_VAES },
  { "-mrdpid",              OPTION_MASK_ISA2_RDPID },
  { "-mpconfig",            OPTION_MASK_ISA2_PCONFIG },
  { "-mwbnoinvd",           OPTION_MASK_ISA2_WBNOINVD },
  { "-mavx512vp2intersect", OPTION_MASK_ISA2_AVX512VP2INTERSECT },
  { "-msgx",                OPTION_MASK_ISA2_SGX },
  { "-mavx5124vnniw",       OPTION_MASK_ISA2_AVX5124VNNIW },
  { "-mavx5124fmaps",       OPTION_MASK_ISA2_AVX5124FMAPS },
  { "-mhle",                OPTION_MASK_ISA2_HLE },
  { "-mmovbe",              OPTION_MASK_ISA2_MOVBE },
  { "-mclzero",             OPTION_MASK_ISA2_CLZERO },
  { "-mmwaitx",             OPTION_MASK_ISA2_MWAITX },
  { "-mmovdir64b",          OPTION_MASK_ISA2_MOVDIR64B },
  { "-mwaitpkg",            OPTION_MASK_ISA2_WAITPKG },
  { "-mcldemote",           OPTION_MASK_ISA2_CLDEMOTE },
  { "-mptwrite",            OPTION_MASK_ISA2_PTWRITE },
  { "-mavx512bf16",         OPTION_MASK_ISA2_AVX512BF16 },
  { "-menqcmd",             OPTION_MASK_ISA2_ENQCMD }
};
--

  Bootstrap and regression test on i386/x86-64 backend is ok.
  Ok for trunk?

Changelog
* gcc/common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX5124FMAPS_SET): Rename to
OPTION_MASK_ISA2_AVX5124FMAPS_SET.
(OPTION_MASK_ISA_AVX5124VNNIW_SET, OPTION_MASK_ISA_AVX512BF16_SET,
OPTION_MASK_ISA_AVX512VP2INTERSECT_SET,
OPTION_MASK_ISA_PCONFIG_SET, OPTION_MASK_ISA_WBNOINVD_SET,
OPTION_MASK_ISA_SGX_SET, OPTION_MASK_ISA_CX16_SET,
OPTION_MASK_ISA_MOVBE_SET, OPTION_MASK_ISA_PTWRITE_SET,
OPTION_MASK_ISA_MWAITX_SET, OPTION_MASK_ISA_CLZERO_SET,
OPTION_MASK_ISA_RDPID_SET, OPTION_MASK_ISA_VAES_SET,
OPTION_MASK_ISA_MOVDIR64B_SET, OPTION_MASK_ISA_WAITPKG_SET,
OPTION_MASK_ISA_CLDEMOTE_SET, OPTION_MASK_ISA_ENQCMD_SET,
OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
OPTION_MASK_ISA_AVX5124VNNIW_UNSET,
OPTION_MASK_ISA_AVX512BF16_UNSET,
OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET,
OPTION_MASK_ISA_PCONFIG_UNSET, OPTION_MASK_ISA_WBNOINVD_UNSET,
OPTION_MASK_ISA_SGX_UNSET, OPTION_MASK_ISA_CX16_UNSET,
OPTION_MASK_ISA_MOVBE_UNSET, OPTION_MASK_ISA_PTWRITE_UNSET,
OPTION_MASK_ISA_MWAITX_UNSET, OPTION_MASK_ISA_CLZERO_UNSET,
OPTION_MASK_ISA_RDPID_UNSET, OPTION_MASK_ISA_VAES_UNSET,
OPTION_MASK_ISA_MOVDIR64B_UNSET, OPTION_MASK_ISA_WAITPKG_UNSET,
OPTION_MASK_ISA_CLDEMOTE_UNSET, OPTION_MASK_ISA_ENQCMD_UNSET,
OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
OPTION_MASK_ISA_PCONFIG, OPTION_MASK_ISA_WBNOINVD,
OPTION_MASK_ISA_SGX, OPTION_MASK_ISA_CX16, OPTION_MASK_ISA_MOVBE,
OPTION_MASK_ISA_PTWRITE, OPTION_MASK_ISA_MWAITX,
OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_RDPID,
OPTION_MASK_ISA_VAES, OPTION_MASK_ISA_MOVDIR64B,
OPTION_MASK_ISA_WAITPKG, OPTION_MASK_ISA_CLDEMOTE,
OPTION_MASK_ISA_ENQCMD): Ditto.

* gcc/config/i386/i386-builtin.def
(OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
OPTION_MASK_ISA_WBNOINVD, OPTION_MASK_ISA_PTWRITE,
OPTION_MASK_ISA_RDPID, OPTION_MASK_ISA_VAES,
OPTION_MASK_ISA_MOVDIR64B, OPTION_MASK_ISA_ENQCMD): Ditto.
* gcc/config/i386/i386-builtins.c (OPTION_MASK_ISA_MWAITX,
OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_WAITPKG,
OPTION_MASK_ISA_CLDEMOTE, OPTION_MASK_ISA_WBNOINVD): Ditto.
* gcc/config/i386/i386-c.c
(OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
OPTION_MASK_ISA_PCONFIG, OPTION_MASK_ISA_WBNOINVD,
OPTION_MASK_ISA_SGX, OPTION_MASK_ISA_CX16, OPTION_MASK_ISA_MOVBE,
OPTION_MASK_ISA_PTWRITE, OPTION_MASK_ISA_MWAITX,
OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_RDPID,
OPTION_MASK_ISA_VAES, OPTION_MASK_ISA_MOVDIR64B,
OPTION_MASK_ISA_WAITPKG, OPTION_MASK_ISA_CLDEMOTE,
OPTION_MASK_ISA_ENQCMD): Ditto.
* gcc/config/i386/i386-options.c: Ditto.
* gcc/config/i386/i386.opt: Ditto.
* gcc/config/i386/i386.h (TARGET_ISA_AVX5124FMAPS,
TARGET_ISA_AVX5124VNNIW, TARGET_ISA_AVX512BF16,
TARGET_ISA_AVX512VP2INTERSECT, TARGET_ISA_PCONFIG,
TARGET_ISA_WBNOINVD, TARGET_ISA_SGX, TARGET_ISA_CX16,
TARGET_ISA_MOVBE, TARGET_ISA_PTWRITE, TARGET_ISA_MWAITX,
TARGET_ISA_CLZERO, TARGET_ISA_RDPID, TARGET_ISA_VAES,
TARGET_ISA_MOVDIR64B, TARGET_ISA_WAITPKG, TARGET_ISA_CLDEMOTE,
TARGET_ISA_ENQCMD): Ditto.

-- 
BR,
Hongtao
From 42d004c271228a4d6a1075cf4b77ae3282388e69 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Mon, 9 Dec 2019 13:37:46 +0800
Subject: [PATCH] Use OPTION_MASK_ISA2_*_SET, OPTION_MASK_ISA2_*

Re: [PATCH] Canonicalize fancy ways of expressing blend operation into COND_EXPR (PR tree-optimization/92834)

2019-12-09 Thread Jakub Jelinek
On Mon, Dec 09, 2019 at 10:54:34AM +0100, Marc Glisse wrote:
> On Fri, 6 Dec 2019, Jakub Jelinek wrote:
> 
> > --- gcc/match.pd.jj 2019-12-06 14:07:26.877749065 +0100
> > +++ gcc/match.pd 2019-12-06 15:06:08.042953309 +0100
> > @@ -2697,6 +2697,31 @@ (define_operator_list COND_TERNARY
> >   (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> >   (comb (cmp @0 @2) (cmp @1 @2
> > 
> > +/* Undo fancy way of writing max/min or other ?: expressions,
> > +   like a - ((a - b) & -(a < b)), in this case into (a < b) ? b : a.
> > +   People normally use ?: and that is what we actually try to optimize.  */
> > +(for cmp (simple_comparison)
> > + (simplify
> > +  (minus @0 (bit_and:c (minus @0 @1)
> > +  (convert? (negate@4 (convert? (cmp@5 @2 @3))
> > +  (if (INTEGRAL_TYPE_P (type)
> > +   && INTEGRAL_TYPE_P (TREE_TYPE (@4))
> > +   && TREE_CODE (TREE_TYPE (@4)) != BOOLEAN_TYPE
> > +   && INTEGRAL_TYPE_P (TREE_TYPE (@5))
> > +   && (TYPE_PRECISION (TREE_TYPE (@4)) >= TYPE_PRECISION (type)
> > +  || !TYPE_UNSIGNED (TREE_TYPE (@4
> > +   (cond (cmp @2 @3) @1 @0)))
> 
> I was going to suggest
>  (cond @5 @1 @0)
> 
> and possibly replacing (cmp@5 @2 @3) with truth_valued_p@5, before
> remembering that COND_EXPR embeds the comparison, and that not transforming
> when we don't see the comparison is likely on purpose. Plus, if @5 was in a
> signed 1-bit type, it may look more like -1 than 1 and break the
> transformation (is that forbidden as the return type of a comparison?).

FYI, I've already committed the patch, so any improvement or bugfix needs to
be done incrementally.
The comparison in there was mainly an attempt to have a truth value
in there, so maybe truth_valued_p would work too, maybe even a
get_range_info checked value of [0, 1] would, though perhaps just
truth_valued_p is better because it involves some kind of setcc-like
instruction in the end.  All I'd like to see for comparisons is that they
are in the COND_EXPR's first operand if they can't throw.
I'm afraid I have no idea whether we can have signed 1-bit truth_valued_p
operations and what would happen with them; if that is possible, then I
guess an additional condition will be needed to check that it has prec > 1
or TYPE_UNSIGNED.
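The identity behind the committed match.pd pattern, `a - ((a - b) & -(a < b))` becoming `(a < b) ? b : a`, can be checked in plain C++. This sketch uses unsigned 32-bit arithmetic so that `a - b` wraps instead of overflowing; `fancy_max` and `plain_max` are illustrative names, and only the `<` comparison is shown, whereas the pattern covers every simple comparison.

```cpp
#include <cassert>  // assert, for quick checks of the helpers
#include <cstdint>

// The "fancy" form.  -(a < b) is all-ones when a < b and zero otherwise,
// so the AND selects either (a - b) or 0, and the outer subtraction then
// yields b or a respectively.
std::uint32_t fancy_max(std::uint32_t a, std::uint32_t b)
{
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a < b);
  return a - ((a - b) & mask);
}

// The canonical form the pattern rewrites it to.
std::uint32_t plain_max(std::uint32_t a, std::uint32_t b)
{
  return (a < b) ? b : a;
}
```

The mask trick relies on the comparison result being exactly 0 or 1 before negation, which is the same concern raised above about a signed 1-bit truth value that could read as -1.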

Jakub



[PATCH] Extend std::copy_n optimization

2019-12-09 Thread François Dumont

Last patch of my series following this one:

https://gcc.gnu.org/ml/libstdc++/2019-12/msg00028.html

This time I work on the std::copy_n/std::copy overloads for
istreambuf_iterator so that they also work with deque iterators, and
transparently in _GLIBCXX_DEBUG mode.
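A deque stores its elements in fixed-size nodes, so the new overloads copy into one contiguous node at a time rather than element by element through `_Deque_iterator`. The sketch below shows that shape using `std::streambuf::sgetn`, with a vector of strings standing in for deque nodes. `read_segmented` is a hypothetical name, not a libstdc++ function and not the patch's code; like the patch's `__copy_move_a2` overload, it stops early when a chunk comes back short.

```cpp
#include <algorithm>
#include <cassert>  // assert, for quick checks of the helper
#include <cstddef>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Read up to n characters from sb into chunks of at most chunk_size
// characters, issuing one bulk sgetn call per chunk so each copy targets
// contiguous storage.  A short read means the stream is exhausted.
void read_segmented(std::streambuf &sb, std::size_t n,
                    std::size_t chunk_size, std::vector<std::string> &chunks)
{
  while (n != 0)
    {
      const std::size_t want = std::min(chunk_size, n);
      std::string chunk(want, '\0');
      const std::streamsize got
        = sb.sgetn(&chunk[0], static_cast<std::streamsize>(want));
      if (got <= 0)
        break;
      chunk.resize(static_cast<std::size_t>(got));
      chunks.push_back(std::move(chunk));
      n -= static_cast<std::size_t>(got);
      if (static_cast<std::size_t>(got) != want)
        break;  // short read: no more input
    }
}
```

For example, reading `"hello world"` with a chunk size of 4 produces `"hell"`, `"o wo"` and `"rld"`, and a bounded read leaves the remaining characters in the stream.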



    * include/bits/stl_algo.h (__copy_n_a): Move to ...
    * include/bits/stl_algobase.h (__copy_n_a): ...here. Add __strict
    parameter.
    (__niter_base(const _Safe_iterator<_Ite, _Seq,
    random_access_iterator_tag>&)): New declaration.
    (__copy_move_a2(istreambuf_iterator<>, istreambuf_iterator<>,
    _Deque_iterator<>)): New declaration.
    (__copy_n_a(istreambuf_iterator<>, _Size, _Deque_iterator<>, bool)):
    New declaration.
    * include/bits/deque.tcc
    (__copy_move_a2(istreambuf_iterator<>, istreambuf_iterator<>,
    _Deque_iterator<>)): Add definition.
    (__copy_n_a(istreambuf_iterator<>, _Size, _Deque_iterator<>, bool)):
    Add definition.
    * include/bits/streambuf_iterator.h
    (__copy_n_a(istreambuf_iterator<>, _Size, _CharT*, bool)): Adapt
    definition.
    * include/debug/safe_iterator.tcc (__niter_base): Add definition.
    * testsuite/25_algorithms/copy/streambuf_iterators/char/4.cc (test03):
    New.
    * testsuite/25_algorithms/copy/streambuf_iterators/char/debug/
    deque_neg.cc: New.
    * testsuite/25_algorithms/copy_n/debug/istreambuf_ite_deque_neg.cc:
    New.
    * testsuite/25_algorithms/copy_n/istreambuf_iterator/2.cc: New.
    * testsuite/25_algorithms/copy_n/istreambuf_iterator/deque.cc: New.

Tested under Linux x86_64 normal and debug modes.

François

diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index ef32d2d19dd..009b696e7c4 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -1065,6 +1065,57 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   return __result;
 }
 
+#if __cplusplus >= 201103L
+  template
+__enable_if_t<__is_char<_CharT>::__value,
+		  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*>>
+__copy_move_a2(
+	istreambuf_iterator<_CharT, char_traits<_CharT> > __first,
+	istreambuf_iterator<_CharT, char_traits<_CharT> > __last,
+	_GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> __result)
+{
+  if (__first == __last)
+	return __result;
+
+  for (;;)
+	{
+	  const auto __len = __result._M_last - __result._M_cur;
+	  const auto __nb
+	= std::__copy_n_a(__first, __len, __result._M_cur, false)
+	- __result._M_cur;
+	  __result += __nb;
+
+	  if (__nb != __len)
+	break;
+	}
+
+  return __result;
+}
+
+  template
+__enable_if_t<__is_char<_CharT>::__value,
+		  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*>>
+__copy_n_a(
+  istreambuf_iterator<_CharT, char_traits<_CharT>> __it, _Size __size,
+  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> __result,
+  bool __strict)
+{
+  if (__size == 0)
+	return __result;
+
+  do
+	{
+	  const auto __len = std::min<_Size>(__result._M_last - __result._M_cur,
+	 __size);
+	  std::__copy_n_a(__it, __len, __result._M_cur, __strict);
+	  __result += __len;
+	  __size -= __len;
+	}
+  while (__size != 0);
+  return __result;
+}
+#endif
+
   template
 _OI
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 769c27a02b6..4f2a6bbdbbf 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -775,31 +775,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __result;
 }
 
-  template
-_GLIBCXX20_CONSTEXPR
-_OutputIterator
-__copy_n_a(_InputIterator __first, _Size __n, _OutputIterator __result)
-{
-  if (__n > 0)
-	{
-	  while (true)
-	{
-	  *__result = *__first;
-	  ++__result;
-	  if (--__n > 0)
-		++__first;
-	  else
-		break;
-	}
-	}
-  return __result;
-}
- 
-  template
-__enable_if_t<__is_char<_CharT>::__value, _CharT*>
-__copy_n_a(istreambuf_iterator<_CharT, char_traits<_CharT>>,
-	   _Size, _CharT*);
-
   template
 _GLIBCXX20_CONSTEXPR
 _OutputIterator
@@ -808,7 +783,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   return std::__niter_wrap(__result,
 			   __copy_n_a(__first, __n,
-	  std::__niter_base(__result)));
+	  std::__niter_base(__result), true));
 }
 
   template::value)
 { return __it; }
 
+  template
+_Ite
+__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
+		 std::random_access_iterator_tag>&);
+
   // Reverse the __niter_base transformation to get a
   // __normal_iterator back again (this assumes that __normal_iterator
   // is only used to wrap random access iterators, like pointers).
@@ -484,6 +489,13 @@ namespace __detail
 	}
 };
 
+_GLIBCXX_BEGIN_NAMESPACE_CONTAINER
+
+  template
+struct _Deque_iterator;
+
+_GLIBCXX_END_NAMESPACE_CONTAINER
+
   // Helpers for streambuf iterators (either istream or ostream).
   // NB: avoid in

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-09 Thread Wilco Dijkstra
Hi Christophe,

>> The warning is off by default so there is no need to do anything in the 
>> testsuite,
>> you just need a fixed binutils.
>>
>
> Don't we want to fix GCC to stop generating the offending sequence?

Why? All ARMv8 implementations have to support it, and despite the warning,
the code actually runs significantly faster.

>> Well it's possible a configure check failed somehow.
>>
> Yes, it fails when compiling testsuite_abi.cc, resulting in tcl errors.

It's odd it's that sensitive to extra warnings, but anyway...

Cheers,
Wilco

Re: [committed, amdgcn] Enable QI/HImode vector moves

2019-12-09 Thread Andrew Stubbs
Oops, please consider this patch as submitted from my @codesourcery.com 
address, for copyright assignment purposes.


Andrew

On 06/12/2019 17:31, Andrew Stubbs wrote:

Hi all,

This patch re-enables the V64QImode and V64HImode for GCN.

GCC does not make these easy to work with because it assumes (assumed?)
that vector registers have no excess bits, and therefore does not need
to worry about truncating or extending smaller types when they are
vectorized. This is not true on GCN, where each vector lane is always at
least 32 bits wide, so we only really implement loading and storing
these vector modes (for now).


These modes were originally disabled because, previously, the GCC 
vectorizer would "lock" into the first vector register size that it 
encountered in a region, and would refuse to vectorize any type that 
didn't match that size in the rest of that region. On GCN, where all 
types have the same number of lanes, and therefore different bit-sizes, 
this meant that allowing QImode or HImode could prevent it vectorizing 
SImode or DImode, which are the ones we really want vectorized.
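The lane arithmetic behind the old problem can be made concrete. Assuming, as described above, that every GCN vector mode has 64 lanes, each element width gives a different total size, so locking a region onto the V64QI size leaves no room for a 64-lane SImode vector. The helpers below are hypothetical and model only that old "one vector size per region" behaviour, not TARGET_VECTORIZE_RELATED_MODE itself.

```cpp
#include <cassert>  // assert, for quick checks of the helpers

// Every GCN vector mode in this model has the same number of lanes.
constexpr int kGcnLanes = 64;

// Total vector width in bits for a given element width.
constexpr int vector_bits(int element_bits) { return kGcnLanes * element_bits; }

// Old behaviour, simplified: once a region has locked onto one vector size,
// other element types must be packed into that same size.
constexpr int lanes_at_size(int vector_size_bits, int element_bits)
{
  return vector_size_bits / element_bits;
}

// In this model only 64-lane modes exist.
constexpr bool gcn_has_mode(int lanes) { return lanes == kGcnLanes; }

// Locking onto V64QI (512 bits) would demand a 16-lane SImode vector.
static_assert(lanes_at_size(vector_bits(8), 32) == 16, "512 / 32 lanes");
static_assert(!gcn_has_mode(lanes_at_size(vector_bits(8), 32)), "no 16-lane SI");
```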


Now that Richard Sandiford has introduced TARGET_VECTORIZE_RELATED_MODE,
this issue has gone away, and we can re-enable the vector types once
more. Thanks Richard! :-)


This change results in 207 new passes in vect.exp (there are also 41
new failures, but those are exposed bugs I'll fix shortly). Some of these
were internal compiler errors that did not exist in older compilers.






Re: [PATCH] Use OPTION_MASK_ISA2_$target_[SET, UNSET, ] to indicate those for x_ix86_isa_flags2

2019-12-09 Thread Uros Bizjak
On Mon, Dec 9, 2019 at 11:25 AM Hongtao Liu  wrote:
>
> Hi uros:
>   This patch renames OPTION_MASK_ISA_$target_[SET,UNSET, ]
> to OPTION_MASK_ISA2_$target_[SET,UNSET, ] for those targets that set
> x_ix86_isa_flags2.
>   The target list is below:
> -
> static struct ix86_target_opts isa2_opts[] =
> {
>   { "-mcx16",               OPTION_MASK_ISA2_CX16 },
>   { "-mvaes",               OPTION_MASK_ISA2_VAES },
>   { "-mrdpid",              OPTION_MASK_ISA2_RDPID },
>   { "-mpconfig",            OPTION_MASK_ISA2_PCONFIG },
>   { "-mwbnoinvd",           OPTION_MASK_ISA2_WBNOINVD },
>   { "-mavx512vp2intersect", OPTION_MASK_ISA2_AVX512VP2INTERSECT },
>   { "-msgx",                OPTION_MASK_ISA2_SGX },
>   { "-mavx5124vnniw",       OPTION_MASK_ISA2_AVX5124VNNIW },
>   { "-mavx5124fmaps",       OPTION_MASK_ISA2_AVX5124FMAPS },
>   { "-mhle",                OPTION_MASK_ISA2_HLE },
>   { "-mmovbe",              OPTION_MASK_ISA2_MOVBE },
>   { "-mclzero",             OPTION_MASK_ISA2_CLZERO },
>   { "-mmwaitx",             OPTION_MASK_ISA2_MWAITX },
>   { "-mmovdir64b",          OPTION_MASK_ISA2_MOVDIR64B },
>   { "-mwaitpkg",            OPTION_MASK_ISA2_WAITPKG },
>   { "-mcldemote",           OPTION_MASK_ISA2_CLDEMOTE },
>   { "-mptwrite",            OPTION_MASK_ISA2_PTWRITE },
>   { "-mavx512bf16",         OPTION_MASK_ISA2_AVX512BF16 },
>   { "-menqcmd",             OPTION_MASK_ISA2_ENQCMD }
> };
> --
>
>   Bootstrap and regression test on i386/x86-64 backend is ok.
>   Ok for trunk?
>
> Changelog
> * gcc/common/config/i386/i386-common.c
> (OPTION_MASK_ISA_AVX5124FMAPS_SET): Rename to
> OPTION_MASK_ISA2_AVX5124FMAPS_SET.
> (OPTION_MASK_ISA_AVX5124VNNIW_SET, OPTION_MASK_ISA_AVX512BF16_SET,
> OPTION_MASK_ISA_AVX512VP2INTERSECT_SET,
> OPTION_MASK_ISA_PCONFIG_SET, OPTION_MASK_ISA_WBNOINVD_SET,
> OPTION_MASK_ISA_SGX_SET, OPTION_MASK_ISA_CX16_SET,
> OPTION_MASK_ISA_MOVBE_SET, OPTION_MASK_ISA_PTWRITE_SET,
> OPTION_MASK_ISA_MWAITX_SET, OPTION_MASK_ISA_CLZERO_SET,
> OPTION_MASK_ISA_RDPID_SET, OPTION_MASK_ISA_VAES_SET,
> OPTION_MASK_ISA_MOVDIR64B_SET, OPTION_MASK_ISA_WAITPKG_SET,
> OPTION_MASK_ISA_CLDEMOTE_SET, OPTION_MASK_ISA_ENQCMD_SET,
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET,
> OPTION_MASK_ISA_AVX512BF16_UNSET,
> OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET,
> OPTION_MASK_ISA_PCONFIG_UNSET, OPTION_MASK_ISA_WBNOINVD_UNSET,
> OPTION_MASK_ISA_SGX_UNSET, OPTION_MASK_ISA_CX16_UNSET,
> OPTION_MASK_ISA_MOVBE_UNSET, OPTION_MASK_ISA_PTWRITE_UNSET,
> OPTION_MASK_ISA_MWAITX_UNSET, OPTION_MASK_ISA_CLZERO_UNSET,
> OPTION_MASK_ISA_RDPID_UNSET, OPTION_MASK_ISA_VAES_UNSET,
> OPTION_MASK_ISA_MOVDIR64B_UNSET, OPTION_MASK_ISA_WAITPKG_UNSET,
> OPTION_MASK_ISA_CLDEMOTE_UNSET, OPTION_MASK_ISA_ENQCMD_UNSET,
> OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
> OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
> OPTION_MASK_ISA_PCONFIG, OPTION_MASK_ISA_WBNOINVD,
> OPTION_MASK_ISA_SGX, OPTION_MASK_ISA_CX16, OPTION_MASK_ISA_MOVBE,
> OPTION_MASK_ISA_PTWRITE, OPTION_MASK_ISA_MWAITX,
> OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_RDPID,
> OPTION_MASK_ISA_VAES, OPTION_MASK_ISA_MOVDIR64B,
> OPTION_MASK_ISA_WAITPKG, OPTION_MASK_ISA_CLDEMOTE,
> OPTION_MASK_ISA_ENQCMD): Ditto.
>
> * gcc/config/i386/i386-builtin.def
> (OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
> OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
> OPTION_MASK_ISA_WBNOINVD, OPTION_MASK_ISA_PTWRITE,
> OPTION_MASK_ISA_RDPID, OPTION_MASK_ISA_VAES,
> OPTION_MASK_ISA_MOVDIR64B, OPTION_MASK_ISA_ENQCMD): Ditto.
> * gcc/config/i386/i386-builtins.c (OPTION_MASK_ISA_MWAITX,
> OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_WAITPKG,
> OPTION_MASK_ISA_CLDEMOTE, OPTION_MASK_ISA_WBNOINVD): Ditto.
> * gcc/config/i386/i386-c.c
> (OPTION_MASK_ISA_AVX5124FMAPS, OPTION_MASK_ISA_AVX5124VNNIW,
> OPTION_MASK_ISA_AVX512BF16, OPTION_MASK_ISA_AVX512VP2INTERSECT,
> OPTION_MASK_ISA_PCONFIG, OPTION_MASK_ISA_WBNOINVD,
> OPTION_MASK_ISA_SGX, OPTION_MASK_ISA_CX16, OPTION_MASK_ISA_MOVBE,
> OPTION_MASK_ISA_PTWRITE, OPTION_MASK_ISA_MWAITX,
> OPTION_MASK_ISA_CLZERO, OPTION_MASK_ISA_RDPID,
> OPTION_MASK_ISA_VAES, OPTION_MASK_ISA_MOVDIR64B,
> OPTION_MASK_ISA_WAITPKG, OPTION_MASK_ISA_CLDEMOTE,
> OPTION_MASK_ISA_ENQCMD): Ditto.
> * gcc/config/i386/i386-options.c: Ditto.
> * gcc/config/i386/i386.opt: Ditto.
> * gcc/config/i386/i386.h (TARGET_ISA_AVX5124FMAPS,
> TARGET_ISA_AVX5124VNNIW, TARGET_ISA_AVX512BF16,
> TARGET_ISA_AVX512VP2INTERSECT, TARGET_ISA_PCONFIG,
> TARGET_ISA_WBNOINVD, TARGET_ISA_SGX, TARGET_ISA_CX16,
> TARGET_ISA_MOVBE, TARGET_ISA_PTWRITE, TARGET_ISA_MWAITX,
> TARGET_ISA_CLZERO, TARGET_ISA_RDPID, TARGET_ISA_VAES,
> TARGET_ISA_MOVDIR64B, TARGET_ISA_WAITPKG, TARGET_ISA_CLDE

[Patch, committed] libgomp/testsuite/*fortran – make 'stop' values unique

2019-12-09 Thread Tobias Burnus
Somehow, I managed to commit non-unique stop codes, again. While they do
no harm, they make debugging a bit harder. Hence:


Committed as obvious. (Rev. 279117).

Tobias

Index: libgomp/ChangeLog
===
--- libgomp/ChangeLog	(revision 279114)
+++ libgomp/ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2019-12-09  Tobias Burnus  
+
+	* testsuite/libgomp.fortran/use_device_addr-3.f90: Make 'stop' codes
+	unique.
+	* testsuite/libgomp.fortran/use_device_addr-4.f90: Ditto.
+	* testsuite/libgomp.fortran/use_device_ptr-optional-2.f90: Ditto.
+	* testsuite/libgomp.oacc-fortran/declare-5.f90: Ditto.
+	* testsuite/libgomp.oacc-fortran/optional-data-copyin-by-value.f90:
+	Ditto.
+	* testsuite/libgomp.oacc-fortran/optional-firstprivate.f90: Ditto.
+	* testsuite/libgomp.oacc-fortran/optional-update-host.f90: Ditto.
+
 2019-12-06  Kwok Cheung Yeung  
 
 	* config/accel/proc.c (omp_get_num_procs): Apply ialias macro.
diff --git a/libgomp/testsuite/libgomp.fortran/use_device_addr-3.f90 b/libgomp/testsuite/libgomp.fortran/use_device_addr-3.f90
index 82cf9ac8070..a917d289fe8 100644
--- a/libgomp/testsuite/libgomp.fortran/use_device_addr-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-3.f90
@@ -101,2 +101,2 @@ contains
 if (any(abs(aa - 11.0_c_double) > 10.0_c_double * epsilon(aa))) stop [-1-]{+2+}
 if (any(abs(3.0_c_double * aa - bb) > 10.0_c_double * epsilon(aa))) stop [-1-]{+3+}
@@ -107,2 +107,2 @@ contains
 if (any(abs(cc - 33.0_c_double) > 10.0_c_double * epsilon(cc))) stop [-1-]{+4+}
 if (any(abs(3.0_c_double * cc - dd) > 10.0_c_double * epsilon(cc))) stop [-1-]{+5+}
@@ -113,2 +113,2 @@ contains
 if (any(abs(ee - 55.0_c_double) > 10.0_c_double * epsilon(ee))) stop [-1-]{+6+}
 if (any(abs(3.0_c_double * ee - ff) > 10.0_c_double * epsilon(ee))) stop [-1-]{+7+}
@@ -170,2 +170,2 @@ contains
 if (any(abs(aa - 111.0_c_double) > 10.0_c_double * epsilon(aa))) stop [-1-]{+8+}
 if (any(abs(3.0_c_double * aa - bb) > 10.0_c_double * epsilon(aa))) stop [-1-]{+9+}
@@ -178,2 +178,2 @@ contains
 if (any(abs(aa - .0_c_double) > 10.0_c_double * epsilon(aa))) stop [-1-]{+10+}
 if (any(abs(3.0_c_double * aa - bb) > 10.0_c_double * epsilon(aa))) stop [-1-]{+11+}
@@ -186,2 +186,2 @@ contains
 if (any(abs(aa - 1.0_c_double) > 10.0_c_double * epsilon(aa))) stop [-1-]{+12+}
 if (any(abs(3.0_c_double * aa - bb) > 10.0_c_double * epsilon(aa))) stop [-1-]{+13+}
@@ -190,2 +190,2 @@ contains
 if (any(abs(aa - 1.0_c_double) > 10.0_c_double * epsilon(aa))) stop [-1-]{+14+}
 if (any(abs(3.0_c_double * aa - bb) > 10.0_c_double * epsilon(aa))) stop [-1-]{+15+}
@@ -205,2 +205,2 @@ contains
 if (any(abs(cc - 333.0_c_double) > 10.0_c_double * epsilon(cc))) stop [-1-]{+16+}
 if (any(abs(3.0_c_double * cc - dd) > 10.0_c_double * epsilon(cc))) stop [-1-]{+17+}
@@ -213,2 +213,2 @@ contains
 if (any(abs(cc - .0_c_double) > 10.0_c_double * epsilon(cc))) stop [-1-]{+18+}
 if (any(abs(3.0_c_double * cc - dd) > 10.0_c_double * epsilon(cc))) stop [-1-]{+19+}
@@ -221,2 +221,2 @@ contains
 if (any(abs(cc - 3.0_c_double) > 10.0_c_double * epsilon(cc))) stop [-1-]{+20+}
 if (any(abs(3.0_c_double * cc - dd) > 10.0_c_double * epsilon(cc))) stop [-1-]{+21+}
@@ -225,2 +225,2 @@ contains
 if (any(abs(cc - 3.0_c_double) > 10.0_c_double * epsilon(dd))) stop [-1-]{+22+}
 if (any(abs(3.0_c_double * cc - dd) > 10.0_c_double * epsilon(dd))) stop [-1-]{+23+}
@@ -240,2 +240,2 @@ contains
 if (any(abs(ee - 555.0_c_double) > 10.0_c_double * epsilon(ee))) stop [-1-]{+24+}
 if (any(abs(3.0_c_double * ee - ff) > 10.0_c_double * epsilon(ee))) stop [-1-]{+25+}
@@ -248,2 +248,2 @@ contains
 if (any(abs(ee - .0_c_double) > 10.0_c_double * epsilon(ee))) stop [-1-]{+26+}
 if (any(abs(3.0_c_double * ee - ff) > 10.0_c_double * epsilon(ee))) stop [-1-]{+27+}
@@ -256,2 +256,2 @@ contains
 if (any(abs(ee - 5.0_c_double) > 10.0_c_double * epsilon(ee))) stop [-1-]{+28+}
 if (any(abs(3.0_c_double * ee - ff) > 10.0_c_double * epsilon(ff))) stop [-1-]{+29+}
@@ -260,2 +260,2 @@ contains
 if (any(abs(ee - 5.0_c_double) > 10.0_c_double * epsilon(ee))) stop [-1-]{+30+}
 if (any(abs(3.0_c_double * ee - ff) > 10.0_c_double * epsilon(ee))) stop [-1-]{+31+}
@@ -306,3 +306,3 @@ contains
 if (.not.present(aa) .or. .not.present(bb)) stop [-1-]{+32+}
 if (.not.present(cc) .or. .not.present(dd)) stop [-1-]{+33+}
 if (.not.present(ee) .or. .not.present(ff)) stop [-1-]{+34+}
@@ -310,2 +310,2 @@ contains
 if (.not.allocated(cc) .or. .not.allocated(dd)) stop [-1-]{+35+}
 if (.not.associated(ee) .or. .not.associated(ff)) stop [-1-]{+36+}
@@ -314,2 +314,2 @@ contains
 if (.not.present(aa) .or. .not.present(bb)) stop [-1-]{+37+}
 if (.not.c_associated(c_loc(aa)) .or. .not.c_associated(c_loc(bb))) stop [-1-]{+38+}
@@ -318,2 +318,2 @@ contains
 if (any(abs(aa - 11.0_c_

In 'libgomp/target.c:gomp_exit_data', remove open-coded 'gomp_remove_var' (was: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - runtime)

2019-12-09 Thread Thomas Schwinge
Hi!

On 2018-06-19T10:01:20-0700, Cesar Philippidis  wrote:
> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> +attribute_hidden bool
> +gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
> +{
> +  bool is_tgt_unmapped = false;
> +  splay_tree_remove (&devicep->mem_map, k);
> +  if (k->link_key)
> +    splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key);
> +  if (k->tgt->refcount > 1)
> +    k->tgt->refcount--;
> +  else
> +    {
> +      is_tgt_unmapped = true;
> +      gomp_unmap_tgt (k->tgt);
> +    }
> +  return is_tgt_unmapped;

This new function can, as done here:

> @@ -1059,16 +1077,7 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
> + tgt->list[i].offset),
>   tgt->list[i].length);
>if (do_unmap)
> - {
> -   splay_tree_remove (&devicep->mem_map, k);
> -   if (k->link_key)
> - splay_tree_insert (&devicep->mem_map,
> -(splay_tree_node) k->link_key);
> -   if (k->tgt->refcount > 1)
> - k->tgt->refcount--;
> -   else
> - gomp_unmap_tgt (k->tgt);
> - }
> + gomp_remove_var (devicep, k);
>  }

..., and here:

> @@ -1298,17 +1307,7 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
>else
>   {
> splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
> -   splay_tree_remove (&devicep->mem_map, n);
> -   if (n->link_key)
> - {
> -   if (n->tgt->refcount > 1)
> - n->tgt->refcount--;
> -   else
> - {
> -   is_tgt_unmapped = true;
> -   gomp_unmap_tgt (n->tgt);
> - }
> - }
> +   is_tgt_unmapped = gomp_remove_var (devicep, n);
>   }

..., also be used in 'gomp_exit_data', see attached "In
'libgomp/target.c:gomp_exit_data', remove open-coded 'gomp_remove_var'",
committed to trunk in r279118.


Regards
 Thomas


From bbfdb255a0b5cb6e183e11026c2a482d4eeba981 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 11:39:57 +
Subject: [PATCH] In 'libgomp/target.c:gomp_exit_data', remove open-coded
 'gomp_remove_var'

	libgomp/
	* target.c (gomp_exit_data): Use 'gomp_remove_var'.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279118 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  4 
 libgomp/target.c  | 11 +--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index a0bd25177d1..c5541bcec81 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,7 @@
+2019-12-09  Thomas Schwinge  
+
+	* target.c (gomp_exit_data): Use 'gomp_remove_var'.
+
 2019-12-09  Tobias Burnus  
 
 	* testsuite/libgomp.fortran/use_device_addr-3.f90: Make 'stop' codes
diff --git a/libgomp/target.c b/libgomp/target.c
index 84d6daa76ca..13f7921651f 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2095,16 +2095,7 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 	  - k->host_start),
 cur_node.host_end - cur_node.host_start);
 	  if (k->refcount == 0)
-	{
-	  splay_tree_remove (&devicep->mem_map, k);
-	  if (k->link_key)
-		splay_tree_insert (&devicep->mem_map,
-   (splay_tree_node) k->link_key);
-	  if (k->tgt->refcount > 1)
-		k->tgt->refcount--;
-	  else
-		gomp_unmap_tgt (k->tgt);
-	}
+	gomp_remove_var (devicep, k);
 
 	  break;
 	default:
-- 
2.17.1





Add 'libgomp.oacc-c-c++-common/host_data-6.c'

2019-12-09 Thread Thomas Schwinge
Hi!

See attached "Add 'libgomp.oacc-c-c++-common/host_data-6.c'", committed
to trunk in r279119.


Regards
 Thomas


From e5247d4a6930ca12fef2d38922cf6dbd9812da22 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 11:40:08 +
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/host_data-6.c'

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279119 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  2 +
 .../libgomp.oacc-c-c++-common/host_data-6.c   | 47 +++
 2 files changed, 49 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-6.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index c5541bcec81..6ef2f24e4d5 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,7 @@
 2019-12-09  Thomas Schwinge  
 
+	* testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New file.
+
 	* target.c (gomp_exit_data): Use 'gomp_remove_var'.
 
 2019-12-09  Tobias Burnus  
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-6.c
new file mode 100644
index 000..1cda442b001
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-6.c
@@ -0,0 +1,47 @@
+/* Call 'acc_memcpy_from_device' inside '#pragma acc host_data'.  */
+
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include 
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  const int SIZE = 318;
+  const int c0 = 22;
+  const int c1 = 112;
+
+  char *h = (char *) malloc (SIZE);
+
+  memset (h, c0, SIZE);
+
+#pragma acc data create (h[0:SIZE - 44])
+  {
+#pragma acc update device (h[0:SIZE - 44])
+
+    memset (h, c1, 67);
+
+    void *d = h;
+#pragma acc host_data use_device (d)
+    {
+      acc_memcpy_from_device (h, d, 12);
+    }
+  }
+
+  for (int i = 0; i < SIZE; ++i)
+    {
+      if (i < 12)
+	assert (h[i] == c0);
+      else if (i < 67)
+	assert (h[i] == c1);
+      else
+	assert (h[i] == c0);
+    }
+
+  free (h);
+
+  return 0;
+}
-- 
2.17.1





[PR92854] Add 'libgomp.oacc-c-c++-common/pr92854-1.c'

2019-12-09 Thread Thomas Schwinge
Hi!

See attached "[PR92854] Add 'libgomp.oacc-c-c++-common/pr92854-1.c'",
committed to trunk in r279120, "to document the status quo", which does
match my understanding of the OpenACC 2.6 semantics.


Regards
 Thomas


From e14bd9d202bc4140d825a396ddaf64a5930ee3d1 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 11:40:17 +
Subject: [PATCH] [PR92854] Add 'libgomp.oacc-c-c++-common/pr92854-1.c'

... to document the status quo.

	libgomp/
	PR libgomp/92854
	* testsuite/libgomp.oacc-c-c++-common/pr92854-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279120 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  3 ++
 .../libgomp.oacc-c-c++-common/pr92854-1.c | 31 +++
 2 files changed, 34 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr92854-1.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 6ef2f24e4d5..aac3b1887b0 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,8 @@
 2019-12-09  Thomas Schwinge  
 
+	PR libgomp/92854
+	* testsuite/libgomp.oacc-c-c++-common/pr92854-1.c: New file.
+
 	* testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New file.
 
 	* target.c (gomp_exit_data): Use 'gomp_remove_var'.
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92854-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92854-1.c
new file mode 100644
index 000..6ba96b6bf8f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92854-1.c
@@ -0,0 +1,31 @@
+/* Verify that 'acc_unmap_data' unmaps even in presence of dynamic reference
+   counts.  */
+
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  const int N = 180;
+
+  char *h = (char *) malloc (N);
+  char *d = (char *) acc_malloc (N);
+  if (!d)
+    abort ();
+  acc_map_data (h, d, N);
+
+  char *d_ = (char *) acc_create (h + 3, N - 77);
+  assert (d_ == d + 3);
+
+  d_ = (char *) acc_create (h, N);
+  assert (d_ == d);
+
+  acc_unmap_data (h);
+  assert (!acc_is_present (h, N));
+
+  return 0;
+}
-- 
2.17.1





Add 'libgomp.oacc-c-c++-common/map-data-1.c'

2019-12-09 Thread Thomas Schwinge
Hi!

See attached "Add 'libgomp.oacc-c-c++-common/map-data-1.c'", committed to
trunk in r279121.


Regards
 Thomas


From 524aec42ea4d0a98fd6a0815e7573bf94fa70ee3 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 11:40:27 +
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/map-data-1.c'

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/map-data-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279121 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  2 +
 .../libgomp.oacc-c-c++-common/map-data-1.c| 53 +++
 2 files changed, 55 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/map-data-1.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index aac3b1887b0..51a00a3a46c 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,7 @@
 2019-12-09  Thomas Schwinge  
 
+	* testsuite/libgomp.oacc-c-c++-common/map-data-1.c: New file.
+
 	PR libgomp/92854
 	* testsuite/libgomp.oacc-c-c++-common/pr92854-1.c: New file.
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/map-data-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/map-data-1.c
new file mode 100644
index 000..d0781dd7f56
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/map-data-1.c
@@ -0,0 +1,53 @@
+/* Verify that 'acc_map_data' does not copy data to, and 'acc_unmap_data' does
+   not copy data from the device.  */
+
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  const int c0 = 9;
+  const int c1 = 40;
+  const int c2 = 47;
+
+  const size_t N = 256;
+
+  unsigned char *h = (unsigned char *) malloc (N);
+
+  void *d = acc_malloc (N);
+
+  memset (h, c0, N); // H <- c0
+  acc_memcpy_to_device (d, h, N); // D <- H = c0
+
+  memset (h, c1, N); // H <- c1
+  acc_map_data (h, d, N);
+  for (size_t i = 0; i < N; ++i)
+    if (h[i] != c1)
+      abort ();
+
+  acc_memcpy_from_device (h, d, N); // H <- D = c0
+  for (size_t i = 0; i < N; ++i)
+    if (h[i] != c0)
+      abort ();
+
+  memset (h, c2, N); // H <- c2
+  acc_unmap_data (h);
+  for (size_t i = 0; i < N; ++i)
+    if (h[i] != c2)
+      abort ();
+
+  acc_memcpy_from_device (h, d, N); // H <- D = c0
+  for (size_t i = 0; i < N; ++i)
+    if (h[i] != c0)
+      abort ();
+
+  acc_free (d);
+
+  free (h);
+
+  return 0;
+}
-- 
2.17.1





Re: [PATCH] OpenACC "present" subarrays: runtime API return value and unmapping fixes

2019-12-09 Thread Thomas Schwinge
Hi!

On 2019-11-14T17:02:02+0100, I wrote:
> [...] I couldn't really find wording in the
> OpenACC specification that explicitly permits such things.  But given
> that, for example, in OpenACC 2.7, 3.2.20. "acc_copyin", 'acc_copyin' is
> described to be "equivalent to the 'enter data' directive with a 'copyin'
> clause", and the latter supposedly (?) does allow such "subset subarray
> mappings", and in 2.7.6. "copyin clause" it is said that "An 'enter data'
> directive with a 'copyin' clause is functionally equivalent to a call to
> the 'acc_copyin' API routine", that's probably motivation enough to fix
> the latter to conform to what the former supposedly already allows
> (though not implementing it by means of 'enter data copyin' just calling
> 'acc_copyin' etc.).
>
> I see that 2.7.6. "copyin clause" also states that "The restrictions
> regarding subarrays in the present clause apply to this clause", which
> per 2.7.4. "present clause" is that "If only a subarray of an array is
> present in the current device memory, the 'present' clause must specify
> the same subarray, or a subarray that is a proper subset of the subarray
> in the data lifetime".  From that we probably are to deduce that it's
> fine the other way round (as you've argued): if a subarray of an array
> (or, the whole array) is present in the current device memory, the
> 'present' clause may specify the same subarray, or a subarray that is a
> proper subset of the subarray in the data lifetime (my words).  Unless
> you object to that, we shall (later) try to get that clarified/amended in
> the OpenACC specification.

I filed  "Subset
subarray restrictions".


> Later (not now), we should then also add corresponding testing for actual
> 'data' etc. constructs being nested in that way.

> On 2019-11-09T01:04:21+, Julian Brown  wrote:
>> a couple of existing "shouldfail" tests no longer fail, and have been
>> adjusted accordingly.
>
> These should then actually be removed, or re-written, because in their
> current form they no longer make much sense, as far as I can tell:
>
> For example, 'libgomp.oacc-c-c++-common/lib-22.c':
>
> acc_copyin (h, N);
>
> ... followed by:
>
> acc_copyout (h + 1, N - 1);
>
> ... is now meant to no longer abort with a "surrounds2" message, but
> instead we now expect success, and '!acc_is_present'.
>
> I'll take care of that later on -- I have some more tests to add anyway.

See attached '[PR92511] More testing for OpenACC "present" subarrays',
committed to trunk in r279122.


Regards
 Thomas


From 2d5187149761bb9566b2c221c9c7ae7a18c92822 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 11:40:36 +
Subject: [PATCH] [PR92511] More testing for OpenACC "present" subarrays

In particular, "subset subarrays".

	libgomp/
	PR libgomp/92511
	* testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c: Remove
	this file...
	* testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c: ..., and
	this file...
	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: ..., and this
	file...
	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: ..., and this
	file...
	* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-r-p.c:
	... with their content moved into, and extended in this new file.
	* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-d-a.c:
	New file.
	* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-d-p.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-r-a.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-2.c:
	Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279122 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  20 +
 .../copyin-devptr-1.c |  28 -
 .../copyin-devptr-2.c |  35 --
 .../libgomp.oacc-c-c++-common/lib-22.c|  33 --
 .../libgomp.oacc-c-c++-common/lib-30.c|  30 -
 .../subset-subarray-mappings-1-d-a.c  |   7 +
 .../subset-subarray-mappings-1-d-p.c  |   7 +
 .../subset-subarray-mappings-1-r-a.c  |   7 +
 .../subset-subarray-mappings-1-r-p.c  | 514 ++
 .../subset-subarray-mappings-2.c  | 115 
 10 files changed, 670 insertions(+), 126 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-d-a.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-d-p.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-r-a.c
 create mode 100644 libgomp/testsuite/libgomp

Re: [C++ Patch] Improve build_*_cast locations

2019-12-09 Thread Paolo Carlini

Hi,

On 08/12/19 18:51, Jason Merrill wrote:
> Hmm, is the change to cp_expr really necessary vs. using
> protected_set_expr_location?


Yes, using protected_set_expr_location works fine in this case, I
suppose because we are dealing with expressions anyway, and the cp_expr
constructor from a tree copies the location too. Below I also added the
thin build_functional_cast wrapper, so that the parser consistently no
longer calls set_location after any of the build_*_cast functions. Note
that at some point we should also do something about the build_x_*
functions, which have been doing that for a while...


Anyway, the below passed testing.

Thanks, Paolo.



Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h(revision 279041)
+++ gcc/cp/cp-tree.h(working copy)
@@ -6998,7 +6998,8 @@ extern tree build_typeid  (tree, tsubst_flags_t);
 extern tree get_tinfo_decl (tree);
 extern tree get_typeid (tree, tsubst_flags_t);
 extern tree build_headof   (tree);
-extern tree build_dynamic_cast (tree, tree, tsubst_flags_t);
+extern tree build_dynamic_cast (location_t, tree, tree,
+tsubst_flags_t);
 extern void emit_support_tinfos(void);
 extern bool emit_tinfo_decl(tree);
 
@@ -7547,13 +7548,17 @@ extern tree build_x_compound_expr   (location_t, tr
 tsubst_flags_t);
 extern tree build_compound_expr (location_t, tree, tree);
 extern tree cp_build_compound_expr (tree, tree, tsubst_flags_t);
-extern tree build_static_cast  (tree, tree, tsubst_flags_t);
-extern tree build_reinterpret_cast (tree, tree, tsubst_flags_t);
-extern tree build_const_cast   (tree, tree, tsubst_flags_t);
+extern tree build_static_cast  (location_t, tree, tree,
+tsubst_flags_t);
+extern tree build_reinterpret_cast (location_t, tree, tree,
+tsubst_flags_t);
+extern tree build_const_cast   (location_t, tree, tree,
+tsubst_flags_t);
 extern tree build_c_cast   (location_t, tree, tree);
 extern cp_expr build_c_cast(location_t loc, tree type,
 cp_expr expr);
-extern tree cp_build_c_cast(tree, tree, tsubst_flags_t);
+extern tree cp_build_c_cast(location_t, tree, tree,
+tsubst_flags_t);
 extern cp_expr build_x_modify_expr (location_t, tree,
 enum tree_code, tree,
 tsubst_flags_t);
@@ -7613,7 +7618,8 @@ extern int lvalue_or_else (tree, enum lvalue_use
 extern void check_template_keyword (tree);
 extern bool check_raw_literal_operator (const_tree decl);
 extern bool check_literal_operator_args(const_tree, bool *, bool *);
-extern void maybe_warn_about_useless_cast   (tree, tree, tsubst_flags_t);
+extern void maybe_warn_about_useless_cast   (location_t, tree, tree,
+tsubst_flags_t);
 extern tree cp_perform_integral_promotions  (tree, tsubst_flags_t);
 
 extern tree finish_left_unary_fold_expr  (tree, int);
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c   (revision 279041)
+++ gcc/cp/decl.c   (working copy)
@@ -6483,7 +6483,8 @@ reshape_init (tree type, tree init, tsubst_flags_t
{
  warning_sentinel w (warn_useless_cast);
  warning_sentinel w2 (warn_ignored_qualifiers);
- return cp_build_c_cast (type, elt, tf_warning_or_error);
+ return cp_build_c_cast (input_location, type, elt,
+ tf_warning_or_error);
}
   else
return error_mark_node;
Index: gcc/cp/method.c
===
--- gcc/cp/method.c (revision 279041)
+++ gcc/cp/method.c (working copy)
@@ -474,7 +474,8 @@ forward_parm (tree parm)
   if (!TYPE_REF_P (type))
 type = cp_build_reference_type (type, /*rval=*/true);
   warning_sentinel w (warn_useless_cast);
-  exp = build_static_cast (type, exp, tf_warning_or_error);
+  exp = build_static_cast (input_location, type, exp,
+  tf_warning_or_error);
   if (DECL_PACK_P (parm))
 exp = make_pack_expansion (exp);
   return exp;
@@ -1361,7 +1362,8 @@ build_comparison_op (tree fndecl, tsubst_flags_

Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Martin Liška

Hello.

Based on a presentation that Sriraman Tallam gave at an LLVM conference:
https://www.youtube.com/watch?v=DySuXFGmB40

I made a heatmap based on executed instruction addresses. I used
$ perf record -F max -- ./cc1plus -fpreprocessed /home/marxin/Programming/tramp3d/tramp3d-v4.ii
and
$ perf script -F time,ip,dso

I'm sending a link to the heatmaps for my system GCC 9 (PGO+lean LTO
bootstrap) and for GCC 10 before and after my reorder patch (also
PGO+lean LTO bootstrap).

One can see quite significant clustering from 5 s until the end of
compilation.
Link: https://drive.google.com/open?id=1M0YlxvQPyiVguy5VWRC8dG52UArwAuKS

Martin


Re: [PATCH 2/3] libgcc: Dont define __do_global_dtors_aux if it will be empty

2019-12-09 Thread Tobias Burnus

Hi, I see now the following error:

…/libgcc/crtstuff.c:372:52: error: operator '||' has no right operand
  372 |   || USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY
  |^
/net/build5-trusty-cs/scratch/tburnus/mainline-nv/src/gcc-mainline/libgcc/crtstuff.c:254:17: warning: '__DTOR_LIST__' defined but not used [-Wunused-variable]
  254 | STATIC func_ptr __DTOR_LIST__[1]
  | ^
Makefile:1038: recipe for target 'crtbeginT.o' failed

Cheers,

Tobias

On 11/6/19 5:17 PM, Jozef Lawrynowicz wrote:

__do_global_dtors_aux in crtstuff.c will not do anything meaningful if:
  * crtstuff.c is not being compiled for use in a shared library
  * the target uses .{init,fini}_array sections
  * TM clone registry is disabled
  * EH frame registry is disabled

The attached patch prevents it from being defined at all if all the above
conditions are true. This saves code size in the final linked executable.

0002-libgcc-Dont-define-__do_global_dtors_aux-if-it-will-.patch

 From 967262117f0c838fe8a9226484bf6e014c86f0ba Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz
Date: Tue, 29 Oct 2019 13:02:08 +
Subject: [PATCH 2/3] libgcc: Dont define __do_global_dtors_aux if it will be
  empty

libgcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz

* crtstuff.c (__do_global_dtors_aux): Wrap in #if so it's only defined
if it will have contents.

---
  libgcc/crtstuff.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 9a3247b7848..0b0a0b865fe 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -368,8 +368,12 @@ extern void __cxa_finalize (void *) TARGET_ATTRIBUTE_WEAK;
 On some systems, this routine is run more than once from the .fini,
 when exit is called recursively, so we arrange to remember where in
 the list we left off processing, and we resume at that point,
-   should we be re-invoked.  */
+   should we be re-invoked.
  
+   This routine does not need to be run if none of the following clauses are
+   true, as it will not do anything, so can be removed.  */
+#if defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP) \
+  || USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY
  static void __attribute__((used))
  __do_global_dtors_aux (void)
  {
@@ -455,6 +459,9 @@ __do_global_dtors_aux_1 (void)
  CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__,
  __do_global_dtors_aux_1)
  #endif
+#endif /* defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP)
+  || defined(USE_TM_CLONE_REGISTRY) || defined(USE_EH_FRAME_REGISTRY) */
+
  
  #if USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY

  /* Stick a call to __register_frame_info into the .init section.  For some
-- 2.17.1


[patch] Fix ICE on VLA in LTO mode

2019-12-09 Thread Eric Botcazou
Hi,

this is a regression present on the mainline and 9 branch: the compiler gives 
an ICE for the attached Ada testcase on the following assertion:

  if (DECL_P (ref))
    {
      /* We shouldn't have true variables here.  */
      gcc_assert (TREE_READONLY (ref));
      subst = ref;
    }

in self_referential_size because the size function machinery is invoked again 
by the *free_lang_data pass, more precisely from fld_process_array_type:

  if (!existed)
    {
      array
	= build_array_type_1 (t2, TYPE_DOMAIN (t), TYPE_TYPELESS_STORAGE (t),
			      false, false);
      TYPE_CANONICAL (array) = TYPE_CANONICAL (t);
      if (!fld->pset.add (array))
	add_tree_to_fld_list (array, fld);
    }

through the call to build_array_type_1, and more precisely through the 
recursive call made for computing TYPE_CANONICAL:

  if (TYPE_CANONICAL (t) == t)
    {
      if (TYPE_STRUCTURAL_EQUALITY_P (elt_type)
	  || (index_type && TYPE_STRUCTURAL_EQUALITY_P (index_type))
	  || in_lto_p)
	SET_TYPE_STRUCTURAL_EQUALITY (t);
      else if (TYPE_CANONICAL (elt_type) != elt_type
	       || (index_type && TYPE_CANONICAL (index_type) != index_type))
	TYPE_CANONICAL (t)
	  = build_array_type_1 (TYPE_CANONICAL (elt_type),
				index_type
				? TYPE_CANONICAL (index_type) : NULL_TREE,
				typeless_storage, shared, set_canonical);
    }


That's a bit surprising because t2 is an incomplete type and we have these 
lines in build_array_type_1 just before:

  /* If the element type is incomplete at this point we get marked for
     structural equality.  Do not record these types in the canonical
     type hashtable.  */
  if (TYPE_STRUCTURAL_EQUALITY_P (t))
    return t;

so the computation of TYPE_CANONICAL should be skipped.  But it turns out that 
these lines from 2009 are obsolete because layout_type no longer forces the 
TYPE_STRUCTURAL_EQUALITY_P on the array when the element type is incomplete.


Since fld_process_array_type overwrites TYPE_CANONICAL just after the call to
build_array_type_1, there is no point in having the latter compute it, so the
proposed fix is to add a new SET_CANONICAL parameter to build_array_type_1.

Tested on x86_64-suse-linux, OK for mainline and 9 branch?


2019-12-09  Eric Botcazou  

* tree.c (build_array_type_1): Add SET_CANONICAL parameter and compute
TYPE_CANONICAL from the element type only if it is set.  Remove obsolete
lines and adjust recursive call.
(fld_process_array_type): Adjust call to build_array_type_1.
(build_array_type): Likewise.
(build_nonshared_array_type): Likewise.


2019-12-09  Eric Botcazou  

* gnat.dg/lto23.adb: New test.
 
-- 
Eric Botcazou

Index: tree.c
===
--- tree.c	(revision 278938)
+++ tree.c	(working copy)
@@ -266,7 +266,7 @@ static void print_type_hash_statistics (
 static void print_debug_expr_statistics (void);
 static void print_value_expr_statistics (void);
 
-static tree build_array_type_1 (tree, tree, bool, bool);
+static tree build_array_type_1 (tree, tree, bool, bool, bool);
 
 tree global_trees[TI_MAX];
 tree integer_types[itk_none];
@@ -5303,8 +5303,9 @@ fld_process_array_type (tree t, tree t2,
  = map->get_or_insert (t, &existed);
   if (!existed)
 {
-  array = build_array_type_1 (t2, TYPE_DOMAIN (t),
-  TYPE_TYPELESS_STORAGE (t), false);
+  array
+	= build_array_type_1 (t2, TYPE_DOMAIN (t), TYPE_TYPELESS_STORAGE (t),
+			  false, false);
   TYPE_CANONICAL (array) = TYPE_CANONICAL (t);
   if (!fld->pset.add (array))
 	add_tree_to_fld_list (array, fld);
@@ -8155,11 +8156,12 @@ subrange_type_for_debug_p (const_tree ty
 /* Construct, lay out and return the type of arrays of elements with ELT_TYPE
and number of elements specified by the range of values of INDEX_TYPE.
If TYPELESS_STORAGE is true, TYPE_TYPELESS_STORAGE flag is set on the type.
-   If SHARED is true, reuse such a type that has already been constructed.  */
+   If SHARED is true, reuse such a type that has already been constructed.
+   If SET_CANONICAL is true, compute TYPE_CANONICAL from the element type.  */
 
 static tree
 build_array_type_1 (tree elt_type, tree index_type, bool typeless_storage,
-		bool shared)
+		bool shared, bool set_canonical)
 {
   tree t;
 
@@ -8176,19 +8178,13 @@ build_array_type_1 (tree elt_type, tree
   TYPE_TYPELESS_STORAGE (t) = typeless_storage;
   layout_type (t);
 
-  /* If the element type is incomplete at this point we get marked for
- structural equality.  Do not record these types in the canonical
- type hashtable.  */
-  if (TYPE_STRUCTURAL_EQUALITY_P (t))
-return t;
-
   if (shared)
 {
   hashval_t hash = type_hash_canon_hash (t);
   t = type_hash_canon (hash, t);
 }
 
-  if (TYPE_CA

Re: [PATCH] Refactor IPA devirt a bit.

2019-12-09 Thread Richard Sandiford
Martin Liška  writes:
> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
> index a884a465a5d..e53461b1f5c 100644
> --- a/gcc/ipa-devirt.c
> +++ b/gcc/ipa-devirt.c
> @@ -1036,20 +1036,13 @@ warn_types_mismatch (tree t1, tree t2, location_t loc1, location_t loc2)
>/* If types have mangled ODR names and they are different, it is most
>   informative to output those.
>   This also covers types defined in different namespaces.  */
> -  if (TYPE_NAME (mt1) && TYPE_NAME (mt2)
> -  && TREE_CODE (TYPE_NAME (mt1)) == TYPE_DECL
> -  && TREE_CODE (TYPE_NAME (mt2)) == TYPE_DECL
> -  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt1))
> -  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt2))
> -  && DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))
> -  != DECL_ASSEMBLER_NAME (TYPE_NAME (mt2)))
> -{
> -  char *name1 = xstrdup (cplus_demangle
> -  (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))),
> -   DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES));
> -  char *name2 = cplus_demangle
> -  (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt2))),
> -   DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES);
> +  const char *odr1 = get_odr_name_for_type (mt1);
> +  const char *odr2 = get_odr_name_for_type (mt2);
> +  if (odr1 != NULL && odr2 != NULL && odr1 != odr2)
> +{
> +  const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
> +  char *name1 = xstrdup (cplus_demangle (odr1, opts));
> +  char *name2 = xstrdup (cplus_demangle (odr2, opts));

This adds an xstrdup for name2.  Is that intentional or just a pasto?
The old code assumed that the demangler buffer wouldn't be reused by
the diagnostics machinery, but maybe copying the buffer is more robust.
In that case though, we need to free name2 in the same way that we
already free name1.

In the patch below I went for removing the xstrdup, but I can add
the frees instead if that seems better.

> diff --git a/gcc/tree.h b/gcc/tree.h
> index 0f3cc5d7e5a..40a4fde6aec 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6222,6 +6222,19 @@ fndecl_built_in_p (const_tree node, built_in_function name)
> && DECL_FUNCTION_CODE (node) == name);
>  }
>  
> +/* If TYPE has mangled ODR name, return it.  Otherwise return NULL.  */
> +
> +inline const char *
> +get_odr_name_for_type (tree type)
> +{
> +  tree type_name = TYPE_NAME (type);
> +  if (type_name == NULL_TREE
> +  || !DECL_ASSEMBLER_NAME_SET_P (type_name))
> +    return NULL;
> +
> +  return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (type_name));
> +}
> +
>  /* A struct for encapsulating location information about an operator
> and the operation built from it.
>  

This drops the TYPE_DECL test from the original code above, which
causes an ICE for C tags.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


2019-12-09  Richard Sandiford  

gcc/
* ipa-utils.h (get_odr_name_for_type): Check for a TYPE_DECL.
* ipa-devirt.c (warn_types_mismatch): Don't call xstrdup for the
second demangled name.

gcc/testsuite/
* gcc.dg/lto/tag-1_0.c, gcc.dg/lto/tag-1_1.c: New test.

Index: gcc/ipa-utils.h
===
--- gcc/ipa-utils.h 2019-12-09 12:23:47.0 +
+++ gcc/ipa-utils.h 2019-12-09 12:23:48.326292463 +
@@ -256,6 +256,7 @@ get_odr_name_for_type (tree type)
 {
   tree type_name = TYPE_NAME (type);
   if (type_name == NULL_TREE
+  || TREE_CODE (type_name) != TYPE_DECL
   || !DECL_ASSEMBLER_NAME_SET_P (type_name))
 return NULL;
 
Index: gcc/ipa-devirt.c
===
--- gcc/ipa-devirt.c2019-12-09 12:23:47.0 +
+++ gcc/ipa-devirt.c2019-12-09 12:23:48.326292463 +
@@ -1042,7 +1042,7 @@ warn_types_mismatch (tree t1, tree t2, l
 {
   const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
   char *name1 = xstrdup (cplus_demangle (odr1, opts));
-  char *name2 = xstrdup (cplus_demangle (odr2, opts));
+  char *name2 = cplus_demangle (odr2, opts);
   if (name1 && name2 && strcmp (name1, name2))
{
  inform (loc_t1,
Index: gcc/testsuite/gcc.dg/lto/tag-1_0.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/lto/tag-1_0.c  2019-12-09 12:23:48.326292463 +
@@ -0,0 +1,5 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -Wodr -flto } } }  */
+
+struct foo { int x; };
+struct foo a = {};
Index: gcc/testsuite/gcc.dg/lto/tag-1_1.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/lto/tag-1_1.c  2019-12-09 12:23:48.326292463 +
@@ -0,0 +1,6 @@
+struct foo { short x; };
+
+extern struct foo a; /* { dg-lto-warning {type of 'a' does not match original declaration} } */
+struct foo *ptr = &a;
+
+int main () { return 0; }


Re: [PATCH 2/3] libgcc: Dont define __do_global_dtors_aux if it will be empty

2019-12-09 Thread Jozef Lawrynowicz
On Mon, 9 Dec 2019 13:19:22 +0100
Tobias Burnus  wrote:

> Hi, I see now the following error:
> 
> …/libgcc/crtstuff.c:372:52: error: operator '||' has no right operand
>372 |   || USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY
>|^
> /net/build5-trusty-cs/scratch/tburnus/mainline-nv/src/gcc-mainline/libgcc/crtstuff.c:254:17:
>  warning: '__DTOR_LIST__' defined but not used [-Wunused-variable]
>254 | STATIC func_ptr __DTOR_LIST__[1]
>| ^
> Makefile:1038: recipe for target 'crtbeginT.o' failed
> 
> Cheers,
> 
> Tobias

Sorry, I need to change that to defined(USE_EH_FRAME_REGISTRY). Committing
shortly.
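When a macro is defined with an empty replacement list, using it directly in an `#if` expression leaves the `||` with nothing to its right, which matches the error Tobias reported. A minimal sketch of the `defined()` fix, assuming (hypothetically) a configuration where USE_TM_CLONE_REGISTRY expands to 0 and USE_EH_FRAME_REGISTRY is defined empty; the macro values and the `need_dtors_aux` helper are local stand-ins, not the real crtstuff.c definitions.

```c
/* Hypothetical stand-ins for the crtstuff.c configuration macros.  */
#define USE_TM_CLONE_REGISTRY 0
#define USE_EH_FRAME_REGISTRY

/* "#if USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY" would expand to
   "#if 0 ||" here and fail with "operator '||' has no right operand".
   Testing whether the empty macro is defined works instead.  */
#if USE_TM_CLONE_REGISTRY || defined(USE_EH_FRAME_REGISTRY)
# define NEED_DTORS_AUX 1
#else
# define NEED_DTORS_AUX 0
#endif

/* Report which branch the preprocessor took.  */
int
need_dtors_aux (void)
{
  return NEED_DTORS_AUX;
}
```

The general rule is that `#ifdef`/`defined()` is the safe test for flag-style macros that may be defined without a value, while a bare `#if MACRO` only works for macros that expand to an integer.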

Thanks,
Jozef

> 
> On 11/6/19 5:17 PM, Jozef Lawrynowicz wrote:
> > __do_global_dtors_aux in crtstuff.c will not do anything meaningful if:
> >   * crtstuff.c is not being compiled for use in a shared library
> >   * the target uses .{init,fini}_array sections
> >   * TM clone registry is disabled
> >   * EH frame registry is disabled
> >
> > The attached patch prevents it from being defined at all if all the above
> > conditions are true. This saves code size in the final linked executable.
> >
> > 0002-libgcc-Dont-define-__do_global_dtors_aux-if-it-will-.patch
> >
> >  From 967262117f0c838fe8a9226484bf6e014c86f0ba Mon Sep 17 00:00:00 2001
> > From: Jozef Lawrynowicz
> > Date: Tue, 29 Oct 2019 13:02:08 +
> > Subject: [PATCH 2/3] libgcc: Dont define __do_global_dtors_aux if it will be
> >   empty
> >
> > libgcc/ChangeLog:
> >
> > 2019-11-06  Jozef Lawrynowicz
> >
> > * crtstuff.c (__do_global_dtors_aux): Wrap in #if so it's only defined
> > if it will have contents.
> >
> > ---
> >   libgcc/crtstuff.c | 9 -
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
> > index 9a3247b7848..0b0a0b865fe 100644
> > --- a/libgcc/crtstuff.c
> > +++ b/libgcc/crtstuff.c
> > @@ -368,8 +368,12 @@ extern void __cxa_finalize (void *) TARGET_ATTRIBUTE_WEAK;
> >  On some systems, this routine is run more than once from the .fini,
> >  when exit is called recursively, so we arrange to remember where in
> >  the list we left off processing, and we resume at that point,
> > -   should we be re-invoked.  */
> > +   should we be re-invoked.
> >   
> > +   This routine does not need to be run if none of the following clauses are
> > +   true, as it will not do anything, so can be removed.  */
> > +#if defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP) \
> > +  || USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY
> >   static void __attribute__((used))
> >   __do_global_dtors_aux (void)
> >   {
> > @@ -455,6 +459,9 @@ __do_global_dtors_aux_1 (void)
> >   CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__,
> >   __do_global_dtors_aux_1)
> >   #endif
> > +#endif /* defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP)
> > +  || defined(USE_TM_CLONE_REGISTRY) || defined(USE_EH_FRAME_REGISTRY) */
> > +
> >   
> >   #if USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY
> >   /* Stick a call to __register_frame_info into the .init section.  For some
> > -- 2.17.1  



[PATCH] rs6000: Name set_cc, and delete some old mfcr patterns

2019-12-09 Thread Segher Boessenkool
This names the so-far-unnamed basic mfcr pattern "set_cc", and
deletes all the others (only the ashift one was ever generated, and even
that one only once during a whole bootstrap+regtest; that one is
questionable anyway, since we don't cost that pattern correctly).

Tested on powerpc64-linux {-m32,-m64}.  (This defaults to power4, so
this code actually gets tested.)  Committing to trunk.
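The deleted ashift pattern in the diff below emits "mfcr; rlwinm" and computes the rotate count from `is_bit` (the CR bit holding the comparison result) and `put_bit = 31 - shift`. The sketch below checks that arithmetic in isolation, assuming PowerPC big-endian bit numbering (bit 0 is the MSB) and that `ccr_bit` returns the bit's position in the mfcr result; the helper names are hypothetical.

```c
#include <stdint.h>

/* Rotate X left by COUNT bits (COUNT taken modulo 32).  */
static uint32_t
rotl32 (uint32_t x, unsigned int count)
{
  count &= 31;
  return count ? (x << count) | (x >> (32 - count)) : x;
}

/* Rotate count as computed by the deleted pattern.  IS_BIT uses
   big-endian numbering; SHIFT is the ashift amount.  */
unsigned int
mfcr_rlwinm_count (int is_bit, int shift)
{
  int put_bit = 31 - (shift & 31);
  if (is_bit >= put_bit)
    return is_bit - put_bit;
  else
    return 32 - (put_bit - is_bit);
}

/* Simulate "mfcr rD; rlwinm rD,rD,count,put_bit,put_bit" on CR: rotate
   left, then keep only big-endian bit PUT_BIT, i.e. the value 1 << SHIFT.  */
uint32_t
simulate_mfcr_rlwinm (uint32_t cr, int is_bit, int shift)
{
  int put_bit = 31 - (shift & 31);
  uint32_t mask = UINT32_C (1) << (31 - put_bit);
  return rotl32 (cr, mfcr_rlwinm_count (is_bit, shift)) & mask;
}
```

The rotate moves big-endian bit `is_bit` to `is_bit - count` (mod 32), which is `put_bit` in both branches, so the result is the comparison bit shifted left by `shift`; the two-instruction sequence is what makes costing the combined pattern awkward.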


Segher


2019-12-09  Segher Boessenkool  

* config/rs6000/rs6000.md (unnamed mfcr define_insn): Name this
set_cc.
(unnamed define_insn_and_split): Delete.
(unnamed define_insn): Delete.
(unnamed define_insn): Delete.
(unnamed define_split): Delete.

---
 gcc/config/rs6000/rs6000.md | 111 +---
 1 file changed, 1 insertion(+), 110 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index f3c8eb0..4c44c1f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -11838,7 +11838,7 @@ (define_insn_and_split "*cmp_internal2"
 ;; mfcr and rlinm, but this is tricky.  Let's leave it for now.  In most
 ;; cases the insns below which don't use an intermediate CR field will
 ;; be used instead.
-(define_insn ""
+(define_insn "set_cc"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(match_operator:GPR 1 "scc_comparison_operator"
[(match_operand 2 "cc_reg_operand" "y")
@@ -11852,115 +11852,6 @@ (define_insn ""
(const_string "mfcr")))
(set_attr "length" "8")])
 
-(define_insn_and_split ""
-  [(set (match_operand:CC 0 "cc_reg_operand" "=x,?y")
-   (compare:CC (match_operator:SI 1 "scc_comparison_operator"
-  [(match_operand 2 "cc_reg_operand" "y,y")
-   (const_int 0)])
-   (const_int 0)))
-   (set (match_operand:SI 3 "gpc_reg_operand" "=r,r")
-   (match_op_dup 1 [(match_dup 2) (const_int 0)]))]
-  "TARGET_32BIT"
-  "@
-   mfcr %3%Q2\;rlwinm. %3,%3,%J1,1
-   #"
-  "&& reload_completed"
-  [(set (match_dup 3)
-   (match_op_dup 1 [(match_dup 2) (const_int 0)]))
-   (set (match_dup 0)
-   (compare:CC (match_dup 3)
-   (const_int 0)))]
-  ""
-  [(set_attr "type" "shift")
-   (set_attr "dot" "yes")
-   (set_attr "length" "8,16")])
-
-(define_insn ""
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
-   (ashift:SI (match_operator:SI 1 "scc_comparison_operator"
- [(match_operand 2 "cc_reg_operand" "y")
-  (const_int 0)])
-  (match_operand:SI 3 "const_int_operand" "n")))]
-  ""
-{
-  int is_bit = ccr_bit (operands[1], 1);
-  int put_bit = 31 - (INTVAL (operands[3]) & 31);
-  int count;
-
-  gcc_assert (is_bit != -1);
-  if (is_bit >= put_bit)
-count = is_bit - put_bit;
-  else
-count = 32 - (put_bit - is_bit);
-
-  operands[4] = GEN_INT (count);
-  operands[5] = GEN_INT (put_bit);
-
-  return "mfcr %0%Q2\;rlwinm %0,%0,%4,%5,%5";
-}
-  [(set (attr "type")
- (cond [(match_test "TARGET_MFCRF")
-   (const_string "mfcrf")
-  ]
-   (const_string "mfcr")))
-   (set_attr "length" "8")])
-
-(define_insn ""
-  [(set (match_operand:CC 0 "cc_reg_operand" "=x,?y")
-   (compare:CC
-(ashift:SI (match_operator:SI 1 "scc_comparison_operator"
-  [(match_operand 2 "cc_reg_operand" "y,y")
-   (const_int 0)])
-   (match_operand:SI 3 "const_int_operand" "n,n"))
-(const_int 0)))
-   (set (match_operand:SI 4 "gpc_reg_operand" "=r,r")
-   (ashift:SI (match_op_dup 1 [(match_dup 2) (const_int 0)])
-  (match_dup 3)))]
-  ""
-{
-  int is_bit = ccr_bit (operands[1], 1);
-  int put_bit = 31 - (INTVAL (operands[3]) & 31);
-  int count;
-
-  gcc_assert (is_bit != -1);
-  /* Force split for non-cc0 compare.  */
-  if (which_alternative == 1)
- return "#";
-
-  if (is_bit >= put_bit)
-count = is_bit - put_bit;
-  else
-count = 32 - (put_bit - is_bit);
-
-  operands[5] = GEN_INT (count);
-  operands[6] = GEN_INT (put_bit);
-
-  return "mfcr %4%Q2\;rlwinm. %4,%4,%5,%6,%6";
-}
-  [(set_attr "type" "shift")
-   (set_attr "dot" "yes")
-   (set_attr "length" "8,16")])
-
-(define_split
-  [(set (match_operand:CC 0 "cc_reg_not_cr0_operand")
-   (compare:CC
-(ashift:SI (match_operator:SI 1 "scc_comparison_operator"
-  [(match_operand 2 "cc_reg_operand")
-   (const_int 0)])
-   (match_operand:SI 3 "const_int_operand"))
-(const_int 0)))
-   (set (match_operand:SI 4 "gpc_reg_operand")
-   (ashift:SI (match_op_dup 1 [(match_dup 2) (const_int 0)])
-  (match_dup 3)))]
-  "reload_completed"
-  [(set (match_dup 4)
-   (ashift:SI (match_op_dup 1 [(match_dup 2) (const_int 0)])
-   

Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Martin Liška

On 12/9/19 1:14 PM, Martin Liška wrote:

> Hello.
>
> Based on the presentation Sriraman Tallam gave at an LLVM conference:
> https://www.youtube.com/watch?v=DySuXFGmB40
>
> I made a heatmap based on executed instruction addresses. I used
> $ perf record -F max -- ./cc1plus -fpreprocessed /home/marxin/Programming/tramp3d/tramp3d-v4.ii
> and
> $ perf script -F time,ip,dso
>
> I'm sending a link for my system GCC 9 (PGO+lean LTO bootstrap) and for GCC 10
> before and after my reorder patch (also PGO+lean LTO bootstrap).
>
> One can see quite significant clustering from 5s until the end of compilation.
> Link: https://drive.google.com/open?id=1M0YlxvQPyiVguy5VWRC8dG52UArwAuKS
>
> Martin


For completeness, the heatmap was generated with the following script:
https://github.com/marxin/script-misc/blob/master/binary-heatmap.py

Martin
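The heatmap idea is to bin each (time, instruction address) sample into a 2D grid. Martin's actual script (binary-heatmap.py, linked above) is Python and is not reproduced here; this is a hypothetical C sketch of the binning step only. It assumes each line of `perf script -F time,ip,dso` output looks like `<seconds>: <hex ip> (<dso>)`, and the 64x64 grid size is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical grid size; the real script's resolution is not stated.  */
#define TIME_BINS 64
#define ADDR_BINS 64

/* Parse one line of "perf script -F time,ip,dso" output, assuming the
   shape "<seconds>: <hex ip> (<dso>)".  Returns 1 on success.  */
int
parse_perf_line (const char *line, double *time, unsigned long *ip)
{
  return sscanf (line, " %lf: %lx", time, ip) == 2;
}

/* Clamp a bin index into [0, N - 1] to guard against rounding.  */
static int
clamp_bin (int i, int n)
{
  return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Map a sample to a cell of a TIME_BINS x ADDR_BINS grid, given the time
   range [T_MIN, T_MAX) and the address range [A_MIN, A_MAX) of the text
   section.  Returns -1 for samples outside either range.  */
int
heatmap_cell (double time, unsigned long ip,
              double t_min, double t_max,
              unsigned long a_min, unsigned long a_max)
{
  if (time < t_min || time >= t_max || ip < a_min || ip >= a_max)
    return -1;
  int tx = (int) ((time - t_min) / (t_max - t_min) * TIME_BINS);
  int ay = (int) ((double) (ip - a_min) / (double) (a_max - a_min)
                  * ADDR_BINS);
  return clamp_bin (ay, ADDR_BINS) * TIME_BINS + clamp_bin (tx, TIME_BINS);
}
```

A driver would read lines with `fgets`, skip samples whose DSO is not cc1plus, and increment `grid[heatmap_cell (...)]`. Hardcoded `t_min`/`a_min` bounds that do not match the binary being profiled put every sample out of range, so the bounds are best derived from the data.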


Re: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

2019-12-09 Thread Martin Liška

Hello.

The patch triggers the following warning:

In file included from /home/marxin/Programming/gcc/gcc/regstat.c:23:
/home/marxin/Programming/gcc/gcc/regstat.c: In function ‘void 
regstat_bb_compute_calls_crossed(unsigned int, bitmap)’:
/home/marxin/Programming/gcc/gcc/regstat.c:327:35: warning: comparison of 
integer expressions of different signedness: ‘int’ and ‘unsigned int’ 
[-Wsign-compare]
  327 |   gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
/home/marxin/Programming/gcc/gcc/system.h:748:14: note: in definition of macro 
‘gcc_assert’
  748 |((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 
: 0))
  |  ^~~~

What about something like:

 gcc/regstat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/regstat.c b/gcc/regstat.c
index c6cefb117d7..035d48c28ab 100644
--- a/gcc/regstat.c
+++ b/gcc/regstat.c
@@ -324,7 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index, bitmap live)
 
   FOR_BB_INSNS_REVERSE (bb, insn)

 {
-  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
+  gcc_assert (INSN_UID (insn) < (int)DF_INSN_SIZE ());
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   unsigned int regno;
 
Martin
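The warning arises because INSN_UID yields an int while DF_INSN_SIZE () yields an unsigned int (as the diagnostic says), so the usual arithmetic conversions compare both operands as unsigned. A minimal sketch of the two comparisons, using hypothetical helper names in place of the real macros:

```c
#include <stdbool.h>

/* What "INSN_UID (insn) < DF_INSN_SIZE ()" does implicitly: UID is
   converted to unsigned int, so a negative UID would wrap to a huge
   value.  Hypothetical helper, not GCC code.  */
bool
uid_below_size_as_unsigned (int uid, unsigned int size)
{
  return (unsigned int) uid < size;
}

/* The proposed fix: compare as int instead, written with a space after
   the cast as GNU style requires.  Assumes SIZE <= INT_MAX.  */
bool
uid_below_size_as_int (int uid, unsigned int size)
{
  return uid < (int) size;
}
```

For non-negative UIDs and sizes up to INT_MAX both forms agree; they differ only for negative UIDs, where the signed comparison keeps its arithmetic meaning.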


Re: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

2019-12-09 Thread Matthew Malcomson
Ah,  apologies -- you're right.
I'd already committed the patch this morning, so I'll update it with the 
obvious fix.

Thanks for the catch,
Matthew

On 09/12/2019 12:48, Martin Liška wrote:
> Hello.
> 
> The patch triggers the following warning:
> 
> In file included from /home/marxin/Programming/gcc/gcc/regstat.c:23:
> /home/marxin/Programming/gcc/gcc/regstat.c: In function ‘void 
> regstat_bb_compute_calls_crossed(unsigned int, bitmap)’:
> /home/marxin/Programming/gcc/gcc/regstat.c:327:35: warning: comparison 
> of integer expressions of different signedness: ‘int’ and ‘unsigned int’ 
> [-Wsign-compare]
>    327 |   gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
> /home/marxin/Programming/gcc/gcc/system.h:748:14: note: in definition of 
> macro ‘gcc_assert’
>    748 |    ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, 
> __FUNCTION__), 0 : 0))
>    |  ^~~~
> 
> What about something like:
> 
>   gcc/regstat.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/regstat.c b/gcc/regstat.c
> index c6cefb117d7..035d48c28ab 100644
> --- a/gcc/regstat.c
> +++ b/gcc/regstat.c
> @@ -324,7 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int 
> bb_index, bitmap live)
> 
>     FOR_BB_INSNS_REVERSE (bb, insn)
>   {
> -  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
> +  gcc_assert (INSN_UID (insn) < (int)DF_INSN_SIZE ());
>     struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
>     unsigned int regno;
> 
> Martin



Re: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

2019-12-09 Thread Segher Boessenkool
On Mon, Dec 09, 2019 at 01:48:51PM +0100, Martin Liška wrote:
> -  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
> +  gcc_assert (INSN_UID (insn) < (int)DF_INSN_SIZE ());

Space after cast please.


Segher


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Jan Hubicka
> On 12/9/19 1:14 PM, Martin Liška wrote:
> > Hello.
> > 
> > Based on the presentation Sriraman Tallam gave at an LLVM conference:
> > https://www.youtube.com/watch?v=DySuXFGmB40
> > 
> > I made a heatmap based on executed instruction addresses. I used
> > $ perf record -F max -- ./cc1plus -fpreprocessed /home/marxin/Programming/tramp3d/tramp3d-v4.ii
> > and
> > $ perf script -F time,ip,dso
> > 
> > I'm sending a link for my system GCC 9 (PGO+lean LTO bootstrap) and for GCC 10
> > before and after my reorder patch (also PGO+lean LTO bootstrap).
> > 
> > One can see quite significant clustering from 5s until the end of compilation.
> > Link: https://drive.google.com/open?id=1M0YlxvQPyiVguy5VWRC8dG52UArwAuKS
> > 
> > Martin
> 
> For completeness, the heatmap was generated with the following script:
> https://github.com/marxin/script-misc/blob/master/binary-heatmap.py

Thanks,
this looks really useful, as we had almost no way to check code layout
ever since your systemtap script stopped working.

At first glance, the difference between gcc9 and gcc10 is explained
by the changes to profile updating. gcc9 makes very small cold
partitions compared to gcc10.  It is very nice that we have a way to
measure it. I will also check whether some of the more important profile
update fixes make sense to backport to gcc9.

Over the weekend I did some fixes to tp reordering, so it may be nice to
update your tests, but I will try to run it myself.

In general one can see individual stages of compilation on the graph -
parsing, early lowering, early opts.  On bigger programs this should be
more visible.  I will give it a try.

Honza


Re: [PATCH 3/3] libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE

2019-12-09 Thread Jozef Lawrynowicz
On Sat, 07 Dec 2019 11:27:54 -0700
Jeff Law  wrote:

> On Wed, 2019-11-06 at 16:19 +, Jozef Lawrynowicz wrote:
> > From 7bc0971d2936ebe71e7b7d3d805cf1bbf9f9f5af Mon Sep 17 00:00:00 2001
> > From: Jozef Lawrynowicz 
> > Date: Mon, 4 Nov 2019 17:38:13 +
> > Subject: [PATCH 3/3] libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE
> > 
> > gcc/ChangeLog:
> > 
> > 2019-11-06  Jozef Lawrynowicz  
> > 
> > * doc/tm.texi: Regenerate.
> > * doc/tm.texi.in: Define TARGET_LIBGCC_REMOVE_DSO_HANDLE.
> > 
> > libgcc/ChangeLog:
> > 
> > 2019-11-06  Jozef Lawrynowicz  
> > 
> > * crtstuff.c: Don't declare __dso_handle if
> > TARGET_LIBGCC_REMOVE_DSO_HANDLE is defined.  
> Presumably you'll switch this on for your bare elf target
> configuration?

Yep that's the plan. I originally didn't include the target changes in
this patch since other target changes (disabling __cxa_atexit) were required for
the removal of __dso_handle to be OK.

> 
> Are there other things, particularly related to shared library support,
> that we wouldn't need to use as well?  The reason I ask is I'm trying
> to figure out if REMOVE_DSO_HANDLE is the right name or if we should
> generalize it to a name that indicates shared libraries in general
> aren't supported on the target.

CRTSTUFFS_O is defined when compiling shared versions of crt{begin,end} and
handles an extra case in crtstuff.c where there's some shared library related
functionality we don't need on MSP430.

But when CRTSTUFFS_O is undefined, __dso_handle is still defined, with the
value 0. The comment gives some additional insight:

/* Declare the __dso_handle variable.  It should have a unique value  
   in every shared-object; in a main program its value is zero.  The  
   object should in any case be protected.  This means the instance   
   in one DSO or the main program is not used in another object.  The 
   dynamic linker takes care of this.  */ 

I haven't noticed any further shared library-related bloat coming from libgcc.

I think a better way of solving this problem is just to check
DEFAULT_USE_CXA_ATEXIT rather than adding this new macro. If __cxa_atexit is
not enabled then as far as I understand __dso_handle serves no purpose.
DEFAULT_USE_CXA_ATEXIT is defined at configure time for any targets that want
__cxa_atexit support.

A quick bootstrap and test of dg.exp on x86_64-pc-linux-gnu shows no issues
with the following:

> diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
> index ae6328d317d..349f8191e61 100644
> --- a/libgcc/crtstuff.c
> +++ b/libgcc/crtstuff.c
> @@ -340,8 +340,10 @@ extern void *__dso_handle __attribute__ ((__visibility__ 
> ("hidden")));
>  #ifdef CRTSTUFFS_O
>  void *__dso_handle = &__dso_handle;
>  #else
> +#if DEFAULT_USE_CXA_ATEXIT
>  void *__dso_handle = 0;
>  #endif
> +#endif
>  
>  /* The __cxa_finalize function may not be available so we use only a
> weak declaration.  */

I'll put that patch through some more rigorous testing.

Thanks,
Jozef
> 
> Jeff


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Martin Liška

On 12/9/19 2:03 PM, Jan Hubicka wrote:

>> On 12/9/19 1:14 PM, Martin Liška wrote:
>>> Hello.
>>>
>>> Based on the presentation Sriraman Tallam gave at an LLVM conference:
>>> https://www.youtube.com/watch?v=DySuXFGmB40
>>>
>>> I made a heatmap based on executed instruction addresses. I used
>>> $ perf record -F max -- ./cc1plus -fpreprocessed /home/marxin/Programming/tramp3d/tramp3d-v4.ii
>>> and
>>> $ perf script -F time,ip,dso
>>>
>>> I'm sending a link for my system GCC 9 (PGO+lean LTO bootstrap) and for GCC 10
>>> before and after my reorder patch (also PGO+lean LTO bootstrap).
>>>
>>> One can see quite significant clustering from 5s until the end of compilation.
>>> Link: https://drive.google.com/open?id=1M0YlxvQPyiVguy5VWRC8dG52UArwAuKS
>>>
>>> Martin
>>
>> For completeness, the heatmap was generated with the following script:
>> https://github.com/marxin/script-misc/blob/master/binary-heatmap.py
>
> Thanks,
> this looks really useful, as we had almost no way to check code layout
> ever since your systemtap script stopped working.

Great, thanks.

> At first glance, the difference between gcc9 and gcc10 is explained
> by the changes to profile updating. gcc9 makes very small cold
> partitions compared to gcc10.  It is very nice that we have a way to
> measure it. I will also check whether some of the more important profile
> update fixes make sense to backport to gcc9.
>
> Over the weekend I did some fixes to tp reordering, so it may be nice to
> update your tests, but I will try to run it myself.
>
> In general one can see individual stages of compilation on the graph -
> parsing, early lowering, early opts.  On bigger programs this should be
> more visible.  I will give it a try.

You haven't replied to my question: do we want to let ipa-reorder into
trunk, based on the images I sent for the GCC 10 PGO+LTO bootstrap?

Martin

> Honza





Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-09 Thread Stam Markianos-Wright


On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
> 
> 
> On 12/2/19 9:27 PM, Joseph Myers wrote:
>> On Mon, 2 Dec 2019, Jeff Law wrote:
>>
 2019-11-13  Stam Markianos-Wright  

     * real.c (struct arm_bfloat_half_format,
     encode_arm_bfloat_half, decode_arm_bfloat_half): New.
     * real.h (arm_bfloat_half_format): New.


>>> Generally OK.  Please consider using "arm_bfloat_half" instead of
>>> "bfloat_half" for the name field in the arm_bfloat_half_format
>>> structure.  I'm not sure if that's really visible externally, but it
>>
> Hi both! Agreed that we want to be conservative. See latest diff 
> attached with the name field change (also pasted below).

Ping :)
> 

>> Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit
>> with Arm and Intel using different rounding modes)?
> 
> Yes it is remarkably similar, but there's really only so much variation 
> you can have with what is half an f32!
> 
> Cheers,
> Stam
> 
> 
>>
> 
> 
> diff --git a/gcc/real.h b/gcc/real.h
> index 0f660c9c671..2b337bb7f7d 100644
> --- a/gcc/real.h
> +++ b/gcc/real.h
> @@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format;
>   extern const struct real_format decimal_quad_format;
>   extern const struct real_format ieee_half_format;
>   extern const struct real_format arm_half_format;
> +extern const struct real_format arm_bfloat_half_format;
> 
> 
>   /* 
> == */
> diff --git a/gcc/real.c b/gcc/real.c
> index 134240a6be9..07b63b6f27e 100644
> --- a/gcc/real.c
> +++ b/gcc/real.c
> @@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, 
> REAL_VALUE_TYPE *r,
>   }
>   }
> 
> +/* Encode arm_bfloat types.  */
> +static void
> +encode_arm_bfloat_half (const struct real_format *fmt, long *buf,
> +    const REAL_VALUE_TYPE *r)
> +{
> +  unsigned long image, sig, exp;
> +  unsigned long sign = r->sign;
> +  bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0;
> +
> +  image = sign << 15;
> +  sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f;
> +
> +  switch (r->cl)
> +    {
> +    case rvc_zero:
> +  break;
> +
> +    case rvc_inf:
> +  if (fmt->has_inf)
> +    image |= 255 << 7;
> +  else
> +    image |= 0x7fff;
> +  break;
> +
> +    case rvc_nan:
> +  if (fmt->has_nans)
> +    {
> +  if (r->canonical)
> +    sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0);
> +  if (r->signalling == fmt->qnan_msb_set)
> +    sig &= ~(1 << 6);
> +  else
> +    sig |= 1 << 6;
> +  if (sig == 0)
> +    sig = 1 << 5;
> +
> +  image |= 255 << 7;
> +  image |= sig;
> +    }
> +  else
> +    image |= 0x7fff;
> +  break;
> +
> +    case rvc_normal:
> +  if (denormal)
> +    exp = 0;
> +  else
> +  exp = REAL_EXP (r) + 127 - 1;
> +  image |= exp << 7;
> +  image |= sig;
> +  break;
> +
> +    default:
> +  gcc_unreachable ();
> +    }
> +
> +  buf[0] = image;
> +}
> +
> +/* Decode arm_bfloat types.  */
> +static void
> +decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
> +    const long *buf)
> +{
> +  unsigned long image = buf[0] & 0xffff;
> +  bool sign = (image >> 15) & 1;
> +  int exp = (image >> 7) & 0xff;
> +
> +  memset (r, 0, sizeof (*r));
> +  image <<= HOST_BITS_PER_LONG - 8;
> +  image &= ~SIG_MSB;
> +
> +  if (exp == 0)
> +    {
> +  if (image && fmt->has_denorm)
> +    {
> +  r->cl = rvc_normal;
> +  r->sign = sign;
> +  SET_REAL_EXP (r, -126);
> +  r->sig[SIGSZ-1] = image << 1;
> +  normalize (r);
> +    }
> +  else if (fmt->has_signed_zero)
> +    r->sign = sign;
> +    }
> +  else if (exp == 255 && (fmt->has_nans || fmt->has_inf))
> +    {
> +  if (image)
> +    {
> +  r->cl = rvc_nan;
> +  r->sign = sign;
> +  r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1)
> +   ^ fmt->qnan_msb_set);
> +  r->sig[SIGSZ-1] = image;
> +    }
> +  else
> +    {
> +  r->cl = rvc_inf;
> +  r->sign = sign;
> +    }
> +    }
> +  else
> +    {
> +  r->cl = rvc_normal;
> +  r->sign = sign;
> +  SET_REAL_EXP (r, exp - 127 + 1);
> +  r->sig[SIGSZ-1] = image | SIG_MSB;
> +    }
> +}
> +
>   /* Half-precision format, as specified in IEEE 754R.  */
>   const struct real_format ieee_half_format =
>     {
> @@ -4848,6 +4958,33 @@ const struct real_format arm_half_format =
>   false,
>   "arm_half"
>     };
> +
> +/* ARM Bfloat half-precision format.  This format resembles a truncated
> +   (16-bit) version of the 32-bit IEEE 754 single-precision floating-point
> +   format.  */
> +const struct real_format arm_bfloat_half_format =
> +  {
> +    encode_arm_bfloat_half,
> +    decode_arm_bfloat_half,
> +    2,
> +    8,
> +    8,
> +    -125,
> +    128,
> +    15,
> +    15,
> +    0,
> +    false,
> +    true,
> +    true,
> +    true,
> +    true,
>
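The encoder in the patch places the sign in bit 15, the biased exponent (bias 127) in bits 14..7 and seven significand bits in bits 6..0, which is exactly the top half of an IEEE single. A standalone sketch of that layout, assuming `float` is IEEE binary32; unlike a proper conversion it simply truncates (round toward zero), and truncating a NaN whose payload sits only in the low 16 bits would yield infinity. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Keep the top 16 bits of the binary32 encoding (truncation).  */
uint16_t
float_to_bfloat16_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Widening is exact: append 16 zero bits.  */
float
bfloat16_to_float (uint16_t b)
{
  uint32_t bits = (uint32_t) b << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Field accessors matching the layout used by encode_arm_bfloat_half.  */
unsigned int bf16_sign (uint16_t b) { return (b >> 15) & 1; }
unsigned int bf16_exp (uint16_t b) { return (b >> 7) & 0xff; }
unsigned int bf16_frac (uint16_t b) { return b & 0x7f; }
```

For example, 1.5f is 0x3FC00000 as a single, so its bfloat16 image is 0x3FC0 with exponent field 127 and fraction 0x40.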

Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Jan Hubicka
> > At first glance, the difference between gcc9 and gcc10 is explained
> > by the changes to profile updating. gcc9 makes very small cold
> > partitions compared to gcc10.  It is very nice that we have a way to
> > measure it. I will also check whether some of the more important profile
> > update fixes make sense to backport to gcc9.
> > 
> > Over the weekend I did some fixes to tp reordering, so it may be nice to
> > update your tests, but I will try to run it myself.
> > 
> > In general one can see individual stages of compilation on the graph -
> > parsing, early lowering, early opts.  On bigger programs this should be
> > more visible.  I will give it a try.
> 
> You haven't replied to my question: do we want to let ipa-reorder into
> trunk, based on the images I sent for the GCC 10 PGO+LTO bootstrap?

My concern is still the same: while I like the patch, I am worried that
we have only one example where it produces some benefit. For this reason
I looked into tp_first_run issues this weekend and found and fixed some
issues / verified that the order now seems to be fine for the cc1 binary and
partly for Firefox.

We do run periodic benchmarks to keep optimization in shape, but the code layout
stuff is out in the wild and no one is verifying that it works. It is now
a somewhat nontrivial piece of code, and thus it is no big surprise that there
are a number of bugs.

I like the heatmap generator (but for some reason it generates empty
PNGs for me. Maybe things are just out of range since bounds are
hardcoded in your script.)

I don't have much time today, but tomorrow I will try to get it
tested and send some comments on the patch.

Honza


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-09 Thread Martin Liška

Hi.

I've just updated the script a bit and also added an address histogram:
https://drive.google.com/file/d/11s9R_JnEMohDE6ctqzsj092QD22HKXJI/view?usp=sharing

Martin


[PATCH, COMMITTED] arm: fix v[78]-r multilibs when configured with --with-multilib-list=aprofile

2019-12-09 Thread Richard Earnshaw (lists)
When gcc for Arm is configured with --with-multilib-list=aprofile, a
misplaced endif directive in the makefile was causing the arm->thumb
mapping for multilibs to be omitted from the reuse rules.  This resulted
in the default multilib being picked rather than the thumb2-optimized
version.


* config/arm/t-multilib: Use arm->thumb multilib reuse rules
on a-profile.

Committed to trunk.
diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib
index dc97c8f09fb..d5ee537193f 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -185,6 +185,8 @@ MULTILIB_MATCHES	+= march?armv7=march?armv8.5-a
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_5_a_simd_variants), \
 			 march?armv7+fp=march?armv8.5-a$(ARCH))
 
+endif		# Not APROFILE.
+
 # Use Thumb libraries for everything.
 
 MULTILIB_REUSE		+= mthumb/march.armv7/mfloat-abi.soft=marm/march.armv7/mfloat-abi.soft
@@ -198,4 +200,3 @@ MULTILIB_REUSE		+= $(foreach MODE, arm thumb, \
 			 $(foreach ARCH, armv7, \
 			   mthumb/march.$(ARCH)/mfloat-abi.soft=m$(MODE)/march.$(ARCH)/mfloat-abi.softfp))
 
-endif		# Not APROFILE.


Re: [patch] Fix ICE on VLA in LTO mode

2019-12-09 Thread Richard Biener
On Mon, Dec 9, 2019 at 1:23 PM Eric Botcazou  wrote:
>
> Hi,
>
> this is a regression present on the mainline and 9 branch: the compiler gives
> an ICE for the attached Ada testcase on the following assertion:
>
>   if (DECL_P (ref))
> {
>   /* We shouldn't have true variables here.  */
>   gcc_assert (TREE_READONLY (ref));
>   subst = ref;
> }
>
> in self_referential_size because the size function machinery is invoked again
> by the *free_lang_data pass, more precisely from fld_process_array_type:
>
>   if (!existed)
> {
>   array
> = build_array_type_1 (t2, TYPE_DOMAIN (t), TYPE_TYPELESS_STORAGE (t),
>   false, false);
>   TYPE_CANONICAL (array) = TYPE_CANONICAL (t);
>   if (!fld->pset.add (array))
> add_tree_to_fld_list (array, fld);
> }
>
> through the call to build_array_type_1, and more precisely through the
> recursive call made for computing TYPE_CANONICAL:
>
>   if (TYPE_CANONICAL (t) == t)
> {
>   if (TYPE_STRUCTURAL_EQUALITY_P (elt_type)
>   || (index_type && TYPE_STRUCTURAL_EQUALITY_P (index_type))
>   || in_lto_p)
> SET_TYPE_STRUCTURAL_EQUALITY (t);
>   else if (TYPE_CANONICAL (elt_type) != elt_type
>|| (index_type && TYPE_CANONICAL (index_type) != index_type))
> TYPE_CANONICAL (t)
>   = build_array_type_1 (TYPE_CANONICAL (elt_type),
> index_type
> ? TYPE_CANONICAL (index_type) : NULL_TREE,
> typeless_storage, shared, set_canonical);
> }
>
>
> That's a bit surprising because t2 is an incomplete type and we have these
> lines in build_array_type_1 just before:
>
>   /* If the element type is incomplete at this point we get marked for
>  structural equality.  Do not record these types in the canonical
>  type hashtable.  */
>   if (TYPE_STRUCTURAL_EQUALITY_P (t))
> return t;
>
> so the computation of TYPE_CANONICAL should be skipped.  But it turns out that
> these lines from 2009 are obsolete because layout_type no longer forces the
> TYPE_STRUCTURAL_EQUALITY_P on the array when the element type is incomplete.
>
>
> Since fld_process_array_type overwrites TYPE_CANONICAL just after the call to
> build_array_type_1, there is no point in the latter computing it, so the
> proposed fix is to add a new SET_CANONICAL parameter to build_array_type_1.
>
> Tested on x86_64-suse-linux, OK for mainline and 9 branch?

OK.

>
> 2019-12-09  Eric Botcazou  
>
> * tree.c (build_array_type_1): Add SET_CANONICAL parameter and compute
> TYPE_CANONICAL from the element type only if it is set.  Remove obsolete
> lines and adjust recursive call.
> (fld_process_array_type): Adjust call to build_array_type_1.
> (build_array_type): Likewise.
> (build_nonshared_array_type): Likewise.
>
>
> 2019-12-09  Eric Botcazou  
>
> * gnat.dg/lto23.adb: New test.
>
> --
> Eric Botcazou


Re: [PATCH] OpenACC reference count overhaul

2019-12-09 Thread Thomas Schwinge
Hi Julian!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t s, int 
> async, const char *libfnname)

>if (f & FLAG_COPYOUT)
> [...]
> gomp_copy_dev2host (acc_dev, aq, h, d, s);
>   }
> -  gomp_remove_var (acc_dev, n);
> +  gomp_remove_var_async (acc_dev, n, aq);

Conceptually, do I understand correctly that we need to use this (new)
'gomp_remove_var_async' to make sure that we don't
'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above is
still in progress?

I'm curious why this isn't causing any problems for nvptx offloading
already, any thoughts on that?  Or, is this just missing test coverage?
(Always difficult for 'async' stuff, of course.)  By chance, is this
right now already causing problems with AMD GCN offloading?  (I really
need to set up AMD GCN offloading testing...)


I'm citing below the changes introducing 'gomp_remove_var_async',
modelled similar to the existing 'gomp_unmap_vars_async'.


Also for both these, do I understand correctly, that it's actually not
the 'gomp_unref_tgt' that needs to be "delayed" via 'goacc_asyncqueue',
but rather really only the 'gomp_free_device_memory', called via
'gomp_unmap_tgt', called via 'gomp_unref_tgt'?  In other words: why do we
need to keep the 'struct target_mem_desc' alive?  Per my understanding,
that one is one component of the mapping table, and not relevant anymore
(thus can be 'free'd) as soon as it has been determined that
'tgt->refcount == 0'?  Am I missing something there?

It will be OK to clean that up later, but I'd like to understand this
now.  Well, or, stating that you just blindly copied that from the
existing 'gomp_unmap_vars_async' is fine, too!  ;-P


Grüße
 Thomas


> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -1092,32 +1106,66 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
>    free (tgt);
>  }
>  
> -attribute_hidden bool
> -gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
> +static bool
> +gomp_unref_tgt (void *ptr)
>  {
>    bool is_tgt_unmapped = false;
> -  splay_tree_remove (&devicep->mem_map, k);
> -  if (k->link_key)
> -    splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key);
> -  if (k->tgt->refcount > 1)
> -    k->tgt->refcount--;
> +
> +  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> +
> +  if (tgt->refcount > 1)
> +    tgt->refcount--;
>    else
>      {
> +      gomp_unmap_tgt (tgt);
>        is_tgt_unmapped = true;
> -      gomp_unmap_tgt (k->tgt);
>      }
> +
>    return is_tgt_unmapped;
>  }
>  
>  static void
> -gomp_unref_tgt (void *ptr)
> +gomp_unref_tgt_void (void *ptr)
>  {
> -  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> +  (void) gomp_unref_tgt (ptr);
> +}
>  
> -  if (tgt->refcount > 1)
> -    tgt->refcount--;
> +static inline __attribute__((always_inline)) bool
> +gomp_remove_var_internal (struct gomp_device_descr *devicep, splay_tree_key k,
> +			  struct goacc_asyncqueue *aq)
> +{
> +  bool is_tgt_unmapped = false;
> +  splay_tree_remove (&devicep->mem_map, k);
> +  if (k->virtual_refcount == VREFCOUNT_LINK_KEY)
> +    {
> +      if (k->u.link_key)
> +	splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->u.link_key);
> +    }
> +  if (aq)
> +    devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void,
> +						(void *) k->tgt);
>    else
> -    gomp_unmap_tgt (tgt);
> +    is_tgt_unmapped = gomp_unref_tgt ((void *) k->tgt);
> +  return is_tgt_unmapped;
> +}
> +
> +attribute_hidden bool
> +gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
> +{
> +  return gomp_remove_var_internal (devicep, k, NULL);
> +}
> +
> +/* Remove a variable asynchronously.  This actually removes the variable
> +   mapping immediately, but retains the linked target_mem_desc until the
> +   asynchronous operation has completed (as it may still refer to target
> +   memory).  The device lock must be held before entry, and remains locked on
> +   exit.  */
> +
> +attribute_hidden void
> +gomp_remove_var_async (struct gomp_device_descr *devicep, splay_tree_key k,
> +		       struct goacc_asyncqueue *aq)
> +{
> +  (void) gomp_remove_var_internal (devicep, k, aq);
>  }
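The ordering problem Thomas asks about can be illustrated with a self-contained C sketch. Everything in it is hypothetical: `struct toy_queue` stands in for `goacc_asyncqueue` and `queue_callback_func`, and a boolean flag stands in for actually freeing device memory. As in the patch, the whole unref is deferred; the sketch only shows that, when a queue is given, nothing is freed until operations queued earlier (such as the device-to-host copy) have drained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the objects involved; these are not libgomp's
   real types.  */
struct tgt_desc
{
  int refcount;
  bool device_memory_freed;   /* stands in for gomp_free_device_memory */
};

typedef void (*queue_cb) (void *);

/* Toy stand-in for a goacc_asyncqueue: callbacks run in FIFO order when the
   queue is synchronized, i.e. after every operation queued before them.  */
#define TOY_QUEUE_CAP 16
struct toy_queue
{
  queue_cb fn[TOY_QUEUE_CAP];
  void *data[TOY_QUEUE_CAP];
  int n;
};

static void
toy_queue_synchronize (struct toy_queue *q)
{
  for (int i = 0; i < q->n; i++)
    q->fn[i] (q->data[i]);
  q->n = 0;
}

/* Plays the role of queue_callback_func.  */
static void
toy_queue_callback (struct toy_queue *q, queue_cb fn, void *data)
{
  if (q->n == TOY_QUEUE_CAP)
    toy_queue_synchronize (q);   /* drain earlier work to make room */
  q->fn[q->n] = fn;
  q->data[q->n] = data;
  q->n++;
}

/* Mirrors gomp_unref_tgt: drop one reference, free on the last one.  */
static bool
unref_tgt (void *ptr)
{
  struct tgt_desc *tgt = ptr;
  if (tgt->refcount > 1)
    {
      tgt->refcount--;
      return false;
    }
  tgt->device_memory_freed = true;   /* stands in for gomp_unmap_tgt */
  return true;
}

static void
unref_tgt_void (void *ptr)
{
  (void) unref_tgt (ptr);
}

/* Mirrors gomp_remove_var_internal: with a queue, the unref is deferred so
   that a device-to-host copy queued earlier can still read the memory.  */
static bool
remove_var (struct tgt_desc *tgt, struct toy_queue *aq)
{
  if (aq)
    {
      toy_queue_callback (aq, unref_tgt_void, tgt);
      return false;
    }
  return unref_tgt (tgt);
}
```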




Re: [committed, amdgcn] Fix unrecognised instruction

2019-12-09 Thread Andrew Stubbs

On 06/12/2019 17:57, Andrew Stubbs wrote:

> Hi all,
>
> I've committed the attached to fix a failure-to-assemble bug that can
> occur in some vectorized code.  This has been hidden for a long time
> because sub-word vectors were disabled on GCN, but this is no longer the
> case.
>
> The gather load instructions had the suffixes for store, which didn't
> assemble well.  E.g. it had 'flat_load_short', instead of
> 'flat_load_ustore'.

That should have been "ushort".

> This fixes about 39 tests in vect.exp.


And this patch does the same for 'global_load_ushort', which fixes the 
same tests in the GCN5 multilibs.
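The underlying issue is which mnemonic suffix the output template prints. For sub-dword accesses, GCN load mnemonics spell out the signedness (for example `global_load_ushort`), while store mnemonics do not (`global_store_short`), so a load printed with a store-style suffix does not assemble. The C sketch below is hypothetical and is not GCC's `print_operand` handling of `%s`/`%o`; it only models unsigned-load and store suffix tables and builds a load mnemonic from the load table.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not GCC's print_operand.  They model the fact that
   GCN load mnemonics name the signedness of sub-dword accesses while store
   mnemonics do not.  Only unsigned loads are modelled.  */
static const char *
load_suffix (int bytes)
{
  switch (bytes)
    {
    case 1: return "_ubyte";
    case 2: return "_ushort";
    case 4: return "_dword";
    default: return NULL;
    }
}

static const char *
store_suffix (int bytes)
{
  switch (bytes)
    {
    case 1: return "_byte";
    case 2: return "_short";
    case 4: return "_dword";
    default: return NULL;
    }
}

/* Build "<prefix>_load<suffix>" from the load-style table, which is the
   effect the thread describes for switching the template from %s0 to %o0.
   Returns the snprintf result, or -1 for an unsupported access size.  */
static int
format_load (char *buf, size_t len, const char *prefix, int bytes)
{
  const char *sfx = load_suffix (bytes);
  if (!sfx)
    return -1;
  return snprintf (buf, len, "%s_load%s", prefix, sfx);
}
```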


Andrew
Fix more unrecognised GCN instructions

2019-12-09  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (gather_insn_1offset): Change
	%s to %o in asm output.
	(gather_insn_2offsets): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 95e0731a374..16b37e8daab 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -722,7 +722,7 @@
 	  sprintf (buf, "flat_load%%o0\t%%0, %%1%s\;s_waitcnt\t0", glc);
   }
 else if (AS_GLOBAL_P (as))
-  sprintf (buf, "global_load%%s0\t%%0, %%1, off offset:%%2%s\;"
+  sprintf (buf, "global_load%%o0\t%%0, %%1, off offset:%%2%s\;"
 	   "s_waitcnt\tvmcnt(0)", glc);
 else
   gcc_unreachable ();
@@ -780,7 +780,7 @@
 	/* Work around assembler bug in which a 64-bit register is expected,
 	but a 32-bit value would be correct.  */
 	int reg = REGNO (operands[2]) - FIRST_VGPR_REG;
-	sprintf (buf, "global_load%%s0\t%%0, v[%d:%d], %%1 offset:%%3%s\;"
+	sprintf (buf, "global_load%%o0\t%%0, v[%d:%d], %%1 offset:%%3%s\;"
 		  "s_waitcnt\tvmcnt(0)", reg, reg + 1, glc);
   }
 else


Re: [PATCH] OpenACC reference count overhaul

2019-12-09 Thread Julian Brown
On Mon, 9 Dec 2019 15:44:25 +0100
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2019-10-03T09:35:04-0700, Julian Brown 
> wrote:
> > --- a/libgomp/oacc-mem.c
> > +++ b/libgomp/oacc-mem.c  
> 
> > @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
> 
> >    if (f & FLAG_COPYOUT)
> > [...]
> >   gomp_copy_dev2host (acc_dev, aq, h, d, s);
> > }
> > -  gomp_remove_var (acc_dev, n);
> > +  gomp_remove_var_async (acc_dev, n, aq);  
> 
> Conceptually, I understand correctly that we need to use this (new)
> 'gomp_remove_var_async' to make sure that we don't
> 'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above
> is still in process?

Yep.

> I'm curious why this isn't causing any problems for nvptx offloading
> already, any thoughts on that?  Or, is this just missing test
> coverage? (Always difficult for 'async' stuff, of course.)  By
> chance, is this right now already causing problems with AMD GCN
> offloading?  (I really need to set up AMD GCN offloading testing...)

In a few cases, async stuff on nvidia seems to "just work" even in
cases where we wouldn't expect it to via inspection (either because the
driver/hardware is doing something "magic", or because we're
somehow driving async operations in such a way that they run
synchronously in practice). One such case is with the "ephemeral"
asynchronous host-to-device memory copy patch.

The AMD side seems much more sensitive to improper async behaviour --
but I don't actually remember if I hit problems with this code in
particular.

> I'm citing below the changes introducing 'gomp_remove_var_async',
> modelled similar to the existing 'gomp_unmap_vars_async'.
> 
> 
> Also for both these, do I understand correctly, that it's actually not
> the 'gomp_unref_tgt' that needs to be "delayed" via
> 'goacc_asyncqueue', but rather really only the
> 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
> 'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
> target_mem_desc' alive?  Per my understanding, that one is one
> component of the mapping table, and not relevant anymore (thus can be
> 'free'd) as soon as it has been determined that 'tgt->refcount ==
> 0'?  Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying
freeing of the target_mem_desc isn't really a huge problem, in practice.

> It will be OK to clean that up later, but I'd like to understand this
> now.  Well, or, stating that you just blindly copied that from the
> existing 'gomp_unmap_vars_async' is fine, too!  ;-P

Some changes arose via the porting to AMD GCN, and some may have been
drive-by fixes (e.g. where a synchronous call was used in a context
where it is obvious that an asynchronous call is really needed). Like
you mentioned, test coverage could probably be better, and writing
reliable tests for async behaviour is challenging.

Julian


[PATCH] Fix typos in 2 functions.

2019-12-09 Thread Martin Liška

Hi.

I'm sending a fix for two places where we have a typo.
The second hunk is pre-approved by Rich; does the first one need to be
approved by Honza?

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-12-09  Martin Liska  

PR tree-optimization/92862
* predict.c (predict_paths_leading_to_edge): Fix typo from e to e2.
* tree-ssa-loop-niter.c (loop_only_exit_p): Return false
instead of true;
---
 gcc/predict.c |  7 +++
 gcc/tree-ssa-loop-niter.c | 10 +++---
 2 files changed, 6 insertions(+), 11 deletions(-)


diff --git a/gcc/predict.c b/gcc/predict.c
index 67f850de17a..8db24816d29 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -3217,16 +3217,15 @@ predict_paths_leading_to_edge (edge e, enum br_predictor pred,
   basic_block bb = e->src;
   FOR_EACH_EDGE (e2, ei, bb->succs)
     if (e2->dest != e->src && e2->dest != e->dest
-	&& !unlikely_executed_edge_p (e)
+	&& !unlikely_executed_edge_p (e2)
 	&& !dominated_by_p (CDI_POST_DOMINATORS, e->src, e2->dest))
       {
 	has_nonloop_edge = true;
 	break;
       }
+
   if (!has_nonloop_edge)
-    {
-      predict_paths_for_bb (bb, bb, pred, taken, auto_bitmap (), in_loop);
-    }
+    predict_paths_for_bb (bb, bb, pred, taken, auto_bitmap (), in_loop);
   else
     predict_edge_def (e, pred, taken);
 }
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index f0dd9a0b363..39e937705f1 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2376,13 +2376,9 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
     return false;
 
   for (i = 0; i < loop->num_nodes; i++)
-    {
-      for (bsi = gsi_start_bb (body[i]); !gsi_end_p (bsi); gsi_next (&bsi))
-	if (stmt_can_terminate_bb_p (gsi_stmt (bsi)))
-	  {
-	    return true;
-	  }
-    }
+    for (bsi = gsi_start_bb (body[i]); !gsi_end_p (bsi); gsi_next (&bsi))
+      if (stmt_can_terminate_bb_p (gsi_stmt (bsi)))
+	return false;
 
   return true;
 }
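Both typos can be reproduced in miniature. In this hypothetical C sketch, a basic block is reduced to one flag per statement saying whether it can terminate the block (the role of `stmt_can_terminate_bb_p`), and an edge is reduced to a single "unlikely" flag; the destination and post-dominance checks of the real `predict_paths_leading_to_edge` are left out. The names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened model of a loop body: each block holds one flag per
   statement saying whether that statement can terminate the block.  */
struct block_sketch
{
  const bool *can_terminate;
  size_t nstmts;
};

/* Corrected shape of the final scan in loop_only_exit_p: a statement that
   can leave the loop early means EXIT is not the only exit.  */
static bool
only_exit_sketch (const struct block_sketch *body, size_t nblocks)
{
  for (size_t i = 0; i < nblocks; i++)
    for (size_t j = 0; j < body[i].nstmts; j++)
      if (body[i].can_terminate[j])
	return false;   /* the buggy version returned true here */
  return true;
}

/* The predict.c fix in miniature: inside the loop over successors, the
   filter must test the loop variable E2, not the fixed edge E.  */
struct edge_sketch
{
  bool unlikely;
};

static bool
has_other_likely_edge (const struct edge_sketch *e,
		       const struct edge_sketch *succs, size_t n)
{
  for (size_t k = 0; k < n; k++)
    {
      const struct edge_sketch *e2 = &succs[k];
      if (e2 != e && !e2->unlikely)   /* was: !e->unlikely */
	return true;
    }
  return false;
}
```

With the old `!e->unlikely` test, an edge E that is itself likely would make every other successor count as likely, regardless of its own probability.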



[RFC, vectorizer] Fix ICE with masked vectors

2019-12-09 Thread Andrew Stubbs

Hi,

This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:

during GIMPLE pass: vect
dump file: vect-ctor-1.c.159t.vect
.../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16':
.../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in 
exact_div, at poly-int.h:2162
0xdf845f poly_int<1u, poly_resultlong, unsigned long, poly_int_traits::is_poly>::type, 
poly_coeff_pair_traitslong, poly_int_traitslong>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, 
unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long)

/scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162
0xdf649a poly_int<1u, poly_resultpoly_coeff_pair_traitslong>::result_kind>::type> exact_div<1u, unsigned long, unsigned 
long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned 
long> const&)

/scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175
0x1c473cd vect_get_num_vectors
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520
0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*)

/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798
0x1596732 vect_analyze_loop_2
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095
0x15980f3 vect_analyze_loop(loop*, vec_info_shared*)
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536
0x15d7b36 try_vectorize_loop_1
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892
0x15d831f try_vectorize_loop
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044
0x15d84f9 vectorize_loops()
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125
0x144f0af execute
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


The problem is that exact_div is being asked to do "8 / 64", which it 
won't. The comment on the function says "NUNITS should be based on the 
vectorization factor, so it is always a known multiple of the number of 
elements in VECTYPE". This is on the amdgcn target where the 
vectorization factor is always 64, but smaller tasks can be vectorized 
using masking.
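The precondition involved can be written down with plain integers. This sketch is hypothetical: GCC's `exact_div` operates on `poly_int` values whose coefficients may be runtime-variable, whereas here both operands are ordinary constants and the check is a plain `assert`. It only shows that a count of 8 units cannot be divided exactly into 64-lane vectors, which is the case that trips the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, constant-only analogue of poly-int.h's exact_div: the
   caller promises that A is an exact multiple of B, and this is checked.  */
static unsigned long
exact_div_sketch (unsigned long a, unsigned long b)
{
  assert (b != 0 && a % b == 0);
  return a / b;
}

/* Analogue of vect_get_num_vectors for constant NUNITS and vector width.  */
static unsigned int
num_vectors_sketch (unsigned long nunits, unsigned long subparts)
{
  return (unsigned int) exact_div_sketch (nunits, subparts);
}

/* The precondition that the "8 / 64" case violates.  */
static bool
num_vectors_precondition (unsigned long nunits, unsigned long subparts)
{
  return subparts != 0 && nunits % subparts == 0;
}
```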


I think what's happening here is that the assumption described in the 
comment is invalid in the presence of masked vectors.


The attached patch fixes the ICE in the testcase, but I suspect does not 
go far enough. Can it happen that NUNITS can be greater than the 
vectorization factor, but not a multiple? Is this even a valid fix in 
the first place? Must it be conditionalized on masking being available? 
Is the exactness even worth checking, in the presence of exceptions?


Thanks

Andrew
WIP Fix vect-ctor-1.c ICE


diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 51a13f1d207..bf1c3eeda85 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1513,6 +1513,10 @@ vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)
 static inline unsigned int
 vect_get_num_vectors (poly_uint64 nunits, tree vectype)
 {
+  /* Masked vectors can cause partial vector use.  */
+  if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
+    return 1;
+
   return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
 }
 


Re: [PATCH v2][MSP430] Add msp430-elfbare target

2019-12-09 Thread Jozef Lawrynowicz
On Sat, 07 Dec 2019 11:40:33 -0700
Jeff Law  wrote:

> On Fri, 2019-11-29 at 21:00 +, Jozef Lawrynowicz wrote:
> > The attached patch consolidates some configuration tweaks I
> > previously submitted
> > as modifications to the msp430-elf target into a new target called
> > "msp430-elfbare" i.e. "bare-metal".
> > 
> > MSP430: Disable TM clone registry by default
> >   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00550.html
> > MSP430: Disable __cxa_atexit
> >   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00552.html
> > 
> > The patches tweak the CRT code to achieve the smallest possible code
> > size, and rely on some additional generic tweaks to crtstuff.c.
> > 
> > I did submit these tweaks a while ago, but I didn't get any feedback,
> > however even if they are acceptable I suspect it is too late for
> > GCC 10 anyway:
> > libgcc: Dont define __do_global_dtors_aux if it will be empty
> >   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00417.html
> > libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE
> >   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00418.html
> > 
> > (The second one is a bit hacky, but without some way of removing the
> > __dso_handle declaration, we end up with 150 bytes of unnecessary
> > code in some programs.)
> > 
> > So for this patch crtstuff.c was copied to the msp430 subdirectory
> > and the changes were made to that target specific version.
> > 
> > Tiny program size can now be achieved by configuring gcc for
> > msp430-elfbare.
> > 
> > For example in an "empty main" program which loops forever:
> >   msp430-elfbare @ -Os:
> >      text    data     bss     dec     hex filename
> >        14       0       0      14       e a.out
> >   msp430-elf @ -Os:
> >      text    data     bss     dec     hex filename
> >       270       6       2     278     116 a.out
> > 
> > Successfully regtested msp430-elfbare vs msp430-elf.
> > 
> > Ok to apply?
> > 
> > P.S. This patch relies on the -fno-exceptions multilib patch
> > submitted here:
> > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02523.html
> > 
> > P.P.S. This requires some minor configury tweaks to Newlib and GDB of
> > the form:
> > -  msp430*-*-elf)
> > +  msp430-*-elf*)  
> 
> > I'll apply these changes if the patch is accepted.
> > From cff4611855d838315e793d45256de5fc8eeefafe Mon Sep 17 00:00:00 2001
> > From: Jozef Lawrynowicz 
> > Date: Mon, 25 Nov 2019 19:41:05 +
> > Subject: [PATCH] MSP430: Add new msp430-elfbare target
> > 
> > contrib/ChangeLog:
> > 
> > 2019-11-29  Jozef Lawrynowicz  
> > 
> > * config-list.mk: Add msp430-elfbare.
> > 
> > gcc/ChangeLog:
> > 
> > 2019-11-29  Jozef Lawrynowicz  
> > 
> > * config.gcc: s/msp430*-*-*/msp430-*-*.
> > Handle msp430-*-elfbare.
> > * config/msp430/msp430-devices.c (TARGET_SUBDIR): Define.
> > (_MSPMKSTR): Define.
> > (__MSPMKSTR): Define.
> > (rest_of_devices_path): Use TARGET_SUBDIR value in string.
> > * config/msp430/msp430.c (msp430_option_override): Error if
> > -fuse-cxa-atexit is used when it has been disabled at configure time.
> > * config/msp430/t-msp430: Define TARGET_SUBDIR when building
> > msp430-devices.o.
> > * doc/install.texi: Document msp430-*-elf and msp430-*-elfbare.
> > * doc/invoke.texi: Update documentation about which path
> > devices.csv is searched for.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2019-11-29  Jozef Lawrynowicz  
> > 
> > * g++.dg/init/dso_handle1.C: Require cxa_atexit support.
> > * g++.dg/init/dso_handle2.C: Likewise.
> > * g++.dg/other/cxa-atexit1.C: Likewise.
> > * gcc.target/msp430/msp430.exp: Update csv-using-installed.c test to
> > handle msp430-elfbare configuration.
> > 
> > libgcc/ChangeLog:
> > 
> > 2019-11-29  Jozef Lawrynowicz  
> > 
> > * config.host: Use t-msp430-elfbare-crtstuff Makefile fragment when GCC
> > is configured for the msp430-elfbare target.
> > * config/msp430/msp430-elfbare-crtstuff.c: New file.
> > * config/msp430/t-msp430: Remove Makefile rules for object files
> > built from crtstuff.c.
> > * config/msp430/t-msp430-crtstuff: New file.
> > * config/msp430/t-msp430-elfbare-crtstuff: New file.
> > * configure: Regenerate.
> > * configure.ac: Disable TM clone registry by default for
> > msp430-elfbare.
> OK.   I probably would have tried to avoid msp430-elfbare-crtstuff, but
> it's not a huge wart IMHO.

If we get the __dso_handle removal into the generic libgcc/crtstuff.c those
changes won't be necessary.

Did you get a chance to look at "Add -fno-exceptions multilib" -
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02523.html

It is the cumulative effect of these patches that gives the good code size
results; unfortunately, without all of them there isn't a significant code
size improvement.

Thanks,
Jozef
> 
> Jeff
> >   
> 


[PATCH] Fix column information for omp_clauses in Fortran code

2019-12-09 Thread Harwath, Frederik
Hi,
Tobias has recently fixed a problem with the column information in gfortran
locations ("PR 92793 - fix column used for error diagnostic"). Diagnostic
messages for OpenMP/OpenACC clauses do not contain the right column
information yet. The reason is that the location information of the first
clause is used for all clauses on a line, and hence the columns are wrong for
all but the first clause. The attached patch fixes this problem.

I have tested the patch manually by adapting the validity check for nested
OpenACC reductions (see omp-low.c) to include the location of clauses in
warnings instead of the location of the loop to which the clause belongs.
I can add a regression test based on this later on after adapting the code in
omp-low.c.

Is it ok to include the patch in trunk?

Best regards,
Frederik


On 04.12.19 14:37, Tobias Burnus wrote:
> As reported internally by Frederik, gfortran currently passes
> LOCATION_COLUMN == 0 to the middle end. The reason for that is how parsing
> works: gfortran reads the input line by line.
> 
> For internal error diagnostic (fortran/error.c), the column location was
> corrected, but not for locations passed to the middle end. Hence, the
> diagnostic there wasn't optimal.
> 
> Fixed by introducing a new function; now one only needs to make sure that no
> new code will re-introduce "lb->location" :-)
> 
> Built and regtested on x86-64-gnu-linux.
> OK for the trunk?
> 
> Tobias
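Why a line-by-line reader ends up with column 0 can be shown with a small model. This is a hypothetical C sketch, not gfortran's `gfc_get_location`: a token's location is taken to be the line's location plus the token's offset within the line text, while using only the line's location (what "lb->location" amounts to here) leaves the column at 0, which GCC treats as "no column information".

```c
#include <stddef.h>

/* Hypothetical model of a line-based reader, not gfortran's code.  */
struct line_buf
{
  unsigned line;      /* 1-based line number */
  const char *text;   /* start of the line's text */
};

struct src_loc
{
  unsigned line;
  unsigned column;    /* 1-based; 0 means unknown */
};

/* What using only the line's location amounts to.  */
static struct src_loc
line_only_location (const struct line_buf *lb)
{
  struct src_loc loc = { lb->line, 0 };
  return loc;
}

/* Location of the character at P, which must point into LB->text.  */
static struct src_loc
token_location (const struct line_buf *lb, const char *p)
{
  struct src_loc loc = { lb->line, (unsigned) (p - lb->text) + 1 };
  return loc;
}
```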

From af3a63b64f38d522b0091a123a919d1f20f5a8b1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 9 Dec 2019 15:07:53 +0100
Subject: [PATCH] Fix column information for omp_clauses in Fortran code

The location of all OpenMP/OpenACC clauses on any given line in Fortran code
always points to the first clause on that line. Hence, the column information
is wrong for all clauses but the first one.

Use the correct location for each clause instead.

2019-12-09  Frederik Harwath  

/gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_reduction_list): Pass correct location for each
	clause to build_omp_clause.
---
 gcc/fortran/trans-openmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d07ff86fc0b..356fd04e6c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1982,7 +1982,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
 	tree t = gfc_trans_omp_variable (namelist->sym, false);
 	if (t != error_mark_node)
 	  {
-	tree node = build_omp_clause (gfc_get_location (&where),
+	tree node = build_omp_clause (gfc_get_location (&namelist->where),
 	  OMP_CLAUSE_REDUCTION);
 	OMP_CLAUSE_DECL (node) = t;
 	if (mark_addressable)
-- 
2.17.1
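The one-line change amounts to taking each clause's location from its own namelist entry rather than from a shared position. The C sketch below is a hypothetical, simplified model (the structs and `build_clauses` are invented for illustration and are not gfortran's data structures); the PER_ENTRY_LOC flag switches between the old and the fixed behaviour so the two can be compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the change, not gfortran's data
   structures: each namelist entry carries its own source position.  */
struct pos
{
  unsigned line, column;
};

struct namelist_entry
{
  const char *sym;
  struct pos where;
  const struct namelist_entry *next;
};

struct clause
{
  const char *decl;
  struct pos loc;
};

/* Build one clause per namelist entry.  PER_ENTRY_LOC selects between the
   old behaviour (every clause gets the shared WHERE) and the fixed one
   (each clause gets its entry's own position).  Returns the count built.  */
static size_t
build_clauses (const struct namelist_entry *n, struct pos where,
	       bool per_entry_loc, struct clause *out, size_t max)
{
  size_t k = 0;
  for (; n && k < max; n = n->next, k++)
    {
      out[k].decl = n->sym;
      out[k].loc = per_entry_loc ? n->where : where;
    }
  return k;
}
```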



Re: [RFC, vectorizer] Fix ICE with masked vectors

2019-12-09 Thread Richard Sandiford
Andrew Stubbs  writes:
> Hi,
>
> This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:
>
> during GIMPLE pass: vect
> dump file: vect-ctor-1.c.159t.vect
> .../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16':
> .../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in 
> exact_div, at poly-int.h:2162
> 0xdf845f poly_int<1u, poly_result long, unsigned long, poly_int_traits::is_poly>::type, 
> poly_coeff_pair_traits long, poly_int_traits long>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, 
> unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long)
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162
> 0xdf649a poly_int<1u, poly_result poly_coeff_pair_traits long>::result_kind>::type> exact_div<1u, unsigned long, unsigned 
> long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned 
> long> const&)
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175
> 0x1c473cd vect_get_num_vectors
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520
> 0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*)
>  
> /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798
> 0x1596732 vect_analyze_loop_2
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095
> 0x15980f3 vect_analyze_loop(loop*, vec_info_shared*)
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536
> 0x15d7b36 try_vectorize_loop_1
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892
> 0x15d831f try_vectorize_loop
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044
> 0x15d84f9 vectorize_loops()
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125
> 0x144f0af execute
>  /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
>
> The problem is that exact_div is being asked to do "8 / 64", which it 
> won't. The comment on the function says "NUNITS should be based on the 
> vectorization factor, so it is always a known multiple of the number of 
> elements in VECTYPE". This is on the amdgcn target where the 
> vectorization factor is always 64, but smaller tasks can be vectorized 
> using masking.
>
> I think what's happening here is that the assumption described in the 
> comment is invalid in the presence of masked vectors.

No, the assumption's correct even there.  The assert usually triggers
because something elsewhere is getting confused about the vector types.

> The attached patch fixes the ICE in the testcase, but I suspect does not 
> go far enough. Can it happen that NUNITS can be greater than the 
> vectorization factor, but not a multiple? Is this even a valid fix in 
> the first place? Must it be conditionalized on masking being available? 
> Is the exactness even worth checking, in the presence of exceptions?

The vector types and VF aren't chosen based on whether masking is available.
It happens the other way around: we first analyse the loop and pick the VF
for an unmasked loop, but record as we go whether a masked implementation
is also possible.  Then we decide at the end whether to use a masked
implementation instead of an unmasked one.

So if this assert triggers for masked loops, it could trigger for unmasked
loops too.

FWIW there's an instance of this for SVE that I haven't got around
to debugging yet, but from a quick look at the dump, it was somehow
combining a vector of 8 longs with a vector of 4 floats.  I'm not sure
it's going to be the same issue as yours though.

Thanks,
Richard

>
> Thanks
>
> Andrew
>
> WIP Fix vect-ctor-1.c ICE
>
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 51a13f1d207..bf1c3eeda85 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1513,6 +1513,10 @@ vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)
>  static inline unsigned int
>  vect_get_num_vectors (poly_uint64 nunits, tree vectype)
>  {
> +  /* Masked vectors can cause partial vector use.  */
> +  if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
> +    return 1;
> +
>    return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
>  }
>  


Re: [PATCH] Fix column information for omp_clauses in Fortran code

2019-12-09 Thread Tobias Burnus

LGTM. Thanks for the patch!

Tobias

On 12/9/19 4:58 PM, Harwath, Frederik wrote:

> Hi,
> Tobias has recently fixed a problem with the column information in gfortran
> locations ("PR 92793 - fix column used for error diagnostic"). Diagnostic
> messages for OpenMP/OpenACC clauses do not contain the right column
> information yet. The reason is that the location information of the first
> clause is used for all clauses on a line, and hence the columns are wrong for
> all but the first clause. The attached patch fixes this problem.
>
> I have tested the patch manually by adapting the validity check for nested
> OpenACC reductions (see omp-low.c) to include the location of clauses in
> warnings instead of the location of the loop to which the clause belongs.
> I can add a regression test based on this later on after adapting the code in
> omp-low.c.
>
> Is it ok to include the patch in trunk?
>
> Best regards,
> Frederik
>
>
> On 04.12.19 14:37, Tobias Burnus wrote:
>
>> As reported internally by Frederik, gfortran currently passes
>> LOCATION_COLUMN == 0 to the middle end. The reason for that is how parsing
>> works: gfortran reads the input line by line.
>>
>> For internal error diagnostic (fortran/error.c), the column location was
>> corrected, but not for locations passed to the middle end. Hence, the
>> diagnostic there wasn't optimal.
>>
>> Fixed by introducing a new function; now one only needs to make sure that no
>> new code will re-introduce "lb->location" :-)
>>
>> Built and regtested on x86-64-gnu-linux.
>> OK for the trunk?
>>
>> Tobias


Re: [PATCH] bring -Warray-bounds closer to -Wstringop-overflow (PR91647, 91463, 91679)

2019-12-09 Thread Matthew Malcomson
On 01/11/2019 21:09, Martin Sebor wrote:
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 53278168a59..d7c74a1865a 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -837,8 +837,8 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
> gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],
> res_op->ops[num_ops - 1]),
>  op, res_op->type, num_ops - 2);
> -  for (unsigned int i = 1; i < num_ops - 1; ++i)
> -    cond_op.ops[i - 1] = res_op->ops[i];
> +
> +  memcpy (cond_op.ops, res_op->ops + 1, (num_ops - 1) * sizeof *cond_op.ops);
> switch (num_ops - 2)
>   {
>   case 2:

I think this copies one more element than the original code does.

(copying `num_ops - 1` elements, while the previous loop only copied 
`num_ops - 2` elements since the counter started at 1).
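The off-by-one is easy to check with a stand-alone model. This hypothetical C sketch uses `int` instead of GCC trees and assumes `num_ops >= 2`: operand 0 (the condition) and operand `num_ops - 1` (the "else" value) are handed to `gimple_match_cond` separately, so only the `num_ops - 2` operands in between should be copied.

```c
#include <string.h>

/* Hypothetical model using ints instead of trees.  Operand 0 is the
   condition and operand NUM_OPS-1 the "else" value; both are passed
   separately, so only operands 1 .. NUM_OPS-2 (NUM_OPS-2 of them) are
   copied.  Assumes NUM_OPS >= 2.  */
static void
copy_inner_ops_loop (int *dst, const int *src, unsigned num_ops)
{
  for (unsigned i = 1; i < num_ops - 1; ++i)
    dst[i - 1] = src[i];
}

/* memcpy equivalent of the loop: the count is NUM_OPS - 2, not NUM_OPS - 1.  */
static void
copy_inner_ops_memcpy (int *dst, const int *src, unsigned num_ops)
{
  memcpy (dst, src + 1, (num_ops - 2) * sizeof *dst);
}
```

With a count of `num_ops - 1`, the trailing "else" operand would also be copied into the slot after the last real operand.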


Re: [patch, fortran] Introduce -finline-pack

2019-12-09 Thread Thomas Koenig

Hi Richard,


Just as a suggestion, maybe we'd want to extend this
to other intrinsics in future so a -fno-inline-intrinsic=pack[,...]
is more future proof? (I'd inline all intrinsics by default thus
only provide the negative form).  You can avoid the extra
option parsing complexity by only literally adding
-fno-inline-intrinsic=pack for now.


I agree that such an option would make sense; I think this is
something we should consider for gcc 11.

In this instance, your reply shows that the option is poorly named,
because it is actually not about the PACK intrinsic, but the internal
packing that happens for arguments.

Maybe -finline-repack would be a better name? -finline-internal-pack?
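For readers unfamiliar with the term, "packing" here means copying a non-contiguous actual argument (for example a strided array section) into a contiguous temporary before the call and copying it back afterwards; per the message above, the option is about that copying rather than the PACK intrinsic. The C sketch below is a hypothetical, one-dimensional illustration of the idea only. It is neither gfortran's inline code nor libgfortran's packing routines.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-dimensional illustration, not gfortran's or libgfortran's
   code.  Copy N elements spaced STRIDE apart into a fresh contiguous buffer.
   Returns NULL on allocation failure.  */
static double *
pack_strided (const double *base, size_t n, ptrdiff_t stride)
{
  double *tmp = malloc ((n ? n : 1) * sizeof *tmp);
  if (!tmp)
    return NULL;
  for (size_t i = 0; i < n; i++)
    tmp[i] = base[(ptrdiff_t) i * stride];
  return tmp;
}

/* Copy the (possibly modified) contiguous buffer back and free it.  */
static void
unpack_strided (double *base, double *tmp, size_t n, ptrdiff_t stride)
{
  for (size_t i = 0; i < n; i++)
    base[(ptrdiff_t) i * stride] = tmp[i];
  free (tmp);
}
```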

Regards

Thomas


Re: [PATCH] Fix typos in 2 functions.

2019-12-09 Thread Jan Hubicka
> Hi.
> 
> I'm sending fix for 2 locations where we have a typo.
> Second hunk is pre-approved by Rich, first one needs to be approved
> by Honza?
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-12-09  Martin Liska  
> 
>   PR tree-optimization/92862
>   * predict.c (predict_paths_leading_to_edge): Fix typo from e to e2.
>   * tree-ssa-loop-niter.c (loop_only_exit_p): Return false
>   instead of true;

OK, thanks!

Honza


Re: [PATCH, Modula-2 (C/C++/D/F/Go/Jit)] (Register spec fn) (v3)

2019-12-09 Thread Gaius Mulley
Matthias Klose  writes:

> On 17.11.19 07:49, Gaius Mulley wrote:
>> 
>> Hello,
>> 
>> while spending the weekend on the Howland and Baker islands :-) I
>> thought I'd post version three of the patches which introduce Modula-2
>> into the GCC trunk.  The patches include:
>
> [...]
>
>> At a later point (after it is reviewed/approved) the gm2 tree
>> http://git.savannah.gnu.org/cgit/gm2.git/tree/gcc-versionno/m2/ could
>> be included.  Together with the gm2 testsuite.
>> 
>> But for now here are the proposed patches and ChangeLogs and new files
>> (gm2-v3.tar.gz) (after the patches):
>
> I have updated my distro packaging to build gcc-10, including gm2 from the
> trunk.  Both native and cross builds seem to work, with some glitches:
>
>  - For native builds, the profiled build doesn't work, failing to link
>the gcov library. Failing that, I can't check the lto+profiled build.
>Both the profiled and lto+profiled builds are working on your gcc-9
>branch.
>
>  - For cross builds, the libgm2 libraries install as host libraries,
>not target libraries (but are correctly built).  I sent one patch
>to Gaius, but couldn't figure out yet, why the libs are not
>installed as target libraries.
>
> The packages are publicly available in Debian experimental [1] and Ubuntu
> focal [2], test results are sent to the gcc-testresults ML.
>
> Are you still aiming for inclusion in GCC 10?
>
> Matthias
>
> [1] https://tracker.debian.org/pkg/gcc-10
> [2] https://launchpad.net/~doko/+archive/ubuntu/toolchain/+sourcepub/10781708/+listing-archive-extra

Hello,

yes I'm still aiming for inclusion in GCC 10.  I'm still examining the
target/host bugs in the libgm2 build infrastructure (and slowly going
insane :-).  I'll also look at the lto+profiled build - great to hear
the gcc-9/gm2 branch works.  How do the GCC 10 gm2 v3 patches look?


regards,
Gaius


Re: [PATCH] bring -Warray-bounds closer to -Wstringop-overflow (PR91647, 91463, 91679)

2019-12-09 Thread Martin Sebor

On 12/9/19 9:11 AM, Matthew Malcomson wrote:

On 01/11/2019 21:09, Martin Sebor wrote:

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 53278168a59..d7c74a1865a 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -837,8 +837,8 @@ try_conditional_simplification (internal_fn ifn, 
gimple_match_op *res_op,
 gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],
  res_op->ops[num_ops - 1]),
   op, res_op->type, num_ops - 2);
-  for (unsigned int i = 1; i < num_ops - 1; ++i)
-    cond_op.ops[i - 1] = res_op->ops[i];
+
+  memcpy (cond_op.ops, res_op->ops + 1, (num_ops - 1) * sizeof *cond_op.ops);
 switch (num_ops - 2)
   {
   case 2:


I think this copies one more element than the original code
(it copies `num_ops - 1` elements, while the previous loop only copied
`num_ops - 2` elements, since the counter started at 1).
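
To make the element count concrete, here is a small C++ sketch comparing the original loop with a `memcpy` that uses `num_ops - 2`. The `match_op` struct is a hypothetical stand-in, not the real `gimple_match_op`; with `num_ops - 1` instead, the copy would pull in one more operand than the loop did.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for gimple_match_op: only the operand array matters here.
struct match_op
{
  static const unsigned max_ops = 5;
  int ops[max_ops];
};

// The original loop: copies ops[1] .. ops[num_ops - 2] into ops[0] ..,
// i.e. num_ops - 2 elements.  Returns the number of elements copied.
static unsigned
copy_with_loop (match_op &dst, const match_op &src, unsigned num_ops)
{
  unsigned copied = 0;
  for (unsigned i = 1; i < num_ops - 1; ++i, ++copied)
    dst.ops[i - 1] = src.ops[i];
  return copied;
}

// A memcpy equivalent of that loop: the element count is num_ops - 2.
// Using num_ops - 1, as in the committed hunk, copies one extra element.
static unsigned
copy_with_memcpy (match_op &dst, const match_op &src, unsigned num_ops)
{
  const unsigned count = num_ops - 2;
  std::memcpy (dst.ops, src.ops + 1, count * sizeof *dst.ops);
  return count;
}
```

The sketch assumes `num_ops >= 2`, as the surrounding code does.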



Yes, I think you're right.  I only noticed after I committed
the change, but didn't think it actually causes any problems
(i.e., it doesn't read past the end).  Let me know if you
think otherwise.

Martin


[PATCH] libstdc++: Implement ranges::safe_range for C++20 (P1870R1)

2019-12-09 Thread Jonathan Wakely

This change replaces the __forwarding_range implementation detail with
the ranges::safe_range concept and adds the ranges::enable_safe_range
variable template for opting in to the concept.

It also adjusts the begin/end/rbegin/rend customization point objects to
match the new rules for accessing rvalue ranges only when safe to do so.

* include/bits/range_access.h (ranges::enable_safe_range): Define.
(ranges::begin, ranges::end, ranges::rbegin, ranges::rend): Constrain
to only accept types satisfying safe_range and treat argument as an
lvalue when calling a member or performing ADL.
(ranges::__detail::__range_impl, ranges::__detail::__forwarding_range):
Remove.
(ranges::range): Adjust definition.
(ranges::safe_range): Define.
(ranges::iterator_t, ranges::range_difference_t): Reorder definitions
to match the synopsis in the working draft.
(ranges::disable_sized_range): Remove duplicate definition.
* include/experimental/string_view (ranges::enable_safe_range): Add
partial specialization for std::experimental::basic_string_view.
* include/std/ranges (ranges::viewable_range, ranges::subrange)
(ranges::empty_view, ranges::iota_view): Use safe_range. Specialize
enable_safe_range.
(ranges::safe_iterator_t, ranges::safe_subrange_t): Define.
* include/std/span (ranges::enable_safe_range): Add partial
specialization for std::span.
* include/std/string_view (ranges::enable_safe_range): Likewise for
std::basic_string_view.
* testsuite/std/ranges/access/begin.cc: Adjust expected results.
* testsuite/std/ranges/access/cbegin.cc: Likewise.
* testsuite/std/ranges/access/cdata.cc: Likewise.
* testsuite/std/ranges/access/cend.cc: Likewise.
* testsuite/std/ranges/access/crbegin.cc: Likewise.
* testsuite/std/ranges/access/crend.cc: Likewise.
* testsuite/std/ranges/access/data.cc: Likewise.
* testsuite/std/ranges/access/end.cc: Likewise.
* testsuite/std/ranges/access/rbegin.cc: Likewise.
* testsuite/std/ranges/access/rend.cc: Likewise.
* testsuite/std/ranges/empty_view.cc: Test ranges::begin and
ranges::end instead of unqualified calls to begin and end.
* testsuite/std/ranges/safe_range.cc: New test.
* testsuite/std/ranges/safe_range_types.cc: New test.
* testsuite/util/testsuite_iterators.h: Add comment about safe_range.

Tested powerpc64le-linux, committed to trunk.
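
A minimal C++17 sketch of the opt-in idea behind `ranges::enable_safe_range`, using hypothetical names (`my_enable_safe_range`, `my_safe_range`, `my_begin`) rather than the real libstdc++ templates, and `enable_if` instead of concepts. The published C++20 standard later renamed `safe_range` to `borrowed_range`.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical opt-in variable template, modelled on ranges::enable_safe_range.
template<typename R>
inline constexpr bool my_enable_safe_range = false;

// A string_view does not own its characters, so iterators obtained from a
// temporary string_view stay valid: opt it in.
template<>
inline constexpr bool my_enable_safe_range<std::string_view> = true;

// Lvalue ranges are always safe; rvalue ranges only when they opted in.
template<typename R>
inline constexpr bool my_safe_range
  = std::is_lvalue_reference_v<R>
    || my_enable_safe_range<std::remove_cv_t<std::remove_reference_t<R>>>;

// begin() constrained to safe ranges.  'r' is a named parameter, so the
// member call treats the argument as an lvalue.
template<typename R, typename = std::enable_if_t<my_safe_range<R>>>
auto
my_begin (R &&r)
{
  return r.begin ();
}
```

Calling `my_begin (std::vector<int>{1, 2})` fails to compile, because a temporary vector's iterators would dangle.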


commit b081863b2bb9d2a0b5421556a3e3d7c638278145
Author: Jonathan Wakely 
Date:   Mon Dec 9 08:50:48 2019 +


[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-09 Thread Stam Markianos-Wright


On 12/2/19 4:43 PM, Stam Markianos-Wright wrote:
> 
> 
> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
>> Pinging with more correct maintainers this time :)
>>
>> Also would need to backport to gcc7,8,9, but need to get this approved first!
>>
>> Thank you,
>> Stam
>>
>>
>>  Forwarded Message 
>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
>> Date: Mon, 21 Oct 2019 10:37:09 +0100
>> From: Stam Markianos-Wright 
>> To: Ramana Radhakrishnan 
>> CC: gcc-patches@gcc.gnu.org, nd, James Greenhalgh, Richard Earnshaw
>>
>>
>>
>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:

 Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
 however, on my native Aarch32 setup the test times out when run as part
 of a big "make check-gcc" regression, but not when run individually.

 2019-10-11  Stamatis Markianos-Wright 

 * config/arm/arm.md: Update b for Thumb2 range checks.
 * config/arm/arm.c: New function arm_gen_far_branch.
    * config/arm/arm-protos.h: New function arm_gen_far_branch
 prototype.

 gcc/testsuite/ChangeLog:

 2019-10-11  Stamatis Markianos-Wright 

    * testsuite/gcc.target/arm/pr91816.c: New test.
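
As far as the diff shows, `arm_gen_far_branch` uses the usual far-branch idiom: when a conditional branch cannot reach its target, branch on the inverse condition over an unconditional `b`. Below is a hedged C++ sketch that builds the assembly text as strings. The inverse-condition table, the `.Lskip` label names and the helper names are assumptions, not GCC's code. The range comes from the arm.md comment in the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical table of inverse ARM condition codes.
static const char *
invert_cond (const std::string &cond)
{
  static const char *const pairs[][2] = {
    {"eq", "ne"}, {"ne", "eq"}, {"cs", "cc"}, {"cc", "cs"},
    {"mi", "pl"}, {"pl", "mi"}, {"vs", "vc"}, {"vc", "vs"},
    {"hi", "ls"}, {"ls", "hi"}, {"ge", "lt"}, {"lt", "ge"},
    {"gt", "le"}, {"le", "gt"},
  };
  for (const auto &p : pairs)
    if (cond == p[0])
      return p[1];
  return nullptr;
}

// Thumb-2 b<cond> window from the patch's arm.md comment:
// (-1048568 -> 1048576), treated here as inclusive on both ends.
static bool
short_cond_branch_ok (long offset)
{
  return offset >= -1048568L && offset <= 1048576L;
}

// Emit a short conditional branch when the target is in range; otherwise
// branch on the inverse condition over an unconditional 'b', whose
// +/-16MB reach the patch does not check.  Label naming is hypothetical.
static std::string
emit_cond_branch (const std::string &cond, const std::string &target,
                  long offset, int label_no)
{
  if (short_cond_branch_ok (offset))
    return "b" + cond + "\t" + target + "\n";

  const char *inverse = invert_cond (cond);
  assert (inverse != nullptr);
  char skip[32];
  std::snprintf (skip, sizeof skip, ".Lskip%d", label_no);
  return std::string ("b") + inverse + "\t" + skip + "\n"
         + "b\t" + target + "\n"
         + skip + ":\n";
}
```

The far form costs one extra instruction, which is why the patch's length attribute has to grow for out-of-range branches.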
>>>
 diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
 index f995974f9bb..1dce333d1c3 100644
 --- a/gcc/config/arm/arm-protos.h
 +++ b/gcc/config/arm/arm-protos.h
 @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *,
   void arm_initialize_isa (sbitmap, const enum isa_feature *);
 +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
 +
 +
>>>
>>> Let's get the nits out of the way.
>>>
>>> Unnecessary extra new line, need a space between int and const above.
>>>
>>>
>>
>> .Fixed!
>>
   #endif /* ! GCC_ARM_PROTOS_H */
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 39e1a1ef9a2..1a693d2ddca 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
   }
   } /* Namespace selftest.  */
 +
 +/* Generate code to enable conditional branches in functions over 1 MiB.  */
 +const char *
 +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
 +    const char * branch_format)
>>>
>>> Not sure if this is some munging from the attachment but check
>>> vertical alignment of parameters.
>>>
>>
>> .Fixed!
>>
 +{
 +  rtx_code_label * tmp_label = gen_label_rtx ();
 +  char label_buf[256];
 +  char buffer[128];
 +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
 +    CODE_LABEL_NUMBER (tmp_label));
 +  const char *label_ptr = arm_strip_name_encoding (label_buf);
 +  rtx dest_label = operands[pos_label];
 +  operands[pos_label] = tmp_label;
 +
 +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
 +  output_asm_insn (buffer, operands);
 +
 +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
 +  operands[pos_label] = dest_label;
 +  output_asm_insn (buffer, operands);
 +  return "";
 +}
 +
 +
>>>
>>> Unnecessary extra newline.
>>>
>>
>> .Fixed!
>>
   #undef TARGET_RUN_TARGET_SELFTESTS
   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
   #endif /* CHECKING_P */
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index f861c72ccfc..634fd0a59da 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -6686,9 +6686,16 @@
   ;; And for backward branches we have
   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
   ;;
 +;; In 16-bit Thumb these ranges are:
   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving (-2040->2048).
   ;; For a 'b<cond>' pos_range = 254,  neg_range = -256  giving (-250 ->256).
 +;; In 32-bit Thumb these ranges are:
 +;; For a 'b'   +/- 16MB is not checked for.
 +;; For a 'b<cond>' pos_range = 1048574,  neg_range = -1048576  giving
 +;; (-1048568 -> 1048576).
 +
 +
>>>
>>> Unnecessary extra newline.
>>>
>>
>> .Fixed!
>>
   (define_expand "cbranchsi4"
     [(set (pc) (if_then_else
     (match_operator 0 "expandable_comparison_operator"
 @@ -6947,22 +6954,42 @@
     (pc)))]
     "TARGET_32BIT"
     "*
 -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
 -    {
 -  arm_ccfsm_state += 2;
 -  return \"\";
 -    }
 -  return \"b%d1\\t%l0\";
 + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
 +  {
 +    arm_ccfsm_state += 2;
 +    return \"\";
 +  }
 + switch (get_attr_length (insn))
 +  {
 +    // Thu

Re: libgo patch committed: Hurd fix

2019-12-09 Thread Ian Lance Taylor
On Sun, Dec 8, 2019 at 7:43 PM Ian Lance Taylor  wrote:
>
> I've committed this Hurd fix by Samuel Thibault for GCC PR 92861.

And a followup patch, also by Samuel Thibault.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279106)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1da5ceb8daaab7a243fffd6a647554cf674716f8
+6f2bf15e15bf7516c393966577d72b79cba7f980
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/os_hurd.go
===
--- libgo/go/runtime/os_hurd.go (revision 279106)
+++ libgo/go/runtime/os_hurd.go (working copy)
@@ -125,7 +125,3 @@ func osinit() {
physPageSize = uintptr(getPageSize())
}
 }
-
-const (
-   _CLOCK_REALTIME = 0
-)


Re: [PING^3] Re: [PATCH 1/2] Add a pass to automatically add ptwrite instrumentation

2019-12-09 Thread Andi Kleen
Andi Kleen  writes:

Ping!

> Andi Kleen  writes:
>
> Ping!
>
>> Andi Kleen  writes:
>>
>> Ping!
>>
>>> From: Andi Kleen 
>>>
>>> [v4: Rebased on current tree. Avoid some redundant log statements
>>> for locals and a few other fixes.  Fix some comments. Improve
>>> documentation. Did some studies on the debug information quality,
>>> see below]
>>>
>>> Add a new pass to automatically instrument changes to variables
>>> with the new PTWRITE instruction on x86. PTWRITE writes a 4- or 8-byte
>>> field into a Processor Trace log, which allows low-overhead
>>> logging of information. Essentially it's a hardware-accelerated
>>> printf.


[PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons

2019-12-09 Thread Vineet Gupta
ARC gcc generates FDCMP instructions, which raise an Invalid Operation
exception only for signaling NaNs. This causes the glibc iseqsig()
primitives to fail (in the current ongoing glibc port to ARC).

So split up the hard float compares into two categories and, for the
ordered relational compares (LT, LE, GT, GE), generate the FDCMPF
instruction (vs. FDCMP), which raises an exception for either kind of NaN.

With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC.

Also passes 6 additional tests in the glibc testsuite (test*iseqsig), with
no regressions.
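
The mode split in the `arc_select_cc_mode` hunk can be modelled as below. This is a sketch: the enums are hypothetical stand-ins for GCC's rtx codes and the ARC CC modes, and only the codes visible in the hunk are covered. The rationale, per IEEE 754, is that the ordered relational predicates <, <=, > and >= signal Invalid Operation on any NaN, while the unordered-or variants stay quiet.

```cpp
#include <cassert>

// Hypothetical subset of GCC's rtx comparison codes handled by the hunk.
enum class cmp_code { LT, LE, GT, GE, UNLT, UNLE, UNGT, UNGE };

// Hypothetical stand-ins for the two ARC condition-code modes.
enum class fpu_cc_mode
{
  CC_FPU,  // quiet compare (FDCMP): Invalid Operation only for signaling NaNs
  CC_FPUE  // signaling compare (FDCMPF): Invalid Operation for any NaN
};

static fpu_cc_mode
select_fpu_cc_mode (cmp_code op)
{
  switch (op)
    {
    // Unordered-or-less etc. must stay quiet on quiet NaNs.
    case cmp_code::UNLT:
    case cmp_code::UNLE:
    case cmp_code::UNGT:
    case cmp_code::UNGE:
      return fpu_cc_mode::CC_FPU;

    // Ordered relational predicates signal on any NaN.
    case cmp_code::LT:
    case cmp_code::LE:
    case cmp_code::GT:
    case cmp_code::GE:
      return fpu_cc_mode::CC_FPUE;
    }
  return fpu_cc_mode::CC_FPU;  // not reached: all enumerators handled above
}
```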

gcc/
-xx-xx  Vineet Gupta  

* config/arc/arc-modes.def (CC_FPUE): New mode CC_FPUE, which
helps codegen generate exceptions even for quiet NaNs.
* config/arc/arc.c (arc_init_reg_tables): Handle new CC_FPUE mode.
(get_arc_condition_code): Likewise.
(arc_select_cc_mode): LT, LE, GT, GE to use the new CC_FPUE mode.
* config/arc/arc.h (REVERSE_CONDITION): Handle new CC_FPUE mode.
* config/arc/predicates.md (proper_comparison_operator): Likewise.
* config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE.
(cmpdf_fpu_trap): Likewise.

Signed-off-by: Vineet Gupta 
---
 gcc/config/arc/arc-modes.def |  1 +
 gcc/config/arc/arc.c |  8 ++--
 gcc/config/arc/arc.h |  2 +-
 gcc/config/arc/fpu.md| 24 
 gcc/config/arc/predicates.md |  1 +
 5 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
index 36a2f4abfb25..d16b6a289a15 100644
--- a/gcc/config/arc/arc-modes.def
+++ b/gcc/config/arc/arc-modes.def
@@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
 
 /* FPU condition flags.  */
 CC_MODE (CC_FPU);
+CC_MODE (CC_FPUE);
 CC_MODE (CC_FPU_UNEQ);
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 28305f459dcd..cbb95d6e9043 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison)
default : gcc_unreachable ();
}
 case E_CC_FPUmode:
+case E_CC_FPUEmode:
   switch (GET_CODE (comparison))
{
case EQ: return ARC_CC_EQ;
@@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   case UNLE:
   case UNGT:
   case UNGE:
+   return CC_FPUmode;
+
   case LT:
   case LE:
   case GT:
   case GE:
-   return CC_FPUmode;
+   return CC_FPUEmode;
 
   case LTGT:
   case UNEQ:
@@ -1844,7 +1847,7 @@ arc_init_reg_tables (void)
  if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
  || i == (int) CC_Cmode
  || i == CC_FP_GTmode || i == CC_FP_GEmode || i == CC_FP_ORDmode
- || i == CC_FPUmode || i == CC_FPU_UNEQmode)
+ || i == CC_FPUmode || i == CC_FPUEmode || i == CC_FPU_UNEQmode)
arc_mode_class[i] = 1 << (int) C_MODE;
  else
arc_mode_class[i] = 0;
@@ -8401,6 +8404,7 @@ arc_reorg (void)
 
  /* Avoid FPU instructions.  */
  if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode)
+ || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode)
  || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPU_UNEQmode))
continue;
 
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 4d7ac3281b41..c08ca3d0d432 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1531,7 +1531,7 @@ enum arc_function_type {
   (((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode\
 || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode  \
 || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode\
-|| (MODE) == CC_FPUmode)\
+|| (MODE) == CC_FPUmode || (MODE) == CC_FPUEmode)   \
? reverse_condition_maybe_unordered ((CODE)) \
: reverse_condition ((CODE)))
 
diff --git a/gcc/config/arc/fpu.md b/gcc/config/arc/fpu.md
index 6289e9c3f593..6729795de542 100644
--- a/gcc/config/arc/fpu.md
+++ b/gcc/config/arc/fpu.md
@@ -242,6 +242,18 @@
(set_attr "type" "fpu")
(set_attr "predicable" "yes")])
 
+(define_insn "*cmpsf_fpu_trap"
+  [(set (reg:CC_FPUE CC_REG)
+   (compare:CC_FPUE (match_operand:SF 0 "register_operand"  "r,  r,r")
+   (match_operand:SF 1 "nonmemory_operand" "r,CfZ,F")))]
+  "TARGET_FP_SP_BASE"
+  "fscmpf%?\\t%0,%1"
+  [(set_attr "length" "4,4,8")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*cmpsf_fpu_uneq"
   [(set (reg:CC_FPU_UNEQ CC_REG)
(compare:CC_FPU_UNEQ
@@ -338,6 +350,18 @@
(set_attr "type" "fpu")
(set_attr "predicable" "yes")])
 
+(define_insn "*cmpdf_fpu_trap"
+  [(set (reg:CC_FPUE CC_REG)
+   (compare:CC_FPUE (match_operand:DF 0 "even_register_operand"  "r")
+   (match_operand:DF 1 "even_register_operand"  "r")))

Re: [C++ Patch] Improve build_*_cast locations

2019-12-09 Thread Jason Merrill

On 12/9/19 7:06 AM, Paolo Carlini wrote:

Hi,

On 08/12/19 18:51, Jason Merrill wrote:
Hmm, is the change to cp_expr really necessary vs. using protected_set_expr_location?


Yes, using protected_set_expr_location works fine in this case, I
suppose because we are dealing with expressions anyway, plus the cp_expr
constructor from a tree copies the location too. In the below I also
added the thin build_functional_cast wrapper; this way, none of the
build_*_cast functions called by the parser need set_location
afterwards. Note that at some point we should also do something about
the build_x_* functions, which have been doing that for a while...


Anyway, the below passed testing.

Thanks, Paolo.





OK.

Jason



Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-12-09 Thread David Malcolm
On Fri, 2019-12-06 at 15:31 -0500, Lewis Hyatt wrote:
> On Fri, Dec 06, 2019 at 10:54:30AM -0500, David Malcolm wrote:

[...]

> > The patch is OK for trunk with the nits above fixed.  Do you have
> > commit access?  (I've got my own patch [1] that touches
> > diagnostic-show-locus.c which I'll need to refresh once yours goes
> > in); need to remember to include those data files when committing.
> > 
> > Thanks very much for fixing this, and sorry again for the delay in
> > reviewing it.
> 
> That's wonderful, thanks very much. Happy to contribute something. I
> do not have commit access so if you could please apply it, that would
> be great.

Thanks; I've committed it on your behalf to trunk as r279137 (including
the data files).

FWIW I made a few minor fixes:
* I added a ChangeLog entry for contrib/unicode/unicode-license.txt,
and added the PR reference to the gcc/testsuite/ChangeLog.
* I also moved the top-level ChangeLog to contrib/ChangeLog, updating
the paths.
* I updated the instructions to reflect the path of the generated file:
-3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_wcwidth.h
+3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
(I tested regenerating the file and got the same result as you, on
supplying "12.1.0")

and did another bootstrap&regression test (on x86_64-pc-linux-gnu) for
good measure.


> Might I also trouble you to please have a look at this one:
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00766.html ?

Looking now.

> It is very much shorter than this one and fixes just a couple of
> glitches in pretty-print.c -- UTF-8 gets mangled when printed via %q,
> and line-wrapping shouldn't wrap in the middle of a UTF-8 sequence.
> That's the only other issue I am aware of as far as diagnostics with
> multibyte characters go.
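
For the line-wrapping issue, here is a sketch of the basic rule. It is not the actual pretty-print.c change. In UTF-8 every continuation byte matches `10xxxxxx`, so a candidate break offset can be moved back to the start of the character it falls in. The helper names are hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
static bool
is_utf8_continuation (unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

// Given a candidate byte offset at which to wrap, step back to the start
// of the character containing it, so the break never splits a sequence.
// Assumes 's' holds valid UTF-8.
static std::size_t
safe_break_offset (const std::string &s, std::size_t pos)
{
  if (pos >= s.size ())
    return s.size ();
  while (pos > 0 && is_utf8_continuation (static_cast<unsigned char> (s[pos])))
    --pos;
  return pos;
}
```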

If you feel like tackling a related issue, maybe have a look at how we
print tab characters in diagnostic_show_locus; we should probably
resolve them into spaces (respecting -ftab-stop [1]), since when we
print them as tabs after a left-margin for line-numbers the
"tabification" is mangled.  I don't know if there's a bug for this in
bugzilla, but it seems tightly related to this column-handling work.
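
Here is a minimal sketch of resolving tabs into spaces before printing, with the tab stop passed in the way `-ftab-stop` would supply it. `expand_tabs` is a hypothetical helper. It counts one column per byte, so it ignores the multibyte display widths handled elsewhere in this work.

```cpp
#include <cassert>
#include <string>

// Expand tabs to spaces, advancing to the next multiple of 'tab_stop'.
// Columns are counted from the start of the source line, not from the
// printed left margin, so the result no longer depends on the margin width.
// Simplification: every non-tab byte counts as one column.
static std::string
expand_tabs (const std::string &line, int tab_stop)
{
  std::string out;
  int col = 0;
  for (char c : line)
    {
      if (c == '\t')
        {
          const int n = tab_stop - col % tab_stop;
          out.append (static_cast<std::string::size_type> (n), ' ');
          col += n;
        }
      else
        {
          out += c;
          ++col;
        }
    }
  return out;
}
```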

> Thanks again!

Thanks for all your work putting the patch together.

Dave

[1] see also c-indentation.c; gah



Re: [PATCH] rs6000: Fix 2 for PR92661, Do not define builtins that overload disabled builtins

2019-12-09 Thread Peter Bergner
On 12/6/19 5:12 PM, Segher Boessenkool wrote:
> On Thu, Dec 05, 2019 at 08:44:57AM +, Iain Sandoe wrote:
>> .. or I can just force a false return from effective_target_dfp as we
>>  do for other cases where assembler support does not imply system 
>>  support.
> 
> That's what I would do, yes.

I'm not sure that's necessary.  DFP enablement isn't triggered by
assembler support, only by this gcc/configure fragment (ignoring manual
use of --enable-decimal-float):

  case $target in
powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
i?86*-*-elfiamcu | i?86*-*-gnu* | x86_64*-*-gnu* | \
i?86*-*-mingw* | x86_64*-*-mingw* | \
i?86*-*-cygwin* | x86_64*-*-cygwin*)
  enable_decimal_float=yes
  ;;
*)
  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: decimal float is not supported for this target, ignored" >&5
$as_echo "$as_me: WARNING: decimal float is not supported for this target, ignored" >&2;}
  enable_decimal_float=no
  ;;

So I don't think there is anything to do wrt Darwin here.

Peter




Re: [PATCH][Hashtable 0/6] Code review

2019-12-09 Thread François Dumont

This patch also require an update of the printers.py file.

Here is an updated version.

François

On 11/17/19 9:42 PM, François Dumont wrote:

This is the beginning of a patch series for _Hashtable.

Initial patch to clarify the code. I was tired of seeing true/false or
true_type/false_type without knowing what was true or false.


I also made the code more consistent by choosing to specialize methods
through usage of __unique_keys_t/__multi_keys_t rather than calling
them _M_[multi]_XXX.
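
The readability argument can be illustrated with a toy container that uses hypothetical names. In this sketch the tags are plain aliases for `std::true_type`/`std::false_type`, which is an assumption about how the patch's `__unique_keys_t`/`__multi_keys_t` are defined. A single entry point dispatches on a named tag instead of on an anonymous `true_type`/`false_type`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical named tags in the spirit of __unique_keys_t / __multi_keys_t.
using unique_keys_t = std::true_type;
using multi_keys_t = std::false_type;

template<typename KeyPolicy>
class toy_table
{
  std::vector<int> elems_;

  // Unique keys: reject duplicates.
  bool insert_impl (unique_keys_t, int k)
  {
    if (std::find (elems_.begin (), elems_.end (), k) != elems_.end ())
      return false;
    elems_.push_back (k);
    return true;
  }

  // Multi keys: always insert.
  bool insert_impl (multi_keys_t, int k)
  {
    elems_.push_back (k);
    return true;
  }

public:
  // One public entry point; overload resolution on the tag picks the policy.
  bool insert (int k) { return insert_impl (KeyPolicy{}, k); }
  std::size_t size () const { return elems_.size (); }
};
```

Each overload's first parameter now states which container flavour it serves, instead of an unnamed `true_type`.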



    * include/bits/hashtable_policy.h (__detail::__unique_keys_t): New.
    (__detail::__multi_keys_t): New.
    (__detail::__constant_iterators_t): New.
    (__detail::__mutable_iterators_t): New.
    (__detail::__hash_cached_t): New.
    (__detail::__hash_not_cached_t): New.
    (_Hash_node<>): Change _Cache_hash_code template parameter from bool to
    typename. Adapt partial specializations.
    (_Node_iterator_base<>): Likewise.
    (operator==(const _Node_iterator_base<>&,
    const _Node_iterator_base<>&)): Adapt.
    (operator!=(const _Node_iterator_base<>&,
    const _Node_iterator_base<>&)): Adapt.
    (_Node_iterator<>): Change __constant_iterators and __cache template
    parameters from bool to typename.
    (_Node_const_iterator<>): Likewise.
    (_Map_base<>): Change _Unique_keys template parameter from bool to
    typename. Adapt partial specializations.
    (_Insert<>): Change _Constant_iterators template parameter from bool to
    typename. Adapt partial specializations.
    (_Local_iterator_base<>): Change __cache_hash_code template parameter
    from bool to typename. Adapt partial specialization.
    (_Hash_code_base<>): Likewise.
    (operator==(const _Local_iterator_base<>&,
    const _Local_iterator_base<>&)): Adapt.
    (operator!=(const _Local_iterator_base<>&,
    const _Local_iterator_base<>&)):
    Adapt.
    (_Local_iterator<>): Change __constant_iterators and __cache template
    parameters from bool to typename.
    (_Local_const_iterator<>): Likewise.
    (_Hashtable_base<>): Adapt.
    (_Equal_hash_code<>): Adapt.
    (_Equality<>): Adapt.
    * include/bits/hashtable.h (_Hashtable<>): Replace occurrences of
    true_type/false_type by respectively __unique_keys_t/__multi_keys_t.
    (_M_insert_unique_node(const key_type&, size_t, __hash_code,
    __node_type*, size_t)): Replace by...
    (_M_insert_node(__unique_keys_t, size_t, __hash_code, __node_type*,
    size_t)): ...this.
    (_M_insert_multi_node(__node_type*, const key_type&, __hash_code,
    __node_type*)): Replace by...
    (_M_insert_node(__multi_keys_t, __node_type*, __hash_code,
    __node_type*)): ...this.
    (_M_reinsert_node(node_type&&)): Replace by...
    (_M_reinsert_node(node_type&&, __unique_keys_t)): ...this.
    (_M_reinsert_node(const_iterator, node_type&&, __unique_keys_t)): New,
    forward to the latter.
    (_M_reinsert_node_multi(const_iterator, node_type&&)): Replace by...
    (_M_reinsert_node(const_iterator, node_type&&, __multi_keys_t)):
    ...this.
    (_M_reinsert_node(node_type&&, __multi_keys_t)): New, forward to the
    latter.
    (_M_reinsert_node(node_type&&)): New, use the latter overloads.
    (_M_reinsert_node(const_iterator, node_type&&)): Likewise.
    (_M_merge_unique(_Compatible_Hashtable&)): Replace by...
    (_M_merge(__unique_keys_t, _Compatible_Hashtable&)): ...this.
    (_M_merge_multi(_Compatible_Hashtable&)): Replace by...
    (_M_merge(__multi_keys_t, _Compatible_Hashtable&)): ...this.
    (_M_merge(_Compatible_Hashtable&)): New, use the latter overloads.
    * include/bits/unordered_map.h
    (unordered_map<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_map<>::merge(unordered_map<>&)): Adapt.
    (unordered_map<>::merge(unordered_multimap<>&)): Adapt.
    (unordered_multimap<>::insert(node_type&&)): Adapt.
    (unordered_multimap<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_multimap<>::merge(unordered_multimap<>&)): Adapt.
    (unordered_multimap<>::merge(unordered_map<>&)): Adapt.
    * include/bits/unordered_set.h
    (unordered_set<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_set<>::merge(unordered_set<>&)): Adapt.
    (unordered_set<>::merge(unordered_multiset<>&)): Adapt.
    (unordered_multiset<>::insert(node_type&&)): Adapt.
    (unordered_multiset<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_multiset<>::merge(unordered_multiset<>&)): Adapt.
    (unordered_multiset<>::merge(unordered_set<>&)): Adapt.

Tested under Linux x86_64.

François



diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index c2b2219d471..ef71c090f3b 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -184,7 +184,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   private __detail::_Hashtable_alloc<
 	__alloc_rebind<_Alloc,
 		   __detail::_Hash_node<_Value,
-	_Traits::__hash_cached::value>>>
+	typename _Traits::__hash_cached>>>
 {
   static_assert(is_same::type, _Value>::value,
 	  "unorde

[PING][PATCH v2 2/2] testsuite: Fix run-time tracking down of `libgcc_s'

2019-12-09 Thread Maciej W. Rozycki
On Fri, 29 Nov 2019, Maciej W. Rozycki wrote:

> Fix a catastrophic libgo testsuite failure in cross-compilation where 
> the shared `libgcc_s' library cannot be found by the loader at run time 
> in build-tree testing and consequently all test cases fail the execution 
> stage, giving output (here with the `x86_64-linux-gnu' host and the 
> `riscv64-linux-gnu' target, with RISC-V QEMU in the Linux user emulation 
> mode as the target board) like:
> 
> spawn qemu-riscv64 -E 
> LD_LIBRARY_PATH=.:.../riscv64-linux-gnu/lib64/lp64d/libgo/.libs ./a.exe
> ./a.exe: error while loading shared libraries: libgcc_s.so.1: cannot open 
> shared object file: No such file or directory
> FAIL: archive/tar

 Ping for:



  Maciej


[PING^3][PATCH 0/4] Fix library testsuite compilation for build sysroot

2019-12-09 Thread Maciej W. Rozycki
On Mon, 11 Nov 2019, Maciej W. Rozycki wrote:

>  This patch series addresses a problem with the testsuite compiler being 
> set up across libatomic, libffi, libgo, libgomp with no correlation 
> whatsoever to the target compiler being used in GCC compilation.  
> Consequently there is no arrangement made to set up the compilation 
> sysroot according to the build sysroot specified for GCC compilation, 
> causing a catastrophic failure across the testsuites affected from the 
> inability to link executables.

 Ping for:





  Maciej


[PING][PATCH v3] Add `--with-toolexeclibdir=' configuration option

2019-12-09 Thread Maciej W. Rozycki
On Mon, 2 Dec 2019, Maciej W. Rozycki wrote:

> Provide means, in the form of a `--with-toolexeclibdir=' configuration 
> option, to override the default installation directory for target 
> libraries, otherwise known as $toolexeclibdir.  This is so that it is 
> possible to get newly-built libraries, particularly the shared ones, 
> installed in a common place, so that they can be readily used by the 
> target system as their host libraries, possibly over NFS, without a need 
> to manually copy them over from the currently hardcoded location they 
> would otherwise be installed in.

 Ping for:



  Maciej


Re: [PATCH] Fix multibyte-related issues in pretty-print.c (PR 91843)

2019-12-09 Thread David Malcolm
On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:
> Hello-
> 
> This short patch addresses 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843
> by adding the needed multibyte awareness to pretty-print.c.
> Together with my other patch awaiting review
> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this
> fixes all
> issues that I am aware of regarding printing diagnostics with
> multibyte
> characters in UTF-8 locales. Would you please have a look and see if
> it's OK?
> Thanks very much.
> 
> bootstrapped and tested on x86-64 Linux, all test results were
> identical before
> and after:
> 34 XPASS
> 109 FAIL
> 1490 XFAIL
> 9470 UNSUPPORTED
> 332971 PASS
> 
> -Lewis

Patch looks good to me.

Do you want SVN commit access, as per:
  https://www.gnu.org/software/gcc/svnwrite.html
?

I'm willing to sponsor you.

Dave



[PR92840] [OpenACC] Refuse 'acc_unmap_data' unless mapped by 'acc_map_data'

2019-12-09 Thread Thomas Schwinge
Hi!

See attached "[PR92840] [OpenACC] Refuse 'acc_unmap_data' unless mapped
by 'acc_map_data'", committed to trunk in r279145.

As mentioned in the patch, further checking can be applied later,
incrementally.
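
The rule that the `acc_unmap_data` hunk enforces can be modelled as follows. This is a hedged sketch: `mapped_block` and the `model_*` functions are hypothetical, and `REFCOUNT_INFINITY` is modelled as a sentinel. As the TODO in the patch notes, a single sentinel cannot distinguish `acc_map_data` from other users of it.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical model of a mapped block's reference count; libgomp's
// REFCOUNT_INFINITY is modelled by the sentinel below.
const unsigned long refcount_infinity = ULONG_MAX;

struct mapped_block
{
  unsigned long refcount = 0;
};

// acc_map_data-like: libgomp asserts the fresh mapping has refcount 1 and
// then pins it with the sentinel; the model only does the pinning.
static void
model_map_data (mapped_block &b)
{
  b.refcount = refcount_infinity;
}

// acc_create-like: ordinary, finite reference counting.
static void
model_create (mapped_block &b)
{
  ++b.refcount;
}

// acc_unmap_data-like: refuse blocks whose refcount is finite, i.e. blocks
// that were not mapped by model_map_data.
static void
model_unmap_data (mapped_block &b)
{
  if (b.refcount != refcount_infinity)
    throw std::runtime_error ("refusing to unmap block that has not been"
                              " mapped by 'acc_map_data'");
  b.refcount = 1;  // mark for removal, as the hunk does
}
```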


Grüße
 Thomas


From bea573cb7ea13cece9c51ca9eb1cc9c34005dedf Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 22:52:36 +
Subject: [PATCH] [PR92840] [OpenACC] Refuse 'acc_unmap_data' unless mapped by
 'acc_map_data'

	libgomp/
	PR libgomp/92840
	* oacc-mem.c (acc_map_data): Clarify reference counting behavior.
	(acc_unmap_data): Add error case for 'REFCOUNT_INFINITY'.
	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c:
	New file.
	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Adjust.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279145 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog | 12 +
 libgomp/oacc-mem.c| 18 -
 .../acc_unmap_data-pr92840-1.c| 27 +++
 .../acc_unmap_data-pr92840-2.c| 25 +
 .../acc_unmap_data-pr92840-3.c| 26 ++
 .../libgomp.oacc-c-c++-common/clauses-1.c | 21 ---
 .../libgomp.oacc-c-c++-common/nested-1.c  | 14 +-
 7 files changed, 126 insertions(+), 17 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-3.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 739a76d48ac..7606f17825d 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,17 @@
 2019-12-09  Thomas Schwinge  
 
+	PR libgomp/92840
+	* oacc-mem.c (acc_map_data): Clarify reference counting behavior.
+	(acc_unmap_data): Add error case for 'REFCOUNT_INFINITY'.
+	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c:
+	New file.
+	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Adjust.
+	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Adjust.
+
 	PR libgomp/92511
 	* testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c: Remove
 	this file...
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 55c195bd819..480b9fbb71b 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -407,7 +407,11 @@ acc_map_data (void *h, void *d, size_t s)
 
   tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
 			   &kinds, true, GOMP_MAP_VARS_OPENACC);
-  tgt->list[0].key->refcount = REFCOUNT_INFINITY;
+  splay_tree_key n = tgt->list[0].key;
+  assert (n->refcount == 1);
+  assert (n->dynamic_refcount == 0);
+  /* Special reference counting behavior.  */
+  n->refcount = REFCOUNT_INFINITY;
 
   if (profiling_p)
 	{
@@ -459,6 +463,18 @@ acc_unmap_data (void *h)
   gomp_fatal ("[%p,%d] surrounds %p",
 		  (void *) n->host_start, (int) host_size, (void *) h);
 }
+  /* TODO This currently doesn't catch 'REFCOUNT_INFINITY' usage different from
+     'acc_map_data'.  Maybe 'dynamic_refcount' can be used for disambiguating
+     the different 'REFCOUNT_INFINITY' cases, or simply separate
+     'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
+     etc.)?  */
+  else if (n->refcount != REFCOUNT_INFINITY)
+    {
+      gomp_mutex_unlock (&acc_dev->lock);
+      gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped"
+		  " by 'acc_map_data'",
+		  (void *) h, (int) host_size);
+    }
 
   /* Mark for removal.  */
   n->refcount = 1;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c
new file mode 100644
index 000..d7ae59dd548
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_unmap_data-pr92840-1.c
@@ -0,0 +1,27 @@
+/* Verify that we refuse 'acc_unmap_data', after 'acc_create'.  */
+
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  const int N = 101;
+
+  char *h = (char *) malloc (N);
+  void *d = acc_create (h, N - 3);
+  if (!d)
+    abort ();
+
+  fprintf (stderr, "CheCKpOInT\n");
+  acc_unmap_data (h);
+
+  return 0;
+}
+
+/* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */
+/* { dg-output "refusing to unmap block \\\[\[0-9a-fA-FxX\]+,\\\+98\\\] that has not been mapped by 'acc_map_data'" } */
+/* { dg-shouldfail "" } */
diff --git a/libgomp/testsuite/libgomp.oacc
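The reference-counting rule added to 'acc_unmap_data' above can be modeled in a few lines. This is a hypothetical simplification, not libgomp code: 'struct model_key', 'MODEL_REFCOUNT_INFINITY' and the 'model_*' functions are illustrative stand-ins, and the counts assigned by 'model_create' are assumptions. It only shows the decision the patch introduces: a block is unmapped only if it carries the special infinite count that 'acc_map_data' sets.

```c
#include <stdint.h>

/* Hypothetical stand-in for libgomp's REFCOUNT_INFINITY.  */
#define MODEL_REFCOUNT_INFINITY UINTPTR_MAX

/* Hypothetical, simplified mapping entry (compare 'splay_tree_key').  */
struct model_key
{
  uintptr_t refcount;
  uintptr_t dynamic_refcount;
};

/* Model of acc_map_data: the block gets the special "infinite" count,
   so that only an explicit unmap removes it.  */
static void
model_map_data (struct model_key *n)
{
  n->refcount = MODEL_REFCOUNT_INFINITY;
  n->dynamic_refcount = 0;
}

/* Model of an acc_create-style mapping with finite reference counts
   (the exact values are illustrative assumptions).  */
static void
model_create (struct model_key *n)
{
  n->refcount = 1;
  n->dynamic_refcount = 1;
}

/* Model of the patched acc_unmap_data: return -1 where libgomp would call
   gomp_fatal ("refusing to unmap block ..."); otherwise mark the entry
   for removal and return 0.  */
static int
model_unmap_data (struct model_key *n)
{
  if (n->refcount != MODEL_REFCOUNT_INFINITY)
    return -1;
  n->refcount = 1; /* Mark for removal.  */
  return 0;
}
```

In this model, unmapping a block set up by 'model_create' returns -1, mirroring the error that acc_unmap_data-pr92840-1.c expects after 'acc_create'.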

Re: [PATCH] Multibyte awareness for diagnostics (PR 49973)

2019-12-09 Thread Lewis Hyatt
On Mon, Dec 09, 2019 at 03:12:31PM -0500, David Malcolm wrote:
> On Fri, 2019-12-06 at 15:31 -0500, Lewis Hyatt wrote:
> > On Fri, Dec 06, 2019 at 10:54:30AM -0500, David Malcolm wrote:
> 
> [...]
> 
> > > The patch is OK for trunk with the nits above fixed.  Do you have
> > > commit access?  (I've got my own patch [1] that touches
> > > diagnostic-show-locus.c which I'll need to refresh once yours goes
> > > in); need to remember to include those data files when committing.
> > > 
> > > Thanks very much for fixing this, and sorry again for the delay in
> > > reviewing it.
> > 
> > That's wonderful, thanks very much. Happy to contribute something. I
> > do not have commit access so if you could please apply it, that would
> > be great.
> 
> Thanks; I've committed it on your behalf to trunk as r279137 (including
> the data files).
> 
> FWIW I made a few minor fixes:
> * I added a ChangeLog entry for contrib/unicode/unicode-license.txt,
> and added the PR reference to the gcc/testsuite/ChangeLog.
> * I also moved the top-level ChangeLog to contrib/ChangeLog, updating
> the paths.
> * I updated the instructions to reflect the path of the generated file:
> -3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_wcwidth.h
> +3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
> (I tested regenerating the file and got the same result as you, on
> supplying "12.1.0")
> 
> and did another bootstrap&regression test (on x86_64-pc-linux-gnu) for
> good measure.
>

> 
> > Might I also trouble you to please have a look at this one:
> > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00766.html ?
> 
> Looking now.
> 
> > It is very much shorter than this one and fixes just a couple
> > glitches in pretty-print.c -- UTF-8 gets mangled when printed via %q,
> > and line-wrapping shouldn't wrap in the middle of a UTF-8 sequence.
> > That's the only other issue I am aware of as far as diagnostics with
> > multibyte characters go.
> 
> If you feel like tackling a related issue, maybe have a look at how we
> print tab characters in diagnostic_show_locus; we should probably
> resolve them into spaces (respecting -ftab-stop [1]), since when we
> print them as tabs after a left-margin for line-numbers the
> "tabification" is mangled.  I don't know if there's a bug for this in
> bugzilla, but it seems tightly related to this column-handling work.
> 
> > Thanks again!
> 
> Thanks for all your work putting the patch together.
> 
> Dave
> 
> [1] see also c-indentation.c; gah
> 

That's great, thanks very much.

I'm happy to look at the tab situation. My understanding is that -ftabstop is
only used to implement the logic in c-indentation.c and does not otherwise
propagate to diagnostics. Currently, cpp_wcwidth() returns 1 for all control
characters, including tabs. The tab is also counted as 1 byte for the purpose
of the column number calculation. Then, in diagnostic-show-locus, we convert
every tab to a single space on output. So that much is at least consistent,
and as far as I know it doesn't lead to any spacing issues within the output
that diagnostic-show-locus generates. But it does make the echoed source look
odd for code that mixes tabs and spaces.

The patch below would fix this, including respecting -ftabstop. It makes
cpp_wcwidth() return the actual tab width for '\t', and also changes
diagnostic-show-locus to convert the tab to the same number of spaces. My main
concern with this approach is that I am not quite sure of the "right" way to
expose the option value everywhere it is needed. It lives inside a
cpp_options struct, but I think that is only accessible from a cpp_file
object, which I don't have in all relevant contexts. For this illustration I
used a global variable, but I wonder whether there's a preferred way to do
it. Another issue is that the -ftabstop option is only processed in
c-family, but presumably it's applicable to other languages too. Anyway, if
that much seems OK, I could prepare this properly, including comments and
tests.
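A minimal sketch of the tab handling described above, assuming a tab advances to the next multiple of the tab stop, counted from the start of the source line. 'expand_tabs' is a hypothetical helper, not the code in the patch below. Counting columns from the source line rather than from the printed output should keep the alignment intact when a line-number margin is prepended.

```c
#include <stddef.h>

/* Hypothetical helper: copy the LEN bytes of SRC into DST, replacing each
   tab by enough spaces to reach the next multiple of TABSTOP (columns are
   counted from 0 at the start of the source line; every other byte counts
   as one column).  DST needs room for LEN * TABSTOP + 1 bytes.  Returns the
   number of bytes written, excluding the terminating NUL.  */
static size_t
expand_tabs (const char *src, size_t len, size_t tabstop, char *dst)
{
  size_t col = 0;
  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (src[i] == '\t')
        {
          /* Distance to the next tab stop; always between 1 and TABSTOP.  */
          size_t width = tabstop - (col % tabstop);
          for (size_t j = 0; j < width; j++)
            dst[out++] = ' ';
          col += width;
        }
      else
        {
          dst[out++] = src[i];
          col++;
        }
    }
  dst[out] = '\0';
  return out;
}
```

With a tab stop of 8, "a\tb" expands to 'a', seven spaces and 'b', so 'b' lands in the same column it would occupy in an editor using the same tab stop.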

There is, of course, the question of what the column number should show in
this case. It would be trivial to change it to show the display width of all
characters (multibyte and tabs). Keeping the byte count for multibyte
characters, as now, while newly using the actual width for tabs, would
require adding some new concepts that don't exist yet.
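To make the two column notions concrete, here is a sketch that computes both the byte column and a display column for a position in a UTF-8 line. 'compute_columns' is hypothetical. It assumes that every non-tab character is one column wide (real code such as cpp_wcwidth() also has to handle double-width and zero-width characters) and that a tab advances to the next tab stop.

```c
#include <stddef.h>

/* 1-based columns of a position within a source line.  */
struct columns
{
  size_t byte_col;
  size_t display_col;
};

/* Hypothetical helper: columns of the character starting at byte
   BYTE_OFFSET of the UTF-8 string LINE.  Simplifications: each code point
   other than '\t' is one column wide; '\t' advances to the next multiple
   of TABSTOP.  */
static struct columns
compute_columns (const char *line, size_t byte_offset, size_t tabstop)
{
  size_t display = 0;
  for (size_t i = 0; i < byte_offset; i++)
    {
      unsigned char c = (unsigned char) line[i];
      if (c == '\t')
        display += tabstop - (display % tabstop);
      else if ((c & 0xC0) != 0x80)
        /* Count lead bytes only; continuation bytes look like 10xxxxxx.  */
        display++;
    }
  struct columns result = { byte_offset + 1, display + 1 };
  return result;
}
```

For the line "\xc3\xa9x" (U+00E9 followed by 'x'), the 'x' is at byte column 3 but display column 2; after a leading tab with a tab stop of 8, the next character is at byte column 2 but display column 9.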

-Lewis

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index c913291c07c..ff85344d63f 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -506,7 +506,7 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
 case OPT_ftabstop_:
   /* It is documented that we silently ignore silly values.  */
   if (value >= 1 && value <= 100)
-   cpp_opts->tabstop = value;
+   cpp_opts->tabstop = cpp_tab_width = value;
   break;
 
 case OPT_fexec_charset_:
diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnosti

Re: [PATCH] Fix multibyte-related issues in pretty-print.c (PR 91843)

2019-12-09 Thread Lewis Hyatt
On Mon, Dec 9, 2019 at 4:58 PM David Malcolm  wrote:
>
> On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > This short patch addresses
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843
> > by adding the needed multibyte awareness to pretty-print.c.
> > Together with my other patch awaiting review
> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this
> > fixes all issues that I am aware of regarding printing diagnostics
> > with multibyte characters in UTF-8 locales. Would you please have a
> > look and see if it's OK? Thanks very much.
> >
> > bootstrapped and tested on x86-64 Linux, all test results were
> > identical before and after:
> > 34 XPASS
> > 109 FAIL
> > 1490 XFAIL
> > 9470 UNSUPPORTED
> > 332971 PASS
> >
> > -Lewis
>
> Patch looks good to me.
>
> Do you want SVN commit access, as per:
>   https://www.gnu.org/software/gcc/svnwrite.html
> ?
>
> I'm willing to sponsor you.
>
> Dave
>

Thanks, that sounds great. I will submit the form then.

-Lewis
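The line-wrapping issue mentioned in this thread, not breaking inside a UTF-8 sequence, comes down to backing up from a candidate break point past any continuation bytes (bit pattern 10xxxxxx). The helper below is a hypothetical sketch of that idea, not the code in pretty-print.c.

```c
#include <stddef.h>

/* Hypothetical helper: return the largest offset <= LIMIT at which the
   UTF-8 string S (LEN bytes) can be split without cutting a multibyte
   sequence in half.  An offset is safe if it is the end of the string or
   if the byte there is not a continuation byte.  */
static size_t
utf8_safe_split (const char *s, size_t len, size_t limit)
{
  if (limit >= len)
    return len;
  size_t i = limit;
  while (i > 0 && ((unsigned char) s[i] & 0xC0) == 0x80)
    i--;
  return i;
}
```

For "ab\xc3\xa9", a requested break after 3 bytes would split the two-byte character, so the helper moves it back to offset 2.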


[PR92503] [OpenACC] Don't silently 'acc_unmap_data' in 'acc_free'

2019-12-09 Thread Thomas Schwinge
Hi!

See attached "[PR92503] [OpenACC] Don't silently 'acc_unmap_data' in
'acc_free'", committed to trunk in r279146.

As mentioned in PR92503, further work can be done later on,
incrementally, to avoid "expensive device-to-host-address lookup":
possibly "we might actually keep such additional/expensive
sanity-checking, but guard it by an environment variable".
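The behavior change can be sketched as follows, under stated assumptions: 'struct model_mapping', 'model_lookup_dev' and 'model_acc_free' are hypothetical, and a flat array with a linear scan stands in for libgomp's data structures. The point is only that the device-address check comes first, and a pointer that is still mapped is rejected instead of being unmapped behind the caller's back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record of one host<->device mapping.  */
struct model_mapping
{
  uintptr_t host_start;
  uintptr_t dev_start;
  size_t size;
};

/* Linear device-address lookup over all mappings: the kind of lookup
   that makes this sanity check comparatively expensive.  Returns the
   mapping containing D, or NULL.  */
static const struct model_mapping *
model_lookup_dev (const struct model_mapping *maps, size_t n, uintptr_t d)
{
  for (size_t i = 0; i < n; i++)
    if (d >= maps[i].dev_start && d < maps[i].dev_start + maps[i].size)
      return &maps[i];
  return NULL;
}

/* Model of the new acc_free behavior: return -1 (where libgomp reports
   an error) if D is still mapped; otherwise return 0, where the real
   function would release the device memory.  */
static int
model_acc_free (const struct model_mapping *maps, size_t n, uintptr_t d)
{
  if (model_lookup_dev (maps, n, d) != NULL)
    return -1;
  return 0;
}
```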


Grüße
 Thomas


From 03383a93c7318009ddd0e8d77b1a950c4b2b8f5a Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 22:52:47 +
Subject: [PATCH] [PR92503] [OpenACC] Don't silently 'acc_unmap_data' in
 'acc_free'

	libgomp/
	PR libgomp/92503
	* oacc-mem.c (acc_free): Error out instead of 'acc_unmap_data'.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-1.c: New
	file.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/context-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279146 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog | 25 +++
 libgomp/oacc-mem.c| 17 +-
 .../acc_free-pr92503-1.c  | 28 
 .../acc_free-pr92503-2.c  | 27 
 .../acc_free-pr92503-3-2.c| 28 
 .../acc_free-pr92503-3.c  | 28 
 .../acc_free-pr92503-4-2.c| 31 ++
 .../acc_free-pr92503-4.c  | 32 +++
 .../libgomp.oacc-c-c++-common/clauses-1.c | 12 +--
 .../libgomp.oacc-c-c++-common/context-1.c |  6 ++--
 .../libgomp.oacc-c-c++-common/context-2.c |  6 ++--
 .../libgomp.oacc-c-c++-common/context-3.c |  6 ++--
 .../libgomp.oacc-c-c++-common/context-4.c |  6 ++--
 .../libgomp.oacc-c-c++-common/lib-13.c|  2 +-
 .../libgomp.oacc-c-c++-common/lib-14.c|  2 +-
 .../libgomp.oacc-c-c++-common/lib-18.c|  2 +-
 .../libgomp.oacc-c-c++-common/lib-91.c|  2 ++
 .../libgomp.oacc-c-c++-common/nested-1.c  | 12 +--
 18 files changed, 242 insertions(+), 30 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 7606f17825d..62092a2d765 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,30 @@
 2019-12-09  Thomas Schwinge  
 
+	PR libgomp/92503
+	* oacc-mem.c (acc_free): Error out instead of 'acc_unmap_data'.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-1.c: New
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Adjust.
+	* testsuite/libgomp.oacc-c-c++-common/context-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/context-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
+
 	PR libgomp/92840
 	* oacc-mem.c (acc_map_data): Clarify reference counting behavior.
 	(acc_unmap_data): Add error case for 'REFCOUNT_INFINITY'.
diff 

Re: [PATCH 25/49] analyzer: new files: graphviz.{cc|h}

2019-12-09 Thread David Malcolm
On Sat, 2019-12-07 at 07:54 -0700, Jeff Law wrote:
> On Fri, 2019-11-15 at 20:23 -0500, David Malcolm wrote:
> > This patch adds a simple wrapper class to make it easier to
> > write human-readable .dot files.
> > 
> > gcc/ChangeLog:
> > * analyzer/graphviz.cc: New file.
> > * analyzer/graphviz.h: New file.
> This doesn't seem specific to the analyzer at all.  Can we move it
> into a more generic place for others to see and be able to use?
> 
> jeff

Will do for next iteration.

Dave



Re: [PATCH 26/49] analyzer: new files: digraph.{cc|h} and shortest-paths.h

2019-12-09 Thread David Malcolm
On Sat, 2019-12-07 at 07:58 -0700, Jeff Law wrote:
> On Fri, 2019-11-15 at 20:23 -0500, David Malcolm wrote:
> > This patch adds template classes for directed graphs, their nodes
> > and edges, and for finding the shortest path through such a graph.
> > 
> > gcc/ChangeLog:
> > * analyzer/digraph.cc: New file.
> > * analyzer/digraph.h: New file.
> > * analyzer/shortest-paths.h: New file.
> Nothing too worrisome here.  I'm kind of surprised we haven't needed
> some of these capabilities before now.  Thoughts on moving them
> outside of analyzer into a more generic location?
> 
> jeff

Will do for next iteration - nothing is analyzer-specific here (but
it's used in two different places in the analyzer code, via
subclassing).

Dave
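For readers who want a picture of what finding the shortest path through such a graph involves, here is a generic breadth-first search over a small adjacency-matrix graph. It is not the analyzer's shortest_paths template; 'small_graph' and 'bfs_shortest_paths' are hypothetical. With every edge costing 1, BFS gives the same path lengths as Dijkstra's algorithm.

```c
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical graph representation: edges[i][j] != 0 means i -> j.  */
struct small_graph
{
  int num_nodes;
  int edges[MAX_NODES][MAX_NODES];
};

/* Breadth-first search from ORIGIN.  On return, DIST[n] is the number of
   edges on a shortest path to node N (-1 if unreachable) and PRED[n] is
   the node before N on that path (-1 for ORIGIN and unreachable nodes).  */
static void
bfs_shortest_paths (const struct small_graph *g, int origin,
                    int dist[MAX_NODES], int pred[MAX_NODES])
{
  int queue[MAX_NODES]; /* Each node is enqueued at most once.  */
  int head = 0, tail = 0;
  for (int i = 0; i < g->num_nodes; i++)
    {
      dist[i] = -1;
      pred[i] = -1;
    }
  dist[origin] = 0;
  queue[tail++] = origin;
  while (head < tail)
    {
      int u = queue[head++];
      for (int v = 0; v < g->num_nodes; v++)
        if (g->edges[u][v] && dist[v] == -1)
          {
            dist[v] = dist[u] + 1;
            pred[v] = u;
            queue[tail++] = v;
          }
    }
}
```

On the graph A->B, A->C, C->D, B->E, E->F, C->F (nodes numbered 0 to 5), F is reached in two steps via C, and following PRED back from a node recovers its path.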



Re: [PATCH 29/49] analyzer: new files: tristate.{cc|h}

2019-12-09 Thread David Malcolm
On Sat, 2019-12-07 at 08:03 -0700, Jeff Law wrote:
> On Fri, 2019-11-15 at 20:23 -0500, David Malcolm wrote:
> > gcc/ChangeLog:
> > * analyzer/tristate.cc: New file.
> > * analyzer/tristate.h: New file.
> Nothing really concerning here.  Seems like a generic facility we'd
> like to be able to use elsewhere.   Move outside the analyzer?

Will do for next iteration.

Dave



Re: [PATCH 17/49] Support for adding selftests via a plugin

2019-12-09 Thread David Malcolm
On Sat, 2019-12-07 at 07:40 -0700, Jeff Law wrote:
> On Fri, 2019-11-15 at 20:23 -0500, David Malcolm wrote:
> > This patch provides a plugin callback for invoking selftests, so
> > that a plugin can add tests to those run by -fself-test=DIR.  The
> > callback invocation uses invoke_plugin_callbacks, which is a no-op if
> > plugin support is disabled.
> > 
> > gcc/ChangeLog:
> > * plugin.c (register_callback): Add case for
> > PLUGIN_RUN_SELFTESTS.
> > (invoke_plugin_callbacks_full): Likewise.
> > * plugin.def (PLUGIN_RUN_SELFTESTS): New event.
> > * selftest-run-tests.c: Include "plugin.h".
> > (selftest::run_tests): Run any plugin-provided selftests.
> I'm generally in favor of having the ability for plugins to invoke
> selftests.  But do we want to push on this now given the preferred
> direction for this kit?
> 
> jeff

I'm dropping this for the sake of simplicity/YAGNI in the next
iteration of the kit, as it's no longer needed by the kit.

Dave



[PR92116, PR92877] [OpenACC] Replace 'openacc.data_environ' by standard libgomp mechanics (was: [PATCH] OpenACC reference count overhaul)

2019-12-09 Thread Thomas Schwinge
Hi!

\o/ Yay for the first split-out piece of the big "OpenACC reference count
overhaul" going in:

On 2019-10-29T12:15:01+, Julian Brown  wrote:
> On Mon, 21 Oct 2019 16:14:11 +0200
> Thomas Schwinge  wrote:
>> Remember to look into  "Potential null
>> pointer dereference in 'gomp_acc_remove_pointer'", which may be
>> relevant here.

I investigated and answered that one, and "we shall be removing this code
from 'gomp_acc_remove_pointer' any moment now" -- now done by means of:

>  - the "data_environ" field in the device descriptor -- a linear linked
>    list containing a target memory descriptor for each "acc enter data"
>    mapping -- has been removed.  This brings OpenACC closer to the
>    OpenMP implementation for non-lexically-scoped data mapping
>    (GOMP_target_enter_exit_data), and is potentially a performance win
>    if lots of data is mapped in this way.

And, the 'data_environ' on-the-side data structure caused actual bugs:
structured mappings (via 'gomp_map_vars') didn't maintain 'data_environ',
so 'lookup_dev' didn't work for these, which caused some diagnostic
confusion as well as 'acc_hostptr' always returning NULL for these, huh!
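A sketch of why a device-address lookup over the single, host-keyed mapping table finds structured and unstructured mappings alike, and why it costs more than a host lookup. 'struct tree_node' and the 'tree_*' functions are hypothetical; a plain binary search tree stands in for libgomp's splay tree, and this is not the reimplemented 'lookup_dev' itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping-table node, ordered by host address.  */
struct tree_node
{
  uintptr_t host_start, host_end; /* Host range [host_start, host_end).  */
  uintptr_t dev_start;            /* Device address of host_start.  */
  struct tree_node *left, *right;
};

/* Host lookup can descend the tree, because it is ordered by host
   address.  */
static struct tree_node *
tree_lookup_host (struct tree_node *node, uintptr_t h)
{
  while (node)
    {
      if (h < node->host_start)
        node = node->left;
      else if (h >= node->host_end)
        node = node->right;
      else
        return node;
    }
  return NULL;
}

/* Device lookup cannot use that ordering, so it may visit every node.
   Since all mappings live in this one table, none is missed.  */
static struct tree_node *
tree_lookup_dev (struct tree_node *node, uintptr_t d)
{
  if (!node)
    return NULL;
  size_t size = node->host_end - node->host_start;
  if (d >= node->dev_start && d < node->dev_start + size)
    return node;
  struct tree_node *found = tree_lookup_dev (node->left, d);
  return found ? found : tree_lookup_dev (node->right, d);
}

/* acc_hostptr-style translation: device address to host address, or 0
   (standing in for NULL) if D is not mapped.  */
static uintptr_t
tree_hostptr (struct tree_node *root, uintptr_t d)
{
  struct tree_node *n = tree_lookup_dev (root, d);
  return n ? n->host_start + (d - n->dev_start) : 0;
}
```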

See attached "[PR92116, PR92877] [OpenACC] Replace 'openacc.data_environ'
by standard libgomp mechanics", committed to trunk in r279147.


Grüße
 Thomas


From a74d1c85921f0828075a6bf35e94df411d110673 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 9 Dec 2019 22:52:56 +
Subject: [PATCH] [PR92116, PR92877] [OpenACC] Replace 'openacc.data_environ'
 by standard libgomp mechanics

	libgomp/
	PR libgomp/92116
	PR libgomp/92877
	* oacc-mem.c (lookup_dev): Reimplement.  Adjust all users.
	* libgomp.h (struct acc_dispatch_t): Remove 'data_environ' member.
	Adjust all users.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c:
	Remove XFAIL.
	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr92877-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279147 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  15 +++
 libgomp/libgomp.h |  10 +-
 libgomp/oacc-host.c   |   2 -
 libgomp/oacc-mem.c| 121 --
 libgomp/target.c  |   1 -
 .../acc_free-pr92503-4-2.c|   4 +-
 .../acc_free-pr92503-4.c  |   4 +-
 .../libgomp.oacc-c-c++-common/pr92877-1.c |  19 +++
 8 files changed, 64 insertions(+), 112 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr92877-1.c

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 62092a2d765..83227032f88 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,18 @@
+2019-12-09  Thomas Schwinge  
+	Julian Brown  
+
+	PR libgomp/92116
+	PR libgomp/92877
+
+	* oacc-mem.c (lookup_dev): Reimplement.  Adjust all users.
+	* libgomp.h (struct acc_dispatch_t): Remove 'data_environ' member.
+	Adjust all users.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c:
+	Remove XFAIL.
+	* testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr92877-1.c: New file.
+
 2019-12-09  Thomas Schwinge  
 
 	PR libgomp/92503
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index bab733d2b2d..a35aa07c80b 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1025,13 +1025,6 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 typedef struct acc_dispatch_t
 {
-  /* This is a linked list of data mapped using the
-     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
-     Unlike mapped_data in the goacc_thread struct, unmapping can
-     happen out-of-order with respect to mapping.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  struct target_mem_desc *data_environ;
-
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
@@ -1132,8 +1125,7 @@ struct gomp_device_descr
   enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
-  /* This is mutable because of its mutable data_environ and target_data
-     members.  */
+  /* This is mutable because of its mutable target_data member.  */
   acc_dispatch_t openacc;
 };
 
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index cbcac9bf7b3..e9cd4bfcd4a 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -264,8 +264,6 @@ static struct gomp_device_descr host_dispatch =
 .state = GOMP_DEVICE_UNINITIALIZED,
 
 .openacc = {
-  .data_environ = NULL,
-
   .exec_func = host_openacc_exec,
 
   .create_thread_data_func = host_openacc_create_thread_data,
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 81ebddf7580..369a11696da 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -50,44 +50,42 @@ lookup_host (struct gomp_device_descr *dev, void

Re: [PATCH 28/49] analyzer: new files: analyzer.{cc|h}

2019-12-09 Thread David Malcolm
On Fri, 2019-12-06 at 22:38 -0500, Eric Gallager wrote:
> On 11/15/19, David Malcolm  wrote:
[...]
> > diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
> > new file mode 100644
> > index 000..ace8924
> > --- /dev/null
> > +++ b/gcc/analyzer/analyzer.h
> > @@ -0,0 +1,126 @@
> > +/* Utility functions for the analyzer.
> > +   Copyright (C) 2019 Free Software Foundation, Inc.
> > +   Contributed by David Malcolm .
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful, but
> > +WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +;.  */
> > +
> > +#ifndef GCC_ANALYZER_ANALYZER_H
> > +#define GCC_ANALYZER_ANALYZER_H
> > +
> > +/* Forward decls of common types, with indentation to show inheritance. */
> 
> I'm wondering about the "with indentation to show inheritance"
> part...
> does that require tweaking any editor configuration files or adding
> /*INDENT-OFF*/ comments or anything to prevent automatic formatting
> tools from "fixing" the indentation to go back to the normal style of
> having everything be aligned?

If we had some kind of automatic formatting then I guess it would, but
I don't think we have such a system in place.

[...]



Re: [PATCH] add -Wmismatched-tags (PR 61339)

2019-12-09 Thread Martin Sebor

On 12/6/19 12:08 PM, Jason Merrill wrote:

On 12/5/19 6:47 PM, Jakub Jelinek wrote:

On Thu, Dec 05, 2019 at 04:33:10PM -0700, Martin Sebor wrote:
It's hard to distinguish between this type and the previous one by name;
this one should probably have "map" in its name.


+static GTY (()) record_to_locs_t *rec2loc;

...

+    rec2loc = new record_to_locs_t ();


If this isn't GC-allocated, marking it with GTY(()) seems wrong.  How do
you imagine this warning interacting with PCH?


I have to confess I know too little about PCH to have an idea how
it might interact.  Is there something you suggest I try testing?


For your patch, obviously some struct/class forward declarations or
definitions in a header that you compile into PCH and then the main testcase
that contains the mismatched pairs.

If there is something that you need to record during parsing of the
precompiled header and use later on, everything needs to be GGC allocated.

So, the hash_map needs to be created with something like
hash_map::create_ggc (nnn)
and it really can't use pointer hashing, but has to use some different one
(say on DECL_UID, TYPE_UID etc.), because the addresses are remapped during
PCH save/restore cycle, but hash tables aren't rehashed.
See e.g. PR92458.
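The pointer-hashing problem can be illustrated with a toy table. Everything here is hypothetical: one slot per bucket, made-up addresses, and 'toy_relocate' standing in for the address remapping a PCH restore performs without rehashing. An entry that is found by address before relocation is lost afterwards, while a lookup keyed on a stable id (in the spirit of DECL_UID or TYPE_UID) still succeeds.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16

/* One slot per bucket; no collision handling (toy only).  */
struct toy_entry
{
  int used;
  uintptr_t addr; /* Where the recorded object currently lives.  */
  unsigned uid;   /* Stable identifier of the object.  */
};

static size_t
bucket_of_addr (uintptr_t addr)
{
  return (addr >> 4) % NBUCKETS;
}

static size_t
bucket_of_uid (unsigned uid)
{
  return uid % NBUCKETS;
}

static void
toy_insert (struct toy_entry *tab, size_t bucket, uintptr_t addr,
            unsigned uid)
{
  tab[bucket].used = 1;
  tab[bucket].addr = addr;
  tab[bucket].uid = uid;
}

/* Simulated restore: stored addresses are updated by DELTA, but entries
   stay in the buckets computed from the old addresses.  */
static void
toy_relocate (struct toy_entry *tab, uintptr_t delta)
{
  for (size_t i = 0; i < NBUCKETS; i++)
    if (tab[i].used)
      tab[i].addr += delta;
}

static int
toy_find_by_addr (const struct toy_entry *tab, uintptr_t addr)
{
  const struct toy_entry *e = &tab[bucket_of_addr (addr)];
  return e->used && e->addr == addr;
}

static int
toy_find_by_uid (const struct toy_entry *tab, unsigned uid)
{
  const struct toy_entry *e = &tab[bucket_of_uid (uid)];
  return e->used && e->uid == uid;
}
```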


Alternately you can decide that this information will not be saved to 
PCH, and rely on CLASSTYPE_DECLARED_CLASS for classes loaded from a PCH.


This seems like the right approach to me.  Mismatches in
a precompiled header should be diagnosed when the header is
being compiled, so the only ones involving its uses should
be between classes defined in it and declared or referenced
outside it.  I've implemented this in the attached revision.

Martin
PR c++/61339 - add warning for mismatch between struct and class

gcc/c-family/ChangeLog:

	PR c++/61339
	* c.opt (-Wmismatched-tags, -Wredundant-tags): New options.

gcc/cp/ChangeLog:

	PR c++/61339
	* parser.c (cp_parser_maybe_warn_enum_key): New function.
	(class_decl_loc_t): New class.
	(class_to_loc_map_t): New typedef.
	(class2loc): New global variable.
	(cp_parser_elaborated_type_specifier): Call
	cp_parser_maybe_warn_enum_key.
	(cp_parser_class_head): Call cp_parser_check_class_key.
	(cp_parser_check_class_key): Add arguments.  Call class_decl_loc_t::add.
	(c_parse_file): Call class_decl_loc_t::diag_mismatched_tags.

gcc/testsuite/ChangeLog:

	PR c++/61339
	* g++.dg/warn/Wmismatched-tags.C: New test.
	* g++.dg/warn/Wredundant-tags.C: New test.
	* g++.dg/pch/Wmismatched-tags.C: New test.
	* g++.dg/pch/Wmismatched-tags.Hs: New test header.

gcc/ChangeLog:

	PR c++/61339
	* doc/invoke.texi (-Wmismatched-tags, -Wredundant-tags): Document
	new C++ options.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 914a2f0ef44..4f3d3cf0d43 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -755,6 +755,10 @@ Wmisleading-indentation
 C C++ Common Var(warn_misleading_indentation) Warning LangEnabledBy(C C++,Wall)
 Warn when the indentation of the code does not reflect the block structure.
 
+Wmismatched-tags
+C++ ObjC++ Var(warn_mismatched_tags) Warning
+Warn when a class is redeclared or referenced using a mismatched class-key.
+
 Wmissing-braces
 C ObjC C++ ObjC++ Var(warn_missing_braces) Warning LangEnabledBy(C ObjC,Wall)
 Warn about possibly missing braces around initializers.
@@ -783,6 +787,10 @@ Wpacked-not-aligned
 C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn when fields in a struct with the packed attribute are misaligned.
 
+Wredundant-tags
+C++ ObjC++ Var(warn_redundant_tags) Warning
+Warn when a class or enumerated type is referenced using a redundant class-key.
+
 Wsized-deallocation
 C++ ObjC++ Var(warn_sized_deallocation) Warning EnabledBy(Wextra)
 Warn about missing sized deallocation functions.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fb030022627..7915c7416aa 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2599,8 +2599,9 @@ static enum tag_types cp_parser_token_is_class_key
   (cp_token *);
 static enum tag_types cp_parser_token_is_type_parameter_key
   (cp_token *);
+static void cp_parser_maybe_warn_enum_key (cp_parser *, location_t, tree, rid);
 static void cp_parser_check_class_key
-  (enum tag_types, tree type);
+  (cp_parser *, location_t, enum tag_types, tree type, bool, bool);
 static void cp_parser_check_access_in_redeclaration
   (tree type, location_t location);
 static bool cp_parser_optional_template_keyword
@@ -18480,6 +18481,11 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
   tree globalscope;
   cp_token *token = NULL;
 
+  /* For class and enum types the location of the class-key or enum-key.  */
+  location_t key_loc = cp_lexer_peek_token (parser->lexer)->location;
+  /* For a scoped enum, the 'class' or 'struct' keyword id.  */
+  rid scoped_key = RID_MAX;
+
   /* See if we're looking at the `enum' keyword.  */
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_ENUM))
 {
@@ -18490,10 +

Re: [PATCH 26/49] analyzer: new files: digraph.{cc|h} and shortest-paths.h

2019-12-09 Thread Martin Sebor

On 11/15/19 6:23 PM, David Malcolm wrote:

This patch adds template classes for directed graphs, their nodes
and edges, and for finding the shortest path through such a graph.


Just a few mostly minor comments from me, in a similar vein as
on some of the other patches.



gcc/ChangeLog:
* analyzer/digraph.cc: New file.
* analyzer/digraph.h: New file.
* analyzer/shortest-paths.h: New file.
---
  gcc/analyzer/digraph.cc   | 189 
  gcc/analyzer/digraph.h| 248 ++
  gcc/analyzer/shortest-paths.h | 147 +
  3 files changed, 584 insertions(+)
  create mode 100644 gcc/analyzer/digraph.cc
  create mode 100644 gcc/analyzer/digraph.h
  create mode 100644 gcc/analyzer/shortest-paths.h

diff --git a/gcc/analyzer/digraph.cc b/gcc/analyzer/digraph.cc
new file mode 100644
index 000..c1fa46e
--- /dev/null
+++ b/gcc/analyzer/digraph.cc
@@ -0,0 +1,189 @@
+/* Template classes for directed graphs.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "analyzer/graphviz.h"
+#include "analyzer/digraph.h"
+#include "analyzer/shortest-paths.h"
+#include "selftest.h"
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* A family of digraph classes for writing selftests.  */
+
+struct test_node;
+struct test_edge;
+struct test_graph;
+struct test_dump_args_t {};
+struct test_cluster;
+
+struct test_graph_traits
+{
+  typedef test_node node_t;
+  typedef test_edge edge_t;
+  typedef test_graph graph_t;
+  typedef test_dump_args_t dump_args_t;
+  typedef test_cluster cluster_t;
+};
+
+struct test_node : public dnode
+{
+  test_node (const char *name, int index) : m_name (name), m_index (index) {}
+  void dump_dot (graphviz_out *, const dump_args_t &) const OVERRIDE
+  {
+  }
+
+  const char *m_name;
+  int m_index;
+};
+
+struct test_edge : public dedge
+{
+  test_edge (node_t *src, node_t *dest)
+  : dedge (src, dest)
+  {}
+
+  void dump_dot (graphviz_out *gv, const dump_args_t &) const OVERRIDE
+  {
+    gv->println ("%s -> %s;", m_src->m_name, m_dest->m_name);
+  }
+};
+
+struct test_graph : public digraph
+{
+  test_node *add_test_node (const char *name)
+  {
+    test_node *result = new test_node (name, m_nodes.length ());
+    add_node (result);
+    return result;
+  }
+
+  test_edge *add_test_edge (test_node *src, test_node *dst)
+  {
+    test_edge *result = new test_edge (src, dst);
+    add_edge (result);
+    return result;
+  }
+};
+
+struct test_cluster : public cluster
+{
+};
+
+struct test_path
+{
+  auto_vec m_edges;
+};
+
+/* Smoketest of digraph dumping.  */
+
+static void
+test_dump_to_dot ()
+{
+  test_graph g;
+  test_node *a = g.add_test_node ("a");
+  test_node *b = g.add_test_node ("b");
+  g.add_test_edge (a, b);
+
+  pretty_printer pp;
+  pp.buffer->stream = NULL;
+  test_dump_args_t dump_args;
+  g.dump_dot_to_pp (&pp, NULL, dump_args);
+
+  ASSERT_STR_CONTAINS (pp_formatted_text (&pp),
+  "a -> b;\n");
+}
+
+/* Test shortest paths from A in this digraph,
+   where edges run top-to-bottom if not otherwise labeled:
+
+  A
+ / \
+B   C-->D
+|   |
+E   |
+ \ /
+  F.  */
+
+static void
+test_shortest_paths ()
+{
+  test_graph g;
+  test_node *a = g.add_test_node ("a");
+  test_node *b = g.add_test_node ("b");
+  test_node *c = g.add_test_node ("c");
+  test_node *d = g.add_test_node ("d");
+  test_node *e = g.add_test_node ("e");
+  test_node *f = g.add_test_node ("f");
+
+  test_edge *ab = g.add_test_edge (a, b);
+  test_edge *ac = g.add_test_edge (a, c);
+  test_edge *cd = g.add_test_edge (c, d);
+  test_edge *be = g.add_test_edge (b, e);
+  g.add_test_edge (e, f);
+  test_edge *cf = g.add_test_edge (c, f);
+
+  shortest_paths sp (g, a);
+
+  test_path path_to_a = sp.get_shortest_path (a);
+  ASSERT_EQ (path_to_a.m_edges.length (), 0);
+
+  test_path path_to_b = sp.get_shortest_path (b);
+  ASSERT_EQ (path_to_b.m_edges.length (), 1);
+  ASSERT_EQ (path_to_b.m_edges[0], ab);
+
+  test_path path_to_c = sp.get_shortest_path (c);
+  ASSERT_EQ (path_to_c.m_edges.length (), 1);
+  ASSERT_EQ (path_to_c.m_edges[0], ac);
+
+  test_path path_to_d = sp.get

Re: [PATCH 29/49] analyzer: new files: tristate.{cc|h}

2019-12-09 Thread Martin Sebor

On 11/15/19 6:23 PM, David Malcolm wrote:

gcc/ChangeLog:
* analyzer/tristate.cc: New file.
* analyzer/tristate.h: New file.
---
  gcc/analyzer/tristate.cc | 222 +++
  gcc/analyzer/tristate.h  |  82 +
  2 files changed, 304 insertions(+)
  create mode 100644 gcc/analyzer/tristate.cc
  create mode 100644 gcc/analyzer/tristate.h

diff --git a/gcc/analyzer/tristate.cc b/gcc/analyzer/tristate.cc
new file mode 100644
index 000..ac16129
--- /dev/null
+++ b/gcc/analyzer/tristate.cc
@@ -0,0 +1,222 @@
+/* "True" vs "False" vs "Unknown".
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "analyzer/tristate.h"
+#include "selftest.h"
+
+const char *
+tristate::as_string () const
+{
+  switch (m_value)
+{
+default:
+  gcc_unreachable ();
+case TS_UNKNOWN:
+  return "UNKNOWN";
+case TS_TRUE:
+  return "TRUE";
+case TS_FALSE:
+  return "FALSE";
+}
+}
+
+tristate
+tristate::not_ () const
+{
+  switch (m_value)
+{
+default:
+  gcc_unreachable ();
+case TS_UNKNOWN:
+  return tristate (TS_UNKNOWN);
+case TS_TRUE:
+  return tristate (TS_FALSE);
+case TS_FALSE:
+  return tristate (TS_TRUE);
+}
+}
+
+tristate
+tristate::or_ (tristate other) const
+{
+  switch (m_value)
+{
+default:
+  gcc_unreachable ();
+case TS_UNKNOWN:
+  if (other.is_true ())
+   return tristate (TS_TRUE);
+  else
+   return tristate (TS_UNKNOWN);
+case TS_FALSE:
+  return other;
+case TS_TRUE:
+  return tristate (TS_TRUE);
+}
+}
+
+tristate
+tristate::and_ (tristate other) const
+{
+  switch (m_value)
+{
+default:
+  gcc_unreachable ();
+case TS_UNKNOWN:
+  if (other.is_false ())
+   return tristate (TS_FALSE);
+  else
+   return tristate (TS_UNKNOWN);
+case TS_TRUE:
+  return other;
+case TS_FALSE:
+  return tristate (TS_FALSE);
+}
+}
+
+#if CHECKING_P
+
+namespace selftest {
+
+#define ASSERT_TRISTATE_TRUE(TRISTATE) \
+  SELFTEST_BEGIN_STMT  \
+  ASSERT_EQ (TRISTATE, tristate (tristate::TS_TRUE));  \
+  SELFTEST_END_STMT
+
+#define ASSERT_TRISTATE_FALSE(TRISTATE) \
+  SELFTEST_BEGIN_STMT  \
+  ASSERT_EQ (TRISTATE, tristate (tristate::TS_FALSE)); \
+  SELFTEST_END_STMT
+
+#define ASSERT_TRISTATE_UNKNOWN(TRISTATE) \
+  SELFTEST_BEGIN_STMT  \
+  ASSERT_EQ (TRISTATE, tristate (tristate::TS_UNKNOWN));   \
+  SELFTEST_END_STMT
+
+/* Test tristate's ctors, along with is_*, as_string, operator==, and
+   operator!=.  */
+
+static void
+test_ctors ()
+{
+  tristate u (tristate::TS_UNKNOWN);
+  ASSERT_FALSE (u.is_known ());
+  ASSERT_FALSE (u.is_true ());
+  ASSERT_FALSE (u.is_false ());
+  ASSERT_STREQ (u.as_string (), "UNKNOWN");
+
+  tristate t (tristate::TS_TRUE);
+  ASSERT_TRUE (t.is_known ());
+  ASSERT_TRUE (t.is_true ());
+  ASSERT_FALSE (t.is_false ());
+  ASSERT_STREQ (t.as_string (), "TRUE");
+
+  tristate f (tristate::TS_FALSE);
+  ASSERT_TRUE (f.is_known ());
+  ASSERT_FALSE (f.is_true ());
+  ASSERT_TRUE (f.is_false ());
+  ASSERT_STREQ (f.as_string (), "FALSE");
+
+  ASSERT_EQ (u, u);
+  ASSERT_EQ (t, t);
+  ASSERT_EQ (f, f);
+  ASSERT_NE (u, t);
+  ASSERT_NE (u, f);
+  ASSERT_NE (t, f);
+
+  tristate t2 (true);
+  ASSERT_TRUE (t2.is_true ());
+  ASSERT_EQ (t, t2);
+
+  tristate f2 (false);
+  ASSERT_TRUE (f2.is_false ());
+  ASSERT_EQ (f, f2);
+
+  tristate u2 (tristate::unknown ());
+  ASSERT_TRUE (!u2.is_known ());
+  ASSERT_EQ (u, u2);
+}
+
+/* Test && on tristate instances.  */
+
+static void
+test_and ()
+{
+  ASSERT_TRISTATE_UNKNOWN (tristate::unknown () && tristate::unknown ());
+
+  ASSERT_TRISTATE_FALSE (tristate (false) && tristate (false));
+  ASSERT_TRISTATE_FALSE (tristate (false) && tristate (true));
+  ASSERT_TRISTATE_FALSE (tristate (true) && tristate (false));
+  ASSERT_TRISTATE_TRUE (tristate (true) && tristate (true));
+
+  ASSERT_TRISTATE_UNKNOWN (tristate::unknown () && tristate (true));
+  ASSERT_TRISTATE_UNKNOWN (tristate (true) && tristate::unknown ());
+
+  ASSERT_TRISTATE_FALSE (tristate::un

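The case analysis in not_, or_ and and_ above amounts to Kleene's strong three-valued logic. TRUE dominates OR, FALSE dominates AND, and in every other case an UNKNOWN operand makes the result UNKNOWN. The following sketch restates those tables using a hypothetical enum (ts) that is independent of the analyzer's tristate class. It then checks, over all nine operand pairs, that the tables are symmetric and satisfy De Morgan's laws, even though the patch's implementation switches only on the left operand.

```cpp
// Hypothetical stand-alone model; not the analyzer's tristate class.
#include <cassert>

enum class ts { unknown, true_, false_ };

static ts
ts_not (ts a)
{
  switch (a)
    {
    case ts::true_:
      return ts::false_;
    case ts::false_:
      return ts::true_;
    default:
      return ts::unknown;
    }
}

// TRUE dominates OR; otherwise an UNKNOWN operand makes the result UNKNOWN.
static ts
ts_or (ts a, ts b)
{
  if (a == ts::true_ || b == ts::true_)
    return ts::true_;
  if (a == ts::unknown || b == ts::unknown)
    return ts::unknown;
  return ts::false_;
}

// FALSE dominates AND; otherwise an UNKNOWN operand makes the result UNKNOWN.
static ts
ts_and (ts a, ts b)
{
  if (a == ts::false_ || b == ts::false_)
    return ts::false_;
  if (a == ts::unknown || b == ts::unknown)
    return ts::unknown;
  return ts::true_;
}

// Exhaustively check properties of the tables over all operand pairs.
static void
check_tables ()
{
  const ts all[] = { ts::unknown, ts::true_, ts::false_ };
  for (ts a : all)
    for (ts b : all)
      {
        // Both operations are commutative.
        assert (ts_or (a, b) == ts_or (b, a));
        assert (ts_and (a, b) == ts_and (b, a));
        // De Morgan's laws hold in this logic.
        assert (ts_not (ts_and (a, b)) == ts_or (ts_not (a), ts_not (b)));
        assert (ts_not (ts_or (a, b)) == ts_and (ts_not (a), ts_not (b)));
      }
}
```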
Re: [PATCH 03/49] diagnostic_show_locus: move initial newline to callers

2019-12-09 Thread David Malcolm
On Sat, 2019-12-07 at 07:30 -0700, Jeff Law wrote:
> On Fri, 2019-11-15 at 20:22 -0500, David Malcolm wrote:
> > diagnostic_show_locus adds a newline before doing anything
> > (including the do-nothing-else case).
> > 
> > This patch removes this initial newline, adding it to all callers
> > of diagnostic_show_locus instead.
> > 
> > Doing so makes diagnostic_show_locus more flexible, allowing it to
> > be used later in this patch kit when printing diagnostic paths.

[...]

> > OK
> jeff

FWIW I've fixed up the conflicts in this patch with Lewis's fix for PR
49973, retested the result, and committed it as r279152 (to reduce the
size of the next version of the analyzer patch kit).

Dave



[RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-09 Thread Xiong Hu Luo
The performance of exchange2 built with PGO drops by ~28% after r278808
because the profile count is set incorrectly.  The cloned nodes are updated
to a very small count, which causes the later cunroll pass to fail to
unroll the recursive function in exchange2.  This patch makes
update_profiling_info take orig_sum into account when setting the count of
the new nodes if the original node is self-recursive.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

2019-12-10  Luo Xiong Hu  

* ipa-pure-const.c (self_recursive_p): Move it from ...
(propagate_pure_const): Use cgraph_node::self_recursive_p.
(pass_nothrow::execute): Likewise.
* cgraph.h (cgraph_node::self_recursive_p): to ... this.
* ipa-cp.c (update_profiling_info): Check self_recursive_p node.
---
 gcc/cgraph.h | 18 ++
 gcc/ipa-cp.c |  5 -
 gcc/ipa-pure-const.c | 19 ++-
 3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cdeea4d9953..1aca7d114e9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1329,6 +1329,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   /* Return true if function should be optimized for size.  */
   bool optimize_for_size_p (void);
 
+  /* Return true if NODE is self recursive function.  */
+  inline bool self_recursive_p (void);
+
   /* Dump the callgraph to file F.  */
   static void dump_cgraph (FILE *f);
 
@@ -3285,6 +3288,21 @@ cgraph_node::optimize_for_size_p (void)
 return false;
 }
 
+/* Return true if NODE is self recursive function.
+   Indirectly recursive functions appears as non-trivial strongly
+   connected components, so we need to care about self recursion
+   only.  */
+
+inline bool
+cgraph_node::self_recursive_p (void)
+{
+  struct cgraph_edge *e;
+  for (e = this->callees; e; e = e->next_callee)
+if (e->callee->function_symbol () == this)
+  return true;
+  return false;
+}
+
 /* Return symtab_node for NODE or create one if it is not present
in symtab.  */
 
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..76c1b309d04 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4305,7 +4305,10 @@ update_profiling_info (struct cgraph_node *orig_node,
 remainder = remainder.guessed_local ();
 
   new_sum = orig_node_count.combine_with_ipa_count (new_sum);
-  new_node->count = new_sum;
+  if (orig_node->self_recursive_p ())
+new_node->count = (orig_sum + new_sum).apply_scale (5, 10);
+  else
+new_node->count = new_sum;
   orig_node->count = remainder;
 
   profile_count::adjust_for_ipa_scaling (&new_sum, &orig_new_node_count);
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a142e0cc8f6..520ed39b476 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1371,21 +1371,6 @@ ignore_edge_for_nothrow (struct cgraph_edge *e)
  || !opt_for_fn (ultimate_target->decl, flag_ipa_pure_const));
 }
 
-/* Return true if NODE is self recursive function.
-   Indirectly recursive functions appears as non-trivial strongly
-   connected components, so we need to care about self recursion
-   only.  */
-
-static bool
-self_recursive_p (struct cgraph_node *node)
-{
-  struct cgraph_edge *e;
-  for (e = node->callees; e; e = e->next_callee)
-if (e->callee->function_symbol () == node)
-  return true;
-  return false;
-}
-
 /* Return true if N is cdtor that is not const or pure.  In this case we may
need to remove unreachable function if it is marked const/pure.  */
 
@@ -1666,7 +1651,7 @@ propagate_pure_const (void)
  if (this_state == IPA_NEITHER)
this_looping = w_l->looping_previously_known;
}
- if (!this_looping && self_recursive_p (w))
+ if (!this_looping && w->self_recursive_p ())
this_looping = true;
  if (!w_l->looping_previously_known)
this_looping = false;
@@ -2342,7 +2327,7 @@ pass_nothrow::execute (function *)
   node->set_nothrow_flag (true);
 
   bool cfg_changed = false;
-  if (self_recursive_p (node))
+  if (node->self_recursive_p ())
 FOR_EACH_BB_FN (this_block, cfun)
   if (gimple *g = last_stmt (this_block))
if (is_gimple_call (g))
-- 
2.21.0.777.g83232e3864


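Here is a minimal sketch of the two pieces of logic in this RFC. It uses hypothetical toy types (toy_node, toy_new_node_count) in place of cgraph_node and profile_count. The self-recursion test scans only a node's direct callees. For a self-recursive original node, the clone's count becomes (orig_sum + new_sum) scaled by 5/10, which is their average. The real code goes through e->callee->function_symbol (), whereas the sketch compares node pointers directly. The sketch also uses truncating integer arithmetic rather than profile_count.

```cpp
// Toy model only: toy_node stands in for cgraph_node and plain
// integers stand in for profile_count.
#include <cassert>
#include <cstdint>
#include <vector>

struct toy_node
{
  std::vector<const toy_node *> callees;  // direct call targets
};

// Self-recursive if some direct callee is the node itself.  Indirect
// recursion (A calls B calls A) is not detected, matching the comment
// that such functions show up as non-trivial SCCs instead.
static bool
toy_self_recursive_p (const toy_node &node)
{
  for (const toy_node *callee : node.callees)
    if (callee == &node)
      return true;
  return false;
}

// Shape of the new branch in update_profiling_info: for a
// self-recursive original node the clone gets (orig_sum + new_sum)
// scaled by 5/10; otherwise it keeps new_sum.
static std::uint64_t
toy_new_node_count (const toy_node &orig, std::uint64_t orig_sum,
                    std::uint64_t new_sum)
{
  if (toy_self_recursive_p (orig))
    return (orig_sum + new_sum) * 5 / 10;
  return new_sum;
}

static void
toy_selftest ()
{
  toy_node leaf, rec;
  rec.callees = { &leaf, &rec };
  assert (!toy_self_recursive_p (leaf));
  assert (toy_self_recursive_p (rec));
  assert (toy_new_node_count (leaf, 1000, 10) == 10);
  assert (toy_new_node_count (rec, 1000, 10) == 505);
}
```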

libbacktrace patch committed: Remove duplication of address handling

2019-12-09 Thread Ian Lance Taylor
Before this patch libbacktrace duplicated the handling of
DW_AT_low_pc, DW_AT_high_pc, and DW_AT_ranges, once to build a mapping
from addresses to compilation units, and then again to build a mapping
from addresses to functions within a compilation unit.  This patch
eliminates the duplication by factoring the code into a pair of
functions, one of which takes a function pointer that adds the
appropriate mapping.  This is a
step toward adding DWARF 5 support, as DWARF 5 requires handling more
cases here, and it seemed painful to introduce further duplication.
Bootstrapped and ran libbacktrace and Go testsuites on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: dwarf.c
===
--- dwarf.c (revision 279094)
+++ dwarf.c (working copy)
@@ -945,31 +945,28 @@ function_addrs_search (const void *vkey,
 return 0;
 }
 
-/* Add a new compilation unit address range to a vector.  Returns 1 on
-   success, 0 on failure.  */
+/* Add a new compilation unit address range to a vector.  This is
+   called via add_ranges.  Returns 1 on success, 0 on failure.  */
 
 static int
-add_unit_addr (struct backtrace_state *state, uintptr_t base_address,
-  struct unit_addrs addrs,
+add_unit_addr (struct backtrace_state *state, void *rdata,
+  uint64_t lowpc, uint64_t highpc,
   backtrace_error_callback error_callback, void *data,
-  struct unit_addrs_vector *vec)
+  void *pvec)
 {
+  struct unit *u = (struct unit *) rdata;
+  struct unit_addrs_vector *vec = (struct unit_addrs_vector *) pvec;
   struct unit_addrs *p;
 
-  /* Add in the base address of the module here, so that we can look
- up the PC directly.  */
-  addrs.low += base_address;
-  addrs.high += base_address;
-
   /* Try to merge with the last entry.  */
   if (vec->count > 0)
 {
   p = (struct unit_addrs *) vec->vec.base + (vec->count - 1);
-  if ((addrs.low == p->high || addrs.low == p->high + 1)
- && addrs.u == p->u)
+  if ((lowpc == p->high || lowpc == p->high + 1)
+ && u == p->u)
{
- if (addrs.high > p->high)
-   p->high = addrs.high;
+ if (highpc > p->high)
+   p->high = highpc;
  return 1;
}
 }
@@ -980,8 +977,12 @@ add_unit_addr (struct backtrace_state *s
   if (p == NULL)
 return 0;
 
-  *p = addrs;
+  p->low = lowpc;
+  p->high = highpc;
+  p->u = u;
+
   ++vec->count;
+
   return 1;
 }
 
@@ -1262,29 +1263,122 @@ lookup_abbrev (struct abbrevs *abbrevs,
   return (const struct abbrev *) p;
 }
 
-/* Add non-contiguous address ranges for a compilation unit.  Returns
-   1 on success, 0 on failure.  */
+/* This struct is used to gather address range information while
+   reading attributes.  We use this while building a mapping from
+   address ranges to compilation units and then again while mapping
+   from address ranges to function entries.  Normally either
+   lowpc/highpc is set or ranges is set.  */
+
+struct pcrange {
+  uint64_t lowpc;  /* The low PC value.  */
+  int have_lowpc;  /* Whether a low PC value was found.  */
+  uint64_t highpc; /* The high PC value.  */
+  int have_highpc; /* Whether a high PC value was found.  */
+  int highpc_is_relative;  /* Whether highpc is relative to lowpc.  */
+  uint64_t ranges; /* Offset in ranges section.  */
+  int have_ranges; /* Whether ranges is valid.  */
+};
+
+/* Update PCRANGE from an attribute value.  */
+
+static void
+update_pcrange (const struct attr* attr, const struct attr_val* val,
+   struct pcrange *pcrange)
+{
+  switch (attr->name)
+{
+case DW_AT_low_pc:
+  if (val->encoding == ATTR_VAL_ADDRESS)
+   {
+ pcrange->lowpc = val->u.uint;
+ pcrange->have_lowpc = 1;
+   }
+  break;
+
+case DW_AT_high_pc:
+  if (val->encoding == ATTR_VAL_ADDRESS)
+   {
+ pcrange->highpc = val->u.uint;
+ pcrange->have_highpc = 1;
+   }
+  else if (val->encoding == ATTR_VAL_UINT)
+   {
+ pcrange->highpc = val->u.uint;
+ pcrange->have_highpc = 1;
+ pcrange->highpc_is_relative = 1;
+   }
+  break;
+
+case DW_AT_ranges:
+  if (val->encoding == ATTR_VAL_UINT
+ || val->encoding == ATTR_VAL_REF_SECTION)
+   {
+ pcrange->ranges = val->u.uint;
+ pcrange->have_ranges = 1;
+   }
+  break;
+
+default:
+  break;
+}
+}
+
+/* Call ADD_RANGE for each lowpc/highpc pair in PCRANGE.  RDATA is
+   passed to ADD_RANGE, and is either a struct unit * or a struct
+   function *.  VEC is the vector we are adding ranges to, and is
+   either a struct unit_addrs_vector * or a struct function_vector *.
+   Returns 1 on success, 0 on error.  */
 
 static int
-add_unit_ranges (struct backtrace_state *state, uintptr_t base_address,
-struct unit *u, uint64_t ranges, uint64

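The refactoring described above can be sketched as a single walker that hands each lowpc/highpc pair to a caller-supplied function pointer. All names here are hypothetical, and the signatures are simplified: there is no backtrace_state, no error callback, and no DW_AT_ranges handling. The merge rule in the callback follows add_unit_addr in the patch. Applying the module base address in the walker is an assumption, based on the fact that the patch removes that step from add_unit_addr.

```cpp
// Sketch only: hypothetical names, not libbacktrace's real signatures.
#include <cassert>
#include <cstdint>
#include <vector>

// Address information gathered from DW_AT_low_pc / DW_AT_high_pc.
struct sketch_pcrange
{
  std::uint64_t lowpc = 0;
  bool have_lowpc = false;
  std::uint64_t highpc = 0;
  bool have_highpc = false;
  bool highpc_is_relative = false;  // DW_AT_high_pc given as a length
};

// Callback: RDATA identifies the owner (a unit or a function), PVEC is
// the vector being filled.  Returns 1 on success, 0 on failure.
typedef int (*sketch_add_range_fn) (void *rdata, std::uint64_t lowpc,
                                    std::uint64_t highpc, void *pvec);

// Shared walker: compute the absolute range once and pass it to
// ADD_RANGE.  Assumption: the base address is applied here.
static int
sketch_add_ranges (const sketch_pcrange &pc, std::uint64_t base_address,
                   sketch_add_range_fn add_range, void *rdata, void *pvec)
{
  if (!pc.have_lowpc || !pc.have_highpc)
    return 1;  // nothing to record
  std::uint64_t lowpc = pc.lowpc;
  std::uint64_t highpc = pc.highpc;
  if (pc.highpc_is_relative)
    highpc += lowpc;
  return add_range (rdata, lowpc + base_address, highpc + base_address,
                    pvec);
}

struct sketch_unit_addr
{
  std::uint64_t low, high;
  const void *unit;
};

// One ADD_RANGE implementation, modelled on add_unit_addr: merge with
// the previous entry when it is adjacent and belongs to the same unit.
static int
sketch_add_unit_addr (void *rdata, std::uint64_t lowpc,
                      std::uint64_t highpc, void *pvec)
{
  std::vector<sketch_unit_addr> *vec
    = static_cast<std::vector<sketch_unit_addr> *> (pvec);
  if (!vec->empty ())
    {
      sketch_unit_addr &p = vec->back ();
      if ((lowpc == p.high || lowpc == p.high + 1) && rdata == p.unit)
        {
          if (highpc > p.high)
            p.high = highpc;
          return 1;
        }
    }
  vec->push_back (sketch_unit_addr { lowpc, highpc, rdata });
  return 1;
}

static void
sketch_selftest ()
{
  std::vector<sketch_unit_addr> vec;
  int unit = 0;  // only its address is used, as an identity token
  sketch_pcrange r;
  r.lowpc = 0x100;
  r.have_lowpc = true;
  r.highpc = 0x80;  // a length, so the range ends at 0x180
  r.have_highpc = true;
  r.highpc_is_relative = true;
  assert (sketch_add_ranges (r, 0x1000, sketch_add_unit_addr, &unit, &vec));
  assert (vec.size () == 1 && vec[0].low == 0x1100 && vec[0].high == 0x1180);
}
```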
[PATCH] Fix unrecognizable insn of pr92865

2019-12-09 Thread Hongtao Liu
Hi Jakub,
  This patch enables integer mask cmp/cmov under AVX512F even
when TARGET_XOP is set.
  Bootstrapped and regression-tested on i386/x86_64 with no regressions.

Changelog:
PR target/92865
* gcc/config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
integer mask cmov when available even with TARGET_XOP.
* gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.

-- 
BR,
Hongtao
From 2c53eb1ddf876a616c7ee914256e3a27f30cd158 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 10 Dec 2019 09:44:18 +0800
Subject: [PATCH] Fix unrecognizable insn of pr92865.

PR target/92865
* gcc/config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
integer mask cmov when available even with TARGET_XOP.
* gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.
---
 gcc/config/i386/i386-expand.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr92865-1.c | 67 +++
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92865-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index ff3c24cc5b7..cbf4eb7b487 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3428,7 +3428,7 @@ static bool
 ix86_valid_mask_cmp_mode (machine_mode mode)
 {
   /* XOP has its own vector conditional movement.  */
-  if (TARGET_XOP)
+  if (TARGET_XOP && !TARGET_AVX512F)
 return false;
 
   /* AVX512F is needed for mask operation.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr92865-1.c b/gcc/testsuite/gcc.target/i386/pr92865-1.c
new file mode 100644
index 000..49b5778a067
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92865-1.c
@@ -0,0 +1,67 @@
+/* PR target/92865 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512f -mavx512bw -mxop" } */
+/* { dg-final { scan-assembler-times "vpcmp\[bwdq\]\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vpcmpu\[bwdq\]\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 4 } } */
+
+extern char arraysb[64];
+extern short arraysw[32];
+extern int arraysd[16];
+extern long long arraysq[8];
+
+extern unsigned char arrayub[64];
+extern unsigned short arrayuw[32];
+extern unsigned int arrayud[16];
+extern unsigned long long arrayuq[8];
+
+int f1(char a)
+{
+  for (int i = 0; i < 64; i++)
+arraysb[i] = arraysb[i] >= a;
+}
+
+int f2(short a)
+{
+  for (int i = 0; i < 32; i++)
+arraysw[i] = arraysw[i] >= a;
+}
+
+int f3(int a)
+{
+  for (int i = 0; i < 16; i++)
+arraysd[i] = arraysd[i] >= a;
+}
+
+int f4(long long a)
+{
+  for (int i = 0; i < 8; i++)
+arraysq[i] = arraysq[i] >= a;
+}
+
+int f5(unsigned char a)
+{
+  for (int i = 0; i < 64; i++)
+arrayub[i] = arrayub[i] >= a;
+}
+
+int f6(unsigned short a)
+{
+  for (int i = 0; i < 32; i++)
+arrayuw[i] = arrayuw[i] >= a;
+}
+
+int f7(unsigned int a)
+{
+  for (int i = 0; i < 16; i++)
+arrayud[i] = arrayud[i] >= a;
+}
+
+int f8(unsigned long long a)
+{
+  for (int i = 0; i < 8; i++)
+arrayuq[i] = arrayuq[i] >= a;
+}
-- 
2.18.1



Re: [PATCH] implement pre-c++20 contracts

2019-12-09 Thread Jason Merrill

On 11/13/19 2:07 PM, Jeff Chapman wrote:

Attached is a patch that implements pre-c++20 contracts. This comes
from a long running development branch which included ChangeLog entries
as we went, which are included in the patch itself. The repo and
initial wiki are located here:
https://gitlab.com/lock3/gcc-new/wikis/GCC-with-Contracts


Thanks.  I've mostly been referring to the repo rather than the attached 
patch.  Below are a bunch of comments about the implementation, in no 
particular order.



We've previously circulated a paper (P1680) documenting our
implementation experience, which largely covers new flags and features.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1680r0.pdf

That paper documents our changes in depth, barring the recently added
-fcontracts flag which is a global opt-in for contracts functionality.
An overview of what we've done is included below for convenience.

The following switches have been added:

-fcontracts
   enable contract features in general

Flags from the old Working Draft:

-fcontract-build-level=[off|default|audit]
   specify max contract level to generate runtime checks for

-fcontract-continuation-mode=[on|off]
   toggle whether execution resumes after contract failure

Flags from P1290:

-fcontract-assumption-mode=[on|off]
   enable treating axiom level contracts as compile time assumptions
   (default on)

Flags from P1429:

-fcontract-mode=[on|off]
   enable or disable all contract facilities (default on)

-fcontract-semantic=:
   specify the concrete semantics for a level

Flags from P1332:

-fcontract-role=:
   specify semantics for all levels in a role (default, review, or a
 custom role)
   (ex: opt:assume,assume,assume)

Additional flags:

-fcontract-strict-declarations=[on|off]
   toggle warnings on generalized redecl of member functions
 without contracts (default off)


One assert contract may be present on any block scope empty statement:
   [[ assert contract-mode_opt : conditional-expression ]]

Function declarations have an optional, ordered, list of pre and post
contracts:
   [[ pre contract-mode_opt : conditional-expression ]]
   [[ post contract-mode_opt identifier_opt : conditional-expression ]]


The grammar for the contract-mode_opt portion which configures the
concrete semantics of the contracts is:

contract-mode
   contract-semantic
   contract-level_opt contract-role_opt

contract-semantic
   ignore
   check_never_continue
   check_maybe_continue
   check_always_continue
   assume

contract-level
   default
   audit
   axiom

contract-role
   %default
   %identifier


Contracts are configured via concrete semantics or by an optional
level and role which map to one of the concrete semantics:

   ignore – The contract does not affect the behavior of the program.
   assume – The condition may be used for optimization.
   never_continue – The program terminates if the contract is
    not satisfied.
   maybe_continue – A user-defined violation handler may terminate the
    program.
   always_continue – The program continues even if the contract is not
    satisfied.


I find the proposed always_continue semantics kind of nonsensical, as 
somewhat evidenced by the contortions the implementation gets into with 
marking the violation handler as pure.  The trick of assigning the 
result to a local variable won't work with optimization.


It also depends on the definition of a function that can be overridden 
to in fact never return.  This seems pretty fatal to it ever getting 
into the standard.


It's also unclear to me why anyone would want the described semantics. 
Why would you want a contract check that can be optimized away due to 
later undefined behavior?  The 0.2 use case from P1332 seems better 
suited to maybe_continue, because with always_continue such a check will 
have false negatives, leading to an unpleasant surprise when switching 
to never_continue.


I'd prefer to treat always_continue as equivalent to maybe_continue. 
Perhaps with ECF_NOTHROW|ECF_LEAF|ECF_NOVOPS to indicate that it doesn't 
clobber anything the caller can see, but that's risky if the handler is 
in fact defined in the same TU with anything that uses contracts.



+  if (strcmp (name, "check_always_continue") == 0
+  || strcmp (name, "always") == 0
+  || strcmp (name, "continue") == 0)


Accordingly, "continue" should mean maybe_continue.
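The reviewer's suggestion can be sketched as a name lookup in which every spelling that the patch maps to check_always_continue selects maybe_continue instead. The enum and function names are hypothetical. Only the spellings quoted in this message are handled.

```cpp
// Hypothetical names; only the spellings quoted above are handled.
#include <cassert>
#include <cstring>

enum contract_semantic_sketch
{
  CCS_INVALID,
  CCS_IGNORE,
  CCS_ASSUME,
  CCS_NEVER_CONTINUE,
  CCS_MAYBE_CONTINUE
};

static contract_semantic_sketch
lookup_semantic_sketch (const char *name)
{
  if (std::strcmp (name, "ignore") == 0)
    return CCS_IGNORE;
  if (std::strcmp (name, "assume") == 0)
    return CCS_ASSUME;
  if (std::strcmp (name, "check_never_continue") == 0)
    return CCS_NEVER_CONTINUE;
  // Per the review, the always_continue spellings fold into
  // maybe_continue instead of getting a semantic of their own.
  if (std::strcmp (name, "check_maybe_continue") == 0
      || std::strcmp (name, "check_always_continue") == 0
      || std::strcmp (name, "always") == 0
      || std::strcmp (name, "continue") == 0)
    return CCS_MAYBE_CONTINUE;
  return CCS_INVALID;
}

static void
lookup_semantic_selftest ()
{
  assert (lookup_semantic_sketch ("continue") == CCS_MAYBE_CONTINUE);
  assert (lookup_semantic_sketch ("check_always_continue")
          == CCS_MAYBE_CONTINUE);
  assert (lookup_semantic_sketch ("bogus") == CCS_INVALID);
}
```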


+/* Definitions for C++ contract levels
+   Copyright (C) 1987-2018 Free Software Foundation, Inc.


Should just be 2019 for a new file.


+   Contributed by Michael Tiemann (tiem...@cygnus.com)


This seems inaccurate.  :)

It would also be good to have a reference to P1332 in this header.


+/* Assertion role info.
+
+   FIXME: Why is this prefixed cpp?  */
+struct cpp_contract_role


There seems to be no reason for it, since the struct definition is 
followed by a typedef; let's remove the prefix.


Any cpp_ prefixes we want to keep should c

Re: [PATCH] implement pre-c++20 contracts

2019-12-09 Thread Jason Merrill

On 12/10/19 12:58 AM, Jason Merrill wrote:

On 11/13/19 2:07 PM, Jeff Chapman wrote:

Attached is a patch that implements pre-c++20 contracts. This comes
from a long running development branch which included ChangeLog entries
as we went, which are included in the patch itself. The repo and
initial wiki are located here:
https://gitlab.com/lock3/gcc-new/wikis/GCC-with-Contracts


Thanks.  I've mostly been referring to the repo rather than the attached 
patch.  Below are a bunch of comments about the implementation, in no 
particular order.



We've previously circulated a paper (P1680) documenting our
implementation experience, which largely covers new flags and features.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1680r0.pdf

That paper documents our changes in depth, barring the recently added
-fcontracts flag which is a global opt-in for contracts functionality.
An overview of what we've done is included below for convenience.

The following switches have been added:

-fcontracts
   enable contract features in general

Flags from the old Working Draft:

-fcontract-build-level=[off|default|audit]
   specify max contract level to generate runtime checks for

-fcontract-continuation-mode=[on|off]
   toggle whether execution resumes after contract failure

Flags from P1290:

-fcontract-assumption-mode=[on|off]
   enable treating axiom level contracts as compile time assumptions
   (default on)

Flags from P1429:

-fcontract-mode=[on|off]
   enable or disable all contract facilities (default on)

-fcontract-semantic=:
   specify the concrete semantics for a level

Flags from P1332:

-fcontract-role=:
   specify semantics for all levels in a role (default, review, or a
 custom role)
   (ex: opt:assume,assume,assume)

Additional flags:

-fcontract-strict-declarations=[on|off]
   toggle warnings on generalized redecl of member functions
 without contracts (default off)


One assert contract may be present on any block scope empty statement:
   [[ assert contract-mode_opt : conditional-expression ]]

Function declarations have an optional, ordered, list of pre and post
contracts:
   [[ pre contract-mode_opt : conditional-expression ]]
   [[ post contract-mode_opt identifier_opt : conditional-expression ]]


The grammar for the contract-mode_opt portion which configures the
concrete semantics of the contracts is:

contract-mode
   contract-semantic
   contract-level_opt contract-role_opt

contract-semantic
   ignore
   check_never_continue
   check_maybe_continue
   check_always_continue
   assume

contract-level
   default
   audit
   axiom

contract-role
   %default
   %identifier


Contracts are configured via concrete semantics or by an optional
level and role which map to one of the concrete semantics:

   ignore – The contract does not affect the behavior of the program.
   assume – The condition may be used for optimization.
   never_continue – The program terminates if the contract is
    not satisfied.
   maybe_continue – A user-defined violation handler may terminate the
    program.
   always_continue – The program continues even if the contract is not
 satisfied.


I find the proposed always_continue semantics kind of nonsensical, as 
somewhat evidenced by the contortions the implementation gets into with 
marking the violation handler as pure.  The trick of assigning the 
result to a local variable won't work with optimization.


It also depends on the definition of a function that can be overridden 
to in fact never return.  This seems pretty fatal to it ever getting 
into the standard.


It's also unclear to me why anyone would want the described semantics. 
Why would you want a contract check that can be optimized away due to 
later undefined behavior?  The 0.2 use case from P1332 seems better 
suited to maybe_continue, because with always_continue such a check will 
have false negatives, leading to an unpleasant surprise when switching 
to never_continue.


I'd prefer to treat always_continue as equivalent to maybe_continue. 
Perhaps with ECF_NOTHROW|ECF_LEAF|ECF_NOVOPS to indicate that it doesn't 
clobber anything the caller can see, but that's risky if the handler is 
in fact defined in the same TU with anything that uses contracts.



+  if (strcmp (name, "check_always_continue") == 0
+  || strcmp (name, "always") == 0
+  || strcmp (name, "continue") == 0)


Accordingly, "continue" should mean maybe_continue.


+/* Definitions for C++ contract levels
+   Copyright (C) 1987-2018 Free Software Foundation, Inc.


Should just be 2019 for a new file.


+   Contributed by Michael Tiemann (tiem...@cygnus.com)


This seems inaccurate.  :)

It would also be good to have a reference to P1332 in this header.


+/* Assertion role info.
+
+   FIXME: Why is this prefixed cpp?  */
+struct cpp_contract_role


There seems to be no reason for it, since the struct definition is 
followed by a typedef; let's remove the prefix.
