Re: [PATCH] Remove poly_int_pod
On Thu, Sep 28, 2023 at 10:55:46AM +0100, Richard Sandiford wrote:
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Also tested with
> Jakub's vec.h patch with the static_asserts uncommented; there were
> no errors from poly_int-related stuff.  OK to install?

LGTM (mostly as the general idea, but didn't see anything wrong in the
patch either), please give others a day or so to comment though.

	Jakub
Re: [PATCH] Remove poly_int_pod
On Fri, Sep 29, 2023 at 08:31:47AM +0200, Richard Biener wrote:
> > IIRC the primary reason we settled on gcc-4.8.x was RHEL7/CentOS7.  With
> > RHEL 7 approaching EOL, moving the baseline forward would seem to make
> > sense.
> >
> > I'd want to know if this affects folks using SuSE's enterprise distro
> > before actually making the change, but I'm broadly in favor of moving
> > forward if it's not going to have a major impact on users that are using
> > enterprise distros.
>
> We're thinking of making GCC 13 the last major release to officially
> build for SLE12, which also uses GCC 4.8, so we'd be fine with doing this
> for GCC 14.

We'd need to figure out what to do with the OS on gcc{110,112,135} on
CFarm, those are all CentOS 7.9 with gcc 4.8.5.  Sure, one possibility
would be to install some DTS gcc if CentOS has one somewhere (though
looking around, /opt/at12.0 already has gcc 8 there from IBM Advanced
Toolchain 12).

	Jakub
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Thu, Sep 28, 2023 at 04:03:55PM +0200, Jakub Jelinek wrote:
> Bet we should make wide_int_storage and widest_int_storage GTY ((user)) and
> just declare but don't define the handlers or something similar.

That doesn't catch anything, but the following incremental patch compiles
just fine, proving we don't have any wide_int in GC memory anymore after
the wide_int -> rwide_int change in dwarf2out.h.

And the attached incremental patch on top of it, which deletes even
widest_int from GC, shows that we use widest_int in GC in:
omp_declare_variant_entry::score
omp_declare_variant_entry::score_in_declare_simd_clone
nb_iter_bound::member
loop::nb_iterations_upper_bound
loop::nb_iterations_likely_upper_bound
loop::nb_iterations_estimate
ipa_bits::value
ipa_bits::mask
so pretty much everything I spoke about (except I thought loop has 2 such
members when it has 3).

--- gcc/wide-int.h	2023-09-28 14:55:40.059632413 +0200
+++ gcc/wide-int.h	2023-09-29 09:59:58.703931879 +0200
@@ -85,7 +85,7 @@
    and it always uses an inline buffer.  offset_int and rwide_int are
    GC-friendly, wide_int and widest_int are not.
 
-  3) widest_int.  This representation is an approximation of
+  4) widest_int.  This representation is an approximation of
    infinite precision math.  However, it is not really infinite
    precision math as in the GMP library.  It is really finite
    precision math where the precision is WIDEST_INT_MAX_PRECISION.
@@ -4063,21 +4063,61 @@
   return wi::smod_trunc (x, y);
 }
 
-template
+void gt_ggc_mx (generic_wide_int *) = delete;
+void gt_pch_nx (generic_wide_int *) = delete;
+void gt_pch_nx (generic_wide_int *,
+		gt_pointer_operator, void *) = delete;
+
+inline void
+gt_ggc_mx (generic_wide_int *)
+{
+}
+
+inline void
+gt_pch_nx (generic_wide_int *)
+{
+}
+
+inline void
+gt_pch_nx (generic_wide_int *, gt_pointer_operator, void *)
+{
+}
+
+template
+void
+gt_ggc_mx (generic_wide_int > *)
+{
+}
+
+template
+void
+gt_pch_nx (generic_wide_int > *)
+{
+}
+
+template
+void
+gt_pch_nx (generic_wide_int > *,
+	   gt_pointer_operator, void *)
+{
+}
+
+template
 void
-gt_ggc_mx (generic_wide_int *)
+gt_ggc_mx (generic_wide_int > *)
 {
 }
 
-template
+template
 void
-gt_pch_nx (generic_wide_int *)
+gt_pch_nx (generic_wide_int > *)
 {
 }
 
-template
+template
 void
-gt_pch_nx (generic_wide_int *, gt_pointer_operator, void *)
+gt_pch_nx (generic_wide_int > *,
+	   gt_pointer_operator, void *)
 {
 }

	Jakub

--- gcc/wide-int.h.jj	2023-09-29 09:59:58.703931879 +0200
+++ gcc/wide-int.h	2023-09-29 10:05:27.653317149 +0200
@@ -4103,23 +4103,12 @@ gt_pch_nx (generic_wide_int
-void
-gt_ggc_mx (generic_wide_int > *)
-{
-}
-
+void gt_ggc_mx (generic_wide_int > *) = delete;
 template
-void
-gt_pch_nx (generic_wide_int > *)
-{
-}
-
+void gt_pch_nx (generic_wide_int > *) = delete;
 template
-void
-gt_pch_nx (generic_wide_int > *,
-	   gt_pointer_operator, void *)
-{
-}
+void gt_pch_nx (generic_wide_int > *,
+		gt_pointer_operator, void *) = delete;
 
 template
 void
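For readers unfamiliar with the trick used above: gengtype-generated
marking code calls the gt_ggc_mx/gt_pch_nx overloads for every type
reachable from a GTY root, so declaring the overload for a given type as
deleted turns any remaining GC use of that type into a compile-time
error.  A minimal illustration (not GCC code; the type name is made up):

  struct not_gc_safe { int *heap_buf; };	/* owns heap memory */

  /* Poison GC marking of this type: any GTY(()) structure that still
     contains a not_gc_safe member fails to compile, because the
     generated marker would call this deleted overload.  */
  void gt_ggc_mx (not_gc_safe *) = delete;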
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Thu, Sep 28, 2023 at 11:53:53AM -0400, Aldy Hernandez wrote:
> > ipa_bits is even worse, because unlike niter analysis, I think it is very
> > much desirable to support IPA VRP of all supported _BitInt sizes.  Shall
> > we perhaps use trailing_wide_int storage in there, or conditionally
> > rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit, something else?
>
> BTW, we already track value/mask pairs in the irange, so I think ipa_bits
> should ultimately disappear.  Doing so would probably simplify the code
> base.

Well, having irange in GC memory would be equally bad, it does have
non-trivial destructors (plus it isn't meant to be space efficient either,
right?).  Though, perhaps we should use value-range-storage.h for that now
that it can store a value/mask pair as well?  Either tweak it on the IPA
side such that everything is stored together (both the IPA VRP and IPA bit
CCP), or say use vrange_storage with zero (or one dummy) ranges + the
value/mask pair.

	Jakub
[PATCH] use *_grow_cleared rather than *_grow on vect_unpromoted_value
On Wed, Sep 27, 2023 at 11:15:26AM +0000, Richard Biener wrote:
> > tree-vect-patterns.cc:2947   unprom.quick_grow (nops);
> >   T = vect_unpromoted_value
> > Go for quick_grow_cleared?  Something else?
>
> The CTOR zero-initializes everything, so maybe it can go.  In theory
> .set_op could also be changed to .push_op ...

So, I had a look at this and I think using quick_grow_cleared is the best
choice here.  The nops is 2 or 1 most of the time, worst case 3, so the
price of extra initialization of 4 pointer-sized-or-less members times 1,
2 or 3 doesn't seem worth bothering with; it is similar to the bitmap_head
case where we already pay the price for just one structure any time we do
  vect_unpromoted_value unprom_diff;
(and later set_op on it) or even
  vect_unpromoted_value unprom0[2];

With this patch and Richard S.'s poly_int_pod removal the static_assert
can be enabled as well and gcc builds.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
The second patch waits for the poly_int_pod removal commit and has been
just build tested, but not bootstrapped yet.

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Use
	quick_grow_cleared method on unprom rather than quick_grow.

--- gcc/tree-vect-patterns.cc.jj	2023-08-24 15:37:29.321410276 +0200
+++ gcc/tree-vect-patterns.cc	2023-09-29 09:45:27.980168865 +0200
@@ -2944,7 +2944,7 @@ vect_recog_over_widening_pattern (vec_in
   /* Check the operands.  */
   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
   auto_vec unprom (nops);
-  unprom.quick_grow (nops);
+  unprom.quick_grow_cleared (nops);
   unsigned int min_precision = 0;
   bool single_use_p = false;
   for (unsigned int i = 0; i < nops; ++i)

	Jakub

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (quick_grow): Uncomment static_assert.

--- gcc/vec.h.jj	2023-09-29 10:39:09.327223861 +0200
+++ gcc/vec.h	2023-09-29 10:44:03.764108443 +0200
@@ -1396,7 +1396,7 @@
 inline void
 vec::quick_grow (unsigned len)
 {
   gcc_checking_assert (length () <= len && len <= m_vecpfx.m_alloc);
-//  static_assert (std::is_trivially_default_constructible <T>::value, "");
+  static_assert (std::is_trivially_default_constructible <T>::value, "");
   m_vecpfx.m_num = len;
 }
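A hedged sketch of the difference being discussed (simplified, not the
real vec.h; assumes a trivially copyable T and preallocated storage):

  #include <cstring>

  template <typename T>
  struct tiny_vec
  {
    T *storage;		/* assumed preallocated to sufficient capacity */
    unsigned num = 0;

    /* quick_grow: new slots contain garbage bytes.  */
    void quick_grow (unsigned len) { num = len; }

    /* quick_grow_cleared: new slots are additionally zeroed; for the
       nops <= 3 small elements above, the memset is practically free.  */
    void quick_grow_cleared (unsigned len)
    {
      std::memset (storage + num, 0, (len - num) * sizeof (T));
      num = len;
    }
  };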
Re: [PATCH] vec.h, v3: Make some ops work with non-trivially copy constructible and/or destructible types
On Fri, Sep 29, 2023 at 11:00:01AM +0100, Jonathan Wakely wrote:
> > +/* Destruct N elements in DST.  */
> > +
> > +template
> > +inline void
> > +vec_destruct (T *dst, unsigned n)
> > +{
> > +  for ( ; n; ++dst, --n)
> > +    dst->~T ();
> > +}
> > +
> >  /* Control data for vectors.  This contains the number of allocated
> >     and used slots inside a vector.  */
> >
> > @@ -310,6 +338,9 @@ va_heap::release (vec
> >    if (v == NULL)
> >      return;
> >
> > +  if (!std::is_trivially_destructible <T>::value)
>
> Do GCC's coding standards really require a space before the template
> argument list, like for a function parameter list?
> That style doesn't seem to be used elsewhere (and is not idiomatic for
> C++ at all).

Seems it is mixed; in the gcc/ subdirectory:
grep ' <[a-zA-Z]' *.h *.cc */*.h */*.cc | grep -v '#.*include' | wc -l
7143
grep '[^ ]<[a-zA-Z]' *.h *.cc */*.h */*.cc | grep -v '#.*include' | wc -l
13579

> > +  template
> > +  struct is_trivially_copyable_or_pair
>
> The space in "> >" is only required in C++98, we don't need it in C++11.

I know, I was just following what the code around it used as well.  Though
admittedly that is from the days when we needed C++98 compatibility.

	Jakub
[PATCH] vec.h: Guard most of static assertions for GCC >= 5
Hi!

As reported by Jonathan on IRC, my vec.h patch broke the build with GCC
4.8.x or 4.9.x as the system compiler, e.g. on CFarm.

The problem is that while all of the
std::is_trivially_{destructible,copyable,default_constructible} traits
are in C++11, only std::is_trivially_destructible had been implemented in
GCC 4.8; the latter two were added only in GCC 5.

Only std::is_trivially_destructible is the really important one though,
which is used to decide what pop returns and whether to invoke the
destructors or not.  The rest are solely used in static_asserts, and as
such I think it is acceptable if we don't assert those when built with
GCC 4.8 or 4.9; anybody doing a bootstrap from those system compilers or
doing builds with newer GCC will catch that.

So, the following patch guards those for 5+.  If we switch to C++14 later
on and start requiring a newer version of system GCC as well (do we
require GCC >= 5 which claims the last C++14 language features, or what
provides all C++14 library features, or GCC >= 6 which uses -std=c++14 by
default?), this patch can then be reverted.

Ok for trunk?

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (quick_insert, ordered_remove, unordered_remove,
	block_remove, qsort, sort, stablesort, quick_grow): Guard
	std::is_trivially_{copyable,default_constructible} and
	vec_detail::is_trivially_copyable_or_pair static assertions
	with GCC_VERSION >= 5000.
	(vec_detail::is_trivially_copyable_or_pair): Guard definition
	with GCC_VERSION >= 5000.

--- gcc/vec.h.jj	2023-09-29 10:59:09.830551963 +0200
+++ gcc/vec.h	2023-09-29 12:29:32.676428677 +0200
@@ -1086,7 +1086,12 @@ vec::quick_insert (unsig
 {
   gcc_checking_assert (length () < allocated ());
   gcc_checking_assert (ix <= length ());
+#if GCC_VERSION >= 5000
+  /* GCC 4.8 and 4.9 only implement std::is_trivially_destructible,
+     but not std::is_trivially_copyable nor
+     std::is_trivially_default_constructible.  */
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   memmove (slot + 1, slot, (m_vecpfx.m_num++ - ix) * sizeof (T));
   *slot = obj;
@@ -1102,7 +1107,9 @@
 inline void
 vec::ordered_remove (unsigned ix)
 {
   gcc_checking_assert (ix < length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   memmove (slot, slot + 1, (--m_vecpfx.m_num - ix) * sizeof (T));
 }
@@ -1150,7 +1157,9 @@
 inline void
 vec::unordered_remove (unsigned ix)
 {
   gcc_checking_assert (ix < length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *p = address ();
   p[ix] = p[--m_vecpfx.m_num];
 }
@@ -1164,13 +1173,16 @@
 inline void
 vec::block_remove (unsigned ix, unsigned len)
 {
   gcc_checking_assert (ix + len <= length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   m_vecpfx.m_num -= len;
   memmove (slot, slot + len, (m_vecpfx.m_num - ix) * sizeof (T));
 }
 
+#if GCC_VERSION >= 5000
 namespace vec_detail
 {
   /* gcc_{qsort,qsort_r,stablesort_r} implementation under the hood
@@ -1189,6 +1201,7 @@ namespace vec_detail
     : std::integral_constant::value
			       && std::is_trivially_copyable::value>
   { };
 }
+#endif
 
 /* Sort the contents of this vector with qsort.  CMP is the comparison
    function to pass to qsort.  */
@@ -1197,7 +1210,9 @@
 template
 inline void
 vec::qsort (int (*cmp) (const void *, const void *))
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_qsort (address (), length (), sizeof (T), cmp);
 }
@@ -1210,7 +1225,9 @@
 inline void
 vec::sort (int (*cmp) (const void *, const void *, void *), void *data)
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_sort_r (address (), length (), sizeof (T), cmp, data);
 }
@@ -1223,7 +1240,9 @@
 inline void
 vec::stablesort (int (*cmp) (const void *, const void *, void *),
		  void *data)
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_stablesort_r (address (), length (), sizeof (T), cmp, data);
 }
@@ -1396,7 +1415,9 @@
 inline void
 vec::quick_grow (unsigned len)
 {
   gcc_checking_assert (length () <= len && len <= m_vecpfx.m_alloc);
+#if GCC_VERSION >= 5000
 //  static_assert (std::is_trivially_default_constructible <T>::value, "");
+#endif
   m_vecpfx.m_num = len;
 }

	Jakub
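The repeated #if/#endif pairs could also be centralized in one place; a
minimal sketch under the same GCC_VERSION assumption (the macro name is
made up, the actual patch intentionally open-codes the guards):

  #include <type_traits>

  #if GCC_VERSION >= 5000
  # define VEC_ASSERT_TRIVIALLY_COPYABLE(T) \
      static_assert (std::is_trivially_copyable <T>::value, "")
  #else
  /* GCC 4.8/4.9 lack the trait entirely; the check is skipped there and
     any violation is still caught by bootstraps with newer compilers.  */
  # define VEC_ASSERT_TRIVIALLY_COPYABLE(T) static_assert (true, "")
  #endif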
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Fri, Sep 29, 2023 at 11:30:06AM +0100, Richard Sandiford wrote:
> Yeah, think I agree with this.  widest_int really combined two things:
>
> (a) a way of storing any integer IL value without loss of precision
>
> (b) a way of attaching sign information
>
> Arithmetic on widest_int is dubious, because it wraps at an essentially
> arbitrary point.

Yeah, but some operations are more sensible than others.  Logical ops make
sense at any precision; addition/subtraction might better have for
widest_int a few bits of extra maximum precision over the wide_int maximum
supported precision such that overflows are caught, but say for
multiplication that would be much more.  Of course, we already document
that bswap/rotates don't make any sense on widest_int, and as I wrote,
e.g. lrshift or unsigned division of values with the MSB set are very
questionable too.

> The approach in the patch looks good to me from a quick scan FWIW.
> Will try to review over the weekend.

For the actual patch I have another worry (but without the GTY widest_int
uses and slsr etc. addressed first it can't be easily verified).
wide_int_ref_storage has VAR_PRECISION like wide_int; while I've hacked up
get_binary_precision not to allocate uselessly a lot of memory for it, I'm
afraid any time we perform some operation on wide_int_refs created from
widest_int (so, they get in most cases a reasonably small get_len () but a
huge get_precision ()) we'd uselessly allocate 255 HOST_WIDE_INTs of
memory from the heap.

So maybe wide_int should also, like widest_int in the patch, have u.val
vs. u.valp decided based on the estimated or later real get_len () rather
than get_precision ().

In the end, I think we should make sure that unless _BitInt is seen in the
sources, we don't really ever allocate any heap memory in
wide_int/widest_int.  At least unless we change the number of inline
elements of the arrays for wide_int/widest_int; if we lower that say to
some hardcoded number of limbs on all arches (say 4 or 6 or 8; 9 on x86-64
is kind of weird), allocations would happen only very rarely.  Normal
128-bit precision math certainly shouldn't trigger them.

	Jakub
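A minimal sketch of the inline-buffer-vs-heap layout being weighed above
(names simplified, INL_ELTS standing in for WIDE_INT_MAX_INL_ELTS; the
point is that the union is discriminated by the estimated length, not by
the precision):

  #include <cstdlib>

  #define INL_ELTS 9	/* the "4 or 6 or 8 limbs" debate above */

  struct sketch_int_storage
  {
    unsigned len, precision;
    union
    {
      long val[INL_ELTS];	/* common case: no heap allocation */
      long *valp;		/* only when the value needs more limbs */
    } u;

    /* Callers pass an upper bound on the result length up front.  */
    long *write_val (unsigned upper_bound_len)
    {
      if (upper_bound_len <= INL_ELTS)
	return u.val;
      return u.valp = (long *) malloc (upper_bound_len * sizeof (long));
    }
  };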
[committed] lowerbitint: Fix 2 bitint lowering bugs [PR111625]
Hi!

This patch fixes 2 issues.  One is when we want to get the address of an
uninitialized large/huge bitint SSA_NAME for multiplication/division/modulo
or conversion to floating point (binary or decimal): the code just creates
an uninitialized limb-sized variable and passes the address of that, but I
forgot to initialize *prec in that case, so it invoked UB at compile time
rather than at runtime.  As it is UB, we could use anything valid as
precision there, say 2 bits for signed, 1 bit for unsigned as the smallest
possible set of values, or full bitint precision as a fully random value.
Though, because we only pass the address of a single limb, I think it is
best to pass the bitsize of the limb.

And the other issue is that when the ranger in range_to_prec finds some
range is undefined_p (), it will assert the {lower,upper}_bound () methods
aren't called on it, but we were calling them.  So, the patch adjusts
range_to_prec to treat it like the !optimize case, full bitint precision.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111625
	PR middle-end/111637
	* gimple-lower-bitint.cc (range_to_prec): Use prec or -prec if
	r.undefined_p ().
	(bitint_large_huge::handle_operand_addr): For uninitialized operands
	use limb_prec or -limb_prec precision.

--- gcc/gimple-lower-bitint.cc.jj	2023-09-20 09:45:39.000000000 +0200
+++ gcc/gimple-lower-bitint.cc	2023-09-29 16:29:36.541473743 +0200
@@ -1932,7 +1932,8 @@ range_to_prec (tree op, gimple *stmt)
   unsigned int prec = TYPE_PRECISION (type);
 
   if (!optimize
-      || !get_range_query (cfun)->range_of_expr (r, op, stmt))
+      || !get_range_query (cfun)->range_of_expr (r, op, stmt)
+      || r.undefined_p ())
     {
       if (TYPE_UNSIGNED (type))
	return prec;
@@ -2066,6 +2067,9 @@ bitint_large_huge::handle_operand_addr (
     }
   else if (gimple_code (g) == GIMPLE_NOP)
     {
+      *prec = TYPE_UNSIGNED (TREE_TYPE (op)) ? limb_prec : -limb_prec;
+      if (prec_stored)
+	*prec_stored = *prec;
       tree var = create_tmp_var (m_limb_type);
       TREE_ADDRESSABLE (var) = 1;
       ret = build_fold_addr_expr (var);

	Jakub
[committed] gimple-match-head: Fix a pasto in function comment
Hi!

This function comment had been pasted from gimple_bitwise_equal_p and
hadn't been adjusted for the different function name.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	* gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Fix
	a pasto in function comment.

--- gcc/gimple-match-head.cc.jj	2023-08-11 09:27:36.292378545 +0200
+++ gcc/gimple-match-head.cc	2023-09-29 18:03:34.535495052 +0200
@@ -274,7 +274,7 @@ gimple_bitwise_equal_p (tree expr1, tree
 bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree));
 bool gimple_maybe_cmp (tree, tree *, tree (*) (tree));
 
-/* Helper function for bitwise_equal_p macro.  */
+/* Helper function for bitwise_inverted_equal_p macro.  */
 
 static inline bool
 gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp,
				 tree (*valueize) (tree))

	Jakub
[PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]
Hi!

I really can't figure out why one would need to add extra casts here.
type must be an integral type which has BIT_NOT_EXPR applied on it
yielding all ones, and we need a type in which negating a 0 or 1 range
will yield 0 or all ones; I think all integral types satisfy that.

This fixes PR111369, where one of the bitint*.c tests FAILs with
GCC_TEST_RUN_EXPENSIVE=1.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
	than using build_nonstandard_integer_type.

--- gcc/match.pd.jj	2023-09-28 11:32:16.122434235 +0200
+++ gcc/match.pd	2023-09-29 18:05:50.554640268 +0200
@@ -6742,12 +6742,7 @@ (define_operator_list SYNC_FETCH_AND_AND
    (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
	&& (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+   (bit_xor (negate (convert:type @0)) (convert:type @2)))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
Re: [PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]
On Sat, Sep 30, 2023 at 11:44:59AM +0200, Jakub Jelinek wrote:
> I really can't figure out why one would need to add extra casts.
> type must be an integral type which has BIT_NOT_EXPR applied on it
> which yields all ones and we need a type in which negating 0 or 1
> range will yield 0 or all ones, I think all integral types satisfy
> that.
> This fixes PR111369, where one of the bitint*.c tests FAILs with
> GCC_TEST_RUN_EXPENSIVE=1.

Though, I think there is a preexisting issue which the
build_nonstandard_integer_type didn't help with; if type is a signed 1-bit
precision type, then I think a ? ~t : t could be valid, but -(type)a would
invoke UB if a is 1 - the cast would make it -1 and negation of -1 in
signed 1-bit invokes UB.  So perhaps we should guard this optimization on
type having element precision > 1 or being unsigned.

Plus the (convert:type @2) didn't make sense, @2 already must have
TREE_TYPE type.

So the untested patch would then be:

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
	than using build_nonstandard_integer_type.  Punt if type is
	signed 1-bit precision type.

--- gcc/match.pd.jj	2023-09-29 18:58:42.724956659 +0200
+++ gcc/match.pd	2023-09-30 11:54:16.603280666 +0200
@@ -6741,13 +6741,9 @@ (define_operator_list SYNC_FETCH_AND_AND
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
-       && (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+       && (!wascmp || element_precision (type) == 1)
+       && (!TYPE_OVERFLOW_WRAPS (type) || element_precision (type) > 1))
+   (bit_xor (negate (convert:type @0)) @2))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
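C bit-fields give a concrete model of the signed 1-bit case discussed
above (plain C, independent of match.pd):

  #include <stdio.h>

  struct s { signed int b : 1; };	/* representable values: 0 and -1 */

  int
  main (void)
  {
    struct s x = { 0 };
    x.b = ~x.b;			/* ~0 is -1, still representable */
    printf ("%d\n", x.b);	/* prints -1 */
    /* Negating -1 would need +1, which does not fit in a 1-bit signed
       type.  At the C level the arithmetic happens in promoted int, but
       at the GIMPLE level a negation performed directly in the 1-bit
       type is exactly the UB the suggested guard punts on.  */
    return 0;
  }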
Re: [PATCH] RISC-V: Use safe_grow_cleared for vector info [PR111469]
On Sun, Oct 01, 2023 at 07:03:51AM +0800, 钟居哲 wrote:
> LGTM.
>
> juzhe.zh...@rivai.ai
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index af8c31d873c..4b06d93e7f9 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -2417,8 +2417,8 @@ vector_infos_manager::vector_infos_manager ()
>    vector_antin = nullptr;
>    vector_antout = nullptr;
>    vector_earliest = nullptr;
> -  vector_insn_infos.safe_grow (get_max_uid ());
> -  vector_block_infos.safe_grow (last_basic_block_for_fn (cfun));
> +  vector_insn_infos.safe_grow_cleared (get_max_uid ());
> +  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
>    if (!optimize)
>      {
>        basic_block cfg_bb;

Note, while it works around the build failure, I strongly doubt it is the
right fix in this case.  The other spots which have been changed similarly
are quite different; all the vec cases have been followed (almost)
immediately by bitmap_initialize of all the elements, or of just 1-3
elements and their actual uses.

The above is very different.  Sizing a vector with get_max_uid () means it
is very likely very sparse, and so constructing every element in the array
seems undesirable.  While it is true that e.g. IRA or DF use vectors
indexed by INSN_UID, I think those vectors pretty much always have pointer
elements and say pool allocate what they point to upon first access, while
vector_insn_info is huge (48 bytes on 64-bit hosts from what I can see)
even without much attempt to make it shorter (e.g. the vl_vtype_info
member ordering of 2 pointer-sized members, a 1-byte member, a 4-byte
member, and 3 1-byte members creates unnecessary padding).

The reasons why arrays indexed by INSN_UID are sparse are 3:
1) we have the --param min-nondebug-insn-uid= parameter (where, albeit
   usually just for debugging, one can specify a high start for those;
   say --param min-nondebug-insn-uid=10 means debug insns will be created
   with INSN_UID 1 and up, while non-DEBUG_INSNs only with INSN_UID 10
   and up, so even a function containing just 3-4 insns will have
   get_max_uid () of 14 or so; the above allocates in that case
   14 * 48 bytes (bad) and newly constructs all the elements in it
2) INSN_UIDs are given to all newly created insns during RTL
   optimizations; when an insn is deleted, its INSN_UID is not reused, so
   there can be large gaps; the more churn in RTL optimizations, the
   larger
3) INSN_UIDs are given even to BARRIERs, DEBUG_INSNs etc., plus the
   question is if the algorithm really needs to track every "normal" insn
   as well, or whether it shouldn't track just those that in some way use
   vectors or are relevant to it; many scalar instructions might be
   uninteresting, DEBUG_INSNs certainly should be uninteresting (they
   must not affect code generation), etc.

So, I think having something lazily allocated and initialized on demand
might be a compile-time memory and time win.

For basic blocks we perform some block number compactions during
compilation, so those should be denser.  But the structure for each basic
block is even bigger (104 bytes, because it contains 2 x 48 plus
probability).  E.g. DF uses for many purposes LUIDs instead, which need to
be recomputed but are logical uids within each basic block.

	Jakub
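A hedged sketch of the lazily allocated alternative suggested above
(hypothetical names, not the riscv-vsetvl.cc code; object_allocator is the
pool allocator used elsewhere in GCC):

  /* Sparse per-insn info: only a pointer per INSN_UID, with the big
     48-byte records pool-allocated on first touch.  */
  object_allocator<vector_insn_info> info_pool ("vsetvl infos");
  vec<vector_insn_info *> infos;

  vector_insn_info *
  get_info (rtx_insn *insn)
  {
    unsigned uid = INSN_UID (insn);
    if (uid >= infos.length ())
      infos.safe_grow_cleared (uid + 1);
    if (!infos[uid])
      infos[uid] = info_pool.allocate ();	/* construct on demand */
    return infos[uid];
  }

Debug insns, barriers and uninteresting scalar insns then cost 8 bytes of
null pointer each instead of a fully constructed 48-byte record.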
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Mon, Oct 02, 2023 at 01:38:53PM +0100, Tamar Christina wrote:
> Hi All,
>
> I recently committed a patch that uses a nested std::pair in the second
> argument.  It temporarily adds a second ranking variable for sorting and
> then later drops it.
>
> This hits the newly added assert in vec.h.  This assert made some
> relaxation for std::pair but doesn't allow this case through.  The patch
> allows a recursive std::pair in the second argument which fixes
> bootstrap.

I must say I still don't understand why using a
  struct ifcvt_arg_entry { tree arg; unsigned len, occur; };
with comments describing what the members mean wouldn't be a better fix;
in the sorting function, what exactly x{1,2}.second.first and
x{1,2}.second.second mean isn't easily understandable, neither from the
identifiers nor from any comments.

Seems because you use 2 separate vectors, one with just tree elements and
another with those tree elements + 2 unsigned values cached from them for
the sorting purpose, and then rewrite the original tree vector after
sorting, I don't really see why a nested std::pair would be a better match
for it than a named structure.

Furthermore, why populate args first, then compute the extra 2 integers in
another loop pushing to argsKV, and finally overwrite args with sorted
values?  Can't the first loop push the tree with the 2 integers already?
And what is the point of not using this structure later on when both the
args and argsKV vectors are live until the end of the same function?
Can't you either pass that argsKV to the others, having just one vector,
or at least release the other vector when you don't really need it?

Formatting style, swap? arg1 : arg0 isn't correctly formatted, missing
space before ?.  Also, ArgEntry is CamelCase which we (usually) don't use
in GCC and additionally doesn't seem to be unique enough for ODR purposes.
Ditto argsKV.

> It should also still maintain the invariant that was being tested here
> since the nested arguments should still be trivially copyable.
>
> Bootstrapped on aarch64-none-linux-gnu, x86_64-linux-gnu, and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>	vec.h (struct is_trivially_copyable_or_pair): Check recursively in

Missing "* " above.

>	second arg.

	Jakub
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Tue, Oct 03, 2023 at 10:27:16AM +0000, Tamar Christina wrote:
> +/* Structure used to track meta-data on PHI arguments used to generate
> +   most efficient comparison sequence to slatten a PHI node.  */

                                           ^^^ typo (at least, I've never
heard of this word, and wiktionary doesn't know it either (except for
Danish/Swedish))

> @@ -2045,6 +2065,25 @@ gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
>    return lhs;
>  }
>

Perhaps add a short function comment here?

> +static int
> +cmp_arg_entry (const void *p1, const void *p2)
> +{
> +  const ifcvt_arg_entry sval1 = *(const ifcvt_arg_entry *)p1;
> +  const ifcvt_arg_entry sval2 = *(const ifcvt_arg_entry *)p2;
> +
> +  if (sval1.num_compares < sval2.num_compares)
> +    return -1;
> +  else if (sval1.num_compares > sval2.num_compares)
> +    return 1;
> +
> +  if (sval1.occurs < sval2.occurs)
> +    return -1;
> +  else if (sval1.occurs > sval2.occurs)
> +    return 1;
> +
> +  return 0;
> +}
> +
> @@ -2167,61 +2206,53 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>    /* Create hashmap for PHI node which contain vector of argument indexes
>       having the same value.  */
>    bool swap = false;
> -  hash_map > phi_arg_map;
> +  hash_map > phi_arg_map;
>    unsigned int num_args = gimple_phi_num_args (phi);
>    /* Vector of different PHI argument values.  */
> -  auto_vec args (num_args);
> +  auto_vec args;
>
> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> +     where they are in the PHI node.  The indices will be used to determine
> +     the conditions to apply and their complexity.  */
>    for (i = 0; i < num_args; i++)
>      {
>        tree arg;
>
>        arg = gimple_phi_arg_def (phi, i);
> -      if (!phi_arg_map.get (arg))
> -	args.quick_push (arg);
>        phi_arg_map.get_or_insert (arg).safe_push (i);
>      }
>
> -  /* Determine element with max number of occurrences and complexity.  Looking at only
> -     number of occurrences as a measure for complexity isn't enough as all usages can
> -     be unique but the comparisons to reach the PHI node differ per branch.  */
> -  typedef std::pair > ArgEntry;
> -  auto_vec argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking
> +     at only number of occurrences as a measure for complexity isn't enough as
> +     all usages can be unique but the comparisons to reach the PHI node differ
> +     per branch.  */
> +  for (auto entry : phi_arg_map)
>      {
>        unsigned int len = 0;
> -      for (int index : phi_arg_map.get (args[i]))
> +      for (int index : entry.second)
>	 {
>	   edge e = gimple_phi_arg_edge (phi, index);
>	   len += get_bb_num_predicate_stmts (e->src);
>	 }
>
> -      unsigned occur = phi_arg_map.get (args[i])->length ();
> +      unsigned occur = entry.second.length ();
>        if (dump_file && (dump_flags & TDF_DETAILS))
>	 fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -      argsKV.safe_push ({ args[i], { len, occur }});
> +      args.safe_push ({ entry.first, len, occur, entry.second });
>      }
>
>    /* Sort elements based on rankings ARGS.  */
> -  std::sort(argsKV.begin(), argsKV.end(), [](const ArgEntry &left,
> -					     const ArgEntry &right) {
> -    return left.second < right.second;
> -  });
> -
> -  for (i = 0; i < args.length (); i++)
> -    args[i] = argsKV[i].first;
> +  args.qsort (cmp_arg_entry);

I admit I don't know what you're using the args vector later on for and
whether its ordering affects code generation, but because you qsort it I
assume it does.  My worry is that a hash_map traversal might not be in the
same order on all hosts, and similarly qsort doesn't achieve a stable sort
in case the num_compares and occurs members are equal for two or more
different arguments.  Can that ever happen?  We have the stablesort method
instead of qsort, but that would require consistent ordering in the vector
(std::sort doesn't ensure stable sorting either).

If it is a non-issue, the patch is ok with the above nits fixed.
Otherwise perhaps we'd need to push in the first loop into the vector (but
that
  if (!phi_arg_map.get (arg))
    args.quick_push (arg);
  phi_arg_map.get_or_insert (arg).safe_push (i);
in there was quite inefficient, better would be
  bool existed;
  phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
  if (!existed)
    args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL });
or something similar), plus use stablesort.  Or add another compared
member which would be the first position.

	Jakub
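A hedged sketch of the "another compared member" option: record the
argument's first position in the PHI and compare it last (the first_pos
member name is made up):

  /* ... at the end of cmp_arg_entry, after num_compares and occurs:  */
  if (sval1.first_pos < sval2.first_pos)
    return -1;
  else if (sval1.first_pos > sval2.first_pos)
    return 1;
  return 0;

Since two distinct arguments can never share the same first PHI index, the
comparator never returns 0 for different entries, so neither the hash_map
traversal order nor the host qsort implementation can change the final
ordering.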
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Tue, Oct 03, 2023 at 11:41:01AM +0000, Tamar Christina wrote:
> > We have the stablesort method instead of qsort but that would require
> > consistent ordering in the vector (std::sort doesn't ensure stable
> > sorting either).
> >
> > If it is a non-issue, the patch is ok with the above nits fixed.  Otherwise
> > perhaps we'd need to push in the first loop into the vector (but that
> >   if (!phi_arg_map.get (arg))
> >     args.quick_push (arg);
> >   phi_arg_map.get_or_insert (arg).safe_push (i);
> > in there was quite inefficient, better would be
> >   bool existed;
> >   phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
> >   if (!existed)
> >     args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL });
> > or something similar), plus use stablesort.  Or add another compared
> > member which would be the first position.
>
> Hmm, the problem here is that it would make the second loop that fills in
> the len quadratic, as it has to search for arg in the list.  I suppose I
> could push a pointer to the struct instead of `i` in the hashmap and the
> element into args and update the pointer as we go along?  Would that work?

Only if the second loop traverses the hashmap elements and for each tries
to find the corresponding vector element.  If instead you do what you've
done before in the second loop, walk the vector and for each arg in there
look up phi_args_map.get (v.arg) (but please just once, vanilla trunk
looks it up twice in
  for (int index : phi_arg_map.get (args[i]))
    {
      edge e = gimple_phi_arg_edge (phi, index);
      len += get_bb_num_predicate_stmts (e->src);
    }

  unsigned occur = phi_arg_map.get (args[i])->length ();
), then I don't think it would be quadratic.

	Jakub
Re: [PATCH] contrib/mklog.py: Fix issues reported by flake8
On Tue, Oct 03, 2023 at 02:02:40PM +0200, Martin Jambor wrote:
> Hi,
>
> the testing infrastructure built by Martin Liška contains checking a
> few python scripts in contrib with the tool flake8.  That tool recently
> complains that:
>
> contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
> contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1
>
> I'd like to silence these with the following, hopefully trivial,
> changes.  However, I have only tested the changes by running flake8
> again and running ./contrib/mklog.py --help.
>
> Is this good for trunk?  (Or should I stop using flake8 instead?)
>
> Thanks,
>
> Martin
>
> contrib/ChangeLog:
>
> 2023-10-03  Martin Jambor  <mjambor@suse.cz>
>
>	* mklog.py (skip_line_in_changelog): Compare to None using is instead
>	of ==, add an extra newline after the function.

Ok, thanks.

	Jakub
[PATCH] match.pd: Fix up a ? cst1 : cst2 regression on signed bool [PR111668]
Hi!

My relatively recent changes to these simplifiers to avoid doing
build_nonstandard_integer_type (primarily for BITINT_TYPE) broke PR111668,
a recurrence of the PR110487 bug.  I thought the
build_nonstandard_integer_type isn't ever needed there, but there is one
special case where it is.

For the a ? -1 : 0 and a ? 0 : -1 simplifications there are actually 3
different cases.  One is for signed 1-bit precision types (signed kind of
implied by integer_all_onesp, because otherwise it would match
integer_onep earlier), where the simplifier weirdly was matching them
using the a ? powerof2cst : 0 -> a << (log2(powerof2cst)) simplification
and then another simplifier optimizing away the left shift when
log2(powerof2cst) was 0.

Another one is signed BOOLEAN_TYPE with precision > 1, where indeed we
shouldn't be doing the negation in type, because it isn't well defined in
that type; the type only has 2 valid values, 0 and -1.  As an alternative,
we could also e.g. cast to a signed 1-bit precision BOOLEAN_TYPE and then
extend to type.

And the last case is what we were doing for types which have both 1 and -1
(as all ones) as valid values (i.e. all signed/unsigned ENUMERAL_TYPEs,
INTEGRAL_TYPEs and BITINT_TYPEs with precision > 1).

The following patch avoids the hops through << 0 for 1-bit precision and
uses build_nonstandard_integer_type solely for the BOOLEAN_TYPE types
(where we have a guarantee the precision is reasonably small; nothing
ought to create 129+ bit precision BOOLEAN_TYPEs).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111668
	* match.pd (a ? CST1 : CST2): Handle the a ? -1 : 0 and
	a ? 0 : -1 cases before the powerof2cst cases and differentiate
	between 1-bit precision types, larger precision boolean types
	and other integral types.  Fix comment pastos and formatting.

--- gcc/match.pd.jj	2023-10-02 09:42:01.657836005 +0200
+++ gcc/match.pd	2023-10-03 10:33:30.817614648 +0200
@@ -5100,36 +5100,53 @@ (define_operator_list SYNC_FETCH_AND_AND
  (switch
   (if (integer_zerop (@2))
    (switch
-    /* a ? 1 : 0 -> a if 0 and 1 are integral types. */
+    /* a ? 1 : 0 -> a if 0 and 1 are integral types.  */
     (if (integer_onep (@1))
      (convert (convert:boolean_type_node @0)))
+    /* a ? -1 : 0 -> -a.  */
+    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
+     (if (TYPE_PRECISION (type) == 1)
+      /* For signed 1-bit precision just cast bool to the type.  */
+      (convert (convert:boolean_type_node @0))
+      (if (TREE_CODE (type) == BOOLEAN_TYPE)
+       (with {
+	  tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
+						      TYPE_UNSIGNED (type));
+	}
+	(convert (negate (convert:intt (convert:boolean_type_node @0)))))
+       (negate (convert:type (convert:boolean_type_node @0))))))
     /* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
     (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@1))
      (with {
	tree shift = build_int_cst (integer_type_node, tree_log2 (@1));
       }
-      (lshift (convert (convert:boolean_type_node @0)) { shift; })))
-    /* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
-       here as the powerof2cst case above will handle that case correctly.  */
-    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
-     (negate (convert:type (convert:boolean_type_node @0))))))
+      (lshift (convert (convert:boolean_type_node @0)) { shift; })))))
   (if (integer_zerop (@1))
    (switch
-    /* a ? 0 : 1 -> !a. */
+    /* a ? 0 : 1 -> !a.  */
     (if (integer_onep (@2))
      (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; })))
-    /* a ? powerof2cst : 0 -> (!a) << (log2(powerof2cst)) */
+    /* a ? 0 : -1 -> -(!a).  */
+    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
+     (if (TYPE_PRECISION (type) == 1)
+      /* For signed 1-bit precision just cast bool to the type.  */
+      (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; }))
+      (if (TREE_CODE (type) == BOOLEAN_TYPE)
+       (with {
+	  tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
						      TYPE_UNSIGNED (type));
	}
+	(convert (negate (convert:intt (bit_xor (convert:boolean_type_node @0)
						 { boolean_true_node; })))))
+       (negate (convert:type (bit_xor (convert:boolean_type_node @0)
				       { boolean_true_node; }))))))
+    /* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
     (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@2))
      (with {
	tree shift = build_int_cst (integer_type_node, tree_log2 (@2));
[PATCH] match.pd: Avoid other build_nonstandard_integer_type calls [PR111369]
Hi!

On Sat, Sep 30, 2023 at 11:57:38AM +0200, Jakub Jelinek wrote:
> > This fixes PR111369, where one of the bitint*.c tests FAILs with
> > GCC_TEST_RUN_EXPENSIVE=1.
>
> Though, I think there is a preexisting issue which the
> build_nonstandard_integer_type didn't help with; if type is signed 1-bit
> precision, then I think a ? ~t : t could be valid, but -(type)a would invoke
> UB if a is 1 - the cast would make it -1 and negation of -1 in signed 1-bit
> invokes UB.
> So perhaps we should guard this optimization on type having element precision
> > 1 or being unsigned.  Plus the (convert:type @2) didn't make sense, @2
> already must have TREE_TYPE type.

In the light of the PR111668 patch, which shows that
build_nonstandard_integer_type is needed (at least for some signed
prec > 1 BOOLEAN_TYPEs if we use e.g. negation), I've reworked this patch
and handled the last problematic build_nonstandard_integer_type call in
there as well.

In the x == cstN ? cst4 : cst3 optimization it uses
build_nonstandard_integer_type solely for BOOLEAN_TYPEs (I really don't
see why ENUMERAL_TYPEs would be a problem, we treat them in GIMPLE as
uselessly convertible to same precision/sign INTEGER_TYPEs); for
INTEGER_TYPEs it is really a no-op (might return a different type, but
always an INTEGER_TYPE with the same TYPE_PRECISION and same
TYPE_UNSIGNED) and for BITINT_TYPEs with larger precisions really harmful
(we shouldn't create large precision INTEGER_TYPEs).

The a?~t:t optimization just omits the negation of a in type for 1-bit
precision types or any BOOLEAN_TYPEs.  I think that is correct, because
for both signed and unsigned 1-bit precision types, a cast of a bool value
to type already yields 0, -1 or 0, 1 values, and for 1-bit precision
negation of that is still 0, -1 or 0, 1 (except for sometimes invoking
UB).  And for signed larger precision BOOLEAN_TYPEs I think it is correct
as well; a cast of [0, 1] to type yields 0, -1, and those can be xored
with 0 or -1 to yield the proper result, any other values would be UB.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (x == cstN ? cst4 : cst3): Use
	build_nonstandard_integer_type only if type1 is BOOLEAN_TYPE.
	Fix comment typo.  Formatting fix.
	(a?~t:t -> (-(a))^t): Always convert to type rather than using
	build_nonstandard_integer_type.  Perform negation only if type
	has precision > 1 and is not signed BOOLEAN_TYPE.

--- gcc/match.pd.jj	2023-10-03 10:33:30.817614648 +0200
+++ gcc/match.pd	2023-10-03 11:29:54.089566764 +0200
@@ -5178,7 +5178,7 @@ (define_operator_list SYNC_FETCH_AND_AND
 /* Optimize
    # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
-   x_5 ? cstN ? cst4 : cst3
+   x_5 == cstN ? cst4 : cst3
    # op is == or != and N is 1 or 2
    to r_6 = x_5 + (min (cst3, cst4) - cst1) or
    r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
@@ -5214,7 +5214,8 @@ (define_operator_list SYNC_FETCH_AND_AND
	 type1 = type;
       auto prec = TYPE_PRECISION (type1);
       auto unsign = TYPE_UNSIGNED (type1);
-      type1 = build_nonstandard_integer_type (prec, unsign);
+      if (TREE_CODE (type1) == BOOLEAN_TYPE)
+	type1 = build_nonstandard_integer_type (prec, unsign);
       min = wide_int::from (min, prec, TYPE_SIGN (TREE_TYPE (@0)));
       wide_int a = wide_int::from (wi::to_wide (arg0), prec,
@@ -5253,14 +5254,7 @@ (define_operator_list SYNC_FETCH_AND_AND
      }
      (if (code == PLUS_EXPR)
       (convert (plus (convert:type1 @0) { arg; }))
-      (convert (minus { arg; } (convert:type1 @0)))
-     )
-    )
-   )
-  )
- )
- )
-)
+      (convert (minus { arg; } (convert:type1 @0)))))))))))
 #endif
 
 (simplify
@@ -6758,13 +6752,11 @@ (define_operator_list SYNC_FETCH_AND_AND
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
-       && (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+       && (!wascmp || TYPE_PRECISION (type) == 1))
+   (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
+	|| TYPE_PRECISION (type) == 1)
+    (bit_xor (convert:type @0) @2)
+    (bit_xor (negate (convert:type @0)) @2)))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
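A worked instance of the x == cstN ? cst4 : cst3 transform touched above
(plain C semantics rather than match.pd notation):

  int f (int x)	/* suppose VRP proved x is in [0, 1], so cst1 == 0 */
  {
    return x == 0 ? 5 : 6;	/* cst4 == 5, cst3 == 6 */
  }

  /* min (cst3, cst4) == 5, so the PLUS form applies:
       r = x + (min (cst3, cst4) - cst1) == x + 5
     Check: x == 0 -> 5, x == 1 -> 6.  The conditional becomes one
     addition with no branch, and after this patch only BOOLEAN_TYPE
     results take the build_nonstandard_integer_type detour.  */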
Re: [Patch] libgomp.texi: Clarify that no other OpenMP context selectors are implemented
On Wed, Oct 04, 2023 at 01:08:15PM +0200, Tobias Burnus wrote:
> I got confused myself when reading
>
> https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html
>
> especially with regards to other platforms like PowerPC.
>
> It turned out that the list is complete, kind of.  For 'arch' and 'isa'
> those are the only ones - if we want to have more, it has to be
> implemented (→ cf. PR105640).
>
> For 'kind': The host compiler always matches 'kind=host'; this also
> applies to AMD GCN and Nvidia PTX - when compiled as a stand-alone
> compiler.  Those two also match 'kind=gpu' (both as a stand-alone
> compiler(*) and for offloading).
>
> The attached documentation patch attempts to clarify this for both users
> and implementers; for the latter, there is also a comment in the .texi
> with more details.
>
> Comments, suggestions, remarks?
>
> Tobias
>
> (*) This could be changed by checking for "#ifndef ACCEL_COMPILER" in
> gcc/config/{gcn,nvptx}/ but that does not seem to be worthwhile.
>
> libgomp.texi: Clarify that no other OpenMP context selectors are implemented
>
> libgomp/ChangeLog:
>
>	* libgomp.texi (OpenMP Context Selectors): Clarify 'kind' trait
>	and that other target archs have no 'arch'/'isa' traits implemented.

LGTM.

	Jakub
Re: [committed] libstdc++: Define std::numeric_limits<_FloatNN> before C++23
On Wed, Oct 04, 2023 at 09:47:34AM -0700, Pranav Kant wrote:
> Thanks for bringing this to my attention.  I am working on a fix.  Will
> keep this thread posted.
>
> Clang *does* define this macro only when the float128 type is available.
> But the problem seems to be that clang doesn't define the _Float128 alias
> type, which is what's being used here.  It only defines the __float128
> type.  Should be easy to fix.

_Float128 (and the other _FloatNN types) are standard C23 and C++23 types;
they certainly can't be simple aliases to __float128 or other similar
types.  In C++ they mangle differently and have significantly different
behavior in lots of language details.  So, you need to implement the
https://wg21.link/P1467R9 paper, rather than doing quick hacks.

	Jakub
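A small demonstration of the distinction (C++ with GCC on x86-64, where
the Itanium ABI mangles _Float128 as DF128_ and __float128 as g):

  // Two different overloads, not a redefinition: _Float128 is a
  // distinct standard type, not an alias of __float128.
  void f (_Float128);	// mangles as _Z1fDF128_
  void f (__float128);	// mangles as _Z1fg

  static_assert (!__is_same (_Float128, __float128),
		 "distinct types in GCC");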
[PATCH] ipa: Remove ipa_bits
Hi!

The following patch removes the ipa_bits struct pointer/vector from ipa
jump functions and ipa cp transformations.

The reason is that the struct uses widest_int to represent the mask/value
pair, which in the RFC patches to allow larger precisions for
wide_int/widest_int is GC unfriendly, because those types become
non-trivially default constructible/copyable/destructible.  One option
would be to use trailing_wide_int for that instead, but as pointed out by
Aldy, irange_storage, which we already use under the hood for ipa_vr when
the type of the parameter is integral or pointer, already stores the
mask/value pair because VRP now does the bit cp as well.

So, this patch just uses m_vr to store both the value range and the
bitmask.  There is still separate propagation of the ipcp_bits_lattice
from propagation of the ipcp_vr_lattice, but when storing we merge the two
into the same container.

I've bootstrapped/regtested a slightly older version of this patch on
x86_64-linux and i686-linux and that version regressed
+FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
+FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
+FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param 0 to 0x7"
+FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param 0 to 0xf"
The last 2 were solely about the earlier patch not actually copying the
if (dump_file) dumping of the message that we set some mask for some
parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).  The
first testcase is a test for -fno-ipa-bit-cp disabling bit cp for
alignments.  For integral types I'm afraid it is a lost cause when
-fno-ipa-bit-cp -fipa-vrp is on, when value ranges track bit cp as well,
but for pointer alignments I've added
  && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
and
  && opt_for_fn (node->decl, flag_ipa_bit_cp)
guards such that even just -fno-ipa-bit-cp disables it (alternatively we
could just add -fno-ipa-vrp to propalign-3.c dg-options).

Ok for trunk if this passes another bootstrap/regtest?  Or defer until it
is really needed (when the wide_int/widest_int changes are about to be
committed)?

2023-10-05  Jakub Jelinek  <jakub@redhat.com>

	* ipa-prop.h (ipa_bits): Remove.
	(struct ipa_jump_func): Remove bits member.
	(struct ipcp_transformation): Remove bits member, adjust
	ctor and dtor.
	(ipa_get_ipa_bits_for_value): Remove.
	* ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove.
	(ipa_bits_hash_table): Remove.
	(ipa_print_node_jump_functions_for_edge): Don't print bits.
	(ipa_get_ipa_bits_for_value): Remove.
	(ipa_set_jfunc_bits): Remove.
	(ipa_compute_jump_functions_for_edge): For pointers query
	pointer alignment before ipa_set_jfunc_vr and update_bitmask
	in there.  For integral types, just rely on bitmask already
	being handled in value ranges.
	(ipa_check_create_edge_args): Don't create ipa_bits_hash_table.
	(ipcp_transformation_initialize): Neither here.
	(ipcp_transformation_t::duplicate): Don't copy bits vector.
	(ipa_write_jump_function): Don't stream bits here.
	(ipa_read_jump_function): Neither here.
	(useful_ipcp_transformation_info_p): Don't test bits vec.
	(write_ipcp_transformation_info): Don't stream bits here.
	(read_ipcp_transformation_info): Neither here.
	(ipcp_get_parm_bits): Get mask and value from m_vr rather
	than bits.
	(ipcp_update_bits): Remove.
	(ipcp_update_vr): For pointers, set_ptr_info_alignment from
	bitmask stored in value range.
	(ipcp_transform_function): Don't test bits vector, don't call
	ipcp_update_bits.
	* ipa-cp.cc (propagate_bits_across_jump_function): Don't use
	jfunc->bits, instead get mask and value from jfunc->m_vr.
	(ipcp_store_bits_results): Remove.
	(ipcp_store_vr_results): Incorporate parts of
	ipcp_store_bits_results here, merge the bitmasks with value
	range if both are supplied.
	(ipcp_driver): Don't call ipcp_store_bits_results.
	* ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits
	clearing.

--- gcc/ipa-prop.h.jj	2023-10-05 11:32:40.172739988 +0200
+++ gcc/ipa-prop.h	2023-10-05 11:36:45.405378086 +0200
@@ -292,18 +292,6 @@ public:
   array_slice m_elts;
 };
 
-/* Information about zero/non-zero bits.  */
-class GTY(()) ipa_bits
-{
-public:
-  /* The propagated value.  */
-  widest_int value;
-  /* Mask corresponding to the value.
-     Similar to ccp_lattice_t, if xth bit of mask is 0,
-     implies xth bit of value is constant.  */
-  widest_int mask;
-};
-
 /* Info about value ranges.  */
 
 class GTY(()) ipa_vr
@@ -342,11 +330,6 @@ struct GTY (()) ipa_jump_func
    and its description.  */
 
 struct ipa_agg_jump_f
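A hedged sketch of what consumers look like once the pair lives in the
value range (the irange_bitmask accessors below follow the GCC 14 vrange
API as I understand it; treat the exact names as an assumption):

  void
  use_param_bits (const irange &r)
  {
    irange_bitmask bm = r.get_bitmask ();
    wide_int value = bm.value ();	/* bit values where known */
    wide_int mask = bm.mask ();		/* 1 bits are unknown */
    /* e.g. bits guaranteed zero in the parameter: ~(value | mask).  */
  }

This is why ipa_bits can disappear: the same information rides along in
m_vr, merged with the propagated range when both are available.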
[committed] sreal: Fix typo in function name
Hi!

My earlier version of the ipa_bits removal patch resulted in self-test
failures in sreal.  When debugging it, I was really confused that I
couldn't find the verify_arithmetics function in the source.  Turns out it
had bad spelling...

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-10-05  Jakub Jelinek  <jakub@redhat.com>

	* sreal.cc (verify_aritmetics): Rename to ...
	(verify_arithmetics): ... this.
	(sreal_verify_arithmetics): Adjust caller.

--- gcc/sreal.cc.jj	2023-08-08 15:55:08.366138409 +0200
+++ gcc/sreal.cc	2023-10-05 10:20:20.528806377 +0200
@@ -323,7 +323,7 @@ sreal_verify_basics (void)
    of given arguments A and B.  */
 
 static void
-verify_aritmetics (int64_t a, int64_t b)
+verify_arithmetics (int64_t a, int64_t b)
 {
   ASSERT_EQ (a, -(-(sreal (a))).to_int ());
   ASSERT_EQ (a < b, sreal (a) < sreal (b));
@@ -356,7 +356,7 @@ sreal_verify_arithmetics (void)
	int a = values[i];
	int b = values[j];
 
-	verify_aritmetics (a, b);
+	verify_arithmetics (a, b);
       }
 }

	Jakub
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Thu, Oct 05, 2023 at 02:01:40PM +0000, Tamar Christina wrote:
> gcc/ChangeLog:
>
>	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>	(typedef struct ifcvt_arg_entry): New.
>	(cmp_arg_entry): New.
>	(gen_phi_arg_condition, gen_phi_nest_statement,
>	predicate_scalar_phi): Use them.

> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> +     where they are in the PHI node.  The indices will be used to determine
> +     the conditions to apply and their complexity.  */
> +  auto_vec unique_args (num_args);
>    for (i = 0; i < num_args; i++)
>      {
>        tree arg;
>
>        arg = gimple_phi_arg_def (phi, i);
>        if (!phi_arg_map.get (arg))
> -	args.quick_push (arg);
> +	unique_args.quick_push (arg);
>        phi_arg_map.get_or_insert (arg).safe_push (i);
>      }

I meant instead of using another vector (unique_args) just do
  args.quick_push ({ arg, 0, 0, NULL });
above (to avoid needing another allocation etc.).

> -  /* Determine element with max number of occurrences and complexity.  Looking at only
> -     number of occurrences as a measure for complexity isn't enough as all usages can
> -     be unique but the comparisons to reach the PHI node differ per branch.  */
> -  typedef std::pair > ArgEntry;
> -  auto_vec argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking
> +     at only number of occurrences as a measure for complexity isn't enough as
> +     all usages can be unique but the comparisons to reach the PHI node differ
> +     per branch.  */
> +  for (auto arg : unique_args)

And then
  for (auto &entry : args)
here, with entry.arg instead of arg, and

>      {
>        unsigned int len = 0;
> -      for (int index : phi_arg_map.get (args[i]))
> +      vec *indices = phi_arg_map.get (arg);
> +      for (int index : *indices)
>	 {
>	   edge e = gimple_phi_arg_edge (phi, index);
>	   len += get_bb_num_predicate_stmts (e->src);
>	 }
>
> -      unsigned occur = phi_arg_map.get (args[i])->length ();
> +      unsigned occur = indices->length ();
>        if (dump_file && (dump_flags & TDF_DETAILS))
>	 fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -      argsKV.safe_push ({ args[i], { len, occur }});
> +      args.safe_push ({ arg, len, occur, indices });

either
  entry.num_compares = len;
  entry.occur = occur;
  entry.indices = indices;
here, or just use entry.{num_compares,occur,indices} directly instead of
the extra automatic vars.

>      }
>
> +  unique_args.release ();

Plus drop this.

Though, if Richi or Jeff think this is ok as is, I won't stand against it.

	Jakub
Re: [PATCH] ipa: Remove ipa_bits
On Thu, Oct 05, 2023 at 04:42:42PM +0200, Jan Hubicka wrote:
> It does look like a nice cleanup to me.
> I wonder if you did some compare of the bit information propagated with
> new code and old code?  Theoretically they should be equivalent?

Beyond the testsuite, I've tried

__attribute__((noinline, noclone)) static int
foo (int x, int y, int *p)
{
  return p[x + y];
}

__attribute__((noinline, noclone)) static int
bar (int x, int y, int *p)
{
  return foo (x, y & 0xff, p);
}

int
baz (int x, int y, int *p)
{
  return bar ((x & 0x) | 0x12345678, (x & 0x) | 0x87654321,
	      __builtin_assume_aligned (p, 32, 16));
}

and -fdump-tree-ccp2-alias was identical before/after the patch, so the
expected

# RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
# ALIGN = 32, MISALIGN = 16
int * p_7(D) = p;

in foo (-O2).  With -O2 -fno-ipa-vrp

# RANGE [irange] int [-INF, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [-INF, +INF] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
# ALIGN = 32, MISALIGN = 16
int * p_7(D) = p;

and -O2 -fno-ipa-bit-cp

# RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
int * p_7(D) = p;

All that is the same as before.

	Jakub
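For readers of these dumps: a MASK bit of 0 means the corresponding bit is
known to equal the VALUE bit; MASK bits of 1 are unknown.  A sketch of one
transfer function in that representation (the same shape as
tree-ssa-ccp.cc's bit_value_binop uses for BIT_IOR_EXPR):

  /* z = x | y: a result bit is known iff it is known to be 1 in either
     input, or known to be 0 in both.  */
  value_z = value_x | value_y;
  mask_z = (mask_x | mask_y) & ~(value_x | value_y);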
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Fri, Oct 06, 2023 at 02:23:06AM +0000, Tamar Christina wrote:
> gcc/ChangeLog:
>
>	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>	(typedef struct ifcvt_arg_entry): New.
>	(cmp_arg_entry): New.
>	(gen_phi_arg_condition, gen_phi_nest_statement,
>	predicate_scalar_phi): Use them.

Ok, thanks.

	Jakub
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Mon, Oct 09, 2023 at 01:54:19PM +0100, Richard Sandiford wrote: > > I've additionally built it with the incremental attached patch and > > on make -C gcc check-gcc check-g++ -j32 -k it didn't show any > > wide_int/widest_int heap allocations unless a > 128-bit _BitInt or wb/uwb > > constant needing > 128-bit _BitInt was used in a testcase. > > Overall it looks really good to me FWIW. Some comments about the > wide-int.h changes below. Will send a separate message about wide-int.cc. Thanks, just quick answers, will work on patch adjustments after trying to get rid of rwide_int (seems dwarf2out has very limited needs from it, just some routine to construct it in GCed memory (and never change afterwards) from const wide_int_ref & or so, and then working operator ==, get_precision, elt, get_len and get_val methods, so I think we could just have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; }; and perform the methods on it after converting to a storage ref. > > @@ -380,7 +406,11 @@ namespace wi > > > > /* The integer has a constant precision (known at GCC compile time) > > and is signed. */ > > -CONST_PRECISION > > +CONST_PRECISION, > > + > > +/* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger > > + precision where not all elements of arrays are always present. */ > > +WIDEST_CONST_PRECISION > >}; > > Sorry to bring this up so late, but how about using INL_CONST_PRECISION > for the fully inline case and CONST_PRECISION for the general case? > That seems more consistent with the other naming in the patch. Ok. > > @@ -482,6 +541,18 @@ namespace wi > >}; > > > >template > > + struct binary_traits > WIDEST_CONST_PRECISION> > > + { > > +STATIC_ASSERT (int_traits ::precision == int_traits > > ::precision); > > Should this assert for equal inl_precision too? Although it probably > isn't necessary computationally, it seems a bit arbitrary to pick the > first inl_precision... inl_precision is only used for widest_int/widest2_int, so if precision is equal, inl_precision is as well. > > +inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > > +{ > > + len = x.len; > > + precision = x.precision; > > + if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION)) > > +{ > > + u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, > > HOST_BITS_PER_WIDE_INT)); > > + memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT)); > > +} > > + else if (LIKELY (precision)) > > +memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT)); > > +} > > Does the variable-length memcpy pay for itself? If so, perhaps that's a > sign that we should have a smaller inline buffer for this class (say 2 HWIs). Guess I'll try to see what results in smaller .text size. > > +namespace wi > > +{ > > + template > > + struct int_traits < widest_int_storage > > > + { > > +static const enum precision_type precision_type = > > WIDEST_CONST_PRECISION; > > +static const bool host_dependent_precision = false; > > +static const bool is_sign_extended = true; > > +static const bool needs_write_val_arg = true; > > +static const unsigned int precision > > + = N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION; > > What's the reasoning behind this calculation? It would give 0 for > N < WIDE_INT_MAX_INL_PRECISION, and the "MAX" suggests that N > shouldn't be > WIDE_INT_MAX_INL_PRECISION either. > > I wonder whether this should be a second template parameter, with an > assert that precision > inl_precision. Maybe. 
Yet another option would be to always use WIDE_INT_MAX_INL_PRECISION as
the inline precision (and use the N template parameter just to decide
about the overall precision), regardless of whether it is widest_int or
widest2_int.  The latter is very rare, and it is much rarer still that
something wouldn't fit into WIDE_INT_MAX_INL_PRECISION when not using
_BitInt.  The reason for introducing inl_precision was to avoid the heap
allocation for widest2_int unless _BitInt is in use, but maybe that isn't
worth it.

> Nit: might format more naturally with:
>
>   using res_traits = wi::int_traits :
>   ...

Ok.

> > @@ -2203,6 +2781,9 @@ wi::sext (const T &x, unsigned int offse
> >   unsigned int precision = get_precision (result);
> >   WIDE_INT_REF_FOR (T) xi (x, precision);
> >
> > +  if (result.needs_write_val_arg)
> > +    val = result.write_val (MAX (xi.len,
> > +                                 CEIL (offset, HOST_BITS_PER_WIDE_INT)));
>
> Why MAX rather than MIN?

Because it needs to be an upper bound.  In this case, sext_large has
  unsigned int len = offset / HOST_BITS_PER_WIDE_INT;
  /* Extending beyond the precision is a no-op.  If we have only stored
     OFFSET bits or fewer, the rest are already signs.  */
  if (offset >= precision || len >= xlen)
    {
      for (unsigned i = 0; i < xlen; ++i)
	val[i] = xval[i];
      return xlen;
    }
  unsigned int suboffset = of
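The wide_int_storage copy constructor quoted in this exchange is an instance of the small-inline-buffer-with-heap-fallback pattern.  A minimal standalone sketch of just that pattern (illustrative names only; int64_t stands in for HOST_WIDE_INT and new[]/delete[] for XNEWVEC/XDELETEVEC, so this is not the GCC code itself):

  #include <cstdint>
  #include <cstring>

  typedef int64_t hwi;                          /* HOST_WIDE_INT stand-in */
  const unsigned int inl_elts = 9;              /* WIDE_INT_MAX_INL_ELTS stand-in */
  const unsigned int hwi_bits = 64;             /* HOST_BITS_PER_WIDE_INT stand-in */
  const unsigned int inl_precision = inl_elts * hwi_bits;

  struct toy_storage
  {
    unsigned int len, precision;
    union { hwi val[inl_elts]; hwi *valp; } u;  /* inline buffer or heap pointer */

    toy_storage () : len (0), precision (0) {}
    toy_storage (const toy_storage &x)
      : len (x.len), precision (x.precision)
    {
      if (precision > inl_precision)
	{
	  /* Large precision: limbs live on the heap; only the LEN
	     significant limbs need copying.  */
	  u.valp = new hwi[(precision + hwi_bits - 1) / hwi_bits];
	  memcpy (u.valp, x.u.valp, len * sizeof (hwi));
	}
      else if (precision)
	/* Small precision: limbs live in the inline buffer.  */
	memcpy (u.val, x.u.val, len * sizeof (hwi));
    }
    toy_storage &operator= (const toy_storage &) = delete;
    ~toy_storage () { if (precision > inl_precision) delete[] u.valp; }
  };

The .text-size question above is then about whether the variable-length memcpy in the small case beats unconditionally copying the whole inline buffer.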
[PATCH] wide-int: Remove rwide_int, introduce dw_wide_int
On Mon, Oct 09, 2023 at 12:55:02PM +0200, Jakub Jelinek wrote:
> This makes wide_int unusable in GC structures, so for dwarf2out
> which was the only place which needed it there is a new rwide_int type
> (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> inline and is trivially copyable (dwarf2out should never deal with large
> _BitInt constants, those should have been lowered earlier).

As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
it just needs to allocate new GC structures val_wide points to (constructed
from some const wide_int_ref &) and needs to call operator==,
get_precision, elt, get_len and get_val methods on it.
Even trailing_wide_int would be overkill for that, the following just adds
a new struct with precision/len and trailing val array members and
implements the needed methods (only 2 of them using wide_int_ref constructed
from those).

Incremental patch, so far compile time tested only:

--- gcc/wide-int.h.jj	2023-10-09 14:37:45.878940132 +0200
+++ gcc/wide-int.h	2023-10-09 16:06:39.326805176 +0200
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.
    other longer storage GCC representations (rtl and tree).
 
    The actual precision of a wide_int depends on the flavor.  There
-   are four predefined flavors:
+   are three predefined flavors:
 
    1) wide_int (the default).  This flavor does the math in the
    precision of its input arguments.  It is assumed (and checked)
@@ -80,12 +80,7 @@ along with GCC; see the file COPYING3.
       wi::leu_p (a, b) as a more efficient short-hand for
       "a >= 0 && a <= b". ]
 
-   3) rwide_int.  Restricted wide_int.  This is similar to
-   wide_int, but maximum possible precision is RWIDE_INT_MAX_PRECISION
-   and it always uses an inline buffer.  offset_int and rwide_int are
-   GC-friendly, wide_int and widest_int are not.
-
-   4) widest_int.  This representation is an approximation of
+   3) widest_int.  This representation is an approximation of
    infinite precision math.  However, it is not really infinite
    precision math as in the GMP library.  It is really finite
    precision math where the precision is WIDEST_INT_MAX_PRECISION.
@@ -257,9 +252,6 @@ along with GCC; see the file COPYING3.
 #define WIDE_INT_MAX_ELTS 255
 #define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
 
-#define RWIDE_INT_MAX_ELTS WIDE_INT_MAX_INL_ELTS
-#define RWIDE_INT_MAX_PRECISION WIDE_INT_MAX_INL_PRECISION
-
 /* Precision of widest_int and largest _BitInt precision + 1 we can
    support.  */
 #define WIDEST_INT_MAX_ELTS 510
@@ -343,7 +335,6 @@ STATIC_ASSERT (WIDE_INT_MAX_INL_ELTS < W
 template <typename storage> class generic_wide_int;
 template <int N> class fixed_wide_int_storage;
 class wide_int_storage;
-class rwide_int_storage;
 template <int N> class widest_int_storage;
 
 /* An N-bit integer.  Until we can use typedef templates, use this instead.  */
@@ -352,7 +343,6 @@ template <int N> class widest_int_storag
 
 typedef generic_wide_int <wide_int_storage> wide_int;
 typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
-typedef generic_wide_int <rwide_int_storage> rwide_int;
 typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
 typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
 
@@ -1371,180 +1361,6 @@ wi::int_traits ::get_b
   return wi::get_precision (x);
 }
 
-/* The storage used by rwide_int.  */
-class GTY(()) rwide_int_storage
-{
-private:
-  HOST_WIDE_INT val[RWIDE_INT_MAX_ELTS];
-  unsigned int len;
-  unsigned int precision;
-
-public:
-  rwide_int_storage () = default;
-  template <typename T>
-  rwide_int_storage (const T &);
-
-  /* The standard generic_rwide_int storage methods.  */
-  unsigned int get_precision () const;
-  const HOST_WIDE_INT *get_val () const;
-  unsigned int get_len () const;
-  HOST_WIDE_INT *write_val (unsigned int);
-  void set_len (unsigned int, bool = false);
-
-  template <typename T>
-  rwide_int_storage &operator = (const T &);
-
-  static rwide_int from (const wide_int_ref &, unsigned int, signop);
-  static rwide_int from_array (const HOST_WIDE_INT *, unsigned int,
-			       unsigned int, bool = true);
-  static rwide_int create (unsigned int);
-};
-
-namespace wi
-{
-  template <>
-  struct int_traits <rwide_int_storage>
-  {
-    static const enum precision_type precision_type = VAR_PRECISION;
-    /* Guaranteed by a static assert in the rwide_int_storage constructor.  */
-    static const bool host_dependent_precision = false;
-    static const bool is_sign_extended = true;
-    static const bool needs_write_val_arg = false;
-    template <typename T1, typename T2>
-    static rwide_int get_binary_result (const T1 &, const T2 &);
-    template <typename T1, typename T2>
-    static unsigned int get_binary_precision (const T1 &, const T2 &);
-  };
-}
-
-/* Initialize the storage from integer X, in its natural precision.
-   Note that we do not allow integers with host-dependent precision
-   to become rwide_ints; rwide_ints must always be logically independen
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Mon, Oct 09, 2023 at 03:44:10PM +0200, Jakub Jelinek wrote:
> Thanks, just quick answers, will work on patch adjustments after trying to
> get rid of rwide_int (seems dwarf2out has very limited needs from it, just
> some routine to construct it in GCed memory (and never change afterwards)
> from const wide_int_ref & or so, and then working operator ==,
> get_precision, elt, get_len and get_val methods, so I think we could just
> have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; };
> and perform the methods on it after converting to a storage ref.

Now in patch form (again, incremental).

> > Does the variable-length memcpy pay for itself?  If so, perhaps that's a
> > sign that we should have a smaller inline buffer for this class (say 2
> > HWIs).
>
> Guess I'll try to see what results in smaller .text size.

I've left the memcpy changes in a separate patch (incremental, attached).
Seems the second patch results in .text growth by 16256 bytes (0.04%),
though I'd bet it probably makes compile time a tiny bit faster because it
replaces an out of line memcpy (caused by variable length) with an inlined
one.
With even the third one it shrinks by 84544 bytes (0.21% down), but the
extra statistics patch then shows a massive number of allocations after
running make check-gcc check-g++ check-gfortran for just a minute or two.
On the widest_int side, I see (first number from sort | uniq -c | sort -nr,
second the estimated or final len)
7289034 4
 173586 5
  21819 6
i.e. there are tons of widest_ints which need len 4 (or perhaps just have
it as an upper estimate), maybe even 5 would be nice.
On the wide_int side, I see
 155291 576
(supposedly because of bound_wide_int, where we create wide_int_ref from
the 576-bit precision bound_wide_int and then create 576-bit wide_int when
using unary or binary operation on that).

So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6
instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed
from MAX_BITSIZE_MODE_ANY_INT?).  Or keep it at 9 for both (i.e. without
the third patch).

--- gcc/poly-int.h.jj	2023-10-09 14:37:45.883940062 +0200
+++ gcc/poly-int.h	2023-10-09 17:05:26.629828329 +0200
@@ -96,7 +96,7 @@ struct poly_coeff_traits
 
 template <typename T>
-struct poly_coeff_traits <T, wi::CONST_PRECISION>
+struct poly_coeff_traits <T, wi::INL_CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
@@ -110,14 +110,13 @@ struct poly_coeff_traits
 
 template <typename T>
-struct poly_coeff_traits <T, wi::WIDEST_CONST_PRECISION>
+struct poly_coeff_traits <T, wi::CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
   /* These types are always signed.  */
   static const int signedness = 1;
   static const int precision = wi::int_traits <T>::precision;
-  static const int inl_precision = wi::int_traits <T>::inl_precision;
   static const int rank = precision * 2 / CHAR_BIT;
 
   template
--- gcc/double-int.h.jj	2023-01-02 09:32:22.747280053 +0100
+++ gcc/double-int.h	2023-10-09 17:06:03.446317336 +0200
@@ -440,7 +440,7 @@ namespace wi
   template <>
   struct int_traits <double_int>
   {
-    static const enum precision_type precision_type = CONST_PRECISION;
+    static const enum precision_type precision_type = INL_CONST_PRECISION;
     static const bool host_dependent_precision = true;
     static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
     static unsigned int get_precision (const double_int &);
--- gcc/wide-int.h.jj	2023-10-09 16:06:39.326805176 +0200
+++ gcc/wide-int.h	2023-10-09 17:29:20.016951691 +0200
@@ -343,8 +343,8 @@ template <int N> class widest_int_storag
 
 typedef generic_wide_int <wide_int_storage> wide_int;
 typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION> > widest_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION * 2> > widest2_int;
 
 /* wi::storage_ref can be a reference to a primitive type,
    so this is the conservatively-correct setting.  */
@@ -394,13 +394,13 @@ namespace wi
 
     /* The integer has a variable precision but no defined signedness.  */
     VAR_PRECISION,
 
-    /* The integer has a constant precision (known at GCC compile time)
-       and is signed.  */
-    CONST_PRECISION,
-
-    /* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger
-       precision where not all elements of arrays are always present.  */
-    WIDEST_CONST_PRECISION
+    /* The integer has a constant precision (known at GCC compile time),
+       is signed and all elements are in inline buffer.  */
+    INL_CONST_PRECISION,
+
+    /* Like INL_CONST_PRECISION, but elements can be heap allocated for
+       larger lengths.  */
+    CONST_PRECISION
   };
 
 /* This class, which has no default implementation, is expected to
@@ -410,15 +410,10 @@ namespace wi
 
      Classifies the type of T.
 
    static const unsigned int precision;
-     Only defined if precision_type == CONST_PRECISION or
-     prec
Re: [PATCH] x86: set spincount 1 for x86 hybrid platform [PR109812]
On Tue, Oct 10, 2023 at 12:59:52PM +0800, Jun Zhang wrote:
> include/ChangeLog:
>
> 	* omphook.h: define RUNOMPHOOK macro.

ChangeLog formatting.  The description should start with a capital letter.
If you add a new file, you should just mention
	: New file.
or something similar.

But much more importantly, we don't do hooks like that in libgomp; it
would be better as a static inline function whose name shows what it is
doing (e.g. do_adjust_default_spincount), best in some already existing
header (e.g. wait.h), and best at the end of the
  if (!parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var))
block in env.c.

> libgomp/ChangeLog:
>
> 	* env.c (initialize_env): add RUNOMPHOOK macro.
> 	* config/linux/x86/omphook.h: define RUNOMPHOOK macro.
> ---
>  include/omphook.h                  |  1 +
>  libgomp/config/linux/x86/omphook.h | 19 +++
>  libgomp/env.c                      |  3 +++
>  3 files changed, 23 insertions(+)
>  create mode 100644 include/omphook.h
>  create mode 100644 libgomp/config/linux/x86/omphook.h
>
> diff --git a/include/omphook.h b/include/omphook.h
> new file mode 100644
> index 000..2ebe3ad57e6
> --- /dev/null
> +++ b/include/omphook.h
> @@ -0,0 +1 @@
> +#define RUNOMPHOOK()
> diff --git a/libgomp/config/linux/x86/omphook.h
> b/libgomp/config/linux/x86/omphook.h
> new file mode 100644
> index 000..aefb311cc07
> --- /dev/null
> +++ b/libgomp/config/linux/x86/omphook.h
> @@ -0,0 +1,19 @@
> +#ifdef __x86_64__
> +#include "cpuid.h"
> +
> +/* only for x86 hybrid platform */

Comments should again start with a capital letter and end with a dot and
2 spaces before */.

> +#define RUNOMPHOOK() \
> +  do \
> +    { \
> +      unsigned int eax, ebx, ecx, edx; \
> +      if ((getenv ("GOMP_SPINCOUNT") == NULL) && (wait_policy < 0) \

Spurious ()s around the subconditions; the first one shouldn't be tested
at all when this is placed right, and if a condition doesn't fit on one
line, each subcondition should be on a separate line.

> +	  && __get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) \

If we don't have a macro for the CPUID.07H.0H:EDX[15] bit, there should
be a comment which says what that bit is.

> +	  && ((edx >> 15) & 1)) \
> +	gomp_spin_count_var = 1LL; \
> +      if (gomp_throttled_spin_count_var > gomp_spin_count_var) \
> +	gomp_throttled_spin_count_var = gomp_spin_count_var; \

The above 2 lines won't be needed if placed right.

	Jakub
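Putting the review comments together, the suggested replacement would look something like the following sketch.  Hedged: the name do_adjust_default_spincount and the CPUID test come from the comments above, gomp_spin_count_var is the real libgomp variable, but the exact shape and header are assumptions, not the committed code.  Called at the end of the !parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var) block in env.c, neither the GOMP_SPINCOUNT test nor the throttled-spincount adjustment needs to be repeated here:

  /* E.g. in a new libgomp/config/linux/x86/ header; a sketch, not the
     committed implementation.  */
  #ifdef __x86_64__
  # include "cpuid.h"
  #endif

  static inline void
  do_adjust_default_spincount (void)
  {
  #ifdef __x86_64__
    /* CPUID.07H.0H:EDX[15] is the Hybrid flag: the part mixes
       performance and efficiency cores, where long spinning in idle
       threads is counterproductive.  */
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
	&& ((edx >> 15) & 1))
      gomp_spin_count_var = 1LL;
  #endif
  }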
Re: [PATCH] wide-int: Remove rwide_int, introduce dw_wide_int
On Tue, Oct 10, 2023 at 09:30:31AM +0000, Richard Biener wrote:
> On Mon, 9 Oct 2023, Jakub Jelinek wrote:
> > On Mon, Oct 09, 2023 at 12:55:02PM +0200, Jakub Jelinek wrote:
> > > This makes wide_int unusable in GC structures, so for dwarf2out
> > > which was the only place which needed it there is a new rwide_int type
> > > (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> > > inline and is trivially copyable (dwarf2out should never deal with large
> > > _BitInt constants, those should have been lowered earlier).
> >
> > As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
> > it just needs to allocate new GC structures val_wide points to (constructed
> > from some const wide_int_ref &) and needs to call operator==,
> > get_precision, elt, get_len and get_val methods on it.
> > Even trailing_wide_int would be overkill for that, the following just adds
> > a new struct with precision/len and trailing val array members and
> > implements the needed methods (only 2 of them using wide_int_ref constructed
> > from those).
> >
> > Incremental patch, so far compile time tested only:
>
> LGTM, wonder if we can push this separately as prerequisite?

The dwarf2out.{h,cc} changes sure, the wide-int.h changes obviously need
to be merged into the main patch.

	Jakub
Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store
On Tue, Oct 10, 2023 at 10:49:04AM +0000, Richard Biener wrote:
> The following fixes a mistake in count_nonzero_bytes which happily
> skips over stores clobbering the memory we load a value we store
> from and then performs analysis on the memory state before the
> intermediate store.
>
> The patch implements the most simple fix - guarantee that there are
> no intervening stores by tracking the original active virtual operand
> and comparing that to the one of a load we attempt to analyze.
>
> This simple approach breaks two subtests of gcc.dg/Wstringop-overflow-69.c
> which I chose to XFAIL.

This function is a total mess, but I think punting right after the
gimple_assign_single_p test is too big a hammer.  There are various cases,
and some of them are fine even when the vuse is different, others aren't.

The function at that point tries to handle CONSTRUCTOR, MEM_REF, or decls.
I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
vuse.  Though, I am not really sure when a CONSTRUCTOR would appear; the
lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
element in it.

The MEM_REF case is supposedly fine only if we can guarantee nothing could
overwrite it (so MEM_REF of a TREE_STATIC TREE_READONLY decl could be
fine, STRING_CST too); anything else is wrong, because
count_nonzero_bytes_addr uses the get_stridx/get_strinfo APIs, which for
something that can be changed always reflect only the state at the current
statement.
So, e.g. the get_stridx (exp, stmt) > 0 case in count_nonzero_bytes_addr
is where the caller must definitely punt if the vuse is different.

Then for decls, again, CONST_DECLs, DECL_IN_CONSTANT_POOLs are certainly
fine.  Other decls for which ctor_for_folding returns non-error_mark_node
should be fine as well; I think ctor_useable_for_folding_p is supposed to
verify that.  STRING_CSTs should be fine as well.

So maybe pass the vuse down to count_nonzero_bytes_addr and return false
in the idx > 0 case in there if gimple_vuse (stmt) != vuse?

	Jakub
Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store
On Tue, Oct 10, 2023 at 11:59:28AM +0000, Richard Biener wrote:
> > I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
> > vuse.  Though, am not really sure when a CONSTRUCTOR would appear, the
> > lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
> > VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
> > element in it.
>
> Yeah, but what should that be interpreted to in terms of object-size?!

The function in question doesn't compute object sizes, just the
minimum/maximum number of non-zero bytes in some rhs of a store and
whether everything stored is 0s, or everything non-zeros, or some
non-zeros followed by zero.

> I think the only real case we'd see here is the MEM_REF RHS given
> we know we have a register type value.  I'll note the function
> totally misses handled_component_p (), it only seems to handle
> *p and 'decl'.

Yeah, maybe we could handle even that at some point.  Though perhaps
better first to rewrite it completely, because the recursive calls, where
in some cases an argument means one thing and in another case something
completely different, are just bad design (or lack thereof).

> > So maybe pass the vuse down to count_nonzero_bytes_addr and return false
> > in the idx > 0 case in there if gimple_vuse (stmt) != vuse?
>
> I don't know enough of the pass to do better, can you take it from here?
> One of the points is that we need to know the memory context (aka vuse)
> of both the original store and the load we are interpreting.  For
> _addr I wasn't sure how we arrive here.  As you said, this is a bit
> of spaghetti and I don't want to untangle this any further.

I meant something like below; without getting rid of the -Wshadow stuff
in there, my initial attempt didn't work.  This passes the new testcase
as well as the testcase you've been touching, but I haven't tested it
beyond that yet.

In theory we could even handle some cases with gimple_vuse (stmt) != vuse,
because we save a copy of the strinfo state at the end of basic blocks and
only throw that away after we process all dominator children.  But we'd
need to figure out at which bb to look and temporarily switch the vectors.

2023-10-10  Richard Biener  <rguenther@suse.de>
	    Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111519
	* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
	argument and pass it through to recursive calls and
	count_nonzero_bytes_addr calls.  Don't shadow the stmt argument, but
	change stmt for gimple_assign_single_p statements for which we don't
	immediately punt.
	(strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
	it through to recursive calls and count_nonzero_bytes calls.  Don't
	use get_strinfo if gimple_vuse (stmt) is different from vuse.  Don't
	shadow the stmt argument.

	* gcc.dg/torture/pr111519.c: New testcase.
--- gcc/tree-ssa-strlen.cc.jj 2023-08-30 11:21:38.539521966 +0200 +++ gcc/tree-ssa-strlen.cc 2023-10-10 15:05:44.731871218 +0200 @@ -281,14 +281,14 @@ public: gimple *stmt, unsigned lenrange[3], bool *nulterm, bool *allnul, bool *allnonnul); - bool count_nonzero_bytes (tree exp, + bool count_nonzero_bytes (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, unsigned lenrange[3], bool *nulterm, bool *allnul, bool *allnonnul, ssa_name_limit_t &snlim); - bool count_nonzero_bytes_addr (tree exp, + bool count_nonzero_bytes_addr (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, @@ -4531,8 +4531,8 @@ nonzero_bytes_for_type (tree type, unsig } /* Recursively determine the minimum and maximum number of leading nonzero - bytes in the representation of EXP and set LENRANGE[0] and LENRANGE[1] - to each. + bytes in the representation of EXP at memory state VUSE and set + LENRANGE[0] and LENRANGE[1] to each. Sets LENRANGE[2] to the total size of the access (which may be less than LENRANGE[1] when what's being referenced by EXP is a pointer rather than an array). @@ -4546,7 +4546,7 @@ nonzero_bytes_for_type (tree type, unsig Returns true on success and false otherwise. */ bool -strlen_pass::count_nonzero_bytes (tree exp, gimple *stmt, +strlen_pass::count_nonzero_bytes (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned
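The diff is cut off above; the crux of the fix in count_nonzero_bytes_addr, per the preceding message, is a guard along these lines (a hedged sketch of the shape, not the committed hunk):

  /* In strlen_pass::count_nonzero_bytes_addr, in the
     get_stridx (exp, stmt) > 0 case: strinfo records describe the
     memory state at the current statement, so they may only be used
     when the load sees the same virtual operand as the store being
     analyzed.  */
  if (idx > 0)
    {
      if (gimple_vuse (stmt) != vuse)
	return false;
      /* ... only then consult get_strinfo (idx) as before.  */
    }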
[PATCH] dwarf2out: Stop using wide_int in GC structures
Hi!

On Tue, Oct 10, 2023 at 09:30:31AM +0000, Richard Biener wrote:
> On Mon, 9 Oct 2023, Jakub Jelinek wrote:
> > > This makes wide_int unusable in GC structures, so for dwarf2out
> > > which was the only place which needed it there is a new rwide_int type
> > > (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> > > inline and is trivially copyable (dwarf2out should never deal with large
> > > _BitInt constants, those should have been lowered earlier).
> >
> > As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
> > it just needs to allocate new GC structures val_wide points to (constructed
> > from some const wide_int_ref &) and needs to call operator==,
> > get_precision, elt, get_len and get_val methods on it.
> > Even trailing_wide_int would be overkill for that, the following just adds
> > a new struct with precision/len and trailing val array members and
> > implements the needed methods (only 2 of them using wide_int_ref constructed
> > from those).
> >
> > Incremental patch, so far compile time tested only:
>
> LGTM, wonder if we can push this separately as prerequisite?

Here it is as a separate independent patch.  Even without the wide_int
changing patch it should save some memory, by not always allocating room
for 9 limbs, but say just the 2/3 or however many we actually need.
And, another advantage is that if we really need it at some point, it
could support even more than 9 limbs if it is created from a wide_int_ref
with get_len () > 9.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-10  Jakub Jelinek  <jakub@redhat.com>

	* dwarf2out.h (wide_int_ptr): Remove.
	(dw_wide_int_ptr): New typedef.
	(struct dw_val_node): Change type of val_wide from wide_int_ptr
	to dw_wide_int_ptr.
	(struct dw_wide_int): New type.
	(dw_wide_int::elt): New method.
	(dw_wide_int::operator ==): Likewise.
	* dwarf2out.cc (get_full_len): Change argument type to
	const dw_wide_int & from const wide_int &.  Use CEIL.  Call
	get_precision method instead of calling wi::get_precision.
	(alloc_dw_wide_int): New function.
	(add_AT_wide): Change w argument type to const wide_int_ref &
	from const wide_int &.  Use alloc_dw_wide_int.
	(mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int.
	(insert_wide_int): Change val argument type to const wide_int_ref &
	from const wide_int &.
	(add_const_value_attribute): Pass rtx_mode_t temporary directly to
	add_AT_wide instead of using a temporary variable.
--- gcc/dwarf2out.h.jj 2023-10-09 14:37:45.890939965 +0200 +++ gcc/dwarf2out.h 2023-10-09 16:46:14.705816928 +0200 @@ -30,7 +30,7 @@ typedef struct dw_cfi_node *dw_cfi_ref; typedef struct dw_loc_descr_node *dw_loc_descr_ref; typedef struct dw_loc_list_struct *dw_loc_list_ref; typedef struct dw_discr_list_node *dw_discr_list_ref; -typedef wide_int *wide_int_ptr; +typedef struct dw_wide_int *dw_wide_int_ptr; /* Call frames are described using a sequence of Call Frame @@ -252,7 +252,7 @@ struct GTY(()) dw_val_node { unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_unsigned_const"))) val_unsigned; double_int GTY ((tag ("dw_val_class_const_double"))) val_double; - wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide; + dw_wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide; dw_vec_const GTY ((tag ("dw_val_class_vec"))) val_vec; struct dw_val_die_union { @@ -313,6 +313,35 @@ struct GTY(()) dw_discr_list_node { int dw_discr_range; }; +struct GTY((variable_size)) dw_wide_int { + unsigned int precision; + unsigned int len; + HOST_WIDE_INT val[1]; + + unsigned int get_precision () const { return precision; } + unsigned int get_len () const { return len; } + const HOST_WIDE_INT *get_val () const { return val; } + inline HOST_WIDE_INT elt (unsigned int) const; + inline bool operator == (const dw_wide_int &) const; +}; + +inline HOST_WIDE_INT +dw_wide_int::elt (unsigned int i) const +{ + if (i < len) +return val[i]; + wide_int_ref ref = wi::storage_ref (val, len, precision); + return wi::sign_mask (ref); +} + +inline bool +dw_wide_int::operator == (const dw_wide_int &o) const +{ + wide_int_ref ref1 = wi::storage_ref (val, len, precision); + wide_int_ref ref2 = wi::storage_ref (o.val, o.len, o.precision); + return ref1 == ref2; +} + /* Interface from dwarf2out.cc to dwarf2cfi.cc. */ extern struct dw_loc_descr_node *build_cfa_loc (dw_cfa_location *, poly_int64); --- gcc/dwarf2out.cc.jj 2023-10-09 14:37:45.894939909 +0200 +++ gcc/dwarf2out.cc2023-10-09 16:48:24.565014459 +0200 @@ -397,11 +397,9 @@ dump_struct_debug (tree type, enum
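The dwarf2out.cc half of the diff is truncated above.  The ChangeLog names alloc_dw_wide_int; a sketch of what such a helper does (reconstructed from the description rather than quoted from the patch, so treat the exact body as an assumption; ggc_internal_alloc is the real GC allocator):

  /* Allocate a variable-sized dw_wide_int in GC memory and fill it
     from a wide_int_ref; val[1] is a trailing array, hence the
     get_len () - 1 extra elements.  */
  static dw_wide_int *
  alloc_dw_wide_int (const wide_int_ref &w)
  {
    dw_wide_int *p
      = (dw_wide_int *) ggc_internal_alloc (sizeof (dw_wide_int)
					    + ((w.get_len () - 1)
					       * sizeof (HOST_WIDE_INT)));
    p->precision = w.get_precision ();
    p->len = w.get_len ();
    memcpy (p->val, w.get_val (), p->len * sizeof (HOST_WIDE_INT));
    return p;
  }

This is what makes the struct GC-friendly where wide_int no longer is: the object is immutable after creation and exactly as large as its len requires.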
C++ patch ping^2
Hi!

I'd like to ping a couple of C++ patches.

- c++, v2: Implement C++26 P2169R4 - Placeholder variables with no name [PR110349]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630802.html

- c++: Implement C++26 P2361R6 - Unevaluated strings [PR110342]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628375.html

- c++, v2: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630803.html

- c++: Implement C++ DR 2406 - [[fallthrough]] attribute and iteration statements [PR107571]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628487.html
  (from this one Richi approved the middle-end changes)

- c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628490.html

- libcpp, v2: Small incremental patch for P1854R4 [PR110341]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628586.html

	Jakub
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Tue, Oct 10, 2023 at 06:41:50PM +0100, Richard Sandiford wrote: > > On the wide_int side, I see > > 155291 576 > > (supposedly because of bound_wide_int, where we create wide_int_ref from > > the 576-bit precision bound_wide_int and then create 576-bit wide_int when > > using unary or binary operation on that). > > 576 bits seems quite a lot for a loop bound. We're enterning near-infinite > territory with 128 bits. :) But I don't want to rehash old discussions. > If we take this size for wide_int as given, then... We could go for 128 bits bounds_wide_int and redo the stats given that. Though maybe such tweaks can be done incrementally. > > So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6 > > instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed > > from MAX_BITSIZE_MODE_ANY_INT?). Or keep it at 9 for both (i.e. without > > the third patch). > > ...I agree we might as well keep the widest_int size the same for > simplicity. It'd only be worth distinguishing them if we have positive > proof that it's worthwhile. > > So going with patches 1 + 2 sounds good to me, but I don't have a strong > preference. Ok. > > + if (prec > WIDE_INT_MAX_INL_PRECISION && !high) > > +prec = (op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT; > > Changing the precision looked a bit dangerous at first, but I agree it > looks correct in context, in that nothing later on seems to require prec > to be the real precision of the number. But I wonder whether we should > instead do: > > if (!high) > prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec); Ok, will try that (and in other spots too). > so that the assumption gets a bit more testing. Same idea for the others. > I.e. in any case where we think it's safe to reduce a precision or > length for out-of-line buffers, I think we should try to do the same > for inline ones. > > I'm not sure off-hand why + 1 is safe here but + 2 is needed for the > write_val estimate. Might be worth a comment in one place or the other. As I wrote earlier, I'm not sure about the need for the + 1 above in prec computation, perhaps just if (!high) prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec); could work too (especially if the multiplication is always signed vs. signed or unsigned vs. unsigned). Even say HOST_WIDE_INT_MIN * HOST_WIDE_INT_MIN (i.e. op1len == op2len == 1) result of 0x4000'0'' (for smallest signed^2) or ~(unsigned HOST_WIDE_INT) 0 * ~(unsigned HOST_WIDE_INT) 0 (largest unsigned^2) of 0x'fffe''0001 fits into prec 2; and for widest_int-like representation, 0x'ff wouldn't have op1len 1, but 2; but the + 1 on top of that is needed because of the way wi_pack then works. But guess will try to play with it some more tomorrow. > > @@ -2399,9 +2453,12 @@ from_int (int i) > > static void > > assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) > > { > > - char buf[WIDE_INT_PRINT_BUFFER_SIZE]; > > - print_dec (wi, buf, sgn); > > - ASSERT_STREQ (expected, buf); > > + char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; > > + unsigned len = wi.get_len (); > > + if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) > > +p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); > > Is this size enough for decimal? E.g. to go back to the 576-bit example, > 2^575 is O(10^173), but 576/4+4 == 148. 
generic_wide_ints with precision larger than HOST_BITS_PER_WIDE_INT and length larger than 1 are always printed hexadecimally, we don't have code to print it decimally (it wouldn't be terribly hard, we have divmod, but could be expensive). Though, I've always wondered why we at least for print_decs don't print negative values as -0xwhatever, rather than 0xsomethinglarge. Any change in this area could affect a lot of dumps though. Jakub
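For concreteness, the decimal bound behind the 576-bit example (plain arithmetic, not from the patch): an $n$-bit number needs at most

$$\lceil n \log_{10} 2 \rceil + 2 \approx 0.30103\,n + 2$$

bytes when printed decimally (digits plus sign and terminating NUL), so 576 bits can need around 174 digits, more than the 576/4 + 4 = 148 bytes the hex-oriented estimate provides, which is consistent with multi-limb values only being printed hexadecimally as described above.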
Re: [PATCH] x86: set spincount 1 for x86 hybrid platform [PR109812]
On Wed, Oct 11, 2023 at 02:47:56AM +0000, Zhang, Jun wrote:
> Sorry, I don't know how to place wait.h.  This patch is only for x86.
> Wait.h is in libgomp/config/linux/.
> if I directly place do_adjust_default_spincount in wait.h,
> it maybe need #ifdef __x86_64__, and also need a wait.h in include/
> for other os platform.
> if I create a new wait.h in libgomp/config/linux/x86/,
> It need copy whole wait.h from libgomp/config/linux/,
> also need a wait.h in include/ for other os platform.
> Could you help me?

Put it into a new spincount.h then.  Sorry, I missed that wait.h isn't
CPU specific.

	Jakub
Re: [PATCH v2] x86: set spincount 1 for x86 hybrid platform
On Wed, Oct 11, 2023 at 04:39:28PM +0800, Jun Zhang wrote: > include/ChangeLog: > > * spincount.h: New file. > > libgomp/ChangeLog: > > * env.c (initialize_env): Use do_adjust_default_spincount. > * config/linux/x86/spincount.h: New file. Ok. Jakub
Re: Principles of the C99 testsuite conversion
On Wed, Oct 11, 2023 at 08:17:49AM -0600, Jeff Law wrote:
> On 10/11/23 08:10, Richard Earnshaw (lists) wrote:
> > On 11/10/2023 14:56, Jeff Law wrote:
> > > On 10/11/23 04:39, Florian Weimer wrote:
> > > > I've started to look at what is required to convert the testsuite to
> > > > C99 (without implicit ints, without implicit function declarations, and
> > > > a few other legacy language features).
> > > I bet those older tests originating from c-torture will be a bit painful.
> > > Torbjorn liked having them minimized, to the point of squashing out
> > > nearly everything he considered extraneous.  I'd bet many of those older
> > > tests are going to need lots of changes.
> >
> > I've often wondered just how much of the original c-torture suite is still
> > relevant today.  Most of those tests were written at a time when the
> > compiler expanded tree directly into RTL and I suspect that today the tests
> > never get even close to tickling the original bug they were intended to
> > validate.
> I'm sure it's a mixed bag.  Some I do see pop up regularly during
> development testing.
>
> The real way to try and answer that question would be with gcov.  Something
> like take an existing coverage report, run test and see if it added any
> additional coverage.  If not, flag it as potentially irrelevant.  Do this on
> x86 or aarch64.
>
> Given that list of potentially irrelevant tests, then look at them from a
> target standpoint, potentially testing with the same methodology on several
> targets, perhaps a standardized set meant to cover 16->64 bit targets,
> big/little endian and a few other key features.

I think over the years we've seen tons of cases where a test written for
one problem at some point is able to trigger some completely different
problem a few years later.  So, I don't think most of the tests are
irrelevant.

	Jakub
[PATCH] libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc
Hi! The following testcase started FAILing recently after the https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554 glibc change which marked vfscanf with nonnull (1) attribute. While vfwscanf hasn't been marked similarly (strangely), the patch changes that too. By using va_arg one hides the value of it from the compiler (volatile keyword would do too, or making the FILE* stream a function argument, but then it might need to be guarded by #if or something). Tested on x86_64-linux, ok for trunk? 2023-10-12 Jakub Jelinek * testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01): Initialize stream to va_arg(ap, FILE*) rather than 0. * testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01): Likewise. --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc.jj 2023-01-16 23:19:06.651711546 +0100 +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc 2023-10-12 09:46:28.695011763 +0200 @@ -35,7 +35,7 @@ void test01(int dummy, ...) char* s = 0; const char* cs = 0; const char* format = "%i"; - FILE* stream = 0; + FILE* stream = va_arg(ap, FILE*); std::size_t n = 0; int ret; --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc.jj 2023-01-16 23:19:06.651711546 +0100 +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc 2023-10-12 09:46:19.236141897 +0200 @@ -42,7 +42,7 @@ void test01(int dummy, ...) #endif #if _GLIBCXX_HAVE_VFWSCANF - FILE* stream = 0; + FILE* stream = va_arg(arg, FILE*); const wchar_t* format1 = 0; int ret1; ret1 = std::tr1::vfwscanf(stream, format1, arg); Jakub
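A minimal standalone illustration of the trick (an assumed example, not the testcase itself): reading the pointer via va_arg makes its value opaque to the compiler, so the nonnull diagnostic cannot trigger, unlike a literal null initializer:

  #include <stdarg.h>
  #include <stdio.h>

  static void
  use_stream (int dummy, ...)
  {
    va_list ap;
    va_start (ap, dummy);
  #ifdef VISIBLE_NULL
    FILE *stream = 0;			/* warns: null passed to nonnull vfscanf */
  #else
    FILE *stream = va_arg (ap, FILE *);	/* value unknown at compile time */
  #endif
    vfscanf (stream, "%*d", ap);
    va_end (ap);
  }

  int
  main ()
  {
    return 0;	/* compile-only illustration; use_stream is never called */
  }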
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote: > Jakub Jelinek writes: > > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c > >unsigned int xlen, unsigned int xprecision, > >unsigned int precision, unsigned int shift) > > { > > - unsigned int len = rshift_large_common (val, xval, xlen, xprecision, > > shift); > > + /* Work out how many blocks are needed to store the significant bits > > + (excluding the upper zeros or signs). */ > > + unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); > > + unsigned int len = blocks_needed; > > + if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) > > + && len > xlen > > + && xval[xlen - 1] >= 0) > > +len = xlen; > > I think here too it would be worth dropping the: > > UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) > > part of the condition, since presumably the change should be safe > regardless of that. If so, there is also one spot in lshift_large as well. So incrementally: --- gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200 +++ gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200 @@ -2013,8 +2013,7 @@ /* The whole-block shift fills with zeros. */ unsigned int len = BLOCKS_NEEDED (precision); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -len = xlen + skip + 1; + len = MIN (xlen + skip + 1, len); for (unsigned int i = 0; i < skip; ++i) val[i] = 0; @@ -2079,9 +2078,7 @@ (excluding the upper zeros or signs). */ unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); unsigned int len = blocks_needed; - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) - && len > xlen - && xval[xlen - 1] >= 0) + if (len > xlen && xval[xlen - 1] >= 0) len = xlen; rshift_large_common (val, xval, xlen, shift, len); @@ -2114,9 +2111,7 @@ /* Work out how many blocks are needed to store the significant bits (excluding the upper zeros or signs). */ unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); - unsigned int len = blocks_needed; - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen) -len = xlen; + unsigned int len = MIN (xlen, blocks_needed); rshift_large_common (val, xval, xlen, shift, len); which I'll test soon. > OK for thw wide-int parts with those changes. Thanks. What do you think about that --- gcc/wide-int.h.jj 2023-10-11 12:05:47.718059477 +0200 +++ gcc/wide-int.h 2023-10-11 13:51:56.081552500 +0200 @@ -1635,6 +1635,8 @@ widest_int_storage ::write_val (unsig u.valp = XNEWVEC (HOST_WIDE_INT, l); return u.valp; } + else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS) +u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef); return u.val; } @@ -1650,6 +1652,9 @@ widest_int_storage ::set_len (unsigne memcpy (u.val, valp, l * sizeof (u.val[0])); XDELETEVEC (valp); } + else if (len && len < WIDE_INT_MAX_INL_ELTS) +gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len] +== HOST_WIDE_INT_UC (0xbaaddeadbeef)); len = l; /* There are no excess bits in val[len - 1]. */ STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0); part, shall that go into trunk as well or is that too much slowdown for checking builds? Jakub
[committed] wide-int: Fix build with gcc < 12 or clang++ [PR111787]
Hi!

While my wide_int patch bootstrapped/regtested fine when I used GCC 12
as system gcc, apparently it doesn't with GCC 11 and older or clang++.
For GCC versions before the PR96555 C++ DR1315 implementation, the
compiler complains about a template argument involving template
parameters; clang++ complains about the same, plus about a missing
needs_write_val_arg static data member in some wi::int_traits
specializations.

I've so far rebuilt just the stage3 gcc subdirectory with this patch, and
Tobias and William made it through stage1 with it.

Committed to unbreak build for others.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111787
	* tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg):
	New static data member.
	(int_traits <extended_tree <N> >::needs_write_val_arg): Likewise.
	(wi::ints_for): Provide separate partial specializations for
	generic_wide_int <extended_tree <N> > and INL_CONST_PRECISION or
	that and CONST_PRECISION, rather than using
	int_traits <extended_tree <N> >::precision_type as the second
	template argument.
	* rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New
	static data member.
	* double-int.h (wi::int_traits <double_int>::needs_write_val_arg):
	Likewise.

--- gcc/tree.h.jj	2023-10-12 16:01:04.000000000 +0200
+++ gcc/tree.h	2023-10-12 16:52:51.977954615 +0200
@@ -6237,6 +6237,7 @@ namespace wi
     static const enum precision_type precision_type = VAR_PRECISION;
     static const bool host_dependent_precision = false;
     static const bool is_sign_extended = false;
+    static const bool needs_write_val_arg = false;
   };
 
   template <int N>
@@ -6262,6 +6263,7 @@ namespace wi
       = N == ADDR_MAX_PRECISION ? INL_CONST_PRECISION : CONST_PRECISION;
     static const bool host_dependent_precision = false;
     static const bool is_sign_extended = true;
+    static const bool needs_write_val_arg = false;
     static const unsigned int precision = N;
   };
 
@@ -6293,8 +6295,14 @@ namespace wi
   tree_to_poly_wide_ref to_poly_wide (const_tree);
 
   template <int N>
-  struct ints_for <generic_wide_int <extended_tree <N> >,
-		   int_traits <extended_tree <N> >::precision_type>
+  struct ints_for <generic_wide_int <extended_tree <N> >, INL_CONST_PRECISION>
+  {
+    typedef generic_wide_int <extended_tree <N> > extended;
+    static extended zero (const extended &);
+  };
+
+  template <int N>
+  struct ints_for <generic_wide_int <extended_tree <N> >, CONST_PRECISION>
   {
     typedef generic_wide_int <extended_tree <N> > extended;
     static extended zero (const extended &);
   };
@@ -6532,8 +6540,15 @@ wi::to_poly_wide (const_tree t)
 template <int N>
 inline generic_wide_int <wi::extended_tree <N> >
 wi::ints_for <generic_wide_int <wi::extended_tree <N> >,
-	      wi::int_traits <wi::extended_tree <N> >::precision_type
->::zero (const extended &x)
+	      wi::INL_CONST_PRECISION>::zero (const extended &x)
+{
+  return build_zero_cst (TREE_TYPE (x.get_tree ()));
+}
+
+template <int N>
+inline generic_wide_int <wi::extended_tree <N> >
+wi::ints_for <generic_wide_int <wi::extended_tree <N> >,
+	      wi::CONST_PRECISION>::zero (const extended &x)
 {
   return build_zero_cst (TREE_TYPE (x.get_tree ()));
 }
--- gcc/rtl.h.jj	2023-09-29 22:04:44.463012421 +0200
+++ gcc/rtl.h	2023-10-12 16:54:59.915240074 +0200
@@ -2270,6 +2270,7 @@ namespace wi
     /* This ought to be true, except for the special case that BImode
        is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
     static const bool is_sign_extended = false;
+    static const bool needs_write_val_arg = false;
     static unsigned int get_precision (const rtx_mode_t &);
     static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
				      const rtx_mode_t &);
--- gcc/double-int.h.jj	2023-10-12 16:01:04.260164202 +0200
+++ gcc/double-int.h	2023-10-12 16:53:41.401292272 +0200
@@ -442,6 +442,7 @@ namespace wi
   {
     static const enum precision_type precision_type = INL_CONST_PRECISION;
     static const bool host_dependent_precision = true;
+    static const bool needs_write_val_arg = false;
     static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
     static unsigned int get_precision (const double_int &);
     static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,

	Jakub
Re: [Patch] libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory
On Thu, Oct 12, 2023 at 06:37:00PM +0200, Tobias Burnus wrote: > libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory > > In OpenMP 5.0/5.1, the semantic of OMP_TARGET_OFFLOAD=mandatory was > insufficiently specified; 5.2 clarified this with extensions/clarifications > (omp_initial_device, omp_invalid_device, "conforming device number"). > GCC's implementation matches OpenMP 5.2. > > libgomp/ChangeLog: > > * libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to > OMP_TARGET_OFFLOAD. > (OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE; > clarify MANDATORY behavior. LGTM. Jakub
Re: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars
On Tue, Oct 10, 2023 at 06:46:35PM +0200, Tobias Burnus wrote:
> 	* parse.cc (check_omp_allocate_stmt): Permit procedure pointers
> 	here (rejected later) for less mislreading diagnostic.

s/misl/mis/

> libgomp/ChangeLog:
>
> 	* libgomp.texi: Fill in something here.

> @@ -7220,8 +7227,7 @@ gfc_resolve_omp_allocate (gfc_namespace *ns, gfc_omp_namelist *list)
> 		   &n->where);
> 	  continue;
> 	}
> -      if (ns != n->sym->ns || n->sym->attr.use_assoc
> -	  || n->sym->attr.host_assoc || n->sym->attr.imported)
> +      if (ns != n->sym->ns || n->sym->attr.use_assoc ||
> 	  n->sym->attr.imported)

s/ n/ n/

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -1400,23 +1400,53 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
> 		     "region must specify an %<allocator%> clause", t);
>        /* Skip for omp_default_mem_alloc (= 1),
> 	  unless align is present.  */
> -      else if (!errorcount
> -	       && (align != NULL_TREE
> -		   || alloc == NULL_TREE
> -		   || !integer_onep (alloc)))
> +      else if (errorcount
> +	       || (align == NULL_TREE
> +		   && alloc != NULL_TREE
> +		   && integer_onep (alloc)))
> +	DECL_ATTRIBUTES (t) = remove_attribute ("omp allocate",
> +						DECL_ATTRIBUTES (t));

Probably already preexisting, but I wonder how safe remove_attribute is.
Aren't attributes shared in some cases (like
__attribute__((attr1, attr2, attr3)) int a, b, c, d;)?  Not really sure
if something unshares them afterwards.  If they are shared, adding new
attributes is fine, that will make the new additions not shared and the
tail shared, but remove_attribute could remove it from all of them at
once.  Perhaps I'm wrong, haven't verified.

Otherwise LGTM (though I didn't spot anything about allocatables in the
patch; am I just looking wrong, or are they still unsupported?)

	Jakub
[PATCH] wide-int: Fix estimation of buffer sizes for wide_int printing [PR111800]
Hi!

As mentioned in the PR, my estimates of the needed buffer size for
wide_int and especially widest_int printing were incorrect: I used
get_len () in the estimations, but that is right only for !wi::neg_p (x)
values.

Under the hood, we have 3 ways to print numbers.
print_decs, which if
  ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT) || (wi.get_len () == 1))
uses sprintf, which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
or negative), and otherwise uses print_hex;
print_decu, which if
  ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
   || (wi.get_len () == 1 && !wi::neg_p (wi)))
uses sprintf, which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
only), and otherwise print_hex;
and print_hex, which doesn't print most significant limbs which are zero
and prints the first non-zero limb such that redundant 0 hex digits
aren't printed, while all limbs below that are printed with
"%016" PRIx64.
For wi::neg_p (x) values, the first limb of the precision is always
non-zero, so we print all the limbs for the precision.
So, the current estimates are accurate if !wi::neg_p (x), or when
print_decs will be used and x.get_len () == 1; otherwise we need to use
an estimate based on get_precision () rather than get_len ().
The following patch does that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

The patch doesn't address what I've talked about earlier, that we might
actually stop using print_hex when asked for print_dec{s,u}: we could for
negative print_decs just negate and call print_decu, and in print_decu
e.g. in a loop UNSIGNED wi::divmod_trunc by
HOST_WIDE_INT_UC (10000000000000000000) and print the 19 decimal digits
of the remainder if the quotient is non-zero, otherwise the non-padded
rest, and then reshuffle the buffer.  And/or perhaps print_hex should
also take signop and print negative hex constants as -0x... if asked for
SIGNED.  And finally, I think we should try to rewrite tree-ssa-ccp.cc
bit-cp from widest_int to wide_int; even the earlier
  PHI node value: CONSTANT 0xffe2 (0x19)
in the -fdump-tree-ccp-details dumps is horribly confusing when the type
is say just 32-bit or 64-bit, and with the recent widest_int changes
those are now around with > 32000 f hex digits in there.  Not to mention
we shouldn't really care about the state of bits beyond the precision,
and I think we always have the type in question around (x.val is an
INTEGER_CST of the right type and we just to::widest it; just x.mask is
widest_int).

2023-10-14  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111800
gcc/
	* wide-int.cc (assert_deceq): Use wi.get_len () for buffer size
	estimation only if !wi::neg_p (wi) or if len is 1 and sgn is SIGNED,
	otherwise use WIDE_INT_MAX_HWIS for wi.get_precision ().
	(assert_hexeq): Use wi.get_len () for buffer size estimation only
	if !wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
	wi.get_precision ().
	* wide-int-print.cc (print_decs): Use wi.get_len () for buffer size
	estimation only if !wi::neg_p (wi) or if len is 1, otherwise use
	WIDE_INT_MAX_HWIS for wi.get_precision ().
	(print_decu): Use wi.get_len () for buffer size estimation only if
	!wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
	wi.get_precision ().
	(print_hex): Likewise.
	* value-range.cc (irange_bitmask::dump): Use get_len () for buffer
	size estimation only if !wi::neg_p (wi), otherwise use
	WIDE_INT_MAX_HWIS for get_precision ().
	* value-range-pretty-print.cc
	(vrange_printer::print_irange_bitmasks): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use i_bound.get_len () for buffer size estimation only if !wi::neg_p (i_bound) or if len is 1 and !TYPE_UNSIGNED, otherwise use WIDE_INT_MAX_HWIS for i_bound.get_precision (). Use TYPE_SIGN macro in print_dec call argument. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Assert w.get_precision () is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS. --- gcc/wide-int.cc.jj 2023-10-13 19:34:44.288830022 +0200 +++ gcc/wide-int.cc 2023-10-13 20:23:12.889386810 +0200 @@ -2450,7 +2450,9 @@ static void assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; unsigned len = wi.get_len (); + if ((len != 1 || sgn == UNSIGNED) && wi::neg_p (wi)) +len = WIDE_INT_MAX_HWIS (wi.get_precision ()); if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); print_dec (wi, p, sgn); @@ -2463,7 +2465,11 @@ static void as
[PATCH] wide-int, v2: Fix estimation of buffer sizes for wide_int printing [PR111800]
Hi! On Sat, Oct 14, 2023 at 10:41:28AM +0200, Richard Biener wrote: > Can we somehow abstract this common pattern? So like this? With room for the future tweaks like printing decimal instead of hex numbers by print_dec*, where we'd only need to adjust the inlines. The XALLOCAVEC call is left for the callers, those would make the inlines uninlinable and not doing what they should. 2023-10-14 Jakub Jelinek PR tree-optimization/111800 gcc/ * wide-int-print.h (print_dec_buf_size, print_decs_buf_size, print_decu_buf_size, print_hex_buf_size): New inline functions. * wide-int.cc (assert_deceq): Use print_dec_buf_size. (assert_hexeq): Use print_hex_buf_size. * wide-int-print.cc (print_decs): Use print_decs_buf_size. (print_decu): Use print_decu_buf_size. (print_hex): Use print_hex_buf_size. (pp_wide_int_large): Use print_dec_buf_size. * value-range.cc (irange_bitmask::dump): Use print_hex_buf_size. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use print_dec_buf_size. Use TYPE_SIGN macro in print_dec call argument. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Assert w.get_precision () is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS. --- gcc/wide-int-print.h.jj 2023-10-13 19:34:44.283830089 +0200 +++ gcc/wide-int-print.h2023-10-14 11:21:44.190603091 +0200 @@ -36,4 +36,40 @@ extern void print_hex (const wide_int_re extern void print_hex (const wide_int_ref &wi, FILE *file); extern void pp_wide_int_large (pretty_printer *, const wide_int_ref &, signop); +inline bool +print_dec_buf_size (const wide_int_ref &wi, signop sgn, unsigned int *len) +{ + unsigned int l = wi.get_len (); + if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi)) +l = WIDE_INT_MAX_HWIS (wi.get_precision ()); + l = l * HOST_BITS_PER_WIDE_INT / 4 + 4; + *len = l; + return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE); +} + +inline bool +print_decs_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + return print_dec_buf_size (wi, SIGNED, len); +} + +inline bool +print_decu_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + return print_dec_buf_size (wi, UNSIGNED, len); +} + +inline bool +print_hex_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + unsigned int l; + if (wi::neg_p (wi)) +l = WIDE_INT_MAX_HWIS (wi.get_precision ()); + else +l = wi.get_len (); + l = l * HOST_BITS_PER_WIDE_INT / 4 + 4; + *len = l; + return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE); +} + #endif /* WIDE_INT_PRINT_H */ --- gcc/wide-int.cc.jj 2023-10-14 11:07:52.738850767 +0200 +++ gcc/wide-int.cc 2023-10-14 11:22:03.100347386 +0200 @@ -2450,9 +2450,9 @@ static void assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_dec_buf_size (wi, sgn, &len)) +p = XALLOCAVEC (char, len); print_dec (wi, p, sgn); ASSERT_STREQ (expected, p); } @@ -2463,9 +2463,9 @@ static void assert_hexeq (const char *expected, const wide_int_ref &wi) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_hex_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_hex (wi, p); ASSERT_STREQ (expected, p); } --- 
gcc/wide-int-print.cc.jj2023-10-14 11:07:52.737850781 +0200 +++ gcc/wide-int-print.cc 2023-10-14 11:37:43.994623668 +0200 @@ -75,9 +75,9 @@ void print_decs (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_decs_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_decs (wi, p); fputs (p, file); } @@ -102,9 +102,9 @@ void print_decu (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_decu_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_decu (wi, p); fputs (p, file); } @@ -141,9 +141,9 @@ void print_hex (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len
Re: [patch] libgomp.texi: Update "Enabling OpenMP"
On Sat, Oct 14, 2023 at 03:46:52PM -0600, Sandra Loosemore wrote:
> On 10/14/23 13:43, Tobias Burnus wrote:
> > When browsing libgomp doc, I came across
> > https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > First, I found especially the Fortran part difficult to read.  Secondly,
> > it missed the C++ attribute syntax.  And I also missed a reference to
> > -fopenmp-simd.
> >
> > The attached patch tries to improve this.  Note that it talks about C and
> > C++ attributes, even though C23's [[omp::]] support has not yet landed.
> > (But is expected very soon.)
>
> Is somebody actively working on implementing that, and expecting to get it
> in before Stage 1 closes?  I don't think we should document features that

I am (attached is a WIP, which can now compile most of g++.dg/gomp/attrs-1.C
in -std=c2x -fopenmp, except for the scan/section directives).  That said,
I agree it might be premature to document it before it is in.

> don't exist yet.  This syntax for C also isn't even in the draft OpenMP 6.0
> document, so at this point it's just a hypothetical extension.

It is in OpenMP spec git and it is very unlikely it would be removed.

	Jakub

--- gcc/cp/parser.h.jj	2023-09-20 08:42:51.987008923 +0200
+++ gcc/cp/parser.h	2023-10-12 13:32:42.503496571 +0200
@@ -408,7 +408,8 @@ struct GTY(()) cp_parser {
      identifiers) rather than an explicit template parameter list.  */
   bool fully_implicit_function_template_p;
 
-  /* TRUE if omp::directive or omp::sequence attributes may not appear.  */
+  /* TRUE if omp::directive, omp::decl or omp::sequence attributes may not
+     appear.  */
   bool omp_attrs_forbidden_p;
 
   /* Tracks the function's template parameter list when declaring a function
--- gcc/c/c-decl.cc.jj	2023-10-11 10:59:12.378170030 +0200
+++ gcc/c/c-decl.cc	2023-10-11 17:23:42.902257966 +0200
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.
 #include "context.h"  /* For 'g'.  */
 #include "omp-general.h"
 #include "omp-offload.h"  /* For offload_vars.  */
+#include "c-parser.h"
 #include "tree-pretty-print.h"
 
@@ -325,15 +326,34 @@ i_label_binding (tree node)
 
 /* The resulting tree type.  */
 
-union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE"),
+union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE
+		  + 2 * (TREE_CODE (&%h.generic) == C_TOKEN_VEC)"),
	   chain_next ("(union lang_tree_node *) c_tree_chain_next (&%h.generic)")))
 lang_tree_node
 {
   union tree_node GTY ((tag ("0"),
			desc ("tree_node_structure (&%h)"))) generic;
   struct lang_identifier GTY ((tag ("1"))) identifier;
+  struct c_tree_token_vec GTY ((tag ("2"))) c_token_vec;
 };
 
+/* Langhook for tree_size.  */
+size_t
+c_tree_size (enum tree_code code)
+{
+  gcc_checking_assert (code >= NUM_TREE_CODES);
+  switch (code)
+    {
+    case C_TOKEN_VEC: return sizeof (c_tree_token_vec);
+    default:
+      switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_declaration: return sizeof (tree_decl_non_common);
+	case tcc_type: return sizeof (tree_type_non_common);
+	default: gcc_unreachable ();
+	}
+    }
+}
+
 /* Track bindings and other things that matter for goto warnings.  For
    efficiency, we do not gather all the decls at the point of definition.
    Instead, we point into the bindings structure.  As
--- gcc/c/c-parser.cc.jj	2023-10-11 10:59:12.426169364 +0200
+++ gcc/c/c-parser.cc	2023-10-13 17:47:27.32902 +0200
@@ -247,12 +247,21 @@ struct GTY(()) c_parser {
      macro.  */
   BOOL_BITFIELD seen_string_literal : 1;
 
+  /* TRUE if omp::directive, omp::decl or omp::sequence attributes may not
+     appear.  */
+  BOOL_BITFIELD omp_attrs_forbidden_p : 1;
+
   /* Location of the last consumed token.  */
   location_t last_token_location;
 
   /* Holds state for parsing collapsed OMP_FOR loops.  Managed by
      c_parser_omp_for_loop.  */
   struct omp_for_parse_data * GTY((skip)) omp_for_parse_state;
+
+  /* If we're in the context of OpenMP directives written as C23
+     attributes turned into pragma, vector of tokens created from that,
+     otherwise NULL.  */
+  vec *in_omp_attribute_pragma;
 };
 
 /* Return a pointer to the Nth token in PARSER's tokens_buf.  */
@@ -1383,6 +1392,17 @@ c_parser_skip_to_pragma_eol (c_parser *p
     }
   while (token_type != CPP_PRAGMA_EOL);
 
+  if (parser->in_omp_attribute_pragma)
+    {
+      c_token *token = c_parser_peek_token (parser);
+      if (token->type == CPP_EOF)
+	{
+	  parser->tokens = &parser->tokens_buf[0];
+	  parser->tokens_avail = token->flags;
+	  parser->in_omp_attribute_pragma = NULL;
+	}
+    }
+
   parser->error = false;
 }
 
@@ -5430,6 +5450,109 @@ c_parser_balanced_token_sequence (c_pars
     }
 }
 
+static bool c_parser_check_balanced_raw_token_sequence (c_parser *,
+							unsigned int *);
+
+/* Parse arguments of omp::directive or omp::decl attribute.
+
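[For readers unfamiliar with the syntax under discussion, a minimal hedged
sketch, not taken from the WIP patch above, of what the C23 attribute
spelling is expected to look like, mirroring the existing C++ [[omp::]]
support, compiled with -std=c2x -fopenmp once the support lands:]

/* omp::directive attaches an executable directive to a statement.  */
void
scale (int n, float a, float *restrict x)
{
  [[omp::directive (parallel for)]]
  for (int i = 0; i < n; i++)
    x[i] *= a;
}

/* omp::decl attaches a declarative directive to the declaration itself.  */
[[omp::decl (declare simd)]]
float square (float v);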
[PATCH] wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}
Hi!

The following patch implements printing of wide_int/widest_int numbers
decimally when asked for that using print_dec{,s,u}, even if they have
precision larger than 64 and get_len () above 1 (right now we printed them
hexadecimally, and even negative numbers as huge positive hexadecimal).

In order to avoid the expensive division/modulo by 10^19 twice, once to
estimate how many digits will be needed and again to actually print them,
the patch prints the 19-digit chunks in reverse order (from least
significant to most significant) and then reorders them with linear
complexity to form the right printed number.

Tested with printing both 256 and 320 bit numbers (the first as an example
of an even number of 19-digit chunks plus one shorter chunk above them, the
second as an example of an odd number of 19-digit chunks plus one shorter
chunk above them).

The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimation, thinking about it now,
is one byte too much (one byte for -, one for '\0') and too conservative,
so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as an even less conservative
estimation (though more expensive to compute in inline code).  But that
l * HOST_BITS_PER_WIDE_INT / 4 + 4 is likely one byte too much as well:
2 bytes for 0x, one byte for '\0', and where does the 4th one come from?
Of course, all of these assume HOST_BITS_PER_WIDE_INT is a multiple of
64...

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-17  Jakub Jelinek

	* wide-int-print.h (print_dec_buf_size): For length, divide number
	of bits by 3 and add 3 instead of division by 4 and adding 4.
	* wide-int-print.cc (print_decs): Remove superfluous ()s.  Don't
	call print_hex, instead call print_decu on either negated value
	after printing - or on wi itself.
	(print_decu): Don't call print_hex, instead print even large numbers
	decimally.
	(pp_wide_int_large): Assume len from print_dec_buf_size is big enough
	even if it returns false.
	* pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
	pp_wide_int_large should be used.
	* tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
	to compute needed buffer size.

--- gcc/wide-int-print.h.jj	2023-10-15 23:04:06.195422820 +0200
+++ gcc/wide-int-print.h	2023-10-16 10:14:41.327401697 +0200
@@ -42,7 +42,7 @@ print_dec_buf_size (const wide_int_ref &
   unsigned int l = wi.get_len ();
   if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi))
     l = WIDE_INT_MAX_HWIS (wi.get_precision ());
-  l = l * HOST_BITS_PER_WIDE_INT / 4 + 4;
+  l = l * HOST_BITS_PER_WIDE_INT / 3 + 3;
   *len = l;
   return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE);
 }
--- gcc/wide-int-print.cc.jj	2023-10-15 23:04:06.195422820 +0200
+++ gcc/wide-int-print.cc	2023-10-16 11:20:30.662174735 +0200
@@ -49,14 +49,12 @@ print_dec (const wide_int_ref &wi, FILE
 }
 
-/* Try to print the signed self in decimal to BUF if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the signed self in decimal to BUF.  */
 
 void
 print_decs (const wide_int_ref &wi, char *buf)
 {
-  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
-      || (wi.get_len () == 1))
+  if (wi.get_precision () <= HOST_BITS_PER_WIDE_INT || wi.get_len () == 1)
     {
       if (wi::neg_p (wi))
	 sprintf (buf, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
@@ -64,12 +62,17 @@ print_decs (const wide_int_ref &wi, char
       else
	 sprintf (buf, HOST_WIDE_INT_PRINT_DEC, wi.to_shwi ());
     }
+  else if (wi::neg_p (wi))
+    {
+      widest2_int w = widest2_int::from (wi, SIGNED);
+      *buf = '-';
+      print_decu (-w, buf + 1);
+    }
   else
-    print_hex (wi, buf);
+    print_decu (wi, buf);
 }
 
-/* Try to print the signed self in decimal to FILE if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the signed self in decimal to FILE.  */
 
 void
 print_decs (const wide_int_ref &wi, FILE *file)
@@ -82,8 +85,7 @@ print_decs (const wide_int_ref &wi, FILE
   fputs (p, file);
 }
 
-/* Try to print the unsigned self in decimal to BUF if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the unsigned self in decimal to BUF.  */
 
 void
 print_decu (const wide_int_ref &wi, char *buf)
@@ -92,11 +94,37 @@ print_decu (const wide_int_ref &wi, char
       || (wi.get_len () == 1 && !wi::neg_p (wi)))
     sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, wi.to_uhwi ());
   else
-    print_hex (wi, buf);
+    {
+      widest2_int w = widest2_int::from (wi, UNSIGNED), r;
+      widest2_int ten19 = HOST_WIDE_INT_UC (1000);
+      char buf2[20], next1[19], next2[19];
+      size_t l, c = 0, i;
+      /* In order to avoid dividing this twice, print the 19 decimal
	 digit chunks in reverse order into buffer
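[To illustrate the chunk-and-reorder idea, a minimal standalone sketch, not
the GCC code itself, using unsigned __int128 as a stand-in for a multi-word
wide_int: emit base-10^19 chunks least-significant first, then print them
back in reverse, so the expensive division chain runs only once.]

#include <stdio.h>

static void
print_u128 (unsigned __int128 x, char *buf)
{
  const unsigned long long ten19 = 10000000000000000000ULL;  /* 10^19 */
  char chunks[3][20];  /* 128 bits -> at most 39 digits -> 3 chunks */
  int n = 0;
  do
    {
      unsigned long long rem = (unsigned long long) (x % ten19);
      x /= ten19;
      /* Zero-pad all chunks except the most significant one.  */
      sprintf (chunks[n++], x ? "%019llu" : "%llu", rem);
    }
  while (x);
  for (int i = n - 1; i >= 0; i--)  /* linear reorder, MSB chunk first */
    buf += sprintf (buf, "%s", chunks[i]);
}

int
main (void)
{
  char buf[40];  /* 39 digits + '\0'; cf. the l * 64 / 3 + 3 estimate */
  unsigned __int128 v
    = ((unsigned __int128) 12345678901234567890ULL << 64) | 42;
  print_u128 (v, buf);
  puts (buf);
  return 0;
}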
Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:
> C++11 (and C23) attributes do not seem to be properly handled:
>
> [[omp::decl (declare target,indirect(1))]]
> int foo(void) { return 5; }
> [[omp::decl (declare target indirect)]]
> int bar(void) { return 8; }

Isn't that correct?  The declare target directive has the forms:
  declare target (list)
  declare target
  declare target clauses
The first form is essentially equivalent to declare target enter (list),
the second to begin declare target with no clauses.
Now,
  [[omp::decl (declare target)]] int v;
matches the first form, and so an enter (v) clause is implied.  But say
  [[omp::decl (declare target, device_type (any))]] int v;
is the third form, so nothing is implied and it is equivalent to
  int v;
  #pragma omp declare target device_type (any)
I don't remember whether that is supposed to be an error or just not do
anything, because there is no enter, to or link clause.

So, I think if you want to make foo indirect, the above would have to be:

[[omp::decl (declare target,enter,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect enter)]]
int bar(void) { return 8; }

or so (or to instead of enter, but I guess that is either deprecated or
removed; we should support it anyway).

	Jakub
Re: [r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64
On Wed, Oct 18, 2023 at 07:14:36AM +0000, Richard Biener wrote:
> It's interesting that when the target has AVX512 enabled we get
> AVX512 style masks used also for SSE and AVX vector sizes but the
> OMP SIMD clones for SSE and AVX vector sizes use SSE/AVX style
> masks and only the AVX512 size clone uses the AVX512 integer mode
> mask.  That necessarily requires an extra setup instruction for
> the mask argument.

It is an ABI matter: the ABI of the clones shouldn't change just because of
a supposedly non-ABI-changing option (ISA flags like -mavx512f etc.).
Under the hood, if the callers are compiled with -mavx512f, the expectation
is that the AVX512 simd clone will be used, but of course that doesn't have
to be the case either, e.g. because of options requesting only 256-bit or
128-bit vector width, loops with a small safelen or iteration count, or
other reasons.

	Jakub
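[A hedged sketch of the interface being discussed, not the regressed
testcase itself: for an inbranch simd clone the compiler materializes a
mask parameter whose representation (SSE/AVX vector vs. AVX512 kmask) is
fixed by the clone's own ISA level, independently of the caller's flags.]

#pragma omp declare simd inbranch
int foo (int x);

void
caller (int *a, const int *b, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = foo (b[i]);  /* conditional call: the clone's mask argument
			     encodes which lanes are active */
}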
Re: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars
On Wed, Oct 18, 2023 at 11:12:44AM +0200, Thomas Schwinge wrote:
> Hi Tobias!
>
> On 2023-10-13T15:29:52+0200, Tobias Burnus wrote:
> > => Updated patch attached
>
> When cherry-picking this commit 2d3dbf0eff668bed5f5f168b3cafd8590c54
> "Fortran: Support OpenMP's 'allocate' directive for stack vars" on top of
> slightly older GCC sources (mentioning that just in case that's
> relevant), in a configuration with offloading enabled (only), I see:
>
> +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (internal compiler error:
> tree code 'statement_list' is not supported in LTO streams)
> +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (test for excess errors)

Any references to GENERIC code in clauses etc. should have been gimplified
or cleared during gimplification; we shouldn't support STATEMENT_LIST in
LTO streaming.

	Jakub
[PATCH] tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]
Hi!

GCC ICEs on the first testcase.  A successful match_uaddc_usubc ends up
with some dead stmts which DCE will (hopefully) remove later.  The ICE is
because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that we
don't really have a replacement for those cases - all we have after a match
is the combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try to
match
  x = op1 + op2 + carry1 + carry2;
or
  x = op1 - op2 - carry1 - carry2;
we just gsi_replace the final stmt, but leave the 2 temporary stmts around
as dead; if we were unlucky enough that those referenced the carry flag
that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one; we supported e.g.
  x = ((op1 + op2) + op3) + op4;
  x = op1 + ((op2 + op3) + op4);
but not
  x = (op1 + (op2 + op3)) + op4;
  x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what we
were searching for (if non-NULL) - rhs[0] is inspected in the first loop
and has different handling for the MINUS_EXPR case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-18  Jakub Jelinek

	PR tree-optimization/111845
	* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
	statements for the 4 operand addition or subtraction of 3 operands
	from 1 operand cases and remove them when successful.  Look for
	nested additions even from rhs[2], not just rhs[1].

	* gcc.dg/pr111845.c: New test.
	* gcc.target/i386/pr111845.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj	2023-09-18 10:37:56.180963000 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-10-17 14:46:39.430139602 +0200
@@ -4581,6 +4581,7 @@ match_uaddc_usubc (gimple_stmt_iterator
   if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
     return false;
 
+  auto_vec temp_stmts;
   if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
     {
       /* If overflow flag is ignored on the MSB limb, we can end up with
@@ -4615,26 +4616,29 @@ match_uaddc_usubc (gimple_stmt_iterator
	      rhs[0] = gimple_assign_rhs1 (g);
	      tree &r = rhs[2] ? rhs[3] : rhs[2];
	      r = r2;
+	      temp_stmts.quick_push (g);
	    }
	  else
	    break;
	}
-  while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
-    {
-      gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
-      if (has_single_use (rhs[1])
-	  && is_gimple_assign (g)
-	  && gimple_assign_rhs_code (g) == PLUS_EXPR)
-	{
-	  rhs[1] = gimple_assign_rhs1 (g);
-	  if (rhs[2])
-	    rhs[3] = gimple_assign_rhs2 (g);
-	  else
-	    rhs[2] = gimple_assign_rhs2 (g);
-	}
-      else
-	break;
-    }
+  for (int i = 1; i <= 2; ++i)
+    while (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME && !rhs[3])
+      {
+	gimple *g = SSA_NAME_DEF_STMT (rhs[i]);
+	if (has_single_use (rhs[i])
+	    && is_gimple_assign (g)
+	    && gimple_assign_rhs_code (g) == PLUS_EXPR)
+	  {
+	    rhs[i] = gimple_assign_rhs1 (g);
+	    if (rhs[2])
+	      rhs[3] = gimple_assign_rhs2 (g);
+	    else
+	      rhs[2] = gimple_assign_rhs2 (g);
+	    temp_stmts.quick_push (g);
+	  }
+	else
+	  break;
+      }
   /* If there are just 3 addends or one minuend and two subtrahends,
      check for UADDC or USUBC being pattern recognized earlier.
      Say r = op1 + op2 + ovf1 + ovf2; where the (ovf1 + ovf2) part
@@ -5039,7 +5043,17 @@ match_uaddc_usubc (gimple_stmt_iterator
   g = gimple_build_assign (ilhs, IMAGPART_EXPR,
			    build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
   if (rhs[2])
-    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+    {
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      /* Remove some further statements which can't be kept in the IL
	 because they can use SSA_NAMEs whose setter is going to be
	 removed too.  */
+      while (temp_stmts.length ())
+	{
+	  g = temp_stmts.pop ();
+	  gsi2 = gsi_for_stmt (g);
+	  gsi_remove (&gsi2, true);
+
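[A hedged illustration, not the PR testcase, of the most-significant-limb
shape discussed above: the final limb adds two operands plus the two
carries from the lower limbs, and the patch makes all reassociations of
those additions match, e.g. this previously unhandled nesting:]

unsigned long
msb_limb (unsigned long op1, unsigned long op2,
	  unsigned long carry1, unsigned long carry2)
{
  /* Corresponds to x = op1 + (op2 + (op3 + op4)); in the list above.  */
  return op1 + (op2 + (carry1 + carry2));
}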
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Sun, Oct 15, 2023 at 12:43:10PM +0100, Richard Sandiford wrote: > It seemed like there was considerable support for bumping the minimum > to beyond 4.8. I think we should wait until a decision has been made > before adding more 4.8 workarounds. I think adding a workaround until that decision is made and perhaps removing it afterwards will make life easier for people still using gcc 4.8. > Having a conditional explicit constructor is dangerous because it changes > semantics. E.g. consider: > > #include > > union u { int x; }; > void f(u *ptr) { new(ptr) u; } > void g(u *ptr) { new(ptr) u(); } > > g(ptr) zeros ptr->x whereas f(ptr) doesn't. If we add "u() {}" then g() > does not zero ptr->x. > > So if we did add the workaround, it would need to be unconditional, > like you say. What about using more directed workaround then? Like (just stage1 build tested, perhaps with comment why we do that) below? Seems at least in stage1 it is the only problematic spot. --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -4951,8 +4951,14 @@ cse_insn (rtx_insn *insn) && is_a (mode, &int_mode) && (extend_op = load_extend_op (int_mode)) != UNKNOWN) { +#if GCC_VERSION >= 5000 struct rtx_def memory_extend_buf; rtx memory_extend_rtx = &memory_extend_buf; +#else + alignas (alignof (rtx_def)) unsigned char + memory_extended_buf[sizeof (rtx_def)]; + rtx memory_extend_rtx = (rtx) &memory_extended_buf[0]; +#endif /* Set what we are trying to extend and the operation it might have been extended with. */ Jakub
Re: [Patch] OpenMP: Avoid ICE with LTO and 'omp allocate (was: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars)
On Wed, Oct 18, 2023 at 12:56:01PM +0200, Tobias Burnus wrote:
> On 18.10.23 11:36, Jakub Jelinek wrote:
> > On Wed, Oct 18, 2023 at 11:12:44AM +0200, Thomas Schwinge wrote:
> > > +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (internal compiler
> > > error: tree code 'statement_list' is not supported in LTO streams)
> > Any references to GENERIC code in clauses etc. should have been
> > gimplified or cleared during gimplification, we shouldn't support
> > STATEMENT_LIST in LTO streaming.
>
> We only needed the statement list as an aid during the gimplify.cc
> handling of GOMP_alloc/GOMP_free for Fortran.  How about just
> remove_attribute it in that case?  As discussed, as DECL_ATTRIBUTE gets
> added from the left to the DECL_CHAIN, there shouldn't be a problem of
> introducing shared trees; note that 'omp allocate' itself is added per
> DECL, i.e. it does not introduce sharing itself, either.
>
> Tested with x86-64-gnu-linux.
>
> Tobias
>
> OpenMP: Avoid ICE with LTO and 'omp allocate'
>
> gcc/ChangeLog:
>
> 	* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
> 	to avoid that the auxiliary statement list reaches LTO.
>
> gcc/testsuite/ChangeLog:
>
> 	* gfortran.dg/gomp/allocate-13a.f90: New test.

LGTM.

	Jakub
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Wed, Oct 18, 2023 at 11:23:49AM +0100, Richard Sandiford wrote:
> > --- a/gcc/cse.cc
> > +++ b/gcc/cse.cc
> > @@ -4951,8 +4951,14 @@ cse_insn (rtx_insn *insn)
> >	  && is_a (mode, &int_mode)
> >	  && (extend_op = load_extend_op (int_mode)) != UNKNOWN)
> >	{
> > +#if GCC_VERSION >= 5000
> >	  struct rtx_def memory_extend_buf;
> >	  rtx memory_extend_rtx = &memory_extend_buf;
> > +#else
> > +	  alignas (alignof (rtx_def)) unsigned char
> > +	    memory_extended_buf[sizeof (rtx_def)];
>
> Looks like the simpler "alignas (rtx_def)" should work.

It does.

> LGTM otherwise FWIW.

Here is what I'm bootstrapping/regtesting on gcc112 now (i.e. with 4.8.5 as
the system compiler), with added details about which bug we are working
around.  The reduced testcase on which I've bisected it is:

struct poly_int {
  poly_int() = default;
  template poly_int(const T &...) {}
};
union rtunion {
  poly_int rt_subregrt_rtx;
};
struct rtx_def {
  rtunion fld;
};
void cse_insn() { rtx_def memory_extend_buf; }

or even with just the template poly_int(); line in there.  Bet gcc 4.8/4.9
was unhappy about the template variadic ctor accepting an empty pack and
being like the default ctor but not defaulted in that case.  Making it
guaranteed that it has at least one argument, say through
  template poly_int(const U &, const T &...) {}
fixes it for 4.8/4.9 as well.

2023-10-18  Jakub Jelinek

	PR bootstrap/111852
	* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
	using rtx_def type for memory_extend_buf, use unsigned char
	array with size of rtx_def and its alignment.

--- gcc/cse.cc.jj	2023-06-20 08:57:38.339505245 +0200
+++ gcc/cse.cc	2023-10-18 13:20:30.555836778 +0200
@@ -4951,8 +4951,15 @@ cse_insn (rtx_insn *insn)
	  && is_a (mode, &int_mode)
	  && (extend_op = load_extend_op (int_mode)) != UNKNOWN)
	{
+#if GCC_VERSION >= 5000
	  struct rtx_def memory_extend_buf;
	  rtx memory_extend_rtx = &memory_extend_buf;
+#else
+	  /* Workaround GCC < 5 bug, fixed in r5-3834 as part of PR63362
+	     fix.  */
+	  alignas (rtx_def) unsigned char
+	    memory_extended_buf[sizeof (rtx_def)];
+	  rtx memory_extend_rtx = (rtx) &memory_extended_buf[0];
+#endif

	  /* Set what we are trying to extend and the operation it might
	     have been extended with.  */

	Jakub
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Wed, Oct 18, 2023 at 01:33:40PM +0200, Jakub Jelinek wrote: > Making it guaranteed that it has at least one argument say through > template poly_int(const U &, const T &...) {} > fixes it for 4.8/4.9 as well. So, perhaps (but so far totally untested, the other bootstrap is still running): 2023-10-18 Jakub Jelinek * poly-int.h (poly_int::poly_int): Ensure the const Cs &... argument ctor has at least one argument. --- gcc/poly-int.h.jj 2023-10-13 19:34:44.112832389 +0200 +++ gcc/poly-int.h 2023-10-18 13:49:29.038751482 +0200 @@ -379,8 +379,8 @@ public: template poly_int (const poly_int &); - template - constexpr poly_int (const Cs &...); + template + constexpr poly_int (const C0 &, const Cs &...); poly_int &operator = (const poly_int &) = default; @@ -446,11 +446,11 @@ poly_int::poly_int (const poly_int } template -template +template inline constexpr -poly_int::poly_int (const Cs &... cs) +poly_int::poly_int (const C0 &c0, const Cs &... cs) : poly_int (typename poly_int_fullness= N>::type (), - cs...) {} + c0, cs...) {} /* Initialize with c0, cs..., and some trailing zeros. */ template Jakub
Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]
On Wed, Oct 18, 2023 at 05:28:10PM +, waffl3x wrote: > I've seen plenty of these G_ or _ macros on strings around like in > grokfndecl for these errors. > > G_("static member function %qD cannot have cv-qualifier") > G_("non-member function %qD cannot have cv-qualifier") > > G_("static member function %qD cannot have ref-qualifier") > G_("non-member function %qD cannot have ref-qualifier") > > I have been able to figure out it relates to translation, but not > exactly what the protocol around them is. I think in my original patch > I had refactored this code a bunch, I figured adding a 3rd case to it > justifies a refactor. I think I forgot to add those changes to the > original patch, either that or I undid it or moved it somewhere else. > Anyway, the point is, coming back to it now to re-add those diagnostics > I realized I probably shouldn't have changed those strings. > > I also have been wondering whether I should be putting macros on any > strings I add, it seemed like there might have been a macro for text > that needs translation. Is this something I should be doing? There are different kinds of format strings in GCC, the most common are the gcc-internal-format strings. If you call a function which is expected to take such translatable format string (in particular a function which takes a gmsgid named argument like error, error_at, pedwarn, warning_at, ...) and pass a string literal to that function, nothing needs to be marked in a special way, both gcc/po/exgettext is able to collect such literals into gcc/po/gcc.pot for translations and the function is supposed to use gettext etc. to translate it - e.g. see diagnostic_set_info using _(gmsgid) for that. But, if there is e.g. a temporary pointer var which points to format strings and only that is eventually passed to the diagnostic functions, gcc/po/exgettext won't be able to collect such literals, which is where the G_() macro comes into play and one marks the string as gcc-internal-format with it; the translation is still handled by the diagnostic function at runtime. The N_() macro is similar but for c-format strings instead. The _() macro both collects for translations if it is used with string literal, and expands to gettext call to translate it at runtime, which is something that should be avoided if something translates it again. And another i18n rule is that one shouldn't try to construct diagnostic messages from parts of english sentences, it is fine to fill in with %s/%qs etc. language keywords etc. but otherwise the format string should contain the whole diagnostic line, so that translators can reorder the words etc. Jakub
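[For a concrete picture of the two cases Jakub describes, a hedged sketch;
the strings are the grokfndecl ones quoted above, the surrounding code is
hypothetical:]

/* Literal passed directly to a diagnostic function: no markup needed,
   gcc/po/exgettext finds it and the function translates at runtime.  */
error ("static member function %qD cannot have cv-qualifier", decl);

/* Format string goes through a variable first: mark the literals with
   G_() so exgettext can still collect them; translation still happens
   later inside the diagnostic machinery.  */
const char *msg
  = staticp ? G_("static member function %qD cannot have cv-qualifier")
	    : G_("non-member function %qD cannot have cv-qualifier");
error (msg, decl);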
Re: gcc 13.2 is missing warnings?
On Thu, Oct 19, 2023 at 07:39:43AM -0400, Eric Sokolowsky via Gcc wrote:
> I am using gcc 13.2 on Fedora 38.  Consider the following program.
>
> #include <stdio.h>
> int main(int argc, char **argv)
> {
>     printf("Enter a number: ");
>     int num = 0;
>     scanf("%d", &num);
>
>     switch (num)
>     {
>     case 1:
>         int a = num + 3;
>         printf("The new number is %d.\n", a);
>         break;
>     case 2:
>         int b = num - 4;
>         printf("The new number is %d.\n", b);
>         break;
>     default:
>         int c = num * 3;
>         printf("The new number is %d.\n", c);
>         break;
>     }
> }
>
> I would expect that gcc would complain about the declaration of
> variables (a, b, and c) within the case statements.  When I run "gcc
> -Wall t.c" I get no warnings.  When I run "g++ -Wall t.c" I get
> warnings and errors as expected.  I do get warnings when using MinGW on
> Windows (gcc version 6.3 specifically).  Did something change in 13.2?

C isn't C++.  In particular, the above is valid C23, which is why it is
accepted as an extension in older C language versions starting with GCC 11.
It is warned about with -pedantic/-Wpedantic and errored on with
-pedantic-errors/-Werror=pedantic unless -std=c2x or -std=gnu2x is used.

The C++ case is completely different.  There, labels are allowed before
declarations already in C++98, but it is invalid to cross the
initialization of some variable using the jump to the case 2 or default
labels above.  If you rewrite it as:

    case 1:
        int a;
        a = num + 3;
        printf("The new number is %d.\n", a);
        break;
    case 2:
        int b;
        b = num - 4;
        printf("The new number is %d.\n", b);
        break;
    default:
        int c;
        c = num * 3;
        printf("The new number is %d.\n", c);
        break;

it is valid C++ and it won't be diagnosed.

Note, this should have been posted to gcc-help instead.

	Jakub
Re: [PATCH] ABOUT-GCC-NLS: add usage guidance
On Thu, Oct 19, 2023 at 11:11:30AM -0400, Jason Merrill wrote:
> A recent question led me to look at this file again, and it occurred to
> me that it could use to offer more guidance.  OK for trunk?
>
> -- 8< --
>
> gcc/ChangeLog:
>
> 	* ABOUT-GCC-NLS: Add usage guidance.
> ---
>  gcc/ABOUT-GCC-NLS | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/gcc/ABOUT-GCC-NLS b/gcc/ABOUT-GCC-NLS
> index e90a67144e3..4c8b94d0811 100644
> --- a/gcc/ABOUT-GCC-NLS
> +++ b/gcc/ABOUT-GCC-NLS
> @@ -23,6 +23,19 @@ For example, GCC source code should not contain calls like `error
>  ("unterminated comment")' instead, as it is the `error' function's
>  responsibility to translate the message before the user sees it.
>
> +In general, use no markup for strings that are the immediate format string
> +argument of a diagnostic function.  Use G_("str") for strings that will be
> +used as the format string for a diagnostic but are e.g. assigned to a
> +variable first.  Use N_("str") for other strings, particularly in a
> +statically allocated array, that will be translated later by
> +e.g. _(msgs[idx]).  Use _("str") for strings that will not be translated

The difference between G_ and N_ is whether they are GCC internal
diagnostic format strings or printf family format strings
(gcc-internal-format vs. c-format in po/gcc.pot), not anything else, so G_
can be used in a statically allocated array as well, both for the
diagnostic routines and for the printf* family; when translation is
desirable, the string is eventually translated through _() or other
gettext invocations, either inside of diagnostics.cc or elsewhere.
So, the note about statically allocated arrays should move to G_ as well,
and N_ should be described similarly, but for strings which after
translation are passed to printf family functions or so.  And the
"translated later by e.g." note should again apply to both, saying that
for the GCC diagnostic routines it is done automatically.

	Jakub
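[In code, the corrected guidance would look like this hedged sketch; array
names are hypothetical:]

/* G_(): gcc-internal-format strings, fine in a static array too; the
   diagnostic routines translate them automatically.  */
static const char *const diag_msgs[]
  = { G_("jump to label %qD"), G_("jump to case label") };
error (diag_msgs[idx], decl);

/* N_(): c-format strings handed to the printf family; translate
   explicitly with _() at the point of use.  */
static const char *const plain_msgs[] = { N_("candidate %d: %s\n") };
printf (_(plain_msgs[0]), i, name);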
Re: [pushed] c++: use G_ instead of _
On Thu, Oct 19, 2023 at 11:31:58AM -0400, Jason Merrill wrote: > --- a/gcc/cp/decl.cc > +++ b/gcc/cp/decl.cc > @@ -3607,8 +3607,8 @@ identify_goto (tree decl, location_t loc, const > location_t *locus, > { >bool complained > = emit_diagnostic (diag_kind, loc, 0, > -decl ? N_("jump to label %qD") > -: N_("jump to case label"), decl); > +decl ? G_("jump to label %qD") N_ for this is wrong because gettext will then not properly verify translators didn't screw things up by using some incompatible format string in the translation. I believe some translations e.g. changed %s to %S. And that seems to be still the case: grep -B3 '[^%]%S' po/*.po po/sr.po-#, fuzzy, gcc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Multiple %qs modifiers specified at %C" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Duplicate %s attribute specified at %L" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Duplicate BIND attribute specified at %L" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/tr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/tr.po-#| msgid "%s statement must appear in a MODULE" po/tr.po-msgid "%s statement must appear in a MODULE" po/tr.po:msgstr "%S deyimi bir MODULE'de görünmemeli" > +: G_("jump to case label"), decl); While in this case G_ is better just for consistency, N_ would work exactly the same given that there are no format strings. Jakub
Re: [PATCH RFA] diagnostic: rename new permerror overloads
On Thu, Oct 19, 2023 at 11:45:01AM -0400, Jason Merrill wrote: > OK for trunk? > > -- 8< -- > > While checking another change, I noticed that the new permerror overloads > break gettext with "permerror used incompatibly as both > --keyword=permerror:2 --flag=permerror:2:gcc-internal-format and > --keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's > change the name. > > gcc/ChangeLog: > > * diagnostic-core.h (permerror): Rename new overloads... > (permerror_opt): To this. > * diagnostic.cc: Likewise. > > gcc/cp/ChangeLog: > > * typeck2.cc (check_narrowing): Adjust. Ok. Jakub
Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c
On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
> Given the array has size i + 1 its upper bound should be 'i' and 'i'
> should be available via DW_OP_[GNU_]entry_value.
>
> I see it is
>
> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31 1c
> (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl; DW_OP_const1u: 32;
> DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>
> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
> storage itself doesn't have a location but the type specifies the size.
>
> (gdb) ptype a
> type = char [variable length]
> (gdb) p sizeof(a)
> $3 = 0
>
> this looks like a gdb bug to me?
>
> Btw, the location expression looks odd, if I decipher it correctly
> we end up with ((%rdi << 32) >> 32) - 1 which computes to 4
> but the upper bound should be 5.  The GIMPLE debug stmts compute
> the upper bound as (sizetype)((long)(i_1(D) + 1) - 1)

The << 32 >> 32 is sign extension.  And yes, for f1 I don't see why
DW_OP_GNU_entry_value shouldn't work; i in main is needed for the call to
f2, so it needs to live in some register or memory in that function until
the second call.  For f2, i is needed after the bar call for the a[i + 4]
read, worst case in the form of a precomputed i + 4, but that is a
reversible op.

	Jakub
[C++ PATCH] Fix extern_decl_map handling (PR c++/3698, PR c++/86208)
Hi! This testcase got fixed in G++ 3.2, where we used for decision if inline function body should be kept TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (...)), but we now use cgraph and the testcase fails again; the particular problem is that we set TREE_USED only on the local extern decl and don't propagate it to the decl that extern_decl_map maps it into. Fixed by the following patch, which propagates it at genericization time. Bootstrapped/regtested on x86_64-linux and i686-linux. Another option is: + /* If decl is local extern, recurse into corresponding decl. */ + if (cfun + && cp_function_chain + && cp_function_chain->extern_decl_map + && VAR_OR_FUNCTION_DECL_P (decl) + && DECL_EXTERNAL (decl)) +{ + struct cxx_int_tree_map *h, in; + in.uid = DECL_UID (decl); + h = cp_function_chain->extern_decl_map->find_with_hash (&in, in.uid); + if (h) + TREE_USED (h->to) = 1; +} + in mark_used, another one: + /* If decl is local extern, recurse into corresponding decl. */ + if (cfun + && cp_function_chain + && cp_function_chain->extern_decl_map + && VAR_OR_FUNCTION_DECL_P (decl) + && DECL_EXTERNAL (decl)) +{ + struct cxx_int_tree_map *h, in; + in.uid = DECL_UID (decl); + h = cp_function_chain->extern_decl_map->find_with_hash (&in, in.uid); + if (h && !mark_used (h->to)) + return false; +} + in the same spot. None of these fix the PR82204 though. 2018-07-02 Jakub Jelinek PR c++/3698 PR c++/86208 * cp-gimplify.c (cp_genericize_r): When using extern_decl_map, or in TREE_USED flag from stmt to h->to. * g++.dg/opt/pr3698.C: New test. --- gcc/cp/cp-gimplify.c.jj 2018-06-20 08:15:28.980857357 +0200 +++ gcc/cp/cp-gimplify.c2018-07-02 18:03:00.714313555 +0200 @@ -1085,6 +1085,7 @@ cp_genericize_r (tree *stmt_p, int *walk if (h) { *stmt_p = h->to; + TREE_USED (h->to) |= TREE_USED (stmt); *walk_subtrees = 0; return NULL; } --- gcc/testsuite/g++.dg/opt/pr3698.C.jj2018-07-02 18:05:52.535479087 +0200 +++ gcc/testsuite/g++.dg/opt/pr3698.C 2018-07-02 18:05:44.507471531 +0200 @@ -0,0 +1,21 @@ +// PR c++/3698 +// { dg-do link } +// { dg-options "-O0" } + +struct X { + int i; +}; + +inline const int& +OHashKey (const X& x) +{ + return x.i; +} + +int +main () +{ + extern const int& OHashKey (const X& x); + X x; + return OHashKey (x); +} Jakub
[C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
Hi! While working on OpenMP range-for support, I ran into the weird thing that the C++ FE accepts for (auto &i : a) if (i != *__for_begin || &i == __for_end || &__for_range[0] != &a[0]) __builtin_abort (); outside of templates, but doesn't inside of templates. I think we shouldn't let people do this at any time, it will just lead to non-portable code, and when it works only outside of templates, it isn't a well defined extension. Now, we could just use create_tmp_var_name ("__for_range") instead of get_identifier ("__for_range") etc. and the symbols would be non-accessible to users (usually; if assembler doesn't support dots nor dollars in labels, it is still a symbol with just alphanumeric chars in it, but there is a hard to predict number suffix at least), or just NULL DECL_NAME. But my understanding is that the intent was that in the debugger one could use __for_range, __for_begin and __for_end to make it easier to debug range fors. In that case, the following patch solves it by using a name not accessible for the user when parsing the body (with space in it) and correcting the names when the var gets out of scope. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-02 Jakub Jelinek PR c++/85515 * parser.c (build_range_temp): Use "__for range" instead of "__for_range". (cp_convert_range_for): Use "__for begin" and "__for end" instead of "__for_begin" and "__for_end". * semantics.c (finish_for_stmt): Rename "__for {range,begin,end}" local symbols to "__for_{range,begin,end}". * g++.dg/pr85515-2.C: Add expected dg-error. * g++.dg/cpp0x/range-for36.C: New test. --- gcc/cp/parser.c.jj 2018-07-02 23:04:00.869370436 +0200 +++ gcc/cp/parser.c 2018-07-02 23:08:44.572618486 +0200 @@ -11953,7 +11953,7 @@ build_range_temp (tree range_expr) /* Create the __range variable. */ range_temp = build_decl (input_location, VAR_DECL, - get_identifier ("__for_range"), range_type); + get_identifier ("__for range"), range_type); TREE_USED (range_temp) = 1; DECL_ARTIFICIAL (range_temp) = 1; @@ -12061,7 +12061,7 @@ cp_convert_range_for (tree statement, tr /* The new for initialization statement. */ begin = build_decl (input_location, VAR_DECL, - get_identifier ("__for_begin"), iter_type); + get_identifier ("__for begin"), iter_type); TREE_USED (begin) = 1; DECL_ARTIFICIAL (begin) = 1; pushdecl (begin); @@ -12072,7 +12072,7 @@ cp_convert_range_for (tree statement, tr if (cxx_dialect >= cxx17) iter_type = cv_unqualified (TREE_TYPE (end_expr)); end = build_decl (input_location, VAR_DECL, - get_identifier ("__for_end"), iter_type); + get_identifier ("__for end"), iter_type); TREE_USED (end) = 1; DECL_ARTIFICIAL (end) = 1; pushdecl (end); --- gcc/cp/semantics.c.jj 2018-06-25 14:51:23.096989196 +0200 +++ gcc/cp/semantics.c 2018-07-02 23:23:40.784400542 +0200 @@ -1060,7 +1060,31 @@ finish_for_stmt (tree for_stmt) : &FOR_SCOPE (for_stmt)); tree scope = *scope_ptr; *scope_ptr = NULL; + + /* During parsing of the body, range for uses "__for {range,begin,end}" + decl names to make those unaccessible by code in the body. + Change it to ones with underscore instead of space, so that it can + be inspected in the debugger. 
*/ + tree range_for_decl[3] = { NULL_TREE, NULL_TREE, NULL_TREE }; + for (int i = 0; i < 3; i++) +{ + tree id + = get_identifier ("__for range\0__for begin\0__for end" + 12 * i); + if (IDENTIFIER_BINDING (id) + && IDENTIFIER_BINDING (id)->scope == current_binding_level) + { + range_for_decl[i] = IDENTIFIER_BINDING (id)->value; + gcc_assert (VAR_P (range_for_decl[i]) + && DECL_ARTIFICIAL (range_for_decl[i])); + } +} + add_stmt (do_poplevel (scope)); + + for (int i = 0; i < 3; i++) +if (range_for_decl[i]) + DECL_NAME (range_for_decl[i]) + = get_identifier ("__for_range\0__for_begin\0__for_end" + 12 * i); } /* Begin a range-for-statement. Returns a new RANGE_FOR_STMT. --- gcc/testsuite/g++.dg/pr85515-2.C.jj 2018-04-27 22:28:02.889462532 +0200 +++ gcc/testsuite/g++.dg/pr85515-2.C2018-07-02 23:33:50.638930364 +0200 @@ -15,8 +15,7 @@ int test_2 () int sum = 0; for (const auto v: arr) { sum += v; -// TODO: should we issue an error for the following assignment? -__for_begin = __for_end; +__for_begin = __for_end; // { dg-error "was not declared in this scope" } } return sum;
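[For reference, the three artificial variables come from the standard
range-for desugaring, roughly as in this simplified sketch; member/ADL
begin selection and lifetime details omitted:]

// for (auto &i : a) body;  behaves like:
{
  auto &&__for_range = a;                  // "__for range" while parsing
  auto __for_begin = begin (__for_range);  // "__for begin"
  auto __for_end = end (__for_range);      // "__for end"
  for (; __for_begin != __for_end; ++__for_begin)
    {
      auto &i = *__for_begin;
      // body
    }
}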
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote: > Can we make them DECL_ARTIFICIAL and/or make name-lookup never They are DECL_ARTIFICIAL already. > lookup DECL_ARTIFICIAL vars instead? Not sure about that, will try to gather some statistics on how often we rely on name-lookup of DECL_ARTIFICIALs. Jakub
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 11:58:31AM +0200, Richard Biener wrote: > On Tue, Jul 3, 2018 at 11:43 AM Jakub Jelinek wrote: > > > > On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote: > > > Can we make them DECL_ARTIFICIAL and/or make name-lookup never > > > > They are DECL_ARTIFICIAL already. > > > > > lookup DECL_ARTIFICIAL vars instead? > > > > Not sure about that, will try to gather some statistics on how often we > > rely on name-lookup of DECL_ARTIFICIALs. > > Hmm, we might indeed. At least we should make sure those > cases never have valid identifiers? Or is the implementation At least __FUNCTION__, __PRETTY_FUNCTION__, __func__ are all local VAR_DECLs with DECL_ARTIFICIAL that need to be found by name lookup (so far gathered stats just show __FUNCTION__ in lookup_name_real_1). Jakub
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 01:24:28PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 03, 2018 at 11:58:31AM +0200, Richard Biener wrote:
> > On Tue, Jul 3, 2018 at 11:43 AM Jakub Jelinek wrote:
> > > On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote:
> > > > Can we make them DECL_ARTIFICIAL and/or make name-lookup never
> > >
> > > They are DECL_ARTIFICIAL already.
> > >
> > > > lookup DECL_ARTIFICIAL vars instead?
> > >
> > > Not sure about that, will try to gather some statistics on how often we
> > > rely on name-lookup of DECL_ARTIFICIALs.
> >
> > Hmm, we might indeed.  At least we should make sure those
> > cases never have valid identifiers?  Or is the implementation
>
> At least __FUNCTION__, __PRETTY_FUNCTION__, __func__ are all local
> VAR_DECLs with DECL_ARTIFICIAL that need to be found by name lookup
> (so far gathered stats just show __FUNCTION__ in lookup_name_real_1).

And it isn't limited to that: omp_priv/omp_orig/omp_in/omp_out too (the
OpenMP UDR artificial vars), also variables captured in lambdas, anon
union fields, ...  So I'm afraid it is not possible to ignore
DECL_ARTIFICIAL VAR_DECLs in name lookup, and generally it is ok that they
have user accessible names.  Just the range for vars are a special case.

	Jakub
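[To make the name-lookup cases concrete, a hedged sketch of user-visible
DECL_ARTIFICIAL entities from the list above:]

void
f ()
{
  const char *who = __FUNCTION__;  // artificial local VAR_DECL
  union { int i; float fl; };      // anonymous union: i and fl are
  i = 1;                           // artificial, yet found by lookup
}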
Re: [PATCH] Fix bootstrap on ia64 with old GCC version.
On Tue, Jul 03, 2018 at 07:22:19PM +0200, Martin Liška wrote:
> In order to make GCC 4.1 happy and build current tip, we need to define
> static constants out of a class definition.
>
> Ready for trunk?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-07-03  Martin Liska
>
> 	* tree-switch-conversion.h (struct jump_table_cluster): Define
> 	constant values outside of class declaration.

That looks incorrect.  I don't see why 4.1 wouldn't allow the static const
data member initializers inside of the class.  You just need to define
those vars, and the definition (without the initializers) shouldn't go into
the header, but to a single .c file instead (I know right now there is just
one .c file that includes this header, but if we ever want to include it in
more than one, it would be a problem; if we never want to include it in
more than one, the question is why we have the header file at all).
So IMHO keep tree-switch-conversion.h unmodified and add:

const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_size;
const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_speed;

to tree-switch-conversion.c somewhere.

> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 4beac785f05..8efb125aff1 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -259,12 +259,17 @@ struct jump_table_cluster: public group_cluster
>    static bool is_enabled (void);
>
>    /* Max growth ratio for code that is optimized for size.  */
> -  static const unsigned HOST_WIDE_INT max_ratio_for_size = 3;
> +  static const unsigned HOST_WIDE_INT max_ratio_for_size;
>
>    /* Max growth ratio for code that is optimized for speed.  */
> -  static const unsigned HOST_WIDE_INT max_ratio_for_speed = 8;
> +  static const unsigned HOST_WIDE_INT max_ratio_for_speed;
>  };
>
> +const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_size = 3;
> +
> +const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_speed = 8;
> +
> +
>  /* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
>     comparisons.  "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
>     where CST and MINVAL are integer constants.  This is better than a series

	Jakub
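[A minimal example of the pre-C++17 rule being invoked here; names are
hypothetical: the in-class initializer is only a declaration, the
out-of-class definition without the initializer goes in exactly one
translation unit.]

// header:
struct jt
{
  static const unsigned max_ratio = 3;  // declaration + initializer
};

// exactly one .c file:
const unsigned jt::max_ratio;           // definition, no initializer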
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Tue, Jul 03, 2018 at 10:02:47PM +0100, Jonathan Wakely wrote:
> +#ifndef _GLIBCXX_BIT
> +#define _GLIBCXX_BIT 1
> +
> +#pragma GCC system_header
> +
> +#if __cplusplus >= 201402L
> +
> +#include
> +#include
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +  template
> +    constexpr _Tp
> +    __rotl(_Tp __x, unsigned int __s) noexcept
> +    {
> +      constexpr auto _Nd = numeric_limits<_Tp>::digits;
> +      const unsigned __sN = __s % _Nd;
> +      if (__sN)
> +        return (__x << __sN) | (__x >> (_Nd - __sN));

Wouldn't it be better to use some branchless pattern that GCC can also
optimize well, like:
      return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
(iff _Nd is always a power of two), or perhaps
      return (__x << __sN) | (__x >> ((-__sN) % _Nd));
which is going to be folded into the above one for power of two constants?

E.g. ia32intrin.h also uses:

/* 64bit rol */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rolq (unsigned long long __X, int __C)
{
  __C &= 63;
  return (__X << __C) | (__X >> (-__C & 63));
}

etc.

	Jakub
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote: > > Wouldn't it be better to use some branchless pattern that > > GCC can also optimize well, like: > > return (__x << __sN) | (__x >> ((-_sN) & (_Nd - 1))); > > (iff _Nd is always power of two), > > _Nd is 20 for one of the INT_N types on msp340, but we could have a > special case for the rare integer types with unusual sizes. > > > or perhaps > > return (__x << __sN) | (__x >> ((-_sN) % _Nd)); > > which is going to be folded into the above one for power of two constants? > > That looks good. Unfortunately it is not correct if _Nd is not power of two. E.g. for __sN 1, -1U % 20 is 15, not 19. So it would need to be return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); Unfortunately, our rotate pattern recognizer handles return (__x << __sN) | (__x >> ((-__sN) % _Nd)); or return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1))); but doesn't handle the _Nd - __sN case. Is this C++17+ only? Then perhaps if constexpr ((_Nd & (_Nd - 1)) == 0) return (__x << __sN) | (__x >> (-__sN & (_Nd - 1))); return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); Verify that on x86_64 for all the unsigned {char,short,int,long long} you actually get a mere rol? instruction with perhaps some register movement, but no masking, nor shifts etc. > > E.g. ia32intrin.h also uses: > > /* 64bit rol */ > > extern __inline unsigned long long > > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > > __rolq (unsigned long long __X, int __C) > > { > > __C &= 63; > > return (__X << __C) | (__X >> (-__C & 63)); > > } > > etc. > > Should we delegate to those intrinsics for x86, so that > __builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when > relevant? No, the pattern recognizers should handle (for power of two bitcounts) even the char/short cases. Those intrinsics predate the improvements in rotate pattern recognition. Jakub
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Wed, Jul 04, 2018 at 09:14:04AM +0100, Jonathan Wakely wrote: > > Unfortunately it is not correct if _Nd is not power of two. > > E.g. for __sN 1, -1U % 20 is 15, not 19. > > So it would need to be > > return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); > > Unfortunately, our rotate pattern recognizer handles > > return (__x << __sN) | (__x >> ((-__sN) % _Nd)); > > or > > return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1))); > > but doesn't handle the _Nd - __sN case. > > Is this C++17+ only? Then perhaps > > The std::rotr and std::rotl functions are C++2a only, but I've added > the __rotr and __rotl versions for our own internal use in C++14 and > later. > > In practice I have no internal use for rotr and rotl, so I could > remove the __rot[rl] forms. However, won't ((_Nd & (_Nd - 1)) optimize > to a constant even without if-constexpr? I'll check. It should, sure. Anyway, I guess my C tests didn't test properly what happens in your functions. We actually do optimize: unsigned long long foo (unsigned long long __x, unsigned int __s) { constexpr int _Nd = 64; unsigned int _sN = __s % _Nd; return (__x << _sN) | (__x >> ((_Nd - _sN) % _Nd)); } properly, just don't do it if it is: unsigned long long foo (unsigned long long __x, unsigned int __s) { int _Nd = 64; unsigned int _sN = __s % _Nd; return (__x << _sN) | (__x >> ((_Nd - _sN) % _Nd)); } Apparently, the (64 - x) & 63 optimization to -x & 63 is only done somewhere in the FEs and not in match.pd, which is something we should fix. But with constexpr _Nd you actually can use return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); unconditionally. Jakub
[PATCH] Move ((A & N) + B) & M -> (A + B) & M etc. optimization from fold-const.c to match.pd (PR tree-optimization/86401)
Hi! I've encountered this while testing the rotate patterns in discussion with Jonathan for the std::__rot{l,r}, in the rotate-9.c test without this patch f1 is optimized into a rotate, but f2 is not. Fixed by moving the optimization from fold-const.c to match.pd (well, leaving a helper in fold-const.c to avoid too much duplication). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-05 Jakub Jelinek PR tree-optimization/86401 * fold-const.c (fold_binary_loc) : Move the ((A & N) + B) & M -> (A + B) & M etc. optimization into ... (fold_bit_and_mask): ... here. New helper function for match.pd. * fold-const.h (fold_bit_and_mask): Declare. * match.pd (((A & N) + B) & M -> (A + B) & M): New optimization. * gcc.dg/tree-ssa/pr86401-1.c: New test. * gcc.dg/tree-ssa/pr86401-2.c: New test. * c-c++-common/rotate-9.c: New test. --- gcc/fold-const.c.jj 2018-06-26 09:05:23.196346433 +0200 +++ gcc/fold-const.c2018-07-04 12:47:59.139981801 +0200 @@ -10236,121 +10236,6 @@ fold_binary_loc (location_t loc, enum tr } } - /* For constants M and N, if M == (1LL << cst) - 1 && (N & M) == M, -((A & N) + B) & M -> (A + B) & M -Similarly if (N & M) == 0, -((A | N) + B) & M -> (A + B) & M -and for - instead of + (or unary - instead of +) -and/or ^ instead of |. -If B is constant and (B & M) == 0, fold into A & M. */ - if (TREE_CODE (arg1) == INTEGER_CST) - { - wi::tree_to_wide_ref cst1 = wi::to_wide (arg1); - if ((~cst1 != 0) && (cst1 & (cst1 + 1)) == 0 - && INTEGRAL_TYPE_P (TREE_TYPE (arg0)) - && (TREE_CODE (arg0) == PLUS_EXPR - || TREE_CODE (arg0) == MINUS_EXPR - || TREE_CODE (arg0) == NEGATE_EXPR) - && (TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0)) - || TREE_CODE (TREE_TYPE (arg0)) == INTEGER_TYPE)) - { - tree pmop[2]; - int which = 0; - wide_int cst0; - - /* Now we know that arg0 is (C + D) or (C - D) or --C and arg1 (M) is == (1LL << cst) - 1. -Store C into PMOP[0] and D into PMOP[1]. */ - pmop[0] = TREE_OPERAND (arg0, 0); - pmop[1] = NULL; - if (TREE_CODE (arg0) != NEGATE_EXPR) - { - pmop[1] = TREE_OPERAND (arg0, 1); - which = 1; - } - - if ((wi::max_value (TREE_TYPE (arg0)) & cst1) != cst1) - which = -1; - - for (; which >= 0; which--) - switch (TREE_CODE (pmop[which])) - { - case BIT_AND_EXPR: - case BIT_IOR_EXPR: - case BIT_XOR_EXPR: - if (TREE_CODE (TREE_OPERAND (pmop[which], 1)) - != INTEGER_CST) - break; - cst0 = wi::to_wide (TREE_OPERAND (pmop[which], 1)) & cst1; - if (TREE_CODE (pmop[which]) == BIT_AND_EXPR) - { - if (cst0 != cst1) - break; - } - else if (cst0 != 0) - break; - /* If C or D is of the form (A & N) where - (N & M) == M, or of the form (A | N) or - (A ^ N) where (N & M) == 0, replace it with A. */ - pmop[which] = TREE_OPERAND (pmop[which], 0); - break; - case INTEGER_CST: - /* If C or D is a N where (N & M) == 0, it can be - omitted (assumed 0). */ - if ((TREE_CODE (arg0) == PLUS_EXPR -|| (TREE_CODE (arg0) == MINUS_EXPR && which == 0)) - && (cst1 & wi::to_wide (pmop[which])) == 0) - pmop[which] = NULL; - break; - default: - break; - } - - /* Only build anything new if we optimized one or both arguments -above. */ - if (pmop[0] != TREE_OPERAND (arg0, 0) - || (TREE_CODE (arg0) != NEGATE_EXPR - && pmop[1] != TREE_OPERAND (arg0, 1))) - { - tree utype = TREE_TYPE (arg0); - if (! TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0))) - { - /* Perform the operations in a type that has defined -overflow behavior. */ - utype = unsigned_type_for (TREE_TYPE (arg0)); - if (pmop[0] != NULL)
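[The kind of difference this makes, as a hedged sketch in the spirit of the
new rotate-9.c test, not its literal contents:]

unsigned
f1 (unsigned x, unsigned y)
{
  return (x << (y & 31)) | (x >> ((-y) & 31));
}

unsigned
f2 (unsigned x, unsigned y)
{
  /* Needs the moved fold: since 32 & 31 == 0, the constant minuend can
     be dropped under the mask, so (32 - y) & 31 becomes (-y) & 31 and
     the rotate pattern recognizer then matches both functions.  */
  return (x << (y & 31)) | (x >> ((32 - y) & 31));
}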
Re: [PATCH][PR sanitizer/84250] Avoid global symbols collision when using both ASan and UBSan
On Wed, Jul 04, 2018 at 08:20:47PM +0300, Maxim Ostapenko wrote: > On 07/04/2018 05:45 AM, Jeff Law wrote: > > On 05/23/2018 11:15 AM, Maxim Ostapenko wrote: > >> as described in PR, when using both ASan and UBSan > >> (-fsanitize=address,undefined ), we have symbols collision for global > >> functions, like __sanitizer_set_report_path. This leads to fuzzy results > >> when printing reports into files e.g. for this test case: > >> > >> #include > >> int main(int argc, char **argv) { > >> __sanitizer_set_report_path("/tmp/sanitizer.txt"); > >> int i = 23; > >> i <<= 32; > >> int *array = new int[100]; > >> delete [] array; > >> return array[argc]; > >> } > >> > >> only ASan's report gets written to file; UBSan output goes to stderr. > >> > >> To resolve this issue we could use two approaches: > >> > >> 1) Use the same approach to that is implemented in Clang (UBSan embedded > >> to ASan). The only caveat here is that we need to link (unused) C++ part > >> of UBSan even in C programs when linking static ASan runtime. This > >> happens because GCC, as opposed to Clang, doesn't split C and C++ > >> runtimes for sanitizers. > >> > >> 2) Just add SANITIZER_INTERFACE_ATTRIBUTE to report_file global > >> variable. In this case all __sanitizer_set_report_path calls will set > >> the same report_file variable. IMHO this is a hacky way to fix the > >> issue, it's better to use the first option if possible. > >> > >> > >> The attached patch fixes the symbols collision by embedding UBSan into > >> ASan (variant 1), just like we do for LSan. > >> > >> > >> Regtested/bootstrapped on x86_64-unknown-linux-gnu, looks reasonable > >> enough for trunk? > >> > >> > >> -Maxim > >> > >> > >> pr84250-2.diff > >> > >> > >> gcc/ChangeLog: > >> > >> 2018-05-23 Maxim Ostapenko > >> > >>* config/gnu-user.h (LIBASAN_EARLY_SPEC): Pass -lstdc++ for static > >>libasan. > >>* gcc.c: Do not pass LIBUBSAN_SPEC if ASan is enabled with UBSan. > >> > >> libsanitizer/ChangeLog: > >> > >> 2018-05-23 Maxim Ostapenko > >> > >>* Makefile.am: Reorder libs. > >>* Makefile.in: Regenerate. > >>* asan/Makefile.am: Define DCAN_SANITIZE_UB=1, add dependancy from > >>libsanitizer_ubsan.la. > >>* asan/Makefile.in: Regenerate. > >>* ubsan/Makefile.am: Define new libsanitizer_ubsan.la library. > >>* ubsan/Makefile.in: Regenerate. > > You know this code better than anyone else working on GCC. My only > > concern would be the kernel builds with asan, but I suspect they're > > providing their own runtime anyway, so the libstdc++ caveat shouldn't apply. > > Yes, you are right, kernel provides its own runtime. > > > > > OK for the trunk. > > Ok, thanks, I'll apply the patch today (with fixed ChangeLog entry). This broke the c-c++-common/asan/pr59063-2.c test: FAIL: c-c++-common/asan/pr59063-2.c -O1 (test for excess errors) Excess errors: /usr/bin/ld: cannot find -lstdc++ While it could be fixed by tweaking asan-dg.exp, thinking about this, the 1) variant is actually not a good idea, it will not work properly anyway if you link one library with -fsanitize=undefined and another library with -fsanitize=address, the right solution is to make the two libraries coexist sanely, so I'd prefer 2) or if not exporting a variable, export an accessor function to get the address of the variable (or whole block of shared state in one object between the libraries). Yes, it means trying to get something accepted upstream, but anything else is an ugly hack. Jakub
Re: [PATCH] Fix several AVX512 intrinsic mask arguments.
On Thu, Jul 05, 2018 at 08:30:27PM +0300, Grazvydas Ignotas wrote:
> gcc/ChangeLog:
>
> 2018-07-05  Grazvydas Ignotas
>
> 	* config/i386/avx512bwintrin.h: (_mm512_mask_cmp_epi8_mask,
> 	_mm512_mask_cmp_epu8_mask): Fix mask arguments.

LGTM, but
1) I think it would be nice to add a runtime testcase that fails (on
   avx512bw hw) without this patch and succeeds with this patch (have some
   non-zero and zero bits in the high 32 bits of the mask and test that the
   result is correct)
2) there are other functions that have this bug, e.g.
   _mm_mask_cmp_epi8_mask, _mm256_mask_cmp_epi8_mask,
   _mm_mask_cmp_epu8_mask, _mm256_mask_cmp_epu8_mask in avx512vlbwintrin.h

Let's grep for all suspicious parts:

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask8.*__mmask\(16\|32\|64\)'

_mm512_mask_bitshuffle_epi64_mask (__mmask8 __M, __m512i __A, __m512i __B) { return (__mmask64) __builtin_ia32_vpshufbitqmb512_mask ((__v64qi) __A, (__v64qi) __B, (__mmask64) __M); }
_mm_mask_cmp_epi8_mask (__mmask8 __U, __m128i __X, __m128i __Y, const int __P) { return (__mmask16) __builtin_ia32_cmpb128_mask ((__v16qi) __X, (__v16qi) __Y, __P, (__mmask16) __U); }
_mm_mask_cmp_epu8_mask (__mmask8 __U, __m128i __X, __m128i __Y, const int __P) { return (__mmask16) __builtin_ia32_ucmpb128_mask ((__v16qi) __X, (__v16qi) __Y, __P, (__mmask16) __U); }

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask16.*__mmask\(8\|32\|64\)'

_mm512_mask_xor_epi64 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) { return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A, (__v8di) __B, (__v8di) __W, (__mmask8) __U); }
_mm512_maskz_xor_epi64 (__mmask16 __U, __m512i __A, __m512i __B) { return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A, (__v8di) __B, (__v8di) _mm512_setzero_si512 (), (__mmask8) __U); }
_mm512_mask_cmpneq_epi64_mask (__mmask16 __M, __m512i __X, __m512i __Y) { return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, (__v8di) __Y, 4, (__mmask8) __M); }
_mm256_mask_cmp_epi8_mask (__mmask16 __U, __m256i __X, __m256i __Y, const int __P) { return (__mmask32) __builtin_ia32_cmpb256_mask ((__v32qi) __X, (__v32qi) __Y, __P, (__mmask32) __U); }
_mm256_mask_cmp_epu8_mask (__mmask16 __U, __m256i __X, __m256i __Y, const int __P) { return (__mmask32) __builtin_ia32_ucmpb256_mask ((__v32qi) __X, (__v32qi) __Y, __P, (__mmask32) __U); }
_mm_mask_add_ps (__m128 __W, __mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_addps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) __W, (__mmask8) __U); }
_mm_maskz_add_ps (__mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_addps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }
_mm256_mask_add_ps (__m256 __W, __mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_addps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) __W, (__mmask8) __U); }
_mm256_maskz_add_ps (__mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_addps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm_mask_sub_ps (__m128 __W, __mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_subps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) __W, (__mmask8) __U); }
_mm_maskz_sub_ps (__mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_subps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }
_mm256_mask_sub_ps (__m256 __W, __mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_subps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) __W, (__mmask8) __U); }
_mm256_maskz_sub_ps (__mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_subps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm256_maskz_cvtepi32_ps (__mmask16 __U, __m256i __A) { return (__m256) __builtin_ia32_cvtdq2ps256_mask ((__v8si) __A, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm_maskz_cvtepi32_ps (__mmask16 __U, __m128i __A) { return (__m128) __builtin_ia32_cvtdq2ps128_mask ((__v4si) __A, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask32.*__mmask\(8\|16\|64\)'

_mm512_mask_cmp_epi8_mask (__mmask32 __U, __m512i __X, __m512i __Y, const int __P) { return (__mmask64) __builtin_ia32_cmpb512_mask ((__v64qi) __X, (__v64qi) __Y, __P, (__mmask64) __U); }
_mm512_mask_cmp_epu8_mask (__mmask32 __U, __m512i __X, __m512i __Y, const int __P) { return (__mmask64) __builtin_ia32_ucmpb512_mask ((__v64qi) __X, (__v64qi) __Y, __P, (__mmask64) __U); }
_mm256_mask_cvtepi8_epi16 (__m256i __W, __mmask32 __U, __m128i __A) { return (__m256i) __builtin_ia32_pmovsxbw256_mask ((__v16qi) __A, (__v16hi) __W, (__mmask16) __U); }
_mm_
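For 1), a minimal sketch of the runtime test I have in mind (assumes avx512bw hw and -mavx512bw; with the current __mmask32 argument type the upper 32 bits of the mask are silently dropped at the implicit conversion, so the masked compare returns the wrong result):

#include <immintrin.h>

int
main (void)
{
  __m512i x = _mm512_set1_epi8 (1);
  /* Zero bits in the low half, both zero and non-zero bits in the
     high 32 bits of the mask.  */
  __mmask64 m = 0xdeadbeef00000000ULL;
  /* All 64 byte elements compare equal, so the result must be exactly
     the input mask.  */
  __mmask64 r = _mm512_mask_cmp_epi8_mask (m, x, x, 0 /* eq */);
  if (r != m)
    __builtin_abort ();
  return 0;
}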
Re: [PATCH] Fix several AVX512 intrinsic mask arguments.
On Thu, Jul 05, 2018 at 11:57:26PM +0300, Grazvydas Ignotas wrote:
> I think it would be more efficient if you took care of it. I won't
> have time for at least a few days anyway.

Here is what I have ATM, but will still need to work on the testsuite side for it and bootstrap/regtest it. In addition to the checks I've posted I've also done:

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' > /tmp/11
echo `sed -n '/^#define[ \t]_mm/,/)$/p' config/i386/*.h | sed 's/)$/@@@/' | sed 's/\\$//'` | sed 's/@@@/)\n/g' | grep __mmask >> /tmp/11
for i in `grep '__builtin.*_UQI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_UHI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_USI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_UDI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_QI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_HI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_SI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_DI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_UQI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_UHI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_USI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_UDI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_QI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_HI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_SI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_DI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done

and finally manual checking (could be automated too) of:

for i in `grep '__builtin.*_INT)' config/i386/i386-builtin.def | grep -v '_U\?[QHSD]I_INT)' | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done

For this last one, it is about trying to verify what kind of '__v\(2\|4\|8\|16\|32\|64\)[qhsd][if]' is used with the different __mmask, 2/4/8 should be used with __mmask8, 16 with __mmask16, 32 with __mmask32 and 64 with __mmask64.

Some fixes in the patch are mostly for consistency and harmless for code generation (e.g. when argument should have been __mmask8 but was __mmask16), but several changes should fix wrong-code bugs.

--- gcc/config/i386/avx512bwintrin.h.jj	2018-01-03 10:20:06.699535804 +0100
+++ gcc/config/i386/avx512bwintrin.h	2018-07-06 09:53:44.657235040 +0200
@@ -3043,7 +3043,7 @@ _mm512_cmp_epi16_mask (__m512i __X, __m5
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi8_mask (__mmask32 __U, __m512i __X, __m512i __Y,
+_mm512_mask_cmp_epi8_mask (__mmask64 __U, __m512i __X, __m512i __Y,
 			   const int __P)
 {
   return (__mmask64) __builtin_ia32_cmpb512_mask ((__v64qi) __X,
@@ -3081,7 +3081,7 @@ _mm512_cmp_epu16_mask (__m512i __X, __m5
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu8_mask (__mmask32 __U, __m512i __X, __m512i __Y,
+_mm512_mask_cmp_epu8_mask (__mmask64 __U, __m512i __X, __m512i __Y,
 			   const int __P)
 {
   return (__mmask64) __builtin_ia32_ucmpb512_mask ((__v64qi) __X,
--- gcc/config/i386/avx512bitalgintrin.h.jj
Re: [committed][gcc][patch] Require sse for testcase on i686.
On Fri, Jul 06, 2018 at 11:46:43AM +0100, Tamar Christina wrote:
> This fixes an ABI warning generated on i686-pc-linux-gnu when using
> `vector_size` with no sse enabled explicitly.
>
> Regtested single test on x86_64-pc-linux-gnu with -m32 and no issues.
>
> Committed under the GCC obvious rule.

That is insufficient, I get the FAIL on i686-linux.
You don't really need dg-require-effective-target sse; it is a compile-time-only (dg-do compile) test and on i?86/x86_64 failures with old assemblers would show up only when assembling.
But -msse really should be used on all i?86-*-* and x86_64-*-*.
You'd normally get the -msse etc. automatically, but dg-options is undesirable in gcc.dg/vect/ tests because all the predefined options are lost that way; dg-additional-options should be used instead (and you can use there -O2 -fno-tree-vectorize or whatever you want).
So, if you have spare cycles, please test whether the test still FAILs on arm with your patch reverted after such test changes.

> gcc/testsuite/
> 2018-07-06  Tamar Christina
>
> 	PR target/84711
> 	* gcc.dg/vect/pr84711.c: Add -msse for i686 targets.

In the meantime, I've committed the following fix as obvious:

2018-07-07  Jakub Jelinek

	PR target/84711
	* gcc.dg/vect/pr84711.c: Remove unnecessary sse
	dg-require-effective-target.  Add -msse not just on i386-*, but on
	all i?86-* and x86_64-*.

--- gcc/testsuite/gcc.dg/vect/pr84711.c.jj	2018-07-06 23:35:44.952791972 +0200
+++ gcc/testsuite/gcc.dg/vect/pr84711.c	2018-07-07 09:43:27.068785902 +0200
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target sse { target i386*-*-* } } */
 /* { dg-options "-O2" } */
-/* { dg-additional-options "-msse" { target i386*-*-* } } */
+/* { dg-additional-options "-msse" { target i?86-*-* x86_64-*-* } } */
 
 typedef int v4si __attribute__ ((vector_size (16)));

	Jakub
[PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)
Hi!

So, apparently I've misread when exceptions are raised by nextafter/nexttoward (and errno set).

	if(((ix>=0x7ff00000)&&((ix-0x7ff00000)|lx)!=0) ||   /* x is nan */
	   ((iy>=0x7ff00000)&&((iy-0x7ff00000)|ly)!=0))     /* y is nan */
	   return x+y;

I believe the above only raises an exception if either operand is sNaN and thus we should handle it:

  if (REAL_VALUE_ISSIGNALING_NAN (*arg0)
      || REAL_VALUE_ISSIGNALING_NAN (*arg1))
    return false;

Then:

	if(x==y) return y;		/* x=y, return y */

If arguments are equal, no exception is raised even if it is denormal, we handle this with:

  /* If x == y, return y cast to target type.  */
  if (cmp == 0)
    {
      real_convert (r, fmt, y);
      return false;
    }

Next:

	if((ix|lx)==0) {			/* x == 0 */
	    double u;
	    INSERT_WORDS(x,hy&0x80000000,1);	/* return +-minsubnormal */
	    u = math_opt_barrier (x);
	    u = u*u;
	    math_force_eval (u);		/* raise underflow flag */
	    return x;
	}

going from zero to +/- ulp should only raise underflow, but not set errno, handled with:

  /* Similarly for nextafter (0, 1) raising underflow.  */
  else if (flag_trapping_math
	   && arg0->cl == rvc_zero
	   && result->cl != rvc_zero)
    return false;

Then overflow:

	hy = hx&0x7ff00000;
	if(hy>=0x7ff00000) {
	    double u = x+x;		/* overflow  */
	    math_force_eval (u);
	    __set_errno (ERANGE);
	}

should be handled with:

  if (REAL_EXP (r) > fmt->emax)
    {
      get_inf (r, x->sign);
      return true;
    }

And finally there is:

	if(hy<0x00100000) {
	    double u = x*x;		/* underflow */
	    math_force_eval (u);	/* raise underflow flag */
	    __set_errno (ERANGE);
	}

which I've misread as raising an exception and setting errno only if the result is 0 and the first arg isn't:

  return r->cl == rvc_zero;

but actually it is true also for all denormal returns except for the x == y case, so the following patch uses:

  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;

instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-07-07  Jakub Jelinek

	PR c/86420
	* real.c (real_nextafter): Return true if result is denormal.

	* gcc.dg/nextafter-1.c (TEST): Adjust the tests that expect denormals
	to be returned and when first argument is not 0, so that they don't
	do anything for NEED_EXC or NEED_ERRNO.

--- gcc/real.c.jj	2018-05-06 23:12:49.211619736 +0200
+++ gcc/real.c	2018-07-06 18:42:44.761026632 +0200
@@ -5141,7 +5141,7 @@ real_nextafter (REAL_VALUE_TYPE *r, form
       get_zero (r, x->sign);
       return true;
     }
-  return r->cl == rvc_zero;
+  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;
 }
 
 /* Write into BUF the maximum representable finite floating-point
--- gcc/testsuite/gcc.dg/nextafter-1.c.jj	2018-05-10 09:38:03.040250709 +0200
+++ gcc/testsuite/gcc.dg/nextafter-1.c	2018-07-06 19:17:55.138355524 +0200
@@ -58,23 +58,41 @@ name (void)						\
     = (NEED_EXC || NEED_ERRNO) ? __builtin_inf##l1 ()	\
       : fn (MAX1, __builtin_inf ());			\
   CHECK (__builtin_isinf##l1 (m) && !__builtin_signbit (m));	\
-  const type n = fn (DENORM_MIN1, 12.0##L2);		\
+  const type n						\
+    = (NEED_EXC || NEED_ERRNO) ? 2.0##L1 * DENORM_MIN1	\
+      : fn (DENORM_MIN1, 12.0##L2);			\
   CHECK (n == 2.0##L1 * DENORM_MIN1);			\
-  const type o = fn (n, 24.0##L2);			\
+  const type o						\
+    = (NEED_EXC || NEED_ERRNO) ? 3.0##L1 * DENORM_MIN1	\
+      : fn (n, 24.0##L2);				\
   CHECK (o == 3.0##L1 * DENORM_MIN1);			\
-  const type p = fn (o, 132.0##L2);			\
+  const type p						\
+    = (NEED_EXC || NEED_ERRNO) ? 4.0##L1 * DENORM_MIN1	\
+      : fn (o, 132.0##L2);				\
   CHECK (p == 4.0##L1 * DENORM_MIN1);			\
-  const type q = fn (2.0##L1 * DENORM_MIN1, -__builtin_inf ());	\
+  const type q
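As a user-visible sanity check of those semantics, something like this should pass (a sketch assuming glibc behaves like the fdlibm code quoted above; volatile just defeats any folding regardless of options; link with -lm):

#include <errno.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  volatile double x = DBL_MIN, zero = 0.0, r;
  errno = 0;
  r = nextafter (x, 0.0);	/* denormal result: errno == ERANGE */
  if (errno != ERANGE)
    __builtin_abort ();
  errno = 0;
  r = nextafter (zero, 1.0);	/* 0 -> minsubnormal: underflow flag only */
  if (errno != 0)
    __builtin_abort ();
  return 0;
}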
[PATCH] Fix __mmask* types on many AVX512 intrinsics
Hi! On Fri, Jul 06, 2018 at 12:47:07PM +0200, Jakub Jelinek wrote: > On Thu, Jul 05, 2018 at 11:57:26PM +0300, Grazvydas Ignotas wrote: > > I think it would be more efficient if you took care of it. I won't > > have time for at least a few days anyway. Here is the complete patch, I found two further issues where the __mmask mismatch was in between the return type and what was used in the rest of the intrinsic, so not caught by my earlier greps. I've added (except for the avx512bitalg which seems to have no runtime test coverage whatsoever) tests that cover the real bugs and further fixed the avx512*-vpcmp{,u}b-2.c test because (rel) << i triggered UB if i could go up to 63. I don't have AVX512* hw, so I've just bootstrapped/regtested the patch normally on i686-linux and x86_64-linux AVX2 hw and tried the affected tests without the config/i386/ changes and with them under SDE. The patch should fix these FAILs: FAIL: gcc.target/i386/avx512bw-vpcmpb-2.c execution test FAIL: gcc.target/i386/avx512bw-vpcmpub-2.c execution test FAIL: gcc.target/i386/avx512f-vinsertf32x4-3.c execution test FAIL: gcc.target/i386/avx512f-vinserti32x4-3.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgew-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmplew-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpneqb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpnequb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpnequw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpneqw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpub-2.c execution test Ok for trunk? I guess we want to backport it soon, but would appreciate somebody testing it on real AVX512-{BW,VL} hw before doing the backports. Another thing to consider is whether we shouldn't add those grep/sed checks I've been doing (at least the easy ones that don't cross-check the i386-builtins.def against the uses in the intrin headers) to config/i386/t-* some way. 2018-07-07 Jakub Jelinek * config/i386/avx512bitalgintrin.h (_mm512_mask_bitshuffle_epi64_mask): Use __mmask64 type instead of __mmask8 for __M argument. * config/i386/avx512fintrin.h (_mm512_mask_xor_epi64, _mm512_maskz_xor_epi64): Use __mmask8 type instead of __mmask16 for __U argument. (_mm512_mask_cmpneq_epi64_mask): Use __mmask8 type instead of __mmask16 for __M argument. (_mm512_maskz_insertf32x4, _mm512_maskz_inserti32x4, _mm512_mask_insertf32x4, _mm512_mask_inserti32x4): Cast last argument to __mmask16 instead of __mmask8. * config/i386/avx512vlintrin.h (_mm_mask_add_ps, _mm_maskz_add_ps, _mm256_mask_add_ps, _mm256_maskz_add_ps, _mm_mask_sub_ps, _mm_maskz_sub_ps, _mm256_mask_sub_ps, _mm256_maskz_sub_ps, _mm256_maskz_cvtepi32_ps, _mm_maskz_cvtepi32_ps): Use __mmask8 type instead of __mmask16 for __U argument. * config/i386/avx512vlbwintrin.h (_mm_mask_cmp_epi8_mask): Use __mmask16 instead of __mmask8 for __U argument. 
(_mm256_mask_cmp_epi8_mask): Use __mmask32 instead of __mmask16 for __U argument. (_mm256_cmp_epi8_mask): Use __mmask32 return type instead of __mmask16. (_mm_mask_cmp_epu8_mask): Use __mmask16 instead of __mmask8 for __U argument. (_mm256_mask_cmp_epu8_mask): Use __mmask32 instead of __mmask16 for __U argument. (_mm256_cmp_epu8_mask): Use __mmask32 return type instead of __mmask16. (_mm_mask_cmp_epi16_mask): Cast last argument to __mmask8 instead of __mmask16. (_mm256_mask_cvtepi8_epi16): Use __mmask16 instead of __mmask32 for __U argument. (_mm_mask_cvtepi8_epi16): Use __mmask8 instead of __mmask32 for __U argument. (_mm256_mask_cvtepu8_epi16): Use __mmask16 instead of __mmask32 for __U argument. (_mm_mask_cvtepu8_epi16): Use __mmask8 instead of __mmask32 for __U argument. (_mm256_mask_cmpneq_epu8_mask, _mm256_mask_cmplt_epu8_mask, _mm256_mask_cmpge_epu8_mask, _mm256_mask_cmple_epu8_mask): Change return type as well as __M argument type and all casts from __mmask8 to __mmask32. (_mm256_mask_cmpneq_epu16_mask, _mm256_mas
Re: [PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)
On Sat, Jul 07, 2018 at 11:55:17AM +0200, Marc Glisse wrote:
> On Sat, 7 Jul 2018, Jakub Jelinek wrote:
>
> > 2018-07-07  Jakub Jelinek
> >
> > 	PR c/86420
> > 	* real.c (real_nextafter): Return true if result is denormal.
>
> I have a question on the side: would it be hard / useful, in cases where
> nextafter may set errno or some exception flag, to fold the result to a
> constant while keeping the function call (ignoring the value it returns)?
> To clarify, I mean replace
>
> _2 = nextafter(DBL_DENORM_MIN, 0);
>
> with
>
> nextafter(DBL_DENORM_MIN, 0);
> _2 = 0;
>
> I think we already do that for some other calls, although I can't remember
> where. The point would be that we have the value of _2 and can keep folding
> its uses.

For errno purposes alone that would be possible, but the function is marked

#define ATTR_MATHFN_ERRNO (flag_errno_math ? \
	ATTR_NOTHROW_LEAF_LIST : ATTR_CONST_NOTHROW_LEAF_LIST)

and thus with -ftrapping-math -fno-math-errno I'm afraid we'd immediately DCE the call in the second form (without lhs).

	Jakub
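To illustrate the DCE point with a stand-in (my_nextafter is hypothetical; it merely carries the const attribute that ATTR_MATHFN_ERRNO gives the real builtin when flag_errno_math is off):

extern double my_nextafter (double, double) __attribute__ ((__const__));

void
f (void)
{
  /* No lhs: a const function has no observable side-effects as far as
     the middle-end is concerned, so DCE deletes the call and the
     exception flags Marc wants to preserve are lost.  */
  my_nextafter (0x1p-1074, 0.0);
}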
Re: [committed][gcc][patch] Require sse for testcase on i686.
On Sat, Jul 07, 2018 at 11:07:28AM +0000, Tamar Christina wrote:
> > On Fri, Jul 06, 2018 at 11:46:43AM +0100, Tamar Christina wrote:
> > > This fixes an ABI warning generated on i686-pc-linux-gnu when using
> > > `vector_size` with no sse enabled explicitly.
> > >
> > > Regtested single test on x86_64-pc-linux-gnu with -m32 and no issues.
> > >
> > > Committed under the GCC obvious rule.
> >
> > That is insufficient, I get the FAIL on i686-linux.
> > You don't really need dg-require-effective-target sse; it is a
> > compile-time-only (dg-do compile) test and on i?86/x86_64 failures with
> > old assemblers would show up only when assembling.
> > But -msse really should be used on all i?86-*-* and x86_64-*-*.
>
> Ah, I wasn't aware of that. I initially didn't add it to x86-64 since the
> test wasn't generating the ABI warning there. But this is good to know
> for the future.

The thing is that i?86-*-* and x86_64-*-* can both be multilib, and only the -m64 and -mx32 multilibs default to -msse2, -m32 does not. In gcc.target/ tests one typically uses the ia32 effective target for the -m32 i?86/x86_64 multilib, but in this case it doesn't hurt to add -msse to all.

	Jakub
Re: [PATCH][debug] Handle debug references to skipped params
On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5141,6 +5141,10 @@ expand_debug_source_expr (tree exp)
>  
>    switch (TREE_CODE (exp))
>      {
> +    case VAR_DECL:
> +      if (DECL_ABSTRACT_ORIGIN (exp))
> +	return expand_debug_source_expr (DECL_ABSTRACT_ORIGIN (exp));
> +      break;
>      case PARM_DECL:
>        {
> 	mode = DECL_MODE (exp);

This is ok.

> diff --git a/gcc/testsuite/gcc.dg/vla-1.c b/gcc/testsuite/gcc.dg/vla-1.c
> new file mode 100644
> index 000..0c19feffd2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vla-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -O3 -fdump-tree-optimized" } */
> +
> +
> +/* One debug source bind is generated for the parameter, and two to describe the
> +   sizes of a and b.  */
> +/* { dg-final { scan-tree-dump-times " s=> i" 3 "optimized" } } */

I think you at least need explicit -fvar-tracking-assignments -fno-selective-scheduling -fno-selective-scheduling2 and perhaps some guard to ignore the test on nvptx, which disables -fvar-tracking unconditionally?

> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 427ef959740..6fbd8c3ca61 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -208,7 +208,9 @@ remap_ssa_name (tree name, copy_body_data *id)
>        n = id->decl_map->get (val);
>        if (n != NULL)
> 	val = *n;
> -      if (TREE_CODE (val) != PARM_DECL)
> +      if (TREE_CODE (val) != PARM_DECL
> +	  && !(TREE_CODE (val) == VAR_DECL
> +	       && DECL_ABSTRACT_ORIGIN (val)))
> 	{
> 	  processing_debug_stmt = -1;
> 	  return name;

Please use VAR_P macro.

	Jakub
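Concretely, the vla-1.c header would then look something like this (a sketch; whether a plain dg-skip-if is the right way to guard nvptx is a guess):

/* { dg-do compile } */
/* { dg-options "-g -O3 -fdump-tree-optimized -fvar-tracking-assignments -fno-selective-scheduling -fno-selective-scheduling2" } */
/* { dg-skip-if "-fvar-tracking disabled" { nvptx-*-* } } */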
Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics
On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote:
> > I guess we want to backport it soon, but would appreciate somebody testing
> > it on real AVX512-{BW,VL} hw before doing the backports.
>
> I've run the testsuite with this patch applied and all tests passed on
> i7-7800X.

Thanks for the testing.

> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c
> failures, but those seem unrelated.

These are dg-do compile tests, and they PASS for me, even when doing
make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'"
So, how exactly have you configured your gcc, what kind of options are passed to the test, and how do they FAIL?

	Jakub
Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics
On Mon, Jul 09, 2018 at 12:00:46PM +0300, Grazvydas Ignotas wrote: > On Mon, Jul 9, 2018 at 10:37 AM, Jakub Jelinek wrote: > > On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote: > >> > I guess we want to backport it soon, but would appreciate somebody > >> > testing > >> > it on real AVX512-{BW,VL} hw before doing the backports. > >> > >> I've run the testsuite with this patch applied and all tests passed on > >> i7-7800X. > > > > Thanks for the testing. > > > >> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c > >> failures, but those seem unrelated. > > > > These are dg-do compile tests, and they PASS for me, even when doing > > make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 > > i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'" > > So, how exactly you've configured your gcc, what kind of options are > > passed to the test and how they FAIL? > > I should've mentioned I've tested this patch on top of 8.1 release > tarball and used crosstool-NG to build the toolchain with it's "GCC > test suite" option enabled. It looks like crosstool is applying some > patches, so the results might not be valid. Here is the log (seems to > contain the configuration info), where I just grepped for FAIL and the > new test names to see if they were actually run: > > http://notaz.gp2x.de/misc/unsorted/gcc.log.xz Don't see any FAILs even in your log file on the above tests: spawn -ignore SIGHUP gcc /home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 -ffat-lto-objects -S -o avx512vl-vmovdqa64-1.s PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c (test for excess errors) PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+\\([^\n]*%ymm[0-9]+(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+\\([^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\nxy]*\\(.{5,6}(?:\n|[ \\t]+#) 1 gcc.target/i386/avx512vl-vmovdqa64-1.c: vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c 
scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1 spawn -ignore SIGHUP gcc /home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vpermilpdi-1.c -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 -ffat-lto-objects -S -o avx512vl-vpermilpdi-1.s PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c (test for excess errors) gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 XFAIL is expected fail, not unexpected... Jakub
[committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)
Hi! While working on OpenMP 5.0 range-for support, I've discovered that even for normal class iterators distribute parallel for gimplification ICEs in several ways (other composite loop constructs work only because class iterators are not allowed on them). The problem is that the FEs emit the code that needs to be done before computing number of the iterations around the innermost construct, which we then wrap into OMP_PARALLEL and OMP_DISTRIBUTE and then we want to compute number of iterations on the OMP_DISTRIBUTE. The following patch fixes it by detecting these cases and moving the outer composite constructs right around the innermost one. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-07-10 Jakub Jelinek PR c++/86443 * gimplify.c (find_combined_omp_for): Add DATA argument, in addition to finding the inner OMP_FOR/OMP_SIMD stmt find non-trivial wrappers, BLOCKs with BLOCK_VARs, OMP_PARALLEL in between, OMP_FOR in between. (gimplify_omp_for): For composite loops, move outer OMP_{DISTRIBUTE,TASKLOOP,FOR,PARALLEL} right around innermost OMP_FOR/OMP_SIMD if there are any non-trivial wrappers. For class iterators add any needed clauses. Allow OMP_FOR_ORIG_DECLS to contain TREE_LIST for both the original class iterator and the "last" helper var. Gimplify OMP_FOR_PRE_BODY before the outermost composite loop, remember has_decl_expr from outer composite loops for the innermost OMP_SIMD in TREE_PRIVATE bit on OMP_FOR_INIT. gcc/c-family/ * c-omp.c (c_omp_check_loop_iv_r, c_omp_check_loop_iv): Allow declv to contain TREE_LIST for both the original class iterator and the "last" helper var. gcc/cp/ * semantics.c (handle_omp_for_class_iterator): Remove lastp argument, instead of setting *lastp turn orig_declv elt into a TREE_LIST. (finish_omp_for): Adjust handle_omp_for_class_iterator caller. * pt.c (tsubst_omp_for_iterator): Allow OMP_FOR_ORIG_DECLS to contain TREE_LIST for both the original class iterator and the "last" helper var. libgomp/ * testsuite/libgomp.c++/for-15.C: New test. --- gcc/gimplify.c.jj 2018-07-07 09:45:42.133890332 +0200 +++ gcc/gimplify.c 2018-07-09 15:47:14.587400243 +0200 @@ -9532,24 +9532,53 @@ gimplify_omp_task (tree *expr_p, gimple_ } /* Helper function of gimplify_omp_for, find OMP_FOR resp. OMP_SIMD - with non-NULL OMP_FOR_INIT. */ + with non-NULL OMP_FOR_INIT. Also, fill in pdata array, + pdata[0] non-NULL if there is anything non-trivial in between, pdata[1] + is address of OMP_PARALLEL in between if any, pdata[2] is address of + OMP_FOR in between if any and pdata[3] is address of the inner + OMP_FOR/OMP_SIMD. 
*/ static tree -find_combined_omp_for (tree *tp, int *walk_subtrees, void *) +find_combined_omp_for (tree *tp, int *walk_subtrees, void *data) { + tree **pdata = (tree **) data; *walk_subtrees = 0; switch (TREE_CODE (*tp)) { case OMP_FOR: + if (OMP_FOR_INIT (*tp) != NULL_TREE) + { + pdata[3] = tp; + return *tp; + } + pdata[2] = tp; *walk_subtrees = 1; - /* FALLTHRU */ + break; case OMP_SIMD: if (OMP_FOR_INIT (*tp) != NULL_TREE) - return *tp; + { + pdata[3] = tp; + return *tp; + } break; case BIND_EXPR: + if (BIND_EXPR_VARS (*tp) + || (BIND_EXPR_BLOCK (*tp) + && BLOCK_VARS (BIND_EXPR_BLOCK (*tp + pdata[0] = tp; + *walk_subtrees = 1; + break; case STATEMENT_LIST: + if (!tsi_one_before_end_p (tsi_start (*tp))) + pdata[0] = tp; + *walk_subtrees = 1; + break; +case TRY_FINALLY_EXPR: + pdata[0] = tp; + *walk_subtrees = 1; + break; case OMP_PARALLEL: + pdata[1] = tp; *walk_subtrees = 1; break; default: @@ -9574,6 +9603,115 @@ gimplify_omp_for (tree *expr_p, gimple_s orig_for_stmt = for_stmt = *expr_p; + if (OMP_FOR_INIT (for_stmt) == NULL_TREE) +{ + tree *data[4] = { NULL, NULL, NULL, NULL }; + gcc_assert (TREE_CODE (for_stmt) != OACC_LOOP); + inner_for_stmt = walk_tree (&OMP_FOR_BODY (for_stmt), + find_combined_omp_for, data, NULL); + if (inner_for_stmt == NULL_TREE) + { + gcc_assert (seen_error ()); + *expr_p = NULL_TREE; + return GS_ERROR; + } + if (data[2] && OMP_FOR_PRE_BODY (*data[2])) + { + append_to_statement_list_force (OMP_FOR_PRE_BODY (*data[2]), + &OMP_FOR_PRE_BODY (for_stmt)); + OMP_FOR_PRE_BODY (*data[2]) = NULL_TREE; + } + if (OMP_FOR_PRE_BODY (inner_for_stmt)) + { + append_to_statement_list_force (OMP_FOR_PRE_BODY (inner_for_stmt),
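For the record, the kind of source that used to ICE, reduced to a sketch (this is not the committed libgomp.c++/for-15.C, and the exact set of operators the FE requires of a class iterator may be larger than shown; OpenMP wants random access, hence the difference and advance operators):

struct It
{
  int i;
  int operator* () const { return i; }
  It &operator++ () { ++i; return *this; }
  It operator+ (long n) const { return It { (int) (i + n) }; }
  long operator- (const It &o) const { return i - o.i; }
  bool operator!= (const It &o) const { return i != o.i; }
};

void
foo ()
{
  It b = { 0 }, e = { 64 };
  /* Class iterator on a composite construct: the iterator setup code the
     FE emits before the number-of-iterations computation has to stay with
     the innermost loop when the construct is split into
     distribute + parallel + for.  */
#pragma omp target teams
#pragma omp distribute parallel for
  for (It it = b; it != e; ++it)
    ;
}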
[committed] Save/restore OpenMP linear clause modifiers in modules (PR fortran/86421)
Hi! This patch (in a backwards compatible way) handles saving and restoring of linear clause modifiers. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-07-10 Jakub Jelinek PR fortran/86421 * module.c (omp_declare_simd_clauses): Add LINEAR with _REF, _VAL and _UVAL suffixes. (mio_omp_declare_simd): Save and restore ref, val and uval modifiers on linear clauses. Initialize n->where to gfc_current_locus. * gfortran.dg/vect/pr86421.f90: New test. --- gcc/fortran/module.c.jj 2018-02-13 09:28:10.0 +0100 +++ gcc/fortran/module.c2018-07-09 18:56:49.595348962 +0200 @@ -4098,6 +4098,9 @@ static const mstring omp_declare_simd_cl minit ("UNIFORM", 3), minit ("LINEAR", 4), minit ("ALIGNED", 5), +minit ("LINEAR_REF", 33), +minit ("LINEAR_VAL", 34), +minit ("LINEAR_UVAL", 35), minit (NULL, -1) }; @@ -4140,7 +4143,10 @@ mio_omp_declare_simd (gfc_namespace *ns, } for (n = ods->clauses->lists[OMP_LIST_LINEAR]; n; n = n->next) { - mio_name (4, omp_declare_simd_clauses); + if (n->u.linear_op == OMP_LINEAR_DEFAULT) + mio_name (4, omp_declare_simd_clauses); + else + mio_name (32 + n->u.linear_op, omp_declare_simd_clauses); mio_symbol_ref (&n->sym); mio_expr (&n->expr); } @@ -4181,11 +4187,20 @@ mio_omp_declare_simd (gfc_namespace *ns, case 4: case 5: *ptrs[t - 3] = n = gfc_get_omp_namelist (); + finish_namelist: + n->where = gfc_current_locus; ptrs[t - 3] = &n->next; mio_symbol_ref (&n->sym); if (t != 3) mio_expr (&n->expr); break; + case 33: + case 34: + case 35: + *ptrs[1] = n = gfc_get_omp_namelist (); + n->u.linear_op = (enum gfc_omp_linear_op) (t - 32); + t = 4; + goto finish_namelist; } } } --- gcc/testsuite/gfortran.dg/vect/pr86421.f90.jj 2018-07-09 19:09:56.662398875 +0200 +++ gcc/testsuite/gfortran.dg/vect/pr86421.f90 2018-07-09 19:07:57.432240946 +0200 @@ -0,0 +1,35 @@ +! PR fortran/86421 +! { dg-require-effective-target vect_simd_clones } +! { dg-additional-options "-fopenmp-simd" } +! { dg-additional-options "-mavx" { target avx_runtime } } + +module mod86421 + implicit none +contains + subroutine foo(x, y, z) +real :: x +integer :: y, z +!$omp declare simd linear(ref(x)) linear(val(y)) linear(uval(z)) +x = x + y +z = z + 1 + end subroutine +end module mod86421 + +program pr86421 + use mod86421 + implicit none + integer :: i, j + real :: a(64) + j = 0 + do i = 1, 64 +a(i) = i + end do + !$omp simd + do i = 1, 64 +call foo (a(i), i, j) + end do + do i = 1, 64 +if (a(i) .ne. (2 * i)) stop 1 + end do + if (j .ne. 64) stop 2 +end program pr86421 Jakub
[PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)
Hi! cp_maybe_instrument_return is looking for a return stmt at the end of function to decide whether to omit -fsanitize=return instrumentation or __builtin_unreachable addition. If a STATEMENT_LIST has a return followed by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return though. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-10 Jakub Jelinek PR sanitizer/86406 * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing DEBUG_BEGIN_STMTs. * g++.dg/ubsan/pr86406.C: New test. --- gcc/cp/cp-gimplify.c.jj 2018-07-05 11:41:51.687718588 +0200 +++ gcc/cp/cp-gimplify.c2018-07-09 09:57:16.368775004 +0200 @@ -1621,6 +1621,13 @@ cp_maybe_instrument_return (tree fndecl) case STATEMENT_LIST: { tree_stmt_iterator i = tsi_last (t); + while (!tsi_end_p (i)) + { + tree p = tsi_stmt (i); + if (TREE_CODE (p) != DEBUG_BEGIN_STMT) + break; + tsi_prev (&i); + } if (!tsi_end_p (i)) { t = tsi_stmt (i); --- gcc/testsuite/g++.dg/ubsan/pr86406.C.jj 2018-07-09 09:58:57.362878125 +0200 +++ gcc/testsuite/g++.dg/ubsan/pr86406.C2018-07-09 09:58:37.716858063 +0200 @@ -0,0 +1,33 @@ +// PR sanitizer/86406 +// { dg-do compile } +// { dg-options "-fcompare-debug -fsanitize=undefined -g -O1" } + +typedef enum { } cmd_status; +class ECell; +class ECell_const_ptr { }; +class ECell_ptr +{ + ECell *mp_element; + ECell *getPointer () const { return mp_element; } +public: + operator ECell_const_ptr () const { return ECell_const_ptr(); } +}; + +extern ECell_ptr NULL_CELL; +class VwUI_2DCellLayerView; +class view_cell_layoutImpl +{ + cmd_status handleChangeFlags (VwUI_2DCellLayerView * + p_ui_celllayerview, + ECell_const_ptr p_peekCell); + cmd_status openCellLayoutView (); +}; + +cmd_status +view_cell_layoutImpl::openCellLayoutView () +{ + ECell_const_ptr pcell = NULL_CELL; + VwUI_2DCellLayerView *p_user_interface; + return handleChangeFlags (p_user_interface, pcell); + ; +} Jakub
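For reference, the shape being skipped is simply a function body whose STATEMENT_LIST no longer ends in the return with -g (schematic, not actual dump output):

  ...
  return <retval>;
  # DEBUG BEGIN_STMT

With -g0 the return is the last statement, so the decision whether to instrument or append __builtin_unreachable differed between -g and -g0, which is exactly what -fcompare-debug flags.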
Re: [PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)
On Tue, Jul 10, 2018 at 10:01:10AM +0200, Richard Biener wrote: > On Tue, 10 Jul 2018, Jakub Jelinek wrote: > > cp_maybe_instrument_return is looking for a return stmt at the end of > > function to decide whether to omit -fsanitize=return instrumentation or > > __builtin_unreachable addition. If a STATEMENT_LIST has a return followed > > by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return > > though. > > > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > OK. This also affects the branch? 8.x only, will commit it there too. > > 2018-07-10 Jakub Jelinek > > > > PR sanitizer/86406 > > * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing > > DEBUG_BEGIN_STMTs. > > > > * g++.dg/ubsan/pr86406.C: New test. Jakub
Re: [RFC] Fix recent popcount change is breaking
On Tue, Jul 10, 2018 at 03:17:48PM +0200, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah wrote:
> > Jeff told me that the recent popcount built-in detection is causing
> > kernel build issues as
> > ERROR: "__popcountsi2" [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
> >
> > I could also reproduce this. AFAIK, we should check if the libfunc is
> > defined while checking popcount?
> >
> > I am testing the attached RFC patch. Is this reasonable?
>
> It doesn't work that way, all targets have this libfunc in libgcc. This means
> the kernel has to provide it. The only thing you could do is restrict
> replacement of CALL_EXPRs (in SCEV cprop) to those the target
> natively supports.

Yeah, that is what we've done in the past in other cases, e.g. PR82981 is somewhat similar. So perhaps just check the optab and use it only if it is supported?

	Jakub
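I.e. in the popcount detection code something along these lines (a sketch, untested; 'src' stands for whatever tree the popcount is taken of):

  /* Only emit a popcount CALL_EXPR if the target can expand it inline;
     otherwise we'd introduce a __popcountsi2/__popcountdi2 libcall the
     kernel doesn't provide.  */
  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (src));
  if (optab_handler (popcount_optab, mode) == CODE_FOR_nothing)
    return false;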
Re: [PATCH] PR debug/86459 - Fix -gsplit-dwarf -g3 gcc_assert
On Wed, Jul 11, 2018 at 12:20:08AM +0200, Mark Wielaard wrote: > There was a typo in the output_macinfo_op gcc_assert. > The function is called dwarf_FORM, not dwarf_form. > Add the provided testcase from the bug to test -gsplit-dwarf -g3. > > gcc/ChangeLog: > > PR debug/86459 > * dwarf2out.c (output_macinfo_op): Fix dwarf_FORM typo in gcc_assert. > > gcc/testsuite/ChangeLog: > > PR debug/86459 > * gcc.dg/pr86459.c: New test. Ok, thanks. Jakub
Re: [committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)
On Tue, Jul 10, 2018 at 09:18:18AM +0200, Jakub Jelinek wrote: > Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. I found two small issues and one big issue (results being declare target) which break the test if using non-shared memory offloading. This should fix it, tested on x86_64-linux, committed to trunk. 2018-07-11 Jakub Jelinek PR c++/86443 * testsuite/libgomp.c++/for-15.C (a): Remove unused variable. (results): Make sure the variable is not inside declare target region. (qux): Remove unused function. --- libgomp/testsuite/libgomp.c++/for-15.C (revision 262551) +++ libgomp/testsuite/libgomp.c++/for-15.C (working copy) @@ -88,10 +88,11 @@ private: template const I &J::begin () { return b; } template const I &J::end () { return e; } +#pragma omp end declare target -int a[2000]; int results[2000]; +#pragma omp declare target template void baz (I &i) @@ -110,13 +111,6 @@ baz (int i) } void -qux (I &i) -{ - if (*i != 1931) -abort (); -} - -void f1 (J j) { #pragma omp distribute parallel for default(none) Jakub
[PATCH] Fix store-merging wrong-code issue (PR tree-optimization/86492)
Hi! The following testcase is a similar issue to PR84503 and the fix is similar, because coalesce_immediate_stores temporarily sorts the stores on ascending bitpos and if stores are merged, the merged store is emitted in the location of the latest of the stores, we need to verify that there is no overlap with other stores that are originally before that latest store from those we are considering and overlaps the set of stores we are considering to merge. In that case we need to punt and not merge (unless we do smarts like prove overlap between just some of the stores and force reordering). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.2? 2018-07-11 Jakub Jelinek PR tree-optimization/86492 * gimple-ssa-store-merging.c (imm_store_chain_info::coalesce_immediate_stores): Call check_no_overlap even for the merge_overlapping case. Formatting fix. * gcc.c-torture/execute/pr86492.c: New test. --- gcc/gimple-ssa-store-merging.c.jj 2018-06-13 10:05:53.0 +0200 +++ gcc/gimple-ssa-store-merging.c 2018-07-11 19:24:12.084120206 +0200 @@ -2702,7 +2702,12 @@ imm_store_chain_info::coalesce_immediate { /* Only allow overlapping stores of constants. */ if (info->rhs_code == INTEGER_CST - && merged_store->stores[0]->rhs_code == INTEGER_CST) + && merged_store->stores[0]->rhs_code == INTEGER_CST + && check_no_overlap (m_store_info, i, INTEGER_CST, + MAX (merged_store->last_order, info->order), + MAX (merged_store->start + + merged_store->width, + info->bitpos + info->bitsize))) { merged_store->merge_overlapping (info); goto done; @@ -2732,10 +2737,8 @@ imm_store_chain_info::coalesce_immediate info->ops_swapped_p = true; } if (check_no_overlap (m_store_info, i, info->rhs_code, - MAX (merged_store->last_order, -info->order), - MAX (merged_store->start -+ merged_store->width, + MAX (merged_store->last_order, info->order), + MAX (merged_store->start + merged_store->width, info->bitpos + info->bitsize))) { /* Turn MEM_REF into BIT_INSERT_EXPR for bit-field stores. */ --- gcc/testsuite/gcc.c-torture/execute/pr86492.c.jj2018-07-11 19:40:27.760122514 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr86492.c 2018-07-11 19:40:13.460107841 +0200 @@ -0,0 +1,34 @@ +/* PR tree-optimization/86492 */ + +union U +{ + unsigned int r; + struct S + { +unsigned int a:12; +unsigned int b:4; +unsigned int c:16; + } f; +}; + +__attribute__((noipa)) unsigned int +foo (unsigned int x) +{ + union U u; + u.r = 0; + u.f.c = x; + u.f.b = 0xe; + return u.r; +} + +int +main () +{ + union U u; + if (__CHAR_BIT__ * __SIZEOF_INT__ != 32 || sizeof (u.r) != sizeof (u.f)) +return 0; + u.r = foo (0x72); + if (u.f.a != 0 || u.f.b != 0xe || u.f.c != 0x72) +__builtin_abort (); + return 0; +} Jakub
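To spell the hazard out on the pr86492.c testcase above (a schematic; bit positions assume the usual little-endian bitfield layout):

  /* S1: u.r   = 0;     bits [0, 32),  constant,     order 1
     S2: u.f.c = x;     bits [16, 32), non-constant, order 2
     S3: u.f.b = 0xe;   bits [12, 16), constant,     order 3
     Sorted by bitpos, S1 and S3 are overlapping constant stores, so
     merge_overlapping would happily combine them into a single [0, 32)
     constant store emitted at S3's (the latest) order, i.e. after S2,
     clobbering the bits S2 stored.  Hence the new check_no_overlap call
     on this path too.  */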
Re: [PATCH] Fix PR86462
On Thu, Jul 12, 2018 at 12:29:20PM +0200, Richard Biener wrote:
> After my PR86413 fix to always annotate existing lexical block DIEs with
> range attributes debuginfo grows significantly in case we previously
> had "stale" lexical block DIEs without any variables.
>
> The following fixes this by eliding those completely and not emitting
> a lexical block DIE for blocks that just contain DECL_IGNORED_P
> entities. This solves the reported size regression and the
> empty lexical block DIEs vanish.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Do you have a proof that BLOCK_NON_LOCALIZED_VAR is ever DECL_IGNORED_P? I see it is filled with:

  if ((!optimize || debug_info_level > DINFO_LEVEL_TERSE)
      && !DECL_IGNORED_P (old_var)
      && nonlocalized_list)
    vec_safe_push (*nonlocalized_list, old_var);

(twice) in tree-inline.c. Anything else that populates it?

	Jakub
Re: [PATCH] Fix PR86462
On Thu, Jul 12, 2018 at 01:01:14PM +0200, Richard Biener wrote: > No, I don't think so. OK, I'll remove the loop over > BLOCK_NON_LOCALIZED_VAR and keep the original check for it. > > I also simplified stuff by hoisting the TREE_USED and friends checks. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok? LGTM. > 2018-07-12 Richard Biener > > PR debug/86462 > * dwarf2out.c (gen_block_die): Only output blocks when they have > at least one !DECL_IGNORED_P variable. Jakub
[PATCH] Add testcase for already fixed bug
Hi! I've created a self-contained testcase for a powerpc64{,le} wrong-code introduced (at least for powerpc64le) with r256656 and just later found it was already fixed with r260329. The SLP vectorized: + w.l = x.l; + w.u = x.u; was actually swapping it, so acting as: w.u = x.l; w.l = x.u; instead. As the testcase is sufficiently different from PR85698 testcase, I've committed this one too to trunk. 2018-07-12 Jakub Jelinek * gcc.dg/torture/20180712-1.c: New test. --- gcc/testsuite/gcc.dg/torture/20180712-1.c.jj2018-07-12 13:14:46.513068757 +0200 +++ gcc/testsuite/gcc.dg/torture/20180712-1.c 2018-07-12 13:14:07.851035363 +0200 @@ -0,0 +1,76 @@ +/* { dg-do run } */ +/* { dg-additional-options "-fstack-protector" { target fstack_protector } } */ +/* { dg-additional-options "-fPIC" { target fpic } } */ + +struct S { int *l, *u; }; +int a[3]; + +__attribute__((noipa)) struct S +foo (void) +{ + int *p = a, *q = a + 1; + struct S s; + asm volatile ("" : "+g" (p), "+g" (q) : : "memory"); + s.l = p; + s.u = q; + a[0]++; + return s; +} + +__attribute__((noipa)) void +bar (struct S *x) +{ + asm volatile ("" : : "g" (x) : "memory"); + if (x->l != a || x->u != a + 1) +__builtin_abort (); + a[1]++; +} + +__attribute__((noipa)) int +baz (int *x, int *y) +{ + int r = -1; + asm volatile ("" : "+g" (r) : "g" (x), "g" (y) : "memory"); + a[2]++; + return r; +} + +__attribute__((noipa)) void +quux (void) +{ + asm volatile ("" : : : "memory"); +} + +__attribute__((noipa)) void +qux (void) +{ + struct S v = foo (); + struct S w; + struct S x = foo (); + int y = 0; + + w.l = x.l; + w.u = x.u; + if (baz (x.l, v.l) > 0) +{ + w.l = v.l; + y = 1; + quux (); +} + if (baz (x.u, v.u) < 0) +{ + w.u = v.u; + y = 1; +} + if (y) +bar (&w); +} + +int +main () +{ + qux (); + if (a[0] != 2 || a[1] != 1 || a[2] != 2) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix PR84829
On Wed, Jul 04, 2018 at 09:41:17AM +0200, Richard Biener wrote:
> The following removes adding -lieee to the link command when -mieee-fp
> is specified for GNU targets as suggested by Joseph in the PR. It doesn't
> touch m32r so the fix may be incomplete depending on the C library used.
>
> I've added a testcase that makes sure we can link but not functional
> tests of -mieee-fp which I suppose exist already (eh...).
>
> Bootstrap and regtest is running on x86_64-unknown-linux-gnu with
> glibc 2.27 where the testcase fails before the patch.
>
> OK for trunk and active branches?
>
> Thanks,
> Richard.
>
> From a71f976df0f17790202c0a692920c9d67e009259 Mon Sep 17 00:00:00 2001
> From: Richard Guenther
> Date: Wed, 4 Jul 2018 09:34:14 +0200
> Subject: [PATCH] fix-pr84829
>
> 2018-07-04  Richard Biener
>
> 	PR target/84829
> 	* config/gnu-user.h (GNU_USER_TARGET_NO_PTHREADS_LIB_SPEC):
> 	Remove -mieee-fp handling.
>
> 	* gcc.target/i386/pr84829.c: New testcase.

Ok.

	Jakub
[committed] #pragma omp declare target fixes
Hi! As the following testcases show, diagnosing non-mappable type already in c*_decl_attributes is too early, the type might not be finalized and completed at that point for some variables yet. This patch defers it until *finish_decl. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 8.2. 2018-07-12 Jakub Jelinek * c-attribs.c (c_common_attribute_table): Add "omp declare target implicit" attribute. * c-decl.c (c_decl_attributes): Don't diagnose vars without mappable type here, instead add "omp declare target implicit" attribute. (finish_decl): Diagnose vars without mappable type here. * decl2.c (cplus_decl_attributes): Don't diagnose vars without mappable type here, instead add "omp declare target implicit" attribute. Add that attribute instead of "omp declare target" also when processing_template_decl. * decl.c (cp_finish_decl): Diagnose vars without mappable type here, and before calling cp_omp_mappable_type call complete_type. * c-c++-common/gomp/declare-target-3.c: New test. * g++.dg/gomp/declare-target-2.C: New test. --- gcc/c-family/c-attribs.c.jj 2018-07-12 09:33:07.648102554 +0200 +++ gcc/c-family/c-attribs.c2018-07-12 16:18:46.483238747 +0200 @@ -439,6 +439,8 @@ const struct attribute_spec c_common_att handle_omp_declare_target_attribute, NULL }, { "omp declare target link", 0, 0, true, false, false, false, handle_omp_declare_target_attribute, NULL }, + { "omp declare target implicit", 0, 0, true, false, false, false, + handle_omp_declare_target_attribute, NULL }, { "alloc_align", 1, 1, false, true, true, false, handle_alloc_align_attribute, attr_alloc_exclusions }, --- gcc/c/c-decl.c.jj 2018-06-13 10:05:04.996092971 +0200 +++ gcc/c/c-decl.c 2018-07-12 17:48:35.370278519 +0200 @@ -4643,8 +4643,8 @@ c_decl_attributes (tree *node, tree attr { if (VAR_P (*node) && !lang_hooks.types.omp_mappable_type (TREE_TYPE (*node))) - error ("%q+D in declare target directive does not have mappable type", - *node); + attributes = tree_cons (get_identifier ("omp declare target implicit"), + NULL_TREE, attributes); else attributes = tree_cons (get_identifier ("omp declare target"), NULL_TREE, attributes); @@ -5223,7 +5223,27 @@ finish_decl (tree decl, location_t init_ diagnose_uninitialized_cst_member (decl, type); } - invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl); + if (flag_openmp + && VAR_P (decl) + && lookup_attribute ("omp declare target implicit", + DECL_ATTRIBUTES (decl))) +{ + DECL_ATTRIBUTES (decl) + = remove_attribute ("omp declare target implicit", + DECL_ATTRIBUTES (decl)); + if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (decl))) + error ("%q+D in declare target directive does not have mappable type", + decl); + else if (!lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("omp declare target link", +DECL_ATTRIBUTES (decl))) + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("omp declare target"), + NULL_TREE, DECL_ATTRIBUTES (decl)); +} + + invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl); } /* Given a parsed parameter declaration, decode it into a PARM_DECL. 
--- gcc/cp/decl2.c.jj 2018-07-02 18:01:29.300224096 +0200 +++ gcc/cp/decl2.c 2018-07-12 17:48:35.367278517 +0200 @@ -1490,11 +1490,11 @@ cplus_decl_attributes (tree *decl, tree && DECL_CLASS_SCOPE_P (*decl)) error ("%q+D static data member inside of declare target directive", *decl); - else if (!processing_template_decl - && VAR_P (*decl) - && !cp_omp_mappable_type (TREE_TYPE (*decl))) - error ("%q+D in declare target directive does not have mappable type", - *decl); + else if (VAR_P (*decl) + && (processing_template_decl + || !cp_omp_mappable_type (TREE_TYPE (*decl + attributes = tree_cons (get_identifier ("omp declare target implicit"), + NULL_TREE, attributes); else attributes = tree_cons (get_identifier ("omp declare target"),
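A guess at the kind of shape the new declare-target-3.c test covers (the array type is still incomplete when c_decl_attributes runs and is only completed by its initializer, so mappability can only be judged in finish_decl):

#pragma omp declare target
int arr[] = { 1, 2, 3 };	/* type incomplete at the attribute point */
#pragma omp end declare target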
Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)
On Fri, Jul 13, 2018 at 01:49:38PM +0200, Richard Biener wrote: > The testcase in the PR, while previously ICEing because the C++ FE doesn't > properly capture VLA size fields, now ICEs because gimplification > introduces SSA uses that appear in a different function than its > definition. This happens because there is tree sharing between > the functions. For nested functions (which the C++ lambdas are not) > such tree sharing ended up being harmless before GCC7 because unnesting > resolves all locals with wrong origin to the static chain (and > gimplification ordering made sure definitions always appear in the > outer function). > > The following resolves this by unsharing size expressions in c-common.c > > I realize that this may end up pessimizing early code since 1:1 > tree-sharing with what is gimplified from a DECL_EXPR doesn't > share the gimplification result. I think the unsharing is just wrong, we never want to unshare those, the SAVE_EXPR expression needs to be evaluated at the DECL_EXPR point and never anywhere else, never twice (it could have side-effects etc.). The C++ FE must be fixed to handle the lambda cases. > Another option might be to force gimplification to not generate > SSA temporaries when gimplifying size positions but gimplify_one_sizepos > oddly enough unshares trees before gimplifying ...(!?) This would > need to be removed (see patch after the tested patch below). I like that gimplify_one_sizepos change much more (I guess we need a non-lambda testcase). Lambdas were broken even before GCC7, while we might not ICE, we certainly didn't generate correct code. Jakub
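A minimal C shape of the problem class, as a sketch (with nested functions the cross-function size references used to be rescued by unnesting; the C++ lambda path has no such pass):

void
f (int n)
{
  int a[n];			/* the VLA size ends up in a SAVE_EXPR
				   evaluated at the DECL_EXPR */
  int g (void) { return sizeof (a); }	/* shares the size trees of 'a' */
  g ();
}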
C++ patch ping
Hi! I'd like to ping the following C++ patches: - PR c++/85515 make range for temporaries unspellable during parsing and only turn them into spellable for debug info purposes http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html - PR c++/3698, PR c++/86208 extern_decl_map & TREE_USED fix (plus 2 untested variants) http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00084.html Jakub
Re: C++ patch ping
On Fri, Jul 13, 2018 at 12:24:02PM -0400, Nathan Sidwell wrote: > On 07/13/2018 09:49 AM, Jakub Jelinek wrote: > > Hi! > > > > I'd like to ping the following C++ patches: > > > > - PR c++/85515 > >make range for temporaries unspellable during parsing and only > >turn them into spellable for debug info purposes > >http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html > > > How hard would it be to add the 6 special identifiers to the C++ global > table via initialize_predefined_identifiers (decl.c) and then use them > directly in the for range machinery? repeated get_identifier > ("string-const") just smells bad. Probably not too hard, but we have hundreds of other get_identifier ("string-const") calls in the middle-end, C++ FE, other FEs. Are those 6 more important than say "abi_tag", "gnu", "begin", "end", "get", "tuple_size", "tuple_element", and many others? Is it worth caching those? Jakub
Re: [PATCH][debug] Fix pre_dec handling in vartrack
On Sun, Jul 15, 2018 at 11:21:56PM +0200, Tom de Vries wrote:
> 2018-07-15  Tom de Vries
>
> 	* var-tracking.c (vt_initialize): Fix pre_dec handling. Print adjusted
> 	insn slim if dump_flags request TDF_SLIM.
>
> 	* gcc.target/i386/vartrack-1.c: New test.
>
> ---
> --- a/gcc/var-tracking.c
> +++ b/gcc/var-tracking.c
> @@ -115,6 +115,7 @@
>  #include "tree-pretty-print.h"
>  #include "rtl-iter.h"
>  #include "fibonacci_heap.h"
> +#include "print-rtl.h"
>  
>  typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
>  typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;
> @@ -10208,12 +10209,17 @@ vt_initialize (void)
> 		      log_op_type (PATTERN (insn), bb, insn,
> 				   MO_ADJUST, dump_file);
> 		    VTI (bb)->mos.safe_push (mo);
> -		    VTI (bb)->out.stack_adjust += pre;
> 		  }
> 	      }
>  
> 	    cselib_hook_called = false;
> 	    adjust_insn (bb, insn);
> +
> +	    if (!frame_pointer_needed)
> +	      {
> +		if (pre)
> +		  VTI (bb)->out.stack_adjust += pre;
> +	      }

That is certainly unexpected. For the pre side-effects, they should be applied before adjusting the insn, not after that. I'll want to see this under the debugger.

> 	    if (DEBUG_MARKER_INSN_P (insn))
> 	      {
> 		reemit_marker_as_note (insn);
> @@ -10227,7 +10233,10 @@ vt_initialize (void)
> 	    cselib_process_insn (insn);
> 	    if (dump_file && (dump_flags & TDF_DETAILS))
> 	      {
> -		print_rtl_single (dump_file, insn);
> +		if (dump_flags & TDF_SLIM)
> +		  dump_insn_slim (dump_file, insn);
> +		else
> +		  print_rtl_single (dump_file, insn);
> 		dump_cselib_table (dump_file);
> 	      }
> 	  }

This part is certainly ok.

	Jakub