Re: [PATCH] Remove poly_int_pod
On Thu, Sep 28, 2023 at 10:55:46AM +0100, Richard Sandiford wrote:
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Also tested with
> Jakub's vec.h patch with the static_asserts uncommented; there were
> no errors from poly_int-related stuff.  OK to install?

LGTM (mostly as the general idea, but didn't see anything wrong in the
patch either), please give others a day or so to comment though.

	Jakub
Re: [PATCH] Remove poly_int_pod
On Fri, Sep 29, 2023 at 08:31:47AM +0200, Richard Biener wrote:
> > IIRC the primary reason we settled on gcc-4.8.x was RHEL7/CentOS7.  With
> > RHEL 7 approaching EOL, moving the baseline forward would seem to make
> > sense.
> >
> > I'd want to know if this affects folks using SuSE's enterprise distro
> > before actually making the change, but I'm broadly in favor of moving
> > forward if it's not going to have a major impact on users that are using
> > enterprise distros.
>
> We're thinking of making GCC 13 the last major release to officially
> build for SLE12, which also uses GCC 4.8, so we'd be fine with doing this
> for GCC 14.

We'd need to figure out what to do with the OS on gcc{110,112,135} on
CFarm, those are all CentOS 7.9 with gcc 4.8.5.  Sure, one possibility
would be to install some DTS gcc if CentOS has one somewhere (though
looking around, /opt/at12.0 already has gcc 8 there from IBM Advanced
Toolchain 12).

	Jakub
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Thu, Sep 28, 2023 at 04:03:55PM +0200, Jakub Jelinek wrote:
> Bet we should make wide_int_storage and widest_int_storage GTY ((user)) and
> just declare but don't define the handlers or something similar.

That doesn't catch anything, but the following incremental patch compiles
just fine, proving we don't have any wide_int in GC memory anymore after
the wide_int -> rwide_int change in dwarf2out.h.

And the attached incremental patch on top of it, which deletes even
widest_int from GC, shows that we use widest_int in GC in:
omp_declare_variant_entry::score
omp_declare_variant_entry::score_in_declare_simd_clone
nb_iter_bound::member
loop::nb_iterations_upper_bound
loop::nb_iterations_likely_upper_bound
loop::nb_iterations_estimate
ipa_bits::value
ipa_bits::mask
so pretty much everything I spoke about (except I thought loop has 2 such
members when it has 3).

--- gcc/wide-int.h	2023-09-28 14:55:40.059632413 +0200
+++ gcc/wide-int.h	2023-09-29 09:59:58.703931879 +0200
@@ -85,7 +85,7 @@
    and it always uses an inline buffer.  offset_int and rwide_int are
    GC-friendly, wide_int and widest_int are not.
 
-  3) widest_int.  This representation is an approximation of
+  4) widest_int.  This representation is an approximation of
    infinite precision math.  However, it is not really infinite
    precision math as in the GMP library.  It is really finite
    precision math where the precision is WIDEST_INT_MAX_PRECISION.
@@ -4063,21 +4063,61 @@
   return wi::smod_trunc (x, y);
 }
 
-template
+void gt_ggc_mx (generic_wide_int *) = delete;
+void gt_pch_nx (generic_wide_int *) = delete;
+void gt_pch_nx (generic_wide_int *,
+		gt_pointer_operator, void *) = delete;
+
+inline void
+gt_ggc_mx (generic_wide_int *)
+{
+}
+
+inline void
+gt_pch_nx (generic_wide_int *)
+{
+}
+
+inline void
+gt_pch_nx (generic_wide_int *, gt_pointer_operator, void *)
+{
+}
+
+template
+void
+gt_ggc_mx (generic_wide_int > *)
+{
+}
+
+template
+void
+gt_pch_nx (generic_wide_int > *)
+{
+}
+
+template
+void
+gt_pch_nx (generic_wide_int > *,
+	   gt_pointer_operator, void *)
+{
+}
+
+template
 void
-gt_ggc_mx (generic_wide_int *)
+gt_ggc_mx (generic_wide_int > *)
 {
 }
 
-template
+template
 void
-gt_pch_nx (generic_wide_int *)
+gt_pch_nx (generic_wide_int > *)
 {
 }
 
-template
+template
 void
-gt_pch_nx (generic_wide_int *, gt_pointer_operator, void *)
+gt_pch_nx (generic_wide_int > *,
+	   gt_pointer_operator, void *)
 {
 }

	Jakub

--- gcc/wide-int.h.jj	2023-09-29 09:59:58.703931879 +0200
+++ gcc/wide-int.h	2023-09-29 10:05:27.653317149 +0200
@@ -4103,23 +4103,12 @@ gt_pch_nx (generic_wide_int
-void
-gt_ggc_mx (generic_wide_int > *)
-{
-}
-
+void gt_ggc_mx (generic_wide_int > *) = delete;
 template
-void
-gt_pch_nx (generic_wide_int > *)
-{
-}
-
+void gt_pch_nx (generic_wide_int > *) = delete;
 template
-void
-gt_pch_nx (generic_wide_int > *,
-	   gt_pointer_operator, void *)
-{
-}
+void gt_pch_nx (generic_wide_int > *,
+		gt_pointer_operator, void *) = delete;
 
 template
 void
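For readers unfamiliar with the trick used above: gengtype-generated
marking code calls the gt_ggc_mx/gt_pch_nx overloads for every type
reachable from a GTY root, so declaring the overload for a given type as
deleted turns any remaining GC use of that type into a compile-time
error.  A minimal illustration (not GCC code; the type name is made up):

  struct not_gc_safe { int *heap_buf; };	/* owns heap memory */

  /* Poison GC marking of this type: any GTY(()) structure that still
     contains a not_gc_safe member fails to compile, because the
     generated marker would call this deleted overload.  */
  void gt_ggc_mx (not_gc_safe *) = delete;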
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Thu, Sep 28, 2023 at 11:53:53AM -0400, Aldy Hernandez wrote:
> > ipa_bits is even worse, because unlike niter analysis, I think it is very
> > much desirable to support IPA VRP of all supported _BitInt sizes.  Shall
> > we perhaps use trailing_wide_int storage in there, or conditionally
> > rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit, something else?
>
> BTW, we already track value/mask pairs in the irange, so I think ipa_bits
> should ultimately disappear.  Doing so would probably simplify the code
> base.

Well, having irange in GC memory would be equally bad, it does have
non-trivial destructors (plus it isn't meant to be space efficient either,
right?).  Though, perhaps we should use value-range-storage.h for that now
that it can store a value/mask pair as well?  Either tweak it on the IPA
side such that everything is stored together (both the IPA VRP and IPA bit
CCP), or say use vrange_storage with zero (or one dummy) ranges + the
value/mask pair.

	Jakub
[PATCH] use *_grow_cleared rather than *_grow on vect_unpromoted_value
On Wed, Sep 27, 2023 at 11:15:26AM +0000, Richard Biener wrote:
> > tree-vect-patterns.cc:2947   unprom.quick_grow (nops);
> >   T = vect_unpromoted_value
> > Go for quick_grow_cleared?  Something else?
>
> The CTOR zero-initializes everything, so maybe it can go.  In theory
> .set_op could also be changed to .push_op ...

So, I had a look at this and I think using quick_grow_cleared is the best
choice here.  The nops is 2 or 1 most of the time, worst case 3, so the
price of extra initialization of 4 pointer-sized-or-less members times 1,
2 or 3 doesn't seem worth bothering with; it is similar to the bitmap_head
case where we already pay the price for just one structure any time we do
  vect_unpromoted_value unprom_diff;
(and later set_op on it) or even
  vect_unpromoted_value unprom0[2];

With this patch and Richard S.'s poly_int_pod removal the static_assert
can be enabled as well and gcc builds.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
The second patch waits for the poly_int_pod removal commit and has been
just build tested, but not bootstrapped yet.

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Use
	quick_grow_cleared method on unprom rather than quick_grow.

--- gcc/tree-vect-patterns.cc.jj	2023-08-24 15:37:29.321410276 +0200
+++ gcc/tree-vect-patterns.cc	2023-09-29 09:45:27.980168865 +0200
@@ -2944,7 +2944,7 @@ vect_recog_over_widening_pattern (vec_in
   /* Check the operands.  */
   unsigned int nops = gimple_num_ops (last_stmt) - first_op;
   auto_vec unprom (nops);
-  unprom.quick_grow (nops);
+  unprom.quick_grow_cleared (nops);
   unsigned int min_precision = 0;
   bool single_use_p = false;
   for (unsigned int i = 0; i < nops; ++i)

	Jakub

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (quick_grow): Uncomment static_assert.

--- gcc/vec.h.jj	2023-09-29 10:39:09.327223861 +0200
+++ gcc/vec.h	2023-09-29 10:44:03.764108443 +0200
@@ -1396,7 +1396,7 @@
 inline void
 vec::quick_grow (unsigned len)
 {
   gcc_checking_assert (length () <= len && len <= m_vecpfx.m_alloc);
-//  static_assert (std::is_trivially_default_constructible <T>::value, "");
+  static_assert (std::is_trivially_default_constructible <T>::value, "");
   m_vecpfx.m_num = len;
 }
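A hedged sketch of the difference being discussed (simplified, not the
real vec.h; assumes a trivially copyable T and preallocated storage):

  #include <cstring>

  template <typename T>
  struct tiny_vec
  {
    T *storage;		/* assumed preallocated to sufficient capacity */
    unsigned num = 0;

    /* quick_grow: new slots contain garbage bytes.  */
    void quick_grow (unsigned len) { num = len; }

    /* quick_grow_cleared: new slots are additionally zeroed; for the
       nops <= 3 small elements above, the memset is practically free.  */
    void quick_grow_cleared (unsigned len)
    {
      std::memset (storage + num, 0, (len - num) * sizeof (T));
      num = len;
    }
  };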
Re: [PATCH] vec.h, v3: Make some ops work with non-trivially copy constructible and/or destructible types
On Fri, Sep 29, 2023 at 11:00:01AM +0100, Jonathan Wakely wrote:
> > +/* Destruct N elements in DST.  */
> > +
> > +template
> > +inline void
> > +vec_destruct (T *dst, unsigned n)
> > +{
> > +  for ( ; n; ++dst, --n)
> > +    dst->~T ();
> > +}
> > +
> >  /* Control data for vectors.  This contains the number of allocated
> >     and used slots inside a vector.  */
> >
> > @@ -310,6 +338,9 @@ va_heap::release (vec
> >    if (v == NULL)
> >      return;
> >
> > +  if (!std::is_trivially_destructible <T>::value)
>
> Do GCC's coding standards really require a space before the template
> argument list, like for a function parameter list?
> That style doesn't seem to be used elsewhere (and is not idiomatic for
> C++ at all).

Seems it is mixed; in the gcc/ subdirectory:
grep ' <[a-zA-Z]' *.h *.cc */*.h */*.cc | grep -v '#.*include' | wc -l
7143
grep '[^ ]<[a-zA-Z]' *.h *.cc */*.h */*.cc | grep -v '#.*include' | wc -l
13579

> > +  template
> > +  struct is_trivially_copyable_or_pair
>
> The space in "> >" is only required in C++98, we don't need it in C++11.

I know, I was just following what the code around it used as well.  Though
admittedly that is from the days when we needed C++98 compatibility.

	Jakub
[PATCH] vec.h: Guard most of static assertions for GCC >= 5
Hi!

As reported by Jonathan on IRC, my vec.h patch broke the build with GCC
4.8.x or 4.9.x as the system compiler, e.g. on CFarm.

The problem is that while all of the
std::is_trivially_{destructible,copyable,default_constructible} traits
are in C++11, only std::is_trivially_destructible had been implemented in
GCC 4.8; the latter two were added only in GCC 5.

Only std::is_trivially_destructible is the really important one though,
which is used to decide what pop returns and whether to invoke the
destructors or not.  The rest are solely used in static_asserts, and as
such I think it is acceptable if we don't assert those when built with
GCC 4.8 or 4.9; anybody doing a bootstrap from those system compilers or
doing builds with newer GCC will catch that.

So, the following patch guards those for 5+.  If we switch to C++14 later
on and start requiring a newer version of system GCC as well (do we
require GCC >= 5 which claims the last C++14 language features, or what
provides all C++14 library features, or GCC >= 6 which uses -std=c++14 by
default?), this patch can then be reverted.

Ok for trunk?

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (quick_insert, ordered_remove, unordered_remove,
	block_remove, qsort, sort, stablesort, quick_grow): Guard
	std::is_trivially_{copyable,default_constructible} and
	vec_detail::is_trivially_copyable_or_pair static assertions
	with GCC_VERSION >= 5000.
	(vec_detail::is_trivially_copyable_or_pair): Guard definition
	with GCC_VERSION >= 5000.

--- gcc/vec.h.jj	2023-09-29 10:59:09.830551963 +0200
+++ gcc/vec.h	2023-09-29 12:29:32.676428677 +0200
@@ -1086,7 +1086,12 @@ vec::quick_insert (unsig
 {
   gcc_checking_assert (length () < allocated ());
   gcc_checking_assert (ix <= length ());
+#if GCC_VERSION >= 5000
+  /* GCC 4.8 and 4.9 only implement std::is_trivially_destructible,
+     but not std::is_trivially_copyable nor
+     std::is_trivially_default_constructible.  */
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   memmove (slot + 1, slot, (m_vecpfx.m_num++ - ix) * sizeof (T));
   *slot = obj;
@@ -1102,7 +1107,9 @@
 inline void
 vec::ordered_remove (unsigned ix)
 {
   gcc_checking_assert (ix < length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   memmove (slot, slot + 1, (--m_vecpfx.m_num - ix) * sizeof (T));
 }
@@ -1150,7 +1157,9 @@
 inline void
 vec::unordered_remove (unsigned ix)
 {
   gcc_checking_assert (ix < length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *p = address ();
   p[ix] = p[--m_vecpfx.m_num];
 }
@@ -1164,13 +1173,16 @@
 inline void
 vec::block_remove (unsigned ix, unsigned len)
 {
   gcc_checking_assert (ix + len <= length ());
+#if GCC_VERSION >= 5000
   static_assert (std::is_trivially_copyable <T>::value, "");
+#endif
   T *slot = &address ()[ix];
   m_vecpfx.m_num -= len;
   memmove (slot, slot + len, (m_vecpfx.m_num - ix) * sizeof (T));
 }
 
+#if GCC_VERSION >= 5000
 namespace vec_detail
 {
   /* gcc_{qsort,qsort_r,stablesort_r} implementation under the hood
@@ -1189,6 +1201,7 @@ namespace vec_detail
     : std::integral_constant::value
			       && std::is_trivially_copyable::value>
   { };
 }
+#endif
 
 /* Sort the contents of this vector with qsort.  CMP is the comparison
    function to pass to qsort.  */
@@ -1197,7 +1210,9 @@
 template
 inline void
 vec::qsort (int (*cmp) (const void *, const void *))
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_qsort (address (), length (), sizeof (T), cmp);
 }
@@ -1210,7 +1225,9 @@
 inline void
 vec::sort (int (*cmp) (const void *, const void *, void *), void *data)
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_sort_r (address (), length (), sizeof (T), cmp, data);
 }
@@ -1223,7 +1240,9 @@
 inline void
 vec::stablesort (int (*cmp) (const void *, const void *, void *),
		  void *data)
 {
+#if GCC_VERSION >= 5000
   static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+#endif
   if (length () > 1)
     gcc_stablesort_r (address (), length (), sizeof (T), cmp, data);
 }
@@ -1396,7 +1415,9 @@
 inline void
 vec::quick_grow (unsigned len)
 {
   gcc_checking_assert (length () <= len && len <= m_vecpfx.m_alloc);
+#if GCC_VERSION >= 5000
 //  static_assert (std::is_trivially_default_constructible <T>::value, "");
+#endif
   m_vecpfx.m_num = len;
 }

	Jakub
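The repeated #if/#endif pairs could also be centralized in one place; a
minimal sketch under the same GCC_VERSION assumption (the macro name is
made up, the actual patch intentionally open-codes the guards):

  #include <type_traits>

  #if GCC_VERSION >= 5000
  # define VEC_ASSERT_TRIVIALLY_COPYABLE(T) \
      static_assert (std::is_trivially_copyable <T>::value, "")
  #else
  /* GCC 4.8/4.9 lack the trait entirely; the check is skipped there and
     any violation is still caught by bootstraps with newer compilers.  */
  # define VEC_ASSERT_TRIVIALLY_COPYABLE(T) static_assert (true, "")
  #endif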
Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
On Fri, Sep 29, 2023 at 11:30:06AM +0100, Richard Sandiford wrote:
> Yeah, think I agree with this.  widest_int really combined two things:
>
> (a) a way of storing any integer IL value without loss of precision
>
> (b) a way of attaching sign information
>
> Arithmetic on widest_int is dubious, because it wraps at an essentially
> arbitrary point.

Yeah, but some operations are more sensible than others.  Logical ops make
sense at any precision; addition/subtraction might better have for
widest_int a few bits of extra maximum precision over the wide_int maximum
supported precision such that overflows are caught, but say for
multiplication that would be much more.  Of course, we already document
that bswap/rotates don't make any sense on widest_int, and as I wrote,
e.g. lrshift or unsigned division of values with the MSB set are very
questionable too.

> The approach in the patch looks good to me from a quick scan FWIW.
> Will try to review over the weekend.

For the actual patch I have another worry (but without the GTY widest_int
uses and slsr etc. addressed first it can't be easily verified).
wide_int_ref_storage has VAR_PRECISION like wide_int; while I've hacked up
get_binary_precision not to allocate uselessly a lot of memory for it, I'm
afraid any time we perform some operation on wide_int_refs created from
widest_int (so, they get in most cases a reasonably small get_len () but a
huge get_precision ()) we'd uselessly allocate 255 HOST_WIDE_INTs of
memory from the heap.

So maybe wide_int should also, like widest_int in the patch, have u.val
vs. u.valp decided based on the estimated or later real get_len () rather
than get_precision ().

In the end, I think we should make sure that unless _BitInt is seen in the
sources, we don't really ever allocate any heap memory in
wide_int/widest_int.  At least unless we change the number of inline
elements of the arrays for wide_int/widest_int; if we lower that say to
some hardcoded number of limbs on all arches (say 4 or 6 or 8; 9 on x86-64
is kind of weird), allocations would happen only very rarely.  Normal
128-bit precision math certainly shouldn't trigger them.

	Jakub
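A minimal sketch of the inline-buffer-vs-heap layout being weighed above
(names simplified, INL_ELTS standing in for WIDE_INT_MAX_INL_ELTS; the
point is that the union is discriminated by the estimated length, not by
the precision):

  #include <cstdlib>

  #define INL_ELTS 9	/* the "4 or 6 or 8 limbs" debate above */

  struct sketch_int_storage
  {
    unsigned len, precision;
    union
    {
      long val[INL_ELTS];	/* common case: no heap allocation */
      long *valp;		/* only when the value needs more limbs */
    } u;

    /* Callers pass an upper bound on the result length up front.  */
    long *write_val (unsigned upper_bound_len)
    {
      if (upper_bound_len <= INL_ELTS)
	return u.val;
      return u.valp = (long *) malloc (upper_bound_len * sizeof (long));
    }
  };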
[committed] lowerbitint: Fix 2 bitint lowering bugs [PR111625]
Hi!

This patch fixes 2 issues.  One is when we want to get the address of an
uninitialized large/huge bitint SSA_NAME for multiplication/division/modulo
or conversion to floating point (binary or decimal): the code just creates
an uninitialized limb-sized variable and passes the address of that, but I
forgot to initialize *prec in that case, so it invoked UB at compile time
rather than at runtime.  As it is UB, we could use anything valid as
precision there, say 2 bits for signed, 1 bit for unsigned as the smallest
possible set of values, or full bitint precision as a fully random value.
Though, because we only pass the address of a single limb, I think it is
best to pass the bitsize of the limb.

And the other issue is that when the ranger in range_to_prec finds some
range is undefined_p (), it will assert the {lower,upper}_bound () methods
aren't called on it, but we were calling them.  So, the patch adjusts
range_to_prec to treat it like the !optimize case, full bitint precision.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111625
	PR middle-end/111637
	* gimple-lower-bitint.cc (range_to_prec): Use prec or -prec if
	r.undefined_p ().
	(bitint_large_huge::handle_operand_addr): For uninitialized operands
	use limb_prec or -limb_prec precision.

--- gcc/gimple-lower-bitint.cc.jj	2023-09-20 09:45:39.000000000 +0200
+++ gcc/gimple-lower-bitint.cc	2023-09-29 16:29:36.541473743 +0200
@@ -1932,7 +1932,8 @@ range_to_prec (tree op, gimple *stmt)
   unsigned int prec = TYPE_PRECISION (type);
 
   if (!optimize
-      || !get_range_query (cfun)->range_of_expr (r, op, stmt))
+      || !get_range_query (cfun)->range_of_expr (r, op, stmt)
+      || r.undefined_p ())
     {
       if (TYPE_UNSIGNED (type))
	return prec;
@@ -2066,6 +2067,9 @@ bitint_large_huge::handle_operand_addr (
     }
   else if (gimple_code (g) == GIMPLE_NOP)
     {
+      *prec = TYPE_UNSIGNED (TREE_TYPE (op)) ? limb_prec : -limb_prec;
+      if (prec_stored)
+	*prec_stored = *prec;
       tree var = create_tmp_var (m_limb_type);
       TREE_ADDRESSABLE (var) = 1;
       ret = build_fold_addr_expr (var);

	Jakub
[committed] gimple-match-head: Fix a pasto in function comment
Hi!

This function comment had been pasted from gimple_bitwise_equal_p and
hadn't been adjusted for the different function name.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	* gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Fix
	a pasto in function comment.

--- gcc/gimple-match-head.cc.jj	2023-08-11 09:27:36.292378545 +0200
+++ gcc/gimple-match-head.cc	2023-09-29 18:03:34.535495052 +0200
@@ -274,7 +274,7 @@ gimple_bitwise_equal_p (tree expr1, tree
 bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree));
 bool gimple_maybe_cmp (tree, tree *, tree (*) (tree));
 
-/* Helper function for bitwise_equal_p macro.  */
+/* Helper function for bitwise_inverted_equal_p macro.  */
 
 static inline bool
 gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp,
				 tree (*valueize) (tree))

	Jakub
[PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]
Hi!

I really can't figure out why one would need to add extra casts here.
type must be an integral type which has BIT_NOT_EXPR applied on it
yielding all ones, and we need a type in which negating a 0 or 1 range
will yield 0 or all ones; I think all integral types satisfy that.

This fixes PR111369, where one of the bitint*.c tests FAILs with
GCC_TEST_RUN_EXPENSIVE=1.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-09-30  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
	than using build_nonstandard_integer_type.

--- gcc/match.pd.jj	2023-09-28 11:32:16.122434235 +0200
+++ gcc/match.pd	2023-09-29 18:05:50.554640268 +0200
@@ -6742,12 +6742,7 @@ (define_operator_list SYNC_FETCH_AND_AND
    (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
	&& (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+   (bit_xor (negate (convert:type @0)) (convert:type @2)))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
Re: [PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]
On Sat, Sep 30, 2023 at 11:44:59AM +0200, Jakub Jelinek wrote:
> I really can't figure out why one would need to add extra casts.
> type must be an integral type which has BIT_NOT_EXPR applied on it
> which yields all ones and we need a type in which negating 0 or 1
> range will yield 0 or all ones, I think all integral types satisfy
> that.
> This fixes PR111369, where one of the bitint*.c tests FAILs with
> GCC_TEST_RUN_EXPENSIVE=1.

Though, I think there is a preexisting issue which the
build_nonstandard_integer_type didn't help with; if type is a signed 1-bit
precision type, then I think a ? ~t : t could be valid, but -(type)a would
invoke UB if a is 1 - the cast would make it -1 and negation of -1 in
signed 1-bit invokes UB.  So perhaps we should guard this optimization on
type having element precision > 1 or being unsigned.

Plus the (convert:type @2) didn't make sense, @2 already must have
TREE_TYPE type.

So the untested patch would then be:

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
	than using build_nonstandard_integer_type.  Punt if type is
	signed 1-bit precision type.

--- gcc/match.pd.jj	2023-09-29 18:58:42.724956659 +0200
+++ gcc/match.pd	2023-09-30 11:54:16.603280666 +0200
@@ -6741,13 +6741,9 @@ (define_operator_list SYNC_FETCH_AND_AND
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
-       && (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+       && (!wascmp || element_precision (type) == 1)
+       && (!TYPE_OVERFLOW_WRAPS (type) || element_precision (type) > 1))
+   (bit_xor (negate (convert:type @0)) @2))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
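C bit-fields give a concrete model of the signed 1-bit case discussed
above (plain C, independent of match.pd):

  #include <stdio.h>

  struct s { signed int b : 1; };	/* representable values: 0 and -1 */

  int
  main (void)
  {
    struct s x = { 0 };
    x.b = ~x.b;			/* ~0 is -1, still representable */
    printf ("%d\n", x.b);	/* prints -1 */
    /* Negating -1 would need +1, which does not fit in a 1-bit signed
       type.  At the C level the arithmetic happens in promoted int, but
       at the GIMPLE level a negation performed directly in the 1-bit
       type is exactly the UB the suggested guard punts on.  */
    return 0;
  }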
Re: [PATCH] RISC-V: Use safe_grow_cleared for vector info [PR111469]
On Sun, Oct 01, 2023 at 07:03:51AM +0800, 钟居哲 wrote:
> LGTM.
>
> juzhe.zh...@rivai.ai
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index af8c31d873c..4b06d93e7f9 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -2417,8 +2417,8 @@ vector_infos_manager::vector_infos_manager ()
>    vector_antin = nullptr;
>    vector_antout = nullptr;
>    vector_earliest = nullptr;
> -  vector_insn_infos.safe_grow (get_max_uid ());
> -  vector_block_infos.safe_grow (last_basic_block_for_fn (cfun));
> +  vector_insn_infos.safe_grow_cleared (get_max_uid ());
> +  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
>    if (!optimize)
>      {
>        basic_block cfg_bb;

Note, while it works around the build failure, I strongly doubt it is the
right fix in this case.  The other spots which have been changed similarly
are quite different; all the vec cases have been followed (almost)
immediately by bitmap_initialize of all the elements, or of just 1-3
elements and their actual uses.

The above is very different.  Sizing a vector with get_max_uid () means it
is very likely very sparse, and so constructing every element in the array
seems undesirable.  While it is true that e.g. IRA or DF use vectors
indexed by INSN_UID, I think those vectors pretty much always have pointer
elements and say pool allocate what they point to upon first access, while
vector_insn_info is huge (48 bytes on 64-bit hosts from what I can see)
even without much attempt to make it shorter (e.g. the vl_vtype_info
member ordering of 2 pointer-sized members, a 1-byte member, a 4-byte
member, and 3 1-byte members creates unnecessary padding).

The reasons why arrays indexed by INSN_UID are sparse are 3:
1) we have the --param min-nondebug-insn-uid= parameter (where, albeit
   usually just for debugging, one can specify a high start for those;
   say --param min-nondebug-insn-uid=10 means debug insns will be created
   with INSN_UID 1 and up, while non-DEBUG_INSNs only with INSN_UID 10
   and up, so even a function containing just 3-4 insns will have
   get_max_uid () of 14 or so; the above allocates in that case
   14 * 48 bytes (bad) and newly constructs all the elements in it
2) INSN_UIDs are given to all newly created insns during RTL
   optimizations; when an insn is deleted, its INSN_UID is not reused, so
   there can be large gaps; the more churn in RTL optimizations, the
   larger
3) INSN_UIDs are given even to BARRIERs, DEBUG_INSNs etc., plus the
   question is if the algorithm really needs to track every "normal" insn
   as well, or whether it shouldn't track just those that in some way use
   vectors or are relevant to it; many scalar instructions might be
   uninteresting, DEBUG_INSNs certainly should be uninteresting (they
   must not affect code generation), etc.

So, I think having something lazily allocated and initialized on demand
might be a compile-time memory and time win.

For basic blocks we perform some block number compactions during
compilation, so those should be denser.  But the structure for each basic
block is even bigger (104 bytes, because it contains 2 x 48 plus
probability).  E.g. DF uses for many purposes LUIDs instead, which need to
be recomputed but are logical uids within each basic block.

	Jakub
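A hedged sketch of the lazily allocated alternative suggested above
(hypothetical names, not the riscv-vsetvl.cc code; object_allocator is the
pool allocator used elsewhere in GCC):

  /* Sparse per-insn info: only a pointer per INSN_UID, with the big
     48-byte records pool-allocated on first touch.  */
  object_allocator<vector_insn_info> info_pool ("vsetvl infos");
  vec<vector_insn_info *> infos;

  vector_insn_info *
  get_info (rtx_insn *insn)
  {
    unsigned uid = INSN_UID (insn);
    if (uid >= infos.length ())
      infos.safe_grow_cleared (uid + 1);
    if (!infos[uid])
      infos[uid] = info_pool.allocate ();	/* construct on demand */
    return infos[uid];
  }

Debug insns, barriers and uninteresting scalar insns then cost 8 bytes of
null pointer each instead of a fully constructed 48-byte record.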
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Mon, Oct 02, 2023 at 01:38:53PM +0100, Tamar Christina wrote:
> Hi All,
>
> I recently committed a patch that uses a nested std::pair in the second
> argument.  It temporarily adds a second ranking variable for sorting and
> then later drops it.
>
> This hits the newly added assert in vec.h.  This assert made some
> relaxation for std::pair but doesn't allow this case through.  The patch
> allows a recursive std::pair in the second argument which fixes
> bootstrap.

I must say I still don't understand why using a
  struct ifcvt_arg_entry { tree arg; unsigned len, occur; };
with comments describing what the members mean wouldn't be a better fix;
in the sorting function, what exactly x{1,2}.second.first and
x{1,2}.second.second mean isn't easily understandable, neither from the
identifiers nor from any comments.

Seems because you use 2 separate vectors, one with just tree elements and
another with those tree elements + 2 unsigned values cached from them for
the sorting purpose, and then rewrite the original tree vector after
sorting, I don't really see why a nested std::pair would be a better match
for it than a named structure.

Furthermore, why populate args first, then compute the extra 2 integers in
another loop pushing to argsKV, and finally overwrite args with sorted
values?  Can't the first loop push the tree with the 2 integers already?
And what is the point of not using this structure later on when both the
args and argsKV vectors are live until the end of the same function?
Can't you either pass that argsKV to the others, having just one vector,
or at least release the other vector when you don't really need it?

Formatting style, swap? arg1 : arg0 isn't correctly formatted, missing
space before ?.  Also, ArgEntry is CamelCase which we (usually) don't use
in GCC and additionally doesn't seem to be unique enough for ODR purposes.
Ditto argsKV.

> It should also still maintain the invariant that was being tested here
> since the nested arguments should still be trivially copyable.
>
> Bootstrapped on aarch64-none-linux-gnu, x86_64-linux-gnu, and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>	vec.h (struct is_trivially_copyable_or_pair): Check recursively in

Missing "* " above.

>	second arg.

	Jakub
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Tue, Oct 03, 2023 at 10:27:16AM +0000, Tamar Christina wrote:
> +/* Structure used to track meta-data on PHI arguments used to generate
> +   most efficient comparison sequence to slatten a PHI node.  */

                                           ^^^ typo (at least, I've never
heard of this word, and wiktionary doesn't know it either (except for
Danish/Swedish))

> @@ -2045,6 +2065,25 @@ gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
>    return lhs;
>  }
>

Perhaps add a short function comment here?

> +static int
> +cmp_arg_entry (const void *p1, const void *p2)
> +{
> +  const ifcvt_arg_entry sval1 = *(const ifcvt_arg_entry *)p1;
> +  const ifcvt_arg_entry sval2 = *(const ifcvt_arg_entry *)p2;
> +
> +  if (sval1.num_compares < sval2.num_compares)
> +    return -1;
> +  else if (sval1.num_compares > sval2.num_compares)
> +    return 1;
> +
> +  if (sval1.occurs < sval2.occurs)
> +    return -1;
> +  else if (sval1.occurs > sval2.occurs)
> +    return 1;
> +
> +  return 0;
> +}
> +
> @@ -2167,61 +2206,53 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>    /* Create hashmap for PHI node which contain vector of argument indexes
>       having the same value.  */
>    bool swap = false;
> -  hash_map > phi_arg_map;
> +  hash_map > phi_arg_map;
>    unsigned int num_args = gimple_phi_num_args (phi);
>    /* Vector of different PHI argument values.  */
> -  auto_vec args (num_args);
> +  auto_vec args;
>
> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> +     where they are in the PHI node.  The indices will be used to determine
> +     the conditions to apply and their complexity.  */
>    for (i = 0; i < num_args; i++)
>      {
>        tree arg;
>
>        arg = gimple_phi_arg_def (phi, i);
> -      if (!phi_arg_map.get (arg))
> -	args.quick_push (arg);
>        phi_arg_map.get_or_insert (arg).safe_push (i);
>      }
>
> -  /* Determine element with max number of occurrences and complexity.  Looking at only
> -     number of occurrences as a measure for complexity isn't enough as all usages can
> -     be unique but the comparisons to reach the PHI node differ per branch.  */
> -  typedef std::pair > ArgEntry;
> -  auto_vec argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking
> +     at only number of occurrences as a measure for complexity isn't enough as
> +     all usages can be unique but the comparisons to reach the PHI node differ
> +     per branch.  */
> +  for (auto entry : phi_arg_map)
>      {
>        unsigned int len = 0;
> -      for (int index : phi_arg_map.get (args[i]))
> +      for (int index : entry.second)
>	 {
>	   edge e = gimple_phi_arg_edge (phi, index);
>	   len += get_bb_num_predicate_stmts (e->src);
>	 }
>
> -      unsigned occur = phi_arg_map.get (args[i])->length ();
> +      unsigned occur = entry.second.length ();
>        if (dump_file && (dump_flags & TDF_DETAILS))
>	 fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -      argsKV.safe_push ({ args[i], { len, occur }});
> +      args.safe_push ({ entry.first, len, occur, entry.second });
>      }
>
>    /* Sort elements based on rankings ARGS.  */
> -  std::sort(argsKV.begin(), argsKV.end(), [](const ArgEntry &left,
> -					     const ArgEntry &right) {
> -    return left.second < right.second;
> -  });
> -
> -  for (i = 0; i < args.length (); i++)
> -    args[i] = argsKV[i].first;
> +  args.qsort (cmp_arg_entry);

I admit I don't know what you're using the args vector later on for and
whether its ordering affects code generation, but because you qsort it I
assume it does.  My worry is that a hash_map traversal might not be in the
same order on all hosts, and similarly qsort doesn't achieve a stable sort
in case the num_compares and occurs members are equal for two or more
different arguments.  Can that ever happen?  We have the stablesort method
instead of qsort, but that would require consistent ordering in the vector
(std::sort doesn't ensure stable sorting either).

If it is a non-issue, the patch is ok with the above nits fixed.
Otherwise perhaps we'd need to push in the first loop into the vector (but
that
  if (!phi_arg_map.get (arg))
    args.quick_push (arg);
  phi_arg_map.get_or_insert (arg).safe_push (i);
in there was quite inefficient, better would be
  bool existed;
  phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
  if (!existed)
    args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL });
or something similar), plus use stablesort.  Or add another compared
member which would be the first position.

	Jakub
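A hedged sketch of the "another compared member" option: record the
argument's first position in the PHI and compare it last (the first_pos
member name is made up):

  /* ... at the end of cmp_arg_entry, after num_compares and occurs:  */
  if (sval1.first_pos < sval2.first_pos)
    return -1;
  else if (sval1.first_pos > sval2.first_pos)
    return 1;
  return 0;

Since two distinct arguments can never share the same first PHI index, the
comparator never returns 0 for different entries, so neither the hash_map
traversal order nor the host qsort implementation can change the final
ordering.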
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Tue, Oct 03, 2023 at 11:41:01AM +0000, Tamar Christina wrote:
> > We have the stablesort method instead of qsort but that would require
> > consistent ordering in the vector (std::sort doesn't ensure stable
> > sorting either).
> >
> > If it is a non-issue, the patch is ok with the above nits fixed.  Otherwise
> > perhaps we'd need to push in the first loop into the vector (but that
> >   if (!phi_arg_map.get (arg))
> >     args.quick_push (arg);
> >   phi_arg_map.get_or_insert (arg).safe_push (i);
> > in there was quite inefficient, better would be
> >   bool existed;
> >   phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
> >   if (!existed)
> >     args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL });
> > or something similar), plus use stablesort.  Or add another compared
> > member which would be the first position.
>
> Hmm, the problem here is that it would make the second loop that fills in
> the len quadratic, as it has to search for arg in the list.  I suppose I
> could push a pointer to the struct instead of `i` in the hashmap and the
> element into args and update the pointer as we go along?  Would that work?

Only if the second loop traverses the hashmap elements and for each tries
to find the corresponding vector element.  If instead you do what you've
done before in the second loop, walk the vector and for each arg in there
look up phi_args_map.get (v.arg) (but please just once, vanilla trunk
looks it up twice in
  for (int index : phi_arg_map.get (args[i]))
    {
      edge e = gimple_phi_arg_edge (phi, index);
      len += get_bb_num_predicate_stmts (e->src);
    }

  unsigned occur = phi_arg_map.get (args[i])->length ();
), then I don't think it would be quadratic.

	Jakub
Re: [PATCH] contrib/mklog.py: Fix issues reported by flake8
On Tue, Oct 03, 2023 at 02:02:40PM +0200, Martin Jambor wrote:
> Hi,
>
> the testing infrastructure built by Martin Liška contains checking a
> few python scripts in contrib with the tool flake8.  That tool recently
> complains that:
>
> contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
> contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1
>
> I'd like to silence these with the following, hopefully trivial,
> changes.  However, I have only tested the changes by running flake8
> again and running ./contrib/mklog.py --help.
>
> Is this good for trunk?  (Or should I stop using flake8 instead?)
>
> Thanks,
>
> Martin
>
> contrib/ChangeLog:
>
> 2023-10-03  Martin Jambor  <mjambor@suse.cz>
>
>	* mklog.py (skip_line_in_changelog): Compare to None using is instead
>	of ==, add an extra newline after the function.

Ok, thanks.

	Jakub
[PATCH] match.pd: Fix up a ? cst1 : cst2 regression on signed bool [PR111668]
Hi!

My relatively recent changes to these simplifiers to avoid doing
build_nonstandard_integer_type (primarily for BITINT_TYPE) broke PR111668,
a recurrence of the PR110487 bug.  I thought the
build_nonstandard_integer_type isn't ever needed there, but there is one
special case where it is.

For the a ? -1 : 0 and a ? 0 : -1 simplifications there are actually 3
different cases.  One is for signed 1-bit precision types (signed kind of
implied by integer_all_onesp, because otherwise it would match
integer_onep earlier), where the simplifier weirdly was matching them
using the a ? powerof2cst : 0 -> a << (log2(powerof2cst)) simplification
and then another simplifier optimizing away the left shift when
log2(powerof2cst) was 0.

Another one is signed BOOLEAN_TYPE with precision > 1, where indeed we
shouldn't be doing the negation in type, because it isn't well defined in
that type; the type only has 2 valid values, 0 and -1.  As an alternative,
we could also e.g. cast to a signed 1-bit precision BOOLEAN_TYPE and then
extend to type.

And the last case is what we were doing for types which have both 1 and -1
(as all ones) as valid values (i.e. all signed/unsigned ENUMERAL_TYPEs,
INTEGRAL_TYPEs and BITINT_TYPEs with precision > 1).

The following patch avoids the hops through << 0 for 1-bit precision and
uses build_nonstandard_integer_type solely for the BOOLEAN_TYPE types
(where we have a guarantee the precision is reasonably small; nothing
ought to create 129+ bit precision BOOLEAN_TYPEs).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111668
	* match.pd (a ? CST1 : CST2): Handle the a ? -1 : 0 and
	a ? 0 : -1 cases before the powerof2cst cases and differentiate
	between 1-bit precision types, larger precision boolean types
	and other integral types.  Fix comment pastos and formatting.

--- gcc/match.pd.jj	2023-10-02 09:42:01.657836005 +0200
+++ gcc/match.pd	2023-10-03 10:33:30.817614648 +0200
@@ -5100,36 +5100,53 @@ (define_operator_list SYNC_FETCH_AND_AND
  (switch
   (if (integer_zerop (@2))
    (switch
-    /* a ? 1 : 0 -> a if 0 and 1 are integral types. */
+    /* a ? 1 : 0 -> a if 0 and 1 are integral types.  */
     (if (integer_onep (@1))
      (convert (convert:boolean_type_node @0)))
+    /* a ? -1 : 0 -> -a.  */
+    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
+     (if (TYPE_PRECISION (type) == 1)
+      /* For signed 1-bit precision just cast bool to the type.  */
+      (convert (convert:boolean_type_node @0))
+      (if (TREE_CODE (type) == BOOLEAN_TYPE)
+       (with {
+	  tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
+						      TYPE_UNSIGNED (type));
+	}
+	(convert (negate (convert:intt (convert:boolean_type_node @0)))))
+       (negate (convert:type (convert:boolean_type_node @0))))))
     /* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
     (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@1))
      (with {
	tree shift = build_int_cst (integer_type_node, tree_log2 (@1));
       }
-      (lshift (convert (convert:boolean_type_node @0)) { shift; })))
-    /* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
-       here as the powerof2cst case above will handle that case correctly.  */
-    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
-     (negate (convert:type (convert:boolean_type_node @0))))))
+      (lshift (convert (convert:boolean_type_node @0)) { shift; })))))
   (if (integer_zerop (@1))
    (switch
-    /* a ? 0 : 1 -> !a. */
+    /* a ? 0 : 1 -> !a.  */
     (if (integer_onep (@2))
      (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; })))
-    /* a ? powerof2cst : 0 -> (!a) << (log2(powerof2cst)) */
+    /* a ? 0 : -1 -> -(!a).  */
+    (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
+     (if (TYPE_PRECISION (type) == 1)
+      /* For signed 1-bit precision just cast bool to the type.  */
+      (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; }))
+      (if (TREE_CODE (type) == BOOLEAN_TYPE)
+       (with {
+	  tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
						      TYPE_UNSIGNED (type));
	}
+	(convert (negate (convert:intt (bit_xor (convert:boolean_type_node @0)
						 { boolean_true_node; })))))
+       (negate (convert:type (bit_xor (convert:boolean_type_node @0)
				       { boolean_true_node; }))))))
+    /* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
     (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@2))
      (with {
	tree shift = build_int_cst (integer_type_node, tree_log2 (@2));
[PATCH] match.pd: Avoid other build_nonstandard_integer_type calls [PR111369]
Hi!

On Sat, Sep 30, 2023 at 11:57:38AM +0200, Jakub Jelinek wrote:
> > This fixes PR111369, where one of the bitint*.c tests FAILs with
> > GCC_TEST_RUN_EXPENSIVE=1.
>
> Though, I think there is a preexisting issue which the
> build_nonstandard_integer_type didn't help with; if type is signed 1-bit
> precision, then I think a ? ~t : t could be valid, but -(type)a would invoke
> UB if a is 1 - the cast would make it -1 and negation of -1 in signed 1-bit
> invokes UB.
> So perhaps we should guard this optimization on type having element precision
> > 1 or being unsigned.  Plus the (convert:type @2) didn't make sense, @2
> already must have TREE_TYPE type.

In the light of the PR111668 patch, which shows that
build_nonstandard_integer_type is needed (at least for some signed
prec > 1 BOOLEAN_TYPEs if we use e.g. negation), I've reworked this patch
and handled the last problematic build_nonstandard_integer_type call in
there as well.

In the x == cstN ? cst4 : cst3 optimization it uses
build_nonstandard_integer_type solely for BOOLEAN_TYPEs (I really don't
see why ENUMERAL_TYPEs would be a problem, we treat them in GIMPLE as
uselessly convertible to same precision/sign INTEGER_TYPEs); for
INTEGER_TYPEs it is really a no-op (might return a different type, but
always an INTEGER_TYPE with the same TYPE_PRECISION and same
TYPE_UNSIGNED) and for BITINT_TYPEs with larger precisions really harmful
(we shouldn't create large precision INTEGER_TYPEs).

The a?~t:t optimization just omits the negation of a in type for 1-bit
precision types or any BOOLEAN_TYPEs.  I think that is correct, because
for both signed and unsigned 1-bit precision types, a cast of a bool value
to type already yields 0, -1 or 0, 1 values, and for 1-bit precision
negation of that is still 0, -1 or 0, 1 (except for sometimes invoking
UB).  And for signed larger precision BOOLEAN_TYPEs I think it is correct
as well; a cast of [0, 1] to type yields 0, -1, and those can be xored
with 0 or -1 to yield the proper result, any other values would be UB.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/111369
	* match.pd (x == cstN ? cst4 : cst3): Use
	build_nonstandard_integer_type only if type1 is BOOLEAN_TYPE.
	Fix comment typo.  Formatting fix.
	(a?~t:t -> (-(a))^t): Always convert to type rather than using
	build_nonstandard_integer_type.  Perform negation only if type
	has precision > 1 and is not signed BOOLEAN_TYPE.

--- gcc/match.pd.jj	2023-10-03 10:33:30.817614648 +0200
+++ gcc/match.pd	2023-10-03 11:29:54.089566764 +0200
@@ -5178,7 +5178,7 @@ (define_operator_list SYNC_FETCH_AND_AND
 /* Optimize
    # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
-   x_5 ? cstN ? cst4 : cst3
+   x_5 == cstN ? cst4 : cst3
    # op is == or != and N is 1 or 2
    to r_6 = x_5 + (min (cst3, cst4) - cst1) or
    r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
@@ -5214,7 +5214,8 @@ (define_operator_list SYNC_FETCH_AND_AND
	 type1 = type;
       auto prec = TYPE_PRECISION (type1);
       auto unsign = TYPE_UNSIGNED (type1);
-      type1 = build_nonstandard_integer_type (prec, unsign);
+      if (TREE_CODE (type1) == BOOLEAN_TYPE)
+	type1 = build_nonstandard_integer_type (prec, unsign);
       min = wide_int::from (min, prec, TYPE_SIGN (TREE_TYPE (@0)));
       wide_int a = wide_int::from (wi::to_wide (arg0), prec,
@@ -5253,14 +5254,7 @@ (define_operator_list SYNC_FETCH_AND_AND
      }
      (if (code == PLUS_EXPR)
       (convert (plus (convert:type1 @0) { arg; }))
-      (convert (minus { arg; } (convert:type1 @0)))
-     )
-    )
-   )
-  )
- )
- )
-)
+      (convert (minus { arg; } (convert:type1 @0)))))))))))
 #endif
 
 (simplify
@@ -6758,13 +6752,11 @@ (define_operator_list SYNC_FETCH_AND_AND
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
	&& bitwise_inverted_equal_p (@1, @2, wascmp)
-       && (!wascmp || element_precision (type) == 1))
-   (with {
-     auto prec = TYPE_PRECISION (type);
-     auto unsign = TYPE_UNSIGNED (type);
-     tree inttype = build_nonstandard_integer_type (prec, unsign);
-    }
-    (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
+       && (!wascmp || TYPE_PRECISION (type) == 1))
+   (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
+	|| TYPE_PRECISION (type) == 1)
+    (bit_xor (convert:type @0) @2)
+    (bit_xor (negate (convert:type @0)) @2)))))
 #endif
 
 /* Simplify pointer equality compares using PTA.  */

	Jakub
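A worked instance of the x == cstN ? cst4 : cst3 transform touched above
(plain C semantics rather than match.pd notation):

  int f (int x)	/* suppose VRP proved x is in [0, 1], so cst1 == 0 */
  {
    return x == 0 ? 5 : 6;	/* cst4 == 5, cst3 == 6 */
  }

  /* min (cst3, cst4) == 5, so the PLUS form applies:
       r = x + (min (cst3, cst4) - cst1) == x + 5
     Check: x == 0 -> 5, x == 1 -> 6.  The conditional becomes one
     addition with no branch, and after this patch only BOOLEAN_TYPE
     results take the build_nonstandard_integer_type detour.  */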
Re: [Patch] libgomp.texi: Clarify that no other OpenMP context selectors are implemented
On Wed, Oct 04, 2023 at 01:08:15PM +0200, Tobias Burnus wrote:
> I got confused myself when reading
>
> https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html
>
> especially with regards to other platforms like PowerPC.
>
> It turned out that the list is complete, kind of.  For 'arch' and 'isa'
> those are the only ones - if we want to have more, it has to be
> implemented (→ cf. PR105640).
>
> For 'kind': The host compiler always matches 'kind=host'; this also
> applies to AMD GCN and Nvidia PTX - when compiled as a stand-alone
> compiler.  Those two also match 'kind=gpu' (both as a stand-alone
> compiler(*) and for offloading).
>
> The attached documentation patch attempts to clarify this for both users
> and implementers; for the latter, there is also a comment in the .texi
> with more details.
>
> Comments, suggestions, remarks?
>
> Tobias
>
> (*) This could be changed by checking for "#ifndef ACCEL_COMPILER" in
> gcc/config/{gcn,nvptx}/ but that does not seem to be worthwhile.
>
> libgomp.texi: Clarify that no other OpenMP context selectors are implemented
>
> libgomp/ChangeLog:
>
>	* libgomp.texi (OpenMP Context Selectors): Clarify 'kind' trait
>	and that other target archs have no 'arch'/'isa' traits implemented.

LGTM.

	Jakub
Re: [committed] libstdc++: Define std::numeric_limits<_FloatNN> before C++23
On Wed, Oct 04, 2023 at 09:47:34AM -0700, Pranav Kant wrote:
> Thanks for bringing this to my attention.  I am working on a fix.  Will
> keep this thread posted.
>
> Clang *does* define this macro only when the float128 type is available.
> But the problem seems to be that clang doesn't define the _Float128 alias
> type, which is what's being used here.  It only defines the __float128
> type.  Should be easy to fix.

_Float128 (and the other _FloatNN types) are standard C23 and C++23 types;
they certainly can't be simple aliases to __float128 or other similar
types.  In C++ they mangle differently and have significantly different
behavior in lots of language details.  So, you need to implement the
https://wg21.link/P1467R9 paper, rather than doing quick hacks.

	Jakub
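A small demonstration of the distinction (C++ with GCC on x86-64, where
the Itanium ABI mangles _Float128 as DF128_ and __float128 as g):

  // Two different overloads, not a redefinition: _Float128 is a
  // distinct standard type, not an alias of __float128.
  void f (_Float128);	// mangles as _Z1fDF128_
  void f (__float128);	// mangles as _Z1fg

  static_assert (!__is_same (_Float128, __float128),
		 "distinct types in GCC");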
[PATCH] ipa: Remove ipa_bits
Hi!

The following patch removes the ipa_bits struct pointer/vector from ipa
jump functions and ipa cp transformations.

The reason is that the struct uses widest_int to represent the mask/value
pair, which in the RFC patches to allow larger precisions for
wide_int/widest_int is GC unfriendly, because those types become
non-trivially default constructible/copyable/destructible.  One option
would be to use trailing_wide_int for that instead, but as pointed out by
Aldy, irange_storage, which we already use under the hood for ipa_vr when
the type of the parameter is integral or pointer, already stores the
mask/value pair because VRP now does the bit cp as well.

So, this patch just uses m_vr to store both the value range and the
bitmask.  There is still separate propagation of the ipcp_bits_lattice
from propagation of the ipcp_vr_lattice, but when storing we merge the two
into the same container.

I've bootstrapped/regtested a slightly older version of this patch on
x86_64-linux and i686-linux and that version regressed
+FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
+FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
+FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param 0 to 0x7"
+FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param 0 to 0xf"
The last 2 were solely about the earlier patch not actually copying the
if (dump_file) dumping of the message that we set some mask for some
parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).  The
first testcase is a test for -fno-ipa-bit-cp disabling bit cp for
alignments.  For integral types I'm afraid it is a lost cause when
-fno-ipa-bit-cp -fipa-vrp is on, when value ranges track bit cp as well,
but for pointer alignments I've added
  && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
and
  && opt_for_fn (node->decl, flag_ipa_bit_cp)
guards such that even just -fno-ipa-bit-cp disables it (alternatively we
could just add -fno-ipa-vrp to propalign-3.c dg-options).

Ok for trunk if this passes another bootstrap/regtest?  Or defer until it
is really needed (when the wide_int/widest_int changes are about to be
committed)?

2023-10-05  Jakub Jelinek  <jakub@redhat.com>

	* ipa-prop.h (ipa_bits): Remove.
	(struct ipa_jump_func): Remove bits member.
	(struct ipcp_transformation): Remove bits member, adjust
	ctor and dtor.
	(ipa_get_ipa_bits_for_value): Remove.
	* ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove.
	(ipa_bits_hash_table): Remove.
	(ipa_print_node_jump_functions_for_edge): Don't print bits.
	(ipa_get_ipa_bits_for_value): Remove.
	(ipa_set_jfunc_bits): Remove.
	(ipa_compute_jump_functions_for_edge): For pointers query
	pointer alignment before ipa_set_jfunc_vr and update_bitmask
	in there.  For integral types, just rely on bitmask already
	being handled in value ranges.
	(ipa_check_create_edge_args): Don't create ipa_bits_hash_table.
	(ipcp_transformation_initialize): Neither here.
	(ipcp_transformation_t::duplicate): Don't copy bits vector.
	(ipa_write_jump_function): Don't stream bits here.
	(ipa_read_jump_function): Neither here.
	(useful_ipcp_transformation_info_p): Don't test bits vec.
	(write_ipcp_transformation_info): Don't stream bits here.
	(read_ipcp_transformation_info): Neither here.
	(ipcp_get_parm_bits): Get mask and value from m_vr rather
	than bits.
	(ipcp_update_bits): Remove.
	(ipcp_update_vr): For pointers, set_ptr_info_alignment from
	bitmask stored in value range.
	(ipcp_transform_function): Don't test bits vector, don't call
	ipcp_update_bits.
	* ipa-cp.cc (propagate_bits_across_jump_function): Don't use
	jfunc->bits, instead get mask and value from jfunc->m_vr.
	(ipcp_store_bits_results): Remove.
	(ipcp_store_vr_results): Incorporate parts of
	ipcp_store_bits_results here, merge the bitmasks with value
	range if both are supplied.
	(ipcp_driver): Don't call ipcp_store_bits_results.
	* ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits
	clearing.

--- gcc/ipa-prop.h.jj	2023-10-05 11:32:40.172739988 +0200
+++ gcc/ipa-prop.h	2023-10-05 11:36:45.405378086 +0200
@@ -292,18 +292,6 @@ public:
   array_slice m_elts;
 };
 
-/* Information about zero/non-zero bits.  */
-class GTY(()) ipa_bits
-{
-public:
-  /* The propagated value.  */
-  widest_int value;
-  /* Mask corresponding to the value.
-     Similar to ccp_lattice_t, if xth bit of mask is 0,
-     implies xth bit of value is constant.  */
-  widest_int mask;
-};
-
 /* Info about value ranges.  */
 
 class GTY(()) ipa_vr
@@ -342,11 +330,6 @@ struct GTY (()) ipa_jump_func
    and its description.  */
 
 struct ipa_agg_jump_f
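A hedged sketch of what consumers look like once the pair lives in the
value range (the irange_bitmask accessors below follow the GCC 14 vrange
API as I understand it; treat the exact names as an assumption):

  void
  use_param_bits (const irange &r)
  {
    irange_bitmask bm = r.get_bitmask ();
    wide_int value = bm.value ();	/* bit values where known */
    wide_int mask = bm.mask ();		/* 1 bits are unknown */
    /* e.g. bits guaranteed zero in the parameter: ~(value | mask).  */
  }

This is why ipa_bits can disappear: the same information rides along in
m_vr, merged with the propagated range when both are available.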
[committed] sreal: Fix typo in function name
Hi!

My earlier version of the ipa_bits removal patch resulted in self-test
failures in sreal.  When debugging it, I was really confused that I
couldn't find the verify_arithmetics function in the source.  Turns out it
had bad spelling...

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-10-05  Jakub Jelinek  <jakub@redhat.com>

	* sreal.cc (verify_aritmetics): Rename to ...
	(verify_arithmetics): ... this.
	(sreal_verify_arithmetics): Adjust caller.

--- gcc/sreal.cc.jj	2023-08-08 15:55:08.366138409 +0200
+++ gcc/sreal.cc	2023-10-05 10:20:20.528806377 +0200
@@ -323,7 +323,7 @@ sreal_verify_basics (void)
    of given arguments A and B.  */
 
 static void
-verify_aritmetics (int64_t a, int64_t b)
+verify_arithmetics (int64_t a, int64_t b)
 {
   ASSERT_EQ (a, -(-(sreal (a))).to_int ());
   ASSERT_EQ (a < b, sreal (a) < sreal (b));
@@ -356,7 +356,7 @@ sreal_verify_arithmetics (void)
	int a = values[i];
	int b = values[j];
 
-	verify_aritmetics (a, b);
+	verify_arithmetics (a, b);
       }
 }

	Jakub
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Thu, Oct 05, 2023 at 02:01:40PM +0000, Tamar Christina wrote:
> gcc/ChangeLog:
>
>	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>	(typedef struct ifcvt_arg_entry): New.
>	(cmp_arg_entry): New.
>	(gen_phi_arg_condition, gen_phi_nest_statement,
>	predicate_scalar_phi): Use them.

> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> +     where they are in the PHI node.  The indices will be used to determine
> +     the conditions to apply and their complexity.  */
> +  auto_vec unique_args (num_args);
>    for (i = 0; i < num_args; i++)
>      {
>        tree arg;
>
>        arg = gimple_phi_arg_def (phi, i);
>        if (!phi_arg_map.get (arg))
> -	args.quick_push (arg);
> +	unique_args.quick_push (arg);
>        phi_arg_map.get_or_insert (arg).safe_push (i);
>      }

I meant instead of using another vector (unique_args) just do
  args.quick_push ({ arg, 0, 0, NULL });
above (to avoid needing another allocation etc.).

> -  /* Determine element with max number of occurrences and complexity.  Looking at only
> -     number of occurrences as a measure for complexity isn't enough as all usages can
> -     be unique but the comparisons to reach the PHI node differ per branch.  */
> -  typedef std::pair > ArgEntry;
> -  auto_vec argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking
> +     at only number of occurrences as a measure for complexity isn't enough as
> +     all usages can be unique but the comparisons to reach the PHI node differ
> +     per branch.  */
> +  for (auto arg : unique_args)

And then
  for (auto &entry : args)
here, with entry.arg instead of arg, and

>      {
>        unsigned int len = 0;
> -      for (int index : phi_arg_map.get (args[i]))
> +      vec *indices = phi_arg_map.get (arg);
> +      for (int index : *indices)
>	 {
>	   edge e = gimple_phi_arg_edge (phi, index);
>	   len += get_bb_num_predicate_stmts (e->src);
>	 }
>
> -      unsigned occur = phi_arg_map.get (args[i])->length ();
> +      unsigned occur = indices->length ();
>        if (dump_file && (dump_flags & TDF_DETAILS))
>	 fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -      argsKV.safe_push ({ args[i], { len, occur }});
> +      args.safe_push ({ arg, len, occur, indices });

either
  entry.num_compares = len;
  entry.occur = occur;
  entry.indices = indices;
here, or just use entry.{num_compares,occur,indices} directly instead of
the extra automatic vars.

>      }
>
> +  unique_args.release ();

Plus drop this.

Though, if Richi or Jeff think this is ok as is, I won't stand against it.

	Jakub
Re: [PATCH] ipa: Remove ipa_bits
On Thu, Oct 05, 2023 at 04:42:42PM +0200, Jan Hubicka wrote:
> It does look like a nice cleanup to me.
> I wonder if you did some compare of the bit information propagated with
> new code and old code?  Theoretically they should be equivalent?

Beyond the testsuite, I've tried

__attribute__((noinline, noclone)) static int
foo (int x, int y, int *p)
{
  return p[x + y];
}

__attribute__((noinline, noclone)) static int
bar (int x, int y, int *p)
{
  return foo (x, y & 0xff, p);
}

int
baz (int x, int y, int *p)
{
  return bar ((x & 0x) | 0x12345678, (x & 0x) | 0x87654321,
	      __builtin_assume_aligned (p, 32, 16));
}

and -fdump-tree-ccp2-alias was identical before/after the patch, so the
expected

# RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
# ALIGN = 32, MISALIGN = 16
int * p_7(D) = p;

in foo (-O2).  With -O2 -fno-ipa-vrp

# RANGE [irange] int [-INF, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [-INF, +INF] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
# ALIGN = 32, MISALIGN = 16
int * p_7(D) = p;

and -O2 -fno-ipa-bit-cp

# RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
int x_5(D) = x;
# RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
int y_6(D) = y;
# PT = nonlocal null
int * p_7(D) = p;

All that is the same as before.

	Jakub
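For readers of these dumps: a MASK bit of 0 means the corresponding bit is
known to equal the VALUE bit; MASK bits of 1 are unknown.  A sketch of one
transfer function in that representation (the same shape as
tree-ssa-ccp.cc's bit_value_binop uses for BIT_IOR_EXPR):

  /* z = x | y: a result bit is known iff it is known to be 1 in either
     input, or known to be 0 in both.  */
  value_z = value_x | value_y;
  mask_z = (mask_x | mask_y) & ~(value_x | value_y);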
Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h
On Fri, Oct 06, 2023 at 02:23:06AM +0000, Tamar Christina wrote:
> gcc/ChangeLog:
>
>	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>	(typedef struct ifcvt_arg_entry): New.
>	(cmp_arg_entry): New.
>	(gen_phi_arg_condition, gen_phi_nest_statement,
>	predicate_scalar_phi): Use them.

Ok, thanks.

	Jakub
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Mon, Oct 09, 2023 at 01:54:19PM +0100, Richard Sandiford wrote: > > I've additionally built it with the incremental attached patch and > > on make -C gcc check-gcc check-g++ -j32 -k it didn't show any > > wide_int/widest_int heap allocations unless a > 128-bit _BitInt or wb/uwb > > constant needing > 128-bit _BitInt was used in a testcase. > > Overall it looks really good to me FWIW. Some comments about the > wide-int.h changes below. Will send a separate message about wide-int.cc. Thanks, just quick answers, will work on patch adjustments after trying to get rid of rwide_int (seems dwarf2out has very limited needs from it, just some routine to construct it in GCed memory (and never change afterwards) from const wide_int_ref & or so, and then working operator ==, get_precision, elt, get_len and get_val methods, so I think we could just have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; }; and perform the methods on it after converting to a storage ref. > > @@ -380,7 +406,11 @@ namespace wi > > > > /* The integer has a constant precision (known at GCC compile time) > > and is signed. */ > > -CONST_PRECISION > > +CONST_PRECISION, > > + > > +/* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger > > + precision where not all elements of arrays are always present. */ > > +WIDEST_CONST_PRECISION > >}; > > Sorry to bring this up so late, but how about using INL_CONST_PRECISION > for the fully inline case and CONST_PRECISION for the general case? > That seems more consistent with the other naming in the patch. Ok. > > @@ -482,6 +541,18 @@ namespace wi > >}; > > > >template > > + struct binary_traits > WIDEST_CONST_PRECISION> > > + { > > +STATIC_ASSERT (int_traits ::precision == int_traits > > ::precision); > > Should this assert for equal inl_precision too? Although it probably > isn't necessary computationally, it seems a bit arbitrary to pick the > first inl_precision... inl_precision is only used for widest_int/widest2_int, so if precision is equal, inl_precision is as well. > > +inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > > +{ > > + len = x.len; > > + precision = x.precision; > > + if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION)) > > +{ > > + u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, > > HOST_BITS_PER_WIDE_INT)); > > + memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT)); > > +} > > + else if (LIKELY (precision)) > > +memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT)); > > +} > > Does the variable-length memcpy pay for itself? If so, perhaps that's a > sign that we should have a smaller inline buffer for this class (say 2 HWIs). Guess I'll try to see what results in smaller .text size. > > +namespace wi > > +{ > > + template > > + struct int_traits < widest_int_storage > > > + { > > +static const enum precision_type precision_type = > > WIDEST_CONST_PRECISION; > > +static const bool host_dependent_precision = false; > > +static const bool is_sign_extended = true; > > +static const bool needs_write_val_arg = true; > > +static const unsigned int precision > > + = N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION; > > What's the reasoning behind this calculation? It would give 0 for > N < WIDE_INT_MAX_INL_PRECISION, and the "MAX" suggests that N > shouldn't be > WIDE_INT_MAX_INL_PRECISION either. > > I wonder whether this should be a second template parameter, with an > assert that precision > inl_precision. Maybe. 
Yet another option would be to always use WIDE_INT_MAX_INL_PRECISION as
the inline precision (and use the N template parameter just to decide
about the overall precision), regardless of whether it is widest_int or
widest2_int.  The latter is very rare, and it is much rarer still that
something wouldn't fit into WIDE_INT_MAX_INL_PRECISION when not using
_BitInt.  The reason for introducing inl_precision was to avoid the heap
allocation for widest2_int unless _BitInt is in use, but maybe that isn't
worth it.

> Nit: might format more naturally with:
>
>   using res_traits = wi::int_traits :
>   ...

Ok.

> > @@ -2203,6 +2781,9 @@ wi::sext (const T &x, unsigned int offse
> >   unsigned int precision = get_precision (result);
> >   WIDE_INT_REF_FOR (T) xi (x, precision);
> >
> > +  if (result.needs_write_val_arg)
> > +    val = result.write_val (MAX (xi.len,
> > +                                 CEIL (offset, HOST_BITS_PER_WIDE_INT)));
>
> Why MAX rather than MIN?

Because it needs to be an upper bound.  In this case, sext_large has
  unsigned int len = offset / HOST_BITS_PER_WIDE_INT;
  /* Extending beyond the precision is a no-op.  If we have only stored
     OFFSET bits or fewer, the rest are already signs.  */
  if (offset >= precision || len >= xlen)
    {
      for (unsigned i = 0; i < xlen; ++i)
	val[i] = xval[i];
      return xlen;
    }
  unsigned int suboffset = of
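The wide_int_storage copy constructor quoted in this exchange is an instance of the small-inline-buffer-with-heap-fallback pattern.  A minimal standalone sketch of just that pattern (illustrative names only; int64_t stands in for HOST_WIDE_INT and new[]/delete[] for XNEWVEC/XDELETEVEC, so this is not the GCC code itself):

  #include <cstdint>
  #include <cstring>

  typedef int64_t hwi;                          /* HOST_WIDE_INT stand-in */
  const unsigned int inl_elts = 9;              /* WIDE_INT_MAX_INL_ELTS stand-in */
  const unsigned int hwi_bits = 64;             /* HOST_BITS_PER_WIDE_INT stand-in */
  const unsigned int inl_precision = inl_elts * hwi_bits;

  struct toy_storage
  {
    unsigned int len, precision;
    union { hwi val[inl_elts]; hwi *valp; } u;  /* inline buffer or heap pointer */

    toy_storage () : len (0), precision (0) {}
    toy_storage (const toy_storage &x)
      : len (x.len), precision (x.precision)
    {
      if (precision > inl_precision)
	{
	  /* Large precision: limbs live on the heap; only the LEN
	     significant limbs need copying.  */
	  u.valp = new hwi[(precision + hwi_bits - 1) / hwi_bits];
	  memcpy (u.valp, x.u.valp, len * sizeof (hwi));
	}
      else if (precision)
	/* Small precision: limbs live in the inline buffer.  */
	memcpy (u.val, x.u.val, len * sizeof (hwi));
    }
    toy_storage &operator= (const toy_storage &) = delete;
    ~toy_storage () { if (precision > inl_precision) delete[] u.valp; }
  };

The .text-size question above is then about whether the variable-length memcpy in the small case beats unconditionally copying the whole inline buffer.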
[PATCH] wide-int: Remove rwide_int, introduce dw_wide_int
On Mon, Oct 09, 2023 at 12:55:02PM +0200, Jakub Jelinek wrote:
> This makes wide_int unusable in GC structures, so for dwarf2out
> which was the only place which needed it there is a new rwide_int type
> (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> inline and is trivially copyable (dwarf2out should never deal with large
> _BitInt constants, those should have been lowered earlier).

As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
it just needs to allocate new GC structures val_wide points to (constructed
from some const wide_int_ref &) and needs to call operator==,
get_precision, elt, get_len and get_val methods on it.
Even trailing_wide_int would be overkill for that, the following just adds
a new struct with precision/len and trailing val array members and
implements the needed methods (only 2 of them using wide_int_ref constructed
from those).

Incremental patch, so far compile time tested only:

--- gcc/wide-int.h.jj	2023-10-09 14:37:45.878940132 +0200
+++ gcc/wide-int.h	2023-10-09 16:06:39.326805176 +0200
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.
    other longer storage GCC representations (rtl and tree).
 
    The actual precision of a wide_int depends on the flavor.  There
-   are four predefined flavors:
+   are three predefined flavors:
 
    1) wide_int (the default).  This flavor does the math in the
    precision of its input arguments.  It is assumed (and checked)
@@ -80,12 +80,7 @@ along with GCC; see the file COPYING3.
       wi::leu_p (a, b) as a more efficient short-hand for
       "a >= 0 && a <= b". ]
 
-   3) rwide_int.  Restricted wide_int.  This is similar to
-   wide_int, but maximum possible precision is RWIDE_INT_MAX_PRECISION
-   and it always uses an inline buffer.  offset_int and rwide_int are
-   GC-friendly, wide_int and widest_int are not.
-
-   4) widest_int.  This representation is an approximation of
+   3) widest_int.  This representation is an approximation of
    infinite precision math.  However, it is not really infinite
    precision math as in the GMP library.  It is really finite
    precision math where the precision is WIDEST_INT_MAX_PRECISION.
@@ -257,9 +252,6 @@ along with GCC; see the file COPYING3.
 #define WIDE_INT_MAX_ELTS 255
 #define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
 
-#define RWIDE_INT_MAX_ELTS WIDE_INT_MAX_INL_ELTS
-#define RWIDE_INT_MAX_PRECISION WIDE_INT_MAX_INL_PRECISION
-
 /* Precision of widest_int and largest _BitInt precision + 1 we can
    support.  */
 #define WIDEST_INT_MAX_ELTS 510
@@ -343,7 +335,6 @@ STATIC_ASSERT (WIDE_INT_MAX_INL_ELTS < W
 template <typename storage> class generic_wide_int;
 template <int N> class fixed_wide_int_storage;
 class wide_int_storage;
-class rwide_int_storage;
 template <int N> class widest_int_storage;
 
 /* An N-bit integer.  Until we can use typedef templates, use this instead.  */
@@ -352,7 +343,6 @@ template <int N> class widest_int_storag
 
 typedef generic_wide_int <wide_int_storage> wide_int;
 typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
-typedef generic_wide_int <rwide_int_storage> rwide_int;
 typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
 typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
 
@@ -1371,180 +1361,6 @@ wi::int_traits ::get_b
   return wi::get_precision (x);
 }
 
-/* The storage used by rwide_int.  */
-class GTY(()) rwide_int_storage
-{
-private:
-  HOST_WIDE_INT val[RWIDE_INT_MAX_ELTS];
-  unsigned int len;
-  unsigned int precision;
-
-public:
-  rwide_int_storage () = default;
-  template <typename T>
-  rwide_int_storage (const T &);
-
-  /* The standard generic_rwide_int storage methods.  */
-  unsigned int get_precision () const;
-  const HOST_WIDE_INT *get_val () const;
-  unsigned int get_len () const;
-  HOST_WIDE_INT *write_val (unsigned int);
-  void set_len (unsigned int, bool = false);
-
-  template <typename T>
-  rwide_int_storage &operator = (const T &);
-
-  static rwide_int from (const wide_int_ref &, unsigned int, signop);
-  static rwide_int from_array (const HOST_WIDE_INT *, unsigned int,
-			       unsigned int, bool = true);
-  static rwide_int create (unsigned int);
-};
-
-namespace wi
-{
-  template <>
-  struct int_traits <rwide_int_storage>
-  {
-    static const enum precision_type precision_type = VAR_PRECISION;
-    /* Guaranteed by a static assert in the rwide_int_storage constructor.  */
-    static const bool host_dependent_precision = false;
-    static const bool is_sign_extended = true;
-    static const bool needs_write_val_arg = false;
-    template <typename T1, typename T2>
-    static rwide_int get_binary_result (const T1 &, const T2 &);
-    template <typename T1, typename T2>
-    static unsigned int get_binary_precision (const T1 &, const T2 &);
-  };
-}
-
-/* Initialize the storage from integer X, in its natural precision.
-   Note that we do not allow integers with host-dependent precision
-   to become rwide_ints; rwide_ints must always be logically independen
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Mon, Oct 09, 2023 at 03:44:10PM +0200, Jakub Jelinek wrote:
> Thanks, just quick answers, will work on patch adjustments after trying to
> get rid of rwide_int (seems dwarf2out has very limited needs from it, just
> some routine to construct it in GCed memory (and never change afterwards)
> from const wide_int_ref & or so, and then working operator ==,
> get_precision, elt, get_len and get_val methods, so I think we could just
> have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; };
> and perform the methods on it after converting to a storage ref.

Now in patch form (again, incremental).

> > Does the variable-length memcpy pay for itself?  If so, perhaps that's a
> > sign that we should have a smaller inline buffer for this class (say 2
> > HWIs).
>
> Guess I'll try to see what results in smaller .text size.

I've left the memcpy changes in a separate patch (incremental, attached).
Seems the second patch results in .text growth by 16256 bytes (0.04%),
though I'd bet it probably makes compile time a tiny bit faster because it
replaces an out of line memcpy (caused by variable length) with an inlined
one.
With even the third one it shrinks by 84544 bytes (0.21% down), but the
extra statistics patch then shows a massive number of allocations after
running make check-gcc check-g++ check-gfortran for just a minute or two.
On the widest_int side, I see (first number from sort | uniq -c | sort -nr,
second the estimated or final len)
7289034 4
 173586 5
  21819 6
i.e. there are tons of widest_ints which need len 4 (or perhaps just have
it as an upper estimate), maybe even 5 would be nice.
On the wide_int side, I see
 155291 576
(supposedly because of bound_wide_int, where we create wide_int_ref from
the 576-bit precision bound_wide_int and then create 576-bit wide_int when
using unary or binary operation on that).

So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6
instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed
from MAX_BITSIZE_MODE_ANY_INT?).  Or keep it at 9 for both (i.e. without
the third patch).

--- gcc/poly-int.h.jj	2023-10-09 14:37:45.883940062 +0200
+++ gcc/poly-int.h	2023-10-09 17:05:26.629828329 +0200
@@ -96,7 +96,7 @@ struct poly_coeff_traits
 
 template <typename T>
-struct poly_coeff_traits <T, wi::CONST_PRECISION>
+struct poly_coeff_traits <T, wi::INL_CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
@@ -110,14 +110,13 @@ struct poly_coeff_traits
 
 template <typename T>
-struct poly_coeff_traits <T, wi::WIDEST_CONST_PRECISION>
+struct poly_coeff_traits <T, wi::CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
   /* These types are always signed.  */
   static const int signedness = 1;
   static const int precision = wi::int_traits <T>::precision;
-  static const int inl_precision = wi::int_traits <T>::inl_precision;
   static const int rank = precision * 2 / CHAR_BIT;
 
   template
--- gcc/double-int.h.jj	2023-01-02 09:32:22.747280053 +0100
+++ gcc/double-int.h	2023-10-09 17:06:03.446317336 +0200
@@ -440,7 +440,7 @@ namespace wi
   template <>
   struct int_traits <double_int>
   {
-    static const enum precision_type precision_type = CONST_PRECISION;
+    static const enum precision_type precision_type = INL_CONST_PRECISION;
     static const bool host_dependent_precision = true;
     static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
     static unsigned int get_precision (const double_int &);
--- gcc/wide-int.h.jj	2023-10-09 16:06:39.326805176 +0200
+++ gcc/wide-int.h	2023-10-09 17:29:20.016951691 +0200
@@ -343,8 +343,8 @@ template <int N> class widest_int_storag
 
 typedef generic_wide_int <wide_int_storage> wide_int;
 typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION> > widest_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION * 2> > widest2_int;
 
 /* wi::storage_ref can be a reference to a primitive type,
    so this is the conservatively-correct setting.  */
@@ -394,13 +394,13 @@ namespace wi
 
     /* The integer has a variable precision but no defined signedness.  */
     VAR_PRECISION,
 
-    /* The integer has a constant precision (known at GCC compile time)
-       and is signed.  */
-    CONST_PRECISION,
-
-    /* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger
-       precision where not all elements of arrays are always present.  */
-    WIDEST_CONST_PRECISION
+    /* The integer has a constant precision (known at GCC compile time),
+       is signed and all elements are in inline buffer.  */
+    INL_CONST_PRECISION,
+
+    /* Like INL_CONST_PRECISION, but elements can be heap allocated for
+       larger lengths.  */
+    CONST_PRECISION
   };
 
 /* This class, which has no default implementation, is expected to
@@ -410,15 +410,10 @@ namespace wi
 
      Classifies the type of T.
 
    static const unsigned int precision;
-     Only defined if precision_type == CONST_PRECISION or
-     prec
Re: [PATCH] x86: set spincount 1 for x86 hybrid platform [PR109812]
On Tue, Oct 10, 2023 at 12:59:52PM +0800, Jun Zhang wrote:
> include/ChangeLog:
>
> 	* omphook.h: define RUNOMPHOOK macro.

ChangeLog formatting.  The description should start with a capital letter.
If you add a new file, you should just mention
	: New file.
or something similar.

But much more importantly, we don't do hooks like that in libgomp; it
would be better as a static inline function whose name shows what it is
doing (e.g. do_adjust_default_spincount), best in some already existing
header (e.g. wait.h), and best at the end of the
  if (!parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var))
block in env.c.

> libgomp/ChangeLog:
>
> 	* env.c (initialize_env): add RUNOMPHOOK macro.
> 	* config/linux/x86/omphook.h: define RUNOMPHOOK macro.
> ---
>  include/omphook.h                  |  1 +
>  libgomp/config/linux/x86/omphook.h | 19 +++
>  libgomp/env.c                      |  3 +++
>  3 files changed, 23 insertions(+)
>  create mode 100644 include/omphook.h
>  create mode 100644 libgomp/config/linux/x86/omphook.h
>
> diff --git a/include/omphook.h b/include/omphook.h
> new file mode 100644
> index 000..2ebe3ad57e6
> --- /dev/null
> +++ b/include/omphook.h
> @@ -0,0 +1 @@
> +#define RUNOMPHOOK()
> diff --git a/libgomp/config/linux/x86/omphook.h
> b/libgomp/config/linux/x86/omphook.h
> new file mode 100644
> index 000..aefb311cc07
> --- /dev/null
> +++ b/libgomp/config/linux/x86/omphook.h
> @@ -0,0 +1,19 @@
> +#ifdef __x86_64__
> +#include "cpuid.h"
> +
> +/* only for x86 hybrid platform */

Comments should again start with a capital letter and end with a dot and
2 spaces before */.

> +#define RUNOMPHOOK() \
> +  do \
> +    { \
> +      unsigned int eax, ebx, ecx, edx; \
> +      if ((getenv ("GOMP_SPINCOUNT") == NULL) && (wait_policy < 0) \

Spurious ()s around the subconditions; the first one shouldn't be tested
at all when this is placed right, and if a condition doesn't fit on one
line, each subcondition should be on a separate line.

> +	  && __get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) \

If we don't have a macro for the CPUID.07H.0H:EDX[15] bit, there should
be a comment which says what that bit is.

> +	  && ((edx >> 15) & 1)) \
> +	gomp_spin_count_var = 1LL; \
> +      if (gomp_throttled_spin_count_var > gomp_spin_count_var) \
> +	gomp_throttled_spin_count_var = gomp_spin_count_var; \

The above 2 lines won't be needed if placed right.

	Jakub
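Putting the review comments together, the suggested replacement would look something like the following sketch.  Hedged: the name do_adjust_default_spincount and the CPUID test come from the comments above, gomp_spin_count_var is the real libgomp variable, but the exact shape and header are assumptions, not the committed code.  Called at the end of the !parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var) block in env.c, neither the GOMP_SPINCOUNT test nor the throttled-spincount adjustment needs to be repeated here:

  /* E.g. in a new libgomp/config/linux/x86/ header; a sketch, not the
     committed implementation.  */
  #ifdef __x86_64__
  # include "cpuid.h"
  #endif

  static inline void
  do_adjust_default_spincount (void)
  {
  #ifdef __x86_64__
    /* CPUID.07H.0H:EDX[15] is the Hybrid flag: the part mixes
       performance and efficiency cores, where long spinning in idle
       threads is counterproductive.  */
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
	&& ((edx >> 15) & 1))
      gomp_spin_count_var = 1LL;
  #endif
  }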
Re: [PATCH] wide-int: Remove rwide_int, introduce dw_wide_int
On Tue, Oct 10, 2023 at 09:30:31AM +0000, Richard Biener wrote:
> On Mon, 9 Oct 2023, Jakub Jelinek wrote:
> > On Mon, Oct 09, 2023 at 12:55:02PM +0200, Jakub Jelinek wrote:
> > > This makes wide_int unusable in GC structures, so for dwarf2out
> > > which was the only place which needed it there is a new rwide_int type
> > > (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> > > inline and is trivially copyable (dwarf2out should never deal with large
> > > _BitInt constants, those should have been lowered earlier).
> >
> > As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
> > it just needs to allocate new GC structures val_wide points to (constructed
> > from some const wide_int_ref &) and needs to call operator==,
> > get_precision, elt, get_len and get_val methods on it.
> > Even trailing_wide_int would be overkill for that, the following just adds
> > a new struct with precision/len and trailing val array members and
> > implements the needed methods (only 2 of them using wide_int_ref constructed
> > from those).
> >
> > Incremental patch, so far compile time tested only:
>
> LGTM, wonder if we can push this separately as prerequisite?

The dwarf2out.{h,cc} changes sure, the wide-int.h changes obviously need
to be merged into the main patch.

	Jakub
Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store
On Tue, Oct 10, 2023 at 10:49:04AM +0000, Richard Biener wrote:
> The following fixes a mistake in count_nonzero_bytes which happily
> skips over stores clobbering the memory we load a value we store
> from and then performs analysis on the memory state before the
> intermediate store.
>
> The patch implements the most simple fix - guarantee that there are
> no intervening stores by tracking the original active virtual operand
> and comparing that to the one of a load we attempt to analyze.
>
> This simple approach breaks two subtests of gcc.dg/Wstringop-overflow-69.c
> which I chose to XFAIL.

This function is a total mess, but I think punting right after the
gimple_assign_single_p test is too big a hammer.  There are various cases,
and some of them are fine even when the vuse is different, others aren't.

The function at that point tries to handle CONSTRUCTOR, MEM_REF, or decls.
I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
vuse.  Though, I am not really sure when a CONSTRUCTOR would appear; the
lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
element in it.

The MEM_REF case is supposedly fine only if we can guarantee nothing could
overwrite it (so MEM_REF of a TREE_STATIC TREE_READONLY decl could be
fine, STRING_CST too); anything else is wrong, because
count_nonzero_bytes_addr uses the get_stridx/get_strinfo APIs, which for
something that can be changed always reflect only the state at the current
statement.
So, e.g. the get_stridx (exp, stmt) > 0 case in count_nonzero_bytes_addr
is where the caller must definitely punt if the vuse is different.

Then for decls, again, CONST_DECLs, DECL_IN_CONSTANT_POOLs are certainly
fine.  Other decls for which ctor_for_folding returns non-error_mark_node
should be fine as well; I think ctor_useable_for_folding_p is supposed to
verify that.  STRING_CSTs should be fine as well.

So maybe pass the vuse down to count_nonzero_bytes_addr and return false
in the idx > 0 case in there if gimple_vuse (stmt) != vuse?

	Jakub
Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store
On Tue, Oct 10, 2023 at 11:59:28AM +0000, Richard Biener wrote:
> > I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
> > vuse.  Though, am not really sure when a CONSTRUCTOR would appear, the
> > lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
> > VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
> > element in it.
>
> Yeah, but what should that be interpreted to in terms of object-size?!

The function in question doesn't compute object sizes, just the
minimum/maximum number of non-zero bytes in some rhs of a store and
whether everything stored is 0s, or everything non-zeros, or some
non-zeros followed by zero.

> I think the only real case we'd see here is the MEM_REF RHS given
> we know we have a register type value.  I'll note the function
> totally misses handled_component_p (), it only seems to handle
> *p and 'decl'.

Yeah, maybe we could handle even that at some point.  Though perhaps
better first to rewrite it completely, because the recursive calls, where
in some cases an argument means one thing and in another case something
completely different, are just bad design (or lack thereof).

> > So maybe pass the vuse down to count_nonzero_bytes_addr and return false
> > in the idx > 0 case in there if gimple_vuse (stmt) != vuse?
>
> I don't know enough of the pass to do better, can you take it from here?
> One of the points is that we need to know the memory context (aka vuse)
> of both the original store and the load we are interpreting.  For
> _addr I wasn't sure how we arrive here.  As you said, this is a bit
> of spaghetti and I don't want to untangle this any further.

I meant something like below; without getting rid of the -Wshadow stuff
in there, my initial attempt didn't work.  This passes the new testcase
as well as the testcase you've been touching, but I haven't tested it
beyond that yet.

In theory we could even handle some cases with gimple_vuse (stmt) != vuse,
because we save a copy of the strinfo state at the end of basic blocks and
only throw that away after we process all dominator children.  But we'd
need to figure out at which bb to look and temporarily switch the vectors.

2023-10-10  Richard Biener  <rguenther@suse.de>
	    Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111519
	* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
	argument and pass it through to recursive calls and
	count_nonzero_bytes_addr calls.  Don't shadow the stmt argument, but
	change stmt for gimple_assign_single_p statements for which we don't
	immediately punt.
	(strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
	it through to recursive calls and count_nonzero_bytes calls.  Don't
	use get_strinfo if gimple_vuse (stmt) is different from vuse.  Don't
	shadow the stmt argument.

	* gcc.dg/torture/pr111519.c: New testcase.
--- gcc/tree-ssa-strlen.cc.jj 2023-08-30 11:21:38.539521966 +0200 +++ gcc/tree-ssa-strlen.cc 2023-10-10 15:05:44.731871218 +0200 @@ -281,14 +281,14 @@ public: gimple *stmt, unsigned lenrange[3], bool *nulterm, bool *allnul, bool *allnonnul); - bool count_nonzero_bytes (tree exp, + bool count_nonzero_bytes (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, unsigned lenrange[3], bool *nulterm, bool *allnul, bool *allnonnul, ssa_name_limit_t &snlim); - bool count_nonzero_bytes_addr (tree exp, + bool count_nonzero_bytes_addr (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, @@ -4531,8 +4531,8 @@ nonzero_bytes_for_type (tree type, unsig } /* Recursively determine the minimum and maximum number of leading nonzero - bytes in the representation of EXP and set LENRANGE[0] and LENRANGE[1] - to each. + bytes in the representation of EXP at memory state VUSE and set + LENRANGE[0] and LENRANGE[1] to each. Sets LENRANGE[2] to the total size of the access (which may be less than LENRANGE[1] when what's being referenced by EXP is a pointer rather than an array). @@ -4546,7 +4546,7 @@ nonzero_bytes_for_type (tree type, unsig Returns true on success and false otherwise. */ bool -strlen_pass::count_nonzero_bytes (tree exp, gimple *stmt, +strlen_pass::count_nonzero_bytes (tree exp, tree vuse, gimple *stmt, unsigned HOST_WIDE_INT offset, unsigned
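The diff is cut off above; the crux of the fix in count_nonzero_bytes_addr, per the preceding message, is a guard along these lines (a hedged sketch of the shape, not the committed hunk):

  /* In strlen_pass::count_nonzero_bytes_addr, in the
     get_stridx (exp, stmt) > 0 case: strinfo records describe the
     memory state at the current statement, so they may only be used
     when the load sees the same virtual operand as the store being
     analyzed.  */
  if (idx > 0)
    {
      if (gimple_vuse (stmt) != vuse)
	return false;
      /* ... only then consult get_strinfo (idx) as before.  */
    }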
[PATCH] dwarf2out: Stop using wide_int in GC structures
Hi!

On Tue, Oct 10, 2023 at 09:30:31AM +0000, Richard Biener wrote:
> On Mon, 9 Oct 2023, Jakub Jelinek wrote:
> > > This makes wide_int unusable in GC structures, so for dwarf2out
> > > which was the only place which needed it there is a new rwide_int type
> > > (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> > > inline and is trivially copyable (dwarf2out should never deal with large
> > > _BitInt constants, those should have been lowered earlier).
> >
> > As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite limited,
> > it just needs to allocate new GC structures val_wide points to (constructed
> > from some const wide_int_ref &) and needs to call operator==,
> > get_precision, elt, get_len and get_val methods on it.
> > Even trailing_wide_int would be overkill for that, the following just adds
> > a new struct with precision/len and trailing val array members and
> > implements the needed methods (only 2 of them using wide_int_ref constructed
> > from those).
> >
> > Incremental patch, so far compile time tested only:
>
> LGTM, wonder if we can push this separately as prerequisite?

Here it is as a separate independent patch.  Even without the wide_int
changing patch it should save some memory, by not always allocating room
for 9 limbs, but say just the 2/3 or however many we actually need.
And, another advantage is that if we really need it at some point, it
could support even more than 9 limbs if it is created from a wide_int_ref
with get_len () > 9.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-10  Jakub Jelinek  <jakub@redhat.com>

	* dwarf2out.h (wide_int_ptr): Remove.
	(dw_wide_int_ptr): New typedef.
	(struct dw_val_node): Change type of val_wide from wide_int_ptr
	to dw_wide_int_ptr.
	(struct dw_wide_int): New type.
	(dw_wide_int::elt): New method.
	(dw_wide_int::operator ==): Likewise.
	* dwarf2out.cc (get_full_len): Change argument type to
	const dw_wide_int & from const wide_int &.  Use CEIL.  Call
	get_precision method instead of calling wi::get_precision.
	(alloc_dw_wide_int): New function.
	(add_AT_wide): Change w argument type to const wide_int_ref &
	from const wide_int &.  Use alloc_dw_wide_int.
	(mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int.
	(insert_wide_int): Change val argument type to const wide_int_ref &
	from const wide_int &.
	(add_const_value_attribute): Pass rtx_mode_t temporary directly to
	add_AT_wide instead of using a temporary variable.
--- gcc/dwarf2out.h.jj 2023-10-09 14:37:45.890939965 +0200 +++ gcc/dwarf2out.h 2023-10-09 16:46:14.705816928 +0200 @@ -30,7 +30,7 @@ typedef struct dw_cfi_node *dw_cfi_ref; typedef struct dw_loc_descr_node *dw_loc_descr_ref; typedef struct dw_loc_list_struct *dw_loc_list_ref; typedef struct dw_discr_list_node *dw_discr_list_ref; -typedef wide_int *wide_int_ptr; +typedef struct dw_wide_int *dw_wide_int_ptr; /* Call frames are described using a sequence of Call Frame @@ -252,7 +252,7 @@ struct GTY(()) dw_val_node { unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_unsigned_const"))) val_unsigned; double_int GTY ((tag ("dw_val_class_const_double"))) val_double; - wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide; + dw_wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide; dw_vec_const GTY ((tag ("dw_val_class_vec"))) val_vec; struct dw_val_die_union { @@ -313,6 +313,35 @@ struct GTY(()) dw_discr_list_node { int dw_discr_range; }; +struct GTY((variable_size)) dw_wide_int { + unsigned int precision; + unsigned int len; + HOST_WIDE_INT val[1]; + + unsigned int get_precision () const { return precision; } + unsigned int get_len () const { return len; } + const HOST_WIDE_INT *get_val () const { return val; } + inline HOST_WIDE_INT elt (unsigned int) const; + inline bool operator == (const dw_wide_int &) const; +}; + +inline HOST_WIDE_INT +dw_wide_int::elt (unsigned int i) const +{ + if (i < len) +return val[i]; + wide_int_ref ref = wi::storage_ref (val, len, precision); + return wi::sign_mask (ref); +} + +inline bool +dw_wide_int::operator == (const dw_wide_int &o) const +{ + wide_int_ref ref1 = wi::storage_ref (val, len, precision); + wide_int_ref ref2 = wi::storage_ref (o.val, o.len, o.precision); + return ref1 == ref2; +} + /* Interface from dwarf2out.cc to dwarf2cfi.cc. */ extern struct dw_loc_descr_node *build_cfa_loc (dw_cfa_location *, poly_int64); --- gcc/dwarf2out.cc.jj 2023-10-09 14:37:45.894939909 +0200 +++ gcc/dwarf2out.cc2023-10-09 16:48:24.565014459 +0200 @@ -397,11 +397,9 @@ dump_struct_debug (tree type, enum
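The dwarf2out.cc half of the diff is truncated above.  The ChangeLog names alloc_dw_wide_int; a sketch of what such a helper does (reconstructed from the description rather than quoted from the patch, so treat the exact body as an assumption; ggc_internal_alloc is the real GC allocator):

  /* Allocate a variable-sized dw_wide_int in GC memory and fill it
     from a wide_int_ref; val[1] is a trailing array, hence the
     get_len () - 1 extra elements.  */
  static dw_wide_int *
  alloc_dw_wide_int (const wide_int_ref &w)
  {
    dw_wide_int *p
      = (dw_wide_int *) ggc_internal_alloc (sizeof (dw_wide_int)
					    + ((w.get_len () - 1)
					       * sizeof (HOST_WIDE_INT)));
    p->precision = w.get_precision ();
    p->len = w.get_len ();
    memcpy (p->val, w.get_val (), p->len * sizeof (HOST_WIDE_INT));
    return p;
  }

This is what makes the struct GC-friendly where wide_int no longer is: the object is immutable after creation and exactly as large as its len requires.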
C++ patch ping^2
Hi!

I'd like to ping a couple of C++ patches.

- c++, v2: Implement C++26 P2169R4 - Placeholder variables with no name [PR110349]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630802.html

- c++: Implement C++26 P2361R6 - Unevaluated strings [PR110342]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628375.html

- c++, v2: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630803.html

- c++: Implement C++ DR 2406 - [[fallthrough]] attribute and iteration statements [PR107571]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628487.html
  (from this one Richi approved the middle-end changes)

- c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628490.html

- libcpp, v2: Small incremental patch for P1854R4 [PR110341]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628586.html

	Jakub
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Tue, Oct 10, 2023 at 06:41:50PM +0100, Richard Sandiford wrote: > > On the wide_int side, I see > > 155291 576 > > (supposedly because of bound_wide_int, where we create wide_int_ref from > > the 576-bit precision bound_wide_int and then create 576-bit wide_int when > > using unary or binary operation on that). > > 576 bits seems quite a lot for a loop bound. We're enterning near-infinite > territory with 128 bits. :) But I don't want to rehash old discussions. > If we take this size for wide_int as given, then... We could go for 128 bits bounds_wide_int and redo the stats given that. Though maybe such tweaks can be done incrementally. > > So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6 > > instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed > > from MAX_BITSIZE_MODE_ANY_INT?). Or keep it at 9 for both (i.e. without > > the third patch). > > ...I agree we might as well keep the widest_int size the same for > simplicity. It'd only be worth distinguishing them if we have positive > proof that it's worthwhile. > > So going with patches 1 + 2 sounds good to me, but I don't have a strong > preference. Ok. > > + if (prec > WIDE_INT_MAX_INL_PRECISION && !high) > > +prec = (op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT; > > Changing the precision looked a bit dangerous at first, but I agree it > looks correct in context, in that nothing later on seems to require prec > to be the real precision of the number. But I wonder whether we should > instead do: > > if (!high) > prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec); Ok, will try that (and in other spots too). > so that the assumption gets a bit more testing. Same idea for the others. > I.e. in any case where we think it's safe to reduce a precision or > length for out-of-line buffers, I think we should try to do the same > for inline ones. > > I'm not sure off-hand why + 1 is safe here but + 2 is needed for the > write_val estimate. Might be worth a comment in one place or the other. As I wrote earlier, I'm not sure about the need for the + 1 above in prec computation, perhaps just if (!high) prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec); could work too (especially if the multiplication is always signed vs. signed or unsigned vs. unsigned). Even say HOST_WIDE_INT_MIN * HOST_WIDE_INT_MIN (i.e. op1len == op2len == 1) result of 0x4000'0'' (for smallest signed^2) or ~(unsigned HOST_WIDE_INT) 0 * ~(unsigned HOST_WIDE_INT) 0 (largest unsigned^2) of 0x'fffe''0001 fits into prec 2; and for widest_int-like representation, 0x'ff wouldn't have op1len 1, but 2; but the + 1 on top of that is needed because of the way wi_pack then works. But guess will try to play with it some more tomorrow. > > @@ -2399,9 +2453,12 @@ from_int (int i) > > static void > > assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) > > { > > - char buf[WIDE_INT_PRINT_BUFFER_SIZE]; > > - print_dec (wi, buf, sgn); > > - ASSERT_STREQ (expected, buf); > > + char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; > > + unsigned len = wi.get_len (); > > + if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) > > +p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); > > Is this size enough for decimal? E.g. to go back to the 576-bit example, > 2^575 is O(10^173), but 576/4+4 == 148. 
generic_wide_ints with precision larger than HOST_BITS_PER_WIDE_INT and length larger than 1 are always printed hexadecimally, we don't have code to print it decimally (it wouldn't be terribly hard, we have divmod, but could be expensive). Though, I've always wondered why we at least for print_decs don't print negative values as -0xwhatever, rather than 0xsomethinglarge. Any change in this area could affect a lot of dumps though. Jakub
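For concreteness, the decimal bound behind the 576-bit example (plain arithmetic, not from the patch): an $n$-bit number needs at most

$$\lceil n \log_{10} 2 \rceil + 2 \approx 0.30103\,n + 2$$

bytes when printed decimally (digits plus sign and terminating NUL), so 576 bits can need around 174 digits, more than the 576/4 + 4 = 148 bytes the hex-oriented estimate provides, which is consistent with multi-limb values only being printed hexadecimally as described above.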
Re: [PATCH] x86: set spincount 1 for x86 hybrid platform [PR109812]
On Wed, Oct 11, 2023 at 02:47:56AM +0000, Zhang, Jun wrote:
> Sorry, I don't know how to place wait.h.  This patch is only for x86.
> Wait.h is in libgomp/config/linux/.
> if I directly place do_adjust_default_spincount in wait.h,
> it maybe need #ifdef __x86_64__, and also need a wait.h in include/
> for other os platform.
> if I create a new wait.h in libgomp/config/linux/x86/,
> It need copy whole wait.h from libgomp/config/linux/,
> also need a wait.h in include/ for other os platform.
> Could you help me?

Put it into a new spincount.h then.  Sorry, I missed that wait.h isn't
CPU specific.

	Jakub
Re: [PATCH v2] x86: set spincount 1 for x86 hybrid platform
On Wed, Oct 11, 2023 at 04:39:28PM +0800, Jun Zhang wrote: > include/ChangeLog: > > * spincount.h: New file. > > libgomp/ChangeLog: > > * env.c (initialize_env): Use do_adjust_default_spincount. > * config/linux/x86/spincount.h: New file. Ok. Jakub
Re: Principles of the C99 testsuite conversion
On Wed, Oct 11, 2023 at 08:17:49AM -0600, Jeff Law wrote:
> On 10/11/23 08:10, Richard Earnshaw (lists) wrote:
> > On 11/10/2023 14:56, Jeff Law wrote:
> > > On 10/11/23 04:39, Florian Weimer wrote:
> > > > I've started to look at what is required to convert the testsuite to
> > > > C99 (without implicit ints, without implicit function declarations, and
> > > > a few other legacy language features).
> > > I bet those older tests originating from c-torture will be a bit painful.
> > > Torbjorn liked having them minimized, to the point of squashing out
> > > nearly everything he considered extraneous.  I'd bet many of those older
> > > tests are going to need lots of changes.
> >
> > I've often wondered just how much of the original c-torture suite is still
> > relevant today.  Most of those tests were written at a time when the
> > compiler expanded tree directly into RTL and I suspect that today the tests
> > never get even close to tickling the original bug they were intended to
> > validate.
> I'm sure it's a mixed bag.  Some I do see pop up regularly during
> development testing.
>
> The real way to try and answer that question would be with gcov.  Something
> like take an existing coverage report, run test and see if it added any
> additional coverage.  If not, flag it as potentially irrelevant.  Do this on
> x86 or aarch64.
>
> Given that list of potentially irrelevant tests, then look at them from a
> target standpoint, potentially testing with the same methodology on several
> targets, perhaps a standardized set meant to cover 16->64 bit targets,
> big/little endian and a few other key features.

I think over the years we've seen tons of cases where a test written for
one problem at some point is able to trigger some completely different
problem a few years later.  So, I don't think most of the tests are
irrelevant.

	Jakub
[PATCH] libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc
Hi! The following testcase started FAILing recently after the https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554 glibc change which marked vfscanf with nonnull (1) attribute. While vfwscanf hasn't been marked similarly (strangely), the patch changes that too. By using va_arg one hides the value of it from the compiler (volatile keyword would do too, or making the FILE* stream a function argument, but then it might need to be guarded by #if or something). Tested on x86_64-linux, ok for trunk? 2023-10-12 Jakub Jelinek * testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01): Initialize stream to va_arg(ap, FILE*) rather than 0. * testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01): Likewise. --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc.jj 2023-01-16 23:19:06.651711546 +0100 +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdio/functions.cc 2023-10-12 09:46:28.695011763 +0200 @@ -35,7 +35,7 @@ void test01(int dummy, ...) char* s = 0; const char* cs = 0; const char* format = "%i"; - FILE* stream = 0; + FILE* stream = va_arg(ap, FILE*); std::size_t n = 0; int ret; --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc.jj 2023-01-16 23:19:06.651711546 +0100 +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cwchar/functions.cc 2023-10-12 09:46:19.236141897 +0200 @@ -42,7 +42,7 @@ void test01(int dummy, ...) #endif #if _GLIBCXX_HAVE_VFWSCANF - FILE* stream = 0; + FILE* stream = va_arg(arg, FILE*); const wchar_t* format1 = 0; int ret1; ret1 = std::tr1::vfwscanf(stream, format1, arg); Jakub
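A minimal standalone illustration of the trick (an assumed example, not the testcase itself): reading the pointer via va_arg makes its value opaque to the compiler, so the nonnull diagnostic cannot trigger, unlike a literal null initializer:

  #include <stdarg.h>
  #include <stdio.h>

  static void
  use_stream (int dummy, ...)
  {
    va_list ap;
    va_start (ap, dummy);
  #ifdef VISIBLE_NULL
    FILE *stream = 0;			/* warns: null passed to nonnull vfscanf */
  #else
    FILE *stream = va_arg (ap, FILE *);	/* value unknown at compile time */
  #endif
    vfscanf (stream, "%*d", ap);
    va_end (ap);
  }

  int
  main ()
  {
    return 0;	/* compile-only illustration; use_stream is never called */
  }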
Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote: > Jakub Jelinek writes: > > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c > >unsigned int xlen, unsigned int xprecision, > >unsigned int precision, unsigned int shift) > > { > > - unsigned int len = rshift_large_common (val, xval, xlen, xprecision, > > shift); > > + /* Work out how many blocks are needed to store the significant bits > > + (excluding the upper zeros or signs). */ > > + unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); > > + unsigned int len = blocks_needed; > > + if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) > > + && len > xlen > > + && xval[xlen - 1] >= 0) > > +len = xlen; > > I think here too it would be worth dropping the: > > UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) > > part of the condition, since presumably the change should be safe > regardless of that. If so, there is also one spot in lshift_large as well. So incrementally: --- gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200 +++ gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200 @@ -2013,8 +2013,7 @@ /* The whole-block shift fills with zeros. */ unsigned int len = BLOCKS_NEEDED (precision); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -len = xlen + skip + 1; + len = MIN (xlen + skip + 1, len); for (unsigned int i = 0; i < skip; ++i) val[i] = 0; @@ -2079,9 +2078,7 @@ (excluding the upper zeros or signs). */ unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); unsigned int len = blocks_needed; - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) - && len > xlen - && xval[xlen - 1] >= 0) + if (len > xlen && xval[xlen - 1] >= 0) len = xlen; rshift_large_common (val, xval, xlen, shift, len); @@ -2114,9 +2111,7 @@ /* Work out how many blocks are needed to store the significant bits (excluding the upper zeros or signs). */ unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift); - unsigned int len = blocks_needed; - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen) -len = xlen; + unsigned int len = MIN (xlen, blocks_needed); rshift_large_common (val, xval, xlen, shift, len); which I'll test soon. > OK for thw wide-int parts with those changes. Thanks. What do you think about that --- gcc/wide-int.h.jj 2023-10-11 12:05:47.718059477 +0200 +++ gcc/wide-int.h 2023-10-11 13:51:56.081552500 +0200 @@ -1635,6 +1635,8 @@ widest_int_storage ::write_val (unsig u.valp = XNEWVEC (HOST_WIDE_INT, l); return u.valp; } + else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS) +u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef); return u.val; } @@ -1650,6 +1652,9 @@ widest_int_storage ::set_len (unsigne memcpy (u.val, valp, l * sizeof (u.val[0])); XDELETEVEC (valp); } + else if (len && len < WIDE_INT_MAX_INL_ELTS) +gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len] +== HOST_WIDE_INT_UC (0xbaaddeadbeef)); len = l; /* There are no excess bits in val[len - 1]. */ STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0); part, shall that go into trunk as well or is that too much slowdown for checking builds? Jakub
[committed] wide-int: Fix build with gcc < 12 or clang++ [PR111787]
Hi!

While my wide_int patch bootstrapped/regtested fine when I used GCC 12
as system gcc, apparently it doesn't with GCC 11 and older or clang++.
For GCC versions before the PR96555 C++ DR1315 implementation, the
compiler complains about a template argument involving template
parameters; clang++ complains about the same, plus about a missing
needs_write_val_arg static data member in some wi::int_traits
specializations.

I've so far rebuilt just the stage3 gcc subdirectory with this patch, and
Tobias and William made it through stage1 with it.

Committed to unbreak build for others.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111787
	* tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg):
	New static data member.
	(int_traits <extended_tree <N> >::needs_write_val_arg): Likewise.
	(wi::ints_for): Provide separate partial specializations for
	generic_wide_int <extended_tree <N> > and INL_CONST_PRECISION or
	that and CONST_PRECISION, rather than using
	int_traits <extended_tree <N> >::precision_type as the second
	template argument.
	* rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New
	static data member.
	* double-int.h (wi::int_traits <double_int>::needs_write_val_arg):
	Likewise.

--- gcc/tree.h.jj	2023-10-12 16:01:04.000000000 +0200
+++ gcc/tree.h	2023-10-12 16:52:51.977954615 +0200
@@ -6237,6 +6237,7 @@ namespace wi
     static const enum precision_type precision_type = VAR_PRECISION;
     static const bool host_dependent_precision = false;
     static const bool is_sign_extended = false;
+    static const bool needs_write_val_arg = false;
   };
 
   template <int N>
@@ -6262,6 +6263,7 @@ namespace wi
       = N == ADDR_MAX_PRECISION ? INL_CONST_PRECISION : CONST_PRECISION;
     static const bool host_dependent_precision = false;
     static const bool is_sign_extended = true;
+    static const bool needs_write_val_arg = false;
     static const unsigned int precision = N;
   };
 
@@ -6293,8 +6295,14 @@ namespace wi
   tree_to_poly_wide_ref to_poly_wide (const_tree);
 
   template <int N>
-  struct ints_for <generic_wide_int <extended_tree <N> >,
-		   int_traits <extended_tree <N> >::precision_type>
+  struct ints_for <generic_wide_int <extended_tree <N> >, INL_CONST_PRECISION>
+  {
+    typedef generic_wide_int <extended_tree <N> > extended;
+    static extended zero (const extended &);
+  };
+
+  template <int N>
+  struct ints_for <generic_wide_int <extended_tree <N> >, CONST_PRECISION>
   {
     typedef generic_wide_int <extended_tree <N> > extended;
     static extended zero (const extended &);
   };
@@ -6532,8 +6540,15 @@ wi::to_poly_wide (const_tree t)
 template <int N>
 inline generic_wide_int <wi::extended_tree <N> >
 wi::ints_for <generic_wide_int <wi::extended_tree <N> >,
-	      wi::int_traits <wi::extended_tree <N> >::precision_type
->::zero (const extended &x)
+	      wi::INL_CONST_PRECISION>::zero (const extended &x)
+{
+  return build_zero_cst (TREE_TYPE (x.get_tree ()));
+}
+
+template <int N>
+inline generic_wide_int <wi::extended_tree <N> >
+wi::ints_for <generic_wide_int <wi::extended_tree <N> >,
+	      wi::CONST_PRECISION>::zero (const extended &x)
 {
   return build_zero_cst (TREE_TYPE (x.get_tree ()));
 }
--- gcc/rtl.h.jj	2023-09-29 22:04:44.463012421 +0200
+++ gcc/rtl.h	2023-10-12 16:54:59.915240074 +0200
@@ -2270,6 +2270,7 @@ namespace wi
     /* This ought to be true, except for the special case that BImode
        is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
     static const bool is_sign_extended = false;
+    static const bool needs_write_val_arg = false;
     static unsigned int get_precision (const rtx_mode_t &);
     static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
				      const rtx_mode_t &);
--- gcc/double-int.h.jj	2023-10-12 16:01:04.260164202 +0200
+++ gcc/double-int.h	2023-10-12 16:53:41.401292272 +0200
@@ -442,6 +442,7 @@ namespace wi
   {
     static const enum precision_type precision_type = INL_CONST_PRECISION;
     static const bool host_dependent_precision = true;
+    static const bool needs_write_val_arg = false;
     static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
     static unsigned int get_precision (const double_int &);
     static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,

	Jakub
Re: [Patch] libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory
On Thu, Oct 12, 2023 at 06:37:00PM +0200, Tobias Burnus wrote: > libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory > > In OpenMP 5.0/5.1, the semantic of OMP_TARGET_OFFLOAD=mandatory was > insufficiently specified; 5.2 clarified this with extensions/clarifications > (omp_initial_device, omp_invalid_device, "conforming device number"). > GCC's implementation matches OpenMP 5.2. > > libgomp/ChangeLog: > > * libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to > OMP_TARGET_OFFLOAD. > (OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE; > clarify MANDATORY behavior. LGTM. Jakub
Re: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars
On Tue, Oct 10, 2023 at 06:46:35PM +0200, Tobias Burnus wrote:
> 	* parse.cc (check_omp_allocate_stmt): Permit procedure pointers
> 	here (rejected later) for less mislreading diagnostic.

s/misl/mis/

> libgomp/ChangeLog:
>
> 	* libgomp.texi: Fill in something here.

> @@ -7220,8 +7227,7 @@ gfc_resolve_omp_allocate (gfc_namespace *ns, gfc_omp_namelist *list)
> 		   &n->where);
> 	  continue;
> 	}
> -      if (ns != n->sym->ns || n->sym->attr.use_assoc
> -	  || n->sym->attr.host_assoc || n->sym->attr.imported)
> +      if (ns != n->sym->ns || n->sym->attr.use_assoc ||
> 	  n->sym->attr.imported)

s/ n/ n/

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -1400,23 +1400,53 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
> 		     "region must specify an %<allocator%> clause", t);
>        /* Skip for omp_default_mem_alloc (= 1),
> 	  unless align is present.  */
> -      else if (!errorcount
> -	       && (align != NULL_TREE
> -		   || alloc == NULL_TREE
> -		   || !integer_onep (alloc)))
> +      else if (errorcount
> +	       || (align == NULL_TREE
> +		   && alloc != NULL_TREE
> +		   && integer_onep (alloc)))
> +	DECL_ATTRIBUTES (t) = remove_attribute ("omp allocate",
> +						DECL_ATTRIBUTES (t));

Probably already preexisting, but I wonder how safe remove_attribute is.
Aren't attributes shared in some cases (like
__attribute__((attr1, attr2, attr3)) int a, b, c, d;)?  Not really sure
if something unshares them afterwards.  If they are shared, adding new
attributes is fine, that will make the new additions not shared and the
tail shared, but remove_attribute could remove it from all of them at
once.  Perhaps I'm wrong, haven't verified.

Otherwise LGTM (though I didn't spot anything about allocatables in the
patch; am I just looking wrong, or are they still unsupported?)

	Jakub
[PATCH] wide-int: Fix estimation of buffer sizes for wide_int printing [PR111800]
Hi!

As mentioned in the PR, my estimates of the needed buffer size for
wide_int and especially widest_int printing were incorrect: I used
get_len () in the estimations, but that is right only for !wi::neg_p (x)
values.

Under the hood, we have 3 ways to print numbers.
print_decs, which if
  ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT) || (wi.get_len () == 1))
uses sprintf, which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
or negative), and otherwise uses print_hex;
print_decu, which if
  ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
   || (wi.get_len () == 1 && !wi::neg_p (wi)))
uses sprintf, which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
only), and otherwise print_hex;
and print_hex, which doesn't print most significant limbs which are zero
and prints the first non-zero limb such that redundant 0 hex digits
aren't printed, while all limbs below that are printed with
"%016" PRIx64.
For wi::neg_p (x) values, the first limb of the precision is always
non-zero, so we print all the limbs for the precision.
So, the current estimates are accurate if !wi::neg_p (x), or when
print_decs will be used and x.get_len () == 1; otherwise we need to use
an estimate based on get_precision () rather than get_len ().
The following patch does that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

The patch doesn't address what I've talked about earlier, that we might
actually stop using print_hex when asked for print_dec{s,u}: we could for
negative print_decs just negate and call print_decu, and in print_decu
e.g. in a loop UNSIGNED wi::divmod_trunc by
HOST_WIDE_INT_UC (10000000000000000000) and print the 19 decimal digits
of the remainder if the quotient is non-zero, otherwise the non-padded
rest, and then reshuffle the buffer.  And/or perhaps print_hex should
also take signop and print negative hex constants as -0x... if asked for
SIGNED.  And finally, I think we should try to rewrite tree-ssa-ccp.cc
bit-cp from widest_int to wide_int; even the earlier
  PHI node value: CONSTANT 0xffe2 (0x19)
in the -fdump-tree-ccp-details dumps is horribly confusing when the type
is say just 32-bit or 64-bit, and with the recent widest_int changes
those are now around with > 32000 f hex digits in there.  Not to mention
we shouldn't really care about the state of bits beyond the precision,
and I think we always have the type in question around (x.val is an
INTEGER_CST of the right type and we just to::widest it; just x.mask is
widest_int).

2023-10-14  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111800
gcc/
	* wide-int.cc (assert_deceq): Use wi.get_len () for buffer size
	estimation only if !wi::neg_p (wi) or if len is 1 and sgn is SIGNED,
	otherwise use WIDE_INT_MAX_HWIS for wi.get_precision ().
	(assert_hexeq): Use wi.get_len () for buffer size estimation only
	if !wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
	wi.get_precision ().
	* wide-int-print.cc (print_decs): Use wi.get_len () for buffer size
	estimation only if !wi::neg_p (wi) or if len is 1, otherwise use
	WIDE_INT_MAX_HWIS for wi.get_precision ().
	(print_decu): Use wi.get_len () for buffer size estimation only if
	!wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
	wi.get_precision ().
	(print_hex): Likewise.
	* value-range.cc (irange_bitmask::dump): Use get_len () for buffer
	size estimation only if !wi::neg_p (wi), otherwise use
	WIDE_INT_MAX_HWIS for get_precision ().
	* value-range-pretty-print.cc
	(vrange_printer::print_irange_bitmasks): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use i_bound.get_len () for buffer size estimation only if !wi::neg_p (i_bound) or if len is 1 and !TYPE_UNSIGNED, otherwise use WIDE_INT_MAX_HWIS for i_bound.get_precision (). Use TYPE_SIGN macro in print_dec call argument. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Assert w.get_precision () is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS. --- gcc/wide-int.cc.jj 2023-10-13 19:34:44.288830022 +0200 +++ gcc/wide-int.cc 2023-10-13 20:23:12.889386810 +0200 @@ -2450,7 +2450,9 @@ static void assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; unsigned len = wi.get_len (); + if ((len != 1 || sgn == UNSIGNED) && wi::neg_p (wi)) +len = WIDE_INT_MAX_HWIS (wi.get_precision ()); if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); print_dec (wi, p, sgn); @@ -2463,7 +2465,11 @@ static void as
[PATCH] wide-int, v2: Fix estimation of buffer sizes for wide_int printing [PR111800]
Hi! On Sat, Oct 14, 2023 at 10:41:28AM +0200, Richard Biener wrote: > Can we somehow abstract this common pattern? So like this? With room for the future tweaks like printing decimal instead of hex numbers by print_dec*, where we'd only need to adjust the inlines. The XALLOCAVEC call is left for the callers, those would make the inlines uninlinable and not doing what they should. 2023-10-14 Jakub Jelinek PR tree-optimization/111800 gcc/ * wide-int-print.h (print_dec_buf_size, print_decs_buf_size, print_decu_buf_size, print_hex_buf_size): New inline functions. * wide-int.cc (assert_deceq): Use print_dec_buf_size. (assert_hexeq): Use print_hex_buf_size. * wide-int-print.cc (print_decs): Use print_decs_buf_size. (print_decu): Use print_decu_buf_size. (print_hex): Use print_hex_buf_size. (pp_wide_int_large): Use print_dec_buf_size. * value-range.cc (irange_bitmask::dump): Use print_hex_buf_size. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use print_dec_buf_size. Use TYPE_SIGN macro in print_dec call argument. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Assert w.get_precision () is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS. --- gcc/wide-int-print.h.jj 2023-10-13 19:34:44.283830089 +0200 +++ gcc/wide-int-print.h2023-10-14 11:21:44.190603091 +0200 @@ -36,4 +36,40 @@ extern void print_hex (const wide_int_re extern void print_hex (const wide_int_ref &wi, FILE *file); extern void pp_wide_int_large (pretty_printer *, const wide_int_ref &, signop); +inline bool +print_dec_buf_size (const wide_int_ref &wi, signop sgn, unsigned int *len) +{ + unsigned int l = wi.get_len (); + if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi)) +l = WIDE_INT_MAX_HWIS (wi.get_precision ()); + l = l * HOST_BITS_PER_WIDE_INT / 4 + 4; + *len = l; + return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE); +} + +inline bool +print_decs_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + return print_dec_buf_size (wi, SIGNED, len); +} + +inline bool +print_decu_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + return print_dec_buf_size (wi, UNSIGNED, len); +} + +inline bool +print_hex_buf_size (const wide_int_ref &wi, unsigned int *len) +{ + unsigned int l; + if (wi::neg_p (wi)) +l = WIDE_INT_MAX_HWIS (wi.get_precision ()); + else +l = wi.get_len (); + l = l * HOST_BITS_PER_WIDE_INT / 4 + 4; + *len = l; + return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE); +} + #endif /* WIDE_INT_PRINT_H */ --- gcc/wide-int.cc.jj 2023-10-14 11:07:52.738850767 +0200 +++ gcc/wide-int.cc 2023-10-14 11:22:03.100347386 +0200 @@ -2450,9 +2450,9 @@ static void assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_dec_buf_size (wi, sgn, &len)) +p = XALLOCAVEC (char, len); print_dec (wi, p, sgn); ASSERT_STREQ (expected, p); } @@ -2463,9 +2463,9 @@ static void assert_hexeq (const char *expected, const wide_int_ref &wi) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_hex_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_hex (wi, p); ASSERT_STREQ (expected, p); } --- 
gcc/wide-int-print.cc.jj2023-10-14 11:07:52.737850781 +0200 +++ gcc/wide-int-print.cc 2023-10-14 11:37:43.994623668 +0200 @@ -75,9 +75,9 @@ void print_decs (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_decs_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_decs (wi, p); fputs (p, file); } @@ -102,9 +102,9 @@ void print_decu (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len = wi.get_len (); - if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)) -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4); + unsigned len; + if (print_decu_buf_size (wi, &len)) +p = XALLOCAVEC (char, len); print_decu (wi, p); fputs (p, file); } @@ -141,9 +141,9 @@ void print_hex (const wide_int_ref &wi, FILE *file) { char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf; - unsigned len
Re: [patch] libgomp.texi: Update "Enabling OpenMP"
On Sat, Oct 14, 2023 at 03:46:52PM -0600, Sandra Loosemore wrote:
> On 10/14/23 13:43, Tobias Burnus wrote:
> > When browsing libgomp doc, I came across
> > https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > First, I found especially the Fortran part difficult to read.  Secondly,
> > it missed the C++ attribute syntax.  And I also missed a reference to
> > -fopenmp-simd.
> >
> > The attached patch tries to improve this.  Note that it talks about C and
> > C++ attributes, even though C23's [[omp::]] support has not yet landed.
> > (But is expected very soon.)
>
> Is somebody actively working on implementing that, and expecting to get it
> in before Stage 1 closes?  I don't think we should document features that

I am (attached is a WIP, which can now compile most of g++.dg/gomp/attrs-1.C
in -std=c2x -fopenmp, except for the scan/section directives).  That said,
I agree it might be premature to document it before it is in.

> don't exist yet.  This syntax for C also isn't even in the draft OpenMP 6.0
> document, so at this point it's just a hypothetical extension.

It is in OpenMP spec git and it is very unlikely it would be removed.

	Jakub

--- gcc/cp/parser.h.jj	2023-09-20 08:42:51.987008923 +0200
+++ gcc/cp/parser.h	2023-10-12 13:32:42.503496571 +0200
@@ -408,7 +408,8 @@ struct GTY(()) cp_parser {
      identifiers) rather than an explicit template parameter list.  */
   bool fully_implicit_function_template_p;
 
-  /* TRUE if omp::directive or omp::sequence attributes may not appear.  */
+  /* TRUE if omp::directive, omp::decl or omp::sequence attributes may not
+     appear.  */
   bool omp_attrs_forbidden_p;
 
   /* Tracks the function's template parameter list when declaring a function
--- gcc/c/c-decl.cc.jj	2023-10-11 10:59:12.378170030 +0200
+++ gcc/c/c-decl.cc	2023-10-11 17:23:42.902257966 +0200
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.
 #include "context.h"  /* For 'g'.  */
 #include "omp-general.h"
 #include "omp-offload.h"  /* For offload_vars.  */
+#include "c-parser.h"
 #include "tree-pretty-print.h"
 
@@ -325,15 +326,34 @@ i_label_binding (tree node)
 
 /* The resulting tree type.  */
 
-union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE"),
+union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE
+		  + 2 * (TREE_CODE (&%h.generic) == C_TOKEN_VEC)"),
	   chain_next ("(union lang_tree_node *) c_tree_chain_next (&%h.generic)")))
 lang_tree_node
 {
   union tree_node GTY ((tag ("0"),
			desc ("tree_node_structure (&%h)"))) generic;
   struct lang_identifier GTY ((tag ("1"))) identifier;
+  struct c_tree_token_vec GTY ((tag ("2"))) c_token_vec;
 };
 
+/* Langhook for tree_size.  */
+size_t
+c_tree_size (enum tree_code code)
+{
+  gcc_checking_assert (code >= NUM_TREE_CODES);
+  switch (code)
+    {
+    case C_TOKEN_VEC: return sizeof (c_tree_token_vec);
+    default:
+      switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_declaration: return sizeof (tree_decl_non_common);
+	case tcc_type: return sizeof (tree_type_non_common);
+	default: gcc_unreachable ();
+	}
+    }
+}
+
 /* Track bindings and other things that matter for goto warnings.  For
    efficiency, we do not gather all the decls at the point of definition.
    Instead, we point into the bindings structure.  As
--- gcc/c/c-parser.cc.jj	2023-10-11 10:59:12.426169364 +0200
+++ gcc/c/c-parser.cc	2023-10-13 17:47:27.32902 +0200
@@ -247,12 +247,21 @@ struct GTY(()) c_parser {
      macro.  */
   BOOL_BITFIELD seen_string_literal : 1;
 
+  /* TRUE if omp::directive, omp::decl or omp::sequence attributes may not
+     appear.  */
+  BOOL_BITFIELD omp_attrs_forbidden_p : 1;
+
   /* Location of the last consumed token.  */
   location_t last_token_location;
 
   /* Holds state for parsing collapsed OMP_FOR loops.  Managed by
      c_parser_omp_for_loop.  */
   struct omp_for_parse_data * GTY((skip)) omp_for_parse_state;
+
+  /* If we're in the context of OpenMP directives written as C23
+     attributes turned into pragma, vector of tokens created from that,
+     otherwise NULL.  */
+  vec *in_omp_attribute_pragma;
 };
 
 /* Return a pointer to the Nth token in PARSER's tokens_buf.  */
@@ -1383,6 +1392,17 @@ c_parser_skip_to_pragma_eol (c_parser *p
     }
   while (token_type != CPP_PRAGMA_EOL);
 
+  if (parser->in_omp_attribute_pragma)
+    {
+      c_token *token = c_parser_peek_token (parser);
+      if (token->type == CPP_EOF)
+	{
+	  parser->tokens = &parser->tokens_buf[0];
+	  parser->tokens_avail = token->flags;
+	  parser->in_omp_attribute_pragma = NULL;
+	}
+    }
+
   parser->error = false;
 }
 
@@ -5430,6 +5450,109 @@ c_parser_balanced_token_sequence (c_pars
     }
 }
 
+static bool c_parser_check_balanced_raw_token_sequence (c_parser *,
+							unsigned int *);
+
+/* Parse arguments of omp::directive or omp::decl attribute.
+
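[For readers unfamiliar with the syntax under discussion, a minimal hedged
sketch, not taken from the WIP patch above, of what the C23 attribute
spelling is expected to look like, mirroring the existing C++ [[omp::]]
support, compiled with -std=c2x -fopenmp once the support lands:]

/* omp::directive attaches an executable directive to a statement.  */
void
scale (int n, float a, float *restrict x)
{
  [[omp::directive (parallel for)]]
  for (int i = 0; i < n; i++)
    x[i] *= a;
}

/* omp::decl attaches a declarative directive to the declaration itself.  */
[[omp::decl (declare simd)]]
float square (float v);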
[PATCH] wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}
Hi!

The following patch implements printing of wide_int/widest_int numbers
decimally when asked for that using print_dec{,s,u}, even if they have
precision larger than 64 and get_len () above 1 (right now we printed them
hexadecimally, and even negative numbers as huge positive hexadecimal).

In order to avoid the expensive division/modulo by 10^19 twice, once to
estimate how many digits will be needed and again to actually print them,
the patch prints the 19-digit chunks in reverse order (from least
significant to most significant) and then reorders them with linear
complexity to form the right printed number.

Tested with printing both 256 and 320 bit numbers (the first as an example
of an even number of 19-digit chunks plus one shorter chunk above them, the
second as an example of an odd number of 19-digit chunks plus one shorter
chunk above them).

The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimation, thinking about it now,
is one byte too much (one byte for -, one for '\0') and too conservative,
so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as an even less conservative
estimation (though more expensive to compute in inline code).  But that
l * HOST_BITS_PER_WIDE_INT / 4 + 4 is likely one byte too much as well:
2 bytes for 0x, one byte for '\0', and where does the 4th one come from?
Of course, all of these assume HOST_BITS_PER_WIDE_INT is a multiple of
64...

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-17  Jakub Jelinek

	* wide-int-print.h (print_dec_buf_size): For length, divide number
	of bits by 3 and add 3 instead of division by 4 and adding 4.
	* wide-int-print.cc (print_decs): Remove superfluous ()s.  Don't
	call print_hex, instead call print_decu on either negated value
	after printing - or on wi itself.
	(print_decu): Don't call print_hex, instead print even large numbers
	decimally.
	(pp_wide_int_large): Assume len from print_dec_buf_size is big enough
	even if it returns false.
	* pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
	pp_wide_int_large should be used.
	* tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
	to compute needed buffer size.

--- gcc/wide-int-print.h.jj	2023-10-15 23:04:06.195422820 +0200
+++ gcc/wide-int-print.h	2023-10-16 10:14:41.327401697 +0200
@@ -42,7 +42,7 @@ print_dec_buf_size (const wide_int_ref &
   unsigned int l = wi.get_len ();
   if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi))
     l = WIDE_INT_MAX_HWIS (wi.get_precision ());
-  l = l * HOST_BITS_PER_WIDE_INT / 4 + 4;
+  l = l * HOST_BITS_PER_WIDE_INT / 3 + 3;
   *len = l;
   return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE);
 }
--- gcc/wide-int-print.cc.jj	2023-10-15 23:04:06.195422820 +0200
+++ gcc/wide-int-print.cc	2023-10-16 11:20:30.662174735 +0200
@@ -49,14 +49,12 @@ print_dec (const wide_int_ref &wi, FILE
 }
 
-/* Try to print the signed self in decimal to BUF if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the signed self in decimal to BUF.  */
 
 void
 print_decs (const wide_int_ref &wi, char *buf)
 {
-  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
-      || (wi.get_len () == 1))
+  if (wi.get_precision () <= HOST_BITS_PER_WIDE_INT || wi.get_len () == 1)
     {
       if (wi::neg_p (wi))
	 sprintf (buf, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
@@ -64,12 +62,17 @@ print_decs (const wide_int_ref &wi, char
       else
	 sprintf (buf, HOST_WIDE_INT_PRINT_DEC, wi.to_shwi ());
     }
+  else if (wi::neg_p (wi))
+    {
+      widest2_int w = widest2_int::from (wi, SIGNED);
+      *buf = '-';
+      print_decu (-w, buf + 1);
+    }
   else
-    print_hex (wi, buf);
+    print_decu (wi, buf);
 }
 
-/* Try to print the signed self in decimal to FILE if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the signed self in decimal to FILE.  */
 
 void
 print_decs (const wide_int_ref &wi, FILE *file)
@@ -82,8 +85,7 @@ print_decs (const wide_int_ref &wi, FILE
   fputs (p, file);
 }
 
-/* Try to print the unsigned self in decimal to BUF if the number fits
-   in a HWI.  Other print in hex.  */
+/* Try to print the unsigned self in decimal to BUF.  */
 
 void
 print_decu (const wide_int_ref &wi, char *buf)
@@ -92,11 +94,37 @@ print_decu (const wide_int_ref &wi, char
       || (wi.get_len () == 1 && !wi::neg_p (wi)))
     sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, wi.to_uhwi ());
   else
-    print_hex (wi, buf);
+    {
+      widest2_int w = widest2_int::from (wi, UNSIGNED), r;
+      widest2_int ten19 = HOST_WIDE_INT_UC (1000);
+      char buf2[20], next1[19], next2[19];
+      size_t l, c = 0, i;
+      /* In order to avoid dividing this twice, print the 19 decimal
	 digit chunks in reverse order into buffer
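[To illustrate the chunk-and-reorder idea, a minimal standalone sketch, not
the GCC code itself, using unsigned __int128 as a stand-in for a multi-word
wide_int: emit base-10^19 chunks least-significant first, then print them
back in reverse, so the expensive division chain runs only once.]

#include <stdio.h>

static void
print_u128 (unsigned __int128 x, char *buf)
{
  const unsigned long long ten19 = 10000000000000000000ULL;  /* 10^19 */
  char chunks[3][20];  /* 128 bits -> at most 39 digits -> 3 chunks */
  int n = 0;
  do
    {
      unsigned long long rem = (unsigned long long) (x % ten19);
      x /= ten19;
      /* Zero-pad all chunks except the most significant one.  */
      sprintf (chunks[n++], x ? "%019llu" : "%llu", rem);
    }
  while (x);
  for (int i = n - 1; i >= 0; i--)  /* linear reorder, MSB chunk first */
    buf += sprintf (buf, "%s", chunks[i]);
}

int
main (void)
{
  char buf[40];  /* 39 digits + '\0'; cf. the l * 64 / 3 + 3 estimate */
  unsigned __int128 v
    = ((unsigned __int128) 12345678901234567890ULL << 64) | 42;
  print_u128 (v, buf);
  puts (buf);
  return 0;
}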
Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:
> C++11 (and C23) attributes do not seem to be properly handled:
>
> [[omp::decl (declare target,indirect(1))]]
> int foo(void) { return 5; }
> [[omp::decl (declare target indirect)]]
> int bar(void) { return 8; }

Isn't that correct?  The declare target directive has the forms:
  declare target (list)
  declare target
  declare target clauses
The first form is essentially equivalent to declare target enter (list),
the second to begin declare target with no clauses.
Now,
  [[omp::decl (declare target)]] int v;
matches the first form, and so an enter (v) clause is implied.  But say
  [[omp::decl (declare target, device_type (any))]] int v;
is the third form, so nothing is implied and it is equivalent to
  int v;
  #pragma omp declare target device_type (any)
I don't remember whether that is supposed to be an error or just not do
anything, because there is no enter, to or link clause.

So, I think if you want to make foo indirect, the above would have to be:

[[omp::decl (declare target,enter,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect enter)]]
int bar(void) { return 8; }

or so (or to instead of enter, but I guess that is either deprecated or
removed; we should support it anyway).

	Jakub
Re: [r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64
On Wed, Oct 18, 2023 at 07:14:36AM +0000, Richard Biener wrote:
> It's interesting that when the target has AVX512 enabled we get
> AVX512 style masks used also for SSE and AVX vector sizes but the
> OMP SIMD clones for SSE and AVX vector sizes use SSE/AVX style
> masks and only the AVX512 size clone uses the AVX512 integer mode
> mask.  That necessarily requires an extra setup instruction for
> the mask argument.

It is an ABI matter: the ABI of the clones shouldn't change just because of
a supposedly non-ABI-changing option (ISA flags like -mavx512f etc.).
Under the hood, if the callers are compiled with -mavx512f, the expectation
is that the AVX512 simd clone will be used, but of course that doesn't have
to be the case either, e.g. because of options requesting only 256-bit or
128-bit vector width, loops with a small safelen or iteration count, or
other reasons.

	Jakub
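[A hedged sketch of the interface being discussed, not the regressed
testcase itself: for an inbranch simd clone the compiler materializes a
mask parameter whose representation (SSE/AVX vector vs. AVX512 kmask) is
fixed by the clone's own ISA level, independently of the caller's flags.]

#pragma omp declare simd inbranch
int foo (int x);

void
caller (int *a, const int *b, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = foo (b[i]);  /* conditional call: the clone's mask argument
			     encodes which lanes are active */
}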
Re: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars
On Wed, Oct 18, 2023 at 11:12:44AM +0200, Thomas Schwinge wrote:
> Hi Tobias!
>
> On 2023-10-13T15:29:52+0200, Tobias Burnus wrote:
> > => Updated patch attached
>
> When cherry-picking this commit 2d3dbf0eff668bed5f5f168b3cafd8590c54
> "Fortran: Support OpenMP's 'allocate' directive for stack vars" on top of
> slightly older GCC sources (mentioning that just in case that's
> relevant), in a configuration with offloading enabled (only), I see:
>
> +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (internal compiler error:
> tree code 'statement_list' is not supported in LTO streams)
> +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (test for excess errors)

Any references to GENERIC code in clauses etc. should have been gimplified
or cleared during gimplification; we shouldn't support STATEMENT_LIST in
LTO streaming.

	Jakub
[PATCH] tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]
Hi!

GCC ICEs on the first testcase.  A successful match_uaddc_usubc ends up
with some dead stmts which DCE will (hopefully) remove later.  The ICE is
because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that we
don't really have a replacement for those cases - all we have after a match
is the combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try to
match
  x = op1 + op2 + carry1 + carry2;
or
  x = op1 - op2 - carry1 - carry2;
we just gsi_replace the final stmt, but leave the 2 temporary stmts around
as dead; if we were unlucky enough that those referenced the carry flag
that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one; we supported e.g.
  x = ((op1 + op2) + op3) + op4;
  x = op1 + ((op2 + op3) + op4);
but not
  x = (op1 + (op2 + op3)) + op4;
  x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what we
were searching for (if non-NULL) - rhs[0] is inspected in the first loop
and has different handling for the MINUS_EXPR case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-18  Jakub Jelinek

	PR tree-optimization/111845
	* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
	statements for the 4 operand addition or subtraction of 3 operands
	from 1 operand cases and remove them when successful.  Look for
	nested additions even from rhs[2], not just rhs[1].

	* gcc.dg/pr111845.c: New test.
	* gcc.target/i386/pr111845.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj	2023-09-18 10:37:56.180963000 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-10-17 14:46:39.430139602 +0200
@@ -4581,6 +4581,7 @@ match_uaddc_usubc (gimple_stmt_iterator
   if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
     return false;
 
+  auto_vec temp_stmts;
   if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
     {
       /* If overflow flag is ignored on the MSB limb, we can end up with
@@ -4615,26 +4616,29 @@ match_uaddc_usubc (gimple_stmt_iterator
	      rhs[0] = gimple_assign_rhs1 (g);
	      tree &r = rhs[2] ? rhs[3] : rhs[2];
	      r = r2;
+	      temp_stmts.quick_push (g);
	    }
	  else
	    break;
	}
-  while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
-    {
-      gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
-      if (has_single_use (rhs[1])
-	  && is_gimple_assign (g)
-	  && gimple_assign_rhs_code (g) == PLUS_EXPR)
-	{
-	  rhs[1] = gimple_assign_rhs1 (g);
-	  if (rhs[2])
-	    rhs[3] = gimple_assign_rhs2 (g);
-	  else
-	    rhs[2] = gimple_assign_rhs2 (g);
-	}
-      else
-	break;
-    }
+  for (int i = 1; i <= 2; ++i)
+    while (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME && !rhs[3])
+      {
+	gimple *g = SSA_NAME_DEF_STMT (rhs[i]);
+	if (has_single_use (rhs[i])
+	    && is_gimple_assign (g)
+	    && gimple_assign_rhs_code (g) == PLUS_EXPR)
+	  {
+	    rhs[i] = gimple_assign_rhs1 (g);
+	    if (rhs[2])
+	      rhs[3] = gimple_assign_rhs2 (g);
+	    else
+	      rhs[2] = gimple_assign_rhs2 (g);
+	    temp_stmts.quick_push (g);
+	  }
+	else
+	  break;
+      }
   /* If there are just 3 addends or one minuend and two subtrahends,
      check for UADDC or USUBC being pattern recognized earlier.
      Say r = op1 + op2 + ovf1 + ovf2; where the (ovf1 + ovf2) part
@@ -5039,7 +5043,17 @@ match_uaddc_usubc (gimple_stmt_iterator
   g = gimple_build_assign (ilhs, IMAGPART_EXPR,
			    build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
   if (rhs[2])
-    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+    {
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      /* Remove some further statements which can't be kept in the IL
	 because they can use SSA_NAMEs whose setter is going to be
	 removed too.  */
+      while (temp_stmts.length ())
+	{
+	  g = temp_stmts.pop ();
+	  gsi2 = gsi_for_stmt (g);
+	  gsi_remove (&gsi2, true);
+
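[A hedged illustration, not the PR testcase, of the most-significant-limb
shape discussed above: the final limb adds two operands plus the two
carries from the lower limbs, and the patch makes all reassociations of
those additions match, e.g. this previously unhandled nesting:]

unsigned long
msb_limb (unsigned long op1, unsigned long op2,
	  unsigned long carry1, unsigned long carry2)
{
  /* Corresponds to x = op1 + (op2 + (op3 + op4)); in the list above.  */
  return op1 + (op2 + (carry1 + carry2));
}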
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Sun, Oct 15, 2023 at 12:43:10PM +0100, Richard Sandiford wrote: > It seemed like there was considerable support for bumping the minimum > to beyond 4.8. I think we should wait until a decision has been made > before adding more 4.8 workarounds. I think adding a workaround until that decision is made and perhaps removing it afterwards will make life easier for people still using gcc 4.8. > Having a conditional explicit constructor is dangerous because it changes > semantics. E.g. consider: > > #include > > union u { int x; }; > void f(u *ptr) { new(ptr) u; } > void g(u *ptr) { new(ptr) u(); } > > g(ptr) zeros ptr->x whereas f(ptr) doesn't. If we add "u() {}" then g() > does not zero ptr->x. > > So if we did add the workaround, it would need to be unconditional, > like you say. What about using more directed workaround then? Like (just stage1 build tested, perhaps with comment why we do that) below? Seems at least in stage1 it is the only problematic spot. --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -4951,8 +4951,14 @@ cse_insn (rtx_insn *insn) && is_a (mode, &int_mode) && (extend_op = load_extend_op (int_mode)) != UNKNOWN) { +#if GCC_VERSION >= 5000 struct rtx_def memory_extend_buf; rtx memory_extend_rtx = &memory_extend_buf; +#else + alignas (alignof (rtx_def)) unsigned char + memory_extended_buf[sizeof (rtx_def)]; + rtx memory_extend_rtx = (rtx) &memory_extended_buf[0]; +#endif /* Set what we are trying to extend and the operation it might have been extended with. */ Jakub
Re: [Patch] OpenMP: Avoid ICE with LTO and 'omp allocate (was: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars)
On Wed, Oct 18, 2023 at 12:56:01PM +0200, Tobias Burnus wrote:
> On 18.10.23 11:36, Jakub Jelinek wrote:
> > On Wed, Oct 18, 2023 at 11:12:44AM +0200, Thomas Schwinge wrote:
> > > +FAIL: gfortran.dg/gomp/allocate-13.f90 -O (internal compiler
> > > error: tree code 'statement_list' is not supported in LTO streams)
> > Any references to GENERIC code in clauses etc. should have been
> > gimplified or cleared during gimplification, we shouldn't support
> > STATEMENT_LIST in LTO streaming.
>
> We only needed the statement list as an aid during the gimplify.cc
> handling of GOMP_alloc/GOMP_free for Fortran.  How about just
> remove_attribute it in that case?  As discussed, as DECL_ATTRIBUTE gets
> added from the left to the DECL_CHAIN, there shouldn't be a problem of
> introducing shared trees; note that 'omp allocate' itself is added per
> DECL, i.e. it does not introduce sharing itself, either.
>
> Tested with x86-64-gnu-linux.
>
> Tobias
>
> OpenMP: Avoid ICE with LTO and 'omp allocate'
>
> gcc/ChangeLog:
>
> 	* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
> 	to avoid that the auxiliary statement list reaches LTO.
>
> gcc/testsuite/ChangeLog:
>
> 	* gfortran.dg/gomp/allocate-13a.f90: New test.

LGTM.

	Jakub
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Wed, Oct 18, 2023 at 11:23:49AM +0100, Richard Sandiford wrote:
> > --- a/gcc/cse.cc
> > +++ b/gcc/cse.cc
> > @@ -4951,8 +4951,14 @@ cse_insn (rtx_insn *insn)
> >	  && is_a (mode, &int_mode)
> >	  && (extend_op = load_extend_op (int_mode)) != UNKNOWN)
> >	{
> > +#if GCC_VERSION >= 5000
> >	  struct rtx_def memory_extend_buf;
> >	  rtx memory_extend_rtx = &memory_extend_buf;
> > +#else
> > +	  alignas (alignof (rtx_def)) unsigned char
> > +	    memory_extended_buf[sizeof (rtx_def)];
>
> Looks like the simpler "alignas (rtx_def)" should work.

It does.

> LGTM otherwise FWIW.

Here is what I'm bootstrapping/regtesting on gcc112 now (i.e. with 4.8.5 as
the system compiler), with added details about which bug we are working
around.  The reduced testcase on which I've bisected it is:

struct poly_int {
  poly_int() = default;
  template poly_int(const T &...) {}
};
union rtunion {
  poly_int rt_subregrt_rtx;
};
struct rtx_def {
  rtunion fld;
};
void cse_insn() { rtx_def memory_extend_buf; }

or even with just the template poly_int(); line in there.  Bet gcc 4.8/4.9
was unhappy about the template variadic ctor accepting an empty pack and
being like the default ctor but not defaulted in that case.  Making it
guaranteed that it has at least one argument, say through
  template poly_int(const U &, const T &...) {}
fixes it for 4.8/4.9 as well.

2023-10-18  Jakub Jelinek

	PR bootstrap/111852
	* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
	using rtx_def type for memory_extend_buf, use unsigned char
	array with size of rtx_def and its alignment.

--- gcc/cse.cc.jj	2023-06-20 08:57:38.339505245 +0200
+++ gcc/cse.cc	2023-10-18 13:20:30.555836778 +0200
@@ -4951,8 +4951,15 @@ cse_insn (rtx_insn *insn)
	  && is_a (mode, &int_mode)
	  && (extend_op = load_extend_op (int_mode)) != UNKNOWN)
	{
+#if GCC_VERSION >= 5000
	  struct rtx_def memory_extend_buf;
	  rtx memory_extend_rtx = &memory_extend_buf;
+#else
+	  /* Workaround GCC < 5 bug, fixed in r5-3834 as part of PR63362
+	     fix.  */
+	  alignas (rtx_def) unsigned char
+	    memory_extended_buf[sizeof (rtx_def)];
+	  rtx memory_extend_rtx = (rtx) &memory_extended_buf[0];
+#endif

	  /* Set what we are trying to extend and the operation it might
	     have been extended with.  */

	Jakub
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Wed, Oct 18, 2023 at 01:33:40PM +0200, Jakub Jelinek wrote: > Making it guaranteed that it has at least one argument say through > template poly_int(const U &, const T &...) {} > fixes it for 4.8/4.9 as well. So, perhaps (but so far totally untested, the other bootstrap is still running): 2023-10-18 Jakub Jelinek * poly-int.h (poly_int::poly_int): Ensure the const Cs &... argument ctor has at least one argument. --- gcc/poly-int.h.jj 2023-10-13 19:34:44.112832389 +0200 +++ gcc/poly-int.h 2023-10-18 13:49:29.038751482 +0200 @@ -379,8 +379,8 @@ public: template poly_int (const poly_int &); - template - constexpr poly_int (const Cs &...); + template + constexpr poly_int (const C0 &, const Cs &...); poly_int &operator = (const poly_int &) = default; @@ -446,11 +446,11 @@ poly_int::poly_int (const poly_int } template -template +template inline constexpr -poly_int::poly_int (const Cs &... cs) +poly_int::poly_int (const C0 &c0, const Cs &... cs) : poly_int (typename poly_int_fullness= N>::type (), - cs...) {} + c0, cs...) {} /* Initialize with c0, cs..., and some trailing zeros. */ template Jakub
Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]
On Wed, Oct 18, 2023 at 05:28:10PM +, waffl3x wrote: > I've seen plenty of these G_ or _ macros on strings around like in > grokfndecl for these errors. > > G_("static member function %qD cannot have cv-qualifier") > G_("non-member function %qD cannot have cv-qualifier") > > G_("static member function %qD cannot have ref-qualifier") > G_("non-member function %qD cannot have ref-qualifier") > > I have been able to figure out it relates to translation, but not > exactly what the protocol around them is. I think in my original patch > I had refactored this code a bunch, I figured adding a 3rd case to it > justifies a refactor. I think I forgot to add those changes to the > original patch, either that or I undid it or moved it somewhere else. > Anyway, the point is, coming back to it now to re-add those diagnostics > I realized I probably shouldn't have changed those strings. > > I also have been wondering whether I should be putting macros on any > strings I add, it seemed like there might have been a macro for text > that needs translation. Is this something I should be doing? There are different kinds of format strings in GCC, the most common are the gcc-internal-format strings. If you call a function which is expected to take such translatable format string (in particular a function which takes a gmsgid named argument like error, error_at, pedwarn, warning_at, ...) and pass a string literal to that function, nothing needs to be marked in a special way, both gcc/po/exgettext is able to collect such literals into gcc/po/gcc.pot for translations and the function is supposed to use gettext etc. to translate it - e.g. see diagnostic_set_info using _(gmsgid) for that. But, if there is e.g. a temporary pointer var which points to format strings and only that is eventually passed to the diagnostic functions, gcc/po/exgettext won't be able to collect such literals, which is where the G_() macro comes into play and one marks the string as gcc-internal-format with it; the translation is still handled by the diagnostic function at runtime. The N_() macro is similar but for c-format strings instead. The _() macro both collects for translations if it is used with string literal, and expands to gettext call to translate it at runtime, which is something that should be avoided if something translates it again. And another i18n rule is that one shouldn't try to construct diagnostic messages from parts of english sentences, it is fine to fill in with %s/%qs etc. language keywords etc. but otherwise the format string should contain the whole diagnostic line, so that translators can reorder the words etc. Jakub
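[For a concrete picture of the two cases Jakub describes, a hedged sketch;
the strings are the grokfndecl ones quoted above, the surrounding code is
hypothetical:]

/* Literal passed directly to a diagnostic function: no markup needed,
   gcc/po/exgettext finds it and the function translates at runtime.  */
error ("static member function %qD cannot have cv-qualifier", decl);

/* Format string goes through a variable first: mark the literals with
   G_() so exgettext can still collect them; translation still happens
   later inside the diagnostic machinery.  */
const char *msg
  = staticp ? G_("static member function %qD cannot have cv-qualifier")
	    : G_("non-member function %qD cannot have cv-qualifier");
error (msg, decl);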
Re: gcc 13.2 is missing warnings?
On Thu, Oct 19, 2023 at 07:39:43AM -0400, Eric Sokolowsky via Gcc wrote:
> I am using gcc 13.2 on Fedora 38.  Consider the following program.
>
> #include <stdio.h>
> int main(int argc, char **argv)
> {
>     printf("Enter a number: ");
>     int num = 0;
>     scanf("%d", &num);
>
>     switch (num)
>     {
>     case 1:
>         int a = num + 3;
>         printf("The new number is %d.\n", a);
>         break;
>     case 2:
>         int b = num - 4;
>         printf("The new number is %d.\n", b);
>         break;
>     default:
>         int c = num * 3;
>         printf("The new number is %d.\n", c);
>         break;
>     }
> }
>
> I would expect that gcc would complain about the declaration of
> variables (a, b, and c) within the case statements.  When I run "gcc
> -Wall t.c" I get no warnings.  When I run "g++ -Wall t.c" I get
> warnings and errors as expected.  I do get warnings when using MinGW on
> Windows (gcc version 6.3 specifically).  Did something change in 13.2?

C isn't C++.  In particular, the above is valid C23, which is why it is
accepted as an extension in older C language versions starting with GCC 11.
It is warned about with -pedantic/-Wpedantic and errored on with
-pedantic-errors/-Werror=pedantic unless -std=c2x or -std=gnu2x is used.

The C++ case is completely different.  There, labels are allowed before
declarations already in C++98, but it is invalid to cross the
initialization of some variable using the jump to the case 2 or default
labels above.  If you rewrite it as:

    case 1:
        int a;
        a = num + 3;
        printf("The new number is %d.\n", a);
        break;
    case 2:
        int b;
        b = num - 4;
        printf("The new number is %d.\n", b);
        break;
    default:
        int c;
        c = num * 3;
        printf("The new number is %d.\n", c);
        break;

it is valid C++ and it won't be diagnosed.

Note, this should have been posted to gcc-help instead.

	Jakub
Re: [PATCH] ABOUT-GCC-NLS: add usage guidance
On Thu, Oct 19, 2023 at 11:11:30AM -0400, Jason Merrill wrote:
> A recent question led me to look at this file again, and it occurred to
> me that it could use to offer more guidance.  OK for trunk?
>
> -- 8< --
>
> gcc/ChangeLog:
>
> 	* ABOUT-GCC-NLS: Add usage guidance.
> ---
>  gcc/ABOUT-GCC-NLS | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/gcc/ABOUT-GCC-NLS b/gcc/ABOUT-GCC-NLS
> index e90a67144e3..4c8b94d0811 100644
> --- a/gcc/ABOUT-GCC-NLS
> +++ b/gcc/ABOUT-GCC-NLS
> @@ -23,6 +23,19 @@ For example, GCC source code should not contain calls like `error
>  ("unterminated comment")' instead, as it is the `error' function's
>  responsibility to translate the message before the user sees it.
>
> +In general, use no markup for strings that are the immediate format string
> +argument of a diagnostic function.  Use G_("str") for strings that will be
> +used as the format string for a diagnostic but are e.g. assigned to a
> +variable first.  Use N_("str") for other strings, particularly in a
> +statically allocated array, that will be translated later by
> +e.g. _(msgs[idx]).  Use _("str") for strings that will not be translated

The difference between G_ and N_ is whether they are GCC internal
diagnostic format strings or printf family format strings
(gcc-internal-format vs. c-format in po/gcc.pot), not anything else, so G_
can be used in a statically allocated array as well, both for the
diagnostic routines and for the printf* family; when translation is
desirable, the string is eventually translated through _() or other
gettext invocations, either inside of diagnostics.cc or elsewhere.
So, the note about statically allocated arrays should move to G_ as well,
and N_ should be described similarly, but for strings which after
translation are passed to printf family functions or so.  And the
"translated later by e.g." note should again apply to both, saying that
for the GCC diagnostic routines it is done automatically.

	Jakub
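[In code, the corrected guidance would look like this hedged sketch; array
names are hypothetical:]

/* G_(): gcc-internal-format strings, fine in a static array too; the
   diagnostic routines translate them automatically.  */
static const char *const diag_msgs[]
  = { G_("jump to label %qD"), G_("jump to case label") };
error (diag_msgs[idx], decl);

/* N_(): c-format strings handed to the printf family; translate
   explicitly with _() at the point of use.  */
static const char *const plain_msgs[] = { N_("candidate %d: %s\n") };
printf (_(plain_msgs[0]), i, name);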
Re: [pushed] c++: use G_ instead of _
On Thu, Oct 19, 2023 at 11:31:58AM -0400, Jason Merrill wrote: > --- a/gcc/cp/decl.cc > +++ b/gcc/cp/decl.cc > @@ -3607,8 +3607,8 @@ identify_goto (tree decl, location_t loc, const > location_t *locus, > { >bool complained > = emit_diagnostic (diag_kind, loc, 0, > -decl ? N_("jump to label %qD") > -: N_("jump to case label"), decl); > +decl ? G_("jump to label %qD") N_ for this is wrong because gettext will then not properly verify translators didn't screw things up by using some incompatible format string in the translation. I believe some translations e.g. changed %s to %S. And that seems to be still the case: grep -B3 '[^%]%S' po/*.po po/sr.po-#, fuzzy, gcc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Multiple %qs modifiers specified at %C" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Duplicate %s attribute specified at %L" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/sr.po-#| msgid "Duplicate %s attribute specified at %L" po/sr.po-msgid "Duplicate BIND attribute specified at %L" po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L" -- po/tr.po-#, fuzzy, gcc-internal-format, gfc-internal-format po/tr.po-#| msgid "%s statement must appear in a MODULE" po/tr.po-msgid "%s statement must appear in a MODULE" po/tr.po:msgstr "%S deyimi bir MODULE'de görünmemeli" > +: G_("jump to case label"), decl); While in this case G_ is better just for consistency, N_ would work exactly the same given that there are no format strings. Jakub
Re: [PATCH RFA] diagnostic: rename new permerror overloads
On Thu, Oct 19, 2023 at 11:45:01AM -0400, Jason Merrill wrote: > OK for trunk? > > -- 8< -- > > While checking another change, I noticed that the new permerror overloads > break gettext with "permerror used incompatibly as both > --keyword=permerror:2 --flag=permerror:2:gcc-internal-format and > --keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's > change the name. > > gcc/ChangeLog: > > * diagnostic-core.h (permerror): Rename new overloads... > (permerror_opt): To this. > * diagnostic.cc: Likewise. > > gcc/cp/ChangeLog: > > * typeck2.cc (check_narrowing): Adjust. Ok. Jakub
Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c
On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
> Given the array has size i + 1 its upper bound should be 'i' and 'i'
> should be available via DW_OP_[GNU_]entry_value.
>
> I see it is
>
> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31 1c
> (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl; DW_OP_const1u: 32;
> DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>
> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
> storage itself doesn't have a location but the type specifies the size.
>
> (gdb) ptype a
> type = char [variable length]
> (gdb) p sizeof(a)
> $3 = 0
>
> this looks like a gdb bug to me?
>
> Btw, the location expression looks odd, if I decipher it correctly
> we end up with ((%rdi << 32) >> 32) - 1 which computes to 4
> but the upper bound should be 5.  The GIMPLE debug stmts compute
> the upper bound as (sizetype)((long)(i_1(D) + 1) - 1)

The << 32 >> 32 is sign extension.  And yes, for f1 I don't see why
DW_OP_GNU_entry_value shouldn't work; i in main is needed for the call to
f2, so it needs to live in some register or memory in that function until
the second call.  For f2, i is needed after the bar call for the a[i + 4]
read, worst case in the form of a precomputed i + 4, but that is a
reversible op.

	Jakub
[C++ PATCH] Fix extern_decl_map handling (PR c++/3698, PR c++/86208)
Hi! This testcase got fixed in G++ 3.2, where we used for decision if inline function body should be kept TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (...)), but we now use cgraph and the testcase fails again; the particular problem is that we set TREE_USED only on the local extern decl and don't propagate it to the decl that extern_decl_map maps it into. Fixed by the following patch, which propagates it at genericization time. Bootstrapped/regtested on x86_64-linux and i686-linux. Another option is: + /* If decl is local extern, recurse into corresponding decl. */ + if (cfun + && cp_function_chain + && cp_function_chain->extern_decl_map + && VAR_OR_FUNCTION_DECL_P (decl) + && DECL_EXTERNAL (decl)) +{ + struct cxx_int_tree_map *h, in; + in.uid = DECL_UID (decl); + h = cp_function_chain->extern_decl_map->find_with_hash (&in, in.uid); + if (h) + TREE_USED (h->to) = 1; +} + in mark_used, another one: + /* If decl is local extern, recurse into corresponding decl. */ + if (cfun + && cp_function_chain + && cp_function_chain->extern_decl_map + && VAR_OR_FUNCTION_DECL_P (decl) + && DECL_EXTERNAL (decl)) +{ + struct cxx_int_tree_map *h, in; + in.uid = DECL_UID (decl); + h = cp_function_chain->extern_decl_map->find_with_hash (&in, in.uid); + if (h && !mark_used (h->to)) + return false; +} + in the same spot. None of these fix the PR82204 though. 2018-07-02 Jakub Jelinek PR c++/3698 PR c++/86208 * cp-gimplify.c (cp_genericize_r): When using extern_decl_map, or in TREE_USED flag from stmt to h->to. * g++.dg/opt/pr3698.C: New test. --- gcc/cp/cp-gimplify.c.jj 2018-06-20 08:15:28.980857357 +0200 +++ gcc/cp/cp-gimplify.c2018-07-02 18:03:00.714313555 +0200 @@ -1085,6 +1085,7 @@ cp_genericize_r (tree *stmt_p, int *walk if (h) { *stmt_p = h->to; + TREE_USED (h->to) |= TREE_USED (stmt); *walk_subtrees = 0; return NULL; } --- gcc/testsuite/g++.dg/opt/pr3698.C.jj2018-07-02 18:05:52.535479087 +0200 +++ gcc/testsuite/g++.dg/opt/pr3698.C 2018-07-02 18:05:44.507471531 +0200 @@ -0,0 +1,21 @@ +// PR c++/3698 +// { dg-do link } +// { dg-options "-O0" } + +struct X { + int i; +}; + +inline const int& +OHashKey (const X& x) +{ + return x.i; +} + +int +main () +{ + extern const int& OHashKey (const X& x); + X x; + return OHashKey (x); +} Jakub
[C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
Hi! While working on OpenMP range-for support, I ran into the weird thing that the C++ FE accepts for (auto &i : a) if (i != *__for_begin || &i == __for_end || &__for_range[0] != &a[0]) __builtin_abort (); outside of templates, but doesn't inside of templates. I think we shouldn't let people do this at any time, it will just lead to non-portable code, and when it works only outside of templates, it isn't a well defined extension. Now, we could just use create_tmp_var_name ("__for_range") instead of get_identifier ("__for_range") etc. and the symbols would be non-accessible to users (usually; if assembler doesn't support dots nor dollars in labels, it is still a symbol with just alphanumeric chars in it, but there is a hard to predict number suffix at least), or just NULL DECL_NAME. But my understanding is that the intent was that in the debugger one could use __for_range, __for_begin and __for_end to make it easier to debug range fors. In that case, the following patch solves it by using a name not accessible for the user when parsing the body (with space in it) and correcting the names when the var gets out of scope. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-02 Jakub Jelinek PR c++/85515 * parser.c (build_range_temp): Use "__for range" instead of "__for_range". (cp_convert_range_for): Use "__for begin" and "__for end" instead of "__for_begin" and "__for_end". * semantics.c (finish_for_stmt): Rename "__for {range,begin,end}" local symbols to "__for_{range,begin,end}". * g++.dg/pr85515-2.C: Add expected dg-error. * g++.dg/cpp0x/range-for36.C: New test. --- gcc/cp/parser.c.jj 2018-07-02 23:04:00.869370436 +0200 +++ gcc/cp/parser.c 2018-07-02 23:08:44.572618486 +0200 @@ -11953,7 +11953,7 @@ build_range_temp (tree range_expr) /* Create the __range variable. */ range_temp = build_decl (input_location, VAR_DECL, - get_identifier ("__for_range"), range_type); + get_identifier ("__for range"), range_type); TREE_USED (range_temp) = 1; DECL_ARTIFICIAL (range_temp) = 1; @@ -12061,7 +12061,7 @@ cp_convert_range_for (tree statement, tr /* The new for initialization statement. */ begin = build_decl (input_location, VAR_DECL, - get_identifier ("__for_begin"), iter_type); + get_identifier ("__for begin"), iter_type); TREE_USED (begin) = 1; DECL_ARTIFICIAL (begin) = 1; pushdecl (begin); @@ -12072,7 +12072,7 @@ cp_convert_range_for (tree statement, tr if (cxx_dialect >= cxx17) iter_type = cv_unqualified (TREE_TYPE (end_expr)); end = build_decl (input_location, VAR_DECL, - get_identifier ("__for_end"), iter_type); + get_identifier ("__for end"), iter_type); TREE_USED (end) = 1; DECL_ARTIFICIAL (end) = 1; pushdecl (end); --- gcc/cp/semantics.c.jj 2018-06-25 14:51:23.096989196 +0200 +++ gcc/cp/semantics.c 2018-07-02 23:23:40.784400542 +0200 @@ -1060,7 +1060,31 @@ finish_for_stmt (tree for_stmt) : &FOR_SCOPE (for_stmt)); tree scope = *scope_ptr; *scope_ptr = NULL; + + /* During parsing of the body, range for uses "__for {range,begin,end}" + decl names to make those unaccessible by code in the body. + Change it to ones with underscore instead of space, so that it can + be inspected in the debugger. 
*/ + tree range_for_decl[3] = { NULL_TREE, NULL_TREE, NULL_TREE }; + for (int i = 0; i < 3; i++) +{ + tree id + = get_identifier ("__for range\0__for begin\0__for end" + 12 * i); + if (IDENTIFIER_BINDING (id) + && IDENTIFIER_BINDING (id)->scope == current_binding_level) + { + range_for_decl[i] = IDENTIFIER_BINDING (id)->value; + gcc_assert (VAR_P (range_for_decl[i]) + && DECL_ARTIFICIAL (range_for_decl[i])); + } +} + add_stmt (do_poplevel (scope)); + + for (int i = 0; i < 3; i++) +if (range_for_decl[i]) + DECL_NAME (range_for_decl[i]) + = get_identifier ("__for_range\0__for_begin\0__for_end" + 12 * i); } /* Begin a range-for-statement. Returns a new RANGE_FOR_STMT. --- gcc/testsuite/g++.dg/pr85515-2.C.jj 2018-04-27 22:28:02.889462532 +0200 +++ gcc/testsuite/g++.dg/pr85515-2.C2018-07-02 23:33:50.638930364 +0200 @@ -15,8 +15,7 @@ int test_2 () int sum = 0; for (const auto v: arr) { sum += v; -// TODO: should we issue an error for the following assignment? -__for_begin = __for_end; +__for_begin = __for_end; // { dg-error "was not declared in this scope" } } return sum;
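[For reference, the three artificial variables come from the standard
range-for desugaring, roughly as in this simplified sketch; member/ADL
begin selection and lifetime details omitted:]

// for (auto &i : a) body;  behaves like:
{
  auto &&__for_range = a;                  // "__for range" while parsing
  auto __for_begin = begin (__for_range);  // "__for begin"
  auto __for_end = end (__for_range);      // "__for end"
  for (; __for_begin != __for_end; ++__for_begin)
    {
      auto &i = *__for_begin;
      // body
    }
}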
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote: > Can we make them DECL_ARTIFICIAL and/or make name-lookup never They are DECL_ARTIFICIAL already. > lookup DECL_ARTIFICIAL vars instead? Not sure about that, will try to gather some statistics on how often we rely on name-lookup of DECL_ARTIFICIALs. Jakub
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 11:58:31AM +0200, Richard Biener wrote: > On Tue, Jul 3, 2018 at 11:43 AM Jakub Jelinek wrote: > > > > On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote: > > > Can we make them DECL_ARTIFICIAL and/or make name-lookup never > > > > They are DECL_ARTIFICIAL already. > > > > > lookup DECL_ARTIFICIAL vars instead? > > > > Not sure about that, will try to gather some statistics on how often we > > rely on name-lookup of DECL_ARTIFICIALs. > > Hmm, we might indeed. At least we should make sure those > cases never have valid identifiers? Or is the implementation At least __FUNCTION__, __PRETTY_FUNCTION__, __func__ are all local VAR_DECLs with DECL_ARTIFICIAL that need to be found by name lookup (so far gathered stats just show __FUNCTION__ in lookup_name_real_1). Jakub
Re: [C++ PATCH] Hide __for_{range,begin,end} symbols (PR c++/85515)
On Tue, Jul 03, 2018 at 01:24:28PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 03, 2018 at 11:58:31AM +0200, Richard Biener wrote:
> > On Tue, Jul 3, 2018 at 11:43 AM Jakub Jelinek wrote:
> > > On Tue, Jul 03, 2018 at 11:34:51AM +0200, Richard Biener wrote:
> > > > Can we make them DECL_ARTIFICIAL and/or make name-lookup never
> > >
> > > They are DECL_ARTIFICIAL already.
> > >
> > > > lookup DECL_ARTIFICIAL vars instead?
> > >
> > > Not sure about that, will try to gather some statistics on how often we
> > > rely on name-lookup of DECL_ARTIFICIALs.
> >
> > Hmm, we might indeed.  At least we should make sure those
> > cases never have valid identifiers?  Or is the implementation
>
> At least __FUNCTION__, __PRETTY_FUNCTION__, __func__ are all local
> VAR_DECLs with DECL_ARTIFICIAL that need to be found by name lookup
> (so far gathered stats just show __FUNCTION__ in lookup_name_real_1).

And it isn't limited to that: omp_priv/omp_orig/omp_in/omp_out too (the
OpenMP UDR artificial vars), also variables captured in lambdas, anon
union fields, ...  So I'm afraid it is not possible to ignore
DECL_ARTIFICIAL VAR_DECLs in name lookup, and generally it is ok that they
have user accessible names.  Just the range for vars are a special case.

	Jakub
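[To make the name-lookup cases concrete, a hedged sketch of user-visible
DECL_ARTIFICIAL entities from the list above:]

void
f ()
{
  const char *who = __FUNCTION__;  // artificial local VAR_DECL
  union { int i; float fl; };      // anonymous union: i and fl are
  i = 1;                           // artificial, yet found by lookup
}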
Re: [PATCH] Fix bootstrap on ia64 with old GCC version.
On Tue, Jul 03, 2018 at 07:22:19PM +0200, Martin Liška wrote:
> In order to make GCC 4.1 happy and build current tip, we need to define
> static constants out of a class definition.
>
> Ready for trunk?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-07-03  Martin Liska
>
> 	* tree-switch-conversion.h (struct jump_table_cluster): Define
> 	constant values outside of class declaration.

That looks incorrect.  I don't see why 4.1 wouldn't allow the static const
data member initializers inside of the class.  You just need to define
those vars, and the definition (without the initializers) shouldn't go into
the header, but to a single .c file instead (I know right now there is just
one .c file that includes this header, but if we ever want to include it in
more than one, it would be a problem; if we never want to include it in
more than one, the question is why we have the header file at all).
So IMHO keep tree-switch-conversion.h unmodified and add:

const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_size;
const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_speed;

to tree-switch-conversion.c somewhere.

> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 4beac785f05..8efb125aff1 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -259,12 +259,17 @@ struct jump_table_cluster: public group_cluster
>    static bool is_enabled (void);
>
>    /* Max growth ratio for code that is optimized for size.  */
> -  static const unsigned HOST_WIDE_INT max_ratio_for_size = 3;
> +  static const unsigned HOST_WIDE_INT max_ratio_for_size;
>
>    /* Max growth ratio for code that is optimized for speed.  */
> -  static const unsigned HOST_WIDE_INT max_ratio_for_speed = 8;
> +  static const unsigned HOST_WIDE_INT max_ratio_for_speed;
>  };
>
> +const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_size = 3;
> +
> +const unsigned HOST_WIDE_INT jump_table_cluster::max_ratio_for_speed = 8;
> +
> +
>  /* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
>     comparisons.  "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
>     where CST and MINVAL are integer constants.  This is better than a series

	Jakub
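[A minimal example of the pre-C++17 rule being invoked here; names are
hypothetical: the in-class initializer is only a declaration, the
out-of-class definition without the initializer goes in exactly one
translation unit.]

// header:
struct jt
{
  static const unsigned max_ratio = 3;  // declaration + initializer
};

// exactly one .c file:
const unsigned jt::max_ratio;           // definition, no initializer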
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Tue, Jul 03, 2018 at 10:02:47PM +0100, Jonathan Wakely wrote:
> +#ifndef _GLIBCXX_BIT
> +#define _GLIBCXX_BIT 1
> +
> +#pragma GCC system_header
> +
> +#if __cplusplus >= 201402L
> +
> +#include
> +#include
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +  template
> +    constexpr _Tp
> +    __rotl(_Tp __x, unsigned int __s) noexcept
> +    {
> +      constexpr auto _Nd = numeric_limits<_Tp>::digits;
> +      const unsigned __sN = __s % _Nd;
> +      if (__sN)
> +        return (__x << __sN) | (__x >> (_Nd - __sN));

Wouldn't it be better to use some branchless pattern that GCC can also
optimize well, like:
      return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
(iff _Nd is always a power of two), or perhaps
      return (__x << __sN) | (__x >> ((-__sN) % _Nd));
which is going to be folded into the above one for power of two constants?

E.g. ia32intrin.h also uses:

/* 64bit rol */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rolq (unsigned long long __X, int __C)
{
  __C &= 63;
  return (__X << __C) | (__X >> (-__C & 63));
}

etc.

	Jakub
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote: > > Wouldn't it be better to use some branchless pattern that > > GCC can also optimize well, like: > > return (__x << __sN) | (__x >> ((-_sN) & (_Nd - 1))); > > (iff _Nd is always power of two), > > _Nd is 20 for one of the INT_N types on msp340, but we could have a > special case for the rare integer types with unusual sizes. > > > or perhaps > > return (__x << __sN) | (__x >> ((-_sN) % _Nd)); > > which is going to be folded into the above one for power of two constants? > > That looks good. Unfortunately it is not correct if _Nd is not power of two. E.g. for __sN 1, -1U % 20 is 15, not 19. So it would need to be return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); Unfortunately, our rotate pattern recognizer handles return (__x << __sN) | (__x >> ((-__sN) % _Nd)); or return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1))); but doesn't handle the _Nd - __sN case. Is this C++17+ only? Then perhaps if constexpr ((_Nd & (_Nd - 1)) == 0) return (__x << __sN) | (__x >> (-__sN & (_Nd - 1))); return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); Verify that on x86_64 for all the unsigned {char,short,int,long long} you actually get a mere rol? instruction with perhaps some register movement, but no masking, nor shifts etc. > > E.g. ia32intrin.h also uses: > > /* 64bit rol */ > > extern __inline unsigned long long > > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > > __rolq (unsigned long long __X, int __C) > > { > > __C &= 63; > > return (__X << __C) | (__X >> (-__C & 63)); > > } > > etc. > > Should we delegate to those intrinsics for x86, so that > __builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when > relevant? No, the pattern recognizers should handle (for power of two bitcounts) even the char/short cases. Those intrinsics predate the improvements in rotate pattern recognition. Jakub
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
On Wed, Jul 04, 2018 at 09:14:04AM +0100, Jonathan Wakely wrote: > > Unfortunately it is not correct if _Nd is not power of two. > > E.g. for __sN 1, -1U % 20 is 15, not 19. > > So it would need to be > > return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); > > Unfortunately, our rotate pattern recognizer handles > > return (__x << __sN) | (__x >> ((-__sN) % _Nd)); > > or > > return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1))); > > but doesn't handle the _Nd - __sN case. > > Is this C++17+ only? Then perhaps > > The std::rotr and std::rotl functions are C++2a only, but I've added > the __rotr and __rotl versions for our own internal use in C++14 and > later. > > In practice I have no internal use for rotr and rotl, so I could > remove the __rot[rl] forms. However, won't ((_Nd & (_Nd - 1)) optimize > to a constant even without if-constexpr? I'll check. It should, sure. Anyway, I guess my C tests didn't test properly what happens in your functions. We actually do optimize: unsigned long long foo (unsigned long long __x, unsigned int __s) { constexpr int _Nd = 64; unsigned int _sN = __s % _Nd; return (__x << _sN) | (__x >> ((_Nd - _sN) % _Nd)); } properly, just don't do it if it is: unsigned long long foo (unsigned long long __x, unsigned int __s) { int _Nd = 64; unsigned int _sN = __s % _Nd; return (__x << _sN) | (__x >> ((_Nd - _sN) % _Nd)); } Apparently, the (64 - x) & 63 optimization to -x & 63 is only done somewhere in the FEs and not in match.pd, which is something we should fix. But with constexpr _Nd you actually can use return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd)); unconditionally. Jakub
[PATCH] Move ((A & N) + B) & M -> (A + B) & M etc. optimization from fold-const.c to match.pd (PR tree-optimization/86401)
Hi! I've encountered this while testing the rotate patterns in discussion with Jonathan for the std::__rot{l,r}, in the rotate-9.c test without this patch f1 is optimized into a rotate, but f2 is not. Fixed by moving the optimization from fold-const.c to match.pd (well, leaving a helper in fold-const.c to avoid too much duplication). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-05 Jakub Jelinek PR tree-optimization/86401 * fold-const.c (fold_binary_loc) : Move the ((A & N) + B) & M -> (A + B) & M etc. optimization into ... (fold_bit_and_mask): ... here. New helper function for match.pd. * fold-const.h (fold_bit_and_mask): Declare. * match.pd (((A & N) + B) & M -> (A + B) & M): New optimization. * gcc.dg/tree-ssa/pr86401-1.c: New test. * gcc.dg/tree-ssa/pr86401-2.c: New test. * c-c++-common/rotate-9.c: New test. --- gcc/fold-const.c.jj 2018-06-26 09:05:23.196346433 +0200 +++ gcc/fold-const.c2018-07-04 12:47:59.139981801 +0200 @@ -10236,121 +10236,6 @@ fold_binary_loc (location_t loc, enum tr } } - /* For constants M and N, if M == (1LL << cst) - 1 && (N & M) == M, -((A & N) + B) & M -> (A + B) & M -Similarly if (N & M) == 0, -((A | N) + B) & M -> (A + B) & M -and for - instead of + (or unary - instead of +) -and/or ^ instead of |. -If B is constant and (B & M) == 0, fold into A & M. */ - if (TREE_CODE (arg1) == INTEGER_CST) - { - wi::tree_to_wide_ref cst1 = wi::to_wide (arg1); - if ((~cst1 != 0) && (cst1 & (cst1 + 1)) == 0 - && INTEGRAL_TYPE_P (TREE_TYPE (arg0)) - && (TREE_CODE (arg0) == PLUS_EXPR - || TREE_CODE (arg0) == MINUS_EXPR - || TREE_CODE (arg0) == NEGATE_EXPR) - && (TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0)) - || TREE_CODE (TREE_TYPE (arg0)) == INTEGER_TYPE)) - { - tree pmop[2]; - int which = 0; - wide_int cst0; - - /* Now we know that arg0 is (C + D) or (C - D) or --C and arg1 (M) is == (1LL << cst) - 1. -Store C into PMOP[0] and D into PMOP[1]. */ - pmop[0] = TREE_OPERAND (arg0, 0); - pmop[1] = NULL; - if (TREE_CODE (arg0) != NEGATE_EXPR) - { - pmop[1] = TREE_OPERAND (arg0, 1); - which = 1; - } - - if ((wi::max_value (TREE_TYPE (arg0)) & cst1) != cst1) - which = -1; - - for (; which >= 0; which--) - switch (TREE_CODE (pmop[which])) - { - case BIT_AND_EXPR: - case BIT_IOR_EXPR: - case BIT_XOR_EXPR: - if (TREE_CODE (TREE_OPERAND (pmop[which], 1)) - != INTEGER_CST) - break; - cst0 = wi::to_wide (TREE_OPERAND (pmop[which], 1)) & cst1; - if (TREE_CODE (pmop[which]) == BIT_AND_EXPR) - { - if (cst0 != cst1) - break; - } - else if (cst0 != 0) - break; - /* If C or D is of the form (A & N) where - (N & M) == M, or of the form (A | N) or - (A ^ N) where (N & M) == 0, replace it with A. */ - pmop[which] = TREE_OPERAND (pmop[which], 0); - break; - case INTEGER_CST: - /* If C or D is a N where (N & M) == 0, it can be - omitted (assumed 0). */ - if ((TREE_CODE (arg0) == PLUS_EXPR -|| (TREE_CODE (arg0) == MINUS_EXPR && which == 0)) - && (cst1 & wi::to_wide (pmop[which])) == 0) - pmop[which] = NULL; - break; - default: - break; - } - - /* Only build anything new if we optimized one or both arguments -above. */ - if (pmop[0] != TREE_OPERAND (arg0, 0) - || (TREE_CODE (arg0) != NEGATE_EXPR - && pmop[1] != TREE_OPERAND (arg0, 1))) - { - tree utype = TREE_TYPE (arg0); - if (! TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0))) - { - /* Perform the operations in a type that has defined -overflow behavior. */ - utype = unsigned_type_for (TREE_TYPE (arg0)); - if (pmop[0] != NULL)
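[The kind of difference this makes, as a hedged sketch in the spirit of the
new rotate-9.c test, not its literal contents:]

unsigned
f1 (unsigned x, unsigned y)
{
  return (x << (y & 31)) | (x >> ((-y) & 31));
}

unsigned
f2 (unsigned x, unsigned y)
{
  /* Needs the moved fold: since 32 & 31 == 0, the constant minuend can
     be dropped under the mask, so (32 - y) & 31 becomes (-y) & 31 and
     the rotate pattern recognizer then matches both functions.  */
  return (x << (y & 31)) | (x >> ((32 - y) & 31));
}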
Re: [PATCH][PR sanitizer/84250] Avoid global symbols collision when using both ASan and UBSan
On Wed, Jul 04, 2018 at 08:20:47PM +0300, Maxim Ostapenko wrote: > On 07/04/2018 05:45 AM, Jeff Law wrote: > > On 05/23/2018 11:15 AM, Maxim Ostapenko wrote: > >> as described in PR, when using both ASan and UBSan > >> (-fsanitize=address,undefined ), we have symbols collision for global > >> functions, like __sanitizer_set_report_path. This leads to fuzzy results > >> when printing reports into files e.g. for this test case: > >> > >> #include > >> int main(int argc, char **argv) { > >> __sanitizer_set_report_path("/tmp/sanitizer.txt"); > >> int i = 23; > >> i <<= 32; > >> int *array = new int[100]; > >> delete [] array; > >> return array[argc]; > >> } > >> > >> only ASan's report gets written to file; UBSan output goes to stderr. > >> > >> To resolve this issue we could use two approaches: > >> > >> 1) Use the same approach to that is implemented in Clang (UBSan embedded > >> to ASan). The only caveat here is that we need to link (unused) C++ part > >> of UBSan even in C programs when linking static ASan runtime. This > >> happens because GCC, as opposed to Clang, doesn't split C and C++ > >> runtimes for sanitizers. > >> > >> 2) Just add SANITIZER_INTERFACE_ATTRIBUTE to report_file global > >> variable. In this case all __sanitizer_set_report_path calls will set > >> the same report_file variable. IMHO this is a hacky way to fix the > >> issue, it's better to use the first option if possible. > >> > >> > >> The attached patch fixes the symbols collision by embedding UBSan into > >> ASan (variant 1), just like we do for LSan. > >> > >> > >> Regtested/bootstrapped on x86_64-unknown-linux-gnu, looks reasonable > >> enough for trunk? > >> > >> > >> -Maxim > >> > >> > >> pr84250-2.diff > >> > >> > >> gcc/ChangeLog: > >> > >> 2018-05-23 Maxim Ostapenko > >> > >>* config/gnu-user.h (LIBASAN_EARLY_SPEC): Pass -lstdc++ for static > >>libasan. > >>* gcc.c: Do not pass LIBUBSAN_SPEC if ASan is enabled with UBSan. > >> > >> libsanitizer/ChangeLog: > >> > >> 2018-05-23 Maxim Ostapenko > >> > >>* Makefile.am: Reorder libs. > >>* Makefile.in: Regenerate. > >>* asan/Makefile.am: Define DCAN_SANITIZE_UB=1, add dependancy from > >>libsanitizer_ubsan.la. > >>* asan/Makefile.in: Regenerate. > >>* ubsan/Makefile.am: Define new libsanitizer_ubsan.la library. > >>* ubsan/Makefile.in: Regenerate. > > You know this code better than anyone else working on GCC. My only > > concern would be the kernel builds with asan, but I suspect they're > > providing their own runtime anyway, so the libstdc++ caveat shouldn't apply. > > Yes, you are right, kernel provides its own runtime. > > > > > OK for the trunk. > > Ok, thanks, I'll apply the patch today (with fixed ChangeLog entry). This broke the c-c++-common/asan/pr59063-2.c test: FAIL: c-c++-common/asan/pr59063-2.c -O1 (test for excess errors) Excess errors: /usr/bin/ld: cannot find -lstdc++ While it could be fixed by tweaking asan-dg.exp, thinking about this, the 1) variant is actually not a good idea, it will not work properly anyway if you link one library with -fsanitize=undefined and another library with -fsanitize=address, the right solution is to make the two libraries coexist sanely, so I'd prefer 2) or if not exporting a variable, export an accessor function to get the address of the variable (or whole block of shared state in one object between the libraries). Yes, it means trying to get something accepted upstream, but anything else is an ugly hack. Jakub
Re: [PATCH] Fix several AVX512 intrinsic mask arguments.
On Thu, Jul 05, 2018 at 08:30:27PM +0300, Grazvydas Ignotas wrote:
> gcc/ChangeLog:
>
> 2018-07-05  Grazvydas Ignotas
>
> 	* config/i386/avx512bwintrin.h: (_mm512_mask_cmp_epi8_mask,
> 	_mm512_mask_cmp_epu8_mask): Fix mask arguments.

LGTM, but
1) I think it would be nice to add a runtime testcase that fails (on
   avx512bw hw) without this patch and succeeds with this patch (have some
   non-zero and zero bits in the high 32 bits of the mask and test that the
   result is correct)
2) there are other functions that have this bug, e.g.
   _mm_mask_cmp_epi8_mask, _mm256_mask_cmp_epi8_mask,
   _mm_mask_cmp_epu8_mask, _mm256_mask_cmp_epu8_mask in avx512vlbwintrin.h

Let's grep for all suspicious parts:

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask8.*__mmask\(16\|32\|64\)'

_mm512_mask_bitshuffle_epi64_mask (__mmask8 __M, __m512i __A, __m512i __B) { return (__mmask64) __builtin_ia32_vpshufbitqmb512_mask ((__v64qi) __A, (__v64qi) __B, (__mmask64) __M); }
_mm_mask_cmp_epi8_mask (__mmask8 __U, __m128i __X, __m128i __Y, const int __P) { return (__mmask16) __builtin_ia32_cmpb128_mask ((__v16qi) __X, (__v16qi) __Y, __P, (__mmask16) __U); }
_mm_mask_cmp_epu8_mask (__mmask8 __U, __m128i __X, __m128i __Y, const int __P) { return (__mmask16) __builtin_ia32_ucmpb128_mask ((__v16qi) __X, (__v16qi) __Y, __P, (__mmask16) __U); }

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask16.*__mmask\(8\|32\|64\)'

_mm512_mask_xor_epi64 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) { return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A, (__v8di) __B, (__v8di) __W, (__mmask8) __U); }
_mm512_maskz_xor_epi64 (__mmask16 __U, __m512i __A, __m512i __B) { return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A, (__v8di) __B, (__v8di) _mm512_setzero_si512 (), (__mmask8) __U); }
_mm512_mask_cmpneq_epi64_mask (__mmask16 __M, __m512i __X, __m512i __Y) { return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, (__v8di) __Y, 4, (__mmask8) __M); }
_mm256_mask_cmp_epi8_mask (__mmask16 __U, __m256i __X, __m256i __Y, const int __P) { return (__mmask32) __builtin_ia32_cmpb256_mask ((__v32qi) __X, (__v32qi) __Y, __P, (__mmask32) __U); }
_mm256_mask_cmp_epu8_mask (__mmask16 __U, __m256i __X, __m256i __Y, const int __P) { return (__mmask32) __builtin_ia32_ucmpb256_mask ((__v32qi) __X, (__v32qi) __Y, __P, (__mmask32) __U); }
_mm_mask_add_ps (__m128 __W, __mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_addps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) __W, (__mmask8) __U); }
_mm_maskz_add_ps (__mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_addps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }
_mm256_mask_add_ps (__m256 __W, __mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_addps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) __W, (__mmask8) __U); }
_mm256_maskz_add_ps (__mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_addps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm_mask_sub_ps (__m128 __W, __mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_subps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) __W, (__mmask8) __U); }
_mm_maskz_sub_ps (__mmask16 __U, __m128 __A, __m128 __B) { return (__m128) __builtin_ia32_subps128_mask ((__v4sf) __A, (__v4sf) __B, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }
_mm256_mask_sub_ps (__m256 __W, __mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_subps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) __W, (__mmask8) __U); }
_mm256_maskz_sub_ps (__mmask16 __U, __m256 __A, __m256 __B) { return (__m256) __builtin_ia32_subps256_mask ((__v8sf) __A, (__v8sf) __B, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm256_maskz_cvtepi32_ps (__mmask16 __U, __m256i __A) { return (__m256) __builtin_ia32_cvtdq2ps256_mask ((__v8si) __A, (__v8sf) _mm256_setzero_ps (), (__mmask8) __U); }
_mm_maskz_cvtepi32_ps (__mmask16 __U, __m128i __A) { return (__m128) __builtin_ia32_cvtdq2ps128_mask ((__v4si) __A, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); }

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' | grep '__mmask32.*__mmask\(8\|16\|64\)'

_mm512_mask_cmp_epi8_mask (__mmask32 __U, __m512i __X, __m512i __Y, const int __P) { return (__mmask64) __builtin_ia32_cmpb512_mask ((__v64qi) __X, (__v64qi) __Y, __P, (__mmask64) __U); }
_mm512_mask_cmp_epu8_mask (__mmask32 __U, __m512i __X, __m512i __Y, const int __P) { return (__mmask64) __builtin_ia32_ucmpb512_mask ((__v64qi) __X, (__v64qi) __Y, __P, (__mmask64) __U); }
_mm256_mask_cvtepi8_epi16 (__m256i __W, __mmask32 __U, __m128i __A) { return (__m256i) __builtin_ia32_pmovsxbw256_mask ((__v16qi) __A, (__v16hi) __W, (__mmask16) __U); }
_mm_
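For 1), a minimal sketch of the runtime test I have in mind (assumes avx512bw hw and -mavx512bw; with the current __mmask32 argument type the upper 32 bits of the mask are silently dropped at the implicit conversion, so the masked compare returns the wrong result):

#include <immintrin.h>

int
main (void)
{
  __m512i x = _mm512_set1_epi8 (1);
  /* Zero bits in the low half, both zero and non-zero bits in the
     high 32 bits of the mask.  */
  __mmask64 m = 0xdeadbeef00000000ULL;
  /* All 64 byte elements compare equal, so the result must be exactly
     the input mask.  */
  __mmask64 r = _mm512_mask_cmp_epi8_mask (m, x, x, 0 /* eq */);
  if (r != m)
    __builtin_abort ();
  return 0;
}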
Re: [PATCH] Fix several AVX512 intrinsic mask arguments.
On Thu, Jul 05, 2018 at 11:57:26PM +0300, Grazvydas Ignotas wrote:
> I think it would be more efficient if you took care of it. I won't
> have time for at least a few days anyway.

Here is what I have ATM, but will still need to work on the testsuite side for it and bootstrap/regtest it. In addition to the checks I've posted I've also done:

echo `sed -n '/^_mm.*__mmask/,/^}/p' config/i386/*.h | sed 's/^}/@@@/'` | sed 's/@@@/}\n/g' > /tmp/11
echo `sed -n '/^#define[ \t]_mm/,/)$/p' config/i386/*.h | sed 's/)$/@@@/' | sed 's/\\$//'` | sed 's/@@@/)\n/g' | grep __mmask >> /tmp/11
for i in `grep '__builtin.*_UQI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_UHI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_USI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_UDI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_QI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_HI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_SI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_DI)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_UQI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_UHI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_USI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_UDI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done
for i in `grep '__builtin.*_QI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask8; done
for i in `grep '__builtin.*_HI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask16; done
for i in `grep '__builtin.*_SI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done
for i in `grep '__builtin.*_DI_INT)' config/i386/i386-builtin.def | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask64; done

and finally manual checking (could be automated too) of:

for i in `grep '__builtin.*_INT)' config/i386/i386-builtin.def | grep -v '_U\?[QHSD]I_INT)' | sed 's/^[^"]*"//;s/".*$//' | sort -u`; do grep $i'[ \t(].*__mmask' /tmp/11 | grep -v __mmask32; done

For this last one, it is about trying to verify what kind of '__v\(2\|4\|8\|16\|32\|64\)[qhsd][if]' is used with the different __mmask, 2/4/8 should be used with __mmask8, 16 with __mmask16, 32 with __mmask32 and 64 with __mmask64.

Some fixes in the patch are mostly for consistency and harmless for code generation (e.g. when argument should have been __mmask8 but was __mmask16), but several changes should fix wrong-code bugs.

--- gcc/config/i386/avx512bwintrin.h.jj	2018-01-03 10:20:06.699535804 +0100
+++ gcc/config/i386/avx512bwintrin.h	2018-07-06 09:53:44.657235040 +0200
@@ -3043,7 +3043,7 @@ _mm512_cmp_epi16_mask (__m512i __X, __m5
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi8_mask (__mmask32 __U, __m512i __X, __m512i __Y,
+_mm512_mask_cmp_epi8_mask (__mmask64 __U, __m512i __X, __m512i __Y,
 			   const int __P)
 {
   return (__mmask64) __builtin_ia32_cmpb512_mask ((__v64qi) __X,
@@ -3081,7 +3081,7 @@ _mm512_cmp_epu16_mask (__m512i __X, __m5
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu8_mask (__mmask32 __U, __m512i __X, __m512i __Y,
+_mm512_mask_cmp_epu8_mask (__mmask64 __U, __m512i __X, __m512i __Y,
 			   const int __P)
 {
   return (__mmask64) __builtin_ia32_ucmpb512_mask ((__v64qi) __X,
--- gcc/config/i386/avx512bitalgintrin.h.jj
Re: [committed][gcc][patch] Require sse for testcase on i686.
On Fri, Jul 06, 2018 at 11:46:43AM +0100, Tamar Christina wrote:
> This fixes an ABI warning generated on i686-pc-linux-gnu when using
> `vector_size` with no sse enabled explicitly.
>
> Regtested single test on x86_64-pc-linux-gnu with -m32 and no issues.
>
> Committed under the GCC obvious rule.

That is insufficient, I get the FAIL on i686-linux.
You don't really need dg-require-effective-target sse; it is a compile-time-only (dg-do compile) test and on i?86/x86_64 failures with old assemblers would show up only when assembling.
But -msse really should be used on all i?86-*-* and x86_64-*-*.
You'd normally get the -msse etc. automatically, but dg-options is undesirable in gcc.dg/vect/ tests because all the predefined options are lost that way; dg-additional-options should be used instead (and you can use there -O2 -fno-tree-vectorize or whatever you want).
So, if you have spare cycles, please test whether the test still FAILs on arm with your patch reverted after such test changes.

> gcc/testsuite/
> 2018-07-06  Tamar Christina
>
> 	PR target/84711
> 	* gcc.dg/vect/pr84711.c: Add -msse for i686 targets.

In the meantime, I've committed the following fix as obvious:

2018-07-07  Jakub Jelinek

	PR target/84711
	* gcc.dg/vect/pr84711.c: Remove unnecessary sse
	dg-require-effective-target.  Add -msse not just on i386-*, but on
	all i?86-* and x86_64-*.

--- gcc/testsuite/gcc.dg/vect/pr84711.c.jj	2018-07-06 23:35:44.952791972 +0200
+++ gcc/testsuite/gcc.dg/vect/pr84711.c	2018-07-07 09:43:27.068785902 +0200
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target sse { target i386*-*-* } } */
 /* { dg-options "-O2" } */
-/* { dg-additional-options "-msse" { target i386*-*-* } } */
+/* { dg-additional-options "-msse" { target i?86-*-* x86_64-*-* } } */
 
 typedef int v4si __attribute__ ((vector_size (16)));

	Jakub
[PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)
Hi!

So, apparently I've misread when exceptions are raised by nextafter/nexttoward (and errno set).

	if(((ix>=0x7ff00000)&&((ix-0x7ff00000)|lx)!=0) ||   /* x is nan */
	   ((iy>=0x7ff00000)&&((iy-0x7ff00000)|ly)!=0))     /* y is nan */
	   return x+y;

I believe the above only raises an exception if either operand is sNaN and thus we should handle it:

  if (REAL_VALUE_ISSIGNALING_NAN (*arg0)
      || REAL_VALUE_ISSIGNALING_NAN (*arg1))
    return false;

Then:

	if(x==y) return y;		/* x=y, return y */

If arguments are equal, no exception is raised even if it is denormal, we handle this with:

  /* If x == y, return y cast to target type.  */
  if (cmp == 0)
    {
      real_convert (r, fmt, y);
      return false;
    }

Next:

	if((ix|lx)==0) {			/* x == 0 */
	    double u;
	    INSERT_WORDS(x,hy&0x80000000,1);	/* return +-minsubnormal */
	    u = math_opt_barrier (x);
	    u = u*u;
	    math_force_eval (u);		/* raise underflow flag */
	    return x;
	}

going from zero to +/- ulp should only raise underflow, but not set errno, handled with:

  /* Similarly for nextafter (0, 1) raising underflow.  */
  else if (flag_trapping_math
	   && arg0->cl == rvc_zero
	   && result->cl != rvc_zero)
    return false;

Then overflow:

	hy = hx&0x7ff00000;
	if(hy>=0x7ff00000) {
	    double u = x+x;		/* overflow  */
	    math_force_eval (u);
	    __set_errno (ERANGE);
	}

should be handled with:

  if (REAL_EXP (r) > fmt->emax)
    {
      get_inf (r, x->sign);
      return true;
    }

And finally there is:

	if(hy<0x00100000) {
	    double u = x*x;		/* underflow */
	    math_force_eval (u);	/* raise underflow flag */
	    __set_errno (ERANGE);
	}

which I've misread as raising an exception and setting errno only if the result is 0 and the first arg isn't:

  return r->cl == rvc_zero;

but actually it is true also for all denormal returns except for the x == y case, so the following patch uses:

  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;

instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-07-07  Jakub Jelinek

	PR c/86420
	* real.c (real_nextafter): Return true if result is denormal.

	* gcc.dg/nextafter-1.c (TEST): Adjust the tests that expect denormals
	to be returned and when first argument is not 0, so that they don't
	do anything for NEED_EXC or NEED_ERRNO.

--- gcc/real.c.jj	2018-05-06 23:12:49.211619736 +0200
+++ gcc/real.c	2018-07-06 18:42:44.761026632 +0200
@@ -5141,7 +5141,7 @@ real_nextafter (REAL_VALUE_TYPE *r, form
       get_zero (r, x->sign);
       return true;
     }
-  return r->cl == rvc_zero;
+  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;
 }
 
 /* Write into BUF the maximum representable finite floating-point
--- gcc/testsuite/gcc.dg/nextafter-1.c.jj	2018-05-10 09:38:03.040250709 +0200
+++ gcc/testsuite/gcc.dg/nextafter-1.c	2018-07-06 19:17:55.138355524 +0200
@@ -58,23 +58,41 @@ name (void)						\
     = (NEED_EXC || NEED_ERRNO) ? __builtin_inf##l1 ()	\
       : fn (MAX1, __builtin_inf ());			\
   CHECK (__builtin_isinf##l1 (m) && !__builtin_signbit (m));	\
-  const type n = fn (DENORM_MIN1, 12.0##L2);		\
+  const type n						\
+    = (NEED_EXC || NEED_ERRNO) ? 2.0##L1 * DENORM_MIN1	\
+      : fn (DENORM_MIN1, 12.0##L2);			\
   CHECK (n == 2.0##L1 * DENORM_MIN1);			\
-  const type o = fn (n, 24.0##L2);			\
+  const type o						\
+    = (NEED_EXC || NEED_ERRNO) ? 3.0##L1 * DENORM_MIN1	\
+      : fn (n, 24.0##L2);				\
   CHECK (o == 3.0##L1 * DENORM_MIN1);			\
-  const type p = fn (o, 132.0##L2);			\
+  const type p						\
+    = (NEED_EXC || NEED_ERRNO) ? 4.0##L1 * DENORM_MIN1	\
+      : fn (o, 132.0##L2);				\
   CHECK (p == 4.0##L1 * DENORM_MIN1);			\
-  const type q = fn (2.0##L1 * DENORM_MIN1, -__builtin_inf ());	\
+  const type q
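As a user-visible sanity check of those semantics, something like this should pass (a sketch assuming glibc behaves like the fdlibm code quoted above; volatile just defeats any folding regardless of options; link with -lm):

#include <errno.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  volatile double x = DBL_MIN, zero = 0.0, r;
  errno = 0;
  r = nextafter (x, 0.0);	/* denormal result: errno == ERANGE */
  if (errno != ERANGE)
    __builtin_abort ();
  errno = 0;
  r = nextafter (zero, 1.0);	/* 0 -> minsubnormal: underflow flag only */
  if (errno != 0)
    __builtin_abort ();
  return 0;
}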
[PATCH] Fix __mmask* types on many AVX512 intrinsics
Hi! On Fri, Jul 06, 2018 at 12:47:07PM +0200, Jakub Jelinek wrote: > On Thu, Jul 05, 2018 at 11:57:26PM +0300, Grazvydas Ignotas wrote: > > I think it would be more efficient if you took care of it. I won't > > have time for at least a few days anyway. Here is the complete patch, I found two further issues where the __mmask mismatch was in between the return type and what was used in the rest of the intrinsic, so not caught by my earlier greps. I've added (except for the avx512bitalg which seems to have no runtime test coverage whatsoever) tests that cover the real bugs and further fixed the avx512*-vpcmp{,u}b-2.c test because (rel) << i triggered UB if i could go up to 63. I don't have AVX512* hw, so I've just bootstrapped/regtested the patch normally on i686-linux and x86_64-linux AVX2 hw and tried the affected tests without the config/i386/ changes and with them under SDE. The patch should fix these FAILs: FAIL: gcc.target/i386/avx512bw-vpcmpb-2.c execution test FAIL: gcc.target/i386/avx512bw-vpcmpub-2.c execution test FAIL: gcc.target/i386/avx512f-vinsertf32x4-3.c execution test FAIL: gcc.target/i386/avx512f-vinserti32x4-3.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgeuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpgew-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpleuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmplew-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltub-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltuw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpltw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpneqb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpnequb-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpnequw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpneqw-2.c execution test FAIL: gcc.target/i386/avx512vl-vpcmpub-2.c execution test Ok for trunk? I guess we want to backport it soon, but would appreciate somebody testing it on real AVX512-{BW,VL} hw before doing the backports. Another thing to consider is whether we shouldn't add those grep/sed checks I've been doing (at least the easy ones that don't cross-check the i386-builtins.def against the uses in the intrin headers) to config/i386/t-* some way. 2018-07-07 Jakub Jelinek * config/i386/avx512bitalgintrin.h (_mm512_mask_bitshuffle_epi64_mask): Use __mmask64 type instead of __mmask8 for __M argument. * config/i386/avx512fintrin.h (_mm512_mask_xor_epi64, _mm512_maskz_xor_epi64): Use __mmask8 type instead of __mmask16 for __U argument. (_mm512_mask_cmpneq_epi64_mask): Use __mmask8 type instead of __mmask16 for __M argument. (_mm512_maskz_insertf32x4, _mm512_maskz_inserti32x4, _mm512_mask_insertf32x4, _mm512_mask_inserti32x4): Cast last argument to __mmask16 instead of __mmask8. * config/i386/avx512vlintrin.h (_mm_mask_add_ps, _mm_maskz_add_ps, _mm256_mask_add_ps, _mm256_maskz_add_ps, _mm_mask_sub_ps, _mm_maskz_sub_ps, _mm256_mask_sub_ps, _mm256_maskz_sub_ps, _mm256_maskz_cvtepi32_ps, _mm_maskz_cvtepi32_ps): Use __mmask8 type instead of __mmask16 for __U argument. * config/i386/avx512vlbwintrin.h (_mm_mask_cmp_epi8_mask): Use __mmask16 instead of __mmask8 for __U argument. 
(_mm256_mask_cmp_epi8_mask): Use __mmask32 instead of __mmask16 for __U argument. (_mm256_cmp_epi8_mask): Use __mmask32 return type instead of __mmask16. (_mm_mask_cmp_epu8_mask): Use __mmask16 instead of __mmask8 for __U argument. (_mm256_mask_cmp_epu8_mask): Use __mmask32 instead of __mmask16 for __U argument. (_mm256_cmp_epu8_mask): Use __mmask32 return type instead of __mmask16. (_mm_mask_cmp_epi16_mask): Cast last argument to __mmask8 instead of __mmask16. (_mm256_mask_cvtepi8_epi16): Use __mmask16 instead of __mmask32 for __U argument. (_mm_mask_cvtepi8_epi16): Use __mmask8 instead of __mmask32 for __U argument. (_mm256_mask_cvtepu8_epi16): Use __mmask16 instead of __mmask32 for __U argument. (_mm_mask_cvtepu8_epi16): Use __mmask8 instead of __mmask32 for __U argument. (_mm256_mask_cmpneq_epu8_mask, _mm256_mask_cmplt_epu8_mask, _mm256_mask_cmpge_epu8_mask, _mm256_mask_cmple_epu8_mask): Change return type as well as __M argument type and all casts from __mmask8 to __mmask32. (_mm256_mask_cmpneq_epu16_mask, _mm256_mas
Re: [PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)
On Sat, Jul 07, 2018 at 11:55:17AM +0200, Marc Glisse wrote:
> On Sat, 7 Jul 2018, Jakub Jelinek wrote:
>
> > 2018-07-07  Jakub Jelinek
> >
> > 	PR c/86420
> > 	* real.c (real_nextafter): Return true if result is denormal.
>
> I have a question on the side: would it be hard / useful, in cases where
> nextafter may set errno or some exception flag, to fold the result to a
> constant while keeping the function call (ignoring the value it returns)?
> To clarify, I mean replace
>
> _2 = nextafter(DBL_DENORM_MIN, 0);
>
> with
>
> nextafter(DBL_DENORM_MIN, 0);
> _2 = 0;
>
> I think we already do that for some other calls, although I can't remember
> where. The point would be that we have the value of _2 and can keep folding
> its uses.

For errno purposes alone that would be possible, but the function is marked

#define ATTR_MATHFN_ERRNO (flag_errno_math ? \
	ATTR_NOTHROW_LEAF_LIST : ATTR_CONST_NOTHROW_LEAF_LIST)

and thus with -ftrapping-math -fno-math-errno I'm afraid we'd immediately DCE the call in the second form (without lhs).

	Jakub
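To illustrate the DCE point with a stand-in (my_nextafter is hypothetical; it merely carries the const attribute that ATTR_MATHFN_ERRNO gives the real builtin when flag_errno_math is off):

extern double my_nextafter (double, double) __attribute__ ((__const__));

void
f (void)
{
  /* No lhs: a const function has no observable side-effects as far as
     the middle-end is concerned, so DCE deletes the call and the
     exception flags Marc wants to preserve are lost.  */
  my_nextafter (0x1p-1074, 0.0);
}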
Re: [committed][gcc][patch] Require sse for testcase on i686.
On Sat, Jul 07, 2018 at 11:07:28AM +0000, Tamar Christina wrote:
> > On Fri, Jul 06, 2018 at 11:46:43AM +0100, Tamar Christina wrote:
> > > This fixes an ABI warning generated on i686-pc-linux-gnu when using
> > > `vector_size` with no sse enabled explicitly.
> > >
> > > Regtested single test on x86_64-pc-linux-gnu with -m32 and no issues.
> > >
> > > Committed under the GCC obvious rule.
> >
> > That is insufficient, I get the FAIL on i686-linux.
> > You don't really need dg-require-effective-target sse; it is a
> > compile-time-only (dg-do compile) test and on i?86/x86_64 failures with
> > old assemblers would show up only when assembling.
> > But -msse really should be used on all i?86-*-* and x86_64-*-*.
>
> Ah, I wasn't aware of that. I initially didn't add it to x86-64 since the
> test wasn't generating the ABI warning there. But this is good to know
> for the future.

The thing is that i?86-*-* and x86_64-*-* can both be multilib, and only the -m64 and -mx32 multilibs default to -msse2, -m32 does not. In gcc.target/ tests one typically uses the ia32 effective target for the -m32 i?86/x86_64 multilib, but in this case it doesn't hurt to add -msse to all.

	Jakub
Re: [PATCH][debug] Handle debug references to skipped params
On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5141,6 +5141,10 @@ expand_debug_source_expr (tree exp)
>  
>    switch (TREE_CODE (exp))
>      {
> +    case VAR_DECL:
> +      if (DECL_ABSTRACT_ORIGIN (exp))
> +	return expand_debug_source_expr (DECL_ABSTRACT_ORIGIN (exp));
> +      break;
>      case PARM_DECL:
>        {
> 	mode = DECL_MODE (exp);

This is ok.

> diff --git a/gcc/testsuite/gcc.dg/vla-1.c b/gcc/testsuite/gcc.dg/vla-1.c
> new file mode 100644
> index 000..0c19feffd2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vla-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -O3 -fdump-tree-optimized" } */
> +
> +
> +/* One debug source bind is generated for the parameter, and two to describe the
> +   sizes of a and b.  */
> +/* { dg-final { scan-tree-dump-times " s=> i" 3 "optimized" } } */

I think you at least need explicit -fvar-tracking-assignments -fno-selective-scheduling -fno-selective-scheduling2 and perhaps some guard to ignore the test on nvptx, which disables -fvar-tracking unconditionally?

> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 427ef959740..6fbd8c3ca61 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -208,7 +208,9 @@ remap_ssa_name (tree name, copy_body_data *id)
>        n = id->decl_map->get (val);
>        if (n != NULL)
> 	val = *n;
> -      if (TREE_CODE (val) != PARM_DECL)
> +      if (TREE_CODE (val) != PARM_DECL
> +	  && !(TREE_CODE (val) == VAR_DECL
> +	       && DECL_ABSTRACT_ORIGIN (val)))
> 	{
> 	  processing_debug_stmt = -1;
> 	  return name;

Please use VAR_P macro.

	Jakub
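Concretely, the vla-1.c header would then look something like this (a sketch; whether a plain dg-skip-if is the right way to guard nvptx is a guess):

/* { dg-do compile } */
/* { dg-options "-g -O3 -fdump-tree-optimized -fvar-tracking-assignments -fno-selective-scheduling -fno-selective-scheduling2" } */
/* { dg-skip-if "-fvar-tracking disabled" { nvptx-*-* } } */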
Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics
On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote:
> > I guess we want to backport it soon, but would appreciate somebody testing
> > it on real AVX512-{BW,VL} hw before doing the backports.
>
> I've run the testsuite with this patch applied and all tests passed on
> i7-7800X.

Thanks for the testing.

> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c
> failures, but those seem unrelated.

These are dg-do compile tests, and they PASS for me, even when doing
make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'"
So, how exactly have you configured your gcc, what kind of options are passed to the test, and how do they FAIL?

	Jakub
Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics
On Mon, Jul 09, 2018 at 12:00:46PM +0300, Grazvydas Ignotas wrote: > On Mon, Jul 9, 2018 at 10:37 AM, Jakub Jelinek wrote: > > On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote: > >> > I guess we want to backport it soon, but would appreciate somebody > >> > testing > >> > it on real AVX512-{BW,VL} hw before doing the backports. > >> > >> I've run the testsuite with this patch applied and all tests passed on > >> i7-7800X. > > > > Thanks for the testing. > > > >> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c > >> failures, but those seem unrelated. > > > > These are dg-do compile tests, and they PASS for me, even when doing > > make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 > > i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'" > > So, how exactly you've configured your gcc, what kind of options are > > passed to the test and how they FAIL? > > I should've mentioned I've tested this patch on top of 8.1 release > tarball and used crosstool-NG to build the toolchain with it's "GCC > test suite" option enabled. It looks like crosstool is applying some > patches, so the results might not be valid. Here is the log (seems to > contain the configuration info), where I just grepped for FAIL and the > new test names to see if they were actually run: > > http://notaz.gp2x.de/misc/unsorted/gcc.log.xz Don't see any FAILs even in your log file on the above tests: spawn -ignore SIGHUP gcc /home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 -ffat-lto-objects -S -o avx512vl-vmovdqa64-1.s PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c (test for excess errors) PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+\\([^\n]*%ymm[0-9]+(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+\\([^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\nxy]*\\(.{5,6}(?:\n|[ \\t]+#) 1 gcc.target/i386/avx512vl-vmovdqa64-1.c: vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c 
scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1 spawn -ignore SIGHUP gcc /home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vpermilpdi-1.c -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 -ffat-lto-objects -S -o avx512vl-vpermilpdi-1.s PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c (test for excess errors) gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) found 0 times XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1 PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ \\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1 XFAIL is expected fail, not unexpected... Jakub
[committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)
Hi! While working on OpenMP 5.0 range-for support, I've discovered that even for normal class iterators distribute parallel for gimplification ICEs in several ways (other composite loop constructs work only because class iterators are not allowed on them). The problem is that the FEs emit the code that needs to be done before computing number of the iterations around the innermost construct, which we then wrap into OMP_PARALLEL and OMP_DISTRIBUTE and then we want to compute number of iterations on the OMP_DISTRIBUTE. The following patch fixes it by detecting these cases and moving the outer composite constructs right around the innermost one. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-07-10 Jakub Jelinek PR c++/86443 * gimplify.c (find_combined_omp_for): Add DATA argument, in addition to finding the inner OMP_FOR/OMP_SIMD stmt find non-trivial wrappers, BLOCKs with BLOCK_VARs, OMP_PARALLEL in between, OMP_FOR in between. (gimplify_omp_for): For composite loops, move outer OMP_{DISTRIBUTE,TASKLOOP,FOR,PARALLEL} right around innermost OMP_FOR/OMP_SIMD if there are any non-trivial wrappers. For class iterators add any needed clauses. Allow OMP_FOR_ORIG_DECLS to contain TREE_LIST for both the original class iterator and the "last" helper var. Gimplify OMP_FOR_PRE_BODY before the outermost composite loop, remember has_decl_expr from outer composite loops for the innermost OMP_SIMD in TREE_PRIVATE bit on OMP_FOR_INIT. gcc/c-family/ * c-omp.c (c_omp_check_loop_iv_r, c_omp_check_loop_iv): Allow declv to contain TREE_LIST for both the original class iterator and the "last" helper var. gcc/cp/ * semantics.c (handle_omp_for_class_iterator): Remove lastp argument, instead of setting *lastp turn orig_declv elt into a TREE_LIST. (finish_omp_for): Adjust handle_omp_for_class_iterator caller. * pt.c (tsubst_omp_for_iterator): Allow OMP_FOR_ORIG_DECLS to contain TREE_LIST for both the original class iterator and the "last" helper var. libgomp/ * testsuite/libgomp.c++/for-15.C: New test. --- gcc/gimplify.c.jj 2018-07-07 09:45:42.133890332 +0200 +++ gcc/gimplify.c 2018-07-09 15:47:14.587400243 +0200 @@ -9532,24 +9532,53 @@ gimplify_omp_task (tree *expr_p, gimple_ } /* Helper function of gimplify_omp_for, find OMP_FOR resp. OMP_SIMD - with non-NULL OMP_FOR_INIT. */ + with non-NULL OMP_FOR_INIT. Also, fill in pdata array, + pdata[0] non-NULL if there is anything non-trivial in between, pdata[1] + is address of OMP_PARALLEL in between if any, pdata[2] is address of + OMP_FOR in between if any and pdata[3] is address of the inner + OMP_FOR/OMP_SIMD. 
*/ static tree -find_combined_omp_for (tree *tp, int *walk_subtrees, void *) +find_combined_omp_for (tree *tp, int *walk_subtrees, void *data) { + tree **pdata = (tree **) data; *walk_subtrees = 0; switch (TREE_CODE (*tp)) { case OMP_FOR: + if (OMP_FOR_INIT (*tp) != NULL_TREE) + { + pdata[3] = tp; + return *tp; + } + pdata[2] = tp; *walk_subtrees = 1; - /* FALLTHRU */ + break; case OMP_SIMD: if (OMP_FOR_INIT (*tp) != NULL_TREE) - return *tp; + { + pdata[3] = tp; + return *tp; + } break; case BIND_EXPR: + if (BIND_EXPR_VARS (*tp) + || (BIND_EXPR_BLOCK (*tp) + && BLOCK_VARS (BIND_EXPR_BLOCK (*tp + pdata[0] = tp; + *walk_subtrees = 1; + break; case STATEMENT_LIST: + if (!tsi_one_before_end_p (tsi_start (*tp))) + pdata[0] = tp; + *walk_subtrees = 1; + break; +case TRY_FINALLY_EXPR: + pdata[0] = tp; + *walk_subtrees = 1; + break; case OMP_PARALLEL: + pdata[1] = tp; *walk_subtrees = 1; break; default: @@ -9574,6 +9603,115 @@ gimplify_omp_for (tree *expr_p, gimple_s orig_for_stmt = for_stmt = *expr_p; + if (OMP_FOR_INIT (for_stmt) == NULL_TREE) +{ + tree *data[4] = { NULL, NULL, NULL, NULL }; + gcc_assert (TREE_CODE (for_stmt) != OACC_LOOP); + inner_for_stmt = walk_tree (&OMP_FOR_BODY (for_stmt), + find_combined_omp_for, data, NULL); + if (inner_for_stmt == NULL_TREE) + { + gcc_assert (seen_error ()); + *expr_p = NULL_TREE; + return GS_ERROR; + } + if (data[2] && OMP_FOR_PRE_BODY (*data[2])) + { + append_to_statement_list_force (OMP_FOR_PRE_BODY (*data[2]), + &OMP_FOR_PRE_BODY (for_stmt)); + OMP_FOR_PRE_BODY (*data[2]) = NULL_TREE; + } + if (OMP_FOR_PRE_BODY (inner_for_stmt)) + { + append_to_statement_list_force (OMP_FOR_PRE_BODY (inner_for_stmt),
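For the record, the kind of source that used to ICE, reduced to a sketch (this is not the committed libgomp.c++/for-15.C, and the exact set of operators the FE requires of a class iterator may be larger than shown; OpenMP wants random access, hence the difference and advance operators):

struct It
{
  int i;
  int operator* () const { return i; }
  It &operator++ () { ++i; return *this; }
  It operator+ (long n) const { return It { (int) (i + n) }; }
  long operator- (const It &o) const { return i - o.i; }
  bool operator!= (const It &o) const { return i != o.i; }
};

void
foo ()
{
  It b = { 0 }, e = { 64 };
  /* Class iterator on a composite construct: the iterator setup code the
     FE emits before the number-of-iterations computation has to stay with
     the innermost loop when the construct is split into
     distribute + parallel + for.  */
#pragma omp target teams
#pragma omp distribute parallel for
  for (It it = b; it != e; ++it)
    ;
}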
[committed] Save/restore OpenMP linear clause modifiers in modules (PR fortran/86421)
Hi! This patch (in a backwards compatible way) handles saving and restoring of linear clause modifiers. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-07-10 Jakub Jelinek PR fortran/86421 * module.c (omp_declare_simd_clauses): Add LINEAR with _REF, _VAL and _UVAL suffixes. (mio_omp_declare_simd): Save and restore ref, val and uval modifiers on linear clauses. Initialize n->where to gfc_current_locus. * gfortran.dg/vect/pr86421.f90: New test. --- gcc/fortran/module.c.jj 2018-02-13 09:28:10.0 +0100 +++ gcc/fortran/module.c2018-07-09 18:56:49.595348962 +0200 @@ -4098,6 +4098,9 @@ static const mstring omp_declare_simd_cl minit ("UNIFORM", 3), minit ("LINEAR", 4), minit ("ALIGNED", 5), +minit ("LINEAR_REF", 33), +minit ("LINEAR_VAL", 34), +minit ("LINEAR_UVAL", 35), minit (NULL, -1) }; @@ -4140,7 +4143,10 @@ mio_omp_declare_simd (gfc_namespace *ns, } for (n = ods->clauses->lists[OMP_LIST_LINEAR]; n; n = n->next) { - mio_name (4, omp_declare_simd_clauses); + if (n->u.linear_op == OMP_LINEAR_DEFAULT) + mio_name (4, omp_declare_simd_clauses); + else + mio_name (32 + n->u.linear_op, omp_declare_simd_clauses); mio_symbol_ref (&n->sym); mio_expr (&n->expr); } @@ -4181,11 +4187,20 @@ mio_omp_declare_simd (gfc_namespace *ns, case 4: case 5: *ptrs[t - 3] = n = gfc_get_omp_namelist (); + finish_namelist: + n->where = gfc_current_locus; ptrs[t - 3] = &n->next; mio_symbol_ref (&n->sym); if (t != 3) mio_expr (&n->expr); break; + case 33: + case 34: + case 35: + *ptrs[1] = n = gfc_get_omp_namelist (); + n->u.linear_op = (enum gfc_omp_linear_op) (t - 32); + t = 4; + goto finish_namelist; } } } --- gcc/testsuite/gfortran.dg/vect/pr86421.f90.jj 2018-07-09 19:09:56.662398875 +0200 +++ gcc/testsuite/gfortran.dg/vect/pr86421.f90 2018-07-09 19:07:57.432240946 +0200 @@ -0,0 +1,35 @@ +! PR fortran/86421 +! { dg-require-effective-target vect_simd_clones } +! { dg-additional-options "-fopenmp-simd" } +! { dg-additional-options "-mavx" { target avx_runtime } } + +module mod86421 + implicit none +contains + subroutine foo(x, y, z) +real :: x +integer :: y, z +!$omp declare simd linear(ref(x)) linear(val(y)) linear(uval(z)) +x = x + y +z = z + 1 + end subroutine +end module mod86421 + +program pr86421 + use mod86421 + implicit none + integer :: i, j + real :: a(64) + j = 0 + do i = 1, 64 +a(i) = i + end do + !$omp simd + do i = 1, 64 +call foo (a(i), i, j) + end do + do i = 1, 64 +if (a(i) .ne. (2 * i)) stop 1 + end do + if (j .ne. 64) stop 2 +end program pr86421 Jakub
[PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)
Hi! cp_maybe_instrument_return is looking for a return stmt at the end of function to decide whether to omit -fsanitize=return instrumentation or __builtin_unreachable addition. If a STATEMENT_LIST has a return followed by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return though. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-07-10 Jakub Jelinek PR sanitizer/86406 * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing DEBUG_BEGIN_STMTs. * g++.dg/ubsan/pr86406.C: New test. --- gcc/cp/cp-gimplify.c.jj 2018-07-05 11:41:51.687718588 +0200 +++ gcc/cp/cp-gimplify.c2018-07-09 09:57:16.368775004 +0200 @@ -1621,6 +1621,13 @@ cp_maybe_instrument_return (tree fndecl) case STATEMENT_LIST: { tree_stmt_iterator i = tsi_last (t); + while (!tsi_end_p (i)) + { + tree p = tsi_stmt (i); + if (TREE_CODE (p) != DEBUG_BEGIN_STMT) + break; + tsi_prev (&i); + } if (!tsi_end_p (i)) { t = tsi_stmt (i); --- gcc/testsuite/g++.dg/ubsan/pr86406.C.jj 2018-07-09 09:58:57.362878125 +0200 +++ gcc/testsuite/g++.dg/ubsan/pr86406.C2018-07-09 09:58:37.716858063 +0200 @@ -0,0 +1,33 @@ +// PR sanitizer/86406 +// { dg-do compile } +// { dg-options "-fcompare-debug -fsanitize=undefined -g -O1" } + +typedef enum { } cmd_status; +class ECell; +class ECell_const_ptr { }; +class ECell_ptr +{ + ECell *mp_element; + ECell *getPointer () const { return mp_element; } +public: + operator ECell_const_ptr () const { return ECell_const_ptr(); } +}; + +extern ECell_ptr NULL_CELL; +class VwUI_2DCellLayerView; +class view_cell_layoutImpl +{ + cmd_status handleChangeFlags (VwUI_2DCellLayerView * + p_ui_celllayerview, + ECell_const_ptr p_peekCell); + cmd_status openCellLayoutView (); +}; + +cmd_status +view_cell_layoutImpl::openCellLayoutView () +{ + ECell_const_ptr pcell = NULL_CELL; + VwUI_2DCellLayerView *p_user_interface; + return handleChangeFlags (p_user_interface, pcell); + ; +} Jakub
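For reference, the shape being skipped is simply a function body whose STATEMENT_LIST no longer ends in the return with -g (schematic, not actual dump output):

  ...
  return <retval>;
  # DEBUG BEGIN_STMT

With -g0 the return is the last statement, so the decision whether to instrument or append __builtin_unreachable differed between -g and -g0, which is exactly what -fcompare-debug flags.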
Re: [PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)
On Tue, Jul 10, 2018 at 10:01:10AM +0200, Richard Biener wrote: > On Tue, 10 Jul 2018, Jakub Jelinek wrote: > > cp_maybe_instrument_return is looking for a return stmt at the end of > > function to decide whether to omit -fsanitize=return instrumentation or > > __builtin_unreachable addition. If a STATEMENT_LIST has a return followed > > by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return > > though. > > > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > OK. This also affects the branch? 8.x only, will commit it there too. > > 2018-07-10 Jakub Jelinek > > > > PR sanitizer/86406 > > * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing > > DEBUG_BEGIN_STMTs. > > > > * g++.dg/ubsan/pr86406.C: New test. Jakub
Re: [RFC] Fix recent popcount change is breaking
On Tue, Jul 10, 2018 at 03:17:48PM +0200, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah wrote:
> > Jeff told me that the recent popcount built-in detection is causing
> > kernel build issues as
> > ERROR: "__popcountsi2" [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
> >
> > I could also reproduce this. AFAIK, we should check if the libfunc is
> > defined while checking popcount?
> >
> > I am testing the attached RFC patch. Is this reasonable?
>
> It doesn't work that way, all targets have this libfunc in libgcc. This means
> the kernel has to provide it. The only thing you could do is restrict
> replacement of CALL_EXPRs (in SCEV cprop) to those the target
> natively supports.

Yeah, that is what we've done in the past in other cases, e.g. PR82981 is somewhat similar. So perhaps just check the optab and use it only if it is supported?

	Jakub
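I.e. in the popcount detection code something along these lines (a sketch, untested; 'src' stands for whatever tree the popcount is taken of):

  /* Only emit a popcount CALL_EXPR if the target can expand it inline;
     otherwise we'd introduce a __popcountsi2/__popcountdi2 libcall the
     kernel doesn't provide.  */
  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (src));
  if (optab_handler (popcount_optab, mode) == CODE_FOR_nothing)
    return false;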
Re: [PATCH] PR debug/86459 - Fix -gsplit-dwarf -g3 gcc_assert
On Wed, Jul 11, 2018 at 12:20:08AM +0200, Mark Wielaard wrote: > There was a typo in the output_macinfo_op gcc_assert. > The function is called dwarf_FORM, not dwarf_form. > Add the provided testcase from the bug to test -gsplit-dwarf -g3. > > gcc/ChangeLog: > > PR debug/86459 > * dwarf2out.c (output_macinfo_op): Fix dwarf_FORM typo in gcc_assert. > > gcc/testsuite/ChangeLog: > > PR debug/86459 > * gcc.dg/pr86459.c: New test. Ok, thanks. Jakub
Re: [committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)
On Tue, Jul 10, 2018 at 09:18:18AM +0200, Jakub Jelinek wrote: > Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. I found two small issues and one big issue (results being declare target) which break the test if using non-shared memory offloading. This should fix it, tested on x86_64-linux, committed to trunk. 2018-07-11 Jakub Jelinek PR c++/86443 * testsuite/libgomp.c++/for-15.C (a): Remove unused variable. (results): Make sure the variable is not inside declare target region. (qux): Remove unused function. --- libgomp/testsuite/libgomp.c++/for-15.C (revision 262551) +++ libgomp/testsuite/libgomp.c++/for-15.C (working copy) @@ -88,10 +88,11 @@ private: template const I &J::begin () { return b; } template const I &J::end () { return e; } +#pragma omp end declare target -int a[2000]; int results[2000]; +#pragma omp declare target template void baz (I &i) @@ -110,13 +111,6 @@ baz (int i) } void -qux (I &i) -{ - if (*i != 1931) -abort (); -} - -void f1 (J j) { #pragma omp distribute parallel for default(none) Jakub
[PATCH] Fix store-merging wrong-code issue (PR tree-optimization/86492)
Hi! The following testcase is a similar issue to PR84503 and the fix is similar, because coalesce_immediate_stores temporarily sorts the stores on ascending bitpos and if stores are merged, the merged store is emitted in the location of the latest of the stores, we need to verify that there is no overlap with other stores that are originally before that latest store from those we are considering and overlaps the set of stores we are considering to merge. In that case we need to punt and not merge (unless we do smarts like prove overlap between just some of the stores and force reordering). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.2? 2018-07-11 Jakub Jelinek PR tree-optimization/86492 * gimple-ssa-store-merging.c (imm_store_chain_info::coalesce_immediate_stores): Call check_no_overlap even for the merge_overlapping case. Formatting fix. * gcc.c-torture/execute/pr86492.c: New test. --- gcc/gimple-ssa-store-merging.c.jj 2018-06-13 10:05:53.0 +0200 +++ gcc/gimple-ssa-store-merging.c 2018-07-11 19:24:12.084120206 +0200 @@ -2702,7 +2702,12 @@ imm_store_chain_info::coalesce_immediate { /* Only allow overlapping stores of constants. */ if (info->rhs_code == INTEGER_CST - && merged_store->stores[0]->rhs_code == INTEGER_CST) + && merged_store->stores[0]->rhs_code == INTEGER_CST + && check_no_overlap (m_store_info, i, INTEGER_CST, + MAX (merged_store->last_order, info->order), + MAX (merged_store->start + + merged_store->width, + info->bitpos + info->bitsize))) { merged_store->merge_overlapping (info); goto done; @@ -2732,10 +2737,8 @@ imm_store_chain_info::coalesce_immediate info->ops_swapped_p = true; } if (check_no_overlap (m_store_info, i, info->rhs_code, - MAX (merged_store->last_order, -info->order), - MAX (merged_store->start -+ merged_store->width, + MAX (merged_store->last_order, info->order), + MAX (merged_store->start + merged_store->width, info->bitpos + info->bitsize))) { /* Turn MEM_REF into BIT_INSERT_EXPR for bit-field stores. */ --- gcc/testsuite/gcc.c-torture/execute/pr86492.c.jj2018-07-11 19:40:27.760122514 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr86492.c 2018-07-11 19:40:13.460107841 +0200 @@ -0,0 +1,34 @@ +/* PR tree-optimization/86492 */ + +union U +{ + unsigned int r; + struct S + { +unsigned int a:12; +unsigned int b:4; +unsigned int c:16; + } f; +}; + +__attribute__((noipa)) unsigned int +foo (unsigned int x) +{ + union U u; + u.r = 0; + u.f.c = x; + u.f.b = 0xe; + return u.r; +} + +int +main () +{ + union U u; + if (__CHAR_BIT__ * __SIZEOF_INT__ != 32 || sizeof (u.r) != sizeof (u.f)) +return 0; + u.r = foo (0x72); + if (u.f.a != 0 || u.f.b != 0xe || u.f.c != 0x72) +__builtin_abort (); + return 0; +} Jakub
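To spell the hazard out on the pr86492.c testcase above (a schematic; bit positions assume the usual little-endian bitfield layout):

  /* S1: u.r   = 0;     bits [0, 32),  constant,     order 1
     S2: u.f.c = x;     bits [16, 32), non-constant, order 2
     S3: u.f.b = 0xe;   bits [12, 16), constant,     order 3
     Sorted by bitpos, S1 and S3 are overlapping constant stores, so
     merge_overlapping would happily combine them into a single [0, 32)
     constant store emitted at S3's (the latest) order, i.e. after S2,
     clobbering the bits S2 stored.  Hence the new check_no_overlap call
     on this path too.  */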
Re: [PATCH] Fix PR86462
On Thu, Jul 12, 2018 at 12:29:20PM +0200, Richard Biener wrote:
> After my PR86413 fix to always annotate existing lexical block DIEs with
> range attributes debuginfo grows significantly in case we previously
> had "stale" lexical block DIEs without any variables.
>
> The following fixes this by eliding those completely and not emitting
> a lexical block DIE for blocks that just contain DECL_IGNORED_P
> entities. This solves the reported size regression and the
> empty lexical block DIEs vanish.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Do you have a proof that BLOCK_NON_LOCALIZED_VAR is ever DECL_IGNORED_P? I see it is filled with:

  if ((!optimize || debug_info_level > DINFO_LEVEL_TERSE)
      && !DECL_IGNORED_P (old_var)
      && nonlocalized_list)
    vec_safe_push (*nonlocalized_list, old_var);

(twice) in tree-inline.c. Anything else that populates it?

	Jakub
Re: [PATCH] Fix PR86462
On Thu, Jul 12, 2018 at 01:01:14PM +0200, Richard Biener wrote: > No, I don't think so. OK, I'll remove the loop over > BLOCK_NON_LOCALIZED_VAR and keep the original check for it. > > I also simplified stuff by hoisting the TREE_USED and friends checks. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok? LGTM. > 2018-07-12 Richard Biener > > PR debug/86462 > * dwarf2out.c (gen_block_die): Only output blocks when they have > at least one !DECL_IGNORED_P variable. Jakub
[PATCH] Add testcase for already fixed bug
Hi! I've created a self-contained testcase for a powerpc64{,le} wrong-code introduced (at least for powerpc64le) with r256656 and just later found it was already fixed with r260329. The SLP vectorized: + w.l = x.l; + w.u = x.u; was actually swapping it, so acting as: w.u = x.l; w.l = x.u; instead. As the testcase is sufficiently different from PR85698 testcase, I've committed this one too to trunk. 2018-07-12 Jakub Jelinek * gcc.dg/torture/20180712-1.c: New test. --- gcc/testsuite/gcc.dg/torture/20180712-1.c.jj2018-07-12 13:14:46.513068757 +0200 +++ gcc/testsuite/gcc.dg/torture/20180712-1.c 2018-07-12 13:14:07.851035363 +0200 @@ -0,0 +1,76 @@ +/* { dg-do run } */ +/* { dg-additional-options "-fstack-protector" { target fstack_protector } } */ +/* { dg-additional-options "-fPIC" { target fpic } } */ + +struct S { int *l, *u; }; +int a[3]; + +__attribute__((noipa)) struct S +foo (void) +{ + int *p = a, *q = a + 1; + struct S s; + asm volatile ("" : "+g" (p), "+g" (q) : : "memory"); + s.l = p; + s.u = q; + a[0]++; + return s; +} + +__attribute__((noipa)) void +bar (struct S *x) +{ + asm volatile ("" : : "g" (x) : "memory"); + if (x->l != a || x->u != a + 1) +__builtin_abort (); + a[1]++; +} + +__attribute__((noipa)) int +baz (int *x, int *y) +{ + int r = -1; + asm volatile ("" : "+g" (r) : "g" (x), "g" (y) : "memory"); + a[2]++; + return r; +} + +__attribute__((noipa)) void +quux (void) +{ + asm volatile ("" : : : "memory"); +} + +__attribute__((noipa)) void +qux (void) +{ + struct S v = foo (); + struct S w; + struct S x = foo (); + int y = 0; + + w.l = x.l; + w.u = x.u; + if (baz (x.l, v.l) > 0) +{ + w.l = v.l; + y = 1; + quux (); +} + if (baz (x.u, v.u) < 0) +{ + w.u = v.u; + y = 1; +} + if (y) +bar (&w); +} + +int +main () +{ + qux (); + if (a[0] != 2 || a[1] != 1 || a[2] != 2) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix PR84829
On Wed, Jul 04, 2018 at 09:41:17AM +0200, Richard Biener wrote:
> The following removes adding -lieee to the link command when -mieee-fp
> is specified for GNU targets as suggested by Joseph in the PR. It doesn't
> touch m32r so the fix may be incomplete depending on the C library used.
>
> I've added a testcase that makes sure we can link but not functional
> tests of -mieee-fp which I suppose exist already (eh...).
>
> Bootstrap and regtest is running on x86_64-unknown-linux-gnu with
> glibc 2.27 where the testcase fails before the patch.
>
> OK for trunk and active branches?
>
> Thanks,
> Richard.
>
> From a71f976df0f17790202c0a692920c9d67e009259 Mon Sep 17 00:00:00 2001
> From: Richard Guenther
> Date: Wed, 4 Jul 2018 09:34:14 +0200
> Subject: [PATCH] fix-pr84829
>
> 2018-07-04  Richard Biener
>
> 	PR target/84829
> 	* config/gnu-user.h (GNU_USER_TARGET_NO_PTHREADS_LIB_SPEC):
> 	Remove -mieee-fp handling.
>
> 	* gcc.target/i386/pr84829.c: New testcase.

Ok.

	Jakub
[committed] #pragma omp declare target fixes
Hi! As the following testcases show, diagnosing non-mappable type already in c*_decl_attributes is too early, the type might not be finalized and completed at that point for some variables yet. This patch defers it until *finish_decl. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 8.2. 2018-07-12 Jakub Jelinek * c-attribs.c (c_common_attribute_table): Add "omp declare target implicit" attribute. * c-decl.c (c_decl_attributes): Don't diagnose vars without mappable type here, instead add "omp declare target implicit" attribute. (finish_decl): Diagnose vars without mappable type here. * decl2.c (cplus_decl_attributes): Don't diagnose vars without mappable type here, instead add "omp declare target implicit" attribute. Add that attribute instead of "omp declare target" also when processing_template_decl. * decl.c (cp_finish_decl): Diagnose vars without mappable type here, and before calling cp_omp_mappable_type call complete_type. * c-c++-common/gomp/declare-target-3.c: New test. * g++.dg/gomp/declare-target-2.C: New test. --- gcc/c-family/c-attribs.c.jj 2018-07-12 09:33:07.648102554 +0200 +++ gcc/c-family/c-attribs.c2018-07-12 16:18:46.483238747 +0200 @@ -439,6 +439,8 @@ const struct attribute_spec c_common_att handle_omp_declare_target_attribute, NULL }, { "omp declare target link", 0, 0, true, false, false, false, handle_omp_declare_target_attribute, NULL }, + { "omp declare target implicit", 0, 0, true, false, false, false, + handle_omp_declare_target_attribute, NULL }, { "alloc_align", 1, 1, false, true, true, false, handle_alloc_align_attribute, attr_alloc_exclusions }, --- gcc/c/c-decl.c.jj 2018-06-13 10:05:04.996092971 +0200 +++ gcc/c/c-decl.c 2018-07-12 17:48:35.370278519 +0200 @@ -4643,8 +4643,8 @@ c_decl_attributes (tree *node, tree attr { if (VAR_P (*node) && !lang_hooks.types.omp_mappable_type (TREE_TYPE (*node))) - error ("%q+D in declare target directive does not have mappable type", - *node); + attributes = tree_cons (get_identifier ("omp declare target implicit"), + NULL_TREE, attributes); else attributes = tree_cons (get_identifier ("omp declare target"), NULL_TREE, attributes); @@ -5223,7 +5223,27 @@ finish_decl (tree decl, location_t init_ diagnose_uninitialized_cst_member (decl, type); } - invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl); + if (flag_openmp + && VAR_P (decl) + && lookup_attribute ("omp declare target implicit", + DECL_ATTRIBUTES (decl))) +{ + DECL_ATTRIBUTES (decl) + = remove_attribute ("omp declare target implicit", + DECL_ATTRIBUTES (decl)); + if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (decl))) + error ("%q+D in declare target directive does not have mappable type", + decl); + else if (!lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("omp declare target link", +DECL_ATTRIBUTES (decl))) + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("omp declare target"), + NULL_TREE, DECL_ATTRIBUTES (decl)); +} + + invoke_plugin_callbacks (PLUGIN_FINISH_DECL, decl); } /* Given a parsed parameter declaration, decode it into a PARM_DECL. 
--- gcc/cp/decl2.c.jj 2018-07-02 18:01:29.300224096 +0200 +++ gcc/cp/decl2.c 2018-07-12 17:48:35.367278517 +0200 @@ -1490,11 +1490,11 @@ cplus_decl_attributes (tree *decl, tree && DECL_CLASS_SCOPE_P (*decl)) error ("%q+D static data member inside of declare target directive", *decl); - else if (!processing_template_decl - && VAR_P (*decl) - && !cp_omp_mappable_type (TREE_TYPE (*decl))) - error ("%q+D in declare target directive does not have mappable type", - *decl); + else if (VAR_P (*decl) + && (processing_template_decl + || !cp_omp_mappable_type (TREE_TYPE (*decl + attributes = tree_cons (get_identifier ("omp declare target implicit"), + NULL_TREE, attributes); else attributes = tree_cons (get_identifier ("omp declare target"),
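A guess at the kind of shape the new declare-target-3.c test covers (the array type is still incomplete when c_decl_attributes runs and is only completed by its initializer, so mappability can only be judged in finish_decl):

#pragma omp declare target
int arr[] = { 1, 2, 3 };	/* type incomplete at the attribute point */
#pragma omp end declare target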
Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)
On Fri, Jul 13, 2018 at 01:49:38PM +0200, Richard Biener wrote: > The testcase in the PR, while previously ICEing because the C++ FE doesn't > properly capture VLA size fields, now ICEs because gimplification > introduces SSA uses that appear in a different function than its > definition. This happens because there is tree sharing between > the functions. For nested functions (which the C++ lambdas are not) > such tree sharing ended up being harmless before GCC7 because unnesting > resolves all locals with wrong origin to the static chain (and > gimplification ordering made sure definitions always appear in the > outer function). > > The following resolves this by unsharing size expressions in c-common.c > > I realize that this may end up pessimizing early code since 1:1 > tree-sharing with what is gimplified from a DECL_EXPR doesn't > share the gimplification result. I think the unsharing is just wrong, we never want to unshare those, the SAVE_EXPR expression needs to be evaluated at the DECL_EXPR point and never anywhere else, never twice (it could have side-effects etc.). The C++ FE must be fixed to handle the lambda cases. > Another option might be to force gimplification to not generate > SSA temporaries when gimplifying size positions but gimplify_one_sizepos > oddly enough unshares trees before gimplifying ...(!?) This would > need to be removed (see patch after the tested patch below). I like that gimplify_one_sizepos change much more (I guess we need a non-lambda testcase). Lambdas were broken even before GCC7, while we might not ICE, we certainly didn't generate correct code. Jakub
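A minimal C shape of the problem class, as a sketch (with nested functions the cross-function size references used to be rescued by unnesting; the C++ lambda path has no such pass):

void
f (int n)
{
  int a[n];			/* the VLA size ends up in a SAVE_EXPR
				   evaluated at the DECL_EXPR */
  int g (void) { return sizeof (a); }	/* shares the size trees of 'a' */
  g ();
}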
C++ patch ping
Hi! I'd like to ping the following C++ patches: - PR c++/85515 make range for temporaries unspellable during parsing and only turn them into spellable for debug info purposes http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html - PR c++/3698, PR c++/86208 extern_decl_map & TREE_USED fix (plus 2 untested variants) http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00084.html Jakub
Re: C++ patch ping
On Fri, Jul 13, 2018 at 12:24:02PM -0400, Nathan Sidwell wrote: > On 07/13/2018 09:49 AM, Jakub Jelinek wrote: > > Hi! > > > > I'd like to ping the following C++ patches: > > > > - PR c++/85515 > >make range for temporaries unspellable during parsing and only > >turn them into spellable for debug info purposes > >http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html > > > How hard would it be to add the 6 special identifiers to the C++ global > table via initialize_predefined_identifiers (decl.c) and then use them > directly in the for range machinery? repeated get_identifier > ("string-const") just smells bad. Probably not too hard, but we have hundreds of other get_identifier ("string-const") calls in the middle-end, C++ FE, other FEs. Are those 6 more important than say "abi_tag", "gnu", "begin", "end", "get", "tuple_size", "tuple_element", and many others? Is it worth caching those? Jakub
Re: [PATCH][debug] Fix pre_dec handling in vartrack
On Sun, Jul 15, 2018 at 11:21:56PM +0200, Tom de Vries wrote:
> 2018-07-15  Tom de Vries
>
> 	* var-tracking.c (vt_initialize): Fix pre_dec handling. Print adjusted
> 	insn slim if dump_flags request TDF_SLIM.
>
> 	* gcc.target/i386/vartrack-1.c: New test.
>
> ---
> --- a/gcc/var-tracking.c
> +++ b/gcc/var-tracking.c
> @@ -115,6 +115,7 @@
>  #include "tree-pretty-print.h"
>  #include "rtl-iter.h"
>  #include "fibonacci_heap.h"
> +#include "print-rtl.h"
>  
>  typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
>  typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;
> @@ -10208,12 +10209,17 @@ vt_initialize (void)
> 		      log_op_type (PATTERN (insn), bb, insn,
> 				   MO_ADJUST, dump_file);
> 		    VTI (bb)->mos.safe_push (mo);
> -		    VTI (bb)->out.stack_adjust += pre;
> 		  }
> 	      }
>  
> 	    cselib_hook_called = false;
> 	    adjust_insn (bb, insn);
> +
> +	    if (!frame_pointer_needed)
> +	      {
> +		if (pre)
> +		  VTI (bb)->out.stack_adjust += pre;
> +	      }

That is certainly unexpected. For the pre side-effects, they should be applied before adjusting the insn, not after that. I'll want to see this under the debugger.

> 	    if (DEBUG_MARKER_INSN_P (insn))
> 	      {
> 		reemit_marker_as_note (insn);
> @@ -10227,7 +10233,10 @@ vt_initialize (void)
> 	    cselib_process_insn (insn);
> 	    if (dump_file && (dump_flags & TDF_DETAILS))
> 	      {
> -		print_rtl_single (dump_file, insn);
> +		if (dump_flags & TDF_SLIM)
> +		  dump_insn_slim (dump_file, insn);
> +		else
> +		  print_rtl_single (dump_file, insn);
> 		dump_cselib_table (dump_file);
> 	      }
> 	  }

This part is certainly ok.

	Jakub