[PATCH] Fix PR69951
The following fixes PR69951, hopefully the last case of decl alias issues with alias analysis. This time it's points-to and the DECL_UIDs used in points-to sets not being canonicalized. The simplest (and cheapest) fix is to make aliases refer to the ultimate alias target via their DECL_PT_UID which we conveniently have available. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2016-02-26 Richard Biener PR tree-optimization/69551 * tree-ssa-structalias.c (get_constraint_for_ssa_var): When looking through aliases adjust DECL_PT_UID to refer to the ultimate alias target. * gcc.dg/torture/pr69951.c: New testcase. Index: gcc/tree-ssa-structalias.c === *** gcc/tree-ssa-structalias.c (revision 233693) --- gcc/tree-ssa-structalias.c (working copy) *** get_constraint_for_ssa_var (tree t, vec< *** 2943,2948 --- 2943,2956 if (node && node->alias && node->analyzed) { node = node->ultimate_alias_target (); + /* Canonicalize the PT uid of all aliases to the ultimate target. +??? Hopefully the set of aliases can't change in a way that +changes the ultimate alias target. */ + gcc_assert ((! DECL_PT_UID_SET_P (node->decl) + || DECL_PT_UID (node->decl) == DECL_UID (node->decl)) + && (! DECL_PT_UID_SET_P (t) + || DECL_PT_UID (t) == DECL_UID (node->decl))); + DECL_PT_UID (t) = DECL_UID (node->decl); t = node->decl; } } Index: gcc/testsuite/gcc.dg/torture/pr69951.c === *** gcc/testsuite/gcc.dg/torture/pr69951.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr69951.c (working copy) *** *** 0 --- 1,21 + /* { dg-do run } */ + /* { dg-require-alias "" } */ + + extern void abort (void); + + int a = 1, c = 1; + extern int b __attribute__((alias("a"))); + extern int d __attribute__((alias("c"))); + int main(int argc) + { + int *p, *q; + if (argc) + p = &c, q = &d; + else + p = &b, q = &d; + *p = 1; + *q = 2; + if (*p == 1) + abort(); + return 0; + }
Re: [PATCH] Fix DW_OP_GNU_implicit_pointer referring to DW_TAG_dwarf_procedure (PR debug/69947)
On 02/25/2016 05:46 PM, Jakub Jelinek wrote: Ah, I've been looking for something that would set die_perennial_p, but actually you just set die_mark later on instead for those. So IMHO the right fix is just handle all the ops that could directly or indirectly contain references to other DIEs, rather than just handling the 3 you have there. Agreed! (as per the patch I sent) Going to bootstrap/regtest this on x86_64-linux and i686-linux now. Is this ok for trunk if it passes testing? I guess this was not for me as I’m not maintainer, but: this looks completely fine to me. :-) Thanks! -- Pierre-Marie de Rodat
[PATCH, PR69956] Fix multi-step conversion of boolean vectors
Hi, Currently multi-step vector conversion tries to compute intermediate type from its mode but it doesn't work for boolean vectors. This patch introduces a computation of intermediate vector masks. Bootstrapped and tested on x86_64-pc-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2016-02-26 Ilya Enkovich PR tree-optimization/69956 * tree-vect-stmts.c (supportable_widening_operation): Support multi-step conversion of boolean vectors. (supportable_narrowing_operation): Likewise. gcc/testsuite/ 2016-02-26 Ilya Enkovich PR tree-optimization/69956 * gcc.dg/pr69956.c: New test. diff --git a/gcc/testsuite/gcc.dg/pr69956.c b/gcc/testsuite/gcc.dg/pr69956.c new file mode 100644 index 000..37d24d4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr69956.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* x86_64-*-* } } } */ + +void +fn1 (char *b, char *d, int *c, int i) +{ + for (; i; i++, d++) +if (b[i]) + *d = c[i]; +} diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 9678d7c..182b277 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -9000,9 +9000,19 @@ supportable_widening_operation (enum tree_code code, gimple *stmt, for (i = 0; i < MAX_INTERM_CVT_STEPS; i++) { intermediate_mode = insn_data[icode1].operand[0].mode; - intermediate_type - = lang_hooks.types.type_for_mode (intermediate_mode, - TYPE_UNSIGNED (prev_type)); + if (VECTOR_BOOLEAN_TYPE_P (prev_type)) + { + intermediate_type + = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) / 2, + current_vector_size); + if (intermediate_mode != TYPE_MODE (intermediate_type)) + return false; + } + else + intermediate_type + = lang_hooks.types.type_for_mode (intermediate_mode, + TYPE_UNSIGNED (prev_type)); + optab3 = optab_for_tree_code (c1, intermediate_type, optab_default); optab4 = optab_for_tree_code (c2, intermediate_type, optab_default); @@ -9065,7 +9075,7 @@ supportable_narrowing_operation (enum tree_code code, tree vectype = vectype_in; tree narrow_vectype = vectype_out; enum tree_code c1; - tree intermediate_type; + tree intermediate_type, prev_type; machine_mode intermediate_mode, prev_mode; int i; bool uns; @@ -9111,6 +9121,7 @@ supportable_narrowing_operation (enum tree_code code, /* Check if it's a multi-step conversion that can be done using intermediate types. */ prev_mode = vec_mode; + prev_type = vectype; if (code == FIX_TRUNC_EXPR) uns = TYPE_UNSIGNED (vectype_out); else @@ -9145,8 +9156,17 @@ supportable_narrowing_operation (enum tree_code code, for (i = 0; i < MAX_INTERM_CVT_STEPS; i++) { intermediate_mode = insn_data[icode1].operand[0].mode; - intermediate_type - = lang_hooks.types.type_for_mode (intermediate_mode, uns); + if (VECTOR_BOOLEAN_TYPE_P (prev_type)) + { + intermediate_type + = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) * 2, + current_vector_size); + if (intermediate_mode != TYPE_MODE (intermediate_type)) + return false; + } + else + intermediate_type + = lang_hooks.types.type_for_mode (intermediate_mode, uns); interm_optab = optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type, optab_default); @@ -9164,6 +9184,7 @@ supportable_narrowing_operation (enum tree_code code, return true; prev_mode = intermediate_mode; + prev_type = intermediate_type; optab1 = interm_optab; }
Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi
On 12/09/2015 11:01 AM, Markus Trippelsdorf wrote: > Sorry, but this is simply awful ;-). How about: > > Mark selected memory related operations in the compiler when run under > valgrind to suppress false positives. Hi. I forgot to install the patch, following Markus' correction, installed as r233735. Martin
Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.
On 12/04/2015 10:45 AM, Martin Liška wrote: > Hello. > > I noticed that Builtins section of documentation does not mention > clog10{,f,l} functions. > I've tried to write a patch, however I'm not sure how should be these > functions described. > > Thanks, > Martin > PING
Re: [ARM] Add support for overflow add, sub, and neg operations
On 02/25/2016 02:51 AM, Kyrill Tkachov wrote: Hi Michael, On 24/02/16 23:02, Michael Collison wrote: This patch adds support for builtin overflow of add, subtract and negate. This patch is targeted for gcc 7 stage 1. It was tested with no regressions in arm and thumb modes on the following targets: arm-non-linux-gnueabi arm-non-linux-gnuabihf armeb-none-linux-gnuabihf arm-non-eabi I'll have a deeper look once we're closer to GCC 7 development. I've got a few comments in the meantime. 2016-02-24 Michael Collison * config/arm/arm-modes.def: Add new condition code mode CC_V to represent the overflow bit. * config/arm/arm.c (maybe_get_arm_condition_code): Add support for CC_Vmode. * config/arm/arm.md (addv4, add3_compareV, addsi3_compareV_upper): New patterns to support signed builtin overflow add operations. (uaddv4, add3_compareC, addsi3_compareV_upper): New patterns to support unsigned builtin add overflow operations. (subv4, sub3_compare1): New patterns to support signed builtin overflow subtract operations, (usubv4): New patterns to support unsigned builtin subtract overflow operations. (negvsi3, negvdi3, negdi2_compre, negsi2_carryin_compare): New patterns to support builtin overflow negate operations. Can you please summarise what sequences are generated for these operations, and how they are better than the default fallback sequences. Sure for a simple test case such as: int fn3 (int x, int y, int *ovf) { int res; *ovf = __builtin_sadd_overflow (x, y, &res); return res; } Current trunk at -O2 generates fn3: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r1, #0 mov r3, #0 add r1, r0, r1 blt .L4 cmp r1, r0 blt .L3 .L2: str r3, [r2] mov r0, r1 bx lr .L4: cmp r1, r0 ble .L2 .L3: mov r3, #1 b .L2 With the overflow patch this now generates: addsr0, r0, r1 movvs r3, #1 movvc r3, #0 str r3, [r2] bx lr Also, we'd need tests for each of these overflow operations, since these are pretty complex patterns that are being added. The patterns are tested now most notably by tests in: c-c++-common/torture/builtin-arith-overflow*.c I had a few failures I resolved so the builtin overflow arithmetic functions are definitely being exercised. Also, you may want to consider splitting this into a patch series, each adding a single overflow operation, together with its tests. That way it will be easier to keep track of which pattern applies to which use case and they can go in independently of each other. Let me know if you still fell the same way given the existing test cases. +(define_expand "uaddv4" + [(match_operand:SIDI 0 "register_operand") + (match_operand:SIDI 1 "register_operand") + (match_operand:SIDI 2 "register_operand") + (match_operand 3 "")] + "TARGET_ARM" +{ + emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2])); + + rtx x; + x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (VOIDmode, operands[3]), +pc_rtx); + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); + DONE; +}) + I notice this and many other patterns in this patch are guarded on TARGET_ARM. Is there any reason why they should be restricted to arm state and not be TARGET_32BIT ? I thought about this as well. I will test will TARGET_32BIT and get back to you. Thanks, Kyrill -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org
Re: [PATCH] Do not gather mem stats in run_exit_handles (PR middle-end/69919)
On 02/25/2016 07:02 PM, David Malcolm wrote: > Was this tested with "jit" in --enable-languages? I ask because > toplev::main can be run repeatedly by libgccjit, and it looks like > nothing can reset after_memory_report. (Though I don't typically build > with --enable-gather-detailed-mem-stats, and the jit docs don't mention > it). Wasn't, however as I've just retested that with jit in --enable-languages it works. However, you are right that it's not reset anywhere. I will try to come up with a better solution for GCC 7. Martin
Re: [PATCH 3/4] Replace ENABLE_CHECKING with CHECKING_P in dwarf2out
On Thu, Feb 25, 2016 at 11:06 AM, marxin wrote: > gcc/ChangeLog: Ok. Richard. > 2016-02-25 Martin Liska > > * dwarf2out.c (new_loc_descr): Replace ENABLE_CHECKING with > CHECKING_P. > (resolve_args_picking_1): Likewise. > * dwarf2out.h (struct GTY): Likewise. > --- > gcc/dwarf2out.c | 6 +++--- > gcc/dwarf2out.h | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index 97e192b..a8c21d8 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -1325,7 +1325,7 @@ new_loc_descr (enum dwarf_location_atom op, unsigned > HOST_WIDE_INT oprnd1, >dw_loc_descr_ref descr = ggc_cleared_alloc (); > >descr->dw_loc_opc = op; > -#if ENABLE_CHECKING > +#if CHECKING_P >descr->dw_loc_frame_offset = -1; > #endif >descr->dw_loc_oprnd1.val_class = dw_val_class_unsigned_const; > @@ -15369,14 +15369,14 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, > unsigned initial_frame_offset, >/* If we already met this node, there is nothing to compute anymore. > */ >if (visited.add (l)) > { > -#if ENABLE_CHECKING > +#if CHECKING_P > /* Make sure that the stack size is consistent wherever the > execution > flow comes from. */ > gcc_assert ((unsigned) l->dw_loc_frame_offset == frame_offset_); > #endif > break; > } > -#if ENABLE_CHECKING > +#if CHECKING_P >l->dw_loc_frame_offset = frame_offset_; > #endif > > diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h > index a96ac38..91b3d6b 100644 > --- a/gcc/dwarf2out.h > +++ b/gcc/dwarf2out.h > @@ -239,7 +239,7 @@ struct GTY((chain_next ("%h.dw_loc_next"))) > dw_loc_descr_node { > frame offset. */ >unsigned int frame_offset_rel : 1; >int dw_loc_addr; > -#if ENABLE_CHECKING > +#if CHECKING_P >/* When translating a function into a DWARF procedure, contains the frame > offset *before* evaluating this operation. It is -1 when not yet > initialized. */ > -- > 2.7.0 > >
Re: [PATCH 4/4] Poison ENABLE_CHECKING macro
On Thu, Feb 25, 2016 at 11:11 AM, marxin wrote: > gcc/ChangeLog: Ok. Richard. > 2016-02-25 Martin Liska > > * system.h: Poison ENABLE_CHECKING macro. > --- > gcc/system.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/gcc/system.h b/gcc/system.h > index 445073c..cb54541 100644 > --- a/gcc/system.h > +++ b/gcc/system.h > @@ -1014,6 +1014,10 @@ extern void fancy_abort (const char *, int, const char > *) ATTRIBUTE_NORETURN; > #undef rindex > #pragma GCC poison bcopy bzero bcmp rindex > > +/* Poison ENABLE_CHECKING macro that should be replaced with > + 'if (flag_checking)', or with CHECKING_P macro. */ > +#pragma GCC poison ENABLE_CHECKING > + > #endif /* GCC >= 3.0 */ > > /* This macro allows casting away const-ness to pass -Wcast-qual > -- > 2.7.0 >
Clear visibility of TYPE_DECL
Hi, while looking into PR69589 I noticed that types are not merged when pragma visibility does not match. This is because C++ FE stores visibility into TYPE_DECL that is used by FE only. This patch clears the flag in free_lang_data. Bootstrapped/regtested x86_64-linux and tested it makes no difference for Firefox LTO binary, OK? PR lto/69589 * tree.c (free_lang_data_in_decl): Clear visibility of TYPE_DECL. Index: tree.c === --- tree.c (revision 233707) +++ tree.c (working copy) @@ -5472,8 +5473,13 @@ free_lang_data_in_decl (tree decl) || (decl_function_context (decl) && !TREE_STATIC (decl))) DECL_INITIAL (decl) = NULL_TREE; } - else if (TREE_CODE (decl) == TYPE_DECL - || TREE_CODE (decl) == FIELD_DECL) + else if (TREE_CODE (decl) == TYPE_DECL) +{ + DECL_VISIBILITY (decl) = VISIBILITY_DEFAULT; + DECL_VISIBILITY_SPECIFIED (decl) = 0; + DECL_INITIAL (decl) = NULL_TREE; +} + else if (TREE_CODE (decl) == FIELD_DECL) DECL_INITIAL (decl) = NULL_TREE; else if (TREE_CODE (decl) == TRANSLATION_UNIT_DECL && DECL_INITIAL (decl)
Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types
Hi, while looking into the PR testcase I noticed that we detect wrong duplicate types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which is not necessary. Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify that nothing changes in ODR type merging except for the duplicates. OK? Honza * tree.c (need_assembler_name_p): Do not compute mangled name for variabt types. Index: tree.c === --- tree.c (revision 233707) +++ tree.c (working copy) @@ -5329,6 +5329,7 @@ need_assembler_name_p (tree decl) && TREE_CODE (decl) == TYPE_DECL && DECL_NAME (decl) && decl == TYPE_NAME (TREE_TYPE (decl)) + && TYPE_MAIN_VARIANT (TREE_TYPE (decl)) == TREE_TYPE (decl) && !TYPE_ARTIFICIAL (TREE_TYPE (decl)) && (type_with_linkage_p (TREE_TYPE (decl)) || TREE_CODE (TREE_TYPE (decl)) == INTEGER_TYPE)
Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z
On Fri, Feb 26, 2016 at 4:02 AM, kugan wrote: > > > Hi, > > This is an attempt to fix missed optimization: x + (-y * z * z) => x - y * z > * z as reported in PR40921. > > Regression tested and bootstrapped on x86-64-linux-gnu with no new > regressions. > > Is this OK for next stage1? Err. I think the way you implement that in reassoc is ad-hoc and not related to reassoc at all. In fact what reassoc is missing is to handle -y * z * (-w) * x -> y * x * w * x thus optimize negates as if they were additional * -1 entries in a multiplication chain. And then optimize a single remaining * -1 in the result chain to a negate. Then match.pd handles x + (-y) -> x - y (independent of -frounding-math btw). So no, this isn't ok as-is, IMHO you want to expand the multiplication ops chain pulling in the * -1 ops (if single-use, of course). Richard. > Thanks, > Kugan > > > gcc/ChangeLog: > > 2016-02-26 Kugan Vivekanandarajah > > PR middle-end/40921 > * tree-ssa-reassoc.c (propagate_neg_to_sub_or_add): New. > (reassociate_bb): Call propagate_neg_to_sub_or_add. > > > gcc/testsuite/ChangeLog: > > 2016-02-26 Kugan Vivekanandarajah > > PR middle-end/40921 > * gcc.dg/tree-ssa/pr40921.c: New test.
Re: Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types
On Fri, Feb 26, 2016 at 11:52:50AM +0100, Jan Hubicka wrote: > Hi, > while looking into the PR testcase I noticed that we detect wrong duplicate > types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which > is not necessary. > > Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify > that nothing changes in ODR type merging except for the duplicates. OK? > > Honza > * tree.c (need_assembler_name_p): Do not compute mangled name for > variabt types. s/variabt/variant/ Jakub
Revert gcc r227962
Hi, I've submitted a patch that was committed as r227962, it causes some unintended side effects (namely libuuid on Cygwin). Can someone please revert? Kai still needs some time to setup his gcc development environment. Thanks. signature.asc Description: OpenPGP digital signature
Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x
On Fri, Feb 26, 2016 at 4:02 AM, kugan wrote: > > > Hi, > > This is an attempt to fix missed optimization: x+x+x+x -> 4*x as reported in > PR63586. > > Regression tested and bootstrapped on x86-64-linux-gnu with no new > regressions. > > Is this OK for next stage1? That looks better, but I think the unordered_remove will break operand sorting and thus you probably don't handle x + x + x + x + y + y + y + y + y + y + z + z + z + z optimally. I'd say you simply want to avoid the recursion and collect a vector of [start, end] pairs before doing any modification to the ops vector. Richard. > Thanks, > Kugan > > > gcc/testsuite/ChangeLog: > > 2016-02-26 Kugan Vivekanandarajah > > PR middle-end/63586 > * gcc.dg/tree-ssa/reassoc-14.c: Fix multiply count. > > gcc/ChangeLog: > > 2016-02-26 Kugan Vivekanandarajah > > PR middle-end/63586 > * tree-ssa-reassoc.c (transform_add_to_multiply): New. > (reassociate_bb): Call transform_add_to_multiply. > >
Re: [PATCH] [RFA] [PR tree-optmization/69740] Schedule loop fixups when needed
On Fri, Feb 26, 2016 at 8:50 AM, Jeff Law wrote: > On 02/25/2016 03:00 AM, Richard Biener wrote: >> >> >> So I fail to see how only successor edges are relevant. Isn't the >> important >> case to catch whether we remove an edge marked EDGE_IRREDUCIBLE_LOOP? >> Even if the BB persists we might have exposed a new loop here. >> >> Note that it is not safe to look at {BB,EDGE}_IRREDUCIBLE_LOOP if the loop >> state does not have LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS set >> (the flags may be stale or missing). So it might be that we can't rely on >> non-loop passes modifying the CFG to handle this optimistically. >> >> Thus, how about (my main point) moving this to remove_edge instead, like > > Yea. That works. The !loops_state_satisfies_p check will almost certainly > cause us to trigger loop cleanups more often, but I think it's the > right/safe thing to do to catch cases where we haven't go the > IRREDUCIBLE_LOOP flags set. > > > Bootstrapped and regression tested on x86_64-linux-gnu. OK for the trunk? Ok with a comment added before that check. Richard. > Thanks, > Jeff > > > PR tree-optimization/69740 > * cfghooks.c (remove_edge): Request loop fixups if we delete > an edge that might turn an irreducible loop into a natural > loop. > > PR tree-optimization/69740 > * gcc.c-torture/compile/pr69740-1.c: New test. > * gcc.c-torture/compile/pr69740-2.c: New test. > > diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c > index bbb1017..f80e455 100644 > --- a/gcc/cfghooks.c > +++ b/gcc/cfghooks.c > @@ -408,7 +408,14 @@ void > remove_edge (edge e) > { >if (current_loops != NULL) > -rescan_loop_exit (e, false, true); > +{ > + rescan_loop_exit (e, false, true); > + > + if (!loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS) > + || (e->flags & EDGE_IRREDUCIBLE_LOOP) > + || (e->dest->flags & BB_IRREDUCIBLE_LOOP)) > + loops_state_set (LOOPS_NEED_FIXUP); > +} > >/* This is probably not needed, but it doesn't hurt. */ >/* FIXME: This should be called via a remove_edge hook. */ > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c > b/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c > new file mode 100644 > index 000..ac867d8 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c > @@ -0,0 +1,12 @@ > +char a; > +short b; > +void fn1() { > + if (b) > +; > + else { > +int c[1] = {0}; > + l1:; > + } > + if (a) > +goto l1; > +} > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c > b/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c > new file mode 100644 > index 000..a89c9a0 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c > @@ -0,0 +1,19 @@ > +inline int foo(int *p1, int p2) { > + int z = *p1; > + while (z > p2) > +p2 = 2; > + return z; > +} > +int main() { > + int i; > + for (;;) { > +int j, k; > +i = foo(&k, 7); > +if (k) > + j = i; > +else > + k = j; > +if (2 != j) > + __builtin_abort(); > + } > +} >
Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.
On Fri, Feb 26, 2016 at 11:05 AM, Martin Liška wrote: > On 12/04/2015 10:45 AM, Martin Liška wrote: >> Hello. >> >> I noticed that Builtins section of documentation does not mention >> clog10{,f,l} functions. >> I've tried to write a patch, however I'm not sure how should be these >> functions described. >> >> Thanks, >> Martin >> Ok with also addign them to the @findex list above. Richard. > PING
Re: Clear visibility of TYPE_DECL
On Fri, 26 Feb 2016, Jan Hubicka wrote: > Hi, > while looking into PR69589 I noticed that types are not merged when pragma > visibility does not match. This is because C++ FE stores visibility into > TYPE_DECL > that is used by FE only. This patch clears the flag in free_lang_data. > > Bootstrapped/regtested x86_64-linux and tested it makes no difference for > Firefox LTO binary, OK? Ok. Richard. > PR lto/69589 > * tree.c (free_lang_data_in_decl): Clear visibility of TYPE_DECL. > Index: tree.c > === > --- tree.c(revision 233707) > +++ tree.c(working copy) > @@ -5472,8 +5473,13 @@ free_lang_data_in_decl (tree decl) > || (decl_function_context (decl) && !TREE_STATIC (decl))) > DECL_INITIAL (decl) = NULL_TREE; > } > - else if (TREE_CODE (decl) == TYPE_DECL > -|| TREE_CODE (decl) == FIELD_DECL) > + else if (TREE_CODE (decl) == TYPE_DECL) > +{ > + DECL_VISIBILITY (decl) = VISIBILITY_DEFAULT; > + DECL_VISIBILITY_SPECIFIED (decl) = 0; > + DECL_INITIAL (decl) = NULL_TREE; > +} > + else if (TREE_CODE (decl) == FIELD_DECL) > DECL_INITIAL (decl) = NULL_TREE; >else if (TREE_CODE (decl) == TRANSLATION_UNIT_DECL > && DECL_INITIAL (decl)
Re: Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types
On Fri, Feb 26, 2016 at 12:04 PM, Jakub Jelinek wrote: > On Fri, Feb 26, 2016 at 11:52:50AM +0100, Jan Hubicka wrote: >> Hi, >> while looking into the PR testcase I noticed that we detect wrong duplicate >> types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which >> is not necessary. >> >> Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify >> that nothing changes in ODR type merging except for the duplicates. OK? >> >> Honza >> * tree.c (need_assembler_name_p): Do not compute mangled name for >> variabt types. > > s/variabt/variant/ Ok with that fix. Richard. > Jakub
Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.
On 02/26/2016 12:11 PM, Richard Biener wrote: > Ok with also addign them to the @findex list above. > > Richard. Thanks for review, updated version installed as r233738. Martin
[PATCH] Fix up memset handling in DSE (PR rtl-optimization/69891)
Hi! As analyzed by Eric, DSE mishandles memset calls if it can't figure out what the arguments to memset exactly are (it handles only register arguments right now), or if the second or third arguments are not CONST_INTs. In that case we don't call record_store, because we don't know what to call it on; but we need to treat it as a wild store, which is handled by the clear_rhs_from_active_local_stores () function (which record_store also uses if it can't figure out what exactly is being overwritten). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-02-26 Jakub Jelinek Eric Botcazou PR rtl-optimization/69891 * dse.c (scan_insn): If we can't figure out memset arguments or they are non-constant, call clear_rhs_from_active_local_stores. * gcc.target/i386/pr69891.c: New test. --- gcc/dse.c.jj2016-01-19 13:32:12.0 +0100 +++ gcc/dse.c 2016-02-26 11:03:36.694206088 +0100 @@ -2556,6 +2556,8 @@ scan_insn (bb_info_t bb_info, rtx_insn * active_local_stores = insn_info; } } + else + clear_rhs_from_active_local_stores (); } } else if (SIBLING_CALL_P (insn) && reload_completed) --- gcc/testsuite/gcc.target/i386/pr69891.c.jj 2016-02-26 11:09:45.492079225 +0100 +++ gcc/testsuite/gcc.target/i386/pr69891.c 2016-02-26 11:10:58.941058170 +0100 @@ -0,0 +1,30 @@ +/* PR rtl-optimization/69891 */ +/* { dg-do run } */ +/* { dg-options "-O -fno-tree-fre -mstringop-strategy=libcall -Wno-psabi" } */ +/* { dg-additional-options "-mno-sse" { target ia32 } } */ + +typedef unsigned short A; +typedef unsigned short B __attribute__ ((vector_size (32))); +typedef unsigned int C; +typedef unsigned int D __attribute__ ((vector_size (32))); +typedef unsigned long long E; +typedef unsigned long long F __attribute__ ((vector_size (32))); + +__attribute__((noinline, noclone)) unsigned +foo(D a, B b, D c, F d) +{ + b /= (B) {1, -c[0]} | 1; + c[0] |= 7; + a %= c | 1; + c ^= c; + return a[0] + b[15] + c[0] + d[3]; +} + +int +main () +{ + unsigned x = foo ((D) {}, (B) {}, (D) {}, (F) {}); + if (x != 0) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH, rs6000] Fixing PR 67145
On Fri, Feb 26, 2016 at 12:08 AM, Richard Henderson wrote: > It's the simplify-rtx.c portion of the patch that fixes the i686 regression. > > In the PR, Alan raises some good points, but I don't believe that we can > address those for gcc6. A new rtl reassoc optimization that takes loop > invariance into account will have to wait. > > But we do need to take care of the rs6000 ice that results, and that's the > bulk of the patch -- allowing CA to be sorted to any register of the plus > chain. > > Some notes: > > ca_operand doesn't work as written, since CA_REGNO is not an available > register, and thus doesn't satisfy register_operand. > > Is there any particular reason that subf<>3_carry_in_m1 was written with > minus rather than plus like all of the other patterns? Segher wrote the rs6000 carry infrastructure, so he's the best one to comment. - David
Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419
Ping? On 26 January 2016 at 15:43, Christophe Lyon wrote: > With the attachment > > > On 26 January 2016 at 15:42, Christophe Lyon > wrote: >> Hi, >> >> This is a followup to PR63304. >> >> As discussed in bugzilla, this patch disables pcrelative_literal_loads >> when -mfix-cortex-a53-843419 (or its default configure option) is >> used. >> >> I copied the behavior of -mfix-cortex-a53-835769 (e.g. in >> aarch64_can_inline_p), and I have tested by building the Linux kernel >> using -mfix-cortex-a53-843419 and checked that >> R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under >> CONFIG_ARM64_ERRATUM_843419). >> >> For reference, this is motivated by: >> https://bugs.linaro.org/show_bug.cgi?id=1994 >> and further details on Launchpad: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009 >> >> OK for trunk? >> >> Thanks, >> >> Christophe.
Re: [PATCH][AArch64] Replace insn to zero up DF register
Evandro Menezes wrote: > > I have a question though: is it necessary to add the "fp" and "simd" > attributes to both movsf_aarch64 and movdf_aarch64 as well? You need at least the "simd" attribute, but providing "fp" as well is clearer (in principle the TARGET_FLOAT check in the pattern condition is redundant as a result, but the movhf and movtf patterns already do both). Also you want to use the smallest possible SIMD size as these are scalar operations and some microarchitectures execute 64-bit operations more efficiently than 128-bit ones, so: mov\\t%0.h[0], %w1 + movi\\t%0.4h, #0 umov\\t%w0, %1.h[0] fmov\\t%s0, %w1 + movi\\t%0.2s, #0 fmov\\t%w0, %s1 With those changes it should be ready for commit once you get the OK from James/Marcus. Wilco
Re: [PATCH] Fix up memset handling in DSE (PR rtl-optimization/69891)
On Fri, 26 Feb 2016, Jakub Jelinek wrote: > Hi! > > As analyzed by Eric, DSE mishandles memset calls if it can't figure out what > the arguments to memset exactly are (it handles only register arguments > right now), or if the second or third arguments are not CONST_INTs. > In that case we don't call record_store, because we don't know what to call > it on; but we need to treat it as a wild store, which is handled by the > clear_rhs_from_active_local_stores () function (which record_store also uses > if it can't figure out what exactly is being overwritten). > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Ok. Thanks, Richard. > 2016-02-26 Jakub Jelinek > Eric Botcazou > > PR rtl-optimization/69891 > * dse.c (scan_insn): If we can't figure out memset arguments > or they are non-constant, call clear_rhs_from_active_local_stores. > > * gcc.target/i386/pr69891.c: New test. > > --- gcc/dse.c.jj 2016-01-19 13:32:12.0 +0100 > +++ gcc/dse.c 2016-02-26 11:03:36.694206088 +0100 > @@ -2556,6 +2556,8 @@ scan_insn (bb_info_t bb_info, rtx_insn * > active_local_stores = insn_info; > } > } > + else > + clear_rhs_from_active_local_stores (); > } > } >else if (SIBLING_CALL_P (insn) && reload_completed) > --- gcc/testsuite/gcc.target/i386/pr69891.c.jj2016-02-26 > 11:09:45.492079225 +0100 > +++ gcc/testsuite/gcc.target/i386/pr69891.c 2016-02-26 11:10:58.941058170 > +0100 > @@ -0,0 +1,30 @@ > +/* PR rtl-optimization/69891 */ > +/* { dg-do run } */ > +/* { dg-options "-O -fno-tree-fre -mstringop-strategy=libcall -Wno-psabi" } > */ > +/* { dg-additional-options "-mno-sse" { target ia32 } } */ > + > +typedef unsigned short A; > +typedef unsigned short B __attribute__ ((vector_size (32))); > +typedef unsigned int C; > +typedef unsigned int D __attribute__ ((vector_size (32))); > +typedef unsigned long long E; > +typedef unsigned long long F __attribute__ ((vector_size (32))); > + > +__attribute__((noinline, noclone)) unsigned > +foo(D a, B b, D c, F d) > +{ > + b /= (B) {1, -c[0]} | 1; > + c[0] |= 7; > + a %= c | 1; > + c ^= c; > + return a[0] + b[15] + c[0] + d[3]; > +} > + > +int > +main () > +{ > + unsigned x = foo ((D) {}, (B) {}, (D) {}, (F) {}); > + if (x != 0) > +__builtin_abort (); > + return 0; > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.
Hi all, I admit that, the title looks a little bit confusing. The situation is like this, To make insn_1 strict, lra generates a new insn_1_reload insn. In insn_1_reload, there is a scratch operand with this form clobber (match_scratch:MODE x "=&r") When lra tries to reload insn_1_reload in later iteration, a new pseudo register (let say RXX) is created to replace this scratch operand in-place. Additionally, a new insn will be generated and inserted after insn_1_reload to finish the reload. It's in this form: (set scratch, RXX) And this instruction is illegal. no target implements this kind of pattern. LRA will ICE because of this. "internal compiler error: in lra_set_insn_recog_data, at lra.c:964" And indeed, this pattern has no side-effect. The scratch operand should stay inside the pattern. Normally, at the very beginning of LRA reload, all scratch operands will be replaced by newly created pseudo register. However, this is a problem when generated reload insn has output scratch operand. I have checked, x86, arm, aarch64, mips, arc all have such patterns. But it's not triggered. In my case, it's triggered by compiling glibc with local change. So a simple change is made in this patch. The output operand is reloaded only when it's not a scratch operand and it's not unused since then. aarch64-none-linux-gnu bootstrap and regression test OK. x86_64-linux bootstrap and regression test OK. OK for trunk? Regards, Renlin Li gcc/ChangeLog: 2016-02-26 Renlin Li * lra-constraints.c (curr_insn_transform): Don't generate reload for output scratch operand. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 08cf0aa6c4208bb60ba5071bad1255d587f1cb4a..ef5809ff226cca69bb711bfc5dab55e24caba01a 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -3882,6 +3882,8 @@ curr_insn_transform (bool check_only_p) } *loc = new_reg; if (type != OP_IN + /* Don't generate reload for output scratch operand. */ + && GET_CODE (old) != SCRATCH && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) { start_sequence ();
Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.
On Fri, Feb 26, 2016 at 1:54 PM, Renlin Li wrote: > Hi all, > > I admit that, the title looks a little bit confusing. > > The situation is like this, > To make insn_1 strict, lra generates a new insn_1_reload insn. > In insn_1_reload, there is a scratch operand with this form > clobber (match_scratch:MODE x "=&r") > > When lra tries to reload insn_1_reload in later iteration, a new pseudo > register (let say RXX) is created to replace this scratch operand in-place. > Additionally, a new insn will be generated and inserted after insn_1_reload > to > finish the reload. It's in this form: > (set scratch, RXX) > > And this instruction is illegal. no target implements this kind of pattern. > LRA will ICE because of this. > "internal compiler error: in lra_set_insn_recog_data, at lra.c:964" > > And indeed, this pattern has no side-effect. The scratch operand should > stay inside the pattern. > > Normally, at the very beginning of LRA reload, all scratch operands will be > replaced by newly created pseudo register. However, this is a problem when > generated reload insn has output scratch operand. > > I have checked, x86, arm, aarch64, mips, arc all have such patterns. But > it's > not triggered. In my case, it's triggered by compiling glibc with local > change. > > So a simple change is made in this patch. The output operand is reloaded > only > when it's not a scratch operand and it's not unused since then. > > > aarch64-none-linux-gnu bootstrap and regression test OK. > x86_64-linux bootstrap and regression test OK. > OK for trunk? Please extract a testcase from your modified glibc sources. Richard. > Regards, > Renlin Li > > > gcc/ChangeLog: > > 2016-02-26 Renlin Li > > * lra-constraints.c (curr_insn_transform): Don't generate reload for > output scratch operand. >
Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.
Hi Richard, On 26/02/16 12:57, Richard Biener wrote: On Fri, Feb 26, 2016 at 1:54 PM, Renlin Li wrote: I have checked, x86, arm, aarch64, mips, arc all have such patterns. But it's not triggered. In my case, it's triggered by compiling glibc with local change. Please extract a testcase from your modified glibc sources. Richard. The change is to the compiler. I change the compiler and tries to build a linux toolchain. I tried to create a test case. But no luck. It needs to make LRA generate a reload pattern with a non-input scratch register operand. This type of pattern actually is quite common in the target backends. I also searched the GCC bugzilla website, there is no similar bug reported. Regards, Renlin Regards, Renlin Li gcc/ChangeLog: 2016-02-26 Renlin Li * lra-constraints.c (curr_insn_transform): Don't generate reload for output scratch operand.
Re: [WWWDocs] Deprecate support for non-thumb ARM devices
On 24/02/16 13:59, Richard Earnshaw (lists) wrote: > After discussion with the ARM port maintainers we have decided that now > is probably the right time to deprecate support for versions of the ARM > Architecture prior to ARMv4t. This will allow us to clean up some of > the code base going forwards by being able to assume: > - Presence of half-word data accesses > - Presence of Thumb and therefore of interworking instructions. > > This patch records the status change in the GCC-6 release notes. > > I propose to commit this patch later this week. > Now done. R. > R. > > > deprecate.patch > > > Index: htdocs/gcc-6/changes.html > === > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v > retrieving revision 1.61 > diff -u -r1.61 changes.html > --- htdocs/gcc-6/changes.html 19 Feb 2016 05:00:54 - 1.61 > +++ htdocs/gcc-6/changes.html 19 Feb 2016 14:47:31 - > @@ -340,7 +340,14 @@ > ARM > > > - The arm port now supports target attributes and pragmas. Please > + Support for revisions of the ARM architecture prior to ARMv4t has > + been deprecated and will be removed in a future GCC release. > + This affects ARM6, ARM7 (but not ARM7TDMI), ARM8, StrongARM, and > + Faraday fa526 and fa626 devices, which do not have support for > + the Thumb execution state. > + > + > + The ARM port now supports target attributes and pragmas. Please > refer to the href="https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html#ARM-Function-Attributes";> > documentation for details of available attributes and > pragmas as well as usage instructions. >
[PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)
Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a year ago, but had a proposed patch by Mike Meissner that was never vetted and committed. I've reviewed the patch and tested it on GCC 5 and GCC 6, and with the patch applied we see the test pass for both 32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit on a Power8 little-endian platform. As I understand it, the test case got out of sync with the implementation in GCC 5, and this rewrite of the test case restores order. I've verified that the original compilation options from Andreas Schwab for this test case result in correct generation of lxsdx, which was not the case with the original report. The test case is extremely different in GCC 4.9. As Mike has noted in the PR, the -mupper-regs support does not exist in GCC 4.9, so the rewritten test case does not apply there. As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and powerpc64le-unknown-linux-gnu (-m64). Is this ok for trunk and GCC 5? Thanks, Bill 2016-02-26 Michael Meissner Bill Schmidt * gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired functionality for both 32-bit and 64-bit. --- gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c(revision 220948) +++ gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c(working copy) @@ -1,4 +1,4 @@ -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ @@ -51,13 +51,14 @@ load_store_sf (unsigned long num, float value37= 0.0f; float value38= 0.0f; float value39= 0.0f; - unsigned long in_mask; - unsigned long out_mask; + unsigned long in_mask, in_mask2; + unsigned long out_mask, out_mask2; unsigned long i; for (i = 0; i < num; i++) { in_mask = *in_mask_ptr++; + in_mask2 = *in_mask_ptr++; if ((in_mask & (1L << 0)) != 0L) value00 = *from_ptr++; @@ -118,67 +119,68 @@ load_store_sf (unsigned long num, if ((in_mask & (1L << 19)) != 0L) value19 = *from_ptr++; - if ((in_mask & (1L << 20)) != 0L) + if ((in_mask2 & (1L << 0)) != 0L) value20 = *from_ptr++; - if ((in_mask & (1L << 21)) != 0L) + if ((in_mask2 & (1L << 1)) != 0L) value21 = *from_ptr++; - if ((in_mask & (1L << 22)) != 0L) + if ((in_mask2 & (1L << 2)) != 0L) value22 = *from_ptr++; - if ((in_mask & (1L << 23)) != 0L) + if ((in_mask2 & (1L << 3)) != 0L) value23 = *from_ptr++; - if ((in_mask & (1L << 24)) != 0L) + if ((in_mask2 & (1L << 4)) != 0L) value24 = *from_ptr++; - if ((in_mask & (1L << 25)) != 0L) + if ((in_mask2 & (1L << 5)) != 0L) value25 = *from_ptr++; - if ((in_mask & (1L << 26)) != 0L) + if ((in_mask2 & (1L << 6)) != 0L) value26 = *from_ptr++; - if ((in_mask & (1L << 27)) != 0L) + if ((in_mask2 & (1L << 7)) != 0L) value27 = *from_ptr++; - if ((in_mask & (1L << 28)) != 0L) + if ((in_mask2 & (1L << 8)) != 0L) value28 = *from_ptr++; - if ((in_mask & (1L << 29)) != 0L) + if ((in_mask2 & (1L << 9)) != 0L) value29 = *from_ptr++; - if ((in_mask & (1L << 30)) != 0L) + if ((in_mask2 & (1L << 10)) != 0L) value30 = *from_ptr++; - if ((in_mask & (1L << 31)) != 0L) + if ((in_mask2 & (1L << 11)) != 0L) value31 = *from_ptr++; - if ((in_mask & (1L << 32)) != 0L) + if ((in_mask2 & (1L << 12)) != 0L) value32 = *from_ptr++; - if ((in_mask & (1L << 33)) != 0L) + if ((in_mask2 & (1L << 13)) != 0L) value33 = *from_ptr++; - if ((in_mask & (1L << 34)) != 0L) + if ((in_mask2 & (1L << 14)) != 0L) value34 = *from_ptr++; - if ((in_mask & (1L << 35)) != 0L) + if ((in_mask2 & (1L << 15)) != 0L) value35 = *from_ptr++; - if ((in_mask & (1L << 36)) != 0L) + if ((in_mask2 & (1L << 16)) != 0L) value36 = *from_ptr++; - if ((in_mask & (1L << 37)) != 0L) + if ((in_mask2 & (1L << 17)) != 0L) value37 = *from_ptr++; - if ((in_mask & (1L << 38)) != 0L) + if ((in_mask2 & (1L << 18)) != 0L) value38 = *from_ptr++; - if ((in_mask & (1L << 39)) != 0L) + if ((in_mask2 & (1L << 19)) != 0L) value39 = *from_ptr++; out_mask = *out_mask_ptr++; + out_mask2 = *out_mask_ptr++; if ((out_mask & (1L << 0)) != 0L) *to_ptr++ = value00; @@ -239,64 +241,64 @@ load_store_sf (unsigned long num, if ((out_mask & (1L << 19)) != 0L) *to_ptr++ = value19; - if ((out_mask & (1L << 20)) != 0L) + if ((out_mask2 & (1L << 0)) != 0L)
[PATCH][ARM][wwwdocs] Mention Cortex-A32 and Cortex-A35 support in changes.html for GCC 6
Hi all, This patch adds a note to changes.html about the added support for Cortex-A32 and Cortex-A35. Ok to commit? Thanks, Kyrill Index: htdocs/gcc-6/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.62 diff -U 3 -r1.62 changes.html --- htdocs/gcc-6/changes.html 24 Feb 2016 09:36:06 - 1.62 +++ htdocs/gcc-6/changes.html 25 Feb 2016 10:51:14 - @@ -345,6 +345,15 @@ documentation for details of available attributes and pragmas as well as usage instructions. + + Support has been added for the following processors + (GCC identifiers in parentheses): ARM Cortex-A32 + (cortex-a32), ARM Cortex-A35 (cortex-a35). + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a32 or + -mtune=cortex-a35. +
Re: [PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)
On Fri, Feb 26, 2016 at 9:18 AM, Bill Schmidt wrote: > Hi, > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a > year ago, but had a proposed patch by Mike Meissner that was never > vetted and committed. I've reviewed the patch and tested it on GCC 5 > and GCC 6, and with the patch applied we see the test pass for both > 32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit > on a Power8 little-endian platform. > > As I understand it, the test case got out of sync with the > implementation in GCC 5, and this rewrite of the test case restores > order. I've verified that the original compilation options from Andreas > Schwab for this test case result in correct generation of lxsdx, which > was not the case with the original report. > > The test case is extremely different in GCC 4.9. As Mike has noted in > the PR, the -mupper-regs support does not exist in GCC 4.9, so the > rewritten test case does not apply there. > > As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and > powerpc64le-unknown-linux-gnu (-m64). Is this ok for trunk and GCC 5? > > Thanks, > Bill > > > 2016-02-26 Michael Meissner > Bill Schmidt > > * gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired > functionality for both 32-bit and 64-bit. Okay. Thanks, David
[PATCH] Fix PR69720
The following fixes PR69720 where with nested reductions that require unrolling the inner loop (and thus having multiple PHIs) we fail to properly build the reduction epilogue. Existing testcases in the testsuite are also affected but for them it doesn't matter as adding zero can be omitted safely ... Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress. We might be able to remove the adjustment_def case completely now, but that's not appropriate at this stage so I removed only the case we were handling not correctly. Richard. 2016-02-26 Richard Biener PR tree-optimization/69720 * tree-vect-loop.c (get_initial_def_for_reduction): Avoid the adjustment_def path for possibly vectorized defs. (vect_create_epilog_for_reduction): Handle vectorized initial defs properly. * gcc.dg/vect/vect-outer-pr69720.c: New testcase. Index: gcc/tree-vect-loop.c === *** gcc/tree-vect-loop.c(revision 233734) --- gcc/tree-vect-loop.c(working copy) *** get_initial_def_for_reduction (gimple *s *** 4110,4115 --- 4119,4133 return vect_create_destination_var (init_val, vectype); } + /* In case of a nested reduction do not use an adjustment def as + that case is not supported by the epilogue generation correctly + if ncopies is not one. */ + if (adjustment_def && nested_in_vect_loop) + { + *adjustment_def = NULL; + return vect_get_vec_def_for_operand (init_val, stmt); + } + switch (code) { case WIDEN_SUM_EXPR: *** get_initial_def_for_reduction (gimple *s *** 4124,4135 /* ADJUSMENT_DEF is NULL when called from vect_create_epilog_for_reduction to vectorize double reduction. */ if (adjustment_def) ! { ! if (nested_in_vect_loop) ! *adjustment_def = vect_get_vec_def_for_operand (init_val, stmt); ! else ! *adjustment_def = init_val; ! } if (code == MULT_EXPR) { --- 4142,4148 /* ADJUSMENT_DEF is NULL when called from vect_create_epilog_for_reduction to vectorize double reduction. */ if (adjustment_def) ! *adjustment_def = init_val; if (code == MULT_EXPR) { *** vect_create_epilog_for_reduction (vec
[C PATCH] Fix C error-recovery (PR c/69796, PR c/69974)
Hi! Already PR69483 and these two further PRs show that it is really a bad idea to set TREE_TYPE of decls with incomplete types to error_mark_node, there are lots of places in the middle-end that don't expect error_mark_nodes appearing so late. I've bootstrapped/regtested on x86_64-linux and i686-linux following patch which just doesn't do anything with the type, we don't emit the error multiple times, because if the FE emits errors, cgraphunit doesn't attempt to assemble variables or compile functions (just gimplifies them) if seen_error (). Attached is then untested alternative, set the type to something that should hopefully not cause ICEs nor further errors/warnings during the error-recovery. Ok for trunk, or shall I test the other patch instead? 2016-02-26 Jakub Jelinek PR c/69796 PR c/69974 * c-parser.c (c_parser_translation_unit): Don't change TREE_TYPE of incomplete decls to error_mark_node. * gcc.dg/pr69796.c: New test. * gcc.dg/pr69974.c: New test. --- gcc/c/c-parser.c.jj 2016-02-24 09:33:37.0 +0100 +++ gcc/c/c-parser.c2016-02-26 13:31:00.947133760 +0100 @@ -1436,10 +1436,7 @@ c_parser_translation_unit (c_parser *par tree decl; FOR_EACH_VEC_ELT (incomplete_record_decls, i, decl) if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node) - { - error ("storage size of %q+D isn%'t known", decl); - TREE_TYPE (decl) = error_mark_node; - } + error ("storage size of %q+D isn%'t known", decl); } /* Parse an external declaration (C90 6.7, C99 6.9). --- gcc/testsuite/gcc.dg/pr69796.c.jj 2016-02-26 13:16:32.352323768 +0100 +++ gcc/testsuite/gcc.dg/pr69796.c 2016-02-26 13:16:16.0 +0100 @@ -0,0 +1,10 @@ +/* PR c/69796 */ +/* { dg-do compile } */ + +struct S s;/* { dg-error "storage size of 's' isn't known" } */ + +void +foo () +{ + s a; /* { dg-error "expression statement has incomplete type|expected" } */ +} --- gcc/testsuite/gcc.dg/pr69974.c.jj 2016-02-26 13:13:40.239740046 +0100 +++ gcc/testsuite/gcc.dg/pr69974.c 2016-02-26 13:13:29.0 +0100 @@ -0,0 +1,13 @@ +/* PR c/69974 */ +/* { dg-do compile } */ + +struct S; +char foo (struct S *); +struct S a;/* { dg-error "storage size of 'a' isn't known" } */ +int b; + +void +bar () +{ + b &= foo (&a); +} Jakub 2016-02-26 Jakub Jelinek PR c/69796 PR c/69974 * c-parser.c (c_parser_translation_unit): For incomplete decls set type to char_type_node instead of error_mark_node. * gcc.dg/pr69796.c: New test. * gcc.dg/pr69974.c: New test. --- gcc/c/c-parser.c.jj 2016-02-24 09:33:37.0 +0100 +++ gcc/c/c-parser.c2016-02-26 13:04:16.416657335 +0100 @@ -1438,7 +1438,10 @@ c_parser_translation_unit (c_parser *par if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node) { error ("storage size of %q+D isn%'t known", decl); - TREE_TYPE (decl) = error_mark_node; + /* Don't set the type to error_mark_node, the middle-end is unprepared + to see error_mark_node type appearing so late in the IL, + so using some other type is better for error recovery. */ + TREE_TYPE (decl) = char_type_node; } } --- gcc/testsuite/gcc.dg/pr69796.c.jj 2016-02-26 13:16:32.352323768 +0100 +++ gcc/testsuite/gcc.dg/pr69796.c 2016-02-26 13:16:16.0 +0100 @@ -0,0 +1,10 @@ +/* PR c/69796 */ +/* { dg-do compile } */ + +struct S s;/* { dg-error "storage size of 's' isn't known" } */ + +void +foo () +{ + s a; /* { dg-error "expression statement has incomplete type|expected" } */ +} --- gcc/testsuite/gcc.dg/pr69974.c.jj 2016-02-26 13:13:40.239740046 +0100 +++ gcc/testsuite/gcc.dg/pr69974.c 2016-02-26 13:13:29.0 +0100 @@ -0,0 +1,13 @@ +/* PR c/69974 */ +/* { dg-do compile } */ + +struct S; +char foo (struct S *); +struct S a;/* { dg-error "storage size of 'a' isn't known" } */ +int b; + +void +bar () +{ + b &= foo (&a); +}
Re: [PATCH][ARM][wwwdocs] Mention Cortex-A32 and Cortex-A35 support in changes.html for GCC 6
On 26/02/16 14:25, Kyrill Tkachov wrote: > Hi all, > > This patch adds a note to changes.html about the added support for > Cortex-A32 and Cortex-A35. > > Ok to commit? > OK. R. > Thanks, > Kyrill > > wwwdocs-a32-a35.patch > > > Index: htdocs/gcc-6/changes.html > === > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v > retrieving revision 1.62 > diff -U 3 -r1.62 changes.html > --- htdocs/gcc-6/changes.html 24 Feb 2016 09:36:06 - 1.62 > +++ htdocs/gcc-6/changes.html 25 Feb 2016 10:51:14 - > @@ -345,6 +345,15 @@ > documentation for details of available attributes and > pragmas as well as usage instructions. > > + > + Support has been added for the following processors > + (GCC identifiers in parentheses): ARM Cortex-A32 > + (cortex-a32), ARM Cortex-A35 (cortex-a35). > + The GCC identifiers can be used > + as arguments to the -mcpu or -mtune options, > + for example: -mcpu=cortex-a32 or > + -mtune=cortex-a35. > + > > > >
Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute
On Mon, Feb 22, 2016 at 03:07:09PM +, Alan Lawrence wrote: > On 22/01/16 17:16, Alan Lawrence wrote: > > > >On 21/01/16 17:23, Alan Lawrence wrote: > >>On 18/01/16 17:10, Eric Botcazou wrote: > >>> > >>>Could you post the list of files that differ? How do they differ exactly? > >> > >>Hmmm. Well, I definitely had this failing to bootstrap once. I repeated > >>that, to > >>try to identify exactly what the differences wereand it succeeded even > >>with > >>my pre-AAPCS64-update host compiler. So, this is probably a false alarm; I'm > >>bootstrapping again, after a rebase, to make sure... > >> > >>--Alan > > > >Ok, rebased onto a more recent build, and bootstrapping with Ada posed no > >problems. Sorry for the noise. > > > >However, I had to drop the assert that TYPE_FIELDS was non-null because of > >some > >C++ testcases. > > > >Is this version OK for trunk? > > > >--Alan > > > >gcc/ChangeLog: > > > > * gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment): > > Rewrite, looking one level down for records and arrays. > >--- > > gcc/config/aarch64/aarch64.c | 31 --- > > 1 file changed, 16 insertions(+), 15 deletions(-) > > > >diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > >index 9142ac0..b084f83 100644 > >--- a/gcc/config/aarch64/aarch64.c > >+++ b/gcc/config/aarch64/aarch64.c > >@@ -1925,22 +1925,23 @@ aarch64_vfp_is_call_candidate (cumulative_args_t > >pcum_v, machine_mode mode, > > static unsigned int > > aarch64_function_arg_alignment (machine_mode mode, const_tree type) > > { > >- unsigned int alignment; > >+ if (!type) > >+return GET_MODE_ALIGNMENT (mode); > >+ if (integer_zerop (TYPE_SIZE (type))) > >+return 0; > > > >- if (type) > >-{ > >- if (!integer_zerop (TYPE_SIZE (type))) > >-{ > >- if (TYPE_MODE (type) == mode) > >-alignment = TYPE_ALIGN (type); > >- else > >-alignment = GET_MODE_ALIGNMENT (mode); > >-} > >- else > >-alignment = 0; > >-} > >- else > >-alignment = GET_MODE_ALIGNMENT (mode); > >+ gcc_assert (TYPE_MODE (type) == mode); > >+ > >+ if (!AGGREGATE_TYPE_P (type)) > >+return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)); > >+ > >+ if (TREE_CODE (type) == ARRAY_TYPE) > >+return TYPE_ALIGN (TREE_TYPE (type)); > >+ > >+ unsigned int alignment = 0; > >+ > >+ for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) > >+alignment = std::max (alignment, DECL_ALIGN (field)); > > > >return alignment; > > } > > > > > Ping. > > If this is not appropriate for GCC6, then is it OK for Stage 1 of GCC 7? I think this needs to be a GCC 7 fix at this point. Additionally, I'd like to fully understand PR69841 before we take this patch. In particular, under what circumstances can the C++ front end set DECL_ALIGN of a type to be wider than we expect. Can we ever end up with 128-bit alignment of a template parameter when we were expecting 32- or 64-bit alignment, and if we can what are the implications on this patch? Thanks, James
Re: [AArch64] Emit square root using the Newton series
On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote: > In preparation for the patch adding the Newton series also for > square root, I'd like to propose this patch changing the name of the > existing tuning flag for the reciprocal square root. This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also be OK. Pick your favourite! One comment on the replacement invoke.texi text below, otherwise this is OK to apply now. > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt > index 5cbd4cd..155d2bd 100644 > --- a/gcc/config/aarch64/aarch64.opt > +++ b/gcc/config/aarch64/aarch64.opt > @@ -151,5 +151,5 @@ PC relative literal loads. > > mlow-precision-recip-sqrt > Common Var(flag_mrecip_low_precision_sqrt) Optimization > -When calculating a sqrt approximation, run fewer steps. > +Calculate the reciprocal square-root approximation in fewer steps. > This reduces precision, but can result in faster computation. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 490df93..eeff24d 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -12879,12 +12879,10 @@ corresponding flag to the linker. > @item -mno-low-precision-recip-sqrt > @opindex -mlow-precision-recip-sqrt > @opindex -mno-low-precision-recip-sqrt > -The square root estimate uses two steps instead of three for > double-precision, > -and one step instead of two for single-precision. > -Thus reducing latency and precision. > -This is only relevant if @option{-ffast-math} activates > -reciprocal square root estimate instructions. > -Which in turn depends on the target processor. > +The reciprocal square root approximation uses one step less than otherwise, > +thus reducing latency and precision. When calculating the reciprocal square root approximation, use one less step than otherwise, thus reducing latency and precision. Thanks, James
[PATCH] S/390: PR69709 Fix risbg splitter
This fixes a wrong code generation problem with the splitters introduced with that patch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01840.html The target operand is used as temporary. This fails if it matches the source of the left shift which is read after writing the temporary. Thanks to Dominik for debugging it and thanks to Richard for the fix! Bootstrapped and regtested on s390x with-arch=z13. Bye, -Andreas- gcc/ChangeLog: 2016-02-26 Richard Henderson PR target/69709 * config/s390/s390.md (risbg and risbgn splitters): Allocate new pseudo in case the target rtx matches the source of the left shift. gcc/testsuite/ChangeLog: 2016-02-26 Andreas Krebbel PR target/69709 * gcc.target/s390/pr69709.c: New test. --- gcc/config/s390/s390.md | 24 gcc/testsuite/gcc.target/s390/pr69709.c | 39 + 2 files changed, 59 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/pr69709.c diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 2c90eae..8f92018 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3876,13 +3876,21 @@ (ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "") (match_operand:GPR 4 "nonzero_shift_count_operand" ""] "TARGET_ZEC12 && UINTVAL (operands[2]) + UINTVAL (operands[4]) >= " - [(set (match_dup 0) + [(set (match_dup 6) (lshiftrt:GPR (match_dup 1) (match_dup 2))) (set (match_dup 0) - (ior:GPR (and:GPR (match_dup 0) (match_dup 5)) + (ior:GPR (and:GPR (match_dup 6) (match_dup 5)) (ashift:GPR (match_dup 3) (match_dup 4] { operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1); + if (rtx_equal_p (operands[0], operands[3])) +{ + if (!can_create_pseudo_p ()) + FAIL; + operands[6] = gen_reg_rtx (mode); +} + else +operands[6] = operands[0]; }) (define_split @@ -3894,15 +3902,23 @@ (match_operand:GPR 4 "nonzero_shift_count_operand" "" (clobber (reg:CC CC_REGNUM))])] "TARGET_Z10 && !TARGET_ZEC12 && UINTVAL (operands[2]) + UINTVAL (operands[4]) >= " - [(set (match_dup 0) + [(set (match_dup 6) (lshiftrt:GPR (match_dup 1) (match_dup 2))) (parallel [(set (match_dup 0) - (ior:GPR (and:GPR (match_dup 0) (match_dup 5)) + (ior:GPR (and:GPR (match_dup 6) (match_dup 5)) (ashift:GPR (match_dup 3) (match_dup 4 (clobber (reg:CC CC_REGNUM))])] { operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1); + if (rtx_equal_p (operands[0], operands[3])) +{ + if (!can_create_pseudo_p ()) + FAIL; + operands[6] = gen_reg_rtx (mode); +} + else +operands[6] = operands[0]; }) (define_insn "*rsbg__noshift" diff --git a/gcc/testsuite/gcc.target/s390/pr69709.c b/gcc/testsuite/gcc.target/s390/pr69709.c new file mode 100644 index 000..e9aa024 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/pr69709.c @@ -0,0 +1,39 @@ +/* PR69709 This testcase used to fail due to a broken risbg + splitter. */ + +/* { dg-do run } */ +/* { dg-options "-O3 -march=z10" } */ + + +typedef struct +{ + unsigned int sig[2]; +} +val_t; + +unsigned int __attribute__ ((noinline)) +div_significands (const val_t * a) +{ + val_t u = *a; + int bit = 64; + unsigned int r; + do +{ + u.sig[1] = (u.sig[1] << 1) | (u.sig[0] >> 31); + u.sig[0] = 42; + + if (bit == 64) + r = u.sig[1]; +} + while (--bit >= 0); + return r; +} + +int +main (void) +{ + val_t a = { { 0x1, 0x1 } }; + if (div_significands (&a) != 2) +__builtin_abort (); + return 0; +} -- 1.9.1
Re: Wonly-top-basic-asm
On 02/21/2016 11:27 AM, David Wohlferd wrote: So now what? I have one Bernd who likes the sample, and one who doesn't. Obviously I think what I'm proposing is better than what's there now and I've done my best to say why. But me believing it to be better doesn't get anything checked in. I hadn't thought it through well enough. Jan's objection (order isn't guaranteed) is relevant. I'd drop the example. Bernd
Re: Fix PR44281 (bad RA with global regs)
On 02/22/2016 03:37 PM, Richard Biener wrote: Do calls properly clobber them even if they are not in the set of call-clobbered regs? Are asm()s properly using/clobbering them? I think you are allowed to use them in asm()s without adding constraints for them? Calls do, asms currently don't AFAICT. Not sure whether it's allowed to use them, but I think it should be straightforward to adjust df-scan. Michael Matz wrote: Some jit-like code uses global reg vars to reserve registers for the generated code. It would have to add -ffixed- compilation options instead of only the global vars after the patch. I think the advantages of having "good RA" with global reg vars are dubious enough to not warrant breaking backward compatibility. I don't see the need for new options - if we have proper live info for calls and asms, what other reason could there be for incomplete liveness information? RB: > I'd say this is sth for GCC 7. Not really objecting to that. Would you ack it if I extended df-scan for the asm case? Bernd
Re: [PATCH][AArch64] Remove an unused reload hook.
On Thu, Feb 25, 2016 at 12:00:58PM +0100, Yvan Roux wrote: > Hi, > > On 26 January 2015 at 18:01, Matthew Wahab wrote: > > Hello, > > > > The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the > > Aarch64 backend no longer supports reload, this macro is not needed and this > > patch removes it. > > > > Tested aarch64-none-linux-gnu with gcc-check. No new failures. > > > > Ok for trunk? > > Matthew > > > > gcc/ > > 2015-01-26 Matthew Wahab > > > > * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove. > > * config/aarch64/arch64-protos.h > > (aarch64_legitimize_reload_address): Remove. > > * config/aarch64/aarch64.c (aarch64_legitimize_reload_address): > > Remove. > > It seems that this old patch was forgotten, I guess that it'll have to > wait for GCC 7 now, but I think it's a good thing top cleanup the > reload specific hooks and constructions (I've another patch on for > that under on-going). Agreed. Please repost the patch once GCC 7 development opens and I'll take a look at it then. Thanks, James
Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false
On Thu, Feb 25, 2016 at 09:25:45AM +, Kyrill Tkachov wrote: > Hi all, > > In this wrong-code PR we get bad code when synthesising a TImode right shift > by variable amount using DImode shifts during expand. > > The expand_double_word_shift function expands two paths: one where the > variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for > aarch64) at runtime > and one where it's less than that, and performs a conditional select between > the two in the end. > > The second path is the one that goes wrong here. > The algorithm it uses for calculating a TImode shift when the variable shift > amount is <64 is: > > > --DImode- --DImode- --DImode- --DImode- > { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ > source-lo ___| } >> var_amnt > > 1) carry = source-hi << 1 > 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending > on shift_truncation_mask > 3) carry = carry << tmp // net effect is that carry is source-hi shifted left > by BITS_PER_WORD - var_amnt > 4) target-lo = source-lo >>u var_amnt //unsigned shift. > 5) target-lo = target-lo | carry > 6) target-hi = source-hi >> var_amnt > > where the two DImode words source-hi and source-lo are the two words of the > source TImode value and var_amnt is the register holding the variable shift > amount. This is performed by the expand_subword_shift function in optabs.c. > > Step 2) is valid only if the target truncates the shift amount by the width > of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and > TARGET_SHIFT_TRUNCATION_MASK is 63 in this case. > > Step 3) is the one that goes wrong. On aarch64 a left shift is usually > implemented using the LSL instruction but we also have alternatives that can > use the SIMD registers and the USHL instruction. In this case we end up > using the USHL instruction. However, although the USHL instruction does > a DImode shift, it doesn't truncate the shift amount by 64, but rather by > 255. Furthermore, the shift amount is interpreted as a 2's complement signed > integer and if it's negative performs a right shift. This is in contrast > with the normal LSL instruction which just performs an unsigned shift by an > amount truncated by 64. > > Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't > assume shift amounts are truncated unless we know we can only ever use the > LSL instructions for variable shifts. > > However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode, > so expand_subword_shift assumes that since the mask is non-zero it's a valid > shift truncation mask. The documentation for TARGET_SHIFT_TRUNCATION_MASK > says: > " The default implementation of this function returns > @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED} > and 0 otherwise." > > So since for TARGET_SIMD we cannot guarantee that all shifts truncate their > amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when > SHIFT_COUNT_TRUNCATED is false. This is what the patch does, and with it > step 2) becomes: > 2) tmp = BITS_PER_WORD - 1 - var_amnt > which behaves correctly on aarch64. > > Unfortunately, the testcase from the PR has very recently gone latent on > trunk because it depends on register allocation picking a particular > alternative from the *aarch64_ashl_sisd_or_int_3 pattern in aarch64.md. > I tried to do some inline asm tricks to get it to force the correct > constraints but was unsuccessful. Nevertheless I've included the testcase in > the patch. I suppose it's better than nothing. Thanks for the great write-up. This level of detail is very helpful for review. OK for trunk. Thanks, James > > Bootstrapped and tested on aarch64. > > Ok for trunk? > > Thanks, > Kyrill > > > 2016-02-25 Kyrylo Tkachov > > PR target/69613 > * config/aarch64/aarch64.c (aarch64_shift_truncation_mask): > Return 0 if !SHIFT_COUNT_TRUNCATED > > 2016-02-25 Kyrylo Tkachov > > PR target/69613 > * gcc.dg/torture/pr69613.c: New test. > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT > aarch64_shift_truncation_mask (machine_mode mode) > { >return > -(aarch64_vector_mode_supported_p (mode) > +(!SHIFT_COUNT_TRUNCATED > + || aarch64_vector_mode_supported_p (mode) > || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - > 1); > } > > diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c > b/gcc/testsuite/gcc.dg/torture/pr69613.c > new file mode 100644 > index > ..44f2b0cc91ac4b12c4d255b608f95bc8bf016923 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c > @@ -0,0 +1
Re: [PATCH][AArch64] Set TREE_TARGET_GLOBALS in aarch64_set_current_function when new tree is the default node to recalculate optab availability
On Thu, Feb 25, 2016 at 11:04:21AM +, Kyrill Tkachov wrote: > Hi all, > > Seems like aarch64 is suffering from something similar to PR 69245 as well. > If a target pragma sets the target state to the same as the > target_option_default_node the node is just a pointer to > target_option_default_node rather than a distinct identical node. So we must > still restore the target globals even when setting to > target_option_default_node in order to force the midend to recompute the > availability of various optabs. > > If we don't do it, we can get in a problem like in the testcase where the > isa_flags are all set correctly, but the optab HAVE_* predicates have not > been recomputed. > > There is also a related issue present when popping/resetting target pragmas > for which I'll send out a patch separately. > > Bootstrapped and tested on aarch64. > > Ok for trunk? OK. Thanks, James > Thanks, > Kyrill > > > 2016-02-25 Kyrylo Tkachov > > PR target/69245 > * config/aarch64/aarch64.c (aarch64_set_current_function): Save/restore > target globals when switching to target_option_default_node. > > 2016-02-25 Kyrylo Tkachov > > PR target/69245 > * gcc.target/aarch64/pr69245_1.c: New test.
[omp, hsa] Do not gridify simd constructs
Hi, unfortunately, I have missed two execution failures when using HSA (teams-6.f90 and target1.f90), both of which are caused by not handling the simd part of combined target teams distribute parallel for simd construct. I have not really thought through how exactly should GPUs teat the simd construct in a combined construct but of course we must not miscompile. So I'd like to commit the following patch which just disallows gridification of such cases to trunk now and am working on a fix that removes the simd loop for the hsa branch. Eventually we might also want to hsa-vectorize the body even though the explicit loop is missing, but that will also mean that the HSA grid size has to shrink appropriately. Bootstrapped and tested on x86_64-linux, with and without HSA enabled. OK for trunk? Thanks, Martin 2016-02-17 Martin Jambor * omp-low.c (grid_find_ungridifiable_statement): Store problematic statements to wi->info. Also disallow omp simd constructs. (grid_target_follows_gridifiable_pattern): Use wi.info to dump reason for not gridifying. Dump special string for omp_for. --- gcc/omp-low.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index fcbb3e0..989d03e 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -17241,7 +17241,7 @@ grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc, static tree grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi, bool *handled_ops_p, - struct walk_stmt_info *) + struct walk_stmt_info *wi) { *handled_ops_p = false; gimple *stmt = gsi_stmt (*gsi); @@ -17251,6 +17251,7 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi, if (gimple_call_noreturn_p (as_a (stmt))) { *handled_ops_p = true; + wi->info = stmt; return error_mark_node; } break; @@ -17266,8 +17267,19 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi, case GIMPLE_OMP_TARGET: case GIMPLE_OMP_ORDERED: *handled_ops_p = true; + wi->info = stmt; return error_mark_node; +case GIMPLE_OMP_FOR: + if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD) + && gimple_omp_for_combined_into_p (stmt)) + { + *handled_ops_p = true; + wi->info = stmt; + return error_mark_node; + } + break; + default: break; } @@ -17509,10 +17521,11 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p struct walk_stmt_info wi; memset (&wi, 0, sizeof (wi)); - if (gimple *bad = walk_gimple_seq (gimple_omp_body (gfor), -grid_find_ungridifiable_statement, -NULL, &wi)) + if (walk_gimple_seq (gimple_omp_body (gfor), + grid_find_ungridifiable_statement, + NULL, &wi)) { + gimple *bad = (gimple *) wi.info; if (dump_enabled_p ()) { if (is_gimple_call (bad)) @@ -17520,6 +17533,11 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p "Will not turn target construct into a gridified " " GPGPU kernel because the inner loop contains " "call to a noreturn function\n"); + if (gimple_code (bad) == GIMPLE_OMP_FOR) + dump_printf_loc (MSG_NOTE, tloc, +"Will not turn target construct into a gridified " +" GPGPU kernel because the inner loop contains " +"a simd construct\n"); else dump_printf_loc (MSG_NOTE, tloc, "Will not turn target construct into a gridified " -- 2.7.1
[hsa] Fail in presence of atomic operations in private segment
Hi, the HSA does not allow atomic instructions operating on the private segment because they are quite pointless. In the long term, we should put addressable local variables in the global memory somewhere (I'm going to look at the nvidia stack effort soon). But for now we need to fail in the compiler, because otherwise there is testcase in the libgomp testsuite that complains about the finalizer errors in the output of the execute test. Bootstrapped and tested, I'll commit it to trunk shortly. Thanks, Martin 2016-02-17 Martin Jambor * hsa-gen.c (gen_hsa_ternary_atomic_for_builtin): Fail in presence of atomic operations in private segment. --- gcc/hsa-gen.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index 28e8b6f..717e755 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -4557,8 +4557,13 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig, hsa_op_address *addr; addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb); - /* TODO: Warn if addr has private segment, because the finalizer will not - accept that (and it does not make much sense). */ + if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE) +{ + HSA_SORRY_AT (gimple_location (stmt), + "HSA does not implement atomic operations in private " + "segment"); + return; +} hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb); -- 2.7.1
[hsa] Satisfy conditional move operand type constrains
Hi, It turned out that we got the types of conditional move operands and their immediate operands wrong in, partly because we missed a detail in the specs, partly because the finalizer/verifier is even stricter in one aspect. I'll commit the bootstrapped and tested fix below shortly. Thanks, Martin 2016-02-17 Martin Jambor * hsa.h (is_a_helper): New overload for hsa_op_immed for hsa_op_with_type operands. (hsa_unsigned_type_for_type): Declare. * hsa.c (hsa_unsigned_type_for_type): New function. * hsa-gen.c (gen_hsa_binary_operation): Use hsa_unsigned_type_for_type. (gen_hsa_insns_for_operation_assignment): Satisfy constrains of the finalizer. Do not emit extra move. --- gcc/hsa-gen.c | 28 +++- gcc/hsa.c | 8 gcc/hsa.h | 12 3 files changed, 31 insertions(+), 17 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index 717e755..a7dc220 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3022,7 +3022,7 @@ gen_hsa_binary_operation (int opcode, hsa_op_reg *dest, && is_a (op2)) { hsa_op_immed *i = dyn_cast (op2); - i->set_type (hsa_uint_for_bitsize (hsa_type_bit_size (i->m_type))); + i->set_type (hsa_unsigned_type_for_type (i->m_type)); } hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->m_type, dest, @@ -3233,27 +3233,21 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb) ctrl = r; } - hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); - hsa_op_with_type *rhs3_reg = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - - BrigType16_t btype = hsa_bittype_for_type (dest->m_type); - hsa_op_reg *tmp = new hsa_op_reg (btype); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - rhs2_reg->m_type = btype; - rhs3_reg->m_type = btype; + BrigType16_t utype = hsa_unsigned_type_for_type (dest->m_type); + if (is_a (op2)) + op2->m_type = utype; + if (is_a (op3)) + op3->m_type = utype; hsa_insn_basic *insn - = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, tmp->m_type, tmp, ctrl, - rhs2_reg, rhs3_reg); + = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, + hsa_bittype_for_type (dest->m_type), + dest, ctrl, op2, op3); hbb->append_insn (insn); - - /* As operands of a CMOV insn must be Bx types, we have to emit - a conversion insn. */ - hsa_insn_basic *mov = new hsa_insn_basic (2, BRIG_OPCODE_MOV, - dest->m_type, dest, tmp); - hbb->append_insn (mov); - return; } case COMPLEX_EXPR: diff --git a/gcc/hsa.c b/gcc/hsa.c index f0b3205..9537e29 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -472,6 +472,14 @@ hsa_bittype_for_type (BrigType16_t t) return hsa_bittype_for_bitsize (hsa_type_bit_size (t)); } +/* Return HSA unsigned integer type with the same size as the type T. */ + +BrigType16_t +hsa_unsigned_type_for_type (BrigType16_t t) +{ + return hsa_uint_for_bitsize (hsa_type_bit_size (t)); +} + /* Return true if and only if TYPE is a floating point number type. */ bool diff --git a/gcc/hsa.h b/gcc/hsa.h index f0436f3..275a954 100644 --- a/gcc/hsa.h +++ b/gcc/hsa.h @@ -199,6 +199,17 @@ is_a_helper ::test (hsa_op_base *p) return p->m_kind == BRIG_KIND_OPERAND_CONSTANT_BYTES; } +/* Likewise, but for a more specified base. */ + +template <> +template <> +inline bool +is_a_helper ::test (hsa_op_with_type *p) +{ + return p->m_kind == BRIG_KIND_OPERAND_CONSTANT_BYTES; +} + + /* HSA register operand. */ class hsa_op_reg : public hsa_op_with_type @@ -1326,6 +1337,7 @@ BrigType16_t hsa_bittype_for_bitsize (unsigned bitsize); BrigType16_t hsa_uint_for_bitsize (unsigned bitsize); BrigType16_t hsa_float_for_bitsize (unsigned bitsize); BrigType16_t hsa_bittype_for_type (BrigType16_t t); +BrigType16_t hsa_unsigned_type_for_type (BrigType16_t t); bool hsa_type_float_p (BrigType16_t type); bool hsa_type_integer_p (BrigType16_t type); bool hsa_btype_p (BrigType16_t type); -- 2.7.1
[hsa/69674] Make testsuite libgomp.c/for-3.c compile with -m32
Hi, the following wrong decision about HSA pointer type leads to hitting an assert later in the out-of-ssa phase for 32bit targets. I have not attempted to test HSA on a 32bit i686 yet, I believe there is no functional runtime (although it should be possible to come up with one) and we do not build libgomp plugin for it, but we should not ICE and the new code below is the correct way of doing things anyway. So I will commit it shortly, it has been included in a bootstrap/testsuite run. Thanks, Martin 2016-02-18 Martin Jambor pr hsa/69674 * hsa-gen.c (gen_hsa_phi_from_gimple_phi): Use proper hsa type for pointers. (gen_hsa_addr): Allow integer constants in TMR_INDEX2. --- gcc/hsa-gen.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index a7dc220..a295fda 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -2105,9 +2105,17 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOST_WIDE_INT *output_bitsize = NULL, } if (TMR_INDEX2 (ref)) { - hsa_op_base *disp2 = hsa_cfun->reg_for_gimple_ssa - (TMR_INDEX2 (ref))->get_in_type (addrtype, hbb); - reg = add_addr_regs_if_needed (reg, as_a (disp2), hbb); + if (TREE_CODE (TMR_INDEX2 (ref)) == SSA_NAME) + { + hsa_op_base *disp2 = hsa_cfun->reg_for_gimple_ssa + (TMR_INDEX2 (ref))->get_in_type (addrtype, hbb); + reg = add_addr_regs_if_needed (reg, as_a (disp2), +hbb); + } + else if (TREE_CODE (TMR_INDEX2 (ref)) == INTEGER_CST) + offset += wi::to_offset (TMR_INDEX2 (ref)); + else + gcc_unreachable (); } offset += wi::to_offset (TMR_OFFSET (ref)); break; @@ -5329,7 +5337,8 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb) hsa_op_address *addr = gen_hsa_addr (TREE_OPERAND (op, 0), hbb_src); - hsa_op_reg *dest = new hsa_op_reg (BRIG_TYPE_U64); + hsa_op_reg *dest + = new hsa_op_reg (hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT)); hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_LDA, BRIG_TYPE_U64, dest, addr); -- 2.7.1
[hsa/69568] Fix ld instruction type for packed data
Hi, this is a fix for another type of HSA immediate operands that we got wrong. Apparently, the immediate value that is to be used in an operation on packed type must be unsigned (unless it is 128bit, then it has to be bit-typed, which is however something we are handling correctly even now). The patch below fulfills that requirement. I have bootstrapped and tested it and will commit it shortly. Thanks, Martin 2016-02-18 Martin Jambor PR hsa/69568 * hsa.h (hsa_type_packed_p): Declare. * hsa.c (hsa_type_packed_p): New function. * hsa-gen.c (mem_type_for_type): Use unsigned type for packed loads. (gen_hsa_insns_for_store): Use hsa_type_packed_p. * hsa-brig.c (emit_basic_insn): Likewise. --- gcc/hsa-brig.c | 2 +- gcc/hsa-gen.c | 6 -- gcc/hsa.c | 8 gcc/hsa.h | 1 + 4 files changed, 14 insertions(+), 3 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index cfbac58..61cfd8b 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -1803,7 +1803,7 @@ emit_basic_insn (hsa_insn_basic *insn) repr.base.type = lendian16 (type); repr.base.operands = lendian32 (emit_insn_operands (insn)); - if ((type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE) + if (hsa_type_packed_p (type)) { if (hsa_type_float_p (type) && !hsa_opcode_floating_bit_insn_p (insn->m_opcode)) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index a295fda..3b915cc 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -754,11 +754,13 @@ mem_type_for_type (BrigType16_t type) unsigned type?). */ if ((type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_128) return BRIG_TYPE_B128; - else if (hsa_btype_p (type)) + else if (hsa_btype_p (type) || hsa_type_packed_p (type)) { unsigned bitsize = hsa_type_bit_size (type); if (bitsize < 128) return hsa_uint_for_bitsize (bitsize); + else + return hsa_bittype_for_bitsize (bitsize); } return type; } @@ -2648,7 +2650,7 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb) we can modify the above in place. */ if (hsa_op_immed *imm = dyn_cast (src)) { - if ((imm->m_type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_NONE) + if (!hsa_type_packed_p (imm->m_type)) imm->m_type = mem->m_type; else { diff --git a/gcc/hsa.c b/gcc/hsa.c index 9537e29..09bfd28 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -480,6 +480,14 @@ hsa_unsigned_type_for_type (BrigType16_t t) return hsa_uint_for_bitsize (hsa_type_bit_size (t)); } +/* Return true if TYPE is a packed HSA type. */ + +bool +hsa_type_packed_p (BrigType16_t type) +{ + return (type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE; +} + /* Return true if and only if TYPE is a floating point number type. */ bool diff --git a/gcc/hsa.h b/gcc/hsa.h index 275a954..3a13fbd 100644 --- a/gcc/hsa.h +++ b/gcc/hsa.h @@ -1338,6 +1338,7 @@ BrigType16_t hsa_uint_for_bitsize (unsigned bitsize); BrigType16_t hsa_float_for_bitsize (unsigned bitsize); BrigType16_t hsa_bittype_for_type (BrigType16_t t); BrigType16_t hsa_unsigned_type_for_type (BrigType16_t t); +bool hsa_type_packed_p (BrigType16_t type); bool hsa_type_float_p (BrigType16_t type); bool hsa_type_integer_p (BrigType16_t type); bool hsa_btype_p (BrigType16_t type); -- 2.7.1
[hsa, testsuite] Gridification tests
Hi, the patch below adds a DejaGNU effective target predicate (is that the correct dejagnu term?) offload_hsa so that selected tests can be run only if the hsa offloading is enabled. I hope it is fairly standard stuff. Additionally, it adds one C/C++ and one Fortran testsuite to check that gridification happens. Tested, both with and without HSA enabled. OK for trunk? Thanks, Martin 2016-02-10 Martin Jambor * target-supports.exp (check_effective_target_offload_hsa): New. * c-c++-common/gomp/gridify-1.c: New test. * gfortran.dg/gomp/gridify-1.f90: Likewise. --- gcc/testsuite/c-c++-common/gomp/gridify-1.c | 54 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 | 16 + gcc/testsuite/lib/target-supports.exp| 8 + 3 files changed, 78 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-1.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c b/gcc/testsuite/c-c++-common/gomp/gridify-1.c new file mode 100644 index 000..ba7a866 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target offload_hsa } */ +/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */ + +void +foo1 (int n, int *a, int workgroup_size) +{ + int i; +#pragma omp target +#pragma omp teams thread_limit(workgroup_size) +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) +for (i = 0; i < n; i++) + a[i]++; +} + +void +foo2 (int j, int n, int *a) +{ + int i; +#pragma omp target teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) +for (i = j + 1; i < n; i++) + a[i] = i; +} + +void +foo3 (int j, int n, int *a) +{ + int i; +#pragma omp target teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) + for (i = j + 1; i < n; i += 3) +a[i] = i; +} + +void +foo4 (int j, int n, int *a) +{ +#pragma omp parallel + { +#pragma omp single +{ + int i; +#pragma omp target +#pragma omp teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) + for (i = j + 1; i < n; i += 3) + a[i] = i; +} + } +} + + +/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified GPGPU kernel" 4 "omplower" } } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 new file mode 100644 index 000..00ff7f5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-require-effective-target offload_hsa } +! { dg-options "-fopenmp -fdump-tree-omplower-details" } */ + +subroutine vector_square(n, a, b) + integer i, n, b(n), a(n) +!$omp target teams +!$omp distribute parallel do + do i=1,n + b(i) = a(i) * a(i) + enddo +!$omp end distribute parallel do +!$omp end target teams +end subroutine vector_square + +! { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } } diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 0b4252f..fac4c3c 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6936,3 +6936,11 @@ proc check_effective_target_offload_nvptx { } { int main () {return 0;} } "-foffload=nvptx-none" ] } + +# Return 1 if the compiler has been configured with hsa offloading. + +proc check_effective_target_offload_hsa { } { +return [check_no_compiler_messages offload_hsa assembly { + int main () {return 0;} +} "-foffload=hsa" ] +} -- 2.7.1
[hsa, testsuite] Suppress hsa warnings in compiler gomp tests
Hi, compilation of (some) target constructs in the following testcases fails and the warning the compiler gives out by default are then reported as excess errors. There are two options how to deal with them. Either we can change the gomp.exp files to pass -Wno-hsa as the default set of options and then override it in the tests which we want to be compiled also to HSAIL, or disable it explicitely only in the tests where we know about the warnings. Both approaches work. I have selected the second option so that if we regress in the sense that we fail to produce an HSAIL for a test for which we could do it earlier, I can detect it and examine the issue. The explicit -Wno-hsa options are added by the patch below, which has been tested both with and without HSA. OK for trunk? Thanks, Martin 2016-02-12 Martin Jambor testsuite/ * c-c++-common/gomp/clauses-1.c: Add -Wno-hsa option. * c-c++-common/gomp/if-1.c: Likewise. * c-c++-common/gomp/nesting-1.c: Likewise. * c-c++-common/gomp/nesting-warn-1.c: Likewise. * c-c++-common/gomp/pr61486-2.c: Likewise. * c-c++-common/gomp/target-teams-1.c: Likewise. * g++.dg/gomp/target-teams-1.C: Likewise. * gcc.dg/gomp/pr68128-2.c: Likewise. * gfortran.dg/gomp/target1.f90: Likewise. * gfortran.dg/gomp/target2.f90: Likewise. * gfortran.dg/gomp/target3.f90: Likewise. --- gcc/testsuite/c-c++-common/gomp/clauses-1.c | 2 +- gcc/testsuite/c-c++-common/gomp/if-1.c | 2 +- gcc/testsuite/c-c++-common/gomp/nesting-1.c | 2 ++ gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c | 2 ++ gcc/testsuite/c-c++-common/gomp/pr61486-2.c | 2 +- gcc/testsuite/c-c++-common/gomp/target-teams-1.c | 2 +- gcc/testsuite/g++.dg/gomp/target-teams-1.C | 2 +- gcc/testsuite/gcc.dg/gomp/pr68128-2.c| 2 +- gcc/testsuite/gfortran.dg/gomp/target1.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/target2.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/target3.f90 | 2 +- 11 files changed, 13 insertions(+), 9 deletions(-) diff --git a/gcc/testsuite/c-c++-common/gomp/clauses-1.c b/gcc/testsuite/c-c++-common/gomp/clauses-1.c index 2d1c352..8eb514d 100644 --- a/gcc/testsuite/c-c++-common/gomp/clauses-1.c +++ b/gcc/testsuite/c-c++-common/gomp/clauses-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ +/* { dg-options "-fopenmp -Wno-hsa" } */ /* { dg-additional-options "-std=c99" { target c } } */ int t; diff --git a/gcc/testsuite/c-c++-common/gomp/if-1.c b/gcc/testsuite/c-c++-common/gomp/if-1.c index 4ba708c..3a2862f 100644 --- a/gcc/testsuite/c-c++-common/gomp/if-1.c +++ b/gcc/testsuite/c-c++-common/gomp/if-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ +/* { dg-options "-fopenmp -Wno-hsa" } */ void foo (int a, int b, int *p, int *q) diff --git a/gcc/testsuite/c-c++-common/gomp/nesting-1.c b/gcc/testsuite/c-c++-common/gomp/nesting-1.c index 61b2f81..395f9d7 100644 --- a/gcc/testsuite/c-c++-common/gomp/nesting-1.c +++ b/gcc/testsuite/c-c++-common/gomp/nesting-1.c @@ -1,3 +1,5 @@ +/* { dg-options "-fopenmp -Wno-hsa" } */ + extern int i; void diff --git a/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c b/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c index 800ad53..30d037b 100644 --- a/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c +++ b/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c @@ -1,3 +1,5 @@ +/* { dg-options "-fopenmp -Wno-hsa" } */ + extern int i; void diff --git a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c index db97143..009e224 100644 --- a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c +++ b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c @@ -1,6 +1,6 @@ /* PR middle-end/61486 */ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ +/* { dg-options "-fopenmp -Wno-hsa" } */ /* { dg-require-effective-target alloca } */ #pragma omp declare target diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c index 0a707c2..94c997f 100644 --- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c +++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp -fdump-tree-gimple" } */ +/* { dg-options "-fopenmp -fdump-tree-gimple -Wno-hsa" } */ int v = 6; void bar (int); diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C index 0a97de0..7e2ba39 100644 --- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C +++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-fopenmp -fdump-tree-gimple" } +// { dg-options "-fopenmp -fdump-tree-gimple -Wno-hsa" } int v = 6; void bar (int); diff --git a/gcc/testsuite/gcc.dg/gomp/pr68128-2.c b/gcc/testsuite/gcc.dg/gomp/pr68128-2.c index 58a07e9..6cdb9d4 100644 --- a/gcc/testsuite/gcc.dg/gomp/pr68
[hsa,testsuite] Introduce offload_device_shared_as effective target
Hello, this patch has been written by Keith McDaniel when he was working at AMD (and so it should be covered by their blanket copyright assignment) and adds a libgomp testsuite effective target predicate offload_device_shared_as that allows us to run tests only when target constructs are run on a device with shared memory (this includes the host). Keith included a C++ test to illustrate the use. I have tested this thoroughly, both with and without HSA (enabled or present). OK for trunk? Thanks, Martin 2016-02-10 Keith McDaniel Martin Jambor * testsuite/lib/libgomp.exp (check_effective_target_offload_device_shared_as): New proc. * testsuite/libgomp.c++/declare_target-1.C: New test. --- libgomp/testsuite/lib/libgomp.exp| 13 libgomp/testsuite/libgomp.c++/declare_target-1.C | 38 2 files changed, 51 insertions(+) create mode 100644 libgomp/testsuite/libgomp.c++/declare_target-1.C diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index a4c9d83..154a447 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -343,6 +343,19 @@ proc check_effective_target_offload_device_nonshared_as { } { } } ] } + +# Return 1 if offload device is available and it has shared address space. +proc check_effective_target_offload_device_shared_as { } { +return [check_runtime_nocache offload_device_shared_as { + int main () + { + int x = 10; + #pragma omp target map(to: x) + x++; + return x == 10; + } +} ] +} # Return 1 if at least one nvidia board is present. diff --git a/libgomp/testsuite/libgomp.c++/declare_target-1.C b/libgomp/testsuite/libgomp.c++/declare_target-1.C new file mode 100644 index 000..4394bb1 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/declare_target-1.C @@ -0,0 +1,38 @@ +// { dg-do run } +// { dg-require-effective-target offload_device_shared_as } + +#include + +struct typeX +{ + int a; +}; + +class typeY +{ +public: + int foo () { return a^0x01; } + int a; +}; + +#pragma omp declare target +struct typeX varX; +class typeY varY; +#pragma omp end declare target + +int main () +{ + varX.a = 0; + varY.a = 0; + + #pragma omp target +{ + varX.a = 100; + varY.a = 100; +} + + if (varX.a != 100 || varY.a != 100) +abort (); + + return 0; +} -- 2.7.1
[hsa, testsuite] Suppress hsa warnings in libgomp tests
Hi, just like with the compiler gomp testsuite, we need to add -Wno-hsa to options when compiling libgomp testcases in order not to have "excess errors" failures when HSA is enabled. There are quite many of such testcases on the trunk because I have disabled the dynamic parallelism way of executing stuff. Hopefully we'll be able to revert many of the hunks below when we get that working. The patch has been tested both with and without HSA enabled. OK for trunk? Thanks, Martin 2016-02-12 Martin Jambor * testsuite/libgomp.c++/examples-4/target_data-5.C: Do not generate HSA warnings. * testsuite/libgomp.c++/for-11.C: Likewise. * testsuite/libgomp.c++/for-13.C: Likewise. * testsuite/libgomp.c++/for-14.C: Likewise. * testsuite/libgomp.c++/pr66199-2.C: Likewise. * testsuite/libgomp.c++/pr66199-4.C: Likewise. * testsuite/libgomp.c++/pr66199-5.C: Likewise. * testsuite/libgomp.c++/pr66199-6.C: Likewise. * testsuite/libgomp.c++/pr66199-7.C: Likewise. * testsuite/libgomp.c++/pr66199-8.C: Likewise. * testsuite/libgomp.c++/target-1.C: Likewise. * testsuite/libgomp.c++/target-2.C: Likewise. * testsuite/libgomp.c++/target-3.C: Likewise. * testsuite/libgomp.c++/target-8.C: Likewise. * testsuite/libgomp.c/examples-4/async_target-1.c: Likewise. * testsuite/libgomp.c/examples-4/declare_target-3.c: Likewise. * testsuite/libgomp.c/examples-4/declare_target-4.c: Likewise. * testsuite/libgomp.c/examples-4/declare_target-5.c: Likewise. * testsuite/libgomp.c/examples-4/target-1.c: Likewise. * testsuite/libgomp.c/examples-4/target-2.c: Likewise. * testsuite/libgomp.c/examples-4/target-3.c: Likewise. * testsuite/libgomp.c/examples-4/target-4.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-1.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-2.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-3.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-4.c: Likewise. * testsuite/libgomp.c/examples-4/target_update-1.c: Likewise. * testsuite/libgomp.c/examples-4/target_update-2.c: Likewise. * testsuite/libgomp.c/examples-4/teams-2.c: Likewise. * testsuite/libgomp.c/examples-4/teams-3.c: Likewise. * testsuite/libgomp.c/examples-4/teams-4.c: Likewise. * testsuite/libgomp.c/examples-4/teams-6.c: Likewise. * testsuite/libgomp.c/for-3.c: Likewise. * testsuite/libgomp.c/for-5.c: Likewise. * testsuite/libgomp.c/for-6.c: Likewise. * testsuite/libgomp.c/pr66199-2.c: Likewise. * testsuite/libgomp.c/pr66199-4.c: Likewise. * testsuite/libgomp.c/pr66199-5.c: Likewise. * testsuite/libgomp.c/pr66199-6.c: Likewise. * testsuite/libgomp.c/pr66199-7.c: Likewise. * testsuite/libgomp.c/pr66199-8.c: Likewise. * testsuite/libgomp.c/pr66714.c: Likewise. * testsuite/libgomp.c/target-1.c: Likewise. * testsuite/libgomp.c/target-16.c: Likewise. * testsuite/libgomp.c/target-2.c: Likewise. * testsuite/libgomp.c/target-31.c: Likewise. * testsuite/libgomp.c/target-32.c: Likewise. * testsuite/libgomp.c/target-35.c: Likewise. * testsuite/libgomp.c/target-5.c: Likewise. * testsuite/libgomp.c/target-6.c: Likewise. * testsuite/libgomp.c/target-critical-1.c: Likewise. * testsuite/libgomp.c/target-teams-1.c: Likewise. * testsuite/libgomp.c/thread-limit-2.c: Likewise. * testsuite/libgomp.c/thread-limit-3.c: Likewise. * testsuite/libgomp.fortran/examples-4/async_target-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/declare_target-3.f90: Likewise. * testsuite/libgomp.fortran/examples-4/declare_target-4.f90: Likewise. * testsuite/libgomp.fortran/examples-4/declare_target-5.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-3.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-4.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-3.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-4.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-5.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_update-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_update-2.f90: Likewise. * testsuite/libgom
[hsa,testsuite] Adjust libgomp tests that do not work on host fallback
Hi, this patch avoids run-time failures in libgomp testsuite that curtrently happen when HSA offloading is actually used. All of them currently require the offload_device effective target which the patch changes to offload_device_nonshared_as one. For some tests, such as libgomp.c/examples-4/device-1.c this is really the correct thing to do because the test explicitely checks that changes that happen in a target construct and are not mapped back are not observable on the host. However, the majority of the tests has a different problem. The test for some reason is not compiled into HSAIL (usually because it would require the dynamic parallelism path which is disabled or because it calls abort from within target which HSA so far cannot handle) and so the host fallback is called, even though the test actually is not supposed to be called on it. The tests then call omp_is_initial_device to verify they are not running on the host and decide to fail. Changing the effective target only to devices with non-shared memory probably isn't the really correct fix. We basically want to disable the host fallback for them regardeless of address spaces but I cannot think of a simple and generic way of doing that. However, all testcases for non-shared memory devices were written with disallowed fallback in mind and so this soulution also gives the desired result. Tested both with and without HSA (enabled or present). OK for trunk? Thanks, Martin 2016-02-12 Martin Jambor libgomp/ * testsuite/libgomp.c/examples-4/async_target-2.c: Only run on non-shared memory accelerators. * testsuite/libgomp.c/examples-4/device-1.c: Likewise. * testsuite/libgomp.c/examples-4/target-5.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-6.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-7.c: Likewise. * testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise. --- libgomp/testsuite/libgomp.c/examples-4/async_target-2.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/device-1.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target-5.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target_data-6.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target_data-7.c | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/async_target-2.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target-5.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target_data-6.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target_data-7.f90 | 2 +- 10 files changed, 10 insertions(+), 10 deletions(-) diff --git a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c index ce63328..0c76f8e 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c +++ b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/device-1.c b/libgomp/testsuite/libgomp.c/examples-4/device-1.c index dad8572..46aa160 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/device-1.c +++ b/libgomp/testsuite/libgomp.c/examples-4/device-1.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/target-5.c b/libgomp/testsuite/libgomp.c/examples-4/target-5.c index 1853fba..1c14bae 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/target-5.c +++ b/libgomp/testsuite/libgomp.c/examples-4/target-5.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c index affeb49..57c7c0c 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c +++ b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-7.c b/libgomp/testsuite/libgomp.c/examples-4/target_data-7.c index c18d480..8ec41ea 100644 --- a/libgomp/
[hsa,testsuite] New directory for HSA-specific C testcases
Hi, we would like a place to have some HSA-specific tests, which would only run not only when HSA is enabled at configuration time but also when HSA hardware is present and used for offloading. The only way to detect that situation I could think of is to run a simple kernel with environment variable HSA_DEBUG set and look HSA libgomp plugin debug output in the stderr stream. So I started with a copy of proc check_runtime_nocache and repurposed it for this task and then guarded any execution of tests by that predicate in the new c.exp file. The new directory is in the libgomp testsuite because (at least now) there is no other way of running hsa stuff but through libgomp. All tests there are run twice, at -O0 and -O2, because we have found that both levels are very useful for detecting errors in the lowering to HSA. I have very little experience with tcl, expect or DejaGNU and would appreciate very much any feedback or guidance of anyone more experience in these areas. In particular, it would probably be better to structure the new predicate as an effective-target one but I am not sure what amount of caching would I have to do manually for such a new runtime test. After I incorporate all feedback, I would of course like to commit this to trunk. Needless to say, the patch has been tested and works. Thanks, Martin 2016-02-22 Martin Jambor Martin Liska * testsuite/lib/libgomp.exp (check_hsa_offloading_available): New. * testsuite/libgomp.hsa.c/c.exp: Likewise. * testsuite/libgomp.hsa.c/alloca-1.c: Likewise. * testsuite/libgomp.hsa.c/bitfield-1.c: Likewise. * testsuite/libgomp.hsa.c/builtins-1.c: Likewise. * testsuite/libgomp.hsa.c/complex-1.c: Likewise. * testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise. * testsuite/libgomp.hsa.c/function-call-1.c: Likewise. * testsuite/libgomp.hsa.c/get-level-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-2.c: Likewise. * testsuite/libgomp.hsa.c/gridify-3.c: Likewise. * testsuite/libgomp.hsa.c/gridify-4.c: Likewise. * testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise. * testsuite/libgomp.hsa.c/pr69568.c: Likewise. * testsuite/libgomp.hsa.c/rotate-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise. --- libgomp/testsuite/lib/libgomp.exp | 44 ++ libgomp/testsuite/libgomp.hsa.c/alloca-1.c | 25 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c | 160 + libgomp/testsuite/libgomp.hsa.c/builtins-1.c | 97 + libgomp/testsuite/libgomp.hsa.c/c.exp | 41 ++ libgomp/testsuite/libgomp.hsa.c/complex-1.c| 65 + .../testsuite/libgomp.hsa.c/formal-actual-args-1.c | 83 +++ libgomp/testsuite/libgomp.hsa.c/function-call-1.c | 50 +++ libgomp/testsuite/libgomp.hsa.c/get-level-1.c | 26 libgomp/testsuite/libgomp.hsa.c/gridify-1.c| 26 libgomp/testsuite/libgomp.hsa.c/gridify-2.c| 26 libgomp/testsuite/libgomp.hsa.c/gridify-3.c| 39 + libgomp/testsuite/libgomp.hsa.c/gridify-4.c| 45 ++ .../testsuite/libgomp.hsa.c/memory-operations-1.c | 92 libgomp/testsuite/libgomp.hsa.c/pr69568.c | 41 ++ libgomp/testsuite/libgomp.hsa.c/rotate-1.c | 39 + libgomp/testsuite/libgomp.hsa.c/switch-1.c | 145 +++ libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c | 116 +++ 18 files changed, 1160 insertions(+) create mode 100644 libgomp/testsuite/libgomp.hsa.c/alloca-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/builtins-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/c.exp create mode 100644 libgomp/testsuite/libgomp.hsa.c/complex-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/formal-actual-args-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/function-call-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/get-level-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-2.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-3.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-4.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/memory-operations-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/pr69568.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/rotate-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index 154a447..1c917c8 100644 --- a/libgomp/testsuite/lib/libgomp.exp +
[PATCH] Fix powerpc shift/rotate/mask insn handling (PR target/69946)
Hi! Segher has added last year a few routines for the shift/rotate + mask patterns, insns always have one predicate which tests if PowerPC supports such pattern, and another that emits the instruction for it. The testcase in the patch is miscompiled, we end up with an instruction with out of bound shift count, because there is disagreement between the analysis phase, which does some changes (changes shifts by 0 into rotate and some rotates into left or right shifts (for the last one with changed shift count)), but those changes are only done virtually, the predicate can't change the instruction, it either accepts it or rejects it. Then during output, we don't perform those changes, treat it as rotate or left or right shift just based on what the actual IL has. In some cases it is harmless, in other cases, as the testcase shows, it is harmful. Also, I've noticed that the preparation phase of both rs6000_is_valid_shift_mask and rs6000_is_valid_insert_mask is pretty much the same (the only difference is that the latter requires the shift/rotate count to be always CONST_INT, while the former allows also a REG in there). So, what the following patch does is that it moves the preparation from the predicate functions to a new static inline helper function, and then uses that function in both the predicate functions, and also in both the output functions. Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk? 2016-02-26 Jakub Jelinek PR target/69946 * config/rs6000/rs6000.c (rs6000_is_valid_shift_mask_1): New function. (rs6000_is_valid_shift_mask, rs6000_is_valid_insert_mask): Use it. (rs6000_insn_for_shift_mask, rs6000_insn_for_insert_mask): Likewise. Adjust for possible changes of shift/rotate CODE and shift count SH. * gcc.dg/pr69946.c: New test. --- gcc/config/rs6000/rs6000.c.jj 2016-02-24 22:33:32.0 +0100 +++ gcc/config/rs6000/rs6000.c 2016-02-26 12:05:39.489021521 +0100 @@ -17319,42 +17319,60 @@ rs6000_insn_for_and_mask (machine_mode m gcc_unreachable (); } -/* Return whether MASK (a CONST_INT) is a valid mask for any rlw[i]nm, - rld[i]cl, rld[i]cr, or rld[i]c instruction, to implement an AND with - shift SHIFT (a ROTATE, ASHIFT, or LSHIFTRT) in mode MODE. */ +/* Helper for rs6000_is_valid_shift_mask, rs6000_insn_for_shift_mask, + rs6000_is_valid_insert_mask and rs6000_insn_for_insert_mask. */ -bool -rs6000_is_valid_shift_mask (rtx mask, rtx shift, machine_mode mode) +static inline bool +rs6000_is_valid_shift_mask_1 (rtx mask, rtx shift, machine_mode mode, + int *nb, int *ne, int *n, int *sh, + rtx_code *code) { - int nb, ne; - - if (!rs6000_is_valid_mask (mask, &nb, &ne, mode)) + if (!rs6000_is_valid_mask (mask, nb, ne, mode)) return false; - int n = GET_MODE_PRECISION (mode); - int sh = -1; + *n = GET_MODE_PRECISION (mode); + *sh = -1; if (CONST_INT_P (XEXP (shift, 1))) { - sh = INTVAL (XEXP (shift, 1)); - if (sh < 0 || sh >= n) + *sh = INTVAL (XEXP (shift, 1)); + if (*sh < 0 || *sh >= *n) return false; } - rtx_code code = GET_CODE (shift); + *code = GET_CODE (shift); /* Convert any shift by 0 to a rotate, to simplify below code. */ - if (sh == 0) -code = ROTATE; + if (*sh == 0) +*code = ROTATE; /* Convert rotate to simple shift if we can, to make analysis simpler. */ - if (code == ROTATE && sh >= 0 && nb >= ne && ne >= sh) -code = ASHIFT; - if (code == ROTATE && sh >= 0 && nb >= ne && nb < sh) + if (*code == ROTATE && *sh >= 0 && *nb >= *ne) { - code = LSHIFTRT; - sh = n - sh; + if (*ne >= *sh) + *code = ASHIFT; + else if (*nb < *sh) + { + *code = LSHIFTRT; + *sh = *n - *sh; + } } + return true; +} + +/* Return whether MASK (a CONST_INT) is a valid mask for any rlw[i]nm, + rld[i]cl, rld[i]cr, or rld[i]c instruction, to implement an AND with + shift SHIFT (a ROTATE, ASHIFT, or LSHIFTRT) in mode MODE. */ + +bool +rs6000_is_valid_shift_mask (rtx mask, rtx shift, machine_mode mode) +{ + int nb, ne, n, sh; + rtx_code code; + + if (!rs6000_is_valid_shift_mask_1 (mask, shift, mode, &nb, &ne, +&n, &sh, &code)) +return false; /* DImode rotates need rld*. */ if (mode == DImode && code == ROTATE) @@ -17398,15 +17416,20 @@ rs6000_is_valid_shift_mask (rtx mask, rt const char * rs6000_insn_for_shift_mask (machine_mode mode, rtx *operands, bool dot) { - int nb, ne; + int nb, ne, n, sh; + rtx_code code; - if (!rs6000_is_valid_mask (operands[3], &nb, &ne, mode)) + if (!rs6000_is_valid_shift_mask_1 (operands[3], operands[4], mode, &nb, &ne, +&n, &sh, &code)) gcc_unreachable (); + if (sh != -1) +operands[2] = GEN_INT (sh); + if (mode == DImode && ne == 0) { - if (GET_CODE (operands[4])
Re: [omp, hsa] Do not gridify simd constructs
On Fri, Feb 26, 2016 at 04:57:19PM +0100, Martin Jambor wrote: > unfortunately, I have missed two execution failures when using HSA > (teams-6.f90 and target1.f90), both of which are caused by not > handling the simd part of combined target teams distribute parallel > for simd construct. I have not really thought through how exactly > should GPUs teat the simd construct in a combined construct but of > course we must not miscompile. > > So I'd like to commit the following patch which just disallows > gridification of such cases to trunk now and am working on a fix that > removes the simd loop for the hsa branch. Eventually we might also > want to hsa-vectorize the body even though the explicit loop is > missing, but that will also mean that the HSA grid size has to shrink > appropriately. > > Bootstrapped and tested on x86_64-linux, with and without HSA > enabled. OK for trunk? > > Thanks, > > Martin > > 2016-02-17 Martin Jambor > > * omp-low.c (grid_find_ungridifiable_statement): Store problematic > statements to wi->info. Also disallow omp simd constructs. > (grid_target_follows_gridifiable_pattern): Use wi.info to dump reason > for not gridifying. Dump special string for omp_for. Ok (though hope that for GCC 7 you actually handle it better). Jakub
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
On Fri, Feb 26, 2016 at 04:59:57PM +0100, Martin Jambor wrote: > just like with the compiler gomp testsuite, we need to add -Wno-hsa to > options when compiling libgomp testcases in order not to have "excess > errors" failures when HSA is enabled. There are quite many of such > testcases on the trunk because I have disabled the dynamic parallelism > way of executing stuff. Hopefully we'll be able to revert many of the > hunks below when we get that working. > > The patch has been tested both with and without HSA enabled. OK for > trunk? I don't like this very much. Couldn't you instead add -Wno-hsa next to -fopenmp in *.exp, and just where you want to explicitly check the hsa warnings, enable it manually in dg-options or dg-additional-options (it would need to be guarded with hsa being enabled etc. anyway). Jakub
Re: [hsa,testsuite] Introduce offload_device_shared_as effective target
On Fri, Feb 26, 2016 at 04:59:42PM +0100, Martin Jambor wrote: > this patch has been written by Keith McDaniel when he was working at > AMD (and so it should be covered by their blanket copyright > assignment) and adds a libgomp testsuite effective target predicate > offload_device_shared_as that allows us to run tests only when target > constructs are run on a device with shared memory (this includes the > host). Keith included a C++ test to illustrate the use. > > I have tested this thoroughly, both with and without HSA (enabled or > present). OK for trunk? > > Thanks, > > Martin > > 2016-02-10 Keith McDaniel > Martin Jambor > > * testsuite/lib/libgomp.exp > (check_effective_target_offload_device_shared_as): New proc. > * testsuite/libgomp.c++/declare_target-1.C: New test. Ok. Jakub
Re: [PATCH 1/3] Add aarch64-*-rtems* target
On Thu, Feb 25, 2016 at 02:49:08PM -0600, Joel Sherrill wrote: > * gcc/config.gcc, libgcc/config.host: Add aarch64-*-rtems*. > * gcc/config/aarch64/rtems.h: New file. OK. Thanks, James > --- > gcc/config.gcc | 11 +-- > gcc/config/aarch64/rtems.h | 28 > libgcc/config.host | 2 +- > 3 files changed, 38 insertions(+), 3 deletions(-) > create mode 100644 gcc/config/aarch64/rtems.h
Re: [PATCH 3/3] contrib/config-list.mk: Add aarch64-rtems and x86_64-rtems
On Thu, Feb 25, 2016 at 02:49:10PM -0600, Joel Sherrill wrote: > * contrib/config-list.mk: Add aarch64-rtems and x86_64-rtems The AArch64 part of this is OK. Thanks, James > --- > contrib/config-list.mk | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-)
[PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL
Hi, my fix for PR 69666 has caused quite a few regressions accross the borad where SRA removed a SSA_NAME which however still was in the IL (and usually stumbled upon it itself straight away). The removal path should not be executed when there is an SSA_NAME on the LHS, the code clearly is not ready for it. Before my patch, we got always lucky because the statement was simply modified elsewhere when the LHS was an SSA_NAME. However, even that was not 10% guaranteed because of the !access_has_replacements_p (racc) part of the changed condition. The patch below fixes the ICEs simply by guarding the removal code to only work when the LHS is not an SSA_NAME. This means that the safe path below it is going to execute. I have bootstrapped and tested the patch on x86_64-linux. I'd like to commit it to trunk as soon as it gets approved and then I'd like to commit it to gcc-5 branch together with the PR 69666 fix a few days afterwards. OK? Thanks, Martin 2016-02-26 Martin Jambor PR middle-end/69920 * tree-sra.c (sra_modify_assign): Do not remove loads of uninitialized aggregates to SSA_NAMEs. testsuite/ * gcc.dg/torture/pr69932.c: New test. * gcc.dg/torture/pr69936.c: Likewise. --- gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++ gcc/testsuite/gcc.dg/torture/pr69936.c | 24 gcc/tree-sra.c | 3 ++- 3 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c b/gcc/testsuite/gcc.dg/torture/pr69932.c new file mode 100644 index 000..4b82130 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr69932.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ + +int a; +void fn1() { + int b = 4; + short c[4]; + c[b] = c[a]; + if (c[2]) {} + +} diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c b/gcc/testsuite/gcc.dg/torture/pr69936.c new file mode 100644 index 000..3023bbb --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr69936.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ + +int a; +char b; +void fn1(int p1) {} + +int fn2() { return 5; } + +void fn3() { + if (fn2()) +; + else { +char c[5]; +c[0] = 5; + lbl_608: +fn1(c[9]); +int d = c[9]; +c[2] | a; +d = c[b]; + } + goto lbl_608; +} + +int main() { return 0; } diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 663ded2..366f413 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) else { if (access_has_children_p (racc) - && !racc->grp_unscalarized_data) + && !racc->grp_unscalarized_data + && TREE_CODE (lhs) != SSA_NAME) { if (dump_file) { -- 2.7.1
Re: [PATCH] Fix powerpc shift/rotate/mask insn handling (PR target/69946)
On Fri, Feb 26, 2016 at 11:02 AM, Jakub Jelinek wrote: > Hi! > > Segher has added last year a few routines for the shift/rotate + mask > patterns, insns always have one predicate which tests if PowerPC supports > such pattern, and another that emits the instruction for it. > > The testcase in the patch is miscompiled, we end up with an instruction > with out of bound shift count, because there is disagreement between the > analysis phase, which does some changes (changes shifts by 0 > into rotate and some rotates into left or right shifts (for the last one > with changed shift count)), but those changes are only done virtually, > the predicate can't change the instruction, it either accepts it or rejects > it. Then during output, we don't perform those changes, treat it as > rotate or left or right shift just based on what the actual IL has. > In some cases it is harmless, in other cases, as the testcase shows, it is > harmful. > > Also, I've noticed that the preparation phase of both > rs6000_is_valid_shift_mask and rs6000_is_valid_insert_mask is pretty much > the same (the only difference is that the latter requires the shift/rotate > count to be always CONST_INT, while the former allows also a REG in there). > > So, what the following patch does is that it moves the preparation > from the predicate functions to a new static inline helper function, > and then uses that function in both the predicate functions, and also > in both the output functions. > > Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk? > > 2016-02-26 Jakub Jelinek > > PR target/69946 > * config/rs6000/rs6000.c (rs6000_is_valid_shift_mask_1): New > function. > (rs6000_is_valid_shift_mask, rs6000_is_valid_insert_mask): Use it. > (rs6000_insn_for_shift_mask, rs6000_insn_for_insert_mask): Likewise. > Adjust for possible changes of shift/rotate CODE and shift count SH. > > * gcc.dg/pr69946.c: New test. Segher's patch from Wednesday should fix this. We can address the duplication separately in GCC 7. Thanks, David
Re: (Non-)offloading diagnostics
Hi! In light of the -Whsa testsuite patches just posted, I think we first need to clarify the general policy questions I posted a month ago: On Tue, 26 Jan 2016 11:46:14 +0100, I wrote: > On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor wrote: > > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote: > > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote: > > > > [...] There are no failing > > > > testcases if HSA is not configured. If it is, there are some, all of > > > > which fall into one the following categories: > > > > > > > > 1) HSA cannot compile a function for one reason or another (most > > > > common cause is inability of HSA to take an address of a function > > > > or make an indirect call) and gives a warning, which is regarded > > > > as an "excess error" by dejagnu. > > Confirmed: > > [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function > 'bar._omp_fn.26.hsa.31': > cc1: warning: could not emit HSAIL for the function [-Whsa] > cc1: note: support for HSA does not implement non-gridified OpenMP > parallel constructs. > [...] > > ..., and many more. So, with --enable-offload-targets=[...],hsa we > regress (PASS -> FAIL; "test for excess errors") such compile tests. > > > > It would be good if there is a -W* switch to turn such warnings off. > > > Not just for the purposes of dejagnu libgomp testing, but say one > > > might try to compile a program primarily say for XeonPhi or PTX > > > offloading, > > > but have HSA enabled to, but care primarily about the former two, etc. > > > > All these warnings are in the -Whsa group and can be suppressed with > > -Wno-hsa. > > These compile tests are done without any -W* flags; -Whsa is enabled by > default. I'm a proponent of enabling as many useful warnings by default, or if not by default, then with -Wall. -Whsa is enabled by default, and has thus set a precedent of doing that. > How to address this mismatch? Put -Wno-has into all regressing > test case files individually? Run the affected testsuites with -Wno-hsa? > Not enable -Whsa by default (but I agree it's useful to users)? > (Instead, enable with -Wall, which any sane user should be specifying?) Even if a bit tedious, my preference actually is to add to the test cases an (expected) dg-warning everywhere where such a non-offloading warning currently triggers, because that's what users will be seeing (with -Whsa enabled by default), and because that will make it obvious (PASS -> FAIL for the warning check) when that warning disappears (say, because the compiler can now offload the respective construct, yay). > A very similar problem also exists for nvptx offloading (Nathan CCed), > where we emit similar warnings (enabled by default). As nvptx offloading > happens during link-time (not compile-time, as with hsa offloading), > these don't affect GCC's compile tests, but need to be worked around in > libgomp test cases. Grüße Thomas
Re: (Non-)offloading diagnostics
Hi, On Fri, Feb 26, 2016 at 05:46:33PM +0100, Thomas Schwinge wrote: > Hi! > > In light of the -Whsa testsuite patches just posted, I think we first > need to clarify the general policy questions I posted a month ago: > > On Tue, 26 Jan 2016 11:46:14 +0100, I wrote: > > On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor wrote: > > > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote: > > > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote: > > > > > [...] There are no failing > > > > > testcases if HSA is not configured. If it is, there are some, all of > > > > > which fall into one the following categories: > > > > > > > > > > 1) HSA cannot compile a function for one reason or another (most > > > > > common cause is inability of HSA to take an address of a function > > > > > or make an indirect call) and gives a warning, which is regarded > > > > > as an "excess error" by dejagnu. > > > > Confirmed: > > > > [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function > > 'bar._omp_fn.26.hsa.31': > > cc1: warning: could not emit HSAIL for the function [-Whsa] > > cc1: note: support for HSA does not implement non-gridified OpenMP > > parallel constructs. > > [...] > > > > ..., and many more. So, with --enable-offload-targets=[...],hsa we > > regress (PASS -> FAIL; "test for excess errors") such compile tests. > > > > > > It would be good if there is a -W* switch to turn such warnings off. > > > > Not just for the purposes of dejagnu libgomp testing, but say one > > > > might try to compile a program primarily say for XeonPhi or PTX > > > > offloading, > > > > but have HSA enabled to, but care primarily about the former two, etc. > > > > > > All these warnings are in the -Whsa group and can be suppressed with > > > -Wno-hsa. > > > > These compile tests are done without any -W* flags; -Whsa is enabled by > > default. > > I'm a proponent of enabling as many useful warnings by default, or if not > by default, then with -Wall. -Whsa is enabled by default, and has thus > set a precedent of doing that. I am not sure I'd go as far as "as many as possible," but in the case of -Whsa, the warnings get emitted only if HSA offloading is configured and especially only if the user used OMP and its target construct. This means that it is relevant only for a rather small class of users and it's not a "your code looks weird" kind of warning but a "the compiler is not doing what you clearly asked for" warning. So that is why we decided to warn unconditionally. But as far as I understand, gcc does not give any promises about warnings, so I believe decisions like a defaultness of a warning can be revisited at any point in the future, for example if people learn not to expect some constructs to be offloaded to GPUs. Moreover, the conventions regarding offloading are still being settled and still will for quite some time so nobody should really expect such details to be set in stone. > > > How to address this mismatch? Put -Wno-has into all regressing > > test case files individually? Run the affected testsuites with -Wno-hsa? > > Not enable -Whsa by default (but I agree it's useful to users)? > > (Instead, enable with -Wall, which any sane user should be specifying?) > > Even if a bit tedious, my preference actually is to add to the test cases > an (expected) dg-warning everywhere where such a non-offloading warning > currently triggers, because that's what users will be seeing (with -Whsa > enabled by default), and because that will make it obvious (PASS -> FAIL > for the warning check) when that warning disappears (say, because the > compiler can now offload the respective construct, yay). That is my opinion as well, except that given the number of warnings now (with dynamic parallelism disabled), I prefer to work on the file granularity. Also, often testcases use macros heavily and putting dg-warning into them is somewhere between weird and outright impossible. On the other hand, as you have probably noticed, Jakub asked me to pass -Wno-hsa to all tests instead so he seems to have the opposing point of view. I must say that I am not really ready to argue about this too much, especially if we have our own HSA testsuite directory. Martin > > > A very similar problem also exists for nvptx offloading (Nathan CCed), > > where we emit similar warnings (enabled by default). As nvptx offloading > > happens during link-time (not compile-time, as with hsa offloading), > > these don't affect GCC's compile tests, but need to be worked around in > > libgomp test cases. > > > Grüße > Thomas
Re: [Fortran, Patch] (Coarrays) Wrong events size
Hi Allessandro, * PING * Looks obvious and simple enough for me. OK. Thanks for the patch! Thomas
[PATCH] update include/plugin-api.h to add hooks for section alignment + size
Hello, I would like to update the gcc plugin API to include interfaces for querying section size and alignment. The intent is to make it easier for plugins to do link-time reordering of .bss/.data/.rodata sections to reduce padding and/or improve cache utilization. I've posted a patch to binutils that includes changes to the gold linker to implement these hooks, you can find details at http://sourceware.org/ml/binutils/2016-02/msg00400.html. The attached patch is to make sure that the gcc plugin-api.h is in sync with the corresponding header in binutils. Questions + comments welcome. Thanks, Than diff --git a/include/plugin-api.h b/include/plugin-api.h index 1bf1750..3ee2e2b 100644 --- a/include/plugin-api.h +++ b/include/plugin-api.h @@ -1,6 +1,6 @@ /* plugin-api.h -- External linker plugin API. */ -/* Copyright (C) 2009-2015 Free Software Foundation, Inc. +/* Copyright (C) 2009-2016 Free Software Foundation, Inc. Written by Cary Coutant . This file is part of binutils. @@ -345,6 +345,26 @@ enum ld_plugin_status const struct ld_plugin_section * section_list, unsigned int num_sections); +/* The linker's interface for retrieving the section alignment requirement + of a specific section in an object. This interface should only be invoked in the + claim_file handler. This function sets *ADDRALIGN to the ELF sh_addralign + value of the input section. */ + +typedef +enum ld_plugin_status +(*ld_plugin_get_input_section_alignment) (const struct ld_plugin_section section, + unsigned int *addralign); + +/* The linker's interface for retrieving the section size of a specific section + in an object. This interface should only be invoked in the claim_file handler. + This function sets *SECSIZE to the ELF sh_size + value of the input section. */ + +typedef +enum ld_plugin_status +(*ld_plugin_get_input_section_size) (const struct ld_plugin_section section, + uint64_t *secsize); + enum ld_plugin_level { LDPL_INFO, @@ -384,7 +404,9 @@ enum ld_plugin_tag LDPT_ALLOW_SECTION_ORDERING = 24, LDPT_GET_SYMBOLS_V2 = 25, LDPT_ALLOW_UNIQUE_SEGMENT_FOR_SECTIONS = 26, - LDPT_UNIQUE_SEGMENT_FOR_SECTIONS = 27 + LDPT_UNIQUE_SEGMENT_FOR_SECTIONS = 27, + LDPT_GET_INPUT_SECTION_ALIGNMENT = 28, + LDPT_GET_INPUT_SECTION_SIZE = 29 }; /* The plugin transfer vector. */ @@ -416,6 +438,8 @@ struct ld_plugin_tv ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; +ld_plugin_get_input_section_alignment tv_get_input_section_alignment; +ld_plugin_get_input_section_size tv_get_input_section_size; } tv_u; };
libgo patch committed: Add some Getsockopt functions to the syscall package
This patch adds some Getsockopt functions to the syscall package. These functions exist in the master library, but never made it into gccgo's syscall package. This fixes GCC PR 69966. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 233670) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -156f5f0152797ac2afe5f23803aeb3c7b8f8418e +3de822d11255d439fac9717897b017aae2de18c2 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/syscall/socket.go === --- libgo/go/syscall/socket.go (revision 232239) +++ libgo/go/syscall/socket.go (working copy) @@ -251,6 +251,13 @@ func GetsockoptIPv6Mreq(fd, level, opt i return &value, err } +func GetsockoptICMPv6Filter(fd, level, opt int) (*ICMPv6Filter, error) { + var value ICMPv6Filter + vallen := Socklen_t(SizeofICMPv6Filter) + err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen) + return &value, err +} + //sys setsockopt(s int, level int, name int, val unsafe.Pointer, vallen Socklen_t) (err error) //setsockopt(s _C_int, level _C_int, optname _C_int, val *byte, vallen Socklen_t) _C_int Index: libgo/go/syscall/socket_bsd.go === --- libgo/go/syscall/socket_bsd.go (revision 232239) +++ libgo/go/syscall/socket_bsd.go (working copy) @@ -80,3 +80,10 @@ func BindToDevice(fd int, device string) func anyToSockaddrOS(rsa *RawSockaddrAny) (Sockaddr, error) { return nil, EAFNOSUPPORT } + +func GetsockoptIPv6MTUInfo(fd, level, opt int) (*IPv6MTUInfo, error) { + var value IPv6MTUInfo + vallen := Socklen_t(SizeofIPv6MTUInfo) + err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen) + return &value, err +} Index: libgo/go/syscall/socket_linux.go === --- libgo/go/syscall/socket_linux.go(revision 232239) +++ libgo/go/syscall/socket_linux.go(working copy) @@ -168,6 +168,20 @@ func anyToSockaddrOS(rsa *RawSockaddrAny return nil, EAFNOSUPPORT } +func GetsockoptIPv6MTUInfo(fd, level, opt int) (*IPv6MTUInfo, error) { + var value IPv6MTUInfo + vallen := Socklen_t(SizeofIPv6MTUInfo) + err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen) + return &value, err +} + +func GetsockoptUcred(fd, level, opt int) (*Ucred, error) { + var value Ucred + vallen := Socklen_t(SizeofUcred) + err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen) + return &value, err +} + //sysnbEpollCreate(size int) (fd int, err error) //epoll_create(size _C_int) _C_int Index: libgo/mksysinfo.sh === --- libgo/mksysinfo.sh (revision 232239) +++ libgo/mksysinfo.sh (working copy) @@ -870,6 +870,14 @@ if ! grep 'type ICMPv6Filter ' ${OUT} > echo 'type ICMPv6Filter struct { Data [8]uint32 }' >> ${OUT} fi +# The ip6_mtuinfo struct. +grep '^type _ip6_mtuinfo ' gen-sysinfo.go | \ +sed -e 's/_ip6_mtuinfo/IPv6MTUInfo/' \ + -e 's/ip6m_addr/Addr/' \ + -e 's/_sockaddr_in6/RawSockaddrInet6/' \ + -e 's/ip6m_mtu/Mtu/' \ +>> ${OUT} + # Try to guess the type to use for fd_set. fd_set=`grep '^type _fd_set ' gen-sysinfo.go || true` fds_bits_type="_C_long" @@ -1464,7 +1472,7 @@ set cmsghdr Cmsghdr ip_mreq IPMreq ip_mr msghdr Msghdr nlattr NlAttr nlmsgerr NlMsgerr nlmsghdr NlMsghdr \ rtattr RtAttr rtgenmsg RtGenmsg rtmsg RtMsg rtnexthop RtNexthop \ sock_filter SockFilter sock_fprog SockFprog ucred Ucred \ -icmp6_filter ICMPv6Filter +icmp6_filter ICMPv6Filter ip6_mtuinfo IPv6MTUInfo while test $# != 0; do nc=$1 ngo=$2
Re: [PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL
On February 26, 2016 5:15:43 PM GMT+01:00, Martin Jambor wrote: >Hi, > >my fix for PR 69666 has caused quite a few regressions accross the >borad where SRA removed a SSA_NAME which however still was in the IL >(and usually stumbled upon it itself straight away). > >The removal path should not be executed when there is an SSA_NAME on >the LHS, the code clearly is not ready for it. Before my patch, we >got always lucky because the statement was simply modified elsewhere >when the LHS was an SSA_NAME. However, even that was not 10% >guaranteed because of the !access_has_replacements_p (racc) part of >the changed condition. > >The patch below fixes the ICEs simply by guarding the removal code to >only work when the LHS is not an SSA_NAME. This means that the safe >path below it is going to execute. > >I have bootstrapped and tested the patch on x86_64-linux. I'd like to >commit it to trunk as soon as it gets approved and then I'd like to >commit it to gcc-5 branch together with the PR 69666 fix a few days >afterwards. OK? OK. Richard. >Thanks, > >Martin > > >2016-02-26 Martin Jambor > > PR middle-end/69920 > * tree-sra.c (sra_modify_assign): Do not remove loads of > uninitialized aggregates to SSA_NAMEs. > >testsuite/ > * gcc.dg/torture/pr69932.c: New test. > * gcc.dg/torture/pr69936.c: Likewise. >--- > gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++ > gcc/testsuite/gcc.dg/torture/pr69936.c | 24 > gcc/tree-sra.c | 3 ++- > 3 files changed, 36 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c > create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c > >diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c >b/gcc/testsuite/gcc.dg/torture/pr69932.c >new file mode 100644 >index 000..4b82130 >--- /dev/null >+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c >@@ -0,0 +1,10 @@ >+/* { dg-do compile } */ >+ >+int a; >+void fn1() { >+ int b = 4; >+ short c[4]; >+ c[b] = c[a]; >+ if (c[2]) {} >+ >+} >diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c >b/gcc/testsuite/gcc.dg/torture/pr69936.c >new file mode 100644 >index 000..3023bbb >--- /dev/null >+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c >@@ -0,0 +1,24 @@ >+/* { dg-do compile } */ >+ >+int a; >+char b; >+void fn1(int p1) {} >+ >+int fn2() { return 5; } >+ >+void fn3() { >+ if (fn2()) >+; >+ else { >+char c[5]; >+c[0] = 5; >+ lbl_608: >+fn1(c[9]); >+int d = c[9]; >+c[2] | a; >+d = c[b]; >+ } >+ goto lbl_608; >+} >+ >+int main() { return 0; } >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >index 663ded2..366f413 100644 >--- a/gcc/tree-sra.c >+++ b/gcc/tree-sra.c >@@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt, >gimple_stmt_iterator *gsi) > else > { > if (access_has_children_p (racc) >-&& !racc->grp_unscalarized_data) >+&& !racc->grp_unscalarized_data >+&& TREE_CODE (lhs) != SSA_NAME) > { > if (dump_file) > {
Re: (Non-)offloading diagnostics
On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote: > > I'm a proponent of enabling as many useful warnings by default, or if not > > by default, then with -Wall. -Whsa is enabled by default, and has thus > > set a precedent of doing that. > > I am not sure I'd go as far as "as many as possible," but in the case > of -Whsa, the warnings get emitted only if HSA offloading is > configured and especially only if the user used OMP and its target > construct. This means that it is relevant only for a rather small > class of users and it's not a "your code looks weird" kind of warning > but a "the compiler is not doing what you clearly asked for" warning. > So that is why we decided to warn unconditionally. > > But as far as I understand, gcc does not give any promises about > warnings, so I believe decisions like a defaultness of a warning can > be revisited at any point in the future, for example if people learn > not to expect some constructs to be offloaded to GPUs. Moreover, the > conventions regarding offloading are still being settled and still > will for quite some time so nobody should really expect such details > to be set in stone. The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite are (meant to be) valid OpenMP testcases, having them full of dozens of dg-warning lines where every of the 10+ different offloading target warns about something would be a maintainance nightmare. E.g. when adding new OpenMP tests, one would need to configure all the offloaders (individually?), for some you need hw not every committer has, for others there are other issues (e.g., is the required amdkfd going to be submitted for upstream kernel? I might have hard time convincing our kernel maintainers to use that instead of what is in upstream kernel others). So, IMHO if you want to check for warnings, do that as Martin has added a new subdir with only hsa OpenMP tests, if you want test warnings on tests we already have elsewhere, #include them in the other dir, dg-do link instead of run (so that it is not run multiple times), and check for the warnings; you could also use -foffload=hsa in there to make sure you only have to care about hsa warnings, and not NVPTX, or whatever other offloader. Jakub
Re: [Patch, Fortran] PR 69495: unused-label warning does not tell which flag triggered it
double-ping ... 2016-02-12 0:29 GMT+01:00 Janus Weil : > ping! > > 2016-02-05 19:19 GMT+01:00 Janus Weil : >> Hi all, >> >> I have slightly updated the patch now to avoid string-breaking issues >> (even if it may not be a problem at all, as mentioned by Jospeh). Also >> I removed the questionable part in intrinsic.c that I was not sure >> about. >> >> This version of the patch should not be too far from obvious now. Ok for >> trunk? >> >> Cheers, >> Janus >> >> >> >> 2016-02-03 23:27 GMT+01:00 Joseph Myers : >>> On Wed, 3 Feb 2016, Manfred Schwarb wrote: >>> There are 2 things with translation, and there is a third issue: - As you noticed, breaking things differently means translation has to be done again. - Normally, each string is translated independently, and depending on the language there may be lack of context (e.g. adjectives get different suffixes depending on the noun). >>> >>> I believe gettext works fine with (compile-time) string constant >>> concatenation - that is, extracts the whole concatenated string for >>> translation, so these are non-issues. What doesn't work includes: >>> >>> * Runtime concatenation of strings or otherwise putting English fragments >>> together at runtime. >>> >>> * String constant concatenation where one of the concatenated pieces comes >>> from a macro expansion. >>> >>> * The argument for translation being a conditional expression: >>> >>> error (cond ? "message 1" : "message 2"); >>> >>> (in this case, only one of the messages may be extracted for translation, >>> so you need to mark both of them up with appropriate macros such as G_). >>> >>> -- >>> Joseph S. Myers >>> jos...@codesourcery.com
Re: [Fortran, Patch] (Coarrays) Wrong events size
Dear Alessandro, Seconded! I saw your ping on my phone and was going to respond. well, now :-) Thanks for the patch Paul On 26 February 2016 at 18:29, Thomas Koenig wrote: > Hi Allessandro, > >> * PING * > > > Looks obvious and simple enough for me. > > OK. > > Thanks for the patch! > > Thomas > -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
Re: Fix PR44281 (bad RA with global regs)
On 02/26/2016 08:41 AM, Bernd Schmidt wrote: On 02/22/2016 03:37 PM, Richard Biener wrote: Do calls properly clobber them even if they are not in the set of call-clobbered regs? Are asm()s properly using/clobbering them? I think you are allowed to use them in asm()s without adding constraints for them? Calls do, asms currently don't AFAICT. Not sure whether it's allowed to use them, but I think it should be straightforward to adjust df-scan. The other case that came to mind was signal handlers. What happens if we're using the global register as a scratch, we hit a memory reference that faults and inside the signal handler the original code expects to be able to use the global register the user set up? If that's a valid use scenario, then there's probably all kinds of DF stuff that we'd need to expose. jeff
Re: (Non-)offloading diagnostics
Hi, On Fri, Feb 26, 2016 at 06:51:34PM +0100, Jakub Jelinek wrote: > On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote: > > > I'm a proponent of enabling as many useful warnings by default, or if not > > > by default, then with -Wall. -Whsa is enabled by default, and has thus > > > set a precedent of doing that. > > > > I am not sure I'd go as far as "as many as possible," but in the case > > of -Whsa, the warnings get emitted only if HSA offloading is > > configured and especially only if the user used OMP and its target > > construct. This means that it is relevant only for a rather small > > class of users and it's not a "your code looks weird" kind of warning > > but a "the compiler is not doing what you clearly asked for" warning. > > So that is why we decided to warn unconditionally. > > > > But as far as I understand, gcc does not give any promises about > > warnings, so I believe decisions like a defaultness of a warning can > > be revisited at any point in the future, for example if people learn > > not to expect some constructs to be offloaded to GPUs. Moreover, the > > conventions regarding offloading are still being settled and still > > will for quite some time so nobody should really expect such details > > to be set in stone. > > The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite > are (meant to be) valid OpenMP testcases, having them full of dozens of > dg-warning lines where every of the 10+ different offloading target warns > about something would be a maintainance nightmare. Agreed, having such dg-warnings would definitely be an overkill. I only intended to mark the whole test with an option. I am willing to be looking for new hsa warnings and examine them myself, adding the option if necessary. I would not expect the originator of the testcase or anybody who does not care for HSA to do it. > E.g. when adding > new OpenMP tests, one would need to configure all the offloaders > (individually?), for some you need hw not every committer has, No special hardware is necessary to see the warning (though you need at least https://github.com/HSAFoundation/HSA-Runtime-Reference-Source to build the libgomp plugin). Once gcc decides to emit HSAIL, it of course has to work as expected. There should be no need to configure hsa individually either. > for others > there are other issues (e.g., is the required amdkfd going to be submitted > for upstream kernel? I might have hard time convincing our kernel > maintainers to use that instead of what is in upstream kernel others). I am being repeatedly told it will be and very soon, but apparently it takes longer than AMD anticipated. I don't think anybody expects any distribution to pick it up on their own (...but you know, Red Hat is actually the company that now employs the upstream kernel kfd maintainer ;-). > So, IMHO if you want to check for warnings, do that as Martin has added a > new subdir with only hsa OpenMP tests, if you want test warnings on tests we > already have elsewhere, #include them in the other dir, dg-do link instead > of run (so that it is not run multiple times), and check for the warnings; > you could also use -foffload=hsa in there to make sure you only have to care > about hsa warnings, and not NVPTX, or whatever other offloader. > Just to be clear, I never wanted to be testing for presence of warnings, I see no value in that. All in all, I am willing to add -Wno-hsa to default options and only have these warnings on in dedicated HSA directories. I will amend the posted patches once testsuite maintainers look at my initial proposal for the first such directory. Martin
Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)
On Thu, Feb 25, 2016 at 10:52:29AM -0500, David Edelsohn wrote: > Please add a short comment explaining why rs6000_insn_for_shift_mask > doesn't need to match the logic in rs6000_is_valid_shift_mask > converting rotates to simple shifts. I added this comment: --- trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:17:02 233754 +++ trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:49:18 233755 @@ -17438,9 +17438,12 @@ rs6000_insn_for_shift_mask (machine_mode operands[2] = GEN_INT (32 - INTVAL (operands[2])); operands[3] = GEN_INT (31 - nb); operands[4] = GEN_INT (31 - ne); + /* This insn can also be a 64-bit rotate with mask that really makes +it just a shift right (with mask); the %h below are to adjust for +that situation (shift count is >= 32 in that case). */ if (dot) - return "rlw%I2nm. %0,%1,%2,%3,%4"; - return "rlw%I2nm %0,%1,%2,%3,%4"; + return "rlw%I2nm. %0,%1,%h2,%3,%4"; + return "rlw%I2nm %0,%1,%h2,%3,%4"; } gcc_unreachable (); Segher
Re: [hsa merge 08/10] HSAIL BRIG description header file
Hi, I hope I've got some good news: On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote: > Jakub Jelinek writes: > > > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote: > >> the following patch adds a BRIG (binary representation of HSAIL) > >> representation description. It is within a single header file > >> describing the binary structures and constants of the format. > >> > >> The file comes from the HSA Foundation (I have only added the > >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments > >> which are not present in proposed future versions of the file) and is > >> licensed under "University of Illinois/NCSA Open Source License." > >> > >> The license is "GPL-compatible" according to FSF > >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses) > >> so I believe we can have it in GCC. Nevertheless, it is not GPL and > >> there is no copyright assignment for it, but the situation is > >> hopefully analogous to some other libraries that have their upstream > >> elsewhere but we ship them as part of the GCC. > >> > >> In the previous posting of this patch > >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have > >> requested a permission from the steering committee to include this file > >> with a different upstream in GCC. I have not received an official > >> reply but since I have been chosen to be the HSA maintainer, I tend to > >> think there were no legal objections against HSA going forward, > >> including this file. > > Martin, could you ask the HSA Foundation or AMD or whoever if there is > any way they could remove the second requirement of the license? It > adds yet another case where anybody distributing GCC has to list yet > another copyright notice. > I have asked HSA foundation to do just that and apparently they agreed to change the licensing of the file (in upcoming versions of HSA) to the MIT license. IIUC, the reading of the license header would then be the one below. I hope that means the problematic requirements will be gone and we will be able to just use their file. If, however, you still think there will be issues preventing us from doing that, please let me know as soon as possible. Thanks, Martin The license is going to be: The MIT License (MIT) Copyright (c) 2016, HSA Foundation, Inc * All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE."
Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)
On Fri, Feb 26, 2016 at 1:52 PM, Segher Boessenkool wrote: > On Thu, Feb 25, 2016 at 10:52:29AM -0500, David Edelsohn wrote: >> Please add a short comment explaining why rs6000_insn_for_shift_mask >> doesn't need to match the logic in rs6000_is_valid_shift_mask >> converting rotates to simple shifts. > > I added this comment: > > --- trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:17:02 233754 > +++ trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:49:18 233755 > @@ -17438,9 +17438,12 @@ rs6000_insn_for_shift_mask (machine_mode > operands[2] = GEN_INT (32 - INTVAL (operands[2])); >operands[3] = GEN_INT (31 - nb); >operands[4] = GEN_INT (31 - ne); > + /* This insn can also be a 64-bit rotate with mask that really makes > +it just a shift right (with mask); the %h below are to adjust for > +that situation (shift count is >= 32 in that case). */ >if (dot) > - return "rlw%I2nm. %0,%1,%2,%3,%4"; > - return "rlw%I2nm %0,%1,%2,%3,%4"; > + return "rlw%I2nm. %0,%1,%h2,%3,%4"; > + return "rlw%I2nm %0,%1,%h2,%3,%4"; > } > >gcc_unreachable (); Thanks! - David
[PATCH] Fix "no-vsx" target attribute handling (PR target/69969)
Hi! Most of the errors and warnings in rs6000_option_override_internal are emitted only if the particular option is explicit, e.g. if (TARGET_P9_DFORM && !TARGET_P9_VECTOR) { if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) error ("-mpower9-dform requires -mpower9-vector"); rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM; } and many others, which is right, but for the -mallow-movmisalign requires -mvsx error it doesn't do this, so if say -mcpu=power8 compiled TU contains a routine with target ("no-vsx") attribute, we get this error, even when the user hasn't done anything we should complain about. Fixed by following what we do for the other options, bootstrapped/regtested on powerpc64le-linux (and powerpc64-linux, but regtest is still pending there). Ok for trunk? 2016-02-26 Jakub Jelinek PR target/69969 * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't complain about -mallow-movmisalign without -mvsx if TARGET_ALLOW_MOVMISALIGN was not set explicitly. * gcc.target/powerpc/pr69969.c: New test. --- gcc/config/rs6000/rs6000.c.jj 2016-02-26 12:05:39.0 +0100 +++ gcc/config/rs6000/rs6000.c 2016-02-26 15:58:29.250259330 +0100 @@ -4207,7 +4207,8 @@ rs6000_option_override_internal (bool gl else if (TARGET_ALLOW_MOVMISALIGN && !TARGET_VSX) { - if (TARGET_ALLOW_MOVMISALIGN > 0) + if (TARGET_ALLOW_MOVMISALIGN > 0 + && global_options_set.x_TARGET_ALLOW_MOVMISALIGN) error ("-mallow-movmisalign requires -mvsx"); TARGET_ALLOW_MOVMISALIGN = 0; --- gcc/testsuite/gcc.target/powerpc/pr69969.c.jj 2016-02-26 16:03:34.992101828 +0100 +++ gcc/testsuite/gcc.target/powerpc/pr69969.c 2016-02-26 16:03:25.0 +0100 @@ -0,0 +1,7 @@ +/* PR target/69969 */ +/* { dg-do compile } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8" } */ + +int bar (int x) { return x; } +__attribute__((__target__("no-vsx"))) int foo (int x) { return x; } /* { dg-bogus "-mallow-movmisalign requires -mvsx" } */ Jakub
Re: [hsa merge 08/10] HSAIL BRIG description header file
On Feb 26, 2016, at 10:58 AM, Martin Jambor wrote: > I have asked HSA foundation Thanks. > The license is going to be: > > The MIT License (MIT) Wonderful.
[PATCH] Fix up recent texinfo regression
Hi! I've noticed today: ../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A ../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A After reading info texinfo on @xref and @pxref, I believe it is invalid to use @xref this way, in the middle of a sentence, and indeed it looks quite weird what is produced. So the following patch attempts to fix it up by using @xref in a separate sentence. Tested on x86_64-linux, ok for trunk? BTW, it also seems that we use see @xref{...}, or See @xref{...}, in various places, that also looks wrong, for info that results in see *note ...::, but in *.pdf it is probably see See ..., Thus, perhaps we should check for this and remove the see/See words before @xref, and if we want see instead of See in *.pdf, use @pxref. 2016-02-26 Jakub Jelinek * doc/extend.texi (__builtin_alloca, __builtin_alloca_with_align): Fix @xref usage. --- gcc/doc/extend.texi.jj 2016-02-26 16:49:58.0 +0100 +++ gcc/doc/extend.texi 2016-02-26 20:41:16.611200672 +0100 @@ -10714,10 +10714,11 @@ it is the responsibility of its caller t cause it to exceed the stack size limit. The @code{__builtin_alloca} function is provided to make it possible to allocate on the stack arrays of bytes with an upper bound that may be -computed at run time. Since C99 @xref{Variable Length} Arrays offer +computed at run time. Since C99 Variable Length Arrays offer similar functionality under a portable, more convenient, and safer interface they are recommended instead, in both C99 and C++ programs where GCC provides them as an extension. +@xref{Variable Length}, for details. @end deftypefn @@ -10761,10 +10762,10 @@ the argument doesn't cause it to exceed The @code{__builtin_alloca_with_align} function is provided to make it possible to allocate on the stack overaligned arrays of bytes with an upper bound that may be computed at run time. Since C99 -@xref{Variable Length} Arrays offer the same functionality under +Variable Length Arrays offer the same functionality under a portable, more convenient, and safer interface they are recommended instead, in both C99 and C++ programs where GCC provides them as -an extension. +an extension. @xref{Variable Length}, for details. @end deftypefn Jakub
C++ PATCH for c++/69958 (wrong sizeof...)
alias templates are supposed to be transparent, so when we see template using wrapped2 = list>; the size_for needs to be expanded into sizeof...(something). There is no way to write this "something" in C++, but it's simple enough to leave it as an *_ARGUMENT_PACK within the compiler. Tested x86_64-pc-linux-gnu, applying to trunk. commit 772d7b6f7c13119525da6e470c8729f14f35152f Author: Jason Merrill Date: Fri Feb 26 13:53:08 2016 -0500 PR c++/69958 * pt.c (make_argument_pack): New. (tsubst_copy) [SIZEOF_EXPR]: Handle partial expansion. (tsubst_copy_and_build): Likewise. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index cd3eb67..b5855a8 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -7,6 +7,25 @@ get_pattern_parm (tree parm, tree tmpl) return patparm; } +/* Make an argument pack out of the TREE_VEC VEC. */ + +static tree +make_argument_pack (tree vec) +{ + tree pack; + tree elt = TREE_VEC_ELT (vec, 0); + if (TYPE_P (elt)) +pack = cxx_make_type (TYPE_ARGUMENT_PACK); + else +{ + pack = make_node (NONTYPE_ARGUMENT_PACK); + TREE_TYPE (pack) = TREE_TYPE (elt); + TREE_CONSTANT (pack) = 1; +} + SET_ARGUMENT_PACK_ARGS (pack, vec); + return pack; +} + /* Substitute ARGS into the vector or list of template arguments T. */ static tree @@ -14066,7 +14085,8 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) } case SIZEOF_EXPR: - if (PACK_EXPANSION_P (TREE_OPERAND (t, 0))) + if (PACK_EXPANSION_P (TREE_OPERAND (t, 0)) + || ARGUMENT_PACK_P (TREE_OPERAND (t, 0))) { tree expanded, op = TREE_OPERAND (t, 0); int len = 0; @@ -14077,7 +14097,11 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) ++cp_unevaluated_operand; ++c_inhibit_evaluation_warnings; /* We only want to compute the number of arguments. */ - expanded = tsubst_pack_expansion (op, args, complain, in_decl); + if (PACK_EXPANSION_P (op)) + expanded = tsubst_pack_expansion (op, args, complain, in_decl); + else + expanded = tsubst_template_args (ARGUMENT_PACK_ARGS (op), + args, complain, in_decl); --cp_unevaluated_operand; --c_inhibit_evaluation_warnings; @@ -14093,13 +14117,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) return error_mark_node; else if (PACK_EXPANSION_P (expanded) || (TREE_CODE (expanded) == TREE_VEC - && len > 0 - && PACK_EXPANSION_P (TREE_VEC_ELT (expanded, len-1 + && pack_expansion_args_count (expanded))) + { - if (TREE_CODE (expanded) == TREE_VEC) - expanded = TREE_VEC_ELT (expanded, len - 1); + if (PACK_EXPANSION_P (expanded)) + /* OK. */; + else if (TREE_VEC_LENGTH (expanded) == 1) + expanded = TREE_VEC_ELT (expanded, 0); else - PACK_EXPANSION_SIZEOF_P (expanded) = true; + expanded = make_argument_pack (expanded); if (TYPE_P (expanded)) return cxx_sizeof_or_alignof_type (expanded, SIZEOF_EXPR, @@ -16162,7 +16188,8 @@ tsubst_copy_and_build (tree t, length, stride, TREE_TYPE (op1))); } case SIZEOF_EXPR: - if (PACK_EXPANSION_P (TREE_OPERAND (t, 0))) + if (PACK_EXPANSION_P (TREE_OPERAND (t, 0)) + || ARGUMENT_PACK_P (TREE_OPERAND (t, 0))) RETURN (tsubst_copy (t, args, complain, in_decl)); /* Fall through */ diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C new file mode 100644 index 000..1187429 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C @@ -0,0 +1,33 @@ +// PR c++/69958 +// { dg-do compile { target c++11 } } + +typedef decltype(sizeof(int)) size_t; + +template +struct list { }; + +template +struct size { }; + +template +using size_for = size; + +template struct assert_same; +template struct assert_same {}; + +template +using wrapped = list>; + +// This assertion fails (produces size<4>) +assert_same< +list>, +wrapped> a3; + + +template +using wrapped2 = list>; + +// This assertion fails (produces size<2>) +assert_same< +list>, +wrapped2> a4; diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C new file mode 100644 index 000..0e8096d --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C @@ -0,0 +1,33 @@ +// PR c++/69958 +// { dg-do compile { target c++11 } } + +typedef decltype(sizeof(int)) size_t; + +template +struct list { }; + +template +struct size { }; + +template +using size_for = size; + +template struct assert_same; +template struct assert_same {}; + +template +using wrapped = list>; + +// This assertion fails (produces size<4>) +assert_same< +list>, + wrapped> a3; + + +template +using wrapped2 = list>; + +// This assertion fails (produces size<2>) +assert_same< +list>, + wrapped2> a4;
Re: [PATCH] Fix "no-vsx" target attribute handling (PR target/69969)
On Fri, Feb 26, 2016 at 2:27 PM, Jakub Jelinek wrote: > Hi! > > Most of the errors and warnings in rs6000_option_override_internal > are emitted only if the particular option is explicit, e.g. > if (TARGET_P9_DFORM && !TARGET_P9_VECTOR) > { > if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) > error ("-mpower9-dform requires -mpower9-vector"); > rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM; > } > and many others, which is right, but for the > -mallow-movmisalign requires -mvsx > error it doesn't do this, so if say -mcpu=power8 compiled TU > contains a routine with target ("no-vsx") attribute, we get this > error, even when the user hasn't done anything we should complain about. > > Fixed by following what we do for the other options, bootstrapped/regtested > on powerpc64le-linux (and powerpc64-linux, but regtest is still pending > there). Ok for trunk? > > 2016-02-26 Jakub Jelinek > > PR target/69969 > * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't > complain about -mallow-movmisalign without -mvsx if > TARGET_ALLOW_MOVMISALIGN was not set explicitly. > > * gcc.target/powerpc/pr69969.c: New test. Seems reasonable. LGTM Thanks, David
Re: [PATCH] Fix up recent texinfo regression
On 02/26/2016 12:51 PM, Jakub Jelinek wrote: Hi! I've noticed today: ../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A ../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A After reading info texinfo on @xref and @pxref, I believe it is invalid to use @xref this way, in the middle of a sentence, and indeed it looks quite weird what is produced. So the following patch attempts to fix it up by using @xref in a separate sentence. Tested on x86_64-linux, ok for trunk? Thanks for fixing it up. I noticed the warning yesterday and have been meaning to look into it. Your change seems fine to me though I wonder if using a plain @ref{} instead would do the right thing. It's used in a bunch of places to refer to the Common Function Attributes section (for example). That way the referenced text wouldn't need to be repeated. Martin BTW, it also seems that we use see @xref{...}, or See @xref{...}, in various places, that also looks wrong, for info that results in see *note ...::, but in *.pdf it is probably see See ..., Thus, perhaps we should check for this and remove the see/See words before @xref, and if we want see instead of See in *.pdf, use @pxref. 2016-02-26 Jakub Jelinek * doc/extend.texi (__builtin_alloca, __builtin_alloca_with_align): Fix @xref usage. --- gcc/doc/extend.texi.jj 2016-02-26 16:49:58.0 +0100 +++ gcc/doc/extend.texi 2016-02-26 20:41:16.611200672 +0100 @@ -10714,10 +10714,11 @@ it is the responsibility of its caller t cause it to exceed the stack size limit. The @code{__builtin_alloca} function is provided to make it possible to allocate on the stack arrays of bytes with an upper bound that may be -computed at run time. Since C99 @xref{Variable Length} Arrays offer +computed at run time. Since C99 Variable Length Arrays offer similar functionality under a portable, more convenient, and safer interface they are recommended instead, in both C99 and C++ programs where GCC provides them as an extension. +@xref{Variable Length}, for details. @end deftypefn @@ -10761,10 +10762,10 @@ the argument doesn't cause it to exceed The @code{__builtin_alloca_with_align} function is provided to make it possible to allocate on the stack overaligned arrays of bytes with an upper bound that may be computed at run time. Since C99 -@xref{Variable Length} Arrays offer the same functionality under +Variable Length Arrays offer the same functionality under a portable, more convenient, and safer interface they are recommended instead, in both C99 and C++ programs where GCC provides them as -an extension. +an extension. @xref{Variable Length}, for details. @end deftypefn Jakub
Re: [PATCH] Fix PR69760
On Wed, Feb 24, 2016 at 6:49 AM, Richard Biener wrote: > > The following fixes bogus SCEV analysis for expressions that are only > executed conditionally [note: conditionally here doesn't include > after a taken exit]. Basically we have to make sure further analysis > does not attempt to use undefined overflow for expressions we don't > know whether they are computed in the original source (for all > loop iterations). This would result in bogus CHRECs as can be seen > in this PR. > > The solution is to re-write those expressions in a way so overflow > behavior is well-defined. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > Richard. > > 2016-02-24 Richard Biener > Jakub Jelinek > > PR middle-end/69760 > * tree-scalar-evolution.c (interpret_rhs_expr): Re-write > conditionally executed ops to well-defined overflow behavior. > This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69983 -- H.J.
Re: [PATCH, rs6000] Fixing PR 67145
On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote: > It's the simplify-rtx.c portion of the patch that fixes the i686 regression. > > In the PR, Alan raises some good points, but I don't believe that we can > address those for gcc6. A new rtl reassoc optimization that takes loop > invariance into account will have to wait. > > But we do need to take care of the rs6000 ice that results, and that's the > bulk of the patch -- allowing CA to be sorted to any register of the plus > chain. The rs6000 change is much too risky for stage 4 as well in my opinion. It also seems it will regress codegen (for which we have no testcases yet, sigh). What is the rs6000 ICE? > Some notes: > > ca_operand doesn't work as written, since CA_REGNO is not an available > register, and thus doesn't satisfy register_operand. Yes, curious. But when it was written it *did* match. Huh. > Is there any particular reason that subf<>3_carry_in_m1 was written with > minus rather than plus like all of the other patterns? It's what RTL simplification comes up with, so it is the only thing combine tries. The same reason why there are separate patterns for 0 and -1 at all. Segher
Re: [PATCH, rs6000] Fixing PR 67145
On 02/26/2016 12:41 PM, Segher Boessenkool wrote: > On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote: >> It's the simplify-rtx.c portion of the patch that fixes the i686 regression. >> >> In the PR, Alan raises some good points, but I don't believe that we can >> address those for gcc6. A new rtl reassoc optimization that takes loop >> invariance into account will have to wait. >> >> But we do need to take care of the rs6000 ice that results, and that's the >> bulk of the patch -- allowing CA to be sorted to any register of the plus >> chain. > > The rs6000 change is much too risky for stage 4 as well in my opinion. > It also seems it will regress codegen (for which we have no testcases > yet, sigh). > > What is the rs6000 ICE? The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot in the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an ICE due to an unrecognizable insn. The rs6000 patch changes the backend to allow (reg:M ca) to appear exactly once in (plus (plus r1 r2) r3). And we do have a test case for it -- the ICE. How do you imagine the rs6000 change will regress codegen? >> ca_operand doesn't work as written, since CA_REGNO is not an available >> register, and thus doesn't satisfy register_operand. > > Yes, curious. But when it was written it *did* match. Huh. It probably started to fail with r181760, back in 2011. But the predicate wasn't much used in the rs6000 backend, so I guess it hadn't really mattered. >> Is there any particular reason that subf<>3_carry_in_m1 was written with >> minus rather than plus like all of the other patterns? > > It's what RTL simplification comes up with, so it is the only thing > combine tries. The same reason why there are separate patterns for 0 > and -1 at all. Well, it appears to use the new pattern just as well. A few trivial tests are in fact simplified as one would expect during combine. r~
Re: [PATCH] Fix up recent texinfo regression
On Fri, Feb 26, 2016 at 01:09:57PM -0700, Martin Sebor wrote: > On 02/26/2016 12:51 PM, Jakub Jelinek wrote: > >I've noticed today: > >../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A > >../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A > > > >After reading info texinfo on @xref and @pxref, I believe it is invalid > >to use @xref this way, in the middle of a sentence, and indeed it looks > >quite weird what is produced. So the following patch attempts to fix it up > >by using @xref in a separate sentence. Tested on x86_64-linux, ok for > >trunk? > > Thanks for fixing it up. I noticed the warning yesterday and have > been meaning to look into it. > > Your change seems fine to me though I wonder if using a plain @ref{} > instead would do the right thing. It's used in a bunch of places to > refer to the Common Function Attributes section (for example). That > way the referenced text wouldn't need to be repeated. It would read as See C99 Section x.y [Variable Length], page NN Arrays offer in the pdf and See C99 *note Variable Length:: Arrays offer if I read the texinfo doc right. Both look just too weird to me. Jakub
Re: [PATCH, rs6000] Fixing PR 67145
On Fri, Feb 26, 2016 at 12:51:10PM -0800, Richard Henderson wrote: > > What is the rs6000 ICE? > > The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot > in > the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an > ICE due to an unrecognizable insn. So it changes existing RTL to a form that does not pass recog. Uh. > How do you imagine the rs6000 change will regress codegen? Combine of sequences with double-length adds. > >> ca_operand doesn't work as written, since CA_REGNO is not an available > >> register, and thus doesn't satisfy register_operand. > > > > Yes, curious. But when it was written it *did* match. Huh. > > It probably started to fail with r181760, back in 2011. But the predicate > wasn't much used in the rs6000 backend, so I guess it hadn't really mattered. It failed more recently though, that's what r215429 is about. Oh, before that it did test just the code, didn't use register_operand. *facepalm* Segher
Re: [PATCH, rs6000] Fixing PR 67145
On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote: > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index 450fa8b..9d55e7b 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -4421,9 +4421,17 @@ simplify_plus_minus (enum rtx_code code, machine_mode > mode, rtx op0, >n_ops = i; > } > > - /* If nothing changed, fail. */ > + /* If nothing changed, check that rematerialization of rtl instructions > + is still required. */ >if (!canonicalized) > -return NULL_RTX; > +{ > + /* Perform rematerialization if only all operands are registers and > + all operations are PLUS. */ > + for (i = 0; i < n_ops; i++) > + if (ops[i].neg || !REG_P (ops[i].op)) > + return NULL_RTX; > + goto gen_result; > +} If you check for fixed registers as well here, does that work for you? Segher
[PING^2] Re: [PATCH] [wwwdocs] Add a "Plugin issues" section to the GCC 6 porting guide
Ping On Thu, 2016-02-18 at 10:44 -0500, David Malcolm wrote: > Ping: > https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00765.html > > > On Thu, 2016-02-11 at 10:12 -0500, David Malcolm wrote: > > I've (mostly) ported gcc-python-plugin to gcc 6. The attached > > patch > > for the gcc website starts a new "Plugin issues" section, and > > covers > > the biggest issue I ran into (FWIW the suggested compatibility > > typedef > > is the one I committed to gcc-python-plugin). > > > > Validates. > > > > OK to commit?
Re: [PATCH, rs6000] Fixing PR 67145
On 02/26/2016 01:01 PM, Segher Boessenkool wrote: >> How do you imagine the rs6000 change will regress codegen? > > Combine of sequences with double-length adds. What sort of test case are you imagining here? The trivial tests I've looked at have all been optimal (before and after). > It failed more recently though, that's what r215429 is about. Oh, before > that it did test just the code, didn't use register_operand. *facepalm* Ah yes, exactly. I didn't think to annotate ca_operand for more recent changes. ;-) r~
Re: Revert gcc r227962
On 02/26/2016 04:04 AM, JonY wrote: Hi, I've submitted a patch that was committed as r227962, it causes some unintended side effects (namely libuuid on Cygwin). Can someone please revert? Kai still needs some time to setup his gcc development environment. We'd need to have a better sense of why this is causing problems. If Kai could chime in with a description of the problem it would be helpful. jeff
Re: [PATCH 2/4] Replace ENABLE_CHECKING macro with flag_checking in GNAT
On 02/24/2016 07:10 AM, marxin wrote: gcc/ada/ChangeLog: 2016-02-24 Martin Liska * gcc-interface/utils.c (set_reverse_storage_order_on_pad_type): Replace ENABLE_CHECKING macro with flag_checking. OK. jeff
Re: [PATCH] [wwwdocs] Add a "Plugin issues" section to the GCC 6 porting guide
Hi David, On Thu, 11 Feb 2016, David Malcolm wrote: > I've (mostly) ported gcc-python-plugin to gcc 6. The attached patch > for the gcc website starts a new "Plugin issues" section, and covers > the biggest issue I ran into (FWIW the suggested compatibility typedef > is the one I committed to gcc-python-plugin). this is lovely, thanks for doing it! Just some small editorial comments; please go ahead and commit after making (or at least considering) them. Index: htdocs/gcc-6/porting_to.html === +"gimple" became a struct, rather than a pointer Should this be gimple? +Prior to GCC 6, "gimple" meant a pointer to a statement. It was a +typedef aliasing the type struct gimple_statement_base *: Same here. +"gimple" is now the statement struct itself, not a pointer. +The "gimple" struct is now the base class of the gimple statement class +hierarchy, and throughout gcc every gimple was changed to a ..and here. gcc -> GCC How about "every instance of gimple" ? +(revision +https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=42acab1cd6812e2d9e49f4132176f5505f49a0e5";>r227941 +was the commit in question). The typedef const_gimple is no more; "is the commit" ? +change. If you aim for compatibility between both gcc 6 and earlier +releases of gcc, it may be cleanest to introduce a compatibility typedef GCC 6 GCC Thank you, Gerald
Re: [wwwdocs] Update -Wnonnull description
On Thu, 25 Feb 2016, Marek Polacek wrote: > Now that -Wnonnull-comare has been split out of -Wnonnull, we should > also update the porting_to text. Is this sufficient? It is for me. :-) Unless anyone else complains, go ahead and commit. Thanks, Gerald
Re: [PATCH, rs6000] Fixing PR 67145
On 02/26/2016 01:03 PM, Segher Boessenkool wrote: > On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote: >> + /* Perform rematerialization if only all operands are registers and >> + all operations are PLUS. */ >> + for (i = 0; i < n_ops; i++) >> +if (ops[i].neg || !REG_P (ops[i].op)) >> + return NULL_RTX; >> + goto gen_result; >> +} > > If you check for fixed registers as well here, does that work for you? Maybe. It prevents canonicalization of reg+fp vs fp+reg, which could well occur via arithmetic on locally allocated arrays. I guess for the purposes of stage4 I'd be willing to do if (ops[i].neg || !REG_P (ops[i].op) || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER && fixed_regs[REGNO (ops[i].op)] && !global_regs[REGNO (ops[i].op)] && ops[i].op != frame_pointer_rtx && ops[i].op != arg_pointer_rtx && ops[i].op != stack_pointer_rtx)) It's pretty ugly though, and I wouldn't want to keep this forever. The rs6000 change really ought to be evaluated at some point. Given its scope, I see little difference to doing that now vs putting it off to gcc7. r~
Re: [PATCH, rs6000] Fixing PR 67145
On Fri, Feb 26, 2016 at 01:10:17PM -0800, Richard Henderson wrote: > On 02/26/2016 01:01 PM, Segher Boessenkool wrote: > >> How do you imagine the rs6000 change will regress codegen? > > > > Combine of sequences with double-length adds. > > What sort of test case are you imagining here? The trivial tests I've looked > at have all been optimal (before and after). There are many different combinations, for example (32-bit code) long long add_s42(long long a, int b) { return a + ((long long) b << 32) + 42; } and we generate optimal code on BE for all of those (not on LE yet, and there is the issue with open-coded carry chains, and and and, and things shift around. Also all the scc things. But I really should submit some testcases). Segher
Re: [PATCH] Fix up recent texinfo regression
On 02/26/2016 01:55 PM, Jakub Jelinek wrote: On Fri, Feb 26, 2016 at 01:09:57PM -0700, Martin Sebor wrote: On 02/26/2016 12:51 PM, Jakub Jelinek wrote: I've noticed today: ../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A ../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A After reading info texinfo on @xref and @pxref, I believe it is invalid to use @xref this way, in the middle of a sentence, and indeed it looks quite weird what is produced. So the following patch attempts to fix it up by using @xref in a separate sentence. Tested on x86_64-linux, ok for trunk? Thanks for fixing it up. I noticed the warning yesterday and have been meaning to look into it. Your change seems fine to me though I wonder if using a plain @ref{} instead would do the right thing. It's used in a bunch of places to refer to the Common Function Attributes section (for example). That way the referenced text wouldn't need to be repeated. It would read as See C99 Section x.y [Variable Length], page NN Arrays offer in the pdf and See C99 *note Variable Length:: Arrays offer if I read the texinfo doc right. Both look just too weird to me. I agree, that wouldn't look right. What you have should look better. FWIW, more out of curiosity of how it works than anything else I pulled up the PDF manual to see what some uses of @ref{} get rendered as. It doesn't look like the "Section X.Y" part is there, contrary to my reading of the Texinfo manual. For example, the reference below: See also the @ref{AVR Named Address Spaces} section for an alternate way to locate and access data in flash memory. looks like this in the GCC 5.3.0 PDF manual: See also the [AVR Named Address Spaces], page 382 section for an alternate way... It doesn't look quite right either but the "Section X.Y" text isn't here (the text in brackets is the link to the section). There is a way to replace the name of the referenced section with arbitrary text. I found an example of it here: Notice that attribute @ref{AVR Variable Attributes,,@code{progmem}} locates data in flash which renders as Notice that attribute [progmem], page 434 locates data in flash So presumably, to get what I was going for, the reference would have to be written like so: Since C99 @ref{Variable Length,,Variable Length Arrays} offer similar functionality although it would still have the "page 123" text appended to it. I don't see how to avoid that. It's probably best to stick to only basic uses in sentences of their own or in parenthetical notes. Martin
Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false
On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh wrote: > On Thu, Feb 25, 2016 at 09:25:45AM +, Kyrill Tkachov wrote: >> Hi all, >> >> In this wrong-code PR we get bad code when synthesising a TImode right shift >> by variable amount using DImode shifts during expand. >> >> The expand_double_word_shift function expands two paths: one where the >> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for >> aarch64) at runtime >> and one where it's less than that, and performs a conditional select between >> the two in the end. >> >> The second path is the one that goes wrong here. >> The algorithm it uses for calculating a TImode shift when the variable shift >> amount is <64 is: >> >> >> --DImode- --DImode- --DImode- --DImode- >> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ >> source-lo ___| } >> var_amnt >> >> 1) carry = source-hi << 1 >> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending >> on shift_truncation_mask >> 3) carry = carry << tmp // net effect is that carry is source-hi shifted >> left by BITS_PER_WORD - var_amnt >> 4) target-lo = source-lo >>u var_amnt //unsigned shift. >> 5) target-lo = target-lo | carry >> 6) target-hi = source-hi >> var_amnt >> >> where the two DImode words source-hi and source-lo are the two words of the >> source TImode value and var_amnt is the register holding the variable shift >> amount. This is performed by the expand_subword_shift function in optabs.c. >> >> Step 2) is valid only if the target truncates the shift amount by the width >> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and >> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case. >> >> Step 3) is the one that goes wrong. On aarch64 a left shift is usually >> implemented using the LSL instruction but we also have alternatives that can >> use the SIMD registers and the USHL instruction. In this case we end up >> using the USHL instruction. However, although the USHL instruction does >> a DImode shift, it doesn't truncate the shift amount by 64, but rather by >> 255. Furthermore, the shift amount is interpreted as a 2's complement signed >> integer and if it's negative performs a right shift. This is in contrast >> with the normal LSL instruction which just performs an unsigned shift by an >> amount truncated by 64. >> >> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't >> assume shift amounts are truncated unless we know we can only ever use the >> LSL instructions for variable shifts. >> >> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode, >> so expand_subword_shift assumes that since the mask is non-zero it's a valid >> shift truncation mask. The documentation for TARGET_SHIFT_TRUNCATION_MASK >> says: >> " The default implementation of this function returns >> @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED} >> and 0 otherwise." >> >> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their >> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when >> SHIFT_COUNT_TRUNCATED is false. This is what the patch does, and with it >> step 2) becomes: >> 2) tmp = BITS_PER_WORD - 1 - var_amnt >> which behaves correctly on aarch64. >> >> Unfortunately, the testcase from the PR has very recently gone latent on >> trunk because it depends on register allocation picking a particular >> alternative from the *aarch64_ashl_sisd_or_int_3 pattern in aarch64.md. >> I tried to do some inline asm tricks to get it to force the correct >> constraints but was unsuccessful. Nevertheless I've included the testcase in >> the patch. I suppose it's better than nothing. > > Thanks for the great write-up. This level of detail is very helpful for > review. > > OK for trunk. > > Thanks, > James > >> >> Bootstrapped and tested on aarch64. >> >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> >> 2016-02-25 Kyrylo Tkachov >> >> PR target/69613 >> * config/aarch64/aarch64.c (aarch64_shift_truncation_mask): >> Return 0 if !SHIFT_COUNT_TRUNCATED >> >> 2016-02-25 Kyrylo Tkachov >> >> PR target/69613 >> * gcc.dg/torture/pr69613.c: New test. > >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index >> d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 >> 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT >> aarch64_shift_truncation_mask (machine_mode mode) >> { >>return >> -(aarch64_vector_mode_supported_p (mode) >> +(!SHIFT_COUNT_TRUNCATED >> + || aarch64_vector_mode_supported_p (mode) >> || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - >> 1); >> } >> >> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c >> b/gcc/testsuite/gcc.dg/torture/pr69613.c >> new file mode 100644 >> ind
Re: [C PATCH] Fix C error-recovery (PR c/69796, PR c/69974)
On 02/26/2016 07:43 AM, Jakub Jelinek wrote: Hi! Already PR69483 and these two further PRs show that it is really a bad idea to set TREE_TYPE of decls with incomplete types to error_mark_node, there are lots of places in the middle-end that don't expect error_mark_nodes appearing so late. I've bootstrapped/regtested on x86_64-linux and i686-linux following patch which just doesn't do anything with the type, we don't emit the error multiple times, because if the FE emits errors, cgraphunit doesn't attempt to assemble variables or compile functions (just gimplifies them) if seen_error (). Attached is then untested alternative, set the type to something that should hopefully not cause ICEs nor further errors/warnings during the error-recovery. Ok for trunk, or shall I test the other patch instead? 2016-02-26 Jakub Jelinek PR c/69796 PR c/69974 * c-parser.c (c_parser_translation_unit): Don't change TREE_TYPE of incomplete decls to error_mark_node. * gcc.dg/pr69796.c: New test. * gcc.dg/pr69974.c: New test. This one leaves the type incomplete, right? So ISTM it's somewhat more likely than the second to expose other errors later with code that doesn't expect the type to be incomplete (much like other code doesn't expect to find error_mark_node in here). The second patch at least puts a real type in there. I suspect that's less likely to cause problems downstream, except perhaps with diagnostics. I could argue for either. I almost asked for the latter to be tested, but the more I think about it, I don't like slamming in another type like that. I'll conditionally approve -- if nobody objects in 72hrs, consider the first patch OK for the trunk. jeff
Re: [PATCH, rs6000] Fixing PR 67145
On Fri, Feb 26, 2016 at 01:35:10PM -0800, Richard Henderson wrote: > On 02/26/2016 01:03 PM, Segher Boessenkool wrote: > > On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote: > >> + /* Perform rematerialization if only all operands are registers and > >> + all operations are PLUS. */ > >> + for (i = 0; i < n_ops; i++) > >> + if (ops[i].neg || !REG_P (ops[i].op)) > >> +return NULL_RTX; > >> + goto gen_result; > >> +} > > > > If you check for fixed registers as well here, does that work for you? > > Maybe. It prevents canonicalization of reg+fp vs fp+reg, which could well > occur via arithmetic on locally allocated arrays. Where are these canonicalization rules described? > I guess for the purposes of stage4 I'd be willing to do > > if (ops[i].neg > || !REG_P (ops[i].op) > || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER > && fixed_regs[REGNO (ops[i].op)] > && !global_regs[REGNO (ops[i].op)] > && ops[i].op != frame_pointer_rtx > && ops[i].op != arg_pointer_rtx > && ops[i].op != stack_pointer_rtx)) > > It's pretty ugly though, and I wouldn't want to keep this forever. Yeah. > The rs6000 change really ought to be evaluated at some point. Given its > scope, > I see little difference to doing that now vs putting it off to gcc7. It is stage 4. This rs6000 change has almost 100% chance of introducing regressions. Segher