Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)
On Thu, Nov 17, 2016 at 05:39:57PM -0800, Marek Polacek wrote: > > Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it, > > it will be just insanely huge, but each SAVE_EXPR appears exactly twice > > in its containing SAVE_EXPR and the second time cp_genericize_r sees > > the SAVE_EXPR, it will do the > > 1138 /* Other than invisiref parms, don't walk the same tree > > twice. */ > > 1139 if (p_set->contains (stmt)) > > 1140{ > > 1141 *walk_subtrees = 0; > > 1142 return NULL_TREE; > > 1143} > > early exit. > > Clearly I misread things here. I blame the cold I've caught! > > In any case, I'm sorry for wasting your time and I'll just close the PR. I bet it is a compile-time hog with -fsanitize=undefined -fdump-tree-original. So, if anything, we'd need some hack in the SAVE_EXPR generic printing code, say track the depth of SAVE_EXPRs being printed concurrently and if we go above a certain level of them (say 10), change into a different mode where we print the content of SAVE_EXPR just the first time we encounter it + e.g. the SAVE_EXPR address and then when encountering the same SAVE_EXPR again, just print the address. When the SAVE_EXPR nesting level shrinks below 10, clear the pointer set of SAVE_EXPRs and resume normal behavior. -fsanitize=undefined -fdump-tree-gimple is fast. Jakub
Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
On Thu, 17 Nov 2016, Segher Boessenkool wrote: > For code like the testcase in PR71785 GCC factors all the indirect branches > to a single dispatcher that then everything jumps to. This is because > having many indirect branches with each many jump targets does not scale > in large parts of the compiler. Very late in the pass pipeline (right > before peephole2) the indirect branches are then unfactored again, by > the duplicate_computed_gotos pass. > > This pass works by replacing branches to such a common dispatcher by a > copy of the dispatcher. For code like this testcase this does not work > so well: most cases do a single addition instruction right before the > dispatcher, but not all, and we end up with only two indirect jumps: the > one without the addition, and the one with the addition in its own basic > block, and now everything else jumps _there_. > > This patch rewrites the algorithm to deal with this. It also makes it > simpler: it does not need the "candidates" array anymore, it does not > need RTL layout mode, it does not need cleanup_cfg, and it does not > need to keep track of what blocks it already visited. > > Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version > of the testcase that has 2000 cases instead of 4. Is this okay for trunk? Looks good to me - a single question below: > > Segher > > > 2016-11-17 Segher Boessenkool > > PR rtl-optimization/71785 > * bb-reorder.c (maybe_duplicate_computed_goto): New function. > (duplicate_computed_gotos): New function. > (pass_duplicate_computed_gotos::execute): Rewrite. > > --- > gcc/bb-reorder.c | 214 > --- > 1 file changed, 93 insertions(+), 121 deletions(-) > > diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c > index 85bc569..90de2de 100644 > --- a/gcc/bb-reorder.c > +++ b/gcc/bb-reorder.c > @@ -2587,11 +2587,100 @@ make_pass_reorder_blocks (gcc::context *ctxt) >return new pass_reorder_blocks (ctxt); > } > > +/* Duplicate a block (that we already know ends in a computed jump) into its > + predecessors, where possible. Return whether anything is changed. */ > +static bool > +maybe_duplicate_computed_goto (basic_block bb, int max_size) > +{ > + if (single_pred_p (bb)) > +return false; > + > + /* Make sure that the block is small enough. */ > + rtx_insn *insn; > + FOR_BB_INSNS (bb, insn) > +if (INSN_P (insn)) > + { > + max_size -= get_attr_min_length (insn); > + if (max_size < 0) > +return false; > + } > + > + bool changed = false; > + edge e; > + edge_iterator ei; > + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) > +{ > + basic_block pred = e->src; > + > + /* Do not duplicate BB into PRED if that is the last predecessor, or if > + we cannot merge a copy of BB with PRED. */ > + if (single_pred_p (bb) > + || !single_succ_p (pred) > + || e->flags & EDGE_COMPLEX > + || pred->index < NUM_FIXED_BLOCKS > + || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred > + { > + ei_next (&ei); > + continue; > + } > + > + if (dump_file) > + fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n", > + bb->index, e->src->index); > + > + /* Remember if PRED can be duplicated; if so, the copy of BB merged > + with PRED can be duplicated as well. */ > + bool can_dup_more = can_duplicate_block_p (pred); > + > + /* Make a copy of BB, merge it into PRED. */ > + basic_block copy = duplicate_block (bb, e, NULL); > + emit_barrier_after_bb (copy); > + reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred)); > + merge_blocks (pred, copy); there is can_merge_blocks_p and I would have expected the RTL CFG hooks to do the insn "merging" (you use reorder_insns_nobb for this which is actually deprecated?). I suppose if you'd specify pred as third arg to duplicate_block the CFG hook would have done its work (can_merge_blocks_p checks a->next_bb == b)? I'm not that familiar with CFG-RTL mode. > + > + changed = true; > + > + /* Try to merge the resulting merged PRED into further predecessors. > */ > + if (can_dup_more) > + maybe_duplicate_computed_goto (pred, max_size); > +} > + > + return changed; > +} > + > /* Duplicate the blocks containing computed gotos. This basically unfactors > computed gotos that were factored early on in the compilation process to > - speed up edge based data flow. We used to not unfactoring them again, > - which can seriously pessimize code with many computed jumps in the source > - code, such as interpreters. See e.g. PR15242. */ > + speed up edge based data flow. We used to not unfactor them again, which > + can seriously pessimize code with many computed jumps in the source code, > + such as interpreters. See e.g. PR15242. */ > +static void > +duplic
Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
On Fri, Nov 18, 2016 at 09:09:40AM +0100, Richard Biener wrote: > > + /* Make a copy of BB, merge it into PRED. */ > > + basic_block copy = duplicate_block (bb, e, NULL); > > + emit_barrier_after_bb (copy); > > + reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred)); > > + merge_blocks (pred, copy); > > there is can_merge_blocks_p and I would have expected the RTL CFG hooks > to do the insn "merging" (you use reorder_insns_nobb for this which is > actually deprecated?). I suppose if you'd specify pred as third arg > to duplicate_block the CFG hook would have done its work > (can_merge_blocks_p checks a->next_bb == b)? I'm not that familiar > with CFG-RTL mode. The third arg to duplicate_block ("after") doesn't do anything in either RTL mode: it only is passed to move_block_after, but that is a noop for the RTL modes (and its result is not checked by duplicate_block, ouch). can_merge_blocks_p will always return true here (except I forgot to check for simplejump_p -- I'll add that). rtl_merge_blocks does not check rtl_can_merge_blocks_p (and you get a quite spectacular ICE when you get it wrong: everything between the two blocks is deleted :-) ). I'll make a separate patch that checks it (layout mode already does). reorder_insns_nobb is deprecated but it does an important job for which there is no replacement. In many cases you can do better than doing manual surgery on the insn stream of course. Thanks for the review, Segher
Re: [PATCH] [AArch64] Fix PR78382
Hi Naveen, On 18/11/16 05:30, Hurugalawadi, Naveen wrote: Hi, Please find attached the patch that fixes PR78382. The "SYMBOL_SMALL_TLSGD" was not handled for ILP32. Hence it generates error when compiled for ILP32. The attached patch adds the support and handles it properly as expected for ILP32. Please review the patch and let me know if its okay? Regression tested on AArch64 with no regressions. Bootstrap on an aarch64 system is also required. Thanks, Naveen 2016-11-18 Naveen H.S * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle SYMBOL_SMALL_TLSGD for ILP32. * config/aarch64/aarch64.md : tlsgd_small modified into tlsgd_small_ to support SImode and DImode. *tlsgd_small modified into *tlsgd_small_ to support SImode and DImode. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 11d41cf..1688f0d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1374,10 +1374,17 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, case SYMBOL_SMALL_TLSGD: { rtx_insn *insns; - rtx result = gen_rtx_REG (Pmode, R0_REGNUM); + rtx result; + if (TARGET_ILP32) + result = gen_rtx_REG (SImode, R0_REGNUM); + else + result = gen_rtx_REG (DImode, R0_REGNUM); Just use gen_rtx_REG (ptr_mode, R0_REGNUM)? start_sequence (); - aarch64_emit_call_insn (gen_tlsgd_small (result, imm)); + if (TARGET_ILP32) + aarch64_emit_call_insn (gen_tlsgd_small_si (result, imm)); + else + aarch64_emit_call_insn (gen_tlsgd_small_di (result, imm)); insns = get_insns (); end_sequence (); diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a652a7c..4833c7f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -5089,20 +5089,20 @@ ;; The TLS ABI specifically requires that the compiler does not schedule ;; instructions in the TLS stubs, in order to enable linker relaxation. ;; Therefore we treat the stubs as an atomic sequence. -(define_expand "tlsgd_small" +(define_expand "tlsgd_small_" [(parallel [(set (match_operand 0 "register_operand" "") (call (mem:DI (match_dup 2)) (const_int 1))) -(unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS) +(unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS) (clobber (reg:DI LR_REGNUM))])] "" { operands[2] = aarch64_tls_get_addr (); }) -(define_insn "*tlsgd_small" +(define_insn "*tlsgd_small_" [(set (match_operand 0 "register_operand" "") (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1))) - (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS) + (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS) (clobber (reg:DI LR_REGNUM)) ] "" diff --git a/gcc/testsuite/gcc.target/aarch64/pr78382.c b/gcc/testsuite/gcc.target/aarch64/pr78382.c new file mode 100644 index 000..6c98e5e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr78382.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -fpic -mabi=ilp32 -mtls-dialect=trad" } */ + +__thread int abc; +void +foo () +{ + int *p; + p = &abc; +}
Re: [PATCH, GCC/ARM] Make arm_feature_set agree with type of FL_* macros
On 17/11/16 20:41, Thomas Preudhomme wrote: On 17/11/16 09:10, Kyrill Tkachov wrote: On 16/11/16 18:09, Thomas Preudhomme wrote: Hi, I've rebased the patch to make arm_feature_set agree with type of FL_* macros on top of trunk rather than on top of the optional -mthumb patch. That involved doing the changes to gcc/config/arm/arm-protos.h rather than gcc/config/arm/arm-flags.h. I also took advantage of the fact that each line is changed to change the indentation to tabs and add dots in comments missing one. For reference, please find below the original patch description: Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array of 2 unsigned long. However, the flags stored in these two entries are (signed) int, being combinations of bits set via expression of the form 1 << bitno. This creates 3 issues: 1) undefined behavior when setting the msb (1 << 31) 2) undefined behavior when storing a flag with msb set (negative int) into one of the unsigned array entries (positive int) 3) waste of space since the top 32 bits of each entry is not used This patch changes the definition of FL_* macro to be unsigned int by using the form 1U << bitno instead and changes the definition of arm_feature_set to be an array of 2 unsigned (int) entries. *** gcc/ChangeLog *** 2016-10-15 Thomas Preud'homme * config/arm/arm-protos.h (FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32, FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E, FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM, FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV, FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT, FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST): Reindent comment, add final dot when missing and make value unsigned. (arm_feature_set): Use unsigned entries instead of unsigned long. Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state. Is this ok for trunk? Best regards, Thomas On 14/11/16 18:56, Thomas Preudhomme wrote: My apologize, I realized when trying to apply the patch that I wrote it on top of the optional -mthumb patch instead of the reverse. I'll rebase it to not screw up bisect. Best regards, Thomas On 14/11/16 14:47, Kyrill Tkachov wrote: On 14/11/16 14:07, Thomas Preudhomme wrote: Hi, Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array of 2 unsigned long. However, the flags stored in these two entries are (signed) int, being combinations of bits set via expression of the form 1 << bitno. This creates 3 issues: 1) undefined behavior when setting the msb (1 << 31) 2) undefined behavior when storing a flag with msb set (negative int) into one of the unsigned array entries (positive int) 3) waste of space since the top 32 bits of each entry is not used This patch changes the definition of FL_* macro to be unsigned int by using the form 1U << bitno instead and changes the definition of arm_feature_set to be an array of 2 unsigned (int) entries. Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state. Is this ok for trunk? Ok. Thanks, Kyrill Best regards, Thomas +#define FL_ARCH7 (1U << 22)/* Architecture 7. */ +#define FL_ARM_DIV(1U << 23)/* Hardware divide (ARM mode). */ +#define FL_ARCH8 (1U << 24)/* Architecture 8. */ +#define FL_CRC32 (1U << 25)/* ARMv8 CRC32 instructions. */ +#define FL_SMALLMUL (1U << 26)/* Small multiply supported. */ +#define FL_NO_VOLATILE_CE (1U << 27)/* No volatile memory in IT block. */ + +#define FL_IWMMXT (1U << 29)/* XScale v2 or "Intel Wireless MMX + technology". */ +#define FL_IWMMXT2(1U << 30)/* "Intel Wireless MMX2 +technology". */ +#define FL_ARCH6KZ(1U << 31)/* ARMv6KZ architecture. */ + +#define FL2_ARCH8_1 (1U << 0)/* Architecture 8.1. */ +#define FL2_ARCH8_2 (1U << 1)/* Architecture 8.2. */ +#define FL2_FP16INST (1U << 2)/* FP16 Instructions for ARMv8.2 and + later. */ Since you're here can you please replace the "Architecture " by "ARMv(7,8,8.1,8.2)-A" in these entries. That seems to make it less accurate though. For a start, most of them would also apply to the R profile. There is also a couple of them that would apply to the M profile: for instance FL_ARCH7 would be set for ARMv7-M and FL_ARCH8 would be set for ARMv8-M. Fair point. Ok as is then, Kyrill Best regards, Thomas
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 17/11/16 17:45, Segher Boessenkool wrote: On Thu, Nov 17, 2016 at 04:50:46PM +, Kyrill Tkachov wrote: Is loading/storing a pair as cheap as loading/storing a single register? In that case you could shrink-wrap per pair of registers instead. I suppose it can vary by microarchitecture. For the purposes of codegen I'd say it's more expensive than load/storing a single register (as there's more memory bandwidth required after all) but cheaper than two separate loads stores (alignment quirks notwithstanding). Interesting idea. That could help with code size too. I'll try it out. I'm encountering some difficulties implementing this idea. I want to still keep the per-register structures across the hooks but basically restrict the number of components in a basic block to an even number of FPRs and GPRs. I tried doing this in COMPONENTS_FOR_BB So your COMPONENTS_FOR_BB returns both components in a pair whenever one of those is needed? That should work afaics. I mean I still want to have one component per register and since emit_{prologue,epilogue}_components knows how to form pairs from the components passed down to it I just need to restrict the number of components in any particular basic block to an even number. So say a function can wrap 5 registers: x22,x23,x24,x25,x26. I want get_separate_components to return 5 components since in that hook we don't know how these registers are distributed across each basic block. components_for_bb has that information. In components_for_bb I want to restrict the components for a basic block to an even number, so if normally all 5 registers would be valid for wrapping in that bb I'd only choose 4 so I could form 2 pairs. But selecting only 4 of the 5 registers, say only x22,x23,x24,x25 leads to x26 not being saved or restored at all, even during the normal prologue and epilogue because x26 was marked as a component in components_for_bb and therefore omitted from the prologue and epilogue. So I'm thinking x26 should be removed from the wrappable components of a basic block by disqualify_components. I'm trying that approach now. Thanks, Kyrill but apparently this ended up not saving/restoring some of the registers at all because the components that were "filtered out" that way still made their way to the bitmap passed into SET_HANDLED_COMPONENTS and so the normal prologue/epilogue didn't end up saving and restoring them. I am not sure what this means? "filtered out"? Segher
Re: [PATCH 1/2] PR77822
Hi Dominik, On Thu, Nov 17, 2016 at 04:53:47PM +0100, Dominik Vogt wrote: > +/* A convenience macro to determine whether a SIZE lies inclusively > + within [1, RANGE], POS lies inclusively within between > + [0, RANGE - 1] and the sum lies inclusively within [1, RANGE]. */ > +#define SIZE_POS_IN_RANGE(SIZE, POS, RANGE) \ > + (IN_RANGE (POS, 0, (RANGE) - 1) \ > + && IN_RANGE (SIZE, 1, RANGE) \ > + && IN_RANGE ((SIZE) + (POS), 1, RANGE)) You should put parens around every use of SIZE, POS, RANGE -- there could be a comma operator in the macro argument. You don't check if SIZE+POS overflows / wraps around. If you really don't care about that, you can lose the > + && IN_RANGE (SIZE, 1, RANGE) \ part as well? Segher
Re: Add a load_extend_op wrapper
Eric Botcazou writes: >> 2016-11-15 Richard Sandiford >> Alan Hayward >> David Sherwood >> >> * rtl.h (load_extend_op): Declare. >> * rtlanal.c (load_extend_op): New function. > > I'd make it an inline function. Sorry, I'd committed it before I got this message. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * rtlanal.c (load_extend_op): Move to... * rtl.h: ...here and make inline. diff --git a/gcc/rtl.h b/gcc/rtl.h index df5172b..b21514f 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2954,7 +2954,6 @@ extern void set_insn_deleted (rtx); /* Functions in rtlanal.c */ -extern rtx_code load_extend_op (machine_mode); extern rtx single_set_2 (const rtx_insn *, const_rtx); extern bool contains_symbol_ref_p (const_rtx); extern bool contains_symbolic_reference_p (const_rtx); @@ -3771,5 +3770,17 @@ struct GTY(()) cgraph_rtl_info { unsigned function_used_regs_valid: 1; }; +/* If loads from memories of mode MODE always sign or zero extend, + return SIGN_EXTEND or ZERO_EXTEND as appropriate. Return UNKNOWN + otherwise. */ + +inline rtx_code +load_extend_op (machine_mode mode) +{ + if (SCALAR_INT_MODE_P (mode) + && GET_MODE_PRECISION (mode) < BITS_PER_WORD) +return LOAD_EXTEND_OP (mode); + return UNKNOWN; +} #endif /* ! GCC_RTL_H */ diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index f07a77a..55a9d2c 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -3863,19 +3863,6 @@ subreg_nregs_with_regno (unsigned int regno, const_rtx x) return info.nregs; } -/* If loads from memories of mode MODE always sign or zero extend, - return SIGN_EXTEND or ZERO_EXTEND as appropriate. Return UNKNOWN - otherwise. */ - -rtx_code -load_extend_op (machine_mode mode) -{ - if (SCALAR_INT_MODE_P (mode) - && GET_MODE_PRECISION (mode) < BITS_PER_WORD) -return LOAD_EXTEND_OP (mode); - return UNKNOWN; -} - struct parms_set_data { int nregs;
Re: [PATCH] [AArch64] Fix PR78382
Hi Kyrill, Thanks for the comment. Bootstrapped successfully on AArch64 (thunder) system. And also regression tested on AArch64(thunder) with no regressions. Thanks, Naveen
[PATCH GCC]Move simplification from fold-cond.c to match.pd and extend it
Hi, This is a rework of https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02005.html. According to review comment, I extended the original patch and made it covering last kind simplification of fold_cond_expr_with_comparison (Well, this patch handles <, <=, > and >=. == will be handled by a follow up). This patch also adds several tests, some tests are for existing fold_cond_expr_with_comparison simplification but not covered yet; others are for new extension. Bootstrap and test on x86_64 and AArch64 along with following patches, is it OK? Thanks, bin 2016-11-16 Bin Cheng * fold-const.c (fold_cond_expr_with_comparison): Move simplification for A cmp C1 ? A : C2 to below, also simplify remaining code. * match.pd: Move and extend simplification from above to here: (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)). * tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func. (predicate_scalar_phi): Call fold_stmt using the new valueize func. gcc/testsuite/ChangeLog 2016-11-16 Bin Cheng * gcc.dg/fold-cond_expr-1.c: New test. * gcc.dg/fold-condcmpconv-1.c: New test. * gcc.dg/fold-condcmpconv-2.c: New test. * gcc.dg/tree-ssa/pr66726.c: Adjust test strings.diff --git a/gcc/fold-const.c b/gcc/fold-const.c index c597414..1e61ccf 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -5210,95 +5210,18 @@ fold_cond_expr_with_comparison (location_t loc, tree type, } } - /* If this is A op C1 ? A : C2 with C1 and C2 constant integers, - we might still be able to simplify this. For example, - if C1 is one less or one more than C2, this might have started - out as a MIN or MAX and been transformed by this function. - Only good for INTEGER_TYPEs, because we need TYPE_MAX_VALUE. */ + /* If this is A == C1 ? A : C2 with C1 and C2 constant integers, + we simplify it into A == C1 ? C1 : C2. */ - if (INTEGRAL_TYPE_P (type) + if (comp_code == EQ_EXPR + && INTEGRAL_TYPE_P (type) && TREE_CODE (arg01) == INTEGER_CST + && TREE_CODE (arg1) != INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST) -switch (comp_code) - { - case EQ_EXPR: - if (TREE_CODE (arg1) == INTEGER_CST) - break; - /* We can replace A with C1 in this case. */ - arg1 = fold_convert_loc (loc, type, arg01); - return fold_build3_loc (loc, COND_EXPR, type, arg0, arg1, arg2); - - case LT_EXPR: - /* If C1 is C2 + 1, this is min(A, C2), but use ARG00's type for - MIN_EXPR, to preserve the signedness of the comparison. */ - if (! operand_equal_p (arg2, TYPE_MAX_VALUE (type), - OEP_ONLY_CONST) - && operand_equal_p (arg01, - const_binop (PLUS_EXPR, arg2, -build_int_cst (type, 1)), - OEP_ONLY_CONST)) - { - tem = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (arg00), arg00, - fold_convert_loc (loc, TREE_TYPE (arg00), -arg2)); - return fold_convert_loc (loc, type, tem); - } - break; - - case LE_EXPR: - /* If C1 is C2 - 1, this is min(A, C2), with the same care - as above. */ - if (! operand_equal_p (arg2, TYPE_MIN_VALUE (type), - OEP_ONLY_CONST) - && operand_equal_p (arg01, - const_binop (MINUS_EXPR, arg2, -build_int_cst (type, 1)), - OEP_ONLY_CONST)) - { - tem = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (arg00), arg00, - fold_convert_loc (loc, TREE_TYPE (arg00), -arg2)); - return fold_convert_loc (loc, type, tem); - } - break; - - case GT_EXPR: - /* If C1 is C2 - 1, this is max(A, C2), but use ARG00's type for - MAX_EXPR, to preserve the signedness of the comparison. */ - if (! operand_equal_p (arg2, TYPE_MIN_VALUE (type), - OEP_ONLY_CONST) - && operand_equal_p (arg01, - const_binop (MINUS_EXPR, arg2, -build_int_cst (type, 1)), - OEP_ONLY_CONST)) - { - tem = fold_build2_loc (loc, MAX_EXPR, TREE_TYPE (arg00), arg00, - fold_convert_loc (loc, TREE_TYPE (arg00), -arg2)); - return fold_convert_loc (loc, type, tem); - } - break; - - case GE_EXPR: - /* If C1 is C2 + 1, this is max(A, C2), with the same care as above. */ - if (! operand_equal_p (arg2, TYPE_MAX_VALUE (type), -
Re: [PATCH 1/4] Remove build dependence on HSA run-time
On Sun, Nov 13, 2016 at 08:02:41PM +0100, Martin Jambor wrote: > @@ -143,6 +240,12 @@ init_enviroment_variables (void) > suppress_host_fallback = true; >else > suppress_host_fallback = false; > + > + hsa_runtime_lib = getenv ("HSA_RUNTIME_LIB"); > + if (hsa_runtime_lib == NULL) > +hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so"; libgomp is very much env var driven, but the above one is IMHO just too dangerous in suid/sgid apps, allowing one to select a library of their own choice to dlopen is an instant exploit possibility, so such env var should be only considered in non-priviledged processes. It is possible to try dlopen (hsa_runtime_lib) and if that fails, try dlopen ("libhsa-runtime64.so"), where it would search the library only in the system paths (note, the dynamic linker handles LD_LIBRARY_PATH, LD_PRELOAD etc. safely in priviledges processes). So I'd recommend to use secure_getenv instead. E.g. see how libgfortran checks for it in configure and even provides a fallback version for it. In the HSA plugin case, I think the fallback should be static function in the plugin. Otherwise it looks reasonable, thanks for working on that. Jakub
[PATCH][wwwdocs] Document new arm/aarch64 CPU support in a more compact way
Hi all, This patch brings the new CPU support announcements in line with the format used in the GCC 6 notes. That is, rather than have a separate "The is now supported via the..." entry for each new core just list them and give a use example with the -mcpu,-mtune options. Ok to commit? Thanks, Kyrill Index: htdocs/gcc-7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.24 diff -U 3 -r1.24 changes.html --- htdocs/gcc-7/changes.html 9 Nov 2016 14:28:59 - 1.24 +++ htdocs/gcc-7/changes.html 16 Nov 2016 10:23:04 - @@ -287,14 +287,14 @@ processing floating-point instructions. - The ARM Cortex-A73 processor is now supported via the - -mcpu=cortex-a73 and -mtune=cortex-a73 - options as well as the equivalent target attributes and pragmas. - - - The Broadcom Vulcan processor is now supported via the - -mcpu=vulcan and -mtune=vulcan options as - well as the equivalent target attributes and pragmas. + Support has been added for the following processors + (GCC identifiers in parentheses): ARM Cortex-A73 + (cortex-a73) and Broadcom Vulcan (vulcan). + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a73 or + -mtune=vulcan or as arguments to the equivalent target + attributes and pragmas. @@ -316,19 +316,14 @@ armv8-m.main+dsp options. - The ARM Cortex-A73 processor is now supported via the - -mcpu=cortex-a73 and -mtune=cortex-a73 - options. - - - The ARM Cortex-M23 processor is now supported via the - -mcpu=cortex-m23 and -mtune=cortex-m23 - options. - - - The ARM Cortex-M33 processor is now supported via the - -mcpu=cortex-m33 and -mtune=cortex-m33 - options. + Support has been added for the following processors + (GCC identifiers in parentheses): ARM Cortex-A73 + (cortex-a73), ARM Cortex-M23 (cortex-m23) and + ARM Cortex-M33 (cortex-m33). + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a73 or + -mtune=cortex-m33.
Re: [PATCH 2/4] HSA specific built-ins
On Sun, Nov 13, 2016 at 08:39:35PM +0100, Martin Jambor wrote: > Hello, > > this patch adds a small file hsa-builtins.def which defines a few > builtins that I then use in OpenMP lowering and expansion. > > After we split gridification stuff in omp-low.c to a separate file, we > should be able to only conditionally include the file and remove the > weird conditional ifdef. > > OK for trunk? Does this work well even with lto and jit FEs? Ok for trunk if it does. > 2016-11-11 Martin Jambor > > gcc/ > * hsa-builtins.def: New file. > * Makefile.in (BUILTINS_DEF): Add hsa-builtins.def dependency. > * builtins.def: Include hsa-builtins.def. > (DEF_HSA_BUILTIN): New macro. > > fortran/ > * f95-lang.c (DEF_HSA_BUILTIN): New macro. Jakub
Re: [PATCH 3/4] OpenMP lowering changes from the hsa branch
On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote: > + size_t collapse = gimple_omp_for_collapse (for_stmt); > + struct omp_for_data_loop *loops > += (struct omp_for_data_loop *) > +alloca (gimple_omp_for_collapse (for_stmt) > + * sizeof (struct omp_for_data_loop)); Use struct omp_for_data_loop *loops = XALLOCAVEC (struct omp_for_data_loop, gimple_omp_for_collapse (for_stmt)); instead? > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp = > { >GIMPLE_PASS, /* type */ >"ompexp", /* name */ > - OPTGROUP_NONE, /* optinfo_flags */ > + OPTGROUP_OPENMP, /* optinfo_flags */ >TV_NONE, /* tv_id */ >PROP_gimple_any, /* properties_required */ >PROP_gimple_eomp, /* properties_provided */ What about the simdclone, omptargetlink, diagnose_omp_blocks passes? What about openacc specific passes (oaccdevlow)? And Alex is hopefully going to add ompdevlow pass soon. Otherwise LGTM. Jakub
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
Hi Bruce, Mike, > On Fri, Nov 11, 2016 at 3:18 AM, Mike Stump wrote: >> >> No objections from me. >> > Or me. Thanks! as discussed at length in the PR, the fixincludes route alone isn't enough to get libsanitizer to build on Darwin 15: we stay with undefined references to a function which had to fixinclude'd out due to use of the Blocks extension. On closer inspection, it turns out that the Darwin 16 effectively disables os_trace unless using clang >= 7.3. So there's no point in jumping through hoops to achieve the same with a large effort. OTOH, the fixes in my fixincludes patch fix real incompatibilities between Darwin headers an GCC, so we agreed to keep them even if libsanitizer would build without. So the current suggestion is to combine my fixincludes patch and Jack's patch to disable use if !__BLOCKS__. Bootstrapped without regressions on Darwin 16 by myself, and on 15 and 14 by Jack. I guess this is ok for mainline now to restore bootstrap? It may be that we add some more fixincludes fixes later, even if not required for libsanitizer. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2016-11-10 Rainer Orth PR sanitizer/78267 * inclhack.def (darwin_availabilityinternal, darwin_os_trace_1) (darwin_os_trace_2, darwin_os_trace_3): New fixes. (hpux_stdint_least_fast): Remove spurious _EOFix_. * fixincl.x: Regenerate. * tests/bases/AvailabilityInternal.h: New file. * tests/bases/os/trace.h: New file. 2016-11-14 Jack Howarth PR sanitizer/78267 * sanitizer_common/sanitizer_mac.cc: Include only if compiler supports blocks extension. # HG changeset patch # Parent 14eb543b5cb5e7a2bdea7430e3bf3b771bf0e54f Fix macOS 10.12 and (PR sanitizer/78267) diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def --- a/fixincludes/inclhack.def +++ b/fixincludes/inclhack.def @@ -1338,6 +1338,32 @@ fix = { }; /* + * macOS 10.12 uses __attribute__((availability)) + * unconditionally. + */ +fix = { +hackname = darwin_availabilityinternal; +mach = "*-*-darwin*"; +files = AvailabilityInternal.h; +select= "#define[ \t]+(__API_[ADU]\\([^)]*\\)).*"; +c_fix = format; +c_fix_arg = <<- _EOFix_ + #if defined(__has_attribute) + #if __has_attribute(availability) + %0 + #else + #define %1 + #endif + #else + #define %1 + #endif + _EOFix_; + +test_text = "#define __API_A(x) __attribute__((availability(__API_AVAILABLE_PLATFORM_##x)))\n" + "#define __API_D(msg,x) __attribute__((availability(__API_DEPRECATED_PLATFORM_##x,message=msg)))"; +}; + +/* * For the AAB_darwin7_9_long_double_funcs fix to be useful, * you have to not use "" includes. */ @@ -1410,6 +1436,62 @@ fix = { }; /* + * Mac OS X 10.11 uses attribute on function definition. + */ +fix = { + hackname = darwin_os_trace_1; + mach = "*-*-darwin*"; + files = os/trace.h; + select= "^(_os_trace_verify_printf.*) (__attribute__.*)"; + c_fix = format; + c_fix_arg = "%1"; + test_text = "_os_trace_verify_printf(const char *msg, ...) __attribute__((format(printf, 1, 2)))"; +}; + +/* + * Mac OS X 10.1[012] os_trace_payload_t typedef uses Blocks + * extension without guard. + */ +fix = { + hackname = darwin_os_trace_2; + mach = "*-*-darwin*"; + files = os/trace.h; + select= "typedef.*\\^os_trace_payload_t.*"; + c_fix = format; + c_fix_arg = "#if __BLOCKS__\n%0\n#endif"; + test_text = "typedef void (^os_trace_payload_t)(xpc_object_t xdict);"; +}; + +/* + * In Mac OS X 10.1[012] , need to guard users of + * os_trace_payload_t typedef, too. + */ +fix = { + hackname = darwin_os_trace_3; + mach = "*-*-darwin*"; + files = os/trace.h; + select= <<- _EOSelect_ + __(API|OSX)_.* + OS_EXPORT.* + .* + _os_trace.*os_trace_payload_t payload); + _EOSelect_; + c_fix = format; + c_fix_arg = "#if __BLOCKS__\n%0\n#endif"; + test_text = <<- _EOText_ + __API_AVAILABLE(macosx(10.10), ios(8.0), watchos(2.0), tvos(8.0)) + OS_EXPORT OS_NOTHROW OS_NOT_TAIL_CALLED + void + _os_trace_with_buffer(void *dso, const char *message, uint8_t type, const void *buffer, size_t buffer_size, os_trace_payload_t payload); + + __OSX_AVAILABLE_STARTING(__MAC_10_12, __IPHONE_10_0) + OS_EXPORT OS_NOTHROW + void + _os_trace_internal(void *dso, uint8_t type, const char *format, const uint8_t *buf, size_t buf_size, os_trace_payload_t payload); + _EOText_; +}; + +/* * __private_extern__ doesn't exist in FSF GCC. Even if it did, * why would you ever put it in a system header file? */ @@ -2638,7 +2720,6 @@ fix = { c-fix-arg = "# define UINT_%164_MAX __UINT64_MAX__"; test-text = "# define UINT_FAST64_MAXULLONG_MAX\n" "# define UINT_LEAST64_MAXULLONG_MAX\n"; - _EOFix_; }; /* diff --git a/fixincludes/tests/base/A
Re: Add a load_extend_op wrapper
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > gcc/ > * rtlanal.c (load_extend_op): Move to... > * rtl.h: ...here and make inline. OK, thanks. -- Eric Botcazou
Re: [Patch 1/5] OpenACC tile clause support, OMP_CLAUSE_TILE adjustments
On Thu, Nov 17, 2016 at 05:27:14PM +0800, Chung-Lin Tang wrote: > I've updated the patch as you suggested. > Here's the v2 of the first patch, mainly gimplify.c adjusted. > > About the ChangeLog issues, I'll make sure the final committed diffs will > solve them. Ok. Jakub
Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+
Hi Jonathan, > On 03/11/16 15:11 +0100, Rainer Orth wrote: >>Fortunately, this is all easily fixed by wrapping the affected templates >>in a new macro. That's what this patch does. The new libstdc++ >>acinclude.m4 test may well need wording changes in comments etc. and can >>perhaps be shortened a bit, bit it worked for me. > > See below. thanks for the thorough review. > The change is OK in principle, but I'd prefer more meaningful names > for the macros ... Fine with me: I know close to nothing of C++, so please bear with me ;-) >>--- a/libstdc++-v3/acinclude.m4 >>+++ b/libstdc++-v3/acinclude.m4 >>@@ -2181,7 +2181,8 @@ AC_DEFUN([GLIBCXX_CHECK_STDIO_PROTO], [ >> ]) >> >> dnl >>-dnl Check whether required C++11 overloads are present in . >>+dnl Check whether required C++11 overloads and templates are present >>+dnl in . > > The standard doesn't actually require templates to be used here, it > only requires "sufficient additional overloads" so that integral > arguments work. We happen to do that with templates, and apparently so > do the Solaris headers, but other implementations are possible. So a > more accurate description would be: > > dnl Check whether required C++11 overloads for FP and integral types > dnl are present in . > > And rather than PROTO1 and PROTO2 the macros could be called PROTO_FP > for the overloads for FP types, and PROTO_INT (or just PROTO_I) for > the overloads for integral types (which happen to be templates in our > header). I went for the longer/more expressive names. > The suggestions above would make it shorter, while still remaining > accurate to what the real code in the header does. It could be reduced > even further without altering the meaning for the purposes of this > test: > >>+namespace std { >>+ template >>+constexpr typename __gnu_cxx::__enable_if >>+ <__is_integer<_Tp>::__value, double>::__type >>+log2(_Tp __x) >>+{ return __builtin_log2(__x); } > >template > struct __enable_if_int; > >template<> > struct __enable_if_int > { typedef double __type; }; > >template > constexpr typename __enable_if_int<_Tp>::__type > log2(_Tp __x) > { return __builtin_log2(__x); } > > I don't mind whether you go with this or just remove the unnecessary > structs, typedef and primary template bodies. Either is OK. I went for the latter to keep the testcase close to the original. Please find attached the revised version of the patch. I hope I've incorporated all of your comments. @Bruce: To avoid the hpux11_fabsf test breakage from my original fixincludes part, Dave suggested in private mail to replace the bypass = '__cplusplus' by mach = '*-hp-hpux11*' so the fix only applies to the systems that need them and doesn't get interference from other testcases. So I've got clean make check results in fixincludes. The patch has again been bootstrapped without regressions on i386-pc-solaris2.12, sparc-sun-solaris2.12, i386-pc-solaris2.11 (both with only the fp overloads and nothing at all), and x86_64-pc-linux-gnu. I plan to repeat that testing on the gcc-6 and gcc-5 branches; differences between the branches were minimal for the previons versions (like regenerating fixincl.x or an adjacent comment difference in c_global/cmath). Ok for mainline now and the backports after some soak time? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2016-10-27 Rainer Orth John David Anglin libstdc++-v3: * acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Update comments. (__CORRECT_ISO_CPP11_MATH_H_PROTO): Rename to ... (__CORRECT_ISO_CPP11_MATH_H_PROTO_FP): ... this. Add test for C++11 templates. * configure: Regenerate. * config.h.in: Regenerate. * include/c_global/cmath [__cplusplus >= 201103L]: Reflect __CORRECT_ISO_CPP11_MATH_H_PROTO to __CORRECT_ISO_CPP11_MATH_H_PROTO_FP rename. * include/c_global/cmath [_GLIBCXX_USE_C99_MATH && !_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC && __cplusplus >= 201103L] (std::fpclassify): Wrap in !__CORRECT_ISO_CPP11_MATH_H_PROTO_INT. (std::isfinite): Likewise. (std::isinf): Likewise. (std::isnan): Likewise. (std::isnormal): Likewise. (std::signbit): Likewise. (std::isgreater): Likewise. (std::isgreaterequal): Likewise. (std::isless): Likewise. (std::islessequal): Likewise. (std::islessgreater): Likewise. (std::isunordered): Likewise. [__cplusplus >= 201103L && _GLIBCXX_USE_C99_MATH_TR1] (std::acosh): Likewise. (std::asinh): Likewise. (std::atanh): Likewise. (std::cbrt): Likewise. (std::copysign): Likewise.
Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
On Fri, Nov 18, 2016 at 03:07:27AM -0600, Segher Boessenkool wrote: > rtl_merge_blocks does not check rtl_can_merge_blocks_p (and you get a > quite spectacular ICE when you get it wrong: everything between the two > blocks is deleted :-) ). I'll make a separate patch that checks it > (layout mode already does). This fails in cfgcleanup.c (in merge_blocks_move_successor_nojumps), on the "a->next_bb == b" test. Oh well. Segher
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
> On 17 Nov 2016, at 21:19, Jeff Law wrote: > > On 11/08/2016 05:49 PM, Iain Sandoe wrote: >> >>> On 8 Nov 2016, at 13:39, Mike Stump wrote: >>> >>> On Nov 8, 2016, at 1:05 PM, Iain Sandoe >>> wrote: Simple for the simple case is already part of my patch, but capability for the expert and non-simple case is also present, >>> >>> I'm trying to ask a specific question, what does the patch allow >>> that can't be done otherwise? Kinda a trivial question, and I >>> don't see any answer. >>> >>> I'm not looking for, it allows it to work, or it makes the expert >>> case work. I'm look for the specific question, and the specific >>> information you want, and why ld -v doesn't get it. >> >> ld -v gets it when you can execute ld. It doesn’t get it when the >> $host ld is not executable on $build. > Exactly! > >> Providing the option to give the version allows that without >> requiring the complexity of other (possibly valid) solutions. If you >> know that you’re building (my patched) ld64-253.9 for powerpc-darwin9 >> (crossed from x86-64-darwin14) it’s easy, just put —with-ld64=253.9 >> .. >> >> I think we’ve debated this enough - I’m OK with keeping my extra >> facility locally and will resubmit the patch with it removed in due >> course, Iain > Your call. But ISTM the ability to specify the linker version or even better, > its behaviour is a notable improvement for these crosses. Thanks, at least I’m not going completely crazy ;-) However, I’d already revised the patch to take the approach Mike wanted, and thus didn’t add the documentation that Joseph noted was missing. It takes quite a while to re-test this across the Darwin patch, so I’m going to put forward the amended patch (as Mike wanted it) and we can discuss ways to deal with native and Canadian crosses later. FWIW, it’s possible (but rather kludgy) with the attached patch to override gcc_gc_ld64_version on the make line to build a native/Canadian cross with a “non-standard” ld64 version (I did at least build a native X powerpc-darwin9 on x86_64-darwin14 which was not a brick). OK now for trunk? open branches? Iain gcc/ * configure.ac (with-ld64): New var, set for Darwin, set on detection of ld64, gcc_cv_ld64_export_dynamic: New, New test. * darwin.h: Use LD64_HAS_DYNAMIC export. DEF_LD64: New, define. * darwin10.h(DEF_LD64): Update for this target version. * darwin12.h(LINK_GCC_C_SEQUENCE_SPEC): Remove rdynamic test. (DEF_LD64): Update for this target version. PR71767-part2-revised.patch Description: Binary data
Re: [Patch 2/5] OpenACC tile clause support, omp-low parts
Hi! On Thu, Nov 17, 2016 at 05:31:40PM +0800, Chung-Lin Tang wrote: > +#ifndef ACCEL_COMPILER > + span = integer_one_node; > +#else > + if (!e_mask) > +/* Not paritioning. */ > +span = integer_one_node; ... This goes against the recent trend of avoiding #if/#ifdef guarded blocks of code as much as possible, the ACCEL_COMPILER only hunk is significant and will usually not be enabled, so people will not notice breakages in it until building an accel compiler. What about #ifndef ACCEL_COMPILER if (true) span = integer_one_node; else #endif if (!e_mask) /* Not paritioning. */ span_integer_one_node; else if ... or what I've proposed earlier: #ifndef ACCEL_COMPILER e_mask = 0; #endif if (!e_mask) ... Ok with that fixed. Jakub
Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts
On Thu, Nov 17, 2016 at 05:34:34PM +0800, Chung-Lin Tang wrote: > Updated C/C++ front-end patches, adjusted as reviewed. Jason is right, finish_omp_clauses will verify the tile operands when !processing_template_decl are non-negative host INTEGER_CSTs, so can't you just tsubst it like OMP_CLAUSE_COLLAPSE? If the operand is not a constant expression, presumably it will not be INTEGER_CST. On the other side, OMP_CLAUSE_TILE has now 3 operands instead of just 1, don't you need to do something during instantiation for the other 2 operands? Jakub
Re: [Patch 4/5] OpenACC tile clause support, Fortran front-end parts
On Sat, Nov 12, 2016 at 08:51:00AM -0800, Cesar Philippidis wrote: > On 11/11/2016 02:34 AM, Jakub Jelinek wrote: > > On Thu, Nov 10, 2016 at 06:46:46PM +0800, Chung-Lin Tang wrote: > > And here's the patch. The patch doesn't look like OpenACC tile clause fortran support, but bind/nohost clause C/C++ support. Jakub
Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally
On 17 November 2016 at 10:23, Kyrill Tkachov wrote: > > On 09/11/16 12:58, Bernd Edlinger wrote: >> >> Hi! >> >> >> This patch enables the ldrd/strd peephole rules unconditionally. >> >> It is meant to fix cases, where the patch to reduce the sha512 >> stack usage splits ldrd/strd instructions into separate ldr/str insns, >> but is technically independent from the other patch: >> >> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html >> >> It was necessary to change check_effective_target_arm_prefer_ldrd_strd >> to retain the true prefer_ldrd_strd tuning flag. >> >> >> Bootstrapped and reg-tested on arm-linux-gnueabihf. >> Is it OK for trunk? > > > This is ok. > Thanks, > Kyrill > Hi Bernd, Since you committed this patch (r242549), I'm seeing the new test failing on some arm*-linux-gnueabihf configurations: FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10 FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times strd 9 See http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html for a map of failures. Am I missing something? Thanks, Christophe >> Thanks >> Bernd. > >
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com] > Sent: 17 November 2016 21:53 > To: Matthew Fortune; Toma Tabacu; Andrew Bennett; 'gcc- > patc...@gcc.gnu.org' > Cc: Moore, Catherine > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard > library support check the sysroot supports the required test options. > > > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Matthew Fortune > > Sent: Thursday, November 17, 2016 8:47 AM > > To: Toma Tabacu ; Andrew Bennett > > ; Moore, Catherine > > ; 'gcc-patches@gcc.gnu.org' > patc...@gcc.gnu.org> > > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires > > standard library support check the sysroot supports the required test > > options. > > > > Toma Tabacu writes: > > > Hi, > > > > > > The patch below is a rebased version of Andrew's patch plus a few > > more changes > > > to test options. > > > > > > I have tested Andrew's patch by passing -msoft-float to the testsuite > > without > > > having a soft-float library available, and saw that the inline-memcpy- > > {1,2,3,4,5}.c > > > and memcpy-1.c tests were also failing to find standard library > > headers. > > > In the version below, I have added (REQUIRES_STDLIB) to them as > > well. > > > > > > However, I believe that the memcpy-1.c test was removed from the > > first version > > > of Andrew's patch in response to Matthew's comments, but I don't > > think it > > > should be. > > > > > > Tested with mips-img-linux-gnu and mips-mti-linux-gnu. > > > > This looks OK to me but I then again I helped with the design for this. > > > > I'd like to give Catherine a chance to take a look though as the feature > > is unusual. > > > > One comment below. > > > > > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp > > > b/gcc/testsuite/gcc.target/mips/mips.exp > > > index e22d782..ccd4ecb 100644 > > > --- a/gcc/testsuite/gcc.target/mips/mips.exp > > > +++ b/gcc/testsuite/gcc.target/mips/mips.exp > > > @@ -1420,6 +1426,22 @@ proc mips-dg-options { args } { > > > } > > > } > > > > > > +# If the test is marked as requiring standard libraries check > > > +# that the sysroot has support for the current set of test options. > > > +if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } { > > > + mips_push_test_options saved_options $extra_tool_flags > > > + set output [mips_preprocess "" { > > > + #include > > > + } 1] > > > + mips_pop_test_options saved_options > > > + > > > + # If the preprocessing of the stdlib.h file produced errors mark > > > + # the test as unsupported. > > > + if { ![string equal $output ""] } { > > > + set do_what [lreplace $do_what 1 1 "N"] > > > > We should describe what we expect the format of do_what to be at > > the time > > of writing in case it ever changes. i.e. a comment to say what the > > second > > character means and therefore make it clear that setting it to 'n' makes > > the test unsupported. > > > > This patch looks okay to me after updating the comment as Matthew suggested. The version below has a more detailed comment about marking tests as unsupported. Matthew, does it look good to you ? Also, should we document our expectations for the rest of do_what's format (elements 0 and 2) ? Regards, Toma Tabacu gcc/testsuite/ChangeLog: * gcc.target/mips/inline-memcpy-1.c (dg-options): Add (REQUIRES_STDLIB). * gcc.target/mips/inline-memcpy-2.c: Ditto. * gcc.target/mips/inline-memcpy-3.c: Ditto. * gcc.target/mips/inline-memcpy-4.c: Ditto. * gcc.target/mips/inline-memcpy-5.c: Ditto. * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto. * gcc.target/mips/loongson-simd.c: Ditto. * gcc.target/mips/memcpy-1.c: Ditto. * gcc.target/mips/mips-3d-1.c: Ditto. * gcc.target/mips/mips-3d-2.c: Ditto. * gcc.target/mips/mips-3d-3.c: Ditto. * gcc.target/mips/mips-3d-4.c: Ditto. * gcc.target/mips/mips-3d-5.c: Ditto. * gcc.target/mips/mips-3d-6.c: Ditto. * gcc.target/mips/mips-3d-7.c: Ditto. * gcc.target/mips/mips-3d-8.c: Ditto. * gcc.target/mips/mips-3d-9.c: Ditto. * gcc.target/mips/mips-ps-1.c: Ditto. * gcc.target/mips/mips-ps-2.c: Ditto. * gcc.target/mips/mips-ps-3.c: Ditto. * gcc.target/mips/mips-ps-4.c: Ditto. * gcc.target/mips/mips-ps-6.c: Ditto. * gcc.target/mips/mips16-attributes.c: Ditto. * gcc.target/mips/mips32-dsp-run.c: Ditto. * gcc.target/mips/mips32-dsp.c: Ditto. * gcc.target/mips/save-restore-1.c: Ditto. * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib. (mips_preprocess): Add ignore_output argument that when set will not return the pre-processed output. (mips_arch_info): Update arguments for the call to mips_preprocess. (mips-dg-init): Ditto. (m
Re: [PATCH 1/2] PR77822
On Fri, Nov 18, 2016 at 03:31:40AM -0600, Segher Boessenkool wrote: > Hi Dominik, > > On Thu, Nov 17, 2016 at 04:53:47PM +0100, Dominik Vogt wrote: > > +/* A convenience macro to determine whether a SIZE lies inclusively > > + within [1, RANGE], POS lies inclusively within between > > + [0, RANGE - 1] and the sum lies inclusively within [1, RANGE]. */ > > +#define SIZE_POS_IN_RANGE(SIZE, POS, RANGE) \ > > + (IN_RANGE (POS, 0, (RANGE) - 1) \ > > + && IN_RANGE (SIZE, 1, RANGE) \ > > + && IN_RANGE ((SIZE) + (POS), 1, RANGE)) > > You should put parens around every use of SIZE, POS, RANGE -- there could > be a comma operator in the macro argument. Good point. That wouldn't matter for a backend macro, but in system.h it does. > You don't check if SIZE+POS overflows / wraps around. If you really don't > care about that, you can lose the > > > + && IN_RANGE (SIZE, 1, RANGE) \ > > part as well? The macro is _intended_ for checking the (possibly bogus) arguments of zero_extract and sing_extract that combine may generate. SIZE is some unsigned value, but POS might be unsigned or signed, and actually have a negative value (INTVAL (operands[x]) or UINTVAL (operands[x]))), and RANGE is typically 64 or another small value. Anyway, the macro should work for any inputs (except RANGE <= 0). How about this: -- /* A convenience macro to determine whether a SIZE lies inclusively within [1, UPPER], POS lies inclusively within between [0, UPPER - 1] and the sum lies inclusively within [1, UPPER]. UPPER must be >= 1. */ #define SIZE_POS_IN_RANGE(SIZE, POS, UPPER) \ (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (UPPER) - 1) \ && IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (UPPER) \ - (unsigned HOST_WIDE_INT)(POS))) -- IN_RANGE(POS...) makes sure that POS is a non-negative number smaller than UPPER, so (UPPER) - (POS) does not wrap. Or is there some case that the new macro does not handle? Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH] OpenACC routines -- middle end
On Fri, Nov 11, 2016 at 03:43:02PM -0800, Cesar Philippidis wrote: > + error_at (OMP_CLAUSE_LOCATION (c), > + "%qs specifies a conflicting level of parallelism", > + omp_clause_code_name[OMP_CLAUSE_CODE (c)]); > + inform (OMP_CLAUSE_LOCATION (c_level), > + "... to the previous %qs clause here", I think the '... ' part is unnecessary. Perhaps word it better like we word errors/warnings for mismatched attributes etc.? > +incompatible: > + if (c_diag != NULL_TREE) > + error_at (OMP_CLAUSE_LOCATION (c_diag), > + "incompatible %qs clause when applying" > + " %<%s%> to %qD, which has already been" > + " marked as an accelerator routine", > + omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)], > + routine_str, fndecl); > + else if (c_diag_p != NULL_TREE) > + error_at (loc, > + "missing %qs clause when applying" > + " %<%s%> to %qD, which has already been" > + " marked as an accelerator routine", > + omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)], > + routine_str, fndecl); > + else > + gcc_unreachable (); > + if (c_diag_p != NULL_TREE) > + inform (OMP_CLAUSE_LOCATION (c_diag_p), > + "... with %qs clause here", > + omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)]); Again, I think this usually would be something like "previous %qs clause" or similar in the inform. Generally, I think the error message should be self-contained and infom should be just extra information, rather than error containing first half of the diagnostic message and inform the second one. E.g. for translations, while such a sentence crossing the two diagnostic routines might make sense in english, it might look terrible in other languages. > + else > + { > + /* In the front ends, we don't preserve location information for the > + OpenACC routine directive itself. However, that of c_level_p > + should be close. */ > + location_t loc_routine = OMP_CLAUSE_LOCATION (c_level_p); > + inform (loc_routine, "... without %qs clause near to here", > + omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)]); > + } > + /* Incompatible. */ > + return -1; > +} > + > + return 0; Jakub
Re: [PATCH] OpenACC routines -- c front end
On Fri, Nov 11, 2016 at 03:43:23PM -0800, Cesar Philippidis wrote: > @@ -11801,12 +11807,11 @@ c_parser_oacc_shape_clause (c_parser *parser, > omp_clause_code kind, > } > > location_t expr_loc = c_parser_peek_token (parser)->location; > - c_expr cexpr = c_parser_expr_no_commas (parser, NULL); > - cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true); > - tree expr = cexpr.value; > + tree expr = c_parser_expr_no_commas (parser, NULL).value; > if (expr == error_mark_node) > goto cleanup_error; > > + mark_exp_read (expr); > expr = c_fully_fold (expr, false, NULL); > > /* Attempt to statically determine when the number isn't a Why? Are the arguments of the clauses lvalues? > @@ -11867,12 +11872,12 @@ c_parser_oacc_shape_clause (c_parser *parser, > omp_clause_code kind, > seq */ > > static tree > -c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code, > - tree list) > +c_parser_oacc_simple_clause (c_parser * /* parser */, location_t loc, Just leave it as c_parser *, or better yet remove the argument if you don't need it. > + else > + { > + //TODO? TREE_USED (decl) = 1; This would be /* FIXME: TREE_USED (decl) = 1; */ but wouldn't it be better to figure out if you want to do that or not? Jakub
Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.
On 16 November 2016 at 23:12, Andrew Burgess wrote: > * Mike Stump [2016-11-16 12:59:53 -0800]: > >> On Nov 16, 2016, at 12:09 PM, Andrew Burgess >> wrote: >> > My only remaining concern is the new tests, I've tried to restrict >> > them to targets that I suspect they'll pass on with: >> > >> >/* { dg-final-use { scan-assembler "\.section\[\t >> > \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target >> > *-*-linux* *-*-gnu* } } } */ >> > >> > but I'm still nervous that I'm going to introduce test failures. Is >> > there any advice / guidance I should follow before I commit, or are >> > folk pretty relaxed so long as I've made a reasonable effort? >> >> So, if you are worried about the way the line is constructed, I usually test >> it by misspelling the *-*-linux* *-*-gnu* part as *-*-linNOTux* *-*-gnNOTu* >> and see if the test then doesn't run on your machine. If it doesn't then >> you can be pretty confident that only machines that match the target triplet >> can be impacted. I usually do this type of testing by running the test case >> in isolation (not the full tests suite). Anyway, do the best you can, and >> don't worry about t it too much, learn from the experience, even if it goes >> wrong in some way. If it did go wrong, just be responsive (don't check it >> in just before a 6 week vacation) about fixing it, if you can. >> > > Thanks for the feedback. > > Change committed as revision 242519. If anyone sees any issues just > let me know. > Hi Andrew, Sorry for the delay, there are so many commits these days, with so many regression reports to check manually before reporting here So, your new test fails on arm* targets: gcc.dg/tree-prof/section-attr-1.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold.0 gcc.dg/tree-prof/section-attr-2.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold.0 gcc.dg/tree-prof/section-attr-3.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold.0 Christophe > Thanks, > Andrew
Re: [PATCH] OpenACC routines -- fortran front end
On Fri, Nov 11, 2016 at 03:44:07PM -0800, Cesar Philippidis wrote: > --- a/gcc/fortran/gfortran.h > +++ b/gcc/fortran/gfortran.h > @@ -314,6 +314,15 @@ enum save_state > { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT > }; > > +/* Flags to keep track of ACC routine states. */ > +enum oacc_function > +{ OACC_FUNCTION_NONE = 0, Please add a newline after {. >if (clauses) > { >unsigned mask = 0; > >if (clauses->gang) > - level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); > + { > + level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); > + ret = OACC_FUNCTION_GANG; > + } >if (clauses->worker) > - level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); > + { > + level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); > + ret = OACC_FUNCTION_WORKER; > + } >if (clauses->vector) > - level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); > + { > + level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); > + ret = OACC_FUNCTION_VECTOR; > + } As you have {}s around, please use level = GOMP_DIM_*; mask |= GOMP_DIM_MASK (level); ret = OACC_FUNCTION_*; >if (clauses->seq) > level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level); > >if (mask != (mask & -mask)) > - gfc_error ("Multiple loop axes specified for routine"); > + ret = OACC_FUNCTION_NONE; > } > > - if (level < 0) > -level = GOMP_DIM_MAX; > - > - return level; > + return ret; > } > > match > gfc_match_oacc_routine (void) > { >locus old_loc; > - gfc_symbol *sym = NULL; >match m; > + gfc_intrinsic_sym *isym = NULL; > + gfc_symbol *sym = NULL; >gfc_omp_clauses *c = NULL; >gfc_oacc_routine_name *n = NULL; > + oacc_function dims = OACC_FUNCTION_NONE; > + bool seen_error = false; > >old_loc = gfc_current_locus; > > @@ -2287,45 +2314,52 @@ gfc_match_oacc_routine (void) >if (m == MATCH_YES) > { >char buffer[GFC_MAX_SYMBOL_LEN + 1]; > - gfc_symtree *st; > + gfc_symtree *st = NULL; > >m = gfc_match_name (buffer); >if (m == MATCH_YES) > { > - st = gfc_find_symtree (gfc_current_ns->sym_root, buffer); > + if ((isym = gfc_find_function (buffer)) == NULL > + && (isym = gfc_find_subroutine (buffer)) == NULL) > + { > + st = gfc_find_symtree (gfc_current_ns->sym_root, buffer); > + if (st == NULL && gfc_current_ns->proc_name->attr.contained Please add a newline before &&. > + && gfc_current_ns->parent) > + st = gfc_find_symtree (gfc_current_ns->parent->sym_root, > +buffer); > + } > @@ -5934,6 +6033,21 @@ gfc_resolve_oacc_blocks (gfc_code *code, gfc_namespace > *ns) >ctx.private_iterators = new hash_set; >ctx.previous = omp_current_ctx; >ctx.is_openmp = false; > + > + if (code->ext.omp_clauses->gang) > +dims = OACC_FUNCTION_GANG; > + if (code->ext.omp_clauses->worker) > +dims = OACC_FUNCTION_WORKER; > + if (code->ext.omp_clauses->vector) > +dims = OACC_FUNCTION_VECTOR; > + if (code->ext.omp_clauses->seq) > +dims = OACC_FUNCTION_SEQ; Shouldn't these be else if ? > + > + if (dims == OACC_FUNCTION_NONE && ctx.previous != NULL Again, as the whole condition doesn't fit on one line, please put && on a new line. > + && !ctx.previous->is_openmp) > +dims = ctx.previous->dims; Jakub
Re: [PATCH] OpenACC routines -- test cases
On Fri, Nov 11, 2016 at 03:44:29PM -0800, Cesar Philippidis wrote: > This patch contains new and adjusted, runtime and compiler test cases > for the new OpenACC routine functionality. > > Is this ok for trunk? Ok (though of course, if the diagnostic wording is adjusted, then the test will need to match). Jakub
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
Hi Segher, Thanks for the review, > On 6 Nov 2016, at 22:09, Segher Boessenkool > wrote: > > On Sun, Nov 06, 2016 at 12:13:16PM -0800, Iain Sandoe wrote: >> 2016-11-06 Iain Sandoe >> >> PR target/57438 >> * config/i386/i386.c (ix86_code_end): Note that we emitted code where >> the >> function might otherwise appear empty for picbase thunks. >> (ix86_output_function_epilogue): If we find a zero-sized function >> assume that >> reaching it is UB and trap. If we find a trailing label append a nop. >> * config/rs6000/rs6000.c (rs6000_output_function_epilogue): If we find >> a zero-sized function assume that reaching it is UB and trap. If we >> find a >> trailing label, append a nop. > > These lines are too long. fixed. > >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index b0d2b64..326e2e9 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -30148,11 +30148,16 @@ rs6000_output_function_epilogue (FILE *file, >> { >> #if TARGET_MACHO >> macho_branch_islands (); >> - /* Mach-O doesn't support labels at the end of objects, so if >> - it looks like we might want one, insert a NOP. */ >> + >> + if (TARGET_MACHO) > > Both #if and if? As discussed on IRC, I was under the impression that it is desired to move away from #ifdef towards if() and I have been adding those where locally things have been touched - in this case it was only partially possible. However, FAOD - you pointed out that for the RS6000 back-end #ifdef are still preferred, and I’ve removed this change. > +/* Mach-O doesn't support labels at the end of objects, so if >> + it looks like we might want one, take special action. >> + >> + First, collect any sequence deleted debug labels. */ > > "sequence of”. indeed > >> +/* If we have: >> + label: >> + barrier >> + That need to be guarded. */ > > "then this needs to be guarded.", or similar? amended, > >> +/* up to now we've only seen notes or barriers. */ > > Sentences start with a capital letter. fixed > >> +/* See if have a completely empty function body. */ > > "we have” amended > >> +while (insn && ! INSN_P (insn)) >> + insn = PREV_INSN (insn); >> +/* If we don't find any, we've got an empty function body; i.e. > > "any insn”? amended > >> + completely empty - without a return or branch. GCC declares >> + that reaching __builtin_unreachable() means UB (but we want >> + finite-sized function bodies; to help the user out, let's trap >> + the case. */ > > Zero is finite size; "non-empty function bodies”? yeah… amended. > > The rs6000 code looks fine, thanks; but please work on the text a bit :-) Sorry, that was poor proof-reading (too many times looking at the same code, I guess). OK now for trunk? Open branches? Iain PR target/57438 * config/i386/i386.c (ix86_code_end): Note that we emitted code where the function might otherwise appear empty for picbase thunks. (ix86_output_function_epilogue): If we find a zero-sized function assume that reaching it is UB and trap. If we find a trailing label append a nop. * config/rs6000/rs6000.c (rs6000_output_function_epilogue): If we find a zero-sized function assume that reaching it is UB and trap. If we find a trailing label, append a nop. PR57438-revised.patch Description: Binary data
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
Toma Tabacu writes: > The version below has a more detailed comment about marking tests as > unsupported. > Matthew, does it look good to you ? > > Also, should we document our expectations for the rest of do_what's > format (elements 0 and 2) ? I don’t think it is necessary. The explanation is just to make it clear how the modification should lead to the UNSUPPORTED state so that it can be re-implemented if necessary in the future with the same effect. > gcc/testsuite/ChangeLog: > > * gcc.target/mips/inline-memcpy-1.c (dg-options): Add > (REQUIRES_STDLIB). > * gcc.target/mips/inline-memcpy-2.c: Ditto. > * gcc.target/mips/inline-memcpy-3.c: Ditto. > * gcc.target/mips/inline-memcpy-4.c: Ditto. > * gcc.target/mips/inline-memcpy-5.c: Ditto. > * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto. > * gcc.target/mips/loongson-simd.c: Ditto. > * gcc.target/mips/memcpy-1.c: Ditto. > * gcc.target/mips/mips-3d-1.c: Ditto. > * gcc.target/mips/mips-3d-2.c: Ditto. > * gcc.target/mips/mips-3d-3.c: Ditto. > * gcc.target/mips/mips-3d-4.c: Ditto. > * gcc.target/mips/mips-3d-5.c: Ditto. > * gcc.target/mips/mips-3d-6.c: Ditto. > * gcc.target/mips/mips-3d-7.c: Ditto. > * gcc.target/mips/mips-3d-8.c: Ditto. > * gcc.target/mips/mips-3d-9.c: Ditto. > * gcc.target/mips/mips-ps-1.c: Ditto. > * gcc.target/mips/mips-ps-2.c: Ditto. > * gcc.target/mips/mips-ps-3.c: Ditto. > * gcc.target/mips/mips-ps-4.c: Ditto. > * gcc.target/mips/mips-ps-6.c: Ditto. > * gcc.target/mips/mips16-attributes.c: Ditto. > * gcc.target/mips/mips32-dsp-run.c: Ditto. > * gcc.target/mips/mips32-dsp.c: Ditto. > * gcc.target/mips/save-restore-1.c: Ditto. > * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib. > (mips_preprocess): Add ignore_output argument that when set > will not return the pre-processed output. > (mips_arch_info): Update arguments for the call to > mips_preprocess. > (mips-dg-init): Ditto. > (mips-dg-options): Check if a test having test option > (REQUIRES_STDLIB) has the required sysroot support for > the current test options. OK. Committed as r242587. Thanks, Matthew
Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+
On 18/11/16 12:10 +0100, Rainer Orth wrote: The change is OK in principle, but I'd prefer more meaningful names for the macros ... Fine with me: I know close to nothing of C++, so please bear with me ;-) No problem, that's what the rest of us are here to help with :-) I don't mind whether you go with this or just remove the unnecessary structs, typedef and primary template bodies. Either is OK. I went for the latter to keep the testcase close to the original. OK, that's probably simpler to maintain in case we change the code in the headers in future. Please find attached the revised version of the patch. I hope I've incorporated all of your comments. Looks good. Ok for mainline now and the backports after some soak time? Yes, the libstdc++ parts are OK, thanks.
Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
On Fri, Nov 18, 2016 at 09:09:40AM +0100, Richard Biener wrote: > > This patch rewrites the algorithm to deal with this. It also makes it > > simpler: it does not need the "candidates" array anymore, it does not > > need RTL layout mode, it does not need cleanup_cfg, and it does not > > need to keep track of what blocks it already visited. > > > > Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version > > of the testcase that has 2000 cases instead of 4. Is this okay for trunk? > > Looks good to me - a single question below: And here is a testcase, okay as well? Segher 2016-11-18 Segher Boessenkool gcc/testsuite/ PR rtl-optimization/71785 * gcc.target/powerpc/pr71785.c: New file. --- gcc/testsuite/gcc.target/powerpc/pr71785.c | 52 ++ 1 file changed, 52 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr71785.c diff --git a/gcc/testsuite/gcc.target/powerpc/pr71785.c b/gcc/testsuite/gcc.target/powerpc/pr71785.c new file mode 100644 index 000..613dcd1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr71785.c @@ -0,0 +1,52 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not {\mb\M} } } */ + +/* Check that all computed gotos in this testcase end up unfactored completely. + If some is not there will be a unconditional jump left; if all works fine, + all are gone. */ + +typedef enum opcode +{ + OP_A, + OP_B, + OP_END +} opcode; + +typedef struct op +{ + opcode opcode; + int arg; +} op; + +extern void do_stuff_b(int arg); +extern void do_stuff_c(int arg); + +extern int someglobal; + +void +eval(op *op) +{ + static const void *dispatch_table[] = { + &&CASE_OP_A, + &&CASE_OP_B, + &&CASE_OP_C, + &&CASE_OP_END + }; + + goto *dispatch_table[op->opcode]; +CASE_OP_A: + someglobal++; + op++; + goto *dispatch_table[op->opcode]; +CASE_OP_B: + do_stuff_b(op->arg); + op++; + goto *dispatch_table[op->opcode]; +CASE_OP_C: + do_stuff_c(op->arg); + op++; + goto *dispatch_table[op->opcode]; +CASE_OP_END: + return; +} -- 1.9.3
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On Fri, Nov 18, 2016 at 09:29:13AM +, Kyrill Tkachov wrote: > >So your COMPONENTS_FOR_BB returns both components in a pair whenever one > >of those is needed? That should work afaics. > > I mean I still want to have one component per register and since > emit_{prologue,epilogue}_components knows how to form pairs from the > components passed down to it I just need to restrict the number of > components in any particular basic block to an even number. > So say a function can wrap 5 registers: x22,x23,x24,x25,x26. > I want get_separate_components to return 5 components since in that hook > we don't know how these registers are distributed across each basic block. > components_for_bb has that information. > In components_for_bb I want to restrict the components for a basic block to > an even number, so if normally all 5 registers would be valid for wrapping > in that bb I'd only choose 4 so I could form 2 pairs. But selecting only 4 > of the 5 registers, say only x22,x23,x24,x25 leads to x26 not being saved > or restored at all, even during the normal prologue and epilogue because > x26 was marked as a component in components_for_bb and therefore omitted > from > the prologue and epilogue. > So I'm thinking x26 should be removed from the wrappable components of > a basic block by disqualify_components. I'm trying that approach now. My suggestion was, in components_for_bb, whenever you mark x22 as needed you also mark x23 as needed, and whenever you mark x23 as needed you also mark x22. I think this is a lot simpler? Segher
RE: [PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent redundant testing of branch-cost-1.c.
Moore, Catherine writes: > > gcc/testsuite/ChangeLog: > > > > 2016-11-15 Toma Tabacu > > > > * gcc.target/mips/branch-cost-1.c (dg-options): Use > > (HAS_MOVN) instead > > of isa>=4, in order to downgrade to R5. > > > > diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c > > b/gcc/testsuite/gcc.target/mips/branch-cost-1.c > > index 61c3029..7f7ebbe 100644 > > --- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c > > +++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c > > @@ -1,4 +1,4 @@ > > -/* { dg-options "-mbranch-cost=1 isa>=4" } */ > > +/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */ > > /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ > > NOMIPS16 int > > foo (int x, int y, int z, int k) > > I committed this patch for you. Have you requested write access to the > repository? I was thinking the same thing given Toma has now submitted a few patches. Please fill out the form below and reference either myself or Catherine: https://sourceware.org/cgi-bin/pdw/ps_form.cgi Thanks, Matthew
[Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979
Hi all, the attached patch fixes an ice-on-valid problem, simply by removing an assert. The generated code works as expected and the patch regtests cleanly on x86_64-linux-gnu. Ok for trunk? Cheers, Janus 2016-11-18 Janus Weil PR fortran/78392 * trans-array.c (gfc_trans_auto_array_allocation): Remove an assert. 2016-11-18 Janus Weil PR fortran/78392 * gfortran.dg/saved_automatic_2.f90: New test case. Index: gcc/fortran/trans-array.c === --- gcc/fortran/trans-array.c (Revision 242586) +++ gcc/fortran/trans-array.c (Arbeitskopie) @@ -5976,7 +5976,6 @@ gfc_trans_auto_array_allocation (tree decl, gfc_sy type = TREE_TYPE (type); gcc_assert (!sym->attr.use_assoc); - gcc_assert (!TREE_STATIC (decl)); gcc_assert (!sym->module); if (sym->ts.type == BT_CHARACTER ! { dg-do run } ! ! PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979 ! ! Contributed by Janus Weil module mytypes implicit none contains pure integer function get_i () get_i = 13 end function end module program test use mytypes implicit none integer, dimension(get_i()), save :: x if (size(x) /= 13) call abort() end
Re: Fix PR78154
On 17 November 2016 at 15:24, Richard Biener wrote: > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: > >> On 17 November 2016 at 14:21, Richard Biener wrote: >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi Richard, >> >> Following your suggestion in PR78154, the patch checks if stmt >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p >> >> and returns true in that case. >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress. >> >> Would it be OK to commit this patch in stage-3 ? >> > >> > As people noted we have returns_nonnull for this and that is already >> > checked. So please make sure the builtins get this attribute instead. >> OK thanks, I will add the returns_nonnull attribute to the required >> string builtins. >> I noticed some of the string builtins don't have RET1 in builtins.def: >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF. >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar >> to entries for memmove, strcpy ? > > Yes, I think so. Hi, In the attached patch I added returns_nonnull attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF, and changed few builtins like strcat, strncpy, strncat and corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF. Does the patch look correct ? Thanks, Prathamesh > > Richard. diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def index 8dc59c9..da82da5 100644 --- a/gcc/builtin-attrs.def +++ b/gcc/builtin-attrs.def @@ -108,6 +108,7 @@ DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic") DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm") DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure") DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice") +DEF_ATTR_IDENT (ATTR_RETURNS_NONNULL, "returns_nonnull") DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL) @@ -195,8 +196,11 @@ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL) /* Nothrow leaf functions whose pointer parameter(s) are all nonnull, and which return their first argument. */ -DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF_1, ATTR_RETURNS_NONNULL, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ + ATTR_RET1_NOTHROW_NONNULL_LEAF_1) + /* Nothrow const leaf functions whose pointer parameter(s) are all nonnull. */ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL_LEAF, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) diff --git a/gcc/builtins.def b/gcc/builtins.def index 219feeb..c697b0a 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -646,13 +646,13 @@ DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, "memchr", BT_FN_PTR_CONST_PTR_INT_SIZE, DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, "memcmp", BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMCPY, "memcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMMOVE, "memmove", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMSET, "memset", BT_FN_PTR_PTR_INT_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_RINDEX, "rindex", BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STRCASECMP, "strcasecmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCHR, "strchr", BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_STRCMP, "strcmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCPY, "strcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) @@ -661,9 +661,9 @@
Re: [PATCH] Fix PR78189
On 11 November 2016 at 09:56, Richard Biener wrote: > On Thu, 10 Nov 2016, Christophe Lyon wrote: > >> On 10 November 2016 at 09:34, Richard Biener wrote: >> > On Wed, 9 Nov 2016, Christophe Lyon wrote: >> > >> >> On 9 November 2016 at 09:36, Bin.Cheng wrote: >> >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener >> >> > wrote: >> >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote: >> >> >> >> >> >>> Hi Richard, >> >> >>> >> >> >>> >> >> >>> On 7 November 2016 at 09:01, Richard Biener wrote: >> >> >>> > >> >> >>> > The following fixes an oversight when computing alignment in the >> >> >>> > vectorizer. >> >> >>> > >> >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to >> >> >>> > trunk. >> >> >>> > >> >> >>> > Richard. >> >> >>> > >> >> >>> > 2016-11-07 Richard Biener >> >> >>> > >> >> >>> > PR tree-optimization/78189 >> >> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): >> >> >>> > Fix >> >> >>> > alignment computation. >> >> >>> > >> >> >>> > * g++.dg/torture/pr78189.C: New testcase. >> >> >>> > >> >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C >> >> >>> > === >> >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) >> >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) >> >> >>> > @@ -0,0 +1,41 @@ >> >> >>> > +/* { dg-do run } */ >> >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize >> >> >>> > -fno-vect-cost-model" } */ >> >> >>> > + >> >> >>> > +#include >> >> >>> > + >> >> >>> > +struct A >> >> >>> > +{ >> >> >>> > + void * a; >> >> >>> > + void * b; >> >> >>> > +}; >> >> >>> > + >> >> >>> > +struct alignas(16) B >> >> >>> > +{ >> >> >>> > + void * pad; >> >> >>> > + void * misaligned; >> >> >>> > + void * pad2; >> >> >>> > + >> >> >>> > + A a; >> >> >>> > + >> >> >>> > + void Null(); >> >> >>> > +}; >> >> >>> > + >> >> >>> > +void B::Null() >> >> >>> > +{ >> >> >>> > + a.a = nullptr; >> >> >>> > + a.b = nullptr; >> >> >>> > +} >> >> >>> > + >> >> >>> > +void __attribute__((noinline,noclone)) >> >> >>> > +NullB(void * misalignedPtr) >> >> >>> > +{ >> >> >>> > + B* b = reinterpret_cast(reinterpret_cast> >> >>> > *>(misalignedPtr) - offsetof(B, misaligned)); >> >> >>> > + b->Null(); >> >> >>> > +} >> >> >>> > + >> >> >>> > +int main() >> >> >>> > +{ >> >> >>> > + B b; >> >> >>> > + NullB(&b.misaligned); >> >> >>> > + return 0; >> >> >>> > +} >> >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> >> >>> > index 9346cfe..b03cb1e 100644 >> >> >>> > --- gcc/tree-vect-data-refs.c >> >> >>> > +++ gcc/tree-vect-data-refs.c >> >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct >> >> >>> > data_reference *dr) >> >> >>> >base = ref; >> >> >>> >while (handled_component_p (base)) >> >> >>> > base = TREE_OPERAND (base, 0); >> >> >>> > + unsigned int base_alignment; >> >> >>> > + unsigned HOST_WIDE_INT base_bitpos; >> >> >>> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> >> >>> > + /* As data-ref analysis strips the MEM_REF down to its base >> >> >>> > operand >> >> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we >> >> >>> > have to >> >> >>> > + adjust things to make base_alignment valid as the alignment of >> >> >>> > + DR_BASE_ADDRESS. */ >> >> >>> >if (TREE_CODE (base) == MEM_REF) >> >> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, >> >> >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, >> >> >>> > 1)), 0)); >> >> >>> > - unsigned int base_alignment = get_object_alignment (base); >> >> >>> > +{ >> >> >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * >> >> >>> > BITS_PER_UNIT; >> >> >>> > + base_bitpos &= (base_alignment - 1); >> >> >>> > +} >> >> >>> > + if (base_bitpos != 0) >> >> >>> > +base_alignment = base_bitpos & -base_bitpos; >> >> >>> > + /* Also look at the alignment of the base address DR analysis >> >> >>> > + computed. */ >> >> >>> > + unsigned int base_addr_alignment = get_pointer_alignment >> >> >>> > (base_addr); >> >> >>> > + if (base_addr_alignment > base_alignment) >> >> >>> > +base_alignment = base_addr_alignment; >> >> >>> > >> >> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> >> >>> > DR_VECT_AUX (dr)->base_element_aligned = true; >> >> >>> >> >> >>> Since you committed this patch (r241892), I'm seeing execution >> >> >>> failures: >> >> >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test >> >> >>> gcc.dg/vect/pr40074.c execution test >> >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 >> >> >>> --with-fpu=neon-fp16 >> >> >>> (using qemu as simulator) >> >> >> >> >> >> The difference is that we now vectorize the testcase with versioning >> >> >> for alignment (but it should never execute the vectorized varia
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new failures. > > Is it OK for trunk? > > Thanks. > Changelog: > > 2016-11-15 Yuri Rumyantsev > > * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. > * tree-if-conv.c (tree_if_conversion): Make public. > * * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. > * tree-vect-loop.c: include tree-if-conv.h. > (new_loop_vec_info): Add zeroing orig_loop_info field. > (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. > (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL > if epilogue is vectorized, set up orig_loop_info field of loop_vinfo > using passed argument. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization with less vf. If-convert epilogue if > required. Print vectorization success for epilogue. > * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization > if it is required, pass loop_vinfo produced during vectorization of > loop body to vect_analyze_loop. > * tree-vectorizer.h (struct _loop_vec_info): Add new field > orig_loop_info. > (LOOP_VINFO_ORIG_LOOP_INFO): New. > (LOOP_VINFO_EPILOGUE_P): New. > (LOOP_VINFO_ORIG_VECT_FACTOR): New. > (vect_do_peeling): Change prototype to return epilogue. > (vect_analyze_loop): Add argument of loop_vec_info type. > (vect_transform_loop): Return created loop. > > gcc/testsuite/ > > * lib/target-supports.exp (check_avx2_hw_available): New. > (check_effective_target_avx2_runtime): New. > * gcc.dg/vect/vect-tail-nomask-1.c: New test. > Hi, This new test fails on arm-none-eabi (using default cpu/fpu/mode): gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-tail-nomask-1.c execution test It does pass on the same target if configured --with-cpu=cortex-a9. Christophe > > 2016-11-14 20:04 GMT+03:00 Richard Biener : >> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >> wrote: >>>Richard, >>> >>>I checked one of the tests designed for epilogue vectorization using >>>patches 1 - 3 and found out that build compiler performs vectorization >>>of epilogues with --param vect-epilogues-nomask=1 passed: >>> >>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>t1.new-nomask.s -fdump-tree-vect-details >>>$ grep VECTORIZED -c t1.c.156t.vect >>>4 >>> Without param only 2 loops are vectorized. >>> >>>Should I simply add a part of tests related to this feature or I must >>>delete all not necessary changes also? >> >> Please remove all not necessary changes. >> >> Richard. >> >>>Thanks. >>>Yuri. >>> >>>2016-11-14 16:40 GMT+03:00 Richard Biener : On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > In my previous patch I forgot to remove couple lines related to aux >>>field. > Here is the correct updated patch. Yeah, I noticed. This patch would be ok for trunk (together with necessary parts from 1 and 2) if all not required parts are removed (and you'd add the testcases covering non-masked tail vect). Thus, can you please produce a single complete patch containing only non-masked epilogue vectoriziation? Thanks, Richard. > Thanks. > Yuri. > > 2016-11-14 15:51 GMT+03:00 Richard Biener : > > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > > > >> Richard, > >> > >> I prepare updated 3 patch with passing additional argument to > >> vect_analyze_loop as you proposed (untested). > >> > >> You wrote: > >> tw, I wonder if you can produce a single patch containing just > >> epilogue vectorization, that is combine patches 1-3 but rip out > >> changes only needed by later patches? > >> > >> Did you mean that I exclude all support for vectorization >>>epilogues, > >> i.e. exclude from 2-nd patch all non-related changes > >> like > >> > >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > >> index 11863af..32011c1 100644 > >> --- a/gcc/tree-vect-loop.c > >> +++ b/gcc/tree-vect-loop.c > >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) > >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; > >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; > >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; > >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; > >> + LOOP_VINFO_NEED_MASKING (res) = false; > >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > > > > Yes. > > > >> Did you mean also that new combined patch must be working patch, >>>i.e. >>
[patch,avr] Disallow LDS / STS for .rodata on AVR_TINY.
Currently, Binutils still comes with an AVR_TINY linker description file which puts .rodata into RAM so that LDS is applicable for reading such objects. However, as these devices come with a linearised address model, linker descriptions which put .rodata into flash are much more reasonable. This patch caters for such situations: As .rodata virtual addresses will be offset by 0x4000, respective objects may no more be accessed by LDS. Moreover, objects with a section attribute won't be accessed by LDS. Ok for trunk? Johann PR target/78093 * config/avr/avr.c (avr_decl_maybe_lds_p): New static function. (avr_encode_section_info) [TARGET_ABSDATA && AVR_TINY]: Use it. Index: config/avr/avr.c === --- config/avr/avr.c (revision 242544) +++ config/avr/avr.c (working copy) @@ -10102,6 +10102,33 @@ avr_section_type_flags (tree decl, const } +/* A helper for the next function. NODE is a decl that is associated with + a symbol. Return TRUE if the respective object may be accessed by LDS. + There migth still be other reasons for why LDS is not appropriate. + This function is only useful for AVR_TINY. */ + +static bool +avr_decl_maybe_lds_p (tree node) +{ + if (!node + || TREE_CODE (node) != VAR_DECL + || DECL_SECTION_NAME (node) != NULL) +return false; + + if (TREE_READONLY (node)) +return true; + + // C++ requires peeling arrays. + + do +node = TREE_TYPE (node); + while (ARRAY_TYPE == TREE_CODE (node)); + + return (node != error_mark_node + && !TYPE_READONLY (node)); +} + + /* Implement `TARGET_ENCODE_SECTION_INFO'. */ static void @@ -10193,7 +10220,8 @@ avr_encode_section_info (tree decl, rtx if (avr_decl_absdata_p (decl, DECL_ATTRIBUTES (decl)) || (TARGET_ABSDATA && !progmem_p - && !addr_attr) + && !addr_attr + && avr_decl_maybe_lds_p (decl)) || (addr_attr // If addr_attr is non-null, it has an argument. Peek into it. && TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (addr_attr))) < 0xc0))
[PATCH] S390: Lower requirements for successful htm tests.
The attached patch makes the htm tests on s390 less sensitive to spurious abort. Please check the commit comment for details. The modified tests have been run once on a zEC12. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/testsuite/ChangeLog * gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS) (DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for successful test. * gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS) (DEFAULT_REQUIRED_QUORUM): Likewise. >From cd354b024c5e129575465b9088b0e0d03340c8f0 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Fri, 18 Nov 2016 14:06:28 +0100 Subject: [PATCH] S390: Lower requirements for successful htm tests. Due to spurious aborts, requiring four successful tests out of five fails too often; accept four out of seven instead. Reduce number of warmup passes. --- gcc/testsuite/gcc.target/s390/htm-builtins-1.c | 6 +++--- gcc/testsuite/gcc.target/s390/htm-builtins-2.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c index c90490f..ff43be9 100644 --- a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c @@ -13,9 +13,9 @@ /* local definitions -- */ -#define DEFAULT_MAX_REPETITIONS 5 -#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1) -#define NUM_WARMUP_RUNS 10 +#define DEFAULT_MAX_REPETITIONS 7 +#define DEFAULT_REQUIRED_QUORUM 4 +#define NUM_WARMUP_RUNS 2 /* local macros --- */ diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c index 15b0d12..bb9d346 100644 --- a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c @@ -13,8 +13,8 @@ /* local definitions -- */ -#define DEFAULT_MAX_REPETITIONS 5 -#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1) +#define DEFAULT_MAX_REPETITIONS 7 +#define DEFAULT_REQUIRED_QUORUM 4 #define DEFAULT_ABORT_ADDRESS (0x12345678u) /* local macros --- */ -- 2.3.0
Re: Rework subreg_get_info
Joseph Myers writes: > On Tue, 15 Nov 2016, Richard Sandiford wrote: >> Richard Sandiford writes: >> > This isn't intended to change the behaviour, just rewrite the >> > existing logic in a different (and hopefully clearer) way. >> > The new form -- particularly the part based on the "block" >> > concept -- is easier to convert to polynomial sizes. >> > >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Sorry, I should have said: this was also tested by compiling the >> testsuite before and after the change at -O2 -ftree-vectorize on: >> >> aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi >> arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf >> epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf >> hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu >> i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf >> m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu >> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf >> nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu >> powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf > > e500 double (both DFmode and TFmode) was the case that motivated the > original creation of subreg_get_info. I think powerpc-linux-gnuspe > --enable-e500-double would be a good case for verifying the patch doesn't > change generated code. (Though when I tried building it from mainline > sources last week I got an ICE in LRA building __multc3, so testing it > might be problematic at present.) Thanks for the pointer. I've now tried comparing the testsuite output on that combination and there were no differences. 31 tests triggered an internal compiler error but >17800 compiled successfully, so hopefully the test was at least somewhat meaningful. (For the record: I got a build failure due to addsi3 FAILing, but I hacked around that by converting it to a force_reg/DONE.) Thanks, Richard
Re: [v3 PATCH] LWG 2766, LWG 2749
On 17/11/16 23:38 +0200, Ville Voutilainen wrote: The first patch does everything but the container adaptor parts (which are wrong in the p/r) and the tuple part (which is icky). The second part does the tuple parts, and needs some serious RFC. I know it's ugly as sin, but it works. I don't think we should try to teach our tuple to give the right answer for is_constructible etc., because the last attempt at it broke boost and would break reasonable existing code in addition to just boost - such things are not Stage 3 material, imnsho. I agree, but if we'd just refused to support such undefined behaviour in stage 1 we wouldn't now be in a position of saying we can't change it in stage 3. Oh well, I'll review this ASAP.
Re: [v3 PATCH] LWG 2766, LWG 2749
On 18 November 2016 at 15:54, Jonathan Wakely wrote: > I agree, but if we'd just refused to support such undefined behaviour > in stage 1 we wouldn't now be in a position of saying we can't change > it in stage 3. I want to support the code that the previous attempt breaks. I don't think I can do so without concepts.
Re: [PATCH 1/2] PR77822
On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote: > How about this: > > -- > /* A convenience macro to determine whether a SIZE lies inclusively >within [1, UPPER], POS lies inclusively within between >[0, UPPER - 1] and the sum lies inclusively within [1, UPPER]. >UPPER must be >= 1. */ > #define SIZE_POS_IN_RANGE(SIZE, POS, UPPER) \ > (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (UPPER) - 1) \ >&& IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (UPPER) \ >- (unsigned HOST_WIDE_INT)(POS))) ^ missing space here > IN_RANGE(POS...) makes sure that POS is a non-negative number > smaller than UPPER, so (UPPER) - (POS) does not wrap. Or is there > some case that the new macro does not handle? I think it works fine. With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like IN_RANGE, i.e. UPPER is inclusive then. Dunno. Segher
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
On 10 November 2016 at 12:06, Pierre-Marie de Rodat wrote: > On 11/09/2016 10:02 AM, Richard Biener wrote: >> >> Using scan-assembler-times on the dwarf? > > > I always have a bad feeling about this kind of check as I imagine it can > break very easily with legit changes. But I have nothing better to > contribute, so I’ve added one such testcase. ;-) > >>> Ok to commit? >> >> >> Ok. > > > This is committed as r242035. Thanks! > Hi, Part of the new testcase added with this commit fails on arm* targets: g++.dg/pr78112.C scan-assembler-times DW_AT_object_pointer 37 Christophe > -- > Pierre-Marie de Rodat
Re: [Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979
Hi Janus, > the attached patch fixes an ice-on-valid problem, simply by removing an > assert. ... I have several instances in my test suite showing that the proposed patch removes the ICE but generates wrong code: pr42359, second test, => ICE on another place pr54613, sixth and eighth tests, Thanks for working on the issue, Dominique
Re: [PATCH v3] PR77359: Properly align local variables in functions calling alloca.
On Fri, Nov 11, 2016 at 10:33:16AM +0100, Dominik Vogt wrote: > gcc/ChangeLog > > * config/rs6000/rs6000.c (rs6000_stack_info): PR/77359: Properly align > local variables in functions calling alloca. Also update the ASCII > drawings > * config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET): > PR/77359: Likewise. > * config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET): > PR/77359: Copy AIX specific versions of the rs6000.h macros to aix.h. Applied. Thanks! -Andreas-
Re: [PATCH] S390: Lower requirements for successful htm tests.
On Fri, Nov 18, 2016 at 02:48:30PM +0100, Dominik Vogt wrote: > gcc/testsuite/ChangeLog > > * gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS) > (DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for > successful test. > * gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS) > (DEFAULT_REQUIRED_QUORUM): Likewise. Applied. Thanks! -Andreas-
libgo patch committed: Remove long-obsolete "old" directories
The directories old/regexp and old/template were removed from the master Go library in 2012 (https://golang.org/cl/5979046) but somehow that was not reflected in libgo. This patch removes them from libgo. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian patch.txt.bz2 Description: BZip2 compressed data
Re: [PATCH 1/2] PR77822
On Fri, Nov 18, 2016 at 08:02:08AM -0600, Segher Boessenkool wrote: > On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote: > > IN_RANGE(POS...) makes sure that POS is a non-negative number > > smaller than UPPER, so (UPPER) - (POS) does not wrap. Or is there > > some case that the new macro does not handle? > > I think it works fine. > > With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like > IN_RANGE, i.e. UPPER is inclusive then. Dunno. Yeah, maybe rather call it RANGE to avoid too much similarity. Some name that is so vague that one has to read the documentation. I'll post an updated patch later. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
[PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
gcc.dg/tree-ssa/builtin-sprintf-2.c is showing intermittent failures, which valgrind shows to be a read past the end of a buffer. The root cause is that the on-demand substring loc code isn't set up to cope with -ftrack-macro-expansion != 2 (the default). In the failing case, it attempts to use this location as the location of the literal: ../../src/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c:95:1: note: RNG (0, 0, 0, "%hhi", i) ^~~ rather than: RNG (0, 0, 0, "%hhi", i) ^~~~ On re-parsing to generate the on-demand substring locations, it thus erroneously attempts to parse the 'R' as a raw string, and so this code within cpp_interpret_string_1 fires: if (*p == 'R') { const uchar *prefix; /* Skip over 'R"'. */ p += 2; prefix = p; while (*p != '(') p++; and the issue happens in the "while" loop, as it erroneously walks through this memory: (gdb) p strs.m_vec.m_vecdata[0] $20 = {len = 3, text = 0xc9bcca0 "RNG"} trying to find the open parenthesis, and starts reading beyond the allocated buffer. The fix is to require that -ftrack-macro-expansion == 2 (the default) for on-demand string literal locations to be available, failing gracefully to simply using the whole location_t otherwise. Doing so requires some fixups to existing test cases: [gcc.dg/tree-ssa/builtin-sprintf-warn-{1,4}.c both have -ftrack-macro-expansion=0. In the latter case, it seems to be unnecessary, so this patch removes it. In the former case, it seems to be needed, but there are some expected locations in test_sprintf_note that are changed by the patch. It's not clear to me how to fix that, so for now the patch removes the expected column numbers from that function within the test case. Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. I think I can self-approve the input.c change and new test cases, but I'm not sure about the changes to Martin's test cases. Martin: are the changes to your test cases OK by you, or is there a better way to rewrite them? gcc/ChangeLog: PR preprocessor/78324 * input.c (get_substring_ranges_for_loc): Fail gracefully if -ftrack-macro-expansion has a value other than 2. gcc/testsuite/ChangeLog: PR preprocessor/78324 * gcc.dg/plugin/diagnostic-test-string-literals-1.c (test_multitoken_macro): New function. * gcc.dg/plugin/diagnostic-test-string-literals-3.c: New test case. * gcc.dg/plugin/diagnostic-test-string-literals-4.c: New test case. * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_sprintf_note): Drop expected column numbers from dg-warning directives. * gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Drop -ftrack-macro-expansion=0. --- gcc/input.c| 9 + .../plugin/diagnostic-test-string-literals-1.c | 16 .../plugin/diagnostic-test-string-literals-3.c | 43 ++ .../plugin/diagnostic-test-string-literals-4.c | 43 ++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 4 +- .../gcc.dg/tree-ssa/builtin-sprintf-warn-1.c | 6 +-- .../gcc.dg/tree-ssa/builtin-sprintf-warn-4.c | 2 +- 7 files changed, 118 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-3.c create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-4.c diff --git a/gcc/input.c b/gcc/input.c index 728f4dd..611e18b 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1322,6 +1322,15 @@ get_substring_ranges_for_loc (cpp_reader *pfile, if (strloc == UNKNOWN_LOCATION) return "unknown location"; + /* Reparsing the strings requires accurate location information. + If -ftrack-macro-expansion has been overridden from its default + of 2, then we might have a location of a macro expansion point, + rather than the location of the literal itself. + Avoid this by requiring that we have full macro expansion tracking + for substring locations to be available. */ + if (cpp_get_options (pfile)->track_macro_expansion != 2) +return "track_macro_expansion != 2"; + /* If string concatenation has occurred at STRLOC, get the locations of all of the literal tokens making up the compound string. Otherwise, just use STRLOC. */ diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c index 3e44936..76a085e 100644 --- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c @@ -243,6 +243,22 @@ test_macro (void) { dg-end-multiline-output "" } */ } +void +test_multitoken_macro (void) +{ +#define RANGE ("0123456789") /* { dg-error "unab
Re: Fix PR77881: combine improvement
On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab wrote: > On Nov 14 2016, Michael Matz wrote: > >> PR missed-optimization/77881 >> * combine.c (simplify_comparison): Remove useless subregs >> also inside the loop, not just after it. >> (make_compound_operation): Recognize some subregs as being >> masking as well. > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. Hi, I can confirm that, also new PR opened for tracking. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 Thanks, bin > > Andreas. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
Re: [PATCH, vec-tails] Support loop epilogue vectorization
It is very strange that this test failed on arm, since it requires target avx2 to check vectorizer dumps: /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */ /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ Could you please clarify what is the reason of the failure? Thanks. 2016-11-18 16:20 GMT+03:00 Christophe Lyon : > On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >> Hi All, >> >> Here is patch for non-masked epilogue vectoriziation. >> >> Bootstrap and regression testing did not show any new failures. >> >> Is it OK for trunk? >> >> Thanks. >> Changelog: >> >> 2016-11-15 Yuri Rumyantsev >> >> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >> * tree-if-conv.c (tree_if_conversion): Make public. >> * * tree-if-conv.h: New file. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >> dynamic alias checks for epilogues. >> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >> * tree-vect-loop.c: include tree-if-conv.h. >> (new_loop_vec_info): Add zeroing orig_loop_info field. >> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >> using passed argument. >> (vect_transform_loop): Check if created epilogue should be returned >> for further vectorization with less vf. If-convert epilogue if >> required. Print vectorization success for epilogue. >> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >> if it is required, pass loop_vinfo produced during vectorization of >> loop body to vect_analyze_loop. >> * tree-vectorizer.h (struct _loop_vec_info): Add new field >> orig_loop_info. >> (LOOP_VINFO_ORIG_LOOP_INFO): New. >> (LOOP_VINFO_EPILOGUE_P): New. >> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >> (vect_do_peeling): Change prototype to return epilogue. >> (vect_analyze_loop): Add argument of loop_vec_info type. >> (vect_transform_loop): Return created loop. >> >> gcc/testsuite/ >> >> * lib/target-supports.exp (check_avx2_hw_available): New. >> (check_effective_target_avx2_runtime): New. >> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >> > > Hi, > > This new test fails on arm-none-eabi (using default cpu/fpu/mode): > gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-tail-nomask-1.c execution test > > It does pass on the same target if configured --with-cpu=cortex-a9. > > Christophe > > > >> >> 2016-11-14 20:04 GMT+03:00 Richard Biener : >>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>> wrote: Richard, I checked one of the tests designed for epilogue vectorization using patches 1 - 3 and found out that build compiler performs vectorization of epilogues with --param vect-epilogues-nomask=1 passed: $ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o t1.new-nomask.s -fdump-tree-vect-details $ grep VECTORIZED -c t1.c.156t.vect 4 Without param only 2 loops are vectorized. Should I simply add a part of tests related to this feature or I must delete all not necessary changes also? >>> >>> Please remove all not necessary changes. >>> >>> Richard. >>> Thanks. Yuri. 2016-11-14 16:40 GMT+03:00 Richard Biener : > On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> In my previous patch I forgot to remove couple lines related to aux field. >> Here is the correct updated patch. > > Yeah, I noticed. This patch would be ok for trunk (together with > necessary parts from 1 and 2) if all not required parts are removed > (and you'd add the testcases covering non-masked tail vect). > > Thus, can you please produce a single complete patch containing only > non-masked epilogue vectoriziation? > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-14 15:51 GMT+03:00 Richard Biener : >> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >> > >> >> Richard, >> >> >> >> I prepare updated 3 patch with passing additional argument to >> >> vect_analyze_loop as you proposed (untested). >> >> >> >> You wrote: >> >> tw, I wonder if you can produce a single patch containing just >> >> epilogue vectorization, that is combine patches 1-3 but rip out >> >> changes only needed by later patches? >> >> >> >> Did you mean that I exclude all support for vectorization epilogues, >> >> i.e. exclude from 2-nd patch all non-related changes >> >> like >> >> >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> >> index 11863af..32011c1 100644 >> >> --- a/gcc/tree-vect-loop.c >> >> +++ b/gcc/tree-vect-loop.c >> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >> >>LOOP_
Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally
On 11/18/16 12:58, Christophe Lyon wrote: > On 17 November 2016 at 10:23, Kyrill Tkachov > wrote: >> >> On 09/11/16 12:58, Bernd Edlinger wrote: >>> >>> Hi! >>> >>> >>> This patch enables the ldrd/strd peephole rules unconditionally. >>> >>> It is meant to fix cases, where the patch to reduce the sha512 >>> stack usage splits ldrd/strd instructions into separate ldr/str insns, >>> but is technically independent from the other patch: >>> >>> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html >>> >>> It was necessary to change check_effective_target_arm_prefer_ldrd_strd >>> to retain the true prefer_ldrd_strd tuning flag. >>> >>> >>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>> Is it OK for trunk? >> >> >> This is ok. >> Thanks, >> Kyrill >> > > Hi Bernd, > > Since you committed this patch (r242549), I'm seeing the new test > failing on some arm*-linux-gnueabihf configurations: > > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10 > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times strd 9 > > See > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html > for a map of failures. > > Am I missing something? Hi Christophe, as always many thanks for your testing... I have apparently only looked at the case -mfloat-abi=soft here, which is what my other patch is going to address. But all targets with -mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd and vstr.64 instead of strd, which should be accepted as well. So the attached patch should fix at least most of the fallout. Is it OK for trunk? Thanks Bernd. 2016-11-18 Bernd Edlinger * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu. Index: gcc/testsuite/gcc.target/arm/pr53447-5.c === --- gcc/testsuite/gcc.target/arm/pr53447-5.c (revision 242588) +++ gcc/testsuite/gcc.target/arm/pr53447-5.c (working copy) @@ -15,5 +15,8 @@ void foo(long long* p) p[9] -= p[10]; } -/* { dg-final { scan-assembler-times "ldrd" 10 } } */ -/* { dg-final { scan-assembler-times "strd" 9 } } */ +/* We accept neon instructions vldr.64 and vstr.64 as well. + Note: DejaGnu counts patterns with alternatives twice, + so actually there are only 10 loads and 9 stores. */ +/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */ +/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 18 November 2016 at 16:46, Yuri Rumyantsev wrote: > It is very strange that this test failed on arm, since it requires > target avx2 to check vectorizer dumps: > > /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { > target avx2_runtime } } } */ > /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED > \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ > > Could you please clarify what is the reason of the failure? It's not the scan-dumps that fail, but the execution. The test calls abort() for some reason. It will take me a while to rebuild the test manually in the right debug environment to provide you with more traces. > > Thanks. > > 2016-11-18 16:20 GMT+03:00 Christophe Lyon : >> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> Here is patch for non-masked epilogue vectoriziation. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> Is it OK for trunk? >>> >>> Thanks. >>> Changelog: >>> >>> 2016-11-15 Yuri Rumyantsev >>> >>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >>> * tree-if-conv.c (tree_if_conversion): Make public. >>> * * tree-if-conv.h: New file. >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >>> dynamic alias checks for epilogues. >>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >>> * tree-vect-loop.c: include tree-if-conv.h. >>> (new_loop_vec_info): Add zeroing orig_loop_info field. >>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >>> using passed argument. >>> (vect_transform_loop): Check if created epilogue should be returned >>> for further vectorization with less vf. If-convert epilogue if >>> required. Print vectorization success for epilogue. >>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >>> if it is required, pass loop_vinfo produced during vectorization of >>> loop body to vect_analyze_loop. >>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >>> orig_loop_info. >>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >>> (LOOP_VINFO_EPILOGUE_P): New. >>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >>> (vect_do_peeling): Change prototype to return epilogue. >>> (vect_analyze_loop): Add argument of loop_vec_info type. >>> (vect_transform_loop): Return created loop. >>> >>> gcc/testsuite/ >>> >>> * lib/target-supports.exp (check_avx2_hw_available): New. >>> (check_effective_target_avx2_runtime): New. >>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>> >> >> Hi, >> >> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-tail-nomask-1.c execution test >> >> It does pass on the same target if configured --with-cpu=cortex-a9. >> >> Christophe >> >> >> >>> >>> 2016-11-14 20:04 GMT+03:00 Richard Biener : On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev wrote: >Richard, > >I checked one of the tests designed for epilogue vectorization using >patches 1 - 3 and found out that build compiler performs vectorization >of epilogues with --param vect-epilogues-nomask=1 passed: > >$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >t1.new-nomask.s -fdump-tree-vect-details >$ grep VECTORIZED -c t1.c.156t.vect >4 > Without param only 2 loops are vectorized. > >Should I simply add a part of tests related to this feature or I must >delete all not necessary changes also? Please remove all not necessary changes. Richard. >Thanks. >Yuri. > >2016-11-14 16:40 GMT+03:00 Richard Biener : >> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >> >>> Richard, >>> >>> In my previous patch I forgot to remove couple lines related to aux >field. >>> Here is the correct updated patch. >> >> Yeah, I noticed. This patch would be ok for trunk (together with >> necessary parts from 1 and 2) if all not required parts are removed >> (and you'd add the testcases covering non-masked tail vect). >> >> Thus, can you please produce a single complete patch containing only >> non-masked epilogue vectoriziation? >> >> Thanks, >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2016-11-14 15:51 GMT+03:00 Richard Biener : >>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>> > >>> >> Richard, >>> >> >>> >> I prepare updated 3 patch with passing additional argument to >>> >> vect_analyze_loop as you proposed (untested). >>> >> >>> >> You wrote: >>> >> tw, I wonder if you can produce a single patch containing just >>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>> >> changes only needed by later patches? >>> >> >>> >> Did you mean that
Re: Fix PR77881: combine improvement
Hi, On Fri, 18 Nov 2016, Bin.Cheng wrote: > On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab wrote: > > On Nov 14 2016, Michael Matz wrote: > > > >> PR missed-optimization/77881 > >> * combine.c (simplify_comparison): Remove useless subregs > >> also inside the loop, not just after it. > >> (make_compound_operation): Recognize some subregs as being > >> masking as well. > > > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. > Hi, > I can confirm that, also new PR opened for tracking. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 See PR78390 for a patch (comment #8) fixing the aarch64 problem. Ciao, Michael.
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Martin
libgo patch committed: Don't call __go_alloc/__go_free from environment routines
This patch to libgo changes the environment support routines to not call __go_alloc/__go_free, but to instead use malloc/free. This code just needs temporary buffers, it is natural to write it in C because it calls setenv/unsetenv, and C code can safely call malloc/free but with the future GC can not safely allocate memory on the Go heap. Patch bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 242592) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -2ab785788691ad289f838a0b3a6bc9013d0fc337 +fc4ca600b2fc6de81fd3c4014542d6a50593db1a The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/go-setenv.c === --- libgo/runtime/go-setenv.c (revision 242581) +++ libgo/runtime/go-setenv.c (working copy) @@ -9,10 +9,7 @@ #include #include -#include "go-alloc.h" #include "runtime.h" -#include "arch.h" -#include "malloc.h" /* Set the C environment from Go. This is called by syscall.Setenv. */ @@ -25,7 +22,6 @@ setenv_c (String k, String v) unsigned char *kn; const byte *vs; unsigned char *vn; - intgo len; ks = k.str; if (ks == NULL) @@ -39,25 +35,23 @@ setenv_c (String k, String v) #ifdef HAVE_SETENV - if (ks != NULL && ks[k.len] != 0) + if (ks[k.len] != 0) { - // Objects that are explicitly freed must be at least 16 bytes in size, - // so that they are not allocated using tiny alloc. - len = k.len + 1; - if (len < TinySize) - len = TinySize; - kn = __go_alloc (len); + kn = malloc (k.len + 1); + if (kn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); + kn[k.len] = '\0'; ks = kn; } - if (vs != NULL && vs[v.len] != 0) + if (vs[v.len] != 0) { - len = v.len + 1; - if (len < TinySize) - len = TinySize; - vn = __go_alloc (len); + vn = malloc (v.len + 1); + if (vn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (vn, vs, v.len); + vn[v.len] = '\0'; vs = vn; } @@ -66,19 +60,20 @@ setenv_c (String k, String v) #else /* !defined(HAVE_SETENV) */ len = k.len + v.len + 2; - if (len < TinySize) -len = TinySize; - kn = __go_alloc (len); + kn = malloc (len); + if (kn == NULL) +runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); kn[k.len] = '='; __builtin_memcpy (kn + k.len + 1, vs, v.len); kn[k.len + v.len + 1] = '\0'; putenv ((char *) kn); + kn = NULL; /* putenv takes ownership of the string. */ #endif /* !defined(HAVE_SETENV) */ if (kn != NULL) -__go_free (kn); +free (kn); if (vn != NULL) -__go_free (vn); +free (vn); } Index: libgo/runtime/go-unsetenv.c === --- libgo/runtime/go-unsetenv.c (revision 242581) +++ libgo/runtime/go-unsetenv.c (working copy) @@ -9,10 +9,7 @@ #include #include -#include "go-alloc.h" #include "runtime.h" -#include "arch.h" -#include "malloc.h" /* Unset an environment variable from Go. This is called by syscall.Unsetenv. */ @@ -24,7 +21,6 @@ unsetenv_c (String k) { const byte *ks; unsigned char *kn; - intgo len; ks = k.str; if (ks == NULL) @@ -33,14 +29,11 @@ unsetenv_c (String k) #ifdef HAVE_UNSETENV - if (ks != NULL && ks[k.len] != 0) + if (ks[k.len] != 0) { - // Objects that are explicitly freed must be at least 16 bytes in size, - // so that they are not allocated using tiny alloc. - len = k.len + 1; - if (len < TinySize) - len = TinySize; - kn = __go_alloc (len); + kn = malloc (k.len + 1); + if (kn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); ks = kn; } @@ -50,5 +43,5 @@ unsetenv_c (String k) #endif /* !defined(HAVE_UNSETENV) */ if (kn != NULL) -__go_free (kn); +free (kn); }
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 12:58 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. I disagree. At a minimum, the return value of malloc(0) is implementation-defined and so relying on it being either null or non-null is non-portable and can be a source of subtle bugs. But malloc(0) has also been known to result from unsigned overflow which has led to vulnerabilities and exploits (famously by the JPG COM vulnerability exploit, but there are plenty of others, including for instance CVE-2016-2851). Much academic research has been devoted to this problem and to solutions to detect and prevent it. The following paper has some good background and references: https://www.usenix.org/legacy/event/woot10/tech/full_papers/Vanegue.pdf Coding standards rules have been developed requiring conforming software to avoid allocating zero bytes for these reasons. TS 1796, the C Secure Coding Rules technical specification, has such a requirement. It was derived from the CERT C Secure Coding rule MEM04-C. Beware of zero-length allocations: https://www.securecoding.cert.org/confluence/x/GQI The same argument applies to alloca(0) with the added caveat that, unlike with the other allocation functions, a non-null return value need not be distinct. In the absence of a policy or guidelines it's a matter of opinion whether or not this warning belongs in either -Wall or -Wextra. Based on the severity of the problems that allocating zero size has been linked to I think it could be successfully argued that it should be in -Wall (the problems are obviously more serious than those that have ever been associated with the -Wunused-type warnings, for example). I put it in -Wextra only because I was trying to be sensitive to the "no false positive" argument. All this said, before debating the fine details of under which conditions each of the new warninsg should be enabled, I was hoping to get comments on the meat of the patch that implements the warnings. Martin
[PATCH] Fix PR78413
Hi, The if-conversion patch for PR77848 missed a case where an outer loop should not be versioned for vectorization; this was caught by an assert in tests recorded in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78413. This patch fixes the problem by ensuring that both inner and outer loop latches have a single predecessor before versioning an outer loop. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill [gcc] 2016-11-18 Bill Schmidt PR tree-optimization/78413 * tree-if-conv.c (versionable_outer_loop_p): Require that both inner and outer loop latches have single predecessors. [gcc/testsuite] 2016-11-18 Bill Schmidt PR tree-optimization/78413 * gcc.dg/tree-ssa/pr78413.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr78413.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) @@ -0,0 +1,35 @@ +/* PR78413. These previously failed in tree if-conversion due to a loop + latch with multiple predecessors that the code did not anticipate. */ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math" } */ + +extern long long int llrint(double x); +int a; +double b; +__attribute__((cold)) void decode_init() { + int c, d = 0; + for (; d < 12; d++) { +if (d) + b = 0; +c = 0; +for (; c < 6; c++) + a = b ? llrint(b) : 0; + } +} + +struct S { + _Bool bo; +}; +int a, bb, c, d; +void fn1() { + do +do + do { + struct S *e = (struct S *)1; + do + bb = a / (e->bo ? 2 : 1); + while (bb); + } while (0); +while (d); + while (c); +} Index: gcc/tree-if-conv.c === --- gcc/tree-if-conv.c (revision 242551) +++ gcc/tree-if-conv.c (working copy) @@ -2575,6 +2575,8 @@ version_loop_for_if_conversion (struct loop *loop) - The loop has a single exit. - The loop header has a single successor, which is the inner loop header. +- Each of the inner and outer loop latches have a single + predecessor. - The loop exit block has a single predecessor, which is the inner loop's exit block. */ @@ -2586,7 +2588,9 @@ versionable_outer_loop_p (struct loop *loop) || loop->inner->next || !single_exit (loop) || !single_succ_p (loop->header) - || single_succ (loop->header) != loop->inner->header) + || single_succ (loop->header) != loop->inner->header + || !single_pred_p (loop->latch) + || !single_pred_p (loop->inner->latch)) return false; basic_block outer_exit = single_pred (loop->latch);
Re: [PATCH] avoid calling alloca(0)
On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: > >In the libiberty/alloca.c, I don't see anything special about alloca (0). > >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). > >alloca (0) just returns NULL after it. The diffutils link is the same > >alloca.c as in libiberty. Returning NULL or returning a non-unique pointer > >for alloca (0) is just fine, it is the same thing as returning NULL or > >returning a non-unique pointer from malloc (0). We definitely don't want > >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), > >because it behaves the same. > > I disagree. At a minimum, the return value of malloc(0) is > implementation-defined and so relying on it being either null > or non-null is non-portable and can be a source of subtle bugs. Most apps know what malloc (0) means and treat it correctly, they know they shouldn't dereference the pointer, because it is either NULL or holds an array with 0 elements. I fail to see why you would want to warn. > But malloc(0) has also been known to result from unsigned overflow > which has led to vulnerabilities and exploits (famously by the JPG > COM vulnerability exploit, but there are plenty of others, including > for instance CVE-2016-2851). Much academic research has been devoted > to this problem and to solutions to detect and prevent it. How is it different from overflowing to 1 or 2 or 27? What is special on the value 0? > In the absence of a policy or guidelines it's a matter of opinion > whether or not this warning belongs in either -Wall or -Wextra. It IMHO doesn't belong to either of these. Jakub
Re: [PATCH] Fix PR78413
On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote: > === > --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) > @@ -0,0 +1,35 @@ > +/* PR78413. These previously failed in tree if-conversion due to a loop > + latch with multiple predecessors that the code did not anticipate. */ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math" } */ Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE. > +extern long long int llrint(double x); > +int a; > +double b; > +__attribute__((cold)) void decode_init() { > + int c, d = 0; > + for (; d < 12; d++) { > +if (d) > + b = 0; > +c = 0; > +for (; c < 6; c++) > + a = b ? llrint(b) : 0; > + } > +} > + > +struct S { > + _Bool bo; > +}; > +int a, bb, c, d; > +void fn1() { > + do > +do > + do { > + struct S *e = (struct S *)1; > + do > + bb = a / (e->bo ? 2 : 1); > + while (bb); > + } while (0); > +while (d); > + while (c); > +} -- Markus
[COMMITTED] Add myself to MAINTAINERS (Write After Approval).
Committed as r242595. Thanks, Toma Tabacu Index: ChangeLog === --- ChangeLog (revision 242594) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2016-11-18 Toma Tabacu + + * MAINTAINERS (Write After Approval): Add myself. + 2016-11-15 Matthias Klose * Makefile.def: Remove references to GCJ. Index: MAINTAINERS === --- MAINTAINERS (revision 242594) +++ MAINTAINERS (working copy) @@ -587,6 +587,7 @@ Robert Suchanek Andrew Sutton Gabriele Svelto +Toma Tabacu Sriraman Tallam Chung-Lin Tang Samuel Tardieu
Re: [PATCH] Fix PR78413
> On Nov 18, 2016, at 10:33 AM, Markus Trippelsdorf > wrote: > > On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote: >> === >> --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) >> +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) >> @@ -0,0 +1,35 @@ >> +/* PR78413. These previously failed in tree if-conversion due to a loop >> + latch with multiple predecessors that the code did not anticipate. */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O3 -ffast-math" } */ > > Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE. Whoops. Yes indeed, sorry I missed the flag difference for the second failure. Bill
Re: [PATCH, GCC/ARM, ping] Optional -mthumb for Thumb only targets
On 11/11/16 14:35, Kyrill Tkachov wrote: On 08/11/16 13:36, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 25/10/16 18:07, Thomas Preudhomme wrote: Hi, Currently when a user compiles for a thumb-only target (such as Cortex-M processors) without specifying the -mthumb option GCC throws the error "target CPU does not support ARM mode". This is suboptimal from a usability point of view: the -mthumb could be deduced from the -march or -mcpu option when there is no ambiguity. This patch implements this behavior by extending the DRIVER_SELF_SPECS to automatically append -mthumb to the command line for thumb-only targets. It does so by checking the last -march option if any is given or the last -mcpu option otherwise. There is no ordering issue because conflicting -mcpu and -march is already handled. Note that the logic cannot be implemented in function arm_option_override because we need to provide the modified command line to the GCC driver for finding the right multilib path and the function arm_option_override is executed too late for that effect. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2016-10-18 Terry Guo Thomas Preud'homme PR target/64802 * common/config/arm/arm-common.c (arm_target_thumb_only): New function. * config/arm/arm-opts.h: Include arm-flags.h. (struct arm_arch_core_flag): Define. (arm_arch_core_flags): Define. * config/arm/arm-protos.h: Include arm-flags.h. (FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32, FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E, FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM, FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV, FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT, FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST, FL_TUNE, FL_FOR_ARCH2, FL_FOR_ARCH3, FL_FOR_ARCH3M, FL_FOR_ARCH4, FL_FOR_ARCH4T, FL_FOR_ARCH5, FL_FOR_ARCH5T, FL_FOR_ARCH5E, FL_FOR_ARCH5TE, FL_FOR_ARCH5TEJ, FL_FOR_ARCH6, FL_FOR_ARCH6J, FL_FOR_ARCH6K, FL_FOR_ARCH6Z, FL_FOR_ARCH6ZK, FL_FOR_ARCH6KZ, FL_FOR_ARCH6T2, FL_FOR_ARCH6M, FL_FOR_ARCH7, FL_FOR_ARCH7A, FL_FOR_ARCH7VE, FL_FOR_ARCH7R, FL_FOR_ARCH7M, FL_FOR_ARCH7EM, FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A, FL2_FOR_ARCH8_2A, FL_FOR_ARCH8M_BASE, FL_FOR_ARCH8M_MAIN, arm_feature_set, ARM_FSET_MAKE, ARM_FSET_MAKE_CPU1, ARM_FSET_MAKE_CPU2, ARM_FSET_CPU1, ARM_FSET_CPU2, ARM_FSET_EMPTY, ARM_FSET_ANY, ARM_FSET_HAS_CPU1, ARM_FSET_HAS_CPU2, ARM_FSET_HAS_CPU, ARM_FSET_ADD_CPU1, ARM_FSET_ADD_CPU2, ARM_FSET_DEL_CPU1, ARM_FSET_DEL_CPU2, ARM_FSET_UNION, ARM_FSET_INTER, ARM_FSET_XOR, ARM_FSET_EXCLUDE, ARM_FSET_IS_EMPTY, ARM_FSET_CPU_SUBSET): Move to ... * config/arm/arm-flags.h: This new file. * config/arm/arm.h (TARGET_MODE_SPEC_FUNCTIONS): Define. (EXTRA_SPEC_FUNCTIONS): Add TARGET_MODE_SPEC_FUNCTIONS to its value. (TARGET_MODE_SPECS): Define. (DRIVER_SELF_SPECS): Add TARGET_MODE_SPECS to its value. *** gcc/testsuite/ChangeLog *** 2016-10-11 Thomas Preud'homme PR target/64802 * gcc.target/arm/optional_thumb-1.c: New test. * gcc.target/arm/optional_thumb-2.c: New test. * gcc.target/arm/optional_thumb-3.c: New test. No regression when running the testsuite for -mcpu=cortex-m0 -mthumb, -mcpu=cortex-m0 -marm and -mcpu=cortex-a8 -marm Is this ok for trunk? This looks like a useful usability improvement. This is ok after a bootstrap on an arm-none-linux-gnueabihf target. Sorry for the delay, Kyrill I've rebased the patch on top of the arm_feature_set type consistency fix [1] and committed it. The committed patch is in attachment for reference. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01680.html Best regards, Thomas diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index f3b674339a50460d55920ca8d26275a550bbbc1e..473417a2e5f04488197c27ead2b65680bddec274 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -98,6 +98,29 @@ arm_rewrite_mcpu (int argc, const char **argv) return arm_rewrite_selected_cpu (argv[argc - 1]); } +/* Called by the driver to check whether the target denoted by current + command line options is a Thumb-only target. ARGV is an array of + -march and -mcpu values (ie. it contains the rhs after the equal + sign) and we use the last one of them to make a decision. The + number of elements in ARGV is given in ARGC. */ +const char * +arm_target_thumb_only (int argc, const char **argv) +{ + unsigned int opt; + + if (argc) +{ + for (opt = 0; opt < (ARRAY_SIZE (arm_arch_core_flags)); opt++) + if ((strcmp (argv[argc - 1], arm_arch_core_flags[opt].name) == 0) + && !ARM_FSET_HAS_CPU1(arm_arch_core_flags[opt].flags, FL_N
Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
Martin: are the changes to your test cases OK by you, or is there a better way to rewrite them? Thanks for looking into it! Since the purpose of the test_sprintf_note function in the test is to verify the location of the caret within the warnings I think we should keep it if it's possible. Would either removing the P macro or moving the function to a different file that doesn't use the -ftrack-macro-expansion=0 option work? Martin diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c index 5779a95..b6a6011 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c @@ -181,13 +181,13 @@ void test_sprintf_note (void) /* Diagnostic column numbers are 1-based. */ P (buffer (0),/* { dg-message "format output 4 bytes into a destination of size 0" } */ - "%c%s%i", '1', "2", 3);/* { dg-warning "7:.%c. directive writing 1 byte into a region of size 0" } */ + "%c%s%i", '1', "2", 3);/* { dg-warning ".%c. directive writing 1 byte into a region of size 0" } */ P (buffer (1),/* { dg-message "format output 6 bytes into a destination of size 1" } */ - "%c%s%i", '1', "23", 45); /* { dg-warning "9:.%s. directive writing 2 bytes into a region of size 0" } */ + "%c%s%i", '1', "23", 45); /* { dg-warning ".%s. directive writing 2 bytes into a region of size 0" } */ P (buffer (2),/* { dg-message "format output 6 bytes into a destination of size 2" } */ - "%c%s%i", '1', "2", 345); /* { dg-warning "11:.%i. directive writing 3 bytes into a region of size 0" } */ + "%c%s%i", '1', "2", 345); /* { dg-warning ".%i. directive writing 3 bytes into a region of size 0" } */ /* It would be nice if the caret in the location range for the format string below could be made to point at the closing quote of the format diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c index faa5806..b587d00 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret -ftrack-macro-expansion=0" } */ +/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */ extern int sprintf (char*, const char*, ...);
Re: [PATCH] avoid calling alloca(0)
On 11/18/16, Jakub Jelinek wrote: > On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: >> >In the libiberty/alloca.c, I don't see anything special about alloca >> > (0). >> >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca >> > (4035). >> >alloca (0) just returns NULL after it. The diffutils link is the same >> >alloca.c as in libiberty. Returning NULL or returning a non-unique >> > pointer >> >for alloca (0) is just fine, it is the same thing as returning NULL or >> >returning a non-unique pointer from malloc (0). We definitely don't >> > want >> >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), >> >because it behaves the same. >> >> I disagree. At a minimum, the return value of malloc(0) is >> implementation-defined and so relying on it being either null >> or non-null is non-portable and can be a source of subtle bugs. > > Most apps know what malloc (0) means and treat it correctly, they know they > shouldn't dereference the pointer, because it is either NULL or holds an > array with 0 elements. I fail to see why you would want to warn. > Just as a reference point, my old version of the clang static analyzer warns for malloc(0); but not alloca(0); though. For example: $ cat alloc_0.c #include #include void fn(void) { char *ptr0 = (char *)malloc(0); void *ptr1 = alloca(0); *ptr0 = *(char *)ptr1; } $ clang -Wall -Wextra -pedantic --analyze -c alloc_0.c alloc_0.c:6:23: warning: Call to 'malloc' has an allocation size of 0 bytes char *ptr0 = (char *)malloc(0); ^ ~ 1 warning generated. Doing some more Googling on the topic finds debates as to whether this warning is warranted or not, but it seems like people are pretty used to dealing with it from clang already, so they probably wouldn't mind too much if gcc started being consistent with it.
Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f
Hi, On Thu, 17 Nov 2016, Bin.Cheng wrote: > B) Depending on ilp, I think below test strings fail for long time with > haswell: > ! { dg-final { scan-tree-dump-times "Executing predictive commoning > without unrolling" 1 "pcom" { target lp64 } } } > ! { dg-final { scan-tree-dump-times "Executing predictive commoning > without unrolling" 2 "pcom" { target ia32 } } } > Because vectorizer choose vf==4 in this case, and there is no > predictive commoning opportunities at all. > Also the newly added test string fails in this case too because the > prolog peeled iterates more than 1 times. Btw, this probably means that on haswell (or other archs with vf==4) mgrid is slower than necessary. On mgrid you really really want predictive commoning to happen. Vectorization isn't that interesting there. Ciao, Michael.
[PATCH GCC]Move simplification of (A == C1) ? A : C2 to match.pd
Hi, This is a follow up patch for https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html It moves remaining simplification for (A == C1) ? A : C2 in fold_cond_expr_with_comparison to match.pd. Bootstrap and test on x86_64 and AArch64, is it OK? Thanks, bin 2016-11-17 Bin Cheng * fold-const.c (fold_cond_expr_with_comparison): Move simplification for A == C1 ? A : C2 to below. * match.pd: Move from above to here: (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 1e61ccf..1877dac 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -5210,19 +5210,6 @@ fold_cond_expr_with_comparison (location_t loc, tree type, } } - /* If this is A == C1 ? A : C2 with C1 and C2 constant integers, - we simplify it into A == C1 ? C1 : C2. */ - - if (comp_code == EQ_EXPR - && INTEGRAL_TYPE_P (type) - && TREE_CODE (arg01) == INTEGER_CST - && TREE_CODE (arg1) != INTEGER_CST - && TREE_CODE (arg2) == INTEGER_CST) -{ - arg1 = fold_convert_loc (loc, type, arg01); - return fold_build3_loc (loc, COND_EXPR, type, arg0, arg1, arg2); -} - return NULL_TREE; } diff --git a/gcc/match.pd b/gcc/match.pd index 4beac4e..a8d94de 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1939,15 +1939,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Simplification moved from fold_cond_expr_with_comparison. It may also be extended. */ -/* (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if: +/* This pattern implements two kinds simplification: + + Case 1) + (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if: 1) Conversions are type widening from smaller type. 2) Const c1 equals to c2 after canonicalizing comparison. 3) Comparison has tree code LT, LE, GT or GE. This specific pattern is needed when (cmp (convert x) c) may not be simplified by comparison patterns because of multiple uses of x. It also makes sense here because simplifying across multiple - referred var is always benefitial for complicated cases. */ -(for cmp (lt le gt ge) + referred var is always benefitial for complicated cases. + + Case 2) + (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2). */ +(for cmp (lt le gt ge eq) (simplify (cond (cmp@0 (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2) (with @@ -1966,37 +1972,45 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_UNSIGNED (from_type) || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type) { -if (wi::to_widest (@3) == (wi::to_widest (@2) - 1)) - { -/* X <= Y - 1 equals to X < Y. */ -if (cmp_code == LE_EXPR) - code = LT_EXPR; -/* X > Y - 1 equals to X >= Y. */ -if (cmp_code == GT_EXPR) - code = GE_EXPR; - } -if (wi::to_widest (@3) == (wi::to_widest (@2) + 1)) - { -/* X < Y + 1 equals to X <= Y. */ -if (cmp_code == LT_EXPR) - code = LE_EXPR; -/* X >= Y + 1 equals to X > Y. */ -if (cmp_code == GE_EXPR) - code = GT_EXPR; - } -if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3)) +if (code != EQ_EXPR) { -if (cmp_code == LT_EXPR || cmp_code == LE_EXPR) - code = MIN_EXPR; -if (cmp_code == GT_EXPR || cmp_code == GE_EXPR) - code = MAX_EXPR; +if (wi::to_widest (@3) == (wi::to_widest (@2) - 1)) + { +/* X <= Y - 1 equals to X < Y. */ +if (cmp_code == LE_EXPR) + code = LT_EXPR; +/* X > Y - 1 equals to X >= Y. */ +if (cmp_code == GT_EXPR) + code = GE_EXPR; + } +if (wi::to_widest (@3) == (wi::to_widest (@2) + 1)) + { +/* X < Y + 1 equals to X <= Y. */ +if (cmp_code == LT_EXPR) + code = LE_EXPR; +/* X >= Y + 1 equals to X > Y. */ +if (cmp_code == GE_EXPR) + code = GT_EXPR; + } +if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3)) + { +if (cmp_code == LT_EXPR || cmp_code == LE_EXPR) + code = MIN_EXPR; +if (cmp_code == GT_EXPR || cmp_code == GE_EXPR) + code = MAX_EXPR; + } } +/* Can do A == C1 ? A : C2 -> A == C1 ? C1 : C2? */ +else if (!int_fits_type_p (@3, from_type)) + code = ERROR_MARK; } } (if (code == MAX_EXPR) (convert (max @1 (convert:from_type @2))) (if (code == MIN_EXPR) - (convert (min @1 (convert:from_type @2 + (convert (min @1 (convert
[PATCH GCC]Simplify (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2)
Hi, This is a rework of https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02007.html. Though review comments suggested it could be merged with last kind simplification of fold_cond_expr_with_comparison, it's not really applicable. As a matter of fact, the suggestion stands for patch @https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02005.html. I had previous patch (https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html) moving fold_cond_expr_with_comparison to match.pd pattern and incorporated that patch into it. For this one, the rework is trivial, just renames several variable tags as suggested. Bootstrap and test on x86_64 and AArch64, is it OK? Thanks, bin 2016-11-17 Bin Cheng * match.pd: Add new pattern: (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2). gcc/testsuite/ChangeLog 2016-11-17 Bin Cheng * gcc.dg/fold-bopcond-1.c: New test. * gcc.dg/fold-bopcond-2.c: New test.diff --git a/gcc/match.pd b/gcc/match.pd index a8d94de..e47f77a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2012,6 +2012,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (code == EQ_EXPR) (cond (cmp @1 (convert:from_type @3)) (convert:from_type @3) @2))) +/* (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2) if: + + 1) OP is PLUS or MINUS. + 2) CMP is LT, LE, GT or GE. + 3) C3 == (C1 op C2), and computation doesn't have undefined behavior. + + This pattern also handles special cases like: + + A) Operand x is a unsigned to signed type conversion and c1 is + integer zero. In this case, + (signed type)x < 0 <=> x > MAX_VAL(signed type) + (signed type)x >= 0 <=> x <= MAX_VAL(signed type) + B) Const c1 may not equal to (C3 op' C2). In this case we also + check equality for (c1+1) and (c1-1) by adjusting comparison + code. + + TODO: Though signed type is handled by this pattern, it cannot be + simplified at the moment because C standard requires additional + type promotion. In order to match&simplify it here, the IR needs + to be cleaned up by other optimizers, i.e, VRP. */ +(for op (plus minus) + (for cmp (lt le gt ge) + (simplify + (cond (cmp (convert? @X) INTEGER_CST@1) (op @X INTEGER_CST@2) INTEGER_CST@3) + (with { tree from_type = TREE_TYPE (@X), to_type = TREE_TYPE (@1); } +(if (types_match (from_type, to_type) +/* Check if it is special case A). */ +|| (TYPE_UNSIGNED (from_type) +&& !TYPE_UNSIGNED (to_type) +&& TYPE_PRECISION (from_type) == TYPE_PRECISION (to_type) +&& integer_zerop (@1) +&& (cmp == LT_EXPR || cmp == GE_EXPR))) + (with + { + bool overflow = false; + enum tree_code code, cmp_code = cmp; + tree real_c1, c1 = @1, c2 = @2, c3 = @3; + tree op_type = TREE_TYPE (@X); + signop sgn = TYPE_SIGN (op_type); + widest_int wmin = widest_int::from (wi::min_value (op_type), sgn); + widest_int wmax = widest_int::from (wi::max_value (op_type), sgn); + + /* Handle special case A), given x of unsigned type: + ((signed type)x < 0) <=> (x > MAX_VAL(signed type)) + ((signed type)x >= 0) <=> (x <= MAX_VAL(signed type)) */ + if (!types_match (from_type, to_type)) + { + if (cmp_code == LT_EXPR) + cmp_code = GT_EXPR; + if (cmp_code == GE_EXPR) + cmp_code = LE_EXPR; + c1 = wide_int_to_tree (op_type, wi::max_value (to_type)); + } + /* To simplify this pattern, we require c3 = (c1 op c2). Here we + compute (c3 op' c2) and check if it equals to c1 with op' being + the inverted operator of op. Make sure overflow doesn't happen + if it is undefined. */ + if (op == PLUS_EXPR) + real_c1 = wide_int_to_tree (op_type, + wi::sub (c3, c2, sgn, &overflow)); + else + real_c1 = wide_int_to_tree (op_type, + wi::add (c3, c2, sgn, &overflow)); + code = cmp_code; + if (!overflow || !TYPE_OVERFLOW_UNDEFINED (op_type)) + { + /* Check if c1 equals to real_c1. Boundary condition is handled + by adjusting comparison operation if necessary. */ + if (wi::to_widest (c1) == (wi::to_widest (real_c1) - 1)) + { + /* X <= Y - 1 equals to X < Y. */ + if (cmp_code == LE_EXPR) + code = LT_EXPR; + /* X > Y - 1 equals to X >= Y. */ + if (cmp_code == GT_EXPR) + code = GE_EXPR; + } + if (wi::to_widest (c1) == (wi::to_widest (real_c1) + 1)) + { + /* X < Y + 1 equals to X <= Y. */ + if (cmp_code == LT_EXPR) + code = LE_EXPR; + /* X >= Y + 1 equals to X > Y. */ + if (cmp_code == GE_EXPR) +
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On Nov 18, 2016, at 3:13 AM, Iain Sandoe wrote: > > Thanks, at least I’m not going completely crazy ;-) I'll just note for completeness that Jeff also couldn't explain a failure of your latest patch. If you run into one, let me know. > OK now for trunk? Ok. > open branches? Ok.
Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f
On Fri, Nov 18, 2016 at 4:52 PM, Michael Matz wrote: > Hi, > > On Thu, 17 Nov 2016, Bin.Cheng wrote: > >> B) Depending on ilp, I think below test strings fail for long time with >> haswell: >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 1 "pcom" { target lp64 } } } >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 2 "pcom" { target ia32 } } } >> Because vectorizer choose vf==4 in this case, and there is no >> predictive commoning opportunities at all. >> Also the newly added test string fails in this case too because the >> prolog peeled iterates more than 1 times. > > Btw, this probably means that on haswell (or other archs with vf==4) mgrid > is slower than necessary. On mgrid you really really want predictive > commoning to happen. Vectorization isn't that interesting there. Interesting, I will check if there is difference between 2/4 vf. we do have cases that smaller vf is better and should be chosen, though for different reasons. Thanks, bin > > > Ciao, > Michael.
[PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment
Hi all, attached patch fixes the issue which was given by nesting calls to typebound procedures. The expression of the inner typebound procedure call was resolved correctly, but in the case of it's having a class type the ref-list was discarded. Leaving the list of references untouched, resolves the wrong error-message and generates correct code. When checking the shortened example in comment #3 one gets a segfault, because v6 is not allocated explicitly. The initial example made sure, that v6 was allocated. Reading through the standard, I did not find, whether the auto-allocation is applicable here. I therefore have extended the testcase by an allocate(v6). Dominique pointed out, that there are already some prs for adding an on-demand -fcheck=something runtime check for not allocated objects. But that does not solve the question, whether v6 should be auto-allocated when assigned by a typebound-procedure (ifort and cray need v6 allocated do, i.e., they don't auto-allocate). Btw, when using the in gcc-7 available polymorphic assign, then v6 is actually auto-allocated and the program runs fine. So what are your opinions on the auto-allocation issue? This patch fixes the wrong error messages in both gcc-7 and gcc-6. Bootstraped and regtested on x86_64-linux/F23 for gcc-7 and -6. Ok for trunk and gcc-6? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de gcc/testsuite/ChangeLog: 2016-11-18 Andre Vehreschild PR fortran/78395 * gfortran.dg/typebound_operator_21.f03: New test. gcc/fortran/ChangeLog: 2016-11-18 Andre Vehreschild PR fortran/78395 * resolve.c (resolve_typebound_function): Prevent stripping of refs, when the base-expression is a class' typed one. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 825bb12..589a673 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -6140,7 +6140,7 @@ resolve_typebound_function (gfc_expr* e) gfc_free_ref_list (class_ref->next); class_ref->next = NULL; } - else if (e->ref && !class_ref) + else if (e->ref && !class_ref && expr->ts.type != BT_CLASS) { gfc_free_ref_list (e->ref); e->ref = NULL; diff --git a/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 new file mode 100644 index 000..ea374a1 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 @@ -0,0 +1,78 @@ +! { dg-do run } +! +! Test that pr78395 is fixed. +! Contributed by Chris and Janus Weil + +module types_mod + implicit none + + type, public :: t1 +integer :: a + contains +procedure :: get_t2 + end type + + type, public :: t2 +integer :: b + contains +procedure, pass(rhs) :: mul2 +procedure :: assign +generic :: operator(*) => mul2 +generic :: assignment(=) => assign + end type + +contains + + function get_t2(this) +class(t1), intent(in) :: this +class(t2), allocatable :: get_t2 +type(t2), allocatable :: local +allocate(local) +local%b = this%a +call move_alloc(local, get_t2) + end function + + function mul2(lhs, rhs) +class(t2), intent(in) :: rhs +integer, intent(in) :: lhs +class(t2), allocatable :: mul2 +type(t2), allocatable :: local +allocate(local) +local%b = rhs%b*lhs +call move_alloc(local, mul2) + end function + + subroutine assign(this, rhs) +class(t2), intent(out) :: this +class(t2), intent(in) :: rhs +select type(rhs) +type is(t2) + this%b = rhs%b +class default + error stop +end select + end subroutine + +end module + + +program minimal + use types_mod + implicit none + + class(t1), allocatable :: v4 + class(t2), allocatable :: v6 + + allocate(v4, source=t1(4)) + allocate(v6) + v6 = 3 * v4%get_t2() + + select type (v6) +type is (t2) + if (v6%b /= 12) error stop +class default + error stop + end select + deallocate(v4, v6) +end +
Re: [PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment
Hi Andre, Btw, when using the in gcc-7 available polymorphic assign, then v6 is actually auto-allocated and the program runs fine. So what are your opinions on the auto-allocation issue? It is my experience that such questions can speedily and correctly be answered by the regulars on comp.lang.fortran. I have no opinion either way :-) Regards Thomas
Re: [PATCH] Fix combine's make_extraction (PR rtl-optimization/78378)
Hi, On Wed, 16 Nov 2016, Jakub Jelinek wrote: > If inner is a MEM, make_extraction requires that pos is a multiple of > bytes and deals with offsetting it. Or otherwise requires that pos is a > multiple of BITS_PER_WORD and for REG inner it handles that too. But if > inner is something different, it calls just force_to_mode to the target > mode, which only really works if pos is 0. This should also fix the aarch64 fail for gcc.c-torture/execute/cbrt.c . (At least I came up with the same patch in PR78390) Ciao, Michael.
Patch ping
Hi! I'd like to ping 2 patches: http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01074.html - C++ ABI - mangling of TLS aux symbols; either the posted patch or one with if (abi_version_at_least (11)) http://gcc.gnu.org/ml/gcc-patches/2016-11/msg00351.html - DWARF Solaris bootstrap fix Jakub
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 09:29 AM, Jakub Jelinek wrote: On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. I disagree. At a minimum, the return value of malloc(0) is implementation-defined and so relying on it being either null or non-null is non-portable and can be a source of subtle bugs. Most apps know what malloc (0) means and treat it correctly, they know they shouldn't dereference the pointer, because it is either NULL or holds an array with 0 elements. I fail to see why you would want to warn. Because people make mistakes and warnings help us avoid them (isn't that obvious?) Just because we get it right most of the time doesn't mean we get right all of the time. The papers and exploits I pointed to should provide ample evidence that zero allocations are a problem that can have serious (and costly) consequences. We (i.e., Red Hat) recognize this risk and have made it our goal to help stem some of these problems. But malloc(0) has also been known to result from unsigned overflow which has led to vulnerabilities and exploits (famously by the JPG COM vulnerability exploit, but there are plenty of others, including for instance CVE-2016-2851). Much academic research has been devoted to this problem and to solutions to detect and prevent it. How is it different from overflowing to 1 or 2 or 27? What is special on the value 0? It's a case that, unlike the others, can be readily detected. It would be nice to detect the others as well but GCC can't do that yet. That doesn't mean we shouldn't try to detect at least the small subset for now. It doesn't have to be perfect for it to be useful. In the absence of a policy or guidelines it's a matter of opinion whether or not this warning belongs in either -Wall or -Wextra. It IMHO doesn't belong to either of these. You've made that quite clear. I struggle to reconcile your position in this case with the one in debate about the -Wimplicit-fallthrough option where you insisted on the exact opposite despite the overwhelming number of false positives caused by it. Why is it appropriate for that option to be in -Wextra and not this one? Martin
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
On Nov 18, 2016, at 4:33 AM, Iain Sandoe wrote: > > As discussed on IRC, I was under the impression that it is desired to move > away from #ifdef towards if() and I have been adding those where locally > things have been touched - in this case it was only partially possible. > > However, FAOD - you pointed out that for the RS6000 back-end #ifdef are still > preferred, Shudder. We can encourage anyone that likes #if, to like if () instead. > OK now for trunk? Ok; I'm pretty sure that change can only impact darwin. If you wanted to reduce the test case to 3 cases, I think that would also show the problem that show that your patch fixes it, ok with such a change, if you want. > Open branches? I'm fine with back porting.
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
Hi Mike, > On 18 Nov 2016, at 17:16, Mike Stump wrote: > > On Nov 18, 2016, at 4:33 AM, Iain Sandoe wrote: >> >> As discussed on IRC, I was under the impression that it is desired to move >> away from #ifdef towards if() and I have been adding those where locally >> things have been touched - in this case it was only partially possible. >> >> However, FAOD - you pointed out that for the RS6000 back-end #ifdef are >> still preferred, > > Shudder. We can encourage anyone that likes #if, to like if () instead. > >> OK now for trunk? > > Ok; I'm pretty sure that change can only impact darwin. If you wanted to > reduce the test case to 3 cases, I think that would also show the problem > that show that your patch fixes it, ok with such a change, if you want. I’d like to do that; is there a way to force a jump table for a limited set of cases? (the example was about the smallest I could get where GCC elected to produce a jump table instead of branches) Iain > >> Open branches? > > I'm fine with back porting. >
Re: [PATCH v2][PR libgfortran/78314] Fix ieee_support_halting
On 16/11/16 16:53, Szabolcs Nagy wrote: > ieee_support_halting only checked the availability of status > flags, not trapping support. On some targets the later can > only be checked at runtime: feenableexcept reports if > enabling traps failed. > > So check trapping support by enabling/disabling it. > > Updated the test that enabled trapping to check if it is > supported. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. > > gcc/testsuite/ > 2016-11-16 Szabolcs Nagy > > PR libgfortran/78314 > * gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting. > > libgfortran/ > 2016-11-16 Szabolcs Nagy > > PR libgfortran/78314 > * config/fpu-glibc.h (support_fpu_trap): Use feenableexcept. > it seems this broke ieee_8.f90 which tests compile time vs runtime value of ieee_support_halting if fortran needs this, then support_halting should be always false on arm and aarch64. but i'm not familiar enough with fortran to tell if there is some better workaround.
Re: [PATCH] DWARF: make signedness explicit for enumerator const values
On 11/14/2016 01:05 PM, Mark Wielaard wrote: You can either choose a signed/unsigned form not giving the consumer a hint about the size of the underlying constant value or one of the sized data forms that don't give a hint about the value representation/signedness. So in both cases the consumer looses without an actual (base) type telling them how to interpret the constant. I see, thank you for the explanation! :-) I agree with you, so I’ll revise to instead add a DW_AT_type attribute to enumeration DIEs to point to a base type. -- Pierre-Marie de Rodat
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
I think restoring bootstrap is likely a good thing. On Fri, Nov 18, 2016 at 2:45 AM, Rainer Orth wrote: > > I guess this is ok for mainline now to restore bootstrap?
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
Hi Christophe, On 11/18/2016 03:03 PM, Christophe Lyon wrote: Hi, Part of the new testcase added with this commit fails on arm* targets: g++.dg/pr78112.C scan-assembler-times DW_AT_object_pointer 37 Thank you for the report. As I said on the bugzilla (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112#c17), I reproduce inconsistent results on this even on a very reduced testcase. So I wonder if I should just remove this testcase. -- Pierre-Marie de Rodat
Re: [PATCH] avoid calling alloca(0)
On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote: > Because people make mistakes and warnings help us avoid them (isn't > that obvious?) Just because we get it right most of the time doesn't > mean we get right all of the time. The papers and exploits I pointed > to should provide ample evidence that zero allocations are a problem > that can have serious (and costly) consequences. We (i.e., Red Hat) > recognize this risk and have made it our goal to help stem some of > these problems. Are you talking about cases where user writes malloc (0) explicitly, or where user writes malloc (n); and the optimizers figure out n is 0, either always, or e.g. because the compiler decided to duplicate the code and and in one of the branches it proves it is 0, or you want to add a runtime warning when malloc is called with 0? E.g. I don't see how writing malloc (0) in the code should have anything to do with any overflows. The cases where jump threading creates cases where we call functions with constant arguments and we then warn on them is also problematic, often such code is even dead except the compiler is not able to prove it. > >It IMHO doesn't belong to either of these. > > You've made that quite clear. I struggle to reconcile your > position in this case with the one in debate about the > -Wimplicit-fallthrough option where you insisted on the exact > opposite despite the overwhelming number of false positives > caused by it. Why is it appropriate for that option to be in > -Wextra and not this one? It also matters what one can do to tell the compiler the code is fine. For e.g. -Wimplicit-fallthrough and many other warnings one can add a comment, or attribute, etc. to get the warning out of the way. But the patch you've posted for the alloca (0) stuff contained changes of n to n + !n, so the warning basically asks people to slow down their code for no reason, just to get the warning out of the way. Jakub
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
On Fri, Nov 18, 2016 at 05:19:31PM +, Iain Sandoe wrote: > I’d like to do that; is there a way to force a jump table for a limited set > of cases? > (the example was about the smallest I could get where GCC elected to produce > a jump table instead of branches) --param case-values-threshold ? Segher
Re: Import libcilkrts Build 4467 (PR target/68945)
2016-11-17 20:01 GMT+03:00 Jeff Law : > On 11/17/2016 09:56 AM, Ilya Verbin wrote: >> >> 2016-11-17 18:50 GMT+03:00 Rainer Orth : >>> >>> Rainer Orth writes: >>> I happened to notice that my libcilkrts SPARC port has been applied upstream. So to reach closure on this issue for the GCC 7 release, I'd like to import upstream into mainline which seems to be covered by the free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even though https://gcc.gnu.org/codingconventions.html#upstream lists nothing specific and we have no listed maintainer. >>> >>> >>> I initially used Ilya's intel.com address, which bounced. Now using the >>> current address listed in MAINTAINERS... >> >> >> Yeah, I don't work for Intel anymore. And I'm not a libcilkrts >> maintainer, so I can't approve it. > > Do you want to be? IIRC I was going to nominate you, but held off knowing > your situation was going to change. > > If you're interested in maintainer positions, I can certainly still nominate > you. I have little experience with this library, and no longer have a connection with Cilk developers an Intel, so I'm not interested. -- Ilya
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 09:01 AM, Martin Sebor wrote: On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Well, how about the rule is that you look at the convention of the specific list you are adding something to? At least that way we retain local consistency and don't mess up those parts of the documentation that we have already tried to organize in some way. There's so much material in the command-line options chapter that it would be hard to figure out how to present it even if the content were completely static, much less when people are adding/renaming/modifying options all the time. -Sandra
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
On Nov 18, 2016, at 2:45 AM, Rainer Orth wrote: > So the current suggestion is to combine my fixincludes patch and Jack's > patch to disable use if !__BLOCKS__. > I guess this is ok for mainline now to restore bootstrap? I think we are down to everyone likes this, Ok. The big question is, does this need a back port? If you fix includes virtual members or data members of C/C++ classes, just be careful disappearing content as that can change the layout of the structure or class.
libgo patch committed: Move sched variable from C to Go
This patch simply moves the schedt (aka Sched) type and the sched variable from C to Go. This is a step toward further changes. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 242594) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -fc4ca600b2fc6de81fd3c4014542d6a50593db1a +bf4762823c4543229867436399be3ae30b4d13bb The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/runtime/runtime2.go === --- libgo/go/runtime/runtime2.go(revision 242581) +++ libgo/go/runtime/runtime2.go(working copy) @@ -550,9 +550,6 @@ const ( _MaxGomaxprocs = 1 << 8 ) -/* -Commented out for gccgo for now. - type schedt struct { // accessed atomically. keep at top to ensure alignment on 32-bit systems. goidgen uint64 @@ -578,18 +575,17 @@ type schedt struct { runqsize int32 // Global cache of dead G's. - gflock mutex - gfreeStack *g - gfreeNoStack *g - ngfree int32 + gflock mutex + gfree *g + ngfree int32 // Central cache of sudog structs. sudoglock mutex sudogcache *sudog - // Central pool of available defer structs of different sizes. + // Central pool of available defer structs. deferlock mutex - deferpool [5]*_defer + deferpool *_defer gcwaiting uint32 // gc is waiting to run stopwait int32 @@ -608,7 +604,6 @@ type schedt struct { procresizetime int64 // nanotime() of last change to gomaxprocs totaltime int64 // ∫gomaxprocs dt up to procresizetime } -*/ // The m.locked word holds two pieces of state counting active calls to LockOSThread/lockOSThread. // The low bit (LockExternal) is a boolean reporting whether any LockOSThread call is active. @@ -772,8 +767,10 @@ var ( ncpu int32 -// forcegc forcegcstate -// sched schedt + // forcegc forcegcstate + + sched schedt + // newprocsint32 // Information about what cpu features are available. Index: libgo/go/runtime/stubs.go === --- libgo/go/runtime/stubs.go (revision 242581) +++ libgo/go/runtime/stubs.go (working copy) @@ -520,3 +520,9 @@ func dumpregs(*_siginfo_t, unsafe.Pointe // Temporary for gccgo until we port panic.go. func startpanic() + +// Temporary for gccgo until we port proc.go. +//go:linkname getsched runtime.getsched +func getsched() *schedt { + return &sched +} Index: libgo/runtime/proc.c === --- libgo/runtime/proc.c(revision 242581) +++ libgo/runtime/proc.c(working copy) @@ -351,48 +351,18 @@ runtime_mcall(void (*pfn)(G*)) // // Design doc at http://golang.org/s/go11sched. -typedef struct Sched Sched; -struct Sched { - Lock; - - uint64 goidgen; - M* midle; // idle m's waiting for work - int32 nmidle; // number of idle m's waiting for work - int32 nmidlelocked; // number of locked m's waiting for work - int32 mcount; // number of m's that have been created - int32 maxmcount; // maximum number of m's allowed (or die) - - P* pidle; // idle P's - uint32 npidle; - uint32 nmspinning; - - // Global runnable queue. - G* runqhead; - G* runqtail; - int32 runqsize; - - // Global cache of dead G's. - Lockgflock; - G* gfree; - - uint32 gcwaiting; // gc is waiting to run - int32 stopwait; - Notestopnote; - uint32 sysmonwait; - Notesysmonnote; - uint64 lastpoll; - - int32 profilehz; // cpu profiling rate -}; +typedef struct schedt Sched; enum { - // Number of goroutine ids to grab from runtime_sched.goidgen to local per-P cache at once. + // Number of goroutine ids to grab from runtime_sched->goidgen to local per-P cache at once. // 16 seems to provide enough amortization, but other than that it's mostly arbitrary number. GoidCacheBatch = 16, }; -Sched runtime_sched; +extern Sched* runtime_getsched() __asm__ (GOSYM_PREFIX "runtime.getsched"); + +static Sched* runtime_sched; int32 runtime_gomaxprocs; uint32 runtime_needextram = 1; M runtime_m0; @@ -471,6 +441,8 @@ runtime_schedinit(void) const byte *p; Eface i; + runtime_sched = runtime_getsched(); + m = &runtime_m0; g = &runtime_g0; m->g0 = g; @@ -479,7 +451,7 @@ runtime_schedinit(void) initcontext(); - runtime_sched.maxmcount =
Re: [Patch v4 0/17] Add support for _Float16 to AArch64 and ARM
On Fri, Nov 11, 2016 at 03:37:17PM +, James Greenhalgh wrote: > > Hi, > > This patch set enables the _Float16 type specified in ISO/IEC TS 18661-3 > for AArch64 and ARM. The patch set has been posted over the past two months, > with many of the target-independent changes approved. I'm reposting it in > entirity in the form I hope to commit it to trunk. > > The patch set can be roughly split in three; first, hookization of > TARGET_FLT_EVAL_METHOD, and changes to the excess precision logic in the > compiler to handle the new values for FLT_EVAL_METHOD defined in > ISO/IEC TS-18661-3. Second, the AArch64 changes required to enable _Float16, > and finally the ARM changes required to enable _Float16. > > The broad goals and an outline of each patch in the patch set were > described in https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02383.html . > As compared to the original submission, the patch set has grown an ARM > port, and has had several rounds of technical review on the target > independent aspects. > > This has resulted in many of the patches already being approved, a full > summary of the status of each ticket is immediately below. > > Clearly the focus for review of this patch set now needs to be the AArch64 > and ARM ports, I hope the appropriate maintainers will be able to do so in > time for the patch set to be accepted for GCC 7. > > I've built and tested the full patch set on ARM (cross and native), > AArch64 (cross and native) and x86_64 (native) with no identified issues. All the target independent changes are now approved, and all the ARM patches have been approved (two are conditional on minor changes). I'd like to apply the target independent and ARM changes to trunk, while I wait for approval of the AArch64 changes. The changes for the two ports should be independent. Would that be acceptable, or would you prefer me to wait for review of the AArch64 changes? I will then send a follow-up patch for doc/extend.texi detailing the availability of _Float16 on ARM (I'm holding off on doing this until I know which order the ARM and AArch64 parts will go in). Thanks, James > -- > Target independent changes > > 10 patches, 9 previously approved, 1 New implementing testsuite > changes to enable _Float16 tests in more circumstances on ARM. > -- > > [Patch 1/17] Add a new target hook for describing excess precision intentions > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html > > [Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386 > > Blanket approved by Jeff in: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html > > [Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390 > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html > > [Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k > > Blanket approved by Jeff in: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html > And by Andreas: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html > > There was a typo in the original patch, fixed in: > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01173.html > which I would apply as an "obvious" fix to the original patch. > > [Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3] > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html > > Joseph had a comment in > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests > should check FLT_EVAL_METHOD from rather than > __FLT_EVAL_METHOD__. Rather than implement that suggestion, I added tests > to patch 6 which tested the macro, and left the tests in this > patch testing the internal macro. > > [Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION > > Approved (after removing a rebase bug): > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00231.html > > [Patch 7/17] Delete TARGET_FLT_EVAL_METHOD and poison it. > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02401.html > > [Patch 8/17] Make _Float16 available if HFmode is available > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02403.html > > [Patch libgcc 9/17] Update soft-fp from glibc > > Self approved under policy that we can update libraries which GCC mirrors > without further approval. > > [Patch testsuite patch 10/17] Add options for floatN when checking effective > target for support > > NEW! > > > AArch64 changes > > 3 patches, none reviewed > > > [Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns > > Not reviewed, last pinged (^6): > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00584.html > > [Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and > truncations > > Not reviewed: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02395.html > > [Patch AArch64 13/17] Enable _Float16 for AArch64 > > Not reviewed: > https://gcc.gnu.org/ml/gcc-patches/2016-10/ms
Re: [PATCH] enable -fprintf-return-value by default
On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. Thanks Martin gcc-fprintf-return-value.diff gcc/c-family/ChangeLog: * c.opt (-fprintf-return-value): Enable by default. gcc/ChangeLog: * doc/invoke.texi (-fprintf-return-value): Document that option is enabled by default. OK once you and Sandra are in-sync on the doc changes. jeff
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 05:32 PM, Martin Sebor wrote: I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: I have the "advantage" of being a GCC gray beard and had the misfortune of maintaining a system that had one of those allocas (hpux) *and* having to debug heap exhaustions in GCC that would occur due to this kind of construct for (some large number of iterations) frobit (...) Where frobit would allocate a metric ton of stuff with alloca. Note the calls to alloca down in frobit would all appear to be at the same stack depth (alloca literally recorded the SP value and the PA was a fixed frame target). So the calls to alloca within frobit wouldn't deallocate anything. http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html Their GCC-derived compilers apparently enable it with -fno-builtins. Right. That said, I don't view this as reason to exclude the warning from -Wextra. The vendor-provided compilers are obviously customized versions of GCC with their own features, including their own set of options and warnings. It shouldn't stop us from enabling those we expect to be useful to the typical GCC user on typical targets, and especially those that can help expose sources of common bugs. Recognizing I'm preaching to choir here: Calling alloca with any argument is considered dangerous enough to be discouraged in most man pages. Alloca(0) is a potentially dangerous corner case of an already dangerous API. It seems at least as problematic as any of the warnings already in -Wextra, and I'd say as many in -Wall. Right. We're well in the "living dangerously" space. But I could claim that on one of these targets, warning about an alloca (0) would likely result in a developer removing it or forcing it to allocate some small amount of space to shut up the warning. That in turn runs the risk of making the target code more prone to heap exhaustion and possibly less secure, particularly if the developer isn't aware of the special nature of alloca (0). Even on systems with this unusual alloca implementation, since alloca(0) is special (and expensive), warning on such calls would likely be useful. In fact, since calling alloca(0) in these environments is actually important, warning on them coukd help detect their unintended absence. The few such calls that are intended can be easily tweaked to suppress the warning. It certainly is special and often expensive. But I'd come back to the likely behavior of the developer. I suspect unless there's a comment indicating the alloca (0) is intentional, they're likely to just remove it. So I see both sides here and I'm not sure what the best path forward should be. jeff
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 12:58 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. Maybe that's the key. We don't want to warn for alloca (0); But we should warn for whatever = alloca (0); The former is a clear request for GC of the alloca space. The latter it much more likely a request for space where something went wrong computing the amount of space. jeff
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 10:38 AM, Sandra Loosemore wrote: On 11/18/2016 09:01 AM, Martin Sebor wrote: On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Well, how about the rule is that you look at the convention of the specific list you are adding something to? At least that way we retain local consistency and don't mess up those parts of the documentation that we have already tried to organize in some way. There's so much material in the command-line options chapter that it would be hard to figure out how to present it even if the content were completely static, much less when people are adding/renaming/modifying options all the time. I think it would be be ideal if all the options were sorted the same way in all sections. Is there some command to have texinfo sort them for us? If not, can we write a script to sort them, either each time just before generating the docs or manually? (I'm happy to help.) Otherwise, consistency will continue to be an elusive goal. There are at least two reasons why I don't think following the style used by each section is likely to yield good results (and clearly hasn't to date). First, the big sections already have examples of both approaches (or simply options out of order). In some of them it can also be hard to tell if the radix of the options you're looking to for guidance starts with an 'n'. Second, when adding more than one option to different sections (such as the -Wformat-length and -fprintf-format-length options) having to remember to apply a different sort order for each (warnings are sorted by radix but optimization options, for the most parts, strictly alphabetically), seems also likely to trip people up. Martin PS I don't mind moving the -fno-printf-return-value option up or down and I will do it before committing the patch. I would just prefer to be able not to have to remember and worry about all these subtle rules. There are too many of them and they tend to take time and energy away from things that matter more (like fixing bugs).
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 11:52 AM, Martin Sebor wrote: I think it would be be ideal if all the options were sorted the same way in all sections. Is there some command to have texinfo sort them for us? If not, can we write a script to sort them, either each time just before generating the docs or manually? (I'm happy to help.) Otherwise, consistency will continue to be an elusive goal. I'm not aware of texinfo way to do this automatically. There are at least two reasons why I don't think following the style used by each section is likely to yield good results (and clearly hasn't to date). First, the big sections already have examples of both approaches (or simply options out of order). In some of them it can also be hard to tell if the radix of the options you're looking to for guidance starts with an 'n'. Second, when adding more than one option to different sections (such as the -Wformat-length and -fprintf-format-length options) having to remember to apply a different sort order for each (warnings are sorted by radix but optimization options, for the most parts, strictly alphabetically), seems also likely to trip people up. Let's split this issue off by moving the option into the location Sandra has asked so that we're at least kindof, sorta, locally consistent. That allows your patch to go forward. Then separately we can see if we can bring more sense to the larger issue. Sandra has tried to work towards bring sanity to our documentation (which has grown like field bindweed over time) and we can include a discussion about this issue in that larger effort. PS I don't mind moving the -fno-printf-return-value option up or down and I will do it before committing the patch. I would just prefer to be able not to have to remember and worry about all these subtle rules. There are too many of them and they tend to take time and energy away from things that matter more (like fixing bugs). Understood. But that's also part of the reason why we delegate things -- there's a million little things to remember and nobody can remember them all. So it's a balance between saying "we should clean this up and bring consistency here now" and "the maintainer has asked for a change, let's do that and address the consistency issues separately". There's obviously pros and cons to each decision which I don't enumerate ;-) Jeff
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
On Fri, Nov 18, 2016 at 9:42 AM, Mike Stump wrote: > On Nov 18, 2016, at 2:45 AM, Rainer Orth > wrote: >> So the current suggestion is to combine my fixincludes patch and Jack's >> patch to disable use if !__BLOCKS__. > >> I guess this is ok for mainline now to restore bootstrap? > > I think we are down to everyone likes this, Ok. The big question is, does > this need a back port? My thinking on that subject is that since include fixes do not directly affect the compiler and, instead, facilitate its use on various platforms, it is therefore reasonably safe to back port. Especially if adequate guards (selection tests) are included in the fix to prevent it from taking action on the wrong files. But I also think it is really a "steering committee" decision. For me, I think it is safe enough.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi! On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote: > > Hi HJ, could you please commit it? > > Done. I'm seeing lots of ICEs with this. E.g. reduced: typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__)); typedef unsigned char __mmask8; typedef float __v4sf __attribute__ ((__vector_size__ (16))); static inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_setzero_ps (void) { return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f }; } __m128 foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D, __m128 __E, __m128 *__F) { return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B, (__v4sf) __C, (__v4sf) __D, (__v4sf) __E, (__v4sf) __A, (const __v4sf *) __F, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); } ICEs with -mavx5124fmaps -O0, but succeeds with -mavx512vl -mavx5124fmaps -O0 or -mavx5124fmaps -O2. fcn_mask = gen_avx5124fmaddps_4fmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz; msk_mov = gen_avx512vl_loadv4sf_mask; looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't imply -mavx512vl, so using -mavx512vl insns unconditionally is just wrong. You need some fallback if avx512vl isn't available, perhaps use avx512f 512-bit masked insns with bits in the mask forced to pick only the ones you want? Also, seems there are various formatting issues in the change, e.g. shortly after s4fma_expand: there is indentation by 3 chars relative to above { instead of 2, gen_rtx_SUBREG (V16SFmode, tmp, 0)); has extra 1 char indentation, some lines too long. Jakub