Re: [PATCH] Fix gcc.dg/tree-ssa/pr68198.c

2016-09-28 Thread Richard Biener
On Wed, 28 Sep 2016, Bernd Edlinger wrote:

> Hi,
> 
> this test does fail because a non-existent dump file is referenced in
> the dg-final.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> OK for trunk?

Ok.

Richard.


Re: [PATCH, PING] DWARF: process all TYPE_DECL nodes when iterating on scopes

2016-09-28 Thread Richard Biener
On Tue, Sep 27, 2016 at 4:01 PM, Pierre-Marie de Rodat
 wrote:
> On 09/07/2016 11:30 AM, Richard Biener wrote:
>>
>> That said, with the idea of early debug in place and thus giving
>> more responsibility to the frontends I wonder in what order the Ada
>> FE calls debug_hooks.early_global_decl ()? Maybe it's the middle-end
>> and it should arrange for a more natural order on function nests.  So
>> I'd appreciate if you can investigate this side of the issue a bit,
>> that is, simply avoid the bad ordering.
>
>
> I investigated this track: it’s not the Ada front-end’s calls to the
> debug_hooks.early_global_decl that matter, but actually the ones in
> cgraphunit.c (symbol_table::finalize_compilation_unit).
>
> The new patch attached tries to fix the call order. Bootstrapping and
> regtesting on x86_64-linux were clean except one testcase where the debug
> output changed legitimally (matcher updated). Ok to commit? Thank you in
> advance!

Hmm, interesting approach.  It might work reliably at this point of
the compilation
but we do actually recycle cgraph nodes.  So I wonder if given we do have
->origin / ->next_nested in the cgraph if we can simply perform a walk in the
appropriate order.

But then OTOH in my early LTO debug patchset I have

Index: early-lto-debug/gcc/dwarf2out.c
===
--- early-lto-debug.orig/gcc/dwarf2out.c2016-09-26
15:51:37.666113699 +0200
+++ early-lto-debug/gcc/dwarf2out.c 2016-09-27 08:52:20.802507268 +0200
@@ -23837,6 +24261,16 @@ dwarf2out_early_global_decl (tree decl)
  if (!DECL_STRUCT_FUNCTION (decl))
goto early_decl_exit;

+ /* Emit an abstract origin of a function first.  This happens
+with C++ constructor clones for example and makes
+dwarf2out_abstract_function happy which requires the early
+DIE of the abstract instance to be present.  */
+ if (DECL_ABSTRACT_ORIGIN (decl))
+   {
+ current_function_decl = DECL_ABSTRACT_ORIGIN (decl);
+ dwarf2out_decl (DECL_ABSTRACT_ORIGIN (decl));
+   }
+
  current_function_decl = decl;
}
   dwarf2out_decl (decl);

which is not 100% equivalent (looking at the abstract origin) but it
sounds like a
similar issue.  So I wonder if doing the same for DECL_CONTEXT being a function
makes sense here and would also fix your particular issue in a more
reliable way.

Thanks,
Richard.

> --
> Pierre-Marie de Rodat


Re: [PATCH] Refactor section/label init for early LTO debug

2016-09-28 Thread Richard Biener
On Tue, 27 Sep 2016, Rainer Orth wrote:

> Hi Richard,
> 
> >> this patch introduced many pch assembly comparison failures on Solaris
> >> (both sparc and x86, 32 and 64-bit, /bin/as only), like
> >> 
> >> FAIL: gcc.dg/pch/common-1.c   -O3 -g  assembly comparison
> >> FAIL: gcc.dg/pch/common-1.c  -O0 -g assembly comparison
> >> 
> >> gcc.log shows this diff:
> >> 
> >> <   .long   .Letext0
> >> >   .long
> >> 
> >> While the .Letext0 label is still in the assembly output, it isn't
> >> referenced inside .debug_line, as can be seen in the diff.
> >
> > Can you check if moving
> >
> > static void
> > dwarf2out_assembly_start (void)
> > {
> > #ifndef DWARF2_LINENO_DEBUGGING_INFO
> >   ASM_GENERATE_INTERNAL_LABEL (text_section_label, TEXT_SECTION_LABEL, 0);
> >   ASM_GENERATE_INTERNAL_LABEL (text_end_label, TEXT_END_LABEL, 0);
> >   ASM_GENERATE_INTERNAL_LABEL (cold_text_section_label,
> >COLD_TEXT_SECTION_LABEL, 0);
> >   ASM_GENERATE_INTERNAL_LABEL (cold_end_label, COLD_END_LABEL, 0);
> >
> >   switch_to_section (text_section);
> >   ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
> > #endif
> >
> > back to dwarf2out_init helps?  Ah!  Does
> >
> > Index: gcc/dwarf2out.c
> > ===
> > --- gcc/dwarf2out.c (revision 240521)
> > +++ gcc/dwarf2out.c (working copy)
> > @@ -25657,14 +25687,6 @@ dwarf2out_init (const char *filename ATT
> >  vec_alloc (macinfo_table, 64);
> >  #endif
> >  
> > -  /* Make sure the line number table for .text always exists.  */
> > -  text_section_line_info = new_line_info_table ();
> > -  text_section_line_info->end_label = text_end_label;
> > -
> > -#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > -  cur_line_info_table = text_section_line_info;
> > -#endif
> > -
> >/* If front-ends already registered a main translation unit but we were 
> > not
> >   ready to perform the association, do this now.  */
> >if (main_translation_unit != NULL_TREE)
> > @@ -25688,6 +25710,14 @@ dwarf2out_assembly_start (void)
> >ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
> >  #endif
> >  
> > +  /* Make sure the line number table for .text always exists.  */
> > +  text_section_line_info = new_line_info_table ();
> > +  text_section_line_info->end_label = text_end_label;
> > +
> > +#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > +  cur_line_info_table = text_section_line_info;
> > +#endif
> > +
> >if (HAVE_GAS_CFI_SECTIONS_DIRECTIVE
> >&& dwarf2out_do_cfi_asm ()
> >&& (!(flag_unwind_tables || flag_exceptions)
> >
> >
> > fix it?  Ok if it passes testing for you.
> 
> testing on a single testcase worked fine.  I'll just run a full
> bootstrap to make sure everything is fine.

Meanwhile I bootstrapped and tested it on x86_64-unknown-linux-gnu and
applied it to trunk.

Sorry for the breakage.

Richard.

2016-09-28  Richard Biener  

* dwarf2out.c (dwarf2out_init): Move text_section_line_info init ...
(dwarf2out_assembly_start): ... here.

diff -u gcc/dwarf2out.c gcc/dwarf2out.c
--- gcc/dwarf2out.c (working copy)
+++ gcc/dwarf2out.c (working copy)
@@ -25684,14 +25687,6 @@
 vec_alloc (macinfo_table, 64);
 #endif
 
-  /* Make sure the line number table for .text always exists.  */
-  text_section_line_info = new_line_info_table ();
-  text_section_line_info->end_label = text_end_label;
-
-#ifdef DWARF2_LINENO_DEBUGGING_INFO
-  cur_line_info_table = text_section_line_info;
-#endif
-
   /* If front-ends already registered a main translation unit but we were not
  ready to perform the association, do this now.  */
   if (main_translation_unit != NULL_TREE)
@@ -25715,6 +25710,14 @@
   ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
 #endif
 
+  /* Make sure the line number table for .text always exists.  */
+  text_section_line_info = new_line_info_table ();
+  text_section_line_info->end_label = text_end_label;
+
+#ifdef DWARF2_LINENO_DEBUGGING_INFO
+  cur_line_info_table = text_section_line_info;
+#endif
+
   if (HAVE_GAS_CFI_SECTIONS_DIRECTIVE
   && dwarf2out_do_cfi_asm ()
   && (!(flag_unwind_tables || flag_exceptions)



Re: [PATCH] Refactor section/label init for early LTO debug

2016-09-28 Thread Richard Biener
On Wed, 28 Sep 2016, Richard Biener wrote:

> On Tue, 27 Sep 2016, Rainer Orth wrote:
> 
> > Hi Richard,
> > 
> > >> this patch introduced many pch assembly comparison failures on Solaris
> > >> (both sparc and x86, 32 and 64-bit, /bin/as only), like
> > >> 
> > >> FAIL: gcc.dg/pch/common-1.c   -O3 -g  assembly comparison
> > >> FAIL: gcc.dg/pch/common-1.c  -O0 -g assembly comparison
> > >> 
> > >> gcc.log shows this diff:
> > >> 
> > >> <   .long   .Letext0
> > >> >   .long
> > >> 
> > >> While the .Letext0 label is still in the assembly output, it isn't
> > >> referenced inside .debug_line, as can be seen in the diff.
> > >
> > > Can you check if moving
> > >
> > > static void
> > > dwarf2out_assembly_start (void)
> > > {
> > > #ifndef DWARF2_LINENO_DEBUGGING_INFO
> > >   ASM_GENERATE_INTERNAL_LABEL (text_section_label, TEXT_SECTION_LABEL, 0);
> > >   ASM_GENERATE_INTERNAL_LABEL (text_end_label, TEXT_END_LABEL, 0);
> > >   ASM_GENERATE_INTERNAL_LABEL (cold_text_section_label,
> > >COLD_TEXT_SECTION_LABEL, 0);
> > >   ASM_GENERATE_INTERNAL_LABEL (cold_end_label, COLD_END_LABEL, 0);
> > >
> > >   switch_to_section (text_section);
> > >   ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
> > > #endif
> > >
> > > back to dwarf2out_init helps?  Ah!  Does
> > >
> > > Index: gcc/dwarf2out.c
> > > ===
> > > --- gcc/dwarf2out.c (revision 240521)
> > > +++ gcc/dwarf2out.c (working copy)
> > > @@ -25657,14 +25687,6 @@ dwarf2out_init (const char *filename ATT
> > >  vec_alloc (macinfo_table, 64);
> > >  #endif
> > >  
> > > -  /* Make sure the line number table for .text always exists.  */
> > > -  text_section_line_info = new_line_info_table ();
> > > -  text_section_line_info->end_label = text_end_label;
> > > -
> > > -#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > > -  cur_line_info_table = text_section_line_info;
> > > -#endif
> > > -
> > >/* If front-ends already registered a main translation unit but we 
> > > were 
> > > not
> > >   ready to perform the association, do this now.  */
> > >if (main_translation_unit != NULL_TREE)
> > > @@ -25688,6 +25710,14 @@ dwarf2out_assembly_start (void)
> > >ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
> > >  #endif
> > >  
> > > +  /* Make sure the line number table for .text always exists.  */
> > > +  text_section_line_info = new_line_info_table ();
> > > +  text_section_line_info->end_label = text_end_label;
> > > +
> > > +#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > > +  cur_line_info_table = text_section_line_info;
> > > +#endif
> > > +
> > >if (HAVE_GAS_CFI_SECTIONS_DIRECTIVE
> > >&& dwarf2out_do_cfi_asm ()
> > >&& (!(flag_unwind_tables || flag_exceptions)
> > >
> > >
> > > fix it?  Ok if it passes testing for you.
> > 
> > testing on a single testcase worked fine.  I'll just run a full
> > bootstrap to make sure everything is fine.
> 
> Meanwhile I bootstrapped and tested it on x86_64-unknown-linux-gnu and
> applied it to trunk.

Ah, you already did.

Thanks,
Richard.


Re: [PATCH] Last bit from Early LTO debug merge -- move break_out_includes

2016-09-28 Thread Richard Biener
On Tue, 27 Sep 2016, Jason Merrill wrote:

> OK.

May I ping https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01835.html then
on which this depends?

Thanks,
Richard.

> On Tue, Sep 27, 2016 at 9:47 AM, Richard Biener  wrote:
> >
> > This is moving break_out_includes (aka -feliminate-dwarf2-dups handling)
> > to early finish.  It requires some massaging with the way we pick
> > up its results which previously "conveniently" ended up in the
> > limbo list but after moving to early will be scrapped off by
> > late finish doing a limbo list flush (where we just drop all CU DIEs
> > thinking it can only be comp_unit_die () itself).  Thus this patch
> > adds a cu_die_list list for it (much similar to the comdat one we already
> > have).
> >
> > Eventually -feliminate-dwarf2-dups might die anyway(?), its testing
> > coverage is very low and it's declared broken for C++.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Richard.
> >
> > 2016-09-27  Richard Biener  
> >
> > * dwarf2out.c (cu_die_list): New global.
> > (dwarf2out_finish): Walk cu_die_list instead of limbo DIEs.  Add
> > main_comp_unit_die to cu_die_list if we created it.
> > Move break_out_includes ...
> > (dwarf2out_early_finish): ... here.  Push created CU DIEs onto
> > the cu_die_list.
> >
> > diff -u gcc/dwarf2out.c gcc/dwarf2out.c
> > --- gcc/dwarf2out.c (working copy)
> > +++ gcc/dwarf2out.c (working copy)
> > @@ -2866,6 +2866,9 @@
> >  /* A list of type DIEs that have been separated into comdat sections.  */
> >  static GTY(()) comdat_type_node *comdat_type_list;
> >
> > +/* A list of CU DIEs that have been separated.  */
> > +static GTY(()) limbo_die_node *cu_die_list;
> > +
> >  /* A list of DIEs with a NULL parent waiting to be relocated.  */
> >  static GTY(()) limbo_die_node *limbo_die_list;
> >
> > @@ -27855,11 +27858,6 @@
> >resolve_addr (comp_unit_die ());
> >move_marked_base_types ();
> >
> > -  /* Generate separate CUs for each of the include files we've seen.
> > - They will go into limbo_die_list.  */
> > -  if (flag_eliminate_dwarf2_dups)
> > -break_out_includes (comp_unit_die ());
> > -
> >/* Initialize sections and labels used for actual assembler output.  */
> >init_sections_and_labels ();
> >
> > @@ -27867,7 +27865,7 @@
> >   have children.  */
> >add_sibling_attributes (comp_unit_die ());
> >limbo_die_node *node;
> > -  for (node = limbo_die_list; node; node = node->next)
> > +  for (node = cu_die_list; node; node = node->next)
> >  add_sibling_attributes (node->die);
> >for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
> >  add_sibling_attributes (ctnode->root_die);
> > @@ -27876,7 +27874,15 @@
> >   skeleton compile_unit DIE that remains in the .o, while
> >   most attributes go in the DWO compile_unit_die.  */
> >if (dwarf_split_debug_info)
> > -main_comp_unit_die = gen_compile_unit_die (NULL);
> > +{
> > +  limbo_die_node *cu;
> > +  main_comp_unit_die = gen_compile_unit_die (NULL);
> > +  cu = limbo_die_list;
> > +  gcc_assert (cu->die == main_comp_unit_die);
> > +  limbo_die_list = limbo_die_list->next;
> > +  cu->next = cu_die_list;
> > +  cu_die_list = cu;
> > +}
> >else
> >  main_comp_unit_die = comp_unit_die ();
> >
> > @@ -27988,7 +27994,7 @@
> >
> >/* Output all of the compilation units.  We put the main one last so that
> >   the offsets are available to output_pubnames.  */
> > -  for (node = limbo_die_list; node; node = node->next)
> > +  for (node = cu_die_list; node; node = node->next)
> >  output_comp_unit (node->die, 0);
> >
> >hash_table comdat_type_table (100);
> > @@ -28217,6 +28223,21 @@
> >prune_unused_types ();
> >  }
> >
> > +  /* Generate separate CUs for each of the include files we've seen.
> > + They will go into limbo_die_list and from there to cu_die_list.  */
> > +  if (flag_eliminate_dwarf2_dups)
> > +{
> > +  gcc_assert (limbo_die_list == NULL);
> > +  break_out_includes (comp_unit_die ());
> > +  limbo_die_node *cu;
> > +  while ((cu = limbo_die_list))
> > +   {
> > + limbo_die_list = cu->next;
> > + cu->next = cu_die_list;
> > + cu_die_list = cu;
> > +   }
> > +}
> > +
> >/* The early debug phase is now finished.  */
> >early_dwarf_finished = true;
> >  }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 1/2][RS6000] .gnu.attribute Tag_GNU_Power_ABI_FP

2016-09-28 Thread Segher Boessenkool
Hi Alan,

On Wed, Sep 28, 2016 at 10:41:45AM +0930, Alan Modra wrote:
> Extend this attribute to cover long double ABIs, for 64-bit too.  The
> idea is that the linker should warn if you are linking object files
> with incompatible ABIs, for example IEEE128 long double with IBM long
> double.  It's only a warning, and doesn't catch everything.  In
> particular, global data mismatches.  ie. initialised global variables
> in one format can be used by code that expects a different format
> without warning.  Also, a function returning "sizeof (long double)"
> or similar won't cause an object file to be tagged.  Oh, and you need
> a new linker to see long double warnings.
> 
> The patch also corrects an error that crept in to code setting
> rs6000_passes_float.  See the added comment.  Passing IEEE128 values
> in vsx regs ought to set both Tag_GNU_Power_ABI_FP and
> Tag_GNU_Power_ABI_Vector.
> 
> I've also added a new option, default on, that disables output of
> .gnu_attribute tags.  That's needed because this patch alone
> introduces libstdc++ testsuite regressions, fixed by the next patch.

>  #ifdef HAVE_AS_GNU_ATTRIBUTE
> +  /* ??? The value emitted depends on options active at file end.
> + Assume anyone using #pragma or attributes that might change
> + options knows what they are doing.  */
> +  if ((TARGET_64BIT || DEFAULT_ABI == ABI_V4)
> +  && rs6000_passes_float)
> +fprintf (asm_out_file, "\t.gnu_attribute 4, %d\n",
> +  ((TARGET_DF_FPR | TARGET_DF_SPE ? 1
> +: TARGET_SF_FPR | TARGET_SF_SPE ? 3
> +: 2)
> +   | (!rs6000_passes_long_double ? 0
> +  : !TARGET_LONG_DOUBLE_128 ? 2 * 4
> +  : TARGET_IEEEQUAD ? 3 * 4
> +  : 1 * 4)));

This will become much more readable if you expand it to a few statements,
using a temp var perhaps.

> @@ -37085,6 +37117,9 @@ static struct rs6000_opt_var const rs6000_opt_vars[] =
>{ "warn-cell-microcode",
>  offsetof (struct gcc_options, x_rs6000_warn_cell_microcode),
>  offsetof (struct cl_target_option, x_rs6000_warn_cell_microcode), },
> +  { "gnu-attr",
> +offsetof (struct gcc_options, x_rs6000_gnu_attr),
> +offsetof (struct cl_target_option, x_rs6000_gnu_attr), },
>  };

Please spell out "gnu-attribute" for the option name.

> +  if test "$gcc_cv_gld_major_version" -eq 2 -a 
> "$gcc_cv_gld_minor_version" -ge 28 -o "$gcc_cv_gld_major_version" -gt 2; then

That line is a little long.

> +  AC_DEFINE(HAVE_LD_PPC_GNU_ATTR, 1,
> +[Define if your PowerPC linker has .gnu_attribute long double support.])

"LD" didn't make me think "long double" ;-)  Maybe use a better name?

Okay for trunk with those thing fixed, and Joseph's comments taken care
of of course.

Thanks,


Segher


Re: [PATCH 2/2] Disable .gnu_attribute tags in compatibility-ldbl.o

2016-09-28 Thread Segher Boessenkool
On Wed, Sep 28, 2016 at 10:43:38AM +0930, Alan Modra wrote:
> compatibility-ldbl.o is compiled with -mlong-double-64.  When
> long double .gnu_attribute tags are checked by the linker, it
> complains about the mismatch between this file and others in
> libstdc++.
> 
> Bootstrapped and regression tested powerpc64le-linux and
> powerpc64-linux biarch.  OK to apply?

Okay with the option name fixed.  Thanks,


Segher


[PATCH][RTL ifcvt] Transform (X == CST) ? -CST : Y into (X == CST) ? -X : Y when conditional negation is available

2016-09-28 Thread Kyrill Tkachov

Hi all,

This patch tries to avoid materialising an immediate
when comparison has shown that it is already loaded.
This is performed during ifcvt and only when the target has the conditional 
negate
or inverse optabs, which for now is only aarch64.

Thus for the code:
int
foo (int a, int b)
{
  return a == 5 ? -5 : b;
}

we currently emit on aarch64:
foo:
mov w2, -5
cmp w0, 5
cselw0, w1, w2, ne
ret

but with this patch we emit:
foo:
cmp w0, 5
csneg   w0, w1, w0, ne
ret

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

P.S. The simpler form of the transformation: X == CST ? CST : Y --> X == CST ? 
X : Y
could also be beneficial IMO but I found that it can cause bad codegen in 
combine
due to extending the live range of X. I'm still working on a way to work around 
that,
but that is a separate piece of work anyway.

2016-09-28  Kyrylo Tkachov  

* ifcvt.c (noce_try_avoid_const_materialization): New function.
(noce_process_if_block): Use it.

2016-09-28  Kyrylo Tkachov  

* gcc.target/aarch64/ifcvt_avoid_const_materialization_1.c: New test.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 24542f008485e6c28e068030fa301f2ce040efc1..203cfe98f82a49c35520a2cb16d2fa09d4c85c72 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1313,6 +1313,84 @@ noce_try_inverse_constants (struct noce_if_info *if_info)
   return false;
 }
 
+/* Try to avoid materializing a constant if we know it's in one of the
+   registers.  For example:
+   (X == CST) ? -CST : Y --> (X == CST) ? -X : Y.
+   Do this only if conditional negation is available.
+   Similar for bitwise NOT.  */
+
+static bool
+noce_try_avoid_const_materialization (struct noce_if_info *if_info)
+{
+  if (!noce_simple_bbs (if_info))
+return false;
+
+  rtx cond = if_info->cond;
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx_code code = GET_CODE (cond);
+  machine_mode mode = GET_MODE (if_info->x);
+
+  if (!(code == EQ || code == NE)
+  || !REG_P (XEXP (cond, 0))
+  || !REG_P (if_info->x)
+  || GET_MODE (XEXP (cond, 0)) != mode
+  || !CONST_INT_P (XEXP (cond, 1)))
+return false;
+
+  rtx cst = XEXP (cond, 1);
+  if (cst == CONST0_RTX (mode))
+return false;
+
+  rtx non_cst = XEXP (cond, 0);
+  rtx eq_side = code == EQ ? b : a;
+  if (!CONST_INT_P (eq_side))
+return false;
+
+  HOST_WIDE_INT cstval = INTVAL (cst);
+  HOST_WIDE_INT eq_side_val = INTVAL (eq_side);
+
+  rtx_code op_code;
+  if (eq_side_val == ~cstval)
+op_code = NOT;
+  else if (eq_side_val != HOST_WIDE_INT_MIN && (cstval == -eq_side_val))
+op_code = NEG;
+  else
+return false;
+
+  /* By the rules of the negcc/notcc optabs must happen when the COND is true,
+ in this case when register in COND is equal to CST so always set the
+ comparison to EQ.  */
+  if (code == NE)
+{
+  a = non_cst;
+  cond = gen_rtx_fmt_ee (EQ, GET_MODE (cond), non_cst, cst);
+}
+  else
+b = non_cst;
+
+  start_sequence ();
+  rtx target
+= emit_conditional_neg_or_complement (if_info->x, op_code, mode,
+	   cond, a, b);
+  if (!target)
+{
+  end_sequence ();
+  return false;
+}
+
+  if (target != if_info->x)
+noce_emit_move_insn (if_info->x, target);
+
+  rtx_insn *seq = end_ifcvt_sequence (if_info);
+  if (!seq)
+return false;
+
+   emit_insn_before_setloc (seq, if_info->jump,
+			 INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_avoid_const_materialization";
+  return true;
+}
 
 /* Convert "if (test) x = a; else x = b", for A and B constant.
Also allow A = y + c1, B = y + c2, with a common y between A
@@ -3606,6 +3684,8 @@ noce_process_if_block (struct noce_if_info *if_info)
 goto success;
   if (noce_try_inverse_constants (if_info))
 goto success;
+  if (noce_try_avoid_const_materialization (if_info))
+goto success;
   if (!targetm.have_conditional_execution ()
   && noce_try_store_flag_constants (if_info))
 goto success;
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_avoid_const_materialization_1.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_avoid_const_materialization_1.c
new file mode 100644
index ..b2a05eaff606a54afc40c3899ad5926c889283dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_avoid_const_materialization_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we avoid moving the immediate into a register
+   if comparison has shown that the inverse or negated form is
+   already in one of the registers.  */
+
+int
+foo (int a, int b)
+{
+  return a == 5 ? -5 : b;
+}
+
+int
+bar (int a, int b)
+{
+  return a != 5 ? b : ~5;
+}
+
+/* { dg-final { scan-assembler-not "mov\\tw\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "csneg\\tw\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "csinv\\tw\[0-9\]+" 1 } } */


Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components

2016-09-28 Thread Segher Boessenkool
On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
> On 09/23/2016 02:21 AM, Segher Boessenkool wrote:
> >--- a/gcc/function.c
> >+++ b/gcc/function.c
> >@@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void)
> >   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >   edge orig_entry_edge = entry_edge;
> >
> >-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
> >   rtx_insn *prologue_seq = make_prologue_seq ();
> >-  rtx_insn *epilogue_seq = make_epilogue_seq ();
> >
> >   /* Try to perform a kind of shrink-wrapping, making sure the
> >  prologue/epilogue is emitted only around those parts of the
> >@@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void)
> >
> >   try_shrink_wrapping (&entry_edge, prologue_seq);
> >
> >+  try_shrink_wrapping_separate (entry_edge->dest);
> >+
> >+  if (crtl->shrink_wrapped_separate)
> >+prologue_seq = make_prologue_seq ();
> Note this implies that make_prologue_seq (and consequently the target 
> specific bits for prologue generation) can be safely called more than 
> once.  That's probably OK, but might be worth a comment -- your decision 
> whether or not to add the comment.

It is only called twice if separately shrink-wrapping (and that did
anything), so it won't change current behaviour.

I'll add a comment.

> >+static void
> >+place_prologue_for_one_component (unsigned int which, basic_block head)
> >+{
> >+  /* The block we are currently dealing with.  */
> >+  basic_block bb = head;
> >+  /* Is this the first time we visit this block, i.e. have we just gone
> >+ down the tree.  */
> >+  bool first_visit = true;
> >+
> >+  /* Walk the dominator tree, visit one block per iteration of this loop.
> >+ Each basic block is visited twice: once before visiting any children
> >+ of the block, and once after visiting all of them (leaf nodes are
> >+ visited only once).  As an optimization, we do not visit subtrees
> >+ that can no longer influence the prologue placement.  */
> >+  for (;;)
> Is there some reason you wrote this as a loop rather than recursion? 
> IMHO it makes this function (and spread_components) more difficult to 
> reason about than it needs to be.

It would recurse a few thousand frames deep on not terribly big testcases
(i.e. I've seen it happen).  This uses a lot of stack space on some ABIs,
and is very slow as well (this is the slowest function here by far).
Unlimited recursion is bad.

> >+/* Mark HAS_COMPONENTS for every block dominated by at least one block 
> >with
> >+   PRO_COMPONENTS set for the respective components, starting at HEAD.  */
> >+static void
> >+spread_components (basic_block head)
> I don't see any references to PRO_COMPONENTS in the actual code.  If I'm 
> reading the code correctly, you're just accumulating/propagating 
> HAS_COMPONENTS from parents to children via a DFS walk of the dominator 
> tree, right?

Yeah, s/PRO_COMPONENTS/HAS_COMPONENTS/.  I have renamed things a few
time but missed this one due to the ALL VARS IN ALL CAPS style :-)

> >+/* Place code for prologues and epilogues for COMPONENTS where we can put
> >+   that code at the end of basic blocks.  */
> >+static void
> >+emit_common_tails_for_components (sbitmap components)
> [ Snip. ]
> >+
> >+  /* Put the code at the end of the BB, but before any final jump.  */
> >+  if (!bitmap_empty_p (epi))
> So what if the final jump uses hard registers in one way or another?   I 
> don't immediately see anything that verifies it is safe to transpose the 
> epilogue and the final jump.

Whoops.  Thanks for catching this.

> Conceptually want the epilogue to occur on the outgoing edge(s).  But 
> you want to actually transpose the epilogue and the control flow insn so 
> that you can insert the epilogue in one place.

The same problem happens with prologues, too.

> But naive transposition isn't safe.  The control flow insn may use one 
> or more registers that you were restoring or you may be on a cc0 target. 

A cc0 target can not use separate shrink-wrapping *anyway* if any of the
components would clobber cc0, so that is no problem here.

>   I think you need to handle the former more cleanly.  The latter I'd 
> be comfortable filtering out in try_shrink_wrapping_separate.

I'm thinking to not do the common tail optimisation if BB_END is a
JUMP_INSN but not simplejump_p (or a return or a sibling call).  Do
you see any problem with that?

> This has similarities to some of the problems we've had in the past with 
> rtl-gcse insertion as well as output reloads on insns with control flow.

Oh, I had so much fun recently with the latter :-/

> With transposition issue addressed, the only blocker I see are some 
> simple testcases we can add to the suite.  They don't have to be real 
> extensive.

Yeah, working on it.

> And one motivating example for the list archives, ideally 
> the glibc malloc case.

Okay.

Thanks for the reviews,


Segher


RE: [PATCH] [ARC] Fix emitting jump tables for ARCv2

2016-09-28 Thread Claudiu Zissulescu
 
> Could you rebase this onto the current head please.  I couldn't get
> this to merge cleanly.
> 
Sure, I will come back to you asap.

//Claudiu


Un-break dwarf2out for DWARF2_LINENO_DEBUGGING_INFO configurations (was: [PATCH] Refactor section/label init for early LTO debug)

2016-09-28 Thread Thomas Schwinge
Hi!

On Tue, 27 Sep 2016 12:34:46 +0200 (CEST), Richard Biener  
wrote:
> --- gcc/dwarf2out.c (revision 240521)
> +++ gcc/dwarf2out.c (working copy)
> @@ -25657,14 +25687,6 @@ dwarf2out_init (const char *filename ATT
>  vec_alloc (macinfo_table, 64);
>  #endif
>  
> -  /* Make sure the line number table for .text always exists.  */
> -  text_section_line_info = new_line_info_table ();
> -  text_section_line_info->end_label = text_end_label;
> -
> -#ifdef DWARF2_LINENO_DEBUGGING_INFO
> -  cur_line_info_table = text_section_line_info;
> -#endif
> -
>/* If front-ends already registered a main translation unit but we were 
> not
>   ready to perform the association, do this now.  */
>if (main_translation_unit != NULL_TREE)
> @@ -25688,6 +25710,14 @@ dwarf2out_assembly_start (void)
>ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
>  #endif
>  
> +  /* Make sure the line number table for .text always exists.  */
> +  text_section_line_info = new_line_info_table ();
> +  text_section_line_info->end_label = text_end_label;
> +
> +#ifdef DWARF2_LINENO_DEBUGGING_INFO
> +  cur_line_info_table = text_section_line_info;
> +#endif
> +
>if (HAVE_GAS_CFI_SECTIONS_DIRECTIVE
>&& dwarf2out_do_cfi_asm ()
>&& (!(flag_unwind_tables || flag_exceptions)

(This got committed in r240545.)  For DWARF2_LINENO_DEBUGGING_INFO
configurations (that is, nvptx; Bernd CCed, who originally authored the
DWARF2_LINENO_DEBUGGING_INFO support), this breaks things because of
uninitialized text_section_line_info/cur_line_info_table.  OK to fix as
follows?

commit e745307ac5a90e999133df21139526230e81
Author: Thomas Schwinge 
Date:   Wed Sep 28 10:55:23 2016 +0200

Un-break dwarf2out for DWARF2_LINENO_DEBUGGING_INFO configurations

gcc/
* dwarf2out.c (dwarf2_lineno_debug_hooks): Use
dwarf2out_assembly_start.
---
 gcc/dwarf2out.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git gcc/dwarf2out.c gcc/dwarf2out.c
index 51cab42..8c18c4c 100644
--- gcc/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -2539,7 +2539,7 @@ const struct gcc_debug_hooks dwarf2_lineno_debug_hooks =
   dwarf2out_init,
   debug_nothing_charstar,
   debug_nothing_charstar,
-  debug_nothing_void,
+  dwarf2out_assembly_start,
   debug_nothing_int_charstar,
   debug_nothing_int_charstar,
   debug_nothing_int_charstar,


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: Un-break dwarf2out for DWARF2_LINENO_DEBUGGING_INFO configurations (was: [PATCH] Refactor section/label init for early LTO debug)

2016-09-28 Thread Richard Biener
On Wed, 28 Sep 2016, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 27 Sep 2016 12:34:46 +0200 (CEST), Richard Biener  
> wrote:
> > --- gcc/dwarf2out.c (revision 240521)
> > +++ gcc/dwarf2out.c (working copy)
> > @@ -25657,14 +25687,6 @@ dwarf2out_init (const char *filename ATT
> >  vec_alloc (macinfo_table, 64);
> >  #endif
> >  
> > -  /* Make sure the line number table for .text always exists.  */
> > -  text_section_line_info = new_line_info_table ();
> > -  text_section_line_info->end_label = text_end_label;
> > -
> > -#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > -  cur_line_info_table = text_section_line_info;
> > -#endif
> > -
> >/* If front-ends already registered a main translation unit but we were 
> > not
> >   ready to perform the association, do this now.  */
> >if (main_translation_unit != NULL_TREE)
> > @@ -25688,6 +25710,14 @@ dwarf2out_assembly_start (void)
> >ASM_OUTPUT_LABEL (asm_out_file, text_section_label);
> >  #endif
> >  
> > +  /* Make sure the line number table for .text always exists.  */
> > +  text_section_line_info = new_line_info_table ();
> > +  text_section_line_info->end_label = text_end_label;
> > +
> > +#ifdef DWARF2_LINENO_DEBUGGING_INFO
> > +  cur_line_info_table = text_section_line_info;
> > +#endif
> > +
> >if (HAVE_GAS_CFI_SECTIONS_DIRECTIVE
> >&& dwarf2out_do_cfi_asm ()
> >&& (!(flag_unwind_tables || flag_exceptions)
> 
> (This got committed in r240545.)  For DWARF2_LINENO_DEBUGGING_INFO
> configurations (that is, nvptx; Bernd CCed, who originally authored the
> DWARF2_LINENO_DEBUGGING_INFO support), this breaks things because of
> uninitialized text_section_line_info/cur_line_info_table.  OK to fix as
> follows?

Ok.

Richard.

> commit e745307ac5a90e999133df21139526230e81
> Author: Thomas Schwinge 
> Date:   Wed Sep 28 10:55:23 2016 +0200
> 
> Un-break dwarf2out for DWARF2_LINENO_DEBUGGING_INFO configurations
> 
>   gcc/
>   * dwarf2out.c (dwarf2_lineno_debug_hooks): Use
>   dwarf2out_assembly_start.
> ---
>  gcc/dwarf2out.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git gcc/dwarf2out.c gcc/dwarf2out.c
> index 51cab42..8c18c4c 100644
> --- gcc/dwarf2out.c
> +++ gcc/dwarf2out.c
> @@ -2539,7 +2539,7 @@ const struct gcc_debug_hooks dwarf2_lineno_debug_hooks =
>dwarf2out_init,
>debug_nothing_charstar,
>debug_nothing_charstar,
> -  debug_nothing_void,
> +  dwarf2out_assembly_start,
>debug_nothing_int_charstar,
>debug_nothing_int_charstar,
>debug_nothing_int_charstar,
> 


Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Jonathan Wakely

On 27/09/16 23:28 +, Joseph Myers wrote:

On Tue, 27 Sep 2016, Jonathan Wakely wrote:


This adds the new 3D std::hypot() functions. This implementation seems
to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
hypot(hypot(x, y), z), and should be a bit more accurate at very large
or very small values due to reducing the arguments by the largest one.
Improvements welcome though, as this is not my forte.


Should I take it from this implementation that C++ is not concerned by
certain considerations that would arise for C: spurious underflow
exceptions from the scaling when some arguments much larger than others;
spurious "invalid" exceptions from the comparisons when any argument is
(quiet) NaN; handling of mixed (quiet) NaN and Inf arguments (where ISO C
Annex F has Inf returned, not NaN)?


The entire spec from the C++ draft is:  Returns: √ x^2 + y^2 + z^2

It doesn't mention "without undue overflow or underflow" as the C
standard does for hypot(x, y). Handling those conditions would of
course be nice, but I'm not capable of doing so, and I'd rather have a
poor implementation than none.

Again, improvements are welcome. I'm not the right person to do that.




Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Jonathan Wakely

On 27/09/16 23:50 +0200, Marc Glisse wrote:

On Tue, 27 Sep 2016, Jonathan Wakely wrote:


This adds the new 3D std::hypot() functions. This implementation seems
to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
hypot(hypot(x, y), z), and should be a bit more accurate at very large
or very small values due to reducing the arguments by the largest one.
Improvements welcome though, as this is not my forte.


I understand the claims about accuracy, but the one about speed seems 
very surprising to me. Was your test specifically with denormals on 
not-so-recent hardware, or specifically when sqrt does a library call 
for errno? Otherwise, I have a hard time believing that 3 
multiplications and 2 additions can be slower than 4 multiplications, 
2 additions, plus a bunch of tests and divisions.


I didn't test any denormals, just similar inputs to the ones in the
new testcase. Looking at my crappy benchmark again it seems I'd
commented out the sqrt(x*x+y*y+z*z) version in favour of a call to
gsl_hypot3 ... oops!

So what I committed is faster than hypot(hypot(x, y), z) but as you
guessed, slower than the simple sqrt(x*x+y*y+z*z).

I'm happy to defer to you in terms of what would be a better
implementation, as I'm very unlikely to ever need to use this. I don't
know whether we should be optimizing for speed or accuracy. But since
nobody else had contributed an implementation (and this was the one
C++17 feature libc++ has that we don't :-) I decided to add it.



[PATCH] Fix undefined behavior on g++.dg/pr77550.C (PR 77748)

2016-09-28 Thread Bernd Edlinger
Hi,

in this test case, c-reduce had created various forms of undefined
behaviour, which did only show up on m68k and arm-none-eabi.

Most notably the original problem did only happen when
512/sizeof(basic_string) = 7, but with a non-zero remainder.
In the way that c-reduce used to reduce that structure
there is an array of enum as a place holder.
But that depends on sizeof(enum) to be 4, which is
not the case on arm-none-eabi.

Other problems come from a not-initialized _M_string_length
which ends up in a memcmp crash.

And the _Alloc_hider constructor, which is linked to
libstdc++, but that works only as long as the signature
of that function is not changed, so it turned out
the test case would not link with gcc-6 for instance.

I found no way how to write that template specialization
without breaking the purpose of reproducing the
original PR, because __attribute((noinline, noclone))
seems to be completely ignored in this case.

I made sure that PR 77550 can still be reproduced
with this test case both on gcc-6 and trunk.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.2016-09-28  Bernd Edlinger  

	PR c++/77748
	* g++.dg/pr77550.C: Avoid undefined behavior.

ndex: pr77550.C
===
--- pr77550.C	(revision 240540)
+++ pr77550.C	(working copy)
@@ -1,6 +1,7 @@
 // { dg-do run }
 // { dg-options "-std=c++14 -O3" }
 
+#define enum enum __attribute((mode(SI)))
 namespace std {
 typedef int size_t;
 inline namespace __cxx11 {}
@@ -229,15 +230,17 @@
   struct _Alloc_hider {
 _Alloc_hider(pointer, allocator && = allocator());
   } _M_dataplus;
-  size_type _M_string_length;
+  size_type _M_string_length = 0;
   enum { _S_local_capacity = 15 } _M_local_buf[_S_local_capacity];
-  pointer _M_local_data();
-  void _M_set_length(size_type);
-  basic_string() : _M_dataplus(_M_local_data()) { _M_set_length(0); }
+  basic_string() : _M_dataplus(0) {}
   basic_string(const basic_string &) : _M_dataplus(0) {}
   size_type size() { return _M_string_length; }
   char *data() const {}
 };
+//template<> basic_string, std::allocator>::
+//_Alloc_hider::_Alloc_hider(char*, std::allocator&&) {}
+extern "C" void
+_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_ (...) {}
 }
 template 
 int operator==(basic_string<_CharT> &p1, const basic_string<_CharT> &p2) {


Re: [PATCH] Fix undefined behavior on g++.dg/pr77550.C (PR 77748)

2016-09-28 Thread Richard Biener
On Wed, 28 Sep 2016, Bernd Edlinger wrote:

> Hi,
> 
> in this test case, c-reduce had created various forms of undefined
> behaviour, which did only show up on m68k and arm-none-eabi.
> 
> Most notably the original problem did only happen when
> 512/sizeof(basic_string) = 7, but with a non-zero remainder.
> In the way that c-reduce used to reduce that structure
> there is an array of enum as a place holder.
> But that depends on sizeof(enum) to be 4, which is
> not the case on arm-none-eabi.
> 
> Other problems come from a not-initialized _M_string_length
> which ends up in a memcmp crash.
> 
> And the _Alloc_hider constructor, which is linked to
> libstdc++, but that works only as long as the signature
> of that function is not changed, so it turned out
> the test case would not link with gcc-6 for instance.
> 
> I found no way how to write that template specialization
> without breaking the purpose of reproducing the
> original PR, because __attribute((noinline, noclone))
> seems to be completely ignored in this case.
> 
> I made sure that PR 77550 can still be reproduced
> with this test case both on gcc-6 and trunk.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Yes.

Thanks,
Richard.

> 
> Thanks
> Bernd.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] libstdc++/77686 use may_alias for std::function storage

2016-09-28 Thread Jonathan Wakely

std::function::swap does swap(_M_functor, x._M_functor) which
exchanges the underlying bytes of the two _Any_data PODs using that
type's implicit assignment operator. However, unlike using placement
new to construct an object in the storage, simply memcpying the bytes
doesn't change the effective type of the storage, so alias analysis
decides it can't point to whatever we've tried to store in there.

This attribute tells the middle-end to assume anything those bytes
could contain any type of object, which is exactly what we want.

PR libstdc++/77686
* include/std/functional (_Any_data): Add may_alias attribute.

Tested powerpc64le-linux, committing to trunk and gcc-6-branch.

std::any doesn't have the same problem, because I defined an _Op_xfer
operation that uses placement new to copy the object into the
destination, so the dynamic type is changed correctly.  I haven't
checked whether optional and variant might have similar problems in
any implicit copy/move operations working on POD storage.


commit f8578f058f049c36ddfae1bf654f3dc93fd8b7ef
Author: Jonathan Wakely 
Date:   Wed Sep 28 11:41:08 2016 +0100

libstdc++/77686 use may_alias for std::function storage

PR libstdc++/77686
* include/std/functional (_Any_data): Add may_alias attribute.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 8b2389c..74e65c7 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1401,7 +1401,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 void (_Undefined_class::*_M_member_pointer)();
   };
 
-  union _Any_data
+  union [[gnu::may_alias]] _Any_data
   {
 void*   _M_access()   { return &_M_pod_data[0]; }
 const void* _M_access() const { return &_M_pod_data[0]; }


Re: [PATCH] libstdc++/77686 use may_alias for std::function storage

2016-09-28 Thread Richard Biener
On Wed, Sep 28, 2016 at 12:57 PM, Jonathan Wakely  wrote:
> std::function::swap does swap(_M_functor, x._M_functor) which
> exchanges the underlying bytes of the two _Any_data PODs using that
> type's implicit assignment operator. However, unlike using placement
> new to construct an object in the storage, simply memcpying the bytes
> doesn't change the effective type of the storage, so alias analysis
> decides it can't point to whatever we've tried to store in there.

To clarify -- the implicit assingment operator for PODs (including unions)
simply expands to an aggregate assignment which is subject to TBAA
rules and thus in this case instantiates an effective type of _Any_data.
Using memcpy would have worked as memcpy _does_ transfer
the effective type of the storage.  It wasn't points-to deciding things
here but TBAA given automatic storage X with effective type T
read via an lvalue of type _Any_data.

> This attribute tells the middle-end to assume anything those bytes
> could contain any type of object, which is exactly what we want.

It tells the middle-end that accessing storage via a _pointer_ to such
marked type may access storage with a dynamic type that is not
compatible with the type.

Details ;)

Btw, I think the patch is correct.

Richard.

> PR libstdc++/77686
> * include/std/functional (_Any_data): Add may_alias attribute.
>
> Tested powerpc64le-linux, committing to trunk and gcc-6-branch.
>
> std::any doesn't have the same problem, because I defined an _Op_xfer
> operation that uses placement new to copy the object into the
> destination, so the dynamic type is changed correctly.  I haven't
> checked whether optional and variant might have similar problems in
> any implicit copy/move operations working on POD storage.
>
>


Re: [PATCH] libstdc++/77686 use may_alias for std::function storage

2016-09-28 Thread Jonathan Wakely

On 28/09/16 13:14 +0200, Richard Biener wrote:

On Wed, Sep 28, 2016 at 12:57 PM, Jonathan Wakely  wrote:

std::function::swap does swap(_M_functor, x._M_functor) which
exchanges the underlying bytes of the two _Any_data PODs using that
type's implicit assignment operator. However, unlike using placement
new to construct an object in the storage, simply memcpying the bytes
doesn't change the effective type of the storage, so alias analysis
decides it can't point to whatever we've tried to store in there.


To clarify -- the implicit assingment operator for PODs (including unions)
simply expands to an aggregate assignment which is subject to TBAA
rules and thus in this case instantiates an effective type of _Any_data.
Using memcpy would have worked as memcpy _does_ transfer
the effective type of the storage.  It wasn't points-to deciding things
here but TBAA given automatic storage X with effective type T
read via an lvalue of type _Any_data.


This attribute tells the middle-end to assume anything those bytes
could contain any type of object, which is exactly what we want.


It tells the middle-end that accessing storage via a _pointer_ to such
marked type may access storage with a dynamic type that is not
compatible with the type.

Details ;)


But important ones, thanks.


[PATCH] [ARC COMMITTED] MAINTAINERS (Reviewers): Add myself.

2016-09-28 Thread Claudiu Zissulescu
From: claziss 

2016-09-28  Claudiu Zissulescu  

* MAINTAINERS (Reviewers): Add myself.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@240569 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   | 4 
 MAINTAINERS | 1 +
 2 files changed, 5 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index a51606c..e3dc6b7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2016-09-28  Claudiu Zissulescu  
+
+   * MAINTAINERS (Reviewers): Add myself.
+
 2016-09-26  Anton Kolesov  
 
* configure.ac: Disable "sim" directory for arc*-*-*.
diff --git a/MAINTAINERS b/MAINTAINERS
index 87157c9..644d02e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -276,6 +276,7 @@ check in changes outside of the parts of the compiler they 
maintain.
Reviewers
 
 aarch64 port   James Greenhalgh
+arc port   Claudiu Zissulescu  
 arm port   Kyrylo Tkachov  
 C front endMarek Polacek   
 dataflow   Paolo Bonzini   
-- 
1.9.1



Re: internal fn pretty printing

2016-09-28 Thread Nathan Sidwell

On 09/27/16 09:30, Bernd Schmidt wrote:

On 09/27/2016 12:58 PM, Nathan Sidwell wrote:

In working on some new code I got sufficiently frustrated to implement
pretty printing on internal function discriminators, as I think one of
you suggested a while back.  With this patch we get:

 .data_dep.2 = UNIQUE (OACC_FORK, .data_dep.2, -1);

rather than an obscure raw integer for the first argument.


I realized I could simplify the comma separator logic, by checking i for 
non-zeroness before emitting the arg.  Committed as obvious.


nathan
2016-09-28  Nathan Sidwell  

	* gimple-pretty-print.c (dump_gimple_call_args): Simplify "' "
	printing.

Index: gimple-pretty-print.c
===
--- gimple-pretty-print.c	(revision 240552)
+++ gimple-pretty-print.c	(working copy)
@@ -649,26 +649,21 @@ dump_gimple_call_args (pretty_printer *b
 	{
 	  i++;
 	  pp_string (buffer, enums[v]);
-	  if (i < gimple_call_num_args (gs))
-		pp_string (buffer, ", ");
 	}
 	}
 }
 
   for (; i < gimple_call_num_args (gs); i++)
 {
-  dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false);
-  if (i < gimple_call_num_args (gs) - 1)
+  if (i)
 	pp_string (buffer, ", ");
+  dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false);
 }
 
   if (gimple_call_va_arg_pack_p (gs))
 {
-  if (gimple_call_num_args (gs) > 0)
-{
-  pp_comma (buffer);
-  pp_space (buffer);
-}
+  if (i)
+	pp_string (buffer, ", ");
 
   pp_string (buffer, "__builtin_va_arg_pack ()");
 }


Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Marc Glisse

On Wed, 28 Sep 2016, Jonathan Wakely wrote:


On 27/09/16 23:28 +, Joseph Myers wrote:

On Tue, 27 Sep 2016, Jonathan Wakely wrote:


This adds the new 3D std::hypot() functions. This implementation seems
to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
hypot(hypot(x, y), z), and should be a bit more accurate at very large
or very small values due to reducing the arguments by the largest one.
Improvements welcome though, as this is not my forte.


Should I take it from this implementation that C++ is not concerned by
certain considerations that would arise for C: spurious underflow
exceptions from the scaling when some arguments much larger than others;
spurious "invalid" exceptions from the comparisons when any argument is
(quiet) NaN; handling of mixed (quiet) NaN and Inf arguments (where ISO C
Annex F has Inf returned, not NaN)?


The entire spec from the C++ draft is:  Returns: √ x^2 + y^2 + z^2

It doesn't mention "without undue overflow or underflow" as the C
standard does for hypot(x, y). Handling those conditions would of
course be nice, but I'm not capable of doing so, and I'd rather have a
poor implementation than none.


p0030r0 had "In addition to the two-argument versions of certain math
functions in  and its float and long double overloads, C++ adds
three-argument versions of these functions that follow similar semantics
as their two-argument counterparts."

One could interpret that as an intent to follow the C semantics. On the
other hand, the paper mentions speed and convenience as the main selling
points. Maybe LWG could clarify the intent?

--
Marc Glisse


[PATCH][1/2] Fix PR77768

2016-09-28 Thread Richard Biener

I am testing the following patch to avoid useless VRP range allocations
when we just ask for varying on stmts we don't know how to handle.
I think it should fix the PR where we end up assigning to the
static const vr_const_varying returned by get_value_range in the early VRP
context.

Eventually the VRP lattice needs to become sparse (or just growable, I 
have a patch for that as well).  Not until early VRP gets some more
capabilities though, it would be a waste otherwise.

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-09-28  Richard Biener  

* tree-vrp.c (set_defs_to_varying): New helper.
(vrp_initialize): Call it.
(vrp_visit_stmt): Likewise.
(evrp_dom_walker::before_dom_children): Likewise.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 240568)
+++ gcc/tree-vrp.c  (working copy)
@@ -343,6 +343,24 @@ set_value_range_to_varying (value_range
 }
 
 
+/* Set value-ranges of all SSA names defined by STMT to varying.  */
+
+static void
+set_defs_to_varying (gimple *stmt)
+{
+  ssa_op_iter i;
+  tree def;
+  FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
+{
+  unsigned ver = SSA_NAME_VERSION (def);
+  /* Avoid needlessly allocating value-ranges.  */
+  if (num_vr_values >= ver
+ && vr_value[ver])
+   set_value_range_to_varying (vr_value[ver]);
+}
+}
+
+
 /* Set value range VR to {T, MIN, MAX, EQUIV}.  */
 
 static void
@@ -7022,10 +7040,7 @@ vrp_initialize ()
prop_set_simulate_again (stmt, true);
  else if (!stmt_interesting_for_vrp (stmt))
{
- ssa_op_iter i;
- tree def;
- FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
-   set_value_range_to_varying (get_value_range (def));
+ set_defs_to_varying (stmt);
  prop_set_simulate_again (stmt, false);
}
  else
@@ -7901,8 +7916,6 @@ vrp_visit_stmt (gimple *stmt, edge *take
 {
   value_range vr = VR_INITIALIZER;
   tree lhs = gimple_get_lhs (stmt);
-  tree def;
-  ssa_op_iter iter;
   extract_range_from_stmt (stmt, taken_edge_p, output_p, &vr);
 
   if (*output_p)
@@ -7997,8 +8010,7 @@ vrp_visit_stmt (gimple *stmt, edge *take
 
   /* All other statements produce nothing of interest for VRP, so mark
  their outputs varying and prevent further simulation.  */
-  FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_DEF)
-set_value_range_to_varying (get_value_range (def));
+  set_defs_to_varying (stmt);
 
   return (*taken_edge_p) ? SSA_PROP_INTERESTING : SSA_PROP_VARYING;
 }
@@ -10726,12 +10738,7 @@ evrp_dom_walker::before_dom_children (ba
  && (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE))
update_value_range (output, &vr);
  else
-   {
- tree def;
- ssa_op_iter iter;
- FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_DEF)
-   set_value_range_to_varying (get_value_range (def));
-   }
+   set_defs_to_varying (stmt);
 
  /* Try folding stmts with the VR discovered.  */
  bool did_replace
@@ -10780,12 +10787,7 @@ evrp_dom_walker::before_dom_children (ba
}
}
   else
-   {
- tree def;
- ssa_op_iter iter;
- FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_DEF)
-   set_value_range_to_varying (get_value_range (def));
-   }
+   set_defs_to_varying (stmt);
 }
   bb->flags |= BB_VISITED;
   return NULL;


[PATCH][2/2] Fix PR77768

2016-09-28 Thread Richard Biener

The following patch makes VN not choke on code that stores to readonly
memory it knows the constant value of.  I took the liberty to clean up
the surrounding code a bit as well.

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-09-28  Richard Biener  

PR tree-optimization/77768
* tree-ssa-sccvn.c (visit_reference_op_store): Properly deal
with stores to a place we know has a constant value.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 240566)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -3575,7 +3575,7 @@ visit_reference_op_store (tree lhs, tree
 {
   bool changed = false;
   vn_reference_t vnresult = NULL;
-  tree result, assign;
+  tree assign;
   bool resultsame = false;
   tree vuse = gimple_vuse (stmt);
   tree vdef = gimple_vdef (stmt);
@@ -3599,9 +3599,11 @@ visit_reference_op_store (tree lhs, tree
  Otherwise, the vdefs for the store are used when inserting into
  the table, since the store generates a new memory state.  */
 
-  result = vn_reference_lookup (lhs, vuse, VN_NOWALK, &vnresult, false);
-  if (result)
+  vn_reference_lookup (lhs, vuse, VN_NOWALK, &vnresult, false);
+  if (vnresult
+  && vnresult->result)
 {
+  tree result = vnresult->result;
   if (TREE_CODE (result) == SSA_NAME)
result = SSA_VAL (result);
   resultsame = expressions_equal_p (result, op);
@@ -3616,22 +3618,21 @@ visit_reference_op_store (tree lhs, tree
}
 }
 
-  if ((!result || !resultsame)
+  if (!resultsame)
+{
   /* Only perform the following when being called from PRE
 which embeds tail merging.  */
-  && default_vn_walk_kind == VN_WALK)
-{
-  assign = build2 (MODIFY_EXPR, TREE_TYPE (lhs), lhs, op);
-  vn_reference_lookup (assign, vuse, VN_NOWALK, &vnresult, false);
-  if (vnresult)
+  if (default_vn_walk_kind == VN_WALK)
{
- VN_INFO (vdef)->use_processed = true;
- return set_ssa_val_to (vdef, vnresult->result_vdef);
+ assign = build2 (MODIFY_EXPR, TREE_TYPE (lhs), lhs, op);
+ vn_reference_lookup (assign, vuse, VN_NOWALK, &vnresult, false);
+ if (vnresult)
+   {
+ VN_INFO (vdef)->use_processed = true;
+ return set_ssa_val_to (vdef, vnresult->result_vdef);
+   }
}
-}
 
-  if (!result || !resultsame)
-{
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "No store match\n");
@@ -3644,9 +3645,7 @@ visit_reference_op_store (tree lhs, tree
   /* Have to set value numbers before insert, since insert is
 going to valueize the references in-place.  */
   if (vdef)
-   {
- changed |= set_ssa_val_to (vdef, vdef);
-   }
+   changed |= set_ssa_val_to (vdef, vdef);
 
   /* Do not insert structure copies into the tables.  */
   if (is_gimple_min_invariant (op)


Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Jonathan Wakely

On 28/09/16 13:31 +0200, Marc Glisse wrote:

On Wed, 28 Sep 2016, Jonathan Wakely wrote:


On 27/09/16 23:28 +, Joseph Myers wrote:

On Tue, 27 Sep 2016, Jonathan Wakely wrote:


This adds the new 3D std::hypot() functions. This implementation seems
to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
hypot(hypot(x, y), z), and should be a bit more accurate at very large
or very small values due to reducing the arguments by the largest one.
Improvements welcome though, as this is not my forte.


Should I take it from this implementation that C++ is not concerned by
certain considerations that would arise for C: spurious underflow
exceptions from the scaling when some arguments much larger than others;
spurious "invalid" exceptions from the comparisons when any argument is
(quiet) NaN; handling of mixed (quiet) NaN and Inf arguments (where ISO C
Annex F has Inf returned, not NaN)?


The entire spec from the C++ draft is:  Returns: √ x^2 + y^2 + z^2

It doesn't mention "without undue overflow or underflow" as the C
standard does for hypot(x, y). Handling those conditions would of
course be nice, but I'm not capable of doing so, and I'd rather have a
poor implementation than none.


p0030r0 had "In addition to the two-argument versions of certain math
functions in  and its float and long double overloads, C++ adds
three-argument versions of these functions that follow similar semantics
as their two-argument counterparts."

One could interpret that as an intent to follow the C semantics. On the
other hand, the paper mentions speed and convenience as the main selling
points. Maybe LWG could clarify the intent?


I expect the answer will be that it's QoI.



[PATCH] Avoid store forwarding issue in vectorizing strided SLP loads

2016-09-28 Thread Richard Biener

Currently strided SLP vectorization creates vector constructors composed
of vector elements.  This is a constructor form that is not handled
specially by the expander but it gets expanded via piecewise stores
to scratch memory and a load of that scratch memory.  This is obviously
bad on any modern CPU which can do store forwarding (which in this case
does not work on any CPU I know of).  The following patch simply avoids
the issue by making the vectorizer create integer loads, composing
a vector of that integers and then punning that to the desired vector
type.  Thus (V4SF){V2SF, V2SF} becomes (V4SF)(V2DI){DI, DI} and
every body is happy.  Especially x264 gets a 5-10% improvement
(dependent on vector size and x86 sub-architecture).

Handling the vector-vector constructors on the expander side would
require either similar punning or making vec_init parametric on
the element mode plus supporting vector elements in all targets
(which in the end probably will simply pun them similarly).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2016-09-28  Richard Biener  

* tree-vect-stmts.c (vectorizable_load): Avoid emitting vector
constructors with vector elements.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 240565)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -6862,17 +6925,40 @@ vectorizable_load (gimple *stmt, gimple_
   int nloads = nunits;
   int lnel = 1;
   tree ltype = TREE_TYPE (vectype);
+  tree lvectype = vectype;
   auto_vec dr_chain;
   if (memory_access_type == VMAT_STRIDED_SLP)
{
- nloads = nunits / group_size;
  if (group_size < nunits)
{
- lnel = group_size;
- ltype = build_vector_type (TREE_TYPE (vectype), group_size);
+ /* Avoid emitting a constructor of vector elements by performing
+the loads using an integer type of the same size,
+constructing a vector of those and then re-interpreting it
+as the original vector type.  This works around the fact
+that the vec_init optab was only designed for scalar
+element modes and thus expansion goes through memory.
+This avoids a huge runtime penalty due to the general
+inability to perform store forwarding from smaller stores
+to a larger load.  */
+ unsigned lsize
+   = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
+ enum machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
+ enum machine_mode vmode = mode_for_vector (elmode,
+nunits / group_size);
+ /* If we can't construct such a vector fall back to
+element loads of the original vector type.  */
+ if (VECTOR_MODE_P (vmode)
+ && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
+   {
+ nloads = nunits / group_size;
+ lnel = group_size;
+ ltype = build_nonstandard_integer_type (lsize, 1);
+ lvectype = build_vector_type (ltype, nloads);
+   }
}
  else
{
+ nloads = 1;
  lnel = nunits;
  ltype = vectype;
}
@@ -6925,9 +7011,17 @@ vectorizable_load (gimple *stmt, gimple_
}
  if (nloads > 1)
{
- tree vec_inv = build_constructor (vectype, v);
- new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
+ tree vec_inv = build_constructor (lvectype, v);
+ new_temp = vect_init_vector (stmt, vec_inv, lvectype, gsi);
  new_stmt = SSA_NAME_DEF_STMT (new_temp);
+ if (lvectype != vectype)
+   {
+ new_stmt = gimple_build_assign (make_ssa_name (vectype),
+ VIEW_CONVERT_EXPR,
+ build1 (VIEW_CONVERT_EXPR,
+ vectype, new_temp));
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+   }
}
 
  if (slp)



Re: [BUILDROBOT] tic6x-uclinux: undefined reference to `gnu_libc_printf_pointer_format(tree_node*, char const**)' (was: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905))

2016-09-28 Thread Segher Boessenkool
On Tue, Sep 27, 2016 at 12:08:46AM +0200, Jan-Benedict Glaw wrote:
> Hi Martin,
> 
> On Thu, 2016-09-08 13:03:12 -0600, Martin Sebor  wrote:
> > Attached is another update to the patch to address the last round
> > of comments and suggestions, most notably to:
> [...]
> 
> with the currently committed version, the tic6x-uclinux target fails,
> see ie.
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=630308 :

[ snip ]

The same happens for bfin-uclinux:

g++ -no-pie   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H 
-static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o c-family/stub-objc.o 
attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o c/c-convert.o c/c-aux-info.o 
c/c-objc-common.o c/c-parser.o c/c-array-notation.o c/c-fold.o 
c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o 
c-family/c-format.o c-family/c-gimplify.o c-family/c-indentation.o 
c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o 
c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o 
c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o 
c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o default-c.o 
\
  cc1-checksum.o libbackend.a main.o libcommon-target.a libcommon.a 
../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a 
../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a   -lmpc -lmpfr -lgmp 
-rdynamic -ldl  -L./../zlib -lz
bfin.o:(.data.rel+0x778): undefined reference to 
`gnu_libc_printf_pointer_format(tree_node*, char const**)'

(that symbol would be defined in linux.o, but that file is not built for
uclinux).


Segher


Re: Implement -Wimplicit-fallthrough (version 9)

2016-09-28 Thread Michael Matz
Hi,

On Tue, 27 Sep 2016, Tom Tromey wrote:

> The point of the warning is to make code more robust.  But accepting any 
> comment like "Don't fall through" is not more robust, but rather an 
> error waiting to happen; as IIUC the user has no way to detect this 
> case.
> 
> I think it's better for the comment-scanning feature to be very picky 
> (or even just not exist at all) -- that way you know exactly what you 
> are getting.  Lint was traditionally picky IIRC.  And, this is a warning 
> that isn't default and can also be disabled, so it's not as if users 
> have no recourse.

Not accepting
  /* And here we intentionally fall through because ... */
and forcing users to replace this by:
  /* fallthrough */
is not robust either.  It's actually actively lowering robustness of code, 
it creates work for programmers that will be regarded as pointless (and 
rightly so) and will merely lead to everybody disabling the warning (see 
our generated files) at which point we could just as well not have 
implemented it (which would be a shame because I think it's genuinely 
useful).

The point of warnings is to make code robust under the condition of not 
being a pain by giving zillions of false positives.  In this specific case 
the chance of giving false positives by being picky in how to disable the 
warning is very high.  On the other hand the chance of unintentionally 
disabling the warning by a negative comment like "Don't fall through here" 
is low, because presumably the one adding that comment (and hence thinking 
about that part of the code) also in fact put in the "break;" afterwards.

The argument with lint being picky would apply only if GCC would have 
added this warning maybe 20 years ago, not now where nearly nobody even 
knows what lint is, which lead to a large existing code base not having 
comments that would be accepted by lint but comments that do specify the 
intent of falling through.


Ciao,
Michael.
P.S.: Initially I even wanted to argue that the mere existence of _any_ 
comment before a case label would disable the warning.  I don't have the 
numbers but I bet even that version would have found the very same bugs 
that the picky version has.


Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Joseph Myers
On Wed, 28 Sep 2016, Jonathan Wakely wrote:

> On 27/09/16 23:28 +, Joseph Myers wrote:
> > On Tue, 27 Sep 2016, Jonathan Wakely wrote:
> > 
> > > This adds the new 3D std::hypot() functions. This implementation seems
> > > to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
> > > hypot(hypot(x, y), z), and should be a bit more accurate at very large
> > > or very small values due to reducing the arguments by the largest one.
> > > Improvements welcome though, as this is not my forte.
> > 
> > Should I take it from this implementation that C++ is not concerned by
> > certain considerations that would arise for C: spurious underflow
> > exceptions from the scaling when some arguments much larger than others;
> > spurious "invalid" exceptions from the comparisons when any argument is
> > (quiet) NaN; handling of mixed (quiet) NaN and Inf arguments (where ISO C
> > Annex F has Inf returned, not NaN)?
> 
> The entire spec from the C++ draft is:  Returns: √ x^2 + y^2 + z^2

As a further issue: even if you ignore exceptions and Annex F issues, and 
don't have NaN arguments, if you have an Inf argument it will be the 
largest, and so your code will do Inf/Inf, and so return NaN, if any 
argument is Inf (when obviously the correct answer in that case is Inf).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Define 3-argument overloads of std::hypot for C++17 (P0030R1)

2016-09-28 Thread Jonathan Wakely

On 28/09/16 12:40 +, Joseph Myers wrote:

On Wed, 28 Sep 2016, Jonathan Wakely wrote:


On 27/09/16 23:28 +, Joseph Myers wrote:
> On Tue, 27 Sep 2016, Jonathan Wakely wrote:
>
> > This adds the new 3D std::hypot() functions. This implementation seems
> > to be faster than the naïve sqrt(x*x + y*y + z*z) implementation, or
> > hypot(hypot(x, y), z), and should be a bit more accurate at very large
> > or very small values due to reducing the arguments by the largest one.
> > Improvements welcome though, as this is not my forte.
>
> Should I take it from this implementation that C++ is not concerned by
> certain considerations that would arise for C: spurious underflow
> exceptions from the scaling when some arguments much larger than others;
> spurious "invalid" exceptions from the comparisons when any argument is
> (quiet) NaN; handling of mixed (quiet) NaN and Inf arguments (where ISO C
> Annex F has Inf returned, not NaN)?

The entire spec from the C++ draft is:  Returns: √ x^2 + y^2 + z^2


As a further issue: even if you ignore exceptions and Annex F issues, and
don't have NaN arguments, if you have an Inf argument it will be the
largest, and so your code will do Inf/Inf, and so return NaN, if any
argument is Inf (when obviously the correct answer in that case is Inf).


I've opened https://gcc.gnu.org/PR6 but I don't plan to work on
it.




[PATCH 1/2] Temporary remove "at least 8 byte alignment" code from x86

2016-09-28 Thread Denys Vlasenko
This change drops forced alignment to 8 if requested alignment is higher
than 8: before the patch, -falign-functions=9 was generating

.p2align 4,,8
.p2align 3

which means: "align to 16 if the skip is 8 bytes or less; else align to 8".
After this change, ".p2align 3" is not emitted.

This behavior will be implemented differently by the next patch.

The new SUBALIGN_LOG define will be used by the next patch.

2016-09-27  Denys Vlasenko  

* config/i386/freebsd.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Remove "If N
is large, do at least 8 byte alignment" code. Add SUBALIGN_LOG
define.
* config/i386/gnu-user.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/iamcu.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/openbsdelf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/x86-64.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.

Index: gcc/config/i386/freebsd.h
===
--- gcc/config/i386/freebsd.h   (revision 239390)
+++ gcc/config/i386/freebsd.h   (working copy)
@@ -92,25 +92,19 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
command to advance the location counter to a multiple of 1<= (1<<(LOG))) \
+fprintf ((FILE), "\t.p2align %d\n", (LOG));\
+  else \
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
-   /* Make sure that we have at least 8 byte alignment if > 8 byte \
-  alignment is preferred.  */  \
-   if ((LOG) > 3   \
-   && (1 << (LOG)) > ((MAX_SKIP) + 1)  \
-   && (MAX_SKIP) >= 7) \
- fputs ("\t.p2align 3\n", (FILE)); \
-  }
\
 }  \
   } while (0)
 #endif
Index: gcc/config/i386/gnu-user.h
===
--- gcc/config/i386/gnu-user.h  (revision 239390)
+++ gcc/config/i386/gnu-user.h  (working copy)
@@ -94,24 +94,18 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
command to advance the location counter to a multiple of 1<= (1<<(LOG))) \
+fprintf ((FILE), "\t.p2align %d\n", (LOG));\
+  else \
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
-   /* Make sure that we have at least 8 byte alignment if > 8 byte \
-  alignment is preferred.  */  \
-   if ((LOG) > 3   \
-   && (1 << (LOG)) > ((MAX_SKIP) + 1)  \
-   && (MAX_SKIP) >= 7) \
- fputs ("\t.p2align 3\n", (FILE)); \
-  }
\
 }  \
   } while (0)
 #endif
Index: gcc/config/i386/iamcu.h
===
--- gcc/config/i386/iamcu.h (revision 239390)
+++ gcc/config/i386/iamcu.h (working copy)
@@ -62,23 +62,17 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 /* A C statement to output to the stdio stream FILE an assembler
command to advance the location counter to a multiple of 1<= (1<<(LOG))) \
+fprintf ((FILE), "\t.p2align %d\n", (LOG));\
+  else \
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
-   /* Make sure that we have at least 8 byte alignment if > 8 byte \
-  alignment is preferred.  */  \
-   if ((LOG) > 3   \
-   && (1 << (LOG)) > ((MAX_SKIP) + 1)  \
-   && (MAX_SKIP) >= 7) \
- fputs ("\t.p2align 3\n", (FILE)); \
-  }
\
 }  \
   } while (0)
 
Index: gcc/config/i386/openbsdelf.h
===
--- gcc/config/i386/openbsdelf.h(revision 239390)
+++ gcc/config/i386/openbsdelf.h(working copy)
@@ -63,24 +63,18 @@ along with GCC; see the file COPYING3.  If not see
 
 /* 

[PATCH 0/2] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 4

2016-09-28 Thread Denys Vlasenko
These patches are for this bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240
"RFE: extend -falign-xyz syntax"

This is version 4 of the patch set. Changes since version 3:

* Improved documentation in invoke.texi

* Fixed x86-specific calculation of default N2 value:
  previous version was doing it incorrectly for cross-compile


[PATCH 2/2] Extend -falign-FOO=N to N[,M[,N2[,M2]]]

2016-09-28 Thread Denys Vlasenko
falign-functions=N is too simplistic.

Ingo Molnar ran some tests and it seems that on latest x86 CPUs, 64-byte 
alignment
of functions runs fastest (he tried many other possibilites):
this way, after a call CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of a slam-dunk 64-byte
aligning everything. Too much waste:
On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
>
> Is there some way to get gcc to take the size of the function into
> account? Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up to
9 bytes (not inclusive)". IOW: "after a call insn, CPU will always be able
to fetch at least 9 bytes of insns".

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, falign-functions=10 was emitting this before every function:
.p2align 4,,9
.p2align 3
This tweak was removed by the previous patch. Now it is reinstated
by the logic that if falign-functions=N[,M] is specified and N > 8,
then default value of N2 is 8, not 1. But now this can be suppressed by
falign-functions=N,M,1 - which wasn't possible before.
In general, optional N2,M2 pair can be used to generate any secondary
alignment user wants.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses explicitly given alignment.

2016-09-27  Denys Vlasenko  

* common.opt (-falign-functions=): Accept a string instead of integer.
(-falign-jumps=): Likewise.
(-falign-labels=): Likewise.
(-falign-loops=): Likewise.
* flags.h (struct target_flag_state): Revamp how alignment data is stored:
for each of four alignment types, store two pairs of log/maxskip values.
* toplev.c (read_uint): New function.
(read_log_maxskip): New function.
(parse_N_M): New function.
(init_alignments): Set align_FOO[0/1].log/maxskip from
specified falign-FOO=N[,M[,N[,M]]] options.
* varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
macros, first for N,M and second time for N2,M2 from
falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
* config/aarch64/aarch64-protos.h (struct tune_params):
Change foo_align members from integers to strings.
* config/i386/i386.c (struct ptt): Likewise.
* config/aarch64/aarch64.c (_tunings):
Change foo_align field values from integers to strings.
* config/i386/i386.c (processor_target_table): Likewise.
* config/arm/arm.c (arm_override_options_after_change_1):
Fix if() condition to detect that -falign-functions is specified.
* config/i386/i386.c (ix86_default_align): Likewise.
* common/config/i386/i386-common.c (ix86_handle_option):
Remove conversion of malign_foo's to falign_foo's.
The warning is still emitted.
* doc/invoke.texi: Update option documentation.
* testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common/config/i386/i386-common.c
===
--- gcc/common/config/i386/i386-common.c(revision 239860)
+++ gcc/common/config/i386/i386-common.c(working copy)
@@ -998,36 +998,17 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
-
-  /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-
 case OPT_malign_loops_:
   warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
-  if (value > MAX_CODE_ALIGN)
-   error_at (loc, "-malign-loops=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
-  else
-   opts->x_align_loops = 1 << value;
   return true;
 
 case OPT_malign_jumps_:
   warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
-  if (value > MAX_CODE_ALIGN)
-   error_at (loc, "-malign-jumps=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
-  else
-   opts->x_align_jumps = 1 << value;
   return true;
 
 case OPT_malign_functions_:
   warning_at (loc, 0,
  "-malign-functions is obsolete, use -falign-functions");
-  if (value > MAX_CODE_ALIGN)
-   error_at 

Re: [Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-09-28 Thread Alessandro Fanfarillo
* PING *

2016-09-21 19:03 GMT+01:00 Alessandro Fanfarillo :
> Thanks Andre.
>
> 2016-09-19 9:55 GMT-06:00 Andre Vehreschild :
>> Hi Alessandro,
>
>> The if in resolve.c at 8837: resolve_failed_image (... is intentional? It is
>> doing nothing. So do you plan to add more code, or will there never be
>> anything. If the later I recommend to just put a comment there and remove the
>> empty if.
>
> I added the if statement during the development and I forgot to remove it.
>
>>
>> There still is no test when -fcoarray=single is used. This shouldn't be so
>> hard, should it?
>
> Done.
>
> Built and regtested on x86_64-pc-linux-gnu.
>
>>
>> Regards,
>> Andre
>>
>> On Mon, 19 Sep 2016 08:30:12 -0700
>> Alessandro Fanfarillo  wrote:
>>
>>> * PING *
>>>
>>> On Sep 7, 2016 3:01 PM, "Alessandro Fanfarillo" 
>>> wrote:
>>>
>>> > Dear all,
>>> > the attached patch supports failed images also when -fcoarray=single is
>>> > used.
>>> >
>>> > Built and regtested on x86_64-pc-linux-gnu.
>>> >
>>> > Cheers,
>>> > Alessandro
>>> >
>>> > 2016-08-09 5:22 GMT-06:00 Paul Richard Thomas <
>>> > paul.richard.tho...@gmail.com>:
>>> > > Hi Sandro,
>>> > >
>>> > > As far as I can see, this is OK barring a couple of minor wrinkles and
>>> > > a question:
>>> > >
>>> > > For coarray_failed_images_err.f90 and coarray_image_status_err.f90 you
>>> > > have used the option -fdump-tree-original without making use of the
>>> > > tree dump.
>>> > >
>>> > > Mikael asked you to provide an executable test with -fcoarray=single.
>>> > > Is this not possible for some reason?
>>> > >
>>> > > Otherwise, this is OK for trunk.
>>> > >
>>> > > Thanks for the patch.
>>> > >
>>> > > Paul
>>> > >
>>> > > On 4 August 2016 at 05:07, Alessandro Fanfarillo
>>> > >  wrote:
>>> > >> * PING *
>>> > >>
>>> > >> 2016-07-21 13:05 GMT-06:00 Alessandro Fanfarillo <
>>> > fanfarillo@gmail.com>:
>>> > >>> Dear Mikael and all,
>>> > >>>
>>> > >>> in attachment the new patch, built and regtested on
>>> > x86_64-pc-linux-gnu.
>>> > >>>
>>> > >>> Cheers,
>>> > >>> Alessandro
>>> > >>>
>>> > >>> 2016-07-20 13:17 GMT-06:00 Mikael Morin :
>>> >  Le 20/07/2016 à 11:39, Andre Vehreschild a écrit :
>>> > >
>>> > > Hi Mikael,
>>> > >
>>> > >
>>> > >>> +  if(st == ST_FAIL_IMAGE)
>>> > >>> +new_st.op = EXEC_FAIL_IMAGE;
>>> > >>> +  else
>>> > >>> +gcc_unreachable();
>>> > >>
>>> > >> You can use
>>> > >> gcc_assert (st == ST_FAIL_IMAGE);
>>> > >> foo...;
>>> > >> instead of
>>> > >> if (st == ST_FAIL_IMAGE)
>>> > >> foo...;
>>> > >> else
>>> > >> gcc_unreachable ();
>>> > >
>>> > >
>>> > > Be careful, this is not 100% identical in the general case. For 
>>> > > older
>>> > > gcc version (gcc < 4008) gcc_assert() is mapped to nothing, esp. not
>>> > to
>>> > > an abort(), so the behavior can change. But in this case everything
>>> > is
>>> > > fine, because the patch is most likely not backported.
>>> > >
>>> >  Didn't know about this. The difference seems to be very subtle.
>>> >  I don't mind much anyway. The original version can stay if preferred,
>>> > this
>>> >  was just a suggestion.
>>> > 
>>> >  By the way, if the function is inlined in its single caller, the
>>> > assert or
>>> >  unreachable statement can be removed, which avoids choosing between
>>> > them.
>>> >  That's another suggestion.
>>> > 
>>> > 
>>> > >>> +
>>> > >>> +  return MATCH_YES;
>>> > >>> +
>>> > >>> + syntax:
>>> > >>> +  gfc_syntax_error (st);
>>> > >>> +
>>> > >>> +  return MATCH_ERROR;
>>> > >>> +}
>>> > >>> +
>>> > >>> +match
>>> > >>> +gfc_match_fail_image (void)
>>> > >>> +{
>>> > >>> +  /* if (!gfc_notify_std (GFC_STD_F2008_TS, "FAIL IMAGE statement
>>> > >>> at %C")) */
>>> > >>> +  /*   return MATCH_ERROR; */
>>> > >>> +
>>> > >>
>>> > >> Can this be uncommented?
>>> > >>
>>> > >>> +  return fail_image_statement (ST_FAIL_IMAGE);
>>> > >>> +}
>>> > >>>
>>> > >>>  /* Match LOCK/UNLOCK statement. Syntax:
>>> > >>>   LOCK ( lock-variable [ , lock-stat-list ] )
>>> > >>> diff --git a/gcc/fortran/trans-intrinsic.c
>>> > >>> b/gcc/fortran/trans-intrinsic.c index 1aaf4e2..b2f5596 100644
>>> > >>> --- a/gcc/fortran/trans-intrinsic.c
>>> > >>> +++ b/gcc/fortran/trans-intrinsic.c
>>> > >>> @@ -1647,6 +1647,24 @@ trans_this_image (gfc_se * se, gfc_expr
>>> > >>> *expr) m, lbound));
>>> > >>>  }
>>> > >>>
>>> > >>> +static void
>>> > >>> +gfc_conv_intrinsic_image_status (gfc_se *se, gfc_expr *expr)
>>> > >>> +{
>>> > >>> +  unsigned int num_args;
>>> > >>> +  tree *args,tmp;
>>> > >>> +
>>> > >>> +  num_args = gfc_intrinsic_argument_list_length (expr);
>>> > >>> +  args = XALLOCAVEC (tree, num_args);
>>> > >>> +
>>> > >>>

RE: [PATCH] [ARC] Add simple shift/rotate ops.

2016-09-28 Thread Claudiu Zissulescu
> 
> This looks good to me.
> 
> Thanks,
> Andrew
> 
Committed r240576.

Thank you for your review,
Claudiu



[PATCH] Fix PR77407

2016-09-28 Thread Richard Biener

The following patch implements patterns to catch x / abs (x)
and x / -x, taking advantage of undefinedness at x == 0 as
opposed to the PR having testcases with explicit != 0 checks.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-09-28  Richard Biener  

PR middle-end/77407
* match.pd: Add X / abs (X) -> X < 0 ? -1 : 1 and
X / -X -> -1 simplifications.

* gcc.dg/pr77407.c: New testcase.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 240565)
+++ gcc/match.pd(working copy)
@@ -147,12 +147,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (op @0 integer_onep)
 (non_lvalue @0)))
 
-/* X / -1 is -X.  */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
+  /* X / -1 is -X.  */
  (simplify
(div @0 integer_minus_onep@1)
(if (!TYPE_UNSIGNED (type))
-(negate @0
+(negate @0)))
+ /* X / abs (X) is X < 0 ? -1 : 1.  */ 
+ (simplify
+   (div @0 (abs @0))
+   (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+   && TYPE_OVERFLOW_UNDEFINED (type))
+(cond (lt @0 { build_zero_cst (type); })
+  { build_minus_one_cst (type); } { build_one_cst (type); })))
+ /* X / -X is -1.  */
+ (simplify
+   (div @0 (negate @0))
+   (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+   && TYPE_OVERFLOW_UNDEFINED (type))
+{ build_minus_one_cst (type); })))
 
 /* For unsigned integral types, FLOOR_DIV_EXPR is the same as
TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  */
Index: gcc/testsuite/gcc.dg/pr77407.c
===
--- gcc/testsuite/gcc.dg/pr77407.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr77407.c  (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-overflow -fdump-tree-gimple" } */
+
+int foo (int c)
+{
+  if (c != 0)
+c /= __builtin_abs (c);
+  return c;
+}
+
+int bar (int c)
+{
+  if (c != 0)
+c /= -c;
+  return c;
+}
+
+/* { dg-final { scan-tree-dump-times "/" 0 "gimple" } } */


[PATCH] Fix (part of) PR77399

2016-09-28 Thread Richard Biener

This fixes the original request in PR77399, better handling of

typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
v4sf vec_cast(v4si f)
{
  return (v4sf){f[0], f[1], f[2], f[3]};
}

which nicely fits into the existing simplify_vector_constructor code.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-09-28  Richard Biener  

PR tree-optimization/77399
* tree-ssa-forwprop.c (simplify_vector_constructor): Handle
float <-> int conversions.

* gcc.dg/tree-ssa/forwprop-35.c: New testcase.

Index: gcc/tree-ssa-forwprop.c
===
*** gcc/tree-ssa-forwprop.c (revision 240565)
--- gcc/tree-ssa-forwprop.c (working copy)
*** simplify_vector_constructor (gimple_stmt
*** 1953,1959 
gimple *def_stmt;
tree op, op2, orig, type, elem_type;
unsigned elem_size, nelts, i;
!   enum tree_code code;
constructor_elt *elt;
unsigned char *sel;
bool maybe_ident;
--- 1953,1959 
gimple *def_stmt;
tree op, op2, orig, type, elem_type;
unsigned elem_size, nelts, i;
!   enum tree_code code, conv_code;
constructor_elt *elt;
unsigned char *sel;
bool maybe_ident;
*** simplify_vector_constructor (gimple_stmt
*** 1970,1975 
--- 1970,1976 
  
sel = XALLOCAVEC (unsigned char, nelts);
orig = NULL;
+   conv_code = ERROR_MARK;
maybe_ident = true;
FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
  {
*** simplify_vector_constructor (gimple_stmt
*** 1984,1989 
--- 1985,2008 
if (!def_stmt)
return false;
code = gimple_assign_rhs_code (def_stmt);
+   if (code == FLOAT_EXPR
+ || code == FIX_TRUNC_EXPR)
+   {
+ op1 = gimple_assign_rhs1 (def_stmt);
+ if (conv_code == ERROR_MARK)
+   {
+ if (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (elt->value)))
+ != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1
+   return false;
+ conv_code = code;
+   }
+ else if (conv_code != code)
+   return false;
+ if (TREE_CODE (op1) != SSA_NAME)
+   return false;
+ def_stmt = SSA_NAME_DEF_STMT (op1);
+ code = gimple_assign_rhs_code (def_stmt);
+   }
if (code != BIT_FIELD_REF)
return false;
op1 = gimple_assign_rhs1 (def_stmt);
*** simplify_vector_constructor (gimple_stmt
*** 1997,2003 
{
  if (TREE_CODE (ref) != SSA_NAME)
return false;
! if (!useless_type_conversion_p (type, TREE_TYPE (ref)))
return false;
  orig = ref;
}
--- 2016,2024 
{
  if (TREE_CODE (ref) != SSA_NAME)
return false;
! if (! VECTOR_TYPE_P (TREE_TYPE (ref))
! || ! useless_type_conversion_p (TREE_TYPE (op1),
! TREE_TYPE (TREE_TYPE (ref
return false;
  orig = ref;
}
*** simplify_vector_constructor (gimple_stmt
*** 2010,2016 
  return false;
  
if (maybe_ident)
! gimple_assign_set_rhs_from_tree (gsi, orig);
else
  {
tree mask_type, *mask_elts;
--- 2031,2043 
  return false;
  
if (maybe_ident)
! {
!   if (conv_code == ERROR_MARK)
!   gimple_assign_set_rhs_from_tree (gsi, orig);
!   else
!   gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
!   NULL_TREE, NULL_TREE);
! }
else
  {
tree mask_type, *mask_elts;
*** simplify_vector_constructor (gimple_stmt
*** 2028,2034 
for (i = 0; i < nelts; i++)
mask_elts[i] = build_int_cst (TREE_TYPE (mask_type), sel[i]);
op2 = build_vector (mask_type, mask_elts);
!   gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
  }
update_stmt (gsi_stmt (*gsi));
return true;
--- 2055,2072 
for (i = 0; i < nelts; i++)
mask_elts[i] = build_int_cst (TREE_TYPE (mask_type), sel[i]);
op2 = build_vector (mask_type, mask_elts);
!   if (conv_code == ERROR_MARK)
!   gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
!   else
!   {
! gimple *perm
!   = gimple_build_assign (make_ssa_name (TREE_TYPE (orig)),
!  VEC_PERM_EXPR, orig, orig, op2);
! orig = gimple_assign_lhs (perm);
! gsi_insert_before (gsi, perm, GSI_SAME_STMT);
! gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
! NULL_TREE, NULL_TREE);
!   }
  }
update_stmt (gsi_stmt (*gsi));
return true;
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-35.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/forw

[PATCH] Fix PR55152

2016-09-28 Thread Richard Biener

The following adds a pattern to optimize MAX (a, -a) to abs (a).

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-09-28  Richard Biener  

PR middle-end/55152
* match.pd: Add max(a,-a) -> abs(a) pattern.

* gcc.dg/pr55152.c: New testcase.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 240565)
+++ gcc/match.pd(working copy)
@@ -1271,6 +1284,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (max:c (min:c @0 @1) @1)
  @1)
+/* max(a,-a) -> abs(a).  */
+(simplify
+ (max:c @0 (negate @0))
+ (if (TREE_CODE (type) != COMPLEX_TYPE
+  && (! ANY_INTEGRAL_TYPE_P (type)
+ || TYPE_OVERFLOW_UNDEFINED (type)))
+  (abs @0)))
 (simplify
  (min @0 @1)
  (switch
Index: gcc/testsuite/gcc.dg/pr55152.c
===
--- gcc/testsuite/gcc.dg/pr55152.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr55152.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ffinite-math-only -fstrict-overflow 
-fdump-tree-optimized" } */
+
+double g (double a)
+{
+  return (a>=-a)?a:-a;
+}
+int f(int a)
+{
+  return (a>=-a)?a:-a;
+}
+
+/* { dg-final { scan-tree-dump-times "ABS_EXPR" 2 "optimized" } } */


Re: [PATCH] Bits from Early LTO debug merge -- move stuff from late to early finish

2016-09-28 Thread Jason Merrill

OK.


RE: [PATCH] [ARC] New CPU C-define handler.

2016-09-28 Thread Claudiu Zissulescu
 
> This looks like a good improvement to me.
> 
> Thanks,
> Andrew
> 
Committed r240577

Thank you for your review,
Claudiu


Re: [C++ PATCH] Fix constexpr switch handling (PR c++/77467, take 2)

2016-09-28 Thread Jason Merrill

OK, thanks.

Jason


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Eli Zaretskii
> From: DJ Delorie 
> Cc: f...@deneb.enyo.de, gcc-patches@gcc.gnu.org, seze...@gmail.com
> Date: Tue, 27 Sep 2016 15:23:46 -0400
> 
> > Why would it need to
> > change?  It's perfectly okay to link GPL code with LGPL code, we do
> > this all the time with libgcc, no?  Or am I missing something?
> 
> libgcc has an exception that covers most of those cases; be careful when
> comparing those to your (his) use case.

OK, then take Gnulib as a better example.  If my arithmetics is
correct, about 70% of its files is LGPL, the rest GPL.  Which doesn't
keep many GNU projects under GPL from using Gnulib.


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-09-28 Thread Jason Merrill
On Tue, Sep 27, 2016 at 11:10 AM, Bernd Edlinger
 wrote:
> On 09/27/16 16:42, Jason Merrill wrote:
>> On Tue, Sep 27, 2016 at 10:28 AM, Bernd Edlinger
>>  wrote:
>>> On 09/27/16 16:10, Florian Weimer wrote:
 * Bernd Edlinger:

>> “0 << 0” is used in a similar context, to create a zero constant for a
>> multi-bit subfield of an integer.
>>
>> This example comes from GDB, in bfd/elf64-alpha.c:
>>
>> |   insn = INSN_ADDQ | (16 << 21) | (0 << 16) | (0 << 0);
>>
>
> Of course that is not a boolean context, and will not get a warning.
>
> Question is if "if (1 << 0)" is possibly a miss-spelled "if (1 < 0)".
>
> Maybe 1 and 0 come from macro expansion

 But what's the intent of treating 1 << 0 and 0 << 0 differently in the
 patch, then?
>>>
>>> I am not sure if it was a good idea.
>>>
>>> I saw, we had code of the form
>>> bool flag = 1 << 2;
>>>
>>> another value LOOKUP_PROTECT is  1 << 0, and
>>> bool flag = 1 << 0;
>>>
>>> would at least not overflow the allowed value range of a boolean.
>>
>> Assigning a bit mask to a bool variable is still probably not what was
>> intended, even if it doesn't change the value.
>
> That works for me too.
> I can simply remove that exception.

Sounds good.

Jason


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Eli Zaretskii
> From: DJ Delorie 
> Cc: gcc-patches@gcc.gnu.org, e...@gnu.org, f...@deneb.enyo.de
> Date: Tue, 27 Sep 2016 15:45:28 -0400
> 
> 
> I wonder if us relicensing our modified copy would apply to your old
> copy.  I mean, are we sure RMS knows you're also relicensing an old
> copy, and that the current copy is being relicensed only to avoid future
> issues.  If we're only doing it to document the decision, the fact that
> hashtab.h and filename_cmp.c are still GPL mostly negates the
> effectiveness of our change anyway.
> 
> (i.e. it seems like you can get what you need whether we relicense ours
> or not, and relicensing ours doesn't have much actual effect).

I see no reason why setting the record straight about license
compatibility should be an issue for us.  Better late than never,
right?


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Florian Weimer
* Eli Zaretskii:

> If my arithmetics is correct, about 70% of its files is LGPL, the
> rest GPL.  Which doesn't keep many GNU projects under GPL from using
> Gnulib.

Sorry, I don't understand.  Surely anything released under the LGPL by
the FSF can be upgraded to the current GPLv3?  First upgrade to the
latest LGPL, then switch over to the GPLv3?

(I assume that the FSF releases their works under the “any later
version” regime.)


Re: [PATCH v3] Optimize strchr to strlen

2016-09-28 Thread Jason Merrill
I think this broke g++.dg/ext/builtin10.C.

Jason


[gomp4] more tile parsing

2016-09-28 Thread Nathan Sidwell
In my recent patch for tile clauses, I noticed some tile tests didn't appear to 
fail, even though the loops they tiled were ill formed.  I figured I'd find out 
why in the fullness of time.


Didn't take long for time to be full.  The omp_for parsing routines had no 
knowledge of tile, so did not checking.  This patch rectifies that.


I did notice a couple of nits in the c++ parser, also fixed here
1) we'd complain about a missing inner loop twice, which is a bit repetative. 
Fixed by only emitting the missing for token error, if we didn't already emit an 
error.


2) 'do ... while (1)' is an odd way to write a non-terminating loop.  Changed to 
'for (;;) ...'  I guess the loop condition was originally something other than '1'.


applied to gomp4

nathan


Re: [gomp4] more tile parsing

2016-09-28 Thread Nathan Sidwell

ENOPATCH
2016-09-28  Nathan Sidwell  

	c/
	* c-parser.c (c_parser_omp_for_tiling): Accept tiling constructs.

	cp/
	* parser.c (cp_parser_omp_for_loop): Deal with tile clause.  Don't
	emit a parse error about missing for after already emitting
	one.  Use more conventional for idiom for unbounded loop.
	* semantics.c (finish_omp_for): Deal with tile clause.

	testsuite/
	* c-c++-common/goacc/tile-2.c: New.
	* c-c++-common/goacc/tile.c: Fix.
	
Index: c/c-parser.c
===
--- c/c-parser.c	(revision 240524)
+++ c/c-parser.c	(working copy)
@@ -14920,11 +14920,17 @@ c_parser_omp_for_loop (location_t loc, c
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   location_t for_loc;
+  bool tiling = false;
   vec *for_block = make_tree_vector ();
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+  {
+	tiling = true;
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+  }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 	 && OMP_CLAUSE_ORDERED_EXPR (cl))
   {
@@ -14954,7 +14960,7 @@ c_parser_omp_for_loop (location_t loc, c
 	  pc = &OMP_CLAUSE_CHAIN (*pc);
 }
 
-  gcc_assert (collapse >= 1 && ordered >= 0);
+  gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;
 
   declv = make_tree_vec (count);
Index: cp/parser.c
===
--- cp/parser.c	(revision 240524)
+++ cp/parser.c	(working copy)
@@ -33660,10 +33660,16 @@ cp_parser_omp_for_loop (cp_parser *parse
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   vec *for_block = make_tree_vector ();
   auto_vec orig_inits;
+  bool tiling = false;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+  {
+	tiling = true;
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+  }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 	 && OMP_CLAUSE_ORDERED_EXPR (cl))
   {
@@ -33693,7 +33699,7 @@ cp_parser_omp_for_loop (cp_parser *parse
 	  pc = &OMP_CLAUSE_CHAIN (*pc);
 }
 
-  gcc_assert (collapse >= 1 && ordered >= 0);
+  gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;
 
   declv = make_tree_vec (count);
@@ -33712,13 +33718,15 @@ cp_parser_omp_for_loop (cp_parser *parse
   if (code != CILK_FOR
 	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
-	  cp_parser_error (parser, "for statement expected");
+	  if (!collapse_err)
+	cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
   if (code == CILK_FOR
 	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
 	{
-	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  if (!collapse_err)
+	cp_parser_error (parser, "_Cilk_for statement expected");
 	  return NULL;
 	}
   loc = cp_lexer_consume_token (parser->lexer)->location;
@@ -33878,7 +33886,7 @@ cp_parser_omp_for_loop (cp_parser *parse
 	 nested.  Hopefully the final version clarifies this.
 	 For now handle (multiple) {'s and empty statements.  */
   cp_parser_parse_tentatively (parser);
-  do
+  for (;;)
 	{
 	  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	break;
@@ -33893,14 +33901,13 @@ cp_parser_omp_for_loop (cp_parser *parse
 	  else
 	{
 	  loc = cp_lexer_peek_token (parser->lexer)->location;
-	  error_at (loc, "not enough collapsed for loops");
+	  error_at (loc, "not enough for loops to collapse");
 	  collapse_err = true;
 	  cp_parser_abort_tentative_parse (parser);
 	  declv = NULL_TREE;
 	  break;
 	}
 	}
-  while (1);
 
   if (declv)
 	{
Index: cp/semantics.c
===
--- cp/semantics.c	(revision 240524)
+++ cp/semantics.c	(working copy)
@@ -7986,11 +7986,19 @@ finish_omp_for (location_t locus, enum t
   gcc_assert (TREE_VEC_LENGTH (declv) == TREE_VEC_LENGTH (incrv));
   if (TREE_VEC_LENGTH (declv) > 1)
 {
-  tree c = find_omp_clause (clauses, OMP_CLAUSE_COLLAPSE);
+  tree c;
+
+  c = find_omp_clause (clauses, OMP_CLAUSE_TILE);
   if (c)
-	collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
-  if (collapse != TREE_VEC_LENGTH (declv))
-	ordered = TREE_VEC_LENGTH (declv);
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (c));
+  else
+	{
+	  c = find_omp_clause (clauses, OMP_CLAUSE_COLLAPSE);
+	  if (c)
+	collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
+	  if (collapse != TREE_VEC_LENGTH (declv))
+	ordered = TREE_VEC_LENGTH (declv)

Re: Implement -Wimplicit-fallthrough (version 9)

2016-09-28 Thread Tom Tromey
> "Michael" == Michael Matz  writes:

Michael> Not accepting
Michael>   /* And here we intentionally fall through because ... */
Michael> and forcing users to replace this by:
Michael>   /* fallthrough */
Michael> is not robust either.  It's actually actively lowering robustness of 
code, 
Michael> it creates work for programmers that will be regarded as pointless 
(and 
Michael> rightly so) and will merely lead to everybody disabling the warning 
(see 
Michael> our generated files)

We can't control what programmers might do.  My point is that accepting
too much is actively bad -- it hides errors.  If this somehow makes some
programmer fall down a slippery slope, well, that's their error, not
gcc's.

TBH I think it would be better not to parse comments at all.  Heuristics
are generally bad and this case and ensuing discussion is a great
demonstration of that.

The other day I built gdb with -Wimplicit-fallthrough.  I was surprised
to find that gcc rejected this:

default:
  {
complaint (&symfile_complaints,
   _("Storage class %d not recognized during scan"),
   sclass);
  }
  /* FALLTHROUGH */

  /* C_FCN is .bf and .ef symbols.  I think it is sufficient
 to handle only the C_FUN and C_EXT.  */
case C_FCN:

Presumably without the comment heuristic, this would be accepted.

Michael> The point of warnings is to make code robust under the condition of 
not 
Michael> being a pain by giving zillions of false positives.

My experience so far is that it's not so bad.  gdb actually had comments
in most spots, they just required a quick pass to clean them up:

https://sourceware.org/ml/gdb-patches/2016-09/msg00375.html

And, code bases in more dire straights can just disable the warning after all.

Tom


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Richard Kenner
> Sorry, I don't understand.  Surely anything released under the LGPL by
> the FSF can be upgraded to the current GPLv3?  First upgrade to the
> latest LGPL, then switch over to the GPLv3?

That seems correct to me.


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Eli Zaretskii
> From: Florian Weimer 
> Cc: DJ Delorie ,  gcc-patches@gcc.gnu.org,  seze...@gmail.com
> Date: Wed, 28 Sep 2016 16:43:53 +0200
> 
> * Eli Zaretskii:
> 
> > If my arithmetics is correct, about 70% of its files is LGPL, the
> > rest GPL.  Which doesn't keep many GNU projects under GPL from using
> > Gnulib.
> 
> Sorry, I don't understand.  Surely anything released under the LGPL by
> the FSF can be upgraded to the current GPLv3?  First upgrade to the
> latest LGPL, then switch over to the GPLv3?
> 
> (I assume that the FSF releases their works under the “any later
> version” regime.)

The above was in response to DJ's questions up-thread:

> > Because Ozkan wants to use it in an otherwise LGPL package.
> 
> But then the implementation would need relicensing as well, wouldn't
> it?
> 
> Having both under different licenses is just confusing.

Did I misunderstand the question?


Re: [PATCH v3] Optimize strchr to strlen

2016-09-28 Thread Wilco Dijkstra
Jason Merrill wrote:
> I think this broke g++.dg/ext/builtin10.C.

That's odd. It appears if you add a fold in gimple-fold.c, it no longer calls 
the
folding code in builtins.c. No idea what the idea behind that is (especially 
since
there are other builtins that appear in both files), but this simple patch 
fixes it:

If strchr can't be folded in gimple-fold.c, break so folding code in builtins.c 
is
also called.

OK for commit?

2016-09-28  Wilco Dijkstra  

* gimple-fold.c (gimple_fold_builtin): After failing to fold
strchr, also try the generic folding.
--
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 
ddf4cf0ae68ef6708377fdb1a2b45575d90da799..b6802e81fd1a7fd0b309cb9aa0f984f7bacb6596
 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2948,7 +2948,10 @@ gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case BUILT_IN_STRNCAT:
   return gimple_fold_builtin_strncat (gsi);
 case BUILT_IN_STRCHR:
-  return gimple_fold_builtin_strchr (gsi);
+  if (gimple_fold_builtin_strchr (gsi))
+   return true;
+  /* Perform additional folding in builtin.c.  */
+  break;
 case BUILT_IN_FPUTS:
   return gimple_fold_builtin_fputs (gsi, gimple_call_arg (stmt, 0),
gimple_call_arg (stmt, 1), false);



Re: [PATCH] DWARF: remove pessimistic DWARF version checks for imported entities

2016-09-28 Thread Jason Merrill
On Thu, Aug 18, 2016 at 5:33 AM, Pierre-Marie de Rodat
 wrote:
> A check in dwarf2out_imported_module_or_decl prevents valid strict
> DWARF2 constructs such as DW_TAG_imported_declaration from being emitted
> in dwarf2out_imported_module_or_decl_1.
>
> The latter already protects the emission of newer DWARF tags with
> appropriate checks, so the one in the former is redundant and
> pessimistic.  This function is already called from places like
> process_scope_var, which are not protected anyway.
>
> This patch removes the check in dwarf2out_imported_module_or_decl so
> that tags like DW_TAG_imported_declaration are emitted even in strict
> DWARF2 mode.
>
> Bootstrapped and regtested on x86_64-linux, no regression.

This check was added for bug 41405, a Darwin bootstrap problem.
Dominique, can you verify that the compiler still bootstraps with this
change?  OK if so.

Jason


Re: [PATCH v3] Optimize strchr to strlen

2016-09-28 Thread Jason Merrill
OK.

On Wed, Sep 28, 2016 at 11:43 AM, Wilco Dijkstra  wrote:
> Jason Merrill wrote:
>> I think this broke g++.dg/ext/builtin10.C.
>
> That's odd. It appears if you add a fold in gimple-fold.c, it no longer calls 
> the
> folding code in builtins.c. No idea what the idea behind that is (especially 
> since
> there are other builtins that appear in both files), but this simple patch 
> fixes it:
>
> If strchr can't be folded in gimple-fold.c, break so folding code in 
> builtins.c is
> also called.
>
> OK for commit?
>
> 2016-09-28  Wilco Dijkstra  
>
> * gimple-fold.c (gimple_fold_builtin): After failing to fold
> strchr, also try the generic folding.
> --
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 
> ddf4cf0ae68ef6708377fdb1a2b45575d90da799..b6802e81fd1a7fd0b309cb9aa0f984f7bacb6596
>  100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -2948,7 +2948,10 @@ gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case BUILT_IN_STRNCAT:
>return gimple_fold_builtin_strncat (gsi);
>  case BUILT_IN_STRCHR:
> -  return gimple_fold_builtin_strchr (gsi);
> +  if (gimple_fold_builtin_strchr (gsi))
> +   return true;
> +  /* Perform additional folding in builtin.c.  */
> +  break;
>  case BUILT_IN_FPUTS:
>return gimple_fold_builtin_fputs (gsi, gimple_call_arg (stmt, 0),
> gimple_call_arg (stmt, 1), false);
>


Re: [PATCH] DWARF: remove pessimistic DWARF version checks for imported entities

2016-09-28 Thread Dominique d'Humières

> Le 28 sept. 2016 à 17:44, Jason Merrill  a écrit :
> 
> On Thu, Aug 18, 2016 at 5:33 AM, Pierre-Marie de Rodat
>  wrote:
>> A check in dwarf2out_imported_module_or_decl prevents valid strict
>> DWARF2 constructs such as DW_TAG_imported_declaration from being emitted
>> in dwarf2out_imported_module_or_decl_1.
>> 
>> The latter already protects the emission of newer DWARF tags with
>> appropriate checks, so the one in the former is redundant and
>> pessimistic.  This function is already called from places like
>> process_scope_var, which are not protected anyway.
>> 
>> This patch removes the check in dwarf2out_imported_module_or_decl so
>> that tags like DW_TAG_imported_declaration are emitted even in strict
>> DWARF2 mode.
>> 
>> Bootstrapped and regtested on x86_64-linux, no regression.
> 
> This check was added for bug 41405, a Darwin bootstrap problem.
> Dominique, can you verify that the compiler still bootstraps with this
> change?  OK if so.

OK, but not before tomorrow afternoon.

Dominique

> Jason



Re: [PATCH] Avoid store forwarding issue in vectorizing strided SLP loads

2016-09-28 Thread Jeff Law

On 09/28/2016 05:41 AM, Richard Biener wrote:


Currently strided SLP vectorization creates vector constructors composed
of vector elements.  This is a constructor form that is not handled
specially by the expander but it gets expanded via piecewise stores
to scratch memory and a load of that scratch memory.

Ugh.  Yup, obviously bad, even without store forwarding.



does not work on any CPU I know of).  The following patch simply avoids
the issue by making the vectorizer create integer loads, composing
a vector of that integers and then punning that to the desired vector
type.  Thus (V4SF){V2SF, V2SF} becomes (V4SF)(V2DI){DI, DI} and
every body is happy.  Especially x264 gets a 5-10% improvement
(dependent on vector size and x86 sub-architecture).
Seems reasonable to me -- there's not a lot of difference (conceptually) 
to how we've used SImode constants to construct DFmode constants in the 
past.




Handling the vector-vector constructors on the expander side would
require either similar punning or making vec_init parametric on
the element mode plus supporting vector elements in all targets
(which in the end probably will simply pun them similarly).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2016-09-28  Richard Biener  

* tree-vect-stmts.c (vectorizable_load): Avoid emitting vector
constructors with vector elements.

Seems quite reasonable.

jeff



[PATCH PR77718]

2016-09-28 Thread Aaron Sawdey
This patch that Bernd put in PR77718 seems to be fine. Bootstrap and
regtest done on powerpc64le, no new failures. Ok for trunk?

2016-09-28  Bernd Schmidt  

* builtins.c (expand_builtin_memcmp): don't swap args unless
result is only being compared with zero.

Index: builtins.c
===
--- builtins.c  (revision 240511)
+++ builtins.c  (working copy)
@@ -3707,11 +3707,13 @@ expand_builtin_memcmp (tree exp, rtx tar
 
   by_pieces_constfn constfn = NULL;
 
-  const char *src_str = c_getstr (arg1);
-  if (src_str == NULL)
-src_str = c_getstr (arg2);
-  else
-std::swap (arg1_rtx, arg2_rtx);
+  const char *src_str = c_getstr (arg2);
+  if (result_eq && src_str == NULL)
+{
+  src_str = c_getstr (arg1);
+  if (src_str != NULL)
+   std::swap (arg1_rtx, arg2_rtx);
+}
 
   /* If SRC is a string constant and block move would be done
  by pieces, we can avoid loading the string from memory

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[PATCH][v4] GIMPLE store merging pass

2016-09-28 Thread Kyrill Tkachov

Hi all,

This is v4 of the pass. It addresses feedback by Bernhard, including typo fixes 
and
skipping of debug statements.
Also, I've extended it to handle the case from PR 23684 and included that 
testcase
in the patch.  Merging now triggers more often.
I've also added purging of dead EH edges that was missing from the previous 
versions.

Bootstrapped and tested on aarch64-none-linux-gnu, x86_64-unknown-linux-gnu, 
arm-none-linux-gnueabihf.
Also tested on aarch64 big-endian.

I saw no regressions on my x86_64 machine on SPEC2006. I think the changes in 
individual benchmarks were
in the noise though I think the x86_64 expanders could be improved to split 
expensive movabsq instructions
into two movl ones (I think).

Bill, could you or someone else with access to Power benchmarking try this 
patch out on some benchmarks
that you usually use? The new pass in this patch is on by default and can be 
turned off by -fno-store-merging
if needed.  Jakub indicated that his last attempt at this work caused 
regressions on powerpc so I'd like to
see if this patch is okay in that regard.

Thanks,
Kyrill

2016-09-28  Kyrylo Tkachov  

PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.

2016-09-28  Kyrylo Tkachov  
Jakub Jelinek  
Andrew Pinski  

PR middle-end/22141
PR rtl-optimization/23684
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.target/aarch64/ldp_stp_4.c: Likewise.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.dg/store_merging_7.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.
* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
commit 74e3e742e446f6a892c8ee755f640756efc816cd
Author: Kyrylo Tkachov 
Date:   Tue Jul 12 12:30:47 2016 +0100

GIMPLE store merging pass

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9e08fbf..d83b2f1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1300,6 +1300,7 @@ OBJS = \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
+	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-ssa-sprintf.o \
 	gimple-streamer-in.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 168735e..4d73852 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1460,6 +1460,10 @@ fstrict-volatile-bitfields
 Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
 Force bitfield accesses to match their type width.
 
+fstore-merging
+Common Var(flag_store_merging) Optimization
+Use the tree store merging pass.
+
 fguess-branch-probability
 Common Report Var(flag_guess_branch_prob) Optimization
 Enable guessing of branch probabilities.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6767462..83acd3b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -403,7 +403,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
--fstdarg-opt -fstrict-aliasing @gol
+-fstdarg-opt -fstore-merging -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
@@ -414,8 +414,8 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp -funconstrained-commons @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
+-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -6593,6 +6593,7 @@ compilation time.
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
+-fstore-merging @gol
 -ftree-bit-ccp @gol
 -ftree-ccp @gol
 -ftree-ch @gol
@@ -7927,6 +7928,13 @@ Perform scalar replacement of aggregates.  This pass replaces structure
 reference

Re: Change license of filenames.h to LGPL

2016-09-28 Thread DJ Delorie
Florian Weimer  writes:
> Sorry, I don't understand.  Surely anything released under the LGPL by
> the FSF can be upgraded to the current GPLv3?  First upgrade to the
> latest LGPL, then switch over to the GPLv3?
>
> (I assume that the FSF releases their works under the “any later
> version” regime.)

That's not what that means.  The license terms cannot be changed, and
remain "version X or later", even if the user chooses to apply the terms
of some later version.

The "or later" allows the users alternatives for when the FSF fixes a
"license bug" in a newer version; it avoids needing to update all the
licenses.  It also future-proofs the older code, ensuring it's
license-compatible with newer code.


Re: [PATCH][v4] GIMPLE store merging pass

2016-09-28 Thread Bill Schmidt
On Sep 28, 2016, at 10:54 AM, Kyrill Tkachov  
wrote:
> 
> Hi all,
> 
> This is v4 of the pass. It addresses feedback by Bernhard, including typo 
> fixes and
> skipping of debug statements.
> Also, I've extended it to handle the case from PR 23684 and included that 
> testcase
> in the patch.  Merging now triggers more often.
> I've also added purging of dead EH edges that was missing from the previous 
> versions.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu, x86_64-unknown-linux-gnu, 
> arm-none-linux-gnueabihf.
> Also tested on aarch64 big-endian.
> 
> I saw no regressions on my x86_64 machine on SPEC2006. I think the changes in 
> individual benchmarks were
> in the noise though I think the x86_64 expanders could be improved to split 
> expensive movabsq instructions
> into two movl ones (I think).
> 
> Bill, could you or someone else with access to Power benchmarking try this 
> patch out on some benchmarks
> that you usually use? The new pass in this patch is on by default and can be 
> turned off by -fno-store-merging
> if needed.  Jakub indicated that his last attempt at this work caused 
> regressions on powerpc so I'd like to
> see if this patch is okay in that regard.

Hi Kyrill, 

Thanks for the heads-up!  I will have someone on my team look at this as soon 
as possible.

Much obliged,

Bill

> 
> Thanks,
> Kyrill
> 
> 2016-09-28  Kyrylo Tkachov  
> 
>PR middle-end/22141
>* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>* common.opt (fstore-merging): New Optimization option.
>* opts.c (default_options_table): Add entry for
>OPT_ftree_store_merging.
>* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>* passes.def: Insert pass_tree_store_merging.
>* tree-pass.h (make_pass_store_merging): Declare extern
>prototype.
>* gimple-ssa-store-merging.c: New file.
>* doc/invoke.texi (Optimization Options): Document
>-fstore-merging.
> 
> 2016-09-28  Kyrylo Tkachov  
>Jakub Jelinek  
>Andrew Pinski  
> 
>PR middle-end/22141
>PR rtl-optimization/23684
>* gcc.c-torture/execute/pr22141-1.c: New test.
>* gcc.c-torture/execute/pr22141-2.c: Likewise.
>* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>* gcc.target/aarch64/ldp_stp_4.c: Likewise.
>* gcc.dg/store_merging_1.c: New test.
>* gcc.dg/store_merging_2.c: Likewise.
>* gcc.dg/store_merging_3.c: Likewise.
>* gcc.dg/store_merging_4.c: Likewise.
>* gcc.dg/store_merging_5.c: Likewise.
>* gcc.dg/store_merging_6.c: Likewise.
>* gcc.dg/store_merging_7.c: Likewise.
>* gcc.target/i386/pr22141.c: Likewise.
>* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
> 



Re: [PATCH][v4] GIMPLE store merging pass

2016-09-28 Thread Kyrill Tkachov


On 28/09/16 16:59, Bill Schmidt wrote:

On Sep 28, 2016, at 10:54 AM, Kyrill Tkachov  
wrote:

Hi all,

This is v4 of the pass. It addresses feedback by Bernhard, including typo fixes 
and
skipping of debug statements.
Also, I've extended it to handle the case from PR 23684 and included that 
testcase
in the patch.  Merging now triggers more often.
I've also added purging of dead EH edges that was missing from the previous 
versions.

Bootstrapped and tested on aarch64-none-linux-gnu, x86_64-unknown-linux-gnu, 
arm-none-linux-gnueabihf.
Also tested on aarch64 big-endian.

I saw no regressions on my x86_64 machine on SPEC2006. I think the changes in 
individual benchmarks were
in the noise though I think the x86_64 expanders could be improved to split 
expensive movabsq instructions
into two movl ones (I think).

Bill, could you or someone else with access to Power benchmarking try this 
patch out on some benchmarks
that you usually use? The new pass in this patch is on by default and can be 
turned off by -fno-store-merging
if needed.  Jakub indicated that his last attempt at this work caused 
regressions on powerpc so I'd like to
see if this patch is okay in that regard.

Hi Kyrill,

Thanks for the heads-up!  I will have someone on my team look at this as soon 
as possible.


Thanks!

Kyrill


Much obliged,

Bill


Thanks,
Kyrill

2016-09-28  Kyrylo Tkachov  

PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.

2016-09-28  Kyrylo Tkachov  
Jakub Jelinek  
Andrew Pinski  

PR middle-end/22141
PR rtl-optimization/23684
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.target/aarch64/ldp_stp_4.c: Likewise.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.dg/store_merging_7.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.
* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.





Re: [PATCH][RTL ifcvt] Transform (X == CST) ? -CST : Y into (X == CST) ? -X : Y when conditional negation is available

2016-09-28 Thread Jeff Law

On 09/28/2016 02:48 AM, Kyrill Tkachov wrote:

Hi all,

This patch tries to avoid materialising an immediate
when comparison has shown that it is already loaded.
This is performed during ifcvt and only when the target has the
conditional negate
or inverse optabs, which for now is only aarch64.

Thus for the code:
int
foo (int a, int b)
{
  return a == 5 ? -5 : b;
}

we currently emit on aarch64:
foo:
mov w2, -5
cmp w0, 5
cselw0, w1, w2, ne
ret

but with this patch we emit:
foo:
cmp w0, 5
csneg   w0, w1, w0, ne
ret

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

P.S. The simpler form of the transformation: X == CST ? CST : Y --> X ==
CST ? X : Y
could also be beneficial IMO but I found that it can cause bad codegen
in combine
due to extending the live range of X. I'm still working on a way to work
around that,
but that is a separate piece of work anyway.
FWIW, the simpler form of the transformation is already done just prior 
to leaving SSA form in tree-ssa-uncprop.c.  So there may not be a ton of 
opportunities for the simpler form in the RTL optimizers.


My only concern here is this transformation isn't significantly 
different than the simpler form, which is apparently causing some kind 
of combine problem.


I'd like to understand the combine problem better before giving final 
approval to this patch.



Jeff


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-09-28 Thread Bernd Edlinger
On 09/28/16 16:41, Jason Merrill wrote:
> On Tue, Sep 27, 2016 at 11:10 AM, Bernd Edlinger
>  wrote:
>> On 09/27/16 16:42, Jason Merrill wrote:
>>> On Tue, Sep 27, 2016 at 10:28 AM, Bernd Edlinger
>>>  wrote:
 On 09/27/16 16:10, Florian Weimer wrote:
> * Bernd Edlinger:
>
>>> “0 << 0” is used in a similar context, to create a zero constant for a
>>> multi-bit subfield of an integer.
>>>
>>> This example comes from GDB, in bfd/elf64-alpha.c:
>>>
>>> |   insn = INSN_ADDQ | (16 << 21) | (0 << 16) | (0 << 0);
>>>
>>
>> Of course that is not a boolean context, and will not get a warning.
>>
>> Question is if "if (1 << 0)" is possibly a miss-spelled "if (1 < 0)".
>>
>> Maybe 1 and 0 come from macro expansion
>
> But what's the intent of treating 1 << 0 and 0 << 0 differently in the
> patch, then?

 I am not sure if it was a good idea.

 I saw, we had code of the form
 bool flag = 1 << 2;

 another value LOOKUP_PROTECT is  1 << 0, and
 bool flag = 1 << 0;

 would at least not overflow the allowed value range of a boolean.
>>>
>>> Assigning a bit mask to a bool variable is still probably not what was
>>> intended, even if it doesn't change the value.
>>
>> That works for me too.
>> I can simply remove that exception.
>
> Sounds good.
>

Great.  Is that an "OK with that change"?


Bernd.


Re: [PATCH] Use CONSTRUCTOR_NELTS macro some more

2016-09-28 Thread Jeff Law

On 09/22/2016 02:07 PM, Jakub Jelinek wrote:

Hi!

I've noticed lots of vec_safe_length (CONSTRUCTOR_ELTS (...)) uses
in the sources, which IMHO are less readable than the much more often
used CONSTRUCTOR_NELTS (...) macro that does the same thing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-22  Jakub Jelinek  

* hsa-gen.c (hsa_op_immed::hsa_op_immed Use CONSTRUCTOR_NELTS (...)
instead of vec_safe_length (CONSTRUCTOR_ELTS (...)).
(gen_hsa_ctor_assignment): Likewise.
* print-tree.c (print_node): Likewise.
* tree-dump.c (dequeue_and_dump): Likewise.
* tree-sra.c (sra_modify_constructor_assign): Likewise.
* expr.c (store_constructor): Likewise.
* fold-const.c (operand_equal_p): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* hsa-brig.c (hsa_op_immed::emit_to_buffer): Likewise.
* ipa-icf-gimple.c (func_checker::compare_operand): Likewise.
cp/
* typeck2.c (process_init_constructor_record): Use
CONSTRUCTOR_NELTS (...) instead of
vec_safe_length (CONSTRUCTOR_ELTS (...)).
* decl.c (reshape_init_r): Likewise.
(check_initializer): Likewise.
ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Use
CONSTRUCTOR_NELTS (...) instead of
vec_safe_length (CONSTRUCTOR_ELTS (...)).

OK.
jeff



Re: [Patch AArch64] Add floatdihf2 and floatunsdihf2 patterns

2016-09-28 Thread James Greenhalgh
On Wed, Sep 21, 2016 at 10:42:03AM +0100, James Greenhalgh wrote:
> On Tue, Sep 13, 2016 at 10:31:28AM +0100, James Greenhalgh wrote:
> > On Tue, Sep 06, 2016 at 10:19:50AM +0100, James Greenhalgh wrote:
> > > This patch adds patterns for conversion from 64-bit integer to 16-bit
> > > floating-point values under AArch64 targets which don't have support for
> > > the ARMv8.2-A 16-bit floating point extensions.
> > > 
> > > We implement these by first saturating to a SImode (we know that any
> > > values >= 65504 will round to infinity after conversion to HFmode), then
> > > converting to a DFmode (unsigned conversions could go to SFmode, but there
> > > is no performance benefit to this). Then converting to HFmode.
> > > 
> > > Having added these patterns, the expansion path in "expand_float" will
> > > now try to use them for conversions from SImode to HFmode as there is no
> > > floatsihf2 pattern. expand_float first tries widening the integer size and
> > > looking for a match, so it will try SImode -> DImode. But our DI mode
> > > pattern is going to then saturate us back to SImode which is wasteful.
> > > 
> > > Better, would be for us to provide float(uns)sihf2 patterns directly.
> > > So that's what this patch does.
> > > 
> > > The testcase add in this patch would fail on trunk for AArch64. There is
> > > no libgcc routine to make the conversion, and we don't provide appropriate
> > > patterns in the backend, so we get a link-time error.
> > > 
> > > Bootstrapped and tested on aarch64-none-linux-gnu
> > > 
> > > OK for trunk?
> > 
> > Ping.
> 
> Ping^2

Ping^3

There was an off-list question as to whether the mid-end could catch this,
rather than requiring the target to do so. My objection to that is that it
would involve teaching the midend about saturating narrowing operations,
which if the target doesn't provide them natively require branching.

I'd rather push targets that want DImode to HFmode (and don't provide a
DImode to TFmode to go through first) to use libgcc/soft-fp than try to add
a special generic expander for DImode to HFmode conversions.

Note that even if we did have a generic expander for these types, we would
still need some version of this patch, as we want to override the behaviour
where the ARMv8.2-A 16-bit floating-point types are available.

Thanks,
James

> > > 2016-09-06  James Greenhalgh  
> > > 
> > >   * config/aarch64/aarch64.md (sihf2): Convert to expand.
> > >   (dihf2): Likewise.
> > >   (aarch64_fp16_hf2): New.
> > > 
> > > 2016-09-06  James Greenhalgh  
> > > 
> > >   * gcc.target/aarch64/floatdihf2_1.c: New.
> > > 
> > 
> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > index 6afaf90..1882a72 100644
> > > --- a/gcc/config/aarch64/aarch64.md
> > > +++ b/gcc/config/aarch64/aarch64.md
> > > @@ -4630,7 +4630,14 @@
> > >[(set_attr "type" "f_cvti2f")]
> > >  )
> > >  
> > > -(define_insn "hf2"
> > > +;; If we do not have ARMv8.2-A 16-bit floating point extensions, the
> > > +;; midend will arrange for an SImode conversion to HFmode to first go
> > > +;; through DFmode, then to HFmode.  But first it will try converting
> > > +;; to DImode then down, which would match our DImode pattern below and
> > > +;; give very poor code-generation.  So, we must provide our own emulation
> > > +;; of the mid-end logic.
> > > +
> > > +(define_insn "aarch64_fp16_hf2"
> > >[(set (match_operand:HF 0 "register_operand" "=w")
> > >   (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
> > >"TARGET_FP_F16INST"
> > > @@ -4638,6 +4645,53 @@
> > >[(set_attr "type" "f_cvti2f")]
> > >  )
> > >  
> > > +(define_expand "sihf2"
> > > +  [(set (match_operand:HF 0 "register_operand")
> > > + (FLOATUORS:HF (match_operand:SI 1 "register_operand")))]
> > > +  "TARGET_FLOAT"
> > > +{
> > > +  if (TARGET_FP_F16INST)
> > > +emit_insn (gen_aarch64_fp16_sihf2 (operands[0], operands[1]));
> > > +  else
> > > +{
> > > +  rtx convert_target = gen_reg_rtx (DFmode);
> > > +  emit_insn (gen_sidf2 (convert_target, operands[1]));
> > > +  emit_insn (gen_truncdfhf2 (operands[0], convert_target));
> > > +}
> > > +  DONE;
> > > +}
> > > +)
> > > +
> > > +;; For DImode there is no wide enough floating-point mode that we
> > > +;; can convert through natively (TFmode would work, but requires a 
> > > library
> > > +;; call).  However, we know that any value >= 65504 will be rounded
> > > +;; to infinity on conversion.  This is well within the range of SImode, 
> > > so
> > > +;; we can:
> > > +;;   Saturate to SImode.
> > > +;;   Convert from that to DFmode
> > > +;;   Convert from that to HFmode (phew!).
> > > +;; Note that the saturation to SImode requires the SIMD extensions.  If
> > > +;; we ever need to provide this pattern where the SIMD extensions are not
> > > +;; available, we would need a different approach.
> > > +
> > > +(define_expand "dihf2"
> > > +  [(set (match_operand:HF 0 "register_operand")
> > > + (FLOATUORS:HF (matc

Re: [PATCH GCC 6/9]Simplify control flow graph for vectorized loop

2016-09-28 Thread Jeff Law

On 09/21/2016 02:52 AM, Bin.Cheng wrote:

On Wed, Sep 14, 2016 at 5:43 PM, Jeff Law  wrote:

On 09/14/2016 07:21 AM, Richard Biener wrote:


On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng  wrote:


Hi,
This is the main patch improving control flow graph for vectorized loop.
It generally rewrites loop peeling stuff in vectorizer.  As described in
patch, for a typical loop to be vectorized like:

   preheader:
 LOOP:
   header_bb:
 loop_body
 if (exit_loop_cond) goto exit_bb
 elsegoto header_bb
   exit_bb:

This patch peels prolog and epilog from the loop, adds guards skipping
PROLOG and EPILOG for various conditions.  As a result, the changed CFG
would look like:

   guard_bb_1:
 if (prefer_scalar_loop) goto merge_bb_1
 elsegoto guard_bb_2

   guard_bb_2:
 if (skip_prolog) goto merge_bb_2
 else goto prolog_preheader

   prolog_preheader:
 PROLOG:
   prolog_header_bb:
 prolog_body
 if (exit_prolog_cond) goto prolog_exit_bb
 else  goto prolog_header_bb
   prolog_exit_bb:

   merge_bb_2:

   vector_preheader:
 VECTOR LOOP:
   vector_header_bb:
 vector_body
 if (exit_vector_cond) goto vector_exit_bb
 else  goto vector_header_bb
   vector_exit_bb:

   guard_bb_3:
 if (skip_epilog) goto merge_bb_3
 else goto epilog_preheader

   merge_bb_1:

   epilog_preheader:
 EPILOG:
   epilog_header_bb:
 epilog_body
 if (exit_epilog_cond) goto merge_bb_3
 else  goto epilog_header_bb

   merge_bb_3:


Note this patch peels prolog and epilog only if it's necessary, as well
as adds different guard_conditions/branches.  Also the first guard/branch
could be further improved by merging it with loop versioning.

Before this patch, up to 4 branch instructions need to be executed before
the vectorized loop is reached in the worst case, while the number is
reduced to 2 with this patch.  The patch also does better in compile time
analysis to avoid unnecessary peeling/branching.
From implementation's point of view, vectorizer needs to update induction
variables and iteration bounds along with control flow changes.
Unfortunately, it also becomes much harder to follow because slpeel_*
functions updates SSA by itself, rather than using update_ssa interface.
This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes.
This should make the implementation easier to read, and I think it maybe a
step forward to replace slpeel_* functions with generic GIMPLE loop copy
interfaces as Richard suggested.



I've skimmed over the patch and it looks reasonable to me.


THanks.  I was maybe 15% of the way through the main patch.  Nothing that
gave me cause for concern, but I wasn't ready to ACK it myself yet.

Hi Jeff,
Any update on this one?  Well, it might conflict with the epilogue
vectorization patch set?
I considered Richi's message an ACK for the patch.  Sorry if I wasn't 
clear about that.


While this patch may conflict with the epilogue vectorization patch set, 
but the epilogue vectorization work seems to have stalled, so let's have 
yours go in now.


Jeff



Re: [Patch] Remove all uses of TARGET_FLT_EVAL_METHOD_NON_DEFAULT and poison it

2016-09-28 Thread James Greenhalgh
On Thu, Sep 22, 2016 at 05:55:21PM +, Joseph Myers wrote:
> On Thu, 22 Sep 2016, James Greenhalgh wrote:
> 
> > The relaxation isn't portable, and keeping it in place is tricky, so this
> > patch removes it, and poisons TARGET_FLT_EVAL_METHOD_NON_DEFAULT in
> > system.h to prevent future use.
> > 
> > Bootstrapped and tested on x86_64 with --enable-languages=all,ada,go,obj-c++
> > with no issues.
> > 
> > OK?
> 
> OK.

Hi Joseph,

Is this OK sufficient for me to commit the patch, or must I wait for input
from the other CCed front-end maintainers?

Thanks,
James



Re: [PATCH] Introduce selftest::locate_file

2016-09-28 Thread Jeff Law

On 09/21/2016 06:59 PM, David Malcolm wrote:

On Mon, 2016-09-19 at 11:31 -0600, Jeff Law wrote:

On 09/16/2016 03:19 PM, David Malcolm wrote:



When possible I don't think we want the tests to be target
specific.
Hmm, I'm probably about to argue for Bernd's work...  The 71779
testcase
is a great example -- it uses high/lo_sum.  Not all targets
support
that
-- as long as we don't try to recognize those insns we're likely
OK,
but
that seems fragile long term.  Having an idealized target means
we
can
ignore much of these issues.


An alternative would be to pick a specific target for each test.

It's an alternative, but not one I particularly like since those
tests
won't be consistently run.  With an abstracted target like Bernd
suggests we ought to be able to make most tests work with the
abstracted
target and minimize the number of truely target specific tests.



So I'm real curious, what happens if you run this RTL selftest
under
valgrind?  I have the sneaking suspicion that we'll start doing
some
uninitialized memory reads.


I'm seeing various leaks (an htab within linemap_init for all of
the
line_table fixtures), but no uninitialized reads.

Wow.  I must say I'm surprised.  Good news though.



  +

+  /* Dump taken from comment 2 of PR 71779, of
+ "...the relevant memory access coming out of expand"
+ with basic block IDs added, and prev/next insns set to
+ 0 at ends.  */
+  const char *input_dump
+= (";; MEM[(struct isl_obj *)&obj1] =
&isl_obj_map_vtable;\n"
+   "(insn 1045 0 1046 2 (set (reg:SI 480)\n"
+   "(high:SI (symbol_ref:SI
(\"isl_obj_map_vtable\")
[flags 0xc0] )))
y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1046 1045 1047 2 (set (reg/f:SI 479)\n"
+   "(lo_sum:SI (reg:SI 480)\n"
+   "(symbol_ref:SI (\"isl_obj_map_vtable\")
[flags
0xc0] ))) y.c:12702
-1\n"
+   " (expr_list:REG_EQUAL (symbol_ref:SI
(\"isl_obj_map_vtable\") [flags 0xc0] )\n"
+   "(nil)))\n"
+   "(insn 1047 1046 1048 2 (set (reg:DI 481)\n"
+   "(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1048 1047 1049 2 (set (zero_extract:DI (reg/v:DI
191
[ obj1D.17368 ])\n"
+   "(const_int 32 [0x20])\n"
+   "(const_int 0 [0]))\n"
+   "(reg:DI 481)) y.c:12702 -1\n"
+   " (nil))\n"

So looking at this just makes my head hurt.  Escaping, lots of
quotes,
unnecessary things in the dump, etc.  The question I would have
is
why
not have a file with the dump and read the file?


(nods)

Seems like I need to add a mechanism for telling the selftests
which
directory to load the tests relative to.

What about putting them inside the appropriate gcc.target
directories?
We could have one for the "generic" target assuming we build
something
like that, the others could live in their target specific directory.


jeff


Having selftests that read RTL dumps load them from files rather than
embedding them as strings in the source requires a way to locate the
relevant files.

This patch adds a selftest::locate_file function for locating such
files, relative to "$(SRCDIR)/gcc/testsuite/selftests".  This is
done via a new argument to -fself-test, which supplies the current
value of "$(SRCDIR)/gcc" to cc1.

I chose "$(SRCDIR)/gcc/testsuite/selftests", so as to be below
gcc/testsuite, but not below any of the existing DejaGnu subdirectories,
to avoid selftest-specific files from being picked up by .exp globbing
patterns.  We could add target-specific directories below that dir if
necessary.

I've rewritten the rest of the patch kit to use this to load from .rtl
dump files within that directory, rather than embedding the dumps as
string literals in the C source.

The patch also exposes a selftests::path_to_src_gcc, which could be
used by a selftest to e.g. load a DejaGnu file, so that if need be
we could share .rtl input files between both -fself-test tests and
DejaGnu-based tests for the .rtl frontend.

(Depends on the approved-when-needed
  "[PATCH 2/9] Add selftest::read_file"
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg00476.html ).

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

OK for trunk once the dependencies are in?

gcc/ChangeLog:
* Makefile.in (s-selftest) Add $(srcdir) as an argument of
-fself-test.
(selftest-gdb): Likewise.
(selftest-valgrind): Likewise.
* common.opt (fself-test): Rename to...
(fself-test=): ...this, documenting the meaning of the argument.
* selftest-run-tests.c: Include "options.h".
(selftest::run_tests): Initialize selftest::path_to_src_gcc from
flag_self_test.
* selftest.c (selftest::path_to_src_gcc): New global.
(selftest::locate_file): New function.
(selftest::test_locate_file): New function.
(selftest::selftest_c_tests): Call test_locate_file.
* selftest.h (selftest::locate_file): New decl.
(selfte

Re: [PATCH] print-rtx.c: add 'h', v' and 'p' prefixes to regnos

2016-09-28 Thread Jeff Law

On 09/21/2016 01:01 PM, David Malcolm wrote:


Presumably we could use "v" rather than "p" as the prefix for the
first
5 pseudos (up to LAST_VIRTUAL_REGISTER), doing any adjustment at load
time, rather than at dump time.  So the above example would look
like:

   (reg/f:DI v82 virtual-stack-vars)

i.e. the 82 for x86_64's virtual-stack-vars would be prefixed with a
'v', and the loader would adjust it to be the current target's value
for VIRTUAL_STACK_VARS_REGNUM.

Do you like the idea of prefixing regnums of hardregs with 'h'? (so
that all regnos get a one-char prefix) e.g.
  (reg/i:SI h0 ax)
  (reg/i:SF h21 xmm0)


(Replying to myself, in the hope of better demonstrating the idea)

The following patch implements this idea for RTL dumps, so that all REGNO
values in dumps get a one character prefix: 'h' for hard registers, 'v'
for virtual registers, and 'p' for non-virtual pseudos (making it easier
for both humans and parsers to grok the meaning of a REGNO).
I think you nailed it.  h, v & p prefixing for each of the register 
types, but leaving the actual register number as-is in the dump file.




Successfully bootstrapped on x86_64-pc-linux-gnu.
There are various regression test failures involving scan-rtl-dump
due to regexes not matching e.g.
 PASS -> FAIL : gcc.target/i386/pr20020-1.c scan-rtl-dump expand "\\(set \\(reg/i:TI 
0 ax\\)"
 PASS -> FAIL : gcc.target/i386/pr20020-1.c scan-rtl-dump expand "\\(set \\(reg:TI [0-9]* 
\\[  \\]\\)"
If the approach is OK, I can do an updated patch that also fixes up the
relevant tests (adding the appropriate prefixes); this would touch
multiple targets.
Yea.  This obviously highlights some of the long term issues with making 
changes into the dump format.  As I said in our meeting yesterday, I do 
understand the desire to nail down the format :-)  But I also want to 
use the opportunity we have to make the dumps easier for your work to 
read & interpret.


jeff



Re: [PATCH] print-rtx.c: add 'h', v' and 'p' prefixes to regnos

2016-09-28 Thread Bernd Schmidt

On 09/28/2016 06:23 PM, Jeff Law wrote:


  (reg/i:SI h0 ax)
  (reg/i:SF h21 xmm0)


(Replying to myself, in the hope of better demonstrating the idea)

The following patch implements this idea for RTL dumps, so that all REGNO
values in dumps get a one character prefix: 'h' for hard registers, 'v'
for virtual registers, and 'p' for non-virtual pseudos (making it easier
for both humans and parsers to grok the meaning of a REGNO).

I think you nailed it.  h, v & p prefixing for each of the register
types, but leaving the actual register number as-is in the dump file.

I'm actually no longer quite so sure this buys us much: a port might 
have an actual register named "h0", leading to confusion. Virtual and 
hard registers also already have their real name printed after the number.


A "p" prefix for pseudos might still be a good idea, but there's still 
the issue of a real "p0" register name causing confusion.



Bernd



Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components

2016-09-28 Thread Jeff Law

On 09/28/2016 03:04 AM, Segher Boessenkool wrote:



+static void
+place_prologue_for_one_component (unsigned int which, basic_block head)
+{
+  /* The block we are currently dealing with.  */
+  basic_block bb = head;
+  /* Is this the first time we visit this block, i.e. have we just gone
+ down the tree.  */
+  bool first_visit = true;
+
+  /* Walk the dominator tree, visit one block per iteration of this loop.
+ Each basic block is visited twice: once before visiting any children
+ of the block, and once after visiting all of them (leaf nodes are
+ visited only once).  As an optimization, we do not visit subtrees
+ that can no longer influence the prologue placement.  */
+  for (;;)

Is there some reason you wrote this as a loop rather than recursion?
IMHO it makes this function (and spread_components) more difficult to
reason about than it needs to be.


It would recurse a few thousand frames deep on not terribly big testcases
(i.e. I've seen it happen).  This uses a lot of stack space on some ABIs,
and is very slow as well (this is the slowest function here by far).
Unlimited recursion is bad.
I'm surprised the recursion was that deep.  Such is life.   Thanks for 
clarifying.  I won't object to the iterative version. :-)



+/* Place code for prologues and epilogues for COMPONENTS where we can put
+   that code at the end of basic blocks.  */
+static void
+emit_common_tails_for_components (sbitmap components)

[ Snip. ]

+
+  /* Put the code at the end of the BB, but before any final jump.  */
+  if (!bitmap_empty_p (epi))

So what if the final jump uses hard registers in one way or another?   I
don't immediately see anything that verifies it is safe to transpose the
epilogue and the final jump.


Whoops.  Thanks for catching this.

I missed it the first time though the code too.




Conceptually want the epilogue to occur on the outgoing edge(s).  But
you want to actually transpose the epilogue and the control flow insn so
that you can insert the epilogue in one place.


The same problem happens with prologues, too.
Yea.  I guess if a had a jump with an embedded side effect (such as movb 
or addb on the PA), then transposing the control flow insn with the 
prologue would be problematical as well.




A cc0 target can not use separate shrink-wrapping *anyway* if any of the
components would clobber cc0, so that is no problem here.

True, but I'd be more comfortable if we filtered out cc0 targets explicitly.




  I think you need to handle the former more cleanly.  The latter I'd
be comfortable filtering out in try_shrink_wrapping_separate.


I'm thinking to not do the common tail optimisation if BB_END is a
JUMP_INSN but not simplejump_p (or a return or a sibling call).  Do
you see any problem with that?

Seems reasonable.


Jeff


Re: [PATCH][RTL ifcvt] Transform (X == CST) ? -CST : Y into (X == CST) ? -X : Y when conditional negation is available

2016-09-28 Thread Kyrill Tkachov

Hi Jeff,

On 28/09/16 17:07, Jeff Law wrote:

On 09/28/2016 02:48 AM, Kyrill Tkachov wrote:

Hi all,

This patch tries to avoid materialising an immediate
when comparison has shown that it is already loaded.
This is performed during ifcvt and only when the target has the
conditional negate
or inverse optabs, which for now is only aarch64.

Thus for the code:
int
foo (int a, int b)
{
  return a == 5 ? -5 : b;
}

we currently emit on aarch64:
foo:
mov w2, -5
cmp w0, 5
cselw0, w1, w2, ne
ret

but with this patch we emit:
foo:
cmp w0, 5
csneg   w0, w1, w0, ne
ret

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

P.S. The simpler form of the transformation: X == CST ? CST : Y --> X ==
CST ? X : Y
could also be beneficial IMO but I found that it can cause bad codegen
in combine
due to extending the live range of X. I'm still working on a way to work
around that,
but that is a separate piece of work anyway.

FWIW, the simpler form of the transformation is already done just prior to 
leaving SSA form in tree-ssa-uncprop.c.  So there may not be a ton of 
opportunities for the simpler form in the RTL optimizers.



Indeed there weren't that many places, but they were some. They helped in 
avoiding materialising the floating point 0.0
in particular in some cases.


My only concern here is this transformation isn't significantly different than 
the simpler form, which is apparently causing some kind of combine problem.

I'd like to understand the combine problem better before giving final approval 
to this patch.



Sure, I'm attaching the ifcvt patch that does the aforementioned simple 
transform.
The testcase it regresses is:
int
foo (float a, float b, float c, int d)
{
  return a > 5 && a < 10 ? 6 : 0;
}

As far as I got is:
The relevant pre-combine (aarch64) RTL without the patch is:
(insn 21 19 45 2 (set (reg:SI 87)
(zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90 
{*zero_extendqisi2_aarch64}
 (expr_list:REG_DEAD (reg:SI 86)
(nil)))
(insn 47 45 48 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 87)
(const_int 0 [0]))) fbad.c:4 392 {cmpsi}
 (expr_list:REG_DEAD (reg:SI 87)
(nil)))
(insn 48 47 30 2 (set (reg:SI 76 [  ])
(if_then_else:SI (ne (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 89)
(const_int 0 [0]))) fbad.c:4 444 {*cmovsi_insn}
 (expr_list:REG_DEAD (reg:SI 89)
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil


and with the patch it is:
(insn 21 19 45 2 (set (reg:SI 87)
(zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90 
{*zero_extendqisi2_aarch64}
 (expr_list:REG_DEAD (reg:SI 86)
(nil)))
(insn 46 45 47 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 87)
(const_int 0 [0]))) fbad.c:4 392 {cmpsi}
 (nil))
(insn 47 46 30 2 (set (reg:SI 76 [  ])
(if_then_else:SI (eq (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 87)
(reg:SI 89))) fbad.c:4 444 {*cmovsi_insn}
 (expr_list:REG_DEAD (reg:SI 89)
(expr_list:REG_DEAD (reg:SI 87)
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil)

Combine is trying to combine 21 -> 47 in the first case (21 -> 46 in the 
second) and without the
transformation ends up looking into the uses of CC as well, the IF_THEN_ELSE in 
insn 48.
But I think it only does that if reg 87 is dead in insn 47.
In the end what happens is it fails to merge the comparison and the use of the 
comparison
and ends up emitting extra code.
So before:
fmovs2, 5.0e+0
fmovs1, 1.0e+1
movw0, 6
fcmpes0, s2
fccmpes0, s1, 0, gt
cselw0, w0, wzr, mi //conditional select
ret

after:
foo:
fmovs2, 5.0e+0
fmovs1, 1.0e+1
movw1, 6
fcmpes0, s2
fccmpes0, s1, 0, gt
csetw0, mi // conditional set
cmpw0, 0
cselw0, w0, w1, eq
ret

I think it has to do with the logic to set and use added_sets_2 in try_combine
but I got lost trying to debug it, so I sort of attributed it to "extending 
live ranges
of registers is bad"

Kyrill




Jeff


diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 24542f008485e6c28e068030fa301f2ce040efc1..18cda590f3c17d0f7a48ba3e3f09bd715f60891b 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1810,13 +1810,38 @@ noce_try_cmove (struct noce_if_info *if_info)
   && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
 {
   start_sequence ();
+  rtx cond = if_info->cond;
 
-  code = GET_CODE (if_info->cond);
-  target = noce_emit_cmove (if_info, if_info->x, code,
-XEXP (if_info->cond, 0),
-XEXP (if_info->cond, 1),
-if_info->a, if_info->b);
+  code = GET_CODE (cond);
+
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  /* Transform x == CST ? CST : y into x == CST ? x : y to avoid
+	 materializing CST for the c

Re: [PATCH] print-rtx.c: add 'h', v' and 'p' prefixes to regnos

2016-09-28 Thread Jeff Law

On 09/28/2016 10:30 AM, Bernd Schmidt wrote:

On 09/28/2016 06:23 PM, Jeff Law wrote:


  (reg/i:SI h0 ax)
  (reg/i:SF h21 xmm0)


(Replying to myself, in the hope of better demonstrating the idea)

The following patch implements this idea for RTL dumps, so that all
REGNO
values in dumps get a one character prefix: 'h' for hard registers, 'v'
for virtual registers, and 'p' for non-virtual pseudos (making it easier
for both humans and parsers to grok the meaning of a REGNO).

I think you nailed it.  h, v & p prefixing for each of the register
types, but leaving the actual register number as-is in the dump file.


I'm actually no longer quite so sure this buys us much: a port might
have an actual register named "h0", leading to confusion. Virtual and
hard registers also already have their real name printed after the number.

A "p" prefix for pseudos might still be a good idea, but there's still
the issue of a real "p0" register name causing confusion.
So how do you think we should deal with distinguishing between the 
different registers that may appear in a dump file?


The case I'm worried about is the register meanings in a testsuite dump 
file changing over time if/when new hard registers are added to the port 
or we introduce new virtual registers.


FOr hard regs and virtuals we can probably map backwards using their 
names.  So given a register in a dump, if we can't reverse map it back 
to a hard reg or a virtual, then we assume its a pseudo?


jeff


Re: [PATCH] print-rtx.c: add 'h', v' and 'p' prefixes to regnos

2016-09-28 Thread Bernd Schmidt

On 09/28/2016 06:36 PM, Jeff Law wrote:

A "p" prefix for pseudos might still be a good idea, but there's still
the issue of a real "p0" register name causing confusion.

So how do you think we should deal with distinguishing between the
different registers that may appear in a dump file?


I think the main problem we were trying to solve is making sure we can 
make future-proof dumps. So that would argue for allowing h0, v0, p0 
syntax in the reader, but not printing them out that way by default.


Also, if we don't already, allow hard regs to be specified by name in 
the reader, and maybe even require it for virtuals.



Bernd


Re: [PATCH][v4] GIMPLE store merging pass

2016-09-28 Thread Pat Haugen
On 09/28/2016 10:54 AM, Kyrill Tkachov wrote:
> +fstore-merging
> +Common Var(flag_store_merging) Optimization
> +Use the tree store merging pass.
> +

Did you purposely leave off "Report" for this option? I noticed the option 
didn't show up in the "options enabled:" section of the .s file when 
-fverbose-asm is specified.

-Pat



Re: [PATCH 2/2] Disable .gnu_attribute tags in compatibility-ldbl.o

2016-09-28 Thread Joseph Myers
On Wed, 28 Sep 2016, Alan Modra wrote:

> > I'd expect libraries such as libstdc++ and libgcc (generally, all compiler 
> > and libc libraries) to be set up in such a way that they will work with 
> > all long double choices in user code (via mangling and headers mapping 
> > access to long double library functions to the right versions for the 
> > chosen type) - and so need to be compiled without these attribute tags to 
> > avoid the linker complaining when someone links them with user code built 
> > with a non-default choice of long double.  Certainly for glibc I'd think 
> > using the option globally to build everything is the right choice (well, 
> > except for libnldbl.a, where -mlong-double-64 attributes are logically 
> > correct).
> 
> Yes, and this is why the linker only warns rather than errors on
> mismatching .gnu.attributes tags.

But for a library that is aware of long double variants, it shouldn't even 
warn.  And given that we don't build multiple copies of GCC's libraries, 
they should be aware of the variants (via mangling them differently, 
ensuring versions of the relevant functions for each long double type are 
present, etc.) and so using them should not result in warnings.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR55152

2016-09-28 Thread Joseph Myers
On Wed, 28 Sep 2016, Richard Biener wrote:

> Index: gcc/testsuite/gcc.dg/pr55152.c
> ===
> --- gcc/testsuite/gcc.dg/pr55152.c(revision 0)
> +++ gcc/testsuite/gcc.dg/pr55152.c(working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -ffinite-math-only -fstrict-overflow 
> -fdump-tree-optimized" } */
> +
> +double g (double a)
> +{
> +  return (a>=-a)?a:-a;

You should need -fno-signed-zeros for this (that is, for the 
transformation to MAX_EXPR), not -ffinite-math-only.  For a == -0, that 
function should return -0, but abs would produce +0.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch] Remove all uses of TARGET_FLT_EVAL_METHOD_NON_DEFAULT and poison it

2016-09-28 Thread Joseph Myers
On Wed, 28 Sep 2016, James Greenhalgh wrote:

> On Thu, Sep 22, 2016 at 05:55:21PM +, Joseph Myers wrote:
> > On Thu, 22 Sep 2016, James Greenhalgh wrote:
> > 
> > > The relaxation isn't portable, and keeping it in place is tricky, so this
> > > patch removes it, and poisons TARGET_FLT_EVAL_METHOD_NON_DEFAULT in
> > > system.h to prevent future use.
> > > 
> > > Bootstrapped and tested on x86_64 with 
> > > --enable-languages=all,ada,go,obj-c++
> > > with no issues.
> > > 
> > > OK?
> > 
> > OK.
> 
> Hi Joseph,
> 
> Is this OK sufficient for me to commit the patch, or must I wait for input
> from the other CCed front-end maintainers?

It is sufficient for you to commit the patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v3] Optimize strchr to strlen

2016-09-28 Thread Christophe Lyon
On 28 September 2016 at 16:45, Jason Merrill  wrote:
> I think this broke g++.dg/ext/builtin10.C.
>
> Jason

It also broke:
  gc  gcc.c-torture/execute/builtins/strchr.c execution,  -O1
c.c-torture/execute/builtins/strchr.c (execution) on arm* and aarch64*

Christophe


[PATCH] Check for overflow in filesystem::last_write_time

2016-09-28 Thread Jonathan Wakely

A system_clock::time_point can't be used to represent all valid values
of a 64-bit time_t because nearly half the bits are used up for the
nanoseconds part. This makes filesystem::last_write_time() check the
time_t values and report an error if it can't be converted to the
time_point type. This means we don't get undefined behaviour for files
with mtime after the 24th century :-)

Thanks to Eric Fiselier for point it out.

* include/experimental/bits/fs_fwd.h (file_time_type): Simplify
definition.
* src/filesystem/ops.cc (file_time): Take error_code parameter and
check for overflow.
(do_copy_file, last_write_time): Pass error_code in file_time calls.
* testsuite/experimental/filesystem/operations/last_write_time.cc:
New.
* testsuite/util/testsuite_fs.h (scoped_file): Define RAII helper.

Tested x86_64-linux, committed to trunk. Backports for this (and some
other filesystem changes) to follow soon.

commit 3a59a5aabe310ca9e659dcf73829b3f89454f3ab
Author: Jonathan Wakely 
Date:   Wed Sep 28 13:51:09 2016 +0100

Check for overflow in filesystem::last_write_time

* include/experimental/bits/fs_fwd.h (file_time_type): Simplify
definition.
* src/filesystem/ops.cc (file_time): Take error_code parameter and
check for overflow.
(do_copy_file, last_write_time): Pass error_code in file_time calls.
* testsuite/experimental/filesystem/operations/last_write_time.cc:
New.
* testsuite/util/testsuite_fs.h (scoped_file): Define RAII helper.

diff --git a/libstdc++-v3/include/experimental/bits/fs_fwd.h 
b/libstdc++-v3/include/experimental/bits/fs_fwd.h
index 57aa4d3..b9cc041 100644
--- a/libstdc++-v3/include/experimental/bits/fs_fwd.h
+++ b/libstdc++-v3/include/experimental/bits/fs_fwd.h
@@ -253,7 +253,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   operator^=(directory_options& __x, directory_options __y) noexcept
   { return __x = __x ^ __y; }
 
-  typedef chrono::time_point file_time_type;
+  using file_time_type = std::chrono::system_clock::time_point;
 
   // operational functions
 
diff --git a/libstdc++-v3/src/filesystem/ops.cc 
b/libstdc++-v3/src/filesystem/ops.cc
index 0ecb8b9..659cfbb 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -288,16 +288,24 @@ namespace
   }
 
   inline fs::file_time_type
-  file_time(const stat_type& st) noexcept
+  file_time(const stat_type& st, std::error_code& ec) noexcept
   {
 using namespace std::chrono;
-return fs::file_time_type{
 #ifdef _GLIBCXX_USE_ST_MTIM
-   seconds{st.st_mtim.tv_sec} + nanoseconds{st.st_mtim.tv_nsec}
+time_t s = st.st_mtim.tv_sec;
+nanoseconds ns{st.st_mtim.tv_nsec};
 #else
-   seconds{st.st_mtime}
+time_t s = st.st_mtime;
+nanoseconds ns{};
 #endif
-};
+
+if (s >= (nanoseconds::max().count() / 1e9))
+  {
+   ec = std::make_error_code(std::errc::value_too_large); // EOVERFLOW
+   return fs::file_time_type::min();
+  }
+ec.clear();
+return fs::file_time_type{seconds{s} + ns};
   }
 
   // Returns true if the file descriptor was successfully closed,
@@ -373,11 +381,11 @@ namespace
  }
else if (is_set(option, opts::update_existing))
  {
-   if (file_time(*from_st) <= file_time(*to_st))
- {
-   ec.clear();
-   return false;
- }
+   const auto from_mtime = file_time(*from_st, ec);
+   if (ec)
+ return false;
+   if ((from_mtime <= file_time(*to_st, ec)) || ec)
+ return false;
  }
else if (!is_set(option, opts::overwrite_existing))
  {
@@ -1036,7 +1044,7 @@ fs::last_write_time(const path& p)
 fs::file_time_type
 fs::last_write_time(const path& p, error_code& ec) noexcept
 {
-  return do_stat(p, ec, [](const auto& st) { return file_time(st); },
+  return do_stat(p, ec, [&ec](const auto& st) { return file_time(st, ec); },
 file_time_type::min());
 }
 
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc
new file mode 100644
index 000..b1aea20
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc
@@ -0,0 +1,111 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Yo

libgo patch committed: Pass -fcompiling-runtime when testing the runtime package

2016-09-28 Thread Ian Lance Taylor
My earlier patch to inline runtime.getcallerpc and runtime.getcallersp
when compiling the runtime package was incomplete, because it only
took effect when passing the -fcompiling-runtime option, and that
option is not passed when testing the runtime package.  This patch by
Than McIntosh fixes that by passing the option.  Bootstrapped and ran
Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 240560)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-4046a883070c1f5f58de336f7378f3bca69ea2b6
+c79a35411c1065c71add196fdeca6e5207a79248
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 240560)
+++ libgo/Makefile.am   (working copy)
@@ -1291,6 +1291,7 @@ runtime.inc: s-runtime-inc; @true
 s-runtime-inc: runtime-go.lo
$(SHELL) $(srcdir)/mvifdiff.sh runtime.inc.tmp runtime.inc
$(STAMP) $@
+runtime_check_GOCFLAGS = -fgo-compiling-runtime
 runtime/check: $(CHECK_DEPS)
@$(CHECK)
 .PHONY: runtime/check
Index: libgo/Makefile.in
===
--- libgo/Makefile.in   (revision 240560)
+++ libgo/Makefile.in   (working copy)
@@ -1249,6 +1249,7 @@ CHECK_DEPS = $(toolexeclibgo_DATA) $(too
 @LIBGO_IS_SOLARIS_FALSE@matchargs_os = 
 extra_go_files_runtime = runtime_sysinfo.go
 runtime_go_lo_GOCFLAGS = -fgo-c-header=runtime.inc.tmp -fgo-compiling-runtime
+runtime_check_GOCFLAGS = -fgo-compiling-runtime
 @LIBGO_IS_BSD_TRUE@golang_org_x_net_route_lo = \
 @LIBGO_IS_BSD_TRUE@golang_org/x/net/route/route.lo
 


[PATCH] include/std/chrono (system_clock): Fix typo in comment.

2016-09-28 Thread Jonathan Wakely

Committed to trunk.

commit 6246cb3120548a1dfc62bf7aa538f3e5538123c9
Author: Jonathan Wakely 
Date:   Wed Sep 28 19:03:43 2016 +0100

* include/std/chrono (system_clock): Fix typo in comment.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index f29d8e1..11e7fa2 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -784,7 +784,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // Clocks.
 
 // Why nanosecond resolution as the default?
-// Why have std::system_clock always count in the higest
+// Why have std::system_clock always count in the highest
 // resolution (ie nanoseconds), even if on some OSes the low 3
 // or 9 decimal digits will be always zero? This allows later
 // implementations to change the system_clock::now()


Re: [patch, libgfortran] PR77707 formatted direct access: nextrec off by one

2016-09-28 Thread Steve Kargl
On Tue, Sep 27, 2016 at 10:15:40PM -0700, Jerry DeLisle wrote:
> I plan to commit the attached patch in the next few days. Fairly simple.
> 
> Regression tested on x86-64.
> 

Looks ok to me.

-- 
Steve


[PATCH, i386 libgcc]: Remove obsolete workaround for PR 44174

2016-09-28 Thread Uros Bizjak
Hello!

Attached patch removes obsolete register allocator workaround from
gcc-4.6 era. The problem was  fixed in gcc-4.7+.

2015-09-28  Uros Bizjak  

* config/i386/cpuinfo.c (__get_cpuid_output): Remove.
(__cpu_indicator_init): Call __get_cpuid, not __get_cpuid_output.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/cpuinfo.c
===
--- config/i386/cpuinfo.c   (revision 240579)
+++ config/i386/cpuinfo.c   (working copy)
@@ -381,20 +381,6 @@ get_available_features (unsigned int ecx, unsigned
   __cpu_model.__cpu_features[0] = features;
 }
 
-/* A noinline function calling __get_cpuid. Having many calls to
-   cpuid in one function in 32-bit mode causes GCC to complain:
-   "can't find a register in class CLOBBERED_REGS".  This is
-   related to PR rtl-optimization 44174. */
-
-static int __attribute__ ((noinline))
-__get_cpuid_output (unsigned int __level,
-   unsigned int *__eax, unsigned int *__ebx,
-   unsigned int *__ecx, unsigned int *__edx)
-{
-  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
-}
-
-
 /* A constructor function that is sets __cpu_model and __cpu_features with
the right values.  This needs to run only once.  This constructor is
given the highest priority and it should run before constructors without
@@ -406,7 +392,7 @@ __cpu_indicator_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
 
-  int max_level = 5;
+  int max_level;
   unsigned int vendor;
   unsigned int model, family, brand_id;
   unsigned int extended_model, extended_family;
@@ -416,7 +402,7 @@ __cpu_indicator_init (void)
 return 0;
 
   /* Assume cpuid insn present. Run in level 0 to get vendor id. */
-  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+  if (!__get_cpuid (0, &eax, &ebx, &ecx, &edx))
 {
   __cpu_model.__cpu_vendor = VENDOR_OTHER;
   return -1;
@@ -431,7 +417,7 @@ __cpu_indicator_init (void)
   return -1;
 }
 
-  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
 {
   __cpu_model.__cpu_vendor = VENDOR_OTHER;
   return -1;


Re: C++ PATCH for c++/68703 (dependent vector length)

2016-09-28 Thread David Edelsohn
Hi, Jason

This patch added testcases g++.dg/ext/vector32.C and vector32a.C, but
there is no gcc/testsuite/ChangeLog entry.

The testcases are failing on AIX with a strange ICE:

src/gcc/testsuite/g++.dg/ext/vector32.C: In function 'int main()':
src/gcc/testsuite/g++.dg/ext/vector32.C:18:1: internal compiler error:
in get, at cgraph.h:395

Do you want me to open a new PR for this?

Thanks, David


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-28 Thread Martin Sebor

On 09/24/2016 10:39 AM, Andreas Schwab wrote:

I'm still seeing these failures on m68k:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c  (test for warnings, line 358)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c  (test for warnings, line 1222)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c  (test for warnings, line 1272)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c  (test for warnings, line 1383)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test for excess errors)
Excess errors:
/daten/aranym/gcc/gcc-20160924/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c:1408:3:
 warning: specified size 4294967295 exceeds the size 2 of the destination 
object [-Wformat-length=]


I'm pretty sure the first two are due to the same problem as bug 77735.
I plan to look into fixing it today or later this week.  The last two
were caused by bug v77762 and should be fixed by now.

Martin


Re: [PATCH, Fortran] Extension: COTAN and degree-valued trig intrinsics with -fdec-math

2016-09-28 Thread Steve Kargl
On Mon, Sep 26, 2016 at 10:01:27AM -0400, Fritz Reese wrote:
> 
> Attached is a patch extending the GNU Fortran front-end to support
> some additional math intrinsics, enabled with a new compile flag
> -fdec-math. The flag adds the COTAN intrinsic (cotangent), as well as
> degree versions of all trigonometric intrinsics (SIND, TAND, ACOSD,
> etc...). This extension allows for further compatibility with legacy
> code that depends on the compiler to support such intrinsic functions.
> 

I plan to review this patch over the weekend.  Two things
to consider.

1) The documentation should note that these intrinsics are
   for compatibility with legacy code and should strongly
   discourage their use in new code.

2) In regards to Joseph and Tobias' comments, the documentation
   should give a hint to the quality of implementation.  Argument
   reduction can be a real pain and without a formal numerical
   analysis, I can imagine large ULP errors near zeros and 
   infinities.

I haven't looked at the implementation yet, but will suggest that
REAL(4) should probably be simply written in terms of REAL(8),
e.g., 

function sind(x) result(retval)
   real(4) retval
   real(4), intent(in) :: x
   retval = dsind(real(x, 8)) 
end function sind

Yes, the layer of indirection and computations in REAL(8) 
will be slower, but you should have much improved accuracy.

-- 
Steve


Re: Change license of filenames.h to LGPL

2016-09-28 Thread Alexandre Oliva
On Sep 27, 2016, Ozkan Sezer  wrote:

> FYI: What I originally wanted was an authorization _for me_ to use
> filenames.h in LGPL projects with LGPL license notice; the version
> I use is modified (not refer to any external code other than libc,
> i.e. only macros and inlines) and doesn't include hashtab.h either;
> therefore I believe that my request is fulfilled and is not subject
> to the concerns raised by you guys.

It would probably be wise for you to amend the modified copy you'll
distribute with the changes proposd by Eli, and to add a link to this
thread in the archives, should anyone be surprised by the different
license.

As for the copy in GCC, that has additional code, we can then keep it
under the stronger copyleft defenses.

Does that work for everyone involved?

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] objc: update documetation and add test-case of constructor/destructor attr.

2016-09-28 Thread Mike Stump
On Aug 10, 2016, at 2:11 AM, Martin Liška  wrote:
> 
> Following patch clarifies usage of ctor and dtor attributes for Objective C.
> Patch survives (on x86_64-linux-gnu):
> 
> make -k check-objc RUNTESTFLAGS="execute.exp"
> 
> Ready for trunk?

Ok.



Re: Implement -Wimplicit-fallthrough (version 9)

2016-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2016 at 09:29:01AM -0600, Tom Tromey wrote:
> > "Michael" == Michael Matz  writes:
> 
> Michael> Not accepting
> Michael>   /* And here we intentionally fall through because ... */
> Michael> and forcing users to replace this by:
> Michael>   /* fallthrough */
> Michael> is not robust either.  It's actually actively lowering robustness of 
> code, 
> Michael> it creates work for programmers that will be regarded as pointless 
> (and 
> Michael> rightly so) and will merely lead to everybody disabling the warning 
> (see 
> Michael> our generated files)
> 
> We can't control what programmers might do.  My point is that accepting
> too much is actively bad -- it hides errors.  If this somehow makes some
> programmer fall down a slippery slope, well, that's their error, not
> gcc's.
> 
> TBH I think it would be better not to parse comments at all.  Heuristics
> are generally bad and this case and ensuing discussion is a great
> demonstration of that.
> 
> The other day I built gdb with -Wimplicit-fallthrough.  I was surprised
> to find that gcc rejected this:
> 
>   default:
> {
>   complaint (&symfile_complaints,
>  _("Storage class %d not recognized during scan"),
>  sclass);
> }
> /* FALLTHROUGH */
> 
> /* C_FCN is .bf and .ef symbols.  I think it is sufficient
>to handle only the C_FUN and C_EXT.  */
>   case C_FCN:
> 
> Presumably without the comment heuristic, this would be accepted.

Is complaint a noreturn call?  If not, then it would certainly warn, unless
there is [[fallthrough]]; or __attribute__((fallthrough)); etc. (or the
comment).  For the comment, /* FALLTHROUGH */ is the recognized spelling of
the comment, but right now we only look for such comments immediately before
a case/default keyword or user label; if there is another comment in
between, it is ignored.  This is something we are considering to change,
exactly because often the /* FALLTHRU */ comment appears after some case and
then there is unrelated comment before the next case about what that case
handles.

Jakub


[PATCH] Backport of 25 fortran patches

2016-09-28 Thread Steve Kargl
The attached patch and ChangeLog entries are for the
backporting of 25 patches from trunk to the 6-branch.
The bugzilla PR's contained in the patch are

  fortran/41922 fortran/60774 fortran/61318 fortran/68566 fortran/69514
  fortran/69867 fortran/69962 fortran/70006 fortran/71067 fortran/71730
  fortran/71799 fortran/71859 fortran/71862 fortran/77260 fortran/77351
  fortran/77372 fortran/77380 fortran/77391 fortran/77420 fortran/77429 
  fortran/77460 fortran/77506 fortran/77507 fortran/77612 fortran/77694

The patch has been bootstrapped and regression tested on
x86_64-*-freebsd.  Ok to commit?

-- 
Steve
2016-09-28  Steven G. Kargl  

PR fortran/41922
* target-memory.c (expr_to_char): Pass in locus and use it in error
messages.
(gfc_merge_initializers): Ditto.
* target-memory.h: Update prototype for gfc_merge_initializers ().
* trans-common.c (get_init_field): Use the correct locus.

PR fortran/60774
* parse.c (next_free,next_fixed): Issue error for statement label
without a statement.

PR fortran/61318
* interface.c (compare_parameter): Use better locus for error message.

PR fortran/68566
* check.c (gfc_check_reshape): Check for constant expression.

PR fortran/69514
* array.c (gfc_match_array_constructor):  If type-spec is present,
walk the array constructor performing possible conversions for 
numeric types.

PR fortran/69867
* decl.c (build_struct): Ensure that pointers point to something.

PR fortran/69962
* decl.c (gfc_set_constant_character_len):  if expr is not
constant issue an error instead of an ICE.

PR fortran/70006
* io.c (gfc_resolve_dt): Use correct locus.
* resolve.c (resolve_branch): Ditto.

PR fortran/71067
* decl.c (match_data_constant): On error, set 'result' to NULL.

PR fortran/71730
* decl.c (char_len_param_value): Check return value of
gfc_reduce_init_expr().

PR fortran/71799
* resolve.c(gfc_resolve_iterator): Failure of type conversion need
not ICE.

PR fortran/71859
* check.c(numeric_check): Prevent ICE.  Issue error for invalid
subroutine as an actual argument when numeric argument is expected.

PR fortran/71862
* class.c: Remove assert.  Iterate over component only if non-null.

PR fortran/77260
* gcc/fortran/trans-decl.c (generate_local_decl): Suppress warning
for unused variable if symbol is entry point.

PR fortran/77351
* frontend-passes.c (remove_trim,combine_array_constructor): Check for
NULL pointer.

PR fortran/77372
simplify.c (simplify_ieee_selected_real_kind): Check for NULL pointers.

PR fortran/77380
* dependency.c (gfc_check_dependency): Do not assert with
-fcoarray=lib.

PR fortran/77391
* resolve.c (deferred_requirements): New function to check F2008:C402.
(resolve_fl_variable,resolve_fl_parameter): Use it.

PR fortran/77420
* trans-common.c:  Handle array elements in equivalence when
the lower and upper bounds of array spec are NULL.

PR fortran/77429 
* dependency.c (gfc_check_dependency):  Convert gcc_assert() to
a conditional and possible call to  gfc_internal_error().

PR fortran/77460
* simplify.c (simplify_transformation_to_scalar):  On error, result
may be NULL, simply return.

PR fortran/77506
* array.c (gfc_match_array_constructor): CHARACTER(len=*) cannot
appear in an array constructor.

PR fortran/77507
* intrinsic.c (add_functions):  Use correct keyword.

PR fortran/77612
* decl.c (char_len_param_value): Check parent namespace for 
seen_implicit_none.

PR fortran/77694
* frontend-passes.c (optimize_binop_array_assignment): Check pointer
for NULL.

2016-09-28  Steven G. Kargl  

PR fortran/77507
* ieee/ieee_arithmetic.F90 (IEEE_VALUE_4,IEEE_VALUE_8,IEEE_VALULE_10,
IEEE_VALUE_16):  Use correct keyword.

2016-09-28  Steven G. Kargl  

PR fortran/41922
* gfortran.dg/equiv_constraint_5.f90: Adjust the error message.
* gfortran.dg/equiv_constraint_7.f90: Ditto.
* gfortran.dg/pr41922.f90: New test.

PR fortran/60774
* gfortran.dg/empty_label.f: Adjust test for new error message.
* gfortran.dg/empty_label.f90: Ditto.
* gfortran.dg/empty_label_typedecl.f90: Ditto.
* gfortran.dg/label_3.f90: Deleted (redundant with empty_label.f90).
* gfortran.dg/warnings_are_errors_1.f90: Remove invalid statement label.

PR fortran/61318
* gfortran.dg/pr61318.f90: New test.

PR fortran/68566
* gfortran.dg/pr68566.f90: new test.

PR fortran/69514
* gfortra

Re: Implement -Wimplicit-fallthrough (version 9)

2016-09-28 Thread Bernd Schmidt

On 09/28/2016 02:15 PM, Michael Matz wrote:

P.S.: Initially I even wanted to argue that the mere existence of _any_
comment before a case label would disable the warning.  I don't have the
numbers but I bet even that version would have found the very same bugs
that the picky version has.


Sounds like a pretty good idea to me for a default setting. If we really 
want to have multiple levels of the warning. I agree that it's likely to 
find the majority of problems, and it no longer depends on language and 
spelling of the comment.



Bernd


Patch ping

2016-09-28 Thread Jakub Jelinek
Hi!

I'd like to ping the

http://gcc.gnu.org/ml/gcc-patches/2016-09/msg01436.html

patch, containing various fixes for gimple-ssa-sprintf.c.
If the 0 < var to var > 0 changes are deemed too controversial, I can
separate them from the other changes.

Jakub


Re: Patch ping

2016-09-28 Thread Bernd Schmidt

On 09/28/2016 09:24 PM, Jakub Jelinek wrote:

I'd like to ping the

http://gcc.gnu.org/ml/gcc-patches/2016-09/msg01436.html

patch, containing various fixes for gimple-ssa-sprintf.c.
If the 0 < var to var > 0 changes are deemed too controversial, I can
separate them from the other changes.


I'd like to see them separated because they are obvious and good, so 
please install them first.



Bernd



Re: debug container mutex association

2016-09-28 Thread François Dumont

On 27/09/2016 12:32, Jonathan Wakely wrote:

Index: include/debug/safe_base.h

===
--- include/debug/safe_base.h(revision 240509)
+++ include/debug/safe_base.h(working copy)
@@ -121,11 +121,11 @@
void
_M_detach();

+  public:
/** Likewise, but not thread-safe. */
void
_M_detach_single() throw ();

-  public:
/** Determines if we are attached to the given sequence. */
bool
_M_attached_to(const _Safe_sequence_base* __seq) const



Would this be a smaller change, that doesn't make the member
accessible to all code?

--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -50,6 +50,7 @@ namespace __gnu_debug
  class _Safe_iterator_base
  {
friend class _Safe_sequence_base;
+template friend class _Safe_sequence;

  public:
/** The sequence this iterator references; may be NULL to indicate
.

I am not a great fan of friend class. As long as it was friend 
declaration between iterator and sequence base types it was ok. Now that 
we need to make a template class friend I consider that it is too much 
friendship and prefer to make the necessary method public.


But if you think otherwise just tell me and I will use your approach.

François



Re: [PATCH] fix typos behind incorrect destination buffer length in -Wformat-length warning (77762)

2016-09-28 Thread Martin Sebor

On 09/27/2016 10:43 PM, Jeff Law wrote:

On 09/27/2016 04:50 PM, Martin Sebor wrote:

The attached patch corrects a couple of typos in argument numbers
in the handling of __builtin__vsnprintf_chk calls in the gimple-
ssa-sprintf pass, and another couple of typos in the test for
this that were masking this failure.

As an aside, the patch also fixes the off-by-one line test failures
introduced in r240503.  If there is a way to make the line numbers
relative (as suggested in
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02070.html) I'm happy
to update the -Wformat-length tests to make use of them (and document
it on the Wiki) if someone can point me at an example (or
documentation).  I couldn't find examples of dg-warning directives
that use the feature.

Thanks
Martin

gcc-77762.diff


PR c/77762 - Incorrect destination buffer length in -Wformat-length
warning

gcc/testsuite/ChangeLog:
2016-09-27  Martin Sebor  

PR c/77762
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_vsnprintf_chk_s):
Call __builtin___vsnprintf_chk, not __builtin___snprintf_chk.
(test_sprintf_p_const): Adjust line numbers to avoid failures
introduced in r240503.

gcc/ChangeLog:
2016-09-27  Martin Sebor  

PR c/77762
* gimple-ssa-sprintf.c (pass_sprintf_length::handle_gimple_call):
Fix typos.

OK.

The relative line number capability was just added:

https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01617.html


Thanks.  I added it to the Wiki:

  https://gcc.gnu.org/wiki/HowToPrepareATestcas

Martin


Shared mutex pool

2016-09-28 Thread François Dumont

Hi

Here is the patch to share a mutex pool between debug mode and 
shared_ptr implementation. It saves 392 bytes on generated .so and will 
make sure that fixing false sharing will impact both usages.


I preferred to leave implementation in shared_ptr.cc to avoid 
introducing another translation unit.


* src/c++11/shared_ptr.cc (mask, invalid, get_mutex): Move
declaration...
* src/c++11/mutex_pool.h: ... here. New.
* src/c++11/debug.cc: Use latter.

Tested under Linux x86_64, normal and debug modes.

Ok to commit ?

François

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 6b0db11..d79e43b 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -40,6 +40,8 @@
 
 #include  // for __cxa_demangle
 
+#include "mutex_pool.h"
+
 using namespace std;
 
 namespace
@@ -50,15 +52,13 @@ namespace
   __gnu_cxx::__mutex&
   get_safe_base_mutex(void* address)
   {
-const size_t mask = 0xf;
-static __gnu_cxx::__mutex safe_base_mutex[mask + 1];
-
 // Use arbitrarily __gnu_debug::vector as the container giving
 // alignment of debug containers.
 const auto alignbits = __builtin_ctz(alignof(__gnu_debug::vector));
-const size_t index
-  = (reinterpret_cast(address) >> alignbits) & mask;
-return safe_base_mutex[index];
+const unsigned char index
+  = (reinterpret_cast(address) >> alignbits)
+  & __gnu_internal::mask;
+return __gnu_internal::get_mutex(index);
   }
 
   void
diff --git a/libstdc++-v3/src/c++11/mutex_pool.h b/libstdc++-v3/src/c++11/mutex_pool.h
new file mode 100644
index 000..a352467
--- /dev/null
+++ b/libstdc++-v3/src/c++11/mutex_pool.h
@@ -0,0 +1,34 @@
+// Mutex pool used to limit contention -*- C++ -*-
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+namespace __gnu_internal _GLIBCXX_VISIBILITY(hidden)
+{
+  const unsigned char mask = 0xf;
+  const unsigned char invalid = mask + 1;
+
+  /* Returns different instances of __mutex depending on the passed index
+   * in order to limit contention.
+   */
+  __gnu_cxx::__mutex& get_mutex(unsigned char i);
+}
diff --git a/libstdc++-v3/src/c++11/shared_ptr.cc b/libstdc++-v3/src/c++11/shared_ptr.cc
index 1286ac2..9028040 100644
--- a/libstdc++-v3/src/c++11/shared_ptr.cc
+++ b/libstdc++-v3/src/c++11/shared_ptr.cc
@@ -24,6 +24,21 @@
 
 #include 
 
+#include "mutex_pool.h"
+
+namespace __gnu_internal _GLIBCXX_VISIBILITY(hidden)
+{
+  /* Returns different instances of __mutex depending on the passed index
+   * in order to limit contention.
+   */
+  __gnu_cxx::__mutex&
+  get_mutex(unsigned char i)
+  {
+static __gnu_cxx::__mutex m[mask + 1];
+return m[i];
+  }
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -37,21 +52,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __GTHREADS
   namespace
   {
-const unsigned char mask = 0xf;
-const unsigned char invalid = mask + 1;
-
 inline unsigned char key(const void* addr)
-{ return _Hash_impl::hash(addr) & mask; }
-
-/* Returns different instances of __mutex depending on the passed address
- * in order to limit contention.
- */
-__gnu_cxx::__mutex&
-get_mutex(unsigned char i)
-{
-  static __gnu_cxx::__mutex m[mask + 1];
-  return m[i];
-}
+{ return _Hash_impl::hash(addr) & __gnu_internal::mask; }
   }
 
   _Sp_locker::_Sp_locker(const void* p)
@@ -59,10 +61,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 if (__gthread_active_p())
   {
 	_M_key1 = _M_key2 = key(p);
-	get_mutex(_M_key1).lock();
+__gnu_internal::get_mutex(_M_key1).lock();
   }
 else
-  _M_key1 = _M_key2 = invalid;
+  _M_key1 = _M_key2 = __gnu_internal::invalid;
   }
 
   _Sp_locker::_Sp_locker(const void* p1, const void* p2)
@@ -72,22 +74,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_key1 = key(p1);
 	_M_key2 = key(p2);
 	if (_M_key2 < _M_key1)
-	  get_mutex(_M_key2).lock();
-	get_mutex(_M_key1).lock();

Re: Change license of filenames.h to LGPL

2016-09-28 Thread Ozkan Sezer
On 9/28/16, Alexandre Oliva  wrote:
> On Sep 27, 2016, Ozkan Sezer  wrote:
>
>> FYI: What I originally wanted was an authorization _for me_ to use
>> filenames.h in LGPL projects with LGPL license notice; the version
>> I use is modified (not refer to any external code other than libc,
>> i.e. only macros and inlines) and doesn't include hashtab.h either;
>> therefore I believe that my request is fulfilled and is not subject
>> to the concerns raised by you guys.
>
> It would probably be wise for you to amend the modified copy you'll
> distribute with the changes proposd by Eli, and to add a link to this
> thread in the archives, should anyone be surprised by the different
> license.

I believe the following is enough?
https://sf.net/p/libtimidity/libtimidity/ci/master/tree/src/filenames.h

> As for the copy in GCC, that has additional code, we can then keep it
> under the stronger copyleft defenses.
>
> Does that work for everyone involved?

For me, yes.

--
O.S.


Re: debug container mutex association

2016-09-28 Thread Jonathan Wakely

On 28/09/16 21:30 +0200, François Dumont wrote:

On 27/09/2016 12:32, Jonathan Wakely wrote:

Index: include/debug/safe_base.h

===
--- include/debug/safe_base.h(revision 240509)
+++ include/debug/safe_base.h(working copy)
@@ -121,11 +121,11 @@
   void
   _M_detach();

+  public:
   /** Likewise, but not thread-safe. */
   void
   _M_detach_single() throw ();

-  public:
   /** Determines if we are attached to the given sequence. */
   bool
   _M_attached_to(const _Safe_sequence_base* __seq) const



Would this be a smaller change, that doesn't make the member
accessible to all code?

--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -50,6 +50,7 @@ namespace __gnu_debug
 class _Safe_iterator_base
 {
   friend class _Safe_sequence_base;
+template friend class _Safe_sequence;

 public:
   /** The sequence this iterator references; may be NULL to indicate
.

I am not a great fan of friend class. As long as it was friend 
declaration between iterator and sequence base types it was ok. Now 
that we need to make a template class friend I consider that it is too 
much friendship and prefer to make the necessary method public.


OK.


But if you think otherwise just tell me and I will use your approach.


Making it public is fine. Please commit your original patch, thanks!




Re: [PATCH] Backport of 25 fortran patches

2016-09-28 Thread Jerry DeLisle

On 09/28/2016 12:12 PM, Steve Kargl wrote:

The attached patch and ChangeLog entries are for the
backporting of 25 patches from trunk to the 6-branch.
The bugzilla PR's contained in the patch are

  fortran/41922 fortran/60774 fortran/61318 fortran/68566 fortran/69514
  fortran/69867 fortran/69962 fortran/70006 fortran/71067 fortran/71730
  fortran/71799 fortran/71859 fortran/71862 fortran/77260 fortran/77351
  fortran/77372 fortran/77380 fortran/77391 fortran/77420 fortran/77429
  fortran/77460 fortran/77506 fortran/77507 fortran/77612 fortran/77694

The patch has been bootstrapped and regression tested on
x86_64-*-freebsd.  Ok to commit?



Well, though it is preferred to backport one at a time to help with 
backtracking, I also don't think we should be too pedantic either. Thanks for 
all the work.


OK

Jerry


  1   2   >