Re: [PATCH] Fix UB in dwarf2out.c (PR debug/78587)

2016-12-01 Thread Richard Biener
On Wed, Nov 30, 2016 at 8:02 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch fixes three spots with UB in dwarf2out.c; furthermore, the first
> spot results in smaller/better debug info.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2016-11-30  Jakub Jelinek  
>
> PR debug/78587
> * dwarf2out.c (loc_descr_plus_const): For negative offset use
> uint_loc_descriptor instead of int_loc_descriptor and perform negation
> in unsigned HOST_WIDE_INT type.
> (scompare_loc_descriptor): Shift UINTVAL left instead of INTVAL.
>
> * gcc.dg/debug/pr78587.c: New test.
>
> --- gcc/dwarf2out.c.jj  2016-11-18 22:55:19.0 +0100
> +++ gcc/dwarf2out.c 2016-11-30 15:16:39.402673343 +0100
> @@ -1514,7 +1514,8 @@ loc_descr_plus_const (dw_loc_descr_ref *
>
>else
>  {
> -  loc->dw_loc_next = int_loc_descriptor (-offset);
> +  loc->dw_loc_next
> +   = uint_loc_descriptor (-(unsigned HOST_WIDE_INT) offset);
>add_loc_descr (&loc->dw_loc_next, new_loc_descr (DW_OP_minus, 0, 0));
>  }
>  }
> @@ -13837,7 +13838,7 @@ scompare_loc_descriptor (enum dwarf_loca
>if (CONST_INT_P (XEXP (rtl, 1))
>   && GET_MODE_BITSIZE (op_mode) < HOST_BITS_PER_WIDE_INT
>   && (size_of_int_loc_descriptor (shift) + 1
> - + size_of_int_loc_descriptor (INTVAL (XEXP (rtl, 1)) << shift)
> + + size_of_int_loc_descriptor (UINTVAL (XEXP (rtl, 1)) << shift)
>   >= size_of_int_loc_descriptor (GET_MODE_MASK (op_mode)) + 1
>  + size_of_int_loc_descriptor (INTVAL (XEXP (rtl, 1))
>& GET_MODE_MASK (op_mode
> @@ -13852,7 +13853,7 @@ scompare_loc_descriptor (enum dwarf_loca
>add_loc_descr (&op0, int_loc_descriptor (shift));
>add_loc_descr (&op0, new_loc_descr (DW_OP_shl, 0, 0));
>if (CONST_INT_P (XEXP (rtl, 1)))
> -op1 = int_loc_descriptor (INTVAL (XEXP (rtl, 1)) << shift);
> +op1 = int_loc_descriptor (UINTVAL (XEXP (rtl, 1)) << shift);
>else
>  {
>add_loc_descr (&op1, int_loc_descriptor (shift));
> --- gcc/testsuite/gcc.dg/debug/pr78587.c.jj 2016-11-30 15:01:08.855153232 +0100
> +++ gcc/testsuite/gcc.dg/debug/pr78587.c    2016-11-30 15:20:22.0 +0100
> @@ -0,0 +1,23 @@
> +/* PR debug/78587 */
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w" } */
> +
> +extern void bar (void);
> +
> +void
> +foo (long long x)
> +{
> +  x ^= 9223372036854775808ULL;
> +  bar ();
> +}
> +
> +struct S { int w[4]; } a[1], b;
> +
> +void
> +baz ()
> +{
> +  int e = (int) baz;
> +  if (e <= -80)
> +e = 0;
> +  b = a[e];
> +}
>
> Jakub


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:
> Isn't this too simplistic?  I mean, if you have say dirtype of signed char
> and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
> has range 32, 64, while I think your routine will yield -128, 127 (well,
> 0 as min and -128 as max as that is what you return for signed type).
> 
> Can't you subtract argmax - argmin (best just in wide_int, no need to build
> trees), and use what you have just for the case where that number doesn't
> fit into the narrower precision, otherwise if argmin - (dirtype) argmin
> == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
> as the range, and in case it crosses a boundary figure out if you can do
> anything better than the above?  Guess all cases of signed/unsigned dirtype and/or
> argtype need to be considered.

Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
  || (vr0.type == VR_RANGE
      && integer_zerop (int_const_binop (RSHIFT_EXPR,
	   int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
	   size_int (TYPE_PRECISION (outer_type)))))
part that is important here for the narrowing conversion.

Jakub


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Richard Biener
On Wed, Nov 30, 2016 at 6:59 PM, Jeff Law  wrote:
> On 11/30/2016 01:38 AM, Richard Biener wrote:
>>
>> On Tue, Nov 29, 2016 at 5:07 PM, Jeff Law  wrote:
>>>
>>> On 11/29/2016 03:23 AM, Richard Biener wrote:


 On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>
>
>
>
> I was digging into issues around the patches for 78120 when I stumbled
> upon
> undesirable bb copying in bb-reorder.c on the m68k.
>
> The core issue is that the m68k does not define a length attribute and
> therefore generic code assumes that the length of all insns is 0 bytes.



 What other targets behave like this?
>>>
>>>
>>> ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c
>>
>>
>> Ok.
>>
>>> cris has a hack to define a length, even though no attempt is made to
>>> make
>>> it accurate.  The hack specifically calls out that it's to make
>>> bb-reorder
>>> happy.
>>>

> That in turn makes bb-reorder think it is infinitely cheap to copy
> basic
> blocks.  In the two codebases I looked at (GCC's runtime libraries and
> newlib) this leads to a 10% and 15% undesirable increase in code size.
>
> I've taken a slight variant of this patch and bootstrapped/regression
> tested
> it on x86_64-linux-gnu to verify sanity as well as built the m68k
> target
> libraries noted above.
>
> OK for the trunk?



 I wonder if it isn't better to default to a length of 1 instead of zero
 when
 there is no length attribute.  There are more users of the length
 attribute
 in bb-reorder.c (and elsewhere as well I suppose).
>>>
>>>
>>> I pondered that as well, but felt it was riskier given we've had a
>>> default
>>> length of 0 for ports that don't define lengths since the early 90s.
>>> It's
>>> certainly easy enough to change that default if you'd prefer.  I don't
>>> have
>>> a strong preference either way.
>>
>>
>> Thinking about this again maybe targets w/o insn-length should simply
>> always use the 'simple' algorithm instead of the STC one?  At least that
>> might be what your change effectively does in some way?
>
> From reading the comments I don't think STC will collapse down into the
> simple algorithm if block copying is disabled.  But Segher would know for
> sure.
>
> WRT the choice of simple vs STC, I doubt it matters much for the processors
> in question.

I guess STC doesn't make much sense if we can't say anything about BB sizes.

Richard.

>
> Jeff


[patch,avr] Clean up n_flash field from MCU information.

2016-12-01 Thread Georg-Johann Lay
The introduction of the flash_size field in avr_mcu_t rendered the 
n_flash field redundant.  This patch computes the value of n_flash as 
needed from flash_size and cleans up n_flash.


Ok for trunk?

Johann

gcc/
* config/avr/avr-arch.h (avr_mcu_t) [n_flash]: Remove field.
* config/avr/avr-devices.c (AVR_MCU): Remove N_FLASH macro argument.
* config/avr/avr-mcus.def (AVR_MCU): Remove initializer for n_flash.
* config/avr/avr.c (avr_set_core_architecture) [avr_n_flash]: Use
avr_mcu_types.flash_size to compute default value.
* config/avr/gen-avr-mmcu-specs.c (print_mcu) [cc1_n_flash]: Use
mcu->flash_size to compute value for spec.


Index: config/avr/avr-arch.h
===
--- config/avr/avr-arch.h	(revision 243099)
+++ config/avr/avr-arch.h	(working copy)
@@ -120,9 +120,6 @@ const char *const macro;
   /* Start of text section. */
   int text_section_start;
 
-  /* Number of 64k segments in the flash.  */
-  int n_flash;
-
   /* Flash size in bytes.  */
   int flash_size;
 } avr_mcu_t;
Index: config/avr/avr-devices.c
===
--- config/avr/avr-devices.c	(revision 243099)
+++ config/avr/avr-devices.c	(working copy)
@@ -111,12 +111,12 @@ avr_texinfo[] =
 const avr_mcu_t
 avr_mcu_types[] =
 {
-#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE)\
-  { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE },
+#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, FLASH_SIZE)\
+  { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, FLASH_SIZE },
 #include "avr-mcus.def"
 #undef AVR_MCU
 /* End of list.  */
-  { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0, 0 }
+  { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0 }
 };
 
 
Index: config/avr/avr-mcus.def
===
--- config/avr/avr-mcus.def	(revision 243099)
+++ config/avr/avr-mcus.def	(working copy)
@@ -59,301 +59,298 @@ supply respective built-in macro.
 
TEXT_STARTFirst address of Flash, used in -Ttext=.
 
-   N_FLASH   Number of 64 KiB flash segments, rounded up.  The default
- value for -mn-flash=.
-
FLASH_SIZEFlash size in bytes.
 
"avr2" must be first for the "0" default to work as intended.  */
 
 /* Classic, <= 8K.  */
-AVR_MCU ("avr2", ARCH_AVR2, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 6, 0x2000)
+AVR_MCU ("avr2", ARCH_AVR2, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 0x6)
 
-AVR_MCU ("at90s2313",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2313__", 0x0060, 0x0, 1, 0x800)
-AVR_MCU ("at90s2323",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2323__", 0x0060, 0x0, 1, 0x800)
-AVR_MCU ("at90s2333",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2333__", 0x0060, 0x0, 1, 0x800)
-AVR_MCU ("at90s2343",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2343__", 0x0060, 0x0, 1, 0x800)
-AVR_MCU ("attiny22", ARCH_AVR2, AVR_SHORT_SP, "__AVR_ATtiny22__",  0x0060, 0x0, 1, 0x800)
-AVR_MCU ("attiny26", ARCH_AVR2, AVR_SHORT_SP, "__AVR_ATtiny26__",  0x0060, 0x0, 1, 0x800)
-AVR_MCU ("at90s4414",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90S4414__", 0x0060, 0x0, 1, 0x1000)
-AVR_MCU ("at90s4433",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S4433__", 0x0060, 0x0, 1, 0x1000)
-AVR_MCU ("at90s4434",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90S4434__", 0x0060, 0x0, 1, 0x1000)
-AVR_MCU ("at90s8515",ARCH_AVR2, AVR_ERRATA_SKIP, "__AVR_AT90S8515__",  0x0060, 0x0, 1, 0x2000)
-AVR_MCU ("at90c8534",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90C8534__", 0x0060, 0x0, 1, 0x2000)
-AVR_MCU ("at90s8535",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90S8535__", 0x0060, 0x0, 1, 0x2000)
+AVR_MCU ("at90s2313",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2313__", 0x0060, 0x0, 0x800)
+AVR_MCU ("at90s2323",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2323__", 0x0060, 0x0, 0x800)
+AVR_MCU ("at90s2333",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2333__", 0x0060, 0x0, 0x800)
+AVR_MCU ("at90s2343",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2343__", 0x0060, 0x0, 0x800)
+AVR_MCU ("attiny22", ARCH_AVR2, AVR_SHORT_SP, "__AVR_ATtiny22__",  0x0060, 0x0, 0x800)
+AVR_MCU ("attiny26", ARCH_AVR2, AVR_SHORT_SP, "__AVR_ATtiny26__",  0x0060, 0x0, 0x800)
+AVR_MCU ("at90s4414",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90S4414__", 0x0060, 0x0, 0x1000)
+AVR_MCU ("at90s4433",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S4433__", 0x0060, 0x0, 0x1000)
+AVR_MCU ("at90s4434",ARCH_AVR2, AVR_ISA_NONE, "__AVR_AT90S4434__", 0x0060, 0x0, 0x1000)
+AVR_MCU ("at90s8515",ARCH_AVR2, AVR_ERRATA_SKIP, "__AVR_AT90S8515_

[PATCH PR78559][RFC]Proposed fix

2016-12-01 Thread Bin Cheng
Hi,
After investigation, I believe PR78559 is a combine issue revealed by a tree
level change.  The root cause is that after replacing the CC register use in
undobuf.other_insn, its REG_EQUAL/REG_EQUIV notes are no longer valid, because
the meaning of the CC register has been changed in the i2/i3 instructions by
combine.  For the following combine sequence, GCC would try to use the note and
generate wrong code.  This is a proposed patch discarding all
REG_EQUAL/REG_EQUIV notes for other_insn.  It might be overkill, but it's
difficult to analyze whether the registers have been changed or not.
Bootstrapped and tested on x86_64 and AArch64; any suggestion on how to fix
this?

Thanks,
bin

2016-12-01  Bin Cheng  

PR rtl-optimization/78559
* combine.c (try_combine): Discard REG_EQUAL and REG_EQUIV for
other_insn in combine.

gcc/testsuite/ChangeLog
2016-12-01  Bin Cheng  

PR rtl-optimization/78559
* gcc.c-torture/execute/pr78559.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 22fb7a9..93b0901 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -4138,7 +4138,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 PATTERN (undobuf.other_insn)))
  ||(REG_NOTE_KIND (note) == REG_UNUSED
 && !reg_set_p (XEXP (note, 0),
-   PATTERN (undobuf.other_insn
+   PATTERN (undobuf.other_insn)))
+ || REG_NOTE_KIND (note) == REG_EQUAL
+ || REG_NOTE_KIND (note) == REG_EQUIV)
remove_note (undobuf.other_insn, note);
}
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78559.c b/gcc/testsuite/gcc.c-torture/execute/pr78559.c
new file mode 100644
index 000..1db1519
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78559.c
@@ -0,0 +1,34 @@
+/* PR rtl-optimization/78559 */
+
+int g = 20;
+int d = 0;
+
+short
+fn2 (int p1, int p2)
+{
+  return p2 >= 2 || 5 >> p2 ? p1 : p1 << p2;
+}
+
+int
+main ()
+{
+  int result = 0;
+lbl_2582:
+  if (g)
+{
+  for (int c = -3; c; c++)
+result = fn2 (1, g);
+}
+  else
+{
+  for (int i = 0; i < 2; i += 2)
+if (d)
+  goto lbl_2582;
+}
+  if (result != 1)
+__builtin_abort ();
+  return 0;
+}
+
+
+


Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Martin Liška

On 11/30/2016 11:46 PM, Martin Sebor wrote:

On 11/24/2016 05:59 AM, Martin Liška wrote:

On 11/24/2016 09:29 AM, Richard Biener wrote:

Please guard with ! TDF_GIMPLE, otherwise the output will not be parseable
with the GIMPLE FE.

Richard.


Done and verified that and it provides equal dumps for -fdump*-gimple.
Installed as r242837.


Hi Martin,

I'm trying to understand how to interpret the probabilities (to
make sure one of my tests, builtin-sprintf-2.c, is testing what
it's supposed to be testing).

With this example:

  char d2[2];

  void f (void)
  {
if (2 != __builtin_sprintf (d2, "%i", 12))
  __builtin_abort ();
  }

the probability of the branch to abort is 0%:

  f1 ()
  {
    int _1;

    <bb 2> [100.0%]:
    _1 = __builtin_sprintf (&d, "%i", 12);
    if (_1 != 2)
      goto <bb 3>; [0.0%]
    else
      goto <bb 4>; [100.0%]

    <bb 3> [0.0%]:
    __builtin_abort ();

    <bb 4> [100.0%]:
    return;
  }


Hello Martin.

Looks like I made a small error. I use only one digit after the decimal point,
which is unable to display the noreturn predictor (defined as
PROB_VERY_UNLIKELY):

#define PROB_VERY_UNLIKELY  (REG_BR_PROB_BASE / 2000 - 1) // this is 4

I would suggest using the following patch to display at least 2 digits, which
would distinguish between a real zero and PROB_VERY_UNLIKELY:

x.c.046t.profile_estimate:

f ()
{
  int _1;

  <bb 2> [100.00%]:
  _1 = __builtin_sprintf (&d2, "%i", 12);
  if (_1 != 2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [0.04%]:
  __builtin_abort ();

  <bb 4> [99.96%]:
  return;

}



Yet the call to abort is in the assembly, so I would expect its
probability to be more than zero.  So my question is: is it safe
to test for calls to abort in the optimized dump as a way
of verifying that the call has not been eliminated from the program,
regardless of its probability?


I think so, otherwise the call would be removed.

I'm going to test the patch (and eventually update scanned patterns).

Martin

Patch candidate:

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index b5e866d..de57e89 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -72,12 +72,17 @@ debug_gimple_stmt (gimple *gs)
   print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
 }
 
+/* Print format used for displaying probability of an edge or frequency
+   of a basic block.  */
+
+#define PROBABILITY_FORMAT "[%.2f%%]"
+
 /* Dump E probability to BUFFER.  */
 
static void
dump_edge_probability (pretty_printer *buffer, edge e)
 {
-  pp_scalar (buffer, " [%.1f%%]",
+  pp_scalar (buffer, " " PROBABILITY_FORMAT,
 e->probability * 100.0 / REG_BR_PROB_BASE);
 }
 
@@ -1023,7 +1028,7 @@ dump_gimple_label (pretty_printer *buffer, glabel *gs, int spc, int flags)
   dump_generic_node (buffer, label, spc, flags, false);
   basic_block bb = gimple_bb (gs);
   if (bb && !(flags & TDF_GIMPLE))
-   pp_scalar (buffer, " [%.1f%%]",
+   pp_scalar (buffer, " " PROBABILITY_FORMAT,
   bb->frequency * 100.0 / REG_BR_PROB_BASE);
   pp_colon (buffer);
 }
@@ -2590,7 +2595,8 @@ dump_gimple_bb_header (FILE *outf, basic_block bb, int indent, int flags)
  if (flags & TDF_GIMPLE)
fprintf (outf, "%*sbb_%d:\n", indent, "", bb->index);
  else
-   fprintf (outf, "%*s [%.1f%%]:\n", indent, "", bb->index,
+   fprintf (outf, "%*s " PROBABILITY_FORMAT ":\n",
+indent, "", bb->index,
 bb->frequency * 100.0 / REG_BR_PROB_BASE);
}
 }



For reference, the directive the test uses since this change was
committed looks like this:

{ dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n *__builtin_abort" 114 "optimized" } }

If I'm reading the heavily escaped regex right it matches any
percentage, even 0.0% (and the test passes).

Thanks
Martin




[PATCH] Fix runtime error: left shift of negative value (PR, ipa/78555).

2016-12-01 Thread Martin Liška

As described in the PR, we do a couple of shifts of a negative value. This is
fixed in the patch, and a couple of new unit tests are added.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 61a6b5e0c973bd77341a1053609c7ad331691a9e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 30 Nov 2016 15:24:48 +0100
Subject: [PATCH] Fix runtime error: left shift of negative value (PR
 ipa/78555).

gcc/ChangeLog:

2016-11-30  Martin Liska  

	PR ipa/78555
	* sreal.c (sreal::to_int): Make absolute value before shifting.
	(sreal::operator/): Likewise.
	(sreal_verify_negative_division): New test.
	(void sreal_c_tests): Call the new test.
	* sreal.h (sreal::normalize_up): Use new SREAL_ABS and
	SREAL_SIGN macros.
	(sreal::normalize_down): Likewise.
---
 gcc/sreal.c | 20 +---
 gcc/sreal.h |  9 +
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/gcc/sreal.c b/gcc/sreal.c
index 9c43b4e..52e530d 100644
--- a/gcc/sreal.c
+++ b/gcc/sreal.c
@@ -102,14 +102,14 @@ sreal::shift_right (int s)
 int64_t
 sreal::to_int () const
 {
-  int64_t sign = m_sig < 0 ? -1 : 1;
+  int64_t sign = SREAL_SIGN (m_sig);
 
   if (m_exp <= -SREAL_BITS)
 return 0;
   if (m_exp >= SREAL_PART_BITS)
 return sign * INTTYPE_MAXIMUM (int64_t);
   if (m_exp > 0)
-return m_sig << m_exp;
+return sign * (SREAL_ABS (m_sig) << m_exp);
   if (m_exp < 0)
 return m_sig >> -m_exp;
   return m_sig;
@@ -229,7 +229,8 @@ sreal::operator/ (const sreal &other) const
 {
   gcc_checking_assert (other.m_sig != 0);
   sreal r;
-  r.m_sig = (m_sig << SREAL_PART_BITS) / other.m_sig;
+  r.m_sig
+= SREAL_SIGN (m_sig) * (SREAL_ABS (m_sig) << SREAL_PART_BITS) / other.m_sig;
   r.m_exp = m_exp - other.m_exp - SREAL_PART_BITS;
   r.normalize ();
   return r;
@@ -334,6 +335,18 @@ sreal_verify_shifting (void)
 verify_shifting (values[i]);
 }
 
+/* Verify division by (of) a negative value.  */
+
+static void
+sreal_verify_negative_division (void)
+{
+  ASSERT_EQ (sreal (1) / sreal (1), sreal (1));
+  ASSERT_EQ (sreal (-1) / sreal (-1), sreal (1));
+  ASSERT_EQ (sreal (-1234567) / sreal (-1234567), sreal (1));
+  ASSERT_EQ (sreal (-1234567) / sreal (1234567), sreal (-1));
+  ASSERT_EQ (sreal (1234567) / sreal (-1234567), sreal (-1));
+}
+
 /* Run all of the selftests within this file.  */
 
 void sreal_c_tests ()
@@ -341,6 +354,7 @@ void sreal_c_tests ()
   sreal_verify_basics ();
   sreal_verify_arithmetics ();
   sreal_verify_shifting ();
+  sreal_verify_negative_division ();
 }
 
 } // namespace selftest
diff --git a/gcc/sreal.h b/gcc/sreal.h
index ce9cdbb..21f14b0 100644
--- a/gcc/sreal.h
+++ b/gcc/sreal.h
@@ -31,6 +31,9 @@ along with GCC; see the file COPYING3.  If not see
 
 #define SREAL_BITS SREAL_PART_BITS
 
+#define SREAL_SIGN(v) (v < 0 ? -1: 1)
+#define SREAL_ABS(v) (v < 0 ? -v: v)
+
 /* Structure for holding a simple real number.  */
 class sreal
 {
@@ -193,7 +196,6 @@ inline sreal operator>> (const sreal &a, int exp)
 inline void
 sreal::normalize_up ()
 {
-  int64_t s = m_sig < 0 ? -1 : 1;
   unsigned HOST_WIDE_INT sig = absu_hwi (m_sig);
   int shift = SREAL_PART_BITS - 2 - floor_log2 (sig);
 
@@ -208,7 +210,7 @@ sreal::normalize_up ()
   m_exp = -SREAL_MAX_EXP;
   sig = 0;
 }
-  if (s == -1)
+  if (SREAL_SIGN (m_sig) == -1)
 m_sig = -sig;
   else
 m_sig = sig;
@@ -221,7 +223,6 @@ sreal::normalize_up ()
 inline void
 sreal::normalize_down ()
 {
-  int64_t s = m_sig < 0 ? -1 : 1;
   int last_bit;
   unsigned HOST_WIDE_INT sig = absu_hwi (m_sig);
   int shift = floor_log2 (sig) - SREAL_PART_BITS + 2;
@@ -246,7 +247,7 @@ sreal::normalize_down ()
   m_exp = SREAL_MAX_EXP;
   sig = SREAL_MAX_SIG;
 }
-  if (s == -1)
+  if (SREAL_SIGN (m_sig) == -1)
 m_sig = -sig;
   else
 m_sig = sig;
-- 
2.10.2



Re: PING! [PATCH, Fortran, accaf, v1] Add caf-API-calls to asynchronously handle allocatable components in derived type coarrays.

2016-12-01 Thread Andre Vehreschild
Hi all,

and here is another follow-up reported by Dominique: fixing testcase
coarray_lib_alloc_4.f90 for 32-bit. Committed as obvious as r243101. Thanks for
reporting, Dominique.

- Andre

On Wed, 30 Nov 2016 16:59:39 +0100
Andre Vehreschild  wrote:

> Fixed -> r243034.
> 
> - Andre
> 
> On Wed, 30 Nov 2016 15:53:39 +0100
> Janus Weil  wrote:
> 
> > Hi,
> >   
> > > on IRC:
> > > 15:28:22 dominiq:  vehre: add /* FALLTHROUGH */
> > >
> > > Done and committed as obvious as r243023.
> > 
> > thanks. However, I still see these two:
> > 
> >   
> > >> > /home/jweil/gcc/gcc7/trunk/libgfortran/caf/single.c: In function
> > >> > ‘_gfortran_caf_get_by_ref’:
> > >> > /home/jweil/gcc/gcc7/trunk/libgfortran/caf/single.c:1863:29: warning:
> > >> > ‘src_size’ may be used uninitialized in this function
> > >> > [-Wmaybe-uninitialized]
> > >> >if (size == 0 || src_size == 0)
> > >> > ~^~~~
> > >> > /home/jweil/gcc/gcc7/trunk/libgfortran/caf/single.c: In function
> > >> > ‘_gfortran_caf_send_by_ref’:
> > >> > /home/jweil/gcc/gcc7/trunk/libgfortran/caf/single.c:2649:29: warning:
> > >> > ‘src_size’ may be used uninitialized in this function
> > >> > [-Wmaybe-uninitialized]
> > >> >if (size == 0 || src_size == 0)
> > >> > ~^~~~
> > 
> > Can you please fix them as well?
> > 
> > Thanks,
> > Janus
> > 
> > 
> > 
> >   
> > >> > 2016-11-30 14:30 GMT+01:00 Andre Vehreschild :
> > >> > > Hi Paul,
> > >> > >
> > >> > > thanks for the review. Committed with the changes requested and the
> > >> > > one reported by Dominique on IRC for coarray_lib_alloc_4 when
> > >> > > compiled with -m32 as r243021.
> > >> > >
> > >> > > Thanks for the review and tests.
> > >> > >
> > >> > > Regards,
> > >> > > Andre
> > >> > >
> > >> > > On Wed, 30 Nov 2016 07:49:13 +0100
> > >> > > Paul Richard Thomas  wrote:
> > >> > >
> > >> > >> Dear Andre,
> > >> > >>
> > >> > >> This all looks OK to me. The only comment that I have that you might
> > >> > >> deal with before committing is that some of the Boolean expressions,
> > >> > >> eg:
> > >> > >> +  int caf_dereg_mode
> > >> > >> +  = ((caf_mode & GFC_STRUCTURE_CAF_MODE_IN_COARRAY) != 0
> > >> > >> +  || c->attr.codimension)
> > >> > >> +  ? ((caf_mode & GFC_STRUCTURE_CAF_MODE_DEALLOC_ONLY) != 0
> > >> > >> +  ? GFC_CAF_COARRAY_DEALLOCATE_ONLY
> > >> > >> +  : GFC_CAF_COARRAY_DEREGISTER)
> > >> > >> +  : GFC_CAF_COARRAY_NOCOARRAY;
> > >> > >>
> > >> > >> are getting be sufficiently convoluted that a small, appropriately
> > >> > >> named, helper function might be clearer. Of course, this is true of
> > >> > >> many parts of gfortran but it is not too late to start making the
> > >> > >> code a bit clearer.
> > >> > >>
> > >> > >> You can commit to the present trunk as far as I am concerned. I know
> > >> > >> that the caf enthusiasts will test it to bits before release!
> > >> > >>
> > >> > >> Regards
> > >> > >>
> > >> > >> Paul
> > >> > >>
> > >> > >>
> > >> > >> On 28 November 2016 at 19:33, Andre Vehreschild 
> > >> > >> wrote:
> > >> > >> > PING!
> > >> > >> >
> > >> > >> > I know it's a lengthy patch, but comments would be nice anyway.
> > >> > >> >
> > >> > >> > - Andre
> > >> > >> >
> > >> > >> > On Tue, 22 Nov 2016 20:46:50 +0100
> > >> > >> > Andre Vehreschild  wrote:
> > >> > >> >
> > >> > >> >> Hi all,
> > >> > >> >>
> > >> > >> >> attached patch addresses the need of extending the API of the
> > >> > >> >> caf-libs to enable allocatable components asynchronous
> > >> > >> >> allocation. Allocatable components in derived type coarrays are
> > >> > >> >> different from regular coarrays or coarrayed components. The
> > >> > >> >> latter have to be allocated on all images or on none.
> > >> > >> >> Furthermore is the allocation a point of synchronisation.
> > >> > >> >>
> > >> > >> >> For allocatable components, F2008 allows some to be allocated
> > >> > >> >> on some images and not on others. Furthermore, registering with
> > >> > >> >> the caf-lib that an allocatable component is present in a
> > >> > >> >> derived type coarray is no longer a synchronisation point. To
> > >> > >> >> implement these features two new types of coarray
> > >> > >> >> registration have been introduced. The first one just
> > >> > >> >> registering the component with the caf-lib and the latter doing
> > >> > >> >> the allocate. Furthermore, the caf-API has been extended to
> > >> > >> >> provide a query function to learn about the allocation status of
> > >> > >> >> a component on a remote image.
> > >> > >> >>
> > >> > >> >> Sorry, that the patch is rather lengthy. Most of this is due to
> > >> > >> >> the structure_alloc_comps' signature change. The routine and its
> > >> > >> >> wrappers are used rather often which needed the appropriate
> > >> > >> >> changes.
> > >> > >> >>
> > >> > >> >> I know I left two or three TODOs in the patch to remind me of
> > >> 

PR78599

2016-12-01 Thread Prathamesh Kulkarni
Hi,
As mentioned in the PR, the issue seems to be that in
propagate_bits_accross_jump_functions(),
ipa_get_type() returns record_type during WPA and hence we pass
invalid precision to
ipcp_bits_lattice::meet_with (value, mask, precision), which eventually
leads to a runtime error.
The attached patch tries to fix that by bailing out if the type of the param
is not an integral or pointer type.
This happens for the edge from deque_test -> _Z4copyIPd1BEvT_S2_T0_.isra.0/9.

However I am not sure how ipcp_bits_lattice::meet_with (value, mask,
precision) gets called for this case. In
ipa_compute_jump_functions_for_edge(), we set jfunc->bits.known to
true only
if parm's type satisfies INTEGRAL_TYPE_P or POINTER_TYPE_P.
And ipcp_bits_lattice::meet_with (value, mask, precision) is called
only if jfunc->bits.known
is set to true. So I suppose it shouldn't really happen that
ipcp_bits_lattice::meet_with(value, mask, precision) gets called when
callee parameter's type is record_type, since the corresponding
argument's type would also need to be record_type and
jfunc->bits.known would be set to false.

Without -flto, parm_type is reference_type so that satisfies POINTER_TYPE_P,
but with -flto it's appearing to be record_type. Is this possibly the
same issue of TYPE_ARG_TYPES returning bogus types during WPA ?

I verified the attached patch fixes the runtime error with ubsan-built gcc.
Bootstrap+tested on x86_64-unknown-linux-gnu.
Cross-tested on arm*-*-*, aarch64*-*-*.
LTO bootstrap on x86_64-unknown-linux-gnu in progress.
Is it OK to commit if it succeeds?

Thanks,
Prathamesh
2016-12-01  Prathamesh Kulkarni  

PR ipa/78599
* ipa-cp.c (propagate_bits_accross_jump_function): Check if parm_type
is integral or pointer type.
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 2ec671f..28eb74c 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1770,12 +1770,15 @@ propagate_bits_accross_jump_function (cgraph_edge *cs, int idx, ipa_jump_func *j
   tree parm_type = ipa_get_type (callee_info, idx);
 
   /* For K&R C programs, ipa_get_type() could return NULL_TREE.
- Avoid the transform for these cases.  */
-  if (!parm_type)
+ Avoid the transform for these cases or if parm type is not
+ integral or pointer type.  */
+  if (!parm_type
+  || !(INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Setting dest_lattice to bottom, because"
-   " param %i type is NULL for %s\n", idx,
+   " param %i type is %s for %s\n", idx,
+   (parm_type == NULL) ? "NULL" : "non-integral",
cs->callee->name ());
 
   return dest_lattice->set_to_bottom ();


Fwd: Re: [PATCH, ARM/testsuite 6/7] Force soft float in ARMv6-M and ARMv8-M Baseline options

2016-12-01 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch fixing testing for ARMv8-M Baseline to 
our embedded-6-branch.



*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Best regards,

Thomas
--- Begin Message ---

On 22/09/16 16:47, Richard Earnshaw (lists) wrote:

On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing
because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the
arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like
-mthumb is
passed for architectures that only support Thumb instruction set. This
patch
adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add
options.
Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Is this ok for trunk?

Best regards,

Thomas


6_softfloat_testing_v6m_v8m_baseline.patch


diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 #   /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
-v4t "-march=armv4t" __ARM_ARCH_4T__
-v5 "-march=armv5 -marm" __ARM_ARCH_5__
-v5t "-march=armv5t" __ARM_ARCH_5T__
-v5te "-march=armv5te" __ARM_ARCH_5TE__
-v6 "-march=armv6" __ARM_ARCH_6__
-v6k "-march=armv6k" __ARM_ARCH_6K__
-v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
-v6z "-march=armv6z" __ARM_ARCH_6Z__
-v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
-v7a "-march=armv7-a" __ARM_ARCH_7A__
-v7r "-march=armv7-r" __ARM_ARCH_7R__
-v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
-v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-v8a "-march=armv8-a" __ARM_ARCH_8A__
-v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
-v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__
-v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+   v4 "-march=armv4 -marm" __ARM_ARCH_4__
+   v4t "-march=armv4t" __ARM_ARCH_4T__
+   v5 "-march=armv5 -marm" __ARM_ARCH_5__
+   v5t "-march=armv5t" __ARM_ARCH_5T__
+   v5te "-march=armv5te" __ARM_ARCH_5TE__
+   v6 "-march=armv6" __ARM_ARCH_6__
+   v6k "-march=armv6k" __ARM_ARCH_6K__
+   v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+   v6z "-march=armv6z" __ARM_ARCH_6Z__
+   v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+   v7a "-march=armv7-a" __ARM_ARCH_7A__
+   v7r "-march=armv7-r" __ARM_ARCH_7R__
+   v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+   v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+   v8a "-march=armv8-a" __ARM_ARCH_8A__
+   v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+   v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__
+   v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
proc check_effective_target_arm_arch_FUNC_ok { } {
if { [ string match "*-marm*" "FLAG" ] &&



I think if you're going to do this you need to also check that changing
the ABI in this way isn't incompatible with other aspects of how the
user has invoked dejagnu.


The reason this patch was made is that without it dg-require-effective-target 
arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf 
toolchain, but then any testcase containing a function for such a target (such 
as the atomic-op-* tests in gcc.target/arm) will error out, because ARMv8-M 
Baseline does not support the hard-float ABI.


I see 2 ways

[arm-embedded] [PATCH, GCC/ARM] Add multilib mapping for Cortex-M23 & Cortex-M33

2016-12-01 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add multilib support for ARM 
Cortex-M23 and Cortex-M33 to our embedded-6-branch.



*** gcc/ChangeLog ***

2016-11-30  Thomas Preud'homme  

* config/arm/t-rmprofile: Add mappings for Cortex-M23 and Cortex-M33.


Best regards,

Thomas
--- Begin Message ---

Hi,

With ARM Cortex-M23 and Cortex-M33 and the support for RM profile multilib added 
recently, it's time to add the corresponding CPU to architecture mappings in 
config/arm/t-rmprofile. Note that Cortex-M33 is mapped to ARMv8-M Mainline 
because there is no transitive closure of mappings and the multilib for ARMv8-M 
Mainline with DSP extensions is ARMv8-M Mainline.


ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-11-30  Thomas Preud'homme  

* config/arm/t-rmprofile: Add mappings for Cortex-M23 and Cortex-M33.


Testing: Linking fails before this patch when targeting one of these two cores 
and using rmprofile multilib but succeeds with the patch.


Is this ok for stage3?

Best regards,

Thomas
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index c8b5c9cbd03694eea69855e20372afa3e97d6b4c..93aa909b4d942ad9875a95e0d4397ff17b317905 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -102,6 +102,8 @@ MULTILIB_MATCHES   += march?armv6s-m=mcpu?cortex-m1.small-multiply
 MULTILIB_MATCHES   += march?armv7-m=mcpu?cortex-m3
 MULTILIB_MATCHES   += march?armv7e-m=mcpu?cortex-m4
 MULTILIB_MATCHES   += march?armv7e-m=mcpu?cortex-m7
+MULTILIB_MATCHES   += march?armv8-m.base=mcpu?cortex-m23
+MULTILIB_MATCHES   += march?armv8-m.main=mcpu?cortex-m33
 MULTILIB_MATCHES   += march?armv7=mcpu?cortex-r4
 MULTILIB_MATCHES   += march?armv7=mcpu?cortex-r4f
 MULTILIB_MATCHES   += march?armv7=mcpu?cortex-r5
--- End Message ---


[avr,committed]: Use SYMBOL_REF_P if possible.

2016-12-01 Thread Georg-Johann Lay

http://gcc.gnu.org/r243104

Use SYMBOL_REF_P instead of SYMBOL_REF == GET_CODE (...) in more places.

Committed as obvious.

Johann

gcc/
* config/avr/avr.c (avr_print_operand): Use SYMBOL_REF_P if possible.
(avr_handle_addr_attribute, avr_asm_output_aligned_decl_common)
(avr_asm_asm_output_aligned_bss, avr_addr_space_convert): Ditto.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 243099)
+++ config/avr/avr.c	(working copy)
@@ -2726,7 +2726,7 @@ avr_print_operand (FILE *file, rtx x, in
 }
   else if (code == 'i')
 {
-  if (GET_CODE (x) == SYMBOL_REF && (SYMBOL_REF_FLAGS (x) & SYMBOL_FLAG_IO))
+  if (SYMBOL_REF_P (x) && (SYMBOL_REF_FLAGS (x) & SYMBOL_FLAG_IO))
 	avr_print_operand_address
 	  (file, VOIDmode, plus_constant (HImode, x, -avr_arch->sfr_offset));
   else
@@ -9585,7 +9585,7 @@ avr_handle_addr_attribute (tree *node, t
 rtx
 avr_eval_addr_attrib (rtx x)
 {
-  if (GET_CODE (x) == SYMBOL_REF
+  if (SYMBOL_REF_P (x)
   && (SYMBOL_REF_FLAGS (x) & SYMBOL_FLAG_ADDRESS))
 {
   tree decl = SYMBOL_REF_DECL (x);
@@ -9896,7 +9896,7 @@ avr_asm_output_aligned_decl_common (FILE
   rtx symbol;
 
   if (mem != NULL_RTX && MEM_P (mem)
-  && GET_CODE ((symbol = XEXP (mem, 0))) == SYMBOL_REF
+  && SYMBOL_REF_P ((symbol = XEXP (mem, 0)))
   && (SYMBOL_REF_FLAGS (symbol) & (SYMBOL_FLAG_IO | SYMBOL_FLAG_ADDRESS)))
 {
 
@@ -9941,7 +9941,7 @@ avr_asm_asm_output_aligned_bss (FILE *fi
   rtx symbol;
 
   if (mem != NULL_RTX && MEM_P (mem)
-  && GET_CODE ((symbol = XEXP (mem, 0))) == SYMBOL_REF
+  && SYMBOL_REF_P ((symbol = XEXP (mem, 0)))
   && (SYMBOL_REF_FLAGS (symbol) & (SYMBOL_FLAG_IO | SYMBOL_FLAG_ADDRESS)))
 {
   if (!(SYMBOL_REF_FLAGS (symbol) & SYMBOL_FLAG_ADDRESS))
@@ -12715,7 +12715,7 @@ avr_addr_space_convert (rtx src, tree ty
  but are located in flash.  In that case we patch the incoming
  address space.  */
 
-  if (SYMBOL_REF == GET_CODE (sym)
+  if (SYMBOL_REF_P (sym)
   && ADDR_SPACE_FLASH == AVR_SYMBOL_GET_ADDR_SPACE (sym))
 {
   as_from = ADDR_SPACE_FLASH;


Re: [PING] (v2) Add a "compact" mode to print_rtx_function

2016-12-01 Thread Dominik Vogt
On Tue, Nov 22, 2016 at 10:38:42AM -0500, David Malcolm wrote:
> On Tue, 2016-11-22 at 15:45 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> > > On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> > > > Can't it be done only if xloc.file contains any fancy characters?
> > > 
> > > Sure, but why? Strings generally get emitted with quotes around
> > > them, I
> > > don't see a good reason for filenames to be different, especially
> > > if it
> > > makes the output easier to parse.
> > 
> > Because printing common filenames matches what we emit in
> > diagnostics,
> > what e.g. sanitizers emit at runtime diagnostics, what we emit as
> > locations
> > in gimple dumps etc.
> 
> It sounds like a distinction between human-readable vs machine
> -readable.
> 
> How about something like the following, which only adds the quotes if
> outputting the RTL FE's input format?
> 
> Does this fix the failing tests?

> From 642d511fdba3a33fb18ce46c549f7c972ed6b14e Mon Sep 17 00:00:00 2001
> From: David Malcolm 
> Date: Tue, 22 Nov 2016 11:06:41 -0500
> Subject: [PATCH] print-rtl.c: conditionalize quotes for filenames
> 
> gcc/ChangeLog:
>   * print-rtl.c (rtx_writer::print_rtx_operand_code_i): Only use
>   quotes for filenames when in compact mode.
> ---
>  gcc/print-rtl.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
> index 77e6b05..5370602 100644
> --- a/gcc/print-rtl.c
> +++ b/gcc/print-rtl.c
> @@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
> int idx)
>if (INSN_HAS_LOCATION (in_insn))
>   {
> expanded_location xloc = insn_location (in_insn);
> -   fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
> +   if (m_compact)
> + fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
> +   else
> + fprintf (m_outfile, " %s:%i", xloc.file, xloc.line);
>   }
>  #endif
>  }
> -- 
> 1.8.5.3

I'd like to get our test failure fixed, either by changing
print-rtl.c or our test case.  Is the above patch good for trunk?
It does fix the s390 test failure.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[testsuite,avr,committed]: Add target avr_tiny for respective tests.

2016-12-01 Thread Georg-Johann Lay
Added target avr_tiny filter to 2 tests that are obviously intended to 
run on AVR_TINY.


Applied as obvious.

Johann

gcc/testsuite/
* gcc.target/avr/tiny-memx.c: Only perform if target avr_tiny.
* gcc.target/avr/tiny-caller-save.c: Dito.

Index: gcc.target/avr/tiny-caller-save.c
===
--- gcc.target/avr/tiny-caller-save.c   (revision 243104)
+++ gcc.target/avr/tiny-caller-save.c   (revision 243105)
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target avr_tiny } } */
 /* { dg-options "-mmcu=avrtiny -gdwarf -Os" } */

/* This is a stripped down piece of libgcc2.c that triggerd an ICE for avr with

Index: gcc.target/avr/tiny-memx.c
===
--- gcc.target/avr/tiny-memx.c  (revision 243104)
+++ gcc.target/avr/tiny-memx.c  (revision 243105)
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target avr_tiny } } */
 /* { dg-options "-mmcu=avrtiny" } */

 const __memx char ascmonth[] = "Jan"; /* { dg-error "not supported" } */


[PATCH v3] Support ASan ODR indicators at compiler side.

2016-12-01 Thread Maxim Ostapenko

Hi,

this is the third attempt to support ASan ODR indicators in GCC. I've 
fixed the issue with the odr indicator name (now we don't use 
ASM_GENERATE_INTERNAL_LABEL, and we use XALLOCAVEC to avoid overflow).

Several style issues from the previous review were also addressed.
How does it look now?

Tested and ASan bootstrapped on x86_64-unknown-linux-gnu.

-Maxim
commit 35cb5eb935bb1e29357e93f8e9a4e87fb4e9f511
Author: Maxim Ostapenko 
Date:   Fri Oct 28 10:22:03 2016 +0300

Add support for ASan odr_indicator.

config/

	* bootstrap-asan.mk: Replace LSAN_OPTIONS=detect_leaks=0 with
	ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1.

gcc/

	* asan.c (asan_global_struct): Refactor.
	(create_odr_indicator): New function.
	(asan_needs_odr_indicator_p): Likewise.
	(is_odr_indicator): Likewise.
	(asan_add_global): Introduce odr_indicator_ptr. Pass it into global's
	constructor.
	(asan_protect_global): Do not protect odr indicators.

gcc/testsuite/

	* c-c++-common/asan/no-redundant-odr-indicators-1.c: New test.

diff --git a/config/ChangeLog b/config/ChangeLog
index ed59787..a5d5ff5 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,8 @@
+2016-12-01  Maxim Ostapenko  
+
+	* bootstrap-asan.mk: Replace LSAN_OPTIONS=detect_leaks=0 with
+	ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1.
+
 2016-11-30  Matthias Klose  
 
 	* pkg.m4: New file.
diff --git a/config/bootstrap-asan.mk b/config/bootstrap-asan.mk
index 70baaf9..e73d4c2 100644
--- a/config/bootstrap-asan.mk
+++ b/config/bootstrap-asan.mk
@@ -1,7 +1,7 @@
 # This option enables -fsanitize=address for stage2 and stage3.
 
 # Suppress LeakSanitizer in bootstrap.
-export LSAN_OPTIONS="detect_leaks=0"
+export ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1
 
 STAGE2_CFLAGS += -fsanitize=address
 STAGE3_CFLAGS += -fsanitize=address
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b3cc6305..f19ac9d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-12-01  Maxim Ostapenko  
+
+	* asan.c (asan_global_struct): Refactor.
+	(create_odr_indicator): New function.
+	(asan_needs_odr_indicator_p): Likewise.
+	(is_odr_indicator): Likewise.
+	(asan_add_global): Introduce odr_indicator_ptr. Pass it into global's
+	constructor.
+	(asan_protect_global): Do not protect odr indicators.
+
 2016-12-01  Jakub Jelinek  
 
 	PR target/78614
diff --git a/gcc/asan.c b/gcc/asan.c
index cb5d615..9db7481 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1388,6 +1388,16 @@ asan_needs_local_alias (tree decl)
   return DECL_WEAK (decl) || !targetm.binds_local_p (decl);
 }
 
+/* Return true if DECL, a global var, is an artificial ODR indicator symbol
+   therefore doesn't need protection.  */
+
+static bool
+is_odr_indicator (tree decl)
+{
+  return (DECL_ARTIFICIAL (decl)
+	  && lookup_attribute ("asan odr indicator", DECL_ATTRIBUTES (decl)));
+}
+
 /* Return true if DECL is a VAR_DECL that should be protected
by Address Sanitizer, by appending a red zone with protected
shadow memory after it and aligning it to at least
@@ -1436,7 +1446,8 @@ asan_protect_global (tree decl)
   || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
   || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
   || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE
-  || TREE_TYPE (decl) == ubsan_get_source_location_type ())
+  || TREE_TYPE (decl) == ubsan_get_source_location_type ()
+  || is_odr_indicator (decl))
 return false;
 
   rtl = DECL_RTL (decl);
@@ -2266,14 +2277,15 @@ asan_dynamic_init_call (bool after_p)
 static tree
 asan_global_struct (void)
 {
-  static const char *field_names[8]
+  static const char *field_names[]
 = { "__beg", "__size", "__size_with_redzone",
-	"__name", "__module_name", "__has_dynamic_init", "__location", "__odr_indicator"};
-  tree fields[8], ret;
-  int i;
+	"__name", "__module_name", "__has_dynamic_init", "__location",
+	"__odr_indicator" };
+  tree fields[ARRAY_SIZE (field_names)], ret;
+  unsigned i;
 
   ret = make_node (RECORD_TYPE);
-  for (i = 0; i < 8; i++)
+  for (i = 0; i < ARRAY_SIZE (field_names); i++)
 {
   fields[i]
 	= build_decl (UNKNOWN_LOCATION, FIELD_DECL,
@@ -2295,6 +2307,60 @@ asan_global_struct (void)
   return ret;
 }
 
+/* Create and return odr indicator symbol for DECL.
+   TYPE is __asan_global struct type as returned by asan_global_struct.  */
+
+static tree
+create_odr_indicator (tree decl, tree type)
+{
+  char *name;
+  tree uptr = TREE_TYPE (DECL_CHAIN (TYPE_FIELDS (type)));
+  tree decl_name
+= (HAS_DECL_ASSEMBLER_NAME_P (decl) ? DECL_ASSEMBLER_NAME (decl)
+	: DECL_NAME (decl));
+  /* DECL_NAME theoretically might be NULL.  Bail out with 0 in this case.  */
+  if (decl_name == NULL_TREE)
+return build_int_cst (uptr, 0);
+  int len = strlen (IDENTIFIER_POINTER (decl_name))
+	+ sizeof ("__odr_asan_") + 1;
+  name = XALLOCAVEC (char, len);
+  name[len] = '\0';
+  snprintf (name, len, "__odr_asan_%s",

Re: [PATCH v3] Support ASan ODR indicators at compiler side.

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 01:25:43PM +0300, Maxim Ostapenko wrote:
> +  int len = strlen (IDENTIFIER_POINTER (decl_name))
> + + sizeof ("__odr_asan_") + 1;

Please use size_t len instead of int len.  Why the + 1?  sizeof ("__odr_asan_")
should be already strlen ("__odr_asan_") + 1.

> +  name = XALLOCAVEC (char, len);
> +  name[len] = '\0';

This is buffer overflow.  Why do you need it at all?

> +  snprintf (name, len, "__odr_asan_%s", IDENTIFIER_POINTER (decl_name));

This should zero terminate the string.

Also, shouldn't this be followed by:
#ifndef NO_DOT_IN_LABEL
  name[sizeof ("__odr_asan") - 1] = '.';
#elif !defined(NO_DOLLAR_IN_LABEL)
  name[sizeof ("__odr_asan") - 1] = '$';
#endif

to make it not possible to clash with user symbols __odr_asan_foobar etc.
if possible (on targets which don't allow dots nor dollars in labels that is
not really possible, but at least elsewhere).

That said, if the __odr_asan* symbols are exported, it is part of ABI, so
what exactly does LLVM use in those cases?

> +/* { dg-final { scan-assembler-not ".*odr_asan_a.*" } } */
> +/* { dg-final { scan-assembler-not ".*odr_asan_b.*" } } */
> +/* { dg-final { scan-assembler ".*odr_asan_c.*" } } */

The .* on either side makes no sense, please remove those.
And, if the dot or dollar is replacing _, you need to use "odr_asan.a"
etc. in the regexps.

Otherwise LGTM.

Jakub


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-01 Thread Richard Earnshaw (lists)
On 30/11/16 21:43, Cary Coutant wrote:
> How about if instead of special DW_OP codes, you instead define a new
> virtual register that contains the mangled return address? If the rule
> for that virtual register is anything other than DW_CFA_undefined,
> you'd expect to find the mangled return address using that rule;
> otherwise, you would use the rule for LR instead and expect an
> unmangled return address. The earlier example would become (picking an
> arbitrary value of 120 for the new virtual register number):
> 
> .cfi_startproc
>0x0  paciasp (this instruction sign return address register LR/X30)
> .cfi_val 120, DW_OP_reg30
>0x4  stp x29, x30, [sp, -32]!
> .cfi_offset 120, -16
> .cfi_offset 29, -32
> .cfi_def_cfa_offset 32
>0x8  add x29, sp, 0
> 
> Just a suggestion...

What about signing other registers?  And what if the value is then
copied to another register?  Don't you end up with every possible
register (including the FP/SIMD registers) needing a shadow copy?

R.

> 
> -cary
> 
> 
> On Wed, Nov 16, 2016 at 6:02 AM, Jakub Jelinek  wrote:
>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:
>>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:
   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref 
 were
 designed as shortcut operations when LR is signed with A key and using
 function's CFA as salt.  This is the default behaviour of return address
 signing so is expected to be used for most of the time.  
 DW_OP_AARCH64_pauth
 is designed as a generic operation that allow describing pointer signing on
 any value using any salt and key in case we can't use the shortcut 
 operations
 we can use this.
>>>
>>> I admit to not fully understand the salting/keying involved. But given
>>> that the DW_OP space is really tiny, so we would like to not eat up too
>>> many of them for new opcodes. And given that introducing any new DW_OPs
>>> using for CFI unwinding will break any unwinder anyway causing us to
>>> update them all for this new feature. Have you thought about using a new
>>> CIE augmentation string character for describing that the return
>>> address/link register used by a function/frame is salted/keyed?
>>>
>>> This seems a good description of CIE records and augmentation
>>> characters: http://www.airs.com/blog/archives/460
>>>
>>> It obviously also involves updating all unwinders to understand the new
>>> augmentation character (and possible arguments). But it might be more
>>> generic and saves us from using up too many DW_OPs.
>>
>> From what I understood, the return address is not always scrambled, so
>> it doesn't apply to the whole function, just to most of it (except for
>> an insn in the prologue and some in the epilogue).  So I think one op is
>> needed.  But can't it be just a toggable flag whether the return address
>> is scrambled + some arguments to it?
>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
>> way of scrambling starts here (if not already active) or any kind of
>> scrambling ends here (if already active), and
>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need
>> to represent details of the less common variants with details what to do.
>> Then you'd just hook through some MD_* macro in the unwinder the
>> descrambling operation if the scrambling is active at the insns you unwind
>> on.
>>
>> Jakub



[PR middle-end/78548] fix latent double free in tree-ssa-uninit.c

2016-12-01 Thread Aldy Hernandez
This looks like a latent problem in the -Wmaybe-uninitialized code 
unrelated to my work.


The problem here is a sequence of simplify_preds() followed by 
normalize_preds() that I added, but is based on prior art all over the file:


+  simplify_preds (&uninit_preds, NULL, false);
+  uninit_preds = normalize_preds (uninit_preds, NULL, false);

The problem is that in this particular testcase we trigger 
simplify_preds_4, which makes a copy of a chain, frees the chain, and 
then tries to use the chain later (in normalize_preds).  The 
normalize_preds() function tries to free the chain again and we blow up:


This is the main problem in simplify_preds_4:

  /* Now clean up the chain.  */
  if (simplified)
{
  for (i = 0; i < n; i++)
{
  if ((*preds)[i].is_empty ())
continue;
  s_preds.safe_push ((*preds)[i]);
// 
// Makes a copy of the pred_chain.
}

  destroy_predicate_vecs (preds);
// ^^^
// free() all the pred_chain's.

  (*preds) = s_preds;
// ^^
// Wait a minute, we still keep a copy of the pred_chains.
  s_preds = vNULL;
}

I have no idea how this worked even before my patch.  Perhaps we never 
had a simplify_preds() followed by a normalize_preds() where the 
simplification involved a call to simplify_preds_4.


Interestingly enough, simplify_preds_2() has the exact same code, but 
with the fix I am proposing, so this seems like an oversight.  Also, the 
fact that the simplification in simplify_preds_2 is more common than the 
one performed in simplify_preds_4 would suggest that simplify_preds_4 
was uncommon enough that it probably wasn't being used in a 
simplify_preds/normalize_preds combo.


Anyways... I've made some other cleanups to the code, but the main gist 
of the entire patch is:


-  destroy_predicate_vecs (preds);
+  preds->release ();

That is, release preds, but don't free the memory associated with the 
pred_chains therein.


This patch is on top of my pending patch here:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02900.html

Tested on x86-64 Linux.

OK for trunk?
Aldy
commit ec4443b8dcf89465cb8d9735a3e0a27b181c975c
Author: Aldy Hernandez 
Date:   Thu Dec 1 04:53:38 2016 -0500

PR middle-end/78548
* tree-ssa-uninit.c (simplify_preds_4): Call release() instead of
destroy_predicate_vecs.
(uninit_uses_cannot_happen): Make uninit_preds a scalar.

diff --git a/gcc/testsuite/gcc.dg/uninit-pr78548.c 
b/gcc/testsuite/gcc.dg/uninit-pr78548.c
new file mode 100644
index 000..12e06dd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr78548.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -w -O2" } */
+
+char a;
+int b;
+unsigned c, d;
+short e;
+int main_f;
+int main (  ) {
+L0:
+if ( e ) goto L1;
+b = c & d || a;
+if ( !c ) printf ( "", ( long long ) main_f );
+if ( d || !c ) {
+printf ( "%llu\n", ( long long ) main );
+goto L2;
+}
+unsigned g = b;
+L1:
+b = g;
+L2:
+if ( b ) goto L0;
+  return 0;
+}
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index a648995..b4892c7 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -1774,7 +1774,7 @@ simplify_preds_4 (pred_chain_union *preds)
  s_preds.safe_push ((*preds)[i]);
}
 
-  destroy_predicate_vecs (preds);
+  preds->release ();
   (*preds) = s_preds;
   s_preds = vNULL;
 }
@@ -2211,10 +2211,9 @@ uninit_uses_cannot_happen (gphi *phi, unsigned 
uninit_opnds,
 
   /* Look for the control dependencies of all the uninitialized
  operands and build guard predicates describing them.  */
-  unsigned i;
-  pred_chain_union uninit_preds[max_phi_args];
-  memset (uninit_preds, 0, sizeof (pred_chain_union) * phi_args);
-  for (i = 0; i < phi_args; ++i)
+  pred_chain_union uninit_preds;
+  bool ret = true;
+  for (unsigned i = 0; i < phi_args; ++i)
 {
   if (!MASK_TEST_BIT (uninit_opnds, i))
continue;
@@ -2226,26 +2225,32 @@ uninit_uses_cannot_happen (gphi *phi, unsigned 
uninit_opnds,
   int num_calls = 0;
 
   /* Build the control dependency chain for uninit operand `i'...  */
+  uninit_preds = vNULL;
   if (!compute_control_dep_chain (find_dom (e->src),
  e->src, dep_chains, &num_chains,
  &cur_chain, &num_calls))
-   return false;
+   {
+ ret = false;
+ break;
+   }
   /* ...and convert it into a set of predicates.  */
   convert_control_dep_chain_into_preds (dep_chains, num_chains,
-   &uninit_preds[i]);
+   &uninit_preds);
   for (size_t j = 0; j < num_chains; ++j)
dep_chains[j].release ();
-  simplify_preds (&uninit_preds[i], NULL, false)

Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-01 Thread Jiong Wang

On 01/12/16 10:42, Richard Earnshaw (lists) wrote:

On 30/11/16 21:43, Cary Coutant wrote:

How about if instead of special DW_OP codes, you instead define a new
virtual register that contains the mangled return address? If the rule
for that virtual register is anything other than DW_CFA_undefined,
you'd expect to find the mangled return address using that rule;
otherwise, you would use the rule for LR instead and expect an
unmangled return address. The earlier example would become (picking an
arbitrary value of 120 for the new virtual register number):

 .cfi_startproc
0x0  paciasp (this instruction sign return address register LR/X30)
 .cfi_val 120, DW_OP_reg30
0x4  stp x29, x30, [sp, -32]!
 .cfi_offset 120, -16
 .cfi_offset 29, -32
 .cfi_def_cfa_offset 32
0x8  add x29, sp, 0

Just a suggestion...

What about signing other registers?  And what if the value is then
copied to another register?  Don't you end up with every possible
register (including the FP/SIMD registers) needing a shadow copy?


  
  Another issue is that, compared with the DW_CFA approach, this virtual
  register approach is less efficient in unwind table size and more complex
  to implement.

  .cfi_register takes two ULEB128 register numbers, so it needs 3 bytes
   rather than DW_CFA's 1 byte.  For example, the .debug_frame section size
   increase for the Linux kernel will be ~14% compared with the DW_CFA
   approach's 5%.

  In the implementation, the prologue then normally will be

 .cfi_startproc
0x0  paciasp (this instruction sign return address register LR/X30)
 .cfi_val 120, DW_OP_reg30  <-A
0x4  stp x29, x30, [sp, -32]!
 .cfi_offset 120, -16   <-B
 .cfi_offset 29, -32
 .cfi_def_cfa_offset 32

The epilogue normally will be
...
ldp x29, x30, [sp], 32
  .cfi_val 120, DW_OP_reg30  <- C
  .cfi_restore 29
  .cfi_def_cfa 31, 0

autiasp (this instruction unsign LR/X30)
  .cfi_restore 30

   For the virtual register approach, GCC needs to track DWARF generation
    for LR/X30 in every place (A/B/C, and maybe some other rare LR copy
    places), and rewrite LR to the new virtual register accordingly.  This
    seems easy, but my experience shows GCC won't do any DWARF
    auto-deduction if you have one explicit DWARF CFI note attached to an
    insn (handled_one will be true in dwarf2out_frame_debug).  So for
    instructions like stp/ldp, we then need to generate all three DWARF CFI
    notes manually.

    While for the DW_CFA approach, it will be:

 .cfi_startproc
0x0  paciasp (this instruction sign return address register LR/X30)
 .cfi_cfa_window_save
0x4  stp x29, x30, [sp, -32]! \
 .cfi_offset 30, -16  |
 .cfi_offset 29, -32  |
 .cfi_def_cfa_offset 32   |  all dwarf generation between sign and
...   |  unsign (paciasp/autiasp) is the same
ldp x29, x30, [sp], 16|  as before
  .cfi_restore 30 |
  .cfi_restore 29 |
  .cfi_def_cfa 31, 0  |
  /
autiasp (this instruction unsign LR/X30)
  .cfi_cfa_window_save

   The DWARF generation implementation in the backend is very simple:
   nothing needs to be updated between the sign and unsign instructions.

 For the impact on the unwinder, the virtual register approach needs to
 change the implementation of the "save value" rule, which is quite general
 code.  A target hook might be needed for AArch64 for when the destination
 register is the special virtual register; that seems a little bit hacky
 to me.


-cary


On Wed, Nov 16, 2016 at 6:02 AM, Jakub Jelinek  wrote:

On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:

On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:

   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were
designed as shortcut operations when LR is signed with A key and using
function's CFA as salt.  This is the default behaviour of return address
signing so is expected to be used for most of the time.  DW_OP_AARCH64_pauth
is designed as a generic operation that allow describing pointer signing on
any value using any salt and key in case we can't use the shortcut operations
we can use this.

I admit to not fully understand the salting/keying involved. But given
that the DW_OP space is really tiny, so we would like to not eat up too
many of them for new opcodes. And given that introducing any new DW_OPs
using for CFI unwinding will break any unwinder anyway causing us to
update them all for this new feature. Have you thought about using a new
CIE augmentation string character for describing that the return
address/link register used by a function/frame is salted/keyed?

This seems a good description of CIE records and augmentation
characters: http://www.airs.co

Re: [Patch Doc] Update documentation for __fp16 type

2016-12-01 Thread James Greenhalgh

On Wed, Nov 30, 2016 at 05:58:13PM +, Joseph Myers wrote:
> On Wed, 30 Nov 2016, James Greenhalgh wrote:
>
> > +@code{_Float16} type defined by ISO/IEC TS18661:3-2005
>
> Add a space after "TS", and it's -3:2015 not :3-2005.

You would think that after 2 months of having the specification sitting
on my desk I'd have got that right... Fixed in this revision.

> I think the -mfp16-format documentation in invoke.texi should also be
> updated to reflect that it affects availability of _Float16.

I'm working on something larger for -mfp16-format, I'll update invoke.texi
at that point (or otherwise before GCC 7 releases).

Thanks,
James

---

2016-12-01  James Greenhalgh  

* doc/extend.texi (Half-Precision): Update to document current
compiler behaviour.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7d3d17a..23d03bd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1012,11 +1012,12 @@ that handle conversions if/when long double is changed to be IEEE
 @cindex half-precision floating point
 @cindex @code{__fp16} data type
 
-On ARM targets, GCC supports half-precision (16-bit) floating point via
-the @code{__fp16} type.  You must enable this type explicitly
-with the @option{-mfp16-format} command-line option in order to use it.
+On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
+point via the @code{__fp16} type defined in the ARM C Language Extensions.
+On ARM systems, you must enable this type explicitly with the
+@option{-mfp16-format} command-line option in order to use it.
 
-ARM supports two incompatible representations for half-precision
+ARM targets support two incompatible representations for half-precision
 floating-point values.  You must choose one of the representations and
 use it consistently in your program.
 
@@ -1031,22 +1032,20 @@ format, but does not support infinities or NaNs.  Instead, the range
 of exponents is extended, so that this format can represent normalized
 values in the range of @math{2^{-14}} to 131008.
 
-The @code{__fp16} type is a storage format only.  For purposes
-of arithmetic and other operations, @code{__fp16} values in C or C++
-expressions are automatically promoted to @code{float}.  In addition,
-you cannot declare a function with a return value or parameters
-of type @code{__fp16}.
+The GCC port for AArch64 only supports the IEEE 754-2008 format, and does
+not require use of the @option{-mfp16-format} command-line option.
 
-Note that conversions from @code{double} to @code{__fp16}
-involve an intermediate conversion to @code{float}.  Because
-of rounding, this can sometimes produce a different result than a
-direct conversion.
+The @code{__fp16} type may only be used as an argument to intrinsics defined
+in @code{}, or as a storage format.  For purposes of
+arithmetic and other operations, @code{__fp16} values in C or C++
+expressions are automatically promoted to @code{float}.
 
-ARM provides hardware support for conversions between
+The ARM target provides hardware support for conversions between
 @code{__fp16} and @code{float} values
-as an extension to VFP and NEON (Advanced SIMD).  GCC generates
-code using these hardware instructions if you compile with
-options to select an FPU that provides them;
+as an extension to VFP and NEON (Advanced SIMD), and from ARMv8 provides
+hardware support for conversions between @code{__fp16} and @code{double}
+values.  GCC generates code using these hardware instructions if you
+compile with options to select an FPU that provides them;
 for example, @option{-mfpu=neon-fp16 -mfloat-abi=softfp},
 in addition to the @option{-mfp16-format} option to select
 a half-precision format.
@@ -1054,8 +1053,12 @@ a half-precision format.
 Language-level support for the @code{__fp16} data type is
 independent of whether GCC generates code using hardware floating-point
 instructions.  In cases where hardware support is not specified, GCC
-implements conversions between @code{__fp16} and @code{float} values
-as library calls.
+implements conversions between @code{__fp16} and other types as library
+calls.
+
+It is recommended that code which is intended to be portable use the
+@code{_Float16} type defined by ISO/IEC TS 18661-3:2015
+(@xref{Floating Types}).
 
 @node Decimal Float
 @section Decimal Floating Types


Re: [PATCH][TER] PR 48863: Don't replace expressions across local register variable definitions

2016-12-01 Thread Kyrill Tkachov


On 24/11/16 15:12, Richard Biener wrote:

On Thu, Nov 24, 2016 at 2:57 PM, Kyrill Tkachov
 wrote:

Hi all,

In this bug we have TER during out-of-ssa misbehaving. A minimal example is:
void foo(unsigned a, float b)
{
   unsigned c = (unsigned) b;  // 1
   register unsigned r0 __asm__("r0") = a; // 2
   register unsigned r1 __asm__("r1") = c; // 3
 __asm__ volatile( "str %[r1], [%[r0]]\n"
   :
   : [r0] "r" (r0),
 [r1] "r" (r1));
}

Statement 1 can produce a libcall to convert float b into an int, and TER
moves it into statement 3.  But the libcall clobbers r0, which we want set
to 'a' in statement 2, so r0 gets clobbered by the argument to the
conversion libcall.

TER already has code to avoid substituting across function calls.  Ideally
we'd teach it not to substitute expressions that can perform a libcall
across register variable definitions where the register can be clobbered
in a libcall, but that information is not easy to get hold of in a general
way at this level.

So this patch disallows replacement across any local register variable
definition.  It does this by keeping track of the local register
definitions encountered, in a similar way to how calls are counted for
TER purposes.

I hope this is not too big a hammer: local register variables are not very
widely used, we only disable replacement across them, and it allows us to
fix the wrong-code bug in some inline-asm usage scenarios where gcc
currently miscompiles code that follows its documented advice [1].

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64.

Is this approach acceptable?

Ok.


Thanks.
This has been in trunk for a week without any problems.
Is it ok to backport to the branches?
I have bootstrapped and tested arm-none-linux-gnueabihf on them.

Kyrill


Thanks,
Richard.


Thanks,
Kyrill

[1]
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables

2016-11-24  Kyrylo Tkachov  

 PR target/48863
 PR inline-asm/70184
 * tree-ssa-ter.c (temp_expr_table): Add reg_vars_cnt field.
 (new_temp_expr_table): Initialise reg_vars_cnt.
 (free_temp_expr_table): Release reg_vars_cnt.
 (process_replaceable): Add reg_vars_cnt argument, set reg_vars_cnt
 field of TAB.
 (find_replaceable_in_bb): Use the above to record register variable
 write occurrences and cancel replacement across them.

2016-11-24  Kyrylo Tkachov  

 PR target/48863
 PR inline-asm/70184
 * gcc.target/arm/pr48863.c: New test.




Re: [Patches] Add variant constexpr support for visit, comparisons and get

2016-12-01 Thread Jonathan Wakely

On 30/11/16 19:29 -0800, Tim Shen wrote:

On Wed, Nov 30, 2016 at 8:27 AM, Jonathan Wakely wrote:

On 26/11/16 21:38 -0800, Tim Shen wrote:

+  template<typename _Type, bool = is_literal_type_v<_Type>>
struct _Uninitialized;



I'm still unsure that is_literal_type is the right trait here. If it's
definitely right then we should probably *not* deprecate it in C++17!


No it's not right. We need this only because [basic.types]p10.5.3 (in n4606):

 if it (a literal type) is a union, at least one of its non-static
data members is of non-volatile literal type, ...

is not implemented. In the current GCC implementation, however, all
non-static data members need to be literal types, in order to create a
literal union.

With the current GCC implementation, to keep our goal, which is to
make _Variadic_union a literal type, we need to ensure that
_Uninitialized<T> is a literal type, by specializing on T:
1) if is_literal_type_v<T>, store a T;
2) otherwise, store a raw buffer of T.

In the future, when [basic.types]p10.5.3 is implemented, we don't need
is_literal_type_v.

I'll add a comment here.


Thanks, that will stop me asking again and again in future ;-)

I think we want to get [basic.types] p10 implemented before we declare
C++17 support non-experimental, so we don't have to change
std::variant layout later.


I didn't check for other compilers.


That's fine, the current approach should work for them too.



[PATCH v2] improve folding of expressions that move a single bit around

2016-12-01 Thread Paolo Bonzini
In code like the following from KVM:

/* it is a read fault? */
error_code = (exit_qualification << 2) & PFERR_FETCH_MASK;

it would be nicer to write

/* it is a read fault? */
error_code = (exit_qualification & VMX_EPT_READ_FAULT_MASK) ? PFERR_FETCH_MASK : 0;

instead of having to know the difference between the positions of the
source and destination bits.  LLVM catches the latter just fine (which
is why I am sending this in stage 3...), but GCC does not, so this
patch adds two patterns to catch it.

The combine.c hunk of v1 has been committed already.

Bootstrapped/regtested x86_64-pc-linux-gnu, ok?

Paolo

2016-11-26  Paolo Bonzini  

* match.pd: Simplify X ? C : 0 where C is a power of 2 and
X tests a single bit.

2016-11-26  Paolo Bonzini  

* gcc.dg/fold-and-lshift.c, gcc.dg/fold-and-rshift-1.c,
gcc.dg/fold-and-rshift-2.c: New testcases.

Index: match.pd
===
--- match.pd(revision 242916)
+++ match.pd(working copy)
@@ -2630,6 +2630,21 @@
   (cmp (bit_and@2 @0 integer_pow2p@1) @1)
   (icmp @2 { build_zero_cst (TREE_TYPE (@0)); })))
  
+/* If we have (A & C) != 0 ? D : 0 where C and D are powers of 2,
+   convert this into a shift followed by ANDing with D.  */
+(simplify
+ (cond
+  (ne (bit_and @0 integer_pow2p@1) integer_zerop)
+  integer_pow2p@2 integer_zerop)
+ (with {
+int shift = wi::exact_log2 (@2) - wi::exact_log2 (@1);
+  }
+  (if (shift > 0)
+   (bit_and
+(lshift (convert @0) { build_int_cst (integer_type_node, shift); }) @2)
+   (bit_and
+(convert (rshift @0 { build_int_cst (integer_type_node, -shift); })) @2
+
 /* If we have (A & C) != 0 where C is the sign bit of A, convert
this into A < 0.  Similarly for (A & C) == 0 into A >= 0.  */
 (for cmp (eq ne)
@@ -2644,6 +2659,19 @@
(with { tree stype = signed_type_for (TREE_TYPE (@0)); }
 (ncmp (convert:stype @0) { build_zero_cst (stype); })
 
+/* If we have A < 0 ? C : 0 where C is a power of 2, convert
+   this into a right shift followed by ANDing with C.  */
+(simplify
+ (cond
+  (lt @0 integer_zerop)
+  integer_pow2p@1 integer_zerop)
+ (with {
+int shift = element_precision (@0) - wi::exact_log2 (@1) - 1;
+  }
+  (bit_and
+   (convert (rshift @0 { build_int_cst (integer_type_node, shift); }))
+   @1)))
+
 /* When the addresses are not directly of decls compare base and offset.
This implements some remaining parts of fold_comparison address
comparisons but still no complete part of it.  Still it is good
Index: testsuite/gcc.dg/fold-and-lshift.c
===
--- testsuite/gcc.dg/fold-and-lshift.c  (revision 0)
+++ testsuite/gcc.dg/fold-and-lshift.c  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-original" } */
+
+int f(int x)
+{
+   return (x << 2) & 128;
+}
+
+int g(int x)
+{
+   return !!(x & 32) << 7;
+}
+
+int h(int x)
+{
+   return ((x >> 5) & 1) << 7;
+}
+
+int i(int x)
+{
+   return (x & 32) >> 5 << 7;
+}
+
+int j(int x)
+{
+   return ((x >> 5) & 1) ? 128 : 0;
+}
+
+int k(int x)
+{
+   return (x & 32) ? 128 : 0;
+}
+
+/* { dg-final { scan-tree-dump-not " \\? " "original" } } */
+/* { dg-final { scan-assembler-not "sarl" { target i?86-*-* x86_64-*-* } } } */
Index: testsuite/gcc.dg/fold-and-rshift-1.c
===
--- testsuite/gcc.dg/fold-and-rshift-1.c(revision 0)
+++ testsuite/gcc.dg/fold-and-rshift-1.c(working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-original" } */
+
+int f(int x)
+{
+   return (x >> 2) & 128;
+}
+
+int g(int x)
+{
+   return !!(x & 512) << 7;
+}
+
+int h(int x)
+{
+   return ((x >> 9) & 1) << 7;
+}
+
+int i(int x)
+{
+   return (x & 512) >> 9 << 7;
+}
+
+int j(int x)
+{
+   return ((x >> 9) & 1) ? 128 : 0;
+}
+
+int k(int x)
+{
+   return (x & 512) ? 128 : 0;
+}
+
+/* { dg-final { scan-tree-dump-not " \\? " "original" } } */
+/* { dg-final { scan-assembler-not "sall" { target i?86-*-* x86_64-*-* } } } */
Index: testsuite/gcc.dg/fold-and-rshift-2.c
===
--- testsuite/gcc.dg/fold-and-rshift-2.c(revision 0)
+++ testsuite/gcc.dg/fold-and-rshift-2.c(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-original" } */
+
+unsigned f(unsigned x)
+{
+   return (x >> 29) & 32;
+}
+
+unsigned g(unsigned x)
+{
+   return !!(x & 0x80000000) << 5;
+}
+
+unsigned j(unsigned x)
+{
+   return ((x >> 31) & 1) ? 32 : 0;
+}
+
+unsigned k(unsigned x)
+{
+   return (x & 0x80000000) ? 32 : 0;
+}
+
+/* { dg-final { scan-tree-dump-not " \\? " "original" } } */
+/* { dg-final { scan-assembler-not "sall" { target i?86-*-* x86_64-*-* } } } */


Re: [PATCH v3] Support ASan ODR indicators at compiler side.

2016-12-01 Thread Maxim Ostapenko
Jakub, thank you for the review. I'll commit the following patch if no
issues occur after regtesting and bootstrapping.


On 01/12/16 13:42, Jakub Jelinek wrote:

On Thu, Dec 01, 2016 at 01:25:43PM +0300, Maxim Ostapenko wrote:

+  int len = strlen (IDENTIFIER_POINTER (decl_name))
+   + sizeof ("__odr_asan_") + 1;

Please use size_t len instead of int len.  Why the + 1?  sizeof ("__odr_asan_")
should be already strlen ("__odr_asan_") + 1.


+  name = XALLOCAVEC (char, len);
+  name[len] = '\0';

This is buffer overflow.  Why do you need it at all?


+  snprintf (name, len, "__odr_asan_%s", IDENTIFIER_POINTER (decl_name));

This should zero terminate the string.

Also, shouldn't this be followed by:
#ifndef NO_DOT_IN_LABEL
   name[sizeof ("__odr_asan") - 1] = '.';
#elif !defined(NO_DOLLAR_IN_LABEL)
   name[sizeof ("__odr_asan") - 1] = '$';
#endif

to make it not possible to clash with user symbols __odr_asan_foobar etc.
if possible (on targets which don't allow dots nor dollars in labels that is
not really possible, but at least elsewhere).

That said, if the __odr_asan* symbols are exported, it is part of ABI, so
what exactly does LLVM use in those cases?


AFAIR LLVM doesn't have such protection.  But I agree that preventing
symbol clashes is reasonable here, so it should be added to LLVM as well.


-Maxim




+/* { dg-final { scan-assembler-not ".*odr_asan_a.*" } } */
+/* { dg-final { scan-assembler-not ".*odr_asan_b.*" } } */
+/* { dg-final { scan-assembler ".*odr_asan_c.*" } } */

The .* on either side makes no sense, please remove those.
And, if the dot or dollar is replacing _, you need to use "odr_asan.a"
etc. in the regexps.

Otherwise LGTM.

Jakub





diff --git a/config/ChangeLog b/config/ChangeLog
index ed59787..a5d5ff5 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,8 @@
+2016-12-01  Maxim Ostapenko  
+
+	* bootstrap-asan.mk: Replace LSAN_OPTIONS=detect_leaks=0 with
+	ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1.
+
 2016-11-30  Matthias Klose  
 
 	* pkg.m4: New file.
diff --git a/config/bootstrap-asan.mk b/config/bootstrap-asan.mk
index 70baaf9..e73d4c2 100644
--- a/config/bootstrap-asan.mk
+++ b/config/bootstrap-asan.mk
@@ -1,7 +1,7 @@
 # This option enables -fsanitize=address for stage2 and stage3.
 
 # Suppress LeakSanitizer in bootstrap.
-export LSAN_OPTIONS="detect_leaks=0"
+export ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1
 
 STAGE2_CFLAGS += -fsanitize=address
 STAGE3_CFLAGS += -fsanitize=address
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b3cc6305..f19ac9d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-12-01  Maxim Ostapenko  
+
+	* asan.c (asan_global_struct): Refactor.
+	(create_odr_indicator): New function.
+	(asan_needs_odr_indicator_p): Likewise.
+	(is_odr_indicator): Likewise.
+	(asan_add_global): Introduce odr_indicator_ptr. Pass it into global's
+	constructor.
+	(asan_protect_global): Do not protect odr indicators.
+
 2016-12-01  Jakub Jelinek  
 
 	PR target/78614
diff --git a/gcc/asan.c b/gcc/asan.c
index cb5d615..5af9547 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1388,6 +1388,16 @@ asan_needs_local_alias (tree decl)
   return DECL_WEAK (decl) || !targetm.binds_local_p (decl);
 }
 
+/* Return true if DECL, a global var, is an artificial ODR indicator symbol
+   and therefore doesn't need protection.  */
+
+static bool
+is_odr_indicator (tree decl)
+{
+  return (DECL_ARTIFICIAL (decl)
+	  && lookup_attribute ("asan odr indicator", DECL_ATTRIBUTES (decl)));
+}
+
 /* Return true if DECL is a VAR_DECL that should be protected
by Address Sanitizer, by appending a red zone with protected
shadow memory after it and aligning it to at least
@@ -1436,7 +1446,8 @@ asan_protect_global (tree decl)
   || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
   || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
   || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE
-  || TREE_TYPE (decl) == ubsan_get_source_location_type ())
+  || TREE_TYPE (decl) == ubsan_get_source_location_type ()
+  || is_odr_indicator (decl))
 return false;
 
   rtl = DECL_RTL (decl);
@@ -2266,14 +2277,15 @@ asan_dynamic_init_call (bool after_p)
 static tree
 asan_global_struct (void)
 {
-  static const char *field_names[8]
+  static const char *field_names[]
 = { "__beg", "__size", "__size_with_redzone",
-	"__name", "__module_name", "__has_dynamic_init", "__location", "__odr_indicator"};
-  tree fields[8], ret;
-  int i;
+	"__name", "__module_name", "__has_dynamic_init", "__location",
+	"__odr_indicator" };
+  tree fields[ARRAY_SIZE (field_names)], ret;
+  unsigned i;
 
   ret = make_node (RECORD_TYPE);
-  for (i = 0; i < 8; i++)
+  for (i = 0; i < ARRAY_SIZE (field_names); i++)
 {
   fields[i]
 	= build_decl (UNKNOWN_LOCATION, FIELD_DECL,
@@ -2295,6 +2307,63 @@ asan_global_struct (void)
   return ret;
 }
 
+/* Create and return odr indicator symbol for DECL.
+   TYPE is __asan_global struct type as returned

Re: [PATCH] ira: Don't substitute into TRAP_IF insns (PR78610)

2016-12-01 Thread Paolo Bonzini


On 30/11/2016 13:46, Segher Boessenkool wrote:
> if (JUMP_P (use_insn))
>   continue;
>
> +  /* Also don't substitute into a conditional trap insn -- it can become
> +     an unconditional trap, and that is a flow control insn.  */
> +  if (GET_CODE (PATTERN (use_insn)) == TRAP_IF)
> +    continue;

Should there be a predicate that catches JUMP_Ps but also TRAP_IF?

Paolo


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Richard Biener
On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:

> Richard!
> 
> I attached vect dump for hte part of attached test-case which
> illustrated how vectorization of epilogues works through masking:
> #define SIZE 1023
> #define ALIGN 64
> 
> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
> __SIZE_TYPE__ size) __attribute__((weak));
> extern void free (void *);
> 
> void __attribute__((noinline))
> test_citer (int * __restrict__ a,
>             int * __restrict__ b,
>             int * __restrict__ c)
> {
>   int i;
> 
>   a = (int *)__builtin_assume_aligned (a, ALIGN);
>   b = (int *)__builtin_assume_aligned (b, ALIGN);
>   c = (int *)__builtin_assume_aligned (c, ALIGN);
> 
>   for (i = 0; i < SIZE; i++)
> c[i] = a[i] + b[i];
> }
> 
> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
> 
> I did not include in this patch vectorization of low trip-count loops
> since in the original patch additional parameter was introduced:
> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
> +  "vect-short-loops",
> +  "Enable vectorization of low trip count loops using masking.",
> +  0, 0, 1)
> 
> I assume that this ability can be included very quickly but it
> requires cost model enhancements also.

Comments on the patch itself (as I'm having a closer look again,
I know how it vectorizes the above but I wondered why epilogue
and short-trip loops are not basically the same code path).

Btw, I don't like that the features are behind a --param paywall.
That just means a) nobody will use it, b) it will bit-rot quickly,
c) bugs are well-hidden.

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && integer_zerop (nested_in_vect_loop
+   ? STMT_VINFO_DR_STEP (stmt_info)
+   : DR_STEP (dr)))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"allow invariant load for masked loop.\n");
+}

this can test memory_access_type == VMAT_INVARIANT.  Please put
all the checks in a common

  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
{
   if (memory_access_type == VMAT_INVARIANT)
 {
 }
   else if (...)
 {
LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
 }
   else if (..)
...
}

@@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt, 
gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gcc_assert (!nested_in_vect_loop);
   gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: grouped access is not"
+" supported.");
+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  }
+

isn't this already handled by the above?  Or rather the general
disallowance of SLP?

@@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt, 
gimple_stmt_iterator *gsi, gimple **vec_stmt,
&memory_access_type, &gs_info))
 return false;

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && memory_access_type != VMAT_CONTIGUOUS)
+{
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+   "cannot be masked: unsupported memory access type.\n");
+}
+
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && !can_mask_load_store (stmt))
+{
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: unsupported masked store.\n");
+}
+

likewise please combine the ifs.

@@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
  ptr, vec_mask, vec_rhs);
  vect_finish_stmt_generation (stmt, new_stmt, gsi);
  if (i == 0)
-   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+   {
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
+   }
  else
STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
  prev_stmt_info = vinfo_for_stmt (new_stmt);

here you only set the flag, elsewhere you copy DR and VECTYPE as well.

@@ -2113,6 +2146,20 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   && !useless_type_conversion_p (vectype, rhs_vectype)))
 return false;

+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+{
+  /* Check that mask conjuction is supported.  */
+  optab tab;
+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == 
CODE_FOR_n

Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Prathamesh Kulkarni
On 25 November 2016 at 21:17, Jeff Law  wrote:
> On 11/25/2016 01:07 AM, Richard Biener wrote:
>
>>> For the tail-call, issue should we artificially create a lhs and use that
>>> as return value (perhaps by a separate pass before tailcall) ?
>>>
>>> __builtin_memcpy (a1, a2, a3);
>>> return a1;
>>>
>>> gets transformed to:
>>> _1 = __builtin_memcpy (a1, a2, a3)
>>> return _1;
>>>
>>> So the tail-call optimization pass would see the IL in its expected form.
>>
>>
>> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> itself should do this rewrite.  But if this form is required to make
>> things work (I suppose you checked it _does_ actually work?) then
>> we'd need to make sure later passes do not undo it.  So it looks
>> fragile to me.  OTOH I seem to remember that the flags we set on
>> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> verified again there?
>
> So tail calling actually sits on the border between trees and RTL.
> Essentially it's an expand-time decision as we use information from trees as
> well as low level target information.
>
> I would not expect the former sequence to tail call.  The tail calling code
> does not know that the return value from memcpy will be a1.  Thus the tail
> calling code has to assume that it'll have to copy a1 into the return
> register after returning from memcpy, which obviously can't be done if we
> tail called memcpy.
>
> The second form is much more likely to turn into a tail call sequence
> because the return value from memcpy will be sitting in the proper register.
> This form ought to work for most calling conventions that allow tail calls.
>
> We could (in theory) try and exploit the fact that memcpy returns its first
> argument as a return value, but that would only be helpful on a target where
> the first argument and return value use the same register. So I'd have a
> slight preference to rewriting per Prathamesh's suggestion above since it's
> more general.
Thanks for the suggestion. The attached patch creates an artificial lhs
and returns it if the function returns its argument and that argument is
used as the return value.

eg:
f (void * a1, void * a2, long unsigned int a3)
{
  <bb 2> [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return a1_2(D);

}

is transformed to:
f (void * a1, void * a2, long unsigned int a3)
{
  void * _6;

  <bb 2> [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return _6;

}

While testing, I came across an issue with function f() defined
in tail-padding1.C:
struct X
{
  ~X() {}
  int n;
  char d;
};

X f()
{
  X nrvo;
  __builtin_memset (&nrvo, 0, sizeof(X));
  return nrvo;
}

input to the pass:
X f() ()
{
  <bb 2> [0.0%]:
  # .MEM_3 = VDEF <.MEM_1(D)>
  __builtin_memset (nrvo_2(D), 0, 8);
  # VUSE <.MEM_3>
  return nrvo_2(D);

}

verify_gimple_return failed with:
tail-padding1.C:13:1: error: invalid conversion in return statement
 }
 ^
struct X

struct X &

# VUSE <.MEM_3>
return _4;

It seems the return type of the function (struct X) differs from the type
of the return value (struct X&).
Not sure how this is possible?
To work around that, I guarded the transform on:
useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
 TREE_TYPE (retval)))

in the patch. Does that look OK ?

Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
Cross-tested on arm*-*-*, aarch64*-*-*.

Thanks,
Prathamesh
>
>
> Jeff
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..d46ca50 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
 return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
 {
   stmt = gsi_stmt (gsi);
+  if (!ret_stmt)
+   ret_stmt = dyn_cast <greturn *> (stmt);
 
   /* Ignore labels, returns, nops, clobbers and debug stmts.  */
   if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,37 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
{
  call = as_a <gcall *> (stmt);
  ass_var = gimple_call_lhs (call);
+ if (!ass_var)
+   {
+ /* Check if function returns one of its arguments
+   

[patch,testsuite,avr]: Filter-out -mmcu= from options for tests that set -mmcu=

2016-12-01 Thread Georg-Johann Lay
This patch moves the compile tests that have a hard-coded -mmcu=MCU in
their dg-options to a new folder.


The exp driver filters out -mmcu= from the command line options that are 
provided by, say, board description files or --tool-opts.


This is needed because otherwise a conflicting -mmcu= will FAIL the
respective test cases with "specified option '-mmcu' more than once"
errors from avr-gcc.


Ok for trunk?

Johann

gcc/testsuite/
* gcc.target/avr/mmcu: New folder for compile-tests with -mmcu=.
* gcc.target/avr/mmcu/avr-mmcu.exp: New file.
* gcc.target/avr/pr58545.c: Move to gcc.target/avr/mmcu.
* gcc.target/avr/tiny-caller-save.c: Dito.
* gcc.target/avr/tiny-memx.c: Dito.
Index: gcc.target/avr/mmcu/avr-mmcu.exp
===
--- gcc.target/avr/mmcu/avr-mmcu.exp	(nonexistent)
+++ gcc.target/avr/mmcu/avr-mmcu.exp	(working copy)
@@ -0,0 +1,99 @@
+# Copyright (C) 2008-2016 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# This folder contains compile tests that set dg-options to
+# some -mmcu= which might collide with the MCU set by the
+# target board.  This in turn will fail the test case due to
+# "error: specified option '-mmcu' more than once".
+#
+# Hence we filter out -mmcu= from cflags and --tool_opts before
+# running the tests.
+
+# Exit immediately if this isn't an AVR target.
+if ![istarget avr-*-*] then {
+  return
+}
+
+# Return the saved values of the variable_list
+proc save_variables { variable_list } {
+set saved_variable { }
+
+foreach variable $variable_list {
+	upvar 1 $variable  var
+
+	set save($variable) $var
+	lappend saved_variable $save($variable)
+}
+return $saved_variable
+}
+
+# Restore the values of the variable_list
+proc restore_variables { variable_list saved_variable } {
+foreach variable $variable_list value $saved_variable {
+	upvar 1 $variable  var
+	set var $value
+}
+}
+
+# Filter out -mmcu= options
+proc filter_out_mmcu { options } {
+set reduced {}
+
+foreach option [ split $options ] {
+	if { ![ regexp "\-mmcu=.*" $option ] } {
+	lappend reduced $option
+	}
+}
+
+return [ join $reduced " " ]
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# If no --tool_opts were specified, use empty ones.
+if ![info exists TOOL_OPTIONS] then {
+set TOOL_OPTIONS ""
+}
+
+# Initialize `dg'.
+dg-init
+
+# Save
+set variablelist [ list TOOL_OPTIONS board_info([target_info name],cflags) ]
+set saved_value [ save_variables $variablelist ]
+
+# Filter-out -mmcu=
+set TOOL_OPTIONS [ filter_out_mmcu $TOOL_OPTIONS ]
+set board_info([ target_info name ],cflags) [ filter_out_mmcu $board_info([ target_info name ],cflags) ] 
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.{\[cCS\],cpp}]] \
+	"" $DEFAULT_CFLAGS
+
+# Restore
+restore_variables $variablelist $saved_value
+
+# All done.
+dg-finish
Index: gcc.target/avr/mmcu/pr58545.c
===
Index: gcc.target/avr/mmcu/tiny-caller-save.c
===
--- gcc.target/avr/mmcu/tiny-caller-save.c	(revision 243105)
+++ gcc.target/avr/mmcu/tiny-caller-save.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target avr_tiny } } */
+/* { dg-do compile } */
 /* { dg-options "-mmcu=avrtiny -gdwarf -Os" } */
 
 /* This is a stripped down piece of libgcc2.c that triggerd an ICE for avr with
Index: gcc.target/avr/mmcu/tiny-memx.c
===
--- gcc.target/avr/mmcu/tiny-memx.c	(revision 243105)
+++ gcc.target/avr/mmcu/tiny-memx.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target avr_tiny } } */
+/* { dg-do compile } */
 /* { dg-options "-mmcu=avrtiny" } */
 
 const __memx char ascmonth[] = "Jan"; /* { dg-error "not supported" } */
Index: gcc.target/avr/pr58545.c
===
--- gcc.target/avr/pr58545.c	(revision 243104)
+++ gcc.target/avr/pr58545.c	(nonexistent)
@@ -1,23 +0,0 @@
-/* { dg-do

Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:
> >> Thinking about this again maybe targets w/o insn-length should simply
> >> always use the 'simple' algorithm instead of the STV one?  At least that
> >> might be what your change effectively does in some way?
> >
> > From reading the comments I don't think STC will collapse down into the
> > simple algorithm if block copying is disabled.  But Segher would know for
> > sure.
> >
> > WRT the choice of simple vs STC, I doubt it matters much for the processors
> > in question.
> 
> I guess STC doesn't make much sense if we can't say anything about BB sizes.

STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.  It never copies code; if that means it
has to jump every second insn, so be it.  It provably is within a factor
three of optimal (optimal is NP-hard), under a really weak assumption
within a factor two, and it does better than that in practice.

STC without block copying makes longer traces, which is not a good idea
for most architectures; it only helps those that have a short jump form
that is much shorter than long jumps (and thus cannot cover many jump
targets).

I do not know how STC behaves when it does not know the insn lengths.


Segher


Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 21:17, Jeff Law  wrote:
> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >
> >>> For the tail-call, issue should we artificially create a lhs and use that
> >>> as return value (perhaps by a separate pass before tailcall) ?
> >>>
> >>> __builtin_memcpy (a1, a2, a3);
> >>> return a1;
> >>>
> >>> gets transformed to:
> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >>> return _1;
> >>>
> >>> So the tail-call optimization pass would see the IL in its expected form.
> >>
> >>
> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> itself should do this rewrite.  But if this form is required to make
> >> things work (I suppose you checked it _does_ actually work?) then
> >> we'd need to make sure later passes do not undo it.  So it looks
> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> verified again there?
> >
> > So tail calling actually sits on the border between trees and RTL.
> > Essentially it's an expand-time decision as we use information from trees as
> > well as low level target information.
> >
> > I would not expect the former sequence to tail call.  The tail calling code
> > does not know that the return value from memcpy will be a1.  Thus the tail
> > calling code has to assume that it'll have to copy a1 into the return
> > register after returning from memcpy, which obviously can't be done if we
> > tail called memcpy.
> >
> > The second form is much more likely to turn into a tail call sequence
> > because the return value from memcpy will be sitting in the proper register.
> > This form ought to work for most calling conventions that allow tail calls.
> >
> > We could (in theory) try and exploit the fact that memcpy returns its first
> > argument as a return value, but that would only be helpful on a target where
> > the first argument and return value use the same register. So I'd have a
> > slight preference to rewriting per Prathamesh's suggestion above since it's
> > more general.
> Thanks for the suggestion. The attached patch creates artificial lhs,
> and returns it if the function returns it's argument and that argument
> is used as return-value.
> 
> eg:
> f (void * a1, void * a2, long unsigned int a3)
> {
> <bb 2> [0.0%]:
>   # .MEM_5 = VDEF <.MEM_1(D)>
>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>   # VUSE <.MEM_5>
>   return a1_2(D);
> 
> }
> 
> is transformed to:
> f (void * a1, void * a2, long unsigned int a3)
> {
>   void * _6;
> 
> <bb 2> [0.0%]:
>   # .MEM_5 = VDEF <.MEM_1(D)>
>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>   # VUSE <.MEM_5>
>   return _6;
> 
> }
> 
> While testing, I came across an issue with function f() defined
> intail-padding1.C:
> struct X
> {
>   ~X() {}
>   int n;
>   char d;
> };
> 
> X f()
> {
>   X nrvo;
>   __builtin_memset (&nrvo, 0, sizeof(X));
>   return nrvo;
> }
> 
> input to the pass:
> X f() ()
> {
>[0.0%]:
>   # .MEM_3 = VDEF <.MEM_1(D)>
>   __builtin_memset (nrvo_2(D), 0, 8);
>   # VUSE <.MEM_3>
>   return nrvo_2(D);
> 
> }
> 
> verify_gimple_return failed with:
> tail-padding1.C:13:1: error: invalid conversion in return statement
>  }
>  ^
> struct X
> 
> struct X &
> 
> # VUSE <.MEM_3>
> return _4;
> 
> It seems the return type of the function (struct X) differs from the type
> of the return value (struct X&).
> Not sure how this is possible?

You need to honor DECL_BY_REFERENCE of DECL_RESULT.

> To work around that, I guarded the transform on:
> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>  TREE_TYPE (retval)))
> 
> in the patch. Does that look OK ?
> 
> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> 
> Thanks,
> Prathamesh
> >
> >
> > Jeff
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Unset used bit in simplify_replace_* on newly copied rtxs (PR target/78614)

2016-12-01 Thread Bernd Schmidt

On 11/30/2016 11:11 PM, Jakub Jelinek wrote:

I ran into the problem that while simplify_replace_rtx clears the used bits
on newly created rtxes when it actually does copy_rtx (for y) or simplifies
something through simplify_gen_{unary,binary,relational,ternary}, when we
fall through into the fallback simplify_replace_fn_rtx handling it calls
shallow_copy_rtx, which copies the set used bit, so copy_rtx_if_shared
copies it again.


Shouldn't the bit be cleared in shallow_copy_rtx then?


Bernd


Re: [PING] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-01 Thread Dominik Vogt
Ping.

On Mon, Nov 21, 2016 at 01:36:47PM +0100, Dominik Vogt wrote:
> On Fri, Nov 11, 2016 at 12:10:28PM +0100, Dominik Vogt wrote:
> > On Mon, Nov 07, 2016 at 09:29:26PM +0100, Bernd Schmidt wrote:
> > > On 10/31/2016 08:56 PM, Dominik Vogt wrote:
> > > 
> > > >combine_simplify_rtx() tries to replace rtx expressions with just two
> > > >possible values with an expression that uses if_then_else:
> > > >
> > > >  (if_then_else (condition) (value1) (value2))
> > > >
> > > >If the original expression is e.g.
> > > >
> > > >  (and (reg) (const_int 2))
> > > 
> > > I'm not convinced that if_then_else_cond is the right place to do
> > > this. That function is designed to answer the question of whether an
> > > rtx has exactly one of two values and under which condition; I feel
> > > it should continue to work this way.
> > > 
> > > Maybe simplify_ternary_expression needs to be taught to deal with this 
> > > case?
> > 
> > But simplify_ternary_expression isn't called with the following
> > test program (only tried it on s390x):
> > 
> >   void bar(int, int); 
> >   int foo(int a, int *b) 
> >   { 
> > if (a) 
> >   bar(0, *b & 2); 
> > return *b; 
> >   } 
> > 
> > combine_simplify_rtx() is called with 
> > 
> >   (sign_extend:DI (and:SI (reg:SI 61) (const_int 2)))
> > 
> > In the switch it calls simplify_unary_operation(), which returns
> > NULL.  The next thing it does is call if_then_else_cond(), and
> > that calls itself with the sign_extend peeled off:
> > 
> >   (and:SI (reg:SI 61) (const_int 2))
> > 
> > takes the "BINARY_P (x)" path and returns false.  The problem
> > exists only if the (and ...) is wrapped in ..._extend, i.e. the
> > condition dealing with (and ...) directly can be removed from the
> > patch.
> > 
> > So, all recursive calls to if_then_else_cond() return false, and
> > finally the condition in
> > 
> > else if (HWI_COMPUTABLE_MODE_P (mode) 
> >&& pow2p_hwi (nz = nonzero_bits (x, mode))
> > 
> > is true.
> > 
> > Thus, if if_then_else_cond should remain unchanged, the only place
> > to fix this would be after the call to if_then_else_cond() in
> > combine_simplify_rtx().  Actually, there already is some special
> > case handling to override the return code of if_then_else_cond():
> > 
> >   cond = if_then_else_cond (x, &true_rtx, &false_rtx); 
> >   if (cond != 0 
> >   /* If everything is a comparison, what we have is highly unlikely 
> >  to be simpler, so don't use it.  */ 
> > --->  && ! (COMPARISON_P (x) 
> > && (COMPARISON_P (true_rtx) || COMPARISON_P (false_rtx 
> > { 
> >   rtx cop1 = const0_rtx; 
> >   enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1); 
> >  
> > --->  if (cond_code == NE && COMPARISON_P (cond)) 
> > return x; 
> >   ...
> > 
> > Should be easy to duplicate the test in the if-body, if that is
> > what you prefer:
> > 
> >   ...
> >   if (HWI_COMPUTABLE_MODE_P (GET_MODE (x)) 
> >   && pow2p_hwi (nz = nonzero_bits (x, GET_MODE (x))) 
> >   && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND) 
> > && GET_CODE (XEXP (x, 0)) == AND 
> > && CONST_INT_P (XEXP (XEXP (x, 0), 0)) 
> > && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz)) 
> > return x; 
> > 
> > (untested)
> 
> Updated and tested version of the patch attached.  The extra logic
> is now in combine_simplify_rtx.

> gcc/ChangeLog
> 
>   * combine.c (combine_simplify_rtx):  Suppress replacement of
>   "(and (reg) (const_int bit))" with "if_then_else".

> >From 2ebe692928b4ebee3fa6dc02136980801a04b33d Mon Sep 17 00:00:00 2001
> From: Dominik Vogt 
> Date: Mon, 31 Oct 2016 09:00:31 +0100
> Subject: [PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.
> 
> combine_simplify_rtx() tries to replace rtx expressions with just two
> possible values with an expression that uses if_then_else:
> 
>   (if_then_else (condition) (value1) (value2))
> 
> If the original expression is e.g.
> 
>   (and (reg) (const_int 2))
> 
> where the constant is the mask for a single bit, the replacement results
> in a more complex expression than before:
> 
>   (if_then_else (ne (zero_extract (reg) (1) (31))) (2) (0))
> 
> Similar replacements are done for
> 
>   (signextend (and ...))
>   (zeroextend (and ...))
> 
> Suppress the replacement in this special case in combine_simplify_rtx().
> ---
>  gcc/combine.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index b22a274..457fe8a 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5575,10 +5575,23 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, 
> int in_dest,
>   {
> rtx cop1 = const0_rtx;
> enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1);
> +   unsigned HOST_WIDE_INT nz;
>  
> if (cond_code == NE && COMPARISON_P (cond)

Re: [PING] (v2) Add a "compact" mode to print_rtx_function

2016-12-01 Thread Bernd Schmidt

On 12/01/2016 11:12 AM, Dominik Vogt wrote:


diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 77e6b05..5370602 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
int idx)
   if (INSN_HAS_LOCATION (in_insn))
{
  expanded_location xloc = insn_location (in_insn);
- fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+ if (m_compact)
+   fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+ else
+   fprintf (m_outfile, " %s:%i", xloc.file, xloc.line);
}
 #endif
 }
--
1.8.5.3


I'd like to get our test failure fixed, either by changing
print-rtl.c or our test case.  Is the above patch good for trunk?
It does fix the s390 test failure.


I still don't see a strong reason not to print the quotes, so I'd 
suggest changing the testcase.



Bernd


Re: [PATCH][TER] PR 48863: Don't replace expressions across local register variable definitions

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 12:14 PM, Kyrill Tkachov
 wrote:
>
> On 24/11/16 15:12, Richard Biener wrote:
>>
>> On Thu, Nov 24, 2016 at 2:57 PM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> In this bug we have TER during out-of-ssa misbehaving. A minimal example
>>> is:
>>> void foo(unsigned a, float b)
>>> {
>>>unsigned c = (unsigned) b;  // 1
>>>register unsigned r0 __asm__("r0") = a; // 2
>>>register unsigned r1 __asm__("r1") = c; // 3
>>>  __asm__ volatile( "str %[r1], [%[r0]]\n"
>>>:
>>>: [r0] "r" (r0),
>>>  [r1] "r" (r1));
>>> }
>>>
>>> Statement 1 can produce a libcall to convert float b into an int and TER
>>> moves it
>>> into statement 3. But the libcall clobbers r0, which we want set to 'a'
>>> in
>>> statement 2.
>>> So r0 gets clobbered by the argument to the conversion libcall.
>>>
>>> TER already has code to avoid substituting across function calls and
>>> ideally
>>> we'd teach it
>>> to not substitute expressions that can perform a libcall across register
>>> variable definitions
>>> where the register can be clobbered in a libcall, but that information is
>>> not easy to get hold
>>> off in a general way at this level.
>>>
>>> So this patch disallows replacement across any local register variable
>>> definition. It does this
>>> by keeping track of local register definitions encountered in a similar
>>> way
>>> to which calls are
>>> counted for TER purposes.
>>>
>>> I hope this is not too big a hammer as local register variables are not
>>> very
>>> widely used and we
>>> only disable replacement across them and it allows us to fix the
>>> wrong-code
>>> bug on some
>>> inline-asm usage scenarios where gcc currently miscompiles code following
>>> its documented
>>> advice [1]
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu,
>>> x86_64.
>>>
>>> Is this approach acceptable?
>>
>> Ok.
>
>
> Thanks.
> This has been in trunk for a week without any problems.
> Is it ok to backport to the branches?
> I have bootstrapped and tested arm-none-linux-gnueabihf on them.

I think so.

Richard.

> Kyrill
>
>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1]
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
>>>
>>> 2016-11-24  Kyrylo Tkachov  
>>>
>>>  PR target/48863
>>>  PR inline-asm/70184
>>>  * tree-ssa-ter.c (temp_expr_table): Add reg_vars_cnt field.
>>>  (new_temp_expr_table): Initialise reg_vars_cnt.
>>>  (free_temp_expr_table): Release reg_vars_cnt.
>>>  (process_replaceable): Add reg_vars_cnt argument, set reg_vars_cnt
>>>  field of TAB.
>>>  (find_replaceable_in_bb): Use the above to record register variable
>>>  write occurrences and cancel replacement across them.
>>>
>>> 2016-11-24  Kyrylo Tkachov  
>>>
>>>  PR target/48863
>>>  PR inline-asm/70184
>>>  * gcc.target/arm/pr48863.c: New test.
>
>


Re: [PATCH v2] improve folding of expressions that move a single bit around

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Paolo Bonzini wrote:

> In code like the following from KVM:
> 
> /* it is a read fault? */
> error_code = (exit_qualification << 2) & PFERR_FETCH_MASK;
> 
> it would be nicer to write
> 
> /* it is a read fault? */
> error_code = (exit_qualification & VMX_EPT_READ_FAULT_MASK) ? 
> PFERR_FETCH_MASK : 0;
> 
> instead of having to know the difference between the positions of the
> source and destination bits.  LLVM catches the latter just fine (which
> is why I am sending this in stage 3...), but GCC does not, so this
> patch adds two patterns to catch it.
> 
> The combine.c hunk of v1 has been committed already.
> 
> Bootstrapped/regtested x86_64-pc-linux-gnu, ok?

Ok.

Thanks,
Richard.

> Paolo
> 
> 2016-11-26  Paolo Bonzini  
> 
>   * match.pd: Simplify X ? C : 0 where C is a power of 2 and
>   X tests a single bit.
> 
> 2016-11-26  Paolo Bonzini  
> 
>   * gcc.dg/fold-and-lshift.c, gcc.dg/fold-and-rshift-1.c,
>   gcc.dg/fold-and-rshift-2.c: New testcases.
> 
> Index: match.pd
> ===
> --- match.pd  (revision 242916)
> +++ match.pd  (working copy)
> @@ -2630,6 +2630,21 @@
>(cmp (bit_and@2 @0 integer_pow2p@1) @1)
>(icmp @2 { build_zero_cst (TREE_TYPE (@0)); })))
>   
> +/* If we have (A & C) != 0 ? D : 0 where C and D are powers of 2,
> +   convert this into a shift followed by ANDing with D.  */
> +(simplify
> + (cond
> +  (ne (bit_and @0 integer_pow2p@1) integer_zerop)
> +  integer_pow2p@2 integer_zerop)
> + (with {
> +int shift = wi::exact_log2 (@2) - wi::exact_log2 (@1);
> +  }
> +  (if (shift > 0)
> +   (bit_and
> +(lshift (convert @0) { build_int_cst (integer_type_node, shift); }) @2)
> +   (bit_and
> +(convert (rshift @0 { build_int_cst (integer_type_node, -shift); })) 
> @2
> +
>  /* If we have (A & C) != 0 where C is the sign bit of A, convert
> this into A < 0.  Similarly for (A & C) == 0 into A >= 0.  */
>  (for cmp (eq ne)
> @@ -2644,6 +2659,19 @@
> (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
>  (ncmp (convert:stype @0) { build_zero_cst (stype); })
>  
> +/* If we have A < 0 ? C : 0 where C is a power of 2, convert
> +   this into a right shift followed by ANDing with C.  */
> +(simplify
> + (cond
> +  (lt @0 integer_zerop)
> +  integer_pow2p@1 integer_zerop)
> + (with {
> +int shift = element_precision (@0) - wi::exact_log2 (@1) - 1;
> +  }
> +  (bit_and
> +   (convert (rshift @0 { build_int_cst (integer_type_node, shift); }))
> +   @1)))
> +
>  /* When the addresses are not directly of decls compare base and offset.
> This implements some remaining parts of fold_comparison address
> comparisons but still no complete part of it.  Still it is good
> Index: testsuite/gcc.dg/fold-and-lshift.c
> ===
> --- testsuite/gcc.dg/fold-and-lshift.c(revision 0)
> +++ testsuite/gcc.dg/fold-and-lshift.c(working copy)
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-original" } */
> +
> +int f(int x)
> +{
> + return (x << 2) & 128;
> +}
> +
> +int g(int x)
> +{
> + return !!(x & 32) << 7;
> +}
> +
> +int h(int x)
> +{
> + return ((x >> 5) & 1) << 7;
> +}
> +
> +int i(int x)
> +{
> + return (x & 32) >> 5 << 7;
> +}
> +
> +int j(int x)
> +{
> + return ((x >> 5) & 1) ? 128 : 0;
> +}
> +
> +int k(int x)
> +{
> + return (x & 32) ? 128 : 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " \\? " "original" } } */
> +/* { dg-final { scan-assembler-not "sarl" { target i?86-*-* x86_64-*-* } } } */
> Index: testsuite/gcc.dg/fold-and-rshift-1.c
> ===
> --- testsuite/gcc.dg/fold-and-rshift-1.c  (revision 0)
> +++ testsuite/gcc.dg/fold-and-rshift-1.c  (working copy)
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-original" } */
> +
> +int f(int x)
> +{
> + return (x >> 2) & 128;
> +}
> +
> +int g(int x)
> +{
> + return !!(x & 512) << 7;
> +}
> +
> +int h(int x)
> +{
> + return ((x >> 9) & 1) << 7;
> +}
> +
> +int i(int x)
> +{
> + return (x & 512) >> 9 << 7;
> +}
> +
> +int j(int x)
> +{
> + return ((x >> 9) & 1) ? 128 : 0;
> +}
> +
> +int k(int x)
> +{
> + return (x & 512) ? 128 : 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " \\? " "original" } } */
> +/* { dg-final { scan-assembler-not "sall" { target i?86-*-* x86_64-*-* } } } */
> Index: testsuite/gcc.dg/fold-and-rshift-2.c
> ===
> --- testsuite/gcc.dg/fold-and-rshift-2.c  (revision 0)
> +++ testsuite/gcc.dg/fold-and-rshift-2.c  (working copy)
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-original" } */
> +
> +unsigned f(unsigned x)
> +{
> + return (x >> 29) & 32;
> +}
> +
> +unsigned g(unsigned 

Re: [patch] boehm-gc removal and libobjc changes to build with an external bdw-gc

2016-12-01 Thread Matthias Klose
On 30.11.2016 18:26, Jeff Law wrote:
> On 11/30/2016 09:53 AM, Matthias Klose wrote:
>> On 30.11.2016 12:38, Richard Biener wrote:
>>> On Wed, Nov 30, 2016 at 12:30 PM, Matthias Klose  wrote:
 There's one more fix needed for the case of only having the pkg-config 
 module
 installed when configuring with --enable-objc-gc. We can't use
 PKG_CHECK_MODULES
 directly because the pkg.m4 macros choke on the dash in the module name. 
 Thus
 setting the CFLAGS and LIBS directly. Ok to install?
>>>
>>> Why not fix pkg.m4?
>>>
>>> Richard.
>>
>> Jakub suggested to avoid using pkg-config at all, so we can get rid of this
>> code altogether.
> I thought we'd OK'd pkg-config (for JIT) which is why I didn't call it out.
> 
> Looking now, pkg-config got NAKd there and was removed.

ok, removed again.

Matthias



Re: [PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-01 Thread Bernd Schmidt

On 11/21/2016 01:36 PM, Dominik Vogt wrote:

diff --git a/gcc/combine.c b/gcc/combine.c
index b22a274..457fe8a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5575,10 +5575,23 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
{
  rtx cop1 = const0_rtx;
  enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1);
+ unsigned HOST_WIDE_INT nz;

  if (cond_code == NE && COMPARISON_P (cond))
return x;

+ /* If the operation is an AND wrapped in a SIGN_EXTEND or ZERO_EXTEND
+with either operand being just a constant single bit value, do
+nothing since IF_THEN_ELSE is likely to increase the expression's
+complexity.  */
+ if (HWI_COMPUTABLE_MODE_P (mode)
+ && pow2p_hwi (nz = nonzero_bits (x, mode))
+ && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
+   && GET_CODE (XEXP (x, 0)) == AND
+   && CONST_INT_P (XEXP (XEXP (x, 0), 0))
+   && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
+   return x;


It looks like this doesn't actually use cond or true/false_rtx. So this 
could be placed just above the call to if_then_else_cond to avoid 
unnecessary work. Ok if that works.



Bernd


Re: [PR middle-end/78548] fix latent double free in tree-ssa-uninit.c

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 12:03 PM, Aldy Hernandez  wrote:
> This looks like a latent problem in the -Wmaybe-uninitialized code unrelated
> to my work.
>
> The problem here is a sequence of simplify_preds() followed by
> normalize_preds() that I added, but is based on prior art all over the file:
>
> +  simplify_preds (&uninit_preds, NULL, false);
> +  uninit_preds = normalize_preds (uninit_preds, NULL, false);
>
> The problem is that in this particular testcase we trigger simplify_preds_4
> which makes a copy of a chain, frees the chain, and then tries to use the
> chain later (in normalize_preds).  The normalize_preds() function tries to
> free again the chain and we blow up:
>
> This is the main problem in simplify_preds_4:
>
>   /* Now clean up the chain.  */
>   if (simplified)
> {
>   for (i = 0; i < n; i++)
> {
>   if ((*preds)[i].is_empty ())
> continue;
>   s_preds.safe_push ((*preds)[i]);
> // 
> // Makes a copy of the pred_chain.
> }
>
>   destroy_predicate_vecs (preds);
> // ^^^
> // free() all the pred_chain's.
>
>   (*preds) = s_preds;
> // ^^
> // Wait a minute, we still keep a copy of the pred_chains.
>   s_preds = vNULL;
> }
>
> I have no idea how this worked even before my patch.  Perhaps we never had a
> simplify_preds() followed by a normalize_preds() where the simplification
> involved a call to simplify_preds_4.
>
> Interestingly enough, simplify_preds_2() has the exact same code, but with
> the fix I am proposing. So this seems like an oversight.  Also, the fact
> that the simplification in simplify_preds_2 is more common than the
> simplification performed in simplify_preds_4 would suggest that
> simplify_preds_4 was uncommon enough and probably wasn't been used in a
> simplify_preds/normalize_preds combo.
>
> Anyways... I've made some other cleanups to the code, but the main gist of
> the entire patch is:
>
> -  destroy_predicate_vecs (preds);
> +  preds->release ();
>
> That is, release preds, but don't free the associated memory with the
> pred_chain's therein.
>
> This patch is on top of my pending patch here:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02900.html
>
> Tested on x86-64 Linux.
>
> OK for trunk?

Ok.

Richard.

> Aldy


[PATCH] PR rtl-optimization/78596 - combine.c:12561:14: runtime error: left shift of negative value

2016-12-01 Thread Markus Trippelsdorf
Hopefully one last patch for UB in combine.c:

 combine.c:12561:14: runtime error: left shift of negative value -9

Fixed by casting to unsigned, as usual.

Tested on ppc64le.
OK for trunk?

Thanks.

PR rtl-optimization/78596
* combine.c (simplify_comparison): Cast to unsigned to avoid
left shifting of negative value.

diff --git a/gcc/combine.c b/gcc/combine.c
index faafcb741f41..e32c02b06810 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -12561,7 +12561,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx 
*pop1)
  if (GET_CODE (op0) == LSHIFTRT)
code = unsigned_condition (code);

- const_op <<= INTVAL (XEXP (op0, 1));
+ const_op = (unsigned HOST_WIDE_INT) const_op
+ << INTVAL (XEXP (op0, 1));
  if (low_bits != 0
  && (code == GT || code == GTU
  || code == LE || code == LEU))

--
Markus


Re: [PATCH] Minimal reimplementation of errors.c within read-md.c

2016-12-01 Thread Bernd Schmidt

On 11/30/2016 09:24 PM, David Malcolm wrote:


gcc/ChangeLog:
* read-md.c (have_error): New global, copied from errors.c.
(fatal): New function, copied from errors.c.


I would have expected the function to go into diagnostic.c, but actually 
there are already various functions of this sort in read-md. I'd request 
you place it near fatal_at, and maybe add this to errors.h:


inline bool seen_error ()
{
  return have_error;
}

and replace explicit uses of have_error with that to unify things a bit.


Bernd


Re: PR78599

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 11:07 AM, Prathamesh Kulkarni
 wrote:
> Hi,
> As mentioned in PR, the issue seems to be that in
> propagate_bits_accross_jump_functions(),
> ipa_get_type() returns record_type during WPA and hence we pass
> invalid precision to
> ipcp_bits_lattice::meet_with (value, mask, precision) which eventually
> leads to runtime error.
> The attached patch tries to fix that, by bailing out if type of param
> is not integral or pointer type.
> This happens for the edge from deque_test -> _Z4copyIPd1BEvT_S2_T0_.isra.0/9.

Feels more like a DECL_BY_REFERENCE mishandling and should be fixed elsewhere.

> However I am not sure how ipcp_bits_lattice::meet_with (value, mask,
> precision) gets called for this case. In
> ipa_compute_jump_functions_for_edge(), we set jfunc->bits.known to
> true only
> if parm's type satisfies INTEGRAL_TYPE_P or POINTER_TYPE_P.
> And ipcp_bits_lattice::meet_with (value, mask, precision) is called
> only if jfunc->bits.known
> is set to true. So I suppose it shouldn't really happen that
> ipcp_bits_lattice::meet_with(value, mask, precision) gets called when
> callee parameter's type is record_type, since the corresponding
> argument's type would also need to be record_type and
> jfunc->bits.known would be set to false.
>
> Without -flto, parm_type is reference_type so that satisfies POINTER_TYPE_P,
> but with -flto it's appearing to be record_type. Is this possibly the
> same issue of TYPE_ARG_TYPES returning bogus types during WPA ?
>
> I verified the attached patch fixes the runtime error with ubsan-built gcc.
> Bootstrap+tested on x86_64-unknown-linux-gnu.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> LTO bootstrap on x86_64-unknown-linux-gnu in progress.
> Is it OK to commit if it succeeds ?
>
> Thanks,
> Prathamesh


[PATCH] Fix PR tree-optimization/78598 - tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow

2016-12-01 Thread Markus Trippelsdorf
Using bootstrap-ubsan gcc to build mplayer shows:

tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow:
288230376151711743 * 64 cannot be represented in type 'long int'

Here signed and unsigned integers are mixed in a division, resulting in
bogus results: (-83 + 64ULL - 1) / 64ULL == 288230376151711743

Fixed by casting the unsigned parameter to signed.

Tested on ppc64le.
OK for trunk?

Thanks.

PR tree-optimization/78598
* tree-ssa-loop-prefetch.c (ddown): Cast to signed to avoid
overflows.


diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 0a2ee5ea25fd..ead2543ada46 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -700,9 +700,9 @@ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
   gcc_assert (by > 0);
 
   if (x >= 0)
-return x / by;
+return x / (HOST_WIDE_INT) by;
   else
-return (x + by - 1) / by;
+return (x + (HOST_WIDE_INT) by - 1) / (HOST_WIDE_INT) by;
 }
 
 /* Given a CACHE_LINE_SIZE and two inductive memory references
-- 
Markus


Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 17:40, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >
>> >>> For the tail-call issue, should we artificially create a lhs and use that
>> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >>>
>> >>> __builtin_memcpy (a1, a2, a3);
>> >>> return a1;
>> >>>
>> >>> gets transformed to:
>> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >>> return _1;
>> >>>
>> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >>
>> >>
>> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> itself should do this rewrite.  But if this form is required to make
>> >> things work (I suppose you checked it _does_ actually work?) then
>> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> verified again there?
>> >
>> > So tail calling actually sits on the border between trees and RTL.
>> > Essentially it's an expand-time decision as we use information from trees 
>> > as
>> > well as low level target information.
>> >
>> > I would not expect the former sequence to tail call.  The tail calling code
>> > does not know that the return value from memcpy will be a1.  Thus the tail
>> > calling code has to assume that it'll have to copy a1 into the return
>> > register after returning from memcpy, which obviously can't be done if we
>> > tail called memcpy.
>> >
>> > The second form is much more likely to turn into a tail call sequence
>> > because the return value from memcpy will be sitting in the proper 
>> > register.
>> > This form ought to work for most calling conventions that allow tail calls.
>> >
>> > We could (in theory) try and exploit the fact that memcpy returns its first
>> > argument as a return value, but that would only be helpful on a target 
>> > where
>> > the first argument and return value use the same register. So I'd have a
>> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> > more general.
>> Thanks for the suggestion. The attached patch creates an artificial lhs,
>> and returns it if the function returns its argument and that argument
>> is used as the return value.
>>
>> eg:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>[0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return a1_2(D);
>>
>> }
>>
>> is transformed to:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   void * _6;
>>
>>[0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return _6;
>>
>> }
>>
>> While testing, I came across an issue with function f() defined
>> in tail-padding1.C:
>> struct X
>> {
>>   ~X() {}
>>   int n;
>>   char d;
>> };
>>
>> X f()
>> {
>>   X nrvo;
>>   __builtin_memset (&nrvo, 0, sizeof(X));
>>   return nrvo;
>> }
>>
>> input to the pass:
>> X f() ()
>> {
>>[0.0%]:
>>   # .MEM_3 = VDEF <.MEM_1(D)>
>>   __builtin_memset (nrvo_2(D), 0, 8);
>>   # VUSE <.MEM_3>
>>   return nrvo_2(D);
>>
>> }
>>
>> verify_gimple_return failed with:
>> tail-padding1.C:13:1: error: invalid conversion in return statement
>>  }
>>  ^
>> struct X
>>
>> struct X &
>>
>> # VUSE <.MEM_3>
>> return _4;
>>
>> It seems the return type of the function (struct X) differs from the type
>> of the return value (struct X&).
>> Not sure how this is possible?
>
> You need to honor DECL_BY_REFERENCE of DECL_RESULT.
Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
resolved the error.
Does the attached version look OK ?
Validation in progress.

Thanks,
Prathamesh
>
>> To work around that, I guarded the transform on:
>> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>>  TREE_TYPE (retval)))
>>
>> in the patch. Does that look OK ?
>>
>> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> Cross-tested on arm*-*-*, aarch64*-*-*.
>>
>> Thanks,
>> Prathamesh
>> >
>> >
>> > Jeff
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..a1c8bd7 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c

Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 17:40, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >
> >> >>> For the tail-call issue, should we artificially create a lhs and use 
> >> >>> that
> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >>>
> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >>> return a1;
> >> >>>
> >> >>> gets transformed to:
> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >>> return _1;
> >> >>>
> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >>
> >> >>
> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> verified again there?
> >> >
> >> > So tail calling actually sits on the border between trees and RTL.
> >> > Essentially it's an expand-time decision as we use information from 
> >> > trees as
> >> > well as low level target information.
> >> >
> >> > I would not expect the former sequence to tail call.  The tail calling 
> >> > code
> >> > does not know that the return value from memcpy will be a1.  Thus the 
> >> > tail
> >> > calling code has to assume that it'll have to copy a1 into the return
> >> > register after returning from memcpy, which obviously can't be done if we
> >> > tail called memcpy.
> >> >
> >> > The second form is much more likely to turn into a tail call sequence
> >> > because the return value from memcpy will be sitting in the proper 
> >> > register.
> >> > This form ought to work for most calling conventions that allow tail calls.
> >> >
> >> > We could (in theory) try and exploit the fact that memcpy returns its 
> >> > first
> >> > argument as a return value, but that would only be helpful on a target 
> >> > where
> >> > the first argument and return value use the same register. So I'd have a
> >> > slight preference to rewriting per Prathamesh's suggestion above since 
> >> > it's
> >> > more general.
> >> Thanks for the suggestion. The attached patch creates an artificial lhs,
> >> and returns it if the function returns its argument and that argument
> >> is used as the return value.
> >>
> >> eg:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>[0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return a1_2(D);
> >>
> >> }
> >>
> >> is transformed to:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>   void * _6;
> >>
> >> <bb 2> [0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return _6;
> >>
> >> }
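The equivalence that makes this rewrite safe can be checked in plain C: the standard guarantees that memcpy returns its first argument, so the two source forms below compute the same thing, but only the second exposes the call's result as the returned value. A minimal stand-alone sketch (the function names are illustrative, not from the patch):

```c
#include <assert.h>
#include <string.h>

/* Shape of the IL before the rewrite: the call result is unused and
   the caller returns a1 itself, so a return-value copy after the call
   blocks tail calling.  */
void *f_before (void *a1, const void *a2, size_t a3)
{
  memcpy (a1, a2, a3);
  return a1;
}

/* Shape after the rewrite: the returned value *is* the call's result
   (guaranteed equal to a1 by C11 7.24.2.1), so it already sits in the
   return register and the call can become a tail call.  */
void *f_after (void *a1, const void *a2, size_t a3)
{
  return memcpy (a1, a2, a3);
}
```

With optimization, f_after typically compiles to a plain jump to memcpy, while f_before keeps a full call frame just to preserve a1.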
> >>
> >> While testing, I came across an issue with function f() defined
> >> in tail-padding1.C:
> >> struct X
> >> {
> >>   ~X() {}
> >>   int n;
> >>   char d;
> >> };
> >>
> >> X f()
> >> {
> >>   X nrvo;
> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >>   return nrvo;
> >> }
> >>
> >> input to the pass:
> >> X f() ()
> >> {
> >> <bb 2> [0.0%]:
> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >>   # VUSE <.MEM_3>
> >>   return nrvo_2(D);
> >>
> >> }
> >>
> >> verify_gimple_return failed with:
> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >>  }
> >>  ^
> >> struct X
> >>
> >> struct X &
> >>
> >> # VUSE <.MEM_3>
> >> return _4;
> >>
> >> It seems the return type of the function (struct X) differs from the type
> >> of the return value (struct X&).
> >> Not sure how this is possible?
> >
> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> resolved the error.
> Does the attached version look OK ?

+ ass_var = make_ssa_name (TREE_TYPE (arg));

can you try

  ass_var = copy_ssa_name (arg);

instead?  That way the underlying decl should make sure the
DECL_BY_REFERENCE check in the IL verification works.

Thanks,
Richard.


> Validation in progress.
> 
> Thanks,
> Prathamesh
> >
> >> To work around that, I guarded the transform on:
> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >>  TREE_TYPE (retval)))
> >>
> >> in the patch. Does that look OK ?
> >>
> >> Bootstrap+tested on x86_64-unknown-linux-gnu with 
> >> --enable-languages=all,ada.
> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >
> >> > Jeff
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HR

Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 18:26, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener  wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >
> >> >>> For the tail-call issue, should we artificially create an lhs and use 
>> >> >>> that
>> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >>>
>> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >>> return a1;
>> >> >>>
>> >> >>> gets transformed to:
>> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >>> return _1;
>> >> >>>
>> >> >>> So the tail-call optimization pass would see the IL in its expected form.
>> >> >>
>> >> >>
>> >> >> As said, an RTL expert needs to chime in here.  Iff then tail-call
>> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> verified again there?
>> >> >
>> >> > So tail calling actually sits on the border between trees and RTL.
>> >> > Essentially it's an expand-time decision as we use information from 
>> >> > trees as
>> >> > well as low level target information.
>> >> >
>> >> > I would not expect the former sequence to tail call.  The tail calling 
>> >> > code
>> >> > does not know that the return value from memcpy will be a1.  Thus the 
>> >> > tail
>> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> > register after returning from memcpy, which obviously can't be done if 
>> >> > we
>> >> > tail called memcpy.
>> >> >
>> >> > The second form is much more likely to turn into a tail call sequence
>> >> > because the return value from memcpy will be sitting in the proper 
>> >> > register.
>> >> > This form ought to work for most calling conventions that allow tail 
>> >> > calls.
>> >> >
>> >> > We could (in theory) try and exploit the fact that memcpy returns its 
>> >> > first
>> >> > argument as a return value, but that would only be helpful on a target 
>> >> > where
>> >> > the first argument and return value use the same register. So I'd have a
>> >> > slight preference to rewriting per Prathamesh's suggestion above since 
>> >> > it's
>> >> > more general.
>> >> Thanks for the suggestion. The attached patch creates an artificial lhs,
>> >> and returns it if the function returns its argument and that argument
>> >> is used as the return value.
>> >>
>> >> eg:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >> <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return a1_2(D);
>> >>
>> >> }
>> >>
>> >> is transformed to:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   void * _6;
>> >>
>> >> <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return _6;
>> >>
>> >> }
>> >>
>> >> While testing, I came across an issue with function f() defined
>> >> in tail-padding1.C:
>> >> struct X
>> >> {
>> >>   ~X() {}
>> >>   int n;
>> >>   char d;
>> >> };
>> >>
>> >> X f()
>> >> {
>> >>   X nrvo;
>> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >>   return nrvo;
>> >> }
>> >>
>> >> input to the pass:
>> >> X f() ()
>> >> {
>> >> <bb 2> [0.0%]:
>> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >>   # VUSE <.MEM_3>
>> >>   return nrvo_2(D);
>> >>
>> >> }
>> >>
>> >> verify_gimple_return failed with:
>> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >>  }
>> >>  ^
>> >> struct X
>> >>
>> >> struct X &
>> >>
>> >> # VUSE <.MEM_3>
>> >> return _4;
>> >>
>> >> It seems the return type of the function (struct X) differs from the type
>> >> of the return value (struct X&).
>> >> Not sure how this is possible?
>> >
>> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> resolved the error.
>> Does the attached version look OK ?
>
> + ass_var = make_ssa_name (TREE_TYPE (arg));
>
> can you try
>
>   ass_var = copy_ssa_name (arg);
>
> instead?  That way the underlying decl should make sure the
> DECL_BY_REFERENCE check in the IL verification works.
Done in the attached version and verified tail-padding1.C passes with
the change.
Does it look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
>> Validation in progress.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> To work around that, I guarded the transform on:
>> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >>  TREE_TYPE (retval)))

Re: [PATCH PR78559][RFC]Proposed fix

2016-12-01 Thread Segher Boessenkool
Hi!

On Thu, Dec 01, 2016 at 09:47:51AM +, Bin Cheng wrote:
> After investigation, I believe PR78559 is a combine issue revealed by a
> tree-level change.  The root cause is that after replacing the CC register
> use in undobuf.other_insn, its REG_EQUAL/REG_EQUIV notes are no longer valid
> because the meaning of the CC register has been changed in the i2/i3
> instructions by combine.  For the following combine sequence, GCC would try
> to use the note and generate wrong code.  This is a proposed patch discarding
> all REG_EQUAL/REG_EQUIV notes for other_insn.  It might be overkill, but it
> is difficult to analyze whether registers have been changed or not.
> Bootstrapped and tested on x86_64 and AArch64; any suggestion on how to fix
> this?
> 

Why is distribute_notes not called on this?  (Search for "We now know",
25 lines down).  Ah, it is only called on the *new* notes, and it only
deletes existing REG_DEAD/REG_UNUSED notes.  You probably should delete
a REG_EQ* here if the reg it refers to is not the same anymore / does
not have the same contents.  Well, except you cannot see the latter here,
so you'll have to kill the note where the SET of the CC is changed.

I'll have a look later; maybe this already helps though.


Segher


Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:26, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >
> >> >> >>> For the tail-call issue, should we artificially create an lhs and 
> >> >> >>> use that
> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >>>
> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >>> return a1;
> >> >> >>>
> >> >> >>> gets transformed to:
> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >>> return _1;
> >> >> >>>
> >> >> >>> So the tail-call optimization pass would see the IL in its expected 
> >> >> >>> form.
> >> >> >>
> >> >> >>
> >> >> >> As said, an RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> verified again there?
> >> >> >
> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> > Essentially it's an expand-time decision as we use information from 
> >> >> > trees as
> >> >> > well as low level target information.
> >> >> >
> >> >> > I would not expect the former sequence to tail call.  The tail 
> >> >> > calling code
> >> >> > does not know that the return value from memcpy will be a1.  Thus the 
> >> >> > tail
> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> > register after returning from memcpy, which obviously can't be done 
> >> >> > if we
> >> >> > tail called memcpy.
> >> >> >
> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> > because the return value from memcpy will be sitting in the proper 
> >> >> > register.
> >> >> > This form ought to work for most calling conventions that allow tail 
> >> >> > calls.
> >> >> >
> >> >> > We could (in theory) try and exploit the fact that memcpy returns its 
> >> >> > first
> >> >> > argument as a return value, but that would only be helpful on a 
> >> >> > target where
> >> >> > the first argument and return value use the same register. So I'd 
> >> >> > have a
> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
> >> >> > since it's
> >> >> > more general.
> >> >> Thanks for the suggestion. The attached patch creates an artificial lhs,
> >> >> and returns it if the function returns its argument and that argument
> >> >> is used as the return value.
> >> >>
> >> >> eg:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >> <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return a1_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> is transformed to:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   void * _6;
> >> >>
> >> >> <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return _6;
> >> >>
> >> >> }
> >> >>
> >> >> While testing, I came across an issue with function f() defined
> >> >> in tail-padding1.C:
> >> >> struct X
> >> >> {
> >> >>   ~X() {}
> >> >>   int n;
> >> >>   char d;
> >> >> };
> >> >>
> >> >> X f()
> >> >> {
> >> >>   X nrvo;
> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >>   return nrvo;
> >> >> }
> >> >>
> >> >> input to the pass:
> >> >> X f() ()
> >> >> {
> >> >> <bb 2> [0.0%]:
> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >>   # VUSE <.MEM_3>
> >> >>   return nrvo_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> verify_gimple_return failed with:
> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >>  }
> >> >>  ^
> >> >> struct X
> >> >>
> >> >> struct X &
> >> >>
> >> >> # VUSE <.MEM_3>
> >> >> return _4;
> >> >>
> >> >> It seems the return type of the function (struct X) differs from the type
> >> >> of the return value (struct X&).
> >> >> Not sure how this is possible?
> >> >
> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> resolved the error.
> >> Does the attached version look OK ?
> >
> > + ass_var = make_ssa_name (TREE_TYPE (arg));
> >
> > can you try
> >
> >   ass_var = copy_ssa_name (arg);
> >
> > instead?  That way the underlying decl should make sure the
> > DECL_BY_REFERENCE check in the IL verification works.
> Done in the attached version and verified tail-padding1.C passes with the change.

Re: [PATCH] ira: Don't substitute into TRAP_IF insns (PR78610)

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 12:24:37PM +0100, Paolo Bonzini wrote:
> 
> 
> On 30/11/2016 13:46, Segher Boessenkool wrote:
> >if (JUMP_P (use_insn))
> > continue;
> >  
> > +  /* Also don't substitute into a conditional trap insn -- it can 
> > become
> > +an unconditional trap, and that is a flow control insn.  */
> > +  if (GET_CODE (PATTERN (use_insn)) == TRAP_IF)
> > +   continue;
> 
> Should there be a predicate that catches JUMP_Ps but also TRAP_IF?

Maybe.  A conditional TRAP_IF is quite unlike a JUMP, and having two
separate statements here is handy because I need to put that comment
somewhere ;-)


Segher


Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 18:38, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 18:26, Richard Biener  wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
>> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >> >
>> >> >> >>> For the tail-call issue, should we artificially create an lhs and 
>> >> >> >>> use that
>> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >> >>>
>> >> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >> >>> return a1;
>> >> >> >>>
>> >> >> >>> gets transformed to:
>> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >> >>> return _1;
>> >> >> >>>
>> >> >> >>> So the tail-call optimization pass would see the IL in its expected 
>> >> >> >>> form.
>> >> >> >>
>> >> >> >>
>> >> >> >> As said, an RTL expert needs to chime in here.  Iff then tail-call
>> >> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> >> verified again there?
>> >> >> >
>> >> >> > So tail calling actually sits on the border between trees and RTL.
>> >> >> > Essentially it's an expand-time decision as we use information from 
>> >> >> > trees as
>> >> >> > well as low level target information.
>> >> >> >
>> >> >> > I would not expect the former sequence to tail call.  The tail 
>> >> >> > calling code
>> >> >> > does not know that the return value from memcpy will be a1.  Thus 
>> >> >> > the tail
>> >> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> >> > register after returning from memcpy, which obviously can't be done 
>> >> >> > if we
>> >> >> > tail called memcpy.
>> >> >> >
>> >> >> > The second form is much more likely to turn into a tail call sequence
>> >> >> > because the return value from memcpy will be sitting in the proper 
>> >> >> > register.
>> >> >> > This form ought to work for most calling conventions that allow tail 
>> >> >> > calls.
>> >> >> >
>> >> >> > We could (in theory) try and exploit the fact that memcpy returns 
>> >> >> > its first
>> >> >> > argument as a return value, but that would only be helpful on a 
>> >> >> > target where
>> >> >> > the first argument and return value use the same register. So I'd 
>> >> >> > have a
>> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
>> >> >> > since it's
>> >> >> > more general.
>> >> >> Thanks for the suggestion. The attached patch creates an artificial lhs,
>> >> >> and returns it if the function returns its argument and that argument
>> >> >> is used as the return value.
>> >> >>
>> >> >> eg:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >> <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return a1_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> is transformed to:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   void * _6;
>> >> >>
>> >> >> <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return _6;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> While testing, I came across an issue with function f() defined
>> >> >> in tail-padding1.C:
>> >> >> struct X
>> >> >> {
>> >> >>   ~X() {}
>> >> >>   int n;
>> >> >>   char d;
>> >> >> };
>> >> >>
>> >> >> X f()
>> >> >> {
>> >> >>   X nrvo;
>> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >> >>   return nrvo;
>> >> >> }
>> >> >>
>> >> >> input to the pass:
>> >> >> X f() ()
>> >> >> {
>> >> >> <bb 2> [0.0%]:
>> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >> >>   # VUSE <.MEM_3>
>> >> >>   return nrvo_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> verify_gimple_return failed with:
>> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >> >>  }
>> >> >>  ^
>> >> >> struct X
>> >> >>
>> >> >> struct X &
>> >> >>
>> >> >> # VUSE <.MEM_3>
>> >> >> return _4;
>> >> >>
>> >> >> It seems the return type of the function (struct X) differs from the type
>> >> >> of the return value (struct X&).
>> >> >> Not sure how this is possible?
>> >> >
>> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> >> resolved the error.
>> >> Does the attached version look OK ?
>> >
>> > + ass_var = make_ssa_name (TREE_TYPE (arg));
>> >
>> > can you try
>> >
> >> >   ass_var = copy_ssa_name (arg);

Re: [tree-tailcall] Check if function returns its argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:38, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 18:26, Richard Biener  wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
> >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >> >
> >> >> >> >>> For the tail-call issue, should we artificially create an lhs and 
> >> >> >> >>> use that
> >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >> >>>
> >> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >> >>> return a1;
> >> >> >> >>>
> >> >> >> >>> gets transformed to:
> >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >> >>> return _1;
> >> >> >> >>>
> >> >> >> >>> So the tail-call optimization pass would see the IL in its expected 
> >> >> >> >>> form.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> As said, an RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> >> itself should do this rewrite.  But if this form is required to 
> >> >> >> >> make
> >> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> >> verified again there?
> >> >> >> >
> >> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> >> > Essentially it's an expand-time decision as we use information 
> >> >> >> > from trees as
> >> >> >> > well as low level target information.
> >> >> >> >
> >> >> >> > I would not expect the former sequence to tail call.  The tail 
> >> >> >> > calling code
> >> >> >> > does not know that the return value from memcpy will be a1.  Thus 
> >> >> >> > the tail
> >> >> >> > calling code has to assume that it'll have to copy a1 into the 
> >> >> >> > return
> >> >> >> > register after returning from memcpy, which obviously can't be 
> >> >> >> > done if we
> >> >> >> > tail called memcpy.
> >> >> >> >
> >> >> >> > The second form is much more likely to turn into a tail call 
> >> >> >> > sequence
> >> >> >> > because the return value from memcpy will be sitting in the proper 
> >> >> >> > register.
> >> >> >> > This form ought to work for most calling conventions that allow tail 
> >> >> >> > calls.
> >> >> >> >
> >> >> >> > We could (in theory) try and exploit the fact that memcpy returns 
> >> >> >> > its first
> >> >> >> > argument as a return value, but that would only be helpful on a 
> >> >> >> > target where
> >> >> >> > the first argument and return value use the same register. So I'd 
> >> >> >> > have a
> >> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
> >> >> >> > since it's
> >> >> >> > more general.
> >> >> >> Thanks for the suggestion. The attached patch creates an artificial lhs,
> >> >> >> and returns it if the function returns its argument and that argument
> >> >> >> is used as the return value.
> >> >> >>
> >> >> >> eg:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >> <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return a1_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> is transformed to:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   void * _6;
> >> >> >>
> >> >> >> <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return _6;
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> While testing, I came across an issue with function f() defined
> >> >> >> in tail-padding1.C:
> >> >> >> struct X
> >> >> >> {
> >> >> >>   ~X() {}
> >> >> >>   int n;
> >> >> >>   char d;
> >> >> >> };
> >> >> >>
> >> >> >> X f()
> >> >> >> {
> >> >> >>   X nrvo;
> >> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >> >>   return nrvo;
> >> >> >> }
> >> >> >>
> >> >> >> input to the pass:
> >> >> >> X f() ()
> >> >> >> {
> >> >> >> <bb 2> [0.0%]:
> >> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >> >>   # VUSE <.MEM_3>
> >> >> >>   return nrvo_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> verify_gimple_return failed with:
> >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >> >>  }
> >> >> >>  ^
> >> >> >> struct X
> >> >> >>
> >> >> >> struct X &
> >> >> >>
> >> >> >> # VUSE <.MEM_3>
> >> >> >> return _4;
> >> >> >>
> >> >> >> It seems the return type of the function (struct X) differs from the type
> >> >> >> of the return value (struct X&).
> >> >> >> Not sure how this is possible?

PR78629

2016-12-01 Thread Prathamesh Kulkarni
Hi Richard,
I tested your fix for the patch with ubsan stage-1 built gcc, and it
fixes the error.
Is it OK to commit if bootstrap+test passes on x86_64-unknown-linux-gnu ?

Thanks,
Prathamesh
2016-12-01  Richard Biener  
Prathamesh Kulkarni  

PR middle-end/78629
* vec.h (vec::quick_grow_cleared): Guard call to
memset if len-oldlen != 0.
(vec::safe_grow_cleared): Likewise.

diff --git a/gcc/vec.h b/gcc/vec.h
index 14fb2a6..aa93411 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1092,8 +1092,10 @@ inline void
 vec::quick_grow_cleared (unsigned len)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   quick_grow (len);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+memset (&(address ()[oldlen]), 0, sz);
 }
 
 
@@ -1605,8 +1607,10 @@ inline void
 vec::safe_grow_cleared (unsigned len MEM_STAT_DECL)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   safe_grow (len PASS_MEM_STAT);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+memset (&(address ()[oldlen]), 0, sz);
 }
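The point of the guard is that forming `&(address ()[oldlen])` and passing it to memset is undefined when the vector's storage pointer is null (an empty vec), even though the byte count passed would be 0. A stand-alone miniature of the fixed pattern, with hypothetical names (`mini_vec` is not GCC's actual template):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct mini_vec { int *addr; size_t len; };

/* Analogue of vec::quick_grow_cleared with the patch's fix: compute
   the byte count first and do nothing when it is zero, so we never
   index or memset a null storage pointer.  Assumes len >= the current
   length, as the real quick_grow does; error handling for realloc is
   omitted.  */
static void
mini_vec_quick_grow_cleared (struct mini_vec *v, size_t len)
{
  size_t oldlen = v->len;
  size_t sz = sizeof (int) * (len - oldlen);
  if (sz != 0)
    {
      v->addr = realloc (v->addr, sizeof (int) * len);
      memset (&v->addr[oldlen], 0, sz);
    }
  v->len = len;
}
```

Without the `sz != 0` guard, growing an empty vec to length 0 would evaluate `&v->addr[0]` with `v->addr == NULL`, which is exactly what ubsan flagged in PR78629.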
 
 


Re: PR78629

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> I tested your fix for the patch with ubsan stage-1 built gcc, and it
> fixes the error.
> Is it OK to commit if bootstrap+test passes on x86_64-unknown-linux-gnu ?

Ok.

Richard.


Re: [PATCH 5/9] Introduce selftest::locate_file (v4)

2016-12-01 Thread Bernd Schmidt

On 11/11/2016 10:15 PM, David Malcolm wrote:

+  /* Makefile.in has -fself-test=$(srcdir)/testsuite/selftests, so that
+ flag_self_test contains the path to the selftest subdirectory of the
+ source tree (without a trailing slash).  Copy it up to
+ path_to_selftest_files, to avoid selftest.c depending on
+ option-handling.  */
+  path_to_selftest_files = flag_self_test;
+


What kind of dependency are you avoiding? If it's just one include I'd 
prefer to get rid of the extraneous variable.


Otherwise ok.


Bernd



Re: [PATCH] PR rtl-optimization/78596 - combine.c:12561:14: runtime error: left shift of negative value

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 01:34:29PM +0100, Markus Trippelsdorf wrote:
> Hopefully one last patch for UB in combine.c:
> 
>  combine.c:12561:14: runtime error: left shift of negative value -9
> 
> Fixed by casting to unsigned, as usual.
> 
> Tested on ppc64le.
> OK for trunk?

Sure, but please fix the indentation of that last new line (and of the
changelog, too, while you're at it ;-) )


Segher


> -   const_op <<= INTVAL (XEXP (op0, 1));
> +   const_op = (unsigned HOST_WIDE_INT) const_op
> +   << INTVAL (XEXP (op0, 1));
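For reference, the reason the cast fixes the sanitizer error: left-shifting a negative signed value is undefined in C, while the same shift on the corresponding unsigned type is fully defined (modulo 2^width). A tiny stand-alone model of the fixed line, assuming HOST_WIDE_INT corresponds to long long as on common hosts:

```c
#include <assert.h>

/* Model of the fix: shift in the unsigned type, then convert back.
   The conversion of an out-of-range unsigned value back to signed is
   implementation-defined (not undefined), and GCC defines it as
   reduction modulo 2^N.  */
static long long
shift_const_op (long long const_op, int count)
{
  /* was: const_op <<= count;   -- UB for const_op < 0 */
  return (long long) ((unsigned long long) const_op << count);
}
```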


Re: Calling 'abort' on bounds violations in libmpx

2016-12-01 Thread Alexander Ivchenko
Should changing the minor version of the library be enough?

diff --git a/libmpx/mpxrt/libtool-version b/libmpx/mpxrt/libtool-version
index 7d99255..736d763 100644
--- a/libmpx/mpxrt/libtool-version
+++ b/libmpx/mpxrt/libtool-version
@@ -3,4 +3,4 @@
 # a separate file so that version updates don't involve re-running
 # automake.
 # CURRENT:REVISION:AGE
-2:0:0
+2:1:0

(otherwise - no difference).

I've run make check on a non-mpx-enabled machine (no new regressions)
and manually tested newly added environment variable on the mpx
machine. It looks like there is no explicit tests for libmpx, so I'm
not sure what tests should I add. What do you think would be the right
testing process here?

2016-11-29 20:22 GMT+03:00 Ilya Enkovich :
> 2016-11-29 17:43 GMT+03:00 Alexander Ivchenko :
>> Hi,
>>
>> Attached patch is addressing PR67520. Would that approach work for the
>> problem? Should I also change the version of the library?
>
> Hi!
>
> Overall the patch is OK. But you need to change the version because you
> change the default behavior. How did you test it? Did you check that the
> default behavior change doesn't affect existing runtime MPX tests? Can we
> add new ones?
>
> Thanks,
> Ilya
>
>>
>> 2016-11-29  Alexander Ivchenko  
>>
>> * mpxrt/mpxrt-utils.c (set_mpx_rt_stop_handler): New function.
>> (print_help): Add help for CHKP_RT_STOP_HANDLER environment
>> variable.
>> (__mpxrt_init_env_vars): Add initialization of stop_handler.
>> (__mpxrt_stop_handler): New function.
>> (__mpxrt_stop): Ditto.
>> * mpxrt/mpxrt-utils.h (mpx_rt_stop_mode_handler_t): New enum.
>>
>>
>>
>> diff --git a/libmpx/mpxrt/mpxrt-utils.c b/libmpx/mpxrt/mpxrt-utils.c
>> index 057a355..63ee7c6 100644
>> --- a/libmpx/mpxrt/mpxrt-utils.c
>> +++ b/libmpx/mpxrt/mpxrt-utils.c
>> @@ -60,6 +60,9 @@
>>  #define MPX_RT_MODE "CHKP_RT_MODE"
>>  #define MPX_RT_MODE_DEFAULT MPX_RT_COUNT
>>  #define MPX_RT_MODE_DEFAULT_STR "count"
>> +#define MPX_RT_STOP_HANDLER "CHKP_RT_STOP_HANDLER"
>> +#define MPX_RT_STOP_HANDLER_DEFAULT MPX_RT_STOP_HANDLER_ABORT
>> +#define MPX_RT_STOP_HANDLER_DEFAULT_STR "abort"
>>  #define MPX_RT_HELP "CHKP_RT_HELP"
>>  #define MPX_RT_ADDPID "CHKP_RT_ADDPID"
>>  #define MPX_RT_BNDPRESERVE "CHKP_RT_BNDPRESERVE"
>> @@ -84,6 +87,7 @@ typedef struct {
>>  static int summary;
>>  static int add_pid;
>>  static mpx_rt_mode_t mode;
>> +static mpx_rt_stop_mode_handler_t stop_handler;
>>  static env_var_list_t env_var_list;
>>  static verbose_type verbose_val;
>>  static FILE *out;
>> @@ -226,6 +230,23 @@ set_mpx_rt_mode (const char *env)
>>}
>>  }
>>
>> +static mpx_rt_stop_mode_handler_t
>> +set_mpx_rt_stop_handler (const char *env)
>> +{
>> +  if (env == 0)
>> +return MPX_RT_STOP_HANDLER_DEFAULT;
>> +  else if (strcmp (env, "abort") == 0)
>> +return MPX_RT_STOP_HANDLER_ABORT;
>> +  else if (strcmp (env, "exit") == 0)
>> +return MPX_RT_STOP_HANDLER_EXIT;
>> +  {
>> +__mpxrt_print (VERB_ERROR, "Illegal value '%s' for %s. Legal values are "
>> +   "[abort | exit]\nUsing default value %s\n",
>> +   env, MPX_RT_STOP_HANDLER, MPX_RT_STOP_HANDLER_DEFAULT);
>> +return MPX_RT_STOP_HANDLER_DEFAULT;
>> +  }
>> +}
>> +
>>  static void
>>  print_help (void)
>>  {
>> @@ -244,6 +265,11 @@ print_help (void)
>>fprintf (out, "%s \t\t set MPX runtime behavior on #BR exception."
>> " [stop | count]\n"
>> "\t\t\t [default: %s]\n", MPX_RT_MODE, MPX_RT_MODE_DEFAULT_STR);
>> +  fprintf (out, "%s \t set the handler function MPX runtime will call\n"
>> +   "\t\t\t on #BR exception when %s is set to \'stop\'."
>> +   " [abort | exit]\n"
>> +   "\t\t\t [default: %s]\n", MPX_RT_STOP_HANDLER, MPX_RT_MODE,
>> +   MPX_RT_STOP_HANDLER_DEFAULT_STR);
>>fprintf (out, "%s \t\t generate out,err file for each process.\n"
>> "\t\t\t generated file will be MPX_RT_{OUT,ERR}_FILE.pid\n"
>> "\t\t\t [default: no]\n", MPX_RT_ADDPID);
>> @@ -357,6 +383,10 @@ __mpxrt_init_env_vars (int* bndpreserve)
>>env_var_list_add (MPX_RT_MODE, env);
>>mode = set_mpx_rt_mode (env);
>>
>> +  env = secure_getenv (MPX_RT_STOP_HANDLER);
>> +  env_var_list_add (MPX_RT_STOP_HANDLER, env);
>> +  stop_handler = set_mpx_rt_stop_handler (env);
>> +
>>env = secure_getenv (MPX_RT_BNDPRESERVE);
>>env_var_list_add (MPX_RT_BNDPRESERVE, env);
>>validate_bndpreserve (env, bndpreserve);
>> @@ -487,6 +517,22 @@ __mpxrt_mode (void)
>>return mode;
>>  }
>>
>> +mpx_rt_stop_mode_handler_t
>> +__mpxrt_stop_handler (void)
>> +{
>> +  return stop_handler;
>> +}
>> +
>> +void __attribute__ ((noreturn))
>> +__mpxrt_stop (void)
>> +{
>> +  if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_ABORT)
>> +abort ();
>> +  else if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_EXIT)
>> +exit (255);
>> +  __builtin_unreachable ();
>> +}
>> +
>>  void
>>  __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size)
>>  {
>> diff --git a/libmpx/mpxrt/mpxrt-utils.h b/libmpx/mpxrt/mpxrt-utils.h
>> index d62937d..6da12cc 100644
>> --- a/libmpx/mpxrt/mpxrt-utils.h
>> +++ b/libmpx

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Yuri Rumyantsev
Thanks Richard for your comments.

You asked me about possible performance improvements for AVX2 machines
- we did not see any visible speed-up for spec2k with any method of
masking, including epilogue masking and combining, only on AVX512
machine aka knl.

I will answer on your question later.

Best regards.
Yuri

2016-12-01 14:33 GMT+03:00 Richard Biener :
> On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard!
>>
>> I attached a vect dump for the part of the attached test-case which
>> illustrates how vectorization of epilogues works through masking:
>> #define SIZE 1023
>> #define ALIGN 64
>>
>> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
>> __SIZE_TYPE__ size) __attribute__((weak));
>> extern void free (void *);
>>
>> void __attribute__((noinline))
>> test_citer (int * __restrict__ a,
>>int * __restrict__ b,
>>int * __restrict__ c)
>> {
>>   int i;
>>
>>   a = (int *)__builtin_assume_aligned (a, ALIGN);
>>   b = (int *)__builtin_assume_aligned (b, ALIGN);
>>   c = (int *)__builtin_assume_aligned (c, ALIGN);
>>
>>   for (i = 0; i < SIZE; i++)
>> c[i] = a[i] + b[i];
>> }
>>
>> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
>>
>> I did not include in this patch vectorization of low trip-count loops
>> since in the original patch additional parameter was introduced:
>> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
>> +  "vect-short-loops",
>> +  "Enable vectorization of low trip count loops using masking.",
>> +  0, 0, 1)
>>
>> I assume that this ability can be included very quickly but it
>> requires cost model enhancements also.
>
> Comments on the patch itself (as I'm having a closer look again,
> I know how it vectorizes the above but I wondered why epilogue
> and short-trip loops are not basically the same code path).
>
> Btw, I don't like that the features are behind a --param paywall.
> That just means a) nobody will use it, b) it will bit-rot quickly,
> c) bugs are well-hidden.
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && integer_zerop (nested_in_vect_loop
> +   ? STMT_VINFO_DR_STEP (stmt_info)
> +   : DR_STEP (dr)))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"allow invariant load for masked loop.\n");
> +}
>
> this can test memory_access_type == VMAT_INVARIANT.  Please put
> all the checks in a common
>
>   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> {
>if (memory_access_type == VMAT_INVARIANT)
>  {
>  }
>else if (...)
>  {
> LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>  }
>else if (..)
> ...
> }
>
> @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
>gcc_assert (!nested_in_vect_loop);
>gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: grouped access is not"
> +" supported.");
> + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  }
> +
>
> isn't this already handled by the above?  Or rather the general
> disallowance of SLP?
>
> @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
> &memory_access_type, &gs_info))
>  return false;
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && memory_access_type != VMAT_CONTIGUOUS)
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported memory access
> type.\n");
> +}
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && !can_mask_load_store (stmt))
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported masked store.\n");
> +}
> +
>
> likewise please combine the ifs.
>
> @@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt,
> gimple_stmt_iterator *gsi,
>   ptr, vec_mask, vec_rhs);
>   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>   if (i == 0)
> -   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +   {
> + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> + STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
> +   }
>   else
> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>   prev_stmt_in

Re: [ARM] PR 78253 do not resolve weak ref locally

2016-12-01 Thread Christophe Lyon
Hi,


On 10 November 2016 at 15:10, Christophe Lyon
 wrote:
> On 10 November 2016 at 11:05, Richard Earnshaw
>  wrote:
>> On 09/11/16 21:29, Christophe Lyon wrote:
>>> Hi,
>>>
>>> PR 78253 shows that the handling of weak references has changed for
>>> ARM with gcc-5.
>>>
>>> When r220674 was committed, default_binds_local_p_2 gained a new
>>> parameter (weak_dominate), which, when true, implies that a reference
>>> to a weak symbol defined locally will be resolved locally, even though
>>> it could be overridden by a strong definition in another object file.
>>>
>>> With r220674, default_binds_local_p forces weak_dominate=true,
>>> effectively changing the previous behavior.
>>>
>>> The attached patch introduces default_binds_local_p_4 which is a copy
>>> of default_binds_local_p_2, but using weak_dominate=false, and updates
>>> the ARM target to call default_binds_local_p_4 instead of
>>> default_binds_local_p_2.
>>>
>>> I ran cross-tests on various arm* configurations with no regression,
>>> and checked that the test attached to the original bugzilla now works
>>> as expected.
>>>
>>> I am not sure why weak_dominate defaults to true, and I couldn't
>>> really understand why by reading the threads related to r220674 and
>>> following updates to default_binds_local_p_* which all deal with other
>>> corner cases and do not discuss the weak_dominate parameter.
>>>
>>> Or should this patch be made more generic?
>>>
>>
>> I certainly don't think it should be ARM specific.
> That was my feeling too.
>
>>
>> The questions I have are:
>>
>> 1) What do other targets do today.  Are they the same, or different?
>
> arm, aarch64, s390 use default_binds_local_p_2 since PR 65780, and
> default_binds_local_p before that. Both have weak_dominate=true
> i386 has its own version, calling default_binds_local_p_3 with true
> for weak_dominate
>
> But the behaviour of default_binds_local_p changed with r220674 as I said 
> above.
> See https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=220674 and
> notice how weak_dominate was introduced
>
> The original bug report is about a different case:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219
>
> The original patch submission is
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00410.html
> and the 1st version with weak_dominate is in:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00469.html
> but it's not clear to me why this was introduced
>
>> 2) If different why?
> on aarch64, although binds_local_p returns true, the relocations used when
> building the function pointer are still the same (still via the GOT).
>
> aarch64 has different logic than arm when accessing a symbol
> (eg aarch64_classify_symbol)
>
>> 3) Is the current behaviour really what was intended by the patch?  ie.
>> Was the old behaviour actually wrong?
>>
> That's what I was wondering.
> Before r220674, calling a weak function directly or via a function
> pointer had the same effect (in other words, the function pointer
> points to the actual implementation: the strong one if any, the weak
> one otherwise).
>
> After r220674, on arm the function pointer points to the weak
> definition, which seems wrong to me, it should leave the actual
> resolution to the linker.
>
>

After looking at the aarch64 port, I think that references to weak symbols
have to be handled carefully, to make sure they cannot be resolved
by the assembler, since the weak symbol can be overridden by a strong
definition at link-time.
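
The behaviour at stake can be illustrated outside the compiler. Below is a minimal sketch (not part of the patch; the function names are made up) showing why a reference to a weak symbol must be left for the linker: a direct call and a call through a function pointer have to agree on whichever definition, weak or strong, the linker finally picks.

```c
#include <assert.h>

/* A weak default that a strong definition in another object file may
   override at link time.  The compiler must not bind references to it
   locally, or the two call paths below could diverge.  */
__attribute__((weak)) int answer (void)
{
  return 42;
}

/* Direct call: resolved through the linker.  */
int call_direct (void)
{
  return answer ();
}

/* Indirect call: the address taken here must also be the final,
   link-time resolved address, not the local weak definition.  */
int call_via_pointer (void)
{
  int (*fp) (void) = answer;
  return fp ();
}
```

When this file is linked alone, both paths return the weak definition's value; when linked with a strong `answer`, both must return the strong one. The bug was that on ARM the pointer could keep pointing at the weak copy.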

Here is a new patch which does that.
Validated on arm* targets with no regression, and I checked that the
original testcase now executes as expected.

Christophe


>> R.
>>> Thanks,
>>>
>>> Christophe
>>>
>>
gcc/ChangeLog:

2016-12-01  Christophe Lyon  

PR target/78253
* config/arm/arm.c (legitimize_pic_address): Handle reference to
weak symbol.
(arm_assemble_integer): Likewise.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 74cb64c..258ceb1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6923,10 +6923,13 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 same segment as the GOT.  Unfortunately, the flexibility of linker
 scripts means that we can't be sure of that in general, so assume
 that GOTOFF is never valid on VxWorks.  */
+  /* References to weak symbols cannot be resolved locally: they
+may be overridden by a strong definition at link time.  */
   rtx_insn *insn;
   if ((GET_CODE (orig) == LABEL_REF
-  || (GET_CODE (orig) == SYMBOL_REF &&
-  SYMBOL_REF_LOCAL_P (orig)))
+  || (GET_CODE (orig) == SYMBOL_REF
+  && SYMBOL_REF_LOCAL_P (orig)
+  && (SYMBOL_REF_DECL(orig) ? !DECL_WEAK(SYMBOL_REF_DECL(orig)) : 1)))
  && NEED_GOT_RELOC
  && arm_pic_data_is_text_relative)
insn = arm_pic_static_addr (orig, reg);
@@ -21583,8 +21586,13 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
 

[patch,avr] Document how to avoid progmem on AVR_TINY.

2016-12-01 Thread Georg-Johann Lay
This adds to the documentation a hint on how to set up a linker description
file that avoids progmem altogether and without the usual overhead of
locating read-only data in RAM.  The proposed linker description file is
completely transparent to the compiler, and no start-up code has to be
adjusted.


IIUC there are currently no plans to fix this in the default linker 
description file avrtiny.x, cf. http://sourceware.org/PR20849


Also, add a cross-link between the -mabsdata option and the absdata variable attribute.

Ok for trunk?


Johann


gcc/
* doc/invoke.texi (AVR Options) [-mabsdata]: Point to absdata.
* doc/extend.texi (AVR Variable Attributes) [progmem]: Hint
about linker description to avoid progmem altogether.
[absdata]: Point to -mabsdata option.

Index: doc/extend.texi
===
--- doc/extend.texi	(revision 243111)
+++ doc/extend.texi	(working copy)
@@ -5929,6 +5929,30 @@ int read_var (int i)
 @}
 @end smallexample
 
+Please notice that on these devices, there is no need for @code{progmem}
+at all.  Just use an appropriate linker description file as outlined below.
+
+@smallexample
+  .text :
+  @{ ...
+  @} > text
+  /* Leave .rodata in flash and add an offset of 0x4000 to all
+ addresses so that respective objects can be accessed by LD
+ instructions and open coded C/C++.  This means there is no
+ need for progmem in the source and no overhead by read-only
+ data in RAM.  */
+  .rodata ADDR(.text) + SIZEOF (.text) + 0x4000 :
+  @{
+*(.rodata)
+*(.rodata*)
+*(.gnu.linkonce.r*)
+  @} AT> text
+  /* No more need to put .rodata into .data:
+ Removed all .rodata entries from .data.  */
+  .data :
+  @{ ...
+@end smallexample
+
 @end table
 
 @item io
@@ -6001,6 +6025,8 @@ warning like
 
 @end itemize
 
+See also the @option{-mabsdata} @ref{AVR Options,command-line option}.
+
 @end table
 
 @node Blackfin Variable Attributes
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 243111)
+++ doc/invoke.texi	(working copy)
@@ -15402,7 +15402,8 @@ GCC supports the following AVR devices a
 
 Assume that all data in static storage can be accessed by LDS / STS
 instructions.  This option has only an effect on reduced Tiny devices like
-ATtiny40.
+ATtiny40.  See also the @code{absdata}
+@ref{AVR Variable Attributes,variable attribute}.
 
 @item -maccumulate-args
 @opindex maccumulate-args


Re: [RS6000] fix rtl checking internal compiler error

2016-12-01 Thread Bill Schmidt
Good catch, Alan, this one is my fault.  I'll handle the backports to the 5 and 
6 branches.

Bill

> On Dec 1, 2016, at 12:34 AM, Alan Modra  wrote:
> 
> I'm committing this one as obvious once my powerpc64le-linux bootstrap
> and regression check completes.  It fixes hundreds of rtl checking
> testsuite errors like the following:
> 
> gcc.c-torture/compile/pr39943.c:6:1: internal compiler error: RTL check: 
> expected elt 0 type 'e' or 'u', have 'E' (rtx unspec) in insn_is_swappable_p, 
> at config/rs6000/rs6000.c:40678
> 
>   * gcc/config/rs6000/rs6000.c (insn_is_swappable_p): Properly
>   look inside UNSPEC_VSX_XXSPLTW vec.
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 9fe98b7..7f307b1 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -40675,7 +40675,7 @@ insn_is_swappable_p (swap_web_entry *insn_entry, rtx 
> insn,
>   if (GET_CODE (use_body) != SET
>   || GET_CODE (SET_SRC (use_body)) != UNSPEC
>   || XINT (SET_SRC (use_body), 1) != UNSPEC_VSX_XXSPLTW
> - || XEXP (XEXP (SET_SRC (use_body), 0), 1) != const0_rtx)
> + || XVECEXP (SET_SRC (use_body), 0, 1) != const0_rtx)
> return 0;
> }
>   }
> 
> -- 
> Alan Modra
> Australia Development Lab, IBM
> 



Re: [PATCH] Fix PR tree-optimization/78598 - tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 1:49 PM, Markus Trippelsdorf
 wrote:
> Using bootstrap-ubsan gcc to build mplayer shows:
>
> tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow:
> 288230376151711743 * 64 cannot be represented in type 'long int'
>
> Here signed and unsigned integers are mixed in a division, resulting in
> bogus results: (-83 + 64ULL - 1) / 64ULL == 288230376151711743
>
> Fixed by casting the unsigned parameter to signed.
>
> Tested on ppc64le.
> OK for trunk?

Ok.

Richard.

> Thanks.
>
> PR tree-optimization/78598
> * tree-ssa-loop-prefetch.c (ddown): Cast to signed to avoid
> overflows.
>
>
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 0a2ee5ea25fd..ead2543ada46 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -700,9 +700,9 @@ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
>gcc_assert (by > 0);
>
>if (x >= 0)
> -return x / by;
> +return x / (HOST_WIDE_INT) by;
>else
> -return (x + by - 1) / by;
> +return (x + (HOST_WIDE_INT) by - 1) / (HOST_WIDE_INT) by;
>  }
>
>  /* Given a CACHE_LINE_SIZE and two inductive memory references
> --
> Markus
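
The pitfall behind this fix is easy to reproduce outside GCC. A minimal sketch (using stand-in types, not the GCC sources) of how the usual arithmetic conversions corrupt the rounding helper when the divisor is unsigned:

```c
/* Buggy variant, mirroring ddown before the patch: mixing a signed
   dividend with an unsigned divisor promotes the whole expression to
   unsigned, so a negative x wraps around before the division.  */
long ddown_buggy (long x, unsigned long by)
{
  return x >= 0 ? x / by : (x + by - 1) / by;  /* wraps for x < 0 */
}

/* Fixed variant: cast the divisor to signed, as the patch does, so a
   negative dividend stays negative through the division.  */
long ddown_fixed (long x, unsigned long by)
{
  return x >= 0 ? x / (long) by
                : (x + (long) by - 1) / (long) by;
}
```

With x = -83 and by = 64 the buggy variant yields the huge positive value quoted above, while the fixed variant stays in range.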


Re: [PATCH 8/9] Introduce class function_reader (v4)

2016-12-01 Thread Bernd Schmidt

On 11/11/2016 10:15 PM, David Malcolm wrote:

 #include "gt-aarch64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6c608e0..0dda786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c


I think we should separate out the target specific tests so as to give 
port maintainers a chance to comment on them separately.



diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 50cd388..179a91f 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1371,6 +1371,19 @@ maybe_set_first_label_num (rtx_code_label *x)
   if (CODE_LABEL_NUMBER (x) < first_label_num)
 first_label_num = CODE_LABEL_NUMBER (x);
 }
+
+/* For use by the RTL function loader, when mingling with normal
+   functions.


Not sure what this means.



   if (str == 0)
-   fputs (" \"\"", m_outfile);
+   fputs (" (nil)", m_outfile);
   else
fprintf (m_outfile, " (\"%s\")", str);
   m_sawclose = 1;


What does this affect?


 /* Global singleton; constrast with md_reader_ptr above.  */
diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
new file mode 100644
index 000..ff6c808
--- /dev/null
+++ b/gcc/read-rtl-function.c
@@ -0,0 +1,2124 @@
+/* read-rtl-function.c - Reader for RTL function dumps
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+#include 


Please double-check all these includes whether they are necessary.


+
+/* Fix up a NOTE_INSN_BASIC_BLOCK based on an integer block ID.  */
+
+void
+fixup_note_insn_basic_block::apply (function_reader */*reader*/) const


Lose the /*reader*/, probably.


+
+/* Implementation of rtx_reader::handle_unknown_directive.
+
+   Require a top-level "function" elements, as emitted by
+   print_rtx_function, and parse it.  */


"element"?


+void
+function_reader::create_function ()
+{
+  /* Currently we assume cfgrtl mode, rather than cfglayout mode.  */
+  if (0)
+cfg_layout_rtl_register_cfg_hooks ();
+  else
+rtl_register_cfg_hooks ();


Do we expect to change this? I'd just get rid of the if (0), at least 
for now.



+/* cgraph_node::add_new_function does additional processing
+   based on symtab->state.  We need to avoid it attempting to gimplify
+   things.  Temporarily putting it in the PARSING state appears to
+   achieve this.  */
+enum symtab_state old_state = symtab->state;
+symtab->state = PARSING;
+cgraph_node::add_new_function (fndecl, true /*lowered*/);
+/* Reset the state.  */
+symtab->state = old_state;
+  }


Does it do anything beside call finalize_function in that state? If 
that's all you need, just call it directly.



+
+  /* Parse DECL_RTL.  */
+  {
+require_char_ws ('(');
+require_word_ws ("DECL_RTL");
+DECL_WRTL_CHECK (t_param)->decl_with_rtl.rtl = parse_rtx ();
+require_char_ws (')');
+  }


Spurious { } blocks.


+  if (0)
+fprintf (stderr, "parse_edge: %i flags 0x%x \n",
+other_bb_idx, flags);


Remove this.

+  /* For now, only process the (edge-from) to this BB, and (edge-to)
+ that go to the exit block; we don't yet verify that the edge-from
+ and edge-to directives are consistent.  */


That's probably worth a FIXME.


+  if (rtx_code_label *label = dyn_cast  (insn))
+maybe_set_max_label_num (label);


I keep forgetting why dyn_cast instead of as_a?


+case 'e':
+  {
+   if (idx == 7 && CALL_P (return_rtx))
+ {
+   m_in_call_function_usage = true;
+   return rtx_reader::read_rtx_operand (return_rtx, idx);
+   m_in_call_function_usage = false;
+ }
+   else
+ return rtx_reader::read_rtx_operand (return_rtx, idx);
+  }
+  break;


Unnecessary { } blocks in several places.


+
+case 'w':
+  {
+   if (!is_compact ())
+ {
+   /* Strip away the redundant hex dump of the value.  */
+   require_char_ws ('[');
+   read_name (&name);
+   require_char_ws (']');
+ }
+  }
+  break;


Here too.


+
+/* Special-cased handling of codes 'i' and 'n' for reading function
+   dumps.  */
+
+void
+function_reader::read_rtx_operand_i_or_n (rtx return_rtx, int idx,
+ char format_char)


Document arguments (everywhere). I think return_rtx (throughout these 
functions) is a poor name that can cause confusion because it seems to 
imply a (return).



+
+  /* Possibly wrote:
+print_node_brief (outfile, "", SYMBOL_REF_DECL (in_rtx),
+  dump_flags);  */


???


+ /* Skip the content for now.  */


Does this relate to the above? Please clarify the comments.


+  case CODE_LABEL:
+   {
+ /* Assume that LABEL_NUSES was not dumped.  */
+ /* TODO: parse LABEL_KIND.  */


Unnecessary { }.


+  if (0 == strcmp (desc, ""))
+{
+  re

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> 
> You asked me about possible performance improvements for AVX2 machines
> - we did not see any visible speed-up for spec2k with any method of

Spec 2000?  Can you check with SPEC 2006 or CPUv6?

Did you see performance degradation?  What about compile-time and
binary size effects?

> masking, including epilogue masking and combining, only on AVX512
> machine aka knl.

I see.

Note that as said in the initial review patch the cost model I
saw therein looked flawed.  In the end I'd expect a sensible
approach would be to do

 if (n < scalar-most-profitable-niter)
   {
 no vectorization
   }
 else if (n < masking-more-profitable-than-not-masking-plus-epilogue)
   {
 do masked vectorization
   }
 else
   {
 do unmasked vectorization (with epilogue, eventually vectorized)
   }

where for short trip loops the else path would never be taken
(statically).

And yes, that means masking will only be useful for short-trip loops
which in the end means an overall performance benefit is unlikely
unless we have a lot of short-trip loops that are slow because of
the overhead of main unmasked loop plus epilogue.
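
As a scalar model of the last branch above (assuming a vector factor of 4; this is an illustration, not the vectorizer's actual output), unmasked vectorization plus an epilogue looks like:

```c
/* Scalar sketch (assumed VF = 4) of an unmasked vector body plus an
   epilogue: the main loop handles full vector-width chunks and the
   tail loop the remaining n % 4 iterations.  Masked vectorization
   would instead fold the tail into one predicated vector step, which
   only pays off when the tail dominates, i.e. for short-trip loops.  */
void add_arrays (int *restrict c, const int *restrict a,
                 const int *restrict b, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)   /* unmasked "vector" body */
    {
      c[i]     = a[i]     + b[i];
      c[i + 1] = a[i + 1] + b[i + 1];
      c[i + 2] = a[i + 2] + b[i + 2];
      c[i + 3] = a[i + 3] + b[i + 3];
    }
  for (; i < n; i++)           /* epilogue for the short tail */
    c[i] = a[i] + b[i];
}
```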

Richard.

> I will answer on your question later.
> 
> Best regards.
> Yuri
> 
> 2016-12-01 14:33 GMT+03:00 Richard Biener :
> > On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard!
> >>
> >> I attached a vect dump for the part of the attached test-case which
> >> illustrates how vectorization of epilogues works through masking:
> >> #define SIZE 1023
> >> #define ALIGN 64
> >>
> >> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
> >> __SIZE_TYPE__ size) __attribute__((weak));
> >> extern void free (void *);
> >>
> >> void __attribute__((noinline))
> >> test_citer (int * __restrict__ a,
> >>int * __restrict__ b,
> >>int * __restrict__ c)
> >> {
> >>   int i;
> >>
> >>   a = (int *)__builtin_assume_aligned (a, ALIGN);
> >>   b = (int *)__builtin_assume_aligned (b, ALIGN);
> >>   c = (int *)__builtin_assume_aligned (c, ALIGN);
> >>
> >>   for (i = 0; i < SIZE; i++)
> >> c[i] = a[i] + b[i];
> >> }
> >>
> >> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
> >>
> >> I did not include in this patch vectorization of low trip-count loops
> >> since in the original patch additional parameter was introduced:
> >> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
> >> +  "vect-short-loops",
> >> +  "Enable vectorization of low trip count loops using masking.",
> >> +  0, 0, 1)
> >>
> >> I assume that this ability can be included very quickly but it
> >> requires cost model enhancements also.
> >
> > Comments on the patch itself (as I'm having a closer look again,
> > I know how it vectorizes the above but I wondered why epilogue
> > and short-trip loops are not basically the same code path).
> >
> > Btw, I don't like that the features are behind a --param paywall.
> > That just means a) nobody will use it, b) it will bit-rot quickly,
> > c) bugs are well-hidden.
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && integer_zerop (nested_in_vect_loop
> > +   ? STMT_VINFO_DR_STEP (stmt_info)
> > +   : DR_STEP (dr)))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"allow invariant load for masked loop.\n");
> > +}
> >
> > this can test memory_access_type == VMAT_INVARIANT.  Please put
> > all the checks in a common
> >
> >   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > {
> >if (memory_access_type == VMAT_INVARIANT)
> >  {
> >  }
> >else if (...)
> >  {
> > LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> >  }
> >else if (..)
> > ...
> > }
> >
> > @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> >gcc_assert (!nested_in_vect_loop);
> >gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"cannot be masked: grouped access is not"
> > +" supported.");
> > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  }
> > +
> >
> > isn't this already handled by the above?  Or rather the general
> > disallowance of SLP?
> >
> > @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > &memory_access_type, &gs_info))
> >  return false;
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && memory_access_type != VMAT_CONTIGUOUS)
> > +{
> > +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  if (dump_enabled_p (

[PATCH] S/390: Fix setmem-long test.

2016-12-01 Thread Dominik Vogt
The attached patch fixes the setmem_long-1.c S/390 backend test.

Adding a " in the scan-assembler pattern is necessary because of a
recent change in print-rtl.c.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog-setmem-long-test

* gcc.target/s390/md/setmem_long-1.c: Fix test.
>From 6582cbb17262b8559f632fdb9bdc30ef8e9db1c3 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 25 Nov 2016 17:44:12 +0100
Subject: [PATCH] S/390: Fix setmem-long test.

The test needs to take care of extra quotes around the file name that have been
introduced recently.
---
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
index 933a698..bd0c594 100644
--- a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
+++ b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
@@ -16,8 +16,8 @@ void test2(char *p, int c, int len)
 }
 
 /* Check that the right patterns are used.  */
-/* { dg-final { scan-assembler-times {c:9 .*{[*]setmem_long_?3?1?z?}} 1 } } */
-/* { dg-final { scan-assembler-times {c:15 .*{[*]setmem_long_and_?3?1?z?}} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {c"?:9 .*{[*]setmem_long_?3?1?z?}} 1 } } */
+/* { dg-final { scan-assembler-times {c"?:15 .*{[*]setmem_long_and_?3?1?z?}} 1 { xfail *-*-* } } } */
 
 #define LEN 500
 char buf[LEN + 2];
-- 
2.3.0



Re: Import libcilkrts Build 4467 (PR target/68945)

2016-12-01 Thread Rainer Orth
Hi Jeff,

>> The following patch has passed x86_64-pc-linux-gnu bootstrap without
>> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
>> are currently running.
>>
>> Ok for mainline if they pass?
> Yes.  Sorry for not getting back to you sooner.

no worries; I've been on vacation for a week anyway.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch 0/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread James Greenhalgh
Hi,

In PR78561, we try to make use of stale constant pool offset data when
making decisions as to whether to output an alignment directive after
the AArch64 constant pool. The offset data goes stale as we only ever
increment it when adding new constants to the pool (it represents an
upper bound on the size of the pool).

To fix that, we should recompute the offset values shortly after
sweeping through insns looking for valid constant.

That's easy enough to do (see patch 2/2) and patch 1/2 is just a simple
rename of the get_pool_size function to reflect that it is not providing
an accurate size, just an upper bound on what the size might be after
optimisation.

Technically, patch 1/2 isn't necessary to fix the PR, but cleaning up the
name seems like a useful thing to do.

The patch set has been bootstrapped and tested on aarch64-none-linux-gnu and
x86-64-none-linux-gnu without any issues. I've also cross-tested it for
aarch64-none-elf and build-tested it for rs6000 (though I couldn't run the
testsuite as I don't have a test environment).

OK?

Thanks,
James
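
The staleness the cover letter describes can be sketched in a few lines (hypothetical names; the real code lives in varasm.c). An offset counter that is only ever incremented on insertion becomes an upper bound once entries stop being referenced, so any decision based on it must recompute from the surviving entries:

```c
#include <stddef.h>

/* Toy model of a constant pool whose recorded size only ever grows.  */
struct pool_entry { int live; size_t size; struct pool_entry *next; };
struct pool { struct pool_entry *entries; size_t offset; /* upper bound */ };

static void pool_add (struct pool *p, struct pool_entry *e)
{
  e->next = p->entries;
  p->entries = e;
  p->offset += e->size;   /* only incremented, so it goes stale when
                             entries later become unreferenced */
}

/* Analogue of the patch's recompute_pool_offsets: rebuild the size
   from the entries that will actually be emitted.  */
static size_t pool_recompute (struct pool *p)
{
  size_t off = 0;
  for (struct pool_entry *e = p->entries; e; e = e->next)
    if (e->live)
      off += e->size;
  p->offset = off;
  return off;
}
```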

---

[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.

[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* varasm.c (recompute_pool_offsets): New.
(output_constant_pool): Call it.

gcc/testsuite/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: New.

---

 gcc/config/rs6000/rs6000.c | 23 +--
 gcc/output.h   |  7 +--
 gcc/testsuite/gcc.target/aarch64/pr78561.c |  9 +
 gcc/varasm.c   | 30 +-
 4 files changed, 56 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr78561.c



[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

2016-12-01 Thread James Greenhalgh

Hi,

There's no functional change in this patch, just a rename.

The size recorded in "offset" is only ever incremented as we add new items
to the constant pool. But this information can become stale where those
constant pool entries would not get emitted.

Thus, it is only ever an upper bound on the size of the constant pool.

The only uses of get_pool_size are in rs6000 and there it is only used to
check whether a constant pool might be output - but explicitly renaming the
function to make it clear that you're getting an upper bound rather than the
real size can only be good for programmers using the interface.

OK?

Thanks,
James

---
2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0a6a784..7e965f9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25456,7 +25456,7 @@ rs6000_reg_live_or_pic_offset_p (int reg)
   if (TARGET_TOC && TARGET_MINIMAL_TOC
 	  && (crtl->calls_eh_return
 	  || df_regs_ever_live_p (reg)
-	  || get_pool_size ()))
+	  || get_pool_size_upper_bound ()))
 	return true;
 
   if ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)
@@ -26262,7 +26262,7 @@ rs6000_stack_info (void)
 #ifdef TARGET_RELOCATABLE
   || (DEFAULT_ABI == ABI_V4
 	  && (TARGET_RELOCATABLE || flag_pic > 1)
-	  && get_pool_size () != 0)
+	  && get_pool_size_upper_bound () != 0)
 #endif
   || rs6000_ra_ever_killed ())
 info->lr_save_p = 1;
@@ -28039,7 +28039,8 @@ rs6000_emit_prologue (void)
   cfun->machine->r2_setup_needed = df_regs_ever_live_p (TOC_REGNUM);
 
   /* With -mminimal-toc we may generate an extra use of r2 below.  */
-  if (TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+  if (TARGET_TOC && TARGET_MINIMAL_TOC
+	  && get_pool_size_upper_bound () != 0)
 	cfun->machine->r2_setup_needed = true;
 }
 
@@ -28894,7 +28895,8 @@ rs6000_emit_prologue (void)
 
   /* If we are using RS6000_PIC_OFFSET_TABLE_REGNUM, we need to set it up.  */
   if (!TARGET_SINGLE_PIC_BASE
-  && ((TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+  && ((TARGET_TOC && TARGET_MINIMAL_TOC
+	   && get_pool_size_upper_bound () != 0)
 	  || (DEFAULT_ABI == ABI_V4
 	  && (flag_pic == 1 || (flag_pic && TARGET_SECURE_PLT))
 	  && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM
@@ -34961,7 +34963,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   if (DEFAULT_ABI == ABI_V4
   && (TARGET_RELOCATABLE || flag_pic > 1)
   && !TARGET_SECURE_PLT
-  && (get_pool_size () != 0 || crtl->profile)
+  && (get_pool_size_upper_bound () != 0 || crtl->profile)
   && uses_TOC ())
 {
   char buf[256];
@@ -37444,10 +37446,11 @@ static bool
 rs6000_can_eliminate (const int from, const int to)
 {
   return (from == ARG_POINTER_REGNUM && to == STACK_POINTER_REGNUM
-  ? ! frame_pointer_needed
-  : from == RS6000_PIC_OFFSET_TABLE_REGNUM
-? ! TARGET_MINIMAL_TOC || TARGET_NO_TOC || get_pool_size () == 0
-: true);
+	  ? ! frame_pointer_needed
+	  : from == RS6000_PIC_OFFSET_TABLE_REGNUM
+	? ! TARGET_MINIMAL_TOC || TARGET_NO_TOC
+		|| get_pool_size_upper_bound () == 0
+	: true);
 }
 
 /* Define the offset between two registers, FROM to be eliminated and its
@@ -38983,7 +38986,7 @@ rs6000_set_up_by_prologue (struct hard_reg_set_container *set)
   if (!TARGET_SINGLE_PIC_BASE
   && TARGET_TOC
   && TARGET_MINIMAL_TOC
-  && get_pool_size () != 0)
+  && get_pool_size_upper_bound () != 0)
 add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
   if (cfun->machine->split_stack_argp_used)
 add_to_hard_reg_set (&set->set, Pmode, 12);
diff --git a/gcc/output.h b/gcc/output.h
index 0924499..7186dc1 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -287,8 +287,11 @@ extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned,
 /* Write the address of the entity given by SYMBOL to SEC.  */
 extern void assemble_addr_to_section (rtx, section *);
 
-/* Return the size of the constant pool.  */
-extern int get_pool_size (void);
+/* Return the maximum size of the constant pool.  This may be larger
+   than the final size of the constant pool, as entries may be added to
+   the constant pool which become unreferenced, or otherwise not need
+   output by the time we actually emit the pool.  */
+extern int get_pool_size_upper_bound (void);

[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread James Greenhalgh

Hi,

In PR78561, we try to make use of stale constant pool offset data when
making decisions as to whether to output an alignment directive after
the AArch64 constant pool. The offset data goes stale as we only ever
increment it when adding new constants to the pool (it represents an
upper bound on the size of the pool).

To fix that, we should recompute the offset values shortly after
sweeping through insns looking for valid constants.

I'm not totally sure about this code so I'd appreciate comments on whether
this is a sensible idea.

Bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu and
checked with aarch64-none-elf with no issues.

OK?

Thanks,
James

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* varasm.c (recompute_pool_offsets): New.
(output_constant_pool): Call it.

gcc/testsuite/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: New.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr78561.c b/gcc/testsuite/gcc.target/aarch64/pr78561.c
new file mode 100644
index 000..048d2d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78561.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -O3 -mcmodel=tiny" } */
+
+int
+main (__fp16 x)
+{
+  __fp16 a = 6.5504e4;
+  return (x <= a);
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index f8af0c1..f3cd70a 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -3942,6 +3942,29 @@ output_constant_pool_1 (struct constant_descriptor_rtx *desc,
   return;
 }
 
+/* Recompute the offsets of entries in POOL, and the overall size of
+   POOL.  Do this after calling mark_constant_pool to ensure that we
+   are computing the offset values for the pool which we will actually
+   emit.  */
+
+static void
+recompute_pool_offsets (struct rtx_constant_pool *pool)
+{
+  struct constant_descriptor_rtx *desc;
+  pool->offset = 0;
+
+  for (desc = pool->first; desc ; desc = desc->next)
+if (desc->mark)
+  {
+	  /* Recalculate offset.  */
+	unsigned int align = desc->align;
+	pool->offset += (align / BITS_PER_UNIT) - 1;
+	pool->offset &= ~ ((align / BITS_PER_UNIT) - 1);
+	desc->offset = pool->offset;
+	pool->offset += GET_MODE_SIZE (desc->mode);
+  }
+}
+
 /* Mark all constants that are referenced by SYMBOL_REFs in X.
Emit referenced deferred strings.  */
 
@@ -4060,6 +4083,11 @@ output_constant_pool (const char *fnname ATTRIBUTE_UNUSED,
  case we do not need to output the constant.  */
   mark_constant_pool ();
 
+  /* Having marked the constant pool entries we'll actually emit, we
+ now need to rebuild the offset information, which may have become
+ stale.  */
+  recompute_pool_offsets (pool);
+
 #ifdef ASM_OUTPUT_POOL_PROLOGUE
   ASM_OUTPUT_POOL_PROLOGUE (asm_out_file, fnname, fndecl, pool->offset);
 #endif


Re: [PATCH v3] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 01:33:17PM +0100, Bernd Schmidt wrote:
> On 11/21/2016 01:36 PM, Dominik Vogt wrote:
> >diff --git a/gcc/combine.c b/gcc/combine.c
> >index b22a274..457fe8a 100644
> >--- a/gcc/combine.c
> >+++ b/gcc/combine.c
> >@@ -5575,10 +5575,23 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
> > {
> >   rtx cop1 = const0_rtx;
> >   enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1);
> >+  unsigned HOST_WIDE_INT nz;
> >
> >   if (cond_code == NE && COMPARISON_P (cond))
> > return x;
> >
> >+  /* If the operation is an AND wrapped in a SIGN_EXTEND or ZERO_EXTEND
> >+ with either operand being just a constant single bit value, do
> >+ nothing since IF_THEN_ELSE is likely to increase the expression's
> >+ complexity.  */
> >+  if (HWI_COMPUTABLE_MODE_P (mode)
> >+  && pow2p_hwi (nz = nonzero_bits (x, mode))
> >+  && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
> >+&& GET_CODE (XEXP (x, 0)) == AND
> >+&& CONST_INT_P (XEXP (XEXP (x, 0), 0))
> >+&& UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
> >+return x;
> 
> It looks like this doesn't actually use cond or true/false_rtx. So
> this could be placed just above the call to if_then_else_cond to
> avoid unnecessary work. Ok if that works.

It does.  Version 3 attached, bootstrapped on s390x and regression
tested on s390x biarch and s390.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* combine.c (combine_simplify_rtx):  Suppress replacement of
"(and (reg) (const_int bit))" with "if_then_else".
>From 9202cab6332ce5dcfa740bbae3bcf07f3acc8705 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 31 Oct 2016 09:00:31 +0100
Subject: [PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.

combine_simplify_rtx() tries to replace rtx expressions with just two
possible values with an expression that uses if_then_else:

  (if_then_else (condition) (value1) (value2))

If the original expression is e.g.

  (and (reg) (const_int 2))

where the constant is the mask for a single bit, the replacement results
in a more complex expression than before:

  (if_then_else (ne (zero_extract (reg) (1) (31))) (2) (0))

Similar replacements are done for

  (signextend (and ...))
  (zeroextend (and ...))

Suppress the replacement for this special case in if_then_else_cond().
---
 gcc/combine.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index a8dae89..52bde9e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5600,6 +5600,18 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 && OBJECT_P (SUBREG_REG (XEXP (x, 0)))
 {
   rtx cond, true_rtx, false_rtx;
+  unsigned HOST_WIDE_INT nz;
+
+  /* If the operation is an AND wrapped in a SIGN_EXTEND or ZERO_EXTEND
+	 with either operand being just a constant single bit value, do
+	 nothing since IF_THEN_ELSE is likely to increase the expression's
+	 complexity.  */
+  if (HWI_COMPUTABLE_MODE_P (mode)
+ && pow2p_hwi (nz = nonzero_bits (x, mode))
+ && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
+   && GET_CODE (XEXP (x, 0)) == AND
+   && CONST_INT_P (XEXP (XEXP (x, 0), 0))
+   && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
+ return x;
 
   cond = if_then_else_cond (x, &true_rtx, &false_rtx);
   if (cond != 0
-- 
2.3.0



[PATCH][AARCH64]Simplify call, call_value, sibcall, sibcall_value patterns.

2016-12-01 Thread Renlin Li

Hi all,

This patch refactors the code used in call, call_value, sibcall,
sibcall_value expanders.

Before the change, the logic is following:

call expander          --> call_internal        --> call_reg/call_symbol
call_value expander    --> call_value_internal  --> call_value_reg/call_value_symbol

sibcall expander   --> sibcall_internal   --> sibcall_insn
sibcall_value expander --> sibcall_value_internal --> sibcall_value_insn

After the change, the logic is simplified into:

call expander  --> aarch64_expand_call() --> call_insn
call_value expander--> aarch64_expand_call() --> call_value_insn

sibcall expander   --> aarch64_expand_call() --> sibcall_insn
sibcall_value expander --> aarch64_expand_call() --> sibcall_value_insn

The code is factored out from each expander into aarch64_expand_call ().

This also fixes the two issues Richard Henderson suggested in comment 8:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971

aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
Okay for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-12-01  Renlin Li  

* config/aarch64/aarch64-protos.h (aarch64_expand_call): Declare.
* config/aarch64/aarch64.c (aarch64_expand_call): Define.
* config/aarch64/constraints.md (Usf): Add long call check.
* config/aarch64/aarch64.md (call): Use aarch64_expand_call.
(call_value): Likewise.
(sibcall): Likewise.
(sibcall_value): Likewise.
(call_insn): New.
(call_value_insn): New.
(sibcall_insn): Update rtx pattern.
(sibcall_value_insn): Likewise.
(call_internal): Remove.
(call_value_internal): Likewise.
(sibcall_internal): Likewise.
(sibcall_value_internal): Likewise.
(call_reg): Likewise.
(call_symbol): Likewise.
(call_value_reg): Likewise.
(call_value_symbol): Likewise.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7f67f14..3a5babb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -305,6 +305,7 @@ bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+void aarch64_expand_call (rtx, rtx, bool);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68a3380..c313cf5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4343,6 +4343,51 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* This function is used by the call expanders of the machine description.
+   RESULT is the register in which the result is returned.  It's NULL for
+   "call" and "sibcall".
+   MEM is the location of the function call.
+   SIBCALL indicates whether this function call is normal call or sibling call.
+   It will generate different pattern accordingly.  */
+
+void
+aarch64_expand_call (rtx result, rtx mem, bool sibcall)
+{
+  rtx call, callee, tmp;
+  rtvec vec;
+  machine_mode mode;
+
+  gcc_assert (MEM_P (mem));
+  callee = XEXP (mem, 0);
+  mode = GET_MODE (callee);
+  gcc_assert (mode == Pmode);
+
+  /* Decide if we should generate indirect calls by loading the
+ 64-bit address of the callee into a register before performing
+ the branch-and-link.  */
+
+  if (GET_CODE (callee) == SYMBOL_REF
+  ? (aarch64_is_long_call_p (callee)
+	 || aarch64_is_noplt_call_p (callee))
+  : !REG_P (callee))
+    XEXP (mem, 0) = force_reg (mode, callee);
+
+  call = gen_rtx_CALL (VOIDmode, mem, const0_rtx);
+
+  if (result != NULL_RTX)
+call = gen_rtx_SET (result, call);
+
+  if (sibcall)
+tmp = ret_rtx;
+  else
+tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
+
+  vec = gen_rtvec (2, call, tmp);
+  call = gen_rtx_PARALLEL (VOIDmode, vec);
+
+  aarch64_emit_call_insn (call);
+}
+
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
 void
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bc6d8a2..5682686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -718,12 +718,6 @@
 ;; Subroutine calls and sibcalls
 ;; ---
 
-(define_expand "call_internal"
-  [(parallel [(call (match_operand 0 "memory_operand" "")
-		(match_operand 1 "general_operand" ""))
-	  (use (match_operand 2 "" ""))
-	  (clobber (reg:DI LR_REGNUM))])])
-
 (define_expand "call"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 		(match_operand 1 "general_operand" ""))
@@ -732,57 +726,22 @@
   ""
   "
   {
-rtx callee, pat;
-
-/* In an untyped call, we can get NULL for operand 2.  */
-

[Patch testsuite obvious] Use setjmp, not sigsetjmp in gcc.dg/pr78582.c

2016-12-01 Thread James Greenhalgh

As subject.

Newlib doesn't have sigsetjmp, so the test fails for our newlib-based
testruns. Confirmed that this adjusted test would still have ICE'd
without r242958.

Committed as obvious as revision 243116.

Thanks,
James

---
2016-12-01  James Greenhalgh  

* gcc.dg/pr78582.c (main): Call setjmp, not sigsetjmp.

diff --git a/gcc/testsuite/gcc.dg/pr78582.c b/gcc/testsuite/gcc.dg/pr78582.c
index 3084e3b..5284e3f 100644
--- a/gcc/testsuite/gcc.dg/pr78582.c
+++ b/gcc/testsuite/gcc.dg/pr78582.c
@@ -10,7 +10,7 @@ int
 main (int argc, char **argv, char **env)
 {
   int a;
-  sigsetjmp (0, 0);
+  setjmp (0);
   argc = a = argc;
   reader_loop ();
 


Re: [RFC] Assert DECL_ABSTRACT_ORIGIN is different from the decl itself

2016-12-01 Thread Martin Jambor
Hello,

On Wed, Nov 30, 2016 at 02:09:19PM +0100, Martin Jambor wrote:
> On Tue, Nov 29, 2016 at 10:17:02AM -0700, Jeff Law wrote:
> >
> > ...
> >
> > So it seems that rather than an assert that we should just not walk down a
> > self-referencing DECL_ABSTRACT_ORIGIN.
> > 
> 
> ...
> 
> So I wonder what the options are... perhaps it seems that we can call
> dump_function_name which starts with code handling
> !DECL_LANG_SPECIFIC(t) cases, even instead of the weird "<built-in>"
> thing?

The following patch does that, it works as expected on my small
testcases, brings g++ in line with what gcc does with clones when it
comes to OpenMP outline functions and obviously prevents the infinite
recursion.

It passes bootstrap and testing on x86_64-linux.  OK for trunk?

Thanks,


2016-11-30  Martin Jambor  

PR c++/78589
* error.c (dump_decl): Use dump_function_name to dump
!DECL_LANG_SPECIFIC function decls with no or self-referencing
abstract origin.
---
 gcc/cp/error.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 7bf07c3..5f8fb2a 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1216,10 +1216,11 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
 case FUNCTION_DECL:
   if (! DECL_LANG_SPECIFIC (t))
{
- if (DECL_ABSTRACT_ORIGIN (t))
+ if (DECL_ABSTRACT_ORIGIN (t)
+ && DECL_ABSTRACT_ORIGIN (t) != t)
dump_decl (pp, DECL_ABSTRACT_ORIGIN (t), flags);
  else
-   pp_string (pp, M_("<built-in>"));
+   dump_function_name (pp, t, flags);
}
   else if (DECL_GLOBAL_CTOR_P (t) || DECL_GLOBAL_DTOR_P (t))
dump_global_iord (pp, t);
-- 
2.10.2




Re: [PATCH] avoid calling alloca(0)

2016-12-01 Thread Martin Sebor

On 11/30/2016 09:09 PM, Martin Sebor wrote:

What I think this tells us is that we're not at a place where we're
clean.  But we can incrementally get there.  The warning is only
catching a fairly small subset of the cases AFAICT.  That's not unusual
and analyzing why it didn't trigger on those cases might be useful as
well.


The warning has no smarts.  It relies on constant propagation and
won't find a call unless it sees it's being made with a constant
zero.  Looking at the top two on the list the calls are in extern
functions not called from the same source file, so it probably just
doesn't see that the functions are being called from another file
with a zero.  Building GCC with LTO might perhaps help.


I should also add that for GCC, and provided the main concern is
non-unique pointers, the warning finds just the right subset of
calls (other concerns are portability to non-GCC compilers or
to library implementations).

GCC makes sure the size is a multiple of the stack alignment.  When
the argument is constant and a multiple of the stack alignment GCC
does nothing, and so when the size is zero it just returns the top
of stack, resulting in non-unique pointers.  When it's not
constant, GCC emits code to round up the size to a multiple
of stack alignment, which makes each pointer unique.




So where does this leave us for gcc-7?  I'm wondering if we drop the
warning in, but not enable it by default anywhere.  We fix the cases we
can (such as reg-stack.c, tree-ssa-threadedge.c, maybe others) before
stage3 closes, and shoot for the rest in gcc-8, including improving the
warning (if there's something we can clearly improve), and enabling the
warning in -Wall or -Wextra.


I'm fine with deferring the GCC fixes and working on the cleanup
over time but I don't think that needs to gate enabling the option
with -Wextra.  The warnings can be suppressed or prevented from
causing errors during a GCC build either via a command line option
or by pragma in the code.  AFAICT, from the other warnings I see
go by, this is what has been done for -Wno-implicit-fallthrough
while those warnings are being cleaned up.  Why not take the same
approach here?

As much as I would like to improve the warning itself I'm also not
sure I see much of an opportunity for it.  It's not prone to high
rates of false positives (hardly any in fact) and the cases it
misses are those where it simply doesn't see the argument value
because it's not been made available by constant propagation.

That said, I consider the -Walloc-size-larger-than warning to be
the more important part of the patch by far.  I'd hate a lack of
consensus on how to deal with GCC's handful of instances of
alloca(0) to stall the rest of the patch.

Thanks
Martin




[PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread David Edelsohn
A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.

Index: dbxout.c
===
--- dbxout.c(revision 243118)
+++ dbxout.c(working copy)
@@ -1479,6 +1479,7 @@ dbxout_type_fields (tree type)

   /* Omit here local type decls until we know how to support them.  */
   if (TREE_CODE (tem) == TYPE_DECL
+ || TREE_CODE (tem) == TEMPLATE_DECL
  /* Omit here the nameless fields that are used to skip bits.  */
  || DECL_IGNORED_P (tem)
  /* Omit fields whose position or size are variable or too large to


Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Martin Sebor

So, let's use another testcase, -O2 -W -Wall -fno-tree-vrp -fno-tree-ccp
and again UB in it:
volatile bool e;
volatile int x;

int
main ()
{
  x = 123;
  *(char *)&e = x;
  bool f = e;
  x = __builtin_snprintf (0, 0, "%d", f);
}

This will store 1 into x, while without -fprintf-return-value it would store
3.


Great, that's what I was looking for.  I turned it into the following
test case.  Let me try to massage it into a compile-only test suitable
for the test suite and commit it later today.

volatile bool e;
volatile int x;

#define FMT "%d"
const char *fmt = FMT;

int
main ()
{
  x = 123;
  *(char *)&e = x;
  bool f = e;

  int n1 = __builtin_snprintf (0, 0, FMT, f);
  int n2 = __builtin_snprintf (0, 0, fmt, f);

  __builtin_printf ("%i == %i\n", n1, n2);
  if (n1 != n2)
__builtin_abort ();
}

Martin


[PATCH 0/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
The following patch series adds some patterns for enhanced use of
the r[ixo]sbg instructions on S/390.

 - 0001-* fixes some test regressions with the existing risbg
   patterns that are broken because of recent trunk changes.

 - 0002-* adds new patterns for the r[xo]sbg instructions and an
   SI mode variant of "extzv".

For details, please check the commit comments of the patches.  All
patches have been bootstrapped on s390x biarch and regression
tested on s390x biarch and s390.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH 1/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 05:26:16PM +0100, Dominik Vogt wrote:
> The following patch series adds some patterns for enhanced use of
> the r[ixo]sbg instructions on S/390.
> 
>  - 0001-* fixes some test regressions with the existing risbg
>patterns that are broken because of recent trunk changes.
> 
>  - 0002-* adds new patterns for the r[xo]sbg instructions and an
>SI mode variant of "extzv".
> 
> For details, please check the commit comments of the patches.  All
> patches have been bootstrapped on s390x biarch and regression
> tested on s390x biarch and s390.

Risbg patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog-fix-risbg-tests

* config/s390/s390.md ("*trunc_sidi_and_subreg_ze")
("*extzvdi_top"): New patterns.
gcc/testsuite/ChangeLog-fix-risbg-tests

* gcc.target/s390/risbg-ll-1.c (f43, f44): Adapt regexps.
* gcc.target/s390/risbg-ll-2.c (f9): Ditto.
>From 59c63c47602d0f32948758b8ce9f36d55b8f8f39 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 25 Nov 2016 10:33:03 +0100
Subject: [PATCH 1/2] S/390: Fix risbg pattern tests.

With r242812 combine generates zero_extract instead of lshiftrt in some
cases.  The test cases are updated to reflect this, but the pattern
"*trunc_sidi_and_subreg_lshrt" is not used anymore.  Add a
new pattern "*trunc_sidi_and_subreg_ze" to take over the
old one's work.

Add a variant "*extzvdi_top" that deals with zero_extracts that can be
expressed as a simple right shift, which has the advantage of not clobbering
the condition code.
---
 gcc/config/s390/s390.md| 33 ++
 gcc/testsuite/gcc.target/s390/risbg-ll-1.c |  8 
 gcc/testsuite/gcc.target/s390/risbg-ll-2.c |  2 +-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index aaf8427..43b9371 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3755,6 +3755,24 @@
 }
 })
 
+; Special case where zero_extract can be written as a right-shift.
+(define_insn_and_split "*extzvdi_top"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+   (zero_extract:DI
+(match_operand:DI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "") ; size
+(const_int 0))) ; start
+  ]
+  "EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), 0, 64)"
+  "#"
+  ""
+  [(set (match_dup 0)
+   (lshiftrt:DI (match_dup 1) (match_dup 2)))]
+{
+  operands[2] = GEN_INT (64 - INTVAL (operands[2]));
+})
+
+; In all other cases try risbg.
 (define_insn "*extzv"
   [(set (match_operand:GPR 0 "register_operand" "=d")
   (zero_extract:GPR
@@ -4045,6 +4063,21 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*trunc_sidi_and_subreg_ze"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+   (and:SI
+(subreg:SI (zero_extract:DI
+(match_operand:DI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "")  ; size
+(match_operand 3 "const_int_operand" "")) ; pos
+   4)
+(match_operand:SI 4 "contiguous_bitmask_nowrap_operand" "")))]
+  "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64)"
+  "\t%0,%1,%t4,128+%f4,%2+%3"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
; z = (x << c) | (y >> d) with (x << c) and (y >> d) not overlapping after shifting
 ;  -> z = y >> d; z = (x << c) | (z & ((1 << c) - 1))
 ;  -> z = y >> d; z = risbg;
diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-1.c b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
index 30350d0..17a9000 100644
--- a/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
@@ -478,8 +478,8 @@ i64 f42 (t42 v_x)
 // Check that we get the case where a 64-bit shift is used by a 32-bit and.
 i32 f43 (i64 v_x)
 {
-  /* { dg-final { scan-assembler "f43:\n\trisbg\t%r2,%r2,32,128\\\+61,64-12" { 
target { lp64 } } } } */
-  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,32,128\\\+61,64-12"
 { target { ! lp64 } } } } */
+  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r2,%r2,32,128\\\+61,32\\\+20" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,32,128\\\+61,32\\\+20"
 { target { ! lp64 } } } } */
   i64 v_shr3 = ((ui64)v_x) >> 12;
   i32 v_shr3_tr = (ui32)v_shr3;
   i32 v_conv = v_shr3_tr & -4;
@@ -489,8 +489,8 @@ i32 f43 (i64 v_x)
 // Check that we don't get the case where the 32-bit and mask is not contiguous
 i32 f44 (i64 v_x)
 {
-  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r2,12" { target { lp64 } 
} } } */
-  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r3,12\n\tnilf\t%r2,10" { 
target { ! lp64 } } } } */
+  /* { dg-final { scan-assembler "f44:\n\(\t.*\n\)*\tngr\t" { target { lp64 } 
} } } */
+  /* { dg-final { scan-ass

Re: [PATCH 0/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 05:26:16PM +0100, Dominik Vogt wrote:
> The following patch series adds some patterns for enhanced use of
> the r[ixo]sbg instructions on S/390.
> 
>  - 0001-* fixes some test regressions with the existing risbg
>patterns that are broken because of recent trunk changes.
> 
>  - 0002-* adds new patterns for the r[xo]sbg instructions and an
>SI mode variant of "extzv".
> 
> For details, please check the commit comments of the patches.  All
> patches have been bootstrapped on s390x biarch and regression
> tested on s390x biarch and s390.

r[xo]sbg patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("extzv"): Allow GPR mode and rename
expander from "extzv" to "extzv<mode>".
* ("*extzvdisi"): New zero_extract pattern.
* ("*__ior_and_ze"): New pattern.
with a plain (zero_extract:SI).  Allow GPR mode.
* ("*extract1bit")
("*extract1bitdi"): Rename pattern and switch to GPR
mode.
* ("*rsbg__ze"): New pattern.
gcc/testsuite/ChangeLog

* gcc.target/s390/risbg-ll-1.c (f1, f2, f23, f34, f35, f41): Updated
tests.
* (g1, g2): New tests.
* gcc.target/s390/risbg-ll-2.c (f3, f4): Updated tests.
* gcc.target/s390/risbg-ll-3.c (g1, g2): New tests.
* gcc.target/s390/rosbg-1.c: Add tests for rosbg and rxsbg.
>From 9874c8afb7a61fb98af5b302df9866d25df16b30 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 17 Oct 2016 10:06:16 +0100
Subject: [PATCH 2/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

The new extzv patterns are necessary for the new r[ox]sbg patterns.
The new risbg patterns are necessary for the new extzv patterns.
---
 gcc/config/s390/s390.md| 74 +--
 gcc/testsuite/gcc.target/s390/risbg-ll-1.c | 36 +++---
 gcc/testsuite/gcc.target/s390/risbg-ll-2.c |  6 +--
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c |  2 +
 gcc/testsuite/gcc.target/s390/rosbg-1.c| 80 ++
 5 files changed, 175 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/rosbg-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 43b9371..15f0a41 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3728,26 +3728,24 @@
 ; extv instruction patterns
 ;
 
-; FIXME: This expander needs to be converted from DI to GPR as well
-; after resolving some issues with it.
-
-(define_expand "extzv"
+(define_expand "extzv<mode>"
   [(parallel
-[(set (match_operand:DI 0 "register_operand" "=d")
-(zero_extract:DI
- (match_operand:DI 1 "register_operand" "d")
+[(set (match_operand:GPR 0 "register_operand" "=d")
+(zero_extract:GPR
+ (match_operand:GPR 1 "register_operand" "d")
  (match_operand 2 "const_int_operand" "")   ; size
  (match_operand 3 "const_int_operand" ""))) ; start
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
-  if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64))
+  if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]),
+  GET_MODE_BITSIZE (<MODE>mode)))
 FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
   emit_move_insn (operands[0],
-gen_rtx_ZERO_EXTRACT (DImode,
+gen_rtx_ZERO_EXTRACT (<MODE>mode,
   operands[1],
   operands[2],
   operands[3]));
@@ -3787,6 +3785,19 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*extzvdisi"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+  (zero_extract:DI
+(match_operand:SI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "")   ; size
+(match_operand 3 "const_int_operand" ""))) ; start
+  ]
+  "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 32)"
+  "\t%0,%1,64-%2,128+63,32+%3+%2" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
 ; 64 bit: (a & -16) | ((b >> 8) & 15)
 (define_insn "*extzvdi_lshiftrt"
   [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+d")
@@ -3820,17 +3831,36 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*__ior_and_ze"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+   (ior:GPR (and:GPR
+(match_operand:GPR 1 "register_operand" "0")
+(match_operand:GPR 2 "const_int_operand" ""))
+   (zero_extract:GPR
+(match_operand:GPR 3 "register_operand" "d")
+(match_operand 4 "const_int_operand" "") ; size
+(match_operand 5 "const_int_operand" "")) ; start
+   ))]
+  "
+   && EXTRACT_ARGS_IN_

Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-01 Thread Martin Liška
On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
> On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
>> I started review process in libsanitizer: https://reviews.llvm.org/D26965
>> And I have a question that was asked in the review: can we distinguish 
>> between load and store
>> in case of having usage of ASAN_POISON?
> 
> I think with ASAN_POISON it is indeed just loads from after scope that can
> be caught, a store overwrites the variable with a new value and when turning
> the store after we make the var no longer addressable into SSA form, we
> loose information about the out of scope store.  Furthermore, if there is
> first a store and then a read, like:
>   if (argc != 12312)
> {
>   char my_char;
>   ptr = &my_char;
> }
>   *ptr = i + 26;
>   return *ptr;
> we don't notice even the read.  Not sure what could be done against that
> though.  I think we'd need to hook into the into-ssa framework, there it
> should know the current value of the variable at the point of the store is
> result of ASAN_POISON and be able to instead of turning that
>   my_char = _23;
> into
>   my_char_35 = _23;
> turn it into:
>   my_char_35 = ASAN_POISON (_23);
> which would represent after scope store into my_char.
> 
> Not really familiar with into-ssa though to know where to do it.
> 
>   Jakub
> 

Richi, may I ask you for help with this question?

Thanks,
Martin



Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Martin Sebor

On 12/01/2016 02:48 AM, Martin Liška wrote:

On 11/30/2016 11:46 PM, Martin Sebor wrote:

On 11/24/2016 05:59 AM, Martin Liška wrote:

On 11/24/2016 09:29 AM, Richard Biener wrote:

Please guard with ! TDF_GIMPLE, otherwise the output will not be
parseable
with the GIMPLE FE.

Richard.


Done, and verified that it provides equal dumps for -fdump*-gimple.
Installed as r242837.


Hi Martin,

I'm trying to understand how to interpret the probabilities (to
make sure one of my tests, builtin-sprintf-2.c, is testing what
it's supposed to be testing).

With this example:

  char d2[2];

  void f (void)
  {
if (2 != __builtin_sprintf (d2, "%i", 12))
  __builtin_abort ();
  }

the probability of the branch to abort is 0%:

  f1 ()
  {
    int _1;

    <bb 2> [100.0%]:
    _1 = __builtin_sprintf (&d, "%i", 12);
    if (_1 != 2)
      goto <bb 3>; [0.0%]
    else
      goto <bb 4>; [100.0%]

    <bb 3> [0.0%]:
    __builtin_abort ();

    <bb 4> [100.0%]:
    return;
  }


Hello Martin.

Looks like I made a small error.  I use only one digit after the
decimal point, which cannot display the noreturn predictor (defined
as PROB_VERY_UNLIKELY):

#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1) /* this is 4 */

I would suggest using the following patch to display at least 2
digits, which would distinguish between a real zero and
PROB_VERY_UNLIKELY:

x.c.046t.profile_estimate:

f ()
{
  int _1;

  <bb 2> [100.00%]:
  _1 = __builtin_sprintf (&d2, "%i", 12);
  if (_1 != 2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [0.04%]:
  __builtin_abort ();

  <bb 4> [99.96%]:
  return;

}



Yet the call to abort is in the assembly so I would expect its
probability to be more than zero.  So my question is: is it safe
to be testing for calls to abort in the optimized dump as a way
of verifying that the call has not been eliminated from the program
regardless of their probabilities?


I think so, otherwise the call would be removed.


Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.

Martin



I'm going to test the patch (and eventually update scanned patterns).

Martin

Patch candidate:

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index b5e866d..de57e89 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -72,12 +72,17 @@ debug_gimple_stmt (gimple *gs)
   print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
 }

+/* Print format used for displaying probability of an edge or frequency
+   of a basic block.  */
+
+#define PROBABILITY_FORMAT "[%.2f%%]"
+
 /* Dump E probability to BUFFER.  */

 static void
 dump_edge_probability (pretty_printer *buffer, edge e)
 {
-  pp_scalar (buffer, " [%.1f%%]",
+  pp_scalar (buffer, " " PROBABILITY_FORMAT,
  e->probability * 100.0 / REG_BR_PROB_BASE);
 }

@@ -1023,7 +1028,7 @@ dump_gimple_label (pretty_printer *buffer, glabel
*gs, int spc, int flags)
   dump_generic_node (buffer, label, spc, flags, false);
   basic_block bb = gimple_bb (gs);
   if (bb && !(flags & TDF_GIMPLE))
-pp_scalar (buffer, " [%.1f%%]",
+pp_scalar (buffer, " " PROBABILITY_FORMAT,
bb->frequency * 100.0 / REG_BR_PROB_BASE);
   pp_colon (buffer);
 }
@@ -2590,7 +2595,8 @@ dump_gimple_bb_header (FILE *outf, basic_block bb,
int indent, int flags)
   if (flags & TDF_GIMPLE)
 fprintf (outf, "%*sbb_%d:\n", indent, "", bb->index);
   else
-fprintf (outf, "%*s [%.1f%%]:\n", indent, "", bb->index,
+fprintf (outf, "%*s " PROBABILITY_FORMAT ":\n",
+ indent, "", bb->index,
  bb->frequency * 100.0 / REG_BR_PROB_BASE);
 }
 }



For reference, the directive the test uses since this change was
committed looks like this:

{ dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n
 *__builtin_abort" 114 "optimized" } }

If I'm reading the heavily escaped regex right, it matches any
percentage, even 0.0% (and the test passes).

Thanks
Martin






Re: [patch,testsuite,avr]: Filter-out -mmcu= from options for tests that set -mmcu=

2016-12-01 Thread Mike Stump
On Dec 1, 2016, at 3:54 AM, Georg-Johann Lay  wrote:
> 
> This patch moves the compile tests that have a hard coded -mmcu=MCU in their 
> dg-options to a new folder.
> 
> The exp driver filters out -mmcu= from the command line options that are 
> provided by, say, board description files or --tool-opts.
> 
> This is needed because otherwise conflicting -mmcu= will FAIL respective test 
> cases because of "specified option '-mmcu' more than once" errors from 
> avr-gcc.
> 
> Ok for trunk?

So, it would be nice if different ports could use roughly similar schemes to
handle the same problems.  I think arm is one of the more complex ports in this
regard at this point, with a lot of people and a lot of years' time to
contemplate and implement solutions to the problem.  In particular, they don't
have to move test cases around to handle differences like this; I think it
would be best to avoid that requirement if possible.

Glancing around, two starting points for how the arm achieves what it does:

  lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* conflicts 
with -m(cpu|arch)=.* switch"

in arm.exp, and they use something like:

/* { dg-require-effective-target arm_crypto_ok } */
/* { dg-add-options arm_crypto } */

to validate the requirements of the test case, and to ensure that optional 
things are selected.  Nice, simple, extensible, handles multilibs, dejagnu 
arguments and different cpu defaults as I recall.

You won't need all the hair the arm folks have, but if you stub in support in 
that direction, you then have simple, easy expansion room to match all 
complexities that the arm folks have already hit and solved.



Re: [patch,avr] Document how to avoid progmem on AVR_TINY.

2016-12-01 Thread Denis Chertykov
2016-12-01 17:28 GMT+03:00 Georg-Johann Lay :
> This adds to the documentation a hint on how to set up a linker description
> file that avoids progmem altogether and without the usual overhead of
> locating read-only data in RAM.  The proposed linker description file is
> completely transparent to the compiler, and no start-up code has to be
> adjusted.
>
> IIUC there are currently no plans to fix this in the default linker
> description file avrtiny.x, cf. http://sourceware.org/PR20849
>
> Also, link between -mabsdata option and absdata variable attribute.
>
> Ok for trunk?
>
>
> Johann
>
>
> gcc/
> * doc/invoke.texi (AVR Options) [-mabsdata]: Point to absdata.
> * doc/extend.texi (AVR Variable Attributes) [progmem]: Hint
> about linker description to avoid progmem altogether.
> [absdata]: Point to -mabsdata option.
>

Approved.


Re: [PATCH] Dump probability for edges and frequency for BBs

2016-12-01 Thread Martin Sebor

Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.


I think I see what's going on.  It's the call to abort that GCC
uses for the probability (more precisely its attribute noreturn).
When I replace the abort with a function of my own that GCC knows
nothing about the probability goes up to just over 52%.  Still it
seems very low given that 2 is just one of UINT_MAX values the
function can possibly return.

Martin

$ cat a.c && gcc -O2 -S -w -fdump-tree-optimized=/dev/stdout a.c
void foo (void);

void bar (void)
{
  char d [2];
  if (2 != __builtin_sprintf (d, "%i", 12))
foo ();
}

;; Function bar (bar, funcdef_no=0, decl_uid=1797, cgraph_uid=0, symbol_order=0)


Removing basic block 5
bar ()
{
  char d[2];
  int _1;

  <bb 2> [100.0%]:
  _1 = __builtin_sprintf (&d, "%i", 12);
  if (_1 != 2)
    goto <bb 3>; [52.9%]
  else
    goto <bb 4>; [47.1%]

  <bb 3> [52.9%]:
  foo ();

  <bb 4> [100.0%]:
  d ={v} {CLOBBER};
  return;

}




Re: [patch,avr] Clean up n_flash field from MCU information.

2016-12-01 Thread Denis Chertykov
2016-12-01 12:26 GMT+03:00 Georg-Johann Lay :
> The introduction of the flash_size field in avr_mcu_t rendered the n_flash
> field redundant.  This patch computes the value of n_flash as needed from
> flash_size and cleans up n_flash.
>
> Ok for trunk?
>
> Johann
>
> gcc/
> * config/avr/avr-arch.h (avr_mcu_t) [n_flash]: Remove field.
> * config/avr/avr-devices.c (AVR_MCU): Remove N_FLASH macro argument.
> * config/avr/avr-mcus.def (AVR_MCU): Remove initializer for n_flash.
> * config/avr/avr.c (avr_set_core_architecture) [avr_n_flash]: Use
> avr_mcu_types.flash_size to compute default value.
> * config/avr/gen-avr-mmcu-specs.c (print_mcu) [cc1_n_flash]: Use
> mcu->flash_size to compute value for spec.
>
>

Approved.


Re: [PATCH 1/6][ARM] Refactor NEON builtin framework to work for other builtins

2016-12-01 Thread Andre Vieira (lists)
On 17/11/16 10:42, Kyrill Tkachov wrote:
> Hi Andre,
> 
> On 09/11/16 10:11, Andre Vieira (lists) wrote:
>> Hi,
>>
>> Refactor NEON builtin framework such that it can be used to implement
>> other builtins.
>>
>> Is this OK for trunk?
>>
>> Regards,
>> Andre
>>
>> gcc/ChangeLog
>> 2016-11-09  Andre Vieira  
>>
>>  * config/arm/arm-builtins.c (neon_builtin_datum): Rename to ..
>>  (arm_builtin_datum): ... this.
>>  (arm_init_neon_builtin): Rename to ...
>>  (arm_init_builtin): ... this. Add a new parameters PREFIX
>>  and USE_SIG_IN_NAME.
>>  (arm_init_neon_builtins): Replace 'arm_init_neon_builtin' with
>>  'arm_init_builtin'. Replace type 'neon_builtin_datum' with
>>  'arm_builtin_datum'.
>>  (arm_init_vfp_builtins): Likewise.
>>  (builtin_arg): Rename enum's replacing 'NEON_ARG' with
>>  'ARG_BUILTIN' and add a 'ARG_BUILTIN_NEON_MEMORY.
>>  (arm_expand_neon_args): Rename to ...
>>  (arm_expand_builtin_args): ... this. Rename builtin_arg
>>  enum values and differentiate between ARG_BUILTIN_MEMORY
>>  and ARG_BUILTIN_NEON_MEMORY.
>>  (arm_expand_neon_builtin_1): Rename to ...
>>  (arm_expand_builtin_1): ... this. Rename builtin_arg enum
>>  values, arm_expand_builtin_args and add bool parameter NEON.
>>  (arm_expand_neon_builtin): Use arm_expand_builtin_1.
>>  (arm_expand_vfp_builtin): Likewise.
>>  (NEON_MAX_BUILTIN_ARGS): Remove, it was unused.
> 
>  /* Expand a neon builtin.  This is also used for vfp builtins, which
> behave in
> the same way.  These builtins are "special" because they don't have
> symbolic
> constants defined per-instruction or per instruction-variant. 
> Instead, the
> -   required info is looked up in the NEON_BUILTIN_DATA record that is
> passed
> +   required info is looked up in the ARM_BUILTIN_DATA record that is
> passed
> into the function.  */
>  
> 
> The comment should be updated now that it's not just NEON builtins that
> are expanded through this function.
> 
>  static rtx
> -arm_expand_neon_builtin_1 (int fcode, tree exp, rtx target,
> -   neon_builtin_datum *d)
> +arm_expand_builtin_1 (int fcode, tree exp, rtx target,
> +   arm_builtin_datum *d, bool neon)
>  {
> 
> I'm not a fan of this 'neon' boolean as it can cause confusion among the
> users of the function
> (see long thread at https://gcc.gnu.org/ml/gcc/2016-10/msg4.html).
> Whether the builtin is a NEON/VFP builtin
> can be distinguished from FCODE, so lets just make that bool neon a
> local variable and initialise it accordingly
> from FCODE.
> 
> Same for:
> +/* Set up a builtin.  It will use information stored in the argument
> struct D to
> +   derive the builtin's type signature and name.  It will append the
> name in D
> +   to the PREFIX passed and use these to create a builtin declaration
> that is
> +   then stored in 'arm_builtin_decls' under index FCODE.  This FCODE is
> also
> +   written back to D for future use.  If USE_SIG_IN_NAME is true the
> builtin's
> +   name is appended with type signature information to distinguish between
> +   signedness and poly.  */
>  
>  static void
> -arm_init_neon_builtin (unsigned int fcode,
> -   neon_builtin_datum *d)
> +arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
> +  const char * prefix, bool use_sig_in_name)
> 
> use_sig_in_name is dependent on FCODE so just deduce it from that
> locally in arm_init_builtin.
> 
> This is ok otherwise.
> Thanks,
> Kyrill
> 
> 

Hi,

Reworked patch according to comments. No changes to ChangeLog.

Is this OK?

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
5ed38d1608cfbfbd1248d76705fcf675bc36c2b2..da6331fdc729461adeb81d84c0c425bc45b80b8c
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -202,7 +202,7 @@ typedef struct {
   const enum insn_code code;
   unsigned int fcode;
   enum arm_type_qualifiers *qualifiers;
-} neon_builtin_datum;
+} arm_builtin_datum;
 
 #define CF(N,X) CODE_FOR_neon_##N##X
 
@@ -242,7 +242,7 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def and
+/* The builtin data can be found in arm_neon_builtins.def,
arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
TARGET_NEON to be true.  The feature tests are checked when the
builtins are expanded.
@@ -252,14 +252,14 @@ typedef struct {
would be specified after the assembler mnemonic, which usually
refers to the last vector operand.  The modes listed per
instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
+   instruction's pattern, for instance in neon.md.  */
 
-static neon_builtin_datum vfp_builtin_data[] =
+static arm_builtin_datum vfp_builtin_data[] =
 {
 #include "arm_vfp_builtins.def"
 };
 
-static neon_builtin_datum neon_builtin_dat

Re: [PATCH 2/6][ARM] Move CRC builtins to refactored framework

2016-12-01 Thread Andre Vieira (lists)
On 09/11/16 10:11, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch refactors the implementation of the ARM ACLE CRC builtins to
> use the builtin framework.
> 
> Is this OK for trunk?
> 
> Regards,
> Andre
> 
> gcc/ChangeLog
> 2016-11-09  Andre Vieira  
> 
>   * config/arm/arm-builtins.c (arm_unsigned_binop_qualifiers): New.
>   (UBINOP_QUALIFIERS): New.
>   (si_UP): Define.
>   (acle_builtin_data): New. Change comment.
>   (arm_builtins): Remove ARM_BUILTIN_CRC32B, ARM_BUILTIN_CRC32H,
>   ARM_BUILTIN_CRC32W, ARM_BUILTIN_CRC32CB, ARM_BUILTIN_CRC32CH,
>   ARM_BUILTIN_CRC32CW. Add ARM_BUILTIN_ACLE_BASE and include
>   arm_acle_builtins.def.
>   (ARM_BUILTIN_ACLE_PATTERN_START): Define.
>   (arm_init_acle_builtins): New.
>   (CRC32_BUILTIN): Remove.
>   (bdesc_2arg): Remove entries for crc32b, crc32h, crc32w,
>   crc32cb, crc32ch and crc32cw.
>   (arm_init_crc32_builtins): Remove.
>   (arm_init_builtins): Use arm_init_acle_builtins rather
>   than arm_init_crc32_builtins.
>   (arm_expand_acle_builtin): New.
>   (arm_expand_builtin): Use 'arm_expand_acle_builtin'.
>   * config/arm/arm_acle_builtins.def: New.
> 
Hi,

Reworked this patch based on the changes made in [1/6]. No changes to
ChangeLog.

Is this OK?

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
da6331fdc729461adeb81d84c0c425bc45b80b8c..b47e255c962239a73b62f5743273d12f07bb237b
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -157,6 +157,13 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
+/* unsigned T (unsigned T, unsigned T, unsigned T).  */
+static enum arm_type_qualifiers
+arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+  qualifier_unsigned };
+#define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -242,17 +249,16 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The builtin data can be found in arm_neon_builtins.def,
-   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
-   TARGET_NEON to be true.  The feature tests are checked when the
-   builtins are expanded.
+/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
+   and arm_acle_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The feature tests are checked when the builtins are
+   expanded.
 
-   The mode entries in the following table correspond to the "key"
-   type of the instruction variant, i.e. equivalent to that which
-   would be specified after the assembler mnemonic, which usually
-   refers to the last vector operand.  The modes listed per
-   instruction should be the same as those defined for that
-   instruction's pattern, for instance in neon.md.  */
+   The mode entries in the following table correspond to the "key" type of the
+   instruction variant, i.e. equivalent to that which would be specified after
+   the assembler mnemonic for neon instructions, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern, for instance in neon.md.  */
 
 static arm_builtin_datum vfp_builtin_data[] =
 {
@@ -266,6 +272,15 @@ static arm_builtin_datum neon_builtin_data[] =
 
 #undef CF
 #undef VAR1
+#define VAR1(T, N, A) \
+  {#N, UP (A), CODE_FOR_##N, 0, T##_QUALIFIERS},
+
+static arm_builtin_datum acle_builtin_data[] =
+{
+#include "arm_acle_builtins.def"
+};
+
+#undef VAR1
 
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
@@ -518,13 +533,6 @@ enum arm_builtins
 
   ARM_BUILTIN_WMERGE,
 
-  ARM_BUILTIN_CRC32B,
-  ARM_BUILTIN_CRC32H,
-  ARM_BUILTIN_CRC32W,
-  ARM_BUILTIN_CRC32CB,
-  ARM_BUILTIN_CRC32CH,
-  ARM_BUILTIN_CRC32CW,
-
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
@@ -556,6 +564,14 @@ enum arm_builtins
 
 #include "arm_neon_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_##N,
+
+  ARM_BUILTIN_ACLE_BASE,
+
+#include "arm_acle_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -565,6 +581,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_ACLE_PATTERN_START \
+  (ARM_BUILTIN_ACLE_BASE + 1)
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1013,7 +1032,7 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   gcc_assert (ftype != NULL);
 
   if (print_type_signature_p
-  && IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_MAX - 1))
+  && IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_ACLE_MAX - 1))
 snprintf (namebuf, sizeof (namebuf), "%s_%s_%s

Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Jeff Law

On 12/01/2016 05:04 AM, Segher Boessenkool wrote:

On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:

Thinking about this again maybe targets w/o insn-length should simply
always use the 'simple' algorithm instead of the STC one?  At least that
might be what your change effectively does in some way?


From reading the comments I don't think STC will collapse down into the
simple algorithm if block copying is disabled.  But Segher would know for
sure.

WRT the choice of simple vs STC, I doubt it matters much for the processors
in question.


I guess STC doesn't make much sense if we can't say anything about BB sizes.


STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.  It never copies code; if that means it
has to jump every second insn, so be it.  It provably is within a factor
three of optimal (optimal is NP-hard), under a really weak assumption
within a factor two, and it does better than that in practice.

STC without block copying makes longer traces which is not a good idea
for most architectures, only for those that have a short jump that is
much shorter than longer jumps (and thus, cannot cover many jump
targets).

I do not know how STC behaves when it does not know the insn lengths.
mn103 & m68k are definitely sensitive to jump distances, the former
more so than the latter.  Some of the others probably are as well.


I think we've probably discussed this more than is really necessary.  We 
just need to pick an alternative and go with it; I think either 
alternative is reasonable (avoid copying when STC has no length 
information or fall back to simple when there is no length information).




jeff


Re: [PATCH] Dump probability for edges and frequency for BBs

2016-12-01 Thread Jeff Law

On 12/01/2016 09:49 AM, Martin Sebor wrote:

Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.


I think I see what's going on.  It's the call to abort that GCC
uses for the probability (more precisely its attribute noreturn).
Right.  An edge which leads to a noreturn call is predicted as
essentially "never taken".  It's a good, strong predictor.




When I replace the abort with a function of my own that GCC knows
nothing about the probability goes up to just over 52%.  Still it
seems very low given that 2 is just one of UINT_MAX values the
function can possibly return.
Jan would be the one to know how the predictors work.  It's reasonably 
complex stuff.

Jeff


[PATCH, GCC/LRA] Fix PR78617: Fix conflict detection in rematerialization

2016-12-01 Thread Thomas Preudhomme

Hi,

When considering a candidate for rematerialization, LRA verifies if the 
candidate clobbers a live register before going forward with the 
rematerialization (see code starting with comment "Check clobbers do not kill 
something living."). To do this check, the set of live registers at any given 
instruction needs to be maintained. This is done by initializing the set of live 
registers when starting the forward scan of instruction in a basic block and 
updating the set by looking at REG_DEAD notes and destination register at the 
end of an iteration of the scan loop.


However the initialization suffers from 2 issues:

1) it is done from the live out set rather than live in (uses df_get_live_out 
(bb))
2) it ignores pseudo registers that have already been allocated a hard register 
(uses REG_SET_TO_HARD_REG_SET that only looks at hard register and does not look 
at reg_renumber for pseudo registers)


This patch changes the code to use df_get_live_in (bb) to initialize the 
live_hard_regs variable using a loop to check reg_renumber for pseudo registers. 
Please let me know if there is a macro to do that; I failed to find one. 


ChangeLog entries are as follow:

gcc/testsuite/ChangeLog:

2016-12-01  Thomas Preud'homme  

PR rtl-optimization/78617
* gcc.c-torture/execute/pr78617.c: New test.


gcc/ChangeLog:

2016-12-01  Thomas Preud'homme  

PR rtl-optimization/78617
* lra-remat.c (do_remat): Initialize live_hard_regs from live in
registers, also setting hard registers mapped to pseudo registers.


Note however that as explained in the problem report, the testcase does not 
trigger the bug on GCC 7 due to better optimization before LRA rematerialization 
is reached.


Testing: testsuite shows no regression when run using:
 + an arm-none-eabi GCC cross-compiler targeting Cortex-M0 and Cortex-M3
 + a bootstrapped arm-linux-gnueabihf GCC native compiler
 + a bootstrapped x86_64-linux-gnu GCC native compiler

Is this ok for stage3?

Best regards,

Thomas
diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index f01c6644c428fd9b5efdf6cc98788e5f6fadba62..cdd7057f602098d33ec3acfdaaac66556640bd82 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -1047,6 +1047,7 @@ update_scratch_ops (rtx_insn *remat_insn)
 static bool
 do_remat (void)
 {
+  unsigned regno;
   rtx_insn *insn;
   basic_block bb;
   bitmap_head avail_cands;
@@ -1054,12 +1055,21 @@ do_remat (void)
   bool changed_p = false;
   /* Living hard regs and hard registers of living pseudos.  */
   HARD_REG_SET live_hard_regs;
+  bitmap_iterator bi;
 
   bitmap_initialize (&avail_cands, ®_obstack);
   bitmap_initialize (&active_cands, ®_obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (bb));
+  CLEAR_HARD_REG_SET (live_hard_regs);
+  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+	{
+	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
+			   ? regno
+			   : reg_renumber[regno];
+	  if (hard_regno >= 0)
+	SET_HARD_REG_BIT (live_hard_regs, hard_regno);
+	}
   bitmap_and (&avail_cands, &get_remat_bb_data (bb)->avin_cands,
 		  &get_remat_bb_data (bb)->livein_cands);
   /* Activating insns are always in the same block as their corresponding
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78617.c b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
new file mode 100644
index ..89c4f6dea8cb507b963f91debb94cbe16eb1db90
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
@@ -0,0 +1,25 @@
+int a = 0;
+int d = 1;
+int f = 1;
+
+int fn1() {
+  return a || 1 >> a;
+}
+
+int fn2(int p1, int p2) {
+  return p2 >= 2 ? p1 : p1 >> 1;
+}
+
+int fn3(int p1) {
+  return d ^ p1;
+}
+
+int fn4(int p1, int p2) {
+  return fn3(!d > fn2((f = fn1() - 1000) || p2, p1));
+}
+
+int main() {
+  if (fn4(0, 0) != 1)
+__builtin_abort ();
+  return 0;
+}


Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Jeff Law

On 12/01/2016 12:51 AM, Jakub Jelinek wrote:

On Wed, Nov 30, 2016 at 06:14:14PM -0700, Martin Sebor wrote:

On 11/30/2016 12:01 PM, Jakub Jelinek wrote:

Hi!

This patch fixes some minor nits I've raised in the PR, more severe issues
left unresolved there.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Thank you.  One comment below.


@@ -1059,7 +1048,12 @@ format_integer (const conversion_spec &s
}

  if (code == NOP_EXPR)
-   argtype = TREE_TYPE (gimple_assign_rhs1 (def));
+   {
+ tree type = TREE_TYPE (gimple_assign_rhs1 (def));
+ if (TREE_CODE (type) == INTEGER_TYPE
+ || TREE_CODE (type) == POINTER_TYPE)
+   argtype = type;


As I replied in my comment #6 on the bug, I'm not sure I see what
is wrong with the original code, and I haven't been able to come
up with a test case that demonstrates a problem because of it with
any of the types you mentioned (bool, enum, or floating).


I think for floating we don't emit NOP_EXPR, but FIX_TRUNC_EXPR;
Correct.  The way to think about this stuff is whether or not the type 
change is *likely* to lead to generated code.  If the type change is not 
likely to generate code, then NOP_EXPR is appropriate.  Otherwise a 
suitable conversion is necessary.  This distinction has long been 
something we'd like to fix, but haven't gotten around to it.


It is almost always the case that a floating point conversion will 
generate code.  So NOP_EXPR is not appropriate for a floating point 
conversion.


A NOP_EXPR will be used for a large number of integer conversions though.


perhaps bool/enum is fine, but in the UB case where one loads arbitrary
values into bool or enum the precision might be too small for those
(for enums only with -fstrict-enums).
Note that loading an out-of-range value into an enum object is, sadly, 
not UB as long as there's enough storage space in the object to hold the 
value.  That's why enums often don't constrain ranges.



Jeff


Re: [Patch 0/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread David Edelsohn
> James Greenhalgh writes:

> The patch set has been bootstrapped and tested on aarch64-none-linux-gnu and
> x86-64-none-linux-gnu without any issues. I've also cross-tested it for
> aarch64-none-elf and build-tested it for rs6000 (though I couldn't run the
> testsuite as I don't have a test environment).

There are PPC64 Linux and AIX systems in the GNU Compile Farm.  All
have DejaGNU installed.

- David


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-12-01 Thread Martin Sebor

Sure - but then you maybe instead want to check for op being in
range [0, max-of-signed-type-of-op] instead?  So similar to
expr_not_equal_to, add an expr_in_range helper?

Your function returns true for sizetype vars even if it might be
effectively signed, like for

 sizetype i_1 = -4;
 i_2 = i_1 + 1;

operand_unsigned_p (i) returns true.  I suppose you may have
meant

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (op);
+      if (is_gimple_assign (def))
+	{
+	  tree_code code = gimple_assign_rhs_code (def);
+	  if (code == NOP_EXPR
+	      && (TYPE_PRECISION (TREE_TYPE (op))
+		  > TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def)))))
+	    return tree_expr_nonnegative_p (gimple_assign_rhs1 (def));
+	}
+    }
+
+  return false;
+}

?  Because only if you do see a cast, and that cast is widening from a
nonnegative number, will the final value be unsigned (if interpreted as
a signed number).


I don't think this is what I want.  Here's a test case that works
with my function but not with the suggested modification:

   char d[4];
   long f (unsigned long i)
   {
 return __builtin_object_size (d + i + 1, 0);
   }

Here, the size I'm looking for is (at most) 3 no matter what value
i has.  Am I missing a case where my function will do the wrong
thing?


You might want to use offset_ints here (see mem_ref_offset for example)


Okay, I'll see if I can switch to offset_int.





+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?



Sorry, what about overflow?

The purpose of this code is to handle cases of the form

   & PTR [range (MIN, MAX)] + CST


what if MAX + CST overflows?


The code doesn't look at MAX, only MIN is considered.  It extracts
both but only actually uses MAX to see if it's dealing with a range
or a constant.  Does that resolve your concern?


   char d[7];

   #define bos(p, t) __builtin_object_size (p, t)

   long f (unsigned i)
   {
 if (2 < i) i = 2;

 char *p = &d[i] + 3;

 return bos (p, 0);
   }


I'm sure that doesn't work as you match for PLUS_EXPR.


Sorry, I'm not sure what you mean.  The above evaluates to 4 with
the patch because i cannot be less than zero (otherwise &d[i] would
be invalid/undefined) so the type-0 size we want (the maximum) is
&d[7] - (&d[0] + 3) or 4.


Maybe simply ignore VR_ANTI_RANGEs for now then?


Yes, that makes sense.


The code above is based on the observation that an anti-range is
often used to represent the full subrange of a narrower signed type
like signed char (as ~[128, -129]).  I haven't been able to create
an anti-range like ~[5, 9]. When/how would a range like that come
about (so I can test it and implement the above correctly)?


if (a < 4)
  if (a > 8)
b = a;

then b should have ~[5, 9]


Right :)  I have figured out by now how to create an anti-range
in general.  What I meant is that I haven't had luck creating
them in a way that the tree-object-size pass could see (I'm guessing
because EVRP doesn't understand relational expressions).  So given
this modified example from above:

char d[9];

#define bos(p, t) __builtin_object_size (p, t)

long f (unsigned a)
{
   unsigned b = 0;

   if (a < 4)
 if (a > 8)
   b = a;

   char *p = &d[b];
   return bos (p, 0);
}

The value ranges after Early VRP are:

_1: VARYING
b_2: VARYING
a_4(D): VARYING
p_6: ~[0B, 0B]
_8: VARYING

But with the removal of the anti-range code this will be a moot
point.


Maybe the poor range info is a consequence of the pass only benefiting
from EVRP and not VRP?


The range of 'p' is indeed not known (we only represent integer bound ranges).
You seem to want the range of p - &d[0] here, something that is not present
in the IL.


Yes, that's effectively what this patch does.  Approximate pointer
ranges.


It's just something I haven't had time to work on yet and with the
close of stage 1 approaching I wanted to put out this version for
review.  Do you view this enhancement as prerequisite for approving
the patch or is it something that you'd be fine with adding later?


I find the patch adds quite some ad-hoc ugliness to a pass that is
already complex and nearly impossible to understand.


I'm sorry it looks ugly to you.  I'm afraid I'm not yet f

Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread Jeff Law

On 12/01/2016 09:15 AM, David Edelsohn wrote:

A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.
From the looks of things, it appears we skip them in the dwarf2 code as 
well.  But I don't think we can use TEMPLATE_DECL here as that's defined 
by the C++ front end.


I think instead if you test something like:
  (int)TREE_CODE (decl) > NUM_TREE_CODES

You'll filter out any _DECL nodes coming out of the front-ends.


jeff



Re: [Fortran, Patch, PR{43366, 57117, 61337, 61376}, v1] Assign to polymorphic objects.

2016-12-01 Thread Andre Vehreschild
Hi all,

I am sorry, but the initial mail as well as Dominique answer puzzles me:

David: I do expect to 

write (*,*) any 

not being compilable at all, because "any" is an intrinsic function and I
suppose that gfortran is not able to print it. At best it gives an address. So
am I right to assume that it should have been:

write (*,*) x

?

Which is a bit strange. Furthermore, it is difficult for me to debug, because I
do not have access to an AIX machine. What address size does the machine have
32/48/64-bit? Is there a chance you send me the file that is generated
additionally by gfortran when called with -fdump-tree-original ? The file is
named alloc_comp_class_5.f03.003t.original usually.

Dominique: How did you get that? Do you have access to an AIX machine? What
kind of instrumentation was active in the compiler you mentioned?

- Andre

On Wed, 30 Nov 2016 21:51:30 +0100
Dominique d'Humières  wrote:

> If I compile the test with an instrumented  gfortran , I get 
> 
> ../../work/gcc/fortran/interface.c:2948:33: runtime error: load of value
> 1818451807, which is not a valid value for type ‘expr_t'
> 
> Dominique
> 
> > Le 30 nov. 2016 à 21:06, David Edelsohn  a écrit :
> > 
> > Hi, Andre
> > 
> > I have noticed that the alloc_comp_class_5.f03 testcase fails on AIX.
> > Annotating the testcase a little, shows that the failure is at
> > 
> >  if (any(x /= ["foo", "bar", "baz"])) call abort()
> > 
> > write (*,*) any
> > 
> > at the point of failure produces
> > 
> > "foobarba"
> > 
> > - David  
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] avoid calling alloca(0)

2016-12-01 Thread Jeff Law

On 11/30/2016 09:09 PM, Martin Sebor wrote:

What I think this tells us is that we're not at a place where we're
clean.  But we can incrementally get there.  The warning is only
catching a fairly small subset of the cases AFAICT.  That's not unusual
and analyzing why it didn't trigger on those cases might be useful as
well.


The warning has no smarts.  It relies on constant propagation and
won't find a call unless it sees it's being made with a constant
zero.  Looking at the top two on the list the calls are in extern
functions not called from the same source file, so it probably just
doesn't see that the functions are being called from another file
with a zero.  Building GCC with LTO might perhaps help.
Right.  This is consistent with the limitations of other similar 
warnings such as null pointer dereferences.





So where does this leave us for gcc-7?  I'm wondering if we drop the
warning in, but not enable it by default anywhere.  We fix the cases we
can (such as reg-stack.c, tree-ssa-threadedge.c, maybe others) before
stage3 closes, and shoot for the rest in gcc-8, including improving the
warning (if there's something we can clearly improve), and enabling the
warning in -Wall or -Wextra.


I'm fine with deferring the GCC fixes and working on the cleanup
over time but I don't think that needs to gate enabling the option
with -Wextra.  The warnings can be suppressed or prevented from
causing errors during a GCC build either via a command line option
or by pragma in the code.  AFAICT, from the other warnings I see
go by, this is what has been done for -Wno-implicit-fallthrough
while those warnings are being cleaned up.  Why not take the same
approach here?
The difference vs implicit fallthrough is that new instances of implicit 
fallthrus aren't likely to be exposed by changes in IL that occur due to 
transformations in the optimizer pipeline.


Given the number of runtime triggers vs warnings, we know there's many 
instances of passing 0 to the allocators that we're not diagnosing. I 
can easily see differences in the early IL (such as those due to 
BRANCH_COST differing for targets) exposing/hiding cases where 0 flows 
into the allocator argument.  Similarly for changes in inlining 
decisions, jump threading, etc for profiled bootstraps.  I'd like to 
avoid playing whack-a-mole right now.


So I'm being a bit more conservative here.  Maybe it'd be appropriate in 
Wextra since that's not enabled by default for GCC builds.





As much as I would like to improve the warning itself I'm also not
sure I see much of an opportunity for it.  It's not prone to high
rates of false positives (hardly any in fact) and the cases it
misses are those where it simply doesn't see the argument value
because it's not been made available by constant propagation.
There's always ways :-)   For example, I wouldn't be at all surprised if 
you found PHIs that feed the allocation where one or more of the PHI 
arguments are 0.





That said, I consider the -Walloc-size-larger-than warning to be
the more important part of the patch by far.  I'd hate a lack of
consensus on how to deal with GCC's handful of instances of
alloca(0) to stall the rest of the patch.

Agreed on not wanting alloca(0) handling to stall the rest of the patch.

Jeff


Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread David Edelsohn
On Thu, Dec 1, 2016 at 1:25 PM, Jeff Law  wrote:
> On 12/01/2016 09:15 AM, David Edelsohn wrote:
>>
>> A number of the "variant" testcases fail to build on AIX and targets
>> that use stabs.  The failure looks like:
>>
>> /tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
>> internal compiler error: tree check: expected field_decl, have
>> template_decl in int_bit_position, at tree.h:5396
>>
>> which occurs in dbxout_type_fields()
>>
>>   /* Output the name, type, position (in bits), size (in bits) of each
>>  field that we can support.  */
>>   for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
>>  ...
>>   if (VAR_P (tem))
>> {
>>  ...
>>  }
>>   else
>> {
>>   stabstr_C (',');
>>   stabstr_D (int_bit_position (tem));
>>   stabstr_C (',');
>>   stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
>>   stabstr_C (';');
>> }
>>
>> where tem is a TEMPLATE_DECL.  The dbxout code currently skips
>> TYPE_DECL, nameless fields, and CONST_DECL.
>>
>> dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
>> "The debugger doesn't know what to do with such entities anyhow", so
>> this proposed patch skips them in dbxout_type_fields() as well.
>>
>> Okay?
>>
>> Thanks, David
>>
>>
>> PR debug/66419
>> PR c++/78235
>> * dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.
>
> From the looks of things, it appears we skip them in the dwarf2 code as
> well.  But I don't think we can use TEMPLATE_DECL here as that's defined by
> the C++ front end.

TEMPLATE_DECL is defined in cp/cp-tree.def, which is included in
all-tree.def, which is included in tree-core.h, which is included in
tree.h, which is included in dbxout.c.

It also is referenced in common code in gcc/tree.c.

> I think instead if you test something like:
>   (int)TREE_CODE (decl) > NUM_TREE_CODES
>
> You'll filter out any _DECL nodes coming out of the front-ends.

No other DECLs seem to escape.

- David


Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread Jeff Law

On 12/01/2016 11:41 AM, David Edelsohn wrote:

On Thu, Dec 1, 2016 at 1:25 PM, Jeff Law  wrote:

On 12/01/2016 09:15 AM, David Edelsohn wrote:


A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.


From the looks of things, it appears we skip them in the dwarf2 code as
well.  But I don't think we can use TEMPLATE_DECL here as that's defined by
the C++ front end.


TEMPLATE_DECL is defined in cp/cp-tree.def, which is included in
all-tree.def, which is included in tree-core.h, which is included in
tree.h, which is included in dbxout.c.

It also is referenced in common code in gcc/tree.c.

In that case, go ahead with checking TEMPLATE_DECL.





I think instead if you test something like:
  (int)TREE_CODE (decl) > NUM_TREE_CODES

You'll filter out any _DECL nodes coming out of the front-ends.


No other DECLs seem to escape.

Good :-)

jeff



Re: [Fortran, Patch, PR{43366, 57117, 61337, 61376}, v1] Assign to polymorphic objects.

2016-12-01 Thread David Edelsohn
Dump sent privately.

Yes, I meant "x".

AIX defaults to 32 bit.

- David

On Thu, Dec 1, 2016 at 1:31 PM, Andre Vehreschild  wrote:
> Hi all,
>
> I am sorry, but the initial mail as well as Dominique answer puzzles me:
>
> David: I do expect to
>
> write (*,*) any
>
> not being compilable at all, because "any" is an intrinsic function and I
> suppose that gfortran is not able to print it. At best it gives an address. So
> am I right to assume that it should have been:
>
> write (*,*) x
>
> ?
>
> Which is a bit strange. Furthermore is it difficult for me to debug, because I
> do not have access to an AIX machine. What address size does the machine have
> 32/48/64-bit? Is there a chance you send me the file that is generated
> additionally by gfortran when called with -fdump-tree-original ? The file is
> named alloc_comp_class_5.f03.003t.original usually.
>
> Dominique: How did you get that? Do you have access to an AIX machine? What
> kind of instrumentation was active in the compiler you mentioned?
>
> - Andre
>
> On Wed, 30 Nov 2016 21:51:30 +0100
> Dominique d'Humières  wrote:
>
>> If I compile the test with an instrumented  gfortran , I get
>>
>> ../../work/gcc/fortran/interface.c:2948:33: runtime error: load of value
>> 1818451807, which is not a valid value for type ‘expr_t'
>>
>> Dominique
>>
>> > Le 30 nov. 2016 à 21:06, David Edelsohn  a écrit :
>> >
>> > Hi, Andre
>> >
>> > I have noticed that the alloc_comp_class_5.f03 testcase fails on AIX.
>> > Annotating the testcase a little, shows that the failure is at
>> >
>> >  if (any(x /= ["foo", "bar", "baz"])) call abort()
>> >
>> > write (*,*) any
>> >
>> > at the point of failure produces
>> >
>> > "foobarba"
>> >
>> > - David
>>
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


Go patch committed: add slice initializers to the GC root list

2016-12-01 Thread Ian Lance Taylor
As of https://golang.org/cl/32917 we can put slice initializers in the
.data section.  The program can still change the values in those
slices.  That means that if the slice elements can contain pointers,
we need to register the entire initializer as a GC root.

This would be straightforward except that we only have a Bexpression
for the slice initializer, not an Expression.  So introduce a
Backend_expression type that wraps a Bexpression as an Expression.

The test case for this is https://golang.org/cl/33790.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243094)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-97b949f249515a61d3c09e9e06f08c8af189e967
+b7bad96ce0af50a1129eaab9aa110d68a601917b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 243084)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -4295,6 +4295,20 @@ Unary_expression::do_get_backend(Transla
  true, copy_to_heap, false,
  bexpr);
  bexpr = gogo->backend()->var_expression(implicit, loc);
+
+ // If we are not copying a slice initializer to the heap,
+ // then it can be changed by the program, so if it can
+ // contain pointers we must register it as a GC root.
+ if (this->is_slice_init_
+ && !copy_to_heap
+ && this->expr_->type()->has_pointer())
+   {
+ Bexpression* root =
+   gogo->backend()->var_expression(implicit, loc);
+ root = gogo->backend()->address_expression(root, loc);
+ Type* type = Type::make_pointer_type(this->expr_->type());
+ gogo->add_gc_root(Expression::make_backend(root, type, loc));
+   }
}
   else if ((this->expr_->is_composite_literal()
|| this->expr_->string_expression() != NULL)
@@ -15433,6 +15447,28 @@ Expression::make_compound(Expression* in
   return new Compound_expression(init, expr, location);
 }
 
+// Class Backend_expression.
+
+int
+Backend_expression::do_traverse(Traverse*)
+{
+  return TRAVERSE_CONTINUE;
+}
+
+void
+Backend_expression::do_dump_expression(Ast_dump_context* ast_dump_context) const
+{
+  ast_dump_context->ostream() << "backend_expression<";
+  ast_dump_context->dump_type(this->type_);
+  ast_dump_context->ostream() << ">";
+}
+
+Expression*
+Expression::make_backend(Bexpression* bexpr, Type* type, Location location)
+{
+  return new Backend_expression(bexpr, type, location);
+}
+
 // Import an expression.  This comes at the end in order to see the
 // various class definitions.
 
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 243084)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -137,7 +137,8 @@ class Expression
 EXPRESSION_STRUCT_FIELD_OFFSET,
 EXPRESSION_LABEL_ADDR,
 EXPRESSION_CONDITIONAL,
-EXPRESSION_COMPOUND
+EXPRESSION_COMPOUND,
+EXPRESSION_BACKEND
   };
 
   Expression(Expression_classification, Location);
@@ -485,6 +486,10 @@ class Expression
   static Expression*
   make_compound(Expression*, Expression*, Location);
 
+  // Make a backend expression.
+  static Expression*
+  make_backend(Bexpression*, Type*, Location);
+
   // Return the expression classification.
   Expression_classification
   classification() const
@@ -3825,6 +3830,54 @@ class Compound_expression : public Expre
   Expression* expr_;
 };
 
+// A backend expression.  This is a backend expression wrapped in an
+// Expression, for convenience during backend generation.
+
+class Backend_expression : public Expression
+{
+ public:
+  Backend_expression(Bexpression* bexpr, Type* type, Location location)
+: Expression(EXPRESSION_BACKEND, location), bexpr_(bexpr), type_(type)
+  {}
+
+ protected:
+  int
+  do_traverse(Traverse*);
+
+  // For now these are always valid static initializers.  If that
+  // changes we can change this.
+  bool
+  do_is_static_initializer() const
+  { return true; }
+
+  Type*
+  do_type()
+  { return this->type_; }
+
+  void
+  do_determine_type(const Type_context*)
+  { }
+
+  Expression*
+  do_copy()
+  {
+return new Backend_expression(this->bexpr_, this->type_, this->location());
+  }
+
+  Bexpression*
+  do_get_backend(Translate_context*)
+  { return this->bexpr_; }
+
+  void
+  do_dump_expression(Ast_dump_context*) const;
+
+ private:
+  // The backend expression we are wrapping.
+  Bexpression* bexpr_;
+  // The type of the expression.
+  Type* type_;
+};

[Committed] PR fortran/78279 -- convert gcc_assert to internal error

2016-12-01 Thread Steve Kargl
I've committed the attached patch, which converts a gcc_assert()
to a conditional expression that may call gfc_internal_error().

2016-12-01  Steven G. Kargl  

PR fortran/78279
* dependency.c (identical_array_ref): Convert gcc_assert to conditional
and gfc_internal_error.

2016-12-01  Steven G. Kargl  

PR fortran/78279
* gfortran.dg/pr78279.f90: New test.

-- 
Steve
Index: gcc/fortran/dependency.c
===
--- gcc/fortran/dependency.c	(revision 242789)
+++ gcc/fortran/dependency.c	(working copy)
@@ -101,7 +101,9 @@ identical_array_ref (gfc_array_ref *a1, 
 
   if (a1->type == AR_ELEMENT && a2->type == AR_ELEMENT)
 {
-  gcc_assert (a1->dimen == a2->dimen);
+  if (a1->dimen != a2->dimen)
+	gfc_internal_error ("identical_array_ref(): inconsistent dimensions");
+
   for (i = 0; i < a1->dimen; i++)
 	{
 	  if (gfc_dep_compare_expr (a1->start[i], a2->start[i]) != 0)
Index: gcc/testsuite/gfortran.dg/pr78279.f90
===
--- gcc/testsuite/gfortran.dg/pr78279.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78279.f90	(working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+program p
+   integer :: i
+   real :: z(2,4)
+   z = 0.0
+   do i = 1, 3
+  if ( z(i) > z(1,i+1) ) print *, i   ! { dg-error "mismatch in array reference" }
+   end do
+end


Re: [Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

2016-12-01 Thread Jeff Law

On 12/01/2016 08:29 AM, James Greenhalgh wrote:


Hi,

There's no functional change in this patch, just a rename.

The size recorded in "offset" is only ever incremented as we add new items
to the constant pool. But this information can become stale where those
constant pool entries would not get emitted.

Thus, it is only ever an upper bound on the size of the constant pool.

The only uses of get_pool_size are in rs6000 and there it is only used to
check whether a constant pool might be output - but explicitly renaming the
function to make it clear that you're getting an upper bound rather than the
real size can only be good for programmers using the interface.

OK?

Thanks,
James

---
2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p): Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.


Both parts of the fix for 78561 are OK.

Thanks,
Jeff

