date:20160404

[patch] Fix PR target/67172

2016-04-04 Thread Eric Botcazou

Hi,

this is the bootstrap failure on MinGW configured w/ --disable-sjlj-exceptions 
--with-dwarf2, a regression present on the mainline and 5 branch.  The problem 
is the undefined symbol __EH_FRAME_BEGIN__ coming from libgcc2.c, which 
prevents libgcc_s.dll from being linked.

The audit trail of the PR is quite confused: __LIBGCC_EH_FRAME_SECTION_NAME__ 
needs to be (and is correctly) defined during the libgcc build because DWARF2 
exceptions are used.  The problem is that libgcc2.c expects the usual non-ELF 
setup, with __EH_FRAME_BEGIN__ defined in crtstuff.c.  Now t-cygwin has:

CUSTOM_CRTSTUFF = yes

crtbegin.o: $(srcdir)/config/i386/cygming-crtbegin.c
$(crt_compile) -fno-omit-frame-pointer  -c $<

crtbeginS.o: $(srcdir)/config/i386/cygming-crtbegin.c
$(crt_compile) -fno-omit-frame-pointer  -c $< -DCRTSTUFFS_O

# We intentionally use a implementation-reserved init priority of 0,
# so allow the warning.
crtend.o: $(srcdir)/config/i386/cygming-crtend.c
$(crt_compile) -fno-omit-frame-pointer -Wno-error -c $<

That is to say, crtstuff.c is not compiled and instead __EH_FRAME_BEGIN__ is 
private to cygming-crtbegin.c:

/* Stick a label at the beginning of the frame unwind info so we can
   register/deregister it with the exception handling library code.  */
#if DWARF2_UNWIND_INFO
static EH_FRAME_SECTION_CONST char __EH_FRAME_BEGIN__[]
  __attribute__((used, section(__LIBGCC_EH_FRAME_SECTION_NAME__), aligned(4)))
  = { };

with the registration/deregistration machinery in the same file.  Therefore 
the equivalent code in libgcc2.c simply needs to be skipped.

This code is already entirely skipped for Cygwin because of:

#ifndef __CYGWIN__

so the attached patch adds a similar test on MinGW for the problematic code.

Tested on x86/Windows configured with --disable-sjlj-exceptions --with-dwarf2, 
OK for the mainline and 5 branch?


2016-04-04  Eric Botcazou  

PR target/67172
* libgcc2.c (L__main): Undefine __LIBGCC_EH_FRAME_SECTION_NAME__ if
__MINGW32__ is defined.

-- 
Eric BotcazouIndex: libgcc2.c
===
--- libgcc2.c	(revision 234695)
+++ libgcc2.c	(working copy)
@@ -2209,7 +2209,12 @@ TRANSFER_FROM_TRAMPOLINE
 #if !defined (HAS_INIT_SECTION) || !defined (OBJECT_FORMAT_ELF)
 
 /* Some ELF crosses use crtstuff.c to provide __CTOR_LIST__, but use this
-   code to run constructors.  In that case, we need to handle EH here, too.  */
+   code to run constructors.  In that case, we need to handle EH here, too.
+   But MINGW32 is special because it handles CRTSTUFF and EH on its own.  */
+
+#ifdef __MINGW32__
+#undef __LIBGCC_EH_FRAME_SECTION_NAME__
+#endif
 
 #ifdef __LIBGCC_EH_FRAME_SECTION_NAME__
 #include "unwind-dw2-fde.h"

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Prathamesh Kulkarni

On 1 April 2016 at 23:02, Richard Biener  wrote:
> On April 1, 2016 3:48:35 PM GMT+02:00, Prathamesh Kulkarni 
>  wrote:
>>Hi,
>>The attached patch introduces param max-lto-partition which creates an
>>upper
>>bound for partition size.
>>
>>My primary motivation for this patch is to fix building chromium for
>>arm
>>with -flto-partition=one.
>>Chromium fails to build with -flto-partition={none, one} with assembler
>>error:
>>"branch out of range error"
>>because in both these cases LTO creates a single text section of 18 mb
>>which exceeds thumb's limit of 16 mb and arm backend emits a short
>>call if caller and callee are in same section.
>>This is binutils PR18625:
>>https://sourceware.org/bugzilla/show_bug.cgi?id=18625
>>With patch, chromium builds for -flto-partition=one (by creating more
>>than one but minimal number of partitions to honor 16 mb limit).
>>I haven't tested with -flto-partition=none but I suppose the build
>>will still fail for none, because it won't involve partitioning?  I am
>>not sure how to fix for none case.
>>
>>As suggested by Jim in binutils PR18625, the proper fix would be to
>>implement branch relaxation in arm's port of gas, however I suppose
>>only LTO will realistically create such large sections,
>>and implementing branch relaxation appears to be quite complicated and
>>probably too much of
>>an effort for this single use case, so this patch serves as a
>>work-around to the issue.
>>I am looking into fine-tuning the param value for ARM backend to
>>roughly match limit
>>of 16 mb.
>>
>>AFAIU, this would change semantics of --param n_lto_partitions (or
>>-flto-partition=one) from
>>"exactly n_lto_partitions" to "at-least n_lto_partitions". If that's
>>not desirable maybe we could add
>>another param/option ?
>>Cross-tested on arm*-*-*.
>>Would this patch be OK for stage-1 (after getting param value right
>>for ARM target) ?
>
> What do you want to achieve?  Changing =one semantics doesn't look right to 
> me.
> Adding a param for maximum size sounds good in general, but only to increase 
> the maximum number of partitions for =balanced (the default).
Well, chromium fails to build on ARM with -flto-partition={none, one}
because the size of text section created with LTO,
exceeds the limit of 16 mb for thumb2 which results in assembler
errors: "branch out of range".
I was trying to fix that by creating minimal number of partitions such
that size of each partition is not greater than section size limit.
I suppose in theory the problem could also present with balanced
partitioning if total_size / n_lto_partitions exceeds section size
limit,
although not sure if this will be a practical case.

Thanks,
Prathamesh
>
> Richard.
>
>>
>>Thanks,
>>Prathamesh
>
>

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Richard Biener

On Mon, 4 Apr 2016, Prathamesh Kulkarni wrote:

> On 1 April 2016 at 23:02, Richard Biener  wrote:
> > On April 1, 2016 3:48:35 PM GMT+02:00, Prathamesh Kulkarni 
> >  wrote:
> >>Hi,
> >>The attached patch introduces param max-lto-partition which creates an
> >>upper
> >>bound for partition size.
> >>
> >>My primary motivation for this patch is to fix building chromium for
> >>arm
> >>with -flto-partition=one.
> >>Chromium fails to build with -flto-partition={none, one} with assembler
> >>error:
> >>"branch out of range error"
> >>because in both these cases LTO creates a single text section of 18 mb
> >>which exceeds thumb's limit of 16 mb and arm backend emits a short
> >>call if caller and callee are in same section.
> >>This is binutils PR18625:
> >>https://sourceware.org/bugzilla/show_bug.cgi?id=18625
> >>With patch, chromium builds for -flto-partition=one (by creating more
> >>than one but minimal number of partitions to honor 16 mb limit).
> >>I haven't tested with -flto-partition=none but I suppose the build
> >>will still fail for none, because it won't involve partitioning?  I am
> >>not sure how to fix for none case.
> >>
> >>As suggested by Jim in binutils PR18625, the proper fix would be to
> >>implement branch relaxation in arm's port of gas, however I suppose
> >>only LTO will realistically create such large sections,
> >>and implementing branch relaxation appears to be quite complicated and
> >>probably too much of
> >>an effort for this single use case, so this patch serves as a
> >>work-around to the issue.
> >>I am looking into fine-tuning the param value for ARM backend to
> >>roughly match limit
> >>of 16 mb.
> >>
> >>AFAIU, this would change semantics of --param n_lto_partitions (or
> >>-flto-partition=one) from
> >>"exactly n_lto_partitions" to "at-least n_lto_partitions". If that's
> >>not desirable maybe we could add
> >>another param/option ?
> >>Cross-tested on arm*-*-*.
> >>Would this patch be OK for stage-1 (after getting param value right
> >>for ARM target) ?
> >
> > What do you want to achieve?  Changing =one semantics doesn't look right to 
> > me.
> > Adding a param for maximum size sounds good in general, but only to 
> > increase the maximum number of partitions for =balanced (the default).
>
> Well, chromium fails to build on ARM with -flto-partition={none, one}
> because the size of text section created with LTO,
> exceeds the limit of 16 mb for thumb2 which results in assembler
> errors: "branch out of range".
> I was trying to fix that by creating minimal number of partitions such
> that size of each partition is not greater than section size limit.

Ok, but you simply shouldn't use -flto-partition={none,one} if it doesn't 
work.  Note that "partition size" and text section size do not have
a 1:1 correspondence so a safe limit is hard to achieve anyway.

> I suppose in theory the problem could also present with balanced
> partitioning if total_size / n_lto_partitions exceeds section size
> limit,
> although not sure if this will be a practical case.

I guess an artificial testcase can easily hit this.  Or you can
hit this by adjusting --param lto-partitions to 1.  I think
adding a --param lto-max-partition is missing given that we
already have a --param lto-min-partition and the partitioning
algorithm tries to create lto-partitions partitions (but not smaller
than lto-min-partition) but it never creates more than lto-partitions
partitions as there is no upper bound on individual partition size.

This is also why lto-partitions has such a high default (to exploit
parallelism - but if there is only a very small number of CPU cores
available it doesn't make sense to split up so much for small programs).

That said, lto-partitions is a hint currently but also an upper bound
because we lack lto-max-partition.  Let's fix that instead.

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[PATCH][AArch64] Backport of PR 70044 fix to GCC 5

2016-04-04 Thread Kyrill Tkachov


Hi all,

I'd like to backport Nicks' patch for PR 70044 to the GCC 5 branch.
The patch doesn't apply cleanly because the 
aarch64_override_options_after_change and
associated machinery was reworked for GCC 6. This is the (simple) backport of 
that
patch to GCC 5.

Bootstrapped and tested on aarch64-none-linux-gnu. Confirmed that the test 
fails before the patch
and passes with it.

Ok to commit?

Thanks,
Kyrill

2016-04-04  Nick Clifton  
Kyrylo Tkachov  

PR target/70044
* config/aarch64/aarch64.c
(aarch64_override_options_after_change): When forcing
flag_omit_frame_pointer to be true, use a special value that can
be detected if this function is called again, thus preventing
flag_omit_leaf_frame_pointer from being forced to be false.

2016-04-04  Kyrylo Tkachov  

Backport from mainline
2016-03-31  Nick Clifton  

PR target/70044
* gcc.target/aarch64/pr70044.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c7291e535d756afd977894afa9c2bc53f5d8656..4113609d470ff21de47a647217fd20be4c967225 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6917,10 +6917,26 @@ aarch64_override_options (void)
 static void
 aarch64_override_options_after_change (void)
 {
+  /* The logic here is that if we are disabling all frame pointer generation
+ then we do not need to disable leaf frame pointer generation as a
+ separate operation.  But if we are *only* disabling leaf frame pointer
+ generation then we set flag_omit_frame_pointer to true, but in
+ aarch64_frame_pointer_required we return false only for leaf functions.
+
+ PR 70044: We have to be careful about being called multiple times for the
+ same function.  Once we have decided to set flag_omit_frame_pointer just
+ so that we can omit leaf frame pointers, we must then not interpret a
+ second call as meaning that all frame pointer generation should be
+ omitted.  We do this by setting flag_omit_frame_pointer to a special,
+ non-zero value.  */
+
+  if (flag_omit_frame_pointer == 2)
+flag_omit_frame_pointer = 0;
+
   if (flag_omit_frame_pointer)
 flag_omit_leaf_frame_pointer = false;
   else if (flag_omit_leaf_frame_pointer)
-flag_omit_frame_pointer = true;
+flag_omit_frame_pointer = 2;
 
   /* If not optimizing for size, set the default
  alignment to what the target wants */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr70044.c b/gcc/testsuite/gcc.target/aarch64/pr70044.c
new file mode 100644
index ..1a84941dd7ea9dc366dd0ba51e0a96fcb312048f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr70044.c
@@ -0,0 +1,14 @@
+/* { dg-do link } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -O --save-temps -fno-omit-frame-pointer" } */
+
+extern int atoi (const char *);
+
+int
+main (int argc, char **argv)
+{
+  return atoi (argv[0]) + 1;
+}
+
+/* Check that the frame pointer really is created.  */
+/* { dg-final { scan-lto-assembler "add	x29, sp," } } */

Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-04 Thread Richard Biener

On Fri, 1 Apr 2016, Bernd Schmidt wrote:

> On 04/01/2016 11:08 AM, Richard Biener wrote:
> > {
> > ! if (canon_true_dependence (s_info->mem,
> > !GET_MODE (s_info->mem),
> > !s_info->mem_addr,
> > !mem, mem_addr))
> > {
> >   s_info->rhs = NULL;
> >   s_info->const_rhs = NULL;
> > --- 1609,1617 
> >the value of store_info.  If it is, set the rhs to NULL to
> >keep it from being used to remove a load.  */
> > {
> > ! if (canon_output_dependence (s_info->mem, true,
> > !  mem, GET_MODE (mem),
> > !  mem_addr))
> > {
> >   s_info->rhs = NULL;
> >   s_info->const_rhs = NULL;
> 
> I think the patch is ok, but there is a comment in that function which
> references canon_true_dependence; that should also be fixed up.

Done, though I don't understand it at all ... if alias-set was supposed
to be zero for all cases we call canon_true_dependence then the issue
wouldn't have happened.  Maybe there was times where passing mem_addr
== NULL_RTX to canon_true_dependence caused it to bail out?

Not sure how to adjust that comment now, maybe it would be valid
to simply remove the if (spill_alias_set) case and always use
the else case?

> Isn't the testcase invalid though? I thought accesses through char * 
> pointers bypass aliasing rules, but accessing a char array through int * 
> and long * pointers doesn't?

I have installed it as the following with adjusted testcase, if somebody 
can shed some light on the odd comment I'll happily followup.

Richard.

2016-04-04  Richard Biener  

PR rtl-optimization/70484
* rtl.h (canon_output_dependence): Declare.
* alias.c (canon_output_dependence): New function.
* dse.c (record_store): Use canon_output_dependence rather
than canon_true_dependence.

* gcc.dg/torture/pr70484.c: New testcase.

Index: gcc/rtl.h
===
*** gcc/rtl.h   (revision 234663)
--- gcc/rtl.h   (working copy)
*** extern int anti_dependence (const_rtx, c
*** 3652,3657 
--- 3652,3659 
  extern int canon_anti_dependence (const_rtx, bool,
  const_rtx, machine_mode, rtx);
  extern int output_dependence (const_rtx, const_rtx);
+ extern int canon_output_dependence (const_rtx, bool,
+   const_rtx, machine_mode, rtx);
  extern int may_alias_p (const_rtx, const_rtx);
  extern void init_alias_target (void);
  extern void init_alias_analysis (void);
Index: gcc/alias.c
===
*** gcc/alias.c (revision 234663)
--- gcc/alias.c (working copy)
*** output_dependence (const_rtx mem, const_
*** 3057,3062 
--- 3057,3076 
 /*mem_canonicalized=*/false,
 /*x_canonicalized*/false, /*writep=*/true);
  }
+ 
+ /* Likewise, but we already have a canonicalized MEM, and X_ADDR for X.
+Also, consider X in X_MODE (which might be from an enclosing
+STRICT_LOW_PART / ZERO_EXTRACT).
+If MEM_CANONICALIZED is true, MEM is canonicalized.  */
+ 
+ int
+ canon_output_dependence (const_rtx mem, bool mem_canonicalized,
+const_rtx x, machine_mode x_mode, rtx x_addr)
+ {
+   return write_dependence_p (mem, x, x_mode, x_addr,
+mem_canonicalized, /*x_canonicalized=*/true,
+/*writep=*/true);
+ }
  
  
  
Index: gcc/dse.c
===
*** gcc/dse.c   (revision 234663)
--- gcc/dse.c   (working copy)
*** record_store (rtx body, bb_info_t bb_inf
*** 1609,1618 
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
! if (canon_true_dependence (s_info->mem,
!GET_MODE (s_info->mem),
!s_info->mem_addr,
!mem, mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
--- 1609,1617 
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
! if (canon_output_dependence (s_info->mem, true,
!  mem, GET_MODE (mem),
!  mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
Index: gcc/testsuite/gcc.dg/torture/pr70484.c
===
*** gcc/testsuite/gcc.dg/torture/pr70484.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70484.c  (revision 0)
***

Re: [Fortran, Patch, pr65795, v1] Segfault (invalid write) for ALLOCATE statement involving COARRAYS

2016-04-04 Thread Andre Vehreschild

Hi Jerry, hi Dominique,

Jerry, thank you for the review. Committed to

trunk as r234710 and to
gcc-5-branch as r234711

including the additional option Dominique noted I had missed. Thank for
pointing that out, Dominique.

Regards,
Andre

On Sun, 3 Apr 2016 23:46:34 +0200
Dominique d'Humières  wrote:

> > Le 3 avr. 2016 à 23:38, Andre Vehreschild  a écrit :
> > 
> > Hi Dominique,
> > 
> > I thought that was implicit, isn't it?  
> 
> I don’t think so and I see in the log files
> 
> gcc/testsuite/gfortran4/gfortran.sum:UNRESOLVED: 
> gfortran.dg/coarray_allocate_6.f08   -O0   scan-tree-dump-not original 
> "c.caf.x = 0B"
> 
> > Is only the removal of the file implicit?  
> 
> Yes,
> 
> > I will add the option to be on the safe side.  
> 
> Thanks,
> 
> Dominique
> 
> > 
> > - Andre
> > 
> > Am 3. April 2016 22:57:48 MESZ, schrieb "Dominique d'Humières" 
> > :
> > Andre,
> > 
> > Does not the test gfortran.dg/coarray_allocate_6.f08 require a 
> > -fdump-tree-original in the dg-options list?
> > 
> > Thanks for the patch,
> > 
> > Dominique
> > 
> > 
> > -- 
> > Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
> > Tel.: +49 241 929 10 18 * ve...@gmx.de  
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 234709)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,11 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/65795
+	* trans-array.c (gfc_array_allocate): When the array is a coarray,
+	do not nullyfing its allocatable components in array_allocate, because
+	the nullify missed the array ref and nullifies the wrong component.
+	Cosmetics.
+
 2016-03-29  Andre Vehreschild  
 
 	PR fortran/70397
Index: gcc/fortran/trans-array.c
===
--- gcc/fortran/trans-array.c	(Revision 234709)
+++ gcc/fortran/trans-array.c	(Arbeitskopie)
@@ -5550,8 +5550,8 @@
   else
   gfc_add_expr_to_block (&se->pre, set_descriptor);
 
-  if ((expr->ts.type == BT_DERIVED)
-	&& expr->ts.u.derived->attr.alloc_comp)
+  if (expr->ts.type == BT_DERIVED && expr->ts.u.derived->attr.alloc_comp
+  && !coarray)
 {
   tmp = gfc_nullify_alloc_comp (expr->ts.u.derived, se->expr,
 ref->u.ar.as->rank);
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 234709)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/65795
+	* gfortran.dg/coarray_allocate_6.f08: New test.
+
 2016-04-04  Richard Biener  
 
 	PR rtl-optimization/70484
Index: gcc/testsuite/gfortran.dg/coarray_allocate_6.f08
===
--- gcc/testsuite/gfortran.dg/coarray_allocate_6.f08	(nicht existent)
+++ gcc/testsuite/gfortran.dg/coarray_allocate_6.f08	(Arbeitskopie)
@@ -0,0 +1,27 @@
+! { dg-do run }
+! { dg-options "-fcoarray=single -fdump-tree-original" }
+
+! Contributed by Tobias Burnus  
+! Test fix for pr65795.
+
+implicit none
+
+type t2
+  integer, allocatable :: x
+end type t2
+
+type t3
+  type(t2), allocatable :: caf[:]
+end type t3
+
+!type(t3), save, target :: c, d
+type(t3), target :: c, d
+integer :: stat
+
+allocate(c%caf[*], stat=stat)
+end
+
+! Besides checking that the executable does not crash anymore, check
+! that the cause has been remove.
+! { dg-final { scan-tree-dump-not "c.caf.x = 0B" "original" } }
+
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 234706)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,11 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/65795
+	* trans-array.c (gfc_array_allocate): When the array is a coarray,
+	do not nullyfing its allocatable components in array_allocate, because
+	the nullify missed the array ref and nullifies the wrong component.
+	Cosmetics.
+
 2016-03-28  Andre Vehreschild  
 
 	PR fortran/70397
Index: gcc/fortran/trans-array.c
===
--- gcc/fortran/trans-array.c	(Revision 234706)
+++ gcc/fortran/trans-array.c	(Arbeitskopie)
@@ -5390,8 +5390,8 @@
   else
   gfc_add_expr_to_block (&se->pre, set_descriptor);
 
-  if ((expr->ts.type == BT_DERIVED)
-	&& expr->ts.u.derived->attr.alloc_comp)
+  if (expr->ts.type == BT_DERIVED && expr->ts.u.derived->attr.alloc_comp
+  && !coarray)
 {
   tmp = gfc_nullify_alloc_comp (expr->ts.u.derived, se->expr,
 ref->u.ar.as->rank);
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 234706)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/65795
+	* gfortran.dg/coarray_allocate_6.f08: New test.
+
 2016-04-01  Ilya Enkovich  
 
 	Backport from mainline r

Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-04 Thread Richard Biener

On Mon, 4 Apr 2016, Richard Biener wrote:

> On Fri, 1 Apr 2016, Bernd Schmidt wrote:
> 
> > On 04/01/2016 11:08 AM, Richard Biener wrote:
> > >   {
> > > !   if (canon_true_dependence (s_info->mem,
> > > !  GET_MODE (s_info->mem),
> > > !  s_info->mem_addr,
> > > !  mem, mem_addr))
> > >   {
> > > s_info->rhs = NULL;
> > > s_info->const_rhs = NULL;
> > > --- 1609,1617 
> > >  the value of store_info.  If it is, set the rhs to NULL to
> > >  keep it from being used to remove a load.  */
> > >   {
> > > !   if (canon_output_dependence (s_info->mem, true,
> > > !mem, GET_MODE (mem),
> > > !mem_addr))
> > >   {
> > > s_info->rhs = NULL;
> > > s_info->const_rhs = NULL;
> > 
> > I think the patch is ok, but there is a comment in that function which
> > references canon_true_dependence; that should also be fixed up.
> 
> Done, though I don't understand it at all ... if alias-set was supposed
> to be zero for all cases we call canon_true_dependence then the issue
> wouldn't have happened.  Maybe there was times where passing mem_addr
> == NULL_RTX to canon_true_dependence caused it to bail out?
> 
> Not sure how to adjust that comment now, maybe it would be valid
> to simply remove the if (spill_alias_set) case and always use
> the else case?
> 
> > Isn't the testcase invalid though? I thought accesses through char * 
> > pointers bypass aliasing rules, but accessing a char array through int * 
> > and long * pointers doesn't?
> 
> I have installed it as the following with adjusted testcase, if somebody 
> can shed some light on the odd comment I'll happily followup.

Jakub added that comment in r146669.  Still the code only handles
alias-sets being different specially by not deleting such stores.
But it doesn't break from the loop there (if that store is aliasing).

I suppose this code might have similar issues for deleting "dead"
stores when not replacing the RHS.

Richard.

> Richard.
> 
> 2016-04-04  Richard Biener  
> 
>   PR rtl-optimization/70484
>   * rtl.h (canon_output_dependence): Declare.
>   * alias.c (canon_output_dependence): New function.
>   * dse.c (record_store): Use canon_output_dependence rather
>   than canon_true_dependence.
> 
>   * gcc.dg/torture/pr70484.c: New testcase.
> 
> Index: gcc/rtl.h
> ===
> *** gcc/rtl.h (revision 234663)
> --- gcc/rtl.h (working copy)
> *** extern int anti_dependence (const_rtx, c
> *** 3652,3657 
> --- 3652,3659 
>   extern int canon_anti_dependence (const_rtx, bool,
> const_rtx, machine_mode, rtx);
>   extern int output_dependence (const_rtx, const_rtx);
> + extern int canon_output_dependence (const_rtx, bool,
> + const_rtx, machine_mode, rtx);
>   extern int may_alias_p (const_rtx, const_rtx);
>   extern void init_alias_target (void);
>   extern void init_alias_analysis (void);
> Index: gcc/alias.c
> ===
> *** gcc/alias.c   (revision 234663)
> --- gcc/alias.c   (working copy)
> *** output_dependence (const_rtx mem, const_
> *** 3057,3062 
> --- 3057,3076 
>/*mem_canonicalized=*/false,
>/*x_canonicalized*/false, /*writep=*/true);
>   }
> + 
> + /* Likewise, but we already have a canonicalized MEM, and X_ADDR for X.
> +Also, consider X in X_MODE (which might be from an enclosing
> +STRICT_LOW_PART / ZERO_EXTRACT).
> +If MEM_CANONICALIZED is true, MEM is canonicalized.  */
> + 
> + int
> + canon_output_dependence (const_rtx mem, bool mem_canonicalized,
> +  const_rtx x, machine_mode x_mode, rtx x_addr)
> + {
> +   return write_dependence_p (mem, x, x_mode, x_addr,
> +  mem_canonicalized, /*x_canonicalized=*/true,
> +  /*writep=*/true);
> + }
>   
>   
>   
> Index: gcc/dse.c
> ===
> *** gcc/dse.c (revision 234663)
> --- gcc/dse.c (working copy)
> *** record_store (rtx body, bb_info_t bb_inf
> *** 1609,1618 
>  the value of store_info.  If it is, set the rhs to NULL to
>  keep it from being used to remove a load.  */
>   {
> !   if (canon_true_dependence (s_info->mem,
> !  GET_MODE (s_info->mem),
> !  s_info->mem_addr,
> !  mem, mem_addr))
>   {
> s_info->rhs = NULL;
> s_info->const_rhs = NULL;
> --- 1609,1617 -

Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-04 Thread Jan-Benedict Glaw

On Fri, 2016-04-01 12:06:20 -0700, Jake Hamby  wrote:
> I apologize for the poor quality of the initial version of the patch
> that I submitted. I think I may have also mangled it by not

Don't apologize!  Posting work early enables others to comment on it.
GCC is a highly complex beast; nobody will produce a perfectly looking
patch on their first try.

[...]

> To be honest, my hope by sending out my work now, even in such a
> rough state, would be to try to avoid deprecating the GCC port to
> VAX, if only because: 1) there doesn't seem to be a compelling
> reason to remove support for it since it doesn't use any really old
> code paths that aren't also used by other backends (e.g. m68k and
> Atmel AVR use cc0, IBM S/390 uses non-IEEE FP formats), so it
> doesn't seem to be preventing any optimizations or code refactoring
> elsewhere in GCC that I could see, 2) even though NetBSD could
> continue to support VAX GCC, I noticed in the ChangeLogs that
> whenever somebody has made a change to a definition that affects the
> backends, they're usually very good about updating all of the
> backends so that they continue to compile, at least. So leaving the
> VAX backend in the tree would be beneficial from a maintenance
> standpoint for users of it, 3) the VAX backend is perhaps somewhat
> useful as a test case for GCC because so many unusual RTX standard
> instructions were obviously defined *for* it (although x86 defines a
> lot of them, too), although my interest is personally in preserving
> an interesting piece of computer history, and for nostalgia
> purposes.

As of now, ther VAX backend isn't near deprecation IMO. There'a
maintainer (Matt), who did quite a revamp a few years ago bringing the
VAX backend quite forward.  I also quite care for that backend and the
Build Robot I'm running is primarily(!) running to detect VAX
breakages early. (In fact, quite often Matt and I communicate over
submitted patches to the VAX backend.)

> I sent an earlier email to port-vax suggesting that future
> discussions of this project aren't relevant to gcc-patches, but I
> did want to get it on a few people's radar on the NetBSD side and
> try to solicit a bit of help on the questions I had as to how to
> avoid having to add that hack to gcc/except.c to make the optimizer
> not delete the insns. All of the other stuff can be worked on in
> NetBSD-current and avoid bothering the 99% of people who subscribe
> to gcc-patches who have no interest in the VAX backend.

You should /for sure/ bother the gcc-patches people! Please keep
Cc'ing patches to that mailing list.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  [Nach Firefox-Update gibt es Empfehlungen, wenn man einen 
neuen Tab aufmacht.]
the second  :   13:26 <@jbglaw> "Hide the new 
tab page"
13:27 <@jbglaw> Warum zum Teufel sind gerade Kacheln so 
ungeheuer angesagt?!
  13:57 <@andrel> die Mozilla Foundation hat eine LKW Ladung 
Fugenkitt gespendet bekommen?


signature.asc
Description: Digital signature

Re: [Fortran, Patch, pr66911, gcc-5, v1] ICE on allocate character with source as a derived type component

2016-04-04 Thread Andre Vehreschild

Hi Jerry,

thank you for the review. Committed as r234712 to gcc-5-branch.

Regards,
Andre

On Sun, 3 Apr 2016 12:22:31 -0700
Jerry DeLisle  wrote:

> On 04/03/2016 08:33 AM, Andre Vehreschild wrote:
> > Hi all,
> > 
> > attached patch fixes the ICE when using a deferred length char array as
> > source= expression in an allocate for complicated source= expressions.
> > Before the patch the compiler was relying on having the string length
> > available in the ts of the expression, but when the expression is
> > sufficiently complicated it is not set there. In trunk the problem does
> > not arise, because the source= expression is evaluated in more cases.
> > In gcc-5 this is not available without doing a major rewrite of the
> > allocate() statement's conv-routine. Therefore this small portion of
> > extra code reliably does the trick and takes the string_length from the
> > se.string_length now.
> > 
> > Bootstrapped and regtested ok on x86_64-linux-gnu/F23. Ok for
> > gcc-5-branch?
> > 
> > Regards,
> > Andre
> >   
> 
> Yes, OK
> 
> Thanks for your work.
> 
> Jerry


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 234711)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,10 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/66911
+	* trans-stmt.c (gfc_trans_allocate): Get the deferred length of an
+	expression by converting the expression when the length is not set
+	in the symbol's ts.
+
 2016-04-04  Andre Vehreschild  
 
 	PR fortran/65795
Index: gcc/fortran/trans-stmt.c
===
--- gcc/fortran/trans-stmt.c	(Revision 234710)
+++ gcc/fortran/trans-stmt.c	(Arbeitskopie)
@@ -5536,14 +5536,23 @@
 	  if (expr3_len == NULL_TREE
 	  && code->expr3->ts.type == BT_CHARACTER)
 	{
+	  gfc_init_se (&se, NULL);
 	  if (code->expr3->ts.u.cl
 		  && code->expr3->ts.u.cl->length)
 		{
-		  gfc_init_se (&se, NULL);
 		  gfc_conv_expr (&se, code->expr3->ts.u.cl->length);
 		  gfc_add_block_to_block (&block, &se.pre);
 		  expr3_len = gfc_evaluate_now (se.expr, &block);
 		}
+	  else
+		{
+		  /* The string_length is not set in the symbol, which prevents
+		 it being set in the ts.  Deduce it by converting expr3.  */
+		  gfc_conv_expr (&se, code->expr3);
+		  gfc_add_block_to_block (&block, &se.pre);
+		  gcc_assert (se.string_length);
+		  expr3_len = gfc_evaluate_now (se.string_length, &block);
+		}
 	  gcc_assert (expr3_len);
 	}
 	  /* For character arrays only the kind's size is needed, because
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 234711)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2016-04-04  Andre Vehreschild  
+
+	PR fortran/66911
+	* gfortran.dg/deferred_character_16.f90: New test.
+
 2016-04-04  Andre Vehreschild  
 
 	PR fortran/65795
Index: gcc/testsuite/gfortran.dg/deferred_character_16.f90
===
--- gcc/testsuite/gfortran.dg/deferred_character_16.f90	(nicht existent)
+++ gcc/testsuite/gfortran.dg/deferred_character_16.f90	(Arbeitskopie)
@@ -0,0 +1,24 @@
+! { dg-do run }
+
+program truc
+implicit none
+
+type t_env_table
+character(len=:), allocatable :: key
+end type
+
+type(t_env_table), dimension(:), allocatable :: environment_table
+
+character(len=:), allocatable :: s
+
+allocate(environment_table(1))
+environment_table(1)%key='tt'
+
+allocate(s, source=environment_table(1)%key)
+
+if ( .not. allocated(s) ) call abort()
+if ( s /= "tt" ) call abort()
+if ( len(s) /= 2 ) call abort()
+!print *, 's:"', s, '" derived:"',environment_table(1)%key,'"'
+
+end program

Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 11:24:41AM +0200, Richard Biener wrote:
> On Fri, 1 Apr 2016, Bernd Schmidt wrote:
> 
> > On 04/01/2016 11:08 AM, Richard Biener wrote:
> > >   {
> > > !   if (canon_true_dependence (s_info->mem,
> > > !  GET_MODE (s_info->mem),
> > > !  s_info->mem_addr,
> > > !  mem, mem_addr))
> > >   {
> > > s_info->rhs = NULL;
> > > s_info->const_rhs = NULL;
> > > --- 1609,1617 
> > >  the value of store_info.  If it is, set the rhs to NULL to
> > >  keep it from being used to remove a load.  */
> > >   {
> > > !   if (canon_output_dependence (s_info->mem, true,
> > > !mem, GET_MODE (mem),
> > > !mem_addr))
> > >   {
> > > s_info->rhs = NULL;
> > > s_info->const_rhs = NULL;
> > 
> > I think the patch is ok, but there is a comment in that function which
> > references canon_true_dependence; that should also be fixed up.
> 
> Done, though I don't understand it at all ... if alias-set was supposed
> to be zero for all cases we call canon_true_dependence then the issue
> wouldn't have happened.  Maybe there was times where passing mem_addr
> == NULL_RTX to canon_true_dependence caused it to bail out?
> 
> Not sure how to adjust that comment now, maybe it would be valid
> to simply remove the if (spill_alias_set) case and always use
> the else case?

I believe all of the spill_alias_set stuff is dead for many years.
In 4.4, a call to dse_record_singleton_alias_set has been removed
(supposedly related to introduction of IRA).  Then in 4.8 you've
removed the dse_record_singleton_alias_set function, later on Lawrence
removed other small bits of this.
E.g. the alias_set_out argument from canon_address, all of spill_alias_set
handling, alias_set field, clear_alias_set_lookup, clear_alias_mode_holder,
clear_alias_group, clear_alias_mode_table are all dead IMHO.

Jakub

[wwwdocs] [4/3] projects/cxx-status.html -- introduce global CSS for tables

2016-04-04 Thread Gerald Pfeifer

And this is the last part, adding coloring for some table headings.

Jason, I did convert to a color in line of what we have on our
home page (in the navigation bar).  If you want, we can easily
revert to the previous state of affairs, though.

Gerald


Index: gcc.css
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc.css,v
retrieving revision 1.36
diff -u -r1.36 gcc.css
--- gcc.css 2 Apr 2016 13:44:28 -   1.36
+++ gcc.css 4 Apr 2016 04:46:04 -
@@ -67,6 +67,8 @@
 /* C++ status tables. */
 table.cxxstatus th, td { border: 1px solid gray; }
 table.cxxstatus td:nth-child(3) { text-align:center; }
+table.cxxstatus tr.separator { background: #f2f2f9; }
+
 .supported   { background-color: lightgreen; }
 .unsupported { background-color: lightsalmon; }
 
Index: projects/cxx-status.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx-status.html,v
retrieving revision 1.6
diff -u -r1.6 cxx-status.html
--- projects/cxx-status.html2 Apr 2016 13:44:28 -   1.6
+++ projects/cxx-status.html4 Apr 2016 04:46:04 -
@@ -1,11 +1,6 @@
 
 
   C++ Standards Support in GCC
-
-  /*  */
-

Add some C++17 items to gcc-6/changes.html

2016-04-04 Thread Jonathan Wakely


I plan to commit this to wwwdocs CVS. Have I missed anything that
should be listed?


Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.72
diff -u -r1.72 changes.html
--- htdocs/gcc-6/changes.html	22 Mar 2016 15:41:09 -	1.72
+++ htdocs/gcc-6/changes.html	4 Apr 2016 10:20:13 -
@@ -239,6 +239,12 @@
 aggressive in dead-store elimination in situations where
 a memory store to a location precedes a constructor to the
 memory location.
+G++ now supports
+https://gcc.gnu.org/projects/cxx-status.html#cxx1z.html";>C++17
+fold expressions, u8 character literals, and
+extended static_assert.
+G++ now allows constant evaluation for all non-type template arguments.
+G++ now supports C++ Transactional Memory. 
   
 
 Runtime Library (libstdc++)

[C PATCH] PR43651: add warning for duplicate qualifier

2016-04-04 Thread Mikhail Maltsev

Hi all!

Currently GCC produces pedantic warning, if variable declaration (or
typedef) has duplicate qualifier, but only when compiling as C89 (not
C99 or C11).

The attached patch adds a new warning option to enable the same warning
in C99 and C11. It also checks whether qualifiers come from macro
expansion, e.g.:

#define CT2 const int
const CT2 x1;
CT2 const x2;

and does not warn in this case, but warns for, e.g.

void foo(const int const *x) { }
(because this probably meant to be "const int *const x")

The name for new option "-Wduplicate-decl-specifier" and wording was
chosen to match the same option in Clang.

Bootstrapped and regtested on x86_64-linux. OK for next stage 1?

-- 
Regards,
Mikhail Maltsev


gcc/c/ChangeLog:

2016-04-04  Mikhail Maltsev  

PR c/43651
* c-decl.c (declspecs_add_qual): Warn when
-Wduplicate-decl-specifier
is enabled.
* c-errors.c (pedwarn_c90): Return true if warned.
* c-tree.h: Change return type to bool.

gcc/testsuite/ChangeLog:

2016-04-04  Mikhail Maltsev  

PR c/43651
* gcc.dg/Wduplicate-decl-specifier.c: New test.


gcc/c-family/ChangeLog:

2016-04-04  Mikhail Maltsev  

PR c/43651
* c.opt (Wduplicate-decl-specifier): New option.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4f86876..1650b25 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1000,6 +1000,10 @@ Wsubobject-linkage
 C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1)
 Warn if a class type has a base or a field whose type uses the anonymous namespace or depends on a type with no linkage.
 
+Wduplicate-decl-specifier
+C ObjC Var(warn_duplicate_decl_specifier) Warning LangEnabledBy(C ObjC,Wall)
+Warn when a variable declaration has duplicate const, volatile or restrict specifier.
+
 ansi
 C ObjC C++ ObjC++
 A synonym for -std=c89 (for C) or -std=c++98 (for C++).
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 0dd2121..9902326 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9539,21 +9539,25 @@ declspecs_add_qual (source_location loc,
   gcc_assert (TREE_CODE (qual) == IDENTIFIER_NODE
 	  && C_IS_RESERVED_WORD (qual));
   i = C_RID_CODE (qual);
+  location_t prev_loc = 0;
   switch (i)
 {
 case RID_CONST:
   dupe = specs->const_p;
   specs->const_p = true;
+  prev_loc = specs->locations[cdw_const];
   specs->locations[cdw_const] = loc;
   break;
 case RID_VOLATILE:
   dupe = specs->volatile_p;
   specs->volatile_p = true;
+  prev_loc = specs->locations[cdw_volatile];
   specs->locations[cdw_volatile] = loc;
   break;
 case RID_RESTRICT:
   dupe = specs->restrict_p;
   specs->restrict_p = true;
+  prev_loc = specs->locations[cdw_restrict];
   specs->locations[cdw_restrict] = loc;
   break;
 case RID_ATOMIC:
@@ -9564,7 +9568,17 @@ declspecs_add_qual (source_location loc,
   gcc_unreachable ();
 }
   if (dupe)
-pedwarn_c90 (loc, OPT_Wpedantic, "duplicate %qE", qual);
+{
+  bool warned = pedwarn_c90 (loc, OPT_Wpedantic,
+ "duplicate %qE declaration specifier", qual);
+  if (!warned
+	  && warn_duplicate_decl_specifier
+	  && prev_loc >= RESERVED_LOCATION_COUNT
+	  && !from_macro_expansion_at (prev_loc)
+	  && !from_macro_expansion_at (loc))
+	warning_at (loc, OPT_Wduplicate_decl_specifier,
+		"duplicate %qE declaration specifier", qual);
+}
   return specs;
 }
 
diff --git a/gcc/c/c-errors.c b/gcc/c/c-errors.c
index d5e78b8..af157c0 100644
--- a/gcc/c/c-errors.c
+++ b/gcc/c/c-errors.c
@@ -71,11 +71,12 @@ pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
ISO C99 but not supported in ISO C90, thus we explicitly don't pedwarn
when C99 is specified.  (There is no flag_c90.)  */
 
-void
+bool
 pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  bool warned = false;
   rich_location richloc (line_table, location);
 
   va_start (ap, gmsgid);
@@ -92,6 +93,7 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
 			   ? DK_PEDWARN : DK_WARNING);
 	  diagnostic.option_index = opt;
 	  report_diagnostic (&diagnostic);
+	  warned = true;
 	  goto out;
 	}
 }
@@ -114,7 +116,9 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
   diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_PEDWARN);
   diagnostic.option_index = opt;
   report_diagnostic (&diagnostic);
+  warned = true;
 }
 out:
   va_end (ap);
+  return warned;
 }
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 96ab049..ac72820 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -716,7 +716,7 @@ extern void c_bind (location_t, tree, bool);
 extern bool tag_exists_p (enum tree_code, tree);
 
 /* In c-errors.c */
-extern void pedwarn_c90 (location_t, int opt, const char *, ...)
+extern bool pedwarn_c90 (location_t, int opt, const char *, ...)
 ATTRIBUTE_GCC_DIAG(3,4);
 e

Re: [Fortran, Patch, pr67538, v1] ICE with invalid source allocation

2016-04-04 Thread Andre Vehreschild

Hi Jerry,

thanks for the review. Committed as r234714.

Regards,
Andre

On Sun, 3 Apr 2016 12:23:08 -0700
Jerry DeLisle  wrote:

> On 04/03/2016 09:13 AM, Andre Vehreschild wrote:
> > Hi all,
> > 
> > the attached patch checks that for F2008-style allocate(arr1, source=s)
> > the expression in s is array valued, when arr1 has no array spec and
> > emits an error message saying:
> > 
> > Array specification or array-valued SOURCE= expression required in ALLOCATE 
> > statement at (1)
> > 
> > This fixes the ICE.
> > 
> > Bootstrapped and regtests ok on x86_64-linux-gnu/F23. Ok for trunk?
> > 
> > Regards,
> > Andre
> >   
> 
> Yes, OK, thanks for patches.
> 
> Jerry


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 234712)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,5 +1,12 @@
 2016-04-04  Andre Vehreschild  
 
+	PR fortran/67538
+	* resolve.c (resolve_allocate_expr): Emit error message when no
+	array spec and no array valued source= expression is given in an
+	F2008 allocate() for an array to allocate.
+
+2016-04-04  Andre Vehreschild  
+
 	PR fortran/65795
 	* trans-array.c (gfc_array_allocate): When the array is a coarray,
 	do not nullyfing its allocatable components in array_allocate, because
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(Revision 234712)
+++ gcc/fortran/resolve.c	(Arbeitskopie)
@@ -7217,7 +7217,15 @@
 	  if (!gfc_notify_std (GFC_STD_F2008, "Array specification required "
 			   "in ALLOCATE statement at %L", &e->where))
 	goto failure;
-	  *array_alloc_wo_spec = true;
+	  if (code->expr3->rank != 0)
+	*array_alloc_wo_spec = true;
+	  else
+	{
+	  gfc_error ("Array specification or array-valued SOURCE= "
+			 "expression required in ALLOCATE statement at %L",
+			 &e->where);
+	  goto failure;
+	}
 	}
   else
 	{
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 234712)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,5 +1,10 @@
 2016-04-04  Andre Vehreschild  
 
+	PR fortran/67538
+	* gfortran.dg/allocate_with_source_19.f08: New test.
+
+2016-04-04  Andre Vehreschild  
+
 	PR fortran/65795
 	* gfortran.dg/coarray_allocate_6.f08: New test.
 
Index: gcc/testsuite/gfortran.dg/allocate_with_source_19.f08
===
--- gcc/testsuite/gfortran.dg/allocate_with_source_19.f08	(nicht existent)
+++ gcc/testsuite/gfortran.dg/allocate_with_source_19.f08	(Arbeitskopie)
@@ -0,0 +1,22 @@
+! { dg-do compile }
+! { dg-options -std=f2008 }
+
+! Contributed by mreste...@gmail.com
+! Check that instead of an ICE the error message is emitted.
+
+module m
+ implicit none
+contains
+
+ subroutine s()
+  real, allocatable :: x(:)
+  real :: y
+
+   y = 5.0
+   ! x either needs an array spec, or y needs to be an array.
+   allocate( x , source=y ) ! { dg-error "Array specification or array-valued SOURCE= expression required in ALLOCATE statement" }
+
+ end subroutine s
+
+end module m
+

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Prathamesh Kulkarni

On 4 April 2016 at 13:56, Richard Biener  wrote:
> On Mon, 4 Apr 2016, Prathamesh Kulkarni wrote:
>
>> On 1 April 2016 at 23:02, Richard Biener  wrote:
>> > On April 1, 2016 3:48:35 PM GMT+02:00, Prathamesh Kulkarni 
>> >  wrote:
>> >>Hi,
>> >>The attached patch introduces param max-lto-partition which creates an
>> >>upper
>> >>bound for partition size.
>> >>
>> >>My primary motivation for this patch is to fix building chromium for
>> >>arm
>> >>with -flto-partition=one.
>> >>Chromium fails to build with -flto-partition={none, one} with assembler
>> >>error:
>> >>"branch out of range error"
>> >>because in both these cases LTO creates a single text section of 18 mb
>> >>which exceeds thumb's limit of 16 mb and arm backend emits a short
>> >>call if caller and callee are in same section.
>> >>This is binutils PR18625:
>> >>https://sourceware.org/bugzilla/show_bug.cgi?id=18625
>> >>With patch, chromium builds for -flto-partition=one (by creating more
>> >>than one but minimal number of partitions to honor 16 mb limit).
>> >>I haven't tested with -flto-partition=none but I suppose the build
>> >>will still fail for none, because it won't involve partitioning?  I am
>> >>not sure how to fix for none case.
>> >>
>> >>As suggested by Jim in binutils PR18625, the proper fix would be to
>> >>implement branch relaxation in arm's port of gas, however I suppose
>> >>only LTO will realistically create such large sections,
>> >>and implementing branch relaxation appears to be quite complicated and
>> >>probably too much of
>> >>an effort for this single use case, so this patch serves as a
>> >>work-around to the issue.
>> >>I am looking into fine-tuning the param value for ARM backend to
>> >>roughly match limit
>> >>of 16 mb.
>> >>
>> >>AFAIU, this would change semantics of --param n_lto_partitions (or
>> >>-flto-partition=one) from
>> >>"exactly n_lto_partitions" to "at-least n_lto_partitions". If that's
>> >>not desirable maybe we could add
>> >>another param/option ?
>> >>Cross-tested on arm*-*-*.
>> >>Would this patch be OK for stage-1 (after getting param value right
>> >>for ARM target) ?
>> >
>> > What do you want to achieve?  Changing =one semantics doesn't look right 
>> > to me.
>> > Adding a param for maximum size sounds good in general, but only to 
>> > increase the maximum number of partitions for =balanced (the default).
>>
>> Well, chromium fails to build on ARM with -flto-partition={none, one}
>> because the size of text section created with LTO,
>> exceeds the limit of 16 mb for thumb2 which results in assembler
>> errors: "branch out of range".
>> I was trying to fix that by creating minimal number of partitions such
>> that size of each partition is not greater than section size limit.
>
> Ok, but you simply shouldn't use -flto-partition={none,one} if it doesn't
> work.  Note that "partition size" and text section size do not have
> a 1:1 correspondence so a safe limit is hard to achieve anyway.
>
>> I suppose in theory the problem could also present with balanced
>> partitioning if total_size / n_lto_partitions exceeds section size
>> limit,
>> although not sure if this will be a practical case.
>
> I guess an artificial testcase can easily hit this.  Or you can
> hit this by adjusting --param lto-partitions to 1.  I think
> adding a --param lto-max-partition is missing given that we
> already have a --param lto-min-partition and the partitioning
> algorithm tries to create lto-partitions partitions (but not smaller
> than lto-min-partition) but it never creates more than lto-partitions
> partitions as there is no upper bound on individual partition size.
>
> This is also why lto-partitions has such a high default (to exploit
> parallelism - but if there is only a very small number of CPU cores
> available it doesn't make sense to split up so much for small programs).
>
> That said, lto-partitions is a hint currently but also an upper bound
> because we lack lto-max-partition.  Let's fix that instead.
Um not sure if I understood correctly.
Do we want to constrain individual partition size by adding parameter
lto-max-partition
for balanced partitioning but not for -flto-partition=one
case (since latter would also change semantics of =one) ?

Thanks,
Prathamesh
>
> Richard.
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Richard Biener

On Mon, 4 Apr 2016, Prathamesh Kulkarni wrote:

> On 4 April 2016 at 13:56, Richard Biener  wrote:
> > On Mon, 4 Apr 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 April 2016 at 23:02, Richard Biener  wrote:
> >> > On April 1, 2016 3:48:35 PM GMT+02:00, Prathamesh Kulkarni 
> >> >  wrote:
> >> >>Hi,
> >> >>The attached patch introduces param max-lto-partition which creates an
> >> >>upper
> >> >>bound for partition size.
> >> >>
> >> >>My primary motivation for this patch is to fix building chromium for
> >> >>arm
> >> >>with -flto-partition=one.
> >> >>Chromium fails to build with -flto-partition={none, one} with assembler
> >> >>error:
> >> >>"branch out of range error"
> >> >>because in both these cases LTO creates a single text section of 18 mb
> >> >>which exceeds thumb's limit of 16 mb and arm backend emits a short
> >> >>call if caller and callee are in same section.
> >> >>This is binutils PR18625:
> >> >>https://sourceware.org/bugzilla/show_bug.cgi?id=18625
> >> >>With patch, chromium builds for -flto-partition=one (by creating more
> >> >>than one but minimal number of partitions to honor 16 mb limit).
> >> >>I haven't tested with -flto-partition=none but I suppose the build
> >> >>will still fail for none, because it won't involve partitioning?  I am
> >> >>not sure how to fix for none case.
> >> >>
> >> >>As suggested by Jim in binutils PR18625, the proper fix would be to
> >> >>implement branch relaxation in arm's port of gas, however I suppose
> >> >>only LTO will realistically create such large sections,
> >> >>and implementing branch relaxation appears to be quite complicated and
> >> >>probably too much of
> >> >>an effort for this single use case, so this patch serves as a
> >> >>work-around to the issue.
> >> >>I am looking into fine-tuning the param value for ARM backend to
> >> >>roughly match limit
> >> >>of 16 mb.
> >> >>
> >> >>AFAIU, this would change semantics of --param n_lto_partitions (or
> >> >>-flto-partition=one) from
> >> >>"exactly n_lto_partitions" to "at-least n_lto_partitions". If that's
> >> >>not desirable maybe we could add
> >> >>another param/option ?
> >> >>Cross-tested on arm*-*-*.
> >> >>Would this patch be OK for stage-1 (after getting param value right
> >> >>for ARM target) ?
> >> >
> >> > What do you want to achieve?  Changing =one semantics doesn't look right 
> >> > to me.
> >> > Adding a param for maximum size sounds good in general, but only to 
> >> > increase the maximum number of partitions for =balanced (the default).
> >>
> >> Well, chromium fails to build on ARM with -flto-partition={none, one}
> >> because the size of text section created with LTO,
> >> exceeds the limit of 16 mb for thumb2 which results in assembler
> >> errors: "branch out of range".
> >> I was trying to fix that by creating minimal number of partitions such
> >> that size of each partition is not greater than section size limit.
> >
> > Ok, but you simply shouldn't use -flto-partition={none,one} if it doesn't
> > work.  Note that "partition size" and text section size do not have
> > a 1:1 correspondence so a safe limit is hard to achieve anyway.
> >
> >> I suppose in theory the problem could also present with balanced
> >> partitioning if total_size / n_lto_partitions exceeds section size
> >> limit,
> >> although not sure if this will be a practical case.
> >
> > I guess an artificial testcase can easily hit this.  Or you can
> > hit this by adjusting --param lto-partitions to 1.  I think
> > adding a --param lto-max-partition is missing given that we
> > already have a --param lto-min-partition and the partitioning
> > algorithm tries to create lto-partitions partitions (but not smaller
> > than lto-min-partition) but it never creates more than lto-partitions
> > partitions as there is no upper bound on individual partition size.
> >
> > This is also why lto-partitions has such a high default (to exploit
> > parallelism - but if there is only a very small number of CPU cores
> > available it doesn't make sense to split up so much for small programs).
> >
> > That said, lto-partitions is a hint currently but also an upper bound
> > because we lack lto-max-partition.  Let's fix that instead.
> Um not sure if I understood correctly.
> Do we want to constrain individual partition size by adding parameter
> lto-max-partition
> for balanced partitioning but not for -flto-partition=one
> case (since latter would also change semantics of =one) ?

Yes, I think so.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[Patch, testsuite] Require int32plus and scheduling support for some tests

2016-04-04 Thread Senthil Kumar Selvaraj

Hi,

  This patch add dg-require-effective-target directives to a few tests
  that were failing unnecessarily for the AVR target.

  One of them invokes the compiler with -fschedule-insns2 - I've
  required scheduling support for that testcase. For all other tests,
  I've required int32plus - they either use bit shifts wider than
  16 bits (AVR's int size), or use int constants that are too big
  for a 16 bit int.

  If ok, could someone commit please? I don't have commit access.

Regards
Senthil

2016-04-04  Senthil Kumar Selvaraj  

* gcc.c-torture/compile/pr69102.c: Require scheduling support.
* gcc.c-torture/compile/pr37669.c: Require >=32 bit integers.
* gcc.c-torture/execute/bitfld-6.c: Likewise.
* gcc.c-torture/execute/bitfld-7.c: Likewise.
* gcc.c-torture/execute/pr38151.c: Likewise.
* gcc.c-torture/execute/pr66556.c: Likewise.
* gcc.c-torture/execute/pr67781.c: Likewise.
* gcc.c-torture/execute/pr68648.c: Likewise.


diff --git gcc/testsuite/gcc.c-torture/compile/pr37669.c 
gcc/testsuite/gcc.c-torture/compile/pr37669.c
index c78243b..a2eafc7 100644
--- gcc/testsuite/gcc.c-torture/compile/pr37669.c
+++ gcc/testsuite/gcc.c-torture/compile/pr37669.c
@@ -1,5 +1,6 @@
 /* This testcase used to fail because a miscompiled execute_fold_all_builtins. 
*/
 /* { dg-options "-fgnu89-inline" } */
+/* { dg-require-effective-target int32plus } */
 
 typedef __SIZE_TYPE__ size_t;
 extern __inline __attribute__ ((__always_inline__)) int __attribute__
diff --git gcc/testsuite/gcc.c-torture/compile/pr69102.c 
gcc/testsuite/gcc.c-torture/compile/pr69102.c
index b1328ca..1f0cdc6 100644
--- gcc/testsuite/gcc.c-torture/compile/pr69102.c
+++ gcc/testsuite/gcc.c-torture/compile/pr69102.c
@@ -1,4 +1,5 @@
 /* { dg-options "-Og -fPIC -fschedule-insns2 -fselective-scheduling2 
-fno-tree-fre --param=max-sched-extend-regions-iters=10" } */
+/* { dg-require-effective-target scheduling } */
 void bar (unsigned int);
 
 void
diff --git gcc/testsuite/gcc.c-torture/execute/bitfld-6.c 
gcc/testsuite/gcc.c-torture/execute/bitfld-6.c
index 50927dc..b8c5cbd 100644
--- gcc/testsuite/gcc.c-torture/execute/bitfld-6.c
+++ gcc/testsuite/gcc.c-torture/execute/bitfld-6.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 union U
 {
   const int a;
diff --git gcc/testsuite/gcc.c-torture/execute/bitfld-7.c 
gcc/testsuite/gcc.c-torture/execute/bitfld-7.c
index e9a61df..350e7a3 100644
--- gcc/testsuite/gcc.c-torture/execute/bitfld-7.c
+++ gcc/testsuite/gcc.c-torture/execute/bitfld-7.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 union U
 {
   const int a;
diff --git gcc/testsuite/gcc.c-torture/execute/pr38151.c 
gcc/testsuite/gcc.c-torture/execute/pr38151.c
index 5ee058d..86c8f77 100644
--- gcc/testsuite/gcc.c-torture/execute/pr38151.c
+++ gcc/testsuite/gcc.c-torture/execute/pr38151.c
@@ -1,4 +1,5 @@
 /* { dg-options "-Wno-psabi" } */
+/* { dg-require-effective-target int32plus } */
 void abort (void);
 
 struct S2848
diff --git gcc/testsuite/gcc.c-torture/execute/pr66556.c 
gcc/testsuite/gcc.c-torture/execute/pr66556.c
index f7acf1c..d1259c4 100644
--- gcc/testsuite/gcc.c-torture/execute/pr66556.c
+++ gcc/testsuite/gcc.c-torture/execute/pr66556.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 
 extern void abort (void);
 
diff --git gcc/testsuite/gcc.c-torture/execute/pr67781.c 
gcc/testsuite/gcc.c-torture/execute/pr67781.c
index bf50aa2..71ccd6a 100644
--- gcc/testsuite/gcc.c-torture/execute/pr67781.c
+++ gcc/testsuite/gcc.c-torture/execute/pr67781.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 #ifdef __UINT32_TYPE__
 typedef __UINT32_TYPE__ uint32_t;
 #else
diff --git gcc/testsuite/gcc.c-torture/execute/pr68648.c 
gcc/testsuite/gcc.c-torture/execute/pr68648.c
index fc66806..db55bd0 100644
--- gcc/testsuite/gcc.c-torture/execute/pr68648.c
+++ gcc/testsuite/gcc.c-torture/execute/pr68648.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 int __attribute__ ((noinline))
 foo (void)
 {

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Jan Hubicka

> > Um not sure if I understood correctly.
> > Do we want to constrain individual partition size by adding parameter
> > lto-max-partition
> > for balanced partitioning but not for -flto-partition=one
> > case (since latter would also change semantics of =one) ?
> 
> Yes, I think so.

Yep, I agree.  Having partition one that produces multiple partitions doesn't 
seem to make much sense.
Given that we have such not so predictable target specific limits on size of 
single translation unit
we can handle, I suppose adding a resonable limit to the default balanced 
partitioning makes more sense.
One can always push the behaviour you intend by setting max partitions to 1 (I 
suppose max size should
have precedence over max partitions)

Honza
> 
> Richard.
> 
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > --
> > > Richard Biener 
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
> > > HRB 21284 (AG Nuernberg)
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

[PATCH, trivial, 1/3] Simplify loop in pp_write_text_as_dot_label_to_stream

2016-04-04 Thread Tom de Vries


Hi,

this patch simplifies the loop structure in 
pp_write_text_as_dot_label_to_stream:

- it rewrites the while loop into a for loop
- it factors common code out of the switch cases.

No functional changes.

Bootstrapped and reg-tested on x86_64.

Will commit to stage1 trunk as trivial.

Thanks,
- Tom
Simplify loop in pp_write_text_as_dot_label_to_stream

2016-04-04  Tom de Vries  

	* pretty-print.c (pp_write_text_as_dot_label_to_stream): Simplify loop
	structure.

---
 gcc/pretty-print.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index acb89e6..f6e4b43 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -159,20 +159,20 @@ pp_write_text_as_dot_label_to_stream (pretty_printer *pp, bool for_record)
   const char *p = text;
   FILE *fp = pp_buffer (pp)->stream;
 
-  while (*p)
+  for (;*p; p++)
 {
+  bool escape_char;
   switch (*p)
 	{
 	/* Print newlines as a left-aligned newline.  */
 	case '\n':
-	  fputs ("\\l\\\n", fp);
+	  fputs ("\\l", fp);
+	  escape_char = true;
 	  break;
 
 	/* A pipe is only special for record-shape nodes.  */
 	case '|':
-	  if (for_record)
-	fputc ('\\', fp);
-	  fputc (*p, fp);
+	  escape_char = for_record;
 	  break;
 
 	/* The following characters always have to be escaped
@@ -183,13 +183,18 @@ pp_write_text_as_dot_label_to_stream (pretty_printer *pp, bool for_record)
 	case '>':
 	case '"':
 	case ' ':
-	  fputc ('\\', fp);
-	  /* fall through */
+	  escape_char = true;
+	  break;
+
 	default:
-	  fputc (*p, fp);
+	  escape_char = false;
 	  break;
 	}
-  p++;
+
+  if (escape_char)
+	fputc ('\\', fp);
+
+  fputc (*p, fp);
 }
 
   pp_clear_output_area (pp);

[PATCH, trivial, 2/3] Fix record-shape escapes in pp_write_text_as_dot_label_to_stream

2016-04-04 Thread Tom de Vries


Hi,

this patch fixes the classification of the chars '{}<> ' in 
pp_write_text_as_dot_label_to_stream. They're currently marked as 
always-escape, but that's incorrect, they should be marked as 
escape-for-record.


Bootstrapped and reg-tested on x86_64.

Will commit to stage1 trunk as trivial.

Thanks,
- Tom
Fix record-shape escapes in pp_write_text_as_dot_label_to_stream

2016-04-04  Tom de Vries  

	* pretty-print.c (pp_write_text_as_dot_label_to_stream): Classify chars
	'{}<> ' as escape-for-record.

---
 gcc/pretty-print.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index f6e4b43..c3a90a7 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -170,19 +170,19 @@ pp_write_text_as_dot_label_to_stream (pretty_printer *pp, bool for_record)
 	  escape_char = true;
 	  break;
 
-	/* A pipe is only special for record-shape nodes.  */
+	/* The following characters are only special for record-shape nodes.  */
 	case '|':
+	case '{':
+	case '}':
+	case '<':
+	case '>':
+	case ' ':
 	  escape_char = for_record;
 	  break;
 
 	/* The following characters always have to be escaped
 	   for use in labels.  */
-	case '{':
-	case '}':
-	case '<':
-	case '>':
 	case '"':
-	case ' ':
 	  escape_char = true;
 	  break;

Re: [PATCH] Fix PR70457 (ICE on incompatible call to built-in pow)

2016-04-04 Thread Bill Schmidt

OK, sorry for misreading the note.  This is exactly what I've done for
the GCC 5 and GCC 4.9 versions, so I'll update the GCC 6 version to do
the same.

Bill

On Mon, 2016-04-04 at 08:57 +0200, Jakub Jelinek wrote:
> On Sun, Apr 03, 2016 at 06:43:47PM -0500, Bill Schmidt wrote:
> > --- tree-inline.c   (revision 234702)
> > +++ tree-inline.c   (working copy)
> > @@ -57,8 +57,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "cfgloop.h"
> >  #include "builtins.h"
> >  #include "tree-chkp.h"
> > +#include "case-cfn-macros.h"
> >  
> > -
> >  /* I'm not real happy about this, but we need to handle gimple and
> > non-gimple trees.  */
> >  
> 
> Please keep the extra empty line above.
> 
> > @@ -4070,11 +4070,9 @@ estimate_num_insns (gimple *stmt, eni_weights *wei
> > /* We canonicalize x * x to pow (x, 2.0) with -ffast-math, so
> >specialize the cheap expansion we do here.
> >???  This asks for a more general solution.  */
> > -   switch (DECL_FUNCTION_CODE (decl))
> > +   switch (gimple_call_combined_fn (stmt))
> >   {
> > -   case BUILT_IN_POW:
> > -   case BUILT_IN_POWF:
> > -   case BUILT_IN_POWL:
> > +   CASE_CFN_POW:
> >   if (TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
> >   && (real_equal
> >   (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
> 
> Actually, I haven't been suggesting to use gimple_call_combined_fn,
> but gimple_call_builtin_p.
> Here using gimple_call_combined_fn doesn't make much sense, because
> it is in code guarded with:
> if (gimple_call_internal_p (stmt))
>   return 0;
> else if ((decl = gimple_call_fndecl (stmt))
>  && DECL_BUILT_IN (decl))
>   {
> ...
> else if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
>   {
> Thus, internal functions don't make this spot at all.
> So, either replace the DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
> with gimple_builtin_call_p (stmt, BUILT_IN_NORMAL); or call
> gimple_builtin_call_types_compatible_p.
> 
> > Index: tree-ssa-math-opts.c
> > ===
> > --- tree-ssa-math-opts.c(revision 234702)
> > +++ tree-ssa-math-opts.c(working copy)
> > @@ -3829,11 +3829,9 @@ pass_optimize_widening_mul::execute (function *fun
> >   if (fndecl
> >   && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > {
> > - switch (DECL_FUNCTION_CODE (fndecl))
> > + switch (gimple_call_combined_fn (stmt))
> > {
> > - case BUILT_IN_POWF:
> > - case BUILT_IN_POW:
> > - case BUILT_IN_POWL:
> > + CASE_CFN_POW:
> > if (TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
> > && real_equal
> >  (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
> > 
> 
> And similarly here.
> 
>   Jakub
>

[PATCH, trivial, 3/3, PR70433] Handle backslash in pp_write_text_as_dot_label_to_stream

2016-04-04 Thread Tom de Vries


Hi,

this patch fixes PR70433.

Consider test.c:
...
#include 

void
test (void)
{
  printf ("\"%s\"",__FUNCTION__);
}
...

In the dump file we find the backslashes in the string:
...
  printf ("\"%s\"", &__FUNCTION__);
...

But the dot file contains:
...
printf\ (\"\\"%s\\"\",\ &__FUNCTION__);
...

Which translates to this string in the pdf file, without the backslashes:
...
printf (""%s"", &__FUNCTION__);
...

The patch classifies the backslash as always-escape in 
pp_write_text_as_dot_label_to_stream, and that fixes the PR.
[ Also, the patch adds an assert to prevent running into a known graphiz 
PR related to backslashes. ]


Bootstrapped and reg-tested on x86_64.

Will commit to stage1 trunk as trivial.

[ I'd add a testcase, but that is blocked on the 
lib/scandump.exp/dump-suffix bit of 
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01077.html which is 
awaiting review. ]


Thanks,
- Tom
Handle backslash in pp_write_text_as_dot_label_to_stream

2016-04-04  Tom de Vries  

	PR other/70433
	* pretty-print.c (pp_write_text_as_dot_label_to_stream): Escape
	backslash in label.

---
 gcc/pretty-print.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index c3a90a7..8ac3d34 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -182,6 +182,12 @@ pp_write_text_as_dot_label_to_stream (pretty_printer *pp, bool for_record)
 
 	/* The following characters always have to be escaped
 	   for use in labels.  */
+	case '\\':
+	  /* There is a bug in some (f.i. 2.36.0) versions of graphiz
+	 ( http://www.graphviz.org/mantisbt/view.php?id=2524 ) related to
+	 backslash as last char in label.  Let's avoid triggering it.  */
+	  gcc_assert (*(p + 1) != '\0');
+	  /* Fall through.  */
 	case '"':
 	  escape_char = true;
 	  break;

Re: [PATCH][Testsuite] Add --param sra-max-scalarization-size-Ospeed to sra-12.c

2016-04-04 Thread Andre Vieira (lists)

On 18/03/16 10:34, Andre Vieira (lists) wrote:
> On 21/10/15 16:59, Jeff Law wrote:
>> On 10/21/2015 09:52 AM, Alan Lawrence wrote:
>>> gcc.dg/tree-ssa/sra-12.c is skipped on a bunch of targets, including
>>> AArch64,
>>> because the default max-scalarization-size depends on MOVE_RATIO, and
>>> on those
>>> targets thus ends up being too small for SRA to optimize the testcase.
>>> Recently
>>> I noticed that the test has been failing for some time on ARM too.
>>> This patch
>>> fixes the test on ARM, AArch64, avr, and sh, and by extension I
>>> believe also on
>>> nds32, although I haven't managed to build a nds32 compiler to check.
>>>
>>> There is an argument that instead we should skip the test on ARM too;
>>> or rather,
>>> since at least ARM and AArch64 would like the test to pass, we should
>>> xfail it
>>> on those platforms until we have time to experiment with the
>>> threshold/param for
>>> SRA. I hope to do some more investigation on that front as part of (or
>>> followup
>>> to) PR/63679.
>>>
>>> Is this OK for trunk?
>>>
>>> Cheers,
>>> Alan
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.dg/tree-ssa/sra-12.c: Enable test on all targets; add --param
>>> sra-max-scalarization-size-Ospeed.
>> OK.
>> jeff
>>
> 
> OK to backport this to gcc-5-branch?
> 
> Cheers,
> Andre
> 
Ping.

[PATCH] gnattools: Clean config.cache (PR70173)

2016-04-04 Thread Segher Boessenkool

The config.cache file should be deleted by "make distclean", just like
config.log and config.status .  The directory itself is still not deleted
(just like the gotools and libcc1 directories).

Tested on powerpc64-linux, --enable-languages=all,ada,go,obj-c++ ,
followed by "make distclean".  Is this okay for trunk?


Segher


2016-04-04  Segher Boessenkool  

gnattools/
PR bootstrap/70173
* Makefile.in (distclean): Also delete config.cache .

---
 gnattools/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in
index 0c889ee..f949ca9 100644
--- a/gnattools/Makefile.in
+++ b/gnattools/Makefile.in
@@ -309,7 +309,7 @@ mostlyclean:
 clean:
 
 distclean:
-   $(RM) Makefile config.status config.log
+   $(RM) Makefile config.status config.log config.cache
 
 maintainer-clean:
 
-- 
1.9.3

[PATCH] libcc1: Clean compiler-name.h (PR70173)

2016-04-04 Thread Segher Boessenkool

Since the file is generated from a Makefile fragment, it needs to be
added to MOSTLYCLEANFILES.  The directory itself is still not deleted
(just like the gnattools and gotools directories).

Tested on powerpc64-linux, --enable-languages=all,ada,go,obj-c++ ,
followed by "make distclean".  Is this okay for trunk?


Segher


2016-04-04  Segher Boessenkool  

libcc1/
PR bootstrap/70173
* Makefile.am (MOSTLYCLEANFILES): New, add compiler-name.h .
(compiler-name.h): Shorten recipe so that it fits the line.
* Makefile.in: Regenerate.

---
 libcc1/Makefile.am | 3 ++-
 libcc1/Makefile.in | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libcc1/Makefile.am b/libcc1/Makefile.am
index 7a274b3..b40820b 100644
--- a/libcc1/Makefile.am
+++ b/libcc1/Makefile.am
@@ -44,11 +44,12 @@ cc1lib_LTLIBRARIES = libcc1.la
 endif
 
 BUILT_SOURCES = compiler-name.h
+MOSTLYCLEANFILES = compiler-name.h
 
 # Put this in a header so we don't run sed for each compilation.  This
 # is also simpler to debug as one can easily see the constant.
 compiler-name.h: Makefile
-   echo "#define COMPILER_NAME \"`echo gcc | sed '$(transform)'`\"" > 
compiler-name.h
+   echo "#define COMPILER_NAME \"`echo gcc | sed '$(transform)'`\"" > $@
 
 
 shared_source = callbacks.cc callbacks.hh connection.cc connection.hh \
diff --git a/libcc1/Makefile.in b/libcc1/Makefile.in
index 9e00368..79d39d3 100644
--- a/libcc1/Makefile.in
+++ b/libcc1/Makefile.in
@@ -296,6 +296,7 @@ cc1libdir = $(libdir)/$(libsuffix)
 @ENABLE_PLUGIN_TRUE@plugin_LTLIBRARIES = libcc1plugin.la
 @ENABLE_PLUGIN_TRUE@cc1lib_LTLIBRARIES = libcc1.la
 BUILT_SOURCES = compiler-name.h
+MOSTLYCLEANFILES = compiler-name.h
 shared_source = callbacks.cc callbacks.hh connection.cc connection.hh \
 marshall.cc marshall.hh rpc.hh status.hh
 
@@ -563,6 +564,7 @@ install-strip:
"INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'" install; \
fi
 mostlyclean-generic:
+   -test -z "$(MOSTLYCLEANFILES)" || rm -f $(MOSTLYCLEANFILES)
 
 clean-generic:
 
@@ -672,7 +674,7 @@ override LDFLAGS := $(filter-out 
-fsanitize=address,$(LDFLAGS))
 # Put this in a header so we don't run sed for each compilation.  This
 # is also simpler to debug as one can easily see the constant.
 compiler-name.h: Makefile
-   echo "#define COMPILER_NAME \"`echo gcc | sed '$(transform)'`\"" > 
compiler-name.h
+   echo "#define COMPILER_NAME \"`echo gcc | sed '$(transform)'`\"" > $@
 
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
-- 
1.9.3

[PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Richard Biener


r141430 as a fix for PR37870 added a memory fallback to 
extract_bit_field_1 and another case punning through a larger integer
mode (as suggested by Ian).  That path is broken as it happily creates

(set (subreg:XF (reg:TI ..)) (...))

on i?86 when we optimize the added testcase

unsigned int
bar (long double x)
{
  union { struct { char a[8]; unsigned int b:7; } c; long double d; } u;
  u.d = x;
  return u.c.b;
}

on GIMPLE to

bar (long double x)
{
  unsigned int _3;
   _4;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _4 = BIT_FIELD_REF ;
  _3 = (unsigned int) _4;
  return _3;


Bootstrapped and tested on x86_64-unknown-linux-gnu (also with -m32)
and the code replaced with a gcc_unreachable ().

I don't see how a subreg can be ever valid as it will be necessarily
converting to a non-integer mode of different size (otherwise 
int_mode_for_mode would have returned non-BLKmode).  I can't think
of a condition that would need to be satisfied where the subreg
would be valid either.

Ok for GCC 7?  (it'll be done as part of said GIMPLE optimization).

The testcase gcc.target/i386/pr37870.c will already ICE with that
patch, so no additional testcase.

Thanks,
Richard.

2016-04-04  Richard Biener  

PR middle-end/37870
* expmed.c (extract_bit_field_1): Remove broken case
using a wider MODE_INT mode.

Index: gcc/expmed.c
===
--- gcc/expmed.c(revision 234708)
+++ gcc/expmed.c(working copy)
@@ -1647,17 +1647,6 @@ extract_bit_field_1 (rtx str_rtx, unsign
if (GET_CODE (op0) == SUBREG)
  op0 = force_reg (imode, op0);
  }
-   else if (REG_P (op0))
- {
-   rtx reg, subreg;
-   imode = smallest_mode_for_size (GET_MODE_BITSIZE (GET_MODE (op0)),
-   MODE_INT);
-   reg = gen_reg_rtx (imode);
-   subreg = gen_lowpart_SUBREG (GET_MODE (op0), reg);
-   emit_move_insn (subreg, op0);
-   op0 = reg;
-   bitnum += SUBREG_BYTE (subreg) * BITS_PER_UNIT;
- }
else
  {
HOST_WIDE_INT size = GET_MODE_SIZE (GET_MODE (op0));

Re: [PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 02:56:51PM +0200, Richard Biener wrote:
> The testcase gcc.target/i386/pr37870.c will already ICE with that
> patch, so no additional testcase.

In theory you could validate_subreg first and use that code if validation
went ok, otherwise go through memory.
But I admit I don't have anything in particular in mind where it would
trigger this code and the subreg would successfully validate.

> 2016-04-04  Richard Biener  
> 
>   PR middle-end/37870
>   * expmed.c (extract_bit_field_1): Remove broken case
>   using a wider MODE_INT mode.
> 
> Index: gcc/expmed.c
> ===
> --- gcc/expmed.c  (revision 234708)
> +++ gcc/expmed.c  (working copy)
> @@ -1647,17 +1647,6 @@ extract_bit_field_1 (rtx str_rtx, unsign
>   if (GET_CODE (op0) == SUBREG)
> op0 = force_reg (imode, op0);
> }
> - else if (REG_P (op0))
> -   {
> - rtx reg, subreg;
> - imode = smallest_mode_for_size (GET_MODE_BITSIZE (GET_MODE (op0)),
> - MODE_INT);
> - reg = gen_reg_rtx (imode);
> - subreg = gen_lowpart_SUBREG (GET_MODE (op0), reg);
> - emit_move_insn (subreg, op0);
> - op0 = reg;
> - bitnum += SUBREG_BYTE (subreg) * BITS_PER_UNIT;
> -   }
>   else
> {
>   HOST_WIDE_INT size = GET_MODE_SIZE (GET_MODE (op0));

Jakub

Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking DR against its innermost loop bahavior if possible

2016-04-04 Thread Richard Biener

On Thu, Mar 31, 2016 at 6:43 PM, Bin.Cheng  wrote:
> On Tue, Mar 29, 2016 at 9:37 AM, Richard Biener
>  wrote:
>> On Mon, Mar 28, 2016 at 9:57 PM, Bin.Cheng  wrote:
>>> Sorry, Should have replied to gcc-patches list.
>>>
>>> Thanks,
>>> bin
>>>
>>> -- Forwarded message --
>>> From: "Bin.Cheng" 
>>> Date: Tue, 29 Mar 2016 03:55:04 +0800
>>> Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking
>>> DR against its innermost loop bahavior if possible
>>> To: Richard Biener 
>>>
>>> On 3/17/16, Richard Biener  wrote:
 On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng  wrote:
> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
>  wrote:
>>
>> Hmm.
> Hi,
> Thanks for reviewing.
>>
>> +  equal_p = true;
>> +  if (e1->base_address && e2->base_address)
>> +equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
>> +  if (e1->offset && e2->offset)
>> +equal_p &= operand_equal_p (e1->offset, e2->offset, 0);
>>
>> surely better to return false early.
>>
>> I think we don't want this in tree-data-refs.h also because of ...
>>
>> @@ -615,15 +619,29 @@
>> hash_memrefs_baserefs_and_store_DRs_read_written_info
>> (data_reference_p a)
>>data_reference_p *master_dr, *base_master_dr;and REALPART) before
>> creating the DR (or adjust the equality function
> and hashing
>>tree ref = DR_REF (a);
>>tree base_ref = DR_BASE_OBJECT (a);
>> +  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
>>tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
>>bool exist1, exist2;
>>
>> -  while (TREE_CODE (ref) == COMPONENT_REF
>> -|| TREE_CODE (ref) == IMAGPART_EXPR
>> -|| TREE_CODE (ref) == REALPART_EXPR)
>> -ref = TREE_OPERAND (ref, 0);
>> +  /* If reference in DR has innermost loop behavior and it is not
>> + a compound memory reference, we store it to innermost_DR_map,
>> + otherwise to ref_DR_map.  */
>> +  if (TREE_CODE (ref) == COMPONENT_REF
>> +  || TREE_CODE (ref) == IMAGPART_EXPR
>> +  || TREE_CODE (ref) == REALPART_EXPR
>> +  || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
>> +  || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
>> +{
>> +  while (TREE_CODE (ref) == COMPONENT_REF
>> +|| TREE_CODE (ref) == IMAGPART_EXPR
>> +|| TREE_CODE (ref) == REALPART_EXPR)
>> +   ref = TREE_OPERAND (ref, 0);
>> +
>> +  master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
>> +}
>> +  else
>> +master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);
>>
>> we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd
>> need to
>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
>> and REALPART) before creating the DR (or adjust the equality function
>> and hashing
>> to disregard them which means subtracting their offset from DR_INIT.
> I am not sure if I understand correctly.  But for component reference,
> it is the base object that we want to record/track.  For example,
>
>   for (i = 0; i < N; i++) {
> m = *data++;
>
> m1 = p1->x - m;
> m2 = p2->x + m;
>
> p3->y = (m1 >= m2) ? p1->y : p2->y;
>
> p1++;
> p2++;
> p3++;
>   }
> We want to infer that reads of p1/p2 in condition statement won't trap
> because there are unconditional reads of the structures, though the
> unconditional reads are actual of other sub-objects.  Here it is the
> invariant part of address that we want to track.

 Well, the variant parts - we want to strip invariant parts as far as we can
 (offsetof (x) and offsetof (y))

> Also illustrated by this example, we can't rely on data-ref analyzer
> here.  Because in gathering/scattering cases, the address could be not
> affine at all.

 Sure, but that's a different issue.

>>
>> To adjust the references we collect you'd maybe could use a callback
>> to get_references_in_stmt
>> to adjust them.
>>
>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple
>> as
> Is this a part of the method you suggested above, or is it an
> alternative one?  If it's the latter, then I have below questions
> embedded.

 It is an alternative to adding a hook to get_references_in_stmt and
 probably "easier".

>>
>> Index: tree-if-conv.c
>> ===
>> --- tree-if-conv.c  (revision 234215)
>> +++ tree-if-conv.c  (working copy)
>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>>
>>for (i = 0; refs->iterate (i, &dr); i++)
>>  {
>> +  tree *refp = &DR_REF (dr);
>> +

Re: [PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Richard Biener

On Mon, 4 Apr 2016, Jakub Jelinek wrote:

> On Mon, Apr 04, 2016 at 02:56:51PM +0200, Richard Biener wrote:
> > The testcase gcc.target/i386/pr37870.c will already ICE with that
> > patch, so no additional testcase.
> 
> In theory you could validate_subreg first and use that code if validation
> went ok, otherwise go through memory.
> But I admit I don't have anything in particular in mind where it would
> trigger this code and the subreg would successfully validate.

Not sure if it would help as that has

  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
store_bit_field
 is the culprit here, and not the backends.  */
  else if (osize >= UNITS_PER_WORD && isize >= osize)
;

and thus we'd return true anyway for (subreg:XF (reg:TI) 0)

Richard.

> > 2016-04-04  Richard Biener  
> > 
> > PR middle-end/37870
> > * expmed.c (extract_bit_field_1): Remove broken case
> > using a wider MODE_INT mode.
> > 
> > Index: gcc/expmed.c
> > ===
> > --- gcc/expmed.c(revision 234708)
> > +++ gcc/expmed.c(working copy)
> > @@ -1647,17 +1647,6 @@ extract_bit_field_1 (rtx str_rtx, unsign
> > if (GET_CODE (op0) == SUBREG)
> >   op0 = force_reg (imode, op0);
> >   }
> > -   else if (REG_P (op0))
> > - {
> > -   rtx reg, subreg;
> > -   imode = smallest_mode_for_size (GET_MODE_BITSIZE (GET_MODE (op0)),
> > -   MODE_INT);
> > -   reg = gen_reg_rtx (imode);
> > -   subreg = gen_lowpart_SUBREG (GET_MODE (op0), reg);
> > -   emit_move_insn (subreg, op0);
> > -   op0 = reg;
> > -   bitnum += SUBREG_BYTE (subreg) * BITS_PER_UNIT;
> > - }
> > else
> >   {
> > HOST_WIDE_INT size = GET_MODE_SIZE (GET_MODE (op0));
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: Fix for PR70498 in Libiberty Demangler

2016-04-04 Thread Bernd Schmidt


On 04/02/2016 11:49 AM, Marcel Böhme wrote:


Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases
added + checked PR70498 is resolved.


Richard - any objections from an RM perspective to check in such
potentially security-related fixes now even if not regressions?


The patch now also accounts for overflows in d_compact_number which
is supposed to return -1 in case of negative numbers.


I take it this isn't for the normal 'n' case, but for instances where we 
encounter overflows in d_number?



Bernd

Re: [PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 03:10:34PM +0200, Richard Biener wrote:
> On Mon, 4 Apr 2016, Jakub Jelinek wrote:
> 
> > On Mon, Apr 04, 2016 at 02:56:51PM +0200, Richard Biener wrote:
> > > The testcase gcc.target/i386/pr37870.c will already ICE with that
> > > patch, so no additional testcase.
> > 
> > In theory you could validate_subreg first and use that code if validation
> > went ok, otherwise go through memory.
> > But I admit I don't have anything in particular in mind where it would
> > trigger this code and the subreg would successfully validate.
> 
> Not sure if it would help as that has
> 
>   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> store_bit_field
>  is the culprit here, and not the backends.  */
>   else if (osize >= UNITS_PER_WORD && isize >= osize)
> ;
> 
> and thus we'd return true anyway for (subreg:XF (reg:TI) 0)

If XFmode subreg of TImode reg passes validation, where does it ICE then?

Jakub

Re: [PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Richard Biener

On Mon, 4 Apr 2016, Jakub Jelinek wrote:

> On Mon, Apr 04, 2016 at 03:10:34PM +0200, Richard Biener wrote:
> > On Mon, 4 Apr 2016, Jakub Jelinek wrote:
> > 
> > > On Mon, Apr 04, 2016 at 02:56:51PM +0200, Richard Biener wrote:
> > > > The testcase gcc.target/i386/pr37870.c will already ICE with that
> > > > patch, so no additional testcase.
> > > 
> > > In theory you could validate_subreg first and use that code if validation
> > > went ok, otherwise go through memory.
> > > But I admit I don't have anything in particular in mind where it would
> > > trigger this code and the subreg would successfully validate.
> > 
> > Not sure if it would help as that has
> > 
> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > store_bit_field
> >  is the culprit here, and not the backends.  */
> >   else if (osize >= UNITS_PER_WORD && isize >= osize)
> > ;
> > 
> > and thus we'd return true anyway for (subreg:XF (reg:TI) 0)
> 
> If XFmode subreg of TImode reg passes validation, where does it ICE then?

It ICEs in

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.target/i386/pr37870.c:19:1: 
internal compiler error: in subreg_get_info, at rtlanal.c:3695
0xddee5a subreg_get_info(unsigned int, machine_mode, unsigned int, 
machine_mode, subreg_info*)
/space/rguenther/src/svn/trunk/gcc/rtlanal.c:3695
0xddf12b simplify_subreg_regno(unsigned int, machine_mode, unsigned int, 
machine_mode)
/space/rguenther/src/svn/trunk/gcc/rtlanal.c:3808
0xd8bc7a simplifiable_subregs(subreg_shape const&)
/space/rguenther/src/svn/trunk/gcc/reginfo.c:1234
0xd8be1d record_subregs_of_mode
/space/rguenther/src/svn/trunk/gcc/reginfo.c:1294
0xd8c246 init_subregs_of_mode()
/space/rguenther/src/svn/trunk/gcc/reginfo.c:1348
0xc1a55c init_costs
/space/rguenther/src/svn/trunk/gcc/ira-costs.c:2187

which is

  /* This should always pass, otherwise we don't know how to verify
 the constraint.  These conditions may be relaxed but
 subreg_regno_offset would need to be redesigned.  */
  gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);

Richard.

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Prathamesh Kulkarni

On 4 April 2016 at 17:30, Jan Hubicka  wrote:
>> > Um not sure if I understood correctly.
>> > Do we want to constrain individual partition size by adding parameter
>> > lto-max-partition
>> > for balanced partitioning but not for -flto-partition=one
>> > case (since latter would also change semantics of =one) ?
>>
>> Yes, I think so.
>
> Yep, I agree.  Having partition one that produces multiple partitions doesn't 
> seem to make much sense.
> Given that we have such not so predictable target specific limits on size of 
> single translation unit
> we can handle, I suppose adding a resonable limit to the default balanced 
> partitioning makes more sense.
> One can always push the behaviour you intend by setting max partitions to 1 
> (I suppose max size should
> have precedence over max partitions)
Thanks for the suggestions, I have updated the patch.
Is it OK if it passes bootstrap+test ?

Thanks,
Prathamesh
>
> Honza
>>
>> Richard.
>>
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Richard.
>> > >
>> > > --
>> > > Richard Biener 
>> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
>> > > HRB 21284 (AG Nuernberg)
>> >
>> >
>>
>> --
>> Richard Biener 
>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
>> 21284 (AG Nuernberg)
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 9eb63c2..bc0c612 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -511,9 +511,20 @@ lto_balanced_map (int n_lto_partitions)
   varpool_order.qsort (varpool_node_cmp);
 
   /* Compute partition size and create the first partition.  */
+  if (PARAM_VALUE (MIN_PARTITION_SIZE) > PARAM_VALUE (MAX_PARTITION_SIZE))
+fatal_error (input_location, "min partition size cannot be greater than 
max partition size");
+
   partition_size = total_size / n_lto_partitions;
   if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
 partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
+  else if (partition_size > PARAM_VALUE (MAX_PARTITION_SIZE))
+{
+  n_lto_partitions = total_size / PARAM_VALUE (MAX_PARTITION_SIZE);
+  if (total_size % PARAM_VALUE (MAX_PARTITION_SIZE))
+   n_lto_partitions++;
+  partition_size = total_size / n_lto_partitions;
+}
+
   npartitions = 1;
   partition = new_partition ("");
   if (symtab->dump_file)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9dd513f..294b8a4 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3112,6 +3112,12 @@ do_whole_program_analysis (void)
   timevar_pop (TV_WHOPR_WPA);
 
   timevar_push (TV_WHOPR_PARTITIONING);
+
+  if (flag_lto_partition != LTO_PARTITION_BALANCED
+  && PARAM_VALUE (MAX_PARTITION_SIZE) != INT_MAX)
+fatal_error (input_location, "--param max-lto-partition should only"
+" be used with balanced partitioning\n");
+
   if (flag_lto_partition == LTO_PARTITION_1TO1)
 lto_1_to_1_map ();
   else if (flag_lto_partition == LTO_PARTITION_MAX)
diff --git a/gcc/params.def b/gcc/params.def
index 9362c15..b6055ff 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1029,6 +1029,11 @@ DEFPARAM (MIN_PARTITION_SIZE,
  "Minimal size of a partition for LTO (in estimated instructions).",
  1000, 0, 0)
 
+DEFPARAM (MAX_PARTITION_SIZE,
+ "lto-max-partition",
+ "Maximal size of a partition for LTO (in estimated instructions).",
+ INT_MAX, 0, INT_MAX)
+
 /* Diagnostic parameters.  */
 
 DEFPARAM (CXX_MAX_NAMESPACES_FOR_DIAGNOSTIC_HELP,


ChangeLog
Description: Binary data

Re: [PATCH][GCC 7] Remove broken path in extract_bit_field_1

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 03:27:23PM +0200, Richard Biener wrote:
> It ICEs in
> 
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.target/i386/pr37870.c:19:1: 
> internal compiler error: in subreg_get_info, at rtlanal.c:3695
> 0xddee5a subreg_get_info(unsigned int, machine_mode, unsigned int, 
> machine_mode, subreg_info*)
> /space/rguenther/src/svn/trunk/gcc/rtlanal.c:3695
> 0xddf12b simplify_subreg_regno(unsigned int, machine_mode, unsigned int, 
> machine_mode)
> /space/rguenther/src/svn/trunk/gcc/rtlanal.c:3808
> 0xd8bc7a simplifiable_subregs(subreg_shape const&)
> /space/rguenther/src/svn/trunk/gcc/reginfo.c:1234
> 0xd8be1d record_subregs_of_mode
> /space/rguenther/src/svn/trunk/gcc/reginfo.c:1294
> 0xd8c246 init_subregs_of_mode()
> /space/rguenther/src/svn/trunk/gcc/reginfo.c:1348
> 0xc1a55c init_costs
> /space/rguenther/src/svn/trunk/gcc/ira-costs.c:2187
> 
> which is
> 
>   /* This should always pass, otherwise we don't know how to verify
>  the constraint.  These conditions may be relaxed but
>  subreg_regno_offset would need to be redesigned.  */
>   gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);

So perhaps either validate_subreg should check the same thing, or
the extraction could check this.
Anyway, I don't have anything against removing that hunk from
extract_bit_field_1 for GCC7.

Jakub

Re: [patch] Fix PR target/67172

2016-04-04 Thread Bernd Schmidt


On 04/04/2016 09:39 AM, Eric Botcazou wrote:

The audit trail of the PR is quite confused: __LIBGCC_EH_FRAME_SECTION_NAME__
needs to be (and is correctly) defined during the libgcc build because DWARF2
exceptions are used.  The problem is that libgcc2.c expects the usual non-ELF
setup, with __EH_FRAME_BEGIN__ defined in crtstuff.c.  Now t-cygwin has:


That's t-cygming, not t-cygwin, right?


This code is already entirely skipped for Cygwin because of:

#ifndef __CYGWIN__

so the attached patch adds a similar test on MinGW for the problematic code.

Tested on x86/Windows configured with --disable-sjlj-exceptions --with-dwarf2,
OK for the mainline and 5 branch?


It looks plausible. One could argue that gcc shouldn't define 
__LIBGCC_EH_FRAME_SECTION_NAME__ in this case, but since it's used in 
cygming-crtbegin.c, that seems to gain relatively little.


So, I think this is ok.


Bernd

Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-04 Thread Jan Hubicka


> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 9eb63c2..bc0c612 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -511,9 +511,20 @@ lto_balanced_map (int n_lto_partitions)
>varpool_order.qsort (varpool_node_cmp);
>  
>/* Compute partition size and create the first partition.  */
> +  if (PARAM_VALUE (MIN_PARTITION_SIZE) > PARAM_VALUE (MAX_PARTITION_SIZE))
> +fatal_error (input_location, "min partition size cannot be greater than 
> max partition size");
> +
>partition_size = total_size / n_lto_partitions;
>if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
>  partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
> +  else if (partition_size > PARAM_VALUE (MAX_PARTITION_SIZE))
> +{
> +  n_lto_partitions = total_size / PARAM_VALUE (MAX_PARTITION_SIZE);
> +  if (total_size % PARAM_VALUE (MAX_PARTITION_SIZE))
> + n_lto_partitions++;
> +  partition_size = total_size / n_lto_partitions;
> +}

lto_balanced_map actually works in a way that looks for cheapest cutpoint in 
range
3/4*parittion_size to 2*partition_size and picks the cheapest range.
Setting partition_size to this value will thus not cause partitioner to produce 
smaller
partitions only.  I suppose modify the conditional:

  /* Partition is too large, unwind into step when best cost was reached and
 start new partition.  */   
  if (partition->insns > 2 * partition_size)

and/or in the code above set the partition_size to half of total_size/max_size.

I know this is somewhat sloppy.  This was really just first cut implementation
many years ago. I expected to reimplement it marter soon, but then there was
never really a need for it (I am trying to avoid late IPA optimizations so the
partitioning decisions should mostly affect compile time performance only).
If ARM is more sensitive for partitining, perhaps it would make sense to try to
look for something smarter.

> +
>npartitions = 1;
>partition = new_partition ("");
>if (symtab->dump_file)
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 9dd513f..294b8a4 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -3112,6 +3112,12 @@ do_whole_program_analysis (void)
>timevar_pop (TV_WHOPR_WPA);
>  
>timevar_push (TV_WHOPR_PARTITIONING);
> +
> +  if (flag_lto_partition != LTO_PARTITION_BALANCED
> +  && PARAM_VALUE (MAX_PARTITION_SIZE) != INT_MAX)
> +fatal_error (input_location, "--param max-lto-partition should only"
> +  " be used with balanced partitioning\n");
> +

I think we should wire in resonable MAX_PARTITION_SIZE default.  THe value you
found experimentally may be a good start. For that reason we can't really
refuse a value when !LTO_PARTITION_BALANCED.  Just document it as parameter for
balanced partitioning only and add a parameter to lto_balanced_map specifying 
whether
this param should be honored (because the same path is used for partitioning to 
one partition)

Otherwise the patch looks good to me modulo missing documentation.

Honza

Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking DR against its innermost loop bahavior if possible

2016-04-04 Thread Bin.Cheng

On Mon, Apr 4, 2016 at 2:07 PM, Richard Biener
 wrote:
> On Thu, Mar 31, 2016 at 6:43 PM, Bin.Cheng  wrote:
>> On Tue, Mar 29, 2016 at 9:37 AM, Richard Biener
>>  wrote:
>>> On Mon, Mar 28, 2016 at 9:57 PM, Bin.Cheng  wrote:
 Sorry, Should have replied to gcc-patches list.

 Thanks,
 bin

 -- Forwarded message --
 From: "Bin.Cheng" 
 Date: Tue, 29 Mar 2016 03:55:04 +0800
 Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking
 DR against its innermost loop bahavior if possible
 To: Richard Biener 

 On 3/17/16, Richard Biener  wrote:
> On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng  wrote:
>> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
>>  wrote:
>
> It is an alternative to adding a hook to get_references_in_stmt and
> probably "easier".
>
>>>
>>> Index: tree-if-conv.c
>>> ===
>>> --- tree-if-conv.c  (revision 234215)
>>> +++ tree-if-conv.c  (working copy)
>>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>>>
>>>for (i = 0; refs->iterate (i, &dr); i++)
>>>  {
>>> +  tree *refp = &DR_REF (dr);
>>> +  while ((TREE_CODE (*refp) == COMPONENT_REF
>>> + && TREE_OPERAND (*refp, 2) == NULL_TREE)
>>> +|| TREE_CODE (*refp) == IMAGPART_EXPR
>>> +|| TREE_CODE (*refp) == REALPART_EXPR)
>>> +   refp = &TREE_OPERAND (*refp, 0);
>>> +  if (refp != &DR_REF (dr))
>>> +   {
>>> + tree saved_base = *refp;
>>> + *refp = integer_zero_node;
>>> +
>>> + if (DR_INIT (dr))
>>> +   {
>>> + tree poffset;
>>> + int punsignedp, preversep, pvolatilep;
>>> + machine_mode pmode;
>>> + HOST_WIDE_INT pbitsize, pbitpos;
>>> + get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos,
>>> &poffset,
>>> +  &pmode, &punsignedp, &preversep,
>>> &pvolatilep,
>>> +  false);
>>> + gcc_assert (poffset == NULL_TREE);
>>> +
>>> + DR_INIT (dr)
>>> +   = wide_int_to_tree (ssizetype,
>>> +   wi::sub (DR_INIT (dr),
>>> +pbitpos / BITS_PER_UNIT));
>>> +   }
>>> +
>>> + *refp = saved_base;
>>> + DR_REF (dr) = *refp;
>>> +   }
>> Looks to me the code is trying to resolve difference between two (or
>> more) component references, which is DR_INIT in the code.  But DR_INIT
>> is not the only thing needs to be handled.  For a structure containing
>> two sub-arrays, DR_OFFSET may be different too.
>
> Yes, but we can't say that if
>
>   a->a[i]
>
> doesn't trap that then
>
>   a->b[i]
>
> doesn't trap either.  We can only "strip" outermost
> non-variable-offset components.
>
> But maybe I'm missing what example you are thinking of.
 Hmm, this was the case I meant.  What I don't understand is current
 code logic does infer trap information for a.b[i] from a.a[i].  Given
 below example:
 struct str
 {
   int a[10];
   int b[20];
   char c;
 };

 void bar (struct str *);
 int foo (int x, int n)
 {
   int i;
   struct str s;
   bar (&s);
   for (i = 0; i < n; i++)
 {
   s.a[i] = s.b[i];
   if (x > i)
 s.b[i] = 0;
 }
   bar (&s);
   return 0;
 }
 The loop is convertible because of below code in function
 ifcvt_memrefs_wont_trap:

   /* If a is unconditionally accessed then ... */
   if (DR_RW_UNCONDITIONALLY (*master_dr))
 {
   /* an unconditional read won't trap.  */
   if (DR_IS_READ (a))
 return true;

   /* an unconditionaly write won't trap if the base is written
  to unconditionally.  */
   if (base_master_dr
   && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
 return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
   else
 {
   /* or the base is know to be not readonly.  */
   tree base_tree = get_base_address (DR_REF (a));
   if (DECL_P (base_tree)
   && decl_binds_to_current_def_p (base_tree)
   && ! TREE_READONLY (base_tree))
 return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
 }
 }
 It is the main object '&s' that is recorded in base_master_dr, so
 s.b[i] is considered not trap.  Even it's not, I suppose
 get_base_address will give same result?
>>>
>>> Well, for this case it sees that s.b[i] is read from so it can't be

Re: [PATCH] Add security_sensitive attribute to clean function stack and regs.

2016-04-04 Thread Marcos Díaz

On Mon, Mar 28, 2016 at 3:05 PM, Marcos Díaz
 wrote:
> (Ping and changelog fix)
>
> On Tue, Mar 22, 2016 at 11:15 AM, Marcos Díaz
>  wrote:
>> Hi,
>>the attached patch adds a new attribute 'security_sensitive' for 
>> functions.
>> The idea was discussed in PR middle-end/69976.
>> This attribute makes gcc to emit clean up code at the function's epilogue.
>> This clean-up code cleans the stack used by this function and that isn't
>> needed anymore. It also cleans used registers. It only works in x86_64.
>> Please, review the discussion here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69976
>> since we had some doubts with the implementation.
>>
>> We also added some test-cases and ran all tests in x86_64.
>> We think this isn't a bug-fix but a new feature.
>>
>> Changelog
>> 2016-03-21  Marcos Diaz  
>>Andres Tiraboschi  
>>
>> PR tree-optimization/69820
> This line should've been  PR middle-end/69976
>> * config/i386/i386-protos.h: Add ix86_clear_regs_emit and
>> ix86_sec_sensitive_attr_p
>> * config/i386/i386.c: (ix86_sec_sensitive_attr_p): New function
>> (ix86_using_red_zone): now take into account if the function has the new
>> attribute.
>> (is_preserved_reg): New function.
>> (is_integer_reg): New function.
>> (is_used_as_ret): New function.
>> (reg_to_string): New function.
>> (clear_reg_emit): New function.
>> (ix86_clear_regs_emit): New function.
>> (ix86_expand_epilogue): Added code to emit clean up code only when
>> security_sensitive attribute is set.
>> (ix86_handle_security_sensitive_attribute): New function.
>> (ix86_attribute_table): Added new attribute.
>> * config/i386/i386.md: (UNSPECV_CLRSTACK): New unspecv.
>> (UNSPECV_CLRREGS): New unspecv.
>> (return): Conditionally emit cleaning regs code.
>> (simple_return): Likewise
>> (clear_regs): New insn.
>> (clear_stack): New insn.
>> * doc/extend.texi: Added description for new security_sensitive attribute.
>
> Changelog
> 2016-03-21  Marcos Diaz  
>Andres Tiraboschi  
>
> PR middle-end/69976
> * config/i386/i386-protos.h: Add ix86_clear_regs_emit and
> ix86_sec_sensitive_attr_p
> * config/i386/i386.c: (ix86_sec_sensitive_attr_p): New function
> (ix86_using_red_zone): now take into account if the function has the new
> attribute.
> (is_preserved_reg): New function.
> (is_integer_reg): New function.
> (is_used_as_ret): New function.
> (reg_to_string): New function.
> (clear_reg_emit): New function.
> (ix86_clear_regs_emit): New function.
> (ix86_expand_epilogue): Added code to emit clean up code only when
> security_sensitive attribute is set.
> (ix86_handle_security_sensitive_attribute): New function.
> (ix86_attribute_table): Added new attribute.
> * config/i386/i386.md: (UNSPECV_CLRSTACK): New unspecv.
> (UNSPECV_CLRREGS): New unspecv.
> (return): Conditionally emit cleaning regs code.
> (simple_return): Likewise
> (clear_regs): New insn.
> (clear_stack): New insn.
> * doc/extend.texi: Added description for new security_sensitive attribute.

ping^2

Re: Fix for PR70498 in Libiberty Demangler

2016-04-04 Thread Marcel Böhme


> On 4 Apr 2016, at 9:24 PM, Bernd Schmidt  wrote:
> 
>> 
>> The patch now also accounts for overflows in d_compact_number which
>> is supposed to return -1 in case of negative numbers.
> 
> I take it this isn't for the normal 'n' case, but for instances where we 
> encounter overflows in d_number?
Yes.

Best regards,
- Marcel

Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-04 Thread Maciej W. Rozycki

On Thu, 31 Mar 2016, Jake Hamby wrote:

> There's one more thing that's broken in the VAX backend which I'd 
> *really* like to fix: GCC can't compile many of its own files at -O2, as 
> well as a few other .c files in the NetBSD tree, because it can't expand 
> an insn pattern. The Makefiles have hacks to add "-O0" on VAX, and when 
> I remove that workaround, I get GCC assertion failures, all of the form:
> 
> /home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c: In function 
> 'void maybe_emit_chk_warning(tree, built_in_function)':
> /home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c:11549:1: 
> error: unrecognizable insn:
>  }
>  ^
> (insn 295 294 296 25 (set (reg:SI 111)
> (subreg:SI (mem:DI (plus:SI (mult:SI (reg:SI 109)
> (const_int 8 [0x8]))
> (reg/f:SI 45 [ D.85015 ])) [7 *_98+0 S8 A32]) 4)) 
> /home/netbsd/current/src/external/gpl3/gcc/dist/gcc/wide-int.h:799 -1
>  (nil))
> /home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c:11549:1: 
> internal compiler error: in extract_insn, at recog.c:2343
> 0xbd0365 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
> const*)
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/rtl-error.c:110
> 0xbd03fa _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/rtl-error.c:118
> 0xb92a2d extract_insn(rtx_insn*)
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/recog.c:2343
> 0x9612cd instantiate_virtual_regs_in_insn
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:1598
> 0x9612cd instantiate_virtual_regs
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:1966
> 0x9612cd execute
>   
> /home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:2015
> 
> The failures all seem to be related to trying to read a value from an 
> array of 64-bit values and loading it into a 32-bit register. It seems 
> like there should be a special insn defined for this sort of array 
> access, since VAX has mova* and pusha* variants to set a value from an 
> address plus an index into a byte, word, long, or 64-bit array (it uses 
> movab/pushab, put not the other variants). The addressing modes, 
> constraints, and predicates all get very complicated, and I still need 
> to understand a bit better what is actually required, and what could be 
> simplified and cleaned up.

 Please note however that this RTL instruction is a memory reference, not 
an address load.  So the suitable hardware instruction would be MOVAQ for 
an indexed DI mode memory load.  If used to set a SI mode subreg at byte 
number 4 (this would be the second hardware register of a pair a DI mode 
value is held in) as seen here, it would have to address the immediately 
preceding register however (so you can't load R0 this way) and it would 
clobber it.

 So offhand I think you need an RTL instruction splitter to express this, 
and then avoid fetching 64 bits worth of data from memory -- for the sake 
of matching the indexed addressing mode -- where you only need 32 bits.  
At the hardware instruction level I'd use a scratch register (as with 
MOVAQ you'd have to waste one anyway) to scale the index and then use 
MOVAL instead with the modified index.  Where no index is used it gets 
simpler even as you can just bump up the displacement according to the 
subreg offset.

 HTH,

  Maciej

Re: [PATCH] Add security_sensitive attribute to clean function stack and regs.

2016-04-04 Thread Bernd Schmidt


On 03/22/2016 03:15 PM, Marcos Díaz wrote:

the attached patch adds a new attribute 'security_sensitive' for functions.
The idea was discussed in PR middle-end/69976.


The first question would be: I see several submissions from folks 
@tallertechnologies.com. Do you and your employer have copyright 
assignments on file with the FSF?


The patch violates gcc coding guidelines in many ways. I'll point out 
some issues, but in general you are expected to familiarize yourself 
with the requirements first. Please review the entire patch yourself for 
such issues.



+bool ix86_sec_sensitive_attr_p(void)


Break line after return types.

  {
-  return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
+  return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI
+   && !ix86_sec_sensitive_attr_p();


Wrap multi-line expressions in parentheses to get correct formatting 
from the editor.



+  return (regno == BX_REG)
+|| (regno == BP_REG)
+|| (regno == SP_REG)
+|| (regno == R12_REG)
+|| (regno == R13_REG)
+|| (regno == R14_REG)
+|| (regno == R15_REG);


Same here but also remove unnecessary parentheses (in many places in 
this patch).



+}
+
+/* Returns true iff regno is an integer register.*/
+static inline bool is_integer_reg(unsigned int regno)


Space before open parens.


+  rtx ret = ix86_function_value
+   (TREE_TYPE (DECL_RESULT (cfun->decl)), cfun->decl, true);


Probably better to split this line after one of the commas.


+ // Is parallel


Use C style comments consistently.

+/* Adds to str in pos position the neame of the register regno.*/


Always two spaces after a period, including at the end of a comment.

In general it would be ideal if this functionality was not x86 specific. 
I could imagine that thread_prologue_and_epilogue_insns could do this in 
a machine-independent fashion.



Bernd

Re: [PATCH] Fix PR70457 (ICE on incompatible call to built-in pow)

2016-04-04 Thread Bill Schmidt

Hi Jakub,

Sorry again for the misunderstanding.  Here are revised patches and
change logs for trunk, GCC 5, and GCC 4.9.  Note that GCC 5 and GCC 4.9
have additional exposures in tree-ssa-math-opts.c that I've repaired
similarly.  The only differences between 5 and 4.9 are the names of the
affected functions.

All have passed regstrap on powerpc64le-unknown-linux-gnu.  Are these
ok?

Thanks,
Bill

Trunk
=

[gcc]

2016-04-04  Bill Schmidt  
Jakub Jelinek 

PR middle-end/70457
* tree-inline.c (estimate_num_insn): Use gimple_call_builtin_p
to ensure a call statement is compatible with a built-in's
prototype.
* tree-ssa-math-opts.c (pass_optimize_windening_mul::execute):
Likewise.

[gcc/testsuite]

2016-04-04  Bill Schmidt  
Jakub Jelinek 

PR middle-end/70457
* gcc.dg/torture/pr70457.c: New.


Index: gcc/testsuite/gcc.dg/torture/pr70457.c
===
--- gcc/testsuite/gcc.dg/torture/pr70457.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70457.c  (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+
+/* This formerly ICEd when trying to expand pow as a built-in with
+   the wrong number of arguments.  */
+
+extern double pow (double __x, double __y) __attribute__ ((__nothrow__ , 
__leaf__));
+extern double __pow (double __x, double __y) __attribute__ ((__nothrow__ , 
__leaf__));
+
+typedef int int64_t __attribute__ ((__mode__ (__DI__)));
+
+typedef struct {
+  int64_t data;
+  int tag;
+} Object;
+
+extern Object Make_Flonum (double);
+extern Object P_Pow (Object, Object);
+
+Object General_Function (Object x, Object y, double (*fun)()) {
+  double d, ret;
+
+  d = 1.0;
+
+  if (y.tag >> 1)
+ret = (*fun) (d);
+  else
+ret = (*fun) (d, 0.0);
+
+  return Make_Flonum (ret);
+}
+
+Object P_Pow (Object x, Object y) { return General_Function (x, y, pow); }
Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 234702)
+++ gcc/tree-inline.c   (working copy)
@@ -4065,7 +4065,7 @@ estimate_num_insns (gimple *stmt, eni_weights *wei
  return 0;
else if (is_inexpensive_builtin (decl))
  return weights->target_builtin_call_cost;
-   else if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+   else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
  {
/* We canonicalize x * x to pow (x, 2.0) with -ffast-math, so
   specialize the cheap expansion we do here.
Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c(revision 234702)
+++ gcc/tree-ssa-math-opts.c(working copy)
@@ -3827,7 +3827,7 @@ pass_optimize_widening_mul::execute (function *fun
{
  tree fndecl = gimple_call_fndecl (stmt);
  if (fndecl
- && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+ && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
{
  switch (DECL_FUNCTION_CODE (fndecl))
{

GCC 5
=

[gcc]

2016-04-04  Bill Schmidt  
Jakub Jelinek 

PR middle-end/70457
* tree-inline.c (estimate_num_insn): Use gimple_call_builtin_p
to ensure a call statement is compatible with a built-in's
prototype.
* tree-ssa-math-opts.c (execute_cse_sincos_1): Likewise.
(pass_cse_sincos::execute): Likewise.
(pass_optimize_widening_mul::execute):  Likewise.

[gcc/testsuite]

2016-04-04  Bill Schmidt  
Jakub Jelinek 

PR middle-end/70457
* gcc.dg/torture/pr70457.c: New.


Index: gcc/testsuite/gcc.dg/torture/pr70457.c
===
--- gcc/testsuite/gcc.dg/torture/pr70457.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70457.c  (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+
+/* This formerly ICEd when trying to expand pow as a built-in with
+   the wrong number of arguments.  */
+
+extern double pow (double __x, double __y) __attribute__ ((__nothrow__ , 
__leaf__));
+extern double __pow (double __x, double __y) __attribute__ ((__nothrow__ , 
__leaf__));
+
+typedef int int64_t __attribute__ ((__mode__ (__DI__)));
+
+typedef struct {
+  int64_t data;
+  int tag;
+} Object;
+
+extern Object Make_Flonum (double);
+extern Object P_Pow (Object, Object);
+
+Object General_Function (Object x, Object y, double (*fun)()) {
+  double d, ret;
+
+  d = 1.0;
+
+  if (y.tag >> 1)
+ret = (*fun) (d);
+  else
+ret = (*fun) (d, 0.0);
+
+  return Make_Flonum (ret);
+}
+
+Object P_Pow (Object x, Object y) { return General_Function (x, y, pow); }
Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 234705)
+++ gcc/tree-in

Re: [PATCH] Fix PR70457 (ICE on incompatible call to built-in pow)

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 10:02:44AM -0500, Bill Schmidt wrote:
> Sorry again for the misunderstanding.  Here are revised patches and
> change logs for trunk, GCC 5, and GCC 4.9.  Note that GCC 5 and GCC 4.9
> have additional exposures in tree-ssa-math-opts.c that I've repaired
> similarly.  The only differences between 5 and 4.9 are the names of the
> affected functions.
> 
> All have passed regstrap on powerpc64le-unknown-linux-gnu.  Are these
> ok?

Yes, with small nits:

> +extern double pow (double __x, double __y) __attribute__ ((__nothrow__ , 
> __leaf__));
> +extern double __pow (double __x, double __y) __attribute__ ((__nothrow__ , 
> __leaf__));

No need for __pow prototype, you can also remove the __x and __y argument
names.

> +typedef int int64_t __attribute__ ((__mode__ (__DI__)));

Please remove the above typedef and just use:

> +
> +typedef struct {
> +  int64_t data;

  long long data;

here.  Not sure if we support DImode on all targets, e.g. those with
32-bit BITS_PER_UNIT etc. (where DImode would be 256-bit integer).
Or just remove the data field altogether if it reproduces without it too.

Jakub

Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)

2016-04-04 Thread Jason Merrill

On 04/02/2016 05:18 PM, Patrick Palka wrote:

Here's a version that uses a separate deletable table to cache the
function copies.  For simplicity I used a hash_map instead of a
hash_table.  Does this look OK to commit after bootstrap + regtest?

Thanks.  Minor nits:

> +struct fundef_copies_table_t
> +{
> +  hash_map *map;
> +};

Why wrap the pointer in a struct?

+ maybe_initialize_fundef_copies_table ();
+ fun_copy *copy = get_fun_copy (fun);

Let's move the initialization call inside get_fun_copy.

On a related note, I noticed that the constexpr_call_table is not marked
deletable.  Marking it deletable speeds up the test case in the PR by
about 10% and saves about 10MB.  Do you think doing so is a good idea?

Please.

On another related note, I noticed that marking something as both
GTY((deletable, cache)) doesn't work as intended, because the
gt_cleare_cache functions run _after_ all deletable roots get
zeroed out.  So during GC the gt_cleare_cache function of a root
marked "deletable, cache" would always be a no-op.  Concretely I think
this means that our cv_cache and fold_cache leak memory during GC
because their underlying hash_map (allocated by operator new) is zeroed
before gc_cleare_cache could clear it.

Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache 
and it being non-null.  But I guess we can get rid of the cache_map 
class and use the approach you have here, of a deletable gc-allocated 
hash_map pointer; I'd still use ->empty() for dumping the cache outside 
of GC, though.

Jason

Do not optimize some polymorphic calls with -fsanitize=undefined

2016-04-04 Thread Jan Hubicka

Hi,
as requested by Jakub, this patch makes devirtualization code to turn off
transformations based on assumption that cxa_pure_virtual will never be called
by a virtual call when -fsanitize=undefined is used.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

PR ipa/66223
* ipa-devirt.c (maybe_record_node): Do not optimize cxa_pure_virtual
calls when sanitizing.
(possible_polymorphic_call_target_p)" FIx formating.

* g++.dg/ipa/devirt-51.C: New testcase.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 234715)
+++ ipa-devirt.c(working copy)
@@ -2438,10 +2438,14 @@ maybe_record_node (vec  &
 {
   gcc_assert (!target_node->global.inlined_to);
   gcc_assert (target_node->real_symbol_p ());
+  /* When sanitizing, do not asume that cxa_pure_virutal is not called
+by valid program.  */
+  if (flag_sanitize & SANITIZE_UNDEFINED)
+   ;
   /* Only add pure virtual if it is the only possible target.  This way
 we will preserve the diagnostics about pure virtual called in many
 cases without disabling optimization in other.  */
-  if (pure_virtual)
+  else if (pure_virtual)
{
  if (nodes.length ())
return;
@@ -3374,8 +3378,7 @@ possible_polymorphic_call_target_p (tree
   bool final;
 
   if (TREE_CODE (TREE_TYPE (n->decl)) == FUNCTION_TYPE
-  && ((fcode = DECL_FUNCTION_CODE (n->decl))
- == BUILT_IN_UNREACHABLE
+  && ((fcode = DECL_FUNCTION_CODE (n->decl)) == BUILT_IN_UNREACHABLE
   || fcode == BUILT_IN_TRAP))
 return true;
 
Index: testsuite/g++.dg/ipa/devirt-51.C
===
--- testsuite/g++.dg/ipa/devirt-51.C(revision 0)
+++ testsuite/g++.dg/ipa/devirt-51.C(working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=undefined -fdump-tree-optimized"  } */
+namespace {
+  struct B {
+B* self;
+B() : self( this ) { self->f(); }
+   void E(void);
+virtual void f() = 0;
+};
+
+struct D : B
+{
+void f() {}
+};
+}
+
+struct D e;
+
+__attribute__ ((used))
+void B::E(void)
+  {
+this->f();
+}
+
+int main()
+{
+D d;
+}
+/* { dg-final { scan-tree-dump "cxa_pure_virtual" "optimized"  } } */

Re: [PATCH] gnattools: Clean config.cache (PR70173)

2016-04-04 Thread Eric Botcazou

> Tested on powerpc64-linux, --enable-languages=all,ada,go,obj-c++ ,
> followed by "make distclean".  Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2016-04-04  Segher Boessenkool  
> 
> gnattools/
>   PR bootstrap/70173
>   * Makefile.in (distclean): Also delete config.cache .

Sure, thanks.

-- 
Eric Botcazou

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-04 Thread Evandro Menezes


On 04/01/16 18:08, Wilco Dijkstra wrote:

Evandro Menezes wrote:

I hope that this gets in the ballpark of what's been discussed previously.

Yes that's very close to what I had in mind. A minor issue is that the vector
modes cannot work as they start at MAX_MODE_FLOAT (which is > 32):

+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))

That should be:

+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT 
+ 1)) \

It would be worth testing all the obvious cases to be sure they work.

Also I don't think it is a good idea to enable all modes on Exynos-M1 and 
XGene-1 -
I haven't seen any evidence that shows it gives a speedup on real code for all 
modes
(or at least on a good micro benchmark like the unit vector test I suggested - 
a simple
throughput test does not count!).


This approximation does benefit M1 in general across several 
benchmarks.  As for my choice for Xgene1, it preserves the original 
setting.  I believe that, with this more granular option, developers can 
fine tune their targets.



The issue is it hides performance gains from an improved divider/sqrt on new 
revisions
or microarchitectures. That means you should only enable cases where there is 
evidence
of a major speedup that cannot be matched by a future improved divider/sqrt.


I did notice that some benchmarks with heavy use of multiplication or 
multiply-accumulation, the series may be detrimental, since it increases 
the competition for the unit(s) that do(es) such operations.


But those micro-architectures that get a better unit for division or 
sqrt() are free to add their own tuning parameters.  Granted, I assume 
that running legacy code is not much of an issue only in a few markets.


Thank you,

--
Evandro Menezes

>From 63a39df80104c504ffdfba698aab9dc2f73221a1 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH 1/2] [AArch64] Add more choices for the reciprocal square root
 approximation

Allow a target to prefer such operation depending on the operation mode.

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise.
	(tune_params): New member "approx_rsqrt_modes".
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(use_rsqrt_p): New argument for the mode and use new member
	"approx_rsqrt_modes" from "tune_params".
	(aarch64_builtin_reciprocal): Devise mode from builtin.
	(aarch64_optab_supported_p): New argument for the mode.
---
 gcc/config/aarch64/aarch64-protos.h | 30 ++
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 --
 gcc/config/aarch64/aarch64.c| 39 ++---
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 58c9d0d..a31ee35 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,32 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+	  + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (V2SFmode) \
+			   | AARCH64_APPROX_MODE (V4SFmode))
+#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
+			   | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
+			  | AARCH64_APPROX_MODE (DFmode) \
+			  | AARCH64_APPROX_MODE (V2SFmode))
+#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
+			  | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_SCALAR (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (DFmode))
+#define AARCH64_APPROX_VECTOR (AARCH64_APPROX_MODE (V2SFmode) \
+   | AARCH64_APPROX_MODE (V4SFmode) \
+			   | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH6

Re: [AArch64] Emit square root using the Newton series

2016-04-04 Thread Evandro Menezes


On 04/01/16 17:45, Evandro Menezes wrote:

On 03/24/16 14:11, Evandro Menezes wrote:

On 03/17/16 17:46, Evandro Menezes wrote:
This patch refactors the function to emit the reciprocal square root 
approximation to also emit the square root approximation.
This version of the patch cleans up the changes to the MD files and 
fixes some bugs introduced in it since the first proposal.


This version of the patch uses the finer grained selection for the 
approximate sqrt() by the target firstly proposed at 
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html


Additionally, I changed the handling of the special case when the 
argument is 0.0 for scalars to be the same as for vectors.  The reason 
is that by relying on the CC, a scarce resource, it hindered 
parallelism.  By using up an additional register to hold the mask also 
for scalars, the code is more... scalable.


Hopefully this patch gets close to what all have in mind.



[AArch64] Emit square root using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra  

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(AARCH64_APPROX_MODE): New macro.
   (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise.
(tune_params): New member "approx_sqrt_modes".
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_rsqrt_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(aarch64_override_options_after_change_1): Handle new option.
* config/aarch64/aarch64-simd.md
(rsqrt2): Use new function instead.
(sqrt2): New expansion and insn definitions.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-sqrt): Add new option description.
* doc/invoke.texi (mlow-precision-sqrt): Likewise.


This version of the patch refactors the algorithm to shorten the 
dependency chain at the last iteration of the series.


Thank you for your feedback.

--
Evandro Menezes

>From 8e463d55c89233a623aad2412fb3055021fdd066 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Apr 2016 11:23:29 -0500
Subject: [PATCH] [AArch64] Emit square root using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise.
	(tune_params): New member "approx_sqrt_modes".
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(aarch64_override_options_after_change_1): Handle new option.
	* config/aarch64/aarch64-simd.md
	(rsqrt2): Use new function instead.
	(sqrt2): New expansion and insn definitions.
	* config/aarch64/aarch64.md: Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-sqrt): Add new option description.
	* doc/invoke.texi (mlow-precision-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  29 -
 gcc/config/aarch64/aarch64-simd.md  |  13 +++-
 gcc/config/aarch64/aarch64.c| 115 +---
 gcc/config/aarch64/aarch64.md   |  11 +++-
 gcc/config/aarch64/aarch64.opt  |   9 ++-
 gcc/doc/invoke.texi |  10 
 6 files changed, 147 insertions(+), 40 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 58c9d0d..365572d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,32 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+	  + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AA

Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)

2016-04-04 Thread Patrick Palka

On Mon, 4 Apr 2016, Jason Merrill wrote:

> On 04/02/2016 05:18 PM, Patrick Palka wrote:
> > Here's a version that uses a separate deletable table to cache the
> > function copies.  For simplicity I used a hash_map instead of a
> > hash_table.  Does this look OK to commit after bootstrap + regtest?
> 
> Thanks.  Minor nits:
> 
> > +struct fundef_copies_table_t
> > +{
> > +  hash_map *map;
> > +};
> 
> Why wrap the pointer in a struct?

If I don't wrap the pointer then I get really weird link errors.  If the
following line is added to constexpr.c:

  static GTY((deletable)) hash_map *blah;

then the final link fails with dozens of these undefined reference errors:

  /home/patrick/code/gcc/gcc/hash-map.h:68: undefined reference to 
`gt_pch_nx(tree_node*&)'
  /home/patrick/code/gcc/gcc/hash-map.h:61: undefined reference to 
`gt_ggc_mx(tree_node*&)'
  /home/patrick/code/gcc/gcc/hash-map.h:62: undefined reference to 
`gt_ggc_mx(tree_node*&)'
  ...

Seems to only happen when the value type of the hash_map is something other
than "tree".  This strangeness can be conveniently avoided by wrapping the
hash_map.


> 
> > + maybe_initialize_fundef_copies_table ();
> > + fun_copy *copy = get_fun_copy (fun);
> 
> Let's move the initialization call inside get_fun_copy.

Done.

> 
> > On a related note, I noticed that the constexpr_call_table is not marked
> > deletable.  Marking it deletable speeds up the test case in the PR by
> > about 10% and saves about 10MB.  Do you think doing so is a good idea?
> 
> Please.

Done.

> 
> > On another related note, I noticed that marking something as both
> > GTY((deletable, cache)) doesn't work as intended, because the
> > gt_cleare_cache functions run _after_ all deletable roots get
> > zeroed out.  So during GC the gt_cleare_cache function of a root
> > marked "deletable, cache" would always be a no-op.  Concretely I think
> > this means that our cv_cache and fold_cache leak memory during GC
> > because their underlying hash_map (allocated by operator new) is zeroed
> > before gc_cleare_cache could clear it.
> 
> Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache and it
> being non-null.  But I guess we can get rid of the cache_map class and use the
> approach you have here, of a deletable gc-allocated hash_map pointer; I'd
> still use ->empty() for dumping the cache outside of GC, though.

Could do this too in a subsequent patch.

Here's an updated patch that that adjusts the initialization call and marks
constexpr_call_table as deletable.  (Also fun_copy is renamed to fundef_copy
for consistency.)

-- >8 --

gcc/cp/ChangeLog:

PR c++/70452
* constexpr.c (struct fundef_copy): New struct.
(struct fundef_copies_table_t): New struct.
(fundef_copies_table): New static variable.
(maybe_initialize_fundef_copies_table): New static function.
(get_fundef_copy): New static function.
(save_fundef_copy): New static function.
(cxx_eval_call_expression): Use get_fundef_copy, and
save_fundef_copy.
(constexpr_call_table): Add "deletable" GTY marker.

gcc/testsuite/ChangeLog:

PR c++/70452
* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c| 99 +--
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++
 2 files changed, 111 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b94b346..bcbf9bd 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -915,7 +915,7 @@ struct constexpr_ctx {
 /* A table of all constexpr calls that have been evaluated by the
compiler in this translation unit.  */
 
-static GTY (()) hash_table *constexpr_call_table;
+static GTY ((deletable)) hash_table 
*constexpr_call_table;
 
 static tree cxx_eval_constant_expression (const constexpr_ctx *, tree,
  bool, bool *, bool *, tree * = NULL);
@@ -965,6 +965,78 @@ maybe_initialize_constexpr_call_table (void)
 constexpr_call_table = hash_table::create_ggc (101);
 }
 
+/* The representation of a single node in the per-function freelist maintained
+   by FUNDEF_COPIES_TABLE.  */
+
+struct fundef_copy
+{
+  tree body;
+  tree parms;
+  tree res;
+  fundef_copy *prev;
+};
+
+/* During constexpr CALL_EXPR evaluation, to avoid issues with sharing when
+   a function happens to get called recursively, we unshare the callee
+   function's body and evaluate this unshared copy instead of evaluating the
+   original body.
+
+   FUNDEF_COPIES_TABLE is a per-function freelist of these unshared function
+   copies.  The underlying data structure of FUNDEF_COPIES_TABLE is a hash_map
+   that's keyed off of the original FUNCTION_DECL and whose value is the chain
+   of this function's unused copies awaiting reuse.  */
+
+struct fundef_copies_table_t
+{
+  hash_map *map;
+};
+
+static GTY((deletable)) fundef_

[PATCH] Remove class cache_map and use ggc hash_maps instead (PR c++/70452)

2016-04-04 Thread Patrick Palka

On Mon, 4 Apr 2016, Jason Merrill wrote:

> Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache and it
> being non-null.  But I guess we can get rid of the cache_map class and use the
> approach you have here, of a deletable gc-allocated hash_map pointer; I'd
> still use ->empty() for dumping the cache outside of GC, though.

Is this what you had in mind?

gcc/cp/ChangeLog:

PR c++/70452
* cp-tree.h (class cache_map): Remove.
* constexpr.c (cv_cache): Change type to
GTY((deletable)) hash_map *.
(maybe_constant_value): Adjust following the change to cv_cache.
(clear_cv_cache): New static function.
(clear_cv_and_fold_caches): Use it.
* cp-gimplify.c (fold_cache): Change type to
GTY((deletable)) hash_map *.
(clear_fold_cache): Adjust following the change to fold_cache.
(cp_fold): Likewise.
---
 gcc/cp/constexpr.c   | 27 ++-
 gcc/cp/cp-gimplify.c | 17 +++--
 gcc/cp/cp-tree.h | 36 
 3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index bcbf9bd..8f080ee 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4304,7 +4304,7 @@ maybe_constant_value_1 (tree t, tree decl)
   return r;
 }
 
-static GTY((cache, deletable)) cache_map cv_cache;
+static GTY((deletable)) hash_map *cv_cache;
 
 /* If T is a constant expression, returns its reduced value.
Otherwise, if T does not have TREE_CONSTANT set, returns T.
@@ -4313,13 +4313,22 @@ static GTY((cache, deletable)) cache_map cv_cache;
 tree
 maybe_constant_value (tree t, tree decl)
 {
-  tree ret = cv_cache.get (t);
-  if (!ret)
-{
-  ret = maybe_constant_value_1 (t, decl);
-  cv_cache.put (t, ret);
-}
-  return ret;
+  if (cv_cache == NULL)
+cv_cache = hash_map::create_ggc (101);
+
+  tree *slot = &cv_cache->get_or_insert (t, NULL);
+  if (*slot == NULL_TREE)
+*slot = maybe_constant_value_1 (t, decl);
+  return *slot;
+}
+
+/* Dispose of the whole CV_CACHE.  */
+
+static void
+clear_cv_cache (void)
+{
+  if (cv_cache != NULL)
+cv_cache->empty ();
 }
 
 /* Dispose of the whole CV_CACHE and FOLD_CACHE.  */
@@ -4327,7 +4336,7 @@ maybe_constant_value (tree t, tree decl)
 void
 clear_cv_and_fold_caches (void)
 {
-  gt_cleare_cache (cv_cache);
+  clear_cv_cache ();
   clear_fold_cache ();
 }
 
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index b6e1d27..61ac187 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1902,14 +1902,15 @@ c_fully_fold (tree x, bool /*in_init*/, bool 
*/*maybe_const*/)
   return cp_fold_rvalue (x);
 }
 
-static GTY((cache, deletable)) cache_map fold_cache;
+static GTY((deletable)) hash_map *fold_cache;
 
 /* Dispose of the whole FOLD_CACHE.  */
 
 void
 clear_fold_cache (void)
 {
-  gt_cleare_cache (fold_cache);
+  if (fold_cache != NULL)
+fold_cache->empty ();
 }
 
 /*  This function tries to fold an expression X.
@@ -1939,8 +1940,12 @@ cp_fold (tree x)
   if (DECL_P (x) || CONSTANT_CLASS_P (x))
 return x;
 
-  if (tree cached = fold_cache.get (x))
-return cached;
+  if (fold_cache == NULL)
+fold_cache = hash_map::create_ggc (101);
+
+  tree *slot = &fold_cache->get_or_insert (x, NULL);
+  if (*slot != NULL_TREE)
+return *slot;
 
   code = TREE_CODE (x);
   switch (code)
@@ -2295,10 +2300,10 @@ cp_fold (tree x)
   return org_x;
 }
 
-  fold_cache.put (org_x, x);
+  *slot = x;
   /* Prevent that we try to fold an already folded result again.  */
   if (x != org_x)
-fold_cache.put (x, x);
+fold_cache->put (x, x);
 
   return x;
 }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1d2bfa..0f7e08f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5527,42 +5527,6 @@ extern cp_parameter_declarator *no_parameters;
 /* True if we saw "#pragma GCC java_exceptions".  */
 extern bool pragma_java_exceptions;
 
-/* Data structure for a mapping from tree to tree that's only used as a cache;
-   we don't GC-mark trees in the map, and we clear the map when collecting
-   garbage.  Global variables of this type must be marked
-   GTY((cache,deletable)) so that the gt_cleare_cache function is called by
-   ggc_collect but we don't try to load the map pointer from a PCH.
-
-   FIXME improve to use keep_cache_entry.  */
-class cache_map
-{
-  /* Use a lazily initialized pointer rather than a map member since a
- hash_map can't be constructed in a static initializer.  */
-  hash_map *map;
-
-public:
-  tree get (tree key)
-  {
-if (map)
-  if (tree *slot = map->get (key))
-   return *slot;
-return NULL_TREE;
-  }
-
-  bool put (tree key, tree val)
-  {
-if (!map)
-  map = new hash_map;
-return map->put (key, val);
-  }
-
-  friend inline void gt_cleare_cache (cache_map &cm)
-  {
-if (cm.map)
-  cm.map->empty();
-  }
-};
-
 /* in call.c */
 extern bool check_dtor_name(t

[committed] Fix pr70307.c testcase (PR middle-end/70307)

2016-04-04 Thread Jakub Jelinek

Hi!

I've noticed the new testcase FAILs on i?86-linux and powerpc-linux.
As this is a compile time testcase rather than runtime, I chose to prune
the rs6000 ABI warnings instead of using -w which is otherwise the only way
to disable them :(.

Fixed thusly, committed to trunk.

2016-04-04  Jakub Jelinek  

PR middle-end/70307
* gcc.dg/torture/pr70307.c: Add -Wno-psabi to dg-options.  Prune
rs6000 ABI warnings.

--- gcc/testsuite/gcc.dg/torture/pr70307.c.jj   2016-04-04 12:28:39.0 
+0200
+++ gcc/testsuite/gcc.dg/torture/pr70307.c  2016-04-04 19:17:13.103432985 
+0200
@@ -1,5 +1,6 @@
 /* PR c/70307 */
 /* { dg-do compile } */
+/* { dg-options "-Wno-psabi" } */
 
 typedef int v4si __attribute__ ((vector_size (16)));
 
@@ -60,3 +61,7 @@ fn8 (int i)
   struct S s = { .v = (v4si){(1, i++)} == (v4si){(0, 0)} };
   return s.v;
 }
+
+/* Ignore a warning that is irrelevant to the purpose of this test.  */
+/* { dg-prune-output "\[^\n\r\]*GCC vector passed by reference\[^\n\r\]*" } */
+/* { dg-prune-output "\[^\n\r\]*GCC vector returned by reference\[^\n\r\]*" } 
*/


Jakub

[C++ PATCH] PR 70501, ICE in verify ctor sanity

2016-04-04 Thread Nathan Sidwell


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70501

This fixes 70501.  The cause is an omission in typeck when converting a scalar 
operand to a vector.  We use build_vector_from_val, which can return a 
CONSTRUCTOR.  We fail to wrap that CONSTRUCTOR in a TARGET_EXPR.


The ICE arises because at the point we meet that CONSTRUCTOR during the 
constexpr processing, the currently active object under construction is that for 
the result of the <= operator, which has type vector-of-bool, rather than 
vector-of-int. (thus this  problem arises in other vector  ops, but mostly 
undetected because the result type is the same  as the operand type)


ok?

nathan
2016-04-04  Nathan Sidwell  

	PR c++/70501
	* typeck.c (cp_build_binary_op): Wrap vector constructors in
	target_exprs.

	* g++.dg/init/pr70501.C: New.

Index: cp/typeck.c
===
--- cp/typeck.c	(revision 234715)
+++ cp/typeck.c	(working copy)
@@ -4196,6 +4196,8 @@ cp_build_binary_op (location_t location,
   op0 = convert (TREE_TYPE (type1), op0);
 	  op0 = save_expr (op0);
   op0 = build_vector_from_val (type1, op0);
+	  if (TREE_CODE (op0) == CONSTRUCTOR)
+		op0 = get_target_expr (op0);
   type0 = TREE_TYPE (op0);
   code0 = TREE_CODE (type0);
   converted = 1;
@@ -4206,6 +4208,8 @@ cp_build_binary_op (location_t location,
   op1 = convert (TREE_TYPE (type0), op1);
 	  op1 = save_expr (op1);
   op1 = build_vector_from_val (type0, op1);
+	  if (TREE_CODE (op1) == CONSTRUCTOR)
+		op1 = get_target_expr (op1);
   type1 = TREE_TYPE (op1);
   code1 = TREE_CODE (type1);
   converted = 1;
Index: testsuite/g++.dg/init/pr70501.C
===
--- testsuite/g++.dg/init/pr70501.C	(nonexistent)
+++ testsuite/g++.dg/init/pr70501.C	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-options "" } Not pedantic */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+struct S { v4si v; };
+
+void
+fn2 (int i, int j)
+{
+  struct S s = { .v = i <= j + (v4si){(1, 2)} };
+}

Re: [PATCH] Remove class cache_map and use ggc hash_maps instead (PR c++/70452)

2016-04-04 Thread Patrick Palka

On Mon, 4 Apr 2016, Patrick Palka wrote:

> On Mon, 4 Apr 2016, Jason Merrill wrote:
> 
> > Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache and it
> > being non-null.  But I guess we can get rid of the cache_map class and use 
> > the
> > approach you have here, of a deletable gc-allocated hash_map pointer; I'd
> > still use ->empty() for dumping the cache outside of GC, though.
> 
> Is this what you had in mind?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/70452
>   * cp-tree.h (class cache_map): Remove.
>   * constexpr.c (cv_cache): Change type to
>   GTY((deletable)) hash_map *.
>   (maybe_constant_value): Adjust following the change to cv_cache.

Actually, using get_or_insert here is unsafe because
maybe_constant_value could be called recursively.

>   (clear_cv_cache): New static function.
>   (clear_cv_and_fold_caches): Use it.
>   * cp-gimplify.c (fold_cache): Change type to
>   GTY((deletable)) hash_map *.
>   (clear_fold_cache): Adjust following the change to fold_cache.
>   (cp_fold): Likewise.

Same issue in cp_fold probably.  So here's a patch that just uses
get() and put():

-- >8 --

gcc/cp/ChangeLog:

PR c++/70452
* cp-tree.h (class cache_map): Remove.
* constexpr.c (cv_cache): Change type to
GTY((deletable)) hash_map *.
(maybe_constant_value): Adjust following the change to cv_cache.
(clear_cv_cache): New static function.
(clear_cv_and_fold_caches): Use it.
* cp-gimplify.c (fold_cache): Change type to
GTY((deletable)) hash_map *.
(clear_fold_cache): Adjust following the change to fold_cache.
(cp_fold): Likewise.
---
 gcc/cp/constexpr.c   | 27 +++
 gcc/cp/cp-gimplify.c | 16 ++--
 gcc/cp/cp-tree.h | 36 
 3 files changed, 29 insertions(+), 50 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index bcbf9bd..1c2701b 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4304,7 +4304,7 @@ maybe_constant_value_1 (tree t, tree decl)
   return r;
 }
 
-static GTY((cache, deletable)) cache_map cv_cache;
+static GTY((deletable)) hash_map *cv_cache;
 
 /* If T is a constant expression, returns its reduced value.
Otherwise, if T does not have TREE_CONSTANT set, returns T.
@@ -4313,21 +4313,32 @@ static GTY((cache, deletable)) cache_map cv_cache;
 tree
 maybe_constant_value (tree t, tree decl)
 {
-  tree ret = cv_cache.get (t);
-  if (!ret)
-{
-  ret = maybe_constant_value_1 (t, decl);
-  cv_cache.put (t, ret);
-}
+  if (cv_cache == NULL)
+cv_cache = hash_map::create_ggc (101);
+
+  if (tree *cached = cv_cache->get (t))
+return *cached;
+
+  tree ret = maybe_constant_value_1 (t, decl);
+  cv_cache->put (t, ret);
   return ret;
 }
 
+/* Dispose of the whole CV_CACHE.  */
+
+static void
+clear_cv_cache (void)
+{
+  if (cv_cache != NULL)
+cv_cache->empty ();
+}
+
 /* Dispose of the whole CV_CACHE and FOLD_CACHE.  */
 
 void
 clear_cv_and_fold_caches (void)
 {
-  gt_cleare_cache (cv_cache);
+  clear_cv_cache ();
   clear_fold_cache ();
 }
 
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index b6e1d27..c96b666 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1902,14 +1902,15 @@ c_fully_fold (tree x, bool /*in_init*/, bool 
*/*maybe_const*/)
   return cp_fold_rvalue (x);
 }
 
-static GTY((cache, deletable)) cache_map fold_cache;
+static GTY((deletable)) hash_map *fold_cache;
 
 /* Dispose of the whole FOLD_CACHE.  */
 
 void
 clear_fold_cache (void)
 {
-  gt_cleare_cache (fold_cache);
+  if (fold_cache != NULL)
+fold_cache->empty ();
 }
 
 /*  This function tries to fold an expression X.
@@ -1939,8 +1940,11 @@ cp_fold (tree x)
   if (DECL_P (x) || CONSTANT_CLASS_P (x))
 return x;
 
-  if (tree cached = fold_cache.get (x))
-return cached;
+  if (fold_cache == NULL)
+fold_cache = hash_map::create_ggc (101);
+
+  if (tree *cached = fold_cache->get (x))
+return *cached;
 
   code = TREE_CODE (x);
   switch (code)
@@ -2295,10 +2299,10 @@ cp_fold (tree x)
   return org_x;
 }
 
-  fold_cache.put (org_x, x);
+  fold_cache->put (org_x, x);
   /* Prevent that we try to fold an already folded result again.  */
   if (x != org_x)
-fold_cache.put (x, x);
+fold_cache->put (x, x);
 
   return x;
 }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b1d2bfa..0f7e08f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5527,42 +5527,6 @@ extern cp_parameter_declarator *no_parameters;
 /* True if we saw "#pragma GCC java_exceptions".  */
 extern bool pragma_java_exceptions;
 
-/* Data structure for a mapping from tree to tree that's only used as a cache;
-   we don't GC-mark trees in the map, and we clear the map when collecting
-   garbage.  Global variables of this type must be marked
-   GTY((cache,deletable)) so that the gt_cleare_cache function is called by
-   ggc_collect but we do

[PATCH] Fix up AVX512 andnot (PR target/70525)

2016-04-04 Thread Jakub Jelinek

Hi!

This patch fixes various *andnot3* issues.  There are two issues on
the ISA side that makes stuff harder for andnot: there are no VPANDNB and
VPANDNW instructions, and while there used to be just VPANDN instruction
in AVX/AVX2, there is only VPANDND and VPANDNQ in EVEX.
The patch changes:
1) simplifies asserts, TARGET_AVX512VL implies both TARGET_AVX2 and
   TARGET_SSE2, so asserts like TARGET_AVX2 || TARGET_AVX512VL make no
   sense
2) for V32HImode/V64QImode it emits vpandnq instruction, rather than
   vpandn that fails to assemble
3) the *andnot3 pattern clearly wasn't expecting subst, but
   as it used (copy-paste?)  in the template, it actually
   was substed, which is wrong - we can't implement V64QImode or V32HImode
   masking of andnot (well, not in a single instruction); checked
   this was the only case of  used in define_insn
   without ; for V*[SD]Imode *andnot3_mask pattern
   should DTRT
4) the *andnot3_mask pattern makes no sense, for similar reasons
   - VPANDNB and VPANDNW are not in the ISA, not even with AVX512-BW
5) formatting fixes

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-04  Jakub Jelinek  

PR target/70525
* config/i386/sse.md (*andnot3): Simplify assertions.
Use vpandn for V16SI/V8DImode, vpandnq for
V32HI/V64QImode, don't use , fix up formatting.
(*andnot3_mask): Remove insn with VI12_AVX512VL iterator.

* gcc.target/i386/pr70525.c: New test.

--- gcc/config/i386/sse.md.jj   2016-04-01 17:21:31.0 +0200
+++ gcc/config/i386/sse.md  2016-04-04 14:42:06.296867515 +0200
@@ -11377,45 +11377,46 @@ (define_insn "*andnot3"
 case MODE_XI:
   gcc_assert (TARGET_AVX512F);
 case MODE_OI:
-  gcc_assert (TARGET_AVX2 || TARGET_AVX512VL);
+  gcc_assert (TARGET_AVX2);
 case MODE_TI:
-  gcc_assert (TARGET_SSE2 || TARGET_AVX512VL);
+  gcc_assert (TARGET_SSE2);
   switch (mode)
-  {
-case V16SImode:
-case V8DImode:
-  if (TARGET_AVX512F)
-  {
-tmp = "pandn";
-break;
-  }
-case V8SImode:
-case V4DImode:
-case V4SImode:
-case V2DImode:
-  if (TARGET_AVX512VL)
-  {
-tmp = "pandn";
-break;
-  }
-default:
-  tmp = TARGET_AVX512VL ? "pandnq" : "pandn";
-  }
+   {
+   case V64QImode:
+   case V32HImode:
+ /* There is no vpandnb or vpandnw instruction, nor vpandn for
+512-bit vectors. Use vpandnq instead.  */
+ tmp = "pandnq";
+ break;
+   case V16SImode:
+   case V8DImode:
+ tmp = "pandn";
+ break;
+   case V8SImode:
+   case V4DImode:
+   case V4SImode:
+   case V2DImode:
+ tmp = TARGET_AVX512VL ? "pandn" : "pandn";
+ break;
+   default:
+ tmp = TARGET_AVX512VL ? "pandnq" : "pandn";
+ break;
+   }
   break;
 
-   case MODE_V16SF:
+case MODE_V16SF:
   gcc_assert (TARGET_AVX512F);
-   case MODE_V8SF:
+case MODE_V8SF:
   gcc_assert (TARGET_AVX);
-   case MODE_V4SF:
+case MODE_V4SF:
   gcc_assert (TARGET_SSE);
 
   tmp = "andnps";
   break;
 
-   default:
+default:
   gcc_unreachable ();
-   }
+}
 
   switch (which_alternative)
 {
@@ -11423,7 +11424,7 @@ (define_insn "*andnot3"
   ops = "%s\t{%%2, %%0|%%0, %%2}";
   break;
 case 1:
-  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, 
%%2}";
+  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
   break;
 default:
   gcc_unreachable ();
@@ -11471,21 +11472,6 @@ (define_insn "*andnot3_mask"
   "vpandn\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}";
   [(set_attr "type" "sselog")
(set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
-(define_insn "*andnot3_mask"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
-   (vec_merge:VI12_AVX512VL
- (and:VI12_AVX512VL
-   (not:VI12_AVX512VL
- (match_operand:VI12_AVX512VL 1 "register_operand" "v"))
-   (match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm"))
- (match_operand:VI12_AVX512VL 3 "vector_move_operand" "0C")
- (match_operand: 4 "register_operand" "Yk")))]
-  "TARGET_AVX512BW"
-  "vpandn\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}";
-  [(set_attr "type" "sselog")
-   (set_attr "prefix" "evex")
(set_attr "mode" "")])
 
 (define_expand "3"
--- gcc/testsuite/gcc.target/i386/pr70525.c.jj  2016-04-04 15:13:23.417615588 
+0200
+++ gcc/testsuite/gcc.target/i386/pr70525.c 2016-04-04 15:13:04.0 
+0200
@@ -0,0 +1,32 @@
+/* PR target/70525 */
+/* { dg-do assemble { target avx512bw } } */
+/* { dg-options "-O2 -mavx512bw -mno-avx512vl" } */
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef long long v8di __attribute__

[PATCH] avoid false positives in trailing operator rule in check_GNU_style.sh

2016-04-04 Thread Martin Sebor


My recent change to check_GNU_style.sh (commit 1bbf98) introduces
false positives for declarations of functions with a pointer return
type.  For example, in the following, the asterisk in the "void *"
return type is incorrectly diagnosed as a trailing operator:

  -void
  -foo (void);
  +void *
  +foo (int);

The patch below avoids those false positive for file-scope and
namespace-scope functions.

Martin

index fbf6cb2..a7478f8 100755
--- a/contrib/check_GNU_style.sh
+++ b/contrib/check_GNU_style.sh
@@ -240,6 +240,7 @@ g 'There should be no space before closing 
parenthesis.' \

 g 'Braces should be on a separate line.' \
 '(\)|else)[[:blank:]]*{'

-# Does this apply to definition of aggregate objects?
-g 'Trailing operator.' \
+# Does this apply to definitions of aggregate objects?
+ag 'Trailing operator.' \
+  '^[1-9][0-9]*:\+[[:space:]]' \
   '(([^a-zA-Z_]\*)|([-%<=&|^?])|([^*]/)|([^:][+]))$'

Re: Do not optimize some polymorphic calls with -fsanitize=undefined

2016-04-04 Thread Jakub Jelinek

On Mon, Apr 04, 2016 at 05:50:53PM +0200, Jan Hubicka wrote:
> Hi,
> as requested by Jakub, this patch makes devirtualization code to turn off
> transformations based on assumption that cxa_pure_virtual will never be called
> by a virtual call when -fsanitize=undefined is used.
> 
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
> 
>   PR ipa/66223
>   * ipa-devirt.c (maybe_record_node): Do not optimize cxa_pure_virtual
>   calls when sanitizing.
>   (possible_polymorphic_call_target_p)" FIx formating.
> 
>   * g++.dg/ipa/devirt-51.C: New testcase.
> Index: ipa-devirt.c
> ===
> --- ipa-devirt.c  (revision 234715)
> +++ ipa-devirt.c  (working copy)
> @@ -2438,10 +2438,14 @@ maybe_record_node (vec  &
>  {
>gcc_assert (!target_node->global.inlined_to);
>gcc_assert (target_node->real_symbol_p ());
> +  /* When sanitizing, do not asume that cxa_pure_virutal is not called

s/asume/assume/
s/cxa/__cxa/
s/virutal/virtual/

> +  by valid program.  */
> +  if (flag_sanitize & SANITIZE_UNDEFINED)
> + ;

I'd use SANITIZE_UNREACHABLE instead, that is the sanitizer for
__builtin_unreachable ().  Unless we want to split that into
-fsanitize=unreachable
-fsanitize=pure-virtual

Jakub

[v3 PATCH] PR libstdc++/70437

2016-04-04 Thread Ville Voutilainen

Tested on Linux-PPC64.

2016-04-04  Ville Voutilainen  

 PR libstdc++/70437
 * include/bits/stl_pair.h (_ConstructiblePair,
_ImplicitlyConvertiblePair, _MoveConstructiblePair,
_ImplicitlyMoveConvertiblePair): Add shortcut conditions
for same-type cases.
* testsuite/20_util/pair/70437.cc: New.

Re: [v3 PATCH] PR libstdc++/70437

2016-04-04 Thread Ville Voutilainen

And yes, -ENOPATCH.

On 4 April 2016 at 21:42, Ville Voutilainen  wrote:
> Tested on Linux-PPC64.
>
> 2016-04-04  Ville Voutilainen  
>
>  PR libstdc++/70437
>  * include/bits/stl_pair.h (_ConstructiblePair,
> _ImplicitlyConvertiblePair, _MoveConstructiblePair,
> _ImplicitlyMoveConvertiblePair): Add shortcut conditions
> for same-type cases.
> * testsuite/20_util/pair/70437.cc: New.
diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index 7057030..206553a 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -90,29 +90,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template 
   constexpr bool _ConstructiblePair()
   {
-return __and_,
- is_constructible<_T2, const _U2&>>::value;
+return __and_<__or_::type,
+   typename decay<_U1>::type>,
+   is_constructible<_T1, const _U1&>>,
+ __or_::type,
+   typename decay<_U2>::type>,
+   is_constructible<_T2, const _U2&>>>::value;
   }
 
   template 
   constexpr bool _ImplicitlyConvertiblePair()
   {
-return __and_,
- is_convertible>::value;
+return __and_<__or_::type,
+   typename decay<_U1>::type>,
+   is_convertible>,
+ __or_::type,
+   typename decay<_U2>::type>,
+  is_convertible>>::value;
   }
 
   template 
   constexpr bool _MoveConstructiblePair()
   {
-return __and_,
- is_constructible<_T2, _U2&&>>::value;
+return __and_<__or_::type,
+   typename decay<_U1>::type>,
+   is_constructible<_T1, _U1&&>>,
+ __or_::type,
+   typename decay<_U2>::type>,
+   is_constructible<_T2, _U2&&>>>::value;
   }
 
   template 
   constexpr bool _ImplicitlyMoveConvertiblePair()
   {
-return __and_,
- is_convertible<_U2&&, _T2>>::value;
+return __and_<__or_::type,
+   typename decay<_U1>::type>,
+   is_convertible<_U1&&, _T1>>,
+ __or_::type,
+   typename decay<_U2>::type>,
+  is_convertible<_U2&&, _T2>>>::value;
   }
 
 
diff --git a/libstdc++-v3/testsuite/20_util/pair/70437.cc 
b/libstdc++-v3/testsuite/20_util/pair/70437.cc
new file mode 100644
index 000..37e6fb7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/pair/70437.cc
@@ -0,0 +1,37 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+
+template  struct B;
+
+template  struct A
+{
+  A(A&&) = default;
+  A(const B &);
+};
+
+template  struct B
+{
+  std::pair,int> a;
+  B(B&&) = default;
+};
+
+bool b = std::is_move_constructible >::value;

[COMMITTED] Add myself as GCC maintainer

2016-04-04 Thread Bill Seurer

I've added myself to the "Write After Approval" maintainers (Committed revision 
234724):

Index: ChangeLog
===
--- ChangeLog   (revision 234723)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2016-04-04  Bill Seurer  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2016-03-29  Kelvin Nilsen  
 
* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 234723)
+++ MAINTAINERS (working copy)
@@ -566,6 +566,7 @@ Martin Sebor

 Dodji Seketeli 
 Svein Seldal   
 Thiemo Seufer  
+Bill Seurer
 Marcus Shawcroft   
 Tim Shen   
 David Sherwood 
-- 

-Bill Seurer

Re: [AArch64] Emit division using the Newton series

2016-04-04 Thread Evandro Menezes


On 04/01/16 17:52, Evandro Menezes wrote:

On 04/01/16 17:45, Wilco Dijkstra wrote:

Evandro Menezes wrote:


However, I don't think that there's the need to handle any special case
for division.  The only case when the approximation differs from
division is when the numerator is infinity and the denominator, zero,
when the approximation returns infinity and the division, NAN.  So I
don't think that it's a special case that deserves being handled.  IOW,
the result of the approximate reciprocal is always needed.

  No, the result of the approximate reciprocal is not needed.

Basically a NR approximation produces a correction factor that is 
very close
to 1.0, and then multiplies that with the previous estimate to get a 
more

accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)

The computation of the final reciprocal_correction is on the critical 
latency

path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) without increasing the overall latency. Ie. 
we saved

a multiply.

In principle this could be done as a separate optimization pass that 
tries to
reassociate to reduce latency. However I'm not too convinced this 
would be

easy to implement in GCC's scheduler, so it's best to do it explicitly.


I think that I see what you mean.  I'll hack something tomorrow.


   [AArch64] Emit division using the Newton series

   2016-04-04  Evandro Menezes  
Wilco Dijkstra 

   gcc/
* config/aarch64/aarch64-tuning-flags.def
* config/aarch64/aarch64-protos.h
(AARCH64_APPROX_MODE): New macro.
   (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}:
New tuning macros.
(tune_params): Add new member "approx_div_modes".
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_div_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div3"): New expansion.
* config/aarch64/aarch64-simd.md ("div3"): Likewise.
* config/aarch64/aarch64.opt (-mlow-precision-div): Add new
   option.
* doc/invoke.texi (-mlow-precision-div): Describe new option.


This version of the patch has a shorter dependency chain at the last 
iteration of the series.


Thank you for your feedback,

--
Evandro Menezes

>From c8d94247e5b3c6120436051c8da11850937b7246 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Apr 2016 14:02:24 -0500
Subject: [PATCH] [AArch64] Emit division using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra 

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	* config/aarch64/aarch64-protos.h
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}:
	New tuning macros.
	(tune_params): Add new member "approx_div_modes".
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_div_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div3"): Likewise.
	* config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
	* doc/invoke.texi (-mlow-precision-div): Describe new option.
---
 gcc/config/aarch64/aarch64-protos.h | 28 +
 gcc/config/aarch64/aarch64-simd.md  | 14 -
 gcc/config/aarch64/aarch64-tuning-flags.def |  1 -
 gcc/config/aarch64/aarch64.c| 98 ++---
 gcc/config/aarch64/aarch64.md   | 19 --
 gcc/config/aarch64/aarch64.opt  |  5 ++
 gcc/doc/invoke.texi | 10 +++
 7 files changed, 161 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 58c9d0d..25102d5 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,32 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE

Re: Do not optimize some polymorphic calls with -fsanitize=undefined

2016-04-04 Thread Jan Hubicka

> On Mon, Apr 04, 2016 at 05:50:53PM +0200, Jan Hubicka wrote:
> > Hi,
> > as requested by Jakub, this patch makes devirtualization code to turn off
> > transformations based on assumption that cxa_pure_virtual will never be 
> > called
> > by a virtual call when -fsanitize=undefined is used.
> > 
> > Bootstrapped/regtested x86_64-linux, will commit it shortly.
> > 
> > PR ipa/66223
> > * ipa-devirt.c (maybe_record_node): Do not optimize cxa_pure_virtual
> > calls when sanitizing.
> > (possible_polymorphic_call_target_p)" FIx formating.
> > 
> > * g++.dg/ipa/devirt-51.C: New testcase.
> > Index: ipa-devirt.c
> > ===
> > --- ipa-devirt.c(revision 234715)
> > +++ ipa-devirt.c(working copy)
> > @@ -2438,10 +2438,14 @@ maybe_record_node (vec  &
> >  {
> >gcc_assert (!target_node->global.inlined_to);
> >gcc_assert (target_node->real_symbol_p ());
> > +  /* When sanitizing, do not asume that cxa_pure_virutal is not called
> 
> s/asume/assume/
> s/cxa/__cxa/
> s/virutal/virtual/
> 
> > +by valid program.  */
> > +  if (flag_sanitize & SANITIZE_UNDEFINED)
> > +   ;
> 
> I'd use SANITIZE_UNREACHABLE instead, that is the sanitizer for
> __builtin_unreachable ().  Unless we want to split that into
> -fsanitize=unreachable
> -fsanitize=pure-virtual

Thanks. This is about case where we optimize undefined call (which would
otherwise land in cxa_pure_virtual) into some other virtual method (that
is the only resonable choice).  I suppose UNREACHABLE makes sense here.
I think I already commited the patch but I will update this.

Honza

> 
>   Jakub

[PATCH] Fix PR70509 (wrong code with extract from a v64qi)

2016-04-04 Thread Zdenek Sojka

Hello,

as discussed in the PR, when forwprop propagates VEC_PERM_EXPR through 
BIT_FIELD_REF in simplify_bitfield_ref(), the new BIT_FIELD_REF has the index 
of the same type as the vector base type. This patch changes the index to 
bitsize_int().

Bootstrapped on x86_64-pc-linux-gnu, regression test shows the same result as 
without the patch.

I've test both with/out the second patch mentioned in PR70509#c6 ; it was 
pre-approved by Jakub in the PR. I haven't managed to generate a testcase for 
it though.

I was unable to run the testcase via "make check" since my CPU doesn't support 
the avx512bw instruction set; however I have tested a standalone testcase in an 
emulator, before integrating it to the testsuite.

Please replace the changelog entries with a better ones if needed.

Thanks,
Zdenek

gcc/Changelog:

2016-04-04  Zdenek Sojka  

    PR tree-optimization/70509
    * tree-ssa-forwprop.c (simplify_bitfield_ref): Use bitsize_int () instead 
of the vector base type for index.


gcc/testsuite/Changelog:

2016-04-04  Zdenek Sojka  

    PR tree-optimization/70509
    * gcc.target/i386/avx512bw-pr70509.c: New.
=Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c	(revision 234705)
+++ gcc/tree-ssa-forwprop.c	(working copy)
@@ -1773,7 +1773,7 @@
 
   if (code == VEC_PERM_EXPR)
 {
-  tree p, m, index, tem;
+  tree p, m, tem;
   unsigned nelts;
   m = gimple_assign_rhs3 (def_stmt);
   if (TREE_CODE (m) != VECTOR_CST)
@@ -1790,9 +1790,8 @@
 	  p = gimple_assign_rhs2 (def_stmt);
 	  idx -= nelts;
 	}
-  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
   tem = build3 (BIT_FIELD_REF, TREE_TYPE (op),
-		unshare_expr (p), op1, index);
+		unshare_expr (p), op1, bitsize_int (idx * size));
   gimple_assign_set_rhs1 (stmt, tem);
   fold_stmt (gsi);
   update_stmt (gsi_stmt (*gsi));
Index: gcc/testsuite/gcc.target/i386/avx512bw-pr70509.c
===
--- gcc/testsuite/gcc.target/i386/avx512bw-pr70509.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr70509.c	(working copy)
@@ -0,0 +1,26 @@
+/* PR tree-optimization/70509 */
+/* { dg-do run } */
+/* { dg-options "-O1 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#define AVX512BW
+#include "avx512f-helper.h"
+
+typedef char V __attribute__ ((vector_size (64)));
+
+int __attribute__ ((noinline, noclone))
+foo (V u, V v)
+{
+  u /= v[0x20];
+  return u[0];
+}
+
+void
+TEST (void)
+{
+  int x = foo ((V) { 9 }, (V) { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+3 });
+  if (x != 3)
+abort ();
+}

Re: [patch] Fortran fix for PR70289

2016-04-04 Thread Cesar Philippidis

On 04/01/2016 08:17 AM, Jakub Jelinek wrote:
> On Fri, Apr 01, 2016 at 08:07:24AM -0700, Cesar Philippidis wrote:
>> On 04/01/2016 07:56 AM, Jakub Jelinek wrote:
>>> On Fri, Apr 01, 2016 at 07:49:16AM -0700, Cesar Philippidis wrote:
 The bug in PR70289 is an assertion failure triggered by a static
 variable used inside an offloaded acc region which doesn't have a data
 clause associated with it. Basically, that static variable ends up in a
 different lto partition, which was not streamed to the offloaded
 compiler. I'm not sure if we should try to replicate the static storage
 in the offloaded regions, but it probably doesn't make sense in a
 parallel environment anyway.
>>>
>>> Is this really Fortran specific?  I'd expect the diagnostics to be in
>>> gimplify.c and handle it for all 3 FEs...
>>
>> By the time the variable reaches the gimplifier, the reduction variable
>> may no longer match the ones inside the data clause. E.g. consider this
>> directive inside a fortran subroutine:
>>
>>   !$acc parallel copyout(temp) reduction(+:temp)
>>
>> The gimplifier would see something like:
>>
>>   map(force_from:*temp.2 [len: 4]) map(alloc:temp [pointer assign, bias:
>> 0]) reduction(+:temp)
>>
>> At this point, unless I'm mistaken, it would be difficult to tell if
>> temp.2 is a pointer to the same temp in the reduction. Maybe I'm missing
>> something?
> 
> All the info is still there, and this wouldn't be the only case where
> we rely on exact clause ordering.  I think that is still much better than

The gimplify approach didn't turn out to be that bad after all. Is this
patch ok for trunk? It fixes the problem for all fo the FEs.

Cesar
2016-04-04  Cesar Philippidis  

	gcc/
	* gimplify.c (gimplify_adjust_acc_parallel_reductions): New function.
	(gimplify_omp_workshare): Call it.  Add new data clauses for acc
	parallel reductions as needed.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-5.c: New test.
	* c-c++-common/goacc/reduction-promotions.c: New test.
	* gfortran.dg/goacc/reduction-3.f95: New test.
	* gfortran.dg/goacc/reduction-promotions.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70289.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr70373.c: New test.
	* testsuite/libgomp.oacc-fortran/pr70289.f90: New test.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b9757db..4625881 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9484,6 +9484,108 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   OMP_TARGET_CLAUSES (target) = c;
 }
 
+/* OpenACC parallel reductions need a present_or_copy clause to ensure
+   that the original variable used in the reduction gets updated on
+   the host.  This function scans CLAUSES for reductions and adds or
+   adjusts the data clauses as necessary.  Any incompatible data clause
+   will be reported as a warning and promoted to present_or_copy.  Any
+   private reduction will be treated as an error.  This function
+   returns a list of new present_or_copy date clauses.  */
+
+static tree
+gimplify_adjust_acc_parallel_reductions (tree *clauses)
+{
+  tree c, list = NULL_TREE;
+  hash_set *reduction_decls;
+  reduction_decls = new hash_set;
+
+  /* Scan 1: Construct a hash set with all of the reduction decls.  */
+  for (c = *clauses; c; c = OMP_CLAUSE_CHAIN (c))
+{
+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+	reduction_decls->add (OMP_CLAUSE_DECL (c));
+}
+
+  if (reduction_decls->elements () == 0)
+goto cleanup;
+
+  /* Scan 2: Adjust the data clause for each reduction.  */
+  for (c = *clauses; c; c = OMP_CLAUSE_CHAIN (c))
+{
+  int kind = -1;
+  tree decl;
+
+  switch (OMP_CLAUSE_CODE (c))
+	{
+	case OMP_CLAUSE_MAP:
+	  kind = OMP_CLAUSE_MAP_KIND (c);
+	case OMP_CLAUSE_PRIVATE:
+	case OMP_CLAUSE_FIRSTPRIVATE:
+	  decl = OMP_CLAUSE_DECL (c);
+
+	  /* Reference variables always have a GOMP_MAP_ALLOC.  Ignore it.  */
+	  if (POINTER_TYPE_P (TREE_TYPE (decl))
+	  && kind == GOMP_MAP_ALLOC)
+	break;
+
+	  if (!DECL_P (decl))
+	decl = TREE_OPERAND (decl, 0);
+	  gcc_assert (DECL_P (decl));
+
+	  if (!reduction_decls->contains (decl))
+	break;
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP)
+	{
+	  if (!((kind & GOMP_MAP_TOFROM) == GOMP_MAP_TOFROM
+		|| kind == GOMP_MAP_FORCE_PRESENT))
+		{
+		  warning_at (OMP_CLAUSE_LOCATION (c), 0, "incompatible data "
+			  "clause with reduction on %qE; promoting to "
+			  "present_or_copy", DECL_NAME (decl));
+
+		  OMP_CLAUSE_CODE (c) = OMP_CLAUSE_MAP;
+		  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+		}
+	  reduction_decls->remove (decl);
+	  break;
+	}
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
+	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+	{
+	  error_at (OMP_CLAUSE_LOCATION (c), "invalid private reduction "
+			  "on %qE", DECL_NAME (decl));
+	  reduction_decls->remove (decl);
+	}
+	default:;
+	}
+}
+
+  if (reduction_decls->ele

RE: [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)

2016-04-04 Thread Matthew Fortune

Hi Robert,

Apologies for the long delay again. This patch is hard to get through. My 
comments
are not all in source sequence but I've tried to keep them short. With a few 
minor
things fixed and some trivial style issues done then this is ready to go. I've 
left
a number of things to look at after getting this patch in as I can't track any 
more
significant updates to this:

> mips_gen_const_int_vector
This should use gen_int_for_mode instead of GEN_INT to avoid the issues that 
msa_ldi is
trying to handle.

> mips_const_vector_same_bytes_p
comment on this function is same as previous function

> mips_msa_idiv_insns
Why not just update mips_idiv_insns and add a mode argument?

> Implement TARGET_PRINT_OPERAND.
Comment spacing between 'E' 'B' and description is different to existing

> mips_print_operand
case 'v' subcases V4SImode and V4SFmode are identical. same for DI/DF.

>@@ -12272,13 +12837,25 @@ mips_class_max_nregs (enum reg_class rclass, 
>machine_mode mode)
>   if (hard_reg_set_intersect_p (left, reg_class_contents[(int) ST_REGS]))
> {
>   if (HARD_REGNO_MODE_OK (ST_REG_FIRST, mode))
>-  size = MIN (size, 4);
>+  {
>+if (MSA_SUPPORTED_MODE_P (mode))
>+  size = MIN (size, UNITS_PER_MSA_REG);
>+else
>+  size = MIN (size, UNITS_PER_FPREG);
>+  }
>+

This hunk should be removed. MSA modes are not supported in ST_REGS.

>@@ -12299,6 +12876,10 @@ mips_cannot_change_mode_class (machine_mode from,
>   && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
> return false;
> 
>+  /* Allow conversions between different MSA vector modes and TImode.  */

Remove 'and TImode' we do not support it.

>@@ -19497,9 +21284,64 @@ mips_expand_vec_unpack (rtx operands[2], bool 
>unsigned_p, bool high_p)
>+if (!unsigned_p)
>+{
>+  /* Extract sign extention for each element comparing each element with
>+   immediate zero.  */
>+  tmp = gen_reg_rtx (imode);
>+  emit_insn (cmpFunc (tmp, operands[1], CONST0_RTX (imode)));
>+}
>+else
>+{
>+  tmp = force_reg (imode, CONST0_RTX (imode));
>+}

Indentation and unnecessary braces on the else.

+   A single N-word move is usually the same cost as N single-word moves.
+   For MSA, we set MOVE_MAX to 16 bytes.
+   Then, MAX_MOVE_MAX is 16 unconditionally.  */
+#define MOVE_MAX (TARGET_MSA ? 16 : UNITS_PER_WORD)
+#define MAX_MOVE_MAX 16

The 16 here should be UNITS_PER_MSA_REG

> mips_expand_builtin_insn

General comment about operations that take an immediate. There is code to 
perform range
checking but it does not seem to leave any trail when the maybe_expand_insn 
fails to 
tell the user it was an out of range immediate that was the problem. (follow up 
work)

>+case CODE_FOR_msa_andi_b:
>+case CODE_FOR_msa_ori_b:
>+case CODE_FOR_msa_nori_b:
>+case CODE_FOR_msa_xori_b:
>+  gcc_assert (has_target_p && nops == 3);
>+  if (!CONST_INT_P (ops[2].value))
>+  break;
>+  ops[2].mode = ops[0].mode;
>+  /* We need to convert the unsigned value to signed.  */
>+  val = sext_hwi (INTVAL (ops[2].value),
>+GET_MODE_UNIT_PRECISION (ops[2].mode));
>+  ops[2].value = mips_gen_const_int_vector (ops[2].mode, val);
>+  break

Isn't the sext_hwi just effectively doing what gen_int_for_mode would? Fixing
mips_gen_const_int_vector would eliminate all of them.

>@@ -527,7 +551,9 @@ (define_attr "insn_count" ""
>(const_int 2)
> 
>(eq_attr "type" "idiv,idiv3")
>-   (symbol_ref "mips_idiv_insns ()")
>+   (cond [(eq_attr "mode" "TI")
>+  (symbol_ref "mips_msa_idiv_insns () * 4")]
>+  (symbol_ref "mips_idiv_insns () * 4"))

Why *4?

>@@ -1537,8 +1553,10 @@ FP_ASM_SPEC "\
> #define LONG_LONG_ACCUM_TYPE_SIZE (TARGET_64BIT ? 128 : 64)
> 
> /* long double is not a fixed mode, but the idea is that, if we
>-   support long double, we also want a 128-bit integer type.  */
>-#define MAX_FIXED_MODE_SIZE LONG_DOUBLE_TYPE_SIZE
>+   support long double, we also want a 128-bit integer type.
>+   For MSA, we support an integer type with a width of BITS_PER_MSA_REG.  */
>+#define MAX_FIXED_MODE_SIZE \
>+  (TARGET_MSA ? BITS_PER_MSA_REG : LONG_DOUBLE_TYPE_SIZE)

This doesn't seem right. We don't support TImode with MSA.

>diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
>new file mode 100644
>index 000..79e382d
>--- /dev/null
>+++ b/gcc/config/mips/mips-msa.md
>@@ -0,0 +1,2725 @@
>+(define_insn "msa_copy_s_"
>+  [(set (match_operand: 0 "register_operand" "=d")
>+  (vec_select:
>+(match_operand:MSA_W 1 "register_operand" "f")
>+(parallel [(match_operand 2 "const__operand" "")])))]
>+  "ISA_HAS_MSA"
>+  "copy_s.\t%0,%w1[%2]"
>+  [(set_attr "type" "simd_copy")
>+   (set_attr "mode" "")])

There is a sign extend version of this pattern needed for TARGET_64BIT widening
to DImode.

>+(define_expand "msa_ldi"
>+  [(match_operand:IMSA 0 "register_operand")
>+   (match_operan

Re: [patch] Fix PR target/67172

2016-04-04 Thread Eric Botcazou

> That's t-cygming, not t-cygwin, right?

Right, sorry about that.

> So, I think this is ok.

Thanks!

-- 
Eric Botcazou

Re: [PATCH] 69517 - [5/6 regression] SEGV on a VLA with excess initializer elements

2016-04-04 Thread Jason Merrill


On 04/01/2016 07:02 PM, Martin Sebor wrote:

Fair enough.  I don't think we can impose an arbitrary 64K limit,
however, as that is a lot smaller than the 8MB default stack size, and
programs can use setrlimit to increase the stack farther.  For GCC 6 let
not impose any limit beyond non-negative/overflowing, and as you say we
can do something better in GCC 7.


Would you be open to imposing a somewhat more permissive limit,
say on the order of one or a few megabytes (but less than the
default 8 MB on Linux)?

I ask because I expect the majority of programmer errors with
VLAs to be due to out-of-range bounds that couldn't be
accommodated even if stack space was extended well beyond
the Linux default (say hundreds of MB), or that would result
in complete stack space exhaustion.  I.e., not caused by
deliberately trying to create very large VLAs but rather by
accidentally creating VLAs with unconstrained bounds (due to
a failure to validate input, uninitialized unsigned variables,
etc.)

I expect fewer cases to be due to negative or zero bounds or
excessive initializers.

I also expect programmers to want to find out about such bugs
sooner (i.e., in unit testing with plentiful stack space) rather
than after software has been deployed (and under low stack space
conditions not exercised during unit testing).

To that end, I think a lower limit is going to be more helpful
than a permissive one (or none at all).



But if even a few MB seems too strict, I would find having even
an exceedingly liberal limit (say 1GB) much preferable to none
at all as it makes it possible to exercise boundary conditions
such as the size overflow problem you noted below.


That sounds reasonable, as long as users with unusual needs can adjust 
it with a flag, but even so I'm nervous about doing this in stage 4.  It 
certainly isn't a regression.



During additional testing it dawned on me that there is no good
way to validate (or even to initialize) the initializer list of
a multi-dimensional VLA that isn't unambiguously braced.

For example, the following VLA with N = 2 and N = 3:

 int A [M][N] = { 1, 2, 3, 4 };

Unpatched, GCC initializes it to { { 1, 2, 3 }, { 0, 0, 0 } }.
With my first patch, GCC throws.  Neither is correct, but doing
the right thing would involve emitting parameterizing the
initialization code for the value of each bound.  While that
might be doable it feels like a bigger change than I would be
comfortable attempting at this stage.  To avoid the problem I've
made it an error to specify a partially braced VLA initializer.


Sounds good.


If you think it's worthwhile, I can see about implementing the
runtime reshaping in stage 1.


No, thanks.  I think requiring explicit bracing is fine.


It seems to me that we want use the existing check for excess
initializers in build_vec_init, in the if (length_check) section, though
as you mention in 70019 that needs to be improved to handle STRING_CST.


I don't think modifying build_vec_init() alone would be sufficient.
For example, the function isn't called for a VLA with a constant
bound like this one:

  int A [2][N] = { 1, 2, 3, 4 };


That seems like a bug, due to array_of_runtime_bound_p returning false 
for that array.



Also, I think we should check for invalid bounds in
compute_array_index_type, next to the UBsan code.  Checking bounds only
from cp_finish_decl means that we don't check uses of VLA types other
than variable declarations.


You mean VLA typedefs?  That's true, though I have consciously
avoided dealing with those.  They're outlawed in N3639 and so
I've been focusing just on variables.  But since GCC accepts
VLA typedefs too I was thinking I would bring them up at some
point in the future to decide what to do about them.


And cast to pointer to VLAs.  But for non-variable cases we don't care 
about available stack, so we wouldn't want your allocation limit to apply.



As for where to add the bounds checking code, I also at first
thought of checking the bounds parallel to the UBSan code in
compute_array_index_type() and even tried that approach. The
problem with it is that it only considers one array dimension
at a time, without the knowledge of the others.  As a result,
as noted in sanitizer/70051, it doesn't correctly detect
overflows in the bounds of multidimensional VLAs.


It doesn't, but I don't see why it couldn't.  It should be fine to check 
each dimension for overflow separately; if an inner dimension doesn't 
overflow, we can go on and consider the outer dimension.


Incidentally, I was wondering if it would make sense to use the 
overflowing calculation for both TYPE_SIZE and the sanity check when 
we're doing both.



+  /* Avoid instrumenting constexpr functions.  Those must
+ be checked statically, and the (non-constexpr) dynamic
+ instrumentation would cause them to be rejected.  */



Hmm, this sounds wrong; constexpr functions can also be called with
non-constant arguments, and the instrumentation sho

Re: [PATCH] Remove class cache_map and use ggc hash_maps instead (PR c++/70452)

2016-04-04 Thread Jason Merrill


On 04/04/2016 01:08 PM, Patrick Palka wrote:

+  tree *slot = &cv_cache->get_or_insert (t, NULL);
+  if (*slot == NULL_TREE)
+*slot = maybe_constant_value_1 (t, decl);

...

-  fold_cache.put (org_x, x);
+  *slot = x;


This pattern isn't safe; the slot might move due to hash table resizing 
between the get_or_insert and the store to *slot.  Just use get/put.


Jason

Re: [PATCH] Remove class cache_map and use ggc hash_maps instead (PR c++/70452)

2016-04-04 Thread Jason Merrill


Oops, I should have read the followup.  :)

This one is OK, thanks.

Jason

68 matches

Mail list logo