date:20150218

Re: The nvptx port [0/11+]

2015-02-18 Thread Thomas Schwinge

Hi!

On Mon, 20 Oct 2014 16:17:56 +0200, Bernd Schmidt  
wrote:
> This is a patch kit that adds the nvptx port to gcc.

Committed to trunk in r220781:

commit 0f7695734890f93fe58179e36ac2f41bf4147d78
Author: tschwinge 
Date:   Wed Feb 18 08:01:03 2015 +

nvptx-none: Disable the lto-plugin.

config/
* elf.m4 (ACX_ELF_TARGET_IFELSE): nvptx-*-none isn't ELF.
/
* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220781 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog        |    4 ++++
 config/ChangeLog |    4 ++++
 config/elf.m4    |    7 +++++--
 configure        |    3 ++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git ChangeLog ChangeLog
index 0969af5..a9e4437 100644
--- ChangeLog
+++ ChangeLog
@@ -1,3 +1,7 @@
+2015-02-18  Thomas Schwinge  
+
+   * configure: Regenerate.
+
 2015-02-06  Diego Novillo  
 
* MAINTAINERS (Global Reviewers, Plugin, LTO, tree-ssa,
diff --git config/ChangeLog config/ChangeLog
index 2cbc885..c9ed121 100644
--- config/ChangeLog
+++ config/ChangeLog
@@ -1,3 +1,7 @@
+2015-02-18  Thomas Schwinge  
+
+   * elf.m4 (ACX_ELF_TARGET_IFELSE): nvptx-*-none isn't ELF.
+
 2014-11-17  Bob Dunlop  
 
* mt-ospace (CFLAGS_FOR_TARGET): Append -g -Os rather than
diff --git config/elf.m4 config/elf.m4
index da051cb..1772a44 100644
--- config/elf.m4
+++ config/elf.m4
@@ -1,4 +1,4 @@
-dnl Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+dnl Copyright (C) 2010, 2011, 2015 Free Software Foundation, Inc.
 dnl This file is free software, distributed under the terms of the GNU
 dnl General Public License.  As a special exception to the GNU General
 dnl Public License, this file may be distributed as part of a program
@@ -7,6 +7,8 @@ dnl the same distribution terms as the rest of that program.
 
 dnl From Paolo Bonzini.
 
+dnl Is this an ELF target supporting the LTO plugin?
+
 dnl usage: ACX_ELF_TARGET_IFELSE([if-elf], [if-not-elf])
 AC_DEFUN([ACX_ELF_TARGET_IFELSE], [
 AC_REQUIRE([AC_CANONICAL_TARGET])
@@ -15,7 +17,8 @@ target_elf=no
 case $target in
   *-darwin* | *-aix* | *-cygwin* | *-mingw* | *-aout* | *-*coff* | \
   *-msdosdjgpp* | *-vms* | *-wince* | *-*-pe* | \
-  alpha*-dec-osf* | *-interix* | hppa[[12]]*-*-hpux*)
+  alpha*-dec-osf* | *-interix* | hppa[[12]]*-*-hpux* | \
+  nvptx-*-none)
 target_elf=no
 ;;
   *)
diff --git configure configure
index dd794db..f20a6ab 100755
--- configure
+++ configure
@@ -6047,7 +6047,8 @@ target_elf=no
 case $target in
   *-darwin* | *-aix* | *-cygwin* | *-mingw* | *-aout* | *-*coff* | \
   *-msdosdjgpp* | *-vms* | *-wince* | *-*-pe* | \
-  alpha*-dec-osf* | *-interix* | hppa[12]*-*-hpux*)
+  alpha*-dec-osf* | *-interix* | hppa[12]*-*-hpux* | \
+  nvptx-*-none)
 target_elf=no
 ;;
   *)


Grüße,
 Thomas



Re: [Haifa Scheduler] Fix latent bug in macro-fusion/instruction grouping

2015-02-18 Thread Maxim Kuvyrkov

On Feb 17, 2015, at 9:43 AM, Jeff Law  wrote:

> On 02/11/15 02:20, James Greenhalgh wrote:
>> 
>> On Mon, Feb 09, 2015 at 11:16:56PM +, Jeff Law wrote:
>>> On 02/06/15 05:24, James Greenhalgh wrote:

>>>> ---
>>>> 2015-02-06  James Greenhalgh  
>>>>
>>>>  * haifa-sched.c (recompute_todo_spec): After applying a
>>>>  replacement and cancelling a dependency, also clear the
>>>>  SCHED_GROUP_P flag.
>>> My worry here would be that we might be clearing a SCHED_GROUP_P that
>>> had been set for some reason other than macro-fusion.
>> 
>> Yeah, I also had this worry. This patch tackles the problem from the
>> other direction. If we see a SCHED_GROUP_P on an insn, treat it as a
>> hard dependency, and don't try to rewrite it. I think this will always
>> be "safe" but it might pessimize if the dependency breaker would have
>> resulted in better code generation.
>> 
>> I don't think this gives you anything towards fixing your bug, but
>> it clears mine.
> Right.  Mine was in the management of the ready queue.  We allowed something 
> with SCHED_GROUP_P to get deferred for several cycles.  While it was deferred 
> another insn that was previously deferred became ready and fired.  That 
> messed up the scheduling group and ultimately resulted in incorrect code.
> 
> The fix was actually pretty simple, We just queue the SCHED_GROUP_P for a 
> single cycle, then reevaluate.

The way SCHED_GROUP_P instructions have been handled historically is by 
combination of two artifacts: (1) removing all dependencies for instructions 
inside SCHED_GROUP sequence but the one to next insn, and (2) maintaining a 
fast track for SCHED_GROUP insns that ensures that once the first SCHED_GROUP 
insn is issued, scheduler does nothing but issuing the single dependent insn of 
the current one.

The side effect of (1) is that scheduling SCHED_GROUP in the normal flow will 
cause correctness problems (what Jeff is seeing) since some/most of the 
dependencies of SCHED_GROUP_P insn were removed.

My educated guess is that the problem was introduced by Bernd's major reworking 
of the scheduler.  The enforcement of (2) is now done in prune_ready_list, 
which doesn't seem to handle a couple of corner cases.  One corner case is what 
happens when a SCHED_GROUP insn is delayed for several cycles.  The second one 
(that I know of) is what happens if the first instructions of two or more 
separate SCHED_GROUPs become ready in the same cycle.

The first corner case, I believe, used to be handled with help of 
last_scheduled_insn, which can't be used reliably anymore due to backtracking.  
The second corner case, I believe, was never handled properly.

> 
>> 
>> I've bootstrapped and tested on x86_64-unknown-linux-gnu with no
>> issues and given it a quick check on the problem code from before,
>> where it has the desired impact.
>> 
>> Thanks,
>> James
>> 
>> ---
>> 2015-02-10  James Greenhalgh  
>> 
>>  * haifa-sched.c (recompute_todo_spec): Treat SCHED_GROUP_P
>>  as forcing a HARD_DEP between instructions, thereby
>>  disallowing rewriting to break dependencies.
> OK.
> jeff

The patch looks good to me too.  Once SCHED_GROUP_P is set on an insn, it 
becomes untouchable due to lack of complete dependency information.

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH, AArch64] Handle SYMBOL_SMALL_TPREL appropriately

2015-02-18 Thread Marcus Shawcroft

On 18 February 2015 at 04:45, Hurugalawadi, Naveen
 wrote:
> Hi Marcus,
>
> Thanks for the review.
>
>>> OK, but fix the trailing white space in the patch
>
> Done. Committed with the modification.
>
>>>  Can you prepare a backport into 4.9
>
> ILP32 support is not completely added in 4.9 and hence the patch
> is not needed.

The handling of SYMBOL_SMALL_TPREL is present in 4.9 and very clearly
has exactly the same issue.

[PATCH][C++] PR 65071

2015-02-18 Thread Andrea Azzarone

Hi all,

this patch tries to fix PR c++/65071 (ICE on valid, sizeof...() of
template template parameter pack in return type).

2015-2-18 Andrea Azzarone 
  PR c++/65071
  * gcc/cp/parser.c (cp_parser_sizeof_pack): Also consider template
template parameters.

Thanks.

-- 
Andrea Azzarone

Re: [PATCH][C++] PR 65071

2015-02-18 Thread Andrea Azzarone

Oops, forgot the diff.

2015-02-18 9:19 GMT+01:00 Andrea Azzarone :
> Hi all,
>
> this patch tries to fix PR c++/65071 (ICE on valid, sizeof...() of
> template template parameter pack in return type).
>
> 2015-2-18 Andrea Azzarone 
>   PR c++/65071
>   * gcc/cp/parser.c (cp_parser_sizeof_pack): Also consider template
> template parameters.
>
> Thanks.
>
> --
> Andrea Azzarone



-- 
Andrea Azzarone
http://launchpad.net/~andyrock
http://wiki.ubuntu.com/AndreaAzzarone
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c	(revision 220698)
+++ gcc/cp/parser.c	(working copy)
@@ -24369,7 +24369,7 @@ cp_parser_sizeof_pack (cp_parser *parser
   if (expr == error_mark_node)
 cp_parser_name_lookup_error (parser, name, expr, NLE_NULL,
  token->location);
-  if (TREE_CODE (expr) == TYPE_DECL)
+  if (TREE_CODE (expr) == TYPE_DECL || TREE_CODE (expr) == TEMPLATE_DECL)
 expr = TREE_TYPE (expr);
   else if (TREE_CODE (expr) == CONST_DECL)
 expr = DECL_INITIAL (expr);
Index: gcc/testsuite/g++.dg/cpp0x/vt-65071.C
===
--- gcc/testsuite/g++.dg/cpp0x/vt-65071.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/vt-65071.C	(working copy)
@@ -0,0 +1,9 @@
+// PR c++/65071
+// { dg-do compile { target c++11 } }
+
+template<int> struct S {};
+
+template<template<int> class... T, int N>
+S<sizeof...(T)> foo(T<N>...);
+
+auto x = foo(S<2>{});

nvptx mkoffload: For non-installed testing, look in all COMPILER_PATHs for GCC_INSTALL_NAME (was: nvptx offloading patches [4/n])

2015-02-18 Thread Thomas Schwinge

Hi!

On Sat, 1 Nov 2014 13:11:29 +0100, Bernd Schmidt  
wrote:
> [nvptx mkoffload]

To support the --enable-offload-targets=nvptx-none=[install directory]
configuration option, I committed the following to trunk in r220782 (and
filed ):

commit a7243b5200794d53b01d59fa69d467a0545db73f
Author: tschwinge 
Date:   Wed Feb 18 08:17:32 2015 +

nvptx mkoffload: For non-installed testing, look in all COMPILER_PATHs for 
GCC_INSTALL_NAME.

gcc/
* config/nvptx/mkoffload.c (parse_env_var, free_array_of_ptrs)
(access_check): New functions, copied from
config/i386/intelmic-mkoffload.c.
(main): For non-installed testing, look in all COMPILER_PATHs for
GCC_INSTALL_NAME.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220782 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|6 +++
 gcc/config/nvptx/mkoffload.c |  103 ++
 2 files changed, 109 insertions(+)

diff --git gcc/ChangeLog gcc/ChangeLog
index 180a605..0f144f5 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,11 @@
 2015-02-18  Thomas Schwinge  
 
+   * config/nvptx/mkoffload.c (parse_env_var, free_array_of_ptrs)
+   (access_check): New functions, copied from
+   config/i386/intelmic-mkoffload.c.
+   (main): For non-installed testing, look in all COMPILER_PATHs for
+   GCC_INSTALL_NAME.
+
* config/nvptx/nvptx.h (GOMP_SELF_SPECS): Define macro.
 
 2015-02-18  Andrew Pinski  
diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
index 96341b8..02c44b6 100644
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -762,6 +762,78 @@ parse_file (Token *tok)
   return tok;
 }
 
+/* Parse STR, saving found tokens into PVALUES and return their number.
+   Tokens are assumed to be delimited by ':'.  */
+static unsigned
+parse_env_var (const char *str, char ***pvalues)
+{
+  const char *curval, *nextval;
+  char **values;
+  unsigned num = 1, i;
+
+  curval = strchr (str, ':');
+  while (curval)
+{
+  num++;
+  curval = strchr (curval + 1, ':');
+}
+
+  values = (char **) xmalloc (num * sizeof (char *));
+  curval = str;
+  nextval = strchr (curval, ':');
+  if (nextval == NULL)
+nextval = strchr (curval, '\0');
+
+  for (i = 0; i < num; i++)
+{
+  int l = nextval - curval;
+  values[i] = (char *) xmalloc (l + 1);
+  memcpy (values[i], curval, l);
+  values[i][l] = 0;
+  curval = nextval + 1;
+  nextval = strchr (curval, ':');
+  if (nextval == NULL)
+   nextval = strchr (curval, '\0');
+}
+  *pvalues = values;
+  return num;
+}
+
+/* Auxiliary function that frees elements of PTR and PTR itself.
+   N is number of elements to be freed.  If PTR is NULL, nothing is freed.
+   If an element is NULL, subsequent elements are not freed.  */
+static void
+free_array_of_ptrs (void **ptr, unsigned n)
+{
+  unsigned i;
+  if (!ptr)
+return;
+  for (i = 0; i < n; i++)
+{
+  if (!ptr[i])
+   break;
+  free (ptr[i]);
+}
+  free (ptr);
+  return;
+}
+
+/* Check whether NAME can be accessed in MODE.  This is like access,
+   except that it never considers directories to be executable.  */
+static int
+access_check (const char *name, int mode)
+{
+  if (mode == X_OK)
+{
+  struct stat st;
+
+  if (stat (name, &st) < 0 || S_ISDIR (st.st_mode))
+   return -1;
+}
+
+  return access (name, mode);
+}
+
 static void
 process (FILE *in, FILE *out)
 {
@@ -853,6 +925,37 @@ main (int argc, char **argv)
 driver_used = sprintf (driver, "%s/", gcc_path);
   sprintf (driver + driver_used, "%s", GCC_INSTALL_NAME);
 
+  bool found = false;
+  if (gcc_path == NULL)
+found = true;
+  else if (access_check (driver, X_OK) == 0)
+found = true;
+  else
+{
+  /* Don't use alloca pointer with XRESIZEVEC.  */
+  driver = NULL;
+  /* Look in all COMPILER_PATHs for GCC_INSTALL_NAME.  */
+  char **paths = NULL;
+  unsigned n_paths;
+  n_paths = parse_env_var (getenv ("COMPILER_PATH"), &paths);
+  for (unsigned i = 0; i < n_paths; i++)
+   {
+ len = strlen (paths[i]) + 1 + strlen (GCC_INSTALL_NAME) + 1;
+ driver = XRESIZEVEC (char, driver, len);
+ sprintf (driver, "%s/%s", paths[i], GCC_INSTALL_NAME);
+ if (access_check (driver, X_OK) == 0)
+   {
+ found = true;
+ break;
+   }
+   }
+  free_array_of_ptrs ((void **) paths, n_paths);
+}
+
+  if (!found)
+fatal_error (input_location,
+"offload compiler %s not found", GCC_INSTALL_NAME);
+
   /* We may be called with all the arguments stored in some file and
  passed with @file.  Expand them into argv before processing.  */
   expandargv (&argc, &argv);


Grüße,
 Thomas



Re: patch to fix rtl documentation for new floating point comparisons

2015-02-18 Thread Andreas Schwab

Joseph Myers  writes:

> For example, on MIPS the C.cond.fmt instruction has a four-bit condition 
> field: "In the cond field of the instruction: cond 2..1 specify the nature 
> of the comparison (equals, less than, and so on); cond 0 specifies whether 
> the comparison is ordered or unordered, i.e. false or true if any operand 
> is a NaN; cond 3 indicates whether the instruction should signal an 
> exception on QNaN inputs, or not".  Together with possibly negating the 
> result you get all 32 possible comparisons (choice of whether the 
> comparison is true or false for each of = < > unordered, choice of whether 
> to raise invalid for quiet NaNs).

The m68k fpu has the same feature.
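
As a rough illustration of how such a condition field spans the comparison space, here is a minimal C sketch. The bit layout used (bit 0 = true on unordered, bit 1 = true on equal, bit 2 = true on less-than, bit 3 = raise invalid on quiet NaN) is an assumption modelled on the description quoted above, and `fcmp` and `struct fcmp_result` are hypothetical names, not anything taken from GCC or a hardware manual. Signaling NaNs are not modelled.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Outcome of one comparison: the boolean result, and whether the
   invalid-operation exception would be raised.  */
struct fcmp_result
{
  bool value;
  bool invalid;
};

/* Evaluate the comparison selected by a 4-bit COND field plus a NEGATE
   flag.  Assumed, illustrative bit layout:
     bit 0: result is true when the operands are unordered
     bit 1: result is true when A == B
     bit 2: result is true when A < B
     bit 3: raise invalid when either operand is a quiet NaN.  */
static struct fcmp_result
fcmp (unsigned cond, bool negate, double a, double b)
{
  bool unordered = isnan (a) || isnan (b);
  bool less = !unordered && a < b;
  bool equal = !unordered && a == b;
  struct fcmp_result r;

  r.value = ((cond & 1) && unordered)
            || ((cond & 2) && equal)
            || ((cond & 4) && less);
  if (negate)
    r.value = !r.value;
  r.invalid = (cond & 8) && unordered;
  return r;
}
```

Under these assumptions, sixteen `cond` values times the optional negation give the 32 combinations mentioned above: for example, `fcmp (6, true, a, b)` is "not (less or equal)", which is true for greater-than and for unordered operands and does not raise invalid.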

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)

2015-02-18 Thread Thomas Schwinge

Hi!

On Wed, 4 Feb 2015 10:43:14 +0100, Jakub Jelinek  wrote:
> On Mon, Feb 02, 2015 at 04:32:34PM +0100, Thomas Schwinge wrote:
> > Hi!
> > 
> > On Tue, 23 Dec 2014 19:49:35 +0100, I wrote:
> > > On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt 
> > >  wrote:
> > > > The scripts (11/11) I've put up on github, along with a hacked up 
> > > > newlib. These are at [...]
> > 
> > > > They are likely to migrate to MentorEmbedded from bernds, but that had 
> > > > some permissions problems last week.
> > > 
> > > That has recently been done:
> > >  and
> > >  are now available.
> > > 
> > > (I'm aware that we still are to write up how to actually build and test
> > > all this.)
> > 
> > I just updated
> > .
> 
> Can you please update the gmane URLs to corresponding
> https://gcc.gnu.org/ml/gcc-patches/ URLs?  We have our own mailing list
> archives, no need to use third party ones.

It's convenient for me (Message-IDs fall out of my mailer automatically,
and Gmane happens to support retrieving messages by Message-ID), and the
sourceware mailing list archives software doesn't interlink articles
between different -MM, which I find rather limiting.


> > OK to check in the following to trunk?

Committed to trunk in r220783.


> > --- gcc/config/nvptx/nvptx.opt
> > +++ gcc/config/nvptx/nvptx.opt
> > @@ -17,13 +17,13 @@
> >  ; along with GCC; see the file COPYING3.  If not see
> >  ; .
> >  
> > -m64
> > -Target Report RejectNegative Mask(ABI64)
> > -Generate code for a 64 bit ABI
> > -
> >  m32
> >  Target Report RejectNegative InverseMask(ABI64)
> > -Generate code for a 32 bit ABI
> > +Generate code for a 32-bit ABI
> > +
> > +m64
> > +Target Report RejectNegative Mask(ABI64)
> > +Generate code for a 64-bit ABI
> 
> I'd expect you want also Negative(m64) on the m32 option and
> Negative(m32) on the m64 option.
> 
> > +@table @gcctabopt
> > +
> > +@item -m32
> > +@itemx -m64
> > +@opindex m32
> > +@opindex m64
> > +Generate code for 32-bit or 64-bit ABI.
> 
> I guess you should mention which one of those is the default (if it isn't
> configure time configurable).

Have taken a note to look into these, later.


> What about multilibs, is newlib built for both -m32 and -m64, or just the
> default option?

So far, we have concentrated only on the 64-bit x86_64 configuration;
32-bit has several known issues to be resolved.
 filed.


Grüße,
 Thomas



Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 09:50:15AM +0100, Thomas Schwinge wrote:
> > What about multilibs, is newlib built for both -m32 and -m64, or just the
> > default option?
> 
> So far, we have concentrated only on the 64-bit x86_64 configuration;
> 32-bit has several known issues to be resolved.
>  filed.

I meant 64-bit and 32-bit PTX.

Jakub

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge

Hi!

On Mon, 16 Feb 2015 22:08:12 +0100, Jakub Jelinek  wrote:
> On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> > I think (also communicated that on IRC) we should instead try not streaming
> > machine-modes at all but generating them at stream-in time via layout_type
> > or layout_decl.
> 
> Here is a WIP prototype for being able to stream a machine mode description
> table and streaming it back in.  [...]

Many thanks for that!  (I had modified Bernd's patch to be less
intrusive, see attached, but of course that didn't resolve its design
problem.)

On Mon, 16 Feb 2015 22:43:49 +0100, Jakub Jelinek  wrote:
> [updated patch]

No regressions with
--enable-offload-targets=nvptx-none=[...],x86_64-intelmicemul-linux-gnu=[...].


Grüße,
 Thomas


commit 97a1ad0d3a96321ded8fad5e3a3cc75b46970bfa
Author: Thomas Schwinge 
Date:   Fri Feb 13 19:51:09 2015 +0100

Use the offload host CPU's modes.def when building an offloading compiler: make it less intrusive.

diff --git gcc/config.gcc gcc/config.gcc
index ebf0ee6..265ac0e 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -482,15 +482,15 @@ tilepro*-*-*)
 	;;
 esac
 
-offload_host_cpu_type=${cpu_type}
-if test "x${enable_as_accelerator}" != "xno"
-then
-	offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
-fi
-case ${offload_host_cpu_type} in
-x86_64)
-  offload_host_cpu_type=i386
-	  ;;
+modes_cpu_type=${cpu_type}
+case ${enable_as_accelerator}:${target} in
+yes:nvptx-*-*)
+	modes_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+	case ${modes_cpu_type} in
+	x86_64)
+		modes_cpu_type=i386
+		;;
+	esac
 esac
 
 tm_file=${cpu_type}/${cpu_type}.h
@@ -499,9 +499,9 @@ then
 	tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+if test -f ${srcdir}/config/${modes_cpu_type}/${modes_cpu_type}-modes.def
 then
-	extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+	extra_modes=${modes_cpu_type}/${modes_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
diff --git gcc/config/i386/i386-modes.def gcc/config/i386/i386-modes.def
index 766681b..0b6a1f1 100644
--- gcc/config/i386/i386-modes.def
+++ gcc/config/i386/i386-modes.def
@@ -24,9 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
-/* This file may be used when building a compiler for an offload target.
-   Assume that no special floating point options are used.  */
-#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -36,7 +33,6 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
 			  : &ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
-#endif
 
 /* Add any extra modes needed to represent the condition code.
 
diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index 9a9954b..c0d97ee 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -64,6 +64,14 @@
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64
 
+#ifdef ACCEL_COMPILER
+/* For ../i386/i386-modes.def.  */
+/* See ../i386/unix.h:TARGET_SUBTARGET64_DEFAULT.  */
+# define TARGET_128BIT_LONG_DOUBLE (TARGET_ABI64)
+/* See ../i386/i386.h:TARGET_96_ROUND_53_LONG_DOUBLE.  */
+# define TARGET_96_ROUND_53_LONG_DOUBLE 0
+#endif
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
 #undef PTRDIFF_TYPE



Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge

Hi!

On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek  wrote:
> On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected 
> > > it
> > > would be built in-tree, is that not the case (at least wiki/Offloading
> > > mentions that).

> configure:4261: checking for C compiler default output file name
> configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc 
> -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc 
> -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
> /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
> /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
> -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem 
> /usr/local/nvptx-none/sys-include-g -O2   conftest.c  >&5
> error opening libc.a
> collect2: error: ld returned 1 exit status
> very early during in-tree newlib configure.

Do you literally have »nvptx-newlib symlinked into the gcc tree as
newlib«?  If yes, then that should explain the problem: as I wrote in
,
you need to »add a symbolic link to nvptx-newlib's newlib directory to
the directory containing the GCC sources«, so not link [GCC]/newlib ->
[newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
resolve the issue?


Grüße,
 Thomas



[PATCH] Fix IFN_OBJECT_SIZE expansion (PR sanitizer/65081)

2015-02-18 Thread Marek Polacek

We're lacking the POINTER_DIFF_EXPR, which means that ptr - 1 is in fact
ptr + very_big_number.  This can result in a bogus run-time error when the
objsz checking is turned on.  Jakub suggested not issuing the error
if (ptr > ptr + offset) is true.  So this patch attempts to do that, along
with some optimizations for the common case.
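
The run-time decision described above can be sketched in plain C, outside of GIMPLE. This is only an illustrative model: `objsize_violation` is a hypothetical helper name, and the pointer is represented as a `uintptr_t` so that the wrap-around is well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when an object-size error should be reported for an
   access at PTR + OFFSET into an object of OBJSIZE bytes.  OFFSET is
   unsigned, so "ptr - 1" arrives here as ptr + (very big number).  */
static bool
objsize_violation (uintptr_t ptr, size_t offset, size_t objsize)
{
  /* First check: the offset does not exceed the object size.  */
  if (offset <= objsize)
    return false;

  /* Second check: if the addition wraps around, then ptr > ptr + offset,
     which means the offset was really negative; do not report that.  */
  uintptr_t end = ptr + (uintptr_t) offset;
  return ptr <= end;
}
```

With this model, `objsize_violation (1000, (size_t) -8, 16)` is false because the sum wraps, while `objsize_violation (1000, 32, 16)` is true. The patch additionally skips the second check at compile time when the offset is a small constant.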

Bootstrap-ubsan passed, bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-02-17  Marek Polacek  

PR sanitizer/65081
* ubsan.c (OBJSZ_MAX_OFFSET): Define.
(ubsan_expand_objsize_ifn): Don't emit run-time check if the offset
is in range [-16K, -1].  Don't issue run-time error if
(ptr > ptr + offset).

* c-c++-common/ubsan/pr65081.c: New test.

diff --git gcc/testsuite/c-c++-common/ubsan/pr65081.c 
gcc/testsuite/c-c++-common/ubsan/pr65081.c
index e69de29..a1123fd 100644
--- gcc/testsuite/c-c++-common/ubsan/pr65081.c
+++ gcc/testsuite/c-c++-common/ubsan/pr65081.c
@@ -0,0 +1,26 @@
+/* PR sanitizer/65081 */
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
+/* { dg-options "-fsanitize=object-size -fno-sanitize-recover=object-size" } */
+
+struct S
+{
+  int a;
+  char p[1];
+};
+
+struct S b;
+
+struct S *
+foo ()
+{
+  struct S *i = &b;
+  return i + 1;
+}
+
+int
+main (void)
+{
+  struct S *i = foo () - 1;
+  i->a = 1;
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index fc3352f..652d46f 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -920,6 +920,8 @@ ubsan_expand_null_ifn (gimple_stmt_iterator *gsip)
   return false;
 }
 
+#define OBJSZ_MAX_OFFSET (1024 * 16)
+
 /* Expand UBSAN_OBJECT_SIZE internal call.  */
 
 bool
@@ -941,6 +943,10 @@ ubsan_expand_objsize_ifn (gimple_stmt_iterator *gsi)
   || integer_all_onesp (size))
 /* Yes, __builtin_object_size couldn't determine the
object size.  */;
+  else if (TREE_CODE (offset) == INTEGER_CST
+  && wi::ges_p (wi::to_widest (offset), -OBJSZ_MAX_OFFSET)
+  && wi::les_p (wi::to_widest (offset), -1))
+/* The offset is in range [-16K, -1].  */;
   else
 {
   /* if (offset > objsize) */
@@ -952,8 +958,42 @@ ubsan_expand_objsize_ifn (gimple_stmt_iterator *gsi)
   gimple_set_location (g, loc);
   gsi_insert_after (&cond_insert_point, g, GSI_NEW_STMT);
 
+  /* If the offset is small enough, we don't need the second
+run-time check.  */
+  if (TREE_CODE (offset) == INTEGER_CST
+ && wi::ges_p (wi::to_widest (offset), 0)
+ && wi::les_p (wi::to_widest (offset), OBJSZ_MAX_OFFSET))
+   *gsi = gsi_after_labels (then_bb);
+  else
+   {
+ /* Don't issue run-time error if (ptr > ptr + offset).  That
+may happen when computing a POINTER_PLUS_EXPR.  */
+ basic_block then2_bb, fallthru2_bb;
+
+ gimple_stmt_iterator gsi2 = gsi_after_labels (then_bb);
+ cond_insert_point = create_cond_insert_point (&gsi2, false, false,
+   true, &then2_bb,
+   &fallthru2_bb);
+ /* Convert the pointer to an integer type.  */
+ tree p = make_ssa_name (pointer_sized_int_node);
+ g = gimple_build_assign (p, NOP_EXPR, ptr);
+ gimple_set_location (g, loc);
+ gsi_insert_before (&cond_insert_point, g, GSI_NEW_STMT);
+ p = gimple_assign_lhs (g);
+ /* Compute ptr + offset.  */
+ g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
+  PLUS_EXPR, p, offset);
+ gimple_set_location (g, loc);
+ gsi_insert_after (&cond_insert_point, g, GSI_NEW_STMT);
+ /* Now build the conditional and put it into the IR.  */
+ g = gimple_build_cond (LE_EXPR, p, gimple_assign_lhs (g),
+NULL_TREE, NULL_TREE);
+ gimple_set_location (g, loc);
+ gsi_insert_after (&cond_insert_point, g, GSI_NEW_STMT);
+ *gsi = gsi_after_labels (then2_bb);
+   }
+
   /* Generate __ubsan_handle_type_mismatch call.  */
-  *gsi = gsi_after_labels (then_bb);
   if (flag_sanitize_undefined_trap_on_error)
g = gimple_build_call (builtin_decl_explicit (BUILT_IN_TRAP), 0);
   else

Marek

Re: [PATCH] Use !implicit_section in the recent set_section change (PR ipa/65087)

2015-02-18 Thread Jan Hubicka

> On 2015.02.17 at 22:00 +0100, Jan Hubicka wrote:
> > > Hi!
> > > 
> > > Markus reported an ICE, that is fixed by following patch, which limits
> > > the earlier change to !implicit_section only (which I assume is the user
> > > supplied __attribute__((section (.
> > > 
> > > Bootstrapped/regtested on 
> > > {x86_64,i686,aarch64,ppc64,ppc64le,s390,s390x}-linux.
> > > Ok for trunk?
> > > 
> > > 2015-02-17  Jakub Jelinek  
> > > 
> > >   PR ipa/65087
> > >   * cgraphclones.c (cgraph_node::create_virtual_clone): Only copy
> > >   section if !implicit_section.
> > >   (cgraph_node::create_version_clone_with_body): Likewise.
> > >   * trans-mem.c (ipa_tm_create_version): Likewise.
> > 
> > This seems OK. I wonder what the bug Markus reported is.
> 
> The ICE only happens with -fdevirtualize-at-ltrans:
> 
> trippels@gcc2-power8 library % g++ -flto -fdevirtualize-at-ltrans -shared 
> @list
> lto1: internal compiler error: in ipcp_verify_propagated_values, at 
> ipa-cp.c:1057
> 0x10d1270f ipcp_verify_propagated_values()
> ../../gcc/gcc/ipa-cp.c:1057
> 0x10d1481b ipcp_propagate_stage
> ../../gcc/gcc/ipa-cp.c:2758
> 0x10d1481b ipcp_driver
> ../../gcc/gcc/ipa-cp.c:4416
> 0x10d1481b execute
> ../../gcc/gcc/ipa-cp.c:4511
> 
> I will try to come up with a testcase.

This is interesting indeed. -fdevirtualize-at-ltrans should not change the 
outcome of ipa-cp, so we definitely have some latent bug here.  A testcase 
would be great.

Honza
> 
> -- 
> Markus

Re: [PATCH] Fix IFN_OBJECT_SIZE expansion (PR sanitizer/65081)

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 10:15:03AM +0100, Marek Polacek wrote:
> We're lacking the POINTER_DIFF_EXPR, which means that ptr - 1 is in fact
> ptr + very_big_number.  This can result in a bogus run-time error when the
> objsz checking is turned on.  Jakub suggested not issuing the error
> if (ptr > ptr + offset) is true.  So this patch attempts to do that, along
> with some optimizations for the common case.
> 
> Bootstrap-ubsan passed, bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-02-17  Marek Polacek  
> 
>   PR sanitizer/65081
>   * ubsan.c (OBJSZ_MAX_OFFSET): Define.
>   (ubsan_expand_objsize_ifn): Don't emit run-time check if the offset
>   is in range [-16K, -1].  Don't issue run-time error if
>   (ptr > ptr + offset).
> 
>   * c-c++-common/ubsan/pr65081.c: New test.

Ok, thanks.

Jakub

hashtable optimization

2015-02-18 Thread François Dumont


Hello

    I am still studying hashtable performance, and especially how to 
reduce the overhead compared to the tr1 implementation. Most of the overhead 
comes from the additional modulo operations required by the new data 
model. Having a closer look at the PR 57885 bench, I realized that we can 
quite easily avoid an important modulo operation in 
_M_insert_bucket_begin thanks to an additional std::size_t in the container.
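
To make the idea concrete, here is a small C model of it. This is not the libstdc++ code: `struct table`, `insert_bucket_begin` and the fixed `NBUCKETS` are hypothetical, and only the cached field `bbegin_bkt` mirrors the proposed `_M_bbegin_bkt`. All nodes live in one singly linked list, and each bucket stores the node that precedes its first element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8

struct node
{
  struct node *next;
  size_t hash;
};

struct table
{
  struct node before_begin;        /* Sentinel preceding the first node.  */
  struct node *buckets[NBUCKETS];  /* Node before each bucket's first node.  */
  size_t bbegin_bkt;               /* Cached index of the bucket that
                                      points to before_begin.  */
};

static void
table_init (struct table *t)
{
  memset (t, 0, sizeof *t);
}

/* Insert N at the beginning of its bucket.  */
static void
insert_bucket_begin (struct table *t, struct node *n)
{
  size_t bkt = n->hash % NBUCKETS;

  if (t->buckets[bkt])
    {
      /* Non-empty bucket: link after the bucket's predecessor node.  */
      n->next = t->buckets[bkt]->next;
      t->buckets[bkt]->next = n;
    }
  else
    {
      /* Empty bucket: N becomes the first node of the whole list.  */
      n->next = t->before_begin.next;
      t->before_begin.next = n;
      /* The old first bucket is now preceded by N.  Without the cache
         this would need n->next->hash % NBUCKETS.  */
      if (n->next)
        t->buckets[t->bbegin_bkt] = n;
      t->buckets[bkt] = &t->before_begin;
      t->bbegin_bkt = bkt;
    }
}
```

The empty-bucket branch is the one that matters here: the only modulo left is the one computing the new node's own bucket.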


    The patch is quite straightforward: it optimizes insertion of a 
node into an empty bucket, which is the most important kind of insertion. As 
long as the hash functor is doing a good job, we should by default have 
at most one element per bucket:

Without patch:
Container:std::unordered_map  Key:int
Insertion: 1106 671 634 634 635  min=634 max=1106

Container:std::tr1::unordered_map  Key:int
Insertion: 885 491 487 490 511  min=487 max=885

With patch:
Container:std::unordered_map  Key:int
Insertion: 1099 581 583 584 592  min=581 max=1099

Container:std::tr1::unordered_map  Key:int
Insertion: 889 491 519 492 493  min=491 max=889

I prefer to propose it now because it impacts ABI.

2015-02-19  François Dumont  

* include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, bucket
pointing to _M_before_begin.
(_Hashtable<>): Maintain and use the latter.

Tested under Linux x86_64.

François

Re: hashtable optimization

2015-02-18 Thread François Dumont


With patch.

On 18/02/2015 10:35, François Dumont wrote:

Hello

    I am still studying hashtable performance, and especially how to 
reduce the overhead compared to the tr1 implementation. Most of the overhead 
comes from the additional modulo operations required by the new 
data model. Having a closer look at the PR 57885 bench, I realized that we 
can quite easily avoid an important modulo operation in 
_M_insert_bucket_begin thanks to an additional std::size_t in the 
container.


    The patch is quite straightforward: it optimizes insertion of a 
node into an empty bucket, which is the most important kind of insertion. 
As long as the hash functor is doing a good job, we should by default 
have at most one element per bucket:


Without patch:
Container:std::unordered_map  Key:int
Insertion: 1106 671 634 634 635  min=634 max=1106

Container:std::tr1::unordered_map  Key:int
Insertion: 885 491 487 490 511  min=487 max=885

With patch:
Container:std::unordered_map  Key:int
Insertion: 1099 581 583 584 592  min=581 max=1099

Container:std::tr1::unordered_map  Key:int
Insertion: 889 491 519 492 493  min=491 max=889

I prefer to propose it now because it impacts ABI.

2015-02-19  François Dumont  

* include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, bucket
pointing to _M_before_begin.
(_Hashtable<>): Maintain and use the latter.

Tested under Linux x86_64.

François



Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 220780)
+++ include/bits/hashtable.h	(working copy)
@@ -324,6 +324,9 @@
   // numerous checks in the code to avoid 0 modulus.
   __bucket_type		_M_single_bucket	= nullptr;
 
+  // Cache index of the bucket pointing to _M_before_begin
+  size_type			_M_bbegin_bkt;
+
   bool
   _M_uses_single_bucket(__bucket_type* __bkts) const
   { return __builtin_expect(__bkts == &_M_single_bucket, false); }
@@ -965,7 +968,8 @@
 	__node_type* __this_n = __node_gen(__ht_n);
 	this->_M_copy_code(__this_n, __ht_n);
 	_M_before_begin._M_nxt = __this_n;
-	_M_buckets[_M_bucket_index(__this_n)] = &_M_before_begin;
+	_M_bbegin_bkt = _M_bucket_index(__this_n);
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
 	// Then deal with other nodes.
 	__node_base* __prev_n = __this_n;
@@ -1029,12 +1033,13 @@
   _M_bucket_count = __ht._M_bucket_count;
   _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
   _M_element_count = __ht._M_element_count;
+  _M_bbegin_bkt = __ht._M_bbegin_bkt;
   std::__alloc_on_move(this->_M_node_allocator(), __ht._M_node_allocator());
 
   // Fix buckets containing the _M_before_begin pointers that can't be
   // moved.
   if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
   __ht._M_reset();
 }
 
@@ -1131,7 +1136,8 @@
   _M_bucket_count(__ht._M_bucket_count),
   _M_before_begin(__ht._M_before_begin._M_nxt),
   _M_element_count(__ht._M_element_count),
-  _M_rehash_policy(__ht._M_rehash_policy)
+  _M_rehash_policy(__ht._M_rehash_policy),
+  _M_bbegin_bkt(__ht._M_bbegin_bkt)
 {
   // Update, if necessary, buckets if __ht is using its single bucket.
   if (__ht._M_uses_single_bucket())
@@ -1143,7 +1149,7 @@
   // Update, if necessary, bucket pointing to before begin that hasn't
   // moved.
   if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
   __ht._M_reset();
 }
@@ -1183,7 +1189,8 @@
   _M_buckets(nullptr),
   _M_bucket_count(__ht._M_bucket_count),
   _M_element_count(__ht._M_element_count),
-  _M_rehash_policy(__ht._M_rehash_policy)
+  _M_rehash_policy(__ht._M_rehash_policy),
+  _M_bbegin_bkt(__ht._M_bbegin_bkt)
 {
   if (__ht._M_node_allocator() == this->_M_node_allocator())
 	{
@@ -1199,7 +1206,7 @@
 	  // Update, if necessary, bucket pointing to before begin that hasn't
 	  // moved.
 	  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 	  __ht._M_reset();
 	}
   else
@@ -1265,15 +1272,15 @@
   std::swap(_M_before_begin._M_nxt, __x._M_before_begin._M_nxt);
   std::swap(_M_element_count, __x._M_element_count);
   std::swap(_M_single_bucket, __x._M_single_bucket);
+  std::swap(_M_bbegin_bkt, __x._M_bbegin_bkt);
 
   // Fix buckets containing the _M_before_begin pointers that can't be
   // swapped.
   if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
   if (__x._M_begin())
-	__x._M_buckets[__x._M_bucket_index(__x._M_begin())]
-	  = &__x._M_before_begin;
+	__x._M_buckets[__x._M_bbegin_bkt] = &__x._M_before_begin;
 }
 
   template_M_nxt)
 	// We must update for

[PATCH] Fix PR62217

2015-02-18 Thread Richard Biener


The following patch extends a heuristic in DOM that avoids propagating
copies into IV increments to cover all BIV replacements.  This avoids
the extra loop copy complete peeling produces and would have also
avoided the array bound warning had we not disabled them completely
from VRP2.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-02-18  Richard Biener  

PR tree-optimization/62217
* tree-ssa-dom.c (cprop_operand): Avoid propagating copies
into BIVs.

* gcc.dg/tree-ssa/cunroll-11.c: New testcase.

Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 220755)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -2291,11 +2291,16 @@ cprop_operand (gimple stmt, use_operand_
   if (!may_propagate_copy (op, val))
return;
 
-  /* Do not propagate copies into simple IV increment statements.
- See PR23821 for how this can disturb IV analysis.  */
-  if (TREE_CODE (val) != INTEGER_CST
- && simple_iv_increment_p (stmt))
-   return;
+  /* Do not propagate copies into BIVs.
+ See PR23821 and PR62217 for how this can disturb IV and
+number of iteration analysis.  */
+  if (TREE_CODE (val) != INTEGER_CST)
+   {
+ gimple def = SSA_NAME_DEF_STMT (op);
+ if (gimple_code (def) == GIMPLE_PHI
+ && gimple_bb (def)->loop_father->header == gimple_bb (def))
+   return;
+   }
 
   /* Dump details.  */
   if (dump_file && (dump_flags & TDF_DETAILS))
Index: gcc/testsuite/gcc.dg/tree-ssa/cunroll-11.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/cunroll-11.c  (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/cunroll-11.c  (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Warray-bounds -fdump-tree-cunroll-details" } */
+
+typedef struct { unsigned data; } s1;
+s1 g_x[4];
+
+extern void foo (s1 *x1, s1 *x2, int a, int b)
+{
+  int i;
+  for(i = 0; i < a; i++)
+if(i == b)
+  g_x[i] = *x1;
+else
+  g_x[i] = *x2;
+}
+
+/* { dg-final { scan-tree-dump "Loop 1 iterates at most 3 times" "cunroll" } } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek

On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
> I'm just looking for a way to make this less of a hack (and the LTO IL
> less target dependent).  Not for GCC 5 for which something like your
> patch is probably ok, but for the future.

So, given Ilya's and Thomas' testing, is this acceptable for now, and
perhaps we can try to do something better for GCC 6?

Here is the patch with full ChangeLog:

2015-02-18  Jakub Jelinek  

* passes.c (ipa_write_summaries_1): Call lto_output_init_mode_table.
(ipa_write_optimization_summaries): Likewise.
* tree-streamer.h: Include data-streamer.h.
(streamer_mode_table): Declare extern variable.
(bp_pack_machine_mode, bp_unpack_machine_mode): New inline functions.
* lto-streamer-out.c (lto_output_init_mode_table,
lto_write_mode_table): New functions.
(produce_asm_for_decls): Call lto_write_mode_table when streaming
offloading LTO.
* lto-section-in.c (lto_section_name): Add "mode_table" entry.
(lto_create_simple_input_block): Add mode_table argument to the
lto_input_block constructors.
* ipa-prop.c (ipa_prop_read_section, read_replacements_section):
Likewise.
* data-streamer-in.c (string_for_index): Likewise.
* ipa-inline-analysis.c (inline_read_section): Likewise.
* ipa-icf.c (sem_item_optimizer::read_section): Likewise.
* lto-cgraph.c (input_cgraph_opt_section): Likewise.
* lto-streamer-in.c (lto_read_body_or_constructor,
lto_input_toplevel_asms): Likewise.
(lto_input_mode_table): New function.
* tree-streamer-out.c (pack_ts_fixed_cst_value_fields,
pack_ts_decl_common_value_fields, pack_ts_type_common_value_fields):
Use bp_pack_machine_mode.
* real.h (struct real_format): Add name field.
* lto-streamer.h (enum lto_section_type): Add LTO_section_mode_table.
(class lto_input_block): Add mode_table member.
(lto_input_block::lto_input_block): Add mode_table_ argument,
initialize mode_table.
(struct lto_file_decl_data): Add mode_table field.
(lto_input_mode_table, lto_output_init_mode_table): New prototypes.
* tree-streamer-in.c (unpack_ts_fixed_cst_value_fields,
unpack_ts_decl_common_value_fields,
unpack_ts_type_common_value_fields): Call bp_unpack_machine_mode.
* tree-streamer.c (streamer_mode_table): New variable.
* real.c (ieee_single_format, mips_single_format,
motorola_single_format, spu_single_format, ieee_double_format,
mips_double_format, motorola_double_format,
ieee_extended_motorola_format, ieee_extended_intel_96_format,
ieee_extended_intel_128_format, ieee_extended_intel_96_round_53_format,
ibm_extended_format, mips_extended_format, ieee_quad_format,
mips_quad_format, vax_f_format, vax_d_format, vax_g_format,
decimal_single_format, decimal_double_format, decimal_quad_format,
ieee_half_format, arm_half_format, real_internal_format): Add name
field.
* config/pdp11/pdp11.c (pdp11_f_format, pdp11_d_format): Likewise.
lto/
* lto.c (lto_mode_identity_table): New variable.
(lto_read_decls): Add mode_table argument to the lto_input_block
constructor.
(lto_file_finalize): Initialize mode_table.
(lto_init): Initialize lto_mode_identity_table.

--- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c    2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;

+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);

   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;

+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
!lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
--- gcc/tree-streamer.h.jj  2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.

 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"

 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);

 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   retu

Re: patch to fix rtl documentation for new floating point comparisons

2015-02-18 Thread Joseph Myers

On Tue, 17 Feb 2015, Kenneth Zadeck wrote:

> The fp exceptions raise some very tricky issues with respect to gcc and 
> optimization.  On many machines, noisy does not mean to throw an 
> exception, it means that you set a bit and then check later.  If you try 
> to model this kind of behavior in gcc, you end up pinning the code so 
> that nothing can be moved or reordered.

When I say exception here, I'm always referring to that flag bit setting, 
not to processor-level exceptions.  In IEEE 754 terms, an exception is 
*signaled*, and the default exception handling is to *raise* a flag and 
deliver a default result (except for exact underflow which doesn't raise 
the flag).
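
A minimal sketch of this flag-based model, assuming an IEEE 754 implementation and the standard <cfenv> interface (the function name is hypothetical). The volatile variables are only there to force the division to happen at run time rather than being folded by the compiler.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if dividing by zero at run time raised the FE_DIVBYZERO flag.
// No trap handler runs: the operation delivers a default result (infinity)
// and the flag stays set until it is explicitly cleared.
bool division_raises_divbyzero_flag() {
  volatile double numerator = 1.0;
  volatile double denominator = 0.0;
  const int cleared = std::feclearexcept(FE_ALL_EXCEPT);
  assert(cleared == 0);
  (void)cleared;
  volatile double quotient = numerator / denominator;
  (void)quotient;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

The flag is tested some time after the operation, which is exactly the "set a bit and check later" behaviour being discussed.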

To quote Annex F, "This specification does not require support for trap 
handlers that maintain information about the order or count of 
floating-point exceptions. Therefore, between function calls, 
floating-point exceptions need not be precise: the actual order and number 
of occurrences of floating-point exceptions (> 1) may vary from what the 
source code expresses.".  So it is not necessary to be concerned about 
configurations where trap handlers may be called.

There is as yet no public draft of TS 18661-5 (Supplementary attributes).  
That will provide C bindings for alternate exception handling as described 
in IEEE 754-2008 clause 8.  I suspect such bindings will not readily be 
efficiently implementable using processor-level exception handlers; SIGFPE 
is an awkward interface for implementing such things at the C language 
level, some processors do not support such trap handlers at all (e.g. many 
ARM processors), and where traps are supported they may be asynchronous 
rather than occurring immediately on execution of the relevant 
instruction.  In addition, at least x86 does not support raising exception 
flags without running trap handlers on the next floating-point instruction 
(raiseFlags operation, fesetexcept in TS 18661-1); that is, if trap 
handlers were used to implement standard functionality, it would need to 
be in a way such that this x86 peculiarity is not visible.

> to get this right gcc needs something like a monotonic dependency which 
> would allow reordering and gcc has nothing like this.  essentially, you 
> need way to say that all of these insns modify the same variable, but 
> they all just move the value in the same direction so you do not care 
> what order the operations are performed in.  that does not mean that 
> this could not be added but gcc has nothing like this.

Indeed, this is one of the things about defining the default mode that I 
referred to; the present default is -ftrapping-math, but we may wish to 
distinguish between strict trapping-math (whenever exception flags might 
be tested / raised / lowered, exactly the computations specified by the 
abstract machine have occurred, which might mean rather more limits on 
code movement in the absence of monotonic dependencies) and loose trapping 
math (like the present default; maybe don't transform expressions locally 
in ways that add or remove exceptions, but don't treat an expression as 
having side effects or reading global state purely because of possible 
raising of floating-point exceptions).

> going back to the rounding modes issue, there is a huge range in the 
> architectural implementation space.  you have a few that are pure 
> dynamic, a few that are pure static and some in the middle that are just 
> a mess.  a lot of machines would have liked to support fully static, but 
> could not fit the bits to specify the rounding modes into the 
> instruction.  my point here is you do need to at least have a plan that 
> will support the full space even if you do this with a 1000 small 
> patches.

I think the norm is dynamic, because that's what was in IEEE 754-1985, 
with static rounding added more recently on some processors, because of 
IEEE 754-2008.  (There are other variants - IA64 having multiple dynamic 
rounding mode registers and allowing instructions to specify which one the 
rounding mode is taken from.)
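
As an illustration of dynamic rounding, here is a minimal sketch using the standard <cfenv> functions (the helper name is hypothetical, and it assumes the target defines FE_UPWARD and FE_DOWNWARD). The same division gives different results depending on the mode in effect when it executes.

```cpp
#include <cassert>
#include <cfenv>

// Computes 1.0 / 3.0 at run time under the given dynamic rounding mode,
// then restores the previously active mode.
double one_third_rounded(int rounding_mode) {
  const int saved = std::fegetround();
  const int status = std::fesetround(rounding_mode);
  assert(status == 0);
  (void)status;
  volatile double one = 1.0;    // volatile keeps the division out of
  volatile double three = 3.0;  // compile-time constant folding
  volatile double result = one / three;
  std::fesetround(saved);
  return result;
}
```

With static rounding the mode would instead be encoded in the instruction itself, so no such mode switch around the operation would be needed.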

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek  wrote:
> > On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I 
> > > > expected it
> > > > would be built in-tree, is that not the case (at least wiki/Offloading
> > > > mentions that).
> 
> > configure:4261: checking for C compiler default output file name
> > configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc 
> > -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc 
> > -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
> > /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
> > /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
> > -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include 
> > -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
> > error opening libc.a
> > collect2: error: ld returned 1 exit status
> > very early during in-tree newlib configure.
> 
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> ,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to
> the directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

My bad.  Yes, that does resolve the issue, make & make install now worked
for nvptx-none for me with the patches (2 from Bernd, my mode_table, my
t-nvptx).

Can you or Bernd comment on the other issues I've raised, i.e. whether you
are going to apply Bernd's approved patches, on the t-nvptx fix?

I'll try to have a look at the va_list stuff, if it blocks everything rather
than just testcases with va_list being offloaded.

Jakub

Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)

2015-02-18 Thread Martin Liška


On 02/17/2015 07:38 PM, Jan Hubicka wrote:

Hi,
thanks for working on it.  There are basically 3 independent changes in the 
patch:
  - The patch to make checking in lto_streamer_init ENABLE_CHECKING only, which I
think can be committed as obvious.


Hello.

Following email contains fix for that, which I'm going to install.


  - Templates for call_for_symbol_and_aliases
I do not think these should be strictly necessary for performance, because 
once we spend too much time in these we are a bit screwed.
I however see that it also makes things a bit nicer by not needing typecasts on 
the data pointer.
Perhaps that could be further cleaned up?

An alternative would be to implement a FOR_EACH_ALIAS macro with a tree-walking 
iterator.
You have all the structure needed to not require a stack.  The iterator will 
contain a root node, the current node and an index to the ref.
This may be even easier to use and will probably wind up generating about the 
same code, given that the for-each template needs to produce a self-recursive 
function anyway.

I would not care about for_symbol_thunk_and_aliases.  That function is 
heavy because it walks all callers anyway, and it should not be used in hot code.
I have a patch that removes its use from the inliner - it is more or less a 
leftover from the time we represented thunks as special aliases instead of 
functions w/o gimple body.


Yes, I was also thinking about a flat iterator that would be capable of iterating 
over thunks/aliases, and I prefer that approach to recursive functions. I think 
we can prepare it for the next release; as you said, it does not bring much of a 
performance gain.


  - the caching itself.

I will look into the caching in detail.  I am not quite sure I like the idea of 
exposing an inline-only cache in cgraph.h.  You could just keep the predicates 
as they are, but have inline_ variants in ipa-inline.h that do the caching for you.

Allocating the bits directly in cgraph_node is probably OK; we don't really 
have a shortage there and it can be revisited easily later...

Honza



Please take a look at the caching; it would be a crucial part of the speed improvement.

Martin
>From eb9d34244c43ae1d0576b2ae1002f5267c6cd547 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Wed, 18 Feb 2015 11:18:47 +0100
Subject: [PATCH] Add checking macro within lto_streamer_init.

gcc/ChangeLog:

2015-02-18  Martin Liska  

	* lto-streamer.c (lto_streamer_init): Encapsulate
	streamer_check_handled_ts_structures with checking macro.
---
 gcc/lto-streamer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 836dce9..542a813 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -319,11 +319,13 @@ static hash_table *tree_htab;
 void
 lto_streamer_init (void)
 {
+#ifdef ENABLE_CHECKING
   /* Check that all the TS_* handled by the reader and writer routines
  match exactly the structures defined in treestruct.def.  When a
  new TS_* astructure is added, the streamer should be updated to
  handle it.  */
   streamer_check_handled_ts_structures ();
+#endif
 
 #ifdef LTO_STREAMER_DEBUG
   tree_htab = new hash_table (31);
-- 
2.1.2

[Patch] Fix android build.

2015-02-18 Thread Ilya Tocar

Hi,

On android dlerror returns const char*.
Ok for trunk?

libgomp/
* target.c (gomp_load_plugin_for_device): Fix type of dlerror
return value.
(DLSYM_OPT): Ditto.
---
 libgomp/target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 73e757a..50baa4d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -919,7 +919,7 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 const char *plugin_name)
 {
-  char *err = NULL, *last_missing = NULL;
+  const char *err = NULL, *last_missing = NULL;
   int optional_present, optional_total;
 
   /* Clear any existing error.  */
@@ -947,7 +947,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
 #define DLSYM_OPT(f, n)\
   do   \
 {  \
-  char *tmp_err;   \
+  const char *tmp_err; \
   device->f##_func = dlsym (plugin_handle, "GOMP_OFFLOAD_" #n);\
   tmp_err = dlerror ();\
   if (tmp_err == NULL) \
-- 
1.8.3.1

Re: [Patch] Fix android build.

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 01:59:25PM +0300, Ilya Tocar wrote:
> Hi,
> 
> On android dlerror returns const char*.

Looks like POSIX violation.

> Ok for trunk?
> 
> libgomp/
>   * target.c (gomp_load_plugin_for_device): Fix type of dlerror
>   return value.
>   (DLSYM_OPT): Ditto.

Therefore, I wouldn't word the ChangeLog entry this way, because it
isn't fixing it, it is working around an Android bug.
So what about
Use const char * instead of char * for variables holding dlerror return
values.
?  Ok with that change.
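
A minimal sketch of why const char * is the portable choice for such variables (lookup_symbol is a hypothetical helper, not the libgomp code): the assignment compiles whether dlerror is declared to return char * or const char *. On older glibc versions linking may additionally need -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>

// Looks up `name` in `handle`. Returns nullptr on success, or the dlerror()
// message on failure. The message is held in a const char *, which is valid
// for both a `char *dlerror (void)` and a `const char *dlerror (void)`
// declaration.
const char* lookup_symbol(void* handle, const char* name, void** symbol_out) {
  assert(symbol_out != nullptr);
  dlerror();  // clear any stale error first
  *symbol_out = dlsym(handle, name);
  const char* err = dlerror();
  return err;
}
```

The reverse choice, holding the result in a plain char *, is what fails to compile where the return type is const-qualified.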

Jakub

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> ,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to
> the directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

BTW, --with-cuda-driver-{include,lib} are apparently not documented in
gcc/doc/ (neither is --with-cuda-driver, but I can't use that, as lib is
/usr/local/cuda-6.5/lib64 in my case), and aren't documented on wiki/Offloading
either.

../configure --target=nvptx-none 
--enable-as-accelerator-for=x86_64-pc-linux-gnu 
--with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin 
--disable-sjlj-exceptions --enable-newlib-io-long-long
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

and

../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
--target=x86_64-pc-linux-gnu 
--enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst 
--disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include 
--with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

compilers now build, but offloading fails:

/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
 @/tmp/cce9PdmR
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
mkoffload: fatal error: 
/usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
 returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: 
/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
 returned 1 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Is --enable-languages=c,c++,fortran,lto required when configuring the
offload compiler?  It isn't required for intelmic.

Jakub

[PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Kyrill Tkachov


Hi all,

This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_3
pattern and its associated splitters. The problem is that for the 2nd
alternative it will split a right-shift into a SISD left-shift by the negated
amount to be shifted by (the ushl instruction allows such semantics).
The splitter generates this RTL:

(set (match_dup 2)
   (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
   (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))

The problem here is that the shift amount register is negated without telling
the register allocator about it (and it can't figure it out itself).
So if you try to use the register that operand 2 is assigned to later on,
you get the negated shift amount instead!

The testcase in the patch demonstrates the simple code that can get miscompiled
due to this behaviour.

The solution in this patch is to negate the shift amount into the output
operand (operand 0) and mark it as an earlyclobber in that alternative.
This is actually exactly what the very similar
*aarch64_ashr_sisd_or_int_3 pattern does below.
I believe this is the safest and simplest fix at this stage.
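
The difference between the two lowerings can be modelled in a few lines. This is a simplified scalar sketch with hypothetical names (ushl32, lshr_negate_in_place, lshr_negate_into_temp), not the machine description: it treats one 32-bit USHL lane as shifting left for a positive amount and right for a negative one.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of one 32-bit USHL lane: a positive amount shifts left,
// a negative amount shifts right, and out-of-range amounts yield zero.
std::uint32_t ushl32(std::uint32_t value, std::int8_t shift) {
  if (shift >= 32 || shift <= -32) return 0;
  if (shift >= 0) return value << shift;
  return value >> -shift;
}

// Models the old splitter: the shift amount "register" is negated in place,
// so any later use of `amount` sees the negated value.
std::uint32_t lshr_negate_in_place(std::uint32_t value, std::int8_t& amount) {
  assert(amount >= 0 && amount < 32);
  amount = static_cast<std::int8_t>(-amount);
  return ushl32(value, amount);
}

// Models the fixed splitter: the negated amount goes into a separate
// location (the output register in the patch), leaving `amount` intact.
std::uint32_t lshr_negate_into_temp(std::uint32_t value, const std::int8_t& amount) {
  assert(amount >= 0 && amount < 32);
  const std::int8_t negated = static_cast<std::int8_t>(-amount);
  return ushl32(value, negated);
}
```

Both variants compute the same shift result; only the first one changes the caller's shift amount behind its back, which is the miscompilation described above.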

This bug was exposed on the Linaro 4.9 branch that happened to have the perfect
storm of costs and register pressure and ended up miscompiling
the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
Linaro compiler, generating essentially code like:

.L141:
    neg     d8, d8                  // d8 negated!
    ushl    v0.2s, v11.2s, v8.2s    // shift right => shift left by neg amount
    fmov    w0, s0
<...irrelevant code...>
    b       .L140
<...>
.L140:
    fmov    w0, s8                  // s8/d8 used and incremented assuming it had not changed at L141
    add     w0, w0, 1
    fmov    s8, w0
    fmov    w1, s10
    cmp     w0, w1
    bne     .L141


Basically d8 is negated and later used as if it had not been at .L140, leading
to completely wrong behaviour.

With this patch that particular part of the assembly now contains at L141:
    neg     d0, d8
    ushl    v0.2s, v11.2s, v0.2s
    fmov    w0, s0

Leaving the original shift amount in d8 intact.

This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
pattern was introduced for 4.9)
Bootstrapped and tested on trunk and 4.9.

Ok for trunk and 4.9?

Thanks,
Kyrill

2015-02-17  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*aarch64_lshr_sisd_or_int_3):
Mark operand 0 as earlyclobber in 2nd alternative.
(1st define_split below *aarch64_lshr_sisd_or_int_3):
Write negated shift amount into QI lowpart operand 0 and use it
in the shift step.
(2nd define_split below *aarch64_lshr_sisd_or_int_3): Likewise.

2015-02-17  Kyrylo Tkachov  

* gcc.target/aarch64/sisd-shft-neg_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169e..8f157ce 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3360,7 +3360,7 @@ (define_insn "*aarch64_ashl_sisd_or_int_3"
 
 ;; Logical right shift using SISD or Integer instruction
 (define_insn "*aarch64_lshr_sisd_or_int_3"
-  [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
+  [(set (match_operand:GPI 0 "register_operand" "=w,&w,r")
 (lshiftrt:GPI
   (match_operand:GPI 1 "register_operand" "w,w,r")
   (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "Us,w,rUs")))]
@@ -3379,11 +3379,13 @@ (define_split
(match_operand:DI 1 "aarch64_simd_register")
(match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
 (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
-(unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
-  ""
+(unspec:DI [(match_dup 1) (match_dup 3)] UNSPEC_SISD_USHL))]
+  {
+operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )
 
 (define_split
@@ -3392,11 +3394,13 @@ (define_split
(match_operand:SI 1 "aarch64_simd_register")
(match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
 (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
-(unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))]
-  ""
+(unspec:SI [(match_dup 1) (match_dup 3)] UNSPEC_USHL_2S))]
+  {
+operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )
 
 ;; Arithmetic right shift using SISD or Integer instruction
diff --git a/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c b/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c
new file mode 100644
index 000..c091657
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+extern void abort (void);
+
+#define force_simd_si(v) asm volatile ("mov %s0, %1.s[0]" :"=w" (v) :"w" (v) :)
+
+unsigned int
+shft_add (unsig

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-02-18 Thread Thomas Schwinge

Hi!

On Wed, 22 Oct 2014 22:57:01 +0400, Ilya Verbin  wrote:
> --- /dev/null
> +++ b/gcc/config/i386/intelmic-mkoffload.c
> +[...]
> +#include "config.h"
> +#include 
> +#include "system.h"
> +#include "coretypes.h"
> +#include "obstack.h"
> +#include "intl.h"
> +#include "diagnostic.h"
> +#include "collect-utils.h"
> +#include 
> +[...]

> --- /dev/null
> +++ b/gcc/config/i386/t-intelmic
> @@ -0,0 +1,9 @@
> +mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c | insn-modes.h
> + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> +   -I$(srcdir)/../libgomp \
> +   -DDEFAULT_REAL_TARGET_MACHINE=\"$(real_target_noncanonical)\" \
> +   -DDEFAULT_TARGET_MACHINE=\"$(target_noncanonical)\" \
> +   $< $(OUTPUT_OPTION)
> +
> +mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a 
> $(LIBIBERTY) $(LIBDEPS)
> + $(COMPILER) -o $@ mkoffload.o collect-utils.o libcommon-target.a 
> $(LIBIBERTY) $(LIBS)

What is the rationale for the insn-modes.h order-only prerequisites for
mkoffload.o?  Is this simply to get past the build issue which, for
example, Jakub also reported for the nvptx mkoffload,

(»missing mkoffload.o dependencies, patch attached«), or is there a
better rationale for adding (only) this one (indirect) dependency,
instead of listing all of the (real) dependencies?  (After all, we do
want the mkoffload executables to be rebuilt if one of their dependencies
is updated.)  (I have not yet tried to figure out how to do that.)

Regards,
 Thomas


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-02-18 Thread Ilya Verbin

On Wed, Feb 18, 2015 at 12:48:21 +0100, Thomas Schwinge wrote:
> What is the rationale for the insn-modes.h order-only prerequisites for
> mkoffload.o?  Is this simply to get past the build issue which, for
> example, Jakub also reported for the nvptx mkoffload,
> 
> (»missing mkoffload.o dependencies, patch attached«), or is there a
> better rationale for adding (only) this one (indirect) dependency,
> instead of listing all of the (real) dependencies?  (After all, we do
> want the mkoffload executables to be rebuilt if one of their dependencies
> is updated.)  (I have not yet tried to figure out how to do that.)

Yes, mkoffload simply does not build without this dependency, and works well with
it.  Do you know the right way to add all the other dependencies?

  -- Ilya

Re: [patch, avr] Fix ICE PR64452 pushing eliminated rtxes

2015-02-18 Thread Georg-Johann Lay


On 02/17/2015 at 03:34 PM, Denis Chertykov wrote:

2015-02-17 14:12 GMT+03:00 Georg-Johann Lay :

Byte-wise pushing of virtual regs like the arg pointer might result in patterns
like

  (set (mem:QI (post_dec:HI (reg:HI 32 SP)))
   (subreg:QI (plus:HI (reg:HI 28)
   (const_int 17)) 0))

after elimination.

Attached patch uses new pushhi1_insn to push virtuals in HImode so that
expressions like in subreg_reg from above can be reloaded.

Ok to commit ?

Johann

 PR target/64452

 * config/avr/avr.md (pushhi1_insn): New insn.
 (push1): Push virtual regs in one chunk using pushhi1_insn.


Approved.
(But I'm worried about this because it's a reload-related problem and it
can have side effects.)

Denis.


So you have a superior solution in mind?

What side effects specifically?

Currently the side effect is that reload gets simpler expressions and hence 
does not ICE.  There isn't even an insn that can push complex (plus rtx in this 
case) expressions or subregs thereof.  Even if there were such insns I don't 
think reload is supposed to handle them.


The current implementation of push1 assumes that all RTXes which ever 
appear in a push can be decomposed into subregs and these can be simplified to 
some of the push insns, i.e. the push operand simplifies to REG or CONST0_RTX. 
 The subreg above, however, cannot be simplified to anything reload can handle 
and does not match an insn.  And supplying such an insn is pointless because 
that insn would need a scratch and hence require secondary reloads...


plus rtxes are special as they might be produced by reload (R28 above is 
(hard_)frame_pointer).  For a similar reason there are two addhi3 insns (one 
without scratch to accommodate reload and one generic with scratch for better 
performance).



Johann

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 12:48:21PM +0100, Thomas Schwinge wrote:
> > --- /dev/null
> > +++ b/gcc/config/i386/t-intelmic
> > @@ -0,0 +1,9 @@
> > +mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c | insn-modes.h
> > +   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> > + -I$(srcdir)/../libgomp \
> > + -DDEFAULT_REAL_TARGET_MACHINE=\"$(real_target_noncanonical)\" \
> > + -DDEFAULT_TARGET_MACHINE=\"$(target_noncanonical)\" \
> > + $< $(OUTPUT_OPTION)
> > +
> > +mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
> > +   $(COMPILER) -o $@ mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
> 
> What is the rationale for the insn-modes.h order-only prerequisites for
> mkoffload.o?  Is this simply to get past the build issue which, for
> example, Jakub also reported for the nvptx mkoffload,
> 
> (»missing mkoffload.o dependencies, patch attached«), or is there a
> better rationale for adding (only) this one (indirect) dependency,
> instead of listing all of the (real) dependencies?  (After all, we do
> want the mkoffload executables to be rebuilt if one of their dependencies
> is updated.)  (I have not yet tried to figure out how to do that.)

Perhaps if we add
ALL_HOST_OBJS += mkoffload.o
everything would be handled fine?  In that case mkoffload.o should get an
automatic dependency on all the $(generated_files), and its remaining
automatic dependencies would be sourced in from .deps/mkoffload.Po.

Jakub

[wwwdocs] Porting to again

2015-02-18 Thread Marek Polacek

This is a revised version.  I reworded the paragraph dealing with
__STDC_VERSION__, made some clarifications wrt %a, and added some
text wrt cpp -P issue.

Ok?

Index: porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/porting_to.html,v
retrieving revision 1.3
diff -u -r1.3 porting_to.html
--- porting_to.html 10 Feb 2015 11:12:20 -  1.3
+++ porting_to.html 18 Feb 2015 12:01:50 -
@@ -24,6 +24,17 @@
 manner. Additions and suggestions for improvement are welcome.
 
 
+Preprocessor issues
+
+The preprocessor started to emit line markers to properly distinguish
+whether a macro token comes from a system header, or from a normal header
+(see PR60723: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60723).
+These new markers can cause intriguing problems, if the packages aren't ready
+to handle them.  To stop the preprocessor from generating the #line
+directives, use the -P option, documented at
+https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#Preprocessor-Options.
+
+
 C language issues
 
 Default standard is now GNU11
@@ -251,6 +262,105 @@
^
 
 
+__STDC_VERSION__ macro
+
+As the default mode changed to C11, the __STDC_VERSION__
+standard macro, introduced in C95, is now defined by default, and has
+the value 201112L.
+
+Typically, this macro is used as in the following:
+
+
+  #if !defined __STDC_VERSION__ || __STDC_VERSION__ < 199901L
+/* ... */
+  #else
+  # include 
+  #endif
+
+
+You can check the macro using gcc -dM -E -std=gnu11 - < /dev/null | grep STDC_VER.
+
+Different meaning of the %a *scanf conversion specification
+
+In C89, the GNU C library supports dynamic allocation via the %as,
+%aS, and %a[...] conversion specifications; see
+https://www.gnu.org/software/libc/manual/html_node/Dynamic-String-Input.html#Dynamic-String-Input
+for more info.
+But in C99, the a conversion specifier is a synonym for f
+(float), so the compiler expects an argument of type float *.  Different
+meaning of %a is a change in semantics, and in combination with the
+-Wformat warning option the compiler may emit additional warnings:
+
+
+  #include <stdio.h>
+
+  int
+  main (void)
+  {
+char *s;
+scanf ("%as", &s);
+  }
+
+
+
+q.c:7:10: warning: format '%a' expects argument of type 'float *', but argument 2 has type 'char **' [-Wformat=]
+  scanf ("%as", &s);
+ ^
+
+
+To use the dynamic allocation conversion specifier in C99 and C11, specify
+m as a length modifier, specified by POSIX.1-2008.  That is, use
+%ms or %m[...].
+
+New warnings
+
+Several new warnings have been added to the C front end.  One of the new
+warnings is that GCC now warns about non-standard predefined identifiers with
+the -Wpedantic option.  For instance:
+
+
+  void
+  foo (void)
+  {
+const char *s = __FUNCTION__;
+  }
+
+
+
+q.c:4:19: warning: ISO C does not support '__FUNCTION__' predefined identifier [-Wpedantic]
+  const char *s = __FUNCTION__;
+  ^
+
+
+The fix is either to use the standard predefined identifier __func__
+(since C99), or to use the __extension__ keyword:
+
+
+  const char *s = __extension__ __FUNCTION__;
+
+
+C++ language issues
+
+Converting std::nullptr_t to bool
+
+Converting std::nullptr_t to bool in the C++11
+mode now requires direct-initialization.  This has been changed in
+DR 1423 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1423).
+As a consequence, the following is invalid:
+
+
+  bool b = nullptr;
+
+
+but the following is valid:
+
+
+  bool b(nullptr);
+
+
+It is recommended to use true, resp. false keywords
+in such cases.
+
 Links
 
 
Marek
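
For illustration only (not part of the patch): a minimal C sketch of the
%ms form described in the text above.  It assumes glibc or another C library
that implements the POSIX.1-2008 m modifier; the helper name read_word is
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the first whitespace-delimited word of INPUT into storage that
   sscanf allocates itself.  Returns NULL if INPUT contains no word.
   The caller owns the result and must free() it.
   read_word is a hypothetical helper name used only for this sketch.  */
static char *read_word(const char *input)
{
    char *word = NULL;

    assert(input != NULL);

    /* "%ms": the m modifier (POSIX.1-2008) asks sscanf to allocate the
       buffer, so the argument is a char **, as with the old GNU "%as".  */
    if (sscanf(input, "%ms", &word) != 1)
        return NULL;

    return word;
}
```

Unlike "%as", this form keeps its meaning under -std=c99 and -std=c11,
so it should not trigger the 'float *' -Wformat warning quoted in the patch.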

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge

Hi Jakub!

(Will respond to your other questions later.)


On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek  wrote:
> On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> > Do you literally have »nvptx-newlib symlinked into the gcc tree as
> > newlib«?  If yes, then that should explain the problem: as I wrote in
> > ,
> > you need to »add a symbolic link to nvptx-newlib's newlib directory to
> > the directory containing the GCC sources«, so not link [GCC]/newlib ->
> > [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> > resolve the issue?

(It did.)  Can you suggest a better wording, to make this more clear in
the documentation?


> BTW, --with-cuda-driver-{include,lib} are apparently not documented in
> gcc/doc/ (--with-cuda-driver neither, but can't use that, as lib is
> /usr/local/cuda-6.5/lib64 in my case), and isn't documented on wiki/Offloading
> either.

Thanks for reporting; will fix that.


> ../configure --target=nvptx-none 
> --enable-as-accelerator-for=x86_64-pc-linux-gnu 
> --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin 
> --disable-sjlj-exceptions --enable-newlib-io-long-long
> make; make DESTDIR=/usr/src/gcc/objnvptxinst install
> 
> and
> 
> ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
> --target=x86_64-pc-linux-gnu 
> --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst 
> --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include 
> --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
> make; make DESTDIR=/usr/src/gcc/objnvptxinst install
> 
> compilers now build

That looks very similar to what I'm using.  I currently install into
separate prefixes/DESTDIRS, because I have not yet verified that there
is no overlap in the installed files.


> offloading fails:
> 
> /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
>  @/tmp/cce9PdmR
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> mkoffload: fatal error: 
> /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
>  returned 1 exit status
> compilation terminated.
> lto-wrapper: fatal error: 
> /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
>  returned 1 exit status
> compilation terminated.
> /usr/bin/ld: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> 
> Is --enable-languages=c,c++,fortran,lto required when configuring the
> offload compiler?  It isn't required for intelmic.

Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
defaults to supporting LTO, and due to this also defaults to building the
LTO front end.  I'll enhance the nvptx offloading documentation
accordingly.  Maybe we should add some "magic" to build the LTO front end
if --enable-as-accelerator-for=[...] has been specified?


Note that I recently added another prerequisite patch for nvptx
offloading to :
.
If that is not applied, you'll get run-time errors because in
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
can't find main$_omp_fn$0 and similar symbols.


Grüße,
 Thomas



Re: [libstdc++/65033] Give alignment info to libatomic

2015-02-18 Thread Jonathan Wakely


On 12/02/15 13:23 -0800, Richard Henderson wrote:

When we fixed PR54005, making sure that atomic_is_lock_free returns the same
value for all objects of a given type, we probably should have changed the
interface so that we would pass size and alignment rather than size and object
pointer.

Instead, we decided that passing null for the object pointer would be
sufficient.  But as this PR shows, we really do need to take alignment into
account.

The following patch constructs a fake object pointer that is maximally
misaligned.  This allows the interface to both the builtin and to libatomic to
remain unchanged.  Which probably makes this back-portable to maintenance
releases as well.


Am I right in thinking that another option would be to ensure that
std::atomic<> objects are always suitably aligned? Would that make
std::atomic<> slightly more compatible with a C11 atomic_int, where
the _Atomic qualifier affects alignment?

https://gcc.gnu.org/PR62259 suggests we might need to enforce
alignment on std::atomic anyway, or am I barking up the wrong tree?
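
As an aside, here is a small C sketch of how a "fake" address with exactly
a given alignment, and no more, could be computed.  This is only one reading
of "maximally misaligned" in Richard's description and is an assumption, not
taken from the patch; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return an address value that is a multiple of ALIGNMENT but of no
   larger power of two.  ALIGNMENT must be a nonzero power of two.
   Hypothetical helper; not the code from the patch under discussion.  */
static uintptr_t fake_address_for_alignment(size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* 0 - alignment wraps around to 2^N - alignment, whose lowest set
       bit is exactly ALIGNMENT.  */
    return (uintptr_t) 0 - (uintptr_t) alignment;
}

/* Return the largest power of two that divides ADDRESS (its lowest set
   bit), i.e. the alignment such an address actually guarantees.  */
static uintptr_t actual_alignment(uintptr_t address)
{
    return address & (~address + 1);
}
```

An address built this way tells a consumer nothing beyond the type's own
alignment, which is the property the description above seems to rely on.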

Re: [wwwdocs] Porting to again

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 01:04:30PM +0100, Marek Polacek wrote:
> --- porting_to.html   10 Feb 2015 11:12:20 -  1.3
> +++ porting_to.html   18 Feb 2015 12:01:50 -
> @@ -24,6 +24,17 @@
>  manner. Additions and suggestions for improvement are welcome.
>  
>  
> +Preprocessor issues
> +
> +The preprocessor started to emit line markers to properly distinguish
> +whether a macro token comes from a system header, or from a normal header
> +(see PR60723: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60723).
> +These new markers can cause intriguing problems, if the packages aren't ready
> +to handle them.  To stop the preprocessor from generating the #line
> +directives, use the -P option, documented at
> +https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#Preprocessor-Options.
> +

I think it would be nice to give here some example, like:
#include <stdlib.h>
exitfailure EXIT_FAILURE
and showing that older gcc -E used to emit
# 2 "test.c" 2
exitfailure 1
whereas GCC 5 emits:
# 2 "test.c" 2

# 2 "test.c"
exitfailure 
# 2 "test.c" 3 4
   1
and thus it can break simple tools that expect the tokens on a single line.

Otherwise, LGTM.

Jakub
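
To make the "simple tools" point concrete, here is a minimal C sketch of a
check such a tool could use to skip line markers before looking at tokens.
It assumes markers of the form shown in the example output above
(# number "file" flags); the function name is_line_marker is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE looks like a preprocessor line marker of the
   form `# <number> "<file>" [flags]`.  Hypothetical helper for a tool
   that post-processes gcc -E output.  */
static bool is_line_marker(const char *line)
{
    assert(line != NULL);

    /* A marker starts with '#' followed by a single space.  */
    if (*line++ != '#')
        return false;
    if (*line++ != ' ')
        return false;

    /* Then at least one digit (the line number).  */
    if (!isdigit((unsigned char) *line))
        return false;
    while (isdigit((unsigned char) *line))
        line++;

    /* Then a space and the opening quote of the file name.  */
    return line[0] == ' ' && line[1] == '"';
}
```

A tool that filters on this before tokenizing would still have to rejoin
the macro expansion that GCC 5 now places on a separate line.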

[PATCH] Use automatic dependencies for mkoffload.o

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 02:56:07PM +0300, Ilya Verbin wrote:
> On Wed, Feb 18, 2015 at 12:48:21 +0100, Thomas Schwinge wrote:
> > What is the rationale for the insn-modes.h order-only prerequisites for
> > mkoffload.o?  Is this simply to get past the build issue which, for
> > example, Jakub also reported for the nvptx mkoffload,
> > 
> > (»missing mkoffload.o dependencies, patch attached«), or is there a
> > better rationale for adding (only) this one (indirect) dependency,
> > instead of listing all of the (real) dependencies?  (After all, we do
> > want the mkoffload executables to be rebuilt if one of their dependencies
> > is updated.)  (I have not yet tried to figure out how to do that.)
> 
> Yes, mkoffload is just not working without this dependency, and works well with
> it.  Do you know the right way how to add all other dependencies?

I've tested this for both intelmic and nvptx and it works fine.
Ok for trunk?

2015-02-18  Jakub Jelinek  

* config/i386/t-intelmic (mkoffload.o): Remove dependency on
insn-modes.h.
(ALL_HOST_OBJS): Add mkoffload.o.
* config/nvptx/t-nvptx (ALL_HOST_OBJS): Likewise.

--- gcc/config/i386/t-intelmic.jj   2014-11-13 15:13:25.0 +0100
+++ gcc/config/i386/t-intelmic  2015-02-18 13:11:15.650820901 +0100
@@ -1,9 +1,10 @@
-mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c | insn-modes.h
+mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 	  -I$(srcdir)/../libgomp \
 	  -DDEFAULT_REAL_TARGET_MACHINE=\"$(real_target_noncanonical)\" \
 	  -DDEFAULT_TARGET_MACHINE=\"$(target_noncanonical)\" \
 	  $< $(OUTPUT_OPTION)
+ALL_HOST_OBJS += mkoffload.o
 
 mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
 	$(COMPILER) -o $@ mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
--- gcc/config/nvptx/t-nvptx.jj 2015-02-18 12:36:20.0 +0100
+++ gcc/config/nvptx/t-nvptx 2015-02-18 13:10:19.822762534 +0100
@@ -3,6 +3,7 @@ CFLAGS-mkoffload.o += $(DRIVER_DEFINES)
 mkoffload.o: $(srcdir)/config/nvptx/mkoffload.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
+ALL_HOST_OBJS += mkoffload.o
 
 mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \


Jakub

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-18 Thread Ilya Verbin

On Wed, Feb 04, 2015 at 15:05:45 +, Julian Brown wrote:
> This (WIP) patch is based on top of a version of your patch that I
> merged to our internal branch: that's still the easiest way for me to
> test the PTX backend (with unloading support) at present, and it passes
> libgomp testing that way. Trunk should be fairly close, but I haven't
> tried applying it there yet.
> 
> The major changes are:
> 
> * The removal of the OpenACC-specific plugin hooks open_device,
>   close_device, set_device_num and get_device_num. The functionality
>   has been moved into the init/fini hooks (for the first two) or moved
>   into the target-independent OpenACC parts, respectively.
> 
> * The PTX mkoffload utility has been extended to support variables as
>   well as function mapping, to fill out support for the load/unload
>   image hooks. (Not really tested so far!)
> 
> * The plugin hooks that are shared between OpenMP and OpenACC now
>   support the "device number" argument properly: that should help with
>   (eventually) unifying the plugin interface for the two APIs. (With
>   set_device_num and get_device_num removed, the plugin is "stateless"
>   with respect to which device is currently active. The rest of the
>   OpenACC hooks -- async functions, etc. -- should probably be changed
>   to take a device number argument too, but that could be a follow-on
>   patch.)
> 
> * The limitation of having only one type of device active simultaneously
>   in the OpenACC runtime has (theoretically!) been removed.
> 
> Thoughts?

Up.

I have no comments here since I'm not familiar with OpenACC and PTX, but I hope
that Thomas and Jakub will review this and my corresponding patches [1], [2]
before the final closure of the trunk.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html
[2] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01912.html

  -- Ilya

Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Maxim Kuvyrkov

On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov  wrote:

> Hi all,
> 
> This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_3
> pattern and its associated splitters. The problem is that for the 2nd
> alternative it will split a right-shift into a SISD left-shift by the negated
> amount to be shifted by (the ushl instruction allows such semantics).
> The splitter generates this RTL:
> 
> (set (match_dup 2)
>   (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
> (set (match_dup 0)
>   (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))
> 
> The problem here is that the shift amount register is negated without telling
> the register allocator about it (and it can't figure it out itself).
> So if you try to use the register that operand 2 is assigned to later on,
> you get the negated shift amount instead!
> 
> The testcase in the patch demonstrates the simple code that can get miscompiled
> due to this behaviour.
> 
> The solution in this patch is to negate the shift amount into the output
> operand (operand 0) and mark it as an earlyclobber in that alternative.
> This is actually exactly what the very similar
> *aarch64_ashr_sisd_or_int_3 pattern does below.
> I believe this is the safest and simplest fix at this stage.
> 
> This bug was exposed on the Linaro 4.9 branch that happened to have the perfect
> storm of costs and register pressure and ended up miscompiling
> the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
> Linaro compiler, generating essentially code like:
> 
> .L141:
>     neg     d8, d8                  // d8 negated!
>     ushl    v0.2s, v11.2s, v8.2s    // shift right => shift left by neg amount
>     fmov    w0, s0
>     <...irrelevant code...>
>     b       .L140
> <...>
> .L140:
>     fmov    w0, s8    // s8/d8 used and incremented assuming it had not changed at L141
>     add     w0, w0, 1
>     fmov    s8, w0
>     fmov    w1, s10
>     cmp     w0, w1
>     bne     .L141
> 
> 
> Basically d8 is negated and later used as if it had not been at .L140 leading
> to completely wrong behaviour.
> 
> With this patch that particular part of the assembly now contains at L141:
>     neg     d0, d8
>     ushl    v0.2s, v11.2s, v0.2s
>     fmov    w0, s0
> 
> Leaving the original shift amount in d8 intact.
> 
> This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
> pattern was introduced for 4.9)
> Bootstrapped and tested on trunk and 4.9.
> 
> Ok for trunk and 4.9?

First of all, applause!  I realize how difficult it was to reduce this problem.

Your patch looks OK to me, but I can't shake off the feeling that it will
pessimize cases when d8 is not used afterwards.  In particular, your patch
makes it impossible to use the same register for the output (operand 0) and
the inputs (operands 1 and 2).

Did you consider using SCRATCHes instead of re-using operand 0 with an early
clobber, as in the attached [untested] patch?  If I got it all correct, the
register allocator will get more freedom in deciding which register to use for
the negated shift temporary, while still allowing the register from operand 0
to be reused for one of the inputs.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org



bz1149.patch
Description: Binary data
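
For readers following the thread, here is a small C model of the semantics
under discussion: a right shift expressed as a ushl by the negated amount,
with the negation kept in a temporary so the caller's shift amount survives.
This is a sketch of the instruction's behaviour for one 32-bit lane, not
code from either patch; ushl_lane and lshr_via_ushl are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model one 32-bit lane of USHL: a positive SHIFT shifts left, a
   negative SHIFT shifts right, and shifting by 32 or more in either
   direction yields 0.  Hypothetical model, for illustration only.  */
static uint32_t ushl_lane(uint32_t value, int8_t shift)
{
    if (shift >= 32 || shift <= -32)
        return 0;
    return shift >= 0 ? value << shift : value >> -shift;
}

/* Logical right shift implemented the way the splitter does it: negate
   the amount, then shift left by the negated amount.  The negated value
   lives in a local temporary, so AMOUNT itself is left untouched; that
   is the property the original pattern failed to preserve.  */
static uint32_t lshr_via_ushl(uint32_t value, uint8_t amount)
{
    assert(amount < 32);
    int8_t negated = (int8_t) -(int) amount;
    return ushl_lane(value, negated);
}
```

The two fixes discussed in the thread differ only in where the temporary
lives: in the earlyclobbered output operand, or in a separate scratch.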

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
> On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek  wrote:
> > On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> > > Do you literally have »nvptx-newlib symlinked into the gcc tree as
> > > newlib«?  If yes, then that should explain the problem: as I wrote in
> > > ,
> > > you need to »add a symbolic link to nvptx-newlib's newlib directory to
> > > the directory containing the GCC sources«, so not link [GCC]/newlib ->
> > > [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> > > resolve the issue?
> 
> (It did.)  Can you suggest a better wording, to make this more clear in
> the documentation?

Your wording is fine, but should be listed on wiki/Offloading and
doc/install.texi perhaps too?

> > offloading fails:
> > 
> > /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
> >  @/tmp/cce9PdmR
> > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > mkoffload: fatal error: 
> > /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
> >  returned 1 exit status
> > compilation terminated.
> > lto-wrapper: fatal error: 
> > /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
> >  returned 1 exit status
> > compilation terminated.
> > /usr/bin/ld: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> > 
> > Is --enable-languages=c,c++,fortran,lto required when configuring the
> > offload compiler?  It isn't required for intelmic.
> 
> Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
> defaults to supporting LTO, and due to this also defaults to building the
> LTO front end.  I'll enhance the nvptx offloading documentation
> accordingly.  Maybe we should add some "magic" to build the LTO front end
> if --enable-as-accelerator-for=[...] has been specified?

Toplevel configure.ac has:
  # If LTO is enabled, add the LTO front end.
  if test "$enable_lto" = "yes" ; then
case ,${enable_languages}, in
  *,lto,*) ;;
  *) enable_languages="${enable_languages},lto" ;;
esac
if test "${build_lto_plugin}" = "yes" ; then
  configdirs="$configdirs lto-plugin"
fi
  fi
so IMHO we want similar snippet for the --enable-as-accelerator-for= case,
perhaps right below this one.  Not building lto FE for the accelerator
compilers make them completely useless, thus I think we really want to do
that automatically.

> Note that I recently added another prerequisite patch for nvptx
> offloading to :
> .
> If that is not applied, you'll get run-time errors because in
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
> can't find main$_omp_fn$0 and similar symbols.

Can you adjust that to add a cgraph flag alongside of the offloadable
instead and use that instead of the attribute?

Jakub

Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Maxim Kuvyrkov

On Feb 18, 2015, at 3:32 PM, Maxim Kuvyrkov  wrote:

> First of all, applause!  I realize how difficult it was to reduce this
> problem.
> 
> Your patch looks OK to me, but I can't shake off the feeling that it will
> pessimize cases when d8 is not used afterwards.  In particular, your patch
> makes it impossible to use the same register for the output (operand 0) and
> the inputs (operands 1 and 2).
> 
> Did you consider using SCRATCHes instead of re-using operand 0 with an early
> clobber, as in the attached [untested] patch?  If I got it all correct, the
> register allocator will get more freedom in deciding which register to use
> for the negated shift temporary, while still allowing the register from
> operand 0 to be reused for one of the inputs.

There is a typo in the patch I sent (mode for last match_scratch should be QI, 
not DI).  Corrected patch attached.

--
Maxim Kuvyrkov
www.linaro.org




bz1149.patch
Description: Binary data

Re: [patch, avr] Fix ICE PR64452 pushing eliminated rtxes

2015-02-18 Thread Denis Chertykov

2015-02-18 14:59 GMT+03:00 Georg-Johann Lay :
> Am 02/17/2015 um 03:34 PM schrieb Denis Chertykov:
>
>> 2015-02-17 14:12 GMT+03:00 Georg-Johann Lay :
>>>
>>> Byte-wise pushing virtual regs like arg pointer might result in patterns
>>> like
>>>
>>>   (set (mem:QI (post_dec:HI (reg:HI 32 SP)))
>>>(subreg:QI (plus:HI (reg:HI 28)
>>>(const_int 17)) 0))
>>>
>>> after elimination.
>>>
>>> Attached patch uses new pushhi1_insn to push virtuals in HImode so that
>>> expressions like in subreg_reg from above can be reloaded.
>>>
>>> Ok to commit ?
>>>
>>> Johann
>>>
>>>  PR target/64452
>>>
>>>  * config/avr/avr.md (pushhi_insn): New insn.
>>>  (push1): Push virtual regs in one chunk using
>>> pushhi1_insn.
>>
>>
>> Approved.
>> (But I'm worried about this because it's a reload-related problem and it
>> can have side effects.)
>>
>> Denis.
>
>
> So you have a superior solution in mind?
>
> What side effects specifically?
>
> Currently the side effect is that reload gets simpler expressions and hence
> does not ICE.  There isn't even an insn that can push complex (plus rtx in
> this case) expressions or subregs thereof.  Even if there were such insns I
> don't think reload is supposed to handle them.
>
> The current implementation of push1 assumes that all RTXes which ever
> appear in a push can be decomposed into subregs and these can be simplified
> to some of the push insns, i.e. the push operand simplifies to REG or
> CONST0_RTX.  The subreg above, however, cannot be simplified to anything
> reload can handle and does not match an insn.  And supplying such an insn is
> pointless because that insn would need a scratch and hence require secondary
> reloads...
>
> plus rtxes are special as they might be produced by reload (R28 above is
> (hard_)frame_pointer).  For similar reason there are two addhi3 insns (one
> without scratch to accommodate reload and one generic with scratch for
> better performance.)

I don't have any concrete objections.
I'm worried because it's not so easy to predict all possible reloads.
(At least for me)

Denis.

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker  wrote:
> Yay? Nay?
>
> Thank you.
>
> On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
>> Hello,
>>
>> This patch enhances config/zlib.m4 to introduce an extra option
>> --with-libz-prefix which allows us to provide the location of
>> the zlib library we want to use during the build.
>>
>> config/ChangeLog:
>>
>> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
>>
>> I didn't see any file in the GCC project that uses this macro,
>> so for the GCC repository, the change to zlib.m4 is it. But
>> I am also attaching to this email a copy of the patch that
>> will be applied to the binutils-gdb.git repository, with all
>> configury using this macro being re-generated - mostly for info,
>> also as a heads-up to both binutils and GDB.
>>
>> This was tested by regenerating all autoconf/automake files in
>> the binutils-gdb project, and rebuilding GDB, using the following
>> combinations:
>>
>>   --with-zlib (system zlib used)
>>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>>
>>   --without-zlib (zlib support turned off)
>>   --without-zlib --with-zlib-prefix (zlib support turned off)
>>
>>   --with-zlib (no system zlib available, configure fails with expected error)
>>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
>>   (no system zlib, configure fails with same error)
>>
>> OK to commit?

Why do you want to turn off zlib?  On Linux/x86, zlib is required
for the assembler.  At least, you should issue an error when --without-libz
is used in binutils for a Linux/x86 target.

I guess someone has asked it before.  Why can't zlib be made the
same as

  --with-mpc=PATH specify prefix directory for installed MPC package.
  Equivalent to --with-mpc-include=PATH/include plus
  --with-mpc-lib=PATH/lib
  --with-mpc-include=PATH specify directory for installed MPC include files
  --with-mpc-lib=PATH specify directory for the installed MPC library

It is more flexible than your patch.  If you have some existing packages
which use your scheme, you can translate the configure command line
options to this one.


-- 
H.J.

Re: [PATCH] Use automatic dependencies for mkoffload.o

2015-02-18 Thread Richard Biener

On Wed, 18 Feb 2015, Jakub Jelinek wrote:

> On Wed, Feb 18, 2015 at 02:56:07PM +0300, Ilya Verbin wrote:
> > On Wed, Feb 18, 2015 at 12:48:21 +0100, Thomas Schwinge wrote:
> > > What is the rationale for the insn-modes.h order-only prerequisites for
> > > mkoffload.o?  Is this simply to get past the build issue which, for
> > > example, Jakub also reported for the nvptx mkoffload,
> > > 
> > > (»missing mkoffload.o dependencies, patch attached«), or is there a
> > > better rationale for adding (only) this one (indirect) dependency,
> > > instead of listing all of the (real) dependencies?  (After all, we do
> > > want the mkoffload executables to be rebuilt if one of their dependencies
> > > is updated.)  (I have not yet tried to figure out how to do that.)
> > 
> > > Yes, mkoffload is just not working without this dependency, and works well with
> > > it.  Do you know the right way how to add all other dependencies?
> 
> I've tested this for both intelmic and nvptx and it works fine.
> Ok for trunk?

Ok.

Thanks,
Richard.

> 2015-02-18  Jakub Jelinek  
> 
>   * config/i386/t-intelmic (mkoffload.o): Remove dependency on
>   insn-modes.h.
>   (ALL_HOST_OBJS): Add mkoffload.o.
>   * config/nvptx/t-nvptx (ALL_HOST_OBJS): Likewise.
> 
> --- gcc/config/i386/t-intelmic.jj 2014-11-13 15:13:25.0 +0100
> +++ gcc/config/i386/t-intelmic 2015-02-18 13:11:15.650820901 +0100
> @@ -1,9 +1,10 @@
> -mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c | insn-modes.h
> +mkoffload.o: $(srcdir)/config/i386/intelmic-mkoffload.c
>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> -I$(srcdir)/../libgomp \
> -DDEFAULT_REAL_TARGET_MACHINE=\"$(real_target_noncanonical)\" \
> -DDEFAULT_TARGET_MACHINE=\"$(target_noncanonical)\" \
> $< $(OUTPUT_OPTION)
> +ALL_HOST_OBJS += mkoffload.o
>  
>  mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
>   $(COMPILER) -o $@ mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
> --- gcc/config/nvptx/t-nvptx.jj   2015-02-18 12:36:20.0 +0100
> +++ gcc/config/nvptx/t-nvptx  2015-02-18 13:10:19.822762534 +0100
> @@ -3,6 +3,7 @@ CFLAGS-mkoffload.o += $(DRIVER_DEFINES)
>  mkoffload.o: $(srcdir)/config/nvptx/mkoffload.c
>   $(COMPILE) $<
>   $(POSTCOMPILE)
> +ALL_HOST_OBJS += mkoffload.o
>  
>  mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
>   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

Re: [PATCH] PR target/65064: Return false for COMMON symbols

2015-02-18 Thread H.J. Lu

On Sun, Feb 15, 2015 at 06:19:21AM -0800, H.J. Lu wrote:
> Hi,
> 
> r220674 exposed a bug in ia64_in_small_data_p.  After r220674, COMMON
> symbols bind locally for executables.  But ia64_in_small_data_p returns
> true for COMMON symbols which are never in small data section.  This patch
> fixes it.  OK for trunk?
> 
> H.J.
> 
> Since COMMON symbols are never in small data section, ia64_in_small_data_p
> should return false for COMMON symbols.
> 
>   PR target/65064
>   * config/ia64/ia64.c (ia64_in_small_data_p): Return false for
>   COMMON symbols.

Although common symbols are defined in executables, they aren't in small
data section.  But a definition in small data section overrides a common
symbol, which still binds locally, and turns a reference to common symbol
to reference to small data section.  Even if ia64_in_small_data_p returns
true on common symbols, sdata_symbolic_operand must return false on common
symbols.  Common symbols are assumed to be placed in small data section,
but are accessed as if they are in normal data section so that they won't
cause any relocation overflow.

Tested by Andreas Schwab.  OK for trunk?

Thanks.

H.J.
---
PR target/65064
* config/ia64/predicates.md (sdata_symbolic_operand): Return false
for common symbols.

diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
index cba0efe..b550882 100644
--- a/gcc/config/ia64/predicates.md
+++ b/gcc/config/ia64/predicates.md
@@ -69,7 +69,12 @@
 of constants here.  */
  t = SYMBOL_REF_DECL (op);
  if (DECL_P (t))
-   t = DECL_SIZE_UNIT (t);
+   {
+ /* Common symbol isn't placed in small data section.  */
+ if (DECL_COMMON (t))
+   return false;
+ t = DECL_SIZE_UNIT (t);
+   }
  else
t = TYPE_SIZE_UNIT (TREE_TYPE (t));
  if (t && tree_fits_shwi_p (t))

Re: [PATCH] PR target/65064: Return false for COMMON symbols

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 5:18 AM, H.J. Lu  wrote:
> On Sun, Feb 15, 2015 at 06:19:21AM -0800, H.J. Lu wrote:
>> Hi,
>>
>> r220674 exposed a bug in ia64_in_small_data_p.  After r220674, COMMON
>> symbols bind locally for executables.  But ia64_in_small_data_p returns
>> true for COMMON symbols which are never in small data section.  This patch
>> fixes it.  OK for trunk?
>>
>> H.J.
>> 
>> Since COMMON symbols are never in small data section, ia64_in_small_data_p
>> should return false for COMMON symbols.
>>
>>   PR target/65064
>>   * config/ia64/ia64.c (ia64_in_small_data_p): Return false for
>>   COMMON symbols.
>
>
> Although common symbols are defined in executables, they aren't in small
> data section.  But a definition in small data section overrides a common
> symbol, which still binds locally, and turns a reference to common symbol
> to reference to small data section.  Even if ia64_in_small_data_p returns
> true on common symbols, sdata_symbolic_operand must return false on common

 ^^ It should be true.
> symbols.  Common symbols are assumed to be placed in small data section,
> but are accessed as if they are in normal data section so that they won't
> cause any relocation overflow.
>
> Tested by Andreas Schwab.  OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> PR target/65064
> * config/ia64/predicates.md (sdata_symbolic_operand): Return false
> for common symbols.
>
> diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
> index cba0efe..b550882 100644
> --- a/gcc/config/ia64/predicates.md
> +++ b/gcc/config/ia64/predicates.md
> @@ -69,7 +69,12 @@
>  of constants here.  */
>   t = SYMBOL_REF_DECL (op);
>   if (DECL_P (t))
> -   t = DECL_SIZE_UNIT (t);
> +   {
> + /* Common symbol isn't placed in small data section.  */
> + if (DECL_COMMON (t))
> +   return false;
> + t = DECL_SIZE_UNIT (t);
> +   }
>   else
> t = TYPE_SIZE_UNIT (TREE_TYPE (t));
>   if (t && tree_fits_shwi_p (t))



-- 
H.J.

Re: [wwwdocs] Porting to again

2015-02-18 Thread Marek Polacek

On Wed, Feb 18, 2015 at 01:16:55PM +0100, Jakub Jelinek wrote:
> On Wed, Feb 18, 2015 at 01:04:30PM +0100, Marek Polacek wrote:
> > --- porting_to.html 10 Feb 2015 11:12:20 -  1.3
> > +++ porting_to.html 18 Feb 2015 12:01:50 -
> > @@ -24,6 +24,17 @@
> >  manner. Additions and suggestions for improvement are welcome.
> >  
> >  
> > +Preprocessor issues
> > +
> > +The preprocessor started to emit line markers to properly distinguish
> > +whether a macro token comes from a system header, or from a normal header
> > +(see PR60723, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60723).
> > +These new markers can cause intriguing problems, if the packages aren't ready
> > +to handle them.  To stop the preprocessor from generating the #line
> > +directives, use the -P option, documented in the Preprocessor Options
> > +section of the manual
> > +(https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#Preprocessor-Options).
> > +
> 
> I think it would be nice to give here some example, like:
> #include <stdlib.h>
> exitfailure EXIT_FAILURE
> and showing that older gcc -E used to emit
> # 2 "test.c" 2
> exitfailure 1
> whereas GCC 5 emits:
> # 2 "test.c" 2
> 
> # 2 "test.c"
> exitfailure 
> # 2 "test.c" 3 4
>1
> and thus it can break simple tools that expect the tokens on a single line.
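
As a hedged sketch of how a simple tool could cope when -P cannot be used: the C helpers below drop lines that look like line markers and join the remaining tokens onto one line. The function names are hypothetical, and the marker test is deliberately loose (a '#' in column one followed by a number).

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if LINE looks like a line marker such as `# 2 "test.c" 3 4':
   a '#' in the first column, optional blanks, then a decimal digit.  */
bool is_line_marker (const char *line)
{
  if (*line != '#')
    return false;
  line++;
  while (*line == ' ' || *line == '\t')
    line++;
  return isdigit ((unsigned char) *line) != 0;
}

/* Copy IN to OUT (capacity OUTSZ), skipping line markers and blank lines
   and joining the remaining lines with single spaces.  Lines that would
   not fit in OUT are silently dropped in this sketch.  */
void join_tokens (const char *in, char *out, size_t outsz)
{
  size_t used = 0;

  if (outsz == 0)
    return;
  out[0] = '\0';

  while (*in)
    {
      const char *end = strchr (in, '\n');
      size_t len = end ? (size_t) (end - in) : strlen (in);
      const char *next = end ? end + 1 : in + len;

      if (!is_line_marker (in))
        {
          /* Trim blanks at both ends of this line.  */
          while (len > 0 && isspace ((unsigned char) in[0]))
            in++, len--;
          while (len > 0 && isspace ((unsigned char) in[len - 1]))
            len--;
          /* Need room for a separator, the text, and the terminator.  */
          if (len > 0 && used + len + 2 <= outsz)
            {
              if (used > 0)
                out[used++] = ' ';
              memcpy (out + used, in, len);
              used += len;
              out[used] = '\0';
            }
        }
      in = next;
    }
}
```

Fed the GCC 5 output quoted above, join_tokens yields the single line the older output had.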

Added.

> Otherwise, LGTM.

Thanks, committed now.

Marek

Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Kyrill Tkachov



On 18/02/15 12:32, Maxim Kuvyrkov wrote:

On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov  wrote:


Hi all,

This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_3
pattern and its associated splitters. The problem is that for the 2nd
alternative it will split a right-shift into a SISD left-shift by the negated
amount to be shifted by (the ushl instruction allows such semantics).
The splitter generates this RTL:

(set (match_dup 2)
   (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
   (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))

The problem here is that the shift amount register is negated without telling
the register allocator about it (and it can't figure it out itself).
So if you try to use the register that operand 2 is assigned to later on,
you get the negated shift amount instead!
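
To make the failure mode concrete, here is a small C model (an illustration under stated assumptions, not the RTL or the patch itself). ushl_model is a hypothetical stand-in for the USHL semantics, where a negative amount shifts right; the "register" holding the shift amount is modelled as a variable passed by pointer.

```c
#include <stdint.h>

/* Hypothetical model of a USHL-style shift: a non-negative AMOUNT shifts
   left, a negative AMOUNT shifts right by its magnitude.  */
static uint32_t ushl_model (uint32_t value, int8_t amount)
{
  if (amount >= 32 || amount <= -32)
    return 0;
  return amount >= 0 ? value << amount : value >> -amount;
}

/* Model of the buggy split: the shift amount register is negated in
   place, so any later use of *SHIFT_REG sees the negated value.  */
uint32_t lshr_buggy (uint32_t value, int8_t *shift_reg)
{
  *shift_reg = (int8_t) -*shift_reg;
  return ushl_model (value, *shift_reg);
}

/* Model of the fix: the negated amount goes into a separate location
   (the patch uses the output operand), leaving *SHIFT_REG intact.  */
uint32_t lshr_fixed (uint32_t value, const int8_t *shift_reg)
{
  int8_t tmp = (int8_t) -*shift_reg;
  return ushl_model (value, tmp);
}
```

Both functions return the same shifted value; they differ only in what is left behind in the shift amount, which is exactly what a later user of that register observes.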

The testcase in the patch demonstrates the simple code that can get miscompiled
due to this behaviour.

The solution in this patch is to negate the shift amount into the output
operand (operand 0) and mark it as an earlyclobber in that alternative.
This is actually exactly what the very similar
*aarch64_ashr_sisd_or_int_3 pattern does below.
I believe this is the safest and simplest fix at this stage.

This bug was exposed on the Linaro 4.9 branch that happened to have the perfect
storm of costs and register pressure and ended up miscompiling
the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
Linaro compiler, generating essentially code like:

.L141:
    neg     d8, d8          // d8 negated!
    ushl    v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
    fmov    w0, s0
    <...irrelevant code...>
    b       .L140
<...>
.L140:
    fmov    w0, s8          // s8/d8 used and incremented assuming it had not changed at L141
    add     w0, w0, 1
    fmov    s8, w0
    fmov    w1, s10
    cmp     w0, w1
    bne     .L141


Basically d8 is negated and later used as if it had not been at .L140 leading
to completely wrong behaviour.

With this patch that particular part of the assembly now contains at L141:
    neg     d0, d8
    ushl    v0.2s, v11.2s, v0.2s
    fmov    w0, s0

Leaving the original shift amount in d8 intact.

This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
pattern was introduced for 4.9)
Bootstrapped and tested on trunk and 4.9.

Ok for trunk and 4.9?

First of all, applauses!  I realize how difficult it was to reduce this problem.


Thanks!


Your patch looks OK to me, but I can't shake off feeling that it will pessimize 
cases when d8 is not used afterwards.  In particular, your patch makes it 
impossible to use same register for output (operand 0) and inputs (operands 1 
and 2).

Did you consider using SCRATCHes instead of re-using operand 0 with early 
clobber like in the attached [untested] patch?  If I got it all correct, 
register allocator will get more freedom in deciding which register to use for 
negated shift temporary, while still allowing reusing register from operand 0 
for one of the inputs.


I considered it (but didn't try it) because we end up demanding a 
scratch register unnecessarily for the two alternatives that don't split 
which might pessimize register allocation.


For stage 4 I think my proposed fix is the minimal one and it keeps 
consistent with the other patterns in that area that were added all 
together with:

https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

Kyrill



Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Maxim Kuvyrkov

On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov  wrote:

> 
> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov  wrote:
>> 
>>> Hi all,
>>> 
>>> This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_3
>>> pattern and its associated splitters. The problem is that for the 2nd
>>> alternative it will split a right-shift into a SISD left-shift by the 
>>> negated
>>> amount to be shifted by (the ushl instruction allows such semantics).
>>> The splitter generates this RTL:
>>> 
>>> (set (match_dup 2)
>>>   (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
>>> (set (match_dup 0)
>>>   (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))
>>> 
>>> The problem here is that the shift amount register is negated without 
>>> telling
>>> the register allocator about it (and it can't figure it out itself).
>>> So if you try to use the register that operand 2 is assigned to later on,
>>> you get the negated shift amount instead!
>>> 
>>> The testcase in the patch demonstrates the simple code that can get 
>>> miscompiled
>>> due to this behaviour.
>>> 
>>> The solution in this patch is to negate the shift amount into the output
>>> operand (operand 0) and mark it as an earlyclobber in that alternative.
>>> This is actually exactly what the very similar
>>> *aarch64_ashr_sisd_or_int_3 pattern does below.
>>> I believe this is the safest and simplest fix at this stage.
>>> 
>>> This bug was exposed on the Linaro 4.9 branch that happened to have the 
>>> perfect
>>> storm of costs and register pressure and ended up miscompiling
>>> the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
>>> Linaro compiler, generating essentially code like:
>>> 
>>> .L141:
>>>     neg     d8, d8          // d8 negated!
>>>     ushl    v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
>>>     fmov    w0, s0
>>>     <...irrelevant code...>
>>>     b       .L140
>>> <...>
>>> .L140:
>>>     fmov    w0, s8          // s8/d8 used and incremented assuming it had not changed at L141
>>>     add     w0, w0, 1
>>>     fmov    s8, w0
>>>     fmov    w1, s10
>>>     cmp     w0, w1
>>>     bne     .L141
>>> 
>>> 
>>> Basically d8 is negated and later used as if it had not been at .L140 
>>> leading
>>> to completely wrong behaviour.
>>> 
>>> With this patch that particular part of the assembly now contains at L141:
>>>     neg     d0, d8
>>>     ushl    v0.2s, v11.2s, v0.2s
>>>     fmov    w0, s0
>>> 
>>> Leaving the original shift amount in d8 intact.
>>> 
>>> This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
>>> pattern was introduced for 4.9)
>>> Bootstrapped and tested on trunk and 4.9.
>>> 
>>> Ok for trunk and 4.9?
>> First of all, applauses!  I realize how difficult it was to reduce this 
>> problem.
> 
> Thanks!
>> 
>> Your patch looks OK to me, but I can't shake off feeling that it will 
>> pessimize cases when d8 is not used afterwards.  In particular, your patch 
>> makes it impossible to use same register for output (operand 0) and inputs 
>> (operands 1 and 2).
>> 
>> Did you consider using SCRATCHes instead of re-using operand 0 with early 
>> clobber like in the attached [untested] patch?  If I got it all correct, 
>> register allocator will get more freedom in deciding which register to use 
>> for negated shift temporary, while still allowing reusing register from 
>> operand 0 for one of the inputs.
> 
> I considered it (but didn't try it) because we end up demanding a scratch 
> register unnecessarily for the two alternatives that don't split which might 
> pessimize register allocation.

That's not the case.  The "X" constraint in (match_scratch) is special; it 
tells RA to not allocate register.

> 
> For stage 4 I think my proposed fix is the minimal one and it keeps 
> consistent with the other patterns in that area that were added all together 
> with:
> https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

I think this approach is OK, as long as we revisit the possibility of using 
SCRATCHes in these and similar patterns at stage 1.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns

2015-02-18 Thread Kyrill Tkachov



On 18/02/15 13:46, Maxim Kuvyrkov wrote:

On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov  wrote:


On 18/02/15 12:32, Maxim Kuvyrkov wrote:

On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov  wrote:


Hi all,

This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_3
pattern and its associated splitters. The problem is that for the 2nd
alternative it will split a right-shift into a SISD left-shift by the negated
amount to be shifted by (the ushl instruction allows such semantics).
The splitter generates this RTL:

(set (match_dup 2)
   (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
   (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))

The problem here is that the shift amount register is negated without telling
the register allocator about it (and it can't figure it out itself).
So if you try to use the register that operand 2 is assigned to later on,
you get the negated shift amount instead!

The testcase in the patch demonstrates the simple code that can get miscompiled
due to this behaviour.

The solution in this patch is to negate the shift amount into the output
operand (operand 0) and mark it as an earlyclobber in that alternative.
This is actually exactly what the very similar
*aarch64_ashr_sisd_or_int_3 pattern does below.
I believe this is the safest and simplest fix at this stage.

This bug was exposed on the Linaro 4.9 branch that happened to have the perfect
storm of costs and register pressure and ended up miscompiling
the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
Linaro compiler, generating essentially code like:

.L141:
    neg     d8, d8          // d8 negated!
    ushl    v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
    fmov    w0, s0
    <...irrelevant code...>
    b       .L140
<...>
.L140:
    fmov    w0, s8          // s8/d8 used and incremented assuming it had not changed at L141
    add     w0, w0, 1
    fmov    s8, w0
    fmov    w1, s10
    cmp     w0, w1
    bne     .L141


Basically d8 is negated and later used as if it had not been at .L140 leading
to completely wrong behaviour.

With this patch that particular part of the assembly now contains at L141:
    neg     d0, d8
    ushl    v0.2s, v11.2s, v0.2s
    fmov    w0, s0

Leaving the original shift amount in d8 intact.

This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
pattern was introduced for 4.9)
Bootstrapped and tested on trunk and 4.9.

Ok for trunk and 4.9?

First of all, applauses!  I realize how difficult it was to reduce this problem.

Thanks!

Your patch looks OK to me, but I can't shake off feeling that it will pessimize 
cases when d8 is not used afterwards.  In particular, your patch makes it 
impossible to use same register for output (operand 0) and inputs (operands 1 
and 2).

Did you consider using SCRATCHes instead of re-using operand 0 with early 
clobber like in the attached [untested] patch?  If I got it all correct, 
register allocator will get more freedom in deciding which register to use for 
negated shift temporary, while still allowing reusing register from operand 0 
for one of the inputs.

I considered it (but didn't try it) because we end up demanding a scratch 
register unnecessarily for the two alternatives that don't split which might 
pessimize register allocation.

That's not the case.  The "X" constraint in (match_scratch) is special; it 
tells RA to not allocate register.


For stage 4 I think my proposed fix is the minimal one and it keeps consistent 
with the other patterns in that area that were added all together with:
https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

I think this approach is OK, as long as we revisit the possibility of using 
SCRATCHes in these and similar patterns at stage 1.


Ok, these patterns could do with some refactoring anyway (I think 
merging some in define_insn_and_split could be done). We can look at 
them next stage1.


Thanks,
Kyrill



Thanks,

--
Maxim Kuvyrkov
www.linaro.org

Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)

2015-02-18 Thread Martin Liška


On 02/17/2015 10:03 PM, Jan Hubicka wrote:

Hi,
this patch should chase away the expensive thunks and aliases walks from most
of analysis code. I think the only real use left is the local_p predicate that needs to
stay because i386 expects the local flag to match between caller and callee when
expanding assembler thunk. I at least optimized it by first moving the walk to
be conditional for nonlocal functions only and then reorganizing
call_for_symbol_thunks_and_aliases to first inspect aliases (that is cheap) and
only then work on thunks.  Most likely this will find the non-local thunk/alias
faster.  Other cases were leftovers from the conversion of thunks from aliases
to functions.
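
A rough C sketch of the two ideas described above, using hypothetical types rather than the real cgraph_node API: the walk visits the cheap alias list before the more expensive list that stands in for thunks, and the locality walk is skipped entirely when the node is already known to be local.

```c
#include <stdbool.h>
#include <stddef.h>

struct node;
typedef bool (*node_callback) (struct node *, void *);

/* Hypothetical stand-in for a call graph node.  */
struct node
{
  bool local;            /* cached result: already known to be local */
  bool local_candidate;  /* this node by itself looks local */
  struct node **aliases; /* cheap to enumerate */
  size_t n_aliases;
  struct node **thunks;  /* stand-in for the costlier caller-list scan */
  size_t n_thunks;
};

/* Call CB on N, then on its aliases, and only then on its thunks.
   Stop and return true as soon as CB returns true.  */
bool walk_node (struct node *n, node_callback cb, void *data)
{
  if (cb (n, data))
    return true;
  for (size_t i = 0; i < n->n_aliases; i++)
    if (walk_node (n->aliases[i], cb, data))
      return true;
  for (size_t i = 0; i < n->n_thunks; i++)
    if (walk_node (n->thunks[i], cb, data))
      return true;
  return false;
}

/* Callback: count the visit in *DATA and report a non-local node.  */
static bool non_local_p (struct node *n, void *data)
{
  ++*(int *) data;
  return !n->local_candidate;
}

/* Recompute N's local flag only if it is not already set.  */
bool update_local (struct node *n, int *visited)
{
  if (!n->local)
    n->local = !walk_node (n, non_local_p, visited);
  return n->local;
}
```

With a non-local alias present, the walk stops before touching any thunk, which is the effect the reordering is after.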

I also noticed a bug in ipa-profile that does not disable all the
transforms with !ipa_profile_flag used on OPTIMIZATION_NODE and fixed it.

Bootstrapped/regtested x86_64-linux, committed.  I would be interested to
know if the call_for_symbol_thunks_and_aliases is now off your oprofiles
(sorry, easier to type than perf-profiles)

Honza

* ipa-visibility.c (function_and_variable_visibility): Only
check locality if node is not already local.
* ipa-inline.c (want_inline_function_to_all_callers_p): Use
call_for_symbol_and_aliases instead of
call_for_symbol_thunks_and_aliases.
(ipa_inline): Likewise.
* cgraph.c (cgraph_node::call_for_symbol_thunks_and_aliases):
first walk aliases.
* ipa.c (symbol_table::remove_unreachable_nodes): Use
call_for_symbol_and_aliases.
* ipa-profile.c (ipa_propagate_frequency_data): Add function_symbol.
(ipa_propagate_frequency_1): Use it; use opt_for_fn
(ipa_propagate_frequency): Update.
(ipa_profile): Add opt_for_fn guards.
Index: ipa-visibility.c
===
--- ipa-visibility.c(revision 220741)
+++ ipa-visibility.c(working copy)
@@ -595,7 +595,8 @@ function_and_variable_visibility (bool w
  }
FOR_EACH_DEFINED_FUNCTION (node)
  {
-  node->local.local |= node->local_p ();
+  if (!node->local.local)
+node->local.local |= node->local_p ();

/* If we know that function can not be overwritten by a different 
semantics
 and moreover its section can not be discarded, replace all direct calls
Index: ipa-inline.c
===
--- ipa-inline.c(revision 220741)
+++ ipa-inline.c(working copy)
@@ -975,14 +975,14 @@ want_inline_function_to_all_callers_p (s
if (node->global.inlined_to)
  return false;
/* Does it have callers?  */
-  if (!node->call_for_symbol_thunks_and_aliases (has_caller_p, NULL, true))
+  if (!node->call_for_symbol_and_aliases (has_caller_p, NULL, true))
  return false;
/* Inlining into all callers would increase size?  */
if (estimate_growth (node) > 0)
  return false;
/* All inlines must be possible.  */
-  if (node->call_for_symbol_thunks_and_aliases (check_callers, &has_hot_call,
-   true))
+  if (node->call_for_symbol_and_aliases (check_callers, &has_hot_call,
+true))
  return false;
if (!cold && !has_hot_call)
  return false;
@@ -2359,9 +2359,9 @@ ipa_inline (void)
  if (want_inline_function_to_all_callers_p (node, cold))
{
  int num_calls = 0;
- node->call_for_symbol_thunks_and_aliases (sum_callers, &num_calls,
- true);
- while (node->call_for_symbol_thunks_and_aliases
+ node->call_for_symbol_and_aliases (sum_callers, &num_calls,
+true);
+ while (node->call_for_symbol_and_aliases
   (inline_to_all_callers, &num_calls, true))
;
  remove_functions = true;
Index: cgraph.c
===
--- cgraph.c(revision 220741)
+++ cgraph.c(working copy)
@@ -2191,6 +2191,16 @@ cgraph_node::call_for_symbol_thunks_and_

if (callback (this, data))
  return true;
+  FOR_EACH_ALIAS (this, ref)
+{
+  cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
+  if (include_overwritable
+ || alias->get_availability () > AVAIL_INTERPOSABLE)
+   if (alias->call_for_symbol_thunks_and_aliases (callback, data,
+include_overwritable,
+exclude_virtual_thunks))
+ return true;
+}
for (e = callers; e; e = e->next_caller)
  if (e->caller->thunk.thunk_p
&& (include_overwritable
@@ -2202,16 +2212,6 @@ cgraph_node::call_for_symbol_thunks_and_
   exclude_virtual_thunks))
return true;

-  FOR_EACH_ALIAS (this, ref)
-{
-  cgraph_n

Re: [patch] Warn on undefined loop exit

2015-02-18 Thread Jakub Jelinek

On Thu, Nov 20, 2014 at 05:27:35PM +0100, Richard Biener wrote:
> On Wed, Nov 19, 2014 at 9:19 PM, Andrew Stubbs  wrote:
> > On 19/11/14 16:39, Marek Polacek wrote:
> >>
> >> On Wed, Nov 19, 2014 at 04:32:43PM +, Andrew Stubbs wrote:
> >>>
> >>> +if (warning_at (gimple_location (elt->stmt),
> >>> +OPT_Waggressive_loop_optimizations,
> >>> +"Loop exit may only be reached after
> >>> undefined behaviour."))
> >>
> >>
> >> Warnings should start with a lowercase and should be without
> >> a fullstop at the end.
> >
> >
> > Fixed, and I spotted a britishism too.
> 
> If it's really duplicated code can you split it out to a function?
> 
> +  if (OPT_Waggressive_loop_optimizations)
> +{
> 
> this doesn't do what you think it does ;)  The variable to check is
> warn_aggressive_loop_optimizations.
> 
> +  if (exit_warned && problem_stmts != vNULL)
> +{
> 
> !problem_stmts.empty ()
> 
> Otherwise it looks ok.

This caused PR64491.  If the loop has multiple exits, the loop might be
exited earlier and so it would be just fine if the other loop exit may only be
reached after undefined behavior.
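
A minimal C illustration of that situation (a hypothetical example, not the PR64491 testcase): the exit through the loop condition can only be reached after an out-of-bounds read, yet the function is fine for every caller that passes a key present in the table, because the early return leaves the loop first.

```c
#include <stddef.h>

#define N 4
static const int table[N] = { 3, 5, 7, 11 };

/* Return the index of KEY in TABLE.  The caller must pass a key that is
   present.  The bound `i <= N' means the exit via the loop condition is
   only reachable after reading table[N], which is undefined, but the
   early `return' is a second exit that is always taken first for a
   present key.  */
int find_index (int key)
{
  for (size_t i = 0; i <= N; i++)
    if (table[i] == key)
      return (int) i;
  return -1;
}
```

A warning that fires merely because one exit is reachable only after undefined behavior would flag this loop even though no valid call ever gets there.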

Jakub

Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)

2015-02-18 Thread Martin Liška

 phase opt and generate  :  42.32 (70%) usr   0.85 (56%) sys  43.16 (69%) wall 1387464 kB (28%) ggc
 phase stream in         :  18.50 (30%) usr   0.68 (44%) sys  19.17 (31%) wall 3528077 kB (72%) ggc
 garbage collection      :   2.24 ( 4%) usr   0.00 ( 0%) sys   2.24 ( 4%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.37 ( 1%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall      38 kB ( 0%) ggc
 ipa dead code removal   :   3.06 ( 5%) usr   0.01 ( 1%) sys   2.88 ( 5%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   5.72 ( 9%) usr   0.06 ( 4%) sys   5.87 ( 9%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall   22382 kB ( 0%) ggc
 ipa cp                  :   2.88 ( 5%) usr   0.09 ( 6%) sys   2.97 ( 5%) wall  515623 kB (10%) ggc
 ipa inlining heuristics :  13.96 (23%) usr   0.13 ( 8%) sys  14.12 (23%) wall  471848 kB (10%) ggc
 ipa comdats             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   2.54 ( 4%) usr   0.48 (31%) sys   3.23 ( 5%) wall  645652 kB (13%) ggc
 ipa lto decl in         :  12.64 (21%) usr   0.37 (24%) sys  13.01 (21%) wall 2592737 kB (53%) ggc
 ipa lto constructors in :   0.17 ( 0%) usr   0.01 ( 1%) sys   0.20 ( 0%) wall   16493 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.58 ( 1%) usr   0.09 ( 6%) sys   0.67 ( 1%) wall  437504 kB ( 9%) ggc
 ipa lto decl merge      :   1.90 ( 3%) usr   0.00 ( 0%) sys   1.90 ( 3%) wall    8191 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.30 ( 2%) usr   0.00 ( 0%) sys   1.29 ( 2%) wall   14989 kB ( 0%) ggc
 whopr wpa               :   0.91 ( 1%) usr   0.00 ( 0%) sys   0.88 ( 1%) wall       2 kB ( 0%) ggc
 whopr partitioning      :   2.66 ( 4%) usr   0.00 ( 0%) sys   2.67 ( 4%) wall    6081 kB ( 0%) ggc
 ipa reference           :   1.38 ( 2%) usr   0.01 ( 1%) sys   1.40 ( 2%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.21 ( 0%) usr   0.01 ( 1%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   1.61 ( 3%) usr   0.01 ( 1%) sys   1.61 ( 3%) wall       0 kB ( 0%) ggc
 ipa icf                 :   4.99 ( 8%) usr   0.06 ( 4%) sys   5.00 ( 8%) wall    1120 kB ( 0%) ggc
 tree SSA rewrite        :   0.12 ( 0%) usr   0.02 ( 1%) sys   0.12 ( 0%) wall   23170 kB ( 0%) ggc
 tree SSA incremental    :   0.23 ( 0%) usr   0.05 ( 3%) sys   0.21 ( 0%) wall   14434 kB ( 0%) ggc
 tree operand scan       :   0.14 ( 0%) usr   0.03 ( 2%) sys   0.22 ( 0%) wall  145252 kB ( 3%) ggc
 dominance frontiers     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.14 ( 0%) usr   0.05 ( 3%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 loop fini               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.62 ( 1%) usr   0.00 ( 0%) sys   0.65 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                   :  60.82             1.53            62.34            4917531 kB
[ perf record: Woken up 59 times to write data ]
[ perf record: Captured and wrote 14.722 MB perf.data (~643202 samples) ]
marxin@marxinbox:~/Programming/gecko-dev/obj-x86_64-unknown-linux-gnu/toolkit/library> perf report
marxin@marxinbox:~/Programming/gecko-dev/obj-x86_64-unknown-linux-gnu/toolkit/library> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/marxin/Programming/bin/gcc2/lib/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-libsanitizer --prefix=/home/marxin/Programming/bin/gcc2 --disable-bootstrap --enable-checking=release
Thread model: posix
gcc version 5.0.0 20150218 (experimental) (GCC)
marxin@marxinbox:~/Programming/gecko-dev/obj-x86_64-unknown-linux-gnu/toolkit/library> perf report
marxin@marxinbox:~/Programming/gecko-dev/obj-x86_64-unknown-linux-gnu/toolkit/library> perf report --stdio | sed 's/\ *$//' | head -n50
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 245K of event 'cycles'
# Event count (approx.): 216467422123
#
# Overhead   Command  Shared Object
#     .  
..
#
 4.97%  lto1-wpa  lto1   [.] inflate_fast
 2.78%  lto1-wpa  lto1   [.] symbol_table::remove_unreachable_nodes(_IO_FILE*)
 2.37%  lto1-wpa  libc-2.19.so   [.] _int_malloc
 1.77%  lto1-wpa  lto1   [.] 
record_target_from_binfo(vec&, vec*, tree_node*, tree_node*, vec&, 
lon

Re: [PATCH] PR rtl-optimization/32219: optimizer causees wrong code in pic/hidden/weak symbol checking

2015-02-18 Thread Alex Velenko


On 13/02/15 05:11, Richard Henderson wrote:

On 02/12/2015 08:14 PM, H.J. Lu wrote:

I tried the second patch.  Results look good on Linux/x86-64.


Thanks.  My results concur.  I went ahead and installed the patch as posted.


r~


2015-02-12  H.J. Lu  
 Richard Henderson  

 PR rtl/32219
 * cgraphunit.c (cgraph_node::finalize_function): Set definition
 before notice_global_symbol.
 (varpool_node::finalize_decl): Likewise.
 * varasm.c (default_binds_local_p_2): Rename from
 default_binds_local_p_1, add weak_dominate argument.  Use direct
 returns instead of assigning to local variable.  Unify varpool and
 cgraph paths via symtab_node.  Reject undef weak variables before
 testing visibility.  Reorder tests for simplicity.
 (default_binds_local_p): Use default_binds_local_p_2.
 (default_binds_local_p_1): Likewise.
 (decl_binds_to_current_def_p): Unify varpool and cgraph paths
 via symtab_node.
 (default_elf_asm_output_external): Emit visibility when specified.

2015-02-12  H.J. Lu  

 PR rtl/32219
 * gcc.dg/visibility-22.c: New test.
 * gcc.dg/visibility-23.c: New test.
 * gcc.target/i386/pr32219-1.c: New test.
 * gcc.target/i386/pr32219-2.c: New test.
 * gcc.target/i386/pr32219-3.c: New test.
 * gcc.target/i386/pr32219-4.c: New test.
 * gcc.target/i386/pr32219-5.c: New test.
 * gcc.target/i386/pr32219-6.c: New test.
 * gcc.target/i386/pr32219-7.c: New test.
 * gcc.target/i386/pr32219-8.c: New test.
 * gcc.target/i386/pr64317.c: Expect GOTOFF, not GOT.



Hi all,
By changing behaviour of varasm.c:default_binds_local_p, this patch 
changes behaviour of gcc/config/arm/arm.c:arm_function_in_section_p and 
through it breaks gcc/config/arm/arm.c:arm_is_long_call_p for weak symbols.


As a result, I get regression for gcc.target/arm/long-calls-1.c on
arm-none-eabi:
FAIL: gcc.target/arm/long-calls-1.c scan-assembler-not \tbl?\tweak_l1\n
FAIL: gcc.target/arm/long-calls-1.c scan-assembler-not \tbl?\tweak_l3\n

In https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html there
is a description for -mlong-calls.

This has to be fixed.

Kind regards,
Alex

Re: patch to fix rtl documentation for new floating point comparisons

2015-02-18 Thread Kenneth Zadeck





> On Feb 18, 2015, at 3:23 AM, Joseph Myers  wrote:
> 
>> On Tue, 17 Feb 2015, Kenneth Zadeck wrote:
>> 
>> The fp exceptions raise some very tricky issues with respect to gcc and 
>> optimization.  On many machines, noisy does not mean to throw an 
>> exception, it means that you set a bit and then check later.  If you try 
>> to model this kind of behavior in gcc, you end up pinning the code so 
>> that nothing can be moved or reordered.
> 
> When I say exception here, I'm always referring to that flag bit setting, 
> not to processor-level exceptions.  In IEEE 754 terms, an exception is 
> *signaled*, and the default exception handling is to *raise* a flag and 
> deliver a default result (except for exact underflow which doesn't raise 
> the flag).
> 
> To quote Annex F, "This specification does not require support for trap 
> handlers that maintain information about the order or count of 
> floating-point exceptions. Therefore, between function calls, 
> floating-point exceptions need not be precise: the actual order and number 
> of occurrences of floating-point exceptions (> 1) may vary from what the 
> source code expresses.".  So it is not necessary to be concerned about 
> configurations where trap handlers may be called.
> 
> There is as yet no public draft of TS 18661-5 (Supplementary attributes).  
> That will provide C bindings for alternate exception handling as described 
> in IEEE 754-2008 clause 8.  I suspect such bindings will not readily be 
> efficiently implementable using processor-level exception handlers; SIGFPE 
> is an awkward interface for implementing such things at the C language 
> level, some processors do not support such trap handlers at all (e.g. many 
> ARM processors), and where traps are supported they may be asynchronous 
> rather than occurring immediately on execution of the relevant 
> instruction.  In addition, at least x86 does not support raising exception 
> flags without running trap handlers on the next floating-point instruction 
> (raiseFlags operation, fesetexcept in TS 18661-1); that is, if trap 
> handlers were used to implement standard functionality, it would need to 
> be in a way such that this x86 peculiarity is not visible.
my point here is that what you want to be able to do is freely reorder the fp 
operations ( within the rules of reordering fp operations) between places where 
those bits are explicitly read or cleared.   we have no way to model that 
chain of modify operations in gcc.
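
a small C model of that property, offered as a sketch only: the flag word and the function names are hypothetical and merely simulate a sticky status register (no real fenv.h calls are used). because each operation only ORs bits into the flags, any order of the operations between a clear and a test gives the same answer.

```c
/* Hypothetical sticky exception flags, modelled as a bit mask.  */
enum { FLAG_DIVBYZERO = 1, FLAG_INVALID = 2 };

static unsigned sticky_flags;   /* stands in for the status register */

void clear_flags (void) { sticky_flags = 0; }
unsigned test_flags (void) { return sticky_flags; }

/* Modelled division: raises flags by OR-ing them in, never clears them.
   The returned value for the exceptional cases is a placeholder; the
   IEEE default results (infinity, NaN) are not modelled here.  */
double model_div (double a, double b)
{
  if (b == 0.0)
    {
      sticky_flags |= (a == 0.0) ? FLAG_INVALID : FLAG_DIVBYZERO;
      return 0.0;
    }
  return a / b;
}

/* Run the same two operations in either order between a clear and a
   test, and return the flags observed at the test.  */
unsigned flags_after (int swapped)
{
  clear_flags ();
  if (swapped)
    {
      (void) model_div (0.0, 0.0);
      (void) model_div (1.0, 0.0);
    }
  else
    {
      (void) model_div (1.0, 0.0);
      (void) model_div (0.0, 0.0);
    }
  return test_flags ();
}
```

the two orders are indistinguishable at the test point, which is the freedom a dependency of this kind would have to express.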
> 
>> to get this right gcc needs something like a monotonic dependency which 
>> would allow reordering and gcc has nothing like this.  essentially, you 
>> need way to say that all of these insns modify the same variable, but 
>> they all just move the value in the same direction so you do not care 
>> what order the operations are performed in.  that does not mean that 
>> this could not be added but gcc has nothing like this.
> 
> Indeed, this is one of the things about defining the default mode that I 
> referred to; the present default is -ftrapping-math, but we may wish to 
> distinguish between strict trapping-math (whenever exception flags might 
> be tested / raised / lowered, exactly the computations specified by the 
> abstract machine have occurred, which might mean rather more limits on 
> code movement in the absence of monotonic dependencies) and loose trapping 
> math (like the present default; maybe don't transform expressions locally 
> in ways that add or remove exceptions, but don't treat an expression as 
> having side effects or reading global state purely because of possible 
> raising of floating-point exceptions).
> 
>> going back to the rounding modes issue, there is a huge range in the 
>> architectural implementation space.  you have a few that are pure 
>> dynamic, a few that are pure static and some in the middle that are just 
>> a mess.  a lot of machines would have liked to support fully static, but 
>> could not fit the bits to specify the rounding modes into the 
>> instruction.  my point here is you do need to at least have a plan that 
>> will support the full space even if you do this with a 1000 small 
>> patches.
> 
> I think the norm is dynamic, because that's what was in IEEE 754-1985, 
> with static rounding added more recently on some processors, because of 
> IEEE 754-2008.  (There are other variants - IA64 having multiple dynamic 
> rounding mode registers and allowing instructions to specify which one the 
> rounding mode is taken from.)
the first ieee standard only allowed the dynamic model.   the second allows the 
static model.   while dynamic is more common, there are/were architectures that 
are fully static.   i believe that the first sparks were fully static and this 
was why the standard changed. ( i could be completely wrong on which arch was 
the first fully static).  the private port that i am working on is currently 
fully static, but i am trying to change that.   code generation of a dynamic 
program

[PATCH] Fix testsuite race on additional_sources

2015-02-18 Thread Maxim Kuvyrkov

Hi,

This testsuite patch fixes a race on the additional_sources testsuite variable.  When 
a test has both dg-additional-sources and "dg-do run { target FOO }" 
directives, it may occur that the FOO test will attempt to use 
additional_sources, which will result in failure to compile FOO test.  It often 
happens that FOO test was done for one of the previous testcases (which didn't 
use dg-additional-sources), so the failure case is not stable.

This behavior can be more-or-less reliably triggered with

make check-gcc RUNTESTFLAGS="i386.exp=gcc.target/i386/pr64291-1.c"

The attached patch fixes the problem.  OK for trunk and 4.9 branch?

Thanks,

--
Maxim Kuvyrkov
www.linaro.org



fix-race-on-additional_sources.ChangeLog
Description: Binary data


fix-race-on-additional_sources.patch
Description: Binary data

Re: patch to fix rtl documentation for new floating point comparisons

2015-02-18 Thread Joseph Myers

On Wed, 18 Feb 2015, Kenneth Zadeck wrote:

> > I think the norm is dynamic, because that's what was in IEEE 754-1985, 
> > with static rounding added more recently on some processors, because of 
> > IEEE 754-2008.  (There are other variants - IA64 having multiple dynamic 
> > rounding mode registers and allowing instructions to specify which one the 
> > rounding mode is taken from.)
> the first ieee standard only allowed the dynamic model.  the second 
> allows the static model.  while dynamic is more common, there are/were 
> architectures that are fully static.  i believe that the first SPARCs 
> were fully static and this was why the standard changed. ( i could be 
> completely wrong on which arch was the first fully static).  the private 
> port that i am working on is currently fully static, but i am trying to 
> change that.  code generation of a dynamic program on a fully static 
> machine is gruesome.
> 
> my point here is that there are fully static machines so do not do 
> anything that precludes this.

The C99 standard was hardly designed for such systems, given the 
expectation that you can set the rounding mode with fesetround and then 
have it affect library functions (those that are fully-defined operations 
such as sqrt and fma, that is).  It's not that you can't implement it on 
such a system (by having the functions contain a switch over the 
thread-local variable with the rounding mode, for example, and doing the 
same in all user code that enables FENV_ACCESS and might possibly run in 
non-default rounding modes), but it's not exactly convenient.
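
The switch-based approach described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: the names round_mode,
current_round_mode, div_rn/div_ru/div_rd/div_rz and div_dynamic are all
hypothetical, and unsigned integer division stands in for the statically
rounded instructions of a real machine.

```c
#include <assert.h>

/* Hypothetical software rounding modes (not the <fenv.h> macros).  */
enum round_mode { RM_TONEAREST, RM_UPWARD, RM_DOWNWARD, RM_TOWARDZERO };

/* Thread-local state standing in for what fesetround would update on a
   machine that has no dynamic rounding-mode register.  */
static _Thread_local enum round_mode current_round_mode = RM_TONEAREST;

/* Stand-ins for statically rounded instructions: each one has its
   rounding direction fixed in the "opcode".  B must be nonzero.  */
static unsigned div_rz (unsigned a, unsigned b) { return a / b; }

/* For non-negative operands, downward equals toward zero.  */
static unsigned div_rd (unsigned a, unsigned b) { return a / b; }

static unsigned div_ru (unsigned a, unsigned b) { return a / b + (a % b != 0); }

static unsigned
div_rn (unsigned a, unsigned b)
{
  unsigned q = a / b, r = a % b;
  /* Round up when the remainder is more than half of B, or exactly
     half and Q is odd (ties to even).  */
  if (r > b - r || (r == b - r && (q & 1)))
    q++;
  return q;
}

/* What a library function (or FENV_ACCESS user code) would have to do:
   switch over the thread-local mode and pick the static variant.  */
static unsigned
div_dynamic (unsigned a, unsigned b)
{
  switch (current_round_mode)
    {
    case RM_UPWARD:     return div_ru (a, b);
    case RM_DOWNWARD:   return div_rd (a, b);
    case RM_TOWARDZERO: return div_rz (a, b);
    case RM_TONEAREST:
    default:            return div_rn (a, b);
    }
}
```

Every operation that must honour the dynamic mode pays for the dispatch, which
is the inconvenience referred to above.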

Essentially that would involve the reverse of what TS 18661-1 envisages 
when it gives an example of a code sequence with __swapround to implement 
constant rounding directions on a machine where the rounding mode is 
dynamic only.

(To implement the FENV_ROUND pragma, one might have the front end insert 
__builtin_feswapround calls - in which case machines with static rounding 
modes would need to reverse that to identify the rounding modes for 
particular operations - or one might have it tag the operations and a 
later lowering stage insert such calls.  That's in addition to affecting 
constants and appropriately marked function calls.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH][AArch64] Testcase fix for __ATOMIC_CONSUME

2015-02-18 Thread Alex Velenko


On 12/02/15 18:38, Mike Stump wrote:
> On Feb 11, 2015, at 12:16 PM, Torvald Riegel  wrote:
>> On Mon, 2015-02-09 at 09:10 -0800, Mike Stump wrote:
>>> On Feb 9, 2015, at 7:11 AM, Alex Velenko  wrote:
>>>> The following patch makes atomic-op-consume.c XFAIL
>>>>
>>>> Is this patch ok?
>>>
>>> Ok.
>>>
>>> I’d shorten the comment above the xfail to be exceedingly short:
>>>
>>>   /* PR59448 consume not implemented yet */
>>>
>>> The reason is the brain can process this about 8x faster.  Also, one can cut
>>> and paste the PR part into a web browser directly, or, if you have an electric
>>> bugzilla mode for emacs, it will pop right up. */
>>
>> Given the discussions we had in ISO C++ SG1, it seems the only way to
>> fix memory_order_consume is to deprecate it (or let it rot forever), and
>> add something else to the standard.  See
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4321.pdf
>
> Nice paper, thanks.
>
>> IOW, I believe the promotion is here to stay.  I'm not aware of any
>> other implementation doing something else.
>>
>> Thus, XFAIL doesn't seem right to me.
>
> Since Jakub in PR64930 updated to the now expected output instead of xfail, and
> given the paper above, easy to agree with this.  The changes to remove the
> xfail and expect the now expected codegen are pre-approved.


Hi Mike,
As a pre-approved trivial change, on Monday I committed the following patch:

gcc/testsuite/

2015-02-16  Alex Velenko  

* gcc.target/aarch64/atomic-op-consume.c (scan-assembler-times):
Directive adjusted to scan for ldaxr.
* gcc.target/arm/atomic-op-consume.c (scan-assembler-times): Directive
adjusted to scan for ldaex.


diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
index 0e6dbbe..26ebbdf 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
@@ -3,6 +3,6 @@

 #include "atomic-op-consume.x"

-/* PR59448 consume not implemented yet.  */
-/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 { xfail *-*-* } } } */
+/* Scan for ldaxr is a PR59448 consume workaround.  */
+/* { dg-final { scan-assembler-times "ldaxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 } } */
 /* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 6 } } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c
index fafe4d6..6c5f989 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c
@@ -5,7 +5,7 @@

 #include "../aarch64/atomic-op-consume.x"

-/* PR59448 consume not implemented yet.  */
-/* { dg-final { scan-assembler-times "ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 { xfail *-*-* } } } */
+/* Scan for ldaex is a PR59448 consume workaround.  */
+/* { dg-final { scan-assembler-times "ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
 /* { dg-final { scan-assembler-times "strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
 /* { dg-final { scan-assembler-not "dmb" } } */

[PATCH] rtl-optimization/64935: Sorting of ready list is different with/without DEBUG_INSNs.

2015-02-18 Thread Maxim Kuvyrkov

Hi,

This patch fixes PR64935, which is triggered when the ready list at the start 
of a basic block is larger than --param=max-sched-ready-insns.  Sorting the 
ready list when it has more than max-sched-ready-insns elements is special in 
that we want to sort normal insns even if there are debug insns in the list.  
This is because the code for max-sched-ready-insns ignores debug insns on 
purpose.

The problem in the bug can be fixed with a smallish patch (see 
https://gcc.gnu.org/bugzilla/attachment.cgi?id=34674), but it makes the code 
look ugly and non-intuitive.  The second version of the patch (the one attached 
here) is a bit bigger, but it gives the functions a definite and clear purpose, 
and makes the code easier to understand.

While reviewing the patch I suggest using context diff mode (in emacs C-c C-d).  
I couldn't convince git to generate a context diff.

Bootstrapped/tested on x86_64-linux-gnu and cross-tested on 
arm-linux-gnueabihf.  Markus also tested this patch on powerpc64-linux-gnu.

OK for trunk?

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0001-Fix-PR64935.patch
Description: Binary data

Re: [PATCH][ARM] PR target/64600 Fix another ICE with -mtune=xscale: properly sign-extend mask during constant splitting

2015-02-18 Thread Kyrill Tkachov


Ping.

Thanks,
Kyrill
On 10/02/15 09:25, Kyrill Tkachov wrote:

Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00141.html

Thanks,
Kyrill

On 03/02/15 15:18, Kyrill Tkachov wrote:

Hi all,

The ICE in this PR occurs when -mtune=xscale triggers a particular path
through arm_gen_constant during expand
that creates a 0xfffff00f mask but for a 64-bit HOST_WIDE_INT doesn't
sign extend it into
0xfffffffffffff00f that signifies the required -4081. It leaves it as
0xfffff00f (4294963215) that breaks when
later combine tries to perform an SImode bitwise AND using the wide-int
machinery.

I think the correct approach here is to use trunc_int_for_mode that
correctly sign-extends the constant so
that it is properly represented by a HOST_WIDE_INT for the required mode.
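
The difference between the two representations can be shown with a small
standalone sketch.  The helper trunc_to_si below is a hypothetical stand-in and
not GCC's trunc_int_for_mode: it assumes a fixed 32-bit (SImode-like) width and
a 64-bit host integer, and only illustrates the sign extension being discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of VALUE and sign-extend
   them into a 64-bit host integer, which is the canonical form a
   32-bit constant is expected to have in this discussion.  */
static int64_t
trunc_to_si (int64_t value)
{
  uint64_t low = (uint64_t) value & UINT64_C (0xffffffff);

  /* Flip bit 31 and subtract 2^31: this propagates the 32-bit sign
     bit into the upper half without relying on shifts of negative
     values.  */
  return (int64_t) (low ^ UINT64_C (0x80000000)) - INT64_C (0x80000000);
}
```

With the mask from the PR, trunc_to_si (0xfffff00f) gives -4081, while the
value left without sign extension reads as 4294963215.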

Bootstrapped and tested arm-none-linux-gnueabihf with -mtune=xscale in
BOOT_CFLAGS.

The testcase triggers for -mcpu=xscale and all slowmul targets because
they are the only ones that have the
constant_limit tune parameter set to anything >1 which is required to
follow this particular path through
arm_split_constant. Also, the rtx costs can hide this ICE sometimes.

Ok for trunk?

Thanks,
Kyrill

2015-02-03  Kyrylo Tkachov  

   PR target/64600
   * config/arm/arm.c (arm_gen_constant, AND case): Call
   trunc_int_for_mode when constructing AND mask.

2015-02-03  Kyrylo Tkachov  

   PR target/64600
   * gcc.target/arm/pr64600_1.c: New test.

Type comparing TLC

2015-02-18 Thread Jan Hubicka

Hi,
looking across the ODR violation messages in libreoffice and Chromium I found
some false positives and some confused messages.  This patch fixes them.  In
particular
 - I introduced a nasty vtable corruption when breaking out my type merging
   patches, so we ended up creating separate entries for each copy of a type
   without BINFO :(
 - C++ now allows the use of an enum that has no fields defined.  Those need
   to match enums with fields from another unit.
 - class and vtable layout diffing got confused by the presence of extra vptr
   pointers and bases.  Fixed.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-devirt.c (odr_subtypes_equivalent_p): Fix formatting.
(compare_virtual_tables): Be smarter about skipping typeinfos;
do sane output on virtual table mismatch.
(warn_odr): Be ready for forward declarations of enums;
output sane info on base mismatch and virtual table mismatch.
(add_type_duplicate): Fix code choosing prevailing type; do not ICE
when only one type is polymorphic.
(get_odr_type): Fix hashtable corruption.
(dump_odr_type): Dump mangled names.

Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 220741)
+++ ipa-devirt.c(working copy)
@@ -551,7 +551,8 @@ set_type_binfo (tree type, tree binfo)
 /* Compare T2 and T2 based on name or structure.  */
 
 static bool
-odr_subtypes_equivalent_p (tree t1, tree t2, hash_set 
*visited)
+odr_subtypes_equivalent_p (tree t1, tree t2,
+  hash_set *visited)
 {
   bool an1, an2;
 
@@ -618,7 +619,8 @@ compare_virtual_tables (varpool_node *pr
  prevailing = vtable;
  vtable = tmp;
}
-  if (warning_at (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT 
(vtable->decl))),
+  if (warning_at (DECL_SOURCE_LOCATION
+   (TYPE_NAME (DECL_CONTEXT (vtable->decl))),
  OPT_Wodr,
  "virtual table of type %qD violates one definition rule",
  DECL_CONTEXT (vtable->decl)))
@@ -633,39 +635,118 @@ compare_virtual_tables (varpool_node *pr
 {
   struct ipa_ref *ref1, *ref2;
   bool end1, end2;
+
   end1 = !prevailing->iterate_reference (n1, ref1);
   end2 = !vtable->iterate_reference (n2, ref2);
-  if (end1 && end2)
-   return;
-  if (!end1 && !end2
- && DECL_ASSEMBLER_NAME (ref1->referred->decl)
-!= DECL_ASSEMBLER_NAME (ref2->referred->decl)
- && !n2
- && !DECL_VIRTUAL_P (ref2->referred->decl)
- && DECL_VIRTUAL_P (ref1->referred->decl))
+
+  /* !DECL_VIRTUAL_P means RTTI entry;
+We warn when RTTI is lost because non-RTTI previals; we silently
+accept the other case.  */
+  while (!end2
+&& (end1
+|| (DECL_ASSEMBLER_NAME (ref1->referred->decl)
+!= DECL_ASSEMBLER_NAME (ref2->referred->decl)
+&& DECL_VIRTUAL_P (ref1->referred->decl)))
+&& !DECL_VIRTUAL_P (ref2->referred->decl))
{
- if (warning_at (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT 
(vtable->decl))), 0,
+ if (warning_at (DECL_SOURCE_LOCATION
+   (TYPE_NAME (DECL_CONTEXT (vtable->decl))), 0,
  "virtual table of type %qD contains RTTI information",
  DECL_CONTEXT (vtable->decl)))
{
- inform (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT 
(prevailing->decl))),
- "but is prevailed by one without from other translation 
unit");
- inform (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT 
(prevailing->decl))),
+ inform (DECL_SOURCE_LOCATION
+   (TYPE_NAME (DECL_CONTEXT (prevailing->decl))),
+ "but is prevailed by one without from other translation "
+ "unit");
+ inform (DECL_SOURCE_LOCATION
+   (TYPE_NAME (DECL_CONTEXT (prevailing->decl))),
  "RTTI will not work on this type");
}
  n2++;
   end2 = !vtable->iterate_reference (n2, ref2);
}
-  if (!end1 && !end2
- && DECL_ASSEMBLER_NAME (ref1->referred->decl)
-!= DECL_ASSEMBLER_NAME (ref2->referred->decl)
- && !n1
- && !DECL_VIRTUAL_P (ref1->referred->decl)
- && DECL_VIRTUAL_P (ref2->referred->decl))
+  while (!end1
+&& (end2
+|| (DECL_ASSEMBLER_NAME (ref2->referred->decl)
+!= DECL_ASSEMBLER_NAME (ref1->referred->decl)
+&& DECL_VIRTUAL_P (ref2->referred->decl)))
+&& !DECL_VIRTUAL_P (ref1->referred->decl))
{
  n1++;
   end1 = !vtable->iterate_reference (n1, ref1);
}
+
+  /* Finished?  */
+  if (end1 && end2)
+   {
+ /* Extra parano

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mike Frysinger

On 18 Feb 2015 04:56, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker  wrote:
> > On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
> >> This patch enhances config/zlib.m4 to introduce an extra option
> >> --with-libz-prefix which allows us to provide the location of
> >> the zlib library we want to use during the build.
> >>
> >> config/ChangeLog:
> >>
> >> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
> >>
> >> I didn't see any file in the GCC project that uses this macro,
> >> so for the GCC repository, the change to zlib.m4 is it. But
> >> I am also attaching to this email a copy of the patch that
> >> will be applied to the binutils-gdb.git repository, with all
> >> configury using this macro being re-generated - mostly for info,
> >> also as a heads-up to both binutils and GDB.
> >>
> >> This was tested by regenerating all autoconf/automake files in
> >> the binutils-gdb project, and rebuilding GDB, using the following
> >> combinations:
> >>
> >>   --with-zlib (system zlib used)
> >>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
> >>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
> >>
> >>   --without-zlib (zlib support turned off)
> >>   --without-zlib --with-zlib-prefix (zlib support turned off)
> >>
> >>   --with-zlib (no system zlib available, configure fails with expected 
> >> error)
> >>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
> >>   (no system zlib, configure fails with same error)
> >>
> >> OK to commit?
> 
> Why do you want to turn off zlib? On Linux/x86,  zlib is required
> for assembler.  At least, you should issue an error when --without-libz
> is used in binutils for Linux/x86 target.

err, when did that happen ?  why would zlib be possibly required for an 
assembler ?
-mike



Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 8:54 AM, Mike Frysinger  wrote:
> On 18 Feb 2015 04:56, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker  
>> wrote:
>> > On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
>> >> This patch enhances config/zlib.m4 to introduce an extra option
>> >> --with-libz-prefix which allows us to provide the location of
>> >> the zlib library we want to use during the build.
>> >>
>> >> config/ChangeLog:
>> >>
>> >> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
>> >>
>> >> I didn't see any file in the GCC project that uses this macro,
>> >> so for the GCC repository, the change to zlib.m4 is it. But
>> >> I am also attaching to this email a copy of the patch that
>> >> will be applied to the binutils-gdb.git repository, with all
>> >> configury using this macro being re-generated - mostly for info,
>> >> also as a heads-up to both binutils and GDB.
>> >>
>> >> This was tested by regenerating all autoconf/automake files in
>> >> the binutils-gdb project, and rebuilding GDB, using the following
>> >> combinations:
>> >>
>> >>   --with-zlib (system zlib used)
>> >>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>> >>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>> >>
>> >>   --without-zlib (zlib support turned off)
>> >>   --without-zlib --with-zlib-prefix (zlib support turned off)
>> >>
>> >>   --with-zlib (no system zlib available, configure fails with expected 
>> >> error)
>> >>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
>> >>   (no system zlib, configure fails with same error)
>> >>
>> >> OK to commit?
>>
>> Why do you want to turn off zlib? On Linux/x86,  zlib is required
>> for assembler.  At least, you should issue an error when --without-libz
>> is used in binutils for Linux/x86 target.
>
> err, when did that happen ?  why would zlib be possibly required for an
> assembler ?
> -mike

commit 89e7505fcde4bd83948f559f429a0e1eb4262f05
Author: H.J. Lu 
Date:   Sun Dec 14 06:41:03 2014 -0800

Compress debug sections for Linux/x86 by default

  * config/tc-i386.c (flag_compress_debug): Default to compress
  debug sections for Linux.


-- 
H.J.

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Joel Sherrill


On 2/18/2015 10:54 AM, Mike Frysinger wrote:
> On 18 Feb 2015 04:56, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker  
>> wrote:
>>> On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
>>>> This patch enhances config/zlib.m4 to introduce an extra option
>>>> --with-libz-prefix which allows us to provide the location of
>>>> the zlib library we want to use during the build.
>>>>
>>>> config/ChangeLog:
>>>>
>>>> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
>>>>
>>>> I didn't see any file in the GCC project that uses this macro,
>>>> so for the GCC repository, the change to zlib.m4 is it. But
>>>> I am also attaching to this email a copy of the patch that
>>>> will be applied to the binutils-gdb.git repository, with all
>>>> configury using this macro being re-generated - mostly for info,
>>>> also as a heads-up to both binutils and GDB.
>>>>
>>>> This was tested by regenerating all autoconf/automake files in
>>>> the binutils-gdb project, and rebuilding GDB, using the following
>>>> combinations:
>>>>
>>>>   --with-zlib (system zlib used)
>>>>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>>>>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>>>>
>>>>   --without-zlib (zlib support turned off)
>>>>   --without-zlib --with-zlib-prefix (zlib support turned off)
>>>>
>>>>   --with-zlib (no system zlib available, configure fails with expected
>>>> error)
>>>>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
>>>>   (no system zlib, configure fails with same error)
>>>>
>>>> OK to commit?
>> Why do you want to turn off zlib? On Linux/x86,  zlib is required
>> for assembler.  At least, you should issue an error when --without-libz
>> is used in binutils for Linux/x86 target.
> err, when did that happen ?  why would zlib be possibly required for an 
> assembler ?

Is there going to be a configure error when the system does not have zlib
and no argument is specified?

This is a common issue for people building tools for RTEMS for the first
time.
> -mike

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985

Re: [PATCH] PR target/65064: Return false for COMMON symbols

2015-02-18 Thread Richard Henderson

On 02/18/2015 05:18 AM, H.J. Lu wrote:
>   PR target/65064
>   * config/ia64/predicates.md (sdata_symbolic_operand): Return false
>   for common symbols.

Ok.


r~

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-02-18 Thread Thomas Schwinge

Hi!

On Mon, 13 Oct 2014 14:33:11 +0400, Ilya Verbin  wrote:
> On 13 Oct 12:19, Jakub Jelinek wrote:
> > On Sat, Oct 11, 2014 at 06:49:00PM +0400, Ilya Verbin wrote:
> > > 2. -foffload-abi=[lp64|ilp32]
> > >    This option is supposed to tell mkoffload (and offload compiler) which ABI is
> > > used in streamed GIMPLE.  This option is desirable, because host and offload
> > > compilers must have the same ABI.  The option is generated by the host compiler
> > > automatically, it should not be specified by user.
> > 
> > But I'd like to understand why is this one needed.
> > Why should the compilers care?  Aggregates layout and alignment of
> > integral/floating types must match between host and offload compilers, sure,
> > but isn't that something streamed already in the LTO bytecode?
> > Or is LTO streamer not streaming some types like long_type_node?
> > I'd expect if host and offload compiler disagree on long type size that
> > you'd just use a different integral type with the same size as long on the
> > host.
> > Different sized pointers are of course a bigger problem, but can't you just
> > error out on that during reading of the LTO, or even handle it (just use
> > some integral type for when is the pointer stored in memory, and just
> > convert to pointer after reads from memory, and convert back before storing
> > to memory).  Erroring out during LTO streaming in sounds just fine to me
> > though.
> 
> Actually this option was developed by Bernd, so I think PTX team is going to use
> it somehow.  In MIC's case we're planning just to check in mkoffload that host
> and target compiler's ABI are the same.  Without this check we will crash in LTO
> streamer with ICE, so I'd like to issue an error message, rather than crashing.

In gcc/config/i386/intelmic-mkoffload.c, this option is now parsed to
initialize the target_ilp32 variable, which will then be used
(target_ilp32 ? "-m32" : "-m64") when invoking different tools.
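
As a rough illustration of that approach, the parsing and the flag selection
could look like the sketch below.  This is not the actual
intelmic-mkoffload.c code: the function names parse_offload_abi and abi_flag
and the return-value convention are assumptions made for the example; only the
option spelling and the -m32/-m64 choice come from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: scan ARGV for -foffload-abi=[lp64|ilp32].
   Returns 1 for ilp32, 0 for lp64, and -1 if the option is missing or
   carries an unrecognised value.  */
static int
parse_offload_abi (int argc, const char *const *argv)
{
  static const char prefix[] = "-foffload-abi=";
  int target_ilp32 = -1;

  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, sizeof prefix - 1) == 0)
      {
        const char *abi = argv[i] + sizeof prefix - 1;
        if (strcmp (abi, "ilp32") == 0)
          target_ilp32 = 1;
        else if (strcmp (abi, "lp64") == 0)
          target_ilp32 = 0;
        else
          return -1;  /* Unknown ABI name: let the caller report it.  */
      }
  return target_ilp32;
}

/* Flag handed on to the tools that mkoffload invokes.  */
static const char *
abi_flag (int target_ilp32)
{
  return target_ilp32 ? "-m32" : "-m64";
}
```

A caller would be expected to treat -1 as an error before calling abi_flag, so
that a missing or mismatched ABI produces a diagnostic rather than a crash
later on.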

In nvptx, we've been using the following approach:

--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -54,24 +54,28 @@
 
 /* Type Layout.  */
 
+#define TARGET_64BIT \
+  (flag_offload_abi == OFFLOAD_ABI_UNSET ? TARGET_ABI64 \
+   : flag_offload_abi == OFFLOAD_ABI_LP64 ? true : false)
+
 #define DEFAULT_SIGNED_CHAR 1
 
 #define SHORT_TYPE_SIZE 16
 #define INT_TYPE_SIZE 32
-#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32)
+#define LONG_TYPE_SIZE (TARGET_64BIT ? 64 : 32)
 #define LONG_LONG_TYPE_SIZE 64
 #define FLOAT_TYPE_SIZE 32
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64
 
 #undef SIZE_TYPE
-#define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
+#define SIZE_TYPE (TARGET_64BIT ? "long unsigned int" : "unsigned int")
 #undef PTRDIFF_TYPE
-#define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
+#define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int")
 
-#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
+#define POINTER_SIZE (TARGET_64BIT ? 64 : 32)
 
-#define Pmode (TARGET_ABI64 ? DImode : SImode)
+#define Pmode (TARGET_64BIT ? DImode : SImode)
 
 /* Registers.  Since ptx is a virtual target, we just define a few
hard registers for special purposes and leave pseudos unallocated.  */

Should we settle on one of the two, that is, either pass -m[...] from
mkoffload, or handle flag_offload_abi in the respective backend?  I think
I prefer the intelmic-mkoffload.c approach; this seems cleaner to me:
mkoffload "configures" the offloading compiler.  (Also, the 32-bit
vs. 64-bit flag may in fact be needed for tools other than the offloading
compiler.)  Bernd, is there any specific reason for the approach you had
chosen?


Grüße,
 Thomas



[PATCH, 4.8] Backport "Fix debug-insn sensitivity in RA" patch to 4.8

2015-02-18 Thread Uros Bizjak

Hello!

Richard's patch at [1] is needed to fix a bootstrap failure on
alpha-linux-gnu on the 4.8 branch. Without the patch, IRA creates
different sequences, depending on the presence of the -g option.

One of the many comparison failures is in expr.c, where
-fcompare-debug fails with:

--- expr.gkd2015-02-17 17:53:06.785223764 +0100
+++ expr.gk.gkd 2015-02-17 17:53:11.504169052 +0100
@@ -152249,17 +152249,15 @@
 (insn:TI# 0 0 (set (reg:DI 2 $2)
 (sign_extend:DI (mem:SI (reg/v/f:DI 9 $9 [orig:261 exp ]
[261]) [  S4 A64])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6342#
{*extendsidi2_1}
  (nil))
+(insn:TI# 0 0 (set (reg:DI 4 $4)
+(sign_extend:DI (mem/c:SI (plus:DI (reg/f:DI 30 $30)
+(const_int 144 [0x90])) [  S4 A64])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341#
{*extendsidi2_1}
+ (nil))
 (insn:TI# 0 0 (set (mem/c:DI (plus:DI (reg/f:DI 30 $30)
 (const_int 80 [0x50])) [ %sfp+0 S8 A64])
 (reg:DI 19 $19 [ bitregion_start ]))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341# {*movdi}
  (expr_list:REG_DEAD (reg:DI 19 $19 [ bitregion_start ])
 (nil)))
-(insn:TI# 0 0 (set (reg:DI 19 $19)
-(sign_extend:DI (mem/c:SI (plus:DI (reg/f:DI 30 $30)
-(const_int 144 [0x90])) [  S4 A64])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341#
{*extendsidi2_1}
- (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 30 $30)
-(const_int 144 [0x90])) [  S4 A64])
-(nil)))
 (insn# 0 0 (const_int 0 [0])
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6342# {nop}
  (nil))
 (insn:TI# 0 0 (set (reg:DI 1 $1 [orig:71 D. ] [71])
@@ -152270,10 +152268,10 @@
 (const_int 136 [0x88])) [ alias_set+0 S4 A64])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341#
{*extendsidi2_1}
  (nil))
 (insn:TI# 0 0 (set (reg/v:DI 15 $15 [orig:263 nontemporal ] [263])
-(zero_extend:DI (reg:QI 19 $19 [265])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341#
{zero_extendqidi2}
- (expr_list:REG_DEAD (reg:QI 19 $19 [265])
+(zero_extend:DI (reg:QI 4 $4 [265])))
../../gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:6341#
{zero_extendqidi2}
+ (expr_list:REG_DEAD (reg:QI 4 $4 [265])
 (nil)))
-(jump_insn# 0 0 (set (pc)
+(jump_insn:TI# 0 0 (set (pc)
 (if_then_else (eq (reg:DI 1 $1 [orig:71 D. ] [71])
 (const_int 0 [0]))
 (label_ref:DI #)

2015-02-18  Uros Bizjak  

Backport from mainline
2013-09-08  Richard Sandiford  

* ira.c (update_equiv_regs): Only call set_paradoxical_subreg
for non-debug insns.
* lra.c (new_insn_reg): Take the containing insn as a parameter.
Only modify lra_reg_info[].biggest_mode if it's non-debug insn.
(collect_non_operand_hard_regs, add_regs_to_insn_regno_info): Update
accordingly.

testsuite/ChangeLog:

2015-02-18  Uros Bizjak  

Backport from mainline
2013-09-08  Richard Sandiford  

* g++.dg/debug/ra1.C: New test.

The patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32} and alpha-linux-gnu, where it fixes all bootstrap comparison
failures.

The results are at [3], gfortran failures will be fixed by alias.c
backport, after the bootstrap problem is cured.

OK for branch?

[1] https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00472.html
[2] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=202369
[3] https://gcc.gnu.org/ml/gcc-testresults/2015-02/msg02069.html

Uros.
Index: ira.c
===
--- ira.c   (revision 220763)
+++ ira.c   (working copy)
@@ -2944,11 +2944,8 @@ update_equiv_regs (void)
  prevent access beyond allocated memory for paradoxical memory subreg.  */
   FOR_EACH_BB (bb)
 FOR_BB_INSNS (bb, insn)
-  {
-   if (! INSN_P (insn))
- continue;
-   for_each_rtx (&insn, set_paradoxical_subreg, (void *)pdx_subregs);
-  }
+  if (NONDEBUG_INSN_P (insn))
+   for_each_rtx (&insn, set_paradoxical_subreg, (void *) pdx_subregs);
 
   /* Scan the insns and find which registers have equivalences.  Do this
  in a separate scan of the insns because (due to -fcse-follow-jumps)
Index: lra.c
===
--- lra.c   (revision 220763)
+++ lra.c   (working copy)
@@ -446,13 +446,13 @@ init_insn_regs (void)
 = create_alloc_pool ("insn regs", sizeof (struct lra_insn_reg), 100);
 }
 
-/* Create LRA insn related info about referenced REGNO with TYPE
-   (in/out/inout), biggest reference mode MODE, flag that it is
+/* Create LRA insn related info about a reference to REGNO in INSN with
+   TYPE (in/out/inout), biggest reference mode MODE, flag that it is
reference through subreg (SUBREG_P), flag that is early clobbered
in the insn (EARLY_CLOBBER), and reference to the next insn reg
info (NEXT). */
 static struct lra_insn_reg *
-new_insn_reg (int

Re: [Haifa Scheduler] Fix latent bug in macro-fusion/instruction grouping

2015-02-18 Thread Jeff Law


On 02/18/15 01:03, Maxim Kuvyrkov wrote:


> The way SCHED_GROUP_P instructions have been handled historically is
> by combination of two artifacts: (1) removing all dependencies for
> instructions inside SCHED_GROUP sequence but the one to next insn,
> and (2) maintaining a fast track for SCHED_GROUP insns that ensures
> that once the first SCHED_GROUP insn is issued, scheduler does
> nothing but issuing the single dependent insn of the current one.

The "fast track" was actually implemented by just advancing the cycle 
counter forward by an appropriate number of cycles if a SCHED_GROUP_P 
insn got queued.   So on the next iteration of the loop, the SCHED_GROUP_P 
insn is magically ready along with potentially other instructions that had 
been queued prior to the SCHED_GROUP_P insn.


But that was OK (of course) because the SCHED_GROUP_P insns get priority 
over everything else that is ready on a particular cycle.


Bernd's work broke because the SCHED_GROUP_P insn got queued, but the 
cycle counter only moved forward one tick.  Thus previously queued insns 
could become ready while the SCHED_GROUP_P insn was waiting.


My fix restores correctness by queuing the SCHED_GROUP_P insn for just a 
single cycle (it may get queued again, but that's OK as everything else 
that's ready in the same cycle as a queued SCHED_GROUP_P insn will get 
re-queued as well).  It's marginally less compile-time efficient, but it 
was an easy, clean fix.


jeff

Re: [PATCH] PR64959: SFINAE in UDLs

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 06:29:34PM +, Alex Velenko wrote:
> this patch also fixes issues for arm-none-eabi.
> Could someone add this patch?

ENOPATCH

Jakub

[PATCH] Put cleanups of cleanups after cleanups (PR gcov-profile/64634)

2015-02-18 Thread Jakub Jelinek

Hi!

Richard's GIMPLE EH rewrite in r151696 regressed the following testcase.
The problem is that when lowering:
  [gcov-15.C:14:5] try
{
  [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
  [gcov-15.C:18:12] try
{
  [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
}
  catch
{
  [gcov-15.C:18:11] __cxa_free_exception (D.2335);
}
  [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
}
  catch
{
  [gcov-15.C:20:3] catch (NULL)
{
  [gcov-15.C:20:3] try
{
  [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (0);
  [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
  [gcov-15.C:22:15] catchEx ();
}
  finally
{
  [gcov-15.C:20:10] __cxa_end_catch ();
}
}
}
we put the cleanup of the catch cleanup in front of the catch cleanup
in the EH sequence:
  [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
  [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
  [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
  :
  [gcov-15.C:24:1] D.2341 = 0;
  [gcov-15.C:24:1] goto ;
  :
  [gcov-15.C:24:1] return D.2341;
  :
  [gcov-15.C:20:10] __cxa_end_catch ();
  resx 3
  :
  eh_dispatch 1
  resx 1
  :
  [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (1);
  [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
  [gcov-15.C:22:15] catchEx ();
  [gcov-15.C:20:10] __cxa_end_catch ();
  goto ;
and as the __cxa_end_catch () is the first bb for line 20,
gcov without -a considers that bb count as the one to be shown.
Before the gimple EH rewrite and also with this patch we instead
order the cleanup (__cxa_end_catch ()) after the __cxa_begin_catch ():
  [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
  [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
  [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
  :
  [gcov-15.C:24:1] D.2341 = 0;
  [gcov-15.C:24:1] goto ;
  :
  [gcov-15.C:24:1] return D.2341;
  :
  eh_dispatch 1
  resx 1
  :
  [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (1);
  [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
  [gcov-15.C:22:15] catchEx ();
  [gcov-15.C:20:10] __cxa_end_catch ();
  goto ;
  :
  [gcov-15.C:20:10] __cxa_end_catch ();
  resx 3

Bootstrapped/regtested on x86_64-linux and i686-linux,
libstdc++.so.6 compiled without and with the patch is identical on
x86_64-linux.  The testcase also has identical generated code both at -O0
and -O2 when compiled without coverage.  Ok for trunk?
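
For readers who want the ordering change in isolation, here is a toy model of it.  This is only a sketch, not GCC code: Seq, lower_nested_cleanup, emit_handler and the two lower_catch_* functions are made-up names, and a vector of strings stands in for the gimple sequence held in eh_seq.  It shows how stashing the global sequence, lowering the cleanup, and appending the stashed result afterwards puts the nested cleanup behind the handler body.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a gimple sequence (assumption: plain strings are enough
// to show the ordering).
using Seq = std::vector<std::string>;

// Toy stand-in for the global eh_seq used by the EH lowering code.
static Seq eh_seq;

// Pretend that lowering the cleanup queues a nested cleanup block.
static void lower_nested_cleanup() { eh_seq.push_back("cleanup"); }

// Pretend that branching around the handler queues the handler body.
static void emit_handler() { eh_seq.push_back("handler"); }

// Old behaviour: everything is appended in the order it is produced,
// so the nested cleanup ends up in front of the handler.
Seq lower_catch_old() {
  eh_seq = {"outer"};
  lower_nested_cleanup();
  emit_handler();
  return eh_seq;
}

// Patched behaviour: stash eh_seq, collect the nested cleanup separately,
// restore, emit the handler, then append the nested cleanup after it.
Seq lower_catch_new() {
  eh_seq = {"outer"};
  Seq old_eh_seq = eh_seq;
  eh_seq.clear();
  lower_nested_cleanup();
  Seq new_eh_seq = eh_seq;
  eh_seq = old_eh_seq;
  emit_handler();
  eh_seq.insert(eh_seq.end(), new_eh_seq.begin(), new_eh_seq.end());
  return eh_seq;
}
```

In the toy model the old variant yields outer, cleanup, handler and the patched variant yields outer, handler, cleanup, which mirrors the two dumps above.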

2015-02-18  Jakub Jelinek  

PR gcov-profile/64634
* tree-eh.c (frob_into_branch_around): Fix up typos
in function comment.
(lower_catch): Put eh_seq resulting from EH lowering of
the cleanup sequence after the cleanup rather than before
it.

* g++.dg/gcov/gcov-15.C: New test.

--- gcc/tree-eh.c.jj2015-02-12 08:57:35.0 +0100
+++ gcc/tree-eh.c   2015-02-18 16:46:14.878887862 +0100
@@ -884,10 +884,10 @@ eh_region_may_contain_throw (eh_region r
 /* We want to transform
try { body; } catch { stuff; }
to
-   normal_seqence:
+   normal_sequence:
  body;
  over:
-   eh_seqence:
+   eh_sequence:
  landing_pad:
  stuff;
  goto over;
@@ -1813,6 +1813,12 @@ lower_catch (struct leh_state *state, gt
   this_state.cur_region = state->cur_region;
   this_state.ehp_region = try_region;
 
+  /* Add eh_seq from lowering EH in the cleanup sequence after the cleanup
+ itself, so that e.g. for coverage purposes the nested cleanups don't
+ appear before the cleanup body.  See PR64634 for details.  */
+  gimple_seq old_eh_seq = eh_seq;
+  eh_seq = NULL;
+
   out_label = NULL;
   cleanup = gimple_try_cleanup (tp);
   for (gsi = gsi_start (cleanup);
@@ -1849,7 +1855,11 @@ lower_catch (struct leh_state *state, gt
 
   gimple_try_set_cleanup (tp, new_seq);
 
-  return frob_into_branch_around (tp, try_region, out_label);
+  gimple_seq new_eh_seq = eh_seq;
+  eh_seq = old_eh_seq;
+  gimple_seq ret_seq = frob_into_branch_around (tp, try_region, out_label);
+  gimple_seq_add_seq (&eh_seq, new_eh_seq);
+  return ret_seq;
 }
 
 /* A subroutine of lower_eh_constructs_1.  Lower a GIMPLE_TRY with a
--- gcc/testsuite/g++.dg/gcov/gcov-15.C.jj  2015-02-18 17:06:35.599727342 +0100
+++ gcc/testsuite/g++.dg/gcov/gcov-15.C 2015-02-18 17:17:04.483358209 +0100
@@ -0,0 +1,26 @@
+// PR gcov-profile/64634
+// { dg-options "-fprofile-arcs -ftest-coverage" }
+// { dg-do run { target native } }
+
+void catchEx ()// count(1)
+{
+  __builtin_exit (0);  // count(1)
+  try
+  {}
+  catch (int)
+  {}
+}
+
+int main ()// count(1)
+{
+  try
+  {
+throw 5;   // count(1)
+  }
+  catch (...)  // count(1)
+  {
+catchEx ();// count(1)
+  }
+}
+
+// { dg-final { run-gcov gcov-15.C } }

Jakub

Re: [PATCH] PR64959: SFINAE in UDLs

2015-02-18 Thread Alex Velenko




On 13/02/15 22:21, Andrea Azzarone wrote:

We can use the same trick used in the other tests. Patch attached.
Sorry about that!

2015-02-13 20:45 GMT+01:00 Jakub Jelinek :

On Wed, Feb 11, 2015 at 12:26:33AM +0100, Andrea Azzarone wrote:

 * gcc/testsuite/g++.dg/cpp1y/udlit-char-template-vs-std-literal-operator.C:


This fails on i686-linux:

FAIL: g++.dg/cpp1y/udlit-char-template-vs-std-literal-operator.C  -std=c++14 (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/cpp1y/udlit-char-template-vs-std-literal-operator.C:10:51: error: 'int operator""_script(const char*, long unsigned int)' has invalid argument list

Perhaps you meant to #include  too and use
size_t instead of unsigned long?  Or just __SIZE_TYPE__ instead
of unsigned long?

 Jakub







Hi,
this patch also fixes issues for arm-none-eabi.
Could someone add this patch?
Kind regards,
Alex.
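
For reference, the portability issue Jakub points out can be sketched as follows.  This is a minimal illustration, not the testcase from the patch: the _len suffix is a made-up name.  The point is that the length parameter of a string literal operator has to be std::size_t, which is unsigned long on x86_64-linux but unsigned int on i686-linux and arm-none-eabi, so spelling it as unsigned long only works on some targets.

```cpp
#include <cstddef>

// Hypothetical literal suffix _len.  The second parameter must be
// std::size_t (or __SIZE_TYPE__ in GCC testcases that avoid headers),
// not a hard-coded "unsigned long".
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// The length passed to the operator excludes the terminating NUL.
static_assert("gcc"_len == 3, "unexpected literal length");
```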

Re: [PATCH] Use !implicit_section in the recent set_section change (PR ipa/65087)

2015-02-18 Thread Markus Trippelsdorf

On 2015.02.18 at 10:17 +0100, Jan Hubicka wrote:
> > On 2015.02.17 at 22:00 +0100, Jan Hubicka wrote:
> > > > Hi!
> > > > 
> > > > Markus reported an ICE, that is fixed by following patch, which limits
> > > > the earlier change to !implicit_section only (which I assume is the user
> > > > supplied __attribute__((section (.
> > > > 
> > > > Bootstrapped/regtested on 
> > > > {x86_64,i686,aarch64,ppc64,ppc64le,s390,s390x}-linux.
> > > > Ok for trunk?
> > > > 
> > > > 2015-02-17  Jakub Jelinek  
> > > > 
> > > > PR ipa/65087
> > > > * cgraphclones.c (cgraph_node::create_virtual_clone): Only copy
> > > > section if !implicit_section.
> > > > (cgraph_node::create_version_clone_with_body): Likewise.
> > > > * trans-mem.c (ipa_tm_create_version): Likewise.
> > > 
> > > This seems OK. I wonder what the bug Markus reported is.
> > 
> > The ICE only happens with -fdevirtualize-at-ltrans:
> > 
> > trippels@gcc2-power8 library % g++ -flto -fdevirtualize-at-ltrans -shared @list
> > lto1: internal compiler error: in ipcp_verify_propagated_values, at ipa-cp.c:1057
> > 0x10d1270f ipcp_verify_propagated_values()
> > ../../gcc/gcc/ipa-cp.c:1057
> > 0x10d1481b ipcp_propagate_stage
> > ../../gcc/gcc/ipa-cp.c:2758
> > 0x10d1481b ipcp_driver
> > ../../gcc/gcc/ipa-cp.c:4416
> > 0x10d1481b execute
> > ../../gcc/gcc/ipa-cp.c:4511
> > 
> > I will try to come up with a testcase.
> 
> This is interesting indeed. -fdevirtualize-at-ltrans should not change
> the outcome of ipa-cp, so we definitely have some latent bug here.
> A testcase would be great.

-fno-ipa-icf also fixes the issue:

(You of course need a revision before Jakub's r220786 fix in order to reproduce)

1)

trippels@gcc20 testcase % cat test.ii
class A {
  virtual int m_fn1();
};
class B {
public:
  virtual int m_fn2();
  int m_fn3();
};
class C {
  virtual void m_fn4(int *, B *, bool);
};
class D : A, C {
  void m_fn4(int *, B *, bool);
  void m_fn5(int *, B *, bool);
};
void D::m_fn4(int *, B *p2, bool) { p2->m_fn3() && p2->m_fn2(); }
void D::m_fn5(int *, B *p2, bool) { p2->m_fn3() && p2->m_fn2(); }

trippels@gcc20 testcase % ~/gcc_test/usr/local/bin/g++ -r -nostdlib -flto -ffunction-sections -O2 test.ii
lto1: internal compiler error: in ipcp_verify_propagated_values, at ipa-cp.c:1057

2)

trippels@gcc20 testcase % cat Unified_cpp_editor_txmgr0.ii
typedef enum
{
} nsresult;
class nsISupports
{
public:
  virtual nsresult m_fn1 ();
  virtual int m_fn2 ();
  virtual int m_fn3 ();
};
class A
{
public:
  A ();
};
class nsCOMPtr_base
{
public:
  ~nsCOMPtr_base () { mRawPtr->m_fn3 (); }
  nsISupports *mRawPtr;
};
class C : nsCOMPtr_base
{
public:
  C (int);
};
class B
{
  C mTransaction;
  B ();
  A _mOwningThread;
};
B::B () : mTransaction (0) {}

trippels@gcc20 testcase % cat Unified_cpp_layout_base2.ii
#pragma GCC visibility push(hidden)
class A {
  virtual int m_fn1();
};
class B {
public:
  virtual bool m_fn2();
  bool m_fn3();
};
#pragma GCC visibility pop
class C {
  virtual void m_fn4(int *, B *, bool);
};
class D : A, C {
  void m_fn4(int *, B *, bool);
  virtual void m_fn5(int *, B *, bool);
};
void D::m_fn4(int *, B *p2, bool) {
  if (p2->m_fn3() && p2->m_fn2()) {
  }
}
void D::m_fn5(int *, B *p2, bool) {
  if (p2->m_fn3() && p2->m_fn2()) {
  }
}
trippels@gcc20 testcase % ~/gcc_test/usr/local/bin/g++ -w -r -nostdlib -flto -fdevirtualize-at-ltrans -ffunction-sections -O2 Unified_cpp_editor_txmgr0.ii Unified_cpp_layout_base2.ii
lto1: internal compiler error: in ipcp_verify_propagated_values, at ipa-cp.c:1057
trippels@gcc20 testcase % ~/gcc_test/usr/local/bin/g++ -w -r -nostdlib -fno-ipa-icf -flto -fdevirtualize-at-ltrans -ffunction-sections -O2 Unified_cpp_editor_txmgr0.ii Unified_cpp_layout_base2.ii
trippels@gcc20 testcase % ~/gcc_test/usr/local/bin/g++ -w -r -nostdlib -flto -ffunction-sections -O2 Unified_cpp_editor_txmgr0.ii Unified_cpp_layout_base2.ii
trippels@gcc20 testcase %
-- 
Markus

[patch] Fix codecvt

2015-02-18 Thread Jonathan Wakely


While working on PR64797 I discovered that the codecvt
specialization was, erm, completely broken when creating UTF-16
surrogate pairs.

This fixes it and adds a test, based on the char32_t one I added to
the testsuite yesterday. Tested x86_64-linux (little-endian) and
powerpc64-linux (big-endian).
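
As a standalone illustration of what the fixed code computes, here is a minimal sketch of the surrogate pair arithmetic from the Unicode FAQ.  It is not the libstdc++ implementation: to_surrogates is a made-up helper, it assumes the input is already known to be in the range U+10000..U+10FFFF, and it ignores byte order entirely.

```cpp
#include <utility>

// Split a supplementary-plane code point (U+10000..U+10FFFF) into its
// UTF-16 lead and trail surrogates.  The caller is assumed to have checked
// the range; no validation is done here.
std::pair<char16_t, char16_t> to_surrogates(char32_t codepoint) {
  // 0xD800 - (0x10000 >> 10) == 0xD7C0
  const char32_t lead_offset = 0xD800 - (0x10000 >> 10);
  char16_t lead = static_cast<char16_t>(lead_offset + (codepoint >> 10));
  char16_t trail = static_cast<char16_t>(0xDC00 + (codepoint & 0x3FF));
  return {lead, trail};
}
```

For example, U+1F63F (which appears in the test data below) splits into 0xD83D and 0xDE3F.  The two code units are written out directly; there is no need to recombine them into a 32-bit value first.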


commit c0a8047982d0911f74647dba43d65f3b14113f1c
Author: Jonathan Wakely 
Date:   Wed Feb 18 11:49:11 2015 +

	* src/c++11/codecvt.cc (write_utf16_code_point): Fix code to output
	surrogate pairs.
	(utf16_in): Pass mode argument to write_utf16_code_point.
	(codecvt::do_in): Set mode according to
	native byte order.
	* testsuite/22_locale/codecvt/char16_t.cc: New.
	* testsuite/22_locale/codecvt/in/wchar_t/1.cc: Fix typo.

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index 594dae6..aebd3f3 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -295,13 +295,10 @@ namespace
   {
 	// Algorithm from http://www.unicode.org/faq/utf_bom.html#utf16-4
 	const char32_t LEAD_OFFSET = 0xD800 - (0x10000 >> 10);
-	const char32_t SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00;
 	char16_t lead = LEAD_OFFSET + (codepoint >> 10);
 	char16_t trail = 0xDC00 + (codepoint & 0x3FF);
-	char32_t utf16bytes = (lead << 10) + trail + SURROGATE_OFFSET;
-
-	to.next[0] = adjust_byte_order(utf16bytes >> 16, mode);
-	to.next[1] = adjust_byte_order(utf16bytes & 0xFFFF, mode);
+	to.next[0] = adjust_byte_order(lead, mode);
+	to.next[1] = adjust_byte_order(trail, mode);
 	to.next += 2;
 	return true;
   }
@@ -400,7 +397,7 @@ namespace
 	  return codecvt_base::partial;
 	if (codepoint > maxcode)
 	  return codecvt_base::error;
-	if (!write_utf16_code_point(to, codepoint, {}))
+	if (!write_utf16_code_point(to, codepoint, mode))
 	  {
 	from.next = first;
 	return codecvt_base::partial;
@@ -618,7 +615,12 @@ do_in(state_type&, const extern_type* __from, const extern_type* __from_end,
 {
   range from{ __from, __from_end };
   range to{ __to, __to_end };
-  auto res = utf16_in(from, to);
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  codecvt_mode mode = {};
+#else
+  codecvt_mode mode = little_endian;
+#endif
+  auto res = utf16_in(from, to, max_code_point, mode);
   __from_next = from.next;
   __to_next = to.next;
   return res;
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc
new file mode 100644
index 000..14477f5
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc
@@ -0,0 +1,97 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+
+// [locale.codecvt], C++11 22.4.1.4.  specialization.
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  using namespace std;
+  typedef codecvt codecvt_c16;
+  locale loc_c = locale::classic();
+  VERIFY(has_facet(loc_c));
+  const codecvt_c16* const cvt = &use_facet(loc_c);
+
+  VERIFY(!cvt->always_noconv());
+  VERIFY(cvt->max_length() == 3);
+  VERIFY(cvt->encoding() == 0);
+
+  const char u8dat[] = u8"H\U00E4ll\U00F6 \U0001F63F \U56FD "
+u8"\U222B f(\U03BA) exp(-2\U03C0\U03C9) d\U03BA "
+u8"\U0001F6BF \U0001F6BF \U0001F648 \U0413\U0435\U043E"
+u8"\U0433\U0440\U0430\U0444\U0438\U044F \UFB05";
+  const char* const u8dat_end = std::end(u8dat);
+
+  const char16_t u16dat[] = u"H\U00E4ll\U00F6 \U0001F63F \U56FD "
+u"\U222B f(\U03BA) exp(-2\U03C0\U03C9) d\U03BA "
+u"\U0001F6BF \U0001F6BF \U0001F648 \U0413\U0435\U043E"
+u"\U0433\U0440\U0430\U0444\U0438\U044F \UFB05";
+  const char16_t* const u16dat_end = std::end(u16dat);
+
+  {
+const size_t len = u16dat_end - u16dat + 1;
+char16_t* const buffer = new char16_t[len];
+char16_t* const buffer_end = buffer + len;
+
+const char* from_next;
+char16_t* to_next;
+
+codecvt_c16::state_type state01;
+state01 = {};
+codecvt_base::result res = cvt->in(state01, u8dat, u8dat_end, from_next,
+   buffer, buffer_end, to_next);
+
+VERIFY(res == codecvt_base::ok);
+VERIFY(from_

Re: [patch] Fix codecvt

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 07:07:05PM +, Jonathan Wakely wrote:
> While working on PR64797 I discovered that the codecvt
> specialization was, erm, completely broken when creating UTF-16
> surrogate pairs.
> 
> This fixes it and adds a test, based on the char32_t one I added to
> the testsuite yesterday. Tested x86_64-linux (little-endian) and
> powerpc64-linux (big-endian).

Ok for trunk from RM POV, thanks.

> commit c0a8047982d0911f74647dba43d65f3b14113f1c
> Author: Jonathan Wakely 
> Date:   Wed Feb 18 11:49:11 2015 +
> 
>   * src/c++11/codecvt.cc (write_utf16_code_point): Fix code to output
>   surrogate pairs.
>   (utf16_in): Pass mode argument to write_utf16_code_point.
>   (codecvt::do_in): Set mode according to
>   native byte order.
>   * testsuite/22_locale/codecvt/char16_t.cc: New.
>   * testsuite/22_locale/codecvt/in/wchar_t/1.cc: Fix typo.

Jakub

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mike Frysinger

On 18 Feb 2015 08:58, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 8:54 AM, Mike Frysinger wrote:
> > On 18 Feb 2015 04:56, H.J. Lu wrote:
> >> On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker wrote:
> >> > On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
> >> >> This patch enhances config/zlib.m4 to introduce an extra option
> >> >> --with-libz-prefix which allows us to provide the location of
> >> >> the zlib library we want to use during the build.
> >> >>
> >> >> config/ChangeLog:
> >> >>
> >> >> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
> >> >>
> >> >> I didn't see any file in the GCC project that uses this macro,
> >> >> so for the GCC repository, the change to zlib.m4 is it. But
> >> >> I am also attaching to this email a copy of the patch that
> >> >> will be applied to the binutils-gdb.git repository, with all
> >> >> configury using this macro being re-generated - mostly for info,
> >> >> also as a heads-up to both binutils and GDB.
> >> >>
> >> >> This was tested by regenerating all autoconf/automake files in
> >> >> the binutils-gdb project, and rebuilding GDB, using the following
> >> >> combinations:
> >> >>
> >> >>   --with-zlib (system zlib used)
> >> >>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
> >> >>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
> >> >>
> >> >>   --without-zlib (zlib support turned off)
> >> >>   --without-zlib --with-zlib-prefix (zlib support turned off)
> >> >>
> >> >>   --with-zlib (no system zlib available, configure fails with expected 
> >> >> error)
> >> >>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
> >> >>   (no system zlib, configure fails with same error)
> >> >>
> >> >> OK to commit?
> >>
> >> Why do you want to turn off zlib? On Linux/x86,  zlib is required
> >> for assembler.  At least, you should issue an error when --without-libz
> >> is used in binutils for Linux/x86 target.
> >
> > err, when did that happen ?  why would zlib be possibly required for an
> > assembler ?
> 
> commit 89e7505fcde4bd83948f559f429a0e1eb4262f05
> Author: H.J. Lu 
> Date:   Sun Dec 14 06:41:03 2014 -0800
> 
> Compress debug sections for Linux/x86 by default
> 
>   * config/tc-i386.c (flag_compress_debug): Default to compress
>   debug sections for Linux.

i don't see how that justifies making it a hard requirement
-mike



Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 11:44 AM, Mike Frysinger  wrote:
> On 18 Feb 2015 08:58, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 8:54 AM, Mike Frysinger wrote:
>> > On 18 Feb 2015 04:56, H.J. Lu wrote:
>> >> On Wed, Feb 18, 2015 at 4:08 AM, Joel Brobecker wrote:
>> >> > On Wed, Jan 07, 2015 at 06:45:48PM +0400, Joel Brobecker wrote:
>> >> >> This patch enhances config/zlib.m4 to introduce an extra option
>> >> >> --with-libz-prefix which allows us to provide the location of
>> >> >> the zlib library we want to use during the build.
>> >> >>
>> >> >> config/ChangeLog:
>> >> >>
>> >> >> * zlib.m4 (AM_ZLIB): Add --with-libz-prefix option support.
>> >> >>
>> >> >> I didn't see any file in the GCC project that uses this macro,
>> >> >> so for the GCC repository, the change to zlib.m4 is it. But
>> >> >> I am also attaching to this email a copy of the patch that
>> >> >> will be applied to the binutils-gdb.git repository, with all
>> >> >> configury using this macro being re-generated - mostly for info,
>> >> >> also as a heads-up to both binutils and GDB.
>> >> >>
>> >> >> This was tested by regenerating all autoconf/automake files in
>> >> >> the binutils-gdb project, and rebuilding GDB, using the following
>> >> >> combinations:
>> >> >>
>> >> >>   --with-zlib (system zlib used)
>> >> >>   --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>> >> >>   --with-zlib --with-libz-prefix=/zlib/prefix (specific zlib linked in)
>> >> >>
>> >> >>   --without-zlib (zlib support turned off)
>> >> >>   --without-zlib --with-zlib-prefix (zlib support turned off)
>> >> >>
>> >> >>   --with-zlib (no system zlib available, configure fails with expected 
>> >> >> error)
>> >> >>   --with-zlib --with-libz-prefix=/invalid/zlib/prefix
>> >> >>   (no system zlib, configure fails with same error)
>> >> >>
>> >> >> OK to commit?
>> >>
>> >> Why do you want to turn off zlib? On Linux/x86,  zlib is required
>> >> for assembler.  At least, you should issue an error when --without-libz
>> >> is used in binutils for Linux/x86 target.
>> >
>> > err, when did that happen ?  why would zlib be possibly required for an
>> > assembler ?
>>
>> commit 89e7505fcde4bd83948f559f429a0e1eb4262f05
>> Author: H.J. Lu 
>> Date:   Sun Dec 14 06:41:03 2014 -0800
>>
>> Compress debug sections for Linux/x86 by default
>>
>>   * config/tc-i386.c (flag_compress_debug): Default to compress
>>   debug sections for Linux.
>
> i don't see how that justifies making it a hard requirement
> -mike

Can you elaborate?

-- 
H.J.

Re: [PATCH 09/36] floatformat.h: Wrap in extern "C".

2015-02-18 Thread Jakub Jelinek

On Thu, Feb 12, 2015 at 11:49:01AM +, Pedro Alves wrote:
> On 02/09/2015 11:49 PM, Pedro Alves wrote:
> > On 02/09/2015 11:35 PM, Andrew Pinski wrote:
> >> On Mon, Feb 9, 2015 at 3:20 PM, Pedro Alves  wrote:
> >>> Just like libiberty.h.  So that C++ programs, such as GDB when built
> >>> as a C++ program, can use it.
> >>
> >> Why is not needed for GCC building with C++ compiler?
> > 
> > Because it doesn't include it.
> > 
> > The header of the file claims it is part of GDB, though MAINTAINERS
> > nowadays says that everything under include/ is owned by GCC.
> 
> Here's an update that moves the extern "C" below the #include.
> 
> OK to push to the GCC repo?
> 
> From: Pedro Alves 
> Subject: [PATCH] floatformat.h: Wrap in extern "C".
> 
> Just like libiberty.h.  So that C++ programs, such as GDB when built
> as a C++ program, can use it.
> 
> include/ChangeLog:
> 2015-02-12  Pedro Alves  
> 
>   * floatformat.h [__cplusplus]: Wrap in extern "C".

Ok, thanks.

Jakub
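
For anyone unfamiliar with the idiom, the wrapping looks roughly like this.  It is a generic sketch rather than the actual floatformat.h change; toy_sign_bit_index is a made-up function used only to have something to declare.

```cpp
/* Declarations that must keep C linkage when the header is included
   from C++.  A C compiler never sees the extern "C" lines because
   __cplusplus is not defined there.  */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical C API function, not taken from floatformat.h.  */
int toy_sign_bit_index(int total_bits);

#ifdef __cplusplus
}
#endif

/* Definition; it inherits C linkage from the declaration above.  */
int toy_sign_bit_index(int total_bits) { return total_bits - 1; }
```

Keeping the extern "C" block below any #include lines, as the updated patch does, avoids accidentally giving C linkage to whatever those headers declare.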

[committed][PR65107] Add missing cleanup in gfortran.dg/read_eof_8.f90

2015-02-18 Thread Tom de Vries


Hi,

I ran into a failure of gfortran.dg/eof_4.f90, due to the presence of test.dat. 
The contents of test.dat pointed to read_eof_8.f90, which indeed does not 
clean up the test.dat it uses. This patch fixes that.


Tested by running the test and checking that test.dat was not present anymore in 
the test directories.


Committed as trivial.

Thanks,
- Tom
2015-02-18  Tom de Vries  

	PR testsuite/65107
	* gfortran.dg/read_eof_8.f90: Add missing close.

 gcc/testsuite/gfortran.dg/read_eof_8.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gfortran.dg/read_eof_8.f90 b/gcc/testsuite/gfortran.dg/read_eof_8.f90
index 7436a2b..86228da 100644
--- a/gcc/testsuite/gfortran.dg/read_eof_8.f90
+++ b/gcc/testsuite/gfortran.dg/read_eof_8.f90
@@ -20,6 +20,7 @@ program test
   enddo
   call abort
 100 if (k /= 5) call abort
+  close(25, status="delete")
   stop
 101 call abort
 end program test
-- 
1.9.1

Re: [patch] Fix codecvt

2015-02-18 Thread Jonathan Wakely


On 18/02/15 19:07 +, Jonathan Wakely wrote:

While working on PR64797 I discovered that the codecvt
specialization was, erm, completely broken when creating UTF-16
surrogate pairs.

This fixes it and adds a test, based on the char32_t one I added to
the testsuite yesterday. Tested x86_64-linux (little-endian) and
powerpc64-linux (big-endian).


Committed, along with this tweak to only run the tests where
supported.


commit ada5fdcd89fb1e91c43d2bfcba852fdaf27363b0
Author: Jonathan Wakely 
Date:   Wed Feb 18 19:51:00 2015 +

	* testsuite/22_locale/codecvt/char16_t.cc: Add dg-require-cstdint.
	* testsuite/22_locale/codecvt/char32_t.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc
index 14477f5..9271eca 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-options "-std=gnu++11" }
+// { dg-require-cstdint "" }
 
 // [locale.codecvt], C++11 22.4.1.4.  specialization.
 
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/char32_t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/char32_t.cc
index 07f72c4..ebf30ad 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/char32_t.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/char32_t.cc
@@ -1,4 +1,5 @@
 // { dg-options "-std=gnu++11" }
+// { dg-require-cstdint "" }
 
 // 2014-04-24 Rüdiger Sonderfeld

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mark Wielaard

On Wed, 2015-02-18 at 11:52 -0800, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 11:44 AM, Mike Frysinger  wrote:
> > On 18 Feb 2015 08:58, H.J. Lu wrote:
> >> On Wed, Feb 18, 2015 at 8:54 AM, Mike Frysinger wrote:
> >> >> Why do you want to turn off zlib? On Linux/x86,  zlib is required
> >> >> for assembler.  At least, you should issue an error when --without-libz
> >> >> is used in binutils for Linux/x86 target.
> >> >
> >> > err, when did that happen ?  why would zlib be possibly required for an
> >> > assembler ?
> >>
> >> commit 89e7505fcde4bd83948f559f429a0e1eb4262f05
> >> Author: H.J. Lu 
> >> Date:   Sun Dec 14 06:41:03 2014 -0800
> >>
> >> Compress debug sections for Linux/x86 by default
> >>
> >>   * config/tc-i386.c (flag_compress_debug): Default to compress
> >>   debug sections for Linux.
> >
> > i don't see how that justifies making it a hard requirement
> 
> Can you elaborate?

That doesn't seem like a smart default. And why is it Linux/x86 only?
Shouldn't that be something that is done explicitly by a distro
configuring binutils after making sure it actually is beneficial
(debuginfo is often compressed in a different way, on the package/file
level or with dwz). And after making sure all tools actually work with
it? There are various tools that don't handle the .zdebug format like
valgrind. And at least elfutils has trouble with it for ET_REL files,
like kernel modules, because relocations don't actually apply anymore to
the section data as is (but only after the decompression).

Cheers,

Mark

[committed] Add missing cleanup in gfortran.dg/fmt_cache_1.f

2015-02-18 Thread Tom de Vries


Hi,

I found a fort.10 file in the test directories, and using the contents tracked 
it back to fmt_cache_1.f, which creates fort.10 but doesn't remove it. This 
patch fixes that.


Tested by running the test-case and checking that fort.10 doesn't appear anymore 
in the test directory.


Committed as obvious.

Thanks,
- Tom
2015-02-18  Tom de Vries  

	* gfortran.dg/fmt_cache_1.f: Add missing close.

---
 gcc/testsuite/gfortran.dg/fmt_cache_1.f | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gfortran.dg/fmt_cache_1.f b/gcc/testsuite/gfortran.dg/fmt_cache_1.f
index b9b9fe8..3344e5d 100644
--- a/gcc/testsuite/gfortran.dg/fmt_cache_1.f
+++ b/gcc/testsuite/gfortran.dg/fmt_cache_1.f
@@ -28,6 +28,7 @@
   teststring = ""
   read(10,'(a)') teststring
   if (teststring.ne."   arlxca =   0.0 arlxcc =")call abort
+  close(10, status='delete')
   end program astap
 
 
-- 
1.9.1

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Jakub Jelinek

On Wed, Feb 18, 2015 at 09:32:21PM +0100, Mark Wielaard wrote:
> > Can you elaborate?
> 
> That doesn't seem like a smart default. And why is it Linux/x86 only?
> Shouldn't that be something that is done explicitly by a distro
> configuring binutils after making sure it actually is beneficial
> (debuginfo is often compressed in a different way, on the package/file
> level or with dwz). And after making sure all tools actually work with

Yeah, dwz doesn't handle those, I think debugedit used by rpm doesn't
either.  When stripping into files, it would seem to be smarter to just
compress the separate debug files rather than compressing individual
sections anyway.

Jakub

[committed] Add missing cleanup in gfortran.dg/finalize_28.f90

2015-02-18 Thread Tom de Vries


Hi,

I found finalize_28.f90.003t.original in a gfortran test directory. This patch 
adds the missing cleanup-tree-dump.


Tested by running the test-case and checking that the file does not occur 
anymore in the test directory.


Committed as obvious.

Thanks,
- Tom
2015-02-18  Tom de Vries  

	* gfortran.dg/finalize_28.f90: Add missing cleanup-tree-dump.

---
 gcc/testsuite/gfortran.dg/finalize_28.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gfortran.dg/finalize_28.f90 b/gcc/testsuite/gfortran.dg/finalize_28.f90
index 03de5d0..3d7b241 100644
--- a/gcc/testsuite/gfortran.dg/finalize_28.f90
+++ b/gcc/testsuite/gfortran.dg/finalize_28.f90
@@ -22,3 +22,4 @@ contains
   end subroutine coo_dump_edges
 end module coo_graphs
 ! { dg-final { scan-tree-dump-times "__builtin_free" 3 "original" } }
+! { dg-final { cleanup-tree-dump "original" } }
-- 
1.9.1

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard  wrote:
> On Wed, 2015-02-18 at 11:52 -0800, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 11:44 AM, Mike Frysinger  wrote:
>> > On 18 Feb 2015 08:58, H.J. Lu wrote:
>> >> On Wed, Feb 18, 2015 at 8:54 AM, Mike Frysinger wrote:
>> >> >> Why do you want to turn off zlib? On Linux/x86,  zlib is required
>> >> >> for assembler.  At least, you should issue an error when --without-libz
>> >> >> is used in binutils for Linux/x86 target.
>> >> >
>> >> > err, when did that happen ?  why would zlib be possibly required for an
>> >> > assembler ?
>> >>
>> >> commit 89e7505fcde4bd83948f559f429a0e1eb4262f05
>> >> Author: H.J. Lu 
>> >> Date:   Sun Dec 14 06:41:03 2014 -0800
>> >>
>> >> Compress debug sections for Linux/x86 by default
>> >>
>> >>   * config/tc-i386.c (flag_compress_debug): Default to compress
>> >>   debug sections for Linux.
>> >
>> > i don't see how that justifies making it a hard requirement
>>
>> Can you elaborate?
>
> That doesn't seem like a smart default. And why is it Linux/x86 only?
> Shouldn't that be something that is done explicitly by a distro
> configuring binutils after making sure it actually is beneficial
> (debuginfo is often compressed in a different way, on the package/file
> level or with dwz). And after making sure all tools actually work with
> it? There are various tools that don't handle the .zdebug format like
> valgrind. And at least elfutils has trouble with it for ET_REL files,
> like kernel modules, because relocations don't actually apply anymore to
> the section data as is (but only after the decompression).
>

Now it becomes a monthly topic:

https://sourceware.org/ml/binutils/2015-01/msg00089.html



-- 
H.J.

[committed] Add missing cleanup in gfortran.dg/pr37287-1.f90

2015-02-18 Thread Tom de Vries


Hi,

After running gfortran tests, I found a pr37287_2.mod in my test directory.
Fixed by this patch which adds a missing cleanup-modules.

Tested by running the testcase and checking that the file does not occur anymore 
in the test directory.


Committed as obvious.

Thanks,
- Tom
2015-02-18  Tom de Vries  

	* gfortran.dg/pr37287-1.f90: Add missing cleanup-modules.

---
 gcc/testsuite/gfortran.dg/pr37287-1.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gfortran.dg/pr37287-1.f90 b/gcc/testsuite/gfortran.dg/pr37287-1.f90
index c2d42e6..ca8b879 100644
--- a/gcc/testsuite/gfortran.dg/pr37287-1.f90
+++ b/gcc/testsuite/gfortran.dg/pr37287-1.f90
@@ -12,3 +12,4 @@ contains
   end subroutine set_null
 end module pr37287_1
 end
+! { dg-final { cleanup-modules "pr37287_2" } }
-- 
1.9.1

Re: [PATCH] Fix testsuite race on additional_sources

2015-02-18 Thread Jeff Law


On 02/18/15 08:05, Maxim Kuvyrkov wrote:

Hi,

This testsuite patch fixes a race on the additional_sources testsuite variable.  When a test has 
both dg-additional-sources and "dg-do run { target FOO }" directives, it may 
occur that the FOO test will attempt to use additional_sources, which will result in 
a failure to compile the FOO test.  It often happens that the FOO test was already done for one of the 
previous testcases (which didn't use dg-additional-sources), so the failure is not 
stable.

This behavior can be more-or-less reliably triggered with

make check-gcc RUNTESTFLAGS="i386.exp=gcc.target/i386/pr64291-1.c"

The attached patch fixes the problem.  OK for trunk and 4.9 branch?

Yes, this is fine.
Jeff

[committed] Add missing cleanup in gfortran.dg/coarray_35a.f90

2015-02-18 Thread Tom de Vries


Hi,

After running gfortran tests, I found a global_coarrays.mod file in my test 
directory. Fixed by this patch which adds a missing cleanup-modules.


Tested by running the testcase and checking that the file does not occur anymore 
in the test directory.


Committed as obvious.

Thanks,
- Tom
2015-02-18  Tom de Vries  

	* gfortran.dg/coarray_35a.f90: Add missing cleanup-modules.

---
 gcc/testsuite/gfortran.dg/coarray_35a.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gfortran.dg/coarray_35a.f90 b/gcc/testsuite/gfortran.dg/coarray_35a.f90
index eeeb289..1b954a9 100644
--- a/gcc/testsuite/gfortran.dg/coarray_35a.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_35a.f90
@@ -26,3 +26,4 @@ end program testmod
 
 ! Check for the symbol of the coarray token (w/o system-dependend prefix)
 ! { dg-final { scan-assembler "caf_token__global_coarrays_MOD_b" } }
+! { dg-final { cleanup-modules "global_coarrays" } }
-- 
1.9.1

Re: [Patch] Add option ftree-stdarg-opt

2015-02-18 Thread Tom de Vries


On 17-02-15 13:26, Richard Biener wrote:

On Tue, Feb 17, 2015 at 1:12 PM, Tom de Vries  wrote:

Hi,

this patch adds option ftree-stdarg-opt, which switches pass_stdarg on or
off.

Pass_stdarg does an optimization on cfun->va_list_gpr/fpr_size, and since
it's an optimization, it's useful to be able to switch it off in case of a
problem with the pass.

This is not a regression or documentation fix, so it doesn't classify as a
stage 4 patch. I could imagine it still being included in stage4 because it
adds the possibility for a workaround in case of problems.

Bootstrapped and reg-tested on x86_64.

OK for stage1 (or even stage 4)?


New options need to be documented in invoke.texi.  I also wonder
if 'stdarg' is a term known to programmers.  I'd rather document it
as
"Optimize the prologue of variadic argument functions with respect
to usage of those arguments"

And please omit 'tree' from the flag, thus -fstdarg-opt

Otherwise generally fine for stage4.



Updated patch accordingly, re-tested and committed as attached.
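
As a rough illustration of the kind of code the option affects (this is not part of the patch), consider a variadic function that only ever fetches integer arguments.  The name sum_ints is made up.  On targets that pass variadic arguments in registers, pass_stdarg may be able to work out that only part of the register save area is needed for such a function; with -fno-stdarg-opt that analysis is skipped and the prologue saves everything.

```cpp
#include <cstdarg>

// Hypothetical example: sums 'count' int arguments.  Only va_arg (ap, int)
// is used, so no floating-point argument registers are ever read through
// the va_list.
int sum_ints(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```

The result of the function is the same with either setting; only the generated prologue may differ.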

Thanks,
- Tom

2015-02-18  Tom de Vries  

	* common.opt (fstdarg-opt): New option.
	* tree-stdarg.c (pass_stdarg::gate): Use flag_stdarg_opt.
	* doc/invoke.texi (@item Optimization Options): Add -fstdarg-opt.
	(@item -fstdarg-opt): New item.
---
 gcc/common.opt  | 4 
 gcc/doc/invoke.texi | 7 ++-
 gcc/tree-stdarg.c   | 5 +++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index e0d4a1d..4fa12f5 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2069,6 +2069,10 @@ fssa-phiopt
 Common Report Var(flag_ssa_phiopt) Optimization
 Optimize conditional patterns using SSA PHI nodes
 
+fstdarg-opt
+Common Report Var(flag_stdarg_opt) Init(1) Optimization
+Optimize amount of stdarg registers saved to stack at start of function
+
 fvariable-expansion-in-unroller
 Common Report Var(flag_variable_expansion_in_unroller) Optimization
 Apply variable expansion when loops are unrolled
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4a79b48..ef4cc75 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -430,7 +430,7 @@ Objective-C and Objective-C++ Dialects}.
 -fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol
 -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-phiopt @gol
 -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
--fstack-protector-explicit -fstrict-aliasing @gol
+-fstack-protector-explicit -fstdarg-opt -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
@@ -9867,6 +9867,11 @@ references to local frame addresses.
 Like @option{-fstack-protector} but only protects those functions which
 have the @code{stack_protect} attribute
 
+@item -fstdarg-opt
+@opindex fstdarg-opt
+Optimize the prologue of variadic argument functions with respect to usage of
+those arguments.
+
 @item -fsection-anchors
 @opindex fsection-anchors
 Try to reduce the number of symbolic address calculations by using
diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c
index 2cf0ca3..17d51a2 100644
--- a/gcc/tree-stdarg.c
+++ b/gcc/tree-stdarg.c
@@ -704,8 +704,9 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
 {
-  /* This optimization is only for stdarg functions.  */
-  return fun->stdarg != 0;
+  return (flag_stdarg_opt
+	  /* This optimization is only for stdarg functions.  */
+	  && fun->stdarg != 0);
 }
 
   virtual unsigned int execute (function *);
-- 
1.9.1

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mark Wielaard

On Wed, 2015-02-18 at 12:53 -0800, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard  wrote:
> > That doesn't seem like a smart default. And why is is Linux/x86 only?
> > Shouldn't that be something that is done explicitly by a distro
> > configuring binutils after making sure it actually is beneficial
> > (debuginfo is often compressed in a different way, on the package/file
> > level or with dwz). And after making sure all tools actually work with
> > it? There are various tools that don't handle the .zdebug format like
> > valgrind. And at least elfutils has trouble with it for ET_REL files,
> > like kernel modules, because relocations don't actually apply anymore to
> > the section data as is (but only after the decompression).
> 
> Now it becomes a monthly topic:
> 
> https://sourceware.org/ml/binutils/2015-01/msg00089.html

Thanks, I hadn't seen that before. Alan Modra makes some good points in
that thread why it is not a good change:
https://sourceware.org/ml/binutils/2015-01/msg00135.html
Do people agree with that? And/Or can the change be reverted for now
till there is agreement it is a desirable default?

Thanks,

Mark

Re: [PATCH] Put cleanups of cleanups after cleanups (PR gcov-profile/64634)

2015-02-18 Thread Jeff Law


On 02/18/15 11:40, Jakub Jelinek wrote:

Hi!

Richard's GIMPLE EH rewrite in r151696 regressed following testcase.
The problem is that when lowering:
   [gcov-15.C:14:5] try
 {
   [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
   [gcov-15.C:18:12] try
 {
   [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
 }
   catch
 {
   [gcov-15.C:18:11] __cxa_free_exception (D.2335);
 }
   [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
 }
   catch
 {
   [gcov-15.C:20:3] catch (NULL)
 {
   [gcov-15.C:20:3] try
 {
   [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (0);
   [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
   [gcov-15.C:22:15] catchEx ();
 }
   finally
 {
   [gcov-15.C:20:10] __cxa_end_catch ();
 }
 }
 }
we put the cleanup of the catch cleanup in front of the catch cleanup
in the EH sequence:
   [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
   [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
   [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
   :
   [gcov-15.C:24:1] D.2341 = 0;
   [gcov-15.C:24:1] goto ;
   :
   [gcov-15.C:24:1] return D.2341;
   :
   [gcov-15.C:20:10] __cxa_end_catch ();
   resx 3
   :
   eh_dispatch 1
   resx 1
   :
   [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (1);
   [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
   [gcov-15.C:22:15] catchEx ();
   [gcov-15.C:20:10] __cxa_end_catch ();
   goto ;
and as the __cxa_end_catch () is the first bb for line 20,
gcov without -a considers that bb count as the one to be shown.
Before the gimple EH rewrite and also with this patch we instead
order the cleanup (__cxa_end_catch ()) after the __cxa_begin_catch ():
   [gcov-15.C:18:12] D.2335 = __cxa_allocate_exception (4);
   [gcov-15.C:18:12] [gcov-15.C:18:12] MEM[(int *)D.2335] = 5;
   [gcov-15.C:18:11] __cxa_throw (D.2335, &_ZTIi, 0B);
   :
   [gcov-15.C:24:1] D.2341 = 0;
   [gcov-15.C:24:1] goto ;
   :
   [gcov-15.C:24:1] return D.2341;
   :
   eh_dispatch 1
   resx 1
   :
   [gcov-15.C:20:10] D.2340 = __builtin_eh_pointer (1);
   [gcov-15.C:20:10] __cxa_begin_catch (D.2340);
   [gcov-15.C:22:15] catchEx ();
   [gcov-15.C:20:10] __cxa_end_catch ();
   goto ;
   :
   [gcov-15.C:20:10] __cxa_end_catch ();
   resx 3

Bootstrapped/regtested on x86_64-linux and i686-linux,
libstdc++.so.6 compiled without and with the patch is identical on
x86_64-linux.  The testcase also has identical generated code both at -O0
and -O2 when compiled without coverage.  Ok for trunk?

2015-02-18  Jakub Jelinek  

PR gcov-profile/64634
* tree-eh.c (frob_into_branch_around): Fix up typos
in function comment.
(lower_catch): Put eh_seq resulting from EH lowering of
the cleanup sequence after the cleanup rather than before
it.

* g++.dg/gcov/gcov-15.C: New test.

OK.
jeff

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 1:40 PM, Mark Wielaard  wrote:
> On Wed, 2015-02-18 at 12:53 -0800, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard  wrote:
>> > That doesn't seem like a smart default. And why is is Linux/x86 only?
>> > Shouldn't that be something that is done explicitly by a distro
>> > configuring binutils after making sure it actually is beneficial
>> > (debuginfo is often compressed in a different way, on the package/file
>> > level or with dwz). And after making sure all tools actually work with
>> > it? There are various tools that don't handle the .zdebug format like
>> > valgrind. And at least elfutils has trouble with it for ET_REL files,
>> > like kernel modules, because relocations don't actually apply anymore to
>> > the section data as is (but only after the decompression).
>>
>> Now it becomes a monthly topic:
>>
>> https://sourceware.org/ml/binutils/2015-01/msg00089.html
>
> Thanks, I hadn't seen that before. Alan Modra makes some good points in
> that thread why it is not a good change:
> https://sourceware.org/ml/binutils/2015-01/msg00135.html
> Do people agree with that? And/Or can the change be reverted for now
> till there is agreement it is a desirable default?
>

It may not be a good idea for all targets.  If you find an issue
on Linux/x86, please file a bug binutils report.

Thanks.


-- 
H.J.

Re: [PATCH] rtl-optimization/64935: Sorting of ready list is different with/without DEBUG_INSNs.

2015-02-18 Thread Jeff Law


On 02/18/15 09:02, Maxim Kuvyrkov wrote:

Hi,

This patch fixes PR64935, which is triggered when ready list at the start of a 
basic block is greater than --param=max-sched-ready-insns.  Sorting the ready 
list when it has more than max-sched-ready-insns elements is special in that we 
want to sort normal insns even if there are debug insns in the list.  This is 
due to code for max-sched-ready-insns ignoring debug insns on purpose.

The problem in the bug can be fixed with a smallish patch 
(see https://gcc.gnu.org/bugzilla/attachment.cgi?id=34674), but it makes code 
look ugly and non-intuitive.  The second version of the patch (the one attached 
here) is a bit bigger, but it gives functions definitive and clear purpose, and 
makes code easier to understand.

While reviewing the patch I suggest using context diff mode (in emacs C-c C-d). 
 I couldn't convince git to generate context diff.

Bootstrapped/tested on x86_64-linux-gnu and cross-tested on 
arm-linux-gnueabihf.  Markus also tested this patch on powerpc64-linux-gnu.

OK for trunk?

Thank you,

--
Maxim Kuvyrkov
www.linaro.org



0001-Fix-PR64935.patch


 From 339d0af94509796d08101724ea54e1d3787f89f2 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov
Date: Wed, 18 Feb 2015 15:27:49 +
Subject: [PATCH] Fix PR64935

* haifa-sched.c (enum rfs_decision, rfs_str): Remove RFS_DEBUG.
(rank_for_schedule_debug): Update.
(ready_sort): Make static.  Move sorting logic to ...
(ready_sort_debug, ready_sort_real): New static functions.
(schedule_block): Sort both debug insns and real insns in preparation
for ready list trimming.  Improve debug output.
* sched-int.h (ready_sort): Remove global declaration.

* gcc.dg/pr64935-1.c, gcc.dg/pr64935-2.c: New tests.

This is fine.   Thanks,
jeff

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mike Frysinger

On 18 Feb 2015 13:54, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 1:40 PM, Mark Wielaard wrote:
> > On Wed, 2015-02-18 at 12:53 -0800, H.J. Lu wrote:
> >> On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard wrote:
> >> > That doesn't seem like a smart default. And why is is Linux/x86 only?
> >> > Shouldn't that be something that is done explicitly by a distro
> >> > configuring binutils after making sure it actually is beneficial
> >> > (debuginfo is often compressed in a different way, on the package/file
> >> > level or with dwz). And after making sure all tools actually work with
> >> > it? There are various tools that don't handle the .zdebug format like
> >> > valgrind. And at least elfutils has trouble with it for ET_REL files,
> >> > like kernel modules, because relocations don't actually apply anymore to
> >> > the section data as is (but only after the decompression).
> >>
> >> Now it becomes a monthly topic:
> >>
> >> https://sourceware.org/ml/binutils/2015-01/msg00089.html
> >
> > Thanks, I hadn't seen that before. Alan Modra makes some good points in
> > that thread why it is not a good change:
> > https://sourceware.org/ml/binutils/2015-01/msg00135.html
> > Do people agree with that? And/Or can the change be reverted for now
> > till there is agreement it is a desirable default?
> 
> It may not be a good idea for all targets.  If you find an issue
> on Linux/x86, please file a bug binutils report.

i think we already have the reports: multiple people don't think it should be 
(1) x86-specific or (2) required.  don't get me wrong -- i think having support 
like this is great.  that doesn't mean we should be forcing it.
-mike



Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread H.J. Lu

On Wed, Feb 18, 2015 at 2:21 PM, Mike Frysinger  wrote:
> On 18 Feb 2015 13:54, H.J. Lu wrote:
>> On Wed, Feb 18, 2015 at 1:40 PM, Mark Wielaard wrote:
>> > On Wed, 2015-02-18 at 12:53 -0800, H.J. Lu wrote:
>> >> On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard wrote:
>> >> > That doesn't seem like a smart default. And why is is Linux/x86 only?
>> >> > Shouldn't that be something that is done explicitly by a distro
>> >> > configuring binutils after making sure it actually is beneficial
>> >> > (debuginfo is often compressed in a different way, on the package/file
>> >> > level or with dwz). And after making sure all tools actually work with
>> >> > it? There are various tools that don't handle the .zdebug format like
>> >> > valgrind. And at least elfutils has trouble with it for ET_REL files,
>> >> > like kernel modules, because relocations don't actually apply anymore to
>> >> > the section data as is (but only after the decompression).
>> >>
>> >> Now it becomes a monthly topic:
>> >>
>> >> https://sourceware.org/ml/binutils/2015-01/msg00089.html
>> >
>> > Thanks, I hadn't seen that before. Alan Modra makes some good points in
>> > that thread why it is not a good change:
>> > https://sourceware.org/ml/binutils/2015-01/msg00135.html
>> > Do people agree with that? And/Or can the change be reverted for now
>> > till there is agreement it is a desirable default?
>>
>> It may not be a good idea for all targets.  If you find an issue
>> on Linux/x86, please file a bug binutils report.
>
> i think we already have the reports: multiple people don't think it should be
> (1) x86-specific or (2) required.  don't get me wrong -- i think having support
> like this is great.  that doesn't mean we should be forcing it.
> -mike

Please file a bug report with a testcase.


-- 
H.J.

Re: [patch] fix PR65048: check that jump-thread paths are still valid

2015-02-18 Thread Sebastian Pop

Jeff Law wrote:
> These kinds of situations are normally pruned out in mark_threaded_blocks.

I added the FSM code generation before calling mark_threaded_blocks.

> 
> The dumps for the FSM threads are a bit sparse -- they don't show
> the entire path.  That makes it much harder to see what's going on.

Would a patch improving the FSM dumps be OK to commit separately to trunk?

> It also appears that FSM is  registering lots of duplicate paths.

I discussed this with my colleague Brian (in CC), and we think it is feasible
to avoid registering duplicate paths by hashing the paths that have already
been discovered, and checking each new path's checksum before inserting it in
the paths vector.  I don't think removing duplicate paths would help in the
current case.
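
The duplicate-avoidance idea described above could be sketched roughly as follows. This is only an illustration in standalone C++, not GCC code: `Edge`, `Path`, `PathHash` and `PathRegistry` are hypothetical names, and basic blocks are represented by plain integer indices as in the dumps.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an edge is a (source bb, destination bb) index
// pair, and a path is the ordered list of edges of one FSM jump thread.
using Edge = std::pair<int, int>;
using Path = std::vector<Edge>;

// Combine all edges of a path into one order-sensitive hash value.
struct PathHash {
  std::size_t operator()(const Path &p) const {
    std::size_t h = 0;
    for (const Edge &e : p) {
      h = h * 31 + std::hash<int>()(e.first);
      h = h * 31 + std::hash<int>()(e.second);
    }
    return h;
  }
};

class PathRegistry {
public:
  // Returns true if PATH was new and has been recorded, false if an
  // identical path had already been registered.
  bool register_path(const Path &path) {
    if (!seen_.insert(path).second)
      return false;
    paths_.push_back(path);
    return true;
  }

  std::size_t size() const { return paths_.size(); }

private:
  std::unordered_set<Path, PathHash> seen_;  // paths discovered so far
  std::vector<Path> paths_;                  // unique paths, in discovery order
};
```

With such a registry, registering (6, 7) (7, 10) (10, 11) (11, 12) a second time would be rejected instead of being appended to the paths vector again.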

> Anyway, so what node precisely is not connected?  Is that happening
> as a result of the duplicated jump threads or is it something else?

Here is a more complete dump of what is going on:

  Registering FSM jump thread: (6, 7)  (7, 10)  (10, 11)  (11, 12) 
  Registering FSM jump thread: (5, 7)  (7, 10)  (10, 11)  (11, 12) 
  Registering FSM jump thread: (6, 7)  (7, 10)  (10, 11)  (11, 12) 
  Registering FSM jump thread: (5, 7)  (7, 10)  (10, 11)  (11, 12) 
  Registering FSM jump thread: (6, 7)  (7, 10)  (10, 11)  (11, 12) 
  Registering FSM jump thread: (5, 7)  (7, 10)  (10, 11)  (11, 12) 
generating code for:   Registering FSM jump thread: (6, 7)  (7, 10)  (10, 11)  
(11, 12) 
generating code for:   Registering FSM jump thread: (5, 7)  (7, 10)  (10, 11)  
(11, 12) 

That was the first round of jump threading: we discovered all the paths to be
threaded, and then we code generated only two jump-threads.

Here is the second run of jump-thread, probably on a different function:

  Registering FSM jump thread: (6, 14)  (14, 15)  (15, 16)  (16, 3)  (3, 4)  
(4, 9)  (9, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (8, 14)  (14, 15)  (15, 16)  (16, 3)  (3, 4)  
(4, 9)  (9, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (6, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 
14) 
  Registering FSM jump thread: (8, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 
14) 
  Registering FSM jump thread: (5, 11)  (11, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (7, 11)  (11, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (5, 10)  (10, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (7, 10)  (10, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (6, 14)  (14, 15)  (15, 16)  (16, 3)  (3, 4)  
(4, 9)  (9, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (8, 14)  (14, 15)  (15, 16)  (16, 3)  (3, 4)  
(4, 9)  (9, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (6, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 
14) 
  Registering FSM jump thread: (8, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 
14) 
  Registering FSM jump thread: (5, 11)  (11, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (7, 11)  (11, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (5, 10)  (10, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (7, 10)  (10, 12)  (12, 13)  (13, 14) 
  Registering FSM jump thread: (16, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 
15)  (15, 16) 
  Registering FSM jump thread: (11, 12)  (12, 13)  (13, 15)  (15, 16) 
  Registering FSM jump thread: (10, 12)  (12, 13)  (13, 15)  (15, 3) 
generating code for:   Registering FSM jump thread: (6, 14)  (14, 15)  (15, 16) 
 (16, 3)  (3, 4)  (4, 9)  (9, 12)  (12, 13)  (13, 14) 
generating code for:   Registering FSM jump thread: (10, 12)  (12, 13)  (13, 
15)  (15, 3) 
generating code for:   Registering FSM jump thread: (11, 12)  (12, 13)  (13, 
15)  (15, 16) 
generating code for:   Registering FSM jump thread: (16, 3)  (3, 4)  (4, 9)  
(9, 12)  (12, 13)  (13, 15)  (15, 16) 
invalid jump-thread:   Registering FSM jump thread: (7, 10)  (10, 25)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (5, 10)  (10, 25)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (7, 11)  (11, 28)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (5, 11)  (11, 28)  (12, 13) 
 (13, 14) 
generating code for:   Registering FSM jump thread: (8, 3)  (3, 4)  (4, 9)  (9, 
12)  (12, 13)  (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (7, 10)  (10, 25)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (5, 10)  (10, 25)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (7, 11)  (11, 28)  (12, 13) 
 (13, 14) 
invalid jump-thread:   Registering FSM jump thread: (5, 11)  (11, 28)  (12, 13) 
 (13, 14) 

After having generated 4 jump threads, we end up trying to code generate a path
that is not connected anymore: (7, 10)  (10, 25)  (12, 13)  (13, 14)
This is due to the fact that we have already code generated a jump-thread for 
this path:
(10, 12)  (12, 13)  (13, 15)  (15, 3)
and we redirected the edge (10, 12) to (10, 25).
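
To illustrate what "not connected" means here, a minimal standalone C++ sketch follows. It is an assumption for illustration only and not necessarily how the patch detects the situation: `Edge`, `Path` and `path_is_connected` are hypothetical names, with basic blocks given as integer indices as in the dump.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an edge is a (source bb, destination bb) pair.
using Edge = std::pair<int, int>;
using Path = std::vector<Edge>;

// A path is connected if every edge ends in the block where the next edge
// starts.  An empty or single-edge path is trivially connected.
bool path_is_connected(const Path &path) {
  for (std::size_t i = 0; i + 1 < path.size(); ++i)
    if (path[i].second != path[i + 1].first)
      return false;
  return true;
}
```

Under this check the path (7, 10) (10, 25) (12, 13) (13, 14) fails at the second edge, because it ends in block 25 while the third edge starts in block 12.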
The code generated for the path

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mike Frysinger

On 18 Feb 2015 14:24, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 2:21 PM, Mike Frysinger wrote:
> > i think we already have the reports: multiple people don't think it should be
> > (1) x86-specific or (2) required.  don't get me wrong -- i think having support
> > like this is great.  that doesn't mean we should be forcing it.
> 
> Please file a bug report with a testcase.

this is getting kafka-esque.  you yourself stated:
  On Linux/x86, zlib is required for assembler.  At least, you should issue an 
  error when --without-libz is used in binutils for Linux/x86 target.
that should not be the case.  making someone open a bug report so you can close 
it with "fixed" and a patch is wasting time.  just fix it now.

all that said, if we look at your actual commit (89e7505fcde4bd83948f559f429a0):
gas/config/tc-i386.c:
  +#ifdef TE_LINUX
  +/* Default to compress debug sections for Linux.  */
  +int flag_compress_debug = 1;
  +#endif

and we look at where that flag is used:
gas/as.c:
  ...
case OPTION_COMPRESS_DEBUG:
  #ifdef HAVE_ZLIB_H
  flag_compress_debug = 1;
  #else
  as_warn (_("cannot compress debug sections (zlib not installed)"));
  #endif /* HAVE_ZLIB_H */
  break;

case OPTION_NOCOMPRESS_DEBUG:
  flag_compress_debug = 0;
  break;
  ...

gas/write.c:
  void
  write_object_file (void)
  {
  ...
if (flag_compress_debug)
  bfd_map_over_sections (stdoutput, compress_debug, (char *) 0);
  ...
  static void
  compress_debug (bfd *abfd, asection *sec, void *xxx ATTRIBUTE_UNUSED)
  {
  ...
strm = compress_init ();
if (strm == NULL)
  return;

it turns out the current code does *not* require zlib.  as long as that does not
change (either issuing a warning or throwing an error), i see no reason why we
need or should make zlib a requirement in binutils, regardless of target.
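
the control flow quoted above can be modelled with a small standalone C++ sketch.  this is a heavily simplified, hypothetical model and not the gas code: `Stream`, `write_debug_section` and the `have_zlib` parameter are made up for illustration, and it assumes that `compress_init` returns NULL when zlib support is not built in.

```cpp
#include <string>

// Hypothetical placeholder for the compression stream state.
struct Stream {};

// Assumption: without zlib support, initialisation yields a null stream.
Stream *compress_init(bool have_zlib) {
  static Stream stream;
  return have_zlib ? &stream : nullptr;
}

// Returns the section contents as they would be written out.
std::string write_debug_section(const std::string &contents,
                                bool flag_compress_debug, bool have_zlib) {
  if (!flag_compress_debug)
    return contents;                      // compression not requested
  Stream *strm = compress_init(have_zlib);
  if (strm == nullptr)
    return contents;                      // quietly left uncompressed
  return "compressed(" + contents + ")";  // placeholder for real compression
}
```

in this model, a default of flag_compress_debug = 1 combined with a build lacking zlib simply writes the section uncompressed, with no warning or error.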
-mike


[PATCH] sem_function::bb_dict_test should take a vec *

2015-02-18 Thread tbsaunde+gcc

From: Trevor Saunders 

bb_dict_test () is meant to operate on the caller's vector, not a copy of it.
Otherwise it either does nothing or crashes.
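
A simplified standalone model of the difference, using std::vector<int> in place of GCC's vec types (the function names are hypothetical, and the sketch only models the "does nothing" half of the problem, not the crash):

```cpp
#include <cstddef>
#include <vector>

// By value: the dictionary is grown and updated only in the local copy,
// so the caller never sees the recorded mapping.
bool dict_test_by_value(std::vector<int> dict, int source, int target) {
  source++;
  target++;
  if (dict.size() <= static_cast<std::size_t>(source))
    dict.resize(source + 1, 0);
  if (dict[source] == 0) {
    dict[source] = target;
    return true;
  }
  return dict[source] == target;
}

// By pointer: the caller's dictionary is grown and updated in place, so a
// later conflicting mapping for the same source index is detected.
bool dict_test_by_pointer(std::vector<int> *dict, int source, int target) {
  source++;
  target++;
  if (dict->size() <= static_cast<std::size_t>(source))
    dict->resize(source + 1, 0);
  if ((*dict)[source] == 0) {
    (*dict)[source] = target;
    return true;
  }
  return (*dict)[source] == target;
}
```

With the by-value variant, mapping source 1 first to target 2 and then to target 3 succeeds both times, since each call starts from the caller's unchanged vector; the by-pointer variant rejects the second mapping.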

Approved by Honza off-list; committing to trunk (bootstrapped + regtested
x86_64-linux-gnu).

Trev

gcc/ChangeLog:

2015-02-18  Trevor Saunders  

* ipa-icf.c (sem_function::equals_private): Adjust.
(sem_function::bb_dict_test): Take a vec * instead of
auto_vec.
* ipa-icf.h (bb_dict_test): Likewise.
---
 gcc/ChangeLog |  7 +++
 gcc/ipa-icf.c | 16 
 gcc/ipa-icf.h |  2 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 07cadb3..174e5b4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-02-18  Trevor Saunders  
+
+   * ipa-icf.c (sem_function::equals_private): Adjust.
+   (sem_function::bb_dict_test): Take a vec * instead of
+   auto_vec.
+   * ipa-icf.h (bb_dict_test): Likewise.
+
 2015-02-18  Jakub Jelinek  
 
PR gcov-profile/64634
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 692946a..494fdcf 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -563,10 +563,10 @@ sem_function::equals_private (sem_item *item,
  if (e1->flags != e2->flags)
return return_false_with_msg ("flags comparison returns false");
 
- if (!bb_dict_test (bb_dict, e1->src->index, e2->src->index))
+ if (!bb_dict_test (&bb_dict, e1->src->index, e2->src->index))
return return_false_with_msg ("edge comparison returns false");
 
- if (!bb_dict_test (bb_dict, e1->dest->index, e2->dest->index))
+ if (!bb_dict_test (&bb_dict, e1->dest->index, e2->dest->index))
return return_false_with_msg ("BB comparison returns false");
 
  if (!m_checker->compare_edge (e1, e2))
@@ -1053,21 +1053,21 @@ sem_function::icf_handled_component_p (tree t)
corresponds to TARGET.  */
 
 bool
-sem_function::bb_dict_test (auto_vec<int> bb_dict, int source, int target)
+sem_function::bb_dict_test (vec<int> *bb_dict, int source, int target)
 {
   source++;
   target++;
 
-  if (bb_dict.length () <= (unsigned)source)
-bb_dict.safe_grow_cleared (source + 1);
+  if (bb_dict->length () <= (unsigned)source)
+bb_dict->safe_grow_cleared (source + 1);
 
-  if (bb_dict[source] == 0)
+  if ((*bb_dict)[source] == 0)
 {
-  bb_dict[source] = target;
+  (*bb_dict)[source] = target;
   return true;
 }
   else
-return bb_dict[source] == target;
+return (*bb_dict)[source] == target;
 }
 
 /* Iterates all tree types in T1 and T2 and returns true if all types
diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
index adbedd6..a55699b 100644
--- a/gcc/ipa-icf.h
+++ b/gcc/ipa-icf.h
@@ -275,7 +275,7 @@ private:
 
   /* Basic blocks dictionary BB_DICT returns true if SOURCE index BB
  corresponds to TARGET.  */
-  bool bb_dict_test (auto_vec<int> bb_dict, int source, int target);
+  bool bb_dict_test (vec<int> *bb_dict, int source, int target);
 
   /* Iterates all tree types in T1 and T2 and returns true if all types
  are compatible. If COMPARE_POLYMORPHIC is set to true,
-- 
2.3.0.81.g664101d

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Joel Brobecker

> Why do you want to turn off zlib? On Linux/x86,  zlib is required
> for assembler.  At least, you should issue an error when --without-libz
> is used in binutils for Linux/x86 target.

I am trying to do the exact opposite, which is to provide an option
to compile WITH zlib, but using an install at a non-standard location.

> I guess someone has asked it before.  Why can't zlib be made the
> same as
> 
>   --with-mpc=PATH specify prefix directory for installed MPC package.
>   Equivalent to --with-mpc-include=PATH/include plus
>   --with-mpc-lib=PATH/lib
>   --with-mpc-include=PATH specify directory for installed MPC include files
>   --with-mpc-lib=PATH specify directory for the installed MPC library
> 
> It is more flexible than your patch.  If you have some existing packages
> which use your scheme, you can translate the configure command line
> options to this one.

This is frustrating. I already answered that question.

-- 
Joel

[PATCH] Fix for PR c++/60269

2015-02-18 Thread Iyer, Balaji V

Hello Everyone,
    Attached, please find a patch that is a fix for PR c++/60269. 
Tested on x86_64 and have no regression issues. Is this OK for trunk?

Thanks,

Balaji V. Iyer.
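
For readers unfamiliar with the power-of-2 requirement on vectorlength that the patch tests via exact_log2, a standalone C++ sketch of such a test follows. The names `exact_log2_sketch` and `vectorlength_is_power_of_2` are hypothetical; the sketch only mirrors the behaviour the patch relies on, namely that the result is -1 when the value is not a power of 2.

```cpp
#include <cstdint>

// Stand-in for the check used in the patch: returns log2 of X if X is an
// exact power of 2, and -1 otherwise (including for 0).
int exact_log2_sketch(std::uint64_t x) {
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while ((x >>= 1) != 0)
    ++log;
  return log;
}

// A vectorlength value is accepted only if it is a power of 2.
bool vectorlength_is_power_of_2(std::uint64_t n) {
  return exact_log2_sketch(n) != -1;
}
```

In the new test case the template is instantiated with 64, which passes this check; a value such as 48 would be rejected with "vectorlength must be a power of 2".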


+2015-02-18  Balaji V. Iyer  
+
+   PR c++/60269
+   * parser.c (cp_parser_cilk_simd_vectorlength): Added a check for
+   template handling.  If so, then defer the validity checks to pt.c.
+   * pt.c (tsubst_omp_clauses): Added a check for invalid vectorlength
+   for Cilk Plus SIMD loops.
+

+2015-02-18  Balaji V. Iyer  
+
+   PR c++/60269
+   * g++.dg/cilk-plus/pr60269.C: New test.
+
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e0b455c..97ddee4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -33207,13 +33207,19 @@ cp_parser_cilk_simd_vectorlength (cp_parser *parser, 
tree clauses,
  error mark node then they would have emitted an error message.  */
   if (expr == error_mark_node)
 ;
-  else if (!TREE_TYPE (expr)
-  || !TREE_CONSTANT (expr)
-  || !INTEGRAL_TYPE_P (TREE_TYPE (expr)))
-error_at (loc, "vectorlength must be an integer constant");
-  else if (TREE_CONSTANT (expr)
+  else if (!processing_template_decl 
+  && (!TREE_TYPE (expr) || !TREE_CONSTANT (expr)
+  || !INTEGRAL_TYPE_P (TREE_TYPE (expr))))
+{
+  error_at (loc, "vectorlength must be an integer constant");
+  expr = error_mark_node;
+}
+  else if (!processing_template_decl && TREE_CONSTANT (expr)
   && exact_log2 (TREE_INT_CST_LOW (expr)) == -1)
-error_at (loc, "vectorlength must be a power of 2");
+{
+  error_at (loc, "vectorlength must be a power of 2");
+  expr = error_mark_node;
+}
   else 
 {
   tree c;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 9a00d0d..dc1bae8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13491,6 +13491,18 @@ tsubst_omp_clauses (tree clauses, bool declare_simd,
  OMP_CLAUSE_OPERAND (nc, 0)
= tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, 
   in_decl, /*integral_constant_expression_p=*/false);
+ if (flag_cilkplus && OMP_CLAUSE_CODE (nc) == OMP_CLAUSE_SAFELEN)
+   {
+ tree new_expr = OMP_CLAUSE_OPERAND (nc, 0);
+ if (!new_expr || new_expr == error_mark_node)
+   ;
+ else if (!TREE_TYPE (new_expr) || !TREE_CONSTANT (new_expr)
+ || !INTEGRAL_TYPE_P (TREE_TYPE (new_expr)))
+   error ("vectorlength must be an integer constant");
+ else if (TREE_CONSTANT (new_expr) 
+  && exact_log2 (TREE_INT_CST_LOW (new_expr)) == -1)
+   error ("vectorlength must be a power of 2");
+   }   
  break;
case OMP_CLAUSE_REDUCTION:
  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (oc))
diff --git a/gcc/testsuite/g++.dg/cilk-plus/pr60269.C 
b/gcc/testsuite/g++.dg/cilk-plus/pr60269.C
new file mode 100644
index 000..fa0c25b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/pr60269.C
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-options "-O3 -fcilkplus" } 
+
+template <int N>
+void foo (int *a, int *b, int *c)
+{
+#pragma simd vectorlength (N)
+for (int i = 0; i < N; i++)
+  a[i] = b[i] * c[i];
+}
+
+void
+bar (int *a, int *b, int *c)
+{
+foo <64> (a, b, c);
+}
+

Re: ping #3: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Mark Wielaard

On Wed, Feb 18, 2015 at 01:54:17PM -0800, H.J. Lu wrote:
> On Wed, Feb 18, 2015 at 1:40 PM, Mark Wielaard  wrote:
> > On Wed, 2015-02-18 at 12:53 -0800, H.J. Lu wrote:
> >> On Wed, Feb 18, 2015 at 12:32 PM, Mark Wielaard  wrote:
> >> > That doesn't seem like a smart default. And why is is Linux/x86 only?
> >> > Shouldn't that be something that is done explicitly by a distro
> >> > configuring binutils after making sure it actually is beneficial
> >> > (debuginfo is often compressed in a different way, on the package/file
> >> > level or with dwz). And after making sure all tools actually work with
> >> > it? There are various tools that don't handle the .zdebug format like
> >> > valgrind. And at least elfutils has trouble with it for ET_REL files,
> >> > like kernel modules, because relocations don't actually apply anymore to
> >> > the section data as is (but only after the decompression).
> >>
> >> Now it becomes a monthly topic:
> >>
> >> https://sourceware.org/ml/binutils/2015-01/msg00089.html
> >
> > Thanks, I hadn't seen that before. Alan Modra makes some good points in
> > that thread why it is not a good change:
> > https://sourceware.org/ml/binutils/2015-01/msg00135.html
> > Do people agree with that? And/Or can the change be reverted for now
> > till there is agreement it is a desirable default?
> >
> 
> It may not be a good idea for all targets.  If you find an issue
> on Linux/x86, please file a bug binutils report.

The issue is that, as others have pointed out, this is not something that
is target-architecture specific. So please first get agreement on whether
or not to default for the OS (or for all ELF targets or the GNU targets).
Otherwise distros will have to revert on a target by target basis to get
something consistent. Secondly the bug is not directly in binutils (but
there might be an issue between versions compiled with/without zlib
support). If .zdebug sections are left in on disk ET_REL files, like
kernel modules, there is a problem for programs that don't deal with
.zdebug sections (and/or relocations against them) in ET_REL files
like elfutils, systemtap, debugedit, dwz, etc.

Thanks,

Mark

Re: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Thomas Schwinge

Hi!

On Wed, 7 Jan 2015 17:00:59 +0100, Tristan Gingold  wrote:
> 
> > On 07 Jan 2015, at 15:45, Joel Brobecker  wrote:
> > This patch enhances config/zlib.m4 to introduce an extra option
> > --with-libz-prefix which allows us to provide the location of
> > the zlib library we want to use during the build.
> 
> I prefer the gcc way to provide external library:
> 
> --with-zlib -> system zlib used
> --with-zlib=pathname -> zlib from pathname is used
> 
> I have never needed different include and lib paths, but
> this is supported by gcc.
> 
> (Furthermore, I think that --with-zlib vs --with-libz-prefix is confusing).

I'm not a GCC build machinery maintainer, but I do second Tristan's
suggestion to stay compatible with the existing --with-[...] options that
GCC already supports:

> Cf:
> 
> --with-gmp=pathname
> --with-gmp-include=pathname
> --with-gmp-lib=pathname
> --with-mpfr=pathname
> --with-mpfr-include=pathname
> --with-mpfr-lib=pathname
> --with-mpc=pathname
> --with-mpc-include=pathname
> --with-mpc-lib=pathname
> If you want to build GCC but do not have the GMP library, the MPFR library 
> and/or the MPC library installed in a standard location and do not have their 
> sources present in the GCC source tree then you can explicitly specify the 
> directory where they are installed (‘--with-gmp=gmpinstalldir’, 
> ‘--with-mpfr=mpfrinstalldir’, ‘--with-mpc=mpcinstalldir’). The 
> --with-gmp=gmpinstalldir option is shorthand for 
> --with-gmp-lib=gmpinstalldir/lib and 
> --with-gmp-include=gmpinstalldir/include. Likewise the 
> --with-mpfr=mpfrinstalldir option is shorthand for 
> --with-mpfr-lib=mpfrinstalldir/lib and 
> --with-mpfr-include=mpfrinstalldir/include, also the --with-mpc=mpcinstalldir 
> option is shorthand for --with-mpc-lib=mpcinstalldir/lib and 
> --with-mpc-include=mpcinstalldir/include. If these shorthand assumptions are 
> not correct, you can use the explicit include and lib options directly. You 
> might also need to ensure the shared libraries can be found by the dynamic 
> linker when building and using GCC, for example by setting the runtime shared 
> library path variable (LD_LIBRARY_PATH on GNU/Linux and Solaris systems).
> These flags are applicable to the host platform only. When building a cross 
> compiler, they will not be used to configure target libraries. 
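
To make the shorthand described in the quoted documentation concrete, here is
a rough sketch in shell.  Note that expand_with is a hypothetical helper made
up for illustration only (it is not part of GCC's build machinery), and
/opt/gmp is an assumed example path:

```shell
# Hypothetical helper, for illustration only: print the explicit
# include/lib options that --with-PKG=DIR is documented to stand for.
expand_with() {
  pkg=$1   # package name, e.g. gmp, mpfr, mpc
  dir=$2   # assumed installation directory
  printf '%s\n' "--with-${pkg}-include=${dir}/include" \
                "--with-${pkg}-lib=${dir}/lib"
}

# Example: what --with-gmp=/opt/gmp would be shorthand for.
expand_with gmp /opt/gmp
```

If the headers and libraries do not live under DIR/include and DIR/lib, the
two explicit options would have to be passed separately instead.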


Grüße,
 Thomas



Re: [RFA] Add --with-libz-prefix option in config/zlib.m4

2015-02-18 Thread Thomas Schwinge

Hi!

On Thu, 19 Feb 2015 08:54:46 +0100, I wrote:
> On Wed, 7 Jan 2015 17:00:59 +0100, Tristan Gingold  
> wrote:
> > 
> > > On 07 Jan 2015, at 15:45, Joel Brobecker  wrote:
> > > This patch enhances config/zlib.m4 to introduce an extra option
> > > --with-libz-prefix which allows us to provide the location of
> > > the zlib library we want to use during the build.
> > 
> > I prefer the gcc way to provide external library:
> > 
> > --with-zlib -> system zlib used
> > --with-zlib=pathname -> zlib from pathname is used
> > 
> > I have never needed different include and lib paths, but
> > this is supported by gcc.
> > 
> > (Furthermore, I think that --with-zlib vs --with-libz-prefix is confusing).
> 
> I'm not a GCC build machinery maintainer, but I do second Tristan's
> suggestion to stay compatible with the existing --with-[...] options that
> GCC already supports:
> 
> > Cf:
> > 
> > --with-gmp=pathname
> > --with-gmp-include=pathname
> > --with-gmp-lib=pathname
> > --with-mpfr=pathname
> > --with-mpfr-include=pathname
> > --with-mpfr-lib=pathname
> > --with-mpc=pathname
> > --with-mpc-include=pathname
> > --with-mpc-lib=pathname
> > 
> > If you want to build GCC but do not have the GMP library, the MPFR library 
> > and/or the MPC library installed in a standard location and do not have 
> > their sources present in the GCC source tree then you can explicitly 
> > specify the directory where they are installed (‘--with-gmp=gmpinstalldir’, 
> > ‘--with-mpfr=mpfrinstalldir’, ‘--with-mpc=mpcinstalldir’). The 
> > --with-gmp=gmpinstalldir option is shorthand for 
> > --with-gmp-lib=gmpinstalldir/lib and 
> > --with-gmp-include=gmpinstalldir/include. Likewise the 
> > --with-mpfr=mpfrinstalldir option is shorthand for 
> > --with-mpfr-lib=mpfrinstalldir/lib and 
> > --with-mpfr-include=mpfrinstalldir/include, also the 
> > --with-mpc=mpcinstalldir option is shorthand for 
> > --with-mpc-lib=mpcinstalldir/lib and 
> > --with-mpc-include=mpcinstalldir/include. If these shorthand assumptions 
> > are not correct, you can use the explicit include and lib options directly. 
> > You might also need to ensure the shared libraries can be found by the 
> > dynamic linker when building and using GCC, for example by setting the 
> > runtime shared library path variable (LD_LIBRARY_PATH on GNU/Linux and 
> > Solaris systems).
> > These flags are applicable to the host platform only. When building a cross 
> > compiler, they will not be used to configure target libraries. 

Ah, now I've seen the other email: zlib is not actually used in GCC, and
GCC and binutils/GDB have already diverged in their handling of such
options -- unfortunately.


Grüße,
 Thomas

