Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)

2013-02-21 Thread Arnaud Charlet
> Examples of warnings (many of them occuring many up to several dozens of
> times):
> warning: @anchor should not appear in @heading
> warning: @title missing argument
> warning: @itemize has text but no @item
> warning: node `XXX' is next for `YYY' in menu but not in sectioning
> warning: node `XXX' is up for `YYY' in menu but not in sectioning
> warning: @itemize has text but no @item
> warning: node next `XXX' in menu `YYY' and in sectioning `ZZZ' differ
> warning: @tex should only appear at a line beginning
> warning: command @tie does not accept arguments
> 
> Ok for trunk?

The ChangeLog entries should be split in the various directories.
Changes in the ada directory are OK, thanks.

Arno


Re: [Patch, Fortran] PR 56385: [4.6/4.7/4.8 Regression] [OOP] ICE with allocatable function result in a procedure-pointer component

2013-02-21 Thread Tobias Burnus

Janus Weil wrote:

here is a straightforward patch which fixes a regression with
procedure-pointer components which have an allocatable result.
Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk/4.7/4.6?


OK thanks for the patch.

Tobias


2013-02-20  Janus Weil  

 PR fortran/56385
 * trans-array.c (structure_alloc_comps): Handle procedure-pointer
 components with allocatable result.

2013-02-20  Janus Weil  

 PR fortran/56385
 * gfortran.dg/proc_ptr_comp_37.f90: New.




Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support

2013-02-21 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
>  This issue was originally raised here:
>
> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html
>
>  We have a shortcoming in GCC in that we only allow the use half of the FP 
> MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register 
> mode (CP0.Status.FR == 1) on MIPS32r2 processors.  Furthermore we never 
> enable the other half (NMADD.fmt and NMSUB.fmt) on those processors.  
> However this whole instruction subset is always available on MIPS32r2 FPUs 
> regardless of the mode selected, just as it always has been on FPUs of the 
> 64-bit ISA line from MIPS IV up.

Hmm, this was discussed here:

http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00488.html
http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00492.html

The footnote for COP1X instructions on page 12 of volume 1 of the MIPS32 ISA
(v2.50) says:

1. In Release 1 of the Architecture, these instructions are legal only
   with a MIPS64 processor with 64-bit operations enabled (they are, in
   effect, actually MIPS64 instructions). In Release 2 of the Architecture,
   these instructions are legal with either a MIPS32 or MIPS64 processor
   _which includes a 64-bit floating point unit_.

(my emphasis).  "which" rather than "that" makes this a bit ambiguous,
but various comments in the rest of the manual imply that MIPS32r2 allows
an implementation choice between 32-bit and 64-bit FPUs.  E.g. page 8 says:

  Support for 64-bit coprocessors with 32-bit CPUs: These changes allow
  a 64-bit coprocessor (including an FPU) to be attached to a 32-bit
  CPU. This enhancement is optional in a Release 2 implementation.

and page 45 says:

  In addition to an Instruction Set Architecture, the MIPS architecture
  definition includes processing resources such as the set of
  coprocessor general registers. In Release 1 of the Architecture, the
  32-bit registers in MIPS32 were enlarged to 64-bits in MIPS64;
  however, these 64-bit FPU registers are not backwards
  compatible. Instead, processors implementing the MIPS64 Architecture
  provide a mode bit to select either the 32-bit or 64-bit register
  model. In Release 2 of the Architecture, a 32-bit CPU _may_ include a
  full 64-bit coprocessor, including a floating point unit which
  implements the same mode bit to select 32-bit or 64-bit FPU register
  model.

On page 322 of volume 2, the footnote for "Table A-20 MIPS64 COP1X
Encoding of Function Field" uses slightly different wording:

  COP1X instructions are legal only if 64-bit floating point operations
  are enabled.

So was this all a big misunderstanding on my part?  The TARGET_FLOAT64
condition came from MIPS themselves, and when challenged they seemed
pretty adamant that it was correct.  If I was wrong to be convinced
by the explanation, I hope you can at least see why I was convinced. :-)

If it wasn't a misunderstanding, then the point is that we can't tell
from ISA_MIPS32R2 alone whether the target has a 32-bit or 64-bit FPU,
but we know that it must have a 64-bit FPU if using TARGET_FLOAT64.

>  Also, according to MIPS IV ISA documentation these operations are only 
> fused (i.e. don't match original IEEE 754-1985 accuracy requirements) on 
> the original MIPS IV R8000 CPU, and MIPS architecture specs don't mention 
> any limitations of these instructions either, so I have updated the GCC 
> manual to document that on non-R8000 CPUs (which are ones we really care 
> about) they are numerically equivalent to computations made with 
> corresponding individual operations.

This part is OK, thanks, and is probably the only bit that's suitable for
4.8 at this stage.  Would you mind applying it separately?

>  Finally, while at it, I found it interesting that we have separate 
> conditions to cover MADD.fmt/MSUB.fmt (ISA_HAS_FP_MADD4_MSUB4) and 
> NMADD.fmt/NMADD.fmt (ISA_HAS_NMADD4_NMSUB4) while all the four 
> instructions need to be implemented as a whole group per data format 
> supported and cannot be separated (the MIPS architecture specification 
> explicitly forbids subsetting).  The difference between the two conditions 
> is the former expands to ISA_HAS_FP4, that is enables the subsubset for 
> any MIPS IV and up FPU while the latter has an extra "&& (!TARGET_MIPS5400 
> || TARGET_MAD)" qualifier.
>
>  I went ahead and checked available NEC VR54xx documentation and here's 
> what I came up with:
>
> 1. "VR5400 MIPS RISC Microprocessor Family" datasheet (NEC doc #13362) 
>says:
>
>"The VR5400 processor family complies with the MIPS IV instruction set 
>and IEEE-754 floating-point and IEEE-1149.1/1149.1a JTAG specification, 
>[...]"
>
> 2. "VR5432 MIPS RISC Microprocessor User's Manual, Volume 2" (NEC doc 
>#13751) lists all the individual MADD.fmt, MSUB.fmt, NMADD.fmt and
>NMSUB.fmt instructions in Chapter 18 "Floating-Point Unit Instruction 
>Set" with no restrictions as to their availability (the only other 
>member of the VR54xx family I know of is the VR5464

Re: [PATCH] Don't expand *MEM_REF using extract_bit_field or movmisalign for EXPAND_MEMORY (PR inline-asm/56405)

2013-02-21 Thread Richard Biener
On Thu, 21 Feb 2013, Jakub Jelinek wrote:

> Hi!
> 
> If an input operand of inline-asm doesn't allow registers, but allows
> memory, we expand it with EXPAND_MEMORY modifier (the only case of using
> that modifier).  But the movmisalign code added for 4.6 and especially
> the extract_bit_field code added for 4.8 results in getting a REG from
> expand_expr called with a *MEM_REF with EXPAND_MEMORY, instead of MEM the
> caller expects and can only handle, therefore we ICE.
> 
> This patch fixes it by honoring EXPAND_MEMORY in these cases, it is the sole
> responsibility of the inline asm writer to do the right thing in there for
> misaligned mems (e.g. gcc could pessimistically expect it might be
> misaligned, but user knows it will not).
> 
> Bootstrapped/regtested on {x86_64,i686,armv7hl,ppc,ppc64,s390,s390x}-linux,
> ok for trunk?

Ok.

Thanks,
Richard.

> 2013-02-21  Jakub Jelinek  
> 
>   PR inline-asm/56405
>   * expr.c (expand_expr_real_1) : Don't
>   use movmisalign or extract_bit_field for EXPAND_MEMORY modifier.
> 
>   * gcc.c-torture/compile/pr56405.c: New test.
> 
> --- gcc/expr.c.jj 2013-01-18 18:09:40.0 +0100
> +++ gcc/expr.c2013-02-20 10:29:34.513143634 +0100
> @@ -9551,6 +9551,7 @@ expand_expr_real_1 (tree exp, rtx target
>   set_mem_addr_space (temp, as);
>   align = get_object_alignment (exp);
>   if (modifier != EXPAND_WRITE
> + && modifier != EXPAND_MEMORY
>   && mode != BLKmode
>   && align < GET_MODE_ALIGNMENT (mode)
>   /* If the target does not have special handling for unaligned
> @@ -9639,6 +9640,7 @@ expand_expr_real_1 (tree exp, rtx target
>   if (TREE_THIS_VOLATILE (exp))
> MEM_VOLATILE_P (temp) = 1;
>   if (modifier != EXPAND_WRITE
> + && modifier != EXPAND_MEMORY
>   && mode != BLKmode
>   && align < GET_MODE_ALIGNMENT (mode))
> {
> --- gcc/testsuite/gcc.c-torture/compile/pr56405.c.jj  2013-02-20 
> 10:32:17.807250979 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr56405.c 2013-02-20 
> 10:32:46.963090873 +0100
> @@ -0,0 +1,7 @@
> +/* PR inline-asm/56405 */
> +
> +void
> +foo (void)
> +{
> +  asm volatile ("" : "+m" (*(volatile unsigned short *) 0x1001UL));
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend


Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)

2013-02-21 Thread Richard Biener
On Thu, 21 Feb 2013, Jakub Jelinek wrote:

> Hi!
> 
> Currently it is not possible to bootstrap gcc with texinfo 5.0.
> This patch attempts to fix the errors that prevent bootstrap, there are tons
> of warnings this doesn't address and would be good if somebody more TeXinfo
> knowledgeable looked at it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux both with texinfo 5.0
> and older texinfo installed.
> 
> The errors were:
> ../../gcc/doc/invoke.texi:5615: @itemx must follow @item
> ../../gcc/ada/gnat-style.texi:45: unknown command `hfill'
> doc/projects.texi:51: @pxref reference to nonexistent node `Scenarios
>   in Projects'
> doc/projects.texi:363: @pxref reference to nonexistent node `Organizing
>   Projects into Subsystems'
> doc/projects.texi:391: @xref reference to nonexistent node `Project
>   Extension'
> ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected 
> `smallexample', but saw `table'
> ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected 
> `smallexample', but saw `table'
> ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected 
> `smallexample', but saw `table'
> 
> Examples of warnings (many of them occuring many up to several dozens of 
> times):
> warning: @anchor should not appear in @heading
> warning: @title missing argument
> warning: @itemize has text but no @item
> warning: node `XXX' is next for `YYY' in menu but not in sectioning
> warning: node `XXX' is up for `YYY' in menu but not in sectioning
> warning: @itemize has text but no @item
> warning: node next `XXX' in menu `YYY' and in sectioning `ZZZ' differ
> warning: @tex should only appear at a line beginning
> warning: command @tie does not accept arguments
> 
> Ok for trunk?

Ok.  Should we fix the error on branches as well?

Thanks,
Richard.

> 2013-02-21  Jakub Jelinek  
> 
>   PR bootstrap/56258
>   * doc/invoke.texi (-fdump-rtl-pro_and_epilogue): Use @item
>   instead of @itemx.
> 
>   * gnat-style.texi (@title): Remove @hfill.
>   * projects.texi: Avoid line wrapping inside of @pxref or
>   @xref.
> 
>   * doc/cp-tools.texinfo (Virtual Machine Options): Use just
>   one @gccoptlist instead of 3 separate ones.
> 
> --- gcc/doc/invoke.texi.jj2013-01-31 22:57:22.0 +0100
> +++ gcc/doc/invoke.texi   2013-02-20 13:06:47.516405739 +0100
> @@ -5612,7 +5612,7 @@ Dump after the peephole pass.
>  @opindex fdump-rtl-postreload
>  Dump after post-reload optimizations.
>  
> -@itemx -fdump-rtl-pro_and_epilogue
> +@item -fdump-rtl-pro_and_epilogue
>  @opindex fdump-rtl-pro_and_epilogue
>  Dump after generating the function prologues and epilogues.
>  
> --- gcc/ada/gnat-style.texi.jj2012-08-10 12:57:33.0 +0200
> +++ gcc/ada/gnat-style.texi   2013-02-20 13:06:03.042667300 +0100
> @@ -42,7 +42,7 @@ Texts.  A copy of the license is include
>  @titlepage
>  @titlefont{GNAT Coding Style:}
>  @sp 1
> -@title @hfill A Guide for GNAT Developers
> +@title A Guide for GNAT Developers
>  @subtitle GNAT, The GNU Ada Compiler
>  @versionsubtitle
>  @author Ada Core Technologies, Inc.
> --- gcc/ada/projects.texi.jj  2013-01-04 11:16:24.0 +0100
> +++ gcc/ada/projects.texi 2013-02-20 17:48:41.582645159 +0100
> @@ -48,8 +48,7 @@ project files allow you to specify:
>  @item Source file naming conventions; you can specify these either globally 
> or for
>individual compilation units (@pxref{Naming Schemes}).
>  @item Change any of the above settings depending on external values, thus 
> enabling
> -  the reuse of the projects in various @b{scenarios} (@pxref{Scenarios
> -  in Projects}).
> +  the reuse of the projects in various @b{scenarios} (@pxref{Scenarios in 
> Projects}).
>  @item Automatically build libraries as part of the build process
>(@pxref{Library Projects}).
>  
> @@ -360,8 +359,8 @@ locating the specified source files in t
>  
>  @item For various reasons, it is sometimes useful to have a project with no
>sources (most of the time because the attributes defined in the project
> -  file will be reused in other projects, as explained in @pxref{Organizing
> -  Projects into Subsystems}. To do this, the attribute
> +  file will be reused in other projects, as explained in
> +  @pxref{Organizing Projects into Subsystems}. To do this, the attribute
>@emph{Source_Files} is set to the empty list, i.e. @code{()}. 
> Alternatively,
>@emph{Source_Dirs} can be set to the empty list, with the same
>result.
> @@ -388,8 +387,9 @@ locating the specified source files in t
>This can be done thanks to the attribute @b{Excluded_Source_Files}
>(or its synonym @b{Locally_Removed_Files}).
>Its value is the list of file names that should not be taken into account.
> -  This attribute is often used when extending a project, @xref{Project
> -  Extension}. A similar attribute @b{Excluded_Source_List_File} plays the 
> same
> +  This attribute is often used when extending a p

[Patch, Fortran, committed] PR56416 - Fix texinfo 5.0 warnings

2013-02-21 Thread Tobias Burnus
Motivated by Jakub's email at 
http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00966.html


The problem was that the order in @menu didn't match the order of the 
@node (@section/@chapter).


I have committed the attached patch as obvious (Rev. 196194). 
fortran/*.texi is now warning free for "make info" and "make html". 
("make pdf" still shows the usual TeX warnings about overfull boxes and 
similar.)


Tobias
2012-02-21  Tobias Burnus  

	PR fortran/56416
	* gfortran.texi (Part II: Language Reference, Extensions,
	Non-Fortran Main Program): Sort @menu to match actual section order.
	* intrinsic.texi (Intrinsic Procedures): Ditto.
	(C_F_POINTER, PRECISION): Move to the alphabetically correct place.

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 2dccb16..462b443 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -182,8 +182,8 @@ Part I: Invoking GNU Fortran
 Part II: Language Reference
 * Fortran 2003 and 2008 status::  Fortran 2003 and 2008 features supported by GNU Fortran.
 * Compiler Characteristics::  User-visible implementation details.
+* Extensions::Language extensions implemented by GNU Fortran.
 * Mixed-Language Programming::Interoperability with C
-* Extensions::   Language extensions implemented by GNU Fortran.
 * Intrinsic Procedures:: Intrinsic procedures supported by GNU Fortran.
 * Intrinsic Modules::Intrinsic modules supported by GNU Fortran.
 
@@ -1348,8 +1348,8 @@ without warning.
 * Commas in FORMAT specifications::
 * Missing period in FORMAT specifications::
 * I/O item lists::
-* BOZ literal constants::
 * @code{Q} exponent-letter::
+* BOZ literal constants::
 * Real array indices::
 * Unary operators::
 * Implicitly convert LOGICAL and INTEGER values::
@@ -2698,8 +2698,8 @@ the same declaration part as the variable or procedure pointer.
 * _gfortran_set_options:: Set library option flags
 * _gfortran_set_convert:: Set endian conversion
 * _gfortran_set_record_marker:: Set length of record markers
-* _gfortran_set_max_subrecord_length:: Set subrecord length
 * _gfortran_set_fpe:: Set when a Floating Point Exception should be raised
+* _gfortran_set_max_subrecord_length:: Set subrecord length
 @end menu
 
 Even if you are doing mixed-language programming, it is very
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 91f2fea..4a48425 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -87,9 +87,9 @@ Some basic guidelines for editing this document:
 * @code{CHMOD}: CHMOD, Change access permissions of files
 * @code{CMPLX}: CMPLX, Complex conversion function
 * @code{COMMAND_ARGUMENT_COUNT}: COMMAND_ARGUMENT_COUNT, Get number of command line arguments
-* @code{COMPLEX}:   COMPLEX,   Complex conversion function
-* @code{COMPILER_VERSION}: COMPILER_VERSION, Compiler version string
 * @code{COMPILER_OPTIONS}: COMPILER_OPTIONS, Options passed to the compiler
+* @code{COMPILER_VERSION}: COMPILER_VERSION, Compiler version string
+* @code{COMPLEX}:   COMPLEX,   Complex conversion function
 * @code{CONJG}: CONJG, Complex conjugate function
 * @code{COS}:   COS,   Cosine function
 * @code{COSH}:  COSH,  Hyperbolic cosine function
@@ -234,12 +234,12 @@ Some basic guidelines for editing this document:
 * @code{PRESENT}:   PRESENT,   Determine whether an optional dummy argument is specified
 * @code{PRODUCT}:   PRODUCT,   Product of array elements
 * @code{RADIX}: RADIX, Base of a data model
+* @code{RAN}:   RAN,   Real pseudo-random number
+* @code{RAND}:  RAND,  Real pseudo-random number
 * @code{RANDOM_NUMBER}: RANDOM_NUMBER, Pseudo-random number
 * @code{RANDOM_SEED}:   RANDOM_SEED, Initialize a pseudo-random number sequence
-* @code{RAND}:  RAND,  Real pseudo-random number
 * @code{RANGE}: RANGE, Decimal exponent range
 * @code{RANK} : RANK,  Rank of a data object
-* @code{RAN}:   RAN,   Real pseudo-random number
 * @code{REAL}:  REAL,  Convert to real type 
 * @code{RENAME}:RENAME,Rename a file
 * @code{REPEAT}:REPEAT,Repeated string concatenation
@@ -2271,60 +2271,57 @@ end subroutine association_test
 @end table
 
 
-@node C_FUNLOC
-@section @code{C_FUNLOC} --- Obtain the C address of a procedure
-@fnindex C_FUNLOC
-@cindex pointer, C address of procedures
+@node C_F_POINTER
+@section @code{C_F_POINTER} --- Convert C into Fortran pointer
+@fnindex C_F_POINTER
+@cindex pointer, convert C to Fortran
 
 @table @asis
 @item @emph{Description}:
-@code{C_FUNLOC(x)} determines the C address of the argument.
+@code{C_F_POINTER(CPTR, FPTR[, SHAPE])} assigns the target of the C pointer
+@var{CPTR} to the Fortran pointer @var{FPTR} and specifies its shape.
 
 @item @emph{Standard}:
 Fortran 2003 and later
 
 @item @emph{Class}:
-Inquiry function
+Subroutine
 
 @item @emph{Synt

Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 10:07:56AM +0100, Richard Biener wrote:
> > Ok for trunk?
> 
> Ok.  Should we fix the error on branches as well?

The patch applied cleanly to both 4.7 and 4.6, so I've committed it there,
but as I don't have texinfo 5.0 installed on my devel WS (we have it just
for Fedora 19 and in the buildsystem for that right now), I haven't verified
those are the only errors with texinfo 5.0 on the release branches.

Jakub


Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 1:10 AM, Steven Bosscher  wrote:
> Hello,
>
> The attached patch splits a new function df_insn_info_delete from
> df_insn_delete. The original motivation was to get rid of the silly
> "deleting insn with uid = ..." messages when re-scanning an insn,
> because the mentioned insn isn't deleted at all (it's just rescanned).
> But it turns out that there is also a modest but measurable speed-up
> (especially at -O0), probably because of avoiding the overhead of
> df_grow_bb_info and df_grow_reg_info in common usage of
> df_insn_delete.
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu and on
> x86_64-unknown-linux-gnu. OK for trunk?

+/* Delete all of the refs information from the insn with UID.
+   Internal helper for df_insn_info_delete, df_insn_rescan, and other

df_insn_delete I suppose

+   df-scan routines that don't have to work in deferred mode and don't
+   have to mark basic blocks for re-processing.  */

-void
-df_insn_delete (basic_block bb, unsigned int uid)
+static void
+df_insn_info_delete (unsigned int uid)
 {

+/* Delete all of the refs information from INSN, either right now
+   or marked for later in deferred mode.
+
+   FIXME: BB must be passed in, to mark the basic block containing
+ the insn as dirty, but emit-rtl.c doesn't do so.  */

you mean the remove_insn call?  So it should call df_insn_info_delete instead?
It _does_ mark the block as dirtly later, if it is not a debug insn.

Your change changes behavior for df->changeable_flags &
DF_DEFER_INSN_RESCAN btw, so it needs someone DF aware to review
and that makes it stage1 material as well I think.

It's also odd how insn_info == NULL is handled here.

Richard.

> Ciao!
> Steven


[PATCH] Add testcase for PR56398

2013-02-21 Thread Marek Polacek
This adds a reduced testcase for PR56398.  Ok for trunk?

2013-02-21  Marek Polacek  

PR tree-optimization/56398
* g++.dg/torture/pr56398.C: New test. 

--- gcc/g++.dg/torture/pr56398.C.mp 2013-02-21 10:58:14.388913070 +0100
+++ gcc/g++.dg/torture/pr56398.C2013-02-21 10:58:53.301026474 +0100
@@ -0,0 +1,22 @@
+// { dg-do compile }
+// { dg-options "-g" }
+
+namespace
+{
+#0 "/usr/include/c/4.8/bits/postypes.h" 3
+}
+
+vtkpow (int b)
+{
+  int a1;
+  int b1;
+  int c;
+  while (b1)
+{
+  while (b)
+b1 = 0;
+  b1 = b1 - 1;
+  c = c * a1;
+}
+  return c;
+}

Marek


Re: [PATCH] Add testcase for PR56398

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 11:05 AM, Marek Polacek  wrote:
> This adds a reduced testcase for PR56398.  Ok for trunk?

Ok.

Thanks,
Richard.

> 2013-02-21  Marek Polacek  
>
> PR tree-optimization/56398
> * g++.dg/torture/pr56398.C: New test.
>
> --- gcc/g++.dg/torture/pr56398.C.mp 2013-02-21 10:58:14.388913070 +0100
> +++ gcc/g++.dg/torture/pr56398.C2013-02-21 10:58:53.301026474 +0100
> @@ -0,0 +1,22 @@
> +// { dg-do compile }
> +// { dg-options "-g" }
> +
> +namespace
> +{
> +#0 "/usr/include/c/4.8/bits/postypes.h" 3
> +}
> +
> +vtkpow (int b)
> +{
> +  int a1;
> +  int b1;
> +  int c;
> +  while (b1)
> +{
> +  while (b)
> +b1 = 0;
> +  b1 = b1 - 1;
> +  c = c * a1;
> +}
> +  return c;
> +}
>
> Marek


Re: [PATCH][RFC] Less TODO_remove_unused_locals

2013-02-21 Thread Jan Hubicka



Hunting for the "we're getting slower" bits I noticed that
TODO_remove_unused_locals is a big part of execute_function_todo
(and accounts for 1% of compile-time of ac.f90).
The following patch removes most of the remove_unused_locals
calls based on the fact that with anonymous SSA names now available
we should never create new locals (wishful thinking of course ...)


We still can kill references to existing ones.


and the important places to remove unused stuff are driven by
1) avoid creating yet another copy of the unused stuff, thus do
it before inlining, on the callee;  2) avoid pinning unused memory
while we operate on other function bodies, thus, do it at the end
of non-IPA pass pipelines

In the end this asks for more explicit placement and thus a
real pass ... but the following should be enough as a RFC and
be good enough for 4.8.


Yes, it looks good to me.  I introduced this remove_unused_locals stuff
and I was aware it is executed quite too often. I meant to get some  
data on this

but never got around that and moreover the number of execution has increased
over the years as we increased number of passes...

My recollection is that at that time the pass also did dropping of  
some flags (ADDRESSABLE?) promoting into Gimple registers.  This was  
motivation to keep it working often since it was code quality thing  
too. I suppose this is now done elsewhere.


Honza


Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed

2013-02-21 Thread Steven Bosscher
On Thu, Feb 21, 2013 at 10:59 AM, Richard Biener wrote:
> On Thu, Feb 21, 2013 at 1:10 AM, Steven Bosscher wrote:
> +/* Delete all of the refs information from the insn with UID.
> +   Internal helper for df_insn_info_delete, df_insn_rescan, and other
>
> df_insn_delete I suppose

Right.


> +   df-scan routines that don't have to work in deferred mode and don't
> +   have to mark basic blocks for re-processing.  */
>
> -void
> -df_insn_delete (basic_block bb, unsigned int uid)
> +static void
> +df_insn_info_delete (unsigned int uid)
>  {
>
> +/* Delete all of the refs information from INSN, either right now
> +   or marked for later in deferred mode.
> +
> +   FIXME: BB must be passed in, to mark the basic block containing
> + the insn as dirty, but emit-rtl.c doesn't do so.  */
>
> you mean the remove_insn call?  So it should call df_insn_info_delete instead?

Yes, remove_insn calls df_insn_delete with argument bb=NULL. This goes
against the interface documented in df-scan. It should call
df_insn_delete with a non-NULL bb argument.

df_insn_info_delete should remain internal to df-scan.c. (Ideally,
"struct df_insn_info" would be internal to DF, IMHO.)


> It _does_ mark the block as dirtly later, if it is not a debug insn.

Yes, it does. But it should have been left up to df_insn_delete to
make the block dirty. It looks like remove_insn only passes NULL
because it calls df_insn_delete on non-insns and it doesn't pass
BLOCK_FOR_INSN because it feeds BARRIERs to df_insn_delete -- totally
bogus of course...

For GCC 4.9 I want to make df_insn_delete take care of the basic block
argument itself, and remove all df_set_bb_dirty calls from emit-rtl.c.


> Your change changes behavior for df->changeable_flags &
> DF_DEFER_INSN_RESCAN btw,

Not really. All paths to df_insn_info_delete are non-deferred.


> so it needs someone DF aware to review
> and that makes it stage1 material as well I think.

Also good. I think the patch is quite safe but it's obviously not a
bug fix (except perhaps for making the dumps less confusing). Let's
consider this queued for GCC 4.9.
FWIW, I like to think of myself as quite DF aware ;-)


> It's also odd how insn_info == NULL is handled here.

Yes, for GCC 4.9 I want to put a gcc_assert in df_insn_info_delete to
make sure insn_info is always non-NULL. It can be NULL now for NOTEs
and BARRIERs (via the remove_insn path).

Ciao!
Steven


Re: [PATCH][RFC] Less TODO_remove_unused_locals

2013-02-21 Thread Richard Biener
On Thu, 21 Feb 2013, Jan Hubicka wrote:

> 
> > > Hunting for the "we're getting slower" bits I noticed that
> > > TODO_remove_unused_locals is a big part of execute_function_todo
> > > (and accounts for 1% of compile-time of ac.f90).
> > > The following patch removes most of the remove_unused_locals
> > > calls based on the fact that with anonymous SSA names now available
> > > we should never create new locals (wishful thinking of course ...)
> 
> We still can kill references to existing ones.
> 
> > > and the important places to remove unused stuff are driven by
> > > 1) avoid creating yet another copy of the unused stuff, thus do
> > > it before inlining, on the callee;  2) avoid pinning unused memory
> > > while we operate on other function bodies, thus, do it at the end
> > > of non-IPA pass pipelines
> > > 
> > > In the end this asks for more explicit placement and thus a
> > > real pass ... but the following should be enough as a RFC and
> > > be good enough for 4.8.
> 
> Yes, it looks good to me.  I introduced this remove_unused_locals stuff
> and I was aware it is executed quite too often. I meant to get some data on
> this
> but never got around that and moreover the number of execution has increased
> over the years as we increased number of passes...
> 
> My recollection is that at that time the pass also did dropping of some flags
> (ADDRESSABLE?) promoting into Gimple registers.  This was motivation to keep
> it working often since it was code quality thing too. I suppose this is now
> done elsewhere.

Yes, that's done with TODO_update_address_taken which is done in a more
controlled fashion.

Richard.


Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 02:16:19PM +0400, Konstantin Serebryany wrote:
> The attached patch is the libsanitizer merge from upstream r175042.
> 
> A few changes. Among other things:
> - a second attempt to change the shadow offset to 0x7fff8000 (~5%
> speedup).  x86_64-linux-only

I've been applying that i386.c change + asan_mapping.h and asan_rtl.cc
changes in our tree for a few days, and it looks good on all platforms
where asan is supported (i686, x86_64, ppc, ppc64).

> - asan_init is now called from preinit_array (linux only)
> 
> Patch for libsanitizer is automatically generated by libsanitizer/merge.sh
> Tested with
> rm -rf */{*/,}libsanitizer \
>   && make -j 50 \
>   && make -C gcc check-g{cc,++}
> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp'
> 
> Our internal LLVM bots (Linux, Mac and Android) are green.
> 
> Ok to commit?

Yes, thanks.

Jakub


Re: Fix for 56175

2013-02-21 Thread Richard Biener
On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> First of all, your proposal to move type sinking to the end of
> function does not work since we handle each statement in function and
> we want that 1st type folding of X & C will not happen.
> Note that we have the following sequence of gimple before forwprop1:
>
>x.0_10 = (signed char) x_8;
>   _11 = x.0_10 & 1;
>   _12 = (signed char) y_9;
>   _13 = _12 & 1;
>   _14 = _11 ^ _13;

Ah, indeed.  Reminds me of some of my dead patches that separated
forwprop into a forward and backward stage.  Of course then you have
the ordering issue of whether to first forward or backward.

Which means that I bet you can construct a testcase that with
your change is no longer optimized (just make pushing the conversion
make the types _match_).  Which is always the case
with this kind of local pattern-matching transforms.

Currently forwprop processes leafs of expression trees first (well, inside
a basic-block), similar to how fold () is supposed to be operated, based
on the idea that simplified / canonicalized leafs helps keeping pattern
recognition simple and cost considerations more accurate.

When one order works better than another you always have to consider
that the user could already have written the code in a way that results
in the input that isn't well handled.

Not that this helps very much for the situation ;)

But I don't like the use of first_pass_instance ... and the fix isn't
an improvement but just a hack for the benchmark.

Richard.

> I also added comment to my fix and create new test for it. I also
> checked that this test is passed with patched compiler  only. So
> Change Log was also modified:
>
> ChangeLog
>
> 2013-02-20  Yuri Rumyantsev  
>
> PR tree-optimization/56175
> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
> for short integer types.
> * gcc.dg/pr56175.c: New test.
>
>
>
>
> 2013/2/20 Richard Biener :
>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>> pattern in simpify_bitwise_binary for short integer types.
>>> The fix is very simple - we simply turn off short type sinking at the
>>> first pass of forward propagation allows to get
>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>>> +5-7% for other x86 platforms too.
>>> Bootstrapping and regression testing were successful on x86-64.
>>>
>>> Is it Ok for trunk?
>>
>> It definitely needs a comment before the checks.
>>
>> Also I think it simply shows that the code is placed at the wrong spot.
>> Simply moving it down in simplify_bitwise_binary to be done the very last
>> should get both of the effects done.
>>
>> Can you rework the patch according to that?
>>
>> You also miss a testcase, we should make sure to not regress again here.
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog.
>>>
>>> 2013-02-20  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/56175
>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type sinking
>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>> for short integer types.


Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed

2013-02-21 Thread Paolo Bonzini

> > so it needs someone DF aware to review
> > and that makes it stage1 material as well I think.
> 
> Also good. I think the patch is quite safe but it's obviously not a
> bug fix (except perhaps for making the dumps less confusing). Let's
> consider this queued for GCC 4.9.
> FWIW, I like to think of myself as quite DF aware ;-)

You're definitely DF aware and if the steering committee
wants to appoint you as dataflow reviewer, I would have no
reason to complain...

Still DF only has reviewers, not maintainers, so it needs
someone else DF aware anyway.  If it is 4.9 material perhaps
it is simpler to place it after the patches that kill the basic
block argument.

Paolo


Re: [Patch, Fortran] PR 56385: [4.6/4.7/4.8 Regression] [OOP] ICE with allocatable function result in a procedure-pointer component

2013-02-21 Thread Janus Weil
>> here is a straightforward patch which fixes a regression with
>> procedure-pointer components which have an allocatable result.
>> Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk/4.7/4.6?
>
>
> OK thanks for the patch.

Thanks, Tobias! Committed to trunk as r196202. Will apply to 4.7 and
4.6 in a few days.

Cheers,
Janus



>> 2013-02-20  Janus Weil  
>>
>>  PR fortran/56385
>>  * trans-array.c (structure_alloc_comps): Handle procedure-pointer
>>  components with allocatable result.
>>
>> 2013-02-20  Janus Weil  
>>
>>  PR fortran/56385
>>  * gfortran.dg/proc_ptr_comp_37.f90: New.
>
>


Re: Fix for 56175

2013-02-21 Thread Yuri Rumyantsev
Richard,

As we know Kai is working on this problem for 4.9 and I assume that
type sinking will be deleted from forwprop pass. Could we stay on this
fix but more common fix will be done.
I also can propose to introduce new hook for it but need to know your
opinion since we don't went to waste our time on preparing dead
patches. Note that x86 supports all short types in HW and such type
sinkning is usually useless if short types are involved.

Best regards.
Yuri.


2013/2/21 Richard Biener :
> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> First of all, your proposal to move type sinking to the end of
>> function does not work since we handle each statement in function and
>> we want that 1st type folding of X & C will not happen.
>> Note that we have the following sequence of gimple before forwprop1:
>>
>>x.0_10 = (signed char) x_8;
>>   _11 = x.0_10 & 1;
>>   _12 = (signed char) y_9;
>>   _13 = _12 & 1;
>>   _14 = _11 ^ _13;
>
> Ah, indeed.  Reminds me of some of my dead patches that separated
> forwprop into a forward and backward stage.  Of course then you have
> the ordering issue of whether to first forward or backward.
>
> Which means that I bet you can construct a testcase that with
> your change is no longer optimized (just make pushing the conversion
> make the types _match_).  Which is always the case
> with this kind of local pattern-matching transforms.
>
> Currently forwprop processes leafs of expression trees first (well, inside
> a basic-block), similar to how fold () is supposed to be operated, based
> on the idea that simplified / canonicalized leafs helps keeping pattern
> recognition simple and cost considerations more accurate.
>
> When one order works better than another you always have to consider
> that the user could already have written the code in a way that results
> in the input that isn't well handled.
>
> Not that this helps very much for the situation ;)
>
> But I don't like the use of first_pass_instance ... and the fix isn't
> an improvement but just a hack for the benchmark.
>
> Richard.
>
>> I also added comment to my fix and create new test for it. I also
>> checked that this test is passed with patched compiler  only. So
>> Change Log was also modified:
>>
>> ChangeLog
>>
>> 2013-02-20  Yuri Rumyantsev  
>>
>> PR tree-optimization/56175
>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>> for short integer types.
>> * gcc.dg/pr56175.c: New test.
>>
>>
>>
>>
>> 2013/2/20 Richard Biener :
>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev  wrote:
 Hi All,

 This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
 pattern in simpify_bitwise_binary for short integer types.
 The fix is very simple - we simply turn off short type sinking at the
 first pass of forward propagation allows to get
 +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
 +5-7% for other x86 platforms too.
 Bootstrapping and regression testing were successful on x86-64.

 Is it Ok for trunk?
>>>
>>> It definitely needs a comment before the checks.
>>>
>>> Also I think it simply shows that the code is placed at the wrong spot.
>>> Simply moving it down in simplify_bitwise_binary to be done the very last
>>> should get both of the effects done.
>>>
>>> Can you rework the patch according to that?
>>>
>>> You also miss a testcase, we should make sure to not regress again here.
>>>
>>> Thanks,
>>> Richard.
>>>
 ChangeLog.

 2013-02-20  Yuri Rumyantsev  

 PR tree-optimization/56175
 * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type 
 sinking
 at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
 for short integer types.


Re: Fix for 56175

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> As we know Kai is working on this problem for 4.9 and I assume that
> type sinking will be deleted from forwprop pass. Could we stay on this
> fix but more common fix will be done.

Well, unless you show it is a regression the patch is not applicable for 4.8
anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 either,
it is after all a canonicalization - fold seems to perform the opposite one
though:

  /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
 constants (if x has signed type, the sign bit cannot be set
 in c).  This folds extension into the BIT_AND_EXPR.

note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
not sinking.  Generally frontends and fold try to narrow operands when
possible (even though some targets later widen them again because of
instruction set constraints).

Most of this hoisting code was done to make lowering logical && and ||
I believe.  Looking at the testcases added tells us that while matching
the two patterns as done now helps them but only because that pattern
feeds single-operand instructions that then simplify.  So doing the transform
starting from that single-operand instructions instead looks like a better
fix (op !=/== 0/1 and (T) op) and also would not disagree with the
canonicalization done by fold.

Richard.

> I also can propose to introduce new hook for it but need to know your
> opinion since we don't went to waste our time on preparing dead
> patches. Note that x86 supports all short types in HW and such type
> sinkning is usually useless if short types are involved.
>
> Best regards.
> Yuri.
>
>
> 2013/2/21 Richard Biener :
>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> First of all, your proposal to move type sinking to the end of
>>> function does not work since we handle each statement in function and
>>> we want that 1st type folding of X & C will not happen.
>>> Note that we have the following sequence of gimple before forwprop1:
>>>
>>>x.0_10 = (signed char) x_8;
>>>   _11 = x.0_10 & 1;
>>>   _12 = (signed char) y_9;
>>>   _13 = _12 & 1;
>>>   _14 = _11 ^ _13;
>>
>> Ah, indeed.  Reminds me of some of my dead patches that separated
>> forwprop into a forward and backward stage.  Of course then you have
>> the ordering issue of whether to first forward or backward.
>>
>> Which means that I bet you can construct a testcase that with
>> your change is no longer optimized (just make pushing the conversion
>> make the types _match_).  Which is always the case
>> with this kind of local pattern-matching transforms.
>>
>> Currently forwprop processes leafs of expression trees first (well, inside
>> a basic-block), similar to how fold () is supposed to be operated, based
>> on the idea that simplified / canonicalized leafs helps keeping pattern
>> recognition simple and cost considerations more accurate.
>>
>> When one order works better than another you always have to consider
>> that the user could already have written the code in a way that results
>> in the input that isn't well handled.
>>
>> Not that this helps very much for the situation ;)
>>
>> But I don't like the use of first_pass_instance ... and the fix isn't
>> an improvement but just a hack for the benchmark.
>>
>> Richard.
>>
>>> I also added comment to my fix and create new test for it. I also
>>> checked that this test is passed with patched compiler  only. So
>>> Change Log was also modified:
>>>
>>> ChangeLog
>>>
>>> 2013-02-20  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/56175
>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>> for short integer types.
>>> * gcc.dg/pr56175.c: New test.
>>>
>>>
>>>
>>>
>>> 2013/2/20 Richard Biener :
 On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev  
 wrote:
> Hi All,
>
> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
> pattern in simpify_bitwise_binary for short integer types.
> The fix is very simple - we simply turn off short type sinking at the
> first pass of forward propagation allows to get
> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
> +5-7% for other x86 platforms too.
> Bootstrapping and regression testing were successful on x86-64.
>
> Is it Ok for trunk?

 It definitely needs a comment before the checks.

 Also I think it simply shows that the code is placed at the wrong spot.
 Simply moving it down in simplify_bitwise_binary to be done the very last
 should get both of the effects done.

 Can you rework the patch according to that?

 You also miss a testcase, we should make sure to not regress again here.

 Thanks,
 Richard.

> ChangeLog.
>
> 2013-02-20  Yuri Rumyantsev  
>>

Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote:
> This commit breaks the build if the BFD linker is used (I have gold on
> my box, so I missed it).
> 
> Short repro:
> % cat preinit.cc
> void foo() {}
> __attribute__((section(".preinit_array")))  void (*xxx)(void) = foo;
> % g++ preinit.cc -shared # gold
> % sudo apt-get remove  binutils-gold
> ...
> % g++ preinit.cc -shared # bfd
> /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO
> /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable
> section on output
> collect2: ld returned 1 exit status
> %
> 
> Can we stop building the asan-rt as DSO and leave only the static
> variant (as in clang)?

No, IMNSHO it is desirable to support also that.

Here is a different fix, so libasan.so will not have .preinit_array, but
libasan.a will have it.  Ideally, that hunk should go into a separate
source file (asan_preinit.cc ?), be just compiled into an object file,
rather than shared library and the compiler driver should include it
explicitly in the link.

The used attribute is there because, as it isn't (or shouldn't) be exported
out of the library, if libasan was built with LTO, it could very well be
optimized away.

2013-02-21  Jakub Jelinek  

* asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is
defined.  Add used attribute.

--- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100
+++ libsanitizer/asan/asan_rtl.cc   2013-02-21 14:16:28.985547506 +0100
@@ -520,11 +520,11 @@ void __asan_init() {
   }
 }
 
-#if ASAN_USE_PREINIT_ARRAY
+#if ASAN_USE_PREINIT_ARRAY && !defined (PIC)
   // On Linux, we force __asan_init to be called before anyone else
   // by placing it into .preinit_array section.
   // FIXME: do we have anything like this on Mac?
-  __attribute__((section(".preinit_array")))
+  __attribute__((section(".preinit_array"), used))
   void (*__asan_preinit)(void) =__asan_init;
 #elif defined(_WIN32) && defined(_DLL)
   // On Windows, when using dynamic CRT (/MD), we can put a pointer


Jakub


Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 02:21:36PM +0100, Jakub Jelinek wrote:
> Here is a different fix, so libasan.so will not have .preinit_array, but
> libasan.a will have it.  Ideally, that hunk should go into a separate
> source file (asan_preinit.cc ?), be just compiled into an object file,
> rather than shared library and the compiler driver should include it
> explicitly in the link.

BTW, if you move it into asan_preinit.cc (or whatever other name) separate
file, as the Makefiles aren't shared, it is fine if for clang you choose to
put it into clang libasan.a, and gcc can do something different.

Jakub


Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...

2013-02-21 Thread Konstantin Serebryany
On Thu, Feb 21, 2013 at 5:21 PM, Jakub Jelinek  wrote:
> On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote:
>> This commit breaks the build if the BFD linker is used (I have gold on
>> my box, so I missed it).
>>
>> Short repro:
>> % cat preinit.cc
>> void foo() {}
>> __attribute__((section(".preinit_array")))  void (*xxx)(void) = foo;
>> % g++ preinit.cc -shared # gold
>> % sudo apt-get remove  binutils-gold
>> ...
>> % g++ preinit.cc -shared # bfd
>> /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO
>> /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable
>> section on output
>> collect2: ld returned 1 exit status
>> %
>>
>> Can we stop building the asan-rt as DSO and leave only the static
>> variant (as in clang)?
>
> No, IMNSHO it is desirable to support also that.

It may cause I more trouble (we've seen a couple of bugs already) then do good.
Anyway, we can get rid of this later.

>
> Here is a different fix, so libasan.so will not have .preinit_array, but
> libasan.a will have it.  Ideally, that hunk should go into a separate
> source file (asan_preinit.cc ?), be just compiled into an object file,
> rather than shared library and the compiler driver should include it
> explicitly in the link.
>
> The used attribute is there because, as it isn't (or shouldn't) be exported
> out of the library, if libasan was built with LTO, it could very well be
> optimized away.
>
> 2013-02-21  Jakub Jelinek  
>
> * asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is
> defined.  Add used attribute.
>
> --- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100
> +++ libsanitizer/asan/asan_rtl.cc   2013-02-21 14:16:28.985547506 +0100
> @@ -520,11 +520,11 @@ void __asan_init() {
>}
>  }
>
> -#if ASAN_USE_PREINIT_ARRAY
> +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC)
>// On Linux, we force __asan_init to be called before anyone else
>// by placing it into .preinit_array section.
>// FIXME: do we have anything like this on Mac?
> -  __attribute__((section(".preinit_array")))
> +  __attribute__((section(".preinit_array"), used))
>void (*__asan_preinit)(void) =__asan_init;
>  #elif defined(_WIN32) && defined(_DLL)
>// On Windows, when using dynamic CRT (/MD), we can put a pointer

Thanks!
May I ask you to commit this to gcc (I have to run away now)?
I'll put this into upstream (maybe with asan_preinit.cc as you
suggest) tomorrow.

--kcc

>
>
> Jakub


Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 05:26:30PM +0400, Konstantin Serebryany wrote:
> May I ask you to commit this to gcc (I have to run away now)?

I'll get it through bootstrap/regtest cycle first, commit if that succeeds.

Jakub


Re: Fix for 56175

2013-02-21 Thread Yuri Rumyantsev
Richard,

This regression was introduced by Kai

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html

2011-06-27  Kai Tietz  

* tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
type sinking.
* tree-ssa-math-opts.c (execute_optimize_bswap): Separate
search for di/si mode patterns for finding widest match.

Is it sufficient for you to accept my patch?

Best regards.
yuri.

2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> As we know Kai is working on this problem for 4.9 and I assume that
>> type sinking will be deleted from forwprop pass. Could we stay on this
>> fix but more common fix will be done.
>
> Well, unless you show it is a regression the patch is not applicable for 4.8
> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
> either,
> it is after all a canonicalization - fold seems to perform the opposite one
> though:
>
>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>  constants (if x has signed type, the sign bit cannot be set
>  in c).  This folds extension into the BIT_AND_EXPR.
>
> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
> not sinking.  Generally frontends and fold try to narrow operands when
> possible (even though some targets later widen them again because of
> instruction set constraints).
>
> Most of this hoisting code was done to make lowering logical && and ||
> I believe.  Looking at the testcases added tells us that while matching
> the two patterns as done now helps them but only because that pattern
> feeds single-operand instructions that then simplify.  So doing the transform
> starting from that single-operand instructions instead looks like a better
> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
> canonicalization done by fold.
>
> Richard.
>
>> I also can propose to introduce new hook for it but need to know your
>> opinion since we don't went to waste our time on preparing dead
>> patches. Note that x86 supports all short types in HW and such type
>> sinkning is usually useless if short types are involved.
>>
>> Best regards.
>> Yuri.
>>
>>
>> 2013/2/21 Richard Biener :
>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  wrote:
 Richard,

 First of all, your proposal to move type sinking to the end of
 function does not work since we handle each statement in function and
 we want that 1st type folding of X & C will not happen.
 Note that we have the following sequence of gimple before forwprop1:

x.0_10 = (signed char) x_8;
   _11 = x.0_10 & 1;
   _12 = (signed char) y_9;
   _13 = _12 & 1;
   _14 = _11 ^ _13;
>>>
>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>> forwprop into a forward and backward stage.  Of course then you have
>>> the ordering issue of whether to first forward or backward.
>>>
>>> Which means that I bet you can construct a testcase that with
>>> your change is no longer optimized (just make pushing the conversion
>>> make the types _match_).  Which is always the case
>>> with this kind of local pattern-matching transforms.
>>>
>>> Currently forwprop processes leafs of expression trees first (well, inside
>>> a basic-block), similar to how fold () is supposed to be operated, based
>>> on the idea that simplified / canonicalized leafs helps keeping pattern
>>> recognition simple and cost considerations more accurate.
>>>
>>> When one order works better than another you always have to consider
>>> that the user could already have written the code in a way that results
>>> in the input that isn't well handled.
>>>
>>> Not that this helps very much for the situation ;)
>>>
>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>> an improvement but just a hack for the benchmark.
>>>
>>> Richard.
>>>
 I also added comment to my fix and create new test for it. I also
 checked that this test is passed with patched compiler  only. So
 Change Log was also modified:

 ChangeLog

 2013-02-20  Yuri Rumyantsev  

 PR tree-optimization/56175
 * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
 at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
 for short integer types.
 * gcc.dg/pr56175.c: New test.




 2013/2/20 Richard Biener :
> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev  
> wrote:
>> Hi All,
>>
>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>> pattern in simpify_bitwise_binary for short integer types.
>> The fix is very simple - we simply turn off short type sinking at the
>> first pass of forward propagation allows to get
>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>> +5-7% for other x86 platforms too.
>> Bootstrapping and

Re: Fix for 56175

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> This regression was introduced by Kai
>
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>
> 2011-06-27  Kai Tietz  
>
> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
> type sinking.
> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
> search for di/si mode patterns for finding widest match.
>
> Is it sufficient for you to accept my patch?

No, I still don't think the fix is sound.  A proper fix would revert the
above change (the simplify_bitwise_binary one), watch for testsuite
fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
and fix those failures in a better way.

Richard.

> Best regards.
> yuri.
>
> 2013/2/21 Richard Biener :
>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> As we know Kai is working on this problem for 4.9 and I assume that
>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>> fix but more common fix will be done.
>>
>> Well, unless you show it is a regression the patch is not applicable for 4.8
>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
>> either,
>> it is after all a canonicalization - fold seems to perform the opposite one
>> though:
>>
>>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>  constants (if x has signed type, the sign bit cannot be set
>>  in c).  This folds extension into the BIT_AND_EXPR.
>>
>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
>> not sinking.  Generally frontends and fold try to narrow operands when
>> possible (even though some targets later widen them again because of
>> instruction set constraints).
>>
>> Most of this hoisting code was done to make lowering logical && and ||
>> I believe.  Looking at the testcases added tells us that while matching
>> the two patterns as done now helps them but only because that pattern
>> feeds single-operand instructions that then simplify.  So doing the transform
>> starting from that single-operand instructions instead looks like a better
>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>> canonicalization done by fold.
>>
>> Richard.
>>
>>> I also can propose to introduce new hook for it but need to know your
>>> opinion since we don't went to waste our time on preparing dead
>>> patches. Note that x86 supports all short types in HW and such type
>>> sinkning is usually useless if short types are involved.
>>>
>>> Best regards.
>>> Yuri.
>>>
>>>
>>> 2013/2/21 Richard Biener :
 On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  
 wrote:
> Richard,
>
> First of all, your proposal to move type sinking to the end of
> function does not work since we handle each statement in function and
> we want that 1st type folding of X & C will not happen.
> Note that we have the following sequence of gimple before forwprop1:
>
>x.0_10 = (signed char) x_8;
>   _11 = x.0_10 & 1;
>   _12 = (signed char) y_9;
>   _13 = _12 & 1;
>   _14 = _11 ^ _13;

 Ah, indeed.  Reminds me of some of my dead patches that separated
 forwprop into a forward and backward stage.  Of course then you have
 the ordering issue of whether to first forward or backward.

 Which means that I bet you can construct a testcase that with
 your change is no longer optimized (just make pushing the conversion
 make the types _match_).  Which is always the case
 with this kind of local pattern-matching transforms.

 Currently forwprop processes leafs of expression trees first (well, inside
 a basic-block), similar to how fold () is supposed to be operated, based
 on the idea that simplified / canonicalized leafs helps keeping pattern
 recognition simple and cost considerations more accurate.

 When one order works better than another you always have to consider
 that the user could already have written the code in a way that results
 in the input that isn't well handled.

 Not that this helps very much for the situation ;)

 But I don't like the use of first_pass_instance ... and the fix isn't
 an improvement but just a hack for the benchmark.

 Richard.

> I also added comment to my fix and create new test for it. I also
> checked that this test is passed with patched compiler  only. So
> Change Log was also modified:
>
> ChangeLog
>
> 2013-02-20  Yuri Rumyantsev  
>
> PR tree-optimization/56175
> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type 
> sinking
> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
> for short integer types.
> * gcc.dg/pr56175.c: New test.
>
>
>
>
> 2013/2/20 Richard Biener :
>> On Wed, Feb 20, 2013 at 1:00 PM, 

[PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Andi Kleen
An auto generated program with a 6.4mio line asm statement gave
with 4.7 and 4.8:

xxx.c:6400017:1: internal compiler error: in account_size_time, at
ipa-inline-analysis.c:601

The problem is that the inliner counts the number of lines in the asm
statement and multiplies that with a weight. With the weight this
overflows 32bit signed int, and triggers an assert for negative time.

Fix this by limiting the number of lines to 1000 for asm cost
estimation. The RTL backend also does similar multiplications for
jump shortening. I haven't tried to address this, but presumably
it's less likely to result in a failure.

Passes test suite on x86_64-linux.

Ok for 4.7 and 4.8?

2013-02-17  Andi Kleen  

* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 2a1b692..7f8f2f2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
   return 0;
 
 case GIMPLE_ASM:
-  return asm_str_count (gimple_asm_string (stmt));
+  {
+   int count = asm_str_count (gimple_asm_string (stmt));
+   /* 1000 means infinity. This avoids overflows later
+  with very long asm statements.  */
+   if (count > 1000)
+ count = 1000;
+   return count;
+  }
 
 case GIMPLE_RESX:
   /* This is either going to be an external function call with one
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Dominique Dhumieres
This commit (r196201) causes a lot of bootstrap failures
(including x86_64-apple-darwin10), see

http://gcc.gnu.org/ml/gcc-regression/2013-02/

TIA

Dominique


Re: Fix for 56175

2013-02-21 Thread Yuri Rumyantsev
Richard,

I double checked that with and without my fix compiler produces the
same output with -fdump-tree-optimized.

What patch did you apply? I think that you should apply the second one
- 56175.diff.

Yuri.

2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> This regression was introduced by Kai
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>
>> 2011-06-27  Kai Tietz  
>>
>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>> type sinking.
>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>> search for di/si mode patterns for finding widest match.
>>
>> Is it sufficient for you to accept my patch?
>
> No, I still don't think the fix is sound.  A proper fix would revert the
> above change (the simplify_bitwise_binary one), watch for testsuite
> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
> and fix those failures in a better way.
>
> Richard.
>
>> Best regards.
>> yuri.
>>
>> 2013/2/21 Richard Biener :
>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  wrote:
 Richard,

 As we know Kai is working on this problem for 4.9 and I assume that
 type sinking will be deleted from forwprop pass. Could we stay on this
 fix but more common fix will be done.
>>>
>>> Well, unless you show it is a regression the patch is not applicable for 4.8
>>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
>>> either,
>>> it is after all a canonicalization - fold seems to perform the opposite one
>>> though:
>>>
>>>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>  constants (if x has signed type, the sign bit cannot be set
>>>  in c).  This folds extension into the BIT_AND_EXPR.
>>>
>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
>>> not sinking.  Generally frontends and fold try to narrow operands when
>>> possible (even though some targets later widen them again because of
>>> instruction set constraints).
>>>
>>> Most of this hoisting code was done to make lowering logical && and ||
>>> I believe.  Looking at the testcases added tells us that while matching
>>> the two patterns as done now helps them but only because that pattern
>>> feeds single-operand instructions that then simplify.  So doing the 
>>> transform
>>> starting from that single-operand instructions instead looks like a better
>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>> canonicalization done by fold.
>>>
>>> Richard.
>>>
 I also can propose to introduce new hook for it but need to know your
 opinion since we don't went to waste our time on preparing dead
 patches. Note that x86 supports all short types in HW and such type
 sinkning is usually useless if short types are involved.

 Best regards.
 Yuri.


 2013/2/21 Richard Biener :
> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  
> wrote:
>> Richard,
>>
>> First of all, your proposal to move type sinking to the end of
>> function does not work since we handle each statement in function and
>> we want that 1st type folding of X & C will not happen.
>> Note that we have the following sequence of gimple before forwprop1:
>>
>>x.0_10 = (signed char) x_8;
>>   _11 = x.0_10 & 1;
>>   _12 = (signed char) y_9;
>>   _13 = _12 & 1;
>>   _14 = _11 ^ _13;
>
> Ah, indeed.  Reminds me of some of my dead patches that separated
> forwprop into a forward and backward stage.  Of course then you have
> the ordering issue of whether to first forward or backward.
>
> Which means that I bet you can construct a testcase that with
> your change is no longer optimized (just make pushing the conversion
> make the types _match_).  Which is always the case
> with this kind of local pattern-matching transforms.
>
> Currently forwprop processes leafs of expression trees first (well, inside
> a basic-block), similar to how fold () is supposed to be operated, based
> on the idea that simplified / canonicalized leafs helps keeping pattern
> recognition simple and cost considerations more accurate.
>
> When one order works better than another you always have to consider
> that the user could already have written the code in a way that results
> in the input that isn't well handled.
>
> Not that this helps very much for the situation ;)
>
> But I don't like the use of first_pass_instance ... and the fix isn't
> an improvement but just a hack for the benchmark.
>
> Richard.
>
>> I also added comment to my fix and create new test for it. I also
>> checked that this test is passed with patched compiler  only. So
>> Change Log was also modified:
>>
>> ChangeLog
>>
>> 2013-02-20  Yuri Rumyantsev  
>>
>> PR tree-o

Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Dominique Dhumieres
The failure on x86_64-apple-darwin10 is

...
true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" 
"CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" 
"CFLAGS=-g -O2  -m32" "CXXFLAGS=-g -O2  -m32" "CFLAGS_FOR_BUILD=-g -O2" 
"CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" 
"INSTALL_DATA=/sw64/bin/ginstall -c -m 644" "INSTALL_PROGRAM=/sw64/bin/ginstall 
-c" "INSTALL_SCRIPT=/sw64/bin/ginstall -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2  
-m32" "LIBCFLAGS_FOR_TARGET=-g -O2" "MAKE=make" "MAKEINFO=makeinfo 
--split-size=500 --split-size=500 --split-size=500   " 
"SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/opt/gcc/gcc4.8w" 
"infodir=/opt/gcc/gcc4.8w/share/info" "libdir=/opt/gcc/gcc4.8w/lib" 
"includedir=/opt/gcc/gcc4.8w/include" "prefix=/opt/gcc/gcc4.8w" 
"tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" 
"gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" 
"AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" 
"RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" 
"NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make
Checking multilib configuration for libsanitizer...
Making all in sanitizer_common
make[4]: Nothing to be done for `all'.
Making all in asan
make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', 
needed by `asan_interceptors_dynamic.lo'.  Stop.
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-stage1-target-libsanitizer] Error 2
make[1]: *** [stage1-bubble] Error 2
make: *** [all] Error 2

On linux it is

/bin/sh ../libtool --tag=CXX   --mode=link 
/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc 
-B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
 -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem 
/usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include
-Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC 
-fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables 
-fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include 
-I../../libstdc++-v3/include/i686-linux 
-I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 
-D_GNU_SOURCE -version-info `grep -v '^#' 
../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl  -o 
libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo 
asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo 
asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo 
asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo 
asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo 
asan_thread.lo asan_win.lo  ../sanitizer_common/libsanitizer_common.la 
../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la 
libtool: link:  /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc 
-shared-libgcc -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc 
-nostdinc++ 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
 -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem 
/usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include
-shared -nostdlib /lib/crti.o 
/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtbeginS.o  
.libs/asan_allocator.o .libs/asan_allocator2.o .libs/asan_interceptors.o 
.libs/asan_mac.o .libs/asan_malloc_mac.o .libs/asan_new_delete.o 
.libs/asan_posix.o .libs/asan_rtl.o .libs/asan_stats.o 
.libs/asan_thread_registry.o .libs/asan_fake_stack.o .libs/asan_globals.o 
.libs/asan_linux.o .libs/asan_malloc_linux.o .libs/asan_malloc_win.o 
.libs/asan_poisoning.o .libs/asan_report.o .libs/asan_stack.o 
.libs/asan_thread.o .libs/asan_win.o  -Wl,--whole-archive 
../sanitizer_common/.libs/libsanitizer_common.a 
../interception/.libs/libinterception.a -Wl,--no-whole-archive  -Wl,-rpath 
-Wl,/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
 -Wl,-rpath -Wl,/usr/local/lib 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
 -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src 
-lpthread -ldl ../../libstdc++-v3/src/.libs/libstdc++.so -lm 
-L/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -lc -lgcc_s 
/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtendS.o /lib/crtn.o
-Wl,-soname -Wl,libasan.so.0 -o .libs/libasan.so.0.0.0
/usr/local32/bin/ld: .libs/asan_rtl.o: .preinit_array section is not allowed in 
DSO
/usr/local32/bin/ld: failed to set dynamic section sizes: Nonrepresentable 
section on output

Dominique


Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 03:29:40PM +0100, Dominique Dhumieres wrote:
> The failure on x86_64-apple-darwin10 is
> 
> ...
> true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" 
> "CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" 
> "CFLAGS=-g -O2  -m32" "CXXFLAGS=-g -O2  -m32" "CFLAGS_FOR_BUILD=-g -O2" 
> "CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" 
> "INSTALL_DATA=/sw64/bin/ginstall -c -m 644" 
> "INSTALL_PROGRAM=/sw64/bin/ginstall -c" "INSTALL_SCRIPT=/sw64/bin/ginstall 
> -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2  -m32" "LIBCFLAGS_FOR_TARGET=-g -O2" 
> "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 
> --split-size=500   " "SHELL=/bin/sh" "RUNTESTFLAGS=" 
> "exec_prefix=/opt/gcc/gcc4.8w" "infodir=/opt/gcc/gcc4.8w/share/info" 
> "libdir=/opt/gcc/gcc4.8w/lib" "includedir=/opt/gcc/gcc4.8w/include" 
> "prefix=/opt/gcc/gcc4.8w" 
> "tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" 
> "gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" 
> "AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" 
> "RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" 
> "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make
> Checking multilib configuration for libsanitizer...
> Making all in sanitizer_common
> make[4]: Nothing to be done for `all'.
> Making all in asan
> make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', 
> needed by `asan_interceptors_dynamic.lo'.  Stop.
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all-stage1-target-libsanitizer] Error 2
> make[1]: *** [stage1-bubble] Error 2
> make: *** [all] Error 2

Removed:
   
trunk/libsanitizer/asan/dynamic/asan_interceptors_dynamic.cc
   

Thus I guess perhaps you should try removing
libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
line from libsanitizer/asan/Makefile.am
and run automake to regenerate Makefile.in.

> On linux it is
> 
> /bin/sh ../libtool --tag=CXX   --mode=link 
> /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc 
> -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ 
> -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src 
> -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
>  -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem 
> /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include
> -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC 
> -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables 
> -fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include 
> -I../../libstdc++-v3/include/i686-linux 
> -I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 
> -D_GNU_SOURCE -version-info `grep -v '^#' 
> ../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl  -o 
> libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo 
> asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo 
> asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo 
> asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo 
> asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo 
> asan_thread.lo asan_win.lo  ../sanitizer_common/libsanitizer_common.la 
> ../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la 
> libtool: link:  /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc 
> -shared-libgcc -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc 
> -nostdinc++ 
> -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src 
> -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
>  -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem 
> /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include
> -shared -nostdlib /lib/crti.o 
> /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtbeginS.o  
> .libs/asan_allocator.o .libs/asan_allocator2.o .libs/asan_interceptors.o 
> .libs/asan_mac.o .libs/asan_malloc_mac.o .libs/asan_new_delete.o 
> .libs/asan_posix.o .libs/asan_rtl.o .libs/asan_stats.o 
> .libs/asan_thread_registry.o .libs/asan_fake_stack.o .libs/asan_globals.o 
> .libs/asan_linux.o .libs/asan_malloc_linux.o .libs/asan_malloc_win.o 
> .libs/asan_poisoning.o .libs/asan_report.o .libs/asan_stack.o 
> .libs/asan_thread.o .libs/asan_win.o  -Wl,--whole-archive 
> ../sanitizer_common/.libs/libsanitizer_common.a 
> ../interception/.libs/libinterception.a -Wl,--no-whole-archive  -Wl,-rpath 
> -Wl,/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
>  -Wl,-rpath -Wl,/usr/local/lib 
> -L/export/gnu/import/git/gcc-test-ia32corei7/bld/

Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Richard Earnshaw

On 21/02/13 14:05, Andi Kleen wrote:

An auto generated program with a 6.4mio line asm statement gave
with 4.7 and 4.8:

xxx.c:6400017:1: internal compiler error: in account_size_time, at
ipa-inline-analysis.c:601

The problem is that the inliner counts the number of lines in the asm
statement and multiplies that with a weight. With the weight this
overflows 32bit signed int, and triggers an assert for negative time.

Fix this by limiting the number of lines to 1000 for asm cost
estimation. The RTL backend also does similar multiplications for
jump shortening. I haven't tried to address this, but presumably
it's less likely to result in a failure.

Passes test suite on x86_64-linux.

Ok for 4.7 and 4.8?

2013-02-17  Andi Kleen  

* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 2a1b692..7f8f2f2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
return 0;

  case GIMPLE_ASM:
-  return asm_str_count (gimple_asm_string (stmt));
+  {
+   int count = asm_str_count (gimple_asm_string (stmt));
+   /* 1000 means infinity. This avoids overflows later
+  with very long asm statements.  */
+   if (count > 1000)
+ count = 1000;
+   return count;
+  }

  case GIMPLE_RESX:
/* This is either going to be an external function call with one




That doesn't sound enough, unless there is already code out there that 
respects this count.  1000 at 4 bytes per instruction is only 4k.   More 
that small enough for the rest of the compiler to think that it could 
jump around such blocks cheaply.


I think a limit of 1M or more might be more appropriate.

R.



Re: Fix for 56175

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> I double checked that with and without my fix compiler produces the
> same output with -fdump-tree-optimized.

For what testcase?

Richard.

> What patch did you apply? I think that you should apply the second one
> - 56175.diff.
>
> Yuri.
>
> 2013/2/21 Richard Biener :
>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> This regression was introduced by Kai
>>>
>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>>
>>> 2011-06-27  Kai Tietz  
>>>
>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>> type sinking.
>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>> search for di/si mode patterns for finding widest match.
>>>
>>> Is it sufficient for you to accept my patch?
>>
>> No, I still don't think the fix is sound.  A proper fix would revert the
>> above change (the simplify_bitwise_binary one), watch for testsuite
>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>> and fix those failures in a better way.
>>
>> Richard.
>>
>>> Best regards.
>>> yuri.
>>>
>>> 2013/2/21 Richard Biener :
 On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  
 wrote:
> Richard,
>
> As we know Kai is working on this problem for 4.9 and I assume that
> type sinking will be deleted from forwprop pass. Could we stay on this
> fix but more common fix will be done.

 Well, unless you show it is a regression the patch is not applicable for 
 4.8
 anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
 either,
 it is after all a canonicalization - fold seems to perform the opposite one
 though:

   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
  constants (if x has signed type, the sign bit cannot be set
  in c).  This folds extension into the BIT_AND_EXPR.

 note that what forwprop does (T)x & c -> (T)(x & c') I'd call type 
 hoisting,
 not sinking.  Generally frontends and fold try to narrow operands when
 possible (even though some targets later widen them again because of
 instruction set constraints).

 Most of this hoisting code was done to make lowering logical && and ||
 I believe.  Looking at the testcases added tells us that while matching
 the two patterns as done now helps them but only because that pattern
 feeds single-operand instructions that then simplify.  So doing the 
 transform
 starting from that single-operand instructions instead looks like a better
 fix (op !=/== 0/1 and (T) op) and also would not disagree with the
 canonicalization done by fold.

 Richard.

> I also can propose to introduce new hook for it but need to know your
> opinion since we don't went to waste our time on preparing dead
> patches. Note that x86 supports all short types in HW and such type
> sinkning is usually useless if short types are involved.
>
> Best regards.
> Yuri.
>
>
> 2013/2/21 Richard Biener :
>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  
>> wrote:
>>> Richard,
>>>
>>> First of all, your proposal to move type sinking to the end of
>>> function does not work since we handle each statement in function and
>>> we want that 1st type folding of X & C will not happen.
>>> Note that we have the following sequence of gimple before forwprop1:
>>>
>>>x.0_10 = (signed char) x_8;
>>>   _11 = x.0_10 & 1;
>>>   _12 = (signed char) y_9;
>>>   _13 = _12 & 1;
>>>   _14 = _11 ^ _13;
>>
>> Ah, indeed.  Reminds me of some of my dead patches that separated
>> forwprop into a forward and backward stage.  Of course then you have
>> the ordering issue of whether to first forward or backward.
>>
>> Which means that I bet you can construct a testcase that with
>> your change is no longer optimized (just make pushing the conversion
>> make the types _match_).  Which is always the case
>> with this kind of local pattern-matching transforms.
>>
>> Currently forwprop processes leafs of expression trees first (well, 
>> inside
>> a basic-block), similar to how fold () is supposed to be operated, based
>> on the idea that simplified / canonicalized leafs helps keeping pattern
>> recognition simple and cost considerations more accurate.
>>
>> When one order works better than another you always have to consider
>> that the user could already have written the code in a way that results
>> in the input that isn't well handled.
>>
>> Not that this helps very much for the situation ;)
>>
>> But I don't like the use of first_pass_instance ... and the fix isn't
>> an improvement but just a hack for the benchmark.
>>
>> Richard.
>>
>>> I also added comment to my fix an

Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Jack Howarth
On Thu, Feb 21, 2013 at 03:45:04PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 21, 2013 at 03:29:40PM +0100, Dominique Dhumieres wrote:
> > The failure on x86_64-apple-darwin10 is
> > 
> > ...
> > true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" 
> > "CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" 
> > "CFLAGS=-g -O2  -m32" "CXXFLAGS=-g -O2  -m32" "CFLAGS_FOR_BUILD=-g -O2" 
> > "CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" 
> > "INSTALL_DATA=/sw64/bin/ginstall -c -m 644" 
> > "INSTALL_PROGRAM=/sw64/bin/ginstall -c" "INSTALL_SCRIPT=/sw64/bin/ginstall 
> > -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2  -m32" "LIBCFLAGS_FOR_TARGET=-g -O2" 
> > "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 
> > --split-size=500   " "SHELL=/bin/sh" "RUNTESTFLAGS=" 
> > "exec_prefix=/opt/gcc/gcc4.8w" "infodir=/opt/gcc/gcc4.8w/share/info" 
> > "libdir=/opt/gcc/gcc4.8w/lib" "includedir=/opt/gcc/gcc4.8w/include" 
> > "prefix=/opt/gcc/gcc4.8w" 
> > "tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" 
> > "gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" 
> > "AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" 
> > "RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" 
> > "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make
> > Checking multilib configuration for libsanitizer...
> > Making all in sanitizer_common
> > make[4]: Nothing to be done for `all'.
> > Making all in asan
> > make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', 
> > needed by `asan_interceptors_dynamic.lo'.  Stop.
> > make[3]: *** [all-recursive] Error 1
> > make[2]: *** [all-stage1-target-libsanitizer] Error 2
> > make[1]: *** [stage1-bubble] Error 2
> > make: *** [all] Error 2
> 
> Removed:  
>  
> trunk/libsanitizer/asan/dynamic/asan_interceptors_dynamic.cc  
>  

Index: libsanitizer/asan/Makefile.am
===
--- libsanitizer/asan/Makefile.am   (revision 196205)
+++ libsanitizer/asan/Makefile.am   (working copy)
@@ -37,7 +37,6 @@ asan_files = \
 
 libasan_la_SOURCES = $(asan_files)
 if USING_MAC_INTERPOSE
-libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
 libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la
 else
 libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la
Index: libsanitizer/merge.sh
===
--- libsanitizer/merge.sh   (revision 196205)
+++ libsanitizer/merge.sh   (working copy)
@@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev)
 echo Current upstream revision: $CUR_REV
 merge include/sanitizer include/sanitizer
 merge lib/asan asan
-merge lib/asan/dynamic asan/dynamic
 merge lib/tsan/rtl tsan
 merge lib/sanitizer_common sanitizer_common
 merge lib/interception interception

appears to fix the darwin bootstrap after running automake1.11 in libsanitizer.

> 
> Thus I guess perhaps you should try removing
> libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
> line from libsanitizer/asan/Makefile.am
> and run automake to regenerate Makefile.in.
> 
> > On linux it is
> > 
> > /bin/sh ../libtool --tag=CXX   --mode=link 
> > /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc 
> > -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ 
> > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src
> >  
> > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs
> >  -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem 
> > /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include
> > -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long 
> > -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables 
> > -fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include 
> > -I../../libstdc++-v3/include/i686-linux 
> > -I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 
> > -D_GNU_SOURCE -version-info `grep -v '^#' 
> > ../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl  -o 
> > libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo 
> > asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo 
> > asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo 
> > asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo 
> > asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo 
> > asan_thread.lo asan_win.lo  ../sanitizer_common/libsanitizer_common.la 
> > ../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la 
> > libt

Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 10:06:12AM -0500, Jack Howarth wrote:

> Index: libsanitizer/asan/Makefile.am
> ===
> --- libsanitizer/asan/Makefile.am (revision 196205)
> +++ libsanitizer/asan/Makefile.am (working copy)
> @@ -37,7 +37,6 @@ asan_files = \
>  
>  libasan_la_SOURCES = $(asan_files)
>  if USING_MAC_INTERPOSE
> -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
>  libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la
>  else
>  libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
> $(top_builddir)/interception/libinterception.la
> Index: libsanitizer/merge.sh
> ===
> --- libsanitizer/merge.sh (revision 196205)
> +++ libsanitizer/merge.sh (working copy)
> @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev)
>  echo Current upstream revision: $CUR_REV
>  merge include/sanitizer include/sanitizer
>  merge lib/asan asan
> -merge lib/asan/dynamic asan/dynamic
>  merge lib/tsan/rtl tsan
>  merge lib/sanitizer_common sanitizer_common
>  merge lib/interception interception
> 
> appears to fix the darwin bootstrap after running automake1.11 in 
> libsanitizer.

Patch preapproved with proper ChangeLog entry.

Jakub


Re: Fix for 56175

2013-02-21 Thread Yuri Rumyantsev
Richard,

Sorry for my previous message - I did not undersatnd it properly.

Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
(type-x) c) transformation if x has short type:

+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -1789,8 +1789,11 @@
   defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
   defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);

-  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
+  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
+ Do that only if X has not short type.  */
   if (TREE_CODE (arg2) == INTEGER_CST
+  && (TYPE_PRECISION (TREE_TYPE (arg1))
+  >= TYPE_PRECISION (integer_type_node))
   && CONVERT_EXPR_CODE_P (def1_code)
   && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
   && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))

Does this fix look suitable?

Yuri.

2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> I double checked that with and without my fix compiler produces the
>> same output with -fdump-tree-optimized.
>
> For what testcase?
>
> Richard.
>
>> What patch did you apply? I think that you should apply the second one
>> - 56175.diff.
>>
>> Yuri.
>>
>> 2013/2/21 Richard Biener :
>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  wrote:
 Richard,

 This regression was introduced by Kai

 http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html

 2011-06-27  Kai Tietz  

 * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
 type sinking.
 * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
 search for di/si mode patterns for finding widest match.

 Is it sufficient for you to accept my patch?
>>>
>>> No, I still don't think the fix is sound.  A proper fix would revert the
>>> above change (the simplify_bitwise_binary one), watch for testsuite
>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>>> and fix those failures in a better way.
>>>
>>> Richard.
>>>
 Best regards.
 yuri.

 2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  
> wrote:
>> Richard,
>>
>> As we know Kai is working on this problem for 4.9 and I assume that
>> type sinking will be deleted from forwprop pass. Could we stay on this
>> fix but more common fix will be done.
>
> Well, unless you show it is a regression the patch is not applicable for 
> 4.8
> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
> either,
> it is after all a canonicalization - fold seems to perform the opposite 
> one
> though:
>
>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>  constants (if x has signed type, the sign bit cannot be set
>  in c).  This folds extension into the BIT_AND_EXPR.
>
> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type 
> hoisting,
> not sinking.  Generally frontends and fold try to narrow operands when
> possible (even though some targets later widen them again because of
> instruction set constraints).
>
> Most of this hoisting code was done to make lowering logical && and ||
> I believe.  Looking at the testcases added tells us that while matching
> the two patterns as done now helps them but only because that pattern
> feeds single-operand instructions that then simplify.  So doing the 
> transform
> starting from that single-operand instructions instead looks like a better
> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
> canonicalization done by fold.
>
> Richard.
>
>> I also can propose to introduce new hook for it but need to know your
>> opinion since we don't went to waste our time on preparing dead
>> patches. Note that x86 supports all short types in HW and such type
>> sinkning is usually useless if short types are involved.
>>
>> Best regards.
>> Yuri.
>>
>>
>> 2013/2/21 Richard Biener :
>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev  
>>> wrote:
 Richard,

 First of all, your proposal to move type sinking to the end of
 function does not work since we handle each statement in function and
 we want that 1st type folding of X & C will not happen.
 Note that we have the following sequence of gimple before forwprop1:

x.0_10 = (signed char) x_8;
   _11 = x.0_10 & 1;
   _12 = (signed char) y_9;
   _13 = _12 & 1;
   _14 = _11 ^ _13;
>>>
>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>> forwprop into a forward and backward stage.  Of course then you have
>>> the ordering issue of whether to first forward or backward.
>>>
>>> Which means th

Re: Fix for 56175

2013-02-21 Thread Richard Biener
On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Sorry for my previous message - I did not undersatnd it properly.
>
> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
> (type-x) c) transformation if x has short type:
>
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -1789,8 +1789,11 @@
>defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>
> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
> + Do that only if X has not short type.  */
>if (TREE_CODE (arg2) == INTEGER_CST
> +  && (TYPE_PRECISION (TREE_TYPE (arg1))
> +  >= TYPE_PRECISION (integer_type_node))
>&& CONVERT_EXPR_CODE_P (def1_code)
>&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>
> Does this fix look suitable?

I think the fix is to disable the transform if it widens the operation.
Thus, sth like

if (TREE_CODE (arg2) == INTEGER_CST
   && CONVERT_EXPR_CODE_P (def1_code)
   && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
+ && (TYPE_PRECISION (TREE_TYPE (arg1))
+>= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
   && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))

see the restriction we place on the transform for the (T1) x & (T2) y case:

  /* For bitwise binary operations apply operand conversions to the
 binary operation result instead of to the operands.  This allows
 to combine successive conversions and bitwise binary operations.  */
  if (CONVERT_EXPR_CODE_P (def1_code)
  && CONVERT_EXPR_CODE_P (def2_code)
  && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
  /* Make sure that the conversion widens the operands, or has same
 precision,  or that it changes the operation to a bitfield
 precision.  */
  && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
   <= TYPE_PRECISION (TREE_TYPE (arg1)))
  || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
  != MODE_INT)
  || (TYPE_PRECISION (TREE_TYPE (arg1))
  != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))

ideally you'd split out that condition into a predicate function taking the
two type and use it in both places.

Richard.

> Yuri.
>
> 2013/2/21 Richard Biener :
>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> I double checked that with and without my fix compiler produces the
>>> same output with -fdump-tree-optimized.
>>
>> For what testcase?
>>
>> Richard.
>>
>>> What patch did you apply? I think that you should apply the second one
>>> - 56175.diff.
>>>
>>> Yuri.
>>>
>>> 2013/2/21 Richard Biener :
 On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  
 wrote:
> Richard,
>
> This regression was introduced by Kai
>
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>
> 2011-06-27  Kai Tietz  
>
> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
> type sinking.
> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
> search for di/si mode patterns for finding widest match.
>
> Is it sufficient for you to accept my patch?

 No, I still don't think the fix is sound.  A proper fix would revert the
 above change (the simplify_bitwise_binary one), watch for testsuite
 fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
 and fix those failures in a better way.

 Richard.

> Best regards.
> yuri.
>
> 2013/2/21 Richard Biener :
>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  
>> wrote:
>>> Richard,
>>>
>>> As we know Kai is working on this problem for 4.9 and I assume that
>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>> fix but more common fix will be done.
>>
>> Well, unless you show it is a regression the patch is not applicable for 
>> 4.8
>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
>> either,
>> it is after all a canonicalization - fold seems to perform the opposite 
>> one
>> though:
>>
>>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>  constants (if x has signed type, the sign bit cannot be set
>>  in c).  This folds extension into the BIT_AND_EXPR.
>>
>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type 
>> hoisting,
>> not sinking.  Generally frontends and fold try to narrow operands when
>> possible (even though some targets later widen them again because of
>> instruction set constraints).
>>
>> Most of this hoisting code was done to make lowering logical && and ||
>> I believe.  Looking at the testcases added tells us that whi

Re: libsanitizer merge from upstream r175733

2013-02-21 Thread Jack Howarth
On Thu, Feb 21, 2013 at 04:11:56PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 21, 2013 at 10:06:12AM -0500, Jack Howarth wrote:
> 
> > Index: libsanitizer/asan/Makefile.am
> > ===
> > --- libsanitizer/asan/Makefile.am   (revision 196205)
> > +++ libsanitizer/asan/Makefile.am   (working copy)
> > @@ -37,7 +37,6 @@ asan_files = \
> >  
> >  libasan_la_SOURCES = $(asan_files)
> >  if USING_MAC_INTERPOSE
> > -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
> >  libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la
> >  else
> >  libasan_la_LIBADD = 
> > $(top_builddir)/sanitizer_common/libsanitizer_common.la 
> > $(top_builddir)/interception/libinterception.la
> > Index: libsanitizer/merge.sh
> > ===
> > --- libsanitizer/merge.sh   (revision 196205)
> > +++ libsanitizer/merge.sh   (working copy)
> > @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev)
> >  echo Current upstream revision: $CUR_REV
> >  merge include/sanitizer include/sanitizer
> >  merge lib/asan asan
> > -merge lib/asan/dynamic asan/dynamic
> >  merge lib/tsan/rtl tsan
> >  merge lib/sanitizer_common sanitizer_common
> >  merge lib/interception interception
> > 
> > appears to fix the darwin bootstrap after running automake1.11 in 
> > libsanitizer.
> 
> Patch preapproved with proper ChangeLog entry.
> 
>   Jakub

No regressions in asan.exp for the resulting build on x86_64-apple-darwin12.



Re: Fix for 56175

2013-02-21 Thread Yuri Rumyantsev
Richard,

This does not fit to my fix  since if we have another pattern like
t = (u8)((x & 1) ^ ((u8)y & 1));
where y has short type, for rhs operand type sinkning (or hoisting as
you prefer) still will be performed but I don't see any reason for
converting  (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all
targets including x86.

Yuri.

2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> Sorry for my previous message - I did not undersatnd it properly.
>>
>> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
>> (type-x) c) transformation if x has short type:
>>
>> +++ gcc/tree-ssa-forwprop.c (working copy)
>> @@ -1789,8 +1789,11 @@
>>defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>>defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>>
>> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
>> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
>> + Do that only if X has not short type.  */
>>if (TREE_CODE (arg2) == INTEGER_CST
>> +  && (TYPE_PRECISION (TREE_TYPE (arg1))
>> +  >= TYPE_PRECISION (integer_type_node))
>>&& CONVERT_EXPR_CODE_P (def1_code)
>>&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>
>> Does this fix look suitable?
>
> I think the fix is to disable the transform if it widens the operation.
> Thus, sth like
>
> if (TREE_CODE (arg2) == INTEGER_CST
>&& CONVERT_EXPR_CODE_P (def1_code)
>&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
> + && (TYPE_PRECISION (TREE_TYPE (arg1))
> +>= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
>&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>
> see the restriction we place on the transform for the (T1) x & (T2) y case:
>
>   /* For bitwise binary operations apply operand conversions to the
>  binary operation result instead of to the operands.  This allows
>  to combine successive conversions and bitwise binary operations.  */
>   if (CONVERT_EXPR_CODE_P (def1_code)
>   && CONVERT_EXPR_CODE_P (def2_code)
>   && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
>   /* Make sure that the conversion widens the operands, or has same
>  precision,  or that it changes the operation to a bitfield
>  precision.  */
>   && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
><= TYPE_PRECISION (TREE_TYPE (arg1)))
>   || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
>   != MODE_INT)
>   || (TYPE_PRECISION (TREE_TYPE (arg1))
>   != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))
>
> ideally you'd split out that condition into a predicate function taking the
> two type and use it in both places.
>
> Richard.
>
>> Yuri.
>>
>> 2013/2/21 Richard Biener :
>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev  wrote:
 Richard,

 I double checked that with and without my fix compiler produces the
 same output with -fdump-tree-optimized.
>>>
>>> For what testcase?
>>>
>>> Richard.
>>>
 What patch did you apply? I think that you should apply the second one
 - 56175.diff.

 Yuri.

 2013/2/21 Richard Biener :
> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev  
> wrote:
>> Richard,
>>
>> This regression was introduced by Kai
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>
>> 2011-06-27  Kai Tietz  
>>
>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>> type sinking.
>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>> search for di/si mode patterns for finding widest match.
>>
>> Is it sufficient for you to accept my patch?
>
> No, I still don't think the fix is sound.  A proper fix would revert the
> above change (the simplify_bitwise_binary one), watch for testsuite
> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
> and fix those failures in a better way.
>
> Richard.
>
>> Best regards.
>> yuri.
>>
>> 2013/2/21 Richard Biener :
>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev  
>>> wrote:
 Richard,

 As we know Kai is working on this problem for 4.9 and I assume that
 type sinking will be deleted from forwprop pass. Could we stay on this
 fix but more common fix will be done.
>>>
>>> Well, unless you show it is a regression the patch is not applicable 
>>> for 4.8
>>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
>>> either,
>>> it is after all a canonicalization - fold seems to perform the opposite 
>>> one
>>> though:
>>>
>>>   /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>  constants (if x has signed type, the sign bit 

Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Andi Kleen
> That doesn't sound enough, unless there is already code out there
> that respects this count.  1000 at 4 bytes per instruction is only
> 4k.   More that small enough for the rest of the compiler to think
> that it could jump around such blocks cheaply.
> 
> I think a limit of 1M or more might be more appropriate.

I got an overflow for 6.4M, so 1M would be dangerously near that.
100k perhaps ? 

This was not for jump shortening, but the inliner heuristics.

In the worst case we could separate the two, would be a larger
patch though.

-Andi


[PATCH] Fix darwin bootstrap and merge.sh

2013-02-21 Thread Jack Howarth
  The attached patch fixes the broken bootstrap on darwin in libsanitizer due 
to the
deprecation of the dynamic/asan_interceptors_dynamic.cc file. The patch also 
eliminates
the merge of the deprecated subdirectory in merge.sh. Bootstrap and asan.exp 
regression
tested on x86_64-apple-darwin12. Okay for gcc trunk?
Jack
libsanitizer/

2013-02-21  Jack Howarth 

* asan/Makefile.am (libasan_la_SOURCES): Remove deprecated
dynamic/asan_interceptors_dynamic.cc.
* asan/Makefile.in: Regenerated.
* merge.sh: Remove merge of deprecated lib/asan/dynamic.

Index: libsanitizer/asan/Makefile.am
===
--- libsanitizer/asan/Makefile.am   (revision 196205)
+++ libsanitizer/asan/Makefile.am   (working copy)
@@ -37,7 +37,6 @@ asan_files = \
 
 libasan_la_SOURCES = $(asan_files)
 if USING_MAC_INTERPOSE
-libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc
 libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la
 else
 libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la
Index: libsanitizer/merge.sh
===
--- libsanitizer/merge.sh   (revision 196205)
+++ libsanitizer/merge.sh   (working copy)
@@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev)
 echo Current upstream revision: $CUR_REV
 merge include/sanitizer include/sanitizer
 merge lib/asan asan
-merge lib/asan/dynamic asan/dynamic
 merge lib/tsan/rtl tsan
 merge lib/sanitizer_common sanitizer_common
 merge lib/interception interception


Fix some texinfo 5.0 warnings in gcc/doc + libiberty

2013-02-21 Thread Tobias Burnus

This is a follow up to Jakub's patch.

With texinfo 5.0 one gets a bunch of warnings. This patch reduces the 
number of warnings – but there are still warnings to be fixed.


This patch solves most of the issues related to mismatches between the 
item order in the @menu and the actual @nodes. As always, there is the 
question whether the @node or the item in @menu should be changed. In 
one case, I had the choice to either add a new item under @menu or to 
change a @section into @subsection. I did the latter.


Tested with "make info" and no new warnings in texinfo 5.0. 
Additionally, I used texinfo 4.13a with showed no warnings.

OK for the trunk?

Tobias


PS: I tried the following libiberty patch; it fixes a warning with 
texinfo 5.0. But I do not include it as it fails for some reason with an 
error with texinfo 4.13:
../../libiberty/libiberty.texi:250: Prev field of node `Functions' not 
pointed to.

../../libiberty//obstacks.texi:1: This node (Obstacks) has the bad Next.
...

diff --git a/libiberty/libiberty.texi b/libiberty/libiberty.texi
index 74f70d2..5a72f13 100644
--- a/libiberty/libiberty.texi
+++ b/libiberty/libiberty.texi
@@ -80,10 +80,10 @@ License; for more information, see @ref{Library 
Copying}.


* Overview:: Overview of available function groups.

-* Functions:: Available functions, macros, and global variables.
-
* Obstacks:: Object Stacks.

+* Functions:: Available functions, macros, and global variables.
+
* Licenses:: The various licenses under which libiberty sources are
distributed.

gcc/
2013-02-21  Tobias Burnus  

	* doc/extended.texi (C Extensions): Change order in @menu
	to match @node.
	(Other MIPS Built-in Functions): Move last MIPS entry before
	"picoChip Built-in Functions".
	(SH Built-in Functions): Move after RX Built-in Functions.
	* doc/gcc.texi (Introduction): Change order in @menu
	to match @node.
	* doc/md.texi (Constraints): Ditto.
	* gty.texi (Type Information): Ditto.
	(User-provided marking routines for template types): Make
	subsection.
	* doc/invoke.texi (AArch64 Options): Move before
	"Adapteva Epiphany Options".

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index eb84408..ea4dfc1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -30,8 +30,8 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Constructing Calls::  Dispatching a call to another function.
 * Typeof::  @code{typeof}: referring to the type of an expression.
 * Conditionals::Omitting the middle operand of a @samp{?:} expression.
-* Long Long::   Double-word integers---@code{long long int}.
 * __int128::			128-bit integers---@code{__int128}.
+* Long Long::   Double-word integers---@code{long long int}.
 * Complex:: Data types for complex numbers.
 * Floating Types::  Additional Floating Types.
 * Half-Precision::  Half-Precision Floating Point.
@@ -40,8 +40,8 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Fixed-Point:: Fixed-Point Types.
 * Named Address Spaces::Named address spaces.
 * Zero Length:: Zero-length arrays.
-* Variable Length:: Arrays whose length is computed at run time.
 * Empty Structures::Structures with no members.
+* Variable Length:: Arrays whose length is computed at run time.
 * Variadic Macros:: Macros with a variable number of arguments.
 * Escaped Newlines::Slightly looser rules for escaped newlines.
 * Subscripting::Any array can be subscripted, even if not an lvalue.
@@ -50,8 +50,8 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Compound Literals::   Compound literals give structures, unions
 or arrays as values.
 * Designated Inits::Labeling elements of initializers.
-* Cast to Union::   Casting to union type from any member of the union.
 * Case Ranges:: `case 1 ... 9' and such.
+* Cast to Union::   Casting to union type from any member of the union.
 * Mixed Declarations::  Mixing declarations and code.
 * Function Attributes:: Declaring that functions have no side effects,
 or that they can never return.
@@ -11653,6 +11653,18 @@ else
 @end smallexample
 @end table
 
+@node Other MIPS Built-in Functions
+@subsection Other MIPS Built-in Functions
+
+GCC provides other MIPS-specific built-in functions:
+
+@table @code
+@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr})
+Insert a @samp{cache} instruction with operands @var{op} and @var{addr}.
+GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE}
+when this function is available.
+@end table
+
 @node picoChip Built-in Functions
 @subsection picoChip Built-in Functions
 
@@ -11689,18 +11701,6 @@ implementing assertions.
 
 @end table
 
-@node Other MIPS Built-in Functions
-@subsection Other MIPS Built-in Functions
-
-GCC provides other MIPS-specific built-in functions:
-
-@table @code
-@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{add

[PATCH] C++ math constants

2013-02-21 Thread Ulrich Drepper
How about the attached file as a start for .  I used the
constexpr approach (instead of function calls) and replicated the
constants that are available in  in Unix.

What other constants to add?


math
Description: Binary data


Re: Fix some texinfo 5.0 warnings in gcc/doc + libiberty

2013-02-21 Thread Andreas Schwab
Tobias Burnus  writes:

> PS: I tried the following libiberty patch; it fixes a warning with texinfo
> 5.0. But I do not include it as it fails for some reason with an error
> with texinfo 4.13:
> ../../libiberty/libiberty.texi:250: Prev field of node `Functions' not
> pointed to.
> ../../libiberty//obstacks.texi:1: This node (Obstacks) has the bad Next.
> ...

obstacks.texi is using @chapter when it should use @subsection, so it
needs to be bracketed with @lowersections/@raisesections, like this:

diff --git a/libiberty/libiberty.texi b/libiberty/libiberty.texi
index 74f70d2..f1e4bdd 100644
--- a/libiberty/libiberty.texi
+++ b/libiberty/libiberty.texi
@@ -82,8 +82,6 @@ License; for more information, see @ref{Library Copying}.
 
 * Functions::  Available functions, macros, and global variables.
 
-* Obstacks::   Object Stacks.
-
 * Licenses::   The various licenses under which libiberty sources are
distributed.
 
@@ -245,7 +243,11 @@ central location from which to use, maintain, and 
distribute them.
 
 @c This is generated from the glibc manual using a make-obstacks-texi.sh
 @c script of Phil's.  Hope it's accurate.
+@lowersections
+@lowersections
 @include obstacks.texi
+@raisesections
+@raisesections
 
 @node Functions
 @chapter Function, Variable, and Macro Listing.
diff --git a/libiberty/obstacks.texi b/libiberty/obstacks.texi
index a1b1b47..67780aa 100644
--- a/libiberty/obstacks.texi
+++ b/libiberty/obstacks.texi
@@ -1,4 +1,4 @@
-@node Obstacks,Licenses,Functions,Top
+@node Obstacks
 @chapter Obstacks
 @cindex obstacks
 

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support

2013-02-21 Thread Maciej W. Rozycki
On Thu, 21 Feb 2013, Richard Sandiford wrote:

> >  This issue was originally raised here:
> >
> > http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html
> >
> >  We have a shortcoming in GCC in that we only allow the use half of the FP 
> > MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register 
> > mode (CP0.Status.FR == 1) on MIPS32r2 processors.  Furthermore we never 
> > enable the other half (NMADD.fmt and NMSUB.fmt) on those processors.  
> > However this whole instruction subset is always available on MIPS32r2 FPUs 
> > regardless of the mode selected, just as it always has been on FPUs of the 
> > 64-bit ISA line from MIPS IV up.
> 
> Hmm, this was discussed here:
> 
> http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00488.html
> http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00492.html
> 
> The footnote for COP1X instructions on page 12 of volume 1 of the MIPS32 ISA
> (v2.50) says:

 Please note that v2.50 is obsolete, several typos were corrected in later 
revisions.  Sadly I don't even have a copy of that exact revision of 
volume I.

> 1. In Release 1 of the Architecture, these instructions are legal only
>with a MIPS64 processor with 64-bit operations enabled (they are, in
>effect, actually MIPS64 instructions). In Release 2 of the Architecture,
>these instructions are legal with either a MIPS32 or MIPS64 processor
>_which includes a 64-bit floating point unit_.
> 
> (my emphasis).  "which" rather than "that" makes this a bit ambiguous,
> but various comments in the rest of the manual imply that MIPS32r2 allows
> an implementation choice between 32-bit and 64-bit FPUs.  E.g. page 8 says:
> 
>   Support for 64-bit coprocessors with 32-bit CPUs: These changes allow
>   a 64-bit coprocessor (including an FPU) to be attached to a 32-bit
>   CPU. This enhancement is optional in a Release 2 implementation.
> 
> and page 45 says:
> 
>   In addition to an Instruction Set Architecture, the MIPS architecture
>   definition includes processing resources such as the set of
>   coprocessor general registers. In Release 1 of the Architecture, the
>   32-bit registers in MIPS32 were enlarged to 64-bits in MIPS64;
>   however, these 64-bit FPU registers are not backwards
>   compatible. Instead, processors implementing the MIPS64 Architecture
>   provide a mode bit to select either the 32-bit or 64-bit register
>   model. In Release 2 of the Architecture, a 32-bit CPU _may_ include a
>   full 64-bit coprocessor, including a floating point unit which
>   implements the same mode bit to select 32-bit or 64-bit FPU register
>   model.
> 
> On page 322 of volume 2, the footnote for "Table A-20 MIPS64 COP1X
> Encoding of Function Field" uses slightly different wording:
> 
>   COP1X instructions are legal only if 64-bit floating point operations
>   are enabled.
> 
> So was this all a big misunderstanding on my part?  The TARGET_FLOAT64
> condition came from MIPS themselves, and when challenged they seemed
> pretty adamant that it was correct.  If I was wrong to be convinced
> by the explanation, I hope you can at least see why I was convinced. :-)

 As good as my English comprehension might be I believe your understanding 
of documentation is right.  However several inconsistencies have already 
been spotted and it may well be that for example the restriction on COP1X 
instructions is simply a leftover that leaked from rev.1 of the MIPS64 
architecture and was not spotted on the update (please note that obviously 
all the architecture documents are produced from templates shared between 
the MIPS32 and MIPS64 architectures with some parts being output 
conditionally as appropriate).

> If it wasn't a misunderstanding, then the point is that we can't tell
> from ISA_MIPS32R2 alone whether the target has a 32-bit or 64-bit FPU,
> but we know that it must have a 64-bit FPU if using TARGET_FLOAT64.

 Well, the important point is as I understand all the MIPS32r2 FPU 
instructions (not all the FP formats of course though, L and PS are 
optional) are mandatory no matter if the FPU is 32-bit or 64-bit.  This 
should rule out any question as to what kind of FPU has been implemented 
or whether the 64-bit mode has been enabled.  I could be wrong however.

 However the good news is it's not the architecture documentation set that 
is normative for architecture implementers, it's the AVP (Architecture 
Verification Program) they have to run successfully on their processors to 
claim compliance.

 Steve, would you therefore please do us a favour and check what the AVP 
for MIPS32r2 requires for support of the floating-point MADD instruction 
subset, or preferably, the whole COP1X instruction set?  Are these 
instructions only mandatory for a 64-bit FPU (CP1.FIR.F64 == 1), or do 
they have to be included in all FPU implementations?

> This part is OK, thanks, and is probably the only bit that's suitable for
> 4.8 at this stage.  Would you mind applying it separately?

 Applied now, thanks.

[PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)

2013-02-21 Thread Jakub Jelinek
Hi!

This patch teaches lower-subreg pass to also handle ASHIFTRTs with
BITS_PER_WORD to 2*BITS_PER_WORD-1 constant shift counts, like it already
handles similar LSHIFTRTs.
While for LSHIFTRT we should zero the upper half, for ASHIFTRT we either
should set it to upper source half >> (BITS_PER_WORD-1), or for
shifts by >> (2*BITS_PER_WORD-1) we can optimize that to one shift
followed by copying it from the lower to the upper half.

On the testcase from the PR this removes 3 unnecessary moves, so we are one
more better than 4.7 (thus fix a regression), and on the other testcases
either we generated the same quality of code, or better.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-02-21  Jakub Jelinek  

PR rtl-optimization/50339
* lower-subreg.h (struct lower_subreg_choices): Add splitting_ashiftrt
field.
* lower-subreg.c (compute_splitting_shift): Handle ASHIFTRT.
(compute_costs): Call compute_splitting_shift also for ASHIFTRT
into splitting_ashiftrt field.
(find_decomposable_shift_zext, resolve_shift_zext): Handle also
ASHIFTRT.
(dump_choices): Fix up printing LSHIFTRT choices, print ASHIFTRT
choices.

--- gcc/lower-subreg.h.jj   2013-02-21 14:10:39.033592663 +0100
+++ gcc/lower-subreg.h  2013-02-21 15:26:18.773634801 +0100
@@ -34,6 +34,7 @@ struct lower_subreg_choices {
  should be split.  */
   bool splitting_ashift[MAX_BITS_PER_WORD];
   bool splitting_lshiftrt[MAX_BITS_PER_WORD];
+  bool splitting_ashiftrt[MAX_BITS_PER_WORD];
 
   /* True if there is at least one mode that is worth splitting.  */
   bool something_to_do;
--- gcc/lower-subreg.c.jj   2013-02-21 14:10:38.975592966 +0100
+++ gcc/lower-subreg.c  2013-02-21 15:27:15.114316148 +0100
@@ -57,9 +57,9 @@ along with GCC; see the file COPYING3.
to do this.
 
This pass only splits moves with modes that are wider than
-   word_mode and ASHIFTs, LSHIFTRTs and ZERO_EXTENDs with integer
-   modes that are twice the width of word_mode.  The latter could be
-   generalized if there was a need to do this, but the trend in
+   word_mode and ASHIFTs, LSHIFTRTs, ASHIFTRTs and ZERO_EXTENDs with
+   integer modes that are twice the width of word_mode.  The latter
+   could be generalized if there was a need to do this, but the trend in
architectures is to not need this.
 
There are two useful preprocessor defines for use by maintainers:
@@ -152,7 +152,7 @@ compute_splitting_shift (bool speed_p, s
 bool *splitting, enum rtx_code code,
 int word_move_zero_cost, int word_move_cost)
 {
-  int wide_cost, narrow_cost, i;
+  int wide_cost, narrow_cost, upper_cost, i;
 
   for (i = 0; i < BITS_PER_WORD; i++)
 {
@@ -163,13 +163,20 @@ compute_splitting_shift (bool speed_p, s
   else
narrow_cost = shift_cost (speed_p, rtxes, code, word_mode, i);
 
+  if (code != ASHIFTRT)
+   upper_cost = word_move_zero_cost;
+  else if (i == BITS_PER_WORD - 1)
+   upper_cost = word_move_cost;
+  else
+   upper_cost = shift_cost (speed_p, rtxes, code, word_mode,
+BITS_PER_WORD - 1);
+
   if (LOG_COSTS)
fprintf (stderr, "%s %s by %d: original cost %d, split cost %d + %d\n",
 GET_MODE_NAME (twice_word_mode), GET_RTX_NAME (code),
-i + BITS_PER_WORD, wide_cost, narrow_cost,
-word_move_zero_cost);
+i + BITS_PER_WORD, wide_cost, narrow_cost, upper_cost);
 
-  if (FORCE_LOWERING || wide_cost >= narrow_cost + word_move_zero_cost)
+  if (FORCE_LOWERING || wide_cost >= narrow_cost + upper_cost)
splitting[i] = true;
 }
 }
@@ -248,6 +255,9 @@ compute_costs (bool speed_p, struct cost
   compute_splitting_shift (speed_p, rtxes,
   choices[speed_p].splitting_lshiftrt, LSHIFTRT,
   word_move_zero_cost, word_move_cost);
+  compute_splitting_shift (speed_p, rtxes,
+  choices[speed_p].splitting_ashiftrt, ASHIFTRT,
+  word_move_zero_cost, word_move_cost);
 }
 }
 
@@ -1153,6 +1163,7 @@ find_decomposable_shift_zext (rtx insn,
   op = SET_SRC (set);
   if (GET_CODE (op) != ASHIFT
   && GET_CODE (op) != LSHIFTRT
+  && GET_CODE (op) != ASHIFTRT
   && GET_CODE (op) != ZERO_EXTEND)
 return false;
 
@@ -1173,6 +1184,8 @@ find_decomposable_shift_zext (rtx insn,
 {
   bool *splitting = (GET_CODE (op) == ASHIFT
 ? choices[speed_p].splitting_ashift
+: GET_CODE (op) == ASHIFTRT
+? choices[speed_p].splitting_ashiftrt
 : choices[speed_p].splitting_lshiftrt);
   if (!CONST_INT_P (XEXP (op, 1))
  || !IN_RANGE (INTVAL (XEXP (op, 1)), BITS_PER_WORD,
@@ -1200,7 +1213,7 @@ resolve_shift_zext (rtx insn)
   rtx op;
   rtx op_operand

vec_concat, vec_duplicate doc updated (was: Re: question about section 10.12)

2013-02-21 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Mon, 28 Jan 2013 23:14:21 +0100

>   * doc/rtl.texi (vec_concat, vec_duplicate): Mention that
>   scalars are valid operands.

Finally committed as obvious after brief check of generated dvi
and info.

brgds, H-P


Re: [PATCH] C++ math constants

2013-02-21 Thread Daniel Krügler
2013/2/21 Ulrich Drepper :
> How about the attached file as a start for .  I used the
> constexpr approach (instead of function calls) and replicated the
> constants that are available in  in Unix.

1) In this case I miss the corresponding variable definitions, because
you violate the ODR, when you have something like the following:

#include 

template
void print(const T& t) { std::cout << t; }

int main() {
  print(__math_constants::__pi);
}

2) You need to use either braced initializers *or* using initializers
with equal, a parenthesized initializer isn't supported
(/brace-or-equal-initializer/).

- Daniel


Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Andi Kleen
> This was not for jump shortening, but the inliner heuristics.
> 
> In the worst case we could separate the two, would be a larger
> patch though.

Actually it's already separated, I don't affect the jump shortening
at all. Only the inliner code adds the limit.

So it would depend whether 1000 * weight is large enough for inlining.
Honza?

-Andi


Re: [PATCH] C++ math constants

2013-02-21 Thread Alec Teal

On 21/02/13 16:32, Ulrich Drepper wrote:

How about the attached file as a start for .  I used the
constexpr approach (instead of function calls) and replicated the
constants that are available in  in Unix.

What other constants to add?

Pi/3
ln(3)
ln(10) (for base conversions)
sqrt(3)
sqrt(5)
sqrt(7)
1/e

I'll write down what I use in my day to day stuff, keep you posted.




Add myself as maintainer

2013-02-21 Thread Andrew Sutton
Adding myself as Write After Approval maintainer.

Andrew Sutton
andrew.n.sut...@gmail.com


Index: ChangeLog
===
--- ChangeLog   (revision 196207)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2013-02-20  Andrew Sutton  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2013-02-15  Yufeng Zhang  

* configure.ac: Set libgloss_dir for the aarch64*-*-* targets.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 196207)
+++ MAINTAINERS (working copy)
@@ -523,6 +523,7 @@ Graham Stott
graham.stott@btinternet
 Andrew Stubbs  a...@codesourcery.com
 Mike Stump mikest...@comcast.net
 Jeff Sturm jst...@gcc.gnu.org
+Andrew Sutton  andrew.n.sut...@gmail.com
 Gabriele Sveltogabriele.sve...@st.com
 Sriraman Tallamtmsri...@google.com
 Chung-Lin Tang clt...@codesourcery.com


Re: [PATCH] C++ math constants

2013-02-21 Thread Benjamin De Kosnik

> > How about the attached file as a start for .  I used the
> > constexpr approach (instead of function calls) and replicated the
> > constants that are available in  in Unix.

then this should really be 

ext/cmath

> 
> 1) In this case I miss the corresponding variable definitions, because
> you violate the ODR, when you have something like the following:
> 
> #include 
> 
> template
> void print(const T& t) { std::cout << t; }
> 
> int main() {
>   print(__math_constants::__pi);
> }

Not seeing it.

Say for:

#include 

  // A class for math constants.
  template
struct __math_constants
{
  // Constant @f$ \pi @f$.
  static constexpr _RealType __pie =
  3.1415926535897932384626433832795029L; };

template
void print(const T& t) { std::cout << t; }

int main() 
{
  print(__math_constants::__pie);
  return 0;
}

I'm not getting any definition, even at -O0.

Any chance you could expand on your thinking here Daniel?


> 2) You need to use either braced initializers *or* using initializers
> with equal, a parenthesized initializer isn't supported
> (/brace-or-equal-initializer/).

Yeah. This works:

 // Constant @f$ \pi @f$.
 static constexpr _RealType __pie =
 3.1415926535897932384626433832795029L;


-benjamin



Re: [PATCH] Fix darwin bootstrap and merge.sh

2013-02-21 Thread Konstantin Serebryany
Thanks!
Does this need to regenerate libsanitizer/asan/Makefile.in ?
(It'll take a while for me to do this on Mac)

--kcc


On Thu, Feb 21, 2013 at 8:04 PM, Jack Howarth  wrote:
>   The attached patch fixes the broken bootstrap on darwin in libsanitizer due 
> to the
> deprecation of the dynamic/asan_interceptors_dynamic.cc file. The patch also 
> eliminates
> the merge of the deprecated subdirectory in merge.sh. Bootstrap and asan.exp 
> regression
> tested on x86_64-apple-darwin12. Okay for gcc trunk?
> Jack


Re: [PR middle-end/56108] handle transactions with ASMs in the first block

2013-02-21 Thread Richard Henderson
On 02/20/2013 07:35 AM, Aldy Hernandez wrote:
> In the following test, the first statement of a relaxed transaction is
> an inline asm:
> 
>   __transaction_relaxed { __asm__(""); }
> 
> Since we bypass inserting BUILT_IN_TM_IRREVOCABLE at the beginning of
> transactions that are sure to be irrevocable, later when we try to
> expand the transaction, we ICE when we encounter the inline asm.
> 
> Currently, we bypass the TM_IRREVOCABLE call here:
> 
>  for (region = d->all_tm_regions; region; region = region->next)
> {
>   /* If we're sure to go irrevocable, don't transform anything.  */
>   if (d->irrevocable_blocks_normal
>   && bitmap_bit_p (d->irrevocable_blocks_normal,
>region->entry_block->index))
> {
>   transaction_subcode_ior (region, GTMA_DOES_GO_IRREVOCABLE);
>   transaction_subcode_ior (region, GTMA_MAY_ENTER_IRREVOCABLE);
>   continue;
> }
> 
> If I understand this correctly, ideally a transaction marked as
> doesGoIrrevocable shouldn't bother instrumenting the statements inside,
> since the runtime will go irrevocable immediately.  In which case, we
> can elide the instrumentation altogether as the attached patch does.
> 
> If my analysis is correct, then testsuite/gcc.dg/tm/memopt-1.c would
> surely go irrevocable, thus requiring no instrumentation, causing the
> memory optimizations to get skipped altogether.  In which case, it's
> best to mark the function calls as safe, so they don't cause the
> transaction to become obviously irrevocable.
> 
> Is this correct?  If so, OK?

Yes, that's correct.  The patch is ok.


r~


Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Jan Hubicka
> > This was not for jump shortening, but the inliner heuristics.
> > 
> > In the worst case we could separate the two, would be a larger
> > patch though.
> 
> Actually it's already separated, I don't affect the jump shortening
> at all. Only the inliner code adds the limit.
> 
> So it would depend whether 1000 * weight is large enough for inlining.
> Honza?

I think it is safely more than enough. 1000*weight makes the function safely
uninlinable. So only it cuts of precision in large function growth/large unit
growth logic.  The logic is skewed here: large function growth is there really
for compiler visible code to prevent non-linear algorithm explosion.  For large
unit growth I duno - in a way those gigantic asm statements that are not going
to be duplicated can be tought as not part of the unit for this putpose, too.

So I am fine with the cutoff.  We may need to add more overflow guards (we
already have quite few for time) that makes me wonder if all this should not be
done all in saturating arithmetic now when it can be done theoretically with one
C++ class?

Honza
> 
> -Andi


Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Andi Kleen
> So I am fine with the cutoff.  We may need to add more overflow guards (we
> already have quite few for time) that makes me wonder if all this should not 
> be
> done all in saturating arithmetic now when it can be done theoretically with 
> one
> C++ class?

Sounds like a good idea, although I don't know how intrusive that is. But can
try. 

Still would like to use the 1000 for the 4.7 backport. Ok?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Richard Earnshaw

On 21/02/13 15:59, Andi Kleen wrote:

That doesn't sound enough, unless there is already code out there
that respects this count.  1000 at 4 bytes per instruction is only
4k.   More that small enough for the rest of the compiler to think
that it could jump around such blocks cheaply.

I think a limit of 1M or more might be more appropriate.


I got an overflow for 6.4M, so 1M would be dangerously near that.
100k perhaps ?

This was not for jump shortening, but the inliner heuristics.



My mistake.  However, the real problem here is that you're summing a 
potentially large number of items on the expectation that the result 
will be small enough to be scaled successfully by the weight function.


A better patch would be to a apply a much larger limit at the time when 
the count of instructions is finally used, rather than picking some 
arbitrary number here in the hope that it won't cause overflow later on.


Yes, that's a bigger patch, but it would be a more robust solution than 
picking an arbitrary number here.



In the worst case we could separate the two, would be a larger
patch though.

-Andi






Re: Re: [PATCH] C++ math constants

2013-02-21 Thread 3dw4rd
 
On 02/21/13, Alec Teal wrote:
 
On 21/02/13 16:32, Ulrich Drepper wrote:
> How about the attached file as a start for . I used the
> constexpr approach (instead of function calls) and replicated the
> constants that are available in  in Unix.
>
> What other constants to add?
Pi/3
ln(3)
ln(10) (for base conversions)
sqrt(3)
sqrt(5)
sqrt(7)
1/e

I'll write down what I use in my day to day stuff, keep you posted.


I use the Euler constant a *lot* in special functions.  Also, I use the golden 
ratio to size dialog boxes in GUI work.
 
   // Constant @f$  @f$.
  static constexpr _RealType 
__pi_third(1.0471975511965977461542144610931676L);
  // Constant @f$ \gamma_E @f$.
  static constexpr _RealType 
__gamma_e(0.5772156649015328606065120900824024L);
  // Constant @f$ \phi = (1 + \sqrt(5))/2 @f$.
  static constexpr _RealType 
__golden_ratio(1.6180339887498948482045868343656381L);



Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)

2013-02-21 Thread Richard Henderson
On 02/21/2013 08:42 AM, Jakub Jelinek wrote:
> @@ -1243,12 +1258,20 @@ resolve_shift_zext (rtx insn)
>dest_reg = simplify_gen_subreg_concatn (word_mode, SET_DEST (set),
>GET_MODE (SET_DEST (set)),
>offset1);
> -  dest_zero = simplify_gen_subreg_concatn (word_mode, SET_DEST (set),
> -   GET_MODE (SET_DEST (set)),
> -   offset2);
> +  dest_upper = simplify_gen_subreg_concatn (word_mode, SET_DEST (set),
> + GET_MODE (SET_DEST (set)),
> + offset2);
>src_reg = simplify_gen_subreg_concatn (word_mode, op_operand,
>   GET_MODE (op_operand),
>   src_offset);
> +  if (GET_CODE (op) == ASHIFTRT
> +  && INTVAL (XEXP (op, 1)) != 2 * BITS_PER_WORD - 1)
> +{
> +  rtx tem = expand_shift (RSHIFT_EXPR, word_mode, copy_rtx (src_reg),
> +   BITS_PER_WORD - 1, dest_upper, 0);
> +  if (dest_upper != tem)
> + emit_move_insn (dest_upper, tem);
> +}
>if (GET_CODE (op) != ZERO_EXTEND)
>  {
>int shift_count = INTVAL (XEXP (op, 1));
> @@ -1257,12 +1280,15 @@ resolve_shift_zext (rtx insn)
>   LSHIFT_EXPR : RSHIFT_EXPR,
>   word_mode, src_reg,
>   shift_count - BITS_PER_WORD,
> - dest_reg, 1);
> + dest_reg, GET_CODE (op) != ASHIFTRT);
>  }
>  
>if (dest_reg != src_reg)
>  emit_move_insn (dest_reg, src_reg);
> -  emit_move_insn (dest_zero, CONST0_RTX (word_mode));
> +  if (GET_CODE (op) != ASHIFTRT)
> +emit_move_insn (dest_upper, CONST0_RTX (word_mode));
> +  else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
> +emit_move_insn (dest_upper, copy_rtx (src_reg));
>insns = get_insns ();

Am I missing something?  This looks like it would clobber the input too
early in the case of

(set (reg:DI x) (ashiftrt (reg:DI x) (const_int 60)))

where src_reg and dest_upper could resolve to the same concatn, and thus
the same SImode registers underneath?

Don't you need to delay that upper copy til the final block?


r~




Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 10:40:18AM -0800, Richard Henderson wrote:
> Am I missing something?  This looks like it would clobber the input too
> early in the case of
> 
>   (set (reg:DI x) (ashiftrt (reg:DI x) (const_int 60)))
> 
> where src_reg and dest_upper could resolve to the same concatn, and thus
> the same SImode registers underneath?
> 
> Don't you need to delay that upper copy til the final block?

You're right, thanks for catching that.

So, do you prefer e.g. the following, or would you instead prefer
that I just set some rtx to copy_rtx (src_reg) before the
if (GET_CODE (op) != ZERO_EXTEND)
  {
...
  }
block and do the >> (BITS_PER_WORD - 1) shift only after this (then it can
be with dest_upper as target and moved only if expand_shift returns
something different)?

For the patch below, I've verified no codegen changes from the earlier patch
on the 3 testcases from the PR.

2013-02-21  Jakub Jelinek  

PR rtl-optimization/50339
* lower-subreg.h (struct lower_subreg_choices): Add splitting_ashiftrt
field.
* lower-subreg.c (compute_splitting_shift): Handle ASHIFTRT.
(compute_costs): Call compute_splitting_shift also for ASHIFTRT
into splitting_ashiftrt field.
(find_decomposable_shift_zext, resolve_shift_zext): Handle also
ASHIFTRT.
(dump_choices): Fix up printing LSHIFTRT choices, print ASHIFTRT
choices.

--- gcc/lower-subreg.h.jj   2013-02-21 14:10:39.033592663 +0100
+++ gcc/lower-subreg.h  2013-02-21 15:26:18.773634801 +0100
@@ -34,6 +34,7 @@ struct lower_subreg_choices {
  should be split.  */
   bool splitting_ashift[MAX_BITS_PER_WORD];
   bool splitting_lshiftrt[MAX_BITS_PER_WORD];
+  bool splitting_ashiftrt[MAX_BITS_PER_WORD];
 
   /* True if there is at least one mode that is worth splitting.  */
   bool something_to_do;
--- gcc/lower-subreg.c.jj   2013-02-21 14:10:38.975592966 +0100
+++ gcc/lower-subreg.c  2013-02-21 19:49:02.289774205 +0100
@@ -57,9 +57,9 @@ along with GCC; see the file COPYING3.
to do this.
 
This pass only splits moves with modes that are wider than
-   word_mode and ASHIFTs, LSHIFTRTs and ZERO_EXTENDs with integer
-   modes that are twice the width of word_mode.  The latter could be
-   generalized if there was a need to do this, but the trend in
+   word_mode and ASHIFTs, LSHIFTRTs, ASHIFTRTs and ZERO_EXTENDs with
+   integer modes that are twice the width of word_mode.  The latter
+   could be generalized if there was a need to do this, but the trend in
architectures is to not need this.
 
There are two useful preprocessor defines for use by maintainers:
@@ -152,7 +152,7 @@ compute_splitting_shift (bool speed_p, s
 bool *splitting, enum rtx_code code,
 int word_move_zero_cost, int word_move_cost)
 {
-  int wide_cost, narrow_cost, i;
+  int wide_cost, narrow_cost, upper_cost, i;
 
   for (i = 0; i < BITS_PER_WORD; i++)
 {
@@ -163,13 +163,20 @@ compute_splitting_shift (bool speed_p, s
   else
narrow_cost = shift_cost (speed_p, rtxes, code, word_mode, i);
 
+  if (code != ASHIFTRT)
+   upper_cost = word_move_zero_cost;
+  else if (i == BITS_PER_WORD - 1)
+   upper_cost = word_move_cost;
+  else
+   upper_cost = shift_cost (speed_p, rtxes, code, word_mode,
+BITS_PER_WORD - 1);
+
   if (LOG_COSTS)
fprintf (stderr, "%s %s by %d: original cost %d, split cost %d + %d\n",
 GET_MODE_NAME (twice_word_mode), GET_RTX_NAME (code),
-i + BITS_PER_WORD, wide_cost, narrow_cost,
-word_move_zero_cost);
+i + BITS_PER_WORD, wide_cost, narrow_cost, upper_cost);
 
-  if (FORCE_LOWERING || wide_cost >= narrow_cost + word_move_zero_cost)
+  if (FORCE_LOWERING || wide_cost >= narrow_cost + upper_cost)
splitting[i] = true;
 }
 }
@@ -248,6 +255,9 @@ compute_costs (bool speed_p, struct cost
   compute_splitting_shift (speed_p, rtxes,
   choices[speed_p].splitting_lshiftrt, LSHIFTRT,
   word_move_zero_cost, word_move_cost);
+  compute_splitting_shift (speed_p, rtxes,
+  choices[speed_p].splitting_ashiftrt, ASHIFTRT,
+  word_move_zero_cost, word_move_cost);
 }
 }
 
@@ -1153,6 +1163,7 @@ find_decomposable_shift_zext (rtx insn,
   op = SET_SRC (set);
   if (GET_CODE (op) != ASHIFT
   && GET_CODE (op) != LSHIFTRT
+  && GET_CODE (op) != ASHIFTRT
   && GET_CODE (op) != ZERO_EXTEND)
 return false;
 
@@ -1173,6 +1184,8 @@ find_decomposable_shift_zext (rtx insn,
 {
   bool *splitting = (GET_CODE (op) == ASHIFT
 ? choices[speed_p].splitting_ashift
+: GET_CODE (op) == ASHIFTRT
+? choices[speed_p].splitting_ashiftrt
 : choices[speed_p].splitting

Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)

2013-02-21 Thread Richard Henderson
On 02/21/2013 10:58 AM, Jakub Jelinek wrote:
> So, do you prefer e.g. the following, or would you instead prefer
> that I just set some rtx to copy_rtx (src_reg) before the
> if (GET_CODE (op) != ZERO_EXTEND)
>   {
> ...
>   }
> block and do the >> (BITS_PER_WORD - 1) shift only after this (then it can
> be with dest_upper as target and moved only if expand_shift returns
> something different)?

This new patch is fine.

> For the patch below, I've verified no codegen changes from the earlier patch
> on the 3 testcases from the PR.

Excellent.


r~


Re: [PATCH] C++ math constants

2013-02-21 Thread Oleg Endo
On Thu, 2013-02-21 at 17:14 +, Alec Teal wrote:
> On 21/02/13 16:32, Ulrich Drepper wrote:
> > How about the attached file as a start for .  I used the
> > constexpr approach (instead of function calls) and replicated the
> > constants that are available in  in Unix.
> >
> > What other constants to add?
> Pi/3

> 1/e
> 

I'm wondering, why adding constants for these if the compiler can figure
it out from e.g. "math::pi/3" (or whatever the namespace is)?

> ln(3)
> ln(10) (for base conversions)
> sqrt(3)
> sqrt(5)
> sqrt(7)

Aren't these evaluated by the compiler and converted to constants?

Cheers,
Oleg



[LIBFFI] Re: Re: [PATCH] Add support for PaX enable kernels (MPROTECT)

2013-02-21 Thread Dave Korn
On 07/11/2012 00:14, Magnus Granberg wrote:

> 2012-11-07  Magnus Granberg   Pavel Labushev   
>* configure.ac: Add --enable-pax_emutramp for PaX enable kernels.
>* src/closures.c: Add emutramp_enabled_check. Don't mmap with PROT_EXEC
>   on PaX enable Kernels.
>* README: Add description for --enable-pax_emutramp.
>* fficonfig.h.in: Rebuilt.
>* configure.ac: Rebuilt.

Hi lists,

  There was a small problem with this (upstream relative to gcc) libffi
patch(*).  The entire #ifdef FFI_MMAP_EXEC_EMUTRAMP_PAX clause is contained
within an outer #if !defined(X86_WIN32) && !defined(X86_WIN64) clause.  That
means that Windows platforms don't get the default definition of
is_emutramp_enabled() supplied by the #else clause.  However,
is_emutramp_enabled() is unconditionally referenced in dlmmap(), and without
this default definition Windows targets fail to compile.

  The attached patch fixes this by moving the #else clause with the default
is_emutramp_enabled() definition to a standalone #ifndef clause outside any
enclosing conditional compilation test.  I couldn't think of a better way to
do it; the #if !(windows) clause is followed by a #elif (cygwin/interix)
clause, so I'd have had to put a default definition in there and also a second
one in a subsequent unconditional #else if I didn't move it out of the
enclosing #if scope altogether.

  Gcc-patches: Assuming AG approves, can we commit this without waiting for an
upstream libffi release and doing a full merge?  Currently GCC HEAD won't
build libffi (and hence libjava) without it.

2013-02-21  Dave Korn  

* src/closures.c (is_emutramp_enabled [!FFI_MMAP_EXEC_EMUTRAMP_PAX]):
Move default definition outside enclosing #if scope.

cheers,
  DaveK
-- 
(*) - Patch: http://sourceware.org/ml/libffi-discuss/2012/msg00269.html
- Thread: http://sourceware.org/ml/libffi-discuss/2012/threads.html#00247
Index: src/closures.c
===
--- src/closures.c	(revision 196167)
+++ src/closures.c	(working copy)
@@ -189,8 +189,6 @@ emutramp_enabled_check (void)
 
 #define is_emutramp_enabled() (emutramp_enabled >= 0 ? emutramp_enabled \
: (emutramp_enabled = emutramp_enabled_check ()))
-#else
-#define is_emutramp_enabled() 0
 #endif /* FFI_MMAP_EXEC_EMUTRAMP_PAX */
 
 #elif defined (__CYGWIN__) || defined(__INTERIX)
@@ -202,6 +200,10 @@ emutramp_enabled_check (void)
 
 #endif /* !defined(X86_WIN32) && !defined(X86_WIN64) */
 
+#ifndef FFI_MMAP_EXEC_EMUTRAMP_PAX
+#define is_emutramp_enabled() 0
+#endif /* FFI_MMAP_EXEC_EMUTRAMP_PAX */
+
 /* Declare all functions defined in dlmalloc.c as static.  */
 static void *dlmalloc(size_t);
 static void dlfree(void*);


Re: [PATCH] C++ math constants

2013-02-21 Thread Jakub Jelinek
On Thu, Feb 21, 2013 at 08:06:02PM +0100, Oleg Endo wrote:
> On Thu, 2013-02-21 at 17:14 +, Alec Teal wrote:
> > On 21/02/13 16:32, Ulrich Drepper wrote:
> > > How about the attached file as a start for .  I used the
> > > constexpr approach (instead of function calls) and replicated the
> > > constants that are available in  in Unix.
> > >
> > > What other constants to add?
> > Pi/3
> 
> > 1/e
> > 
> 
> I'm wondering, why adding constants for these if the compiler can figure
> it out from e.g. "math::pi/3" (or whatever the namespace is)?

It can't, you generally want to avoid double rounding for these, first
rounding infinite precision pi to the corresponding floating point format and
then performing in that finite precision division by 3 (and rounding again
to that precision) might very well give different results from dividing
infinite precision pi by 3 and only then rounding the result to the
corresponding precision.  Haven't tried it, so you might be lucky and in
some of float, double or long double it might be equal.

> > ln(3)
> > ln(10) (for base conversions)
> > sqrt(3)
> > sqrt(5)
> > sqrt(7)
> 
> Aren't these evaluated by the compiler and converted to constants?

Only if optimizing.  Furthermore, I don't think std::sqrt etc. are
constexpr, therefore you can't use them to initialize constexpr vars or in
constexpr functions.

Jakub


Re: [LIBFFI] Re: Re: [PATCH] Add support for PaX enable kernels (MPROTECT)

2013-02-21 Thread Anthony Green
On Thu, Feb 21, 2013 at 2:22 PM, Dave Korn  wrote:
>   Gcc-patches: Assuming AG approves, can we commit this without waiting for an
> upstream libffi release and doing a full merge?  Currently GCC HEAD won't
> build libffi (and hence libjava) without it.

This patch looks fine, thanks.  I don't plan to merge back into GCC
for at least a week or two, so I think you should commit it to the GCC
tree independently.

This means that 3.0.12 is broken for Cygwin.  Are you able to produce
test results once you apply this change?  I should probably issue a
quick 3.0.13 if the results are decent.

Thanks,

AG


>
> 2013-02-21  Dave Korn  
>
> * src/closures.c (is_emutramp_enabled [!FFI_MMAP_EXEC_EMUTRAMP_PAX]):
> Move default definition outside enclosing #if scope.
>
> cheers,
>   DaveK
> --
> (*) - Patch: http://sourceware.org/ml/libffi-discuss/2012/msg00269.html
> - Thread: http://sourceware.org/ml/libffi-discuss/2012/threads.html#00247


Re: FW: [PATCH] [MIPS] microMIPS gcc support

2013-02-21 Thread Richard Sandiford
"Moore, Catherine"  writes:
>> Looks good otherwise, but please post an updated patch.  It's probably
>> stating the obvious, but the patch is too invasive for this late stage
>> of 4.8, so
>> it'll need to wait until 4.9.
>> 
>
> The updated patch is attached.  How's it looking this time?

Almost there. :-)

As I mentioned in an earlier reply, please always check for warnings.
If you're not bootstrapping on a MIPS host then instead configure the
cross compiler with --enable-werror-always, using a recent GCC as the
host compiler.  If the patch goes in with warnings, it'll break the
build for other people.

The warnings I could see are:

> +  for (n = 0; n < XVECLEN (pattern, 0); n++)
> +{
> +  rtx set, reg, mem, this_base;
> +  HOST_WIDE_INT this_offset;
> +  unsigned int this_regno;

this_regno is an unused variable (thanks for making it unused though).

>  static bool
>  mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED)
>  {
> +
> +  unsigned int compression_mode = mips_get_compress_mode (decl);

compression_mode is unused.

> +static bool
> +umips_build_save_restore (mips_save_restore_fn fn,
> +   unsigned *mask, HOST_WIDE_INT *offset)
> +{
> +  int i, nregs;
> +  unsigned int j;
> +  rtx pattern, set, reg, mem;
> +  HOST_WIDE_INT this_offset;
> +  rtx this_base;
> +
> +  /* Try matching $16 to $31 (s0 to ra).  */
> +  for (i = 0; i < ARRAY_SIZE (umips_swm_mask); i++)
> +if ((*mask & 0x) == umips_swm_mask[i])
> +  break;
> +
> +  if (i == ARRAY_SIZE (umips_swm_mask))
> +return false;

"i" needs to be unsigned to avoid a "comparison between signed and unsigned"
warning.

> +/* Return the assembly instruction for microMIPS lwm or swm.
> +   SAVE_P and PATTERN are as for umips_save_restore_pattern_p.  */
> +
> +const char *
> +umips_output_save_restore (bool save_p, rtx pattern)
> +{
> +  static char buffer[300];
> +  char *s;
> +  int n;
> +  HOST_WIDE_INT offset;
> +  rtx base, mem, set, reg, last_set, last_reg;
> +
> +  /* Parse the pattern.  */
> +  gcc_assert (umips_save_restore_pattern_p (save_p, pattern));
> +
> +  s = strcpy (buffer, save_p ? "swm\t" : "lwm\t");
> +  s += strlen (s);
> +  n = XVECLEN (pattern, 0);
> +
> +  set = XVECEXP (pattern, 0, 0);
> +  reg = save_p ? SET_SRC (set) : SET_DEST (set);

"reg" is unused and should be deleted.

In case it sounds otherwise, I didn't build the patch to try to catch you out.
I built it so that I could try to answer:

> > Is this a test of the swap_p case?  Thanks if so, but could you add a 
> > comment
> > saying where the SWP gets generated, and why the test needs to be skipped
> > at -O1 and -Os?
> > 
> Yes, this was the swap_p case.  But it turns out that umips-lpw-swp-1.c
> was also testing the swap_p case.  I've now dropped this test in favor
> of a different test that exercises the swap_p = false case.
> You had mentioned testing for explicit registers.  This is turning out
> to be problematic.  The Linux and ELF tool chains don't always use the
> same register set.  In addition, -O3 seems to expose more opportunities
> for generating these instructions so the code generated isn't consistent
> among optimization options.  I've changed it back to just testing for
> swp, but I'm open to other suggestions.

I've attached some tests below, both for LWP (which didn't seem to be
covered) and SWP.  They use things like multiplication to force a
particular load or store order in scheduled code, so that both swap_p
and !swap_p are covered (verified using both printfs and -fno-peephole2).
umips-lwp-7.c tests for:

  /* Avoid invalid load pair instructions.  */
  if (load_p && REGNO (first_reg) == REGNO (base1))
return false;

  /* We must avoid this case for anti-dependence.
 Ex:  lw $3, 4($3)
  lw $2, 0($3)
 first_reg is $2, but the base is $3.  */
  if (load_p
  && swap_p
  && REGNO (first_reg) + 1 == REGNO (base1))
return false;

while umips-lwp-8.c is an example of something that would be pessimised
by not having the swap_p test above.

I also noticed a couple of other things while testing:

- ISA_HAS_LOAD_DELAY should be false for microMIPS, otherwise we'll try
  to insert nops after LWP on targets that default to -mips1, and ICE
  when we see that we have a double SET.

- In the testsuite, a -mmicromips test option should override an ambient
  -mips16, and vice versa.

- umips-movep-2.c fails on n64 because there the moves are DImode rather
  than SImode.  Since the initial patch is just for 32-bit, I forced
  -mgp32 for now.  Also, the function needs a MICROMIPS attribute.

There were various other minor formatting and comment nits that were probably
easier for me to fix than describe.  There was also some trailing whitespace.

The patch is OK for 4.9 with this one applied on top of it, provided that
it works with your setup of course.  Thanks for your patience!

Richard


Index: gcc/config/mips/constraints.md
==

[committed] Fix expand_mult (PR middle-end/56420)

2013-02-21 Thread Jakub Jelinek
Hi!

I've committed the following fix for the following testcase.
When scalar_op1 is 0x8000 with 64-bit HWI,
it matches EXACT_POWER_OF_2_OR_ZERO_P, but we should expand it as
negation of the << 63 shift rather than the << 63 shift alone.

The patch also improves multiplication by 0x8000
in TImode, which can be done as << 63 shift, and multiplication by
-(((TImode)1) << N) for N 0 through 62, which can be also expanded as
negation of left shift.

Furthermore I've noticed several places where we could invoke signed integer
overflows inside of the compiler and fixed them.

Bootstrapped/regtested on x86_64-linux and i686-linux, acked by Richard
on IRC, committed to trunk.

2013-02-21  Jakub Jelinek  

PR middle-end/56420
* expmed.c (EXACT_POWER_OF_2_OR_ZERO_P): Do subtraction in uhwi, to
avoid signed wrapping.
(expand_mult): Handle properly multiplication by
((dword_type) -1) << (BITS_PER_WORD - 1).  Improve multiplication by
((dword_type) 1) << (BITS_PER_WORD - 1).  Avoid undefined behavior
in the compiler if coeff is HOST_WIDE_INT_MIN.
(expand_divmod): Don't make ext_op1 static, change it's type to
uhwi.  Avoid undefined behavior in -INTVAL (op1).

* gcc.dg/torture/pr56420.c: New test.

--- gcc/expmed.c.jj 2013-02-13 21:46:52.0 +0100
+++ gcc/expmed.c2013-02-21 19:25:05.692298011 +0100
@@ -64,7 +64,8 @@ static rtx expand_smod_pow2 (enum machin
 static rtx expand_sdiv_pow2 (enum machine_mode, rtx, HOST_WIDE_INT);
 
 /* Test whether a value is zero of a power of two.  */
-#define EXACT_POWER_OF_2_OR_ZERO_P(x) (((x) & ((x) - 1)) == 0)
+#define EXACT_POWER_OF_2_OR_ZERO_P(x) \
+  (((x) & ((x) - (unsigned HOST_WIDE_INT) 1)) == 0)
 
 struct init_expmed_rtl
 {
@@ -3079,7 +3080,10 @@ expand_mult (enum machine_mode mode, rtx
  /* If we are multiplying in DImode, it may still be a win
 to try to work with shifts and adds.  */
  if (CONST_DOUBLE_HIGH (scalar_op1) == 0
- && CONST_DOUBLE_LOW (scalar_op1) > 0)
+ && (CONST_DOUBLE_LOW (scalar_op1) > 0
+ || (CONST_DOUBLE_LOW (scalar_op1) < 0
+ && EXACT_POWER_OF_2_OR_ZERO_P
+  (CONST_DOUBLE_LOW (scalar_op1)
{
  coeff = CONST_DOUBLE_LOW (scalar_op1);
  is_neg = false;
@@ -3109,7 +3113,8 @@ expand_mult (enum machine_mode mode, rtx
 use synth_mult.  */
 
   /* Special case powers of two.  */
-  if (EXACT_POWER_OF_2_OR_ZERO_P (coeff))
+  if (EXACT_POWER_OF_2_OR_ZERO_P (coeff)
+ && !(is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT))
return expand_shift (LSHIFT_EXPR, mode, op0,
 floor_log2 (coeff), target, unsignedp);
 
@@ -3124,13 +3129,24 @@ expand_mult (enum machine_mode mode, rtx
 result is interpreted as an unsigned coefficient.
 Exclude cost of op0 from max_cost to match the cost
 calculation of the synth_mult.  */
+ coeff = -(unsigned HOST_WIDE_INT) coeff;
  max_cost = (set_src_cost (gen_rtx_MULT (mode, fake_reg, op1), speed)
  - neg_cost(speed, mode));
- if (max_cost > 0
- && choose_mult_variant (mode, -coeff, &algorithm,
- &variant, max_cost))
+ if (max_cost <= 0)
+   goto skip_synth;
+
+ /* Special case powers of two.  */
+ if (EXACT_POWER_OF_2_OR_ZERO_P (coeff))
+   {
+ rtx temp = expand_shift (LSHIFT_EXPR, mode, op0,
+  floor_log2 (coeff), target, unsignedp);
+ return expand_unop (mode, neg_optab, temp, target, 0);
+   }
+
+ if (choose_mult_variant (mode, coeff, &algorithm, &variant,
+  max_cost))
{
- rtx temp = expand_mult_const (mode, op0, -coeff, NULL_RTX,
+ rtx temp = expand_mult_const (mode, op0, coeff, NULL_RTX,
&algorithm, variant);
  return expand_unop (mode, neg_optab, temp, target, 0);
}
@@ -3813,13 +3829,12 @@ expand_divmod (int rem_flag, enum tree_c
   int op1_is_constant, op1_is_pow2 = 0;
   int max_cost, extra_cost;
   static HOST_WIDE_INT last_div_const = 0;
-  static HOST_WIDE_INT ext_op1;
   bool speed = optimize_insn_for_speed_p ();
 
   op1_is_constant = CONST_INT_P (op1);
   if (op1_is_constant)
 {
-  ext_op1 = INTVAL (op1);
+  unsigned HOST_WIDE_INT ext_op1 = UINTVAL (op1);
   if (unsignedp)
ext_op1 &= GET_MODE_MASK (mode);
   op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1)
@@ -3967,7 +3982,7 @@ expand_divmod (int rem_flag, enum tree_c
   op1_is_pow2 = (op1_is_constant
 && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
  || (! unsignedp
-   

Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support

2013-02-21 Thread Mike Stump
On Feb 20, 2013, at 3:50 PM, Maciej W. Rozycki  wrote:
> BTW, do you happen to know a way to reliable force all our testsuites NOT 
> to delete executables after run?  Personally I think it's missing the 
> point to have them deleted -- how can one debug any regressions then?

So, I go into the source and comment out the cleanup lines…  Would be nice to 
have them conditional on a variable so that one can set that variable in their 
.dejagnurc file.

The limitations of this are, multiple test cases can use the same intermediate 
files and they would not give you a nice 1 to 1 mapping back.  However, this I 
usually side step by only running a subset of the test suite that contains the 
part I'm interested in.

> What do other people do?

We usually can cut and paste the one line and run the one case by hand.  Test 
cases that don't fall into this category, well, suck, I mean, are more annoying.



do not pass PR_INSTRUMENTEDCODE if there is no instrumentation

2013-02-21 Thread Aldy Hernandez
It has come to my attention, by..ahem...you, that in handling the 
testcase from my previous patch:


  __transaction_relaxed { __asm__(""); }

...if we avoid instrumentation altogether, we shouldn't lie to the 
run-time and pass PR_INSTRUMENTEDCODE.


I suppose we could cheat and avoid passing PR_INSTRUMENTEDCODE if we 
ever enter expand_block_tm(), but perhaps we could do a little better as 
with the attached patch.


With this patch, we generate:

  tm_state.2_6 = __builtin__ITM_beginTransaction (16458); [ 
uninstrumentedCode hasNoAbort doesGoIrrevocable readOnly ]


Notice there is no "instrumentedCode".

By the way, can you double check my logic in expand_assign_tm() and 
expand_call_tm(), particularly if we should bother setting 
GTMA_INSTRUMENTED_CODE for thread private variables?


Thanks.
commit a441d916db334b71e76d9f4350ffdf992c058b4f
Author: Aldy Hernandez 
Date:   Thu Feb 21 16:02:11 2013 -0600

* trans-mem.c (expand_transaction): Only set PR_INSTRUMENTEDCODE
if there was instrumented code.
(expand_call_tm): Set GTMA_INSTRUMENTED_CODE.
(expand_assign_tm): Same.
* gimple.h (GTMA_INSTRUMENTED_CODE): New.

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 4bd6b3d..baa5c24 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -661,6 +661,8 @@ struct GTY(()) gimple_statement_omp_atomic_store {
tell the runtime that it should begin the transaction in
serial-irrevocable mode.  */
 #define GTMA_DOES_GO_IRREVOCABLE   (1u << 6)
+/* The transaction has at least one instrumented instruction within.  */
+#define GTMA_INSTRUMENTED_CODE (1u << 7)
 
 struct GTY(()) gimple_statement_transaction
 {
diff --git a/gcc/testsuite/gcc.dg/tm/instrumented-mask.c 
b/gcc/testsuite/gcc.dg/tm/instrumented-mask.c
new file mode 100644
index 000..6cfd3e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/instrumented-mask.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fdump-tree-tmmark" } */
+
+/* If we're sure to go irrevocable, as in the case below, do not pass
+   PR_INSTRUMENTEDCODE to the run-time if there is nothing
+   instrumented within the transaction.  */
+
+int
+main()
+{
+  __transaction_relaxed { __asm__(""); }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times " instrumentedCode" 0 "tmmark" } } */
+/* { dg-final { cleanup-tree-dump "tmmark" } } */
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 71eaa44..315f00c 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -2146,6 +2146,9 @@ expand_assign_tm (struct tm_region *region, 
gimple_stmt_iterator *gsi)
 {
   /* Add thread private addresses to log if applicable.  */
   requires_barrier (region->entry_block, lhs, stmt);
+  // ?? Should we set GTMA_INSTRUMENTED_CODE here if there is no
+  // instrumentation, but requires_barrier added a thread private
+  // address to the log?
   gsi_next (gsi);
   return;
 }
@@ -2153,6 +2156,8 @@ expand_assign_tm (struct tm_region *region, 
gimple_stmt_iterator *gsi)
   // Remove original load/store statement.
   gsi_remove (gsi, true);
 
+  transaction_subcode_ior (region, GTMA_INSTRUMENTED_CODE);
+
   if (load_p && !store_p)
 {
   transaction_subcode_ior (region, GTMA_HAVE_LOAD);
@@ -2230,9 +2235,12 @@ expand_call_tm (struct tm_region *region,
 
   if (fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMCPY)
   || fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMMOVE))
-transaction_subcode_ior (region, GTMA_HAVE_STORE | GTMA_HAVE_LOAD);
+transaction_subcode_ior (region, GTMA_HAVE_STORE
+| GTMA_HAVE_LOAD
+| GTMA_INSTRUMENTED_CODE);
   if (fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMSET))
-transaction_subcode_ior (region, GTMA_HAVE_STORE);
+transaction_subcode_ior (region, GTMA_HAVE_STORE
+| GTMA_INSTRUMENTED_CODE);
 
   if (is_tm_pure_call (stmt))
 return false;
@@ -2243,7 +2251,8 @@ expand_call_tm (struct tm_region *region,
 {
   /* Assume all non-const/pure calls write to memory, except
 transaction ending builtins.  */
-  transaction_subcode_ior (region, GTMA_HAVE_STORE);
+  transaction_subcode_ior (region, GTMA_HAVE_STORE
+  | GTMA_INSTRUMENTED_CODE);
 }
 
   /* For indirect calls, we already generated a call into the runtime.  */
@@ -2602,7 +2611,7 @@ expand_transaction (struct tm_region *region, void *data 
ATTRIBUTE_UNUSED)
   flags |= PR_HASNOABORT;
 if ((subcode & GTMA_HAVE_STORE) == 0)
   flags |= PR_READONLY;
-if (inst_edge)
+if (inst_edge && (subcode & GTMA_INSTRUMENTED_CODE))
   flags |= PR_INSTRUMENTEDCODE;
 if (uninst_edge)
   flags |= PR_UNINSTRUMENTEDCODE;


Re: [PATCH] C++ math constants

2013-02-21 Thread Daniel Krügler
2013/2/21 Benjamin De Kosnik :
>
>> > How about the attached file as a start for .  I used the
>> > constexpr approach (instead of function calls) and replicated the
>> > constants that are available in  in Unix.
>
> then this should really be
>
> ext/cmath
>
>>
>> 1) In this case I miss the corresponding variable definitions, because
>> you violate the ODR, when you have something like the following:
>>
>> #include 
>>
>> template
>> void print(const T& t) { std::cout << t; }
>>
>> int main() {
>>   print(__math_constants::__pi);
>> }
>
> Not seeing it.

An implementation is not required to diagnose it. You will notice a
linker error with Clang for example and this is conforming, because
__math_constants::__pi is ODR-used.

> Say for:
>
> #include 
>
>   // A class for math constants.
>   template
> struct __math_constants
> {
>   // Constant @f$ \pi @f$.
>   static constexpr _RealType __pie =
>   3.1415926535897932384626433832795029L; };
>
> template
> void print(const T& t) { std::cout << t; }
>
> int main()
> {
>   print(__math_constants::__pie);
>   return 0;
> }
>
> I'm not getting any definition, even at -O0.
>
> Any chance you could expand on your thinking here Daniel?

No ;-)

- Daniel


Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support

2013-02-21 Thread Ramana Radhakrishnan
>> What do other people do?
>
> We usually can cut and paste the one line and run the one case by hand.  Test 
> cases that don't fall into this category, well, suck, I mean, are more 
> annoying.
>

These days I find contrib/repro_fail 
file.log.useful to rebuild the failing test(s).

regards
Ramana


RE: PR target/52555: attribute optimize is overriding command line options

2013-02-21 Thread Steve Ellcey

On Mon, Feb 18, 2013 at 12:50:59PM -0600, Aldy Hernandez wrote:
> OK pending tests?

>   PR target/52555
>   * genopinit.c (raw_optab_handler): Use this_fn_optabs.
>   (swap_optab_enable): Same.
>   (init_all_optabs): Use argument instead of global.
>   * tree.h (struct tree_optimization_option): New field
>   target_optabs.
>   * expr.h (init_all_optabs): Add argument to prototype.
>   (TREE_OPTIMIZATION_OPTABS): New.
>   (save_optabs_if_changed): Protoize.
>   * optabs.h: Declare this_fn_optabs.
>   * optabs.c (save_optabs_if_changed): New.
>   Declare this_fn_optabs.
>   (init_optabs): Add argument to init_all_optabs() call.
>   * function.c (invoke_set_current_function_hook): Handle per
>   function optabs.
>   * function.h (struct function): New field optabs.
>   * config/mips/mips.c (mips_set_mips16_mode): Handle when
>   optimization_current_node has changed.
>   * target-globals.h (save_target_globals_default_opts): Protoize.
>   * target-globals.c (save_target_globals_default_opts): New.
> c/family
>   PR target/52555
>   * c-common.c (handle_optimize_attribute): Call
>   save_optabs_if_changed.

Aldy,

Have you gotten any reports of problems with this patch?  It seems to be 
sending cc1 into an infinite
loop during the GCC testsuite for me.  I am testing the mips-mti-linux-gnu 
target and tests like
gcc.target/mips/call-saved-1.c are causing cc1 to suck up all my memory and 
swap space before the
test times out.

I will keep digging and see if I can figure out what is going on but I wanted 
to see if anyone else has
reported this problem.

Steve Ellcey
sell...@imgtec.com


Re: PR target/52555: attribute optimize is overriding command line options

2013-02-21 Thread Aldy Hernandez



Have you gotten any reports of problems with this patch?  It seems to be 
sending cc1 into an infinite
loop during the GCC testsuite for me.  I am testing the mips-mti-linux-gnu 
target and tests like
gcc.target/mips/call-saved-1.c are causing cc1 to suck up all my memory and 
swap space before the
test times out.


No, I have not.


I will keep digging and see if I can figure out what is going on but I wanted 
to see if anyone else has
reported this problem.



Feel free to open a PR and CC me on it.

Thanks.



Re: [PATCH] C++ math constants

2013-02-21 Thread Gabriel Dos Reis
On Thu, Feb 21, 2013 at 4:18 PM, Daniel Krügler
 wrote:
> 2013/2/21 Benjamin De Kosnik :
>>
>>> > How about the attached file as a start for .  I used the
>>> > constexpr approach (instead of function calls) and replicated the
>>> > constants that are available in  in Unix.
>>
>> then this should really be
>>
>> ext/cmath
>>
>>>
>>> 1) In this case I miss the corresponding variable definitions, because
>>> you violate the ODR, when you have something like the following:
>>>
>>> #include 
>>>
>>> template
>>> void print(const T& t) { std::cout << t; }
>>>
>>> int main() {
>>>   print(__math_constants::__pi);
>>> }
>>
>> Not seeing it.
>
> An implementation is not required to diagnose it. You will notice a
> linker error with Clang for example and this is conforming, because
> __math_constants::__pi is ODR-used.
>
>> Say for:
>>
>> #include 
>>
>>   // A class for math constants.
>>   template
>> struct __math_constants
>> {
>>   // Constant @f$ \pi @f$.
>>   static constexpr _RealType __pie =
>>   3.1415926535897932384626433832795029L; };
>>
>> template
>> void print(const T& t) { std::cout << t; }
>>
>> int main()
>> {
>>   print(__math_constants::__pie);
>>   return 0;
>> }
>>
>> I'm not getting any definition, even at -O0.
>>
>> Any chance you could expand on your thinking here Daniel?
>
> No ;-)
>
> - Daniel

There will be a proposal for 'constexpr variable template' in the Bristol
mailing.  It should take care of these issues.

-- Gaby


Re: [PATCH] Fix handling of very long asm statements in inliner

2013-02-21 Thread Andi Kleen
On Thu, Feb 21, 2013 at 07:19:10PM +0100, Andi Kleen wrote:
> > So I am fine with the cutoff.  We may need to add more overflow guards (we
> > already have quite few for time) that makes me wonder if all this should 
> > not be
> > done all in saturating arithmetic now when it can be done theoretically 
> > with one
> > C++ class?
> 
> Sounds like a good idea, although I don't know how intrusive that is. But can
> try. 

I looked at this now and it would be very intrusive. Not only in
tree-inline,  but also various callers elsewhere. And there's no
saturating class either, so that would need to be added too.

I think I prefer the simple solution.

BTW this is a regression, old gccs compiled this.

-Andi


Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...

2013-02-21 Thread Konstantin Serebryany
Jakub, thanks again for cleaning up my mess.

Here is a question regarding your fix:
> -#if ASAN_USE_PREINIT_ARRAY
> +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC)

The PIC macro is an artifact of the GCC build system and is not
directly related the the -fPIC flag?
As I can see, in the gcc build we compile all of asan sources twice:
w/ and w/o "-fPIC -DPIC".
If I move the preinit_array to a separate file (asan_preinit.cc), will
we need to have two builds?
I think we can just build all objects once with -fPIC and then not
link asan_preinit.o into libasan.so
In clang we build the static libasan with -fPIC, it doesn't hurt.
Anyway, I've just committed
http://llvm.org/viewvc/llvm-project?rev=175871&view=rev with
asan_preinit.cc

--kcc


On Thu, Feb 21, 2013 at 5:26 PM, Konstantin Serebryany
 wrote:
> On Thu, Feb 21, 2013 at 5:21 PM, Jakub Jelinek  wrote:
>> On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote:
>>> This commit breaks the build if the BFD linker is used (I have gold on
>>> my box, so I missed it).
>>>
>>> Short repro:
>>> % cat preinit.cc
>>> void foo() {}
>>> __attribute__((section(".preinit_array")))  void (*xxx)(void) = foo;
>>> % g++ preinit.cc -shared # gold
>>> % sudo apt-get remove  binutils-gold
>>> ...
>>> % g++ preinit.cc -shared # bfd
>>> /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO
>>> /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable
>>> section on output
>>> collect2: ld returned 1 exit status
>>> %
>>>
>>> Can we stop building the asan-rt as DSO and leave only the static
>>> variant (as in clang)?
>>
>> No, IMNSHO it is desirable to support also that.
>
> It may cause I more trouble (we've seen a couple of bugs already) then do 
> good.
> Anyway, we can get rid of this later.
>
>>
>> Here is a different fix, so libasan.so will not have .preinit_array, but
>> libasan.a will have it.  Ideally, that hunk should go into a separate
>> source file (asan_preinit.cc ?), be just compiled into an object file,
>> rather than shared library and the compiler driver should include it
>> explicitly in the link.
>>
>> The used attribute is there because, as it isn't (or shouldn't) be exported
>> out of the library, if libasan was built with LTO, it could very well be
>> optimized away.
>>
>> 2013-02-21  Jakub Jelinek  
>>
>> * asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is
>> defined.  Add used attribute.
>>
>> --- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100
>> +++ libsanitizer/asan/asan_rtl.cc   2013-02-21 14:16:28.985547506 +0100
>> @@ -520,11 +520,11 @@ void __asan_init() {
>>}
>>  }
>>
>> -#if ASAN_USE_PREINIT_ARRAY
>> +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC)
>>// On Linux, we force __asan_init to be called before anyone else
>>// by placing it into .preinit_array section.
>>// FIXME: do we have anything like this on Mac?
>> -  __attribute__((section(".preinit_array")))
>> +  __attribute__((section(".preinit_array"), used))
>>void (*__asan_preinit)(void) =__asan_init;
>>  #elif defined(_WIN32) && defined(_DLL)
>>// On Windows, when using dynamic CRT (/MD), we can put a pointer
>
> Thanks!
> May I ask you to commit this to gcc (I have to run away now)?
> I'll put this into upstream (maybe with asan_preinit.cc as you
> suggest) tomorrow.
>
> --kcc
>
>>
>>
>> Jakub