Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)
> Examples of warnings (many of them occuring many up to several dozens of > times): > warning: @anchor should not appear in @heading > warning: @title missing argument > warning: @itemize has text but no @item > warning: node `XXX' is next for `YYY' in menu but not in sectioning > warning: node `XXX' is up for `YYY' in menu but not in sectioning > warning: @itemize has text but no @item > warning: node next `XXX' in menu `YYY' and in sectioning `ZZZ' differ > warning: @tex should only appear at a line beginning > warning: command @tie does not accept arguments > > Ok for trunk? The ChangeLog entries should be split in the various directories. Changes in the ada directory are OK, thanks. Arno
Re: [Patch, Fortran] PR 56385: [4.6/4.7/4.8 Regression] [OOP] ICE with allocatable function result in a procedure-pointer component
Janus Weil wrote: here is a straightforward patch which fixes a regression with procedure-pointer components which have an allocatable result. Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk/4.7/4.6? OK thanks for the patch. Tobias 2013-02-20 Janus Weil PR fortran/56385 * trans-array.c (structure_alloc_comps): Handle procedure-pointer components with allocatable result. 2013-02-20 Janus Weil PR fortran/56385 * gfortran.dg/proc_ptr_comp_37.f90: New.
Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support
"Maciej W. Rozycki" writes: > This issue was originally raised here: > > http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html > > We have a shortcoming in GCC in that we only allow the use half of the FP > MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register > mode (CP0.Status.FR == 1) on MIPS32r2 processors. Furthermore we never > enable the other half (NMADD.fmt and NMSUB.fmt) on those processors. > However this whole instruction subset is always available on MIPS32r2 FPUs > regardless of the mode selected, just as it always has been on FPUs of the > 64-bit ISA line from MIPS IV up. Hmm, this was discussed here: http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00488.html http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00492.html The footnote for COP1X instructions on page 12 of volume 1 of the MIPS32 ISA (v2.50) says: 1. In Release 1 of the Architecture, these instructions are legal only with a MIPS64 processor with 64-bit operations enabled (they are, in effect, actually MIPS64 instructions). In Release 2 of the Architecture, these instructions are legal with either a MIPS32 or MIPS64 processor _which includes a 64-bit floating point unit_. (my emphasis). "which" rather than "that" makes this a bit ambiguous, but various comments in the rest of the manual imply that MIPS32r2 allows an implementation choice between 32-bit and 64-bit FPUs. E.g. page 8 says: Support for 64-bit coprocessors with 32-bit CPUs: These changes allow a 64-bit coprocessor (including an FPU) to be attached to a 32-bit CPU. This enhancement is optional in a Release 2 implementation. and page 45 says: In addition to an Instruction Set Architecture, the MIPS architecture definition includes processing resources such as the set of coprocessor general registers. In Release 1 of the Architecture, the 32-bit registers in MIPS32 were enlarged to 64-bits in MIPS64; however, these 64-bit FPU registers are not backwards compatible. Instead, processors implementing the MIPS64 Architecture provide a mode bit to select either the 32-bit or 64-bit register model. In Release 2 of the Architecture, a 32-bit CPU _may_ include a full 64-bit coprocessor, including a floating point unit which implements the same mode bit to select 32-bit or 64-bit FPU register model. On page 322 of volume 2, the footnote for "Table A-20 MIPS64 COP1X Encoding of Function Field" uses slightly different wording: COP1X instructions are legal only if 64-bit floating point operations are enabled. So was this all a big misunderstanding on my part? The TARGET_FLOAT64 condition came from MIPS themselves, and when challenged they seemed pretty adamant that it was correct. If I was wrong to be convinced by the explanation, I hope you can at least see why I was convinced. :-) If it wasn't a misunderstanding, then the point is that we can't tell from ISA_MIPS32R2 alone whether the target has a 32-bit or 64-bit FPU, but we know that it must have a 64-bit FPU if using TARGET_FLOAT64. > Also, according to MIPS IV ISA documentation these operations are only > fused (i.e. don't match original IEEE 754-1985 accuracy requirements) on > the original MIPS IV R8000 CPU, and MIPS architecture specs don't mention > any limitations of these instructions either, so I have updated the GCC > manual to document that on non-R8000 CPUs (which are ones we really care > about) they are numerically equivalent to computations made with > corresponding individual operations. This part is OK, thanks, and is probably the only bit that's suitable for 4.8 at this stage. Would you mind applying it separately? > Finally, while at it, I found it interesting that we have separate > conditions to cover MADD.fmt/MSUB.fmt (ISA_HAS_FP_MADD4_MSUB4) and > NMADD.fmt/NMADD.fmt (ISA_HAS_NMADD4_NMSUB4) while all the four > instructions need to be implemented as a whole group per data format > supported and cannot be separated (the MIPS architecture specification > explicitly forbids subsetting). The difference between the two conditions > is the former expands to ISA_HAS_FP4, that is enables the subsubset for > any MIPS IV and up FPU while the latter has an extra "&& (!TARGET_MIPS5400 > || TARGET_MAD)" qualifier. > > I went ahead and checked available NEC VR54xx documentation and here's > what I came up with: > > 1. "VR5400 MIPS RISC Microprocessor Family" datasheet (NEC doc #13362) >says: > >"The VR5400 processor family complies with the MIPS IV instruction set >and IEEE-754 floating-point and IEEE-1149.1/1149.1a JTAG specification, >[...]" > > 2. "VR5432 MIPS RISC Microprocessor User's Manual, Volume 2" (NEC doc >#13751) lists all the individual MADD.fmt, MSUB.fmt, NMADD.fmt and >NMSUB.fmt instructions in Chapter 18 "Floating-Point Unit Instruction >Set" with no restrictions as to their availability (the only other >member of the VR54xx family I know of is the VR5464
Re: [PATCH] Don't expand *MEM_REF using extract_bit_field or movmisalign for EXPAND_MEMORY (PR inline-asm/56405)
On Thu, 21 Feb 2013, Jakub Jelinek wrote: > Hi! > > If an input operand of inline-asm doesn't allow registers, but allows > memory, we expand it with EXPAND_MEMORY modifier (the only case of using > that modifier). But the movmisalign code added for 4.6 and especially > the extract_bit_field code added for 4.8 results in getting a REG from > expand_expr called with a *MEM_REF with EXPAND_MEMORY, instead of MEM the > caller expects and can only handle, therefore we ICE. > > This patch fixes it by honoring EXPAND_MEMORY in these cases, it is the sole > responsibility of the inline asm writer to do the right thing in there for > misaligned mems (e.g. gcc could pessimistically expect it might be > misaligned, but user knows it will not). > > Bootstrapped/regtested on {x86_64,i686,armv7hl,ppc,ppc64,s390,s390x}-linux, > ok for trunk? Ok. Thanks, Richard. > 2013-02-21 Jakub Jelinek > > PR inline-asm/56405 > * expr.c (expand_expr_real_1) : Don't > use movmisalign or extract_bit_field for EXPAND_MEMORY modifier. > > * gcc.c-torture/compile/pr56405.c: New test. > > --- gcc/expr.c.jj 2013-01-18 18:09:40.0 +0100 > +++ gcc/expr.c2013-02-20 10:29:34.513143634 +0100 > @@ -9551,6 +9551,7 @@ expand_expr_real_1 (tree exp, rtx target > set_mem_addr_space (temp, as); > align = get_object_alignment (exp); > if (modifier != EXPAND_WRITE > + && modifier != EXPAND_MEMORY > && mode != BLKmode > && align < GET_MODE_ALIGNMENT (mode) > /* If the target does not have special handling for unaligned > @@ -9639,6 +9640,7 @@ expand_expr_real_1 (tree exp, rtx target > if (TREE_THIS_VOLATILE (exp)) > MEM_VOLATILE_P (temp) = 1; > if (modifier != EXPAND_WRITE > + && modifier != EXPAND_MEMORY > && mode != BLKmode > && align < GET_MODE_ALIGNMENT (mode)) > { > --- gcc/testsuite/gcc.c-torture/compile/pr56405.c.jj 2013-02-20 > 10:32:17.807250979 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr56405.c 2013-02-20 > 10:32:46.963090873 +0100 > @@ -0,0 +1,7 @@ > +/* PR inline-asm/56405 */ > + > +void > +foo (void) > +{ > + asm volatile ("" : "+m" (*(volatile unsigned short *) 0x1001UL)); > +} > > Jakub > > -- Richard Biener SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)
On Thu, 21 Feb 2013, Jakub Jelinek wrote: > Hi! > > Currently it is not possible to bootstrap gcc with texinfo 5.0. > This patch attempts to fix the errors that prevent bootstrap, there are tons > of warnings this doesn't address and would be good if somebody more TeXinfo > knowledgeable looked at it. > > Bootstrapped/regtested on x86_64-linux and i686-linux both with texinfo 5.0 > and older texinfo installed. > > The errors were: > ../../gcc/doc/invoke.texi:5615: @itemx must follow @item > ../../gcc/ada/gnat-style.texi:45: unknown command `hfill' > doc/projects.texi:51: @pxref reference to nonexistent node `Scenarios > in Projects' > doc/projects.texi:363: @pxref reference to nonexistent node `Organizing > Projects into Subsystems' > doc/projects.texi:391: @xref reference to nonexistent node `Project > Extension' > ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected > `smallexample', but saw `table' > ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected > `smallexample', but saw `table' > ../../../../../libjava/classpath/doc/cp-tools.texinfo:2030: `@end' expected > `smallexample', but saw `table' > > Examples of warnings (many of them occuring many up to several dozens of > times): > warning: @anchor should not appear in @heading > warning: @title missing argument > warning: @itemize has text but no @item > warning: node `XXX' is next for `YYY' in menu but not in sectioning > warning: node `XXX' is up for `YYY' in menu but not in sectioning > warning: @itemize has text but no @item > warning: node next `XXX' in menu `YYY' and in sectioning `ZZZ' differ > warning: @tex should only appear at a line beginning > warning: command @tie does not accept arguments > > Ok for trunk? Ok. Should we fix the error on branches as well? Thanks, Richard. > 2013-02-21 Jakub Jelinek > > PR bootstrap/56258 > * doc/invoke.texi (-fdump-rtl-pro_and_epilogue): Use @item > instead of @itemx. > > * gnat-style.texi (@title): Remove @hfill. > * projects.texi: Avoid line wrapping inside of @pxref or > @xref. > > * doc/cp-tools.texinfo (Virtual Machine Options): Use just > one @gccoptlist instead of 3 separate ones. > > --- gcc/doc/invoke.texi.jj2013-01-31 22:57:22.0 +0100 > +++ gcc/doc/invoke.texi 2013-02-20 13:06:47.516405739 +0100 > @@ -5612,7 +5612,7 @@ Dump after the peephole pass. > @opindex fdump-rtl-postreload > Dump after post-reload optimizations. > > -@itemx -fdump-rtl-pro_and_epilogue > +@item -fdump-rtl-pro_and_epilogue > @opindex fdump-rtl-pro_and_epilogue > Dump after generating the function prologues and epilogues. > > --- gcc/ada/gnat-style.texi.jj2012-08-10 12:57:33.0 +0200 > +++ gcc/ada/gnat-style.texi 2013-02-20 13:06:03.042667300 +0100 > @@ -42,7 +42,7 @@ Texts. A copy of the license is include > @titlepage > @titlefont{GNAT Coding Style:} > @sp 1 > -@title @hfill A Guide for GNAT Developers > +@title A Guide for GNAT Developers > @subtitle GNAT, The GNU Ada Compiler > @versionsubtitle > @author Ada Core Technologies, Inc. > --- gcc/ada/projects.texi.jj 2013-01-04 11:16:24.0 +0100 > +++ gcc/ada/projects.texi 2013-02-20 17:48:41.582645159 +0100 > @@ -48,8 +48,7 @@ project files allow you to specify: > @item Source file naming conventions; you can specify these either globally > or for >individual compilation units (@pxref{Naming Schemes}). > @item Change any of the above settings depending on external values, thus > enabling > - the reuse of the projects in various @b{scenarios} (@pxref{Scenarios > - in Projects}). > + the reuse of the projects in various @b{scenarios} (@pxref{Scenarios in > Projects}). > @item Automatically build libraries as part of the build process >(@pxref{Library Projects}). > > @@ -360,8 +359,8 @@ locating the specified source files in t > > @item For various reasons, it is sometimes useful to have a project with no >sources (most of the time because the attributes defined in the project > - file will be reused in other projects, as explained in @pxref{Organizing > - Projects into Subsystems}. To do this, the attribute > + file will be reused in other projects, as explained in > + @pxref{Organizing Projects into Subsystems}. To do this, the attribute >@emph{Source_Files} is set to the empty list, i.e. @code{()}. > Alternatively, >@emph{Source_Dirs} can be set to the empty list, with the same >result. > @@ -388,8 +387,9 @@ locating the specified source files in t >This can be done thanks to the attribute @b{Excluded_Source_Files} >(or its synonym @b{Locally_Removed_Files}). >Its value is the list of file names that should not be taken into account. > - This attribute is often used when extending a project, @xref{Project > - Extension}. A similar attribute @b{Excluded_Source_List_File} plays the > same > + This attribute is often used when extending a p
[Patch, Fortran, committed] PR56416 - Fix texinfo 5.0 warnings
Motivated by Jakub's email at http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00966.html The problem was that the order in @menu didn't match the order of the @node (@section/@chapter). I have committed the attached patch as obvious (Rev. 196194). fortran/*.texi is now warning free for "make info" and "make html". ("make pdf" still shows the usual TeX warnings about overfull boxes and similar.) Tobias 2012-02-21 Tobias Burnus PR fortran/56416 * gfortran.texi (Part II: Language Reference, Extensions, Non-Fortran Main Program): Sort @menu to match actual section order. * intrinsic.texi (Intrinsic Procedures): Ditto. (C_F_POINTER, PRECISION): Move to the alphabetically correct place. diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 2dccb16..462b443 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -182,8 +182,8 @@ Part I: Invoking GNU Fortran Part II: Language Reference * Fortran 2003 and 2008 status:: Fortran 2003 and 2008 features supported by GNU Fortran. * Compiler Characteristics:: User-visible implementation details. +* Extensions::Language extensions implemented by GNU Fortran. * Mixed-Language Programming::Interoperability with C -* Extensions:: Language extensions implemented by GNU Fortran. * Intrinsic Procedures:: Intrinsic procedures supported by GNU Fortran. * Intrinsic Modules::Intrinsic modules supported by GNU Fortran. @@ -1348,8 +1348,8 @@ without warning. * Commas in FORMAT specifications:: * Missing period in FORMAT specifications:: * I/O item lists:: -* BOZ literal constants:: * @code{Q} exponent-letter:: +* BOZ literal constants:: * Real array indices:: * Unary operators:: * Implicitly convert LOGICAL and INTEGER values:: @@ -2698,8 +2698,8 @@ the same declaration part as the variable or procedure pointer. * _gfortran_set_options:: Set library option flags * _gfortran_set_convert:: Set endian conversion * _gfortran_set_record_marker:: Set length of record markers -* _gfortran_set_max_subrecord_length:: Set subrecord length * _gfortran_set_fpe:: Set when a Floating Point Exception should be raised +* _gfortran_set_max_subrecord_length:: Set subrecord length @end menu Even if you are doing mixed-language programming, it is very diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi index 91f2fea..4a48425 100644 --- a/gcc/fortran/intrinsic.texi +++ b/gcc/fortran/intrinsic.texi @@ -87,9 +87,9 @@ Some basic guidelines for editing this document: * @code{CHMOD}: CHMOD, Change access permissions of files * @code{CMPLX}: CMPLX, Complex conversion function * @code{COMMAND_ARGUMENT_COUNT}: COMMAND_ARGUMENT_COUNT, Get number of command line arguments -* @code{COMPLEX}: COMPLEX, Complex conversion function -* @code{COMPILER_VERSION}: COMPILER_VERSION, Compiler version string * @code{COMPILER_OPTIONS}: COMPILER_OPTIONS, Options passed to the compiler +* @code{COMPILER_VERSION}: COMPILER_VERSION, Compiler version string +* @code{COMPLEX}: COMPLEX, Complex conversion function * @code{CONJG}: CONJG, Complex conjugate function * @code{COS}: COS, Cosine function * @code{COSH}: COSH, Hyperbolic cosine function @@ -234,12 +234,12 @@ Some basic guidelines for editing this document: * @code{PRESENT}: PRESENT, Determine whether an optional dummy argument is specified * @code{PRODUCT}: PRODUCT, Product of array elements * @code{RADIX}: RADIX, Base of a data model +* @code{RAN}: RAN, Real pseudo-random number +* @code{RAND}: RAND, Real pseudo-random number * @code{RANDOM_NUMBER}: RANDOM_NUMBER, Pseudo-random number * @code{RANDOM_SEED}: RANDOM_SEED, Initialize a pseudo-random number sequence -* @code{RAND}: RAND, Real pseudo-random number * @code{RANGE}: RANGE, Decimal exponent range * @code{RANK} : RANK, Rank of a data object -* @code{RAN}: RAN, Real pseudo-random number * @code{REAL}: REAL, Convert to real type * @code{RENAME}:RENAME,Rename a file * @code{REPEAT}:REPEAT,Repeated string concatenation @@ -2271,60 +2271,57 @@ end subroutine association_test @end table -@node C_FUNLOC -@section @code{C_FUNLOC} --- Obtain the C address of a procedure -@fnindex C_FUNLOC -@cindex pointer, C address of procedures +@node C_F_POINTER +@section @code{C_F_POINTER} --- Convert C into Fortran pointer +@fnindex C_F_POINTER +@cindex pointer, convert C to Fortran @table @asis @item @emph{Description}: -@code{C_FUNLOC(x)} determines the C address of the argument. +@code{C_F_POINTER(CPTR, FPTR[, SHAPE])} assigns the target of the C pointer +@var{CPTR} to the Fortran pointer @var{FPTR} and specifies its shape. @item @emph{Standard}: Fortran 2003 and later @item @emph{Class}: -Inquiry function +Subroutine @item @emph{Synt
Re: [PATCH] Small *.texi{,info} fixes for texinfo 5.0 (PR bootstrap/56258)
On Thu, Feb 21, 2013 at 10:07:56AM +0100, Richard Biener wrote: > > Ok for trunk? > > Ok. Should we fix the error on branches as well? The patch applied cleanly to both 4.7 and 4.6, so I've committed it there, but as I don't have texinfo 5.0 installed on my devel WS (we have it just for Fedora 19 and in the buildsystem for that right now), I haven't verified those are the only errors with texinfo 5.0 on the release branches. Jakub
Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed
On Thu, Feb 21, 2013 at 1:10 AM, Steven Bosscher wrote: > Hello, > > The attached patch splits a new function df_insn_info_delete from > df_insn_delete. The original motivation was to get rid of the silly > "deleting insn with uid = ..." messages when re-scanning an insn, > because the mentioned insn isn't deleted at all (it's just rescanned). > But it turns out that there is also a modest but measurable speed-up > (especially at -O0), probably because of avoiding the overhead of > df_grow_bb_info and df_grow_reg_info in common usage of > df_insn_delete. > > Bootstrapped&tested on powerpc64-unknown-linux-gnu and on > x86_64-unknown-linux-gnu. OK for trunk? +/* Delete all of the refs information from the insn with UID. + Internal helper for df_insn_info_delete, df_insn_rescan, and other df_insn_delete I suppose + df-scan routines that don't have to work in deferred mode and don't + have to mark basic blocks for re-processing. */ -void -df_insn_delete (basic_block bb, unsigned int uid) +static void +df_insn_info_delete (unsigned int uid) { +/* Delete all of the refs information from INSN, either right now + or marked for later in deferred mode. + + FIXME: BB must be passed in, to mark the basic block containing + the insn as dirty, but emit-rtl.c doesn't do so. */ you mean the remove_insn call? So it should call df_insn_info_delete instead? It _does_ mark the block as dirtly later, if it is not a debug insn. Your change changes behavior for df->changeable_flags & DF_DEFER_INSN_RESCAN btw, so it needs someone DF aware to review and that makes it stage1 material as well I think. It's also odd how insn_info == NULL is handled here. Richard. > Ciao! > Steven
[PATCH] Add testcase for PR56398
This adds a reduced testcase for PR56398. Ok for trunk? 2013-02-21 Marek Polacek PR tree-optimization/56398 * g++.dg/torture/pr56398.C: New test. --- gcc/g++.dg/torture/pr56398.C.mp 2013-02-21 10:58:14.388913070 +0100 +++ gcc/g++.dg/torture/pr56398.C2013-02-21 10:58:53.301026474 +0100 @@ -0,0 +1,22 @@ +// { dg-do compile } +// { dg-options "-g" } + +namespace +{ +#0 "/usr/include/c/4.8/bits/postypes.h" 3 +} + +vtkpow (int b) +{ + int a1; + int b1; + int c; + while (b1) +{ + while (b) +b1 = 0; + b1 = b1 - 1; + c = c * a1; +} + return c; +} Marek
Re: [PATCH] Add testcase for PR56398
On Thu, Feb 21, 2013 at 11:05 AM, Marek Polacek wrote: > This adds a reduced testcase for PR56398. Ok for trunk? Ok. Thanks, Richard. > 2013-02-21 Marek Polacek > > PR tree-optimization/56398 > * g++.dg/torture/pr56398.C: New test. > > --- gcc/g++.dg/torture/pr56398.C.mp 2013-02-21 10:58:14.388913070 +0100 > +++ gcc/g++.dg/torture/pr56398.C2013-02-21 10:58:53.301026474 +0100 > @@ -0,0 +1,22 @@ > +// { dg-do compile } > +// { dg-options "-g" } > + > +namespace > +{ > +#0 "/usr/include/c/4.8/bits/postypes.h" 3 > +} > + > +vtkpow (int b) > +{ > + int a1; > + int b1; > + int c; > + while (b1) > +{ > + while (b) > +b1 = 0; > + b1 = b1 - 1; > + c = c * a1; > +} > + return c; > +} > > Marek
Re: [PATCH][RFC] Less TODO_remove_unused_locals
Hunting for the "we're getting slower" bits I noticed that TODO_remove_unused_locals is a big part of execute_function_todo (and accounts for 1% of compile-time of ac.f90). The following patch removes most of the remove_unused_locals calls based on the fact that with anonymous SSA names now available we should never create new locals (wishful thinking of course ...) We still can kill references to existing ones. and the important places to remove unused stuff are driven by 1) avoid creating yet another copy of the unused stuff, thus do it before inlining, on the callee; 2) avoid pinning unused memory while we operate on other function bodies, thus, do it at the end of non-IPA pass pipelines In the end this asks for more explicit placement and thus a real pass ... but the following should be enough as a RFC and be good enough for 4.8. Yes, it looks good to me. I introduced this remove_unused_locals stuff and I was aware it is executed quite too often. I meant to get some data on this but never got around that and moreover the number of execution has increased over the years as we increased number of passes... My recollection is that at that time the pass also did dropping of some flags (ADDRESSABLE?) promoting into Gimple registers. This was motivation to keep it working often since it was code quality thing too. I suppose this is now done elsewhere. Honza
Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed
On Thu, Feb 21, 2013 at 10:59 AM, Richard Biener wrote: > On Thu, Feb 21, 2013 at 1:10 AM, Steven Bosscher wrote: > +/* Delete all of the refs information from the insn with UID. > + Internal helper for df_insn_info_delete, df_insn_rescan, and other > > df_insn_delete I suppose Right. > + df-scan routines that don't have to work in deferred mode and don't > + have to mark basic blocks for re-processing. */ > > -void > -df_insn_delete (basic_block bb, unsigned int uid) > +static void > +df_insn_info_delete (unsigned int uid) > { > > +/* Delete all of the refs information from INSN, either right now > + or marked for later in deferred mode. > + > + FIXME: BB must be passed in, to mark the basic block containing > + the insn as dirty, but emit-rtl.c doesn't do so. */ > > you mean the remove_insn call? So it should call df_insn_info_delete instead? Yes, remove_insn calls df_insn_delete with argument bb=NULL. This goes against the interface documented in df-scan. It should call df_insn_delete with a non-NULL bb argument. df_insn_info_delete should remain internal to df-scan.c. (Ideally, "struct df_insn_info" would be internal to DF, IMHO.) > It _does_ mark the block as dirtly later, if it is not a debug insn. Yes, it does. But it should have been left up to df_insn_delete to make the block dirty. It looks like remove_insn only passes NULL because it calls df_insn_delete on non-insns and it doesn't pass BLOCK_FOR_INSN because it feeds BARRIERs to df_insn_delete -- totally bogus of course... For GCC 4.9 I want to make df_insn_delete take care of the basic block argument itself, and remove all df_set_bb_dirty calls from emit-rtl.c. > Your change changes behavior for df->changeable_flags & > DF_DEFER_INSN_RESCAN btw, Not really. All paths to df_insn_info_delete are non-deferred. > so it needs someone DF aware to review > and that makes it stage1 material as well I think. Also good. I think the patch is quite safe but it's obviously not a bug fix (except perhaps for making the dumps less confusing). Let's consider this queued for GCC 4.9. FWIW, I like to think of myself as quite DF aware ;-) > It's also odd how insn_info == NULL is handled here. Yes, for GCC 4.9 I want to put a gcc_assert in df_insn_info_delete to make sure insn_info is always non-NULL. It can be NULL now for NOTEs and BARRIERs (via the remove_insn path). Ciao! Steven
Re: [PATCH][RFC] Less TODO_remove_unused_locals
On Thu, 21 Feb 2013, Jan Hubicka wrote: > > > > Hunting for the "we're getting slower" bits I noticed that > > > TODO_remove_unused_locals is a big part of execute_function_todo > > > (and accounts for 1% of compile-time of ac.f90). > > > The following patch removes most of the remove_unused_locals > > > calls based on the fact that with anonymous SSA names now available > > > we should never create new locals (wishful thinking of course ...) > > We still can kill references to existing ones. > > > > and the important places to remove unused stuff are driven by > > > 1) avoid creating yet another copy of the unused stuff, thus do > > > it before inlining, on the callee; 2) avoid pinning unused memory > > > while we operate on other function bodies, thus, do it at the end > > > of non-IPA pass pipelines > > > > > > In the end this asks for more explicit placement and thus a > > > real pass ... but the following should be enough as a RFC and > > > be good enough for 4.8. > > Yes, it looks good to me. I introduced this remove_unused_locals stuff > and I was aware it is executed quite too often. I meant to get some data on > this > but never got around that and moreover the number of execution has increased > over the years as we increased number of passes... > > My recollection is that at that time the pass also did dropping of some flags > (ADDRESSABLE?) promoting into Gimple registers. This was motivation to keep > it working often since it was code quality thing too. I suppose this is now > done elsewhere. Yes, that's done with TODO_update_address_taken which is done in a more controlled fashion. Richard.
Re: libsanitizer merge from upstream r175733
On Thu, Feb 21, 2013 at 02:16:19PM +0400, Konstantin Serebryany wrote: > The attached patch is the libsanitizer merge from upstream r175042. > > A few changes. Among other things: > - a second attempt to change the shadow offset to 0x7fff8000 (~5% > speedup). x86_64-linux-only I've been applying that i386.c change + asan_mapping.h and asan_rtl.cc changes in our tree for a few days, and it looks good on all platforms where asan is supported (i686, x86_64, ppc, ppc64). > - asan_init is now called from preinit_array (linux only) > > Patch for libsanitizer is automatically generated by libsanitizer/merge.sh > Tested with > rm -rf */{*/,}libsanitizer \ > && make -j 50 \ > && make -C gcc check-g{cc,++} > RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' > > Our internal LLVM bots (Linux, Mac and Android) are green. > > Ok to commit? Yes, thanks. Jakub
Re: Fix for 56175
On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev wrote: > Richard, > > First of all, your proposal to move type sinking to the end of > function does not work since we handle each statement in function and > we want that 1st type folding of X & C will not happen. > Note that we have the following sequence of gimple before forwprop1: > >x.0_10 = (signed char) x_8; > _11 = x.0_10 & 1; > _12 = (signed char) y_9; > _13 = _12 & 1; > _14 = _11 ^ _13; Ah, indeed. Reminds me of some of my dead patches that separated forwprop into a forward and backward stage. Of course then you have the ordering issue of whether to first forward or backward. Which means that I bet you can construct a testcase that with your change is no longer optimized (just make pushing the conversion make the types _match_). Which is always the case with this kind of local pattern-matching transforms. Currently forwprop processes leafs of expression trees first (well, inside a basic-block), similar to how fold () is supposed to be operated, based on the idea that simplified / canonicalized leafs helps keeping pattern recognition simple and cost considerations more accurate. When one order works better than another you always have to consider that the user could already have written the code in a way that results in the input that isn't well handled. Not that this helps very much for the situation ;) But I don't like the use of first_pass_instance ... and the fix isn't an improvement but just a hack for the benchmark. Richard. > I also added comment to my fix and create new test for it. I also > checked that this test is passed with patched compiler only. So > Change Log was also modified: > > ChangeLog > > 2013-02-20 Yuri Rumyantsev > > PR tree-optimization/56175 > * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking > at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C > for short integer types. > * gcc.dg/pr56175.c: New test. > > > > > 2013/2/20 Richard Biener : >> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>> pattern in simpify_bitwise_binary for short integer types. >>> The fix is very simple - we simply turn off short type sinking at the >>> first pass of forward propagation allows to get >>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>> +5-7% for other x86 platforms too. >>> Bootstrapping and regression testing were successful on x86-64. >>> >>> Is it Ok for trunk? >> >> It definitely needs a comment before the checks. >> >> Also I think it simply shows that the code is placed at the wrong spot. >> Simply moving it down in simplify_bitwise_binary to be done the very last >> should get both of the effects done. >> >> Can you rework the patch according to that? >> >> You also miss a testcase, we should make sure to not regress again here. >> >> Thanks, >> Richard. >> >>> ChangeLog. >>> >>> 2013-02-20 Yuri Rumyantsev >>> >>> PR tree-optimization/56175 >>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type sinking >>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>> for short integer types.
Re: [patch] df-scan: split df_insn_delete for clearer dumps and better speed
> > so it needs someone DF aware to review > > and that makes it stage1 material as well I think. > > Also good. I think the patch is quite safe but it's obviously not a > bug fix (except perhaps for making the dumps less confusing). Let's > consider this queued for GCC 4.9. > FWIW, I like to think of myself as quite DF aware ;-) You're definitely DF aware and if the steering committee wants to appoint you as dataflow reviewer, I would have no reason to complain... Still DF only has reviewers, not maintainers, so it needs someone else DF aware anyway. If it is 4.9 material perhaps it is simpler to place it after the patches that kill the basic block argument. Paolo
Re: [Patch, Fortran] PR 56385: [4.6/4.7/4.8 Regression] [OOP] ICE with allocatable function result in a procedure-pointer component
>> here is a straightforward patch which fixes a regression with >> procedure-pointer components which have an allocatable result. >> Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk/4.7/4.6? > > > OK thanks for the patch. Thanks, Tobias! Committed to trunk as r196202. Will apply to 4.7 and 4.6 in a few days. Cheers, Janus >> 2013-02-20 Janus Weil >> >> PR fortran/56385 >> * trans-array.c (structure_alloc_comps): Handle procedure-pointer >> components with allocatable result. >> >> 2013-02-20 Janus Weil >> >> PR fortran/56385 >> * gfortran.dg/proc_ptr_comp_37.f90: New. > >
Re: Fix for 56175
Richard, As we know Kai is working on this problem for 4.9 and I assume that type sinking will be deleted from forwprop pass. Could we stay on this fix but more common fix will be done. I also can propose to introduce new hook for it but need to know your opinion since we don't went to waste our time on preparing dead patches. Note that x86 supports all short types in HW and such type sinkning is usually useless if short types are involved. Best regards. Yuri. 2013/2/21 Richard Biener : > On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev wrote: >> Richard, >> >> First of all, your proposal to move type sinking to the end of >> function does not work since we handle each statement in function and >> we want that 1st type folding of X & C will not happen. >> Note that we have the following sequence of gimple before forwprop1: >> >>x.0_10 = (signed char) x_8; >> _11 = x.0_10 & 1; >> _12 = (signed char) y_9; >> _13 = _12 & 1; >> _14 = _11 ^ _13; > > Ah, indeed. Reminds me of some of my dead patches that separated > forwprop into a forward and backward stage. Of course then you have > the ordering issue of whether to first forward or backward. > > Which means that I bet you can construct a testcase that with > your change is no longer optimized (just make pushing the conversion > make the types _match_). Which is always the case > with this kind of local pattern-matching transforms. > > Currently forwprop processes leafs of expression trees first (well, inside > a basic-block), similar to how fold () is supposed to be operated, based > on the idea that simplified / canonicalized leafs helps keeping pattern > recognition simple and cost considerations more accurate. > > When one order works better than another you always have to consider > that the user could already have written the code in a way that results > in the input that isn't well handled. > > Not that this helps very much for the situation ;) > > But I don't like the use of first_pass_instance ... and the fix isn't > an improvement but just a hack for the benchmark. > > Richard. > >> I also added comment to my fix and create new test for it. I also >> checked that this test is passed with patched compiler only. So >> Change Log was also modified: >> >> ChangeLog >> >> 2013-02-20 Yuri Rumyantsev >> >> PR tree-optimization/56175 >> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking >> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C >> for short integer types. >> * gcc.dg/pr56175.c: New test. >> >> >> >> >> 2013/2/20 Richard Biener : >>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev wrote: Hi All, This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C pattern in simpify_bitwise_binary for short integer types. The fix is very simple - we simply turn off short type sinking at the first pass of forward propagation allows to get +10% speedup for important benchmark Coremark 1.0 at x86 Atom and +5-7% for other x86 platforms too. Bootstrapping and regression testing were successful on x86-64. Is it Ok for trunk? >>> >>> It definitely needs a comment before the checks. >>> >>> Also I think it simply shows that the code is placed at the wrong spot. >>> Simply moving it down in simplify_bitwise_binary to be done the very last >>> should get both of the effects done. >>> >>> Can you rework the patch according to that? >>> >>> You also miss a testcase, we should make sure to not regress again here. >>> >>> Thanks, >>> Richard. >>> ChangeLog. 2013-02-20 Yuri Rumyantsev PR tree-optimization/56175 * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type sinking at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C for short integer types.
Re: Fix for 56175
On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev wrote: > Richard, > > As we know Kai is working on this problem for 4.9 and I assume that > type sinking will be deleted from forwprop pass. Could we stay on this > fix but more common fix will be done. Well, unless you show it is a regression the patch is not applicable for 4.8 anyway. Not sure if the code will be deleted from forwprop pass in 4.9 either, it is after all a canonicalization - fold seems to perform the opposite one though: /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer constants (if x has signed type, the sign bit cannot be set in c). This folds extension into the BIT_AND_EXPR. note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, not sinking. Generally frontends and fold try to narrow operands when possible (even though some targets later widen them again because of instruction set constraints). Most of this hoisting code was done to make lowering logical && and || I believe. Looking at the testcases added tells us that while matching the two patterns as done now helps them but only because that pattern feeds single-operand instructions that then simplify. So doing the transform starting from that single-operand instructions instead looks like a better fix (op !=/== 0/1 and (T) op) and also would not disagree with the canonicalization done by fold. Richard. > I also can propose to introduce new hook for it but need to know your > opinion since we don't went to waste our time on preparing dead > patches. Note that x86 supports all short types in HW and such type > sinkning is usually useless if short types are involved. > > Best regards. > Yuri. > > > 2013/2/21 Richard Biener : >> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> First of all, your proposal to move type sinking to the end of >>> function does not work since we handle each statement in function and >>> we want that 1st type folding of X & C will not happen. >>> Note that we have the following sequence of gimple before forwprop1: >>> >>>x.0_10 = (signed char) x_8; >>> _11 = x.0_10 & 1; >>> _12 = (signed char) y_9; >>> _13 = _12 & 1; >>> _14 = _11 ^ _13; >> >> Ah, indeed. Reminds me of some of my dead patches that separated >> forwprop into a forward and backward stage. Of course then you have >> the ordering issue of whether to first forward or backward. >> >> Which means that I bet you can construct a testcase that with >> your change is no longer optimized (just make pushing the conversion >> make the types _match_). Which is always the case >> with this kind of local pattern-matching transforms. >> >> Currently forwprop processes leafs of expression trees first (well, inside >> a basic-block), similar to how fold () is supposed to be operated, based >> on the idea that simplified / canonicalized leafs helps keeping pattern >> recognition simple and cost considerations more accurate. >> >> When one order works better than another you always have to consider >> that the user could already have written the code in a way that results >> in the input that isn't well handled. >> >> Not that this helps very much for the situation ;) >> >> But I don't like the use of first_pass_instance ... and the fix isn't >> an improvement but just a hack for the benchmark. >> >> Richard. >> >>> I also added comment to my fix and create new test for it. I also >>> checked that this test is passed with patched compiler only. So >>> Change Log was also modified: >>> >>> ChangeLog >>> >>> 2013-02-20 Yuri Rumyantsev >>> >>> PR tree-optimization/56175 >>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking >>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>> for short integer types. >>> * gcc.dg/pr56175.c: New test. >>> >>> >>> >>> >>> 2013/2/20 Richard Biener : On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev wrote: > Hi All, > > This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C > pattern in simpify_bitwise_binary for short integer types. > The fix is very simple - we simply turn off short type sinking at the > first pass of forward propagation allows to get > +10% speedup for important benchmark Coremark 1.0 at x86 Atom and > +5-7% for other x86 platforms too. > Bootstrapping and regression testing were successful on x86-64. > > Is it Ok for trunk? It definitely needs a comment before the checks. Also I think it simply shows that the code is placed at the wrong spot. Simply moving it down in simplify_bitwise_binary to be done the very last should get both of the effects done. Can you rework the patch according to that? You also miss a testcase, we should make sure to not regress again here. Thanks, Richard. > ChangeLog. > > 2013-02-20 Yuri Rumyantsev >>
Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...
On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote: > This commit breaks the build if the BFD linker is used (I have gold on > my box, so I missed it). > > Short repro: > % cat preinit.cc > void foo() {} > __attribute__((section(".preinit_array"))) void (*xxx)(void) = foo; > % g++ preinit.cc -shared # gold > % sudo apt-get remove binutils-gold > ... > % g++ preinit.cc -shared # bfd > /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO > /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable > section on output > collect2: ld returned 1 exit status > % > > Can we stop building the asan-rt as DSO and leave only the static > variant (as in clang)? No, IMNSHO it is desirable to support also that. Here is a different fix, so libasan.so will not have .preinit_array, but libasan.a will have it. Ideally, that hunk should go into a separate source file (asan_preinit.cc ?), be just compiled into an object file, rather than shared library and the compiler driver should include it explicitly in the link. The used attribute is there because, as it isn't (or shouldn't) be exported out of the library, if libasan was built with LTO, it could very well be optimized away. 2013-02-21 Jakub Jelinek * asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is defined. Add used attribute. --- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100 +++ libsanitizer/asan/asan_rtl.cc 2013-02-21 14:16:28.985547506 +0100 @@ -520,11 +520,11 @@ void __asan_init() { } } -#if ASAN_USE_PREINIT_ARRAY +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC) // On Linux, we force __asan_init to be called before anyone else // by placing it into .preinit_array section. // FIXME: do we have anything like this on Mac? - __attribute__((section(".preinit_array"))) + __attribute__((section(".preinit_array"), used)) void (*__asan_preinit)(void) =__asan_init; #elif defined(_WIN32) && defined(_DLL) // On Windows, when using dynamic CRT (/MD), we can put a pointer Jakub
Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...
On Thu, Feb 21, 2013 at 02:21:36PM +0100, Jakub Jelinek wrote: > Here is a different fix, so libasan.so will not have .preinit_array, but > libasan.a will have it. Ideally, that hunk should go into a separate > source file (asan_preinit.cc ?), be just compiled into an object file, > rather than shared library and the compiler driver should include it > explicitly in the link. BTW, if you move it into asan_preinit.cc (or whatever other name) separate file, as the Makefiles aren't shared, it is fine if for clang you choose to put it into clang libasan.a, and gcc can do something different. Jakub
Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...
On Thu, Feb 21, 2013 at 5:21 PM, Jakub Jelinek wrote: > On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote: >> This commit breaks the build if the BFD linker is used (I have gold on >> my box, so I missed it). >> >> Short repro: >> % cat preinit.cc >> void foo() {} >> __attribute__((section(".preinit_array"))) void (*xxx)(void) = foo; >> % g++ preinit.cc -shared # gold >> % sudo apt-get remove binutils-gold >> ... >> % g++ preinit.cc -shared # bfd >> /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO >> /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable >> section on output >> collect2: ld returned 1 exit status >> % >> >> Can we stop building the asan-rt as DSO and leave only the static >> variant (as in clang)? > > No, IMNSHO it is desirable to support also that. It may cause I more trouble (we've seen a couple of bugs already) then do good. Anyway, we can get rid of this later. > > Here is a different fix, so libasan.so will not have .preinit_array, but > libasan.a will have it. Ideally, that hunk should go into a separate > source file (asan_preinit.cc ?), be just compiled into an object file, > rather than shared library and the compiler driver should include it > explicitly in the link. > > The used attribute is there because, as it isn't (or shouldn't) be exported > out of the library, if libasan was built with LTO, it could very well be > optimized away. > > 2013-02-21 Jakub Jelinek > > * asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is > defined. Add used attribute. > > --- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100 > +++ libsanitizer/asan/asan_rtl.cc 2013-02-21 14:16:28.985547506 +0100 > @@ -520,11 +520,11 @@ void __asan_init() { >} > } > > -#if ASAN_USE_PREINIT_ARRAY > +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC) >// On Linux, we force __asan_init to be called before anyone else >// by placing it into .preinit_array section. >// FIXME: do we have anything like this on Mac? > - __attribute__((section(".preinit_array"))) > + __attribute__((section(".preinit_array"), used)) >void (*__asan_preinit)(void) =__asan_init; > #elif defined(_WIN32) && defined(_DLL) >// On Windows, when using dynamic CRT (/MD), we can put a pointer Thanks! May I ask you to commit this to gcc (I have to run away now)? I'll put this into upstream (maybe with asan_preinit.cc as you suggest) tomorrow. --kcc > > > Jakub
Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...
On Thu, Feb 21, 2013 at 05:26:30PM +0400, Konstantin Serebryany wrote: > May I ask you to commit this to gcc (I have to run away now)? I'll get it through bootstrap/regtest cycle first, commit if that succeeds. Jakub
Re: Fix for 56175
Richard, This regression was introduced by Kai http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html 2011-06-27 Kai Tietz * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve type sinking. * tree-ssa-math-opts.c (execute_optimize_bswap): Separate search for di/si mode patterns for finding widest match. Is it sufficient for you to accept my patch? Best regards. yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev wrote: >> Richard, >> >> As we know Kai is working on this problem for 4.9 and I assume that >> type sinking will be deleted from forwprop pass. Could we stay on this >> fix but more common fix will be done. > > Well, unless you show it is a regression the patch is not applicable for 4.8 > anyway. Not sure if the code will be deleted from forwprop pass in 4.9 > either, > it is after all a canonicalization - fold seems to perform the opposite one > though: > > /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer > constants (if x has signed type, the sign bit cannot be set > in c). This folds extension into the BIT_AND_EXPR. > > note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, > not sinking. Generally frontends and fold try to narrow operands when > possible (even though some targets later widen them again because of > instruction set constraints). > > Most of this hoisting code was done to make lowering logical && and || > I believe. Looking at the testcases added tells us that while matching > the two patterns as done now helps them but only because that pattern > feeds single-operand instructions that then simplify. So doing the transform > starting from that single-operand instructions instead looks like a better > fix (op !=/== 0/1 and (T) op) and also would not disagree with the > canonicalization done by fold. > > Richard. > >> I also can propose to introduce new hook for it but need to know your >> opinion since we don't went to waste our time on preparing dead >> patches. Note that x86 supports all short types in HW and such type >> sinkning is usually useless if short types are involved. >> >> Best regards. >> Yuri. >> >> >> 2013/2/21 Richard Biener : >>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev wrote: Richard, First of all, your proposal to move type sinking to the end of function does not work since we handle each statement in function and we want that 1st type folding of X & C will not happen. Note that we have the following sequence of gimple before forwprop1: x.0_10 = (signed char) x_8; _11 = x.0_10 & 1; _12 = (signed char) y_9; _13 = _12 & 1; _14 = _11 ^ _13; >>> >>> Ah, indeed. Reminds me of some of my dead patches that separated >>> forwprop into a forward and backward stage. Of course then you have >>> the ordering issue of whether to first forward or backward. >>> >>> Which means that I bet you can construct a testcase that with >>> your change is no longer optimized (just make pushing the conversion >>> make the types _match_). Which is always the case >>> with this kind of local pattern-matching transforms. >>> >>> Currently forwprop processes leafs of expression trees first (well, inside >>> a basic-block), similar to how fold () is supposed to be operated, based >>> on the idea that simplified / canonicalized leafs helps keeping pattern >>> recognition simple and cost considerations more accurate. >>> >>> When one order works better than another you always have to consider >>> that the user could already have written the code in a way that results >>> in the input that isn't well handled. >>> >>> Not that this helps very much for the situation ;) >>> >>> But I don't like the use of first_pass_instance ... and the fix isn't >>> an improvement but just a hack for the benchmark. >>> >>> Richard. >>> I also added comment to my fix and create new test for it. I also checked that this test is passed with patched compiler only. So Change Log was also modified: ChangeLog 2013-02-20 Yuri Rumyantsev PR tree-optimization/56175 * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C for short integer types. * gcc.dg/pr56175.c: New test. 2013/2/20 Richard Biener : > On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev > wrote: >> Hi All, >> >> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >> pattern in simpify_bitwise_binary for short integer types. >> The fix is very simple - we simply turn off short type sinking at the >> first pass of forward propagation allows to get >> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >> +5-7% for other x86 platforms too. >> Bootstrapping and
Re: Fix for 56175
On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev wrote: > Richard, > > This regression was introduced by Kai > > http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html > > 2011-06-27 Kai Tietz > > * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve > type sinking. > * tree-ssa-math-opts.c (execute_optimize_bswap): Separate > search for di/si mode patterns for finding widest match. > > Is it sufficient for you to accept my patch? No, I still don't think the fix is sound. A proper fix would revert the above change (the simplify_bitwise_binary one), watch for testsuite fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) and fix those failures in a better way. Richard. > Best regards. > yuri. > > 2013/2/21 Richard Biener : >> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> As we know Kai is working on this problem for 4.9 and I assume that >>> type sinking will be deleted from forwprop pass. Could we stay on this >>> fix but more common fix will be done. >> >> Well, unless you show it is a regression the patch is not applicable for 4.8 >> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >> either, >> it is after all a canonicalization - fold seems to perform the opposite one >> though: >> >> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >> constants (if x has signed type, the sign bit cannot be set >> in c). This folds extension into the BIT_AND_EXPR. >> >> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, >> not sinking. Generally frontends and fold try to narrow operands when >> possible (even though some targets later widen them again because of >> instruction set constraints). >> >> Most of this hoisting code was done to make lowering logical && and || >> I believe. Looking at the testcases added tells us that while matching >> the two patterns as done now helps them but only because that pattern >> feeds single-operand instructions that then simplify. So doing the transform >> starting from that single-operand instructions instead looks like a better >> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >> canonicalization done by fold. >> >> Richard. >> >>> I also can propose to introduce new hook for it but need to know your >>> opinion since we don't went to waste our time on preparing dead >>> patches. Note that x86 supports all short types in HW and such type >>> sinkning is usually useless if short types are involved. >>> >>> Best regards. >>> Yuri. >>> >>> >>> 2013/2/21 Richard Biener : On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev wrote: > Richard, > > First of all, your proposal to move type sinking to the end of > function does not work since we handle each statement in function and > we want that 1st type folding of X & C will not happen. > Note that we have the following sequence of gimple before forwprop1: > >x.0_10 = (signed char) x_8; > _11 = x.0_10 & 1; > _12 = (signed char) y_9; > _13 = _12 & 1; > _14 = _11 ^ _13; Ah, indeed. Reminds me of some of my dead patches that separated forwprop into a forward and backward stage. Of course then you have the ordering issue of whether to first forward or backward. Which means that I bet you can construct a testcase that with your change is no longer optimized (just make pushing the conversion make the types _match_). Which is always the case with this kind of local pattern-matching transforms. Currently forwprop processes leafs of expression trees first (well, inside a basic-block), similar to how fold () is supposed to be operated, based on the idea that simplified / canonicalized leafs helps keeping pattern recognition simple and cost considerations more accurate. When one order works better than another you always have to consider that the user could already have written the code in a way that results in the input that isn't well handled. Not that this helps very much for the situation ;) But I don't like the use of first_pass_instance ... and the fix isn't an improvement but just a hack for the benchmark. Richard. > I also added comment to my fix and create new test for it. I also > checked that this test is passed with patched compiler only. So > Change Log was also modified: > > ChangeLog > > 2013-02-20 Yuri Rumyantsev > > PR tree-optimization/56175 > * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type > sinking > at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C > for short integer types. > * gcc.dg/pr56175.c: New test. > > > > > 2013/2/20 Richard Biener : >> On Wed, Feb 20, 2013 at 1:00 PM,
[PATCH] Fix handling of very long asm statements in inliner
An auto generated program with a 6.4mio line asm statement gave with 4.7 and 4.8: xxx.c:6400017:1: internal compiler error: in account_size_time, at ipa-inline-analysis.c:601 The problem is that the inliner counts the number of lines in the asm statement and multiplies that with a weight. With the weight this overflows 32bit signed int, and triggers an assert for negative time. Fix this by limiting the number of lines to 1000 for asm cost estimation. The RTL backend also does similar multiplications for jump shortening. I haven't tried to address this, but presumably it's less likely to result in a failure. Passes test suite on x86_64-linux. Ok for 4.7 and 4.8? 2013-02-17 Andi Kleen * tree-inline.c (estimate_num_insns): Limit asm cost to 1000. diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 2a1b692..7f8f2f2 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights) return 0; case GIMPLE_ASM: - return asm_str_count (gimple_asm_string (stmt)); + { + int count = asm_str_count (gimple_asm_string (stmt)); + /* 1000 means infinity. This avoids overflows later + with very long asm statements. */ + if (count > 1000) + count = 1000; + return count; + } case GIMPLE_RESX: /* This is either going to be an external function call with one -- a...@linux.intel.com -- Speaking for myself only.
Re: libsanitizer merge from upstream r175733
This commit (r196201) causes a lot of bootstrap failures (including x86_64-apple-darwin10), see http://gcc.gnu.org/ml/gcc-regression/2013-02/ TIA Dominique
Re: Fix for 56175
Richard, I double checked that with and without my fix compiler produces the same output with -fdump-tree-optimized. What patch did you apply? I think that you should apply the second one - 56175.diff. Yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev wrote: >> Richard, >> >> This regression was introduced by Kai >> >> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >> >> 2011-06-27 Kai Tietz >> >> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >> type sinking. >> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >> search for di/si mode patterns for finding widest match. >> >> Is it sufficient for you to accept my patch? > > No, I still don't think the fix is sound. A proper fix would revert the > above change (the simplify_bitwise_binary one), watch for testsuite > fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) > and fix those failures in a better way. > > Richard. > >> Best regards. >> yuri. >> >> 2013/2/21 Richard Biener : >>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev wrote: Richard, As we know Kai is working on this problem for 4.9 and I assume that type sinking will be deleted from forwprop pass. Could we stay on this fix but more common fix will be done. >>> >>> Well, unless you show it is a regression the patch is not applicable for 4.8 >>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>> either, >>> it is after all a canonicalization - fold seems to perform the opposite one >>> though: >>> >>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>> constants (if x has signed type, the sign bit cannot be set >>> in c). This folds extension into the BIT_AND_EXPR. >>> >>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, >>> not sinking. Generally frontends and fold try to narrow operands when >>> possible (even though some targets later widen them again because of >>> instruction set constraints). >>> >>> Most of this hoisting code was done to make lowering logical && and || >>> I believe. Looking at the testcases added tells us that while matching >>> the two patterns as done now helps them but only because that pattern >>> feeds single-operand instructions that then simplify. So doing the >>> transform >>> starting from that single-operand instructions instead looks like a better >>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>> canonicalization done by fold. >>> >>> Richard. >>> I also can propose to introduce new hook for it but need to know your opinion since we don't went to waste our time on preparing dead patches. Note that x86 supports all short types in HW and such type sinkning is usually useless if short types are involved. Best regards. Yuri. 2013/2/21 Richard Biener : > On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev > wrote: >> Richard, >> >> First of all, your proposal to move type sinking to the end of >> function does not work since we handle each statement in function and >> we want that 1st type folding of X & C will not happen. >> Note that we have the following sequence of gimple before forwprop1: >> >>x.0_10 = (signed char) x_8; >> _11 = x.0_10 & 1; >> _12 = (signed char) y_9; >> _13 = _12 & 1; >> _14 = _11 ^ _13; > > Ah, indeed. Reminds me of some of my dead patches that separated > forwprop into a forward and backward stage. Of course then you have > the ordering issue of whether to first forward or backward. > > Which means that I bet you can construct a testcase that with > your change is no longer optimized (just make pushing the conversion > make the types _match_). Which is always the case > with this kind of local pattern-matching transforms. > > Currently forwprop processes leafs of expression trees first (well, inside > a basic-block), similar to how fold () is supposed to be operated, based > on the idea that simplified / canonicalized leafs helps keeping pattern > recognition simple and cost considerations more accurate. > > When one order works better than another you always have to consider > that the user could already have written the code in a way that results > in the input that isn't well handled. > > Not that this helps very much for the situation ;) > > But I don't like the use of first_pass_instance ... and the fix isn't > an improvement but just a hack for the benchmark. > > Richard. > >> I also added comment to my fix and create new test for it. I also >> checked that this test is passed with patched compiler only. So >> Change Log was also modified: >> >> ChangeLog >> >> 2013-02-20 Yuri Rumyantsev >> >> PR tree-o
Re: libsanitizer merge from upstream r175733
The failure on x86_64-apple-darwin10 is ... true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" "CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" "CFLAGS=-g -O2 -m32" "CXXFLAGS=-g -O2 -m32" "CFLAGS_FOR_BUILD=-g -O2" "CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" "INSTALL_DATA=/sw64/bin/ginstall -c -m 644" "INSTALL_PROGRAM=/sw64/bin/ginstall -c" "INSTALL_SCRIPT=/sw64/bin/ginstall -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2 -m32" "LIBCFLAGS_FOR_TARGET=-g -O2" "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 --split-size=500 " "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/opt/gcc/gcc4.8w" "infodir=/opt/gcc/gcc4.8w/share/info" "libdir=/opt/gcc/gcc4.8w/lib" "includedir=/opt/gcc/gcc4.8w/include" "prefix=/opt/gcc/gcc4.8w" "tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" "gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" "AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" "RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make Checking multilib configuration for libsanitizer... Making all in sanitizer_common make[4]: Nothing to be done for `all'. Making all in asan make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', needed by `asan_interceptors_dynamic.lo'. Stop. make[3]: *** [all-recursive] Error 1 make[2]: *** [all-stage1-target-libsanitizer] Error 2 make[1]: *** [stage1-bubble] Error 2 make: *** [all] Error 2 On linux it is /bin/sh ../libtool --tag=CXX --mode=link /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include -I../../libstdc++-v3/include/i686-linux -I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 -D_GNU_SOURCE -version-info `grep -v '^#' ../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl -o libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo asan_thread.lo asan_win.lo ../sanitizer_common/libsanitizer_common.la ../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la libtool: link: /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include -shared -nostdlib /lib/crti.o /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtbeginS.o .libs/asan_allocator.o .libs/asan_allocator2.o .libs/asan_interceptors.o .libs/asan_mac.o .libs/asan_malloc_mac.o .libs/asan_new_delete.o .libs/asan_posix.o .libs/asan_rtl.o .libs/asan_stats.o .libs/asan_thread_registry.o .libs/asan_fake_stack.o .libs/asan_globals.o .libs/asan_linux.o .libs/asan_malloc_linux.o .libs/asan_malloc_win.o .libs/asan_poisoning.o .libs/asan_report.o .libs/asan_stack.o .libs/asan_thread.o .libs/asan_win.o -Wl,--whole-archive ../sanitizer_common/.libs/libsanitizer_common.a ../interception/.libs/libinterception.a -Wl,--no-whole-archive -Wl,-rpath -Wl,/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs -Wl,-rpath -Wl,/usr/local/lib -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src -lpthread -ldl ../../libstdc++-v3/src/.libs/libstdc++.so -lm -L/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -lc -lgcc_s /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtendS.o /lib/crtn.o -Wl,-soname -Wl,libasan.so.0 -o .libs/libasan.so.0.0.0 /usr/local32/bin/ld: .libs/asan_rtl.o: .preinit_array section is not allowed in DSO /usr/local32/bin/ld: failed to set dynamic section sizes: Nonrepresentable section on output Dominique
Re: libsanitizer merge from upstream r175733
On Thu, Feb 21, 2013 at 03:29:40PM +0100, Dominique Dhumieres wrote: > The failure on x86_64-apple-darwin10 is > > ... > true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" > "CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" > "CFLAGS=-g -O2 -m32" "CXXFLAGS=-g -O2 -m32" "CFLAGS_FOR_BUILD=-g -O2" > "CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" > "INSTALL_DATA=/sw64/bin/ginstall -c -m 644" > "INSTALL_PROGRAM=/sw64/bin/ginstall -c" "INSTALL_SCRIPT=/sw64/bin/ginstall > -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2 -m32" "LIBCFLAGS_FOR_TARGET=-g -O2" > "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 > --split-size=500 " "SHELL=/bin/sh" "RUNTESTFLAGS=" > "exec_prefix=/opt/gcc/gcc4.8w" "infodir=/opt/gcc/gcc4.8w/share/info" > "libdir=/opt/gcc/gcc4.8w/lib" "includedir=/opt/gcc/gcc4.8w/include" > "prefix=/opt/gcc/gcc4.8w" > "tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" > "gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" > "AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" > "RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" > "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make > Checking multilib configuration for libsanitizer... > Making all in sanitizer_common > make[4]: Nothing to be done for `all'. > Making all in asan > make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', > needed by `asan_interceptors_dynamic.lo'. Stop. > make[3]: *** [all-recursive] Error 1 > make[2]: *** [all-stage1-target-libsanitizer] Error 2 > make[1]: *** [stage1-bubble] Error 2 > make: *** [all] Error 2 Removed: trunk/libsanitizer/asan/dynamic/asan_interceptors_dynamic.cc Thus I guess perhaps you should try removing libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc line from libsanitizer/asan/Makefile.am and run automake to regenerate Makefile.in. > On linux it is > > /bin/sh ../libtool --tag=CXX --mode=link > /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc > -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs > -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem > /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include > -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC > -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables > -fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include > -I../../libstdc++-v3/include/i686-linux > -I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 > -D_GNU_SOURCE -version-info `grep -v '^#' > ../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl -o > libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo > asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo > asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo > asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo > asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo > asan_thread.lo asan_win.lo ../sanitizer_common/libsanitizer_common.la > ../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la > libtool: link: /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc > -shared-libgcc -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc > -nostdinc++ > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs > -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem > /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include > -shared -nostdlib /lib/crti.o > /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/crtbeginS.o > .libs/asan_allocator.o .libs/asan_allocator2.o .libs/asan_interceptors.o > .libs/asan_mac.o .libs/asan_malloc_mac.o .libs/asan_new_delete.o > .libs/asan_posix.o .libs/asan_rtl.o .libs/asan_stats.o > .libs/asan_thread_registry.o .libs/asan_fake_stack.o .libs/asan_globals.o > .libs/asan_linux.o .libs/asan_malloc_linux.o .libs/asan_malloc_win.o > .libs/asan_poisoning.o .libs/asan_report.o .libs/asan_stack.o > .libs/asan_thread.o .libs/asan_win.o -Wl,--whole-archive > ../sanitizer_common/.libs/libsanitizer_common.a > ../interception/.libs/libinterception.a -Wl,--no-whole-archive -Wl,-rpath > -Wl,/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs > -Wl,-rpath -Wl,/usr/local/lib > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/
Re: [PATCH] Fix handling of very long asm statements in inliner
On 21/02/13 14:05, Andi Kleen wrote: An auto generated program with a 6.4mio line asm statement gave with 4.7 and 4.8: xxx.c:6400017:1: internal compiler error: in account_size_time, at ipa-inline-analysis.c:601 The problem is that the inliner counts the number of lines in the asm statement and multiplies that with a weight. With the weight this overflows 32bit signed int, and triggers an assert for negative time. Fix this by limiting the number of lines to 1000 for asm cost estimation. The RTL backend also does similar multiplications for jump shortening. I haven't tried to address this, but presumably it's less likely to result in a failure. Passes test suite on x86_64-linux. Ok for 4.7 and 4.8? 2013-02-17 Andi Kleen * tree-inline.c (estimate_num_insns): Limit asm cost to 1000. diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 2a1b692..7f8f2f2 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights) return 0; case GIMPLE_ASM: - return asm_str_count (gimple_asm_string (stmt)); + { + int count = asm_str_count (gimple_asm_string (stmt)); + /* 1000 means infinity. This avoids overflows later + with very long asm statements. */ + if (count > 1000) + count = 1000; + return count; + } case GIMPLE_RESX: /* This is either going to be an external function call with one That doesn't sound enough, unless there is already code out there that respects this count. 1000 at 4 bytes per instruction is only 4k. More that small enough for the rest of the compiler to think that it could jump around such blocks cheaply. I think a limit of 1M or more might be more appropriate. R.
Re: Fix for 56175
On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev wrote: > Richard, > > I double checked that with and without my fix compiler produces the > same output with -fdump-tree-optimized. For what testcase? Richard. > What patch did you apply? I think that you should apply the second one > - 56175.diff. > > Yuri. > > 2013/2/21 Richard Biener : >> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> This regression was introduced by Kai >>> >>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>> >>> 2011-06-27 Kai Tietz >>> >>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>> type sinking. >>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>> search for di/si mode patterns for finding widest match. >>> >>> Is it sufficient for you to accept my patch? >> >> No, I still don't think the fix is sound. A proper fix would revert the >> above change (the simplify_bitwise_binary one), watch for testsuite >> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >> and fix those failures in a better way. >> >> Richard. >> >>> Best regards. >>> yuri. >>> >>> 2013/2/21 Richard Biener : On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev wrote: > Richard, > > As we know Kai is working on this problem for 4.9 and I assume that > type sinking will be deleted from forwprop pass. Could we stay on this > fix but more common fix will be done. Well, unless you show it is a regression the patch is not applicable for 4.8 anyway. Not sure if the code will be deleted from forwprop pass in 4.9 either, it is after all a canonicalization - fold seems to perform the opposite one though: /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer constants (if x has signed type, the sign bit cannot be set in c). This folds extension into the BIT_AND_EXPR. note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, not sinking. Generally frontends and fold try to narrow operands when possible (even though some targets later widen them again because of instruction set constraints). Most of this hoisting code was done to make lowering logical && and || I believe. Looking at the testcases added tells us that while matching the two patterns as done now helps them but only because that pattern feeds single-operand instructions that then simplify. So doing the transform starting from that single-operand instructions instead looks like a better fix (op !=/== 0/1 and (T) op) and also would not disagree with the canonicalization done by fold. Richard. > I also can propose to introduce new hook for it but need to know your > opinion since we don't went to waste our time on preparing dead > patches. Note that x86 supports all short types in HW and such type > sinkning is usually useless if short types are involved. > > Best regards. > Yuri. > > > 2013/2/21 Richard Biener : >> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev >> wrote: >>> Richard, >>> >>> First of all, your proposal to move type sinking to the end of >>> function does not work since we handle each statement in function and >>> we want that 1st type folding of X & C will not happen. >>> Note that we have the following sequence of gimple before forwprop1: >>> >>>x.0_10 = (signed char) x_8; >>> _11 = x.0_10 & 1; >>> _12 = (signed char) y_9; >>> _13 = _12 & 1; >>> _14 = _11 ^ _13; >> >> Ah, indeed. Reminds me of some of my dead patches that separated >> forwprop into a forward and backward stage. Of course then you have >> the ordering issue of whether to first forward or backward. >> >> Which means that I bet you can construct a testcase that with >> your change is no longer optimized (just make pushing the conversion >> make the types _match_). Which is always the case >> with this kind of local pattern-matching transforms. >> >> Currently forwprop processes leafs of expression trees first (well, >> inside >> a basic-block), similar to how fold () is supposed to be operated, based >> on the idea that simplified / canonicalized leafs helps keeping pattern >> recognition simple and cost considerations more accurate. >> >> When one order works better than another you always have to consider >> that the user could already have written the code in a way that results >> in the input that isn't well handled. >> >> Not that this helps very much for the situation ;) >> >> But I don't like the use of first_pass_instance ... and the fix isn't >> an improvement but just a hack for the benchmark. >> >> Richard. >> >>> I also added comment to my fix an
Re: libsanitizer merge from upstream r175733
On Thu, Feb 21, 2013 at 03:45:04PM +0100, Jakub Jelinek wrote: > On Thu, Feb 21, 2013 at 03:29:40PM +0100, Dominique Dhumieres wrote: > > The failure on x86_64-apple-darwin10 is > > > > ... > > true "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" > > "CC_FOR_TARGET=/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/" > > "CFLAGS=-g -O2 -m32" "CXXFLAGS=-g -O2 -m32" "CFLAGS_FOR_BUILD=-g -O2" > > "CFLAGS_FOR_TARGET=-g -O2" "INSTALL=/sw64/bin/ginstall -c" > > "INSTALL_DATA=/sw64/bin/ginstall -c -m 644" > > "INSTALL_PROGRAM=/sw64/bin/ginstall -c" "INSTALL_SCRIPT=/sw64/bin/ginstall > > -c" "LDFLAGS=-m32" "LIBCFLAGS=-g -O2 -m32" "LIBCFLAGS_FOR_TARGET=-g -O2" > > "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 > > --split-size=500 " "SHELL=/bin/sh" "RUNTESTFLAGS=" > > "exec_prefix=/opt/gcc/gcc4.8w" "infodir=/opt/gcc/gcc4.8w/share/info" > > "libdir=/opt/gcc/gcc4.8w/lib" "includedir=/opt/gcc/gcc4.8w/include" > > "prefix=/opt/gcc/gcc4.8w" > > "tooldir=/opt/gcc/gcc4.8w/x86_64-apple-darwin10.8.0" > > "gxx_include_dir=/opt/gcc/gcc4.8w/include/c++/4.8.0" "AR=ar" > > "AS=/opt/gcc/build_w/./gcc/as" "LD=/opt/gcc/build_w/./gcc/collect-ld" > > "RANLIB=ranlib" "NM=/opt/gcc/build_w/./gcc/nm" "NM_FOR_BUILD=" > > "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" DO=all multi-do # make > > Checking multilib configuration for libsanitizer... > > Making all in sanitizer_common > > make[4]: Nothing to be done for `all'. > > Making all in asan > > make[4]: *** No rule to make target `dynamic/asan_interceptors_dynamic.cc', > > needed by `asan_interceptors_dynamic.lo'. Stop. > > make[3]: *** [all-recursive] Error 1 > > make[2]: *** [all-stage1-target-libsanitizer] Error 2 > > make[1]: *** [stage1-bubble] Error 2 > > make: *** [all] Error 2 > > Removed: > > trunk/libsanitizer/asan/dynamic/asan_interceptors_dynamic.cc > Index: libsanitizer/asan/Makefile.am === --- libsanitizer/asan/Makefile.am (revision 196205) +++ libsanitizer/asan/Makefile.am (working copy) @@ -37,7 +37,6 @@ asan_files = \ libasan_la_SOURCES = $(asan_files) if USING_MAC_INTERPOSE -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la else libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la $(top_builddir)/interception/libinterception.la Index: libsanitizer/merge.sh === --- libsanitizer/merge.sh (revision 196205) +++ libsanitizer/merge.sh (working copy) @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev) echo Current upstream revision: $CUR_REV merge include/sanitizer include/sanitizer merge lib/asan asan -merge lib/asan/dynamic asan/dynamic merge lib/tsan/rtl tsan merge lib/sanitizer_common sanitizer_common merge lib/interception interception appears to fix the darwin bootstrap after running automake1.11 in libsanitizer. > > Thus I guess perhaps you should try removing > libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc > line from libsanitizer/asan/Makefile.am > and run automake to regenerate Makefile.in. > > > On linux it is > > > > /bin/sh ../libtool --tag=CXX --mode=link > > /export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc -shared-libgcc > > -B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc -nostdinc++ > > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src > > > > -L/export/gnu/import/git/gcc-test-ia32corei7/bld/i686-linux/libstdc++-v3/src/.libs > > -B/usr/local/i686-linux/bin/ -B/usr/local/i686-linux/lib/ -isystem > > /usr/local/i686-linux/include -isystem /usr/local/i686-linux/sys-include > > -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long > > -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables > > -fvisibility=hidden -Wno-variadic-macros -I../../libstdc++-v3/include > > -I../../libstdc++-v3/include/i686-linux > > -I../../../../src-trunk/libsanitizer/../libstdc++-v3/libsupc++ -g -O2 > > -D_GNU_SOURCE -version-info `grep -v '^#' > > ../../../../src-trunk/libsanitizer/asan/libtool-version` -lpthread -ldl -o > > libasan.la -rpath /usr/local/lib asan_allocator.lo asan_allocator2.lo > > asan_interceptors.lo asan_mac.lo asan_malloc_mac.lo asan_new_delete.lo > > asan_posix.lo asan_rtl.lo asan_stats.lo asan_thread_registry.lo > > asan_fake_stack.lo asan_globals.lo asan_linux.lo asan_malloc_linux.lo > > asan_malloc_win.lo asan_poisoning.lo asan_report.lo asan_stack.lo > > asan_thread.lo asan_win.lo ../sanitizer_common/libsanitizer_common.la > > ../interception/libinterception.la ../../libstdc++-v3/src/libstdc++.la > > libt
Re: libsanitizer merge from upstream r175733
On Thu, Feb 21, 2013 at 10:06:12AM -0500, Jack Howarth wrote: > Index: libsanitizer/asan/Makefile.am > === > --- libsanitizer/asan/Makefile.am (revision 196205) > +++ libsanitizer/asan/Makefile.am (working copy) > @@ -37,7 +37,6 @@ asan_files = \ > > libasan_la_SOURCES = $(asan_files) > if USING_MAC_INTERPOSE > -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc > libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la > else > libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la > $(top_builddir)/interception/libinterception.la > Index: libsanitizer/merge.sh > === > --- libsanitizer/merge.sh (revision 196205) > +++ libsanitizer/merge.sh (working copy) > @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev) > echo Current upstream revision: $CUR_REV > merge include/sanitizer include/sanitizer > merge lib/asan asan > -merge lib/asan/dynamic asan/dynamic > merge lib/tsan/rtl tsan > merge lib/sanitizer_common sanitizer_common > merge lib/interception interception > > appears to fix the darwin bootstrap after running automake1.11 in > libsanitizer. Patch preapproved with proper ChangeLog entry. Jakub
Re: Fix for 56175
Richard, Sorry for my previous message - I did not undersatnd it properly. Anyway I proposed another fix that avoid (type) x & c --> (type) (x & (type-x) c) transformation if x has short type: +++ gcc/tree-ssa-forwprop.c (working copy) @@ -1789,8 +1789,11 @@ defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). + Do that only if X has not short type. */ if (TREE_CODE (arg2) == INTEGER_CST + && (TYPE_PRECISION (TREE_TYPE (arg1)) + >= TYPE_PRECISION (integer_type_node)) && CONVERT_EXPR_CODE_P (def1_code) && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) Does this fix look suitable? Yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev wrote: >> Richard, >> >> I double checked that with and without my fix compiler produces the >> same output with -fdump-tree-optimized. > > For what testcase? > > Richard. > >> What patch did you apply? I think that you should apply the second one >> - 56175.diff. >> >> Yuri. >> >> 2013/2/21 Richard Biener : >>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev wrote: Richard, This regression was introduced by Kai http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html 2011-06-27 Kai Tietz * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve type sinking. * tree-ssa-math-opts.c (execute_optimize_bswap): Separate search for di/si mode patterns for finding widest match. Is it sufficient for you to accept my patch? >>> >>> No, I still don't think the fix is sound. A proper fix would revert the >>> above change (the simplify_bitwise_binary one), watch for testsuite >>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>> and fix those failures in a better way. >>> >>> Richard. >>> Best regards. yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev > wrote: >> Richard, >> >> As we know Kai is working on this problem for 4.9 and I assume that >> type sinking will be deleted from forwprop pass. Could we stay on this >> fix but more common fix will be done. > > Well, unless you show it is a regression the patch is not applicable for > 4.8 > anyway. Not sure if the code will be deleted from forwprop pass in 4.9 > either, > it is after all a canonicalization - fold seems to perform the opposite > one > though: > > /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer > constants (if x has signed type, the sign bit cannot be set > in c). This folds extension into the BIT_AND_EXPR. > > note that what forwprop does (T)x & c -> (T)(x & c') I'd call type > hoisting, > not sinking. Generally frontends and fold try to narrow operands when > possible (even though some targets later widen them again because of > instruction set constraints). > > Most of this hoisting code was done to make lowering logical && and || > I believe. Looking at the testcases added tells us that while matching > the two patterns as done now helps them but only because that pattern > feeds single-operand instructions that then simplify. So doing the > transform > starting from that single-operand instructions instead looks like a better > fix (op !=/== 0/1 and (T) op) and also would not disagree with the > canonicalization done by fold. > > Richard. > >> I also can propose to introduce new hook for it but need to know your >> opinion since we don't went to waste our time on preparing dead >> patches. Note that x86 supports all short types in HW and such type >> sinkning is usually useless if short types are involved. >> >> Best regards. >> Yuri. >> >> >> 2013/2/21 Richard Biener : >>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev >>> wrote: Richard, First of all, your proposal to move type sinking to the end of function does not work since we handle each statement in function and we want that 1st type folding of X & C will not happen. Note that we have the following sequence of gimple before forwprop1: x.0_10 = (signed char) x_8; _11 = x.0_10 & 1; _12 = (signed char) y_9; _13 = _12 & 1; _14 = _11 ^ _13; >>> >>> Ah, indeed. Reminds me of some of my dead patches that separated >>> forwprop into a forward and backward stage. Of course then you have >>> the ordering issue of whether to first forward or backward. >>> >>> Which means th
Re: Fix for 56175
On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev wrote: > Richard, > > Sorry for my previous message - I did not undersatnd it properly. > > Anyway I proposed another fix that avoid (type) x & c --> (type) (x & > (type-x) c) transformation if x has short type: > > +++ gcc/tree-ssa-forwprop.c (working copy) > @@ -1789,8 +1789,11 @@ >defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); > > - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ > + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). > + Do that only if X has not short type. */ >if (TREE_CODE (arg2) == INTEGER_CST > + && (TYPE_PRECISION (TREE_TYPE (arg1)) > + >= TYPE_PRECISION (integer_type_node)) >&& CONVERT_EXPR_CODE_P (def1_code) >&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) > > Does this fix look suitable? I think the fix is to disable the transform if it widens the operation. Thus, sth like if (TREE_CODE (arg2) == INTEGER_CST && CONVERT_EXPR_CODE_P (def1_code) && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) + && (TYPE_PRECISION (TREE_TYPE (arg1)) +>= TYPE_PRECISION (TREE_TYPE (def1_arg1))) && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) see the restriction we place on the transform for the (T1) x & (T2) y case: /* For bitwise binary operations apply operand conversions to the binary operation result instead of to the operands. This allows to combine successive conversions and bitwise binary operations. */ if (CONVERT_EXPR_CODE_P (def1_code) && CONVERT_EXPR_CODE_P (def2_code) && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) /* Make sure that the conversion widens the operands, or has same precision, or that it changes the operation to a bitfield precision. */ && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) <= TYPE_PRECISION (TREE_TYPE (arg1))) || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) != MODE_INT) || (TYPE_PRECISION (TREE_TYPE (arg1)) != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)) ideally you'd split out that condition into a predicate function taking the two type and use it in both places. Richard. > Yuri. > > 2013/2/21 Richard Biener : >> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> I double checked that with and without my fix compiler produces the >>> same output with -fdump-tree-optimized. >> >> For what testcase? >> >> Richard. >> >>> What patch did you apply? I think that you should apply the second one >>> - 56175.diff. >>> >>> Yuri. >>> >>> 2013/2/21 Richard Biener : On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev wrote: > Richard, > > This regression was introduced by Kai > > http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html > > 2011-06-27 Kai Tietz > > * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve > type sinking. > * tree-ssa-math-opts.c (execute_optimize_bswap): Separate > search for di/si mode patterns for finding widest match. > > Is it sufficient for you to accept my patch? No, I still don't think the fix is sound. A proper fix would revert the above change (the simplify_bitwise_binary one), watch for testsuite fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) and fix those failures in a better way. Richard. > Best regards. > yuri. > > 2013/2/21 Richard Biener : >> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev >> wrote: >>> Richard, >>> >>> As we know Kai is working on this problem for 4.9 and I assume that >>> type sinking will be deleted from forwprop pass. Could we stay on this >>> fix but more common fix will be done. >> >> Well, unless you show it is a regression the patch is not applicable for >> 4.8 >> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >> either, >> it is after all a canonicalization - fold seems to perform the opposite >> one >> though: >> >> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >> constants (if x has signed type, the sign bit cannot be set >> in c). This folds extension into the BIT_AND_EXPR. >> >> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >> hoisting, >> not sinking. Generally frontends and fold try to narrow operands when >> possible (even though some targets later widen them again because of >> instruction set constraints). >> >> Most of this hoisting code was done to make lowering logical && and || >> I believe. Looking at the testcases added tells us that whi
Re: libsanitizer merge from upstream r175733
On Thu, Feb 21, 2013 at 04:11:56PM +0100, Jakub Jelinek wrote: > On Thu, Feb 21, 2013 at 10:06:12AM -0500, Jack Howarth wrote: > > > Index: libsanitizer/asan/Makefile.am > > === > > --- libsanitizer/asan/Makefile.am (revision 196205) > > +++ libsanitizer/asan/Makefile.am (working copy) > > @@ -37,7 +37,6 @@ asan_files = \ > > > > libasan_la_SOURCES = $(asan_files) > > if USING_MAC_INTERPOSE > > -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc > > libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la > > else > > libasan_la_LIBADD = > > $(top_builddir)/sanitizer_common/libsanitizer_common.la > > $(top_builddir)/interception/libinterception.la > > Index: libsanitizer/merge.sh > > === > > --- libsanitizer/merge.sh (revision 196205) > > +++ libsanitizer/merge.sh (working copy) > > @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev) > > echo Current upstream revision: $CUR_REV > > merge include/sanitizer include/sanitizer > > merge lib/asan asan > > -merge lib/asan/dynamic asan/dynamic > > merge lib/tsan/rtl tsan > > merge lib/sanitizer_common sanitizer_common > > merge lib/interception interception > > > > appears to fix the darwin bootstrap after running automake1.11 in > > libsanitizer. > > Patch preapproved with proper ChangeLog entry. > > Jakub No regressions in asan.exp for the resulting build on x86_64-apple-darwin12.
Re: Fix for 56175
Richard, This does not fit to my fix since if we have another pattern like t = (u8)((x & 1) ^ ((u8)y & 1)); where y has short type, for rhs operand type sinkning (or hoisting as you prefer) still will be performed but I don't see any reason for converting (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all targets including x86. Yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev wrote: >> Richard, >> >> Sorry for my previous message - I did not undersatnd it properly. >> >> Anyway I proposed another fix that avoid (type) x & c --> (type) (x & >> (type-x) c) transformation if x has short type: >> >> +++ gcc/tree-ssa-forwprop.c (working copy) >> @@ -1789,8 +1789,11 @@ >>defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >>defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); >> >> - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ >> + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). >> + Do that only if X has not short type. */ >>if (TREE_CODE (arg2) == INTEGER_CST >> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >> + >= TYPE_PRECISION (integer_type_node)) >>&& CONVERT_EXPR_CODE_P (def1_code) >>&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >> >> Does this fix look suitable? > > I think the fix is to disable the transform if it widens the operation. > Thus, sth like > > if (TREE_CODE (arg2) == INTEGER_CST >&& CONVERT_EXPR_CODE_P (def1_code) >&& INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) > + && (TYPE_PRECISION (TREE_TYPE (arg1)) > +>= TYPE_PRECISION (TREE_TYPE (def1_arg1))) >&& int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) > > see the restriction we place on the transform for the (T1) x & (T2) y case: > > /* For bitwise binary operations apply operand conversions to the > binary operation result instead of to the operands. This allows > to combine successive conversions and bitwise binary operations. */ > if (CONVERT_EXPR_CODE_P (def1_code) > && CONVERT_EXPR_CODE_P (def2_code) > && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) > /* Make sure that the conversion widens the operands, or has same > precision, or that it changes the operation to a bitfield > precision. */ > && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) ><= TYPE_PRECISION (TREE_TYPE (arg1))) > || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) > != MODE_INT) > || (TYPE_PRECISION (TREE_TYPE (arg1)) > != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)) > > ideally you'd split out that condition into a predicate function taking the > two type and use it in both places. > > Richard. > >> Yuri. >> >> 2013/2/21 Richard Biener : >>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev wrote: Richard, I double checked that with and without my fix compiler produces the same output with -fdump-tree-optimized. >>> >>> For what testcase? >>> >>> Richard. >>> What patch did you apply? I think that you should apply the second one - 56175.diff. Yuri. 2013/2/21 Richard Biener : > On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev > wrote: >> Richard, >> >> This regression was introduced by Kai >> >> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >> >> 2011-06-27 Kai Tietz >> >> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >> type sinking. >> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >> search for di/si mode patterns for finding widest match. >> >> Is it sufficient for you to accept my patch? > > No, I still don't think the fix is sound. A proper fix would revert the > above change (the simplify_bitwise_binary one), watch for testsuite > fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) > and fix those failures in a better way. > > Richard. > >> Best regards. >> yuri. >> >> 2013/2/21 Richard Biener : >>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev >>> wrote: Richard, As we know Kai is working on this problem for 4.9 and I assume that type sinking will be deleted from forwprop pass. Could we stay on this fix but more common fix will be done. >>> >>> Well, unless you show it is a regression the patch is not applicable >>> for 4.8 >>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>> either, >>> it is after all a canonicalization - fold seems to perform the opposite >>> one >>> though: >>> >>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>> constants (if x has signed type, the sign bit
Re: [PATCH] Fix handling of very long asm statements in inliner
> That doesn't sound enough, unless there is already code out there > that respects this count. 1000 at 4 bytes per instruction is only > 4k. More that small enough for the rest of the compiler to think > that it could jump around such blocks cheaply. > > I think a limit of 1M or more might be more appropriate. I got an overflow for 6.4M, so 1M would be dangerously near that. 100k perhaps ? This was not for jump shortening, but the inliner heuristics. In the worst case we could separate the two, would be a larger patch though. -Andi
[PATCH] Fix darwin bootstrap and merge.sh
The attached patch fixes the broken bootstrap on darwin in libsanitizer due to the deprecation of the dynamic/asan_interceptors_dynamic.cc file. The patch also eliminates the merge of the deprecated subdirectory in merge.sh. Bootstrap and asan.exp regression tested on x86_64-apple-darwin12. Okay for gcc trunk? Jack libsanitizer/ 2013-02-21 Jack Howarth * asan/Makefile.am (libasan_la_SOURCES): Remove deprecated dynamic/asan_interceptors_dynamic.cc. * asan/Makefile.in: Regenerated. * merge.sh: Remove merge of deprecated lib/asan/dynamic. Index: libsanitizer/asan/Makefile.am === --- libsanitizer/asan/Makefile.am (revision 196205) +++ libsanitizer/asan/Makefile.am (working copy) @@ -37,7 +37,6 @@ asan_files = \ libasan_la_SOURCES = $(asan_files) if USING_MAC_INTERPOSE -libasan_la_SOURCES += dynamic/asan_interceptors_dynamic.cc libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la else libasan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la $(top_builddir)/interception/libinterception.la Index: libsanitizer/merge.sh === --- libsanitizer/merge.sh (revision 196205) +++ libsanitizer/merge.sh (working copy) @@ -66,7 +66,6 @@ CUR_REV=$(get_current_rev) echo Current upstream revision: $CUR_REV merge include/sanitizer include/sanitizer merge lib/asan asan -merge lib/asan/dynamic asan/dynamic merge lib/tsan/rtl tsan merge lib/sanitizer_common sanitizer_common merge lib/interception interception
Fix some texinfo 5.0 warnings in gcc/doc + libiberty
This is a follow up to Jakub's patch. With texinfo 5.0 one gets a bunch of warnings. This patch reduces the number of warnings – but there are still warnings to be fixed. This patch solves most of the issues related to mismatches between the item order in the @menu and the actual @nodes. As always, there is the question whether the @node or the item in @menu should be changed. In one case, I had the choice to either add a new item under @menu or to change a @section into @subsection. I did the latter. Tested with "make info" and no new warnings in texinfo 5.0. Additionally, I used texinfo 4.13a with showed no warnings. OK for the trunk? Tobias PS: I tried the following libiberty patch; it fixes a warning with texinfo 5.0. But I do not include it as it fails for some reason with an error with texinfo 4.13: ../../libiberty/libiberty.texi:250: Prev field of node `Functions' not pointed to. ../../libiberty//obstacks.texi:1: This node (Obstacks) has the bad Next. ... diff --git a/libiberty/libiberty.texi b/libiberty/libiberty.texi index 74f70d2..5a72f13 100644 --- a/libiberty/libiberty.texi +++ b/libiberty/libiberty.texi @@ -80,10 +80,10 @@ License; for more information, see @ref{Library Copying}. * Overview:: Overview of available function groups. -* Functions:: Available functions, macros, and global variables. - * Obstacks:: Object Stacks. +* Functions:: Available functions, macros, and global variables. + * Licenses:: The various licenses under which libiberty sources are distributed. gcc/ 2013-02-21 Tobias Burnus * doc/extended.texi (C Extensions): Change order in @menu to match @node. (Other MIPS Built-in Functions): Move last MIPS entry before "picoChip Built-in Functions". (SH Built-in Functions): Move after RX Built-in Functions. * doc/gcc.texi (Introduction): Change order in @menu to match @node. * doc/md.texi (Constraints): Ditto. * gty.texi (Type Information): Ditto. (User-provided marking routines for template types): Make subsection. * doc/invoke.texi (AArch64 Options): Move before "Adapteva Epiphany Options". diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index eb84408..ea4dfc1 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -30,8 +30,8 @@ extensions, accepted by GCC in C90 mode and in C++. * Constructing Calls:: Dispatching a call to another function. * Typeof:: @code{typeof}: referring to the type of an expression. * Conditionals::Omitting the middle operand of a @samp{?:} expression. -* Long Long:: Double-word integers---@code{long long int}. * __int128:: 128-bit integers---@code{__int128}. +* Long Long:: Double-word integers---@code{long long int}. * Complex:: Data types for complex numbers. * Floating Types:: Additional Floating Types. * Half-Precision:: Half-Precision Floating Point. @@ -40,8 +40,8 @@ extensions, accepted by GCC in C90 mode and in C++. * Fixed-Point:: Fixed-Point Types. * Named Address Spaces::Named address spaces. * Zero Length:: Zero-length arrays. -* Variable Length:: Arrays whose length is computed at run time. * Empty Structures::Structures with no members. +* Variable Length:: Arrays whose length is computed at run time. * Variadic Macros:: Macros with a variable number of arguments. * Escaped Newlines::Slightly looser rules for escaped newlines. * Subscripting::Any array can be subscripted, even if not an lvalue. @@ -50,8 +50,8 @@ extensions, accepted by GCC in C90 mode and in C++. * Compound Literals:: Compound literals give structures, unions or arrays as values. * Designated Inits::Labeling elements of initializers. -* Cast to Union:: Casting to union type from any member of the union. * Case Ranges:: `case 1 ... 9' and such. +* Cast to Union:: Casting to union type from any member of the union. * Mixed Declarations:: Mixing declarations and code. * Function Attributes:: Declaring that functions have no side effects, or that they can never return. @@ -11653,6 +11653,18 @@ else @end smallexample @end table +@node Other MIPS Built-in Functions +@subsection Other MIPS Built-in Functions + +GCC provides other MIPS-specific built-in functions: + +@table @code +@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) +Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. +GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} +when this function is available. +@end table + @node picoChip Built-in Functions @subsection picoChip Built-in Functions @@ -11689,18 +11701,6 @@ implementing assertions. @end table -@node Other MIPS Built-in Functions -@subsection Other MIPS Built-in Functions - -GCC provides other MIPS-specific built-in functions: - -@table @code -@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{add
[PATCH] C++ math constants
How about the attached file as a start for . I used the constexpr approach (instead of function calls) and replicated the constants that are available in in Unix. What other constants to add? math Description: Binary data
Re: Fix some texinfo 5.0 warnings in gcc/doc + libiberty
Tobias Burnus writes: > PS: I tried the following libiberty patch; it fixes a warning with texinfo > 5.0. But I do not include it as it fails for some reason with an error > with texinfo 4.13: > ../../libiberty/libiberty.texi:250: Prev field of node `Functions' not > pointed to. > ../../libiberty//obstacks.texi:1: This node (Obstacks) has the bad Next. > ... obstacks.texi is using @chapter when it should use @subsection, so it needs to be bracketed with @lowersections/@raisesections, like this: diff --git a/libiberty/libiberty.texi b/libiberty/libiberty.texi index 74f70d2..f1e4bdd 100644 --- a/libiberty/libiberty.texi +++ b/libiberty/libiberty.texi @@ -82,8 +82,6 @@ License; for more information, see @ref{Library Copying}. * Functions:: Available functions, macros, and global variables. -* Obstacks:: Object Stacks. - * Licenses:: The various licenses under which libiberty sources are distributed. @@ -245,7 +243,11 @@ central location from which to use, maintain, and distribute them. @c This is generated from the glibc manual using a make-obstacks-texi.sh @c script of Phil's. Hope it's accurate. +@lowersections +@lowersections @include obstacks.texi +@raisesections +@raisesections @node Functions @chapter Function, Variable, and Macro Listing. diff --git a/libiberty/obstacks.texi b/libiberty/obstacks.texi index a1b1b47..67780aa 100644 --- a/libiberty/obstacks.texi +++ b/libiberty/obstacks.texi @@ -1,4 +1,4 @@ -@node Obstacks,Licenses,Functions,Top +@node Obstacks @chapter Obstacks @cindex obstacks Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support
On Thu, 21 Feb 2013, Richard Sandiford wrote: > > This issue was originally raised here: > > > > http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html > > > > We have a shortcoming in GCC in that we only allow the use half of the FP > > MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register > > mode (CP0.Status.FR == 1) on MIPS32r2 processors. Furthermore we never > > enable the other half (NMADD.fmt and NMSUB.fmt) on those processors. > > However this whole instruction subset is always available on MIPS32r2 FPUs > > regardless of the mode selected, just as it always has been on FPUs of the > > 64-bit ISA line from MIPS IV up. > > Hmm, this was discussed here: > > http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00488.html > http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00492.html > > The footnote for COP1X instructions on page 12 of volume 1 of the MIPS32 ISA > (v2.50) says: Please note that v2.50 is obsolete, several typos were corrected in later revisions. Sadly I don't even have a copy of that exact revision of volume I. > 1. In Release 1 of the Architecture, these instructions are legal only >with a MIPS64 processor with 64-bit operations enabled (they are, in >effect, actually MIPS64 instructions). In Release 2 of the Architecture, >these instructions are legal with either a MIPS32 or MIPS64 processor >_which includes a 64-bit floating point unit_. > > (my emphasis). "which" rather than "that" makes this a bit ambiguous, > but various comments in the rest of the manual imply that MIPS32r2 allows > an implementation choice between 32-bit and 64-bit FPUs. E.g. page 8 says: > > Support for 64-bit coprocessors with 32-bit CPUs: These changes allow > a 64-bit coprocessor (including an FPU) to be attached to a 32-bit > CPU. This enhancement is optional in a Release 2 implementation. > > and page 45 says: > > In addition to an Instruction Set Architecture, the MIPS architecture > definition includes processing resources such as the set of > coprocessor general registers. In Release 1 of the Architecture, the > 32-bit registers in MIPS32 were enlarged to 64-bits in MIPS64; > however, these 64-bit FPU registers are not backwards > compatible. Instead, processors implementing the MIPS64 Architecture > provide a mode bit to select either the 32-bit or 64-bit register > model. In Release 2 of the Architecture, a 32-bit CPU _may_ include a > full 64-bit coprocessor, including a floating point unit which > implements the same mode bit to select 32-bit or 64-bit FPU register > model. > > On page 322 of volume 2, the footnote for "Table A-20 MIPS64 COP1X > Encoding of Function Field" uses slightly different wording: > > COP1X instructions are legal only if 64-bit floating point operations > are enabled. > > So was this all a big misunderstanding on my part? The TARGET_FLOAT64 > condition came from MIPS themselves, and when challenged they seemed > pretty adamant that it was correct. If I was wrong to be convinced > by the explanation, I hope you can at least see why I was convinced. :-) As good as my English comprehension might be I believe your understanding of documentation is right. However several inconsistencies have already been spotted and it may well be that for example the restriction on COP1X instructions is simply a leftover that leaked from rev.1 of the MIPS64 architecture and was not spotted on the update (please note that obviously all the architecture documents are produced from templates shared between the MIPS32 and MIPS64 architectures with some parts being output conditionally as appropriate). > If it wasn't a misunderstanding, then the point is that we can't tell > from ISA_MIPS32R2 alone whether the target has a 32-bit or 64-bit FPU, > but we know that it must have a 64-bit FPU if using TARGET_FLOAT64. Well, the important point is as I understand all the MIPS32r2 FPU instructions (not all the FP formats of course though, L and PS are optional) are mandatory no matter if the FPU is 32-bit or 64-bit. This should rule out any question as to what kind of FPU has been implemented or whether the 64-bit mode has been enabled. I could be wrong however. However the good news is it's not the architecture documentation set that is normative for architecture implementers, it's the AVP (Architecture Verification Program) they have to run successfully on their processors to claim compliance. Steve, would you therefore please do us a favour and check what the AVP for MIPS32r2 requires for support of the floating-point MADD instruction subset, or preferably, the whole COP1X instruction set? Are these instructions only mandatory for a 64-bit FPU (CP1.FIR.F64 == 1), or do they have to be included in all FPU implementations? > This part is OK, thanks, and is probably the only bit that's suitable for > 4.8 at this stage. Would you mind applying it separately? Applied now, thanks.
[PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)
Hi! This patch teaches lower-subreg pass to also handle ASHIFTRTs with BITS_PER_WORD to 2*BITS_PER_WORD-1 constant shift counts, like it already handles similar LSHIFTRTs. While for LSHIFTRT we should zero the upper half, for ASHIFTRT we either should set it to upper source half >> (BITS_PER_WORD-1), or for shifts by >> (2*BITS_PER_WORD-1) we can optimize that to one shift followed by copying it from the lower to the upper half. On the testcase from the PR this removes 3 unnecessary moves, so we are one more better than 4.7 (thus fix a regression), and on the other testcases either we generated the same quality of code, or better. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-02-21 Jakub Jelinek PR rtl-optimization/50339 * lower-subreg.h (struct lower_subreg_choices): Add splitting_ashiftrt field. * lower-subreg.c (compute_splitting_shift): Handle ASHIFTRT. (compute_costs): Call compute_splitting_shift also for ASHIFTRT into splitting_ashiftrt field. (find_decomposable_shift_zext, resolve_shift_zext): Handle also ASHIFTRT. (dump_choices): Fix up printing LSHIFTRT choices, print ASHIFTRT choices. --- gcc/lower-subreg.h.jj 2013-02-21 14:10:39.033592663 +0100 +++ gcc/lower-subreg.h 2013-02-21 15:26:18.773634801 +0100 @@ -34,6 +34,7 @@ struct lower_subreg_choices { should be split. */ bool splitting_ashift[MAX_BITS_PER_WORD]; bool splitting_lshiftrt[MAX_BITS_PER_WORD]; + bool splitting_ashiftrt[MAX_BITS_PER_WORD]; /* True if there is at least one mode that is worth splitting. */ bool something_to_do; --- gcc/lower-subreg.c.jj 2013-02-21 14:10:38.975592966 +0100 +++ gcc/lower-subreg.c 2013-02-21 15:27:15.114316148 +0100 @@ -57,9 +57,9 @@ along with GCC; see the file COPYING3. to do this. This pass only splits moves with modes that are wider than - word_mode and ASHIFTs, LSHIFTRTs and ZERO_EXTENDs with integer - modes that are twice the width of word_mode. The latter could be - generalized if there was a need to do this, but the trend in + word_mode and ASHIFTs, LSHIFTRTs, ASHIFTRTs and ZERO_EXTENDs with + integer modes that are twice the width of word_mode. The latter + could be generalized if there was a need to do this, but the trend in architectures is to not need this. There are two useful preprocessor defines for use by maintainers: @@ -152,7 +152,7 @@ compute_splitting_shift (bool speed_p, s bool *splitting, enum rtx_code code, int word_move_zero_cost, int word_move_cost) { - int wide_cost, narrow_cost, i; + int wide_cost, narrow_cost, upper_cost, i; for (i = 0; i < BITS_PER_WORD; i++) { @@ -163,13 +163,20 @@ compute_splitting_shift (bool speed_p, s else narrow_cost = shift_cost (speed_p, rtxes, code, word_mode, i); + if (code != ASHIFTRT) + upper_cost = word_move_zero_cost; + else if (i == BITS_PER_WORD - 1) + upper_cost = word_move_cost; + else + upper_cost = shift_cost (speed_p, rtxes, code, word_mode, +BITS_PER_WORD - 1); + if (LOG_COSTS) fprintf (stderr, "%s %s by %d: original cost %d, split cost %d + %d\n", GET_MODE_NAME (twice_word_mode), GET_RTX_NAME (code), -i + BITS_PER_WORD, wide_cost, narrow_cost, -word_move_zero_cost); +i + BITS_PER_WORD, wide_cost, narrow_cost, upper_cost); - if (FORCE_LOWERING || wide_cost >= narrow_cost + word_move_zero_cost) + if (FORCE_LOWERING || wide_cost >= narrow_cost + upper_cost) splitting[i] = true; } } @@ -248,6 +255,9 @@ compute_costs (bool speed_p, struct cost compute_splitting_shift (speed_p, rtxes, choices[speed_p].splitting_lshiftrt, LSHIFTRT, word_move_zero_cost, word_move_cost); + compute_splitting_shift (speed_p, rtxes, + choices[speed_p].splitting_ashiftrt, ASHIFTRT, + word_move_zero_cost, word_move_cost); } } @@ -1153,6 +1163,7 @@ find_decomposable_shift_zext (rtx insn, op = SET_SRC (set); if (GET_CODE (op) != ASHIFT && GET_CODE (op) != LSHIFTRT + && GET_CODE (op) != ASHIFTRT && GET_CODE (op) != ZERO_EXTEND) return false; @@ -1173,6 +1184,8 @@ find_decomposable_shift_zext (rtx insn, { bool *splitting = (GET_CODE (op) == ASHIFT ? choices[speed_p].splitting_ashift +: GET_CODE (op) == ASHIFTRT +? choices[speed_p].splitting_ashiftrt : choices[speed_p].splitting_lshiftrt); if (!CONST_INT_P (XEXP (op, 1)) || !IN_RANGE (INTVAL (XEXP (op, 1)), BITS_PER_WORD, @@ -1200,7 +1213,7 @@ resolve_shift_zext (rtx insn) rtx op; rtx op_operand
vec_concat, vec_duplicate doc updated (was: Re: question about section 10.12)
> From: Hans-Peter Nilsson > Date: Mon, 28 Jan 2013 23:14:21 +0100 > * doc/rtl.texi (vec_concat, vec_duplicate): Mention that > scalars are valid operands. Finally committed as obvious after brief check of generated dvi and info. brgds, H-P
Re: [PATCH] C++ math constants
2013/2/21 Ulrich Drepper : > How about the attached file as a start for . I used the > constexpr approach (instead of function calls) and replicated the > constants that are available in in Unix. 1) In this case I miss the corresponding variable definitions, because you violate the ODR, when you have something like the following: #include template void print(const T& t) { std::cout << t; } int main() { print(__math_constants::__pi); } 2) You need to use either braced initializers *or* using initializers with equal, a parenthesized initializer isn't supported (/brace-or-equal-initializer/). - Daniel
Re: [PATCH] Fix handling of very long asm statements in inliner
> This was not for jump shortening, but the inliner heuristics. > > In the worst case we could separate the two, would be a larger > patch though. Actually it's already separated, I don't affect the jump shortening at all. Only the inliner code adds the limit. So it would depend whether 1000 * weight is large enough for inlining. Honza? -Andi
Re: [PATCH] C++ math constants
On 21/02/13 16:32, Ulrich Drepper wrote: How about the attached file as a start for . I used the constexpr approach (instead of function calls) and replicated the constants that are available in in Unix. What other constants to add? Pi/3 ln(3) ln(10) (for base conversions) sqrt(3) sqrt(5) sqrt(7) 1/e I'll write down what I use in my day to day stuff, keep you posted.
Add myself as maintainer
Adding myself as Write After Approval maintainer. Andrew Sutton andrew.n.sut...@gmail.com Index: ChangeLog === --- ChangeLog (revision 196207) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2013-02-20 Andrew Sutton + + * MAINTAINERS (Write After Approval): Add myself. + 2013-02-15 Yufeng Zhang * configure.ac: Set libgloss_dir for the aarch64*-*-* targets. Index: MAINTAINERS === --- MAINTAINERS (revision 196207) +++ MAINTAINERS (working copy) @@ -523,6 +523,7 @@ Graham Stott graham.stott@btinternet Andrew Stubbs a...@codesourcery.com Mike Stump mikest...@comcast.net Jeff Sturm jst...@gcc.gnu.org +Andrew Sutton andrew.n.sut...@gmail.com Gabriele Sveltogabriele.sve...@st.com Sriraman Tallamtmsri...@google.com Chung-Lin Tang clt...@codesourcery.com
Re: [PATCH] C++ math constants
> > How about the attached file as a start for . I used the > > constexpr approach (instead of function calls) and replicated the > > constants that are available in in Unix. then this should really be ext/cmath > > 1) In this case I miss the corresponding variable definitions, because > you violate the ODR, when you have something like the following: > > #include > > template > void print(const T& t) { std::cout << t; } > > int main() { > print(__math_constants::__pi); > } Not seeing it. Say for: #include // A class for math constants. template struct __math_constants { // Constant @f$ \pi @f$. static constexpr _RealType __pie = 3.1415926535897932384626433832795029L; }; template void print(const T& t) { std::cout << t; } int main() { print(__math_constants::__pie); return 0; } I'm not getting any definition, even at -O0. Any chance you could expand on your thinking here Daniel? > 2) You need to use either braced initializers *or* using initializers > with equal, a parenthesized initializer isn't supported > (/brace-or-equal-initializer/). Yeah. This works: // Constant @f$ \pi @f$. static constexpr _RealType __pie = 3.1415926535897932384626433832795029L; -benjamin
Re: [PATCH] Fix darwin bootstrap and merge.sh
Thanks! Does this need to regenerate libsanitizer/asan/Makefile.in ? (It'll take a while for me to do this on Mac) --kcc On Thu, Feb 21, 2013 at 8:04 PM, Jack Howarth wrote: > The attached patch fixes the broken bootstrap on darwin in libsanitizer due > to the > deprecation of the dynamic/asan_interceptors_dynamic.cc file. The patch also > eliminates > the merge of the deprecated subdirectory in merge.sh. Bootstrap and asan.exp > regression > tested on x86_64-apple-darwin12. Okay for gcc trunk? > Jack
Re: [PR middle-end/56108] handle transactions with ASMs in the first block
On 02/20/2013 07:35 AM, Aldy Hernandez wrote: > In the following test, the first statement of a relaxed transaction is > an inline asm: > > __transaction_relaxed { __asm__(""); } > > Since we bypass inserting BUILT_IN_TM_IRREVOCABLE at the beginning of > transactions that are sure to be irrevocable, later when we try to > expand the transaction, we ICE when we encounter the inline asm. > > Currently, we bypass the TM_IRREVOCABLE call here: > > for (region = d->all_tm_regions; region; region = region->next) > { > /* If we're sure to go irrevocable, don't transform anything. */ > if (d->irrevocable_blocks_normal > && bitmap_bit_p (d->irrevocable_blocks_normal, >region->entry_block->index)) > { > transaction_subcode_ior (region, GTMA_DOES_GO_IRREVOCABLE); > transaction_subcode_ior (region, GTMA_MAY_ENTER_IRREVOCABLE); > continue; > } > > If I understand this correctly, ideally a transaction marked as > doesGoIrrevocable shouldn't bother instrumenting the statements inside, > since the runtime will go irrevocable immediately. In which case, we > can elide the instrumentation altogether as the attached patch does. > > If my analysis is correct, then testsuite/gcc.dg/tm/memopt-1.c would > surely go irrevocable, thus requiring no instrumentation, causing the > memory optimizations to get skipped altogether. In which case, it's > best to mark the function calls as safe, so they don't cause the > transaction to become obviously irrevocable. > > Is this correct? If so, OK? Yes, that's correct. The patch is ok. r~
Re: [PATCH] Fix handling of very long asm statements in inliner
> > This was not for jump shortening, but the inliner heuristics. > > > > In the worst case we could separate the two, would be a larger > > patch though. > > Actually it's already separated, I don't affect the jump shortening > at all. Only the inliner code adds the limit. > > So it would depend whether 1000 * weight is large enough for inlining. > Honza? I think it is safely more than enough. 1000*weight makes the function safely uninlinable. So only it cuts of precision in large function growth/large unit growth logic. The logic is skewed here: large function growth is there really for compiler visible code to prevent non-linear algorithm explosion. For large unit growth I duno - in a way those gigantic asm statements that are not going to be duplicated can be tought as not part of the unit for this putpose, too. So I am fine with the cutoff. We may need to add more overflow guards (we already have quite few for time) that makes me wonder if all this should not be done all in saturating arithmetic now when it can be done theoretically with one C++ class? Honza > > -Andi
Re: [PATCH] Fix handling of very long asm statements in inliner
> So I am fine with the cutoff. We may need to add more overflow guards (we > already have quite few for time) that makes me wonder if all this should not > be > done all in saturating arithmetic now when it can be done theoretically with > one > C++ class? Sounds like a good idea, although I don't know how intrusive that is. But can try. Still would like to use the 1000 for the 4.7 backport. Ok? -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] Fix handling of very long asm statements in inliner
On 21/02/13 15:59, Andi Kleen wrote: That doesn't sound enough, unless there is already code out there that respects this count. 1000 at 4 bytes per instruction is only 4k. More that small enough for the rest of the compiler to think that it could jump around such blocks cheaply. I think a limit of 1M or more might be more appropriate. I got an overflow for 6.4M, so 1M would be dangerously near that. 100k perhaps ? This was not for jump shortening, but the inliner heuristics. My mistake. However, the real problem here is that you're summing a potentially large number of items on the expectation that the result will be small enough to be scaled successfully by the weight function. A better patch would be to a apply a much larger limit at the time when the count of instructions is finally used, rather than picking some arbitrary number here in the hope that it won't cause overflow later on. Yes, that's a bigger patch, but it would be a more robust solution than picking an arbitrary number here. In the worst case we could separate the two, would be a larger patch though. -Andi
Re: Re: [PATCH] C++ math constants
On 02/21/13, Alec Teal wrote: On 21/02/13 16:32, Ulrich Drepper wrote: > How about the attached file as a start for . I used the > constexpr approach (instead of function calls) and replicated the > constants that are available in in Unix. > > What other constants to add? Pi/3 ln(3) ln(10) (for base conversions) sqrt(3) sqrt(5) sqrt(7) 1/e I'll write down what I use in my day to day stuff, keep you posted. I use the Euler constant a *lot* in special functions. Also, I use the golden ratio to size dialog boxes in GUI work. // Constant @f$ @f$. static constexpr _RealType __pi_third(1.0471975511965977461542144610931676L); // Constant @f$ \gamma_E @f$. static constexpr _RealType __gamma_e(0.5772156649015328606065120900824024L); // Constant @f$ \phi = (1 + \sqrt(5))/2 @f$. static constexpr _RealType __golden_ratio(1.6180339887498948482045868343656381L);
Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)
On 02/21/2013 08:42 AM, Jakub Jelinek wrote: > @@ -1243,12 +1258,20 @@ resolve_shift_zext (rtx insn) >dest_reg = simplify_gen_subreg_concatn (word_mode, SET_DEST (set), >GET_MODE (SET_DEST (set)), >offset1); > - dest_zero = simplify_gen_subreg_concatn (word_mode, SET_DEST (set), > - GET_MODE (SET_DEST (set)), > - offset2); > + dest_upper = simplify_gen_subreg_concatn (word_mode, SET_DEST (set), > + GET_MODE (SET_DEST (set)), > + offset2); >src_reg = simplify_gen_subreg_concatn (word_mode, op_operand, > GET_MODE (op_operand), > src_offset); > + if (GET_CODE (op) == ASHIFTRT > + && INTVAL (XEXP (op, 1)) != 2 * BITS_PER_WORD - 1) > +{ > + rtx tem = expand_shift (RSHIFT_EXPR, word_mode, copy_rtx (src_reg), > + BITS_PER_WORD - 1, dest_upper, 0); > + if (dest_upper != tem) > + emit_move_insn (dest_upper, tem); > +} >if (GET_CODE (op) != ZERO_EXTEND) > { >int shift_count = INTVAL (XEXP (op, 1)); > @@ -1257,12 +1280,15 @@ resolve_shift_zext (rtx insn) > LSHIFT_EXPR : RSHIFT_EXPR, > word_mode, src_reg, > shift_count - BITS_PER_WORD, > - dest_reg, 1); > + dest_reg, GET_CODE (op) != ASHIFTRT); > } > >if (dest_reg != src_reg) > emit_move_insn (dest_reg, src_reg); > - emit_move_insn (dest_zero, CONST0_RTX (word_mode)); > + if (GET_CODE (op) != ASHIFTRT) > +emit_move_insn (dest_upper, CONST0_RTX (word_mode)); > + else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1) > +emit_move_insn (dest_upper, copy_rtx (src_reg)); >insns = get_insns (); Am I missing something? This looks like it would clobber the input too early in the case of (set (reg:DI x) (ashiftrt (reg:DI x) (const_int 60))) where src_reg and dest_upper could resolve to the same concatn, and thus the same SImode registers underneath? Don't you need to delay that upper copy til the final block? r~
Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)
On Thu, Feb 21, 2013 at 10:40:18AM -0800, Richard Henderson wrote: > Am I missing something? This looks like it would clobber the input too > early in the case of > > (set (reg:DI x) (ashiftrt (reg:DI x) (const_int 60))) > > where src_reg and dest_upper could resolve to the same concatn, and thus > the same SImode registers underneath? > > Don't you need to delay that upper copy til the final block? You're right, thanks for catching that. So, do you prefer e.g. the following, or would you instead prefer that I just set some rtx to copy_rtx (src_reg) before the if (GET_CODE (op) != ZERO_EXTEND) { ... } block and do the >> (BITS_PER_WORD - 1) shift only after this (then it can be with dest_upper as target and moved only if expand_shift returns something different)? For the patch below, I've verified no codegen changes from the earlier patch on the 3 testcases from the PR. 2013-02-21 Jakub Jelinek PR rtl-optimization/50339 * lower-subreg.h (struct lower_subreg_choices): Add splitting_ashiftrt field. * lower-subreg.c (compute_splitting_shift): Handle ASHIFTRT. (compute_costs): Call compute_splitting_shift also for ASHIFTRT into splitting_ashiftrt field. (find_decomposable_shift_zext, resolve_shift_zext): Handle also ASHIFTRT. (dump_choices): Fix up printing LSHIFTRT choices, print ASHIFTRT choices. --- gcc/lower-subreg.h.jj 2013-02-21 14:10:39.033592663 +0100 +++ gcc/lower-subreg.h 2013-02-21 15:26:18.773634801 +0100 @@ -34,6 +34,7 @@ struct lower_subreg_choices { should be split. */ bool splitting_ashift[MAX_BITS_PER_WORD]; bool splitting_lshiftrt[MAX_BITS_PER_WORD]; + bool splitting_ashiftrt[MAX_BITS_PER_WORD]; /* True if there is at least one mode that is worth splitting. */ bool something_to_do; --- gcc/lower-subreg.c.jj 2013-02-21 14:10:38.975592966 +0100 +++ gcc/lower-subreg.c 2013-02-21 19:49:02.289774205 +0100 @@ -57,9 +57,9 @@ along with GCC; see the file COPYING3. to do this. This pass only splits moves with modes that are wider than - word_mode and ASHIFTs, LSHIFTRTs and ZERO_EXTENDs with integer - modes that are twice the width of word_mode. The latter could be - generalized if there was a need to do this, but the trend in + word_mode and ASHIFTs, LSHIFTRTs, ASHIFTRTs and ZERO_EXTENDs with + integer modes that are twice the width of word_mode. The latter + could be generalized if there was a need to do this, but the trend in architectures is to not need this. There are two useful preprocessor defines for use by maintainers: @@ -152,7 +152,7 @@ compute_splitting_shift (bool speed_p, s bool *splitting, enum rtx_code code, int word_move_zero_cost, int word_move_cost) { - int wide_cost, narrow_cost, i; + int wide_cost, narrow_cost, upper_cost, i; for (i = 0; i < BITS_PER_WORD; i++) { @@ -163,13 +163,20 @@ compute_splitting_shift (bool speed_p, s else narrow_cost = shift_cost (speed_p, rtxes, code, word_mode, i); + if (code != ASHIFTRT) + upper_cost = word_move_zero_cost; + else if (i == BITS_PER_WORD - 1) + upper_cost = word_move_cost; + else + upper_cost = shift_cost (speed_p, rtxes, code, word_mode, +BITS_PER_WORD - 1); + if (LOG_COSTS) fprintf (stderr, "%s %s by %d: original cost %d, split cost %d + %d\n", GET_MODE_NAME (twice_word_mode), GET_RTX_NAME (code), -i + BITS_PER_WORD, wide_cost, narrow_cost, -word_move_zero_cost); +i + BITS_PER_WORD, wide_cost, narrow_cost, upper_cost); - if (FORCE_LOWERING || wide_cost >= narrow_cost + word_move_zero_cost) + if (FORCE_LOWERING || wide_cost >= narrow_cost + upper_cost) splitting[i] = true; } } @@ -248,6 +255,9 @@ compute_costs (bool speed_p, struct cost compute_splitting_shift (speed_p, rtxes, choices[speed_p].splitting_lshiftrt, LSHIFTRT, word_move_zero_cost, word_move_cost); + compute_splitting_shift (speed_p, rtxes, + choices[speed_p].splitting_ashiftrt, ASHIFTRT, + word_move_zero_cost, word_move_cost); } } @@ -1153,6 +1163,7 @@ find_decomposable_shift_zext (rtx insn, op = SET_SRC (set); if (GET_CODE (op) != ASHIFT && GET_CODE (op) != LSHIFTRT + && GET_CODE (op) != ASHIFTRT && GET_CODE (op) != ZERO_EXTEND) return false; @@ -1173,6 +1184,8 @@ find_decomposable_shift_zext (rtx insn, { bool *splitting = (GET_CODE (op) == ASHIFT ? choices[speed_p].splitting_ashift +: GET_CODE (op) == ASHIFTRT +? choices[speed_p].splitting_ashiftrt : choices[speed_p].splitting
Re: [PATCH] Handle ASHIFTRT with constant shift count >= BITS_PER_WORD in subreg lowering (PR rtl-optimization/50339)
On 02/21/2013 10:58 AM, Jakub Jelinek wrote: > So, do you prefer e.g. the following, or would you instead prefer > that I just set some rtx to copy_rtx (src_reg) before the > if (GET_CODE (op) != ZERO_EXTEND) > { > ... > } > block and do the >> (BITS_PER_WORD - 1) shift only after this (then it can > be with dest_upper as target and moved only if expand_shift returns > something different)? This new patch is fine. > For the patch below, I've verified no codegen changes from the earlier patch > on the 3 testcases from the PR. Excellent. r~
Re: [PATCH] C++ math constants
On Thu, 2013-02-21 at 17:14 +, Alec Teal wrote: > On 21/02/13 16:32, Ulrich Drepper wrote: > > How about the attached file as a start for . I used the > > constexpr approach (instead of function calls) and replicated the > > constants that are available in in Unix. > > > > What other constants to add? > Pi/3 > 1/e > I'm wondering, why adding constants for these if the compiler can figure it out from e.g. "math::pi/3" (or whatever the namespace is)? > ln(3) > ln(10) (for base conversions) > sqrt(3) > sqrt(5) > sqrt(7) Aren't these evaluated by the compiler and converted to constants? Cheers, Oleg
[LIBFFI] Re: Re: [PATCH] Add support for PaX enable kernels (MPROTECT)
On 07/11/2012 00:14, Magnus Granberg wrote: > 2012-11-07 Magnus Granberg Pavel Labushev >* configure.ac: Add --enable-pax_emutramp for PaX enable kernels. >* src/closures.c: Add emutramp_enabled_check. Don't mmap with PROT_EXEC > on PaX enable Kernels. >* README: Add description for --enable-pax_emutramp. >* fficonfig.h.in: Rebuilt. >* configure.ac: Rebuilt. Hi lists, There was a small problem with this (upstream relative to gcc) libffi patch(*). The entire #ifdef FFI_MMAP_EXEC_EMUTRAMP_PAX clause is contained within an outer #if !defined(X86_WIN32) && !defined(X86_WIN64) clause. That means that Windows platforms don't get the default definition of is_emutramp_enabled() supplied by the #else clause. However, is_emutramp_enabled() is unconditionally referenced in dlmmap(), and without this default definition Windows targets fail to compile. The attached patch fixes this by moving the #else clause with the default is_emutramp_enabled() definition to a standalone #ifndef clause outside any enclosing conditional compilation test. I couldn't think of a better way to do it; the #if !(windows) clause is followed by a #elif (cygwin/interix) clause, so I'd have had to put a default definition in there and also a second one in a subsequent unconditional #else if I didn't move it out of the enclosing #if scope altogether. Gcc-patches: Assuming AG approves, can we commit this without waiting for an upstream libffi release and doing a full merge? Currently GCC HEAD won't build libffi (and hence libjava) without it. 2013-02-21 Dave Korn * src/closures.c (is_emutramp_enabled [!FFI_MMAP_EXEC_EMUTRAMP_PAX]): Move default definition outside enclosing #if scope. cheers, DaveK -- (*) - Patch: http://sourceware.org/ml/libffi-discuss/2012/msg00269.html - Thread: http://sourceware.org/ml/libffi-discuss/2012/threads.html#00247 Index: src/closures.c === --- src/closures.c (revision 196167) +++ src/closures.c (working copy) @@ -189,8 +189,6 @@ emutramp_enabled_check (void) #define is_emutramp_enabled() (emutramp_enabled >= 0 ? emutramp_enabled \ : (emutramp_enabled = emutramp_enabled_check ())) -#else -#define is_emutramp_enabled() 0 #endif /* FFI_MMAP_EXEC_EMUTRAMP_PAX */ #elif defined (__CYGWIN__) || defined(__INTERIX) @@ -202,6 +200,10 @@ emutramp_enabled_check (void) #endif /* !defined(X86_WIN32) && !defined(X86_WIN64) */ +#ifndef FFI_MMAP_EXEC_EMUTRAMP_PAX +#define is_emutramp_enabled() 0 +#endif /* FFI_MMAP_EXEC_EMUTRAMP_PAX */ + /* Declare all functions defined in dlmalloc.c as static. */ static void *dlmalloc(size_t); static void dlfree(void*);
Re: [PATCH] C++ math constants
On Thu, Feb 21, 2013 at 08:06:02PM +0100, Oleg Endo wrote: > On Thu, 2013-02-21 at 17:14 +, Alec Teal wrote: > > On 21/02/13 16:32, Ulrich Drepper wrote: > > > How about the attached file as a start for . I used the > > > constexpr approach (instead of function calls) and replicated the > > > constants that are available in in Unix. > > > > > > What other constants to add? > > Pi/3 > > > 1/e > > > > I'm wondering, why adding constants for these if the compiler can figure > it out from e.g. "math::pi/3" (or whatever the namespace is)? It can't, you generally want to avoid double rounding for these, first rounding infinite precision pi to the corresponding floating point format and then performing in that finite precision division by 3 (and rounding again to that precision) might very well give different results from dividing infinite precision pi by 3 and only then rounding the result to the corresponding precision. Haven't tried it, so you might be lucky and in some of float, double or long double it might be equal. > > ln(3) > > ln(10) (for base conversions) > > sqrt(3) > > sqrt(5) > > sqrt(7) > > Aren't these evaluated by the compiler and converted to constants? Only if optimizing. Furthermore, I don't think std::sqrt etc. are constexpr, therefore you can't use them to initialize constexpr vars or in constexpr functions. Jakub
Re: [LIBFFI] Re: Re: [PATCH] Add support for PaX enable kernels (MPROTECT)
On Thu, Feb 21, 2013 at 2:22 PM, Dave Korn wrote: > Gcc-patches: Assuming AG approves, can we commit this without waiting for an > upstream libffi release and doing a full merge? Currently GCC HEAD won't > build libffi (and hence libjava) without it. This patch looks fine, thanks. I don't plan to merge back into GCC for at least a week or two, so I think you should commit it to the GCC tree independently. This means that 3.0.12 is broken for Cygwin. Are you able to produce test results once you apply this change? I should probably issue a quick 3.0.13 if the results are decent. Thanks, AG > > 2013-02-21 Dave Korn > > * src/closures.c (is_emutramp_enabled [!FFI_MMAP_EXEC_EMUTRAMP_PAX]): > Move default definition outside enclosing #if scope. > > cheers, > DaveK > -- > (*) - Patch: http://sourceware.org/ml/libffi-discuss/2012/msg00269.html > - Thread: http://sourceware.org/ml/libffi-discuss/2012/threads.html#00247
Re: FW: [PATCH] [MIPS] microMIPS gcc support
"Moore, Catherine" writes: >> Looks good otherwise, but please post an updated patch. It's probably >> stating the obvious, but the patch is too invasive for this late stage >> of 4.8, so >> it'll need to wait until 4.9. >> > > The updated patch is attached. How's it looking this time? Almost there. :-) As I mentioned in an earlier reply, please always check for warnings. If you're not bootstrapping on a MIPS host then instead configure the cross compiler with --enable-werror-always, using a recent GCC as the host compiler. If the patch goes in with warnings, it'll break the build for other people. The warnings I could see are: > + for (n = 0; n < XVECLEN (pattern, 0); n++) > +{ > + rtx set, reg, mem, this_base; > + HOST_WIDE_INT this_offset; > + unsigned int this_regno; this_regno is an unused variable (thanks for making it unused though). > static bool > mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED) > { > + > + unsigned int compression_mode = mips_get_compress_mode (decl); compression_mode is unused. > +static bool > +umips_build_save_restore (mips_save_restore_fn fn, > + unsigned *mask, HOST_WIDE_INT *offset) > +{ > + int i, nregs; > + unsigned int j; > + rtx pattern, set, reg, mem; > + HOST_WIDE_INT this_offset; > + rtx this_base; > + > + /* Try matching $16 to $31 (s0 to ra). */ > + for (i = 0; i < ARRAY_SIZE (umips_swm_mask); i++) > +if ((*mask & 0x) == umips_swm_mask[i]) > + break; > + > + if (i == ARRAY_SIZE (umips_swm_mask)) > +return false; "i" needs to be unsigned to avoid a "comparison between signed and unsigned" warning. > +/* Return the assembly instruction for microMIPS lwm or swm. > + SAVE_P and PATTERN are as for umips_save_restore_pattern_p. */ > + > +const char * > +umips_output_save_restore (bool save_p, rtx pattern) > +{ > + static char buffer[300]; > + char *s; > + int n; > + HOST_WIDE_INT offset; > + rtx base, mem, set, reg, last_set, last_reg; > + > + /* Parse the pattern. */ > + gcc_assert (umips_save_restore_pattern_p (save_p, pattern)); > + > + s = strcpy (buffer, save_p ? "swm\t" : "lwm\t"); > + s += strlen (s); > + n = XVECLEN (pattern, 0); > + > + set = XVECEXP (pattern, 0, 0); > + reg = save_p ? SET_SRC (set) : SET_DEST (set); "reg" is unused and should be deleted. In case it sounds otherwise, I didn't build the patch to try to catch you out. I built it so that I could try to answer: > > Is this a test of the swap_p case? Thanks if so, but could you add a > > comment > > saying where the SWP gets generated, and why the test needs to be skipped > > at -O1 and -Os? > > > Yes, this was the swap_p case. But it turns out that umips-lpw-swp-1.c > was also testing the swap_p case. I've now dropped this test in favor > of a different test that exercises the swap_p = false case. > You had mentioned testing for explicit registers. This is turning out > to be problematic. The Linux and ELF tool chains don't always use the > same register set. In addition, -O3 seems to expose more opportunities > for generating these instructions so the code generated isn't consistent > among optimization options. I've changed it back to just testing for > swp, but I'm open to other suggestions. I've attached some tests below, both for LWP (which didn't seem to be covered) and SWP. They use things like multiplication to force a particular load or store order in scheduled code, so that both swap_p and !swap_p are covered (verified using both printfs and -fno-peephole2). umips-lwp-7.c tests for: /* Avoid invalid load pair instructions. */ if (load_p && REGNO (first_reg) == REGNO (base1)) return false; /* We must avoid this case for anti-dependence. Ex: lw $3, 4($3) lw $2, 0($3) first_reg is $2, but the base is $3. */ if (load_p && swap_p && REGNO (first_reg) + 1 == REGNO (base1)) return false; while umips-lwp-8.c is an example of something that would be pessimised by not having the swap_p test above. I also noticed a couple of other things while testing: - ISA_HAS_LOAD_DELAY should be false for microMIPS, otherwise we'll try to insert nops after LWP on targets that default to -mips1, and ICE when we see that we have a double SET. - In the testsuite, a -mmicromips test option should override an ambient -mips16, and vice versa. - umips-movep-2.c fails on n64 because there the moves are DImode rather than SImode. Since the initial patch is just for 32-bit, I forced -mgp32 for now. Also, the function needs a MICROMIPS attribute. There were various other minor formatting and comment nits that were probably easier for me to fix than describe. There was also some trailing whitespace. The patch is OK for 4.9 with this one applied on top of it, provided that it works with your setup of course. Thanks for your patience! Richard Index: gcc/config/mips/constraints.md ==
[committed] Fix expand_mult (PR middle-end/56420)
Hi! I've committed the following fix for the following testcase. When scalar_op1 is 0x8000 with 64-bit HWI, it matches EXACT_POWER_OF_2_OR_ZERO_P, but we should expand it as negation of the << 63 shift rather than the << 63 shift alone. The patch also improves multiplication by 0x8000 in TImode, which can be done as << 63 shift, and multiplication by -(((TImode)1) << N) for N 0 through 62, which can be also expanded as negation of left shift. Furthermore I've noticed several places where we could invoke signed integer overflows inside of the compiler and fixed them. Bootstrapped/regtested on x86_64-linux and i686-linux, acked by Richard on IRC, committed to trunk. 2013-02-21 Jakub Jelinek PR middle-end/56420 * expmed.c (EXACT_POWER_OF_2_OR_ZERO_P): Do subtraction in uhwi, to avoid signed wrapping. (expand_mult): Handle properly multiplication by ((dword_type) -1) << (BITS_PER_WORD - 1). Improve multiplication by ((dword_type) 1) << (BITS_PER_WORD - 1). Avoid undefined behavior in the compiler if coeff is HOST_WIDE_INT_MIN. (expand_divmod): Don't make ext_op1 static, change it's type to uhwi. Avoid undefined behavior in -INTVAL (op1). * gcc.dg/torture/pr56420.c: New test. --- gcc/expmed.c.jj 2013-02-13 21:46:52.0 +0100 +++ gcc/expmed.c2013-02-21 19:25:05.692298011 +0100 @@ -64,7 +64,8 @@ static rtx expand_smod_pow2 (enum machin static rtx expand_sdiv_pow2 (enum machine_mode, rtx, HOST_WIDE_INT); /* Test whether a value is zero of a power of two. */ -#define EXACT_POWER_OF_2_OR_ZERO_P(x) (((x) & ((x) - 1)) == 0) +#define EXACT_POWER_OF_2_OR_ZERO_P(x) \ + (((x) & ((x) - (unsigned HOST_WIDE_INT) 1)) == 0) struct init_expmed_rtl { @@ -3079,7 +3080,10 @@ expand_mult (enum machine_mode mode, rtx /* If we are multiplying in DImode, it may still be a win to try to work with shifts and adds. */ if (CONST_DOUBLE_HIGH (scalar_op1) == 0 - && CONST_DOUBLE_LOW (scalar_op1) > 0) + && (CONST_DOUBLE_LOW (scalar_op1) > 0 + || (CONST_DOUBLE_LOW (scalar_op1) < 0 + && EXACT_POWER_OF_2_OR_ZERO_P + (CONST_DOUBLE_LOW (scalar_op1) { coeff = CONST_DOUBLE_LOW (scalar_op1); is_neg = false; @@ -3109,7 +3113,8 @@ expand_mult (enum machine_mode mode, rtx use synth_mult. */ /* Special case powers of two. */ - if (EXACT_POWER_OF_2_OR_ZERO_P (coeff)) + if (EXACT_POWER_OF_2_OR_ZERO_P (coeff) + && !(is_neg && mode_bitsize > HOST_BITS_PER_WIDE_INT)) return expand_shift (LSHIFT_EXPR, mode, op0, floor_log2 (coeff), target, unsignedp); @@ -3124,13 +3129,24 @@ expand_mult (enum machine_mode mode, rtx result is interpreted as an unsigned coefficient. Exclude cost of op0 from max_cost to match the cost calculation of the synth_mult. */ + coeff = -(unsigned HOST_WIDE_INT) coeff; max_cost = (set_src_cost (gen_rtx_MULT (mode, fake_reg, op1), speed) - neg_cost(speed, mode)); - if (max_cost > 0 - && choose_mult_variant (mode, -coeff, &algorithm, - &variant, max_cost)) + if (max_cost <= 0) + goto skip_synth; + + /* Special case powers of two. */ + if (EXACT_POWER_OF_2_OR_ZERO_P (coeff)) + { + rtx temp = expand_shift (LSHIFT_EXPR, mode, op0, + floor_log2 (coeff), target, unsignedp); + return expand_unop (mode, neg_optab, temp, target, 0); + } + + if (choose_mult_variant (mode, coeff, &algorithm, &variant, + max_cost)) { - rtx temp = expand_mult_const (mode, op0, -coeff, NULL_RTX, + rtx temp = expand_mult_const (mode, op0, coeff, NULL_RTX, &algorithm, variant); return expand_unop (mode, neg_optab, temp, target, 0); } @@ -3813,13 +3829,12 @@ expand_divmod (int rem_flag, enum tree_c int op1_is_constant, op1_is_pow2 = 0; int max_cost, extra_cost; static HOST_WIDE_INT last_div_const = 0; - static HOST_WIDE_INT ext_op1; bool speed = optimize_insn_for_speed_p (); op1_is_constant = CONST_INT_P (op1); if (op1_is_constant) { - ext_op1 = INTVAL (op1); + unsigned HOST_WIDE_INT ext_op1 = UINTVAL (op1); if (unsignedp) ext_op1 &= GET_MODE_MASK (mode); op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1) @@ -3967,7 +3982,7 @@ expand_divmod (int rem_flag, enum tree_c op1_is_pow2 = (op1_is_constant && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1)) || (! unsignedp -
Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support
On Feb 20, 2013, at 3:50 PM, Maciej W. Rozycki wrote: > BTW, do you happen to know a way to reliable force all our testsuites NOT > to delete executables after run? Personally I think it's missing the > point to have them deleted -- how can one debug any regressions then? So, I go into the source and comment out the cleanup lines… Would be nice to have them conditional on a variable so that one can set that variable in their .dejagnurc file. The limitations of this are, multiple test cases can use the same intermediate files and they would not give you a nice 1 to 1 mapping back. However, this I usually side step by only running a subset of the test suite that contains the part I'm interested in. > What do other people do? We usually can cut and paste the one line and run the one case by hand. Test cases that don't fall into this category, well, suck, I mean, are more annoying.
do not pass PR_INSTRUMENTEDCODE if there is no instrumentation
It has come to my attention, by..ahem...you, that in handling the testcase from my previous patch: __transaction_relaxed { __asm__(""); } ...if we avoid instrumentation altogether, we shouldn't lie to the run-time and pass PR_INSTRUMENTEDCODE. I suppose we could cheat and avoid passing PR_INSTRUMENTEDCODE if we ever enter expand_block_tm(), but perhaps we could do a little better as with the attached patch. With this patch, we generate: tm_state.2_6 = __builtin__ITM_beginTransaction (16458); [ uninstrumentedCode hasNoAbort doesGoIrrevocable readOnly ] Notice there is no "instrumentedCode". By the way, can you double check my logic in expand_assign_tm() and expand_call_tm(), particularly if we should bother setting GTMA_INSTRUMENTED_CODE for thread private variables? Thanks. commit a441d916db334b71e76d9f4350ffdf992c058b4f Author: Aldy Hernandez Date: Thu Feb 21 16:02:11 2013 -0600 * trans-mem.c (expand_transaction): Only set PR_INSTRUMENTEDCODE if there was instrumented code. (expand_call_tm): Set GTMA_INSTRUMENTED_CODE. (expand_assign_tm): Same. * gimple.h (GTMA_INSTRUMENTED_CODE): New. diff --git a/gcc/gimple.h b/gcc/gimple.h index 4bd6b3d..baa5c24 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -661,6 +661,8 @@ struct GTY(()) gimple_statement_omp_atomic_store { tell the runtime that it should begin the transaction in serial-irrevocable mode. */ #define GTMA_DOES_GO_IRREVOCABLE (1u << 6) +/* The transaction has at least one instrumented instruction within. */ +#define GTMA_INSTRUMENTED_CODE (1u << 7) struct GTY(()) gimple_statement_transaction { diff --git a/gcc/testsuite/gcc.dg/tm/instrumented-mask.c b/gcc/testsuite/gcc.dg/tm/instrumented-mask.c new file mode 100644 index 000..6cfd3e4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tm/instrumented-mask.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-fgnu-tm -fdump-tree-tmmark" } */ + +/* If we're sure to go irrevocable, as in the case below, do not pass + PR_INSTRUMENTEDCODE to the run-time if there is nothing + instrumented within the transaction. */ + +int +main() +{ + __transaction_relaxed { __asm__(""); } + return 0; +} + +/* { dg-final { scan-tree-dump-times " instrumentedCode" 0 "tmmark" } } */ +/* { dg-final { cleanup-tree-dump "tmmark" } } */ diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c index 71eaa44..315f00c 100644 --- a/gcc/trans-mem.c +++ b/gcc/trans-mem.c @@ -2146,6 +2146,9 @@ expand_assign_tm (struct tm_region *region, gimple_stmt_iterator *gsi) { /* Add thread private addresses to log if applicable. */ requires_barrier (region->entry_block, lhs, stmt); + // ?? Should we set GTMA_INSTRUMENTED_CODE here if there is no + // instrumentation, but requires_barrier added a thread private + // address to the log? gsi_next (gsi); return; } @@ -2153,6 +2156,8 @@ expand_assign_tm (struct tm_region *region, gimple_stmt_iterator *gsi) // Remove original load/store statement. gsi_remove (gsi, true); + transaction_subcode_ior (region, GTMA_INSTRUMENTED_CODE); + if (load_p && !store_p) { transaction_subcode_ior (region, GTMA_HAVE_LOAD); @@ -2230,9 +2235,12 @@ expand_call_tm (struct tm_region *region, if (fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMCPY) || fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMMOVE)) -transaction_subcode_ior (region, GTMA_HAVE_STORE | GTMA_HAVE_LOAD); +transaction_subcode_ior (region, GTMA_HAVE_STORE +| GTMA_HAVE_LOAD +| GTMA_INSTRUMENTED_CODE); if (fn_decl == builtin_decl_explicit (BUILT_IN_TM_MEMSET)) -transaction_subcode_ior (region, GTMA_HAVE_STORE); +transaction_subcode_ior (region, GTMA_HAVE_STORE +| GTMA_INSTRUMENTED_CODE); if (is_tm_pure_call (stmt)) return false; @@ -2243,7 +2251,8 @@ expand_call_tm (struct tm_region *region, { /* Assume all non-const/pure calls write to memory, except transaction ending builtins. */ - transaction_subcode_ior (region, GTMA_HAVE_STORE); + transaction_subcode_ior (region, GTMA_HAVE_STORE + | GTMA_INSTRUMENTED_CODE); } /* For indirect calls, we already generated a call into the runtime. */ @@ -2602,7 +2611,7 @@ expand_transaction (struct tm_region *region, void *data ATTRIBUTE_UNUSED) flags |= PR_HASNOABORT; if ((subcode & GTMA_HAVE_STORE) == 0) flags |= PR_READONLY; -if (inst_edge) +if (inst_edge && (subcode & GTMA_INSTRUMENTED_CODE)) flags |= PR_INSTRUMENTEDCODE; if (uninst_edge) flags |= PR_UNINSTRUMENTEDCODE;
Re: [PATCH] C++ math constants
2013/2/21 Benjamin De Kosnik : > >> > How about the attached file as a start for . I used the >> > constexpr approach (instead of function calls) and replicated the >> > constants that are available in in Unix. > > then this should really be > > ext/cmath > >> >> 1) In this case I miss the corresponding variable definitions, because >> you violate the ODR, when you have something like the following: >> >> #include >> >> template >> void print(const T& t) { std::cout << t; } >> >> int main() { >> print(__math_constants::__pi); >> } > > Not seeing it. An implementation is not required to diagnose it. You will notice a linker error with Clang for example and this is conforming, because __math_constants::__pi is ODR-used. > Say for: > > #include > > // A class for math constants. > template > struct __math_constants > { > // Constant @f$ \pi @f$. > static constexpr _RealType __pie = > 3.1415926535897932384626433832795029L; }; > > template > void print(const T& t) { std::cout << t; } > > int main() > { > print(__math_constants::__pie); > return 0; > } > > I'm not getting any definition, even at -O0. > > Any chance you could expand on your thinking here Daniel? No ;-) - Daniel
Re: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support
>> What do other people do? > > We usually can cut and paste the one line and run the one case by hand. Test > cases that don't fall into this category, well, suck, I mean, are more > annoying. > These days I find contrib/repro_fail file.log.useful to rebuild the failing test(s). regards Ramana
RE: PR target/52555: attribute optimize is overriding command line options
On Mon, Feb 18, 2013 at 12:50:59PM -0600, Aldy Hernandez wrote: > OK pending tests? > PR target/52555 > * genopinit.c (raw_optab_handler): Use this_fn_optabs. > (swap_optab_enable): Same. > (init_all_optabs): Use argument instead of global. > * tree.h (struct tree_optimization_option): New field > target_optabs. > * expr.h (init_all_optabs): Add argument to prototype. > (TREE_OPTIMIZATION_OPTABS): New. > (save_optabs_if_changed): Protoize. > * optabs.h: Declare this_fn_optabs. > * optabs.c (save_optabs_if_changed): New. > Declare this_fn_optabs. > (init_optabs): Add argument to init_all_optabs() call. > * function.c (invoke_set_current_function_hook): Handle per > function optabs. > * function.h (struct function): New field optabs. > * config/mips/mips.c (mips_set_mips16_mode): Handle when > optimization_current_node has changed. > * target-globals.h (save_target_globals_default_opts): Protoize. > * target-globals.c (save_target_globals_default_opts): New. > c/family > PR target/52555 > * c-common.c (handle_optimize_attribute): Call > save_optabs_if_changed. Aldy, Have you gotten any reports of problems with this patch? It seems to be sending cc1 into an infinite loop during the GCC testsuite for me. I am testing the mips-mti-linux-gnu target and tests like gcc.target/mips/call-saved-1.c are causing cc1 to suck up all my memory and swap space before the test times out. I will keep digging and see if I can figure out what is going on but I wanted to see if anyone else has reported this problem. Steve Ellcey sell...@imgtec.com
Re: PR target/52555: attribute optimize is overriding command line options
Have you gotten any reports of problems with this patch? It seems to be sending cc1 into an infinite loop during the GCC testsuite for me. I am testing the mips-mti-linux-gnu target and tests like gcc.target/mips/call-saved-1.c are causing cc1 to suck up all my memory and swap space before the test times out. No, I have not. I will keep digging and see if I can figure out what is going on but I wanted to see if anyone else has reported this problem. Feel free to open a PR and CC me on it. Thanks.
Re: [PATCH] C++ math constants
On Thu, Feb 21, 2013 at 4:18 PM, Daniel Krügler wrote: > 2013/2/21 Benjamin De Kosnik : >> >>> > How about the attached file as a start for . I used the >>> > constexpr approach (instead of function calls) and replicated the >>> > constants that are available in in Unix. >> >> then this should really be >> >> ext/cmath >> >>> >>> 1) In this case I miss the corresponding variable definitions, because >>> you violate the ODR, when you have something like the following: >>> >>> #include >>> >>> template >>> void print(const T& t) { std::cout << t; } >>> >>> int main() { >>> print(__math_constants::__pi); >>> } >> >> Not seeing it. > > An implementation is not required to diagnose it. You will notice a > linker error with Clang for example and this is conforming, because > __math_constants::__pi is ODR-used. > >> Say for: >> >> #include >> >> // A class for math constants. >> template >> struct __math_constants >> { >> // Constant @f$ \pi @f$. >> static constexpr _RealType __pie = >> 3.1415926535897932384626433832795029L; }; >> >> template >> void print(const T& t) { std::cout << t; } >> >> int main() >> { >> print(__math_constants::__pie); >> return 0; >> } >> >> I'm not getting any definition, even at -O0. >> >> Any chance you could expand on your thinking here Daniel? > > No ;-) > > - Daniel There will be a proposal for 'constexpr variable template' in the Bristol mailing. It should take care of these issues. -- Gaby
Re: [PATCH] Fix handling of very long asm statements in inliner
On Thu, Feb 21, 2013 at 07:19:10PM +0100, Andi Kleen wrote: > > So I am fine with the cutoff. We may need to add more overflow guards (we > > already have quite few for time) that makes me wonder if all this should > > not be > > done all in saturating arithmetic now when it can be done theoretically > > with one > > C++ class? > > Sounds like a good idea, although I don't know how intrusive that is. But can > try. I looked at this now and it would be very intrusive. Not only in tree-inline, but also various callers elsewhere. And there's no saturating class either, so that would need to be added too. I think I prefer the simple solution. BTW this is a regression, old gccs compiled this. -Andi
Re: r196201 - in /trunk: gcc/ChangeLog gcc/config/i...
Jakub, thanks again for cleaning up my mess. Here is a question regarding your fix: > -#if ASAN_USE_PREINIT_ARRAY > +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC) The PIC macro is an artifact of the GCC build system and is not directly related the the -fPIC flag? As I can see, in the gcc build we compile all of asan sources twice: w/ and w/o "-fPIC -DPIC". If I move the preinit_array to a separate file (asan_preinit.cc), will we need to have two builds? I think we can just build all objects once with -fPIC and then not link asan_preinit.o into libasan.so In clang we build the static libasan with -fPIC, it doesn't hurt. Anyway, I've just committed http://llvm.org/viewvc/llvm-project?rev=175871&view=rev with asan_preinit.cc --kcc On Thu, Feb 21, 2013 at 5:26 PM, Konstantin Serebryany wrote: > On Thu, Feb 21, 2013 at 5:21 PM, Jakub Jelinek wrote: >> On Thu, Feb 21, 2013 at 05:15:51PM +0400, Konstantin Serebryany wrote: >>> This commit breaks the build if the BFD linker is used (I have gold on >>> my box, so I missed it). >>> >>> Short repro: >>> % cat preinit.cc >>> void foo() {} >>> __attribute__((section(".preinit_array"))) void (*xxx)(void) = foo; >>> % g++ preinit.cc -shared # gold >>> % sudo apt-get remove binutils-gold >>> ... >>> % g++ preinit.cc -shared # bfd >>> /usr/bin/ld: /tmp/cc4GVflE.o: .preinit_array section is not allowed in DSO >>> /usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable >>> section on output >>> collect2: ld returned 1 exit status >>> % >>> >>> Can we stop building the asan-rt as DSO and leave only the static >>> variant (as in clang)? >> >> No, IMNSHO it is desirable to support also that. > > It may cause I more trouble (we've seen a couple of bugs already) then do > good. > Anyway, we can get rid of this later. > >> >> Here is a different fix, so libasan.so will not have .preinit_array, but >> libasan.a will have it. Ideally, that hunk should go into a separate >> source file (asan_preinit.cc ?), be just compiled into an object file, >> rather than shared library and the compiler driver should include it >> explicitly in the link. >> >> The used attribute is there because, as it isn't (or shouldn't) be exported >> out of the library, if libasan was built with LTO, it could very well be >> optimized away. >> >> 2013-02-21 Jakub Jelinek >> >> * asan/asan_rtl.cc (__asan_preinit): Don't add if PIC macro is >> defined. Add used attribute. >> >> --- libsanitizer/asan/asan_rtl.cc.jj2013-02-21 14:10:41.0 +0100 >> +++ libsanitizer/asan/asan_rtl.cc 2013-02-21 14:16:28.985547506 +0100 >> @@ -520,11 +520,11 @@ void __asan_init() { >>} >> } >> >> -#if ASAN_USE_PREINIT_ARRAY >> +#if ASAN_USE_PREINIT_ARRAY && !defined (PIC) >>// On Linux, we force __asan_init to be called before anyone else >>// by placing it into .preinit_array section. >>// FIXME: do we have anything like this on Mac? >> - __attribute__((section(".preinit_array"))) >> + __attribute__((section(".preinit_array"), used)) >>void (*__asan_preinit)(void) =__asan_init; >> #elif defined(_WIN32) && defined(_DLL) >>// On Windows, when using dynamic CRT (/MD), we can put a pointer > > Thanks! > May I ask you to commit this to gcc (I have to run away now)? > I'll put this into upstream (maybe with asan_preinit.cc as you > suggest) tomorrow. > > --kcc > >> >> >> Jakub