Re: [i386] recognize haddpd

2012-09-28 Thread Uros Bizjak
On Wed, Sep 26, 2012 at 5:16 PM, Marc Glisse  wrote:
> Adding an x86 maintainer in Cc:

>>> this patch passes bootstrap+testsuite. It is probably wrong in many ways,
>>> but I don't know enough to do more without some advice.
>>>
>>> The goal is to recognize that v[0]+v[1] can be computed with haddpd. With
>>> the patch, v[0]-v[1] becomes hsubpd and v[1]+v[0] becomes haddpd. Also,
>>> thanks to it, {v[0]-v[1], w[0]-w[1]} is now recognized as a single hsubpd.
>>>
>>> 1) Is a define_insn the right tool?

Yes, the combine pass looks at insn patterns to determine which insn is correct.

>>> 2) {v[0]-v[1], v[0]-v[1]} is not recognized as a hsubpd because
>>> vec_duplicate doesn't match vec_concat. Do we really need to duplicate (no
>>> pun intended) the pattern?

You can add this transformation to simplify-rtx.c. Probably vec_concat
with two equal operands can be canonicalized as vec_duplicate.

>>> 3) v[0]+v[1] is not recognized. Some pass changed their order, and
>>> nothing tries the reverse order. I can see 3 ways: canonicalize the order at
>>> some point, let combine try both orders for commutative operators or make
>>> the patterns more flexible (I don't know how many would need changing).

In your case, split the macroized pattern into plus and minus. The plus
pattern should use the "const_0_to_1_operand" predicate, and in the insn
constraint, you just check that the operands are different.

>>> 4) I don't understand the set_attr part. I copied it from the haddpd
>>> define_insn, and removed (set_attr "type" "sseadd") because it crashed the
>>> compiler. isa and prefix make sense and they match the alternatives, but I
>>> am not sure about "mode" (removing it still works IIRC).

Somewhere deep inside the attribute calculations, it is assumed that
sseadd has two operands. Your pattern doesn't satisfy this assumption.
You have to add sseadd1; see, for example, the sseiadd/sseiadd1 pair.

Uros.
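
For readers following the thread, here is a minimal sketch of the source forms under discussion, assuming the GCC/Clang vector_size extension. The type name v2df and the function names are illustrative only and do not come from the patch; whether a given compiler actually emits haddpd/hsubpd for them depends on the target flags and on patterns like the one proposed above.

```c
/* Two doubles packed in one 16-byte vector (GCC/Clang vector extension).
   The name v2df is an illustrative choice, not taken from the patch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* v[0] + v[1]: the form the patch wants combine to match as haddpd.  */
double hadd (v2df v) { return v[0] + v[1]; }

/* v[1] + v[0]: same value, but the operand order differs in the RTL,
   which is the commutativity problem raised in question 3.  */
double hadd_swapped (v2df v) { return v[1] + v[0]; }

/* v[0] - v[1]: the hsubpd candidate.  */
double hsub (v2df v) { return v[0] - v[1]; }

/* {v[0]-v[1], w[0]-w[1]}: two horizontal subtracts that a single
   hsubpd could compute at once.  */
v2df hsub2 (v2df v, v2df w)
{
  v2df r = { v[0] - v[1], w[0] - w[1] };
  return r;
}
```

Compiling such a file with SSE3 enabled and inspecting the assembly is one way to check which of these forms a given compiler recognizes.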


Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Richard Guenther
On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov  wrote:
>   Originally I planned to submit LRA at the very beginning of stage1 for
> gcc4.9, as was discussed at this summer's GNU Tools Cauldron.  After
> some thinking, I've decided to submit LRA now but switched on only for
> the *x86/x86-64* target.  The reasons for that are
>   o I am already pretty confident in LRA for this target in terms of
> reliability, performance, code size, and compiler speed.
>   o I am confident that I can fix LRA bugs and pitfalls which might be
> recognized and reported during stage2 and 3 of gcc4.8.
>   o Wider LRA testing for x86/x86-64 will smooth the hard transition of
> other targets to LRA during gcc4.9 development.
>
>   During development of gcc4.9, I'd like to switch major targets to
> LRA as it was planned before.  I hope that all targets will be
> switched for the next release after gcc4.9 (although it will be
> dependent mostly on the target maintainers).  When/if it is done,
> reload and reload oriented machine-dependent code can be removed.
>
>   The LRA project was presented at the 2012 GNU Tools Cauldron
> (http://gcc.gnu.org/wiki/cauldron2012).  The presentation contains a
> high-level description of LRA and the project status.
>
>   The following patches make LRA work for x86/x86-64.  Taken separately,
> the patches mostly do nothing until the last patch switches on LRA for
> x86/x86-64.  Although the compiler bootstraps after applying each
> patch in the given order, the division is only for review convenience.
>
>   Any comments and proposals are appreciated.  Even if GCC community
> decides that it is too late to submit it to gcc4.8, the earlier reviews
> are always useful.

From a release-manager point of view the patch is "in time" for 4.8, in that
it is during stage1 (which I expect to last another two to four weeks).  Note
that there is no such thing as "stage2" anymore but we go straight to
"stage3" (bugfixing mode, no new features) from stage1.  After three months
of stage3 we go into regression-fixes only mode for as long as there are
release-blocking bugs (regressions with priority P1).  You will have roughly
half a year to fix LRA for 4.8.0 after stage1 closes.

Thanks,
Richard.

>   The patches were successfully bootstrapped and tested for x86/x86-64.
>


Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Steven Bosscher
On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov  wrote:
>   Any comments and proposals are appreciated.  Even if GCC community
> decides that it is too late to submit it to gcc4.8, the earlier reviews
> are always useful.

I would like to see some benchmark numbers, both for code quality and
compile time impact for the most notorious compile time hog PRs for
large routines where IRA performs poorly (e.g. PR54146, PR26854).

Ciao!
Steven


Re: [PATCH RFA] Implement register pressure directed hoist pass

2012-09-28 Thread Steven Bosscher
On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng  wrote:
> (get_regno_pressure_class, get_pressure_class_and_nregs)

Broken long lines in a ChangeLog entry end with a ",".


> (change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
> (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)

Please use the DF caches instead of note_stores, note_uses, etc.


> (free_bb_data): New.

Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
free_aux_for_blocks.

Ciao!
Steven


Re: RFC: LRA for x86/x86-64 [2/9]

2012-09-28 Thread Steven Bosscher
On Fri, Sep 28, 2012 at 12:57 AM, Vladimir Makarov  wrote:
> LRA outputs a lot of debug information about insns.  I found that using the slim
> insn/rtl presentation helps a lot for LRA debugging. The following patch
> makes the slim presentation printing functions visible to LRA.  It also
> implements one more such function.
>
> 2012-09-27  Vladimir Makarov  
>
> * rtl.h (debug_bb_n_slim, debug_bb_slim, print_value_slim): New
> prototypes.
> (debug_rtl_slim, debug_insn_slim): Ditto.
> * sched-vis.c (print_value_slim): New.

I have patches in the works to use the slim RTL dumping format more,
too, and to use the pretty-printer code so that printing strings with
escaped characters can be made more transparent (e.g. for use in
GraphViz dumps).

Perhaps it's time to rename sched-vis.c to print-rtl-slim.c? :-)

Ciao!
Steven


Re: vec_cond_expr adjustments

2012-09-28 Thread Richard Guenther
On Fri, Sep 28, 2012 at 12:42 AM, Marc Glisse  wrote:
> Hello,
>
> I have been experimenting with generating VEC_COND_EXPR from the front-end,
> and these are just a couple things I noticed.
>
> 1) optabs.c requires that the first argument of vec_cond_expr be a
> comparison, but verify_gimple_assign_ternary only checks is_gimple_condexpr,
> like for COND_EXPR. In the long term, it seems better to also allow ssa_name
> and vector_cst (thus match the is_gimple_condexpr condition), but for now I
> just want to know early if I created an invalid vec_cond_expr.

optabs should be fixed instead; an is_gimple_val condition is implicitly
val != 0.

> 2) a little refactoring of the code to find a suitable vector type for
> comparison results, and one more place where it should be used (no testcase
> yet because I don't know if that path can be taken without front-end changes
> first).

Yes, it looks fine to me.

> I did wonder, for tree-ssa-forwprop, about using directly TREE_TYPE
> (cond) without truth_type_for.

Yes, that should work.

> Hmm, now I am wondering whether I should have waited until I had front-end
> vec_cond_expr support to submit everything at once...

;)

The tree.[ch] and gimple-fold.c hunks are ok if tested properly, as is the
tree-ssa-forwprop.c idea of using TREE_TYPE (cond).

I don't like the tree-cfg.c change; instead, re-factor optabs.c to
get a decomposed cond for vector_compare_rtx and appropriately
"decompose" a non-comparison-class cond in expand_vec_cond_expr.

If we for example have

 predicate = a < b;
 x = predicate ? d : e;
 y = predicate ? f : g;

we ideally want to re-use the predicate computation on targets where
that would be optimal (and combine should be able to recover the
case where it is not).
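
As a rough source-level illustration of that reuse (not how GCC expands VEC_COND_EXPR internally), the sketch below computes the comparison mask once and feeds it to two selections. It assumes the GCC/Clang vector_size extension; v4si, blend and select_twice are hypothetical names chosen for this example.

```c
/* Four ints packed in one 16-byte vector (GCC/Clang vector extension).
   The type and function names here are illustrative assumptions.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise select: lanes where mask is all-ones take t, others take f.  */
v4si blend (v4si mask, v4si t, v4si f)
{
  return (mask & t) | (~mask & f);
}

/* Compute the predicate once and reuse it for both selections,
   mirroring  x = predicate ? d : e;  y = predicate ? f : g;  */
void select_twice (v4si a, v4si b, v4si d, v4si e, v4si f, v4si g,
                   v4si *x, v4si *y)
{
  v4si predicate = a < b;   /* each lane is -1 (true) or 0 (false) */
  *x = blend (predicate, d, e);
  *y = blend (predicate, f, g);
}
```

On targets where a comparison result lives naturally in a vector register, keeping the mask around like this avoids redoing the compare for the second select.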

Thanks,
Richard.

> 2012-09-27  Marc Glisse  
>
> * tree-cfg.c (verify_gimple_assign_ternary): Stricter check on
> first argument of VEC_COND_EXPR.
> * tree.c (truth_type_for): New function.
> * tree.h (truth_type_for): Declare.
> * gimple-fold.c (and_comparisons_1): Call it.
> (or_comparisons_1): Likewise.
> * tree-ssa-forwprop.c (forward_propagate_into_cond): Likewise.
>
> --
> Marc Glisse
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c (revision 191810)
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -549,21 +549,22 @@ static bool
>  forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
>  {
>gimple stmt = gsi_stmt (*gsi_p);
>tree tmp = NULL_TREE;
>tree cond = gimple_assign_rhs1 (stmt);
>bool swap = false;
>
>/* We can do tree combining on SSA_NAME and comparison expressions.  */
>if (COMPARISON_CLASS_P (cond))
>  tmp = forward_propagate_into_comparison_1 (stmt, TREE_CODE (cond),
> -  boolean_type_node,
> +  truth_type_for
> +(TREE_TYPE (cond)),
>TREE_OPERAND (cond, 0),
>TREE_OPERAND (cond, 1));
>else if (TREE_CODE (cond) == SSA_NAME)
>  {
>enum tree_code code;
>tree name = cond;
>gimple def_stmt = get_prop_source_stmt (name, true, NULL);
>if (!def_stmt || !can_propagate_from (def_stmt))
> return 0;
>
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c  (revision 191810)
> +++ gcc/tree-cfg.c  (working copy)
> @@ -3758,22 +3758,24 @@ verify_gimple_assign_ternary (gimple stm
>tree rhs2_type = TREE_TYPE (rhs2);
>tree rhs3 = gimple_assign_rhs3 (stmt);
>tree rhs3_type = TREE_TYPE (rhs3);
>
>if (!is_gimple_reg (lhs))
>  {
>error ("non-register as LHS of ternary operation");
>return true;
>  }
>
> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> -   ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> +  if (((rhs_code == COND_EXPR) ? !is_gimple_condexpr (rhs1)
> +   : (rhs_code == VEC_COND_EXPR) ? (!is_gimple_condexpr (rhs1)
> +   || is_gimple_val (rhs1))
> +   : !is_gimple_val (rhs1))
>|| !is_gimple_val (rhs2)
>|| !is_gimple_val (rhs3))
>  {
>error ("invalid operands in ternary operation");
>return true;
>  }
>
>/* First handle operations that involve different types.  */
>switch (rhs_code)
>  {
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c   (revision 191810)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -23,21 +23,20 @@ along with GCC; see the file COPYING3.
>  #include "coretypes.h"
>  #include "tm.h"
>  #include "tree.h"
>  #include "flags.h"
>  #include "function.h"
>  #include "dumpfile.h"
>  #include "tree-flow.h"
>  #include "t

[google] Emit relative addresses to function patch sections instead of absolute addresses. (issue6572065)

2012-09-28 Thread Harshit Chopra
commit fc3a55ccec9bc770c79f8a221f5abd397befc8f6
Author: Harshit Chopra 
Date:   Thu Sep 20 17:49:59 2012 -0700

Instead of emitting absolute addresses to the function patch sections, emit 
relative addresses. Absolute addresses might require relocation, which is time 
consuming and fraught with other issues.

M   gcc/config/i386/i386.c

Tested:
  Ran make check-gcc and manually confirmed that the affected tests pass.

ChangeLog:

2012-09-28  Harshit Chopra  

* gcc/config/i386/i386.c (ix86_output_function_nops_prologue_epilogue): 
Emit relative address to function patch sections.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f72b0b5..8c9334f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11098,7 +11098,7 @@ ix86_output_function_nops_prologue_epilogue (FILE *file,
$LFPEL0:
  
  0x90 (repeated num_actual_nops times)
- .quad $LFPESL0
+ .quad $LFPESL0 - .
  followed by section 'section_name' which contains the address
  of instruction at 'label'.
*/
@@ -0,7 +0,10 @@ ix86_output_function_nops_prologue_epilogue (FILE *file,
 asm_fprintf (file, ASM_BYTE"0x90\n");
 
   fprintf (file, ASM_QUAD);
+  /* Output "section_label - ." for the relative address of the entry in
+ the section 'section_name'.  */
   assemble_name_raw (file, section_label);
+  fprintf (file, " - .");
   fprintf (file, "\n");
 
   /* Emit the backpointer section. For functions belonging to comdat group,
@@ -11144,7 +11147,7 @@ ix86_output_function_nops_prologue_epilogue (FILE *file,
  .quad $LFPEL0
*/
   ASM_OUTPUT_INTERNAL_LABEL (file, section_label);
-  fprintf(file, ASM_QUAD"\t");
+  fprintf(file, ASM_QUAD);
   assemble_name_raw (file, label);
   fprintf (file, "\n");
 

--
This patch is available for review at http://codereview.appspot.com/6572065
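
To make the idea concrete, here is a small sketch in plain C of what a self-relative entry such as ".quad label - ." means for a consumer: the stored value is the distance from the slot to the target, so the absolute address is recovered by adding the slot's own address. The helper names store_relative and load_relative are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Store the distance from the slot itself to the target, which is what
   an entry of the form ".quad label - ." encodes.  Hypothetical helper.  */
void store_relative (int64_t *slot, const void *target)
{
  *slot = (int64_t) ((intptr_t) target - (intptr_t) slot);
}

/* Recover the absolute address by adding the slot's own address back.  */
const void *load_relative (const int64_t *slot)
{
  return (const void *) ((intptr_t) slot + (intptr_t) *slot);
}
```

Because only the distance is stored, moving the slot and its target together by the same amount leaves the stored value valid, which is why such entries need no load-time relocation.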


RE: [PATCH RFA] Implement register pressure directed hoist pass

2012-09-28 Thread Bin Cheng
Thanks for comments.

> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb@gmail.com]
> Sent: Friday, September 28, 2012 4:29 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; Eric Botcazou; Richard Sandiford;
> vmaka...@redhat.com
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng  wrote:
> > (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".

Did the mail client wrap this line? I found no line exceeding 80 characters,
or did I miss something?

> 
> 
> > (change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
> > (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
> 
> Please use the DF caches instead of note_stores, note_uses, etc.

This register pressure calculation code is copied from loop-invariant.c.
I was wondering whether we should just export these interfaces from
loop-invariant.c or copy them as in this patch. Maybe we can factor these
functions and data structures out into a common file? I would like to hear
your comments.

If we decide to keep this code in gcse.c, I will switch to the DF caches
as you suggest.
Moreover, it seems I still have to iterate over REG_NOTES to find
REG_UNUSED/REG_DEAD information; or can I get that from the DF caches too?
Thanks.
 
> 
> 
> > (free_bb_data): New.
> 
> Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
> free_aux_for_blocks.

Yes.






Re: [PATCH RFA] Implement register pressure directed hoist pass

2012-09-28 Thread Pedro Alves
On 09/28/2012 09:29 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng  wrote:
>> (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".

From the GNU Coding Standards:

Break long lists of function names by closing continued lines with ‘)’, rather 
than ‘,’, and opening the continuation with ‘(’ as in this example:

* keyboard.c (menu_bar_items, tool_bar_items)
(Fexecute_extended_command): Deal with 'keymap' property.

-- 
Pedro Alves



[PATCH] [MELT] Fixing echo stderr permission issue

2012-09-28 Thread Alexandre Lissy
Hello,

While updating my gcc melt package for debian/ubuntu, I ran into an
issue with /dev/stderr permissions. Some melt build scripts echo directly
to /dev/stderr, which does not seem correct. The attached patch
changes them to redirect to >&2, which is saner and works.
From ebc8fa49c1aee06160844012220bdf08c8e6bff9 Mon Sep 17 00:00:00 2001
From: Alexandre Lissy 
Date: Fri, 28 Sep 2012 12:23:30 +0200
Subject: [PATCH] 2012-09-26  Alexandre Lissy  

[gcc/]
	* melt-build-script.pl: Using >&2 instead of >/dev/stderr
---
 gcc/ChangeLog.MELT|3 ++-
 gcc/melt-build-script.tpl |   28 ++--
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/ChangeLog.MELT b/gcc/ChangeLog.MELT
index dac5c2a..734547e 100644
--- a/gcc/ChangeLog.MELT
+++ b/gcc/ChangeLog.MELT
@@ -1,4 +1,5 @@
-
+2012-09-28  Alexandre Lissy  
+	* melt-build-script.pl: using >&2 instead of >/dev/stderr
 
 2012-09-26  Basile Starynkevitch  
 
diff --git a/gcc/melt-build-script.tpl b/gcc/melt-build-script.tpl
index 55d0117..4d3eddf 100644
--- a/gcc/melt-build-script.tpl
+++ b/gcc/melt-build-script.tpl
@@ -65,7 +65,7 @@ date +"/*empty file for MELT build %c*/" > meltbuild-empty-file.c
 
 ## our error function  [+(.(fromline))+]
 function meltbuild_error () {
-echo MELT BUILD SCRIPT FAILURE: $@ > /dev/stderr
+echo MELT BUILD SCRIPT FAILURE: $@ >&2
 exit 1
 }
 
@@ -76,14 +76,14 @@ function meltbuild_symlink () {
 
 ## our info function
 function meltbuild_info () {
-echo MELT BUILD SCRIPT INFO: $@ > /dev/stderr
+echo MELT BUILD SCRIPT INFO: $@ >&2
 }
 
 ## our notice function - for more important things than info
 function meltbuild_notice () {
 meltnotititle=$1
 shift
-(echo; echo; echo MELT BUILD SCRIPT NOTICE "$meltnotititle:" $@ ; echo ) > /dev/stderr
+(echo; echo; echo MELT BUILD SCRIPT NOTICE "$meltnotititle:" $@ ; echo ) >&2
 if [ -n "$GCCMELT_BUILD_NOTIFICATION" ]; then
 	$GCCMELT_BUILD_NOTIFICATION "$meltnotititle:" "$*"
 fi
@@ -205,7 +205,7 @@ meltbuild_notice STAGE0+  [+(.(fromline))+] starting stage zero
 ##
   mv $MELT_ZERO_GENERATED_[+varsuf+]_BUILDMK-tmp$$ $MELT_ZERO_GENERATED_[+varsuf+]_BUILDMK
   meltbuild_info [+(.(fromline))+] generated stagezero makedep $MELT_ZERO_GENERATED_[+varsuf+]_BUILDMK
-  ls -l $MELT_ZERO_GENERATED_[+varsuf+]_BUILDMK > /dev/stderr
+  ls -l $MELT_ZERO_GENERATED_[+varsuf+]_BUILDMK >&2
 
   $GCCMELT_MAKE -f $GCCMELT_MODULE_MK melt_module \
   GCCMELT_FROM=stagezero-[+(.(fromline))+] \
@@ -221,7 +221,7 @@ meltbuild_notice STAGE0+  [+(.(fromline))+] starting stage zero
   || meltbuild_error  [+(.(fromline))+] stage0 [+base+] did not build "(with $GCCMELT_MAKE  -f $GCCMELT_MODULE_MK)" compiler $GCCMELT_COMPILER cflags $GCCMELT_COMPILER_FLAGS
 
   meltbuild_info [+(.(fromline))+] stage0 [+base+] module 
-  ls -l "$GCCMELT_STAGE_ZERO/[+base+].meltmod-$MELT_ZERO_GENERATED_[+varsuf+]_CUMULMD5.$GCCMELT_ZERO_FLAVOR.so" > /dev/stderr \
+  ls -l "$GCCMELT_STAGE_ZERO/[+base+].meltmod-$MELT_ZERO_GENERATED_[+varsuf+]_CUMULMD5.$GCCMELT_ZERO_FLAVOR.so" >&2 \
   || meltbuild_error  [+(.(fromline))+] stage0 [+base+] fail to build \
   "$GCCMELT_STAGE_ZERO/[+base+].meltmod-$MELT_ZERO_GENERATED_[+varsuf+]_CUMULMD5.$GCCMELT_ZERO_FLAVOR.so"
 
@@ -250,7 +250,7 @@ else
meltbuild_info [+(.(fromline))+] skipped stage0 because of stamp file $melt_stagezero_stamp
 fi
 
-meltbuild_info [+(.(fromline))+] times after stagezero at `date '+%x %H:%M:%S'`: ;  times > /dev/stderr
+meltbuild_info [+(.(fromline))+] times after stagezero at `date '+%x %H:%M:%S'`: ;  times >&2
 
 
 
@@ -301,7 +301,7 @@ function meltbuild_emit () {
 echo meltbuild-empty-file.c >>  $meltargs-$$-tmp
 mv $meltargs-$$-tmp $meltargs
 meltbuild_info $meltfrom argument file $meltargs is
-cat  $meltargs < /dev/null > /dev/stderr
+cat  $meltargs < /dev/null >&2
 if [ -z "$GCCMELT_SKIPEMITC" ]; then
 	$GCCMELT_CC1_PREFIX $GCCMELT_CC1 @$meltargs || meltbuild_error $meltfrom failed with arguments @$meltargs
 ## remove obsolete secondary C files left previously in $meltstage 
@@ -557,7 +557,7 @@ fi
 
 meltbuild_info [+(.(fromline))+] before applications GCCMELT_SKIPEMITC=$GCCMELT_SKIPEMITC.
 
-meltbuild_info [+(.(fromline))+] times before applications at `date '+%x %H:%M:%S'`: ;  times > /dev/stderr
+meltbuild_info [+(.(fromline))+] times before applications at `date '+%x %H:%M:%S'`: ;  times >&2
  
 melt_final_application_stamp=meltbuild-final-application.stamp
 
@@ -622,7 +622,7 @@ meltbuild_notice 'doing applications'  [+(.(fromline))+] doing applications
 [+ENDFOR melt_application_file+]
   echo "///end stamp $melt_final_application_stamp"  >> $meltappstamptemp
   $GCCMELT_MOVE_IF_CHANGE $meltappstamptemp  $melt_final_application_stamp
-meltbuild_info [+(.(fromline))+] times after applications at `date '+%x %H:%M:%S'`: ;  times > /dev/stderr
+meltbuild_info [+(.(fromline

[PATCH][LTO] "properly" stream PARM_DECLs

2012-09-28 Thread Richard Guenther

This avoids streaming PARM_DECLs both in the global type/decl and
the local function sections.  They are needed on the global level,
so properly stream them there.  Fixes part of the issues we have
with debug information for early inlining, too.

LTO Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-09-28  Richard Guenther  

PR lto/47799
* lto-streamer-out.c (tree_is_indexable): Make PARM_DECLs global.
(lto_output_tree_ref): Handle references to them.
(output_function): Do not output function arguments again.
* lto-streamer-in.c (input_function): Do not input arguments
again, nor overwrite them.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c.orig 2012-09-26 16:47:18.0 +0200
--- gcc/lto-streamer-out.c  2012-09-28 11:18:35.438055184 +0200
*** static bool
*** 125,131 
  tree_is_indexable (tree t)
  {
if (TREE_CODE (t) == PARM_DECL)
! return false;
else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
   && !TREE_STATIC (t))
  return false;
--- 125,131 
  tree_is_indexable (tree t)
  {
if (TREE_CODE (t) == PARM_DECL)
! return true;
else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
   && !TREE_STATIC (t))
  return false;
*** lto_output_tree_ref (struct output_block
*** 237,242 
--- 237,243 
  case VAR_DECL:
  case DEBUG_EXPR_DECL:
gcc_assert (decl_function_context (expr) == NULL || TREE_STATIC (expr));
+ case PARM_DECL:
streamer_write_record_start (ob, LTO_global_decl_ref);
lto_output_var_decl_index (ob->decl_state, ob->main_stream, expr);
break;
*** output_function (struct cgraph_node *nod
*** 806,814 
  
output_struct_function_base (ob, fn);
  
-   /* Output the head of the arguments list.  */
-   stream_write_tree (ob, DECL_ARGUMENTS (function), true);
- 
/* Output all the SSA names used in the function.  */
output_ssa_names (ob, fn);
  
--- 807,812 
Index: gcc/lto-streamer-in.c
===
*** gcc/lto-streamer-in.c.orig  2012-09-21 10:59:45.0 +0200
--- gcc/lto-streamer-in.c   2012-09-28 11:14:53.835068419 +0200
*** input_function (tree fn_decl, struct dat
*** 823,829 
gimple *stmts;
basic_block bb;
struct cgraph_node *node;
-   tree args, narg, oarg;
  
fn = DECL_STRUCT_FUNCTION (fn_decl);
tag = streamer_read_record_start (ib);
--- 823,828 
*** input_function (tree fn_decl, struct dat
*** 834,855 
  
input_struct_function_base (fn, data_in, ib);
  
-   /* Read all function arguments.  We need to re-map them here to the
-  arguments of the merged function declaration.  */
-   args = stream_read_tree (ib, data_in);
-   for (oarg = args, narg = DECL_ARGUMENTS (fn_decl);
-oarg && narg;
-oarg = TREE_CHAIN (oarg), narg = TREE_CHAIN (narg))
- {
-   unsigned ix;
-   bool res;
-   res = streamer_tree_cache_lookup (data_in->reader_cache, oarg, &ix);
-   gcc_assert (res);
-   /* Replace the argument in the streamer cache.  */
-   streamer_tree_cache_insert_at (data_in->reader_cache, narg, ix);
- }
-   gcc_assert (!oarg && !narg);
- 
/* Read all the SSA names.  */
input_ssa_names (ib, data_in, fn);
  
--- 833,838 


[C++ Patch] PR 54249

2012-09-28 Thread Paolo Carlini

Hi,

this patchlet fixes the issue reported by Daniel by simply adding 
nullptr_t to the global namespace in stddef.h (over which luckily we 
have control). Tested x86_64-linux.


Thanks,
Paolo.

/
2012-09-28  Paolo Carlini  

PR c++/54249
* ginclude/stddef.h: In C++11 mode declare nullptr_t in the global
namespace.

/testsuite
2012-09-28  Paolo Carlini  

PR c++/54249
* g++.dg/cpp0x/stddef.C: New.
Index: ginclude/stddef.h
===
--- ginclude/stddef.h   (revision 191823)
+++ ginclude/stddef.h   (working copy)
@@ -427,6 +427,13 @@ typedef struct {
 #endif
 #endif /* C11 or C++11.  */
 
+#if defined(__cplusplus) && __cplusplus >= 201103L
+#ifndef _GXX_NULLPTR_T
+#define _GXX_NULLPTR_T
+  typedef decltype(nullptr) nullptr_t;
+#endif
+#endif /* C++11.  */
+
 #endif /* _STDDEF_H was defined this time */
 
 #endif /* !_STDDEF_H && !_STDDEF_H_ && !_ANSI_STDDEF_H && !__STDDEF_H__
Index: testsuite/g++.dg/cpp0x/stddef.C
===
--- testsuite/g++.dg/cpp0x/stddef.C (revision 0)
+++ testsuite/g++.dg/cpp0x/stddef.C (working copy)
@@ -0,0 +1,6 @@
+// PR c++/54249
+// { dg-do compile { target c++11 } }
+
+#include 
+
+::nullptr_t n;


[PATCH] Fix up vector CONSTRUCTOR handling (PR tree-optimization/54713)

2012-09-28 Thread Jakub Jelinek
Hi!

As discussed in the PR, tree-vect-generic.c sometimes creates CONSTRUCTORs
with vector elements to build up larger vectors from vectors supported by
HW.  This patch teaches fold-const to bail out on those.  Vector CONSTRUCTOR
verification changes and assorted fixes have been separated from the patch
and will probably be posted next week.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
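
The positional lookup that the tree-vect-generic.c hunk switches to can be pictured with a toy model. The sketch below is not GCC code: struct toy_ctor and toy_vector_element are hypothetical stand-ins, and it assumes scalar elements only, with trailing elements that are not stored treated as zero.

```c
#include <stddef.h>

/* Toy stand-in for a vector CONSTRUCTOR with scalar elements: only the
   first nelts elements are stored, the rest are implicitly zero.
   Hypothetical type for illustration, not GCC's tree representation.  */
struct toy_ctor
{
  size_t nelts;
  const int *elts;
};

/* Positional lookup: use the element's position rather than a stored
   index, and fall back to zero past the explicitly stored elements.  */
int toy_vector_element (const struct toy_ctor *c, size_t index)
{
  if (index < c->nelts)
    return c->elts[index];
  return 0;
}
```

A positional scheme like this stops making sense once an element is itself a vector, since one stored element then covers several lanes; that is the case the patch declines to handle in the folder.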

2012-09-28  Jakub Jelinek  

PR tree-optimization/54713
* fold-const.c (vec_cst_ctor_to_array): Give up if vector CONSTRUCTOR
has vector elements.
(fold_ternary_loc) : Likewise.
* tree-vect-generic.c (vector_element): Don't rely on CONSTRUCTOR elts
indexes.  Use BIT_FIELD_REF if CONSTRUCTOR has vector elements.
(lower_vec_perm): Use NULL_TREE CONSTRUCTOR indexes.

* gcc.c-torture/compile/pr54713-1.c: New test.
* gcc.c-torture/compile/pr54713-2.c: New test.
* gcc.c-torture/compile/pr54713-3.c: New test.

--- gcc/fold-const.c.jj 2012-09-25 11:59:43.0 +0200
+++ gcc/fold-const.c2012-09-26 13:14:05.639000395 +0200
@@ -9559,7 +9559,7 @@ vec_cst_ctor_to_array (tree arg, tree *e
   constructor_elt *elt;
 
   FOR_EACH_VEC_ELT (constructor_elt, CONSTRUCTOR_ELTS (arg), i, elt)
-   if (i >= nelts)
+   if (i >= nelts || TREE_CODE (TREE_TYPE (elt->value)) == VECTOR_TYPE)
  return false;
else
  elts[i] = elt->value;
@@ -14030,22 +14030,35 @@ fold_ternary_loc (location_t loc, enum t
  unsigned i;
  if (CONSTRUCTOR_NELTS (arg0) == 0)
return build_constructor (type, NULL);
- vals = VEC_alloc (constructor_elt, gc, n);
- for (i = 0; i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
-  ++i)
-   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
-   CONSTRUCTOR_ELT
- (arg0, idx + i)->value);
- return build_constructor (type, vals);
+ if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
+0)->value))
+ != VECTOR_TYPE)
+   {
+ vals = VEC_alloc (constructor_elt, gc, n);
+ for (i = 0;
+  i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
+  ++i)
+   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
+   CONSTRUCTOR_ELT
+ (arg0, idx + i)->value);
+ return build_constructor (type, vals);
+   }
}
}
  else if (n == 1)
{
  if (TREE_CODE (arg0) == VECTOR_CST)
return VECTOR_CST_ELT (arg0, idx);
- else if (idx < CONSTRUCTOR_NELTS (arg0))
-   return CONSTRUCTOR_ELT (arg0, idx)->value;
- return build_zero_cst (type);
+ else if (CONSTRUCTOR_NELTS (arg0) == 0)
+   return build_zero_cst (type);
+ else if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
+ 0)->value))
+  != VECTOR_TYPE)
+   {
+ if (idx < CONSTRUCTOR_NELTS (arg0))
+   return CONSTRUCTOR_ELT (arg0, idx)->value;
+ return build_zero_cst (type);
+   }
}
}
}
--- gcc/tree-vect-generic.c.jj  2012-09-18 12:14:48.0 +0200
+++ gcc/tree-vect-generic.c 2012-09-26 14:23:40.742171292 +0200
@@ -1050,14 +1050,13 @@ vector_element (gimple_stmt_iterator *gs
 
   if (TREE_CODE (vect) == VECTOR_CST)
return VECTOR_CST_ELT (vect, index);
-  else if (TREE_CODE (vect) == CONSTRUCTOR)
+  else if (TREE_CODE (vect) == CONSTRUCTOR
+  && (CONSTRUCTOR_NELTS (vect) == 0
+  || TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (vect, 0)->value))
+ != VECTOR_TYPE))
 {
-  unsigned i;
-  tree elt_i, elt_v;
-
- FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (vect), i, elt_i, elt_v)
-if (operand_equal_p (elt_i, idx, 0))
-  return elt_v;
+ if (index < CONSTRUCTOR_NELTS (vect))
+   return CONSTRUCTOR_ELT (vect, index)->value;
   return build_zero_cst (vect_elt_type);
 }
   else
@@ -1215,7 +1214,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
t = v0_val;
 }
 
-  CONSTRUCTOR_APPEND_ELT (v, si, t);
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
 }
 
   constr = build_constructor (vect_type, v);

Re: [PATCH] Fix up vector CONSTRUCTOR handling (PR tree-optimization/54713)

2012-09-28 Thread Richard Guenther
On Fri, Sep 28, 2012 at 1:31 PM, Jakub Jelinek  wrote:
> Hi!
>
> As discussed in the PR, tree-vect-generic.c sometimes creates CONSTRUCTORs
> with vector elements to build up larger vectors from vectors supported by
> HW.  This patch teaches fold-const to bail out on those.  Vector CONSTRUCTOR
> verification changes and assorted fixes have been separated from the patch
> and will probably be posted next week.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2012-09-28  Jakub Jelinek  
>
> PR tree-optimization/54713
> * fold-const.c (vec_cst_ctor_to_array): Give up if vector CONSTRUCTOR
> has vector elements.
> (fold_ternary_loc) : Likewise.
> * tree-vect-generic.c (vector_element): Don't rely on CONSTRUCTOR elts
> indexes.  Use BIT_FIELD_REF if CONSTRUCTOR has vector elements.
> (lower_vec_perm): Use NULL_TREE CONSTRUCTOR indexes.
>
> * gcc.c-torture/compile/pr54713-1.c: New test.
> * gcc.c-torture/compile/pr54713-2.c: New test.
> * gcc.c-torture/compile/pr54713-3.c: New test.
>
> --- gcc/fold-const.c.jj 2012-09-25 11:59:43.0 +0200
> +++ gcc/fold-const.c2012-09-26 13:14:05.639000395 +0200
> @@ -9559,7 +9559,7 @@ vec_cst_ctor_to_array (tree arg, tree *e
>constructor_elt *elt;
>
>FOR_EACH_VEC_ELT (constructor_elt, CONSTRUCTOR_ELTS (arg), i, elt)
> -   if (i >= nelts)
> +   if (i >= nelts || TREE_CODE (TREE_TYPE (elt->value)) == VECTOR_TYPE)
>   return false;
> else
>   elts[i] = elt->value;
> @@ -14030,22 +14030,35 @@ fold_ternary_loc (location_t loc, enum t
>   unsigned i;
>   if (CONSTRUCTOR_NELTS (arg0) == 0)
> return build_constructor (type, NULL);
> - vals = VEC_alloc (constructor_elt, gc, n);
> - for (i = 0; i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
> -  ++i)
> -   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
> -   CONSTRUCTOR_ELT
> - (arg0, idx + i)->value);
> - return build_constructor (type, vals);
> + if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
> +0)->value))
> + != VECTOR_TYPE)
> +   {
> + vals = VEC_alloc (constructor_elt, gc, n);
> + for (i = 0;
> +  i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
> +  ++i)
> +   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
> +   CONSTRUCTOR_ELT
> + (arg0, idx + i)->value);
> + return build_constructor (type, vals);
> +   }
> }
> }
>   else if (n == 1)
> {
>   if (TREE_CODE (arg0) == VECTOR_CST)
> return VECTOR_CST_ELT (arg0, idx);
> - else if (idx < CONSTRUCTOR_NELTS (arg0))
> -   return CONSTRUCTOR_ELT (arg0, idx)->value;
> - return build_zero_cst (type);
> + else if (CONSTRUCTOR_NELTS (arg0) == 0)
> +   return build_zero_cst (type);
> + else if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
> + 0)->value))
> +  != VECTOR_TYPE)
> +   {
> + if (idx < CONSTRUCTOR_NELTS (arg0))
> +   return CONSTRUCTOR_ELT (arg0, idx)->value;
> + return build_zero_cst (type);
> +   }
> }
> }
> }
> --- gcc/tree-vect-generic.c.jj  2012-09-18 12:14:48.0 +0200
> +++ gcc/tree-vect-generic.c 2012-09-26 14:23:40.742171292 +0200
> @@ -1050,14 +1050,13 @@ vector_element (gimple_stmt_iterator *gs
>
>if (TREE_CODE (vect) == VECTOR_CST)
> return VECTOR_CST_ELT (vect, index);
> -  else if (TREE_CODE (vect) == CONSTRUCTOR)
> +  else if (TREE_CODE (vect) == CONSTRUCTOR
> +  && (CONSTRUCTOR_NELTS (vect) == 0
> +  || TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (vect, 0)->value))
> + != VECTOR_TYPE))
>  {
> -  unsigned i;
> -  tree elt_i, elt_v;
> -
> - FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (vect), i, elt_i, elt_v)
> -if (operand_equal_p (elt_i, idx, 0))
> -  return elt_v;
> + if (index < CONSTRUCTOR_NELTS (vect))
> +   return CONSTRUCTOR_ELT (vect, index)->value;
>return build_zero_cst (vect_elt_type

Re: Merge C++ conversion into trunk (4/6 - hash table rewrite)

2012-09-28 Thread Michael Matz
Hi,

On Thu, 27 Sep 2012, Lawrence Crowl wrote:

> > template 
> > struct D : B
> > {
> >   typedef typename B::E E; // element_type
> >   E getme (int index);
> > }
> 
> Inside that struct, let's say we have a field of type E.  Do we name
> it F or f?

IMHO only for types, not for any other decls.

> > > Do you have an alternate suggestion, one that does not confuse
> > > template parameters and dependent names?
> >
> > Upper last character?  Just kidding :)  Too many detailed rules
> > for conventions are the death of them; use rules of thumb.
> > Mine would be "somehow depends on template args -> has upper
> > character in name", where "somehow depends on" includes "is a".
> 
> Ah, but there is a problem.  That typedef name does not necessarily
> depend on a template parameter.
> 
> It is common practice to have
> 
> struct Q
> {
>   typedef int E;
>   E getme (int index);
> };

Easy: I wouldn't make a typedef for Q::E that merely is int.  The reason 
is that it makes knowing what getme really returns harder.  You have to 
look it up always (or know the class already).  In fact that's one of my 
gripes with the standard library, much too much indirection through 
entities merely referring to other entities.  That might only matter to
the library's implementors, but I sure hope that we don't start down that
road in GCC.

> In fact, one place is in the hash table code we are discussing.
> The hash descriptor type may not itself be a template.  I believe
> that few of them will actually be templates.

Then I don't see the need for class-local typedefs.

> So, if E implies comes from template, the implication is wrong.
> 
> If we were to follow C++ standard library conventions, we would call it 
> value_type.

Well, but value_type surely does depend on the hashtable's type argument,
doesn't it?  After all it is a typedef from 'Key'.
I would expect that htab<tree>::value_type is tree, and
htab<int>::value_type is int, and I would like to see it named
htab<T>::T or ::E.

> That would be my preference.  However, if folks want a shorter name, 
> I'll live with that too.  But as it stands, the current name is very 
> confusing.

I would even prefer 'e' over value_type.  It's scoped, the context always 
will be clear, no need to be verbose in that name.  I find the long names
inelegant, like most of the standard library's conventions.


Ciao,
Michael.


Re: Merge C++ conversion into trunk (4/6 - hash table rewrite)

2012-09-28 Thread Gabriel Dos Reis
On Fri, Sep 28, 2012 at 8:18 AM, Michael Matz  wrote:

>> It is common practice to have
>>
>> struct Q
>> {
>>   typedef int E;
>>   E getme (int index);
>> };
>
> Easy: I wouldn't make a typedef for Q::E that merely is int.  The reason
> is that it makes knowing what getme really returns harder.

The point of these nested types is precisely to allow *uniform* access
to associated types from within a template (e.g. a container), irrespective
of what those types happen to resolve to, builtin or not.
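
A minimal sketch of that point, assuming two hypothetical descriptor types
(int_descriptor and string_descriptor are made up for illustration and do
not come from the GCC sources).  Neither descriptor is a template, yet a
generic client can name the associated type the same way for both:

```cpp
#include <cassert>
#include <string>

// Hypothetical descriptors; neither one is a template, but both expose
// the same nested type name.
struct int_descriptor
{
  typedef int value_type;
  static value_type empty () { return 0; }
};

struct string_descriptor
{
  typedef std::string value_type;
  static value_type empty () { return std::string (); }
};

// A generic client refers to Descriptor::value_type without knowing
// whether it resolves to a builtin type or a class type.
template <typename Descriptor>
typename Descriptor::value_type
make_empty ()
{
  typename Descriptor::value_type v = Descriptor::empty ();
  return v;
}
```

Whether the nested name is spelled value_type, E or T is exactly the
naming question under discussion; the sketch only shows why the typedef
exists even when it "merely is int".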

-- Gaby


libgo patch committed: Use libbacktrace

2012-09-28 Thread Ian Lance Taylor
This patch to libgo changes it to use libbacktrace.  Previously
backtraces required the Go package debug/elf to register itself with the
runtime during the package initialization, which only worked if the
program actually imported debug/elf one way or another.  Bootstrapped
and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
mainline.

Ian


2012-09-28  Ian Lance Taylor  

* Makefile.def: Make all-target-libgo depend on
all-target-libbacktrace.
* Makefile.in: Rebuild.


diff -r 0126903cb089 libgo/Makefile.am
--- a/libgo/Makefile.am	Wed Sep 26 22:40:01 2012 -0700
+++ b/libgo/Makefile.am	Fri Sep 28 07:22:51 2012 -0700
@@ -39,7 +39,8 @@
 
 AM_CFLAGS = -fexceptions -fplan9-extensions $(SPLIT_STACK) $(WARN_CFLAGS) \
 	$(STRINGOPS_FLAG) $(OSCFLAGS) \
-	-I $(srcdir)/../libgcc -I $(MULTIBUILDTOP)../../gcc/include
+	-I $(srcdir)/../libgcc -I $(srcdir)/../libbacktrace \
+	-I $(MULTIBUILDTOP)../../gcc/include
 
 if USING_SPLIT_STACK
 AM_LDFLAGS = -XCClinker $(SPLIT_STACK)
@@ -1062,8 +1063,7 @@
 	go/debug/dwarf/unit.go
 go_debug_elf_files = \
 	go/debug/elf/elf.go \
-	go/debug/elf/file.go \
-	go/debug/elf/runtime.go
+	go/debug/elf/file.go
 go_debug_gosym_files = \
 	go/debug/gosym/pclntab.go \
 	go/debug/gosym/symtab.go
@@ -1782,7 +1782,8 @@
 libgo_la_LDFLAGS = $(PTHREAD_CFLAGS) $(AM_LDFLAGS)
 
 libgo_la_LIBADD = \
-	$(libgo_go_objs) $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
+	$(libgo_go_objs) ../libbacktrace/libbacktrace.la \
+	$(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
 
 libgobegin_a_SOURCES = \
 	runtime/go-main.c
diff -r 0126903cb089 libgo/go/debug/elf/elf_test.go
--- a/libgo/go/debug/elf/elf_test.go	Wed Sep 26 22:40:01 2012 -0700
+++ b/libgo/go/debug/elf/elf_test.go	Fri Sep 28 07:22:51 2012 -0700
@@ -2,10 +2,9 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-package elf_test
+package elf
 
 import (
-	. "debug/elf"
 	"fmt"
 	"testing"
 )
diff -r 0126903cb089 libgo/go/debug/elf/file_test.go
--- a/libgo/go/debug/elf/file_test.go	Wed Sep 26 22:40:01 2012 -0700
+++ b/libgo/go/debug/elf/file_test.go	Fri Sep 28 07:22:51 2012 -0700
@@ -2,11 +2,10 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-package elf_test
+package elf
 
 import (
 	"debug/dwarf"
-	. "debug/elf"
 	"encoding/binary"
 	"net"
 	"os"
diff -r 0126903cb089 libgo/go/debug/elf/runtime.go
--- a/libgo/go/debug/elf/runtime.go	Wed Sep 26 22:40:01 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +
@@ -1,161 +0,0 @@
-// Copyright 2012 The Go Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-
-// This is gccgo-specific code that uses DWARF information to fetch
-// file/line information for PC values.  This package registers itself
-// with the runtime package.
-
-package elf
-
-import (
-	"debug/dwarf"
-	"debug/macho"
-	"os"
-	"runtime"
-	"sort"
-	"sync"
-)
-
-func init() {
-	// Register our lookup functions with the runtime package.
-	runtime.RegisterDebugLookup(funcFileLine, symbolValue)
-}
-
-// The file struct holds information for a specific file that is part
-// of the execution.
-type file struct {
-	elf   *File   // If ELF
-	macho *macho.File // If Mach-O
-	dwarf *dwarf.Data // DWARF information
-
-	symsByName []sym // Sorted by name
-	symsByAddr []sym // Sorted by address
-}
-
-// Sort symbols by name.
-type symsByName []sym
-
-func (s symsByName) Len() int   { return len(s) }
-func (s symsByName) Less(i, j int) bool { return s[i].name < s[j].name }
-func (s symsByName) Swap(i, j int)  { s[i], s[j] = s[j], s[i] }
-
-// Sort symbols by address.
-type symsByAddr []sym
-
-func (s symsByAddr) Len() int   { return len(s) }
-func (s symsByAddr) Less(i, j int) bool { return s[i].addr < s[j].addr }
-func (s symsByAddr) Swap(i, j int)  { s[i], s[j] = s[j], s[i] }
-
-// The sym structure holds the information we care about for a symbol,
-// namely name and address.
-type sym struct {
-	name string
-	addr uintptr
-}
-
-// Open an input file.
-func open(name string) (*file, error) {
-	efile, err := Open(name)
-	var mfile *macho.File
-	if err != nil {
-		var merr error
-		mfile, merr = macho.Open(name)
-		if merr != nil {
-			return nil, err
-		}
-	}
-
-	r := &file{elf: efile, macho: mfile}
-
-	if efile != nil {
-		r.dwarf, err = efile.DWARF()
-	} else {
-		r.dwarf, err = mfile.DWARF()
-	}
-	if err != nil {
-		return nil, err
-	}
-
-	var syms []sym
-	if efile != nil {
-		esyms, err := efile.Symbols()
-		if err != nil {
-			return nil, err
-		}
-		syms = make([]sym, 0, len(esyms))
-		for _, s := range esyms {
-			if ST_TYPE(s.Info) == STT_FUNC {
-				syms = append(syms, sym{s.Name, uintptr(s.Value)})
-			}
-		}
-	} else {
-		syms = make([]sym, 0, len(mfile.Symtab.Syms))
-		for _, s := range mfile.Symtab.Syms {
-			syms = append(syms, sym{s.Name, uintptr(s.Value)})
-		}
-	}

[patch] fix cross build on powerpc*-*-freebsd

2012-09-28 Thread Andreas Tobler

Hi,

I didn't test building a cross compiler when I committed the port for
powerpc64-*-freebsd. Now I ran into trouble myself when I wanted to build
an amd64-freebsd -> powerpc64-freebsd cross compiler.

With the patch below I'm able to do so.

OK for trunk and 4.7 once I have completed the test suite runs on both branches?

TIA,

Andreas

2012-09-28  Andreas Tobler  

* config.gcc: Replace 'host' with 'target' when configuring for
powerpc64*-*-freebsd.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 191819)
+++ gcc/config.gcc  (working copy)
@@ -1919,7 +1919,7 @@
tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} rs6000/sysv4.h"
extra_options="${extra_options} rs6000/sysv4.opt"
 	tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} rs6000/t-ppccomm"
-   case ${host} in
+   case ${target} in
 powerpc64*)
tm_file="${tm_file} rs6000/default64.h rs6000/freebsd64.h"
tmake_file="${tmake_file} rs6000/t-freebsd64"


Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Vladimir Makarov

On 12-09-28 4:21 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov wrote:
>> Any comments and proposals are appreciated.  Even if GCC community
>> decides that it is too late to submit it to gcc4.8, the earlier reviews
>> are always useful.
>
> I would like to see some benchmark numbers, both for code quality and
> compile time impact for the most notorious compile time hog PRs for
> large routines where IRA performs poorly (e.g. PR54146, PR26854).


I should look at this, Steven. Unfortunately, the compiler @ trunk
(without my patch) crashes on PR54146:


../../../trunk2/slow.cc: In function ‘void check_() [with NT = CGAL::Gmpfi; int s = 3]’:

../../../trunk2/slow.cc:95489:6: internal compiler error: Segmentation fault
void check_(){
^
0x888adf crash_signal
/home/vmakarov/build1/trunk/gcc/gcc/toplev.c:335
0x8f4718 gimple_code
/home/vmakarov/build1/trunk/gcc/gcc/gimple.h:1126
0x8f4718 gimple_nop_p
/home/vmakarov/build1/trunk/gcc/gcc/gimple.h:4851
0x8f4718 walk_aliased_vdefs_1
/home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2204
0x8f50ed walk_aliased_vdefs(ao_ref_s*, tree_node*, bool (*)(ao_ref_s*, tree_node*, void*), void*, bitmap_head_def**)
/home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2240
0x9018b5 propagate_necessity
/home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:909
0x9027b3 perform_tree_ssa_dce
/home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:1584
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Getting the data for PR26854 will take a lot of time, so I will let you
know when I have it.


Regarding compilation time: I reported the compilation time at the GCC
Cauldron for -O2/-O3 and Richard asked me about -O0. I did not have the
answer at that time. I have since checked the compilation time for
all_cp2k_fortran.f90 (500K lines of Fortran). The compilation time (user
and real time) was the same (no visible differences) for GCC with reload
and for GCC with LRA at -O0.


When I started the LRA project, my major design decision was to reflect LRA
decisions in RTL as much as possible. This simplifies LRA and makes it
easy to maintain, and it is quite different from the reload design. I
realized at the time that LRA would be slower than reload because of this
decision: reload works on a specialized, very fast representation and,
roughly speaking, changes RTL only once, at the end of its work, when it
decides that it can generate correct RTL from that representation, while
LRA takes most of its info from RTL (a slightly simplified picture) and
changes RTL many times during its work.


It was a surprise to me that, after some hard work, I achieved the same GCC
speed as reload (or even 2-3% faster for all_cp2k_fortran.f90 on x86).
But if you check LRA through valgrind --tool=lackey, you will see that
LRA still, as I guessed before, executes more insns than reload. I think
that the same or better speed of LRA is achieved by better data and code
locality, and by smaller code size, which translates into faster work of
the subsequent passes.





Re: RFC: LRA for x86/x86-64 [2/9]

2012-09-28 Thread Vladimir Makarov

On 12-09-28 4:43 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 12:57 AM, Vladimir Makarov wrote:
>> LRA outputs a lot of debug information about insns.  I found that using slim
>> insn/rtl presentation helps a lot for LRA debugging. The following patch
>> makes slim presentation printing functions visible to LRA.  It also
>> implements one more such function.
>>
>> 2012-09-27  Vladimir Makarov
>>
>>  * rtl.h (debug_bb_n_slim, debug_bb_slim, print_value_slim): New
>>  prototypes.
>>  (debug_rtl_slim, debug_insn_slim): Ditto.
>>  * sched-vis.c (print_value_slim): New.
>
> I have patches in the works to use the slim RTL dumping format more,
> too, and to use the pretty-printer code so that printing strings with
> escaped characters can be made more transparent (e.g. for use in
> GraphViz dumps).

That would be nice.  Slim printing is very useful for LRA, which prints a
lot of changes in RTL code during all its work.  Regular printing is
unreadable because of its volume.  For LRA debugging I usually find a
suspicious place in the slim dump and, if I need more info, I use the
regular dump of the suspicious insn.

> Perhaps it's time to rename sched-vis.c to print-rtl-slim.c? :-)

Yes, the name sched-vis.c is very misleading.


Re: [patch] fix cross build on powerpc*-*-freebsd

2012-09-28 Thread Ian Lance Taylor
On Fri, Sep 28, 2012 at 4:30 AM, Andreas Tobler  wrote:
>
> 2012-09-28  Andreas Tobler  
>
> * config.gcc: Replace 'host' with 'target' when configuring for
> powerpc64*-*-freebsd.

Counts as obvious.

OK in any case.

Thanks.

Ian


Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant

2012-09-28 Thread Uros Bizjak
On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek  wrote:
> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>> After some off-line discussion with Richard, attached is v2 of the patch.
>>
>> 2012-09-27  Uros Bizjak  
>>
>> PR rtl-optimization/54457
>> * simplify-rtx.c (simplify_subreg):
>>   Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>   to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (y:DI) 0)).
>
> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

I have bootstrapped and regtested [1] the patch on
alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
were no additional failures.

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02828.html
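
For readers unfamiliar with the transformation in the ChangeLog above, here
is a minimal C++ sketch of the underlying arithmetic fact, using addition
as the example operation.  The function names are made up for
illustration, and which operations the patch actually handles is not
stated in this message.  The sketch assumes the subreg at offset 0 selects
the low 32 bits, as on a little-endian target:

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a 64-bit sum: the analogue of
// (subreg:SI (plus:DI x y) 0).
static std::uint32_t
narrow_after_add (std::uint64_t x, std::uint64_t y)
{
  return static_cast<std::uint32_t> (x + y);
}

// 32-bit sum of the low halves: the analogue of
// (plus:SI (subreg:SI x 0) (subreg:SI y 0)).
static std::uint32_t
add_after_narrow (std::uint64_t x, std::uint64_t y)
{
  return static_cast<std::uint32_t> (static_cast<std::uint32_t> (x)
                                     + static_cast<std::uint32_t> (y));
}
```

Both functions agree for every input because the low bits of a sum depend
only on the low bits of the operands.  That property does not hold for
operations such as division or right shift.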

Uros.


Re: [patch] fix cross build on powerpc*-*-freebsd

2012-09-28 Thread Andreas Tobler

On 28.09.12 17:21, Ian Lance Taylor wrote:
> On Fri, Sep 28, 2012 at 4:30 AM, Andreas Tobler wrote:
>>
>> 2012-09-28  Andreas Tobler
>>
>>  * config.gcc: Replace 'host' with 'target' when configuring for
>>  powerpc64*-*-freebsd.
>
> Counts as obvious.
>
> OK in any case.
>
> Thanks.

Thank you Ian!

I'll continue with testing, and once I have finished I'll commit to 4.7/trunk.

Andreas



Re: RFC: LRA for x86/x86-64 [2/9]

2012-09-28 Thread Jeff Law

On 09/28/2012 09:21 AM, Vladimir Makarov wrote:
> On 12-09-28 4:43 AM, Steven Bosscher wrote:
>> On Fri, Sep 28, 2012 at 12:57 AM, Vladimir Makarov wrote:
>>> LRA outputs a lot of debug information about insns.  I found that using
>>> slim insn/rtl presentation helps a lot for LRA debugging. The following
>>> patch makes slim presentation printing functions visible to LRA.  It also
>>> implements one more such function.
>>>
>>> 2012-09-27  Vladimir Makarov
>>>
>>>  * rtl.h (debug_bb_n_slim, debug_bb_slim, print_value_slim): New
>>>  prototypes.
>>>  (debug_rtl_slim, debug_insn_slim): Ditto.
>>>  * sched-vis.c (print_value_slim): New.
>>
>> I have patches in the works to use the slim RTL dumping format more,
>> too, and to use the pretty-printer code so that printing strings with
>> escaped characters can be made more transparent (e.g. for use in
>> GraphViz dumps).
>
> That would be nice.  Slim printing is very useful for LRA which prints a
> lot of changes in RTL code during all its work.  Regular printing is
> unreadable because of its volume.  For LRA debugging I usually find a
> suspicious place in slim dump and if I need more info I use regular dump
> of the suspicious insn.
>
>> Perhaps it's time to rename sched-vis.c to print-rtl-slim.c? :-)
>
> Yes, the name sched-vis.c is very misleading.

It seems to me this change ought to go forward now (rename to
print-rtl-slim.c and add print_value_slim).


WRT print_value_slim we have this block comment:




+/* Prints rtxes, I customarily classified as values.  They're
+   constants, registers, labels, symbols and memory accesses.  Print
+   them to file F.  */


That block comment just doesn't make sense when I read it.  It seems like
it should be something like:

/* Print X, an RTL value node, to file F in slim format.  Include
   additional information if VERBOSE is nonzero.

   Value nodes are constants, registers, labels, symbols and
   memory.  */


With that change I think you could make the obvious changes necessary to 
rename to print-rtl-slim and check this patch in.


jeff


[PATCH] PR c++/54401 - Confusing diagnostics about type-alias at class scope

2012-09-28 Thread Dodji Seketeli
Hello,

Consider this invalid example given in the PR, where T is not defined:

 1  template<typename>
 2  struct X {
 3  using type = T;
 4  };

g++ yields the confusing diagnostics:

test.cc:3:10: error: expected nested-name-specifier before 'type'
using type = T;
  ^
test.cc:3:10: error: using-declaration for non-member at class scope
test.cc:3:15: error: expected ';' before '=' token
using type = T;
   ^
test.cc:3:15: error: expected unqualified-id before '=' token

I think this is because in cp_parser_member_declaration we tentatively
parse an alias declaration; we then have a somewhat meaningful
diagnostic which alas is not emitted because we are parsing
tentatively.  As the parsing didn't succeed (because the input is
invalid) we try to parse a using declaration, which fails as well; but
then the diagnostic emitted is the one for the failed attempt at
parsing a using declaration, not an alias declaration.  Oops.

The idea of this patch is to detect in advance that we want to parse
an alias declaration, parse it non-tentatively, and then if an error
arises, emit it.
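
As a rough illustration of the "detect in advance" idea, here is a toy C++
sketch that peeks at a token stream to decide whether an alias declaration
is coming.  Everything in it is hypothetical: the tokens typedef and both
functions are stand-ins for illustration, not GCC's lexer or parser API,
and unlike the real patch it ignores attributes between the identifier
and the '='.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token stream; GCC's real lexer and parser types are far richer.
typedef std::vector<std::string> tokens;

// Return true if TOK looks like a plain identifier (and is not "using").
static bool
is_identifier (const std::string &tok)
{
  if (tok.empty ())
    return false;
  for (std::size_t i = 0; i < tok.size (); ++i)
    {
      char c = tok[i];
      bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
      bool digit = c >= '0' && c <= '9';
      if (!alpha && !(digit && i > 0))
        return false;
    }
  return tok != "using";
}

// Return true if the tokens starting at POS look like "using IDENT =".
// This only peeks: POS is passed by value, so the caller's position is
// unchanged, which plays the role of aborting a tentative parse.
static bool
expecting_alias_declaration_p (const tokens &toks, std::size_t pos)
{
  if (pos + 2 >= toks.size ())
    return false;
  return toks[pos] == "using"
         && is_identifier (toks[pos + 1])
         && toks[pos + 2] == "=";
}
```

Once the lookahead says "alias declaration", the caller can commit to that
parse, so any error it reports is the one for the construct the user
actually wrote.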

I also changed cp_parser_alias_declaration to get out directly when it
detects that the type-id is invalid, rather than going on nonetheless
and emitting more (irrelevant) error diagnostics.

We are now getting the following output:

test.cc:3:18: error: expected type-specifier before ‘T’
 using type = T;
  ^
test.cc:3:18: error: ‘T’ does not name a type

I don't really like the "before 'T'" there, but I think we could
revisit the format of what cp_parser_error emits in general, now that
we have caret diagnostics.  Maybe we could do away with the "before T"
altogether?

In the meantime, it seems to me that this patch brings an improvement
over what we already have in trunk, and the issue above could be
addressed separately.

Tested on x86_64-unknown-linux-gnu against trunk.

gcc/cp/

* parser.c (cp_parser_expecting_alias_declaration_p): New static
function.
(cp_parser_block_declaration): Use it.
(cp_parser_member_declaration): Likewise.  Don't parse the using
declaration tentatively.
(cp_parser_alias_declaration): Get out if the type-id is invalid.

gcc/testsuite/

* g++.dg/cpp0x/alias-decl-23.C: New test.
---
 gcc/cp/parser.c| 38 +++---
 gcc/testsuite/g++.dg/cpp0x/alias-decl-23.C |  7 ++
 2 files changed, 36 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-23.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e8c0378..cab2d09 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1937,6 +1937,8 @@ static bool cp_parser_using_declaration
   (cp_parser *, bool);
 static void cp_parser_using_directive
   (cp_parser *);
+static bool cp_parser_expecting_alias_declaration_p
+  (cp_parser*);
 static tree cp_parser_alias_declaration
   (cp_parser *);
 static void cp_parser_asm_definition
@@ -10292,11 +10294,7 @@ cp_parser_block_declaration (cp_parser *parser,
cp_parser_using_directive (parser);
   /* If the second token after 'using' is '=', then we have an
 alias-declaration.  */
-  else if (cxx_dialect >= cxx0x
-  && token2->type == CPP_NAME
-  && ((cp_lexer_peek_nth_token (parser->lexer, 3)->type == CPP_EQ)
-  || (cp_lexer_peek_nth_token (parser->lexer, 3)->keyword
-  == RID_ATTRIBUTE)))
+  else if (cp_parser_expecting_alias_declaration_p (parser))
cp_parser_alias_declaration (parser);
   /* Otherwise, it's a using-declaration.  */
   else
@@ -15079,6 +15077,24 @@ cp_parser_using_declaration (cp_parser* parser,
   return true;
 }
 
+/* Return TRUE if the coming tokens reasonably denote the beginning of
+   an alias declaration.  */
+
+static bool
+cp_parser_expecting_alias_declaration_p (cp_parser* parser)
+{
+  if (cxx_dialect < cxx0x)
+return false;
+  cp_parser_parse_tentatively (parser);
+  cp_parser_require_keyword (parser, RID_USING, RT_USING);
+  cp_parser_identifier (parser);
+  cp_parser_attributes_opt (parser);
+  cp_parser_require (parser, CPP_EQ, RT_EQ);
+  bool is_ok = !cp_parser_error_occurred (parser);
+  cp_parser_abort_tentative_parse (parser);
+  return is_ok;
+}
+
 /* Parse an alias-declaration.
 
alias-declaration:
@@ -15141,6 +15157,9 @@ cp_parser_alias_declaration (cp_parser* parser)
   if (parser->num_template_parameter_lists)
 parser->type_definition_forbidden_message = saved_message;
 
+  if (type == error_mark_node)
+return error_mark_node;
+
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
   if (cp_parser_error_occurred (parser))
@@ -18849,10 +18868,11 @@ cp_parser_member_declaration (cp_parser* parser)
   else
{
  tree decl;
- cp_parser_parse_tentatively (parser);
- decl = cp_parser_alias_declaration (parser);
- 

Re: Defining C99 predefined macros for whole translation unit

2012-09-28 Thread Joseph S. Myers
On Fri, 24 Apr 2009, Ian Lance Taylor wrote:

> This patch to gcc is OK if the patch to glibc is OK.

Now that the corresponding glibc patch is in glibc 2.16, I've updated
the GCC patch (original submission
) for current
sources and retested it.  Although the original version was approved
conditional on the glibc patch, some fairly substantial changes were
needed for changes to hook handling in the past three years, so I'm
resubmitting (the non-C parts of) this patch rather than just
committing it.  (The C parts also have fairly substantial changes
relating to PCH, which seems to have got rather more fragile and
difficult to keep working in the past three years; the PCH-related
bits of this patch may well not be particularly ideal, but they were
what I found that actually works for the PCH tests for both old and
new glibc headers.)

Bootstrapped with no regressions on x86_64-unknown-linux-gnu (older
glibc).  Also tested with cross to arm-none-linux-gnueabi (current
glibc).  OK to commit?

gcc:
2012-09-28  Joseph Myers  

* config.gcc (*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu |
*-*-knetbsd*-gnu | *-*-gnu* | *-*-kopensolaris*-gnu): Use
glibc-c.o in c_target_objs and cxx_target_objs.  Use t-glibc in
tmake_file.  Set target_has_targetcm.
(tilegx-*-linux*, tilepro-*-linux*): Append to c_target_objs and
cxx_target_objs rather than overriding previous value.
* config/glibc-c.c, config/t-glibc: New.
* doc/tm.texi.in (TARGET_C_PREINCLUDE): New @hook.
* doc/tm.texi: Regenerate.
* hooks.c (hook_constcharptr_void_null): New.
* hooks.h (hook_constcharptr_void_null): Declare.

gcc/c-family:
2012-09-28  Joseph Myers  

* c-common.h (pch_cpp_save_state): Declare.
* c-target.def (c_preinclude): New hook.
* c-opts.c (done_preinclude): New.
(push_command_line_include): Handle default preincluded header.
(cb_file_change): Call pch_cpp_save_state when calling
push_command_line_include.
* c-pch.c (pch_ready_to_save_cpp_state, pch_cpp_state_saved)
(pch_cpp_save_state): New.
(pch_init): Call pch_cpp_save_state conditionally, instead of
calling cpp_save_state.

gcc/testsuite:
2012-09-28  Joseph Myers  

* gcc.dg/c99-predef-1.c: New test.
* gcc.dg/cpp/cmdlne-dU-1.c, gcc.dg/cpp/cmdlne-dU-2.c,
gcc.dg/cpp/cmdlne-dU-3.c, gcc.dg/cpp/cmdlne-dU-4.c,
gcc.dg/cpp/cmdlne-dU-5.c, gcc.dg/cpp/cmdlne-dU-6.c,
gcc.dg/cpp/cmdlne-dU-7.c, gcc.dg/cpp/cmdlne-dU-8.c,
gcc.dg/cpp/cmdlne-dU-9.c, gcc.dg/cpp/cmdlne-dU-10.c,
gcc.dg/cpp/cmdlne-dU-11.c, gcc.dg/cpp/cmdlne-dU-12.c,
gcc.dg/cpp/cmdlne-dU-13.c, gcc.dg/cpp/cmdlne-dU-14.c,
gcc.dg/cpp/cmdlne-dU-15.c, gcc.dg/cpp/cmdlne-dU-16.c,
gcc.dg/cpp/cmdlne-dU-17.c, gcc.dg/cpp/cmdlne-dU-18.c,
gcc.dg/cpp/cmdlne-dU-19.c, gcc.dg/cpp/cmdlne-dU-20.c,
gcc.dg/cpp/cmdlne-dU-21.c, gcc.dg/cpp/cmdlne-dU-22.c,
gcc.dg/cpp/mi5.c, gcc.dg/cpp/multiline.c: Add -nostdinc to
dg-options.

libcpp:
2012-09-28  Joseph Myers  

* files.c (struct _cpp_file): Add implicit_preinclude.
(pch_open_file): Allow a previously opened implicitly included
file.
(_cpp_find_file): Add implicit_preinclude argument.  Free file and
do not call open_file_failed if implicit_preinclude.  Store
implicit_preinclude value.
(_cpp_stack_include, _cpp_fake_include, _cpp_compare_file_date):
Update calls to _cpp_find_file.
(_cpp_stack_include): Handle IT_DEFAULT.
(cpp_push_default_include): New.
* include/cpplib.h (cpp_push_default_include): Declare.
* init.c (cpp_read_main_file): Update call to _cpp_find_file.
* internal.h (enum include_type): Add IT_DEFAULT.
(_cpp_find_file): Update prototype.

Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 191711)
+++ gcc/doc/tm.texi (working copy)
@@ -10646,6 +10646,12 @@ convention when processing system header files, bu
 files @code{__STDC__} will always expand to 1.
 @end defmac
 
+@deftypefn {C Target Hook} {const char *} TARGET_C_PREINCLUDE (void)
+Define this hook to return the name of a header file to be included at the start of all compilations, as if it had been included with @code{#include <@var{file}>}.  If this hook returns @code{NULL}, or is not defined, or the header is not found, or if the user specifies @option{-ffreestanding} or @option{-nostdinc}, no header is included.
+
+This hook can be used together with a header provided by the system C library to implement ISO C requirements for certain macros to be predefined that describe properties of the whole implementation rather than just the compiler.
+@end deftypefn
+
 @defmac NO_IMPLICIT_EXTERN_C
 Define this macro if the system 

Re: RFC: LRA for x86/x86-64 [1/9]

2012-09-28 Thread Jeff Law

On 09/27/2012 04:56 PM, Vladimir Makarov wrote:
> The following patch adds a new argument for function alter_subreg.
> LRA will sometimes call alter_subreg with a different argument value.
>
> 2012-09-27  Vladimir Makarov
>
>  * output.h (alter_subreg): Add new argument.
>  * dbxout.c (dbxout_symbol_location): Pass new argument to
>  alter_subreg.
>  * sdbout.c (sdbout_symbol): Pass new argument to alter_subreg.
>  * final.c (final_scan_insn, cleanup_subreg_operands): Pass new
>  argument to alter_subreg.
>  (walk_alter_subreg, output_operand): Ditto.
>  (alter_subreg): Add new argument.
>  * config/m32r/m32r.c (gen_split_move_double): Pass new argument to
>  alter_subreg.
>  * config/sh/sh.md: Ditto.
>  * config/xtensa/xtensa.c (fixup_subreg_mem): Ditto.
>  * config/m68k/m68k.c (emit_move_sequence): Ditto.
>  * config/arm/arm.c (load_multiple_sequence): Ditto.
>  (store_multiple_sequence): Ditto.
>  * config/pa/pa.c (pa_emit_move_sequence): Ditto.
>  * config/v850/v850.c (v850_reorg): Ditto.

This is OK.

jeff


*ping* [patch, fortran] Handle -Wextra, -fcompare-reals is implied with -Wextra

2012-09-28 Thread Thomas Koenig

I wrote:


the attached patch (this time for real!) implements
-Wextra for the Fortran front end, and adds -fcompare-reals
to -Wextra.


Ping?


[PATCH] Disable updating VRSAVE everywhere except Darwin

2012-09-28 Thread David Edelsohn
The following proposed patch disables setting, saving and restoring
the VRSAVE register on all targets except Darwin.

VRSAVE was removed from the AIX ABI and was supposed to have been
removed from the PPC SVR4 ABI.  All recent versions of the Linux
kernel set and maintain VRSAVE themselves, as a process-level flag, not as
individual bits, so there is no need for the compiler to set the register
or to save and restore it across calls.  All uses of VRSAVE (e.g., glibc)
will continue to work using the value set by the kernel.

Comments?

- David

* config/rs6000/rs6000.c (rs6000_option_override_internal): Do not
set TARGET_ALTIVEC_VRSAVE for TARGET_ELF.
(rs6000_stack_info): Only set vrsave_mask for Darwin.

Index: rs6000.c
===
*** rs6000.c(revision 191810)
--- rs6000.c(working copy)
*** rs6000_option_override_internal (bool gl
*** 2725,2734 
  else
rs6000_altivec_abi = 1;
}
-
-   /* Enable VRSAVE for AltiVec ABI, unless explicitly overridden.  */
-   if (!global_options_set.x_TARGET_ALTIVEC_VRSAVE)
-   TARGET_ALTIVEC_VRSAVE = rs6000_altivec_abi;
  }

/* Set the Darwin64 ABI as default for 64-bit Darwin.
--- 2725,2730 
*** rs6000_stack_info (void)
*** 17842,17848 
else
  info_ptr->spe_gp_size = 0;

!   if (TARGET_ALTIVEC_ABI)
  info_ptr->vrsave_mask = compute_vrsave_mask ();
else
  info_ptr->vrsave_mask = 0;
--- 17838,17845 
else
  info_ptr->spe_gp_size = 0;

!   /* Only set VRSAVE register on Darwin.  */
!   if (DEFAULT_ABI == ABI_DARWIN)
  info_ptr->vrsave_mask = compute_vrsave_mask ();
else
  info_ptr->vrsave_mask = 0;


Re: vec_cond_expr adjustments

2012-09-28 Thread Marc Glisse

On Fri, 28 Sep 2012, Richard Guenther wrote:


> On Fri, Sep 28, 2012 at 12:42 AM, Marc Glisse wrote:
>> Hello,
>>
>> I have been experimenting with generating VEC_COND_EXPR from the front-end,
>> and these are just a couple things I noticed.
>>
>> 1) optabs.c requires that the first argument of vec_cond_expr be a
>> comparison, but verify_gimple_assign_ternary only checks is_gimple_condexpr,
>> like for COND_EXPR. In the long term, it seems better to also allow ssa_name
>> and vector_cst (thus match the is_gimple_condexpr condition), but for now I
>> just want to know early if I created an invalid vec_cond_expr.
>
> optabs should be fixed instead, an is_gimple_val condition is implicitly
> val != 0.


For vectors, I think it should be val < 0 (with an appropriate cast of val 
to a signed integer vector type if necessary). Or (val & highbit) != 0, 
but that's longer.
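
To illustrate the two spellings, here is a minimal C++ sketch working on
individual lanes.  The helper names (lane_true_by_sign,
lane_true_by_highbit, select4) are made up for illustration and are not
GCC functions, and the 32-bit lane width is an assumption:

```cpp
#include <cassert>
#include <cstdint>

// A lane counts as "true" when, viewed as a signed integer, it is
// negative.  The unsigned-to-signed cast assumes two's complement.
static bool
lane_true_by_sign (std::uint32_t lane)
{
  return static_cast<std::int32_t> (lane) < 0;
}

// The same test written as (val & highbit) != 0.
static bool
lane_true_by_highbit (std::uint32_t lane)
{
  return (lane & 0x80000000u) != 0;
}

// Element-wise select in the style of VEC_COND_EXPR, driven by a mask
// whose lanes are all-ones (true) or all-zeros (false).
static void
select4 (const std::uint32_t mask[4], const int d[4], const int e[4],
         int out[4])
{
  for (int i = 0; i < 4; ++i)
    out[i] = lane_true_by_sign (mask[i]) ? d[i] : e[i];
}
```

For an all-ones or all-zeros lane, val != 0 and val < 0 agree; they differ
for a lane such as 1, which val != 0 treats as true and the sign test
treats as false.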



> The tree.[ch] and gimple-fold.c hunks are ok if tested properly, the
> tree-ssa-forwprop.c idea of using TREE_TYPE (cond), too.


Ok, I will retest that way.


> I don't like the tree-cfg.c change, instead re-factor optabs.c to
> get a decomposed cond for vector_compare_rtx and appropriately
> "decompose" a non-comparison-class cond in expand_vec_cond_expr.


So vector_compare_rtx will take as arguments rcode, t_op0, t_op1 instead 
of cond. And in expand_vec_cond_expr, if I have a condition, I pass its 
elements to vector_compare_rtx, and otherwise I use 0 and the code for 
LT_EXPR as the other arguments.



> If we for example have
>
> predicate = a < b;
> x = predicate ? d : e;
> y = predicate ? f : g;
>
> we ideally want to re-use the predicate computation on targets where
> that would be optimal (and combine should be able to recover the
> case where it is not).


That I don't understand. The vcond instruction implemented by targets
takes as arguments d, e, cmp, a, b and emits the comparison itself. I
don't see how I can avoid sending to the targets both (d,e,<,a,b) and
(f,g,<,a,b). They will notice eventually that a<b is computed twice and
remove one of the two, but I don't see how to do that in optabs.c. Or I
can compute x = a < b, use x < 0 as the comparison passed to the targets,
and expect targets (those for which it is true) to recognize that < 0 is
useless in a vector condition (PR54700), or is useless on a comparison
result.


Thanks for the comments,

--
Marc Glisse


Re: RFC: LRA for x86/x86-64 [3/9]

2012-09-28 Thread Paolo Bonzini
Il 28/09/2012 00:57, Vladimir Makarov ha scritto:
> LRA creates a lot of new pseudos.  So the following patch implements
> ahead allocation reg info information which is important for LRA
> compilation speed.
> 
> 2012-09-27  Vladimir Makarov  
> 
> * reginfo.c (max_regno_since_last_resize): New.
> (reg_preferred_class, reg_alternate_class): Add assert.
> (allocate_reg_info): Initialize allocated reg info.
> (resize_reg_info): Make bigger reg_info and initialize new memory.
> (reginfo_init): Initialize max_regno_since_last_resize.
> (setup_reg_classes): Change assert.
> 

Is this considered dataflow stuff?  If so, I also want to approve part
of LRA! :)

This is ok.

Paolo


Re: Remove def operands cache

2012-09-28 Thread Andrew MacLeod

On 09/11/2012 10:53 AM, Michael Matz wrote:

Hi,

the operands cache is ugly.  This patch removes it at least for the def
operands, saving three pointers for roughly each normal statement (the
pointer in gsbase, and two pointers from def_optype_d).  This is
relatively easy to do, because all statements except ASMs have at most one
def (and one vdef), which themselves aren't pointed to by something else,
unlike the use operands which have more structure for the SSA web.

Yeah, a bit has changed since the original implementation. There is only 
ever one vdef, and not much else in the way of multiple defs ever 
arrived, so I think this makes sense.


Note that when I introduce gimple_atomic (delayed until next year :-P), 
there will be at least one more statement with 2 results: the 
__atomic_cmpxchg node. So the iterator code will grow a little bit more.


Andrew


Re: [google 4.7] fix line number checksum mismatch in lipo-use (issue6566044)

2012-09-28 Thread Rong Xu
Comments are inlined.
Attached is the new patch.

Thanks,

-Rong

On Tue, Sep 25, 2012 at 2:25 PM, Xinliang David Li  wrote:
> On Mon, Sep 24, 2012 at 2:42 PM, Rong Xu  wrote:
>> Hi,
>>
>> This is for google branches only.
>> It fixes the line number checksum mismatch during LIPO-use build.
>>
>> Tested with SPEC and google internal benchmarks.
>>
>> Thanks,
>>
>> -Rong
>>
>> 2012-09-24  Rong Xu  
>>
>> * gcc/coverage.c (coverage_checksum_string): strip out LIPO
>> specific string.
>> (crc32_string_1): New function.
>> * gcc/cp/decl2.c (start_static_storage_duration_function):
>> generate LIPO specific string.
>>
>> Index: gcc/coverage.c
>> ===
>> --- gcc/coverage.c  (revision 191679)
>> +++ gcc/coverage.c  (working copy)
>> @@ -903,6 +903,27 @@
>>  }
>>
>>
>> +/* Generate a crc32 of a string with specified STR_ELN when it's not 0.
>
> STR_ELN --> STR_LEN

Fixed.

>
>> +   Non-zero STR_LEN should only be seen in LIPO mode.  */
>
> Empty line needed.

Fixed.

>
>> +static unsigned
>> +crc32_string_1 (unsigned chksum, const char *string, unsigned str_len)
>> +{
>> +  char *dup;
>> +
>> +  if (!L_IPO_COMP_MODE || str_len == 0)
>> +return crc32_string (chksum, string);
>> +
>> +  gcc_assert (str_len > 0 && str_len < strlen(string));
>> +  dup = xstrdup (string);
>> +  dup[str_len] = 0;
>> +  chksum = crc32_string (chksum, dup);
>> +  free (dup);
>> +
>> +  return chksum;
>> +
>
> Remove extra lines after return.

Fixed.

>
>> +
>> +}
>> +
>>  /* Generate a checksum for a string.  CHKSUM is the current
>> checksum.  */
>>
>> @@ -911,7 +932,26 @@
>>  {
>>int i;
>>char *dup = NULL;
>> +  unsigned lipo_orig_str_len = 0;
>>
>> +  /* Strip out the ending "_cmo_[0-9]*" string from function
>> + name. Otherwise we will have lineno checksum mismatch.  */
>> +  if (L_IPO_COMP_MODE)
>> +{
>> +  int len;
>> +
>> +  i = len = strlen (string);
>> +  while (i--)
>> +if ((string[i] < '0' || string[i] > '9'))
>> +  break;
>> +  if ((i > 5) && (i != len - 1))
>
>  i >= 5?

This should not matter because we are expecting a non-empty sub-string
before "_cmo_". If there is no sub-string before "_cmo_", the original
code will do nothing (which I think is correct in the case of a
user-defined name).

>
>> +{
>> +  if (!strncmp (string + i - 4, "_cmo_", 5))
>
> _cmo_ or .cmo. ?
>
>> +lipo_orig_str_len = i - 4;
>> +}
>> +
>> +}
>> +
>>/* Look for everything that looks if it were produced by
>>   get_file_function_name and zero out the second part
>>   that may result from flag_random_seed.  This is not critical
>> @@ -957,7 +997,7 @@
>> }
>>  }
>>
>> -  chksum = crc32_string (chksum, string);
>> +  chksum = crc32_string_1 (chksum, string, lipo_orig_str_len);
>>if (dup)
>>  free (dup);
>>
>> Index: gcc/cp/decl2.c
>> ===
>> --- gcc/cp/decl2.c  (revision 191679)
>> +++ gcc/cp/decl2.c  (working copy)
>> @@ -2911,7 +2911,7 @@
>>   SSDF_IDENTIFIER_.  */
>>sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
>>if (L_IPO_IS_AUXILIARY_MODULE)
>> -sprintf (id, "%s_%u", id, current_module_id);
>> +sprintf (id, "%s_cmo_%u", id, current_module_id);
>
> _cmo_ or .cmo. for consistency?

Changed all "_cmo_" to ".cmo.".

>
> David
>
>>
>>type = build_function_type_list (void_type_node,
>>integer_type_node, integer_type_node,
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/6566044
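
For readers following the "i > 5" exchange, the suffix detection in the
quoted hunk can be tried out in isolation. The following is a standalone
sketch that mirrors the quoted loop; lipo_orig_len is a made-up name, it uses
std::string instead of the GCC internals, and it follows the "_cmo_" spelling
of the quoted patch rather than the ".cmo." spelling of the revised one. It
returns the length to checksum, or 0 when no suffix is found (0 meaning
"use the whole string", as in crc32_string_1).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of NAME with a trailing "_cmo_<digits>" removed, or 0 when the
// name does not end in such a suffix.
std::size_t lipo_orig_len(const std::string &name) {
  const int len = static_cast<int>(name.size());
  int i = len;
  // Walk back over trailing decimal digits.  Afterwards i is the index of
  // the last non-digit character, or -1 if the name is all digits or empty.
  while (i--)
    if (name[i] < '0' || name[i] > '9')
      break;
  // i != len - 1: at least one digit was skipped.
  // i > 5: "_cmo_" occupies [i - 4, i], so the prefix has length i - 4 >= 2.
  if (i > 5 && i != len - 1 && name.compare(i - 4, 5, "_cmo_") == 0)
    return static_cast<std::size_t>(i - 4);
  return 0;
}
```

As written, "i > 5" only strips the suffix when at least two characters
precede "_cmo_"; with "i >= 5" a one-character prefix would be accepted too.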


patch_set2.diff
Description: Binary data


Re: [google 4.7] fix line number checksum mismatch in lipo-use (issue6566044)

2012-09-28 Thread Xinliang David Li
ok (for google-47 and google/main)

thanks,

David

On Fri, Sep 28, 2012 at 10:22 AM, Rong Xu  wrote:
> Comments are inlined.
> Attached is the new patch.
>
> Thanks,
>
> -Rong
>
> On Tue, Sep 25, 2012 at 2:25 PM, Xinliang David Li  wrote:
>> On Mon, Sep 24, 2012 at 2:42 PM, Rong Xu  wrote:
>>> Hi,
>>>
>>> This is for google branches only.
>>> It fixes the line number checksum mismatch during LIPO-use build.
>>>
>>> Tested with SPEC and google internal benchmarks.
>>>
>>> Thanks,
>>>
>>> -Rong
>>>
>>> 2012-09-24  Rong Xu  
>>>
>>> * gcc/coverage.c (coverage_checksum_string): strip out LIPO
>>> specific string.
>>> (crc32_string_1): New function.
>>> * gcc/cp/decl2.c (start_static_storage_duration_function):
>>> generate LIPO specific string.
>>>
>>> Index: gcc/coverage.c
>>> ===
>>> --- gcc/coverage.c  (revision 191679)
>>> +++ gcc/coverage.c  (working copy)
>>> @@ -903,6 +903,27 @@
>>>  }
>>>
>>>
>>> +/* Generate a crc32 of a string with specified STR_ELN when it's not 0.
>>
>> STR_ELN --> STR_LEN
>
> Fixed.
>
>>
>>> +   Non-zero STR_LEN should only be seen in LIPO mode.  */
>>
>> Empty line needed.
>
> Fixed.
>
>>
>>> +static unsigned
>>> +crc32_string_1 (unsigned chksum, const char *string, unsigned str_len)
>>> +{
>>> +  char *dup;
>>> +
>>> +  if (!L_IPO_COMP_MODE || str_len == 0)
>>> +return crc32_string (chksum, string);
>>> +
>>> +  gcc_assert (str_len > 0 && str_len < strlen(string));
>>> +  dup = xstrdup (string);
>>> +  dup[str_len] = 0;
>>> +  chksum = crc32_string (chksum, dup);
>>> +  free (dup);
>>> +
>>> +  return chksum;
>>> +
>>
>> Remove extra lines after return.
>
> Fixed.
>
>>
>>> +
>>> +}
>>> +
>>>  /* Generate a checksum for a string.  CHKSUM is the current
>>> checksum.  */
>>>
>>> @@ -911,7 +932,26 @@
>>>  {
>>>int i;
>>>char *dup = NULL;
>>> +  unsigned lipo_orig_str_len = 0;
>>>
>>> +  /* Strip out the ending "_cmo_[0-9]*" string from function
>>> + name. Otherwise we will have lineno checksum mismatch.  */
>>> +  if (L_IPO_COMP_MODE)
>>> +{
>>> +  int len;
>>> +
>>> +  i = len = strlen (string);
>>> +  while (i--)
>>> +if ((string[i] < '0' || string[i] > '9'))
>>> +  break;
>>> +  if ((i > 5) && (i != len - 1))
>>
>>  i >= 5?
>
> This should not matter because we are expecting a non-empty sub-string
> before "_cmo_". If there is no sub-string before "_cmo_", the original
> code will do nothing (which I think is correct in the case of a
> user-defined name).
>
>>
>>> +{
>>> +  if (!strncmp (string + i - 4, "_cmo_", 5))
>>
>> _cmo_ or .cmo. ?
>>
>>> +lipo_orig_str_len = i - 4;
>>> +}
>>> +
>>> +}
>>> +
>>>/* Look for everything that looks if it were produced by
>>>   get_file_function_name and zero out the second part
>>>   that may result from flag_random_seed.  This is not critical
>>> @@ -957,7 +997,7 @@
>>> }
>>>  }
>>>
>>> -  chksum = crc32_string (chksum, string);
>>> +  chksum = crc32_string_1 (chksum, string, lipo_orig_str_len);
>>>if (dup)
>>>  free (dup);
>>>
>>> Index: gcc/cp/decl2.c
>>> ===
>>> --- gcc/cp/decl2.c  (revision 191679)
>>> +++ gcc/cp/decl2.c  (working copy)
>>> @@ -2911,7 +2911,7 @@
>>>   SSDF_IDENTIFIER_.  */
>>>sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
>>>if (L_IPO_IS_AUXILIARY_MODULE)
>>> -sprintf (id, "%s_%u", id, current_module_id);
>>> +sprintf (id, "%s_cmo_%u", id, current_module_id);
>>
>> _cmo_ or .cmo. for consistency?
>
> Changed all "_cmo_" to ".cmo.".
>
>>
>> David
>>
>>>
>>>type = build_function_type_list (void_type_node,
>>>integer_type_node, integer_type_node,
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/6566044


libgo patch committed: runtime.Caller should succeed without debug info

2012-09-28 Thread Ian Lance Taylor
Further testing uncovered a small bug in the change to use the
libbacktrace library.  The runtime.Caller function should succeed if we
get the PC, even if we don't have any debug info.  Bootstrapped and ran
Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r dff305030965 libgo/runtime/go-caller.c
--- a/libgo/runtime/go-caller.c	Fri Sep 28 07:26:19 2012 -0700
+++ b/libgo/runtime/go-caller.c	Fri Sep 28 08:49:33 2012 -0700
@@ -172,7 +172,8 @@
   if (n < 1)
 return ret;
   ret.pc = pc;
-  ret.ok = __go_file_line (pc, &fn, &ret.file, &ret.line);
+  __go_file_line (pc, &fn, &ret.file, &ret.line);
+  ret.ok = 1;
   return ret;
 }
 


Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Andi Kleen
Steven Bosscher  writes:

> On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov  
> wrote:
>>   Any comments and proposals are appreciated.  Even if GCC community
>> decides that it is too late to submit it to gcc4.8, the earlier reviews
>> are always useful.
>
> I would like to see some benchmark numbers, both for code quality and
> compile time impact for the most notorious compile time hog PRs for
> large routines where IRA performs poorly (e.g. PR54146, PR26854).

I would be interested in some numbers how much the new XMM spilling
helps on x86 and how it affects code size.

Unfortunately not really qualified to review the code.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #2c

2012-09-28 Thread Michael Meissner
Segher Boessenkool asked me on IRC to break out the fix in the last change.
This patch is just the change to set the default options if the user did not
use -mcpu= and the compiler was not configured with --with-cpu=.
Here are the patches.

I can submit this patch first if David desires, and then resubmit the first of
the infrastructure patches again, or commit both together.

2012-09-28  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_option_override_internal): If
-mcpu= is not specified and the compiler is not configured
using --with-cpu=, use the bits from the TARGET_DEFAULT to
set the initial options.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 191831)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -2461,6 +2461,11 @@ rs6000_option_override_internal (bool gl
   target_flags |= (processor_target_table[cpu_index].target_enable
   & set_masks);
 
+  /* If no -mcpu=, inherit any default options that were cleared via
+ POWERPC_MASKS.  */
+  if (!have_cpu)
+target_flags |= (TARGET_DEFAULT & ~target_flags_explicit);
+
   if (rs6000_tune_index >= 0)
 tune_index = rs6000_tune_index;
   else if (have_cpu)
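
The effect of the added statement is easiest to see on a toy mask. The sketch
below is not rs6000 code: the flag bits and the helper name inherit_defaults
are invented for illustration, and only the expression
"target_flags |= (TARGET_DEFAULT & ~target_flags_explicit)" is taken from the
hunk above. Default bits are restored unless the user set that option
explicitly, and nothing changes when a cpu was selected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option bits, for illustration only.
constexpr std::uint32_t FLAG_A = 1u << 0;
constexpr std::uint32_t FLAG_B = 1u << 1;
constexpr std::uint32_t FLAG_C = 1u << 2;

// Mirrors the patched statement: without a cpu selection, OR in every
// default bit that the user did not touch explicitly.
constexpr std::uint32_t inherit_defaults(std::uint32_t target_flags,
                                         std::uint32_t target_default,
                                         std::uint32_t explicit_mask,
                                         bool have_cpu) {
  return have_cpu ? target_flags
                  : (target_flags | (target_default & ~explicit_mask));
}
```

For example, with defaults FLAG_A | FLAG_B, current flags FLAG_C, and FLAG_B
explicitly disabled by the user, the result is FLAG_A | FLAG_C: FLAG_A comes
back from the defaults while the explicit choice for FLAG_B is respected.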


-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899



Re: RFC: LRA for x86/x86-64 [3/9]

2012-09-28 Thread Jeff Law

On 09/27/2012 04:57 PM, Vladimir Makarov wrote:

LRA creates a lot of new pseudos.  So the following patch implements
ahead allocation reg info information which is important for LRA
compilation speed.

2012-09-27  Vladimir Makarov  

 * reginfo.c (max_regno_since_last_resize): New.
 (reg_preferred_class, reg_alternate_class): Add assert.
 (allocate_reg_info): Initialize allocated reg info.
 (resize_reg_info): Make bigger reg_info and initialize new memory.
 (reginfo_init): Initialize max_regno_since_last_resize.
 (setup_reg_classes): Change assert.

This is fine.  FWIW, it roughly mirrors code I wrote a couple years ago 
when working on range splitting.

jeff


Re: RFC: LRA for x86/x86-64 [5/9]

2012-09-28 Thread Jeff Law

On 09/27/2012 04:58 PM, Vladimir Makarov wrote:

   The following patch mostly prepares some data from IRA which will be
used by LRA.  It is done by moving some definitions from ira-int.h to
ira.h.  New data reg_class_subset is generated in IRA for LRA.
New functions dealing with equivs are created.  They will be used by
LRA.  Some code of IRA is rewritten to use them too.

   The patch also adds a wrapper code in IRA to be prepared to call
LRA.

2012-09-27  Vladimir Makarov  

 * ira-int.h (struct target_ira_int): Remove x_ira_class_subset_p
 and x_ira_reg_classes_intersect_p.
 (ira_class_subset_p, ira_reg_classes_intersect_p): Remove.
 (ira_reg_equiv_len, ira_reg_equiv_invariant_p): Ditto.
 (ira_reg_equiv_const): Ditto.
 (ira_equiv_no_lvalue_p): New function.
 * ira-color.c (color_pass, move_spill_restore, coalesce_allocnos):
 Use ira_equiv_no_lvalue_p.
 (coalesce_spill_slots, ira_sort_regnos_for_alter_reg): Ditto.
 * ira-emit.c (ira_create_new_reg): Call ira_expand_reg_equiv.
 (generate_edge_moves, change_loop) Use ira_equiv_no_lvalue_p.
 (emit_move_list): Simplify code.  Call
 ira_update_equiv_info_by_shuffle_insn.  Use ira_reg_equiv instead
 of ira_reg_equiv_invariant_p and ira_reg_equiv_const. Change
 assert.
 * ira.c: (setup_reg_class_relations): Set up ira_reg_class_subset.
 (ira_reg_equiv_invariant_p, ira_reg_equiv_const): Remove.
 (find_reg_equiv_invariant_const): Ditto.
 (setup_reg_renumber): Use ira_equiv_no_lvalue_p instead
 of ira_reg_equiv_invariant_p.  Skip caps for LRA.
 (setup_reg_equiv_init, ira_update_equiv_info_by_shuffle_insn): New
 functions.
 (ira_reg_equiv_len): Move it before ira_reg_equiv. Change
 comment.
 (ira_reg_equiv): New.
 (ira_expand_reg_equiv, finish_reg_equiv): New functions.
 (no_equiv, update_equiv_regs): Use ira_reg_equiv instead of
 reg_equiv_init.
 (setup_reg_equiv): New function.
 (ira_use_lra_p): New global.
 (ira): Move initialization of ira_obstack and ira_bitmap_obstack
 upper.  Call init_reg_equiv, setup_reg_equiv, and
 setup_reg_equiv_init instead of initialization of
 ira_reg_equiv_len, ira_reg_equiv_invariant_p, and
 ira_reg_equiv_const.  Don't flatten IRA for LRA. Don't
 reassign conflict allocnos for LRA. Call finish_reg_equiv.
 (do_reload): Prepare code for LRA call.
 * ira.h (ira_use_lra_p): New external.
 (struct target_ira): Add members x_ira_class_subset_p
 x_ira_reg_class_subset, and x_ira_reg_classes_intersect_p.
 (ira_class_subset_p, ira_reg_class_subset): New macros.
 (ira_reg_classes_intersect_p): New macro.
 (ira_reg_equiv_len, ira_reg_equiv): New externals.
 (struct ira_reg_equiv): New.
 (ira_expand_reg_equiv, ira_update_equiv_info_by_shuffle_insn): New
 prototypes.

This is strictly a set of changes within the existing allocator.  It's fine.

jeff




Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Steven Bosscher
On Fri, Sep 28, 2012 at 5:21 PM, Vladimir Makarov  wrote:
> On 12-09-28 4:21 AM, Steven Bosscher wrote:
>>
>> On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov 
>> wrote:
>>>
>>>Any comments and proposals are appreciated.  Even if GCC community
>>> decides that it is too late to submit it to gcc4.8, the earlier reviews
>>> are always useful.
>>
>> I would like to see some benchmark numbers, both for code quality and
>> compile time impact for the most notorious compile time hog PRs for
>> large routines where IRA performs poorly (e.g. PR54146, PR26854).
>>
>>
> I should look at this, Steven. Unfortunately, the compiler @ trunk (without
> my patch) crashes on PR54156:
>
> ../../../trunk2/slow.cc: In function ‘void check_() [with NT = CGAL::Gmpfi;
> int s = 3]’:
> ../../../trunk2/slow.cc:95489:6: internal compiler error: Segmentation fault
> void check_(){
> ^
> 0x888adf crash_signal
> /home/vmakarov/build1/trunk/gcc/gcc/toplev.c:335
> 0x8f4718 gimple_code
> /home/vmakarov/build1/trunk/gcc/gcc/gimple.h:1126
> 0x8f4718 gimple_nop_p
> /home/vmakarov/build1/trunk/gcc/gcc/gimple.h:4851
> 0x8f4718 walk_aliased_vdefs_1
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2204
> 0x8f50ed walk_aliased_vdefs(ao_ref_s*, tree_node*, bool (*)(ao_ref_s*,
> tree_node*, void*), void*, bitmap_head_def**)
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2240
> 0x9018b5 propagate_necessity
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:909
> 0x9027b3 perform_tree_ssa_dce
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:1584
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.

Works for me on gcc17 at r191835, with a gcc configured like so:

"../trunk/configure --with-mpfr=/opt/cfarm/mpfr-latest
--with-gmp=/opt/cfarm/gmp-latest --with-mpc=/opt/cfarm/mpc-latest
--with-isl=/opt/cfarm/isl-latest --with-cloog=/opt/cfarm/cloog-latest
--enable-languages=c,c++ --disable-bootstrap --enable-checking=release
--with-gnu-as --with-gnu-ld
--with-as=/opt/cfarm/binutils-latest/bin/as
--with-ld=/opt/cfarm/binutils-latest/bin/ld"

Top 10 time consumers:
integrated_RA                191.66
df_live&initialized_regs      73.43
df_live_regs                  72.25
out_of_ssa                    45.21
tree_PTA                      35.44
tree_SSA_incremental          26.53
remove_unused_locals          18.78
combiner                      16.54
dominance_computation         14.44
register_information          14.20
 TOTAL :                     732.10

Note I'm using the simplified test case, see comment #14 in the PR.
You can just take the original test case
(http://gcc.gnu.org/bugzilla/attachment.cgi?id=27912) and apply this
patch:

--- slow.cc.orig 2012-09-28 21:07:58.0 +0200
+++ slow.cc 2012-09-28 21:08:38.0 +0200
@@ -95503,6 +95503,7 @@
   check_();
 }
 int main(){
+#if 0
   {
 typedef CGAL::Interval_nt I1;
 I1::Protector p1;
@@ -95517,11 +95518,14 @@
   check();
   check();
   check();
+#endif
   check();
+#if 0
   check >();
   check >();
   check();
   check();
   check();
   check();
+#endif
 }

You can compile the test case with:
"./xgcc -B. -S -std=gnu++11 -O1 -frounding-math -ftime-report slow.cc"

Even with this patch, the test case really is a great scalability
challenge for the compiler :-) I never got the full test case to work
at -O1, and the simpler test case still blows up the compiler at -O2
and higher. At -O1 you need a machine with at least 8GB of memory.
More than half of that is for IRA+reload...

Ciao!
Steven


Re: Merge C++ conversion into trunk (4/6 - hash table rewrite)

2012-09-28 Thread Lawrence Crowl
On 9/28/12, Michael Matz  wrote:
> On Thu, 27 Sep 2012, Lawrence Crowl wrote:
> > > template 
> > > struct D : B
> > > {
> > >   typedef typename B::E E; // element_type
> > >   E getme (int index);
> > > }
> >
> > Inside that struct, lets say we have a field of type E.  Do we name
> > it F or f?
>
> IMHO only for types, not for any other decls.
>
> > > > Do you have an alternate suggestion, one that does not confuse
> > > > template parameters and dependent names?
> > >
> > > Upper last character?  Just kidding :)  Too many detailed rules
> > > for conventions are the death of them, use rules of thumbs,
> > > my one would be "somehow depends on template args -> has upper
> > > character in name", where "somehow depends on" includes "is a".
> >
> > Ah, but there is a problem.  That typedef name does not necessarily
> > depend on a template parameter.
> >
> > It is common practice to have
> >
> > struct Q
> > {
> >   typedef int E;
> >   E getme (int index);
> > };
>
> Easy: I wouldn't make a typedef for Q::E that merely is int.  The reason
> is that it makes knowing what getme really returns harder.  You have to
> look it up always (or know the class already).  In fact that's one of my
> gripes with the standard library, much too much indirection through
> entities merely referring to other entities.  Might be only important for
> the libraries implementors but I sure hope that we don't start down that
> road in GCC.
>
> > In fact, one place is in the hash table code we are discussing.
> > The hash descriptor type may not itself be a template.  I believe
> > that few of them will actually be templates.
>
> Then I don't see the need for class-local typedefs.
>
> > So, if E implies comes from template, the implication is wrong.
> >
> > If we were to follow C++ standard library conventions, we would call it
> > value_type.
>
> Well, but value_type surely does depend on the hashtables
> type argument, doesn't it?  After all it is a typedef from
> 'Key'.  I would expect that htab::value_type is tree, and
> htab::value_type is int, and I would like to see it named
> htab::T or ::E.

One declares a hash table as follows.

  hash_table  variable;

The type stored in the hash table is not part of the declaration.
It is part of the descriptor, along with other things like the
hash function.  The hash table essentially queries the descriptor
for the value type.  For example,

  template <typename Descriptor> class hash_table {
    typename Descriptor::value_type *storage;
    ...

More typically though, the typedef is repeated inside the class to
avoid excess verbosity, and more importantly, to reexport the name.

  template <typename Descriptor> class hash_table {
    typedef typename Descriptor::value_type value_type;
    value_type *storage;
    ...

Using these typedef names is an essential component of the
abstraction.  Without it, we end up going to void* and losing all
type safety.
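
As a compilable illustration of the pattern, here is a toy version. It is a
sketch only and not the proposed GCC hash_table: int_descriptor,
toy_hash_table, capacity and the fixed-size linear probing are all invented
for the example. The point is that the table never names the stored type
directly; it asks the descriptor for value_type and re-exports it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical descriptor: bundles the stored type with its hash and
// equality functions.
struct int_descriptor {
  typedef int value_type;
  static std::size_t hash(const value_type &v) {
    return static_cast<std::size_t>(v);
  }
  static bool equal(const value_type &a, const value_type &b) {
    return a == b;
  }
};

// Toy fixed-capacity table with linear probing.
template <typename Descriptor> class toy_hash_table {
public:
  // Re-export the descriptor's type under the conventional name.
  typedef typename Descriptor::value_type value_type;

  toy_hash_table() : storage_(), used_() {}

  // Insert V; returns false only when the table is full.
  bool insert(const value_type &v) {
    for (std::size_t n = 0; n < capacity; ++n) {
      const std::size_t slot = (Descriptor::hash(v) + n) % capacity;
      if (!used_[slot]) {
        storage_[slot] = v;
        used_[slot] = true;
        return true;
      }
      if (Descriptor::equal(storage_[slot], v))
        return true;  // already present
    }
    return false;
  }

  // Returns a typed pointer to the stored element, or nullptr if absent.
  const value_type *find(const value_type &v) const {
    for (std::size_t n = 0; n < capacity; ++n) {
      const std::size_t slot = (Descriptor::hash(v) + n) % capacity;
      if (!used_[slot])
        return nullptr;
      if (Descriptor::equal(storage_[slot], v))
        return &storage_[slot];
    }
    return nullptr;
  }

private:
  static constexpr std::size_t capacity = 8;
  value_type storage_[capacity];
  bool used_[capacity];
};

// Client code can name the element type without knowing what it resolves to.
static_assert(
    std::is_same<toy_hash_table<int_descriptor>::value_type, int>::value,
    "value_type is re-exported from the descriptor");
```

Note that find returns const value_type * rather than void *, which is the
type safety argument made above.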

> > That would be my preference.  However, if folks want a shorter name,
> > I'll live with that too.  But as it stands, the current name is very
> > confusing.
>
> I would even prefer 'e' over value_type.  It's scoped, the context always
> will be clear, no need to be verbose in that name.  I find the long names
> inelegant, as most of the standard libs conventions.

We need some convention.  If we choose a convention different from
the standard library, then we are essentially saying that we do not
intend to interoperate with the standard library.  I do not think
that is the intent of the community, but I could be wrong about that.

-- 
Lawrence Crowl


Re: Merge C++ conversion into trunk (4/6 - hash table rewrite)

2012-09-28 Thread Diego Novillo
On Fri, Sep 28, 2012 at 12:40 PM, Lawrence Crowl  wrote:
> On 9/28/12, Michael Matz  wrote:
>>
>> I would even prefer 'e' over value_type.  It's scoped, the context always
>> will be clear, no need to be verbose in that name.  I find the long names
>> inelegant, as most of the standard libs conventions.
>
> We need some convention.  If we choose a convention different from
> the standard library, then we are essentially saying that we do not
> intend to interoperate with the standard library.  I do not think
> that is the intent of the community, but I could be wrong about that.

I agree.  If there already exists a convention that is widely known
and recognized, then we should use it.  There is negative value in
inventing a new convention.  We need to lower barriers to adoption,
not raise them.

Using the standard library convention seems to me like the best thing
to do here.


Diego.


Re: Merge C++ conversion into trunk (4/6 - hash table rewrite)

2012-09-28 Thread Lawrence Crowl
On 9/28/12, Gabriel Dos Reis  wrote:
> On Fri, Sep 28, 2012 at 8:18 AM, Michael Matz  wrote:
>>> It is common practice to have
>>>
>>> struct Q
>>> {
>>>   typedef int E;
>>>   E getme (int index);
>>> };
>>
>> Easy: I wouldn't make a typedef for Q::E that merely is int.  The reason
>> is that it makes knowing what getme really returns harder.
>
> The point of these nested types is precisely to allow *uniform* access
> to associated types from within a template (e.g. a container) -- irrespective of
> what those types happen to resolve to, builtin or not.

Perhaps an analogy might be helpful.  If I say "Tim's father" you know
the role of that person without knowing exactly who it is.  The typedef
convention serves the same purpose.  It is important that we use the
same term for father, or we wouldn't be able to communicate.

-- 
Lawrence Crowl


Re: RFC: LRA for x86/x86-64 [0/9]

2012-09-28 Thread Markus Trippelsdorf
On 2012.09.28 at 11:21 -0400, Vladimir Makarov wrote:
> On 12-09-28 4:21 AM, Steven Bosscher wrote:
> > On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov  
> > wrote:
> >>Any comments and proposals are appreciated.  Even if GCC community
> >> decides that it is too late to submit it to gcc4.8, the earlier reviews
> >> are always useful.
> > I would like to see some benchmark numbers, both for code quality and
> > compile time impact for the most notorious compile time hog PRs for
> > large routines where IRA performs poorly (e.g. PR54146, PR26854).
> >
> >
> I should look at this, Steven. Unfortunately, the compiler @ trunk 
> (without my patch) crashes on PR54156:
> 
> ../../../trunk2/slow.cc: In function ‘void check_() [with NT = 
> CGAL::Gmpfi; int s = 3]’:
> ../../../trunk2/slow.cc:95489:6: internal compiler error: Segmentation fault
> void check_(){
> ^
> 0x888adf crash_signal
> /home/vmakarov/build1/trunk/gcc/gcc/toplev.c:335
> 0x8f4718 gimple_code
> /home/vmakarov/build1/trunk/gcc/gcc/gimple.h:1126
> 0x8f4718 gimple_nop_p
> /home/vmakarov/build1/trunk/gcc/gcc/gimple.h:4851
> 0x8f4718 walk_aliased_vdefs_1
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2204
> 0x8f50ed walk_aliased_vdefs(ao_ref_s*, tree_node*, bool (*)(ao_ref_s*, 
> tree_node*, void*), void*, bitmap_head_def**)
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-alias.c:2240
> 0x9018b5 propagate_necessity
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:909
> 0x9027b3 perform_tree_ssa_dce
> /home/vmakarov/build1/trunk/gcc/gcc/tree-ssa-dce.c:1584
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54735

-- 
Markus


[v3] patch, configuring GCC --disable-libgomp causes libstdc++ abi check to fail.

2012-09-28 Thread Benjamin De Kosnik

Hey Iain.

I looked at this, and easily reproduced your results.

The following symbols are actually required by all
namespace associated-modes, not just parallel, and do not require
parallel capability:

> Symbols reported as missing by the abi.exp code:
> 
> _ZNSt9__cxx199815_List_node_base8transferEPS0_S1_
> std::__cxx1998::_List_node_base::transfer(std::__cxx1998::_List_node_base*,
> std::__cxx1998::_List_node_base*)
> 
> _ZNSt9__cxx199815_List_node_base4swapERS0_S1_
> std::__cxx1998::_List_node_base::swap(std::__cxx1998::_List_node_base&,
> std::__cxx1998::_List_node_base&)
> 
> _ZNSt9__cxx199815_List_node_base9_M_unhookEv
> std::__cxx1998::_List_node_base::_M_unhook()
> 
> _ZNSt9__cxx199815_List_node_base10_M_reverseEv
> std::__cxx1998::_List_node_base::_M_reverse()
>
> _ZNSt9__cxx199815_List_node_base11_M_transferEPS0_S1_
> std::__cxx1998::_List_node_base::_M_transfer(std::__cxx1998::_List_node_base*,
> std::__cxx1998::_List_node_base*)
> 
> _ZNSt9__cxx199815_List_node_base7reverseEv
> std::__cxx1998::_List_node_base::reverse()
> 
> _ZNSt9__cxx199815_List_node_base4hookEPS0_
> std::__cxx1998::_List_node_base::hook(std::__cxx1998::_List_node_base*)
> 
> _ZNSt9__cxx199815_List_node_base6unhookEv
> std::__cxx1998::_List_node_base::unhook()
> 
> _ZNSt9__cxx199815_List_node_base7_M_hookEPS0_
> std::__cxx1998::_List_node_base::_M_hook(std::__cxx1998::_List_node_base*)
> 

I've adjusted the Makefiles accordingly, and renamed some of these
files to accurately reflect what is necessary for compatibility and
what is not.

And the remaining:

> _ZN14__gnu_parallel9_Settings3setERS0_
> __gnu_parallel::_Settings::set(__gnu_parallel::_Settings&)
> 
> _ZN14__gnu_parallel9_Settings3getEv
> __gnu_parallel::_Settings::get()
> 

are sufficiently generic such that they can safely be defined even
without libgomp being built. So let's do so! 

(The only problem may be
targets that had shared libraries with symbol versioning and disabled
libgomp, as now new symbols (ie, get/set from _Settings, above) will be
in the shared library but with an older version number.  In practice
this may not be an issue.)

There's no reason for this unintentional  ABI breakage with this
configure flag.

tested x86/linux
tested x86/linux --disable-libgomp

2012-09-28  Benjamin Kosnik  

	* acinclude.m4 (GLIBCXX_ENABLE_PARALLEL): Remove ENABLE_PARALLEL.
	* include/Makefile.am: Same.
	* src/c++98/Makefile.am: Same.
	* src/Makefile.am: Same.
	* Makefile.in: Regenerated.
	* aclocal.m4: Same.
	* configure: Same.
	* doc/Makefile.in: Same.
	* include/Makefile.in: Same.
	* libsupc++/Makefile.in: Same.
	* po/Makefile.in: Same.
	* python/Makefile.in: Same.
	* src/Makefile.in: Same.
	* testsuite/Makefile.in: Same.
	* src/c++11/Makefile.in: Same.
	* src/c++98/Makefile.in: Same.

	* src/c++98/compatibility-debug_list-2.cc: Update comments.
	* src/c++98/compatibility-debug_list.cc: Same.
	* src/c++98/compatibility-list-2.cc: Renamed to src/c++98/list-aux-2.cc
	* src/c++98/compatibility-list.cc: Renamed to src/c++98/list-aux.cc
	* src/c++98/compatibility-parallel_list-2.cc: Renamed to
	src/c++98/list_associated-2.cc.
	* src/c++98/compatibility-parallel_list.cc: Renamed to
	src/c++98/list_associated.cc.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index ab26660..b9aa5c7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2173,7 +2173,6 @@ AC_DEFUN([GLIBCXX_ENABLE_PARALLEL], [
 
   AC_MSG_CHECKING([for parallel mode support])
   AC_MSG_RESULT([$enable_parallel])
-  GLIBCXX_CONDITIONAL(ENABLE_PARALLEL, test $enable_parallel = yes)
 ])
 
 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 7b12ea2..09925d5 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -737,7 +737,6 @@ debug_headers = \
 # Parallel mode headers
 parallel_srcdir = ${glibcxx_srcdir}/include/parallel
 parallel_builddir = ./parallel
-if ENABLE_PARALLEL
 parallel_headers = \
 	${parallel_srcdir}/algo.h \
 	${parallel_srcdir}/algobase.h \
@@ -782,9 +781,6 @@ parallel_headers = \
 	${parallel_srcdir}/types.h \
 	${parallel_srcdir}/unique_copy.h \
 	${parallel_srcdir}/workstealing.h
-else
-parallel_headers =
-endif
 
 # Profile mode headers
 profile_srcdir = ${glibcxx_srcdir}/include/profile
@@ -1283,10 +1279,8 @@ install-headers:
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${debug_builddir}
 	for file in ${debug_headers}; do \
 	  $(INSTALL_DATA) $${file} $(DESTDIR)${gxx_include_dir}/${debug_builddir}; done
-	parallel_headers_install='${parallel_headers}';\
-	test -z "$$parallel_headers_install" || \
-	  $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${parallel_builddir};\
-	for file in $$parallel_headers_install; do \
+	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${parallel_builddir}
+	for file in ${parallel_headers}; do \
 	  $(INSTALL_DATA) $${file} $(DESTDIR)${gxx_include_dir}/${parallel_builddir}; done
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${profile_builddir}
 

Re: RFC: LRA for x86/x86-64 [6/9]

2012-09-28 Thread Jeff Law

On 09/27/2012 04:58 PM, Vladimir Makarov wrote:

   The following patch modifies some code in the rest of compiler for
correct work of LRA.  The code works the same way when LRA is not
used.  It is achieved by checking a new variable lra_in_progress.

2012-09-27  Vladimir Makarov 

 * rtlanal.c (simplify_subreg_regno): Permit ARG_POINTER_REGNUM and
 STACK_POINTER_REGNU for LRA.
 * jump.c (true_regnum): Always use hard_regno for subreg_get_info when
 lra is in progress.
 * expr.c (emit_move_insn_1): Pass an additional argument to
 emit_move_via_integer.  Use emit_move_via_integer for LRA only if
 the insn is recognized.
 * recog.c (general_operand, register_operand): Accept paradoxical
FLOAD_MODE
 subregs for LRA.
 (scratch_operand): Accept pseudos for LRA.
 * emit-rtl.c (gen_rtx_REG): Add lra_in_progress.
 (validate_subreg): Don't check offset for LRA and
 floating point modes.
 * rtl.h (lra_in_progress): New external.
 * ira.c (lra_in_progress): Define.

s/FLOAD/FLOAT/ to fix ChangeLog typo.




Index: jump.c
===
--- jump.c  (revision 191771)
+++ jump.c  (working copy)
@@ -1868,7 +1868,8 @@ true_regnum (const_rtx x)
  {
if (REG_P (x))
  {
-  if (REGNO (x) >= FIRST_PSEUDO_REGISTER && reg_renumber[REGNO (x)] >= 0)
+  if (REGNO (x) >= FIRST_PSEUDO_REGISTER
+ && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
return reg_renumber[REGNO (x)];
return REGNO (x);
  }
This hunk doesn't make any sense to me, unless you want true_regnum to 
return a negative value during LRA for cases where the pseudo is still 
unassigned.  Is that what you're intending here?  If that's what you 
want, then I think it's worth a quick comment.





@@ -1880,7 +1881,8 @@ true_regnum (const_rtx x)
{
  struct subreg_info info;

- subreg_get_info (REGNO (SUBREG_REG (x)),
+ subreg_get_info (lra_in_progress
+  ? (unsigned) base : REGNO (SUBREG_REG (x)),
   GET_MODE (SUBREG_REG (x)),
   SUBREG_BYTE (x), GET_MODE (x), &info);
It'd be good to indicate why you want to do something different for LRA 
here.




Index: rtlanal.c
===
--- rtlanal.c   (revision 191771)
+++ rtlanal.c   (working copy)
@@ -3465,7 +3465,9 @@ simplify_subreg_regno (unsigned int xreg
/* Give the backend a chance to disallow the mode change.  */
if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
&& GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode))
+  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+  /* We can use mode change in LRA for some transformations.  */
+  && ! lra_in_progress)
  return -1;
  #endif

I don't think this change is reflected in the ChangeLog.

I think just the minor ChangeLog updates and clarification of the 
changes to jump.c are all that's needed for this patch to be approved.


jeff



Profile housekeeping 1/n (tree-vect-loop-manip updates)

2012-09-28 Thread Jan Hubicka
Hi,
this patch makes the vectorizer produce a valid profile, at least for simple
testcases, like
int a[1];
int i=5;
main()
{
  while (i)
{
  int j;
 for(j=0;j<1;j++)
  a[j]=1;
  i--;
}
}

or

struct a
{
  struct a * x;
};

void
foo (struct a * b, int j)
{
  int i;

  for (i = 0; i < j; i++)
{
  b->x = b;
  b++;
}
}

Profile updating is not really trivial here.  The vectorizer may peel a prologue
and an epilogue that are executed frequently but whose expected number of
iterations is bounded by a small number.

At the moment we keep all loops with the original profile, that is, they are all
predicted to iterate many times.  This patch adds a scale_loop_profile function
that can reduce the frequency of the loop body (needed to take into account the
loop guards) as well as bound the number of iterations by some number.

I also added a few wrappers around probability computations since I tend to
be quite cryptic about the code and tend to use truncating divides that
are suboptimal.
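
To illustrate the difference between a truncating and a rounding divide, here is a minimal standalone sketch.  The RDIV macro mirrors the one in the patch below, but everything else is an assumption made for illustration: REG_BR_PROB_BASE is taken to be 10000, long long stands in for gcov_type, the helper names are hypothetical, and the range check uses <= rather than the patch's <.

```c
#include <assert.h>

/* Assumed value for this sketch; in GCC the constant comes from the
   compiler's own headers.  */
#define REG_BR_PROB_BASE 10000

/* Divide X by Y, rounding to nearest instead of truncating.
   Only meaningful for non-negative X and positive Y.  */
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

/* Scale FREQ by probability PROB (expressed in units of
   1/REG_BR_PROB_BASE) with a truncating divide.  */
static long long
apply_probability_trunc (long long freq, int prob)
{
  assert (prob >= 0 && prob <= REG_BR_PROB_BASE);
  return freq * prob / REG_BR_PROB_BASE;
}

/* Same computation, but rounding to nearest.  */
static long long
apply_probability_round (long long freq, int prob)
{
  assert (prob >= 0 && prob <= REG_BR_PROB_BASE);
  return RDIV (freq * prob, REG_BR_PROB_BASE);
}
```

With small frequencies the two variants can differ by one unit, which matters when the result is later multiplied through a chain of edges.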

We also did a particularly bad job on the newly inserted control flow, where we
usually set zero counts and probabilities, driving GCC to optimize them for size
rather than speed.

Bootstrapped/regtested x86_64-linux, will commit it after a bit more testing.  I
did not manage to invent a testcase that can be easily tested in the testsuite.  I
have some extra code for profile maintenance so I will push it out afterwards.

Honza
* basic-block.h (RDIV): Define.
(EDGE_FREQUENCY): Simplify.
(check_probability, combine_probabilities, apply_probability,
inverse_probability): New.
* cfgloop.c (scale_loop_profile): New function.
* cfgloop.h (scale_loop_profile): Declare.
(slpeel_add_loop_guard): Add probability parameter.
(set_prologue_iterations): Add probability parameter.
(slpeel_tree_peel_loop_to_edge): Add bound1 and bound2 parameters;
update probabilities correctly.
(vect_do_peeling_for_alignment, vect_gen_niters_for_prolog_loop): New.
Index: basic-block.h
===
--- basic-block.h   (revision 191823)
+++ basic-block.h   (working copy)
@@ -478,11 +478,10 @@ struct edge_list
 #define BRANCH_EDGE(bb)(EDGE_SUCC ((bb), 0)->flags & EDGE_FALLTHRU \
 ? EDGE_SUCC ((bb), 1) : EDGE_SUCC ((bb), 0))
 
+#define RDIV(X,Y) (((X) + (Y) / 2) / (Y))
 /* Return expected execution frequency of the edge E.  */
-#define EDGE_FREQUENCY(e)  (((e)->src->frequency \
- * (e)->probability \
- + REG_BR_PROB_BASE / 2) \
-/ REG_BR_PROB_BASE)
+#define EDGE_FREQUENCY(e)  RDIV ((e)->src->frequency * (e)->probability, \
+ REG_BR_PROB_BASE)
 
 /* Return nonzero if edge is critical.  */
 #define EDGE_CRITICAL_P(e) (EDGE_COUNT ((e)->src->succs) >= 2 \
@@ -910,4 +909,40 @@ extern void default_rtl_profile (void);
 /* In profile.c.  */
 extern gcov_working_set_t *find_working_set(unsigned pct_times_10);
 
+/* Check that probability is sane.  */
+
+static inline void
+check_probability (int prob)
+{
+  gcc_checking_assert (prob >= 0 && prob < REG_BR_PROB_BASE);
+}
+
+/* Given PROB1 and PROB2, return PROB1*PROB2/REG_BR_PROB_BASE. 
+   Used to combine BB probabilities.  */
+
+static inline int
+combine_probabilities (int prob1, int prob2)
+{
+  check_probability (prob1);
+  check_probability (prob2);
+  return RDIV (prob1 * prob2, REG_BR_PROB_BASE);
+}
+
+/* Apply probability PROB on frequency or count FREQ.  */
+
+static inline gcov_type
+apply_probability (gcov_type freq, int prob)
+{
+  check_probability (prob);
+  return RDIV (freq * prob, REG_BR_PROB_BASE);
+}
+
+/* Return inverse probability for PROB.  */
+
+static inline int
+inverse_probability (int prob1)
+{
+  check_probability (prob1);
+  return REG_BR_PROB_BASE - prob1;
+}
 #endif /* GCC_BASIC_BLOCK_H */
Index: cfgloop.c
===
--- cfgloop.c   (revision 191823)
+++ cfgloop.c   (working copy)
@@ -1666,3 +1666,121 @@ loop_exits_from_bb_p (struct loop *loop,
 
   return false;
 }
+
+/* Scale the profile estimate within loop by SCALE.
+   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
+   to iterate too many times.  */
+void
+scale_loop_profile (struct loop *loop, int scale, int iteration_bound)
+{
+  gcov_type iterations = expected_loop_iterations_unbounded (loop);
+  basic_block *bbs;
+  unsigned int i;
+  edge e;
+  edge_iterator ei;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, ";; Scaling loop %i with scale %f, "
+"bounding iterations to %i from guessed %i\n",
+loop->num, (double)scale / REG_BR_PROB_BASE,
+iteration_bound, (int)iterati

Profile housekeeping 2/n (shrink wrapping fix)

2012-09-28 Thread Jan Hubicka
Hi,
this patch fixes updating in shrink wrapping.  Bootstrapped/regtested 
x86_64-linux.
Will commit it shortly.

Honza

* function.c (dup_block_and_redirect): Update profile.
Index: function.c
===
--- function.c  (revision 191823)
+++ function.c  (working copy)
@@ -5668,6 +5668,15 @@ dup_block_and_redirect (basic_block bb,
   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
 if (!bitmap_bit_p (need_prologue, e->src->index))
   {
+   int freq = EDGE_FREQUENCY (e);
+   copy_bb->count += e->count;
+   copy_bb->frequency += EDGE_FREQUENCY (e);
+   e->dest->count -= e->count;
+   if (e->dest->count < 0)
+ e->dest->count = 0;
+   e->dest->frequency -= freq;
+   if (e->dest->frequency < 0)
+ e->dest->frequency = 0;
redirect_edge_and_branch_force (e, copy_bb);
continue;
   }
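
The bookkeeping in the hunk above can be pictured with a small standalone model.  This is only a sketch: the toy_block and toy_edge types and the toy_redirect_profile helper are hypothetical stand-ins, not GCC's basic_block or edge, and the edge frequency is stored directly instead of being computed by EDGE_FREQUENCY.

```c
#include <assert.h>

/* Toy stand-ins for GCC's basic_block and edge; only the profile
   fields are modelled.  */
struct toy_block { long long count; int frequency; };
struct toy_edge  { long long count; int frequency; struct toy_block *dest; };

/* Move the profile carried by edge E from its current destination to
   COPY, clamping at zero in case the estimates are inconsistent, and
   then retarget the edge.  */
static void
toy_redirect_profile (struct toy_edge *e, struct toy_block *copy)
{
  copy->count += e->count;
  copy->frequency += e->frequency;

  e->dest->count -= e->count;
  if (e->dest->count < 0)
    e->dest->count = 0;

  e->dest->frequency -= e->frequency;
  if (e->dest->frequency < 0)
    e->dest->frequency = 0;

  e->dest = copy;
}
```

The clamping step is there because frequencies are estimates, so an edge may appear to carry more than its destination block holds.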


Fix PR54688

2012-09-28 Thread Bernd Schmidt
This is a bug in the new scheduler dependency breaking code. Sparc has a
slightly broken machine description using noncanonical RTL: we get MINUS
with a constant second argument. That exposes a problem in the new code,
it pretends to handle MINUS but doesn't really. Since the most important
case is the one where we're adding constants, I've removed handling of
MINUS entirely.

This also showed a potential problem with targets where
!STACK_GROWS_DOWNWARD.
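
As a rough illustration of why only PLUS needs handling once the RTL is canonical, here is a toy model.  The types and function names are hypothetical and have nothing to do with GCC's rtx; the sketch only assumes that "x - C" is normally rewritten as "x + (-C)".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of "reg = reg OP constant"; not GCC's rtx.  */
enum toy_code { TOY_PLUS, TOY_MINUS };
struct toy_binop { enum toy_code code; long constant; };

/* Rewrite "x - C" as "x + (-C)", the form assumed to be canonical
   in this sketch.  */
static void
toy_canonicalize (struct toy_binop *op)
{
  if (op->code == TOY_MINUS)
    {
      op->code = TOY_PLUS;
      op->constant = -op->constant;
    }
}

/* Accept only the canonical form and report the signed increment.  */
static bool
toy_parse_increment (const struct toy_binop *op, long *inc)
{
  if (op->code != TOY_PLUS)
    return false;
  *inc = op->constant;
  return true;
}
```

A parser written this way simply rejects the noncanonical form instead of half-handling it, and the sign of the increment can then be compared against the stack growth direction.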

I've bootstrapped and tested the following on x86_64-linux and committed
as obvious.


Bernd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 191837)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2012-09-20  Bernd Schmidt  
+
+	PR bootstrap/54688
+	* sched-deps.c (parse_add_or_inc): Remove MINUS handling.  Take
+	STACK_GROWS_DOWNWARD into account.
+
 2012-09-28  Jakub Jelinek  
 
 	PR target/54716
Index: gcc/sched-deps.c
===
--- gcc/sched-deps.c	(revision 191823)
+++ gcc/sched-deps.c	(working copy)
@@ -4600,8 +4600,7 @@ parse_add_or_inc (struct mem_inc_info *m
   if (!REG_P (SET_DEST (pat)))
 return false;
 
-  if (GET_CODE (SET_SRC (pat)) != PLUS
-  && GET_CODE (SET_SRC (pat)) != MINUS)
+  if (GET_CODE (SET_SRC (pat)) != PLUS)
 return false;
 
   mii->inc_insn = insn;
@@ -4629,9 +4628,14 @@ parse_add_or_inc (struct mem_inc_info *m
 }
 
   if (regs_equal && REGNO (SET_DEST (pat)) == STACK_POINTER_REGNUM)
-/* Note that the sign has already been reversed for !before_mem.  */
-return mii->inc_constant > 0;
-
+{
+  /* Note that the sign has already been reversed for !before_mem.  */
+#ifdef STACK_GROWS_DOWNWARD
+  return mii->inc_constant > 0;
+#else
+  return mii->inc_constant < 0;
+#endif
+}
   return true;
 }
 


Profile housekeeping 3/4 (call-cddce fix)

2012-09-28 Thread Jan Hubicka
Hi,
shrink_wrap_one_built_in_call forgets to update counts.

* tree-call-cdce.c (shrink_wrap_one_built_in_call):
Update counts.
Index: tree-call-cdce.c
===
--- tree-call-cdce.c(revision 191823)
+++ tree-call-cdce.c(working copy)
@@ -773,8 +773,13 @@ shrink_wrap_one_built_in_call (gimple bi
   EDGE_FALSE_VALUE);
 
   bi_call_in_edge0->probability = REG_BR_PROB_BASE * ERR_PROB;
+  bi_call_in_edge0->count =
+  apply_probability (guard_bb0->count,
+bi_call_in_edge0->probability);
   join_tgt_in_edge_fall_thru->probability =
-  REG_BR_PROB_BASE - bi_call_in_edge0->probability;
+  inverse_probability (bi_call_in_edge0->probability);
+  join_tgt_in_edge_fall_thru->count =
+  guard_bb0->count - bi_call_in_edge0->count;
 
   /* Code generation for the rest of the conditions  */
   guard_bb = guard_bb0;
@@ -804,8 +809,12 @@ shrink_wrap_one_built_in_call (gimple bi
   bi_call_in_edge = make_edge (guard_bb, bi_call_bb, EDGE_TRUE_VALUE);
 
   bi_call_in_edge->probability = REG_BR_PROB_BASE * ERR_PROB;
+  bi_call_in_edge->count =
+ apply_probability (guard_bb->count,
+bi_call_in_edge->probability);
   guard_bb_in_edge->probability =
-  REG_BR_PROB_BASE - bi_call_in_edge->probability;
+  inverse_probability (bi_call_in_edge->probability);
+  guard_bb_in_edge->count = guard_bb->count - bi_call_in_edge->count;
 }
 
   VEC_free (gimple, heap, conds);


Re: Profile housekeeping 3/4 (call-cddce fix)

2012-09-28 Thread Steven Bosscher
On Fri, Sep 28, 2012 at 10:43 PM, Jan Hubicka  wrote:
> Hi,
> shrink_wrap_one_built_in_call forgets to update counts.

Hi,

Can you look at the one-liner from
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00794.html too, please?

The patch there is this:

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c   (revision 191129)
+++ tree-ssa-tail-merge.c   (working copy)
@@ -1478,6 +1478,8 @@
 bb2->frequency = BB_FREQ_MAX;
   bb1->frequency = 0;

+  bb2->count += bb1->count;
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);

That looks correct to me, but you're much more familiar with this code
than I am :-)

Ciao!
Steven


Re: Profile housekeeping 3/4 (call-cddce fix)

2012-09-28 Thread Jan Hubicka
> On Fri, Sep 28, 2012 at 10:43 PM, Jan Hubicka  wrote:
> > Hi,
> > shrink_wrap_one_built_in_call forgets to update counts.
> 
> Hi,
> 
> Can you look at the one-liner from
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00794.html too, please?
> 
> The patch there is this:
> 
> Index: tree-ssa-tail-merge.c
> ===
> --- tree-ssa-tail-merge.c (revision 191129)
> +++ tree-ssa-tail-merge.c (working copy)
> @@ -1478,6 +1478,8 @@
>  bb2->frequency = BB_FREQ_MAX;
>bb1->frequency = 0;
> 
> +  bb2->count += bb1->count;
> +
>/* Do updates that use bb1, before deleting bb1.  */
>release_last_vdef (bb1);
>same_succ_flush_bb (bb1);
> 
> That looks correct to me, but you're much more familiar with this code
> than I am :-)

Yes, this one is obviously correct.  Thanks for pointing it out.

Patch is OK.
Honza
> 
> Ciao!
> Steven


libgo patch committed: Better detection of memory overflow

2012-09-28 Thread Ian Lance Taylor
This patch, which brings in some bits of code from the master Go
library, does a better job of detecting when a memory allocation request
will overflow.  This lets us panic in a way that the program can see,
rather than crashing.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.
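
The check used throughout the patch below follows a common pattern: compare the element count against the limit divided by the element size, so that the multiplication is never performed when it could wrap.  A minimal sketch, assuming a hypothetical TOY_MAX_MEM cap and helper name (the runtime's own limit is MaxMem in malloc.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap for this sketch: 16 GB on 64-bit targets,
   otherwise the whole address space.  */
#if UINTPTR_MAX > 0xffffffffu
#define TOY_MAX_MEM ((uintptr_t) 16 << 30)
#else
#define TOY_MAX_MEM UINTPTR_MAX
#endif

/* Return true if COUNT elements of ELEM_SIZE bytes fit under the cap.
   The division avoids computing COUNT * ELEM_SIZE, which could wrap.  */
static bool
toy_alloc_size_ok (uintptr_t count, uintptr_t elem_size)
{
  if (elem_size == 0)
    return true;
  return count <= TOY_MAX_MEM / elem_size;
}
```

A caller would test this before multiplying and report a recoverable error when it fails, rather than passing a wrapped size to the allocator.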

Ian

diff -r 3e4478623419 libgo/runtime/chan.c
--- a/libgo/runtime/chan.c	Fri Sep 28 10:41:20 2012 -0700
+++ b/libgo/runtime/chan.c	Fri Sep 28 14:06:03 2012 -0700
@@ -3,6 +3,8 @@
 // license that can be found in the LICENSE file.
 
 #include "runtime.h"
+#include "arch.h"
+#include "malloc.h"
 #include "go-type.h"
 
 #define	NOSELGEN	1
@@ -88,7 +90,7 @@
 	
 	elem = t->__element_type;
 
-	if(hint < 0 || (int32)hint != hint || (elem->__size > 0 && (uintptr)hint > ((uintptr)-1) / elem->__size))
+	if(hint < 0 || (int32)hint != hint || (elem->__size > 0 && (uintptr)hint > MaxMem / elem->__size))
 		runtime_panicstring("makechan: size out of range");
 
 	n = sizeof(*c);
diff -r 3e4478623419 libgo/runtime/go-append.c
--- a/libgo/runtime/go-append.c	Fri Sep 28 10:41:20 2012 -0700
+++ b/libgo/runtime/go-append.c	Fri Sep 28 14:06:03 2012 -0700
@@ -54,6 +54,9 @@
 	  while (m < count);
 	}
 
+  if ((uintptr) m > MaxMem / element_size)
+	runtime_panicstring ("growslice: cap out of range");
+
   n = __go_alloc (m * element_size);
   __builtin_memcpy (n, a.__values, a.__count * element_size);
 
diff -r 3e4478623419 libgo/runtime/go-make-slice.c
--- a/libgo/runtime/go-make-slice.c	Fri Sep 28 10:41:20 2012 -0700
+++ b/libgo/runtime/go-make-slice.c	Fri Sep 28 14:06:03 2012 -0700
@@ -37,7 +37,7 @@
   if (cap < len
   || (uintptr_t) icap != cap
   || (std->__element_type->__size > 0
-	  && cap > (uintptr_t) -1U / std->__element_type->__size))
+	  && cap > MaxMem / std->__element_type->__size))
 runtime_panicstring ("makeslice: cap out of range");
 
   ret.__count = ilen;
diff -r 3e4478623419 libgo/runtime/malloc.h
--- a/libgo/runtime/malloc.h	Fri Sep 28 10:41:20 2012 -0700
+++ b/libgo/runtime/malloc.h	Fri Sep 28 14:06:03 2012 -0700
@@ -128,6 +128,15 @@
 	MaxGcproc = 4,
 };
 
+// Maximum memory allocation size, a hint for callers.
+// This must be a #define instead of an enum because it
+// is so large.
+#if __SIZEOF_POINTER__ == 8
+#define	MaxMem	(16ULL<<30)	/* 16 GB */
+#else
+#define	MaxMem	((uintptr)-1)
+#endif
+
 // A generic linked list of blocks.  (Typically the block is bigger than sizeof(MLink).)
 struct MLink
 {


Re: [google] Add new dump flag -pmu to display PMU data in dumps (issue6551072)

2012-09-28 Thread Dehao Chen
Sorry to reply late, missed this mail again... not sure why.

LGTM, okay for google branches.

Dehao

On Mon, Sep 24, 2012 at 1:20 PM, Teresa Johnson  wrote:
> Revised patch to add a new dump flag that dumps PMU profile information using
> the -pmu dump option. (Was issue 6489092, creating new issue since I don't own
> that one.)
>
> Ok for google/main?
>
> Passes bootstrap and regression tests.
>
> Teresa
>
> 2012-09-24  Teresa Johnson  
> Chris Manghane  
>
> * doc/invoke.texi: Update -fpmu-profile-use option.
> * tree-dump.c: Add new dump flag.
> * tree-pretty-print.c (dump_load_latency_details): New function.
> (dump_pmu): Ditto.
> (dump_generic_node): Add support for new dump flag.
> * tree-pretty-print.h (dump_pmu): Declare.
> * tree-pass.h (enum tree_dump_index): Add new dump flag.
> * gcov.c (process_pmu_profile): Fix string table count assert.
> * opts.c (OPT_fpmu_profile_use_): Add support for -fpmu-profile-use.
> * gimple-pretty-print.c (dump_gimple_phi): Add support for new dump
> flag.
> (dump_gimple_stmt): Ditto.
> * coverage.c (struct pmu_entry): New structure.
> (struct gcov_pmu_summary): Ditto.
> (htab_pmu_entry_hash): New function.
> (htab_pmu_entry_eq): Ditto.
> (htab_pmu_entry_del): Ditto.
> (read_pmu_file): Ditto.
> (get_pmu_hash_entry): Ditto.
> (process_pmu_data): Ditto.
> (get_coverage_pmu_latency): Ditto.
> (get_coverage_pmu_branch_mispredict): Ditto.
> (pmu_data_present): Ditto.
> (coverage_init): Add pmu file read support.
> * coverage.h (get_coverage_pmu_latency): Declare.
> (get_coverage_pmu_branch_mispredict): Ditto.
> * common.opt: Update -fpmu-profile-use option.
>
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi (revision 191138)
> +++ doc/invoke.texi (working copy)
> @@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
>  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
>  -fpmu-profile-generate=@var{pmuoption} @gol
> --fpmu-profile-use=@var{pmuoption} @gol
> +-fpmu-profile-use=@var{pmudata} @gol
>  -freciprocal-math -free -fregmove -frename-registers -freorder-blocks @gol
>  -frecord-gcc-switches-in-elf@gol
>  -freorder-blocks-and-partition -freorder-functions @gol
> @@ -8381,12 +8381,11 @@ displayed using coverage tool gcov. The params var
>  "pmu_profile_n_addresses" can be used to restrict PMU data collection
>  to only this many addresses.
>
> -@item -fpmu-profile-use=@var{pmuoption}
> +@item -fpmu-profile-use=@var{pmudata}
>  @opindex fpmu-profile-use
>
> -Enable performance monitoring unit (PMU) profiling based
> -optimizations.  Currently only @var{load-latency} and
> -@var{branch-mispredict} are supported.
> +If @var{pmudata} is specified, GCC will read PMU data from @var{pmudata}. If
> +unspecified, PMU data will be read from 'pmuprofile.gcda'.
>
>  @item -fprofile-strip=@var{base_suffix}
>  @opindex fprofile-strip
> Index: tree-dump.c
> ===
> --- tree-dump.c (revision 191138)
> +++ tree-dump.c (working copy)
> @@ -824,9 +824,11 @@ static const struct dump_option_value_info dump_op
>{"nouid", TDF_NOUID},
>{"enumerate_locals", TDF_ENUMERATE_LOCALS},
>{"scev", TDF_SCEV},
> +  {"pmu", TDF_PMU},
>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA
> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
> -   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)},
> +   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
> +| TDF_PMU)},
>{NULL, 0}
>  };
>
> Index: tree-pretty-print.c
> ===
> --- tree-pretty-print.c (revision 191138)
> +++ tree-pretty-print.c (working copy)
> @@ -25,6 +25,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm.h"
>  #include "tree.h"
>  #include "output.h"
> +#include "basic-block.h"
> +#include "gcov-io.h"
> +#include "coverage.h"
>  #include "tree-pretty-print.h"
>  #include "hashtab.h"
>  #include "tree-flow.h"
> @@ -51,6 +54,7 @@ static void do_niy (pretty_printer *, const_tree);
>
>  static pretty_printer buffer;
>  static int initialized = 0;
> +static char *file_prefix = NULL;
>
>  /* Try to print something for an unknown tree code.  */
>
> @@ -461,7 +465,32 @@ dump_omp_clauses (pretty_printer *buffer, tree cla
>  }
>  }
>
> +/* Dump detailed information about pmu load latency events */
>
> +static void
> +dump_load_latency_details (pretty_printer *buffer, gcov_pmu_ll_info_t 
> *ll_info)
> +{
> +  if (ll_info == NULL)
> +return;
> +
> +  pp_string (buffer, "\n[load latency contribution: ");
> +  p

Go patch committed: Fix handling of omitted expression in switch

2012-09-28 Thread Ian Lance Taylor
In Go, if a switch statement omits the expression on which to switch, it
is taken to be the constant "true".  I was simply testing that the cases
were boolean, which is not quite right, as it is valid to compare an
empty interface against "true".  This patch to the Go frontend fixes the
problem.  Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.7 branch.

Ian

diff -r f2c5d044661e go/statements.cc
--- a/go/statements.cc	Fri Sep 28 14:19:44 2012 -0700
+++ b/go/statements.cc	Fri Sep 28 14:46:27 2012 -0700
@@ -3313,16 +3313,10 @@
 	   p != this->cases_->end();
 	   ++p)
 	{
-	  Expression* this_cond;
-	  if (val_temp == NULL)
-	this_cond = *p;
-	  else
-	{
-	  Expression* ref = Expression::make_temporary_reference(val_temp,
- loc);
-	  this_cond = Expression::make_binary(OPERATOR_EQEQ, ref, *p, loc);
-	}
-
+	  Expression* ref = Expression::make_temporary_reference(val_temp,
+ loc);
+	  Expression* this_cond = Expression::make_binary(OPERATOR_EQEQ, ref,
+			  *p, loc);
 	  if (cond == NULL)
 	cond = this_cond;
 	  else
@@ -3866,15 +3860,12 @@
   return Statement::make_statement(val, true);
 }
 
-  Temporary_statement* val_temp;
-  if (this->val_ == NULL)
-val_temp = NULL;
-  else
-{
-  // var val_temp VAL_TYPE = VAL
-  val_temp = Statement::make_temporary(NULL, this->val_, loc);
-  b->add_statement(val_temp);
-}
+  // var val_temp VAL_TYPE = VAL
+  Expression* val = this->val_;
+  if (val == NULL)
+val = Expression::make_boolean(true, loc);
+  Temporary_statement* val_temp = Statement::make_temporary(NULL, val, loc);
+  b->add_statement(val_temp);
 
   this->clauses_->lower(b, val_temp, this->break_label());
 


Re: RFC: LRA for x86/x86-64 [6/9]

2012-09-28 Thread Vladimir Makarov

On 12-09-28 4:07 PM, Jeff Law wrote:

On 09/27/2012 04:58 PM, Vladimir Makarov wrote:

   The following patch modifies some code in the rest of compiler for
correct work of LRA.  The code works the same way when LRA is not
used.  It is achieved by checking a new variable lra_in_progress.

2012-09-27  Vladimir Makarov 

 * rtlanal.c (simplify_subreg_regno): Permit ARG_POINTER_REGNUM and
 STACK_POINTER_REGNU for LRA.
 * jump.c (true_regnum): Always use hard_regno for 
subreg_get_info when

 lra is in progress.
 * expr.c (emit_move_insn_1): Pass an additional argument to
 emit_move_via_integer.  Use emit_move_via_integer for LRA only if
 the insn is recognized.
 * recog.c (general_operand, register_operand): Accept paradoxical
FLOAD_MODE
 subregs for LRA.
 (scratch_operand): Accept pseudos for LRA.
 * emit-rtl.c (gen_rtx_REG): Add lra_in_progress.
 (validate_subreg): Don't check offset for LRA and
 floating point modes.
 * rtl.h (lra_in_progress): New external.
 * ira.c (lra_in_progress): Define.

s/FLOAD/FLOAT/ to fix ChangeLog typo.



Thanks.  I fixed it; the fix will be in the revised versions of the patches.



Index: jump.c
===
--- jump.c(revision 191771)
+++ jump.c(working copy)
@@ -1868,7 +1868,8 @@ true_regnum (const_rtx x)
  {
if (REG_P (x))
  {
-  if (REGNO (x) >= FIRST_PSEUDO_REGISTER && reg_renumber[REGNO 
(x)] >= 0)

+  if (REGNO (x) >= FIRST_PSEUDO_REGISTER
+  && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
  return reg_renumber[REGNO (x)];
return REGNO (x);
  }
This hunk doesn't make any sense to me, unless you want true_regnum to 
return a negative value during LRA for cases where the pseudo is still 
unassigned.  Is that what you're intending here?  If that's what you 
want, then I think it's worth a quick comment.


Yes, that was my intention.  LRA works a bit differently from reload. It 
needs the current assignment of pseudos (the assigned hard register, or -1 
if the pseudo is not assigned).


I'll add a comment about this in the next version of the patch. The 
code looks a bit strange and could be clearer, but I had in mind 
removing the code for reload in the future.
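
The behaviour discussed for the first jump.c hunk can be modelled in a few lines.  This is a sketch under stated assumptions, not the GCC source: the register numbering (16 hard registers), the array size and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for this sketch: registers 0..15 are hard registers and
   anything above is a pseudo.  */
#define TOY_FIRST_PSEUDO 16
#define TOY_MAX_REGNO 20

/* Hard register assigned to each pseudo, or -1 if none yet.  */
static int toy_reg_renumber[TOY_MAX_REGNO];

/* Toy flag standing in for lra_in_progress.  */
static bool toy_lra_in_progress;

/* Model of the patched condition: during LRA a pseudo always maps
   to its current assignment (possibly -1); otherwise an unassigned
   pseudo keeps its own number.  */
static int
toy_true_regnum (int regno)
{
  if (regno >= TOY_FIRST_PSEUDO
      && (toy_lra_in_progress || toy_reg_renumber[regno] >= 0))
    return toy_reg_renumber[regno];
  return regno;
}
```

In this model the only visible difference is for unassigned pseudos, which yield -1 while the flag is set and their own number otherwise.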




@@ -1880,7 +1881,8 @@ true_regnum (const_rtx x)
  {
struct subreg_info info;

-  subreg_get_info (REGNO (SUBREG_REG (x)),
+  subreg_get_info (lra_in_progress
+   ? (unsigned) base : REGNO (SUBREG_REG (x)),
 GET_MODE (SUBREG_REG (x)),
 SUBREG_BYTE (x), GET_MODE (x), &info);
It'd be good to indicate why you want to do something different for LRA 
here.


I need to know the current allocation, taking into account that the 
subregister with the final hard register will be representable. I'll add the 
comment.


Index: rtlanal.c
===
--- rtlanal.c(revision 191771)
+++ rtlanal.c(working copy)
@@ -3465,7 +3465,9 @@ simplify_subreg_regno (unsigned int xreg
/* Give the backend a chance to disallow the mode change. */
if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
&& GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode))
+  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+  /* We can use mode change in LRA for some transformations.  */
+  && ! lra_in_progress)
  return -1;
  #endif

I don't think this change is reflected in the ChangeLog.

You are right.  I added the change description to the ChangeLog. Reload 
has no problem representing some decisions because it uses an 
internal representation.  It is a problem for LRA, which uses RTL, when for 
example two insn operands in different modes should have the same hard 
register according to the constraint.  I use a subreg for that even though it is 
not correct in other parts of the compiler.
I think just the minor ChangeLog updates and clarification of the 
changes to jump.c are all that's needed for this patch to be approved.



Thanks, Jeff.  I really appreciate your help.




Re: RFC: LRA for x86/x86-64 [2/9]

2012-09-28 Thread Vladimir Makarov

On 12-09-28 11:36 AM, Jeff Law wrote:

On 09/28/2012 09:21 AM, Vladimir Makarov wrote:

On 12-09-28 4:43 AM, Steven Bosscher wrote:

I have patches in the works to use the slim RTL dumping format more,
too, and to use the pretty-printer code so that printing strings with
escaped characters can be made more transparent (e.g. for use in
GraphViz dumps).

That would be nice.  Slim printing is very useful for LRA which prints a
lot of changes in RTL code during all its work.  Regular printing is
unreadable because of its volume.  For LRA debugging I usually find a
suspicious place in slim dump and if I need more info I use regular dump
of the suspicious insn.

Perhaps it's time to rename sched-vis.c to print-rtl-slim.c? :-)


Yes, the name sched-vis.c is very misleading.
It seems to me this change ought to go forward now (rename to 
print-rtl-slim.c and add print_value_slim).


WRT print_value_slim we have this block comment:




+/* Prints rtxes, I customarily classified as values.  They're
+   constants, registers, labels, symbols and memory accesses. Print
+   them to file F.  */


That block comment just doesn't make sense when I read it.  It seems 
like it should read something like:


/* Print X, an RTL value node, to file F in slim format.  Include
   additional information if VERBOSE is nonzero.

   Value nodes are constants, registers, labels, symbols and
   memory.  */


With that change I think you could make the obvious changes necessary 
to rename to print-rtl-slim and check this patch in.



Thanks, Jeff.  I will do it on Monday.



Re: RFC: LRA for x86/x86-64 [3/9]

2012-09-28 Thread Vladimir Makarov

On 12-09-28 3:08 PM, Jeff Law wrote:

On 09/27/2012 04:57 PM, Vladimir Makarov wrote:

LRA creates a lot of new pseudos.  So the following patch implements
ahead-of-time allocation of reg info, which is important for LRA
compilation speed.

2012-09-27  Vladimir Makarov 

 * reginfo.c (max_regno_since_last_resize): New.
 (reg_preferred_class, reg_alternate_class): Add assert.
 (allocate_reg_info): Initialize allocated reg info.
 (resize_reg_info): Make bigger reg_info and initialize new memory.
 (reginfo_init): Initialize max_regno_since_last_resize.
 (setup_reg_classes): Change assert.
This is fine.  FWIW, it roughly mirrors code I wrote a couple years 
ago when working on range splitting.
I have repeated it several times too for some of my projects.  So history 
repeats.  I think we should have added the code even if it would 
never be used.
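
The patch itself is not included in this message, so the following is only a generic sketch of the "allocate ahead and initialize the new memory" idea described in the ChangeLog.  The record layout, the 25% headroom and every toy_ name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy per-register record; the real reg info has different fields.  */
struct toy_reg_info { int pref_class; int alt_class; };

static struct toy_reg_info *toy_info;
static size_t toy_info_size;   /* Number of allocated entries.  */

/* Make sure entries 0..MAX_REGNO-1 exist.  Grow with some headroom
   (an assumed 25% here) so that many small increases do not each
   trigger a reallocation, and zero-initialize the new tail.
   Returns 0 on success, -1 on allocation failure.  */
static int
toy_resize_reg_info (size_t max_regno)
{
  if (max_regno <= toy_info_size)
    return 0;

  size_t new_size = max_regno + max_regno / 4;
  struct toy_reg_info *p = realloc (toy_info, new_size * sizeof *p);
  if (p == NULL)
    return -1;

  memset (p + toy_info_size, 0,
          (new_size - toy_info_size) * sizeof *p);
  toy_info = p;
  toy_info_size = new_size;
  return 0;
}
```

Because a pass like LRA keeps creating pseudos one at a time, the headroom means most requests return immediately without touching the allocator.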




[C++ Patch] PR 54738

2012-09-28 Thread Paolo Carlini

Hi,

Daniel, while working on the SFINAE-friendly std::result_of, noticed one 
more SFINAE bug: we weren't propagating complain from 
tsubst_copy_and_build via build_offset_ref_call_from_tree. Fixed the 
usual way.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2012-09-28  Paolo Carlini  

PR c++/54738
* decl2.c (build_offset_ref_call_from_tree): Add tsubst_flags_t
parameter.
* pt.c (tsubst_copy_and_build): Adjust.
* parser.c (cp_parser_postfix_expression): Likewise.
* cp-tree.h: Adjust declaration.

/testsuite
2012-09-28  Paolo Carlini  

PR c++/54738
* g++.dg/cpp0x/sfinae42.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 191843)
+++ cp/pt.c (working copy)
@@ -13783,7 +13783,8 @@ tsubst_copy_and_build (tree t,
  mark_used (function);
 
if (TREE_CODE (function) == OFFSET_REF)
- ret = build_offset_ref_call_from_tree (function, &call_args);
+ ret = build_offset_ref_call_from_tree (function, &call_args,
+complain);
else if (TREE_CODE (function) == COMPONENT_REF)
  {
tree instance = TREE_OPERAND (function, 0);
Index: cp/parser.c
===
--- cp/parser.c (revision 191843)
+++ cp/parser.c (working copy)
@@ -5749,7 +5749,8 @@ cp_parser_postfix_expression (cp_parser *parser, b
 || TREE_CODE (postfix_expression) == MEMBER_REF
 || TREE_CODE (postfix_expression) == DOTSTAR_EXPR)
  postfix_expression = (build_offset_ref_call_from_tree
-   (postfix_expression, &args));
+   (postfix_expression, &args,
+tf_warning_or_error));
else if (idk == CP_ID_KIND_QUALIFIED)
  /* A call to a static class member, or a namespace-scope
 function.  */
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 191843)
+++ cp/cp-tree.h(working copy)
@@ -5149,7 +5149,8 @@ extern void determine_visibility  (tree);
 extern void constrain_class_visibility (tree);
 extern void import_export_decl (tree);
 extern tree build_cleanup  (tree);
-extern tree build_offset_ref_call_from_tree(tree, VEC(tree,gc) **);
+extern tree build_offset_ref_call_from_tree(tree, VEC(tree,gc) **,
+tsubst_flags_t);
 extern bool decl_constant_var_p(tree);
 extern bool decl_maybe_constant_var_p  (tree);
 extern void check_default_args (tree);
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 191843)
+++ cp/decl2.c  (working copy)
@@ -4087,7 +4087,8 @@ cp_write_global_declarations (void)
ARGS.  */
 
 tree
-build_offset_ref_call_from_tree (tree fn, VEC(tree,gc) **args)
+build_offset_ref_call_from_tree (tree fn, VEC(tree,gc) **args,
+tsubst_flags_t complain)
 {
   tree orig_fn;
   VEC(tree,gc) *orig_args = NULL;
@@ -4115,7 +4116,7 @@ tree
   if (TREE_CODE (TREE_TYPE (fn)) == METHOD_TYPE)
{
  if (TREE_CODE (fn) == DOTSTAR_EXPR)
-   object = cp_build_addr_expr (object, tf_warning_or_error);
+   object = cp_build_addr_expr (object, complain);
  VEC_safe_insert (tree, gc, *args, 0, object);
}
   /* Now that the arguments are done, transform FN.  */
@@ -4130,17 +4131,17 @@ tree
void B::g() { (this->*p)(); }  */
   if (TREE_CODE (fn) == OFFSET_REF)
 {
-  tree object_addr = cp_build_addr_expr (object, tf_warning_or_error);
+  tree object_addr = cp_build_addr_expr (object, complain);
   fn = TREE_OPERAND (fn, 1);
   fn = get_member_function_from_ptrfunc (&object_addr, fn,
-tf_warning_or_error);
+complain);
   VEC_safe_insert (tree, gc, *args, 0, object_addr);
 }
 
   if (CLASS_TYPE_P (TREE_TYPE (fn)))
-expr = build_op_call (fn, args, tf_warning_or_error);
+expr = build_op_call (fn, args, complain);
   else
-expr = cp_build_function_call_vec (fn, args, tf_warning_or_error);
+expr = cp_build_function_call_vec (fn, args, complain);
   if (processing_template_decl && expr != error_mark_node)
 expr = build_min_non_dep_call_vec (expr, orig_fn, orig_args);
 
Index: testsuite/g++.dg/cpp0x/sfinae42.C
===
--- testsuite/g++.dg/cpp0x/sfinae42.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/sfinae42.C   (working copy)
@@ -0,0 +1,46 @@
+// PR c++/54738
+// { dg-do compile { target c++11 } }
+
+template
+T&& d

Breakage with "[v3] patch, configuring GCC --disable-libgomp causes libstdc++ abi check to fail."

2012-09-28 Thread Hans-Peter Nilsson
> From: Benjamin De Kosnik 
> Date: Fri, 28 Sep 2012 22:02:46 +0200

> There's no reason for this unintentional  ABI breakage with this
> configure flag.

> 2012-09-28  Benjamin Kosnik  
> 
> * acinclude.m4 (GLIBCXX_ENABLE_PARALLEL): Remove ENABLE_PARALLEL.
> * include/Makefile.am: Same.
> * src/c++98/Makefile.am: Same.
> * src/Makefile.am: Same.
> * Makefile.in: Regenerated.
> * aclocal.m4: Same.
> * configure: Same.
> * doc/Makefile.in: Same.
> * include/Makefile.in: Same.
> * libsupc++/Makefile.in: Same.
> * po/Makefile.in: Same.
> * python/Makefile.in: Same.
> * src/Makefile.in: Same.
> * testsuite/Makefile.in: Same.
> * src/c++11/Makefile.in: Same.
> * src/c++98/Makefile.in: Same.

Something went wrong and the target compiler is now wrongly
assumed to unconditionally understand -pthread.  For cris-elf:

libtool: compile:  /tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc/xgcc 
-shared-libgcc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc -nostdinc++ 
-L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src 
-L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src/.libs 
-nostdinc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/ -isystem 
/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/targ-include -isystem 
/tmp/hpautotest-gcc1/gcc/newlib/libc/include 
-B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/cris 
-L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/libnosys 
-L/tmp/hpautotest-gcc1/gcc/libgloss/cris 
-B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/bin/ 
-B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/lib/ -isystem 
/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/include -isystem 
/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/sys-include 
-I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/../libgcc 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include/cris-elf
  -I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include 
-I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall 
-Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once 
-ffunction-sections -fdata-sections -frandom-seed=parallel_settings.lo -g -O2 
-fopenmp -D_GLIBCXX_PARALLEL 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/../libgomp -c 
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/src/c++98/parallel_settings.cc -o 
parallel_settings.o
xgcc: error: unrecognized command line option '-pthread'

brgds, H-P


Re: [google] AutoFDO implementation

2012-09-28 Thread Andi Kleen
Dehao Chen  writes:
>
> If people in up-stream find this feature interesting, I'll spend some
> time to port this to trunk and try to opensource the tool to generate
> profile data file.

I find it interesting and would like to see the tool and a trunk version.

Thanks,
-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[v3] fixup --enable-cxx-flags

2012-09-28 Thread Benjamin De Kosnik

... found while working on an arm-eabisim cross: --enable-cxx-flags was
not working because AM_CXXFLAGS was being overridden by CXXFLAGS.
(Despite the comments warning about this.)

Fixed.
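
As a rough illustration of why the position of the flags matters (this is
not part of the patch): when several -g or -O options appear on one GCC
command line, the last one is the one that takes effect. The sketch below
models that rule with a hypothetical helper, last_flag_with_prefix, which is
an assumption made for this example and not anything in libstdc++ or GCC. It
matches by simple prefix, so it would also match unrelated flags such as
-ggdb; that simplification is deliberate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): return the last entry in
   FLAGS that starts with PREFIX, or NULL if none does.  This mimics
   the "last option wins" rule for options such as -g and -O.  */
const char *
last_flag_with_prefix (const char *const *flags, size_t n,
                       const char *prefix)
{
  const char *winner = NULL;
  size_t plen;

  assert (prefix != NULL);
  assert (flags != NULL || n == 0);
  plen = strlen (prefix);

  for (size_t i = 0; i < n; i++)
    if (strncmp (flags[i], prefix, plen) == 0)
      winner = flags[i];   /* A later match replaces an earlier one.  */

  return winner;
}
```

With the user's "-g0" placed before a toplevel "-g -O2", the helper reports
"-g"; with "-g0" placed last, as EXTRA_CXX_FLAGS is after this change, it
reports "-g0".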

Also a patchlet for the last commit: I forgot to edit PARALLEL_FLAGS, so
--disable-thread compiles were failing.

-benjamin

tested x86/linux
tested x86/linux --enable-cxx-flags="-g0"

2012-09-28  Benjamin Kosnik  

	* fragment.am (CONFIG_CXXFLAGS): Remove EXTRA_CXX_FLAGS.
	* libsupc++/Makefile.am (LTCXXCOMPILE): Add EXTRA_CXX_FLAGS here.
	* src/Makefile.am: Same.
	* src/c++98/Makefile.am: Same.
	* src/c++11/Makefile.am: Same.
	* Makefile.in: Regenerated.
	* src/Makefile.in: Same.
	* src/c++11/Makefile.in: Same.
	* src/c++98/Makefile.in: Same.
	* include/Makefile.in: Same.
	* po/Makefile.in: Same.
	* python/Makefile.in: Same.
	* testsuite/Makefile.in: Same.

2012-09-28  Benjamin Kosnik  

	* src/c++98/Makefile.am: Fixup PARALLEL_FLAGS.

diff --git a/libstdc++-v3/fragment.am b/libstdc++-v3/fragment.am
index 64247af..5b1d503 100644
--- a/libstdc++-v3/fragment.am
+++ b/libstdc++-v3/fragment.am
@@ -22,7 +22,8 @@ endif
 # These bits are all figured out from configure.  Look in acinclude.m4
 # or configure.ac to see how they are set.  See GLIBCXX_EXPORT_FLAGS.
 CONFIG_CXXFLAGS = \
-	$(SECTION_FLAGS) $(HWCAP_FLAGS) $(EXTRA_CXX_FLAGS) -frandom-seed=$@
+	$(SECTION_FLAGS) $(HWCAP_FLAGS) -frandom-seed=$@
+
 WARN_CXXFLAGS = \
 	$(WARN_FLAGS) $(WERROR_FLAG) -fdiagnostics-show-location=once 
 
diff --git a/libstdc++-v3/libsupc++/Makefile.am b/libstdc++-v3/libsupc++/Makefile.am
index 7c72f58..69cbf5c 100644
--- a/libstdc++-v3/libsupc++/Makefile.am
+++ b/libstdc++-v3/libsupc++/Makefile.am
@@ -179,14 +179,14 @@ LTCOMPILE = $(LIBTOOL) --tag CC --tag disable-shared $(LIBTOOLFLAGS) --mode=comp
 # placed after --tag CXX lest things CXX undo the affect of
 # disable-shared.
 
-# 2) Need to explicitly set LTCXXCOMPILE so that AM_CXXFLAGS is
+# 2) Need to explicitly set LTCXXCOMPILE so that EXTRA_CXX_FLAGS is
 # last. (That way, things like -O2 passed down from the toplevel can
 # be overridden by --enable-debug.)
 LTCXXCOMPILE = \
 	$(LIBTOOL) --tag CXX --tag disable-shared \
 	$(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CXX) $(TOPLEVEL_INCLUDES) \
-	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
+	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) $(EXTRA_CXX_FLAGS)
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
 
diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index 0251289..d76318e 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -154,14 +154,14 @@ AM_CXXFLAGS = \
 # placed after --tag CXX lest things CXX undo the affect of
 # disable-shared.
 
-# 2) Need to explicitly set LTCXXCOMPILE so that AM_CXXFLAGS is
+# 2) Need to explicitly set LTCXXCOMPILE so that EXTRA_CXX_FLAGS is
 # last. (That way, things like -O2 passed down from the toplevel can
-# be overridden by --enable-debug.)
+# be overridden by --enable-debug and --enable-cxx-flags.)
 LTCXXCOMPILE = \
 	$(LIBTOOL) --tag CXX \
 	$(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CXX) $(INCLUDES) \
-	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
+	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) $(EXTRA_CXX_FLAGS)
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
 
diff --git a/libstdc++-v3/src/c++11/Makefile.am b/libstdc++-v3/src/c++11/Makefile.am
index 1e3dd99..4922461 100644
--- a/libstdc++-v3/src/c++11/Makefile.am
+++ b/libstdc++-v3/src/c++11/Makefile.am
@@ -99,14 +99,14 @@ AM_MAKEFLAGS = \
 # placed after --tag CXX lest things CXX undo the affect of
 # disable-shared.
 
-# 2) Need to explicitly set LTCXXCOMPILE so that AM_CXXFLAGS is
+# 2) Need to explicitly set LTCXXCOMPILE so that EXTRA_CXX_FLAGS is
 # last. (That way, things like -O2 passed down from the toplevel can
 # be overridden by --enable-debug.)
 LTCXXCOMPILE = \
 	$(LIBTOOL) --tag CXX --tag disable-shared \
 	$(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
 	--mode=compile $(CXX) $(TOPLEVEL_INCLUDES) \
-	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
+	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) $(EXTRA_CXX_FLAGS)
 
 LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS))
 
diff --git a/libstdc++-v3/src/c++98/Makefile.am b/libstdc++-v3/src/c++98/Makefile.am
index 5733f34..6d83416 100644
--- a/libstdc++-v3/src/c++98/Makefile.am
+++ b/libstdc++-v3/src/c++98/Makefile.am
@@ -161,7 +161,7 @@ concept-inst.o: concept-inst.cc
 	$(CXXCOMPILE) -D_GLIBCXX_CONCEPT_CHECKS -fimplicit-templates -c $<
 
 # Use special rules for parallel mode compilation.
-PARALLEL_FLAGS = -fopenmp -D_GLIBCXX_PARALLEL -I$(glibcxx_builddir)/../libgomp
+PARALLEL_FLAGS = -D_GLIBCXX_PARALLEL 
 parallel_settings.lo: parallel_settings.cc
 	$(LTCXXCOMPILE) $(PARALLEL_FLAGS) -c $<
 parallel_settings.o: parallel_settings.cc
@@ -203,14 +203,14 @@ AM_MAKEFLAGS = \
 # placed after --tag CXX lest

RE: [PATCH RFA] Implement register pressure directed hoist pass

2012-09-28 Thread Bin Cheng

> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb@gmail.com]
> Sent: Friday, September 28, 2012 4:29 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; Eric Botcazou; Richard Sandiford;
> vmaka...@redhat.com
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng  wrote:
> > (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".
> 
> 
> > (change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
> > (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
> 
> Please use the DF caches instead of note_stores, note_uses, etc.
> 
> 
> > (free_bb_data): New.
> 
> Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
> free_aux_for_block.
> 

Hi Steven,

This is the updated patch according to your comments. Please review.
I also re-collected the code size data and found that it improves by about
0.24% for MIPS, which is better than the previous data. I believe this is
caused by recent changes in trunk, rather than by using DF caches to
calculate register pressure.

Thanks.

2012-09-29  Bin Cheng  

* common.opt (flag_ira_hoist_pressure): New.
* doc/invoke.texi (-fira-hoist-pressure): Describe.
* ira-costs.c (ira_set_pseudo_classes): New parameter.
* ira.h (ira_set_pseudo_classes): Update prototype.
* haifa-sched.c (sched_init): Update call.
* ira.c (ira): Update call.
* regmove.c (regmove_optimize): Update call.
* loop-invariant.c (move_loop_invariants): Update call.
* gcse.c (struct bb_data): New structure.
(BB_DATA): New macro.
(curr_bb, curr_regs_live, curr_reg_pressure, regs_set)
(n_regs_set): New static variables.
(hoist_expr_reaches_here_p): Use reg pressure to determine the
distance expr can be hoisted.
(hoist_code): Use reg pressure to direct the hoist process.
(get_regno_pressure_class, get_pressure_class_and_nregs)
(change_pressure, mark_regno_live, mark_regno_death)
(mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New.
(one_code_hoisting_pass): Calculate register pressure. Free data.
* config/arm/arm.c (arm_option_override): Set flag_ira_hoist_pressure
on Thumb1 when optimizing for size.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 191816)
+++ gcc/doc/invoke.texi (working copy)
@@ -370,7 +370,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6904,6 +6904,14 @@ This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in hoist pass for decisions to hoist
+expressions.  This option usually results in generation of smaller code on
+RISC machines, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for some targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/haifa-sched.c
===
--- gcc/haifa-sched.c   (revision 191816)
+++ gcc/haifa-sched.c   (working copy)
@@ -6629,7 +6629,7 @@ sched_init (void)
/* We need info about pseudos for rtl dumps about pseudo
   classes and costs.  */
regstat_init_n_sets_and_refs ();
-  ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+  ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
   sched_regno_pressure_class
= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
   for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===
--- gcc/regmove.c   (revision 191816)
+++ gcc/regmove.c   (working copy)
@@ -1237,7 +1237,7 @@ regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-ira_set_pseudo_classes (dump_file);
+ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===
--- gcc/gcse.c  (revision 191816)
+++ gcc/gcse.c  (working copy)
@@ -20,9 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* TODO
- reordering of memory allocation and freeing to be more space efficient
-

[PATCH RFA] Implement register pressure directed hoist pass

2012-09-28 Thread Bin Cheng
Hi,

This patch implements a register pressure directed hoist pass. Basically it
calculates register pressure for each basic block and uses that information
to determine the hoist distance of each candidate expression. The register
pressure is calculated by re-using IRA utilities.
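
To make the idea concrete, here is a toy sketch of per-block register
pressure and a pressure-based hoist check. It is not the code in the patch
and does not use the IRA utilities: struct toy_insn, toy_bb_max_pressure and
toy_can_hoist_through are hypothetical names invented for this illustration.
It assumes a single register class, at most 32 pseudos represented as bits,
and ignores registers that are defined but never used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction: bit i set means pseudo register i is defined/used.  */
struct toy_insn
{
  uint32_t defs;
  uint32_t uses;
};

/* Count the set bits in X.  */
static int
popcount32 (uint32_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;   /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Walk the block backwards from LIVE_OUT and return the largest number
   of simultaneously live registers seen at any program point.  */
int
toy_bb_max_pressure (const struct toy_insn *insns, size_t n,
                     uint32_t live_out)
{
  uint32_t live = live_out;
  int max = popcount32 (live);

  assert (insns != NULL || n == 0);
  for (size_t i = n; i-- > 0; )
    {
      live &= ~insns[i].defs;   /* A definition ends liveness above it.  */
      live |= insns[i].uses;    /* A use makes the register live above.  */
      int p = popcount32 (live);
      if (p > max)
        max = p;
    }
  return max;
}

/* Hypothetical policy: hoisting an expression that needs EXTRA_REGS
   through a block is acceptable only if the block's pressure stays
   within AVAIL_REGS.  */
int
toy_can_hoist_through (int bb_max_pressure, int extra_regs, int avail_regs)
{
  return bb_max_pressure + extra_regs <= avail_regs;
}
```

For a block "r2 = r0 + r1; r3 = r2 + r0" with only r3 live on exit, the
backward walk sees at most two live registers, so hoisting an expression
that needs one more register would be accepted with three registers
available and rejected with two.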

I measured the benefit on the Thumb1/Thumb2/ARM/x86/MIPS instruction sets
and targets. For CSiBE, it improves code size by more than 0.1% on the
Thumb1/ARM instruction sets and by nearly 0.2% on MIPS, all with very small
regressions. Since the hoist pass itself improves code size by only about
0.1% on the Thumb1 instruction set, this is a considerable improvement.
Unfortunately this patch has no obvious effect on Thumb2 and x86, so
currently I enable it only on Thumb1 when optimizing for size. Other targets
can take advantage of it as necessary once it is upstream.

Apart from the change in the hoist pass, this patch also changes the
prototype of the function ira_set_pseudo_classes in IRA. This change makes
IRA re-calculate cost information by itself, rather than re-using the info
calculated by the hoist pass, because hoist is an early pass and its
information cannot be used directly in IRA. You can refer to
http://gcc.gnu.org/ml/gcc/2012-08/msg00299.html for some discussion.

I bootstrapped GCC on x86_64 at trunk r190769, because the head revision
failed to bootstrap at the time I was working on this. The results are:
                                 bootstrap time (real/user/sys)
trunk/Os                         122m9s/118m30s/19m48s
patched/Os                       122m20s/118m19s/19m52s
patched/Os/fira-hoist-pressure   120m47s/119m9s/19m38s
The patch seems to cause no obvious slowdown in GCC compilation time. I also
measured generated binaries such as cc1/cc1plus: the code size of the text
section improves by only 0.05% on x86_64, less than on ARM/MIPS (0.1-0.2%).
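
For readers who want to compare the timings above numerically, a small
sketch follows. The helpers parse_min_sec and percent_change are
hypothetical names written for this illustration; they are not part of the
patch or of any GCC tooling. The sketch converts a "<minutes>m<seconds>s"
string to seconds and computes a relative change against a baseline.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a string of the form "<M>m<S>s" (for example "122m9s") into
   seconds.  Returns -1 if the minutes and seconds cannot be read.  */
int
parse_min_sec (const char *s)
{
  int m, sec;

  assert (s != NULL);
  if (sscanf (s, "%dm%ds", &m, &sec) != 2)
    return -1;
  return m * 60 + sec;
}

/* Relative change of VALUE against BASE, in percent.  Negative means
   VALUE is smaller (faster, for timings) than BASE.  */
double
percent_change (int base, int value)
{
  assert (base != 0);
  return 100.0 * (value - base) / base;
}
```

Applied to the real times, percent_change (parse_min_sec ("122m9s"),
parse_min_sec ("120m47s")) gives roughly -1.1%, while the user times
(118m30s against 119m9s) give roughly +0.5%, which is consistent with the
conclusion that there is no obvious slowdown.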

I ran the regression tests on cortex-m3/cortex-m0/x86 with -Os and
everything was fine.

Is it ok for upstream?

Thanks

2012-09-28  Bin Cheng  

* common.opt (flag_ira_hoist_pressure): New.
* doc/invoke.texi (-fira-hoist-pressure): Describe.
* ira-costs.c (ira_set_pseudo_classes): New parameter.
* ira.h (ira_set_pseudo_classes): Update prototype.
* haifa-sched.c (sched_init): Update call.
* ira.c (ira): Update call.
* regmove.c (regmove_optimize): Update call.
* loop-invariant.c (move_loop_invariants): Update call.
* gcse.c (struct bb_data): New structure.
(BB_DATA): New macro.
(curr_bb, curr_regs_live, curr_reg_pressure, regs_set, n_regs_set): New
static variables.
(hoist_expr_reaches_here_p): Use reg pressure to determine the distance
expr can be hoisted.
(hoist_code): Use reg pressure to direct the hoist process.
(get_regno_pressure_class, get_pressure_class_and_nregs)
(change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
(mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
(free_bb_data): New.
(one_code_hoisting_pass): Calculate register pressure. Free data.
* config/arm/arm.c (arm_option_override): Set flag_ira_hoist_pressure
on Thumb1 when optimizing for size.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 191816)
+++ gcc/doc/invoke.texi (working copy)
@@ -370,7 +370,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6904,6 +6904,14 @@ This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in hoist pass for decisions to hoist
+expressions.  This option usually results in generation of smaller code on
+RISC machines, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for some targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/haifa-sched.c
===
--- gcc/haifa-sched.c   (revision 191816)
+++ gcc/haifa-sched.c   (working copy)
@@ -6629,7 +6629,7 @@ sched_init (void)
/* We need info about pseudos for rtl dumps about pseudo
   classes and costs.  */
regstat_init_n_sets_and_refs ();
-  ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+  ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
   sched_regno_pressure_class
= (enum reg_cl