Commit: Xstormy16: Add modes to post_inc and pre_dec patterns

2014-02-21 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to add modes to the POST_INC and PRE_DEC
  patterns in the XStormy16 backend.  The lack of the modes was leading
  to some build problems.

Cheers
  Nick

gcc/ChangeLog
2014-02-21  Nick Clifton  

* config/stormy16/stormy16.md (pushqi1): Add mode to post_inc.
(pushhi1): Likewise.
(popqi1): Add mode to pre_dec.
(pophi1): Likewise.

Index: gcc/config/stormy16/stormy16.md
===
--- gcc/config/stormy16/stormy16.md (revision 207983)
+++ gcc/config/stormy16/stormy16.md (working copy)
@@ -114,7 +114,7 @@
 ;; insns like this one are never generated.
 
 (define_insn "pushqi1"
-  [(set (mem:QI (post_inc (reg:HI 15)))
+  [(set (mem:QI (post_inc:HI (reg:HI 15)))
(match_operand:QI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -123,7 +123,7 @@
 
 (define_insn "popqi1"
   [(set (match_operand:QI 0 "register_operand" "=r")
-   (mem:QI (pre_dec (reg:HI 15))))]
+   (mem:QI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")
@@ -168,7 +168,7 @@
(set_attr "psw_operand" "0,0,0,0,nop,0,nop,0,0")])
 
 (define_insn "pushhi1"
-  [(set (mem:HI (post_inc (reg:HI 15)))
+  [(set (mem:HI (post_inc:HI (reg:HI 15)))
(match_operand:HI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -177,7 +177,7 @@
 
 (define_insn "pophi1"
   [(set (match_operand:HI 0 "register_operand" "=r")
-   (mem:HI (pre_dec (reg:HI 15))))]
+   (mem:HI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")
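For readers less used to RTL autoincrement addressing, here is a small C model of what the HImode patterns describe. It assumes, as the `post_inc`/`pre_dec` forms imply, a stack that grows upward through the 16-bit register 15: a push stores and then increments the stack pointer, and a pop decrements it and then loads. The `:HI` mode added by the patch is the mode in which that pointer arithmetic is done. The memory array, the little-endian byte order and the helper names are illustrative assumptions, not taken from the XStormy16 port, and only the HImode patterns are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumed): 64 KiB of byte-addressed memory and a 16-bit
   stack pointer playing the role of (reg:HI 15).  */
static uint8_t mem[65536];
static uint16_t sp;

/* pushhi1: (set (mem:HI (post_inc:HI (reg:HI 15))) x)
   Store at the current SP, then advance SP by the 2-byte access size.
   The uint16_t arithmetic mirrors the HImode address computation.  */
static void
push_hi (uint16_t x)
{
  mem[sp] = (uint8_t) (x & 0xff);                  /* low byte (assumed LE) */
  mem[(uint16_t) (sp + 1)] = (uint8_t) (x >> 8);   /* high byte */
  sp = (uint16_t) (sp + 2);
}

/* pophi1: (set x (mem:HI (pre_dec:HI (reg:HI 15))))
   Step SP back by 2 first, then load from the new SP.  */
static uint16_t
pop_hi (void)
{
  assert (sp >= 2);   /* toy model: no stack underflow */
  sp = (uint16_t) (sp - 2);
  return (uint16_t) (mem[sp] | (mem[(uint16_t) (sp + 1)] << 8));
}
```

Pushes and pops pair up because the pop's pre-decrement undoes exactly the push's post-increment.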


Re: [PATCH] Fix PR c++/60065.

2014-02-21 Thread Adam Butcher

On 2014-02-20 16:18, Jason Merrill wrote:

On 02/19/2014 10:00 PM, Adam Butcher wrote:

+  if (current_template_parms)
+    {
+      cp_binding_level *maybe_tmpl_scope = current_binding_level->level_chain;
+      while (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_class)
+        maybe_tmpl_scope = maybe_tmpl_scope->level_chain;
+      if (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_template_parms)
+        declaring_template_p = true;
+    }


Won't this return true for a member function of a class template?  i.e.


template <typename T>
struct A {
  void f(auto x);
};

Yes, I think you're right.  I was thinking about that yesterday but 
hadn't had a chance to get to my PC to check or post a reply.  The 
intent is to deal with out-of-line implicit member templates.  But I 
think the issue is more complex, and the same problem may affect the 
synthesize code as well as this new code.


A class template with an out-of-line generic function definition will 
give the same issue I think:


  template <typename T>
  void A<T>::f(auto x) {}  // should inject a new list

It needs to know when to extend a function template parameter list and 
when to insert a new one.  Another case:


  struct B
  {
    template <typename T>
    void f(auto x);
  };

  template <typename T>
  void B::f(auto x) {}  // should extend existing inner list

And also:

  template <typename T>
  struct C
  {
    template <typename U>
    void f(auto x);
  };

  template <typename T>
  template <typename U>
  void C<T>::f(auto x) {}  // should extend existing inner list

Obviously, classes and class templates can be nested to an arbitrary depth.

Need to look further into it when I get some more time.


Once it's resolved I think it'd be useful to create a new function to 
determine this rather than doing the scope walk in a number of places.  
Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully 
some more elegant name!
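A helper along the lines suggested above might look roughly like this. It is only a sketch of the scope walk from the quoted patch, written against simplified, hypothetical stand-ins for the front end's binding levels and scope kinds. It deliberately keeps the patch's logic, so it has the same limitation Jason pointed out: it also succeeds for a member function of a class template.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the front end's scope kinds
   and binding levels; only what the walk needs.  */
enum scope_kind
{
  sk_namespace,
  sk_class,
  sk_template_parms,
  sk_function_parms
};

struct binding_level
{
  enum scope_kind kind;
  struct binding_level *level_chain;   /* enclosing scope */
};

/* Same walk as the quoted patch: starting from the scope enclosing
   CURRENT, skip class scopes and return the template parameter scope
   found there, or NULL if there is none.  */
static struct binding_level *
templ_parm_scope_for_fn_being_declared (struct binding_level *current)
{
  assert (current != NULL);
  struct binding_level *s = current->level_chain;
  while (s && s->kind == sk_class)
    s = s->level_chain;
  return (s && s->kind == sk_template_parms) ? s : NULL;
}
```

For a function-parameter scope inside a class scope inside a template parameter scope, this returns the template scope, which is exactly the member-of-a-class-template case raised above; the walk alone cannot tell whether to extend an existing list or inject a new one.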




Why doesn't num_template_parameter_lists work as a predicate here?

It works in the lambda case as it is updated there, but for generic 
functions I think the following prevents it:


  cp/parser.c:17063:

  /* Inside the function parameter list, surrounding
     template-parameter-lists do not apply.  */
  saved_num_template_parameter_lists
    = parser->num_template_parameter_lists;
  parser->num_template_parameter_lists = 0;

  begin_scope (sk_function_parms, NULL_TREE);

  /* Parse the parameter-declaration-clause.  */
  params = cp_parser_parameter_declaration_clause (parser);

  /* Restore saved template parameter lists accounting for implicit
     template parameters.  */
  parser->num_template_parameter_lists
    += saved_num_template_parameter_lists;
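To see why the counter cannot serve as the predicate here, consider a minimal C model of the save, zero and restore sequence quoted above. The `struct parser` and the `implicit_lists` argument are hypothetical simplifications: whatever template lists enclose the declaration, code running while the parameters are parsed sees a count of zero, and only afterwards are the saved lists added back on top of any implicit ones.

```c
#include <assert.h>

/* Hypothetical stand-in for the parser state: just the counter.  */
struct parser
{
  int num_template_parameter_lists;
};

/* Models the quoted sequence.  IMPLICIT_LISTS stands for the number of
   implicit template parameter lists synthesized while parsing the
   parameters (e.g. one for a parameter declared 'auto').  Returns the
   counter value that code running during parameter parsing sees.  */
static int
parse_parameter_clause (struct parser *parser, int implicit_lists)
{
  assert (implicit_lists >= 0);

  /* Inside the function parameter list, surrounding
     template-parameter-lists do not apply.  */
  int saved = parser->num_template_parameter_lists;
  parser->num_template_parameter_lists = 0;

  /* Anything checking the counter here sees 0, whatever encloses us.  */
  int seen_while_parsing = parser->num_template_parameter_lists;
  parser->num_template_parameter_lists += implicit_lists;

  /* Restore, accounting for implicit template parameters.  */
  parser->num_template_parameter_lists += saved;
  return seen_while_parsing;
}
```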


Cheers,
Adam



[PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Jakub Jelinek
Hi!

As discussed in the PR, on larger functions we can end up with
over 3 million nested compute_control_dep_chain calls from
a single top-level compute_control_dep_chain call; on that testcase all that
effort yields zero or at most one (useless) control dependence path.
The problem is that the function is really unbounded: even with the
6-element path length limit (recursion depth) and the limit of 8
find_pdom calls, everything still iterates over all the successor edges at
each level.  Moreover, the function is often called on the same basic block
again and again, even at a particular depth level (e.g. over 20 times for
the same bb at the same depth level).  But the preceding edge list is slightly
different in each case, and in theory it could give different answers.

Fixed by bounding the total number of nested calls.
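The bounding technique can be sketched in isolation: a single counter is passed by pointer through every level of the recursion, so the limit applies to the total number of calls made for one top-level query rather than to the recursion depth. The graph type, `find_path` and its parameters below are hypothetical and much simpler than `compute_control_dep_chain`; only the check-then-increment order mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCCS 4

/* Hypothetical toy CFG node (not GCC's basic_block).  */
struct node
{
  int nsuccs;
  struct node *succs[MAX_SUCCS];
};

/* Depth-first search for a path of at most MAX_DEPTH edges from BB to
   TARGET.  *NUM_CALLS is shared by every level of the recursion, so
   LIMIT bounds the total number of calls made for one top-level
   query, not just the depth.  As in the patch, the counter is checked
   before it is incremented.  */
static bool
find_path (struct node *bb, struct node *target, int depth, int max_depth,
           int *num_calls, int limit)
{
  assert (bb != NULL && num_calls != NULL);

  if (*num_calls > limit)
    return false;          /* budget exhausted: give up */
  ++*num_calls;

  if (bb == target)
    return true;
  if (depth >= max_depth)
    return false;
  for (int i = 0; i < bb->nsuccs; i++)
    if (find_path (bb->succs[i], target, depth + 1, max_depth,
                   num_calls, limit))
      return true;
  return false;
}
```

Once the budget is spent, every pending level returns false immediately, which corresponds to the "not computed" outcome the function's comment already allows for.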

Additionally, I've made a couple of cleanups: heap-allocating an 8-element array
instead of using an automatic array makes no sense, and since the chain length is
at most 6 we can use a stack vector, etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-21  Jakub Jelinek  

PR tree-optimization/56490
* params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
* tree-ssa-uninit.c: Include params.h.
(compute_control_dep_chain): Add num_calls argument, return false
if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
num_calls to recursive call.
(find_predicates): Change dep_chains into normal array,
cur_chain into auto_vec, add num_calls
variable and adjust compute_control_dep_chain caller.
(find_def_preds): Likewise.

--- gcc/params.def.jj   2014-01-09 19:09:47.0 +0100
+++ gcc/params.def  2014-02-20 19:30:37.467597338 +0100
@@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
  "asan-use-after-return",
  "Enable asan builtin functions protection",
  1, 0, 1)
+
+DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
+ "uninit-control-dep-attempts",
+ "Maximum number of nested calls to search for control dependencies "
+ "during uninitialized variable analysis",
+ 1000, 1, 0)
 /*
 
 Local variables:
--- gcc/tree-ssa-uninit.c.jj2014-02-04 01:35:58.0 +0100
+++ gcc/tree-ssa-uninit.c   2014-02-20 19:31:14.198385817 +0100
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "hashtab.h"
 #include "tree-pass.h"
 #include "diagnostic-core.h"
+#include "params.h"
 
 /* This implements the pass that does predicate aware warning on uses of
possibly uninitialized variables. The pass first collects the set of
@@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb
 
 /* Computes the control dependence chains (paths of edges)
for DEP_BB up to the dominating basic block BB (the head node of a
-   chain should be dominated by it).  CD_CHAINS is pointer to a
-   dynamic array holding the result chains. CUR_CD_CHAIN is the current
+   chain should be dominated by it).  CD_CHAINS is pointer to an
+   array holding the result chains.  CUR_CD_CHAIN is the current
chain being computed.  *NUM_CHAINS is total number of chains.  The
function returns true if the information is successfully computed,
return false if there is no control dependence or not computed.  */
@@ -400,7 +401,8 @@ static bool
 compute_control_dep_chain (basic_block bb, basic_block dep_bb,
vec<edge> *cd_chains,
size_t *num_chains,
-   vec<edge> *cur_cd_chain)
+  vec<edge> *cur_cd_chain,
+  int *num_calls)
 {
   edge_iterator ei;
   edge e;
@@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
   if (EDGE_COUNT (bb->succs) < 2)
 return false;
 
+  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
+return false;
+  ++*num_calls;
+
   /* Could use a set instead.  */
   cur_chain_len = cur_cd_chain->length ();
   if (cur_chain_len > MAX_CHAIN_LEN)
@@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b
 
   /* Now check if DEP_BB is indirectly control dependent on BB.  */
   if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
- num_chains, cur_cd_chain))
+num_chains, cur_cd_chain, num_calls))
 {
   found_cd_chain = true;
   break;
@@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
  basic_block use_bb)
 {
   size_t num_chains = 0, i;
-  vec<edge> *dep_chains = 0;
-  vec<edge> cur_chain = vNULL;
+  int num_calls = 0;
+  vec<edge> dep_chains[MAX_NUM_CHAINS];
+  auto_vec<edge, MAX_CHAIN_LEN + 1> cur_chain;
   bool has_valid_pred = false;
   basic_block cd_root = 0;
 
-  typedef vec<edge> vec_edge_heap;
-  dep_chains = XCNEWVEC (vec_edge_heap, MAX_NUM_CHAINS);
-
   /* First find the closest bb that is control equivalent to PHI_BB
  that also dominates USE_BB.  */
   cd_root = phi_bb;
@@

Re: [RFA/dwarf v2] Add DW_AT_GNAT_use_descriptive_type flag for Ada units.

2014-02-21 Thread Joel Brobecker
Hello,

Would anyone be able to (re-)approve this patch, please?

It should be very straightforward: it only adds a DWARF
flag to Ada compilation units, so I think the risk
is near zero. I've tested the patch as usual regardless.

In parallel, we have also started working on producing
standard DWARF in place of our encodings, and some progress has been
made. But this is an even bigger task than we thought, and in
the meantime, this little flag will help non-AdaCore users.
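The quoted exchange below explains what the flag is for: when a DIE lacks a descriptive type attribute, the debugger needs to know whether the producer would normally have emitted one. A minimal sketch of that consumer-side decision, using hypothetical structures and names rather than GDB's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what a DWARF consumer knows;
   these are not GDB's data structures.  */
struct cu_info
{
  bool gnat_use_descriptive_type;  /* CU has DW_AT_GNAT_use_descriptive_type */
};

struct type_die
{
  const struct cu_info *cu;
  const void *descriptive_type;    /* target of DW_AT_GNAT_descriptive_type,
                                      or NULL if the attribute is absent */
};

enum lookup_strategy
{
  FOLLOW_DESCRIPTIVE_TYPE,  /* the DIE names its descriptive type */
  NO_DESCRIPTIVE_TYPE,      /* producer would have emitted one: none exists */
  SEARCH_BY_OTHER_MEANS     /* producer unknown: use the slower fallback */
};

static enum lookup_strategy
choose_lookup_strategy (const struct type_die *die)
{
  assert (die != NULL && die->cu != NULL);
  if (die->descriptive_type != NULL)
    return FOLLOW_DESCRIPTIVE_TYPE;
  if (die->cu->gnat_use_descriptive_type)
    return NO_DESCRIPTIVE_TYPE;
  return SEARCH_BY_OTHER_MEANS;
}
```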

Thank you!

On Fri, Jan 31, 2014 at 09:09:05AM +0400, Joel Brobecker wrote:
> On Tue, Feb 19, 2013 at 10:50:46PM -0500, Jason Merrill wrote:
> > On 02/19/2013 10:42 PM, Joel Brobecker wrote:
> > >This is useful when a DIE does not have a descriptive type attribute.
> > >In that case, the debugger needs to determine whether the unit
> > >was compiled with a compiler that normally provides that information,
> > >or not.
> > 
> > Ah.  OK, then.  But I'd prefer to call it
> > DW_AT_GNAT_use_descriptive_type, to follow the convention of keeping
> > the vendor tag at the beginning of the name.
> 
> Almost a year ago, you privately approved a small patch of mine,
> with the small request above. I'm sorry I let it drag so long!
> Here is the updated patch.
> 
> include/ChangeLog:
> 
> * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
> DW_AT_GNAT_use_descriptive_type.
> 
> gcc/ChangeLog:
> 
> * dwarf2out.c (gen_compile_unit_die): Add
> DW_AT_GNAT_use_descriptive_type attribute for Ada units.
> 
> Tested on x86_64-linux.
> 
> I should also adjust the Wiki page accordingly, but the login process
> keeps timing out. I know I have the right login and passwd since
> I successfully reset them using the passwd recovery procedure, just
> in case the error was due to bad credentials. I'll try again later.
> 
> If approved, I will also take care of coordinating the dwarf2.def
> change with binutils-gdb.git.
> 
> Is this patch still OK to commit?
> 
> Thank you,
> -- 
> Joel

> From 7aae3721addf6905113d9f0287a5cbb5301a462b Mon Sep 17 00:00:00 2001
> From: Joel Brobecker 
> Date: Thu, 3 Jan 2013 09:25:12 -0500
> Subject: [PATCH] [dwarf] Add DW_AT_GNAT_use_descriptive_type flag for Ada 
> units.
> 
> This patch first renames the DW_AT_use_GNAT_descriptive_type DWARF
> attribute into DW_AT_GNAT_use_descriptive_type to better follow
> the usual convention of keeping the vendor tag at the beginning
> of the name.
> 
> It then modifies dwarf2out to generate this attribute for Ada units.
> 
> include/ChangeLog:
> 
> * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
> DW_AT_GNAT_use_descriptive_type.
> 
> gcc/ChangeLog:
> 
> * dwarf2out.c (gen_compile_unit_die): Add
> DW_AT_GNAT_use_descriptive_type attribute for Ada units.
> ---
>  gcc/dwarf2out.c    |    4 ++++
>  include/dwarf2.def |    2 +-
>  2 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index d1ca4ba..057605c 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -19318,6 +19318,10 @@ gen_compile_unit_die (const char *filename)
>/* The default DW_ID_case_sensitive doesn't need to be specified.  */
>break;
>  }
> +
> +  if (language == DW_LANG_Ada95)
> +add_AT_flag (die, DW_AT_GNAT_use_descriptive_type, 1);
> +
>return die;
>  }
>  
> diff --git a/include/dwarf2.def b/include/dwarf2.def
> index 71a37b3..4dd636e 100644
> --- a/include/dwarf2.def
> +++ b/include/dwarf2.def
> @@ -398,7 +398,7 @@ DW_AT (DW_AT_VMS_rtnbeg_pd_address, 0x2201)
>  /* GNAT extensions.  */
>  /* GNAT descriptive type.
> See http://gcc.gnu.org/wiki/DW_AT_GNAT_descriptive_type .  */
> -DW_AT (DW_AT_use_GNAT_descriptive_type, 0x2301)
> +DW_AT (DW_AT_GNAT_use_descriptive_type, 0x2301)
>  DW_AT (DW_AT_GNAT_descriptive_type, 0x2302)
>  /* UPC extension.  */
>  DW_AT (DW_AT_upc_threads_scaled, 0x3210)
> -- 
> 1.7.0.4
> 


-- 
Joel


Re: [Patch, Fortran, OOP, Regression] PR 60234: ICE in generate_finalization_wrapper at fortran/class.c:1883

2014-02-21 Thread Janus Weil
2014-02-21 8:25 GMT+01:00 Tobias Burnus :
> Hi Janus,
>
> Janus Weil wrote:
>>
>> What the patch does is to defer the building of the vtabs to a later
>> stage. Previously this was done only for some rare cases, now we do it
>> basically for all vtabs. This is necessary with finalization, since
>> building the vtab also implies building the finalization wrapper, for
>> which it is necessary that the finalizers have been resolved.
>>
>> Anyway, the patch regtests cleanly on x86_64-unknown-linux-gnu. Ok for
>> trunk?
>
>
> Looks good to me.
>
> Does
>
>  comp_is_finalizable (gfc_component *comp)
>  {
> -  if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
> +  if (comp->attr.proc_pointer)
> +return false;
> +  else if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
>
> fix another PR, or did you just spot it when looking at the code? It is
> certainly simple, correct and should go in.

This became necessary after the vtab changes (although I don't
remember which test case triggered it). comp_is_finalizable is called
(more or less directly) from generate_finalization_wrapper. Since the
latter used to be called too early, the problem with procedure pointer
components (PPCs) apparently was not triggered previously.
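The ordering constraint described here (a vtab, and with it the finalization wrapper, can only be built once the finalizers have been resolved) amounts to deferring that work to a later stage. A minimal sketch of the general deferral pattern, using hypothetical types and function names rather than gfortran's own:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for a derived type; not gfortran's gfc_symbol.  */
struct derived_type
{
  bool finalizers_resolved;
  bool vtab_built;
};

static struct derived_type *pending[MAX_PENDING];
static int npending;

/* Building the vtab also builds the finalization wrapper, which
   needs the finalizers to have been resolved already.  */
static void
build_vtab (struct derived_type *t)
{
  assert (t->finalizers_resolved);
  t->vtab_built = true;
}

/* Early phase: instead of building the vtab right away, queue it.  */
static void
request_vtab (struct derived_type *t)
{
  assert (npending < MAX_PENDING);
  pending[npending++] = t;
}

/* Later phase, once resolution has been done: build everything queued.  */
static void
build_pending_vtabs (void)
{
  for (int i = 0; i < npending; i++)
    build_vtab (pending[i]);
  npending = 0;
}
```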

I have committed the patch as r207986. Thanks for the review!

Cheers,
Janus



>> 2014-02-20  Janus Weil  
>>
>>  PR fortran/60234
>>  * gfortran.h (gfc_build_class_symbol): Removed argument.
>>  * class.c (gfc_add_component_ref): Fix up missing vtype if necessary.
>>  (gfc_build_class_symbol): Remove argument 'delayed_vtab'. vtab is always
>>  delayed now, except for unlimited polymorphics.
>>  (comp_is_finalizable): Procedure pointer components are not finalizable.
>>  * decl.c (build_sym, build_struct, attr_decl1): Removed argument of
>>  'gfc_build_class_symbol'.
>>  * match.c (copy_ts_from_selector_to_associate, select_type_set_tmp):
>>  Ditto.
>>  * symbol.c (gfc_set_default_type): Ditto.
>>
>>
>> 2014-02-20  Janus Weil  
>>
>>  PR fortran/60234
>>  * gfortran.dg/finalize_23.f90: New.


Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, on larger functions we can end up with
> over 3 million of compute_control_dep_chain nested calls from
> a single compute_control_dep_chain call, on that testcase all that
> effort just to get zero or at most one (useless) control dep path.
> The problem is that the function is really unbound, even with the
> 6 element path length limitation (recursion depth) and the limit of 8
> find_pdom calls - everything still iterates on all the successor edges at
> each level.  And, the function is often called on the same basic block
> again and again, even at a particular depth level (e.g. over 20 times
> same bb same depth level).  But the preceding edge list is slightly
> different in each case and in theory it could give different answers.
> 
> Fixed by bounding the total number of nested calls.
> 
> Additionally, I've made a couple of cleanups, heap allocating 8 field array
> instead of using an automatic array makes no sense, the chain length is at
> most 6 and thus we can use a stack vector, etc.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.


Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk

2014-02-21 Thread Thomas Schwinge
Hi!

On Fri, 15 Nov 2013 14:44:45 -0700, Aldy Hernandez  wrote:
> Attached is the final version of the patch I have committed to trunk.

> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -1118,6 +1118,8 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
>   case GF_OMP_FOR_KIND_SIMD:
> kind = " simd";
> break;
> + case GF_OMP_FOR_KIND_CILKSIMD:
> +   kind = " cilksimd";
>   case GF_OMP_FOR_KIND_DISTRIBUTE:
> kind = " distribute";
> break;
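The effect of the missing `break` is easy to reproduce in isolation: control falls through from the CILKSIMD case into the DISTRIBUTE case, so the raw dump would print ` distribute` instead of ` cilksimd`. The enum and functions below are simplified, hypothetical stand-ins for the code in `dump_gimple_omp_for`.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for a few GF_OMP_FOR_KIND_* values.  */
enum for_kind { KIND_FOR, KIND_SIMD, KIND_CILKSIMD, KIND_DISTRIBUTE };

/* As in the quoted hunk: the CILKSIMD case has no break.  */
static const char *
kind_name_buggy (enum for_kind k)
{
  const char *kind = "";
  switch (k)
    {
    case KIND_SIMD:
      kind = " simd";
      break;
    case KIND_CILKSIMD:
      kind = " cilksimd";
      /* No break: execution continues into the next case.  */
    case KIND_DISTRIBUTE:
      kind = " distribute";
      break;
    default:
      break;
    }
  return kind;
}

/* With the break added by the fix.  */
static const char *
kind_name_fixed (enum for_kind k)
{
  const char *kind = "";
  switch (k)
    {
    case KIND_SIMD:
      kind = " simd";
      break;
    case KIND_CILKSIMD:
      kind = " cilksimd";
      break;
    case KIND_DISTRIBUTE:
      kind = " distribute";
      break;
    default:
      break;
    }
  return kind;
}

/* Helper for checking results: nonzero if GOT spells WANT.  */
static int
kind_is (const char *got, const char *want)
{
  assert (got != NULL && want != NULL);
  return strcmp (got, want) == 0;
}
```

Compiling the buggy variant with `-Wextra` (which enables `-Wimplicit-fallthrough` in newer GCC releases) should flag this kind of fall-through.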

Fixed (untested, but obvious) in r207987:

commit b12563e00026b48b817fd3532fc3df2db2a0f460
Author: tschwinge 
Date:   Fri Feb 21 09:18:15 2014 +

Correct TDF_RAW pretty-printing of GIMPLE_OMP_FOR's GF_OMP_FOR_KIND_CILKSIMD.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
<case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@207987 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog gcc/ChangeLog
index 67299af..cc9031b 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-21  Thomas Schwinge  
+
+   * gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
+   <case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.
+
 2014-02-21  Nick Clifton  
 
* config/stormy16/stormy16.md (pushdqi1): Add mode to post_inc.
diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c
index 2d1e1c7..741cd92 100644
--- gcc/gimple-pretty-print.c
+++ gcc/gimple-pretty-print.c
@@ -1121,6 +1121,7 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
  break;
case GF_OMP_FOR_KIND_CILKSIMD:
  kind = " cilksimd";
+ break;
case GF_OMP_FOR_KIND_DISTRIBUTE:
  kind = " distribute";
  break;


Regards,
 Thomas




[PATCH][1/n] Improve PR60291

2014-02-21 Thread Richard Biener

This improves the compile time of PR60291 at -O1 from 210s to 85s,
getting remove_unused_locals out of the profile.  There, walking the
DECL_INITIAL of globals is quadratic when a global is referred to from
multiple functions.  We had the same issue with
add_referenced_vars when that still existed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
and branch?

I've verified that I can still properly debug

int **
foo (void)
{
  static int a = 0;
  static int *b = &a;
  static int **c = &b;
  return c;
}
int main()
{
  return **foo();
}

(step into foo and print a, b and c).  Note that even with 4.8
right now

int **
foo (void)
{
  int **q;
  {
    static int a = 0;
    static int *b = &a;
    static int **c = &b;
    q = c;
  }
  return q;
}
int main()
{
  return **foo();
}

is broken (with -O1 -fno-inline; with inlining, both cases are
broken).  But none of that regresses with the following patch, and
if we fix it we should fix it another way, not by walking
global initializers.

Thanks,
Richard.

2014-02-21  Richard Biener  

PR middle-end/60291
* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
DECL_INITIAL.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c (revision 207960)
--- gcc/tree-ssa-live.c (working copy)
*** mark_all_vars_used_1 (tree *tp, int *wal
*** 432,443 
/* Only need to mark VAR_DECLS; parameters and return results are not
   eliminated as unused.  */
if (TREE_CODE (t) == VAR_DECL)
! {
!   /* When a global var becomes used for the first time also walk its
!  initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t))
!   mark_all_vars_used (&DECL_INITIAL (t));
! }
/* remove_unused_scope_block_p requires information about labels
   which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
else if (TREE_CODE (t) == LABEL_DECL)
--- 432,438 
/* Only need to mark VAR_DECLS; parameters and return results are not
   eliminated as unused.  */
if (TREE_CODE (t) == VAR_DECL)
! set_is_used (t);
/* remove_unused_scope_block_p requires information about labels
   which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
else if (TREE_CODE (t) == LABEL_DECL)


[PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener

This fixes the slowness of RTL expansion in PR60291 which is caused
by excessive collisions in mem-attr sharing.  The issue here is
that sharing attempts happen across functions and we have a _lot_
of functions in this testcase referencing the same lexically
equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
means those get the same hash value.  But they don't compare
equal because an SSA name _5 from function A is of course not equal
to one from function B.

The following fixes that by not doing mem-attr sharing across functions
by clearing the mem-attrs hashtable in rest_of_clean_state.
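A toy hash-consing table shows why these collisions are so expensive and why emptying the table per function helps. In this sketch (hypothetical names, and a deliberately simplified hash that, like hashing an SSA name by its version, ignores which function the name belongs to), lexically identical references from N different functions all start probing at the same slot and never compare equal, so inserting them costs 0 + 1 + ... + (N - 1) probes in total; clearing the table between functions brings that back to zero while sharing within a function still works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mem-attrs: a base SSA name (by version), an
   offset, and the function that SSA name belongs to.  */
struct mem_attrs
{
  int func;
  int ssa_version;
  long offset;
};

#define TABLE_SIZE 64

static struct mem_attrs table[TABLE_SIZE];
static bool slot_used[TABLE_SIZE];
static long probes;   /* occupied slots inspected; a proxy for cost */

/* Hashes only the SSA version and offset, not the function, so
   lexically identical references from different functions collide.  */
static unsigned
hash_attrs (const struct mem_attrs *a)
{
  return (unsigned) a->ssa_version * 31u + (unsigned) a->offset;
}

static bool
attrs_equal (const struct mem_attrs *x, const struct mem_attrs *y)
{
  return x->func == y->func && x->ssa_version == y->ssa_version
         && x->offset == y->offset;
}

/* Return the shared copy of *A, inserting it if needed
   (open addressing with linear probing).  */
static const struct mem_attrs *
get_mem_attrs (const struct mem_attrs *a)
{
  unsigned i = hash_attrs (a) % TABLE_SIZE;
  for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE)
    {
      if (!slot_used[i])
        {
          table[i] = *a;
          slot_used[i] = true;
          return &table[i];
        }
      probes++;
      if (attrs_equal (&table[i], a))
        return &table[i];
    }
  assert (!"mem-attrs table full");
  return NULL;
}

/* Analogous in spirit to the patch's clear_mem_attrs_htab.  */
static void
clear_mem_attrs_table (void)
{
  for (size_t i = 0; i < TABLE_SIZE; i++)
    slot_used[i] = false;
}
```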

Another fix may be to do what the comment in iterative_hash_expr
says for SSA names:

case SSA_NAME:
  /* We can just compare by pointer.  */
  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);

(probably blame me for changing that to hashing the SSA version).  But
I'm not sure that doesn't uncover issues with various hashtables and
walking them, generating code dependent on the order.  It's IMHO just not
expected that you compare function-local expressions from different
functions.

The other thing would be to discard mem-attr sharing altogether,
but that doesn't seem appropriate at this stage (but it would
also simplify quite some code).  With only one function in RTL
at a time that shouldn't be too bad (see several suggestions
along that line, even with statistics).

Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
trunk and 4.8 branch?

Thanks,
Richard.

2014-02-21  Richard Biener  

PR middle-end/60291
* rtl.h (clear_mem_attrs_htab): Declare.
* emit-rtl.c (clear_mem_attrs_htab): New function.
* final.c (rest_of_clean_state): Call clear_mem_attrs_htab
to avoid sharing mem-attrs between functions.

Index: gcc/rtl.h
===
*** gcc/rtl.h   (revision 207960)
--- gcc/rtl.h   (working copy)
*** extern int in_sequence_p (void);
*** 2546,2551 
--- 2546,2552 
  extern void init_emit (void);
  extern void init_emit_regs (void);
  extern void init_emit_once (void);
+ extern void clear_mem_attrs_htab (void);
  extern void push_topmost_sequence (void);
  extern void pop_topmost_sequence (void);
  extern void set_new_first_and_last_insn (rtx, rtx);
Index: gcc/emit-rtl.c
===
*** gcc/emit-rtl.c  (revision 207960)
--- gcc/emit-rtl.c  (working copy)
*** init_emit_once (void)
*** 5913,5918 
--- 5913,5926 
simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
  }
+ 
+ /* Clear the mem-attrs sharing hash table.  */
+ 
+ void
+ clear_mem_attrs_htab (void)
+ {
+   htab_empty (mem_attrs_htab);
+ }
  
  /* Produce exact duplicate of insn INSN after AFTER.
 Care updating of libcall regions if present.  */
Index: gcc/final.c
===
*** gcc/final.c (revision 207960)
--- gcc/final.c (working copy)
*** rest_of_clean_state (void)
*** 4678,4683 
--- 4678,4686 
  
init_recog_no_volatile ();
  
+   /* Reset mem-attrs sharing.  */
+   clear_mem_attrs_htab ();
+ 
/* We're done with this function.  Free up memory if we can.  */
free_after_parsing (cfun);
free_after_compilation (cfun);


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> 
> This fixes the slowness of RTL expansion in PR60291 which is caused
> by excessive collisions in mem-attr sharing.  The issue here is
> that sharing attempts happen across functions and we have a _lot_
> of functions in this testcase referencing the same lexically
> equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> means those get the same hash value.  But they don't compare
> equal because an SSA name _5 from function A is of course not equal
> to one from function B.
> 
> The following fixes that by not doing mem-attr sharing across functions
> by clearing the mem-attrs hashtable in rest_of_clean_state.
> 
> Another fix may be to do what the comment in iterative_hash_expr
> says for SSA names:
> 
> case SSA_NAME:
>   /* We can just compare by pointer.  */
>   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> 
> (probably blame me for changing that to hashing the SSA version).

It was lxo.

> But I'm not sure that doesn't uncover issues with various hashtables and
> walking them, generating code dependent on the order.  It's IMHO just not
> expected that you compare function-local expressions from different
> functions.

Same speedup result from

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
   }
 case SSA_NAME:
   /* We can just compare by pointer.  */
-  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
 case PLACEHOLDER_EXPR:
   /* The node itself doesn't matter.  */
   return val;

and from

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
   }
 case SSA_NAME:
   /* We can just compare by pointer.  */
-  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+  return iterative_hash_host_wide_int
+ (DECL_UID (cfun->decl),
+  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
 case PLACEHOLDER_EXPR:
   /* The node itself doesn't matter.  */
   return val;

better than hashing pointers, but requiring cfun != NULL in this
function isn't good either.

At this point I'm more comfortable with clearing the hashtable
than with changing iterative_hash_expr in any way.  It's also
along the way to get rid of the hash completely.

Oh, btw, the speedup is going from

 expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc

to

 expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

> The other thing would be to discard mem-attr sharing altogether,
> but that doesn't seem appropriate at this stage (but it would
> also simplify quite some code).  With only one function in RTL
> at a time that shouldn't be too bad (see several suggestions
> along that line, even with statistics).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
> trunk and 4.8 branch?
> 
> Thanks,
> Richard.
> 

[PATCH] Fix PR60276

2014-02-21 Thread Richard Biener

This attempts to fix PR60276: the vectorizer's dependence analysis
is run too early, and the vectorizer later invalidates assumptions
made there.  The specific issue arises when the vectorizer needs to
effectively unroll the loop: because it performs all vectorized loads
first and all vectorized stores last, the assumption that it can ignore
known dependences with negative distance no longer holds if that
distance is too short.

The following is the shortest (and possibly backportable) change
I could come up with: record the negative distance during
dependence analysis and re-validate it when decisions about
stmt copying and group sizes are fixed.
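The record-then-revalidate scheme can be modelled in a few lines of C. This is a simplified sketch, not GCC code: `toy_stmt_info`, `toy_record_neg_dist` and `toy_neg_dist_assumption_holds` are hypothetical names, and the validity test mirrors only the `ncopies > 1` check added to vectorizable_load (the patch's second check uses `!PURE_SLP_STMT (stmt_info)` instead).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the new min_neg_dist field; not GCC's
   _stmt_vec_info.  Zero means "no negative distance recorded".  */
struct toy_stmt_info
{
  unsigned int min_neg_dist;
};

/* Dependence analysis: remember the shortest negative (read-after-write)
   distance this load participates in.  */
static void
toy_record_neg_dist (struct toy_stmt_info *info, unsigned int dist)
{
  if (info->min_neg_dist == 0 || info->min_neg_dist > dist)
    info->min_neg_dist = dist;
}

/* Later, once VF and the number of copies are fixed: if the body is
   effectively unrolled and VF exceeds the recorded distance, the
   "ignore this dependence" assumption no longer holds.  */
static bool
toy_neg_dist_assumption_holds (const struct toy_stmt_info *info,
                               unsigned int vf, unsigned int ncopies)
{
  return !(ncopies > 1
           && info->min_neg_dist != 0
           && vf > info->min_neg_dist);
}
```

Keeping only the minimum is enough because the shortest distance is the one that a larger vectorization factor breaks first.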

Bootstrapped and tested on x86_64-unknown-linux-gnu - does this look
ok?

Thanks,
Richard.

2014-02-21  Richard Biener  

PR tree-optimization/60276
* tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
(STMT_VINFO_MIN_NEG_DIST): New macro.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
STMT_VINFO_MIN_NEG_DIST.
* tree-vect-stmts.c (vectorizable_load): Verify if assumptions
made for negative dependence distances still hold.

* gcc.dg/vect/pr60276.c: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 207938)
--- gcc/tree-vectorizer.h   (working copy)
*** typedef struct _stmt_vec_info {
*** 622,627 
--- 622,631 
   is 1.  */
unsigned int gap;
  
+   /* The minimum negative dependence distance this stmt participates in
+  or zero if none.  */
+   unsigned int min_neg_dist;
+ 
/* Not all stmts in the loop need to be vectorized. e.g, the increment
   of the loop induction variable and computation of array indexes. relevant
   indicates whether the stmt needs to be vectorized.  */
*** typedef struct _stmt_vec_info {
*** 677,682 
--- 681,687 
  #define STMT_VINFO_GROUP_SAME_DR_STMT(S)   (S)->same_dr_stmt
  #define STMT_VINFO_GROUPED_ACCESS(S)  ((S)->first_element != NULL && 
(S)->data_ref_info)
  #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
+ #define STMT_VINFO_MIN_NEG_DIST(S)(S)->min_neg_dist
  
  #define GROUP_FIRST_ELEMENT(S)  (S)->first_element
  #define GROUP_NEXT_ELEMENT(S)   (S)->next_element
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 207938)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_analyze_data_ref_dependence (struct
*** 403,408 
--- 425,437 
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "dependence distance negative.\n");
+ /* Record a negative dependence distance to later limit the
+amount of stmt copying / unrolling we can perform.
+Only need to handle read-after-write dependence.  */
+ if (DR_IS_READ (drb)
+ && (STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) == 0
+ || STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) > dist))
+   STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) = dist;
  continue;
}
  
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 207938)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_load (gimple stmt, gimple_s
*** 5629,5634 
--- 5629,5648 
return false;
  }
  
+   /* Invalidate assumptions made by dependence analysis when vectorization
+  on the unrolled body effectively re-orders stmts.  */
+   if (ncopies > 1
+   && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+   && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+ {
+   if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot perform implicit CSE when unrolling "
+"with negative dependence distance\n");
+   return false;
+ }
+ 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
*** vectorizable_load (gimple stmt, gimple_s
*** 5686,5691 
--- 5700,5719 
  else if (!vect_grouped_load_supported (vectype, group_size))
return false;
}
+ 
+   /* Invalidate assumptions made by dependence analysis when vectorization
+on the unrolled body effectively re-orders stmts.  */
+   if (!PURE_SLP_STMT (stmt_info)
+ && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+ && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ 

Re: [PATCH] Fix latent bug in replace_uses_by

2014-02-21 Thread Bin.Cheng
On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener  wrote:
>
> The following fixes an ICE I got when building libjava with a local
> patch.  This causes us to substitute &MEM[&a, 5] into MEM[_3, 0],
> yielding MEM[&MEM[&a, 5], 0], and then to ask stmt_ends_bb_p, which doesn't
> expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
> calling fold_stmt on it later, but the fix is of course to move
> the marking of altered BBs before doing the actual substitution
> (only then are we sure to catch all previous bb-ending stmts).
>
> I also noticed we don't verify MEM_REFs on LHSs.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied
> to trunk and branch (it's a regression uncovered by the fix for PR60115).
>
> Richard.
>
> 2014-02-20  Richard Biener  
>
> * tree-cfg.c (replace_uses_by): Mark altered BBs before
> doing the substitution.
> (verify_gimple_assign_single): Also verify bare MEM_REFs
> on the lhs.
>
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c  (revision 207936)
> +++ gcc/tree-cfg.c  (working copy)
> @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)
>
>FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
>  {
> +  /* Mark the block if we change the last stmt in it.  */
> +  if (cfgcleanup_altered_bbs
> + && stmt_ends_bb_p (stmt))
> +   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> +
>FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
>  {
>   replace_exp (use, val);
> @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
>   gimple orig_stmt = stmt;
>   size_t i;
>
> - /* Mark the block if we changed the last stmt in it.  */
> - if (cfgcleanup_altered_bbs
> - && stmt_ends_bb_p (stmt))
> -   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> -
Hi Richard,
I also noticed this with a local patch, but would it be OK to just move the
above code after fold_stmt?  In other words, do PHI nodes matter (according
to the comment before cfgcleanup_altered_bbs)?

Thanks in advance.


>   /* FIXME.  It shouldn't be required to keep TREE_CONSTANT
>  on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
>  only change sth from non-invariant to invariant, and only
> @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
>return true;
>  }
>
> -  if (handled_component_p (lhs))
> +  if (handled_component_p (lhs)
> +  || TREE_CODE (lhs) == MEM_REF
> +  || TREE_CODE (lhs) == TARGET_MEM_REF)
>  res |= verify_types_in_gimple_reference (lhs, true);
>
>/* Special codes we cannot handle via their class.  */



-- 
Best Regards.


Re: [PATCH] Fix PR60276

2014-02-21 Thread Jakub Jelinek
On Fri, Feb 21, 2014 at 11:32:41AM +0100, Richard Biener wrote:
> 
> This attempts to fix PR60276 - the fact that the vectorizer dependence
> analysis is run too early and that it invalidates assumptions it
> makes there later.  The specific issue in question arises when
> the vectorizer needs to effectively unroll the loop and by
> performing all vectorized loads first and vectorized stores last
> the idea that it can ignore known dependences with negative
> distance doesn't work out if that distance is too short.
> 
> The following is the shortest (and eventually backportable) change
> I could come up with - record the negative distance during
> dependence analysis and re-validate it when decisions about
> stmt copying and group sizes are fixed.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu - does this look
> ok?

Ok, thanks.

> 2014-02-21  Richard Biener  
> 
>   PR tree-optimization/60276
>   * tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
>   (STMT_VINFO_MIN_NEG_DIST): New macro.
>   * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
>   STMT_VINFO_MIN_NEG_DIST.
>   * tree-vect-stmts.c (vectorizable_load): Verify if assumptions
>   made for negative dependence distances still hold.
> 
>   * gcc.dg/vect/pr60276.c: New testcase.

Jakub


Re: [PATCH][1/n] Improve PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> 
> This improves compile-time of PR60291 at -O1 from 210s to 85s,
> getting remove unused locals out of the profile.  There, walking
> DECL_INITIAL of globals is quadratic when a global is referred to from
> multiple functions.  We've had the same issue with
> add_referenced_vars when that still existed.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
> and branch?
> 
> I've verified that I can still properly debug
> 
> int **
> foo (void)
> {
>   static int a = 0;
>   static int *b = &a;
>   static int **c = &b;
>   return c;
> }
> int main()
> {
>   return **foo();
> }
> 
> (step into foo and print a, b and c).  Note that even with 4.8
> right now
> 
> int **
> foo (void)
> {
>   int **q;
> {
>   static int a = 0;
>   static int *b = &a;
>   static int **c = &b;
>   q = c;
> }
>   return q;
> }
> int main()
> {
>   return **foo();
> }
> 
> is broken (with -O1 -fno-inline, with inlining both cases are
> broken).  But that all doesn't regress with the following and
> if we fix it then we should fix it another way, not by walking
> global initializers.

So I checked whether all of this is a regression, and this particular
piece is a regression from 4.7, where we only walked global
initializers for VAR_DECLs with DECL_CONTEXT == current_function_decl.

So at this point it's easiest and least intrusive to reinstate
this restriction, which was removed by r187800 (that was me; the
change looks accidental).

Re-bootstrapping / testing on x86_64-unknown-linux-gnu and will
commit afterwards to trunk and to the branch a bit later.

Thanks,
Richard.

2014-02-21  Richard Biener  

PR middle-end/60291
* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
DECL_INITIAL for globals not in the current function context.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c (revision 207960)
--- gcc/tree-ssa-live.c (working copy)
*** mark_all_vars_used_1 (tree *tp, int *wal
*** 435,441 
  {
/* When a global var becomes used for the first time also walk its
   initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t))
mark_all_vars_used (&DECL_INITIAL (t));
  }
/* remove_unused_scope_block_p requires information about labels
--- 435,442 
  {
/* When a global var becomes used for the first time also walk its
   initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t)
! && DECL_CONTEXT (t) == current_function_decl)
mark_all_vars_used (&DECL_INITIAL (t));
  }
/* remove_unused_scope_block_p requires information about labels


Re: [PATCH] Fix latent bug in replace_uses_by

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Bin.Cheng wrote:

> On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener  wrote:
> >
> > The following fixes an ICE I got when building libjava with a local
> > patch.  This causes us to substitute &MEM[&a, 5] into MEM[_3, 0],
> > yielding MEM[&MEM[&a, 5], 0], and then to ask stmt_ends_bb_p, which doesn't
> > expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
> > calling fold_stmt on it later, but the fix is of course to move
> > the marking of altered BBs before doing the actual substitution
> > (only then are we sure to catch all previous bb-ending stmts).
> >
> > I also noticed we don't verify MEM_REFs on LHSs.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied
> > to trunk and branch (it's a regression uncovered by the fix for PR60115).
> >
> > Richard.
> >
> > 2014-02-20  Richard Biener  
> >
> > * tree-cfg.c (replace_uses_by): Mark altered BBs before
> > doing the substitution.
> > (verify_gimple_assign_single): Also verify bare MEM_REFs
> > on the lhs.
> >
> > Index: gcc/tree-cfg.c
> > ===
> > --- gcc/tree-cfg.c  (revision 207936)
> > +++ gcc/tree-cfg.c  (working copy)
> > @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)
> >
> >FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
> >  {
> > +  /* Mark the block if we change the last stmt in it.  */
> > +  if (cfgcleanup_altered_bbs
> > + && stmt_ends_bb_p (stmt))
> > +   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> > +
> >FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> >  {
> >   replace_exp (use, val);
> > @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
> >   gimple orig_stmt = stmt;
> >   size_t i;
> >
> > - /* Mark the block if we changed the last stmt in it.  */
> > - if (cfgcleanup_altered_bbs
> > - && stmt_ends_bb_p (stmt))
> > -   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb 
> > (stmt)->index);
> > -
> Hi Richard,
> I also noticed this with local patch, but is it OK just to move above
> code after fold_stmt? In other words, does phi node matter (according
> to comments before cfgcleanup_altered_bbs)?

PHI nodes don't matter, and they don't trigger stmt_ends_bb_p anyway.
It's better to do it before the replacement because a stmt that
was stmt_ends_bb_p before the replacement might not be after
it (and thus we'd miss a cfgcleanup opportunity to merge
two blocks).
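The ordering argument can be seen in a toy model. This is a hypothetical sketch, not GIMPLE: `toy_stmt`, `toy_stmt_ends_bb_p`, the two replace helpers and the bitmask standing in for cfgcleanup_altered_bbs are invented for illustration, and the rule that operand 0 makes a stmt block-ending is arbitrary.

```c
#include <stdbool.h>

/* Toy statement, hypothetical: whether it ends its basic block depends on
   its operand (operand 0 ends the block, anything else does not).  */
struct toy_stmt
{
  int bb;       /* index of the containing block */
  int operand;  /* the use being replaced */
};

static bool
toy_stmt_ends_bb_p (const struct toy_stmt *s)
{
  return s->operand == 0;
}

/* Order used by the patch: test the stmt before substituting, so a
   block whose last stmt stops being block-ending is still recorded.  */
static void
toy_replace_use (struct toy_stmt *s, int val, unsigned int *altered_bbs)
{
  if (toy_stmt_ends_bb_p (s))
    *altered_bbs |= 1u << s->bb;
  s->operand = val;
}

/* Old order, for comparison: the test sees the already-modified stmt.  */
static void
toy_replace_use_late (struct toy_stmt *s, int val, unsigned int *altered_bbs)
{
  s->operand = val;
  if (toy_stmt_ends_bb_p (s))
    *altered_bbs |= 1u << s->bb;
}
```

With the late check, block 3 in a call like `toy_replace_use_late (&s, 7, &mask)` on `{ 3, 0 }` is never queued, which is the missed merge opportunity described above.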

Richard.

> Thanks in advance.
> 
> 
> >   /* FIXME.  It shouldn't be required to keep TREE_CONSTANT
> >  on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
> >  only change sth from non-invariant to invariant, and only
> > @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
> >return true;
> >  }
> >
> > -  if (handled_component_p (lhs))
> > +  if (handled_component_p (lhs)
> > +  || TREE_CODE (lhs) == MEM_REF
> > +  || TREE_CODE (lhs) == TARGET_MEM_REF)
> >  res |= verify_types_in_gimple_reference (lhs, true);
> >
> >/* Special codes we cannot handle via their class.  */
> 
> 
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


[C++ Patch] PR 60253

2014-02-21 Thread Paolo Carlini

Hi,

unless we have reason to believe that diagnostic quality could
regress in some circumstances, we can easily resolve this ICE-on-invalid
regression by always returning error_mark_node after the error (thus outside
SFINAE too).


Tested x86_64-linux.

Thanks,
Paolo.

/cp
2014-02-21  Paolo Carlini  

PR c++/60253
* call.c (convert_arg_to_ellipsis): Return error_mark_node after
error_at.

/testsuite
2014-02-21  Paolo Carlini  

PR c++/60253
* g++.dg/overload/ellipsis2.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 207987)
+++ cp/call.c   (working copy)
@@ -6406,8 +6406,7 @@ convert_arg_to_ellipsis (tree arg, tsubst_flags_t
  if (complain & tf_error)
error_at (loc, "cannot pass objects of non-trivially-copyable "
  "type %q#T through %<...%>", arg_type);
- else
-   return error_mark_node;
+ return error_mark_node;
}
 }
 
Index: testsuite/g++.dg/overload/ellipsis2.C
===
--- testsuite/g++.dg/overload/ellipsis2.C   (revision 0)
+++ testsuite/g++.dg/overload/ellipsis2.C   (working copy)
@@ -0,0 +1,13 @@
+// PR c++/60253
+
+struct A
+{
+  ~A();
+};
+
+struct B
+{
+  B(...);
+};
+
+B b(0, A());  // { dg-error "cannot pass" }


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
> 
> > 
> > This fixes the slowness of RTL expansion in PR60291 which is caused
> > by excessive collisions in mem-attr sharing.  The issue here is
> > that sharing attempts happens across functions and we have a _lot_
> > of functions in this testcase referencing the same lexically
> > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> > means those get the same hash value.  But they don't compare
> > equal because an SSA name _5 from function A is of course not equal
> > to one from function B.
> > 
> > The following fixes that by not doing mem-attr sharing across functions
> > by clearing the mem-attrs hashtable in rest_of_clean_state.
> > 
> > Another fix may be to do what the comment in iterative_hash_expr
> > says for SSA names:
> > 
> > case SSA_NAME:
> >   /* We can just compare by pointer.  */
> >   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > 
> > (probably blame me for changing that to hashing the SSA version).
> 
> It was lxo.
> 
> > But I'm not sure that doesn't uncover issues with various hashtables and
> > walking them, generating code dependent on the order.  It's IMHO just not
> > expected that you compare function-local expressions from different
> > functions.
> 
> Same speedup result from
> 
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  (revision 207960)
> +++ gcc/tree.c  (working copy)
> @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
>}
>  case SSA_NAME:
>/* We can just compare by pointer.  */
> -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
>  case PLACEHOLDER_EXPR:
>/* The node itself doesn't matter.  */
>return val;
> 
> and from
> 
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  (revision 207960)
> +++ gcc/tree.c  (working copy)
> @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
>}
>  case SSA_NAME:
>/* We can just compare by pointer.  */
> -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> +  return iterative_hash_host_wide_int
> + (DECL_UID (cfun->decl),
> +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
>  case PLACEHOLDER_EXPR:
>/* The node itself doesn't matter.  */
>return val;
> 
> better than hashing pointers but requiring cfun != NULL in this
> function isn't good either.
> 
> At this point I'm more comfortable with clearing the hashtable
> than with changing iterative_hash_expr in any way.  It's also
> along the way to get rid of the hash completely.
> 
> Oh, btw, the speedup is going from
> 
>  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
> wall  293891 kB (15%) ggc
> 
> to
> 
>  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
> wall  262544 kB (13%) ggc
> 
> at -O0 (less dramatic slowness for -On).
> 
> > The other thing would be to discard mem-attr sharing altogether,
> > but that doesn't seem appropriate at this stage (but it would
> > also simplify quite some code).  With only one function in RTL
> > at a time that shouldn't be too bad (see several suggestions
> > along that line, even with statistics).

Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

Richard.
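The collision pattern described in this thread (equal hashes, unequal keys across functions) can be reproduced with a toy chained hash table. This is a hypothetical model, not emit-rtl.c: the key, hash and table are invented, the hash deliberately ignores the owning function as hashing SSA_NAME_VERSION does, and clearing the table per function stands in for clear_mem_attrs_htab.

```c
#include <stdbool.h>

/* Toy key for "MEM[_VERSION + OFFSET]" seen while expanding function FN.  */
struct toy_key
{
  int fn;
  int ssa_version;
  long offset;
};

#define TOY_NBUCKETS 64
#define TOY_MAXENT 1024

struct toy_table
{
  struct toy_key ent[TOY_MAXENT];
  int next[TOY_MAXENT];      /* chain links, -1 terminates a chain */
  int head[TOY_NBUCKETS];    /* first entry of each bucket, -1 if empty */
  int n;                     /* entries in use */
  long compares;             /* equality tests performed */
};

static void
toy_clear (struct toy_table *t)
{
  for (int b = 0; b < TOY_NBUCKETS; b++)
    t->head[b] = -1;
  t->n = 0;
}

/* Like hashing only SSA_NAME_VERSION: the owning function is ignored.  */
static unsigned int
toy_hash (const struct toy_key *k)
{
  return (unsigned int) (k->ssa_version * 31L + k->offset) % TOY_NBUCKETS;
}

/* Equality does distinguish functions: _5 in f is not _5 in g.  */
static bool
toy_equal (const struct toy_key *a, const struct toy_key *b)
{
  return a->fn == b->fn && a->ssa_version == b->ssa_version
         && a->offset == b->offset;
}

/* Return the index of an equal entry, inserting K if there is none
   (or -1 if the toy table is full).  */
static int
toy_lookup_or_insert (struct toy_table *t, const struct toy_key *k)
{
  unsigned int b = toy_hash (k);
  for (int i = t->head[b]; i != -1; i = t->next[i])
    {
      t->compares++;
      if (toy_equal (&t->ent[i], k))
        return i;
    }
  if (t->n == TOY_MAXENT)
    return -1;
  int idx = t->n++;
  t->ent[idx] = *k;
  t->next[idx] = t->head[b];
  t->head[b] = idx;
  return idx;
}

/* Expand NFUNCS functions that each reference MEM[_5 + -64] twice and
   return the total number of equality tests.  */
static long
toy_run (int nfuncs, bool clear_per_function)
{
  static struct toy_table t;
  toy_clear (&t);
  t.compares = 0;
  for (int f = 0; f < nfuncs; f++)
    {
      struct toy_key k = { f, 5, -64 };
      toy_lookup_or_insert (&t, &k);   /* first use: new entry */
      toy_lookup_or_insert (&t, &k);   /* second use: shared */
      if (clear_per_function)
        toy_clear (&t);
    }
  return t.compares;
}
```

Here `toy_run (100, false)` performs 5050 equality tests versus 100 with per-function clearing: without clearing, each new function walks a chain holding one non-matching entry per earlier function, so the work grows quadratically, while sharing within a function still works in both modes.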


[patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-21 Thread Joey Ye
Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
this ICE:

1. thumb1 estimates whether far_jump is used based on the function's insn size.
2. During reload, after the stack layout is finalized, it runs reload_as_needed.
This can, however, increase insn size and so change the far_jump estimate,
which in turn requires saving LR and changing the stack layout again.  Since
there is no chance to change it at that point, GCC crashes.

Solution:
Do not change the far_jump estimate if reload_in_progress or
reload_completed is true.

According to Vlad, LRA likely does not need a similar fix:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html
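The fix amounts to freezing a layout-affecting decision once the frame is fixed. A toy model of that idea follows; it is a hypothetical sketch, not arm.c: the phase enum stands in for reload_in_progress/reload_completed, and the latched `far_jump_used` flag and the byte threshold are assumptions of the sketch (the real code scans for insns with the far jump attribute).

```c
#include <stdbool.h>

/* Hypothetical phases standing in for reload_in_progress/reload_completed.  */
enum toy_phase { TOY_BEFORE_RELOAD, TOY_IN_RELOAD, TOY_AFTER_RELOAD };

struct toy_function
{
  bool far_jump_used;   /* latched: LR must be saved, frame sized for it */
  int insn_bytes;       /* current estimate of the function's size */
};

/* Hypothetical size threshold, used only to drive the toy estimate.  */
#define TOY_FAR_JUMP_THRESHOLD 2048

static bool
toy_far_jump_used_p (struct toy_function *fn, enum toy_phase phase)
{
  if (fn->far_jump_used)
    return true;
  /* The frame layout is fixed from reload onwards, so a late growth in
     insn size must not flip the answer.  */
  if (phase != TOY_BEFORE_RELOAD)
    return false;
  if (fn->insn_bytes >= TOY_FAR_JUMP_THRESHOLD)
    fn->far_jump_used = true;
  return fn->far_jump_used;
}
```

A function that only grows past the threshold during reload keeps the answer it had when the frame was laid out, which is the behaviour the patch asks for.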

ChangeLog:
* config/arm/arm.c (thumb_far_jump_used_p): Don't change
  if reload in progress or completed.

* gcc.target/arm/thumb1-far-jump-3.c: New case.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b562986..2cf362c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26255,6 +26255,11 @@ thumb_far_jump_used_p (void)
return 0;
 }
 
+  /* We should not change far_jump_used during or after reload, as there is
+ no chance to change stack frame layout.  */
+  if (reload_in_progress || reload_completed)
+return 0;
+
   /* Check to see if the function contains a branch
  insn with the far jump attribute set.  */
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c 
b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
new file mode 100644
index 000..90559ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
@@ -0,0 +1,108 @@
+/* Catch reload ICE on target thumb1 with far jump optimization.
+ * It is also a valid case for non-thumb1 target.  */
+
+/* Add -mno-lra option as it is only reproducible with reload.  It will
+   be removed after reload is completely removed.  */
+/* { dg-options "-mno-lra -fomit-frame-pointer" } */
+/* { dg-do compile } */
+
+#define C  2
+#define A  4
+#define RGB  (C | A)
+#define GRAY (A)
+
+typedef unsigned long uint_32;
+typedef unsigned char byte;
+typedef byte* bytep;
+
+typedef struct ss
+{
+   uint_32 w;
+   uint_32 r;
+   byte c;
+   byte b;
+   byte p;
+} info;
+
+typedef info * infop;
+
+void
+foo(infop info, bytep row)
+{
+   uint_32 iw = info->w;
+   if (info->c == RGB)
+   {
+  if (info->b == 8)
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save;
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save;
+ }
+  }
+
+  else
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save[2];
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save[0] = *(--sp);
+save[1] = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save[0];
+*(--dp) = save[1];
+ }
+  }
+   }
+   else if (info->c == GRAY)
+   {
+  if (info->b == 8)
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save;
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save;
+ }
+  }
+  else
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save[2];
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save[0] = *(--sp);
+save[1] = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save[0];
+*(--dp) = save[1];
+ }
+  }
+   }
+}


RE: [patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-21 Thread Joey Ye
OK to trunk and 4.8?

-Original Message-
From: Joey Ye [mailto:joey...@arm.com] 
Sent: 21 February 2014 19:32
To: gcc-patches@gcc.gnu.org
Subject: [patch] [arm] Fix PR60169 - thumb1 far jump

Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
this ICE:

1. thumb1 estimates whether far_jump is used based on the function's insn size.
2. During reload, after the stack layout is finalized, it runs reload_as_needed.
This can, however, increase insn size and so change the far_jump estimate,
which in turn requires saving LR and changing the stack layout again.  Since
there is no chance to change it at that point, GCC crashes.

Solution:
Do not change the far_jump estimate if reload_in_progress or
reload_completed is true.

According to Vlad, LRA likely does not need a similar fix:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

ChangeLog:
* config/arm/arm.c (thumb_far_jump_used_p): Don't change
  if reload in progress or completed.

* gcc.target/arm/thumb1-far-jump-3.c: New case.





Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
> 
> > On Fri, 21 Feb 2014, Richard Biener wrote:
> > 
> > > 
> > > This fixes the slowness of RTL expansion in PR60291 which is caused
> > > by excessive collisions in mem-attr sharing.  The issue here is
> > > that sharing attempts happens across functions and we have a _lot_
> > > of functions in this testcase referencing the same lexically
> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> > > means those get the same hash value.  But they don't compare
> > > equal because an SSA name _5 from function A is of course not equal
> > > to one from function B.
> > > 
> > > The following fixes that by not doing mem-attr sharing across functions
> > > by clearing the mem-attrs hashtable in rest_of_clean_state.
> > > 
> > > Another fix may be to do what the comment in iterative_hash_expr
> > > says for SSA names:
> > > 
> > > case SSA_NAME:
> > >   /* We can just compare by pointer.  */
> > >   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > > 
> > > (probably blame me for changing that to hashing the SSA version).
> > 
> > It was lxo.
> > 
> > > But I'm not sure that doesn't uncover issues with various hashtables and
> > > walking them, generating code dependent on the order.  It's IMHO just not
> > > expected that you compare function-local expressions from different
> > > functions.
> > 
> > Same speedup result from
> > 
> > Index: gcc/tree.c
> > ===
> > --- gcc/tree.c  (revision 207960)
> > +++ gcc/tree.c  (working copy)
> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
> >}
> >  case SSA_NAME:
> >/* We can just compare by pointer.  */
> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
> >  case PLACEHOLDER_EXPR:
> >/* The node itself doesn't matter.  */
> >return val;
> > 
> > and from
> > 
> > Index: gcc/tree.c
> > ===
> > --- gcc/tree.c  (revision 207960)
> > +++ gcc/tree.c  (working copy)
> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
> >}
> >  case SSA_NAME:
> >/* We can just compare by pointer.  */
> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > +  return iterative_hash_host_wide_int
> > + (DECL_UID (cfun->decl),
> > +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
> >  case PLACEHOLDER_EXPR:
> >/* The node itself doesn't matter.  */
> >return val;
> > 
> > better than hashing pointers but requiring cfun != NULL in this
> > function isn't good either.
> > 
> > At this point I'm more comfortable with clearing the hashtable
> > than with changing iterative_hash_expr in any way.  It's also
> > along the way to get rid of the hash completely.
> > 
> > Oh, btw, the speedup is going from
> > 
> >  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
> > wall  293891 kB (15%) ggc
> > 
> > to
> > 
> >  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
> > wall  262544 kB (13%) ggc
> > 
> > at -O0 (less dramatic slowness for -On).
> > 
> > > The other thing would be to discard mem-attr sharing altogether,
> > > but that doesn't seem appropriate at this stage (but it would
> > > also simplify quite some code).  With only one function in RTL
> > > at a time that shouldn't be too bad (see several suggestions
> > > along that line, even with statistics).
> 
> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

With the patch below to gather some statistics, we see that one important
piece of sharing not covered by the above measurements is RTX copying(?).

On the testcase for this PR I get at -O1 and without the patch
to clear the hashtable after each function

142489 mem_attrs created (142439 for new, 50 for modification)
1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 
by rtx copying)
0 mem_attrs dropped

and with the patch to clear after each function

364411 mem_attrs created (144478 for new, 219933 for modification)
1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 
by rtx copying)
0 mem_attrs dropped

while for dwarf2out.c I see without the clearing

24399 mem_attrs created (6929 for new, 17470 for modification)
102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by 
rtx copying)
16 mem_attrs dropped

which means that completely dropping the sharing would result
in creating 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.
That's still not a lot of overhead given that mem-attrs take 40 bytes
(3MB vs. 950kB).  There is also always the possibility to
explicitly ref-count mem-attrs to handle sharing by rtx
copying (at least cse, fwprop, combine, ira

Re: [PATCH][AArch64] vrnd<*>_f64 patch for stage-1

2014-02-21 Thread Alex Velenko

On 13/02/14 17:43, Richard Henderson wrote:

On 02/13/2014 03:17 AM, Alex Velenko wrote:

+/* Sets "rmode" field of "FPCR" control register to
+   "FPROUNDING_ZERO".  */


Comment is wrong, or at least misleading.


+void __inline __attribute__ ((__always_inline__))
+set_rounding_mode (uint32_t mode)
+{
+  uint32_t r;
+
+  /* Read current FPCR.  */
+  asm volatile ("mrs %[r], fpcr" : [r] "=r" (r) : :);
+
+  /* Clear rmode.  */
+  r &= 3 << RMODE_START;


   ~(3 << RMODE_START)


+  /* Calculate desired FPCR.  */
+  r |= mode << RMODE_START;
+
+  /* Write desired FPCR back.  */
+  asm volatile ("msr fpcr, %[r]" : : [r] "r" (r) :);
+}


Fortunately for this testcase, you do always use FPROUNDING_ZERO == 3 when
calling this function, so the bugs are hidden.


r~
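The corrected read-modify-write of the rmode field can be written as a host-independent helper. This is a sketch, not the respun patch: `fpcr_with_rmode` and `fpcr_with_rmode_buggy` are hypothetical names, RMODE_START is assumed to be 22 (FPCR.RMode occupies bits [23:22]), and the `mrs`/`msr` accesses are left out so that only the bit manipulation remains.

```c
#include <stdint.h>

/* FPCR.RMode occupies bits [23:22]; the value 3 (FPROUNDING_ZERO)
   selects round-towards-zero.  */
#define RMODE_START 22
#define FPROUNDING_ZERO 3u

/* Return FPCR with its RMode field replaced by MODE, all other bits kept.  */
static uint32_t
fpcr_with_rmode (uint32_t fpcr, uint32_t mode)
{
  fpcr &= ~(UINT32_C (3) << RMODE_START);   /* clear the old rmode */
  fpcr |= (mode & 3u) << RMODE_START;       /* insert the new one */
  return fpcr;
}

/* The original clearing step, kept for comparison: it keeps only the
   rmode bits and discards every other FPCR bit.  */
static uint32_t
fpcr_with_rmode_buggy (uint32_t fpcr, uint32_t mode)
{
  fpcr &= 3u << RMODE_START;
  fpcr |= mode << RMODE_START;
  return fpcr;
}
```

On AArch64 the result would be written back with `msr fpcr` as in the original helper. The buggy variant only agrees with the fixed one when the requested mode is 3 and every other FPCR bit is already zero, which is why the testcase never exposed it.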



Hi Richard,
Thank you for pointing those issues out.  Here is a respin of the same
patch with the indicated issues fixed.  The description of the patch is as
follows:


This patch adds the vrnd<*>_f64 AArch64 intrinsics, along with a testcase
for them.  A complete LE and BE regression run showed no regressions.


Is patch OK for stage-1?

gcc/

2014-02-21  Alex Velenko  

* config/aarch64/aarch64-builtins.c (BUILTIN_VDQF_DF): Macro
added.
* config/aarch64/aarch64-simd-builtins.def (frintn): Use added
macro.
* config/aarch64/aarch64-simd.md (): Comment
corrected.
* config/aarch64/aarch64.md (): Likewise.
* config/aarch64/arm_neon.h (vrnd_f64): Added.
(vrnda_f64): Likewise.
(vrndi_f64): Likewise.
(vrndm_f64): Likewise.
(vrndn_f64): Likewise.
(vrndp_f64): Likewise.
(vrndx_f64): Likewise.

gcc/testsuite/

2014-02-21  Alex Velenko  

* gcc.target/aarch64/vrnd_f64_1.c: New testcase.


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ebab2ce8347a4425977c5cbd0f285c3ff1d9f2f1..7adc5fb96b6473ecde5c4f76973aff68af0ca7d4 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -307,6 +307,8 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
 #define BUILTIN_VDQF(T, N, MAP) \
   VAR3 (T, N, MAP, v2sf, v4sf, v2df)
+#define BUILTIN_VDQF_DF(T, N, MAP) \
+  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
 #define BUILTIN_VDQH(T, N, MAP) \
   VAR2 (T, N, MAP, v4hi, v8hi)
 #define BUILTIN_VDQHS(T, N, MAP) \
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e5f71b479ccfd1a9cbf84aed0f96b49762053f59..09e230c56683a0225f8760472d7137b7bac98297 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -264,7 +264,7 @@
   BUILTIN_VDQF (UNOP, nearbyint, 2)
   BUILTIN_VDQF (UNOP, rint, 2)
   BUILTIN_VDQF (UNOP, round, 2)
-  BUILTIN_VDQF (UNOP, frintn, 2)
+  BUILTIN_VDQF_DF (UNOP, frintn, 2)
 
   /* Implemented by l2.  */
   VAR1 (UNOP, lbtruncv2sf, 2, v2si)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4dffb59e856aeaafb79007255d3b91a73ef1ef13..0c1d7de5b3f4fb0fa8fa226b81ec690d8112b849 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1427,7 +1427,7 @@
 )
 
 ;; Vector versions of the floating-point frint patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 (define_insn "2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8fcbdcd24a0ea18cc037bef9cf72070281..577aa9fe08bb445e66734bc404e94e13dc1fa65b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3187,7 +3187,7 @@
 ;; ---
 
 ;; frint floating-point round to integral standard patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361b8e265f66026dc506cfc23f044d153b4..797e37ad638648312ef34bcd63c463e5873c30c4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22481,6 +22481,12 @@ vrnd_f32 (float32x2_t __a)
   return __builtin_aarch64_btruncv2sf (__a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vrnd_f64 (float64x1_t __a)
+{
+  return vset_lane_f64 (__builtin_trunc (vget_lane_f64 (__a, 0)), __a, 0);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrndq_f32 (float32x4_t __a)
 {
@@ -22501,6 +22507,12 @@ vrnda_f32 (float32x2_t __a)
   return __builtin_aarch64_roundv2sf (__a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_in

Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 21 Feb 2014, Richard Biener wrote:
>
>> On Fri, 21 Feb 2014, Richard Biener wrote:
>> 
>> > On Fri, 21 Feb 2014, Richard Biener wrote:
>> > 
>> > > 
>> > > This fixes the slowness of RTL expansion in PR60291 which is caused
>> > > by excessive collisions in mem-attr sharing.  The issue here is
>> > > that sharing attempts happen across functions and we have a _lot_
>> > > of functions in this testcase referencing the same lexically
>> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
>> > > means those get the same hash value.  But they don't compare
>> > > equal because an SSA name _5 from function A is of course not equal
>> > > to one from function B.
>> > > 
>> > > The following fixes that by not doing mem-attr sharing across functions
>> > > by clearing the mem-attrs hashtable in rest_of_clean_state.
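To make the collision mechanism concrete, here is a small standalone model. It is a sketch, not GCC code: `MemKey`, `VersionOnlyHash` and `chain_length` are hypothetical, and `std::unordered_set` stands in for the libiberty hash table. The hash deliberately ignores the owning function, as hashing an SSA name by `SSA_NAME_VERSION` alone does. As a result, the same lexical MEM interned from many functions lands in one bucket and never compares equal. Clearing the table per function keeps that chain at a single entry.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical key for a MEM such as MEM[(StgWord *)_5 + -64B]:
// owning function, SSA version of the base pointer, constant offset.
struct MemKey {
  int fn_uid;       // which function the SSA name belongs to
  int ssa_version;  // e.g. 5 for _5
  long offset;      // e.g. -64
  bool operator==(const MemKey &o) const {
    return fn_uid == o.fn_uid && ssa_version == o.ssa_version &&
           offset == o.offset;
  }
};

// Hash that ignores fn_uid, mirroring hashing SSA names by version only:
// equal-looking MEMs from different functions collide but stay unequal.
struct VersionOnlyHash {
  std::size_t operator()(const MemKey &k) const {
    return std::hash<long>()(k.ssa_version * 1000003L + k.offset);
  }
};

using MemAttrTable = std::unordered_set<MemKey, VersionOnlyHash>;

// Intern the same lexical MEM once per function and report how many
// entries share the last key's bucket (the chain a lookup has to walk).
std::size_t chain_length(int num_fns, bool clear_per_function) {
  MemAttrTable table;
  MemKey key{0, 5, -64};
  for (int fn = 0; fn < num_fns; ++fn) {
    if (clear_per_function)
      table.clear();  // analogous to clearing the table after each function
    key = MemKey{fn, 5, -64};
    table.insert(key);
  }
  return table.bucket_size(table.bucket(key));
}
```

With 1000 functions, `chain_length (1000, false)` leaves a 1000-entry bucket to walk, while `chain_length (1000, true)` leaves just one.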
>> > > 
>> > > Another fix may be to do what the comment in iterative_hash_expr
>> > > says for SSA names:
>> > > 
>> > > case SSA_NAME:
>> > >   /* We can just compare by pointer.  */
>> > >   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > > 
>> > > (probably blame me for changing that to hashing the SSA version).
>> > 
>> > It was lxo.
>> > 
>> > > But I'm not sure that doesn't uncover issues with various hashtables and
>> > > walking them, generating code dependent on the order.  It's IMHO just not
>> > > expected that you compare function-local expressions from different
>> > > functions.
>> > 
>> > Same speedup result from
>> > 
>> > Index: gcc/tree.c
>> > ===
>> > --- gcc/tree.c  (revision 207960)
>> > +++ gcc/tree.c  (working copy)
>> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
>> >}
>> >  case SSA_NAME:
>> >/* We can just compare by pointer.  */
>> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
>> >  case PLACEHOLDER_EXPR:
>> >/* The node itself doesn't matter.  */
>> >return val;
>> > 
>> > and from
>> > 
>> > Index: gcc/tree.c
>> > ===
>> > --- gcc/tree.c  (revision 207960)
>> > +++ gcc/tree.c  (working copy)
>> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
>> >}
>> >  case SSA_NAME:
>> >/* We can just compare by pointer.  */
>> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > +  return iterative_hash_host_wide_int
>> > + (DECL_UID (cfun->decl),
>> > +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
>> >  case PLACEHOLDER_EXPR:
>> >/* The node itself doesn't matter.  */
>> >return val;
>> > 
>> > better than hashing pointers but requiring cfun != NULL in this
>> > function isn't good either.
>> > 
>> > At this point I'm more comfortable with clearing the hashtable
>> > than with changing iterative_hash_expr in any way.  It's also
>> > along the way to get rid of the hash completely.
>> > 
>> > Oh, btw, the speedup is going from
>> > 
>> >  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
>> > 
>> > to
>> > 
>> >  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
>> > 
>> > at -O0 (less dramatic slowness for -On).
>> > 
>> > > The other thing would be to discard mem-attr sharing altogether,
>> > > but that doesn't seem appropriate at this stage (but it would
>> > > also simplify quite some code).  With only one function in RTL
>> > > at a time that shouldn't be too bad (see several suggestions
>> > > along that line, even with statistics).
>> 
>> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
>
> With the patch below to get some statistics we see that one important
> piece of sharing not covered by above measurements is RTX copying(?).
>
> On the testcase for this PR I get at -O1 and without the patch
> to clear the hashtable after each function
>
> 142489 mem_attrs created (142439 for new, 50 for modification)
> 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
>
> and with the patch to clear after each function
>
> 364411 mem_attrs created (144478 for new, 219933 for modification)
> 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
>
> while for dwarf2out.c I see without the clearing
>
> 24399 mem_attrs created (6929 for new, 17470 for modification)
> 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying)
> 16 mem_attrs dropped
>
> which means that completely dropping the sharing would result
> in creating 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.
> That's still not a lot overhead given that mem-attrs ta

Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
> 
> > Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
> 
> With the patch below to get some statistics we see that one important
> piece of sharing not covered by above measurements is RTX copying(?).
> 
> On the testcase for this PR I get at -O1 and without the patch
> to clear the hashtable after each function
> 
> 142489 mem_attrs created (142439 for new, 50 for modification)
> 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
> 
> and with the patch to clear after each function
> 
> 364411 mem_attrs created (144478 for new, 219933 for modification)
> 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
> 
> while for dwarf2out.c I see without the clearing
> 
> 24399 mem_attrs created (6929 for new, 17470 for modification)
> 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying)
> 16 mem_attrs dropped

Oh, and more than half of the shared-for-modification cases are not actually
modifications, so they are false sharing reports (set_mem_attrs (mem, MEM_ATTRS (mem))).

24399 mem_attrs created (6929 for new, 17470 for modification)
85801 mem_attrs shared (10878 for new, 12390 for modification, 62533 by rtx copying)
16 mem_attrs dropped

When dropping sharing completely you "win" creations for modification
but lose shares for "new" and "copy".  Losing the "copy" case makes
it a loss overall, which you could possibly offset by using a ref-counting
scheme (or better by avoiding copying the MEM in the first place,
a MEM is currently 24 bytes while its attrs are 40 bytes).
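A ref-counting scheme is only mentioned in passing above. The following is one hypothetical shape it could take, using `std::shared_ptr` as the counter. `Attrs`, `Mem`, `copy_mem` and `set_attrs` are illustrative names, not GCC's `mem_attrs`/`set_mem_attrs`. Copying a MEM just bumps a count instead of doing a hash lookup. An update allocates only when the attributes actually change, so a no-op `set_attrs (m, *m.attrs)` is not counted as sharing.

```cpp
#include <memory>

// Hypothetical, much-simplified attribute block (the real mem_attrs
// carries more fields: expr, offset, size, alias set, alignment, ...).
struct Attrs {
  long alias_set;
  long offset;
  bool operator==(const Attrs &o) const {
    return alias_set == o.alias_set && offset == o.offset;
  }
};

// Hypothetical MEM stand-in: attributes are shared by reference count
// rather than interned in a global hash table.
struct Mem {
  std::shared_ptr<const Attrs> attrs;
};

// Copying a MEM only bumps the reference count: no lookup, no allocation.
Mem copy_mem(const Mem &m) { return m; }

// Copy-on-write update: allocate a new block only if something changed.
void set_attrs(Mem &m, const Attrs &a) {
  if (m.attrs && *m.attrs == a)
    return;  // same attributes: keep sharing the existing block
  m.attrs = std::make_shared<Attrs>(a);
}
```

The trade-off is a per-copy counter update in exchange for removing the global table and its cross-function collisions.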

Richard.

Index: gcc/rtl.c
===
--- gcc/rtl.c   (revision 207938)
+++ gcc/rtl.c   (working copy)
@@ -326,6 +326,8 @@ copy_rtx (rtx orig)
   return copy;
 }
 
+unsigned long mem_attrs_shared_copy;
+
 /* Create a new copy of an rtx.  Only copy just one level.  */
 
 rtx
@@ -333,6 +335,8 @@ shallow_copy_rtx_stat (const_rtx orig ME
 {
   const unsigned int size = rtx_size (orig);
   rtx const copy = ggc_alloc_rtx_def_stat (size PASS_MEM_STAT);
+  if (MEM_P (orig) && MEM_ATTRS (orig))
+    mem_attrs_shared_copy++;
   return (rtx) memcpy (copy, orig, size);
 }
 
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 207938)
+++ gcc/emit-rtl.c  (working copy)
@@ -290,6 +290,12 @@ mem_attrs_htab_eq (const void *x, const
   return mem_attrs_eq_p ((const mem_attrs *) x, (const mem_attrs *) y);
 }
 
+unsigned long mem_attrs_dropped;
+unsigned long mem_attrs_new;
+unsigned long mem_attrs_new_modified;
+unsigned long mem_attrs_shared;
+unsigned long mem_attrs_shared_modified;
+
 /* Set MEM's memory attributes so that they are the same as ATTRS.  */
 
 static void
@@ -300,6 +306,8 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
   /* If everything is the default, we can just clear the attributes.  */
   if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
     {
+      if (MEM_ATTRS (mem))
+	mem_attrs_dropped++;
       MEM_ATTRS (mem) = 0;
       return;
     }
@@ -309,6 +317,20 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
     {
       *slot = ggc_alloc_mem_attrs ();
       memcpy (*slot, attrs, sizeof (mem_attrs));
+      if (MEM_ATTRS (mem))
+	mem_attrs_new_modified++;
+      else
+	mem_attrs_new++;
+    }
+  else
+    {
+      if (MEM_ATTRS (mem))
+	{
+	  if (MEM_ATTRS (mem) != *slot)
+	    mem_attrs_shared_modified++;
+	}
+      else
+	mem_attrs_shared++;
     }
 
   MEM_ATTRS (mem) = (mem_attrs *) *slot;
Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 207938)
+++ gcc/toplev.c	(working copy)
@@ -1989,6 +2023,26 @@ toplev_main (int argc, char **argv)
   if (!exit_after_options)
     do_compile ();
 
+  {
+    extern unsigned long mem_attrs_dropped;
+    extern unsigned long mem_attrs_new;
+    extern unsigned long mem_attrs_new_modified;
+    extern unsigned long mem_attrs_shared;
+    extern unsigned long mem_attrs_shared_modified;
+    extern unsigned long mem_attrs_shared_copy;
+    fprintf (stderr, "%lu mem_attrs created (%lu for new, %lu for "
+	     "modification)\n",
+	     mem_attrs_new + mem_attrs_new_modified,
+	     mem_attrs_new, mem_attrs_new_modified);
+    fprintf (stderr, "%lu mem_attrs shared (%lu for new, %lu for "
+	     "modification, %lu by rtx copying)\n",
+	     mem_attrs_shared + mem_attrs_shared_modified
+	     + mem_attrs_shared_copy,
+	     mem_attrs_shared, mem_attrs_shared_modified,
+	     mem_attrs_shared_copy);
+    fprintf (stderr, "%lu mem_attrs dropped\n", mem_attrs_dropped);
+  }
+
   if (warningcount || errorcount || werror

How to use GCC to compile glib

2014-02-21 Thread shafiq132
Sir,

   I have a cross compiler and I know how to cross-compile a single file, but
I do not know how to compile glib, which is what I am doing all this for. Can
anyone guide me, or just tell me in general how to compile a complete
library using gcc?





Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 21 Feb 2014, Richard Biener wrote:
> >
> >> On Fri, 21 Feb 2014, Richard Biener wrote:
> >> 
> >> > On Fri, 21 Feb 2014, Richard Biener wrote:
> >> > 
> >> > > 
> >> > > This fixes the slowness of RTL expansion in PR60291 which is caused
> >> > > by excessive collisions in mem-attr sharing.  The issue here is
> >> > > that sharing attempts happen across functions and we have a _lot_
> >> > > of functions in this testcase referencing the same lexically
> >> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> >> > > means those get the same hash value.  But they don't compare
> >> > > equal because an SSA name _5 from function A is of course not equal
> >> > > to one from function B.
> >> > > 
> >> > > The following fixes that by not doing mem-attr sharing across functions
> >> > > by clearing the mem-attrs hashtable in rest_of_clean_state.
> >> > > 
> >> > > Another fix may be to do what the comment in iterative_hash_expr
> >> > > says for SSA names:
> >> > > 
> >> > > case SSA_NAME:
> >> > >   /* We can just compare by pointer.  */
> >> > >   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> >> > > 
> >> > > (probably blame me for changing that to hashing the SSA version).
> >> > 
> >> > It was lxo.
> >> > 
> >> > > But I'm not sure that doesn't uncover issues with various hashtables and
> >> > > walking them, generating code dependent on the order.  It's IMHO just not
> >> > > expected that you compare function-local expressions from different
> >> > > functions.
> >> > 
> >> > Same speedup result from
> >> > 
> >> > Index: gcc/tree.c
> >> > ===
> >> > --- gcc/tree.c  (revision 207960)
> >> > +++ gcc/tree.c  (working copy)
> >> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
> >> >}
> >> >  case SSA_NAME:
> >> >/* We can just compare by pointer.  */
> >> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> >> > +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
> >> >  case PLACEHOLDER_EXPR:
> >> >/* The node itself doesn't matter.  */
> >> >return val;
> >> > 
> >> > and from
> >> > 
> >> > Index: gcc/tree.c
> >> > ===
> >> > --- gcc/tree.c  (revision 207960)
> >> > +++ gcc/tree.c  (working copy)
> >> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
> >> >}
> >> >  case SSA_NAME:
> >> >/* We can just compare by pointer.  */
> >> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> >> > +  return iterative_hash_host_wide_int
> >> > + (DECL_UID (cfun->decl),
> >> > +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
> >> >  case PLACEHOLDER_EXPR:
> >> >/* The node itself doesn't matter.  */
> >> >return val;
> >> > 
> >> > better than hashing pointers but requiring cfun != NULL in this
> >> > function isn't good either.
> >> > 
> >> > At this point I'm more comfortable with clearing the hashtable
> >> > than with changing iterative_hash_expr in any way.  It's also
> >> > along the way to get rid of the hash completely.
> >> > 
> >> > Oh, btw, the speedup is going from
> >> > 
> >> >  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
> >> > 
> >> > to
> >> > 
> >> >  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
> >> > 
> >> > at -O0 (less dramatic slowness for -On).
> >> > 
> >> > > The other thing would be to discard mem-attr sharing altogether,
> >> > > but that doesn't seem appropriate at this stage (but it would
> >> > > also simplify quite some code).  With only one function in RTL
> >> > > at a time that shouldn't be too bad (see several suggestions
> >> > > along that line, even with statistics).
> >> 
> >> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
> >
> > With the patch below to get some statistics we see that one important
> > piece of sharing not covered by above measurements is RTX copying(?).
> >
> > On the testcase for this PR I get at -O1 and without the patch
> > to clear the hashtable after each function
> >
> > 142489 mem_attrs created (142439 for new, 50 for modification)
> > 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
> > 0 mem_attrs dropped
> >
> > and with the patch to clear after each function
> >
> > 364411 mem_attrs created (144478 for new, 219933 for modification)
> > 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying)
> > 0 mem_attrs dropped
> >
> > while for dwarf2out.c I see without the clearing
> >
> > 24399 mem_attrs created (6929 for new, 1747

Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Fri, 21 Feb 2014, Richard Biener wrote:
> > >
> > >> On Fri, 21 Feb 2014, Richard Biener wrote:
> > >> 
> > >> > On Fri, 21 Feb 2014, Richard Biener wrote:
> > >> > 
> > >> > > 
> > >> > > This fixes the slowness of RTL expansion in PR60291 which is caused
> > >> > > by excessive collisions in mem-attr sharing.  The issue here is
> > >> > > that sharing attempts happen across functions and we have a _lot_
> > >> > > of functions in this testcase referencing the same lexically
> > >> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> > >> > > means those get the same hash value.  But they don't compare
> > >> > > equal because an SSA name _5 from function A is of course not equal
> > >> > > to one from function B.
> > >> > > 
> > >> > > The following fixes that by not doing mem-attr sharing across functions
> > >> > > by clearing the mem-attrs hashtable in rest_of_clean_state.
> > >> > > 
> > >> > > Another fix may be to do what the comment in iterative_hash_expr
> > >> > > says for SSA names:
> > >> > > 
> > >> > > case SSA_NAME:
> > >> > >   /* We can just compare by pointer.  */
> > >> > >   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > >> > > 
> > >> > > (probably blame me for changing that to hashing the SSA version).
> > >> > 
> > >> > It was lxo.
> > >> > 
> > >> > > But I'm not sure that doesn't uncover issues with various hashtables and
> > >> > > walking them, generating code dependent on the order.  It's IMHO just not
> > >> > > expected that you compare function-local expressions from different
> > >> > > functions.
> > >> > 
> > >> > Same speedup result from
> > >> > 
> > >> > Index: gcc/tree.c
> > >> > ===
> > >> > --- gcc/tree.c  (revision 207960)
> > >> > +++ gcc/tree.c  (working copy)
> > >> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
> > >> >}
> > >> >  case SSA_NAME:
> > >> >/* We can just compare by pointer.  */
> > >> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > >> > +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
> > >> >  case PLACEHOLDER_EXPR:
> > >> >/* The node itself doesn't matter.  */
> > >> >return val;
> > >> > 
> > >> > and from
> > >> > 
> > >> > Index: gcc/tree.c
> > >> > ===
> > >> > --- gcc/tree.c  (revision 207960)
> > >> > +++ gcc/tree.c  (working copy)
> > >> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
> > >> >}
> > >> >  case SSA_NAME:
> > >> >/* We can just compare by pointer.  */
> > >> > -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > >> > +  return iterative_hash_host_wide_int
> > >> > + (DECL_UID (cfun->decl),
> > >> > +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
> > >> >  case PLACEHOLDER_EXPR:
> > >> >/* The node itself doesn't matter.  */
> > >> >return val;
> > >> > 
> > >> > better than hashing pointers but requiring cfun != NULL in this
> > >> > function isn't good either.
> > >> > 
> > >> > At this point I'm more comfortable with clearing the hashtable
> > >> > than with changing iterative_hash_expr in any way.  It's also
> > >> > along the way to get rid of the hash completely.
> > >> > 
> > >> > Oh, btw, the speedup is going from
> > >> > 
> > >> >  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
> > >> > 
> > >> > to
> > >> > 
> > >> >  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
> > >> > 
> > >> > at -O0 (less dramatic slowness for -On).
> > >> > 
> > >> > > The other thing would be to discard mem-attr sharing altogether,
> > >> > > but that doesn't seem appropriate at this stage (but it would
> > >> > > also simplify quite some code).  With only one function in RTL
> > >> > > at a time that shouldn't be too bad (see several suggestions
> > >> > > along that line, even with statistics).
> > >> 
> > >> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
> > >
> > > With the patch below to get some statistics we see that one important
> > > piece of sharing not covered by above measurements is RTX copying(?).
> > >
> > > On the testcase for this PR I get at -O1 and without the patch
> > > to clear the hashtable after each function
> > >
> > > 142489 mem_attrs created (142439 for new, 50 for modification)
> > > 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
> > > 0 mem_attrs dropped
> > >
> > > and with the patch to clear after each functi

C++ PATCH for c++/60167 (reference template parameters)

2014-02-21 Thread Jason Merrill
My patch for 58606 was incomplete; there were other places that needed 
to change to handle dereferencing reference non-type template parameters.
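For reference, here is the language feature in question. A reference non-type template parameter names an object, and using it as an expression implicitly dereferences it. The snippet below uses illustrative names, not the PR testcases. It shows the behaviour the front end has to model; it is valid C++ and does not reproduce the bug.

```cpp
// Illustrative global object that a reference template parameter can name.
int counter = 0;

template <int &R>
struct Counter {
  // R is an lvalue of type int referring to the bound object.
  static int next() { return ++R; }
};

// Using R as a value reads the referenced object (implicit dereference).
template <int &R>
int read_ref() { return R; }
```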


Tested x86_64-pc-linux-gnu, applying to trunk.  I reverted the earlier 
58606 patch on the 4.8 branch.
commit 7b1bb4515ae768ca44e192442d2578ea46c16f96
Author: Jason Merrill 
Date:   Thu Feb 20 23:22:21 2014 -0500

	PR c++/60167
	PR c++/60222
	PR c++/58606
	* parser.c (cp_parser_template_argument): Restore dereference.
	* pt.c (template_parm_to_arg): Dereference non-pack expansions too.
	(process_partial_specialization): Handle deref.
	(unify): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4673f78..d8ccd2b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13937,6 +13937,7 @@ cp_parser_template_argument (cp_parser* parser)
 
 	  if (INDIRECT_REF_P (argument))
 	{
+	  /* Strip the dereference temporarily.  */
 	  gcc_assert (REFERENCE_REF_P (argument));
 	  argument = TREE_OPERAND (argument, 0);
 	}
@@ -13975,6 +13976,8 @@ cp_parser_template_argument (cp_parser* parser)
 	  if (address_p)
 		argument = build_x_unary_op (loc, ADDR_EXPR, argument,
 	 tf_warning_or_error);
+	  else
+		argument = convert_from_reference (argument);
 	  return argument;
 	}
 	}
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6477fce..4cf387a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3861,6 +3861,8 @@ template_parm_to_arg (tree t)
 	  SET_ARGUMENT_PACK_ARGS (t, vec);
 	  TREE_TYPE (t) = type;
 	}
+  else
+	t = convert_from_reference (t);
 }
   return t;
 }
@@ -4218,10 +4220,12 @@ process_partial_specialization (tree decl)
   if (/* These first two lines are the `non-type' bit.  */
   !TYPE_P (arg)
   && TREE_CODE (arg) != TEMPLATE_DECL
-  /* This next line is the `argument expression is not just a
+  /* This next two lines are the `argument expression is not just a
  simple identifier' condition and also the `specialized
  non-type argument' bit.  */
-  && TREE_CODE (arg) != TEMPLATE_PARM_INDEX)
+  && TREE_CODE (arg) != TEMPLATE_PARM_INDEX
+	  && !(REFERENCE_REF_P (arg)
+		   && TREE_CODE (TREE_OPERAND (arg, 0)) == TEMPLATE_PARM_INDEX))
 {
   if ((!packed_args && tpd.arg_uses_template_parms[i])
   || (packed_args && uses_template_parms (arg)))
@@ -17893,6 +17897,12 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   /* Unification fails if we hit an error node.  */
   return unify_invalid (explain_p);
 
+    case INDIRECT_REF:
+      if (REFERENCE_REF_P (parm))
+	return unify (tparms, targs, TREE_OPERAND (parm, 0), arg,
+		      strict, explain_p);
+      /* FALLTHRU */
+
     default:
       /* An unresolved overload is a nondeduced context.  */
       if (is_overloaded_fn (parm) || type_unknown_p (parm))
diff --git a/gcc/testsuite/g++.dg/template/ref7.C b/gcc/testsuite/g++.dg/template/ref7.C
new file mode 100644
index 000..f6395e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref7.C
@@ -0,0 +1,10 @@
+// PR c++/60167
+
+template 
+struct Foo {
+  typedef int Bar;
+
+  static Bar cache;
+};
+
+template  typename Foo::Bar Foo::cache;
diff --git a/gcc/testsuite/g++.dg/template/ref8.C b/gcc/testsuite/g++.dg/template/ref8.C
new file mode 100644
index 000..a2fc847
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref8.C
@@ -0,0 +1,8 @@
+// PR c++/60222
+
+template struct A
+{
+  template struct B;
+
+  template struct B {};
+};


C++ PATCH for c++/60251 (ICE with VLA capture)

2014-02-21 Thread Jason Merrill
is_normal_capture_proxy got confused by the contortions we go through to 
build up a capture proxy for a VLA capture, so it's easier to just check 
for a variably modified type.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1fa864d218992c8a1b9b1fd4fae2205d5572205b
Author: Jason Merrill 
Date:   Thu Feb 20 23:35:28 2014 -0500

	PR c++/60251
	* lambda.c (is_normal_capture_proxy): Handle VLA capture.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 8bb820d..ad993e9d 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -250,6 +250,10 @@ is_normal_capture_proxy (tree decl)
     /* It's not a capture proxy.  */
     return false;
 
+  if (variably_modified_type_p (TREE_TYPE (decl), NULL_TREE))
+    /* VLA capture.  */
+    return true;
+
   /* It is a capture proxy, is it a normal capture?  */
   tree val = DECL_VALUE_EXPR (decl);
   if (val == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla11.C b/gcc/testsuite/g++.dg/cpp1y/vla11.C
new file mode 100644
index 000..c9cdade
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla11.C
@@ -0,0 +1,8 @@
+// PR c++/60251
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+void foo(int n)
+{
+  int x[n];
+  [&x]() { decltype(x) y; }; // { dg-error "decltype of array of runtime bound" }
+}


C++ PATCH for c++/60250 (ICE with invalid array bound)

2014-02-21 Thread Jason Merrill
A type-dependent expression can have NULL TREE_TYPE, and if we wrap it 
in a NOP_EXPR also with NULL type, that confuses things.  Let's not try 
to do that.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 5564347b2b7b39d92f8f3b8307bc8ed8551e4d91
Author: Jason Merrill 
Date:   Thu Feb 20 23:46:00 2014 -0500

	PR c++/60250
	* parser.c (cp_parser_direct_declarator): Don't wrap a
	type-dependent expression in a NOP_EXPR.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d8ccd2b..d6c176f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17233,7 +17233,8 @@ cp_parser_direct_declarator (cp_parser* parser,
    "array bound is not an integer constant");
 		  bounds = error_mark_node;
 		}
-	  else if (processing_template_decl)
+	  else if (processing_template_decl
+		   && !type_dependent_expression_p (bounds))
 		{
 		  /* Remember this wasn't a constant-expression.  */
 		  bounds = build_nop (TREE_TYPE (bounds), bounds);
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla12.C b/gcc/testsuite/g++.dg/cpp1y/vla12.C
new file mode 100644
index 000..df47f26
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla12.C
@@ -0,0 +1,7 @@
+// PR c++/60250
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+template void foo()
+{
+  typedef int T[ ([](){ return 1; }()) ]; // { dg-error "runtime bound" }
+}


Re: [AArch64 01/14] Use "generic" target, if no other default.

2014-02-21 Thread Kyrill Tkachov

Hi Philipp,

On 18/02/14 21:09, Philipp Tomsich wrote:

The default target should be "generic", as Cortex-A53 includes
optional ISA features (CRC and CRYPTO) that are not required for
architectural compliance. The key difference between generic (which
already uses the cortexa53 pipeline model for scheduling) and cortex-a53
is the absence of any optional ISA features in the "generic" target.
---
  gcc/config/aarch64/aarch64.c | 2 +-
  gcc/config/aarch64/aarch64.h | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 784bfa3..70dda00 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5244,7 +5244,7 @@ aarch64_override_options (void)
  
   /* If the user did not specify a processor, choose the default
      one for them.  This will be the CPU set during configuration using
-     --with-cpu, otherwise it is "cortex-a53".  */
+     --with-cpu, otherwise it is "generic".  */
   if (!selected_cpu)
     {
       selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f];
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 13c424c..b66a6b4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -472,10 +472,10 @@ enum target_cpus
TARGET_CPU_generic
  };
  
-/* If there is no CPU defined at configure, use "cortex-a53" as default.  */
+/* If there is no CPU defined at configure, use "generic" as default.  */
  #ifndef TARGET_CPU_DEFAULT
  #define TARGET_CPU_DEFAULT \
-  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
+  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
  #endif
  
  /* The processor for which instructions should be scheduled.  */


I don't think this approach will work. The bug we have here is that in 
config.gcc when processing a --with-arch directive it will use the CPU flags of 
the sample cpu given for the architecture in aarch64-arches.def. This will cause 
it to use cortex-a53+fp+simd+crypto+crc when asked to configure for 
--with-arch=armv8-a. Instead it should be using the 4th field of the
AARCH64_ARCH entry, which specifies the ISA flags implied by the architecture. Then we
would get cortex-a53+fp+simd.
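The difference between the two sources of ISA flags can be sketched with a toy model. Everything here is hypothetical. The `FL_*` bits, `Core`, `Arch` and the two helper functions only mirror the shape of an aarch64-arches.def entry as described above (name, sample core, architecture, implied flags). They are not GCC's actual data structures or values.

```cpp
#include <cstdint>

// Hypothetical ISA feature bits (not GCC's AARCH64_FL_* values).
enum : std::uint32_t {
  FL_FP = 1u << 0,
  FL_SIMD = 1u << 1,
  FL_CRC = 1u << 2,
  FL_CRYPTO = 1u << 3,
};

// Hypothetical core entry: the full feature set of one specific CPU.
struct Core {
  const char *name;
  std::uint32_t flags;
};

// Hypothetical arch entry with the four fields described above.
struct Arch {
  const char *name;
  const Core *sample_core;
  int arch_version;
  std::uint32_t implied_flags;  // the "4th field"
};

const Core cortex_a53 = {"cortex-a53", FL_FP | FL_SIMD | FL_CRC | FL_CRYPTO};
const Arch armv8_a = {"armv8-a", &cortex_a53, 8, FL_FP | FL_SIMD};

// Behaviour described as the bug: inherit the sample core's optional features.
std::uint32_t flags_from_sample_core(const Arch &a) {
  return a.sample_core->flags;
}

// Behaviour suggested instead: only what the architecture itself implies.
std::uint32_t flags_from_arch(const Arch &a) { return a.implied_flags; }
```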


Also, if no --with-arch or --with-cpu is specified, config.gcc will still 
specify TARGET_CPU_DEFAULT as TARGET_CPU_generic but without encoding the ISA 
flags (AARCH64_FL_FOR_ARCH8 in this case) for it in the upper bits of 
TARGET_CPU_DEFAULT. That leaves TARGET_CPU_DEFAULT always defined, so the last
hunk in this patch (the #ifndef TARGET_CPU_DEFAULT fallback) will never be used.
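The packing that TARGET_CPU_DEFAULT relies on can be sketched as follows. The quoted patch uses `& 0x3f` to select the core and `<< 6` for the flags. The helper names and numeric values below are hypothetical. The only point is that a value carrying the CPU index alone decodes to an empty flag set.

```cpp
#include <cstdint>

// Layout from the quoted patch: low 6 bits = CPU index
// (all_cores[TARGET_CPU_DEFAULT & 0x3f]), ISA flags above them
// (AARCH64_CPU_DEFAULT_FLAGS << 6).  Helper names are hypothetical.
constexpr unsigned kCpuBits = 6;
constexpr std::uint64_t kCpuMask = (std::uint64_t{1} << kCpuBits) - 1;  // 0x3f

constexpr std::uint64_t pack_cpu_default(unsigned cpu_index,
                                         std::uint64_t isa_flags) {
  return (cpu_index & kCpuMask) | (isa_flags << kCpuBits);
}

constexpr unsigned cpu_of(std::uint64_t packed) {
  return static_cast<unsigned>(packed & kCpuMask);
}

constexpr std::uint64_t flags_of(std::uint64_t packed) {
  return packed >> kCpuBits;
}

// Hypothetical stand-ins for TARGET_CPU_generic and AARCH64_FL_FOR_ARCH8.
constexpr unsigned kCpuGeneric = 3;
constexpr std::uint64_t kFlagsArch8 = 0x3;

// The situation described above: only the CPU index, nothing in the upper bits.
constexpr std::uint64_t kDefaultWithoutFlags = kCpuGeneric;
static_assert(flags_of(kDefaultWithoutFlags) == 0, "no ISA flags encoded");
```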


I'm working on a fix for these issues.

HTH,
Kyrill



C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)

2014-02-21 Thread Jason Merrill
While parsing the parameter list for a lambda, we've already 
pushed into the closure class but haven't created the op() 
FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer 
of op() breaks.  Avoid the ICE by not trying to capture 'this' when 
parsing a parameter list.
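For context, here is the valid pattern the 'this' capture logic normally handles. Nested lambdas in a member function reach a data member through an implicitly captured `this`, and each closure's `operator()` gets it from the enclosing one. The struct below is illustrative. Unlike the testcase, it does not use the member inside a lambda's parameter declaration.

```cpp
// Illustrative only: 'i' is reached through 'this', captured implicitly by
// the outer [&] lambda and then passed on to the inner one.
struct A {
  int i;
  int foo() {
    return [&]() { return [&](int scale) { return i * scale; }(2); }();
  }
};
```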


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 415022d49d1cee84b6d2085e7585e1d801d15732
Author: Jason Merrill 
Date:   Fri Feb 21 00:35:35 2014 -0500

	PR c++/60252
	* lambda.c (maybe_resolve_dummy): Don't try to capture this
	in declaration context.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index ad993e9d..7fe235b 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,7 +749,10 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
   && current_class_type
   && LAMBDA_TYPE_P (current_class_type)
-  && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
+  && DERIVED_FROM_P (type, current_nonlambda_class_type ())
+  /* If we get here while parsing the parameter list of a lambda, it
+	 will fail, so don't even try (c++/60252).  */
+  && current_binding_level->kind != sk_function_parms)
 {
   /* In a lambda, need to go through 'this' capture.  */
   tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
new file mode 100644
index 000..58f0fa3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
@@ -0,0 +1,12 @@
+// PR c++/60252
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  int i;			// { dg-message "" }
+
+  void foo()
+  {
+    [&](){ [&](int[i]){}; };	// { dg-error "" }
+  }
+};


C++ PATCH for c++/60248 (ICE with variadic template)

2014-02-21 Thread Jason Merrill
mangle_decl shouldn't try to make a forward-compatibility alias for a 
TYPE_DECL, since they don't have symbols.


Tested x86_64-pc-linux-gnu, applying to trunk, 4.7, 4.8.
commit 8d40d9322f567ba5720ac807168232ae3c5ee0e4
Author: Jason Merrill 
Date:   Fri Feb 21 00:39:25 2014 -0500

	PR c++/60248
	* mangle.c (mangle_decl): Don't make an alias for a TYPE_DECL.

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 7bb6f4b..251edb1 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3485,6 +3485,7 @@ mangle_decl (const tree decl)
 
   if (G.need_abi_warning
   /* Don't do this for a fake symbol we aren't going to emit anyway.  */
+  && TREE_CODE (decl) != TYPE_DECL
   && !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)
   && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl))
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic149.C b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
new file mode 100644
index 000..a250e7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
@@ -0,0 +1,11 @@
+// PR c++/60248
+// { dg-options "-std=c++11 -g -fabi-version=2" }
+
+template struct A {};
+
+template<> struct A<0>
+{
+  typedef enum { e } B;
+};
+
+A<0> a;


C++ PATCH for c++/60224 (ICE initializing array with PMF)

2014-02-21 Thread Jason Merrill

We shouldn't treat a CONSTRUCTOR as an init-list if it already has a type.
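Here is what a genuine init-list does in this position. For an array declared with `[]`, the bound is deduced from a brace-enclosed initializer. The testcase's `(int (A::*)())0` is not a braced list, so there is nothing to deduce the bound from and the declaration is ill-formed. The names below are illustrative.

```cpp
#include <cstddef>

// The brace-enclosed initializer determines the bound: bool[3].
bool flags[] = {true, false, true};

// Illustrative helper that recovers the deduced bound.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) {
  return N;
}

static_assert(sizeof(flags) / sizeof(flags[0]) == 3,
              "bound deduced from the init-list");
```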

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8e1493a7a31ffdb1e70977c325e7d2f2686b14a7
Author: Jason Merrill 
Date:   Fri Feb 21 00:52:20 2014 -0500

	PR c++/60224
	* decl.c (cp_complete_array_type, maybe_deduce_size_from_array_init):
	Don't get confused by a CONSTRUCTOR that already has a type.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b7d2d9f..04c4cf5 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4880,7 +4880,7 @@ maybe_deduce_size_from_array_init (tree decl, tree init)
 	 those are not supported in GNU C++, and as the middle-end
 	 will crash if presented with a non-numeric designated
 	 initializer.  */
-  if (initializer && TREE_CODE (initializer) == CONSTRUCTOR)
+  if (initializer && BRACE_ENCLOSED_INITIALIZER_P (initializer))
 	{
 	  vec *v = CONSTRUCTOR_ELTS (initializer);
 	  constructor_elt *ce;
@@ -7099,6 +7099,11 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default)
   int failure;
   tree type, elt_type;
 
+  /* Don't get confused by a CONSTRUCTOR for some other type.  */
+  if (initial_value && TREE_CODE (initial_value) == CONSTRUCTOR
+  && !BRACE_ENCLOSED_INITIALIZER_P (initial_value))
+return 1;
+
   if (initial_value)
 {
   unsigned HOST_WIDE_INT i;
diff --git a/gcc/testsuite/g++.dg/init/array36.C b/gcc/testsuite/g++.dg/init/array36.C
new file mode 100644
index 000..77e4f90
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array36.C
@@ -0,0 +1,8 @@
+// PR c++/60224
+
+struct A {};
+
+void foo()
+{
+  bool b[] = (int (A::*)())0;	// { dg-error "" }
+}


C++ PATCH for c++/60219 (ICE with invalid variadics)

2014-02-21 Thread Jason Merrill
In coerce_template_parms, if we try to pack the remaining arguments into 
an argument pack and that fails, we should immediately stop trying to 
process more arguments.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit 1555baa24f537d0e724c53845e7ba2881df7a77f
Author: Jason Merrill 
Date:   Fri Feb 21 01:05:42 2014 -0500

	PR c++/60219
	* pt.c (coerce_template_parms): Bail if argument packing fails.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3e464ff..0729d93 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6808,6 +6808,8 @@ coerce_template_parms (tree parms,
   /* Store this argument.  */
   if (arg == error_mark_node)
 lost++;
+	  if (lost)
+	break;
   TREE_VEC_ELT (new_inner_args, parm_idx) = arg;
 
 	  /* We are done with all of the arguments.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic150.C b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
new file mode 100644
index 000..6a30efe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
@@ -0,0 +1,9 @@
+// PR c++/60219
+// { dg-require-effective-target c++11 }
+
+template void foo();
+
+void bar()
+{
+  foo<0>;			// { dg-error "" }
+}


C++ PATCH for c++/60216 (ICE with specialization of deleted template)

2014-02-21 Thread Jason Merrill
We need to propagate DECL_DELETED_FN to clones when we get a new 
specialization.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit eaf1689e134ff4fb364c0045965b19879bff8f32
Author: Jason Merrill 
Date:   Fri Feb 21 08:47:01 2014 -0500

	PR c++/60216
	* pt.c (register_specialization): Copy DECL_DELETED_FN to clones.
	(check_explicit_specialization): Don't clone.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0729d93..f07f6e6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1440,6 +1440,8 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
 		= DECL_DECLARED_INLINE_P (fn);
 		  DECL_SOURCE_LOCATION (clone)
 		= DECL_SOURCE_LOCATION (fn);
+		  DECL_DELETED_FN (clone)
+		= DECL_DELETED_FN (fn);
 		}
 	  check_specialization_namespace (tmpl);
 
@@ -2770,15 +2772,16 @@ check_explicit_specialization (tree declarator,
 	   It's just the name of an instantiation.  But, it's not
 	   a request for an instantiation, either.  */
 	SET_DECL_IMPLICIT_INSTANTIATION (decl);
-	  else if (DECL_CONSTRUCTOR_P (decl) || DECL_DESTRUCTOR_P (decl))
-	/* This is indeed a specialization.  In case of constructors
-	   and destructors, we need in-charge and not-in-charge
-	   versions in V3 ABI.  */
-	clone_function_decl (decl, /*update_method_vec_p=*/0);
 
 	  /* Register this specialization so that we can find it
 	 again.  */
 	  decl = register_specialization (decl, gen_tmpl, targs, is_friend, 0);
+
+	  /* A 'structor should already have clones.  */
+	  gcc_assert (decl == error_mark_node
+		  || !(DECL_CONSTRUCTOR_P (decl)
+			   || DECL_DESTRUCTOR_P (decl))
+		  || DECL_CLONED_FUNCTION_P (DECL_CHAIN (decl)));
 	}
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted3.C b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
new file mode 100644
index 000..6783677
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
@@ -0,0 +1,11 @@
+// PR c++/60216
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  template A(T) = delete;
+};
+
+template<> A::A(int) {}
+
+A a(0);


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
> 
> > On Fri, 21 Feb 2014, Richard Sandiford wrote:
> > 
> > > In a thread a few years ago you talked about the possibility of going
> > > further and folding the attributes into the MEM itself, so avoiding
> > > the indirection and separate allocation:
> > > 
> > >   http://thread.gmane.org/gmane.comp.gcc.patches/244464/focus=244538
> > > 
> > > (and earlier posts in the thread).  Would that still be OK?
> > > I might have a go if so.
> > 
> > It would work for me.  Micha just brought up the easiest incremental
> > change though, which is

...

> I am testing the following (and also consider it appropriate as a
> fix for the regression PR60291).
> 
> Ok for trunk/branch(es)?  Now we have many variants to choose from ;)

Jakub requested statistics for a bootstrap for this one.  I get
for r207939 and a --enable-languages=c x86_64 bootstrap
3609924 mem-attrs created overall without the patch and
8268976 with the patch (that's a factor of 2.3 and thus "nothing").

Richard.
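For reference, the factor quoted above follows from the two counts. This is plain arithmetic on the reported numbers, not an additional measurement:

```latex
\frac{8\,268\,976}{3\,609\,924} \approx 2.29,
\qquad
8\,268\,976 - 3\,609\,924 = 4\,659\,052 \text{ additional mem-attrs}
```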


[PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call

2014-02-21 Thread Martin Jambor
Hi,

in propagate_constants_accross_call we expect a thunk to have at least
one parameter and thus an ipa-prop parameter descriptor.  However,
when the callee comes from a CU that was compiled with -O0, there are
no parameter descriptors and we fail an index checking assert.

This patch fixes it by bailing out early if there are no parameter
descriptors because in that case there is nothing to do in that
function anyway.  Bootstrap and testing in progress, OK for trunk if
it passes?

Thanks,

Martin


2014-02-21  Martin Jambor  

PR ipa/60266
* ipa-cp.c (propagate_constants_accross_call): Bail out early if
there are no parameter descriptors.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 7d8bc05..4c9ab12 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   args = IPA_EDGE_REF (cs);
   args_count = ipa_get_cs_argument_count (args);
   parms_count = ipa_get_param_count (callee_info);
+  if (parms_count == 0)
+return false;
 
   /* If this call goes through a thunk we must not propagate to the first (0th)
  parameter.  However, we might need to uncover a thunk from below a series


Re: [PATCH, ARM] Support ORN for DImode

2014-02-21 Thread Richard Earnshaw
On 19/02/14 10:18, Ian Bolton wrote:
> Hi,
> 
> Patterns had previously been added to thumb2.md to support ORN, but only for
> SImode.
> 
> This patch adds DImode support, to cover the full 64|64->64 operation and
> the various 32|64->64 operations (see AND:DI variants that use NOT).
> 
> The patch comes with its own execution test and looks for the correct number of
> ORN instructions in the assembly.
> 
> Regressions passed.
> 
> OK for stage 1?
> 

OK.

Do you not also need a pattern for

(ior:DI (not:DI (reg:DI))
        (zero_extend:DI (reg:SI)))

->
   orn (low part) + mvn (high part)

I don't think one works for sign-extension, though.

R.

> 
> 2014-02-19  Ian Bolton  
> 
> gcc/
> * config/arm/thumb2.md (*iordi_notdi_di): New pattern.
> (*iordi_notzesidi): New pattern.
> (*iordi_notsesidi_di): New pattern.
> testsuite/
> * gcc.target/arm/iordi_notdi-1.c: New test.
> 
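As a sanity check of the split Richard suggests, the sketch below does two things:

- It models the 32-bit ORN (Rn | ~Rm) and MVN (~Rm) operations on plain integers.
- It compares the resulting two-instruction sequence against the 64-bit reference `(ior (not a) (zero_extend b))`.

The helper names are hypothetical and nothing here is taken from thumb2.md. The zero-extended high word of b is always 0, so the high half reduces to ~a_hi, a single MVN. With sign extension, the high half would instead be ~a_hi ORed with all-ones whenever the top bit of b is set. That depends on b, which is why one MVN cannot cover that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the two Thumb-2 instructions involved:
// ORN Rd, Rn, Rm computes Rn | ~Rm, and MVN Rd, Rm computes ~Rm.
static std::uint32_t orn32 (std::uint32_t rn, std::uint32_t rm) { return rn | ~rm; }
static std::uint32_t mvn32 (std::uint32_t rm) { return ~rm; }

// Reference semantics: (ior:DI (not:DI a) (zero_extend:DI b)).
static std::uint64_t ior_not_zext_ref (std::uint64_t a, std::uint32_t b)
{
  return ~a | static_cast<std::uint64_t> (b);
}

// Split form: ORN on the low words, MVN on the high word.  The high word
// of zero_extend (b) is 0, so ~a_hi | 0 is just ~a_hi.
static std::uint64_t ior_not_zext_split (std::uint64_t a, std::uint32_t b)
{
  std::uint32_t a_lo = static_cast<std::uint32_t> (a);
  std::uint32_t a_hi = static_cast<std::uint32_t> (a >> 32);
  std::uint32_t lo = orn32 (b, a_lo);   // b | ~a_lo
  std::uint32_t hi = mvn32 (a_hi);      // ~a_hi
  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}

// Assert that the split agrees with the reference for one input pair.
static void check_split (std::uint64_t a, std::uint32_t b)
{
  assert (ior_not_zext_split (a, b) == ior_not_zext_ref (a, b));
}
```

For example, with a = 0xffffffff00000000 and b = 5, the low half is 5 | ~0 = 0xffffffff and the high half is ~0xffffffff = 0. The result is 0x00000000ffffffff.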



[libstdc++-v3] Move shared_mutex to shared_timed_mutex - late C++14 change (n3891)

2014-02-21 Thread Ed Smith-Rowland

These are the patches as applied.

Built and tested x86_64-linux.

2014-02-20  Ed Smith-Rowland  <3dw...@verizon.net>

Rename shared_mutex to shared_timed_mutex per C++14 acceptance of N3891.
* include/std/shared_mutex: Rename shared_mutex to shared_timed_mutex.
* testsuite/30_threads/shared_lock/locking/2.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/4.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/1.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/3.cc: Ditto.
* testsuite/30_threads/shared_lock/requirements/
explicit_instantiation.cc: Ditto.
* testsuite/30_threads/shared_lock/requirements/typedefs.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/2.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/4.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/1.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/6.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/3.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/5.cc: Ditto.
* testsuite/30_threads/shared_lock/modifiers/2.cc: Ditto.
* testsuite/30_threads/shared_lock/modifiers/1.cc: Ditto.
* testsuite/30_threads/shared_mutex/requirements/
standard_layout.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/copy_neg.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/1.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/assign_neg.cc: Ditto.
* testsuite/30_threads/shared_mutex/try_lock/2.cc: Ditto.
* testsuite/30_threads/shared_mutex/try_lock/1.cc: Ditto.
2014-02-21  Ed Smith-Rowland  <3dw...@verizon.net>

Rename testsuite directory shared_mutex to shared_timed_mutex
for consistency.
* testsuite/30_threads/shared_mutex: Moved to...
* testsuite/30_threads/shared_timed_mutex: ...here
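After this rename, a reader/writer lock with timed locking is named std::shared_timed_mutex. The following is a minimal usage sketch, assuming a C++14 standard library. The counter class is hypothetical. It only illustrates shared (reader) ownership versus exclusive (writer) ownership, plus one timed acquisition.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>

// Hypothetical example class guarded by a std::shared_timed_mutex.
class counter
{
public:
  int get () const
  {
    // Shared (reader) ownership: many readers may hold this at once.
    std::shared_lock<std::shared_timed_mutex> lock (m_mutex);
    return m_value;
  }

  void increment ()
  {
    // Exclusive (writer) ownership.
    std::unique_lock<std::shared_timed_mutex> lock (m_mutex);
    ++m_value;
  }

  // Timed exclusive attempt; returns false if the lock was not
  // obtained within TIMEOUT.
  bool try_increment_for (std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::shared_timed_mutex> lock (m_mutex, timeout);
    if (!lock.owns_lock ())
      return false;
    ++m_value;
    return true;
  }

private:
  mutable std::shared_timed_mutex m_mutex;
  int m_value = 0;
};

// Single-threaded smoke check of the three entry points.
inline void smoke_test ()
{
  counter c;
  c.increment ();
  assert (c.get () == 1);
  assert (c.try_increment_for (std::chrono::milliseconds (10)));
  assert (c.get () == 2);
}
```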
Index: include/std/shared_mutex
===
--- include/std/shared_mutex(revision 207061)
+++ include/std/shared_mutex(working copy)
@@ -52,8 +52,8 @@
*/
 
 #if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
-  /// shared_mutex
-  class shared_mutex
+  /// shared_timed_mutex
+  class shared_timed_mutex
   {
 #if _GTHREAD_USE_MUTEX_TIMEDLOCK
 struct _Mutex : mutex, __timed_mutex_impl<_Mutex>
@@ -84,15 +84,15 @@
 static constexpr unsigned _M_n_readers = ~_S_write_entered;
 
   public:
-shared_mutex() : _M_state(0) {}
+shared_timed_mutex() : _M_state(0) {}
 
-~shared_mutex()
+~shared_timed_mutex()
 {
   _GLIBCXX_DEBUG_ASSERT( _M_state == 0 );
 }
 
-shared_mutex(const shared_mutex&) = delete;
-shared_mutex& operator=(const shared_mutex&) = delete;
+shared_timed_mutex(const shared_timed_mutex&) = delete;
+shared_timed_mutex& operator=(const shared_timed_mutex&) = delete;
 
 // Exclusive ownership
 
Index: testsuite/30_threads/shared_lock/locking/2.cc
===
--- testsuite/30_threads/shared_lock/locking/2.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/2.cc   (working copy)
@@ -30,7 +30,7 @@
 void test01()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
@@ -66,7 +66,7 @@
 void test02()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/4.cc
===
--- testsuite/30_threads/shared_lock/locking/4.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/4.cc   (working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
   typedef std::chrono::system_clock clock_type;
 
Index: testsuite/30_threads/shared_lock/locking/1.cc
===
--- testsuite/30_threads/shared_lock/locking/1.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/1.cc   (working copy)
@@ -30,7 +30,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/3.cc
===
--- testsuite/30_threads/shared_lock/locking/3.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/3.cc   (working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-

C++ PATCH for c++/60051 (ICE deducing array)

2014-02-21 Thread Jason Merrill
This patch benefits from the discussion of array deduction at last 
week's C++ standardization committee meeting, where we clarified that we 
should only try to deduce the array bound from an initializer-list if 
the array bound is deducible, i.e. if it's a non-type template 
parameter.  We also should avoid crashing on a 0-length init-list which 
would result in an invalid 0-length array.
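The rule can be illustrated with two function templates modeled on the examples DR 1591 adds to [temp.deduct.call]. The names are hypothetical, and the sketch assumes a compiler that implements DR 1591.

- In the first template, the bound is a non-type template parameter, so it is deduced from the list length.
- In the second, the bound is fixed, so only the element type is deduced from the list elements.

```cpp
#include <cassert>
#include <cstddef>

// N is a non-type template parameter, so (per DR 1591) it is deduced
// from the number of elements in a braced-init-list; T is deduced from
// the elements themselves.
template <typename T, std::size_t N>
constexpr std::size_t deduced_bound (const T (&)[N])
{
  return N;
}

// Here the bound is fixed at 2 and is not a template parameter, so it is
// not deduced: only T is deduced from the list elements.
template <typename T>
T first_of_two (const T (&a)[2])
{
  return a[0];
}

// Usage: each call passes a braced-init-list directly.
inline void check_deduction ()
{
  assert (deduced_bound ({1, 2, 3}) == 3);
  assert (deduced_bound ({'a', 'b'}) == 2);
  assert (first_of_two ({7, 8}) == 7);
}
```

The `auto x[2] = {};` declaration from the new testcase remains ill-formed. It is expected to produce an error rather than an ICE.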


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8fc69de2c377470b3ae9a8ebc65b0909d626d6e3
Author: Jason Merrill 
Date:   Fri Feb 21 00:16:52 2014 -0500

	DR 1591
	PR c++/60051
	* pt.c (unify): Only unify if deducible.  Handle 0-length list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4cf387a..0f576a5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17262,14 +17262,16 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
    explain_p);
 	}
 
-  if (TREE_CODE (parm) == ARRAY_TYPE)
+  if (TREE_CODE (parm) == ARRAY_TYPE
+	  && deducible_array_bound (TYPE_DOMAIN (parm)))
 	{
 	  /* Also deduce from the length of the initializer list.  */
 	  tree max = size_int (CONSTRUCTOR_NELTS (arg));
 	  tree idx = compute_array_index_type (NULL_TREE, max, tf_none);
-	  if (TYPE_DOMAIN (parm) != NULL_TREE)
-	return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
-   idx, explain_p);
+	  if (idx == error_mark_node)
+	return unify_invalid (explain_p);
+	  return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
+ idx, explain_p);
 	}
 
   /* If the std::initializer_list deduction worked, replace the
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist80.C b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
new file mode 100644
index 000..7947f1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
@@ -0,0 +1,6 @@
+// PR c++/60051
+// { dg-require-effective-target c++11 }
+
+#include 
+
+auto x[2] = {};			// { dg-error "" }


Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Ilya Verbin
2014-02-20 22:27 GMT+04:00 Bernd Schmidt :
> There were still a number of things in these patches that did not make sense
> to me and which I've changed. Let me know if there was a good reason for the
> way some of these things were originally done.
>  * Functions and variables now go into different tables, otherwise
>intermixing between them could be a problem that causes tables to
>go out of sync between host and target (imagine one big table being
>generated by ptx lto1/mkoffload, and multiple small table fragments
>being linked together on the host side).

What do you mean by multiple small table fragments?
The tables from every object file should be joined together while
linking DSO in the same order for both host and target.
If you need to join tables from multiple target images into one big
table, the host tables also should be joined in the same order. In our
case we obtain each target table while loading the image onto the
target device, and merge it with the corresponding host table.
How will splitting functions and global vars into two tables help to
avoid intermixing?

>  * Is there a reason to call a register function for the host tables?
>The way I've set it up, we register a target function/variable table
>while also passing a pointer to the __OPENMP_TARGET__ symbol which
>holds information about the host side tables.

Suppose there is liba, that depends on libb, that depends on libc.
Also corresponding target image tgtimga depends on tgtimgb, that
depends on tgtimgc. When liba is going to start offloaded function, it
calls GOMP_target with a pointer to its descriptor, which contains a
pointer to tgtimga. But how does GOMP_target know that it should also
load tgtimgb and tgtimgc to target? And where to get their descriptors
from?
That's why we have added host-side DSO registration. In this example
the libraries are loaded on the host in the order libc, libb, liba.
They are registered in libgomp in the same order, and loaded onto the
target device in that order during initialization. The tables received
from the target are merged with the host tables from the descriptors
in the same order.
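Both positions in this exchange rest on the same assumption: entry i of a host table describes the same object as entry i of the matching target table. The minimal model below shows that positional pairing, with separate function and variable tables as in Bernd's layout. The struct and function names are hypothetical; this is not libgomp's actual data layout or registration API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-image table: separate arrays of function and
// variable addresses.
struct offload_table
{
  const void *const *funcs;
  std::size_t n_funcs;
  const void *const *vars;
  std::size_t n_vars;
};

// One host-address/target-address pair.
struct addr_pair
{
  const void *host;
  const void *target;
};

// Pair host and target entries purely by position: functions first,
// then variables.  Returns the number of pairs written, or 0 if the two
// tables disagree in size or OUT is too small.  Nothing here can detect
// two equal-sized tables whose entries are in a different order, which
// is why both sides must be built and registered in the same order.
static std::size_t
pair_tables (const offload_table &host, const offload_table &target,
             addr_pair *out, std::size_t out_len)
{
  assert (out != nullptr || out_len == 0);
  if (host.n_funcs != target.n_funcs || host.n_vars != target.n_vars)
    return 0;
  if (host.n_funcs + host.n_vars > out_len)
    return 0;
  std::size_t n = 0;
  for (std::size_t i = 0; i < host.n_funcs; ++i, ++n)
    out[n] = addr_pair { host.funcs[i], target.funcs[i] };
  for (std::size_t i = 0; i < host.n_vars; ++i, ++n)
    out[n] = addr_pair { host.vars[i], target.vars[i] };
  return n;
}
```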

> I'm appending those parts of my current patch kit that seem relevant. This
> includes the ptx mkoffload tool and a patch to make a dummy
> GOMP_offload_register function. Most of the others are updated versions of
> patches I've posted before, and two adapted from Michael Zolotukhin's set
> (automatically generated files not included in the diffs for size reasons).
> How does this look?

I will take a closer look at your changes, try to run them, and send
feedback next week.

  -- Ilya


Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Ilya Tocar
> > The latest version of the AVX512 spec
> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
> > has a few changes.
> >
> > 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1.
> > We can either support the new CPUID bit, or disable generation of
> > PREFETCHWT1 without removing the code, and enable it in 4.9.1/the latest version.
> > I am not sure that adding a new -m flag and related stuff this late
> > is a good idea. Should we still add it?
> 
> Please submit the patch anyway. We can relax release constraints on
> non-algorithmic patches a bit, weighing in the benefits of having a gcc
> release that fully conforms to some published specification.
>
The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1,
and uses them for the prefetchwt1 instruction. Bootstraps/passes testing.
Ok for trunk?

ChangeLog:

2014-02-21  Ilya Tocar  

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf__: Change into ...
 (*prefetch_prefetchwt1_: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

And for tests:

2014-02-22  Ilya Tocar  

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
* gcc.target/i386/sse-23.c: Ditto. 

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+case OPT_mprefetchwt1:
+  if (value)
+   {
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+   }
+  else
+   {
+ opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+   }
+  return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gc

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Bernd Schmidt

On 02/21/2014 04:17 PM, Ilya Verbin wrote:
> 2014-02-20 22:27 GMT+04:00 Bernd Schmidt :
>> There were still a number of things in these patches that did not make sense
>> to me and which I've changed. Let me know if there was a good reason for the
>> way some of these things were originally done.
>>   * Functions and variables now go into different tables, otherwise
>>     intermixing between them could be a problem that causes tables to
>>     go out of sync between host and target (imagine one big table being
>>     generated by ptx lto1/mkoffload, and multiple small table fragments
>>     being linked together on the host side).
>
> What do you mean by multiple small table fragments?

Well, suppose you have file1.o and file2.o compiled for the host with a
.offload_func_table_section in each, and they get linked together; each
provides a fragment of the whole table.

> The tables from every object file should be joined together while
> linking DSO in the same order for both host and target.
> If you need to join tables from multiple target images into one big
> table, the host tables also should be joined in the same order.

The problem is that ptx does not have a linker, so we cannot exactly
reproduce what happens on the host side. We have to process all host .o
files in one single invocation of ptx lto1, and produce a single ptx
assembly file, with a single function/variable table, from there. Having
functions and variables separated gives us at least a small chance that
the order will match that found in the host tables if the host table is
produced by linking multiple fragments.

> Suppose there is liba, that depends on libb, that depends on libc.

What kind of dependencies between liba and libb do you expect to be able
to support on the target side? References to each other's functions and
variables?

Bernd



Re: [PATCH] Fix PR c++/60065.

2014-02-21 Thread Jason Merrill

On 02/21/2014 03:19 AM, Adam Butcher wrote:
> A class template with an out-of-line generic function definition will
> give the same issue I think:
>
>    template
>    void A::f(auto x) {}  // should inject a new list

Right.  template_class_depth should be useful here.  This is basically
the same question as whether a particular member function is a primary
template (member template) or not, but figuring it out in the middle of
the parameter list complicates things.

> Once it's resolved I think it'd be useful to create a new function to
> determine this rather than doing the scope walk in a number of places.
> Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully
> some more elegant name!

Right.

>> Why doesn't num_template_parameter_lists work as a predicate here?
>
> It works in the lambda case as it is updated there, but for generic
> functions I think the following prevents it:
>
>    cp/parser.c:17063:
>
>    /* Inside the function parameter list, surrounding
>       template-parameter-lists do not apply.  */
>    saved_num_template_parameter_lists
>      = parser->num_template_parameter_lists;
>    parser->num_template_parameter_lists = 0;

Hmm, I wonder what that's for?  What breaks when you remove it? :)

Jason



[jit] New API entrypoint: gcc_jit_context_dump_to_file

2014-02-21 Thread David Malcolm
Committed to branch dmalcolm/jit:

Add a new API entrypoint, "gcc_jit_context_dump_to_file", which dumps a
C-like representation of the context's IR to a given path.

It also takes a flag, "update_locations", which, when true, sets up
gcc_jit_location information throughout the context, pointing at the dump
file as if it were a source file.

I've been using this in conjunction with GCC_JIT_BOOL_OPTION_DEBUGINFO to
step through generated code in the debugger (when trying to debug my port
of GNU Octave's JIT to libgccjit).

gcc/jit/
* libgccjit.h (gcc_jit_context_dump_to_file): New.
* libgccjit.map (gcc_jit_context_dump_to_file): New.
* libgccjit.c (gcc_jit_context_dump_to_file): New.
* libgccjit++.h (gccjit::context::dump_to_file): New.

* internal-api.h (gcc::jit::dump): New class.
(gcc::jit::recording::playback_location): Add a replayer argument,
so that playback locations can be created before playback statements.
(gcc::jit::recording::location::playback_location): Likewise.
(gcc::jit::recording::statement::playback_location): Likewise.
(gcc::jit::recording::context::dump_to_file): New.
(gcc::jit::recording::context::m_structs): New field, for use by
dump_to_file.
(gcc::jit::recording::context::m_functions): Likewise.
(gcc::jit::recording::memento::write_to_dump): New virtual function.
(gcc::jit::recording::field::write_to_dump): New.
(gcc::jit::recording::fields::write_to_dump): New.
(gcc::jit::recording::function::write_to_dump): New.
(gcc::jit::recording::function::m_locals): New field for use by
write_to_dump.
(gcc::jit::recording::function::m_activity): Likewise.
(gcc::jit::recording::local::write_to_dump): New.
(gcc::jit::recording::statement::write_to_dump): New.
(gcc::jit::recording::place_label::write_to_dump): New.

* internal-api.c (gcc::jit::dump::dump): New.
(gcc::jit::dump::~dump): New.
(gcc::jit::dump::write): New.
(gcc::jit::dump::make_location): New.
(gcc::jit::recording::playback_location): Add a replayer argument,
so that playback locations can be created before playback statements.

(gcc::jit::recording::context::context): Initialize new fields.
(gcc::jit::recording::function::function): Likewise.

(gcc::jit::recording::context::new_struct_type): Add struct to the
context's m_structs vector.
(gcc::jit::recording::context::new_function): Add function to the
context's m_functions vector.
(gcc::jit::recording::context::dump_to_file): New.
(gcc::jit::recording::memento::write_to_dump): New.
(gcc::jit::recording::field::write_to_dump): New.
(gcc::jit::recording::fields::write_to_dump): New.
(gcc::jit::recording::function::write_to_dump): New.
(gcc::jit::recording::local::write_to_dump): New.
(gcc::jit::recording::statement::write_to_dump): New.
(gcc::jit::recording::place_label::write_to_dump): New.

(gcc::jit::recording::array_type::replay_into): Pass on replayer
to call to playback_location.
(gcc::jit::recording::field::replay_into): Likewise.
(gcc::jit::recording::struct_::replay_into): Likewise.
(gcc::jit::recording::param::replay_into): Likewise.
(gcc::jit::recording::function::replay_into): Likewise.
(gcc::jit::recording::global::replay_into): Likewise.
(gcc::jit::recording::unary_op::replay_into): Likewise.
(gcc::jit::recording::binary_op::replay_into): Likewise.
(gcc::jit::recording::comparison::replay_into): Likewise.
(gcc::jit::recording::call::replay_into): Likewise.
(gcc::jit::recording::array_access::replay_into): Likewise.
(gcc::jit::recording::access_field_of_lvalue::replay_into): Likewise.
(gcc::jit::recording::access_field_rvalue::replay_into): Likewise.
(gcc::jit::recording::dereference_field_rvalue::replay_into): Likewise.
(gcc::jit::recording::dereference_rvalue::replay_into): Likewise.
(gcc::jit::recording::get_address_of_lvalue::replay_into): Likewise.
(gcc::jit::recording::local::replay_into): Likewise.
(gcc::jit::recording::eval::replay_into): Likewise.
(gcc::jit::recording::assignment::replay_into): Likewise.
(gcc::jit::recording::assignment_op::replay_into): Likewise.
(gcc::jit::recording::comment::replay_into): Likewise.
(gcc::jit::recording::conditional::replay_into): Likewise.
(gcc::jit::recording::place_label::replay_into): Likewise.
(gcc::jit::recording::jump::replay_into): Likewise.
(gcc::jit::recording::return_::replay_into): Likewise.
(gcc::jit::recording::loop::replay_into): Likewise.
(gcc::jit::recording::loop_end::replay_into): Likewise.

(gcc::jit::recording::function::new_local): Add to the function

[Patch, AArch64] Fix shuffle for big-endian.

2014-02-21 Thread Tejas Belagod


Hi,

When a shuffle takes more than one input vector, on NEON we end up with a
'mixed-endian' format in the register list that TBL operates on. We don't make
this correction in RTL, and therefore the shuffle operation gets it wrong.
Here is a patch that fixes up the index table in the selector rtx in RTL so that
it is also mixed-endian, reflecting what happens on NEON.
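To make the index arithmetic concrete, here is a standalone model of the loop body added to aarch64_evpc_tbl. It is written as an ordinary function so it can be checked by hand. be_tbl_index is a hypothetical name and does not exist in aarch64.c. The model assumes, as the patch does, two things:

- NUNITS is a power of two.
- A two-input permutation index lies in [0, 2*NUNITS).

```cpp
#include <cassert>

// Hypothetical standalone model of the big-endian index fix-up in
// aarch64_evpc_tbl.  PERM is the element index GCC uses, NUNITS the
// number of lanes per input vector (a power of two).
static int
be_tbl_index (int perm, int nunits, bool one_vector_p)
{
  assert (nunits > 0 && (nunits & (nunits - 1)) == 0);
  if (!one_vector_p && (perm & nunits))
    {
      int elt = perm & (nunits - 1);  // offset within the second input
      elt = nunits - 1 - elt;         // reverse the lanes of that input
      return elt + nunits;            // second input still follows the first
    }
  // First (or only) input: reverse the lane number.
  return nunits - 1 - perm;
}
```

For a two-input V4SI shuffle, index 5 (lane 1 of the second input) becomes 6, and index 1 (lane 1 of the first input) becomes 2. The lanes are reversed within each input, but the order of the two inputs is kept.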


As trunk stands, this patch will not be exercised, because constant vector permute for
big-endian is disabled. I've tested it by locally enabling const vec_perm, and
it fixes some regressions we have on big-endian:


aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regression-tested on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  

gcc/
* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
big-endian when dealing with more than one input shuffle vector.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
 return false;

   for (i = 0; i < nelt; ++i)
-rperm[i] = GEN_INT (d->perm[i]);
+{
+  int nunits = GET_MODE_NUNITS (vmode);
+  int elt = d->perm[i];
+
+  /* If two vectors, we end up with a weird mixed-endian mode on NEON.  */
+  if (BYTES_BIG_ENDIAN)
+   {
+ if (!d->one_vector_p && d->perm[i] & nunits)
+   {
+ /* Extract the offset.  */
+ elt = d->perm[i] & (nunits - 1);
+ /* Reverse the top half.  */
+ elt = nunits - 1 - elt;
+ /* Offset it by the bottom half.  */
+ elt += nunits;
+   }
+ else
+   elt = nunits - 1 - d->perm[i];
+   }
+
+  rperm[i] = GEN_INT (elt);
+}
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
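The index remapping in the hunk above can be checked in isolation. The helper below is hypothetical (it is not a GCC function); it mirrors the patch's arithmetic, assuming NUNITS is the lane count of one input vector and a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone helper (not part of GCC): map the permutation
   index PERM to the TBL index used on a big-endian target, following the
   arithmetic of the aarch64_evpc_tbl hunk.  NUNITS is the number of lanes
   in one input vector and must be a power of two.  */
static int
be_tbl_index (int perm, int nunits, bool one_vector_p)
{
  assert (nunits > 0 && (nunits & (nunits - 1)) == 0);
  assert (perm >= 0 && perm < (one_vector_p ? nunits : 2 * nunits));

  if (!one_vector_p && (perm & nunits))
    {
      int elt = perm & (nunits - 1); /* lane within the second input */
      elt = nunits - 1 - elt;        /* reverse lanes within that input */
      return elt + nunits;           /* stay in the second input */
    }

  /* Index into the first (or only) input: just reverse the lanes.  */
  return nunits - 1 - perm;
}
```

With two 4-lane inputs, index 5 (lane 1 of the second vector) becomes 6, while index 1 of the first vector becomes 2: each half is reversed independently rather than the whole 8-lane table being reversed.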


Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Xinliang David Li
thanks for the fix!

David

On Fri, Feb 21, 2014 at 12:21 AM, Jakub Jelinek  wrote:
> Hi!
>
> As discussed in the PR, on larger functions we can end up with
> over 3 million nested compute_control_dep_chain calls from
> a single compute_control_dep_chain call; on that testcase all that
> effort yields zero or at most one (useless) control dep path.
> The problem is that the function is really unbounded, even with the
> 6 element path length limitation (recursion depth) and the limit of 8
> find_pdom calls: everything still iterates over all the successor edges at
> each level.  And the function is often called on the same basic block
> again and again, even at a particular depth level (e.g. over 20 times for
> the same bb at the same depth level).  But the preceding edge list is slightly
> different in each case and in theory it could give different answers.
>
> Fixed by bounding the total number of nested calls.
>
> Additionally, I've made a couple of cleanups: heap-allocating an 8-element
> array instead of using an automatic array makes no sense, and since the
> chain length is at most 6 we can use a stack vector, etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-02-21  Jakub Jelinek  
>
> PR tree-optimization/56490
> * params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
> * tree-ssa-uninit.c: Include params.h.
> (compute_control_dep_chain): Add num_calls argument, return false
> if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
> num_calls to recursive call.
> (find_predicates): Change dep_chain into normal array,
> cur_chain into auto_vec, add num_calls
> variable and adjust compute_control_dep_chain caller.
> (find_def_preds): Likewise.
>
> --- gcc/params.def.jj   2014-01-09 19:09:47.0 +0100
> +++ gcc/params.def  2014-02-20 19:30:37.467597338 +0100
> @@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
>   "asan-use-after-return",
>   "Enable asan builtin functions protection",
>   1, 0, 1)
> +
> +DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
> + "uninit-control-dep-attempts",
> + "Maximum number of nested calls to search for control dependencies "
> + "during uninitialized variable analysis",
> + 1000, 1, 0)
>  /*
>
>  Local variables:
> --- gcc/tree-ssa-uninit.c.jj2014-02-04 01:35:58.0 +0100
> +++ gcc/tree-ssa-uninit.c   2014-02-20 19:31:14.198385817 +0100
> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
>  #include "hashtab.h"
>  #include "tree-pass.h"
>  #include "diagnostic-core.h"
> +#include "params.h"
>
>  /* This implements the pass that does predicate aware warning on uses of
> possibly uninitialized variables. The pass first collects the set of
> @@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb
>
>  /* Computes the control dependence chains (paths of edges)
> for DEP_BB up to the dominating basic block BB (the head node of a
> -   chain should be dominated by it).  CD_CHAINS is pointer to a
> -   dynamic array holding the result chains. CUR_CD_CHAIN is the current
> +   chain should be dominated by it).  CD_CHAINS is pointer to an
> +   array holding the result chains.  CUR_CD_CHAIN is the current
> chain being computed.  *NUM_CHAINS is total number of chains.  The
> function returns true if the information is successfully computed,
> return false if there is no control dependence or not computed.  */
> @@ -400,7 +401,8 @@ static bool
>  compute_control_dep_chain (basic_block bb, basic_block dep_bb,
> vec *cd_chains,
> size_t *num_chains,
> -   vec *cur_cd_chain)
> +  vec *cur_cd_chain,
> +  int *num_calls)
>  {
>edge_iterator ei;
>edge e;
> @@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
>if (EDGE_COUNT (bb->succs) < 2)
>  return false;
>
> +  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
> +return false;
> +  ++*num_calls;
> +
>/* Could use a set instead.  */
>cur_chain_len = cur_cd_chain->length ();
>if (cur_chain_len > MAX_CHAIN_LEN)
> @@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b
>
>/* Now check if DEP_BB is indirectly control dependent on BB.  */
>if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
> - num_chains, cur_cd_chain))
> +num_chains, cur_cd_chain, num_calls))
>  {
>found_cd_chain = true;
>break;
> @@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
>   basic_block use_bb)
>  {
>size_t num_chains = 0, i;
> -  vec *dep_chains = 0;
> -  vec cur_chain = vNULL;
> +  int num_calls = 0;
> +  vec dep_chains[MAX_NUM_CHAINS];
> +  auto_vec cur_ch
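The idea of the fix, a call counter shared by the whole recursion tree rather than a per-level or depth-only limit, can be sketched on a toy graph. Everything here is hypothetical: the adjacency matrix stands in for the CFG, and LIMIT plays the role of the uninit-control-dep-attempts param; this is not the tree-ssa-uninit.c code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical adjacency matrix standing in for the CFG.  */
static bool edge_p[MAX_NODES][MAX_NODES];

/* Return true if TARGET is reachable from BB in at most MAX_DEPTH edges.
   *NUM_CALLS is shared by every recursive call, so the total amount of
   work is bounded by LIMIT, not only the depth of the recursion.  */
static bool
reach_bounded (int bb, int target, int depth, int max_depth,
               int *num_calls, int limit)
{
  assert (num_calls != 0);
  if (bb == target)
    return true;
  if (depth >= max_depth)
    return false;

  /* Same shape as the patch: give up once the budget is exceeded.  */
  if (*num_calls > limit)
    return false;
  ++*num_calls;

  for (int succ = 0; succ < MAX_NODES; succ++)
    if (edge_p[bb][succ]
        && reach_bounded (succ, target, depth + 1, max_depth,
                          num_calls, limit))
      return true;
  return false;
}
```

Giving up returns "not found", which matches the patch's conservative behaviour: the uninit pass then simply has less predicate information.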

Re: [PATCH] Fix PR 60268

2014-02-21 Thread Vladimir Makarov

On 2/21/2014, 2:22 AM, Andrey Belevantsev wrote:

Hello,

While fixing PR 58960 I forgot about single-block regions and placed the
initialization of the new nr_regions_initial variable in the wrong
place. Thus for single-block regions we ended up with nr_regions = 1 and
nr_regions_initial = 0, which effectively turned off sched-pressure
immediately.  This is harmless on the usual scheduling path, but with
-flive-range-shrinkage it broke an assert that sched-pressure is in
the specific mode.

Fixed by placing the initialization properly at the end of
sched_rgn_init and also moving the check for sched_pressure != NONE
outside of the if statement in schedule_region as discussed in the PR
trail with Jakub.

Bootstrapped and tested on x86-64, ok?




Ok.  Thanks, Andrey.


2014-02-21  Andrey Belevantsev  

 PR rtl-optimization/60268
 * sched-rgn.c (haifa_find_rgns): Move the nr_regions_initial init
to ...
 (sched_rgn_init) ... here.
 (schedule_region): Check for SCHED_PRESSURE_NONE earlier.

testsuite/

2014-02-21  Andrey Belevantsev  

 PR rtl-optimization/60268
 * gcc.c-torture/compile/pr60268.c: New test.
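A hypothetical model of the ordering bug (not GCC code): the snapshot of the region count must be taken after every path that creates regions, including the single-block path. It assumes, as the description above implies, that pressure scheduling stays enabled only for regions numbered below nr_regions_initial.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sched-rgn.c globals.  */
struct rgn_state
{
  int nr_regions;
  int nr_regions_initial;
};

/* Buggy order: only the multi-block search (standing in for
   haifa_find_rgns) records the snapshot.  */
static void
init_buggy (struct rgn_state *s, bool single_block_p, int n)
{
  s->nr_regions_initial = 0;
  s->nr_regions = n;
  if (!single_block_p)
    s->nr_regions_initial = s->nr_regions;
}

/* Fixed order: the snapshot is taken once, after all regions exist
   (standing in for the end of sched_rgn_init).  */
static void
init_fixed (struct rgn_state *s, bool single_block_p, int n)
{
  (void) single_block_p; /* both paths now share the snapshot */
  s->nr_regions = n;
  s->nr_regions_initial = s->nr_regions;
}

static bool
pressure_enabled_p (const struct rgn_state *s, int rgn)
{
  assert (rgn >= 0 && rgn < s->nr_regions);
  return rgn < s->nr_regions_initial;
}
```

With a single-block function, the buggy order leaves nr_regions = 1 and nr_regions_initial = 0, so region 0 loses pressure scheduling immediately, which is the symptom described above.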




Re: [PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call

2014-02-21 Thread Jan Hubicka
> Hi,
> 
> in propagate_constants_accross_call we expect a thunk to have at least
> one parameter and thus an ipa-prop parameter descriptor.  However,
> when the callee comes from a CU that was compiled with -O0, there are
> no parameter descriptors and we fail an index checking assert.
> 
> This patch fixes it by bailing out early if there are no parameter
> descriptors because in that case there is nothing to do in that
> function anyway.  Bootstrap and testing in progress, OK for trunk if
> it passes?
> 
> Thanks,
> 
> Martin
> 
> 
> 2014-02-21  Martin Jambor  
> 
>   PR ipa/60266
>   * ipa-cp.c (propagate_constants_accross_call): Bail out early if
>   there are no parameter descriptors.

Actually, I have had a similar patch in my tree for a few days, since I hit
the problem while building libreoffice.

OK.
Honza
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 7d8bc05..4c9ab12 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
>args = IPA_EDGE_REF (cs);
>args_count = ipa_get_cs_argument_count (args);
>parms_count = ipa_get_param_count (callee_info);
> +  if (parms_count == 0)
> +return false;
>  
>/* If this call goes through a thunk we must not propagate to the first 
> (0th)
>   parameter.  However, we might need to uncover a thunk from below a 
> series
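The guard can be illustrated with a simplified, hypothetical stand-in for the callee's parameter information (plain ints instead of real lattices, and a copy instead of a lattice meet). The point is the ordering: the early return must come before the thunk handling that touches parameter 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LATTICE_BOTTOM (-1) /* hypothetical "value unknown" marker */

/* Hypothetical stand-in for the callee's descriptors; PARAM_COUNT is 0
   when the callee's unit was compiled at -O0.  */
struct callee_info
{
  int param_count;
  int *lattices;
};

static int *
get_lattice (struct callee_info *info, int i)
{
  assert (info->lattices != NULL);
  assert (i >= 0 && i < info->param_count); /* the index check that fired */
  return &info->lattices[i];
}

/* Propagate ARGS into CALLEE; return true if anything changed.  */
static bool
propagate_call (struct callee_info *callee, const int *args,
                int args_count, bool thunk_p)
{
  int parms_count = callee->param_count;
  bool changed = false;
  int i = 0;

  /* Early exit as in the patch: no descriptors, nothing to do.  */
  if (parms_count == 0)
    return false;

  if (thunk_p)
    {
      /* Never propagate into the first (0th) parameter of a thunk.  */
      int *lat = get_lattice (callee, 0);
      changed = *lat != LATTICE_BOTTOM;
      *lat = LATTICE_BOTTOM;
      i = 1;
    }

  for (; i < args_count && i < parms_count; i++)
    {
      int *lat = get_lattice (callee, i);
      if (*lat != args[i])
        {
          *lat = args[i];
          changed = true;
        }
    }
  return changed;
}
```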


Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Uros Bizjak
On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar  wrote:
>> > The latest version of the AVX512 spec
>> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
>> > has a few changes.
>> >
>> > 1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1.
>> > We can either support the new CPUID bit, or disable generation of
>> > PREFETCHWT1 without removing the code and enable it in 4.9.1 or a later version.
>> > I am not sure that adding a new -m flag and related stuff this late
>> > is a good idea. Should we still add it?
>>
>> Please submit the patch anyway. We can relax release constraints on
>> non-algorithmic patches a bit, weighing in the benefits of having a gcc
>> release that fully conforms to some published specification.
>>
> The patch below adds the -mprefetchwt1 flag and the corresponding
> TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction.
> Bootstraps/passes testing.  Ok for trunk?
>
> ChangeLog:
>
> 2014-02-21  Ilya Tocar  
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
> (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
> (ix86_handle_option): Handle OPT_mprefetchwt1.
> * config/i386/cpuid.h (bit_PREFETCHWT1): New.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> PREFETCHWT1 CPUID.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> OPTION_MASK_ISA_PREFETCHWT1.
> * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
> (PTA_PREFETCHWT1): New.
> (ix86_option_override_internal): Handle PTA_PREFETCHWT1.
> (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
> * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
>   New.
> * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
> (*prefetch_avx512pf__: Change into ...
>  (*prefetch_prefetchwt1_: This.
> * config/i386/i386.opt (mprefetchwt1): New.
> * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
> (_mm_prefetch): Handle intent to write.
> * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.
>
> And for tests:
>
> 2014-02-22  Ilya Tocar  
>
> * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
> * gcc.target/i386/prefetchwt1-1.c: New.
> * gcc.target/i386/sse-13.c: Update __builtin_prefetch.
> * gcc.target/i386/sse-23.c: Ditto.

Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and
g++.dg/other/i386-{2,3}, and the new options to
gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and
repost the patch.

> @@ -17867,8 +17867,8 @@
>   supported by SSE counterpart or the SSE prefetch is not available
>   (K6 machines).  Otherwise use SSE prefetch as it allows specifying
>   of locality.  */
> -  if (TARGET_AVX512PF && write)
> -operands[2] = const1_rtx;
> +  if (TARGET_PREFETCHWT1 && write)
> +operands[2] = GEN_INT (2);

you can use const2_rtx here.

Uros.
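From the user side, the path this patch touches is a write-intent `__builtin_prefetch`. The sketch below uses only the documented GCC builtin `__builtin_prefetch (addr, rw, locality)`, where rw = 1 requests a prefetch in preparation for a write and locality is 0..3. Which instruction is emitted (prefetchw, prefetchwt1, an SSE prefetch, or nothing) depends on the -m flags and on the i386.md `prefetch` expander quoted above; this sketch does not assert any particular mapping, and the prefetch distance is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16 /* hypothetical tuning value */

/* Multiply every element of A by K in place, prefetching ahead with
   write intent.  The prefetch is only a hint and never changes the
   result.  */
static void
scale_in_place (int *a, size_t n, int k)
{
  assert (a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (&a[i + PREFETCH_DISTANCE], 1, 2);
      a[i] *= k;
    }
}
```

Compiling this with and without -mprefetchwt1 and comparing the assembly is one way to see whether the new pattern is selected for a given locality value.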


[PATCH, rs6000] vec_sums must define all result vector elements

2014-02-21 Thread Bill Schmidt
Hi,

The little-endian implementation of vec_sums is incorrect.  I had
misread the specification and thought that the fields not containing the
result value were undefined, but in fact they are defined to contain
zero.  My previous implementation used a vector splat to copy the field
from BE element 3 to LE element 3.  The corrected implementation will
use a vector shift left to move the field and fill the remaining fields
with zeros.

When I fixed this, I discovered I had also missed a use of
gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct
instead.  This is fixed in this patch as well.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Bootstrap and regression test on
powerpc64-unknown-linux-gnu is in progress.  If no big-endian
regressions are found, is this ok for trunk?

Thanks,
Bill


gcc:

2014-02-21  Bill Schmidt  

* config/rs6000/altivec.md (altivec_vsumsws): Replace second
vspltw with vsldoi.
(reduc_uplus_v16qi): Use gen_altivec_vsumsws_direct instead of
gen_altivec_vsumsws.

gcc/testsuite:

2014-02-21  Bill Schmidt  

* gcc.dg/vmx/vsums.c: Check entire result vector.
* gcc.dg/vmx/vsums-be-order.c: Likewise.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 207967)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1651,7 +1651,7 @@
   if (VECTOR_ELT_ORDER_BIG)
 return "vsumsws %0,%1,%2";
   else
-return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvspltw %0,%3,3";
+return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
 }
   [(set_attr "type" "veccomplex")
(set (attr "length")
@@ -2483,7 +2539,7 @@
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws (dest, vtmp1, vzero));
+  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
   DONE;
 })
 
Index: gcc/testsuite/gcc.dg/vmx/vsums-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/vsums-be-order.c   (revision 207967)
+++ gcc/testsuite/gcc.dg/vmx/vsums-be-order.c   (working copy)
@@ -8,12 +8,13 @@ static void test()
 
 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   vector signed int vb = {128,0,0,0};
+  vector signed int evd = {136,0,0,0};
 #else
   vector signed int vb = {0,0,0,128};
+  vector signed int evd = {0,0,0,136};
 #endif
 
   vector signed int vd = vec_sums (va, vb);
-  signed int r = vec_extract (vd, 3);
 
-  check (r == 136, "sums");
+  check (vec_all_eq (vd, evd), "sums");
 }
Index: gcc/testsuite/gcc.dg/vmx/vsums.c
===
--- gcc/testsuite/gcc.dg/vmx/vsums.c(revision 207967)
+++ gcc/testsuite/gcc.dg/vmx/vsums.c(working copy)
@@ -4,9 +4,9 @@ static void test()
 {
   vector signed int va = {-7,11,-13,17};
   vector signed int vb = {0,0,0,128};
+  vector signed int evd = {0,0,0,136};
 
   vector signed int vd = vec_sums (va, vb);
-  signed int r = vec_extract (vd, 3);
 
-  check (r == 136, "sums");
+  check (vec_all_eq (vd, evd), "sums");
 }
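A scalar model may help when reviewing the new little-endian sequence. It assumes the big-endian word view of the register: vsumsws puts the saturated sum of all four words of A plus word 3 of B into word 3 and zeros words 0-2, and `vsldoi vD,vS,vS,12` rotates the words left by three. Both functions are illustrations, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vsumsws (big-endian word numbering): word 3 gets the
   saturated sum, words 0-2 are defined to be zero.  */
static void
vsumsws_model (const int32_t a[4], const int32_t b[4], int32_t d[4])
{
  int64_t s = b[3];
  for (int i = 0; i < 4; i++)
    s += a[i];
  if (s > INT32_MAX)
    s = INT32_MAX;
  else if (s < INT32_MIN)
    s = INT32_MIN;
  d[0] = d[1] = d[2] = 0;
  d[3] = (int32_t) s;
}

/* Model of "vsldoi vD,vS,vS,12": bytes 12..27 of vS||vS, i.e. words
   {3, 0, 1, 2} of vS, so the sum moves to word 0 followed by zeros.  */
static void
vsldoi12_model (const int32_t s[4], int32_t d[4])
{
  for (int i = 0; i < 4; i++)
    d[i] = s[(i + 3) % 4];
}
```

Big-endian word 0 is little-endian element 3, so the rotated result {136,0,0,0} corresponds to the LE vector {0,0,0,136}, the evd value vsums.c now checks. The old second vspltw instead copied the sum into every word.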




Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Ilya Verbin
2014-02-21 19:41 GMT+04:00 Bernd Schmidt :
> The problem is that ptx does not have a linker, so we cannot exactly
> reproduce what happens on the host side. We have to process all host .o
> files in one single invocation of ptx lto1, and produce a single ptx
> assembly file, with a single function/variable table, from there. Having
> functions and variables separated gives us at least a small chance that the
> order will match that found in the host tables if the host table is produced
> by linking multiple fragments.

If ptx lto1 processes all .o files in the order they were passed to
it, the resulting table should be consistent with the table produced
by the host's lto1.

> What kind of dependencies between liba and libb do you expect to be able to
> support on the target side? References to each other's functions and
> variables?

Yes, references to global variables and calls to functions, marked
with "omp declare target".


Re: [PATCH, rs6000] vec_sums must define all result vector elements

2014-02-21 Thread David Edelsohn
On Fri, Feb 21, 2014 at 12:56 PM, Bill Schmidt
 wrote:
> Hi,
>
> The little-endian implementation of vec_sums is incorrect.  I had
> misread the specification and thought that the fields not containing the
> result value were undefined, but in fact they are defined to contain
> zero.  My previous implementation used a vector splat to copy the field
> from BE element 3 to LE element 3.  The corrected implementation will
> use a vector shift left to move the field and fill the remaining fields
> with zeros.
>
> When I fixed this, I discovered I had also missed a use of
> gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct
> instead.  This is fixed in this patch as well.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Bootstrap and regression test on
> powerpc64-unknown-linux-gnu is in progress.  If no big-endian
> regressions are found, is this ok for trunk?

Okay.
Thanks, David


[PATCH, testsuite]: Add some missing avx512 options to g++.dg/other/i386-{2,3}.C and gcc.target/i386/sse-{12,13}.c

2014-02-21 Thread Uros Bizjak
Hello!

No additional testsuite failures.

2014-02-21  Uros Bizjak  

* g++.dg/other/i386-2.C (dg-options): Add -mavx512pf.
* g++.dg/other/i386-3.C (dg-options): Ditto.
* gcc.target/i386/sse-12.c (dg-options): Add -msha.
* gcc.target/i386/sse-13.c (dg-options): Add -mavx512er, -mavx512cd,
-mavx512pf and -msha.

Tested on x86_64-pc-linux-gnu and committed to mainline SVN.

Uros.
Index: g++.dg/other/i386-2.C
===
--- g++.dg/other/i386-2.C   (revision 208010)
+++ g++.dg/other/i386-2.C   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
Index: g++.dg/other/i386-3.C
===
--- g++.dg/other/i386-3.C   (revision 208010)
+++ g++.dg/other/i386-3.C   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
Index: gcc.target/i386/sse-12.c
===
--- gcc.target/i386/sse-12.c(revision 208010)
+++ gcc.target/i386/sse-12.c(working copy)
@@ -3,7 +3,7 @@
popcntintrin.h and mm_malloc.h are usable
with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512cd -mavx512er -mavx512pf" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 #include 
 
Index: gcc.target/i386/sse-13.c
===
--- gcc.target/i386/sse-13.c(revision 208010)
+++ gcc.target/i386/sse-13.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 #include 
 


Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2014-02-21 Thread David Edelsohn
On Thu, Feb 20, 2014 at 2:46 PM, Bill Schmidt
 wrote:
> Hi,
>
> For compatibility with the XL compilers, we need to support -maltivec=be
> for vec_ld, vec_ldl, vec_st, and vec_stl.  (A later patch will also
> handle vec_lde and vec_ste.)
>
> This is a much simpler patch than its size would indicate.  The original
> implementation of these built-ins treated them all as always loading and
> storing V4SI values, relying on subregs to adjust type mismatches.  For
> this work we need to have the true type so that we know how to reverse
> the order of vector elements.  So most of this patch is the busy-work of
> adding new built-in definitions for all the supported types (six types
> for each of the four built-ins).
>
> The real work is done in altivec.md to call altivec_expand_{lvx,stvx}_be
> for these built-ins when -maltivec=be is selected for a little endian
> target, and in rs6000.c where these functions are defined.  For the
> loads, the usual load insn is generated followed by a permute to reverse
> the order of the vector elements.  For the stores, the usual store insn
> is generated preceded by a permute to reverse the order of the vector
> elements.  A common routine swap_selector_for_mode is used to generate
> the permute control vector for the permute.
>
> There are 16 new tests, 4 for each built-in.  These cover the VMX and
> VSX built-ins for big-endian, little-endian, and little-endian with
> -maltivec=be.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  All the new tests pass in all endian environments.  Is
> this ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2014-02-20  Bill Schmidt  
>
> * config/rs6000/altivec.md (altivec_lvxl): Rename as
> *altivec_lvxl__internal and use VM2 iterator instead of
> V4SI.
> (altivec_lvxl_): New define_expand incorporating
> -maltivec=be semantics where needed.
> (altivec_lvx): Rename as *altivec_lvx__internal.
> (altivec_lvx_): New define_expand incorporating -maltivec=be
> semantics where needed.
> (altivec_stvx): Rename as *altivec_stvx__internal.
> (altivec_stvx_): New define_expand incorporating
> -maltivec=be semantics where needed.
> (altivec_stvxl): Rename as *altivec_stvxl__internal and use
> VM2 iterator instead of V4SI.
> (altivec_stvxl_): New define_expand incorporating
> -maltivec=be semantics where needed.
> * config/rs6000/rs6000-builtin.def: Add new built-in definitions
> LVXL_V2DF, LVXL_V2DI, LVXL_V4SF, LVXL_V4SI, LVXL_V8HI, LVXL_V16QI,
> LVX_V2DF, LVX_V2DI, LVX_V4SF, LVX_V4SI, LVX_V8HI, LVX_V16QI,
> STVX_V2DF, STVX_V2DI, STVX_V4SF, STVX_V4SI, STVX_V8HI, STVX_V16QI,
> STVXL_V2DF, STVXL_V2DI, STVXL_V4SF, STVXL_V4SI, STVXL_V8HI,
> STVXL_V16QI.
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Replace
> ALTIVEC_BUILTIN_LVX with ALTIVEC_BUILTIN_LVX_ throughout;
> similarly for ALTIVEC_BUILTIN_LVXL, ALTIVEC_BUILTIN_STVX, and
> ALTIVEC_BUILTIN_STVXL.
> * config/rs6000/rs6000-protos.h (altivec_expand_lvx_be): New
> prototype.
> (altivec_expand_stvx_be): Likewise.
> * config/rs6000/rs6000.c (swap_selector_for_mode): New function.
> (altivec_expand_lvx_be): Likewise.
> (altivec_expand_stvx_be): Likewise.
> (altivec_expand_builtin): Add cases for
> ALTIVEC_BUILTIN_STVX_, ALTIVEC_BUILTIN_STVXL_,
> ALTIVEC_BUILTIN_LVXL_, and ALTIVEC_BUILTIN_LVX_.
> (altivec_init_builtins): Add definitions for
> __builtin_altivec_lvxl_, __builtin_altivec_lvx_,
> __builtin_altivec_stvx_, and
> __builtin_altivec_stvxl_.
>
>
> gcc/testsuite:
>
> 2014-02-20  Bill Schmidt  
>
> * gcc.dg/vmx/ld.c: New test.
> * gcc.dg/vmx/ld-be-order.c: New test.
> * gcc.dg/vmx/ld-vsx.c: New test.
> * gcc.dg/vmx/ld-vsx-be-order.c: New test.
> * gcc.dg/vmx/ldl.c: New test.
> * gcc.dg/vmx/ldl-be-order.c: New test.
> * gcc.dg/vmx/ldl-vsx.c: New test.
> * gcc.dg/vmx/ldl-vsx-be-order.c: New test.
> * gcc.dg/vmx/st.c: New test.
> * gcc.dg/vmx/st-be-order.c: New test.
> * gcc.dg/vmx/st-vsx.c: New test.
> * gcc.dg/vmx/st-vsx-be-order.c: New test.
> * gcc.dg/vmx/stl.c: New test.
> * gcc.dg/vmx/stl-be-order.c: New test.
> * gcc.dg/vmx/stl-vsx.c: New test.
> * gcc.dg/vmx/stl-vsx-be-order.c: New test.

Okay.
Thanks, David
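The permute described in the quoted patch reverses the order of vector elements while keeping the bytes of each element in order. The sketch below builds such a byte selector for a 16-byte vector in plain C and applies it vperm-style. It is a hypothetical model of the intended effect only; the constants GCC's swap_selector_for_mode actually emits may differ, since vperm selectors need adjusting on little-endian.

```c
#include <assert.h>

/* Build a 16-byte selector that reverses the order of ELT_SIZE-byte
   elements while preserving the byte order inside each element.  */
static void
reverse_elements_selector (unsigned elt_size, unsigned char sel[16])
{
  assert (elt_size == 1 || elt_size == 2 || elt_size == 4 || elt_size == 8);
  unsigned nelt = 16 / elt_size;
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned elt = i / elt_size;  /* which element byte I belongs to */
      unsigned byte = i % elt_size; /* position inside that element */
      sel[i] = (unsigned char) ((nelt - 1 - elt) * elt_size + byte);
    }
}

/* Apply SEL to SRC the way a single-input byte permute would.  */
static void
apply_selector (const unsigned char src[16], const unsigned char sel[16],
                unsigned char dst[16])
{
  for (unsigned i = 0; i < 16; i++)
    dst[i] = src[sel[i]];
}
```

For V4SI the selector is {12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3}; for V16QI it is a plain byte reversal. A load followed by this permute (or the permute followed by a store) gives the element order that -maltivec=be asks for.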


[GOMP4] gimple_code_is_oacc -> is_gimple_omp_oacc_specifically (was: [PATCH 4/6] [GOMP4] OpenACC 1.0+ support in fortran front-end)

2014-02-21 Thread Thomas Schwinge
Hi!

On Tue, 11 Feb 2014 17:51:15 +0100, I wrote:
> On Fri, 31 Jan 2014 15:16:07 +0400, Ilmir Usmanov  
> wrote:
> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c
> > @@ -1491,6 +1491,18 @@ fixup_child_record_type (omp_context *ctx)
> >TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
> >  }
> >  
> > +static bool
> > +gimple_code_is_oacc (const_gimple g)
> > +{
> > +  switch (gimple_code (g))
> > +{
> > +case GIMPLE_OACC_PARALLEL:
> > +  return true;
> > +default:
> > +  return false;
> > +}
> > +}
> > +
> 
> Eventually, this will probably end up next to CASE_GIMPLE_OMP/is_gimple_omp
> in gimple.h (or the latter be reworked to be able to ask for is_omp vs.
> is_oacc vs. is_omp_or_oacc), but it's fine to do that once we actually
> need it in files other than just omp-low.c, and once we support more
> GIMPLE_OACC_* codes.

Ah, well, I'm now in the situation that I need to do such a check in
another file, so I have applied the following to gomp-4_0-branch in
r208013.  I have also renamed the function to
is_gimple_omp_oacc_specifically, building on the existing is_gimple_omp
name.  (Don't worry about the unwieldy name, as all this is to disappear
as the development progresses.)
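The shape of the moved predicate, a switch over statement codes guarded by an assertion that the statement belongs to the OMP family, can be shown on a toy enum. The enum and both functions are hypothetical stand-ins; GCC's real enum gimple_code and is_gimple_omp cover many more codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of statement codes.  */
enum stmt_code
{
  STMT_ASSIGN,
  STMT_OMP_PARALLEL,
  STMT_OMP_TARGET,
  STMT_OACC_PARALLEL
};

/* Stand-in for is_gimple_omp: any OpenMP or OpenACC construct.  */
static bool
is_omp_like (enum stmt_code c)
{
  switch (c)
    {
    case STMT_OMP_PARALLEL:
    case STMT_OMP_TARGET:
    case STMT_OACC_PARALLEL:
      return true;
    default:
      return false;
    }
}

/* Stand-in for is_gimple_omp_oacc_specifically: only meaningful on
   statements that already passed the broader check.  */
static bool
is_oacc_specifically (enum stmt_code c)
{
  assert (is_omp_like (c));
  switch (c)
    {
    case STMT_OACC_PARALLEL:
      return true;
    default:
      return false;
    }
}
```

Keeping the assertion means callers cannot accidentally use the narrower predicate on non-OMP statements, which makes the later clean-up mentioned in the TODO easier to audit.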

commit 25aab0dd39a57661e9d7f3a5f405f4647977b9de
Author: tschwinge 
Date:   Fri Feb 21 19:26:01 2014 +

gimple_code_is_oacc -> is_gimple_omp_oacc_specifically.

gcc/
* omp-low.c (gimple_code_is_oacc): Move to...
* gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
users, and also use it in more places where currently we've only
been checking for GIMPLE_OACC_PARALLEL.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208013 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 14d8805..1ce952d 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,10 @@
+2014-02-21  Thomas Schwinge  
+
+   * omp-low.c (gimple_code_is_oacc): Move to...
+   * gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
+   users, and also use it in more places where currently we've only
+   been checking for GIMPLE_OACC_PARALLEL.
+
 2014-02-18  Thomas Schwinge  
 
* omp-low.c (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2): Handle
diff --git gcc/gimple.h gcc/gimple.h
index 5b5a0ee..0d250ef 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -5670,6 +5670,25 @@ is_gimple_omp (const_gimple stmt)
 }
 }
 
+/* Return true if STMT is any of the OpenACC types specifically.
+
+   TODO: This function should go away eventually, once all its callers have
+   either been fixed, changed into more specific checks, or verified to not
+   need any special handling for OpenACC.  */
+
+static inline bool
+is_gimple_omp_oacc_specifically (const_gimple stmt)
+{
+  gcc_assert (is_gimple_omp (stmt));
+  switch (gimple_code (stmt))
+{
+case GIMPLE_OACC_PARALLEL:
+  return true;
+default:
+  return false;
+}
+}
+
 
 /* Returns TRUE if statement G is a GIMPLE_NOP.  */
 
diff --git gcc/omp-low.c gcc/omp-low.c
index 110ea63..b975dad 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -863,7 +863,7 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
  when we know the value is not accessible from an outer scope.  */
   if (shared_ctx)
 {
-  gcc_assert (gimple_code (shared_ctx->stmt) != GIMPLE_OACC_PARALLEL);
+  gcc_assert (!is_gimple_omp_oacc_specifically (shared_ctx->stmt));
 
   /* ??? Trivially accessible from anywhere.  But why would we even
 be passing an address in this case?  Should we simply assert
@@ -1006,7 +1006,7 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx)
 static tree
 build_outer_var_ref (tree var, omp_context *ctx)
 {
-  gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL);
+  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
 
   tree x;
 
@@ -1072,7 +1072,7 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
   gcc_assert ((mask & 2) == 0 || !ctx->sfield_map
  || !splay_tree_lookup (ctx->sfield_map, (splay_tree_key) var));
   gcc_assert ((mask & 3) == 3
- || gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL);
+ || !is_gimple_omp_oacc_specifically (ctx->stmt));
 
   type = TREE_TYPE (var);
   if (mask & 4)
@@ -1491,18 +1491,6 @@ fixup_child_record_type (omp_context *ctx)
   TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
 }
 
-static bool
-gimple_code_is_oacc (const_gimple g)
-{
-  switch (gimple_code (g))
-{
-case GIMPLE_OACC_PARALLEL:
-  return true;
-default:
-  return false;
-}
-}
-
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
specified by CLAUSES.  */
 
@@ -1519,7 +1507,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
   switch (OMP_CLAUSE_CODE (c))
{
case OMP_CLAUSE_PRIVATE:
- gcc_assert (gimple_code (ctx->stmt) != GIMPLE_O

Re: [gomp4 3/6] Initial support for OpenACC memory mapping semantics.

2014-02-21 Thread Thomas Schwinge
Hi!

On Tue, 14 Jan 2014 16:10:05 +0100, I wrote:
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -86,7 +92,11 @@ enum omp_region_type
>ORT_UNTIED_TASK = 5,
>ORT_TEAMS = 8,
>ORT_TARGET_DATA = 16,
> -  ORT_TARGET = 32
> +  ORT_TARGET = 32,
> +
> +  /* Flags for ORT_TARGET.  */
> +  /* Default to GOVD_MAP_FORCE for implicit mappings in this region.  */
> +  ORT_TARGET_MAP_FORCE = 64
>  };

Continuing on that route, I have now applied the following to
gomp-4_0-branch in r208014:

commit dee2965ae547af0bc90d618e7fa40fbf2f5292b4
Author: tschwinge 
Date:   Fri Feb 21 19:45:12 2014 +

Gimplification: New flag ORT_TARGET_OFFLOAD replaces !ORT_TARGET_DATA.

gcc/
* gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
Update all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208014 138bc75d-0d04-0410-961f-82ee72b054a4
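The new flag scheme can be summarized with two predicates. The enumerator values are taken from the gimplify.c hunk below; the enum is only a subset, and the helper names are hypothetical (the patch open-codes these tests at each use site).

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of enum omp_region_type, values as in the patch.  */
enum ort_sketch
{
  ORT_TEAMS = 8,
  ORT_TARGET = 16,
  ORT_TARGET_OFFLOAD = 32,  /* flag for ORT_TARGET */
  ORT_TARGET_MAP_FORCE = 64 /* flag for ORT_TARGET */
};

/* A target region that is prepared for offloading (the old ORT_TARGET).  */
static bool
offload_region_p (int region_type)
{
  return (region_type & ORT_TARGET) && (region_type & ORT_TARGET_OFFLOAD);
}

/* A target region without the offload flag (the old ORT_TARGET_DATA).  */
static bool
target_data_region_p (int region_type)
{
  return (region_type & ORT_TARGET) && !(region_type & ORT_TARGET_OFFLOAD);
}
```

Because ORT_TARGET_MAP_FORCE is an independent bit, ORT_TARGET | ORT_TARGET_OFFLOAD | ORT_TARGET_MAP_FORCE still counts as an offload region, which is why the use sites test individual bits instead of comparing region_type for equality.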

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1ce952d..bf8ec96 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2014-02-21  Thomas Schwinge  
 
+   * gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
+   flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
+   Update all users.
+
* omp-low.c (gimple_code_is_oacc): Move to...
* gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
users, and also use it in more places where currently we've only
diff --git gcc/gimplify.c gcc/gimplify.c
index 51a1b73..9aa9301c 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -100,10 +100,11 @@ enum omp_region_type
   ORT_TASK = 4,
   ORT_UNTIED_TASK = 5,
   ORT_TEAMS = 8,
-  ORT_TARGET_DATA = 16,
-  ORT_TARGET = 32,
+  ORT_TARGET = 16,
 
   /* Flags for ORT_TARGET.  */
+  /* Prepare this region for offloading.  */
+  ORT_TARGET_OFFLOAD = 32,
   /* Default to GOVD_MAP_FORCE for implicit mappings in this region.  */
   ORT_TARGET_MAP_FORCE = 64
 };
@@ -2202,7 +2203,7 @@ gimplify_arg (tree *arg_p, gimple_seq *pre_p, location_t call_location)
   return gimplify_expr (arg_p, pre_p, NULL, test, fb);
 }
 
-/* Don't fold STMT inside ORT_TARGET, because it can break code by adding decl
+/* Don't fold inside offloading regsion: it can break code by adding decl
references that weren't in the source.  We'll do it during omplower pass
instead.  */
 
@@ -2211,7 +2212,8 @@ maybe_fold_stmt (gimple_stmt_iterator *gsi)
 {
   struct gimplify_omp_ctx *ctx;
   for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
-if (ctx->region_type & ORT_TARGET)
+if (ctx->region_type & ORT_TARGET
+   && ctx->region_type & ORT_TARGET_OFFLOAD)
   return false;
   return fold_stmt (gsi);
 }
@@ -5388,10 +5390,12 @@ omp_firstprivatize_variable (struct gimplify_omp_ctx *ctx, tree decl)
return;
}
   else if (ctx->region_type & ORT_TARGET)
-   omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY);
+   {
+ if (ctx->region_type & ORT_TARGET_OFFLOAD)
+   omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY);
+   }
   else if (ctx->region_type != ORT_WORKSHARE
-  && ctx->region_type != ORT_SIMD
-  && ctx->region_type != ORT_TARGET_DATA)
+  && ctx->region_type != ORT_SIMD)
omp_add_variable (ctx, decl, GOVD_FIRSTPRIVATE);
 
   ctx = ctx->outer_context;
@@ -5580,7 +5584,8 @@ omp_notice_threadprivate_variable (struct gimplify_omp_ctx *ctx, tree decl,
   struct gimplify_omp_ctx *octx;
 
   for (octx = ctx; octx; octx = octx->outer_context)
-if (octx->region_type & ORT_TARGET)
+if ((octx->region_type & ORT_TARGET)
+   && (octx->region_type & ORT_TARGET_OFFLOAD))
   {
gcc_assert (!(octx->region_type & ORT_TARGET_MAP_FORCE));
 
@@ -5643,7 +5648,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 }
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
-  if (ctx->region_type & ORT_TARGET)
+  if ((ctx->region_type & ORT_TARGET)
+  && (ctx->region_type & ORT_TARGET_OFFLOAD))
 {
   unsigned map_force;
   if (ctx->region_type & ORT_TARGET_MAP_FORCE)
@@ -5695,7 +5701,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 
   if (ctx->region_type == ORT_WORKSHARE
  || ctx->region_type == ORT_SIMD
- || ctx->region_type == ORT_TARGET_DATA)
+ || ((ctx->region_type & ORT_TARGET)
+ && !(ctx->region_type & ORT_TARGET_OFFLOAD)))
goto do_outer;
 
   /* ??? Some compiler-generated variables (like SAVE_EXPRs) could be
@@ -5746,7 +5753,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
{
  splay_tree_node n2;
 
- if ((octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)) != 0)
+ if (octx->region_type & ORT_TARGET)
continue;
  n2
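With this change ORT_TARGET_OFFLOAD is a flag on ORT_TARGET, and a target region without that flag takes over the role of the removed ORT_TARGET_DATA. A minimal standalone sketch of that encoding follows, using the values from the patched enum (only a subset of the region types is reproduced). The predicate names `is_offload_region` and `is_target_data_region` are hypothetical and do not exist in gimplify.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of enum omp_region_type after the patch; values copied from
   the hunk.  ORT_TARGET_DATA no longer exists.  */
enum omp_region_type
{
  ORT_TASK = 4,
  ORT_UNTIED_TASK = 5,
  ORT_TEAMS = 8,
  ORT_TARGET = 16,

  /* Flags for ORT_TARGET.  */
  ORT_TARGET_OFFLOAD = 32,
  ORT_TARGET_MAP_FORCE = 64
};

/* Hypothetical helper: a target region that will be offloaded.
   This is the condition the patch now tests in maybe_fold_stmt and
   omp_notice_variable.  */
bool
is_offload_region (int region_type)
{
  return (region_type & ORT_TARGET) && (region_type & ORT_TARGET_OFFLOAD);
}

/* Hypothetical helper: a target region without the offload flag,
   which plays the part of the former ORT_TARGET_DATA.  */
bool
is_target_data_region (int region_type)
{
  return (region_type & ORT_TARGET) && !(region_type & ORT_TARGET_OFFLOAD);
}
```

The extra flag bits (such as ORT_TARGET_MAP_FORCE) do not affect either predicate, which is why the patch can test the two bits independently.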

[gomp4 1/3] Clarify to/from/map clauses usage in context of GF_OMP_TARGET_KIND_UPDATE.

2014-02-21 Thread Thomas Schwinge
From: tschwinge 

gcc/
* omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208015 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c  | 25 +
 2 files changed, 28 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index bf8ec96..bd46f2e 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-02-21  Thomas Schwinge  
 
+   * omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
+   of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.
+
* gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
Update all users.
diff --git gcc/omp-low.c gcc/omp-low.c
index 9fef4c1..bca4599 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1630,6 +1630,26 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FROM:
  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
case OMP_CLAUSE_MAP:
+ switch (OMP_CLAUSE_CODE (c))
+   {
+   case OMP_CLAUSE_TO:
+   case OMP_CLAUSE_FROM:
+ /* The to and from clauses are only ever seen with OpenMP target
+update constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+ && (gimple_omp_target_kind (ctx->stmt)
+ == GF_OMP_TARGET_KIND_UPDATE));
+ break;
+   case OMP_CLAUSE_MAP:
+ /* The map clause is never seen with OpenMP target update
+constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+ || (gimple_omp_target_kind (ctx->stmt)
+ != GF_OMP_TARGET_KIND_UPDATE));
+ break;
+   default:
+ gcc_unreachable ();
+   }
  if (ctx->outer)
scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
  decl = OMP_CLAUSE_DECL (c);
@@ -1799,6 +1819,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  break;
 
case OMP_CLAUSE_MAP:
+ /* The map clause is never seen with OpenMP target update
+constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+ || (gimple_omp_target_kind (ctx->stmt)
+ != GF_OMP_TARGET_KIND_UPDATE));
  if (!gimple_code_is_oacc (ctx->stmt)
  && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_DATA)
break;
-- 
1.8.1.1
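The two new assertions in scan_sharing_clauses encode one rule: to and from clauses appear only on an OpenMP target update construct, and a map clause never does. As a hedged sketch, the rule can be written as a standalone predicate; the enums and `clause_allowed_p` below are hypothetical stand-ins for the OMP_CLAUSE_* codes and GF_OMP_TARGET_KIND_* values, not GCC API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for OMP_CLAUSE_TO/FROM/MAP.  */
enum clause_code { CLAUSE_TO, CLAUSE_FROM, CLAUSE_MAP };

/* Hypothetical stand-ins for GF_OMP_TARGET_KIND_*.  */
enum target_kind { TARGET_KIND_REGION, TARGET_KIND_DATA, TARGET_KIND_UPDATE };

/* Return true if clause CODE may appear on a construct.  IS_TARGET says
   whether the construct is a GIMPLE_OMP_TARGET; KIND only matters when
   it is.  Mirrors the conditions asserted in the patch.  */
bool
clause_allowed_p (enum clause_code code, bool is_target, enum target_kind kind)
{
  bool is_update = is_target && kind == TARGET_KIND_UPDATE;

  switch (code)
    {
    case CLAUSE_TO:
    case CLAUSE_FROM:
      /* to/from are only seen with target update.  */
      return is_update;
    case CLAUSE_MAP:
      /* map is never seen with target update.  */
      return !is_update;
    }
  return false;
}
```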



[gomp4 2/3] OpenACC data construct implementation in terms of GF_OMP_TARGET_KIND_OACC_DATA.

2014-02-21 Thread Thomas Schwinge
From: tschwinge 

gcc/
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA.
(is_gimple_omp_oacc_specifically): Handle it.
* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
* gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise.
* omp-low.c (scan_sharing_clauses, scan_omp_target)
(expand_omp_target, lower_omp_target, lower_omp_1): Likewise.
* gimple.def (GIMPLE_OMP_TARGET): Update comment.
* gimple.c (gimple_build_omp_target): Likewise.
(gimple_copy): Catch unimplemented case.
* tree-inline.c (remap_gimple_stmt): Likewise.
* tree-nested.c (convert_nonlocal_reference_stmt)
(convert_local_reference_stmt, convert_gimple_call): Likewise.
* oacc-builtins.def (BUILT_IN_GOACC_DATA_START)
(BUILT_IN_GOACC_DATA_END): New builtins.
libgomp/
* libgomp.map (GOACC_2.0): Add GOACC_data_end, GOACC_data_start.
* libgomp_g.h (GOACC_data_start, GOACC_data_end): New prototypes.
* oacc-parallel.c (GOACC_data_start, GOACC_data_end): New
functions.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208016 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp|  15 ++
 gcc/gimple-pretty-print.c |   3 ++
 gcc/gimple.c  |   4 +-
 gcc/gimple.def|   1 +
 gcc/gimple.h  |   9 
 gcc/gimplify.c|  33 +---
 gcc/oacc-builtins.def |   6 ++-
 gcc/omp-low.c | 132 --
 gcc/tree-inline.c |   1 +
 gcc/tree-nested.c |   3 ++
 libgomp/ChangeLog.gomp|   7 +++
 libgomp/libgomp.map   |   2 +
 libgomp/libgomp_g.h   |   3 ++
 libgomp/oacc-parallel.c   |  34 +++-
 14 files changed, 213 insertions(+), 40 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index bd46f2e..824ec94 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,20 @@
 2014-02-21  Thomas Schwinge  
 
+   * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA.
+   (is_gimple_omp_oacc_specifically): Handle it.
+   * gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
+   * gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise.
+   * omp-low.c (scan_sharing_clauses, scan_omp_target)
+   (expand_omp_target, lower_omp_target, lower_omp_1): Likewise.
+   * gimple.def (GIMPLE_OMP_TARGET): Update comment.
+   * gimple.c (gimple_build_omp_target): Likewise.
+   (gimple_copy): Catch unimplemented case.
+   * tree-inline.c (remap_gimple_stmt): Likewise.
+   * tree-nested.c (convert_nonlocal_reference_stmt)
+   (convert_local_reference_stmt, convert_gimple_call): Likewise.
+   * oacc-builtins.def (BUILT_IN_GOACC_DATA_START)
+   (BUILT_IN_GOACC_DATA_END): New builtins.
+
* omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.
 
diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c
index 91a3eb2..ad9369c 100644
--- gcc/gimple-pretty-print.c
+++ gcc/gimple-pretty-print.c
@@ -1289,6 +1289,9 @@ dump_gimple_omp_target (pretty_printer *buffer, gimple gs, int spc, int flags)
 case GF_OMP_TARGET_KIND_UPDATE:
   kind = " update";
   break;
+case GF_OMP_TARGET_KIND_OACC_DATA:
+  kind = " oacc_data";
+  break;
 default:
   gcc_unreachable ();
 }
diff --git gcc/gimple.c gcc/gimple.c
index 2a967aa..30561b1 100644
--- gcc/gimple.c
+++ gcc/gimple.c
@@ -1051,7 +1051,8 @@ gimple_build_omp_single (gimple_seq body, tree clauses)
 /* Build a GIMPLE_OMP_TARGET statement.
 
BODY is the sequence of statements that will be executed.
-   CLAUSES are any of the OMP target construct's clauses.  */
+   KIND is the kind of target region.
+   CLAUSES are any of the construct's clauses.  */
 
 gimple
 gimple_build_omp_target (gimple_seq body, int kind, tree clauses)
@@ -1747,6 +1748,7 @@ gimple_copy (gimple stmt)
case GIMPLE_OMP_TASKGROUP:
case GIMPLE_OMP_ORDERED:
copy_omp_body:
+ gcc_assert (!is_gimple_omp_oacc_specifically (stmt));
  new_seq = gimple_seq_copy (gimple_omp_body (stmt));
  gimple_omp_set_body (copy, new_seq);
  break;
diff --git gcc/gimple.def gcc/gimple.def
index 2b78c06..ce800bd 100644
--- gcc/gimple.def
+++ gcc/gimple.def
@@ -360,6 +360,7 @@ DEFGSCODE(GIMPLE_OMP_SECTIONS_SWITCH, "gimple_omp_sections_switch", GSS_BASE)
 DEFGSCODE(GIMPLE_OMP_SINGLE, "gimple_omp_single", GSS_OMP_SINGLE_LAYOUT)
 
 /* GIMPLE_OMP_TARGET  represents
+   #pragma acc data
#pragma omp target {,data,update}
BODY is the sequence of statements inside the target construct
(NULL for target update).
diff --git gcc/gimple.h gcc/gimple.h
index 0d250ef..b4ee9fa 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -102,6 +102,7 @@ enum gf_mask {
 GF_OMP_TARGET_KIND_REGION  = 0 << 0,
 GF_OMP_TARG

[gomp4 3/3] OpenACC data construct support in the C front end.

2014-02-21 Thread Thomas Schwinge
From: tschwinge 

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add "data".
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA.
gcc/c/
* c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition.
(c_parser_oacc_data): New function.
(c_parser_omp_construct): Handle PRAGMA_OACC_DATA.
* c-tree.h (c_finish_oacc_data): New prototype.
* c-typeck.c (c_finish_oacc_data): New function.
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC
data construct.
* c-c++-common/goacc/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/parallel-fail-1.c: Rename to...
* c-c++-common/goacc/clauses-fail.c: ... this new file.  Extend
for OpenACC data construct.
* c-c++-common/goacc/data-1.c: New file.
libgomp/
* testsuite/libgomp.oacc-c/data-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208017 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c-family/ChangeLog.gomp|   5 +
 gcc/c-family/c-pragma.c|   1 +
 gcc/c-family/c-pragma.h|   1 +
 gcc/c/ChangeLog.gomp   |   8 +
 gcc/c/c-parser.c   |  42 +
 gcc/c/c-tree.h |   1 +
 gcc/c/c-typeck.c   |  19 +++
 gcc/testsuite/ChangeLog.gomp   |  10 ++
 .../c-c++-common/goacc-gomp/nesting-fail-1.c   |  92 ++-
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c|   9 ++
 gcc/testsuite/c-c++-common/goacc/data-1.c  |   6 +
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  18 ++-
 gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c |   6 -
 libgomp/ChangeLog.gomp |   2 +
 libgomp/testsuite/libgomp.oacc-c/data-1.c  | 170 +
 15 files changed, 380 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/clauses-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/data-1.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/data-1.c

diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp
index e092d53..3da377f 100644
--- gcc/c-family/ChangeLog.gomp
+++ gcc/c-family/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-02-21  Thomas Schwinge  
+
+   * c-pragma.c (oacc_pragmas): Add "data".
+   * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA.
+
 2014-01-28  Thomas Schwinge  
 
* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_COPY,
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index f69486a..08374aa 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1169,6 +1169,7 @@ static vec registered_pp_pragmas;
 
 struct omp_pragma_def { const char *name; unsigned int id; };
 static const struct omp_pragma_def oacc_pragmas[] = {
+  { "data", PRAGMA_OACC_DATA },
   { "parallel", PRAGMA_OACC_PARALLEL },
 };
 static const struct omp_pragma_def omp_pragmas[] = {
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index 1ea5b1d..d092f9f 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef enum pragma_kind {
   PRAGMA_NONE = 0,
 
+  PRAGMA_OACC_DATA,
   PRAGMA_OACC_PARALLEL,
   PRAGMA_OMP_ATOMIC,
   PRAGMA_OMP_BARRIER,
diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp
index b199957..9b95725 100644
--- gcc/c/ChangeLog.gomp
+++ gcc/c/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-02-21  Thomas Schwinge  
+
+   * c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition.
+   (c_parser_oacc_data): New function.
+   (c_parser_omp_construct): Handle PRAGMA_OACC_DATA.
+   * c-tree.h (c_finish_oacc_data): New prototype.
+   * c-typeck.c (c_finish_oacc_data): New function.
+
 2014-02-17  Thomas Schwinge  
 
* c-parser.c (c_parser_omp_clause_name): Accept pcopy, pcopyin,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 7850eab..4643722 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -4776,10 +4776,14 @@ c_parser_label (c_parser *parser)
 
openacc-construct:
  parallel-construct
+ data-construct
 
parallel-construct:
  parallel-directive structured-block
 
+   data-construct:
+ data-directive structured-block
+
OpenMP:
 
statement:
@@ -11362,6 +11366,41 @@ c_parser_omp_structured_block (c_parser *parser)
 }
 
 /* OpenACC 2.0:
+   # pragma acc data oacc-data-clause[optseq] new-line
+ structured-block
+
+   LOC is the location of the #pragma token.
+*/
+
+#define OACC_DATA_CLAUSE_MASK  \
+   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COPY) \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COPYIN)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_C
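OACC_DATA_CLAUSE_MASK follows the usual front-end idiom of one bit per accepted clause, built by shifting OMP_CLAUSE_MASK_1 by the clause's PRAGMA_OMP_CLAUSE_* index and OR-ing the results. The message is truncated before the full mask, so the clause set below is illustrative only. The enum, the `clause_mask` typedef (a plain uint64_t here) and `clause_permitted` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clause indices standing in for PRAGMA_OMP_CLAUSE_*.  */
enum clause_id { CLAUSE_COPY, CLAUSE_COPYIN, CLAUSE_COPYOUT, CLAUSE_NUM_GANGS };

typedef uint64_t clause_mask;
#define CLAUSE_MASK_1 ((clause_mask) 1)

/* One bit per clause accepted by the (illustrative) data construct.  */
#define DATA_CLAUSE_MASK                  \
  ((CLAUSE_MASK_1 << CLAUSE_COPY)         \
   | (CLAUSE_MASK_1 << CLAUSE_COPYIN)     \
   | (CLAUSE_MASK_1 << CLAUSE_COPYOUT))

/* Return nonzero if clause ID is permitted by MASK.  */
int
clause_permitted (clause_mask mask, enum clause_id id)
{
  return (mask & (CLAUSE_MASK_1 << id)) != 0;
}
```

A clause parser can then reject any clause whose bit is absent from the construct's mask with a single AND.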

patch to fix PR60298

2014-02-21 Thread Vladimir Makarov

The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60298

The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 208023.

2014-02-21  Vladimir Makarov  

PR target/60298
* lra-constraints.c (inherit_reload_reg): Use lra_emit_move
instead of emit_move_insn.
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 207787)
+++ lra-constraints.c   (working copy)
@@ -4473,9 +4473,9 @@ inherit_reload_reg (bool def_p, int orig
rclass, "inheritance");
   start_sequence ();
   if (def_p)
-emit_move_insn (original_reg, new_reg);
+lra_emit_move (original_reg, new_reg);
   else
-emit_move_insn (new_reg, original_reg);
+lra_emit_move (new_reg, original_reg);
   new_insns = get_insns ();
   end_sequence ();
   if (NEXT_INSN (new_insns) != NULL_RTX)


C++ PATCH for c++/60241 (ICE with specialization of member class template)

2014-02-21 Thread Jason Merrill
We already have the code to reassign instances to the appropriate 
template when we see a specialization of a partial instantiation of a 
member template, but it wasn't firing properly in this case, for two 
reasons:


1) We were attaching the instances to the most general template and then 
looking for them on the partial instantiation.

2) We were only reassigning explicit specializations.

Tested x86_64-pc-linux-gnu, applying to trunk.  It should be appropriate 
for backporting later if it doesn't cause trouble.
commit 667bae7d1bfeea4e881cf6236d8679fc0c11c49e
Author: Jason Merrill 
Date:   Fri Feb 21 13:51:18 2014 -0500

	PR c++/60241
	* pt.c (lookup_template_class_1): Update DECL_TEMPLATE_INSTANTIATIONS
	of the partial instantiation, not the most general template.
	(maybe_process_partial_specialization): Reassign everything on
	that list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a394441..91a8840 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -914,11 +914,13 @@ maybe_process_partial_specialization (tree type)
 	   t; t = TREE_CHAIN (t))
 	{
 	  tree inst = TREE_VALUE (t);
-	  if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst))
+	  if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst)
+		  || !COMPLETE_OR_OPEN_TYPE_P (inst))
 		{
 		  /* We already have a full specialization of this partial
-		 instantiation.  Reassign it to the new member
-		 specialization template.  */
+		 instantiation, or a full specialization has been
+		 looked up but not instantiated.  Reassign it to the
+		 new member specialization template.  */
 		  spec_entry elt;
 		  spec_entry *entry;
 		  void **slot;
@@ -937,7 +939,7 @@ maybe_process_partial_specialization (tree type)
 		  *entry = elt;
 		  *slot = entry;
 		}
-	  else if (COMPLETE_OR_OPEN_TYPE_P (inst))
+	  else
 		/* But if we've had an implicit instantiation, that's a
 		   problem ([temp.expl.spec]/6).  */
 		error ("specialization %qT after instantiation %qT",
@@ -7596,7 +7598,7 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 	}
 
   /* Let's consider the explicit specialization of a member
- of a class template specialization that is implicitely instantiated,
+ of a class template specialization that is implicitly instantiated,
 	 e.g.:
 	 template
 	 struct S
@@ -7694,9 +7696,9 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 
   /* Note this use of the partial instantiation so we can check it
 	 later in maybe_process_partial_specialization.  */
-  DECL_TEMPLATE_INSTANTIATIONS (templ)
+  DECL_TEMPLATE_INSTANTIATIONS (found)
 	= tree_cons (arglist, t,
-		 DECL_TEMPLATE_INSTANTIATIONS (templ));
+		 DECL_TEMPLATE_INSTANTIATIONS (found));
 
   if (TREE_CODE (template_type) == ENUMERAL_TYPE && !is_dependent_type
 	  && !DECL_ALIAS_TEMPLATE_P (gen_tmpl))
diff --git a/gcc/testsuite/g++.dg/template/memclass5.C b/gcc/testsuite/g++.dg/template/memclass5.C
new file mode 100644
index 000..eb32f13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/memclass5.C
@@ -0,0 +1,26 @@
+// PR c++/60241
+
+template 
+struct x
+{
+template 
+struct y
+{
+typedef T result2;
+};
+
+typedef y zy;
+};
+
+template<>
+template
+struct x::y
+{
+typedef double result2;
+};
+
+int main()
+{
+x::zy::result2 xxx;
+x::y::result2 xxx2;
+}


C++ PATCH for c++/59347 (ICE with ill-formed typedef in template)

2014-02-21 Thread Jason Merrill
An earlier patch of mine changed the compiler to retain erroneous 
declarations to provide better error-recovery behavior.  But that's 
causing problems with nested typedefs, so let's not bother in that case.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 85cffc1cc3fe706d61a417cf6a1139f546a458e9
Author: Jason Merrill 
Date:   Fri Feb 21 13:59:45 2014 -0500

	PR c++/59347
	* pt.c (tsubst_decl) [TYPE_DECL]: Don't try to instantiate an
	erroneous typedef.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 91a8840..2dc5f32 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10824,6 +10824,9 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	tree type = NULL_TREE;
 	bool local_p;
 
+	if (TREE_TYPE (t) == error_mark_node)
+	  RETURN (error_mark_node);
+
 	if (TREE_CODE (t) == TYPE_DECL
 	&& t == TYPE_MAIN_DECL (TREE_TYPE (t)))
 	  {
diff --git a/gcc/testsuite/g++.dg/template/typedef41.C b/gcc/testsuite/g++.dg/template/typedef41.C
new file mode 100644
index 000..dc25518
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/typedef41.C
@@ -0,0 +1,8 @@
+// PR c++/59347
+
+template struct A
+{
+  typedef int ::X;		// { dg-error "" }
+};
+
+A<0> a;


C++ PATCH for c++/60187 (ICE with bare parameter pack in enum-base)

2014-02-21 Thread Jason Merrill

Yet another place where we need to check for bare parameter packs.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit 4e02d1498063b3ffa31d3fe35682b0c94667360c
Author: Jason Merrill 
Date:   Fri Feb 21 14:03:36 2014 -0500

	PR c++/60187
	* parser.c (cp_parser_enum_specifier): Call
	check_for_bare_parameter_packs.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6f19ae2..7bbdf90 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15376,7 +15376,8 @@ cp_parser_enum_specifier (cp_parser* parser)
 {
   underlying_type = grokdeclarator (NULL, &type_specifiers, TYPENAME,
 /*initialized=*/0, NULL);
-  if (underlying_type == error_mark_node)
+  if (underlying_type == error_mark_node
+	  || check_for_bare_parameter_packs (underlying_type))
 underlying_type = NULL_TREE;
 }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum_base2.C b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
new file mode 100644
index 000..8c6a901
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
@@ -0,0 +1,9 @@
+// PR c++/60187
+// { dg-require-effective-target c++11 }
+
+template struct A
+{
+  enum E : T {};		// { dg-error "parameter pack" }
+};
+
+A a;


C++ PATCH for c++/60186 (ICE with constexpr and init-list in template)

2014-02-21 Thread Jason Merrill
My earlier massage_init_elt patch neglected to call 
fold_non_dependent_expr before maybe_constant_init.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit b77241e3be8b3eb4247d07e2f2967cbb585e08bc
Author: Jason Merrill 
Date:   Fri Feb 21 14:37:17 2014 -0500

	PR c++/60186
	* typeck2.c (massage_init_elt): Call fold_non_dependent_expr_sfinae.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 546b83f..8877286 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1131,7 +1131,10 @@ massage_init_elt (tree type, tree init, tsubst_flags_t complain)
 init = TARGET_EXPR_INITIAL (init);
   /* When we defer constant folding within a statement, we may want to
  defer this folding as well.  */
-  init = maybe_constant_init (init);
+  tree t = fold_non_dependent_expr_sfinae (init, complain);
+  t = maybe_constant_value (t);
+  if (TREE_CONSTANT (t))
+init = t;
   return init;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
new file mode 100644
index 000..6fea82f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
@@ -0,0 +1,7 @@
+// PR c++/60186
+// { dg-require-effective-target c++11 }
+
+template void foo(int i)
+{
+  constexpr int a[] = { i };	// { dg-error "" }
+}
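The revised massage_init_elt folds the initializer into a temporary and adopts the result only when it is TREE_CONSTANT; otherwise the original initializer is returned unchanged. The same keep-only-if-constant pattern is sketched below on a toy expression type. The `expr` struct, `fold` and `massage_init` are hypothetical and much simpler than GCC trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression: a constant, a variable reference, or a sum.  */
enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_PLUS };

struct expr
{
  enum expr_kind kind;
  int value;               /* Valid for EXPR_CONST.  */
  const struct expr *lhs;  /* Valid for EXPR_PLUS.  */
  const struct expr *rhs;
};

/* Try to evaluate E.  Return true and store the value in *OUT only if
   E consists entirely of constants.  */
static bool
fold (const struct expr *e, int *out)
{
  int a, b;

  switch (e->kind)
    {
    case EXPR_CONST:
      *out = e->value;
      return true;
    case EXPR_PLUS:
      if (fold (e->lhs, &a) && fold (e->rhs, &b))
        {
          *out = a + b;
          return true;
        }
      return false;
    default:
      /* Variables are never constant.  */
      return false;
    }
}

/* Use the folded value only if folding produced a constant; otherwise
   hand back the original initializer untouched.  */
const struct expr *
massage_init (const struct expr *init, struct expr *scratch)
{
  int v;

  if (fold (init, &v))
    {
      scratch->kind = EXPR_CONST;
      scratch->value = v;
      scratch->lhs = scratch->rhs = NULL;
      return scratch;
    }
  return init;
}
```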


Re: C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)

2014-02-21 Thread Jason Merrill

On 02/21/2014 09:10 AM, Jason Merrill wrote:

While parsing the template parameter list for a lambda, we've already
pushed into the closure class but haven't created the op()
FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer
of op() breaks.  Avoid the ICE by not trying to capture 'this' when
parsing a parameter list.


On second thought, I'd rather not depend on the parsing state here, 
since we don't always update current_binding_level during template 
instantiation.  So let's check for the actual problem instead.


Tested x86_64-pc-linux-gnu, applying to trunk.


commit 5ca06118071f28b060b751415d18f8af4968a0a4
Author: Jason Merrill 
Date:   Fri Feb 21 15:06:47 2014 -0500

	PR c++/60252
	* lambda.c (maybe_resolve_dummy): Check lambda_function rather
	than current_binding_level.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 7fe235b..277dec6 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,10 +749,8 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
   && current_class_type
   && LAMBDA_TYPE_P (current_class_type)
-  && DERIVED_FROM_P (type, current_nonlambda_class_type ())
-  /* If we get here while parsing the parameter list of a lambda, it
-	 will fail, so don't even try (c++/60252).  */
-  && current_binding_level->kind != sk_function_parms)
+  && lambda_function (current_class_type)
+  && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
 {
   /* In a lambda, need to go through 'this' capture.  */
   tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);


C++ PATCH for c++/60185 (ICE with invalid default arg in template)

2014-02-21 Thread Jason Merrill
To avoid problems trying to resolve an invalid use of 'this' before 
diagnosing it later, let's do the same thing we do in 
tsubst_default_argument, namely clear current_class_{ptr,ref}.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f1051ca23020746350bacff3c499b2a9d1ec0dff
Author: Jason Merrill 
Date:   Fri Feb 21 15:08:28 2014 -0500

	PR c++/60185
	* parser.c (cp_parser_default_argument): Clear
	current_class_ptr/current_class_ref like tsubst_default_argument.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7bbdf90..47a67c4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18633,8 +18633,24 @@ cp_parser_default_argument (cp_parser *parser, bool template_parm_p)
   /* Parse the assignment-expression.  */
   if (template_parm_p)
 push_deferring_access_checks (dk_no_deferred);
+  tree saved_class_ptr = NULL_TREE;
+  tree saved_class_ref = NULL_TREE;
+  /* The "this" pointer is not valid in a default argument.  */
+  if (cfun)
+{
+  saved_class_ptr = current_class_ptr;
+  cp_function_chain->x_current_class_ptr = NULL_TREE;
+  saved_class_ref = current_class_ref;
+  cp_function_chain->x_current_class_ref = NULL_TREE;
+}
   default_argument
 = cp_parser_initializer (parser, &is_direct_init, &non_constant_p);
+  /* Restore the "this" pointer.  */
+  if (cfun)
+{
+  cp_function_chain->x_current_class_ptr = saved_class_ptr;
+  cp_function_chain->x_current_class_ref = saved_class_ref;
+}
   if (BRACE_ENCLOSED_INITIALIZER_P (default_argument))
 maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
   if (template_parm_p)
diff --git a/gcc/testsuite/g++.dg/overload/defarg5.C b/gcc/testsuite/g++.dg/overload/defarg5.C
index 06ea6bf..d022b0c 100644
--- a/gcc/testsuite/g++.dg/overload/defarg5.C
+++ b/gcc/testsuite/g++.dg/overload/defarg5.C
@@ -2,6 +2,6 @@
 
 struct A
 {
-  int i;
-  A() { void foo(int=i); }	// { dg-error "this" }
+  int i;			// { dg-message "" }
+  A() { void foo(int=i); }	// { dg-error "" }
 };
diff --git a/gcc/testsuite/g++.dg/template/defarg17.C b/gcc/testsuite/g++.dg/template/defarg17.C
new file mode 100644
index 000..38d68d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/defarg17.C
@@ -0,0 +1,9 @@
+// PR c++/60185
+
+template struct A
+{
+  int i;			// { dg-message "" }
+  A() { void foo(int=i); }	// { dg-error "" }
+};
+
+A<0> a;
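The parser fix saves current_class_ptr and current_class_ref, clears them while the default argument is parsed, and restores them afterwards, with both steps guarded by cfun. A minimal sketch of that save/clear/restore sequence follows; the `function_state` struct stands in for cp_function_chain, and the sub-parser that merely reports whether "this" was visible is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the x_current_class_ptr and
   x_current_class_ref fields of cp_function_chain.  */
struct function_state
{
  const char *class_ptr;
  const char *class_ref;
};

/* Hypothetical sub-parser: returns 1 if "this" was visible to it.  */
static int
parse_default_arg (const struct function_state *fs)
{
  return fs->class_ptr != NULL;
}

/* Hide "this" while parsing a default argument, then restore it.
   FS may be NULL, mirroring the "if (cfun)" guards in the patch.  */
int
parse_default_arg_without_this (struct function_state *fs)
{
  const char *saved_ptr = NULL;
  const char *saved_ref = NULL;
  int saw_this = 0;

  if (fs)
    {
      saved_ptr = fs->class_ptr;
      fs->class_ptr = NULL;
      saved_ref = fs->class_ref;
      fs->class_ref = NULL;
      saw_this = parse_default_arg (fs);
      /* Restore the "this" pointer for the rest of the function.  */
      fs->class_ptr = saved_ptr;
      fs->class_ref = saved_ref;
    }
  return saw_this;
}
```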


C++ PATCH for c++/60108 (ICE with defaulted virtual in template)

2014-02-21 Thread Jason Merrill
emit_associated_thunks expects DECL_INTERFACE_KNOWN to be set, but we 
weren't setting it in this case (as opposed to the case where the 
destructor is implicitly declared) because it has 
DECL_TEMPLATE_INSTANTIATION set.  Fixed by checking for 
DECL_DEFAULTED_FN as well.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit 670511e83f8bb5df8dd87bfbd3b8a9625ba9963f
Author: Jason Merrill 
Date:   Fri Feb 21 15:37:45 2014 -0500

	PR c++/60108
	* semantics.c (expand_or_defer_fn_1): Check DECL_DEFAULTED_FN.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 6f32496..85d6807 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3986,7 +3986,7 @@ expand_or_defer_fn_1 (tree fn)
 	 linkage of all functions, and as that causes writes to
 	 the data mapped in from the PCH file, it's advantageous
 	 to mark the functions at this point.  */
-	  if (!DECL_IMPLICIT_INSTANTIATION (fn))
+	  if (!DECL_IMPLICIT_INSTANTIATION (fn) || DECL_DEFAULTED_FN (fn))
 	{
 	  /* This function must have external linkage, as
 		 otherwise DECL_INTERFACE_KNOWN would have been
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted48.C b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
new file mode 100644
index 000..727afc5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
@@ -0,0 +1,17 @@
+// PR c++/60108
+// { dg-require-effective-target c++11 }
+
+template struct A
+{
+  virtual ~A();
+};
+
+template struct B : A<0>, A<1>
+{
+  ~B() = default;
+};
+
+struct C : B
+{
+  C() {}
+};


Re: [google gcc-4_8] not split bb for machine dependent builtins

2014-02-21 Thread Xinliang David Li
Ok. I expect this to be submitted to trunk later as well.

David

On Fri, Feb 21, 2014 at 2:08 PM, Rong Xu  wrote:
> Hi,
>
> For builtins without nothrow attributes, we currently split the bb by adding
> a fake edge to func_exit when instrumenting profile counters. While this is safe,
> the resulting control flow and additional counters drastically increase the
> compile time for programs with lots of builtin calls.
> This patch suppresses the adding of the fake edges for machine dependent
> builtins.
>
> This is for google branch only.
>
> Tested with SPEC2006, google internal benchmarks and bootstrap.
>
> OK to commit?
>
> Thanks,
>
> -Rong
>
>
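The patch itself is not quoted in this reply, so the following is only a hedged reading of the description: calls to nothrow builtins already avoid the fake edge, and machine-dependent builtins would now avoid it too. The `call_info` struct and `needs_fake_edge` are hypothetical; GCC's real check also handles cases such as returns-twice calls and fork, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a call's callee.  */
enum builtin_class { NOT_BUILTIN, BUILTIN_NORMAL, BUILTIN_MD };

/* Hypothetical summary of a call site; GCC derives equivalent facts
   from the call's fndecl and ECF_* flags.  */
struct call_info
{
  enum builtin_class cls;
  bool nothrow;
};

/* Return true if profile instrumentation would split the block with a
   fake edge to the function exit after this call.  */
bool
needs_fake_edge (const struct call_info *call)
{
  /* Proposed: machine-dependent builtins never get a fake edge.  */
  if (call->cls == BUILTIN_MD)
    return false;
  /* Existing behaviour: nothrow builtins do not need one.  */
  if (call->cls == BUILTIN_NORMAL && call->nothrow)
    return false;
  return true;
}
```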


C++ PATCH for c++/58170 (ICE with alias template)

2014-02-21 Thread Jason Merrill
There's no reason why we wouldn't check for dependent scopes when 
parsing the target of an alias declaration, and indeed not doing so led 
to the ICE here.


The rest of the patch improves the diagnostic for this testcase (and 
some others).


Tested x86_64-pc-linux-gnu, applying to trunk.  Also applying the 
cp_parser_type_name hunk to 4.8.
commit 21f4a8a5550498513e1235239b69aa5bc537687b
Author: Jason Merrill 
Date:   Fri Feb 21 16:58:21 2014 -0500

	PR c++/58170
	* parser.c (cp_parser_type_name): Always check dependency.
	(cp_parser_type_specifier_seq): Call
	cp_parser_parse_and_diagnose_invalid_type_name.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 47a67c4..1e98032 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14763,7 +14763,7 @@ cp_parser_type_name (cp_parser* parser)
 	 instantiation of an alias template...  */
   type_decl = cp_parser_template_id (parser,
 	 /*template_keyword_p=*/false,
-	 /*check_dependency_p=*/false,
+	 /*check_dependency_p=*/true,
 	 none_type,
 	 /*is_declaration=*/false);
   /* Note that this must be an instantiation of an alias template
@@ -18083,7 +18083,16 @@ cp_parser_type_specifier_seq (cp_parser* parser,
 	 type-specifier-seq at all.  */
 	  if (!seen_type_specifier)
 	{
-	  cp_parser_error (parser, "expected type-specifier");
+	  /* Set in_declarator_p to avoid skipping to the semicolon.  */
+	  int in_decl = parser->in_declarator_p;
+	  parser->in_declarator_p = true;
+
+	  if (cp_parser_uncommitted_to_tentative_parse_p (parser)
+		  || !cp_parser_parse_and_diagnose_invalid_type_name (parser))
+		cp_parser_error (parser, "expected type-specifier");
+
+	  parser->in_declarator_p = in_decl;
+
 	  type_specifier_seq->type = error_mark_node;
 	  return;
 	}
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
new file mode 100644
index 000..f8bff78
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
@@ -0,0 +1,33 @@
+// PR c++/58170
+// { dg-require-effective-target c++11 }
+// { dg-prune-output "not declared" }
+// { dg-prune-output "expected" }
+
+template 
+struct base {
+  template 
+  struct derived;
+};
+
+template 
+template 
+struct base::derived : public base {
+};
+
+// This (wrong?) alias declaration provokes the crash.
+template 
+using alias = base::derived; // { dg-error "template|typename" }
+
+// This one works:
+// template 
+// using alias = typename base::template derived;
+
+template 
+void f() {
+  alias m{};
+  (void) m;
+}
+
+int main() {
+  f();
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/error8.C b/gcc/testsuite/g++.dg/cpp0x/error8.C
index cc4f877..a992077 100644
--- a/gcc/testsuite/g++.dg/cpp0x/error8.C
+++ b/gcc/testsuite/g++.dg/cpp0x/error8.C
@@ -3,5 +3,5 @@
 
 struct A
 {
-  int* p = new foo; // { dg-error "16:expected type-specifier" }
+  int* p = new foo; // { dg-error "16:foo. does not name a type" }
 };
diff --git a/gcc/testsuite/g++.dg/cpp0x/override4.C b/gcc/testsuite/g++.dg/cpp0x/override4.C
index aec5c2c..695f9a3 100644
--- a/gcc/testsuite/g++.dg/cpp0x/override4.C
+++ b/gcc/testsuite/g++.dg/cpp0x/override4.C
@@ -16,12 +16,12 @@ struct B2
 
 struct B3
 {
-  virtual auto f() -> final void; // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void; // { dg-error "type" }
 };
 
 struct B4
 {
-  virtual auto f() -> final void {} // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void {} // { dg-error "type" }
 };
 
 struct D : B
@@ -36,10 +36,10 @@ struct D2 : B
 
 struct D3 : B
 {
-  virtual auto g() -> override void; // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void; // { dg-error "type" }
 };
 
 struct D4 : B
 {
-  virtual auto g() -> override void {} // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void {} // { dg-error "type" }
 };
diff --git a/gcc/testsuite/g++.dg/ext/underlying_type1.C b/gcc/testsuite/g++.dg/ext/underlying_type1.C
index a8f68d3..999cd9f 100644
--- a/gcc/testsuite/g++.dg/ext/underlying_type1.C
+++ b/gcc/testsuite/g++.dg/ext/underlying_type1.C
@@ -8,7 +8,7 @@ template
   { typedef __underlying_type(T) type; }; // { dg-error "not an enumeration" }
 
 __underlying_type(int) i1; // { dg-error "not an enumeration|invalid" }
-__underlying_type(A)   i2; // { dg-error "expected" }
+__underlying_type(A)   i2; // { dg-error "expected|type" }
 __underlying_type(B)   i3; // { dg-error "not an enumeration|invalid" }
 __underlying_type(U)   i4; // { dg-error "not an enumeration|invalid" }
 
diff --git a/gcc/testsuite/g++.dg/parse/crash48.C b/gcc/testsuite/g++.dg/parse/crash48.C
index 4541548..020ddf0 100644
--- a/gcc/testsuite/g++.dg/parse/crash48.C
+++ b/gcc/testsuite/g++.dg/parse/crash48.C
@@ -5,5 +5,5 @@ void
 foo (bool b)
 {
   if (b)
-try { throw 0; } catch (X) { }	// { dg-error "expected type-specifier before" }
+try { throw 0; } catch (X) { }	// { dg-error "type