Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Jakub Jelinek
On Mon, Apr 18, 2011 at 03:33:18PM -0700, Jason Merrill wrote:
> On 04/18/2011 02:40 PM, Jakub Jelinek wrote:
> >If TREE_BINFO has BV_VCALL_INDEX set, this needs to be dynamically
> >adjusted, but none of the callers are prepared to handle that.
> 
> Well, it means that we do dynamic adjustment at runtime.  If we're
> able to do devirtualization, we should be able to figure out the
> right offset as well, just not in 4.6.

Sure, but how exactly?  If BV_VCALL_INDEX is non-NULL,
I guess we need to find for the given binfo corresponding rtti_binfo
that was (or would be if vtable isn't emitted in the current TU)
used in build_rtti_vtbl_entries to compute the difference between
BINFO_OFFSETs:
  offset = size_diffop_loc (input_location,
BINFO_OFFSET (vid->rtti_binfo), BINFO_OFFSET (b));
That is the value actually added on top of the fixed delta, right?
Or do you think it should just add to the delta the value read from the
vcall_offset from the vtable?

But first of all PR48674 needs solving on the trunk.
I think cgraph thunk nodes have fixed_offset field so it is easy to figure
out that adjustment, but for vcall_offset adjustment the same applies here,
is there a way to find out what value will be added, or does it need to be
dynamic?  And what kind of info do we need to find that out.

Jakub


FDO usage: -Wcoverage-mismatch should not ignore -Wno-error

2011-04-19 Thread Xinliang David Li
-Wcoverage-mismatch is enabled by default, and the warning is promoted
to error by default. However in the current implementation -Wno-error
can not demote the error back to warning. The patch was ported from
one contributed by Neil.

OK for trunk after regression testing?


2011-04-18  Neil Vachharajani  

* flags.c:  New flag variable.
* opts.c (common_handle_options): Set flag_werror_set.
* opts-global.c (decode_options): Delay Werror decision
for Wcoverage-mismatch util after options are parsed.

The following test case can be added, but the test harness does not
like the extra warnings -- how can they be pruned?

Thanks,

David

/* { dg-options "-O2 -Wcoverage-mismatch -Wno-error" } */

int __attribute__((noinline)) bar (void)
{
}

#ifdef _PROFILE_USE
int foo (int i)
{
  if (i)
bar ();
  else
   bar ();
  return 0;
}
#else
int foo (int i)
{
  if (i)
bar ();
  return 0;
}
#endif

int main(int argc, char **argv)
{
  foo (argc);
  return 0;
}


wc.p
Description: Binary data


Re: [Patch, fortran] Use xcalloc instead of gfc_getmem

2011-04-19 Thread Janne Blomqvist
On Mon, Apr 18, 2011 at 23:50, Steve Kargl
 wrote:
> On Mon, Apr 18, 2011 at 11:41:33PM +0300, Janne Blomqvist wrote:
>> Hi,
>>
>> the attached patch replaces gfc_getmem with calls to xcalloc (from
>> libiberty). Apart from reducing duplicated code, calloc is better than
>> malloc + memset, as the allocator knows that the kernel always gives
>> out zeroed pages so in some cases it can avoid memset'in the area.
>>
>> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>>
>
> OK.

Thanks; I'll commit it later today when I get home.

> (I suppose we'll never have sizeof(char) != 1).

Indeed, ISO C defines sizeof(char) == 1. See 6.5.3.4 (page 80) in n1256.pdf.

-- 
Janne Blomqvist


Re: [Patch, fortran] Use xcalloc instead of gfc_getmem

2011-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2011 at 10:41:33AM +0300, Janne Blomqvist wrote:
> On Mon, Apr 18, 2011 at 23:50, Steve Kargl
>  wrote:
> > On Mon, Apr 18, 2011 at 11:41:33PM +0300, Janne Blomqvist wrote:
> >> Hi,
> >>
> >> the attached patch replaces gfc_getmem with calls to xcalloc (from
> >> libiberty). Apart from reducing duplicated code, calloc is better than
> >> malloc + memset, as the allocator knows that the kernel always gives
> >> out zeroed pages so in some cases it can avoid memset'in the area.
> >>
> >> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
> >>
> >
> > OK.
> 
> Thanks; I'll commit it later today when I get home.

Please don't.  You should be using macros from libiberty.h like rest
of gcc instead.  So instead of xcalloc use XCNEW, XCNEWVEC or XCNEWVAR
and remove the casts.

Jakub


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 2:15 AM, Jan Hubicka  wrote:
>> > Hi!
>> >
>> > If TREE_BINFO has BV_VCALL_INDEX set, this needs to be dynamically
>> > adjusted, but none of the callers are prepared to handle that.
>> > Thus, this patch makes devirtualization give up in those cases.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, trunk and 4.6.
>> > On the trunk the testcase ICEs before and after the patch in some new 
>> > callgraph
>> > checking (added today or so, Honza?), on the branch it works just fine.
>>
>> The ICE for me is:
>> jakub.C:77:1: error: a call to thunk improperly represented in the call 
>> graph:
>> # VUSE <.MEM_11>
>> D.2319_3 = D::_ZTv0_n24_NK1D1mEv (&q.D.2159);
>>
>> void test()/26(14) @0x777066f0 (asm: _Z4testv) availability:available 
>> analyzed needed reachable body externally_visible finalized
>>   called by: int main()/28 (1.00 per call) (can throw external)
>>   calls: void bar(int)/1 (1.00 per call) void bar(int)/1 (1.00 per call) 
>> virtual int D::m() const/15 (1.00 per call) D::D(const A&)/14 (1.00 per call)
>>   References:
>>   Refering this function:
>>   has 3 outgoing edges for indirect calls.
>>
>> this is not my snaity check, but Martins and I think it means that somehow 
>> your
>> change is ineffective on mainline and we still devirtualize.
>> Martin, I declare this problem yours.
>
> Actually what happens here is that CCP devirtualize by propagating the
> constructors and due to Richard's new code to drop OBJ_TYPE_REF we finally get
> a direct call.  This is all good and desirable.
>
> I think good solution would be to fold further and inline the thunk 
> adjustment,
> just like the type based devirtualization does.  Even once I get far enough
> with my cgraph cleanuping project to make cgraph represent thunks nicely, we
> would win if in these cases ccp and other passes simply inlined the this 
> adjustment,
> like we do with type based devirtualization already.
> Martin, I guess it is matter of looking up the thunk info by associated 
> cgraph node
> alias and extending fold_stmts of passes that now drop the OBJ_TYPE_REF 
> wrappers?

Huh.  No, I don't think we want to do any "inlining" as part of
folding.  At least not if it
is a correctness issue (is it?).  Why does the inliner not simply
inline the thunk function
body?

Richard.

> Honza
>


Re: FDO usage: -Wcoverage-mismatch should not ignore -Wno-error

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 9:13 AM, Xinliang David Li  wrote:
> -Wcoverage-mismatch is enabled by default, and the warning is promoted
> to error by default. However in the current implementation -Wno-error
> can not demote the error back to warning. The patch was ported from
> one contributed by Neil.
>
> OK for trunk after regression testing?

I am sure there is a better way to achieve this, like making
Werror=coverage-mismatch
the default.  Joseph?

Richard.

>
> 2011-04-18  Neil Vachharajani  
>
>    * flags.c:  New flag variable.
>    * opts.c (common_handle_options): Set flag_werror_set.
>    * opts-global.c (decode_options): Delay Werror decision
>    for Wcoverage-mismatch util after options are parsed.
>
> The following test case can be added, but the test harness does not
> like the extra warnings -- how can they be pruned?
>
> Thanks,
>
> David
>
> /* { dg-options "-O2 -Wcoverage-mismatch -Wno-error" } */
>
> int __attribute__((noinline)) bar (void)
> {
> }
>
> #ifdef _PROFILE_USE
> int foo (int i)
> {
>  if (i)
>    bar ();
>  else
>   bar ();
>  return 0;
> }
> #else
> int foo (int i)
> {
>  if (i)
>    bar ();
>  return 0;
> }
> #endif
>
> int main(int argc, char **argv)
> {
>  foo (argc);
>  return 0;
> }
>


[vms] committed: add wrappers for ld and ar

2011-04-19 Thread Tristan Gingold
Hi,

as the native vms linker ('link') doesn't follow at all the unix convention, we 
need to convert and massage the command line
before invoking the native linker.  The easiest and least intrusive way is the 
use of a wrapper.  It deals with command line length
limitation, filename, extension, library search, misc options and support of 
dwarf on alpha vms.

Alongside, we also have a wrapper for ar (librarian on vms).

They are automatically built unless gcc is configured with --with-gnu-ld.  They 
can also be compiled on unix to unix to check
for errors, also the binary is useless.

I have also slightly cleaned up config.gcc to group common VMS definitions.

Tested by doing cross-builds for both alpha and ia64 vms.

Committed on trunk.

Tristan.

gcc/
2011-04-19  Tristan Gingold  

* config.gcc (-*-*-*vms): Added.
(alpha64-dec-*vms*,alpha*-dec-*vms*, ia64-hp-*vms*): Common
definitions moved.
* config/vms/vms-ld.c: New file.
* config/vms/vms-ar.c: New file.
* config/vms/t-vmsnative: New file.


Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 172694)
+++ gcc/config.gcc  (working copy)
@@ -668,6 +668,16 @@
 *-*-solaris2*)
   extra_options="${extra_options} sol2.opt"
   ;;
+*-*-*vms*)
+  extra_options="${extra_options} vms/vms.opt"
+  xmake_file=vms/x-vms
+  tmake_file="vms/t-vms"
+  if test x$gnu_ld != xyes; then
+# Build wrappers for native case.
+extra_programs="ld\$(exeext) ar\$(exeext)"
+tmake_file="$tmake_file vms/t-vmsnative"
+  fi
+  ;;
 *-*-vxworks*)
   tmake_file=t-vxworks
   xm_defines=POSIX
@@ -752,20 +762,12 @@
 alpha64-dec-*vms*)
tm_file="${tm_file} alpha/vms.h alpha/vms64.h"
xm_file="alpha/xm-vms.h vms/xm-vms64.h"
-   tmake_file="alpha/t-alpha vms/t-vms alpha/t-vms alpha/t-ieee"
-   xmake_file=vms/x-vms
-   exeext=.exe
-   install_headers_dir=install-headers-cp
-   extra_options="${extra_options} vms/vms.opt"
+   tmake_file="${tmake_file} alpha/t-alpha vms/t-vms64 alpha/t-vms 
alpha/t-ieee"
;;
 alpha*-dec-*vms*)
tm_file="${tm_file} alpha/vms.h"
xm_file="alpha/xm-vms.h"
-   tmake_file="alpha/t-alpha vms/t-vms alpha/t-vms alpha/t-ieee"
-   xmake_file=vms/x-vms
-   exeext=.exe
-   install_headers_dir=install-headers-cp
-   extra_options="${extra_options} vms/vms.opt"
+   tmake_file="${tmake_file} alpha/t-alpha alpha/t-vms alpha/t-ieee"
;;
 arm-wrs-vxworks)
tm_file="elfos.h arm/elf.h arm/aout.h ${tm_file} vx-common.h vxworks.h 
arm/vxworks.h"
@@ -1582,16 +1584,13 @@
 ia64-hp-*vms*)
tm_file="${tm_file} elfos.h ia64/sysv4.h ia64/elf.h ia64/vms.h 
ia64/vms64.h"
xm_file="vms/xm-vms.h vms/xm-vms64.h"
-   tmake_file="vms/t-vms ia64/t-ia64 ia64/t-vms"
-   xmake_file=vms/x-vms
+   tmake_file="${tmake_file} vms/t-vms64 ia64/t-ia64 ia64/t-vms"
target_cpu_default="0"
if test x$gas = xyes
then
target_cpu_default="${target_cpu_default}|MASK_GNU_AS"
fi
-   exeext=.exe
-   install_headers_dir=install-headers-cp
-   extra_options="${extra_options} vms/vms.opt ia64/vms.opt"
+   extra_options="${extra_options} ia64/vms.opt"
;;
 iq2000*-*-elf*)
 tm_file="elfos.h newlib-stdint.h iq2000/iq2000.h"
Index: gcc/config/vms/vms-ld.c
===
--- gcc/config/vms/vms-ld.c (revision 0)
+++ gcc/config/vms/vms-ld.c (revision 0)
@@ -0,0 +1,968 @@
+/* VMS linker wrapper.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by AdaCore
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* This program is a wrapper around the VMS linker.
+   It translates Unix style command line options into corresponding
+   VMS style qualifiers and then spawns the VMS linker.
+
+   It is possible to build this program on UNIX but only for the purpose of
+   checking for errors.  */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "libiberty.h"
+#include 
+#include 
+
+/* Macro for logicals.  */
+#define LNM__STRING 2
+#define LNM_C_NAMLENGTH 255
+#define PSL_C_SUPER 2
+#define PSL_C_USER 3
+
+/* Local variable declarations.  */
+static int ld_nocall_debug = 0;
+static int ld_mkthreads = 0;
+static int ld_upcalls = 0;
+
+/* verbose 

Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Georg-Johann Lay
Anatoly Sokolov schrieb:
> Hi.
>>
>> +/* To track if code will use .bss and/or .data */
>> +static int avr_need_clear_bss_p = 0;
>> +static int avr_need_copy_data_p = 0;
> 
> Change type avr_need_clear_bss_p and avr_need_copy_data_p vars to bool.
> 
>> [ASM_OUTPUT_COMMON]
> 
> Use ASM_OUTPUT_ALIGNED_DECL_COMMON macro instead of ASM_OUTPUT_COMMON.
> 
>> [ASM_OUTPUT_LOCAL]
> 
> Use ASM_OUTPUT_ALIGNED_DECL_LOCAL macro instead of ASM_OUTPUT_LOCAL.
> 
> Anatoly.


Voila


PR target/18145

* config/avr/avr.h (TARGET_ASM_INIT_SECTIONS): Delete.
(ASM_OUTPUT_COMMON, ASM_OUTPUT_LOCAL): Delete.
(ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
(ASM_OUTPUT_ALIGNED_DECL_LOCAL): Define.
(TARGET_ASM_NAMED_SECTION): Change to avr_asm_named_section.

* config/avr/avr-protos.h (avr_asm_output_aligned_decl_local,
avr_asm_output_aligned_decl_common): New prototypes.

* config/avr/avr.c (TARGET_ASM_INIT_SECTIONS): Define.
(avr_asm_named_section, avr_asm_output_aligned_decl_local,
avr_asm_output_aligned_common, avr_output_data_section_asm_op,
avr_output_bss_section_asm_op): New functions to update...
(avr_need_clear_bss_p, avr_need_copy_data_p): ...these new variables.
(avr_asm_init_sections): Overwrite section callbacks for
data_section, bss_section.
(avr_file_start): Move output of __do_copy_data, __do_clear_bss
from here to...
(avr_file_end): ...here.
Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(Revision 172597)
+++ config/avr/avr-protos.h	(Arbeitskopie)
@@ -36,6 +36,8 @@ extern int avr_hard_regno_rename_ok (uns
 extern rtx avr_return_addr_rtx (int count, rtx tem);
 
 #ifdef TREE_CODE
+extern void avr_asm_output_aligned_decl_local (FILE*, const_tree, const char*, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT);
+extern void avr_asm_output_aligned_decl_common (FILE*, const_tree, const char*, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT);
 extern void asm_output_external (FILE *file, tree decl, char *name);
 extern int avr_progmem_p (tree decl, tree attributes);
 
Index: config/avr/avr.c
===
--- config/avr/avr.c	(Revision 172597)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -108,6 +108,7 @@ static void avr_function_arg_advance (CU
   const_tree, bool);
 static void avr_help (void);
 static bool avr_function_ok_for_sibcall (tree, tree);
+static void avr_asm_named_section (const char *name, unsigned int flags, tree decl);
 
 /* Allocate registers from r25 to r8 for parameters for function calls.  */
 #define FIRST_CUM_REG 26
@@ -132,6 +133,10 @@ const struct mcu_type_s *avr_current_dev
 
 section *progmem_section;
 
+/* To track if code will use .bss and/or .data */
+bool avr_need_clear_bss_p = false;
+bool avr_need_copy_data_p = false;
+
 /* AVR attributes.  */
 static const struct attribute_spec avr_attribute_table[] =
 {
@@ -197,6 +202,12 @@ static const struct default_options avr_
 #define TARGET_INSERT_ATTRIBUTES avr_insert_attributes
 #undef TARGET_SECTION_TYPE_FLAGS
 #define TARGET_SECTION_TYPE_FLAGS avr_section_type_flags
+
+/* `TARGET_ASM_NAMED_SECTION' must be defined in avr.h */
+
+#undef TARGET_ASM_INIT_SECTIONS
+#define TARGET_ASM_INIT_SECTIONS avr_asm_init_sections
+
 #undef TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST avr_register_move_cost
 #undef TARGET_MEMORY_MOVE_COST
@@ -5190,6 +5201,63 @@ avr_output_progmem_section_asm_op (const
   fprintf (asm_out_file, "\t.p2align 1\n");
 }
 
+
+/* Implement `ASM_OUTPUT_ALIGNED_DECL_LOCAL' */
+/* Track need of __do_clear_bss */
+
+void
+avr_asm_output_aligned_decl_local (FILE * stream, const_tree decl ATTRIBUTE_UNUSED,
+   const char *name, unsigned HOST_WIDE_INT size,
+   unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED)
+{
+  avr_need_clear_bss_p = true;
+  
+  fputs ("\t.lcomm ", stream);
+  assemble_name (stream, name);
+  fprintf (stream, ",%d\n", (int) size);
+}
+
+
+/* Implement `ASM_OUTPUT_ALIGNED_DECL_COMMON' */
+/* Track need of __do_clear_bss */
+
+void
+avr_asm_output_aligned_decl_common (FILE * stream, const_tree decl ATTRIBUTE_UNUSED,
+const char *name, unsigned HOST_WIDE_INT size,
+unsigned HOST_WIDE_INT rounded ATTRIBUTE_UNUSED)
+{
+  avr_need_clear_bss_p = true;
+  
+  fputs ("\t.comm ", stream);
+  assemble_name (stream, name);
+  fprintf (stream, ",%lu,1\n", (unsigned long) size);
+}
+
+
+/* Unnamed section callback for data_section
+   to track need of __do_copy_data */
+
+static void
+avr_output_data_section_asm_op (const void *data)
+{
+  avr_need_copy_data_p = true;
+  /* Dispatch to default */
+  output_section_asm_op (data);
+}
+
+
+/* Unnamed section callback for bss_section
+   to to track need of __do_clear_bss */
+
+

[PATCH] Fix PR46188

2011-04-19 Thread Richard Guenther

This fixes PR46188 on the 4.5 branch by backporting rev. 159907.  Instead
of carrying over the checking bits I simply removed them on the branch
(they had one trivial fallout originally, PR44295).

Bootstrapped and tested on x86_64-unknown-linux-gnu, {,-m32} for
all languages including Ada and Objective-C++ with checking enabled.

I wasn't able to distill a testcase without output to stdout.

Installed on the branch.

Richard.

2011-04-19  Richard Guenther  

PR tree-optimization/46188
Backported from 4.6 branch
2010-05-26  Jan Hubicka  

* cgraphunit.c (clone_of_p): Remove.
(verify_cgraph_node): Do not verify clones.
(cgraph_materialize_all_clones): Do no redirection here.
* ipa-inline.c (inline_transform): Do redirection here.

Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c(revision 172653)
+++ gcc/cgraphunit.c(working copy)
@@ -553,15 +553,6 @@ cgraph_mark_if_needed (tree decl)
 cgraph_mark_needed_node (node);
 }
 
-/* Return TRUE if NODE2 is equivalent to NODE or its clone.  */
-static bool
-clone_of_p (struct cgraph_node *node, struct cgraph_node *node2)
-{
-  while (node != node2 && node2)
-node2 = node2->clone_of;
-  return node2 != NULL;
-}
-
 /* Verify cgraph nodes of given cgraph node.  */
 void
 verify_cgraph_node (struct cgraph_node *node)
@@ -777,16 +768,6 @@ verify_cgraph_node (struct cgraph_node *
debug_tree (e->callee->decl);
error_found = true;
  }
-   else if (!node->global.inlined_to
-&& !e->callee->global.inlined_to
-&& !clone_of_p (cgraph_node (decl), e->callee))
- {
-   error ("edge points to wrong declaration:");
-   debug_tree (e->callee->decl);
-   fprintf (stderr," Instead of:");
-   debug_tree (decl);
-   error_found = true;
- }
e->aux = (void *)1;
  }
else
@@ -2397,30 +2378,7 @@ cgraph_materialize_all_clones (void)
 if (!node->analyzed && node->callees)
   cgraph_node_remove_callees (node);
   if (cgraph_dump_file)
-fprintf (cgraph_dump_file, "Updating call sites\n");
-  for (node = cgraph_nodes; node; node = node->next)
-if (node->analyzed && !node->clone_of
-   && gimple_has_body_p (node->decl))
-  {
-struct cgraph_edge *e;
-
-   current_function_decl = node->decl;
-push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-   for (e = node->callees; e; e = e->next_callee)
- cgraph_redirect_edge_call_stmt_to_callee (e);
-   pop_cfun ();
-   current_function_decl = NULL;
-#ifdef ENABLE_CHECKING
-verify_cgraph_node (node);
-#endif
-  }
-  if (cgraph_dump_file)
 fprintf (cgraph_dump_file, "Materialization Call site updates done.\n");
-  /* All changes to parameters have been performed.  In order not to
- incorrectly repeat them, we simply dispose of the bitmaps that drive the
- changes. */
-  for (node = cgraph_nodes; node; node = node->next)
-node->clone.combined_args_to_skip = NULL;
 #ifdef ENABLE_CHECKING
   verify_cgraph ();
 #endif
Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c(revision 172653)
+++ gcc/ipa-inline.c(working copy)
@@ -2092,6 +2092,7 @@ inline_transform (struct cgraph_node *no
 {
   unsigned int todo = 0;
   struct cgraph_edge *e;
+  bool inline_p = false;
 
   /* FIXME: Currently the passmanager is adding inline transform more than 
once to some
  clones.  This needs revisiting after WPA cleanups.  */
@@ -2104,10 +2105,13 @@ inline_transform (struct cgraph_node *no
 save_inline_function_body (node);
 
   for (e = node->callees; e; e = e->next_callee)
-if (!e->inline_failed || warn_inline)
-  break;
+{
+  cgraph_redirect_edge_call_stmt_to_callee (e);
+  if (!e->inline_failed || warn_inline)
+   inline_p = true;
+}
 
-  if (e)
+  if (inline_p)
 {
   timevar_push (TV_INTEGRATION);
   todo = optimize_inline_calls (current_function_decl);


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Georg-Johann Lay
Denis Chertykov schrieb:
> 2011/4/18 Georg-Johann Lay :
>> Denis Chertykov schrieb:
>>> 2011/4/17 Denis Chertykov :
 2011/4/15 Georg-Johann Lay :
> Finally, I exposed alternative #3 of the insns to the register
> allocator, because it is not possible to distinguish between
> overlapping or non-overlapping regs, and #3 does not need a scratch.
>
> Ran C-testsuite with no regressions.
 Are you encountered any difference in code size ?
>>> I'm ask about code size because the IRA pass isn't work with
>>> `scratch:MODE' at all.
>> I wonder what the difference is; IRA could treat scratch:MODE just the
>> same way as pseudo:MODE. Anyway, thanks for that insight. It's merely
>> impossible to get the big picture of IRA/reload from digging sources.
>>
>> So that means IRA generates garbage for scratch and reload has to fix
>> it, generate spill code if needed, etc?
> 
> No garbage, IRA just ignore scratch.
> The reload pass will allocate scratch registers if it's possible.
> If it's impossible, then reload pass will spill and generate reload for
> insns with scratches.
> 
>> Does this mean that if IRA sees a pseudo together with constraint "X"
>> it will allocate the pseudo, anyway?
> 
> I think so, but better to test IRA.
> 
>>> This lead to bad/wrong register allocation in IRA pass.
>>> The reload pass will correct such a wrong allocation, but reload can't
>>> generate optimal code. (reload generate correct code).
>>> Because of that, may be you right and may be better to have
>>> (clobber (match_operand)) instead of (clobber (match_scratch...)).
>> I don't see effects on code size, at least for the binaries left by
>> testsuite.
>>
>> Is there a policy how to drive size-performance tests for avr?
>> Anatoly posted a result on code size some weeks ago (concerning move
>> costs), it would be interesting to know his setup and what
>> sources/options he uses. Using testsuite seems not appropriate to me,
>> because it runs -O3, -finline etc. and most code is no real-world
>> code, at least not for avr.
> 
> IMHO libgcc is a good test case.

I don't see differences there, either. This is presumably because
rotate pattern is not used resp. used only very seldom.

>> FYI, what I observe is a remarkable dependency on subreg lowering that
>> can be generalized as this:
>> * code size decreases for mode <= SImode
>> * code size increases for mode > SImode
> 
> I think that DImode move instructions must be supported for representative
> test results for mode > SImode.
> 
> 
> Personally I'm don't want to support DImode for AVR.
> IMHO DImode is an overkill for 8bits controller.

Some people use 64bit on AVR. But to support a movdi would be very
painful, I think it would cause more harm like spill failures than good.

> Another one note:
> Few years ago I have played with early splitting of anything possible
> (move,add,sub,and,...). The results was very bad.
> It's happened because flow of splitted insns (8bits insns) becomes
> unreadable for most of GCC optimisation passes.
> May be splitting is appropriate before register allocation or even after
> reload pass.

The -f[no-]split-wide-types happens before reload, in .subreg1 (prior
to combine) resp. .subreg2 (after combine).

For 64bit, without subreg lowering, there is an insn like

(clobber (reg:DI 43 [  ]))

which causes bunch of moves and finally the setup of a frame pointer
even though nothing lives in the frame.

How can add, sub etc. be split? This would need an explicit
representation of carry.

> Denis.
> PS: about code size tests: try to email directly to Anatoly.
> PPS: I can try to call him.

Johann



Re: PR target/46329: Reject Neon structure constants

2011-04-19 Thread Richard Earnshaw

On Mon, 2011-04-18 at 11:12 +0100, Richard Sandiford wrote:
> Richard Earnshaw  writes:
> > I'm uncomfortable about this.  Generally the ARM port doesn't work well
> > with the target-independent constant pool and it's better to assert that
> > this is empty when it comes to final assembly generation.  Can you
> > clarify by way of example how this patch is working please (ie some
> > sample output).
> 
> Sure, for the testcase in the PR, we get:
> 
> ldr r8, .L3
> vldmia  r8, {d16-d19}
> ...
> .L3:
> .word   .LC0
> ...
> .LC0:
> .word   0
> [x 7]
> 
> (this is at -O0).  With -fPIC we get:
> 
> ldr r3, .L3
> .LPIC0:
> add r3, pc, r3
> vldmia  r3, {d16-d19}
> ...
> .L3:
> .word   .LC0-(.LPIC0+8)
> 
> Richard
> 

I understand now.  I still don't particularly like this, but reworking
the minipools to allow really large constants isn't particularly
palatable either.  So OK.

R.




Re: Fix PR47976

2011-04-19 Thread Richard Guenther
On Tue, Apr 12, 2011 at 6:18 PM, Jeff Law  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 04/07/11 04:47, Bernd Schmidt wrote:
>> PR47976 is a followup to PR47166; the patch there caused this problem.
>>
>> The problem occurs in reload. There are two autoinc addresses which
>> inherit from one another, and we delete an insn that is necessary.
>>
>> We reach this code when reloading the second autoinc address:
>>
>> 6821        if (optimize && REG_P (oldequiv)
>> 6822            && REGNO (oldequiv) < FIRST_PSEUDO_REGISTER
>> 6823            && spill_reg_store[REGNO (oldequiv)]
>> 6824            && REG_P (old)
>> (gdb)
>> 6825            && (dead_or_set_p (insn,
>> 6826                               spill_reg_stored_to[REGNO (oldequiv)])
>> 6827                || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
>> 6828                                old)))
>> 6829          delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
>>
>> reload_inherited[j] is 1 at this point, so oldequiv == reloadreg.
>>
>> (gdb) p debug_rtx (spill_reg_store[7])
>> (insn 719 718 232 10 (set (reg:SI 7 r7)
>>         (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])) -1 (nil))
>> (gdb) p debug_rtx (spill_reg_stored_to[7])
>> (reg:SI 3 r3)
>>
>> Prior to the PR47166 patch, we had spill_reg_store[7] equal to insn 718,
>> which doesn't involve register 7 at all:
>>
>> (insn 718 221 719 10 (set (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])
>>         (plus:SI (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])
>>             (const_int 8 [0x8]))) 4 {*arm_addsi3} (nil))
>>
>> That was sufficient to generate enough confusion to make the compiler
>> think it couldn't delete the output reload.
>>
>> I think the problem is simply that the (set (r7) (r3)) is the opposite
>> direction of a normal spill_reg_store - normally you write a spill reg
>> to its destination, but autoinc reloads are somewhat special.
>>
>> If delete_output_reload isn't valid for (at least some) autoincs, we can
>> simply not record them in spill_reg_store. That's part of the patch
>> below; it seems to fix the problem. I've also deleted the code quoted
>> above since it's pointless to have reload deleting dead stores to
>> registers: that's what DCE is for. I've observed no code generation
>> changes other than for the testcase from either of these changes, with
>> both an ARM and an sh compiler.
>>
>> Comments?
> Looks good to me.  I like letting DCE do its job, particularly if it
> allows us to even trivially simplify this code ;-)

As you are fine with it and a patch deleting more code than it adds
always makes me feel comfortable and as I'm trying to get a 4.5.3
done which is blocked by this bug I will apply the patch after a round
of testing.

The bugzilla audit trail says the patch tests fine on a few archs,
I'm going to test x86_64 everywhere and all my available archs
for a 4.5 branch backport.

Richard.


[google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-19 Thread Guozhi Wei
Reload pass tries to determine the stack frame, so it needs to check the
push/pop lr optimization opportunity. One of the criteria is if there is any
far jump inside the function. Unfortunately at this time gcc can't decide each
instruction's length and basic block layout, so it can't know the offset of
a jump. To be conservative it assumes every jump is a far jump. So any jump
in a function will prevent this push/pop lr optimization.

To enable the push/pop lr optimization in reload pass, I compute the possible
maximum length of the function body. If the length is not large enough, far
jump is not necessary, so we can safely do push/pop lr optimization.

Tested on arm qemu with options -march=armv5te -mthumb, without regression.

This patch is for google/main.

2011-04-19  Guozhi Wei  

Google ref 40255.
* gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
(estimate_function_length): New function.
(thumb_far_jump_used_p): No far jump is needed in short function.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 172689)
+++ gcc/config/arm/arm.c(working copy)
@@ -592,7 +592,9 @@ static const struct default_options arm_
   arm_preferred_rename_class
 
 struct gcc_target targetm = TARGET_INITIALIZER;
-
+
+#define SHORTEST_FAR_JUMP_LENGTH 2040
+
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
 static char * minipool_startobj;
@@ -20298,6 +20300,17 @@ thumb_shiftable_const (unsigned HOST_WID
   return 0;
 }
 
+/* Computes the maximum possible function length. */
+static int
+estimate_function_length (void)
+{
+  rtx insn;
+  int length = 0;
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+length += get_attr_length(insn);
+  return length;
+}
+
 /* Returns nonzero if the current function contains,
or might contain a far jump.  */
 static int
@@ -20316,6 +20329,16 @@ thumb_far_jump_used_p (void)
   if (cfun->machine->far_jump_used)
 return 1;
 
+  /* In reload pass we haven't got the exact jump instruction length,
+ but we can get a reasonable estimation based on the maximum
+ possible function length. */
+  if (!reload_completed)
+{
+  int function_length = estimate_function_length();
+  if (function_length < SHORTEST_FAR_JUMP_LENGTH)
+   return 0;
+}
+
   /* If this function is not being called from the prologue/epilogue
  generation code then it must be being called from the
  INITIAL_ELIMINATION_OFFSET macro.  */

--
This patch is available for review at http://codereview.appspot.com/4441050


Re: PATCH [trunk] gengtype should generate ggc_alloc macros in plugin mode on for plugin files

2011-04-19 Thread Basile Starynkevitch
On Tue, 19 Apr 2011 06:42:58 +0300
Laurynas Biveinis  wrote:
> 
> Did you test this patch that it bootstraps and that a GTY-using plugin
> gets the right set of definitions in the output?

I checked both (using MELT as the plugin to test it). 
I very slightly & trivially improved the patch by updating the copyright year 
of gengtype.h and by 
replacing in the patch of write_typed_struct_alloc_def
+  gcc_assert (s->kind == TYPE_STRUCT || s->kind == TYPE_UNION
+ || s->kind == TYPE_LANG_STRUCT);
with the more readable & shorter equivalent
   gcc_assert (UNION_OR_STRUCT_P (s));



> OK with these changes and confirmation that the patch was tested.

Committed revision 172705.

Thanks
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Denis Chertykov
2011/4/19 Georg-Johann Lay :
> How can add, sub etc. be split? This would need an explicit
> representation of carry.

Yes.

Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Denis.


Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 11:41 AM, Guozhi Wei  wrote:
> Reload pass tries to determine the stack frame, so it needs to check the
> push/pop lr optimization opportunity. One of the criteria is if there is any
> far jump inside the function. Unfortunately at this time gcc can't decide each
> instruction's length and basic block layout, so it can't know the offset of
> a jump. To be conservative it assumes every jump is a far jump. So any jump
> in a function will prevent this push/pop lr optimization.
>
> To enable the push/pop lr optimization in reload pass, I compute the possible
> maximum length of the function body. If the length is not large enough, far
> jump is not necessary, so we can safely do push/pop lr optimization.

What about hot/cold partitioning?  That might cause jumps to different
sections.

Richard.

> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
>
> This patch is for google/main.
>
> 2011-04-19  Guozhi Wei  
>
>        Google ref 40255.
>        * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>        (estimate_function_length): New function.
>        (thumb_far_jump_used_p): No far jump is needed in short function.
>
> Index: gcc/config/arm/arm.c
> ===
> --- gcc/config/arm/arm.c        (revision 172689)
> +++ gcc/config/arm/arm.c        (working copy)
> @@ -592,7 +592,9 @@ static const struct default_options arm_
>   arm_preferred_rename_class
>
>  struct gcc_target targetm = TARGET_INITIALIZER;
> -
> +
> +#define SHORTEST_FAR_JUMP_LENGTH 2040
> +
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
>  static char *         minipool_startobj;
> @@ -20298,6 +20300,17 @@ thumb_shiftable_const (unsigned HOST_WID
>   return 0;
>  }
>
> +/* Computes the maximum possible function length. */
> +static int
> +estimate_function_length (void)
> +{
> +  rtx insn;
> +  int length = 0;
> +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +    length += get_attr_length(insn);
> +  return length;
> +}
> +
>  /* Returns nonzero if the current function contains,
>    or might contain a far jump.  */
>  static int
> @@ -20316,6 +20329,16 @@ thumb_far_jump_used_p (void)
>   if (cfun->machine->far_jump_used)
>     return 1;
>
> +  /* In reload pass we haven't got the exact jump instruction length,
> +     but we can get a reasonable estimation based on the maximum
> +     possible function length. */
> +  if (!reload_completed)
> +    {
> +      int function_length = estimate_function_length();
> +      if (function_length < SHORTEST_FAR_JUMP_LENGTH)
> +       return 0;
> +    }
> +
>   /* If this function is not being called from the prologue/epilogue
>      generation code then it must be being called from the
>      INITIAL_ELIMINATION_OFFSET macro.  */
>
> --
> This patch is available for review at http://codereview.appspot.com/4441050
>


Re: [patch] Fix PR lto/48148

2011-04-19 Thread Richard Guenther
On Mon, Apr 18, 2011 at 11:45 PM, Eric Botcazou  wrote:
> Hi,
>
> this is the assembler error during an LTO bootstrap:
>
> cc1.ltrans8.s: Assembler messages:
> cc1.ltrans8.s:249143: Error: symbol `.Ldebug_info0' is already defined
>
> In the assembly file:
>
> .Letext0:
>        .file 93 "/home/eric/svn/gcc/gcc/c-parser.c"
>        .section        .debug_info,"",@progbits
> .Ldebug_info0:
> [...]
>        .file 225 "/usr/include/mpc.h"
>        .file 226 "/usr/include/mpfr.h"
> .Ldebug_info0:
>
>
> output_comp_unit is passed a completely bogus DIE:
>
> DIE   11: DW_TAG_enumeration_type (0x72024a50)
>  abbrev id: 1 offset: 11 mark: 1
>  DW_AT_name: "prec"
>  DW_AT_byte_size: 4
>  DW_AT_decl_file: "/home/eric/svn/gcc/gcc/c-parser.c" (93)
>  DW_AT_decl_line: 5459
>
> created by the call to resolve_addr at line 23613 of dwarf2out.c:
>
>  limbo_die_list = NULL;
>
>  resolve_addr (comp_unit_die ());
>
>
> #20 0x005941a6 in gen_formal_parameter_die (node=0x75dc6000,
>    origin=0x0, emit_name_p=1 '\001', context_die=0x74b28780)
>    at /home/eric/svn/gcc/gcc/dwarf2out.c:18682
> #21 0x00594303 in gen_formal_types_die (
>    function_or_method_type=0x75dcba80, context_die=0x74b28780)
>    at /home/eric/svn/gcc/gcc/dwarf2out.c:18776
> #22 0x00592187 in gen_subprogram_die (decl=0x75dce300,
>    context_die=0x72c13f00) at /home/eric/svn/gcc/gcc/dwarf2out.c:19388
> #23 0x0059b632 in force_decl_die (decl=0x75dce300)
>    at /home/eric/svn/gcc/gcc/dwarf2out.c:21025
> #24 0x0059bb25 in resolve_addr (die=0x73d747d0)
>    at /home/eric/svn/gcc/gcc/dwarf2out.c:23115
> 23115                   force_decl_die (tdecl);
> (gdb) p debug_dwarf_die(die)
> DIE    0: DW_TAG_GNU_call_site (0x73d747d0)
>  abbrev id: 0 offset: 0 mark: 0
>  DW_AT_low_pc: label: *.LVL13657
>  DW_AT_abstract_origin: address
>
>        if (die->die_tag == DW_TAG_GNU_call_site
>            && a->dw_attr == DW_AT_abstract_origin)
>          {
>            tree tdecl = SYMBOL_REF_DECL (a->dw_attr_val.v.val_addr);
>            dw_die_ref tdie = lookup_decl_die (tdecl);
>            if (tdie == NULL
>                && DECL_EXTERNAL (tdecl)
>                && DECL_ABSTRACT_ORIGIN (tdecl) == NULL_TREE)
>              {
>                force_decl_die (tdecl);
>
>
> The DW_TAG_enumeration_type DIE is for an unrelated type of c-parser.c:
>
> static struct c_expr
> c_parser_binary_expression (c_parser *parser, struct c_expr *after)
> {
>  enum prec {
>    PREC_NONE,
>    PREC_LOGOR,
>    PREC_LOGAND,
>    PREC_BITOR,
>    PREC_BITXOR,
>    PREC_BITAND,
>    PREC_EQ,
>    PREC_REL,
>    PREC_SHIFT,
>    PREC_ADD,
>    PREC_MULT,
>    NUM_PRECS
>  };
>
> which happens to be merged with a type at top-level in cpplib.h:
>
> /* C language kind, used when calling cpp_create_reader.  */
> enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC1X,
>             CLK_STDC89, CLK_STDC94, CLK_STDC99, CLK_STDC1X,
>             CLK_GNUCXX, CLK_CXX98, CLK_GNUCXX0X, CLK_CXX0X, CLK_ASM};
>
> so the LANG field of:
>
> struct cpp_options
> {
>  /* Characters between tab stops.  */
>  unsigned int tabstop;
>
>  /* The language we're preprocessing.  */
>  enum c_lang lang;
>
> gets enum prec as type and things go downhill from there.
>
>
> I don't think the two enum types should be merged: fields of structures are
> merged only if they have the same identifier, this should apply here as well.
> Hence the attached patch.  LTO bootstrapped on x86_64-suse-linux, OK for the
> mainline?

That has been on my todo for quite a while now ... I didn't think it was
that simple though ;)

Ok for trunk and also ok for the 4.6 branch if you manage to schedule
testing there.

Thanks,
Richard.

>
> 2011-04-18  Eric Botcazou  
>
>        PR lto/48148
>        * gimple.c (gimple_types_compatible_p_1) : Do not
>        merge the types if they have different enumeration identifiers.
>
>
> --
> Eric Botcazou
>


Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-19 Thread Carrot Wei
On Tue, Apr 19, 2011 at 5:57 PM, Richard Guenther
 wrote:
> On Tue, Apr 19, 2011 at 11:41 AM, Guozhi Wei  wrote:
>> Reload pass tries to determine the stack frame, so it needs to check the
>> push/pop lr optimization opportunity. One of the criteria is if there is any
>> far jump inside the function. Unfortunately at this time gcc can't decide 
>> each
>> instruction's length and basic block layout, so it can't know the offset of
>> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> in a function will prevent this push/pop lr optimization.
>>
>> To enable the push/pop lr optimization in reload pass, I compute the possible
>> maximum length of the function body. If the length is not large enough, far
>> jump is not necessary, so we can safely do push/pop lr optimization.
>
> What about hot/cold partitioning?  That might cause jumps to different
> sections.
>
> Richard.
>

The hot/cold partitioning is disabled in arm backend.
http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00671.html.

thanks
Carrot


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Jan Hubicka
> Huh.  No, I don't think we want to do any "inlining" as part of
> folding.  At least not if it
> is a correctness issue (is it?).  Why does the inliner not simply
> inline the thunk function
> body?

Because thunk functions have no bodies in gimple form and are no functions (at
the moment) in cgraph sense.

What IPA infrastructure needs to learn is to handle inlining of those as well
as compute proper jump functions.  We spoke with Martin on this some time ago,
we ought to get this done in 4.7. Thunks are very irritating spcial cases from
cgraph POV ;(

I however like the idea that code doing devirtualizatio (type based or folding 
based)
will also do adjustment to call the actual functions instead of thunk.  Direct
calls to thunks are quite nonsential.

Honza
s
> 
> Richard.
> 
> > Honza
> >


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Georg-Johann Lay
Denis Chertykov schrieb:
> 2011/4/19 Georg-Johann Lay :
>> How can add, sub etc. be split? This would need an explicit
>> representation of carry.
> 
> Yes.
> 
> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Just skimmed the conversation. I thought about making AVR ISA's
effects on SREG explicit several times, but I always got stuck at some
point.

- It's not only about scheduling (which does not happen for avr) but
  also about moving instructions across jumps.

- Many transformations would happen before reload, but at these stages
the effect on SREG is not yet known in many cases. There is
sophisticated instruction output for many patterns, and their impact
on SREG/CC is not known before reload.

- Making CC explicit would render many single_set insns to PARALLELs
making the optimizers' live much harder or impossible. Imagine
instructions that could be combined. Explicit CC would clutter up
insns and combine won't try to transform the bulky patterns.

- Backend would be much more complicated, harder to maintain and
understand. Almost any insn would have to be changed.

Johann

> 
> Denis.
> 



Re: [patch] Fix PR lto/48148

2011-04-19 Thread Jakub Jelinek
On Mon, Apr 18, 2011 at 11:45:03PM +0200, Eric Botcazou wrote:
> 2011-04-18  Eric Botcazou  
> 
>   PR lto/48148
>   * gimple.c (gimple_types_compatible_p_1) : Do not
>   merge the types if they have different enumeration identifiers.

While I think it is a step in the right direction, IMHO we can reach the
same error if we merge say enums from:
TU1.c:
void foo (void)
{
  enum A { B, C } e = B;
  bar ((int) e);
}
TU2.c:
enum A { B, C } f;
extern void baz (enum A);
void test (void)
{
  baz (0);
}
If enum A from second TU is merged with enum A from first TU, but DIE
for it isn't needed in the partition containing TU2, then
when trying for call site to add DIE for call to baz it needs to be emitted
and it tries to emit it with the incorrect TYPE_CONTEXT.

IMHO types with incompatible TYPE_CONTEXT should never be compatible.
Where incompatible TYPE_CONTEXT is something that needs some thought,
if both TYPE_CONTEXTs are NULL / TRANSLATION_UNIT_DECL, it is
fine to merge them, if it is a FUNCTION_DECL, it better be the same
FUNCTION_DECL, if it is some type, it better be a gtc compatible type.
TYPE_CONTEXT is IMHO relevant to all types, not just ENUMERAL_TYPE.

Then, if we ever want to make LTO code actually debuggable, probably
types with different TYPE_NAMEs shouldn't be considered to be compatible
either, say enum A { B, C } a; in one TU and enum D { B, C } d; in another TU,
otherwise ptype a or ptype d will be wrong.

Jakub


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Richard Guenther
On Tue, 19 Apr 2011, Jan Hubicka wrote:

> > Huh.  No, I don't think we want to do any "inlining" as part of
> > folding.  At least not if it
> > is a correctness issue (is it?).  Why does the inliner not simply
> > inline the thunk function
> > body?
> 
> Because thunk functions have no bodies in gimple form and are no functions (at
> the moment) in cgraph sense.

Well, I see thunks as simple forwarders which we can represent with
a real function with body in gimple.  We don't have to _emit_ it
that way later, but at least IPA passes could do regular stmt
analysis on them and not need any special thunk knowledge (maybe
apart from making the inliner not inline the forwarding call into
the thunk itself).

> What IPA infrastructure needs to learn is to handle inlining of those as well
> as compute proper jump functions.  We spoke with Martin on this some time ago,
> we ought to get this done in 4.7. Thunks are very irritating spcial cases from
> cgraph POV ;(

And I don't see why they have to be very irritating spcial cases ;)

> I however like the idea that code doing devirtualizatio (type based or 
> folding based)
> will also do adjustment to call the actual functions instead of thunk.  Direct
> calls to thunks are quite nonsential.

Well, sure.  If we discover a direct call to a thunk late, after inlining
we can do some tricks to inline the adjustment, even during folding.  Just
analysis passes should simply see the thunk.

Richard.


Re: [patch] Fix PR lto/48148

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 12:21 PM, Jakub Jelinek  wrote:
> On Mon, Apr 18, 2011 at 11:45:03PM +0200, Eric Botcazou wrote:
>> 2011-04-18  Eric Botcazou  
>>
>>       PR lto/48148
>>       * gimple.c (gimple_types_compatible_p_1) : Do not
>>       merge the types if they have different enumeration identifiers.
>
> While I think it is a step in the right direction, IMHO we can reach the
> same error if we merge say enums from:
> TU1.c:
> void foo (void)
> {
>  enum A { B, C } e = B;
>  bar ((int) e);
> }
> TU2.c:
> enum A { B, C } f;
> extern void baz (enum A);
> void test (void)
> {
>  baz (0);
> }
> If enum A from second TU is merged with enum A from first TU, but DIE
> for it isn't needed in the partition containing TU2, then
> when trying for call site to add DIE for call to baz it needs to be emitted
> and it tries to emit it with the incorrect TYPE_CONTEXT.

Sure, but that problem applies to all types.

> IMHO types with incompatible TYPE_CONTEXT should never be compatible.
> Where incompatible TYPE_CONTEXT is something that needs some thought,
> if both TYPE_CONTEXTs are NULL / TRANSLATION_UNIT_DECL, it is
> fine to merge them, if it is a FUNCTION_DECL, it better be the same
> FUNCTION_DECL, if it is some type, it better be a gtc compatible type.
> TYPE_CONTEXT is IMHO relevant to all types, not just ENUMERAL_TYPE.
>
> Then, if we ever want to make LTO code actually debuggable, probably
> types with different TYPE_NAMEs shouldn't be considered to be compatible
> either, say enum A { B, C } a; in one TU and enum D { B, C } d; in another TU,
> otherwise ptype a or ptype d will be wrong.

Yeah, now that we merge the TYPE_CANONICAL web separately we
can do with a lot less "real" type merging without any effect on semantics.
We'll just pay with memory usage for better debug info.

Richard.

>        Jakub
>


[Patch, Fortran] PR 48588 - (4.6/4.7 regression) Resolve whole TU before generating code

2011-04-19 Thread Tobias Burnus
Currently, gfortran ships (trans*.c) MODULEs directly while it waits 
until the end for subroutines, functions and PROGRAM. The latter are 
then first all resolved and afterwards the middle-end code is generated.


In the PR this shows an issue: The a module procedure calls an external 
subroutine - which exists as gsym. In trans-decl one therefore uses the 
symbol information from gsym, which causes an ICE as the latter has not 
yet been resolved.


The solution is to first resolve the whole translation unit (file) 
before starting to generate middle end code. The only disadvantage is 
that it increases the required memory, however, it might be easier to 
re-use the module data later when gfortran stops reading module files 
all the time. I am also not 100% sure that I free all structures, though 
the patch probably frees most.


Build and regtested on x86-64-linux.
OK for the trunk - and after some grace period - for the 4.6 branch?

Tobias

PS: The code works with -fno-whole-file as then the code is always 
generated immediately. (Additionally, gsym is not used then.)


PPS: There are still three open 4.6/4.7 regressions - besides this one 
and the -frealloc-lhs issue Paul is working on. We should really fix 
those before 4.6.1 is released and Linux distributions and others start 
to heavily using 4.6.x.
2011-04-19  Tobias Burnus  
	PR fortran/48588
	* parse.c (resolve_all_program_units): Skip modules.
	(translate_all_program_units): Handle modules.
	(gfc_parse_file): Defer code generation for modules.

2011-04-19  Tobias Burnus  
	PR fortran/48588
	* gfortran.dg/whole_file_33.f90: New.

diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index c09589b..5d2237a 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -4191,6 +4191,10 @@ resolve_all_program_units (gfc_namespace *gfc_global_ns_list)
   gfc_current_ns = gfc_global_ns_list;
   for (; gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling)
 {
+  if (gfc_current_ns->proc_name
+	  && gfc_current_ns->proc_name->attr.flavor == FL_MODULE)
+	continue; /* Already resolved.  */
+
   if (gfc_current_ns->proc_name)
 	gfc_current_locus = gfc_current_ns->proc_name->declared_at;
   gfc_resolve (gfc_current_ns);
@@ -4231,8 +4235,28 @@ translate_all_program_units (gfc_namespace *gfc_global_ns_list)
   gfc_current_ns = gfc_global_ns_list;
   gfc_get_errors (NULL, &errors);
 
+  /* We first translate all modules to make sure that later parts
+ of the program can use the decl. Then we translate the nonmodules.  */
+
+  for (; !errors && gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling)
+{
+  if (!gfc_current_ns->proc_name
+	  || gfc_current_ns->proc_name->attr.flavor != FL_MODULE)
+	continue;
+
+  gfc_current_locus = gfc_current_ns->proc_name->declared_at;
+  gfc_derived_types = gfc_current_ns->derived_types;
+  gfc_generate_module_code (gfc_current_ns);
+  gfc_current_ns->translated = 1;
+}
+
+  gfc_current_ns = gfc_global_ns_list;
   for (; !errors && gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling)
 {
+  if (gfc_current_ns->proc_name
+	  && gfc_current_ns->proc_name->attr.flavor == FL_MODULE)
+	continue;
+
   gfc_current_locus = gfc_current_ns->proc_name->declared_at;
   gfc_derived_types = gfc_current_ns->derived_types;
   gfc_generate_code (gfc_current_ns);
@@ -4243,7 +4267,16 @@ translate_all_program_units (gfc_namespace *gfc_global_ns_list)
   gfc_current_ns = gfc_global_ns_list;
   for (;gfc_current_ns;)
 {
-  gfc_namespace *ns = gfc_current_ns->sibling;
+  gfc_namespace *ns;
+
+  if (gfc_current_ns->proc_name
+	  && gfc_current_ns->proc_name->attr.flavor == FL_MODULE)
+	{
+	  gfc_current_ns = gfc_current_ns->sibling;
+	  continue;
+	}
+
+  ns = gfc_current_ns->sibling;
   gfc_derived_types = gfc_current_ns->derived_types;
   gfc_done_2 ();
   gfc_current_ns = ns;
@@ -4375,16 +4408,18 @@ loop:
   if (s.state == COMP_MODULE)
 {
   gfc_dump_module (s.sym->name, errors_before == errors);
-  if (errors == 0)
-	gfc_generate_module_code (gfc_current_ns);
-  pop_state ();
   if (!gfc_option.flag_whole_file)
-	gfc_done_2 ();
+	{
+	  if (errors == 0)
+	gfc_generate_module_code (gfc_current_ns);
+	  pop_state ();
+	  gfc_done_2 ();
+	}
   else
 	{
 	  gfc_current_ns->derived_types = gfc_derived_types;
 	  gfc_derived_types = NULL;
-	  gfc_current_ns = NULL;
+	  goto prog_units;
 	}
 }
   else
@@ -4429,10 +4464,12 @@ prog_units:
 	= gfc_option.dump_fortran_original ? gfc_global_ns_list : NULL;
 
   for (; gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling)
-{
-  gfc_dump_parse_tree (gfc_current_ns, stdout);
-  fputs ("--\n\n", stdout);
-}
+if (!gfc_current_ns->proc_name
+	|| gfc_current_ns->proc_name->attr.flavor != FL_MODULE)
+  {
+	gfc_dump_parse_tree (gfc_current_ns, stdout);
+	fputs ("--\n\n", 

Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Jan Hubicka
> On Tue, 19 Apr 2011, Jan Hubicka wrote:
> 
> > > Huh.  No, I don't think we want to do any "inlining" as part of
> > > folding.  At least not if it
> > > is a correctness issue (is it?).  Why does the inliner not simply
> > > inline the thunk function
> > > body?
> > 
> > Because thunk functions have no bodies in gimple form and are no functions 
> > (at
> > the moment) in cgraph sense.
> 
> Well, I see thunks as simple forwarders which we can represent with
> a real function with body in gimple.  We don't have to _emit_ it

That is true only for non-variadic thunks. (and we actually generate bodies for
those early, unless target asked us to do otherwise).  Sure, Diego once
mentioned that we might have some kind of special thunk_call gimple statement,
but that seems to me that amount of gimple code that would need to handle
thunk_call strictly exceeds amount of IPA code that would need to get past
special kind of functions.

Also as as we make them regular functions, we need to go all the way through,
as IPA optimizers will happily modify them so simply using existing target
interface instead of cfgexpand later won't help.  In a way, with tailcall
support, the target interface is really only interesting for variadic case.
At least for i386. I did not inspect details of implementation of thunks on
all the archs that do asm thunks to be sure.

> that way later, but at least IPA passes could do regular stmt
> analysis on them and not need any special thunk knowledge (maybe
> apart from making the inliner not inline the forwarding call into
> the thunk itself).
> 
> > What IPA infrastructure needs to learn is to handle inlining of those as 
> > well
> > as compute proper jump functions.  We spoke with Martin on this some time 
> > ago,
> > we ought to get this done in 4.7. Thunks are very irritating spcial cases 
> > from
> > cgraph POV ;(
> 
> And I don't see why they have to be very irritating spcial cases ;)

Well, becuase of variadic thunks, I think ;)

Honza
> 
> > I however like the idea that code doing devirtualizatio (type based or 
> > folding based)
> > will also do adjustment to call the actual functions instead of thunk.  
> > Direct
> > calls to thunks are quite nonsential.
> 
> Well, sure.  If we discover a direct call to a thunk late, after inlining
> we can do some tricks to inline the adjustment, even during folding.  Just
> analysis passes should simply see the thunk.
> 
> Richard.


[PATCH][LTO] Fix PR48207, ICE in lhd_set_decl_assembler_name

2011-04-19 Thread Richard Guenther

This avoids ICEing in the default decl-assembler-name hook during
compile-time.  We are not yet prepared for a world where all important
mangling is done before/during free-lang-data, and the default
langhook implementation surely isn't the one that would assert so.

The following patch simply retains the language specific hook for now
which avoids the ICE.  At link time we use the LTO implementation
of the langhook which doesn't ICE but generates (only) "wrong"
debug info.  Better than ICEing, so I consider this good for 4.6.

In an ideal world we only ever would need to assign assembler names
to non-exported entities at link time.  And we could simply use the
same mechanism for LTO and all other frontends for such entities,
at LTO link time asserting that we never try that for exported entities.
But we are not there yet.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2011-04-19  Richard Guenther  

PR lto/48207
* tree.c (free_lang_data): Do not reset the decl-assembler-name
langhook.

* g++.dg/lto/pr48207_0.C: New testcase.

Index: gcc/tree.c
===
*** gcc/tree.c  (revision 172705)
--- gcc/tree.c  (working copy)
*** free_lang_data (void)
*** 5176,5182 
lang_hooks.callgraph.analyze_expr = NULL;
lang_hooks.dwarf_name = lhd_dwarf_name;
lang_hooks.decl_printable_name = gimple_decl_printable_name;
!   lang_hooks.set_decl_assembler_name = lhd_set_decl_assembler_name;
  
/* Reset diagnostic machinery.  */
diagnostic_starter (global_dc) = default_tree_diagnostic_starter;
--- 5176,5187 
lang_hooks.callgraph.analyze_expr = NULL;
lang_hooks.dwarf_name = lhd_dwarf_name;
lang_hooks.decl_printable_name = gimple_decl_printable_name;
!   /* We do not want the default decl_assembler_name implementation,
!  rather if we have fixed everything we want a wrapper around it
!  asserting that all non-local symbols already got their assembler
!  name and only produce assembler names for local symbols.  Or rather
!  make sure we never call decl_assembler_name on local symbols and
!  devise a separate, middle-end private scheme for it.  */
  
/* Reset diagnostic machinery.  */
diagnostic_starter (global_dc) = default_tree_diagnostic_starter;
Index: gcc/testsuite/g++.dg/lto/pr48207_0.C
===
*** gcc/testsuite/g++.dg/lto/pr48207_0.C(revision 0)
--- gcc/testsuite/g++.dg/lto/pr48207_0.C(revision 0)
***
*** 0 
--- 1,13 
+ // { dg-lto-do link }
+ // { dg-lto-options { { -flto -g } } }
+ 
+ void bar(int) {}
+ 
+ void foo(void)
+ {
+   typedef enum { ABC } DEF;
+   DEF a;
+   bar((int)a);
+ }
+ 
+ int main() {}


Re: Fix PR47976

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 11:32 AM, Richard Guenther
 wrote:
> On Tue, Apr 12, 2011 at 6:18 PM, Jeff Law  wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> On 04/07/11 04:47, Bernd Schmidt wrote:
>>> PR47976 is a followup to PR47166; the patch there caused this problem.
>>>
>>> The problem occurs in reload. There are two autoinc addresses which
>>> inherit from one another, and we delete an insn that is necessary.
>>>
>>> We reach this code when reloading the second autoinc address:
>>>
>>> 6821        if (optimize && REG_P (oldequiv)
>>> 6822            && REGNO (oldequiv) < FIRST_PSEUDO_REGISTER
>>> 6823            && spill_reg_store[REGNO (oldequiv)]
>>> 6824            && REG_P (old)
>>> (gdb)
>>> 6825            && (dead_or_set_p (insn,
>>> 6826                               spill_reg_stored_to[REGNO (oldequiv)])
>>> 6827                || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
>>> 6828                                old)))
>>> 6829          delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
>>>
>>> reload_inherited[j] is 1 at this point, so oldequiv == reloadreg.
>>>
>>> (gdb) p debug_rtx (spill_reg_store[7])
>>> (insn 719 718 232 10 (set (reg:SI 7 r7)
>>>         (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])) -1 (nil))
>>> (gdb) p debug_rtx (spill_reg_stored_to[7])
>>> (reg:SI 3 r3)
>>>
>>> Prior to the PR47166 patch, we had spill_reg_store[7] equal to insn 718,
>>> which doesn't involve register 7 at all:
>>>
>>> (insn 718 221 719 10 (set (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])
>>>         (plus:SI (reg:SI 3 r3 [orig:339 ivtmp.79 ] [339])
>>>             (const_int 8 [0x8]))) 4 {*arm_addsi3} (nil))
>>>
>>> That was sufficient to generate enough confusion to make the compiler
>>> think it couldn't delete the output reload.
>>>
>>> I think the problem is simply that the (set (r7) (r3)) is the opposite
>>> direction of a normal spill_reg_store - normally you write a spill reg
>>> to its destination, but autoinc reloads are somewhat special.
>>>
>>> If delete_output_reload isn't valid for (at least some) autoincs, we can
>>> simply not record them in spill_reg_store. That's part of the patch
>>> below; it seems to fix the problem. I've also deleted the code quoted
>>> above since it's pointless to have reload deleting dead stores to
>>> registers: that's what DCE is for. I've observed no code generation
>>> changes other than for the testcase from either of these changes, with
>>> both an ARM and an sh compiler.
>>>
>>> Comments?
>> Looks good to me.  I like letting DCE do its job, particularly if it
>> allows us to even trivially simplify this code ;-)
>
> As you are fine with it and a patch deleting more code than it adds
> always makes me feel comfortable and as I'm trying to get a 4.5.3
> done which is blocked by this bug I will apply the patch after a round
> of testing.
>
> The bugzilla audit trail says the patch tests fine on a few archs,
> I'm going to test x86_64 everywhere and all my available archs
> for a 4.5 branch backport.

Bootstrapped and tested on x86_64-unknown-linux-gnu for trunk
with the set-but-not-used store variable removed.  Installed as
r172706.  I'll wait for some autotester coverage and my own extensive
4.5 testing before doing a 4.6 and 4.5 backport.

Richard.


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Richard Guenther
On Tue, 19 Apr 2011, Jan Hubicka wrote:

> > On Tue, 19 Apr 2011, Jan Hubicka wrote:
> > 
> > > > Huh.  No, I don't think we want to do any "inlining" as part of
> > > > folding.  At least not if it
> > > > is a correctness issue (is it?).  Why does the inliner not simply
> > > > inline the thunk function
> > > > body?
> > > 
> > > Because thunk functions have no bodies in gimple form and are no 
> > > functions (at
> > > the moment) in cgraph sense.
> > 
> > Well, I see thunks as simple forwarders which we can represent with
> > a real function with body in gimple.  We don't have to _emit_ it
> 
> That is true only for non-variadic thunks. (and we actually generate bodies 
> for
> those early, unless target asked us to do otherwise).  Sure, Diego once
> mentioned that we might have some kind of special thunk_call gimple statement,
> but that seems to me that amount of gimple code that would need to handle
> thunk_call strictly exceeds amount of IPA code that would need to get past
> special kind of functions.
> 
> Also as as we make them regular functions, we need to go all the way through,
> as IPA optimizers will happily modify them so simply using existing target
> interface instead of cfgexpand later won't help.  In a way, with tailcall
> support, the target interface is really only interesting for variadic case.
> At least for i386. I did not inspect details of implementation of thunks on
> all the archs that do asm thunks to be sure.
> 
> > that way later, but at least IPA passes could do regular stmt
> > analysis on them and not need any special thunk knowledge (maybe
> > apart from making the inliner not inline the forwarding call into
> > the thunk itself).
> > 
> > > What IPA infrastructure needs to learn is to handle inlining of those as 
> > > well
> > > as compute proper jump functions.  We spoke with Martin on this some time 
> > > ago,
> > > we ought to get this done in 4.7. Thunks are very irritating spcial cases 
> > > from
> > > cgraph POV ;(
> > 
> > And I don't see why they have to be very irritating spcial cases ;)
> 
> Well, becuase of variadic thunks, I think ;)

I thought the idea was to use __builtin_va_arg_pack and friends.
Of course the inliner would still need to know how to inline such
a va-arg forwarder, and we would need a way to expand them (well,
or just go the existing special casing).  We might need this
kind of inliner support anyway because of the partial inlining
opportunity in libquantum which is for a varargs function like

void foo (...)
{
  if (always-zero-static)
return;

  ...
}

thus partial inlining would create such a forwarding call.

But well, it's just my random 2 cents on the issue ;)

Richard.


[patch] Add missing test for DECL_NO_INLINE_WARNING_P

2011-04-19 Thread Eric Botcazou
Hi,

tree_inlinable_function_p issues the -Winline warning only if

  /* We only warn for functions declared `inline' by the user.  */
  do_warning = (warn_inline
&& DECL_DECLARED_INLINE_P (fn)
&& !DECL_NO_INLINE_WARNING_P (fn)
&& !DECL_IN_SYSTEM_HEADER (fn));

is true, so in particular only if DECL_NO_INLINE_WARNING_P is not set.  Now 
expand_call_inline also issues the -Winline warning

  else if (warn_inline && DECL_DECLARED_INLINE_P (fn)
   && !DECL_IN_SYSTEM_HEADER (fn)
   && reason != CIF_UNSPECIFIED
   && !lookup_attribute ("noinline", DECL_ATTRIBUTES (fn))
   /* Do not warn about not inlined recursive calls.  */
   && !cgraph_edge_recursive_p (cg_edge)
   /* Avoid warnings during early inline pass. */
   && cgraph_global_info_ready)

but disregards the DECL_NO_INLINE_WARNING_P flag.

Tested on i586-suse-linux, OK for the mainline?


2011-04-19  Eric Botcazou  

* tree-inline.c (expand_call_inline): Do not issue a -Winline warning
if DECL_NO_INLINE_WARNING_P is set on the function.


-- 
Eric Botcazou
Index: tree-inline.c
===
--- tree-inline.c	(revision 172693)
+++ tree-inline.c	(working copy)
@@ -3744,7 +3744,9 @@ expand_call_inline (basic_block bb, gimp
 		 _(cgraph_inline_failed_string (reason)));
 	  sorry ("called from here");
 	}
-  else if (warn_inline && DECL_DECLARED_INLINE_P (fn)
+  else if (warn_inline
+	   && DECL_DECLARED_INLINE_P (fn)
+	   && !DECL_NO_INLINE_WARNING_P (fn)
 	   && !DECL_IN_SYSTEM_HEADER (fn)
 	   && reason != CIF_UNSPECIFIED
 	   && !lookup_attribute ("noinline", DECL_ATTRIBUTES (fn))


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Denis Chertykov
2011/4/19 Georg-Johann Lay :
> Denis Chertykov schrieb:
>> 2011/4/19 Georg-Johann Lay :
>>> How can add, sub etc. be split? This would need an explicit
>>> representation of carry.
>>
>> Yes.
>>
>> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html
>
> Just skimmed the conversation. I thought about making AVR ISA's
> effects on SREG explicit several times, but I always got stuck at some
> point.
>
> - It's not only about scheduling (which does not happen for avr) but
>  also about moving instructions across jumps.
>
> - Many transformations would happen before reload, but at these stages
> the effect on SREG is not yet known in many cases. There is
> sophisticated instruction output for many patterns, and their impact
> on SREG/CC is not known before reload.
>
> - Making CC explicit would render many single_set insns to PARALLELs
> making the optimizers' live much harder or impossible. Imagine
> instructions that could be combined. Explicit CC would clutter up
> insns and combine won't try to transform the bulky patterns.
>
> - Backend would be much more complicated, harder to maintain and
> understand. Almost any insn would have to be changed.

Generally, I'm agree with you, the AVR port uses CC0 because of that.

Denis.


Re: [patch] Add missing test for DECL_NO_INLINE_WARNING_P

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 1:06 PM, Eric Botcazou  wrote:
> Hi,
>
> tree_inlinable_function_p issues the -Winline warning only if
>
>  /* We only warn for functions declared `inline' by the user.  */
>  do_warning = (warn_inline
>                && DECL_DECLARED_INLINE_P (fn)
>                && !DECL_NO_INLINE_WARNING_P (fn)
>                && !DECL_IN_SYSTEM_HEADER (fn));
>
> is true, so in particular only if DECL_NO_INLINE_WARNING_P is not set.  Now
> expand_call_inline also issues the -Winline warning
>
>      else if (warn_inline && DECL_DECLARED_INLINE_P (fn)
>               && !DECL_IN_SYSTEM_HEADER (fn)
>               && reason != CIF_UNSPECIFIED
>               && !lookup_attribute ("noinline", DECL_ATTRIBUTES (fn))
>               /* Do not warn about not inlined recursive calls.  */
>               && !cgraph_edge_recursive_p (cg_edge)
>               /* Avoid warnings during early inline pass. */
>               && cgraph_global_info_ready)
>
> but disregards the DECL_NO_INLINE_WARNING_P flag.
>
> Tested on i586-suse-linux, OK for the mainline?

Ok.

Thanks,
Richard.

>
> 2011-04-19  Eric Botcazou  
>
>        * tree-inline.c (expand_call_inline): Do not issue a -Winline warning
>        if DECL_NO_INLINE_WARNING_P is set on the function.
>
>
> --
> Eric Botcazou
>


[testsuite]: Skip some tests for avr

2011-04-19 Thread Georg-Johann Lay
This patchlet skips some tests for avr because int is just 16 bits there.

Johann

testsuite/

2011-04-19  Georg-Johann Lay  

* gcc.c-torture/compile/pr43191.c: Skip avr due to 16-bit int.
* gcc.dg/torture/pr43165.c: Ditto.
* gcc.dg/torture/pr47228.c: Ditto.
* gcc.dg/tree-ssa/pr45144.c: Ditto.
* gcc.dg/ipa/pr45644.c: Ditto.
Index: gcc.c-torture/compile/pr43191.c
===
--- gcc.c-torture/compile/pr43191.c	(Revision 172597)
+++ gcc.c-torture/compile/pr43191.c	(Arbeitskopie)
@@ -1,4 +1,4 @@
-/* { dg-skip-if "Ints are 16 bits" { "pdp11-*-*" } { "*" } { "" } } */ 
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" "pdp11-*-*" } { "*" } { "" } } */ 
 struct S0
 {
 };
Index: gcc.dg/torture/pr43165.c
===
--- gcc.dg/torture/pr43165.c	(Revision 172597)
+++ gcc.dg/torture/pr43165.c	(Arbeitskopie)
@@ -1,5 +1,6 @@
 /* PR debug/43165 */
 /* { dg-options "-g" } */
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" } { "*" } { "" } } */ 
 
 struct __attribute__((packed)) S
 {
Index: gcc.dg/torture/pr47228.c
===
--- gcc.dg/torture/pr47228.c	(Revision 172597)
+++ gcc.dg/torture/pr47228.c	(Arbeitskopie)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" } { "*" } { "" } } */ 
 
 struct S4
 {
Index: gcc.dg/tree-ssa/pr45144.c
===
--- gcc.dg/tree-ssa/pr45144.c	(Revision 172597)
+++ gcc.dg/tree-ssa/pr45144.c	(Arbeitskopie)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" } { "*" } { "" } } */ 
 
 void baz (unsigned);
 
Index: gcc.dg/ipa/pr45644.c
===
--- gcc.dg/ipa/pr45644.c	(Revision 172597)
+++ gcc.dg/ipa/pr45644.c	(Arbeitskopie)
@@ -1,6 +1,7 @@
 /* Verify that we do not IPA-SRA bitfields.  */
 /* { dg-do run } */
 /* { dg-options "-O2"  } */
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" } { "*" } { "" } } */ 
 
 extern void abort (void);
 


Re: [build, lto] Only accept -fuse-linker-plugin if linker supports -plugin (PR lto/46944)

2011-04-19 Thread Paolo Bonzini

On 04/04/2011 06:15 PM, Rainer Orth wrote:

I haven't found if there are provisions for in-tree gold, though, and
still cannot test that.


In-tree gold definitely works (or should).

Paolo


Re: [testsuite]: Skip some tests for avr

2011-04-19 Thread Hans-Peter Nilsson
On Tue, 19 Apr 2011, Georg-Johann Lay wrote:
> This patchlet skips some tests for avr because int is just 16 bits there.

> 2011-04-19  Georg-Johann Lay  
>
>   * gcc.c-torture/compile/pr43191.c: Skip avr due to 16-bit int.
>   * gcc.dg/torture/pr43165.c: Ditto.
>   * gcc.dg/torture/pr47228.c: Ditto.
>   * gcc.dg/tree-ssa/pr45144.c: Ditto.
>   * gcc.dg/ipa/pr45644.c: Ditto.
>

(Cutnpasted instead of quoting because alpine can't quote
attached text; please inline patches.)

Index: gcc.c-torture/compile/pr43191.c
===
--- gcc.c-torture/compile/pr43191.c (Revision 172597)
+++ gcc.c-torture/compile/pr43191.c (Arbeitskopie)
@@ -1,4 +1,4 @@
-/* { dg-skip-if "Ints are 16 bits" { "pdp11-*-*" } { "*" } { "" } } */
+/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" "pdp11-*-*" } { "*" } { "" } }*/

That pdp11 stanza was a bad precedent.  Please instead use one of

/* { dg-skip-if "Ints are 16 bits" { ! int32plus } { "*" } { "" } } */
or
/* { dg-require-effective-target int32plus } */
or
/* { dg-do X { target int32plus } } */
(where X = {compile, run, assemble})

so we don't have to keep a steadily increasing target list
(decreasing only when targets are removed).

brgds, H-P


Re: Improve stack layout heuristic.

2011-04-19 Thread Michael Matz
Hi,

On Mon, 18 Apr 2011, Easwaran Raman wrote:

> > > @@ -596,7 +581,7 @@
> > >    if (vb->conflicts)
> > >      {
> > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> > > -     add_stack_var_conflict (a, stack_vars[u].representative);
> > > +     add_stack_var_conflict (a, u);
> >
> > Please don't.  This uselessly bloats the conflict bitmaps.
> 
> It is sufficient to add the conflicts of  a variable only when it is
> not merged into some group.

That is correct but is also what the use of stack_vars[u].representative 
achieves alone, ...

> I am adding a check to that effect.

... without any check.

@@ -596,7 +581,8 @@
   if (vb->conflicts)
 {
   EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
-   add_stack_var_conflict (a, stack_vars[u].representative);
+if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
+  add_stack_var_conflict (a, u);
   BITMAP_FREE (vb->conflicts);
 }
 }

What's your objective with this change?  I find the original code clearer.


Ciao,
Michael.

Re: [PATCH v3] Re: avoid useless if-before-free tests

2011-04-19 Thread Michael Matz
Hi,

On Mon, 18 Apr 2011, Janne Blomqvist wrote:

> (Why we, in the age of non-sucky version control, persist in keeping 
> manual changelog files is beyond me..)

The one single reason why I'm very happy about our ChangeLog file policy 
is that I can grep it easily.  Listing the changelog of the whole tree via 
svn (or anything else) up to some years ago is very slow and generates 
quite some load on the server.  Sure there are various work-arounds for 
that, but I'm happy to not have to resolve to such work-arounds.


Ciao,
Michael.


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Anatoly Sokolov

Hi.



+
+/* Implement `ASM_OUTPUT_ALIGNED_DECL_LOCAL' */
+/* Track need of __do_clear_bss */


Put dot and two spaces after the end of a sentence. The same for other 
commens in this patch.



+
+void
+avr_asm_output_aligned_decl_local (FILE * stream, const_tree decl 
ATTRIBUTE_UNUSED,
+   const char *name, unsigned 
HOST_WIDE_INT size,
+   unsigned HOST_WIDE_INT align 
ATTRIBUTE_UNUSED)

   unsigned HOST_WIDE_INT align)


+{
+  avr_need_clear_bss_p = true;
+
+  fputs ("\t.lcomm ", stream);
+  assemble_name (stream, name);
+  fprintf (stream, ",%d\n", (int) size);
  fprintf (stream, " , " HOST_WIDE_INT_PRINT_UNSIGNED ", %u\n", size, 
align / BITS_PER_UNIT);

+}


The same changes in avr_asm_output_aligned_decl_common function.

Anatoly. 



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-19 Thread Richard Earnshaw

On Tue, 2011-04-19 at 15:17 +0400, Denis Chertykov wrote:
> 2011/4/19 Georg-Johann Lay :
> > Denis Chertykov schrieb:
> >> 2011/4/19 Georg-Johann Lay :
> >>> How can add, sub etc. be split? This would need an explicit
> >>> representation of carry.
> >>
> >> Yes.
> >>
> >> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html
> >
> > Just skimmed the conversation. I thought about making AVR ISA's
> > effects on SREG explicit several times, but I always got stuck at some
> > point.
> >
> > - It's not only about scheduling (which does not happen for avr) but
> >  also about moving instructions across jumps.
> >
> > - Many transformations would happen before reload, but at these stages
> > the effect on SREG is not yet known in many cases. There is
> > sophisticated instruction output for many patterns, and their impact
> > on SREG/CC is not known before reload.
> >
> > - Making CC explicit would render many single_set insns to PARALLELs
> > making the optimizers' live much harder or impossible. Imagine
> > instructions that could be combined. Explicit CC would clutter up
> > insns and combine won't try to transform the bulky patterns.
> >
> > - Backend would be much more complicated, harder to maintain and
> > understand. Almost any insn would have to be changed.
> 
> Generally, I'm agree with you, the AVR port uses CC0 because of that.

Thumb-1 support in the ARM compiler has similar flag-setting properties
(ie most instructions set the condition codes).  It doesn't use CC0.  It
works because it doesn't model the condition code register at all, but
treats compare/branch sequences as indivisible operations.

R.




Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-19 Thread Richard Earnshaw

On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> Reload pass tries to determine the stack frame, so it needs to check the
> push/pop lr optimization opportunity. One of the criteria is if there is any
> far jump inside the function. Unfortunately at this time gcc can't decide each
> instruction's length and basic block layout, so it can't know the offset of
> a jump. To be conservative it assumes every jump is a far jump. So any jump
> in a function will prevent this push/pop lr optimization.
> 
> To enable the push/pop lr optimization in reload pass, I compute the possible
> maximum length of the function body. If the length is not large enough, far
> jump is not necessary, so we can safely do push/pop lr optimization.
> 
> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> 
> This patch is for google/main.
> 
> 2011-04-19  Guozhi Wei  
> 
>   Google ref 40255.
>   * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>   (estimate_function_length): New function.
>   (thumb_far_jump_used_p): No far jump is needed in short function.
> 

Setting aside for the moment Richi's issue with hot/cold sections, this
isn't safe.  Firstly get_attr_length() doesn't return the worst case
length; and secondly, it doesn't take into account the size of reload
insns that are still on the reloads stack -- these are only emitted
right at the end of the reload pass.  Both of these would need to be
addressed before this can be safely done.

It's worth noting here that in the dim and distant past we used to try
to estimate the size of the function and eliminate redundant saves of
R14, but the code had to be removed because it was too fragile; but it
looks like some vestiges of the code are still in the compiler.

A slightly less optimistic approach, but one that is much safer is to
scan the function after reload has completed and see if we can avoid
having to push LR.  We can do this if:

- The function makes no calls
- The function saves nothing on the stack other than r14
- It's small enough (by this point we can use get_attr_length)
- R14 is only modified by internal jump instructions

There's already some code to try to do this in the ARM back-end (look
for lr_save_eliminated), but it's probably not doing its job properly
because it tries to cache the result early on (it's costly to work this
out) and at the time its first called we cannot assert that the register
won't be live.

R.





Re: [PATCH][LTO] Fix PR48207, ICE in lhd_set_decl_assembler_name

2011-04-19 Thread Diego Novillo
On Tue, Apr 19, 2011 at 06:51, Richard Guenther  wrote:

> 2011-04-19  Richard Guenther  
>
>        PR lto/48207
>        * tree.c (free_lang_data): Do not reset the decl-assembler-name
>        langhook.
>
>        * g++.dg/lto/pr48207_0.C: New testcase.

OK, if it's only problematic with debug info.


Diego.


Re: Inliner heuristics facelift

2011-04-19 Thread H.J. Lu
On Sun, Apr 17, 2011 at 7:14 AM, Jan Hubicka  wrote:
> Hi,
> there is now tracking PR48636 for fortran inlining issues.  I would suggest
> reporting problems like this there or PR that blocks it instead of sending it
> to different threads so we have track of them.  It is quite important to get
> some understanding what sort of issues we have here.  We might want to collect
> some sort of inlining benchmark like we have for C++.  For the moment I think
> the polyhedron we have is working well, but just having a summary what
> benchmarks needs what --param tweaks and why (i.e. what is important to 
> inline)
> would be useful.
>
> While final testing before comming I noticed that last minue gcc_assert added
> to can_inline checking that gimple stmt and edge cannot_inline_flags are in 
> sync
> (they must be, otherwise we would do invalid optimizations at IPA) is failing
> on one of C++ testcases.
>
> The problem is code in tree-inline that incorrectly handles the flag on
> functions that do have mismatching EH personalities.
>
> This is final version of patch I comitted that fixes this by moving all the 
> logic
> scattered across tree-inline.c into can_inline_edge_p where it belongs and 
> adding
> corresponding CIF codes.  It also turned out that our all_set_cannot_inline 
> are
> not quite up to date after devirtualization and other transforms.  As first 
> cut
> i decided to move the computation into inline summary computation and I think
> i can remove it from gimplifier and value profiler in followup patch.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> Index: ChangeLog
> ===
> *** ChangeLog   (revision 172608)
> --- ChangeLog   (working copy)
> ***
> *** 1,3 
> --- 1,74 
> + 2011-04-17  Jan Hubicka  
> +
> +       * lto-symtab.c (lto_cgraph_replace_node): When call statement is
> +       present, also set gimple_call_set_cannot_inline.
> +       * ipa-inline.c: Update toplevel comment.
> +       (MAX_TIME): Remove.
> +       (cgraph_clone_inlined_nodes): Fix linebreaks.
> +       (cgraph_check_inline_limits): Restructure to ...
> +       (caller_growth_limits): ... this one; be more tolerant
> +       on growth in nested inline chains; add explanatory comment;
> +       fix stack accounting thinko introduced by previous patch.
> +       (cgraph_default_inline_p): Remove.
> +       (report_inline_failed_reason): New function.
> +       (can_inline_edge_p): New function.
> +       (can_early_inline_edge_p): New function.
> +       (leaf_node_p): Move upwards in file.
> +       (want_early_inline_function_p): New function.
> +       (want_inline_small_function_p): New function.
> +       (want_inline_self_recursive_call_p): New function.
> +       (cgraph_edge_badness): Rename to ...
> +       (edge_badness) ... this one; fix linebreaks.
> +       (update_edge_key): Update call of edge_baddness; add
> +       detailed dump about queue updates.
> +       (update_caller_keys): Use can_inline_edge_p and
> +       want_inline_small_function_p.
> +       (cgraph_decide_recursive_inlining): Rename to...
> +       (recursive_inlining): Use can_inline_edge_p and
> +       want_inline_self_recursive_call_p; simplify and
> +       remove no longer valid FIXME.
> +       (cgraph_set_inline_failed): Remove.
> +       (add_new_edges_to_heap): Use can_inline_edge_p and
> +       want_inline_small_function_p.
> +       (cgraph_decide_inlining_of_small_functions): Rename to ...
> +       (inline_small_functions): ... this one; cleanup; use
> +       can/want predicates; cleanup debug ouput; work edges
> +       till fibheap is exhausted and do not stop once unit
> +       growth is reached; remove later loop processing remaining
> +       edges.
> +       (cgraph_flatten): Rename to ...
> +       (flatten_function): ... this one; use can_inline_edge_p
> +       and can_early_inline_edge_p predicates.
> +       (cgraph_decide_inlining): Rename to ...
> +       (ipa_inline): ... this one; remove unreachable nodes before
> +       inlining functions called once; simplify the pass.
> +       (cgraph_perform_always_inlining): Rename to ...
> +       (inline_always_inline_functions): ... this one; use
> +       DECL_DISREGARD_INLINE_LIMITS; use can_inline_edge_p
> +       predicate
> +       (cgraph_decide_inlining_incrementally): Rename to ...
> +       (early_inline_small_functions): ... this one; simplify
> +       using new predicates; cleanup; make dumps prettier.
> +       (cgraph_early_inlining): Rename to ...
> +       (early_inliner): newer inline regular functions into always-inlines;
> +       fix updating of call stmt summaries.
> +       (pass_early_inline): Update for new names.
> +       (inline_transform): Fix formating.
> +       (gate_cgraph_decide_inlining): Rename to ...
> +       (pass_ipa_inline): ... this one.
> +       * ipa-inline.h (inline_summary): Remove disregard_inline_limits.
> +       * ipa-inline-analy

Re: RFA: Gimple calls to "internal" functions

2011-04-19 Thread Richard Sandiford
Richard Guenther  writes:
>> We really should be able to treat calls to pure internal functions
>> like calls to pure "real" functions though.
>>
>> TBH, I think this is an example of why trying to so hard to avoid a tree
>> code for the internal function is working against us.  Most of the patch
>> feels like sprinkling hacks around the code base just because we don't
>> have a real tree representation for what we're calling.  Any new code
>> that handles calls is going to have to remember to make the same
>> distinction.
>
> Most code either doesn't care about the callee or only handles calls
> to a specific function and thus already checks whether gimple_call_fndecl ()
> returns NULL.  The patch doesn't feel like it adds hacks everywhere,
> instead it is quite small (smaller than I thought at least).

OK, just testing your resolve :-)

Here's a revised patch, updated this morning.  Main changes:

* gimple_call_fn and gimple_call_fntype can be called unconditionally.
  They return null for internal functions.

* gimple_call_set_fn, gimple_call_set_fntype and gimple_call_set_fndecl
  assert that the call isn't to an internal function.

* gimple_call_set_internal_fn asserts the opposite.

* verify_gimple_call checks internalness vs. nullness.  This shouldn't
  trigger with the asserts above, but it's there Just In Case.

* I've not moved code around in verify_gimple_call but added null checks
  instead.

* Some other places also check for nullness rather than specifically for
  internal calls.

* gimple_call_same_target_p uses the expression you suggested.

* The expanders now take the gimple call statement rather than the
  tree operands.

* I've made the value-numbering changes you suggested.

I didn't factor out the dumping code because the three cases are
doing different things (one raw dump, one pretty dump to a buffer,
and one pretty dump to a file).

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
* Makefile.in (INTERNAL_FN_DEF, INTERNAL_FN_H): Define.
(GIMPLE_H): Include $(INTERNAL_FN_H).
(OBJS-common): Add internal-fn.o.
(internal-fn.o): New rule.
* internal-fn.def: New file.
* internal-fn.h: Likewise.
* internal-fn.c: Likewise.
* gimple.h: Include internal-fn.h.
(GF_CALL_INTERNAL): New gf_mask.
(gimple_statement_call): Put fntype into a union with a new
internal_fn field.
(gimple_build_call_internal): Declare.
(gimple_build_call_internal_vec): Likewise.
(gimple_call_same_target_p): Likewise.
(gimple_call_internal_p): New function.
(gimple_call_internal_fn): Likewise.
(gimple_call_fntype): Return null for internal calls.
(gimple_call_set_fntype): Assert that the function is not internal.
(gimple_call_set_fn): Likewise.
(gimple_call_set_fndecl): Likewise.
(gimple_call_set_internal_fn): New function.
(gimple_call_addr_fndecl): Handle null functions.
(gimple_call_return_type): Likewise null types.
* gimple.c (gimple_build_call_internal_1): New function.
(gimple_build_call_internal): Likewise.
(gimple_build_call_internal_vec): Likewise.
(gimple_call_same_target_p): Likewise.
(gimple_call_flags): Handle calls to internal functions.
(gimple_call_fnspec): New function.
(gimple_call_arg_flags, gimple_call_return_flags): Use it.
(gimple_has_side_effects): Handle null functions.
(gimple_rhs_has_side_effects): Likewise.
(gimple_call_copy_skip_args): Handle calls to internal functions.
* cfgexpand.c (expand_call_stmt): Likewise.
* expr.c (expand_expr_real_1): Assert that the call isn't internal.
* gimple-fold.c (gimple_fold_call): Handle null functions.
(gimple_fold_stmt_to_constant_1): Don't fold
calls to internal functions.
* gimple-low.c (gimple_check_call_args): Handle calls to internal
functions.
* gimple-pretty-print.c (dump_gimple_call): Likewise.
* ipa-prop.c (ipa_analyze_call_uses): Handle null functions.
* tree-cfg.c (verify_gimple_call): Handle calls to internal functions.
(do_warn_unused_result): Likewise.
* tree-eh.c (same_handler_p): Use gimple_call_same_target_p.
* tree-ssa-ccp.c (ccp_fold_stmt): Handle calls to internal functions.
* tree-ssa-dom.c (hashable_expr): Use the gimple statement to record
the target of a call.
(initialize_hash_element): Update accordingly.
(hashable_expr_equal_p): Use gimple_call_same_target_p.
(iterative_hash_hashable_expr): Handle calls to internal functions.
(print_expr_hash_elt): Likewise.
* tree-ssa-pre.c (can_value_number_call): Likewise.
(eliminate): Handle null functions.
* tree-ssa-sccvn.c (visit_use): Handle calls to internal functions.
* tree-ssa-structalias.c (get_fi_for_callee): Li

[v3] fix libstdc++/48521

2011-04-19 Thread Jonathan Wakely
2011-04-19  Jonathan Wakely  

PR libstdc++/48521
* include/std/type_traits (result_of): Handle pointer to member.
* include/std/functional (__invoke): Likewise.
(_Function_to_function_pointer): Remove.
(_Reference_wrapper_base): Provide nested types independent of
unary_function and binary_function.
(reference_wrapper::operator()): DR 2017.
(ref(const A&&), cref(const A&&): Define as deleted.
* include/std/future (async): Simplify SFINAE and use result_of to
support pointer to member.
* testsuite/20_util/reference_wrapper/invoke.cc: Test pointer to
member.
* testsuite/20_util/reference_wrapper/24803.cc: Likewise.
* testsuite/20_util/reference_wrapper/typedefs.cc: Test for types
instead of derivation from unary_function and binary_function.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust.
* testsuite/20_util/reference_wrapper/invoke-2.cc: New.
* testsuite/20_util/reference_wrapper/ref_neg.c: New.
* testsuite/20_util/reference_wrapper/typedefs-3.c: New.

This resolves a few issues in  which I first noticed
because our std::async didn't work with pointers to member. To fix
that I needed to make result_of support pointer to member, then use
result_of in async, which required changing the enable_if otherwise we
form an invalid result_of expression outside the immediate context of
the type deduction, where SFINAE doesn't apply.

I implemented DR 2017 (reference wrapper makes incorrect use of
result_of) and because we now use a reference to the wrapped type in
the result_of expression we don't need Function_to_function_pointer to
prevent forming an invalid expression.

I made reference_wrapper define the nested
{first_,second_,}argument_type typedefs correctly if the wrapped type
defines them. Previously we only did so if the nested type derived
from {unary,binary}_function but the C++0x draft was changed to stop
depending on those types.

Finally, I've also added missing ref/cref overloads to prevent
creating reference_wrapper to rvalues.

mem_fn doesn't yet support varargs member functions, I'll fix that
when PR c++/48424 is fixed.

Tested x86_64-linux, committed to trunk.  Will wait a day or two and
commit to 4.6 too.
Index: include/std/type_traits
===
--- include/std/type_traits	(revision 172708)
+++ include/std/type_traits	(working copy)
@@ -1583,18 +1583,6 @@
 
   /// underlying_type (still unimplemented)
 
-  /// result_of
-  template
-class result_of;
-
-  template
-struct result_of<_Functor(_ArgTypes...)>
-{
-  typedef
-decltype( std::declval<_Functor>()(std::declval<_ArgTypes>()...) )
-type;
-};
-
   /// declval
   template
 struct __declval_protector
@@ -1612,6 +1600,98 @@
   return __declval_protector<_Tp>::__delegate();
 }
 
+  /// result_of
+  template
+class result_of;
+
+  template
+struct _Result_of_memobj;
+
+  template
+struct _Result_of_memobj<_Res _Class::*, _Arg>
+{
+private:
+  typedef _Res _Class::* _Func;
+
+  template
+	static _Tp _S_get(const _Class&);
+  template
+	static decltype(*std::declval<_Tp>()) _S_get(...);
+
+public:
+  typedef
+decltype(_S_get<_Arg>(std::declval<_Arg>()).*std::declval<_Func>())
+__type;
+};
+
+  template
+struct _Result_of_memfun;
+
+  template
+struct _Result_of_memfun<_Res _Class::*, _Arg, _Args...>
+{
+private:
+  typedef _Res _Class::* _Func;
+
+  template
+	static _Tp _S_get(const _Class&);
+  template
+	static decltype(*std::declval<_Tp>()) _S_get(...);
+
+public:
+  typedef
+decltype((_S_get<_Arg>(std::declval<_Arg>()).*std::declval<_Func>())
+(std::declval<_Args>()...) )
+__type;
+};
+
+  template
+struct _Result_of_impl;
+
+  template
+struct _Result_of_impl
+{
+  typedef
+decltype( std::declval<_Functor>()(std::declval<_ArgTypes>()...) )
+__type;
+};
+
+  template
+struct _Result_of_impl
+: _Result_of_memobj::type, _Arg>
+{
+  typedef typename _Result_of_memobj<
+	typename remove_reference<_MemPtr>::type, _Arg>::__type
+	__type;
+};
+
+  template
+struct _Result_of_impl
+: _Result_of_memfun::type, _Arg,
+_ArgTypes...>
+{
+  typedef typename _Result_of_memfun<
+	typename remove_reference<_MemPtr>::type, _Arg, _ArgTypes...>::__type
+	__type;
+};
+
+  template
+struct result_of<_Functor(_ArgTypes...)>
+: _Result_of_impl::type >::value,
+  is_member_function_pointer<
+			typename remove_reference<_Functor>::type >::value,
+		  _Functor, _ArgTypes...>
+{
+  typedef typename _Result_of_impl<
+	is_member_object_pointer<
+	  typename remove_reference<_Functor>::type >::value,
+is_member_function_point

Re: RFA: Gimple calls to "internal" functions

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 3:14 PM, Richard Sandiford
 wrote:
> Richard Guenther  writes:
>>> We really should be able to treat calls to pure internal functions
>>> like calls to pure "real" functions though.
>>>
>>> TBH, I think this is an example of why trying to so hard to avoid a tree
>>> code for the internal function is working against us.  Most of the patch
>>> feels like sprinkling hacks around the code base just because we don't
>>> have a real tree representation for what we're calling.  Any new code
>>> that handles calls is going to have to remember to make the same
>>> distinction.
>>
>> Most code either doesn't care about the callee or only handles calls
>> to a specific function and thus already checks whether gimple_call_fndecl ()
>> returns NULL.  The patch doesn't feel like it adds hacks everywhere,
>> instead it is quite small (smaller than I thought at least).
>
> OK, just testing your resolve :-)
>
> Here's a revised patch, updated this morning.  Main changes:
>
> * gimple_call_fn and gimple_call_fntype can be called unconditionally.
>  They return null for internal functions.
>
> * gimple_call_set_fn, gimple_call_set_fntype and gimple_call_set_fndecl
>  assert that the call isn't to an internal function.
>
> * gimple_call_set_internal_fn asserts the opposite.
>
> * verify_gimple_call checks internalness vs. nullness.  This shouldn't
>  trigger with the asserts above, but it's there Just In Case.
>
> * I've not moved code around in verify_gimple_call but added null checks
>  instead.
>
> * Some other places also check for nullness rather than specifically for
>  internal calls.
>
> * gimple_call_same_target_p uses the expression you suggested.
>
> * The expanders now take the gimple call statement rather than the
>  tree operands.
>
> * I've made the value-numbering changes you suggested.
>
> I didn't factor out the dumping code because the three cases are
> doing different things (one raw dump, one pretty dump to a buffer,
> and one pretty dump to a file).
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok with me.  Diego, can you have a 2nd eye?

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * Makefile.in (INTERNAL_FN_DEF, INTERNAL_FN_H): Define.
>        (GIMPLE_H): Include $(INTERNAL_FN_H).
>        (OBJS-common): Add internal-fn.o.
>        (internal-fn.o): New rule.
>        * internal-fn.def: New file.
>        * internal-fn.h: Likewise.
>        * internal-fn.c: Likewise.
>        * gimple.h: Include internal-fn.h.
>        (GF_CALL_INTERNAL): New gf_mask.
>        (gimple_statement_call): Put fntype into a union with a new
>        internal_fn field.
>        (gimple_build_call_internal): Declare.
>        (gimple_build_call_internal_vec): Likewise.
>        (gimple_call_same_target_p): Likewise.
>        (gimple_call_internal_p): New function.
>        (gimple_call_internal_fn): Likewise.
>        (gimple_call_fntype): Return null for internal calls.
>        (gimple_call_set_fntype): Assert that the function is not internal.
>        (gimple_call_set_fn): Likewise.
>        (gimple_call_set_fndecl): Likewise.
>        (gimple_call_set_internal_fn): New function.
>        (gimple_call_addr_fndecl): Handle null functions.
>        (gimple_call_return_type): Likewise null types.
>        * gimple.c (gimple_build_call_internal_1): New function.
>        (gimple_build_call_internal): Likewise.
>        (gimple_build_call_internal_vec): Likewise.
>        (gimple_call_same_target_p): Likewise.
>        (gimple_call_flags): Handle calls to internal functions.
>        (gimple_call_fnspec): New function.
>        (gimple_call_arg_flags, gimple_call_return_flags): Use it.
>        (gimple_has_side_effects): Handle null functions.
>        (gimple_rhs_has_side_effects): Likewise.
>        (gimple_call_copy_skip_args): Handle calls to internal functions.
>        * cfgexpand.c (expand_call_stmt): Likewise.
>        * expr.c (expand_expr_real_1): Assert that the call isn't internal.
>        * gimple-fold.c (gimple_fold_call): Handle null functions.
>        (gimple_fold_stmt_to_constant_1): Don't fold
>        calls to internal functions.
>        * gimple-low.c (gimple_check_call_args): Handle calls to internal
>        functions.
>        * gimple-pretty-print.c (dump_gimple_call): Likewise.
>        * ipa-prop.c (ipa_analyze_call_uses): Handle null functions.
>        * tree-cfg.c (verify_gimple_call): Handle calls to internal functions.
>        (do_warn_unused_result): Likewise.
>        * tree-eh.c (same_handler_p): Use gimple_call_same_target_p.
>        * tree-ssa-ccp.c (ccp_fold_stmt): Handle calls to internal functions.
>        * tree-ssa-dom.c (hashable_expr): Use the gimple statement to record
>        the target of a call.
>        (initialize_hash_element): Update accordingly.
>        (hashable_expr_equal_p): Use gimple_call_same_target_p.
>        (iterative_hash_hashable_expr): Handle calls to internal functions.
>        (print_expr_hash_elt): L

Re: RFA: Gimple calls to "internal" functions

2011-04-19 Thread Diego Novillo

On 04/19/2011 09:14 AM, Richard Sandiford wrote:


Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?


OK with me.  Thanks for your patience.  Just a couple of minor nits below



+  extern const int internal_fn_flags_array[];
+  return internal_fn_flags_array[(int) fn];
+}
+
+extern void expand_internal_call (gimple stmt);


s/stmt//


  static inline tree
  gimple_call_addr_fndecl (const_tree fn)
  {
-  if (TREE_CODE (fn) == ADDR_EXPR)
+  if (fn&&  TREE_CODE (fn) == ADDR_EXPR)


Can't tell if thunderbird is being silly about spaces again or you are 
missing a space before '&&'.  I see it in a couple other places below, 
so I think it's just TB.



Diego.


[patch] Do not generate discriminator directive in strict mode

2011-04-19 Thread Eric Botcazou
Hi,

it appears that the (standard DWARF as of version 4) discriminator directive 
can confuse non-GDB DWARF 2/3 debuggers, so this patch changes the compiler 
to stop emitting it in strict mode.

Tested on i586-suse-linux, OK for the mainline?


2011-04-19  Eric Botcazou  

* dwarf2out.c (dwarf2out_source_line): Emit "discriminator" directive
only for version 4 or above, or else in non-strict mode.


-- 
Eric Botcazou
Index: dwarf2out.c
===
--- dwarf2out.c	(revision 172693)
+++ dwarf2out.c	(working copy)
@@ -22232,7 +22240,9 @@ dwarf2out_source_line (unsigned int line
   fprintf (asm_out_file, "\t.loc %d %d 0", file_num, line);
   if (is_stmt != table->is_stmt)
 	fprintf (asm_out_file, " is_stmt %d", is_stmt ? 1 : 0);
-  if (SUPPORTS_DISCRIMINATOR && discriminator != 0)
+  if (SUPPORTS_DISCRIMINATOR
+	  && discriminator != 0
+	  && (dwarf_version >= 4 || !dwarf_strict))
 	fprintf (asm_out_file, " discriminator %d", discriminator);
   fputc ('\n', asm_out_file);
 }


[PATCH] Don't use ./tmp0 for site.exp generation

2011-04-19 Thread Richard Guenther

This patch sits in all my development trees because delta uses
tmp? named directories for storing intermediate files.  This causes
a make check to fail.

Maybe there is an even better way to create a truly temporary file
name (that is even portable), but the following patch simply avoids
using tmp0 in favor of tmpDG.

Bootstrap and regtest running on x86_64-unknown-linux-gnu (not that
I expect any problems).

Ok for trunk?

Thanks,
Richard.

2011-04-19  Richard Guenther  

* Makefile.in (site.exp): Do not use tmp0 but tmpDG as temporary
file name.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 172709)
+++ gcc/Makefile.in (working copy)
@@ -4837,39 +4837,39 @@ target_subdir = @target_subdir@
 
 site.exp: ./config.status Makefile
@echo "Making a new config file..."
-   -@rm -f ./tmp?
+   -@rm -f ./tmpDG
@$(STAMP) site.exp
-@mv site.exp site.bak
-   @echo "## these variables are automatically generated by make ##" > 
./tmp0
-   @echo "# Do not edit here. If you wish to override these values" >> 
./tmp0
-   @echo "# add them to the last section" >> ./tmp0
-   @echo "set rootme \"`${PWD_COMMAND}`\"" >> ./tmp0
-   @echo "set srcdir \"`cd ${srcdir}; ${PWD_COMMAND}`\"" >> ./tmp0
-   @echo "set host_triplet $(host)" >> ./tmp0
-   @echo "set build_triplet $(build)" >> ./tmp0
-   @echo "set target_triplet $(target)" >> ./tmp0
-   @echo "set target_alias $(target_noncanonical)" >> ./tmp0
-   @echo "set libiconv \"$(LIBICONV)\"" >> ./tmp0
+   @echo "## these variables are automatically generated by make ##" > 
./tmpDG
+   @echo "# Do not edit here. If you wish to override these values" >> 
./tmpDG
+   @echo "# add them to the last section" >> ./tmpDG
+   @echo "set rootme \"`${PWD_COMMAND}`\"" >> ./tmpDG
+   @echo "set srcdir \"`cd ${srcdir}; ${PWD_COMMAND}`\"" >> ./tmpDG
+   @echo "set host_triplet $(host)" >> ./tmpDG
+   @echo "set build_triplet $(build)" >> ./tmpDG
+   @echo "set target_triplet $(target)" >> ./tmpDG
+   @echo "set target_alias $(target_noncanonical)" >> ./tmpDG
+   @echo "set libiconv \"$(LIBICONV)\"" >> ./tmpDG
 # CFLAGS is set even though it's empty to show we reserve the right to set it.
-   @echo "set CFLAGS \"\"" >> ./tmp0
-   @echo "set CXXFLAGS \"\"" >> ./tmp0
-   @echo "set HOSTCC \"$(CC)\"" >> ./tmp0
-   @echo "set HOSTCFLAGS \"$(CFLAGS)\"" >> ./tmp0
+   @echo "set CFLAGS \"\"" >> ./tmpDG
+   @echo "set CXXFLAGS \"\"" >> ./tmpDG
+   @echo "set HOSTCC \"$(CC)\"" >> ./tmpDG
+   @echo "set HOSTCFLAGS \"$(CFLAGS)\"" >> ./tmpDG
 # When running the tests we set GCC_EXEC_PREFIX to the install tree so that
 # files that have already been installed there will be found.  The -B option
 # overrides it, so use of GCC_EXEC_PREFIX will not result in using GCC files
 # from the install tree.
-   @echo "set TEST_GCC_EXEC_PREFIX \"$(libdir)/gcc/\"" >> ./tmp0
-   @echo "set TESTING_IN_BUILD_TREE 1" >> ./tmp0
-   @echo "set HAVE_LIBSTDCXX_V3 1" >> ./tmp0
+   @echo "set TEST_GCC_EXEC_PREFIX \"$(libdir)/gcc/\"" >> ./tmpDG
+   @echo "set TESTING_IN_BUILD_TREE 1" >> ./tmpDG
+   @echo "set HAVE_LIBSTDCXX_V3 1" >> ./tmpDG
@if test "@enable_plugin@" = "yes" ; then \
- echo "set ENABLE_PLUGIN 1" >> ./tmp0; \
- echo "set PLUGINCC \"$(PLUGINCC)\"" >> ./tmp0; \
- echo "set PLUGINCFLAGS \"$(PLUGINCFLAGS)\"" >> ./tmp0; \
- echo "set GMPINC \"$(GMPINC)\"" >> ./tmp0; \
+ echo "set ENABLE_PLUGIN 1" >> ./tmpDG; \
+ echo "set PLUGINCC \"$(PLUGINCC)\"" >> ./tmpDG; \
+ echo "set PLUGINCFLAGS \"$(PLUGINCFLAGS)\"" >> ./tmpDG; \
+ echo "set GMPINC \"$(GMPINC)\"" >> ./tmpDG; \
fi
@if test "@enable_lto@" = "yes" ; then \
- echo "set ENABLE_LTO 1" >> ./tmp0; \
+ echo "set ENABLE_LTO 1" >> ./tmpDG; \
fi
 # If newlib has been configured, we need to pass -B to gcc so it can find
 # newlib's crt0.o if it exists.  This will cause a "path prefix not used"
@@ -4882,36 +4882,36 @@ site.exp: ./config.status Makefile
 # We also need to pass -L ../ld so that the linker can find ldscripts.
@if [ -d $(objdir)/../$(target_subdir)/newlib ] \
&& [ "${host}" != "${target}" ]; then \
- echo "set newlib_cflags 
\"-I$(objdir)/../$(target_subdir)/newlib/targ-include 
-I\$$srcdir/../newlib/libc/include\"" >> ./tmp0; \
- echo "set newlib_ldflags \"-B$(objdir)/../$(target_subdir)/newlib/\"" 
>> ./tmp0; \
- echo "append CFLAGS \" \$$newlib_cflags\"" >> ./tmp0; \
- echo "append CXXFLAGS \" \$$newlib_cflags\"" >> ./tmp0; \
- echo "append LDFLAGS \" \$$newlib_ldflags\"" >> ./tmp0; \
+ echo "set newlib_cflags 
\"-I$(objdir)/../$(target_subdir)/newlib/targ-include 
-I\$$srcdir/../newlib/libc/include\"" >> ./tmpDG; \
+   

Re: [PATCH] Don't use ./tmp0 for site.exp generation

2011-04-19 Thread Rainer Orth
Richard Guenther  writes:

> This patch sits in all my development trees because delta uses
> tmp? named directories for storing intermediate files.  This causes
> a make check to fail.
>
> Maybe there is an even better way to create a truly temporary file
> name (that is even portable), but the following patch simply avoids
> using tmp0 in favor of tmpDG.

Perhaps it would be better to use site.tmp to match current git
automake?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Don't use ./tmp0 for site.exp generation

2011-04-19 Thread Richard Guenther
On Tue, 19 Apr 2011, Rainer Orth wrote:

> Richard Guenther  writes:
> 
> > This patch sits in all my development trees because delta uses
> > tmp? named directories for storing intermediate files.  This causes
> > a make check to fail.
> >
> > Maybe there is an even better way to create a truly temporary file
> > name (that is even portable), but the following patch simply avoids
> > using tmp0 in favor of tmpDG.
> 
> Perhaps it would be better to use site.tmp to match current git
> automake?

I can certainly use any other temporary name, just the tmp0 use
is annoying ;)  It's a matter of a single search-and-replace.

Richard.


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Jan Hubicka
> I thought the idea was to use __builtin_va_arg_pack and friends.
> Of course the inliner would still need to know how to inline such
> a va-arg forwarder, and we would need a way to expand them (well,
> or just go the existing special casing).  We might need this
> kind of inliner support anyway because of the partial inlining
> opportunity in libquantum which is for a varargs function like
> 
> void foo (...)
> {
>   if (always-zero-static)
> return;
> 
>   ...
> }
> 
> thus partial inlining would create such a forwarding call.

Yep, I know this is related issue.  We have builtlin_apply and
builtin_va_arg_pack. Neither of those solves the problem here.

With __builtin_va_arg_pack the thunk would be something like:
int b(int *this, ...);

int a(int *this, ...)
{ 
  return b(this+1, __builtin_va_arg_pack ());
}
and similarly for your libquantum testcase. However one gets:
jh@gcc10:~/trunk/build/gcc$ gcc -O2 t.c
t.c: In function 'a':
t.c:5: error: invalid use of '__builtin_va_arg_pack ()'
becuase va_arg_pack works only when eliminated by the inliner.

Consequentely thunks would be forced to be always inline and this contradits
their purpose to be addressed from virtual tables.

So we would need to develop equivalent of va_arg_pack that works
when not inlining. It can not be implemented generally, only for tail
calls and making nice interface for this seems tricky.

__builtin_apply don't fit here because it 
 1) produces lousy code
 2) requires knowledge about size of argument block
 3) doesn't let you to modify one of arguments.

We could go by deriving __builtlin_return into somethign like:

int a(int *this, ...)
{ 
  __builtin_return_va_arg_pack (b, this + 1)
}

that would have semantic of tail calling the function with the same arguments 
as the function was called with
with overwritting the arguments it is given that would be required to match the 
first few named arguments
of the function.

Since this construct is departed enough from gimple_call we probably would want
to have something like gimple_tailcall_va_arg_pack or so.  We will need to
develop expand machinery - I am not sure one can implement expander of this
without introducing new macros. Maybe it is - it essentially means to load
argument registers from original argument registers and tailcall and even for
funnier architectures I think one can just copy what calls.c does already for
tailcalls with the slight semantics changes...
I am not really aware with all details of all calling conventions we support to
be sure however.

I am not really sure if it is really prettier than the irritating special case
in cgraph, however?
> 
> But well, it's just my random 2 cents on the issue ;)
:)

Honza
> 
> Richard.


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Richard Guenther
On Tue, 19 Apr 2011, Jan Hubicka wrote:

> > I thought the idea was to use __builtin_va_arg_pack and friends.
> > Of course the inliner would still need to know how to inline such
> > a va-arg forwarder, and we would need a way to expand them (well,
> > or just go the existing special casing).  We might need this
> > kind of inliner support anyway because of the partial inlining
> > opportunity in libquantum which is for a varargs function like
> > 
> > void foo (...)
> > {
> >   if (always-zero-static)
> > return;
> > 
> >   ...
> > }
> > 
> > thus partial inlining would create such a forwarding call.
> 
> Yep, I know this is related issue.  We have builtlin_apply and
> builtin_va_arg_pack. Neither of those solves the problem here.
> 
> With __builtin_va_arg_pack the thunk would be something like:
> int b(int *this, ...);
> 
> int a(int *this, ...)
> { 
>   return b(this+1, __builtin_va_arg_pack ());
> }
> and similarly for your libquantum testcase. However one gets:
> jh@gcc10:~/trunk/build/gcc$ gcc -O2 t.c
> t.c: In function 'a':
> t.c:5: error: invalid use of '__builtin_va_arg_pack ()'
> becuase va_arg_pack works only when eliminated by the inliner.
> 
> Consequentely thunks would be forced to be always inline and this contradits
> their purpose to be addressed from virtual tables.
> 
> So we would need to develop equivalent of va_arg_pack that works
> when not inlining. It can not be implemented generally, only for tail
> calls and making nice interface for this seems tricky.
> 
> __builtin_apply don't fit here because it 
>  1) produces lousy code
>  2) requires knowledge about size of argument block
>  3) doesn't let you to modify one of arguments.
> 
> We could go by deriving __builtlin_return into somethign like:
> 
> int a(int *this, ...)
> { 
>   __builtin_return_va_arg_pack (b, this + 1)
> }
> 
> that would have semantic of tail calling the function with the same arguments 
> as the function was called with
> with overwritting the arguments it is given that would be required to match 
> the first few named arguments
> of the function.
> 
> Since this construct is departed enough from gimple_call we probably would 
> want
> to have something like gimple_tailcall_va_arg_pack or so.  We will need to
> develop expand machinery - I am not sure one can implement expander of this
> without introducing new macros. Maybe it is - it essentially means to load
> argument registers from original argument registers and tailcall and even for
> funnier architectures I think one can just copy what calls.c does already for
> tailcalls with the slight semantics changes...
> I am not really aware with all details of all calling conventions we support 
> to
> be sure however.
> 
> I am not really sure if it is really prettier than the irritating special case
> in cgraph, however?

Well, even for the partial inlining case I'd devise a scheme that
if a call to such a forwarder remains it gets expanded as a call to
the original (non-split) function (similar to emitting a call to
the asm-thunk instead of expanding the gimple representation of the 
thunk).  I realize this is another special case, but I like it more
as it appears to be more flexible to me ;)  (it's similar to
the redefined extern inline function case, one function body for
inlining and another one for call targets)

> > But well, it's just my random 2 cents on the issue ;)
> :)
:)

Richard.


Fix hot/cold code in ipa-cp

2011-04-19 Thread Jan Hubicka
Hi,
while removing use of optimize_function_for_size_p from the ipa-inliner (since 
we already
check cgraph_maybe_hot_edge_p that include the test), I noticed that
optimize_function_for_size_p (DECL_STRUCT_FUNCTION (node->decl)) won't give the 
expected
results on WPA when DECL_STRUCT_FUNCTION is NULL.
Fixed by adding cgraph_optimize_for_size_p.

Bootstrapped/regtested x86_64-linux.

Honza
* cgraph.h (cgraph_optimize_for_size_p): Declare.
* ipa-cp.c (ipcp_insert_stage): Use cgraph_optimize_for_size_p.
* predict.c (cgraph_optimize_for_size_p): Break out from ...
(optimize_function_for_size_p) ... here.
Index: cgraph.h
===
*** cgraph.h(revision 172709)
--- cgraph.h(working copy)
*** bool cgraph_comdat_can_be_unshared_p (st
*** 656,661 
--- 656,662 
  
  /* In predict.c  */
  bool cgraph_maybe_hot_edge_p (struct cgraph_edge *e);
+ bool cgraph_optimize_for_size_p (struct cgraph_node *);
  
  /* In varpool.c  */
  extern GTY(()) struct varpool_node *varpool_nodes_queue;
Index: ipa-cp.c
===
*** ipa-cp.c(revision 172709)
--- ipa-cp.c(working copy)
*** ipcp_insert_stage (void)
*** 1410,1416 
if (new_size + growth > max_new_size)
break;
if (growth
! && optimize_function_for_size_p (DECL_STRUCT_FUNCTION (node->decl)))
{
  if (dump_file)
fprintf (dump_file, "Not versioning, cold code would grow");
--- 1410,1416 
if (new_size + growth > max_new_size)
break;
if (growth
! && cgraph_optimize_for_size_p (node))
{
  if (dump_file)
fprintf (dump_file, "Not versioning, cold code would grow");
Index: predict.c
===
*** predict.c   (revision 172709)
--- predict.c   (working copy)
*** maybe_hot_edge_p (edge e)
*** 196,202 
--- 196,204 
return maybe_hot_frequency_p (EDGE_FREQUENCY (e));
  }
  
+ 
  /* Return true in case BB is probably never executed.  */
+ 
  bool
  probably_never_executed_bb_p (const_basic_block bb)
  {
*** probably_never_executed_bb_p (const_basi
*** 209,214 
--- 211,229 
return false;
  }
  
+ /* Return true if NODE should be optimized for size.  */
+ 
+ bool
+ cgraph_optimize_for_size_p (struct cgraph_node *node)
+ {
+   if (optimize_size)
+ return true;
+   if (node && (node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
+ return true;
+   else
+ return false;
+ }
+ 
  /* Return true when current function should always be optimized for size.  */
  
  bool
*** optimize_function_for_size_p (struct fun
*** 220,230 
  return true;
if (!fun || !fun->decl)
  return false;
!   node = cgraph_get_node (fun->decl);
!   if (node && (node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
! return true;
!   else
! return false;
  }
  
  /* Return true when current function should always be optimized for speed.  */
--- 235,241 
  return true;
if (!fun || !fun->decl)
  return false;
!   return cgraph_optimize_for_size_p (cgraph_get_node (fun->decl));
  }
  
  /* Return true when current function should always be optimized for speed.  */


Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

2011-04-19 Thread Jan Hubicka
Hi,
> 
> Well, even for the partial inlining case I'd devise a scheme that
> if a call to such a forwarder remains it gets expanded as a call to
> the original (non-split) function (similar to emitting a call to
> the asm-thunk instead of expanding the gimple representation of the 
> thunk).  I realize this is another special case, but I like it more

For partial inlining I do have plan.  I intend to simply mark edges of calls
containing __builtin_va_arg_pack argument, arrange it to disappear when
function gets inlinined and make inliner to do final pass after inlining small
function inlining all those calls.

This will work nicely for partial inlining - when all header functions gets
inlined, the offline copy will remain.  When header won't get inlined, the
function will be merged again.

When some of them gets inlined, the partially inlined function will get
unnecesiraly dupicated, but just twice that is not bad for this weird side case
we speak about.

It won't help thunks where offline copy may neccesarily exist and we already
have thunks to avoid that duplication of callee function body after all.

I think we have options:

  1) keep them "non functions" in the callgraph.  Then we need
 a) I) make inliner aware of expanding the thunk whenever we decide to 
inline
   and call edge that "goes through thunk".
OR
II) make the direct calls to thunks not appearing in our IL and being
directly folded into inline code of the thunk
 b) teach ipa-prop to compute jump functions correctly for thunked edges:
for devirtualization happening at IPA we will neccesarily need represent
those edges without having the adjustment visible at analysis time.
  2) make thunk gimple representible.  For this we need
 a) introduce gimple representation of thunk tail call, i.e. 
gimple_tailcall_va_arg_pack or something
 b) make gimple passes to understand it.
 c) I) invent expansion machinery
OR
II) teach IPA passes to special case bodies of thunk and do not change 
them so current
ASM machinery stays around
 d) teach ipa-inline and ipa-prop to handle this new type of call and 
propagate through.

Martin implemented 1-II variant since. It doesn't seem that much worse than
together alternatives to me and is significandly easier to implement.

Main drawback is the need for nice cgraph representation.  Representing thunks
completely on side as we do now has turned out to be confusing and not really
pretty. So we want thunks appear as full fledged cgraph node with call edge to
function they are thunking.

This will neccesarily impose some extra work on IPA passes since they will not
be able to easilly expect that everything in cgraph is a function.  But we have
precisely the same problem with alias and unless we want to go for aliases
represented as gimple bodies symmetric to "empty thunks", we will need this
special case anyway.

As for alias/thunk, I plan to make them symbol table entries and make callgraph
edges to have "caller" and "target". Target of call is the symbol called, not
neccesarily function.  Then I plan to have cgraph_edge_callee accesor that will
look into real target since it is what majority of IPA code cares about.

Also we will need to have cgraph_edge_callee_body_visibility instead of global
cgraph_body_visibility we have now since it depends if you call through alias
that can be overwritten. We can also have function that will return the thunk
adjustments happening on the edge IMO. Since IPA cares about this on two places
(computation of jump function and inlining), it give relatively low pain.

With this API, 1-a-I or 1-a-II would not be too ugly. Still combination of both
makes most sense for me: there is little to lose by inserting the thunk code as
soon as we devirutalize during local optimization and if we won't do so we will
just keep missing optimizations. We will need II to nicely handle case when
devirutalization happens at WPA.

2-I variant is probably most generic and will solve the aforementioned partial
inlining better in a sense that we will not need to duplicate function body in
the case variadic function gets partially inlined only into some of its
callees. At the same time it seems quite invasive and IMO have very little 
pratical
benefit.  I also don't know if having extra exotic gimple construct to worry
about really makes compiler more maintainable.

Honza

> as it appears to be more flexible to me ;)  (it's similar to
> the redefined extern inline function case, one function body for
> inlining and another one for call targets)
> 
> > > But well, it's just my random 2 cents on the issue ;)
> > :)
> :)
> 
> Richard.


Re: [PATCH][ARM] New testcases for NEON

2011-04-19 Thread Richard Earnshaw
On Wed, 2011-04-13 at 16:59 +0100, Andrew Stubbs wrote:
> Hi,
> 
> This old patch has been carried in the CodeSourcery toolchain for some 
> time now. It just adds a few new testcases for vectorization.
> 
> OK?
> 
> Andrew

OK.

2008-12-03  Daniel Jacobowitz  

gcc/testsuite/
* gcc.dg/vect/vect-shift-3.c, gcc.dg/vect/vect-shift-4.c: New.
* lib/target-supports.exp (check_effective_target_vect_shift_char): New
function.

R.





Make vectoriser use operand_equal_p to compare base addresses

2011-04-19 Thread Richard Sandiford
In the attached testcase, we treat:

sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];

as being two independent strided loads: x[i][0] and x[i][1].
On targets with appropriate support, we therefore use two interleaved
loads rather than one, then discard one half of each result.
This does not happen if bar() is compiled on its own.

The code before vectorisation looks like this:

:
  i.0_5 = (unsigned int) i_32;
  D.3532_6 = i.0_5 * 320;
  D.3533_8 = x_7(D) + D.3532_6;

:
  i.1_14 = (unsigned int) i_34;
  D.3559_15 = i.1_14 * 8;
  D.3558_16 = D.3533_8 + D.3559_15;
  D.3557_17 = *D.3558_16[0];
  D.3555_19 = *D.3558_16[1];
  ...

The two DR_BASE_ADDRESSES are therefore the expansion of D.3558_8:

x_7(D) + (unsigned int) i_32 * 320

which we don't think are equal.  The comment in tree-data-refs.h
made me think that (unsigned int) i_32 * 320 ought really to be
in the DR_OFFSET, but that only seems to apply to offsets that
occur in the memory reference itself.

Fixed by using operand_equal_p to compare base addresses.
Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

I've not made the test dependent on a target selector because
it seemed like the message should be printed regardless.

Richard


gcc/
* tree-vect-data-refs.c (vect_drs_dependent_in_basic_block): Use
operand_equal_p to compare DR_BASE_ADDRESSes.
(vect_check_interleaving): Likewise.

gcc/testsuite/
* gcc.dg/vect/vect-119.c: New test.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2011-04-19 14:59:58.0 +0100
+++ gcc/tree-vect-data-refs.c   2011-04-19 14:59:58.0 +0100
@@ -353,11 +353,7 @@ vect_drs_dependent_in_basic_block (struc
 
   /* Check that the data-refs have same bases and offsets.  If not, we can't
  determine if they are dependent.  */
-  if ((DR_BASE_ADDRESS (dra) != DR_BASE_ADDRESS (drb)
-   && (TREE_CODE (DR_BASE_ADDRESS (dra)) != ADDR_EXPR
-   || TREE_CODE (DR_BASE_ADDRESS (drb)) != ADDR_EXPR
-   || TREE_OPERAND (DR_BASE_ADDRESS (dra), 0)
-   != TREE_OPERAND (DR_BASE_ADDRESS (drb),0)))
+  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
   || !dr_equal_offsets_p (dra, drb))
 return true;
 
@@ -403,11 +399,7 @@ vect_check_interleaving (struct data_ref
 
   /* Check that the data-refs have same first location (except init) and they
  are both either store or load (not load and store).  */
-  if ((DR_BASE_ADDRESS (dra) != DR_BASE_ADDRESS (drb)
-   && (TREE_CODE (DR_BASE_ADDRESS (dra)) != ADDR_EXPR
-  || TREE_CODE (DR_BASE_ADDRESS (drb)) != ADDR_EXPR
-  || TREE_OPERAND (DR_BASE_ADDRESS (dra), 0)
-  != TREE_OPERAND (DR_BASE_ADDRESS (drb),0)))
+  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
   || !dr_equal_offsets_p (dra, drb)
   || !tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb))
   || DR_IS_READ (dra) != DR_IS_READ (drb))
Index: gcc/testsuite/gcc.dg/vect/vect-119.c
===
--- /dev/null   2011-03-23 08:42:11.268792848 +
+++ gcc/testsuite/gcc.dg/vect/vect-119.c2011-04-19 14:59:58.0 
+0100
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+
+#define OUTER 32
+#define INNER 40
+
+static unsigned int
+bar (const unsigned int x[INNER][2], unsigned int sum)
+{
+  int i;
+
+  for (i = 0; i < INNER; i++)
+sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
+  return sum;
+}
+
+unsigned int foo (const unsigned int x[OUTER][INNER][2])
+{
+  int i;
+  unsigned int sum;
+
+  sum = 0.0f;
+  for (i = 0; i < OUTER; i++)
+sum = bar (x[i], sum);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "Detected interleaving of size 2" 1 
"vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


Re: [PATCH v3] Re: avoid useless if-before-free tests

2011-04-19 Thread Jim Meyering
Michael Matz wrote:
> On Mon, 18 Apr 2011, Janne Blomqvist wrote:
>
>> (Why we, in the age of non-sucky version control, persist in keeping
>> manual changelog files is beyond me..)
>
> The one single reason why I'm very happy about our ChangeLog file policy
> is that I can grep it easily.  Listing the changelog of the whole tree via
> svn (or anything else) up to some years ago is very slow and generates
> quite some load on the server.  Sure there are various work-arounds for
> that, but I'm happy to not have to resolve to such work-arounds.

I doubt anyone would be happy to get rid of ChangeLog files altogether.
In coreutils and other projects, we have a compromise.
There is no VC'd ChangeLog file, yet every release tarball includes
the expected (and up to date) ChangeLog file.  How?

It generates that ChangeLog file from the git commit log.
Currently it does it only when creating a distribution tarball
but it'd be easy to add a target to recreate all (or a subset of)
generated ChangeLog files

Doing something similar for gcc would be trivial
if you'd be ok to switch to a single top-level ChangeLog file,
but I doubt folks would go for that.
However, it wouldn't be much more work to automatically
generated ChangeLog entries for all 44 of the existing files.

... in case anyone ever wants to go that route.


[PATCH] Disallow i?86/x86_64 movstrict for subregs of non-MODE_INT regs (PR target/48678)

2011-04-19 Thread Jakub Jelinek
Hi!

Since MEM_REF has been added, we can unlike in 4.5 and earlier, arbitrary
VCEs on the LHS.  Reload isn't able to reload
(strict_low_part (subreg:HI (reg:V2DI ...)))
on the LHS, while it probably should be taught to do that, such
movstrict[qh]i IMHO will only very rarely lead to good code,
unless pinsrw/pinsrq can be used (currently we have not a suitable
movstrict* pattern using pinsr*).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?

2011-04-19  Jakub Jelinek  

PR target/48678
* config/i386/i386.md (movstrict): FAIL if operands[0]
is a SUBREG with non-MODE_INT mode inside of it.

* gcc.target/i386/pr48678.c: New test.

--- gcc/config/i386/i386.md.jj  2011-04-06 16:33:26.0 +0200
+++ gcc/config/i386/i386.md 2011-04-19 14:08:55.0 +0200
@@ -2427,6 +2427,9 @@ (define_expand "movstrict"
 {
   if (TARGET_PARTIAL_REG_STALL && optimize_function_for_speed_p (cfun))
 FAIL;
+  if (GET_CODE (operands[0]) == SUBREG
+  && GET_MODE_CLASS (GET_MODE (SUBREG_REG (operands[0]))) != MODE_INT)
+FAIL;
   /* Don't generate memory->memory moves, go through a register */
   if (MEM_P (operands[0]) && MEM_P (operands[1]))
 operands[1] = force_reg (mode, operands[1]);
--- gcc/testsuite/gcc.target/i386/pr48678.c.jj  2011-04-19 14:02:33.0 
+0200
+++ gcc/testsuite/gcc.target/i386/pr48678.c 2011-04-19 14:07:08.0 
+0200
@@ -0,0 +1,16 @@
+/* PR target/48678 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#include 
+
+typedef short T __attribute__((may_alias));
+struct S { __m128i d; };
+
+__m128i
+foo (short *x, struct S *y, __m128i *z)
+{
+  struct S s = *y;
+  ((T *) &s.d)[0] = *x;
+  return _mm_cmpeq_epi16 (s.d, *z);
+}

Jakub


Re: Make vectoriser use operand_equal_p to compare base addresses

2011-04-19 Thread Richard Guenther
On Tue, Apr 19, 2011 at 4:59 PM, Richard Sandiford
 wrote:
> In the attached testcase, we treat:
>
>    sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
>
> as being two independent strided loads: x[i][0] and x[i][1].
> On targets with appropriate support, we therefore use two interleaved
> loads rather than one, then discard one half of each result.
> This does not happen if bar() is compiled on its own.
>
> The code before vectorisation looks like this:
>
> :
>  i.0_5 = (unsigned int) i_32;
>  D.3532_6 = i.0_5 * 320;
>  D.3533_8 = x_7(D) + D.3532_6;
>
> :
>  i.1_14 = (unsigned int) i_34;
>  D.3559_15 = i.1_14 * 8;
>  D.3558_16 = D.3533_8 + D.3559_15;
>  D.3557_17 = *D.3558_16[0];
>  D.3555_19 = *D.3558_16[1];
>  ...
>
> The two DR_BASE_ADDRESSES are therefore the expansion of D.3558_8:
>
>    x_7(D) + (unsigned int) i_32 * 320
>
> which we don't think are equal.  The comment in tree-data-refs.h
> made me think that (unsigned int) i_32 * 320 ought really to be
> in the DR_OFFSET, but that only seems to apply to offsets that
> occur in the memory reference itself.
>
> Fixed by using operand_equal_p to compare base addresses.
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>
> I've not made the test dependent on a target selector because
> it seemed like the message should be printed regardless.

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * tree-vect-data-refs.c (vect_drs_dependent_in_basic_block): Use
>        operand_equal_p to compare DR_BASE_ADDRESSes.
>        (vect_check_interleaving): Likewise.
>
> gcc/testsuite/
>        * gcc.dg/vect/vect-119.c: New test.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2011-04-19 14:59:58.0 +0100
> +++ gcc/tree-vect-data-refs.c   2011-04-19 14:59:58.0 +0100
> @@ -353,11 +353,7 @@ vect_drs_dependent_in_basic_block (struc
>
>   /* Check that the data-refs have same bases and offsets.  If not, we can't
>      determine if they are dependent.  */
> -  if ((DR_BASE_ADDRESS (dra) != DR_BASE_ADDRESS (drb)
> -       && (TREE_CODE (DR_BASE_ADDRESS (dra)) != ADDR_EXPR
> -           || TREE_CODE (DR_BASE_ADDRESS (drb)) != ADDR_EXPR
> -           || TREE_OPERAND (DR_BASE_ADDRESS (dra), 0)
> -           != TREE_OPERAND (DR_BASE_ADDRESS (drb),0)))
> +  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
>       || !dr_equal_offsets_p (dra, drb))
>     return true;
>
> @@ -403,11 +399,7 @@ vect_check_interleaving (struct data_ref
>
>   /* Check that the data-refs have same first location (except init) and they
>      are both either store or load (not load and store).  */
> -  if ((DR_BASE_ADDRESS (dra) != DR_BASE_ADDRESS (drb)
> -       && (TREE_CODE (DR_BASE_ADDRESS (dra)) != ADDR_EXPR
> -          || TREE_CODE (DR_BASE_ADDRESS (drb)) != ADDR_EXPR
> -          || TREE_OPERAND (DR_BASE_ADDRESS (dra), 0)
> -          != TREE_OPERAND (DR_BASE_ADDRESS (drb),0)))
> +  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
>       || !dr_equal_offsets_p (dra, drb)
>       || !tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb))
>       || DR_IS_READ (dra) != DR_IS_READ (drb))
> Index: gcc/testsuite/gcc.dg/vect/vect-119.c
> ===
> --- /dev/null   2011-03-23 08:42:11.268792848 +
> +++ gcc/testsuite/gcc.dg/vect/vect-119.c        2011-04-19 14:59:58.0 
> +0100
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +
> +#define OUTER 32
> +#define INNER 40
> +
> +static unsigned int
> +bar (const unsigned int x[INNER][2], unsigned int sum)
> +{
> +  int i;
> +
> +  for (i = 0; i < INNER; i++)
> +    sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
> +  return sum;
> +}
> +
> +unsigned int foo (const unsigned int x[OUTER][INNER][2])
> +{
> +  int i;
> +  unsigned int sum;
> +
> +  sum = 0.0f;
> +  for (i = 0; i < OUTER; i++)
> +    sum = bar (x[i], sum);
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Detected interleaving of size 2" 1 
> "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>


Re: [Patch, Fortran] PR 48588 - (4.6/4.7 regression) Resolve whole TU before generating code

2011-04-19 Thread Mikael Morin
On Tuesday 19 April 2011 12:36:11 Tobias Burnus wrote:
> Build and regtested on x86-64-linux.
> OK for the trunk - and after some grace period - for the 4.6 branch?
Yes.
Do you think we could have a case where we have to delay module namespace 
resolving as well (which we can't as we have to generate mod files)?

Mikael



Re: [google] Install cpu_defines.h (issue4440044)

2011-04-19 Thread Benjamin Kosnik

> This patch adds cpu_defines.h to the set of files to be installed.
> Doug, could you describe why we need to do this?  Will you be
> submitting this patch for trunk?

It's already in trunk, see

2011-03-15  Doug Kwan  

PR libstdc++/48123
* include/Makefile.am (install-freestanding-headers): Install
cpu_defines.h
* include/Makefile.in: Regenerate.

-benjamin


Re: [google] Install cpu_defines.h (issue4440044)

2011-04-19 Thread Diego Novillo
On Tue, Apr 19, 2011 at 12:04, Benjamin Kosnik  wrote:
>
>> This patch adds cpu_defines.h to the set of files to be installed.
>> Doug, could you describe why we need to do this?  Will you be
>> submitting this patch for trunk?
>
> It's already in trunk, see

D'oh.  Thanks.  Clearly, I was no autopilot.  Doug even said so in the
internal patch tracking tool.


Diego.


Re: [PATCH] Disallow i?86/x86_64 movstrict for subregs of non-MODE_INT regs (PR target/48678)

2011-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2011 at 4:59 PM, Jakub Jelinek  wrote:

> Since MEM_REF has been added, we can unlike in 4.5 and earlier, arbitrary
> VCEs on the LHS.  Reload isn't able to reload
> (strict_low_part (subreg:HI (reg:V2DI ...)))
> on the LHS, while it probably should be taught to do that, such
> movstrict[qh]i IMHO will only very rarely lead to good code,
> unless pinsrw/pinsrq can be used (currently we have not a suitable
> movstrict* pattern using pinsr*).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?
>
> 2011-04-19  Jakub Jelinek  
>
>        PR target/48678
>        * config/i386/i386.md (movstrict): FAIL if operands[0]
>        is a SUBREG with non-MODE_INT mode inside of it.
>
>        * gcc.target/i386/pr48678.c: New test.

OK.

Thanks,
Uros.


Re: [Patch, Fortran] PR 48588 - (4.6/4.7 regression) Resolve whole TU before generating code

2011-04-19 Thread Tobias Burnus

On 04/19/2011 05:24 PM, Mikael Morin wrote:

On Tuesday 19 April 2011 12:36:11 Tobias Burnus wrote:

Build and regtested on x86-64-linux.
OK for the trunk - and after some grace period - for the 4.6 branch?

Yes.
Do you think we could have a case where we have to delay module namespace
resolving as well (which we can't as we have to generate mod files)?


Committed as Rev. 172718.

Thanks for the review. I don't think delaying the module namespace makes 
sense. In terms of the Fortran language, each module and each 
subroutine/function/PROGRAM/BLOCK have their own scope. Thus, in 
principle, a Fortran compiler does not know about any other item in the 
program. As modules might be needed later, it makes sense to resolve 
them in the initial order - and immediately.


The global view is only needed at resolve stage for argument checking 
(i.e. diagnostics). For the code generation that's different. At that 
stage one needs a global (whole file) view as the middle end regards 
everything as a single translation unit.


Tobias


[gomp3.1] Allow pointers and cray pointers in firstprivate/lastprivate, handle not allocated allocatable in firstprivate

2011-04-19 Thread Jakub Jelinek
Hi!

This patch includes assorted OpenMP 3.1 changes for Fortran.
Haven't changed COPYIN with not allocated allocatables yet, waiting
for explanation on OpenMP forum there.

2011-04-19  Jakub Jelinek  

PR fortran/46752
* trans-openmp.c (gfc_omp_clause_copy_ctor): Handle
non-allocated allocatable.

* openmp.c (resolve_omp_clauses): Allow POINTERs and
Cray pointers in clauses other than REDUCTION.
* trans-openmp.c (gfc_omp_predetermined_sharing): Adjust
comment.

* gfortran.dg/gomp/crayptr1.f90: Don't expect error
about Cray pointer in FIRSTPRIVATE/LASTPRIVATE.

* testsuite/libgomp.fortran/crayptr3.f90: New test.
* testsuite/libgomp.fortran/allocatable7.f90: New test.
* testsuite/libgomp.fortran/pointer1.f90: New test.
* testsuite/libgomp.fortran/pointer2.f90: New test.

--- gcc/fortran/openmp.c(revision 170933)
+++ gcc/fortran/openmp.c(working copy)
@@ -1,5 +1,5 @@
 /* OpenMP directive matching and resolving.
-   Copyright (C) 2005, 2006, 2007, 2008, 2010
+   Copyright (C) 2005, 2006, 2007, 2008, 2010, 2011
Free Software Foundation, Inc.
Contributed by Jakub Jelinek
 
@@ -940,15 +940,20 @@ resolve_omp_clauses (gfc_code *code)
n->sym->name, name, &code->loc);
if (list != OMP_LIST_PRIVATE)
  {
-   if (n->sym->attr.pointer)
+   if (n->sym->attr.pointer
+   && list >= OMP_LIST_REDUCTION_FIRST
+   && list <= OMP_LIST_REDUCTION_LAST)
  gfc_error ("POINTER object '%s' in %s clause at %L",
 n->sym->name, name, &code->loc);
/* Variables in REDUCTION-clauses must be of intrinsic type 
(flagged below).  */
-   if ((list < OMP_LIST_REDUCTION_FIRST || list > 
OMP_LIST_REDUCTION_LAST) &&
-   n->sym->ts.type == BT_DERIVED && 
n->sym->ts.u.derived->attr.alloc_comp)
+   if ((list < OMP_LIST_REDUCTION_FIRST || list > 
OMP_LIST_REDUCTION_LAST)
+&& n->sym->ts.type == BT_DERIVED
+&& n->sym->ts.u.derived->attr.alloc_comp)
  gfc_error ("%s clause object '%s' has ALLOCATABLE 
components at %L",
 name, n->sym->name, &code->loc);
-   if (n->sym->attr.cray_pointer)
+   if (n->sym->attr.cray_pointer
+   && list >= OMP_LIST_REDUCTION_FIRST
+   && list <= OMP_LIST_REDUCTION_LAST)
  gfc_error ("Cray pointer '%s' in %s clause at %L",
 n->sym->name, name, &code->loc);
  }
--- gcc/fortran/trans-openmp.c  (revision 170933)
+++ gcc/fortran/trans-openmp.c  (working copy)
@@ -1,5 +1,5 @@
 /* OpenMP directive translation -- generate GCC trees from gfc_code.
-   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
Free Software Foundation, Inc.
Contributed by Jakub Jelinek 
 
@@ -88,9 +88,7 @@ gfc_omp_predetermined_sharing (tree decl
   if (GFC_DECL_CRAY_POINTEE (decl))
 return OMP_CLAUSE_DEFAULT_PRIVATE;
 
-  /* Assumed-size arrays are predetermined to inherit sharing
- attributes of the associated actual argument, which is shared
- for all we care.  */
+  /* Assumed-size arrays are predetermined shared.  */
   if (TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
@@ -214,7 +212,8 @@ tree
 gfc_omp_clause_copy_ctor (tree clause, tree dest, tree src)
 {
   tree type = TREE_TYPE (dest), ptr, size, esize, rank, call;
-  stmtblock_t block;
+  tree cond, then_b, else_b;
+  stmtblock_t block, cond_block;
 
   if (! GFC_DESCRIPTOR_TYPE_P (type)
   || GFC_TYPE_ARRAY_AKIND (type) != GFC_ARRAY_ALLOCATABLE)
@@ -226,7 +225,9 @@ gfc_omp_clause_copy_ctor (tree clause, t
  and copied from SRC.  */
   gfc_start_block (&block);
 
-  gfc_add_modify (&block, dest, src);
+  gfc_init_block (&cond_block);
+
+  gfc_add_modify (&cond_block, dest, src);
   rank = gfc_rank_cst[GFC_TYPE_ARRAY_RANK (type) - 1];
   size = gfc_conv_descriptor_ubound_get (dest, rank);
   size = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
@@ -240,17 +241,29 @@ gfc_omp_clause_copy_ctor (tree clause, t
TYPE_SIZE_UNIT (gfc_get_element_type (type)));
   size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
  size, esize);
-  size = gfc_evaluate_now (fold_convert (size_type_node, size), &block);
-  ptr = gfc_allocate_array_with_status (&block,
+  size = gfc_evaluate_now (fold_convert (size_type_node, size), &cond_block);
+  ptr = gfc_allocate_array_with_status (&cond_block,
build_in

Re: Improve stack layout heuristic.

2011-04-19 Thread Easwaran Raman
On Tue, Apr 19, 2011 at 5:08 AM, Michael Matz  wrote:
> Hi,
>
> On Mon, 18 Apr 2011, Easwaran Raman wrote:
>
>> > > @@ -596,7 +581,7 @@
>> > >    if (vb->conflicts)
>> > >      {
>> > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
>> > > -     add_stack_var_conflict (a, stack_vars[u].representative);
>> > > +     add_stack_var_conflict (a, u);
>> >
>> > Please don't.  This uselessly bloats the conflict bitmaps.
>>
>> It is sufficient to add the conflicts of  a variable only when it is
>> not merged into some group.
>
> That is correct but is also what the use of stack_vars[u].representative
> achieves alone, ...
>
>> I am adding a check to that effect.
>
> ... without any check.
>
> @@ -596,7 +581,8 @@
>   if (vb->conflicts)
>     {
>       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> -       add_stack_var_conflict (a, stack_vars[u].representative);
> +        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
> +          add_stack_var_conflict (a, u);
>       BITMAP_FREE (vb->conflicts);
>     }
>  }
>
> What's your objective with this change?  I find the original code clearer.

Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many
members of an existing partition C. It is not necessary to add all
those conflicts to 'b' since they will be never considered in the call
to union_stack_vars. I was motivated by your comment on bit-vector
bloat to try this, but if this affects readability I'll happily revert
back to what it was before.

-Easwaran

>
>
> Ciao,
> Michael.


Re: FDO usability patch -- correct insane profile data

2011-04-19 Thread Xinliang David Li
I reverted the original patch and added the assertion instead.

Thanks,

David

On Sun, Apr 17, 2011 at 2:31 PM, Jan Hubicka  wrote:
>> Adding assertion sounds good to me -- the only problem is that problem
>> like this hard to be triggered during testing, thus making assertion
>> less useful as it can be. Is it better to add the assertion with
>> checking is enabled while still keeping the correction code?
> Hi,
> since we speak about code that has minimal to none testing coverage in our
> automates testers (we don't really test anything multithreaded or with 
> mismatching
> profiles), I don't think gcc_checking_assert would make any difference.
>
> Since we don't really want the future bug to stay around unnoticed (since it
> would resolut in silent misupdates of profiles), we are only shooting to make
> analysis easier so next time someone trips around the scenario he won't see
> symptomatic message from cgraph verifier.
> So I would go for normal assert and comment about nature of the bug.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka  wrote:
>> > Hi,
>> > hmm, looks we both run into same issue and developed different fixes.
>> >>
>> >> Good point. The change above was added based on 4.4.3 where the sum
>> >> computation has no capping like above. Yes, with the current capping,
>> >> the negative value won't result. However, it is still good to guard
>> >> the scale independent of the implementation of ipcp_compute_node_scale
>> >> -- it may change and break silently (the comment of the function says
>> >> the implementation is wrong...)
>> >
>> > Well, if we want to add check in anticipation that we will introduce bug 
>> > few
>> > lines up, I think we could just add gcc_assert (scale < BASE); with an 
>> > comment
>> > explaining that ipcp_compute_node_scale must give results in range even 
>> > when
>> > profile is not flow consistent.
>> >
>> > Since we need to propagate the scales across callgraph (at least for the
>> > correct implementation of the function), masking bug in 
>> > ipcp_compute_node_scale
>> > would make us to propagate the nonsenses further and silently producing
>> > unnecesarily aplified insanity.
>> >
>> > I would pre-approve patch reverting the current change and adding the 
>> > assert
>> > with comment.
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >>
>> >> David
>> >>
>> >>
>> >> >
>> >> > Honza
>> >> >
>> >
>


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Georg-Johann Lay
Anatoly Sokolov schrieb:
> Hi.
> 
> 
>> +
>> +/* Implement `ASM_OUTPUT_ALIGNED_DECL_LOCAL' */
>> +/* Track need of __do_clear_bss */
> 
> Put dot and two spaces after the end of a sentence. The same for other
> commens in this patch.
> 
>> +
>> +void
>> +avr_asm_output_aligned_decl_local (FILE * stream, const_tree decl
>> ATTRIBUTE_UNUSED,
>> +   const char *name, unsigned
>> HOST_WIDE_INT size,
>> +   unsigned HOST_WIDE_INT align
>> ATTRIBUTE_UNUSED)
>unsigned HOST_WIDE_INT align)
> 
>> +{
>> +  avr_need_clear_bss_p = true;
>> +
>> +  fputs ("\t.lcomm ", stream);
>> +  assemble_name (stream, name);
>> +  fprintf (stream, ",%d\n", (int) size);
>   fprintf (stream, " , " HOST_WIDE_INT_PRINT_UNSIGNED ", %u\n",
> size, align / BITS_PER_UNIT);
>> +}
> 
> The same changes in avr_asm_output_aligned_decl_common function.
> 
> Anatoly.

This patch now uses the same procedure like elfos.h

Johann


PR target/18145

* config/avr/avr.h (TARGET_ASM_INIT_SECTIONS): Delete.
(ASM_OUTPUT_COMMON, ASM_OUTPUT_LOCAL): Delete.
(ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
(ASM_OUTPUT_ALIGNED_DECL_LOCAL): Define.
(TARGET_ASM_NAMED_SECTION): Change to avr_asm_named_section.

* config/avr/avr-protos.h (avr_asm_output_aligned_common):
New prototype.

* config/avr/avr.c (TARGET_ASM_INIT_SECTIONS): Define.
(avr_asm_named_section, avr_asm_output_aligned_common,
avr_output_data_section_asm_op, avr_output_bss_section_asm_op):
New functions to update...
(avr_need_clear_bss_p, avr_need_copy_data_p): ...these new variables.
(avr_asm_init_sections): Overwrite section callbacks for
data_section, bss_section.
(avr_file_start): Move output of __do_copy_data, __do_clear_bss
from here to...
(avr_file_end): ...here.


Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(Revision 172597)
+++ config/avr/avr-protos.h	(Arbeitskopie)
@@ -34,6 +34,7 @@ extern void gas_output_limited_string (F
 extern void gas_output_ascii (FILE *file, const char *str, size_t length);
 extern int avr_hard_regno_rename_ok (unsigned int, unsigned int);
 extern rtx avr_return_addr_rtx (int count, rtx tem);
+extern void avr_asm_output_aligned_common (FILE*, const char*, unsigned HOST_WIDE_INT, unsigned int, bool);
 
 #ifdef TREE_CODE
 extern void asm_output_external (FILE *file, tree decl, char *name);
Index: config/avr/avr.c
===
--- config/avr/avr.c	(Revision 172597)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -108,6 +108,7 @@ static void avr_function_arg_advance (CU
   const_tree, bool);
 static void avr_help (void);
 static bool avr_function_ok_for_sibcall (tree, tree);
+static void avr_asm_named_section (const char *name, unsigned int flags, tree decl);
 
 /* Allocate registers from r25 to r8 for parameters for function calls.  */
 #define FIRST_CUM_REG 26
@@ -132,6 +133,10 @@ const struct mcu_type_s *avr_current_dev
 
 section *progmem_section;
 
+/* To track if code will use .bss and/or .data.  */
+bool avr_need_clear_bss_p = false;
+bool avr_need_copy_data_p = false;
+
 /* AVR attributes.  */
 static const struct attribute_spec avr_attribute_table[] =
 {
@@ -197,6 +202,12 @@ static const struct default_options avr_
 #define TARGET_INSERT_ATTRIBUTES avr_insert_attributes
 #undef TARGET_SECTION_TYPE_FLAGS
 #define TARGET_SECTION_TYPE_FLAGS avr_section_type_flags
+
+/* `TARGET_ASM_NAMED_SECTION' must be defined in avr.h.  */
+
+#undef TARGET_ASM_INIT_SECTIONS
+#define TARGET_ASM_INIT_SECTIONS avr_asm_init_sections
+
 #undef TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST avr_register_move_cost
 #undef TARGET_MEMORY_MOVE_COST
@@ -5190,7 +5201,60 @@ avr_output_progmem_section_asm_op (const
   fprintf (asm_out_file, "\t.p2align 1\n");
 }
 
-/* Implement TARGET_ASM_INIT_SECTIONS.  */
+
+/* Implement `ASM_OUTPUT_ALIGNED_DECL_LOCAL'.  */
+/* Implement `ASM_OUTPUT_ALIGNED_DECL_COMMON'.  */
+/* Track need of __do_clear_bss.  */
+
+void
+avr_asm_output_aligned_common (FILE * stream, const char *name,
+   unsigned HOST_WIDE_INT size,
+   unsigned int align, bool local_p)
+{
+  avr_need_clear_bss_p = true;
+
+  if (local_p)
+{
+  fputs ("\t.local\t", stream);
+  assemble_name (stream, name);
+  fputs ("\n", stream);
+}
+  
+  fputs ("\t.comm\t", stream);
+  assemble_name (stream, name);
+  fprintf (stream,
+   "," HOST_WIDE_INT_PRINT_UNSIGNED ",%u\n",
+   size, align / BITS_PER_UNIT);
+}
+
+
+/* Unnamed section callback for data_section
+   to track need of __do_copy_data.  */
+
+static void
+avr_output_data_section_asm_op (const void *data)
+{
+  avr_need_copy_data_p = true;
+  
+  /* Dispatch

Re: [build, doc] Remove --enable-threads=solaris support

2011-04-19 Thread Rainer Orth
Hi Ralf,

> Can't gcc/gthr-tpf.h go, too?  What about gcc/gthr-nks.h?

they are both used: config.gcc (s390x-ibm-tpf*) has thread_file='tpf',
and i[3456x]86-*-netware* uses thread_file='nks'.  I checked for
references to the existing gthr-*.h files for the cleanup patch.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [google] Install cpu_defines.h (issue4440044)

2011-04-19 Thread 關振德
This is used for a freestanding C++ library (i.e., we build
libsupc++.a only).  This is in trunk but for some reason it is not in
the 4.6 release.

-Doug

On Sat, Apr 16, 2011 at 1:27 PM, Diego Novillo  wrote:
> I am committing this patch from Doug Kwan on google/main.
>
> This patch adds cpu_defines.h to the set of files to be installed.
> Doug, could you describe why we need to do this?  Will you be
> submitting this patch for trunk?
>
> Tested on x86_64.  Committed on google/main.
>
>
> Diego.
>
> 2011-03-15  Doug Kwan  
>
>        Google ref 50044.
>
>        PR libstdc++/48123
>        * include/Makefile.am (install-freestanding-headers): Install
>        cpu_defines.h
>        * include/Makefile.in: Regenerate.
>
> diff --git a/libstdc++-v3/include/Makefile.am 
> b/libstdc++-v3/include/Makefile.am
> index e7f1543..d3b44bc 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -1201,7 +1201,8 @@ endif
>  install-freestanding-headers:
>        $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}
>        $(mkinstalldirs) $(DESTDIR)${host_installdir}
> -       for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h; 
> do \
> +       for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \
> +         ${glibcxx_srcdir}/$(CPU_DEFINES_SRCDIR)/cpu_defines.h; do \
>          $(INSTALL_DATA) $${file} $(DESTDIR)${host_installdir}; done
>        $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${std_builddir}
>        $(INSTALL_DATA) ${std_builddir}/limits 
> $(DESTDIR)${gxx_include_dir}/${std_builddir}
> diff --git a/libstdc++-v3/include/Makefile.in 
> b/libstdc++-v3/include/Makefile.in
> index a93b2ea..8a23c1b 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -1585,7 +1585,8 @@ ${pch3_output}: ${pch3_source} ${pch2_output}
>  install-freestanding-headers:
>        $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}
>        $(mkinstalldirs) $(DESTDIR)${host_installdir}
> -       for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h; 
> do \
> +       for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \
> +         ${glibcxx_srcdir}/$(CPU_DEFINES_SRCDIR)/cpu_defines.h; do \
>          $(INSTALL_DATA) $${file} $(DESTDIR)${host_installdir}; done
>        $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${std_builddir}
>        $(INSTALL_DATA) ${std_builddir}/limits 
> $(DESTDIR)${gxx_include_dir}/${std_builddir}
>
>
> --
> This patch is available for review at http://codereview.appspot.com/4440044
>


Re: [testsuite]: Skip some tests for avr

2011-04-19 Thread Georg-Johann Lay
Hans-Peter Nilsson schrieb:
> On Tue, 19 Apr 2011, Georg-Johann Lay wrote:
>> This patchlet skips some tests for avr because int is just 16 bits there.
> 
>> 2011-04-19  Georg-Johann Lay  
>>
>>  * gcc.c-torture/compile/pr43191.c: Skip avr due to 16-bit int.
>>  * gcc.dg/torture/pr43165.c: Ditto.
>>  * gcc.dg/torture/pr47228.c: Ditto.
>>  * gcc.dg/tree-ssa/pr45144.c: Ditto.
>>  * gcc.dg/ipa/pr45644.c: Ditto.
>>
> 
> (Cutnpasted instead of quoting because alpine can't quote
> attached text; please inline patches.)
> 
> Index: gcc.c-torture/compile/pr43191.c
> ===
> --- gcc.c-torture/compile/pr43191.c (Revision 172597)
> +++ gcc.c-torture/compile/pr43191.c (Arbeitskopie)
> @@ -1,4 +1,4 @@
> -/* { dg-skip-if "Ints are 16 bits" { "pdp11-*-*" } { "*" } { "" } } */
> +/* { dg-skip-if "Ints are 16 bits" { "avr-*-*" "pdp11-*-*" } { "*" } { "" } 
> }*/
> 
> That pdp11 stanza was a bad precedent.  Please instead use one of
> 
> /* { dg-skip-if "Ints are 16 bits" { ! int32plus } { "*" } { "" } } */
> or
> /* { dg-require-effective-target int32plus } */
> or
> /* { dg-do X { target int32plus } } */
> (where X = {compile, run, assemble})
> 
> so we don't have to keep a steadily increasing target list
> (decreasing only when targets are removed).
> 
> brgds, H-P

Thanks for that hint. There are some more test cases that imply int >=
32 bit

Johann

2011-04-19  Georg-Johann Lay  

* gcc.dg/pr42629.c: Add dg-require-effective-target int32plus
* gcc.c-torture/execute/cmpsi-2.c: Ditto
* gcc.c-torture/execute/pr45262.c: Ditto
* gcc.dg/torture/pr43165.c: Ditto.
* gcc.dg/torture/pr47228.c: Ditto.
* gcc.dg/tree-ssa/pr45144.c: Ditto.
* gcc.dg/ipa/pr45644.c: Ditto.
* gcc.c-torture/compile/pr43191.c: Ditto. Remove dg-skip-if for
PDP11.






Index: gcc.c-torture/execute/cmpsi-2.c
===
--- gcc.c-torture/execute/cmpsi-2.c	(Revision 172597)
+++ gcc.c-torture/execute/cmpsi-2.c	(Arbeitskopie)
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target int32plus } */
+
 #define F 140
 #define T 13
 
Index: gcc.c-torture/execute/pr45262.c
===
--- gcc.c-torture/execute/pr45262.c	(Revision 172597)
+++ gcc.c-torture/execute/pr45262.c	(Arbeitskopie)
@@ -1,4 +1,5 @@
 /* PR middle-end/45262 */
+/* { dg-require-effective-target int32plus } */
 
 extern void abort (void);
 
Index: gcc.c-torture/compile/pr43191.c
===
--- gcc.c-torture/compile/pr43191.c	(Revision 172597)
+++ gcc.c-torture/compile/pr43191.c	(Arbeitskopie)
@@ -1,4 +1,5 @@
-/* { dg-skip-if "Ints are 16 bits" { "pdp11-*-*" } { "*" } { "" } } */ 
+/* { dg-require-effective-target int32plus } */
+
 struct S0
 {
 };
Index: gcc.dg/pr42629.c
===
--- gcc.dg/pr42629.c	(Revision 172597)
+++ gcc.dg/pr42629.c	(Arbeitskopie)
@@ -2,6 +2,7 @@
took debug insns into account.  */
 /* { dg-do compile } */
 /* { dg-options "-O1 -fsched-pressure -fschedule-insns -fcompare-debug" } */
+/* { dg-require-effective-target int32plus } */
 
 int lzo_adler32(int adler, char *buf)
 {
Index: gcc.dg/torture/pr43165.c
===
--- gcc.dg/torture/pr43165.c	(Revision 172597)
+++ gcc.dg/torture/pr43165.c	(Arbeitskopie)
@@ -1,5 +1,6 @@
 /* PR debug/43165 */
 /* { dg-options "-g" } */
+/* { dg-require-effective-target int32plus } */
 
 struct __attribute__((packed)) S
 {
Index: gcc.dg/torture/pr47228.c
===
--- gcc.dg/torture/pr47228.c	(Revision 172597)
+++ gcc.dg/torture/pr47228.c	(Arbeitskopie)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 
 struct S4
 {
Index: gcc.dg/tree-ssa/pr45144.c
===
--- gcc.dg/tree-ssa/pr45144.c	(Revision 172597)
+++ gcc.dg/tree-ssa/pr45144.c	(Arbeitskopie)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-require-effective-target int32plus } */
 
 void baz (unsigned);
 
Index: gcc.dg/ipa/pr45644.c
===
--- gcc.dg/ipa/pr45644.c	(Revision 172597)
+++ gcc.dg/ipa/pr45644.c	(Arbeitskopie)
@@ -1,6 +1,7 @@
 /* Verify that we do not IPA-SRA bitfields.  */
 /* { dg-do run } */
 /* { dg-options "-O2"  } */
+/* { dg-require-effective-target int32plus } */
 
 extern void abort (void);
 



Re: [google] Install cpu_defines.h (issue4440044)

2011-04-19 Thread Paolo Carlini

On 04/19/2011 07:26 PM, Doug Kwan (關振德) wrote:

This is used for a freestanding C++ library (i.e., we build
libsupc++.a only).  This is in trunk but for some reason it is not in
the 4.6 release.
Why do you think so? For sure upon approval you committed it to the 
branch too, on March, 15th.


Paolo.


Re: [build, lto] Only accept -fuse-linker-plugin if linker supports -plugin (PR lto/46944)

2011-04-19 Thread Rainer Orth
Hi Ralf,

>> I haven't found if there are provisions for in-tree gold, though, and
>> still cannot test that.
>
> I'm not quite sure I understand this statement.  I built a combined tree
> with gold enabled a while ago (must've been several months now).
> I might be misunderstanding this.

I suppose I've been unclear: I'm specificially referring to the current
code for setting gcc_cv_lto_plugin: in the in-tree case, there's nothing
that deals with in-tree gold.

>>if test $in_tree_ld = yes -a x"$ORIGINAL_PLUGIN_LD_FOR_TARGET" = 
>> x"$gcc_cv_ld"; then
>> -if test "$gcc_cv_gld_major_version" -eq 2 -a 
>> "$gcc_cv_gld_minor_version" -ge 21 -o "$gcc_cv_gld_major_version" -gt 2; then
>> -  gcc_cv_lto_plugin=2
>> -elif test "$ld_is_gold" = yes -a "$gcc_cv_gld_major_version" -eq 2 -a 
>> "$gcc_cv_gld_minor_version" -eq 20; then
>> -  gcc_cv_lto_plugin=1
>> -
>> +ld_ver="GNU ld"
>> +# FIXME: ld_is_gold?
>
> What about this FIXME?  Did you test gold?  Is it not relevant here?
> Can the FIXME go?

I cannot test gold since it doesn't yet work on Solaris: cf. binutils PR
gold/12525.  We made some progress on that front, but the PR is
currently stalled and I had other things on my plate that prevented me
from pushing it.  As I said, the current code (before my patch) doesn't
handle in-tree gold, so I don't feel obliged to change that, especially
since I'm in no good position to test.

>> +  ld_vers_major=`expr "$ld_vers" : '\([0-9]*\)'`
>> +  ld_vers_minor=`expr "$ld_vers" : '[0-9]*\.\([0-9]*\)'`
>
> Can you try the expr statements quickly on Tru64?  If not, I can do it
> for you ('info Autoconf --index expr' is long and tells of many woes).

I just tried with /bin/expr and ld_vers set to 2.20.1.  OTOH, this isn't
relevant for two reasons: this code is identical to the one determining
ld_vers_major/ld_vers_minor further up in gcc/configure.ac, and GNU ld
(as well as GNU as) aren't currently supported on Tru64 UNIX and I
seriously doubt that will change over the remaining livetime of the
platform.

> Thanks, and sorry for the delay,

No worries.  I'd just like to get this series of patches out of my queue
(and eventually backported to the 4.6 branch if all issues are sorted
out).  Maybe one of the build maintainers finds some time to handle the
current mess that are the linker tests in gcc/configure.ac, compared to
what we do for the assembler.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [build, lto] Only accept -fuse-linker-plugin if linker supports -plugin (PR lto/46944)

2011-04-19 Thread Rainer Orth
Paolo Bonzini  writes:

> On 04/04/2011 06:15 PM, Rainer Orth wrote:
>> I haven't found if there are provisions for in-tree gold, though, and
>> still cannot test that.
>
> In-tree gold definitely works (or should).

My problem is simply that gold doesn't work on Solaris at all, either
in-tree or externally.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch, fortran] Use xcalloc instead of gfc_getmem

2011-04-19 Thread Janne Blomqvist
On Tue, Apr 19, 2011 at 10:46, Jakub Jelinek  wrote:
> On Tue, Apr 19, 2011 at 10:41:33AM +0300, Janne Blomqvist wrote:
>> On Mon, Apr 18, 2011 at 23:50, Steve Kargl
>>  wrote:
>> > On Mon, Apr 18, 2011 at 11:41:33PM +0300, Janne Blomqvist wrote:
>> >> Hi,
>> >>
>> >> the attached patch replaces gfc_getmem with calls to xcalloc (from
>> >> libiberty). Apart from reducing duplicated code, calloc is better than
>> >> malloc + memset, as the allocator knows that the kernel always gives
>> >> out zeroed pages so in some cases it can avoid memset'in the area.
>> >>
>> >> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>> >>
>> >
>> > OK.
>>
>> Thanks; I'll commit it later today when I get home.
>
> Please don't.  You should be using macros from libiberty.h like rest
> of gcc instead.  So instead of xcalloc use XCNEW, XCNEWVEC or XCNEWVAR
> and remove the casts.

Oh, those macros are nice. I updated the patch to use them instead,
except for one case where the usage didn't fit the XCNEW(VEC) API and
I used xcalloc directly instead.  Here's what I committed:

2011-04-19  Janne Blomqvist  

* misc.c (gfc_getmem): Remove function.
* gfortran.h: Remove gfc_getmem prototype. Replace gfc_getmem
usage with XCNEW or XCNEWVEC.
* expr.c (gfc_check_assign_symbol): Replace gfc_getmem usage with
XCNEW or XCNEWVEC.
* options.c (gfc_handle_module_path_options)
(gfc_get_option_string): Likewise.
* resolve.c (gfc_resolve_forall): Likewise.
* simplify.c (simplify_transformation_to_array): Likewise.
* target-memory.c (gfc_target_interpret_expr): Likewise.
* trans-common.c (get_segment_info, copy_equiv_list_to_ns)
(get_init_field): Likewise.
* trans-expr.c (gfc_conv_statement_function): Likewise.
* trans-io.c (nml_full_name): Likewise.
* trans-stmt.c (gfc_trans_forall_1): Likewise.
* scanner.c (load_file): Replace gfc_getmem usage with xcalloc.




-- 
Janne Blomqvist
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 1e31653..42b65c6 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3583,7 +3583,7 @@ gfc_check_assign_symbol (gfc_symbol *sym, gfc_expr *rvalue)
   lvalue.ts = sym->ts;
   if (sym->as)
 lvalue.rank = sym->as->rank;
-  lvalue.symtree = (gfc_symtree *) gfc_getmem (sizeof (gfc_symtree));
+  lvalue.symtree = XCNEW (gfc_symtree);
   lvalue.symtree->n.sym = sym;
   lvalue.where = sym->declared_at;
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index ce11c07..1d725e4 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1272,8 +1272,7 @@ typedef struct gfc_entry_list
 }
 gfc_entry_list;
 
-#define gfc_get_entry_list() \
-  (gfc_entry_list *) gfc_getmem(sizeof(gfc_entry_list))
+#define gfc_get_entry_list() XCNEW (gfc_entry_list)
 
 /* Lists of rename info for the USE statement.  */
 
@@ -1302,8 +1301,7 @@ typedef struct gfc_use_list
 }
 gfc_use_list;
 
-#define gfc_get_use_list() \
-  (gfc_use_list *) gfc_getmem(sizeof(gfc_use_list))
+#define gfc_get_use_list() XCNEW (gfc_use_list)
 
 /* Within a namespace, symbols are pointed to by symtree nodes that
are linked together in a balanced binary tree.  There can be
@@ -1783,7 +1781,7 @@ typedef struct gfc_expr
 gfc_expr;
 
 
-#define gfc_get_shape(rank) ((mpz_t *) gfc_getmem((rank)*sizeof(mpz_t)))
+#define gfc_get_shape(rank) (XCNEWVEC (mpz_t, (rank)))
 
 /* Structures for information associated with different kinds of
numbers.  The first set of integer parameters define all there is
@@ -2369,7 +2367,6 @@ void gfc_start_source_files (void);
 void gfc_end_source_files (void);
 
 /* misc.c */
-void *gfc_getmem (size_t) ATTRIBUTE_MALLOC;
 int gfc_terminal_width (void);
 void gfc_clear_ts (gfc_typespec *);
 FILE *gfc_open_file (const char *);
diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
index a54ffc0..1274047 100644
--- a/gcc/fortran/misc.c
+++ b/gcc/fortran/misc.c
@@ -23,24 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "gfortran.h"
 
-/* Get a block of memory.  Many callers assume that the memory we
-   return is zeroed.  */
-
-void *
-gfc_getmem (size_t n)
-{
-  void *p;
-
-  if (n == 0)
-return NULL;
-
-  p = xmalloc (n);
-  if (p == NULL)
-gfc_fatal_error ("Allocation would exceed memory limit -- malloc() failed");
-  memset (p, 0, n);
-  return p;
-}
-
 
 /* Get terminal width.  */
 
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index a4d9a66..bc65f6b 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -471,7 +471,7 @@ gfc_handle_module_path_options (const char *arg)
   if (gfc_option.module_dir != NULL)
 gfc_fatal_error ("gfortran: Only one -J option allowed");
 
-  gfc_option.module_dir = (char *) gfc_getmem (strlen (arg) + 2);
+  gfc_option.module_dir = XCNEWVEC (char, strlen (arg) + 2);
   strcpy (gfc_option.module_dir, arg);
 
   gfc_add_include_path (gfc_option.module_dir, true, false);
@@ -1

Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Anatoly Sokolov


Hi.



This patch now uses the same procedure like elfos.h


...

+#define ASM_OUTPUT_ALIGNED_DECL_COMMON(STREAM, DECL, NAME, SIZE, ALIGN) \
+  avr_asm_output_aligned_common (STREAM, NAME, SIZE, ALIGN, false)

..

+#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(STREAM, DECL, NAME, SIZE, ALIGN) \
+  avr_asm_output_aligned_common (STREAM, NAME, SIZE, ALIGN, true)




The GCC have three macro ASM_OUTPUT_COMMON, ASM_OUTPUT_ALIGNED_COMMON and 
ASM_OUTPUT_ALIGNED_DECL_COMMON for output  common label in stream, 
eventually only one most flexible ASM_OUTPUT_ALIGNED_DECL_COMMON macro 
should be left. The same for local common label. Please use 
ASM_OUTPUT_ALIGNED_DECL_COMMON  and ASM_OUTPUT_ALIGNED_DECL_LOCAL macros 
here.


Anatoly.





Re: [Patch, fortran] Use xcalloc instead of gfc_getmem

2011-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2011 at 08:46:17PM +0300, Janne Blomqvist wrote:
> Oh, those macros are nice. I updated the patch to use them instead,
> except for one case where the usage didn't fit the XCNEW(VEC) API and
> I used xcalloc directly instead.  Here's what I committed:

For that remaining case there is XCNEWVAR, i.e.
b = XCNEWVAR (gfc_linebuf, gfc_linebuf_header_size 
   + (len + 1) * sizeof (gfc_char_t));

Jakub


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Georg-Johann Lay
Anatoly Sokolov schrieb:
> 
> Hi.
> 
>>
>> This patch now uses the same procedure like elfos.h
>>
> ...
>> +#define ASM_OUTPUT_ALIGNED_DECL_COMMON(STREAM, DECL, NAME, SIZE,
>> ALIGN) \
>> +  avr_asm_output_aligned_common (STREAM, NAME, SIZE, ALIGN, false)
> ..
>> +#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(STREAM, DECL, NAME, SIZE, ALIGN) \
>> +  avr_asm_output_aligned_common (STREAM, NAME, SIZE, ALIGN, true)
>>
> 
> 
> The GCC have three macro ASM_OUTPUT_COMMON, ASM_OUTPUT_ALIGNED_COMMON
> and ASM_OUTPUT_ALIGNED_DECL_COMMON for output  common label in stream,
> eventually only one most flexible ASM_OUTPUT_ALIGNED_DECL_COMMON macro
> should be left. The same for local common label. Please use
> ASM_OUTPUT_ALIGNED_DECL_COMMON  and ASM_OUTPUT_ALIGNED_DECL_LOCAL macros
> here.

Confused. These macros are used.

Johann

> 
> Anatoly.
> 



Re: [build] Support multilib testing in libgo

2011-04-19 Thread Rainer Orth
Hi Ralf,

> * Rainer Orth wrote on Mon, Apr 04, 2011 at 08:19:19PM CEST:
>> To avoid this mess, I'm instead setting LD_LIBRARY_PATH in CHECK.  While
>> this isn't exactly portable (some platforms, especially Darwin and
>> HP-UX, use different variables), it's at least more widespread than -R.
>
> Toplevel configure computes RPATH_ENVVAR for the host, and toplevel
> Makefile passes that.  If you need it for the target, it should be
> easy to repeat those five lines of code somewhere.

probably, though this is going to be a mess since you have to do it for
all possible multilib variants, cf. gcc/testsuite/lib/target-libpath.exp
and PR other/43445.

>> +  p=`grep -c PASS $${multidir}/libgo.sum.sep`; \
>> +  if test "$$p" != "0"; then \
>
> I'd use -ne instead of != here and below.

Ok.  The != test was there before, but I've updated my patch.

>> +echo `echo $(GOC) | sed -e 's/ .*//'`  `$(GOC) -v 2>&1 | grep " 
>> version" | sed -n -e 's/.* \(version.*$$\)/\1/p'` >> libgo.sum; \
>
> This line:
> echo `echo $(GOC) | sed -e 's/ .*//'` ...
>
> is equivalent to:
> echo $(GOC) ...
>
> as the shell squashes unquoted multiple adjacent white space.

Unfortunately, it's not: GOC contains not only the compiler command, but
a whole bunch of -B etc. options that need to be stripped.  OTOH, one
might argue that the version info only belongs with the compiler tests
in gcc/testsuite.  So for, none of the runtime libraries emit version
info in they .sum files, and many won't even have a separate version.

That said, I noticed another problem with the patch: for non-multilibbed
targets, it produced two separate summary lines, which is wrong and
doesn't match what DejaGnu does.

Here's the updated patch.  It also omits $GOCFLAGS because I noticed
that it is both unnecessary (the multilib flags that prompted me to
include it are already included in $GOC) and harmful: a couple of
testcases start failing when they are compiled with -g -O2 instead of
without optimization, cf. e.g. PR go/48122.

Rainer


2011-02-13  Rainer Orth  

* Makefile.am (CHECK): Remove -Wl,-R from $GC.
Add LD_LIBRARY_PATH.
(check): Depend on check-multi, check-tail.
(check-recursive): Depend on check-head.
(check-am): Move header, footer generation ...
(check-head, check-tail): ... here.
New targets.
(check-multi): New target.
(MOSTLYCLEAN_FILES): Replace libgo.tail by libgo.head, add
libgo.sum.sep, libgo.log.sep.
* Makefile.in: Regenerate.

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -1544,12 +1544,15 @@ GOTESTFLAGS =
 
 # Check a package.
 CHECK = \
-   GC="$(GOC) -L `${PWD_COMMAND}` -L `${PWD_COMMAND}`/.libs 
-Wl,-R,`${PWD_COMMAND}`/.libs"; \
+   GC="$(GOC) -L `${PWD_COMMAND}` -L `${PWD_COMMAND}`/.libs"; \
export GC; \
RUNTESTFLAGS="$(RUNTESTFLAGS)"; \
export RUNTESTFLAGS; \
MAKE="$(MAKE)"; \
export MAKE; \
+   libgcc=`${GOC} -print-libgcc-file-name`; \
+   LD_LIBRARY_PATH="`${PWD_COMMAND}`/.libs:`dirname 
$${libgcc}`:${LD_LIBRARY_PATH}"; \
+   export LD_LIBRARY_PATH; \
rm -f $@-testsum $@-testlog; \
prefix=`if test "$(@D)" = "regexp"; then echo regexp-test; else dirname 
$(@D); fi`; \
test "$${prefix}" != "." || prefix="$(@D)"; \
@@ -3065,27 +3068,84 @@ TEST_PACKAGES = \
testing/quick/check \
testing/script/check
 
+check: check-tail
+check-recursive: check-head
+
+check-head:
+   @echo "Test Run By $${USER} on `date`" > libgo.head
+   @echo "Native configuration is $(host_triplet)" >> libgo.head
+   @echo >> libgo.head
+   @echo " === libgo tests ===" >> libgo.head
+   @echo >> libgo.head
+
+check-tail: check-recursive check-multi
+   @lib=`${PWD_COMMAND} | sed -e 's,^.*/\([^/][^/]*\)$$,\1,'`; \
+   for dir in . $(MULTIDIRS); do \
+ mv ../$${dir}/$${lib}/libgo.sum ../$${dir}/$${lib}/libgo.sum.sep; \
+ mv ../$${dir}/$${lib}/libgo.log ../$${dir}/$${lib}/libgo.log.sep; \
+   done; \
+   mv libgo.head libgo.sum; \
+   cp libgo.sum libgo.log; \
+   echo "Schedule of variations:" >> libgo.sum; \
+   for dir in . $(MULTIDIRS); do \
+ multidir=../$${dir}/$${lib}; \
+ multivar=`cat $${multidir}/libgo.var`; \
+ echo "$${multivar}" >> libgo.sum; \
+   done; \
+   echo >> libgo.sum; \
+   pass=0; fail=0; untested=0; \
+   for dir in . $(MULTIDIRS); do \
+ multidir=../$${dir}/$${lib}; \
+ multivar=`cat $${multidir}/libgo.var`; \
+ echo "Running target $${multivar}" >> libgo.sum; \
+ echo "Running $(srcdir)/libgo.exp ..." >> libgo.sum; \
+ cat $${multidir}/libgo.sum.sep >> libgo.sum; \
+ cat $${multidir}/libgo.log.sep >> libgo.log; \
+ if test -n "${MULTIDIRS}"; then \
+   echo "  === libgo Summary fo

Re: [google] Install cpu_defines.h (issue4440044)

2011-04-19 Thread 關振德
My bad.  I looked at the gcc-4.6.0 tarball and it is indeed there.

-Doug

On Tue, Apr 19, 2011 at 10:36 AM, Paolo Carlini
 wrote:
> On 04/19/2011 07:26 PM, Doug Kwan (關振德) wrote:
>>
>> This is used for a freestanding C++ library (i.e., we build
>> libsupc++.a only).  This is in trunk but for some reason it is not in
>> the 4.6 release.
>
> Why do you think so? For sure upon approval you committed it to the branch
> too, on March, 15th.
>
> Paolo.
>


Re: [lto, testsuite] Don't use visibility on targets that don't support it (PR lto/47334)

2011-04-19 Thread Rainer Orth
Rainer Orth  writes:

> Mike Stump  writes:
>
>> On Apr 5, 2011, at 1:56 AM, Rainer Orth wrote:
>>> * lto.c (promote_var): Only set VISIBILITY_HIDDEN if
>>> HAVE_GAS_HIDDEN.
>>
>> Oh, at a minimum, if TARGET_ASM_ASSEMBLE_VISIBILITY is set, doing this stuff 
>> I think is useful?
>
> No, this won't work.  E.g. on Solaris with an assembler without
> visibility support, TARGET_ASM_ASSEMBLE_VISIBILITY is set, but just
> emits a warning.  This is similiar to default_assemble_visibility with
> HAVE_GAS_HIDDEN undefined.
>
> Right now, there are four definitions of TARGET_ASM_ASSEMBLE_VISIBILITY:
>
> * i386/cygming.h: i386_pe_assemble_visibility only warns about
>   visibility attributes, so no problem here.
>
> * darwin.h: darwin_assemble_visibility is the only implementation which
>   can handle VISIBILITY_HIDDEN (only), but doesn't define
>   HAVE_GAS_HIDDEN.  Maybe it should, but one would have to check every
>   instance of the macro to make sure there are no ill effects.
>
> * rs6000/rs6000.c: protected by HAVE_GAS_HIDDEN.
>
> * sol2.h: warns unless HAVE_GAS_HIDDEN.

I've had a closer look now and think it's possible (and desirable) to
define HAVE_GAS_HIDDEN for Darwin, too.  I've now (after lots of
trouble, and without success getting Ada to bootstrap on PowerPC Darwin)
set up a development environment on Mac OS X 10.5, both i386 and powerpc.

My current plan (though this may be slow) is to define HAVE_GAS_HIDDEN
for Darwin in gcc/configure.ac and check what else is necessary to make
this work.  Once that is done, my patch can probably go in.

Additionally, one might want to rename HAVE_GAS_HIDDEN to
HAVE_AS_VISIBILITY since that's what the macro really means.  (Actually,
that's a lie: it means HAVE_AS_LD_VISIBILITY, but I don't think we need
to become that verbose.)

Does this sound reasonable?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch, fortran] Committed trivial FIXME patch

2011-04-19 Thread Janne Blomqvist
Now that Jim Meyering has remove the macro that prevented directly
calling free(), and replaced gfc_free() with free(), we can fix this.
Committed as obvious.

Index: frontend-passes.c
===
--- frontend-passes.c   (revision 172727)
+++ frontend-passes.c   (working copy)
@@ -71,9 +71,7 @@ gfc_run_passes (gfc_namespace *ns)
   if (gfc_option.dump_fortran_optimized)
gfc_dump_parse_tree (ns, stdout);

-  /* FIXME: The following should be XDELETEVEC(expr_array);
-  but we cannot do that because it depends on free.  */
-  free (expr_array);
+  XDELETEVEC (expr_array);
 }
 }


-- 
Janne Blomqvist


Re: [Patch, fortran] Use xcalloc instead of gfc_getmem

2011-04-19 Thread Janne Blomqvist
On Tue, Apr 19, 2011 at 20:53, Jakub Jelinek  wrote:
> On Tue, Apr 19, 2011 at 08:46:17PM +0300, Janne Blomqvist wrote:
>> Oh, those macros are nice. I updated the patch to use them instead,
>> except for one case where the usage didn't fit the XCNEW(VEC) API and
>> I used xcalloc directly instead.  Here's what I committed:
>
> For that remaining case there is XCNEWVAR, i.e.
>    b = XCNEWVAR (gfc_linebuf, gfc_linebuf_header_size
>                               + (len + 1) * sizeof (gfc_char_t));

Thanks. Committed the above as obvious (r172730).


-- 
Janne Blomqvist


Go patch committed: Use backend interface for blocks

2011-04-19 Thread Ian Lance Taylor
This patch to the Go frontend uses the backend interface for blocks.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian


2011-04-19  Ian Lance Taylor  

* go-gcc.cc (class Bblock): Define.
(Gcc_backend::if_statement): Change then_block and else_block to
Bblock*.
(Gcc_backend::block): New function.
(Gcc_backend::block_add_statements): New function.
(Gcc_backend::block_statement): New function.
(tree_to_block, block_to_tree): New functions.


Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc	(revision 172693)
+++ gcc/go/go-gcc.cc	(working copy)
@@ -92,6 +92,14 @@ class Bfunction : public Gcc_tree
   { }
 };
 
+class Bblock : public Gcc_tree
+{
+ public:
+  Bblock(tree t)
+: Gcc_tree(t)
+  { }
+};
+
 class Bvariable : public Gcc_tree
 {
  public:
@@ -194,8 +202,8 @@ class Gcc_backend : public Backend
 		   source_location);
 
   Bstatement*
-  if_statement(Bexpression* condition, Bstatement* then_block,
-	   Bstatement* else_block, source_location);
+  if_statement(Bexpression* condition, Bblock* then_block, Bblock* else_block,
+	   source_location);
 
   Bstatement*
   switch_statement(Bexpression* value,
@@ -209,6 +217,18 @@ class Gcc_backend : public Backend
   Bstatement*
   statement_list(const std::vector&);
 
+  // Blocks.
+
+  Bblock*
+  block(Bfunction*, Bblock*, const std::vector&,
+	source_location, source_location);
+
+  void
+  block_add_statements(Bblock*, const std::vector&);
+
+  Bstatement*
+  block_statement(Bblock*);
+
   // Variables.
 
   Bvariable*
@@ -370,8 +390,8 @@ Gcc_backend::return_statement(Bfunction*
 // If.
 
 Bstatement*
-Gcc_backend::if_statement(Bexpression* condition, Bstatement* then_block,
-			  Bstatement* else_block, source_location location)
+Gcc_backend::if_statement(Bexpression* condition, Bblock* then_block,
+			  Bblock* else_block, source_location location)
 {
   tree cond_tree = condition->get_tree();
   tree then_tree = then_block->get_tree();
@@ -481,6 +501,114 @@ Gcc_backend::statement_list(const std::v
   return this->make_statement(stmt_list);
 }
 
+// Make a block.  For some reason gcc uses a dual structure for
+// blocks: BLOCK tree nodes and BIND_EXPR tree nodes.  Since the
+// BIND_EXPR node points to the BLOCK node, we store the BIND_EXPR in
+// the Bblock.
+
+Bblock*
+Gcc_backend::block(Bfunction* function, Bblock* enclosing,
+		   const std::vector& vars,
+		   source_location start_location,
+		   source_location)
+{
+  tree block_tree = make_node(BLOCK);
+  if (enclosing == NULL)
+{
+  // FIXME: Permitting FUNCTION to be NULL is a temporary measure
+  // until we have a proper representation of the init function.
+  tree fndecl;
+  if (function == NULL)
+	fndecl = current_function_decl;
+  else
+	fndecl = function->get_tree();
+  gcc_assert(fndecl != NULL_TREE);
+
+  // We may have already created a block for local variables when
+  // we take the address of a parameter.
+  if (DECL_INITIAL(fndecl) == NULL_TREE)
+	{
+	  BLOCK_SUPERCONTEXT(block_tree) = fndecl;
+	  DECL_INITIAL(fndecl) = block_tree;
+	}
+  else
+	{
+	  tree superblock_tree = DECL_INITIAL(fndecl);
+	  BLOCK_SUPERCONTEXT(block_tree) = superblock_tree;
+	  tree* pp;
+	  for (pp = &BLOCK_SUBBLOCKS(superblock_tree);
+	   *pp != NULL_TREE;
+	   pp = &BLOCK_CHAIN(*pp))
+	;
+	  *pp = block_tree;
+	}
+}
+  else
+{
+  tree superbind_tree = enclosing->get_tree();
+  tree superblock_tree = BIND_EXPR_BLOCK(superbind_tree);
+  gcc_assert(TREE_CODE(superblock_tree) == BLOCK);
+
+  BLOCK_SUPERCONTEXT(block_tree) = superblock_tree;
+  tree* pp;
+  for (pp = &BLOCK_SUBBLOCKS(superblock_tree);
+	   *pp != NULL_TREE;
+	   pp = &BLOCK_CHAIN(*pp))
+	;
+  *pp = block_tree;
+}
+
+  tree* pp = &BLOCK_VARS(block_tree);
+  for (std::vector::const_iterator pv = vars.begin();
+   pv != vars.end();
+   ++pv)
+{
+  *pp = (*pv)->get_tree();
+  if (*pp != error_mark_node)
+	pp = &DECL_CHAIN(*pp);
+}
+  *pp = NULL_TREE;
+
+  TREE_USED(block_tree) = 1;
+
+  tree bind_tree = build3_loc(start_location, BIND_EXPR, void_type_node,
+			  BLOCK_VARS(block_tree), NULL_TREE, block_tree);
+  TREE_SIDE_EFFECTS(bind_tree) = 1;
+
+  return new Bblock(bind_tree);
+}
+
+// Add statements to a block.
+
+void
+Gcc_backend::block_add_statements(Bblock* bblock,
+  const std::vector& statements)
+{
+  tree stmt_list = NULL_TREE;
+  for (std::vector::const_iterator p = statements.begin();
+   p != statements.end();
+   ++p)
+{
+  tree s = (*p)->get_tree();
+  if (s != error_mark_node)
+	append_to_statement_list(s, &stmt_list);
+}
+
+  tree bind_tree = bblock->get_tree();
+  gcc_assert(TREE_CODE(bind_tree) == BIND_EXPR);
+  BIND_EXPR_BODY(bind_tree) = stmt_list;
+}
+
+// Return a block as a statement.
+
+Bstatement*
+Gcc_

[patch, fortran] PR 48405 - Front end expressions in DO loops

2011-04-19 Thread Thomas Koenig

Hello world,

this patch fixes the enhancement PR, plus probably a few regressions.

The basic problem was that the code walker got confused when *c, the 
pointer to the current gfc_code statement, was changed by inserting 
additional code.


Currently regression-testing. OK for trunk if the tests pass?

Thomas

2011-04-18  Thomas Koenig  

PR fortran/48405
* frontend_passes (cfe_register_funcs): Remove workaround for DO
loops.
(gfc_code_walker):  Make sure the pointer to the current
statement doen't change when other statements are inserted.

2011-04-18  Thomas Koenig  

PR fortran/48405
* gfortran.dg/function_optimize_6.f90:  New test.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 172607)
+++ frontend-passes.c	(Arbeitskopie)
@@ -142,12 +142,6 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre
 	  void *data ATTRIBUTE_UNUSED)
 {
 
-  /* FIXME - there is a bug in the insertion code for DO loops.  Bail
- out here.  */
-
-  if ((*current_code)->op == EXEC_DO)
-return 0;
-
   if ((*e)->expr_type != EXPR_FUNCTION)
 return 0;
 
@@ -958,31 +952,37 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	{
 	  gfc_code *b;
 	  gfc_actual_arglist *a;
+	  gfc_code *co;
 
-	  switch ((*c)->op)
+	  /* There might be statement insertions before the current code,
+	 which must not affect the expression walker.  */
+
+	  co = *c;
+
+	  switch (co->op)
 	{
 	case EXEC_DO:
-	  WALK_SUBEXPR ((*c)->ext.iterator->var);
-	  WALK_SUBEXPR ((*c)->ext.iterator->start);
-	  WALK_SUBEXPR ((*c)->ext.iterator->end);
-	  WALK_SUBEXPR ((*c)->ext.iterator->step);
+	  WALK_SUBEXPR (co->ext.iterator->var);
+	  WALK_SUBEXPR (co->ext.iterator->start);
+	  WALK_SUBEXPR (co->ext.iterator->end);
+	  WALK_SUBEXPR (co->ext.iterator->step);
 	  break;
 
 	case EXEC_CALL:
 	case EXEC_ASSIGN_CALL:
-	  for (a = (*c)->ext.actual; a; a = a->next)
+	  for (a = co->ext.actual; a; a = a->next)
 		WALK_SUBEXPR (a->expr);
 	  break;
 
 	case EXEC_CALL_PPC:
-	  WALK_SUBEXPR ((*c)->expr1);
-	  for (a = (*c)->ext.actual; a; a = a->next)
+	  WALK_SUBEXPR (co->expr1);
+	  for (a = co->ext.actual; a; a = a->next)
 		WALK_SUBEXPR (a->expr);
 	  break;
 
 	case EXEC_SELECT:
-	  WALK_SUBEXPR ((*c)->expr1);
-	  for (b = (*c)->block; b; b = b->block)
+	  WALK_SUBEXPR (co->expr1);
+	  for (b = co->block; b; b = b->block)
 		{
 		  gfc_case *cp;
 		  for (cp = b->ext.block.case_list; cp; cp = cp->next)
@@ -998,7 +998,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	case EXEC_DEALLOCATE:
 	  {
 		gfc_alloc *a;
-		for (a = (*c)->ext.alloc.list; a; a = a->next)
+		for (a = co->ext.alloc.list; a; a = a->next)
 		  WALK_SUBEXPR (a->expr);
 		break;
 	  }
@@ -1006,7 +1006,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	case EXEC_FORALL:
 	  {
 		gfc_forall_iterator *fa;
-		for (fa = (*c)->ext.forall_iterator; fa; fa = fa->next)
+		for (fa = co->ext.forall_iterator; fa; fa = fa->next)
 		  {
 		WALK_SUBEXPR (fa->var);
 		WALK_SUBEXPR (fa->start);
@@ -1017,110 +1017,110 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code
 	  }
 
 	case EXEC_OPEN:
-	  WALK_SUBEXPR ((*c)->ext.open->unit);
-	  WALK_SUBEXPR ((*c)->ext.open->file);
-	  WALK_SUBEXPR ((*c)->ext.open->status);
-	  WALK_SUBEXPR ((*c)->ext.open->access);
-	  WALK_SUBEXPR ((*c)->ext.open->form);
-	  WALK_SUBEXPR ((*c)->ext.open->recl);
-	  WALK_SUBEXPR ((*c)->ext.open->blank);
-	  WALK_SUBEXPR ((*c)->ext.open->position);
-	  WALK_SUBEXPR ((*c)->ext.open->action);
-	  WALK_SUBEXPR ((*c)->ext.open->delim);
-	  WALK_SUBEXPR ((*c)->ext.open->pad);
-	  WALK_SUBEXPR ((*c)->ext.open->iostat);
-	  WALK_SUBEXPR ((*c)->ext.open->iomsg);
-	  WALK_SUBEXPR ((*c)->ext.open->convert);
-	  WALK_SUBEXPR ((*c)->ext.open->decimal);
-	  WALK_SUBEXPR ((*c)->ext.open->encoding);
-	  WALK_SUBEXPR ((*c)->ext.open->round);
-	  WALK_SUBEXPR ((*c)->ext.open->sign);
-	  WALK_SUBEXPR ((*c)->ext.open->asynchronous);
-	  WALK_SUBEXPR ((*c)->ext.open->id);
-	  WALK_SUBEXPR ((*c)->ext.open->newunit);
+	  WALK_SUBEXPR (co->ext.open->unit);
+	  WALK_SUBEXPR (co->ext.open->file);
+	  WALK_SUBEXPR (co->ext.open->status);
+	  WALK_SUBEXPR (co->ext.open->access);
+	  WALK_SUBEXPR (co->ext.open->form);
+	  WALK_SUBEXPR (co->ext.open->recl);
+	  WALK_SUBEXPR (co->ext.open->blank);
+	  WALK_SUBEXPR (co->ext.open->position);
+	  WALK_SUBEXPR (co->ext.open->action);
+	  WALK_SUBEXPR (co->ext.open->delim);
+	  WALK_SUBEXPR (co->ext.open->pad);
+	  WALK_SUBEXPR (co->ext.open->iostat);
+	  WALK_SUBEXPR (co->ext.open->iomsg);
+	  WALK_SUBEXPR (co->ext.open->convert);
+	  WALK_SUBEXPR (co->ext.open->decimal

[RFA] [PowerPC]

2011-04-19 Thread edmar

This patch fixes some test cases for PowerPC.

The tests pr39902-2.c, dfp-dd.c, and dfp-td.c reports as errors
when gcc is configured without dfp support. This patch will make the
tests to be reported as unsupported.

The test and-1.c has wrong logic.
In the formula:
y & ~(y & -y)

The part (y & -y) is always a mask with one bit set, which corresponds
to the least significant "1" bit in y.
The final result is that bit, is set to zero (y & ~mask)

There is no boolean simplification possible, and the compiler always 
produces

a nand instruction.

The test should be:
y & ~(y & ~x)

Which can be simplified to:
y & (~y or x)
y.~y or yx
yx

Optimized code has a single "and" instruction. Sub-optimal optimization
will generate a "nand" or an "orc" / "and" pair (as in gcc-4.3).


I also would like to request for someone to commit the patch after
approval, as I have no WAA privileges.

Regards,
Edmar

2011-04-19  Edmar Wienskoski  ed...@freescale.com

* gcc.target/powerpc/pr39902-2.c: Skip testcase for non-dfp
targets.
* gcc.target/powerpc/dfp-dd.c: ditto
* gcc.target/powerpc/dfp-td.c: ditto
* gcc.target/powerpc/and-1.c: Fixed testcase logic
Index: gcc-20100630/gcc/testsuite/gcc.target/powerpc/pr39902-2.c
===
--- gcc-20100630/gcc/testsuite/gcc.target/powerpc/pr39902-2.c	(revision 161589)
+++ gcc-20100630/gcc/testsuite/gcc.target/powerpc/pr39902-2.c	(working copy)
@@ -1,7 +1,7 @@
 /* Check that simplification "x*(-1)" -> "-x" is not performed for decimal
float types.  */
 
-/* { dg-do compile { target { powerpc*-*-linux* && powerpc_fprs } } } */
+/* { dg-do compile { target { powerpc*-*-linux* && { powerpc_fprs && dfp } } } } */
 /* { dg-options "-std=gnu99 -O -mcpu=power6" } */
 /* { dg-final { scan-assembler-not "fneg" } } */
 
Index: gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-dd.c
===
--- gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-dd.c	(revision 161589)
+++ gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-dd.c	(working copy)
@@ -1,6 +1,6 @@
 /* Test generation of DFP instructions for POWER6.  */
 /* Origin: Janis Johnson  */
-/* { dg-do compile { target { powerpc*-*-linux* && powerpc_fprs } } } */
+/* { dg-do compile { target { powerpc*-*-linux* && { powerpc_fprs && dfp } } } } */
 /* { dg-options "-std=gnu99 -mcpu=power6" } */
 
 /* { dg-final { scan-assembler "dadd" } } */
Index: gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-td.c
===
--- gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-td.c	(revision 161589)
+++ gcc-20100630/gcc/testsuite/gcc.target/powerpc/dfp-td.c	(working copy)
@@ -1,6 +1,6 @@
 /* Test generation of DFP instructions for POWER6.  */
 /* Origin: Janis Johnson  */
-/* { dg-do compile { target { powerpc*-*-linux* && powerpc_fprs } } } */
+/* { dg-do compile { target { powerpc*-*-linux* && { powerpc_fprs && dfp } } } } */
 /* { dg-options "-std=gnu99 -mcpu=power6" } */
 
 /* { dg-final { scan-assembler "daddq" } } */
Index: gcc-20100630/gcc/testsuite/gcc.dg/and-1.c
===
--- gcc-20100630/gcc/testsuite/gcc.dg/and-1.c	(revision 161589)
+++ gcc-20100630/gcc/testsuite/gcc.dg/and-1.c	(working copy)
@@ -3,8 +3,9 @@
 /* { dg-final { scan-assembler "and" { target powerpc*-*-* spu-*-* } } } */
 /* There should be no nand for this testcase (for either PPC or SPU). */
 /* { dg-final { scan-assembler-not "nand" { target powerpc*-*-* spu-*-* } } } */
+/* { dg-final { scan-assembler-not "orc" { target powerpc*-*-* spu-*-* } } } */
 
-int f(int y)
+int f(int y, int x)
 {
-  return y & ~(y & -y);
+  return y & ~(y & ~x);
 }


Re: Inliner heuristics facelift

2011-04-19 Thread H.J. Lu
On Tue, Apr 19, 2011 at 11:30 AM, Xinliang David Li  wrote:
> Is change tested this with SPEC?  5 or 6 SPEC2k programs ICE with the
> following error when built with FDO.
>
>  internal compiler error: in inline_small_functions, at ipa-inline.c:1361
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
>

This is caused by a different patch:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48671

-- 
H.J.


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Anatoly Sokolov





Please use
ASM_OUTPUT_ALIGNED_DECL_COMMON and ASM_OUTPUT_ALIGNED_DECL_LOCAL macros
here.


Confused. These macros are used.

Johann



Sorry... Im look on function name not on macro definition.

I agree with the patch. Please wait day or two if Denis would not object, 
commit patch.


Anatoly.


[google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Easwaran Raman
This makes the gcc driver pass the --save-temps option to the assembler or 
assembler wrapper so that post-assembly tools like MAO can be integrated. 
Bootstraps on x86_64. Ok for google/main?

2011-04-19  Easwaran Raman  

* gcc/gcc.c (static const char *asm_options): Pass
--save-temps to the assembler.

Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 172727)
+++ gcc/gcc.c   (working copy)
@@ -780,7 +780,7 @@ static const char *asm_options =
 #if HAVE_GNU_AS
 /* If GNU AS is used, then convert -w (no warnings), -I, and -v
to the assembler equivalents.  */
-"%{v} %{w:-W} %{I*} "
+"%{v} %{w:-W} %{I*} %{save-temps:--save-temps} "
 #endif
 "%a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}";
 

--
This patch is available for review at http://codereview.appspot.com/4436049


Re: [google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Diego Novillo
On Tue, Apr 19, 2011 at 15:02, Easwaran Raman  wrote:
> This makes the gcc driver pass the --save-temps option to the assembler or 
> assembler wrapper so that post-assembly tools like MAO can be integrated. 
> Bootstraps on x86_64. Ok for google/main?
>
> 2011-04-19  Easwaran Raman  
>
>        * gcc/gcc.c (static const char *asm_options): Pass

No need to specify 'static const char *'.  Just 'asm_options'.

>        --save-temps to the assembler.

Could you add a comment stating what version of binutils we need to use this?

I will commit this patch for you.  Given that you will be sending
other patches, please fill in the write access form at
http://sourceware.org/cgi-bin/pdw/ps_form.cgi


Diego.


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Denis Chertykov
2011/4/19 Anatoly Sokolov :
>
>
>>>
>>> Please use
>>> ASM_OUTPUT_ALIGNED_DECL_COMMON and ASM_OUTPUT_ALIGNED_DECL_LOCAL macros
>>> here.
>>
>> Confused. These macros are used.
>>
>> Johann
>>
>
> Sorry... Im look on function name not on macro definition.
>
> I agree with the patch. Please wait day or two if Denis would not object,
> commit patch.

Please, commit.

Denis.


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Richard Henderson
On 04/18/2011 10:20 AM, Georg-Johann Lay wrote:
> +avr_asm_named_section (const char *name, unsigned int flags, tree decl)
> +{
> +  if (!avr_need_copy_data_p)
> +avr_need_copy_data_p = ((0 == strncmp (name, ".data", 5)
> + || 0 == strncmp (name, ".rodata", 7)
> + || 0 == strncmp (name, ".gnu.linkonce.", 14)));
> +  
> +  if (!avr_need_clear_bss_p)
> +avr_need_clear_bss_p = (0 == strncmp (name, ".bss", 4));

Have a look at FLAGS.  I expect that you can reference those to
categorize the data rather than hard-coding the section names.
In particular I believe your ".gnu.linkonce" test is wrong.

It's not clear to me what you're looking for wrt "data".
Perhaps what you want is

  if (flags & (SECTION_DEBUG | SECTION_CODE))
;
  else if (flags & SECTION_BSS)
avr_need_clear_bss_p = true;
  else
avr_need_copy_data_p = true;


r~


Re: [google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Easwaran Raman
On Tue, Apr 19, 2011 at 12:19 PM, Diego Novillo  wrote:
> On Tue, Apr 19, 2011 at 15:02, Easwaran Raman  wrote:
>> This makes the gcc driver pass the --save-temps option to the assembler or 
>> assembler wrapper so that post-assembly tools like MAO can be integrated. 
>> Bootstraps on x86_64. Ok for google/main?
>>
>> 2011-04-19  Easwaran Raman  
>>
>>        * gcc/gcc.c (static const char *asm_options): Pass
>
> No need to specify 'static const char *'.  Just 'asm_options'.
>
>>        --save-temps to the assembler.
>
> Could you add a comment stating what version of binutils we need to use this?
This is to work with the assembler wrapper that has to know whether to
save the assembly file after running any post-assembly tools like MAO.

> I will commit this patch for you.  Given that you will be sending
> other patches, please fill in the write access form at
> http://sourceware.org/cgi-bin/pdw/ps_form.cgi
>
>
> Diego.
>


RE: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Weddington, Eric


> -Original Message-
> From: Richard Henderson [mailto:r...@redhat.com]
> Sent: Tuesday, April 19, 2011 1:31 PM
> To: Georg-Johann Lay
> Cc: gcc-patches@gcc.gnu.org; Weddington, Eric; Denis Chertykov; Anatoly
> Sokolov
> Subject: Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if
> needed
> 
> On 04/18/2011 10:20 AM, Georg-Johann Lay wrote:
> > +avr_asm_named_section (const char *name, unsigned int flags, tree decl)
> > +{
> > +  if (!avr_need_copy_data_p)
> > +avr_need_copy_data_p = ((0 == strncmp (name, ".data", 5)
> > + || 0 == strncmp (name, ".rodata", 7)
> > + || 0 == strncmp (name, ".gnu.linkonce.",
> 14)));
> > +
> > +  if (!avr_need_clear_bss_p)
> > +avr_need_clear_bss_p = (0 == strncmp (name, ".bss", 4));
> 
> Have a look at FLAGS.  I expect that you can reference those to
> categorize the data rather than hard-coding the section names.
> In particular I believe your ".gnu.linkonce" test is wrong.
> 
> It's not clear to me what you're looking for wrt "data".

We're looking for the existence of global initialized data variables.

We have 2 small subroutines in our libgcc, one to set everything in .bss to 
zero, and another to copy the initializations from .text (in flash) to the RAM 
based variables in .data. These subroutines have always been included in the 
startup code whether they were needed or not. The purpose of this patch is to 
check whether these subroutines are actually needed or not (i.e. if anything 
actually exists in .bss, or in .data). If either of these sections are actually 
empty, then we don't want to link in the corresponding subroutine, which helps 
us save some code space on small devices.

AFAICT, I've never seen the avr target generate anything in .rodata and 
.gnu.linkonce sections. I think these might be holdovers from some other target 
code and/or there for some completeness reason, I'm not really sure.

Eric Weddington


Re: FDO usability patch -- cfg and lineno checksum

2011-04-19 Thread Xinliang David Li
The attached is the revised patch with a warning suggested for cases
when CFG matches, but source locations change.

Ok for trunk?

thanks,

David

On Sun, Apr 17, 2011 at 8:36 AM, Jan Hubicka  wrote:
>> Hi,  in current FDO implementation, the source file version used in
>> profile-generate needs to strictly match the version used in
>> profile-use -- simple formating changes will invalidate the profile
>> data (use of it will lead to compiler error or ICE). There are two
>> main problems that lead to the weakness:
>>
>> 1) the function checksum is computed based on line number and number
>> of basic blocks -- this means any line number change will invalidate
>> the profile data. In fact, line number should play very minimal role
>> in profile matching decision. Another problem is that number of basic
>> blocks is also not a good indicator whether CFG matches or not.
>> 2) cgraph's pid is used as the key to find the matching profile data
>> for a function. If there is any new function added, or if the order of
>> the functions changes, the profile data is invalidated.
>>
>> The attached is a patch from google that improves this.
>> 1) function checksum is split into two parts: lineno checksum and cfg 
>> checksum
>> 2) function assembler name is used in profile hashing.
>
> Well, the overall idea was to make profile intolerant to pretty much any 
> source
> level change, since you never know if even when the CFG shape match, the 
> actual
> probabiliies did not changed.
> I however see that during development it is not bad to make compiler grok
> the profile as long as it seems consistent.
>
> I did not looked in detail into the implementation yet, but it seems
> sane at first glance.  However what about issuing a warning when line number
> information change telling user that profile might no longer be accurate
> and that he may want to remove it and train again?
>
> Honza
>
Index: tree.c
===
--- tree.c	(revision 172693)
+++ tree.c	(working copy)
@@ -8508,14 +8508,12 @@ dump_tree_statistics (void)
 
 #define FILE_FUNCTION_FORMAT "_GLOBAL__%s_%s"
 
-/* Generate a crc32 of a string.  */
+/* Generate a crc32 of a byte.  */
 
 unsigned
-crc32_string (unsigned chksum, const char *string)
+crc32_byte (unsigned chksum, char byte)
 {
-  do
-{
-  unsigned value = *string << 24;
+  unsigned value = (unsigned) byte << 24;
   unsigned ix;
 
   for (ix = 8; ix--; value <<= 1)
@@ -8526,6 +8524,18 @@ crc32_string (unsigned chksum, const cha
  	  chksum <<= 1;
  	  chksum ^= feedback;
   	}
+  return chksum;
+}
+
+
+/* Generate a crc32 of a string.  */
+
+unsigned
+crc32_string (unsigned chksum, const char *string)
+{
+  do
+{
+  chksum = crc32_byte (chksum, *string);
 }
   while (*string++);
   return chksum;
@@ -8549,8 +8559,10 @@ clean_symbol_name (char *p)
   *p = '_';
 }
 
-/* Generate a name for a special-purpose function function.
+/* Generate a name for a special-purpose function.
The generated name may need to be unique across the whole link.
+   Changes to this function may also require corresponding changes to
+   xstrdup_mask_random.
TYPE is some string to identify the purpose of this function to the
linker or collect2; it must start with an uppercase letter,
one of:
Index: tree.h
===
--- tree.h	(revision 172693)
+++ tree.h	(working copy)
@@ -4998,6 +4998,7 @@ inlined_function_outer_scope_p (const_tr
 
 /* In tree.c */
 extern unsigned crc32_string (unsigned, const char *);
+extern unsigned crc32_byte (unsigned, char);
 extern void clean_symbol_name (char *);
 extern tree get_file_function_name (const char *);
 extern tree get_callee_fndecl (const_tree);
Index: gcov.c
===
--- gcov.c	(revision 172693)
+++ gcov.c	(working copy)
@@ -54,6 +54,13 @@ along with Gcov; see the file COPYING3. 
some places we make use of the knowledge of how profile.c works to
select particular algorithms here.  */
 
+/* The code validates that the profile information read in corresponds
+   to the code currently being compiled.  Rather than checking for
+   identical files, the code below computes a checksum on the CFG
+   (based on the order of basic blocks and the arcs in the CFG).  If
+   the CFG checksum in the gcda file match the CFG checksum for the
+   code currently being compiled, the profile data will be used.  */
+
 /* This is the size of the buffer used to read in source file lines.  */
 
 #define STRING_SIZE 200
@@ -161,7 +168,8 @@ typedef struct function_info
   /* Name of function.  */
   char *name;
   unsigned ident;
-  unsigned checksum;
+  unsigned lineno_checksum;
+  unsigned cfg_checksum;
 
   /* Array of basic blocks.  */
   block_t *blocks;
@@ -809,12 +817,14 @@ read_graph_file (void)
   if (tag == GCOV_TAG_FUNCTION)
 	{
 	  char *function_name;
-	  unsigned iden

better wpa [1/n]: merge types during read-in

2011-04-19 Thread Michael Matz
Hi,

I have a backlog of random improvements to the WPA phase of LTO 
compilations, all with the common goal of reducing peak memory usage.  I 
was basically dumping all trees that the WPA phase read in, and then tried 
to think about which trees can be merged with already existing ones very 
early or alternatively which shouldn't have been in the global section 
from the beginning.  I.e. whenever there were two trees left after reading 
in all units, that IMO should have been the same, I either tried to merge 
them, or find reasons why they in fact were not the same.

This is the first part, that in essence merely moves the merging of types 
somewhat earlier.  Currently all non-local trees of all object files are 
read in, then symbols are merged, then types and prevailing decls are set.  
That's the maximum of peak memory usage you can get, because during 
read-in virtually no trees will be garbage collected.

There's much one can merge already during read-in, types are the easiest 
thing, some obvious top-level trees are in the same league (integer and 
string constants, tree_lists used to collect arguments, field_decls of the 
just replaced types).  Later patches will extend this set.  This current 
one will deal just with the mentioned entities.

One nice property of our reader is, that all trees are nicely collected in 
a sequential array (the reader cache).  This implies that we don't have to 
use walk_tree to get to all reachable trees, we merely have to iterate 
through that array and are sure we'll catch everything (except some 
builtin trees).  We also don't have to remember which trees we've already 
visited, we'll naturally visit each one just once.

The current callbacks of walk_tree are used for two purposes: merging 
types and setting prevailing decls.  The latter can only be done after the 
prevailing decls are known, and that is only when we've seen all object 
files (even in the case of the linker plugin).  Hence, while uniquifying 
the read trees I'm also remembering those that (potentially) refer to 
variables or functions in a hashtable, which conveniently has support in 
the garbage collector to remove unreachable trees (so, we'll later only 
iterate over those trees that are known to be reachable).  I'm using that 
hash at the place where currently walk_tree is used to iterate over only 
those trees that potentially refer to a var/function_decl.

The type merger is in essence a moved and slightly changed copy of the 
original walk_tree callback routines.  Via some asserts I've verified that 
I'm replacing all reachable vars or functions with their prevailing decl, 
before removing the old callbacks (the same cannot be done with the merged 
types, due to ordering issues type merging depends on other types already 
being merged, or, worse, some decls being merged; that's already currently 
the case, rerunning the type merger will produce more merged types).

I'm starting to merge tree as soon as a top-level invocation of 
lto_input_tree is finished.  That is the earliest point at which all 
recursively reachable trees are known and read in.  It's better to do it 
at that place than to defer it until the whole unit is read in because so 
the next invocations of input_tree (possibly referring back to already 
existing trees in the cache) will make use of the already merged variants.

Some statistics for an LTO bootstrap, in this case only for the cc1 
executable, only for the WPA phase, generated with a -O0 compiled lto1 
(the cc1 object are from stage3 of a normal lto bootstrap).  Without the 
patch:

Merging declarations
 {GC 320745k -> 204020k}Reading summaries
 ipa lto gimple out:   6.08 ( 7%) usr   0.14 ( 9%) sys   6.23 ( 7%) wall   
0 kB ( 0%) ggc
 ipa lto decl in   :  21.24 (23%) usr   0.38 (24%) sys  21.65 (23%) wall  
311556 kB (65%) ggc
 ipa lto decl out  :  17.89 (19%) usr   0.01 ( 1%) sys  17.91 (19%) wall   
0 kB ( 0%) ggc
 ipa lto decl merge:  12.09 (13%) usr   0.13 ( 8%) sys  12.22 (13%) wall   
7704 kB ( 2%) ggc
 inline heuristics :  21.05 (23%) usr   0.00 ( 0%) sys  21.07 (22%) wall   
28972 kB ( 6%) ggc
 TOTAL :  92.99 1.5894.68 
479817 kB

I.e. right after reading in all units we have 320MB of ggc memory.

With the patch:

Merging declarations
 {GC 193371k -> 176876k}Reading summaries
 ipa lto gimple out:   6.63 ( 8%) usr   0.19 ( 9%) sys  11.13 (12%) wall
   0 kB ( 0%) ggc
 ipa lto decl in   :  27.09 (32%) usr   0.30 (14%) sys  27.79 (30%) wall  
318044 kB (66%) ggc
 ipa lto decl out  :  13.96 (17%) usr   0.02 ( 1%) sys  13.99 (15%) wall
   0 kB ( 0%) ggc
 ipa lto decl merge:   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall
   5 kB ( 0%) ggc
 inline heuristics :  21.04 (25%) usr   0.03 ( 1%) sys  21.10 (23%) wall   
28972 kB ( 6%) ggc
 TOTAL :  84.10 2.1791.78 
478594 kB

So, about 193MB of ggc memory after rea

FDO usability: pid handling

2011-04-19 Thread Xinliang David Li
Hi, Insane value profile data may contain indirect call targets with
wrong (corrupted) pids.  r172276 solves the problem when the pid
refers to a bogus target that is still 'alive'. This patch addresses
the issue when the bogus target is already eliminated or pid is too
large.

OK after testing? (SPEC FDO testing is currently blocked by some regressions).

Thanks,

David



2011-04-19  Xinliang David Li  

* cgraph.h: New function declaration.
* cgraph.c (cgraph_remove_node): Remove from pid map.
* value-prof.c (find_func_by_pid): Handle insane pid.
(cgraph_remove_pid): New function.
Index: cgraph.h
===
--- cgraph.h	(revision 172721)
+++ cgraph.h	(working copy)
@@ -732,6 +732,8 @@ void compute_inline_parameters (struct c
 cgraph_inline_failed_t cgraph_edge_inlinable_p (struct cgraph_edge *);
 
 
+void cgraph_remove_pid (struct cgraph_node *);
+
 /* Create a new static variable of type TYPE.  */
 tree add_new_static_var (tree type);
 
Index: cgraph.c
===
--- cgraph.c	(revision 172721)
+++ cgraph.c	(working copy)
@@ -1478,6 +1478,7 @@ cgraph_remove_node (struct cgraph_node *
   cgraph_call_node_removal_hooks (node);
   cgraph_node_remove_callers (node);
   cgraph_node_remove_callees (node);
+  cgraph_remove_pid (node);
   ipa_remove_all_references (&node->ref_list);
   ipa_remove_all_refering (&node->ref_list);
   VEC_free (ipa_opt_pass, heap,
Index: value-prof.c
===
--- value-prof.c	(revision 172721)
+++ value-prof.c	(working copy)
@@ -1083,13 +1083,35 @@ init_pid_map (void)
 /* Return cgraph node for function with pid */
 
 static inline struct cgraph_node*
-find_func_by_pid (int	pid)
+find_func_by_pid (int pid)
 {
   init_pid_map ();
 
+  if (pid >= cgraph_max_pid)
+{
+  if (flag_profile_correction)
+inform (DECL_SOURCE_LOCATION (current_function_decl),
+"Inconsistent profile: indirect call target (%d) does not exist", pid);
+  else
+error ("Inconsistent profile: indirect call target (%d) does not exist", pid);
+
+  return NULL;
+}
+
   return pid_map [pid];
 }
 
+/* Remove a cgraph node N from maps of pids  */
+
+void
+cgraph_remove_pid (struct cgraph_node *n)
+{
+  if (pid_map == NULL || n->pid < 0)
+return;
+
+  pid_map [n->pid] = NULL;
+}
+
 /* Perform sanity check on the indirect call target. Due to race conditions,
false function target may be attributed to an indirect call site. If the
call expression type mismatches with the target function's type, expand_call


Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Georg-Johann Lay

Weddington, Eric schrieb:



-Original Message-
From: Richard Henderson [mailto:r...@redhat.com]
Sent: Tuesday, April 19, 2011 1:31 PM
To: Georg-Johann Lay
Cc: gcc-patches@gcc.gnu.org; Weddington, Eric; Denis Chertykov; Anatoly
Sokolov
Subject: Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if
needed

On 04/18/2011 10:20 AM, Georg-Johann Lay wrote:


+avr_asm_named_section (const char *name, unsigned int flags, tree decl)
+{
+  if (!avr_need_copy_data_p)
+avr_need_copy_data_p = ((0 == strncmp (name, ".data", 5)
+ || 0 == strncmp (name, ".rodata", 7)
+ || 0 == strncmp (name, ".gnu.linkonce.",


14)));


+
+  if (!avr_need_clear_bss_p)
+avr_need_clear_bss_p = (0 == strncmp (name, ".bss", 4));


Have a look at FLAGS.  I expect that you can reference those to
categorize the data rather than hard-coding the section names.
In particular I believe your ".gnu.linkonce" test is wrong.

It's not clear to me what you're looking for wrt "data".






We're looking for the existence of global initialized data variables.

We have 2 small subroutines in our libgcc, one to set everything in
.bss to zero, and another to copy the initializations from .text (in
flash) to the RAM based variables in .data. These subroutines have
always been included in the startup code whether they were needed or
not. The purpose of this patch is to check whether these subroutines
are actually needed or not (i.e. if anything actually exists in .bss,
or in .data). If either of these sections are actually empty, then we
don't want to link in the corresponding subroutine, which helps us
save some code space on small devices.

AFAICT, I've never seen the avr target generate anything in .rodata
and .gnu.linkonce sections. I think these might be holdovers from
some other target code and/or there for some completeness reason, I'm
not really sure.


When I wrote this patch I looked at the default linker script to see 
what goes into .data resp .bss; the hard-coded section maned reflect 
that. For the linkonce stuff I found no explanation (grepping the web 
yields bunch of questions from people having trouble with it) and in the 
gcc wiki you find nothing (except you have already a link to what you seek).


The .gnu.linkonce is an overestimation, .gnu.linkonce.d. is perhaps 
better, but exceptions are not supported on avr, anyway. So an 
overestimation won't hurt.


Then I wonder where .noinit, .progmem.data, .progmem.gcc_sw_table
would go?

Moreover, with checking section names it might be easier for users that 
want some private sections treated as cleared by __do_clear_bss or 
initialized by __do_copy_data.


AFAIK the section attribute does not allow to set section flags?

Eric, you don't see .rodata because of avr.c:avr_asm_init_sections
 readonly_data_section = data_section;
To see .rodata give -fdata-sections for const not in progmem.

The only thing I am a bit uncomfortable with is that there is no command 
line option to unconditionally reference the __do's.


Johann



[m32c] Various patches to fix build errors.

2011-04-19 Thread DJ Delorie

The stage 1 work had broken m32c-elf.  This at least fixes it so the
build works, along with a few performance tweaks I had sitting around
in my build.  Applied.

2011-04-19  DJ Delorie  

* config/m32c/m32c.c (m32c_emit_epilogue): Don't try to push
registers if we already know there aren't any.
(m32c_emit_epilogue): Don't emit a barrier here.
(m32c_emit_eh_epilogue): Likewise.
* config/m32c/blkmov.md (movstr): Don't fail on wrong-type
operands at expand time.
* config/m32c/m32c.h (WCHAR_TYPE_SIZE): Change to 4 to match "long
int" wchar type.
(REG_CLASS_CONTENTS, reg_class, REG_CLASS_NAMES): Remove
duplicates.  Provide aliases instead.
* config/m32c/prologue.md (eh_return): Emit a barrier here.
(eh_epilogue): Add a "(return)" here as a hint to other parts of
the compiler.

Index: config/m32c/m32c.c
===
--- config/m32c/m32c.c  (revision 172733)
+++ config/m32c/m32c.c  (working copy)
@@ -4482,17 +4482,20 @@ m32c_emit_prologue (void)
 /* Likewise, for the epilogue.  The only exception is that, for
interrupts, we must manually unwind the frame as the REIT opcode
doesn't do that.  */
 void
 m32c_emit_epilogue (void)
 {
+  int popm_count = m32c_pushm_popm (PP_justcount);
+
   /* This just emits a comment into the .s file for debugging.  */
-  if (m32c_pushm_popm (PP_justcount) > 0 || cfun->machine->is_interrupt)
+  if (popm_count > 0 || cfun->machine->is_interrupt)
 emit_insn (gen_epilogue_start ());
 
-  m32c_pushm_popm (PP_popm);
+  if (popm_count > 0)
+m32c_pushm_popm (PP_popm);
 
   if (cfun->machine->is_interrupt)
 {
   enum machine_mode spmode = TARGET_A16 ? HImode : PSImode;
 
   /* REIT clears B flag and restores $fp for us, but we still
@@ -4543,25 +4546,23 @@ m32c_emit_epilogue (void)
   else if (cfun->machine->use_rts)
 emit_jump_insn (gen_epilogue_rts ());
   else if (TARGET_A16)
 emit_jump_insn (gen_epilogue_exitd_16 ());
   else
 emit_jump_insn (gen_epilogue_exitd_24 ());
-  emit_barrier ();
 }
 
 void
 m32c_emit_eh_epilogue (rtx ret_addr)
 {
   /* R0[R2] has the stack adjustment.  R1[R3] has the address to
  return to.  We have to fudge the stack, pop everything, pop SP
  (fudged), and return (fudged).  This is actually easier to do in
  assembler, so punt to libgcc.  */
   emit_jump_insn (gen_eh_epilogue (ret_addr, cfun->machine->eh_stack_adjust));
   /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
-  emit_barrier ();
 }
 
 /* Indicate which flags must be properly set for a given conditional.  */
 static int
 flags_needed_for_conditional (rtx cond)
 {
Index: config/m32c/blkmov.md
===
--- config/m32c/blkmov.md   (revision 172733)
+++ config/m32c/blkmov.md   (working copy)
@@ -212,14 +212,14 @@
 ;; 0 = target: set to &NUL in dest
 ;; 1 = destination (mem:BLK ...)
 ;; 2 = source (mem:BLK ...)
 
 (define_expand "movstr"
   [(match_operand 0 "m32c_nonimmediate_operand" "")
-   (match_operand 1 "ap_operand" "")
-   (match_operand 2 "ap_operand" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")
]
   "TARGET_A24"
   "if (m32c_expand_movstr(operands)) DONE; FAIL;"
   )
 
 ;; 0 = dest (out)
Index: config/m32c/m32c.h
===
--- config/m32c/m32c.h  (revision 172733)
+++ config/m32c/m32c.h  (working copy)
@@ -198,13 +198,13 @@ machine_function;
 #define SIZE_TYPE "unsigned int"
 
 #undef  WCHAR_TYPE
 #define WCHAR_TYPE "long int"
 
 #undef  WCHAR_TYPE_SIZE
-#define WCHAR_TYPE_SIZE BITS_PER_WORD
+#define WCHAR_TYPE_SIZE 32
 
 /* REGISTER USAGE */
 
 /* Register Basics */
 
 /* Register layout:
@@ -285,38 +285,39 @@ machine_function;
   { 0x0004 }, /* R1  - r1 */\
   { 0x0002 }, /* R2  - r2 */\
   { 0x0008 }, /* R3  - r3 */\
   { 0x0003 }, /* R02 - r0r2 */\
   { 0x000c }, /* R13 - r1r3 */\
   { 0x0005 }, /* HL  - r0 r1 */\
-  { 0x0005 }, /* QI  - r0 r1 */\
   { 0x000a }, /* R23 - r2 r3 */\
   { 0x000f }, /* R03 - r0r2 r1r3 */\
-  { 0x000f }, /* DI  - r0r2r1r3 + mems */\
   { 0x0010 }, /* A0  - a0 */\
   { 0x0020 }, /* A1  - a1 */\
   { 0x0030 }, /* A   - a0 a1 */\
   { 0x00f0 }, /* AD  - a0 a1 sb fp */\
   { 0x01f0 }, /* PS  - a0 a1 sb fp sp */\
-  { 0x000f }, /* SI  - r0r2 r1r3 a0a1 */\
-  { 0x003f }, /* HI  - r0 r1 r2 r3 a0 a1 */\
   { 0x0033 }, /* R02A  - r0r2 a0 a1 */ \
-  { 0x003f }, /* RA  - r0..r3 a0 a1 */\
+  { 0x003f }, /* RA  - r0 r1 r2 r3 a0 a1 */\
   { 0x007f }, /* GENERAL */\
   { 0x0400 }, /* FLG */\
   { 0x01ff }, /* HC  - r0l r1 r2 r3 a0 a1 sb fb sp */\
   { 0x000ff000 }, /* MEM */\
   { 0x000ff003 }, /* R02_A_MEM */\
   { 0x000ff005 }, /* A_HL_MEM */\
   { 0x000ff00c }, /* R1_R3_A_MEM */\
   { 0x000ff00f }, /* R03_MEM */\
   { 0x000ff03

Re: [Patch,AVR]: PR18145: do_copy_data & do_clear_bss only if needed

2011-04-19 Thread Georg-Johann Lay

Georg-Johann Lay schrieb:

When I wrote this patch I looked at the default linker script to see 
what goes into .data resp .bss; the hard-coded section maned reflect

.names
that. For the linkonce stuff I found no explanation (grepping the web 
yields bunch of questions from people having trouble with it) and in the 
gcc wiki you find nothing (except you have already a link to what you 
seek).


The .gnu.linkonce is an overestimation, .gnu.linkonce.d. is perhaps 
better, but exceptions are not supported on avr, anyway. So an 
overestimation won't hurt.


Then I wonder where .noinit, .progmem.data, .progmem.gcc_sw_table
would go?


I mean what flags they get.

.init* gets SECTION_BSS in avr_section_type_flags, but there is also a 
hard-coded section name.


Moreover, there is a discrepancy between C and C++, see PR34734.
In C++, avr_section_type_flags sees no DECL_INITIAL (decl) even if one 
is present. I read tree stuff and they make no difference between C/C++ 
and promise DECL_INITIAL is there. Perhaps it's because of different 
parsers?


avr_section_type_flags could attach (target specific) flags.
However, checking section names explicitely appeared more robust to me, 
especially because PR34734 indicates that not all information is readily 
available.


Moreover, this patch is integrated in some avr-gcc distributions and has 
experienced some real world testing. I didn't hear complaints, but it's 
likely that I missed them because I am just volonteering and have long 
times of inactivity because of technical reasons then and when. Eric has 
definetely a better oversight.


Johann


Re: better wpa [1/n]: merge types during read-in

2011-04-19 Thread Eric Botcazou
> What do people think? (regstrapped with and without LTO bootstrap, the
> latter with all languages, the former without Ada, on x86_64-linux).

It would be nice to LTO bootstrap it with Ada as well because this involves 
different kinds of type merging.  You just need the one-liner at:
  http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01461.html

Btw, if someone could approve (or reject) the above patch...

-- 
Eric Botcazou


  1   2   >