Re: Fortran Patches

2011-09-16 Thread Janus Weil
Hi Tobias,

> could you also send patches, which you commit as obvious, to the mailing lists?

yes, I usually do this, but this time I just forgot. Sorry.


> Regarding the last patch, the GNU style puts a line break after the ")" in:
>
> +  if (!sym) return NULL;
> +

In principle I'm aware of the GNU coding style, but apparently I
didn't pay enough attention. Sorry again. I'll fix it ...
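
For reference, a minimal sketch of the layout being asked for. The names `struct sym` and `lookup_name` are made up for illustration and are not taken from the patch:

```c
#include <stddef.h>

/* Hypothetical symbol type, for illustration only.  */
struct sym
{
  const char *name;
};

/* GNU style: the statement controlled by "if" goes on its own line,
   indented two further columns, instead of following the ")".  */
const char *
lookup_name (const struct sym *sym)
{
  if (!sym)
    return NULL;

  return sym->name;
}
```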

Cheers,
Janus


[Ada] fix potential memory corruption in annotated value cache

2011-09-16 Thread Alexandre Oliva
A -fcompare-debug regression in s-regexp.adb on x86_64-linux-gnu turned
out to be caused by a hashtable management error in annotate_value().
We ask for an insertion and leave the allocated slot empty while
proceeding to other computations that might (a) return without filling
it in, or (b) recurse and allocate the same slot or (c) grow and move
the table.

(a) is a performance problem because it might render cached values
invisible, should the allocated slot be formerly deleted, thus breaking
a rehash chain.

(b) is also a performance problem, because when the context that first
allocated the slot proceeds to fill it in, it may override another
cached value that happened to be assigned the same slot.  This is what
caused the -fcompare-debug difference: the annotation of a value had its
cache entry overridden by an upstream caller in only one of the
compilations, so a subsequent call of annotate_value with the same value
resulted in a use of the cached value in one, and an expansion that
remapped new decls in the other.  This out-of-sync decl numbering ended
up causing different symbol names to be chosen within
lhd_set_decl_assembler_name().

(c) is the scariest possibility: if the hash table that holds cached
values grows to the point of being moved during recursion, and upon
return we fill in the pointer in the slot that is in the old (possibly
reused) storage, we may be corrupting internal compiler state.

Some possible fixes I considered were:

1. inserting on entry (as is), allocating the cache entry right away,
and *always* filling it before returning

2. inserting on entry (as is), allocating the cache entry right away,
and releasing it before returning unless we're filling it in

3. not inserting on entry, and looking up again for insertion before
caching and returning, so as to get a fresh slot pointer

I implemented 3., and considered splitting the logic of annotate_value()
into one function that manages caching and calls the other to perform
the computation, so as to simplify the implementation.
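
To make the shape of option 3 concrete, here is a minimal sketch using a toy open-addressing table. It is not libiberty's htab and not the decl.c code: `struct cache`, `probe`, `find_slot` and `cached_fib` are all hypothetical names, and the only property borrowed from the real table is that an inserting lookup may move the storage.

```c
#include <stdlib.h>

/* Toy cache mapping a positive key to a value.  A key of 0 marks an
   empty slot.  Only a stand-in for a table that moves when it grows.  */
struct entry
{
  int key;
  long val;
};

struct cache
{
  struct entry *slots;
  size_t size;			/* Zero or a power of two.  */
  size_t used;
};

/* Return the slot holding KEY, or the empty slot where it belongs.  */
static struct entry *
probe (struct entry *slots, size_t size, int key)
{
  size_t i = (size_t) key & (size - 1);

  while (slots[i].key != 0 && slots[i].key != key)
    i = (i + 1) & (size - 1);
  return &slots[i];
}

/* Look KEY up.  With INSERT nonzero, make room first, which may
   reallocate the table and invalidate every older slot pointer.  */
static struct entry *
find_slot (struct cache *c, int key, int insert)
{
  struct entry *e;

  if (insert && (c->used + 1) * 2 > c->size)
    {
      size_t i, nsize = c->size ? c->size * 2 : 4;
      struct entry *n = calloc (nsize, sizeof *n);

      if (!n)
	abort ();
      for (i = 0; i < c->size; i++)
	if (c->slots[i].key != 0)
	  *probe (n, nsize, c->slots[i].key) = c->slots[i];
      free (c->slots);
      c->slots = n;
      c->size = nsize;
    }

  if (c->size == 0)
    return NULL;
  e = probe (c->slots, c->size, key);
  if (e->key == 0 && !insert)
    return NULL;
  return e;
}

/* Memoized Fibonacci for N >= 1: look up without inserting, recurse,
   then search again so the slot pointer is fresh.  */
long
cached_fib (struct cache *c, int n)
{
  struct entry *e = find_slot (c, n, 0);
  long ret;

  if (e)
    return e->val;

  ret = n <= 2 ? 1 : cached_fib (c, n - 1) + cached_fib (c, n - 2);

  e = find_slot (c, n, 1);	/* Pointers obtained earlier may be stale.  */
  if (e->key == 0)
    {
      e->key = n;
      c->used++;
    }
  e->val = ret;
  return ret;
}
```

Holding on to a slot pointer obtained before the recursive calls would be the pattern described in (c): the table may have been reallocated in between.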

Here's the patch I've tested on i686-pc-linux-gnu and x86_64-linux-gnu.
Ok to install?

for  gcc/ada/ChangeLog
from  Alexandre Oliva  

	* gcc-interface/decl.c (annotate_value): Look up expression for
	insertion in the cache at the end.

Index: gcc/ada/gcc-interface/decl.c
===
--- gcc/ada/gcc-interface/decl.c.orig	2011-09-15 03:51:42.984761174 -0300
+++ gcc/ada/gcc-interface/decl.c	2011-09-15 03:51:44.698733097 -0300
@@ -7471,23 +7471,26 @@ annotate_value (tree gnu_size)
 {
   TCode tcode;
   Node_Ref_Or_Val ops[3], ret;
-  struct tree_int_map **h = NULL;
+  struct tree_int_map in;
   int i;
 
   /* See if we've already saved the value for this node.  */
   if (EXPR_P (gnu_size))
 {
-  struct tree_int_map in;
+  struct tree_int_map **h;
+
   if (!annotate_value_cache)
 annotate_value_cache = htab_create_ggc (512, tree_int_map_hash,
 	tree_int_map_eq, 0);
   in.base.from = gnu_size;
   h = (struct tree_int_map **)
-	htab_find_slot (annotate_value_cache, &in, INSERT);
+	htab_find_slot (annotate_value_cache, &in, NO_INSERT);
 
-  if (*h)
+  if (h)
 	return (Node_Ref_Or_Val) (*h)->to;
 }
+  else
+in.base.from = NULL;
 
   /* If we do not return inside this switch, TCODE will be set to the
  code to use for a Create_Node operand and LEN (set above) will be
@@ -7588,8 +7591,17 @@ annotate_value (tree gnu_size)
   ret = Create_Node (tcode, ops[0], ops[1], ops[2]);
 
   /* Save the result in the cache.  */
-  if (h)
+  if (in.base.from)
 {
+  struct tree_int_map **h;
+  /* We can't assume the hash table data hasn't moved since the
+	 initial look up, so we have to search again.  Allocating and
+	 inserting an entry at that point would be an alternative, but
+	 then we'd better discard the entry if we decided not to cache
+	 it.  */
+  h = (struct tree_int_map **)
+	htab_find_slot (annotate_value_cache, &in, INSERT);
+  gcc_assert (!*h);
   *h = ggc_alloc_tree_int_map ();
   (*h)->base.from = gnu_size;
   (*h)->to = ret;


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


RE: [Patch, gcc, testsuite] Use long enums for case foldconst-3.c for target ARM EABI.

2011-09-16 Thread Terry Guo
Ping.

BR,
Terry

> -Original Message-
> From: Terry Guo [mailto:terry@arm.com]
> Sent: Sunday, September 11, 2011 9:39 AM
> To: gcc-patches@gcc.gnu.org
> Cc: r...@cebitec.uni-bielefeld.de
> Subject: [Patch, gcc, testsuite] Use long enums for case foldconst-3.c
> for target ARM EABI.
> 
> Hello,
> 
> This patch aims to disable short enums for arm eabi otherwise the case
> will fail to be compiled due to "width of 'code' exceeds its type". Is it
> OK to trunk?
> 
> BR,
> Terry
> 
> 2011-09-09  Terry Guo  
> 
> * gcc.dg/tree-ssa/foldconst-3.c: Use -fno-short-enums
> for ARM EABI target.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> index 6132362..e030f53 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized -fno-short-enums" { target arm_eabi } } */
>  typedef const union tree_node *const_tree;
>  typedef struct
>  {
> 
> 






Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-09-16 Thread Dodji Seketeli
> On 08/04/2011 11:32 AM, Dodji Seketeli wrote:
> > +++ b/gcc/diagnostic.c
> > @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "input.h"
> >  #include "intl.h"
> >  #include "diagnostic.h"
> > +#include "vec.h"
> 
> Do you still need this?

Oops, no.  Removed and adjusted gcc/Makefile.in accordingly.

> 
> >  // Just discard errors pointing at header files
> >  // { dg-prune-output "include" }
> > +// { dg-prune-output "from" }
> 
> These should be pruned by testsuite/lib/prune.exp.  I'm surprised they
> aren't already.

OK.  I have added that pruning to prune.exp and removed it from the
relevant test case files.

> 
> > +#define APPEND_LOC_TO_VEC(LOC) \
> > +  if (num_locs >= loc_vec_capacity)  \
> > +{  \
> > +  loc_vec_capacity += 4; \
> > +  loc_vec = XRESIZEVEC (loc_t, loc_vec, loc_vec_capacity); \
> > +}  \
> > +  loc_vec[num_locs++] = LOC;
> 
> Why not use VEC since we're in gcc/ here?

This is another leftover of when this code wasn't in gcc/.  I am using
VEC now in the amended patch.

> 
> > +/* Unwind the different macro expansions that lead to the token which
> > +   location is WHERE and emit diagnostics showing the resulting
> > +   unwound macro expansion stack.  If TOPMOST_EXP_POINT_MAP is
> > +   non-null, *TOPMOST_EXP_POINT_MAP is set to the map of the expansion
> > +   point of the top most macro of the stack.  This must be an ordinary
> > +   map.  */
> 
> I find the use of "top" here confusing.  You mean the place in the
> source that first triggered the macro expansion, right?

Yes.

>  Can we avoid talking about stacks here?

OK.  Sorry for the confusion.  I have removed the stack analogy and added
hopefully more accurate comments.

[...]

> > +  while (unwind)
> > +{
> ...
> > +  if (!linemap_macro_expansion_map_p (map))
> > +   unwind = false;
> > +}
> 
> This seems like a job for do/while.

Updated accordingly.
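
As a rough illustration of the loop shape suggested above, with made-up types: `struct toy_map` and `count_unwound_maps` are hypothetical and are not the linemap API or the code in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a line map; not libcpp's struct line_map.  */
struct toy_map
{
  bool is_macro;
  const struct toy_map *expansion_point;
};

/* Follow expansion points until a non-macro map has been visited and
   return the number of maps visited.  MAP must be non-null and every
   macro map must have a non-null expansion point.  The body always
   runs at least once, so do/while needs no separate "unwind" flag.  */
int
count_unwound_maps (const struct toy_map *map)
{
  const struct toy_map *cur;
  int n = 0;

  do
    {
      cur = map;
      n++;
      map = cur->expansion_point;
    }
  while (cur->is_macro);

  return n;
}
```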

I have fixed some other nits in the patch, bootstrapped and tested it on
x86_64-unknown-linux-gnu against a tree based on trunk and containing
the previous patches of the set.

Thanks.

From: Dodji Seketeli 
Date: Sat, 4 Dec 2010 16:31:35 +0100
Subject: [PATCH 3/7] Emit macro expansion related diagnostics

In this third instalment the diagnostic machinery -- when faced with
the virtual location of a token resulting from macro expansion -- uses
the new linemap APIs to unwind the stack of macro expansions that led
to that token and emits a [hopefully] more useful message than what we
have today.

diagnostic_report_current_module has been slightly changed to use the
location given by client code instead of the global input_location
variable.  This results in more precise diagnostic locations in
general but then the patch adjusts some C++ tests whose output changed
as a result of this.

Three new regression tests have been added.

The mandatory screenshot goes like this:

[dodji@adjoa gcc]$ cat -n test.c
     1  #define OPERATE(OPRD1, OPRT, OPRD2) \
     2    OPRD1 OPRT OPRD2;
     3
     4  #define SHIFTL(A,B) \
     5    OPERATE (A,<<,B)
     6
     7  #define MULT(A) \
     8    SHIFTL (A,1)
     9
    10  void
    11  g ()
    12  {
    13    MULT (1.0);/* 1.0 << 1; <-- so this is an error.  */
    14  }

[dodji@adjoa gcc]$ ./cc1 -quiet -ftrack-macro-expansion test.c
test.c: In function ‘g’:
test.c:5:14: erreur: invalid operands to binary << (have ‘double’ and ‘int’)
test.c:2:9: note: in expansion of macro 'OPERATE'
test.c:5:3: note: expanded from here
test.c:5:14: note: in expansion of macro 'SHIFTL'
test.c:8:3: note: expanded from here
test.c:8:3: note: in expansion of macro 'MULT'
test.c:13:3: note: expanded from here

The combination of this patch and the previous ones bootstrapped with
--enable-languages=all,ada and passed regression tests on
x86_64-unknown-linux-gnu.

gcc/
* gcc/diagnostic.h (diagnostic_report_current_module): Add a
location parameter.
* diagnostic.c (diagnostic_report_current_module): Add a location
parameter to the function definition.  Use it instead of
input_location.  Resolve the virtual location rather than just
looking up its map and risking to touch a resulting macro map.
(default_diagnostic_starter): Pass the relevant diagnostic
location to diagnostic_report_current_module.
* tree-diagnostic.c (maybe_unwind_expanded_macro_loc): New.
(virt_loc_aware_diagnostic_finalizer): Likewise.
(diagnostic_report_current_function): Pass the
relevant location to diagnostic_report_current_module.
* tree-diagnostic.h (virt_loc_aware_diagnostic_finalizer): Declare
new function.
* toplev.c (general_init

do not copy DEBUG_EXPRs in copy_insn

2011-09-16 Thread Alexandre Oliva
copy_rtx avoids duplicating DEBUG_EXPRs, so that they map one-to-one to
DEBUG_EXPR_DECLs, but copy_insn lacks these smarts.  This was exposed by
a patch I'm working on, that uses RTL flags in DEBUG_EXPRs to speed up
expansion of var location notes within var-tracking.
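
A toy sketch of the idea, assuming nothing about GCC's actual rtx representation: `enum node_kind`, `struct node` and `toy_copy` are hypothetical names. Some node kinds are handed back as-is rather than duplicated, so every reference keeps pointing at the same object.

```c
#include <stdlib.h>

/* Hypothetical node kinds; not GCC's rtx codes.  */
enum node_kind
{
  NK_REG,
  NK_DEBUG_EXPR,
  NK_CONST_INT,
  NK_PLUS
};

struct node
{
  enum node_kind kind;
  struct node *op[2];		/* Operands; used by NK_PLUS only.  */
};

/* Copy ORIG recursively, except that shared kinds are returned
   unchanged so they stay unique.  */
struct node *
toy_copy (struct node *orig)
{
  struct node *copy;
  int i;

  switch (orig->kind)
    {
    case NK_REG:
    case NK_DEBUG_EXPR:
    case NK_CONST_INT:
      /* Shared kinds: hand back the original object.  */
      return orig;
    default:
      break;
    }

  copy = malloc (sizeof *copy);
  if (!copy)
    abort ();
  copy->kind = orig->kind;
  for (i = 0; i < 2; i++)
    copy->op[i] = orig->op[i] ? toy_copy (orig->op[i]) : NULL;
  return copy;
}
```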

Regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu.  Ok to install?

for  gcc/ChangeLog
from  Alexandre Oliva  

	* emit-rtl.c (copy_insn_1): Do not copy DEBUG_EXPRs.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c.orig	2011-09-15 03:51:41.982777588 -0300
+++ gcc/emit-rtl.c	2011-09-15 03:51:50.767633680 -0300
@@ -5269,6 +5269,7 @@ copy_insn_1 (rtx orig)
   switch (code)
 {
 case REG:
+case DEBUG_EXPR:
 case CONST_INT:
 case CONST_DOUBLE:
 case CONST_FIXED:


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: [PATCH 4/7] Support -fdebug-cpp option

2011-09-16 Thread Dodji Seketeli
Jason Merrill  writes:

> On 08/24/2011 10:06 AM, Tom Tromey wrote:
>> Dodji> Would it be acceptable to just change the output of -fdirective to fit?
>> Dodji> Or are we bound to not breaking existing consumers?
>>
>> I think changing it would be fine.
>
> I agree.

I have added that to my TODO list.

>
> On 07/16/2011 10:37 AM, Dodji Seketeli wrote:
>>  }
>> +
>> +void
>> +linemap_dump_location (struct line_maps *set,
>
> Comment.

Oops, added.

>> +@item -fdebug-cpp
>> +@opindex fdebug-cpp
>
> Please add something to clarify that this is only useful for debugging
> GCC.

Done.

Below is the updated patch, thanks.

This patch adds -fdebug-cpp option. When used with -E this dumps the
relevant macro map before every single token. This clutters the output
a lot but has proved to be invaluable in tracking some bugs during the
development of the virtual location support.

Tested on x86_64-unknown-linux-gnu against trunk.

libcpp/

* include/cpplib.h (struct cpp_options): New struct member.
* include/line-map.h (linemap_dump_location): Declare ...
* line-map.c (linemap_dump_location): ... new function.

gcc/

* doc/cppopts.texi: Document -fdebug-cpp.
* doc/invoke.texi: Add -fdebug-cpp to the list of preprocessor
options.

gcc/c-family/

* c.opt (fdebug-cpp): New option.
* c-opts.c (c_common_handle_option): Handle the option.
* c-ppoutput.c (maybe_print_line_1): New static function. Takes an
output stream in parameter. Factorized from ...
(maybe_print_line): ... this. Dump location debug information when
-fdebug-cpp is in effect.
(print_line_1): New static function. Takes an output stream in
parameter. Factorized from ...
(print_line): ... here. Dump location information when -fdebug-cpp
is in effect.
(scan_translation_unit): Dump location information when
-fdebug-cpp is in effect.
---
 gcc/c-family/c-opts.c |4 +++
 gcc/c-family/c-ppoutput.c |   57 
 gcc/c-family/c.opt|4 +++
 gcc/doc/cppopts.texi  |   13 ++
 gcc/doc/invoke.texi   |2 +-
 libcpp/include/cpplib.h   |4 +++
 libcpp/include/line-map.h |4 +++
 libcpp/line-map.c |   38 ++
 8 files changed, 114 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 3184539..6869d5c 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -628,6 +628,10 @@ c_common_handle_option (size_t scode, const char *arg, int value,
   cpp_opts->preprocessed = value;
   break;
 
+case OPT_fdebug_cpp:
+  cpp_opts->debug = 1;
+  break;
+
 case OPT_ftrack_macro_expansion:
   if (value)
value = 2;
diff --git a/gcc/c-family/c-ppoutput.c b/gcc/c-family/c-ppoutput.c
index b4bc9ce..cb010c5 100644
--- a/gcc/c-family/c-ppoutput.c
+++ b/gcc/c-family/c-ppoutput.c
@@ -59,7 +59,9 @@ static void account_for_newlines (const unsigned char *, size_t);
 static int dump_macro (cpp_reader *, cpp_hashnode *, void *);
 static void dump_queued_macros (cpp_reader *);
 
+static void print_line_1 (source_location, const char*, FILE *);
 static void print_line (source_location, const char *);
+static void maybe_print_line_1 (source_location, FILE *);
 static void maybe_print_line (source_location);
 static void do_line_change (cpp_reader *, const cpp_token *,
source_location, int);
@@ -243,7 +245,12 @@ scan_translation_unit (cpp_reader *pfile)
  in_pragma = false;
}
   else
-   cpp_output_token (token, print.outf);
+   {
+ if (cpp_get_options (parse_in)->debug)
+ linemap_dump_location (line_table, token->src_loc,
+print.outf);
+ cpp_output_token (token, print.outf);
+   }
 
   if (token->type == CPP_COMMENT)
account_for_newlines (token->val.str.text, token->val.str.len);
@@ -297,8 +304,9 @@ scan_translation_unit_trad (cpp_reader *pfile)
 /* If the token read on logical line LINE needs to be output on a
different line to the current one, output the required newlines or
a line marker, and return 1.  Otherwise return 0.  */
+
 static void
-maybe_print_line (source_location src_loc)
+maybe_print_line_1 (source_location src_loc, FILE *stream)
 {
   int src_line = LOCATION_LINE (src_loc);
   const char *src_file = LOCATION_FILE (src_loc);
@@ -306,7 +314,7 @@ maybe_print_line (source_location src_loc)
   /* End the previous line of text.  */
   if (print.printed)
 {
-  putc ('\n', print.outf);
+  putc ('\n', stream);
   print.src_line++;
   print.printed = 0;
 }
@@ -318,22 +326,37 @@ maybe_print_line (source_location src_loc)
 {
   while (src_line > print.src_line)
{
- putc ('\n', print.outf);
+ putc ('\n', stream);
  print.src_line++;
}
 }
   

Re: [PATCH 5/7] Add line map statistics to -fmem-report output

2011-09-16 Thread Dodji Seketeli
> On 07/16/2011 10:37 AM, Dodji Seketeli wrote:
> > +#define ONE_M ONE_K * ONE_K
> 
> Parenthesize this so that users don't need to.

OK.
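
A small sketch of why the parentheses matter. The value of `ONE_K` is an assumption made for illustration (its definition is not quoted in this message), and `ONE_M_BARE`, `ONE_M_PAREN`, `megs_bare` and `megs_paren` are made-up names:

```c
/* Assumed value, for illustration only.  */
#define ONE_K 1024
#define ONE_M_BARE ONE_K * ONE_K	/* Unparenthesized form.  */
#define ONE_M_PAREN (ONE_K * ONE_K)	/* Parenthesized form.  */

/* Expands to BYTES / 1024 * 1024, which divides and then multiplies
   back instead of dividing by one megabyte.  */
long
megs_bare (long bytes)
{
  return bytes / ONE_M_BARE;
}

/* Expands to BYTES / (1024 * 1024), as intended.  */
long
megs_paren (long bytes)
{
  return bytes / ONE_M_PAREN;
}
```

For an input of 4194304 bytes, the first function gives back 4194304 while the second yields 4.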

> 
> > +  macro_maps_used_size =
> > +LINEMAPS_MACRO_USED (set) * sizeof (struct line_map)
> > ++ macro_maps_locations_size;
> 
> It seems odd to add in the locations size here since it's also printed
> separately.

I wanted macro_maps_used_size to really reflect the total used size
for macro maps, without having to mentally do the addition of its two
components.  But at the same time, I was interested in seeing how much
memory the locations were taking inside the macro map memory, as I
suspected them to take a lot of memory.  It turned out I could
gain much more by optimizing things elsewhere.

> 
> > +  fprintf (stderr, "Total allocated maps size:   %5lu%c\n",
> > +  SCALE (s.total_allocated_map_size),
> > +  STAT_LABEL (s.total_allocated_map_size));
> > +  fprintf (stderr, "Total used maps size:%5lu%c\n",
> > +  SCALE (s.total_used_map_size),
> > +  STAT_LABEL (s.total_used_map_size));
> > +  fprintf (stderr, "Ordinary map used size:  %5lu%c\n",
> > +  SCALE (s.ordinary_maps_used_size),
> > +  STAT_LABEL (s.ordinary_maps_used_size));
> > +  fprintf (stderr, "Macro maps used size:%5lu%c\n",
> > +  SCALE (s.macro_maps_used_size),
> > +  STAT_LABEL (s.macro_maps_used_size));
> > +  fprintf (stderr, "Number of ordinary maps allocated:   %5lu%c\n",
> > +  SCALE (s.num_ordinary_maps_allocated),
> > +  STAT_LABEL (s.num_ordinary_maps_allocated));
> > +  fprintf (stderr, "Number of ordinary maps used:%5lu%c\n",
> > +  SCALE (s.num_ordinary_maps_used),
> > +  STAT_LABEL (s.num_ordinary_maps_used));
> > +  fprintf (stderr, "Number of macro maps used:   %5lu%c\n",
> > +  SCALE (s.num_macro_maps_used),
> > +  STAT_LABEL (s.num_macro_maps_used));
> > +  fprintf (stderr, "Ordinary maps allocated size:%5lu%c\n",
> > +  SCALE (s.ordinary_maps_allocated_size),
> > +  STAT_LABEL (s.ordinary_maps_allocated_size));
> > +  fprintf (stderr, "Macro maps locations size:   %5lu%c\n",
> > +  SCALE (s.macro_maps_locations_size),
> > +  STAT_LABEL (s.macro_maps_locations_size));
> > +  fprintf (stderr, "Duplicated maps locations size:  %5lu%c\n",
> > +  SCALE (s.duplicated_macro_maps_locations_size),
> > +  STAT_LABEL (s.duplicated_macro_maps_locations_size));
> 
> This seems oddly sorted.

I am not sure what you mean exactly, but in the patch below I tried to
actually sort them this time, as opposed to just adding things as I
needed them, in no particular order.  :-) Please tell me if you prefer
any particular order.

> And why the difference between ordinary and macro maps in terms of
> what is printed?

It's related to the difference of memory layout of ordinary and macro
maps.  I wanted to understand how each component of a macro map
impacts the overall size taken by the macro maps, and where/how I
could gain by working on the macro map encoding.

For macro maps, the memory is allocated in two parts.  First the array
line_maps::info_macro::maps, and then, for each map, there is memory
allocated for the locations it holds.  Then, because of the way the
mapping encoding is done, there can be times where the two locations
of an entry of a macro map are the same.  The space wasted by this
redundancy is what I tried to quantify with
duplicated_macro_maps_locations_size.  I wanted to quantify the memory
consumed by each of these components to see how much memory I could
save by changing the way the macro maps were organized.  E.g., at one
iteration, I realized that it really was the number of macro maps that
was hurting, independently of how each map was encoded.  So we could
work on reducing that.  Then we realized that we were allocating too
much memory, just for line_maps::info_macro::maps alone.  Hence the
memory allocator patch, etc...

The memory of ordinary maps on the other hand is allocated in a much
simpler linear way.  Just the size of line_maps::info_ordinary::maps
tells the story.

> 
> > +/* Counters defined in libcpp's macro.c.  */
> > +extern unsigned num_expanded_macros_counter;
> > +extern unsigned num_macro_tokens_counter;
> 
> These should be part of struct linemap_stats.

Done, and updated input.c accordingly.

Thanks.

Bootstrapped and tested it on x86_64-unknown-linux-gnu against a tree
based on trunk and containing the previous patches of the set.

This patch adds statistics about line maps' memory consumption and macro
expansion to the output of -fmem-report.  It has been useful in trying
to reduce the memory consumption of the macro maps support.

Tested on x86_64-unknown-linux-gnu against trunk.

gcc/
* input.c (ONE_K, ONE_M, SCALE, STAT_LABEL, FORMAT_AMOUNT): New
macros.
(num_expand

Re: [Patch, gcc, testsuite] Use long enums for case foldconst-3.c for target ARM EABI.

2011-09-16 Thread Richard Earnshaw
On 11/09/11 02:39, Terry Guo wrote:
> Hello,
> 
> This patch aims to disable short enums for arm eabi otherwise the case will
> fail to be compiled due to "width of 'code' exceeds its type". Is it OK to
> trunk?
> 
> BR,
> Terry
> 
> 2011-09-09  Terry Guo  
> 
> * gcc.dg/tree-ssa/foldconst-3.c: Use -fno-short-enums 
> for ARM EABI target.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> index 6132362..e030f53 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized -fno-short-enums" { target arm_eabi } } */
>  typedef const union tree_node *const_tree;
>  typedef struct
>  {
> 
>

This is a compile-only test, and -fno-short-enums is a global option
that all targets support, so I'd suggest just adding -fno-short-enums to
the standard list of options.  It won't hurt those platforms where that
is already the default and it will fix the testcase problem for all
those platforms where it isn't.
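
A minimal sketch of the failure mode, under the assumption that the test case declares a 16-bit bit-field of enum type named `code` (the quoted error message suggests as much). The names `enum toy_code`, `struct toy_node` and `toy_code_bits` are made up, and enum-typed bit-fields are a GCC/Clang-accepted extension to ISO C:

```c
/* Two-value enum in the spirit of the test case; names are made up.  */
enum toy_code
{
  TOY_A,
  TOY_B
};

struct toy_node
{
  /* With -fshort-enums the enum may shrink to a single byte, and a
     16-bit field is then wider than its type.  With int-sized enums
     the declaration is accepted.  */
  enum toy_code code : 16;
};

/* Width in bits of the enum type as compiled, assuming 8-bit bytes.  */
unsigned
toy_code_bits (void)
{
  return (unsigned) sizeof (enum toy_code) * 8;
}
```

Building this with and without -fshort-enums shows the difference: the first is expected to be rejected with the "width of 'code' exceeds its type" diagnostic, the second to compile.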

R.




Re: do not copy DEBUG_EXPRs in copy_insn

2011-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2011 at 04:04:54AM -0300, Alexandre Oliva wrote:
> from  Alexandre Oliva  
> 
>   * emit-rtl.c (copy_insn_1): Do not copy DEBUG_EXPRs.

This is ok.

Jakub


Re: [Ada] fix potential memory corruption in annotated value cache

2011-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2011 at 04:02:32AM -0300, Alexandre Oliva wrote:
> -  struct tree_int_map in;
> +  struct tree_int_map **h;
> +
>if (!annotate_value_cache)
>  annotate_value_cache = htab_create_ggc (512, tree_int_map_hash,
>   tree_int_map_eq, 0);
>in.base.from = gnu_size;
>h = (struct tree_int_map **)
> - htab_find_slot (annotate_value_cache, &in, INSERT);
> + htab_find_slot (annotate_value_cache, &in, NO_INSERT);

I wonder why don't you use htab_find instead here.

> -  if (*h)
> +  if (h)
>   return (Node_Ref_Or_Val) (*h)->to;
>  }

Jakub


RE: [Patch, gcc, testsuite] Use long enums for case foldconst-3.c for target ARM EABI.

2011-09-16 Thread Terry Guo
> This is a compile-only test, and -fno-short-enums is a global option
> that all targets support, so I'd suggest just adding -fno-short-enums
> to
> the standard list of options.  It won't hurt those platforms where that
> is already the default and it will fix the testcase problem for all
> those platforms where it isn't.
> 
> R.

Agree. Here is the updated one.

BR,
Terry

2011-09-16  Terry Guo  

* gcc.dg/tree-ssa/foldconst-3.c: Don't use short enums.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
b/gcc/testsuite/gcc.dg/tree-ssa/foldcons
index 6132362..9f10886 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized -fno-short-enums" } */
 typedef const union tree_node *const_tree;
 typedef struct
 {





Re: [Patch, gcc, testsuite] Use long enums for case foldconst-3.c for target ARM EABI.

2011-09-16 Thread Richard Earnshaw
On 16/09/11 09:28, Terry Guo wrote:
>> This is a compile-only test, and -fno-short-enums is a global option
>> that all targets support, so I'd suggest just adding -fno-short-enums
>> to
>> the standard list of options.  It won't hurt those platforms where that
>> is already the default and it will fix the testcase problem for all
>> those platforms where it isn't.
>>
>> R.
> 
> Agree. Here is the updated one.
> 
> BR,
> Terry
> 
> 2011-09-16  Terry Guo  
> 
> * gcc.dg/tree-ssa/foldconst-3.c: Don't use short enums.
> 
> 

OK.

R.



Re: [PATCH 6/7] Kill pedantic warnings on system headers macros

2011-09-16 Thread Dodji Seketeli
Jason Merrill  writes:

> On 07/16/2011 10:37 AM, Dodji Seketeli wrote:
> > +  location_t here = c_parser_peek_token (parser)->location;
> 
> Perhaps "first_token_loc"?

OK, changed.

> 
> > +   SYNTAX_ERROR2_AT (prev_virtual_location,
> > + "missing binary operator before token \"%s\"",
> > + cpp_token_as_text (pfile, op.token));
> 
> It seems to me that the "missing X before" errors should point to the
> current token, not the previous one.  So you can drop
> prev_virtual_location.

OK, dropped.

I have bootstrapped and tested it on x86_64-unknown-linux-gnu against a
tree based on trunk and containing the previous patches of the set.

This patch leverages the virtual location infrastructure to avoid
emitting pedantic warnings related to macros defined in system headers
but expanded in normal TUs.

The point is to make diagnostic routines use virtual locations of
tokens instead of their spelling locations.  The diagnostic routines
in turn indirectly use linemap_location_in_system_header_p to know if
a given virtual location originated from a system header.

The patch has two main parts.

The libcpp part makes diagnostic routines called from the preprocessor
expression parsing and number conversion code use virtual locations.

The C FE part makes diagnostic routines called from the type
specifiers validation code use virtual locations.

This fixes the relevant examples presented in the comments of the bug
but I guess, as usual, libcpp and the FEs will need on-going care to
use more and more virtual locations of tokens instead of spelling
locations.

The combination of the patch and the previous ones bootstrapped with
--enable-languages=all,ada and passed regression tests on
x86_64-unknown-linux-gnu.

libcpp/

* include/cpplib.h (cpp_classify_number): Add a location parameter
to the declaration.
* expr.c (SYNTAX_ERROR_AT, SYNTAX_ERROR2_AT): New macros to emit
syntax error using a virtual location.
(cpp_classify_number): Add a virtual location parameter.  Use
SYNTAX_ERROR_AT instead of SYNTAX_ERROR, cpp_error_with_line
instead of cpp_error and cpp_warning_with_line instead of
cpp_warning.  Pass the new virtual location parameter to those
diagnostic routines.
(eval_token): Add a virtual location parameter.  Pass it down to
cpp_classify_number.  Use cpp_error_with_line instead of
cpp_error, cpp_warning_with_line instead of cpp_warning, and pass
the new virtual location parameter to these.
(_cpp_parse_expr): Use cpp_get_token_with_location instead of
cpp_get_token, to get the virtual location of the token. Use
SYNTAX_ERROR2_AT instead of SYNTAX_ERROR2, cpp_error_with_line
instead of cpp_error. Use the virtual location instead of the
spelling location.
* macro.c (maybe_adjust_loc_for_trad_cpp): Define new static
function.
(cpp_get_token_with_location): Use it.

gcc/c-family

* c-lex.c (c_lex_with_flags): Adjust to pass the virtual location
to cpp_classify_number.

gcc/

* c-tree.h (finish_declspecs): Add a virtual location parameter.
* c-decl.c (finish_declspecs): Add a virtual location parameter.
Use error_at instead of error and pass down the virtual location
to pewarn and error_at.
(declspecs_add_type): Use in_system_header_at instead of
in_system_header.
* c-parser.c (c_parser_declaration_or_fndef): Pass virtual
location of the relevant token to finish_declspecs.
(c_parser_struct_declaration, c_parser_parameter_declaration):
Likewise.
(c_parser_type_name): Likewise.

gcc/testsuite/

* gcc.dg/cpp/syshdr3.h: New test header.
* gcc.dg/cpp/syshdr3.c: New test file.
* gcc.dg/nofixed-point-2.c: Adjust to more precise location.
---
 gcc/c-decl.c   |   17 ++--
 gcc/c-family/c-lex.c   |4 +-
 gcc/c-parser.c |   12 ++-
 gcc/c-tree.h   |2 +-
 gcc/testsuite/gcc.dg/cpp/syshdr3.c |   16 +++
 gcc/testsuite/gcc.dg/cpp/syshdr3.h |7 ++
 gcc/testsuite/gcc.dg/nofixed-point-2.c |6 +-
 libcpp/expr.c  |  173 +++-
 libcpp/include/cpplib.h|3 +-
 9 files changed, 149 insertions(+), 91 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.c
 create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.h

diff --git a/gcc/c-decl.c b/gcc/c-decl.c
index 5d4564a..f139abc 100644
--- a/gcc/c-decl.c
+++ b/gcc/c-decl.c
@@ -8983,7 +8983,7 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
  break;
case RID_COMPLEX:
  dupe = specs->complex_p;
- if (!flag_isoc99 && !in_system_header)
+ if (!flag_isoc99 && !in_system_header_at (loc))
   

[ARM] pass "--be8" to linker when linking for M profile

2011-09-16 Thread Bin Cheng
Hi,
Attached is the second version of the patch, with the changes mentioned previously.

Is it ok?

Thanks-chengbin

2011-09-16  Cheng Bin 

* config/arm/bpabi.h (BE8_LINK_SPEC): Add cortex-m arch and
processors.


> -Original Message-
> From: Richard Earnshaw
> Sent: Thursday, September 15, 2011 6:46 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [ARM] pass "--be8" to linker when linking for M profile
> 
> On 15/09/11 03:41, Bin Cheng wrote:
> > Hi,
> > The linker should do endian swizzling at link-time according to "--be8"
> > option.
> > This patch modifies BE8_LINK_SPEC by adding cortex-m processors in the
> > specs string.
> >
> > Since R-profile supports configurable big-endian instruction fetch, I
> > didn't include it here.
> >
> > Is it ok? Thanks.
> >
> > 2011-09-15  Cheng Bin 
> > * config/arm/bpabi.h (BE8_LINK_SPEC): add cortex-m arch and
> > processors.
> >
> > Thanks-chengbin
> >
> >
> > gcc-be8-for-m-profile.patch
> >
> >
> 
> +#define BE8_LINK_SPEC  \
> +  " %{mbig-endian:%{march=armv7-a|mcpu=cortex-a5 \
> +   |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15 \
> +   |march=armv7-m|march=armv7e-m|mcpu=cortex-m3|mcpu=cortex-m4 \
> +   |march=armv6-m|mcpu=cortex-m0:%{!r:--be8}}}"
> 
> 
> Please sort this so that the list is ordered alphabetically by
> architecture/cpu (with architectures first).
> 
> It might save some patch churn in the future if each element was put on
> a line on its own.
> 
> OK with that change.
> 
> R.

gcc-be8-for-m-profile-20110916.patch
Description: Binary data


[RFC PATCH] Improve -mavx{,2} vector extraction

2011-09-16 Thread Jakub Jelinek
Hi!

I've noticed vector extraction generates terrible code
with -mavx{,2} when extracting something from 32-byte vectors.
Everything is forced into memory, then loaded back.

For extraction of first lane I believe we can just use
standard 128-bit extraction from the %xmmN register corresponding
to %ymmN register containing the 256-bit vector, if the extraction
is %v prefixed it shouldn't result in any penalty, right?
For the second lane vextract{f,i}128 is used to swap the lanes
first (well, before reload even the first lane is represented as
extraction of the first lane, but post reload this is split
into a subreg).
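
For anyone wanting to look at the generated code, a small sketch using the GCC/Clang generic vector extension. The names `v8sf` (as a local typedef) and `extract_elt` are chosen for illustration; which instructions come out depends on the compiler version and flags, so nothing is claimed here about the resulting assembly:

```c
/* GCC/Clang vector extension: eight floats in one 32-byte vector.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Extract element IDX (0..7).  Elements 0-3 sit in the low 128-bit
   lane, 4-7 in the high one.  The vector is passed by pointer to
   sidestep ABI warnings when AVX is not enabled.  */
float
extract_elt (const v8sf *v, int idx)
{
  return (*v)[idx];
}
```

Compiling with -O2 -mavx and a constant index (for example a wrapper that always passes 5) is one way to inspect how a high-lane extraction is expanded.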

From what I understood, for vectors containing integers instead of
floats AVX2 prefers vextracti128 instead of vextractf128, this patch
teaches those patterns to do that.  Should we do the same for
vinsertf128 patterns with V*[QHSD]Imode 32-byte modes?

The avx2_extracti128 pattern looked like wrong RTL, as to extract
a 2 element vector from 4 element vector it used just one constant
in the parallel instead of two.  I've changed it into a define_expand.

Still, for SSE4.1+ we seem to generate terrible code for
float extraction of elements 1, 2 and 3 (and for 32-byte vectors
also 5, 6 and 7 after the lanes are swapped).  For -mno-sse4
we would be in those cases shuffling the vectors and then doing
vec_extractv4sf_0 which is a nop.  But for SSE4.1 we have:
(define_insn "*sse4_1_extractps"
  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm")
(vec_select:SF
  (match_operand:V4SF 1 "register_operand" "x")
  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")])))]
  "TARGET_SSE4_1"
  "%vextractps\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "type" "sselog")
   (set_attr "prefix_data16" "1")
   (set_attr "prefix_extra" "1")
   (set_attr "length_immediate" "1")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "V4SF")])
which is fine if we want to extract into general register or memory,
but if we want to extract the float into "x" constraint register, this
results in spilling it and loading back immediately.  I wonder if
the above insn shouldn't have an "=rm,x" alternative which would be split
after reload into doing what ix86_expand_vector_extract does in that
case for pre-SSE4.1 - i.e. some vector shuffling and the noop
vec_extractv4sf_0-ish SUBREGing.

Thoughts?

2011-09-16  Jakub Jelinek  

* config/i386/sse.md (vec_extract_hi_,
vec_extract_hi_v16hi, vec_extract_hi_v32qi): Use
vextracti128 instead of vextractf128 for -mavx2 and
integer vectors.  For V4DFmode fix up mode attribute.
(VEC_EXTRACT_MODE): For TARGET_AVX add 32-byte vectors.
(vec_set_lo_, vec_set_hi_): For VI8F_256 modes use V4DF
instead of V8SF mode attribute.
(avx2_extracti128): Change into define_expand.
* config/i386/i386.c (ix86_expand_vector_extract): Handle
32-byte vector modes if TARGET_AVX.

* gcc.target/i386/sse2-extract-1.c: New test.
* gcc.target/i386/avx-extract-1.c: New test.

--- gcc/config/i386/sse.md.jj   2011-09-15 17:36:20.0 +0200
+++ gcc/config/i386/sse.md  2011-09-16 10:51:51.0 +0200
@@ -3863,13 +3863,23 @@ (define_insn "vec_extract_hi_"
  (match_operand:VI8F_256 1 "register_operand" "x,x")
  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else
+   (and (match_test "TARGET_AVX2")
+   (eq (const_string "mode") (const_string "V4DImode")))
+ (const_string "OI")
+ (const_string "V4DF")))])
 
 (define_insn_and_split "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
@@ -3898,13 +3908,23 @@ (define_insn "vec_extract_hi_"
  (parallel [(const_int 4) (const_int 5)
 (const_int 6) (const_int 7)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else
+   (and (match_test "TARGET_AVX2")
+   (eq (const_string "mode") (const_string "V8SImode")))
+ (const_string "OI")
+ (const_string "V8SF")))])
 
 (define_insn_and_split "vec_extract_lo_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m"

[PATCH, 4.6] Backport of the fix for PR 49886 (again)

2011-09-16 Thread Martin Jambor
Hi,

the patch below is a backport of the fix for PR 49886 as it is now in
trunk except that we check for type attributes in ipa-split instead of
in ipa-inline-analysis which does not exist in the 4.6 branch.  The
fix is almost the same as the one I have previously reverted but the
!is_gimple_reg path was added to address the fallout of the first
patch.

I have checked that the testcase from PR 50295 does not trigger with
this patch and added the testcase from PR 50287 (which is essentially
the same bug but the testcase is simpler).  Also bootstrapped and
tested on x86_64-linux.

Since the difference from the previously approved patch is exactly
what has been approved for trunk (and finally seems to work), I will
commit this on Monday unless someone objects.

Thanks,

Martin


2011-09-15  Martin Jambor  

PR middle-end/49886
* ipa-split.c (split_function): Do not change signature if it is
not possible or there are attribute types.

* testsuite/gcc.dg/torture/pr49886.c: Remove XFAILs.
* testsuite/gcc.dg/torture/pr50287.c: New test.

Index: gcc/testsuite/gcc.dg/torture/pr49886.c
===
--- gcc/testsuite/gcc.dg/torture/pr49886.c  (revision 178885)
+++ gcc/testsuite/gcc.dg/torture/pr49886.c  (working copy)
@@ -1,5 +1,4 @@
 /* { dg-do run } */
-/* { dg-xfail-run-if "" { "*-*-*" } { "-O2" "-O3" "-Os" } } */
 
 struct PMC {
 unsigned flags;
Index: gcc/testsuite/gcc.dg/torture/pr50287.c
===
--- gcc/testsuite/gcc.dg/torture/pr50287.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr50287.c  (revision 0)
@@ -0,0 +1,109 @@
+/* { dg-do compile } */
+
+struct PMC {
+unsigned flags;
+};
+
+struct PVC {
+  unsigned flags, other_stuff;
+};
+
+
+typedef struct Pcc_cell
+{
+struct PMC *p;
+long bla;
+long type;
+} Pcc_cell;
+
+int gi;
+int cond;
+
+struct PVC g_pvc;
+
+extern void abort ();
+extern void never_ever(int interp, struct PMC *pmc)
+  __attribute__((noinline,noclone));
+
+void never_ever (int interp, struct PMC *pmc)
+{
+  abort ();
+}
+
+static void mark_cell(int * interp, Pcc_cell *c, struct PVC pvc)
+  __attribute__((__nonnull__(1)));
+
+static void
+mark_cell(int * interp, Pcc_cell *c, struct PVC pvc)
+{
+  if (!cond)
+return;
+
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<8)))
+never_ever(gi + 1, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<7)))
+never_ever(gi + 2, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<6)))
+never_ever(gi + 3, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<5)))
+never_ever(gi + 4, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<4)))
+never_ever(gi + 5, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<3)))
+never_ever(gi + 6, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<2)))
+never_ever(gi + 7, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<1)))
+never_ever(gi + 8, c->p);
+  if (c && c->type == 4 && c->p
+  && !(c->p->flags & (1<<9)))
+never_ever(gi + 9, c->p);
+}
+
+static void
+foo(int * interp, Pcc_cell *c)
+{
+  mark_cell(interp, c, g_pvc);
+}
+
+static struct Pcc_cell *
+__attribute__((noinline,noclone))
+getnull(void)
+{
+  return (struct Pcc_cell *) 0;
+}
+
+
+int main()
+{
+  int i;
+
+  cond = 1;
+  for (i = 0; i < 100; i++)
+foo (&gi, getnull ());
+  return 0;
+}
+
+
+void
+bar_1 (int * interp, Pcc_cell *c)
+{
+  c->bla += 1;
+  mark_cell(interp, c, g_pvc);
+}
+
+void
+bar_2 (int * interp, Pcc_cell *c, struct PVC pvc)
+{
+  c->bla += 2;
+  mark_cell(interp, c, pvc);
+}
+
Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 178885)
+++ gcc/ipa-split.c (working copy)
@@ -946,7 +946,7 @@ split_function (struct split_point *spli
   bitmap args_to_skip = BITMAP_ALLOC (NULL);
   tree parm;
   int num = 0;
-  struct cgraph_node *node;
+  struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
   basic_block return_bb = find_return_bb ();
   basic_block call_bb;
   gimple_stmt_iterator gsi;
@@ -966,17 +966,39 @@ split_function (struct split_point *spli
   dump_split_point (dump_file, split_point);
 }
 
+  if (cur_node->local.can_change_signature
+  && !TYPE_ATTRIBUTES (TREE_TYPE (cur_node->decl)))
+args_to_skip = BITMAP_ALLOC (NULL);
+  else
+args_to_skip = NULL;
+
   /* Collect the parameters of new function and args_to_skip bitmap.  */
   for (parm = DECL_ARGUMENTS (current_function_decl);
parm; parm = DECL_CHAIN (parm), num++)
-if (!is_gimple_reg (parm)
-   || !gimple_default_def (cfun, parm)
-   || !bitmap_bit_p (split_point->ssa_names_to_pass,
- SSA_NAME_VERSION (gimple_default_def (cfun, parm
+ 

[RFC PATCH] Improve V8SFmode and V4DFmode smin/smax reductions

2011-09-16 Thread Jakub Jelinek
Hi!

I've noticed that the code generated for -mavx min/max reductions is
terrible, the following patch is an attempt to improve it.

In fad function (i.e. V4DFmode reduction) the difference with the patch
(plus the patch I've posted today) is:
-   vmovapd %ymm0, -56(%rsp)
-   vmovapd %ymm0, -24(%rsp)
-   vmovsd  -48(%rsp), %xmm2
-   vmovapd %ymm0, -88(%rsp)
-   vmaxsd  -24(%rsp), %xmm2, %xmm1
-   vmovapd %ymm0, -120(%rsp)
-   vmaxsd  -72(%rsp), %xmm1, %xmm1
-   vmaxsd  -96(%rsp), %xmm1, %xmm0
+   vperm2f128  $1, %ymm0, %ymm0, %ymm1
+   vmaxpd  %ymm0, %ymm1, %ymm0
+   vshufpd $1, %ymm0, %ymm0, %ymm1
+   vmaxpd  %ymm1, %ymm0, %ymm0
and in faf (V8SFmode reduction) the difference is:
-   vmovaps %ymm0, 72(%rsp)
-   vmovaps %ymm0, 104(%rsp)
-   vmovss  76(%rsp), %xmm2
-   vmaxss  104(%rsp), %xmm2, %xmm1
-   vmovaps %ymm0, 40(%rsp)
-   vmovaps %ymm0, 8(%rsp)
-   vmovaps %ymm0, -24(%rsp)
-   vmovaps %ymm0, -56(%rsp)
-   vmovaps %ymm0, -88(%rsp)
-   vmovaps %ymm0, -120(%rsp)
-   vmaxss  48(%rsp), %xmm1, %xmm1
-   vmaxss  20(%rsp), %xmm1, %xmm1
-   vmaxss  -8(%rsp), %xmm1, %xmm1
-   vmaxss  -36(%rsp), %xmm1, %xmm1
-   vmaxss  -64(%rsp), %xmm1, %xmm1
-   vmaxss  -92(%rsp), %xmm1, %xmm0
+   vperm2f128  $1, %ymm0, %ymm0, %ymm1
+   vmaxps  %ymm0, %ymm1, %ymm0
+   vshufps $14, %ymm0, %ymm0, %ymm1
+   vmaxps  %ymm0, %ymm1, %ymm0
+   vshufps $1, %ymm0, %ymm0, %ymm1
+   vmaxps  %ymm1, %ymm0, %ymm0

Surprisingly with -mavx2 the integer loops aren't vectorized with
32-byte vectors, wonder why.  But looking at the integer umin/umax/smin/smax
16-byte reductions they generate good code even without reduc_* patterns,
apparently using vector shifts.
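
The V8SFmode sequence in the listing above (vperm2f128 $1, vshufps $14,
vshufps $1, each followed by a max) can be modelled on plain arrays to see
why element 0 ends up holding the maximum of all eight inputs.  The
following is a sketch for illustration only; the function names are invented,
and the shuffle helper implements just the per-lane SHUFPS selection rule.

```c
#include <assert.h>

/* Element-wise maximum of two 8-float arrays.  */
static void
max8_sketch (float dst[8], const float a[8], const float b[8])
{
  int i;
  for (i = 0; i < 8; i++)
    dst[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Per-lane SHUFPS: DST[0..1] come from S1, DST[2..3] from S2, each chosen
   by a 2-bit field of IMM.  Applied to both 128-bit lanes of 8 floats.
   DST must not alias S1 or S2.  */
static void
shufps256_sketch (float dst[8], const float s1[8], const float s2[8],
                  unsigned imm)
{
  int lane;
  for (lane = 0; lane < 8; lane += 4)
    {
      dst[lane + 0] = s1[lane + (imm & 3)];
      dst[lane + 1] = s1[lane + ((imm >> 2) & 3)];
      dst[lane + 2] = s2[lane + ((imm >> 4) & 3)];
      dst[lane + 3] = s2[lane + ((imm >> 6) & 3)];
    }
}

/* Reduce 8 floats to their maximum in three shuffle-and-max steps.  */
static float
reduc_smax_v8sf_sketch (const float in[8])
{
  float t[8], a[8], b[8];
  int i;

  /* Step 1: swap the two lanes (vperm2f128 $1 with both sources equal).  */
  for (i = 0; i < 8; i++)
    t[i] = in[(i + 4) % 8];
  max8_sketch (a, in, t);

  /* Step 2: bring elements 2,3 of each lane down to 0,1 (vshufps $14).  */
  shufps256_sketch (t, a, a, 14);
  max8_sketch (b, a, t);

  /* Step 3: bring element 1 down to 0 (vshufps $1).  */
  shufps256_sketch (t, b, b, 1);
  max8_sketch (a, b, t);

  return a[0];
}
```

Each step halves the number of candidate elements, so three steps cover
eight floats; the V4DFmode variant needs only two.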

2011-09-16  Jakub Jelinek  

* config/i386/i386.c (ix86_expand_reduc_v4sf): Rename to ...
(ix86_expand_reduc): ... this.  Handle also V8SFmode and V4DFmode.
* config/i386/sse.md (reduc_splus_v4sf, reduc_smax_v4sf,
reduc_smin_v4sf): Adjust callers.
(reduc_smax_v8sf, reduc_smin_v8sf, reduc_smax_v4df, reduc_smin_v4df):
New expanders.

* gcc.dg/vect/vect-reduc-10.c: New test.
* gcc.target/i386/avx-reduc-1.c: New test.

--- gcc/config/i386/i386.c.jj   2011-09-15 12:18:50.0 +0200
+++ gcc/config/i386/i386.c  2011-09-16 11:54:27.0 +0200
@@ -32623,24 +32623,45 @@ ix86_expand_vector_extract (bool mmx_ok,
 }
 }
 
-/* Expand a vector reduction on V4SFmode for SSE1.  FN is the binary
-   pattern to reduce; DEST is the destination; IN is the input vector.  */
+/* Expand a vector reduction.  FN is the binary pattern to reduce;
+   DEST is the destination; IN is the input vector.  */
 
 void
-ix86_expand_reduc_v4sf (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
+ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
 {
-  rtx tmp1, tmp2, tmp3;
+  rtx tmp1, tmp2, tmp3, tmp4, tmp5;
+  enum machine_mode mode = GET_MODE (in);
 
-  tmp1 = gen_reg_rtx (V4SFmode);
-  tmp2 = gen_reg_rtx (V4SFmode);
-  tmp3 = gen_reg_rtx (V4SFmode);
+  tmp1 = gen_reg_rtx (mode);
+  tmp2 = gen_reg_rtx (mode);
+  tmp3 = gen_reg_rtx (mode);
 
-  emit_insn (gen_sse_movhlps (tmp1, in, in));
-  emit_insn (fn (tmp2, tmp1, in));
-
-  emit_insn (gen_sse_shufps_v4sf (tmp3, tmp2, tmp2,
- const1_rtx, const1_rtx,
- GEN_INT (1+4), GEN_INT (1+4)));
+  switch (mode)
+{
+case V4SFmode:
+  emit_insn (gen_sse_movhlps (tmp1, in, in));
+  emit_insn (fn (tmp2, tmp1, in));
+  emit_insn (gen_sse_shufps_v4sf (tmp3, tmp2, tmp2,
+ const1_rtx, const1_rtx,
+ GEN_INT (1+4), GEN_INT (1+4)));
+  break;
+case V8SFmode:
+  tmp4 = gen_reg_rtx (mode);
+  tmp5 = gen_reg_rtx (mode);
+  emit_insn (gen_avx_vperm2f128v8sf3 (tmp4, in, in, const1_rtx));
+  emit_insn (fn (tmp5, tmp4, in));
+  emit_insn (gen_avx_shufps256 (tmp1, tmp5, tmp5, GEN_INT (2+12)));
+  emit_insn (fn (tmp2, tmp1, tmp5));
+  emit_insn (gen_avx_shufps256 (tmp3, tmp2, tmp2, const1_rtx));
+  break;
+case V4DFmode:
+  emit_insn (gen_avx_vperm2f128v4df3 (tmp1, in, in, const1_rtx));
+  emit_insn (fn (tmp2, tmp1, in));
+  emit_insn (gen_avx_shufpd256 (tmp3, tmp2, tmp2, const1_rtx));
+  break;
+default:
+  gcc_unreachable ();
+}
   emit_insn (fn (dest, tmp2, tmp3));
 }
 
--- gcc/config/i386/sse.md.jj   2011-09-08 11:21:09.0 +0200
+++ gcc/config/i386/sse.md  2011-09-16 10:51:51.0 +0200
@@ -1253,7 +1253,7 @@ (define_expand "reduc_splus_v4sf"
   emit_insn (gen_sse3_haddv4sf3 (operands[0], tmp, tmp));
 }
   else
-ix86_expand_reduc_v4sf (gen_addv4sf3, operands[0], operands[1]);
+ix86_expand_reduc (gen_addv4sf3, operands[0], operands[1]);
   DONE;
 })
 
@@ -1263,7 +1263,7 @@ (define_expand "reduc_smax_v4sf"
(

[PATCH] Do not store/stream binfos in jump functions

2011-09-16 Thread Martin Jambor
Hi,

this patch is basically a followup to
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00398.html

The problem is that BINFOs coming from different compilation units and
which are streamed into jump-functions do not undergo any unification
in LTO and we have the same value represented by different BINFO
structures (and compare them by comparing pointers to them).

The solution is to do the BINFO lookup from the type after the type
unification system has chosen the prevailing type (and fixed all
references to it).  This means that in the known type jump functions
we need to store the parameters for get_binfo_at_offset instead of its
result and call the function only when actually using the value of the
function at the IPA decision making stage.
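
The idea of storing the lookup inputs rather than the looked-up BINFO can
be sketched with a toy model.  Everything here is hypothetical (the struct
names, the `prevailing` field, the fixup helper) and much simpler than the
real LTO type merging; in particular the offset and component type that the
patch also records are left out.

```c
#include <assert.h>
#include <stddef.h>

struct binfo_sketch { int dummy; };

struct type_sketch
{
  struct binfo_sketch binfo;        /* stand-in for TYPE_BINFO */
  struct type_sketch *prevailing;   /* set by unification; NULL means self */
};

/* A known-type jump function that records the type, not the BINFO.  */
struct known_type_jf_sketch
{
  struct type_sketch *base_type;
};

/* Unification step: redirect the reference to the prevailing type.  */
static void
fixup_jf_sketch (struct known_type_jf_sketch *jf)
{
  assert (jf != NULL && jf->base_type != NULL);
  if (jf->base_type->prevailing)
    jf->base_type = jf->base_type->prevailing;
}

/* Look the BINFO up only when the value is actually needed.  */
static const struct binfo_sketch *
value_from_known_type_sketch (const struct known_type_jf_sketch *jf)
{
  return &jf->base_type->binfo;
}
```

Before the fixup, two jump functions describing the same type from
different units yield different pointers; afterwards a pointer comparison
of the looked-up values is meaningful.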

I have bootstrapped and tested the patch on x86_64-linux and have also
successfully LTO-built 483.xalancbmk and now I'm in the process of
LTO-building Firefox with it.  OK for trunk?

Thanks,

Martin


2011-09-15  Martin Jambor  

* ipa-prop.h (jump_func_type): Updated comments.
(ipa_known_type_data): New type.
(ipa_jump_func): Use it to describe known type jump functions.
* ipa-prop.c (ipa_print_node_jump_functions_for_edge): Updated to
reflect the new known type jump function contents.
(compute_known_type_jump_func): Likewise.
(combine_known_type_and_ancestor_jfs): Likewise.
(try_make_edge_direct_virtual_call): Likewise.
(ipa_write_jump_function): Likewise.
(ipa_read_jump_function): Likewise.
* ipa-cp.c (ipa_value_from_known_type_jfunc): New function.
(ipa_value_from_jfunc): Use ipa_value_from_known_type_jfunc.
(propagate_accross_jump_function): Likewise.

Index: src/gcc/ipa-cp.c
===
--- src.orig/gcc/ipa-cp.c
+++ src/gcc/ipa-cp.c
@@ -674,6 +674,19 @@ ipa_get_jf_ancestor_result (struct ipa_j
 return NULL_TREE;
 }
 
+/* Extract the actual BINFO being described by JFUNC which must be a known type
+   jump function.  */
+
+static tree
+ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
+{
+  tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type);
+  gcc_checking_assert (base_binfo);
+  return get_binfo_at_offset (base_binfo,
+ jfunc->value.known_type.offset,
+ jfunc->value.known_type.component_type);
+}
+
 /* Determine whether JFUNC evaluates to a known value (that is either a
constant or a binfo) and if so, return it.  Otherwise return NULL. INFO
describes the caller node so that pass-through jump functions can be
@@ -685,7 +698,7 @@ ipa_value_from_jfunc (struct ipa_node_pa
   if (jfunc->type == IPA_JF_CONST)
 return jfunc->value.constant;
   else if (jfunc->type == IPA_JF_KNOWN_TYPE)
-return jfunc->value.base_binfo;
+return ipa_value_from_known_type_jfunc (jfunc);
   else if (jfunc->type == IPA_JF_PASS_THROUGH
   || jfunc->type == IPA_JF_ANCESTOR)
 {
@@ -991,7 +1004,11 @@ propagate_accross_jump_function (struct
   tree val;
 
   if (jfunc->type == IPA_JF_KNOWN_TYPE)
-   val = jfunc->value.base_binfo;
+   {
+ val = ipa_value_from_known_type_jfunc (jfunc);
+ if (!val)
+   return set_lattice_contains_variable (dest_lat);
+   }
   else
val = jfunc->value.constant;
   return add_value_to_lattice (dest_lat, val, cs, NULL, 0);
Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -164,10 +164,12 @@ ipa_print_node_jump_functions_for_edge (
fprintf (f, "UNKNOWN\n");
   else if (type == IPA_JF_KNOWN_TYPE)
{
- tree binfo_type = TREE_TYPE (jump_func->value.base_binfo);
- fprintf (f, "KNOWN TYPE, type in binfo is: ");
- print_generic_expr (f, binfo_type, 0);
- fprintf (f, " (%u)\n", TYPE_UID (binfo_type));
+ fprintf (f, "KNOWN TYPE: base  ");
+ print_generic_expr (f, jump_func->value.known_type.base_type, 0);
+ fprintf (f, ", offset "HOST_WIDE_INT_PRINT_DEC", component ",
+  jump_func->value.known_type.offset);
+ print_generic_expr (f, jump_func->value.known_type.component_type, 0);
+ fprintf (f, "\n");
}
   else if (type == IPA_JF_CONST)
{
@@ -638,7 +640,7 @@ compute_known_type_jump_func (tree op, s
  gimple call)
 {
   HOST_WIDE_INT offset, size, max_size;
-  tree base, binfo;
+  tree base;
 
   if (!flag_devirtualize
   || TREE_CODE (op) != ADDR_EXPR
@@ -654,18 +656,14 @@ compute_known_type_jump_func (tree op, s
   || is_global_var (base))
 return;
 
-  if (detect_type_change (op, base, call, jfunc, offset))
+  if (detect_type_change (op, base, call, jfunc, offset)
+  || !TYPE_BINFO (TREE_TYPE (base)))
 return;
 
-  binfo = TYPE_BINFO (TREE_TYPE (base));
-  if (!binfo)
-ret

[PATCH, libiberty] correct md5_process_bytes with unaligned pointers

2011-09-16 Thread Pierre Vittet
Hello,

The patch is the result of the following threads:

Here is a patch correcting md5_process_bytes in the case of unaligned
pointers.  A pair of braces was missing, causing the buffer to be
advanced twice and losing part of its content.
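
To make the effect of the missing braces concrete, here is a small model of
the unaligned path that only counts bytes.  It is a sketch under the
assumption that the pre-patch code behaved as the diff below suggests; the
struct and function names are invented and no hashing is done.

```c
#include <assert.h>
#include <stddef.h>

struct md5_walk_sketch
{
  size_t processed;  /* bytes handed to the block function */
  size_t skipped;    /* bytes stepped over without being processed */
  size_t remaining;  /* bytes left for the internal buffer */
};

static struct md5_walk_sketch
walk_unaligned_sketch (size_t len, int with_braces)
{
  struct md5_walk_sketch w = { 0, 0, 0 };

  if (len > 64)
    {
      /* Unaligned path: one 64-byte block at a time.  */
      while (len > 64)
        {
          w.processed += 64;
          len -= 64;
        }
      if (!with_braces)
        {
          /* Pre-patch, only the md5_process_block call belonged to the
             else branch; the pointer advance and the mask ran here too.  */
          w.skipped += len & ~(size_t) 63;
          len &= 63;
        }
    }
  w.remaining = len;
  return w;
}
```

In this model the loss shows up when the length is a multiple of 64: the
loop stops with exactly 64 bytes left, and the unbraced statements then
step over them.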


The patch also removes a preprocessor #if testing whether
_STRING_ARCH_unaligned is defined.  This symbol is never defined in gcc
and could only be set through CFLAGS.  Looking at the code, it does not
look useful to define it (it is only tested in libiberty/md5.c and
libiberty/sha1.c), as we already check the pointer alignment, so
removing it cleans up the code.  I searched on Google, and it does not seem
to be used.  Does anyone want it or think that it should not be removed?

Ok for trunk ?

Thanks!

Pierre Vittet

PS: I also wrote a small gcc plugin that makes it easy to test
md5_process_bytes; if you can change your environment so that the
buffer pointer is not aligned, you should hit the bug.


Index: libiberty/md5.c
===
--- libiberty/md5.c	(révision 178905)
+++ libiberty/md5.c	(copie de travail)
@@ -227,7 +227,6 @@ md5_process_bytes (const void *buffer, size_t len,
   /* Process available complete blocks.  */
   if (len > 64)
 {
-#if !_STRING_ARCH_unaligned
 /* To check alignment gcc has an appropriate operator.  Other
compilers don't.  */
 # if __GNUC__ >= 2
@@ -244,10 +243,11 @@ md5_process_bytes (const void *buffer, size_t len,
 len -= 64;
   }
   else
-#endif
-  md5_process_block (buffer, len & ~63, ctx);
-  buffer = (const void *) ((const char *) buffer + (len & ~63));
-  len &= 63;
+	{
+	  md5_process_block (buffer, len & ~63, ctx);
+	  buffer = (const void *) ((const char *) buffer + (len & ~63));
+	  len &= 63;
+	}
 }
 
   /* Move remaining bytes in internal buffer.  */
2011-09-16  Pierre Vittet  

* md5.c (md5_process_bytes): Remove unused _STRING_ARCH_unaligned, add
missing braces.



micro_plugin_md5.tar.gz
Description: GNU Zip compressed data


Re: [PATCH] Do not store/stream binfos in jump functions

2011-09-16 Thread Jan Hubicka
> 
>   * ipa-prop.h (jump_func_type): Updated comments.
>   (ipa_known_type_data): New type.
>   (ipa_jump_func): Use it to describe known type jump functions.
>   * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Updated to
>   reflect the new known type jump function contents.
>   (compute_known_type_jump_func): Likewise.
>   (combine_known_type_and_ancestor_jfs): Likewise.
>   (try_make_edge_direct_virtual_call): Likewise.
>   (ipa_write_jump_function): Likewise.
>   (ipa_read_jump_function): Likewise.
>   * ipa-cp.c (ipa_value_from_known_type_jfunc): New function.
>   (ipa_value_from_jfunc): Use ipa_value_from_known_type_jfunc.
>   (propagate_accross_jump_function): Likewise.

OK. If we saved just one pointer to the actual type (i.e. not BINFO), would it
be any saner?

Honza


partial fix for PR lto/50430

2011-09-16 Thread Jan Hubicka
Hi,
this patch fixes an ICE seen when compiling libreoffice.  Sadly I didn't get any
testcase since the libreoffice one doesn't reproduce with -r.

The problem is an external vtable whose constructor is not streamed, because we
stream only the constructors needed and we do not take external vars into account.

This patch makes the ICE go away, turning 50430 into a missed optimization.
We obviously should do something for programs with errors in them, so the patch
makes sense even after we fix the streaming issue.
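
The shape of the check is the usual "missing or marked as erroneous"
guard.  A toy version, with invented names standing in for tree,
error_mark_node and DECL_INITIAL, might look like this:

```c
#include <assert.h>
#include <stddef.h>

struct tree_sketch { int code; };

/* Stand-in for error_mark_node: a unique non-NULL sentinel.  */
static struct tree_sketch error_mark_sketch;

/* Return INIT if it is usable, or NULL when it is missing or is the
   error sentinel, so that callers give up instead of dereferencing it.  */
static struct tree_sketch *
usable_initializer_sketch (struct tree_sketch *init)
{
  if (!init || init == &error_mark_sketch)
    return NULL;
  return init;
}
```

Testing only for NULL would let the sentinel through, which is the case
the extra comparison in the patch covers.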

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR lto/50430
* gimple-fold.c (gimple_get_virt_method_for_binfo): Do not ICE on
error_mark_node in the DECL_INITIAL of vtable.

Index: gimple-fold.c
===
--- gimple-fold.c   (revision 178757)
+++ gimple-fold.c   (working copy)
@@ -3048,7 +3048,8 @@ gimple_get_virt_method_for_binfo (HOST_W
 
   if (TREE_CODE (v) != VAR_DECL
   || !DECL_VIRTUAL_P (v)
-  || !DECL_INITIAL (v))
+  || !DECL_INITIAL (v)
+  || DECL_INITIAL (v) == error_mark_node)
 return NULL_TREE;
   gcc_checking_assert (TREE_CODE (TREE_TYPE (v)) == ARRAY_TYPE);
   size = tree_low_cst (TYPE_SIZE (TREE_TYPE (TREE_TYPE (v))), 1);


Re: inline-analysis improvement

2011-09-16 Thread Eric Botcazou
>   * ipa-inline-analysis.c (add_condition): Add conditions parameter;
>   simplify obviously true clauses.
>   (and_predicates, or_predicates): Add conditions parameter.
>   (inline_duplication_hoook): Update.
>   (mark_modified): New function.
>   (unmodified_parm): New function.
>   (eliminated_by_inlining_prob, (set_cond_stmt_execution_predicate,
>   set_switch_stmt_execution_predicate, will_be_nonconstant_predicate):
>   Use unmodified_parm.
>   (estimate_function_body_sizes): Update.
>   (remap_predicate): Update.

This breaks things in Ada:

Program received signal SIGSEGV, Segmentation fault.
walk_aliased_vdefs_1 (ref=0xcbf4, vdef=0x0,
walker=0x873e3e0 , data=0xcc1f, visited=0xcbc0,
cnt=0) at /home/eric/svn/gcc/gcc/tree-ssa-alias.c:1996
1996  gimple def_stmt = SSA_NAME_DEF_STMT (vdef);
(gdb) bt
#0  walk_aliased_vdefs_1 (ref=0xcbf4, vdef=0x0,
walker=0x873e3e0 , data=0xcc1f, visited=0xcbc0,
cnt=0) at /home/eric/svn/gcc/gcc/tree-ssa-alias.c:1996
#1  0x089c3a6d in walk_aliased_vdefs (ref=0xcbf4, vdef=0x0,
walker=0x873e3e0 , data=0xcc1f, visited=0xcbc0)
at /home/eric/svn/gcc/gcc/tree-ssa-alias.c:2037
#2  0x087456b5 in unmodified_parm (stmt=0xf7cf6ab0, op=0xf7cf77e0)
at /home/eric/svn/gcc/gcc/ipa-inline-analysis.c:1104
#3  0x08748a99 in eliminated_by_inlining_prob (stmt=)
at /home/eric/svn/gcc/gcc/ipa-inline-analysis.c:1165

Testcase attached, compile it at -O on x86/x86-64.  It can also be directly 
installed as gnat.dg/opt19.adb in the testsuite.

-- 
Eric Botcazou
-- { dg-do compile }
-- { dg-options "-O" }

procedure Opt19 is

  type Enum is (One, Two);

  type Vector_T is array (Enum) of Integer;

  Zero_Vector : constant Vector_T := (Enum => 0);

  type T is record
Vector : Vector_T;
  end record;

  procedure Nested (Value : in out T; E : Enum; B : out Boolean) is
I : Integer renames Value.Vector(E);
  begin
B := I /= 0;
  end;

  Obj : T := (Vector => Zero_Vector);
  B : Boolean;

begin
  Nested (Obj, One, B);
end;


Ping^3: PR 50113/50061: Fix ABI breakage from emit_library_call_value_1 patch

2011-09-16 Thread Richard Sandiford
Ping for this patch to emit_library_call_value_1:

http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00735.html

which fixes a bootstrap failure on MIPS since:

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02341.html

Tested on mips64-linux-gnu, mips-sgi-irix6.5 (by Rainer) and
on both big and little-endian ARM (by Julian).

Richard


Re: [RFC PATCH] Improve -mavx{,2} vector extraction

2011-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2011 at 11:16:44AM +0200, Jakub Jelinek wrote:
> The avx2_extracti128 pattern looked like wrong RTL, as to extract
> a 2 element vector from 4 element vector it used just one constant
> in the parallel instead of two.  I've changed it into a define_expand.

Actually there were two further issues with avx2_extracti128, one introduced
by my change (pasto in switch control expression), one preexisting
(no idea why it didn't fail before) - in vextracti128 the source operand
has to be register and destination operand has to be register or memory,
while the predicates were incorrectly swapped.

So here is a version that has been (together with the smin/smax patch)
bootstrapped/regtested on x86_64-linux and i686-linux and additionally
tested with RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp vect.exp'
on AVX capable hw.

Sorry for the screw-up.

2011-09-16  Jakub Jelinek  

* config/i386/sse.md (vec_extract_hi_,
vec_extract_hi_v16hi, vec_extract_hi_v32qi): Use
vextracti128 instead of vextractf128 for -mavx2 and
integer vectors.  For V4DFmode fix up mode attribute.
(VEC_EXTRACT_MODE): For TARGET_AVX add 32-byte vectors.
(vec_set_lo_, vec_set_hi_): For VI8F_256 modes use V4DF
instead of V8SF mode attribute.
(avx2_extracti128): Change into define_expand.
* config/i386/i386.c (ix86_expand_vector_extract): Handle
32-byte vector modes if TARGET_AVX.

* gcc.target/i386/sse2-extract-1.c: New test.
* gcc.target/i386/avx-extract-1.c: New test.

--- gcc/config/i386/sse.md.jj   2011-09-15 17:36:20.0 +0200
+++ gcc/config/i386/sse.md  2011-09-16 10:51:51.0 +0200
@@ -3863,13 +3863,23 @@ (define_insn "vec_extract_hi_"
  (match_operand:VI8F_256 1 "register_operand" "x,x")
  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else
+   (and (match_test "TARGET_AVX2")
+   (eq (const_string "mode") (const_string "V4DImode")))
+ (const_string "OI")
+ (const_string "V4DF")))])
 
 (define_insn_and_split "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
@@ -3898,13 +3908,23 @@ (define_insn "vec_extract_hi_"
  (parallel [(const_int 4) (const_int 5)
 (const_int 6) (const_int 7)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else
+   (and (match_test "TARGET_AVX2")
+   (eq (const_string "mode") (const_string "V8SImode")))
+ (const_string "OI")
+ (const_string "V8SF")))])
 
 (define_insn_and_split "vec_extract_lo_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
@@ -3937,13 +3957,21 @@ (define_insn "vec_extract_hi_v16hi"
 (const_int 12) (const_int 13)
 (const_int 14) (const_int 15)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else (match_test "TARGET_AVX2")
+   (const_string "OI")
+   (const_string "V8SF")))])
 
 (define_insn_and_split "vec_extract_lo_v32qi"
   [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
@@ -3984,13 +4012,21 @@ (define_insn "vec_extract_hi_v32qi"
 (const_int 28) (const_int 29)
 (const_int 30) (const_int 31)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}"
+{
+  if (get_attr_mode (insn) == MODE_OI)
+return "vextracti128\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else
+return "vextractf128\t{$0x1, %1, %0|%0, %1, 0x1}";
+}
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,store")
(set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+   (set (attr "mode")
+ (if_then_else (mat

Re: [RFC PATCH] Improve V8SFmode and V4DFmode smin/smax reductions

2011-09-16 Thread Richard Henderson
On 09/16/2011 04:24 AM, Jakub Jelinek wrote:
>   * config/i386/i386.c (ix86_expand_reduc_v4sf): Rename to ...
>   (ix86_expand_reduc): ... this.  Handle also V8SFmode and V4DFmode.
>   * config/i386/sse.md (reduc_splus_v4sf, reduc_smax_v4sf,
>   reduc_smin_v4sf): Adjust callers.
>   (reduc_smax_v8sf, reduc_smin_v8sf, reduc_smax_v4df, reduc_smin_v4df):
>   New expanders.
> 
>   * gcc.dg/vect/vect-reduc-10.c: New test.
>   * gcc.target/i386/avx-reduc-1.c: New test.

Ok.


r~


Re: [RFC PATCH] Improve -mavx{,2} vector extraction

2011-09-16 Thread Richard Henderson
On 09/16/2011 08:22 AM, Jakub Jelinek wrote:
>   * config/i386/sse.md (vec_extract_hi_,
>   vec_extract_hi_v16hi, vec_extract_hi_v32qi): Use
>   vextracti128 instead of vextractf128 for -mavx2 and
>   integer vectors.  For V4DFmode fix up mode attribute.
>   (VEC_EXTRACT_MODE): For TARGET_AVX add 32-byte vectors.
>   (vec_set_lo_, vec_set_hi_): For VI8F_256 modes use V4DF
>   instead of V8SF mode attribute.
>   (avx2_extracti128): Change into define_expand.
>   * config/i386/i386.c (ix86_expand_vector_extract): Handle
>   32-byte vector modes if TARGET_AVX.
> 
>   * gcc.target/i386/sse2-extract-1.c: New test.
>   * gcc.target/i386/avx-extract-1.c: New test.

Ok.


r~


[RFC PATCH] AVX2 32-byte integer {s,u}m{in,ax} and vcond{,u} patterns

2011-09-16 Thread Jakub Jelinek
Hi!

On Fri, Sep 16, 2011 at 01:24:53PM +0200, Jakub Jelinek wrote:
> Surprisingly with -mavx2 the integer loops aren't vectorized with
> 32-byte vectors, wonder why.  But looking at the integer umin/umax/smin/smax
> 16-byte reductions they generate good code even without reduc_* patterns,
> apparently using vector shifts.

Seems on that testcase the integer loops weren't using 32-byte vectors
because there were no expanders for 32-byte integer min/max.
The following patch adds that (and also 32-byte integer condition
vcond/u because it is related).  With this all the integer loops
in that testcase are nicely vectorized with 32-byte vectors with -mavx2,
unfortunately the reductions look terrible.

The problem is that AVX2 doesn't have 32-byte whole vector shift right
(well, in theory it has it if the shift count is exactly 128 - vextractf128).
For shift counts > 128 we could in theory handle it as two instructions,
vextractf128 plus a 16-byte whole vector shift with count - 128, but
reductions actually don't need the two steps, we only care about the
bottom bits after the shifts and the upper bits can contain anything.
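
The claim that the upper bits may contain anything can be checked with a
small array model of a shift-based reduction.  This is a sketch, not the
proposed optab: the function name is invented, eight int elements stand in
for a V8SImode vector, and GARBAGE is whatever the shift leaves in the
vacated upper elements.

```c
#include <assert.h>

static int
reduc_smax_by_shifts_sketch (const int in[8], int garbage)
{
  int v[8], t[8];
  int i, step;

  for (i = 0; i < 8; i++)
    v[i] = in[i];

  /* Shift by half, a quarter and an eighth of the vector in turn.  */
  for (step = 4; step >= 1; step /= 2)
    {
      /* Whole-vector shift right by STEP elements; vacated upper
         elements get GARBAGE instead of zeros.  */
      for (i = 0; i < 8; i++)
        t[i] = (i + step < 8) ? v[i + step] : garbage;
      for (i = 0; i < 8; i++)
        v[i] = v[i] > t[i] ? v[i] : t[i];
    }

  /* Only element 0 is meaningful; it never depends on GARBAGE.  */
  return v[0];
}
```

After a shift by STEP elements only the low STEP elements feed the next
round, so contaminated upper elements never reach element 0.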

So, either we can fix this by adding 
reduc_{smin,smax,umin,umax}_v{32q,16h,8s,4d}i
patterns (at that point I guess I should just macroize them together with
the reduc_{smin,smax,umin,umax}_v{4sf,8sf,4df}) and handle the 4 32-byte
integer modes also in ix86_expand_reduc, or come up with some new optab
for an operation like whole vector shift right, but which would allow
the upper bits to be undefined and would only allow shifts by
vector size / 2, / 4, / 8 down to element size and corresponding tree code.
What do you prefer?

OT: it seems the AVX2 support put the avx2_<code><mode>3 and
*avx2_<code><mode>3 patterns (the former being <code><mode>3 after this patch)
in the wrong spot, in between the vec_shr_<mode> expander and the sse2_lshrv1ti3
insn which implements what the expander expands.  Uros, would you like to
move them elsewhere?  Where exactly?

This patch has been tested on x86_64-linux and i686-linux on SandyBridge.

2011-09-16  Jakub Jelinek  

* config/i386/i386.c (ix86_build_const_vector): Handle V8SImode
and V4DImode.
(ix86_build_signbit_mask): Likewise.
(ix86_expand_int_vcond): Likewise.  Handle V16HImode and
V32QImode.
(bdesc_args): Use CODE_FOR_{s,u}m{ax,in}v{32q,16h,8s}i3
instead of CODE_FOR_avx2_{s,u}m{ax,in}v{32q,16h,8s}i3.
* config/i386/sse.md (avx2_<code><mode>3 umaxmin expand): Rename
to...
(<code><mode>3) ... this.
(avx2_<code><mode>3 smaxmin expand): Rename to...
(<code><mode>3) ... this.
(smax<mode>3, smin<mode>3): Macroize using smaxmin code iterator.
(smaxv2di3, sminv2di3): Macroize using smaxmin code iterator and
VI8_AVX2 mode iterator.
(umaxv2di3, uminv2di3): Macroize using umaxmin code iterator and
VI8_AVX2 mode iterator.
(vcond, vcondu):
New expanders.

--- gcc/config/i386/i386.c.jj   2011-09-16 11:54:27.0 +0200
+++ gcc/config/i386/i386.c  2011-09-16 16:46:12.0 +0200
@@ -16951,7 +16951,9 @@ ix86_build_const_vector (enum machine_mo
 
   switch (mode)
 {
+case V8SImode:
 case V4SImode:
+case V4DImode:
 case V2DImode:
   gcc_assert (vect);
 case V8SFmode:
@@ -16992,6 +16994,7 @@ ix86_build_signbit_mask (enum machine_mo
   /* Find the sign bit, sign extended to 2*HWI.  */
   switch (mode)
 {
+case V8SImode:
 case V4SImode:
 case V8SFmode:
 case V4SFmode:
@@ -17001,6 +17004,7 @@ ix86_build_signbit_mask (enum machine_mo
   lo = 0x8000, hi = lo < 0;
   break;
 
+case V4DImode:
 case V2DImode:
 case V4DFmode:
 case V2DFmode:
@@ -19112,17 +19116,26 @@ ix86_expand_int_vcond (rtx operands[])
 
  switch (mode)
{
+   case V8SImode:
+   case V4DImode:
case V4SImode:
case V2DImode:
{
  rtx t1, t2, mask;
  rtx (*gen_sub3) (rtx, rtx, rtx);
 
+ switch (mode)
+   {
+   case V8SImode: gen_sub3 = gen_subv8si3; break;
+   case V4DImode: gen_sub3 = gen_subv4di3; break;
+   case V4SImode: gen_sub3 = gen_subv4si3; break;
+   case V2DImode: gen_sub3 = gen_subv2di3; break;
+   default:
+ gcc_unreachable ();
+   }
  /* Subtract (-(INT MAX) - 1) from both operands to make
 them signed.  */
  mask = ix86_build_signbit_mask (mode, true, false);
- gen_sub3 = (mode == V4SImode
- ? gen_subv4si3 : gen_subv2di3);
  t1 = gen_reg_rtx (mode);
  emit_insn (gen_sub3 (t1, cop0, mask));
 
@@ -19135,6 +19148,8 @@ ix86_expand_int_vcond (rtx operands[])
}
  break;
 
+   case V32QImode:
+   case V16HImode:
case V16QImode:
case V8HIm

[v3] Add std::array testcase + other minor housekeeping

2011-09-16 Thread Paolo Carlini

Hi,

tested x86_64-linux, -Wall too, committed to mainline.

Paolo.

//
2011-09-16  Paolo Carlini  

* testsuite/23_containers/array/comparison_operators/
less_or_equal.cc: New.
* testsuite/23_containers/array/comparison_operators/
greater_or_equal.cc: Likewise.
* testsuite/23_containers/array/comparison_operators/less.cc: Likewise.
* testsuite/23_containers/array/comparison_operators/equal.cc: Likewise.
* testsuite/23_containers/array/comparison_operators/not_equal.cc:
Likewise.
* testsuite/23_containers/array/comparison_operators/greater.cc:
Likewise.
* testsuite/23_containers/array/iterators/end_is_one_past.cc: Likewise.
* testsuite/23_containers/array/capacity/empty.cc: Likewise.
* testsuite/23_containers/array/capacity/max_size.cc: Likewise.
* testsuite/23_containers/array/capacity/size.cc: Likewise.
* testsuite/23_containers/array/tuple_interface/tuple_element.cc:
Likewise.
* testsuite/23_containers/array/tuple_interface/tuple_size.cc:
Likewise.
* testsuite/23_containers/array/element_access/at_out_of_range.cc:
Likewise.
* testsuite/23_containers/array/element_access/back.cc: Likewise.
* testsuite/23_containers/array/element_access/front.cc: Likewise.
* testsuite/23_containers/array/element_access/data.cc: Likewise.
* testsuite/23_containers/array/cons/aggregate_initialization.cc:
Likewise.
* testsuite/23_containers/array/requirements/zero_sized_arrays.cc:
Likewise.
* testsuite/23_containers/array/requirements/contiguous.cc: Likewise.
* testsuite/23_containers/array/requirements/member_swap.cc: Likewise.
* testsuite/23_containers/array/specialized_algorithms/swap.cc:
Likewise.
* testsuite/23_containers/array/constexpr_get.cc: Move...
* testsuite/23_containers/array/tuple_interface/constexpr_get.cc:
... here.
* testsuite/23_containers/array/requirements/get.cc: Move...
* testsuite/23_containers/array/tuple_interface/get: ... here.
* testsuite/23_containers/array/at_neg.cc: Move...
* testsuite/23_containers/array/element_access: ... here.
* testsuite/23_containers/array/requirements/constexpr_functions.cc:
Move...
* testsuite/23_containers/array/capacity: ... here.
* testsuite/23_containers/array/requirements/
constexpr_element_access.cc: Move...
* testsuite/23_containers/array/element_access: ... here.

* testsuite/20_util/duration/cons/1_neg.cc: Avoid -Wall warnings.
* testsuite/20_util/tuple/creation_functions/constexpr.cc: Likewise.
* testsuite/20_util/pair/make_pair/constexpr.cc: Likewise.
* testsuite/20_util/time_point/nonmember/constexpr.cc: Likewise.
* testsuite/23_containers/bitset/operations/constexpr.cc: Likewise.

* testsuite/20_util/duration/cons/1_neg.cc: Discard bogus warning.
* testsuite/20_util/forward/1_neg.cc: Likewise.
Index: ChangeLog
===
--- ChangeLog   (revision 178910)
+++ ChangeLog   (working copy)
@@ -1,3 +1,59 @@
+2011-09-16  Paolo Carlini  
+
+   * testsuite/23_containers/array/comparison_operators/
+   less_or_equal.cc: New.
+   * testsuite/23_containers/array/comparison_operators/
+   greater_or_equal.cc: Likewise.
+   * testsuite/23_containers/array/comparison_operators/less.cc: Likewise.
+   * testsuite/23_containers/array/comparison_operators/equal.cc: Likewise.
+   * testsuite/23_containers/array/comparison_operators/not_equal.cc:
+   Likewise.
+   * testsuite/23_containers/array/comparison_operators/greater.cc:
+   Likewise.
+   * testsuite/23_containers/array/iterators/end_is_one_past.cc: Likewise.
+   * testsuite/23_containers/array/capacity/empty.cc: Likewise.
+   * testsuite/23_containers/array/capacity/max_size.cc: Likewise.
+   * testsuite/23_containers/array/capacity/size.cc: Likewise.
+   * testsuite/23_containers/array/tuple_interface/tuple_element.cc:
+   Likewise.
+   * testsuite/23_containers/array/tuple_interface/tuple_size.cc:
+   Likewise.
+   * testsuite/23_containers/array/element_access/at_out_of_range.cc:
+   Likewise.
+   * testsuite/23_containers/array/element_access/back.cc: Likewise.
+   * testsuite/23_containers/array/element_access/front.cc: Likewise.
+   * testsuite/23_containers/array/element_access/data.cc: Likewise.
+   * testsuite/23_containers/array/cons/aggregate_initialization.cc:
+   Likewise.
+   * testsuite/23_containers/array/requirements/zero_sized_arrays.cc:
+   Likewise.
+   * testsuite/23_containers/array/requirements/contiguous.cc: Likewise.
+   * testsuite/23_containers/array/requirements/member_swap.cc: Likewise.
+   * 

Re: [PATCH, libiberty] correct md5_process_bytes with unaligned pointers

2011-09-16 Thread Basile Starynkevitch
On Fri, 16 Sep 2011 14:46:57 +0200
Pierre Vittet  wrote:

> Hello,
[...]

> The patch also removes a preprocessor #if testing whether
> _STRING_ARCH_unaligned is defined. This symbol is never defined in gcc
> and could only be set via CFLAGS. Looking at the code, it does not look
> useful to define it (and it is only tested in libiberty/md5.c and
> libiberty/sha1.c), as we already check the pointer alignment, so
> removing it cleans up the code. I searched on Google, and it does not appear
> to be used. Does anyone want it or think that it should not be removed?
> 
> Ok for trunk ?


I can't formally approve this patch, but I do hope it will be reviewed and 
approved soon. 

See http://gcc.gnu.org/ml/gcc-help/2011-09/msg00126.html
and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00963.html
and 
http://groups.google.com/group/gcc-melt/browse_thread/thread/292c394fea5089c7

Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Georg-Johann Lay
Denis Chertykov schrieb:
> 2011/9/12 Georg-Johann Lay :
>> This patch introduces patterns for multiply-add and multiply-sub.
>>
>> On the enhanced core, these operations can be performed with the product in 
>> R0;
>> there is no need to MOVW it out of that register.  The code is smaller and
>> faster and has lower register pressure.
>>
>> Tested without regressions.
>>
>> Ok to commit?
> 
> Ok.
> 
> Denis.

This is the second part of the fix for this PR; it introduces multiply-add/-sub
for QImode and one insn for HI = sign_extend (QI << 1).
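
For orientation, the three operations can be written down in C roughly as follows.  This is only a sketch of the arithmetic; the function names are made up, and whether GCC actually maps these exact sources onto the new insns is an assumption that depends on optimization level and on how combine sees the code.

```c
#include <stdint.h>

/* QImode multiply-add: result = c + a * b, truncated to 8 bits.  */
uint8_t
madd8 (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) (c + a * b);
}

/* QImode multiply-sub: result = c - a * b, truncated to 8 bits.  */
uint8_t
msub8 (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) (c - a * b);
}

/* HI = sign_extend (QI << 1): shift within 8 bits, then sign-extend
   the 8-bit result to 16 bits.  The xor/subtract pair performs the
   sign extension without relying on implementation-defined
   conversions.  */
int16_t
shl1_signx (uint8_t x)
{
  uint8_t q = (uint8_t) (x << 1);

  return (int16_t) ((q ^ 0x80) - 0x80);
}
```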

With this patch PR50358 is fixed up to some corner cases.

The insns with CONST_INT split the load of the constant after reload.
avr_rtx_costs describes these costs, but it would be advantageous to do the
split pre-reload because IRA/reload could reuse constants.

The trouble is that reload_in_progress is false in IRA and therefore the
patterns match in IRA, so here is the same trouble I faced in the patch for
widening multiply, where a function like avr_gate_split1() was regarded as too
hackish because it tested the pass number to help out.  Without such a function
in the insn condition, the insn matches in IRA and a register is re-replaced
with CONST_INT again, leading to a crash in split2 because of gen_reg_rtx or
because of !reload_completed in the insn condition.

So this patch comes without reusing constants.

Besides that, the "Write as one pattern" changes gather two insns and write
them down as one; that's no functional change, it's just about using iterators
to reduce lines of code.  The order of insns changes, but that does not matter
here.  I didn't make an extra patch for that.

Passed without regression.

Ok to install?

Johann

PR target/50358
* config/avr/avr.md (*ashiftqihi2.signx.1): New insn.
(*maddqi4, *maddqi4.const): New insns.
(*msubqi4, *msubqi4.const): New insns.
(umulqihi3, mulqihi3): Write as one pattern.
(umulqi3_highpart, smulqi3_highpart): Ditto.
(*maddqihi4.const, *umaddqihi4.uconst): Ditto.
(*msubqihi4.const, *umsubqihi4.uconst): Ditto.
(*muluqihi3.uconst, *mulsqihi3.sconst): Ditto.
* config/avr/avr.c (avr_rtx_costs): Record costs of above in cases
PLUS:QI and MINUS:QI.  Increase costs of multiply-add/-sub for
HImode by 1 in the case of multiplying with a CONST_INT.
Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 178806)
+++ config/avr/avr.md	(working copy)
@@ -1027,31 +1027,21 @@ (define_insn "*mulqi3_call"
   [(set_attr "type" "xcall")
(set_attr "cc" "clobber")])
 
-(define_insn "smulqi3_highpart"
-  [(set (match_operand:QI 0 "register_operand" "=r")
-	(truncate:QI
- (lshiftrt:HI (mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "d"))
-   (sign_extend:HI (match_operand:QI 2 "register_operand" "d")))
+;; "umulqi3_highpart"
+;; "smulqi3_highpart"
+(define_insn "mulqi3_highpart"
+  [(set (match_operand:QI 0 "register_operand"   "=r")
+(truncate:QI
+ (lshiftrt:HI (mult:HI (any_extend:HI (match_operand:QI 1 "register_operand" ""))
+   (any_extend:HI (match_operand:QI 2 "register_operand" "")))
   (const_int 8]
   "AVR_HAVE_MUL"
-  "muls %1,%2
+  "mul %1,%2
 	mov %0,r1
 	clr __zero_reg__"
   [(set_attr "length" "3")
(set_attr "cc" "clobber")])
   
-(define_insn "umulqi3_highpart"
-  [(set (match_operand:QI 0 "register_operand" "=r")
-	(truncate:QI
- (lshiftrt:HI (mult:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
-   (zero_extend:HI (match_operand:QI 2 "register_operand" "r")))
-  (const_int 8]
-  "AVR_HAVE_MUL"
-  "mul %1,%2
-	mov %0,r1
-	clr __zero_reg__"
-  [(set_attr "length" "3")
-   (set_attr "cc" "clobber")])
 
 ;; Used when expanding div or mod inline for some special values
 (define_insn "*subqi3.ashiftrt7"
@@ -1064,25 +1054,16 @@ (define_insn "*subqi3.ashiftrt7"
   [(set_attr "length" "2")
(set_attr "cc" "clobber")])
 
-(define_insn "mulqihi3"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-	(mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "d"))
-		 (sign_extend:HI (match_operand:QI 2 "register_operand" "d"]
-  "AVR_HAVE_MUL"
-  "muls %1,%2
-	movw %0,r0
-	clr r1"
-  [(set_attr "length" "3")
-   (set_attr "cc" "clobber")])
-
-(define_insn "umulqihi3"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-	(mult:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
-		 (zero_extend:HI (match_operand:QI 2 "register_operand" "r"]
+;; "umulqihi3"
+;; "mulqihi3"
+(define_insn "mulqihi3"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+(mult:HI (any_extend:HI (match_operand:QI 1 "register_operand" ""))

Re: Vector Comparison patch

2011-09-16 Thread Richard Henderson
On 08/29/2011 04:41 AM, Paolo Bonzini wrote:
> The definition in OpenCL makes zero sense to me.  For byte operands
> it is custom-tailored after the SSE PMOVMSKB instruction, but there
> is no PMOVMSKW/PMOVMSKD instruction so you would need very slow bit
> shift operations before PMOVMSK.  On the other hand, bit selection is
> for example in Altivec.

Not PMOVMSKB, but the sse4.1 PBLENDVB.

With that, we don't need funny shift operations for wider integer
types, but only *because* the comparison produces -1, which means
that the MSB of each byte is in fact set.

Which means that the Perfect wording probably doesn't want to be
specific to bit selection, but include bit selection (aka and-andn-or)
as a valid implementation.
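
A small scalar sketch of that equivalence, assuming a 32-bit element and a comparison that yields all ones or zero.  The function names are made up for the example, and msb_byte_select is only a model of a PBLENDVB-style byte blend, not the instruction's exact definition.

```c
#include <stdint.h>

/* Comparison in the style discussed: all ones when true, else zero.  */
uint32_t
cmp_gt_mask (int32_t a, int32_t b)
{
  return a > b ? UINT32_MAX : 0u;
}

/* Bit selection, aka and-andn-or.  */
uint32_t
bit_select (uint32_t mask, uint32_t x, uint32_t y)
{
  return (x & mask) | (y & ~mask);
}

/* Byte blend model: each byte is chosen by the top bit of the
   corresponding mask byte only.  */
uint32_t
msb_byte_select (uint32_t mask, uint32_t x, uint32_t y)
{
  uint32_t r = 0;
  int i;

  for (i = 0; i < 4; i++)
    {
      uint32_t lane = (uint32_t) 0xffu << (8 * i);

      r |= ((mask >> (8 * i + 7)) & 1u) ? (x & lane) : (y & lane);
    }
  return r;
}
```

For masks that are exactly 0 or -1 both selection functions return the same value; they differ only for masks with mixed bits, which a comparison never produces.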



r~


Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Georg-Johann Lay
Georg-Johann Lay schrieb:
> Denis Chertykov schrieb:
>> 2011/9/12 Georg-Johann Lay :
>>> This patch introduces patterns for multiply-add and multiply-sub.
>>>
>>> On the enhanced core, these operations can be performed with the product in 
>>> R0;
>>> there is no need to MOVW it out of that register.  The code is smaller and
>>> faster and has lower register pressure.
>>>
>>> Tested without regressions.
>>>
>>> Ok to commit?
>> Ok.
>>
>> Denis.
> 
> This is the second part to fix this PR; it introduced multiply-add/-sub for
> QImode and one insn for HI = sign_extend (QI << 1).
> 
> With this patch PR50358 is fixed up to some corner cases.
> 
> The insns with CONST_INT split the load of the constant after reload.
> avr_rtx_costs describes these costs, but it would be advantageous to do the
> split pre-reload because IRA/reload could reuse constants.
> 
> The trouble is that reload_in_progress is false in IRA and therefore the
> patterns match in IRA, so here is the same trouble I faced in the patch for
> widening multiply, where a function like avr_gate_split1() was regarded as too
> hackish because it tested the pass number to help out.  Without such a
> function in the insn condition, the insn matches in IRA and a register is
> re-replaced with CONST_INT again, leading to a crash in split2 because of
> gen_reg_rtx or because of !reload_completed in the insn condition.
> 
> So this patch comes without reusing constants.
> 
> Besides that, the "Write as one pattern" changes gather two insns and write
> them down as one; that's no functional change, it's just about using iterators
> to reduce lines of code.  The order of insns changes, but that does not matter
> here.  I didn't make an extra patch for that.

Split the two patches

Ok to install?

Johann


Patch-1
PR target/50358
* config/avr/avr.md (*ashiftqihi2.signx.1): New insn.
(*maddqi4, *maddqi4.const): New insns.
(*msubqi4, *msubqi4.const): New insns.
* config/avr/avr.c (avr_rtx_costs): Record costs of above in cases
PLUS:QI and MINUS:QI.  Increase costs of multiply-add/-sub for
HImode by 1 in the case of multiplying with a CONST_INT.
Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI.


Patch-2
* config/avr/avr.md: (umulqihi3, mulqihi3): Write as one pattern.
(umulqi3_highpart, smulqi3_highpart): Ditto.
(*maddqihi4.const, *umaddqihi4.uconst): Ditto.
(*msubqihi4.const, *umsubqihi4.uconst): Ditto.
(*muluqihi3.uconst, *mulsqihi3.sconst): Ditto.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 178806)
+++ config/avr/avr.md	(working copy)
@@ -1138,6 +1138,72 @@ (define_insn "*oumulqihi3"
(set_attr "cc" "clobber")])
 
 ;**
+; multiply-add/sub QI: $0 = $3 +/- $1*$2
+;**
+
+(define_insn "*maddqi4"
+  [(set (match_operand:QI 0 "register_operand"  "=r")
+(plus:QI (mult:QI (match_operand:QI 1 "register_operand" "r")
+  (match_operand:QI 2 "register_operand" "r"))
+ (match_operand:QI 3 "register_operand"  "0")))]
+  
+  "AVR_HAVE_MUL"
+  "mul %1,%2
+	add %A0,r0
+	clr __zero_reg__"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*msubqi4"
+  [(set (match_operand:QI 0 "register_operand"   "=r")
+(minus:QI (match_operand:QI 3 "register_operand"  "0")
+  (mult:QI (match_operand:QI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r"]
+  "AVR_HAVE_MUL"
+  "mul %1,%2
+	sub %A0,r0
+	clr __zero_reg__"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+(define_insn_and_split "*maddqi4.const"
+  [(set (match_operand:QI 0 "register_operand"   "=r")
+(plus:QI (mult:QI (match_operand:QI 1 "register_operand"  "r")
+  (match_operand:QI 2 "const_int_operand" "n"))
+ (match_operand:QI 3 "register_operand"   "0")))
+   (clobber (match_scratch:QI 4 "=&d"))]
+  "AVR_HAVE_MUL"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+(match_dup 2))
+   ; *maddqi4
+   (set (match_dup 0)
+(plus:QI (mult:QI (match_dup 1)
+  (match_dup 4))
+ (match_dup 3)))]
+  "")
+
+(define_insn_and_split "*msubqi4.const"
+  [(set (match_operand:QI 0 "register_operand""=r")
+(minus:QI (match_operand:QI 3 "register_operand"   "0")
+  (mult:QI (match_operand:QI 1 "register_operand"  "r")
+   (match_operand:QI 2 "const_int_operand" "n"
+   (clobber (match_scratch:QI 4  "=&d"))]
+  "AVR_HAVE_MUL"
+  "#"

Re: [PATCH] fix PR ada/42978

2011-09-16 Thread Simon Wright
On 8 Feb 2010, at 14:18, Arnaud Charlet wrote:

>> OK, second try.
>> 
>> Tested on gcc version 4.5.0 20100207 (experimental) [trunk revision 156574] 
>> (GCC).
>> 
>> 2010-02-07  Simon Wright  
>> 
>>  PR ada/42978
>>  * mlib-utl.adb (Ar): Output ranlib options if verbose.
> 
> This is OK, thanks.

Can this be applied, please? The patch still applies at r178911.

2010-02-07  Simon Wright  

PR ada/42978
* mlib-utl.adb (Ar): Output ranlib options if verbose.




gcc-ada-mlib-utl.adb.diff
Description: Binary data


Re: [Trunk/GCC 4.6] Re: [google] Omit date from Fortran .mod files for reproducible builds

2011-09-16 Thread Diego Novillo
On Fri, Jan 28, 2011 at 13:00, Diego Novillo  wrote:
> On Fri, Jan 28, 2011 at 06:19, Tobias Burnus  wrote:
>> We (Janne and I) think this patch can also be applied to the GCC 4.6 trunk;
>> as the date is never read there is also no .mod ABI issue.
>>
>> I assume that there are no copyright issues.
>
> There aren't.  Google has signed a blanket copyright assignment with
> the FSF.  Any patch coming from a google.com address is covered.
>
> I'll mark this patch for trunk.  Thanks.

Tobias, I'm planning to apply this (old) patch to trunk.  Still OK?

I've re-bootstrapped on x86_64.  No new failures.


Thanks.  Diego.

2011-09-16  Simon Baldwin  

   * module.c (gfc_dump_module): Omit timestamp from output.

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 4250a17..b29ba4b 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -5178,8 +5178,7 @@ void
 gfc_dump_module (const char *name, int dump_flag)
 {
   int n;
-  char *filename, *filename_tmp, *p;
-  time_t now;
+  char *filename, *filename_tmp;
   fpos_t md5_pos;
   unsigned char md5_new[16], md5_old[16];

@@ -5221,13 +5220,8 @@ gfc_dump_module (const char *name, int dump_flag)
 filename_tmp, xstrerror (errno));

   /* Write the header, including space reserved for the MD5 sum.  */
-  now = time (NULL);
-  p = ctime (&now);
-
-  *strchr (p, '\n') = '\0';
-
-  fprintf (module_fp, "GFORTRAN module version '%s' created from %s on %s\n"
-  "MD5:", MOD_VERSION, gfc_source_file, p);
+  fprintf (module_fp, "GFORTRAN module version '%s' created from %s\n"
+  "MD5:", MOD_VERSION, gfc_source_file);
   fgetpos (module_fp, &md5_pos);
   fputs (" -- "
"If you edit this, you'll get what you deserve.\n\n", module_fp);

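A reduced sketch of the effect, outside of gfortran: with the ctime() string gone, the header depends only on its inputs, so writing it twice gives identical bytes.  The helper name format_mod_header is hypothetical and the version string used in any call is a placeholder; only the format string is taken from the patch.

```c
#include <stdio.h>

/* Hypothetical helper, not part of gfortran: format the module header
   the way the patched fprintf call does, i.e. without a timestamp.
   Returns the number of characters written, as snprintf does.  */
int
format_mod_header (char *buf, size_t size, const char *version,
                   const char *source_file)
{
  return snprintf (buf, size,
                   "GFORTRAN module version '%s' created from %s\nMD5:",
                   version, source_file);
}
```

Calling it twice with the same version and source file name fills both buffers with the same contents, which is what makes the resulting .mod files comparable across builds.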

Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Denis Chertykov
2011/9/16 Georg-Johann Lay :
> Georg-Johann Lay schrieb:
>> Denis Chertykov schrieb:
>>> 2011/9/12 Georg-Johann Lay :
>>>> This patch introduces patterns for multiply-add and multiply-sub.
>>>>
>>>> On the enhanced core, these operations can be performed with the product
>>>> in R0;
>>>> there is no need to MOVW it out of that register.  The code is smaller and
>>>> faster and has lower register pressure.
>>>>
>>>> Tested without regressions.
>>>>
>>>> Ok to commit?
>>> Ok.
>>>
>>> Denis.
>>
>> This is the second part to fix this PR; it introduced multiply-add/-sub for
>> QImode and one insn for HI = sign_extend (QI << 1).
>>
>> With this patch PR50358 is fixed up to some corner cases.
>>
>> The insns with CONST_INT split the load of the constant after reload.
>> avr_rtx_costs describes these costs, but it would be advantageous to do the
>> split pre-reload because IRA/reload could reuse constants.
>>
>> The trouble is that reload_in_progress is false in IRA and therefore the
>> patterns match in IRA, so here is the same trouble I faced in the patch for
>> widening multiply, where a function like avr_gate_split1() was regarded as too
>> hackish because it tested the pass number to help out.  Without such a
>> function in the insn condition, the insn matches in IRA and a register is
>> re-replaced with CONST_INT again, leading to a crash in split2 because of
>> gen_reg_rtx or because of !reload_completed in the insn condition.
>>
>> So this patch comes without reusing constants.
>>
>> Besides that, the "Write as one pattern" changes gather two insns and write
>> them down as one; that's no functional change, it's just about using 
>> iterators
>> to reduce lines of code.  The order of insns changes, but that does not 
>> matter
>> here.  I didn't make an extra patch for that.
>
> Split the two patches
>
> Ok to install?
>

Please, commit.

Denis.


[RFC PATCH] AVX2 32-byte integer min/max reductions

2011-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2011 at 06:20:52PM +0200, Jakub Jelinek wrote:
> So, either we can fix this by adding 
> reduc_{smin,smax,umin,umax}_v{32q,16h,8s,4d}i
> patterns (at that point I guess I should just macroize them together with
> the reduc_{smin,smax,umin,umax}_v{4sf,8sf,4df}) and handle the 4 32-byte
> integer modes also in ix86_expand_reduc, or come up with some new optab

Here is a patch that does it this way and also moves the umaxmin expanders
one insn down to the right spot.

I've noticed the <sse2_avx2>_lshr<mode>3 insn was modelled incorrectly
for the 256-bit shift, because, as the documentation says, it
shifts each 128-bit lane separately, while it was modelled as V4DImode
shift (i.e. shifting each 64-bit chunk), and sse2_lshrv1ti3 was there
just for the 128-bit variant, not the 256-bit one.
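
To make the difference concrete, here is a scalar model of the two byte-shift behaviours on a 32-byte value.  The function names are made up and this is just my reading of the lane-wise semantics described above, not the RTL from the patch; in and out must not overlap.

```c
#include <stdint.h>
#include <string.h>

/* Whole-vector shift right by n bytes: bytes move toward index 0 and
   zeros are shifted in at the top.  */
void
shr_bytes_whole (const uint8_t in[32], uint8_t out[32], unsigned n)
{
  memset (out, 0, 32);
  if (n < 32)
    memcpy (out, in + n, 32 - n);
}

/* Lane-wise shift right by n bytes: each 16-byte lane is shifted on
   its own, so nothing crosses from the high lane into the low lane.  */
void
shr_bytes_per_lane (const uint8_t in[32], uint8_t out[32], unsigned n)
{
  memset (out, 0, 32);
  if (n < 16)
    {
      memcpy (out, in + n, 16 - n);
      memcpy (out + 16, in + 16 + n, 16 - n);
    }
}
```

For n below 16 the two agree in the lowest 16 - n bytes and differ just above them, where the whole-vector shift brings in bytes from the high lane and the lane-wise one brings in zeros.  That is why the assembly below first folds the two lanes together with vperm2i128 and only then uses the lane-wise shifts.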

Regtested on x86_64-linux and i686-linux on SandyBridge; unfortunately
I don't have an AVX2 emulator and thus the AVX2 assembly was just eyeballed.
E.g. for the V16HImode reduction the difference with this patch is:
-   vmovdqa %xmm0, %xmm1
-   vextracti128 $0x1, %ymm0, %xmm0
-   vpextrw $0, %xmm1, %eax
-   vpextrw $1, %xmm1, %edx
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $2, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $3, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $4, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $5, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $6, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $7, %xmm1, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $0, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $1, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $2, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $3, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $4, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $5, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $6, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovl %eax, %edx
-   vpextrw $7, %xmm0, %eax
-   cmpw %ax, %dx
-   cmovge %edx, %eax
+   vperm2i128 $1, %ymm0, %ymm0, %ymm1
+   vpmaxsw %ymm1, %ymm0, %ymm0
+   vpsrldq $8, %ymm0, %ymm1
+   vpmaxsw %ymm1, %ymm0, %ymm0
+   vpsrldq $4, %ymm0, %ymm1
+   vpmaxsw %ymm1, %ymm0, %ymm0
+   vpsrldq $2, %ymm0, %ymm1
+   vpmaxsw %ymm1, %ymm0, %ymm0
+   vpextrw $0, %xmm0, %eax

2011-09-16  Jakub Jelinek  

* config/i386/sse.md (VIMAX_AVX2): Change V4DI to V2TI.
(sse2_avx, sseinsnmode): Add V2TI.
(REDUC_SMINMAX_MODE): New mode iterator.
(reduc_smax_v4sf, reduc_smin_v4sf, reduc_smax_v8sf,
reduc_smin_v8sf, reduc_smax_v4df, reduc_smin_v4df): Remove.
(reduc_<code>_<mode>): New smaxmin and umaxmin expanders.
(sse2_lshrv1ti3): Rename to...
(<sse2_avx2>_lshr<mode>3): ... this.  Use VIMAX_AVX2 mode
iterator.  Move before umaxmin expanders.
* config/i386/i386.h (VALID_AVX256_REG_MODE,
SSE_REG_MODE_P): Accept V2TImode.
* config/i386/i386.c (ix86_expand_reduc): Handle V32QImode,
V16HImode, V8SImode and V4DImode.

--- gcc/config/i386/sse.md.jj   2011-09-16 17:04:07.0 +0200
+++ gcc/config/i386/sse.md  2011-09-16 20:07:02.0 +0200
@@ -100,7 +100,7 @@ (define_mode_iterator VI8_AVX2
   [(V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VIMAX_AVX2
-  [(V4DI "TARGET_AVX2") V1TI])
+  [(V2TI "TARGET_AVX2") V1TI])
 
 (define_mode_iterator SSESCALARMODE
   [(V4DI "TARGET_AVX2") TI])
@@ -140,7 +140,7 @@ (define_mode_attr sse2_avx2
(V8HI "sse2") (V16HI "avx2")
(V4SI "sse2") (V8SI "avx2")
(V2DI "sse2") (V4DI "avx2")
-   (V1TI "sse2")])
+   (V1TI "sse2") (V2TI "avx2")])
 
 (define_mode_attr ssse3_avx2
[(V16QI "ssse3") (V32QI "avx2")
@@ -225,7 +225,7 @@ (define_mode_attr avxsizesuffix
 
 ;; SSE instruction mode
 (define_mode_attr sseinsnmode
-  [(V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")
+  [(V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI") (V2TI "OI")
(V16QI "TI") (V8HI "TI") (V4SI "TI") (V2DI "TI") (V1TI "TI")
(V8SF "V8SF") (V4DF "V4DF")
(V4SF "V4SF") (V2DF "V2DF")
@@ -1257,58 +1257,30 @@ (define_expand "reduc_splus_v4sf"
   DONE;
 })
 
-
-(define_expand "reduc_smax_v4sf"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V4SF 1 "register_operand" "")]
-  "TARGET_SSE"
-{
-  ix86_expand_reduc (gen_smaxv4sf3, operands[0], operands[1]);
-  DONE;
-})
-
-(define_expand "reduc_smin_v4sf"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V4SF 1 "register_operand" "")]
-  "TARGET_SSE"
-{
-  ix86_expand_reduc (gen_sminv4sf3, operands[0], operands[1]);
-  DONE;
-})
-
-(define_expand "reduc_smax_v8sf"
-  [(match_oper

[PATCH] Add VIS intrinsics header for sparc.

2011-09-16 Thread David Miller

I've been meaning to toss something like this together for a while.

If we were going to do this, I wanted to get it out of the way before
adding VIS2 and VIS3 support.

I considered trying to make a set of VIS headers compatible with the
vis_*.h headers Sun provides in medialib and Sun Studio, but that's
not possible since we use fundamentally different types in the
builtins provided by GCC.

Sun uses "double" and "float" in the declarations whereas we use our
vector types.

I even checked various users of Sun's VIS intrinsics and they all just
declare their vector variables as "float" and "double" so it would be
impossible to provide headers that would work out of the box.
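
A minimal sketch of the mismatch, assuming GCC's generic vector extension and an 8-byte double.  The type and function names here are made up for illustration and are not part of the proposed header: code that keeps its data in "double" has to reinterpret the bits before it can use a vector-typed builtin or operator.

```c
#include <string.h>

/* Four 16-bit lanes in 8 bytes, using GCC's vector extension.  */
typedef short v4hi_t __attribute__ ((vector_size (8)));

_Static_assert (sizeof (double) == sizeof (v4hi_t),
                "sketch assumes an 8-byte double");

/* Reinterpret the bits of a double as four 16-bit lanes.  */
v4hi_t
double_to_v4hi (double d)
{
  v4hi_t v;

  memcpy (&v, &d, sizeof v);
  return v;
}

/* Reinterpret four 16-bit lanes as the bits of a double.  */
double
v4hi_to_double (v4hi_t v)
{
  double d;

  memcpy (&d, &v, sizeof d);
  return d;
}

/* With the vector type, element-wise arithmetic is written directly.  */
v4hi_t
add_v4hi (v4hi_t a, v4hi_t b)
{
  return a + b;
}
```

Existing users of Sun's headers would need conversions like these at every call, which is the reason a drop-in compatible header does not look feasible.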

Eric, any objections?

2011-09-16  David S. Miller  

* config/sparc/visintrin.h: New file.
* config.gcc: Add it to extra_headers on sparc.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e442fa7..7183f26 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -422,6 +422,7 @@ score*-*-*)
;;
 sparc*-*-*)
cpu_type=sparc
+   extra_headers="visintrin.h"
need_64bit_hwint=yes
;;
 spu*-*-*)
--- /dev/null   2011-09-11 10:37:28.169997151 -0700
+++ b/gcc/config/sparc/visintrin.h  2011-09-14 21:20:35.0 -0700
@@ -0,0 +1,160 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _VISINTRIN_H_INCLUDED
+#define _VISINTRIN_H_INCLUDED
+
+typedef int __v2si __attribute__ ((__vector_size__ (8)));
+typedef short __v4hi __attribute__ ((__vector_size__ (8)));
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef int __i64 __attribute__ ((__mode__ (DI)));
+
+extern __inline void *
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_alignaddr (void *__A, long __B)
+{
+   return __builtin_vis_alignaddr(__A, __B);
+}
+
+extern __inline __i64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_faligndatadi (__i64 __A)
+{
+   return __builtin_vis_faligndatadi (__A);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_faligndatav2si (__v2si __A)
+{
+   return __builtin_vis_faligndatav2si (__A);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_faligndatav4hi (__v4hi __A)
+{
+   return __builtin_vis_faligndatav4hi (__A);
+}
+
+extern __inline __v8qi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_faligndatav8qi (__v8qi __A)
+{
+   return __builtin_vis_faligndatav8qi (__A);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fexpand (__v4qi __A)
+{
+   return __builtin_vis_fexpand (__A);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmul8x16 (__v4qi __A, __v4hi __B)
+{
+   return __builtin_vis_fmul8x16 (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmul8x16au (__v4qi __A, __v4hi __B)
+{
+   return __builtin_vis_fmul8x16au (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmul8x16al (__v4qi __A, __v4hi __B)
+{
+   return __builtin_vis_fmul8x16al (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmul8sux16 (__v8qi __A, __v4hi __B)
+{
+   return __builtin_vis_fmul8sux16 (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmul8ulx16 (__v8qi __A, __v4hi __B)
+{
+   return __builtin_vis_fmul8ulx16 (__A, __B);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fmuld8sux16 (__v4qi __A, __v2hi __B)
+{
+   return __builtin_vis_fmuld8sux16 (__A, __B);
+}
+
+extern

Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2011 at 03:02:07PM -0400, David Miller wrote:
> +extern __inline void *
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +__vis_alignaddr (void *__A, long __B)
> +{
> + return __builtin_vis_alignaddr(__A, __B);

Just formatting nits, two spaces instead of tab to indent and
space in between function name and (.

Jakub


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-16 Thread David Miller
From: Jakub Jelinek 
Date: Fri, 16 Sep 2011 21:07:09 +0200

> On Fri, Sep 16, 2011 at 03:02:07PM -0400, David Miller wrote:
>> +extern __inline void *
>> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>> +__vis_alignaddr (void *__A, long __B)
>> +{
>> +return __builtin_vis_alignaddr(__A, __B);
> 
> Just formatting nits, two spaces instead of tab to indent and
> space in between function name and (.

Thanks Jakub, I'll fix those up.


[libgomp] Pass CC to the libgomp testsuite to capture -sysroot

2011-09-16 Thread Diego Novillo
I would like to push this libgomp patch upstream.  We need it to pass
sysroot to GCC when testing libgomp.

Tested on x86_64.  OK for trunk?


Thanks.  Diego.


-- Forwarded message --
From: Simon Baldwin 
Date: Fri, Jan 28, 2011 at 11:30
Subject: [google] Pass CC to the libgomp testsuite to capture -sysroot
To: gcc-patches@gcc.gnu.org


Pass CC to the libgomp testsuite to capture -sysroot.

Pass CC to the libgomp testsuite.  This is required for running tests where
gcc is configured with a custom sysroot, and CC therefore includes a -sysroot
flag.

Targeted for the google/integration branch.

libgomp/ChangeLog.google:
2011-01-28  Simon Baldwin  

       * configure.ac: Add testsuite/gompconfig.exp to config files.
       * configure: Rebuild from configure.ac.
       * testsuite/config/default.exp: Load gompconfig.exp.
       * testsuite/lib/libgomp.exp (libgomp_init): Exec all of $CC_UNDER_TEST.
       * libgomp/testsuite/gompconfig.exp.in: New.

Google ref: 39294


Index: libgomp/configure
===
--- libgomp/configure   (revision 169355)
+++ libgomp/configure   (working copy)
@@ -16279,6 +16279,8 @@ ac_config_files="$ac_config_files omp.h

 ac_config_files="$ac_config_files Makefile testsuite/Makefile libgomp.spec"

+ac_config_files="$ac_config_files testsuite/gompconfig.exp"
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
@@ -17423,6 +17425,7 @@ do
    "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
    "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
    "libgomp.spec") CONFIG_FILES="$CONFIG_FILES libgomp.spec" ;;
+    "testsuite/gompconfig.exp") CONFIG_FILES="$CONFIG_FILES testsuite/gompconfig.exp" ;;

  *) as_fn_error "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
  esac
Index: libgomp/configure.ac
===
--- libgomp/configure.ac        (revision 169355)
+++ libgomp/configure.ac        (working copy)
@@ -347,4 +347,5 @@ CFLAGS="$save_CFLAGS"

 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
 AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
+AC_CONFIG_FILES(testsuite/gompconfig.exp)
 AC_OUTPUT
Index: libgomp/testsuite/config/default.exp
===
--- libgomp/testsuite/config/default.exp        (revision 169355)
+++ libgomp/testsuite/config/default.exp        (working copy)
@@ -15,3 +15,4 @@
 # .

 load_lib "standard.exp"
+load_lib "gompconfig.exp"
Index: libgomp/testsuite/lib/libgomp.exp
===
--- libgomp/testsuite/lib/libgomp.exp   (revision 169355)
+++ libgomp/testsuite/lib/libgomp.exp   (working copy)
@@ -110,10 +110,9 @@ proc libgomp_init { args } {
           append always_ld_library_path ":${gccdir}/pthread"
       }
       append always_ld_library_path ":${gccdir}"
-       set compiler [lindex $GCC_UNDER_TEST 0]

-       if { [is_remote host] == 0 && [which $compiler] != 0 } {
-         foreach i "[exec $compiler --print-multi-lib]" {
+       if { [is_remote host] == 0 } {
+         foreach i "[eval "exec $GCC_UNDER_TEST --print-multi-lib"]" {
           set mldir ""
           regexp -- "\[a-z0-9=_/\.-\]*;" $i mldir
           set mldir [string trimright $mldir "\;@"]
Index: libgomp/testsuite/gompconfig.exp.in
===
--- libgomp/testsuite/gompconfig.exp.in (revision 0)
+++ libgomp/testsuite/gompconfig.exp.in (revision 0)
@@ -0,0 +1,2 @@
+global GCC_UNDER_TEST
+set GCC_UNDER_TEST "@CC@"


Re: [RFC] Add FMA support to sparc backend

2011-09-16 Thread Eric Botcazou
> Second, like rs6000 the sparc negate fused multiply instructions
> negate the full result, not the multiply result.  So we cannot use
> those instructions for the fnmadf4/fnmsdf4/fnmasf4/fnmssf4 patterns.
> Since rs6000 provides patterns for such negate operations (presumably
> just in case the combiner creates a match) I have done so for sparc
> as well.

OK, this makes sense indeed.

> For now my plan is to turn these fused multiply instructions on if you
> ask to compile targeting a cpu that supports them.

> I'll write a suitable changelog etc. once everything is finalized, this
> patch posting is just to elicit feedback.

What's the story with TFmode for FMA?

Thanks for working on this!

-- 
Eric Botcazou
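
The difference between negating the product and negating the full result, as discussed above, can be sketched in plain C. This is only an illustration: the helper names are hypothetical and not taken from the patch, and ordinary arithmetic stands in for the fused operation, so the single rounding of a real FMA is not modelled.

```c
/* Hypothetical helpers that model only where the negation is applied.  */

/* Negate the product only: (-a * b) + c.  This is the shape the
   fnma-style patterns mentioned above expect.  */
static double
negate_product_madd (double a, double b, double c)
{
  return -a * b + c;
}

/* Negate the full result: -((a * b) + c).  This is how the sparc
   negate fused multiply instructions are described above.  */
static double
negate_result_madd (double a, double b, double c)
{
  return -(a * b + c);
}
```

With a = 2, b = 3, c = 1 the first helper yields -5 and the second -7, which is why one form cannot simply be substituted for the other.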


Re: [RFC] Add FMA support to sparc backend

2011-09-16 Thread David Miller
From: Eric Botcazou 
Date: Fri, 16 Sep 2011 22:25:41 +0200

> What's the story with TFmode for FMA?

There have never been TFmode float operations implemented in hardware
ever for sparc, and I doubt we'll see it in the future.

And this applies also to the FMA instructions.

And especially since the presence of the FMA patterns is meant to be a
performance enhancement, I don't see much value to considering TFmode
cases.

Did you have something specific in mind?

> Thanks for working on this!

No problem.


Re: [libgomp] Pass CC to the libgomp testsuite to capture -sysroot

2011-09-16 Thread Mike Stump
On Sep 16, 2011, at 1:13 PM, Diego Novillo wrote:
> I would like to push this libgomp patch upstream.  We need it to pass
> sysroot to GCC when testing libgomp.

I'd be curious if Ian likes this...  Can one still set GCC_UNDER_TEST in the 
site.exp file or do other unholy things with it after your patch?  I'm thinking 
about things like in tree testing v out of tree testing.



Re: [Ada] fix potential memory corruption in annotated value cache

2011-09-16 Thread Eric Botcazou
> Some possible fixes I considered were:
>
> 1. inserting on entry (as is), allocating the cache entry right away,
> and *always* filling it before returning
>
> 2. inserting on entry (as is), allocating the cache entry right away,
> and releasing it before returning unless we're filling it in
>
> 3. not inserting on entry, and looking up again for insertion before
> caching and returning, so as to get a fresh slot pointer
>
> I implemented 3., and considered splitting the logic of annotate_value()
> into one function that manages caching and calls the other to perform
> the computation, so as to simplify the implementation.

This looks like the most straightforward solution indeed.

> Here's the patch I've tested on i686-pc-linux-gnu and x86_64-linux-gnu.
> Ok to install?

Yes, modulo Jakub's remark and s/NULL/NULL_TREE for zeroing in.base.from.

-- 
Eric Botcazou
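
Option 3 above (look up without inserting on entry, then look up again for insertion once the value is known) can be sketched with a small stand-alone cache. This is a minimal illustration under assumptions: the table layout, `find_slot`, `grow` and `cached_sum` are hypothetical and are not the code in annotate_value(); the point is only that no slot pointer is held across a recursive call that may move the table.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the value cache: an open-addressing table
   that is reallocated (and therefore moved) when it fills up.  */
struct entry { int used; unsigned key; unsigned long value; };

static struct entry *table;
static size_t capacity, count;

/* Return the slot holding KEY, or the empty slot where it would go.
   The pointer is only valid until the next insertion.  */
static struct entry *
find_slot (struct entry *tab, size_t cap, unsigned key)
{
  size_t i = key % cap;
  while (tab[i].used && tab[i].key != key)
    i = (i + 1) % cap;
  return &tab[i];
}

/* Double the table, moving every entry to freshly allocated storage.  */
static void
grow (void)
{
  size_t new_cap = capacity ? capacity * 2 : 4;
  struct entry *new_tab = calloc (new_cap, sizeof *new_tab);
  if (!new_tab)
    abort ();
  for (size_t i = 0; i < capacity; i++)
    if (table[i].used)
      *find_slot (new_tab, new_cap, table[i].key) = table[i];
  free (table);
  table = new_tab;
  capacity = new_cap;
}

/* Cached computation of 1 + 2 + ... + N that recurses before caching.  */
static unsigned long
cached_sum (unsigned n)
{
  if (n == 0)
    return 0;

  /* Lookup only: no slot is reserved before the computation.  */
  if (capacity)
    {
      struct entry *hit = find_slot (table, capacity, n);
      if (hit->used)
        return hit->value;
    }

  /* The recursion may insert other entries and move the table.  */
  unsigned long value = cached_sum (n - 1) + n;

  /* Look up again for insertion so that the slot pointer is fresh.  */
  if ((count + 1) * 2 > capacity)
    grow ();
  struct entry *slot = find_slot (table, capacity, n);
  if (!slot->used)
    count++;
  slot->used = 1;
  slot->key = n;
  slot->value = value;
  return value;
}
```

Calling `cached_sum (100)` grows the table several times during the recursion, yet every store goes through a slot obtained after the last possible move, so cases (a), (b) and (c) described in the original report cannot arise in this sketch.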


Re: [libgomp] Pass CC to the libgomp testsuite to capture -sysroot

2011-09-16 Thread Diego Novillo

On 11-09-16 16:40 , Mike Stump wrote:
> On Sep 16, 2011, at 1:13 PM, Diego Novillo wrote:
>> I would like to push this libgomp patch upstream.  We need it to pass
>> sysroot to GCC when testing libgomp.
>
> I'd be curious if Ian likes this...  Can one still set GCC_UNDER_TEST in the
> site.exp file or do other unholy things with it after your patch?  I'm thinking
> about things like in tree testing v out of tree testing.

I don't see why not.  The patch does not re-set GCC_UNDER_TEST, it just 
uses it whole instead of getting the compiler part.  But I think I'm 
missing your point.



Diego.


Re: [libgomp] Pass CC to the libgomp testsuite to capture -sysroot

2011-09-16 Thread Diego Novillo
On Fri, Sep 16, 2011 at 16:49, Diego Novillo  wrote:
> On 11-09-16 16:40 , Mike Stump wrote:
>>
>> On Sep 16, 2011, at 1:13 PM, Diego Novillo wrote:
>>>
>>> I would like to push this libgomp patch upstream.  We need it to pass
>>> sysroot to GCC when testing libgomp.
>>
>> I'd be curious if Ian likes this...  Can one still set GCC_UNDER_TEST in
>> the site.exp file or do other unholy things with it after your patch?  I'm
>> thinking about things like in tree testing v out of tree testing.
>>
> I don't see why not.  The patch does not re-set GCC_UNDER_TEST, it just uses
> it whole instead of getting the compiler part.  But I think I'm missing your
> point.

Never mind.  I see what you mean now.


Re: [RFC] Add FMA support to sparc backend

2011-09-16 Thread Eric Botcazou
> There have never been TFmode float operations implemented in hardware
> ever for sparc, and I doubt we'll see it in the future.
>
> And this applies also to the FMA instructions.

Do the specs totally disregard quad floats for FMA or...?

> And especially since the presence of the FMA patterns is meant to be a
> performance enhancement, I don't see much value to considering TFmode
> cases.
>
> Did you have something specific in mind?

No, this was purely for my own education. :-)  Maybe a comment explaining the 
situation/(non-)implementation choice wrt TFmode would be in order.

-- 
Eric Botcazou


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-16 Thread Eric Botcazou
> I considered trying to make a set of VIS headers compatible with the
> vis_*.h headers Sun provides in medialib and Sun Studio, but that's
> not possible since we use fundamentally different types in the
> builtins provided by GCC.
>
> Sun uses "double" and "float" in the declarations whereas we use our
> vector types.
>
> I even checked various users of Sun's VIS intrinsics and they all just
> declare their vector variables as "float" and "double" so it would be
> impossible to provide headers that would work out of the box.

Yes, I have some recollections of that.

> Eric, any objections?

None, this looks OK to me.

-- 
Eric Botcazou


Re: [RFC] Add FMA support to sparc backend

2011-09-16 Thread David Miller
From: Eric Botcazou 
Date: Fri, 16 Sep 2011 22:53:09 +0200

>> There have never been TFmode float operations implemented in hardware
>> ever for sparc, and I doubt we'll see it in the future.
>>
>> And this applies also to the FMA instructions.
> 
> Do the specs totally disregard quad floats for FMA or...?

The documentation I've read merely states the presence of single and
double precision versions of these instructions, and their behavior.

The same is also the case for all of the HPC instructions (such as
"fhadd" which is "floating point add and halve").  Only single and
double precision versions are provided and described.

Absolutely no consideration or mention is given to quad precision at
all.

These are instruction set extensions, rather than an addition or
modification to v9.  So I wouldn't go so far as to say that they have
some requirement to take quad floating point into consideration, or
even mention it at all.



Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-16 Thread David Miller
From: Eric Botcazou 
Date: Fri, 16 Sep 2011 23:01:56 +0200

>> Eric, any objections?
> 
> None, this looks OK to me.

Thanks Eric, I'll check this in.


C++ PATCH for c++/50424 (wrong code with throwing default argument)

2011-09-16 Thread Jason Merrill
We collect information about whether a function can throw as we compile 
the function: if we build up a call that can throw, then the current 
function can throw, too.  But we weren't doing the same for default 
arguments used in a call, which might themselves contain calls that can 
throw.


Tested x86_64-pc-linux-gnu, applying to trunk and a smaller patch to 4.6.
commit 98ffb624642b54592956dae50c744300ba3a29c0
Author: Jason Merrill 
Date:   Thu Sep 15 14:31:50 2011 -0400

	PR c++/50424
	* call.c (set_flags_from_callee): Split out from build_call_a.
	* cp-tree.h: Declare it.
	* tree.c (bot_manip): Call it.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 81df80e..bdbede7 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -306,11 +306,32 @@ build_call_n (tree function, int n, ...)
 }
 }
 
-tree
-build_call_a (tree function, int n, tree *argarray)
+/* Update various flags in cfun and the call itself based on what is being
+   called.  Split out of build_call_a so that bot_manip can use it too.  */
+
+void
+set_flags_from_callee (tree call)
 {
-  int is_constructor = 0;
   int nothrow;
+  tree decl = get_callee_fndecl (call);
+
+  /* We check both the decl and the type; a function may be known not to
+ throw without being declared throw().  */
+  nothrow = ((decl && TREE_NOTHROW (decl))
+	 || TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call)))));
+
+  if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
+cp_function_chain->can_throw = 1;
+
+  if (decl && TREE_THIS_VOLATILE (decl) && cfun && cp_function_chain)
+current_function_returns_abnormally = 1;
+
+  TREE_NOTHROW (call) = nothrow;
+}
+
+tree
+build_call_a (tree function, int n, tree *argarray)
+{
   tree decl;
   tree result_type;
   tree fntype;
@@ -327,60 +348,45 @@ build_call_a (tree function, int n, tree *argarray)
   if (SCALAR_TYPE_P (result_type) || VOID_TYPE_P (result_type))
 result_type = cv_unqualified (result_type);
 
-  if (TREE_CODE (function) == ADDR_EXPR
-  && TREE_CODE (TREE_OPERAND (function, 0)) == FUNCTION_DECL)
+  function = build_call_array_loc (input_location,
+   result_type, function, n, argarray);
+  set_flags_from_callee (function);
+
+  decl = get_callee_fndecl (function);
+
+  if (decl && !TREE_USED (decl))
 {
-  decl = TREE_OPERAND (function, 0);
-  if (!TREE_USED (decl))
-	{
-	  /* We invoke build_call directly for several library
-	 functions.  These may have been declared normally if
-	 we're building libgcc, so we can't just check
-	 DECL_ARTIFICIAL.  */
-	  gcc_assert (DECL_ARTIFICIAL (decl)
-		  || !strncmp (IDENTIFIER_POINTER (DECL_NAME (decl)),
-   "__", 2));
-	  mark_used (decl);
-	}
+  /* We invoke build_call directly for several library
+	 functions.  These may have been declared normally if
+	 we're building libgcc, so we can't just check
+	 DECL_ARTIFICIAL.  */
+  gcc_assert (DECL_ARTIFICIAL (decl)
+		  || !strncmp (IDENTIFIER_POINTER (DECL_NAME (decl)),
+			   "__", 2));
+  mark_used (decl);
 }
-  else
-decl = NULL_TREE;
-
-  /* We check both the decl and the type; a function may be known not to
- throw without being declared throw().  */
-  nothrow = ((decl && TREE_NOTHROW (decl))
-	 || TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (function))));
-
-  if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
-cp_function_chain->can_throw = 1;
-
-  if (decl && TREE_THIS_VOLATILE (decl) && cfun && cp_function_chain)
-current_function_returns_abnormally = 1;
 
   if (decl && TREE_DEPRECATED (decl))
 warn_deprecated_use (decl, NULL_TREE);
   require_complete_eh_spec_types (fntype, decl);
 
-  if (decl && DECL_CONSTRUCTOR_P (decl))
-is_constructor = 1;
+  TREE_HAS_CONSTRUCTOR (function) = (decl && DECL_CONSTRUCTOR_P (decl));
 
   /* Don't pass empty class objects by value.  This is useful
  for tags in STL, which are used to control overload resolution.
  We don't need to handle other cases of copying empty classes.  */
   if (! decl || ! DECL_BUILT_IN (decl))
 for (i = 0; i < n; i++)
-  if (is_empty_class (TREE_TYPE (argarray[i]))
-	  && ! TREE_ADDRESSABLE (TREE_TYPE (argarray[i])))
-	{
-	  tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (argarray[i]));
-	  argarray[i] = build2 (COMPOUND_EXPR, TREE_TYPE (t),
-argarray[i], t);
-	}
-
-  function = build_call_array_loc (input_location,
-   result_type, function, n, argarray);
-  TREE_HAS_CONSTRUCTOR (function) = is_constructor;
-  TREE_NOTHROW (function) = nothrow;
+  {
+	tree arg = CALL_EXPR_ARG (function, i);
+	if (is_empty_class (TREE_TYPE (arg))
+	&& ! TREE_ADDRESSABLE (TREE_TYPE (arg)))
+	  {
+	tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (arg));
+	arg = build2 (COMPOUND_EXPR, TREE_TYPE (t), arg, t);
+	CALL_EXPR_ARG (function, i) = arg;
+	  }
+  }
 
   return function;
 }
@@ -6736,7 +6742,6 @@ build_cxx_call (tree fn, int nargs, tree *argarray)
   fn = build_call_a (

Go patch committed: send/receive on nil channel blocks forever

2011-09-16 Thread Ian Lance Taylor
The Go language was clarified so that a send/receive on a nil channel
blocks forever, which makes nil channels not very useful but consistent
with select.  This patch implements that.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

Index: libgo/runtime/go-reflect-chan.c
===
--- libgo/runtime/go-reflect-chan.c	(revision 178910)
+++ libgo/runtime/go-reflect-chan.c	(working copy)
@@ -45,18 +45,13 @@ chansend (struct __go_channel_type *ct, 
   void *pv;
 
   __go_assert (ct->__common.__code == GO_CHAN);
-  __go_assert (__go_type_descriptors_equal (ct->__element_type,
-	channel->element_type));
 
-  if (channel == NULL)
-__go_panic_msg ("send to nil channel");
-
-  if (__go_is_pointer_type (channel->element_type))
+  if (__go_is_pointer_type (ct->__element_type))
 pv = &val_i;
   else
 pv = (void *) val_i;
 
-  element_size = channel->element_type->__size;
+  element_size = ct->__element_type->__size;
   if (element_size <= sizeof (uint64_t))
 {
   union
@@ -112,12 +107,10 @@ chanrecv (struct __go_channel_type *ct, 
   struct chanrecv_ret ret;
 
   __go_assert (ct->__common.__code == GO_CHAN);
-  __go_assert (__go_type_descriptors_equal (ct->__element_type,
-	channel->element_type));
 
-  element_size = channel->element_type->__size;
+  element_size = ct->__element_type->__size;
 
-  if (__go_is_pointer_type (channel->element_type))
+  if (__go_is_pointer_type (ct->__element_type))
 pv = &ret.val;
   else
 {
Index: libgo/runtime/channel.h
===
--- libgo/runtime/channel.h	(revision 178784)
+++ libgo/runtime/channel.h	(working copy)
@@ -147,3 +147,6 @@ extern void __go_builtin_close (struct _
 extern int __go_chan_len (struct __go_channel *);
 
 extern int __go_chan_cap (struct __go_channel *);
+
+extern uintptr_t __go_select (uintptr_t, _Bool, struct __go_channel **,
+			  _Bool *);
Index: libgo/runtime/go-send-big.c
===
--- libgo/runtime/go-send-big.c	(revision 178784)
+++ libgo/runtime/go-send-big.c	(working copy)
@@ -17,7 +17,10 @@ __go_send_big (struct __go_channel* chan
   size_t offset;
 
   if (channel == NULL)
-__go_panic_msg ("send to nil channel");
+{
+  // Block forever.
+  __go_select (0, 0, NULL, NULL);
+}
 
   element_size = channel->element_type->__size;
   alloc_size = (element_size + sizeof (uint64_t) - 1) / sizeof (uint64_t);
Index: libgo/runtime/go-send-nb-small.c
===
--- libgo/runtime/go-send-nb-small.c	(revision 178784)
+++ libgo/runtime/go-send-nb-small.c	(working copy)
@@ -93,6 +93,9 @@ __go_send_nonblocking_acquire (struct __
 _Bool
 __go_send_nonblocking_small (struct __go_channel *channel, uint64_t val)
 {
+  if (channel == NULL)
+return 0;
+
   __go_assert (channel->element_type->__size <= sizeof (uint64_t));
 
   if (!__go_send_nonblocking_acquire (channel))
Index: libgo/runtime/chan.goc
===
--- libgo/runtime/chan.goc	(revision 178784)
+++ libgo/runtime/chan.goc	(working copy)
@@ -6,6 +6,8 @@ package runtime
 #include "config.h"
 #include "channel.h"
 
+#define nil NULL
+
 typedef _Bool bool;
 typedef unsigned char byte;
 typedef struct __go_channel chan;
@@ -13,7 +15,7 @@ typedef struct __go_channel chan;
 /* Do a channel receive with closed status.  */
 
 func chanrecv2(c *chan, val *byte) (received bool) {
-	uintptr_t element_size = c->element_type->__size;
+	uintptr_t element_size = c == nil ? 0 : c->element_type->__size;
 	if (element_size > 8) {
 		return __go_receive_big(c, val, 0);
 	} else {
Index: libgo/runtime/go-send-nb-big.c
===
--- libgo/runtime/go-send-nb-big.c	(revision 178784)
+++ libgo/runtime/go-send-nb-big.c	(working copy)
@@ -15,6 +15,9 @@ __go_send_nonblocking_big (struct __go_c
   size_t alloc_size;
   size_t offset;
 
+  if (channel == NULL)
+return 0;
+
   element_size = channel->element_type->__size;
   alloc_size = (element_size + sizeof (uint64_t) - 1) / sizeof (uint64_t);
 
Index: libgo/runtime/go-send-small.c
===
--- libgo/runtime/go-send-small.c	(revision 178784)
+++ libgo/runtime/go-send-small.c	(working copy)
@@ -145,7 +145,10 @@ void
 __go_send_small (struct __go_channel *channel, uint64_t val, _Bool for_select)
 {
   if (channel == NULL)
-__go_panic_msg ("send to nil channel");
+{
+  // Block forever.
+  __go_select (0, 0, NULL, NULL);
+}
 
   __go_assert (channel->element_type->__size <= sizeof (uint64_t));
 
Index: libgo/runtime/go-rec-big.c
===
--- libgo/runtime/go-rec-big.c	(revision 178784)
+++ libgo/runt

Go patch committed: Better errors for invalid [...]type

2011-09-16 Thread Ian Lance Taylor
This patch to the Go frontend improves the error handling for invalid
use of [...]type, which may only be used with a composite literal.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian

Index: gcc/go/gofrontend/parse.cc
===
--- gcc/go/gofrontend/parse.cc	(revision 178784)
+++ gcc/go/gofrontend/parse.cc	(working copy)
@@ -2761,8 +2761,21 @@ Parse::primary_expr(bool may_be_sink, bo
 	  else
 	this->advance_token();
 	  if (expr->is_error_expression())
-	return expr;
-	  ret = Expression::make_cast(ret->type(), expr, loc);
+	ret = expr;
+	  else
+	{
+	  Type* t = ret->type();
+	  if (t->classification() == Type::TYPE_ARRAY
+		  && t->array_type()->length() != NULL
+		  && t->array_type()->length()->is_nil_expression())
+		{
+		  error_at(ret->location(),
+			   "invalid use of %<...%> in type conversion");
+		  ret = Expression::make_error(loc);
+		}
+	  else
+		ret = Expression::make_cast(t, expr, loc);
+	}
 	}
 }
 
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc	(revision 178870)
+++ gcc/go/gofrontend/expressions.cc	(working copy)
@@ -11789,7 +11789,7 @@ Array_construction_expression::do_check_
 }
 
   Expression* length = at->length();
-  if (length != NULL)
+  if (length != NULL && !length->is_error_expression())
 {
   mpz_t val;
   mpz_init(val);
Index: gcc/testsuite/go.test/test/ddd1.go
===
--- gcc/testsuite/go.test/test/ddd1.go	(revision 178784)
+++ gcc/testsuite/go.test/test/ddd1.go	(working copy)
@@ -15,7 +15,7 @@ var (
 	_ = sum()
 	_ = sum(1.0, 2.0)
 	_ = sum(1.5)  // ERROR "integer"
-	_ = sum("hello")  // ERROR "convert|incompatible"
+	_ = sum("hello")  // ERROR "string.*as type int|incompatible"
 	_ = sum([]int{1}) // ERROR "slice literal.*as type int|incompatible"
 )
 
@@ -43,4 +43,7 @@ func bad(args ...int) {
 	var x int
 	_ = unsafe.Pointer(&x...)	// ERROR "[.][.][.]"
 	_ = unsafe.Sizeof(x...)	// ERROR "[.][.][.]"
+	_ = [...]byte("foo") // ERROR "[.][.][.]"
+	_ = [...][...]int{{1,2,3},{4,5,6}}	// ERROR "[.][.][.]"
 }
+


PATCH: Replace tmp with __tmp

2011-09-16 Thread H.J. Lu
Hi,

We should use __tmp instead of tmp in intrinsics.  OK for trunk?

Thanks.

H.J.
---
2011-09-16  H.J. Lu  

* config/i386/bmiintrin.h: Replace tmp with __tmp.
* config/i386/tbmintrin.h: Likewise.

diff --git a/gcc/config/i386/bmiintrin.h b/gcc/config/i386/bmiintrin.h
index af5d9dc..72ab114 100644
--- a/gcc/config/i386/bmiintrin.h
+++ b/gcc/config/i386/bmiintrin.h
@@ -42,8 +42,8 @@ __tzcnt_u16 (unsigned short __X)
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __andn_u32 (unsigned int __X, unsigned int __Y)
 {
-  unsigned int tmp = ~(__X) & (__Y);
-  return tmp;
+  unsigned int __tmp = ~(__X) & (__Y);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -55,22 +55,22 @@ __bextr_u32 (unsigned int __X, unsigned int __Y)
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blsi_u32 (unsigned int __X)
 {
-  unsigned int tmp = (__X) & (-(__X));
-  return tmp;
+  unsigned int __tmp = (__X) & (-(__X));
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blsmsk_u32 (unsigned int __X)
 {
-  unsigned int tmp = (__X) ^ (__X - 1);
-  return tmp;
+  unsigned int __tmp = (__X) ^ (__X - 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blsr_u32 (unsigned int __X)
 {
-  unsigned int tmp = (__X) & (__X - 1);
-  return tmp;
+  unsigned int __tmp = (__X) & (__X - 1);
+  return __tmp;
 }
 
 
@@ -85,8 +85,8 @@ __tzcnt_u32 (unsigned int __X)
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
 __andn_u64 (unsigned long long __X, unsigned long long __Y)
 {
-  unsigned long long tmp = ~(__X) & (__Y);
-  return tmp;
+  unsigned long long __tmp = ~(__X) & (__Y);
+  return __tmp;
 }
 
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
@@ -98,22 +98,22 @@ __bextr_u64 (unsigned long long __X, unsigned long long __Y)
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
 __blsi_u64 (unsigned long long __X)
 {
-  unsigned long long tmp = (__X) & (-(__X));
-  return tmp;
+  unsigned long long __tmp = (__X) & (-(__X));
+  return __tmp;
 }
 
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
 __blsmsk_u64 (unsigned long long __X)
 {
-  unsigned long long tmp = (__X) ^ (__X - 1);
-  return tmp;
+  unsigned long long __tmp = (__X) ^ (__X - 1);
+  return __tmp;
 }
 
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
 __blsr_u64 (unsigned long long __X)
 {
-  unsigned long long tmp = (__X) & (__X - 1);
-  return tmp;
+  unsigned long long __tmp = (__X) & (__X - 1);
+  return __tmp;
 }
 
 extern __inline unsigned long long __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))
diff --git a/gcc/config/i386/tbmintrin.h b/gcc/config/i386/tbmintrin.h
index 8d2431d..0eb7c0a 100644
--- a/gcc/config/i386/tbmintrin.h
+++ b/gcc/config/i386/tbmintrin.h
@@ -47,64 +47,64 @@ __bextri_u32 (unsigned int __X, const unsigned int __I)
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blcfill_u32 (unsigned int __X)
 {
-   unsigned int tmp = (__X) & ((__X) + 1);
-   return tmp;
+  unsigned int __tmp = (__X) & ((__X) + 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blci_u32 (unsigned int __X)
 {
-   unsigned int tmp = (__X) | (~((__X) + 1));
-   return tmp;
+  unsigned int __tmp = (__X) | (~((__X) + 1));
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blcic_u32 (unsigned int __X)
 {
-   unsigned int tmp = (~(__X)) & ((__X) + 1);
-   return tmp;
+  unsigned int __tmp = (~(__X)) & ((__X) + 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blcmsk_u32 (unsigned int __X)
 {
-   unsigned int tmp = (__X) ^ ((__X) + 1);
-   return tmp;
+  unsigned int __tmp = (__X) ^ ((__X) + 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blcs_u32 (unsigned int __X)
 {
-   unsigned int tmp = (__X) | ((__X) + 1);
-   return tmp;
+  unsigned int __tmp = (__X) | ((__X) + 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 __blsfill_u32 (unsigned int __X)
 {
-   unsigned int tmp = (__X) | ((__X) - 1);
-   return tmp;
+  unsigned int __tmp = (__X) | ((__X) - 1);
+  return __tmp;
 }
 
 extern __inline unsigned int __attri

Re: PATCH: Replace tmp with __tmp

2011-09-16 Thread Andreas Schwab
"H.J. Lu"  writes:

> diff --git a/gcc/config/i386/bmiintrin.h b/gcc/config/i386/bmiintrin.h
> index af5d9dc..72ab114 100644
> --- a/gcc/config/i386/bmiintrin.h
> +++ b/gcc/config/i386/bmiintrin.h
> @@ -42,8 +42,8 @@ __tzcnt_u16 (unsigned short __X)
>  extern __inline unsigned int __attribute__((__gnu_inline__, 
> __always_inline__, __artificial__))
>  __andn_u32 (unsigned int __X, unsigned int __Y)
>  {
> -  unsigned int tmp = ~(__X) & (__Y);
> -  return tmp;
> +  unsigned int __tmp = ~(__X) & (__Y);
> +  return __tmp;

How about just removing it?  (And the parens are redundant, too.)

Andreas.
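
The suggestion above (drop the temporary and the redundant parentheses) might look roughly like the following. This is a sketch with hypothetical plain-C names and fixed-width types, not the definitions in bmiintrin.h; the attributes from the header are omitted.

```c
#include <stdint.h>

/* Clear in Y every bit that is set in X.  */
static inline uint32_t
andn_u32 (uint32_t x, uint32_t y)
{
  return ~x & y;
}

/* Isolate the lowest set bit of X.  */
static inline uint32_t
blsi_u32 (uint32_t x)
{
  return x & -x;
}

/* Mask of all bits up to and including the lowest set bit of X.  */
static inline uint32_t
blsmsk_u32 (uint32_t x)
{
  return x ^ (x - 1);
}

/* Clear the lowest set bit of X.  */
static inline uint32_t
blsr_u32 (uint32_t x)
{
  return x & (x - 1);
}
```

Returning the expression directly leaves no local name that could clash with a user macro, which is the concern the rename to __tmp addresses.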

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."