Re: [PATCH] libgomp: Add RTEMS support

2015-03-31 Thread Sebastian Huber



On 13/03/15 11:43, Jakub Jelinek wrote:

On Fri, Mar 13, 2015 at 11:38:12AM +0100, Sebastian Huber wrote:

I would like to commit this patch to GCC 4.9 and 5.0.

libgomp/ChangeLog
2015-03-13  Sebastian Huber  

* configure.tgt (*-*-rtems*): Use local-exec TLS model.
* configure.ac (*-*-rtems*): Assume Pthread is supported.
(pthread.h): Check for this header file.
* configure: Regenerate.

Ok for trunk.
Please wait with the backports for a few weeks.


May I back port this patch now? It would be nice to have it available in 
GCC 4.9.3.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [PATCH] libgomp: Add RTEMS support

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 09:24:30AM +0200, Sebastian Huber wrote:
> 
> 
> On 13/03/15 11:43, Jakub Jelinek wrote:
> >On Fri, Mar 13, 2015 at 11:38:12AM +0100, Sebastian Huber wrote:
> >>I would like to commit this patch to GCC 4.9 and 5.0.
> >>
> >>libgomp/ChangeLog
> >>2015-03-13  Sebastian Huber  
> >>
> >>* configure.tgt (*-*-rtems*): Use local-exec TLS model.
> >>* configure.ac (*-*-rtems*): Assume Pthread is supported.
> >>(pthread.h): Check for this header file.
> >>* configure: Regenerate.
> >Ok for trunk.
> >Please wait with the backports for a few weeks.
> 
> May I back port this patch now? It would be nice to have it available in GCC
> 4.9.3.

Yes.

Jakub


Another inline heuristic tweak

2015-03-31 Thread Jan Hubicka
Hi,
I was investigating regressions relative to GCC 4.9 on some of the Firefox talos
benchmarks.  The cause turned out to be wrong handling of functions
that have a fast inline path plus an offline slow implementation.
I.e.:
 inline_caller ()
   {
     do_fast_job...
     if (need_more_work)
       noninline_callee ();
   }

Our inline heuristics give quite a lot of priority to inlinable functions
called once because they know the code size will shrink.  They thus almost
consistently inline noninline_callee first, before even considering
inline_caller, which is way too large at that point.  This seems to have got worse
with the new inline metric.  Also, LLVM seems to get lucky here, because often
the offline function is placed in another compilation unit.  LLVM inlines quite
aggressively at compile time even with LTO and thus it usually inlines the caller
before even seeing the callee's body.
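
For illustration, a concrete C instance of this pattern might look like the
sketch below.  The cache structure and all function names are hypothetical;
they are not taken from Firefox or GCC and only show the shape of a fast
inline wrapper around an out-of-line slow path.

```c
/* Hypothetical one-entry cache: the hit case is handled inline, a miss
   calls an out-of-line function.  */
struct cache
{
  int key;
  int value;
  int valid;
  int slow_calls;   /* Counts how often the slow path ran.  */
};

/* Slow path, kept out of line so that the wrapper stays small.  */
__attribute__ ((noinline)) int
cache_lookup_slow (struct cache *c, int key)
{
  c->slow_calls++;
  c->key = key;
  c->value = key * 2;   /* Stand-in for an expensive computation.  */
  c->valid = 1;
  return c->value;
}

/* Wrapper: a hit is one compare and one load; only a miss calls out.  */
static inline int
cache_lookup (struct cache *c, int key)
{
  if (c->valid && c->key == key)
    return c->value;
  return cache_lookup_slow (c, key);
}
```

If cache_lookup_slow were inlined into cache_lookup first (it has a single
caller), the wrapper would grow and become a much less attractive candidate
for inlining into its own callers.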

This patch adjusts the badness metric to recognize this specific pattern and use
the caller's overall_growth instead of the callee's.  The long conditional is there to
make this fire only in quite controlled situations (either because the user
declared the wrapper inline or the function was produced by ipa-split).
I manually inspected a quite representative sample of the cases where this
triggers on Firefox and GCC and they all seem quite sane.  Next stage1 I
may turn this into a simple propagation that will probably lead to better
results.  The patch as it is fixes the regression and improves significantly
over 4.9 (by up to 32% on individual parts of dromaeo).

The patch also adds an explanation to yesterday's change suggested by Richard,
and I added a capping of the power to 64 - it seems to matter only for
small functions and I am concerned about overflows and the effect of extremely
large values.  The periodic testers did not show any noticeable regressions
caused by the patch and I have verified that all the speedup remains with the
capping.

I am running additional benchmarks overnight and intend to commit it tomorrow
if testing succeeds.

Bootstrapped/regtested x86_64-linux.

* lto-cgraph.c (lto_output_node, input_overwrite_node): Stream
split_part.
* ipa-inline.c (edge_badness): Add wrapper penalty.
(sum_callers): Move up.
(inline_small_functions): Set single_caller.
* ipa-inline.h (inline_summary): Add single_caller.
* ipa-split.c (split_function): Set split_part.
(cgraph_node::create_clone): Do not shadow decl; copy split_part.
* cgraph.h (cgraph_node): Add split_part.

* gcc.dg/ipa/inlinehint-4.c: New testcase.
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 221777)
+++ lto-cgraph.c(working copy)
@@ -578,6 +578,7 @@ lto_output_node (struct lto_simple_outpu
   bp_pack_enum (&bp, ld_plugin_symbol_resolution,
LDPR_NUM_KNOWN, node->resolution);
   bp_pack_value (&bp, node->instrumentation_clone, 1);
+  bp_pack_value (&bp, node->split_part, 1);
   streamer_write_bitpack (&bp);
   streamer_write_data_stream (ob->main_stream, section, strlen (section) + 1);
 
@@ -1214,6 +1216,7 @@ input_overwrite_node (struct lto_file_de
   node->resolution = bp_unpack_enum (bp, ld_plugin_symbol_resolution,
 LDPR_NUM_KNOWN);
   node->instrumentation_clone = bp_unpack_value (bp, 1);
+  node->split_part = bp_unpack_value (bp, 1);
   gcc_assert (flag_ltrans
  || (!node->in_other_partition
  && !node->used_from_other_partition));
Index: ipa-inline.c
===
--- ipa-inline.c(revision 221777)
+++ ipa-inline.c(working copy)
@@ -1088,6 +1088,7 @@ edge_badness (struct cgraph_edge *edge,
   else if (opt_for_fn (caller->decl, flag_guess_branch_prob) || caller->count)
 {
   sreal numerator, denominator;
+  int overall_growth;
 
   numerator = (compute_uninlined_call_time (callee_info, edge)
   - compute_inlined_call_time (edge, edge_time));
@@ -1098,8 +1099,74 @@ edge_badness (struct cgraph_edge *edge,
   else if (opt_for_fn (caller->decl, flag_branch_probabilities))
numerator = numerator >> 11;
   denominator = growth;
-  if (callee_info->growth > 0)
-   denominator *= callee_info->growth * callee_info->growth;
+
+  overall_growth = callee_info->growth;
+
+  /* Look for inliner wrappers of the form:
+
+inline_caller ()
+  {
+do_fast_job...
+if (need_more_work)
+  noninline_callee ();
+  }
+Withhout panilizing this case, we usually inline noninline_callee
+into the inline_caller because overall_growth is small preventing
+further inlining of inline_caller.
+
+Penalize only callgraph edges to functions with small overall
+growth ...
+   */
+  if (growth > overall_growth
+ /* ... and having only one caller whic

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:
> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence  
> wrote:
>>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>it.
>>
>>The problem appears to be in laying out arguments, specifically
>>varargs. From
>>the "good" -fdump-rtl-expand:
>>
>>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>S4 A32])
>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>  (nil))
>>(insn 19 18 20 2 (set (reg:DF 2 r2)
>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>  (nil))
>>(insn 20 19 21 2 (set (reg:SI 1 r1)
>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>  (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>> (reg:SI 118)) reduced.c:14 -1
>>  (nil))
>>(call_insn 22 21 23 2 (parallel [
>> (set (reg:SI 0 r0)
>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>) [0 __builtin_printf S4
>>A32])
>>
>>The struct members are
>>reg:SI 113 => int a;
>>reg:DF 112 => double b;
>>reg:SI 111 => int c;
>>
>>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>pushed
>>into virtual-outgoing-args. In contrast, post-change to
>>build_ref_of_offset, we get:
>>
>>(insn 17 16 18 2 (set (reg:SI 118)
>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  >*.LC1>)) reduced.c:14 -1
>>  (nil))
>>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>virtual-outgoing-args)
>> (const_int 8 [0x8])) [0  S4 A64])
>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>  (nil))
>>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>S8 A64])
>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>  (nil))
>>(insn 20 19 21 2 (set (reg:SI 2 r2)
>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>  (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>> (reg:SI 118)) reduced.c:14 -1
>>  (nil))
>>(call_insn 22 21 23 2 (parallel [
>> (set (reg:SI 0 r0)
>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>) [0 __builtin_printf S4
>>A32])
>>
>>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>
>>double+following int are all pushed into virtual-outgoing-args. This is
>>because
>>arm_function_arg is fed a 64-bit-aligned int as type of the second
>>argument (the
>>type constructed by build_ref_for_offset); it then executes
>>(aapcs_layout_arg,
>>arm.c line ~~5914)
>>
>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>  next even number.  */
>>   ncrn = pcum->aapcs_ncrn;
>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>> ncrn++;
>>
>>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>testcase
>>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>32-bit-aligned int instead, which works as previously.
>>
>>Passing the same members of that struct in a non-vargs call, works ok -
>>I think
>>because these use the type of the declared parameters, rather than the
>>provided
>>arguments, and the former do not have the increased alignment from
>>build_ref_for_offset.
>
> It doesn't make sense to use the alignment of passed values.  That looks like 
> bs.
>
> This means that
>
> int i __aligned__(8);
>
> is passed differently than int.
>
> arm_function_arg needs to be fixed.

That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.  Special-casing aligned vs.
non-aligned values only makes sense for things passed by value on the stack.
And then obviously only dependent on the function type signature, not
on the type of the passed value.
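
To make the callee side of this concrete, here is a minimal sketch of a
variadic function that reads every argument as a plain int.  The function
names are hypothetical; the point is only that the callee has no way to ask
for an "aligned int", so both callers below are expected to produce the same
result.

```c
#include <stdarg.h>

typedef int myint __attribute__ ((aligned (8)));

/* Sum COUNT variadic arguments, reading each one as a plain int.  */
int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}

/* Pass an over-aligned int followed by a plain int.  */
int
sum_aligned_then_plain (void)
{
  myint i = 1;
  int j = 2;
  return sum_ints (2, i, j);
}

/* Pass two plain ints; this should be indistinguishable to the callee.  */
int
sum_plain_then_plain (void)
{
  int i = 1;
  int j = 2;
  return sum_ints (2, i, j);
}
```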

Richard.

> Richard.
>
>>
>>FWIW, I also tried:
>>
>>__attribute__((__aligned__((16)))) int x;
>>int main (void)
>>{
>>   __builtin_printf("%d\n", x);
>>}
>>
>>but in that case, the arm_function_arg is still fed a type with
>>alignment 32
>>(bits), i.e. distinct from the type of the field 'x' in memory, which
>>has
>>alignment 128.
>>
>>--Alan
>>
>>Richard Biener wrote:
>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>
>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>
>>>>> ...actually attach the testcase...
>>>> What compile options?
>>>
>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>> I can't see anything not guaranteeing that:
>>>
>>> .section.rodata
>>> .align  3
>>> .LANCHOR0 = . + 0
>>> .LC1:
>>> .ascii  "%d %g %d\012\000"
>>> .space  6
>>> .LC0:
>>> .word   7
>>> .space  4
>>> .word   0
>>> .word   1075838976
>>> .word   9
>>> .space  4
>>>
>>> maybe there is some more generic code-gen bug for aligned aggregate
>>> copy?  That is, the patch tells the backend th

Re: Silence merge warnings on artificial types

2015-03-31 Thread Jan Hubicka
> On 03/30/2015 01:23 PM, Jan Hubicka wrote:
> >Jason probably knows better, but I think only real C++ types comply with the
> >One Definition Rule and should be merged. Anything we create artificially in
> >the compiler is probably not covered by this.
> 
> Agreed, compiler internals are outside the scope of the language.  :)

:)

Hi,
this patch adds a TYPE_ARTIFICIAL check to avoid ODR merging of these types.
I originally tested DECL_ARTIFICIAL (decl) (that is, on the TYPE_NAME), which randomly
dropped type names on some classes but not all.

Jason, do you know what the meaning of DECL_ARTIFICIAL on class type
names is? Perhaps we can drop them to 0 in free lang data?

With this bug I triggered wrong devirtualization because we no longer insert
non-ODR types into the type inheritance graph.  This is fixed by the
lto_read_decls change.  Finally, I triggered an ICE in ipa-devirt, which due to
the bug output a warning late and ICEd on the streamer cache being NULL.  I guess
it is better to guard it even though all warnings should be output early.

Bootstrapped/regtested x86_64-linux, will commit it after chromium rebuild.

Honza

* tree.c (need_assembler_name_p): Artificial types have no ODR
names.
* ipa-devirt.c (warn_odr): Do not try to apply ODR cache when
no caching is done.

* lto.c (lto_read_decls): Move code registering odr types out
of TYPE_CANONICAL conditional and also register polymorphic types.
Index: tree.c
===
--- tree.c  (revision 221777)
@@ -5139,6 +5145,7 @@ need_assembler_name_p (tree decl)
   && decl == TYPE_NAME (TREE_TYPE (decl))
   && !is_lang_specific (TREE_TYPE (decl))
   && AGGREGATE_TYPE_P (TREE_TYPE (decl))
+  && !TYPE_ARTIFICIAL (TREE_TYPE (decl))
   && !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)
   && !type_in_anonymous_namespace_p (TREE_TYPE (decl)))
 return !DECL_ASSEMBLER_NAME_SET_P (decl);
Index: lto/lto.c
===
--- lto/lto.c   (revision 221777)
+++ lto/lto.c   (working copy)
@@ -1944,13 +1944,24 @@ lto_read_decls (struct lto_file_decl_dat
  lto_fixup_prevailing_type (t);
}
  /* Compute the canonical type of all types.
-???  Should be able to assert that !TYPE_CANONICAL.  */
+
+Inside a strongly connected component
+gimple_register_canonical_type may recurse and insert
+main variant ahead of time.  Thus the need to check
+TYPE_CANONICAL. */
  if (TYPE_P (t) && !TYPE_CANONICAL (t))
-   {
- gimple_register_canonical_type (t);
- if (odr_type_p (t))
-   register_odr_type (t);
-   }
+   gimple_register_canonical_type (t);
+
+ /* Reigster types to ODR hash.  If we compile unit w/o
+-fno-lto-odr-type-merging, also insert types with virtual
+tables to keep type inheritance graph complete on
+polymorphic types.  */
+ if (TYPE_P (t)
+ && (odr_type_p (t)
+ || (TYPE_MAIN_VARIANT (t) == t
+ && TREE_CODE (t) == RECORD_TYPE
+ && TYPE_BINFO (t) && BINFO_VTABLE (TYPE_BINFO (t)))))
+   register_odr_type (t);
  /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
 type which is also member of this SCC.  */
  if (TREE_CODE (t) == INTEGER_CST
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 221777)
+++ ipa-devirt.c(working copy)
@@ -939,7 +939,8 @@ warn_odr (tree t1, tree t2, tree st1, tr
 
   /* ODR warnings are output druing LTO streaming; we must apply location
  cache for potential warnings to be output correctly.  */
-  lto_location_cache::current_cache->apply_location_cache ();
+  if (lto_location_cache::current_cache)
+lto_location_cache::current_cache->apply_location_cache ();
 
   if (!warning_at (DECL_SOURCE_LOCATION (TYPE_NAME (t1)), OPT_Wodr,
   "type %qT violates one definition rule",


Re: [PATCH][ada][PR65490] Fix bzero warning in child_setup_tty

2015-03-31 Thread Arnaud Charlet
> OK for stage 4/stage1?
> 
> Thanks,
> - Tom

> Fix bzero warning in child_setup_tty
> 
> 2015-03-30  Tom de Vries  
> 
>   PR ada/65490
>   * terminals.c (child_setup_tty): Fix warning 'argument to sizeof in
>   bzero call is the same expression as the destination'.
> ---
>  gcc/ada/terminals.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ada/terminals.c b/gcc/ada/terminals.c
> index a46e610..eaaf1c2 100644
> --- a/gcc/ada/terminals.c
> +++ b/gcc/ada/terminals.c
> @@ -1262,8 +1262,8 @@ child_setup_tty (int fd)
>struct termios s;
>intstatus;
>  
> -  /* ensure that s is filled with 0 */
> -  bzero (&s, sizeof (&s));
> +  /* Ensure that s is filled with 0.  */

Please keep the comment as is, we do not put dots on single partial sentences
(otherwise you would have to change these everywhere, and you and I do not
really want that).

> +  bzero (&s, sizeof (s));

the above single line change is OK for stage 4, thanks.
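
For readers unfamiliar with this class of bug, the following sketch shows why
sizeof (&s) clears too little.  It is not the terminals.c code: the struct
and function names are hypothetical, and memset is used in place of bzero as
an assumption, since the length argument behaves the same way.

```c
#include <string.h>

/* Hypothetical structure, much larger than a pointer.  */
struct settings
{
  char name[64];
  int flags[16];
};

/* Old form: the length is the size of a pointer, so only the first few
   bytes of s are cleared.  */
void
init_settings_buggy (struct settings *out)
{
  struct settings s;
  size_t wrong_len = sizeof (struct settings *);  /* what sizeof (&s) yields */

  memset (&s, 0xff, sizeof (s));   /* Simulate uninitialized stack data.  */
  memset (&s, 0, wrong_len);
  *out = s;
}

/* Fixed form: the length is the size of the object itself.  */
void
init_settings_fixed (struct settings *out)
{
  struct settings s;

  memset (&s, 0xff, sizeof (s));
  memset (&s, 0, sizeof (s));
  *out = s;
}

/* Return 1 if every byte of *S is zero, 0 otherwise.  */
int
all_zero (const struct settings *s)
{
  const unsigned char *p = (const unsigned char *) s;

  for (size_t i = 0; i < sizeof (*s); i++)
    if (p[i] != 0)
      return 0;
  return 1;
}
```

With these definitions, all_zero reports a fully cleared object only for the
fixed variant.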

Arno


Re: [PATCH][ada][PR65490] Fix bzero warning in child_setup_tty

2015-03-31 Thread Arnaud Charlet
> > -  /* ensure that s is filled with 0 */
> > -  bzero (&s, sizeof (&s));
> > +  /* Ensure that s is filled with 0.  */
> 
> Please keep the comment as is, we do not put dots on single partial sentences
> (otherwise you would have to change these everywhere, and you and I do not
> really want that).
> 
> > +  bzero (&s, sizeof (s));
> 
> the above single line change is OK for stage 4, thanks.

Plus the copyright update (to 2015).

--
--- terminals.c (revision 313797)
+++ terminals.c (working copy)
@@ -6,7 +6,7 @@
  *  *
  *  C Implementation File   *
  *  *
- * Copyright (C) 2008-2014, AdaCore *
+ * Copyright (C) 2008-2015, AdaCore *
  *  *
  * GNAT is free software;  you can  redistribute it  and/or modify it under *
  * terms of the  GNU General Public License as published  by the Free Soft- *
@@ -1263,7 +1263,7 @@
   intstatus;

   /* ensure that s is filled with 0 */
-  bzero (&s, sizeof (&s));
+  bzero (&s, sizeof (s));

   /* Get the current terminal settings */
   status = tcgetattr (fd, &s);



Re: [PATCH, PR target/65602] Fix check_effective_target_mpx to check lib availability

2015-03-31 Thread Ilya Enkovich
2015-03-30 23:30 GMT+03:00 Rainer Orth :
> I originally reported the bug and did test the patch over the weekend:
> the Solaris/x86 testsuite failures are gone, so that part is fine.  I
> couldn't of course test the alloca -> __builtin_alloca change since the
> tests aren't built at all.
>
> I don't have a baseline for Linux/x86_64 without --enable-libmpx (the
> default) to compare against, but see in the gcc.log file that the mpx
> tests aren't run in that config due to missing -lmpx -lmpxwrappers.

Thanks for testing!

>
> I'd suggest (though this is stage1 material) to split the mpx tests into
> compile (requiring an assembler with mpx support) and link/run
> (also requiring the runtime libs) tests to extend test coverage.

I don't see much reason to run compile mpx tests on targets where mpx
is not supported. I think it would be enough to just enable libmpx by
default on stage 1.

Thanks
Ilya

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-03-31 Thread Kyrill Tkachov

Hi Evandro
On 30/03/15 22:51, Evandro Menezes wrote:

The Samsung Exynos M1 implements the ARMv8 ISA and this patch adds support
for it through the -mcpu command-line option.

The patch was checked on arm-unknown-linux-gnueabihf without new failures.

OK for trunk?

-- Evandro Menezes Austin, TX

0001-ARM-Add-option-for-the-Samsung-Exynos-M1-core-for-AR.patch


diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index b22ea7f..0710a38 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -168,6 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, 
cortexa7, 7A,  FL_LDSCHED |
  ARM_CORE("cortex-a53",  cortexa53, cortexa53,   8A, FL_LDSCHED | 
FL_CRC32, cortex_a53)
  ARM_CORE("cortex-a57",  cortexa57, cortexa57,   8A, FL_LDSCHED | 
FL_CRC32, cortex_a57)
  ARM_CORE("cortex-a72",  cortexa72, cortexa57,   8A, FL_LDSCHED | 
FL_CRC32, cortex_a57)
+ARM_CORE("exynos-m1",exynosm1,  exynosm1,8A, FL_LDSCHED | 
FL_CRC32, exynosm1)


There are two problems with this:
* The 3rd field of ARM_CORE represents the scheduling identifier and without a
separate pipeline description for exynosm1 this will just use the generic_sched
scheduler, which performs quite poorly on modern cores.  Would you prefer to
reuse a pipeline description from one of the pre-existing ones? Look for example
at the cortex-a72 definition:
ARM_CORE("cortex-a72",cortexa72, cortexa57,  <...snip>
here the cortexa57 means 'make scheduling decisions for cortexa57'.

* The final field in ARM_CORE specifies the tuning struct to be used for this core.
This should be defined in arm.c and have the form 'arm_<name>_tune', so for your
case it should be arm_exynosm1_tune. This isn't defined in your patch, so it won't
compile without that. You can write a custom tuning struct yourself, or reuse a
tuning struct for one of the existing cores, if you'd like.

Also, you should add exynosm1 to the switch statement in arm_issue_rate to specify
the issue rate. I have a patch for next stage1 that should refactor it all into the
tuning structs (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02706.html) but until
that goes in, you should fill in the switch statement there.
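
As a rough illustration of why an undefined tuning struct breaks the build,
here is a toy table built in the same style.  Everything prefixed toy_ or
TOY_ is hypothetical and much simpler than GCC's real ARM_CORE machinery, and
the issue rate is an arbitrary number.  The sketch takes the "reuse an
existing core" option for both the scheduler and the tuning struct.

```c
#include <string.h>

/* Toy model only; GCC's real macros and structures differ.  */
struct toy_tune { int issue_rate; };

/* Illustrative value, not a real tuning parameter.  */
static const struct toy_tune toy_cortexa57_tune = { 3 };

enum toy_sched { TOY_SCHED_generic, TOY_SCHED_cortexa57 };

struct toy_core
{
  const char *name;
  enum toy_sched sched;          /* Which pipeline description to use.  */
  const struct toy_tune *tune;   /* Which tuning struct to use.  */
};

/* One line per core: name, identifier, scheduler, tuning struct.  */
#define TOY_CORES \
  TOY_CORE ("cortex-a57", cortexa57, cortexa57, cortexa57) \
  TOY_CORE ("cortex-a72", cortexa72, cortexa57, cortexa57) \
  TOY_CORE ("exynos-m1",  exynosm1,  cortexa57, cortexa57)

/* The TUNE field is pasted into a variable name, so naming a struct
   that does not exist is a compile-time error.  */
#define TOY_CORE(NAME, IDENT, SCHED, TUNE) \
  { NAME, TOY_SCHED_##SCHED, &toy_##TUNE##_tune },
static const struct toy_core toy_cores[] = { TOY_CORES };
#undef TOY_CORE

/* Look up a core by its -mcpu style name; return NULL if unknown.  */
const struct toy_core *
toy_find_core (const char *name)
{
  for (size_t i = 0; i < sizeof toy_cores / sizeof toy_cores[0]; i++)
    if (strcmp (toy_cores[i].name, name) == 0)
      return &toy_cores[i];
  return NULL;
}
```

Changing the last field of the exynos-m1 line to exynosm1 without defining
toy_exynosm1_tune makes this sketch fail to compile, which mirrors the
problem described above.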

Thanks,
Kyrill




Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 08:50, Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:
>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>>  wrote:
>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>> it.
>>>
>>> The problem appears to be in laying out arguments, specifically
>>> varargs. From
>>> the "good" -fdump-rtl-expand:
>>>
>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>> S4 A32])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 __builtin_printf S4
>>> A32])
>>>
>>> The struct members are
>>> reg:SI 113 => int a;
>>> reg:DF 112 => double b;
>>> reg:SI 111 => int c;
>>>
>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>> pushed
>>> into virtual-outgoing-args. In contrast, post-change to
>>> build_ref_of_offset, we get:
>>>
>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  >> *.LC1>)) reduced.c:14 -1
>>>  (nil))
>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>> virtual-outgoing-args)
>>> (const_int 8 [0x8])) [0  S4 A64])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>> S8 A64])
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 __builtin_printf S4
>>> A32])
>>>
>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>
>>> double+following int are all pushed into virtual-outgoing-args. This is
>>> because
>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>> argument (the
>>> type constructed by build_ref_for_offset); it then executes
>>> (aapcs_layout_arg,
>>> arm.c line ~~5914)
>>>
>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>  next even number.  */
>>>   ncrn = pcum->aapcs_ncrn;
>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>> ncrn++;
>>>
>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>> testcase
>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>> 32-bit-aligned int instead, which works as previously.
>>>
>>> Passing the same members of that struct in a non-vargs call, works ok -
>>> I think
>>> because these use the type of the declared parameters, rather than the
>>> provided
>>> arguments, and the former do not have the increased alignment from
>>> build_ref_for_offset.
>>
>> It doesn't make sense to use the alignment of passed values.  That looks 
>> like bs.
>>
>> This means that
>>
>> int i __aligned__(8);
>>
>> is passed differently than int.
>>
>> arm_function_arg needs to be fixed.
> 
> That is,
> 
> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
> 
> or
> 
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
> 
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
> 
> should behave the same.  There isn't a printf modifier for an "aligned int"
> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> non-aligned values only makes sense for things passed by value on the stack.
> And then obviously only dependent on the function type signature, not
> on the type of the passed value.
> 

I think the testcase is ill-formed.  Just because printf doesn't have
such a modifier, doesn't mean that another variadic function might not
have a means to detect when an object in the variadic list needs to be
over-aligned.  As such, the test should really be written as:

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", (int) i, j);
}

Variadic functions take the types of their arguments from the types of
the actuals passed.  The compiler should either be applying appropriate
promotion rules to make the types conformant by the language
specification or respecting the types exactly.  However, that should be
done in the mid-end not the back-end.  If incorrect alignment
information is passed to the back-end it can't help but make the wrong
choice.  Examining the mo

Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX

2015-03-31 Thread Ilya Enkovich
On 23 Mar 13:19, Ilya Enkovich wrote:
> Hi,
> 
> May this patch go into trunk at this point?  It is very important for
> dynamic MPX codes.
> 
> Thanks,
> Ilya
> 

I additionally documented changes in invoke.texi.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-03-31  Ilya Enkovich  

PR driver/65444
* config/i386/linux-common.h (MPX_SPEC): New.
(CHKP_SPEC): Add MPX_SPEC.
* doc/invoke.texi (-fcheck-pointer-bounds): Document
possible issues with '-z bndplt' support in linker.

libmpx/

2015-03-31  Ilya Enkovich  

PR driver/65444
* configure.ac: Add check for '-z bndplt' support
by linker. Add link_mpx output variable.
* libmpx.spec.in (link_mpx): New.
* configure: Regenerate.


diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index 9c6560b..dd79ec6 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -59,6 +59,11 @@ along with GCC; see the file COPYING3.  If not see
  %:include(libmpx.spec)%(link_libmpx)"
 #endif
 
+#ifndef MPX_SPEC
+#define MPX_SPEC "\
+ %{mmpx:%{fcheck-pointer-bounds:%{!static:%:include(libmpx.spec)%(link_mpx)}}}"
+#endif
+
 #ifndef LIBMPX_SPEC
 #if defined(HAVE_LD_STATIC_DYNAMIC)
 #define LIBMPX_SPEC "\
@@ -89,5 +94,5 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifndef CHKP_SPEC
 #define CHKP_SPEC "\
-%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC LIBMPXWRAPPERS_SPEC "}}"
+%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC LIBMPXWRAPPERS_SPEC "}}" MPX_SPEC
 #endif
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bf8afad..c058710 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5857,7 +5857,16 @@ MPX-based instrumentation requires
 a runtime library to enable MPX in hardware and handle bounds
 violation signals.  By default when @option{-fcheck-pointer-bounds}
 and @option{-mmpx} options are used to link a program, the GCC driver
-links against the @file{libmpx} runtime library.  MPX-based instrumentation
+links against the @file{libmpx} runtime library and @file{libmpxwrappers}
+library.  It also passes '-z bndplt' to a linker in case it supports this
+option (which is checked on libmpx configuration).  Note that old versions
+of linker may ignore option.  Gold linker doesn't support '-z bndplt'
+option.  With no '-z bndplt' support in linker all calls to dynamic libraries
+lose passed bounds reducing overall protection level.  It's highly
+recommended to use linker with '-z bndplt' support.  In case such linker
+is not available it is adviced to always use @option{-static-libmpxwrappers}
+for better protection level or use @option{-static} to completely avoid
+external calls to dynamic libraries.  MPX-based instrumentation
 may be used for debugging and also may be included in production code
 to increase program security.  Depending on usage, you may
 have different requirements for the runtime library.  The current version
diff --git a/libmpx/configure.ac b/libmpx/configure.ac
index fe0d3f2..3f8b50f 100644
--- a/libmpx/configure.ac
+++ b/libmpx/configure.ac
@@ -40,7 +40,18 @@ AC_MSG_RESULT($LIBMPX_SUPPORTED)
 AM_CONDITIONAL(LIBMPX_SUPPORTED, [test "x$LIBMPX_SUPPORTED" = "xyes"])
 
 link_libmpx="-lpthread"
+link_mpx=""
+AC_MSG_CHECKING([whether ld accepts -z bndplt])
+echo "int main() {};" > conftest.c
+if AC_TRY_COMMAND([${CC} ${CFLAGS} -Wl,-z,bndplt -o conftest conftest.c 
1>&AS_MESSAGE_LOG_FD])
+then
+AC_MSG_RESULT([yes])
+link_mpx="$link_mpx -z bndplt"
+else
+AC_MSG_RESULT([no])
+fi
 AC_SUBST(link_libmpx)
+AC_SUBST(link_mpx)
 
 AM_INIT_AUTOMAKE(foreign no-dist no-dependencies)
 AM_ENABLE_MULTILIB(, ..)
diff --git a/libmpx/libmpx.spec.in b/libmpx/libmpx.spec.in
index a265e28..34d0bdf 100644
--- a/libmpx/libmpx.spec.in
+++ b/libmpx/libmpx.spec.in
@@ -1,3 +1,5 @@
 # This spec file is read by gcc when linking.  It is used to specify the
-# standard libraries we need in order to link with libcilkrts.
+# standard libraries we need in order to link with libmpx.
 *link_libmpx: @link_libmpx@
+
+*link_mpx: @link_mpx@


Re: New regression on ARM Linux

2015-03-31 Thread Alan Lawrence

Richard Biener wrote:

On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:

It doesn't make sense to use the alignment of passed values.  That looks like 
bs.

This means that

int i __aligned__(8);

is passed differently than int.

arm_function_arg needs to be fixed.


That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.


Agreed. All of the cases you post do indeed behave the same, and correctly (they 
pass the format string in r0, the next argument in r1, and then r2). This is 
what my "aligned(16)" example was trying to achieve also. From the 
-fdump-rtl-expand of your last example:


(insn 7 6 8 2 (set (reg:SI 117)
(mem:SI (reg/f:SI 116) [2 *_4+0 S4 A32])) richie2.c:10 -1
 (nil))
(insn 8 7 9 2 (set (reg/f:SI 118)
(symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) richie2.c:10 -1
 (nil))
(insn 9 8 10 2 (set (reg/f:SI 119)
(mem/f/c:SI (plus:SI (reg/f:SI 118)
(const_int 4 [0x4])) [1 p+0 S4 A32])) richie2.c:10 -1
 (nil))
(insn 10 9 11 2 (set (reg:SI 120)
(mem:SI (reg/f:SI 119) [2 *_2+0 S4 A64])) richie2.c:10 -1  ***
 (nil))
(insn 11 10 12 2 (set (reg:SI 121)
(symbol_ref/v/f:SI ("*.LC0") [flags 0x82]  *.LC0>)) richie2.c:10 -1

 (nil))
(insn 12 11 13 2 (set (reg:SI 2 r2)
(reg:SI 117)) richie2.c:10 -1
 (nil))
(insn 13 12 14 2 (set (reg:SI 1 r1)
(reg:SI 120)) richie2.c:10 -1
 (nil))
(insn 14 13 15 2 (set (reg:SI 0 r0)
(reg:SI 121)) richie2.c:10 -1
 (nil))

*** is the load of *p. The mem has alignment 64, but the type describing the 
loaded value *p, passed to arm_function_arg, has alignment 32. Even in the first 
of your examples, or even with an explicit cast to the aligned type:

__builtin_printf("%d %d\n", (myint) i, j);
we still get alignment 32 in arm_function_arg. Only when SRA is applied do
we get the situation where an int with 64-bit alignment is passed to
arm_function_arg.
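
To show how the C3 rounding rule quoted earlier in the thread produces the
two layouts, here is a toy model of core-register assignment.  It is a
simplified sketch and not GCC's aapcs_layout_arg: the function name is
hypothetical, registers are numbered 0..3 for r0..r3, and an argument that
does not fit entirely in registers is simply treated as going to the stack.

```c
/* Toy model of the "round the NCRN up to an even number" rule.
   *NCRN is the next core register number and is updated in place.
   NREGS is how many registers the argument needs.  Returns the first
   register used, or -1 if the argument goes to the stack.  */
int
toy_assign_core_regs (int *ncrn, int nregs, int doubleword_aligned)
{
  /* C3: doubleword-aligned arguments start at an even register.  */
  if (doubleword_aligned && (*ncrn & 1))
    (*ncrn)++;

  /* Simplification: no splitting between registers and stack.  */
  if (*ncrn + nregs > 4)
    {
      *ncrn = 4;
      return -1;
    }

  int first = *ncrn;
  *ncrn += nregs;
  return first;
}
```

Starting with the format string already in r0 (NCRN = 1), an unaligned int
lands in r1, the double in r2+r3 and the last int on the stack.  If the first
int is instead marked as doubleword aligned, it is bumped to r2 and both the
double and the last int end up on the stack, matching the two RTL dumps.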


Disclaimer, I haven't tracked down how the alignment information flows through 
the compiler i.e. from build_ref_for_offset into expand_call. But it looks to me 
like something different is happening in the SRA case...no?


--Alan



Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 08:50, Richard Biener wrote:
> > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:
> >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
> >>  wrote:
> >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>> it.
> >>>
> >>> The problem appears to be in laying out arguments, specifically
> >>> varargs. From
> >>> the "good" -fdump-rtl-expand:
> >>>
> >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S4 A32])
> >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>> (reg:SI 118)) reduced.c:14 -1
> >>>  (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>> (set (reg:SI 0 r0)
> >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> ) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> The struct members are
> >>> reg:SI 113 => int a;
> >>> reg:DF 112 => double b;
> >>> reg:SI 111 => int c;
> >>>
> >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>> pushed
> >>> into virtual-outgoing-args. In contrast, post-change to
> >>> build_ref_of_offset, we get:
> >>>
> >>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   >>> *.LC1>)) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>> virtual-outgoing-args)
> >>> (const_int 8 [0x8])) [0  S4 A64])
> >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S8 A64])
> >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>> (reg:SI 118)) reduced.c:14 -1
> >>>  (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>> (set (reg:SI 0 r0)
> >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> ) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>
> >>> double+following int are all pushed into virtual-outgoing-args. This is
> >>> because
> >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>> argument (the
> >>> type constructed by build_ref_for_offset); it then executes
> >>> (aapcs_layout_arg,
> >>> arm.c line ~~5914)
> >>>
> >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>  next even number.  */
> >>>   ncrn = pcum->aapcs_ncrn;
> >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>> ncrn++;
> >>>
> >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>> testcase
> >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>> 32-bit-aligned int instead, which works as previously.
> >>>
> >>> Passing the same members of that struct in a non-vargs call, works ok -
> >>> I think
> >>> because these use the type of the declared parameters, rather than the
> >>> provided
> >>> arguments, and the former do not have the increased alignment from
> >>> build_ref_for_offset.
> >>
> >> It doesn't make sense to use the alignment of passed values.  That looks 
> >> like bs.
> >>
> >> This means that
> >>
> >> Int I __aligned__(8);
> >>
> >> Is passed differently than int.
> >>
> >> Arm_function_arg needs to be fixed.
> > 
> > That is,
> > 
> > typedef int myint __attribute__((aligned(8)));
> > 
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", i, j);
> > }
> > 
> > or
> > 
> > myint i;
> > int j;
> > myint *p = &i;
> > int *q = &j;
> > 
> > int main()
> > {
> >   __builtin_printf("%d %d", *p, *q);
> > }
> > 
> > should behave the same.  There isn't a printf modifier for an "aligned int"
> > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > non-aligned values only makes sense for things passed by value on the stack.
> > And then obviously only dependent on the functuion type signature, not
> > on the type of the passed value.
> > 
> 
> I think the testcase is ill-formed.  Just because printf doesn't have
> such a modifier, doesn't mean that another variadic function might not
> have a means to detect when an object in the variadic list needs to be
> over-aligned.  As such, the test should really be written as:

A value doesn't have "alignment".  A function may have alignment
requirements on its arguments; clearly printf doesn't.
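
As a small illustration of the distinction being argued here, the sketch below (assuming GCC or a compatible compiler that accepts the `aligned` attribute on a typedef) shows that the alignment is a property recorded on the type, while the value handed to a function declared with a plain `int` parameter is just an int. The `toy_` helper names are hypothetical.

```c
#include <assert.h>

typedef int myint __attribute__((aligned(8)));

/* The over-alignment lives on the type; the size is unchanged. */
static unsigned toy_alignof_myint(void)
{
    assert(sizeof(myint) == sizeof(int));
    return (unsigned) _Alignof(myint);
}

/* A callee with a declared parameter type of plain int. */
static int toy_take_int(int v)
{
    return v;
}

/* The value read from an over-aligned object is passed like any other int. */
static int toy_pass_myint(void)
{
    myint i = 1;
    return toy_take_int(i);
}
```

This says nothing about what a variadic callee sees, which is exactly the point under dispute in the thread; it only shows where the attribute is attached.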

> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d 

Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 11:00, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 08:50, Richard Biener wrote:
>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:
 On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
  wrote:
> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> it.
>
> The problem appears to be in laying out arguments, specifically
> varargs. From
> the "good" -fdump-rtl-expand:
>
> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> S4 A32])
> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>  (nil))
> (insn 19 18 20 2 (set (reg:DF 2 r2)
> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>  (nil))
> (insn 20 19 21 2 (set (reg:SI 1 r1)
> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>  (nil))
> (insn 21 20 22 2 (set (reg:SI 0 r0)
> (reg:SI 118)) reduced.c:14 -1
>  (nil))
> (call_insn 22 21 23 2 (parallel [
> (set (reg:SI 0 r0)
> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> ) [0 __builtin_printf S4
> A32])
>
> The struct members are
> reg:SI 113 => int a;
> reg:DF 112 => double b;
> reg:SI 111 => int c;
>
> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> pushed
> into virtual-outgoing-args. In contrast, post-change to
> build_ref_of_offset, we get:
>
> (insn 17 16 18 2 (set (reg:SI 118)
>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   *.LC1>)) reduced.c:14 -1
>  (nil))
> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> virtual-outgoing-args)
> (const_int 8 [0x8])) [0  S4 A64])
> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>  (nil))
> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> S8 A64])
> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>  (nil))
> (insn 20 19 21 2 (set (reg:SI 2 r2)
> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>  (nil))
> (insn 21 20 22 2 (set (reg:SI 0 r0)
> (reg:SI 118)) reduced.c:14 -1
>  (nil))
> (call_insn 22 21 23 2 (parallel [
> (set (reg:SI 0 r0)
> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> ) [0 __builtin_printf S4
> A32])
>
> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>
> double+following int are all pushed into virtual-outgoing-args. This is
> because
> arm_function_arg is fed a 64-bit-aligned int as type of the second
> argument (the
> type constructed by build_ref_for_offset); it then executes
> (aapcs_layout_arg,
> arm.c line ~~5914)
>
>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>  next even number.  */
>   ncrn = pcum->aapcs_ncrn;
>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> ncrn++;
>
> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> testcase
> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> 32-bit-aligned int instead, which works as previously.
>
> Passing the same members of that struct in a non-vargs call, works ok -
> I think
> because these use the type of the declared parameters, rather than the
> provided
> arguments, and the former do not have the increased alignment from
> build_ref_for_offset.

 It doesn't make sense to use the alignment of passed values.  That looks 
 like bs.

 This means that

 Int I __aligned__(8);

 Is passed differently than int.

 Arm_function_arg needs to be fixed.
>>>
>>> That is,
>>>
>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", i, j);
>>> }
>>>
>>> or
>>>
>>> myint i;
>>> int j;
>>> myint *p = &i;
>>> int *q = &j;
>>>
>>> int main()
>>> {
>>>   __builtin_printf("%d %d", *p, *q);
>>> }
>>>
>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>> non-aligned values only makes sense for things passed by value on the stack.
>>> And then obviously only dependent on the functuion type signature, not
>>> on the type of the passed value.
>>>
>>
>> I think the testcase is ill-formed.  Just because printf doesn't have
>> such a modifier, doesn't mean that another variadic function might not
>> have a means to detect when an object in the variadic list needs to be
>> over-aligned.  As such, the test should really be written as:
> 
> A value doesn't have "alignment".  A function may have alignment
> requirements on its arguments, clearly printf doesn't.
> 

Values don't.  But types do and variadic functions are special in that
they

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Biener wrote:

> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
> > On 31/03/15 08:50, Richard Biener wrote:
> > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> > > wrote:
> > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
> > >>  wrote:
> > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> > >>> it.
> > >>>
> > >>> The problem appears to be in laying out arguments, specifically
> > >>> varargs. From
> > >>> the "good" -fdump-rtl-expand:
> > >>>
> > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S4 A32])
> > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>> (reg:SI 118)) reduced.c:14 -1
> > >>>  (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>> (set (reg:SI 0 r0)
> > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> ) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> The struct members are
> > >>> reg:SI 113 => int a;
> > >>> reg:DF 112 => double b;
> > >>> reg:SI 111 => int c;
> > >>>
> > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> > >>> pushed
> > >>> into virtual-outgoing-args. In contrast, post-change to
> > >>> build_ref_of_offset, we get:
> > >>>
> > >>> (insn 17 16 18 2 (set (reg:SI 118)
> > >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   > >>> *.LC1>)) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> > >>> virtual-outgoing-args)
> > >>> (const_int 8 [0x8])) [0  S4 A64])
> > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S8 A64])
> > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>  (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>> (reg:SI 118)) reduced.c:14 -1
> > >>>  (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>> (set (reg:SI 0 r0)
> > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> ) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> > >>>
> > >>> double+following int are all pushed into virtual-outgoing-args. This is
> > >>> because
> > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> > >>> argument (the
> > >>> type constructed by build_ref_for_offset); it then executes
> > >>> (aapcs_layout_arg,
> > >>> arm.c line ~~5914)
> > >>>
> > >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> > >>>  next even number.  */
> > >>>   ncrn = pcum->aapcs_ncrn;
> > >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> > >>> ncrn++;
> > >>>
> > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> > >>> testcase
> > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> > >>> 32-bit-aligned int instead, which works as previously.
> > >>>
> > >>> Passing the same members of that struct in a non-vargs call, works ok -
> > >>> I think
> > >>> because these use the type of the declared parameters, rather than the
> > >>> provided
> > >>> arguments, and the former do not have the increased alignment from
> > >>> build_ref_for_offset.
> > >>
> > >> It doesn't make sense to use the alignment of passed values.  That looks 
> > >> like bs.
> > >>
> > >> This means that
> > >>
> > >> Int I __aligned__(8);
> > >>
> > >> Is passed differently than int.
> > >>
> > >> Arm_function_arg needs to be fixed.
> > > 
> > > That is,
> > > 
> > > typedef int myint __attribute__((aligned(8)));
> > > 
> > > int main()
> > > {
> > >   myint i = 1;
> > >   int j = 2;
> > >   __builtin_printf("%d %d\n", i, j);
> > > }
> > > 
> > > or
> > > 
> > > myint i;
> > > int j;
> > > myint *p = &i;
> > > int *q = &j;
> > > 
> > > int main()
> > > {
> > >   __builtin_printf("%d %d", *p, *q);
> > > }
> > > 
> > > should behave the same.  There isn't a printf modifier for an "aligned 
> > > int"
> > > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > > non-aligned values only makes sense for things passed by value on the 
> > > stack.
> > > And then obviously only dependent on the functuion type signature, not
> > > on the type of the passed value.
> > > 
> > 
> > I think the testcase is ill-formed.  Just because printf doesn't have
> > such a modifier, doesn't mean that another variadic function might not
> > have a means to detect when an object in

Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 11:20, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Biener wrote:
> 
>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>
>>> On 31/03/15 08:50, Richard Biener wrote:
 On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:
> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>  wrote:
>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>> it.
>>
>> The problem appears to be in laying out arguments, specifically
>> varargs. From
>> the "good" -fdump-rtl-expand:
>>
>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>> S4 A32])
>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>  (nil))
>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>  (nil))
>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>  (nil))
>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>> (reg:SI 118)) reduced.c:14 -1
>>  (nil))
>> (call_insn 22 21 23 2 (parallel [
>> (set (reg:SI 0 r0)
>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>> ) [0 __builtin_printf S4
>> A32])
>>
>> The struct members are
>> reg:SI 113 => int a;
>> reg:DF 112 => double b;
>> reg:SI 111 => int c;
>>
>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>> pushed
>> into virtual-outgoing-args. In contrast, post-change to
>> build_ref_of_offset, we get:
>>
>> (insn 17 16 18 2 (set (reg:SI 118)
>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  > *.LC1>)) reduced.c:14 -1
>>  (nil))
>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>> virtual-outgoing-args)
>> (const_int 8 [0x8])) [0  S4 A64])
>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>  (nil))
>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>> S8 A64])
>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>  (nil))
>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>  (nil))
>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>> (reg:SI 118)) reduced.c:14 -1
>>  (nil))
>> (call_insn 22 21 23 2 (parallel [
>> (set (reg:SI 0 r0)
>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>> ) [0 __builtin_printf S4
>> A32])
>>
>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>
>> double+following int are all pushed into virtual-outgoing-args. This is
>> because
>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>> argument (the
>> type constructed by build_ref_for_offset); it then executes
>> (aapcs_layout_arg,
>> arm.c line ~~5914)
>>
>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>  next even number.  */
>>   ncrn = pcum->aapcs_ncrn;
>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>> ncrn++;
>>
>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>> testcase
>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>> 32-bit-aligned int instead, which works as previously.
>>
>> Passing the same members of that struct in a non-vargs call, works ok -
>> I think
>> because these use the type of the declared parameters, rather than the
>> provided
>> arguments, and the former do not have the increased alignment from
>> build_ref_for_offset.
>
> It doesn't make sense to use the alignment of passed values.  That looks 
> like bs.
>
> This means that
>
> Int I __aligned__(8);
>
> Is passed differently than int.
>
> Arm_function_arg needs to be fixed.

 That is,

 typedef int myint __attribute__((aligned(8)));

 int main()
 {
   myint i = 1;
   int j = 2;
   __builtin_printf("%d %d\n", i, j);
 }

 or

 myint i;
 int j;
 myint *p = &i;
 int *q = &j;

 int main()
 {
   __builtin_printf("%d %d", *p, *q);
 }

 should behave the same.  There isn't a printf modifier for an "aligned int"
 because that sort of thing doesn't make sense.  Special-casing aligned vs.
 non-aligned values only makes sense for things passed by value on the 
 stack.
 And then obviously only dependent on the functuion type signature, not
 on the type of the passed value.

>>>
>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>> such a modifier, doesn't mean that another variadic function might not
>>> have a means to detect when an object in the variadic list needs to be
>>> over-aligned.  As such, the test should really be written as:
>>
>> A value

Re: New regression on ARM Linux

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 11:10:39AM +0100, Richard Earnshaw wrote:
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", i, j);
> >>> }
> >>>
> >>> or
> >>>
> >>> myint i;
> >>> int j;
> >>> myint *p = &i;
> >>> int *q = &j;
> >>>
> >>> int main()
> >>> {
> >>>   __builtin_printf("%d %d", *p, *q);
> >>> }

Note that starting with r221348, gcc fails to profiledbootstrap on
armv7hl-linux-gnueabi.  I'd hope it is the same thing.

To the middle-end, all integral type conversions that differ just in alignment
are useless - for INTEGRAL_TYPE_P all useless_type_conversion_p cares about
is sign and precision.

So, if arm has a weirdo ABI that wants to pass aligned types differently
(why? I'd say it is just a bug in the ABI), then for named arguments it
really has to look at the function type - the declared type of the argument,
rather than the passed-in value's type - and for varargs it would be best if it
remembered such a thing early (in the FEs), because the middle-end can change
things any time, with or without Richard B.'s recent SRA change.
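
A minimal sketch of that claim, as a toy model in C: two integral types that differ only in alignment compare as interchangeable, because only sign and precision are looked at. The struct and function names are hypothetical, and the real useless_type_conversion_p handles far more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an integral type node. */
struct toy_int_type {
    unsigned precision;   /* bits */
    bool is_unsigned;
    unsigned align_bits;  /* deliberately ignored by the check below */
};

/* Models only the statement above: for integral types, a conversion is
   "useless" when sign and precision match, whatever the alignment. */
static bool toy_useless_int_conversion_p(const struct toy_int_type *outer,
                                         const struct toy_int_type *inner)
{
    assert(outer->precision > 0 && inner->precision > 0);
    return outer->precision == inner->precision
           && outer->is_unsigned == inner->is_unsigned;
}
```

Under this model a 32-bit signed int with 32-bit alignment and one with 64-bit alignment are freely substitutable, so any pass may swap one for the other before the backend lays out the call.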

Jakub


Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:00, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 08:50, Richard Biener wrote:
> >>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> >>> wrote:
>  On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>   wrote:
> > -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> > it.
> >
> > The problem appears to be in laying out arguments, specifically
> > varargs. From
> > the "good" -fdump-rtl-expand:
> >
> > (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> > S4 A32])
> > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >  (nil))
> > (insn 19 18 20 2 (set (reg:DF 2 r2)
> > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >  (nil))
> > (insn 20 19 21 2 (set (reg:SI 1 r1)
> > (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >  (nil))
> > (insn 21 20 22 2 (set (reg:SI 0 r0)
> > (reg:SI 118)) reduced.c:14 -1
> >  (nil))
> > (call_insn 22 21 23 2 (parallel [
> > (set (reg:SI 0 r0)
> > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > ) [0 __builtin_printf S4
> > A32])
> >
> > The struct members are
> > reg:SI 113 => int a;
> > reg:DF 112 => double b;
> > reg:SI 111 => int c;
> >
> > r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> > pushed
> > into virtual-outgoing-args. In contrast, post-change to
> > build_ref_of_offset, we get:
> >
> > (insn 17 16 18 2 (set (reg:SI 118)
> >   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   > *.LC1>)) reduced.c:14 -1
> >  (nil))
> > (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> > virtual-outgoing-args)
> > (const_int 8 [0x8])) [0  S4 A64])
> > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >  (nil))
> > (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> > S8 A64])
> > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >  (nil))
> > (insn 20 19 21 2 (set (reg:SI 2 r2)
> > (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >  (nil))
> > (insn 21 20 22 2 (set (reg:SI 0 r0)
> > (reg:SI 118)) reduced.c:14 -1
> >  (nil))
> > (call_insn 22 21 23 2 (parallel [
> > (set (reg:SI 0 r0)
> > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > ) [0 __builtin_printf S4
> > A32])
> >
> > r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >
> > double+following int are all pushed into virtual-outgoing-args. This is
> > because
> > arm_function_arg is fed a 64-bit-aligned int as type of the second
> > argument (the
> > type constructed by build_ref_for_offset); it then executes
> > (aapcs_layout_arg,
> > arm.c line ~~5914)
> >
> >   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >  next even number.  */
> >   ncrn = pcum->aapcs_ncrn;
> >   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> > ncrn++;
> >
> > Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> > testcase
> > "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> > 32-bit-aligned int instead, which works as previously.
> >
> > Passing the same members of that struct in a non-vargs call, works ok -
> > I think
> > because these use the type of the declared parameters, rather than the
> > provided
> > arguments, and the former do not have the increased alignment from
> > build_ref_for_offset.
> 
>  It doesn't make sense to use the alignment of passed values.  That looks 
>  like bs.
> 
>  This means that
> 
>  Int I __aligned__(8);
> 
>  Is passed differently than int.
> 
>  Arm_function_arg needs to be fixed.
> >>>
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", i, j);
> >>> }
> >>>
> >>> or
> >>>
> >>> myint i;
> >>> int j;
> >>> myint *p = &i;
> >>> int *q = &j;
> >>>
> >>> int main()
> >>> {
> >>>   __builtin_printf("%d %d", *p, *q);
> >>> }
> >>>
> >>> should behave the same.  There isn't a printf modifier for an "aligned 
> >>> int"
> >>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>> non-aligned values only makes sense for things passed by value on the 
> >>> stack.
> >>> And then obviously only dependent on the functuion type signature, not
> >>> on the type of the passed value.
> >>>
> >>
> >> I think the testcase is ill-formed.  Just because printf doesn't have
> >> such a modifier, doesn't mean that another variadic function might not

Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread H.J. Lu
On Mon, Mar 30, 2015 at 10:38 PM, Jakub Jelinek  wrote:
> On Mon, Mar 30, 2015 at 07:08:00PM -0700, H.J. Lu wrote:
>> --- a/gcc/gcc.c
>> +++ b/gcc/gcc.c
>> @@ -1566,11 +1566,13 @@ init_spec (void)
>>   if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
>> {
>>   init_gcc_specs (&obstack,
>> + "-lgcc_nonshared "
>>   "-lgcc_s"
>>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>>   " -lunwind"
>>  #endif
>>   ,
>> + "-lgcc_nonshared "
>>   "-lgcc",
>>   "-lgcc_eh"
>>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>> @@ -1591,7 +1593,9 @@ init_spec (void)
>>   /* Ug.  We don't know shared library extensions.  Hope that
>>  systems that use this form don't do shared libraries.  */
>>   init_gcc_specs (&obstack,
>> + "libgcc_nonshared.a%s "
>>   "-lgcc_s",
>> + "libgcc_nonshared.a%s "
>>   "libgcc.a%s",
>>   "libgcc_eh.a%s"
>
> Why do you need to link libgcc_nonshared.a twice here?  -lgcc_s surely won't
> add any new __cpu* undefined references.

The one added for -lgcc_s is for building a shared C++ library, where only
-lgcc_s is used and -lgcc isn't.  The one added for -lgcc is for static
linking, where -lgcc_s isn't used.  I updated the patch to add some testcases
for -static.
I couldn't find a way to add tests for:

export/build/gnu/gcc/build-x86_64-linux/gcc/xg++
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2-c -o main.o
main.cc
/export/build/gnu/gcc/build-x86_64-linux/gcc/xg++
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -shared -fPIC -O2  -o
libmv20.so mv20.cc
-L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
/export/build/gnu/gcc/build-x86_64-linux/gcc/xg++
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2  -o x main.o
libmv20.so -Wl,-R,.
-L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
/usr/local/bin/ld: x: hidden symbol `__cpu_model' in
/export/build/gnu/gcc/build-x86_64-linux/gcc/libgcc_nonshared.a(cpuinfo.o)
is referenced by DSO
/usr/local/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

>> @@ -424,3 +424,8 @@ __cpu_indicator_init (void)
>>
>>return 0;
>>  }
>> +
>> +#if defined SHARED && !defined _WIN32
>> +__asm__ (".symver __cpu_indicator_init, __cpu_indicator_init@GCC_4.8.0");
>> +__asm__ (".symver __cpu_model, __cpu_model@GCC_4.8.0");
>> +#endif
>
> Will this work on Solaris?
> I'd say you at least want to also guard with some configure check if
> .symver is supported by assembler.
>

I updated my patch with:

@@ -424,3 +424,8 @@ __cpu_indicator_init (void)

   return 0;
 }
+
+#if defined SHARED && defined USE_ELF_SYMVER
+__asm__ (".symver __cpu_indicator_init, __cpu_indicator_init@GCC_4.8.0");
+__asm__ (".symver __cpu_model, __cpu_model@GCC_4.8.0");
+#endif

diff --git a/libgcc/config/i386/t-linux b/libgcc/config/i386/t-linux
index 4f47f7b..11bb46e 100644
--- a/libgcc/config/i386/t-linux
+++ b/libgcc/config/i386/t-linux
@@ -3,4 +3,4 @@
 # t-slibgcc-elf-ver and t-linux
 SHLIB_MAPFILES = libgcc-std.ver $(srcdir)/config/i386/libgcc-glibc.ver

-HOST_LIBGCC2_CFLAGS += -mlong-double-80
+HOST_LIBGCC2_CFLAGS += -mlong-double-80 -DUSE_ELF_SYMVER

OK for master?

-- 
H.J.
From 259457afe0add56064ed49da1954d0770b1e1975 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 29 Mar 2015 18:03:49 -0700
Subject: [PATCH] Hide __cpu_indicator_init/__cpu_model from linker

We shouldn't call an external function, __cpu_indicator_init, while an object
is being relocated since its .got.plt section hasn't been updated.  It
works for non-PIE since no update of the .got.plt section is required.  This
patch hides __cpu_indicator_init/__cpu_model from the linker to force the linker
to resolve __cpu_indicator_init/__cpu_model to their hidden definitions
in libgcc_nonshared.a while providing backward binary compatibility.  The
new libgcc_nonshared.a is always linked together with -lgcc_s and -lgcc.

gcc/

	PR target/65612
	* gcc.c (init_spec): Add -lgcc_nonshared/libgcc_nonshared.a%s
	to -lgcc_s/-lgcc/libgcc.a%s.

gcc/testsuite/

	PR target/65612
	* g++.dg/ext/mv18.C: New test.
	* g++.dg/ext/mv19.C: Likewise.
	* g++.dg/ext/mv20.C: Likewise.
	* g++.dg/ext/mv21.C: Likewise.
	* g++.dg/ext/mv22.C: Likewise.
	* g++.dg/ext/mv23.C: Likewise.

libgcc/

	PR target/65612
	* Makefile.in (LIB2ADDSHARED): New.
	(LIB2ADDNONSHARED): Likewise.
	(libgcc-nonshared-objects): Likewise.
	(libgcc_nonshared.a): Likewise.
	Check unsupported files in LIB2ADDNONSHARED or LIB2ADDSHARED.
	(iter-items): Add $(LIB2ADDNONSHARED) $(LIB2ADDSHARED).
	(libgcc-s-objects): Add $(LIB2ADDSHARED).
	(all): Depend on libgcc_nonshared.a.
	($(libgcc-nonshared-objects)): Depend on libgcc_tm.h.
	(install-leaf): Install libgcc_nonshared.a.
	* sha

Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 11:36, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:00, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
 On 31/03/15 08:50, Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> wrote:
>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>>  wrote:
>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>> it.
>>>
>>> The problem appears to be in laying out arguments, specifically
>>> varargs. From
>>> the "good" -fdump-rtl-expand:
>>>
>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>> S4 A32])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 __builtin_printf S4
>>> A32])
>>>
>>> The struct members are
>>> reg:SI 113 => int a;
>>> reg:DF 112 => double b;
>>> reg:SI 111 => int c;
>>>
>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>> pushed
>>> into virtual-outgoing-args. In contrast, post-change to
>>> build_ref_of_offset, we get:
>>>
>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  >> *.LC1>)) reduced.c:14 -1
>>>  (nil))
>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>> virtual-outgoing-args)
>>> (const_int 8 [0x8])) [0  S4 A64])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>> S8 A64])
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 __builtin_printf S4
>>> A32])
>>>
>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>
>>> double+following int are all pushed into virtual-outgoing-args. This is
>>> because
>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>> argument (the
>>> type constructed by build_ref_for_offset); it then executes
>>> (aapcs_layout_arg,
>>> arm.c line ~~5914)
>>>
>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>  next even number.  */
>>>   ncrn = pcum->aapcs_ncrn;
>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>> ncrn++;
>>>
>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>> testcase
>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>> 32-bit-aligned int instead, which works as previously.
>>>
>>> Passing the same members of that struct in a non-vargs call, works ok -
>>> I think
>>> because these use the type of the declared parameters, rather than the
>>> provided
>>> arguments, and the former do not have the increased alignment from
>>> build_ref_for_offset.
>>
>> It doesn't make sense to use the alignment of passed values.  That looks 
>> like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
>
> That is,
>
> typedef int myint __attribute__((aligned(8)));
>
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
>
> or
>
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
>
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
>
> should behave the same.  There isn't a printf modifier for an "aligned 
> int"
> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> non-aligned values only makes sense for things passed by value on the 
> stack.
> And then obviously only dependent on the function type signature, not
> on the type of the passed value.
>

 I think the testcase is ill-formed.  Just because printf doesn't have
 such a modifier, doesn't m

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:20, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Biener wrote:
> > 
> >> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>
> >>> On 31/03/15 08:50, Richard Biener wrote:
>  On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
>  wrote:
> > On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
> >  wrote:
> >> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >> it.
> >>
> >> The problem appears to be in laying out arguments, specifically
> >> varargs. From
> >> the "good" -fdump-rtl-expand:
> >>
> >> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >> S4 A32])
> >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >> (reg:SI 118)) reduced.c:14 -1
> >>  (nil))
> >> (call_insn 22 21 23 2 (parallel [
> >> (set (reg:SI 0 r0)
> >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >> ) [0 __builtin_printf S4
> >> A32])
> >>
> >> The struct members are
> >> reg:SI 113 => int a;
> >> reg:DF 112 => double b;
> >> reg:SI 111 => int c;
> >>
> >> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >> pushed
> >> into virtual-outgoing-args. In contrast, post-change to
> >> build_ref_for_offset, we get:
> >>
> >> (insn 17 16 18 2 (set (reg:SI 118)
> >>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   >> *.LC1>)) reduced.c:14 -1
> >>  (nil))
> >> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >> virtual-outgoing-args)
> >> (const_int 8 [0x8])) [0  S4 A64])
> >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >> S8 A64])
> >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>  (nil))
> >> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >> (reg:SI 118)) reduced.c:14 -1
> >>  (nil))
> >> (call_insn 22 21 23 2 (parallel [
> >> (set (reg:SI 0 r0)
> >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >> ) [0 __builtin_printf S4
> >> A32])
> >>
> >> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>
> >> double+following int are all pushed into virtual-outgoing-args. This is
> >> because
> >> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >> argument (the
> >> type constructed by build_ref_for_offset); it then executes
> >> (aapcs_layout_arg,
> >> arm.c line ~~5914)
> >>
> >>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>  next even number.  */
> >>   ncrn = pcum->aapcs_ncrn;
> >>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >> ncrn++;
> >>
> >> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >> testcase
> >> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >> 32-bit-aligned int instead, which works as previously.
> >>
> >> Passing the same members of that struct in a non-vargs call, works ok -
> >> I think
> >> because these use the type of the declared parameters, rather than the
> >> provided
> >> arguments, and the former do not have the increased alignment from
> >> build_ref_for_offset.
> >
> > It doesn't make sense to use the alignment of passed values.  That 
> > looks like bs.
> >
> > This means that
> >
> > Int I __aligned__(8);
> >
> > Is passed differently than int.
> >
> > Arm_function_arg needs to be fixed.
> 
>  That is,
> 
>  typedef int myint __attribute__((aligned(8)));
> 
>  int main()
>  {
>    myint i = 1;
>    int j = 2;
>    __builtin_printf("%d %d\n", i, j);
>  }
> 
>  or
> 
>  myint i;
>  int j;
>  myint *p = &i;
>  int *q = &j;
> 
>  int main()
>  {
>    __builtin_printf("%d %d", *p, *q);
>  }
> 
>  should behave the same.  There isn't a printf modifier for an "aligned 
>  int"
>  because that sort of thing doesn't make sense.  Special-casing aligned 
>  vs.
>  non-aligned values only makes sense for things passed by value on the 
>  stack.
>  And then obviously only dependent on the function type signature, not
> 

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:36, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 11:00, Richard Biener wrote:
> >>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>>
>  On 31/03/15 08:50, Richard Biener wrote:
> > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> > wrote:
> >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
> >>  wrote:
> >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>> it.
> >>>
> >>> The problem appears to be in laying out arguments, specifically
> >>> varargs. From
> >>> the "good" -fdump-rtl-expand:
> >>>
> >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S4 A32])
> >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>> (reg:SI 118)) reduced.c:14 -1
> >>>  (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>> (set (reg:SI 0 r0)
> >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> ) [0 __builtin_printf 
> >>> S4
> >>> A32])
> >>>
> >>> The struct members are
> >>> reg:SI 113 => int a;
> >>> reg:DF 112 => double b;
> >>> reg:SI 111 => int c;
> >>>
> >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>> pushed
> >>> into virtual-outgoing-args. In contrast, post-change to
> >>> build_ref_for_offset, we get:
> >>>
> >>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   >>> *.LC1>)) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>> virtual-outgoing-args)
> >>> (const_int 8 [0x8])) [0  S4 A64])
> >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S8 A64])
> >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>  (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>> (reg:SI 118)) reduced.c:14 -1
> >>>  (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>> (set (reg:SI 0 r0)
> >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> ) [0 __builtin_printf 
> >>> S4
> >>> A32])
> >>>
> >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and 
> >>> the
> >>>
> >>> double+following int are all pushed into virtual-outgoing-args. This 
> >>> is
> >>> because
> >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>> argument (the
> >>> type constructed by build_ref_for_offset); it then executes
> >>> (aapcs_layout_arg,
> >>> arm.c line ~~5914)
> >>>
> >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>  next even number.  */
> >>>   ncrn = pcum->aapcs_ncrn;
> >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>> ncrn++;
> >>>
> >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>> testcase
> >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>> 32-bit-aligned int instead, which works as previously.
> >>>
> >>> Passing the same members of that struct in a non-vargs call, works ok 
> >>> -
> >>> I think
> >>> because these use the type of the declared parameters, rather than the
> >>> provided
> >>> arguments, and the former do not have the increased alignment from
> >>> build_ref_for_offset.
> >>
> >> It doesn't make sense to use the alignment of passed values.  That 
> >> looks like bs.
> >>
> >> This means that
> >>
> >> Int I __aligned__(8);
> >>
> >> Is passed differently than int.
> >>
> >> Arm_function_arg needs to be fixed.
> >
> > That is,
> >
> > typedef int myint __attribute__((aligned(8)));
> >
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", i, j);
> > }
> >
> > or
> >
> > myint i;
> > int j;
> > myint *p = &i;
> > int *q = &j;
> >
> > int main()
> > {
> >   __builtin_printf("%d %d", *p, *q);
> > }
> >
> > should behave the same.  There isn't a printf modifier for an "aligned 
> > int"
> > because that sort of th

Re: New regression on ARM Linux

2015-03-31 Thread Alan Lawrence

Richard Biener wrote:


But I find it odd that on ARM passing *((aligned_int *)p) as
vararg (only as varargs?) changes calling conventions independent
of the function's type signature.


Does it? Do you have a testcase, and compilation flags, that'll make this show 
up in an RTL dump? I've tried numerous cases, including AFAICT yours, and I 
always get the value being passed in the expected ("unaligned") register?
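
For reference, the C3 rounding quoted earlier in this thread can be modelled in isolation. The following is only a sketch under stated assumptions: place_arg and layout_printf_args are hypothetical helpers, not code from arm.c, and the model ignores the case where an argument is split between registers and the stack. It lays out printf (fmt, int a, double b, int c) and reproduces the two placements seen in the RTL dumps.

```c
#include <assert.h>

#define STACK (-1)        /* marker: argument goes to the outgoing-args area */
#define NUM_CORE_REGS 4   /* r0-r3 */

/* Simplified model (an assumption, not GCC's code): return the first core
   register used for an argument of WORDS 32-bit words, or STACK.  */
static int place_arg (int *ncrn, int words, int doubleword_align)
{
  /* C3: round the NCRN up to the next even number if required.  */
  if ((*ncrn & 1) && doubleword_align)
    (*ncrn)++;
  if (*ncrn + words <= NUM_CORE_REGS)
    {
      int reg = *ncrn;
      *ncrn += words;
      return reg;
    }
  *ncrn = NUM_CORE_REGS;  /* once on the stack, no more core registers */
  return STACK;
}

/* Lay out printf (fmt, int a, double b, int c).  A_ALIGNED8 says whether
   the type seen for `a' claims 64-bit alignment.  OUT receives the
   placements of a, b and c.  */
static void layout_printf_args (int a_aligned8, int out[3])
{
  int ncrn = 0;
  place_arg (&ncrn, 1, 0);               /* format string -> r0 */
  out[0] = place_arg (&ncrn, 1, a_aligned8);
  out[1] = place_arg (&ncrn, 2, 1);      /* double is doubleword aligned */
  out[2] = place_arg (&ncrn, 1, 0);
}
```

With a_aligned8 == 0 the model gives a in r1, b in r2+r3 and c on the stack; with a_aligned8 == 1 it gives a in r2 and both b and c on the stack, which is the pattern in the "bad" dump.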


Cheers, Alan



Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 11:45, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:36, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
 On 31/03/15 11:00, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>
>> On 31/03/15 08:50, Richard Biener wrote:
>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
>>> wrote:
 On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
  wrote:
> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> it.
>
> The problem appears to be in laying out arguments, specifically
> varargs. From
> the "good" -fdump-rtl-expand:
>
> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> S4 A32])
> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>  (nil))
> (insn 19 18 20 2 (set (reg:DF 2 r2)
> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>  (nil))
> (insn 20 19 21 2 (set (reg:SI 1 r1)
> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>  (nil))
> (insn 21 20 22 2 (set (reg:SI 0 r0)
> (reg:SI 118)) reduced.c:14 -1
>  (nil))
> (call_insn 22 21 23 2 (parallel [
> (set (reg:SI 0 r0)
> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> ) [0 __builtin_printf 
> S4
> A32])
>
> The struct members are
> reg:SI 113 => int a;
> reg:DF 112 => double b;
> reg:SI 111 => int c;
>
> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> pushed
> into virtual-outgoing-args. In contrast, post-change to
> build_ref_for_offset, we get:
>
> (insn 17 16 18 2 (set (reg:SI 118)
>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   *.LC1>)) reduced.c:14 -1
>  (nil))
> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> virtual-outgoing-args)
> (const_int 8 [0x8])) [0  S4 A64])
> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>  (nil))
> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> S8 A64])
> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>  (nil))
> (insn 20 19 21 2 (set (reg:SI 2 r2)
> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>  (nil))
> (insn 21 20 22 2 (set (reg:SI 0 r0)
> (reg:SI 118)) reduced.c:14 -1
>  (nil))
> (call_insn 22 21 23 2 (parallel [
> (set (reg:SI 0 r0)
> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> ) [0 __builtin_printf 
> S4
> A32])
>
> r0 still gets the format string, but 'int b1.a' now goes in r2, and 
> the
>
> double+following int are all pushed into virtual-outgoing-args. This 
> is
> because
> arm_function_arg is fed a 64-bit-aligned int as type of the second
> argument (the
> type constructed by build_ref_for_offset); it then executes
> (aapcs_layout_arg,
> arm.c line ~~5914)
>
>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>  next even number.  */
>   ncrn = pcum->aapcs_ncrn;
>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> ncrn++;
>
> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> testcase
> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> 32-bit-aligned int instead, which works as previously.
>
> Passing the same members of that struct in a non-vargs call, works ok 
> -
> I think
> because these use the type of the declared parameters, rather than the
> provided
> arguments, and the former do not have the increased alignment from
> build_ref_for_offset.

 It doesn't make sense to use the alignment of passed values.  That 
 looks like bs.

 This means that

 Int I __aligned__(8);

 Is passed differently than int.

 Arm_function_arg needs to be fixed.
>>>
>>> That is,
>>>
>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", i, j);
>>> }
>>>
>>> or
>>>
>>> myint i;
>>> int j;
>>> myint *p = &i;
>>> int *q = &j;
>>>
>>> int main()
>>> {
>>>   __builtin_printf("%d %d", *p, *q);
>>> }
>>>
>>> should behave the same.  There isn't a printf modifier for an "aligned 

Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 11:44, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:20, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Biener wrote:
>>>
 On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 08:50, Richard Biener wrote:
>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
>> wrote:
>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>>>  wrote:
 -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
 it.

 The problem appears to be in laying out arguments, specifically
 varargs. From
 the "good" -fdump-rtl-expand:

 (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
 S4 A32])
 (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
  (nil))
 (insn 19 18 20 2 (set (reg:DF 2 r2)
 (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
  (nil))
 (insn 20 19 21 2 (set (reg:SI 1 r1)
 (reg:SI 113 [ b1 ])) reduced.c:14 -1
  (nil))
 (insn 21 20 22 2 (set (reg:SI 0 r0)
 (reg:SI 118)) reduced.c:14 -1
  (nil))
 (call_insn 22 21 23 2 (parallel [
 (set (reg:SI 0 r0)
 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
 ) [0 __builtin_printf S4
 A32])

 The struct members are
 reg:SI 113 => int a;
 reg:DF 112 => double b;
 reg:SI 111 => int c;

 r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
 pushed
 into virtual-outgoing-args. In contrast, post-change to
 build_ref_for_offset, we get:

 (insn 17 16 18 2 (set (reg:SI 118)
   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  >>> *.LC1>)) reduced.c:14 -1
  (nil))
 (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
 virtual-outgoing-args)
 (const_int 8 [0x8])) [0  S4 A64])
 (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
  (nil))
 (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
 S8 A64])
 (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
  (nil))
 (insn 20 19 21 2 (set (reg:SI 2 r2)
 (reg:SI 113 [ b1 ])) reduced.c:14 -1
  (nil))
 (insn 21 20 22 2 (set (reg:SI 0 r0)
 (reg:SI 118)) reduced.c:14 -1
  (nil))
 (call_insn 22 21 23 2 (parallel [
 (set (reg:SI 0 r0)
 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
 ) [0 __builtin_printf S4
 A32])

 r0 still gets the format string, but 'int b1.a' now goes in r2, and the

 double+following int are all pushed into virtual-outgoing-args. This is
 because
 arm_function_arg is fed a 64-bit-aligned int as type of the second
 argument (the
 type constructed by build_ref_for_offset); it then executes
 (aapcs_layout_arg,
 arm.c line ~~5914)

   /* C3 - For double-word aligned arguments, round the NCRN up to the
  next even number.  */
   ncrn = pcum->aapcs_ncrn;
   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
 ncrn++;

 Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
 testcase
 "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
 32-bit-aligned int instead, which works as previously.

 Passing the same members of that struct in a non-vargs call, works ok -
 I think
 because these use the type of the declared parameters, rather than the
 provided
 arguments, and the former do not have the increased alignment from
 build_ref_for_offset.
>>>
>>> It doesn't make sense to use the alignment of passed values.  That 
>>> looks like bs.
>>>
>>> This means that
>>>
>>> Int I __aligned__(8);
>>>
>>> Is passed differently than int.
>>>
>>> Arm_function_arg needs to be fixed.
>>
>> That is,
>>
>> typedef int myint __attribute__((aligned(8)));
>>
>> int main()
>> {
>>   myint i = 1;
>>   int j = 2;
>>   __builtin_printf("%d %d\n", i, j);
>> }
>>
>> or
>>
>> myint i;
>> int j;
>> myint *p = &i;
>> int *q = &j;
>>
>> int main()
>> {
>>   __builtin_printf("%d %d", *p, *q);
>> }
>>
>> should behave the same.  There isn't a printf modifier for an "aligned 
>> int"
>> because that sort of thing doesn't make sense.  Special-casing aligned 
>> vs.
>> non-aligned values only makes sense for things passed by value on the 
>> stack.
>> And then obviously only dependen

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Alan Lawrence wrote:

> Richard Biener wrote:
> > 
> > But I find it odd that on ARM passing *((aligned_int *)p) as
> > vararg (only as varargs?) changes calling conventions independent
> > of the function's type signature.
> 
> Does it? Do you have a testcase, and compilation flags, that'll make this show
> up in an RTL dump? I've tried numerous cases, including AFAICT yours, and I
> always get the value being passed in the expected ("unaligned") register?

Yep, it seems that loading from p with the following has a value of
type 'int', not 'myint'.

typedef int myint __attribute__((aligned(8)));
myint i;
myint *p = &i;

Well, the gimplifier does that:

/* Create a temporary with a name derived from VAL.  Subroutine of
   lookup_tmp_var; nobody else should call this function.  */

static inline tree
create_tmp_from_val (tree val)
{
  /* Drop all qualifiers and address-space information from the value 
type.  */
  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (val));
  tree var = create_tmp_var (type, get_name (val));

it even drops address-space info which looks suspicious to me for

int *p [addr-space:X];
int * [addr-space:X] *q = &p;

and

 **q;

it would generate

 int *tem = *q;
 int tem = *tem;

and thus drop the addr-space from the 2nd load.  Similar for atomics
if they are also on the variant chain (ah, but those are lowered
early, so it doesn't matter for them).

So you indeed need a testcase where the compiler creates a temporary
of such type (or you need a type the gimplifier doesn't consider
a register type).
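
As a small illustration of what the typedef above does and does not change, here is a sketch that assumes GCC's __alignof__ extension and the aligned attribute; aligned_typedef_properties is a made-up name. It only checks source-level properties (size, declared alignment, storage address, loaded value) and says nothing about which type the gimplifier picks for the temporary.

```c
#include <assert.h>
#include <stdint.h>

typedef int myint __attribute__((aligned(8)));

myint i = 1;
myint *p = &i;

/* Return 1 if the over-aligned typedef only affects storage alignment:
   same size as int, declared alignment of 8, an 8-byte aligned object,
   and a load through p that yields a plain int value.  */
int aligned_typedef_properties (void)
{
  int loaded = *p;   /* the loaded value is just an int */
  return sizeof (myint) == sizeof (int)
	 && __alignof__ (myint) == 8
	 && ((uintptr_t) p % 8) == 0
	 && loaded == 1;
}
```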

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


Re: New regression on ARM Linux

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> Richard Biener wrote:
> >
> >But I find it odd that on ARM passing *((aligned_int *)p) as
> >vararg (only as varargs?) changes calling conventions independent
> >of the function's type signature.
> 
> Does it? Do you have a testcase, and compilation flags, that'll make this
> show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> and I always get the value being passed in the expected ("unaligned")
> register?

If the integral type alignment right now matters, I'd try something like:

typedef int V __attribute__((aligned (8)));
V x;

int foo (int x, ...)
{
  int z;
  __builtin_va_list va;
  __builtin_va_start (va, x);
  switch (x)
    {
    case 1:
    case 3:
    case 6:
      z = __builtin_va_arg (va, int);
      break;
    default:
      z = __builtin_va_arg (va, V);
      break;
    }
  __builtin_va_end (va);
  return z;
}

int
bar (void)
{
  V v = 3;
  int w = 3;
  foo (1, (int) v);
  foo (2, (V) w);
  v = 3;
  w = (int) v;
  foo (3, w);
  foo (4, (V) w);
  v = (V) w;
  foo (5, v);
  foo (6, (int) v);
  foo (7, x);
  return 0;
}

(of course, most likely with passing a different value each time and
verification of the result).
As the compiler treats all those casts there as useless, I'd expect
that the types of the passed argument would be pretty much random.
And, note that even on x86_64, the __builtin_va_arg with V expands into
  # addr.1_3 = PHI 
  z_35 = MEM[(V * {ref-all})addr.1_3];
using exactly the same address for int as well as V va_arg - if you increase
the overalignment arbitrarily, it will surely be a wrong IL because nobody
really guarantees anything about the overalignment.
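
The "different value each time and verification of the result" variant mentioned above could look roughly like the sketch below. The values and the count_mismatches helper are made up for illustration, and the first parameter is renamed to sel so it does not shadow the global x. On a target where the argument type's alignment leaks into the calling convention, some of these calls would be expected to return the wrong value; elsewhere it should return 0.

```c
#include <assert.h>
#include <stdarg.h>

typedef int V __attribute__((aligned (8)));
V x = 70;

/* Return the first variadic argument, read either as int or as V
   depending on the selector, as in the testcase above.  */
__attribute__((noinline)) int
foo (int sel, ...)
{
  int z;
  va_list va;
  va_start (va, sel);
  switch (sel)
    {
    case 1:
    case 3:
    case 6:
      z = va_arg (va, int);
      break;
    default:
      z = va_arg (va, V);
      break;
    }
  va_end (va);
  return z;
}

/* Pass a distinct value in every call and count wrong results.  */
int
count_mismatches (void)
{
  V v = 10;
  int w = 20;
  int bad = 0;

  bad += foo (1, (int) v) != 10;
  bad += foo (2, (V) w) != 20;
  v = 30;
  w = (int) v;
  bad += foo (3, w) != 30;
  w = 40;
  bad += foo (4, (V) w) != 40;
  v = (V) 50;
  bad += foo (5, v) != 50;
  v = 60;
  bad += foo (6, (int) v) != 60;
  bad += foo (7, x) != 70;
  return bad;
}
```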

So, I think the tree-sra.c patch is a good idea - try to keep using the main
type variants as the types in the IL where possible except for the MEM_REF
first argument (i.e. even the lhs of the load should IMHO not be
overaligned).

As Eric Botcazou said, GCC right now isn't really prepared for under or
overaligned scalars, only when they are in structs (or for middle-end in
*MEM_REFs).

Jakub


Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:45, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 11:36, Richard Biener wrote:
> >>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>>
>  On 31/03/15 11:00, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >
> >> On 31/03/15 08:50, Richard Biener wrote:
> >>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> >>> wrote:
>  On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>   wrote:
> > -O2 was what I first used; it also occurs at -O1. -fno-tree-sra 
> > fixes
> > it.
> >
> > The problem appears to be in laying out arguments, specifically
> > varargs. From
> > the "good" -fdump-rtl-expand:
> >
> > (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) 
> > [0
> > S4 A32])
> > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >  (nil))
> > (insn 19 18 20 2 (set (reg:DF 2 r2)
> > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >  (nil))
> > (insn 20 19 21 2 (set (reg:SI 1 r1)
> > (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >  (nil))
> > (insn 21 20 22 2 (set (reg:SI 0 r0)
> > (reg:SI 118)) reduced.c:14 -1
> >  (nil))
> > (call_insn 22 21 23 2 (parallel [
> > (set (reg:SI 0 r0)
> > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > ) [0 
> > __builtin_printf S4
> > A32])
> >
> > The struct members are
> > reg:SI 113 => int a;
> > reg:DF 112 => double b;
> > reg:SI 111 => int c;
> >
> > r0 gets the format string; r1 gets int a; r2+r3 get double b; int c 
> > is
> > pushed
> > into virtual-outgoing-args. In contrast, post-change to
> > build_ref_for_offset, we get:
> >
> > (insn 17 16 18 2 (set (reg:SI 118)
> >   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]   > 0x2ba57fa8d750
> > *.LC1>)) reduced.c:14 -1
> >  (nil))
> > (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> > virtual-outgoing-args)
> > (const_int 8 [0x8])) [0  S4 A64])
> > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >  (nil))
> > (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) 
> > [0
> > S8 A64])
> > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >  (nil))
> > (insn 20 19 21 2 (set (reg:SI 2 r2)
> > (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >  (nil))
> > (insn 21 20 22 2 (set (reg:SI 0 r0)
> > (reg:SI 118)) reduced.c:14 -1
> >  (nil))
> > (call_insn 22 21 23 2 (parallel [
> > (set (reg:SI 0 r0)
> > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > ) [0 
> > __builtin_printf S4
> > A32])
> >
> > r0 still gets the format string, but 'int b1.a' now goes in r2, and 
> > the
> >
> > double+following int are all pushed into virtual-outgoing-args. 
> > This is
> > because
> > arm_function_arg is fed a 64-bit-aligned int as type of the second
> > argument (the
> > type constructed by build_ref_for_offset); it then executes
> > (aapcs_layout_arg,
> > arm.c line ~~5914)
> >
> >   /* C3 - For double-word aligned arguments, round the NCRN up to 
> > the
> >  next even number.  */
> >   ncrn = pcum->aapcs_ncrn;
> >   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> > ncrn++;
> >
> > Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> > testcase
> > "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be 
> > fed a
> > 32-bit-aligned int instead, which works as previously.
> >
> > Passing the same members of that struct in a non-vargs call, works 
> > ok -
> > I think
> > because these use the type of the declared parameters, rather than 
> > the
> > provided
> > arguments, and the former do not have the increased alignment from
> > build_ref_for_offset.
> 
>  It doesn't make sense to use the alignment of passed values.  That 
>  looks like bs.
> 
>  This means that
> 
>  Int I __aligned__(8);
> 
>  Is passed differently than int.
> 
>  Arm_function_arg needs to be fixed.
> >>>
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>

Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Jakub Jelinek wrote:

> On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> > Richard Biener wrote:
> > >
> > >But I find it odd that on ARM passing *((aligned_int *)p) as
> > >vararg (only as varargs?) changes calling conventions independent
> > >of the function's type signature.
> > 
> > Does it? Do you have a testcase, and compilation flags, that'll make this
> > show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> > and I always get the value being passed in the expected ("unaligned")
> > register?
> 
> If the integral type alignment right now matters, I'd try something like:
> 
> typedef int V __attribute__((aligned (8)));
> V x;
> 
> int foo (int x, ...)
> {
>   int z;
>   __builtin_va_list va;
>   __builtin_va_start (va, x);
>   switch (x)
>     {
>     case 1:
>     case 3:
>     case 6:
>       z = __builtin_va_arg (va, int);
>       break;
>     default:
>       z = __builtin_va_arg (va, V);
>       break;
>     }
>   __builtin_va_end (va);
>   return z;
> }
> 
> int
> bar (void)
> {
>   V v = 3;
>   int w = 3;
>   foo (1, (int) v);
>   foo (2, (V) w);
>   v = 3;
>   w = (int) v;
>   foo (3, w);
>   foo (4, (V) w);
>   v = (V) w;
>   foo (5, v);
>   foo (6, (int) v);
>   foo (7, x);
>   return 0;
> }
> 
> (of course, most likely with passing a different value each time and
> verification of the result).
> As the compiler treats all those casts there as useless, I'd expect
> that the types of the passed argument would be pretty much random.
> And, note that even on x86_64, the __builtin_va_arg with V expands into
>   # addr.1_3 = PHI 
>   z_35 = MEM[(V * {ref-all})addr.1_3];
> using exactly the same address for int as well as V va_arg - if you increase
> the overalignment arbitrarily, it will surely be a wrong IL because nobody
> really guarantees anything about the overalignment.
> 
> So, I think the tree-sra.c patch is a good idea - try to keep using the main
> type variants as the types in the IL where possible except for the MEM_REF
> first argument (i.e. even the lhs of the load should IMHO not be
> overaligned).

Yeah, I'm testing it right now as it seems to fix the regression and
should be certainly safe.

Richard.

> As Eric Botcazou said, GCC right now isn't really prepared for under or
> overaligned scalars, only when they are in structs (or for middle-end in
> *MEM_REFs).
>   Jakub
> 
> 



[patch emutls]:Fix PR 65566

2015-03-31 Thread Kai Tietz
Hi,

This patch avoids operating on a function decl whose cfun is NULL
within lower_emutls_function_body.

ChangeLog

2015-03-31  Kai Tietz  

PR target/65566
* tree-emutls.c (lower_emutls_function_body): Don't try to
operate if node's decl function is NULL.

OK to apply?

Regards,
Kai

Index: tree-emutls.c
===
--- tree-emutls.c   (revision 221789)
+++ tree-emutls.c   (working copy)
@@ -635,6 +635,12 @@ lower_emutls_function_body (struct cgraph_node *no

   push_cfun (DECL_STRUCT_FUNCTION (node->decl));

+  if (!cfun)
+    {
+      pop_cfun ();
+      return;
+    }
+
   d.cfun_node = node;
   d.builtin_decl = builtin_decl_explicit (BUILT_IN_EMUTLS_GET_ADDRESS);
   /* This is where we introduce the declaration to the IL and so we have to


Re: [patch emutls]:Fix PR 65566

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 01:35:56PM +0200, Kai Tietz wrote:
> Hi,
> 
> This patch avoids operating on a function decl whose cfun is NULL
> within lower_emutls_function_body.

If DECL_STRUCT_FUNCTION (node->decl) is already NULL (for which functions?),
then what is the point doing push_cfun/pop_cfun in that case at all?
Shouldn't you just early return if DECL_STRUCT_FUNCTION (node->decl) is
NULL?
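
The early-return shape suggested here can be sketched outside of GCC as follows. Everything in this block is a hypothetical stand-in (the struct layouts, the counters, lower_body_sketch); only the control flow is the point: test the struct function first and skip the push_cfun/pop_cfun pair entirely when it is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's types; not the real definitions.  */
struct function { int lowered; };
struct decl { struct function *struct_function; };
struct cgraph_node { struct decl *decl; };

static int cfun_depth;   /* nesting depth of push_cfun/pop_cfun */
static int push_count;   /* how many times push_cfun was called */

static void push_cfun (struct function *fn)
{
  (void) fn;
  cfun_depth++;
  push_count++;
}

static void pop_cfun (void)
{
  cfun_depth--;
}

/* Return 1 if the body was lowered, 0 if there was nothing to do.  */
static int lower_body_sketch (struct cgraph_node *node)
{
  struct function *fn = node->decl->struct_function;

  if (fn == NULL)
    return 0;            /* bail out before touching cfun at all */

  push_cfun (fn);
  fn->lowered = 1;       /* placeholder for the real lowering work */
  pop_cfun ();
  return 1;
}
```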
> 
> ChangeLog
> 
> 2015-03-31  Kai Tietz  
> 
> PR target/65566
> * tree-emutls.c (lower_emutls_function_body): Don't try to
> operate if node's decl function is NULL.
> 
> OK to apply?
> 
> Regards,
> Kai
> 
> Index: tree-emutls.c
> ===
> --- tree-emutls.c   (revision 221789)
> +++ tree-emutls.c   (working copy)
> @@ -635,6 +635,12 @@ lower_emutls_function_body (struct cgraph_node *no
> 
>    push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> 
> +  if (!cfun)
> +    {
> +      pop_cfun ();
> +      return;
> +    }
> +
>    d.cfun_node = node;
>    d.builtin_decl = builtin_decl_explicit (BUILT_IN_EMUTLS_GET_ADDRESS);
>    /* This is where we introduce the declaration to the IL and so we have to

Jakub


Re: Fix ice on comdat groups with -check-pointer-bounds

2015-03-31 Thread Ilya Enkovich
2015-03-30 20:20 GMT+03:00 Jan Hubicka :
>> On 30 Mar 14:05, Ilya Enkovich wrote:
>> > 2015-03-27 18:23 GMT+03:00 Jan Hubicka :
>> > > Index: symtab.c
>> > > ===
>> > > --- symtab.c(revision 221734)
>> > > +++ symtab.c(working copy)
>> > > @@ -1130,15 +1130,20 @@ symtab_node::verify_symtab_nodes (void)
>> > >   &existed);
>> > >   if (!existed)
>> > > *entry = node;
>> > > - else
>> > > -   for (s = (*entry)->same_comdat_group; s != NULL && s != 
>> > > node; s = s->same_comdat_group)
>> > > + else if (!DECL_EXTERNAL (node->decl))
>> > > +   {
>> > > + for (s = (*entry)->same_comdat_group; s != NULL && s != 
>> > > node;
>> > > +  s = s->same_comdat_group)
>> > > +   ;
>> >
>> > With no if-statement in the loop body you need an additional exit
>> > condition for a case when you reach the entry.
>> >
>> > Thanks,
>> > Ilya
>> >
>>
>> Here is a patch to add a testcase, fix the loop and avoid same_comdat_group 
>> for instrumented external symbols.  Does it look OK?
>
> OK
>>
>> BTW should we check same_comdat_group is NULL for external symbols?
>
> We could do that, yes.  If you can come up with a patch, it is OK for next 
> stage1.

Bootstrap fails with such a check.  It seems we may have same_comdat_group
set for external symbols with a body.

Ilya

>
> Honza


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Marek Polacek
On Tue, Mar 24, 2015 at 01:14:50AM -0400, Jason Merrill wrote:

Here's my shot at this.

> The problem is that the type is considered dependent in a template but is
> not actually dependent, so we can see the exact same type outside a template

Yeah, I think this is true...

> and it's not dependent.  So, this code is creating the difference:
 
> >  /* We can only call value_dependent_expression_p on integral constant
> > expressions; treat non-constant expressions as dependent, too.  */
> >  if (processing_template_decl
> >  && (type_dependent_expression_p (size)
> >  || !TREE_CONSTANT (size) || value_dependent_expression_p (size)))
> 
> Now that we have instantiation_dependent_expression_p, we should be able to
> use that instead of checking type/value dependency separately.

...but I think there's another place where things go wrong.  ISTM that in
build_cplus_array_type we consider all arrays with non-constant index as
dependent (when processing_template_decl) -- but as the testcase shows, this
is not always true.  The fix could then look like the following, though I
wouldn't be surprised if this were the wrong way to go about it.

Bootstrapped/regtested on x86_64-linux.  Not a regression, so we might want to
defer this patch to the next stage1.

2015-03-31  Marek Polacek  

PR c++/65390
* tree.c (build_cplus_array_type): Use dependent_type_p rather than
checking for constness.

* g++.dg/template/pr65390.C: New test.

diff --git gcc/cp/tree.c gcc/cp/tree.c
index ef53aff..97bccc0 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -822,10 +822,9 @@ build_cplus_array_type (tree elt_type, tree index_type)
   if (elt_type == error_mark_node || index_type == error_mark_node)
 return error_mark_node;
 
-  bool dependent
-= (processing_template_decl
-   && (dependent_type_p (elt_type)
-  || (index_type && !TREE_CONSTANT (TYPE_MAX_VALUE (index_type)))));
+  bool dependent = (processing_template_decl
+   && (dependent_type_p (elt_type)
+   || (index_type && dependent_type_p (index_type))));
 
   if (elt_type != TYPE_MAIN_VARIANT (elt_type))
 /* Start with an array of the TYPE_MAIN_VARIANT.  */
diff --git gcc/testsuite/g++.dg/template/pr65390.C 
gcc/testsuite/g++.dg/template/pr65390.C
index e69de29..299d22a 100644
--- gcc/testsuite/g++.dg/template/pr65390.C
+++ gcc/testsuite/g++.dg/template/pr65390.C
@@ -0,0 +1,12 @@
+// PR c++/65390
+// { dg-do compile }
+// { dg-options "" }
+
+template struct shared_ptr { };
+
+template
+shared_ptr make_shared(Arg) { return shared_ptr(); } // { dg-error 
"variably modified type|trying to instantiate" }
+
+void f(int n){
+  make_shared(1); // { dg-error "no matching function" }
+}

Marek


Re: New regression on ARM Linux

2015-03-31 Thread Alan Lawrence

Richard Earnshaw wrote:

On 31/03/15 11:45, Richard Biener wrote:

On Tue, 31 Mar 2015, Richard Earnshaw wrote:


On 31/03/15 11:36, Richard Biener wrote:

On Tue, 31 Mar 2015, Richard Earnshaw wrote:


On 31/03/15 11:00, Richard Biener wrote:

On Tue, 31 Mar 2015, Richard Earnshaw wrote:


On 31/03/15 08:50, Richard Biener wrote:

On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  wrote:

On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence  
wrote:

-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
it.

The problem appears to be in laying out arguments, specifically
varargs. From
the "good" -fdump-rtl-expand:

(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
S4 A32])
(reg:SI 111 [ b1$16 ])) reduced.c:14 -1
 (nil))
(insn 19 18 20 2 (set (reg:DF 2 r2)
(reg:DF 112 [ b1$8 ])) reduced.c:14 -1
 (nil))
(insn 20 19 21 2 (set (reg:SI 1 r1)
(reg:SI 113 [ b1 ])) reduced.c:14 -1
 (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
(reg:SI 118)) reduced.c:14 -1
 (nil))
(call_insn 22 21 23 2 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
) [0 __builtin_printf S4
A32])

The struct members are
reg:SI 113 => int a;
reg:DF 112 => double b;
reg:SI 111 => int c;

r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
pushed
into virtual-outgoing-args. In contrast, post-change to
build_ref_of_offset, we get:

(insn 17 16 18 2 (set (reg:SI 118)
  (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  )) reduced.c:14 -1
 (nil))
(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
virtual-outgoing-args)
(const_int 8 [0x8])) [0  S4 A64])
(reg:SI 111 [ b1$16 ])) reduced.c:14 -1
 (nil))
(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
S8 A64])
(reg:DF 112 [ b1$8 ])) reduced.c:14 -1
 (nil))
(insn 20 19 21 2 (set (reg:SI 2 r2)
(reg:SI 113 [ b1 ])) reduced.c:14 -1
 (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
(reg:SI 118)) reduced.c:14 -1
 (nil))
(call_insn 22 21 23 2 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
) [0 __builtin_printf S4
A32])

r0 still gets the format string, but 'int b1.a' now goes in r2, and the

double+following int are all pushed into virtual-outgoing-args. This is
because
arm_function_arg is fed a 64-bit-aligned int as type of the second
argument (the
type constructed by build_ref_for_offset); it then executes
(aapcs_layout_arg,
arm.c line ~~5914)

  /* C3 - For double-word aligned arguments, round the NCRN up to the
 next even number.  */
  ncrn = pcum->aapcs_ncrn;
  if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
ncrn++;

Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
testcase
"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
32-bit-aligned int instead, which works as previously.

Passing the same members of that struct in a non-vargs call, works ok -
I think
because these use the type of the declared parameters, rather than the
provided
arguments, and the former do not have the increased alignment from
build_ref_for_offset.
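
The effect of rule C3 on the two dumps above can be reproduced with a small model. This is a simplified sketch, not the code in arm.c: `struct arg`, `struct placement` and `layout_args` are hypothetical names, only 4- and 8-byte alignment is modelled, and splitting an argument between registers and the stack is ignored.

```c
#include <assert.h>

/* Hypothetical, simplified argument descriptor: size and alignment in
   bytes.  */
struct arg
{
  unsigned size;
  unsigned align;   /* 4 or 8 */
};

/* Where an argument ends up: REG is the first core register (0-3), or
   -1 for the stack, in which case STACK_OFF is the offset from the
   outgoing-args pointer.  */
struct placement
{
  int reg;
  unsigned stack_off;
};

static void
layout_args (const struct arg *args, int n, struct placement *out)
{
  unsigned ncrn = 0;  /* next core register number */
  unsigned nsaa = 0;  /* next stacked argument address */

  for (int i = 0; i < n; i++)
    {
      unsigned words = (args[i].size + 3) / 4;

      assert (args[i].align == 4 || args[i].align == 8);

      /* C3: for double-word aligned arguments, round the NCRN up to
         the next even number.  */
      if (args[i].align == 8 && (ncrn & 1))
        ncrn++;

      if (ncrn + words <= 4)
        {
          /* The whole argument fits in the remaining core registers.  */
          out[i].reg = (int) ncrn;
          out[i].stack_off = 0;
          ncrn += words;
        }
      else
        {
          /* Out of registers: everything from here on goes to the
             stack (no register/stack splitting in this sketch).  */
          ncrn = 4;
          nsaa = (nsaa + args[i].align - 1) & ~(args[i].align - 1);
          out[i].reg = -1;
          out[i].stack_off = nsaa;
          nsaa += words * 4;
        }
    }
}
```

Feeding it { format pointer, int, double, int } with the first int 4-byte aligned gives r0, r1, r2 and stack offset 0, matching the "good" dump. Marking that int as 8-byte aligned gives r0, r2, stack offset 0 and stack offset 8, matching the second dump.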

It doesn't make sense to use the alignment of passed values.  That looks like 
bs.

This means that

Int I __aligned__(8);

Is passed differently than int.

Arm_function_arg needs to be fixed.

That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.  Special-casing aligned vs.
non-aligned values only makes sense for things passed by value on the stack.
And then obviously dependent only on the function type signature, not
on the type of the passed value.


I think the testcase is ill-formed.  Just because printf doesn't have
such a modifier, doesn't mean that another variadic function might not
have a means to detect when an object in the variadic list needs to be
over-aligned.  As such, the test should really be written as:

A value doesn't have "alignment".  A function may have alignment
requirements on its arguments, clearly printf doesn't.


Values don't.  But types do and variadic functions are special in that
they derive their types from the types of the actual parameters passed
not from the formals in the prototype.  Any manipulation of the types
should be done in the front end, not in the back end.

The following seems to help the testcase (by luck I'd say?).  It
makes us drop alignment information from the temporaries that
SRA creates as memory replacement.

But I find it odd that on ARM passing *((aligned_int *)p) as
vararg (only as varargs?) changes calling conventions independent
of the functions type signature.

Ric

Re: New regression on ARM Linux

2015-03-31 Thread Richard Earnshaw
On 31/03/15 12:08, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:45, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
 On 31/03/15 11:36, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>
>> On 31/03/15 11:00, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
 On 31/03/15 08:50, Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener  
> wrote:
>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>>  wrote:
>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra 
>>> fixes
>>> it.
>>>
>>> The problem appears to be in laying out arguments, specifically
>>> varargs. From
>>> the "good" -fdump-rtl-expand:
>>>
>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) 
>>> [0
>>> S4 A32])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 
>>> __builtin_printf S4
>>> A32])
>>>
>>> The struct members are
>>> reg:SI 113 => int a;
>>> reg:DF 112 => double b;
>>> reg:SI 111 => int c;
>>>
>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c 
>>> is
>>> pushed
>>> into virtual-outgoing-args. In contrast, post-change to
>>> build_ref_of_offset, we get:
>>>
>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  >> 0x2ba57fa8d750
>>> *.LC1>)) reduced.c:14 -1
>>>  (nil))
>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>> virtual-outgoing-args)
>>> (const_int 8 [0x8])) [0  S4 A64])
>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) 
>>> [0
>>> S8 A64])
>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>  (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>> (reg:SI 118)) reduced.c:14 -1
>>>  (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>> (set (reg:SI 0 r0)
>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> ) [0 
>>> __builtin_printf S4
>>> A32])
>>>
>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and 
>>> the
>>>
>>> double+following int are all pushed into virtual-outgoing-args. 
>>> This is
>>> because
>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>> argument (the
>>> type constructed by build_ref_for_offset); it then executes
>>> (aapcs_layout_arg,
>>> arm.c line ~~5914)
>>>
>>>   /* C3 - For double-word aligned arguments, round the NCRN up to 
>>> the
>>>  next even number.  */
>>>   ncrn = pcum->aapcs_ncrn;
>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>> ncrn++;
>>>
>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>> testcase
>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be 
>>> fed a
>>> 32-bit-aligned int instead, which works as previously.
>>>
>>> Passing the same members of that struct in a non-vargs call, works 
>>> ok -
>>> I think
>>> because these use the type of the declared parameters, rather than 
>>> the
>>> provided
>>> arguments, and the former do not have the increased alignment from
>>> build_ref_for_offset.
>>
>> It doesn't make sense to use the alignment of passed values.  That 
>> looks like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
>
> That is,
>
> typedef int 

[PATCH] S390: Nested functions can be hotpatched like all other functions.

2015-03-31 Thread Dominik Vogt
The attached patch removes the special handling for nested
functions regarding the hotpatch feature on S390, i.e. nested
functions are made hotpatchable by default.

This also fixes a bug that caused part of the hotpatch prologue
to be generated for nested functions.  A corresponding test case
is added.

The patch also cleans up the source code comments of the hotpatch
feature.

The patch is provided for 5.0 and as backports for 4.8 and 4.9.
For some reason 4.8 was missing a couple minor changes from 5.0,
so the first patch for 4.8 corrects this, and the real change is
added on top of that.

--

We accidentally committed this patch before posting it.
Sorry about that.

Common ChangeLog:
-

gcc/ChangeLog:

* config/s390/s390.c (s390_function_num_hotpatch_hw): Allow hotpatching
nested functions.
(s390_reorg): Adapt to new signature of s390_function_num_hotpatch_hw.
(s390_asm_output_function_label): Adapt to new signature of
s390_function_num_hotpatch_hw.
Optimise the code generating assembler output.
Add comments to assembler file.

gcc/testsuite/ChangeLog:

* gcc.target/s390/hotpatch-25.c: New test.
* gcc.target/s390/hotpatch-1.c: Update test.
* gcc.target/s390/hotpatch-10.c: Update test.
* gcc.target/s390/hotpatch-11.c: Update test.
* gcc.target/s390/hotpatch-12.c: Update test.
* gcc.target/s390/hotpatch-13.c: Update test.
* gcc.target/s390/hotpatch-14.c: Update test.
* gcc.target/s390/hotpatch-15.c: Update test.
* gcc.target/s390/hotpatch-16.c: Update test.
* gcc.target/s390/hotpatch-17.c: Update test.
* gcc.target/s390/hotpatch-18.c: Update test.
* gcc.target/s390/hotpatch-19.c: Update test.
* gcc.target/s390/hotpatch-2.c: Update test.
* gcc.target/s390/hotpatch-21.c: Update test.
* gcc.target/s390/hotpatch-22.c: Update test.
* gcc.target/s390/hotpatch-23.c: Update test.
* gcc.target/s390/hotpatch-24.c: Update test.
* gcc.target/s390/hotpatch-3.c: Update test.
* gcc.target/s390/hotpatch-4.c: Update test.
* gcc.target/s390/hotpatch-5.c: Update test.
* gcc.target/s390/hotpatch-6.c: Update test.
* gcc.target/s390/hotpatch-7.c: Update test.
* gcc.target/s390/hotpatch-8.c: Update test.
* gcc.target/s390/hotpatch-9.c: Update test.
* gcc.target/s390/hotpatch-compile-16.c: Update test.

ChangeLog for the additional 4.8 changes:
-

gcc/ChangeLog:

* config/s390/s390.c (s390_function_num_hotpatch_hw): Remove special
cases for not hotpatching main () and artificial functions.

gcc/testsuite/ChangeLog:

* gcc.target/s390/hotpatch-compile-16.c: Remove include of stdio.h.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
From a5947f5201b8d1ef7b52cf2cf02cac8365005c77 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 27 Mar 2015 11:41:56 +0100
Subject: [PATCH] S390: Nested functions can be hotpatched like all other
 functions.

Also add comments to assembler files to mark the code inserted for hotpatching.
---
 gcc/config/s390/s390.c | 53 ++
 gcc/testsuite/gcc.target/s390/hotpatch-1.c |  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-10.c|  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-11.c|  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-12.c|  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-13.c|  4 ++
 gcc/testsuite/gcc.target/s390/hotpatch-14.c|  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-15.c|  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-16.c|  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-17.c|  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-18.c|  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-19.c|  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-2.c |  4 +-
 gcc/testsuite/gcc.target/s390/hotpatch-21.c|  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-22.c|  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-23.c|  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-24.c|  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-25.c| 33 ++
 gcc/testsuite/gcc.target/s390/hotpatch-3.c |  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-4.c |  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-5.c |  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-6.c |  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-7.c |  2 +
 gcc/testsuite/gcc.target/s390/hotpatch-8.c |  3 ++
 gcc/testsuite/gcc.target/s390/hotpatch-9.c |  2 +
 .../gcc.target/s390/hotpatch-compile-16.c  |  4 +-
 26 files changed, 110 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-25.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index d2b8704..7d16048 100644
--- a/gcc

Re: [patch emutls]:Fix PR 65566

2015-03-31 Thread Kai Tietz
2015-03-31 13:42 GMT+02:00 Jakub Jelinek :
> On Tue, Mar 31, 2015 at 01:35:56PM +0200, Kai Tietz wrote:
>> Hi,
>>
>> This patch avoids operating on a function-decl whose cfun is NULL
>> within lower_emutls_function_body.
>
> If DECL_STRUCT_FUNCTION (node->decl) is already NULL (for which functions?),
> then what is the point of doing push_cfun/pop_cfun in that case at all?
> Shouldn't you just early return if DECL_STRUCT_FUNCTION (node->decl) is
> NULL?

In the testcase, the function body of the function-decl "mpx_test" is NULL.
...
addressable used static QI file thread-local-var-1-lbv.c line 28
col 5 align  8 context 
attributes  initial 
result 
ignored SI file thread-local-var-1-lbv.c line 28 col 5 size
 unit size 
align 32 context > chain
>

Fair point, so patch is:

Index: tree-emutls.c
===
--- tree-emutls.c   (Revision 221789)
+++ tree-emutls.c   (Arbeitskopie)
@@ -633,6 +633,9 @@ lower_emutls_function_body (struct cgraph_node *no
   struct lower_emutls_data d;
   bool any_edge_inserts = false;

+  if (!DECL_STRUCT_FUNCTION (node->decl))
+return;
+
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));

   d.cfun_node = node;
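
A minimal sketch of why the early return is the simpler shape: checking before the push means there is never an unmatched push to undo. The names below (`struct decl`, `push_ctx`, `pop_ctx`, `lower_body`) are hypothetical stand-ins and not GCC's cfun machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: BODY plays the role of
   DECL_STRUCT_FUNCTION (node->decl).  */
struct fn_body { int unused; };
struct decl { struct fn_body *body; };

static int ctx_depth;  /* models the depth of the function-context stack */

static void
push_ctx (struct fn_body *body)
{
  assert (body != NULL);
  ctx_depth++;
}

static void
pop_ctx (void)
{
  assert (ctx_depth > 0);
  ctx_depth--;
}

/* Return false without touching the context stack when there is no
   body; otherwise push, do the work, and pop again.  */
static bool
lower_body (struct decl *d)
{
  if (!d->body)
    return false;

  push_ctx (d->body);
  /* ... the actual lowering would happen here ... */
  pop_ctx ();
  return true;
}
```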


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Kai Tietz
Hi,

I had tried the same approach as Marek.  For me it solved the PR, but
caused other regressions on bootstrap.  So I dropped the approach via
dependent_type_p.

Well, this bootstrap issue might be caused by some local changes I had
forgotten to remove, but I doubt it.
Marek, have you tried to do a bootstrap with your patch?

Kai

2015-03-31 13:50 GMT+02:00 Marek Polacek :
> On Tue, Mar 24, 2015 at 01:14:50AM -0400, Jason Merrill wrote:
>
> Here's my shot at this.
>
>> The problem is that the type is considered dependent in a template but is
>> not actually dependent, so we can see the exact same type outside a template
>
> Yeah, I think this is true...
>
>> and it's not dependent.  So, this code is creating the difference:
>
>> >  /* We can only call value_dependent_expression_p on integral constant
>> > expressions; treat non-constant expressions as dependent, too.  */
>> >  if (processing_template_decl
>> >  && (type_dependent_expression_p (size)
>> >  || !TREE_CONSTANT (size) || value_dependent_expression_p (size)))
>>
>> Now that we have instantiation_dependent_expression_p, we should be able to
>> use that instead of checking type/value dependency separately.
>
> ...but I think there's another place where things go wrong.  ISTM that in
> build_cplus_array_type we consider all arrays with non-constant index as
> dependent (when processing_template_decl) -- but as the testcase shows, this
> is not always true.  The fix could then look like the following, though I
> wouldn't be surprised if this were the wrong way to go about it.
>
> Bootstrapped/regtested on x86_64-linux.  Not a regression, so we might want to
> defer this patch to the next stage1.
>
> 2015-03-31  Marek Polacek  
>
> PR c++/65390
> * tree.c (build_cplus_array_type): Use dependent_type_p rather than
> checking for constness.
>
> * g++.dg/template/pr65390.C: New test.
>
> diff --git gcc/cp/tree.c gcc/cp/tree.c
> index ef53aff..97bccc0 100644
> --- gcc/cp/tree.c
> +++ gcc/cp/tree.c
> @@ -822,10 +822,9 @@ build_cplus_array_type (tree elt_type, tree index_type)
>if (elt_type == error_mark_node || index_type == error_mark_node)
>  return error_mark_node;
>
> -  bool dependent
> -= (processing_template_decl
> -   && (dependent_type_p (elt_type)
> -  || (index_type && !TREE_CONSTANT (TYPE_MAX_VALUE (index_type)))));
> +  bool dependent = (processing_template_decl
> +   && (dependent_type_p (elt_type)
> +   || (index_type && dependent_type_p (index_type))));
>
>if (elt_type != TYPE_MAIN_VARIANT (elt_type))
>  /* Start with an array of the TYPE_MAIN_VARIANT.  */
> diff --git gcc/testsuite/g++.dg/template/pr65390.C 
> gcc/testsuite/g++.dg/template/pr65390.C
> index e69de29..299d22a 100644
> --- gcc/testsuite/g++.dg/template/pr65390.C
> +++ gcc/testsuite/g++.dg/template/pr65390.C
> @@ -0,0 +1,12 @@
> +// PR c++/65390
> +// { dg-do compile }
> +// { dg-options "" }
> +
> +template struct shared_ptr { };
> +
> +template
> +shared_ptr make_shared(Arg) { return shared_ptr(); } // { dg-error 
> "variably modified type|trying to instantiate" }
> +
> +void f(int n){
> +  make_shared(1); // { dg-error "no matching function" }
> +}
>
> Marek


Re: ipa-cp heuristic tweek

2015-03-31 Thread Ilya Enkovich
2015-03-29 18:43 GMT+03:00 Jan Hubicka :
> Hi,
> this patch improves crafty performance by avoiding ipa-cp cloning of the
> Search function that specializes the first iteration of the recursion.
> The patch is by Martin; I only tested it and cleaned up the code in
> count_callers and set_single_call_flag.
>
> Bootstrapped/regtested x86_64-linux, committed.
> PR ipa/65478
> * params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New.
> (PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
> * ipa-prop.h (ipa_node_params): New flags node_within_scc and
> node_calling_single_call.
> * ipa-cp.c (count_callers): New function.
> (set_single_call_flag): Likewise.
> (initialize_node_lattices): Count callers and set single_flag_call if
> necessary.
> (incorporate_penalties): New function.
> (good_cloning_opportunity_p): Use it, dump new flags.
> (propagate_constants_topo): Set node_within_scc flag if appropriate.
> * doc/invoke.texi (ipa-cp-recursion-penalty,
> ipa-cp-single-call-penalty): Document.
> Index: params.def
> ===
> --- params.def  (revision 221757)
> +++ params.def  (working copy)
> @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD,
>   "beneficial to clone.",
>   500, 0, 0)
>
> +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY,
> + "ipa-cp-recursion-penalty",
> + "Percentage penalty the recursive functions will receive when they "
> + "are evaluated for cloning.",
> + 40, 0, 100)
> +
> +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY,
> + "ipa-cp-single-call-penalty",
> + "Percentage penalty functions containg a single call to another "
> + "function will receive when they are evaluated for cloning.",
> + 15, 0, 100)
> +
>  DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS,
>   "ipa-max-agg-items",
>   "Maximum number of aggregate content items for a parameter in "
> Index: ipa-prop.h
> ===
> --- ipa-prop.h  (revision 221757)
> +++ ipa-prop.h  (working copy)
> @@ -330,6 +330,10 @@ struct ipa_node_params
>/* Node has been completely replaced by clones and will be removed after
>   ipa-cp is finished.  */
>unsigned node_dead : 1;
> +  /* Node is involved in a recursion, potentionally indirect.  */
> +  unsigned node_within_scc : 1;
> +  /* Node is calling a private function called only once.  */
> +  unsigned node_calling_single_call : 1;
>  };
>
>  /* ipa_node_params access functions.  Please use these to access fields that
> Index: ipa-cp.c
> ===
> --- ipa-cp.c(revision 221757)
> +++ ipa-cp.c(working copy)
> @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p
>return ret;
>  }
>
> +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA
> +   points to by the number of callers to NODE.  */
> +
> +static bool
> +count_callers (cgraph_node *node, void *data)
> +{
> +  int *caller_count = (int *) data;
> +
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +/* Local thunks can be handled transparently, but if the thunk can not
> +   be optimized out, count it as a real use.  */
> +if (!cs->caller->thunk.thunk_p || !cs->caller->local.local)
> +  ++*caller_count;
> +  return false;
> +}
> +
> +/* Worker of call_for_symbol_thunks_and_aliases, it is supposed to be called 
> on
> +   the one caller of some other node.  Set the caller's corresponding flag.  
> */
> +
> +static bool
> +set_single_call_flag (cgraph_node *node, void *)
> +{
> +  cgraph_edge *cs = node->callers;
> +  /* Local thunks can be handled transparently, skip them.  */
> +  while (cs && cs->caller->thunk.thunk_p && cs->caller->local.local)
> +cs = cs->next_caller;
> +  if (cs)
> +{
> +  gcc_assert (!cs->next_caller);

This assert assumes the only non-thunk caller is always at the end of
the callers list. Is that actually guaranteed?
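
The ordering concern can be made concrete with a small model. This is a sketch under assumptions: `struct caller` is a hypothetical stand-in for a call edge, with one flag standing for "the caller is a local thunk", and neither function is taken from ipa-cp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call edge in a node's callers list.  */
struct caller
{
  bool local_thunk;            /* caller is a local thunk */
  struct caller *next_caller;  /* next edge in the list, or NULL */
};

/* Models the quoted logic: skip leading local thunks, then expect the
   first remaining edge to be the last one.  Returns false in the cases
   where the assertion would fire.  */
static bool
leading_skip_holds (const struct caller *cs)
{
  while (cs && cs->local_thunk)
    cs = cs->next_caller;
  return cs == NULL || cs->next_caller == NULL;
}

/* Order-independent alternative: count the non-thunk edges wherever
   they sit in the list.  */
static int
count_real_callers (const struct caller *cs)
{
  int n = 0;

  for (; cs; cs = cs->next_caller)
    if (!cs->local_thunk)
      n++;
  return n;
}
```

For the list "thunk, real caller" both views agree. For "real caller, thunk" there is still exactly one real caller, but the leading-skip check fails, which is the case the question is about.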

> +  IPA_NODE_REF (cs->caller)->node_calling_single_call = true;
> +  return true;
> +}
> +  return false;
> +}
> +
>  /* Initialize ipcp_lattices.  */


Thanks,
Ilya


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Marek Polacek
On Tue, Mar 31, 2015 at 02:25:14PM +0200, Kai Tietz wrote:
> Hi,
> 
> I had tried the same approach as Marek.  For me it solved the PR, but
> caused other regressions on bootstrap.  So I dropped the approach via
> dependent_type_p.
> 
> Well, this bootstrap issue might be caused by some local changes I had
> forgotten to remove, but I doubt it.
> Marek, have you tried to do a bootstrap with your patch?

Of course, with --enable-languages=all.  I'll re-run the bootstrap with more
languages enabled, though.

Marek


Re: [patch emutls]:Fix PR 65566

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 02:21:23PM +0200, Kai Tietz wrote:
> addressable used static QI file thread-local-var-1-lbv.c line 28
> col 5 align  8 context 
> attributes  initial 
> result 
> ignored SI file thread-local-var-1-lbv.c line 28 col 5 size
>  unit size 
> align 32 context > chain
> >
> 
> Fair point, so patch is:
> 
> Index: tree-emutls.c
> ===
> --- tree-emutls.c   (Revision 221789)
> +++ tree-emutls.c   (Arbeitskopie)
> @@ -633,6 +633,9 @@ lower_emutls_function_body (struct cgraph_node *no
>struct lower_emutls_data d;
>bool any_edge_inserts = false;
> 
> +  if (!DECL_STRUCT_FUNCTION (node->decl))
> +return;
> +
>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> 
>d.cfun_node = node;

E.g. ipa_pta_execute uses instead of the node->lowered guard
  FOR_EACH_DEFINED_FUNCTION (node)
{
  varinfo_t vi;
  /* Nodes without a body are not interesting.  Especially do not
 visit clones at this point for now - we get duplicate decls
 there for inline clones at least.  */
  if (!node->has_gimple_body_p () || node->global.inlined_to)
continue;
  node->get_body ();
Wonder why emutls is different, whether it is really desirable to adjust
already inlined functions, or functions without body and whether the
functions don't need to be read for LTO.
So I think you want Honza or Richi to look at this.

Jakub


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Marek Polacek
On Tue, Mar 31, 2015 at 02:32:32PM +0200, Marek Polacek wrote:
> On Tue, Mar 31, 2015 at 02:25:14PM +0200, Kai Tietz wrote:
> > Hi,
> > 
> > I had tried the same approach as Marek.  For me it solved the PR, but
> > caused other regressions on bootstrap.  So I dropped the approach via
> > dependent_type_p.
> > 
> > Well, this bootstrap issue might be caused by some local changes I had
> > forgotten to remove, but I doubt it.
> > Marek, have you tried to do a bootstrap with your patch?
> 
> Of course, with --enable-languages=all.  I'll re-run the bootstrap with more
> languages enabled, though.

BTW, are you saying that your fix was exactly the same?  Did you as well check
that index_type is non-null?

Marek


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Kai Tietz
2015-03-31 14:34 GMT+02:00 Marek Polacek :
> On Tue, Mar 31, 2015 at 02:32:32PM +0200, Marek Polacek wrote:
>> On Tue, Mar 31, 2015 at 02:25:14PM +0200, Kai Tietz wrote:
>> > Hi,
>> >
>> > I had tried the same approach as Marek.  For me it solved the PR, but
>> > caused other regressions on bootstrap.  So I dropped the approach via
>> > dependent_type_p.
>> >
>> > Well, this bootstrap issue might be caused by some local changes I had
>> > forgotten to remove, but I doubt it.
>> > Marek, have you tried to do a bootstrap with your patch?
>>
>> Of course, with --enable-languages=all.  I'll re-run the bootstrap with more
>> languages enabled, though.
>
> BTW, are you saying that your fix was exactly the same?  Did you as well check
> that index_type is non-null?

Sure, I checked for index_type.  But looking closer, I used
instantiation_dependent_expression_p - as mentioned by Jason - instead
of dependent_type_p, which seems to make the difference here.

> Marek

Kai


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Mon, Mar 30, 2015 at 22:42:51 +0100, Julian Brown wrote:
> On Mon, 30 Mar 2015 18:42:02 +0200
> Jakub Jelinek  wrote:
> > But the one Julian posted doesn't apply on top of your patch.
> > If there is any interdiff needed on top of your patch, can it be
> > posted against trunk + your patch?
> 
> Here's a version of my patch against trunk and Ilya's latest patch
> (hopefully!). Tests look OK (libgomp + PTX).

Thanks for rebasing!

On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> > +/* Insert mapping of host -> target address pairs to splay tree.  */
> > +
> > +static void
> > +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
> > +   struct addr_pair *host_addr,
> > +   struct addr_pair *tgt_addr)
> > +{
> > +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
> > +  tgt->refcount = 1;
> > +  tgt->array = gomp_malloc (sizeof (*tgt->array));
> > +  tgt->tgt_start = tgt_addr->start;
> > +  tgt->tgt_end = tgt_addr->end;
> > +  tgt->to_free = NULL;
> > +  tgt->list_count = 0;
> > +  tgt->device_descr = devicep;
> > +  splay_tree_node node = tgt->array;
> > +  splay_tree_key k = &node->key;
> > +  k->host_start = host_addr->start;
> > +  k->host_end = host_addr->end;
> > +  k->tgt_offset = 0;
> > +  k->refcount = 1;
> > +  k->copy_from = false;
> > +  k->tgt = tgt;
> > +  node->left = NULL;
> > +  node->right = NULL;
> > +  splay_tree_insert (&devicep->mem_map, node);
> > +}
> 
> What is the reason to register and allocate these one at a time, rather than
> using one struct target_mem_desc with one tgt->array for all splay tree
> nodes registered from one image?
> Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> clear it is special) and just use tgt_offset relative to that (i.e.
> absolute), but having to malloc each node individually and having to malloc
> a target_mem_desc for each one sounds expensive.
> Everything is freed just once anyway, isn't it?

Here is a WIP patch; does this look like what you suggested?  It works fine with
functions; however, I'm not sure what to do with variables.  Will gomp_map_vars
work when tgt_start and tgt_end are equal to 0?
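
For reference, the "one descriptor, one array" shape suggested in the review can be sketched as below. This is a hedged illustration only: `struct image_desc`, `struct map_node`, `map_image` and `unmap_image` are hypothetical names rather than libgomp's, and target addresses are stored as absolute offsets on the assumption that the descriptor's own start is 0. Whether gomp_map_vars copes with that is the open question above and is not addressed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host or target address range.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

struct image_desc;

/* One mapping entry; a simplified stand-in for a splay tree node.  */
struct map_node
{
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t tgt_offset;      /* absolute, since the descriptor starts at 0 */
  struct image_desc *owner;
};

/* One descriptor per offload image, owning all of its nodes.  */
struct image_desc
{
  size_t count;
  struct map_node *array;
};

/* Build the mapping for N address pairs with two allocations in total:
   one for the descriptor and one for the node array.  */
static struct image_desc *
map_image (const struct addr_pair *host, const struct addr_pair *tgt,
           size_t n)
{
  struct image_desc *d;

  assert (n > 0);
  d = malloc (sizeof *d);
  if (!d)
    return NULL;
  d->array = malloc (n * sizeof *d->array);
  if (!d->array)
    {
      free (d);
      return NULL;
    }
  d->count = n;
  for (size_t i = 0; i < n; i++)
    {
      d->array[i].host_start = host[i].start;
      d->array[i].host_end = host[i].end;
      d->array[i].tgt_offset = tgt[i].start;
      d->array[i].owner = d;
    }
  return d;
}

/* Unloading the image then needs exactly two frees.  */
static void
unmap_image (struct image_desc *d)
{
  if (d)
    {
      free (d->array);
      free (d);
    }
}
```

Compared with allocating a descriptor and a node per entry, this keeps the allocation count constant per image and makes the cleanup on unregister a single pair of frees.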

> > @@ -654,6 +727,18 @@ void
> >  GOMP_offload_register (void *host_table, enum offload_target_type 
> > target_type,
> >void *target_data)
> >  {
> > +  int i;
> > +  gomp_mutex_lock (&register_lock);
> > +
> > +  /* Load image to all initialized devices.  */
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = &devices[i];
> > +  if (devicep->type == target_type && devicep->is_initialized)
> > +   gomp_offload_image_to_device (devicep, host_table, target_data);
> 
> Shouldn't either this function, or gomp_offload_image_to_device lock
> also devicep->lock mutex and unlock at the end?
> Where exactly I guess depends on if the devicep->* hook calls should be
> guarded with the mutex or not.  If yes, it should be this function and
> gomp_init_device.
> 
> > +  if (devicep->type != target_type || !devicep->is_initialized)
> > +   continue;
> > +
> 
> Similarly.

I've added lock/unlock to GOMP_offload_register and GOMP_offload_unregister.
All calls to gomp_init_device were already guarded.

> > +  devicep->unload_image_func (devicep->target_id, target_data);
> > +
> > +  /* Remove mapping from splay tree.  */
> > +  for (j = 0; j < num_funcs; j++)
> > +   {
> > + struct splay_tree_key_s k;
> > + k.host_start = (uintptr_t) host_func_table[j];
> > + k.host_end = k.host_start + 1;
> > + splay_tree_remove (&devicep->mem_map, &k);
> > +   }
> > +
> > +  for (j = 0; j < num_vars; j++)
> > +   {
> > + struct splay_tree_key_s k;
> > + k.host_start = (uintptr_t) host_var_table[j*2];
> > + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
> > + splay_tree_remove (&devicep->mem_map, &k);
> > +   }
> > +}
> 
> Aren't you leaking all the tgt->array and tgt allocations here?
> Though, if you change it to just two allocations (one tgt, one array),
> you'd need to free just once.

You're right.  I've fixed this for functions, variables are WIP.


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   "#ifdef __cplusplus\n"
   "extern \"C\"\n"
   "#endif\n"
-  "void GOMP_offload_register (void *, int, void *);\n\n"
+  "void GOMP_offload_register (void *, int, void *);\n"
+  "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
   "__attribute__((constructor))\n"
   "static void\n"
   "init (void)\n"
   "{\n"
   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n"
+  "}\n\n"

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
> > What is the reason to register and allocate these one at a time, rather than
> > using one struct target_mem_desc with one tgt->array for all splay tree
> > nodes registered from one image?
> > Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> > clear it is special) and just use tgt_offset relative to that (i.e.
> > absolute), but having to malloc each node individually and having to malloc
> > a target_mem_desc for each one sounds expensive.
> > Everything is freed just once anyway, isn't it?
> 
> Here is WIP patch, does this look like what you suggested?  It works fine with
> functions, however I'm not sure what to do with variables.  Will gomp_map_vars
> work when tgt_start and tgt_end are equal to 0?

Can you explain what you are afraid of?  The mapped images (both their
mapping and unmapping) are done in pairs, and in a valid program the
addresses shouldn't be already mapped when the image is mapped in etc.
So, for gomp_map_vars, the var allocations should just be the pre-existing
mappings, i.e.
  splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
  if (n)
{
  tgt->list[i] = n;
  gomp_map_vars_existing (n, &cur_node, kind & typemask);
}
case and
  if (is_target)
{
  for (i = 0; i < mapnum; i++)
{
  if (tgt->list[i] == NULL)
cur_node.tgt_offset = (uintptr_t) NULL;
  else
cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
  + tgt->list[i]->tgt_offset;
  /* FIXME: see above FIXME comment.  */
  devicep->host2dev_func (devicep->target_id,
  (void *) (tgt->tgt_start
+ i * sizeof (void *)),
  (void *) &cur_node.tgt_offset,
  sizeof (void *));
}
}
at the end.  tgt->list[i] will be non-NULL, tgt->list[i]->tgt->tgt_start
will be 0, but tgt->list[i]->tgt_offset will be absolute and so should DTRT.
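
A minimal sketch of that address computation, using hypothetical simplified
structs (target_mem_desc_sketch, splay_key_sketch) rather than libgomp's real
definitions.  It only illustrates that a descriptor with tgt_start == 0 and an
absolute tgt_offset goes through the same sum as an ordinary mapping:

```cpp
#include <cstdint>

// Hypothetical, cut-down stand-ins for libgomp's structures; the real
// target_mem_desc and splay_tree_key_s carry more fields.
struct target_mem_desc_sketch {
  std::uintptr_t tgt_start;
  std::uintptr_t tgt_end;
};

struct splay_key_sketch {
  const target_mem_desc_sketch *tgt;
  std::uintptr_t tgt_offset;
};

// Mirrors the expression
//   tgt->list[i]->tgt->tgt_start + tgt->list[i]->tgt_offset
// from the gomp_map_vars excerpt quoted above.
inline std::uintptr_t device_address(const splay_key_sketch &key) {
  return key.tgt->tgt_start + key.tgt_offset;
}
```

With tgt_start == 0 the result is simply the absolute tgt_offset, so no
special case is needed in the summation itself.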

> +  for (i = 0; i < num_vars; i++)
> +{
> +  struct addr_pair host_addr;
> +  host_addr.start = (uintptr_t) host_var_table[i*2];
> +  host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];

Formatting, spaces around + or *.  But, as said earlier, I don't see why
this wouldn't work for variables too.

Jakub


Re: New regression on ARM Linux

2015-03-31 Thread Alan Lawrence

Jakub Jelinek wrote:

On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:

Richard Biener wrote:

But I find it odd that on ARM passing *((aligned_int *)p) as
vararg (only as varargs?) changes calling conventions independent
of the functions type signature.

Does it? Do you have a testcase, and compilation flags, that'll make this
show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
and I always get the value being passed in the expected ("unaligned")
register?


If the integral type alignment right now matters, I'd try something like:

typedef int V __attribute__((aligned (8)));
V x;

int foo (int x, ...)
{
  int z;
  __builtin_va_list va;
  __builtin_va_start (va, x);
  switch (x)
{
case 1:
case 3:
case 6:
  z = __builtin_va_arg (va, int);
  break;
default:
  z = __builtin_va_arg (va, V);
  break;
}
  __builtin_va_end (va);
  return z;
}

int
bar (void)
{
  V v = 3;
  int w = 3;
  foo (1, (int) v);
  foo (2, (V) w);
  v = 3;
  w = (int) v;
  foo (3, w);
  foo (4, (V) w);
  v = (V) w;
  foo (5, v);
  foo (6, (int) v);
  foo (7, x);
  return 0;
}

(of course, most likely with passing a different value each time and
verification of the result).
As the compiler treats all those casts there as useless, I'd expect
that the types of the passed argument would be pretty much random.
And, note that even on x86_64, the __builtin_va_arg with V expands into
  # addr.1_3 = PHI 
  z_35 = MEM[(V * {ref-all})addr.1_3];
using exactly the same address for int as well as V va_arg - if you increase
the overalignment arbitrarily, it will surely be a wrong IL because nobody
really guarantees anything about the overalignment.

So, I think the tree-sra.c patch is a good idea - try to keep using the main
type variants as the types in the IL where possible except for the MEM_REF
first argument (i.e. even the lhs of the load should IMHO not be
overaligned).

As Eric Botcazou said, GCC right now isn't really prepared for under or
overaligned scalars, only when they are in structs (or for middle-end in
*MEM_REFs).

Jakub



On ARM, I get the arguments being passed in r0 & r1 for every call in bar() 
above. It sounds as if this is because the casts are being removed as useless; 
so the only way for overalignment info to be present, is when SRA puts it there.


The only way I can get a register to be skipped, is by providing a prototype 
with alignment specified via a typedef:


typedef int aligned_int __attribute__((aligned((8))));
int foo(int a, aligned_int b) {...} //compiles ok

whereas specifying alignment directly, is rejected:

nonvar.c:2:20: error: alignment may not be specified for 'b'
 int foo(int a, int b __attribute__((aligned((8)
^
Note this is using the GNU __attribute__((aligned)) extension. Trying to use C11 
_Alignas results in a frontend error either way; IIUC the C11 spec deems that 
sort of thing illegal.
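
A minimal sketch of the effect described above, assuming a GNU-compatible
compiler and assuming the AAPCS rule that an argument needing doubleword
alignment rounds the next core register number up to an even register.  The
helper next_core_reg is hypothetical and is not GCC's
arm_needs_doubleword_alignment:

```cpp
// GNU extension: the alignment is attached to the typedef, not to int itself.
typedef int aligned_int __attribute__((aligned(8)));

static_assert(alignof(aligned_int) == 8, "the typedef carries the over-alignment");
static_assert(sizeof(aligned_int) == sizeof(int), "the size is unchanged");

// Hypothetical model of the register choice: ncrn is the next free core
// register number (r0 == 0, r1 == 1, ...).
constexpr int next_core_reg(int ncrn, bool needs_doubleword_alignment) {
  return (needs_doubleword_alignment && ncrn % 2 != 0) ? ncrn + 1 : ncrn;
}
```

For foo(int a, aligned_int b), a takes r0, and under this model b would go to
next_core_reg(1, true), i.e. r2, so r1 is skipped.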


(1) If we wish to keep the AAPCS principle that varargs are passed just as named 
args, we should use TYPE_MAIN_VARIANT inside arm_needs_doubleword_alignment, 
which will then ignore overalignment on both varargs _and named args_. However 
this would be silently ABI-changing?


(2) It seems to me that SRA is the only way for overalignment info to be present 
on a value, so the patch to tree-sra.c/create_access_replacement seems to make 
things more consistent?


(3) Given C11 forbids overaligned arguments/parameters, do we wish GNU 
__attribute__((aligned)) to allow this (via typedefs but not attributes on the 
parameters themselves) ?


--Alan



Re: New regression on ARM Linux

2015-03-31 Thread Richard Biener
On Tue, 31 Mar 2015, Alan Lawrence wrote:

> Jakub Jelinek wrote:
> > On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> > > Richard Biener wrote:
> > > > But I find it odd that on ARM passing *((aligned_int *)p) as
> > > > vararg (only as varargs?) changes calling conventions independent
> > > > of the functions type signature.
> > > Does it? Do you have a testcase, and compilation flags, that'll make this
> > > show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> > > and I always get the value being passed in the expected ("unaligned")
> > > register?
> > 
> > If the integral type alignment right now matters, I'd try something like:
> > 
> > typedef int V __attribute__((aligned (8)));
> > V x;
> > 
> > int foo (int x, ...)
> > {
> >   int z;
> >   __builtin_va_list va;
> >   __builtin_va_start (va, x);
> >   switch (x)
> > {
> > case 1:
> > case 3:
> > case 6:
> >   z = __builtin_va_arg (va, int);
> >   break;
> > default:
> >   z = __builtin_va_arg (va, V);
> >   break;
> > }
> >   __builtin_va_end (va);
> >   return z;
> > }
> > 
> > int
> > bar (void)
> > {
> >   V v = 3;
> >   int w = 3;
> >   foo (1, (int) v);
> >   foo (2, (V) w);
> >   v = 3;
> >   w = (int) v;
> >   foo (3, w);
> >   foo (4, (V) w);
> >   v = (V) w;
> >   foo (5, v);
> >   foo (6, (int) v);
> >   foo (7, x);
> >   return 0;
> > }
> > 
> > (of course, most likely with passing a different value each time and
> > verification of the result).
> > As the compiler treats all those casts there as useless, I'd expect
> > that the types of the passed argument would be pretty much random.
> > And, note that even on x86_64, the __builtin_va_arg with V expands into
> >   # addr.1_3 = PHI 
> >   z_35 = MEM[(V * {ref-all})addr.1_3];
> > using exactly the same address for int as well as V va_arg - if you increase
> > the overalignment arbitrarily, it will surely be a wrong IL because nobody
> > really guarantees anything about the overalignment.
> > 
> > So, I think the tree-sra.c patch is a good idea - try to keep using the main
> > type variants as the types in the IL where possible except for the MEM_REF
> > first argument (i.e. even the lhs of the load should IMHO not be
> > overaligned).
> > 
> > As Eric Botcazou said, GCC right now isn't really prepared for under or
> > overaligned scalars, only when they are in structs (or for middle-end in
> > *MEM_REFs).
> > 
> > Jakub
> > 
> 
> On ARM, I get the arguments being passed in r0 & r1 for every call in bar()
> above. It sounds as if this is because the casts are being removed as useless;
> so the only way for overalignment info to be present, is when SRA puts it
> there.
> 
> The only way I can get a register to be skipped, is by providing a prototype
> with alignment specified via a typedef:
> 
> typedef int aligned_int __attribute__((aligned((8))));
> int foo(int a, aligned_int b) {...} //compiles ok
> 
> whereas specifying alignment directly, is rejected:
> 
> nonvar.c:2:20: error: alignment may not be specified for 'b'
>  int foo(int a, int b __attribute__((aligned((8)
> ^
> Note this is using the GNU __attribute__((aligned)) extension. Trying to use
> C11 _Alignas results in a frontend error either way; IIUC the C11 spec deems
> that sort of thing illegal.
> 
> (1) If we wish to keep the AAPCS principle that varargs are passed just as
> named args, we should use TYPE_MAIN_VARIANT inside
> arm_needs_doubleword_alignment, which will then ignore overalignment on both
> varargs _and named args_. However this would be silently ABI-changing?
> 
> (2) It seems to me that SRA is the only way for overalignment info to be
> present on a value, so the patch to tree-sra.c/create_access_replacement seems
> to make things more consistent?

I'm not so sure about (2), SCCVN records the type of a reference
and PRE uses it to create the LHS temporaries to insert them.
You'd need some tricky order of optimizations to expose that to
a call argument though (copy-propagating the inserted value to
a call argument).  LIM may have similar issues (when doing store-motion),
so may predictive commoning and loop distribution (and maybe others I
forgot).

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


Re: Silence merge warnings on artificial types

2015-03-31 Thread Jason Merrill

On 03/31/2015 03:51 AM, Jan Hubicka wrote:

Jason, please do you know what is meaning of DECL_ARTIFICIAL on class type
names? Perhaps we can drop them to 0 in free lang data?


It indicates the implicit typedef that lets you say 'S' instead of 
'struct S' without writing 'typedef struct S S' yourself.
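
A small illustration of that language rule (plain C++, nothing GCC-internal;
the names S and sum_values are made up for the example):

```cpp
#include <type_traits>

struct S { int value; };

// In C++ the class name is usable on its own; no 'typedef struct S S' needed.
static_assert(std::is_same<S, struct S>::value,
              "'S' and 'struct S' name the same type");

inline int sum_values(S first, struct S second) {
  return first.value + second.value;
}
```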


dwarf2out.c uses this information in TYPE_DECL_IS_STUB and 
is_redundant_typedef, so we can't clear it until after debug info 
generation.


Jason



[PATCH] Fix arm regression after SRA alignment fix

2015-03-31 Thread Richard Biener

The following avoids running into issues with the AAPCS ABI on arm
when using over-aligned types on registers.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-03-31  Richard Biener  

* tree-sra.c (create_access_replacement): Drop under-/over-alignment
of types.

Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  (revision 221770)
+++ gcc/tree-sra.c  (working copy)
@@ -2012,7 +2012,11 @@ create_access_replacement (struct access
   DECL_CONTEXT (repl) = current_function_decl;
 }
   else
-repl = create_tmp_var (access->type, "SR");
+/* Drop any special alignment on the type if it's not on the main
+   variant.  This avoids issues with weirdo ABIs like AAPCS.  */
+repl = create_tmp_var (build_qualified_type
+(TYPE_MAIN_VARIANT (access->type),
+ TYPE_QUALS (access->type)), "SR");
   if (TREE_CODE (access->type) == COMPLEX_TYPE
   || TREE_CODE (access->type) == VECTOR_TYPE)
 {


Re: [PR64164] drop copyrename, integrate into expand

2015-03-31 Thread Richard Biener
On Tue, Mar 31, 2015 at 8:55 AM, Steven Bosscher  wrote:
> On Sat, Mar 28, 2015 at 8:21 PM, Alexandre Oliva wrote:
>> Regstrapped on x86_64-linux-gnu native and on i686-pc-linux-gnu native
>> on x86_64, so without lto plugin.  The only regression is in
>> gcc.dg/guality/pr54200.c, that explicitly disables VTA.
>
> What about memory footprint? IIRC this pass was in part introduced to
> reduce the number of VAR_DECLs.

That's no longer necessary as we now drop VAR_DECLs from non-user vars
completely at into-SSA time.  We have "anonymous" SSA names without
associated decls.

Richard.

> Ciao!
> Steven


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Marek Polacek
On Tue, Mar 31, 2015 at 02:32:32PM +0200, Marek Polacek wrote:
> Of course, with --enable-languages=all.  I'll re-run the bootstrap with more
> languages enabled, though.

--enable-languages=all,obj-c++,go bootstrap passed again on x86_64 and ppc64.

Marek


Re: [libstdc++/65033] Give alignment info to libatomic

2015-03-31 Thread Jonathan Wakely

On 26/03/15 13:21 +, Jonathan Wakely wrote:

This includes your fix to avoid decreasing alignment, but I didn't add
a test for that as I couldn't make it fail on any of the targets I
test on.



commit f796769ad20c0353490b9f1a7e019e2f0c1771fb
Author: Jonathan Wakely 
Date:   Wed Sep 3 15:39:53 2014 +0100

PR libstdc++/62259
PR libstdc++/65147
* include/std/atomic (atomic): Increase alignment for types with
the same size as one of the integral types.
* testsuite/29_atomics/atomic/60695.cc: Adjust dg-error line number.
* testsuite/29_atomics/atomic/62259.cc: New.


My patch was not sufficient to fix 65147, because I didn't increase
the alignment of the std::atomic specializations, and
std::atomic<16-byte type> is only aligned correctly if __int128 is
supported, which isn't true on x86 and other 32-bit targets.

This is the best I've come up with, does anyone have any better ideas
than the #else branch to hardcode alignment of 16-byte types to 16?
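
A minimal sketch of the selection logic under discussion, not the libstdc++
code itself: the function names are hypothetical, the char/short/int/long arms
are assumed (only the long long and __int128 arms are visible in the hunk
below), and the 16-byte case is hardcoded unconditionally here rather than
guarded by _GLIBCXX_USE_INT128:

```cpp
#include <cstddef>

// Minimum alignment suggested by the size of T, or 0 if no integral type
// of the same size is known.
template <typename T>
constexpr std::size_t min_atomic_align_sketch() {
  return sizeof(T) == sizeof(char)        ? alignof(char)
         : sizeof(T) == sizeof(short)     ? alignof(short)
         : sizeof(T) == sizeof(int)       ? alignof(int)
         : sizeof(T) == sizeof(long)      ? alignof(long)
         : sizeof(T) == sizeof(long long) ? alignof(long long)
         : sizeof(T) == 16                ? 16  // the hardcoded #else arm
                                          : 0;
}

// Never decrease the natural alignment of T.
template <typename T>
constexpr std::size_t atomic_align_sketch() {
  return min_atomic_align_sketch<T>() > alignof(T)
             ? min_atomic_align_sketch<T>()
             : alignof(T);
}

struct Sixteen { char s[16]; };
struct Three { char s[3]; };

struct alignas(atomic_align_sketch<Sixteen>()) AlignedSixteen {
  Sixteen value;
};
```

Here the 16-byte struct ends up 16-byte aligned even without __int128, while
the 3-byte struct keeps its natural alignment.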

commit d0ccfb0523066c69f3d22d9cdd617a139c57f9e1
Author: Jonathan Wakely 
Date:   Mon Mar 30 14:28:01 2015 +0100

	PR libstdc++/65147
	* include/bits/atomic_base.h (__atomic_base): Align as underlying
	type.
	* include/std/atomic (atomic): Hardcode alignment for 16-byte
	types when __int128 is not available.
	* testsuite/29_atomics/atomic/60695.cc: Adjust dg-error line number.
	* testsuite/29_atomics/atomic/65147.cc: New.

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 8104c98..48931ac 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -235,7 +235,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 8 bytes, since that is what GCC built-in functions for atomic
   // memory access expect.
   template<typename _ITp>
-struct __atomic_base
+struct alignas(_ITp) __atomic_base
 {
 private:
   typedef _ITp 	__int_type;
@@ -559,7 +559,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Partial specialization for pointer types.
   template<typename _PTp>
-struct __atomic_base<_PTp*>
+struct alignas(_PTp*) __atomic_base<_PTp*>
 {
 private:
   typedef _PTp* 	__pointer_type;
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 88c8b17..2b09477 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -175,6 +175,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	: sizeof(_Tp) == sizeof(long long) ? alignof(long long)
 #ifdef _GLIBCXX_USE_INT128
 	: sizeof(_Tp) == sizeof(__int128)  ? alignof(__int128)
+#else
+	: sizeof(_Tp) == 16  ? 16
 #endif
 	: 0;
 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc b/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
index 6f618a0..f755be0 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
@@ -27,4 +27,4 @@ struct X {
   char stuff[0]; // GNU extension, type has zero size
 };
 
-std::atomic<X> a;  // { dg-error "not supported" "" { target *-*-* } 189 }
+std::atomic<X> a;  // { dg-error "not supported" "" { target *-*-* } 191 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/65147.cc b/libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
new file mode 100644
index 000..bb92513
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
@@ -0,0 +1,42 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// PR libstdc++65147
+
+#include <atomic>
+
+static_assert( alignof(std::atomic<short>) == alignof(short),
+"atomic short must be aligned like short" );
+
+static_assert( alignof(std::atomic<int>) == alignof(int),
+"atomic int must be aligned like int" );
+
+static_assert( alignof(std::atomic<long>) == alignof(long),
+"atomic long must be aligned like long" );
+
+static_assert( alignof(std::atomic<long long>) == alignof(long long),
+"atomic long long must be aligned like long long" );
+
+struct S {
+  char s[16];
+};
+
+static_assert( alignof(std::atomic<S>) > 1,
+"atomic 16-byte struct must not be aligned like char" );


Re: [PATCH] [UPDATED] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-31 Thread Richard Biener
On Mon, Mar 30, 2015 at 9:42 PM, Bernd Edlinger
 wrote:
> Hi,
>
> On Mon, 30 Mar 2015 12:33:47, Richard Biener wrote:
>>
>> So - shouldn't the check be
>>
>> if (MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (fieldmode))
>> return false;
>>
>
> No. Because this example would access memory beyond the end of structure
> on m68k:
>
> volatile struct s
> {
>   unsigned x:15;
> } g;
>
> int x = g.x;
>
> but when MEM_ALIGN(op0) == fieldmode we know that
> sizeof(struct s) == sizeof(int).

We don't for

volatile struct s
{
  unsigned x:15 __attribute__((packed));
} g __attribute__((aligned(4)));


>> instead? And looking at
>>
>> /* Check for cases of unaligned fields that must be split. */
>> - if (bitnum % BITS_PER_UNIT + bitsize> modesize
>> - || (STRICT_ALIGNMENT
>> - && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize> modesize))
>> + if (bitnum % (STRICT_ALIGNMENT ? modesize : BITS_PER_UNIT)
>> + + bitsize> modesize)
>> return false;
>>
>> I wonder what the required semantics are - are they not that we need to 
>> access
>> the whole "underlying" field with the access (the C++ memory model only
>> requires we don't go beyond the field)? It seems that information isn't 
>> readily
>> available here though. So the check checks that we can access the field
>> with a single access using fieldmode. Which means (again),
>>
>
> And another requirement must of course be, that we should never read anything
> outside of the structure's limits.

Yes.  bitregion_end should provide a good hint here?
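
For reference, a minimal sketch of the single-access check from the hunk
quoted above (the "Check for cases of unaligned fields that must be split"
condition), written as a standalone predicate.  The function name and the
plain unsigned parameters are assumptions for illustration; the real code
works on machine modes and rtx operands:

```cpp
// Returns true when a bit-field starting at bitnum and spanning bitsize bits
// passes the check, i.e. the field does not have to be split.  All
// quantities are in bits.
constexpr bool fits_single_access(unsigned bitnum, unsigned bitsize,
                                  unsigned modesize, bool strict_alignment,
                                  unsigned bits_per_unit = 8) {
  return bitnum % (strict_alignment ? modesize : bits_per_unit) + bitsize
         <= modesize;
}
```

For a 15-bit field at bit 24 with a 32-bit mode, the predicate is false with
strict alignment (24 + 15 > 32) but true without it (24 % 8 + 15 <= 32).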

>> if (bitnum % (STRICT_ALIGNMENT ? GET_MODE_ALIGNMENT (fieldmode) :
>> BITS_PER_UNIT)
>> + bitsize> modesize)
>>
>> Also this means that for !STRICT_ALIGNMENT platforms the MEM_ALIGN check 
>> isn't
>> sufficient which is what the other hunks in the patch are about to fix?
>>
>> It seems at least the
>>
>> @@ -988,6 +995,16 @@ store_bit_field (rtx str_rtx, unsigned HOST_WIDE_I
>>
>> str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
>> &bitnum);
>> + if (!STRICT_ALIGNMENT
>> + && bitnum + bitsize> GET_MODE_BITSIZE (fieldmode))
>> + {
>> + unsigned HOST_WIDE_INT offset;
>> + offset = (bitnum + bitsize + BITS_PER_UNIT - 1
>> + - GET_MODE_BITSIZE (fieldmode)) / BITS_PER_UNIT;
>> + str_rtx = adjust_bitfield_address (str_rtx, fieldmode, offset);
>> + bitnum -= offset * BITS_PER_UNIT;
>> + }
>> + gcc_assert (bitnum + bitsize <= GET_MODE_BITSIZE (fieldmode));
>>
>> hunks could do with a comment.
>>
>> That said, I fail to realize how the issue is specific to
>> strict-volatile bitfields?
>>
>
> I will not try to defend this hunk at the moment, because it is actually not 
> needed to
> fix the wrong code bug,
>
>
>> I also hope somebody else will also look at the patch - I'm not feeling like 
>> the
>> appropriate person to review changes in this area (even if I did so in
>> the past).
>>
>
>
> Sorry, but I just have to continue...
>
> So, now I removed these unaligned access parts again from the patch,
>
> Attached you will find the new reduced version of the patch.
>
> Boot-strapped and regression-tested on x86_64-linux-gnu.
>
> OK for trunk?

Ok if nobody else complains within 24h.

Thanks,
Richard.

>
> Thanks,
> Bernd.
>


Re: [PATCH] [ARM] PR45701 testcase fix.

2015-03-31 Thread James Greenhalgh
*ping* on Alex' behalf and CCing the ARM maintainers.

This fix looks obvious to me, and cleans up another couple of FAILs
for the ARM port.

Richard/Ramana?

Cheers,
James

On Thu, Mar 26, 2015 at 03:28:15PM +, Alex Velenko wrote:
> On 04/03/15 11:13, Alex Velenko wrote:
> > 2015-03-04  Alex Velenko  
> >
> > gcc/testsuite
> >
> > * gcc.target/arm/pr45701-1.c (history_expand_line_internal): Add an
> > extra variable to force stack alignment.
> > * gcc.target/arm/pr45701-2.c (history_expand_line_internal): Add an
> > extra variable to force stack alignment.
> > ---
> >   gcc/testsuite/gcc.target/arm/pr45701-1.c | 5 +++--
> >   gcc/testsuite/gcc.target/arm/pr45701-2.c | 5 +++--
> >   2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/pr45701-1.c 
> > b/gcc/testsuite/gcc.target/arm/pr45701-1.c
> > index 2c690d5..454a087 100644
> > --- a/gcc/testsuite/gcc.target/arm/pr45701-1.c
> > +++ b/gcc/testsuite/gcc.target/arm/pr45701-1.c
> > @@ -5,6 +5,7 @@
> >   /* { dg-final { scan-assembler-not "r8" } } */
> >
> >   extern int hist_verify;
> > +extern int a1;
> >   extern char *pre_process_line (char*);
> >   extern char* str_cpy (char*, char*);
> >   extern int str_len (char*);
> > @@ -16,10 +17,10 @@ history_expand_line_internal (char* line)
> >   {
> > char *new_line;
> > int old_verify;
> > -
> > +  int a = a1;
> > old_verify = hist_verify;
> > hist_verify = 0;
> > new_line = pre_process_line (line);
> > -  hist_verify = old_verify;
> > +  hist_verify = old_verify + a;
> > return (new_line == line) ? savestring (line) : new_line;
> >   }
> > diff --git a/gcc/testsuite/gcc.target/arm/pr45701-2.c 
> > b/gcc/testsuite/gcc.target/arm/pr45701-2.c
> > index ee1ee7d..afe0840 100644
> > --- a/gcc/testsuite/gcc.target/arm/pr45701-2.c
> > +++ b/gcc/testsuite/gcc.target/arm/pr45701-2.c
> > @@ -5,6 +5,7 @@
> >   /* { dg-final { scan-assembler-not "r8" } } */
> >
> >   extern int hist_verify;
> > +extern int a1;
> >   extern char *pre_process_line (char*);
> >   extern char* savestring1 (char*, char*);
> >   extern char* str_cpy (char*, char*);
> > @@ -17,11 +18,11 @@ history_expand_line_internal (char* line)
> >   {
> > char *new_line;
> > int old_verify;
> > -
> > +  int a = a1;
> > old_verify = hist_verify;
> > hist_verify = 0;
> > new_line = pre_process_line (line);
> > -  hist_verify = old_verify;
> > +  hist_verify = old_verify + a;
> > /* Two tail calls here, but r3 is not used to pass values.  */
> > return (new_line == line) ? savestring (line) : savestring1 (new_line, 
> > line);
> >   }
> >
 


Re: [PR64164] drop copyrename, integrate into expand

2015-03-31 Thread Richard Biener
On Sat, Mar 28, 2015 at 8:21 PM, Alexandre Oliva  wrote:
> On Mar 27, 2015, Alexandre Oliva  wrote:
>
>> This patch reworks the out-of-ssa expander to enable coalescing of SSA
>> partitions that don't share the same base name.  This is done only when
>> optimizing.
>
>> The test we use to tell whether two partitions can be merged no longer
>> demands them to have the same base variable when optimizing, so they
>> become eligible for coalescing, as they would after copyrename.  We then
>> compute the partitioning we'd get if all coalescible partitions were
>> coalesced, using this partition assignment to assign base vars numbers.
>> These base var numbers are then used to identify conflicts, which used
>> to be based on shared base vars or base types.
>
>> We now propagate base var names during coalescing proper, only towards
>> the leader variable.  I'm no longer sure this is still needed, but
>> something about handling variables and results led me this way and I
>> didn't revisit it.  I might rework that with a later patch, or a later
>> revision of this patch; it would require other means to identify
>> partitions holding result_decls during merging, or allow that and deal
>> with param and result decls in a different way during expand proper.
>
>> I had to fix two lingering bugs in order for the whole thing to work: we
>> perform conflict detection after abnormal coalescing, but we computed
>> live ranges involving only the partition leaders, so conflicts with
>> other names already coalesced wouldn't be detected.
>
> This early abnormal coalescing was only present for a few days in the
> trunk, and I was lucky enough to start working on a tree that had it.
> It turns out that the fix for it was thus rendered unnecessary, so I
> dropped it.  It was the fix for it, that didn't cover the live range
> check, that caused the two ICEs I saw in the regressions tests.  Since
> the ultimate cause of the problem is gone, and the change that
> introduced the check failures, both problems went *poof* after I updated
> the tree, resolved the conflicts and dropped the redundant code.
>
>> The other problem was that we didn't track default defs for parms as
>> live at entry, so they might end up coalesced.
>
> I improved this a little bit, using the bitmap of partitions containing
> default params to check that we only process function-entry defs for
> them, rather than for all param decls in case they end up in other
> partitions.
>
>> I guess none of these problems would have been exercised in practice,
>> because we wouldn't even consider merging ssa names associated with
>> different variables.
>
>> In the end, I verified that this fixed the codegen regression in the
>> PR64164 testcase, that failed to merge two partitions that could in
>> theory be merged, but that wasn't even considered due to differences in
>> the SSA var names.
>
>> I'd agree that disregarding the var names and dropping 4 passes is too
>> much of a change to fix this one problem, but...  it's something we
>> should have long tackled, and it gets this and other jobs done, so...
>
> Regstrapped on x86_64-linux-gnu native and on i686-pc-linux-gnu native
> on x86_64, so without lto plugin.  The only regression is in
> gcc.dg/guality/pr54200.c, that explicitly disables VTA.  When
> optimization is enabled, the different coalescing we perform now causes
> VTA-less variable tracking to lose track of variable "z".  This
> regression in non-VTA var-tracking is expected and, as richi put it in
> PR 64164, I guess we don't care about that, do we? :-)

Apart from at -O0, yes.

> The other guality regressions I mentioned in my other email turned out
> not to be regressions, but preexisting failures that somehow did not
> make to the test_summary of my earlier pristine build.
>
> Is this ok to install?

I think this is stage1 material.  Some comments in-line

>
> for  gcc/ChangeLog
>
> PR rtl-optimization/64164
> * Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
> * tree-ssa-copyrename.c: Removed.
> * opts.c (default_options_table): Drop -ftree-copyrename.
> * passes.def: Drop all occurrences of pass_rename_ssa_copies.
> * common.opt (ftree-copyrename): Ignore.
> (ftree-coalesce-vars, ftree-coalesce-inlined-vars): Likewise.
> * doc/invoke.texi: Remove the ignored options above.
> * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
> across base variables when optimizing.
> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Add
> param_defaults argument.  Process PARM_DECLs's default defs at
> the entry point.
> (attempt_coalesce): Add param_defaults argument, and
> track the presence of default defs for params in each
> partition.  Propagate base var to leader on merge, preferring
> parms and results, named vars, ignored vars, and then anon
> vars.  Refuse to merge a RESULT_DECL partition with a default
>

Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread Jack Howarth
H.J.,
 While the latest patch fails to bootstrap on x86_64-apple-darwin14...

make[2]: Entering directory
'/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libcilkrts'
/bin/sh ./libtool --tag=CXX   --mode=link
/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/./gcc/xg++
-B/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/./gcc/ -nostdinc++
-nostdinc++ 
-I/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/include/x86_64-apple-darwin14.3.0
-I/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/include
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/libsupc++
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/include/backward
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/testsuite/util
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src/.libs
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/libsupc++/.libs
-B/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src/.libs
-B/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/libsupc++/.libs
-B/sw/lib/gcc5/x86_64-apple-darwin14.3.0/bin/
-B/sw/lib/gcc5/x86_64-apple-darwin14.3.0/lib/ -isystem
/sw/lib/gcc5/x86_64-apple-darwin14.3.0/include -isystem
/sw/lib/gcc5/x86_64-apple-darwin14.3.0/sys-include -g -O2
-version-info 5:0:0 -ldl -lpthread   -no-undefined  -o libcilkrts.la
-rpath /sw/lib/gcc5/lib cilk-abi-vla.lo os-unix-sysdep.lo bug.lo
cilk-abi.lo cilk-abi-cilk-for.lo cilk-abi-vla-internal.lo cilk_api.lo
cilk_fiber.lo cilk_fiber-unix.lo cilk_malloc.lo c_reducers.lo
except-gcc.lo frame_malloc.lo full_frame.lo global_state.lo jmpbuf.lo
local_state.lo metacall_impl.lo os_mutex-unix.lo os-unix.lo
pedigrees.lo record-replay.lo reducer_impl.lo scheduler.lo
signal_node.lo spin_mutex.lo stats.lo symbol_test.lo sysdep-unix.lo
worker_mutex.lo
libtool: link:
/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/./gcc/xg++
-B/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/./gcc/ -nostdinc++
-nostdinc++ 
-I/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/include/x86_64-apple-darwin14.3.0
-I/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/include
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/libsupc++
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/include/backward
-I/sw/src/fink.build/gcc5-5.0.0-1/gcc-5-20150331/libstdc++-v3/testsuite/util
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src/.libs
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/libsupc++/.libs
-B/sw/lib/gcc5/x86_64-apple-darwin14.3.0/bin/
-B/sw/lib/gcc5/x86_64-apple-darwin14.3.0/lib/ -isystem
/sw/lib/gcc5/x86_64-apple-darwin14.3.0/include -isystem
/sw/lib/gcc5/x86_64-apple-darwin14.3.0/sys-include -dynamiclib  -o
.libs/libcilkrts.5.dylib  .libs/cilk-abi-vla.o .libs/os-unix-sysdep.o
.libs/bug.o .libs/cilk-abi.o .libs/cilk-abi-cilk-for.o
.libs/cilk-abi-vla-internal.o .libs/cilk_api.o .libs/cilk_fiber.o
.libs/cilk_fiber-unix.o .libs/cilk_malloc.o .libs/c_reducers.o
.libs/except-gcc.o .libs/frame_malloc.o .libs/full_frame.o
.libs/global_state.o .libs/jmpbuf.o .libs/local_state.o
.libs/metacall_impl.o .libs/os_mutex-unix.o .libs/os-unix.o
.libs/pedigrees.o .libs/record-replay.o .libs/reducer_impl.o
.libs/scheduler.o .libs/signal_node.o .libs/spin_mutex.o .libs/stats.o
.libs/symbol_test.o .libs/sysdep-unix.o .libs/worker_mutex.o
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/src/.libs
-L/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libstdc++-v3/libsupc++/.libs
-ldl -lpthread -install_name  /sw/lib/gcc5/lib/libcilkrts.5.dylib
-compatibility_version 6 -current_version 6.0 -Wl,-single_module
Undefined symbols for architecture x86_64:
  "___cpu_model", referenced from:
  _restore_x86_fp_state in os-unix-sysdep.o
  _sysdep_save_fp_ctrl_state in os-unix-sysdep.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
Makefile:540: recipe for target 'libcilkrts.la' failed
make[2]: *** [libcilkrts.la] Error 1
make[2]: Leaving directory
'/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libcilkrts'
Makefile:13569: recipe for target 'all-target-libcilkrts' failed
make[1]: *** [all-target-libcilkrts] Error 2
make[1]: Leaving directory '/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir'
Makefile:21064: recipe for target 'bootstrap&

Re: [PATCH] [ARM] PR45701 testcase fix.

2015-03-31 Thread Richard Earnshaw
On 04/03/15 11:13, Alex Velenko wrote:
> Hi,
> 
> This patch fixes the arm pr45701 scan-assembler tests. Those tests check that
> register r3 is used to maintain stack doubleword alignment. Recent optimizations
> reduced the number of local variables needed in those tests, removing the need
> to push r3.
> The testcases are fixed by adding an additional local variable.
> 
> Is patch OK?
> 

This patch is OK.

Let me put it on record that I really dislike these scan-assembler tests
that rely on specific behaviours throughout the entire compilation flow.
 They're just too fragile to be useful.

R.

> 2015-03-04  Alex Velenko  
> 
> gcc/testsuite
> 
>   * gcc.target/arm/pr45701-1.c (history_expand_line_internal): Add an
>   extra variable to force stack alignment.
>   * gcc.target/arm/pr45701-2.c (history_expand_line_internal): Add an
>   extra variable to force stack alignment.
> ---
>  gcc/testsuite/gcc.target/arm/pr45701-1.c | 5 +++--
>  gcc/testsuite/gcc.target/arm/pr45701-2.c | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/pr45701-1.c 
> b/gcc/testsuite/gcc.target/arm/pr45701-1.c
> index 2c690d5..454a087 100644
> --- a/gcc/testsuite/gcc.target/arm/pr45701-1.c
> +++ b/gcc/testsuite/gcc.target/arm/pr45701-1.c
> @@ -5,6 +5,7 @@
>  /* { dg-final { scan-assembler-not "r8" } } */
>  
>  extern int hist_verify;
> +extern int a1;
>  extern char *pre_process_line (char*);
>  extern char* str_cpy (char*, char*);
>  extern int str_len (char*);
> @@ -16,10 +17,10 @@ history_expand_line_internal (char* line)
>  {
>char *new_line;
>int old_verify;
> -
> +  int a = a1;
>old_verify = hist_verify;
>hist_verify = 0;
>new_line = pre_process_line (line);
> -  hist_verify = old_verify;
> +  hist_verify = old_verify + a;
>return (new_line == line) ? savestring (line) : new_line;
>  }
> diff --git a/gcc/testsuite/gcc.target/arm/pr45701-2.c 
> b/gcc/testsuite/gcc.target/arm/pr45701-2.c
> index ee1ee7d..afe0840 100644
> --- a/gcc/testsuite/gcc.target/arm/pr45701-2.c
> +++ b/gcc/testsuite/gcc.target/arm/pr45701-2.c
> @@ -5,6 +5,7 @@
>  /* { dg-final { scan-assembler-not "r8" } } */
>  
>  extern int hist_verify;
> +extern int a1;
>  extern char *pre_process_line (char*);
>  extern char* savestring1 (char*, char*);
>  extern char* str_cpy (char*, char*);
> @@ -17,11 +18,11 @@ history_expand_line_internal (char* line)
>  {
>char *new_line;
>int old_verify;
> -
> +  int a = a1;
>old_verify = hist_verify;
>hist_verify = 0;
>new_line = pre_process_line (line);
> -  hist_verify = old_verify;
> +  hist_verify = old_verify + a;
>/* Two tail calls here, but r3 is not used to pass values.  */
>return (new_line == line) ? savestring (line) : savestring1 (new_line, 
> line);
>  }
> 



Re: [libstdc++/65033] Give alignment info to libatomic

2015-03-31 Thread Richard Henderson
On 03/31/2015 06:41 AM, Jonathan Wakely wrote:
> This is the best I've come up with, does anyone have any better ideas
> than the #else branch to hardcode alignment of 16-byte types to 16?

If there's no 16 byte type, are we convinced this matters?  I mean, there isn't
a 16-byte atomic instruction for 32-bit x86 (or any other 32-bit cpu of which I
am aware).  So we're forced to use a locking path anyway.


And if we do want the alignment, do we stop pretending with all the sizeof's
and alignof's and just use power-of-two size alignment for all of them, e.g.

  min_align = ((size & (size - 1)) || size > 16 ? 0 : size)
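
The expression above could be written out as a small helper. This is only a sketch: the name min_align comes from the formula, while the function signature, the use of std::size_t and the check function are assumptions made for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the proposed rule: a size that is a power of two and at most
// 16 bytes is used as the alignment; any other size yields 0, meaning
// "no extra alignment requested".
constexpr std::size_t min_align(std::size_t size)
{
  return ((size & (size - 1)) || size > 16) ? 0 : size;
}

// Spot checks of the rule (hypothetical helper, for illustration only).
inline void check_min_align()
{
  assert(min_align(4) == 4);
  assert(min_align(16) == 16);
  assert(min_align(12) == 0);  // not a power of two
  assert(min_align(32) == 0);  // larger than 16 bytes
}
```

The point of the rule is that it depends only on the size, so no sizeof/alignof comparison against a same-sized integer type is needed.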


r~


Re: [libstdc++/65033] Give alignment info to libatomic

2015-03-31 Thread Jonathan Wakely

On 31/03/15 07:54 -0700, Richard Henderson wrote:

On 03/31/2015 06:41 AM, Jonathan Wakely wrote:

This is the best I've come up with, does anyone have any better ideas
than the #else branch to hardcode alignment of 16-byte types to 16?


If there's no 16 byte type, are we convinced this matters?  I mean, there isn't
a 16-byte atomic instruction for 32-bit x86 (or any other 32-bit cpu of which I
am aware).  So we're forced to use a locking path anyway.


The C front end gives struct S { char s[16]; } 16 byte alignment, and
I'd like std::atomic and _Atomic struct S to be layout compatible,
although it's not essential (or required by any standard).

And it matters most for the integral types, not arbitrary structs.


And if we do want the alignment, do we stop pretending with all the sizeof's
and alignof's and just use power-of-two size alignment for all of them, e.g.

 min_align = ((size & (size - 1)) || size > 16 ? 0 : size)


Yeah, I wondered about that too. Joseph indicated there are targets
where C gives alignof(_Atomic X) != sizeof(X), which is why the target
hook exists, but maybe we can just not worry about those targets for
now.  For GCC 6 we can look into the new attribute Andrew did in the
atomics branch so that we can make std::atomic use the target hook
directly instead of trying to simulate its effects in C++ code.



Re: [PATCH] Fix size & type for cold partition names (hot-cold function partitioning)

2015-03-31 Thread Caroline Tice
I am fine with waiting until stage 1.  When is that likely to be?

-- Caroline Tice
cmt...@google.com

On Mon, Mar 30, 2015 at 10:19 PM, Jeff Law  wrote:
> On 03/27/2015 10:44 AM, Caroline Tice wrote:
>>
>> It took  me a while to get a test case I'm happy with, so I'm
>> re-submitting the whole patch for approval.
>>
>> 2015-03-27  Caroline Tice  
>>
>>  * final.c (final_scan_insn): Change 'cold_function_name' to
>>  'cold_partition_name' and make it a global variable; also output
>>  assembly to give it a 'FUNC' type, if appropriate.
>>  * varasm.c (cold_partition_name): Declare and initialize global
>>  variable.
>>  (assemble_start_function): Re-set value for cold_partition_name.
>>  (assemble_end_function): Output assembly to calculate size of cold
>>  partition, and associate size with name, if appropriate.
>>  * varasm.h (cold_partition_name): Add extern declaration for global
>>  variable.
>>  * testsuite/gcc.dg/tree-prof/cold_partition_label.c: Add dg-final-use
>>  to scan assembly for cold partition name and size.
>
> Given we're late in stage4, can this wait until stage1, where it should be
> considered pre-approved?  I'd hate to mess up the PA or PTX ports at this
> stage in the release process.
>
> jeff
>


Re: [libstdc++/65033] Give alignment info to libatomic

2015-03-31 Thread Richard Henderson
On 03/31/2015 08:03 AM, Jonathan Wakely wrote:
> On 31/03/15 07:54 -0700, Richard Henderson wrote:
>> On 03/31/2015 06:41 AM, Jonathan Wakely wrote:
>>> This is the best I've come up with, does anyone have any better ideas
>>> than the #else branch to hardcode alignment of 16-byte types to 16?
>>
>> If there's no 16 byte type, are we convinced this matters?  I mean, there
>> isn't a 16-byte atomic instruction for 32-bit x86 (or any other 32-bit cpu
>> of which I am aware).  So we're forced to use a locking path anyway.
> 
> The C front end gives struct S { char s[16]; } 16 byte alignment...

Um, I'm pretty sure it doesn't.

struct S { char s[16]; };
int x = __alignof(struct S);

.type   x, @object
.size   x, 4
x:
.long   1

What you're interpreting as alignment for that struct is probably optional
*additional* alignment from LOCAL_ALIGNMENT or LOCAL_DECL_ALIGNMENT or 
something.
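
The same distinction can be looked at from the C++ side. The following is a minimal sketch, assuming std::atomic<S> as the C++ counterpart of an atomic-qualified struct; the constant and function names are made up for illustration. The alignment of the plain struct is fixed by its char members, while the alignment of the atomic wrapper depends on the target and library version, so the sketch does not assume a particular value for it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct S { char s[16]; };

// The plain struct only needs the alignment of its char members.
constexpr std::size_t plain_align = alignof(S);

// The atomic wrapper may be given more alignment. The exact value is
// target- and library-dependent, so it is not hard-coded here.
constexpr std::size_t atomic_align = alignof(std::atomic<S>);

// Hypothetical helper that states only what can be relied on.
inline void check_alignments()
{
  assert(plain_align == 1);
  assert(atomic_align >= plain_align);
}
```

Only alignof is evaluated, so no atomic operations are emitted and nothing from libatomic is needed to build this.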

> And it matters most for the integral types, not arbitrary structs.
> 
>> And if we do want the alignment, do we stop pretending with all the sizeof's
>> and alignof's and just use power-of-two size alignment for all of them, e.g.
>>
>>  min_align = ((size & (size - 1)) || size > 16 ? 0 : size)
> 
> Yeah, I wondered about that too. Joseph indicated there are targets
> where C gives alignof(_Atomic X) != sizeof(X), which is why the target
> hook exists, but maybe we can just not worry about those targets for
> now.

Those targets have alignof < sizeof.  So in a way we'd probably be doing them a
favor.  I know for instance that CRIS is in this category, where most data is
all byte aligned, but atomics must be naturally aligned.

And, as you note, just so long as we do the same thing for _Atomic once we
get that merged.


r~


Re: [PATCH] Fix size & type for cold partition names (hot-cold function partitioning)

2015-03-31 Thread Jeff Law

On 03/31/2015 09:12 AM, Caroline Tice wrote:

I am fine with waiting until stage 1.  When is that likely to be?

We're very close (week or two) to getting the first GCC 5 RCs spun, so 
stage1 for GCC 6 should open shortly thereafter.


jeff



Re: [libstdc++/65033] Give alignment info to libatomic

2015-03-31 Thread Jonathan Wakely

On 31/03/15 08:13 -0700, Richard Henderson wrote:

On 03/31/2015 08:03 AM, Jonathan Wakely wrote:

On 31/03/15 07:54 -0700, Richard Henderson wrote:

On 03/31/2015 06:41 AM, Jonathan Wakely wrote:

This is the best I've come up with, does anyone have any better ideas
than the #else branch to hardcode alignment of 16-byte types to 16?


If there's no 16 byte type, are we convinced this matters?  I mean, there isn't
a 16-byte atomic instruction for 32-bit x86 (or any other 32-bit cpu of which I
am aware).  So we're forced to use a locking path anyway.


The C front end gives struct S { char s[16]; } 16 byte alignment...


Um, I'm pretty sure it doesn't.

struct S { char s[16]; };
int x = __alignof(struct S);

.type   x, @object
.size   x, 4
x:
.long   1

What you're interpreting as alignment for that struct is probably optional
*additional* alignment from LOCAL_ALIGNMENT or LOCAL_DECL_ALIGNMENT or 
something.


Sorry for not being clear, I meant __alignof(_Atomic struct S) is 16.


And it matters most for the integral types, not arbitrary structs.


And if we do want the alignment, do we stop pretending with all the sizeof's
and alignof's and just use power-of-two size alignment for all of them, e.g.

 min_align = ((size & (size - 1)) || size > 16 ? 0 : size)


Yeah, I wondered about that too. Joseph indicated there are targets
where C gives alignof(_Atomic X) != sizeof(X), which is why the target
hook exists, but maybe we can just not worry about those targets for
now.


Those targets have alignof < sizeof.  So in a way we'd probably be doing them a
favor.  I know for instance that CRIS is in this category, where most data is
all byte aligned, but atomics must be naturally aligned.


Aha, I wondered why CRIS overrides the atomic_align_for_mode hook when
it seemed to be giving them natural alignment anyway - I didn't
realise non-atomic types are only byte-aligned.



Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses

2015-03-31 Thread Trevor Saunders
On Tue, Mar 31, 2015 at 07:37:40AM +0300, Mikhail Maltsev wrote:
> Hi!
> 
> I'm currently working on the proposed task of replacing rtx objects
> (i.e. struct rtx_def) with derived classes. I would like to get some
> feedback on this work (it's far from being finished, but basically I
> would like to know, whether my modifications are appropriate, e.g. one
> may consider that this is "too much" for just refactoring, because
> sometimes they involve small modification of semantics).

I don't see why "too much" would make sense if the change improves
maintainability.

> The attached patch is not well tested, i.e. I bootstrapped and regtested
> it only on x86_64, but I'll perform more extensive testing before
> submitting the next version.
> 
> The key points I would like to ask about:
> 
> 1. The original task was to replace the rtx type by rtx_insn *, where it
> is appropriate. But rtx_insn has several derived classes, such as
> rtx_code_label, for example. So I tried to use the most derived type
> when possible. Is it OK?

sure why not?
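
To illustrate what using the most derived type buys, here is a rough sketch. All types and functions below are hypothetical, heavily simplified stand-ins and not GCC's rtx classes or its is-a.h machinery; the checked downcast merely imitates the idea of dyn_cast by testing a code field.

```cpp
#include <cassert>

namespace rtx_sketch {

enum class insn_code { plain, jump, label };

// Simplified base class: every instruction carries a code.
struct insn
{
  explicit insn(insn_code c) : code(c) {}
  insn_code code;
};

struct label_insn : insn
{
  label_insn() : insn(insn_code::label) {}
};

struct jump_insn : insn
{
  explicit jump_insn(label_insn *t) : insn(insn_code::jump), target(t) {}
  label_insn *target;
};

// Checked downcast in the spirit of dyn_cast: yields nullptr when the
// object is not a jump.
inline jump_insn *dyn_cast_jump(insn *i)
{
  return (i && i->code == insn_code::jump) ? static_cast<jump_insn *>(i)
                                           : nullptr;
}

// With the most derived type in the signature, passing a non-jump is a
// compile-time error instead of a runtime check inside the function.
inline label_insn *jump_target(jump_insn *j)
{
  return j->target;
}

// Small usage example.
inline void check_casts()
{
  label_insn l;
  jump_insn j(&l);
  insn p(insn_code::plain);
  assert(dyn_cast_jump(&j) == &j);
  assert(dyn_cast_jump(&p) == nullptr);
  assert(jump_target(&j) == &l);
}

} // namespace rtx_sketch
```

The check moves to the one place where the type is not yet known, and every function further down the call chain can rely on the static type.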

> 2. Not all of these "type promotions" can be done by just looking at
> function callers and callees (and some functions will only be generated
> during the build of some rare architecture) and checks already done in
> them. In a couple of cases I referred to comments and my general
> understanding of code semantics. In one case this actually caused a
> regression (in the patch it is fixed, of course), because of somewhat
> misleading comment (see "live_label_rtx" function added in patch for
> details) The question is - are such changes OK for refactoring (or it
> should strictly preserve semantics)?

I think correct semantic changes are just fine if they make things
easier to use and read.

> 3. In lra-constraints.c I added a new class rtx_usage_list, which, IMHO,
> allows grouping the functions which work with the usage list in a more
> explicit manner and makes some conditions more self-explaining. I hope
> that Vladimir Makarov (in this case, because it concerns LRA) and other
> authors will not object against such "intrusion" into their code (or
> would rather tell me what should be fixed in my patch(es), instead of
> just refusing to apply it/them). In general, are such changes OK or
> should better be avoided?

I wouldn't avoid them, though I would definitely break this patch up into
smaller ones that each make one set of related changes.

> A couple of questions related to further work:
> 
> 1. I noticed that emit_insn function, in fact, does two kinds of things:
> it can either add its argument to the chain, or, if the argument is a
> pattern, it creates a new instruction based on that pattern. Shouldn't
> this logic be separated in the callers?

That might well make sense.

> 2. Are there any plans on implementing a better class hierarchy on AST's
> ("union tree_node" type). I see that C++ FE uses a huge number of macros
> (which check TREE_CODE and some boolean flags). Could this be improved
> somehow?

People have talked about doing it, and Andrew MacLeod's work on
separating types out of tree is related, but not too much has happened
yet.

Trev

> 
> -- 
> Regards,
> Mikhail Maltsev

> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index c2a3be3..7179faa 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -1745,9 +1745,11 @@ set_edge_can_fallthru_flag (void)
>   continue;
>if (!any_condjump_p (BB_END (bb)))
>   continue;
> -  if (!invert_jump (BB_END (bb), JUMP_LABEL (BB_END (bb)), 0))
> +
> +  rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb));
> +  if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0))
>   continue;
> -  invert_jump (BB_END (bb), JUMP_LABEL (BB_END (bb)), 0);
> +  invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0);
>EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU;
>EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU;
>  }
> @@ -1902,9 +1904,15 @@ fix_up_fall_thru_edges (void)
>  
> fall_thru_label = block_label (fall_thru->dest);
>  
> -   if (old_jump && JUMP_P (old_jump) && fall_thru_label)
> - invert_worked = invert_jump (old_jump,
> -  fall_thru_label,0);
> +   if (old_jump && fall_thru_label)
> +{
> +  rtx_jump_insn *old_jump_insn =
> +  dyn_cast <rtx_jump_insn *> (old_jump);
> +  if (old_jump_insn)
> +invert_worked = invert_jump (old_jump_insn,
> +  fall_thru_label, 0);
> +}
> +
> if (invert_worked)
>   {
> fall_thru->flags &= ~EDGE_FALLTHRU;
> @@ -2024,7 +2032,7 @@ fix_crossing_conditional_branches (void)
>rtx_insn *old_jump;
>rtx set_src;
>rtx old_label = NULL_RTX;
> -  rtx new_label;
>

Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread H.J. Lu
On Tue, Mar 31, 2015 at 7:25 AM, Jack Howarth  wrote:
> H.J.,
>  While the latest patch fails to bootstrap on x86_64-apple-darwin14...
>
>   _restore_x86_fp_state in os-unix-sysdep.o
>   _sysdep_save_fp_ctrl_state in os-unix-sysdep.o
> ld: symbol(s) not found for architecture x86_64
> collect2: error: ld returned 1 exit status
> Makefile:540: recipe for target 'libcilkrts.la' failed
> make[2]: *** [libcilkrts.la] Error 1
> make[2]: Leaving directory
> '/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libcilkrts'
> Makefile:13569: recipe for target 'all-target-libcilkrts' failed
> make[1]: *** [all-target-libcilkrts] Error 2
> make[1]: Leaving directory '/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir'
> Makefile:21064: recipe for target 'bootstrap' failed
> make: *** [bootstrap] Error 2
>
> as darwin will require the new usage of libgcc_nonshared.a to be added
> to the spec handling with...

Here is the updated patch to make libgcc_nonshared.a optional
so that it is only needed on Linux.

> Index: gcc/config/darwin.h
> ===
> --- gcc/config/darwin.h (revision 221794)
> +++ gcc/config/darwin.h (working copy)
> @@ -325,7 +325,7 @@ extern GTY(()) int darwin_ms_struct;
> need symbols from -lgcc.  */
>  #undef REAL_LIBGCC_SPEC
>  #define REAL_LIBGCC_SPEC   \
> -   "%{static-libgcc|static: -lgcc_eh -lgcc;   \
> +   "%{static-libgcc|static: -lgcc_eh -lgcc_nonshared -lgcc;   \
>shared-libgcc|fexceptions|fgnu-runtime:   \
> %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4)   \
> %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
> @@ -336,7 +336,7 @@ extern GTY(()) int darwin_ms_struct;
> %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
> %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4)   \
> %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5)   \
> -   -lgcc }"
> +   -lgcc_nonshared -lgcc }"
>
>  /* We specify crt0.o as -lcrt0.o so that ld will search the library path.
>
>  Jack
> ps One minor nit...
>
> Index: gcc/gcc.c
> ===
> --- gcc/gcc.c (revision 221794)
> +++ gcc/gcc.c (working copy)
> @@ -1566,11 +1566,13 @@ init_spec (void)
>   if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
>{
>  init_gcc_specs (&obstack,
> +"-lgcc_nonshared "
>  "-lgcc_s"
>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>  " -lunwind"
>  #endif
>  ,
> +"-lgcc_nonshared "
>  "-lgcc",
>  "-lgcc_eh"
>  #ifdef USE_LIBUNWIND_EXCEPTIONS
> @@ -1591,7 +1593,9 @@ init_spec (void)
>  /* Ug.  We don't know shared library extensions.  Hope that
> systems that use this form don't do shared libraries.  */
>  init_gcc_specs (&obstack,
> +"libgcc_nonshared.a%s "
>  "-lgcc_s",
> +"libgcc_nonshared.a%s "
>  "libgcc.a%s",
>  "libgcc_eh.a%s"
>
> You seem to have unnecessary trailing whitespace at the end of these flags.
>

The white space is needed to avoid -lgcc_nonshared-lgcc_s.
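
The reason is plain string-literal concatenation: adjacent literals are joined by the compiler with nothing inserted between them. A minimal sketch follows; the variable and function names are made up for illustration, and only the two spec fragments are taken from the patch.

```cpp
#include <cassert>
#include <cstring>

// Adjacent string literals are concatenated as-is, so the separator has
// to be part of one of the literals.
static const char with_space[]    = "-lgcc_nonshared " "-lgcc_s";
static const char without_space[] = "-lgcc_nonshared"  "-lgcc_s";

// Hypothetical helper showing the two results.
inline void check_spec_strings()
{
  assert(std::strcmp(with_space, "-lgcc_nonshared -lgcc_s") == 0);
  assert(std::strcmp(without_space, "-lgcc_nonshared-lgcc_s") == 0);
}
```

Without the trailing blank, the driver would see a single bogus option instead of two libraries.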


-- 
H.J.
---

We shouldn't call external function, __cpu_indicator_init, while an object
is being relocated since its .got.plt section hasn't been updated.  It
works for non-PIE since no update on .got.plt section is required.  This
patch hides __cpu_indicator_init/__cpu_model from linker to force linker
to resolve __cpu_indicator_init/__cpu_model to their hidden definitions
in libgcc_nonshared.a while providing backward binary compatibility.  The
new libgcc_nonshared.a is always linked together with -lgcc_s and -lgcc.
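
For readers who have not met these symbols: on x86, GCC's __builtin_cpu_init and __builtin_cpu_supports are backed by libgcc's __cpu_indicator_init and __cpu_model. The following is a small sketch of user code that ends up depending on them. The function names are made up for illustration, and the preprocessor guard is an assumption so that the sketch also builds on other targets, where it simply reports false.

```cpp
#include <cassert>

// Returns true when the running CPU reports SSE2.  On non-x86 targets or
// compilers without the GCC builtins this sketch just returns false.
inline bool cpu_has_sse2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse2") != 0;
#else
  return false;
#endif
}

// Hypothetical usage check.
inline void check_cpu_query()
{
#if defined(__GNUC__) && defined(__x86_64__)
  assert(cpu_has_sse2());  // SSE2 is part of the x86-64 baseline
#endif
}
```

Function multiversioning dispatchers rely on the same CPU information, which is why where these symbols get resolved from matters for DSOs and PIE.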

gcc/

PR target/65612
* gcc.c (init_spec): Add -lgcc_nonshared/libgcc_nonshared.a%s
to -lgcc_s.

gcc/testsuite/

PR target/65612
* g++.dg/ext/mv18.C: New test.
* g++.dg/ext/mv19.C: Likewise.
* g++.dg/ext/mv20.C: Likewise.
* g++.dg/ext/mv21.C: Likewise.
* g++.dg/ext/mv22.C: Likewise.
* g++.dg/ext/mv23.C: Likewise.

libgcc/

PR target/65612
* Makefile.in (LIB2ADDNONSHARED): New.
(libgcc-nonshared-objects): Likewise.
(libgcc_nonshared.a): Likewise.
Check unsupported files in LIB2ADDNONSHARED.
(iter-items): Add $(LIB2ADDNONSHARED).
(all): Depend on libgcc_nonshared.a.
($(libgcc-nonshared-objects)): Depend on libgcc_tm.h.
(install-leaf): Install libgcc_nonshared.a.
* shared-object.mk: Check empty $o.
* config/i386/cpuinfo.c (__cpu_model): Initialize.
(__cpu_indicator_init@GCC_4.8.0): New.
(__cpu_model@GCC_4.8.0): Likewise.
* config/i386/t-cpuinfo (LIB2ADDNONSHARED): New.
* config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Add
-DUSE_ELF_SYMVER.
---
From 760b79124482860b4317f4d39fbe898cfbe8e47b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 29 Mar 2015 18:03:49 -0700
Subject: [PATCH] Hide __cpu_indicator_init/__cpu_model from linker

We shouldn't call external function, __cpu_indicator_init, while an object
is being relocated since its .got.plt section hasn't been updated.  It
works for non-PIE since no

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Tue, Mar 31, 2015 at 15:07:58 +0200, Jakub Jelinek wrote:
> On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
> > > What is the reason to register and allocate these one at a time, rather
> > > than using one struct target_mem_desc with one tgt->array for all splay
> > > tree nodes registered from one image?
> > > Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
> > > clear it is special) and just use tgt_offset relative to that (i.e.
> > > absolute), but having to malloc each node individually and having to
> > > malloc a target_mem_desc for each one sounds expensive.
> > > Everything is freed just once anyway, isn't it?
> > 
> > Here is WIP patch, does this look like what you suggested?  It works fine
> > with functions, however I'm not sure what to do with variables.  Will
> > gomp_map_vars work when tgt_start and tgt_end are equal to 0?
> 
> Can you explain what you are afraid of?  The mapped images (both their
> mapping and unmapping) are done in pairs, and in a valid program the
> addresses shouldn't be already mapped when the image is mapped in etc.
> So, for gomp_map_vars, the var allocations should just be the pre-existing
> mappings, i.e.
>   splay_tree_key n = splay_tree_lookup (&mm->splay_tree, &cur_node);
>   if (n)
> {
>   tgt->list[i] = n;
>   gomp_map_vars_existing (n, &cur_node, kind & typemask);
> }
> case and
>   if (is_target)
> {
>   for (i = 0; i < mapnum; i++)
> {
>   if (tgt->list[i] == NULL)
> cur_node.tgt_offset = (uintptr_t) NULL;
>   else
> cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
>   + tgt->list[i]->tgt_offset;
>   /* FIXME: see above FIXME comment.  */
>   devicep->host2dev_func (devicep->target_id,
>   (void *) (tgt->tgt_start
> + i * sizeof (void *)),
>   (void *) &cur_node.tgt_offset,
>   sizeof (void *));
> }
> }
> at the end.  tgt->list[i] will be non-NULL, tgt->list[i]->tgt->tgt_start
> will be 0, but tgt->list[i]->tgt_offset will be absolute and so should DTRT.
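
The arithmetic described above can be sketched as follows. The struct and function names are hypothetical simplifications, not the actual libgomp definitions; only the field names tgt_start, tgt_end and tgt_offset are taken from the discussion.

```cpp
#include <cassert>
#include <cstdint>

namespace map_sketch {

// Simplified descriptor of one device memory region.
struct mem_desc
{
  std::uintptr_t tgt_start;
  std::uintptr_t tgt_end;
};

// Simplified splay-tree key: which region it lives in, and where.
struct key
{
  const mem_desc *tgt;
  std::uintptr_t tgt_offset;
};

// The device address is always region start plus offset.  For entries
// registered from an image the region is {0, 0}, so the offset is
// already the absolute address.
inline std::uintptr_t device_address(const key &k)
{
  return k.tgt->tgt_start + k.tgt_offset;
}

// Usage example with made-up addresses.
inline void check_addresses()
{
  const mem_desc image = {0, 0};             // special image-wide descriptor
  const mem_desc region = {0x8000, 0x9000};  // ordinary mapped region
  const key from_image = {&image, 0x1000};
  const key from_region = {&region, 0x10};
  assert(device_address(from_image) == 0x1000);
  assert(device_address(from_region) == 0x8010);
}

} // namespace map_sketch
```

Both kinds of entries go through the same formula, so code that computes a device address needs no special case for image-registered variables.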

Ok, thanks for the clarification!  Here is the new patch with variables.

Unfortunately I see 4 failures in make check-target-libgomp with the PTX patch
applied on top, but with offloading to PTX disabled.
Julian, have you seen them?  All other tests passed with intelmic emul.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

acc_on_device-1.c aborts here:
  /* Offloaded.  */
#pragma acc parallel
  {
if (acc_on_device (acc_device_none))
  abort ();


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   "#ifdef __cplusplus\n"
   "extern \"C\"\n"
   "#endif\n"
-  "void GOMP_offload_register (void *, int, void *);\n\n"
+  "void GOMP_offload_register (void *, int, void *);\n"
+  "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
   "__attribute__((constructor))\n"
   "static void\n"
   "init (void)\n"
   "{\n"
   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n"
+  "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  "__attribute__((destructor))\n"
+  "static void\n"
+  "fini (void)\n"
+  "{\n"
+  "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n"
   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 

Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread Jack Howarth
H.J.,
Did you attach the correct version of the patch? I don't see
anything conditional on linux.
Jack

On Tue, Mar 31, 2015 at 11:58 AM, H.J. Lu  wrote:
> On Tue, Mar 31, 2015 at 7:25 AM, Jack Howarth  
> wrote:
>> H.J.,
>>  While the latest patch fails to bootstrap on x86_64-apple-darwin14...
>>
>>   _restore_x86_fp_state in os-unix-sysdep.o
>>   _sysdep_save_fp_ctrl_state in os-unix-sysdep.o
>> ld: symbol(s) not found for architecture x86_64
>> collect2: error: ld returned 1 exit status
>> Makefile:540: recipe for target 'libcilkrts.la' failed
>> make[2]: *** [libcilkrts.la] Error 1
>> make[2]: Leaving directory
>> '/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir/x86_64-apple-darwin14.3.0/libcilkrts'
>> Makefile:13569: recipe for target 'all-target-libcilkrts' failed
>> make[1]: *** [all-target-libcilkrts] Error 2
>> make[1]: Leaving directory '/sw/src/fink.build/gcc5-5.0.0-1/darwin_objdir'
>> Makefile:21064: recipe for target 'bootstrap' failed
>> make: *** [bootstrap] Error 2
>>
>> as darwin will require the new usage of libgcc_nonshared.a to be added
>> to the spec handling with...
>
> Here is the updated patch to make libgcc_nonshared.a optional
> so that it is only needed on Linux.
>
>> Index: gcc/config/darwin.h
>> ===
>> --- gcc/config/darwin.h (revision 221794)
>> +++ gcc/config/darwin.h (working copy)
>> @@ -325,7 +325,7 @@ extern GTY(()) int darwin_ms_struct;
>> need symbols from -lgcc.  */
>>  #undef REAL_LIBGCC_SPEC
>>  #define REAL_LIBGCC_SPEC   \
>> -   "%{static-libgcc|static: -lgcc_eh -lgcc;   \
>> +   "%{static-libgcc|static: -lgcc_eh -lgcc_nonshared -lgcc;   \
>>shared-libgcc|fexceptions|fgnu-runtime:   \
>> %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4)   \
>> %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
>> @@ -336,7 +336,7 @@ extern GTY(()) int darwin_ms_struct;
>> %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
>> %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4)   \
>> %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5)   \
>> -   -lgcc }"
>> +   -lgcc_nonshared -lgcc }"
>>
>>  /* We specify crt0.o as -lcrt0.o so that ld will search the library path.
>>
>>  Jack
>> ps One minor nit...
>>
>> Index: gcc/gcc.c
>> ===
>> --- gcc/gcc.c (revision 221794)
>> +++ gcc/gcc.c (working copy)
>> @@ -1566,11 +1566,13 @@ init_spec (void)
>>   if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
>>{
>>  init_gcc_specs (&obstack,
>> +"-lgcc_nonshared "
>>  "-lgcc_s"
>>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>>  " -lunwind"
>>  #endif
>>  ,
>> +"-lgcc_nonshared "
>>  "-lgcc",
>>  "-lgcc_eh"
>>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>> @@ -1591,7 +1593,9 @@ init_spec (void)
>>  /* Ug.  We don't know shared library extensions.  Hope that
>> systems that use this form don't do shared libraries.  */
>>  init_gcc_specs (&obstack,
>> +"libgcc_nonshared.a%s "
>>  "-lgcc_s",
>> +"libgcc_nonshared.a%s "
>>  "libgcc.a%s",
>>  "libgcc_eh.a%s"
>>
>> You seem to have unnecessary trailing whitespace at the end of these flags.
>>
>
> The white space is needed to avoid -lgcc_nonshared-lgcc_s.
>
>
> --
> H.J.
> ---
>
> We shouldn't call external function, __cpu_indicator_init, while an object
> is being relocated since its .got.plt section hasn't been updated.  It
> works for non-PIE since no update on .got.plt section is required.  This
> patch hides __cpu_indicator_init/__cpu_model from linker to force linker
> to resolve __cpu_indicator_init/__cpu_model to their hidden definitions
> in libgcc_nonshared.a while providing backward binary compatibility.  The
> new libgcc_nonshared.a is always linked together with -lgcc_s and -lgcc.
>
> gcc/
>
> PR target/65612
> * gcc.c (init_spec): Add -lgcc_nonshared/libgcc_nonshared.a%s
> to -lgcc_s.
>
> gcc/testsuite/
>
> PR target/65612
> * g++.dg/ext/mv18.C: New test.
> * g++.dg/ext/mv19.C: Likewise.
> * g++.dg/ext/mv20.C: Likewise.
> * g++.dg/ext/mv21.C: Likewise.
> * g++.dg/ext/mv22.C: Likewise.
> * g++.dg/ext/mv23.C: Likewise.
>
> libgcc/
>
> PR target/65612
> * Makefile.in (LIB2ADDNONSHARED): New.
> (libgcc-nonshared-objects): Likewise.
> (libgcc_nonshared.a): Likewise.
> Check unsupported files in LIB2ADDNONSHARED.
> (iter-items): Add $(LIB2ADDNONSHARED).
> (all): Depend on libgcc_nonshared.a.
> ($(libgcc-nonshared-objects)): Depend on libgcc_tm.h.
> (install-leaf): Install libgcc_nonshared.a.
> * shared-object.mk: Check empty $o.
> * config/i386/cpuinfo.c (__cpu_model): Initialize.
> (__cpu_indicator_init@GCC_4.8.0): New.
> (__cpu_model@GCC_4.8.0): Likewise.
> * config/i386/t-cpuinfo (LIB2ADDNONSHARED): New.
> * config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Add
> -DUSE_ELF_SYMVER.
> ---


Re: [patch c++]: Fix for PR/65390

2015-03-31 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread H.J. Lu
On Tue, Mar 31, 2015 at 9:09 AM, Jack Howarth  wrote:
> H.J.,
> Did you attach the correct version of the patch? I don't see
> anything conditional on linux.
> Jack

My patch will build and install libgcc_nonshared.a for all targets.  If you
don't link against it, nothing is changed.  On Linux, it is used via the
init_spec change.

-- 
H.J.


Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread Jack Howarth
On Tue, Mar 31, 2015 at 12:14 PM, H.J. Lu  wrote:
> On Tue, Mar 31, 2015 at 9:09 AM, Jack Howarth  
> wrote:
>> H.J.,
>> Did you attach the correct version of the patch? I don't see
>> anything conditional on linux.
>> Jack
>
> My patch will build and install libgcc_nonshared.a for all targets.  If you
> don't link against it, nothing is changed.  On Linux, it is used via the
> init_spec change.

Isn't...

diff --git a/gcc/gcc.c b/gcc/gcc.c
index d956c36..3fbd549 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1566,6 +1566,7 @@ init_spec (void)
if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
  {
init_gcc_specs (&obstack,
+   "-lgcc_nonshared "
"-lgcc_s"
 #ifdef USE_LIBUNWIND_EXCEPTIONS
" -lunwind"
@@ -1591,6 +1592,7 @@ init_spec (void)
/* Ug.  We don't know shared library extensions.  Hope that
   systems that use this form don't do shared libraries.  */
init_gcc_specs (&obstack,
+   "libgcc_nonshared.a%s "
"-lgcc_s",
"libgcc.a%s",
"libgcc_eh.a%s"

problematic for Solaris? I am unfamiliar with the Solaris spec
handling but sol2.h doesn't seem to have any instances of -lgcc which
might imply they use the stock compiler invocation which will now have
a non-existent libgcc_nonshared static library.
 Also, are you leaving the cpu symbols in libgcc.a on non-linux
targets? If not, the linkage failure reported in
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01668.html will occur,
no?
Jack
>
> --
> H.J.


Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread H.J. Lu
On Tue, Mar 31, 2015 at 9:39 AM, Jack Howarth  wrote:
> On Tue, Mar 31, 2015 at 12:14 PM, H.J. Lu  wrote:
>> On Tue, Mar 31, 2015 at 9:09 AM, Jack Howarth  
>> wrote:
>>> H.J.,
>>> Did you attach the correct version of the patch? I don't see
>>> anything conditional on linux.
>>> Jack
>>
>> My patch will build and install libgcc_nonshared.a for all targets.  If you
>> don't link against it, nothing is changed.  On Linux, it is used via the
>> init_spec change.
>
> Isn't...
>
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index d956c36..3fbd549 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -1566,6 +1566,7 @@ init_spec (void)
> if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
>   {
> init_gcc_specs (&obstack,
> +   "-lgcc_nonshared "
> "-lgcc_s"
>  #ifdef USE_LIBUNWIND_EXCEPTIONS
> " -lunwind"
> @@ -1591,6 +1592,7 @@ init_spec (void)
> /* Ug.  We don't know shared library extensions.  Hope that
>systems that use this form don't do shared libraries.  */
> init_gcc_specs (&obstack,
> +   "libgcc_nonshared.a%s "
> "-lgcc_s",
> "libgcc.a%s",
> "libgcc_eh.a%s"
>
> problematic for Solaris? I am unfamiliar with the Solaris spec
> handling but sol2.h doesn't seem to have any instances of -lgcc which
> might imply they use the stock compiler invocation which will now have
> a non-existent libgcc_nonshared static library.

libgcc_nonshared.a is built and installed for all targets.

>  Also, are you leaving the cpu symbols in libgcc.a on non-linux
> targets? If not, the linkage failure reported in
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01668.html will occur,
> no?

My current patch doesn't change what is in libgcc.a.  It
adds libgcc_nonshared.a for all targets, which contains
the same cpuinfo.o as in libgcc.a or a dummy .o if libgcc.a
doesn't have cpuinfo.o.

-- 
H.J.


[PATCH, i386]: Fix PR 58945, Improve atomic_compare_and_swap*_doubleword pattern

2015-03-31 Thread Uros Bizjak
Hello!

As shown in the PR, the attached patch substantially improves generated
code when a cmpxchg{8,16}b insn is involved. The following testcase:

--cut here--
__int128_t i;

int main()
{
  __atomic_store_16(&i, -1, 0);
  if (i != -1)
    __builtin_abort();
  return 0;
}
--cut here--

compiles with -O2 -mcx16 to:

        movq    i(%rip), %rax
        movq    $-1, %rcx
        movq    i+8(%rip), %rdx
.L2:
        movq    %rcx, %rbx
        lock cmpxchg16b i(%rip)
        jne     .L2

where without the patch, the compiler generated:

        movq    i(%rip), %rsi
        movq    $-1, %rcx
        movq    i+8(%rip), %rdi
.L2:
        movq    %rsi, %rax
        movq    %rdi, %rdx
        movq    %rcx, %rbx
        lock cmpxchg16b i(%rip)
        movq    %rdx, %rdi
        movq    %rax, %rsi
        jne     .L2
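
Editorial note: both listings implement the store as a compare-and-swap retry loop, in which the current value is loaded once and the cmpxchg instruction itself refreshes the expected value on failure. A rough sketch of the same shape in portable C++ follows. It assumes a single 64-bit word via `std::atomic` rather than the 16-byte object in the testcase, and `store_via_cas` and `demo_store` are hypothetical names, not anything from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Store `desired` using only compare-and-swap.  On failure,
// compare_exchange_weak writes the value it found back into `expected`,
// so no separate reload is needed inside the loop.
void store_via_cas (std::atomic<std::uint64_t> &target, std::uint64_t desired)
{
  std::uint64_t expected = target.load (std::memory_order_relaxed);
  while (!target.compare_exchange_weak (expected, desired,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed))
    {
      // Retry with the refreshed `expected`.
    }
}

// Single-threaded usage check of the helper above.
std::uint64_t demo_store ()
{
  std::atomic<std::uint64_t> value (5);
  store_via_cas (value, ~std::uint64_t (0));
  assert (value.load () == ~std::uint64_t (0));
  return value.load ();
}
```

The improved code generation corresponds to keeping `expected` in the registers the instruction already uses, instead of copying it in and out on every iteration.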

2015-03-31  Uros Bizjak  

PR target/58945
* config/i386/sync.md (atomic_compare_and_swap_doubleword):
Do not split operands 0 and operands 2 to halfmode.
(atomic_compare_and_swap): Update for
atomic_compare_and_swap_doubleword changes.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32} and was committed to mainline.

Uros.
Index: config/i386/sync.md
===
--- config/i386/sync.md (revision 221786)
+++ config/i386/sync.md (working copy)
@@ -351,21 +351,12 @@
   else
 {
   machine_mode hmode = mode;
-  rtx lo_o, lo_e, lo_n, hi_o, hi_e, hi_n;
 
-  lo_o = operands[1];
-  lo_e = operands[3];
-  lo_n = operands[4];
-  hi_o = gen_highpart (hmode, lo_o);
-  hi_e = gen_highpart (hmode, lo_e);
-  hi_n = gen_highpart (hmode, lo_n);
-  lo_o = gen_lowpart (hmode, lo_o);
-  lo_e = gen_lowpart (hmode, lo_e);
-  lo_n = gen_lowpart (hmode, lo_n);
-
   emit_insn
(gen_atomic_compare_and_swap_doubleword
-(lo_o, hi_o, operands[2], lo_e, hi_e, lo_n, hi_n, operands[6]));
+(operands[1], operands[2], operands[3],
+gen_lowpart (hmode, operands[4]), gen_highpart (hmode, operands[4]),
+operands[6]));
 }
 
   ix86_expand_setcc (operands[0], EQ, gen_rtx_REG (CCZmode, FLAGS_REG),
@@ -389,31 +380,26 @@
   "lock{%;} %K4cmpxchg{}\t{%3, %1|%1, %3}")
 
 ;; For double-word compare and swap, we are obliged to play tricks with
-;; the input newval (op5:op6) because the Intel register numbering does
+;; the input newval (op3:op4) because the Intel register numbering does
 ;; not match the gcc register numbering, so the pair must be CX:BX.
-;; That said, in order to take advantage of possible lower-subreg opts,
-;; treat all of the integral operands in the same way.
 
 (define_mode_attr doublemodesuffix [(SI "8") (DI "16")])
 
 (define_insn "atomic_compare_and_swap_doubleword"
-  [(set (match_operand:DWIH 0 "register_operand" "=a")
-   (unspec_volatile:DWIH
- [(match_operand: 2 "memory_operand" "+m")
-  (match_operand:DWIH 3 "register_operand" "0")
-  (match_operand:DWIH 4 "register_operand" "1")
-  (match_operand:DWIH 5 "register_operand" "b")
-  (match_operand:DWIH 6 "register_operand" "c")
-  (match_operand:SI 7 "const_int_operand")]
+  [(set (match_operand: 0 "register_operand" "=A")
+   (unspec_volatile:
+ [(match_operand: 1 "memory_operand" "+m")
+  (match_operand: 2 "register_operand" "0")
+  (match_operand:DWIH 3 "register_operand" "b")
+  (match_operand:DWIH 4 "register_operand" "c")
+  (match_operand:SI 5 "const_int_operand")]
  UNSPECV_CMPXCHG))
-   (set (match_operand:DWIH 1 "register_operand" "=d")
-   (unspec_volatile:DWIH [(const_int 0)] UNSPECV_CMPXCHG))
-   (set (match_dup 2)
+   (set (match_dup 1)
(unspec_volatile: [(const_int 0)] UNSPECV_CMPXCHG))
(set (reg:CCZ FLAGS_REG)
 (unspec_volatile:CCZ [(const_int 0)] UNSPECV_CMPXCHG))]
   "TARGET_CMPXCHGB"
-  "lock{%;} %K7cmpxchgb\t%2")
+  "lock{%;} %K5cmpxchgb\t%1")
 
 ;; For operand 2 nonmemory_operand predicate is used instead of
 ;; register_operand to allow combiner to better optimize atomic


[C++ Patch/RFC] PR 56100

2015-03-31 Thread Paolo Carlini

Hi,

thus, in order not to emit -Wshadow warnings at instantiation time, I came up 
with the patch below. Tested x86_64-linux.


Note, I took the idea of allowing for current_instantiation ()->decl != 
current_function_decl from some code prepared by Dodji for 
-Wunused-local-typedefs: I'm not 100% sure it's necessary here, but in 
any case testcases *10.C and *11.C exercise that path (in general, we 
don't seem to have many testcases involving this specific kind of 
-Wshadow and templates, thus it cannot hurt, IMO)


Thanks!
Paolo.


Index: cp/name-lookup.c
===
--- cp/name-lookup.c(revision 221795)
+++ cp/name-lookup.c(working copy)
@@ -1277,7 +1277,10 @@ pushdecl_maybe_friend_1 (tree x, bool is_friend)
   old and new decls are type decls.  */
|| (TREE_CODE (oldglobal) == TYPE_DECL
&& (!DECL_ARTIFICIAL (oldglobal)
-   || TREE_CODE (x) == TYPE_DECL
+   || TREE_CODE (x) == TYPE_DECL)))
+  && (current_instantiation () == NULL
+  || (current_instantiation ()->decl
+  != current_function_decl)))
/* XXX shadow warnings in outer-more namespaces */
{
  if (warning_at (input_location, OPT_Wshadow,
Index: testsuite/g++.dg/warn/Wshadow-10.C
===
--- testsuite/g++.dg/warn/Wshadow-10.C  (revision 0)
+++ testsuite/g++.dg/warn/Wshadow-10.C  (working copy)
@@ -0,0 +1,15 @@
+// PR c++/56100
+// { dg-options "-Wshadow" }
+
+struct bar
+{
+  template 
+  void baz () { int foo; }
+};
+
+int foo;
+
+int main ()
+{
+  bar ().baz  ();
+}
Index: testsuite/g++.dg/warn/Wshadow-11.C
===
--- testsuite/g++.dg/warn/Wshadow-11.C  (revision 0)
+++ testsuite/g++.dg/warn/Wshadow-11.C  (working copy)
@@ -0,0 +1,15 @@
+// PR c++/56100
+// { dg-options "-Wshadow" }
+
+int foo;  // { dg-message "shadowed declaration" }
+
+struct bar
+{
+  template 
+  void baz () { int foo; }  // { dg-warning "shadows a global" }
+};
+
+int main ()
+{
+  bar ().baz  ();
+}
Index: testsuite/g++.dg/warn/Wshadow-8.C
===
--- testsuite/g++.dg/warn/Wshadow-8.C   (revision 0)
+++ testsuite/g++.dg/warn/Wshadow-8.C   (working copy)
@@ -0,0 +1,15 @@
+// PR c++/56100
+// { dg-options "-Wshadow" }
+
+template 
+struct bar
+{
+  void baz () { int foo; }
+};
+
+int foo;
+
+int main ()
+{
+  bar  ().baz ();
+}
Index: testsuite/g++.dg/warn/Wshadow-9.C
===
--- testsuite/g++.dg/warn/Wshadow-9.C   (revision 0)
+++ testsuite/g++.dg/warn/Wshadow-9.C   (working copy)
@@ -0,0 +1,15 @@
+// PR c++/56100
+// { dg-options "-Wshadow" }
+
+int foo;  // { dg-message "shadowed declaration" }
+
+template 
+struct bar
+{
+  void baz () { int foo; }  // { dg-warning "shadows a global" }
+};
+
+int main ()
+{
+  bar  ().baz ();
+}


Re: [PATCH] fortran/65429 -- don't dereference a null pointer

2015-03-31 Thread Jerry DeLisle

On 03/29/2015 09:25 AM, Steve Kargl wrote:

On Sat, Mar 28, 2015 at 01:01:57AM +0100, Dominique Dhumieres wrote:


AFAICT your test succeeds without your patch and does not test that the ICE
reported by FX is gone (indeed it is with your patch).



New patch and testcase.  The ChangeLog entries are in the
original email.  Built and tested on x86_64-*-freebsd.
OK, now?



OK Steve.


C++ PATCH for c++/65554 (ICE with user-defined initializer_list)

2015-03-31 Thread Marek Polacek
The user *should* have been using <initializer_list>.  But responding to this
with an ICE isn't acceptable either.

We do reject wholly incompatible user-defined initializer_list: finish_struct
requires it be a template with a pointer field followed by an integer field,
and in this case it is, but convert_like_real assumes that the second integer
field has a size_type, so it initializes the length with that type.  But as the
following testcase (which clang accepts) shows, it might be a different integer
type, and gimplifier doesn't like any non-trivial conversion in an assignment.

This fixes only a part of the PR.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-03-31  Marek Polacek  

PR c++/65554
* call.c (convert_like_real): Build integer constant with the field
type, not always of size_type.

* g++.dg/cpp0x/initlist93.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 31d2b9c..b171179 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -6366,7 +6366,8 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
field = next_initializable_field (TYPE_FIELDS (totype));
CONSTRUCTOR_APPEND_ELT (vec, field, array);
field = next_initializable_field (DECL_CHAIN (field));
-   CONSTRUCTOR_APPEND_ELT (vec, field, size_int (len));
+   CONSTRUCTOR_APPEND_ELT (vec, field,
+   build_int_cst (TREE_TYPE (field), len));
new_ctor = build_constructor (totype, vec);
return get_target_expr_sfinae (new_ctor, complain);
   }
diff --git gcc/testsuite/g++.dg/cpp0x/initlist93.C gcc/testsuite/g++.dg/cpp0x/initlist93.C
index e69de29..230e2f9 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist93.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist93.C
@@ -0,0 +1,26 @@
+// PR c++/65554
+// { dg-do compile { target c++11 } }
+
+namespace std
+{
+template  class initializer_list
+{
+  int *_M_array;
+  int _M_len;
+};
+class A
+{
+public:
+  void operator=(initializer_list);
+};
+class B
+{
+  void m_fn1 (A &) const;
+};
+void
+B::m_fn1 (A &) const
+{
+  A extra;
+  extra = {};
+}
+}

Marek
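
Editorial note: for comparison with the testcase, here is a minimal sketch of the same assignment written against the standard header, where the length is reported as `std::size_t`. The class `A` mirrors the testcase name; the member `last_len` and the function `assign_lengths` are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

class A
{
public:
  std::size_t last_len = 99;  // sentinel so an empty list is observable
  void operator= (std::initializer_list<int> list) { last_len = list.size (); }
};

// Assigns an empty and a three-element braced list and reports the last length.
std::size_t assign_lengths ()
{
  A extra;
  extra = {};
  assert (extra.last_len == 0);
  extra = {1, 2, 3};
  return extra.last_len;
}
```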


libgo patch committed: Fix go tool to put external tests first

2015-03-31 Thread Ian Lance Taylor
When a complex package has both external and internal tests, we need
to link against the external tests first.  This patch from Dave Cheney
fixes this.  Bootstrapped on x86_64-unknown-linux-gnu.  Committed to
mainline.

Ian
diff -r c601118c5169 libgo/go/cmd/go/build.go
--- a/libgo/go/cmd/go/build.go  Tue Mar 31 10:25:51 2015 -0700
+++ b/libgo/go/cmd/go/build.go  Tue Mar 31 10:53:19 2015 -0700
@@ -1921,6 +1921,7 @@
// and all LDFLAGS from cgo dependencies.
apackagesSeen := make(map[*Package]bool)
afiles := []string{}
+   xfiles := []string{}
ldflags := b.gccArchArgs()
cgoldflags := []string{}
usesCgo := false
@@ -1936,7 +1937,12 @@
if !a.p.Standard {
if a.p != nil && !apackagesSeen[a.p] {
apackagesSeen[a.p] = true
-   if a.p.fake {
+   if a.p.fake && a.p.external {
+   // external _tests, if present must come before
+   // internal _tests. Store these on a seperate list
+   // and place them at the head after this loop.
+   xfiles = append(xfiles, a.target)
+   } else if a.p.fake {
// move _test files to the top of the link order
afiles = append([]string{a.target}, afiles...)
} else {
@@ -1945,6 +1951,7 @@
}
}
}
+   afiles = append(xfiles, afiles...)
 
for _, a := range allactions {
if a.p != nil {
diff -r c601118c5169 libgo/go/cmd/go/pkg.go
--- a/libgo/go/cmd/go/pkg.goTue Mar 31 10:25:51 2015 -0700
+++ b/libgo/go/cmd/go/pkg.goTue Mar 31 10:53:19 2015 -0700
@@ -83,6 +83,7 @@
allgofiles   []string // gofiles + IgnoredGoFiles, absolute paths
target   string   // installed file for this package (may be executable)
fake bool // synthesized package
+   external bool // synthesized external test package
forceBuild   bool // this package must be rebuilt
forceLibrary bool // this package is a library (even if named "main")
cmdline  bool // defined by files listed on command line
diff -r c601118c5169 libgo/go/cmd/go/test.go
--- a/libgo/go/cmd/go/test.go   Tue Mar 31 10:25:51 2015 -0700
+++ b/libgo/go/cmd/go/test.go   Tue Mar 31 10:53:19 2015 -0700
@@ -692,10 +692,11 @@
build: &build.Package{
ImportPos: p.build.XTestImportPos,
},
-   imports: ximports,
-   pkgdir:  testDir,
-   fake:true,
-   Stale:   true,
+   imports:  ximports,
+   pkgdir:   testDir,
+   fake: true,
+   external: true,
+   Stale:true,
}
if pxtestNeedsPtest {
pxtest.imports = append(pxtest.imports, ptest)
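
Editorial note: the ordering rule in the build.go hunk can be sketched as follows, transliterated to C++ so it stands alone. External test archives are collected separately and placed first, internal test archives are moved to the front of the remaining list, and everything else keeps its order. The `Action` struct and `link_order` are hypothetical names, and the final `else` branch (plain append) is an assumption because the hunk is cut off before it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the fields of a build action used by the patch.
struct Action
{
  std::string target;
  bool fake;      // synthesized test package
  bool external;  // synthesized external test package
};

std::vector<std::string> link_order (const std::vector<Action> &actions)
{
  std::vector<std::string> afiles, xfiles;
  for (const Action &a : actions)
    {
      assert (!a.target.empty ());
      if (a.fake && a.external)
        xfiles.push_back (a.target);              // external tests: own list
      else if (a.fake)
        afiles.insert (afiles.begin (), a.target); // internal tests: to the top
      else
        afiles.push_back (a.target);              // assumed: ordinary append
    }
  // Equivalent of afiles = append(xfiles, afiles...).
  xfiles.insert (xfiles.end (), afiles.begin (), afiles.end ());
  return xfiles;
}
```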


Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8

2015-03-31 Thread Bill Schmidt
Hi,

David correctly pointed out offline that I used the wrong macro to test
for efficient unaligned access.  Here's a corrected version, which still
fixes PR65456 without causing regressions.  Sorry for the error!

Thanks,
Bill

On Sun, 2015-03-29 at 12:42 -0500, Bill Schmidt wrote:
> Hi,
> 
> This is a follow-up to
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00103.html, which adds
> support for faster unaligned vector memory accesses on POWER8.  As
> pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456, there
> was a piece missing here.  The target macro SLOW_UNALIGNED_ACCESS is
> still evaluating to 1 for unaligned vector accesses on POWER8, which
> causes some scalarization to occur during expand.  This version of the
> patch fixes this as well.
> 
> The only changes from before are the update to config/rs6000/rs6000.h,
> and the new test case gcc.target/powerpc/pr65456.c.  Is this ok for
> trunk after 5 branches, and backports to 4.8, 4.9, 5 thereafter?
> 
> Thanks,
> Bill
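
Editorial note: the kind of access under discussion can be illustrated with a loop whose operands are element-aligned but not vector-aligned. This is only a sketch assuming 4-byte `float` and 16-byte vectors; `add_arrays` and `sum_with_misaligned_vectors` are hypothetical names and nothing here is specific to POWER8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A candidate for vectorization; callers decide the alignment of the pointers.
void add_arrays (const float *a, const float *b, float *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}

float sum_with_misaligned_vectors ()
{
  alignas (16) float a[9];
  alignas (16) float b[9];
  alignas (16) float out[9];
  for (int i = 0; i < 9; ++i)
    {
      a[i] = float (i);
      b[i] = float (2 * i);
      out[i] = 0.0f;
    }
  // Starting at element 1 puts every operand 4 bytes past a 16-byte boundary.
  assert (reinterpret_cast<std::uintptr_t> (a + 1) % 16 == 4);
  add_arrays (a + 1, b + 1, out + 1, 8);
  float sum = 0.0f;
  for (int i = 1; i < 9; ++i)
    sum += out[i];
  return sum;
}
```

Whether such a loop is vectorized with direct unaligned loads, peeled for alignment, or scalarized is the decision the target hooks listed in the ChangeLog below influence.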


[gcc]

2015-03-31  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_option_override_internal):  For
VSX + POWER8, enable TARGET_ALLOW_MOVMISALIGN and
TARGET_EFFICIENT_UNALIGNED_VSX if not selected by command line
option.  However, for -mno-allow-movmisalign, be sure to disable
TARGET_EFFICIENT_UNALIGNED_VSX to avoid an ICE.
(rs6000_builtin_mask_for_load): Return 0 for targets with
efficient unaligned VSX accesses so that the vectorizer will use
direct unaligned loads.
(rs6000_builtin_support_vector_misalignment): Always return true
for targets with efficient unaligned VSX accesses.
(rs6000_builtin_vectorization_cost): Cost of unaligned loads and
stores on targets with efficient unaligned VSX accesses is almost
always the same as the cost of an aligned load or store, so model
it that way.
* config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Evaluate to
zero for unaligned vector accesses on POWER8.
* config/rs6000/rs6000.opt (mefficient-unaligned-vector): New
undocumented option.

[gcc/testsuite]

2015-03-31  Bill Schmidt  

* gcc.dg/vect/bb-slp-24.c: Exclude test for POWER8.
* gcc.dg/vect/bb-slp-25.c: Likewise.
* gcc.dg/vect/bb-slp-29.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Replace vect_no_align with
vect_no_align && { ! vect_hw_misalign }.
* gcc.dg/vect/bb-slp-9.c: Likewise.
* gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Exclude test for
vect_hw_misalign.
* gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Likewise.
* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust tests to
account for POWER8, where peeling for alignment is not needed.
* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Replace
vect_no_align with vect_no_align && { ! vect_hw_misalign }.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-6-global.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-6.c: Likewise.
* gcc.dg/vect/no-vfa-vect-43.c: Likewise.
* gcc.dg/vect/no-vfa-vect-57.c: Likewise.
* gcc.dg/vect/no-vfa-vect-61.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/pr16105.c: Likewise.
* gcc.dg/vect/pr20122.c: Likewise.
* gcc.dg/vect/pr33804.c: Likewise.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr56787.c: Likewise.
* gcc.dg/vect/pr58508.c: Likewise.
* gcc.dg/vect/slp-25.c: Likewise.
* gcc.dg/vect/vect-105-bit-array.c: Likewise.
* gcc.dg/vect/vect-105.c: Likewise.
* gcc.dg/vect/vect-27.c: Likewise.
* gcc.dg/vect/vect-29.c: Likewise.
* gcc.dg/vect/vect-33.c: Exclude unaligned access test for
POWER8.
* gcc.dg/vect/vect-42.c: Replace vect_no_align with vect_no_align
&& { ! vect_hw_misalign }.
* gcc.dg/vect/vect-44.c: Likewise.
* gcc.dg/vect/vect-48.c: Likewise.
* gcc.dg/vect/vect-50.c: Likewise.
* gcc.dg/vect/vect-52.c: Likewise.
* gcc.dg/vect/vect-56.c: Likewise.
* gcc.dg/vect/vect-60.c: Likewise.
* gcc.dg/vect/vect-72.c: Likewise.
* gcc.dg/vect/vect-75-big-array.c: Likewise.
* gcc.dg/vect/vect-75.c: Likewise.
* gcc.dg/vect/vect-77-alignchecks.c: Likewise.
* gcc.dg/vect/vect-77-global.c: Likewise.
* gcc.dg/vect/vect-78-alignchecks.c: Likewise.
* gcc.dg/vect/vect-78-global.c: Likewise.
* gcc.dg/vect/vect-93.c: Likewise.
* gcc.dg/vect/vect-95.c: Likewise.
* gcc.dg/vect/vect-96.c: Likewise.
* gcc.dg/vect/vect-cond-1.c: Likewise.
* gcc.dg/vect/vect-cond-3.c: Likewise.
* gcc.dg/vect/vect-cond-4.c: Likewise.
 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> Shouldn't either this function, or gomp_offload_image_to_device lock
> also devicep->lock mutex and unlock at the end?
> Where exactly I guess depends on if the devicep->* hook calls should be
> guarded with the mutex or not.  If yes, it should be this function and
> gomp_init_device.
> 
> > +  if (devicep->type != target_type || !devicep->is_initialized)
> > +   continue;
> > +
> 
> Similarly.

Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep->lock and then
calls gomp_fatal, the destructors from .fini section are executed, so
gomp_mutex_lock in GOMP_offload_unregister will wait for devicep->lock.

  -- Ilya


Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-03-31 Thread Jack Howarth
On Tue, Mar 31, 2015 at 1:00 PM, H.J. Lu  wrote:
> On Tue, Mar 31, 2015 at 9:39 AM, Jack Howarth  
> wrote:
>> On Tue, Mar 31, 2015 at 12:14 PM, H.J. Lu  wrote:
>>> On Tue, Mar 31, 2015 at 9:09 AM, Jack Howarth  
>>> wrote:
>>>> H.J.,
>>>> Did you attach the correct version of the patch? I don't see
>>>> anything conditional on linux.
>>>> Jack
>>>
>>> My patch will build and install libgcc_nonshared.a for all targets.  If you
>>> don't link against it, nothing is changed.  On Linux, it is used via the
>>> init_spec change.
>>
>> Isn't...
>>
>> diff --git a/gcc/gcc.c b/gcc/gcc.c
>> index d956c36..3fbd549 100644
>> --- a/gcc/gcc.c
>> +++ b/gcc/gcc.c
>> @@ -1566,6 +1566,7 @@ init_spec (void)
>> if (in_sep && *p == '-' && strncmp (p, "-lgcc", 5) == 0)
>>   {
>> init_gcc_specs (&obstack,
>> +   "-lgcc_nonshared "
>> "-lgcc_s"
>>  #ifdef USE_LIBUNWIND_EXCEPTIONS
>> " -lunwind"
>> @@ -1591,6 +1592,7 @@ init_spec (void)
>> /* Ug.  We don't know shared library extensions.  Hope that
>>systems that use this form don't do shared libraries.  */
>> init_gcc_specs (&obstack,
>> +   "libgcc_nonshared.a%s "
>> "-lgcc_s",
>> "libgcc.a%s",
>> "libgcc_eh.a%s"
>>
>> problematic for Solaris? I am unfamiliar with the Solaris spec
>> handling but sol2.h doesn't seem to have any instances of -lgcc which
>> might imply they use the stock compiler invocation which will now have
>> a non-existent libgcc_nonshared static library.
>
> libgcc_nonshared.a is built and installed for all targets.
>
>>  Also, are you leaving the cpu symbols in libgcc.a on non-linux
>> targets? If not, the linkage failure reported in
>> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01668.html will occur,
>> no?
>
> My current patch doesn't change what is in libgcc.a.  It
> adds libgcc_nonshared.a for all targets, which contains
> the same cpuinfo.o as in libgcc.a or a dummy .o if libgcc.a
> doesn't have cpuinfo.o.


I can confirm that the most current patch bootstraps on
x86_64-apple-darwin14 and that all of the new tests show up as
unsupported in the test suite.
  Jack

>
> --
> H.J.


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 09:25:26PM +0300, Ilya Verbin wrote:
> On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
> > Shouldn't either this function, or gomp_offload_image_to_device lock
> > also devicep->lock mutex and unlock at the end?
> > Where exactly I guess depends on if the devicep->* hook calls should be
> > guarded with the mutex or not.  If yes, it should be this function and
> > gomp_init_device.
> > 
> > > +  if (devicep->type != target_type || !devicep->is_initialized)
> > > + continue;
> > > +
> > 
> > Similarly.
> 
> Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep->lock and then
> calls gomp_fatal, the destructors from .fini section are executed, so
> gomp_mutex_lock in GOMP_offload_unregister will wait for devicep->lock.

Thus perhaps before calling gomp_fatal you should release the device lock
(if held) and register_lock (ditto).

Jakub
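
Editorial note: the suggestion above amounts to dropping the lock before control reaches code that may run cleanup needing the same lock. A minimal sketch of that ordering follows, using `std::mutex` as a stand-in for devicep->lock. All names (`device_lock`, `cleanup_needing_lock`, `report_error_safely`) are hypothetical and this is not libgomp code.

```cpp
#include <cassert>
#include <mutex>

std::mutex device_lock;    // stand-in for devicep->lock
bool cleanup_ran = false;

// Stand-in for the unregister path run from a destructor: it takes the lock.
void cleanup_needing_lock ()
{
  std::lock_guard<std::mutex> guard (device_lock);
  cleanup_ran = true;
}

// Stand-in for an error path that detects a failure while holding the lock.
bool report_error_safely ()
{
  std::unique_lock<std::mutex> guard (device_lock);
  // ... error detected here, lock still held ...
  guard.unlock ();          // release first
  cleanup_needing_lock ();  // would block forever if the lock were still held
  assert (!guard.owns_lock ());
  return cleanup_ran;
}
```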


Re: [PATCH] [ARM] PR45701 testcase fix.

2015-03-31 Thread Alex Velenko



On 31/03/15 15:30, Richard Earnshaw wrote:

On 04/03/15 11:13, Alex Velenko wrote:

Hi,

This patch fixes the ARM pr45701 scan-assembler tests. Those tests check that
register r3 is used to maintain stack double-word alignment. Recent optimizations
reduced the number of local variables needed in those tests, removing the need to
push r3. The testcases are fixed by adding an additional local variable.

Is patch OK?



This patch is OK.

Let me put it on record that I really dislike these scan-assembler tests
that rely on specific behaviours throughout the entire compilation flow.
  They're just too fragile to be useful.

R.



Committed.


2015-03-04  Alex Velenko  

gcc/testsuite

* gcc.target/arm/pr45701-1.c (history_expand_line_internal): Add an
extra variable to force stack alignment.
* gcc.target/arm/pr45701-2.c (history_expand_line_internal): Add an
extra variable to force stack alignment.
---
  gcc/testsuite/gcc.target/arm/pr45701-1.c | 5 +++--
  gcc/testsuite/gcc.target/arm/pr45701-2.c | 5 +++--
  2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr45701-1.c b/gcc/testsuite/gcc.target/arm/pr45701-1.c
index 2c690d5..454a087 100644
--- a/gcc/testsuite/gcc.target/arm/pr45701-1.c
+++ b/gcc/testsuite/gcc.target/arm/pr45701-1.c
@@ -5,6 +5,7 @@
  /* { dg-final { scan-assembler-not "r8" } } */

  extern int hist_verify;
+extern int a1;
  extern char *pre_process_line (char*);
  extern char* str_cpy (char*, char*);
  extern int str_len (char*);
@@ -16,10 +17,10 @@ history_expand_line_internal (char* line)
  {
char *new_line;
int old_verify;
-
+  int a = a1;
old_verify = hist_verify;
hist_verify = 0;
new_line = pre_process_line (line);
-  hist_verify = old_verify;
+  hist_verify = old_verify + a;
return (new_line == line) ? savestring (line) : new_line;
  }
diff --git a/gcc/testsuite/gcc.target/arm/pr45701-2.c b/gcc/testsuite/gcc.target/arm/pr45701-2.c
index ee1ee7d..afe0840 100644
--- a/gcc/testsuite/gcc.target/arm/pr45701-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr45701-2.c
@@ -5,6 +5,7 @@
  /* { dg-final { scan-assembler-not "r8" } } */

  extern int hist_verify;
+extern int a1;
  extern char *pre_process_line (char*);
  extern char* savestring1 (char*, char*);
  extern char* str_cpy (char*, char*);
@@ -17,11 +18,11 @@ history_expand_line_internal (char* line)
  {
char *new_line;
int old_verify;
-
+  int a = a1;
old_verify = hist_verify;
hist_verify = 0;
new_line = pre_process_line (line);
-  hist_verify = old_verify;
+  hist_verify = old_verify + a;
/* Two tail calls here, but r3 is not used to pass values.  */
return (new_line == line) ? savestring (line) : savestring1 (new_line, line);
  }







Re: C++ PATCH for c++/65554 (ICE with user-defined initializer_list)

2015-03-31 Thread Jason Merrill

On 03/31/2015 01:22 PM, Marek Polacek wrote:

The user *should* have been using <initializer_list>.  But responding to this
with an ICE isn't acceptable either.

We do reject wholly incompatible user-defined initializer_list: finish_struct
requires it be a template with a pointer field followed by an integer field,
and in this case it is, but convert_like_real assumes that the second integer
field has a size_type, so it initializes the length with that type.  But as the
following testcase (which clang accepts) shows, it might be a different integer
type, and gimplifier doesn't like any non-trivial conversion in an assignment.


I think I'd prefer to enforce that the second integer is size_t, not 
just an integer, so that the assumption in convert_like_real is correct.


Jason
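
Editorial note: as a user-level analogy for the stricter check suggested above (the real check would live in the compiler, in finish_struct), the "second field must be size_t" condition can be written as a type trait. Everything in this sketch is hypothetical: the two structs, the member name `len`, and the helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct list_with_size_t { const int *array; std::size_t len; };
struct list_with_int    { const int *array; int len; };

// True only when the length member has exactly the type std::size_t.
template <typename L>
constexpr bool length_is_size_t ()
{
  return std::is_same<decltype (L::len), std::size_t>::value;
}

// Counts how many of the two candidate layouts pass the stricter check.
int count_accepted ()
{
  int accepted = 0;
  if (length_is_size_t<list_with_size_t> ())
    ++accepted;
  if (length_is_size_t<list_with_int> ())
    ++accepted;
  assert (accepted == 1);
  return accepted;
}
```

Under such a rule the layout in the testcase, with `int _M_len`, would be rejected up front rather than reaching the conversion code.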



Re: [C++ Patch/RFC] PR 56100

2015-03-31 Thread Jason Merrill

On 03/31/2015 01:14 PM, Paolo Carlini wrote:

Note, I took the idea of allowing for current_instantiation ()->decl !=
current_function_decl from some code prepared by Dodji for
-Wunused-local-typedefs


Let's make this a predicate function.

Jason



libgo patch committed: Remove some accidentally committed files

2015-03-31 Thread Ian Lance Taylor
I noticed that I accidentally committed three generated files in
libgo/runtime.  They are unused.  I committed this patch to remove
them.

Ian
Index: libgo/runtime/chan.c
===
--- libgo/runtime/chan.c(revision 221440)
+++ libgo/runtime/chan.c(working copy)
@@ -1,1186 +0,0 @@
-// AUTO-GENERATED by autogen.sh; DO NOT EDIT
-
-#include "runtime.h"
-#include "arch.h"
-#include "go-type.h"
-#include "race.h"
-#include "malloc.h"
-#include "chan.h"
-
-#line 13 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-uint32 runtime_Hchansize = sizeof ( Hchan ) ; 
-#line 15 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-static void dequeueg ( WaitQ* ) ; 
-static SudoG* dequeue ( WaitQ* ) ; 
-static void enqueue ( WaitQ* , SudoG* ) ; 
-static void racesync ( Hchan* , SudoG* ) ; 
-#line 20 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-static Hchan* 
-makechan ( ChanType *t , int64 hint ) 
-{ 
-Hchan *c; 
-uintptr n; 
-const Type *elem; 
-#line 27 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-elem = t->__element_type; 
-#line 30 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( elem->__size >= ( 1<<16 ) ) 
-runtime_throw ( "makechan: invalid channel element type" ) ; 
-#line 33 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( hint < 0 || ( intgo ) hint != hint || ( elem->__size > 0 && ( uintptr ) 
hint > ( MaxMem - sizeof ( *c ) ) / elem->__size ) ) 
-runtime_panicstring ( "makechan: size out of range" ) ; 
-#line 36 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-n = sizeof ( *c ) ; 
-n = ROUND ( n , elem->__align ) ; 
-#line 40 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-c = ( Hchan* ) runtime_mallocgc ( sizeof ( *c ) + hint*elem->__size , ( 
uintptr ) t | TypeInfo_Chan , 0 ) ; 
-c->elemsize = elem->__size; 
-c->elemtype = elem; 
-c->dataqsiz = hint; 
-#line 45 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( debug ) 
-runtime_printf ( "makechan: chan=%p; elemsize=%D; dataqsiz=%D\n" , 
-c , ( int64 ) elem->__size , ( int64 ) c->dataqsiz ) ; 
-#line 49 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-return c; 
-} 
-Hchan* reflect_makechan(ChanType* t, uint64 size) __asm__ (GOSYM_PREFIX 
"reflect.makechan");
-Hchan* reflect_makechan(ChanType* t, uint64 size)
-{
-  Hchan* c;
-#line 52 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-
-   c = makechan(t, size);
-return c;
-}
-
-#line 56 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-Hchan* 
-__go_new_channel ( ChanType *t , uintptr hint ) 
-{ 
-return makechan ( t , hint ) ; 
-} 
-#line 62 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-Hchan* 
-__go_new_channel_big ( ChanType *t , uint64 hint ) 
-{ 
-return makechan ( t , hint ) ; 
-} 
-#line 82 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-static bool 
-chansend ( ChanType *t , Hchan *c , byte *ep , bool block , void *pc ) 
-{ 
-SudoG *sg; 
-SudoG mysg; 
-G* gp; 
-int64 t0; 
-G* g; 
-#line 91 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-g = runtime_g ( ) ; 
-#line 93 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( raceenabled ) 
-runtime_racereadobjectpc ( ep , t->__element_type , runtime_getcallerpc ( &t ) 
, chansend ) ; 
-#line 96 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( c == nil ) { 
-USED ( t ) ; 
-if ( !block ) 
-return false; 
-runtime_park ( nil , nil , "chan send (nil chan)" ) ; 
-return false; 
-} 
-#line 104 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( runtime_gcwaiting ( ) ) 
-runtime_gosched ( ) ; 
-#line 107 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( debug ) { 
-runtime_printf ( "chansend: chan=%p\n" , c ) ; 
-} 
-#line 111 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-t0 = 0; 
-mysg.releasetime = 0; 
-if ( runtime_blockprofilerate > 0 ) { 
-t0 = runtime_cputicks ( ) ; 
-mysg.releasetime = -1; 
-} 
-#line 118 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-runtime_lock ( c ) ; 
-if ( raceenabled ) 
-runtime_racereadpc ( c , pc , chansend ) ; 
-if ( c->closed ) 
-goto closed; 
-#line 124 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-if ( c->dataqsiz > 0 ) 
-goto asynch; 
-#line 127 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-sg = dequeue ( &c->recvq ) ; 
-if ( sg != nil ) { 
-if ( raceenabled ) 
-racesync ( c , sg ) ; 
-runtime_unlock ( c ) ; 
-#line 133 "../../../trunk/libgo/runtime/../../../trunk/libgo/runtime/chan.goc"
-gp = sg->g; 
-gp->param = sg; 
-if ( sg->elem != nil ) 
-runtime_memmove ( sg->elem , ep , c->elemsize ) ; 
-if ( sg->releasetime ) 
-sg->releasetime

[zlib] [PATH] Use AM_ENABLE_MULTILIB only if with_target_subdir isn't empty

2015-03-31 Thread H.J. Lu
On Tue, Mar 31, 2015 at 11:55 AM, H.J. Lu  wrote:
> On Tue, Mar 31, 2015 at 10:18 AM, Antoine Tremblay
>  wrote:
>>
>>
>> On 03/31/2015 01:16 PM, H.J. Lu wrote:
>>>
>>> On Tue, Mar 31, 2015 at 10:12 AM, Antoine Tremblay
>>>  wrote:
>
> Also doing ./configure in binutils/zlib I get :
>
> config.status: creating Makefile
> config.status: executing default-1 commands
> ./config.status: line 1190: ./../../config-ml.in: No such file or
> directory
>
> So configure does not exit cleanly...ideas?
>
>

 I did a bit more research on this issue and I get this if I build gdb
 from its source directory

 in binutils-gdb
 ./configure
 make

 make fails while in the zlib directory with:

 configure: creating ./config.status
 config.status: creating Makefile
 config.status: executing default-1 commands
 ./config.status: line 1190: ./../../config-ml.in: No such file or
 directory

 However, if I build out of tree, in binutils-gdb/build for example, I do
 not get this issue.

 Could this be related to 92c695a14f6a5a24b177e89624c13d7dbcbf9e1f ?

 Subject: [PATCH 09/76] A zlib to tarball

 I see this snippet there

 -./configure --target=i386-pc-linux-gnu
 +./configure --target=i386-pc-linux-gnu \
 +   --with-target-subdir=. \
 +   --disable-multilib

 With these options I get around the configure problem only to fail in gas
 with :
 make[4]: Entering directory `/home/x/src/binutils-gdb/gas'
 /bin/bash ./libtool --tag=CC   --mode=link gcc -W -Wall
 -Wstrict-prototypes
 -Wmissing-prototypes -Wshadow -Werror -I./../zlib -g -O2
 -static-libstdc++
 -static-libgcc  -o as-new app.o as.o atof-generic.o compress-debug.o
 cond.o
 depend.o dwarf2dbg.o dw2gencfi.o ecoff.o ehopt.o expr.o flonum-copy.o
 flonum-konst.o flonum-mult.o frags.o hash.o input-file.o input-scrub.o
 listing.o literal.o macro.o messages.o output-file.o read.o remap.o sb.o
 stabs.o subsegs.o symbols.o write.o tc-i386.o obj-elf.o atof-ieee.o
 ../opcodes/libopcodes.la ../bfd/libbfd.la ../libiberty/libiberty.a   -ldl
 libtool: link: gcc -W -Wall -Wstrict-prototypes -Wmissing-prototypes
 -Wshadow -Werror -I./../zlib -g -O2 -static-libstdc++ -static-libgcc -o
 as-new app.o as.o atof-generic.o compress-debug.o cond.o depend.o
 dwarf2dbg.o dw2gencfi.o ecoff.o ehopt.o expr.o flonum-copy.o
 flonum-konst.o
 flonum-mult.o frags.o hash.o input-file.o input-scrub.o listing.o
 literal.o
 macro.o messages.o output-file.o read.o remap.o sb.o stabs.o subsegs.o
 symbols.o write.o tc-i386.o obj-elf.o atof-ieee.o
 ../opcodes/.libs/libopcodes.a ../bfd/.libs/libbfd.a
 -L/home/x/src/binutils-gdb/zlib -lz ../libiberty/libiberty.a -ldl
 /usr/bin/ld: cannot find -lz


 This is with head as :  711a72d3d6f8cd3c3f408e718ff19aa4bfd2144e

 Did you try to compile directly in the src tree ?

>>>
>>> Yes, I did.  You need to add --disable-multilib,  and maybe
>>> --with-target-subdir=.
>>>
>>
>> As I said, if I add --disable-multilib and --with-target-subdir=.
>>
>> I run into the missing zlib error in gas shown above.
>>
>> Also, I don't think it's a good idea for gdb to require options to
>> compile in its source tree.
>>
>> Is there a good reason for this ?
>>
>

zlib can be built as a host and a target library.  We should
use AM_ENABLE_MULTILIB only when building for target.
This patch for zlib is borrowed from libbacktrace which is also
built as a host and a target library.  I will check it into binutils
tree.


-- 
H.J.
From 0753a9ac8a73328180cb06fb5357ee8c2c68641f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 31 Mar 2015 11:35:30 -0700
Subject: [PATCH 1/2] Use AM_ENABLE_MULTILIB only if with_target_subdir isn't
 empty

	* configure.ac (AM_ENABLE_MULTILIB): Use only if
	${with_target_subdir} isn't empty.
	* configure: Regenerated.
---
 zlib/configure    | 8 +++++---
 zlib/configure.ac | 4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/zlib/configure b/zlib/configure
index 1a9d854..8378857 100755
--- a/zlib/configure
+++ b/zlib/configure
@@ -2181,7 +2181,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
-# Default to --enable-multilib
+if test -n "${with_target_subdir}"; then
+  # Default to --enable-multilib
 # Check whether --enable-multilib was given.
 if test "${enable_multilib+set}" = set; then :
   enableval=$enable_multilib; case "$enableval" in
@@ -2218,6 +2219,7 @@ fi
 
 ac_config_commands="$ac_config_commands default-1"
 
+fi
 
 ac_aux_dir=
 for ac_dir in "$srcdir" "$srcdir/.." "$srcdir/../.."; do
@@ -10403,7 +10405,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10406 "configure"
+#line 10408 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -10509,7 +1051

Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8

2015-03-31 Thread David Edelsohn
On Tue, Mar 31, 2015 at 2:00 PM, Bill Schmidt
 wrote:
> Hi,
>
> David correctly pointed out offline that I used the wrong macro to test
> for efficient unaligned access.  Here's a corrected version, which still
> fixes PR65456 without causing regressions.  Sorry for the error!
>
> Thanks,
> Bill
>
> On Sun, 2015-03-29 at 12:42 -0500, Bill Schmidt wrote:
>> Hi,
>>
>> This is a follow-up to
>> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00103.html, which adds
>> support for faster unaligned vector memory accesses on POWER8.  As
>> pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456, there
>> was a piece missing here.  The target macro SLOW_UNALIGNED_ACCESS is
>> still evaluating to 1 for unaligned vector accesses on POWER8, which
>> causes some scalarization to occur during expand.  This version of the
>> patch fixes this as well.
>>
>> The only changes from before are the update to config/rs6000/rs6000.h,
>> and the new test case gcc.target/powerpc/pr65456.c.  Is this ok for
>> trunk after 5 branches, and backports to 4.8, 4.9, 5 thereafter?
>>
>> Thanks,
>> Bill
>
>
> [gcc]
>
> 2015-03-31  Bill Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_option_override_internal):  For
> VSX + POWER8, enable TARGET_ALLOW_MOVMISALIGN and
> TARGET_EFFICIENT_UNALIGNED_VSX if not selected by command line
> option.  However, for -mno-allow-movmisalign, be sure to disable
> TARGET_EFFICIENT_UNALIGNED_VSX to avoid an ICE.
> (rs6000_builtin_mask_for_load): Return 0 for targets with
> efficient unaligned VSX accesses so that the vectorizer will use
> direct unaligned loads.
> (rs6000_builtin_support_vector_misalignment): Always return true
> for targets with efficient unaligned VSX accesses.
> (rs6000_builtin_vectorization_cost): Cost of unaligned loads and
> stores on targets with efficient unaligned VSX accesses is almost
> always the same as the cost of an aligned load or store, so model
> it that way.
> * config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Evaluate to
> zero for unaligned vector accesses on POWER8.
> * config/rs6000/rs6000.opt (mefficient-unaligned-vector): New
> undocumented option.
>
> [gcc/testsuite]
>
> 2015-03-31  Bill Schmidt  
>
> * gcc.dg/vect/bb-slp-24.c: Exclude test for POWER8.
> * gcc.dg/vect/bb-slp-25.c: Likewise.
> * gcc.dg/vect/bb-slp-29.c: Likewise.
> * gcc.dg/vect/bb-slp-32.c: Replace vect_no_align with
> vect_no_align && { ! vect_hw_misalign }.
> * gcc.dg/vect/bb-slp-9.c: Likewise.
> * gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Exclude test for
> vect_hw_misalign.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Likewise.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust tests to
> account for POWER8, where peeling for alignment is not needed.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Replace
> vect_no_align with vect_no_align && { ! vect_hw_misalign }.
> * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
> * gcc.dg/vect/no-scevccp-outer-6-global.c: Likewise.
> * gcc.dg/vect/no-scevccp-outer-6.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-43.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-57.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-61.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
> * gcc.dg/vect/pr16105.c: Likewise.
> * gcc.dg/vect/pr20122.c: Likewise.
> * gcc.dg/vect/pr33804.c: Likewise.
> * gcc.dg/vect/pr33953.c: Likewise.
> * gcc.dg/vect/pr56787.c: Likewise.
> * gcc.dg/vect/pr58508.c: Likewise.
> * gcc.dg/vect/slp-25.c: Likewise.
> * gcc.dg/vect/vect-105-bit-array.c: Likewise.
> * gcc.dg/vect/vect-105.c: Likewise.
> * gcc.dg/vect/vect-27.c: Likewise.
> * gcc.dg/vect/vect-29.c: Likewise.
> * gcc.dg/vect/vect-33.c: Exclude unaligned access test for
> POWER8.
> * gcc.dg/vect/vect-42.c: Replace vect_no_align with vect_no_align
> && { ! vect_hw_misalign }.
> * gcc.dg/vect/vect-44.c: Likewise.
> * gcc.dg/vect/vect-48.c: Likewise.
> * gcc.dg/vect/vect-50.c: Likewise.
> * gcc.dg/vect/vect-52.c: Likewise.
> * gcc.dg/vect/vect-56.c: Likewise.
> * gcc.dg/vect/vect-60.c: Likewise.
> * gcc.dg/vect/vect-72.c: Likewise.
> * gcc.dg/vect/vect-75-big-array.c: Likewise.
> * gcc.dg/vect/vect-75.c: Likewise.
> * gcc.dg/vect/vect-77-alignchecks.c: Likewise.
> * gcc.dg/vect/vect-77-global.c: Likewise.
> * gcc.dg/vect/vect-78-alignchecks.c: Likewise.
> * gcc.dg/vect/vect-78-global.c: Likewise.
> * gcc.dg/vect/vect-93.c: 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Tue, Mar 31, 2015 at 19:10:36 +0300, Ilya Verbin wrote:
> Ok, thanks for the clarification!  Here is the new patch with variables.
> 
> Unfortunately I see 4 fails in make check-target-libgomp with PTX patch
> applied on top, but with disabled offloading to PTX.
> Julian, have you seen them?  All other tests passed with intelmic emul.
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c 
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> acc_on_device-1.c aborts here:
>   /* Offloaded.  */
> #pragma acc parallel
>   {
> if (acc_on_device (acc_device_none))
>   abort ();

And here is the next version, which fixes a potential deadlock in
GOMP_offload_unregister.  make check-target-libgomp also passes (but with the
PTX patch applied on top it still shows the failures mentioned above).
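
For readers skimming the mkoffload part of the patch below: the generated host
descriptor now pairs the existing constructor with a destructor, so an image
that registers itself at load time also unregisters itself at unload time.
A minimal sketch of that pairing follows.  The names demo_offload_register,
demo_offload_unregister, demo_offload_table and registered_images are
hypothetical stand-ins, not libgomp's real entry points, and the sketch
assumes a compiler that supports the GNU constructor/destructor attributes.

```c
/* Hypothetical counter standing in for libgomp's image bookkeeping.  */
static int registered_images;

/* Hypothetical stand-in for GOMP_offload_register.  */
static void
demo_offload_register (const void *table)
{
  (void) table;
  registered_images++;
}

/* Hypothetical stand-in for GOMP_offload_unregister.  */
static void
demo_offload_unregister (const void *table)
{
  (void) table;
  registered_images--;
}

/* Hypothetical stand-in for __OFFLOAD_TABLE__.  */
static const int demo_offload_table[1] = { 0 };

/* Runs before main, like init () in the generated descriptor.  */
__attribute__((constructor))
static void
demo_init (void)
{
  demo_offload_register (demo_offload_table);
}

/* Runs at exit or unload, like the new fini () in the descriptor.  */
__attribute__((destructor))
static void
demo_fini (void)
{
  demo_offload_unregister (demo_offload_table);
}
```

By the time main runs, the constructor has already registered the one image;
the destructor undoes that when the program or shared object goes away.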


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   "#ifdef __cplusplus\n"
   "extern \"C\"\n"
   "#endif\n"
-  "void GOMP_offload_register (void *, int, void *);\n\n"
+  "void GOMP_offload_register (void *, int, void *);\n"
+  "void GOMP_offload_unregister (void *, int, void *);\n\n"
 
   "__attribute__((constructor))\n"
   "static void\n"
   "init (void)\n"
   "{\n"
   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n"
+  "}\n\n", GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  "__attribute__((destructor))\n"
+  "static void\n"
+  "fini (void)\n"
+  "{\n"
+  "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n"
   "}\n", GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, voi

RE: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-03-31 Thread Evandro Menezes
Hi, Kyrill.

At this moment, it suffices to use the same scheduling as Cortex A57, but
more specific details are to be expected.

I couldn't check the build though, as my Arndale is strange today.  As soon
as it's healthy, I'll check it.  

I appreciate your feedback.

-- 
Evandro Menezes  Austin, TX


> -----Original Message-----
> From: Kyrill Tkachov [mailto:kyrylo.tkac...@arm.com]
> Sent: Tuesday, March 31, 2015 3:33
> To: Evandro Menezes; 'GCC Patches'
> Subject: Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor
> 
> Hi Evandro
> On 30/03/15 22:51, Evandro Menezes wrote:
> > The Samsung Exynos M1 implements the ARMv8 ISA and this patch adds
> > support for it through the -mcpu command-line option.
> >
> > The patch was checked on arm-unknown-linux-gnueabihf without new
> > failures.
> >
> > OK for trunk?
> >
> > -- Evandro Menezes Austin, TX
> >
> > 0001-ARM-Add-option-for-the-Samsung-Exynos-M1-core-for-AR.patch
> >
> >
> > diff --git a/gcc/config/arm/arm-cores.def
> > b/gcc/config/arm/arm-cores.def index b22ea7f..0710a38 100644
> > --- a/gcc/config/arm/arm-cores.def
> > +++ b/gcc/config/arm/arm-cores.def
> > @@ -168,6 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7,
> cortexa7, 7A,  FL_LDSCHED |
> >   ARM_CORE("cortex-a53",cortexa53, cortexa53,   8A, FL_LDSCHED |
> FL_CRC32, cortex_a53)
> >   ARM_CORE("cortex-a57",cortexa57, cortexa57,   8A, FL_LDSCHED |
> FL_CRC32, cortex_a57)
> >   ARM_CORE("cortex-a72",cortexa72, cortexa57,   8A, FL_LDSCHED |
> FL_CRC32, cortex_a57)
> > +ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, FL_LDSCHED |
> FL_CRC32, exynosm1)
> 
> There are two problems with this:
> * The 3rd field of ARM_CORE represents the scheduling identifier and without
> a separate pipeline description for exynosm1 this will just use the
> generic_sched scheduler which performs quite poorly on modern cores. Would
> you prefer to reuse a pipeline description from one of the pre-existing ones?
> Look for example at the cortex-a72 definition:
> ARM_CORE("cortex-a72",cortexa72, cortexa57,  <...snip>
> here the cortexa57 means 'make scheduling decisions for cortexa57'.
> 
> * The final field in ARM_CORE specifies the tuning struct to be used for this
> core.
> This should be defined in arm.c and have the form 'arm_<name>_tune', so for
> your case it should be arm_exynosm1_tune. This isn't defined in your patch,
> so it won't compile without that. You can write a custom tuning struct
> yourself, or reuse a tuning struct for one of the existing cores, if you'd
> like.
> 
> Also, you should add exynosm1 to the switch statement in arm_issue_rate to
> specify the issue rate. I have a patch for next stage1 that should refactor
> it all into the tuning structs
> (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02706.html) but until that
> goes in, you should fill in the switch statement there.
> 
> Thanks,
> Kyrill
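
To make the two points above concrete, here is a minimal sketch of how a
core list of this shape can be expanded into a table, with the scheduling
field and the tuning-struct field kept separate.  It is a simplified
illustration, not GCC's actual definitions: the FLAGS field is dropped, and
struct tune_params, struct core_entry, DEMO_CORES, demo_cores and find_core
are hypothetical names with made-up contents.  It assumes exynos-m1 reuses
the Cortex-A57 scheduling and tuning, as discussed in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced tuning struct.  */
struct tune_params
{
  const char *label;
  int issue_rate;   /* Illustrative value only.  */
};

/* One tuning struct, named after the pattern arm_<name>_tune.  */
static const struct tune_params arm_cortex_a57_tune = { "cortex_a57", 3 };

/* Hypothetical core list: NAME, IDENT, SCHED, ARCH, COSTS.  */
#define DEMO_CORES \
  ARM_CORE ("cortex-a57", cortexa57, cortexa57, 8A, cortex_a57) \
  ARM_CORE ("cortex-a72", cortexa72, cortexa57, 8A, cortex_a57) \
  ARM_CORE ("exynos-m1",  exynosm1,  cortexa57, 8A, cortex_a57)

struct core_entry
{
  const char *name;                 /* Value accepted by -mcpu.  */
  const char *sched;                /* Pipeline description to schedule for.  */
  const char *arch;                 /* Architecture level.  */
  const struct tune_params *tune;   /* Tuning struct picked by COSTS.  */
};

/* The last field is pasted into the symbol arm_<COSTS>_tune, so a core
   whose COSTS names a struct that does not exist fails to compile.  */
#define ARM_CORE(NAME, IDENT, SCHED, ARCH, COSTS) \
  { NAME, #SCHED, #ARCH, &arm_##COSTS##_tune },
static const struct core_entry demo_cores[] = { DEMO_CORES };
#undef ARM_CORE

/* Look a core up by its -mcpu name; returns NULL if it is unknown.  */
static const struct core_entry *
find_core (const char *name)
{
  for (size_t i = 0; i < sizeof demo_cores / sizeof demo_cores[0]; i++)
    if (strcmp (demo_cores[i].name, name) == 0)
      return &demo_cores[i];
  return NULL;
}
```

In this sketch, changing the last field of the exynos-m1 entry to exynosm1
without also defining arm_exynosm1_tune produces an undeclared-identifier
error, which is the compile failure described above.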



0001-ARM-Add-option-for-the-Samsung-Exynos-M1-core.patch
Description: Binary data


Re: [PATCH] [AArch64] Add support for the Samsung Exynos M1 processor

2015-03-31 Thread 박준모
Hi, Evandro.

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org 
> [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Evandro Menezes
> Sent: Tuesday, March 31, 2015 6:51 AM
> To: 'GCC Patches'
> Subject: [PATCH] [AArch64] Add support for the Samsung Exynos M1 
> processor
> 
> The Samsung Exynos M1 implements the ARMv8 ISA and this patch adds 
> support for it through the -mcpu command-line option.
> 
> The patch was checked on aarch64-unknown-linux-gnu without new failures.
> 
> OK for trunk?
> 
> --
> Evandro Menezes  Austin, TX
> 


Could you modify this patch the same way as the one for ARM?
I mean using cortex-a57's pipeline description.

Thanks
Junmo Park.



Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
> +/* Similar to gomp_fatal, but release mutexes before.  */
> +
> +static void
> +gomp_fatal_unlock (const char *fmt, ...)
> +{
> +  int i;
> +  va_list list;
> +
> +  for (i = 0; i < num_devices; i++)
> +gomp_mutex_unlock (&devices[i].lock);

This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
locked is undefined behavior.
You really should unlock it in the caller which should be aware which 0/1/2
locks it holds.
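
A minimal sketch of the caller-side approach, using plain pthread mutexes
rather than libgomp's gomp_mutex_t.  Everything here is hypothetical
(struct device, devices, register_lock, demo_fatal, unregister_image);
demo_fatal only records the message so the sketch can be exercised, whereas
gomp_fatal does not return.  The point is that the function which took the
locks releases exactly those locks before reporting the error.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DEVICES 2

/* Hypothetical per-device state guarded by its own lock.  */
struct device
{
  pthread_mutex_t lock;
  bool has_image;
};

static struct device devices[NUM_DEVICES] = {
  { PTHREAD_MUTEX_INITIALIZER, false },
  { PTHREAD_MUTEX_INITIALIZER, false },
};
static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for gomp_fatal: records the message instead of
   terminating the process.  */
static const char *last_error;

static void
demo_fatal (const char *msg)
{
  last_error = msg;
}

/* Returns 0 on success and -1 on error.  This function knows it holds
   register_lock and devices[dev].lock, so it unlocks those two and no
   others before reporting a failure.  */
static int
unregister_image (int dev)
{
  pthread_mutex_lock (&register_lock);
  pthread_mutex_lock (&devices[dev].lock);

  if (!devices[dev].has_image)
    {
      /* Release only what this function acquired, then report.  */
      pthread_mutex_unlock (&devices[dev].lock);
      pthread_mutex_unlock (&register_lock);
      demo_fatal ("image was not registered on this device");
      return -1;
    }

  devices[dev].has_image = false;
  pthread_mutex_unlock (&devices[dev].lock);
  pthread_mutex_unlock (&register_lock);
  return 0;
}
```

No loop over all devices is needed, and no mutex is ever unlocked by a
thread that does not hold it.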

> +  gomp_mutex_unlock (&register_lock);

Jakub


Fix ICE with thunks taking scalars passed by reference

2015-03-31 Thread Jan Hubicka
Hi,
this patch fixes an ICE in the attached testcase on mingw32.  The problem is
that under the Windows ABI long double is passed and returned by reference, and
while expanding the thunk tail call we get lost because we turn the parameter
into an SSA name and later need its address to pass it further.

The patch extends hack https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00423.html
to handle not only non-registers but also registers.

Bootstrapped/regtested ppc64-linux. OK?

The bug reproduces with ICF, but I suppose it will turn into an ICE on any C++
covariant thunk taking a scalar passed by reference.

long double func1 (long double x)
{
  if (x > 0.0)
    return x;
  else if (x < 0.0)
    return -x;
  else
    return x;
}

long double func2 (long double x)
{
  if (x > 0.0)
    return x;
  else if (x < 0.0)
    return -x;
  else
    return x;
}

PR ipa/65540
* calls.c (initialize_argument_information): When producing tail
call also turn SSA_NAMES passed by references to original PARM_DECLs
Index: calls.c
===
--- calls.c (revision 221805)
+++ calls.c (working copy)
@@ -1321,6 +1321,15 @@ initialize_argument_information (int num
  && TREE_CODE (base) != SSA_NAME
  && (!DECL_P (base) || MEM_P (DECL_RTL (base)
{
+ /* We may have turned the parameter value into an SSA name.
+Go back to the original parameter so we can take the
+address.  */
+ if (TREE_CODE (args[i].tree_value) == SSA_NAME)
+   {
+ gcc_assert (SSA_NAME_IS_DEFAULT_DEF (args[i].tree_value));
+ args[i].tree_value = SSA_NAME_VAR (args[i].tree_value);
+ gcc_assert (TREE_CODE (args[i].tree_value) == PARM_DECL);
+   }
  /* Argument setup code may have copied the value to register.  We
 revert that optimization now because the tail call code must
 use the original location.  */