Re: [PATCH][ARM] Use %wd format for lane printing in bounds_check

2015-08-19 Thread Ramana Radhakrishnan


On 14/08/15 10:56, Kyrill Tkachov wrote:
> Hi all,
> 
> I'm seeing these warnings when building arm.c:
> warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘long int’ [-Wformat=]
> 
> These appear in the bounds_check function when it tries to print out 
> HOST_WIDE_INTs using the %lld format.
> I believe the right way to print these is with %wd, which is what the 
> equivalent aarch64 function does.
> 
> With this patch I don't see the warnings any more.
> Bootstrapped and tested on arm.
> 
> Ok for trunk?

OK - I'd consider this sort of thing as obvious.

ramana

> 
> Thanks,
> Kyrill
> 
> 2015-08-14  Kyrylo Tkachov  
> 
> * config/arm/arm.c (bounds_check): Use %wd print format
> for HOST_WIDE_INT arguments.
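
For illustration only (this is not part of the patch): the mismatch behind
the warning can be reproduced outside GCC. The sketch below is a standalone
analogy. hwi_t and HWI_PRINT_DEC are hypothetical stand-ins for
HOST_WIDE_INT and for a format that always matches it, and plain snprintf is
used because %wd is only understood by GCC's own diagnostic routines.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for HOST_WIDE_INT: a 64-bit integer whose
   underlying type (long or long long) depends on the host.  */
typedef int64_t hwi_t;

/* Hypothetical format macro that always matches hwi_t.  Hard-coding
   "%lld" instead is what triggers -Wformat on hosts where the 64-bit
   type is plain long.  */
#define HWI_PRINT_DEC "%" PRId64

/* Format a lane-out-of-range message into BUF and return the length
   reported by snprintf.  */
int
format_lane_error (char *buf, size_t len, hwi_t lane, hwi_t low, hwi_t high)
{
  return snprintf (buf, len,
                   "lane " HWI_PRINT_DEC " out of range " HWI_PRINT_DEC
                   " - " HWI_PRINT_DEC, lane, low, high);
}
```

The point of the macro is the same as the point of %wd: the format is tied
to the integer type in one place, so call sites never guess its width.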


Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Paolo Carlini

Hi,

On 08/19/2015 08:50 AM, FX wrote:

>> .. I think this improvement means that in principle we could revert what we
>> committed for libstdc++/58625, thus increasing a little the consistency wrt the
>> other classification facilities in c_global/cmath (and c_std/cmath). Not sure
>> it's worth it.
>
> Can’t comment on whether it’s worth doing, but yes, with the type-generic
> __builtin_signbit you can revert that patch with no change to the generated
> code.

Let's do it then! I'll post the trivial patchlet later today before
committing.


Paolo.


RE: [Patch, MIPS] MIPS specific optimization for o32 ABI

2015-08-19 Thread Matthew Fortune
Steve Ellcey  writes:
> On Thu, 2015-08-13 at 02:14 -0700, Matthew Fortune wrote:
> > Hi Steve,
> >
> > Overall, I don't think these optimizations are ready to include. In 
> > principle
> > the idea looks good but it is done at the wrong point in the compiler in my
> > opinion.
> >
> > The biggest concern I have is that the analysis should be possible at (or
> > prior to) the point where the prologue/epilogue are expanded. I don't think
> > it is safe enough to post-process the code and delete the stack allocation.
> 
> I think that to do this, what we would have to do is introduce a new
> pass at the tree level (just before expanding to rtl) where we could do
> the analysis of whether or not the outgoing argument area is needed or
> not.  Then we could use that info during expand_prologue to reset
> frame->args_size if the space is not needed.

One thought was that this problem seems to fit into the category of ipa-ra and
while I don't know how that is implemented or if it is early enough... it may
be worth seeing if extra information can be calculated there.

> > There is at least one other optimization idea that competes with this one
> > which is to allow LRA to use the argument save area for arbitrary spills 
> > when
> > it is not used for spilling arguments or to prepare varargs. I think we need
> > to at least consider how the frame header removal will interact with such
> > an optimization.
> 
> I am not sure how this would work.  It seems better to just not allocate
> the space if it is not needed and then LRA can separately allocate
> whatever it needs for its own use (if any).  I'll add Robert to the cc
> list on this in case he has any ideas since he did the LRA
> implementation for MIPS.

I think between these two there will always be one optimization that has to
come first and win. If we decide prior to expansion whether an outgoing argument
area is needed (and therefore also decide if an incoming argument area is
available in any given function) then we will of course preclude any use of
this area for spilling/locals in the callee. The saving when re-using this
area is that the callee doesn't have to do stack allocation which could be
a performance win if called in a loop. Removing the stack allocation from the
caller is not as big of a win.

Perhaps balancing the two optimizations (if/when we do the LRA one) can be
fit in later without too much trouble.

Thanks,
Matthew


Re: [PATCH] ivopts costs debug

2015-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2015 at 10:45:42AM +0800, Bin.Cheng wrote:
> I ran into back-end address cost issue before and this should be
> useful in such cases.  Though there are a lot of dumps, it would be
> better to classify it into existing dump option (TDF_DETAILS?) and
> discard the use of macro.

But TDF_DETAILS is enabled pretty much always, and this costs dump
isn't to debug ivopts _itself_.  I got the idea from lower-subreg.c,
which also uses debug macros like this (and is another place where
bad costs tend to show up, btw).

> Also the address cost will be tuned/dumped
> later, we should differentiate between them by emphasizing this part
> of dump is the original cost from back-end.

Yeah I should probably print some header, also say e.g. what machine
mode some table is for.

> > It also shows that the LAST_VIRTUAL_REGISTER trickery ivopts does
> > not work (legitimize_address can create new registers, so now
> > a) we have new registers anyway, and b) we use some for multiple
> > purposes.  Oops).
> Yes, that makes seq dump a little weird.

It can even make the result wrong -- e.g. (plus (reg 155) (reg 155))
could well be costed differently than (plus (reg 155) (reg 159)).


Segher


Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Andreas Schwab
FX  writes:

> @@ -80,6 +80,24 @@ foo_1 (float f, double d, long double ld
>if (__builtin_finitel (ld) != res_isfin)
>  __builtin_abort ();
>  
> +  /* Sign bit of zeros and nans is not preserved in unsafe math mode.  */
> +#ifdef UNSAFE
> +  if (!res_isnan && d != 0)
> +#endif

Why only in unsafe mode?  Isn't the sign bit of NaN always unreliable?

Andreas.
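
As a side note on what the test is probing: the sign bit of a zero cannot be
seen through comparison, only through signbit, and a NaN's sign is only well
defined when it has been set explicitly. The following is a minimal sketch
under those assumptions; the helper names are hypothetical and it is not
taken from the testcase.

```c
#include <math.h>

/* Returns 1 when X has its sign bit set, 0 otherwise.  */
int
sign_is_set (double x)
{
  return signbit (x) != 0;
}

/* Returns 1 when A and B compare equal but differ in sign bit,
   which can only happen for +0.0 versus -0.0.  */
int
equal_but_sign_differs (double a, double b)
{
  return a == b && sign_is_set (a) != sign_is_set (b);
}

/* Returns a NaN whose sign bit has been set explicitly with copysign.
   Ordinary arithmetic is not required to preserve a NaN's sign, so
   this is the only NaN case the sketch relies on.  */
double
negative_nan (void)
{
  return copysign (NAN, -1.0);
}
```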

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-19 Thread Yao Qi

On 18/08/15 20:32, Aldy Hernandez wrote:

>> Aldyh, what other testing did you usually do for changes?  Run
>> the gdb testsuite against the new compiler?  Anything else?
>
> gdb testsuite, and make sure you test GCC with
> --enable-languages=all,go,ada, though the latter is mostly useful while
> you iron out bugs initially.  I found that ultimately, the best test was
> C++.

FWIW, it would be nice to run gdb testsuite with different dwarf
versions (3, 4, and 5).

--
Yao (齐尧)


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Biener
On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood  
>> wrote:
>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>  wrote:
>>>> > Hi Richard,
>>>> >
>>>> > Thanks for the reply. I'd chosen to add new expressions as this seemed
>>>> > more
>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>> > addition it
>>>> > would seem to provide more opportunities for optimisation than a
>>>> > target-specific
>>>> > builtin implementation would. I accept that optimisation opportunities
>>>> > will
>>>> > be more limited for strict math compilation, but that it was still
>>>> > worth having
>>>> > them. Also, if we did map it to builtins then the scalar version would go
>>>> > through the optabs and the vector version would go through the
>>>> > target's builtin
>>>> > expansion, which doesn't seem very consistent.
>>>>
>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
>>> I thought for this particular case associativity wasn't an issue?
>>> We're not doing any
>>> reductions here, just simply performing max/min operations on each
>>> pair of elements
>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>> ensure that for
>>> each pair of elements if one element is a NaN we return the other one.
>>
>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>> tree code is the best thing to do here.  You can add fmin/max optabs
>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>> builtin for the vectorized variant.
>>
>> The reason I am pushing against a new tree code is that we'd have an
>> awful lot of similar codes when pushing other flag related IL
>> specialities to actual IL constructs.  And we still need to find a
>> consistent way to do that.
>
> In this case though the new code is really the "native" min/max operation
> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
> a bit unfortunate that the non-strict min/max fp operation got mapped
> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
> the flag-related modification.  The STRICT_* prefix is forced by that and
> might make it seem like more of a special case than it really is.

In some sense.  But the "strict" version already has a builtin (just no
special expander in builtins.c).  We usually don't add 1:1 tree codes
for existing builtins (why have builtins at all then?).

> If you're still not convinced, how about an internal function instead
> of a built-in function, so that we can continue to use optabs for all
> cases?  I'd really like to avoid forcing such a generic concept down to
> target-specific builtins with target-specific expansion code, especially
> when the same concept is exposed by target-independent code for scalars.

The target builtin is for the vectorized variant - not all targets might have
that and we'd need to query the target about this.  So using a IFN would
mean adding a target hook for that query.

> TBH though I'm not sure why an internal_fn value (or a target-specific
> builtin enum value) is worse than a tree-code value, unless the limit
> of the tree_code bitfield is in sight (maybe it is).

I think tree_code is 64bits now.

Richard.

> Thanks,
> Richard
>
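
To make the NaN point in this thread concrete, here is a small sketch
contrasting a comparison-based minimum with the behaviour David describes
("if one element is a NaN we return the other one"), applied element by
element with no reduction. The function names are hypothetical, and
strict_min is a hand-written stand-in for fmin's NaN handling, not GCC's
expansion of it.

```c
#include <math.h>
#include <stddef.h>

/* Comparison-based minimum.  Any comparison with a NaN is false, so
   B is returned whenever either operand is a NaN and the result
   depends on operand order.  */
double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

/* NaN-aware minimum: if exactly one operand is a NaN, return the
   other one.  */
double
strict_min (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return a < b ? a : b;
}

/* Element-wise minimum of two arrays.  Each output depends on one
   pair of inputs only, so no reassociation is involved.  */
void
elementwise_strict_min (const double *a, const double *b, double *out,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = strict_min (a[i], b[i]);
}
```

With these definitions naive_min (1.0, NAN) yields a NaN while
naive_min (NAN, 1.0) yields 1.0, whereas strict_min yields 1.0 either way.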


Re: [PATCH] Fix middle-end/67133, part 1

2015-08-19 Thread Richard Biener
On Tue, Aug 18, 2015 at 9:49 PM, Marek Polacek  wrote:
> On Tue, Aug 18, 2015 at 10:45:21AM +0200, Richard Biener wrote:
>> On Mon, Aug 17, 2015 at 7:31 PM, Jeff Law  wrote:
>> > But in walking through all that, I think I've stumbled on a simpler
>> > solution.  Specifically do as little as possible and let the standard
>> > mechanisms clean things up :-)
>> >
>> > 1. Delete the code that removes instructions after the trap.
>> >
>> > 2. Split the block immediately after the trap and remove the edge
>> >from the original block (with the trap) to the new block.
>>
>> cfg-cleanup will do that for you if you have a not returning stmt ending
>> the previous block.
>
> The following patch hopefully does what's outlined above.
> Arguably I should have renamed the insert_trap_and_remove_trailing_statements
> to something more descriptive, e.g. insert_trap_and_split_block.  Your
> call.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Looks good to me.

Richard.

> 2015-08-18  Marek Polacek  
>
> PR middle-end/67133
> * gimple-ssa-isolate-paths.c
> (insert_trap_and_remove_trailing_statements): Rename to ...
> (insert_trap): ... this.  Don't remove trailing statements; split
> block instead.
> (find_explicit_erroneous_behaviour): Don't remove all outgoing edges.
>
> * g++.dg/torture/pr67133.C: New test.
>
> diff --git gcc/gimple-ssa-isolate-paths.c gcc/gimple-ssa-isolate-paths.c
> index 6f84f85..ca2322d 100644
> --- gcc/gimple-ssa-isolate-paths.c
> +++ gcc/gimple-ssa-isolate-paths.c
> @@ -66,10 +66,10 @@ check_loadstore (gimple stmt, tree op, tree, void *data)
>return false;
>  }
>
> -/* Insert a trap after SI and remove SI and all statements after the trap.  */
> +/* Insert a trap after SI and split the block after the trap.  */
>
>  static void
> -insert_trap_and_remove_trailing_statements (gimple_stmt_iterator *si_p, tree op)
> +insert_trap (gimple_stmt_iterator *si_p, tree op)
>  {
>/* We want the NULL pointer dereference to actually occur so that
>   code that wishes to catch the signal can do so.
> @@ -115,18 +115,8 @@ insert_trap_and_remove_trailing_statements (gimple_stmt_iterator *si_p, tree op)
>else
>  gsi_insert_before (si_p, seq, GSI_NEW_STMT);
>
> -  /* We must remove statements from the end of the block so that we
> - never reference a released SSA_NAME.  */
> -  basic_block bb = gimple_bb (gsi_stmt (*si_p));
> -  for (gimple_stmt_iterator si = gsi_last_bb (bb);
> -   gsi_stmt (si) != gsi_stmt (*si_p);
> -   si = gsi_last_bb (bb))
> -{
> -  stmt = gsi_stmt (si);
> -  unlink_stmt_vdef (stmt);
> -  gsi_remove (&si, true);
> -  release_defs (stmt);
> -}
> +  split_block (gimple_bb (new_stmt), new_stmt);
> +  *si_p = gsi_for_stmt (stmt);
>  }
>
>  /* BB when reached via incoming edge E will exhibit undefined behaviour
> @@ -215,7 +205,7 @@ isolate_path (basic_block bb, basic_block duplicate,
>   update_stmt (ret);
> }
>else
> -   insert_trap_and_remove_trailing_statements (&si2, op);
> +   insert_trap (&si2, op);
>  }
>
>return duplicate;
> @@ -422,14 +412,8 @@ find_explicit_erroneous_behaviour (void)
> continue;
> }
>
> - insert_trap_and_remove_trailing_statements (&si,
> - null_pointer_node);
> -
> - /* And finally, remove all outgoing edges from BB.  */
> - edge e;
> - for (edge_iterator ei = ei_start (bb->succs);
> -  (e = ei_safe_edge (ei)); )
> -   remove_edge (e);
> + insert_trap (&si, null_pointer_node);
> + bb = gimple_bb (gsi_stmt (si));
>
>   /* Ignore any more operands on this statement and
>  continue the statement iterator (which should
> diff --git gcc/testsuite/g++.dg/torture/pr67133.C gcc/testsuite/g++.dg/torture/pr67133.C
> index e69de29..0f23572 100644
> --- gcc/testsuite/g++.dg/torture/pr67133.C
> +++ gcc/testsuite/g++.dg/torture/pr67133.C
> @@ -0,0 +1,46 @@
> +// { dg-do compile }
> +// { dg-additional-options "-fisolate-erroneous-paths-attribute" }
> +
> +class A;
> +struct B {
> +  typedef A type;
> +};
> +template  struct I : B {};
> +class C {
> +public:
> +  C(char *);
> +  int size();
> +};
> +template  struct D;
> +template > class F {
> +  class G {
> +template  static _Tp *__test();
> +typedef int _Del;
> +
> +  public:
> +typedef decltype(__test<_Del>()) type;
> +  };
> +
> +public:
> +  typename I<_Tp>::type operator*() {
> +typename G::type a = 0;
> +return *a;
> +  }
> +};
> +class H {
> +  F Out;
> +  H();
> +};
> +void fn1(void *, void *, int) __attribute__((__nonnull__));
> +class A {
> +  int OutBufEnd, OutBufCur;
> +
> +public:
> +  void operator<<(C p1) {
> +int b, c = p1.size();
> +if (OutBufEnd)
> +  fn1(&OutBufCur, &b, c);
> +  }
> +};
> +

Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>  wrote:
>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>  wrote:
>>>>> > Hi Richard,
>>>>> >
>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>> > seemed more
>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>> > addition it
>>>>> > would seem to provide more opportunities for optimisation than a
>>>>> > target-specific
>>>>> > builtin implementation would. I accept that optimisation
>>>>> > opportunities will
>>>>> > be more limited for strict math compilation, but that it was still
>>>>> > worth having
>>>>> > them. Also, if we did map it to builtins then the scalar version would
>>>>> > go
>>>>> > through the optabs and the vector version would go through the
>>>>> > target's builtin
>>>>> > expansion, which doesn't seem very consistent.
>>>>>
>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>> correct?)
>>>> I thought for this particular case associativity wasn't an issue?
>>>> We're not doing any
>>>> reductions here, just simply performing max/min operations on each
>>>> pair of elements
>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>> ensure that for
>>>> each pair of elements if one element is a NaN we return the other one.
>>>
>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>> builtin for the vectorized variant.
>>>
>>> The reason I am pushing against a new tree code is that we'd have an
>>> awful lot of similar codes when pushing other flag related IL
>>> specialities to actual IL constructs.  And we still need to find a
>>> consistent way to do that.
>>
>> In this case though the new code is really the "native" min/max operation
>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>> a bit unfortunate that the non-strict min/max fp operation got mapped
>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>> the flag-related modification.  The STRICT_* prefix is forced by that and
>> might make it seem like more of a special case than it really is.
>
> In some sense.  But the "strict" version already has a builtin (just no
> special expander in builtins.c).  We usually don't add 1:1 tree codes
> for existing builtins (why have builtins at all then?).

We still need the builtin to match the C function (and to allow direct
calls to __builtin_fmin, etc., which are occasionally useful).

>> If you're still not convinced, how about an internal function instead
>> of a built-in function, so that we can continue to use optabs for all
>> cases?  I'd really like to avoid forcing such a generic concept down to
>> target-specific builtins with target-specific expansion code, especially
>> when the same concept is exposed by target-independent code for scalars.
>
> The target builtin is for the vectorized variant - not all targets might have
> that and we'd need to query the target about this.  So using a IFN would
> mean adding a target hook for that query.

No, the idea is that if we have a tree code or an internal function, the
decision about whether we have target support is based on a query of the
optabs (just like it is for scalar, and for other vectorisable tree codes).
No new hooks are needed.

The patch checked for target support that way.

> > TBH though I'm not sure why an internal_fn value (or a target-specific
> > builtin enum value) is worse than a tree-code value, unless the limit
> > of the tree_code bitfield is in sight (maybe it is).
>
> I think tree_code is 64bits now.

Even better :-)

Thanks,
Richard



Re: Move some flag_unsafe_math_optimizations using simplify and match

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 6:53 AM, Hurugalawadi, Naveen
 wrote:
> Hi Richard,
>
> Thanks very much for your review and comments.
>
>>> Can you point me to which patterns exhibit this behavior?
>
> root(x)*root(y) as root(x*y)
> expN(x)*expN(y) as expN(x+y)
> pow(x,y)*pow(x,z) as pow(x,y+z)
> x/expN(y) into x*expN(-y)
>
> Long Double and Float variants FAIL with segmentation fault with these
> patterns in match.pd file for AArch64.

Presumably the backend tells GCC the builtins are not available.

> However, most of these work as expected with X86_64.
>
> I had those implemented as per the fold-const.c which can be found at:-
> https://gcc.gnu.org/ml/gcc/2015-08/msg00021.html
>
>>>  (mult (SQRT@1 @0) @1)
>
> Sorry for the typo in there.
> However, the current pattern does not generate the optimized pattern as 
> expected.
> x_2 = ABS_EXPR ;
> return x_2;

I see.  But I can't really help without a testcase that I can use to have a look
(same for the above issue with the segfaults).

>>> use (rdiv (POW @0 REAL_CST@1) @0)
>
> It generates ICE with the above modification
> internal compiler error: tree check: expected ssa_name, have var_decl in 
> simplify_builtin_call, at tree-ssa-forwprop.c:1259

Hmm.  Indeed, replacing a non-call with a call isn't very well
supported yet.  A quick "fix" to
avoid this ICE would disable the pattern for -ferrno-math.  If you
open a bugreport with the
pattern and a testcase I'm going to have a closer look.

> Also, can you please explain to me the significance and use of ":s"?
> I could understand it a bit but am still confused about its use in match.pd

":s" is important so that when we have, say

 tem = pow (x, 4.5);
 tem2 = tem / x;
 foo (tem);

thus the result of 'pow (x, 4.5)' is used in the pattern we match and also
elsewhere, we avoid turning this into

 tem = pow (x, 4.5);
 tem2 = pow (x, 3.5);
 foo (tem);

which is of course more expensive than doing the division.  Thus it makes sure
that parts of the patterns we don't use in the result are later removed as dead.

Richard.

> Thanks,
> Naveen
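
The cost argument behind ":s" can be sketched with a call counter. This is
an illustration under loud assumptions, not match.pd code: counted_pow is a
hypothetical stand-in for pow that takes an integer exponent (5 and 4 in
place of 4.5 and 3.5) so that the example needs no math library, and the
tem out-parameter models the second use in foo (tem).

```c
/* Number of calls made to counted_pow since the last reset.  */
static int pow_calls;

/* Hypothetical stand-in for an expensive pow call: integer exponent,
   computed by repeated multiplication, with every call counted.  */
static double
counted_pow (double x, int n)
{
  double r = 1.0;
  pow_calls++;
  for (int i = 0; i < n; i++)
    r *= x;
  return r;
}

void
reset_pow_calls (void)
{
  pow_calls = 0;
}

int
get_pow_calls (void)
{
  return pow_calls;
}

/* Original form: one pow call and one division.  TEM is also handed
   back to the caller, modelling the extra use of the pow result.  */
double
divide_form (double x, double *tem)
{
  *tem = counted_pow (x, 5);
  return *tem / x;
}

/* Rewritten form: the division is gone, but TEM is still needed by
   the caller, so there are now two pow calls instead of one.  */
double
rewritten_form (double x, double *tem)
{
  *tem = counted_pow (x, 5);
  return counted_pow (x, 4);
}
```

Both forms return the same value, but when the first pow result has another
use the rewritten form pays for a second pow call, which is the situation
the single-use check is meant to avoid.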


Re: [PR64164] drop copyrename, integrate into expand

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 8:45 AM, Alexandre Oliva  wrote:
> On Aug 18, 2015, Alexandre Oliva  wrote:
>
>>>> On Aug 17, 2015, Christophe Lyon  wrote:
>>>>> Since this was committed (r226901), I can see that the compiler build
>>>>> fails for armeb targets, when building libgcc:
>
>> This patch fixes this particular case.  I'll also add this configuration
>> to the cross build tests I'm going to rerun shortly, before submitting a
>> followup formally, to see whether other non-MEM mems need to be handled
>> explicitly.
>
> On Aug 17, 2015, Andreas Schwab  wrote:
>
>> Andreas Schwab  writes:
>
>>> Alexandre Oliva  writes:
>>>
>>>> Would you be so kind as to give it a spin on a m68k native?  TIA,
>>>
>>> I tried it on ia64, and it falls flat on the floor.
>
>> It fixes the m68k failures, though.
>
> On Aug 17, 2015, Alexandre Oliva  wrote:
>
>>> I tried it on ia64, and it falls flat on the floor.
>
>> Doh, I see a logic flaw in the patch I posted.
>
> There were other shortcomings in the snippets I posted before, revealed
> by testing on various other targets: remaining BLKmode asserts,
> failure to deal with parms without a default def and split complex args
> with an unassigned stack address.  This patch fixes them all.
>
> It was regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu,
> ppc64el-linux-gnu, and further tested with a compile-only 'make all' on
> a binutils+gcc+newlib tree on all tens of cross targets mentioned
> before, plus the armeb configuration Christophe mentioned.
>
> Ok to install?

Ok.

Thanks,
Richard.

>
> [PR64164] fix regressions reported on m68k and armeb
>
> From: Alexandre Oliva 
>
> Defer stack slot address assignment for all parms that can't live in
> pseudos, and accept pseudos assignments in assign_param_setup_block.
>
> for  gcc/ChangeLog
>
> PR rtl-optimization/64164
> * cfgexpand.c (parm_maybe_byref_p): Renamed to...
> (parm_in_stack_slot_p): ... this.  Disregard mode, what
> matters is whether the parm will live in a pseudo or a stack
> slot.
> (expand_one_ssa_partition): Deal with params without a default
> def.  Disregard mode.
> * cfgexpand.h: Renamed function declaration.
> * tree-ssa-coalesce.c: Adjust.
> * function.c (split_complex_args): Allocate stack slot for
> unassigned parms before splitting.
> (parm_in_unassigned_mem_p): New.  Use it instead of
> parm_maybe_byref_p throughout this file.
> (assign_parm_setup_block): Use it.  Accept pseudos in the
> expand-assigned rtl.
> (assign_parm_setup_reg): Drop BLKmode requirement.
> (assign_parm_setup_stack): Allocate and fill in the address of
> unassigned MEM parms.
> ---
>  gcc/cfgexpand.c |   44 ++--
>  gcc/cfgexpand.h |2 +
>  gcc/function.c  |   74 
> ---
>  gcc/tree-ssa-coalesce.c |4 +--
>  4 files changed, 100 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 0bc20f6..d567a87 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -172,17 +172,23 @@ leader_merge (tree cur, tree next)
>return cur;
>  }
>
> -/* Return true if VAR is a PARM_DECL or a RESULT_DECL of type BLKmode.
> +/* Return true if VAR is a PARM_DECL or a RESULT_DECL that ought to be
> +   assigned to a stack slot.  We can't have expand_one_ssa_partition
> +   choose their address: the pseudo holding the address would be set
> +   up too late for assign_params to copy the parameter if needed.
> +
> Such parameters are likely passed as a pointer to the value, rather
> than as a value, and so we must not coalesce them, nor allocate
> stack space for them before determining the calling conventions for
> -   them.  For their SSA_NAMEs, expand_one_ssa_partition emits RTL as
> -   MEMs with pc_rtx as the address, and then it replaces the pc_rtx
> -   with NULL so as to make sure the MEM is not used before it is
> -   adjusted in assign_parm_setup_reg.  */
> +   them.
> +
> +   For their SSA_NAMEs, expand_one_ssa_partition emits RTL as MEMs
> +   with pc_rtx as the address, and then it replaces the pc_rtx with
> +   NULL so as to make sure the MEM is not used before it is adjusted
> +   in assign_parm_setup_reg.  */
>
>  bool
> -parm_maybe_byref_p (tree var)
> +parm_in_stack_slot_p (tree var)
>  {
>if (!var || VAR_P (var))
>  return false;
> @@ -190,7 +196,7 @@ parm_maybe_byref_p (tree var)
>gcc_assert (TREE_CODE (var) == PARM_DECL
>   || TREE_CODE (var) == RESULT_DECL);
>
> -  return TYPE_MODE (TREE_TYPE (var)) == BLKmode;
> +  return !use_register_for_decl (var);
>  }
>
>  /* Return the partition of the default SSA_DEF for decl VAR.  */
> @@ -1343,17 +1349,35 @@ expand_one_ssa_partition (tree var)
>
>if (!use_register_for_decl (var))
>  {
> -  if (parm_maybe_byref_p (SSA_NAME_VAR (var))
> - && ssa_de

Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>  wrote:
>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>  wrote:
>>>>>> > Hi Richard,
>>>>>> >
>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>> > seemed more
>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>> > addition it
>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>> > target-specific
>>>>>> > builtin implementation would. I accept that optimisation
>>>>>> > opportunities will
>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>> > worth having
>>>>>> > them. Also, if we did map it to builtins then the scalar version would
>>>>>> > go
>>>>>> > through the optabs and the vector version would go through the
>>>>>> > target's builtin
>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>
>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>> correct?)
>>>>> I thought for this particular case associativity wasn't an issue?
>>>>> We're not doing any
>>>>> reductions here, just simply performing max/min operations on each
>>>>> pair of elements
>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>> ensure that for
>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>
>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>> builtin for the vectorized variant.
>>>>
>>>> The reason I am pushing against a new tree code is that we'd have an
>>>> awful lot of similar codes when pushing other flag related IL
>>>> specialities to actual IL constructs.  And we still need to find a
>>>> consistent way to do that.
>>>
>>> In this case though the new code is really the "native" min/max operation
>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>> might make it seem like more of a special case than it really is.
>>
>> In some sense.  But the "strict" version already has a builtin (just no
>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>> for existing builtins (why have builtins at all then?).
>
> We still need the builtin to match the C function (and to allow direct
> calls to __builtin_fmin, etc., which are occasionally useful).
>
>>> If you're still not convinced, how about an internal function instead
>>> of a built-in function, so that we can continue to use optabs for all
>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>> target-specific builtins with target-specific expansion code, especially
>>> when the same concept is exposed by target-independent code for scalars.
>>
>> The target builtin is for the vectorized variant - not all targets might have
>> that and we'd need to query the target about this.  So using a IFN would
>> mean adding a target hook for that query.
>
> No, the idea is that if we have a tree code or an internal function, the
> decision about whether we have target support is based on a query of the
> optabs (just like it is for scalar, and for other vectorisable tree codes).
> No new hooks are needed.
>
> The patch checked for target support that way.

Fair enough.  Still this means we should have tree codes for all builtins
that eventually are vectorized?  So why don't we have SIN_EXPR,
POW_EXPR (ok, I did argue and have patches for that in the past),
RINT_EXPR, SQRT_EXPR, etc?

This patch starts to go down that route which is why I ask for the
whole picture to be considered and hinted at the alternative implementation
which follows existing practice.  Add an expander in builtins.c, add an optab,
and eventual support to vectorized_function.

See for example ix86_builtin_vectorized_function which handles
sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
if the target supports it for the scalar mode, so not sure if there is
any x86 ISA where it has vectorized FMA but not scalar FMA).

>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>> > builtin enum value) is worse than a tree-code value, unless the limit
>> > of the tree_code bitfield is in sight (maybe it is).
>>
>> I think tree_code is 64bits now.
>
> Even better :-)

Yes.

I'm not against adding a corresponding tree code for all math builtin functions,
we just have to decide whether this is the way 

Re: [PATCH] Fix middle-end/67133, part 1

2015-08-19 Thread Marek Polacek
On Wed, Aug 19, 2015 at 11:48:12AM +0200, Richard Biener wrote:
> Looks good to me.

Thanks!  I'll wait for Jeff if he has any comments.

Marek


Re: [PATCH] Missing Skylake -march=/-mtune= option

2015-08-19 Thread Richard Biener
On Thu, Aug 13, 2015 at 9:57 PM, Uros Bizjak  wrote:
> On Thu, Aug 13, 2015 at 11:31 AM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is patch for adding march/mtune options for Skylake.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>
> OK.

I think this causes

FAIL: g++.dg/ext/mv16.C  -std=gnu++98 execution test
FAIL: g++.dg/ext/mv16.C  -std=gnu++11 execution test
FAIL: g++.dg/ext/mv16.C  -std=gnu++14 execution test

for me.  Possibly __builtin_cpu_is is not working for skylake?

Richard.

> Thanks,
> Uros.
>
>> ChangeLog:
>> 2015-08-13  Yuri Rumyantsev  
>>
>> * config/i386/driver-i386.c (host_detect_local_cpu): Add support
>> for skylake.
>> * config/i386/i386.c (PTA_SKYLAKE): New macros.
>> (processor_alias_table): Add skylake description.
>> (enum processor_model): Add skylake processor.
>> (arch_names_table): Add skylake record.
>> * doc/invoke.texi: Add skylake item.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/i386/builtin_target.c: Add skylake check.
>>
>> libgcc/ChangeLog:
>> * config/i386/cpuinfo.c (enum processor_subtypes): Add skylake.
>> (get_intel_cpu): Likewise.


Re: [PATCH] Missing Skylake -march=/-mtune= option

2015-08-19 Thread Uros Bizjak
On Wed, Aug 19, 2015 at 12:39 PM, Richard Biener
 wrote:
> On Thu, Aug 13, 2015 at 9:57 PM, Uros Bizjak  wrote:
>> On Thu, Aug 13, 2015 at 11:31 AM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is patch for adding march/mtune options for Skylake.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>
>> OK.
>
> I think this causes
>
> FAIL: g++.dg/ext/mv16.C  -std=gnu++98 execution test
> FAIL: g++.dg/ext/mv16.C  -std=gnu++11 execution test
> FAIL: g++.dg/ext/mv16.C  -std=gnu++14 execution test
>
> for me.  Possibly __builtin_cpu_is is not working for skylake?

No, a relevant entry has to be added to the testcase. But a real
skylake is needed to test the patch.

Uros.


[PATCH/AARCH64] Remove index from AARCH64_EXTRA_TUNING_OPTION

2015-08-19 Thread Andrew Pinski
Just like the patch for AARCH64_FUSION_PAIR, this is a patch for
AARCH64_EXTRA_TUNING_OPTION.  Note I tested this patch on top of the
patch for AARCH64_FUSION_PAIR.


Remove index from AARCH64_FUSION_PAIR

Instead of doing an explicit index in aarch64-fusion-pairs.def, we
should have an enum which does the index instead.  This allows
you to add/remove them without worrying about the order being
correct and having holes or worry about merge conflicts.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* aarch64-fusion-pairs.def: Remove all index to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs_index): New enum.
(aarch64_fusion_pairs): Base the shifted value on the index instead
of the argument to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64.c: Remove the last argument to AARCH64_FUSION_PAIR.
commit 61a89a2f6939fbc97e18d2137daba7f450ef76b2
Author: Andrew Pinski 
Date:   Wed Aug 19 01:15:00 2015 -0700

Remove index from AARCH64_EXTRA_TUNING_OPTION

Instead of doing an explicit index in aarch64-tuning-flags.def, we
should have an enum which does the index instead.  This allows
you to add/remove them without worrying about the order being
correct and having holes or worry about merge conflicts.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* config/aarch64/aarch64-tuning-flags.def: Remove all index to 
AARCH64_EXTRA_TUNING_OPTION.
* config/aarch64/aarch64-protos.h (extra_tuning_flags_index): New enum.
(aarch64_extra_tuning_flags): Base the shifted value on the index instead
of the argument to AARCH64_EXTRA_TUNING_OPTION.
* config/aarch64/aarch64.c: Remove the last argument to 
AARCH64_EXTRA_TUNING_OPTION.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index c4c1817..2abee03 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -231,8 +231,19 @@ enum aarch64_fusion_pairs
 };
 #undef AARCH64_FUSION_PAIR
 
-#define AARCH64_EXTRA_TUNING_OPTION(x, name, index) \
-  AARCH64_EXTRA_TUNE_##name = (1 << index),
+#define AARCH64_EXTRA_TUNING_OPTION(x, name) \
+  AARCH64_EXTRA_TUNE_##name##_index,
+/* Supported tuning flags indexes.  */
+enum aarch64_extra_tuning_flags_index
+{
+#include "aarch64-tuning-flags.def"
+  AARCH64_EXTRA_TUNE_index_END
+};
+#undef AARCH64_EXTRA_TUNING_OPTION
+
+
+#define AARCH64_EXTRA_TUNING_OPTION(x, name) \
+  AARCH64_EXTRA_TUNE_##name = (1 << AARCH64_EXTRA_TUNE_##name##_index),
 /* Supported tuning flags.  */
 enum aarch64_extra_tuning_flags
 {
@@ -242,7 +253,7 @@ enum aarch64_extra_tuning_flags
 /* Hacky macro to build the "all" flag mask.
Expands to 0 | AARCH64_TUNE_index0 | AARCH64_TUNE_index1 , etc.  */
 #undef AARCH64_EXTRA_TUNING_OPTION
-#define AARCH64_EXTRA_TUNING_OPTION(x, name, y) \
+#define AARCH64_EXTRA_TUNING_OPTION(x, name) \
   | AARCH64_EXTRA_TUNE_##name
   AARCH64_EXTRA_TUNE_ALL = 0
 #include "aarch64-tuning-flags.def"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 01aaca8..628386b 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -20,15 +20,13 @@
 /* Additional control over certain tuning parameters.  Before including
this file, define a macro:
 
- AARCH64_EXTRA_TUNING_OPTION (name, internal_name, index_bit)
+ AARCH64_EXTRA_TUNING_OPTION (name, internal_name)
 
Where:
 
  NAME is a string giving a friendly name for the tuning flag.
  INTERNAL_NAME gives the internal name suitable for appending to
- AARCH64_TUNE_ to give an enum name.
- INDEX_BIT is the bit to set in the bitmask of supported tuning
- flags.  */
+ AARCH64_TUNE_ to give an enum name. */
 
-AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS, 0)
+AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 162e25e..ad144fe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -183,7 +183,7 @@ static const struct aarch64_flag_desc 
aarch64_fusible_pairs[] =
 };
 #undef AARCH64_FUION_PAIR
 
-#define AARCH64_EXTRA_TUNING_OPTION(name, internal_name, y) \
+#define AARCH64_EXTRA_TUNING_OPTION(name, internal_name) \
   { name, AARCH64_EXTRA_TUNE_##internal_name },
 static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 {


Re: [PATCH] Missing Skylake -march=/-mtune= option

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 12:47 PM, Uros Bizjak  wrote:
> On Wed, Aug 19, 2015 at 12:39 PM, Richard Biener
>  wrote:
>> On Thu, Aug 13, 2015 at 9:57 PM, Uros Bizjak  wrote:
>>> On Thu, Aug 13, 2015 at 11:31 AM, Yuri Rumyantsev  
>>> wrote:
>>>> Hi All,
>>>>
>>>> Here is patch for adding march/mtune options for Skylake.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> Is it OK for trunk?
>>>
>>> OK.
>>
>> I think this causes
>>
>> FAIL: g++.dg/ext/mv16.C  -std=gnu++98 execution test
>> FAIL: g++.dg/ext/mv16.C  -std=gnu++11 execution test
>> FAIL: g++.dg/ext/mv16.C  -std=gnu++14 execution test
>>
>> for me.  Possibly __builtin_cpu_is is not working for skylake?
>
> No, a relevant entry has to be added to the testcase. But a real
> skylake is needed to test the patch.

Hmm, so it doesn't fall back to 'default' for skylake?  Anyway, I'll
ignore the execute fail then for now until you sort it out in the testcase.

Richard.

> Uros.


[PATCH/AARCH64] Remove index from AARCH64_FUSION_PAIR

2015-08-19 Thread Andrew Pinski
Instead of doing an explicit index in aarch64-fusion-pairs.def, we
should have an enum which does the index instead.  This allows
you to add/remove them without worrying about the order being
correct and having holes or worry about merge conflicts.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* aarch64-fusion-pairs.def: Remove all index to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs_index): New enum.
(aarch64_fusion_pairs): Base the shifted value on the index instead
of the argument to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64.c: Remove the last argument to AARCH64_FUSION_PAIR.
commit 69a828bfdcd2f4de2c9d4f27e3878213d04ed353
Author: Andrew Pinski 
Date:   Tue Aug 18 22:13:32 2015 -0700

Remove index from AARCH64_FUSION_PAIR

Instead of doing an explicit index in aarch64-fusion-pairs.def, we
should have an enum which does the index instead.  This allows
you to add/remove them without worrying about the order being
correct and having holes or worry about merge conflicts.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* aarch64-fusion-pairs.def: Remove all index to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs_index): New enum.
(aarch64_fusion_pairs): Base the shifted value on the index instead
of the argument to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64.c: Remove the last argument to AARCH64_FUSION_PAIR.

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index a7b00f6..53bbef4 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -20,19 +20,17 @@
 /* Pairs of instructions which can be fused. before including this file,
define a macro:
 
- AARCH64_FUSION_PAIR (name, internal_name, index_bit)
+ AARCH64_FUSION_PAIR (name, internal_name)
 
Where:
 
  NAME is a string giving a friendly name for the instructions to fuse.
  INTERNAL_NAME gives the internal name suitable for appending to
- AARCH64_FUSE_ to give an enum name.
- INDEX_BIT is the bit to set in the bitmask of supported fusion
- operations.  */
-
-AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK, 0)
-AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD, 1)
-AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK, 2)
-AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR, 3)
-AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH, 4)
+ AARCH64_FUSE_ to give an enum name. */
+
+AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK)
+AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
+AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
+AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
+AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 0b09d49..c4c1817 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -201,8 +201,18 @@ struct tune_params
   unsigned int extra_tuning_flags;
 };
 
-#define AARCH64_FUSION_PAIR(x, name, index) \
-  AARCH64_FUSE_##name = (1 << index),
+#define AARCH64_FUSION_PAIR(x, name) \
+  AARCH64_FUSE_##name##_index, 
+/* Supported fusion operations.  */
+enum aarch64_fusion_pairs_index
+{
+#include "aarch64-fusion-pairs.def"
+  AARCH64_FUSE_index_END
+};
+#undef AARCH64_FUSION_PAIR
+
+#define AARCH64_FUSION_PAIR(x, name) \
+  AARCH64_FUSE_##name = (1 << AARCH64_FUSE_##name##_index),
 /* Supported fusion operations.  */
 enum aarch64_fusion_pairs
 {
@@ -213,7 +223,7 @@ enum aarch64_fusion_pairs
to:
AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
 #undef AARCH64_FUSION_PAIR
-#define AARCH64_FUSION_PAIR(x, name, y) \
+#define AARCH64_FUSION_PAIR(x, name) \
   | AARCH64_FUSE_##name
 
   AARCH64_FUSE_ALL = 0
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aa268ae..162e25e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -172,7 +172,7 @@ struct aarch64_flag_desc
   unsigned int flag;
 };
 
-#define AARCH64_FUSION_PAIR(name, internal_name, y) \
+#define AARCH64_FUSION_PAIR(name, internal_name) \
   { name, AARCH64_FUSE_##internal_name },
 static const struct aarch64_flag_desc aarch64_fusible_pairs[] =
 {
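
For readers following the thread, the two-pass enum idea described above can be sketched in a single file. This is only an illustration under stated assumptions: the list macro FUSION_PAIRS stands in for including aarch64-fusion-pairs.def, and the shortened FUSE_ names, AS_INDEX, AS_FLAG, AS_DESC and struct flag_desc are hypothetical, not the names used in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for aarch64-fusion-pairs.def: a list macro is used
   here so the sketch fits in one file.  */
#define FUSION_PAIRS(X)        \
  X ("mov+movk", MOV_MOVK)     \
  X ("adrp+add", ADRP_ADD)     \
  X ("movk+movk", MOVK_MOVK)   \
  X ("adrp+ldr", ADRP_LDR)     \
  X ("cmp+branch", CMP_BRANCH)

/* First pass: one sequential index per entry, assigned by the compiler.  */
#define AS_INDEX(str, name) FUSE_##name##_index,
enum fusion_pairs_index
{
  FUSION_PAIRS (AS_INDEX)
  FUSE_index_END
};
#undef AS_INDEX

/* Second pass: turn each index into a single-bit flag.  */
#define AS_FLAG(str, name) FUSE_##name = (1 << FUSE_##name##_index),
enum fusion_pairs
{
  FUSION_PAIRS (AS_FLAG)
  FUSE_NOTHING = 0
};
#undef AS_FLAG

/* Third pass: a name/flag table, similar in spirit to the table that
   aarch64.c builds from the same list.  */
struct flag_desc
{
  const char *name;
  unsigned int flag;
};

#define AS_DESC(str, name) { str, FUSE_##name },
static const struct flag_desc fusible_pairs[] =
{
  FUSION_PAIRS (AS_DESC)
  { NULL, 0 }
};
#undef AS_DESC
```

Because the index comes from the entry's position in the list, adding or removing an entry renumbers the rest automatically, which is the property the commit message is after.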


Re: [PATCH/AARCH64] Remove index from AARCH64_FUSION_PAIR

2015-08-19 Thread James Greenhalgh
On Wed, Aug 19, 2015 at 12:11:04PM +0100, Andrew Pinski wrote:
> Instead of doing an explicit index in aarch64-fusion-pairs.def, we
> should have an enum which does the index instead.  This allows
> you to add/remove them without worrying about the order being
> correct and having holes or worry about merge conflicts.
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> 

Looks good, it would be good to expand this patch to get rid of the
messy way we build the AARCH64_FUSE_ALL macro:

> /* Hacky macro to build AARCH64_FUSE_ALL.  The sequence below expands
>to:
>AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
> #undef AARCH64_FUSION_PAIR
> #define AARCH64_FUSION_PAIR(x, name, y) \
>   | AARCH64_FUSE_##name
> 
>   AARCH64_FUSE_ALL = 0
> #include "aarch64-fusion-pairs.def"

We should now be able to do something like:

  AARCH64_FUSE_ALL = ((1 << AARCH64_FUSE_index_END) - 1)

Right?

If so, could you respin with that change?

Thanks,
James


> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> index a7b00f6..53bbef4 100644
> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> @@ -20,19 +20,17 @@
>  /* Pairs of instructions which can be fused. before including this file,
> define a macro:
>  
> - AARCH64_FUSION_PAIR (name, internal_name, index_bit)
> + AARCH64_FUSION_PAIR (name, internal_name)
>  
> Where:
>  
>   NAME is a string giving a friendly name for the instructions to fuse.
>   INTERNAL_NAME gives the internal name suitable for appending to
> - AARCH64_FUSE_ to give an enum name.
> - INDEX_BIT is the bit to set in the bitmask of supported fusion
> - operations.  */
> -
> -AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK, 0)
> -AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD, 1)
> -AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK, 2)
> -AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR, 3)
> -AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH, 4)
> + AARCH64_FUSE_ to give an enum name. */
> +
> +AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK)
> +AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
> +AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
> +AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
> +AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
>  
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 0b09d49..c4c1817 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -201,8 +201,18 @@ struct tune_params
>unsigned int extra_tuning_flags;
>  };
>  
> -#define AARCH64_FUSION_PAIR(x, name, index) \
> -  AARCH64_FUSE_##name = (1 << index),
> +#define AARCH64_FUSION_PAIR(x, name) \
> +  AARCH64_FUSE_##name##_index, 
> +/* Supported fusion operations.  */
> +enum aarch64_fusion_pairs_index
> +{
> +#include "aarch64-fusion-pairs.def"
> +  AARCH64_FUSE_index_END
> +};
> +#undef AARCH64_FUSION_PAIR
> +
> +#define AARCH64_FUSION_PAIR(x, name) \
> +  AARCH64_FUSE_##name = (1 << AARCH64_FUSE_##name##_index),
>  /* Supported fusion operations.  */
>  enum aarch64_fusion_pairs
>  {
> @@ -213,7 +223,7 @@ enum aarch64_fusion_pairs
> to:
> AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
>  #undef AARCH64_FUSION_PAIR
> -#define AARCH64_FUSION_PAIR(x, name, y) \
> +#define AARCH64_FUSION_PAIR(x, name) \
>| AARCH64_FUSE_##name
>  
>AARCH64_FUSE_ALL = 0
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index aa268ae..162e25e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -172,7 +172,7 @@ struct aarch64_flag_desc
>unsigned int flag;
>  };
>  
> -#define AARCH64_FUSION_PAIR(name, internal_name, y) \
> +#define AARCH64_FUSION_PAIR(name, internal_name) \
>{ name, AARCH64_FUSE_##internal_name },
>  static const struct aarch64_flag_desc aarch64_fusible_pairs[] =
>  {
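
The suggestion above, replacing the OR-everything expansion with a shift, can be checked with a small sketch. It assumes a hypothetical three-entry list (DEMO_PAIRS) in place of the .def file, and all DEMO_ names are invented for illustration. The two spellings agree only because the generated indexes are dense, with no holes.

```c
/* Hypothetical three-entry list standing in for the .def file.  */
#define DEMO_PAIRS(X) X (A) X (B) X (C)

/* Sequential indexes, ending in DEMO_index_END.  */
#define AS_INDEX(name) DEMO_##name##_index,
enum demo_index
{
  DEMO_PAIRS (AS_INDEX)
  DEMO_index_END
};
#undef AS_INDEX

#define AS_FLAG(name) DEMO_##name = (1 << DEMO_##name##_index),
#define OR_FLAG(name) | DEMO_##name
enum demo_flags
{
  DEMO_PAIRS (AS_FLAG)
  /* Existing approach: OR every flag together via another expansion.  */
  DEMO_ALL_BY_OR = 0 DEMO_PAIRS (OR_FLAG),
  /* Suggested approach: all bits below the end index.  */
  DEMO_ALL_BY_SHIFT = (1 << DEMO_index_END) - 1
};
#undef AS_FLAG
#undef OR_FLAG

/* Returns nonzero when both ways of building the "all" mask agree.  */
int demo_masks_agree (void)
{
  return DEMO_ALL_BY_OR == DEMO_ALL_BY_SHIFT;
}
```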



Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>>  wrote:
>>>> Richard Biener  writes:
>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>>  wrote:
>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>>  wrote:
>>>>>>> > Hi Richard,
>>>>>>> >
>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>>> > seemed more
>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>>> > addition it
>>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>>> > target-specific
>>>>>>> > builtin implementation would. I accept that optimisation
>>>>>>> > opportunities will
>>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>>> > worth having
>>>>>>> > them. Also, if we did map it to builtins then the scalar
>>>>>>> > version would go
>>>>>>> > through the optabs and the vector version would go through the
>>>>>>> > target's builtin
>>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>>
>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>>> correct?)
>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>> We're not doing any
>>>>>> reductions here, just simply performing max/min operations on each
>>>>>> pair of elements
>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>>> ensure that for
>>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>>
>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>> builtin for the vectorized variant.
>>>>>
>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>> awful lot of similar codes when pushing other flag related IL
>>>>> specialities to actual IL constructs.  And we still need to find a
>>>>> consistent way to do that.
>>>>
>>>> In this case though the new code is really the "native" min/max operation
>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>>> might make it seem like more of a special case than it really is.
>>>
>>> In some sense.  But the "strict" version already has a builtin (just no
>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>>> for existing builtins (why have builtins at all then?).
>>
>> We still need the builtin to match the C function (and to allow direct
>> calls to __builtin_fmin, etc., which are occasionally useful).
>>
>>>> If you're still not convinced, how about an internal function instead
>>>> of a built-in function, so that we can continue to use optabs for all
>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>>> target-specific builtins with target-specific expansion code, especially
>>>> when the same concept is exposed by target-independent code for scalars.
>>>
>>> The target builtin is for the vectorized variant - not all targets might 
>>> have
>>> that and we'd need to query the target about this.  So using a IFN would
>>> mean adding a target hook for that query.
>>
>> No, the idea is that if we have a tree code or an internal function, the
>> decision about whether we have target support is based on a query of the
>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> No new hooks are needed.
>>
>> The patch checked for target support that way.
>
> Fair enough.  Still this means we should have tree codes for all builtins
> that eventually are vectorized?  So why don't we have SIN_EXPR,
> POW_EXPR (ok, I did argue and have patches for that in the past),
> RINT_EXPR, SQRT_EXPR, etc?

Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
have implemented fmod() or remainder().  Casts from double to int could
have used the current rounding mode, but instead they truncate and
conversions using the rounding mode need to go through something like
(l)lrint().  Like you say, pow() could have been an operator (and is in
many languages), but instead it's a function.

> This patch starts to go down that route which is why I ask for the
> whole picture to be considered and hinted at the alternative implementation
> which follows existing practice.  Add a expander in builtins.c, add an optab,
> and eventual support to vectorized_function.
>
> See for example ix86_builtin_vectorized_function which handles
> s

Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Tue, Aug 4, 2015 at 1:50 PM, H.J. Lu  wrote:
> On Tue, Aug 4, 2015 at 1:45 PM, Segher Boessenkool
>  wrote:
>> On Tue, Aug 04, 2015 at 01:00:32PM -0700, H.J. Lu wrote:
>>> There is another issue with x86, maybe other targets.  You
>>> can't get the real stack top when stack is realigned and
>>> -maccumulate-outgoing-args isn't used since ix86_expand_prologue
>>> will create and return another stack frame for
>>> __builtin_frame_address and __builtin_return_address.
>>> It will be wrong for __builtin_stack_top, which should
>>> return the real stack address.
>>
>> That's why I asked:
>>
>>> >> > You might have a reason why you want the entry stack address instead 
>>> >> > of the
>>> >> > frame address, but you didn't really explain I think?  Or I missed it.
>>
>> What would a C program do with this, that it cannot do with the frame
>> address, that would be useful and cannot be much better done in straight
>> assembler?  Do you actually want to expose the argument pointer, maybe?
>>
>
> Yes, we want to use the argument pointer as shown in testcases
> included in my patch.
>

Where do we stand on this?  We need the hard stack address at
function entry for x86 without using frame pointer.   I added
__builtin_stack_top since __builtin_frame_address can't give
us what we want.  Should __builtin_stack_top be added to
middle-end or x86 backend?

Thanks.

-- 
H.J.


RE: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-19 Thread David Sherwood
> I asked Richard S. to give this a once-over which he did.  However, he
> technically can't approve due to the way his maintainership position was
> worded.
> 
> The one request would be a function comment for emit_mode_unit_size and
> emit_mode_unit_precision.  OK with that change.
Thanks. Here's a new patch with the comments added.

Good to go?
David.

ChangeLog:

2015-08-19  David Sherwood  

gcc/
* genmodes.c (emit_mode_unit_size_inline): New function.
(emit_mode_unit_precision_inline): New function.
(emit_insn_modes_h): Emit new #define.  Emit new functions.
(emit_mode_unit_size): New function.
(emit_mode_unit_precision): New function.
(emit_mode_adjustments): Add mode_unit_size adjustments.
(emit_insn_modes_c): Emit new arrays.
* machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
use new inline methods.



mode_inner4a.patch
Description: Binary data


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>>>  wrote:
>>>>> Richard Biener  writes:
>>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>>>  wrote:
>>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>>>  wrote:
>>>>>>>> > Hi Richard,
>>>>>>>> >
>>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>>>> > seemed more
>>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>>>> > addition it
>>>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>>>> > target-specific
>>>>>>>> > builtin implementation would. I accept that optimisation
>>>>>>>> > opportunities will
>>>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>>>> > worth having
>>>>>>>> > them. Also, if we did map it to builtins then the scalar
>>>>>>>> > version would go
>>>>>>>> > through the optabs and the vector version would go through the
>>>>>>>> > target's builtin
>>>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>>>
>>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>>>> correct?)
>>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>>> We're not doing any
>>>>>>> reductions here, just simply performing max/min operations on each
>>>>>>> pair of elements
>>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>>>> ensure that for
>>>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>>>
>>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>>> builtin for the vectorized variant.
>>>>>>
>>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>>> awful lot of similar codes when pushing other flag related IL
>>>>>> specialities to actual IL constructs.  And we still need to find a
>>>>>> consistent way to do that.
>>>>>
>>>>> In this case though the new code is really the "native" min/max operation
>>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>>>> might make it seem like more of a special case than it really is.
>>>>
>>>> In some sense.  But the "strict" version already has a builtin (just no
>>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>>>> for existing builtins (why have builtins at all then?).
>>>
>>> We still need the builtin to match the C function (and to allow direct
>>> calls to __builtin_fmin, etc., which are occasionally useful).
>>>
>>>>> If you're still not convinced, how about an internal function instead
>>>>> of a built-in function, so that we can continue to use optabs for all
>>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>>>> target-specific builtins with target-specific expansion code, especially
>>>>> when the same concept is exposed by target-independent code for scalars.
>>>>
>>>> The target builtin is for the vectorized variant - not all targets might
>>>> have
>>>> that and we'd need to query the target about this.  So using a IFN would
>>>> mean adding a target hook for that query.
>>>
>>> No, the idea is that if we have a tree code or an internal function, the
>>> decision about whether we have target support is based on a query of the
>>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>>> No new hooks are needed.
>>>
>>> The patch checked for target support that way.
>>
>> Fair enough.  Still this means we should have tree codes for all builtins
>> that eventually are vectorized?  So why don't we have SIN_EXPR,
>> POW_EXPR (ok, I did argue and have patches for that in the past),
>> RINT_EXPR, SQRT_EXPR, etc?
>
> Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
> in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
> have implemented fmod() or remainder().  Casts from double to int could
> have used the current rounding mode, but instead they truncate and
> conversions using the rounding mode need to go through something like
> (l)lrint().  Like you say, pow() could have been an operator (and is in
> many languages), but instead it's a function.
>
>> This patch starts to go down that route which is why I ask for the
>> whole picture to be considered and hinted at the alternative implementation
>> which follows existing practic

Re: [gomp4] OpenACC first private

2015-08-19 Thread Nathan Sidwell

On 08/18/15 17:43, Thomas Schwinge wrote:


..., but the following ones remain to be addressed -- could somebody look
into this, please?  Especially the timeouts are very annoying.  Tests
that now reproducibly XPASS instead of XFAIL should be verified, and the
XFAIL marker removed.


 [-PASS:-]{+FAIL: gfortran.dg/goacc/modules.f95   -O  (internal compiler 
error)+}
 {+FAIL:+} gfortran.dg/goacc/modules.f95   -O  (test for excess errors)

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
 [-XFAIL:-]{+XPASS:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
 {+WARNING: program timed out.+}
 XFAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

 PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
 [-XFAIL:-]{+XPASS:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

 PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
 {+WARNING: program timed out.+}
 XFAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/lib-13.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O1  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/lib-13.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O1  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O2  (test for excess errors)
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O2  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer  (test for excess errors)
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer -funroll-loops  (test for excess 
errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/lib-13.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer 
-funroll-loops  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  execution test
 PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -g  (test for excess errors)




If the reduction ones are timing out, they should simply be skipped until the 
reduction reworking is complete.  I do not know what the lib-13 ones are.



--
Nathan Sidwell


Re: [PATCH], PowerPC IEEE 128-bit patch #5

2015-08-19 Thread Segher Boessenkool
On Fri, Aug 14, 2015 at 11:46:03AM -0400, Michael Meissner wrote:
> +;; Like int_reg_operand, but don't return true for pseudo registers
> +(define_predicate "int_reg_operand_not_pseudo"
> +  (match_operand 0 "register_operand")
> +{
> +  if ((TARGET_E500_DOUBLE || TARGET_SPE) && invalid_e500_subreg (op, mode))
> +return 0;
> +
> +  if (GET_CODE (op) == SUBREG)
> +op = SUBREG_REG (op);
> +
> +  if (!REG_P (op))
> +return 0;
> +
> +  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
> +return 0;
> +
> +  return INT_REGNO_P (REGNO (op));
> +})

Since you use this only once, maybe it is easier (to read, etc.) if you
just test it there?  Hard regs do not get subregs.

> +(define_insn_and_split "ieee_128bit_vsx_neg2"
> +  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
> + (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
> +   (clobber (match_scratch:V16QI 2 "=v"))]
> +  "TARGET_FLOAT128 && FLOAT128_IEEE_P (mode)"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (match_dup 0)
> +(neg:TFIFKF (match_dup 1)))
> +   (use (match_dup 2))])]
> +{
> +  if (GET_CODE (operands[2]) == SCRATCH)
> +operands[2] = gen_reg_rtx (V16QImode);
> +
> +  operands[3] = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
> +}
> +  [(set_attr "length" "8")
> +   (set_attr "type" "vecsimple")])

Where is operands[3] used?  I guess that whole line should be deleted?

> +(define_insn "*ieee_128bit_vsx_neg2_internal"
> +  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
> + (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
> +   (use (match_operand:V16QI 2 "register_operand" "=v"))]
> +  "TARGET_FLOAT128"
> +  "xxlxor %x0,%x1,%x2"
> +  [(set_attr "length" "4")
> +   (set_attr "type" "vecsimple")])

Length 4 is default, you can just leave it out (like we do for most
machine insns already).


Segher
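
The negation patterns quoted above load a negative-zero constant and XOR it
into the value (the xxlxor in the internal pattern).  A minimal C sketch of
the same idea follows, using a 64-bit double as a stand-in for the 128-bit
case.  The function name neg_by_xor is hypothetical, and the sketch assumes
double is a 64-bit IEEE type:

```c
#include <stdint.h>
#include <string.h>

/* Negate X by XORing its bit pattern with that of -0.0, whose only set
   bit is the sign bit.  Assumes double is a 64-bit IEEE type; this is an
   illustration of the idea, not code from the patch.  */
double
neg_by_xor (double x)
{
  const double negative_zero = -0.0;
  uint64_t bits, mask;

  memcpy (&bits, &x, sizeof bits);
  memcpy (&mask, &negative_zero, sizeof mask);
  bits ^= mask;			/* Flip the sign bit only.  */
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

Because only the sign bit changes, this also negates zeros and NaNs without
raising exceptions, which is why a single logical instruction is enough.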


RE: [PATCH][ARM]Tighten the conditions for arm_movw, arm_movt

2015-08-19 Thread Kyrill Tkachov
Hi Renlin,

Please send patches to gcc-patches for review.
Redirecting there now...


On 19/08/15 12:49, Renlin Li wrote:
> Hi all,
>
> This simple patch will tighten the conditions when matching movw and
> arm_movt rtx pattern.
> Those two patterns will generate the following assembly:
>
> movw w1, #:lower16: dummy + addend
> movt w1, #:upper16: dummy + addend
>
> The addend here is optional. However, it should be a 16-bit signed
> value within the range -32768 <= A <= 32767.
>
> By imposing this restriction explicitly, it will prevent LRA/reload code
> from generating invalid high/lo_sum code for the arm target.
> In process_address_1(), if the address is not legitimate, it will try to
> generate a high/lo_sum pair to put the address into a register. It will
> check if the target supports those newly generated reload instructions.
> By defining those two patterns, arm will reject them if the conditions are
> not met.
>
> Otherwise, it might generate movw/movt instructions with an addend larger
> than 32767, which will cause a GAS error. GAS will produce an 'offset out
> of range' error message when the addend for a MOVW/MOVT REL relocation is
> too large.
>
>
> arm-none-eabi regression tests Okay, Okay to commit to the trunk and
> backport to 5.0?
>
> Regards,
> Renlin
>
> gcc/ChangeLog:
>
> 2015-08-19  Renlin Li  
>
>   * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
>   * config/arm/arm.c (arm_valid_symbolic_address_p): Define.
>   * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
>   * config/arm/constraints.md ("j"): Add check for high code.

+/* Returns true if the pattern is a valid symbolic address, which is either a
+   symbol_ref or a symbol_ref + offset.  */
+bool
+arm_valid_symbolic_address_p (rtx addr)

New line between comment and function.

+{
+  rtx xop0, xop1 = NULL_RTX;
+  rtx tmp = addr;
+
+  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
+return true;
+
+  /* (const (plus: symbol_ref const_int))  */
+  if (GET_CODE (addr) == CONST)
+tmp = XEXP (addr, 0);
+
+  xop0 = XEXP (tmp, 0);
+  xop1 = XEXP (tmp, 1);


Is it guaranteed that at this point XEXP (tmp, 0) and XEXP (tmp, 1) are valid?
I think before you extract xop0 and xop1 you want to check that tmp is indeed a 
PLUS
and return false if it's not. Only then you should extract XEXP (tmp, 0) and 
XEXP (tmp, 1).
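
To illustrate the ordering I mean, here is a minimal standalone sketch.  It
uses a toy node type rather than GCC's rtx, so every name in it (toy_rtx,
the TOY_* codes, toy_symbol_plus_const_p and the sample nodes) is
hypothetical.  The point is only that the operands are read after the node
is known to be a PLUS:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes; not the real GCC definitions.  */
enum toy_code
{
  TOY_SYMBOL_REF, TOY_LABEL_REF, TOY_CONST, TOY_CONST_INT, TOY_PLUS
};

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];	/* Used by TOY_CONST (op[0]) and TOY_PLUS.  */
  long value;			/* Used by TOY_CONST_INT.  */
};

/* Return true for a symbol, a label, or (const (plus symbol const_int)).  */
bool
toy_symbol_plus_const_p (const struct toy_rtx *addr)
{
  const struct toy_rtx *tmp = addr;

  if (tmp->code == TOY_SYMBOL_REF || tmp->code == TOY_LABEL_REF)
    return true;

  if (tmp->code == TOY_CONST)
    tmp = tmp->op[0];

  /* Check the shape first; the operands are only read for a PLUS.  */
  if (tmp == NULL || tmp->code != TOY_PLUS)
    return false;

  const struct toy_rtx *xop0 = tmp->op[0];
  const struct toy_rtx *xop1 = tmp->op[1];

  return (xop0 != NULL && xop1 != NULL
	  && xop0->code == TOY_SYMBOL_REF
	  && xop1->code == TOY_CONST_INT);
}

/* Sample nodes: sym, 4, (plus sym 4) and (const (plus sym 4)).  */
static const struct toy_rtx toy_sym = { TOY_SYMBOL_REF, { NULL, NULL }, 0 };
static const struct toy_rtx toy_four = { TOY_CONST_INT, { NULL, NULL }, 4 };
static const struct toy_rtx toy_sum = { TOY_PLUS, { &toy_sym, &toy_four }, 0 };
static const struct toy_rtx toy_wrapped = { TOY_CONST, { &toy_sum, NULL }, 0 };
```

With this shape a bare const_int such as toy_four is rejected before any
operand is touched.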

+  if (GET_CODE (tmp) == PLUS && GET_CODE (xop0) == SYMBOL_REF
+  && CONST_INT_P (xop1))
+{
+  HOST_WIDE_INT offset = INTVAL (xop1);
+  if (offset < -0x8000 || offset > 0x7fff)
+   return false;
+  else
+   return true;

I think you can just do "return IN_RANGE (offset, -0x8000, 0x7fff);"
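
As a standalone illustration of that inclusive check, with the bounds
-0x8000 and 0x7fff taken from the hunk above, something along these lines
would do.  TOY_IN_RANGE is a hypothetical local macro written for this
sketch, not GCC's IN_RANGE definition, and movw_movt_addend_ok is likewise
a made-up name:

```c
#include <stdbool.h>

/* Hypothetical inclusive range test: true when LOWER <= VALUE <= UPPER.
   Written for this sketch only.  */
#define TOY_IN_RANGE(value, lower, upper) \
  ((value) >= (lower) && (value) <= (upper))

/* True when OFFSET fits in a signed 16-bit addend.  */
bool
movw_movt_addend_ok (long offset)
{
  return TOY_IN_RANGE (offset, -0x8000, 0x7fff);
}
```

Both end points are accepted, so -32768 and 32767 pass while 32768 does not.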

Thanks,
Kyrill







Re: RFC: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-08-19 Thread H.J. Lu
On Mon, Aug 17, 2015 at 10:17:00AM -0700, H.J. Lu wrote:
> On Mon, Aug 17, 2015 at 10:08 AM, Alexander Monakov  
> wrote:
> >> >> Perhaps add a comment that GOT slots are 64-bit on x32?
> >> >>
> >> >
> >> > Good idea.  I will update my patch.
> >> >
> >>
> >> How about this?
> >>
> >>
> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> index bf8a21d..216dee6 100644
> >> --- a/gcc/config/i386/i386.c
> >> +++ b/gcc/config/i386/i386.c
> >> @@ -25690,6 +25690,10 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> >> callarg1,
> >>   fnaddr);
> >>   }
> >>fnaddr = gen_const_mem (Pmode, fnaddr);
> >> +  /* Pmode may not be the same as word_mode for x32, which
> >
> > I think 'Pmode is not the same as word_mode on x32' is more appropriate 
> > here.
> 
> "-maddress-mode=long -mx32" makes Pmode == word_mode.
> 
> >> + doesn't support indirect branch va 32-bit memory slot.
> >
> > Typo: s/va/via.
> >
> 
> Fixed.
> 
> Here is the updated patch.
> 

Hi Jeff,

Can you review this?

Thanks.


H.J.
---
It boils down to that -fno-plt should convert calling an external
function, foo, from

call foo@PLT

to

call *foo@GOT

to avoid one extra direct branch to PLT.  The proper place for this is
in the backend during expansion of a function call.  The backend already
takes care of many details for calling an external function, like setting
up a PIC register.  Using the GOT slot instead of the PLT slot is just one
of those details.  For x86, it should be done in ix86_expand_call, not in
prepare_call_address while hoping for the best, which doesn't always
happen.  Also, the non-PIC case can only be handled in the backend.

This patch reverts -fno-plt in prepare_call_address and handles it in
ix86_expand_call.  Other backends may need similar changes to support
-fno-plt.  Alternately, we can introduce a target hook to indicate
whether an external function should be called via register for -fno-plt
so that i386 backend can disable it in prepare_call_address.

sibcall_memory_operand is also updated to accept the GOT slot so that

call *foo@GOT(%reg)

can be generated by ix86_expand_call for 32-bit and 64-bit large model.

gcc/

PR target/67215
* calls.c (prepare_call_address): Don't handle -fno-plt here.
* config/i386/i386.c (ix86_expand_call): Generate indirect call
via GOT for -fno-plt.  Support indirect call via GOT for x32.
* config/i386/predicates.md (sibcall_memory_operand): Allow
GOT memory operand.

gcc/testsuite/

PR target/67215
* gcc.target/i386/pr67215-1.c: New test.
* gcc.target/i386/pr67215-2.c: Likewise.
* gcc.target/i386/pr67215-3.c: Likewise.
---
 gcc/calls.c   | 12 --
 gcc/config/i386/i386.c| 71 ---
 gcc/config/i386/predicates.md |  7 ++-
 gcc/testsuite/gcc.target/i386/pr67215-1.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-2.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-3.c | 12 ++
 6 files changed, 113 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-3.c

diff --git a/gcc/calls.c b/gcc/calls.c
index 5636725..7cce9be 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -203,18 +203,6 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx 
static_chain_value,
   && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
  ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
  : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic
-  && fndecl_or_type
-  && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
-  && (!flag_plt
-  || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type)))
-  && !targetm.binds_local_p (fndecl_or_type))
-{
-  /* This is done only for PIC code.  There is no easy interface to force 
the
-function address into GOT for non-PIC case.  non-PIC case needs to be
-handled specially by the backend.  */
-  funexp = force_reg (Pmode, funexp);
-}
   else if (! sibcallp)
 {
   if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 05fa5e1..ac9a6c4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25649,21 +25649,54 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
callarg1,
   /* Static functions and indirect calls don't need the pic register.  
Also,
 check if PLT was explicitly avoided via no-plt or "noplt" attribute, 
making
 it an indirect call.  */
+  rtx addr = XEXP (fnaddr, 0);
   if (flag_pic
- && (!TARGET_64BIT
- || (ix86_cmodel == CM_LARGE_PIC
- && DEFAULT_ABI != MS_ABI))
- && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
- && !SYMBOL_REF_LOC

RE: [PATCH][ARM][3/3] Expand mod by power of 2

2015-08-19 Thread Kyrill Tkachov
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00448.html

Thanks,
Kyrill

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Kyrill Tkachov
> Sent: 10 August 2015 12:14
> To: GCC Patches
> Cc: Ramana Radhakrishnan; Richard Earnshaw; Marcus Shawcroft; James
> Greenhalgh
> Subject: Re: [PATCH][ARM][3/3] Expand mod by power of 2
> 
> Here is a slight respin.
> The important parts are the same, just the expander now uses the slightly
> shorter arm_gen_compare_reg and the rtx costs hunk is moved under an
> explicit case MOD.
> 
> Note, the tests still require patch 1/3 that does this for aarch64 that I
> hope to post a respun version of soon.
> 
> Ok after the prerequisite goes in?
> 
> Thanks,
> Kyrill
> 
> 
> 2015-08-10  Kyrylo Tkachov  
> 
>  * config/arm/arm.md (*subsi3_compare0): Rename to...
>  (subsi3_compare0): ... This.
>  (*arm_andsi3_insn): Rename to...
>  (arm_andsi3_insn): ... This.
>  (modsi3): New define_expand.
>  * config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
>  when operand is power of 2.
> 
> 
> 2015-08-10  Kyrylo Tkachov  
> 
>  * gcc.target/aarch64/mod_2.x: New file.
>  * gcc.target/aarch64/mod_256.x: Likewise.
>  * gcc.target/arm/mod_2.c: New test.
>  * gcc.target/arm/mod_256.c: Likewise.
>  * gcc.target/aarch64/mod_2.c: Likewise.
>  * gcc.target/aarch64/mod_256.c: Likewise.
> 
> 
> 
> On 31/07/15 09:20, Kyrill Tkachov wrote:
> > Ping.
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02037.html
> > Thanks,
> > Kyrill
> >
> > On 24/07/15 11:55, Kyrill Tkachov wrote:
> >> Hi all,
> >>
> >> This third patch implements the same algorithm as patch 1/3 but for arm.
> >> That is, for X % N where N is a power of 2 we do:
> >>
> >> rsbs    r1, r0, #0
> >> and r0, r0, #(N - 1)
> >> and r1, r1, #(N - 1)
> >> rsbpl   r0, r1, #0
> >>
> >> For the special case where N is 2 we do the shorter:
> >>  cmp r0, #0
> >>  and r0, r0, #1
> >>  rsblt   r0, r0, #0
> >>
> >> Note that for the final conditional negate we expand to an
> >> IF_THEN_ELSE of a NEG rather than a cond_exec rtx because the lra
> >> dataflow analysis doesn't always deal with cond_execs correctly. The
> >> splitters fixed in patch 2/3 then break it into a cond_exec after reload,
> >> so it all works out.
> >>
> >> Bootstrapped and tested on arm, with both ARM and Thumb2 states.
> >>
> >> Tests are added and shared with aarch64.
> >>
> >> Ok for trunk?
> >>
> >> Thanks,
> >> Kyrill
> >>
> >> 2015-07-24  Kyrylo Tkachov  
> >>
> >>* config/arm/arm.md (*subsi3_compare0): Rename to...
> >>(subsi3_compare0): ... This.
> >>(*arm_andsi3_insn): Rename to...
> >>(arm_andsi3_insn): ... This.
> >>(modsi3): New define_expand.
> >>* config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
> >>operand is power of 2.
> >>
> >>
> >> 2015-07-24  Kyrylo Tkachov  
> >>
> >>* gcc.target/aarch64/mod_2.x: New file.
> >>* gcc.target/aarch64/mod_256.x: Likewise.
> >>* gcc.target/arm/mod_2.c: New test.
> >>* gcc.target/arm/mod_256.c: Likewise.
> >>* gcc.target/aarch64/mod_2.c: Likewise.
> >>* gcc.target/aarch64/mod_256.c: Likewise.
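
For readers following the thread, the X % N expansion quoted above (N a
power of 2) can be modelled in plain C roughly as below.  This is a sketch
of the arithmetic only; the function names mod_pow2 and mod_2 are
hypothetical, and the comments map each step to the quoted instructions as
I read them:

```c
#include <assert.h>
#include <stdint.h>

/* Signed X % N for N a power of 2, following the four-instruction
   sequence: negate, mask both values, then pick one by the sign of -X.  */
int32_t
mod_pow2 (int32_t x, uint32_t n)
{
  assert (n != 0 && (n & (n - 1)) == 0);

  uint32_t mask = n - 1;
  uint32_t ux = (uint32_t) x;
  uint32_t neg = 0u - ux;		/* rsbs  r1, r0, #0  */
  uint32_t lo = ux & mask;		/* and   r0, r0, #(N - 1)  */
  uint32_t neg_lo = neg & mask;		/* and   r1, r1, #(N - 1)  */

  /* rsbpl r0, r1, #0: taken when -X is non-negative, i.e. X <= 0.  */
  if ((neg >> 31) == 0)
    return -(int32_t) neg_lo;
  return (int32_t) lo;
}

/* The shorter special case for N == 2.  */
int32_t
mod_2 (int32_t x)
{
  int32_t r = (int32_t) ((uint32_t) x & 1u);	/* and   r0, r0, #1  */
  return x < 0 ? -r : r;			/* cmp + rsblt  */
}
```

Both helpers agree with C's truncating % operator, so negative inputs give
results that are negative or zero.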





Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2015 at 05:23:41AM -0700, H.J. Lu wrote:
> >>> >> > You might have a reason why you want the entry stack address instead 
> >>> >> > of the
> >>> >> > frame address, but you didn't really explain I think?  Or I missed 
> >>> >> > it.
> >>
> >> What would a C program do with this, that it cannot do with the frame
> >> address, that would be useful and cannot be much better done in straight
> >> assembler?  Do you actually want to expose the argument pointer, maybe?
> >
> > Yes, we want to use the argument pointer as shown in testcases
> > included in my patch.
> 
> Where do we stand on this?  We need the hard stack address at
> function entry for x86 without using frame pointer.   I added
> __builtin_stack_top since __builtin_frame_address can't give
> us what we want.  Should __builtin_stack_top be added to
> middle-end or x86 backend?

Sorry for not following up; I thought my suggestion was obvious.

Can you do a __builtin_argument_pointer instead?  That should work
for all targets, afaics?


Segher
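
For context, the existing builtin mentioned in this thread can be exercised
as in the small sketch below.  It uses only __builtin_frame_address, which
GCC already provides; the builtins under discussion are deliberately left
out because their final form was still open at this point.  The wrapper
name current_frame_address is hypothetical:

```c
#include <stddef.h>

/* Return the frame address of this function as GCC reports it.  Level 0
   means the current function.  The value is only useful as an opaque
   address here; the caller must not dereference it.  */
void *
current_frame_address (void)
{
  return __builtin_frame_address (0);
}
```

This gives the frame address, which is not necessarily the stack address at
function entry, and that gap is what the thread is about.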


RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Kyrill Tkachov
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00609.html

Thanks,
Kyrill

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Kyrill Tkachov
> Sent: 12 August 2015 15:32
> To: Jeff Law; Steven Bosscher
> Cc: Bernhard Reutner-Fischer; GCC Patches
> Subject: Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-
> conversion more aggressive
> 
> 
> On 11/08/15 18:09, Kyrill Tkachov wrote:
> > On 11/08/15 18:05, Jeff Law wrote:
> >> On 08/09/2015 03:20 PM, Steven Bosscher wrote:
> >>> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law  wrote:
>  So there's a tight relationship between the implementation of
>  bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there
>  wasn't, then we'd probably be looking to use note_stores and
> note_uses.
> >>> Perhaps I'm missing something, but what is wrong with using DF here
> >>> instead of note_stores/note_uses? All the info on refs/mods of
> >>> registers is available in the DF caches.
> >> Nothing inherently wrong with using DF here.
> > I have reworked the patch to use FOR_EACH_INSN_DEF and
> > FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extract the
> > refs/mods and it seems to work.
> > Is that what you mean by DF?
> > I'm doing some more testing and hope to post the updated version soon.
> 
> Here it is, I've used the FOR_EACH* macros from dataflow to gather the uses
> and sets.
> 
> Bootstrapped and tested on x86_64 and aarch64.
> How does this look?
> 
> Thanks,
> Kyrill
> 
> 2015-08-10  Kyrylo Tkachov  
> 
>  * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>  then_cost, else_cost fields.  Change branch_cost field to unsigned int.
>  (end_ifcvt_sequence): Call set_used_flags on each insn in the
>  sequence.
>  Include rtl-iter.h.
>  (noce_simple_bbs): New function.
>  (noce_try_move): Bail if basic blocks are not simple.
>  (noce_try_store_flag): Likewise.
>  (noce_try_store_flag_constants): Likewise.
>  (noce_try_addcc): Likewise.
>  (noce_try_store_flag_mask): Likewise.
>  (noce_try_cmove): Likewise.
>  (noce_try_minmax): Likewise.
>  (noce_try_abs): Likewise.
>  (noce_try_sign_mask): Likewise.
>  (noce_try_bitop): Likewise.
>  (bbs_ok_for_cmove_arith): New function.
>  (noce_emit_all_but_last): Likewise.
>  (noce_emit_insn): Likewise.
>  (noce_emit_bb): Likewise.
>  (noce_try_cmove_arith): Handle non-simple basic blocks.
>  (insn_valid_noce_process_p): New function.
>  (contains_mem_rtx_p): Likewise.
>  (bb_valid_for_noce_process_p): Likewise.
>  (noce_process_if_block): Allow non-simple basic blocks
>  where appropriate.
> 
> 2015-08-11  Kyrylo Tkachov  
> 
>  * gcc.dg/ifcvt-1.c: New test.
>  * gcc.dg/ifcvt-2.c: Likewise.
>  * gcc.dg/ifcvt-3.c: Likewise.
> 
> > Thanks,
> > Kyrill
> >
> >> jeff
> >>





Re: [PATCH/AARCH64] Remove index from AARCH64_FUSION_PAIR

2015-08-19 Thread Andrew Pinski
On Wed, Aug 19, 2015 at 7:39 PM, James Greenhalgh
 wrote:
> On Wed, Aug 19, 2015 at 12:11:04PM +0100, Andrew Pinski wrote:
>> Instead of doing an explicit index in aarch64-fusion-pairs.def, we
>> should have an enum which does the index instead.  This allows
>> you to add/remove them without worrying about the order being
>> correct and having holes or worry about merge conflicts.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>
>
> Looks good, it would be good to expand this patch to get rid of the
> messy way we build the AARCH64_FUSE_ALL macro:
>
>> /* Hacky macro to build AARCH64_FUSE_ALL.  The sequence below expands
>>to:
>>AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
>> #undef AARCH64_FUSION_PAIR
>> #define AARCH64_FUSION_PAIR(x, name, y) \
>>   | AARCH64_FUSE_##name
>>
>>   AARCH64_FUSE_ALL = 0
>> #include "aarch64-fusion-pairs.def"
>
> We should now be able to do something like:
>
>   AARCH64_FUSE_ALL = ((1 << AARCH64_FUSE_index_END) - 1)
>
> Right?

Yes I actually thought of that after I had submitted the patch.

>
> If so, could you respin with that change?

Respinning this patch and the one for AARCH64_EXTRA_TUNING_OPTION.

Thanks,
Andrew

>
> Thanks,
> James
>
>
>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> index a7b00f6..53bbef4 100644
>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> @@ -20,19 +20,17 @@
>>  /* Pairs of instructions which can be fused. before including this file,
>> define a macro:
>>
>> - AARCH64_FUSION_PAIR (name, internal_name, index_bit)
>> + AARCH64_FUSION_PAIR (name, internal_name)
>>
>> Where:
>>
>>   NAME is a string giving a friendly name for the instructions to fuse.
>>   INTERNAL_NAME gives the internal name suitable for appending to
>> - AARCH64_FUSE_ to give an enum name.
>> - INDEX_BIT is the bit to set in the bitmask of supported fusion
>> - operations.  */
>> -
>> -AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK, 0)
>> -AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD, 1)
>> -AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK, 2)
>> -AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR, 3)
>> -AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH, 4)
>> + AARCH64_FUSE_ to give an enum name. */
>> +
>> +AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK)
>> +AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
>> +AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
>> +AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
>> +AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 0b09d49..c4c1817 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -201,8 +201,18 @@ struct tune_params
>>unsigned int extra_tuning_flags;
>>  };
>>
>> -#define AARCH64_FUSION_PAIR(x, name, index) \
>> -  AARCH64_FUSE_##name = (1 << index),
>> +#define AARCH64_FUSION_PAIR(x, name) \
>> +  AARCH64_FUSE_##name##_index,
>> +/* Supported fusion operations.  */
>> +enum aarch64_fusion_pairs_index
>> +{
>> +#include "aarch64-fusion-pairs.def"
>> +  AARCH64_FUSE_index_END
>> +};
>> +#undef AARCH64_FUSION_PAIR
>> +
>> +#define AARCH64_FUSION_PAIR(x, name) \
>> +  AARCH64_FUSE_##name = (1 << AARCH64_FUSE_##name##_index),
>>  /* Supported fusion operations.  */
>>  enum aarch64_fusion_pairs
>>  {
>> @@ -213,7 +223,7 @@ enum aarch64_fusion_pairs
>> to:
>> AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
>>  #undef AARCH64_FUSION_PAIR
>> -#define AARCH64_FUSION_PAIR(x, name, y) \
>> +#define AARCH64_FUSION_PAIR(x, name) \
>>| AARCH64_FUSE_##name
>>
>>AARCH64_FUSE_ALL = 0
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index aa268ae..162e25e 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -172,7 +172,7 @@ struct aarch64_flag_desc
>>unsigned int flag;
>>  };
>>
>> -#define AARCH64_FUSION_PAIR(name, internal_name, y) \
>> +#define AARCH64_FUSION_PAIR(name, internal_name) \
>>{ name, AARCH64_FUSE_##internal_name },
>>  static const struct aarch64_flag_desc aarch64_fusible_pairs[] =
>>  {
>
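
The scheme discussed above (an index enum generated from the list, then a
bitmask enum derived from it, with the ALL mask computed from the end
marker) can be sketched standalone as follows.  To keep it self-contained,
a hypothetical X-macro list TOY_FUSION_PAIRS stands in for including
aarch64-fusion-pairs.def, and all TOY_ names are made up for the example:

```c
/* Hypothetical stand-in for the .def file: one entry per fusion pair,
   with no explicit index.  */
#define TOY_FUSION_PAIRS(X) \
  X ("mov+movk", MOV_MOVK) \
  X ("adrp+add", ADRP_ADD) \
  X ("movk+movk", MOVK_MOVK) \
  X ("adrp+ldr", ADRP_LDR) \
  X ("cmp+branch", CMP_BRANCH)

/* First pass: each pair gets a consecutive index in list order.  */
#define TOY_INDEX_ENTRY(str, name) TOY_FUSE_##name##_index,
enum toy_fusion_pairs_index
{
  TOY_FUSION_PAIRS (TOY_INDEX_ENTRY)
  TOY_FUSE_index_END
};

/* Second pass: each pair gets the bit selected by its index, and the
   ALL mask covers every bit below the end marker.  */
#define TOY_MASK_ENTRY(str, name) \
  TOY_FUSE_##name = (1 << TOY_FUSE_##name##_index),
enum toy_fusion_pairs
{
  TOY_FUSION_PAIRS (TOY_MASK_ENTRY)
  TOY_FUSE_ALL = (1 << TOY_FUSE_index_END) - 1
};
```

Adding or removing a list entry renumbers the bits automatically, so there
are no holes to maintain and TOY_FUSE_ALL needs no separate expansion.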


Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 5:51 AM, Segher Boessenkool
 wrote:
> On Wed, Aug 19, 2015 at 05:23:41AM -0700, H.J. Lu wrote:
>> >>> >> > You might have a reason why you want the entry stack address 
>> >>> >> > instead of the
>> >>> >> > frame address, but you didn't really explain I think?  Or I missed 
>> >>> >> > it.
>> >>
>> >> What would a C program do with this, that it cannot do with the frame
>> >> address, that would be useful and cannot be much better done in straight
>> >> assembler?  Do you actually want to expose the argument pointer, maybe?
>> >
>> > Yes, we want to use the argument pointer as shown in testcases
>> > included in my patch.
>>
>> Where do we stand on this?  We need the hard stack address at
>> function entry for x86 without using frame pointer.   I added
>> __builtin_stack_top since __builtin_frame_address can't give
>> us what we want.  Should __builtin_stack_top be added to
>> middle-end or x86 backend?
>
> Sorry for not following up; I thought my suggestion was obvious.
>
> Can you do a __builtin_argument_pointer instead?  That should work
> for all targets, afaics?

To me, stack top is easier to understand and argument pointer isn't
very clear.  Does argument pointer exist when there is no argument?

But I can live with it.  I will update my patch.

Thanks.

-- 
H.J.


[COMMITTED][AArch64] Cleanup whitespace in aarch64.c

2015-08-19 Thread Jiong Wang

These whitespace characters were introduced by my commit r225017. They
should be replaced with tabs according to the GNU coding style.

Committed as obvious (r227005), after a cross build of aarch64-elf was OK.

2015-08-19  Jiong Wang  
gcc/
  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
  Replace whitespaces with tab.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aa268ae..0f3be3c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -931,7 +931,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	   The generate instruction sequence for accessing global variable
 	   is:
 
-	 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym]
+		 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym]
 
 	   Only one instruction needed. But we must initialize
 	   pic_offset_table_rtx properly.  We generate initialize insn for
@@ -940,12 +940,12 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	   The final instruction sequences will look like the following
 	   for multiply global variables access.
 
-	 adrp pic_offset_table_rtx, _GLOBAL_OFFSET_TABLE_
+		 adrp pic_offset_table_rtx, _GLOBAL_OFFSET_TABLE_
 
-	 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym1]
-	 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym2]
-	 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym3]
-	 ...  */
+		 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym1]
+		 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym2]
+		 ldr reg, [pic_offset_table_rtx, #:gotpage_lo15:sym3]
+		 ...  */
 
 	rtx s = gen_rtx_SYMBOL_REF (Pmode, "_GLOBAL_OFFSET_TABLE_");
 	crtl->uses_pic_offset_table = 1;


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Sandiford
Richard Biener  writes:
> As an additional point for many math functions we have to support errno
> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect.  But then code has to handle
> both variants for things like constant folding and expression combining.
> That's very unfortunate and something we want to avoid (one reason
> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
> is an example where this doesn't apply, of course (but I detest the name,
> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.

Yeah, but match.pd makes that easy, right? ;-)



Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Paolo Carlini

... I'm committing the below. Tested x86_64-linux.

Thanks,
Paolo.

2015-08-19  Paolo Carlini  

* include/c_global/cmath: Revert fix for libstdc++/58625, no
longer necessary (__builtin_signbit is now type-generic).
Index: include/c_global/cmath
===
--- include/c_global/cmath  (revision 227003)
+++ include/c_global/cmath  (working copy)
@@ -650,10 +650,10 @@
 isnormal(_Tp __x)
 { return __x != 0 ? true : false; }
 
-  // The front-end doesn't provide a type generic builtin (libstdc++/58625).
+  // Note: c++/36757 is fixed, __builtin_signbit is type-generic.
   constexpr bool
   signbit(float __x)
-  { return __builtin_signbitf(__x); }
+  { return __builtin_signbit(__x); }
 
   constexpr bool
   signbit(double __x)
@@ -661,7 +661,7 @@
 
   constexpr bool
   signbit(long double __x)
-  { return __builtin_signbitl(__x); }
+  { return __builtin_signbit(__x); }
 
   template<typename _Tp>
 constexpr typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
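
As a small illustration of what the change above relies on, the sketch
below calls the same builtin on all three floating-point types from C.  It
assumes a compiler in which __builtin_signbit is type-generic, as described
in this thread; the wrapper names are hypothetical:

```c
/* Each wrapper returns 1 when the sign bit of X is set and 0 otherwise.
   The same builtin name is used for float, double and long double, which
   assumes the type-generic behaviour discussed in this thread.  */
int
sign_of_float (float x)
{
  return __builtin_signbit (x) != 0;
}

int
sign_of_double (double x)
{
  return __builtin_signbit (x) != 0;
}

int
sign_of_long_double (long double x)
{
  return __builtin_signbit (x) != 0;
}
```

The result is normalised with != 0 because the builtin is only documented to
return a nonzero value when the sign bit is set.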


Re: [PR64164] drop copyrename, integrate into expand

2015-08-19 Thread Andreas Schwab
Alexandre Oliva  writes:

> [PR64164] fix regressions reported on m68k and armeb
>
> From: Alexandre Oliva 
>
> Defer stack slot address assignment for all parms that can't live in
> pseudos, and accept pseudos assignments in assign_param_setup_block.

That doesn't fix the ia64 Ada miscompilation though.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-19 Thread Jeff Law

On 08/19/2015 06:29 AM, David Sherwood wrote:

I asked Richard S. to give this a once-over which he did.  However, he
technically can't approve due to the way his maintainership position was
worded.

The one request would be a function comment for emit_mode_unit_size and
emit_mode_unit_precision.  OK with that change.

Thanks. Here's a new patch with the comments added.

Good to go?
David.

ChangeLog:

2015-08-19  David Sherwood  

gcc/
* genmodes.c (emit_mode_unit_size_inline): New function.
(emit_mode_unit_precision_inline): New function.
(emit_insn_modes_h): Emit new #define.  Emit new functions.
(emit_mode_unit_size): New function.
(emit_mode_unit_precision): New function.
(emit_mode_adjustments): Add mode_unit_size adjustments.
(emit_insn_modes_c): Emit new arrays.
* machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
use new inline methods.


Thanks, this is OK for the trunk.

jeff



Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Biener
On Wed, Aug 19, 2015 at 3:06 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> As an additional point for many math functions we have to support errno
>> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> only if -fno-math-errno is in effect.  But then code has to handle
>> both variants for things like constant folding and expression combining.
>> That's very unfortunate and something we want to avoid (one reason
>> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
>> is an example where this doesn't apply, of course (but I detest the name,
>> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
>> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>
> Yeah, but match.pd makes that easy, right? ;-)

Sure, but that only addresses stmt combining, not other passes.  And of course
it causes {gimple,generic}-match.c to become even bigger ;)

Richard.
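
As background for the thread subject (my own summary, not something stated
in the messages above): C's fmin returns the other operand when exactly one
operand is a NaN, which a plain comparison-based minimum does not do.  The
sketch below shows the difference with two hypothetical helpers, naive_min
and fmin_like.  It ignores signed zeros and is not the proposed tree code:

```c
/* True when X is a NaN; a NaN is the only value that differs from itself.  */
int
is_nan (double x)
{
  return x != x;
}

/* Comparison-based minimum.  Any comparison with a NaN is false, so the
   result depends on which operand the NaN is.  */
double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

/* fmin-style minimum: when exactly one operand is a NaN, return the
   other one.  Signed zeros are not treated specially in this sketch.  */
double
fmin_like (double a, double b)
{
  if (is_nan (a))
    return b;
  if (is_nan (b))
    return a;
  return a < b ? a : b;
}
```

With a NaN as the second operand naive_min returns the NaN, while fmin_like
returns the number, so the two cannot be freely interchanged.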


[PATCH][AArch64] Fix FAIL: gcc.target/aarch64/target_attr_crypto_ice_1.c (internal compiler error)

2015-08-19 Thread Kyrill Tkachov

Hi all,

This fixes the ICE exposed by Alexandre's patch 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00873.html)
The solution I came up with is to re-layout the parameter decls not during 
expansion time (when RTL has already
been allocated to SSA names) but in TARGET_SET_CURRENT_FUNCTION which is called 
much earlier before that and is
used when setting cfun. This way we reach expand with the proper vector modes 
registered for the param decls
and all seems to work ok.

The aarch64-builtins.c workarounds that I initially introduced in 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02012.html
are partially reverted (at least the re-laying out parts).

The patch fixes the target_attr_crypto_ice_1.c ICE but I'd like to add a second 
derived testcase that
tests a different expansion path and it has proved useful in writing this patch.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-08-19  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_set_current_function):
Re-layout any vector parameters have non-simd layout.
* config/aarch64/aarch64-builtins.c (aarch64_relayout_simd_param):
Delete.
(aarch64_simd_expand_args): Delete call to the above.

2015-08-19  Kyrylo Tkachov  

* gcc.target/aarch64/target_attr_crypto_ice_2.c: New test.
commit 94093e43f5bc91f3afa1002f41dcd423e8db3237
Author: Kyrylo Tkachov 
Date:   Tue Aug 18 17:02:26 2015 +0100

[AArch64] Re-layout vector parameter DECLs in TARGET_SET_CURRENT_FUNCTION

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 0f4f2b9..e3a90b5 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -886,30 +886,6 @@ typedef enum
   SIMD_ARG_STOP
 } builtin_simd_arg;
 
-/* Relayout the decl of a function arg.  Keep the RTL component the same,
-   as varasm.c ICEs.  It doesn't like reinitializing the RTL
-   on PARM decls.  Something like this needs to be done when compiling a
-   file without SIMD and then tagging a function with +simd and using SIMD
-   intrinsics in there.  The types will have been laid out assuming no SIMD,
-   so we want to re-lay them out.  */
-
-static void
-aarch64_relayout_simd_param (tree arg)
-{
-  tree argdecl = arg;
-  if (TREE_CODE (argdecl) == SSA_NAME)
-argdecl = SSA_NAME_VAR (argdecl);
-
-  if (argdecl
-  && (TREE_CODE (argdecl) == PARM_DECL
-	  || TREE_CODE (argdecl) == VAR_DECL))
-{
-  rtx rtl = NULL_RTX;
-  rtl = DECL_RTL_IF_SET (argdecl);
-  relayout_decl (argdecl);
-  SET_DECL_RTL (argdecl, rtl);
-}
-}
 
 static rtx
 aarch64_simd_expand_args (rtx target, int icode, int have_retval,
@@ -940,7 +916,6 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	{
 	  tree arg = CALL_EXPR_ARG (exp, opc - have_retval);
 	  enum machine_mode mode = insn_data[icode].operand[opc].mode;
-	  aarch64_relayout_simd_param (arg);
 	  op[opc] = expand_normal (arg);
 
 	  switch (thisarg)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 217b4d7..b16e511 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8073,6 +8073,23 @@ aarch64_set_current_function (tree fndecl)
 	  = save_target_globals_default_opts ();
 	}
 }
+
+  if (!fndecl)
+return;
+
+  /* If we turned on SIMD make sure that any vector parameters are re-laid out
+ so that they use proper vector modes.  */
+  if (TARGET_SIMD)
+{
+  tree parms = DECL_ARGUMENTS (fndecl);
+  for (; parms && parms != void_list_node; parms = TREE_CHAIN (parms))
+	{
+	  if (TREE_CODE (parms) == PARM_DECL
+	  && VECTOR_TYPE_P (TREE_TYPE (parms))
+	  && DECL_MODE (parms) != TYPE_MODE (TREE_TYPE (parms)))
+	relayout_decl (parms);
+	}
+}
 }
 
 /* Enum describing the various ways we can handle attributes.
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c
new file mode 100644
index 000..d6e7b68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=thunderx+nofp" } */
+
+/* Make sure that we don't ICE when dealing with vector parameters
+   in a simd-tagged function within a non-simd translation unit.  */
+
+#pragma GCC push_options
+#pragma GCC target ("+nothing+simd")
+typedef unsigned int __uint32_t;
+typedef __uint32_t uint32_t ;
+typedef __Uint32x4_t uint32x4_t;
+#pragma GCC pop_options
+
+
+__attribute__ ((target ("cpu=cortex-a57")))
+uint32x4_t
+foo (uint32x4_t a, uint32_t b, uint32x4_t c)
+{
+  return c;
+}


Re: [PR64164] drop copyrename, integrate into expand

2015-08-19 Thread Andreas Schwab
Andreas Schwab  writes:

> Alexandre Oliva  writes:
>
>> [PR64164] fix regressions reported on m68k and armeb
>>
>> From: Alexandre Oliva 
>>
>> Defer stack slot address assignment for all parms that can't live in
>> pseudos, and accept pseudos assignments in assign_param_setup_block.
>
> That doesn't fix the ia64 Ada miscompilation though.

I mean miscomparison, not miscompilation.  The difference is only in the
insn scheduling.

--- x1  2015-08-19 15:26:41.0 +0200
+++ x2  2015-08-19 15:26:46.0 +0200
@@ -1,5 +1,5 @@
 
-stage2-gcc/ada/par.o: file format elf64-ia64-little
+stage3-gcc/ada/par.o: file format elf64-ia64-little
 
 
 Disassembly of section .text:
@@ -29467,25 +29467,25 @@
214b2: PCREL21B atree__new_node
214b6:  00 00 00 02 00 00   nop.i 0x0
214bc:  08 00 00 50 br.call.sptk.many b0=214b0 

-   214c0:  08 78 e0 01 80 24   [MMI]   mov r15=16504
-   214c6:  e0 80 03 00 49 20   mov r14=16496
-   214cc:  00 06 04 92 mov r1=16608
-   214d0:  0a 80 23 00 08 20   [MMI]   addp4 r112=r8,r0;;
-   214d6:  f0 78 30 00 40 c0   add r15=r15,r12
-   214dc:  e1 60 00 80 add r14=r14,r12
-   214e0:  0a 08 04 18 00 20   [MMI]   add r1=r1,r12;;
-   214e6:  f0 00 3c 20 20 00   ld4 r15=[r15]
-   214ec:  00 00 04 00 nop.i 0x0
+   214c0:  08 70 c0 01 80 24   [MMI]   mov r14=16496
+   214c6:  00 00 00 02 00 e0   nop.m 0x0
+   214cc:  81 07 00 92 mov r15=16504
+   214d0:  09 08 80 01 81 24   [MMI]   mov r1=16608
+   214d6:  00 00 00 02 00 00   nop.m 0x0
+   214dc:  8e 00 20 80 addp4 r112=r8,r0;;
+   214e0:  09 70 38 18 00 20   [MMI]   add r14=r14,r12
+   214e6:  f0 78 30 00 40 20   add r15=r15,r12
+   214ec:  10 60 00 80 add r1=r1,r12;;
214f0:  09 00 20 1c 90 11   [MMI]   st4 [r14]=r8
214f6:  10 00 04 30 20 00   ld8 r1=[r1]
214fc:  00 00 04 00 nop.i 0x0;;
-   21500:  01 00 00 00 01 00   [MII]   nop.m 0x0
-   21506:  e0 00 3c 2c 00 e0   sxt4 r14=r15
-   2150c:  01 61 00 84 adds r15=16,r12;;
-   21510:  0b 70 38 00 11 20   [MMI]   shladd r14=r14,2,r0;;
-   21516:  e0 78 38 00 40 00   add r14=r15,r14
+   21500:  02 78 00 1e 10 10   [MII]   ld4 r15=[r15]
+   21506:  00 00 00 02 00 c0   nop.i 0x0;;
+   2150c:  01 78 58 00 sxt4 r14=r15
+   21510:  0b 78 40 18 00 21   [MMI]   adds r15=16,r12;;
+   21516:  e0 70 00 22 40 00   shladd r14=r14,2,r0
2151c:  00 00 04 00 nop.i 0x0;;
-   21520:  09 00 00 00 01 00   [MMI]   nop.m 0x0
+   21520:  0b 70 3c 1c 00 20   [MMI]   add r14=r15,r14;;
21526:  e0 e0 3b 7e 46 00   adds r14=-4,r14
2152c:  00 00 04 00 nop.i 0x0;;
21530:  10 88 03 1c 10 10   [MIB]   ld4 r113=[r14]


Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-19 Thread Richard Biener
On Tue, 18 Aug 2015, Aldy Hernandez wrote:

> On 08/18/2015 07:20 AM, Richard Biener wrote:
> > 
> > This starts a series of patches (still in development) to refactor
> > dwarf2out.c to better cope with early debug (and LTO debug).
> 
> Awesome!  Thanks.
> 
> > Aldyh, what other testing did you usually do for changes?  Run
> > the gdb testsuite against the new compiler?  Anything else?
> 
> gdb testsuite, and make sure you test GCC with --enable-languages=all,go,ada,
> though the latter is mostly useful while you iron out bugs initially.  I found
> that ultimately, the best test was C++.

I see.

> Pre merge I also bootstrapped the compiler and compared .debug* section sizes
> in object files to make sure things were within reason.
> 
> > +
> > +static void
> > +vmsdbgout_early_finish (const char *filename ATTRIBUTE_UNUSED)
> > +{
> > +  if (write_symbols == VMS_AND_DWARF2_DEBUG)
> > +(*dwarf2_debug_hooks.early_finish) (filename);
> > +}
> 
> You can get rid of ATTRIBUTE_UNUSED now.

Done.  I've also refrained from moving

  gen_scheduled_generic_parms_dies ();
  gen_remaining_tmpl_value_param_die_attribute ();

for now as that causes regressions I have to investigate.

The patch below has passed bootstrap & regtest on x86_64-unknown-linux-gnu
as well as gdb testing.  Twice unpatched, twice patched - results seem
to be somewhat unstable!?  I even refrained from using any -j with
make check-gdb...  maybe it's just contrib/test_summary not coping well
with gdb?  any hints?  Difference between unpatched run 1 & 2 is
for example

--- results.unpatched   2015-08-19 15:08:36.152899926 +0200
+++ results.unpatched2  2015-08-19 15:29:46.902060797 +0200
@@ -209,7 +209,6 @@
 WARNING: remote_expect statement without a default case?!
 WARNING: remote_expect statement without a default case?!
 WARNING: remote_expect statement without a default case?!
-FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, fc4)
 FAIL: gdb.cp/inherit.exp: print g_vD
 FAIL: gdb.cp/inherit.exp: print g_vE
 FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)'
@@ -238,6 +237,7 @@
 UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings
 FAIL: gdb.fortran/whatis_type.exp: run to MAIN__
 WARNING: remote_expect statement without a default case?!
+FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt
 WARNING: remote_expect statement without a default case?!
 WARNING: remote_expect statement without a default case?!
 WARNING: remote_expect statement without a default case?!
@@ -362,12 +362,12 @@
=== gdb Summary ===
 
-# of expected passes            30881
+# of expected passes            30884
 # of unexpected failures        284
 # of unexpected successes       2
-# of expected failures          85
+# of expected failures          83
 # of unknown successes          2
-# of known failures             60
+# of known failures             59
 # of unresolved testcases       6
 # of untested testcases         32
 # of unsupported tests          165

The same changes randomly appear/disappear in the patched case.
Otherwise patched/unpatched agree.

Ok?

Thanks,
Richard.

2015-08-18  Richard Biener  

* debug.h (gcc_debug_hooks::early_finish): Add filename argument.
* dbxout.c (dbx_debug_hooks): Adjust.
* debug.c (do_nothing_hooks): Likewise.
* sdbout.c (sdb_debug_hooks): Likewise.
* vmsdbgout.c (vmsdbgout_early_finish): New function dispatching
to dwarf2out variant if needed.
(vmsdbg_debug_hooks): Adjust.
* dwarf2out.c (dwarf2_line_hooks): Adjust.
(flush_limbo_die_list): New function.
(dwarf2out_finish): Call flush_limbo_die_list instead of
dwarf2out_early_finish.  Assert there are no deferred asm-names.
Move early stuff ...
(dwarf2out_early_finish): ... here.
* cgraphunit.c (symbol_table::finalize_compilation_unit):
Call early_finish with main_input_filename argument.


Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c(revision 226966)
+++ gcc/cgraphunit.c(working copy)
@@ -2490,7 +2490,7 @@ symbol_table::finalize_compilation_unit
 
   /* Clean up anything that needs cleaning up after initial debug
  generation.  */
-  (*debug_hooks->early_finish) ();
+  (*debug_hooks->early_finish) (main_input_filename);
 
   /* Finally drive the pass manager.  */
   compile ();
Index: gcc/dbxout.c
===
--- gcc/dbxout.c(revision 226966)
+++ gcc/dbxout.c(working copy)
@@ -354,7 +354,7 @@ const struct gcc_debug_hooks dbx_debug_h
 {
   dbxout_init,
   dbxout_finish,
-  debug_nothing_void,
+  debug_nothing_charstar,
   debug_nothing_void,
   debug_nothing_int_charstar,
   debug_nothing_int_charstar,
Index: gcc/debug.c
===
--- gcc/debug.c (revision 226966)
+++ gcc

Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-19 Thread Aldy Hernandez

On 08/19/2015 06:45 AM, Richard Biener wrote:

[copying gdb folks]


On Tue, 18 Aug 2015, Aldy Hernandez wrote:


On 08/18/2015 07:20 AM, Richard Biener wrote:


[snip]


The patch below has passed bootstrap & regtest on x86_64-unknown-linux-gnu
as well as gdb testing.  Twice unpatched, twice patched - results seem
to be somewhat unstable!?  I even refrained from using any -j with
make check-gdb...  maybe it's just contrib/test_summary not coping well
with gdb?  any hints?  Difference between unpatched run 1 & 2 is
for example

--- results.unpatched   2015-08-19 15:08:36.152899926 +0200
+++ results.unpatched2  2015-08-19 15:29:46.902060797 +0200
@@ -209,7 +209,6 @@
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
-FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, fc4)
  FAIL: gdb.cp/inherit.exp: print g_vD
  FAIL: gdb.cp/inherit.exp: print g_vE
  FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)'
@@ -238,6 +237,7 @@
  UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings
  FAIL: gdb.fortran/whatis_type.exp: run to MAIN__
  WARNING: remote_expect statement without a default case?!
+FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
@@ -362,12 +362,12 @@
 === gdb Summary ===

-# of expected passes            30881
+# of expected passes            30884
 # of unexpected failures        284
 # of unexpected successes       2
-# of expected failures          85
+# of expected failures          83
 # of unknown successes          2
-# of known failures             60
+# of known failures             59
 # of unresolved testcases       6
 # of untested testcases         32
 # of unsupported tests          165

The same changes randomly appear/disappear in the patched case.
Otherwise patched/unpatched agree.



This is somewhat expected. Well, at least I never found a good 
explanation. Some tests seemed to be thread related and inconsistent. 
Others, I have no idea.


After running the tests enough times I got a feeling for which tests
would always pass, and used those as a reference. It was confusing at
first. Perhaps the GDB folks could shed some light? I've found them very
helpful.


Also, -j made things worse. I never used it.

Aldy



Re: [PATCH] Fix middle-end/67133, part 1

2015-08-19 Thread Jeff Law

On 08/18/2015 01:49 PM, Marek Polacek wrote:

On Tue, Aug 18, 2015 at 10:45:21AM +0200, Richard Biener wrote:

On Mon, Aug 17, 2015 at 7:31 PM, Jeff Law  wrote:

But in walking through all that, I think I've stumbled on a simpler
solution.  Specifically do as little as possible and let the standard
mechanisms clean things up :-)

1. Delete the code that removes instructions after the trap.

2. Split the block immediately after the trap and remove the edge
from the original block (with the trap) to the new block.


cfg-cleanup will do that for you if you have a not returning stmt ending
the previous block.


The following patch hopefully does what's outlined above.
Arguably I should have renamed the insert_trap_and_remove_trailing_statements
to something more descriptive, e.g. insert_trap_and_split_block.  Your
call.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-08-18  Marek Polacek  

PR middle-end/67133
* gimple-ssa-isolate-paths.c
(insert_trap_and_remove_trailing_statements): Rename to ...
(insert_trap): ... this.  Don't remove trailing statements; split
block instead.
(find_explicit_erroneous_behaviour): Don't remove all outgoing edges.

* g++.dg/torture/pr67133.C: New test.

Looks good to me too.
jeff



[PATCH][2/n] Change dw2_asm_output_offset to allow assembling extra offset

2015-08-19 Thread Richard Biener

This is needed so that we can output references to $early-debug-symbol +
constant offset, where $early-debug-symbol is the beginning of a
.debug_info section containing early debug info from the compile stage.
Constant offsets are fine for every object format I know of; I
tested ia64-linux in addition to x86_64-linux.  I have access to darwin at
home (x86_64), so I can try there as well.

The question is whether we want to assemble "+" HOST_WIDE_INT_PRINT_DEC
directly for the non-ASM_OUTPUT_DWARF_OFFSET case as well, as opposed
to building a PLUS (we do build a SYMBOL_REF already).
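
To make the "assemble directly" alternative concrete, here is a minimal
sketch in plain C.  It is not GCC code: format_symbol_offset is a
hypothetical helper, and long long stands in for HOST_WIDE_INT.  It only
shows the text of a symbol + constant offset operand, leaving out the
data directive that a real target would emit in front of it.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: write "LABEL" when OFFSET is
   zero, otherwise "LABEL+N" or "LABEL-N", into BUF.  Returns what
   snprintf returns.  long long stands in for HOST_WIDE_INT here.  */
int
format_symbol_offset (char *buf, size_t len, const char *label,
		      long long offset)
{
  if (offset == 0)
    return snprintf (buf, len, "%s", label);

  /* The '+' flag makes printf emit an explicit sign for both cases.  */
  return snprintf (buf, len, "%s%+lld", label, offset);
}
```

With a label such as .Ldebug_info0 (an assumed name) and an offset of 4,
the buffer holds ".Ldebug_info0+4"; with a zero offset the output is the
same as before the change.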

I've also refrained from changing all existing callers of
dw2_asm_output_offset to pass a 0 offset argument, which would avoid the
overloading; please tell me if you prefer that.  The LTO support
adds a single call here:

@@ -9064,8 +9248,12 @@ output_die (dw_die_ref die)
                size = DWARF2_ADDR_SIZE;
              else
                size = DWARF_OFFSET_SIZE;
-             dw2_asm_output_offset (size, sym, debug_info_section, "%s",
-                                    name);
+             if (AT_ref (a)->with_offset)
+               dw2_asm_output_offset (size, sym, AT_ref (a)->die_offset,
+                                      debug_info_section, "%s", name);
+             else
+               dw2_asm_output_offset (size, sym, debug_info_section, "%s",
+                                      name);
            }

(ignore that ->with_offset, it can hopefully go if die_offset is zero
for other cases - just checking with an assert right now).

Bootstrap & regtest pending on x86_64-unknown-linux-gnu.  The patch
is an effective no-op currently (the offset != 0 condition is always
false for now).

Ok?

CCing affected target maintainers - I've verified this on ia64
by just generating assembly for a simple testcase with -g and
changing one of the generated section-relative offsets in the
suggested way (adding "+4"), passing through the assembler and
inspecting the resulting relocations.

Thanks,
Richard.

2015-08-19  Richard Biener  

* dwarf2asm.h (dw2_asm_output_offset): Add overload with
extra offset argument.
* dwarf2asm.c (dw2_asm_output_offset): Implement that.
* doc/tm.texi.in (ASM_OUTPUT_DWARF_OFFSET): Adjust documentation
to reflect new offset parameter.
* doc/tm.texi: Regenerate.
* config/darwin.h (ASM_OUTPUT_DWARF_OFFSET): Adjust.
* config/darwin-protos.h (darwin_asm_output_dwarf_delta): Add
offset argument.
(darwin_asm_output_dwarf_offset): Likewise.
* config/darwin.c (darwin_asm_output_dwarf_delta): Add offset
argument.
(darwin_asm_output_dwarf_offset): Pass offset argument through.
* config/ia64/ia64.h (ASM_OUTPUT_DWARF_OFFSET): Adjust.
* config/i386/cygmin.h (ASM_OUTPUT_DWARF_OFFSET): Likewise.

Index: gcc/dwarf2asm.c
===
--- gcc/dwarf2asm.c (revision 226856)
+++ gcc/dwarf2asm.c (working copy)
@@ -33,6 +33,8 @@ along with GCC; see the file COPYING3.
 #include "dwarf2asm.h"
 #include "dwarf2.h"
 #include "tm_p.h"
+#include "function.h"
+#include "emit-rtl.h"
 
 
 /* Output an unaligned integer with the given value and size.  Prefer not
@@ -190,12 +192,39 @@ dw2_asm_output_offset (int size, const c
   va_start (ap, comment);
 
 #ifdef ASM_OUTPUT_DWARF_OFFSET
-  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, base);
+  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, 0, base);
 #else
   dw2_assemble_integer (size, gen_rtx_SYMBOL_REF (Pmode, label));
 #endif
 
   if (flag_debug_asm && comment)
+{
+  fprintf (asm_out_file, "\t%s ", ASM_COMMENT_START);
+  vfprintf (asm_out_file, comment, ap);
+}
+  fputc ('\n', asm_out_file);
+
+  va_end (ap);
+}
+
+void
+dw2_asm_output_offset (int size, const char *label, HOST_WIDE_INT offset,
+  section *base ATTRIBUTE_UNUSED,
+  const char *comment, ...)
+{
+  va_list ap;
+
+  va_start (ap, comment);
+
+#ifdef ASM_OUTPUT_DWARF_OFFSET
+  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, offset, base);
+#else
+  dw2_assemble_integer (size, gen_rtx_PLUS (Pmode,
+   gen_rtx_SYMBOL_REF (Pmode, label),
+   gen_int_mode (offset, Pmode)));
+#endif
+
+  if (flag_debug_asm && comment)
 {
   fprintf (asm_out_file, "\t%s ", ASM_COMMENT_START);
   vfprintf (asm_out_file, comment, ap);
Index: gcc/dwarf2asm.h
===
--- gcc/dwarf2asm.h (revision 226856)
+++ gcc/dwarf2asm.h (working copy)
@@ -40,6 +40,10 @@ extern void dw2_asm_output_offset (int,
   const char *, ...)
  ATTRIBUTE_NULL_PRINTF_4;
 
+extern void dw2_asm_output_offset (int, const char *, HOST_WIDE_INT,
+  section

[AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64

2015-08-19 Thread Jiong Wang

Marcus Shawcroft writes:

> On 21 May 2015 at 17:44, Jiong Wang  wrote:
>>
>> This patch adds the -mtls-size option for AArch64.  This option gives the
>> user finer control over code generation for the various TLS models on AArch64.
>>
>> For example, for TLS LE, the user can specify a smaller tls-size, such as
>> 4K, which is quite usual, to let the AArch64 backend generate more efficient
>> instruction sequences.
>>
>> Currently, -mtls-size accepts any integer, then translates it into
>> 12(4K), 24(16M), 32(4G), 48(256TB) based on the value.
>>
>> No functional change.
>>
>> ok for trunk?
>>
>> 2015-05-20  Jiong Wang  
>>
>> gcc/
>>   * config/aarch64/aarch64.opt (mtls-size): New entry.
>>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
>>   * doc/invoke.texi (AArch64 Options): Document -mtls-size.
>
> +mtls-size=
> +Target RejectNegative Joined UInteger Var(aarch64_tls_size) Init(24)
> +Specifies size of the TLS data area, default size is 16M. Accept any
> integer, but the value
> +will be transformed into 12(4K), 24(16M), 32(4G), 48(256TB)
> +
>
> Can we follow the mechanism used by rs6000 and limit the accepted
> values here using an Enum to just the valid values: 12, 24, 32, 48?

Done.

>
> +@item -mtls-size=@var{size}
> +@opindex mtls-size
> +Specify the size of TLS area. You can specify smaller value to get better code
> +generation for TLS variable access. Currently, we accept any integer, but will
> +turn them into 12(4K), 24(16M), 32(4G), 48(256TB) according to the integer
> +value.
> +
>
> How about:
> "Specify bit size of immediate TLS offsets.  Valid values are 12, 24,
> 32, 48."

Done.

Patch updated, please review, thanks.
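
For readers skimming the thread, the intended behaviour can be sketched
in standalone C as below.  This is only an illustration, not the patch
itself: the enum and both function names are hypothetical stand-ins, and
the per-model caps (20, 32 and 48 bits) are the ones used in
initialize_aarch64_tls_size in the patch that follows.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the AArch64 code models named in the patch.  */
enum code_model { CMODEL_TINY, CMODEL_SMALL, CMODEL_LARGE };

/* True if BITS is one of the values the option is meant to accept.  */
bool
tls_size_valid_p (int bits)
{
  return bits == 12 || bits == 24 || bits == 32 || bits == 48;
}

/* Cap BITS at the per-model maximum: 20 bits (1M) for tiny,
   32 bits (4G) for small and 48 bits for large.  */
int
clamp_tls_size (enum code_model model, int bits)
{
  int max_bits = (model == CMODEL_TINY ? 20
		  : model == CMODEL_SMALL ? 32
		  : 48);
  return bits > max_bits ? max_bits : bits;
}
```

So a request for 48 bits under the small model would be capped to 32,
while 12 bits is left alone under every model.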

2015-08-19  Jiong Wang  

gcc/
  * config/aarch64/aarch64.opt (mtls-size): New entry.
  * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
  (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
  * doc/invoke.texi (AArch64 Options): Document -mtls-size.

-- 
Regards,
Jiong

From 4a244a1d4b32b1e10e5ba07c0c568f135648912e Mon Sep 17 00:00:00 2001
From: Jiong Wang 
Date: Wed, 19 Aug 2015 14:10:37 +0100
Subject: [PATCH 1/3] 1

---
 gcc/config/aarch64/aarch64.c   | 31 +++
 gcc/config/aarch64/aarch64.opt | 19 +++
 gcc/doc/invoke.texi|  5 +
 3 files changed, 55 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0f3be3c..f55cc38 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7506,6 +7506,36 @@ aarch64_parse_one_override_token (const char* token,
   return;
 }
 
+/* A checking mechanism for the implementation of the tls size.  */
+
+static void
+initialize_aarch64_tls_size (struct gcc_options *opts)
+{
+  switch (opts->x_aarch64_cmodel_var)
+{
+case AARCH64_CMODEL_TINY:
+  /* The maximum TLS size allowed under tiny is 1M.  */
+  if (aarch64_tls_size > 20)
+	aarch64_tls_size = 20;
+  break;
+case AARCH64_CMODEL_SMALL:
+  /* The maximum TLS size allowed under small is 4G.  */
+  if (aarch64_tls_size > 32)
+	aarch64_tls_size = 32;
+  break;
+case AARCH64_CMODEL_LARGE:
+  /* The maximum TLS size allowed under large is 16E.
+	 FIXME: 16E should be 64bit, we only support 48bit offset now.  */
+  if (aarch64_tls_size > 48)
+	aarch64_tls_size = 48;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  return;
+}
+
 /* Parse STRING looking for options in the format:
  string	:: option:string
  option	:: name=substring
@@ -7598,6 +7628,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
 }
 
   initialize_aarch64_code_model (opts);
+  initialize_aarch64_tls_size (opts);
 
   aarch64_override_options_after_change_1 (opts);
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 37c2c50..8642bdb 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -96,6 +96,25 @@ mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(aarch64_tls_dialect) Init(TLS_DESCRIPTORS) Save
 Specify TLS dialect
 
+mtls-size=
+Target RejectNegative Joined Var(aarch64_tls_size) Enum(aarch64_tls_size)
+Specifies bit size of immediate TLS offsets.  Valid values are 12, 24, 32, 48.
+
+Enum
+Name(aarch64_tls_size) Type(int)
+
+EnumValue
+Enum(aarch64_tls_size) String(12) Value(12)
+
+EnumValue
+Enum(aarch64_tls_size) String(24) Value(24)
+
+EnumValue
+Enum(aarch64_tls_size) String(32) Value(32)
+
+EnumValue
+Enum(aarch64_tls_size) String(48) Value(48)
+
 march=
 Target RejectNegative ToLower Joined Var(aarch64_arch_string)
 -march=ARCH	Use features of architecture ARCH
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27be317..c9f332c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -514,6 +514,7 @@ Objective-C and Objective-C++ Dialects}.
 -mstrict-align @gol
 -momit-leaf-frame-pointer  -mno-omit-leaf-frame-pointer @gol
 -mtls-dialect=desc  -mtls-dialect=tra

[AArch64][TLSLE][2/3] Add the option "-mtls-size" for AArch64

2015-08-19 Thread Jiong Wang

As we have added -mtls-size support, there should be four types of TLSLE
symbols:

  SYMBOL_TLSLE12
  SYMBOL_TLSLE24
  SYMBOL_TLSLE32
  SYMBOL_TLSLE48

which reflect the maximum number of address bits needed to address the symbol.

This patch renames SYMBOL_TLSLE to SYMBOL_TLSLE24.  Patch [3/3] will add
support for the other symbol types.

OK for trunk?

2015-08-19  Jiong Wang  

gcc/
  * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename
  SYMBOL_TLSLE to SYMBOL_TLSLE24.
  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Likewise.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_print_operand): Likewise.
  (aarch64_classify_tls_symbol): Likewise.

From 676fc22d51432b037a2c77ae9de01f934cc77985 Mon Sep 17 00:00:00 2001
From: Jiong Wang 
Date: Wed, 19 Aug 2015 14:12:57 +0100
Subject: [PATCH 2/3] 2

---
 gcc/config/aarch64/aarch64-protos.h |  4 ++--
 gcc/config/aarch64/aarch64.c| 12 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 0b09d49..daa45bf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -74,7 +74,7 @@ enum aarch64_symbol_context
SYMBOL_SMALL_TLSGD
SYMBOL_SMALL_TLSDESC
SYMBOL_SMALL_GOTTPREL
-   SYMBOL_TLSLE
+   SYMBOL_TLSLE24
Each of these represents a thread-local symbol, and corresponds to the
thread local storage relocation operator for the symbol being referred to.
 
@@ -111,7 +111,7 @@ enum aarch64_symbol_type
   SYMBOL_SMALL_GOTTPREL,
   SYMBOL_TINY_ABSOLUTE,
   SYMBOL_TINY_GOT,
-  SYMBOL_TLSLE,
+  SYMBOL_TLSLE24,
   SYMBOL_FORCE_TO_MEM
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f55cc38..87f8d96 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1115,7 +1115,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	return;
   }
 
-case SYMBOL_TLSLE:
+case SYMBOL_TLSLE24:
   {
 	rtx tp = aarch64_load_tp (NULL);
 
@@ -1677,7 +1677,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 
 	case SYMBOL_SMALL_ABSOLUTE:
 	case SYMBOL_TINY_ABSOLUTE:
-	case SYMBOL_TLSLE:
+	case SYMBOL_TLSLE24:
 	  aarch64_load_symref_appropriately (dest, imm, sty);
 	  return;
 
@@ -4560,7 +4560,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, ":gottprel:");
 	  break;
 
-	case SYMBOL_TLSLE:
+	case SYMBOL_TLSLE24:
 	  asm_fprintf (asm_out_file, ":tprel:");
 	  break;
 
@@ -4593,7 +4593,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, ":gottprel_lo12:");
 	  break;
 
-	case SYMBOL_TLSLE:
+	case SYMBOL_TLSLE24:
 	  asm_fprintf (asm_out_file, ":tprel_lo12_nc:");
 	  break;
 
@@ -4611,7 +4611,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 
   switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
-	case SYMBOL_TLSLE:
+	case SYMBOL_TLSLE24:
 	  asm_fprintf (asm_out_file, ":tprel_hi12:");
 	  break;
 	default:
@@ -8717,7 +8717,7 @@ aarch64_classify_tls_symbol (rtx x)
   return SYMBOL_SMALL_GOTTPREL;
 
 case TLS_MODEL_LOCAL_EXEC:
-  return SYMBOL_TLSLE;
+  return SYMBOL_TLSLE24;
 
 case TLS_MODEL_EMULATED:
 case TLS_MODEL_NONE:
-- 
1.9.1



Re: [PATCH][ARM]Tighten the conditions for arm_movw, arm_movt

2015-08-19 Thread Renlin Li

Hi Kyrylo,

On 19/08/15 13:46, Kyrylo Tkachov wrote:

Hi Renlin,

Please send patches to gcc-patches for review.
Redirecting there now...
Thank you! I should really double-check after Thunderbird auto-completes
the address for me.




On 19/08/15 12:49, Renlin Li wrote:

Hi all,

This simple patch tightens the conditions for matching the movw and
arm_movt rtx patterns.
Those two patterns generate the following assembly:

movw w1, #:lower16: dummy + addend
movt w1, #:upper16: dummy + addend

The addend here is optional.  However, it should be a 16-bit signed
value within the range -32768 <= A < 32768.

Imposing this restriction explicitly prevents the LRA/reload code
from generating invalid high/lo_sum code for the arm target.
In process_address_1(), if the address is not legitimate, it will try to
generate a high/lo_sum pair to put the address into a register.  It will
check whether the target supports those newly generated reload instructions.
With the restriction in those two patterns, arm will reject them if the
conditions are not met.

Otherwise, it might generate movw/movt instructions with an addend outside
that range, which causes a GAS error.  GAS produces an "offset out
of range" error message when the addend for a MOVW/MOVT REL relocation is
too large.

arm-none-eabi regression tests are okay.  Okay to commit to the trunk and
backport to 5.0?

Regards,
Renlin

gcc/ChangeLog:

2015-08-19  Renlin Li  

   * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
   * config/arm/arm.c (arm_valid_symbolic_address_p): Define.
   * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
   * config/arm/constraints.md ("j"): Add check for high code.


Is it guaranteed that at this point XEXP (tmp, 0) and XEXP (tmp, 1) are valid?
I think before you extract xop0 and xop1 you want to check that tmp is indeed a
PLUS and return false if it's not. Only then should you extract XEXP (tmp, 0)
and XEXP (tmp, 1).

  +  if (GET_CODE (tmp) == PLUS && GET_CODE (xop0) == SYMBOL_REF
+  && CONST_INT_P (xop1))
+{
+  HOST_WIDE_INT offset = INTVAL (xop1);
+  if (offset < -0x8000 || offset > 0x7fff)
+   return false;
+  else
+   return true;

I think you can just do "return IN_RANGE (offset, -0x8000, 0x7fff);"

Updated accordingly, please check the latest attachment.

Thank you,
Renlin

commit fb4329931a7895bcd8744d7378f6d291377f2c2e
Author: Renlin Li 
Date:   Mon Aug 17 12:24:25 2015 +0100

movw

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 16eb854..ebaf746 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -312,6 +312,7 @@ extern int vfp3_const_double_for_bits (rtx);
 
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 	   rtx);
+extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cf60313..d87eca1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28811,6 +28811,38 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx out, rtx in,
   #undef BRANCH
 }
 
+/* Returns true if the pattern is a valid symbolic address, which is either a
+   symbol_ref or (symbol_ref + addend).
+
+   According to the ARM ELF ABI, the initial addend of REL-type relocations
+   processing MOVW and MOVT instructions is formed by interpreting the 16-bit
+   literal field of the instruction as a 16-bit signed value in the range
+   -32768 <= A < 32768.  */
+
+bool
+arm_valid_symbolic_address_p (rtx addr)
+{
+  rtx xop0, xop1 = NULL_RTX;
+  rtx tmp = addr;
+
+  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
+return true;
+
+  /* (const (plus: symbol_ref const_int))  */
+  if (GET_CODE (addr) == CONST)
+tmp = XEXP (addr, 0);
+
+  if (GET_CODE (tmp) == PLUS)
+{
+  xop0 = XEXP (tmp, 0);
+  xop1 = XEXP (tmp, 1);
+
+  if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
+	  return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
+}
+
+  return false;
+}
 
 /* Returns true if a valid comparison operation and makes
the operands in a form that is valid.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f63fc39..7ac4f34 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5662,7 +5662,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
 		   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2"
+  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 42935a4..f9e11e0 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -67,7 +67,8 @@
 (define_constraint "j"
  "A constant suitabl

[AArch64][TLSLE][3/3] Implement local executable mode for all memory model

2015-08-19 Thread Jiong Wang

Marcus Shawcroft writes:

> On 21 May 2015 at 17:49, Jiong Wang  wrote:
>
>> 2015-05-14  Jiong Wang  
>> gcc/
>>   * config/aarch64/aarch64.c (aarch64_print_operand): Support tls_size.
>>   * config/aarch64/aarch64.md (tlsle): Choose proper instruction
>>   sequences.
>>   (tlsle_): New define_insn.
>>   (tlsle_movsym_): Ditto.
>>   * config/aarch64/constraints.md (Uta): New constraint.
>>   (Utb): Ditto.
>>   (Utc): Ditto.
>>   (Utd): Ditto.
>>
>> gcc/testsuite/
>>   * gcc.target/aarch64/tlsle.c: New test source.
>>   * gcc.target/aarch64/tlsle12.c: New testcase.
>>   * gcc.target/aarch64/tlsle24.c: New testcase.
>>   * gcc.target/aarch64/tlsle32.c: New testcase.
>>
>
>
>   case SYMBOL_TLSLE:
> -  asm_fprintf (asm_out_file, ":tprel_lo12_nc:");
> +  if (aarch64_tls_size <= 12)
> +/* Make sure TLS offset fit into 12bit.  */
> +asm_fprintf (asm_out_file, ":tprel_lo12:");
> +  else
> +asm_fprintf (asm_out_file, ":tprel_lo12_nc:");
>break;
>
> Use the existing classify_symbol mechanism we use throughout the
> aarch64 backend.  Specifically rename SYMBOL_TLSLE as SYMBOL_TLSLE24
> and introduce the 3 missing flavours then use the symbol
> classification to control behaviour such as this modifier selection.

Done.

Classified TLS symbols into the following sub-types according to the value
of tls size:

 SYMBOL_TLSLE12
 SYMBOL_TLSLE24
 SYMBOL_TLSLE32
 SYMBOL_TLSLE48

On AArch64, the instruction sequence for TLS LE under -mtls-size=32
uses the relocation modifier "tprel_g0_nc" together with MOVK, which has
only been supported in binutils since 2015-03-04 (PR gas/17843).  So I
adjusted tlsle32.c to make it robust by detecting whether there is such
binutils support.
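
One way to picture the mapping from tls size to the four sub-types
listed above is the standalone sketch below.  It is an assumption made
for illustration, not the code in the patch: the enum and function names
are hypothetical, and it simply picks the narrowest sub-type that covers
the requested number of bits.

```c
/* Hypothetical mirror of the four TLS LE sub-types.  */
enum tlsle_kind { TLSLE12, TLSLE24, TLSLE32, TLSLE48 };

/* Pick the narrowest sub-type whose offset width covers TLS_SIZE bits.
   Assumes TLS_SIZE has already been capped at 48.  */
enum tlsle_kind
classify_tlsle (int tls_size)
{
  if (tls_size <= 12)
    return TLSLE12;
  if (tls_size <= 24)
    return TLSLE24;
  if (tls_size <= 32)
    return TLSLE32;
  return TLSLE48;
}
```

With this rounding-up rule, a tls size of 20 would be treated as
TLSLE24.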

OK for trunk?

2015-08-19  Marcus Shawcroft  
Jiong Wang  
gcc/
  * config/aarch64/aarch64.c (initialize_aarch64_tls_size): Set default
  tls size for tiny, small, large memory model.
  (aarch64_load_symref_appropriately): Support new symbol types.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_print_operand): Likewise.
  (aarch64_classify_tls_symbol): Likewise.
  * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Likewise.
  (aarch64_symbol_type): Likewise.
  * config/aarch64/aarch64.md (tlsle): Deleted.
  (tlsle12_): New define_insn.
  (tlsle24_): Likewise.
  (tlsle32_): Likewise.
  (tlsle48_): Likewise.
  * doc/sourcebuild.texi (AArch64-specific attributes): Document
  "aarch64_tlsle32".

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_aarch64_tlsle32):
  New test directive.
  * gcc.target/aarch64/tlsle_1.x: New test source.
  * gcc.target/aarch64/tlsle12.c: New testcase.
  * gcc.target/aarch64/tlsle24.c: New testcase.
  * gcc.target/aarch64/tlsle32.c: New testcase.
-- 
Regards,
Jiong

From bd5c221101a9cf241c1f4d117c643e3a5c7e344d Mon Sep 17 00:00:00 2001
From: Jiong Wang 
Date: Wed, 19 Aug 2015 14:15:01 +0100
Subject: [PATCH 3/3] 3

---
 gcc/config/aarch64/aarch64-protos.h|  6 +++
 gcc/config/aarch64/aarch64.c   | 66 ++
 gcc/config/aarch64/aarch64.md  | 56 +
 gcc/testsuite/gcc.target/aarch64/tlsle12.c |  8 
 gcc/testsuite/gcc.target/aarch64/tlsle24.c |  9 
 gcc/testsuite/gcc.target/aarch64/tlsle32.c | 10 +
 gcc/testsuite/gcc.target/aarch64/tlsle_1.x | 14 +++
 gcc/testsuite/lib/target-supports.exp  | 17 
 8 files changed, 161 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/tlsle12.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/tlsle24.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/tlsle32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/tlsle_1.x

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index daa45bf..09d83e3 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -74,7 +74,10 @@ enum aarch64_symbol_context
SYMBOL_SMALL_TLSGD
SYMBOL_SMALL_TLSDESC
SYMBOL_SMALL_GOTTPREL
+   SYMBOL_TLSLE12
SYMBOL_TLSLE24
+   SYMBOL_TLSLE32
+   SYMBOL_TLSLE48
Each of these represents a thread-local symbol, and corresponds to the
thread local storage relocation operator for the symbol being referred to.
 
@@ -111,7 +114,10 @@ enum aarch64_symbol_type
   SYMBOL_SMALL_GOTTPREL,
   SYMBOL_TINY_ABSOLUTE,
   SYMBOL_TINY_GOT,
+  SYMBOL_TLSLE12,
   SYMBOL_TLSLE24,
+  SYMBOL_TLSLE32,
+  SYMBOL_TLSLE48,
   SYMBOL_FORCE_TO_MEM
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 87f8d96..9a1e53b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1115,14 +1115,43 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	return;
   }
 
+case SYMBOL_TLSLE12:
 case SYMBOL_TLSLE24:
+case SYMBOL_TLSLE32:
+case SYMBOL_TLSLE48:
   {
+	machine_mode mode = GET_MODE (dest);
 	rtx tp = aarch64_load_tp (NULL);
 
-	if (GET_MODE (dest) != Pmode)
-	  tp = gen_lowpart (GET_MODE 

Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-19 Thread Pedro Alves
On 08/19/2015 02:55 PM, Aldy Hernandez wrote:
> On 08/19/2015 06:45 AM, Richard Biener wrote:
> 
> [copying gdb folks]

Thanks.

> 
>> On Tue, 18 Aug 2015, Aldy Hernandez wrote:
>>
>>> On 08/18/2015 07:20 AM, Richard Biener wrote:
> 
> [snip]
> 
>> The patch below has passed bootstrap & regtest on x86_64-unknown-linux-gnu
>> as well as gdb testing.  Twice unpatched, twice patched - results seem
>> to be somewhat unstable!?  I even refrained from using any -j with
>> make check-gdb...  maybe it's just contrib/test_summary not coping well
>> with gdb?  any hints?  Difference between unpatched run 1 & 2 is
>> for example
>>
>> --- results.unpatched   2015-08-19 15:08:36.152899926 +0200
>> +++ results.unpatched2  2015-08-19 15:29:46.902060797 +0200
>> @@ -209,7 +209,6 @@
>>   WARNING: remote_expect statement without a default case?!
>>   WARNING: remote_expect statement without a default case?!
>>   WARNING: remote_expect statement without a default case?!
>> -FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, fc4)


if {![target_info exists gdb,skip_float_tests]} {
gdb_test_stdio "print find_max_double(5,1.0,17.0,2.0,3.0,4.0)" \
"find_max\\(.*\\) returns 17\\.00\[ \r\n\]+" \
".\[0-9\]+ = 17" \
"print find_max_double(5,1.0,17.0,2.0,3.0,4.0)"
}

# Test _Complex type here if supported.
if [support_complex_tests] {
global gdb_prompt

set test "print find_max_float_real(4, fc1, fc2, fc3, fc4)"
gdb_test $test ".*= 4 \\+ 4 \\* I" $test

>>   FAIL: gdb.cp/inherit.exp: print g_vD
>>   FAIL: gdb.cp/inherit.exp: print g_vE
>>   FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)'
>> @@ -238,6 +237,7 @@
>>   UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings
>>   FAIL: gdb.fortran/whatis_type.exp: run to MAIN__
>>   WARNING: remote_expect statement without a default case?!
>> +FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt


# Prime the system
gdb_test_stdio "call complaint (&symfile_complaints, \"Register a complaint\")" \
"During symbol reading, Register a complaint."

# Check that the complaint was inserted and where
gdb_test "print symfile_complaints->root->fmt" \
".\[0-9\]+ =.*\"Register a complaint\""


So in both cases, there was a "gdb_test_stdio" test just before
the test that failed.  gdb_test_stdio is new, and it expects output
from two different spawn ids simultaneously.  Sounds like it still
needs fixing...  I'll guess that Richard has a much faster machine than
my getting-old laptop, which exposes races that I didn't trip on...

> This is somewhat expected. Well, at least I never found a good 
> explanation. Some tests seemed to be thread related and inconsistent. 
> Others, I have no idea.
> 

Indeed, some threading tests unfortunately still cause intermittent
failures.  We've been fixing them, but it's a slow process.  Judging by
the GDB build bots, x86_64 GNU/Linux testing seems to be mostly stable,
though.  There's one DejaGNU issue that is consistently causing
trouble; see below.

> After running the tests enough times I got a feeling of which tests 
> would always pass, and use those as reference. It was confusing at 
> first. Perhaps the GDB folks could shed some light? I've found them very 
> helpful.
> 
> Also, -j made things worse. I never used it.

It gets much better if you use git master DejaGNU, to pick this up:

  http://lists.gnu.org/archive/html/dejagnu/2015-07/msg5.html

Thanks,
Pedro Alves



[fortran, committed] Forward port test generic_31.f90 from the 5 branch

2015-08-19 Thread Mikael Morin

Hello,

I have forward-ported the test that justified backport of the pr66929 
patch on the 5 branch:

https://gcc.gnu.org/r227010

Mikael



Index: gcc/testsuite/gfortran.dg/generic_31.f90
===
--- gcc/testsuite/gfortran.dg/generic_31.f90	(révision 0)
+++ gcc/testsuite/gfortran.dg/generic_31.f90	(révision 227010)
@@ -0,0 +1,35 @@
+! { dg-do run }
+!
+! PR fortran/66929
+! Check that the specific FIRST symbol is used for the call to FOO,
+! so that the J argument is not assumed to be present
+
+module m
+  interface foo
+module procedure first
+  end interface foo
+contains
+  elemental function bar(j) result(r)
+integer, intent(in), optional :: j
+integer :: r, s(2)
+! We used to have NULL dereference here, in case of a missing J argument
+s = foo(j, [3, 7])
+r = sum(s)
+  end function bar
+  elemental function first(i, j) result(r)
+integer, intent(in), optional :: i
+integer, intent(in) :: j
+integer :: r
+if (present(i)) then
+  r = i
+else
+  r = -5
+end if
+  end function first
+end module m
+program p
+  use m
+  integer :: i
+  i = bar()
+  if (i /= -10) call abort
+end program p
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(révision 227009)
+++ gcc/testsuite/ChangeLog	(révision 227010)
@@ -1,3 +1,8 @@
+2015-08-19  Mikael Morin  
+
+	PR fortran/66929
+	* gfortran.dg/generic_31.f90: New.
+
 2015-08-19  Marek Polacek  
 
 	PR middle-end/67133





[PATCH][AArch64][obvious] Remove obsolete comment in aarch64-option-extensions.def

2015-08-19 Thread Kyrill Tkachov

Hi all,

This comment in aarch64-option-extensions.def seems obsolete and to me is more 
confusing than helpful.
The entries in that file are not "example" extensions, they have a real 
meaning, and they are not templates
for adding new CPUs anyway (not sure that ever made sense).

This patch removes that comment.

Committed as obvious with r227011.

Thanks,
Kyrill

2015-08-19  Kyrylo Tkachov  

* config/aarch64/aarch64-option-extensions.def: Delete obsolete
comment.

commit 4de4c2a8d80eb49384cbd6c1746e0ce87ea3bbb4
Author: Kyrylo Tkachov 
Date:   Wed Aug 19 13:51:58 2015 +0100

[AArch64][obvious] Remove obsolete comment in aarch64-option-extensions.def

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 1762cc8..b261a0f 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,6 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-/* V8 Architecture Extensions.
-   This list currently contains example extensions for CPUs that implement
-   AArch64, and therefore serves as a template for adding more CPUs in the
-   future.  */
-
 AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
 AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")


[PATCH][AArch64] Use popcount_hwi instead of homebrew version

2015-08-19 Thread Kyrill Tkachov

Hi all,

I noticed we have a hand-crafted "bit_count" function in the aarch64 backend 
that implements the popcount operation.
We already have a midend popcount_hwi function operating on HOST_WIDE_INTs 
which seems to be exactly what we need.

This patch removes the aarch64-specific version and updates the one callsite 
where it's used, the '%P' output operand,
which itself is only used by the *andim_ashift_bfiz pattern in aarch64.md.
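
As a minimal sketch of why the two are interchangeable (not part of the patch): the loop below has the same shape as the bit_count being removed, and the second function uses the GCC/Clang builtin __builtin_popcountll as a stand-in for popcount_hwi. The function names and the use of unsigned long long in place of HOST_WIDE_INT are assumptions made for illustration.

```c
/* Same loop shape as the removed bit_count: each iteration clears the
   lowest set bit, so the loop runs once per set bit.  */
unsigned
bit_count_loop (unsigned long long value)
{
  unsigned count = 0;

  while (value)
    {
      count++;
      value &= value - 1;
    }

  return count;
}

/* Stand-in for popcount_hwi; here it is simply the compiler builtin.  */
unsigned
popcount_builtin (unsigned long long value)
{
  return (unsigned) __builtin_popcountll (value);
}
```

For any input the two functions return the same count, for example 8 for 0xF0F0.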

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-08-19  Kyrylo Tkachov  

* config/aarch64/aarch64.c (bit_count): Delete prototype
and definition.
(aarch64_print_operand): Use popcount_hwi instead of the above.

commit d52a7d4a0a4e1a8bcf11e549ded207851b6086b3
Author: Kyrylo Tkachov 
Date:   Wed Aug 19 14:28:45 2015 +0100

[AArch64] Use popcount_hwi instead of homebrew version

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b16e511..86eabac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -150,7 +150,6 @@ static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
 static bool aarch64_vector_mode_supported_p (machine_mode);
-static unsigned bit_count (unsigned HOST_WIDE_INT);
 static bool aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
 		 const unsigned char *sel);
 static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
@@ -4170,19 +4169,6 @@ aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT val)
   return aarch64_const_vec_all_same_in_range_p (x, val, val);
 }
 
-static unsigned
-bit_count (unsigned HOST_WIDE_INT value)
-{
-  unsigned count = 0;
-
-  while (value)
-{
-  count++;
-  value &= value - 1;
-}
-
-  return count;
-}
 
 /* N Z C V.  */
 #define AARCH64_CC_V 1
@@ -4337,7 +4323,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  return;
 	}
 
-  asm_fprintf (f, "%u", bit_count (INTVAL (x)));
+  asm_fprintf (f, "%u", popcount_hwi (INTVAL (x)));
   break;
 
 case 'H':


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Michael Matz
Hi,

On Wed, 19 Aug 2015, Richard Biener wrote:

> I think tree_code is 64bits now.

Huh?  No; it's 16 bit since 8 bit ran out.


Ciao,
Michael.


Re: [PATCH][AArch64] Use popcount_hwi instead of homebrew version

2015-08-19 Thread James Greenhalgh
On Wed, Aug 19, 2015 at 04:02:41PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> I noticed we have a hand-crafted "bit_count" function in the aarch64 backend 
> that implements the popcount operation.
> We already have a midend popcount_hwi function operating on HOST_WIDE_INTs 
> which seems to be exactly what we need.
> 
> This patch removes the aarch64-specific version and updates the one callsite 
> where it's used, the '%P' output operand,
> which itself is only used by the *andim_ashift_bfiz pattern in 
> aarch64.md.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

OK. A change like this which performs useful cleanup is borderline
obvious.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2015-08-19  Kyrylo Tkachov  
> 
>  * config/aarch64/aarch64.c (bit_count): Delete prototype
>  and definition.
>  (aarch64_print_operand): Use popcount_hwi instead of the above.



Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Joseph Myers
On Wed, 19 Aug 2015, Andreas Schwab wrote:

> FX  writes:
> 
> > @@ -80,6 +80,24 @@ foo_1 (float f, double d, long double ld
> >if (__builtin_finitel (ld) != res_isfin)
> >  __builtin_abort ();
> >  
> > +  /* Sign bit of zeros and nans is not preserved in unsafe math mode.  */
> > +#ifdef UNSAFE
> > +  if (!res_isnan && d != 0)
> > +#endif
> 
> > Why only in unsafe mode?  Isn't the sign bit of NaN always unreliable?

NaN sign bits are meaningful for a limited set of operations.  (As I noted 
in , there would 
be a use for a -fno-signed-nans option in certain cases, but we don't 
currently have such an option.)
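
A minimal sketch of what such operations can look like, assuming IEEE 754 semantics in which the sign-manipulating operations (negate, abs, copysign) define the sign bit of a NaN result. This is an illustration rather than a list given in this thread; the helper names are hypothetical, and only the __builtin_* calls are GCC's.

```c
/* Negation flips the sign bit even when the operand is a NaN.  */
int
nan_sign_after_negate (void)
{
  double pos_nan = __builtin_copysign (__builtin_nan (""), 1.0);
  return __builtin_signbit (-pos_nan) != 0;
}

/* copysign transfers the sign of its second operand onto a NaN.  */
int
nan_sign_after_copysign (void)
{
  double x = __builtin_copysign (__builtin_nan (""), -1.0);
  return __builtin_signbit (x) != 0;
}

/* fabs clears the sign bit of a NaN.  */
int
nan_sign_after_fabs (void)
{
  double neg_nan = __builtin_copysign (__builtin_nan (""), -1.0);
  return __builtin_signbit (__builtin_fabs (neg_nan)) != 0;
}
```

The sketch assumes default floating-point options; with unsafe math flags the results are not guaranteed.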

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Biener
On August 19, 2015 5:05:01 PM GMT+02:00, Michael Matz  wrote:
>Hi,
>
>On Wed, 19 Aug 2015, Richard Biener wrote:
>
>> I think tree_code is 64bits now.
>
>Huh?  No; it's 16 bit since 8 bit ran out.

Err, that's what I was trying to say...
16bits, obviously.

BTW, in addition to errno math there is rounding math where we rely on virtual 
operands to not mess with ordering.

Richard.

>
>Ciao,
>Michael.




Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Joseph Myers
On Wed, 19 Aug 2015, Richard Biener wrote:

> As an additional point for many math functions we have to support errno
> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect.  But then code has to handle

I'd say that for functions like that (which can be expanded inline only 
for -fno-math-errno) there should be no-errno built-in function variants 
that users can call even if -fmath-errno (if not expanded inline, they'd 
still result in a call to a libm function that might set errno).

An example of a use for that is AArch64 sqrt intrinsics that need an 
architecture-specific built-in __builtin_aarch64_sqrtdf when 
__builtin_sqrt_noerrno would do just as well if it existed.  As another 
example: various libm functions are marked in builtins.def as not setting 
errno, even though their proper semantics mean they might set errno; see 
bug 64101 for the example of erf.  One such function is fma.  But if you 
limit fma inline expansion (for calls to fma / __builtin_fma in the user's 
program; obviously this doesn't affect expansion via contraction of a * b 
+ c) to allow for the possibility of errno setting, you definitely want a 
way for user programs to get back the efficient inline expansion if they 
don't need errno set; for example, glibc uses __builtin_fma in various 
cases if _FP_FAST_FMA, and does not need errno setting in those cases, so 
would want to use __builtin_fma_noerrno in the event of any such change.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-08-19 Thread Richard Sandiford
Richard Biener  writes:
> BTW, in addition to errno math there is rounding math where we rely on
> virtual operands to not mess with ordering.

But you know what I'm going to say to that.  Rounding affects arithmetic
just as much as things like pow().  (And also doesn't affect min/max.)

Richard



Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 6:00 AM, H.J. Lu  wrote:
> On Wed, Aug 19, 2015 at 5:51 AM, Segher Boessenkool
>  wrote:
>> On Wed, Aug 19, 2015 at 05:23:41AM -0700, H.J. Lu wrote:
>>> >>> >> > You might have a reason why you want the entry stack address 
>>> >>> >> > instead of the
>>> >>> >> > frame address, but you didn't really explain I think?  Or I missed 
>>> >>> >> > it.
>>> >>
>>> >> What would a C program do with this, that it cannot do with the frame
>>> >> address, that would be useful and cannot be much better done in straight
>>> >> assembler?  Do you actually want to expose the argument pointer, maybe?
>>> >
>>> > Yes, we want to use the argument pointer as shown in testcases
>>> > included in my patch.
>>>
>>> Where do we stand on this?  We need the hard stack address at
>>> function entry for x86 without using frame pointer.   I added
>>> __builtin_stack_top since __builtin_frame_address can't give
>>> us what we want.  Should __builtin_stack_top be added to
>>> middle-end or x86 backend?
>>
>> Sorry for not following up; I thought my suggestion was obvious.
>>
>> Can you do a __builtin_argument_pointer instead?  That should work
>> for all targets, afaics?
>
> To me, stack top is easier to understand and argument pointer isn't
> very clear.  Does argument pointer exist when there is no argument?
>
> But I can live with it.  I will update my patch.
>

Here is a patch to add __builtin_argument_pointer.  I only have

 -- Built-in Function: void * __builtin_argument_pointer (void)
 This function returns the argument pointer.

as documentation.  Can you suggest a better description so that it can
be implemented also by other compilers?

Thanks.

-- 
H.J.

From 9af08fdda587e1876e09840499000e35cc841e96 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 21 Jul 2015 14:32:09 -0700
Subject: [PATCH] Add __builtin_argument_pointer

When __builtin_frame_address is used to retrieve the address of the
function stack frame, the frame pointer register is required, which
wastes one register and 2 instructions.  For x86-32, one less register
means significant negative impact on performance.  This patch adds a
new builtin function, __builtin_argument_pointer.  It returns the
argument pointer, which, on x86, can be used to compute the stack
address when the function is called by subtracting the size of an
integer register.
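
For comparison, a minimal sketch of the existing builtin that the description starts from. It uses only __builtin_frame_address, which GCC already provides; the proposed __builtin_argument_pointer is deliberately not used here, and the function name is hypothetical.

```c
#include <stddef.h>

/* Existing builtin: returns the frame address of the current function.
   noinline keeps the call from being merged into the caller's frame.  */
__attribute__ ((noinline)) void *
current_frame_address (void)
{
  return __builtin_frame_address (0);
}
```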

gcc/

	PR target/66960
	* builtin-types.def (BT_FN_PTR_VOID): New function type.
	* builtins.c (expand_builtin): Handle BUILT_IN_ARGUMENT_POINTER.
	(is_simple_builtin): Likewise.
	* ipa-pure-const.c (special_builtin_state): Likewise.
	* builtins.def: Add BUILT_IN_ARGUMENT_POINTER.
	* function.h (function): Add argument_pointer_taken.
	* config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
	used and the argument pointer has been taken.
	* doc/extend.texi: Document __builtin_argument_pointer.

gcc/testsuite/

	PR target/66960
	* gcc.target/i386/pr66960-1.c: New test.
	* gcc.target/i386/pr66960-2.c: Likewise.
	* gcc.target/i386/pr66960-3.c: Likewise.
	* gcc.target/i386/pr66960-4.c: Likewise.
	* gcc.target/i386/pr66960-5.c: Likewise.
---
 gcc/builtin-types.def |  1 +
 gcc/builtins.c|  5 +
 gcc/builtins.def  |  1 +
 gcc/config/i386/i386.c|  6 ++
 gcc/doc/extend.texi   |  4 
 gcc/function.h|  3 +++
 gcc/ipa-pure-const.c  |  1 +
 gcc/testsuite/gcc.target/i386/pr66960-1.c | 34 +++
 gcc/testsuite/gcc.target/i386/pr66960-2.c | 34 +++
 gcc/testsuite/gcc.target/i386/pr66960-3.c | 18 
 gcc/testsuite/gcc.target/i386/pr66960-4.c | 22 
 gcc/testsuite/gcc.target/i386/pr66960-5.c | 22 
 12 files changed, 151 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-5.c

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 0e34531..2b6b5ab 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -177,6 +177,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_LONGDOUBLE_LONGDOUBLE,
 		 BT_COMPLEX_LONGDOUBLE, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_UINT, BT_PTR, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_SIZE, BT_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_1 (BT_FN_PTR_VOID, BT_PTR, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINT, BT_INT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONG, BT_INT, BT_LONG)
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 31969ca..b1cfa44 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6206,6 +6206,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 ca

Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Andreas Schwab
Joseph Myers  writes:

> On Wed, 19 Aug 2015, Andreas Schwab wrote:
>
>> FX  writes:
>> 
>> > @@ -80,6 +80,24 @@ foo_1 (float f, double d, long double ld
>> >if (__builtin_finitel (ld) != res_isfin)
>> >  __builtin_abort ();
>> >  
>> > +  /* Sign bit of zeros and nans is not preserved in unsafe math mode.  */
>> > +#ifdef UNSAFE
>> > +  if (!res_isnan && d != 0)
>> > +#endif
>> 
>> Why only in unsafe mode?  Isn't the sign bit of NaN always unreliable?
>
> NaN sign bits are meaningful for a limited set of operations.

And what are those?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[AArch64] Break -mcpu tie between the compiler and assembler

2015-08-19 Thread James Greenhalgh

Hi,

This patch has been sitting in my tree for a while - it comes in handy
when trying out bootstrap or test with -mcpu values like -mcpu=cortex-a72
with a system assembler which trails trunk binutils.

Essentially, we rewrite -mcpu=foo to a -march flag providing the same
architecture revision and set of optional architecture features. There
is no reason we should ever need the assembler to see a CPU name, it
should only be interested in the architecture variant.
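
A minimal sketch of the rewrite idea in plain C, not the patch's implementation. The table entries, the "+crc" strings and the function name are assumptions for illustration; the patch builds its tables from aarch64-cores.def and aarch64-arches.def instead.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative table only; entries are assumed, not taken from the patch.  */
struct cpu_to_march
{
  const char *cpu;
  const char *march;
};

static const struct cpu_to_march example_cores[] =
{
  { "cortex-a53", "armv8-a+crc" },
  { "cortex-a72", "armv8-a+crc" },
  { NULL, NULL }
};

/* Write "-march=..." for CPU into BUF.  Return 0 on success, or -1 if
   CPU is unknown or BUF is too small.  */
int
rewrite_mcpu_to_march (const char *cpu, char *buf, size_t buflen)
{
  for (const struct cpu_to_march *p = example_cores; p->cpu != NULL; p++)
    if (strcmp (p->cpu, cpu) == 0)
      {
        int n = snprintf (buf, buflen, "-march=%s", p->march);
        return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
      }

  return -1;
}
```

With such a mapping the assembler only ever sees an architecture name, so it does not need to know the CPU name.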

While we're there I've long found this function too fragile and hard
to grok in C. So I've rewritten it in C++ to use std::string rather than
raw C strings. Making this work with extension strings requires a slight
refactor to the existing extension printing code to pull it across to
somewhere common.
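
A minimal C sketch of building an extension string from a flag set, assuming the rule that an extension is printed when all of its "on" bits are set. The flag values, table and names are hypothetical; the table only mirrors the fp/simd/crypto dependency shape of aarch64-option-extensions.def.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only.  */
enum { EX_FP = 1u << 0, EX_SIMD = 1u << 1, EX_CRYPTO = 1u << 2 };

struct example_extension
{
  const char *name;
  unsigned long flags_on;
};

static const struct example_extension example_extensions[] =
{
  { "fp", EX_FP },
  { "simd", EX_FP | EX_SIMD },
  { "crypto", EX_FP | EX_SIMD | EX_CRYPTO },
  { NULL, 0 }
};

/* Append "+name" to OUT for every extension whose flags_on bits are all
   set in ISA_FLAGS.  Returns OUT.  */
char *
extension_string_for_flags (unsigned long isa_flags, char *out, size_t outlen)
{
  size_t used = 0;

  if (outlen > 0)
    out[0] = '\0';

  for (const struct example_extension *e = example_extensions; e->name; e++)
    if ((isa_flags & e->flags_on) == e->flags_on)
      {
        int n = snprintf (out + used, outlen - used, "+%s", e->name);
        if (n < 0 || (size_t) n >= outlen - used)
          break;  /* Truncated: stop appending.  */
        used += (size_t) n;
      }

  return out;
}
```

The patch itself returns a std::string, which avoids the fixed-size buffer used in this sketch.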

Note that this also stops us from having to pick through a big.LITTLE
target to find and separate the core names - we can just look up the
architecture of the whole target and use that.

The new function does leak the allocation of a C string to hold the
result, but looking at gcc.c:getenv_spec_function and
gcc.c:replace_extension_spec_func this is the usual thing to do.

This has been through an aarch64-none-linux-gnu bootstrap and test run,
configured with --with-cpu=cortex-a72 , which my system assembler does
not understand.

Ok?

Thanks,
James

---
2015-08-19  James Greenhalgh  

* common/config/aarch64/aarch64-common.c
(AARCH64_CPU_NAME_LENGTH): Delete.
(aarch64_option_extension): New.
(all_extensions): Likewise.
(processor_name_to_arch): Likewise.
(arch_to_arch_name): Likewise.
(all_cores): New.
(all_architectures): Likewise.
(aarch64_get_extension_string_for_isa_flags): Likewise.
(aarch64_rewrite_selected_cpu): Change to rewrite CPU names to
architecture names.
* config/aarch64/aarch64-protos.h
(aarch64_get_extension_string_for_isa_flags): New.
* config/aarch64/aarch64.c (aarch64_print_extension): Delete.
(aarch64_option_print): Get the string to print from
aarch64_get_extension_string_for_isa_flags.
(aarch64_declare_function_name): Likewise.
* config/aarch64/aarch64.h (BIG_LITTLE_SPEC): Rename to...
(MCPU_TO_MARCH_SPEC): This.
(ASM_CPU_SPEC): Use it.
(BIG_LITTLE_SPEC_FUNCTIONS): Rename to...
(MCPU_TO_MARCH_SPEC_FUNCTIONS): ...This.
(EXTRA_SPEC_FUNCTIONS): Use it.

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 726c625..476401f 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -27,7 +27,7 @@
 #include "common/common-target-def.h"
 #include "opts.h"
 #include "flags.h"
-#include "errors.h"
+#include "diagnostic.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -107,36 +107,137 @@ aarch64_handle_option (struct gcc_options *opts,
 
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
 
-#define AARCH64_CPU_NAME_LENGTH 128
+/* An ISA extension in the co-processor and main instruction set space.  */
+struct aarch64_option_extension
+{
+  const char *const name;
+  const unsigned long flags_on;
+  const unsigned long flags_off;
+};
+
+/* ISA extensions in AArch64.  */
+static const struct aarch64_option_extension all_extensions[] =
+{
+#define AARCH64_OPT_EXTENSION(NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
+  {NAME, FLAGS_ON, FLAGS_OFF},
+#include "config/aarch64/aarch64-option-extensions.def"
+#undef AARCH64_OPT_EXTENSION
+  {NULL, 0, 0}
+};
+
+struct processor_name_to_arch
+{
+  const std::string processor_name;
+  const enum aarch64_arch arch;
+  const unsigned long flags;
+};
+
+struct arch_to_arch_name
+{
+  const enum aarch64_arch arch;
+  const std::string arch_name;
+};
+
+/* Map processor names to the architecture revision they implement and
+   the default set of architectural feature flags they support.  */
+static const struct processor_name_to_arch all_cores[] =
+{
+#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
+  {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
+#include "config/aarch64/aarch64-cores.def"
+#undef AARCH64_CORE
+  {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
+  {"", aarch64_no_arch, 0}
+};
+
+/* Map architecture revisions to their string representation.  */
+static const struct arch_to_arch_name all_architectures[] =
+{
+#define AARCH64_ARCH(NAME, CORE, ARCH_IDENT, ARCH, FLAGS) \
+  {AARCH64_ARCH_##ARCH_IDENT, NAME},
+#include "config/aarch64/aarch64-arches.def"
+#undef AARCH64_ARCH
+  {aarch64_no_arch, ""}
+};
+
+/* Return a string representation of ISA_FLAGS.  */
+
+std::string
+aarch64_get_extension_string_for_isa_flags (unsigned long isa_flags)
+{
+  const struct aarch64_option_extension *opt = NULL;
+  std::string outstr = "";
+
+  for (opt = all_extensions; opt->name != NULL; opt++)
+if ((isa_flags & opt->flags_on) == opt->flags_o

Re: [AArch64] Break -mcpu tie between the compiler and assembler

2015-08-19 Thread Andrew Pinski
On Wed, Aug 19, 2015 at 11:39 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch has been sitting in my tree for a while - it comes in handy
> when trying out bootstrap or test with -mcpu values like -mcpu=cortex-a72
> with a system assembler which trails trunk binutils.
>
> Essentially, we rewrite -mcpu=foo to a -march flag providing the same
> architecture revision and set of optional architecture features. There
> is no reason we should ever need the assembler to see a CPU name, it
> should only be interested in the architecture variant.
>
> While we're there I've long found this function too fragile and hard
> to grok in C. So I've rewritten it in C++ to use std::string rather than
> raw C strings. Making this work with extension strings requires a slight
> refactor to the existing extension printing code to pull it across to
> somewhere common.
>
> Note that this also stops us from having to pick through a big.LITTLE
> target to find and separate the core names - we can just look up the
> architecture of the whole target and use that.
>
> The new function does leak the allocation of a C string to hold the
> result, but looking at gcc.c:getenv_spec_function and
> gcc.c:replace_extension_spec_func this is the usual thing to do.
>
> This has been through an aarch64-none-linux-gnu bootstrap and test run,
> configured with --with-cpu=cortex-a72 , which my system assembler does
> not understand.
>
> Ok?

I like this, since it saves me from needing a newer assembler for
-mcpu=thunderx, though I still need one for LSE support.  Has anyone
thought about adding a configure test for v8.1 (LSE and other) support
in the assembler and disabling those extensions when it is missing?
(This is the same issue as on x86_64 with AVX.)

Thanks,
Andrew

>
> Thanks,
> James
>
> ---
> 2015-08-19  James Greenhalgh  
>
> * common/config/aarch64/aarch64-common.c
> (AARCH64_CPU_NAME_LENGTH): Delete.
> (aarch64_option_extension): New.
> (all_extensions): Likewise.
> (processor_name_to_arch): Likewise.
> (arch_to_arch_name): Likewise.
> (all_cores): New.
> (all_architectures): Likewise.
> (aarch64_get_extension_string_for_isa_flags): Likewise.
> (aarch64_rewrite_selected_cpu): Change to rewrite CPU names to
> architecture names.
> * config/aarch64/aarch64-protos.h
> (aarch64_get_extension_string_for_isa_flags): New.
> * config/aarch64/aarch64.c (aarch64_print_extension): Delete.
> (aarch64_option_print): Get the string to print from
> aarch64_get_extension_string_for_isa_flags.
> (aarch64_declare_function_name): Likewise.
> * config/aarch64/aarch64.h (BIG_LITTLE_SPEC): Rename to...
> (MCPU_TO_MARCH_SPEC): This.
> (ASM_CPU_SPEC): Use it.
> (BIG_LITTLE_SPEC_FUNCTIONS): Rename to...
> (MCPU_TO_MARCH_SPEC_FUNCTIONS): ...This.
> (EXTRA_SPEC_FUNCTIONS): Use it.
>


Re: [AArch64] Break -mcpu tie between the compiler and assembler

2015-08-19 Thread Andrew Pinski
On Wed, Aug 19, 2015 at 11:39 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch has been sitting in my tree for a while - it comes in handy
> when trying out bootstrap or test with -mcpu values like -mcpu=cortex-a72
> with a system assembler which trails trunk binutils.
>
> Essentially, we rewrite -mcpu=foo to a -march flag providing the same
> architecture revision and set of optional architecture features. There
> is no reason we should ever need the assembler to see a CPU name, it
> should only be interested in the architecture variant.
>
> While we're there I've long found this function too fragile and hard
> to grok in C. So I've rewritten it in C++ to use std::string rather than
> raw C strings. Making this work with extension strings requires a slight
> refactor to the existing extension printing code to pull it across to
> somewhere common.
>
> Note that this also stops us from having to pick through a big.LITTLE
> target to find and separate the core names - we can just look up the
> architecture of the whole target and use that.
>
> The new function does leak the allocation of a C string to hold the
> result, but looking at gcc.c:getenv_spec_function and
> gcc.c:replace_extension_spec_func this is the usual thing to do.
>
> This has been through an aarch64-none-linux-gnu bootstrap and test run,
> configured with --with-cpu=cortex-a72 , which my system assembler does
> not understand.
>
> Ok?


+ modified string, which seems much worse!  */
+  char *output = (char*) xmalloc (sizeof (*output)
+  * (outstr.length () + 1));
+  strcpy (output, outstr.c_str ());

Why not just:
char *output = xstrdup (outstr.c_str ());

Or at least use XNEWVEC instead of xmalloc with a cast?
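
A minimal sketch of the difference in plain C. dup_or_abort is a hypothetical stand-in that assumes duplicate-or-abort behaviour similar to xstrdup; it is not libiberty's implementation, and copy_by_hand only mirrors the shape of the quoted hunk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Open-coded copy in the shape of the quoted hunk: size the buffer by
   hand, then strcpy into it.  */
char *
copy_by_hand (const char *src)
{
  assert (src != NULL);
  char *output = (char *) malloc (sizeof (*output) * (strlen (src) + 1));
  if (output == NULL)
    abort ();
  strcpy (output, src);
  return output;
}

/* Hypothetical stand-in for an xstrdup-like helper: duplicate the
   string or abort if allocation fails.  */
char *
dup_or_abort (const char *src)
{
  assert (src != NULL);
  size_t len = strlen (src) + 1;
  char *copy = (char *) malloc (len);
  if (copy == NULL)
    abort ();
  memcpy (copy, src, len);
  return copy;
}
```

Both return a fresh heap copy that the caller owns; the helper form keeps the length arithmetic out of the call site.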

Thanks,
Andrew Pinski

>
> Thanks,
> James
>
> ---
> 2015-08-19  James Greenhalgh  
>
> * common/config/aarch64/aarch64-common.c
> (AARCH64_CPU_NAME_LENGTH): Delete.
> (aarch64_option_extension): New.
> (all_extensions): Likewise.
> (processor_name_to_arch): Likewise.
> (arch_to_arch_name): Likewise.
> (all_cores): New.
> (all_architectures): Likewise.
> (aarch64_get_extension_string_for_isa_flags): Likewise.
> (aarch64_rewrite_selected_cpu): Change to rewrite CPU names to
> architecture names.
> * config/aarch64/aarch64-protos.h
> (aarch64_get_extension_string_for_isa_flags): New.
> * config/aarch64/aarch64.c (aarch64_print_extension): Delete.
> (aarch64_option_print): Get the string to print from
> aarch64_get_extension_string_for_isa_flags.
> (aarch64_declare_function_name): Likewise.
> * config/aarch64/aarch64.h (BIG_LITTLE_SPEC): Rename to...
> (MCPU_TO_MARCH_SPEC): This.
> (ASM_CPU_SPEC): Use it.
> (BIG_LITTLE_SPEC_FUNCTIONS): Rename to...
> (MCPU_TO_MARCH_SPEC_FUNCTIONS): ...This.
> (EXTRA_SPEC_FUNCTIONS): Use it.
>


[PATCH] remove more useless typedefs

2015-08-19 Thread tbsaunde+gcc
From: tbsaunde 

Hi,

more typedef cleanup.

bootstrapped + regtested on x86_64-linux-gnu, committed since preapproved
by richi.

Trev

gcc/c-family/ChangeLog:

2015-08-18  Trevor Saunders  

* c-ada-spec.h, c-common.c, c-common.h, c-format.c, c-format.h,
c-objc.h, c-ppoutput.c, c-pragma.c, c-pragma.h: Remove useless
 typedefs.

gcc/c/ChangeLog:

2015-08-18  Trevor Saunders  

* c-aux-info.c, c-parser.c, c-tree.h: Remove useless typedefs.

gcc/cp/ChangeLog:

2015-08-18  Trevor Saunders  

* call.c, class.c, cp-tree.h, decl.c, except.c, mangle.c,
method.c, name-lookup.h, parser.c, parser.h, rtti.c,
semantics.c, typeck2.c: Remove useless typedefs.

gcc/fortran/ChangeLog:

2015-08-18  Trevor Saunders  

* dependency.c, dependency.h, gfortran.h, io.c, module.c,
parse.h, resolve.c, trans-types.h, trans.h: remove useless
typedefs.

gcc/lto/ChangeLog:

2015-08-18  Trevor Saunders  

* lto.h: Remove useless typedefs.

gcc/objc/ChangeLog:

2015-08-18  Trevor Saunders  

* objc-act.h, objc-next-runtime-abi-02.c, objc-runtime-hooks.h:
Remove useless typedefs.

gcc/ChangeLog:

2015-08-18  Trevor Saunders  

* bb-reorder.c, cfgloop.h, collect2.c, combine.c, dse.c,
dwarf2cfi.c, gcse-common.h, genopinit.c, ggc-page.c, machmode.h,
mcf.c, modulo-sched.c, omp-low.c, read-rtl.c, sched-rgn.c,
signop.h, tree-call-cdce.c, tree-dfa.c, tree-diagnostic.c,
tree-inline.h, tree-scalar-evolution.c, tree-ssa-address.c,
tree-ssa-loop-niter.c, tree-ssa-loop.h, tree-ssa-pre.c,
tree-ssa-reassoc.c, tree-ssa-sccvn.h, tree-ssa-structalias.c,
tree-ssa-uninit.c, tree-ssa.h, tree-vect-loop-manip.c,
tree-vectorizer.h, tree-vrp.c, var-tracking.c: Remove useless
typedefs.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@227001 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |  12 +++
 gcc/bb-reorder.c|   4 +-
 gcc/c-family/ChangeLog  |   5 ++
 gcc/c-family/c-ada-spec.h   |   4 +-
 gcc/c-family/c-common.c |   4 +-
 gcc/c-family/c-common.h |   9 +-
 gcc/c-family/c-format.c |  16 ++--
 gcc/c-family/c-format.h |  28 +++
 gcc/c-family/c-objc.h   |   4 +-
 gcc/c-family/c-ppoutput.c   |   4 +-
 gcc/c-family/c-pragma.c |  20 ++---
 gcc/c-family/c-pragma.h |  11 ++-
 gcc/c/ChangeLog |   4 +
 gcc/c/c-aux-info.c  |   3 +-
 gcc/c/c-parser.c|  16 ++--
 gcc/c/c-tree.h  |   4 +-
 gcc/cfgloop.h   |   2 +-
 gcc/collect2.c  |   8 +-
 gcc/combine.c   |   4 +-
 gcc/cp/ChangeLog|   6 ++
 gcc/cp/call.c   |  12 ++-
 gcc/cp/class.c  |  12 +--
 gcc/cp/cp-tree.h|  97 +++---
 gcc/cp/decl.c   |  12 +--
 gcc/cp/except.c |   4 +-
 gcc/cp/mangle.c |   4 +-
 gcc/cp/method.c |   2 -
 gcc/cp/name-lookup.h|  20 ++---
 gcc/cp/parser.c |  28 +++
 gcc/cp/parser.h |  36 
 gcc/cp/rtti.c   |   8 +-
 gcc/cp/semantics.c  |   4 +-
 gcc/cp/typeck2.c|   2 +-
 gcc/dse.c   |   4 +-
 gcc/dwarf2cfi.c |  20 ++---
 gcc/fortran/ChangeLog   |   5 ++
 gcc/fortran/dependency.c|   5 +-
 gcc/fortran/dependency.h|   5 +-
 gcc/fortran/gfortran.h  | 161 ++--
 gcc/fortran/io.c|   5 +-
 gcc/fortran/module.c|  10 +--
 gcc/fortran/parse.h |   5 +-
 gcc/fortran/resolve.c   |  15 ++--
 gcc/fortran/trans-types.h   |   4 +-
 gcc/fortran/trans.h |  10 +--
 gcc/gcse-common.h   |   4 +-
 gcc/genopinit.c |   8 +-
 gcc/ggc-page.c  |   8 +-
 gcc/lto/ChangeLog   |   4 +
 gcc/lto/lto.h   |   4 +-
 gcc/machmode.h  |   4 +-
 gcc/mcf.c   |  24 +++---
 gcc/modulo-sched.c  |   4 -
 gcc/objc/ChangeLog  |   5 ++
 gcc/objc/objc-act.h |  10 +--
 gcc/objc/objc-next-runtime-abi-02.c |  16 ++--
 gcc/objc/objc-runtime-hooks.h   |   4 +-
 gcc/omp-low.c   |   4 +-
 gcc/read-rtl.c  |   6 --
 gcc/sched-rgn.c |  15 ++--
 gcc/signop.h|   4 +-
 gcc/tree-call-cdce.c|   4 +-
 gcc/tree-dfa.c  |   3 +-
 gcc/tree-diagnostic.c   |   4 +-
 gcc/tree-inline.h   |   4

[PATCHv2/AARCH64] Remove index from AARCH64_FUSION_PAIR

2015-08-19 Thread Andrew Pinski
Changes from v1:
Also remove the hack AARCH64_FUSE_ALL.

Instead of using an explicit index in aarch64-fusion-pairs.def, we
should have an enum which provides the index.  This allows
you to add/remove entries without worrying about the order being
correct, about holes, or about merge conflicts.
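
For readers unfamiliar with the idiom, the two-enum approach can be
sketched in plain C roughly as follows.  This is an illustration only: the
DEMO_ names, the in-line list macro (used here instead of an included .def
file so the sketch is self-contained) and the demo_fusion_flag_for helper
are all assumptions, not names from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the .def file: one X (...) entry per pair.  */
#define DEMO_FUSION_PAIRS(X)    \
  X ("mov+movk", MOV_MOVK)      \
  X ("adrp+add", ADRP_ADD)      \
  X ("cmp+branch", CMP_BRANCH)

/* First expansion: each entry becomes an enumerator, so the compiler
   hands out consecutive indices 0, 1, 2, ... in list order.  */
#define DEMO_INDEX(str, name) DEMO_FUSE_##name##_index,
enum demo_fusion_pairs_index
{
  DEMO_FUSION_PAIRS (DEMO_INDEX)
  DEMO_FUSE_index_END
};
#undef DEMO_INDEX

/* Second expansion: each flag is a single bit shifted by its index, and
   the "all" mask follows from the end index.  */
#define DEMO_FLAG(str, name) \
  DEMO_FUSE_##name = (1u << DEMO_FUSE_##name##_index),
enum demo_fusion_pairs
{
  DEMO_FUSE_NOTHING = 0,
  DEMO_FUSION_PAIRS (DEMO_FLAG)
  DEMO_FUSE_ALL = (1u << DEMO_FUSE_index_END) - 1
};
#undef DEMO_FLAG

/* Third expansion: a name-to-flag table built from the same list.  */
struct demo_flag_desc
{
  const char *name;
  unsigned int flag;
};

#define DEMO_DESC(str, name) { str, DEMO_FUSE_##name },
static const struct demo_flag_desc demo_fusible_pairs[] =
{
  DEMO_FUSION_PAIRS (DEMO_DESC)
  { NULL, DEMO_FUSE_NOTHING }
};
#undef DEMO_DESC

/* Look up the flag for NAME; unknown names map to DEMO_FUSE_NOTHING.  */
unsigned int
demo_fusion_flag_for (const char *name)
{
  assert (name != NULL);
  for (const struct demo_flag_desc *d = demo_fusible_pairs; d->name; d++)
    if (strcmp (d->name, name) == 0)
      return d->flag;
  return DEMO_FUSE_NOTHING;
}
```

Adding or removing an entry in the list renumbers everything after it
automatically, so there is no index column to keep in sync by hand.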

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* aarch64-fusion-pairs.def: Remove all index to AARCH64_FUSION_PAIR.
* config/aarch64/aarch64-protos.h (aarch64_fusion_pairs_index): New enum.
(aarch64_fusion_pairs): Base the shifted value on the index instead
of the argument to AARCH64_FUSION_PAIR.
Rewrite AARCH64_FUSE_ALL to be based on the end index.
* config/aarch64/aarch64.c: Remove the last argument to AARCH64_FUSION_PAIR.
commit 8a02e03360852b9261d45528384fa8b87c673e53
Author: Andrew Pinski 
Date:   Tue Aug 18 22:13:32 2015 -0700

Remove index from AARCH64_FUSION_PAIR


diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index a7b00f6..53bbef4 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -20,19 +20,17 @@
 /* Pairs of instructions which can be fused. before including this file,
define a macro:
 
- AARCH64_FUSION_PAIR (name, internal_name, index_bit)
+ AARCH64_FUSION_PAIR (name, internal_name)
 
Where:
 
  NAME is a string giving a friendly name for the instructions to fuse.
  INTERNAL_NAME gives the internal name suitable for appending to
- AARCH64_FUSE_ to give an enum name.
- INDEX_BIT is the bit to set in the bitmask of supported fusion
- operations.  */
-
-AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK, 0)
-AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD, 1)
-AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK, 2)
-AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR, 3)
-AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH, 4)
+ AARCH64_FUSE_ to give an enum name. */
+
+AARCH64_FUSION_PAIR ("mov+movk", MOV_MOVK)
+AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
+AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
+AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
+AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 0b09d49..057d4fc 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -201,23 +201,24 @@ struct tune_params
   unsigned int extra_tuning_flags;
 };
 
-#define AARCH64_FUSION_PAIR(x, name, index) \
-  AARCH64_FUSE_##name = (1 << index),
+#define AARCH64_FUSION_PAIR(x, name) \
+  AARCH64_FUSE_##name##_index, 
 /* Supported fusion operations.  */
-enum aarch64_fusion_pairs
+enum aarch64_fusion_pairs_index
 {
-  AARCH64_FUSE_NOTHING = 0,
 #include "aarch64-fusion-pairs.def"
-
-/* Hacky macro to build AARCH64_FUSE_ALL.  The sequence below expands
-   to:
-   AARCH64_FUSE_ALL = 0 | AARCH64_FUSE_index1 | AARCH64_FUSE_index2 ...  */
+  AARCH64_FUSE_index_END
+};
 #undef AARCH64_FUSION_PAIR
-#define AARCH64_FUSION_PAIR(x, name, y) \
-  | AARCH64_FUSE_##name
 
-  AARCH64_FUSE_ALL = 0
+#define AARCH64_FUSION_PAIR(x, name) \
+  AARCH64_FUSE_##name = (1u << AARCH64_FUSE_##name##_index),
+/* Supported fusion operations.  */
+enum aarch64_fusion_pairs
+{
+  AARCH64_FUSE_NOTHING = 0,
 #include "aarch64-fusion-pairs.def"
+  AARCH64_FUSE_ALL = (1u << AARCH64_FUSE_index_END) - 1
 };
 #undef AARCH64_FUSION_PAIR
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aa268ae..162e25e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -172,7 +172,7 @@ struct aarch64_flag_desc
   unsigned int flag;
 };
 
-#define AARCH64_FUSION_PAIR(name, internal_name, y) \
+#define AARCH64_FUSION_PAIR(name, internal_name) \
   { name, AARCH64_FUSE_##internal_name },
 static const struct aarch64_flag_desc aarch64_fusible_pairs[] =
 {


[PATCHv2/AARCH64] Remove index from AARCH64_EXTRA_TUNING_OPTION

2015-08-19 Thread Andrew Pinski
Just like the patch for AARCH64_FUSION_PAIR, this is a patch for
AARCH64_EXTRA_TUNING_OPTION.  Note I tested this patch on top of the
patch for AARCH64_EXTRA_TUNING_OPTION.

Changes in v2:
Remove the hack for AARCH64_EXTRA_TUNE_ALL.

Remove index from AARCH64_EXTRA_TUNING_OPTION

Instead of using an explicit index in aarch64-tuning-flags.def, we
should have an enum which provides the index.  This allows
you to add/remove entries without worrying about the order being
correct, about holes, or about merge conflicts.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

ChangeLog:
* config/aarch64/aarch64-tuning-flags.def: Remove all index to
AARCH64_EXTRA_TUNING_OPTION.
* config/aarch64/aarch64-protos.h (extra_tuning_flags_index): New enum.
(aarch64_extra_tuning_flags): Base the shifted value on the index instead
of the argument to AARCH64_EXTRA_TUNING_OPTION.
* config/aarch64/aarch64.c: Remove the last argument to
AARCH64_EXTRA_TUNING_OPTION..


Re: [PATCHv2/AARCH64] Remove index from AARCH64_FUSION_PAIR

2015-08-19 Thread James Greenhalgh
On Wed, Aug 19, 2015 at 04:58:22PM +0100, Andrew Pinski wrote:
> Changes from v1:
> Also remove the hack AARCH64_FUSE_ALL.
> 
> Instead of using an explicit index in aarch64-fusion-pairs.def, we
> should have an enum which provides the index.  This allows
> you to add/remove entries without worrying about the order being
> correct, about holes, or about merge conflicts.
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

OK.

Thanks,
James

> 
> ChangeLog:
> * aarch64-fusion-pairs.def: Remove all index to AARCH64_FUSION_PAIR.
> * config/aarch64/aarch64-protos.h (aarch64_fusion_pairs_index): New enum.
> (aarch64_fusion_pairs): Base the shifted value on the index instead
> of the argument to AARCH64_FUSION_PAIR.
> Rewrite AARCH64_FUSE_ALL to be based on the end index.
> * config/aarch64/aarch64.c: Remove the last argument to 
> AARCH64_FUSION_PAIR.



Re: [PATCHv2/AARCH64] Remove index from AARCH64_EXTRA_TUNING_OPTION

2015-08-19 Thread James Greenhalgh
On Wed, Aug 19, 2015 at 05:00:14PM +0100, Andrew Pinski wrote:
> Just like the patch for AARCH64_FUSION_PAIR, this is a patch for
> AARCH64_EXTRA_TUNING_OPTION.  Note I tested this patch on top of the
> patch for AARCH64_EXTRA_TUNING_OPTION.
> 
> Changes in v2:
> Remove the hack for AARCH64_EXTRA_TUNE_ALL.
> 
> Remove index from AARCH64_EXTRA_TUNING_OPTION
> 
> Instead of using an explicit index in aarch64-tuning-flags.def, we
> should have an enum which provides the index.  This allows
> you to add/remove entries without worrying about the order being
> correct, about holes, or about merge conflicts.
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

-ENOPATCH, but assuming this is along the same lines as the one I just
acked, I'm happy for you to consider this preapproved (after checking
the comments below). Please send a copy to the list for the archives.

> ChangeLog:
> * config/aarch64/aarch64-tuning-flags.def: Remove all index to
> AARCH64_EXTRA_TUNING_OPTION.
> * config/aarch64/aarch64-protos.h (extra_tuning_flags_index): New enum.

I'm guessing that this has a more aarch64-centric name like
aarch64_extra_tuning_flags_index ? If not, it probably should have just to
fit with the naming scheme in the rest of the file.

> (aarch64_extra_tuning_flags): Base the shifted value on the index instead
> of the argument to AARCH64_EXTRA_TUNING_OPTION.
> * config/aarch64/aarch64.c: Remove the last argument to
> AARCH64_EXTRA_TUNING_OPTION..

Watch out for the extra . on the end of this ChangeLog line..

Thanks,
James

> 


[AArch64][TLSLE][2/3] Rename SYMBOL_TLSLE to SYMBOL_TLSLE24

2015-08-19 Thread Jiong Wang

Jiong Wang writes:

> As we have added -mtls-size support, there should be four types of TLSLE
> symbols:
>
>   SYMBOL_TLSLE12
>   SYMBOL_TLSLE24
>   SYMBOL_TLSLE32
>   SYMBOL_TLSLE48
>
> which reflect the maximum address bits needed to address this symbol.
>
> This patch renames SYMBOL_TLSLE to SYMBOL_TLSLE24. Patch [3/3] will add
> support for other symbol types.
>
> OK for trunk?
>
> 2015-08-19  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename
>   SYMBOL_TLSLE to SYMBOL_TLSLE24.
>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Likewise
>   (aarch64_expand_mov_immediate): Likewise
>   (aarch64_print_operand): Likewise
>   (aarch64_classify_symbol): Likewise

Sorry, the patch name should be

  [AArch64][TLSLE][2/3] Rename SYMBOL_TLSLE to SYMBOL_TLSLE24

instead of

  [AArch64][TLSLE][2/3] Add the option "-mtls-size" for AArch64

-- 
Regards,
Jiong



Re: [middle-end,patch] Making __builtin_signbit type-generic

2015-08-19 Thread Joseph Myers
On Wed, 19 Aug 2015, Andreas Schwab wrote:

> >> Why only in unsafe mode?  Isn't the sign bit of NaN always unreliable?
> >
> > NaN sign bits are meaningful for a limited set of operations.
> 
> And what are those?

Assignment to the same type, negation, absolute value, copysign, signbit.  
(In particular, the sign bit of a NaN resulting from an arithmetic 
operation is unspecified, but it does have to act as if it has some 
particular sign; if the statement "a = b * c;" results in a containing a 
NaN, two calls to signbit (a), with a not modified in between, must result 
in the same sign being indicated, but what that value is doesn't matter, 
and signbit (-a) must result in the other sign being indicated.)
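
A small sketch of the property described above, assuming an IEEE 754
implementation and default (non-fast-math) floating-point options.  The
demo_ function names are made up for illustration; only signbit, isnan and
INFINITY come from <math.h>.

```c
#include <assert.h>
#include <math.h>

/* Produce a NaN from an arithmetic operation, as in "a = b * c;".  The
   volatile operands keep the multiplication from being folded away.
   Which sign the result carries is unspecified.  */
double
demo_arith_nan (void)
{
  volatile double b = INFINITY;
  volatile double c = 0.0;
  double a = b * c;
  assert (isnan (a));
  return a;
}

/* Return nonzero if A acts as if it has one particular sign: two
   signbit queries agree, and negation reports the other sign.  */
int
demo_sign_is_stable_and_negatable (double a)
{
  int first = signbit (a) != 0;
  int second = signbit (a) != 0;
  int negated = signbit (-a) != 0;
  return first == second && negated != first;
}
```

The sketch deliberately never checks which sign the NaN has, only that the
answer is consistent and that negation flips it.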

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Jeff Law

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:


2015-08-10  Kyrylo Tkachov 

 * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
 then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
 (end_ifcvt_sequence): Call set_used_flags on each insn in the
 sequence.
 Include rtl-iter.h.
 (noce_simple_bbs): New function.
 (noce_try_move): Bail if basic blocks are not simple.
 (noce_try_store_flag): Likewise.
 (noce_try_store_flag_constants): Likewise.
 (noce_try_addcc): Likewise.
 (noce_try_store_flag_mask): Likewise.
 (noce_try_cmove): Likewise.
 (noce_try_minmax): Likewise.
 (noce_try_abs): Likewise.
 (noce_try_sign_mask): Likewise.
 (noce_try_bitop): Likewise.
 (bbs_ok_for_cmove_arith): New function.
 (noce_emit_all_but_last): Likewise.
 (noce_emit_insn): Likewise.
 (noce_emit_bb): Likewise.
 (noce_try_cmove_arith): Handle non-simple basic blocks.
 (insn_valid_noce_process_p): New function.
 (contains_mem_rtx_p): Likewise.
 (bb_valid_for_noce_process_p): Likewise.
 (noce_process_if_block): Allow non-simple basic blocks
 where appropriate.

2015-08-11  Kyrylo Tkachov 

 * gcc.dg/ifcvt-1.c: New test.
 * gcc.dg/ifcvt-2.c: Likewise.
 * gcc.dg/ifcvt-3.c: Likewise.

Thanks for pinging -- I thought I'd already approved this a few days 
ago!  But I can't find it in my outbox...  So clearly I didn't finish 
the final review.





diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
return FALSE;
  }

+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+  || !NONJUMP_INSN_P (insn)
+  || (cc && set_of (cc, insn)))
+  return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+  || !noce_operand_ok (SET_DEST (sset))
+  || !noce_operand_ok (SET_SRC (sset)))
+return false;
+
+  return true;
+}
+
+



+  /* Make sure this is a REG and not some instance
+of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
+  if (!REG_P (SET_DEST (sset_b)))
+   {
+ BITMAP_FREE (bba_sets);
+ return false;
+   }

BTW, this is overly conservative.  You're working with pseudos here, so 
you can just treat a ZERO_EXTRACT or SUBREG as a read of the full 
underlying register.  If this comes up in practice you might consider 
handling them as a follow-up patch.  I don't think you need to handle 
that case immediately though.


I also can't remember if we discussed what happens if blocks A & B write 
to the same register, do we handle that situation correctly?


That's the only issue left in my mind.  If we're handling that case 
correctly, then this is Ok for the trunk as-is.  Else we'll need another 
iteration.


Thanks,
Jeff




Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2015 at 08:25:49AM -0700, H.J. Lu wrote:
> Here is a patch to add __builtin_argument_pointer.  I only have

Sorry to be a pain but...  all the other builtins use _address
instead of _pointer, it's probably best to follow that.

>  -- Built-in Function: void * __builtin_argument_pointer (void)
>  This function returns the argument pointer.
> 
> as documentation.  Can you suggest a better description so that it can
> be implemented also by other compilers?

Maybe something like (heavily cut'n'pasted):


@deftypefn {Built-in Function} {void *} __builtin_argument_address (void)
This function is similar to @code{__builtin_frame_address} with an
argument of 0, but it returns the address of the incoming arguments to
the current function rather than the address of its frame.

The exact definition of this address depends upon the processor and the
calling convention.  Usually some arguments are passed in registers and
the rest on the stack, and this builtin returns the address of the first
argument that is on the stack.
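
For comparison, the existing builtin that this wording is modelled on can
be exercised as below.  This is only a sketch of __builtin_frame_address
with an argument of 0, which GCC already provides; it says nothing about
how the proposed builtin would behave, and demo_frame_address is a
hypothetical wrapper name.

```c
#include <assert.h>
#include <stddef.h>

/* Return the frame address of this function.  noinline keeps level 0
   referring to this function's own frame rather than a caller's.  */
__attribute__ ((noinline)) void *
demo_frame_address (void)
{
  void *frame = __builtin_frame_address (0);
  assert (frame != NULL);
  return frame;
}
```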


> +  /* Can't use DRAP if the stack address has been taken.  */
> +  if (cfun->argument_pointer_taken)
> + sorry ("%<__builtin_argument_pointer%> not supported with stack"
> +" realignment.  This may be worked around by adding"
> +" -maccumulate-outgoing-args.");

This doesn't work with DRAP?  Pity :-(

The patch looks plausible, but I of course can not approve it.

Thanks,


Segher


Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 9:58 AM, Segher Boessenkool
 wrote:
> On Wed, Aug 19, 2015 at 08:25:49AM -0700, H.J. Lu wrote:
>> Here is a patch to add __builtin_argument_pointer.  I only have
>
> Sorry to be a pain but...  all the other builtins use _address
> instead of _pointer, it's probably best to follow that.
>
>>  -- Built-in Function: void * __builtin_argument_pointer (void)
>>  This function returns the argument pointer.
>>
>> as documentation.  Can you suggest a better description so that it can
>> be implemented also by other compilers?
>
> Maybe something like (heavily cut'n'pasted):
>
>
> @deftypefn {Built-in Function} {void *} __builtin_argument_address (void)
> This function is similar to @code{__builtin_frame_address} with an
> argument of 0, but it returns the address of the incoming arguments to
> the current function rather than the address of its frame.

This doesn't make sense when there is no argument or arguments
are passed in registers.  To me, argument pointer is a virtual concept
and an implementation detail internal to GCC.  I am not sure if another
compiler can implement it based on this description.

> The exact definition of this address depends upon the processor and the
> calling convention.  Usually some arguments are passed in registers and
> the rest on the stack, and this builtin returns the address of the first
> argument that is on the stack.
>
>
>> +  /* Can't use DRAP if the stack address has been taken.  */
>> +  if (cfun->argument_pointer_taken)
>> + sorry ("%<__builtin_argument_pointer%> not supported with stack"
>> +" realignment.  This may be worked around by adding"
>> +" -maccumulate-outgoing-args.");
>
> This doesn't work with DRAP?  Pity :-(

With DRAP,  we do

  /* Replicate the return address on the stack so that return
 address can be reached via (argp - 1) slot.  This is needed
 to implement macro RETURN_ADDR_RTX and intrinsic function
 expand_builtin_return_addr etc.  */
  t = plus_constant (Pmode, crtl->drap_reg, -UNITS_PER_WORD);
  t = gen_frame_mem (word_mode, t);
  insn = emit_insn (gen_push (t));
  RTX_FRAME_RELATED_P (insn) = 1;

  /* For the purposes of frame and register save area addressing,
 we've started over with a new frame.  */
  m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
  m->fs.realigned = true;

which doesn't work for __builtin_argument_pointer.

> The patch looks plausible, but I of course can not approve it.
>

Thanks.


-- 
H.J.


Re: [PATCH] remove more useless typedefs

2015-08-19 Thread David Malcolm
On Wed, 2015-08-19 at 11:50 -0400, tbsaunde+...@tbsaunde.org wrote:
> From: tbsaunde 
> 
> Hi,
> 
> more typedef cleanup.
> 
> bootstrapped + regtested on x86_64-linux-gnu, committed since preapproved
>  by richi.

[...]

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index db23a0f..32421c5 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,15 @@
> +2015-08-18  Trevor Saunders  
> +
> + * bb-reorder.c, cfgloop.h, collect2.c, combine.c, dse.c,
> + dwarf2cfi.c, gcse-common.h, genopinit.c, ggc-page.c, machmode.h,
> + mcf.c, modulo-sched.c, omp-low.c, read-rtl.c, sched-rgn.c,
> + signop.h, tree-call-cdce.c, tree-dfa.c, tree-diagnostic.c,
> + tree-inline.h, tree-scalar-evolution.c, tree-ssa-address.c,
> + tree-ssa-loop-niter.c, tree-ssa-loop.h, tree-ssa-pre.c,
> + tree-ssa-reassoc.c, tree-ssa-sccvn.h, tree-ssa-structalias.c,
> + tree-ssa-uninit.c, tree-ssa.h, tree-vect-loop-manip.c,
> + tree-vectorizer.h, tree-vrp.c, var-tracking.c: Remove useless
> +

FWIW, it looks like this ChangeLog entry is missing the trailing word
"typedefs".

> diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> index 7a25c39..eb717a0 100644
> --- a/gcc/c-family/ChangeLog
> +++ b/gcc/c-family/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-08-18  Trevor Saunders  
> +
> + * c-ada-spec.h, c-common.c, c-common.h, c-format.c, c-format.h,
> + c-objc.h, c-ppoutput.c, c-pragma.c, c-pragma.h: Remove useless
> +

Likewise.


> diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
> index 275d787..1536b1b 100644
> --- a/gcc/c/ChangeLog
> +++ b/gcc/c/ChangeLog
> @@ -1,3 +1,7 @@
> +2015-08-18  Trevor Saunders  
> +
> + * c-aux-info.c, c-parser.c, c-tree.h: Remove useless typedefs.
> +

...whereas this one looks correct.


> diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
> index 3a63875..9cbaf6c 100644
> --- a/gcc/fortran/ChangeLog
> +++ b/gcc/fortran/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-08-18  Trevor Saunders  
> +
> + * dependency.c, dependency.h, gfortran.h, io.c, module.c,
> + parse.h, resolve.c, trans-types.h, trans.h: remove useless
> +

...and this one looks wrong.


Looking at:
https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=227001

it looks like the copy of the ChangeLog in the commit message contains
the missing words (albeit without indentation), but the actual ChangeLog
files don't.

Dave



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Kyrill Tkachov


On 19/08/15 17:57, Jeff Law wrote:

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:

2015-08-10  Kyrylo Tkachov 

  * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
  then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
  (end_ifcvt_sequence): Call set_used_flags on each insn in the
  sequence.
  Include rtl-iter.h.
  (noce_simple_bbs): New function.
  (noce_try_move): Bail if basic blocks are not simple.
  (noce_try_store_flag): Likewise.
  (noce_try_store_flag_constants): Likewise.
  (noce_try_addcc): Likewise.
  (noce_try_store_flag_mask): Likewise.
  (noce_try_cmove): Likewise.
  (noce_try_minmax): Likewise.
  (noce_try_abs): Likewise.
  (noce_try_sign_mask): Likewise.
  (noce_try_bitop): Likewise.
  (bbs_ok_for_cmove_arith): New function.
  (noce_emit_all_but_last): Likewise.
  (noce_emit_insn): Likewise.
  (noce_emit_bb): Likewise.
  (noce_try_cmove_arith): Handle non-simple basic blocks.
  (insn_valid_noce_process_p): New function.
  (contains_mem_rtx_p): Likewise.
  (bb_valid_for_noce_process_p): Likewise.
  (noce_process_if_block): Allow non-simple basic blocks
  where appropriate.

2015-08-11  Kyrylo Tkachov 

  * gcc.dg/ifcvt-1.c: New test.
  * gcc.dg/ifcvt-2.c: Likewise.
  * gcc.dg/ifcvt-3.c: Likewise.

Thanks for pinging -- I thought I'd already approved this a few days
ago!  But I can't find it in my outbox...  So clearly I didn't finish
the final review.




diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
 return FALSE;
   }

+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+  || !NONJUMP_INSN_P (insn)
+  || (cc && set_of (cc, insn)))
+  return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+  || !noce_operand_ok (SET_DEST (sset))
+  || !noce_operand_ok (SET_SRC (sset)))
+return false;
+
+  return true;
+}
+
+
+  /* Make sure this is a REG and not some instance
+of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
+  if (!REG_P (SET_DEST (sset_b)))
+   {
+ BITMAP_FREE (bba_sets);
+ return false;
+   }

BTW, this is overly conservative.  You're working with pseudos here, so
you can just treat a ZERO_EXTRACT or SUBREG as a read of the full
underlying register.  If this comes up in practice you might consider
handling them as a follow-up patch.  I don't think you need to handle
that case immediately though.


I agree, from my testing and investigation this patch strictly
increases the available opportunities, so being conservative here
should not cause any regressions against existing behaviour.
Making it more aggressive can be done as a follow up though, as you said,
I'm not sure how frequently this comes up in practice.



I also can't remember if we discussed what happens if blocks A & B write
to the same register, do we handle that situation correctly?


Hmmm...
The function bb_valid_for_noce_process_p that we call early on
in noce_process_if_block makes sure that the only live reg out
of each basic block is the final common destination ('x' in the
noce_if_info struct definition). Since both basic blocks satisfy
that check I suppose that means that even if A and B do write to
the same intermediate pseudo that should not affect correctness
since the written-to common register would have to be read within
A and B (and nowhere outside A and B), which would cause it to
fail the bbs_ok_for_cmove_arith check that checks that no regs
written in A are read in B (and vice versa).
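
To make the argument concrete, here is a deliberately simplified model of
that read/write check.  It is not the code in ifcvt.c: the struct, the
64-register limit and every demo_ name are assumptions chosen to keep the
sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a basic block: bit N set means pseudo register N
   is written (respectively read) somewhere in the block.  */
struct demo_block
{
  uint64_t writes;
  uint64_t reads;
};

/* Record that BB writes register REGNO.  */
void
demo_note_write (struct demo_block *bb, unsigned int regno)
{
  assert (regno < 64);
  bb->writes |= UINT64_C (1) << regno;
}

/* Record that BB reads register REGNO.  */
void
demo_note_read (struct demo_block *bb, unsigned int regno)
{
  assert (regno < 64);
  bb->reads |= UINT64_C (1) << regno;
}

/* Return true if nothing written in A is read in B and vice versa.  */
bool
demo_blocks_independent (const struct demo_block *a,
                         const struct demo_block *b)
{
  return (a->writes & b->reads) == 0 && (b->writes & a->reads) == 0;
}
```

In this model, two blocks that both write and then read the same
intermediate register are rejected, because the register written in one
block is read in the other.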



That's the only issue left in my mind.  If we're handling that case
correctly, then this is Ok for the trunk as-is.  Else we'll need another
iteration.


Does the above explanation look ok to you?
If so, I'll be away for a week from Monday so I'd rather commit
the patch when I get back so I can deal with any fallout...

Thanks for the reviews!
Kyrill



Thanks,
Jeff






Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-08-19 Thread Jeff Law

On 08/05/2015 09:16 AM, Martin Liška wrote:


2015-07-09  Martin Liska

* dbgcnt.def: Add new debug counter.
* ipa-icf-gimple.c (func_checker::compare_ssa_name): Use newly added
state flag.
(func_checker::compare_memory_operand): Likewise.
(func_checker::compare_cst_or_decl): Handle if we are in
tail_merge_mode.
(func_checker::reset_preferences): New function.
(func_checker::set_comparing_sensitive_rhs): Likewise.
(func_checker::stmt_local_def): New function.
(func_checker::compare_phi_node): Move from sem_function class.
(func_checker::compare_bb_tail_merge): New function.
(func_checker::compare_bb): Improve STMT iteration.
(func_checker::compare_gimple_call): Return false in case of
an UBSAN function.
(func_checker::compare_gimple_assign): Likewise.
(func_checker::compare_gimple_label): Remove unused flag.
(ssa_names_set): New class.
(ssa_names_set::build): New function.
* ipa-icf-gimple.h (func_checker::gsi_next_nonlocal): New
function.
(ssa_names_set::contains): New function.
(ssa_names_set::add): Likewise.
* ipa-icf.c (sem_function::equals_private): Use transformed
function.
(sem_function::compare_phi_node): Move to func_checker class.
(make_pass_ipa_icf): Change namespace.
* ipa-icf.h: Add new declarations and rename namespace.
* tree-ssa-tail-merge.c (check_edges_correspondence): New
function.
(find_duplicate): Add usage of IPA ICF gimple infrastructure.
(find_clusters_1): Pass new sem_function argument.
(find_clusters): Likewise.
(tail_merge_optimize): Call IPA ICF comparison machinery.
(gvn_uses_equal): Remove.
(gimple_equal_p): Likewise.
(gsi_advance_bw_nondebug_nonlocal): Likewise.
(find_duplicate): Remove unused argument.
(make_pass_tail_merge): New function.
(pass_tail_merge::execute): Likewise.
(equal_ssa_uses): New function.
(same_succ_hash): Skip hashing of call arguments.
(same_succ_hash): Handle NULL value which can occur.
(gimple_operand_equal_value_p): Remove.
(same_phi_alternatives): Use newly added function equal_ssa_uses.
(same_phi_alternatives_1): Pass a new argument.
* passes.def: Add new pass.
* tree-pass.h: Likewise.
* tree-ssa-pre.c (pass_pre::execute): Remove connection to tail-merge
pass.
---



@@ -256,7 +265,8 @@ func_checker::compatible_types_p (tree t1, tree t2)
return true;
  }

-/* Function compare for equality given memory operands T1 and T2.  */
+/* Function compare for equality given memory operands T1 and T2.
+   If STRICT flag is true, versions must match strictly.  */

You've removed the STRICT argument, so you can probably drop this comment.


@@ -626,6 +665,138 @@ func_checker::parse_labels (sem_bb *bb)
  }
  }

+/* Return true if gimple STMT is just a local definition in a
+   basic block.  Local definition in this context means that a product
+   of the statement (transitively) does not escape the basic block.
+   Used SSA names are contained in SSA_NAMES_SET.  */
+
+bool
+func_checker::stmt_local_def (gimple stmt, ssa_names_set *ssa_names_set)

Funny, Kyrill just implemented something similar, but at the RTL level.



@@ -1037,4 +1252,67 @@ func_checker::compare_gimple_asm (const gasm *g1, const 
gasm *g2)
return true;
  }

-} // ipa_icf_gimple namespace
+void
+ssa_names_set::build (basic_block bb)

My only concern here is whether or not the two passes are sufficient.  I 
can kind of intuitively see how it works most of the time, but what if 
BB is a single node loop (i.e., it branches back to itself)?  Do we really 
get the transitive closure we want in that case?



So if you can assure me we're doing the right thing for single 
node loops in ssa_names_set::build, and you remove the one comment change 
noted above, we'll be good to go for the trunk.


jeff


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Jeff Law

On 08/19/2015 11:20 AM, Kyrill Tkachov wrote:


Hmmm...
The function bb_valid_for_noce_process_p that we call early on
in noce_process_if_block makes sure that the only live reg out
of each basic block is the final common destination ('x' in the
noce_if_info struct definition). Since both basic blocks satisfy
that check I suppose that means that even if A and B do write to
the same intermediate pseudo that should not affect correctness
since the written-to common register would have to be read within
A and B (and nowhere outside A and B), which would cause it to
fail the bbs_ok_for_cmove_arith check that checks that no regs
written in A are read in B (and vice versa).

Excellent.  That answers things quite well.  In retrospect, I could have 
figured that out myself :-)






That's the only issue left in my mind.  If we're handling that case
correctly, then this is Ok for the trunk as-is.  Else we'll need another
iteration.


Does the above explanation look ok to you?
If so, I'll be away for a week from Monday so I'd rather commit
the patch when I get back so I can deal with any fallout...

That's fine with me.  Commit at your leisure.

Thanks,
jeff



[gomp4] Fix acc_on_device xform

2015-08-19 Thread Nathan Sidwell
I've committed this fix for a typo I introduced yesterday (and not testing what 
I thought I was testing).  Sadly passing a gimple_seq to gsi_replace doesn't 
lead to a compile error, but to bad runtime behaviour.
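
A toy illustration of why this kind of mistake compiles silently, assuming the
sequence type is merely a typedef of the statement pointer type (which seems
to be the case in this tree, but treat that as an assumption).  All names
below are hypothetical, and the two helpers are not the gsi_replace or
gsi_replace_with_seq implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement node; statements are chained through NEXT.  */
struct stmt
{
  int id;
  struct stmt *next;
};

typedef struct stmt *stmt_t;
/* Assumption: a sequence is just an alias for "pointer to first statement",
   so the compiler cannot tell the two apart.  */
typedef stmt_t stmt_seq;

/* Toy single-statement replace: installs NEW_STMT in place of *SLOT.
   Anything chained after NEW_STMT is silently dropped.  */
static void
replace_one (stmt_t *slot, stmt_t new_stmt)
{
  assert (slot && *slot && new_stmt);
  new_stmt->next = (*slot)->next;
  *slot = new_stmt;
}

/* Toy sequence replace: splices the whole of SEQ in place of *SLOT.  */
static void
replace_with_seq (stmt_t *slot, stmt_seq seq)
{
  assert (slot && *slot && seq);
  stmt_t last = seq;
  while (last->next)
    last = last->next;
  last->next = (*slot)->next;
  *slot = seq;
}

/* Number of statements reachable from S.  */
static int
chain_length (stmt_t s)
{
  int n = 0;
  for (; s; s = s->next)
    n++;
  return n;
}
```

Passing a two-statement sequence to replace_one type-checks, but in this toy
model the second statement is lost; only replace_with_seq keeps both.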


nathan
2015-08-19  Nathan Sidwell  

	* omp-low.c (oacc_xform_on_device): Fix thinko in previous change.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 226986)
+++ gcc/omp-low.c	(working copy)
@@ -14619,14 +14621,14 @@ oacc_xform_on_device (gimple stmt)
 #endif
   result = fold_convert (integer_type_node, result);
   tree lhs = gimple_call_lhs (stmt);
-  gimple_seq replace = NULL;
+  gimple_seq seq = NULL;
 
   push_gimplify_context (true);
-  gimplify_assign (lhs, result, &replace);
+  gimplify_assign (lhs, result, &seq);
   pop_gimplify_context (NULL);
 
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  gsi_replace (&gsi, replace, false);
+  gsi_replace_with_seq (&gsi, seq, false);
 }
 
 /* Transform oacc_dim_size and oacc_dim_pos internal function calls to


Re: [PR64164] drop copyrename, integrate into expand

2015-08-19 Thread Alexandre Oliva
On Aug 19, 2015, Andreas Schwab  wrote:

> Andreas Schwab  writes:
>> Alexandre Oliva  writes:
>> 
>>> [PR64164] fix regressions reported on m68k and armeb
>>> 
>>> From: Alexandre Oliva 
>>> 
>>> Defer stack slot address assignment for all parms that can't live in
>>> pseudos, and accept pseudos assignments in assign_param_setup_block.
>> 
>> That doesn't fix the ia64 Ada miscompilation though.

That's not surprising, it's the first I hear of it ;-)

> I mean miscomparison, not miscompilation.  The difference is only in the
> insn scheduling.

Interesting.  I have a hard time figuring out how this could follow from
the patchset at hand, but...  let's try to figure it out.

I'm having some difficulty getting access to an ia64 box ATM, and for
ada bootstraps, a cross won't do, so...  if you still have that build
tree around, any chance you could recompile par.o with both stage1 and
stage2, with -fdump-rtl-expand-details, and email me the compiler dump
files?  Maybe that will suffice to figure out where the difference might
come from.

Thanks in advance,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2015 at 10:08:01AM -0700, H.J. Lu wrote:
> > Maybe something like (heavily cut'n'pasted):
> >
> >
> > @deftypefn {Built-in Function} {void *} __builtin_argument_address (void)
> > This function is similar to @code{__builtin_frame_address} with an
> > argument of 0, but it returns the address of the incoming arguments to
> > the current function rather than the address of its frame.
> 
> This doesn't make sense when there is no argument or arguments
> are passed in registers.

Sure, but see the weasel-words below ("The exact...")

> To me, argument pointer is a virtual concept
> and an implementation detail internal to GCC.  I am not sure if another
> compiler can implement it based on this description.

The same is true for frame_address, on many machines.

> > The exact definition of this address depends upon the processor and the
> > calling convention.  Usually some arguments are passed in registers and
> > the rest on the stack, and this builtin returns the address of the first
> > argument that is on the stack.


Segher


Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 10:48 AM, Segher Boessenkool
 wrote:
> On Wed, Aug 19, 2015 at 10:08:01AM -0700, H.J. Lu wrote:
>> > Maybe something like (heavily cut'n'pasted):
>> >
>> >
>> > @deftypefn {Built-in Function} {void *} __builtin_argument_address (void)
>> > This function is similar to @code{__builtin_frame_address} with an
>> > argument of 0, but it returns the address of the incoming arguments to
>> > the current function rather than the address of its frame.
>>
>> This doesn't make sense when there is no argument or arguments
>> are passed in registers.
>
> Sure, but see the weasel-words below ("The exact...")
>
>> To me, argument pointer is a virtual concept
>> and an implementation detail internal to GCC.  I am not sure if another
>> compiler can implement it based on this description.
>
> The same is true for frame_address, on many machines.

Stack frame is well understood unlike argument pointer which is
pretty vague.

>> > The exact definition of this address depends upon the processor and the
>> > calling convention.  Usually some arguments are passed in registers and
>> > the rest on the stack, and this builtin returns the address of the first
>> > argument that is on the stack.
>
>
> Segher



-- 
H.J.


Re: [PATCH] remove more useless typedefs

2015-08-19 Thread Trevor Saunders
On Wed, Aug 19, 2015 at 01:11:04PM -0400, David Malcolm wrote:
> On Wed, 2015-08-19 at 11:50 -0400, tbsaunde+...@tbsaunde.org wrote:
> > From: tbsaunde 
> > 
> > Hi,
> > 
> > more typedef cleanup.
> > 
> > bootstrapped + regtested on x86_64-linux-gnu, committed since preapproved
> >  by richi.
> 
> [...]
> 
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index db23a0f..32421c5 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,15 @@
> > +2015-08-18  Trevor Saunders  
> > +
> > +   * bb-reorder.c, cfgloop.h, collect2.c, combine.c, dse.c,
> > +   dwarf2cfi.c, gcse-common.h, genopinit.c, ggc-page.c, machmode.h,
> > +   mcf.c, modulo-sched.c, omp-low.c, read-rtl.c, sched-rgn.c,
> > +   signop.h, tree-call-cdce.c, tree-dfa.c, tree-diagnostic.c,
> > +   tree-inline.h, tree-scalar-evolution.c, tree-ssa-address.c,
> > +   tree-ssa-loop-niter.c, tree-ssa-loop.h, tree-ssa-pre.c,
> > +   tree-ssa-reassoc.c, tree-ssa-sccvn.h, tree-ssa-structalias.c,
> > +   tree-ssa-uninit.c, tree-ssa.h, tree-vect-loop-manip.c,
> > +   tree-vectorizer.h, tree-vrp.c, var-tracking.c: Remove useless
> > +
> 
> FWIW, it looks like this ChangeLog entry is missing the trailing word
> "typedefs".
> 
> > diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> > index 7a25c39..eb717a0 100644
> > --- a/gcc/c-family/ChangeLog
> > +++ b/gcc/c-family/ChangeLog
> > @@ -1,3 +1,8 @@
> > +2015-08-18  Trevor Saunders  
> > +
> > +   * c-ada-spec.h, c-common.c, c-common.h, c-format.c, c-format.h,
> > +   c-objc.h, c-ppoutput.c, c-pragma.c, c-pragma.h: Remove useless
> > +
> 
> Likewise.
> 
> 
> > diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
> > index 275d787..1536b1b 100644
> > --- a/gcc/c/ChangeLog
> > +++ b/gcc/c/ChangeLog
> > @@ -1,3 +1,7 @@
> > +2015-08-18  Trevor Saunders  
> > +
> > +   * c-aux-info.c, c-parser.c, c-tree.h: Remove useless typedefs.
> > +
> 
> ...whereas this one looks correct.
> 
> 
> > diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
> > index 3a63875..9cbaf6c 100644
> > --- a/gcc/fortran/ChangeLog
> > +++ b/gcc/fortran/ChangeLog
> > @@ -1,3 +1,8 @@
> > +2015-08-18  Trevor Saunders  
> > +
> > +   * dependency.c, dependency.h, gfortran.h, io.c, module.c,
> > +   parse.h, resolve.c, trans-types.h, trans.h: remove useless
> > +
> 
> ...and this one looks wrong.
> 
> 
> Looking at:
> https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=227001
> 
> it looks like the copy of the ChangeLog in the commit message contains
> the missing words (albeit without indentation), but the actual ChangeLog
> files don't.

oops! I think what happened is vim didn't autoindent for some reason and
I didn't notice (I was tired while writing the logs) then my script to
apply change logs from commit messages was buggy and just dropped those
lines :(

btw I should get around to putting a better version of that script in
contrib/ unless someone else has an actually good script to update
ChangeLog files from commit messages.

I guess we could also have a commit hook that checks commit and
ChangeLog format.

Trev

> 
> Dave
> 


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-19 Thread Matthias Klose
On 08/18/2015 10:36 PM, Lynn A. Boger wrote:
> As discussed in PR 66870, for ppc64 and ppc64le it is preferred to
>  use the gold linker with gccgo or gcc if the split stack option is enabled.
> Use of the gold linker with the split stack option results in less storage
> allocated for goroutine stacks; if split stack is used without the gold
> linker then some testcase failures can occur.
> 
>   Since the use of the gold linker has not been well tested
> with all gcc compilers on Power, it is only used as the linker if the
> split stack option is used.
> 
> This adds the capability to the configure for gcc and libgo to determine
> if the gold linker is available at build time, either in the path or explicitly
> configured, and its version supports split stack.  If that is the case then
> defines are set that cause the gold linker to be used by the compiler when
> -fsplit-stack is used.  This applies to ppc64 and ppc64le.  Other platforms
> with split stack work as before.
> 
> 2015-08-18  Lynn Boger 
> 
> gcc/
> PR target/66870
> config/rs6000/linux64.h: When HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK
> is defined add -fuse-ld=gold if fsplit-stack and not m32
> config/rs6000/sysv4.h:  Define TARGET_CAN_SPLIT_STACK based on
> LIBC version.
> config.in:  Set up HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK.
> configure.ac:  When gold linker is available and its version
> supports split stack on ppc64, ppc64le, define
> HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK.
> configure:  Regenerate.
> 
> libgo/
> PR target/66870
> configure.ac:  When gccgo for building libgo uses the gold version
> containing split stack support on ppc64, ppc64le, define
> LINKER_SUPPORTS_SPLIT_STACK.
> configure:  Regenerate.
> 
> For platforms other than ppc64 and ppc64le, the configure for gcc
> and libgo behave as before.

why keep the old behaviour for other archs that have split stack support? Is it
really necessary to make this dependent on the target? I'm still using an
unreviewed/unpinged patch to enable gold for gccgo (attached).

Matthias


# DP: Pass -fuse-ld=gold to gccgo on targets supporting -fsplit-stack

gcc/go/

	* gospec.c (lang_specific_driver): Pass -fuse-ld=gold on targets
	supporting -fsplit-stack, unless overwritten.

gcc/

	* configure.ac: New define HAVE_GOLD_NON_DEFAULT.
	* config.in: Regenerate.
 
Index: b/src/gcc/go/gospec.c
===
--- a/src/gcc/go/gospec.c
+++ b/src/gcc/go/gospec.c
@@ -117,6 +117,10 @@ lang_specific_driver (struct cl_decoded_
   /* Whether the -S option was used.  */
   bool saw_opt_S = false;
 
+  /* "-fuse-ld=" if it appears on the command line.  */
+  bool saw_use_ld ATTRIBUTE_UNUSED = false;
+  int need_gold = 0;
+
   /* The first input file with an extension of .go.  */
   const char *first_go_file = NULL;  
 
@@ -217,6 +221,11 @@ lang_specific_driver (struct cl_decoded_
}
 
  break;
+
+   case OPT_fuse_ld_bfd:
+   case OPT_fuse_ld_gold:
+ saw_use_ld = true;
+ break;
}
 }
 
@@ -226,8 +235,14 @@ lang_specific_driver (struct cl_decoded_
   shared_libgcc = 0;
 #endif
 
+#if defined(TARGET_CAN_SPLIT_STACK) && defined(HAVE_GOLD_NON_DEFAULT)
+  if (!saw_use_ld)
+need_gold = 1;
+#endif
+
   /* Make sure to have room for the trailing NULL argument.  */
-  num_args = argc + need_math + shared_libgcc + (library > 0) * 5 + 10;
+  num_args = argc + need_math + shared_libgcc + need_gold +
+(library > 0) * 5 + 10;
   new_decoded_options = XNEWVEC (struct cl_decoded_option, num_args);
 
   i = 0;
@@ -244,6 +259,14 @@ lang_specific_driver (struct cl_decoded_
   &new_decoded_options[j]);
   j++;
 }
+#ifdef HAVE_GOLD_NON_DEFAULT
+  if (need_gold)
+{
+  generate_option (OPT_fuse_ld_gold, NULL, 1, CL_DRIVER,
+  &new_decoded_options[j]);
+  j++;
+}
+#endif
 #endif
 
   /* NOTE: We start at 1 now, not 0.  */
Index: b/src/gcc/config.in
===
--- a/src/gcc/config.in
+++ b/src/gcc/config.in
@@ -1277,6 +1277,12 @@
 #endif
 
 
+/* Define if the gold linker is available as a non-default */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GOLD_NON_

Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-19 Thread David Edelsohn
On Wed, Aug 19, 2015 at 3:33 PM, Matthias Klose  wrote:

> why keep the old behaviour for other archs that have split stack support? Is it
> really necessary to make this dependent on the target? I'm still using an
> unreviewed/unpinged patch to enable gold for gccgo (attached).

I much prefer your patch.

Thanks, David


[gomp4] New reduction infrastructure for OpenACC

2015-08-19 Thread Cesar Philippidis
This patch introduces an infrastructure for reductions in OpenACC. This
infrastructure consists of four internal functions,
GOACC_REDUCTION_SETUP, GOACC_REDUCTION_INIT, GOACC_REDUCTION_FINI, and
GOACC_REDUCTION_TEARDOWN, along with a new target hook goacc.reduction.
Each internal function shares a common interface:

  var = ifn (*ref_to_res, local_var, level, op, lid, rid)

var is the intermediate and private result of the reduction. Usually,
var = local_var.

*ref_to_res is a pointer to the resulting reduction. This is only
non-NULL for gang reductions. All other reductions operate on local
variables for which var will suffice.

local_var is a local (private) copy of the reduction variable.

level is the GOMP_DIM of the reduction. Each function call may only
contain one dim. If a loop is a combination of gang, worker and vector,
then ifn must be called once per dim.

op is the reduction operation.

lid is a unique loop ID. It's not 100% unique because it might get reset
in different TUs.

rid is the reduction ID within a loop. E.g., if a loop has two
reductions associated with it, the first could be designated zero and
the second one.

The target hook takes in one argument, the gimple statement containing
the call to the internal reduction function, and it returns true if it
introduces any calls to other target functions. This was necessary for
the nvptx backend, specifically for vector INIT because the thread ID is
necessary.

Each internal function is expanded during execute_oacc_transform using
that goacc reduction target hook. This allows us to generate
target-specific code while lowering it in a target-independent manner.

There are a couple of significant changes in this patch over the
existing OpenMP reduction implementation. The first change is that
reductions no longer rely on special ganglocal mappings. Certain
targets, such as nvptx gpus, have a distributed memory hierarchy. On
nvptx targets, all of the processors are partitioned into blocks. Each
block has a limited amount of shared memory. Because of the way the OpenACC
spec is written, we were initially mapping nvptx's shared memory into
gang-local memory. However, Nathan's worker and vector state propagator
is robust enough that we were able to eliminate the ganglocal mappings
altogether.

While this new infrastructure allows us to eliminate the ganglocal
mappings, nvptx still needs to use shared memory for worker reductions.
Consider the following example where red is private:

  #pragma acc loop worker reduction (+:red)
  for (...)
    red++;

This loop would expand to this during omp-lower:

  red = GOACC_REDUCTION_SETUP (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);
  GOACC_FORK (GOMP_DIM_WORKER);
  red = GOACC_REDUCTION_INIT (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);

  for (...)
    red++;

  red = GOACC_REDUCTION_FINI (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);
  GOACC_JOIN (GOMP_DIM_WORKER);
  red = GOACC_REDUCTION_TEARDOWN (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);

For nvptx targets, SETUP and TEARDOWN are responsible for allocating and
freeing shared memory. INIT is responsible for initializing the private
reduction variable. This is necessary for vector reductions because we
want thread 0 to contain the original value of local_var, and the other
threads to be initialized to the proper value for 'op'. All of the
intermediate reduction results are combined in FINI and written back to
var or *ref_to_res, whichever is necessary, in TEARDOWN.
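
To make the INIT/FINI split described above concrete, here is a small
sequential sketch in plain C.  It is not the GOACC_REDUCTION_* internal
functions or the nvptx expansion; the function names, the int-only element
type and the restriction to '+' and '*' are assumptions made for the example.
An array stands in for the per-thread private copies.

```c
#include <assert.h>

/* Neutral element for OP (only '+' and '*' are modelled here).  */
static int
reduction_identity (char op)
{
  assert (op == '+' || op == '*');
  return op == '+' ? 0 : 1;
}

/* Combine two partial results according to OP.  */
static int
reduction_combine (char op, int a, int b)
{
  assert (op == '+' || op == '*');
  return op == '+' ? a + b : a * b;
}

/* INIT step: thread 0 keeps the original value of the reduction variable,
   every other thread starts from the identity for OP.  */
static void
reduction_init (int *priv, int nthreads, int original, char op)
{
  assert (nthreads > 0);
  priv[0] = original;
  for (int i = 1; i < nthreads; i++)
    priv[i] = reduction_identity (op);
}

/* FINI step: merge all per-thread partial results into one value.  */
static int
reduction_fini (const int *priv, int nthreads, char op)
{
  assert (nthreads > 0);
  int result = priv[0];
  for (int i = 1; i < nthreads; i++)
    result = reduction_combine (op, result, priv[i]);
  return result;
}
```

Seeding only thread 0 with the original value means the original contributes
exactly once to the merged result, regardless of how many threads take part.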

I don't want to delve too much into the use of this infrastructure right
now. We do have a design for that, and I intend to present more details
when I post the lowering patch. The next patch will likely be the nvptx
changes though.

One of the reasons why we needed to create this generic interface was to
implement vector reductions on nvptx targets. On nvptx targets, we're
mapping vectors to warps. That's fine, but warps cannot use spinlocks or
the warp will deadlock. As a consequence, we can't use the existing
OpenMP atomic reductions in OpenACC. The way I got around the spinlock
problem in 5.0 was by allocating an array of length vector_length, and
stashing all of the intermediate reductions in there. Then later on, one
thread would merge all of those reductions together.

This new reduction infrastructure provides a more elegant solution for
OpenACC reduction. And while we're still using atomic operations for
gang and worker reductions, we're no longer using a global lock for
workers. This api allows us to use a lock in shared memory for workers.
That said, this infrastructure does provide sufficient flexibility to
implement tree reductions for gangs and workers later on.

It should be noted that this is not a replacement for the existing
OpenMP reductions. Rather, OpenMP will continue to use
lower_reduction_clauses and friends, while OpenACC will use this
infrastructure. That said, OpenMP could be taught to use this infrastructure.

Is this patch OK for gomp-4_0-branch?

Thanks,
Cesar
2015-08-19  Cesar Philippidis  


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-19 Thread Jeff Law

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:

All:

Please find the updated patch with suggestion and feedback
incorporated.

Thanks Jeff and Richard for the review comments.

Following changes were done based on the feedback on RFC comments
and the review for the previous patch.

1. Both tracer and path splitting pass are separate passes so  that
two instances of the pass will run in the end, one doing path
splitting and one doing  tracing, at different times in the
optimization pipeline.

I'll have to think about this.  I'm not sure I agree totally with
Richi's assertion that we should share code with the tracer pass, but
I'll give it a good looksie.




2. Transform code is shared for tracer and path splitting pass. The
common code is extracted into a function, transform_duplicate, which is
placed in tracer.c, and the path splitting pass uses the transform code.

OK.  I'll take a good look at that.



3. Analysis for the basic block population and traversing the basic
blocks using the Fibonacci heap is commonly used. This cannot be
factored out into a new function as the tracer pass does more analysis
based on the profile, and different heuristics are used in the tracer
and path splitting passes.

Understood.



4. The included headers are minimal: only what is required
for the path splitting pass.

Thanks.



5. The earlier patch did the SSA updating with a replace function to
preserve the SSA representation required to move the loop latch node
(the same as the join block) to its predecessors, with the loop latch
node being just a forwarder block. Such replace functions are not
required, as suggested by Jeff. They go away with this patch, and the
transformation code is factored into a function which is shared
between the tracer and path splitting passes.

Sounds good.



Bootstrapping with i386 and Microblaze target works fine. No
regression is seen in Deja GNU tests for Microblaze. There are
lesser failures. Mibench/EEMBC benchmarks were run for Microblaze
target and the gain of 9.3% is seen in rgbcmy_lite the EEMBC
benchmarks.
What do you mean by there are "lesser failures"?  Are you saying there 
are cases where path splitting generates incorrect code, or cases where 
path splitting produces code that is less efficient, or something else?




SPEC 2000 benchmarks were run with i386 target and the following
performance number is achieved.

INT benchmarks with path splitting(ratio) Vs INT benchmarks without
path splitting(ratio) = 3661.225091 vs 3621.520572

That's an impressive improvement.

Anyway, I'll start taking a close look at this momentarily.

Jeff


[C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Paolo Carlini

Hi,

submitter noticed that, in violation of [basic.start.main], we don't 
reject as ill-formed the declaration of a 'main' variable in the global 
namespace. Not a big deal IMHO, but the below simple check works well 
for me on x86_64-linux.


Thanks,
Paolo.

//
/cp
2015-08-19  Paolo Carlini  

PR c++/67065
* decl.c (grokvardecl): Reject 'main' as global variable.

/testsuite
2015-08-19  Paolo Carlini  

PR c++/67065
* g++.dg/other/pr67065.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 227003)
+++ cp/decl.c   (working copy)
@@ -8355,6 +8355,11 @@ grokvardecl (tree type,
   else
 DECL_INTERFACE_KNOWN (decl) = 1;
 
+  if (DECL_NAME (decl)
+  && MAIN_NAME_P (DECL_NAME (decl))
+  && CP_DECL_CONTEXT (decl) == global_namespace)
+error ("cannot declare %<::main%> to be a global variable");
+
   /* Check that the variable can be safely declared as a concept.
  Note that this also forbids explicit specializations.  */
   if (conceptp)
Index: testsuite/g++.dg/other/pr67065.C
===
--- testsuite/g++.dg/other/pr67065.C(revision 0)
+++ testsuite/g++.dg/other/pr67065.C(working copy)
@@ -0,0 +1,3 @@
+// PR c++/67065
+
+int main;  // { dg-error "cannot declare" }


Re: [PATCH][2/n] Change dw2_asm_output_offset to allow assembling extra offset

2015-08-19 Thread Mike Stump
On Aug 19, 2015, at 7:25 AM, Richard Biener  wrote:
> 
> This is needed so that we can output references to $early-debug-symbol + 
> constant offset where $early-debug-symbol is the beginning of a 
> .debug_info section containing early debug info from the compile-stage.
> Constant offsets are always fine for any object formats I know,

On darwin, they, generally speaking, are not. subsections_via_symbols can shed 
some light on the topic, if one is interested in all the fun.  I’ll give a quick 
intro below.

  foo+n

only works if there is no other label of a certain type between foo and 
foo+n, and there are no labels of a certain type at foo+n, and foo+n refers to 
at least one byte after that label, and n is non-negative and …

So, for example, in

nop
  foo:
nop

foo+32 would be invalid as nops are 4 bytes or so, and +32 is beyond the size 
of the region.  foo+0 would be fine.  foo+4 would be invalid, assuming nop generates 
4 bytes.  foo-4 would be invalid.

In:

  foo:
nop
  bar:
nop

foo+4 would be invalid, as bar exists.

In:

foo:
nop
L12:
nop

foo+4 is fine, as local labels don’t participate.  One way to think about this 
is imagine that each global label points to an independent section and that 
section isn’t loaded unless something refers to it, and one can only have 
pointers to the bytes inside that section, and that sections on output can be 
arbitrarily ordered.

  bar: nop
  foo: nop

bar+4, even if you deferred this to running code, need not refer to foo.

I say this as background.

In the optimization where gcc tries to bunch up global variables together and 
form base+offset to get to the different data, this does not work on darwin 
because base+offset isn’t a valid way to go from one global label to the next, 
even in the same section.

Now, if you merely sneak in data into the section with no labels and you need 
to account for N extra bytes before then you can change the existing reference 
to what it was before + N, without any worry.  If you remove the interior 
labels to form your new base, and concatenate all the data together, then 
base+N to refer to the data is fine, if there are at least N+1 bytes of data 
after base.

foo: nop
bar: nop

would become:

base:
Lfoo: nop
Lbar: nop

base+0 and base+4.
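
The rules above can be captured in a toy model.  This is a sketch of the
rules as described in this message, not of what the darwin linker actually
implements: the label struct, the use of a plain "global" flag to stand for
"labels of a certain type", and the function name are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical label within one section: byte offset plus whether it is
   the kind of (global) label that starts a new independent region.  */
struct label
{
  long offset;
  bool global;
};

/* Return true if LABELS[BASE] + N is an acceptable reference under the
   toy rules: N is non-negative and the target byte lies before the next
   global label (or the end of the section).  Local labels are ignored.  */
static bool
label_plus_offset_ok (const struct label *labels, size_t nlabels,
		      long section_size, size_t base, long n)
{
  assert (base < nlabels);
  if (n < 0)
    return false;

  long start = labels[base].offset;
  long end = section_size;
  for (size_t i = 0; i < nlabels; i++)
    if (i != base
	&& labels[i].global
	&& labels[i].offset > start
	&& labels[i].offset < end)
      end = labels[i].offset;

  /* There must be at least N+1 bytes of data after the base label.  */
  return start + n < end;
}
```

With 4-byte nops this reproduces the cases above: foo+4 is rejected when bar
is a global label at that address, and accepted when only the local L12 is
there.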

So, if you're confident you know and follow the rules, ok from my perspective.  If 
you’re unsure, I can try and read a .s file and see if it looks ok.  Testing 
may not catch broken things unless you also select dead code stripping 
and try test cases with dead code.

> The LTO support adds a single call here:
> 
> @@ -9064,8 +9248,12 @@ output_die (dw_die_ref die)
>                 size = DWARF2_ADDR_SIZE;
>               else
>                 size = DWARF_OFFSET_SIZE;
> -             dw2_asm_output_offset (size, sym, debug_info_section, "%s",
> -                                    name);
> +             if (AT_ref (a)->with_offset)
> +               dw2_asm_output_offset (size, sym, AT_ref (a)->die_offset,
> +                                      debug_info_section, "%s", name);
> +             else
> +               dw2_asm_output_offset (size, sym, debug_info_section, "%s",
> +                                      name);
>             }

So, I glanced around this call site, and it would seem safe if all you’re doing 
is adding die_offset bytes of data or more and no global labels.

[C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Ville Voutilainen
>submitter noticed that, in violation of [basic.start.main], we don't reject
>as ill-formed the declaration of a 'main' variable in the global namespace.
>Not a big deal IMHO, but the below simple check works well for me on 
>x86_64-linux.

Just fyi, gcc accepts

decltype(main) x;

decltype(sizeof(decltype(main)*)) x;

which are "uses" of main and also violate [basic.start.main]/3.


Re: [C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Paolo Carlini

Hi Ville,

On 08/19/2015 10:12 PM, Ville Voutilainen wrote:

submitter noticed that, in violation of [basic.start.main], we don't reject
as ill-formed the declaration of a 'main' variable in the global namespace.
Not a big deal IMHO, but the below simple check works well for me on 
x86_64-linux.

Just fyi, gcc accepts

decltype(main) x;

decltype(sizeof(decltype(main)*)) x;

which are "uses" of main and also violate [basic.start.main]/3.
"good" to know. In my experience sometimes the front end appears to 
so-to-speak pre-declare entities. For instance I filed a while ago 
c++/48396. Not sure if in practice the exact same code is involved...


Paolo.


[nvptx] testsuite cleanups

2015-08-19 Thread Nathan Sidwell

This patch cleans up a bunch of C testsuite failures (by skipping them).

1) make nvptx-*-* a freestanding environment.  While there is a newlib port, 
it's not a full c library, and in particular doesn't have all the IO that's 
generally presumed.


2) added effective_target_global_constructor.  nvptx lacks these, and the 
environment for which it's intended doesn't really need them,


3) Some tests already check 'SIGNAL_SUPPRESS' to avoid signals.  Added smarts in 
gcc.exp to set that from the board info.


4) skip the dwarf tests entirely.  PTX dwarf directives are somewhat funky and 
it's just meaningless noise in the testsuite right now.


5) skip  tests that cause ptxas to blow up.  There's no point waiting for ptxas 
to be fixed.


6) .. except for callind, which is fixed simply by not naming a function 'call'.

7) mul-subnormal-single-1 had a full 3 argument definition of main,  but doesn't 
 need it.


8) added check for non-freestanding to a bunch of tests that require more IO than 
ptx can provide.


9) added check for  nonlocal_goto on a bunch of tests that used setjmp (builtin 
or otherwise).


10) added check for global constructor on a test.

11) added checks for profiling on some tests that check profiling.

This isn't a full cleanup, but a first pass to remove a bunch of false 
negatives.  I expect further cleanups and/or fixes later.


Any comments or objections?

nathan
2015-08-19  Nathan Sidwell  

	* lib/target-supports.exp (check_effective_target_freestanding):
	nvptx is freestanding.
	(check_effective_target_global_constructor): New.
	* lib/gcc.exp (gcc_target_compile): Set SIGNAL_SUPPRESS if needed.
	* gcc.dg/debug/debug.exp: Skip  for nvptx.
	* gcc.dg/debug/dwarf2/dwarf2.exp: Likewise.

	* gcc.c-torture/execute/981019-1.c: Ptx assembler bug.
	* gcc.c-torture/compile/limits-externdecl.c: Likewise.
	* gcc.c-torture/compile/pr33855.c: Likewise.
	* gcc.c-torture/compile/920723-1.c: Likewise.

	* gcc.c-torture/compile/callind.c: Change name to avoid ptxas bug.

	* gcc.c-torture/execute/ieee/mul-subnormal-single-1.c: Adjust main
	decl to be more normal.

	* gcc.c-torture/execute/pr34456.c: Require not freestanding
	* gcc.c-torture/execute/vprintf-chk-1.c: Likewise.
	* gcc.c-torture/execute/vfprintf-1.c: Likewise.
	* gcc.c-torture/execute/gofast.c: Likewise.
	* gcc.c-torture/execute/fprintf-1.c: Likewise.
	* gcc.c-torture/execute/fprintf-chk-1.c: Likewise.
	* gcc.c-torture/execute/vprintf-1.c: Likewise.
	* gcc.c-torture/execute/vfprintf-chk-1.c: Likewise.

	* gcc.c-torture/execute/builtins/sprintf-chk.x: Require nonlocal goto.
	* gcc.c-torture/execute/builtins/snprintf-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/memmove-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/stpcpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/memcpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/mempcpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/vsnprintf-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/memset-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/strcpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/strcat-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/stpncpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/strncpy-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/vsprintf-chk.x: Likewise.
	* gcc.c-torture/execute/builtins/strncat-chk.x: Likewise.
	* gcc.dg/setjmp-1.c: Likewise.
	* gcc.dg/cleanup-12.c: Likewise.
	* gcc.dg/cleanup-13.c: Likewise.
	* gcc.dg/cleanup-5.c: Likewise.

	* gcc.dg/constructor-1.c: Require global ctor.

	* gcc.dg/fork-instrumentation.c: Require profiling.
	* gcc.dg/20030107-1.c: Likewise.
	* gcc.dg/20030702-1.c: Likewise.

Index: lib/target-supports.exp
===
--- lib/target-supports.exp	(revision 453995)
+++ lib/target-supports.exp	(working copy)
@@ -580,7 +580,10 @@ proc check_profiling_available { test_wh
 # in Section 4 of C99 standard. Effectively, it is a target which supports no
 # extra headers or libraries other than what is considered essential.
 proc check_effective_target_freestanding { } {
-	return 0
+if { [istarget nvptx-*-*] } {
+	return 1
+}
+return 0
 }
 
 # Return 1 if target has packed layout of structure members by
@@ -641,6 +644,15 @@ proc check_effective_target_nonlocal_got
 if { [istarget nvptx-*-*] } {
 	return 0
 }
+return 1
+}
+
+# Return 1 if global constructors are supported, 0 otherwise.
+
+proc check_effective_target_global_constructor {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
 return 1
 }
 
Index: lib/gcc.exp
===
--- lib/gcc.exp	(revision 453995)
+++ lib/gcc.exp	(working copy)
@@ -150,6 +150,9 @@ proc gcc_target_compile { source dest ty
 if [target_info exists gcc,no_label_values] {
 	lappend options "additional_flags=-DNO_LABEL_VALUES"
 }
+if [target_info exists gcc,signal_suppress] {
+	lappend options "additional_flags=-DSIGNAL

Re: [C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Ville Voutilainen
On 19 August 2015 at 23:26, Paolo Carlini  wrote:
> Hi Ville,
>
>
> On 08/19/2015 10:12 PM, Ville Voutilainen wrote:
>>>
>>> submitter noticed that, in violation of [basic.start.main], we don't
>>> reject
>>> as ill-formed the declaration of a 'main' variable in the global
>>> namespace.
>>> Not a big deal IMHO, but the below simple check works well for me on
>>> x86_64-linux.
>>
>> Just fyi, gcc accepts
>>
>> decltype(main) x;
>>
>> decltype(sizeof(decltype(main)*)) x;
>>
>> which are "uses" of main and also violate [basic.start.main]/3.
>
> "good" to know. In my experience sometimes the front end appears to
> so-to-speak pre-declare entities. For instance I filed a while ago
> c++/48396. Not sure if in practice the exact same code is involved...

Let me clarify: this is not about that. It's code like

int main() {}

decltype(main) x;

whereas just having

decltype(main) x;

as the whole program will diagnose the use of an undeclared identifier.
Nevertheless, the standard allows no use of main at all (not just odr-use,
but any use of main as the entry-point pseudo-function), yet gcc accepts
some of them. gcc rejects attempts to call main even in such decltype
contexts, but it can be fooled into allowing other uses of main.
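
As a side illustration of why declarations of this shape parse at all:
the operand of decltype is unevaluated, and decltype applied to a function
name simply yields the function type. The sketch below uses a hypothetical
ordinary function `entry` in place of main, so that the program stays
well-formed, and checks the resulting types. It says nothing about how gcc
diagnoses main itself.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for main; only declared, never called.
int entry();

// Declares a function x with the same type as entry, i.e. int().
decltype(entry) x;

// sizeof yields std::size_t, so this defines a std::size_t variable.
decltype(sizeof(decltype(entry)*)) n;

static_assert(std::is_same<decltype(entry), int()>::value,
              "decltype of a function name is the function type");
static_assert(std::is_same<decltype(x), int()>::value,
              "x is declared as a function of type int()");
static_assert(std::is_same<decltype(n), std::size_t>::value,
              "n has type std::size_t");
```

Neither declaration evaluates or calls `entry`, which is why such uses can
slip through when only calls and address-taking are checked.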


Re: [C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Paolo Carlini

Hi,

On 08/19/2015 10:33 PM, Ville Voutilainen wrote:

> On 19 August 2015 at 23:26, Paolo Carlini wrote:
>> Hi Ville,
>>
>>
>> On 08/19/2015 10:12 PM, Ville Voutilainen wrote:
>>>>
>>>> submitter noticed that, in violation of [basic.start.main], we don't
>>>> reject
>>>> as ill-formed the declaration of a 'main' variable in the global
>>>> namespace.
>>>> Not a big deal IMHO, but the below simple check works well for me on
>>>> x86_64-linux.
>>>
>>> Just fyi, gcc accepts
>>>
>>> decltype(main) x;
>>>
>>> decltype(sizeof(decltype(main)*)) x;
>>>
>>> which are "uses" of main and also violate [basic.start.main]/3.
>>
>> "good" to know. In my experience sometimes the front end appears to
>> so-to-speak pre-declare entities. For instance I filed a while ago
>> c++/48396. Not sure if in practice the exact same code is involved...
>
> Let me clarify: this is not about that. It's code like
>
> int main() {}
>
> decltype(main) x;
>
> whereas just having
>
> decltype(main) x;
>
> as the whole program will diagnose the use of an undeclared
> identifier. Nevertheless,
> no use, not just odr-use, but use of main as in the entry point
> pseudo-function is allowed
> by the standard, but gcc allows some of them. gcc rejects attempts to
> call main even in
> such decltype contexts, but it can be fooled to allow other uses of main.

Ah, Ok, I didn't actually try to compile your snippet. Then I suspect 
you mean c++/66606?!? Please double check if something is missing in 
Martin's bug!


Paolo.



Re: [C++ Patch] PR 67065 ("Missing diagnostics for ill-formed program with main variable instead of function")

2015-08-19 Thread Ville Voutilainen
On 19 August 2015 at 23:37, Paolo Carlini  wrote:
> Ah, Ok, I didn't actually try to compile your snippet. Then I suspect you
> mean c++/66606?!? Please double check if something is missing in Martin's
> bug!


That looks fairly comprehensive to me; I don't think I have cases to
add to it. I did confirm that bug, though. I've been playing with such
abuses of main but never thought it important enough to bother
reporting a bug. :)


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-19 Thread Jeff Law

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:

> From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 2001
> From: Ajit Kumar Agarwal
> Date: Sat, 15 Aug 2015 18:19:14 +0200
> Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
>   tree ssa representation.
>
> Added a new pass on path splitting on tree SSA representation. The path
> splitting optimization does the CFG transformation of join block of the
> if-then-else same as the loop latch node is moved and merged with the
> predecessor blocks after preserving the SSA representation.
>
> ChangeLog:
> 2015-08-15  Ajit Agarwal
>
> * gcc/Makefile.in: Add the build of the new file
> tree-ssa-path-split.c

Instead:

* Makefile.in (OBJS): Add tree-ssa-path-split.o.

> * gcc/opts.c (OPT_ftree_path_split) : Add an entry for
> Path splitting pass with optimization flag greater and
> equal to O2.

* opts.c (default_options_table): Add entry for path splitting
optimization at -O2 and above.

> * gcc/passes.def (path_split): add new path splitting pass.

Capitalize "add".

> * gcc/tree-ssa-path-split.c: New.

Use "New file".

> * gcc/tracer.c (transform_duplicate): New.

Use "New function".

> * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
> * gcc/testsuite/gcc.dg/path-split-1.c: New.

These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite" 
prefix.

> * gcc/doc/invoke.texi
> (ftree-path-split): Document.
> (fdump-tree-path_split): Document.

Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuff 
that's more interesting.

> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/Makefile.in  |   1 +
>   gcc/common.opt   |   4 +
>   gcc/doc/invoke.texi  |  16 +-
>   gcc/opts.c   |   1 +
>   gcc/passes.def   |   1 +
>   gcc/testsuite/gcc.dg/path-split-1.c  |  65 ++
>   gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +
>   gcc/timevar.def  |   1 +
>   gcc/tracer.c |  37 +--
>   gcc/tree-pass.h  |   1 +
>   gcc/tree-ssa-path-split.c| 330 +++
>   11 files changed, 503 insertions(+), 14 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>   create mode 100644 gcc/tree-ssa-path-split.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e80eadf..1d02582 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2378,6 +2378,10 @@ ftree-vrp
>   Common Report Var(flag_tree_vrp) Init(0) Optimization
>   Perform Value Range Propagation on trees
>
> +ftree-path-split
> +Common Report Var(flag_tree_path_split) Init(0) Optimization
> +Perform Path Splitting

Maybe "Perform Path Splitting for loop backedges" or something which is 
a little more descriptive.  The above isn't exactly right, so don't use 
it as-is.

> @@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher.  Null pointer check
>   elimination is only done if @option{-fdelete-null-pointer-checks} is
>   enabled.
>
> +@item -ftree-path-split
> +@opindex ftree-path-split
> +Perform Path Splitting  on trees.  The join blocks of IF-THEN-ELSE same
> +as loop latch node is moved to its predecessor and the loop latch node
> +will be forwarding block.  This is enabled by default at @option{-O2}
> +and higher.

Needs some work.  Maybe something along the lines of

When two paths of execution merge immediately before a loop latch node, 
try to duplicate the merge node into the two paths.
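
To make that wording concrete, here is a hand-written, source-level sketch
of the shape being described. It is only an illustration: the pass in the
patch works on the GIMPLE/SSA control flow graph, not on source, and the
function names below are hypothetical. In the first function both arms of
the if/else merge in a join block that runs right before the loop back
edge; in the second the join block has been duplicated into each arm.

```cpp
#include <cassert>
#include <vector>

// Original shape: the two paths merge in a join block that executes
// immediately before the loop latch (the back edge of the loop).
int sum_joined(const std::vector<int>& v) {
    int acc = 0;
    for (int x : v) {
        int t;
        if (x & 1)
            t = x * 3;   // then-path
        else
            t = x / 2;   // else-path
        acc += t + 1;    // join block, shared by both paths
    }
    return acc;
}

// Hand-split shape: the join block is copied into each path, so each
// path runs straight through to the loop latch on its own.
int sum_split(const std::vector<int>& v) {
    int acc = 0;
    for (int x : v) {
        if (x & 1) {
            int t = x * 3;
            acc += t + 1;   // copy of the join block
        } else {
            int t = x / 2;
            acc += t + 1;   // copy of the join block
        }
    }
    return acc;
}

// Both shapes must compute the same value for the same input.
void check_same(const std::vector<int>& v) {
    assert(sum_joined(v) == sum_split(v));
}
```

After the duplication each copy of the join block sees only one incoming
value of `t`, which is what gives later passes room to simplify each path
separately.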



> diff --git a/gcc/passes.def b/gcc/passes.def
> index 6b66f8f..20ddf3d 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>   NEXT_PASS (pass_ccp);
>   /* After CCP we rewrite no longer addressed locals into SSA
>  form if possible.  */
> +  NEXT_PASS (pass_path_split);
>   NEXT_PASS (pass_forwprop);
>   NEXT_PASS (pass_sra_early);

I can't recall if we've discussed the location of the pass at all.  I'm 
not objecting to this location, but would like to hear why you chose 
this particular location in the optimization pipeline.

>   /* pass_build_ealias is a dummy pass that ensures that we
> diff --git a/gcc/testsuite/gcc.dg/path-split-1.c b/gcc/testsuite/gcc.dg/path-split-1.c

ISTM the two tests should be combined into a single test.  I didn't see 
a functional difference in the test() function between those two tests.

I believe you can still create/scan debugging dumps with dg-do run test.

> +DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path_split")

tree path split rather than using underscores

diff --git a/gcc/tracer.c 

Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 10:53 AM, H.J. Lu  wrote:
> On Wed, Aug 19, 2015 at 10:48 AM, Segher Boessenkool
>  wrote:
>> On Wed, Aug 19, 2015 at 10:08:01AM -0700, H.J. Lu wrote:
>>> > Maybe something like (heavily cut'n'pasted):
>>> >
>>> >
>>> > @deftypefn {Built-in Function} {void *} __builtin_argument_address (void)
>>> > This function is similar to @code{__builtin_frame_address} with an
>>> > argument of 0, but it returns the address of the incoming arguments to
>>> > the current function rather than the address of its frame.
>>>
>>> This doesn't make sense when there is no argument or arguments
>>> are passed in registers.
>>
>> Sure, but see the weasel-words below ("The exact...")
>>
>>> To me, argument pointer is a virtual concept
>>> and an implementation detail internal to GCC.  I am not sure if another
>>> compiler can implement it based on this description.
>>
>> The same is true for frame_address, on many machines.
>
> Stack frame is well understood unlike argument pointer which is
> pretty vague.
>

How about this

@deftypefn {Built-in Function} {void *} __builtin_argument_pointer (void)
This function is similar to @code{__builtin_frame_address} with an
argument of 0, but it returns the address of the incoming arguments to
the current function rather than the address of its frame.  Unlike
@code{__builtin_frame_address}, the frame pointer register isn't
required.

The exact definition of this address depends upon the processor and the
calling convention.  Usually some arguments are passed in registers and
the rest on the stack, and this builtin returns the address of the
first argument which would be passed on the stack.
@end deftypefn

-- 
H.J.


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-19 Thread Lynn A. Boger

The split stack support only recently went into the gold linker for
Power, so configure needs to detect whether the gold linker version
contains that support.  If the build tries to use a gold linker without
that support, the build will fail on Power.  My understanding was that
the gold linker support had already been there for other platforms.
That is why the configure check is target dependent and only Power
cares about the version number.

I started out with a simpler patch like the one you have, but after
trying to build it on ppc64le, ppc64, and ppc this is what I ended up
with.  I didn't want to mess up other platforms, so I left those as is.

On 08/19/2015 02:33 PM, Matthias Klose wrote:
> On 08/18/2015 10:36 PM, Lynn A. Boger wrote:
>> As discussed in PR 66870, for ppc64 and ppc64le it is preferred to
>> use the gold linker with gccgo or gcc if the split stack option is enabled.
>> Use of the gold linker with the split stack option results in less storage
>> allocated for goroutine stacks; if split stack is used without the gold
>> linker then some testcase failures can occur.
>>
>> Since the use of the gold linker has not been well tested
>> with all gcc compilers on Power, it is only used as the linker if the
>> split stack option is used.
>>
>> This adds the capability to the configure for gcc and libgo to determine
>> if the gold linker is available at build time, either in the path or explicitly
>> configured, and its version supports split stack.  If that is the case then
>> defines are set that cause the gold linker to be used by the compiler when
>> -fsplit-stack is used.  This applies to ppc64 and ppc64le.  Other platforms
>> with split stack work as before.
>>
>> 2015-08-18  Lynn Boger
>>
>> gcc/
>>  PR target/66870
>>  config/rs6000/linux64.h: When HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK
>>  is defined add -fuse-ld=gold if fsplit-stack and not m32
>>  config/rs6000/sysv4.h:  Define TARGET_CAN_SPLIT_STACK based on
>>  LIBC version.
>>  config.in:  Set up HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK.
>>  configure.ac:  When gold linker is available and its version
>>  supports split stack on ppc64, ppc64le, define
>>  HAVE_LD_GOLD_SUPPORTS_SPLIT_STACK.
>>  configure:  Regenerate.
>>
>> libgo/
>>  PR target/66870
>>  configure.ac:  When gccgo for building libgo uses the gold version
>>  containing split stack support on ppc64, ppc64le, define
>>  LINKER_SUPPORTS_SPLIT_STACK.
>>  configure:  Regenerate.
>>
>> For platforms other than ppc64 and ppc64le, the configure for gcc
>> and libgo behave as before.
>
> why keep the old behaviour for other archs that have split stack support? Is it
> really necessary to make this dependent on the target? I'm still using an
> unreviewed/unpinged patch to enable gold for gccgo (attached).
>
> Matthias


Re: [PING] Re: [PATCH] New configure option to default enable Smart Stack Protection

2015-08-19 Thread Jeff Law

On 07/13/2015 07:20 AM, Magnus Granberg wrote:

> Patch updated and tested on x86_64-unknown-linux-gnu (Gentoo)
>
> ChangeLogs
> /gcc
> 2015-07-05  Magnus Granberg
>
>  * common.opt (fstack-protector): Initialize to -1.
>  (fstack-protector-all): Likewise.
>  (fstack-protector-strong): Likewise.
>  (fstack-protector-explicit): Likewise.
>  * configure.ac: Add --enable-default-ssp.
>  * defaults.h (DEFAULT_FLAG_SSP): New.  Default SSP to strong.
>  * opts.c (finish_options): Update opts->x_flag_stack_protect if it is -1.
>  * doc/install.texi: Document --enable-default-ssp.
>  * config.in: Regenerated.
>  * configure: Likewise.
>
> /testsuite
> 2015-07-13  Magnus Granberg
>
>  * lib/target-supports.exp
>  (check_effective_target_fstack_protector_enabled): New test.
>  * gcc.target/i386/ssp-default.c: New test.
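
The ChangeLog above describes a common pattern: initialize the option to -1
as a "not set on the command line" marker, then replace that marker with a
configure-time default once option processing is finished. A toy sketch of
that pattern follows. The struct, the constants and the value 3 for
"strong" are assumptions made for illustration; this is not the code from
the patch.

```cpp
#include <cassert>

// Assumed marker meaning "the user did not pass any -fstack-protector* flag".
constexpr int kUnset = -1;

// Assumed configure-time default; 3 stands in for "strong" here.
constexpr int kDefaultFlagSsp = 3;

// Hypothetical, much-reduced stand-in for the options structure.
struct Options {
    int flag_stack_protect = kUnset;
};

// Resolve the marker to the default, but leave explicit user choices
// (including an explicit 0, i.e. protection switched off) untouched.
void finish_options(Options& opts) {
    if (opts.flag_stack_protect == kUnset)
        opts.flag_stack_protect = kDefaultFlagSsp;
}

// Small self-check of the two cases.
void check_defaults() {
    Options untouched;
    finish_options(untouched);
    assert(untouched.flag_stack_protect == kDefaultFlagSsp);

    Options disabled;
    disabled.flag_stack_protect = 0;
    finish_options(disabled);
    assert(disabled.flag_stack_protect == 0);
}
```

Using -1 rather than 0 as the initial value is what lets an explicit
request to disable the protector be told apart from "nothing was requested".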

Sorry for the delay, it seems nobody picked this up.

It's a nit, but the feature is "Stack Smashing Protection", not "Smart 
Stack Protection".  I'll fix that nit and install your change.


Thanks!

Jeff




Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2015 at 02:53:47PM -0700, H.J. Lu wrote:
> How about this
> 
> @deftypefn {Built-in Function} {void *} __builtin_argument_pointer (void)
> This function is similar to @code{__builtin_frame_address} with an
> argument of 0, but it returns the address of the incoming arguments to
> the current function rather than the address of its frame.  Unlike
> @code{__builtin_frame_address}, the frame pointer register isn't
> required.

That last line isn't true, if your port uses INITIAL_FRAME_POINTER_RTX.
Maybe it shouldn't be true otherwise either (but currently a hard frame
pointer is forced, indeed).  Have we gone full circle now? ;-)

> The exact definition of this address depends upon the processor and the
> calling convention.  Usually some arguments are passed in registers and
> the rest on the stack, and this builtin returns the address of the
> first argument which would be passed on the stack.
> @end deftypefn


Segher


Re: [PATCH] Add __builtin_stack_top

2015-08-19 Thread H.J. Lu
On Wed, Aug 19, 2015 at 3:10 PM, Segher Boessenkool
 wrote:
> On Wed, Aug 19, 2015 at 02:53:47PM -0700, H.J. Lu wrote:
>> How about this
>>
>> @deftypefn {Built-in Function} {void *} __builtin_argument_pointer (void)
>> This function is similar to @code{__builtin_frame_address} with an
>> argument of 0, but it returns the address of the incoming arguments to
>> the current function rather than the address of its frame.  Unlike
>> @code{__builtin_frame_address}, the frame pointer register isn't
>> required.
>
> That last line isn't true, if your port uses INITIAL_FRAME_POINTER_RTX.
> Maybe it shouldn't be true otherwise either (but currently a hard frame
> pointer is forced, indeed).  Have we gone full circle now? ;-)

Let's drop it:


@deftypefn {Built-in Function} {void *} __builtin_argument_pointer (void)
This function is similar to @code{__builtin_frame_address} with an
argument of 0, but it returns the address of the incoming arguments to
the current function rather than the address of its frame.

The exact definition of this address depends upon the processor and the
calling convention.  Usually some arguments are passed in registers and
the rest on the stack, and this builtin returns the address of the
first argument which would be passed on the stack.
@end deftypefn
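
For comparison, the existing builtin that this text refers to can be used
as sketched below. The sketch only calls @code{__builtin_frame_address}
with an argument of 0, which GCC already provides; the
__builtin_argument_pointer builtin is only a proposal in this thread and is
deliberately not used. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// noinline so that the builtin reports this helper's own frame rather
// than the frame of whichever caller it might be inlined into.
__attribute__((noinline)) void* current_frame_address() {
    return __builtin_frame_address(0);
}

// Prints the address; the exact value depends on the target and ABI.
void show_frame_address() {
    void* frame = current_frame_address();
    assert(frame != nullptr);  // expected for level 0 on common targets
    std::printf("frame address: %p\n", frame);
}
```

The returned pointer is only meaningful as an address to inspect; what lies
at that address is target specific, much like the "exact definition"
caveat in the proposed text.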


-- 
H.J.

