Re: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Richard Sandiford
Jeff Law  writes:
> On 09/19/14 01:23, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> On 09/18/14 04:07, Richard Sandiford wrote:
>>>> This series is a cleaned-up version of:
>>>>
>>>>    https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>>>>
>>>> The underlying problem is that the semantics of subregs depend on the
>>>> word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
>>>> but you can have a subreg for word 2 of a 4-word value (as well as lowpart
>>>> subregs of that word, etc.).  This causes problems when an architecture has
>>>> wider-than-word registers, since the addressability of a word can then
>>>> depend on which register class is used.
>>>>
>>>> The register allocators need to fix up cases where a subreg turns out to
>>>> be invalid for a particular class.  This is really an extension of what
>>>> we need to do for CANNOT_CHANGE_MODE_CLASS.
>>>>
>>>> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
>>> I thought we fixed these problems long ago with the change to subreg_byte?!?
>>
>> No, that was fixing something else.  (I'm just about old enough to remember
>> that too!)  The problem here is that (say):
>>
>>  (subreg:SI (reg:DI X) 4)
>>
>> is independently addressable on little-endian AArch32 if X is assigned
>> to a GPR, but not if X is assigned to a vector register.  We need
>> to allow these kinds of subreg on pseudos in order to decompose multiword
>> arithmetic.  It's then up to the RA to realise that a reload would be
>> needed if X were assigned to a vector register, since the upper half
>> of a vector register cannot be independently accessed.
>>
>> Note that you could write this example even with the old word-style offsets
>> and IIRC the effect would have been the same.
> OK.  So I kept thinking in terms of the byte offset stuff.  But what 
> you're tackling is related to the mess around the mode of the subreg 
> having a different meaning if it's smaller than a word vs word-sized or 
> greater.
>
> Right?

Yeah, that's right.  Addressability is based on words, which is inconvenient
when your registers are bigger than a word.
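
To make the AArch32 example concrete, here is a rough sketch of the
addressability rule.  It is a toy little-endian model, not GCC code: the
function name and parameters are made up for illustration, and the real
check (simplify_subreg_regno and friends) handles far more cases.

```cpp
#include <cassert>

// Toy model, NOT GCC code: a value of INNER_SIZE bytes lives in consecutive
// hard registers of REG_SIZE bytes each (little-endian).  A subreg of
// OUTER_SIZE bytes at BYTE_OFFSET is treated as directly addressable only
// if it lies inside the value and starts at the beginning of a register.
bool toy_subreg_addressable_p(unsigned byte_offset, unsigned outer_size,
                              unsigned inner_size, unsigned reg_size)
{
  assert(reg_size > 0);
  if (byte_offset + outer_size > inner_size)
    return false;                       // subreg falls outside the value
  return byte_offset % reg_size == 0;   // must start on a register boundary
}
```

Under these assumptions, (subreg:SI (reg:DI X) 4) is addressable when X sits
in a pair of 4-byte GPRs (reg_size 4) but not when X sits in a single 8-byte
vector register (reg_size 8), which is the case the RA has to reload.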

Thanks,
Richard



Re: [PATCH 4/5] Generalise invalid_mode_change_p

2014-09-22 Thread Richard Sandiford
Jeff Law  writes:
> On 09/18/14 04:25, Richard Sandiford wrote:
>> This is the main patch for the bug.  We should treat a register as invalid
>> for a mode change if simplify_subreg_regno cannot provide a new register
>> number for the result.  We should treat a class as invalid for a mode change
>> if all registers in the class are invalid.  This is an extension of the old
>> CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M).
>>
>> I forgot to say that the patch is a prerequisite to removing aarch64's
>> C_C_C_M.  There are other prerequisites too, but removing C_C_C_M without
>> this patch caused regressions in the existing testsuite, which is why no
>> new tests are needed.
>>
>>
>> gcc/
>>  * hard-reg-set.h: Include hash-table.h.
>>  (target_hard_regs): Add a finalize method and a x_simplifiable_subregs
>>  field.
>>  * target-globals.c (target_globals::~target_globals): Handle
>>  hard_regs->finalize.
>>  * rtl.h (subreg_shape): New structure.
>>  (shape_of_subreg): New function.
>>  (simplifiable_subregs): Declare.
>>  * reginfo.c (simplifiable_subreg): New structure.
>>  (simplifiable_subregs_hasher): Likewise.
>>  (simplifiable_subregs): New function.
>>  (invalid_mode_changes): Delete.
>>  (valid_mode_changes, valid_mode_changes_obstack): New variables.
>>  (record_subregs_of_mode): Remove subregs_of_mode parameter.
>>  Record valid mode changes in valid_mode_changes.
>>  (find_subregs_of_mode): Remove subregs_of_mode parameter.
>>  Update calls to record_subregs_of_mode.
>>  (init_subregs_of_mode): Remove invalid_mode_changes and bitmap
>>  handling.  Initialize new variables.  Update call to
>>  find_subregs_of_mode.
>>  (invalid_mode_change_p): Check new variables instead of
>>  invalid_mode_changes.
>>  (finish_subregs_of_mode): Finalize new variables instead of
>>  invalid_mode_changes.
>>  (target_hard_regs::finalize): New function.
>>  * ira-costs.c (print_allocno_costs): Call invalid_mode_change_p
>>  even when CLASS_CANNOT_CHANGE_MODE is undefined.
>>
>> Index: gcc/rtl.h
>> ===
>> --- gcc/rtl.h2014-09-15 11:55:40.459855161 +0100
>> +++ gcc/rtl.h2014-09-15 12:26:21.249077760 +0100
>> +/* Return the shape of a SUBREG rtx.  */
>> +
>> +static inline subreg_shape
>> +shape_of_subreg (const_rtx x)
>> +{
>> +  return subreg_shape (GET_MODE (SUBREG_REG (x)),
>> +   SUBREG_BYTE (x), GET_MODE (x));
>> +}
>> +
> Is there some reason you don't have a constructor that accepts a 
> const_rtx?

I was worried that by allowing implicit const_rtx->subreg_shape
conversions, it would be less obvious that the rtx has to have
code SUBREG.  I.e. a checked conversion would be hidden in the
constructor rather than being explicit.

If with David's new rtx hierarchy we end up with an rtx_subreg
subclass then I agree we should have a constructor that takes
one of those.
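
A small sketch of the trade-off, using stand-in types rather than the real
rtx and subreg_shape (every toy_* name here is hypothetical): the shape
type only has the three-field constructor, and the checked conversion from
an rtx is a named function, so the SUBREG requirement is visible at the
call site.

```cpp
#include <cassert>

// Stand-ins for illustration only; these are not GCC's rtx types.
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx {
  toy_code code;
  int inner_mode;   // mode of the register being wrapped
  int offset;       // byte offset of the subreg
  int outer_mode;   // mode of the subreg itself
};

struct toy_shape {
  int inner_mode, offset, outer_mode;
  // Deliberately no constructor taking a toy_rtx: an implicit
  // toy_rtx -> toy_shape conversion would hide the code check below.
  toy_shape(int inner, int off, int outer)
    : inner_mode(inner), offset(off), outer_mode(outer) {}
};

// The conversion is explicit and checked; callers can see that X has to
// be a SUBREG.
inline toy_shape toy_shape_of_subreg(const toy_rtx &x)
{
  assert(x.code == TOY_SUBREG);
  return toy_shape(x.inner_mode, x.offset, x.outer_mode);
}
```

With a dedicated subreg subclass the check would move into the type system,
and a constructor taking that subclass would no longer hide anything.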

Thanks,
Richard



Re: Speedup int_bit_from_pos

2014-09-22 Thread Richard Biener
On Sun, 21 Sep 2014, Jan Hubicka wrote:

> > 
> > Please omit static from inline functions.
> 
> Yep, I suppose we want to drop static in all inlines? I can make a patch for 
> that.
> > 
> > Also one notable difference with your patch is that the fits-hwi check is
> > now not performed on the result but on the input which, multiplied by 8,
> > might not fit a hwi.  So please use wide-ints here (the to_offset
> > flavor).
> 
> The function must always succeed (so the user promises it will fit in a HWI)
> and for performance reasons I would rather not go into wide int by default,
> but I can do that with checking enabled.

wide-int should be fast enough, please use it.
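
To spell out the concern: a byte position that fits a HWI can stop fitting
once it is multiplied by 8.  The sketch below uses plain int64_t as a
stand-in for HOST_WIDE_INT and checks the result rather than the input.  It
is an illustration only; it does not use the wide-int API, and the toy_*
names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustration only: int64_t stands in for HOST_WIDE_INT.
// Computes byte_offset * 8 + bit_offset and returns false if that value
// does not fit in int64_t, even though both inputs do.
bool toy_bit_from_pos(int64_t byte_offset, int64_t bit_offset,
                      int64_t *result)
{
  assert(result != nullptr);
  const int64_t max = std::numeric_limits<int64_t>::max();
  const int64_t min = std::numeric_limits<int64_t>::min();

  // The multiplication by 8 is where an in-range input can overflow.
  if (byte_offset > max / 8 || byte_offset < min / 8)
    return false;
  const int64_t bits = byte_offset * 8;

  // The addition can overflow as well.
  if ((bit_offset > 0 && bits > max - bit_offset)
      || (bit_offset < 0 && bits < min - bit_offset))
    return false;

  *result = bits + bit_offset;
  return true;
}
```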

Richard.


Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)

2014-09-22 Thread James Greenhalgh
On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote:
> Hi!
> 
> On Thu, 2014-09-18 11:19:21 +0100, James Greenhalgh wrote:
> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01334.html
> > The construct
> > 
> >   (clobber (match_scratch 0 "r"))
> > 
> > is invalid - operand 0 must be marked either write or read/write.
> > 
> > Likewise
> > 
> >   (match_* 0 "&r")
> > 
> > is invalid, marking an operand earlyclobber does not remove the need to
> > also mark it write or read/write.
> 
> My build robot shows a new build error, which I guess is
> caused/uncovered by your genrecog change on bfin-elf (see eg. build
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667):
> 
> build/genrecog /home/jbglaw/repos/gcc/gcc/common.md 
> /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \
>   insn-conditions.md > tmp-recog.c
> /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not 
> supported in define_split
> make[1]: *** [s-recog] Error 1
> make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc'
> make: *** [all-gcc] Error 2
> 
> 
> Would be nice if the bfin maintainer or you would come up with a fix.

Hi Jan,

I posted a fix for this on Friday evening at:

  https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html

I'm waiting for a bfin maintainer to say OK, as it isn't a port I know
well.

Thanks,
James



Re: [PATCH][match-and-simplify] User defined predicates

2014-09-22 Thread Richard Biener
On Tue, 16 Sep 2014, Marc Glisse wrote:

> On Tue, 16 Sep 2014, Richard Biener wrote:
> 
> > The following adds the ability to write predicates using patterns
> > with an example following negate_expr_p which already has a
> > use in comparison folding (via its if c-expr).
> > 
> > The syntax is as follows:
> > 
> > (match negate_expr_p
> > INTEGER_CST
> > (if (TYPE_OVERFLOW_WRAPS (type)
> >  || may_negate_without_overflow_p (t))))
> > (match negate_expr_p
> > (bit_not @0)
> > (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))))
> > (match negate_expr_p
> > FIXED_CST)
> > (match negate_expr_p
> > (negate @0))
> > ...
> > 
> > that is, you write '(match <id>' instead of '(simplify' and then
> > follow with a pattern and optional conditionals.  There should
> > be no transform pattern (unchecked yet).  Multiple matches for
> > the same <id> simply add to what is recognized as <id>.
> > The predicate is applied to a single 'tree' operand and looks
> > up SSA defs and utilizes the optional valueize hook.
> > 
> > Currently both GENERIC and GIMPLE variants result in name-mangling
> > and the prototypes (unprototyped anywhere)
> > 
> > bool tree_negate_expr_p (tree t);
> > bool gimple_negate_expr_p (tree t, tree (*valueize)(tree) = NULL);
> 
> Ah, I haven't looked at the generated code, but I was expecting something
> roughly like:
> 
> struct matcher
> {
>   std::function<tree(tree)> valueize;
>   bool negate_expr(tree);
>   ...
> };
> 
> where we can call negate_expr recursively without caring about passing
> valueize (if there are 2 matchers, one without a valueize function,
> negate_expr can be static in that version). Although recursive calls sound
> potentially slow, and having a thread_local counter in valueize to limit the
> call depth may not be ideal.
> 
> Please note that I am not at all saying the above is a good design, just
> dropping a random thought.

Yeah, abstracting a bit from the low-level interface might be nice.

Note that I wouldn't recommend using recursive predicates as on GIMPLE
this easily can result in exponential runtime behavior (and capping
on a recursion limit looks ugly).  Instead of using recursive predicates
a lattice of predicate results should be provided by the caller
(which would ask for a lattice abstraction as well).  So you'd have
sth like

struct lattice
{
  tree valueize (tree);
  bool negate_expr_p (tree);
  bool nonnegative_p (tree);
  ...
};

note that the fold-const.c negate_expr_p is really a predicate on
whether negate_expr will be able to simplify.  Yes, fold-const.c
has "recursive" transforms, something match-and-simplify doesn't
support either.  Here the proper way is to implement this kind
of stuff in a real pass.
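
As a rough illustration of the lattice idea (a toy model, not
match-and-simplify output; the node kinds and the negation rules are
simplified assumptions): each definition is visited once in topological
order, and the predicate for a node only consults the already-computed
results of its operands, so nothing recurses and the work is linear in the
number of nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy expression nodes; operands are indices of earlier nodes.
enum toy_op { TOY_VAR, TOY_CST, TOY_NEGATE, TOY_PLUS };

struct toy_node {
  toy_op op;
  int lhs;   // first operand index, or -1
  int rhs;   // second operand index, or -1
};

// Fill a lattice of "can be negated cheaply" results in one forward walk.
// NODES must list operands before their users (as SSA defs would be when
// visited in a suitable order).  The rules below are simplified.
std::vector<bool> toy_negate_lattice(const std::vector<toy_node> &nodes)
{
  std::vector<bool> negatable(nodes.size(), false);
  for (std::size_t i = 0; i < nodes.size(); ++i)
    {
      const toy_node &n = nodes[i];
      switch (n.op)
        {
        case TOY_VAR:
          negatable[i] = false;          // nothing known about a plain value
          break;
        case TOY_CST:
        case TOY_NEGATE:
          negatable[i] = true;           // -C folds, -(-x) is x
          break;
        case TOY_PLUS:
          assert(n.lhs >= 0 && n.rhs >= 0);
          assert(static_cast<std::size_t>(n.lhs) < i
                 && static_cast<std::size_t>(n.rhs) < i);
          // Lattice lookup instead of a recursive predicate call.
          negatable[i] = negatable[n.lhs] || negatable[n.rhs];
          break;
        }
    }
  return negatable;
}
```

A caller-provided lattice like this is what a pass would own; the predicate
itself never walks more than one level.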

I'll rip out the recursive parts of negate_expr_p again at some point,
I just put it there as an exercise ;)

Richard.


Re: Stream ODR types

2014-09-22 Thread Richard Biener
On Wed, 17 Sep 2014, Jan Hubicka wrote:

> Hi,
> this patch renames types reported by -Wodr during LTO bootstrap.
> 
> Bootstrapping/regtesting in progress, OK if it passes?
> 
> Honza
> 
>   * tree-ssa-ccp.c (prop_value_d): Rename to ...
>   (ccp_prop_value_t): ... this one to avoid ODR violation; update uses.
>   * ipa-prop.c (struct type_change_info): Rename to ...
>   (prop_type_change_infoprop_type_change_info): ... this; update uses.

Seems a bit excessive ;)

Ok.

Thanks,
Richard.

>   * ggc-page.c (globals): Rename to ...
>   (static struct ggc_globals): ... this; update uses.
>   * tree-ssa-loop-im.c (mem_ref): Rename to ...
>   (im_mem_ref): ... this; update uses.
>   * ggc-common.c (loc_descriptor): Rename to ...
>   (ggc_loc_descriptor): ... this; update uses.
>   * lra-eliminations.c (elim_table): Rename to ...
>   (lra_elim_table): ... this; update uses.
>   * bitmap.c (output_info): Rename to ...
>   (bitmap_output_info): ... this; update uses.
>   * gcse.c (expr): Rename to ...
>   (gcse_expr): ... this; update uses.
>   (occr): Rename to ...
>   (gcse_occr): ... this; update uses.
>   * tree-ssa-copy.c (prop_value_d): Rename to ...
>   (prop_value_t): ... this.
>   * predict.c (block_info_def): Rename to ...
>   (block_info): ... this; update uses.
>   (edge_info_def): Rename to ...
>   (edge_info): ... this; update uses.
>   * profile.c (bb_info): Rename to ...
>   (bb_profile_info): ... this; update uses.
>   * alloc-pool.c (output_info): Rename to ...
>   (pool_output_info): ... this; update uses.
>   
> Index: tree-ssa-ccp.c
> ===
> --- tree-ssa-ccp.c(revision 215328)
> +++ tree-ssa-ccp.c(working copy)
> @@ -166,7 +166,7 @@ typedef enum
>VARYING
>  } ccp_lattice_t;
>  
> -struct prop_value_d {
> +struct ccp_prop_value_t {
>  /* Lattice value.  */
>  ccp_lattice_t lattice_val;
>  
> @@ -180,24 +180,22 @@ struct prop_value_d {
>  widest_int mask;
>  };
>  
> -typedef struct prop_value_d prop_value_t;
> -
>  /* Array of propagated constant values.  After propagation,
> CONST_VAL[I].VALUE holds the constant value for SSA_NAME(I).  If
> the constant is held in an SSA name representing a memory store
> (i.e., a VDEF), CONST_VAL[I].MEM_REF will contain the actual
> memory reference used to store (i.e., the LHS of the assignment
> doing the store).  */
> -static prop_value_t *const_val;
> +static ccp_prop_value_t *const_val;
>  static unsigned n_const_val;
>  
> -static void canonicalize_value (prop_value_t *);
> +static void canonicalize_value (ccp_prop_value_t *);
>  static bool ccp_fold_stmt (gimple_stmt_iterator *);
>  
>  /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX.  */
>  
>  static void
> -dump_lattice_value (FILE *outf, const char *prefix, prop_value_t val)
> +dump_lattice_value (FILE *outf, const char *prefix, ccp_prop_value_t val)
>  {
>switch (val.lattice_val)
>  {
> @@ -236,10 +234,10 @@ dump_lattice_value (FILE *outf, const ch
>  
>  /* Print lattice value VAL to stderr.  */
>  
> -void debug_lattice_value (prop_value_t val);
> +void debug_lattice_value (ccp_prop_value_t val);
>  
>  DEBUG_FUNCTION void
> -debug_lattice_value (prop_value_t val)
> +debug_lattice_value (ccp_prop_value_t val)
>  {
>dump_lattice_value (stderr, "", val);
>fprintf (stderr, "\n");
> @@ -272,10 +270,10 @@ extend_mask (const wide_int &nonzero_bit
> 4- Initial values of variables that are not GIMPLE registers are
>considered VARYING.  */
>  
> -static prop_value_t
> +static ccp_prop_value_t
>  get_default_value (tree var)
>  {
> -  prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 };
> +  ccp_prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 };
>gimple stmt;
>  
>stmt = SSA_NAME_DEF_STMT (var);
> @@ -343,10 +341,10 @@ get_default_value (tree var)
>  
>  /* Get the constant value associated with variable VAR.  */
>  
> -static inline prop_value_t *
> +static inline ccp_prop_value_t *
>  get_value (tree var)
>  {
> -  prop_value_t *val;
> +  ccp_prop_value_t *val;
>  
>if (const_val == NULL
>|| SSA_NAME_VERSION (var) >= n_const_val)
> @@ -366,7 +364,7 @@ get_value (tree var)
>  static inline tree
>  get_constant_value (tree var)
>  {
> -  prop_value_t *val;
> +  ccp_prop_value_t *val;
>if (TREE_CODE (var) != SSA_NAME)
>  {
>if (is_gimple_min_invariant (var))
> @@ -387,7 +385,7 @@ get_constant_value (tree var)
>  static inline void
>  set_value_varying (tree var)
>  {
> -  prop_value_t *val = &const_val[SSA_NAME_VERSION (var)];
> +  ccp_prop_value_t *val = &const_val[SSA_NAME_VERSION (var)];
>  
>val->lattice_val = VARYING;
>val->value = NULL_TREE;
> @@ -413,7 +411,7 @@ set_value_varying (tree var)
>For other constants, make sure to drop TREE_OVERFLOW.  */
>  
>  static void
> -canonicalize_value

Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.

2014-09-22 Thread Yuri Rumyantsev
Richard,

Here is a reduced patch (part 1), which is now almost half the size of the
previous one.  Let me also answer your comments.

1. I do use the edge 'aux' field to hold the predicate for critical edges.
My previous code was not correct; it now looks like:

  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
    /* Edge E is not critical, use predicate of edge source bb.  */
    c = bb_predicate (b);
  else
    /* Edge E is critical and its aux field contains predicate.  */
    c = edge_predicate (e);

2. I deleted all code related to the creation of conditional
expressions and now rely completely on bool pattern recognition in the
vectorizer.  But we need to delete all dead predicate computations
that are no longer used, since they prevent vectorization.  I will add
this local-dce function in the next patch.
3. I also did not include in this patch the recognition of general
phi-nodes with only two arguments, for which the conversion of
conditional scalar reductions can also be applied.
Note that all these changes apply only to loops marked with pragma
omp simd.
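
The choice in point 1 can be sketched outside of GCC as follows.  This is a
toy model with made-up names (toy_edge and the counts stand in for the real
CFG structures); it only restates the criticality test from the snippet
above.

```cpp
#include <cassert>

// Toy stand-in for a CFG edge: how many successors its source block has
// and how many predecessors its destination block has.
struct toy_edge {
  int src_succs;
  int dest_preds;
};

// An edge is critical when its source has several successors AND its
// destination has several predecessors.
bool toy_edge_critical_p(const toy_edge &e)
{
  assert(e.src_succs >= 1 && e.dest_preds >= 1);
  return e.src_succs > 1 && e.dest_preds > 1;
}

enum toy_pred_source { TOY_FROM_SOURCE_BB, TOY_FROM_EDGE_AUX };

// Mirrors the if/else above: a non-critical edge reuses the predicate of
// its source block; a critical edge carries its own predicate in 'aux'.
toy_pred_source toy_predicate_source(const toy_edge &e)
{
  return toy_edge_critical_p(e) ? TOY_FROM_EDGE_AUX : TOY_FROM_SOURCE_BB;
}
```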

2014-09-22  Yuri Rumyantsev  

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(convert_name_to_cmp): New function.
(add_to_predicate_list): Check unconditionally that bb is always
executed to early exit. Use predicate of cd-equivalent block
for join blocks if it exists.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if
destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
to reject block if-conversion with incoming critical edges only if
FLAG_FORCE_VECTORIZE was not set-up.
(predicate_bbs): Skip loop exit block also. Add check that if
fold_build2 produces bool conversion, recompute predicate using
build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If the number of predecessors of the
phi-block is equal to 2 and at least one incoming edge is not critical,
the original algorithm is used.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_all_scalar_phis): Introduce new variable BEFORE.
Invoke find_insertion_point to initialize gsi and
predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
that extended predication must be applied).
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block beginning for extended if-conversion.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare).  Do loop versioning
for innermost loop marked with pragma omp simd if
FLAG_TREE_LOOP_IF_CONVERT was not set up. Nullify 'aux' field of edges
for blocks with two successors.




2014-09-08 17:10 GMT+04:00 Richard Biener :
> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev  wrote:
>> Richard!
>> Here is updated patch with the following changes:
>>
>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>> negate_predicate was deleted.
>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>> be critical.
>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>> blocks to simplify it.
>> 5. I decided to not design pre-pass since it will lead generating
>> chain of cond expressions for phi-node if conversion, whereas for phi
>> of kind
>>   x = PHI <1(2), 1(3), 2(4)>
>> only one cond expression is required and this is considered as simple
>> optimization for arbitrary phi-function. More precisely,
>> if a phi-function has only two different arguments and one of them has
>> a single occurrence, if-conversion is performed as if the phi had only 2
>> arguments.
>> For arbitrary phi function a chain of cond expressions is produced.
>>
>> Updated patch is attached.
>>
>> Any comments will be appreciated.
>
> The patch is still very big and does multiple things at once which makes
> it hard to review.
>
> In addition to that it changes function signatures without updating
> the function comments.  For example what is the convert_bool
> argument doing to add_to_dst_predicate_list?  Why do we need
> all this

Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Nathan Sidwell

On 09/21/14 18:58, Xinliang David Li wrote:


the intent is that that points to the gcov_info object of the object file
containing the live version of the function.  I couldn't quite get this to
work though -- it involves emitting a function's gcov_fn_info decl in the
same comdat group as the function itself.


Another problem is that comdat functions may have different CFGs due
to different early inline decisions. Comdatting gcov counters can lead
to problems in profile use. Not comdatting profile counters has
another advantage -- it allows context sensitive profiling for comdat
function inline instances (IPA-inline).


IIRC early inlining is done before the counters are created.  You're right later 
inlining may be a problem, and require a non-comdat set of cloned counters.   I 
can't recall exactly at what stage the counters are now inserted relative to 
inlining.  The CFG machinery had a number of significant changes while, and 
shortly after, I was working on this.



You'll see the checking of gfi_ptr->key != gi_ptr in libgcov-driver.c.

Are you making use of this machinery, or inventing new machinery?


Teresa's method is a different machinery -- it tries to propagate
profile data from the selected comdat copy + inline instance copies to
comdat copies with zero counts.


It'd be preferable to complete the mechanism I outline above, rather than have 
a competing mechanism.  Also, this patch is in effect lying because the data 
then makes it look like the unselected comdat instances are in fact being 
executed -- looking at the whole program it's going to be harder to understand 
whether the different inline instances are being executed multiple times, or are 
duplicate data.  Does the gcov user output indicate this subtlety in some way?


nathan


Re: ptx preliminary address space fixes [4/4]

2014-09-22 Thread Richard Biener
On Wed, Sep 17, 2014 at 10:15 PM, Bernd Schmidt  wrote:
> On 09/11/2014 01:41 PM, Richard Biener wrote:
>>
>> On Thu, Sep 11, 2014 at 12:12 PM, Bernd Schmidt 
>> wrote:
>>>
>>> This one isn't a wrong-code issue, just a missed optimization.  The
>>> strlen
>>> optimizations need to be made to look through ADDR_SPACE_CONVERT_EXPR to
>>> work on ptx.
>>>
>>> Bootstrapped and tested together with the other patches on x86_64-linux.
>>> Ok?
>>
>>
>> Did you try adding ADDR_SPACE_CONVERT_EXPR to the tree codes
>> handled in gimple_assign_cast_p?
>
>
> I did now (full test on x86_64, and also tested with ptx), and that also
> appears to work.  Ok?

Ok.

Thanks,
Richard.

>
> Bernd
>


Re: ptx preliminary rtl patches [3/4]

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 8:57 PM, Bernd Schmidt  wrote:
> On 09/12/2014 10:04 AM, Richard Biener wrote:
>>
>> On Thu, Sep 11, 2014 at 6:36 PM, Bernd Schmidt 
>> wrote:
>>>
>>> I strongly disagree. It's the same as for any other integer - there's one
>>> sign bit, and since there aren't any other bits, the number of sign bit
>>> copies is always exactly 1.
>>
>>
>> I agree about that.  But I fail to see what goes wrong with the existing
>> code in combine.  Maybe the code simply doesn't work for
>> GET_MODE_PRECISION != GET_MODE_BITSIZE?
>
>
> I had to debug it again - the patch was a year old. This time I came to the
> conclusion that we're just using the wrong mode. We're trying to simplify
> (ne:BI (reg:HI x) (const_int 0)), and the code here was using BImode when
> calling num_sign_bit_copies for the register - what I think it wants to do
> is verify that the operand consists of all zeros or all ones.
>
> Digging a bit further I noticed that some of the cases around this code have
> a mode == GET_MODE (op0) test. These were added by rth in commit 3573fd048,
> which added BImode. It looks like this particular case slipped through the
> cracks. The easiest way to fix it is the below - bootstrapped and tested on
> x86_64-linux, ok if it also works with ptx?

Ok.

Thanks,
Richard.

>
> Bernd
>


Re: Small fix for walking constructors

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law  wrote:
> On 09/18/14 13:01, Bernd Schmidt wrote:
>>
>> This fixes an issue on ptx where we fail to output a declaration for a
>> variable. The testcase is c-torture/compile/pr34856.c, and the cause of
>> the problem is that the variable g is never inserted into the varpool,
>> which is where a future patch will look for references to variables not
>> defined in the current translation unit (ptx assembly requires
>> declarations for these too).
>>
>> Bootstrapped and tested on x86_64-linux, ok?
>>
>>
>> Bernd
>>
>> walk-more.diff
>>
>>
>> commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
>> Author: Bernd Schmidt
>> Date:   Thu Sep 11 14:35:01 2014 +0200
>>
>>  Fix handling of CONSTRUCTORs in gimple-walk.
>>
>> * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
>> when
>> dealing with CONSTRUCTORs.
>
> OK.

Errr - certainly not.

It seems to me that walk_stmt_load_store_addr_ops is called on
bogus input.  The function is supposed to be called on GIMPLE
stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
conversions in their elements.

Please revert if you have applied already.

Thanks,
Richard.

> Jeff
>


Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.

2014-09-22 Thread Andreas Arnez
On Sat, Sep 20 2014, Mark Wielaard wrote:

> When adding DW_TAG_restrict_type I made a mistake when updating the
> code that handled types with multiple modifiers. This patch fixes it
> by putting the logic for finding the "sub-qualified" type in a separate
> function and fall back to adding the modifiers separately if there is
> no such existing type. The old tests didn't catch this case because
> there always was an existing sub-qualified type already. The new testcase
> fails before and succeeds after this patch.
>
> gcc/ChangeLog
>
>   * dwarf2out.c (existing_sub_qualified_type): New function.
>   (modified_type_die): Use existing_sub_qualified_type. Fall
>   back to adding modifiers one by one if there is no existing
>   sub-qualified type.
>
> gcc/testsuite/ChangeLog
>
>   * gcc.dg/guality/pr63300-const-volatile.c: New testcase.
> ---
>  gcc/dwarf2out.c| 85 
> ++
>  .../gcc.dg/guality/pr63300-const-volatile.c| 12 +++
>  2 files changed, 84 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/guality/pr63300-const-volatile.c
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index e87ade2..0cbc316 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -10461,6 +10461,51 @@ decl_quals (const_tree decl)
>? TYPE_QUAL_VOLATILE : TYPE_UNQUALIFIED));
>  }
>  
> +/* Returns true if CV_QUALS contains QUAL and we have a qualified
> +   variant of TYPE that has at least one other qualifier found in
> +   CV_QUALS.  Returns false if CV_QUALS doesn't contain QUAL, if
> +   CV_QUALS is empty after subtracting QUAL, or if we don't have a
> +   TYPE that has at least one qualifier from CV_QUALS minus QUAL.  */
> +static bool
> +existing_sub_qualified_type (tree type, int cv_quals, int qual)
> +{
> +  int sub_qual, sub_quals = cv_quals & ~qual;
> +  if ((cv_quals & qual) == TYPE_UNQUALIFIED || sub_quals == TYPE_UNQUALIFIED)
> +return false;
> +
> +  sub_qual = TYPE_QUAL_CONST;
> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  sub_qual = TYPE_QUAL_VOLATILE;
> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  sub_qual = TYPE_QUAL_RESTRICT;
> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  sub_qual = TYPE_QUAL_CONST & TYPE_QUAL_VOLATILE;

You probably mean '|' instead of '&' here.

> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  sub_qual = TYPE_QUAL_CONST & TYPE_QUAL_RESTRICT;

See above.

> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  sub_qual = TYPE_QUAL_VOLATILE & TYPE_QUAL_RESTRICT;

See above.

> +  if ((sub_quals & ~sub_qual) != TYPE_UNQUALIFIED
> +  && get_qualified_type (type, sub_quals & ~sub_qual) != NULL_TREE)
> +return true;
> +
> +  return false;
> +}

IIUC, 'sub_qual' above represents the qualifiers to *omit* from the ones
we're interested in, right?  Maybe it would be more straightforward to
reverse the logic, i.e., start with

sub_qual = TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT;

and then always use sub_qual instead of ~sub_qual.

Also note that the logic wouldn't scale too well for yet more
qualifiers...
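
To illustrate both points with a toy model (this is not dwarf2out.c code;
the flag values, the set of "existing" variants and every toy_* name are
assumptions for the sketch): ANDing two distinct single-bit flags always
gives zero, which is why '|' is needed, and a submask loop covers every
combination without listing them by hand.

```cpp
#include <cassert>
#include <set>

// Hypothetical single-bit qualifier flags for the sketch.
enum toy_qual {
  TOY_QUAL_NONE = 0,
  TOY_QUAL_CONST = 1,
  TOY_QUAL_VOLATILE = 2,
  TOY_QUAL_RESTRICT = 4
};

// EXISTING models "get_qualified_type (...) != NULL_TREE": it holds the
// qualifier masks for which a variant of the type already exists.
// Returns true if CV_QUALS contains QUAL and some non-empty subset of the
// remaining qualifiers names an existing variant.
bool toy_existing_sub_qualified(const std::set<int> &existing,
                                int cv_quals, int qual)
{
  const int sub_quals = cv_quals & ~qual;
  if ((cv_quals & qual) == TOY_QUAL_NONE || sub_quals == TOY_QUAL_NONE)
    return false;

  // Standard submask enumeration: visits every non-empty subset of
  // SUB_QUALS exactly once, however many qualifier bits there are.
  for (int sub = sub_quals; sub != 0; sub = (sub - 1) & sub_quals)
    if (existing.count(sub) != 0)
      return true;
  return false;
}
```

The loop works directly with the qualifiers to keep, in line with the
suggestion to use sub_qual rather than ~sub_qual.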

> +
>  /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
> entry that chains various modifiers in front of the given type.  */
>  
> @@ -10543,34 +10588,48 @@ modified_type_die (tree type, int cv_quals, 
> dw_die_ref context_die)
>  
>mod_scope = scope_die_for (type, context_die);
>  
> -  if ((cv_quals & TYPE_QUAL_CONST)
> -  /* If there are multiple type modifiers, prefer a path which
> -  leads to a qualified type.  */
> -  && (((cv_quals & ~TYPE_QUAL_CONST) == TYPE_UNQUALIFIED)
> -   || get_qualified_type (type, cv_quals) == NULL_TREE
> -   || (get_qualified_type (type, cv_quals & ~TYPE_QUAL_CONST)
> -   != NULL_TREE)))
> +  /* If there are multiple type modifiers, prefer a path which
> + leads to a qualified type.  */
> +  if (existing_sub_qualified_type (type, cv_quals, TYPE_QUAL_CONST))
>  {
>mod_type_die = new_die (DW_TAG_const_type, mod_scope, type);
>sub_die = modified_type_die (type, cv_quals & ~TYPE_QUAL_CONST,
>  context_die);
>  }
> -  else if ((cv_quals & TYPE_QUAL_VOLATILE)
> -&& (((cv_quals & ~TYPE_QUAL_VOLATILE) == TYPE_UNQUALIFIED)
> -|| get_qualified_type (type, cv_quals) == NULL_TREE
> -|| (get_qualified_type (type, cv_quals & ~TYPE_QUAL_VOLATILE)
> -  

Re: Small fix for walking constructors

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener
 wrote:
> On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law  wrote:
>> On 09/18/14 13:01, Bernd Schmidt wrote:
>>>
>>> This fixes an issue on ptx where we fail to output a declaration for a
>>> variable. The testcase is c-torture/compile/pr34856.c, and the cause of
>>> the problem is that the variable g is never inserted into the varpool,
>>> which is where a future patch will look for references to variables not
>>> defined in the current translation unit (ptx assembly requires
>>> declarations for these too).
>>>
>>> Bootstrapped and tested on x86_64-linux, ok?
>>>
>>>
>>> Bernd
>>>
>>> walk-more.diff
>>>
>>>
>>> commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
>>> Author: Bernd Schmidt
>>> Date:   Thu Sep 11 14:35:01 2014 +0200
>>>
>>>  Fix handling of CONSTRUCTORs in gimple-walk.
>>>
>>> * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
>>> when
>>> dealing with CONSTRUCTORs.
>>
>> OK.
>
> Errr - certainly not.
>
> It seems to me that walk_stmt_load_store_addr_ops is called on
> bogus input.  The function is supposed to be called on GIMPLE
> stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
> conversions in their elements.
>
> Please revert if you have applied already.

For the testcase I can indeed see


  :
  pin_3 = {(unsigned int) (long int) &g[16]};

but that's invalid GIMPLE, unfortunately not caught by our checker.

Please fix the root cause and add checking to verify_gimple_assign_single.
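
The kind of check meant here could look roughly like the sketch below.  It
is a toy model only: the element kinds and the function are hypothetical
and do not reflect the actual tree codes or the signature of
verify_gimple_assign_single.

```cpp
#include <vector>

// Hypothetical element kinds for a CONSTRUCTOR in this toy model.
enum toy_elt_kind {
  TOY_ELT_ADDR,      // address of a decl, e.g. &g[16]
  TOY_ELT_CONSTANT,  // literal constant
  TOY_ELT_SSA_NAME,  // SSA name
  TOY_ELT_CONVERT    // a conversion wrapped around something else
};

// Returns true if every element is a plain value.  A conversion nested in
// an element, as in {(unsigned int) (long int) &g[16]}, is rejected.
bool toy_constructor_valid_p(const std::vector<toy_elt_kind> &elts)
{
  for (toy_elt_kind kind : elts)
    if (kind == TOY_ELT_CONVERT)
      return false;
  return true;
}
```

With a check of this sort in the verifier, the statement above would be
flagged where it is created instead of surfacing later in the walker.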

Thanks,
Richard.



>
> Thanks,
> Richard.
>
>> Jeff
>>


Re: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Andrew Pinski
On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford
 wrote:
> This series is a cleaned-up version of:
>
> https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>
> The underlying problem is that the semantics of subregs depend on the
> word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
> but you can have a subreg for word 2 of a 4-word value (as well as lowpart
> subregs of that word, etc.).  This causes problems when an architecture has
> wider-than-word registers, since the addressability of a word can then depend
> on which register class is used.
>
> The register allocators need to fix up cases where a subreg turns out to
> be invalid for a particular class.  This is really an extension of what
> we need to do for CANNOT_CHANGE_MODE_CLASS.
>
> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.


This sounds like something which should be tested on spu as it is the
main target that I can think of which has wider-than-word registers
and that has had issues with subreg.  I can't remember if the
simulator for SPU is free (as in beer) and would run on anything
besides PowerPC.  It has been more than 4 years since I looked into
the spu back-end also.

Thanks,
Andrew Pinski

>
> Thanks,
> Richard
>


Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)

2014-09-22 Thread Jan-Benedict Glaw
On Mon, 2014-09-22 08:58:34 +0100, James Greenhalgh wrote:
> On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote:
> > My build robot shows a new build error, which I guess is
> > caused/uncovered by your genrecog change on bfin-elf (see eg. build
> > http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667):
[...]
> > build/genrecog /home/jbglaw/repos/gcc/gcc/common.md /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \
> >   insn-conditions.md > tmp-recog.c
> > /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not supported in define_split
> > make[1]: *** [s-recog] Error 1
> > make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc'
> > make: *** [all-gcc] Error 2
> > 
> > Would be nice if the bfin maintainer or you would come up with a fix.
> 
> I posted a fix for this on Friday evening at:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html
> 
> I'm waiting for a bfin maintainer to say OK, as it isn't a port I know
> well.

Ah great!  I somehow missed your email, sorry for the
noise.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
the second  :




Re: Fix ICE with ODR merging and variable sized types

2014-09-22 Thread Richard Biener
On Fri, Sep 19, 2014 at 8:55 PM, Jan Hubicka  wrote:
> Hi,
> this patch fixes an ICE by avoiding mangling of types with variadic size
> (those are not really supported).  Bootstrapped/regtested x86_64-linux,
> tested with libreoffice, committed.

Hmm, but how do global vars end up having variadic type?  Isn't the
bug that you are ending up with some local entity here?

Richard.

> PR lto/63286
> * tree.c (need_assembler_name_p): Do not mangle variadic types.
> Index: tree.c
> ===
> --- tree.c  (revision 215328)
> +++ tree.c  (working copy)
> @@ -5003,6 +5003,7 @@ need_assembler_name_p (tree decl)
>&& decl == TYPE_NAME (TREE_TYPE (decl))
>&& !is_lang_specific (TREE_TYPE (decl))
>&& AGGREGATE_TYPE_P (TREE_TYPE (decl))
> +  && !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)
>&& !type_in_anonymous_namespace_p (TREE_TYPE (decl)))
>  return !DECL_ASSEMBLER_NAME_SET_P (decl);
>/* Only FUNCTION_DECLs and VAR_DECLs are considered.  */


Re: Small fix for walking constructors

2014-09-22 Thread Bernd Schmidt

On 09/22/2014 11:00 AM, Richard Biener wrote:

>> It seems to me that walk_stmt_load_store_addr_ops is called on
>> bogus input.  The function is supposed to be called on GIMPLE
>> stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
>> conversions in their elements.
>>
>> Please revert if you have applied already.
>
> For the testcase I can indeed see
>
>    :
>    pin_3 = {(unsigned int) (long int) &g[16]};
>
> but that's invalid GIMPLE, unfortunately not caught by our checker.
>
> Please fix the root cause and add checking to verify_gimple_assign_single.


Hmm, fix how exactly? What representation do you want for an initializer 
where a pointer is cast to an int (or to a different address space, 
something that will be possible with patches I'll submit in the near 
future)?



Bernd



Re: Small fix for walking constructors

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 11:00 AM, Richard Biener
 wrote:
> On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener
>  wrote:
>> On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law  wrote:
>>> On 09/18/14 13:01, Bernd Schmidt wrote:

>>>> This fixes an issue on ptx where we fail to output a declaration for a
>>>> variable. The testcase is c-torture/compile/pr34856.c, and the cause of
>>>> the problem is that the variable g is never inserted into the varpool,
>>>> which is where a future patch will look for references to variables not
>>>> defined in the current translation unit (ptx assembly requires
>>>> declarations for these too).
>>>>
>>>> Bootstrapped and tested on x86_64-linux, ok?
>>>>
>>>>
>>>> Bernd
>>>>
>>>> walk-more.diff
>>>>
>>>>
>>>> commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
>>>> Author: Bernd Schmidt
>>>> Date:   Thu Sep 11 14:35:01 2014 +0200
>>>>
>>>>  Fix handling of CONSTRUCTORs in gimple-walk.
>>>>
>>>> * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
>>>> when
>>>> dealing with CONSTRUCTORs.
>>>
>>> OK.
>>
>> Errr - certainly not.
>>
>> It seems to me that walk_stmt_load_store_addr_ops is called on
>> bogus input.  The function is supposed to be called on GIMPLE
>> stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
>> conversions in their elements.
>>
>> Please revert if you have applied already.
>
> For the testcase I can indeed see
>
>
>   :
>   pin_3 = {(unsigned int) (long int) &g[16]};
>
> but that's invalid GIMPLE, unfortunately not caught by our checker.
>
> Please fix the root cause and add checking to verify_gimple_assign_single.

I'm on it.  The reason for the invalid GIMPLE is the gimplifier which
says "well, looks like a constant for the target!" and doesn't gimplify
at all then.  Oops (but only for vectors?!).

Introduced by r118747 and only semi-fixed by r129739.  The original
rev. is also just a bugfix for an ICE.

Thus I am testing the following.

2014-09-22  Richard Biener  

* gimplify.c (gimplify_init_constructor): Do not leave
non-GIMPLE vector constructors around.
* tree-cfg.c (verify_gimple_assign_single): Verify that
CONSTRUCTORs have gimple elements.


Richard.






> Thanks,
> Richard.
>
>
>
>>
>> Thanks,
>> Richard.
>>
>>> Jeff
>>>




Re: Small fix for walking constructors

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 11:52 AM, Bernd Schmidt  wrote:
> On 09/22/2014 11:00 AM, Richard Biener wrote:
>>>
>>> It seems to me that walk_stmt_load_store_addr_ops is called on
>>> bogus input.  The function is supposed to be called on GIMPLE
>>> stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
>>> conversions in their elements.
>>>
>>> Please revert if you have applied already.
>>
>>
>> For the testcase I can indeed see
>>
>>
>>:
>>pin_3 = {(unsigned int) (long int) &g[16]};
>>
>> but that's invalid GIMPLE, unfortunately not caught by our checker.
>>
>> Please fix the root cause and add checking to verify_gimple_assign_single.
>
>
> Hmm, fix how exactly? What representation do you want for an initializer
> where a pointer is cast to an int (or to a different address space,
> something that will be possible with patches I'll submit in the near
> future)?

For the above

  _4 = (long int) &g[16];
  _5 = (unsigned int) _4;
  pin_3 = { _5 };

it's GIMPLE after all, not GENERIC.

Richard.
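
A minimal sketch of the same flattening at the C level, not taken from the
thread.  `nested` keeps both conversions in one expression, while `flattened`
gives each conversion its own temporary, mirroring `_4` and `_5` above.  The
array `g`, the function names and `forms_agree` are illustrative assumptions,
and the pointer-to-`long int` cast assumes `long int` is as wide as a pointer.

```c
#include <assert.h>

/* Hypothetical stand-in for the global array in the testcase.  */
static char g[32];

/* Both conversions nested in a single expression.  */
static unsigned int
nested (void)
{
  return (unsigned int) (long int) &g[16];
}

/* One conversion per statement, as in the GIMPLE sequence above.  */
static unsigned int
flattened (void)
{
  long int t4 = (long int) &g[16];      /* _4 = (long int) &g[16];  */
  unsigned int t5 = (unsigned int) t4;  /* _5 = (unsigned int) _4;  */
  return t5;                            /* pin_3 = { _5 };  */
}

/* The two shapes compute the same value; only the form differs.  */
static void
forms_agree (void)
{
  assert (nested () == flattened ());
}
```

The value is unchanged by the rewrite; what changes is that every operand of
the final constructor is a plain temporary rather than a conversion
expression.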

>
> Bernd
>


Re: [Patch bfin] Fixup use of constraints in define_split

2014-09-22 Thread Bernd Schmidt

On 09/19/2014 11:32 PM, James Greenhalgh wrote:

As with the earlier patch for sh
( https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01627.html ), this fixes the
fallout caused by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01615.html.

These are build failures, and the fixes are "obvious", but I don't know
my way around the failing ports, so I'd like an explicit maintainer ack.

For testing, I've just checked that the build error is resolved.


Looks obvious to me too. Thanks!


Bernd



Re: [PATCH] gcc-gdb-test.exp: Handle old GDB "short int" and "long int" types.

2014-09-22 Thread Jakub Jelinek
On Sat, Sep 20, 2014 at 11:21:25PM +0200, Mark Wielaard wrote:
> Old GDB might show short and long as short int and long int. This made
> gcc.dg/guality/const-volatile.c and restrict.c fail on older GDBs.
> According to the patch that changed this in newer versions of GDB
> this was a bug: https://sourceware.org/ml/gdb-patches/2012-09/msg00455.html
> 
> The patch transforms the types "short int" and "long int" coming from
> GDB to plain "short" and "long". And a variant has been added to the
> const-volatile.c testcase to make sure short and long long are handled
> correctly now with older GDB.
> 
> Tested against GDB 7.7.1 and 7.4.50.
> 
> gcc/testsuite/ChangeLog
> 
>   * lib/gcc-gdb-test.exp (gdb-test): Transform gdb types "short int"
>   and "long int" to plain "short" and "long".
>   * gcc.dg/guality/const-volatile.c (struct bar): New struct
>   containing short and long long fields.
>   (bar): New variable to test the type.

Ok, with a minor nit:

> --- a/gcc/testsuite/lib/gcc-gdb-test.exp
> +++ b/gcc/testsuite/lib/gcc-gdb-test.exp
> @@ -111,6 +111,10 @@ proc gdb-test { args } {
>   # Squash all extra whitespace/newlines that gdb might use for
>   # "pretty printing" into one so result is just one line.
>   regsub -all {[\n\r\t ]+} $type " " type
> + # Old gdb might output "long int" instead of just "long"
> + # and "short int" instead of just "short". Canonicalize.
> +regsub -all {\mlong int\M} $type "long" type
> +regsub -all {\mshort int\M} $type "short" type

Please fix whitespace on the above 2 lines, should be tab + 4 spaces
instead of 12 spaces.

Jakub
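
For readers less familiar with Tcl's `\m` and `\M` word anchors, here is a
rough C sketch of the canonicalization the two `regsub` calls perform.  It is
an illustration only, not code from the patch: the helper names, the 256-byte
scratch buffer and the definition of a "word character" (alphanumeric or
underscore) are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if C can be part of a word, approximating Tcl's \m and \M.  */
static int
is_word_char (int c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Copy SRC to DST (capacity DST_SIZE), rewriting each whole-word
   "<BASE> int" as "<BASE>".  Output is truncated if DST is too small.  */
static void
drop_int_suffix (const char *src, const char *base, char *dst,
                 size_t dst_size)
{
  size_t base_len = strlen (base);
  size_t out = 0;
  const char *p = src;

  assert (dst_size > 0);
  while (*p != '\0' && out + 1 < dst_size)
    {
      int at_word_start = (p == src) || !is_word_char (p[-1]);

      if (at_word_start
          && strncmp (p, base, base_len) == 0
          && strncmp (p + base_len, " int", 4) == 0
          && !is_word_char (p[base_len + 4]))
        {
          /* Emit BASE only and skip over the trailing " int".  */
          size_t room = dst_size - 1 - out;
          size_t n = base_len < room ? base_len : room;

          memcpy (dst + out, base, n);
          out += n;
          p += base_len + 4;
        }
      else
        dst[out++] = *p++;
    }
  dst[out] = '\0';
}

/* Apply both rewrites from the patch: "long int" and "short int".  */
static void
canonicalize_gdb_type (const char *src, char *dst, size_t dst_size)
{
  char tmp[256];

  drop_int_suffix (src, "long", tmp, sizeof tmp);
  drop_int_suffix (tmp, "short", dst, dst_size);
}
```

The word-boundary checks matter: without them a type name such as
"long integer" would be rewritten as well.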


Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-22 Thread Jiong Wang

On 19/09/14 17:19, Jeff Law wrote:
> On 09/19/14 10:02, Jiong Wang wrote:
>> On 19/09/14 16:49, Jeff Law wrote:
>>> Probably.  Though I'd be a bit concerned with next_block->next_bb.
>>> Wouldn't it be safer to stash away the relevant basic block prior to the
>>> call to split_edge, then use that saved copy.  Something like this
>>> (untested):
>>>
>>> basic_block old_dest = live_edge->dest;
>>> next_block = split_edge (live_edge);
>>>
>>> /* We create a new basic block.  Call df_grow_bb_info to make sure
>>>    all data structures are allocated.  */
>>> df_grow_bb_info (df_live);
>>> bitmap_and (df_get_live_in (next_block),
>>>             df_get_live_out (bb),
>>>             df_get_live_in (old_dest));
>>>
>>> The idea being we don't want to depend on the precise ordering blocks in
>>> the block chain.
>>>
>>> Could you try that and see if it does what you need?
>>
>> Jeff,
>>
>> Thanks, verified, it works.
>
> Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of
  new created BB as the intersection of live-in from "old_dest" and live-out
  from "bb".
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..63deadf 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -217,12 +217,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   if (!df_live)
 	return false;

+  basic_block old_dest = live_edge->dest;
   next_block = split_edge (live_edge);

   /* We create a new basic block.  Call df_grow_bb_info to make sure
 	 all data structures are allocated.  */
   df_grow_bb_info (df_live);
-  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+
+  bitmap_and (df_get_live_in (next_block), df_get_live_out (bb),
+		  df_get_live_in (old_dest));
   df_set_bb_dirty (next_block);

   /* We should not split more than once for a function.  */
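
A small sketch of what the switch from bitmap_copy to bitmap_and changes,
using a plain bitmask in place of GCC's bitmap type.  The `regset` typedef and
the function name are assumptions made for illustration; this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* A register set as a bitmask: bit N set means register N is live.  */
typedef uint64_t regset;

/* Live-in set for the block created by splitting the edge BB -> OLD_DEST:
   only registers that BB leaves live and that OLD_DEST actually needs.  */
static regset
live_in_of_split_block (regset live_out_bb, regset live_in_old_dest)
{
  regset live_in = live_out_bb & live_in_old_dest;

  /* The result never contains a register that BB does not leave live.  */
  assert ((live_in & ~live_out_bb) == 0);
  return live_in;
}
```

With the earlier copy, a register live out of `bb` only because of some other
successor would also have been marked live into the new block; the
intersection drops it.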

Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-22 Thread Jiong Wang

On 19/09/14 21:43, Jeff Law wrote:
> On 09/15/14 08:33, Jiong Wang wrote:
>> Jeff,
>>
>>    thanks, I partially understand your meaning here.
>>
>> take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c
>> for example,
>>
>> when generating address rtl, gcc will automatically generate "const"
>> operator to prefix the address expression, like the following. so a
>> simple CONSTANT_P check is enough in case there is no embedded register.
>>
>> (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
>>   (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]
>> )
>>   (const_int 480 [0x1e0] -1
>>
>> but for architecture like aarch64, the following instruction
>> sequences to forming address may be generated
>>
>> (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
>>   (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  ))) 35 {*movdi_aarch64}
>>    (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags
>> 0xc0]  ))
>>   (nil)))
>>
>> (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
>>   (lo_sum:DI (reg/f:DI 20 x20 [99])
>>   (symbol_ref:DI ("global_a") [flags 0xc0]  ))) {add_losym_di}
>>    (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]
>> )
>>   (nil)))
>>
>>    while CONSTANT_P could not catch the latter lo_sum case, as the
>> RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
>
> Hmm, it's been ~15 years since I regularly worked on a target that uses
> HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
> as well, but reading the docs for CONST, that clearly isn't the case.
>
> Sorry for that.  Can you (re) send your current patch for this for review?


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
  !REG_P (src) to release more instruction sink opportunities.

gcc/testsuite/
  * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..739e957 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "bb-reorder.h"
 #include "shrink-wrap.h"
 #include "regcprop.h"
+#include "rtl-iter.h"

 #ifdef HAVE_simple_return

@@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;

@@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+{
+  unsigned int reg_num = 0;
+  unsigned int nonconstobj_num = 0;
+  rtx src_inner = NULL_RTX;
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
+	{
+	  rtx x = *iter;
+	  if (REG_P (x))
+	{
+	  reg_num++;
+	  src_inner = x;
+	}
+	  else if (!CONSTANT_P (x) && OBJECT_P (x))
+	nonconstobj_num++;
+	}
+
+  if (nonconstobj_num > 0
+	  || reg_num > 1)
+	src = NULL_RTX;
+  else if (reg_num == 1)
+	src = src_inner;
+}
+
+  if (!REG_P (dest) || src == NULL_RTX
   /* STACK or FRAME related adjustment might be part of prologue.
 	 So keep them in the entry block.  */
   || dest == stack_pointer_rtx
@@ -188,10 +218,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;

   /* Make sure that the source register isn't defined later in BB.  */
-  sregno = REGNO (src);
-  end_sregno = END_REGNO (src);
-  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
-return false;
+  if (REG_P (src))
+{
+  sregno = REGNO (src);
+  end_sregno = END_REGNO (src);
+  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
+	return false;
+}

   /* Make sure that the destination register isn't referenced later in BB.  */
   dregno = REGNO (dest);
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
new file mode 100644
index 000..ad2e588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+extern char *asm_out_file;
+extern void default_elf_asm_output_ascii (char *, const char *, int);
+
+void
+assemble_string (const char *p, int size)
+{
+  int pos = 0;
+  int maximum = 2000;
+
+  while (pos < size)
+{
+  int thissize = size - pos;
+
+  if (thissize > maximum)
+	thissize = maximum;
+
+  default_elf_asm_output_ascii (asm_out_file, p, thissize);;
+
+  pos += thissize;
+  p += thissize;
+}
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wr
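
The source check added by the patch can be paraphrased with a toy expression
tree.  This is only a sketch of the counting rule (at most one register, no
non-constant objects), not GCC code: the node kinds, `struct expr` and the
function names are hypothetical, and `K_MEM` stands in for any non-constant
object other than a register.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node kinds standing in for RTL codes (hypothetical).  */
enum kind { K_REG, K_CONST, K_MEM, K_OP };

struct expr
{
  enum kind kind;
  const struct expr *op0;  /* Operands; NULL when absent.  */
  const struct expr *op1;
};

struct counts
{
  unsigned int regs;           /* Number of register nodes seen.  */
  unsigned int nonconst_objs;  /* Non-constant objects other than REG.  */
  const struct expr *reg;      /* Last register node seen, if any.  */
};

/* Walk X and all of its sub-expressions, updating C.  */
static void
count_subexprs (const struct expr *x, struct counts *c)
{
  if (x == NULL)
    return;
  if (x->kind == K_REG)
    {
      c->regs++;
      c->reg = x;
    }
  else if (x->kind == K_MEM)
    c->nonconst_objs++;
  count_subexprs (x->op0, c);
  count_subexprs (x->op1, c);
}

/* Return 1 if SRC passes the check.  On success *REG is the single
   register it mentions, or NULL if SRC is purely constant.  */
static int
source_can_sink (const struct expr *src, const struct expr **reg)
{
  struct counts c = { 0, 0, NULL };

  assert (reg != NULL);
  count_subexprs (src, &c);
  if (c.nonconst_objs > 0 || c.regs > 1)
    return 0;
  *reg = c.reg;
  return 1;
}
```

Under this rule a `lo_sum` of one register and a symbol is accepted, which is
the AArch64 case discussed above, while a sum of two registers or a memory
reference is rejected.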

Re: [PATCH] remove duplicated lines in gcc/fortran/resolve.c

2014-09-22 Thread Dominique d'Humières

On 21 Sept. 2014, at 10:44, FX wrote:

>> AFAICT the lines 11200-11222 in gcc/fortran/resolve.c are a copy of
>> the lines 11176-11198.
> 
> The duplicates were introduced by revision 126468, an unrelated patch, after 
> the original commit of the code as 126466. It looks like a svn/patch mishap.

Thanks for tracking the origin of the problem. After having tested a patch, I
usually do not revert it before doing the svn update when it is committed. If a
last-minute change has been made between the patch and the commit, this
generally results in a conflict during the svn update. However, I have seen a
couple of times that the patched tree is simply merged with the update, leading
to a duplicated piece of code.

For the record, while the two blocks were identical from a functional point of
view, their formatting was slightly different:

< "PUBLIC interface '%s' at %L takes "
< "dummy arguments of '%s' which is "
< "PRIVATE", iface->sym->name, 
---
> "PUBLIC interface '%s' at %L "
> "takes dummy arguments of '%s' which "
> "is PRIVATE", iface->sym->name, 

> 
>> The following patch removes the duplicated
>> lines. OK for the trunk?
> 
> You didn’t say if it was regtested. OK if it was (one can never be too sure!)

The patch has been in my working tree for months. I have posted recent test 
results with the patch (and r211089 reverted to fix pr61387) at

https://gcc.gnu.org/ml/gcc-testresults/2014-09/msg02018.html

> 
> Thanks,
> FX

This is what I have committed as revision r215452

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog   (revision 215451)
+++ gcc/fortran/ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2014-09-21  Dominique d'Humieres 
+
+   * resolve.c (resolve_fl_procedure): Remove duplicated lines.
+
 2014-09-20  Alessandro Fanfarillo  
Tobias Burnus  
 
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 215451)
+++ gcc/fortran/resolve.c   (working copy)
@@ -11196,30 +11196,6 @@
}
 }
}
-
-  /* PUBLIC interfaces may expose PRIVATE procedures that take types
-PRIVATE to the containing module.  */
-  for (iface = sym->generic; iface; iface = iface->next)
-   {
- for (arg = gfc_sym_get_dummy_args (iface->sym); arg; arg = arg->next)
-   {
- if (arg->sym
- && arg->sym->ts.type == BT_DERIVED
- && !arg->sym->ts.u.derived->attr.use_assoc
- && !gfc_check_symbol_access (arg->sym->ts.u.derived)
- && !gfc_notify_std (GFC_STD_F2003, "Procedure '%s' in "
- "PUBLIC interface '%s' at %L takes "
- "dummy arguments of '%s' which is "
- "PRIVATE", iface->sym->name, 
- sym->name, &iface->sym->declared_at, 
- gfc_typename(&arg->sym->ts)))
-   {
- /* Stop this message from recurring.  */
- arg->sym->ts.u.derived->attr.access = ACCESS_PUBLIC;
- return false;
-   }
-}
-   }
 }
 
   if (sym->attr.function && sym->value && sym->attr.proc != PROC_ST_FUNCTION

Thanks for the review,

Dominique



Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence  wrote:
> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>
> These are presently documented as producing a vector with the result in
> element 0, and this is inconsistent with their use in tree-vect-loop.c
> (which on bigendian targets pulls the bits out of the wrong end of the
> vector result). This leads to bugs on bigendian targets - see also
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.
>
> I discounted "fixing" the vectorizer (to read from element 0) and then
> making bigendian targets (whose architectural insn produces the result in
> lane N-1) permute the result vector, as optimization of vectors in RTL seems
> unlikely to remove such a permute and would lead to a performance
> regression.
>
> Instead it seems more natural for the tree code to produce a scalar result
> (producing a vector with the result in lane 0 has already caused confusion,
> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>
> However, this patch preserves the meaning of the optab (producing a result
> in lane 0 on little-endian architectures or N-1 on bigendian), thus
> generally avoiding the need to change backends. Thus, expr.c extracts an
> endianness-dependent element from the optab result to give the result
> expected for the tree code.
>
> Previously posted as an RFC
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
> VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.

Ok with that change.

Thanks,
Richard.

> Testing:
> x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
> aarch64-none-linux-gnu: bootstrap
> aarch64-none-elf:  check-gcc, check-g++
> arm-none-eabi: check-gcc
>
> aarch64_be-none-elf: check-gcc, showing
> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
> Passes the (previously-failing) reduced testcase on
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>
> Have also assembler/stage-1 tested that testcase on PowerPC, also
> fixed.

> gcc/ChangeLog:
>
> * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
> extract_bit_field around optab result.
>
> * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
> produce
> scalar not vector.
>
> * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
> type
> for REDUC_{MIN,MAX,PLUS}_EXPR.
>
> * tree-vect-loop.c (vect_analyze_loop): Update comment.
> (vect_create_epilog_for_reduction): For direct vector reduction, use
> result of tree code directly without extract_bit_field.
>
> * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
> comment.


[committed] Fix -fcompare-debug issue in simd clone creation (PR debug/63328)

2014-09-22 Thread Jakub Jelinek
Hi!

Obviously, it is a bad idea to emit gimple assign stmts for temporaries
needed by debug stmts, we have to emit debug temporaries instead, otherwise
we generate different code between -g0 and -g.
Tested on x86_64-linux, committed to trunk/4.9.

2014-09-22  Jakub Jelinek  

PR debug/63328
* omp-low.c (ipa_simd_modify_stmt_ops): For debug stmts
insert a debug source bind stmt setting DEBUG_EXPR_DECL
instead of a normal gimple assignment stmt.

* c-c++-common/gomp/pr63328.c: New test.

--- gcc/omp-low.c.jj   2014-09-08 22:12:46.0 +0200
+++ gcc/omp-low.c   2014-09-22 11:44:47.751338842 +0200
@@ -11717,9 +11717,22 @@ ipa_simd_modify_stmt_ops (tree *tp, int
   if (tp != orig_tp)
 {
   repl = build_fold_addr_expr (repl);
-  gimple stmt
-   = gimple_build_assign (make_ssa_name (TREE_TYPE (repl), NULL), repl);
-  repl = gimple_assign_lhs (stmt);
+  gimple stmt;
+  if (is_gimple_debug (info->stmt))
+   {
+ tree vexpr = make_node (DEBUG_EXPR_DECL);
+ stmt = gimple_build_debug_source_bind (vexpr, repl, NULL);
+ DECL_ARTIFICIAL (vexpr) = 1;
+ TREE_TYPE (vexpr) = TREE_TYPE (repl);
+ DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (repl));
+ repl = vexpr;
+   }
+  else
+   {
+ stmt = gimple_build_assign (make_ssa_name (TREE_TYPE (repl),
+NULL), repl);
+ repl = gimple_assign_lhs (stmt);
+   }
   gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
   gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
   *orig_tp = repl;
--- gcc/testsuite/c-c++-common/gomp/pr63328.c.jj   2014-09-22 12:09:50.140724501 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr63328.c   2014-09-22 12:09:45.371745608 +0200
@@ -0,0 +1,5 @@
+/* PR debug/63328 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp-simd -fno-strict-aliasing -fcompare-debug" } */
+
+#include "pr60823-3.c"

Jakub


Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 1:54 PM, Alan Lawrence  wrote:
> These match their corresponding tree codes, by taking a vector and returning
> a scalar; this is more architecturally neutral than the (somewhat loosely
> defined) previous optab that took a vector and returned a vector with the
> result in the least significant bits (i.e. element 0 for little-endian or
> N-1 for bigendian). However, the old optabs are preserved so as not to break
> existing backends, so clients check for both old + new optabs.
>
> Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
> aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
> (of course at this point in the series all these are using the old optab +
> migration path.)

scalar_reduc_to_vector misses a comment.

I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.

The optabs have only one mode - I wouldn't be surprised if an ISA
invents for example v4si -> di reduction?  So do we want to make
reduc_plus_scal_optab a little bit more future proof (maybe there
is already an ISA that supports this kind of reduction?).

Otherwise the patch looks good to me.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
> optabs, and note in reduc_[us](plus|min|max) to prefer the former.
>
> * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
> back to old reduc_... + BIT_FIELD_REF only if not.
>
> * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
> return the reduce-to-scalar (reduc_..._scal) optab.
> (scalar_reduc_to_vector): New.
>
> * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
> reduc_plus_scal_optab, reduc_umax_scal_optab,
> reduc_umin_scal_optab):
> New.
>
> * optabs.h (scalar_reduc_to_vector): Declare.
>
> * tree-vect-loop.c (vectorizable_reduction): Look for optabs
> reducing
> to either scalar or vector.


Re: [PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:19 PM, Alan Lawrence  wrote:
> For reduction operations (e.g. multiply) that don't have such a tree code,
> or where the target platform doesn't define an optab handler for the tree
> code, we can perform the reduction using a series of log(N) shifts (where N
> = #elements in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift tree
> code (if the platform handles the vec_shr_optab).
>
> First stage is to add some tests of non-(min/max/plus) reductions; here,
> multiplies. The first is designed to be non-foldable, so we make sure the
> architectural instructions line up with what the tree codes specify. The
> second is designed to be easily constant-propagated, to test the (currently
> endianness-dependent) constant folding code.
>
> In lib/target-supports.exp, I've defined a new
> check_effective_target_whole_vector_shift, which I intended to define to
> true for platforms with the vec_shr optab. However, I've not managed to make
> this test pass on PowerPC - even with -maltivec, -fdump-tree-vect-details
> gives me a message about the target not supporting vector multiplication -
> so I've omitted PowerPC from the whole_vector_shift. This doesn't feel
> right, suggestions welcomed from PowerPC maintainers?
>
> Tests passing on arm-none-eabi and x86_64-none-linux-gnu;
> also verified the scan-tree-dump part works on ia64-none-linux-gnu (by
> compiling to assembly only).
> (Tests are not run on AArch64, because we have no vec_shr_optab at this
> point; PowerPC, as above; or MIPS, as check_effective_target_vect_int_mult
> yields 0.)

Ok.

Thanks,
Richard.

> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp
> (check_effective_target_whole_vector_shift):
> New.
>
> * gcc.dg/vect/vect-reduc-mul_1.c: New test.
> * gcc.dg/vect/vect-reduc-mul_2.c: New test.


Re: [PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:25 PM, Alan Lawrence  wrote:
> These are like the previous patch, but using | rather than * - I was unable
> to get the previous test to pass on PowerPC and MIPS.
>
> I note there is no inherent vector operation here - a bitwise OR across a
> word, and a "reduction via shifts" using scalar (not vector) ops would be
> all that's necessary. However, GCC doesn't exploit this possibility at
> present, and I don't have any plans at present to add such myself.
>
> Passing on x86_64-linux-gnu, aarch64-none-elf, aarch64_be-none-elf,
> arm-none-eabi.
> The 'scan-tree-dump' part passes on mips64 and powerpc (although the latter
> is disabled as check_effective_target_whole_vector_shift gives 0, as per
> previous patch)

Ok.

Thanks,
Richard.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-reduc-or_1.c: New test.
> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.


RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Sandiford
Sent: Monday, September 22, 2014 12:54 PM
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Jeff Law  writes:
> On 09/19/14 01:23, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> On 09/18/14 04:07, Richard Sandiford wrote:
>>>> This series is a cleaned-up version of:
>>>>
>>>>    https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>>>>
>>>> The underlying problem is that the semantics of subregs depend on
>>>> the word size.  You can't have a subreg for byte 2 of a 4-byte
>>>> word, say, but you can have a subreg for word 2 of a 4-word value
>>>> (as well as lowpart subregs of that word, etc.).  This causes
>>>> problems when an architecture has wider-than-word registers, since
>>>> the addressability of a word can then depend on which register
>>>> class is used.
>>>>
>>>> The register allocators need to fix up cases where a subreg turns
>>>> out to be invalid for a particular class.  This is really an
>>>> extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.
>>>>
>>>> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
>>> I thought we fixed these problems long ago with the change to subreg_byte?!?
>>
>> No, that was fixing something else.  (I'm just about old enough to 
>> remember that too!)  The problem here is that (say):
>>
>>  (subreg:SI (reg:DI X) 4)
>>
>> is independently addressable on little-endian AArch32 if X assigned 
>> to a GPR, but not if X is assigned to a vector register.  We need to 
>> allow these kinds of subreg on pseudos in order to decompose 
>> multiword arithmetic.  It's then up to the RA to realise that a 
>> reload would be needed if X were assigned to a vector register, since 
>> the upper half of a vector register cannot be independently accessed.
>>
>> Note that you could write this example even with the old word-style 
>> offsets and IIRC the effect would have been the same.
> OK.  So I kept thinking in terms of the byte offset stuff.  But what 
> you're tackling is related to the mess around the mode of the subreg 
> having a different meaning if it's smaller than a word vs word-sized or 
> greater.
>
> Right?

>>Yeah, that's right.  Addressability is based on words, which is inconvenient 
>>when your registers are bigger than a word.

Consider an architecture like MicroBlaze, which doesn't support 1-byte or 2-byte 
registers. In this scenario, what should be returned when SUBREG_WORD is used?

Thanks,
Richard



Re: [PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:27 PM, Alan Lawrence  wrote:
> The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c
> (vect_create_epilog_for_reduction), to shift the vector by a whole number of
> elements. The tree code allows more general shifts but only for integral
> types. This only causes pain and difficulty for backends (particularly for
> backends with different endiannesses), and enforcing that restriction for
> integral types too does no harm.
>
> bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu
> check-gcc on aarch64-none-elf and x86_64-none-linux-gnu

Hmm, but then (coming from the tree / gimple level) all shifts can
be expressed with a VEC_PERM_EXPR.  And of course a general
whole-vector shift could be expressed using a VIEW_CONVERT_EXPR
to a 1-element integer vector and a regular [RL]SHIFT_EXPR and then
converting back.

So it seems to me that the vectorizer should instead emit a
VEC_PERM_EXPR (making sure the backends or the generic
vec_perm expansion code in optabs.c handles the whole-vector-shift
case in an optimal way).
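
To illustrate the point, here is a minimal Python sketch (not GCC code) of how a shift by a whole number of elements can be written as a two-input permute in the style of VEC_PERM_EXPR. The helper names `vec_perm` and `shift_towards_element0` are made up for this sketch, and it assumes that zeros are shifted in and that data moves towards element 0:

```python
def vec_perm(v0, v1, sel):
    """Model of a two-input permute: each index in sel picks from v0 ++ v1."""
    n = len(v0)
    assert len(v1) == n and len(sel) == n
    both = list(v0) + list(v1)
    # Selector indices are taken modulo 2*n, as for VEC_PERM_EXPR.
    return [both[i % (2 * n)] for i in sel]


def shift_towards_element0(x, k):
    """Shift x by k whole elements towards element 0, filling with zeros
    (the fill value is an assumption of this sketch)."""
    n = len(x)
    assert 0 <= k <= n
    zeros = [0] * n
    sel = [i + k for i in range(n)]  # indices >= n select from zeros
    return vec_perm(x, zeros, sel)


print(shift_towards_element0([10, 20, 30, 40], 1))  # [20, 30, 40, 0]
```

A backend that recognises selectors of the form {k, k+1, ...} with a constant second operand could then map them to its vector shift instruction.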

The current VEC_RSHIFT_EXPR description lacks information
on what is shifted in btw (always zeros? the most significant bit (endian
dependent?!)).

So - can we instead remove VEC_[LR]SHIFT_EXPR?  Seems that
VEC_LSHIFT_EXPR is unused anyway, and thus vec_shl_optab
as well.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-cfg.c (verify_gimple_assign_binary): for VEC_RSHIFT_EXPR (and
> VEC_LSHIFT_EXPR), require shifts to be by a whole number of elements
> for all types, rather than only non-integral types.
>
> * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): Update comment.
>
> * doc/md.texi (vec_shl_m, vec_shr_m): Update comment.
>


Re: [PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:45 PM, Alan Lawrence  wrote:
> The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR /
> vec_shr_optab to always shift the vector towards gcc's element 0. This fixes
> aarch64_be to do that.
>
> check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes
> all regressions produced by previous patch, i.e. no regressions from before
> redefining vec_shr).

Using vector permutes would have avoided this I guess?

Richard.

>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (vec_shr_ *2): Fix bigendian.
>
>


Re: [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:35 PM, Alan Lawrence  wrote:
> The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to
> have been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR
> and vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by
> the vectorizer, VEC_LSHIFT_EXPR expressions are not generated at all, so
> there seems little point in maintaining it.
>
> Bootstrapped on x86_64-unknown-linux-gnu.
> aarch64.exp+vect.exp on aarch64-none-elf and aarch64_be-none-elf.

Ah, there it is ;)

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * expr.c (expand_expr_real_2): Remove code handling VEC_LSHIFT_EXPR.
> * fold-const.c (const_binop): Likewise.
> * cfgexpand.c (expand_debug_expr): Likewise.
> * tree-inline.c (estimate_operator_cost, dump_generic_node,
> op_code_prio, op_symbol_code): Likewise.
> * tree-vect-generic.c (expand_vector_operations_1): Likewise.
> * optabs.c (optab_for_tree_code): Likewise.
> (expand_vec_shift_expr): Likewise, update comment.
> * tree.def: Delete VEC_LSHIFT_EXPR, remove comment.
> * optabs.h (expand_vec_shift_expr): Remove comment re.
> VEC_LSHIFT_EXPR.
> * optabs.def: Remove vec_shl_optab.
> * doc/md.texi: Remove references to vec_shl_m.


Re: [PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:48 PM, Alan Lawrence  wrote:
> Following earlier patches, vect_create_epilog_for_reduction contains exactly
> one case where extract_scalar_result==true. Hence, move the code 'if
> (extract_scalar_result)' there, and tidy-up/remove some variables.
>
> bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++.

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Move code for
> 'if (extract_scalar_result)' to the only place that it is true.


Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 2:42 PM, Alan Lawrence  wrote:
> The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
> general principles of tree. This patch updates fold-const and the vectorizer
> (the only place where such expressions are created), such that
> VEC_RSHIFT_EXPR always shifts towards element 0.
>
> The tree code still maps directly onto the vec_shr_optab, and so this patch
> *will break any bigendian platform defining the vec_shr optab*.
> --> For AArch64_be, patch follows next in series;
> --> For PowerPC, I think patch/rfc 15 should fix, please inspect;
> --> For MIPS, I think patch/rfc 16 should fix, please inspect.
>
> gcc/ChangeLog:
>
> * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
> element 0.
>
> * tree-vect-loop.c (vect_create_epilog_for_reduction): always extract
> the result of a reduction with vector shifts from element 0.
>
> * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift
> direction.
>
> * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.
>
> Testing Done:
>
> Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on
> aarch64-none-elf.

As said elsewhere I'd like the vectorizer to use VEC_PERM_EXPRs
and have the generic vec_perm expansion machinery handle the
case where the permute can be expressed using the vec_shr_optab.
You'd have, for a 1-element shift of V4SI x, VEC_PERM 

I'd say that if the target says it can handle the constant permute just fine
then use the vec_perm_const expansion path.

Richard.


Re: Fix i386 FP_TRAPPING_EXCEPTIONS

2014-09-22 Thread Joseph S. Myers
On Fri, 19 Sep 2014, Joseph S. Myers wrote:

> On Thu, 18 Sep 2014, Joseph S. Myers wrote:
> 
> > On Thu, 18 Sep 2014, Uros Bizjak wrote:
> > 
> > > OK for mainline and release branches.
> > 
> > I've omitted ia64 from the targets in the testcase in the release branch 
> > version, given the lack of any definition of FP_TRAPPING_EXCEPTIONS at all 
> > there.
> > 
> > (I think a definition as (~_fcw & 0x3f) should work for ia64, but haven't 
> > tested that.)
> 
> Here is an *untested* patch with that definition.
> 
> 2014-09-19  Joseph Myers  
> 
>   PR target/63312
>   * config/ia64/sfp-machine.h (FE_EX_ALL, FP_TRAPPING_EXCEPTIONS):
>   New macros.

Now committed after Andreas's testing, reported in PR 63312.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-22 Thread Alan Lawrence

Well, I haven't looked into this in detail: I've gone only as far as
  * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad' 
compiles (r214043), finding that the critical difference is in the emit-rtl.o 
generated by r214043;
  * looking at the relocations in the 'bad' emit-rtl.o, seeing new entries 
'fixed_regs + ', and that Richard Biener's changelog specifically 
mentions stripping signedness changes (and introduces the SIGN_NOPS).


However, I applied your patch (minus the hunk adding the (set_attr "type" "load1"), 
which appears to have gone in already), and still see the same error message:


emit-rtl.o: In function `gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

collect2: error: ld returned 1 exit status

and still see the same (suspicious-looking, although perhaps not convicted) 
relocations:


$ readelf --relocs benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep fixed_regs

12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 
12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs + 
1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 
1828  005d0116 R_AARCH64_LDST8_A  fixed_regs + 
186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
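
For context on the "relocation truncated to fit" message, here is a small Python sketch of the range check that the AArch64 ELF ABI attaches to R_AARCH64_ADR_PREL_PG_HI21: the page-relative value X = Page(S+A) - Page(P) must satisfy -2^32 <= X < 2^32. The function names and example addresses are hypothetical; the real addends in the listing above were lost and are not reconstructed here.

```python
PAGE_MASK = ~0xFFF  # ADRP works on 4 KiB pages


def page(addr):
    """Address of the 4 KiB page containing addr."""
    return addr & PAGE_MASK


def adrp_reloc_fits(sym, addend, place):
    """True if Page(S+A) - Page(P) is within the ADRP range of +/-4 GiB."""
    x = page(sym + addend) - page(place)
    return -(1 << 32) <= x < (1 << 32)


# Hypothetical addresses, for illustration only.
print(adrp_reloc_fits(0x500000, 0, 0x4012F8))        # True
print(adrp_reloc_fits(0x500000, 1 << 40, 0x4012F8))  # False
```

So a symbol that is close by can still fail the check if the addend folded into the relocation is very large, which is one way to get this linker error.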

I've also now bootstrapped my patch (STRIP_NOPS -> STRIP_SIGN_NOPS * 2) on 
aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no 
regressions, so would like to propose that patch for trunk...?


--Alan



Andrew Pinski wrote:

On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence  wrote:

We've been seeing errors using aarch64-none-linux-gnu gcc to build the
403.gcc benchmark from spec2k6, that we've traced back to this patch. The
error looks like:

/home/alalaw01/bootstrap_richie/gcc/xgcc
-B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
-DSPEC_CPU_LP64 alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o
conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o
insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o
ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o
getpwd.o ucbqsort.o -lm -o gcc
emit-rtl.o: In function `gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
section in regclass.o
emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
section in regclass.o
collect2: error: ld returned 1 exit status
specmake: *** [gcc] Error 1
Error with make 'specmake -j7 build': check file
'/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err'
  Command returned exit code 2
  Error with make!
*** Error building 403.gcc

Inspecting the compiled emit-rtl.o shows:

$ readelf --relocs good/emit-rtl.o | grep fixed_regs
12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac 005d0115 R_AARCH64_ADD_ABS 

[AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-09-22 Thread Alan Lawrence
Ok thanks Jeff. In that case I think I should draw this to the attention of the 
AArch64 maintainers to check the testsuite updates are OK before I commit...?


Methinks it may be possible to get further, or at least increase our confidence, 
if I "mock" out try_widen_shift_mode, and/or try injecting some dubious RTL from 
a builtin, although this'll only give a momentary snapshot of behaviour. I may 
or may not have time to look into this though ;)...


Cheers, Alan

Jeff Law wrote:

On 09/18/14 03:35, Alan Lawrence wrote:

> Moreover, I think we both agree that if result_mode==shift_mode, the
> transformation is correct. "Just" putting that check in, achieves
> what I'm trying for here, so I'd be happy to go with the attached
> patch and call it a day. However, I'm a little concerned about the
> other cases - i.e. where shift_mode is wider than result_mode.

Let's go ahead and get the attached patch installed.  I'm pretty sure 
it's correct and I know you want to see something move forward here.  We 
can iterate further if we want.



> If I understand correctly (and I'm not sure about that, let's see how
> far I get), this means we'll perform the shift in (say) DImode, when
> we're only really concerned about the lower (say) 32-bits (for an
> originally-SImode shift).

That's the class of cases I'm concerned about.


> try_widen_shift_mode will in this case
> check that the result of the operation *inside* the shift (in our
> case an XOR) has 33 sign bit copies (via num_sign_bit_copies), i.e.
> that its *top* 32-bits are all equal to the original SImode sign bit.
>  of these bits may be fed into the top of the desired SImode
> result by the DImode shift. Right so far?

Correct.


> AFAICT, num_sign_bit_copies for an XOR, conservatively returns the
> minimum of the num_sign_bit_copies of its two operands. I'm not sure
> whether this is behaviour we should rely on in its callers, or for
> the sake of abstraction we should treat num_sign_bit_copies as a
> black box (which does what it says on the, erm, tin).

Treat it as a black box.  It returns the number of known sign bit 
copies.  There may be more, but never less.
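
As a concrete illustration of that contract, here is a Python sketch on plain integers. It is not GCC's RTL-level num_sign_bit_copies, which only returns a lower bound derived from the expression; the hypothetical `sign_bit_copies` below counts the exact number of leading bits equal to the sign bit. The exhaustive check shows why the minimum over the two operands of an XOR is a safe lower bound:

```python
def sign_bit_copies(value, width):
    """Number of leading bits of a width-bit value equal to its sign bit
    (the sign bit itself counts, so the result is in 1..width)."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    copies = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        copies += 1
    return copies


WIDTH = 8
for a in range(1 << WIDTH):
    for b in range(1 << WIDTH):
        lower_bound = min(sign_bit_copies(a, WIDTH), sign_bit_copies(b, WIDTH))
        # If the top k bits of a and of b are each uniform, so are the
        # top k bits of a ^ b; there may be more copies, but never fewer.
        assert sign_bit_copies(a ^ b, WIDTH) >= lower_bound
```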




> If the former, doesn't having num_sign_bit_copies >= the difference
> in size between result_mode and shift_mode, of both operands to the
> XOR, guarantee safety of the commutation (whether the constant is
> positive or negative)? We could perform the shift (in the larger
> mode) on both of the XOR operands safely, then XOR together their
> lower parts.

I had convinced myself that when we flip the sign bit via the XOR and 
commute the XOR out that we invalidate the assumptions made when 
widening.  But I'm not so sure anymore.  Damn I hate changes made 
without suitable tests :(


I almost convinced myself the problem is in the adjustment of C2 in the 
widened case, but that's not a problem either.  At least not on paper.



> If, however, we want to play safe and ensure that we deal safely with
> any XOR whose top (mode size difference + 1) bits were the same,
> then I think the restriction that the XOR constant is positive is
> neither necessary nor sufficient; rather (mirroring
> try_widen_shift_mode) I think we need that num_sign_bit_copies of the
> constant in shift_mode, is more than the size difference between
> result_mode and shift_mode.

But isn't that the same?  Isn't the only case where it isn't the same 
when the constant has bits set that are outside the mode of the other 
operand?


Hmm, what about (xor:QI A -1)?  In that case -1 will be represented with 
bits outside the precision of QImode.



> Hmmm. I might try that patch at some point, I think it is the right
> check to make. (Meta-comment: this would be *so*much* easier if we
> could write unit tests more easily!) In the meantime I'd be happy to
> settle for the attached...

No argument on the unit testing comment.  It's a major failing in the 
design of GCC that we can't easily build a unit testing framework.


Jeff






Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 1:10 PM, Alan Lawrence  wrote:
> Well, I haven't looked into this in detail: I've gone only as far as
>   * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad'
> compiles (r214043), finding that the critical difference is in the
> emit-rtl.o generated by r214043;
>   * looking at the relocations in the 'bad' emit-rtl.o, seeing new entries
> 'fixed_regs + ', and that Richard Biener's changelog specifically
> mentions stripping signedness changes (and introduces the SIGN_NOPS).
>
> However, I applied your patch (minus the hunk adding the (set_attr "type"
> "load1"), which appears to have gone in already), and still see the same error
> message:
>
> emit-rtl.o: In function `gen_rtx_REG':
> emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
> section in regclass.o
> emit-rtl.o: In function `gen_rtx':
> emit-rtl.c:(.text+0x1824): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
> section in regclass.o
> collect2: error: ld returned 1 exit status
>
> and still see the same (suspicious-looking, although perhaps not convicted)
> relocations:
>
> $ readelf --relocs
> benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep
> fixed_regs
> 12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
> 12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
> 12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
> 12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs +
> 1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
> 1828  005d0116 R_AARCH64_LDST8_A  fixed_regs +
> 186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
> 1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
>
> I've also now bootstrapped my patch (STRIP_NOPS -> STRIP_SIGN_NOPS * 2) on
> aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no
> regressions, so would like to propose that patch for trunk...?

As Andrew said, it certainly isn't a "fix" for the bug but only a workaround.
That said, I don't think keeping the sign-change strip is important (it was
just cleanup of that routine while I was there).

So - the patch is ok for trunk.  You still may want to fix the bug
though ;)

(I'd say it makes more sense then to remove stripping conversions
entirely)

Thanks,
Richard.

> --Alan
>
>
>
>
> Andrew Pinski wrote:
>>
>> On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence 
>> wrote:
>>>
>>> We've been seeing errors using aarch64-none-linux-gnu gcc to build the
>>> 403.gcc benchmark from spec2k6, that we've traced back to this patch. The
>>> error looks like:
>>>
>>> /home/alalaw01/bootstrap_richie/gcc/xgcc
>>> -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
>>> -DSPEC_CPU_LP64 alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
>>> attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
>>> c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
>>> cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
>>> cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
>>> alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
>>> cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o
>>> combine.o
>>> conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
>>> diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
>>> emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
>>> function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
>>> hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o
>>> insn-extract.o
>>> insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
>>> jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
>>> params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
>>> reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
>>> resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
>>> sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
>>> ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
>>> tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o
>>> xcoffout.o
>>> ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
>>> xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o
>>> lbasename.o
>>> getpwd.o ucbqsort.o -lm -o gcc
>>> emit-rtl.o: In function `gen_rtx_REG':
>>> emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
>>> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
>>> section in regclass.o
>>> emit-rtl.o: In function `gen_rtx':

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-09-22 Thread Richard Biener
On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence  wrote:
> The end goal here is to remove this code from tree-vect-loop.c
> (vect_create_epilog_for_reduction):
>
>   if (BYTES_BIG_ENDIAN)
>     bitpos = size_binop (MULT_EXPR,
>                          bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
>                          TYPE_SIZE (scalar_type));
>   else
>
> as this is the root cause of PR/61114 (see testcase there, failing on all
> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
> suspicious". The code snippet above is used on two paths:
>
> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
> reduc_[us](plus|min|max)_optab.
> The optab is documented as "the scalar result is stored in the least
> significant bits of operand 0", but the tree code as "the first element in
> the vector holding the result of the reduction of all elements of the
> operand". This mismatch means that when the tree code is folded, the code
> snippet above reads the result from the wrong end of the vector.
>
> The strategy (as per
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
> tree codes and optabs that produce scalar results directly; this seems
> better than tying (the element of the vector into which the result is
> placed) to (the endianness of the target), and avoids generating extra moves
> on current bigendian targets. However, the previous optabs are retained for
> now as a migration strategy so as not to break existing backends; moving
> individual platforms over will follow.
>
> A complication here is on AArch64, where we directly generate
> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
> remove this folding in order to decouple the midend and AArch64 backend.

Sounds fine.  I hope we can transition all backends for 5.0 and remove
the vector variant optabs (maybe renaming the scalar ones).

> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
> is defined in an endianness-dependent way, leading to significant
> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is
> never used!). Few platforms appear to handle vec_shr_optab (and fewer
> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
> the existing optab to be endianness-neutral.
>
> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
> updates that implementation to fit the new endianness-neutral specification,
> serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
> equivalents for MIPS and PowerPC; I haven't tested these but hope they act
> as useful pointers for the port maintainers.
>
> Finally patch 14 cleans up the affected part of tree-vect-loop.c
> (vect_create_epilog_for_reduction).

As said during the individual patches review I'd like the vectorizer to
use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
only whole-element amounts).  This means we can remove
VEC_RSHIFT_EXPR.  It also means that if the backend defines
vec_perm_const (which it really should) it can handle the special
permutes that boil down to a possibly more efficient vector shift
there (a good optimization anyway).  Until it does that all backends
would at least create correct code (with the endian dependent
vec_shr removed).

Richard.

> --Alan
>


Re: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Richard Sandiford
Ajit Kumar Agarwal  writes:
> Jeff Law  writes:
>> On 09/19/14 01:23, Richard Sandiford wrote:
>>> Jeff Law  writes:
>>>> On 09/18/14 04:07, Richard Sandiford wrote:
>>>>> This series is a cleaned-up version of:
>>>>>
>>>>>   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>>>>>
>>>>> The underlying problem is that the semantics of subregs depend on
>>>>> the word size.  You can't have a subreg for byte 2 of a 4-byte
>>>>> word, say, but you can have a subreg for word 2 of a 4-word value
>>>>> (as well as lowpart subregs of that word, etc.).  This causes
>>>>> problems when an architecture has wider-than-word registers, since
>>>>> the addressability of a word can then depend on which register
>>>>> class is used.
>>>>>
>>>>> The register allocators need to fix up cases where a subreg turns
>>>>> out to be invalid for a particular class.  This is really an
>>>>> extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.
>>>>>
>>>>> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
>>>> I thought we fixed these problems long ago with the change to
>>>> subreg_byte?!?
>>>
>>> No, that was fixing something else.  (I'm just about old enough to 
>>> remember that too!)  The problem here is that (say):
>>>
>>>  (subreg:SI (reg:DI X) 4)
>>>
>>> is independently addressable on little-endian AArch32 if X is assigned 
>>> to a GPR, but not if X is assigned to a vector register.  We need to 
>>> allow these kinds of subreg on pseudos in order to decompose 
>>> multiword arithmetic.  It's then up to the RA to realise that a 
>>> reload would be needed if X were assigned to a vector register, since 
>>> the upper half of a vector register cannot be independently accessed.
>>>
>>> Note that you could write this example even with the old word-style 
>>> offsets and IIRC the effect would have been the same.
>> OK.  So I kept thinking in terms of the byte offset stuff.  But what 
>> you're tackling is related to the mess around the mode of the subreg 
>> having a different meaning if it's smaller than a word vs word-sized or 
>> greater.
>>
>> Right?
>
>>>Yeah, that's right.  Addressability is based on words, which is
>>> inconvenient when your registers are bigger than a word.
>
> Consider an architecture like MicroBlaze, which doesn't support 1-byte or
> 2-byte registers. In this scenario, what should be returned when
> SUBREG_WORD is used?

I don't understand the question, sorry.  Subreg offsets are still represented
as bytes rather than words.  The patch doesn't change the way that subregs
are represented or the rules about which subregs are valid.

Both before and after the patch, the semantics of subregs say that if
you have 4-byte words, only one of:

(subreg:QI (reg:SI X) 0)
(subreg:QI (reg:SI X) 1)
(subreg:QI (reg:SI X) 2)
(subreg:QI (reg:SI X) 3)

is ever valid (0 for little-endian, 3 for big-endian).  Writing to that
one valid subreg will change the whole of X, unless the subreg is wrapped
in a strict_lowpart.  In other words, subregs are defined so that individual
parts of a word are not independently addressable.

However, individual words of a multiword register _are_ addressable.  I.e.:

   (subreg:SI (reg:DI Y) 0)
   (subreg:SI (reg:DI Y) 4)

are both valid.  Writing to one does not change the other.

The problem the patch was trying to solve was that you can have targets
with 4-byte words but some 8-byte registers.  In those cases, it's still
possible to form both of the Y subregs above if Y is allocated to a word-sized
register, but not if Y is allocated to a doubleword-sized register.
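
The rules described above can be modelled with a small Python sketch. This is only an illustration of the semantics as stated in this message, not GCC's validate_subreg: the function name and parameters are made up, paradoxical subregs are not covered, and register classes are ignored entirely (so it describes what is valid on a pseudo, before allocation).

```python
def subreg_offset_ok(outer_size, inner_size, offset, word_size=4,
                     big_endian=False):
    """Is (subreg:OUTER (reg:INNER X) offset) valid for a pseudo X?
    Sizes and offset are in bytes."""
    if offset < 0 or offset + outer_size > inner_size:
        return False
    if outer_size >= word_size:
        # Whole words of a multiword value are independently addressable.
        return offset % word_size == 0 and outer_size % word_size == 0
    # Smaller than a word: only the lowpart of the containing word is valid.
    word_start = offset - offset % word_size
    in_word = min(word_size, inner_size - word_start)
    lowpart = word_start + (in_word - outer_size if big_endian else 0)
    return offset == lowpart


print([o for o in range(4) if subreg_offset_ok(1, 4, o)])                   # [0]
print([o for o in range(4) if subreg_offset_ok(1, 4, o, big_endian=True)])  # [3]
print([o for o in range(8) if subreg_offset_ok(4, 8, o)])                   # [0, 4]
```

What the patch series adds is the step this sketch leaves out: after allocation, the offset-4 case has to be rechecked against the class that Y actually ended up in.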

Thanks,
Richard



Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 1:21 PM, Richard Biener
 wrote:
> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence  wrote:
>> The end goal here is to remove this code from tree-vect-loop.c
>> (vect_create_epilog_for_reduction):
>>
>>   if (BYTES_BIG_ENDIAN)
>>     bitpos = size_binop (MULT_EXPR,
>>                          bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
>>                          TYPE_SIZE (scalar_type));
>>   else
>>
>> as this is the root cause of PR/61114 (see testcase there, failing on all
>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
>> suspicious". The code snippet above is used on two paths:
>>
>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
>> reduc_[us](plus|min|max)_optab.
>> The optab is documented as "the scalar result is stored in the least
>> significant bits of operand 0", but the tree code as "the first element in
>> the vector holding the result of the reduction of all elements of the
>> operand". This mismatch means that when the tree code is folded, the code
>> snippet above reads the result from the wrong end of the vector.
>>
>> The strategy (as per
>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
>> tree codes and optabs that produce scalar results directly; this seems
>> better than tying (the element of the vector into which the result is
>> placed) to (the endianness of the target), and avoids generating extra moves
>> on current bigendian targets. However, the previous optabs are retained for
>> now as a migration strategy so as not to break existing backends; moving
>> individual platforms over will follow.
>>
>> A complication here is on AArch64, where we directly generate
>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
>> remove this folding in order to decouple the midend and AArch64 backend.
>
> Sounds fine.  I hope we can transition all backends for 5.0 and remove
> the vector variant optabs (maybe renaming the scalar ones).
>
>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
>> is defined in an endianness-dependent way, leading to significant
>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is
>> never used!). Few platforms appear to handle vec_shr_optab (and fewer
>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
>> the existing optab to be endianness-neutral.
>>
>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
>> updates that implementation to fit the new endianness-neutral specification,
>> serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
>> equivalents for MIPS and PowerPC; I haven't tested these but hope they act
>> as useful pointers for the port maintainers.
>>
>> Finally patch 14 cleans up the affected part of tree-vect-loop.c
>> (vect_create_epilog_for_reduction).
>
> As said during the individual patches review I'd like the vectorizer to
> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
> only whole-element amounts).  This means we can remove
> VEC_RSHIFT_EXPR.  It also means that if the backend defines
> vec_perm_const (which it really should) it can handle the special
> permutes that boil down to a possibly more efficient vector shift
> there (a good optimization anyway).  Until it does that all backends
> would at least create correct code (with the endian dependent
> vec_shr removed).

It seems only Alpha completely lacks vec_perm_const but implements
vec_shr.

Richard.

> Richard.
>
>> --Alan
>>


Re: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Richard Sandiford
Andrew Pinski  writes:
> On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford
>  wrote:
>> This series is a cleaned-up version of:
>>
>> https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>>
>> The underlying problem is that the semantics of subregs depend on the
>> word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
>> but you can have a subreg for word 2 of a 4-word value (as well as lowpart
>> subregs of that word, etc.).  This causes problems when an architecture has
>> wider-than-word registers, since the addressability of a word can then depend
>> on which register class is used.
>>
>> The register allocators need to fix up cases where a subreg turns out to
>> be invalid for a particular class.  This is really an extension of what
>> we need to do for CANNOT_CHANGE_MODE_CLASS.
>>
>> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
>
>
> This sounds like something which should be tested on spu as it is the
> main target that I can think of which has wider-than-word registers
> and that has had issues with subreg.  I can't remember if the
> simulator for SPU is free (as in beer) and would run on anything
> besides PowerPC.  It has been more than 4 years since I looked into
> the spu back-end also.

Well, AArch64 and x86_64 should be good enough targets for testing
the patch.  In the AArch64 case the bug was holding up other big-endian
fixes, in the x86_64 case it led to a workaround in C_C_M_C.

Thanks,
Richard



Re: [PATCH] Fix ubsan ICE on invalid (PR sanitizer/61272)

2014-09-22 Thread Marek Polacek
On Wed, May 21, 2014 at 08:58:57PM +0200, Jakub Jelinek wrote:
> On Wed, May 21, 2014 at 08:46:22PM +0200, Marek Polacek wrote:
> > 2014-05-21  Marek Polacek  
> > 
> > PR sanitizer/61272
> > * ubsan.c (is_ubsan_builtin_p): Turn assert into a condition.
> > 
> > * g++.dg/ubsan/pr61272.C: New test.
> 
> Ok, thanks.

Now committed to 4.9 branch as well to fix PR63323.

Marek


RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal


-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Monday, September 22, 2014 4:56 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Ajit Kumar Agarwal  writes:
> Jeff Law  writes:
>> On 09/19/14 01:23, Richard Sandiford wrote:
>>> Jeff Law  writes:
>>>> On 09/18/14 04:07, Richard Sandiford wrote:
>>>>> This series is a cleaned-up version of:
>>>>>
>>>>>   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html
>>>>>
>>>>> The underlying problem is that the semantics of subregs depend on
>>>>> the word size.  You can't have a subreg for byte 2 of a 4-byte
>>>>> word, say, but you can have a subreg for word 2 of a 4-word value
>>>>> (as well as lowpart subregs of that word, etc.).  This causes
>>>>> problems when an architecture has wider-than-word registers, since
>>>>> the addressability of a word can then depend on which register
>>>>> class is used.
>>>>>
>>>>> The register allocators need to fix up cases where a subreg turns
>>>>> out to be invalid for a particular class.  This is really an
>>>>> extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.
>>>>>
>>>>> Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
>>>> I thought we fixed these problems long ago with the change to
>>>> subreg_byte?!?
>>>
>>> No, that was fixing something else.  (I'm just about old enough to 
>>> remember that too!)  The problem here is that (say):
>>>
>>>  (subreg:SI (reg:DI X) 4)
>>>
>>> is independently addressable on little-endian AArch32 if X assigned 
>>> to a GPR, but not if X is assigned to a vector register.  We need to 
>>> allow these kinds of subreg on pseudos in order to decompose 
>>> multiword arithmetic.  It's then up to the RA to realise that a 
>>> reload would be needed if X were assigned to a vector register, 
>>> since the upper half of a vector register cannot be independently accessed.
>>>
>>> Note that you could write this example even with the old word-style 
>>> offsets and IIRC the effect would have been the same.
>> OK.  So I kept thinking in terms of the byte offset stuff.  But what 
>> you're tackling is related to the mess around the mode of the subreg 
>> having a different meaning if its smaller than a word vs word-sized 
>> or greater.
>>
>> Right?
>
>>>Yeah, that's right.  Addressability is based on words, which is  
>>>inconvenient when your registers are bigger than a word.
>
> For an architecture like Microblaze, which doesn't support 1-byte or
> 2-byte registers, what should be returned when SUBREG_WORD is used?

>>I don't understand the question, sorry.  Subreg offsets are still represented
>>as bytes rather than words.  The patch doesn't change the way that subregs
>>are represented or the rules about which subregs are valid.

>>Both before and after the patch, the semantics of subregs say that if you
>>have 4-byte words, only one of:

>>(subreg:QI (reg:SI X) 0)
>>(subreg:QI (reg:SI X) 1)
>>(subreg:QI (reg:SI X) 2)
>>(subreg:QI (reg:SI X) 3)

>>is ever valid (0 for little-endian, 3 for big-endian).  Writing to that one
>>valid subreg will change the whole of X, unless the subreg is wrapped in a
>>strict_lowpart.  In other words, subregs are defined so that individual
>>parts of a word are not independently addressable.

>>However, individual words of a multiword register _are_ addressable.  I.e.:

>>(subreg:SI (reg:DI Y) 0)
>>(subreg:SI (reg:DI Y) 4)

>>are both valid.  Writing to one does not change the other.

>>The problem the patch was trying to solve was that you can have targets with
>>4-byte words but some 8-byte registers.  In those cases, it's still possible
>>to form both of the Y subregs above if Y is allocated to a word-sized
>>register, but not if Y is allocated to a doubleword-sized register.
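
The rule quoted above can be illustrated with a toy model.  This is only a sketch under stated assumptions: the function name `addressable` and its parameters are hypothetical and are not part of GCC, and it models just the "whole unit or lowpart of a unit" rule, not everything GCC checks when validating a subreg.  All sizes and offsets are in bytes.

```cpp
#include <cassert>

// Toy model (hypothetical, not GCC code): can a piece of `outer_size` bytes
// at byte `offset` inside a value of `inner_size` bytes be accessed on its
// own, when the independently addressable unit is `unit` bytes wide?
// Pass the word size as `unit` to model the subreg semantics, or the size
// of the hard register holding the value to model the allocation problem.
inline bool addressable(unsigned outer_size, unsigned inner_size,
                        unsigned offset, unsigned unit, bool big_endian)
{
  if (offset + outer_size > inner_size)
    return false;                 // piece must lie inside the value
  if (offset % outer_size != 0)
    return false;                 // piece must be naturally aligned
  if (outer_size >= unit)
    return offset % unit == 0;    // whole units are addressable
  // Smaller than a unit: only the lowpart of the containing unit.
  unsigned within = offset % unit;
  unsigned lowpart = big_endian ? unit - outer_size : 0;
  return within == lowpart;
}
```

With 4-byte words, `addressable(1, 4, 2, 4, false)` is false (byte 2 of a word), while `addressable(4, 8, 4, 4, false)` is true (word 1 of a doubleword).  Passing 8 as the unit, to stand for a doubleword-sized register, makes the second query false, which is the case the register allocator has to fix up.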

Thanks Richard for the explanation. 

Thanks,
Richard



[PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim

2014-09-22 Thread Peter A. Bigot
Based on discussion on the mspgcc-users mailing list[1], this patch
changes msp430 to not automatically apply -lnosys when -msim is absent,
as this prevents a user from supplying a custom system interface.

The existing behavior providing the CIO alternative can be obtained by
explicitly linking -lcio instead, assuming the corresponding newlib
patch[2] to move msp430's nosys implementation to libcio.a is also used.

gcc/ChangeLog
2014-09-22  Peter A. Bigot  

* config/msp430/msp430.h: Remove automatic -lnosys when -msim absent.

[1] http://www.mail-archive.com/mspgcc-users@lists.sourceforge.net/msg12104.html
[2] https://sourceware.org/ml/newlib/2014/msg00465.html

Cc: d...@redhat.com
Cc: ni...@redhat.com
Signed-off-by: Peter A. Bigot 
---
 gcc/config/msp430/msp430.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 91fc91c..068bdad 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -70,7 +70,6 @@ extern bool msp430x;
 -lgcc  \
 -lcrt  \
 %{msim:-lsim}  \
-%{!msim:-lnosys}   \
 --end-group\
 %{!T*:%{!msim:%{mmcu=*:--script=%*.ld}}}   \
 %{!T*:%{!msim:%{!mmcu=*:%Tmsp430.ld}}} \
-- 
1.8.5.5



[PATCH] Fix missing gimplification of vector constructors

2014-09-22 Thread Richard Biener

The following fixes non-GIMPLE constructors slipping through the
gimplifier.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-09-22  Richard Biener  

* gimplify.c (gimplify_init_constructor): Do not leave
non-GIMPLE vector constructors around.
* tree-cfg.c (verify_gimple_assign_single): Verify that
CONSTRUCTORs have gimple elements.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 215450)
+++ gcc/gimplify.c  (working copy)
@@ -4021,12 +4021,6 @@ gimplify_init_constructor (tree *expr_p,
break;
  }
 
-   /* Don't reduce an initializer constant even if we can't
-  make a VECTOR_CST.  It won't do anything for us, and it'll
-  prevent us from representing it as a single constant.  */
-   if (initializer_constant_valid_p (ctor, type))
- break;
-
TREE_CONSTANT (ctor) = 0;
  }
 
Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 215450)
+++ gcc/tree-cfg.c  (working copy)
@@ -4207,8 +4233,20 @@ verify_gimple_assign_single (gimple stmt
  debug_generic_stmt (rhs1);
  return true;
}
+ if (!is_gimple_val (elt_v))
+   {
+ error ("vector CONSTRUCTOR element is not a GIMPLE value");
+ debug_generic_stmt (rhs1);
+ return true;
+   }
}
}
+  else if (CONSTRUCTOR_NELTS (rhs1) != 0)
+   {
+ error ("non-vector CONSTRUCTOR with elements");
+ debug_generic_stmt (rhs1);
+ return true;
+   }
   return res;
 case OBJ_TYPE_REF:
 case ASSERT_EXPR:



Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-22 Thread Alan Lawrence

Richard Biener wrote:


Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.


Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic*** 
with void *. (This testcase otherwise fails the verify_gimple_assign_unary check 
in tree-cfg.c .)   However, test passes also with your suggestion of NOP_EXPR so 
that's good by me.


***that is, computes the minimum

--Alan



Ok with that change.

Thanks,
Richard.


Testing:
x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
aarch64-none-linux-gnu: bootstrap
aarch64-none-elf:  check-gcc, check-g++
arm-none-eabi: check-gcc

aarch64_be-none-elf: check-gcc, showing
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
Passes the (previously-failing) reduced testcase on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

Have also assembler/stage-1 tested that testcase on PowerPC, also
fixed.



gcc/ChangeLog:

* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
extract_bit_field around optab result.

* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce
scalar not vector.

* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type
for REDUC_{MIN,MAX,PLUS}_EXPR.

* tree-vect-loop.c (vect_analyze_loop): Update comment.
(vect_create_epilog_for_reduction): For direct vector reduction, use
result of tree code directly without extract_bit_field.

* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
comment.







Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-22 Thread Alan Lawrence

Richard Biener wrote:


scalar_reduc_to_vector misses a comment.


Ok to reuse the comment in optabs.h in optabs.c also?


I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.


Yes, that sounds like a plan, the _scal is a bit of a mouthful.


The optabs have only one mode - I wouldn't be surprised if an ISA
invents for example v4si -> di reduction?  So do we want to make
reduc_plus_scal_optab a little bit more future proof (maybe there
is already an ISA that supports this kind of reduction?).


That sounds like a plausible thing for an ISA to do, indeed. However given these 
names are only used by the autovectorizer rather than directly, the question is 
what the corresponding source code looks like, and/or what changes to the 
autovectorizer we might have to make to (look for code to) exploit such an 
instruction. At this point I could go for a 
reduc_{plus,min_max}_scal_ which reduces from the first vector mode 
to the second scalar mode, and then make the vectorizer look only for cases 
where the second mode was the element type of the first; but I'm not sure I want 
to do anything more complicated than that at this stage. (However, indeed it 
would leave the possibility open for the future.)
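
As a rough illustration of the question about "what the corresponding source code looks like", a widening reduction of the v4si -> di kind would plausibly come from a loop that accumulates 32-bit elements into a 64-bit sum.  The function below is a hypothetical sketch, not taken from any testcase in this thread, and says nothing about whether the vectorizer would actually pick such an instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical source pattern: each 32-bit element is widened to 64 bits
// before being added, so the scalar result is wider than the element type.
inline std::int64_t widening_sum(const std::vector<std::int32_t> &a)
{
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i];   // implicit int32_t -> int64_t conversion
  return sum;
}
```

By contrast, the reductions discussed in this series have a result whose type is the element type of the vector.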


--Alan



Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-22 Thread Bill Schmidt
On Thu, 2014-09-18 at 09:12 -0400, David Edelsohn wrote:
> On Thu, Sep 18, 2014 at 8:42 AM, Alan Lawrence  wrote:
> > The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
> > general principles of tree. This patch updates fold-const and the vectorizer
> > (the only place where such expressions are created), such that
> > VEC_RSHIFT_EXPR always shifts towards element 0.
> >
> > The tree code still maps directly onto the vec_shr_optab, and so this patch
> > *will break any bigendian platform defining the vec_shr optab*.
> > --> For AArch64_be, patch follows next in series;
> > --> For PowerPC, I think patch/rfc 15 should fix, please inspect;
> > --> For MIPS, I think patch/rfc 16 should fix, please inspect.
> >
> > gcc/ChangeLog:
> >
> > * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
> > element 0.
> >
> > > * tree-vect-loop.c (vect_create_epilog_for_reduction): Always extract
> > > the result of a reduction with vector shifts from element 0.
> > >
> > > * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction.
> >
> > * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.
> >
> > Testing Done:
> >
> > Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on
> > aarch64-none-elf.
> 
> Why wasn't this tested on the PowerLinux system in the GCC Compile Farm?
> 
> Also, Bill Schmidt can help check the PPC parts of the patches.

Sorry for the late response; I just returned from vacation.  I think
that patch 15 looks reasonable on the surface, but would be more
comfortable if it had been tested.  I would echo David's suggestion that
you please test this on gcc110 in the compile farm to avoid surprises.
Given the similarity between vec_shl_<mode> and vec_shr_<mode> I am ok
with removing the former; it won't be difficult to re-create it later if
needed.

Please add some of the language you used above about VEC_RSHIFT_EXPR as
commentary for vec_shr_<mode> in vector.md, as right-shifting towards
element zero is not an obvious concept on a BE machine.
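
To make "shifting towards element zero" concrete, here is a small scalar model of the endianness-neutral semantics and of a reduction epilogue built on it.  It is only a sketch: the names `shift_towards_element0` and `reduce_plus` are hypothetical, the element count is assumed to be a power of two, and nothing here is the vectorizer's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of a whole-vector shift by k elements: element i of the result is
// element i + k of the input, with zeros shifted in at the high-numbered
// end.  The definition refers only to element numbers, not to endianness.
template <std::size_t N>
std::array<int, N> shift_towards_element0(const std::array<int, N> &v,
                                          std::size_t k)
{
  std::array<int, N> r{};              // value-initialised to zeros
  for (std::size_t i = 0; i + k < N; ++i)
    r[i] = v[i + k];
  return r;
}

// Model of a sum-reduction epilogue: halve the shift amount each step and
// add, then read the result from element 0.  N must be a power of two.
template <std::size_t N>
int reduce_plus(std::array<int, N> v)
{
  for (std::size_t k = N / 2; k >= 1; k /= 2)
    {
      std::array<int, N> s = shift_towards_element0(v, k);
      for (std::size_t i = 0; i < N; ++i)
        v[i] += s[i];
    }
  return v[0];
}
```

For {1, 2, 3, 4} the first step adds {3, 4, 0, 0} and the second adds {6, 3, 4, 0}, leaving the total 10 in element 0 regardless of how the target numbers lanes in a register.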

Thanks,
Bill

> 
> Thanks, David
> 




[AArch64] Auto-generate the "BUILTIN_" macros for aarch64-builtins.c

2014-09-22 Thread James Greenhalgh

On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote:
> On 18/09/14 10:53, James Greenhalgh wrote:
> > +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: 
> > $(srcdir)/config/aarch64/geniterators.sh \
> > + $(srcdir)/config/aarch64/iterators.md
> > + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
> > + $(srcdir)/config/aarch64/iterators.md > \
> > + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h
> > +
> >  aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) 
> > $(SYSTEM_H) \
> >  coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H)
> >   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> >
>
> Is there any real need to write this into the source directory and have
> the built file checked in?  Ie. can't we always write to the build
> directory and use it from there.  That avoids problems if the sources
> are on a read-only filesystem.
>
> If we do need to leave it in the sources, then contrib/update_gcc should
> be taught how to touch the generated file when resyncing from the
> repositories.
>

I thought I had tried this and failed to make it work. I must not have
been trying hard enough at the time.

Updated as attached, generating the header in the build directory. It
looks much better this way!

Bootstrapped on aarch64-none-linux-gnueabi with no issues.

Ok?

Thanks,
James

---
gcc/

2014-09-22  James Greenhalgh  

* config/aarch64/geniterators.sh: New.
* config/aarch64/iterators.md (VDQF_DF): New.
* config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h.
* config/aarch64/aarch64-builtins.c (BUILTIN_*): Remove.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6a77d29..6b9c383 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -313,91 +313,7 @@ typedef struct
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
 
-/* BUILTIN_ macros should expand to cover the same range of
-   modes as is given for each define_mode_iterator in
-   config/aarch64/iterators.md.  */
-
-#define BUILTIN_DX(T, N, MAP) \
-  VAR2 (T, N, MAP, di, df)
-#define BUILTIN_GPF(T, N, MAP) \
-  VAR2 (T, N, MAP, sf, df)
-#define BUILTIN_SDQ_I(T, N, MAP) \
-  VAR4 (T, N, MAP, qi, hi, si, di)
-#define BUILTIN_SD_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, hi, si)
-#define BUILTIN_V2F(T, N, MAP) \
-  VAR2 (T, N, MAP, v2sf, v2df)
-#define BUILTIN_VALL(T, N, MAP) \
-  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df)
-#define BUILTIN_VALLDI(T, N, MAP) \
-  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df, di)
-#define BUILTIN_VALLDIF(T, N, MAP) \
-  VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df, di, df)
-#define BUILTIN_VB(T, N, MAP) \
-  VAR2 (T, N, MAP, v8qi, v16qi)
-#define BUILTIN_VD1(T, N, MAP) \
-  VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df)
-#define BUILTIN_VDC(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df)
-#define BUILTIN_VDIC(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VDN(T, N, MAP) \
-  VAR3 (T, N, MAP, v4hi, v2si, di)
-#define BUILTIN_VDQ(T, N, MAP) \
-  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
-#define BUILTIN_VDQF(T, N, MAP) \
-  VAR3 (T, N, MAP, v2sf, v4sf, v2df)
-#define BUILTIN_VDQF_DF(T, N, MAP) \
-  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
-#define BUILTIN_VDQH(T, N, MAP) \
-  VAR2 (T, N, MAP, v4hi, v8hi)
-#define BUILTIN_VDQHS(T, N, MAP) \
-  VAR4 (T, N, MAP, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQIF(T, N, MAP) \
-  VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df)
-#define BUILTIN_VDQM(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQV(T, N, MAP) \
-  VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si)
-#define BUILTIN_VDQQH(T, N, MAP) \
-  VAR4 (T, N, MAP, v8qi, v16qi, v4hi, v8hi)
-#define BUILTIN_VDQ_BHSI(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQ_I(T, N, MAP) \
-  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
-#define BUILTIN_VDW(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VD_BHSI(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VD_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, v4hi, v2si)
-#define BUILTIN_VQ(T, N, MAP) \
-  VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df)
-#define BUILTIN_VQN(T, N, MAP) \
-  VAR3 (T, N, MAP, v8hi, v4si, v2di)
-#define BUILTIN_VQW(T, N, MAP) \
-  VAR3 (T, N, MAP, v16qi, v8hi, v4si)
-#define BUILTIN_VQ_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, v8hi, v4si)
-#define BUILTIN_VQ_S(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VSDQ_HSI(T, N, MAP) \
-  VAR6 (T, N, MAP, v4hi, v8hi, v2si, v4si, hi, si)
-#define BUILTIN_VSDQ_I(T, N, MAP) \
-  VAR11 (T, N, MAP, v8qi, v1

[patch] Implement move semantics for iostreams

2014-09-22 Thread Jonathan Wakely

This adds move and swap functions to the iostream classes.

Although this is a pretty large patch, it's a pure addition that only
affects C++11 mode, and should have no effect on existing code because
it won't be moving or swapping streams.

I wanted to use C++14's std::exchange so I added std::__exchange to
<bits/move.h> and made std::exchange forward to that.
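
For readers unfamiliar with the helper, the sketch below shows a C++11-compatible function in the spirit of std::exchange, plus a small check that a string stream can be move-constructed.  The names `exchange_sketch`, `exchange_demo` and `move_and_read` are hypothetical and are not the libstdc++ implementation; the stream part assumes a standard library whose streams are movable, as the C++11 standard requires.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// C++11-compatible stand-in for C++14's std::exchange (hypothetical name):
// store new_value in obj and return the value obj held before.
template <typename T, typename U = T>
T exchange_sketch(T &obj, U &&new_value)
{
  T old = std::move(obj);
  obj = std::forward<U>(new_value);
  return old;
}

// Typical use in a move operation: take the source's value and leave a
// known "empty" value behind in one expression.
inline bool exchange_demo()
{
  int fd = 3;
  int old = exchange_sketch(fd, -1);
  return old == 3 && fd == -1;
}

// Move-construct a string stream and read from the new object; the
// buffered contents travel with the move.
inline std::string move_and_read()
{
  std::istringstream in("42 hello");
  std::istringstream moved(std::move(in));
  int n = 0;
  std::string word;
  moved >> n >> word;
  return std::to_string(n) + ":" + word;
}
```

The single-expression form is what makes such a helper convenient in member-initializer lists of move constructors.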

I needed to add a new constructor to basic_ostream that doesn't call
init(0), for basic_iostream's move constructor to use.  (I wonder why
our non-standard default constructors for basic_istream and
basic_ostream call init(nullptr), rather than doing nothing.  Derived
classes that want init(nullptr) to be called can do that by passing
nullptr to the standard basic_istream and basic_ostream constructors
taking a pointer, which would allow the default constructors to be
re-purposed to intentionally leave the object uninitialized).

To ensure that the explicit instantiations in the library include the
new functions I had to move several files from src/c++98 to src/c++11,
which makes the patch huge, so the new tests are in a separate,
gzipped file to keep this post below the mailing list size limits.

Tested x86_64-linux, committed to trunk.

commit ac4314b3a2451147601385e26f19bb95b84c69d0
Author: Jonathan Wakely 
Date:   Fri Sep 12 19:56:06 2014 +0100

Make streams movable and swappable.

	PR libstdc++/54316
	PR libstdc++/53626
	* config/abi/pre/gnu.ver: Add new exports.
	* config/io/basic_file_stdio.h (__basic_file): Support moving and
	swapping.
	* include/bits/basic_ios.h (basic_ios::move, basic_ios::swap):
	Likewise.
	* include/bits/ios_base.h (ios_base::_M_move, ios_base::_M_swap):
	Likewise.
	* include/bits/fstream.tcc (basic_filebuf): Likewise.
	* include/bits/move.h (__exchange): Define for C++11 mode.
	* include/ext/stdio_filebuf.h (stdio_filebuf): Support moving and
	swapping.
	* include/ext/stdio_sync_filebuf.h (stdio_sync_filebuf): Likewise.
	* include/std/fstream (basic_filebuf, basic_ifstream, basic_ofstream,
	basic_fstream): Likewise.
	* include/std/ios: Remove whitespace.
	* include/std/istream (basic_istream, basic_iostream): Support moving
	and swapping.
	* include/std/ostream (basic_ostream): Likewise.
	* include/std/sstream (basic_stringbuf, basic_istringstream,
	basic_ostringstream, basic_stringstream): Likewise.
	* include/std/streambuf (basic_streambuf): Do not default copy
	constructor and assignment on first declaration.
	* include/std/utility (exchange): Forward to __exchange.
	* testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New.
	* src/c++11/Makefile.am: Add stream-related files.
	* src/c++11/Makefile.in: Regenerate.
	* src/c++11/ext11-inst.cc (stdio_filebuf, stdio_sync_filebuf):
	New file for explicit instantiation definitions.
	* src/c++11/ios.cc: Move from src/c++98 to here.
	(ios_base::_M_move, ios_base::_M_swap): Define.
	* src/c++11/ios-inst.cc: Move from src/c++98 to here.
	* src/c++11/iostream-inst.cc: Likewise.
	* src/c++11/istream-inst.cc: Likewise.
	* src/c++11/ostream-inst.cc: Likewise.
	* src/c++11/sstream-inst.cc: Likewise.
	* src/c++11/streambuf-inst.cc: Likewise.
	* src/c++98/Makefile.am: Remove stream-related files.
	* src/c++98/Makefile.in: Regenerate.
	* src/c++98/ext-inst.cc (stdio_filebuf): Remove explicit
	instantiations.
	* src/c++98/misc-inst.cc (stdio_sync_filebuf): Likewise.
	* src/c++98/ios-inst.cc: Move to src/c++11/.
	* src/c++98/ios.cc: Move to src/c++11/.
	* src/c++98/iostream-inst.cc: Likewise.
	* src/c++98/istream-inst.cc: Likewise.
	* src/c++98/ostream-inst.cc: Likewise.
	* src/c++98/sstream-inst.cc: Likewise.
	* src/c++98/streambuf-inst.cc: Likewise.
	* testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New.
	* testsuite/27_io/basic_fstream/cons/move.cc: New.
	* testsuite/27_io/basic_fstream/assign/1.cc: New.
	* testsuite/27_io/basic_ifstream/cons/move.cc: New.
	* testsuite/27_io/basic_ifstream/assign/1.cc: New.
	* testsuite/27_io/basic_istringstream/assign/1.cc: New.
	* testsuite/27_io/basic_istringstream/cons/move.cc: New.
	* testsuite/27_io/basic_ofstream/cons/move.cc: New.
	* testsuite/27_io/basic_ofstream/assign/1.cc: New.
	* testsuite/27_io/basic_ostringstream/assign/1.cc: New.
	* testsuite/27_io/basic_ostringstream/cons/move.cc: New.
	* testsuite/27_io/basic_stringstream/assign/1.cc: New.
	* testsuite/27_io/basic_stringstream/cons/move.cc: New.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 41fac71..669e36d 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -989,7 +989,8 @@ GLIBCXX_3.4.10 {
 _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE6stosscEv;
 
 _ZN9__gnu_cxx18stdio_sync_filebufI[cw]St11char_traitsI[cw]EE4syncEv;
-_ZN9__gnu

Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-22 Thread Richard Biener
On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence  wrote:
> Richard Biener wrote:
>>
>>
>> scalar_reduc_to_vector misses a comment.
>
>
> Ok to reuse the comment in optabs.h in optabs.c also?

Sure.

>> I wonder if at the end we wouldn't transition all backends and then
>> renaming reduc_*_scal_optab back to reduc_*_optab makes sense.
>
>
> Yes, that sounds like a plan, the _scal is a bit of a mouthful.
>
>> The optabs have only one mode - I wouldn't be surprised if an ISA
>> invents for example v4si -> di reduction?  So do we want to make
>> reduc_plus_scal_optab a little bit more future proof (maybe there
>> is already an ISA that supports this kind of reduction?).
>
>
> That sounds like a plausible thing for an ISA to do, indeed. However given
> these names are only used by the autovectorizer rather than directly, the
> question is what the corresponding source code looks like, and/or what
> changes to the autovectorizer we might have to make to (look for code to)
> exploit such an instruction.

Ah, indeed.  Would be sth like a REDUC_WIDEN_SUM_EXPR or so.

> At this point I could go for a
> reduc_{plus,min_max}_scal_ which reduces from the first vector
> mode to the second scalar mode, and then make the vectorizer look only for
> cases where the second mode was the element type of the first; but I'm not
> sure I want to do anything more complicated than that at this stage.
> (However, indeed it would leave the possibility open for the future.)

Yeah, agreed.  For the min/max case a widen variant isn't useful anyway.

Thanks,
Richard.

> --Alan
>


Re: FW: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2014-09-22 Thread Thomas Schwinge
Hi!

On Tue, 27 Aug 2013 21:30:49 +, "Iyer, Balaji V"  
wrote:
> --- /dev/null
> +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c
> @@ -0,0 +1,37 @@
> +/* { dg-do run  { target { i?86-*-* x86_64-*-* arm*-*-* } } } */
> +/* { dg-options "-fcilkplus" } */
> +/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* arm*-*-* } } } */
> +
> +void f0(volatile int *steal_flag)
> +{ 
> +  int i = 0;
> +  /* Wait for steal_flag to be set */
> +  while (!*steal_flag) 
> +;
> +}
> +
> +int f1()
> +{
> +
> +  volatile int steal_flag = 0;
> +  _Cilk_spawn f0(&steal_flag);
> +  steal_flag = 1;  // Indicate stolen
> +  _Cilk_sync; 
> +  return 0;
> +}
> +
> +void f2(int q)
> +{
> +  q = 5;
> +}
> +
> +void f3()
> +{
> +   _Cilk_spawn f2(f1());
> +}
> +
> +int main()
> +{
> +  f3();
> +  return 0;
> +}

Is this really well-formed Cilk Plus code?  Running with CILK_NWORKERS=1,
or -- the equivalent -- in a system with just one CPU (as per
libcilkrts/runtime/os-unix.c:__cilkrts_hardware_cpu_count returning 1), I
see this test busy-loop as follows:

Breakpoint 1, __cilkrts_hardware_cpu_count () at 
../../../source/libcilkrts/runtime/os-unix.c:358
358 {
(gdb) return 1
Make __cilkrts_hardware_cpu_count return now? (y or n) y
#0  cilkg_get_user_settable_values () at 
../../../source/libcilkrts/runtime/global_state.cpp:385
385 CILK_ASSERT(hardware_cpu_count > 0);
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
f0 (steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
9 while (!*steal_flag) 
(gdb) info threads
  Id   Target Id Frame 
* 1Thread 0x77fcd780 (LWP 30816) "spawning_arg.ex" f0 
(steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
(gdb) list
4
5   void f0(volatile int *steal_flag)
6   { 
7 int i = 0;
8 /* Wait for steal_flag to be set */
9 while (!*steal_flag) 
10  ;
11  }
12
13  int f1()
(gdb) bt
#0  f0 (steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
#1  0x004009c8 in _cilk_spn_0 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17
#2  0x00400a4b in f1 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17
#3  0x00400d0e in _cilk_spn_1 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30
#4  0x00400d7a in f3 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30
#5  0x00400e33 in main () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:35

No additional thread has been spawned by libcilkrts, and the one initial
thread is stuck in f0, without being able to make progress.  Should in
f0's while loop, some function be called to "yield to libcilkrts
scheduler", or should libcilkrts have spawned an additional thread, or is
the test case just not valid Cilk Plus code?


Grüße,
 Thomas




Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin

2014-09-22 Thread Michael Matz
Hi,

On Fri, 19 Sep 2014, Aldy Hernandez wrote:

> Michael, I really don't understand the need for this change in your original
> patch.  I don't know if this was a temporary testing change or what.

I'm pretty sure it was temporary testing, when I still was finding my way 
through dwarf2out limitations/constraints.

> I'm happy to report that with this and the last set of patches, both C 
> and C++ guality tests have <= regressions than mainline.  Yay.

Super.


Ciao,
Michael.


[PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition

2014-09-22 Thread Felix Yang
Hi,

I find that update_equiv_regs in ira.c sets the wrong EQUIV
reg-note for a pseudo with more than one definition in certain situations.
Here is a simplified RTL snippet after this function finishes handling:

 (insn 77 37 78 2 (set (reg:SI 171)
 (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp}
  (expr_list:REG_EQUAL (const_int 0 [0])
 (nil)))

..

(insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64])
 (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp}
  (expr_list:REG_DEAD (reg:SI 171)
 (nil)))
(insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32])
 (reg:SI 163)) 52 {movsi_internal_dsp}
  (expr_list:REG_DEAD (reg:SI 163)
 (expr_list:REG_DEAD (reg/f:SI 162)
 (nil
(insn 54 53 14 2 (set (reg:SI 171)
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4  S4
A32])) ticket151.c:49 52 {movsi_internal_dsp}
  (expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI ("*.LC8")
[flags 0x2]) [4  S4 A32])
 (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI ("*.LC8")
[flags 0x2]) [4  S4 A32])
 (nil


The REG_EQUIV of insn 54 is not correct, as pseudo 171 is defined
in insn 77 with a different value.
This may cause reload to replace pseudo 171 with mem/u/c:SI
(symbol_ref/u:SI ("*.LC8")), which is wrong.
Here is a proposed patch for this issue; please comment:

 Index: gcc/ira.c
===
--- gcc/ira.c(revision 215460)
+++ gcc/ira.c(working copy)
@@ -3477,18 +3477,26 @@ update_equiv_regs (void)
   no_equiv (dest, set, NULL);
   continue;
 }
+
   /* Record this insn as initializing this register.  */
   reg_equiv[regno].init_insns
 = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns);

   /* If this register is known to be equal to a constant, record that
  it is always equivalent to the constant.  */
-  if (DF_REG_DEF_COUNT (regno) == 1
-  && note && ! rtx_varies_p (XEXP (note, 0), 0))
+  if (note && ! rtx_varies_p (XEXP (note, 0), 0))
 {
-  rtx note_value = XEXP (note, 0);
-  remove_note (insn, note);
-  set_unique_reg_note (insn, REG_EQUIV, note_value);
+  if (DF_REG_DEF_COUNT (regno) == 1)
+{
+  rtx note_value = XEXP (note, 0);
+  remove_note (insn, note);
+  set_unique_reg_note (insn, REG_EQUIV, note_value);
+}
+  else
+{
+  no_equiv (dest, set, NULL);
+  continue;
+}
 }

   /* If this insn introduces a "constant" register, decrease the priority


Cheers,
Felix


Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Michael Eager

On 09/21/14 21:10, Chen Gang wrote:

On 9/22/14 2:09, Michael Eager wrote:


Generally, you should use "gcc" to link programs, not "ld".  gcc is
a driver which will select the appropriate libraries and support routines
(such as crt0.o, which contains _start) and pass them to the linker.



OK, thanks.

When using gcc, the root directory is missing for "crt1.o" and "crtn.o":
e.g. gcc -v shows "/lib/ld.so.1", "crt1.o", "crtn.o", but we need
"/upstream/release/lib/ld.so.1", "/upstream/lib/crt1.o", "/upstream/libcrtn.o".


You likely need to build mb-gcc with --sysroot=/upstream.

How are you building gcc?  What are your configuration options?


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Michael Eager

On 09/21/14 20:55, Chen Gang wrote:



On 9/22/14 2:03, Michael Eager wrote:

On 09/20/14 23:24, Chen Gang wrote:

And it seems, we also need 'LinkScr.ld' for ldscript, could you share it
to me, thanks.

set_board_info ldscript "-T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld"


Hi Chen --

The DejaGNU configuration I provided is for a bare-metal environment.

If you are testing in a Linux environment, the toolchain you use
should provide a default linker script which matches your hardware's
memory layout. You should not need to provide a separate linker script.



OK, thanks, I shall try to find the default linker script for it.


If you are running mb-gcc which generates executables which run on
the target, you do not need to provide a linker script.


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: Move dwarf2 frame tables to read-only section for AIX

2014-09-22 Thread Joseph S. Myers
On Mon, 22 Sep 2014, Andrew Dixie wrote:

> I altered the dwarf2 frame and exception table generation so the
> decision on whether to use a read-only or read-write section is an
> independent decision from how the frame tables are registered.
> I renamed EH_FRAME_IN_DATA_SECTION to EH_FRAME_THROUGH_COLLECT2, as it
> now supports read-only, has slightly changed semantics, and I think
> this name better reflects what it currently does rather than what it
> historically did.

If you rename a target macro, the old target macro name needs to be 
poisoned in system.h.

> 2014-09-22  Andrew Dixie  
> 
>   Move exception tables to read-only memory on AIX.
>* dwarf2asm.c (dw2_asm_output_encoded_addr_rtx): Add call to
>ASM_OUTPUT_DWARF_DATAREL.
>* dwarf2out.c (switch_to_eh_frame_section): Use a read-only section
>even if EH_FRAME_SECTION_NAME is undefined.  Add call to
>EH_FRAME_THROUGH_COLLECT2.
>* except.c (switch_to_exception_section):  Use a read-only section
>even if EH_FRAME_SECTION_NAME is undefined.
>   * collect2.c (write_c_file_stat): Provide dbase on AIX.
>(scan_prog_file): Don't output __dso_handle nor __gcc_unwind_dbase.
>* config/rs6000/aix.h (ASM_PREFERRED_EH_DATA_FORMAT): define.
>(EH_TABLES_CAN_BE_READ_ONLY): define.
>(ASM_OUTPUT_DWARF_PCREL): define.
>(ASM_OUTPUT_DWARF_DATAREL): define.
>(EH_FRAME_IN_DATA_SECTION): undefine.
>(EH_FRAME_THROUGH_COLLECT2): define.
>* config/rs6000/rs6000-aix.c: new file.
>(rs6000_aix_asm_output_dwarf_pcrel): new function.
>(rs6000_aix_asm_output_dwarf_datarel): new function.
>* config/rs6000/rs6000.c (rs6000_xcoff_asm_init_sections): remove
>assignment of exception_section.

This ChangeLog entry seems very incomplete.  It doesn't mention the 
changes for other architectures, or to defaults.h, or to the 
documentation, for example.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING] [PATCH] Add direct support for Linux kernel __fentry__ patching

2014-09-22 Thread Andi Kleen
Andi Kleen  writes:

Ping!

> Andi Kleen  writes:
>
> Ping!
>
>> From: Andi Kleen 
>>
>> The Linux kernel dynamically patches in __fentry__ calls in and
>> out at runtime. This allows using function tracing for debugging
>> in production kernels without (significant) performance penalty.
>>
>> For this it needs a table pointing to each __fentry__ call.
>>
>> The way it is currently implemented is that a special
>> perl script scans the object file, generates the table in a special
>> section. When the kernel boots up it nops the calls, and
>> then later patches in the calls again as needed.
>>
>> The recordmcount.pl script in the kernel works, but it seems
>> cleaner and faster to support the code generation of the patch table
>> directly in gcc.
>>
>> This also allows to nop the calls directly at code generation
>> time, which allows to skip a patching step at kernel boot.
>> I also expect that a patchable production tracing facility is also useful
>> for other applications.
>>
>> For example it could be used in ftracer
>> (https://github.com/andikleen/ftracer)
>>
>> Having a nop area at the beginning of each function can also be
>> useful for other things. For example it can be used to patch
>> functions at runtime to point to different functions, to do
>> binary updates without restarting the program (like ksplice or
>> similar)
>>
>> This patch implements two new options for the i386 target:
>>
>> -mrecord-mcount
>> Generate a __mcount_loc section entry for each __fentry__ or mcount
>> call. The section is compatible with the kernel convention
>> and the data is put into a section loaded at runtime.
>>
>> -mnop-mcount
>> Generate the mcount/__fentry__ call as 5 byte nop that can be
>> patched in later. The nop is generated as a single instruction,
>> as the Linux kernel run time patching relies on this.
>>
>> Limitations:
>> - I didn't implement -mnop-mcount for -fPIC. This
>> would need a good single instruction 6 byte NOP and it seems a
>> bit pointless, as the patching would prevent text sharing.
>> - I didn't implement noping for targets that pass a variable
>> to mcount.
>> - The facility could be useful on other architectures too. Currently
>> the mcount code is target specific, so I made it a i386 option.
>>
>> Passes bootstrap and testing on x86_64-linux.
>>
>> Cc: rost...@goodmis.org
>>
>> gcc/:
>>
>> 2014-09-01  Andi Kleen  
>>
>>  * config/i386/i386.c (x86_print_call_or_nop): New function.
>>  (x86_function_profiler): Support -mnop-mcount and
>>  -mrecord-mcount.
>>  * config/i386/i386.opt (-mnop-mcount, -mrecord-mcount): Add
>>  * doc/invoke.texi: Document -mnop-mcount, -mrecord-mcount
>>  * testsuite/gcc/gcc.target/i386/nop-mcount.c: New file.
>>  * testsuite/gcc/gcc.target/i386/record-mcount.c: New file.
>> ---
>>  gcc/config/i386/i386.c| 34 
>> +++
>>  gcc/config/i386/i386.opt  |  9 +++
>>  gcc/doc/invoke.texi   | 17 +-
>>  gcc/testsuite/gcc.target/i386/nop-mcount.c| 24 +++
>>  gcc/testsuite/gcc.target/i386/record-mcount.c | 24 +++
>>  5 files changed, 102 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/nop-mcount.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/record-mcount.c
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 61b33782..a651aa1 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3974,6 +3974,13 @@ ix86_option_override_internal (bool main_args_p,
>>  }
>>  }
>>  
>> +#ifndef NO_PROFILE_COUNTERS
>> +  if (flag_nop_mcount)
>> +error ("-mnop-mcount is not compatible with this target");
>> +#endif
>> +  if (flag_nop_mcount && flag_pic)
>> +error ("-mnop-mcount is not implemented for -fPIC");
>> +
>>/* Accept -msseregparm only if at least SSE support is enabled.  */
>>if (TARGET_SSEREGPARM_P (opts->x_target_flags)
>>&& ! TARGET_SSE_P (opts->x_ix86_isa_flags))
>> @@ -39042,6 +39049,17 @@ x86_field_alignment (tree field, int computed)
>>return computed;
>>  }
>>  
>> +/* Print call to TARGET to FILE.  */
>> +
>> +static void
>> +x86_print_call_or_nop (FILE *file, const char *target)
>> +{
>> +  if (flag_nop_mcount)
>> +fprintf (file, "1:\tnopl 0x00(%%eax,%%eax,1)\n"); /* 5 byte nop.  */
>> +  else
>> +fprintf (file, "1:\tcall\t%s\n", target);
>> +}
>> +
>>  /* Output assembler code to FILE to increment profiler label # LABELNO
>> for profiling a function entry.  */
>>  void
>> @@ -39049,7 +39067,6 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
>>  {
>>const char *mcount_name = (flag_fentry ? MCOUNT_NAME_BEFORE_PROLOGUE
>>   : MCOUNT_NAME);
>> -
>>if (TARGET_64BIT)
>>  {
>>  #ifndef NO_PROFILE_COUNTERS
>> @@ -39057,9 +39074,9 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
>>  #

[patch] Fix std::try_lock behaviour

2014-09-22 Thread Jonathan Wakely

When I fixed std::try_lock a few years ago I misread the spec:
exceptions should not be caught and turned into a return value.

Tested x86_64-linux, committed to trunk.
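
As an illustration of the intended behaviour (a sketch, not part of the
commit): throwing_lock and try_lock_propagates are hypothetical names used
only here.  The exception thrown by the second lockable should reach the
caller, and the first mutex, which had already been acquired, should have
been released on the way out.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lockable whose try_lock() always throws.
struct throwing_lock
{
  void lock () { }
  bool try_lock () { throw 42; }
  void unlock () { }
};

// Returns true if std::try_lock lets the exception escape and
// leaves the previously locked mutex unlocked.
bool
try_lock_propagates ()
{
  std::mutex m;
  throwing_lock t;
  try
    {
      std::try_lock (m, t);
    }
  catch (int e)
    {
      // m was locked before t threw; it must be free again now.
      bool released = m.try_lock ();
      if (released)
        m.unlock ();
      return e == 42 && released;
    }
  return false; // the exception was swallowed
}
```

With the old code the catch(...) block swallowed the exception, so a
function like this one would have returned false.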

commit 5effca670aa009c60e31b639604da4d00f388038
Author: Jonathan Wakely 
Date:   Thu Sep 18 16:15:54 2014 +0100

	* include/std/mutex (try_lock): Do not swallow exceptions.
	* testsuite/30_threads/try_lock/4.cc: Fix test.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index f6b851c..d80fa5a 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -630,12 +630,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   int __idx;
   auto __locks = std::tie(__l1, __l2, __l3...);
-  __try
-  { __try_lock_impl<0>::__do_try_lock(__locks, __idx); }
-  __catch(const __cxxabiv1::__forced_unwind&)
-  { __throw_exception_again; }
-  __catch(...)
-  { }
+  __try_lock_impl<0>::__do_try_lock(__locks, __idx);
   return __idx;
 }
 
diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
index 7741798..1212b65 100644
--- a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
+++ b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
@@ -133,8 +133,15 @@ void test03()
   while (unreliable_lock::throw_on < 3)
   {
 unreliable_lock::count = 0;
-int failed = std::try_lock(l1, l2, l3);
-VERIFY( failed == unreliable_lock::throw_on );
+try
+  {
+std::try_lock(l1, l2, l3);
+VERIFY( false );
+  }
+catch (int e)
+  {
+VERIFY( e == unreliable_lock::throw_on );
+  }
 ++unreliable_lock::throw_on;
   }
 }


Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Teresa Johnson
On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell  wrote:
> On 09/21/14 18:58, Xinliang David Li wrote:
>
>>> the intent is that that points to the gcov_info object of the object file
>>> containing the live version of the function.  I couldn't quite get this to
>>> work though -- it involves emitting a function's gcov_fn_info decl in the
>>> same comdat group as the function itself.
>>
>>
>> Another problem is that comdat functions may have different CFGs due
>> to different early inline decisions. Comdatting gcov counters can lead
>> to problems in profile use. Not comdatting profile counters have
>> another advantage -- it allows context sensitive profiling for comdat
>> function inline instances (IPA-inline).
>
>
> IIRC early inlining is done before the counters are created.  You're right
> later inlining may be a problem, and require a non-comdat set of cloned
> counters.   I can't recall exactly at what stage the counters are now
> inserted relative to inlining.  The CFG machinery had a number of
> significant changes while, and shortly after, I was working on this.
>
>>> You'll see the checking of gfi_ptr->key != gi_ptr in libgcov-driver.c.
>>>
>>> Are you making use of this machinery, or inventing new machinery?
>>
>>
>> Teresa's method is a different machinery -- it tries to propagate
>> profile data from the selected comdat copy + inline instance copies to
>> comdat copies with zero counts.
>
>
> It'd be preferable to complete the mechanism I outline above, rather than
> have a competing mechanism.

I don't think the above mechanism helps the problem my patches are
trying to solve. Unless we are in whole-program mode, which we don't
use, the only profiles available at profile-use time are those for the
given module (and any other modules in the same module group in LIPO
mode). If the COMDAT copy selected by the linker in the profile-gen
binary is in a different module, we would see all-zero counts when
compiling modules containing the other copies. I had submitted some
patches to trunk awhile back in the 4.9 time frame to help deal with
this by using estimated frequencies for zero-count COMDAT copies, and
applying scaled counts when we inline them, but it is an imperfect
solution.

The approach we now take for LIPO builds is to propagate the counts
for the profiled copy of the COMDAT to other modules. (Additionally
the indirect call profiling we perform in LIPO mode would point to a
module that we didn't have access to, which is a related issue that
the COMDAT fixups we perform at the end of the LIPO profiling run are
trying to solve.)

> Also, this patch  is in effect lying because
> the data then makes it look like the unselected comdat instances are in fact
> being executed -- looking at the whole program it's going to be harder to
> understand whether the different inline instances are being executed
> multiple times, or are duplicate data.  Does the gcov user output indicate
> this subtlety in some way?

Correct in that it makes it look like these copies were executed. This
was causing some issues when we rewrote/merged profiles with
gcov-tool, which essentially operates in whole-program mode. To handle
this, this patch marks the modified (previously all-zero) copies in
the gcda file. So now gcov-tool can handle them appropriately (clear
them on read before doing any analysis), and gcov-dump will flag them.
My patch does not do anything special for these routines when they are
read into the profile-use build, because we do want the propagated
counts during optimization. Possibly in whole-program mode they should
be cleared on read just as in gcov-tool, or they could be flagged in
some way for downstream phases, but it is not a compilation mode we
are using so I have not experimented.
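
To make the flow concrete, here is a rough model of what I described.  It
is only a sketch under simplifying assumptions and is not the libgcov or
gcov-tool code: comdat_copy, propagate and clear_fixed_up are hypothetical
names, and a flat vector of counters stands in for the real gcda records.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one module's copy of a COMDAT function.
struct comdat_copy
{
  std::vector<long> counts; // profile counters for this copy
  bool fixed_up = false;    // set when counts were propagated in
};

static bool
all_zero (const comdat_copy &c)
{
  return std::all_of (c.counts.begin (), c.counts.end (),
                      [] (long n) { return n == 0; });
}

// End of the profiling run: give zero-count copies the counters of the
// copy that actually executed, and mark them as modified.
void
propagate (const comdat_copy &profiled, std::vector<comdat_copy> &others)
{
  for (comdat_copy &c : others)
    if (all_zero (c) && c.counts.size () == profiled.counts.size ())
      {
        c.counts = profiled.counts;
        c.fixed_up = true;
      }
}

// Whole-program reader (the gcov-tool case): undo the fixup on read so
// that the duplicated counts are not analysed as real executions.
void
clear_fixed_up (std::vector<comdat_copy> &copies)
{
  for (comdat_copy &c : copies)
    if (c.fixed_up)
      std::fill (c.counts.begin (), c.counts.end (), 0L);
}
```

The profile-use compile corresponds to reading the copies after propagate
without calling clear_fixed_up, since there we do want the propagated
counts.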

Thanks,
Teresa

>
> nathan



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jason Merrill

On 09/12/2014 08:04 PM, Jakub Jelinek wrote:

I've been worried about the quick cases where
parallelization is not beneficial, like make check-gcc \
RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN
in that case.


I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt 
shell function:


rgt ()
{
( cd ~/m/$CANON/gcc/gcc;
make check-c++ ${1:+RUNTESTFLAGS="$*"} )
}

If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in 
the explosion of shell verbosity.  Could we add some '@'s to more of the 
rules, perhaps?


Jason



[patch] Update C++11 library status docs

2014-09-22 Thread Jonathan Wakely

This documents some more C++11 features as incomplete and notes that
iostreams are movable now.

Committed to trunk.
commit 8de61687f31c20aec35c02393c4fbb44ddcc0cbb
Author: Jonathan Wakely 
Date:   Mon Sep 22 15:57:28 2014 +0100

	* doc/xml/manual/status_cxx2011.xml: Update C++11 status.
	* doc/xml/manual/status_cxx2014.xml: Update TS status.
	* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index f0a256d..4433c89 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -600,16 +600,18 @@ particular release.
   
 
 
+  
   20.6.12.3
   uninitialized_fill
-  Y
-  
+  Partial
+  Returns void..
 
 
+  
   20.6.12.4
   uninitialized_fill_n
-  Y
-  
+  Partial
+  Returns void..
 
 
   20.6.13
@@ -1183,10 +1185,11 @@ particular release.
   
 
 
+  
   22.3.3.1
   Character classification
-  Y
-  
+  Partial
+  Missing isblank.
 
 
   22.3.3.2
@@ -1639,10 +1642,11 @@ particular release.
   
 
 
+  
   25.3
   Mutating sequence operations
-  Y
-  
+  Partial
+  rotate returns void.
 
 
   25.4
@@ -2049,10 +2053,13 @@ particular release.
   
 
 
+  
   26.8
   C Library
-  Y
-  
+  Partial
+   doesn't include
+
+  
 
 
   
@@ -2129,7 +2136,6 @@ particular release.
   Iostreams base classes
   Partial
   
-Missing move and swap operations on basic_ios.
 Missing io_errc and iostream_category.
 ios_base::failure is not derived from system_error.
 	Missing ios_base::hexfloat.
@@ -2147,23 +2153,20 @@ particular release.
   Formatting and manipulators
   Partial
   
-Missing move and swap operations
 Missing get_time and put_time manipulators.
   
 
 
-  
   27.8
   String-based streams
-  Partial
-  Missing move and swap operations
+  Y
+  
 
 
-  
   27.9
   File-based streams
-  Partial
-  Missing move and swap operations
+  Y
+  
 
 
   
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
index 82abd88..11254d6 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
@@ -402,18 +402,6 @@ not in any particular release.
 
   
   
-	http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3923.pdf";>
-	  N3923
-	
-  
-  A SFINAE-Friendly std::iterator_traits
-  N
-  Library Fundamentals TS
-
-
-
-  
-  
 	http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3925.pdf";>
 	  N3925
 	


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jakub Jelinek
On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:
> On 09/12/2014 08:04 PM, Jakub Jelinek wrote:
> >I've been worried about the quick cases where
> >parallelization is not beneficial, like make check-gcc \
> >RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN
> >in that case.
> 
> I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt
> shell function:
> 
> rgt ()
> {
> ( cd ~/m/$CANON/gcc/gcc;
> make check-c++ ${1:+RUNTESTFLAGS="$*"} )
> }
> 
> If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
> explosion of shell verbosity.  Could we add some '@'s to more of the rules,
> perhaps?

I've been considering that too, but not sure what info people find valuable
and what they don't.

Jakub


Re: [PATCH, i386, Pointer Bounds Checker 32/x] Pointer Bounds Checker hooks for i386 target

2014-09-22 Thread Ilya Enkovich
On 19 Sep 18:21, Uros Bizjak wrote:
> On Fri, Sep 19, 2014 at 2:53 PM, Ilya Enkovich  wrote:
> 
> >> > This patch adds i386 target hooks for Pointer Bounds Checker.
> 
> > New version with fixes and better documentation for ix86_load_bounds and 
> > ix86_store_bounds is below.
> 
> > +/* Expand pass uses this hook to load bounds for function parameter
> > +   PTR passed in SLOT in case its bounds are not passed in a register.
> > +
> > +   If SLOT is a memory, then bounds are loaded as for regular pointer
> > +   loaded from memory.  PTR may be NULL in case SLOT is a memory.
> > +   In such case value of PTR (if required) may be loaded from SLOT.
> > +
> > +   If SLOT is NULL or a register then SLOT_NO is an integer constant
> > +   holding number of the target dependent special slot which should be
> > +   used to obtain bounds.
> > +
> > +   Return loaded bounds.  */
> 
> OK, I hope I understand this target-handling of SLOT_NO. Can you
> please clarify when SLOT is a register?

For functions with more than four pointers passed in registers we do not have
enough bound registers to pass bounds.  These hooks are then called with SLOT
set to the register used to pass the pointer.
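
As a worked example of the special-slot arithmetic used in the quoted code
(the offset -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode) relative to
arg_pointer_rtx), here is a small standalone sketch.  special_slot_offset
is a hypothetical helper, and the default pointer size of 8 bytes is an
assumption matching x86_64.

```cpp
#include <cassert>

// Offset from the argument pointer of special slot SLOT_NO, assuming
// POINTER_SIZE bytes per pointer (8 on x86_64, an assumption here).
long
special_slot_offset (long slot_no, long pointer_size = 8)
{
  assert (slot_no >= 0 && pointer_size > 0);
  return -(slot_no + 1) * pointer_size;
}
```

So slot 0 sits one pointer below the argument pointer, slot 1 two pointers
below, and so on.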

> 
> I propose to write this function in the following (hopefully equivalent) way:

Since the addr computation is very similar for both loading and storing (the
only difference is the use of either arg_pointer_rtx or stack_pointer_rtx), I
decided to additionally move it into a separate function.  This should make the
functions simpler to understand.

> 
> --cut here--
> {
>   if (!slot)
> {
>   gcc_assert (CONST_INT_P (slot_no));
>   addr = plus_constant (Pmode, arg_pointer_rtx,
> -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode));
>   gcc_assert (ptr);
> }
>   else if (REG_P (slot))
> {
>   gcc_assert (CONST_INT_P (slot_no));
>   addr = plus_constant (Pmode, arg_pointer_rtx,
> -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode));
>   ptr = slot;
> }
>   else if (MEM_P (slot))
> {
>   addr = XEXP (slot, 0);
>   if (!register_operand (addr, Pmode))
> addr = copy_addr_to_reg (addr);
> 
>   if (!ptr)
> ptr = copy_addr_to_reg (slot);
> }
>   else
> gcc_unreachable ();
> 
>   if (!index_register_operand (ptr, Pmode))
> ptr = copy_addr_to_reg (ptr);
> 
>   ...
> }
> --cut here--
> 
> Please add a comment in each of if/else, explaining what the code is
> doing. This is non-trivial to understand.
> 
> > +
> > +static rtx
> > +ix86_load_bounds (rtx slot, rtx ptr, rtx slot_no)
> > +{
> > +  rtx addr = NULL;
> > +  rtx reg;
> > +
> > +  if (!ptr)
> > +{
> > +  gcc_assert (MEM_P (slot));
> > +  ptr = copy_addr_to_reg (slot);
> > +}
> > +
> > +  if (!slot || REG_P (slot))
> > +{
> > +  if (slot)
> > +   ptr = slot;
> > +
> > +  gcc_assert (CONST_INT_P (slot_no));
> > +
> > +  /* Here we have the case when more than four pointers are
> > +passed in registers.  In this case we are out of bound
> > +registers and have to use bndldx to load bound.  RA,
> > +RA - 8, etc. are used for address translation in bndldx.  */
> > +  addr = plus_constant (Pmode, arg_pointer_rtx,
> > +   -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode));
> > +}
> > +  else if (MEM_P (slot))
> > +{
> > +  addr = XEXP (slot, 0);
> > +  if (!register_operand (addr, Pmode))
> > +   addr = copy_addr_to_reg (addr);
> > +}
> > +  else
> > +gcc_unreachable ();
> > +
> > +  if (!register_operand (ptr, Pmode))
> > +ptr = copy_addr_to_reg (ptr);
> > +
> > +  reg = gen_reg_rtx (BNDmode);
> > +  emit_insn (BNDmode == BND64mode
> > +? gen_bnd64_ldx (reg, addr, ptr)
> > +: gen_bnd32_ldx (reg, addr, ptr));
> > +
> > +  return reg;
> > +}
> > +
> > +/* Expand pass uses this hook to store BOUNDS for call argument PTR
> > +   passed in SLOT in case BOUNDS are not passed in a register.
> > +
> > +   If SLOT is a memory, then BOUNDS are stored as for regular pointer
> > +   stored in memory.  PTR may be NULL in case SLOT is a memory.
> > +   In such case value of PTR (if required) may be loaded from SLOT.
> > +
> > +   If SLOT is NULL or a register then SLOT_NO is an integer constant
> > +   holding number of the target dependent special slot which should be
> > +   used to store BOUNDS.  */
> > +
> > +static void
> > +ix86_store_bounds (rtx ptr, rtx slot, rtx bounds, rtx slot_no)
> 
> This function can be written in exactly the same way as the above
> proposed code, up to the check for ptr register_operand.
> 
> > +{
> > +  rtx addr;
> > +
> > +  if (ptr)
> > +{
> > +  if (!register_operand (ptr, Pmode))
> > +   ptr = copy_addr_to_reg (ptr);
> > +}
> > +  else
> > +{
> > +  gcc_assert (MEM_P (slot));
> > +  ptr = copy_addr_to_reg (slot);
> > +}
> > +
> > +  if (!slot || REG_P (slot))
> > +{
> > +  gcc_assert (CONST_INT_P (s

Re: [PATCH, i386, Pointer Bounds Checker 34/x] Vararg functions support

2014-09-22 Thread Ilya Enkovich
On 21 Sep 18:08, Uros Bizjak wrote:
> Hello!
> 
> > This patch introduces initialization of incoming bounds for vararg function 
> > on i386 target.
> >
> > Bootstrapped and tested on linux-x86_64.
> >
> > Thanks,
> > Ilya
> > --
> > gcc/
> >
> > 2014-06-11  Ilya Enkovich  
> >
> > * config/i386/i386.c (ix86_setup_incoming_varargs): New.
> > (ix86_va_start): Initialize bounds for pointers in va_list.
> > (TARGET_SETUP_INCOMING_VARARG_BOUNDS): New.
> 
> OK with a couple of small changes below.
> 
> Thanks,
> Uros.
> 
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index a67e6e7..c520f26 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -8456,6 +8456,72 @@ ix86_setup_incoming_varargs (cumulative_args_t cum_v, enum machine_mode mode,
> >  setup_incoming_varargs_64 (&next_cum);
> >  }
> >
> > +static void
> > +ix86_setup_incoming_vararg_bounds (cumulative_args_t cum_v,
> > +  enum machine_mode mode,
> > +  tree type,
> > +  int *pretend_size ATTRIBUTE_UNUSED,
> > +  int no_rtl)
> > +{
> > +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
> > +  CUMULATIVE_ARGS next_cum;
> > +  tree fntype;
> > +  rtx save_area;
> > +  int bnd_reg, i, max;
> > +
> > +  gcc_assert (!no_rtl);
> > +
> 
> Please add a comment for following condition.
> 
> > +  if (!TARGET_64BIT)
> > +return;
> > +
> > +  fntype = TREE_TYPE (current_function_decl);
> > +
> > +  /* For varargs, we do not want to skip the dummy va_dcl argument.
> > + For stdargs, we do want to skip the last named argument.  */
> > +  next_cum = *cum;
> > +  if (stdarg_p (fntype))
> > +ix86_function_arg_advance (pack_cumulative_args (&next_cum), mode, type,
> > +  true);
> > +  if (cum->call_abi == MS_ABI)
> > +return;
> 
> Put this early exit at the top of the function.
> 
> > +  save_area = frame_pointer_rtx;
> > +
> > +  max = cum->regno + cfun->va_list_gpr_size / UNITS_PER_WORD;
> > +  if (max > X86_64_REGPARM_MAX)
> > +max = X86_64_REGPARM_MAX;
> > +
> > +  bnd_reg = cum->bnd_regno + cum->force_bnd_pass;
> > +  if (chkp_function_instrumented_p (current_function_decl))
> > +for (i = cum->regno; i < max; i++)
> > +  {
> > +   rtx addr = plus_constant (Pmode, save_area, i * UNITS_PER_WORD);
> > +   rtx reg = gen_rtx_REG (DImode,
> > +  x86_64_int_parameter_registers[i]);
> > +   rtx ptr = reg;
> > +   rtx bounds;
> > +
> > +   if (bnd_reg <= LAST_BND_REG)
> > + bounds = gen_rtx_REG (BNDmode, bnd_reg);
> > +   else
> > + {
> > +   rtx ldx_addr = plus_constant (Pmode, arg_pointer_rtx,
> > + (LAST_BND_REG - bnd_reg) * 8);
> 
> No magic constants!
> 
> > +   bounds = gen_reg_rtx (BNDmode);
> > +   emit_insn (TARGET_64BIT
> > +  ? gen_bnd64_ldx (bounds, ldx_addr, ptr)
> > +  : gen_bnd32_ldx (bounds, ldx_addr, ptr));
> > + }
> > +
> > +   emit_insn (TARGET_64BIT
> > +  ? gen_bnd64_stx (addr, ptr, bounds)
> > +  : gen_bnd32_stx (addr, ptr, bounds));
> 
> Please check BNDmode instead of TARGET_64BIT.
> 
> > +   bnd_reg++;
> > +  }
> > +}
> > +
> > +
> >  /* Checks if TYPE is of kind va_list char *.  */
> >
> >  static bool
> > @@ -8478,7 +8544,7 @@ ix86_va_start (tree valist, rtx nextarg)
> >  {
> >HOST_WIDE_INT words, n_gpr, n_fpr;
> >tree f_gpr, f_fpr, f_ovf, f_sav;
> > -  tree gpr, fpr, ovf, sav, t;
> > +  tree gpr, fpr, ovf, sav, t, t1;
> >tree type;
> >rtx ovf_rtx;
> >
> > @@ -8529,6 +8595,13 @@ ix86_va_start (tree valist, rtx nextarg)
> >crtl->args.arg_offset_rtx,
> >NULL_RTX, 0, OPTAB_LIB_WIDEN);
> >   convert_move (va_r, next, 0);
> > +
> > + /* Store zero bounds for va_list.  */
> > + if (chkp_function_instrumented_p (current_function_decl))
> > +   chkp_expand_bounds_reset_for_mem (valist,
> > + make_tree (TREE_TYPE (valist),
> > +next));
> > +
> > }
> >return;
> >  }
> > @@ -8582,10 +8655,15 @@ ix86_va_start (tree valist, rtx nextarg)
> >t = make_tree (type, ovf_rtx);
> >if (words != 0)
> >  t = fold_build_pointer_plus_hwi (t, words * UNITS_PER_WORD);
> > +  t1 = t;
> >t = build2 (MODIFY_EXPR, type, ovf, t);
> >TREE_SIDE_EFFECTS (t) = 1;
> >expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
> >
> > +  /* Store zero bounds for overflow area pointer.  */
> > +  if (chkp_function_instrumented_p (current_function_decl))
> > +chkp_expand_bounds_reset_for_mem (ovf, t1);
> 
> Can you please move this above to avoid anothe

Re: Fix ICE with ODR mering and variable sized types

2014-09-22 Thread Jan Hubicka
> On Fri, Sep 19, 2014 at 8:55 PM, Jan Hubicka  wrote:
> > Hi,
> > this patch fixes an ICE by avoiding mangling of types with variadic size
> > (those are not really supported).  Bootstrapped/regtested x86_64-linux,
> > tested with libreoffice, committed.
> 
> Hmm, but how do global vars end up having variadic type?  Isn't the
> bug that you are ending up with some local entity here?

We call this on TYPE_NAME of all types, not only global vars.  I do not think I
should skip all types with function context, because static variables may have
them (and I do not track where a type comes from, because an ODR violation is
caused even by types not used in global decls).  For example:

inline
int test()
{
  struct A {int a,b;};
  static struct A testA;
  return testA.a++;
}

creates type A that is local and should be merged interprocedurally I think.

Variadic types indeed cannot appear in global declarations, so I think it is
safe to ignore them.
I am adding Jason to CC, perhaps he knows better.

Honza
> 
> Richard.
> 
> > PR lto/63286
> > * tree.c (need_assembler_name_p): Do not mangle variadic types.
> > Index: tree.c
> > ===
> > --- tree.c  (revision 215328)
> > +++ tree.c  (working copy)
> > @@ -5003,6 +5003,7 @@ need_assembler_name_p (tree decl)
> >&& decl == TYPE_NAME (TREE_TYPE (decl))
> >&& !is_lang_specific (TREE_TYPE (decl))
> >&& AGGREGATE_TYPE_P (TREE_TYPE (decl))
> > +  && !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)
> >&& !type_in_anonymous_namespace_p (TREE_TYPE (decl)))
> >  return !DECL_ASSEMBLER_NAME_SET_P (decl);
> >/* Only FUNCTION_DECLs and VAR_DECLs are considered.  */


Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support

2014-09-22 Thread Ilya Enkovich
On 17 Sep 12:17, Ilya Enkovich wrote:
> On 19 May 11:23, Jeff Law wrote:
> > On 05/19/14 02:19, Ilya Enkovich wrote:
> > >On 16 May 13:39, Jeff Law wrote:
> > >>On 04/16/14 05:35, Ilya Enkovich wrote:
> > >>>Hi,
> > >>>
> > >>>This patch introduces Intel MPX bound registers and instructions.  It 
> > >>>was approved earlier for 4.9 and had no significant changes since then.  
> > >>>I'll assume patch is OK if no objections arise.
> > >>>
> > >>>Patch was bootstrapped and tested for linux-x86_64.
> > OK for the trunk.  Please wait until entire set is approved before
> > installing.
> > 
> > jeff
> > 
> 
> Here is an updated version.  The only change is in _ldx expand.  It now 
> has the second operand preparation code moved from ix86_expand_builtin as was 
> proposed by Uros.
> 
> Thanks,
> Ilya

Added similar operand handling into _stx expand.

Thanks,
Ilya
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 8e0a583..4e07d70 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;   H
-;;;   h jw  z
+;;;   h j   z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -94,6 +94,9 @@
 (define_register_constraint "v" "TARGET_SSE ? ALL_SSE_REGS : NO_REGS"
  "Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).")
 
+(define_register_constraint "w" "TARGET_MPX ? BND_REGS : NO_REGS"
+ "@internal Any bound register.")
+
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z  First SSE register.
 ;;  i  SSE2 inter-unit moves to SSE register enabled
@@ -253,6 +256,8 @@
 ;; T prefix is used for different address constraints
 ;;   v - VSIB address
 ;;   s - address with no segment register
+;;   i - address with no index and no rip
+;;   b - address with no base and no rip
 
 (define_address_constraint "Tv"
   "VSIB address operand"
@@ -261,3 +266,11 @@
 (define_address_constraint "Ts"
   "Address operand without segment register"
   (match_operand 0 "address_no_seg_operand"))
+
+(define_address_constraint "Ti"
+  "MPX address operand without index"
+  (match_operand 0 "address_mpx_no_index_operand"))
+
+(define_address_constraint "Tb"
+  "MPX address operand without base"
+  (match_operand 0 "address_mpx_no_base_operand"))
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2c05cec..a1e9289 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -399,6 +399,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__XSAVEC__");
   if (isa_flag & OPTION_MASK_ISA_XSAVES)
 def_or_undef (parse_in, "__XSAVES__");
+  if (isa_flag & OPTION_MASK_ISA_MPX)
+def_or_undef (parse_in, "__MPX__");
 }
 
 
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 07e5720..0e302e3 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -87,6 +87,9 @@ VECTOR_MODE (INT, DI, 1); /*   V1DI */
 VECTOR_MODE (INT, SI, 1); /*   V1SI */
 VECTOR_MODE (INT, QI, 2); /*   V2QI */
 
+POINTER_BOUNDS_MODE (BND32, 8);
+POINTER_BOUNDS_MODE (BND64, 16);
+
 INT_MODE (OI, 32);
 INT_MODE (XI, 64);
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 39462bd..c8ef2d2 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -231,6 +231,8 @@ extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
 
+extern bool ix86_bnd_prefixed_insn_p (rtx);
+
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 929f1b1..01823ca 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2131,6 +2131,8 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   /* Mask registers.  */
   MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
   MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  /* MPX bound registers */
+  BND_REGS, BND_REGS, BND_REGS, BND_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2147,6 +2149,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
   93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
+  101, 102, 103, 104,  /* bound registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2163,6 +2166,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   67, 68, 69, 70, 71, 72, 73, 74,   /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,   /* AVX-512 registers 24-31 */
   118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers *

Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jason Merrill

On 09/22/2014 11:26 AM, Jakub Jelinek wrote:
> On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:
>> If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
>> explosion of shell verbosity.  Could we add some '@'s to more of the rules,
>> perhaps?
>
> I've been considering that too, but not sure what info people find valuable
> and what they don't.


I don't see much information in the ~128 repetitions of the 
check-parallel rules with different numbers; the actual runtest command 
is the same in all of them.  Adding @ to all of the commands of the 
check-parallel-% rule makes things much better for me:


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6f251a5..be4c840 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
 	fi
 
 check-parallel-% : site.exp
-	-test -d plugin || mkdir plugin
-	-test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
-	test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
-	-(rootme=`${PWD_COMMAND}`; export rootme; \
+	-@test -d plugin || mkdir plugin
+	-@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
+	@test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
+	-@(rootme=`${PWD_COMMAND}`; export rootme; \
 	srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
 	if [ -n "$(check_p_subno)" ] \
 	   && [ -n "$$GCC_RUNTEST_PARALLELIZE_DIR" ] \


Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin

2014-09-22 Thread Aldy Hernandez

On 09/22/14 08:17, Michael Matz wrote:
> Hi,
>
> On Fri, 19 Sep 2014, Aldy Hernandez wrote:
>
>> Michael, I really don't understand the need for this change in your original
>> patch.  I don't know if this was a temporary testing change or what.
>
> I'm pretty sure it was temporary testing, when I still was finding my way
> through dwarf2out limitations/constraints.

Ah perfect!

>> I'm happy to report that with this and the last set of patches, both C
>> and C++ guality tests have <= regressions than mainline.  Yay.
>
> Super.

Again, thank you for your original patchset, which has immensely helped
me navigate my way around the black box which was/is dwarf2out.


Aldy


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Segher Boessenkool
On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
> I've been considering that too, but not sure what info people find valuable
> and what they don't.

The ten million "Running blablablalba.exp ..." messages on a very parallel
run aren't helpful in my opinion.  There might be more but that drowns out
everything else :-)


Segher


Re: [PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim

2014-09-22 Thread Nicholas Clifton

Hi Peter,

> gcc/ChangeLog
> 2014-09-22  Peter A. Bigot  
>
>* config/msp430/msp430.h: Remove automatic -lnosys when -msim absent.

Approved and applied.

Cheers
  Nick



[PATCH][AArch64] LR register not used in leaf functions

2014-09-22 Thread Kugan
AArch64 has the same issue ARM had, where the LR register was not used in
leaf functions. This was reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. On AArch64, the
test case needs more live ranges before LR_REGNUM is needed, i.e. the
test case in the PR needs additional loops up to r31 for AArch64 to
show the problem.

The same fix (from the thread
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
into ARM should apply to AArch64 as well. Regression tested on qemu for
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2014-09-23  Kugan Vivekanandarajah  

* config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
epilogue_completed is true.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index db950da..b3e4585 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -309,7 +309,7 @@ extern unsigned long aarch64_tune_flags;
considered live at the start of the called function.  */
 
 #define EPILOGUE_USES(REGNO) \
-  ((REGNO) == LR_REGNUM)
+  (epilogue_completed && (REGNO) == LR_REGNUM)
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
the stack pointer does not matter.  The value is tested only in
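
The effect of the macro change above can be modelled outside GCC. This is only
a toy sketch under stated assumptions: `epilogue_completed` here is a local
stand-in for GCC's global flag, `reg_free_for_allocation` and `NUM_REGS` are
hypothetical names invented for illustration, and LR_REGNUM is assumed to be 30
(x30). It is not how the register allocator actually queries the macro.

```c
#include <assert.h>
#include <stdbool.h>

#define LR_REGNUM 30   /* assumption: x30, the AArch64 link register */
#define NUM_REGS  32   /* assumption: size of this toy register file */

/* Local stand-in for GCC's global flag of the same name.  */
static bool epilogue_completed = false;

/* The patched form of the macro from the diff above.  */
#define EPILOGUE_USES(REGNO) \
  (epilogue_completed && (REGNO) == LR_REGNUM)

/* Hypothetical query, not a GCC function: a register is treated as
   available when the epilogue is not known to use it.  */
static bool
reg_free_for_allocation (int regno)
{
  assert (regno >= 0 && regno < NUM_REGS);
  return !EPILOGUE_USES (regno);
}
```

With the old definition LR would be reported as used by the epilogue at all
times; with the patched one it only becomes reserved once the epilogue has
been generated, which is what lets a leaf function use it earlier.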


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jakub Jelinek
On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote:
> On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
> > I've been considering that too, but not sure what info people find valuable
> > and what they don't.
> 
> The ten million "Running blablablalba.exp ..." messages on a very parallel
> run aren't helpful in my opinion.  There might be more but that drowns out
> everything else :-)

It has some value: it shows the actual progress.  Sure, you can just watch
the *.log files as they are populated and get a better picture.  I think the
Running *.exp messages come from dejagnu, not from gcc testsuite changes.

Jakub


Re: Speedup int_bit_from_pos

2014-09-22 Thread Jan Hubicka
> On Sun, 21 Sep 2014, Jan Hubicka wrote:
> 
> > > 
> > > Please omit static from inline functions.
> > 
> > Yep, I suppose we want to drop static in all inlines? I can make a patch
> > for that.
> > > 
> > > Also one notable difference with your patches is that the fits hwi is now 
> > > not tested on the result but on the result input which, multiplied by 8, 
> > > might not fit a hwi now.  So please use wide-ints here (the to_offset 
> > > flavor).
> > 
> > The function must always succeed (so the user promises it will fit in a
> > HWI), and for performance reasons I would rather not go into wide int by
> > default, but I can do that with checking enabled.
> 
> wide-int should be fast enough, please use it.

Like this?

Index: tree.h
===
--- tree.h  (revision 215421)
+++ tree.h  (working copy)
@@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
 extern HOST_WIDE_INT int_size_in_bytes (const_tree);
 extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
 extern tree bit_position (const_tree);
-extern HOST_WIDE_INT int_bit_position (const_tree);
 extern tree byte_position (const_tree);
 extern HOST_WIDE_INT int_byte_position (const_tree);
 
+/* Like bit_position, but return as an integer.  It must be representable in
+   that way (since it could be a signed value, we don't have the
+   option of returning -1 like int_size_in_byte can.  */
+
+static inline HOST_WIDE_INT int_bit_position (const_tree field)
+{ 
+  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
+ + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();
+}
+
+
 #define sizetype sizetype_tab[(int) stk_sizetype]
 #define bitsizetype sizetype_tab[(int) stk_bitsizetype]
 #define ssizetype sizetype_tab[(int) stk_ssizetype]
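
The review comment quoted above is about where the range check happens: a byte
offset can fit in a HOST_WIDE_INT while the same offset multiplied by 8 does
not. A minimal sketch of that concern, assuming int64_t as a stand-in for
HOST_WIDE_INT and BITS_PER_UNIT equal to 8; `bit_position_fits` is a
hypothetical helper, not part of GCC, and it relies on the GCC/Clang
`__builtin_mul_overflow` and `__builtin_add_overflow` builtins rather than
wide-int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_UNIT 8   /* assumption: 8-bit storage units */

/* Hypothetical helper, not a GCC function.  Computes
   byte_offset * BITS_PER_UNIT + bit_offset and returns true only if
   the final result fits in int64_t.  *result is meaningful only when
   the function returns true.  */
static bool
bit_position_fits (int64_t byte_offset, int64_t bit_offset, int64_t *result)
{
  int64_t scaled;

  assert (result != NULL);
  /* Checking byte_offset alone is not enough: the scaling can overflow.  */
  if (__builtin_mul_overflow (byte_offset, (int64_t) BITS_PER_UNIT, &scaled))
    return false;
  return !__builtin_add_overflow (scaled, bit_offset, result);
}
```

For example, a byte offset of INT64_MAX / 4 is itself representable, but the
call reports failure because the scaled value is not.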


Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Chen Gang
On 09/22/2014 10:45 PM, Michael Eager wrote:
> On 09/21/14 21:10, Chen Gang wrote:
>> On 9/22/14 2:09, Michael Eager wrote:
>>>
>>> Generally, you should use "gcc" to link programs, not "ld".  gcc is
>>> a driver which will select the appropriate libraries and support routines
>>> (such as crt0.o, which contains _start) and pass them to the linker.
>>>
>>
>> OK, thanks.
>>
>> When gcc, it misses the root directory for "crt1.o" and "crtn.o": e.g.
>> "/lib/ld.so.1", "crt1.o", "crtn.o" when gcc -v, but we need "/upstream/
>> release/lib/ld.so.1", "/upstream/lib/crt1.o", "/upstream/libcrtn.o".
> 
> You likely need to build mb-gcc with --sysroot=/upstream.
> 

OK, thanks! I guess it will solve all the issues I ran into, and I shall
try it next.

> How are you building gcc?  What are your configuration options?
> 

The related information is below, please help check when you have time,
thanks.

[root@localhost ~]# /upstream/release/bin/microblaze-gchen-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/upstream/release/bin/microblaze-gchen-linux-gcc
COLLECT_LTO_WRAPPER=/upstream/release/libexec/gcc/microblaze-gchen-linux/5.0.0/lto-wrapper
Target: microblaze-gchen-linux
Configured with: ../gcc/configure --target=microblaze-gchen-linux --disable-nls 
--enable-languages=c --disable-threads --disable-shared --without-headers 
--disable-libssp --disable-libquadmath --disable-libgomp --disable-libatomic 
--prefix=/upstream/release
Thread model: single
gcc version 5.0.0 20140920 (experimental) (GCC) 
[root@localhost ~]# 


Thanks.
-- 
Chen Gang

Open share and attitude like air water and life which God blessed


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jakub Jelinek
On Mon, Sep 22, 2014 at 11:43:35AM -0400, Jason Merrill wrote:
> On 09/22/2014 11:26 AM, Jakub Jelinek wrote:
> >On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:
> >>If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
> >>explosion of shell verbosity.  Could we add some '@'s to more of the rules,
> >>perhaps?
> >
> >I've been considering that too, but not sure what info people find valuable
> >and what they don't.
> 
> I don't see much information in the ~128 repetitions of the check-parallel
> rules with different numbers; the actual runtest command is the same in all
> of them.  Adding @ to all of the commands of the check-parallel-% rule makes
> things much better for me:

LGTM (though, supposedly we want similar change in
libstdc++-v3/testsuite/Makefile.am).
Or, if people would really like to see the commands, we could print them
just once, using e.g.
-$(if $(check_p_subno),@)(rootme= ...
(then e.g. check-parallel-gcc goal would print the command, but
check-parallel-gcc-1 or check-parallel-gcc-112 would not).

> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
>   fi
>  
>  check-parallel-% : site.exp
> - -test -d plugin || mkdir plugin
> - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
> - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
> $(TESTSUITEDIR)/$(check_p_subdir)
> - -(rootme=`${PWD_COMMAND}`; export rootme; \
> + -@test -d plugin || mkdir plugin
> + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
> + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
> $(TESTSUITEDIR)/$(check_p_subdir)
> + -@(rootme=`${PWD_COMMAND}`; export rootme; \
>   srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
>   if [ -n "$(check_p_subno)" ] \
>  && [ -n "$$GCC_RUNTEST_PARALLELIZE_DIR" ] \


Jakub


Re: [PATCH][AArch64] LR register not used in leaf functions

2014-09-22 Thread Jiong Wang

On 22/09/14 16:43, Kugan wrote:
> AArch64 has the same issue ARM had, where the LR register was not used in
> leaf functions. This was reported in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. On AArch64, the
> test case needs more live ranges before LR_REGNUM is needed, i.e. the
> test case in the PR needs additional loops up to r31 for AArch64 to
> show the problem.
>
> The same fix (from the thread
> https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
> into ARM should apply to AArch64 as well. Regression tested on qemu for
> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

This would still be only a partial fix. LR should be a caller-saved
register that is free to use as long as it is saved properly across
function calls.

I had a very similar patch to this sitting in my local tree, under
various benchmark analysis.

-- Jiong

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2014-09-23  Kugan Vivekanandarajah  
>
> * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
> epilogue_completed is true.






Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Chen Gang
On 09/22/2014 10:46 PM, Michael Eager wrote:
> On 09/21/14 20:55, Chen Gang wrote:
>>
>>
>> On 9/22/14 2:03, Michael Eager wrote:
>>> On 09/20/14 23:24, Chen Gang wrote:
>>>> And it seems, we also need 'LinkScr.ld' for ldscript, could you share it
>>>> to me, thanks.
>>>>
>>>> set_board_info ldscript 
>>>> "-T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld"
>>>
>>> Hi Chen --
>>>
>>> The DejaGNU configuration I provided is for a bare-metal environment.
>>>
>>> If you are testing in a Linux environment, the tool chain you uses
>>> should provide a default linker script which matches your hardware's
>>> memory layout. You should not need to provide a separate linker script.
>>>
>>
>> OK, thanks, I shall try to find the default linker script for it.
> 
> If you are running mb-gcc which generates executables which run on
> the target, you do not need to provide a linker script.
> 

OK, thanks. I shall remove it, next.

Thanks.
-- 
Chen Gang

Open share and attitude like air water and life which God blessed


[jit] Improvements to documentation

2014-09-22 Thread David Malcolm
Committed to branch dmalcolm/jit

As before, an HTML version of the docs can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/index.html

with the bulk of the changes occurring to:
 https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/intro/tutorial03.html

gcc/jit/ChangeLog.jit:
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* docs/intro/install.rst: Reduce width of listing.
* docs/intro/tutorial01.rst: Use <libgccjit.h> rather than
"libgccjit.h" when including the header.
* docs/intro/tutorial02.rst: Likewise.
* docs/intro/tutorial03.rst: Clarify various sections; show
effect of reducing optimization level down from 3 to 2.
("Putting it all together"): Move to above...
("Behind the curtain: optimizing away stack manipulation"):
...this, and rename this to...
("Behind the curtain: How does our code get optimized?"): ...and
add more detail, and discussion of elimination of tail recursion.
---
 gcc/jit/ChangeLog.jit  |  15 +
 gcc/jit/docs/_build/texinfo/libgccjit.texi | 875 +
 gcc/jit/docs/intro/install.rst |  11 +-
 gcc/jit/docs/intro/tutorial01.rst  |   2 +-
 gcc/jit/docs/intro/tutorial02.rst  |   2 +-
 gcc/jit/docs/intro/tutorial03.rst  | 533 +++---
 6 files changed, 1141 insertions(+), 297 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 8e546e6..14576f2 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,18 @@
+2014-09-22  David Malcolm  
+
+   * docs/_build/texinfo/libgccjit.texi: Regenerate.
+   * docs/intro/install.rst: Reduce width of listing.
+   * docs/intro/tutorial01.rst: Use <libgccjit.h> rather than
+   "libgccjit.h" when including the header.
+   * docs/intro/tutorial02.rst: Likewise.
+   * docs/intro/tutorial03.rst: Clarify various sections; show
+   effect of reducing optimization level down from 3 to 2.
+   ("Putting it all together"): Move to above...
+   ("Behind the curtain: optimizing away stack manipulation"):
+   ...this, and rename this to...
+   ("Behind the curtain: How does our code get optimized?"): ...and
+   add more detail, and discussion of elimination of tail recursion.
+
 2014-09-19  David Malcolm  
 
* TODO.rst: Add detection of uninitialized variables, since
diff --git a/gcc/jit/docs/_build/texinfo/libgccjit.texi 
b/gcc/jit/docs/_build/texinfo/libgccjit.texi
index 985b22c..850adf2 100644
--- a/gcc/jit/docs/_build/texinfo/libgccjit.texi
+++ b/gcc/jit/docs/_build/texinfo/libgccjit.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-libgccjit 0.1, September 19, 2014
+libgccjit 0.1, September 22, 2014
 
 David Malcolm
 
@@ -131,8 +131,13 @@ Tutorial part 3: Adding JIT-compilation to a toy 
interpreter
 * Compiling the context:: 
 * Single-stepping through the generated code:: 
 * Examining the generated code:: 
-* Behind the curtain; optimizing away stack manipulation: Behind the curtain 
optimizing away stack manipulation. 
 * Putting it all together:: 
+* Behind the curtain; How does our code get optimized?: Behind the curtain How 
does our code get optimized?. 
+
+Behind the curtain: How does our code get optimized?
+
+* Optimizing away stack manipulation:: 
+* Elimination of tail recursion:: 
 
 Topic Reference
 
@@ -259,8 +264,13 @@ Tutorial part 3: Adding JIT-compilation to a toy 
interpreter
 * Compiling the context:: 
 * Single-stepping through the generated code:: 
 * Examining the generated code:: 
-* Behind the curtain; optimizing away stack manipulation: Behind the curtain 
optimizing away stack manipulation. 
 * Putting it all together:: 
+* Behind the curtain; How does our code get optimized?: Behind the curtain How 
does our code get optimized?. 
+
+Behind the curtain: How does our code get optimized?
+
+* Optimizing away stack manipulation:: 
+* Elimination of tail recursion:: 
 
 @end menu
 
@@ -321,12 +331,13 @@ needed to develop against it (@cite{libgccjit-devel}):
 
 @example
 $ rpm -qlv libgccjit
-lrwxrwxrwx1 rootroot   18 Aug 12 07:56 
/usr/lib64/libgccjit.so.0 -> libgccjit.so.0.0.1
--rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 
/usr/lib64/libgccjit.so.0.0.1
+lrwxrwxrwx1 rootroot   18 Aug 12 07:56 /usr/lib64/libgccjit.so.0 
-> libgccjit.so.0.0.1
+-rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 
/usr/lib64/libgccjit.so.0.0.1
+
 $ rpm -qlv libgccjit-devel
--rwxr-xr-x1 rootroot37654 Aug 12 07:56 
/usr/include/libgccjit++.h
--rwxr-xr-x1 rootroot28967 Aug 12 07:56 
/usr/include/libgccjit.h
-lrwxrwxrwx1 rootroot   14 Aug 12 07:56 
/usr/lib64/libgccjit.so -> libgccjit.so.0
+-rwxr-xr-x1 rootroot37654 Aug 12 07:56 /usr/include/libgccjit++.h
+-rwxr-xr-x1 rootroot28967 Aug 12 07:56 /usr/include/libgcc

Re: [AArch64] Auto-generate the "BUILTIN_" macros for aarch64-builtins.c

2014-09-22 Thread Richard Earnshaw
On 22/09/14 14:30, James Greenhalgh wrote:
> 
> On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote:
>> On 18/09/14 10:53, James Greenhalgh wrote:
>>> +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: 
>>> $(srcdir)/config/aarch64/geniterators.sh \
>>> + $(srcdir)/config/aarch64/iterators.md
>>> + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
>>> + $(srcdir)/config/aarch64/iterators.md > \
>>> + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h
>>> +
>>>  aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) 
>>> $(SYSTEM_H) \
>>>  coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H)
>>>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>>
>>
>> Is there any real need to write this into the source directory and have
>> the built file checked in?  Ie. can't we always write to the build
>> directory and use it from there.  That avoids problems if the sources
>> are on a read-only filesystem.
>>
>> If we do need to leave it in the sources, then contrib/update_gcc should
>> be taught how to touch the generated file when resyncing from the
>> repositories.
>>
> 
> I thought I had tried this and failed to make it work. I must not have
> been trying hard enough at the time.
> 
> Updated as attached, generating the header in the build directory. It
> looks much better this way!
> 
> Bootstrapped on aarch64-none-linux-gnueabi with no issues.
> 
> Ok?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2014-09-22  James Greenhalgh  
> 
>   * config/aarch64/geniterators.sh: New.
>   * config/aarch64/iterators.md (VDQF_DF): New.
>   * config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h.
>   * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove.
> 
> 

OK.

R.




Re: [PATCH] gcc parallel make check

2014-09-22 Thread Segher Boessenkool
On Mon, Sep 22, 2014 at 05:49:12PM +0200, Jakub Jelinek wrote:
> On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote:
> > On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
> > > I've been considering that too, but not sure what info people find valuable
> > > and what they don't.
> > 
> > The ten million "Running blablablalba.exp ..." messages on a very parallel
> > run aren't helpful in my opinion.  There might be more but that drowns out
> > everything else :-)
> 
> It has some value: it shows the actual progress.  Sure, you can just watch
> the *.log files as they are populated and get a better picture.  I think the
> Running *.exp messages come from dejagnu, not from gcc testsuite changes.

Hrm.  Looking at the log files it seems there are not more of those messages
at all since the changes.  Maybe it just all got too fast! :-)


Segher


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jason Merrill

On 09/22/2014 11:58 AM, Jakub Jelinek wrote:
> LGTM (though, supposedly we want similar change in
> libstdc++-v3/testsuite/Makefile.am).
> Or, if people would really like to see the commands, we could print them
> just once, using e.g.
> 	-$(if $(check_p_subno),@)(rootme= ...
> (then e.g. check-parallel-gcc goal would print the command, but
> check-parallel-gcc-1 or check-parallel-gcc-112 would not).


So, like this?



commit c750897381a3f936e27cabd825cfa85ce936a6a9
Author: Jason Merrill 
Date:   Mon Sep 22 11:44:00 2014 -0400

gcc/
	* Makefile.in (check-parallel-%): Add @.
libstdc++-v3/
	* testsuite/Makefile.am (%/site.exp): Add @.
	(check-DEJAGNU): Likewise.
	* testsuite/Makefile.in: Regenerate.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6f251a5..97b439a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
 	fi
 
 check-parallel-% : site.exp
-	-test -d plugin || mkdir plugin
-	-test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
-	test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
-	-(rootme=`${PWD_COMMAND}`; export rootme; \
+	-@test -d plugin || mkdir plugin
+	-@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
+	@test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
+	-$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \
 	srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
 	if [ -n "$(check_p_subno)" ] \
 	   && [ -n "$$GCC_RUNTEST_PARALLELIZE_DIR" ] \
diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index e206aba..b4c9e85 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -91,9 +91,9 @@ new-abi-baseline:
 	  ${extract_symvers} ../src/.libs/libstdc++.so $${output})
 
 %/site.exp: site.exp
-	-test -d $* || mkdir $*
+	-@test -d $* || mkdir $*
 	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
-	objdir=`${PWD_COMMAND}`/$*; \
+	@objdir=`${PWD_COMMAND}`/$*; \
 	sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
 	-e "s|^set objdir .*$$|set objdir $$objdir|" \
 	site.exp > $*/site.exp.tmp
@@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
 
 # Run the testsuite in normal mode.
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
-	AR="$(AR)"; export AR; \
+	$(if $*,@)AR="$(AR)"; export AR; \
 	RANLIB="$(RANLIB)"; export RANLIB; \
 	if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
 	  rm -rf normal-parallel || true; \
diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/Makefile.in
index 59060b8..0fc26f4 100644
--- a/libstdc++-v3/testsuite/Makefile.in
+++ b/libstdc++-v3/testsuite/Makefile.in
@@ -553,9 +553,9 @@ new-abi-baseline:
 	  ${extract_symvers} ../src/.libs/libstdc++.so $${output})
 
 %/site.exp: site.exp
-	-test -d $* || mkdir $*
+	-@test -d $* || mkdir $*
 	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
-	objdir=`${PWD_COMMAND}`/$*; \
+	@objdir=`${PWD_COMMAND}`/$*; \
 	sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
 	-e "s|^set objdir .*$$|set objdir $$objdir|" \
 	site.exp > $*/site.exp.tmp
@@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
 
 # Run the testsuite in normal mode.
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
-	AR="$(AR)"; export AR; \
+	$(if $*,@)AR="$(AR)"; export AR; \
 	RANLIB="$(RANLIB)"; export RANLIB; \
 	if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
 	  rm -rf normal-parallel || true; \


Re: [PATCH] gcc parallel make check

2014-09-22 Thread Jakub Jelinek
On Mon, Sep 22, 2014 at 12:21:08PM -0400, Jason Merrill wrote:
> On 09/22/2014 11:58 AM, Jakub Jelinek wrote:
> >LGTM (though, supposedly we want similar change in
> >libstdc++-v3/testsuite/Makefile.am).
> >Or, if people would really like to see the commands, we could print them
> >just once, using e.g.
> > -$(if $(check_p_subno),@)(rootme= ...
> >(then e.g. check-parallel-gcc goal would print the command, but
> >check-parallel-gcc-1 or check-parallel-gcc-112 would not).
> 
> So, like this?

Ok, thanks.

> commit c750897381a3f936e27cabd825cfa85ce936a6a9
> Author: Jason Merrill 
> Date:   Mon Sep 22 11:44:00 2014 -0400
> 
> gcc/
>   * Makefile.in (check-parallel-%): Add @.
> libstdc++-v3/
>   * testsuite/Makefile.am (%/site.exp): Add @.
>   (check-DEJAGNU): Likewise.
>   * testsuite/Makefile.in: Regenerate.
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6f251a5..97b439a 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
>   fi
>  
>  check-parallel-% : site.exp
> - -test -d plugin || mkdir plugin
> - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
> - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
> $(TESTSUITEDIR)/$(check_p_subdir)
> - -(rootme=`${PWD_COMMAND}`; export rootme; \
> + -@test -d plugin || mkdir plugin
> + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
> + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
> $(TESTSUITEDIR)/$(check_p_subdir)
> + -$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \
>   srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
>   if [ -n "$(check_p_subno)" ] \
>  && [ -n "$$GCC_RUNTEST_PARALLELIZE_DIR" ] \
> diff --git a/libstdc++-v3/testsuite/Makefile.am 
> b/libstdc++-v3/testsuite/Makefile.am
> index e206aba..b4c9e85 100644
> --- a/libstdc++-v3/testsuite/Makefile.am
> +++ b/libstdc++-v3/testsuite/Makefile.am
> @@ -91,9 +91,9 @@ new-abi-baseline:
> ${extract_symvers} ../src/.libs/libstdc++.so $${output})
>  
>  %/site.exp: site.exp
> - -test -d $* || mkdir $*
> + -@test -d $* || mkdir $*
>   @srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
> - objdir=`${PWD_COMMAND}`/$*; \
> + @objdir=`${PWD_COMMAND}`/$*; \
>   sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
>   -e "s|^set objdir .*$$|set objdir $$objdir|" \
>   site.exp > $*/site.exp.tmp
> @@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
> normal%/site.exp
>  
>  # Run the testsuite in normal mode.
>  check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
> - AR="$(AR)"; export AR; \
> + $(if $*,@)AR="$(AR)"; export AR; \
>   RANLIB="$(RANLIB)"; export RANLIB; \
>   if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
> rm -rf normal-parallel || true; \
> diff --git a/libstdc++-v3/testsuite/Makefile.in 
> b/libstdc++-v3/testsuite/Makefile.in
> index 59060b8..0fc26f4 100644
> --- a/libstdc++-v3/testsuite/Makefile.in
> +++ b/libstdc++-v3/testsuite/Makefile.in
> @@ -553,9 +553,9 @@ new-abi-baseline:
> ${extract_symvers} ../src/.libs/libstdc++.so $${output})
>  
>  %/site.exp: site.exp
> - -test -d $* || mkdir $*
> + -@test -d $* || mkdir $*
>   @srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
> - objdir=`${PWD_COMMAND}`/$*; \
> + @objdir=`${PWD_COMMAND}`/$*; \
>   sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
>   -e "s|^set objdir .*$$|set objdir $$objdir|" \
>   site.exp > $*/site.exp.tmp
> @@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
> normal%/site.exp
>  
>  # Run the testsuite in normal mode.
>  check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
> - AR="$(AR)"; export AR; \
> + $(if $*,@)AR="$(AR)"; export AR; \
>   RANLIB="$(RANLIB)"; export RANLIB; \
>   if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
> rm -rf normal-parallel || true; \


Jakub


[patch] moving macro definitions to defaults.h

2014-09-22 Thread Andrew MacLeod
After being reminded of the tm.h issues brought up last November (here:
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started
looking back into it.


The general summary is that any header file which has a conditional on a
target macro could be affected by include file reordering, i.e. if a
header file has

#ifdef BLAH
 

and we move the header files around a bit, it wouldn't be immediately
obvious if the order was changed and BLAH was defined later in the
include structure.  The results for this header file would be different
because the guarded code would no longer be in the preprocessed source,
and it may take a long time to discover the difference.


Joseph's solution was to identify these and instead put a default
definition in defaults.h, then change all the uses to #if instead, i.e.

#if BLAH

This way we can ensure that the definition has been seen and it will be 
a compile error if not.
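
The pattern described above can be sketched in a few lines. This is an
illustration only, not code from the patch: BLAH is the placeholder name used
in this mail, and `blah_enabled`, `blah_is_defined` and `check_blah` are
hypothetical helpers. One caveat worth stating: a bare `#if BLAH` evaluates an
undefined name as 0 (GCC only warns with -Wundef), so the hard error comes
from uses in ordinary C expressions, as in `blah_enabled` here.

```c
#include <assert.h>

/* A target header may or may not define BLAH before this point.
   Nothing defines it in this sketch.  */

/* Stand-in for defaults.h: after this, BLAH is always defined.  */
#ifndef BLAH
#define BLAH 0
#endif

/* Value test in ordinary C code.  If no definition of BLAH had been
   seen, this would fail to compile (undeclared identifier).  */
static int
blah_enabled (void)
{
  return BLAH != 0;
}

/* Existence test.  Once a default exists this is always true, so it
   no longer says anything about the target; this is why the uses
   have to be converted to value tests.  */
static int
blah_is_defined (void)
{
#ifdef BLAH
  return 1;
#else
  return 0;
#endif
}

/* Small self-check of the two behaviours described above.  */
static void
check_blah (void)
{
  assert (blah_enabled () == 0);
  assert (blah_is_defined () == 1);
}
```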



I looked at all the target macros listed by Joseph, and decided to start
with the ones which are used *only* in an "if defined" situation in a .h
file of some sort,

i.e.
#ifdef
#ifndef
or #if defined()

If this happens in a .c file, then changing the order of .h's shouldn't 
matter. (Unless the .c file is doing something really screwy.  So for 
the moment, ignoring that.)


So looking at only .h files, I found a number of default definitions in
rtl.h, which this patch moves to defaults.h.  I was going to #error if
defaults.h wasn't included, but found that to be unnecessary since rtl.h
uses some of those macros and would cause a compile failure anyway if
defaults.h wasn't included. All the other uses of these particular macros
use the #if model, so no further adjustment is required for them.


This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are 
running.  Assuming no issues show up, OK for mainline?


The remaining macros I will have a closer look at and most will likely 
need the full treatment moving from #ifdef to the #if model.  I plan to 
do these next.  The macros which fit this category for potential header 
trouble (along with their usage count) are:


2  USE_COLLECT2
3  TARGET_TERMINATE_DW2_EH_FRAME_INFO
3  TARGET_HAVE_CTORS_DTORS
3  REGMODE_NATURAL_SIZE
4  CC_STATUS_MDEP_INIT
4  CC_STATUS_MDEP_INIT
4  MODE_BASE_REG_REG_CLASS
4  REGNO_MODE_OK_FOR_REG_BASE_P
5  MODE_BASE_REG_CLASS
5  XCOFF_DEBUGGING_INFO
6  VMS_DEBUGGING_INFO
6  TARGET_ASM_OUTPUT_ANCHOR
6  EH_FRAME_SECTION_NAME
8  MODE_CODE_BASE_REG_CLASS
9  HARD_REGNO_CALLER_SAVE_MODE
9  HARD_REGNO_CALL_PART_CLOBBERED
9  SDB_DEBUGGING_INFO
9  CC_STATUS_MDEP
9  REGNO_MODE_CODE_OK_FOR_BASE_P
10 SECONDARY_RELOAD_CLASS
10 TARGET_VXWORKS_RTP
14 NO_DOT_IN_LABEL
14 REGNO_MODE_OK_FOR_BASE_P
17 STACK_REGS
19 TARGET_ASM_DESTRUCTOR
20 OBJECT_FORMAT_ELF
20 TARGET_ASM_CONSTRUCTOR
24 NO_DOLLAR_IN_LABEL
26 DBX_DEBUGGING_INFO
30 CPLUSPLUS_CPP_SPEC
42 ASM_OUTPUT_DEF






	* defaults.h (HAVE_PRE_INCREMENT, HAVE_PRE_DECREMENT,
	HAVE_POST_INCREMENT, HAVE_POST_DECREMENT, HAVE_POST_MODIFY_DISP,
	HAVE_POST_MODIFY_REG, HAVE_PRE_MODIFY_DISP, HAVE_PRE_MODIFY_REG,
	USE_LOAD_POST_INCREMENT, USE_LOAD_POST_DECREMENT,
	USE_LOAD_PRE_INCREMENT, USE_LOAD_PRE_DECREMENT,
	USE_STORE_POST_INCREMENT, USE_STORE_POST_DECREMENT,
	USE_STORE_PRE_INCREMENT, USE_STORE_PRE_DECREMENT,
	HARD_FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_IS_FRAME_POINTER,
	HARD_FRAME_POINTER_IS_ARG_POINTER): Relocate from rtl.h.
	* rtl.h: Move default definitions to defaults.h.
	(AUTO_INC_DEC): Adjust definition check to assume defaults exist.


Index: defaults.h
===
*** defaults.h	(revision 215355)
--- defaults.h	(working copy)
*** see the files COPYING3 and COPYING.RUNTI
*** 1168,1173 
--- 1168,1264 
  #define DEFAULT_PCC_STRUCT_RETURN 1
  #endif
  
+ #ifndef HAVE_PRE_INCREMENT
+ #define HAVE_PRE_INCREMENT 0
+ #endif
+ 
+ #ifndef HAVE_PRE_DECREMENT
+ #define HAVE_PRE_DECREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_INCREMENT
+ #define HAVE_POST_INCREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_DECREMENT
+ #define HAVE_POST_DECREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_MODIFY_DISP
+ #define HAVE_POST_MODIFY_DISP 0
+ #endif
+ 
+ #ifndef HAVE_POST_MODIFY_REG
+ #define HAVE_POST_MODIFY_REG 0
+ #endif
+ 
+ #ifndef HAVE_PRE_MODIFY_DISP
+ #define HAVE_PRE_MODIFY_DISP 0
+ #endif
+ 
+ #ifndef HAVE_PRE_MODIFY_REG
+ #define HAVE_PRE_MODIFY_REG 0
+ #endif
+ 
+ /* Some architectures do not have complete pre/post increment/decrement
+    instruction sets, or only move some modes efficiently.  These macros
+    allow us to tune autoincrement generation.  */
+ 
+ #ifndef USE_LOAD_POST_INCREMENT
+ #define USE_LOAD_POST_INCREMENT(MODE)   HAVE_POST_INCREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_POST_DECREMENT
+ #define USE_LOAD_POST_DECREMENT(MODE)   HAVE_POST_DECREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_PRE_INCREMENT
+ #define USE_LOAD_PRE_INCREMENT(MODE)    HAVE_PRE_INCREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_PRE_DECREMENT
+ #define USE_LOAD_PRE_DECREMEN

Re: Fix ICE with ODR merging and variable sized types

2014-09-22 Thread Jason Merrill

On 09/22/2014 11:37 AM, Jan Hubicka wrote:

Variadic types indeed can not appear in global declarations, so I think it is 
safe to ignore them.


Agreed.

Jason



Re: [PATCH 4/5] Generalise invalid_mode_change_p

2014-09-22 Thread Jeff Law

On 09/22/14 01:34, Richard Sandiford wrote:

Jeff Law  writes:

On 09/18/14 04:25, Richard Sandiford wrote:

This is the main patch for the bug.  We should treat a register as invalid
for a mode change if simplify_subreg_regno cannot provide a new register
number for the result.  We should treat a class as invalid for a mode change
if all registers in the class are invalid.  This is an extension of the old
CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M).

I forgot to say that the patch is a prerequisite to removing aarch64's
C_C_C_M.  There are other prerequisites too, but removing C_C_C_M without
this patch caused regressions in the existing testsuite, which is why no
new tests are needed.


gcc/
* hard-reg-set.h: Include hash-table.h.
(target_hard_regs): Add a finalize method and a x_simplifiable_subregs
field.
* target-globals.c (target_globals::~target_globals): Handle
hard_regs->finalize.
* rtl.h (subreg_shape): New structure.
(shape_of_subreg): New function.
(simplifiable_subregs): Declare.
* reginfo.c (simplifiable_subreg): New structure.
(simplifiable_subregs_hasher): Likewise.
(simplifiable_subregs): New function.
(invalid_mode_changes): Delete.
(valid_mode_changes, valid_mode_changes_obstack): New variables.
(record_subregs_of_mode): Remove subregs_of_mode parameter.
Record valid mode changes in valid_mode_changes.
(find_subregs_of_mode): Remove subregs_of_mode parameter.
Update calls to record_subregs_of_mode.
(init_subregs_of_mode): Remove invalid_mode_changes and bitmap
handling.  Initialize new variables.  Update call to
find_subregs_of_mode.
(invalid_mode_change_p): Check new variables instead of
invalid_mode_changes.
(finish_subregs_of_mode): Finalize new variables instead of
invalid_mode_changes.
(target_hard_regs::finalize): New function.
* ira-costs.c (print_allocno_costs): Call invalid_mode_change_p
even when CLASS_CANNOT_CHANGE_MODE is undefined.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-09-15 11:55:40.459855161 +0100
+++ gcc/rtl.h   2014-09-15 12:26:21.249077760 +0100
+/* Return the shape of a SUBREG rtx.  */
+
+static inline subreg_shape
+shape_of_subreg (const_rtx x)
+{
+  return subreg_shape (GET_MODE (SUBREG_REG (x)),
+  SUBREG_BYTE (x), GET_MODE (x));
+}
+

Is there some reason you don't have a constructor that accepts a
const_rtx?


I was worried that by allowing implicit const_rtx->subreg_shape
conversions, it would be less obvious that the rtx has to have
code SUBREG.  I.e. a checked conversion would be hidden in the
constructor rather than being explicit.

If with David's new rtx hierarchy we end up with an rtx_subreg
subclass then I agree we should have a constructor that takes
one of those.

Makes sense.

I'm not sure if I was explicit, but the patch is fine, that was more a 
curiosity on my part than anything.
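
For readers following the design point, here is a rough sketch of the trade-off using made-up miniature types (node, shape and shape_of_subreg_node are hypothetical and are not GCC's rtx or subreg_shape).  The shape type has only a field-wise constructor, so the conversion from a node has to go through a named function where the "must be a SUBREG" precondition is visible.

```cpp
#include <cassert>

// Hypothetical miniature of the design question; these are not GCC's types.
enum node_code { NODE_REG, NODE_SUBREG };

struct node
{
  node_code code;
  int inner_mode, offset, outer_mode;
};

struct shape
{
  int inner_mode, offset, outer_mode;

  // Only a field-wise constructor.  There is deliberately no constructor
  // taking a node, so a node never converts to a shape implicitly.
  constexpr shape (int inner, int off, int outer)
    : inner_mode (inner), offset (off), outer_mode (outer) {}
};

// The conversion is a named function, so the checked precondition is
// explicit at every call site instead of hidden in a constructor.
constexpr shape shape_of_subreg_node (const node &n)
{
  assert (n.code == NODE_SUBREG);
  return shape (n.inner_mode, n.offset, n.outer_mode);
}
```

If a dedicated subclass for SUBREG nodes existed, a constructor taking that subclass would carry the same guarantee in its parameter type.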


jeff



C++ PATCH for variable template diagnostics

2014-09-22 Thread Jason Merrill

This patch fixes two issues with variable templates:

1) -Wunused was warning about a variable template specialization when 
leaving the template specialization scope, and
2) The diagnostic referring to the specialization wasn't showing the 
template arguments.


Tested x86_64-pc-linux-gnu, applying to trunk.
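
For context, a small sketch of what a variable template and an explicit specialization of one look like in C++14.  The name default_value and the value 7 are invented for illustration, and the variables are constexpr here only so that they can be checked at compile time; the testcases in the patch use plain non-const variables.

```cpp
// Primary variable template: value-initialized for every type.
template <class T> constexpr T default_value = T ();

// Explicit specialization for int.  A declaration of this kind is what
// the two diagnostics discussed above refer to.
template <> constexpr int default_value<int> = 7;

constexpr int    from_specialization = default_value<int>;     // uses the specialization
constexpr double from_primary        = default_value<double>;  // uses the primary template
```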
commit 51aed4b97f9a6dd8bc7954a4691f7790d6bc41c5
Author: Jason Merrill 
Date:   Mon Sep 22 10:33:57 2014 -0400

	* decl.c (poplevel): Don't warn about unused vars in template scope.
	* error.c (dump_decl): Handle variable templates.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index fe5a4af..12a9f43 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -624,6 +624,7 @@ poplevel (int keep, int reverse, int functionbody)
 
   /* Before we remove the declarations first check for unused variables.  */
   if ((warn_unused_variable || warn_unused_but_set_variable)
+  && current_binding_level->kind != sk_template_parms
   && !processing_template_decl)
 for (tree d = getdecls (); d; d = TREE_CHAIN (d))
   {
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 86fd405..a03bfe1 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1044,6 +1044,18 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
 case FIELD_DECL:
 case PARM_DECL:
   dump_simple_decl (pp, t, TREE_TYPE (t), flags);
+
+  /* Handle variable template specializations.  */
+  if (TREE_CODE (t) == VAR_DECL
+	  && DECL_LANG_SPECIFIC (t)
+	  && DECL_TEMPLATE_INFO (t)
+	  && PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (t)))
+	{
+	  pp_cxx_begin_template_argument_list (pp);
+	  tree args = INNERMOST_TEMPLATE_ARGS (DECL_TI_ARGS (t));
+	  dump_template_argument_list (pp, args, flags);
+	  pp_cxx_end_template_argument_list (pp);
+	}
   break;
 
 case RESULT_DECL:
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ12.C b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C
new file mode 100644
index 000..49ea588
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wall" }
+
+template <class T> T x;
+template <> int x<int> = 0;
+
+int main()
+{
+  return x<int>;
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ13.C b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C
new file mode 100644
index 000..e398d22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++14 } }
+
+template <class T> T x;
+template <> int x<int> = 0;
+template <> int x<int> = 0;	// { dg-error "x<int>" }


[patch] Fix specializations of std::uses_allocator in <queue> and <stack>

2014-09-22 Thread Jonathan Wakely

Include missing header, and add tests.

Tested x86_64-linux, committed to trunk.
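
As a sketch of what the new tests are checking (this is not a copy of the test files; the element type int and the name not_an_allocator are chosen here for illustration), the container adaptors are expected to report uses_allocator according to their underlying container:

```cpp
#include <memory>
#include <queue>
#include <stack>

struct not_an_allocator { };

// Each adaptor forwards the question to its underlying container, whose
// allocator_type is std::allocator<int> by default.
static_assert (std::uses_allocator<std::queue<int>, std::allocator<int>>::value,
	       "queue with a valid allocator");
static_assert (std::uses_allocator<std::priority_queue<int>, std::allocator<int>>::value,
	       "priority_queue with a valid allocator");
static_assert (std::uses_allocator<std::stack<int>, std::allocator<int>>::value,
	       "stack with a valid allocator");

// A type that is not convertible to the container's allocator_type.
static_assert (!std::uses_allocator<std::queue<int>, not_an_allocator>::value,
	       "queue with an unrelated type");

constexpr bool queue_uses_std_allocator
  = std::uses_allocator<std::queue<int>, std::allocator<int>>::value;
```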
commit a0ffb1d4d0bf78be525c7546665ef7002b66b13e
Author: Jonathan Wakely 
Date:   Mon Sep 22 16:42:51 2014 +0100

Include <bits/uses_allocator.h> in <queue> and <stack>.

	* include/bits/stl_queue.h: Include missing header.
	* include/bits/stl_stack.h: Likewise.
	* testsuite/23_containers/priority_queue/requirements/
	uses_allocator.cc: New.
	* testsuite/23_containers/queue/requirements/uses_allocator.cc: New.
	* testsuite/23_containers/stack/requirements/uses_allocator.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h
index b516664..32124e3 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -58,6 +58,9 @@
 
 #include <bits/concept_check.h>
 #include <debug/debug.h>
+#if __cplusplus >= 201103L
+# include <bits/uses_allocator.h>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h
index ee187da..f4bb72c 100644
--- a/libstdc++-v3/include/bits/stl_stack.h
+++ b/libstdc++-v3/include/bits/stl_stack.h
@@ -58,6 +58,9 @@
 
 #include <bits/concept_check.h>
 #include <debug/debug.h>
+#if __cplusplus >= 201103L
+# include <bits/uses_allocator.h>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
new file mode 100644
index 000..efe73ae
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include 
+
+template
+  using uses_allocator = std::uses_allocator, A>;
+
+static_assert( uses_allocator>::value, "valid allocator" );
+
+struct X { };
+static_assert( !uses_allocator::value, "invalid allocator" );
diff --git a/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc
new file mode 100644
index 000..42106ca
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include 
+
+template
+  using uses_allocator = std::uses_allocator, A>;
+
+static_assert( uses_allocator>::value, "valid allocator" );
+
+struct X { };
+static_assert( !uses_allocator::value, "invalid allocator" );
diff --git a/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc
new file mode 100644
index 000..3663d63
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General 

Re: [patch] moving macro definitions to defaults.h

2014-09-22 Thread David Malcolm
On Mon, 2014-09-22 at 12:26 -0400, Andrew MacLeod wrote:
> After being reminded of the tm.h issues brought up last november (here:  
> https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started 
> looking back into it.
> 
> The general summary is that any header file which has a conditional on a
> target macro could be affected by include file reordering, i.e. if a
> header file has
> #ifdef BLAH
>   
> 
> and we move the header files around a bit, it wouldn't be immediately 
> obvious if the order changed and BLAH was defined later in the 
> include structure.  The results for this header file would be different 
> because the code under the #ifdef would no longer be in the preprocessed 
> source, and it may take a long time to discover the difference.
> 
> Joseph's solution was to identify these and instead put a default 
> definition in defaults.h, then change all the uses to #if instead, i.e.
> #if BLAH
> 
> This way we can ensure that the definition has been seen and it will be 
> a compile error if not.
> 
> 
> I looked at all the target macros listed by Joseph, and decided to start 
> with the ones which are used *only* in an "if defined" situation in .h 
> files of some sort.
> ie
> #ifdef
> #ifndef
> or #if defined()
> 
> If this happens in a .c file, then changing the order of .h's shouldn't 
> matter. (Unless the .c file is doing something really screwy.  So for 
> the moment, ignoring that.)

There appears to be a particular implicit order in which headers must be
included.  I notice that e.g. tm.h has:

  #ifndef GCC_TM_H
  #define GCC_TM_H

so if we're going with this "no header file includes any other header
file" model, would it make sense to add something like:

#ifndef GCC_TM_H
#error tm.h must have been included by this point
/* We need tm.h here so that we can see: BAR, BAZ, QUUX,
   etc.  */
#endif

to header files needing it, thus expressing the expected dependencies
explicitly?
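
A self-contained sketch of that idea, with stand-in names so that it compiles on its own: MY_TM_H plays the role of tm.h's include guard and BAR is a made-up target macro.  In a real tree the two halves would live in separate header files.

```cpp
// --- contents of the prerequisite header (normally a separate file) ---
#ifndef MY_TM_H
#define MY_TM_H
#define BAR 4   // made-up target macro
#endif

// --- top of a dependent header ---
#ifndef MY_TM_H
#error "the tm.h stand-in must have been included by this point"
/* We need it here so that we can see: BAR.  */
#endif

// Safe to use BAR: the check above proved the prerequisite was seen.
constexpr int bar_doubled () { return BAR * 2; }
```

Removing the first fragment makes the #error fire at the point of the missing dependency, rather than somewhere later where BAR is used.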


> So looking at only .h files, I found a number of default definitions in 
> rtl.h, which this patch moves to defaults.h.  I was going to #error if 
> defaults.h wasn't included, but found that to be unnecessary since it 
> uses some of those macros and would cause a compile failure anyway if it 
> wasn't included. All the other uses of these particular macros use the 
> #if model, so no further adjustment is required for them.


> 
> This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are 
> running.  Assuming no issues show up, OK for mainline?
> 
> The remaining macros I will have a closer look at and most will likely 
> need the full treatment moving from #ifdef to the #if model.  I plan to 
> do these next.  The macros which fit this category for potential header 
> trouble (along with their usage count) are:
> 
> 2  USE_COLLECT2
> 3  TARGET_TERMINATE_DW2_EH_FRAME_INFO
> 3  TARGET_HAVE_CTORS_DTORS
> 3  REGMODE_NATURAL_SIZE
> 4  CC_STATUS_MDEP_INIT
> 4  CC_STATUS_MDEP_INIT
> 4  MODE_BASE_REG_REG_CLASS
> 4  REGNO_MODE_OK_FOR_REG_BASE_P
> 5  MODE_BASE_REG_CLASS
> 5  XCOFF_DEBUGGING_INFO
> 6  VMS_DEBUGGING_INFO
> 6  TARGET_ASM_OUTPUT_ANCHOR
> 6  EH_FRAME_SECTION_NAME
> 8  MODE_CODE_BASE_REG_CLASS
> 9  HARD_REGNO_CALLER_SAVE_MODE
> 9  HARD_REGNO_CALL_PART_CLOBBERED
> 9  SDB_DEBUGGING_INFO
> 9  CC_STATUS_MDEP
> 9  REGNO_MODE_CODE_OK_FOR_BASE_P
> 10 SECONDARY_RELOAD_CLASS
> 10 TARGET_VXWORKS_RTP
> 14 NO_DOT_IN_LABEL
> 14 REGNO_MODE_OK_FOR_BASE_P
> 17 STACK_REGS
> 19 TARGET_ASM_DESTRUCTOR
> 20 OBJECT_FORMAT_ELF
> 20 TARGET_ASM_CONSTRUCTOR
> 24 NO_DOLLAR_IN_LABEL
> 26 DBX_DEBUGGING_INFO
> 30 CPLUSPLUS_CPP_SPEC
> 42 ASM_OUTPUT_DEF
> 
> 
> 
> 




Re: [AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-09-22 Thread Jeff Law

On 09/22/14 05:16, Alan Lawrence wrote:

Ok thanks Jeff. In that case I think I should draw this to the attention
of the AArch64 maintainers to check the testsuite updates are OK before
I commit...?

Can't hurt.



Methinks it may be possible to get further, or at least increase our
confidence, if I "mock" out try_widen_shift_mode, and/or try injecting
some dubious RTL from a builtin, although this'll only give a momentary
snapshot of behaviour. I may or may not have time to look into this
though ;)...

Yea, it's something I'd pondered as well, though I tend to inject the 
RTL I want directly in the debugger.  The downside is doing so doesn't 
ensure various tables are updated properly, and I vaguely recall a 
per-pseudo table for the combiner's nonzero_bits, signbit_copies and 
friends.



jeff



Re: [patch] moving macro definitions to defaults.h

2014-09-22 Thread Joseph S. Myers
On Mon, 22 Sep 2014, Andrew MacLeod wrote:

> Joseph's solution was to identify these and instead put a default definition in
> defaults.h, then change all the uses to #if instead, i.e.
> #if BLAH
> 
> This way we can ensure that the definition has been seen and it will be a
> compile error if not.

No, my suggestion was that whenever possible we should change preprocessor 
conditionals - #ifdef or #if - into C conditionals - "if (MACRO)".

Changing from #ifdef to #if does nothing to make a missing tm.h include 
produce an error - the undefined macro simply quietly gets treated as 0 in 
preprocessor conditionals.  To get an error from #if in such cases, you'd 
need to build GCC with -Wundef (together with existing -Werror), and I'd 
guess there are plenty of places that are not -Wundef clean at present.
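
To make the point concrete, a small sketch using a deliberately undefined, made-up macro name.  Without -Wundef the preprocessor treats the unknown identifier as 0 and quietly picks the #else branch:

```cpp
// NOT_DEFINED_ANYWHERE is deliberately never defined.
#if NOT_DEFINED_ANYWHERE
constexpr int branch_taken () { return 1; }
#else
constexpr int branch_taken () { return 0; }   // chosen, with no diagnostic by default
#endif

// A C-level conditional is different: the name must be declared, so
//   if (NOT_DEFINED_ANYWHERE) ...
// is rejected by the compiler when the definition has not been seen.
```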

Now, I think moves of defaults to defaults.h are generally a good idea, 
and that moving from defined/undefined to true/false semantics is also a 
good idea - even if the way the macro is used means you can't take the 
further step of converting from #if to if ().  They don't solve the 
problem of making a missing tm.h include immediately visible, but they 
*do* potentially help with future automatic refactoring to convert target 
macros into hooks.

Obviously such moves do require checking the definitions and uses of the 
macros in question; you need to make sure you catch all places that use 
#ifdef / #if defined etc. on the macro (and make sure they have the same 
default).  And if you're changing the semantics of the macro from defined 
/ undefined to true / false, you need to watch out for any existing 
definitions with an empty expansion, or an expansion to 0, etc.
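
A sketch of the hazard with a made-up macro, LEGACY_FLAG, that some target defines to 0 while the code was written for defined/undefined semantics.  The two styles of test disagree:

```cpp
// Hypothetical macro written for "defined/undefined" semantics,
// but given the value 0 by some target.
#define LEGACY_FLAG 0

#ifdef LEGACY_FLAG
constexpr bool seen_by_ifdef () { return true; }    // defined, so "enabled"
#else
constexpr bool seen_by_ifdef () { return false; }
#endif

#if LEGACY_FLAG
constexpr bool seen_by_if () { return true; }
#else
constexpr bool seen_by_if () { return false; }      // value 0, so "disabled"
#endif
```

An empty expansion is worse still: with "#define LEGACY_FLAG" and nothing after it, "#if LEGACY_FLAG" has no expression left to evaluate and is a preprocessing error.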

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch] moving macro definitions to defaults.h

2014-09-22 Thread Joseph S. Myers
On Mon, 22 Sep 2014, David Malcolm wrote:

> There appears to be a particular implicit order in which headers must be
> included.  I notice that e.g. tm.h has:
> 
>   #ifndef GCC_TM_H
>   #define GCC_TM_H
> 
> so if we're going with this "no header file includes any other header
> file" model, would it make sense to add something like:
> 
> #ifndef GCC_TM_H
> #error tm.h must have been included by this point
> /* We need tm.h here so that we can see: BAR, BAZ, QUUX,
>    etc.  */
> #endif
> 
> to header files needing it, thus expressing the expected dependencies
> explicitly?

In principle, yes.

In practice, some headers have definitions that depend on tm.h but for 
most users this doesn't matter.  For example, flags.h depends on 
SWITCHABLE_TARGET.  (I think the fix there is to make most users use 
options.h instead, and move miscellaneous declarations from flags.h to 
other headers.)  In some cases, the target macro may be used only in a 
macro expansion.  (BITS_PER_UNIT isn't strictly a target macro any more, 
but when it was, its uses in tree.h were an example of that.  tree.h still 
depends on the target macros NO_DOLLAR_IN_LABEL, NO_DOT_IN_LABEL and 
TARGET_DLLIMPORT_DECL_ATTRIBUTES, however, and we shouldn't make all 
tree.h users include tm.h.)
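
A sketch of the "used only in a macro expansion" case, with made-up names (THING_BITS and UNITS_PER_THING are not GCC macros).  The header mentions the target macro, but only code that actually expands THING_BITS needs the definition:

```cpp
// Hypothetical header fragment: UNITS_PER_THING is a made-up target macro
// that appears only inside another macro's expansion.
#define THING_BITS(N) ((N) * UNITS_PER_THING)

// A user that never expands THING_BITS compiles without UNITS_PER_THING.
constexpr int unrelated_helper (int n) { return n + 1; }

// A user that does expand it must have seen the definition first.
#define UNITS_PER_THING 8
constexpr int bits_in (int n) { return THING_BITS (n); }
```

An unconditional #error at the top of such a header would therefore force the dependency onto users like unrelated_helper that do not need it.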

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-22 Thread Andrew Pinski
On Mon, Sep 22, 2014 at 4:10 AM, Alan Lawrence  wrote:
> Well, I haven't looked into this in detail: I've gone only as far as
>   * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad'
> compiles (r214043), finding that the critical difference is in the
> emit-rtl.o generated by r214043;
>   *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries
> 'fixed_regs + ', and that Richard Biener's changelog specifically
> mentions stripping signedness changes (and introduces the SIGN_NOPS).
>
> However, I apply your patch (minus the hunk adding the (set_attr "type"
> "load1"), this appears to have gone in already), and still see the same error
> message:
>
> emit-rtl.o: In function `gen_rtx_REG':
> emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
> section in regclass.o
> emit-rtl.o: In function `gen_rtx':
> emit-rtl.c:(.text+0x1824): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
> section in regclass.o
> collect2: error: ld returned 1 exit status
>
> and still see the same (suspicious-looking, although perhaps not convicted)
> relocations:
>
> $ readelf --relocs
> benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep
> fixed_regs
> 12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
> 12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
> 12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
> 
> 12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs +
> 
> 1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
> 
> 1828  005d0116 R_AARCH64_LDST8_A  fixed_regs +
> 
> 186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
> 1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
>
> I've also now bootstrapped my patch (STRIP_NOPS -> STRIP_SIGN_NOPS * 2) on
> aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no
> regressions, so would like to propose that patch for trunk...?


You need to track down where R_AARCH64_ADR_PREL_PG_HI21 reloc is being
created in the assembly and then track down why GCC is using tiny
model here.  Note my fix was for a similar issue; not necessarily the
exact same one, in that there could be another pattern which needs to
use the new constraint too.

Thanks,
Andrew

>
> --Alan
>
>
>
>
> Andrew Pinski wrote:
>>
>> On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence 
>> wrote:
>>>
>>> We've been seeing errors using aarch64-none-linux-gnu gcc to build the
>>> 403.gcc benchmark from spec2k6, that we've traced back to this patch. The
>>> error looks like:
>>>
>>> /home/alalaw01/bootstrap_richie/gcc/xgcc
>>> -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
>>> -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
>>> attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
>>> c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
>>> cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
>>> cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
>>> alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
>>> cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o
>>> combine.o
>>> conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
>>> diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
>>> emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
>>> function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
>>> hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o
>>> insn-extract.o
>>> insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
>>> jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
>>> params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
>>> reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
>>> resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
>>> sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
>>> ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
>>> tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o
>>> xcoffout.o
>>> ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
>>> xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o
>>> lbasename.o
>>> getpwd.o ucbqsort.o -lm-o gcc
>>> emit-rtl.o: In function `gen_rtx_REG':
>>> emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
>>> R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
>>> section in regclass.o
>>> emit-rtl.o: In function `gen_rtx':
>>> emit-rtl.c:(.text+0x1824): relocation truncated

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-22 Thread Jeff Law

On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
"old_dest" and live-out from "bb".

Looks good.  However, before committing we need a couple things.

1. Bootstrap & regression test this variant of the patch.  I know you 
tested an earlier one, but please test this one just to be sure.


2. Testcase.  I think you could test for either the reduction in the 
live-in set of the newly created block or that you're shrink wrapping 
one or more functions you didn't previously shrink-wrap.  I think it's 
fine if this test is target specific.


Jeff



Re: Speedup int_bit_from_pos

2014-09-22 Thread Jan Hubicka
> > On Sun, 21 Sep 2014, Jan Hubicka wrote:
> > 
> > > > 
> > > > Please omit static from inline functions.
> > > 
> > > Yep, I suppose we want to drop static in all inlines? I can make patch 
> > > for that.
> > > > 
> > > > Also one notable difference with your patches is that the fits hwi is 
> > > > now not tested on the result but on the result input which, multiplied 
> > > > by 8, might not fit a hwi now.  So please use wide-ints here (the 
> > > > to_offset flavor).
> > > 
> > > The function must always succeed (so the user promises it will fit in HWI) and
> > > for performance reasons I would rather not go into wide int by default, but I
> > > can do that with checking enabled.
> > 
> > wide-int should be fast enough, please use it.
> 
> Like this?
> 
> Index: tree.h
> ===
> --- tree.h(revision 215421)
> +++ tree.h(working copy)
> @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
>  extern HOST_WIDE_INT int_size_in_bytes (const_tree);
>  extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
>  extern tree bit_position (const_tree);
> -extern HOST_WIDE_INT int_bit_position (const_tree);
>  extern tree byte_position (const_tree);
>  extern HOST_WIDE_INT int_byte_position (const_tree);
>  
> +/* Like bit_position, but return as an integer.  It must be representable in
> +   that way (since it could be a signed value, we don't have the
> +   option of returning -1 like int_size_in_byte can.  */
> +
> +static inline HOST_WIDE_INT int_bit_position (const_tree field)
> +{ 
> +  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
> +   + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();

Hmm, this gets me:
/aux/hubicka/trunk-6/gcc/testsuite/gcc.c-torture/compile/2224-1.c:39:7: 
internal compiler error: in decompose, at wide-int.h:911
0x656ed6 wi::int_traits >::decompose(long*, 
unsigned int, generic_wide_int const&)
../../gcc/wide-int.h:911
0x6ec9f4 
wide_int_ref_storage::wide_int_ref_storage
 >(generic_wide_int const&, unsigned int)
../../gcc/wide-int.h:959
0x6ec75a generic_wide_int 
>::generic_wide_int 
>(generic_wide_int const&, unsigned int)
../../gcc/wide-int.h:735
0x7d52ec wi::binary_traits, 
generic_wide_int, 
wi::int_traits >::precision_type, 
wi::int_traits 
>::precision_type>::result_type wi::add, 
generic_wide_int >(generic_wide_int const&, 
generic_wide_int const&)
../../gcc/wide-int.h:2287
0x7d4fcc wi::binary_traits, 
generic_wide_int, (wi::precision_type)1, 
wi::int_traits 
>::precision_type>::result_type 
generic_wide_int::operator+
 >(generic_wide_int const&) const
../../gcc/wide-int.h:696
0xf282ab int_bit_position
../../gcc/tree.h:3890

on precision mismatch. What is the correct way to compute this?
I tried offset_int but I do not seem smart enough to get past compile errors 
with that one.
> +}
> +
> +
>  #define sizetype sizetype_tab[(int) stk_sizetype]
>  #define bitsizetype sizetype_tab[(int) stk_bitsizetype]
>  #define ssizetype sizetype_tab[(int) stk_ssizetype]
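
The overflow concern from the review (a byte offset that fits a HOST_WIDE_INT may stop fitting once it is multiplied by 8) can be sketched with plain 64-bit integers.  This deliberately avoids the wide-int API; bit_position_fits and bits_per_unit are names invented here, and 8-bit units and non-negative inputs are assumed.

```cpp
#include <cstdint>
#include <limits>

constexpr int bits_per_unit = 8;   // assumption: 8-bit bytes

// Return true if BYTE_OFFSET * 8 + BIT_OFFSET is representable in int64_t.
// Both inputs are assumed to be non-negative.
constexpr bool
bit_position_fits (std::int64_t byte_offset, std::int64_t bit_offset)
{
  const std::int64_t max = std::numeric_limits<std::int64_t>::max ();
  if (byte_offset > max / bits_per_unit)
    return false;                     // the multiplication itself would overflow
  return byte_offset * bits_per_unit <= max - bit_offset;
}
```

So checking only that the byte offset fits is not enough; the check has to be made on the scaled result, or the arithmetic done in a wider type.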


Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-22 Thread Jeff Law

On 09/22/14 04:29, Jiong Wang wrote:

On 19/09/14 21:43, Jeff Law wrote:


On 09/15/14 08:33, Jiong Wang wrote:

Jeff,

   thanks, I partially understand your meaning here.

take the function "ira_implicitly_set_insn_hard_regs" in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate "const"
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI ("recog_data") [flags 0x40]
)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI ("global_a") [flags 0xc0]  ))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI ("global_a") [flags
0xc0]  ))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI ("global_a") [flags 0xc0]  ))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI ("global_a") [flags 0xc0]
)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for
review?


patch attached.

please review, thanks.

gcc/
   * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
!REG_P (src) to
   release more instruction sink opportunities.

gcc/testsuite/
   * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.

Thanks.  Please verify that this version passes a bootstrap & regression 
test.  Assuming it does, it is OK for the trunk.


jeff



Re: Speedup int_bit_from_pos

2014-09-22 Thread Mike Stump
On Sep 22, 2014, at 8:51 AM, Jan Hubicka  wrote:
>> On Sun, 21 Sep 2014, Jan Hubicka wrote:
>> 
 
>>>> Please omit static from inline functions.
>>> 
>>> Yep, I suppose we want to drop static in all inlines? I can make a patch
>>> for that.
 
>>>> Also one notable difference with your patches is that the fits hwi is now
>>>> not tested on the result but on the input which, multiplied by 8,
>>>> might not fit a hwi now.  So please use wide-ints here (the to_offset
>>>> flavor).
>>> 
>>> The function must always succeed (so the user promises it will fit in a HWI)
>>> and for performance reasons I would rather not go into wide int by default,
>>> but I can do that with checking enabled.
>> 
>> wide-int should be fast enough, please use it.
> 
> Like this?
> 
> Index: tree.h
> ===================================================================
> --- tree.h	(revision 215421)
> +++ tree.h	(working copy)
> @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
> extern HOST_WIDE_INT int_size_in_bytes (const_tree);
> extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
> extern tree bit_position (const_tree);
> -extern HOST_WIDE_INT int_bit_position (const_tree);
> extern tree byte_position (const_tree);
> extern HOST_WIDE_INT int_byte_position (const_tree);
> 
> +/* Like bit_position, but return as an integer.  It must be representable in
> +   that way (since it could be a signed value, we don't have the
> +   option of returning -1 like int_size_in_byte can.  */
> +
> +static inline HOST_WIDE_INT int_bit_position (const_tree field)
> +{ 
> +  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
> +   + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();
> +}
> +
> +

Not quite:

  offset_int woffset
    = (wi::to_offset (xoffset)
       + wi::lrshift (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)),
                      LOG2_BITS_PER_UNIT));

offset_int is the type that can hold all bit positions, byte positions, byte 
sizes and bit sizes.

One can use wi::to_offset to convert things into it.  Look at the uses of
woffset to see the various ways of converting back out.  The idea is that
eventually all the code that works with these concepts could use that type:
convert into it sooner, and convert out of it later.
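
The arithmetic behind this can be sketched in plain C.  This is only a toy
model under stated assumptions, not GCC's wide-int API: toy_hwi (32 bits)
stands in for HOST_WIDE_INT, toy_offset (64 bits) stands in for offset_int,
and all toy_* names are hypothetical.  It shows why the "fits" test belongs
on the result: a byte offset that fits the narrow type can stop fitting
once it is multiplied by 8, so the sum is formed in the wide type and only
narrowed at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BITS_PER_UNIT 8
#define TOY_LOG2_BITS_PER_UNIT 3

typedef int32_t toy_hwi;	/* Narrow type, stand-in for HOST_WIDE_INT.  */
typedef int64_t toy_offset;	/* Wide type, stand-in for offset_int.  */

/* Return true if V can be narrowed to toy_hwi without loss.  */
bool
toy_fits_hwi (toy_offset v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Bit position = byte offset * bits per unit + bit offset, computed
   entirely in the wide type so the multiplication cannot overflow.  */
toy_offset
toy_bit_position (toy_hwi byte_offset, toy_hwi bit_offset)
{
  return ((toy_offset) byte_offset * TOY_BITS_PER_UNIT
	  + (toy_offset) bit_offset);
}

/* Byte position, shaped like the snippet quoted above: byte offset plus
   the bit offset shifted right by log2 (bits per unit).  BIT_OFFSET is
   assumed to be non-negative.  */
toy_offset
toy_byte_position (toy_hwi byte_offset, toy_hwi bit_offset)
{
  return ((toy_offset) byte_offset
	  + ((toy_offset) bit_offset >> TOY_LOG2_BITS_PER_UNIT));
}
```

For example, toy_bit_position (4, 3) is 35 and fits the narrow type,
whereas toy_bit_position (INT32_MAX, 7) is 17179869183 and does not, even
though both inputs do.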

Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Xinliang David Li
On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell  wrote:
> On 09/21/14 18:58, Xinliang David Li wrote:
>
>>> the intent is that that points to the gcov_info object of the object file
>>> containing the live version of the function.  I couldn't quite get this to
>>> work though -- it involves emitting a function's gcov_fn_info decl in the
>>> same comdat group as the function itself.
>>
>>
>> Another problem is that comdat functions may have different CFGs due
>> to different early inline decisions. Comdatting gcov counters can lead
>> to problems in profile use. Not comdatting profile counters has
>> another advantage -- it allows context sensitive profiling for comdat
>> function inline instances (IPA-inline).
>
>
> IIRC early inlining is done before the counters are created.

Yes, and that is what causes coverage mismatches when
COMDATing profile counters -- due to differences in early inline
decisions, the counter arrays created may differ across modules.

> You're right
> later inlining may be a problem, and require a non-comdat set of cloned
> counters.   I can't recall exactly at what stage the counters are now
> inserted relative to inlining.  The CFG machinery had a number of
> significant changes while, and shortly after, I was working on this.

After early inlining and before IPA inline.  Early inlining can lead
to the coverage mismatch, and IPA inlining can lead to loss of profile
precision with the counter comdat.


David

>
>>> You'll see the checking of gfi_ptr->key != gi_ptr in libgcov-driver.c.
>>>
>>> Are you making use of this machinery, or inventing new machinery?
>>
>>
>> Teresa's method is a different machinery -- it tries to propagate
>> profile data from the selected comdat copy + inline instance copies to
>> comdat copies with zero counts.
>
>
> It'd be preferable to complete the mechanism I outline above, rather than
> have a competing mechanism.  Also, this patch is in effect lying because
> the data then makes it look like the unselected comdat instances are in fact
> being executed -- looking at the whole program it's going to be harder to
> understand whether the different inline instances are being executed
> multiple times, or are duplicate data.  Does the gcov user output indicate
> this subtlety in some way?
>
> nathan
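
The key check mentioned above (gfi_ptr->key != gi_ptr) can be pictured
with a small toy model in C.  This is a sketch under assumptions, not the
libgcov-driver.c code: the toy_* structs and toy_count_live_functions are
hypothetical, and only the idea is taken from the thread, namely that each
function record carries a key pointing at the gcov_info of the object file
that owns the live copy, and that records whose key belongs to another
object are skipped.

```c
#include <assert.h>
#include <stddef.h>

struct toy_gcov_info;

/* Per-function record; KEY names the object owning the live copy.  */
struct toy_fn_info
{
  const struct toy_gcov_info *key;
  unsigned ident;
};

/* Per-object record listing the functions that object was built with.  */
struct toy_gcov_info
{
  const char *filename;
  unsigned n_functions;
  const struct toy_fn_info *const *functions;	/* Entries may be NULL.  */
};

/* Count the functions of GI_PTR whose live copy belongs to GI_PTR
   itself, skipping missing records and unselected comdat copies.  */
unsigned
toy_count_live_functions (const struct toy_gcov_info *gi_ptr)
{
  unsigned live = 0;
  for (unsigned i = 0; i < gi_ptr->n_functions; i++)
    {
      const struct toy_fn_info *gfi_ptr = gi_ptr->functions[i];
      if (gfi_ptr == NULL || gfi_ptr->key != gi_ptr)
	continue;
      live++;
    }
  return live;
}
```

In this model, an object that lists both its own function and a comdat
function whose selected copy lives elsewhere reports only the former, so
the comdat function's counters are attributed to exactly one object.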

