RE: [PATCH GCC]Simplify address expression in IVOPT

2013-11-04 Thread bin.cheng


> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, October 30, 2013 10:46 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
> 
> On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng  wrote:
> 
> Hmm.  I think you want what get_inner_reference_aff does, using the return
> value of get_inner_reference as starting point for determine_base_object.
> And you don't want to restrict yourselves so much on what exprs to
process,
> but only exclude DECL_Ps.
Did you mean I should pass all address expressions to
get_inner_reference_aff except the ones with declaration operands?
I am a little confused why DECL_Ps should be handled specially.  It seems
they can be handled properly by the get_* function.  Is there anything
important I missed?

> Just amend get_inner_reference_aff to return the tree base object.
I found it inappropriate to do this, because functions like
get_inner_reference* only accept reference expressions, while base_object
has to be computed for other kinds of expressions too.  Take the gimple
statement "a_1 = *ptr_1" as an example: the base passed to alloc_iv is
ptr_1, and the base_object computed by determine_base_object is also ptr_1,
neither of which can be derived with get_inner_reference*.

Also, it seems the comment before determine_base_object might be inaccurate:
"Returns a memory object to that EXPR points."  I think it should read
"Returns a pointer to the main memory object that EXPR points to."
Right?
> 
> Note that this isn't really "simplifying" but rather lowering all
addresses.
> 
> The other changes in this patch are unrelated, right?
Right, I should have named this message something like "refine cost
computation of expressions in IVOPT", matching the patch file.

Thanks.
bin






Re: [wide-int] Update main comment

2013-11-04 Thread Richard Biener
On Wed, 30 Oct 2013, Richard Sandiford wrote:

> Kenneth Zadeck  writes:
> > On 10/30/2013 07:01 AM, Richard Sandiford wrote:
> >> Kenneth Zadeck  writes:
> >>> On 10/29/2013 06:37 PM, Richard Sandiford wrote:
>  This patch tries to update the main wide_int comment to reflect the 
>  current
>  implementation.
> 
>  - bitsizetype is TImode on x86_64 and others, so I don't think it's
>  necessarily true that all offset_ints are signed.  (widest_int are
>  though.)
> >>> i am wondering if this is too conservative an interpretation.  I
> >>> believe that they are TImode because that is the next thing after DImode
> >>> and so they wanted to accommodate the 3 extra bits. Certainly there
> >>> is no x86 that is able to address more than 64 bits.
> >> Right, but my point is that it's a different case from widest_int.
> >> It'd be just as valid to do bitsizetype arithmetic using wide_int
> >> rather than offset_int, and those wide_ints would have precision 128,
> >> just like the offset_ints.  And I wouldn't really say that those wide_ints
> >> were fundamentally signed in any way.  Although the tree layer might "know"
> >> that X upper bits of the bitsizetype are always signs, the tree-wide_int
> >> interface treats them in the same way as any other 128-bit type.
> >>
> >> Maybe I'm just being pedantic, but I think offset_int would only be like
> >> widest_int if bitsizetype had precision 67 or whatever.  Then we could
> >> say that both offset_int and widest_int must be wider than any inputs,
> >> meaning that there's at least one leading sign bit.
> > this was of course what mike and i wanted, but we could not really 
> > figure out how to pull it off.
> > in particular, we could not find any existing reliable marker in the 
> > targets to say what the width of the widest pointer is on any 
> > implementation.   We actually used the number 68 rather than 67 because 
> > we assumed 64 for the widest pointer on any existing platform, 3 extra 
> > bits for measuring in bits rather than bytes, and 1 bit for the sign.
> 
> Ah yeah, 68 would be better for signed types.
> 
> Is the patch OK while we still have 128-bit bitsizetypes though?
> I agree the current comment would be right if we ever did switch
> to sub-128 bitsizes.

The issue with a sub-128-bit bitsizetype is code generation quality.
We do generate code for bitsizetype operations (at least from Ada),
so a power-of-two precision is required to avoid a lot of masking
operations.
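To illustrate the code-quality point (a toy model, not GCC code — a 12-bit precision stands in for a hypothetical 67/68-bit bitsizetype):

```c
#include <assert.h>

/* Toy model of arithmetic in a non-power-of-two precision.  Because no
   machine mode has exactly this width, every operation must re-mask its
   result to stay in range -- these extra masking operations are the
   code-generation cost mentioned above.  A power-of-two precision that
   matches a machine mode needs no such masking.  */
#define PREC 12u
#define MASK ((1u << PREC) - 1u)

static unsigned int
add_prec (unsigned int a, unsigned int b)
{
  return (a + b) & MASK;        /* explicit truncation after every op */
}

static unsigned int
mul_prec (unsigned int a, unsigned int b)
{
  return (a * b) & MASK;
}
```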

Richard.


[wide-int] Fix UNSIGNED_FIX typo

2013-11-04 Thread Richard Sandiford
"t" had got switched to "x" during the conversion.  This showed up in
one of the alpha g++.dg tests.

Applied as obvious.

Richard


Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2013-11-03 13:01:39.083385620 +
+++ gcc/simplify-rtx.c  2013-11-03 20:22:41.115469359 +
@@ -1819,7 +1819,8 @@ simplify_const_unary_operation (enum rtx
  if (REAL_VALUES_LESS (t, x))
return immed_wide_int_const (wmax, mode);
 
- return immed_wide_int_const (real_to_integer (&t, &fail, width), mode);
+ return immed_wide_int_const (real_to_integer (&x, &fail, width),
+  mode);
  break;
 
default:


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Fri, 1 Nov 2013, Richard Sandiford wrote:

> I'm building one target for each supported CPU and comparing the wide-int
> assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
> output from the merge point.  This patch removes all the differences I saw
> for alpha-linux-gnu in gcc.c-torture.
> 
> Hunk 1: Preserve the current trunk behaviour in which the shift count
> is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
> inspection after hunk 5.
> 
> Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
> types and where we weren't extending according to the sign of the source.
> We should probably assert that the input is at least as wide as the type...
> 
> Hunk 4: The "&" in:
> 
> if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> 
> had got dropped during the conversion.
> 
> Hunk 5: The old code was:
> 
> if (shift > 0)
>   {
> *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> *val = r1val.llshift (shift, TYPE_PRECISION (type));
>   }
> else if (shift < 0)
>   {
> shift = -shift;
> *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
>   }
> 
> and these precision arguments had two purposes: to control which
> bits of the first argument were shifted, and to control the truncation
> mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
> for the second.
> 
> (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
> end of the !GENERATOR_FILE list in rtl.def, since the current position caused
> some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
> on code, RTL PRE creates registers in the hash order of the associated
> expressions, RA uses register numbers as a tie-breaker during ordering,
> and so the order of rtx_code can indirectly influence register allocation.
> First time I'd realised that could happen, so just thought I'd mention it.
> I think we should keep rtl.def in the current (logical) order though.)
> 
> Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?

Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
from double-int.c on the trunk?  If not then putting this at the
callers of wi::rshift and friends is clearly asking for future
mistakes.

Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
than in machine instruction patterns was a mistake in the past.

Thanks,
Richard.

[btw, please use diff -p to get us at least some context if you
are not writing changelogs ...]

> Thanks,
> Richard
> 
> 
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c  (revision 204247)
> +++ gcc/fold-const.c  (working copy)
> @@ -1008,9 +1008,12 @@
>  The following code ignores overflow; perhaps a C standard
>  interpretation ruling is needed.  */
>   res = wi::rshift (arg1, arg2, sign,
> -   GET_MODE_BITSIZE (TYPE_MODE (type)));
> +   SHIFT_COUNT_TRUNCATED
> +   ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
>else
> - res = wi::lshift (arg1, arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
> + res = wi::lshift (arg1, arg2,
> +   SHIFT_COUNT_TRUNCATED
> +   ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
>break;
>
>  case RROTATE_EXPR:
> @@ -6870,7 +6873,8 @@
>  return NULL_TREE;
>  
>if (TREE_CODE (arg1) == INTEGER_CST)
> -arg1 = force_fit_type (inner_type, arg1, 0, TREE_OVERFLOW (arg1));
> +arg1 = force_fit_type (inner_type, wi::to_widest (arg1), 0,
> +TREE_OVERFLOW (arg1));
>else
>  arg1 = fold_convert_loc (loc, inner_type, arg1);
>  
> @@ -8081,7 +8085,8 @@
>   }
> if (change)
>   {
> -   tem = force_fit_type (type, and1, 0, TREE_OVERFLOW (and1));
> +   tem = force_fit_type (type, wi::to_widest (and1), 0,
> + TREE_OVERFLOW (and1));
> return fold_build2_loc (loc, BIT_AND_EXPR, type,
> fold_convert_loc (loc, type, and0), tem);
>   }
> @@ -14098,12 +14103,13 @@
>   (inner_width, outer_width - inner_width, false, 
>TYPE_PRECISION (TREE_TYPE (arg1)));
>  
> -   if (mask == arg1)
> +   wide_int common = mask & arg1;
> +   if (common == mask)
>   {
> tem_type = signed_type_for (TREE_TYPE (tem));
> tem = fold_convert_loc (loc, tem_type, tem);
>   }
> -   else if ((mask & arg1) == 0)
> +   else if (common == 0)
>   {
> tem_type = unsigned_type_for (TREE_TYPE (tem)

[wide-int] Make integer_onep return false for signed 1-bit bitfields

2013-11-04 Thread Richard Sandiford
As discussed on gcc@, integer_onep is supposed to return false for
a nonzero 1-bit value if the type is signed.

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/tree.c
===
--- gcc/tree.c  2013-10-29 19:19:27.623468618 +
+++ gcc/tree.c  2013-11-02 17:25:06.499657501 +
@@ -2092,7 +2092,7 @@ integer_onep (const_tree expr)
   switch (TREE_CODE (expr))
 {
 case INTEGER_CST:
-  return wi::eq_p (expr, 1);
+  return wi::eq_p (wi::to_widest (expr), 1);
 case COMPLEX_CST:
   return (integer_onep (TREE_REALPART (expr))
  && integer_zerop (TREE_IMAGPART (expr)));
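For background on why the widening matters (an illustrative example, not part of the patch): the only values representable in a signed 1-bit bit-field are 0 and -1, so its nonzero value never compares equal to 1.

```c
#include <assert.h>

/* A signed 1-bit bit-field can represent only 0 and -1, so a nonzero
   value of such a type is -1, never 1 -- which is why integer_onep must
   return false for it.  */
struct s
{
  signed int f : 1;
};

static int
nonzero_value (void)
{
  struct s x;
  x.f = -1;                     /* the only nonzero value of the field */
  return x.f;
}
```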


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Biener wrote:

> On Fri, 1 Nov 2013, Richard Sandiford wrote:
> 
> > I'm building one target for each supported CPU and comparing the wide-int
> > assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
> > output from the merge point.  This patch removes all the differences I saw
> > for alpha-linux-gnu in gcc.c-torture.
> > 
> > Hunk 1: Preserve the current trunk behaviour in which the shift count
> > is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
> > inspection after hunk 5.
> > 
> > Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
> > types and where we weren't extending according to the sign of the source.
> > We should probably assert that the input is at least as wide as the type...
> > 
> > Hunk 4: The "&" in:
> > 
> >   if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> >   && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> > 
> > had got dropped during the conversion.
> > 
> > Hunk 5: The old code was:
> > 
> >   if (shift > 0)
> > {
> >   *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> >   *val = r1val.llshift (shift, TYPE_PRECISION (type));
> > }
> >   else if (shift < 0)
> > {
> >   shift = -shift;
> >   *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> >   *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
> > }
> > 
> > and these precision arguments had two purposes: to control which
> > bits of the first argument were shifted, and to control the truncation
> > mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
> > for the second.
> > 
> > (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
> > end of the !GENERATOR_FILE list in rtl.def, since the current position 
> > caused
> > some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
> > on code, RTL PRE creates registers in the hash order of the associated
> > expressions, RA uses register numbers as a tie-breaker during ordering,
> > and so the order of rtx_code can indirectly influence register allocation.
> > First time I'd realised that could happen, so just thought I'd mention it.
> > I think we should keep rtl.def in the current (logical) order though.)
> > 
> > Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
> 
> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> from double-int.c on the trunk?  If not then putting this at the
> callers of wi::rshift and friends is clearly asking for future
> mistakes.
> 
> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> than in machine instruction patterns was a mistake in the past.

Oh, and my understanding is that it's maybe required for correctness on
the RTL side if middle-end code removes masking operations: later
constant propagation must not be able to change the result because of
that removal.  I'm not sure this is the way it happens, but if we have

  (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))

and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)

  (lshift:si (reg:si 1) (reg:qi 2))

and later constant propagate 77 to reg:qi 2.

That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
but instead the backends should have a define_insn which matches
the 'and' and combine should try to match that.

You can see that various architectures may have truncating shifts
but not for all patterns which results in stuff like

config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD

which clearly means that it's not implemented in terms of providing
shift insns which can match a shift operand that is masked.

That is, either try to clean that up (hooray!), or preserve
the double-int.[ch] behavior (how does CONST_INT RTL constant
folding honor it?).

Richard.

> Thanks,
> Richard.
> 
> [btw, please use diff -p to get us at least some context if you
> are not writing changelogs ...]
> 
> > Thanks,
> > Richard
> > 
> > 
> > Index: gcc/fold-const.c
> > ===
> > --- gcc/fold-const.c(revision 204247)
> > +++ gcc/fold-const.c(working copy)
> > @@ -1008,9 +1008,12 @@
> >The following code ignores overflow; perhaps a C standard
> >interpretation ruling is needed.  */
> > res = wi::rshift (arg1, arg2, sign,
> > - GET_MODE_BITSIZE (TYPE_MODE (type)));
> > + SHIFT_COUNT_TRUNCATED
> > + ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
> >else
> > -   res = wi::lshift (arg1, arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
> > +   res = wi::lshift (arg1, arg2,
> > + SHIFT_COUNT_TRUNCATED
> > + ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
> >break;
> >
> >  case RROTATE_EXPR:
> > @@ -6870,7 +6873,8 @@
> >  return NULL_TREE;
> >  
> >if (

Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 1 Nov 2013, Richard Sandiford wrote:
>> I'm building one target for each supported CPU and comparing the wide-int
>> assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
>> output from the merge point.  This patch removes all the differences I saw
>> for alpha-linux-gnu in gcc.c-torture.
>> 
>> Hunk 1: Preserve the current trunk behaviour in which the shift count
>> is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
>> inspection after hunk 5.
>> 
>> Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
>> types and where we weren't extending according to the sign of the source.
>> We should probably assert that the input is at least as wide as the type...
>> 
>> Hunk 4: The "&" in:
>> 
>>if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
>>&& (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
>> 
>> had got dropped during the conversion.
>> 
>> Hunk 5: The old code was:
>> 
>>if (shift > 0)
>>  {
>>*mask = r1mask.llshift (shift, TYPE_PRECISION (type));
>>*val = r1val.llshift (shift, TYPE_PRECISION (type));
>>  }
>>else if (shift < 0)
>>  {
>>shift = -shift;
>>*mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
>>*val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
>>  }
>> 
>> and these precision arguments had two purposes: to control which
>> bits of the first argument were shifted, and to control the truncation
>> mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
>> for the second.
>> 
>> (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
>> end of the !GENERATOR_FILE list in rtl.def, since the current position caused
>> some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
>> on code, RTL PRE creates registers in the hash order of the associated
>> expressions, RA uses register numbers as a tie-breaker during ordering,
>> and so the order of rtx_code can indirectly influence register allocation.
>> First time I'd realised that could happen, so just thought I'd mention it.
>> I think we should keep rtl.def in the current (logical) order though.)
>> 
>> Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
>
> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> from double-int.c on the trunk?  If not then putting this at the
> callers of wi::rshift and friends is clearly asking for future
> mistakes.
>
> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> than in machine instruction patterns was a mistake in the past.

Sure, I can try, but what counts as success?  For better or worse:

unsigned int i = 1 << 32;

has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour,
so (e.g.) i is 0 for x86_64 and 1 for MIPS.  So if the idea is to
get rid of SHIFT_COUNT_TRUNCATED at the tree level, it will be a
visible change (but presumably only for undefined code).
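For reference, the difference can be modelled like this (an illustrative sketch, not GCC code; fold_lshift is a hypothetical name):

```c
#include <assert.h>

/* Sketch of how constant folding of a left shift behaves depending on
   whether the target truncates out-of-range shift counts
   (SHIFT_COUNT_TRUNCATED).  With truncation (MIPS-like), 1 << 32 in a
   32-bit type becomes 1 << 0 == 1; without it (x86_64-like, as described
   above), the folded result is 0.  */
static unsigned int
fold_lshift (unsigned int arg, unsigned int count, unsigned int width,
             int shift_count_truncated)
{
  if (shift_count_truncated)
    count %= width;             /* truncate the count to the mode width */
  else if (count >= width)
    return 0;                   /* out-of-range count folds to 0 */
  return arg << count;
}
```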

> [btw, please use diff -p to get us at least some context if you
> are not writing changelogs ...]

Yeah, sorry, this whole "no changelog" thing has messed up my workflow :-)
The scripts I normally use did that for me.

Thanks,
Richard


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 4 Nov 2013, Richard Biener wrote:
>
>> On Fri, 1 Nov 2013, Richard Sandiford wrote:
>> 
>> > I'm building one target for each supported CPU and comparing the wide-int
>> > assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
>> > output from the merge point.  This patch removes all the differences I saw
>> > for alpha-linux-gnu in gcc.c-torture.
>> > 
>> > Hunk 1: Preserve the current trunk behaviour in which the shift count
>> > is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
>> > inspection after hunk 5.
>> > 
>> > Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
>> > types and where we weren't extending according to the sign of the source.
>> > We should probably assert that the input is at least as wide as the type...
>> > 
>> > Hunk 4: The "&" in:
>> > 
>> >  if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
>> >  && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
>> > 
>> > had got dropped during the conversion.
>> > 
>> > Hunk 5: The old code was:
>> > 
>> >  if (shift > 0)
>> >{
>> >  *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
>> >  *val = r1val.llshift (shift, TYPE_PRECISION (type));
>> >}
>> >  else if (shift < 0)
>> >{
>> >  shift = -shift;
>> >  *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
>> >  *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
>> >}
>> > 
>> > and these precision arguments had two purposes: to control which
>> > bits of the first argument were shifted, and to control the truncation
>> > mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
>> > for the second.
>> > 
>> > (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
>> > end of the !GENERATOR_FILE list in rtl.def, since the current position 
>> > caused
>> > some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
>> > on code, RTL PRE creates registers in the hash order of the associated
>> > expressions, RA uses register numbers as a tie-breaker during ordering,
>> > and so the order of rtx_code can indirectly influence register allocation.
>> > First time I'd realised that could happen, so just thought I'd mention it.
>> > I think we should keep rtl.def in the current (logical) order though.)
>> > 
>> > Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
>> 
>> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
>> from double-int.c on the trunk?  If not then putting this at the
>> callers of wi::rshift and friends is clearly asking for future
>> mistakes.
>> 
>> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
>> than in machine instruction patterns was a mistake in the past.
>
> Oh, and my understanding is that it's maybe required for correctness on
> the RTL side if middle-end code removes masking operations: later
> constant propagation must not be able to change the result because of
> that removal.  I'm not sure this is the way it happens, but if we have
>
>   (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))
>
> and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)
>
>   (lshift:si (reg:si 1) (reg:qi 2))
>
> and later constant propagate 77 to reg:qi 2.
>
> That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
> but instead the backends should have a define_insn which matches
> the 'and' and combine should try to match that.
>
> You can see that various architectures may have truncating shifts
> but not for all patterns which results in stuff like
>
> config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
>
> which clearly means that it's not implemented in terms of providing
> shift insns which can match a shift operand that is masked.
>
> That is, either try to clean that up (hooray!), or preserve
> the double-int.[ch] behavior (how does CONST_INT RTL constant
> folding honor it?).

The CONST_INT code does the modulus explicitly:

  /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
 the value is in range.  We can't return any old value for
 out-of-range arguments because either the middle-end (via
 shift_truncation_mask) or the back-end might be relying on
 target-specific knowledge.  Nor can we rely on
 shift_truncation_mask, since the shift might not be part of an
 ashlM3, lshrM3 or ashrM3 instruction.  */
  if (SHIFT_COUNT_TRUNCATED)
arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
  else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
return 0;

There was a strong feeling (from me and others) that wide-int.h
should not depend on tm.h.  I don't think wi::lshift itself should
know about SHIFT_COUNT_TRUNCATED.  Especially since (and I think we agree
on this :-)) it's a terrible interface.  It would be better if it took
a 

Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-11-04 Thread Jan Hubicka
> Thanks! Can you attach the patch file for the proposed change?
> 
> David
> 
> On Fri, Nov 1, 2013 at 2:08 PM, Rong Xu  wrote:
> > libgcov.c is the main file used to generate libgcov.a. This single
> > source generates 21 object files which are then archived into a library.
> > These objects serve very different purposes, but they live in the same
> > file, guarded by various macros.
> > The source file has become quite large and its readability is very
> > low. In its current state:
> > 1) The code is very hard to understand by new developers;

Agreed, libgcov has become hard to maintain since several parts are written
in a very low-level way.
> > 2) It makes regressions more likely when making simple changes;
> > More importantly,
> > 3) it makes code reuse/sharing very difficult. For instance, using
> > libgcov to implement an offline profile tool (see below), kernel FDO
> > support, etc.

Yep, it was my longer term plan to do this, too.
> >
> > We came to this issue when working on an offline tool to handle
> > profile counters.
> > Our plan is to have a profile-tool that can merge, diff, collect
> > statistics, and better dump raw profile counters. This is one of the
> > most requested features from our internal users. This tool can also be
> > very useful for any FDO/coverage user. In the first part of the tool,
> > we want to have the weighted merge. Again, to reuse the code, we have
> > to change libgcov.c. But we don't want to further diverge libgcov.a in
> > our google branches from the trunk.
> >
> > Another issue in libgcov.c is the function gcov_exit(). It is a huge
> > function with about 467 lines of code. This function traverses gcov_list
> > and dumps all the gcda files. The code for merging the gcda files also
> > sits in the same function. This structure makes code reuse and extension
> > really difficult (as in kernel FDO, where we had to break this function
> > up to reuse some of the code).
> >
> > We propose to break gcov_exit into smaller functions and split libgcov.c 
> > into
> > several files. Our plan for profile-tool is listed in (3).
> >
> > 1. break gcov_exit() into the following structure:
> >gcov_exit()
> >   --> gcov_exit_compute_summary()
> >   --> allocate_filename_struct()
> >   for gi_ptr in gcov_list
> > --> gcov_exit_dump_gcov()
> >--> gcov_exit_open_gcda_file()
> >--> gcov_exit_merge_gcda ()
> >--> gcov_exit_write_gcda ()
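A minimal model of the proposed control flow (the types and function bodies below are hypothetical stubs; only the call structure mirrors the outline above — the real functions operate on gcov_info records and gcda files):

```c
#include <stddef.h>

/* Toy model of the proposed gcov_exit decomposition.  */
struct gcov_info
{
  struct gcov_info *next;
  int dumped;
};

static struct gcov_info g2 = { NULL, 0 };
static struct gcov_info g1 = { &g2, 0 };
static struct gcov_info *gcov_list = &g1;
static int summary_computed;

static void gcov_exit_compute_summary (void) { summary_computed = 1; }
static void allocate_filename_struct (void) { }

static void
gcov_exit_dump_gcov (struct gcov_info *gi_ptr)
{
  /* In the real code: open, merge and write the gcda file.  */
  gi_ptr->dumped = 1;
}

static void
gcov_exit (void)
{
  gcov_exit_compute_summary ();
  allocate_filename_struct ();
  for (struct gcov_info *gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
    gcov_exit_dump_gcov (gi_ptr);
}
```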

Just a short comment that summaries are also produced during the merging -
i.e. they are done on a per-file basis. But otherwise I agree.
We should be careful not to increase the footprint of libgcov much for
embedded users, but I believe changes like this can be done w/o additional
overhead in the optimized library.
> >
> > 2. split libgcov.c into the following files:
> >  libgcov-profiler.c
> >  libgcov-merge.c
> >  libgcov-interface.c
> >  libgcov-driver.c
> >libgcov-driver-system.c (source included into libgcov-driver.c)

Seems reasonable.  Splitting the I/O stuff into a separate module (driver)
is something I have planned for a long time.  What is the difference
between libgcov-interface and libgcov-driver?
> >
> > The details of the splitting:
> > libgcov-interface.c
> > /* This file contains API functions to other programs and the supporting
> >functions.  */
> >   __gcov_dump()
> >   __gcov_execl()
> >   __gcov_execle()
> >   __gcov_execlp()
> >   __gcov_execv()
> >   __gcov_execve()
> >   __gcov_execvp()
> >   __gcov_flush()
> >   __gcov_fork()
> >   __gcov_reset()
> >   init_mx()
> >   init_mx_once()
> >
> > libgcov-profiler.c
> > /* This file contains runtime profile handlers.  */
> >   variables:
> > __gcov_indirect_call_callee
> > __gcov_indirect_call_counters
> >   functions:
> > __gcov_average_profiler()
> > __gcov_indirect_call_profiler()
> > __gcov_indirect_call_profiler_v2()
> > __gcov_interval_profiler()
> > __gcov_ior_profiler()
> > __gcov_one_value_profiler()
> > __gcov_one_value_profiler_body()
> > __gcov_pow2_profiler()
> >
> > libgcov-merge.c
> > /* This file contains the merge functions for various counters.  */
> >   functions:
> > __gcov_merge_add()
> > __gcov_merge_delta()
> > __gcov_merge_ior()
> > __gcov_merge_single()
> >
> > libcov-driver.c
> > /* This file contains the gcov dumping functions. We separate the
> >system dependent part to libgcov-driver-system.c.  */
> >   variables:
> > gcov_list
> > gcov_max_filename
> > gcov_dump_complete
> > -- newly added file static variables --
> > this_prg
> > all_prg
> > crc32
> > gi_filename
> > fn_buffer
> > fn_tail
> > sum_buffer
> > next_sum_buffer
> > sum_tail
> > - end -
> >   functions:
> > free_fn_data()
> > buffer_fn_data()
> > crc32_unsigned()
> > gcov_version()
> > gcov_histogram_insert()
> > gcov_compute_histogram()
> > 

Re: [PATCH 1/n] Add conditional compare support

2013-11-04 Thread Joey Ye
+  "TARGET_ARM || TARGET_THUMB2"
TARGET_32BIT

+static const char *const ite = "it\t%d4";
+static const int cmp_idx[9] = {0, 0, 1, 0, 1};
s/9/5/

On Wed, Oct 30, 2013 at 5:32 PM, Zhenqiang Chen  wrote:
>
>> -----Original Message-----
>> From: Richard Henderson [mailto:r...@redhat.com]
>> Sent: Monday, October 28, 2013 11:07 PM
>> To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener'
>> Cc: GCC Patches
>> Subject: Re: [PATCH 1/n] Add conditional compare support
>>
>> On 10/28/2013 01:32 AM, Zhenqiang Chen wrote:
>> > Patch is updated according to your comments. Main changes are:
>> > * Add two hooks: legitimize_cmp_combination and
>> > legitimize_ccmp_combination
>> > * Improve document.
>>
>> No, these are not the hooks I proposed.
>>
>> You should *not* have a ccmp_optab, because the middle-end has
>> absolutely no idea what mode TARGET should be in.
>>
>> The hook needs to return an rtx expression appropriate for feeding to
>> cbranch et al.  E.g. for arm,
>>
>>   (ne (reg:CC_Z CC_REGNUM) (const_int 0))
>>
>> after having emitted the instruction that sets CC_REGNUM.
>>
>> We need to push this to the backend because for ia64 the expression needs
>> to be of the form
>>
>>   (ne (reg:BI new_pseudo) (const_int 0))
>
> Thanks for the clarification.
> Patch is updated:
> * ccmp_optab is removed.
> * Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends 
> to generate the conditional compare instructions.
>
> Thanks!
> -Zhenqiang


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 1 Nov 2013, Richard Sandiford wrote:
> >> I'm building one target for each supported CPU and comparing the wide-int
> >> assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
> >> output from the merge point.  This patch removes all the differences I saw
> >> for alpha-linux-gnu in gcc.c-torture.
> >> 
> >> Hunk 1: Preserve the current trunk behaviour in which the shift count
> >> is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
> >> inspection after hunk 5.
> >> 
> >> Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
> >> types and where we weren't extending according to the sign of the source.
> >> We should probably assert that the input is at least as wide as the type...
> >> 
> >> Hunk 4: The "&" in:
> >> 
> >>  if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> >>  && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> >> 
> >> had got dropped during the conversion.
> >> 
> >> Hunk 5: The old code was:
> >> 
> >>  if (shift > 0)
> >>{
> >>  *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> >>  *val = r1val.llshift (shift, TYPE_PRECISION (type));
> >>}
> >>  else if (shift < 0)
> >>{
> >>  shift = -shift;
> >>  *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> >>  *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
> >>}
> >> 
> >> and these precision arguments had two purposes: to control which
> >> bits of the first argument were shifted, and to control the truncation
> >> mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
> >> for the second.
> >> 
> >> (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
> >> end of the !GENERATOR_FILE list in rtl.def, since the current position 
> >> caused
> >> some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
> >> on code, RTL PRE creates registers in the hash order of the associated
> >> expressions, RA uses register numbers as a tie-breaker during ordering,
> >> and so the order of rtx_code can indirectly influence register allocation.
> >> First time I'd realised that could happen, so just thought I'd mention it.
> >> I think we should keep rtl.def in the current (logical) order though.)
> >> 
> >> Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
> >
> > Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> > from double-int.c on the trunk?  If not then putting this at the
> > callers of wi::rshift and friends is clearly asking for future
> > mistakes.
> >
> > Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> > than in machine instruction patterns was a mistake in the past.
> 
> Sure, I can try, but what counts as success?  For better or worse:
> 
> unsigned int i = 1 << 32;
> 
> has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour,
> so (e.g.) i is 0 for x86_64 and 1 for MIPS.  So if the idea is to
> get rid of SHIFT_COUNT_TRUNCATED at the tree level, it will be a
> visible change (but presumably only for undefined code).

Sure - but the above is undefined (as you say).  Generally
constant folding should follow language standards, not machine
dependent behavior.
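The two folding behaviours under discussion can be modeled in a few lines of plain C (illustrative only: fold_shift is not a GCC function, and the width is fixed at 32 bits for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Model of constant folding an out-of-range shift count: a
   SHIFT_COUNT_TRUNCATED target (e.g. MIPS) reduces the count modulo
   the width, so 1 << 32 folds to 1; a non-truncating target such as
   x86_64 treats every bit as shifted out, giving 0.  */
static uint32_t
fold_shift (uint32_t val, unsigned int count, int shift_count_truncated)
{
  const unsigned int width = 32;
  if (count >= width)
    {
      if (shift_count_truncated)
        count %= width;   /* MIPS-style truncation */
      else
        return 0;         /* all bits shifted out */
    }
  return val << count;
}
```

With this model, fold_shift (1, 32, 1) is 1 and fold_shift (1, 32, 0) is 0, matching the values of i quoted above.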

> > [btw, please use diff -p to get us at least some context if you
> > are not writing changelogs ...]
> 
> Yeah, sorry, this whole "no changelog" thing has messed up my workflow :-)
> The scripts I normally use did that for me.

;)

Richard.


RE: [PATCH 1/n] Add conditional compare support

2013-11-04 Thread Zhenqiang Chen

> -Original Message-
> From: Richard Henderson [mailto:r...@redhat.com]
> Sent: Friday, November 01, 2013 12:27 AM
> To: Zhenqiang Chen
> Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches
> Subject: Re: [PATCH 1/n] Add conditional compare support
> 
> On 10/31/2013 02:06 AM, Zhenqiang Chen wrote:
> > With two compares, we might swap the order of the two compares to get
> > a valid combination. This is what current ARM does (Please refer
> > arm_select_dominace_cc_mode, cmp_and, cmp_ior, and_scc_scc and
> ior_scc_scc).
> > To improve gen_ccmp_first, we need another function/hook to determine
> > which compare is done first. I will double check ARM backend to avoid hook.
> 
> Hmm.  I hadn't thought of that, and missed that point when looking through
> your previous patches.  I think you should have a third hook for that.  ARM
> will need it, but IA-64 doesn't.

Thanks. I added a new hook. The default function returns -1 if the target 
does not care about the order.

+DEFHOOK
+(select_ccmp_cmp_order,
+ "For some targets (like ARM), the order of the two compares matters for\n\
+conditional compare: cmp0-cmp1 might be an invalid combination, but with\n\
+the order swapped, cmp1-cmp0 is valid.  The function returns\n\
+  -1: if @code{code1} and @code{code2} are a valid combination.\n\
+   1: if @code{code2} and @code{code1} are a valid combination.\n\
+   0: if both orders are invalid.",
+ int, (int code1, int code2),
+ default_select_ccmp_cmp_order)
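A sketch of the proposed default, together with how expansion might consult it (the caller shape is an assumption based on the description above, not the final interface):

```c
#include <assert.h>

/* Default hook: the target does not care about the order, so code1
   followed by code2 is treated as a valid combination.  */
static int
default_select_ccmp_cmp_order (int code1, int code2)
{
  (void) code1;
  (void) code2;
  return -1;
}

/* Hypothetical caller: decide whether the two compares should be
   swapped before emitting a ccmp sequence.  Returns 1 if they should
   be swapped, 0 if not, and -1 if neither order is supported.  */
static int
maybe_swap_compares (int code1, int code2, int (*order_hook) (int, int))
{
  int order = order_hook (code1, code2);
  if (order == 0)
    return -1;                 /* both orders invalid: no ccmp sequence */
  return order == 1 ? 1 : 0;   /* 1 means emit code2's compare first */
}
```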
 
> > Hook gen_ccmp_next needs one more parameter to indicate AND/IOR
> since
> > they will generate different instructions.
> 
> True, I forgot about that while writing the hook descriptions.

For gen_ccmp_next, I added another parameter, CC_P, to indicate whether the 
result is used as a CC. If CC_P is false, gen_ccmp_next will return a general 
register. This is for code like

int test (int a, int b)
{
  return a > 0 && b > 0;
}
During expansion, there might be no branch at all, so gen_ccmp_next cannot 
return a CC for "a > 0 && b > 0".

+DEFHOOK
+(gen_ccmp_next,
+ "This function emits a conditional comparison within a sequence of\n\
+ conditional comparisons.  The @code{prev} expression is the result of a\n\
+ prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return\n\
+ @code{NULL} if the combination of @code{prev} and this comparison is\n\
+ not supported, otherwise the result must be appropriate for passing to\n\
+ @code{gen_ccmp_next} or @code{cbranch_optab} if @code{cc_p} is true.\n\
+ If @code{cc_p} is false, it returns a general register.  @code{bit_code}\n\
+ is AND or IOR, the operation combining the two compares.",
+ rtx, (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code, bool cc_p),
+ NULL)

An updated patch is attached.

Thanks!
-Zhenqiang

ccmp-hook3.patch
Description: Binary data


Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-11-04 Thread James Greenhalgh
On Fri, Nov 01, 2013 at 04:48:53PM +, Cong Hou wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or
>  wider than the mode of the product. The result is placed in operand 0, which
>  is of the same mode as operand 3.
> 
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference, 
> which
> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a 
> mode
> +equal or wider than the mode of the absolute difference. The result is placed
> +in operand 0, which is of the same mode as operand 3.
> +
>  @cindex @code{ssum_widen@var{m3}} instruction pattern
>  @item @samp{ssum_widen@var{m3}}
>  @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644

I'm not sure I follow, and if I do, I don't think it matches what
you have implemented for i386.

From your text description I would guess the series of operations to be:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]

But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]

To my mind, synthesizing the reduce_plus step will be wasteful for targets
that do not get it for free with their absolute-difference step.  Imagine a
simple loop where we have synthesized the reduce_plus: we compute partial
sums on each loop iteration, though we would be better off leaving the
reduce_plus step until after the loop.  "REDUC_PLUS_EXPR" would be the
appropriate tree code for this.

I would prefer to see this Tree code not imply the reduce_plus.
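The two readings can be pinned down with scalar models (a sketch: the element count VL and the element types are arbitrary choices for illustration, not part of the pattern definition):

```c
#include <assert.h>
#include <stdlib.h>

#define VL 4   /* vector length, chosen for the example */

/* Per-element semantics, as the md.texi text reads: each absolute
   difference is widened and added to the corresponding element of
   operand 3.  */
static void
sad_per_element (const signed char *a, const signed char *b,
                 const int *acc, int *out)
{
  for (int i = 0; i < VL; i++)
    out[i] = abs (a[i] - b[i]) + acc[i];
}

/* PSADBW-style semantics: the absolute differences are first reduced
   to a single sum (reduce_plus), which is then added to operand 3.  */
static int
sad_reduced (const signed char *a, const signed char *b, int acc)
{
  int sum = 0;
  for (int i = 0; i < VL; i++)
    sum += abs (a[i] - b[i]);
  return sum + acc;
}
```

For a = {1, 5, 3, 9} and b = {4, 2, 3, 1}, the per-element form yields {3, 3, 0, 8} while the reduced form yields the single value 14, which is the distinction raised above.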

Thanks,
James



Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Mon, 4 Nov 2013, Richard Biener wrote:
> >
> >> On Fri, 1 Nov 2013, Richard Sandiford wrote:
> >> 
> >> > I'm building one target for each supported CPU and comparing the wide-int
> >> > assembly output of gcc.c-torture, gcc.dg and g++.dg with the 
> >> > corresponding
> >> > output from the merge point.  This patch removes all the differences I 
> >> > saw
> >> > for alpha-linux-gnu in gcc.c-torture.
> >> > 
> >> > Hunk 1: Preserve the current trunk behaviour in which the shift count
> >> > is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
> >> > inspection after hunk 5.
> >> > 
> >> > Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
> >> > types and where we weren't extending according to the sign of the source.
> >> > We should probably assert that the input is at least as wide as the 
> >> > type...
> >> > 
> >> > Hunk 4: The "&" in:
> >> > 
> >> >if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> >> >&& (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> >> > 
> >> > had got dropped during the conversion.
> >> > 
> >> > Hunk 5: The old code was:
> >> > 
> >> >if (shift > 0)
> >> >  {
> >> >*mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> >> >*val = r1val.llshift (shift, TYPE_PRECISION (type));
> >> >  }
> >> >else if (shift < 0)
> >> >  {
> >> >shift = -shift;
> >> >*mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >*val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >  }
> >> > 
> >> > and these precision arguments had two purposes: to control which
> >> > bits of the first argument were shifted, and to control the truncation
> >> > mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
> >> > for the second.
> >> > 
> >> > (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to 
> >> > the
> >> > end of the !GENERATOR_FILE list in rtl.def, since the current position 
> >> > caused
> >> > some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
> >> > on code, RTL PRE creates registers in the hash order of the associated
> >> > expressions, RA uses register numbers as a tie-breaker during ordering,
> >> > and so the order of rtx_code can indirectly influence register 
> >> > allocation.
> >> > First time I'd realised that could happen, so just thought I'd mention 
> >> > it.
> >> > I think we should keep rtl.def in the current (logical) order though.)
> >> > 
> >> > Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
> >> 
> >> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> >> from double-int.c on the trunk?  If not then putting this at the
> >> callers of wi::rshift and friends is clearly asking for future
> >> mistakes.
> >> 
> >> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> >> than in machine instruction patterns was a mistake in the past.
> >
> > Oh, and my understanding is that it's maybe required for correctness on
> > the RTL side if middle-end code removes masking operations.
> > Constant propagation results later may not change the result because
> > of that.  I'm not sure this is the way it happens, but if we have
> >
> >   (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))
> >
> > and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)
> >
> >   (lshift:si (reg:si 1) (reg:qi 2))
> >
> > and later constant propagate 77 to reg:qi 2.
> >
> > That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
> > but instead the backends should have a define_insn which matches
> > the 'and' and combine should try to match that.
> >
> > You can see that various architectures may have truncating shifts
> > but not for all patterns which results in stuff like
> >
> > config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
> >
> > which clearly means that it's not implemented in terms of providing
> > shift insns which can match a shift operand that is masked.
> >
> > That is, either try to clean that up (hooray!), or preserve
> > the double-int.[ch] behavior (how does CONST_INT RTL constant
> > folding honor it?).
> 
> The CONST_INT code does the modulus explicitly:
> 
> /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
>the value is in range.  We can't return any old value for
>out-of-range arguments because either the middle-end (via
>shift_truncation_mask) or the back-end might be relying on
>target-specific knowledge.  Nor can we rely on
>shift_truncation_mask, since the shift might not be part of an
>ashlM3, lshrM3 or ashrM3 instruction.  */
> if (SHIFT_COUNT_TRUNCATED)
>   arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
> else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
>   return 0;
> 
> There was a stro

Re: [PATCH] Fix up reassoc (PR tree-optimization/58946)

2013-11-04 Thread Richard Biener
On Fri, 1 Nov 2013, Jakub Jelinek wrote:

> Hi!
> 
> My recent reassoc patch caused following regression (though, it only started
> failing on this testcase with Andrew's ifcombine changes).
> 
> The issue is that update_ops relies on walking the same stmts as get_ops
> did, and uses has_single_uses (either directly or indirectly through
> is_reassociable_op).  optimize_range_tests itself doesn't change the IL
> except for inserting new stmts using values on which get_ops already didn't
> recurse (either because they were multiple uses or non-reassociable).
> The problem in the testcase is when optimizing a GIMPLE_COND directly, there
> is no guarantee of single use, we treat the condition as the starting point
> of init_range_info and thus SSA_NAME != 0 or SSA_NAME == 0 etc. and that
> is just fine if SSA_NAME has multiple uses, so if we first change the
> condition to something else (as instructed by the changed ops[i]->op value
> from NULL to some SSA_NAME), we might turn something update_ops looks at
> from multiple uses into single use.
> 
> This patch fixes it by doing all the update_ops calls before changing
> GIMPLE_CONDs themselves.  I believe it is safe, update_ops will walk only
> single use SSA_NAMEs and thus they occur only in the single particular
> update_ops call, and never removes anything, only adds new stmt (which
> can make single use SSA_NAMEs into multiple use, but that happened after
> we've walked that originally single use exactly ones from the single use),
> and GIMPLE_COND adjustments never use has_single_use, thus they can be
> safely done after all update_ops have been called.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2013-11-01  Jakub Jelinek  
> 
>   PR tree-optimization/58946
>   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Update all
>   bbs with bbinfo[idx].op != NULL before all blocks with
>   bbinfo[idx].op == NULL.
> 
>   * gcc.c-torture/compile/pr58946.c: New test.
> 
> --- gcc/tree-ssa-reassoc.c.jj 2013-10-24 10:19:21.0 +0200
> +++ gcc/tree-ssa-reassoc.c2013-11-01 09:23:09.264615181 +0100
> @@ -2657,6 +2657,7 @@ maybe_optimize_range_tests (gimple stmt)
>edge e;
>vec ops = vNULL;
>vec bbinfo = vNULL;
> +  bool any_changes = false;
>  
>/* Consider only basic blocks that end with GIMPLE_COND or
>   a cast statement satisfying final_range_test_p.  All
> @@ -2870,41 +2871,31 @@ maybe_optimize_range_tests (gimple stmt)
>   break;
>  }
>if (ops.length () > 1)
> +any_changes = optimize_range_tests (ERROR_MARK, &ops);
> +  if (any_changes)
>  {
>unsigned int idx;
> -  bool any_changes = optimize_range_tests (ERROR_MARK, &ops);
> -  for (bb = last_bb, idx = 0; any_changes; bb = single_pred (bb), idx++)
> +  /* update_ops relies on has_single_use predicates returning the
> +  same values as it did during get_ops earlier.  Additionally it
> +  never removes statements, only adds new ones and it should walk
> +  from the single imm use and check the predicate already before
> +  making those changes.
> +  On the other side, the handling of GIMPLE_COND directly can turn
> +  previously multiply used SSA_NAMEs into single use SSA_NAMEs, so
> +  it needs to be done in a separate loop afterwards.  */
> +  for (bb = last_bb, idx = 0; ; bb = single_pred (bb), idx++)
>   {
> -   if (bbinfo[idx].first_idx < bbinfo[idx].last_idx)
> +   if (bbinfo[idx].first_idx < bbinfo[idx].last_idx
> +   && bbinfo[idx].op != NULL_TREE)
>   {
> -   gimple stmt = last_stmt (bb);
> tree new_op;
>  
> -   if (bbinfo[idx].op == NULL_TREE)
> - {
> -   if (ops[bbinfo[idx].first_idx]->op != NULL_TREE)
> - {
> -   if (integer_zerop (ops[bbinfo[idx].first_idx]->op))
> - gimple_cond_make_false (stmt);
> -   else if (integer_onep (ops[bbinfo[idx].first_idx]->op))
> - gimple_cond_make_true (stmt);
> -   else
> - {
> -   gimple_cond_set_code (stmt, NE_EXPR);
> -   gimple_cond_set_lhs (stmt,
> -ops[bbinfo[idx].first_idx]->op);
> -   gimple_cond_set_rhs (stmt, boolean_false_node);
> - }
> -   update_stmt (stmt);
> - }
> -   bbinfo[idx].op = new_op = boolean_false_node;
> - }
> -   else
> - new_op = update_ops (bbinfo[idx].op,
> -  (enum tree_code)
> -  ops[bbinfo[idx].first_idx]->rank,
> -  ops, &bbinfo[idx].first_idx,
> -  loop_containing_stmt (stmt));
> +   stmt = last_stmt (bb);
> +   new_op = 

PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
emit_block_move_via_movmem and set_storage_via_setmem have

  for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
   mode = GET_MODE_WIDER_MODE (mode))
{
  enum insn_code code = direct_optab_handler (movmem_optab, mode);

  if (code != CODE_FOR_nothing
  /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
 here because if SIZE is less than the mode mask, as it is
 returned by the macro, it will definitely be less than the
 actual mode mask.  */
  && ((CONST_INT_P (size)
   && ((unsigned HOST_WIDE_INT) INTVAL (size)
   <= (GET_MODE_MASK (mode) >> 1)))
  || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
{

Backends may assume that the mode of the size in the movmem and setmem
expanders is no wider than Pmode, since the size is within the Pmode address
space.  The x86 backend's expand_set_or_movmem_prologue_epilogue_by_misaligned
has

  rtx saveddest = *destptr;

  gcc_assert (desired_align <= size);
  /* Align destptr up, place it to new register.  */
  *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
  GEN_INT (prolog_size),
  NULL_RTX, 1, OPTAB_DIRECT);
  *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
  GEN_INT (-desired_align),
  *destptr, 1, OPTAB_DIRECT);
  /* See how many bytes we skipped.  */
  saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
   *destptr,
   saveddest, 1, OPTAB_DIRECT);
  /* Adjust srcptr and count.  */
  if (!issetmem)
*srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
saveddest,
*srcptr, 1, OPTAB_DIRECT);
  *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
saveddest, *count, 1, OPTAB_DIRECT);

saveddest is a negative number in Pmode and *count is in word_mode.  For
x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
overflows.  We could fix it by using the mode of saveddest to compute
saveddest + *count, but that leads to extra conversions, and other backends
may run into the same problem.  A better fix is to limit the mode of the size
in the movmem and setmem expanders to Pmode.  It generates better and correct
memcpy and memset code for x32.
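The overflow can be modeled in plain C (illustrative only: uint32_t stands in for Pmode/SImode, uint64_t for word_mode/DImode, and adjust_count is not GCC code):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the failure mode: saveddest is the negated number of bytes
   skipped by the alignment prologue, computed in Pmode (32 bits on
   x32), while *count lives in word_mode (64 bits).  Zero-extending the
   negative Pmode value before the addition corrupts the count.  */
static uint64_t
adjust_count (uint64_t count, uint32_t saveddest, int sign_extend)
{
  if (sign_extend)
    return count + (uint64_t) (int64_t) (int32_t) saveddest;
  return count + (uint64_t) saveddest;   /* the buggy zero-extension */
}
```

With count = 100 and saveddest = -3 in Pmode, the sign-extending form yields the expected 97, while the zero-extending form yields 100 + 0xFFFFFFFD, i.e. a count that is about 4 GiB too large.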

There is also a typo in the comments: it should be BITS_PER_WORD, not
BITS_PER_HOST_WIDE_INT.

Tested on x32.  OK to install?

Thanks.


H.J.
---
gcc/

2013-11-04  H.J. Lu  

PR middle-end/58981
* expr.c (emit_block_move_via_movmem): Don't use mode wider than
Pmode for size.
(set_storage_via_setmem): Likewise.

gcc/testsuite/

2013-11-04  H.J. Lu  

PR middle-end/58981
* gcc.dg/pr58981.c: New test.

diff --git a/gcc/expr.c b/gcc/expr.c
index 551a660..1a869650 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1294,13 +1294,16 @@ emit_block_move_via_movmem (rtx x, rtx y, rtx size, 
unsigned int align,
   enum insn_code code = direct_optab_handler (movmem_optab, mode);
 
   if (code != CODE_FOR_nothing
- /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
-here because if SIZE is less than the mode mask, as it is
+ /* We don't need MODE to be narrower than BITS_PER_WORD here
+because if SIZE is less than the mode mask, as it is
 returned by the macro, it will definitely be less than the
-actual mode mask.  */
+actual mode mask unless MODE is wider than Pmode.  Since
+SIZE is within the Pmode address space, we should use
+Pmode in this case.  */
  && ((CONST_INT_P (size)
   && ((unsigned HOST_WIDE_INT) INTVAL (size)
   <= (GET_MODE_MASK (mode) >> 1)))
+ || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode)
  || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
{
  struct expand_operand ops[6];
@@ -2879,13 +2882,16 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, 
unsigned int align,
   enum insn_code code = direct_optab_handler (setmem_optab, mode);
 
   if (code != CODE_FOR_nothing
- /* We don't need MODE to be narrower than
-BITS_PER_HOST_WIDE_INT here because if SIZE is less than
-the mode mask, as it is returned by the macro, it will
-definitely be less than the actual mode mask.  */
+ /* We don't need MODE to be narrower than BITS_PER_WORD here
+because if SIZE is less than the mode mask, as it is
+returned by the macro, it will definitely be less than the
+actual mode mask unless MODE is wider than Pmode.  Since
+SIZE is within the Pmode address space, we shou

Re: RFC: simd enabled functions (omp declare simd / elementals)

2013-11-04 Thread Richard Biener
On Fri, Nov 1, 2013 at 4:04 AM, Aldy Hernandez  wrote:
> Hello gentlemen.  I'm CCing all of you, because each of you can provide
> valuable feedback to various parts of the compiler which I touch.  I have
> sprinkled love notes with your names throughout the post :).
>
> This is a patch against the gomp4 branch.  It provides initial support for
> simd-enabled functions which are "#pragma omp declare simd" in the OpenMP
> world and elementals in Cilk Plus nomenclature.  The parsing bits for OpenMP
> are already in trunk, but they are silently ignored.  This patch aims to
> remedy the situation.  The Cilk Plus parsing bits, OTOH, are not ready, but
> could trivially be adapted to use this infrastructure (see below).
>
> I would like to at least get this into the gomp4 branch for now, because I
> am accumulating far too many changes locally.
>
> The main idea is that for a simd annotated function, we can create one or
> more cloned vector variants of a scalar function that can later be used by
> the vectorizer.
>
> For a simple example with multiple returns...
>
> #pragma omp declare simd simdlen(4) notinbranch
> int foo (int a, int b)
> {
>   if (a == b)
> return 555;
>   else
> return 666;
> }
>
> ...we would generate with this patch (unoptimized):

Just a quick question, queueing the thread for later review (aww, queue
back at >100 threads again - I can't get any work done anymore :().

What does #pragma omp declare simd guarantee about memory
side-effects and memory use in the function?  That is, unless the
function can be safely annotated with the 'const' attribute, the
whole thing is useless for the vectorizer.

Thanks,
Richard.

> foo.simdclone.0 (vector(4) int simd.4, vector(4) int simd.5)
> {
>   unsigned int iter.6;
>   int b.3[4];
>   int a.2[4];
>   int retval.1[4];
>   int _3;
>   int _5;
>   int _6;
>   vector(4) int _7;
>
>   :
>   a.2 = VIEW_CONVERT_EXPR(simd.4);
>   b.3 = VIEW_CONVERT_EXPR(simd.5);
>   iter.6_12 = 0;
>
>   :
>   # iter.6_9 = PHI 
>   _5 = a.2[iter.6_9];
>   _6 = b.3[iter.6_9];
>   if (_5 == _6)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # _3 = PHI <555(3), 666(4)>
>   retval.1[iter.6_9] = _3;
>   iter.6_14 = iter.6_9 + 1;
>   if (iter.6_14 < 4)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;
>
>   :
>   _7 = VIEW_CONVERT_EXPR(retval.1);
>   return _7;
>
> }
>
> The new loop is properly created and annotated with loop->force_vect=true
> and loop->safelen set.
>
> A possible use may be:
>
> int array[1000];
> void bar ()
> {
>   int i;
>   for (i=0; i < 1000; ++i)
> array[i] = foo(i, 123);
> }
>
> In which case, we would use the simd clone if available:
>
> bar ()
> {
>   vector(4) int vect_cst_.21;
>   vector(4) int vect_i_6.20;
>   vector(4) int * vectp_array.19;
>   vector(4) int * vectp_array.18;
>   vector(4) int vect_cst_.17;
>   vector(4) int vect__4.16;
>   vector(4) int vect_vec_iv_.15;
>   vector(4) int vect_cst_.14;
>   vector(4) int vect_cst_.13;
>   int stmp_var_.12;
>   int i;
>   unsigned int ivtmp_1;
>   int _4;
>   unsigned int ivtmp_7;
>   unsigned int ivtmp_20;
>   unsigned int ivtmp_21;
>
>   :
>   vect_cst_.13_8 = { 0, 1, 2, 3 };
>   vect_cst_.14_2 = { 4, 4, 4, 4 };
>   vect_cst_.17_13 = { 123, 123, 123, 123 };
>   vectp_array.19_15 = &array;
>   vect_cst_.21_5 = { 1, 1, 1, 1 };
>   goto ;
>
>   :
>
>   :
>   # i_9 = PHI 
>   # ivtmp_1 = PHI 
>   # vect_vec_iv_.15_11 = PHI 
>   # vectp_array.18_16 = PHI 
>   # ivtmp_20 = PHI 
>   vect_vec_iv_.15_12 = vect_vec_iv_.15_11 + vect_cst_.14_2;
>   vect__4.16_14 = foo.simdclone.0 (vect_vec_iv_.15_11, vect_cst_.17_13);
>   _4 = 0;
>   MEM[(int *)vectp_array.18_16] = vect__4.16_14;
>   vect_i_6.20_19 = vect_vec_iv_.15_11 + vect_cst_.21_5;
>   i_6 = i_9 + 1;
>   ivtmp_7 = ivtmp_1 - 1;
>   vectp_array.18_17 = vectp_array.18_16 + 16;
>   ivtmp_21 = ivtmp_20 + 1;
>   if (ivtmp_21 < 250)
> goto ;
>   else
> goto ;
>
>   :
>   return;
>
> }
>
> That's the idea.
>
> Some of the ABI issues still need to be resolved (mangling for avx-512, what
> to do with non x86 architectures, what (if any) default clones will be
> created when no vector length is specified, etc etc), but the main
> functionality can be seen above.
>
> Uniform and linear parameters (which are passed as scalars) are still not
> handled.  Also, Jakub mentioned that with the current vectorizer we probably
> can't make good use of the inbranch/masked clones.  I have a laundry list of
> missing things prepended by // FIXME if anyone is curious.
>
> I'd like some feedback from y'all in your respective areas, since this
> touches a few places besides OpenMP.  For instance...
>
> [Honza] Where do you suggest I place a list of simd clones for a particular
> (scalar) function?  Right now I have added a simdclone_of field in
> cgraph_node and am (temporarily) serially scanning all functions in
> get_simd_clone().  This is obviously inefficient.  I didn't know whether to
> use the current next_sibling_clone/etc fields or create my o

Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP

2013-11-04 Thread Tom de Vries
On 04/11/13 01:59, Tom de Vries wrote:
> On 29/10/13 14:54, Jakub Jelinek wrote:
>> +/* Return true if all imm uses of VAR are either in STMT, or
>> +   feed (optionally through a chain of single imm uses) GIMPLE_COND
>> +   in basic block COND_BB.  */
>> +
>> +static bool
>> +all_imm_uses_in_stmt_or_feed_cond (tree var, gimple stmt, basic_block 
>> cond_bb)
>> +{
>> +  use_operand_p use_p, use2_p;
>> +  imm_use_iterator iter;
>> +
>> +  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
>> +if (USE_STMT (use_p) != stmt)
>> +  {
>> +gimple use_stmt = USE_STMT (use_p);
>> +if (is_gimple_debug (use_stmt))
>> +  continue;
>> +while (is_gimple_assign (use_stmt)
>> +   && single_imm_use (gimple_assign_lhs (use_stmt),
>> +  &use2_p, &use_stmt))
>> +  ;
>> +if (gimple_code (use_stmt) != GIMPLE_COND
>> +|| gimple_bb (use_stmt) != cond_bb)
>> +  return false;
>> +  }
>> +  return true;
>> +}
>> +
> 
> Jakub,
> 
> with g++.dg/torture/pr54684.C and -fno-tree-tail-merge, I run into a sigsegv
> because of gimple_code (use_stmt) being called with use_stmt == NULL.
> 

It's a duplicate of PR58978 - '[4.9 Regression] ICE: Segmentation fault'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58978

Thanks,
- Tom



Re: [PATCH] Time profiler - phase 1

2013-11-04 Thread Jan Hubicka
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index fca665b..3b62bcc 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,31 @@
> +2013-10-29  Martin Liska  
> + Jan Hubicka  
> +
> + * cgraph.c (dump_cgraph_node): Profile dump added.
> + * cgraph.h (struct cgraph_node): New time profile variable added.
> + * cgraphclones.c (cgraph_clone_node): Time profile is cloned.
> + * gcov-io.h (gcov_type): New profiler type introduced.
> + * ipa-profile.c (lto_output_node): Streaming for time profile added.
> + (input_node): Time profiler is read from LTO stream.
> + * predict.c (maybe_hot_count_p): Hot prediction changed.
> + * profile.c (instrument_values): New case for time profiler added.
> + (compute_value_histograms): Read of time profile.
> + * tree-pretty-print.c (dump_function_header): Time profiler is dumped.
> + * tree-profile.c (init_ic_make_global_vars): Time profiler function 
> added.
> + (gimple_init_edge_profiler): TP function instrumentation.
> + (gimple_gen_time_profiler): New.
> + * value-prof.c (gimple_add_histogram_value): Support for time profiler
> + added.
> + (dump_histogram_value): TP type added to dumps.
> + (visit_hist): More sensitive check that takes TP into account.
> + (gimple_find_values_to_profile): TP instrumentation.
> + * value-prof.h (hist_type): New histogram type added.
> + (struct histogram_value_t): Pointer to struct function added.
> + * libgcc/Makefile.in: New GCOV merge function for TP added.
> + * libgcov.c: function_counter variable introduced.
> + (_gcov_merge_time_profile): New.
> + (_gcov_time_profiler): New. 
> +
>  2013-10-29  David Malcolm  
>  
>   * doc/gty.texi ("Inheritance and GTY"): Make it clear that
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 52d9ab0..c95a54e 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node)
>if (node->profile_id)
>  fprintf (f, "  Profile id: %i\n",
>node->profile_id);
> +  fprintf (f, "  First run: %i\n", node->tp_first_run);
>fprintf (f, "  Function flags:");
>if (node->count)
>  fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 7706419..479d49f 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info
>bitmap combined_args_to_skip;
>  };
>  
> -
>  /* The cgraph data structure.
> Each function decl has assigned cgraph_node listing callees and callers.  
> */
>  
> @@ -324,6 +323,8 @@ struct GTY(()) cgraph_node {
>unsigned tm_clone : 1;
>/* True if this decl is a dispatcher for function versions.  */
>unsigned dispatcher_function : 1;
> +  /* Time profiler: first run of function.  */
> +  int tp_first_run;

Move this up after profile_id.
> --- a/gcc/gcov-io.c
> +++ b/gcc/gcov-io.c
> @@ -68,7 +68,7 @@ gcov_open (const char *name, int mode)
>  #if IN_LIBGCOV
>const int mode = 0;
>  #endif
> -#if GCOV_LOCKED
> +#if GCOV_LOCKED  
>struct flock s_flock;
>int fd;
>  
Accidental change?
> @@ -651,6 +658,9 @@ lto_symtab_prevailing_decl (tree decl)
>if (TREE_CODE (decl) == FUNCTION_DECL && DECL_BUILT_IN (decl))
>  return decl;
>  
> +  if (!DECL_ASSEMBLER_NAME_SET_P (decl))
> +return decl;
> +
>/* Ensure DECL_ASSEMBLER_NAME will not set assembler name.  */
>gcc_assert (DECL_ASSEMBLER_NAME_SET_P (decl));
>  
Remove this change - it is unrelated hack from my old tree.
> diff --git a/gcc/predict.c b/gcc/predict.c
> index cc9a053..4b655d3 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -170,7 +170,7 @@ maybe_hot_count_p (struct function *fun, gcov_type count)
>if (fun && profile_status_for_function (fun) != PROFILE_READ)
>  return true;
>/* Code executed at most once is not hot.  */
> -  if (profile_info->runs >= count)
> +  if (count <= 1)
>  return false;
>return (count >= get_hot_bb_threshold ());
>  }
And also this change.
> @@ -895,9 +907,19 @@ compute_value_histograms (histogram_values values, 
> unsigned cfg_checksum,
>hist->hvalue.counters =  XNEWVEC (gcov_type, hist->n_counters);
>for (j = 0; j < hist->n_counters; j++)
>  if (aact_count)
> -   hist->hvalue.counters[j] = aact_count[j];
> - else
> -   hist->hvalue.counters[j] = 0;
> +  hist->hvalue.counters[j] = aact_count[j];
> +else
> +  hist->hvalue.counters[j] = 0;
> +
> +  if (hist->type == HIST_TYPE_TIME_PROFILE)
> +{
> +  node = cgraph_get_node (hist->fun->decl);
> +  
> +  node->tp_first_run = hist->hvalue.counters[0];
> +
> +  if (dump_file)
> +fprintf (dump_file, "Read tp_first_run: %d\n", 
> node->tp_first_run);
> +}
Probably add a comment why you need to annotate counter here.
> diff --git a/gcc/tree-pretty-print.c

Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich  wrote:
> Hi,
>
> This patch adds support for the Pointer Bounds Checker to the c-family and
> LTO front-ends.  The main purpose of the front-end changes is to register
> all statically initialized objects for the checker.  We also need to
> register such objects created by the compiler.

You define CHKP as supported in LTO.  That means it has to be supported
for all languages and thus you should drop the langhook.

Richard.

> Thanks,
> Ilya
> --
>
> gcc/
>
> 2013-10-29  Ilya Enkovich  
>
> * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
> * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
> * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
> * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
> * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
> * c/c-parser.c (c_parser_declaration_or_fndef): Register statically
> initialized decls in Pointer Bounds Checker.
> * cp/decl.c (cp_finish_decl): Likewise.
> * gimplify.c (gimplify_init_constructor): Likewise.
>
>
> diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
> index 614c46d..a32bc6b 100644
> --- a/gcc/c/c-lang.c
> +++ b/gcc/c/c-lang.c
> @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c;
>  #define LANG_HOOKS_INIT c_objc_common_init
>  #undef LANG_HOOKS_INIT_TS
>  #define LANG_HOOKS_INIT_TS c_common_init_ts
> +#undef LANG_HOOKS_CHKP_SUPPORTED
> +#define LANG_HOOKS_CHKP_SUPPORTED true
>
>  /* Each front end provides its own lang hook initializer.  */
>  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 9ccae3b..65d83c8 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -1682,6 +1682,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>   maybe_warn_string_init (TREE_TYPE (d), init);
>   finish_decl (d, init_loc, init.value,
>init.original_type, asm_name);
> +
> + /* Register all decls with initializers in Pointer
> +Bounds Checker to generate required static bounds
> +initializers.  */
> + if (DECL_INITIAL (d) != error_mark_node)
> +   chkp_register_var_initializer (d);
> }
> }
>   else
> diff --git a/gcc/cp/cp-lang.c b/gcc/cp/cp-lang.c
> index a7fa8e4..6d138bd 100644
> --- a/gcc/cp/cp-lang.c
> +++ b/gcc/cp/cp-lang.c
> @@ -81,6 +81,8 @@ static tree get_template_argument_pack_elems_folded 
> (const_tree);
>  #define LANG_HOOKS_EH_PERSONALITY cp_eh_personality
>  #undef LANG_HOOKS_EH_RUNTIME_TYPE
>  #define LANG_HOOKS_EH_RUNTIME_TYPE build_eh_type_type
> +#undef LANG_HOOKS_CHKP_SUPPORTED
> +#define LANG_HOOKS_CHKP_SUPPORTED true
>
>  /* Each front end provides its own lang hook initializer.  */
>  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 1e92f2a..db40e75 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -6379,6 +6379,12 @@ cp_finish_decl (tree decl, tree init, bool 
> init_const_expr_p,
>  the class specifier.  */
>   if (!DECL_EXTERNAL (decl))
> var_definition_p = true;
> +
> + /* If var has initilizer then we need to register it in
> +Pointer Bounds Checker to generate static bounds initilizer
> +if required.  */
> + if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
> +   chkp_register_var_initializer (decl);
> }
>/* If the variable has an array type, lay out the type, even if
>  there is no initializer.  It is valid to index through the
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 4f52c27..503450f 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -4111,6 +4111,11 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
>
> walk_tree (&ctor, force_labels_r, NULL, NULL);
> ctor = tree_output_constant_def (ctor);
> +
> +   /* We need to register created constant object to
> +  initialize bounds for pointers in it.  */
> +   chkp_register_var_initializer (ctor);
> +
> if (!useless_type_conversion_p (type, TREE_TYPE (ctor)))
>   ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);
> TREE_OPERAND (*expr_p, 1) = ctor;
> diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> index 0fa0fc9..b6073d9 100644
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -1278,6 +1278,8 @@ static void lto_init_ts (void)
>
>  #undef LANG_HOOKS_INIT_TS
>  #define LANG_HOOKS_INIT_TS lto_init_ts
> +#undef LANG_HOOKS_CHKP_SUPPORTED
> +#define LANG_HOOKS_CHKP_SUPPORTED true
>
>  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
>
> diff --git a/gcc/objc/objc-lang.c b/gcc/objc/objc-lang.c
> index bc0008b..5e7e43b 100644
> --- a/gcc/objc/ob

Re: PR 58958: wrong aliasing info

2013-11-04 Thread Richard Biener
On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse  wrote:
> Hello,
>
> the issue was described in the PR and the message linked from there.
> ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
> detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
> never learns of it and uses the offset+size as if they were meaningful.

Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-&ref->offset, &t1, &t2);
+{
+  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+  &t, 0, &safe);
+  ref->offset = BITS_PER_UNIT * t;
+}

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the caller's error.

Richard.

> Bootstrap+testsuite on x86_64-unknown-linux-gnu.
>
> 2013-11-04  Marc Glisse  
>
> PR tree-optimization/
> gcc/
> * tree-dfa.h (get_addr_base_and_unit_offset_1): Add error reporting.
> * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Use
> get_addr_base_and_unit_offset_1 instead of get_ref_base_and_extent.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/pr58958.c: New file.
>
> --
> Marc Glisse
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +double a[10];
> +int f(int n){
> +  a[3]=9;
> +  __builtin_memset(&a[n],3,sizeof(double));
> +  return a[3]==9;
> +}
> +
> +/* { dg-final { scan-tree-dump " == 9" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
> ___
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: gcc/tree-dfa.h
> ===
> --- gcc/tree-dfa.h  (revision 204302)
> +++ gcc/tree-dfa.h  (working copy)
> @@ -30,65 +30,70 @@ extern tree ssa_default_def (struct func
>  extern void set_ssa_default_def (struct function *, tree, tree);
>  extern tree get_or_create_ssa_default_def (struct function *, tree);
>  extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
>  HOST_WIDE_INT *, HOST_WIDE_INT *);
>  extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
>  extern bool stmt_references_abnormal_ssa_name (gimple);
>  extern void dump_enumerated_decls (FILE *, int);
>
>  /* Returns the base object and a constant BITS_PER_UNIT offset in *POFFSET
> that
> denotes the starting address of the memory access EXP.
> -   Returns NULL_TREE if the offset is not constant or any component
> -   is not BITS_PER_UNIT-aligned.
> +   If the offset is not constant or any component is not
> BITS_PER_UNIT-aligned,
> +   sets *SAFE to false or returns NULL_TREE if SAFE is NULL.
> VALUEIZE if non-NULL is used to valueize SSA names.  It should return
> its argument or a constant if the argument is known to be constant.  */
>  /* ??? This is a static inline here to avoid the overhead of the indirect
> calls
> to VALUEIZE.  But is this overhead really that significant?  And should
> we
> perhaps just rely on WHOPR to specialize the function?  */
>
>  static inline tree
>  get_addr_base_and_unit_offset_1 (tree exp, HOST_WIDE_INT *poffset,
> -tree (*valueize) (tree))
> +tree (*valueize) (tree), bool *safe = NULL)
>  {
>HOST_WIDE_INT byte_offset = 0;
> +  bool issafe = true;
>
>/* Compute cumulative byte-offset for nested component-refs and
> array-refs,
>   and find the ultimate containing object.  */
>while (1)
>  {
>switch (TREE_CODE (exp))
> {
> case BIT_FIELD_REF:
>   {
> HOST_WIDE_INT this_off = TREE_INT_CST_LOW (TREE_OPERAND (exp,
> 2));
> if (this_off % BITS_PER_UNIT)
> - return NULL_TREE;
> -   byte_offset += this_off / BITS_PER_UNIT;
> + issafe = false;
> +   else
> + byte_offset += this_off / BITS_PER_UNIT;
>   }
>   break;
>
> case COMPONENT_REF:
>   {
> tree field = TREE_OPERAND (exp, 1);
> tree this_offset = component_ref_field_offset (exp);
> HOST_WIDE_INT hthis_offset;
>
> if (!this_offset
>  

Re: RFC: simd enabled functions (omp declare simd / elementals)

2013-11-04 Thread Jakub Jelinek
Hi!

On Mon, Nov 04, 2013 at 11:37:19AM +0100, Richard Biener wrote:
> Just a quick question, queueing the thread for later review (aww, queue
> back at >100 threads again - I can't get any work done anymore :().
> 
> What does #pragma omp declare simd guarantee about memory
> side-effects and memory use in the function?  That is, unless the
> function can be safely annotated with the 'const' attribute the
> whole thing is useless for the vectorizer.

The main restriction is:

"The execution of the function or subroutine cannot have any side effects that 
would
alter its execution for concurrent iterations of a SIMD chunk."

There are other restrictions, omp declare simd functions can't call
setjmp/longjmp or throw, etc.

The functions certainly can't be annotated with the const attribute in the
general case; they can read and write memory, but the user is responsible
for making sure that the effects of running the function sequentially as
part of a non-vectorized loop are the same as running the simd clone of it
as part of a vectorized loop with the given simdlen vectorization factor.
So it is certainly meant to be used by the vectorizer; after all, that is
its sole purpose.

Jakub


Re: PR 58958: wrong aliasing info

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
 wrote:
> On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse  wrote:
>> Hello,
>>
>> the issue was described in the PR and the message linked from there.
>> ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
>> detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
>> never learns of it and uses the offset+size as if they were meaningful.
>
> Well...  it certainly learns of it, but it chooses to ignore...
>
>if (TREE_CODE (ptr) == ADDR_EXPR)
> -ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
> -&ref->offset, &t1, &t2);
> +{
> +  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
> +  &t, 0, &safe);
> +  ref->offset = BITS_PER_UNIT * t;
> +}
>
> safe == (t1 != -1 && t1 == t2)
>
> note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
> so I fail to see why it matters at all ...?  That is, if you feed in a wrong
> size then it's the callers error.

I think one issue is that the above code uses get_ref_base_and_extent
on an address.  It should have used get_addr_base_and_unit_offset.

Richard.

> Richard.
>
>> Bootstrap+testsuite on x86_64-unknown-linux-gnu.
>>
>> 2013-11-04  Marc Glisse  
>>
>> PR tree-optimization/
>> gcc/
>> * tree-dfa.h (get_addr_base_and_unit_offset_1): Add error reporting.
>> * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Use
>> get_addr_base_and_unit_offset_1 instead of get_ref_base_and_extent.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/pr58958.c: New file.
>>
>> --
>> Marc Glisse
>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
>> ===
>> --- gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (working copy)
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +double a[10];
>> +int f(int n){
>> +  a[3]=9;
>> +  __builtin_memset(&a[n],3,sizeof(double));
>> +  return a[3]==9;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump " == 9" "optimized" } } */
>> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>>
>> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
>> ___
>> Added: svn:keywords
>> ## -0,0 +1 ##
>> +Author Date Id Revision URL
>> \ No newline at end of property
>> Added: svn:eol-style
>> ## -0,0 +1 ##
>> +native
>> \ No newline at end of property
>> Index: gcc/tree-dfa.h
>> ===
>> --- gcc/tree-dfa.h  (revision 204302)
>> +++ gcc/tree-dfa.h  (working copy)
>> @@ -30,65 +30,70 @@ extern tree ssa_default_def (struct func
>>  extern void set_ssa_default_def (struct function *, tree, tree);
>>  extern tree get_or_create_ssa_default_def (struct function *, tree);
>>  extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
>>  HOST_WIDE_INT *, HOST_WIDE_INT *);
>>  extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
>>  extern bool stmt_references_abnormal_ssa_name (gimple);
>>  extern void dump_enumerated_decls (FILE *, int);
>>
>>  /* Returns the base object and a constant BITS_PER_UNIT offset in *POFFSET
>> that
>> denotes the starting address of the memory access EXP.
>> -   Returns NULL_TREE if the offset is not constant or any component
>> -   is not BITS_PER_UNIT-aligned.
>> +   If the offset is not constant or any component is not
>> BITS_PER_UNIT-aligned,
>> +   sets *SAFE to false or returns NULL_TREE if SAFE is NULL.
>> VALUEIZE if non-NULL is used to valueize SSA names.  It should return
>> its argument or a constant if the argument is known to be constant.  */
>>  /* ??? This is a static inline here to avoid the overhead of the indirect
>> calls
>> to VALUEIZE.  But is this overhead really that significant?  And should
>> we
>> perhaps just rely on WHOPR to specialize the function?  */
>>
>>  static inline tree
>>  get_addr_base_and_unit_offset_1 (tree exp, HOST_WIDE_INT *poffset,
>> -tree (*valueize) (tree))
>> +tree (*valueize) (tree), bool *safe = NULL)
>>  {
>>HOST_WIDE_INT byte_offset = 0;
>> +  bool issafe = true;
>>
>>/* Compute cumulative byte-offset for nested component-refs and
>> array-refs,
>>   and find the ultimate containing object.  */
>>while (1)
>>  {
>>switch (TREE_CODE (exp))
>> {
>> case BIT_FIELD_REF:
>>   {
>> HOST_WIDE_INT this_off = TREE_INT_CST_LOW (TREE_OPERAND (exp,
>> 2));
>> if (this_off % BITS_PER_UNIT)
>> - return NULL_TREE;
>> -   byte_offset += this_off / BITS_PER_UNIT;
>> + issafe = false;
>> +  

Re: Expect JUMP_TABLE_DATA in NEXT_INSN(label)

2013-11-04 Thread Richard Biener
On Sat, Nov 2, 2013 at 1:45 AM, Steven Bosscher  wrote:
> Hello,
>
> As $SUBJECT. The assert for this has been in place for several months
> now without triggering any issues, so I'd like to make this next step
> before stage1 is over.
>
> OK for trunk?

Ok.  Can you verify this from some RTL insn verifier (if it is not already).

Thanks,
Richard.

> Ciao!
> Steven


Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse  wrote:

Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.


Well...  it certainly learns of it, but it chooses to ignore...

  if (TREE_CODE (ptr) == ADDR_EXPR)
-ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-&ref->offset, &t1, &t2);
+{
+  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+  &t, 0, &safe);
+  ref->offset = BITS_PER_UNIT * t;
+}

safe == (t1 != -1 && t1 == t2)


I'll try that... (I need to think whether that's sufficient to be safe)


note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the callers error.


The caller is feeding the right size. The issue is that 
get_ref_base_and_extent cannot determine the offset as a constant. 
Normally, get_ref_base_and_extent then gives you a safe combination of 
offset+maxsize to cover the whole decl. Here, we don't want to use the 
size determined by get_ref_base_and_extent, we know better, but that also 
means we have to handle the case where the offset couldn't be determined 
as a constant.


--
Marc Glisse


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread Richard Sandiford
"H.J. Lu"  writes:
> emit_block_move_via_movmem and set_storage_via_setmem have
>
>   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
>mode = GET_MODE_WIDER_MODE (mode))
> {
>   enum insn_code code = direct_optab_handler (movmem_optab, mode);
>
>   if (code != CODE_FOR_nothing
>   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
>  here because if SIZE is less than the mode mask, as it is
>  returned by the macro, it will definitely be less than the
>  actual mode mask.  */
>   && ((CONST_INT_P (size)
>&& ((unsigned HOST_WIDE_INT) INTVAL (size)
><= (GET_MODE_MASK (mode) >> 1)))
>   || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
> {
>
> Backends may assume the mode of size in movmem and setmem expanders is no
> wider than Pmode, since size is within the Pmode address space.  The x86
> backend's expand_set_or_movmem_prologue_epilogue_by_misaligned has
>
>   rtx saveddest = *destptr;
>
>   gcc_assert (desired_align <= size);
>   /* Align destptr up, place it to new register.  */
>   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
>   GEN_INT (prolog_size),
>   NULL_RTX, 1, OPTAB_DIRECT);
>   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
>   GEN_INT (-desired_align),
>   *destptr, 1, OPTAB_DIRECT);
>   /* See how many bytes we skipped.  */
>   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
>*destptr,
>saveddest, 1, OPTAB_DIRECT);
>   /* Adjust srcptr and count.  */
>   if (!issetmem)
> *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
> saveddest,
> *srcptr, 1, OPTAB_DIRECT);
>   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
> saveddest, *count, 1, OPTAB_DIRECT);
>
> saveddest is a negative number in Pmode and *count is in word_mode.  For
> x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
> leads to overflow.  We could fix it by using mode of saveddest to compute
> saveddest + *count.  But it leads to extra conversions and other backends
> may run into the same problem.  A better fix is to limit mode of size in
> movmem and setmem expanders to Pmode.  It generates better and correct
> memcpy and memset for x32.
>
> There is also a typo in comments.  It should be BITS_PER_WORD, not
> BITS_PER_HOST_WIDE_INT. 

I don't think it's a typo.  It's explaining why we don't have to worry about:

GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT

in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
will be an all-1 HOST_WIDE_INT, even though that's narrower than the
real mask).

I don't think the current comment covers the BITS_PER_WORD test at all.
AIUI it's there because the pattern is defined as taking a length of
at most word_mode, so we should stop once we reach it.

FWIW, I agree Pmode makes more sense at face value.  But shouldn't
we replace the BITS_PER_WORD test instead of adding to it?  Having both
would only make a difference for Pmode > word_mode targets, which might
be able to handle full Pmode lengths.

Either way, the md.texi documentation should be updated too.

Thanks,
Richard



Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
 wrote:

On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse  wrote:

Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.


Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-&ref->offset, &t1, &t2);
+{
+  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+  &t, 0, &safe);
+  ref->offset = BITS_PER_UNIT * t;
+}

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the callers error.


I think one issue is that the above code uses get_ref_base_and_extent
on an address.  It should have used get_addr_base_and_unit_offset.


Isn't that what my patch does? Except that get_addr_base_and_unit_offset 
often gives up and returns NULL_TREE, whereas I believe we still want a 
base even if we have trouble determining a constant offset, so I modified 
get_addr_base_and_unit_offset_1 a bit.


(not saying the result is pretty...)

--
Marc Glisse


Re: [wide-int] Do not treat rtxes as sign-extended

2013-11-04 Thread Richard Biener
On Sat, Nov 2, 2013 at 3:25 PM, Richard Sandiford
 wrote:
> Kenneth Zadeck  writes:
>> On 11/02/2013 06:30 AM, Richard Sandiford wrote:
>>> Bah.  After all that effort, it turns out that -- by design --
>>> there is one special case where CONST_INTs are not sign-extended.
>>> Nonzero/true BImode integers are stored as STORE_FLAG_VALUE,
>>> which can be 1 rather than -1.  So (const_int 1) can be a valid
>>> BImode integer -- and consequently (const_int -1) can be wrong --
>>> even though BImode only has 1 bit.
>>>
>>> It might be nice to change that, but for wide-int I think we should
>>> just treat rtxes like trees for now.
>>>
>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some ICEs
>>> seen on bfin-elf.  OK to install?
>> do we have to throw away the baby with the bath water here?  I guess
>> what you are saying is that it is worse to have is_sign_extended be a
>> variable that is almost always true than to be a hard false.
>
> Right.  is_sign_extended is only useful if it's a compile-time value.
> Making it a run-time value would negate the benefit.
>
> I think in practice STORE_FLAG_VALUE is a compile-time constant too,
> so we could set is_sign_extended to STORE_FLAG_VALUE == -1.  But AFAICT
> that would only help SPU and m68k.
>
>> also we could preserve the test and make it not apply to bimode.
>
> You mean the one in the assert?  Yeah, OK.  How about this version?
>
> Richard
>
>
> Index: gcc/rtl.h
> ===
> --- gcc/rtl.h   2013-11-02 11:06:12.738517644 +
> +++ gcc/rtl.h   2013-11-02 14:22:05.636007860 +
> @@ -1408,7 +1408,9 @@ typedef std::pair <rtx, machine_mode> rtx_mode_t
>  {
>  static const enum precision_type precision_type = VAR_PRECISION;
>  static const bool host_dependent_precision = false;
> -static const bool is_sign_extended = true;
> +/* This ought to be true, except for the special case that BImode
> +   is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
> +static const bool is_sign_extended = false;
>  static unsigned int get_precision (const rtx_mode_t &);
>  static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
>   const rtx_mode_t &);
> @@ -1432,7 +1434,8 @@ wi::int_traits <rtx_mode_t>::decompose (
>  case CONST_INT:
>if (precision < HOST_BITS_PER_WIDE_INT)
> gcc_checking_assert (INTVAL (x.first)
> -== sext_hwi (INTVAL (x.first), precision));
> +== sext_hwi (INTVAL (x.first), precision)
> +|| (precision == 1 && INTVAL (x.first) == 1));

please add a comment here (and a check for BImode?).

Thanks,
Richard.

>return wi::storage_ref (&INTVAL (x.first), 1, precision);
>


[C++ PATCH] Fix PR58979

2013-11-04 Thread Marek Polacek
Even RO_ARROW_STAR can end up in invalid_indirection_error in invalid
code, so handle that instead of ICEing later on.

Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?

2013-11-04  Marek Polacek  

PR c++/58979
c-family/
* c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case.
testsuite/
* g++.dg/diagnostic/pr58979.C: New test.

--- gcc/c-family/c-common.c.mp2 2013-11-04 10:30:15.265951084 +0100
+++ gcc/c-family/c-common.c 2013-11-04 10:47:29.164783425 +0100
@@ -9930,6 +9930,7 @@ invalid_indirection_error (location_t lo
type);
   break;
 case RO_ARROW:
+case RO_ARROW_STAR:
   error_at (loc,
"invalid type argument of %<->%> (have %qT)",
type);
--- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2   2013-11-04 
11:09:30.515553664 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr58979.C   2013-11-04 11:09:26.505538785 
+0100
@@ -0,0 +1,4 @@
+// PR c++/58979
+// { dg-do compile }
+
+int i = 0->*0; // { dg-error "invalid type argument of" }

Marek


Re: [wide-int] Simplify some force_fit_type calls

2013-11-04 Thread Richard Biener
On Sun, Nov 3, 2013 at 11:31 AM, Richard Sandiford
 wrote:
> When going through force_fit_type calls to see whether they were
> extending correctly, I noticed some of the calls in VRP could be simplified.
>
> There's no change in behaviour, it's just shorter and more efficient.
>
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  2013-11-03 10:04:56.004019113 +
> +++ gcc/tree-vrp.c  2013-11-03 10:25:45.398984735 +
> @@ -1617,16 +1617,8 @@ extract_range_from_assert (value_range_t
>/* Make sure to not set TREE_OVERFLOW on the final type
>  conversion.  We are willingly interpreting large positive
>  unsigned values as negative singed values here.  */
> -  min = force_fit_type (TREE_TYPE (var),
> -   wide_int::from (min,
> -   TYPE_PRECISION (TREE_TYPE (var)),
> -   TYPE_SIGN (TREE_TYPE (min))),
> -   0, false);
> -  max = force_fit_type (TREE_TYPE (var),
> -   wide_int::from (max,
> -   TYPE_PRECISION (TREE_TYPE (var)),
> -   TYPE_SIGN (TREE_TYPE (max))),
> -   0, false);
> +  min = force_fit_type (TREE_TYPE (var), wi::to_widest (min), 0, false);
> +  max = force_fit_type (TREE_TYPE (var), wi::to_widest (max), 0, false);
>
>/* We can transform a max, min range to an anti-range or
>   vice-versa.  Use set_and_canonicalize_value_range which does
> @@ -3235,20 +3227,12 @@ extract_range_from_unary_expr_1 (value_r
>   if (is_overflow_infinity (vr0.min))
> new_min = negative_overflow_infinity (outer_type);
>   else
> -   new_min = force_fit_type (outer_type,
> - wide_int::from
> -   (vr0.min,
> -TYPE_PRECISION (outer_type),
> -TYPE_SIGN (TREE_TYPE (vr0.min))),
> +   new_min = force_fit_type (outer_type, wi::to_widest (vr0.min),
>   0, false);
>   if (is_overflow_infinity (vr0.max))
> new_max = positive_overflow_infinity (outer_type);
>   else
> -   new_max = force_fit_type (outer_type,
> - wide_int::from
> -   (vr0.max,
> -TYPE_PRECISION (outer_type),
> -TYPE_SIGN (TREE_TYPE (vr0.max))),
> +   new_max = force_fit_type (outer_type, wi::to_widest (vr0.max),
>   0, false);
>   set_and_canonicalize_value_range (vr, vr0.type,
> new_min, new_max, NULL);


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford
 wrote:
> "H.J. Lu"  writes:
>> emit_block_move_via_movmem and set_storage_via_setmem have
>>
>>   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
>>mode = GET_MODE_WIDER_MODE (mode))
>> {
>>   enum insn_code code = direct_optab_handler (movmem_optab, mode);
>>
>>   if (code != CODE_FOR_nothing
>>   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
>>  here because if SIZE is less than the mode mask, as it is
>>  returned by the macro, it will definitely be less than the
>>  actual mode mask.  */
>>   && ((CONST_INT_P (size)
>>&& ((unsigned HOST_WIDE_INT) INTVAL (size)
>><= (GET_MODE_MASK (mode) >> 1)))
>>   || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
>> {
>>
>> Backend may assume mode of size in movmem and setmem expanders is no
>> wider than Pmode since size is within the Pmode address space.  X86
>> backend expand_set_or_movmem_prologue_epilogue_by_misaligned has
>>
>>   rtx saveddest = *destptr;
>>
>>   gcc_assert (desired_align <= size);
>>   /* Align destptr up, place it to new register.  */
>>   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
>>   GEN_INT (prolog_size),
>>   NULL_RTX, 1, OPTAB_DIRECT);
>>   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
>>   GEN_INT (-desired_align),
>>   *destptr, 1, OPTAB_DIRECT);
>>   /* See how many bytes we skipped.  */
>>   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
>>*destptr,
>>saveddest, 1, OPTAB_DIRECT);
>>   /* Adjust srcptr and count.  */
>>   if (!issetmem)
>> *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
>> saveddest,
>> *srcptr, 1, OPTAB_DIRECT);
>>   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
>> saveddest, *count, 1, OPTAB_DIRECT);
>>
>> saveddest is a negative number in Pmode and *count is in word_mode.  For
>> x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
>> leads to overflow.  We could fix it by using mode of saveddest to compute
>> saveddest + *count.  But it leads to extra conversions and other backends
>> may run into the same problem.  A better fix is to limit mode of size in
>> movmem and setmem expanders to Pmode.  It generates better and correct
>> memcpy and memset for x32.
>>
>> There is also a typo in comments.  It should be BITS_PER_WORD, not
>> BITS_PER_HOST_WIDE_INT.
>
> I don't think it's a typo.  It's explaining why we don't have to worry about:
>
> GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT
>
> in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
> will be an all-1 HOST_WIDE_INT, even though that's narrower than the
> real mask).

Thanks for explanation.

> I don't think the current comment covers the BITS_PER_WORD test at all.
> AIUI it's there because the pattern is defined as taking a length of
> at most word_mode, so we should stop once we reach it.

I see.

> FWIW, I agree Pmode makes more sense on face value.  But shouldn't
> we replace the BITS_PER_WORD test instead of adding to it?  Having both
> would only make a difference for Pmode > word_mode targets, which might
> be able to handle full Pmode lengths.

Do we ever have a target where Pmode is wider than
word_mode?  If not, we can check Pmode instead.

> Either way, the md.texi documentation should be updated too.
>
> Thanks,
> Richard
>



-- 
H.J.


Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng  wrote:
> Hi,
>
> The IVOPT pass in GCC has a problem: it does not use the cost of the
> auto-increment address expression in its accounting, but instead falls back
> to the cost of the plain address expression as if auto-increment addressing
> were unavailable.
> For example, on ARM target:
> 1) the cost of "[reg]" (which is 6) is used for address expression "[reg],
> #off";
> 2) the cost of "[reg+off]" (which is 2) is used for address expression
> "[reg, #off]!";
>
> This causes:
> 1) cost of non-auto increment address expression is used for auto-increment
> address expression;
> 2) different address costs are used for pre/post increment address
> expressions.
> This patch fixes the problem by computing, caching and using the cost of
> auto-increment address expressions.
>
> Bootstrap and test on x86/arm.  Is it OK?

But don't you need to adjust

static bool
determine_use_iv_cost_address (struct ivopts_data *data,
   struct iv_use *use, struct iv_cand *cand)
{
  bitmap depends_on;
  bool can_autoinc;
  int inv_expr_id = -1;
  comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
 &can_autoinc, &inv_expr_id);

  if (cand->ainc_use == use)
{
  if (can_autoinc)
cost.cost -= cand->cost_step;

this which seems to try to compensate for your issue?

Or maybe I don't understand.

CCing Bernd who implemented this IIRC.

Richard.

> 2013-11-01  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (enum ainc_type): New.
> (address_cost_data): New field.
> (get_address_cost): Compute auto-increment rtx cost in ainc_costs.
> Use ainc_costs for auto-increment rtx patterns.
> Cleanup TWS.


Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Jakub Jelinek  writes:

> I think even as a fallback is the patch too expensive.
> I'd say best would be to write some getline API compatible function
> and just use it, using fread on say fixed size buffer.

OK, thanks for the insight.  I have just used the getline_fallback
function you proposed, slightly amended to use the memory allocation
routines commonly used in gcc and renamed into get_line, with a
hopefully complete comment explaining where this function comes from
etc.
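For reference, a minimal sketch of such a getline-compatible fallback built on fgets could look like the following.  Names and details are hypothetical, not the exact code of the patch; in particular, unlike the real get_line (which zeroes the buffer so it can measure lines containing embedded NUL bytes), this sketch relies on strlen and stops measuring at the first NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical getline-compatible fallback.  Reads one line from FP into
   *LINEPTR (growing it as needed, capacity tracked in *N) and returns the
   number of bytes read, or -1 on EOF/error.  */
static long
get_line_sketch (char **lineptr, size_t *n, FILE *fp)
{
  size_t pos = 0;

  if (*lineptr == NULL || *n == 0)
    {
      *n = 128;
      *lineptr = (char *) malloc (*n);
      if (*lineptr == NULL)
        return -1;
    }

  for (;;)
    {
      if (fgets (*lineptr + pos, (int) (*n - pos), fp) == NULL)
        return pos == 0 ? -1 : (long) pos;
      pos += strlen (*lineptr + pos);
      if (pos > 0 && (*lineptr)[pos - 1] == '\n')
        return (long) pos;            /* Got a complete line.  */
      if (pos + 1 < *n)
        return (long) pos;            /* EOF without trailing newline.  */
      /* Buffer is full mid-line: grow it and keep reading.  */
      *n *= 2;
      *lineptr = (char *) realloc (*lineptr, *n);
      if (*lineptr == NULL)
        return -1;
    }
}
```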

[...]

> A slight complication is what to do on mingw/cygwin and other DOS or
> Mac style line ending environments, no idea what fgets exactly does
> there.

Actually, I think that even fgets just deals with '\n'.  The reason,
from what I gathered, is that on Windows we fopen the input file in
text mode; and in that mode, the \r\n is transformed into just \n.

Apparently OS X is compatible with '\n' too.  Someone correct me if I
am talking nonsense here.

So the patch below is what I am bootstrapping at the moment.

OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk?

> BTW, we probably want to do something with the speed of the caret
> diagnostics too, right now it opens the file again for each single line
> to be printed in caret diagnostics and reads all lines until the right one,
> so imagine how fast is printing of many warnings on almost adjacent lines
> near the end of many megabytes long file.
> Perhaps we could remember the last file we've opened for caret diagnostics,
> don't fclose the file right away but only if a new request is for a
> different file, perhaps keep some vector of line start offsets (say starting
> byte of every 100th line or similar) and also remember the last read line
> offset, so if a new request is for the same file, but higher line than last,
> we can just keep getlineing, and if it is smaller line than last, we look up
> the offset of the line / 100, fseek to it and just getline only modulo 100
> lines.  Maybe we should keep not just one, but 2 or 4 opened files as cache
> (again, with the starting line offsets vectors).

I like this idea.  I'll try and work on it.
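The offset-caching scheme sketched above could look roughly like this.  All names are hypothetical (this is not the code that later went into GCC), and the lookup ignores lines longer than the caller's buffer; the point is only that fetching line K costs one fseek plus re-reading at most LINE_STRIDE - 1 lines.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_STRIDE 100

struct line_cache
{
  FILE *fp;
  long *offsets;      /* offsets[i] = start of 1-based line i*LINE_STRIDE+1.  */
  size_t n_offsets;
};

/* Scan FP once, recording the offset of every LINE_STRIDE-th line start.  */
static void
line_cache_init (struct line_cache *c, FILE *fp)
{
  size_t cap = 16;
  long line = 0;
  int ch, at_line_start = 1;

  c->fp = fp;
  c->offsets = (long *) malloc (cap * sizeof (long));
  c->n_offsets = 0;
  rewind (fp);
  while ((ch = getc (fp)) != EOF)
    {
      if (at_line_start)
        {
          if (line % LINE_STRIDE == 0)
            {
              if (c->n_offsets == cap)
                c->offsets = (long *) realloc (c->offsets,
                                               (cap *= 2) * sizeof (long));
              c->offsets[c->n_offsets++] = ftell (fp) - 1;
            }
          line++;
          at_line_start = 0;
        }
      if (ch == '\n')
        at_line_start = 1;
    }
}

/* Fetch 1-based line LINENO into BUF; return BUF, or NULL if out of range.  */
static char *
line_cache_get (struct line_cache *c, long lineno, char *buf, size_t bufsz)
{
  size_t anchor = (size_t) ((lineno - 1) / LINE_STRIDE);
  long skip = (lineno - 1) % LINE_STRIDE;

  if (lineno < 1 || anchor >= c->n_offsets)
    return NULL;
  fseek (c->fp, c->offsets[anchor], SEEK_SET);
  do
    if (fgets (buf, (int) bufsz, c->fp) == NULL)
      return NULL;
  while (skip-- > 0);
  return buf;
}
```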

And now the patch.

Cheers.

gcc/ChangeLog:

* input.h (location_get_source_line): Take an additional line_size
parameter by reference.
* input.c (get_line): New static function definition.
(read_line): Take an additional line_length output parameter to be
set to the size of the line.  Use the new get_line function to
compute the size of the line returned by fgets, rather than using
strlen.  Ensure that the buffer is initially zeroed; ensure that
when growing the buffer too.
(location_get_source_line): Take an additional output line_len
parameter.  Update the use of read_line to pass it the line_len
parameter.
* diagnostic.c (adjust_line): Take an additional input parameter
for the length of the line, rather than calculating it with
strlen.
(diagnostic_show_locus): Adjust the use of
location_get_source_line and adjust_line with respect to their new
signature.  While displaying a line now, do not stop at the first
null byte.  Rather, display the zero byte as a space and keep
going until we reach the size of the line.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c   |  17 ++--
 gcc/input.c| 104 ++---
 gcc/input.h|   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
MAX_WIDTH by some margin, then adjust the start of the line such
that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
   const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@ diagnostic_show

Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor  wrote:
> The middle-end currently marks all the sync builtins as nothrow.  That
> is incorrect for most of them when using -fnon-call-exceptions.  The
> -fnon-call-exceptions option means that instructions that trap may throw
> exceptions.  Most of the sync builtins access memory via pointers passed
> by user code.  Therefore, they may trap.  (I noticed this because Go
> uses -fnon-call-exceptions.)
>
> This patch fixes the problem by introducing a new internal function
> attribute "nothrow call".  The new attribute is exactly like "nothrow",
> except that it is ignored when using -fnon-call-exceptions.
>
> It is an internal attribute because there is no reason for a user to use
> this.  The problem only arises when the middle-end sees a function call
> that the backend turns into an instruction sequence that directly
> references memory.  That can only happen for functions that are built in
> to the compiler.
>
> This requires changes to the C family, LTO, and Ada frontends.  I think
> I'm handling the option correctly for LTO, but it would be good if
> somebody could check the logic for me.
>
> Bootstrapped and ran tests on x86_64-unknown-linux-gnu.  OK for
> mainline?

Hmm.  I think you can handle this in a simpler way by just
doing sth similar to

#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

that is, conditionally drop the _NOTHROW part.  Btw, I also think
that if it can throw then it isn't LEAF (it may return exceptionally
into another function in the unit).

Also this whole conditional-on-a-flag thing doesn't work well with
using -fnon-call-exceptions in the optimize attribute or with
different -fnon-call-exceptions settings in different TUs we LTO
together.  Because we only have a single builtin decl (so for
"proper" operation you'd have to clone builtin decls based on
the matrix of flags that generate different attributes and use the
correct one depending on context).

That said, I'd prefer a simpler approach for now, mimicking the
flag_errno_math handling.
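A toy model of that simpler approach, with hypothetical names standing in for the builtin-attrs.def machinery: the attribute list for the sync builtins is chosen at builtin-creation time, conditionally dropping the _NOTHROW part when -fnon-call-exceptions is in effect.

```c
#include <assert.h>

/* Hypothetical stand-ins for the attribute lists in builtin-attrs.def.
   Per the point above, a builtin that can throw is arguably not LEAF
   either; this toy keeps LEAF only to stay close to the quoted idiom.  */
enum attr_list
{
  ATTR_LEAF_LIST,           /* May throw under -fnon-call-exceptions.  */
  ATTR_NOTHROW_LEAF_LIST    /* Guaranteed not to throw.  */
};

static int flag_non_call_exceptions;

/* Mirrors the idiom:
     #define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
       ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)
   i.e. the flag conditionally drops the _NOTHROW part.  */
static enum attr_list
sync_builtin_attr_list (void)
{
  return flag_non_call_exceptions ? ATTR_LEAF_LIST : ATTR_NOTHROW_LEAF_LIST;
}
```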

Your lto-wrapper parts are clearly "wrong" (in a conservative way
though).  They follow -fexceptions so the lto-wrapper.c and
lto-opts.c parts are ok.

Fixing the LTO pieces would require to really "stream" most
builtins instead of generating them in the LTO frontend and streaming
them via DECL_FUNCTION_CODE.  tree merging should get
rid of the duplicates.  I suppose I should try that again ;)

Thanks,
Richard.

> Ian
>
>
> gcc/ChangeLog:
>
> 2013-11-03  Ian Lance Taylor  
>
> * builtin-attrs.def (ATTR_NOTHROWCALL): Define.
> (ATTR_NOTHROWCALL_LIST, ATTR_NOTHROWCALL_LEAF_LIST): Define.
> * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync
> builtins that take pointers.
> * lto-opts.c (lto_write_options): Write -fnon-call-exceptions
> if set.
> * lto-wrapper.c (merge_and_complain): Collect
> OPT_fnon_call_exceptions.
> (run_gcc): Pass -fnon-call-exceptions.
>
> gcc/c-family/ChangeLog:
>
> 2013-11-03  Ian Lance Taylor  
>
> * c-common.c (c_common_attribute_table): Add "nothrow call".
> (handle_nothrowcall_attribute): New static function.
>
> gcc/lto/ChangeLog:
>
> 2013-11-03  Ian Lance Taylor  
>
> * lto-lang.c (lto_attribute_table): Add "nothrow call"
> (handle_nothrowcall_attribute): New static function.
>
> gcc/ada/ChangeLog:
>
> 2013-11-03  Ian Lance Taylor  
>
> * gcc-interface/utils.c (gnat_internal_attribute_table): Add
> "nothrow call".
> (handle_nothrowcall_attribute): New static function.
>
> gcc/testsuite/ChangeLog:
>
> 2013-11-03  Ian Lance Taylor  
>
> * g++.dg/ext/sync-4.C: New test.
>
>


Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 12:46:10PM +0100, Dodji Seketeli wrote:
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
> MAX_WIDTH by some margin, then adjust the start of the line such
> that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
> margin is either 10 characters or the difference between the column
> -   and the length of the line, whatever is smaller.  */
> +   and the length of the line, whatever is smaller.  The length of
> +   LINE is given by LINE_WIDTH.  */
>  static const char *
> -adjust_line (const char *line, int max_width, int *column_p)
> +adjust_line (const char *line, int line_width,
> +  int max_width, int *column_p)

Eventually, I think using int for sizes is just a ticking bomb, what if
somebody uses > 2GB long lines?  Surely, on 32-bit hosts we are unlikely to
handle it, but why couldn't 64-bit host handle it?  Column info maybe bogus
in there, sure, but at least we should not crash or overflow buffers on it
;).  Anyway, not something needed to be fixed right now, but in the future
it would be nicer to use size_t and/or ssize_t here.

>  {
>int right_margin = 10;
> -  int line_width = strlen (line);
>int column = *column_p;
>  
>right_margin = MIN (line_width - column, right_margin);
> @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
>  const diagnostic_info *diagnostic)
>  {
>const char *line;
> +  int line_width;
>char *buffer;
>expanded_location s;
>int max_width;
> @@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context,
>  
>context->last_location = diagnostic->location;
>s = expand_location_to_spelling_point (diagnostic->location);
> -  line = location_get_source_line (s);
> +  line = location_get_source_line (s, line_width);

I think richi didn't like C++ reference arguments to be used that way (and
perhaps guidelines don't either), because it isn't immediately obvious
that line_width is modified by the call.  Can you change it to a pointer
argument instead and pass &line_width?
> +  *lineptr = XNEWVEC (char, *n);
> +  if (*lineptr == NULL)
> + return -1;

XNEWVEC or XRESIZEVEC will never return NULL though, so it doesn't have
to be tested.  Though, the question is if that is what we want, caret
diagnostics should be optional, if we can't print it, we just won't.
So perhaps using malloc/realloc here would be better?

>  
>  const char *
> -location_get_source_line (expanded_location xloc)
> +location_get_source_line (expanded_location xloc,
> +   int& line_len)

Ditto.

Otherwise, LGTM.

Jakub


Re: PR 58958: wrong aliasing info

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse  wrote:
> On Mon, 4 Nov 2013, Richard Biener wrote:
>
>> On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
>>  wrote:
>>>
>>> On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse 
>>> wrote:

 Hello,

 the issue was described in the PR and the message linked from there.
 ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
 detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
 never learns of it and uses the offset+size as if they were meaningful.
>>>
>>>
>>> Well...  it certainly learns of it, but it chooses to ignore...
>>>
>>>if (TREE_CODE (ptr) == ADDR_EXPR)
>>> -ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
>>> -&ref->offset, &t1, &t2);
>>> +{
>>> +  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr,
>>> 0),
>>> +  &t, 0, &safe);
>>> +  ref->offset = BITS_PER_UNIT * t;
>>> +}
>>>
>>> safe == (t1 != -1 && t1 == t2)
>>>
>>> note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
>>> so I fail to see why it matters at all ...?  That is, if you feed in a
>>> wrong
>>> size then it's the callers error.
>>
>>
>> I think one issue is that the above code uses get_ref_base_and_extent
>> on an address.  It should have used get_addr_base_and_unit_offset.
>
>
> Isn't that what my patch does? Except that get_addr_base_and_unit_offset
> often gives up and returns NULL_TREE, whereas I believe we still want a base
> even if we have trouble determining a constant offset, so I modified
> get_addr_base_and_unit_offset_1 a bit.
>
> (not saying the result is pretty...)

I don't think we want get_addr_base_and_unit_offset to return non-NULL
for a non-constant offset.  But yes, inside your patch is the correct fix,
so if you drop changing get_addr_base_and_unit_offset ...

Richard.

> --
> Marc Glisse


RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Bernd Edlinger
Hi,


I see another "read_line" at gcov.c, which seems to be a copy.

Maybe this should be changed too?

What do you think?

Bernd.

On Mon, 4 Nov 2013 12:46:10, Dodji Seketeli wrote:
>
> Jakub Jelinek  writes:
>
>> I think even as a fallback is the patch too expensive.
>> I'd say best would be to write some getline API compatible function
>> and just use it, using fread on say fixed size buffer.
>
> OK, thanks for the insight. I have just used the getline_fallback
> function you proposed, slightly amended to use the memory allocation
> routines commonly used in gcc and renamed into get_line, with a
> hopefully complete comment explaining where this function comes from
> etc.
>
> [...]
>
>> A slight complication is what to do on mingw/cygwin and other DOS or
>> Mac style line ending environments, no idea what fgets exactly does
>> there.
>
> Actually, I think that even fgets just deals with '\n'. The reason,
> from what I gathered, is that on Windows we fopen the input file in
> text mode; and in that mode, the \r\n is transformed into just \n.
>
> Apparently OS X is compatible with '\n' too. Someone correct me if I
> am talking nonsense here.
>
> So the patch below is what I am bootstrapping at the moment.
>
> OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk?
>
>> BTW, we probably want to do something with the speed of the caret
>> diagnostics too, right now it opens the file again for each single line
>> to be printed in caret diagnostics and reads all lines until the right one,
>> so imagine how fast is printing of many warnings on almost adjacent lines
>> near the end of many megabytes long file.
>> Perhaps we could remember the last file we've opened for caret diagnostics,
>> don't fclose the file right away but only if a new request is for a
>> different file, perhaps keep some vector of line start offsets (say starting
>> byte of every 100th line or similar) and also remember the last read line
>> offset, so if a new request is for the same file, but higher line than last,
>> we can just keep getlineing, and if it is smaller line than last, we look up
>> the offset of the line / 100, fseek to it and just getline only modulo 100
>> lines. Maybe we should keep not just one, but 2 or 4 opened files as cache
>> (again, with the starting line offsets vectors).
>
> I like this idea. I'll try and work on it.
>
> And now the patch.
>
> Cheers.
>
> gcc/ChangeLog:
>
> * input.h (location_get_source_line): Take an additional line_size
> parameter by reference.
> * input.c (get_line): New static function definition.
> (read_line): Take an additional line_length output parameter to be
> set to the size of the line. Use the new get_line function to
> compute the size of the line returned by fgets, rather than using
> strlen. Ensure that the buffer is initially zeroed; ensure that
> when growing the buffer too.
> (location_get_source_line): Take an additional output line_len
> parameter. Update the use of read_line to pass it the line_len
> parameter.
> * diagnostic.c (adjust_line): Take an additional input parameter
> for the length of the line, rather than calculating it with
> strlen.
> (diagnostic_show_locus): Adjust the use of
> location_get_source_line and adjust_line with respect to their new
> signature. While displaying a line now, do not stop at the first
> null byte. Rather, display the zero byte as a space and keep
> going until we reach the size of the line.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
> ---
> gcc/diagnostic.c | 17 ++--
> gcc/input.c | 104 ++---
> gcc/input.h | 3 +-
> .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 -> 240 bytes
> 4 files changed, 103 insertions(+), 21 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
>
> diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> index 36094a1..0ca7081 100644
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
> MAX_WIDTH by some margin, then adjust the start of the line such
> that the COLUMN is smaller than MAX_WIDTH minus the margin. The
> margin is either 10 characters or the difference between the column
> - and the length of the line, whatever is smaller. */
> + and the length of the line, whatever is smaller. The length of
> + LINE is given by LINE_WIDTH. */
> static const char *
> -adjust_line (const char *line, int max_width, int *column_p)
> +adjust_line (const char *line, int line_width,
> + int max_width, int *column_p)
> {
> int right_margin = 10;
> - int line_width = strlen (line);
> int column = *column_p;
>
> right_margin = MIN (line_width - column, right_margin);
> @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
> const diagnostic_info *diagnostic)
> {
> const char *line;
> + int line_width;
> char *buffer;
> expanded_location s;
> int max_width;
> @@ -297,22 +299,25 @@ diagnostic_show_

Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote:
> I see another "read_line" at gcov.c, which seems to be a copy.

Copy of what?  gcov.c's read_line can hardly be allowed to fail because of
running out of memory, unlike this one for caret diagnostics.
Though, surely, this one could be somewhat adjusted so that it really
doesn't use a temporary buffer but reads directly into the initially
malloced, then realloced, buffer.  But, if we want it to eventually switch
to caching the caret diagnostics, it won't be possible/desirable anymore.

Jakub


RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Bernd Edlinger
>
> On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote:
>> I see another "read_line" at gcov.c, which seems to be a copy.
>
> Copy of what? gcov.c read_line hardly can be allowed to fail because out of
> mem unlike this one for caret diagnostics.
> Though, surely, this one could be somewhat adjusted so that it really
> doesn't use a temporary buffer but reads directly into the initially
> malloced, then realloced, buffer. But, if we want it to eventually switch
> to caching the caret diagnostics, it won't be possible/desirable anymore.
>
> Jakub

gcov.c and input.c currently both have a static function "read_line";
they are currently 100% in sync.  Both _can_ fail if the file gets
deleted or modified while the function executes.

If gcov.c crashes in that event, I'd call it a bug.

Bernd.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 5:20 PM, Jeff Law  wrote:
> On 10/30/13 04:34, Ilya Enkovich wrote:
>>
>> On 30 Oct 10:26, Richard Biener wrote:
>>>
>>>
>>> Ick - you enlarge all return statements?  But you don't set the
>>> actual value? So why allocate it with 2 ops in the first place??
>>
>>
>> When return does not return bounds it has operand with zero value
>> similar to case when it does not return value. What is the difference
>> then?
>
> In general, when someone proposes a change in the size of tree, rtl or
> gimple nodes, it's a "yellow flag" that something may need further
> investigation.
>
> In this specific instance, I could trivially predict how that additional
> field would be used and a GIMPLE_RETURN isn't terribly important from a size
> standpoint, so I didn't call it out.

Btw, both for a regular return with no value and for this case, only the
required operands could be allocated.  There may be legacy issues with
the regular return value, but for new uses always allocating the
operand seems wrong (even if "cheap" in practice).

Richard.

>> Returns instrumentation. We add new operand to return statement to
>> hold returned bounds and instrumentation pass is responsible to fill
>> this operand with correct bounds
>
> Exactly what I expected.
>
>
>>
>> Unfortunately patch has been already installed.  Should we uninstall
>> it?  If not, then here is patch for documentation.
>
> I think we're OK for now.  If Richi wants it out, he'll say so explicitly.
>
>
>
>>
>> Thanks, Ilya --
>>
>> gcc/
>>
>> 2013-10-30  Ilya Enkovich  
>>
>> * doc/gimple.texi (gimple_call_num_nobnd_args): New.
>> (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New.
>> (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index):
>> New.
>
> Can you also fixup the GIMPLE_RETURN documentation in gimple.texi.  It needs
> a minor update after these changes.
>
> jeff
>


Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 6:40 PM, Jeff Law  wrote:
> On 10/30/13 04:48, Richard Biener wrote:
>>
>> foo (int * p, unsigned int size)
>> {
>> __bound_tmp.0;
>>long unsigned int D.2239;
>>long unsigned int _2;
>>sizetype _6;
>>int * _7;
>>
>>:
>>__bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));
>>
>>:
>>_2 = (long unsigned int) size_1(D);
>>__builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
>>_6 = _2 + 18446744073709551615;
>>_7 = p_3(D) + _6;
>>__builtin_ia32_bndcu (__bound_tmp.0_4, _7);
>>access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));
>>
>> so it seems there is now a mismatch between DECL_ARGUMENTS
>> and the GIMPLE call stmt arguments.  How (if) did you amend
>> the GIMPLE stmt verifier for this?
>
> Effectively the bounds are passed "on the side".

Well, not exactly.  I can see __bound_tmp.0_4 passed to access_and_store.

Are __builtin_ia32_bndcu and access_and_store supposed to be
called directly after each other?  What if for example profile instrumentation
inserts a call inbetween them?

>> How does regular code deal with this which may expect matching
>> to DECL_ARGUMENTS?  In fact interleaving the additional
>> arguments sounds very error-prone for existing code - I'd have
>> appended all bound args at the end.  Also you unconditionally
>> claim all pointer arguments have a bound - that looks like bad
>> design as well.  Why didn't you add a flag to the relevant
>> PARM_DECL (and then, what do you do for indirect calls?).
>
> You can't actually interleave them -- that results in MPX and normal code
> not being able to interact.   Passing the bound at the end doesn't really
> work either -- varargs and the desire to pass some of the bounds around in
> bound registers.

I'm only looking at how bound arguments are passed at the GIMPLE
level - which I think is arbitrary given they are passed on-the-side
during code-generation.  So I'm looking for a more "sensible" option
for the GIMPLE representation, which currently looks somewhat fragile to me
(it looks like the argument is only there to create a data dependence
between the bndcu intrinsic and the actual call?)

>>
>> /* Return the number of arguments used by call statement GS
>> ignoring bound ones.  */
>>
>> static inline unsigned
>> gimple_call_num_nobnd_args (const_gimple gs)
>> {
>>unsigned num_args = gimple_call_num_args (gs);
>>unsigned res = num_args;
>>for (unsigned n = 0; n < num_args; n++)
>>  if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
>>res--;
>>return res;
>> }
>>
>> the choice means that gimple_call_num_nobnd_args is not O(1).
>
> Yes, but I don't see that's terribly problematical.

Well, consider compile.exp limits-fnargs.c written with int * parameters
and bound support ;)  [there is a limit on the number of bound registers
available, but the GIMPLE parts don't put any limit on instrumented
args?]

>>
>> /* Return INDEX's call argument ignoring bound ones.  */
>> static inline tree
>> gimple_call_nobnd_arg (const_gimple gs, unsigned index)
>> {
>>/* No bound args may exist if pointers checker is off.  */
>>if (!flag_check_pointer_bounds)
>>  return gimple_call_arg (gs, index);
>>return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs,
>> index));
>> }
>>
>> GIMPLE layout depending on flag_check_pointer_bounds sounds
>> like a recipie for desaster if you consider TUs compiled with and
>> TUs compiled without and LTO.  Or if you consider using
>> optimized attribute with that flag.
>
> Sorry, I don't follow.  Can you elaborate please.

Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files
which gives you - well, either -fno-chk or -fchk.  Then you read in
the GIMPLE and when calling gimple_call_nobnd_arg you get
a mismatch between where you generated the gimple and where
you use the gimple.

A flag on whether a call is instrumented is better (consider inlining
from TU1 into TU2 where a flag on struct function for example wouldn't
be enough either).

Richard.

>> I hope the reviewers that approved the patch will work with you to
>> address the above issues.  I can't be everywhere.
>
> Obviously I will.
>
> jeff
>


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 8:55 PM, Ian Lance Taylor  wrote:
> On Wed, Oct 30, 2013 at 8:47 AM, Jason Merrill  wrote:
>> I find -Wformat warnings about unknown % codes when building with an older
>> GCC to be mildly annoying noise; this patch avoids them by passing
>> -Wno-format during stage 1.
>>
>> Tested x86_64-pc-linux-gnu.  Is this OK for trunk?  Do you have another
>> theory of how this should work?
>
> Thank you!

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...

Richard.

> Ian


Re: [PATCH][RFA] Improvements to infer_nonnull_range

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 6:41 AM, Jeff Law  wrote:
> On 10/28/13 23:02, Jeff Law wrote:
>>
>>
>> Based on a suggestion from Marc, I want to use infer_nonnull_range in
>> the erroneous path isolation optimization.
>>
>> In the process of doing that, I found  a few deficiencies in
>> infer_nonnull_range that need to be addressed.
>>
>>
>> First, infer_nonnull_range_p doesn't infer ranges from a GIMPLE_RETURN
>> if the current function is marked as returns_nonnull.  That's fixed in
>> the relatively obvious way.
>>
>> Second, infer_nonnull_range_p, when presented with an arglist where the
>> non-null attribute applies to all pointer arguments, it won't bother to
>> determine if OP is actually one of the arguments :(  It just assumes
>> that OP is in the argument list.  Opps.
>>
>> Third, I want to be able to call infer_nonnull_range with OP being 0B.
>> That lets me use infer_nonnull_range to look for explicit null pointer
>> dereferences, explicit uses of null in a return statement in functions
>> that can't return non-null and explicit uses of null arguments when
>> those arguments can't be null.   Sadly, to make that work we need to use
>> operand_equal_p rather than simple pointer comparisons to see if OP
>> shows up in STMT.
>>
>> Finally, for detecting explicit null pointers, infer_nonnull_range calls
>> count_ptr_derefs.  count_ptr_derefs counts things in two ways.  One with
>> a FOR_EACH_SSA_TREE_OPERAND, then again with simple walks of the tree
>> structures.   Not surprisingly if we're looking for an explicit 0B, then
>> the loop over the operands finds nothing, but walking the tree
>> structures does. And the checking assert triggers.  This change removes
>> the assert and instead sets *num_uses_p to a sane value.
>>
>> I don't have testcases for this stuff that are independent of the
>> erroneous path isolation optimization.  However, each is triggered by
>> tests I'll include in the the erroneous path isolation patch.
>>
>> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for
>> the trunk?
>
> I'm withdrawing this patch.  While everything works exactly how I wanted, I
> think it's best to rework things a bit and totally eliminate
> count_ptr_derefs entirely.

Yes!  And count_uses_and_derefs, too.  See walk_stmt_load_store_addr_ops.

Richard.


> That change makes it significantly easier to move infer_nonnull_range out of
> tree-vrp.c and into gimple.c (or wherever Andrew wants it) without a
> significant modularity break.

> Jeff
>


Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT

2013-11-04 Thread Bin.Cheng
On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener
 wrote:
> On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng  wrote:
>> Hi,
>>
>> The IVOPT in GCC has a problem that it does not use cost of auto-increment
>> address expression in accounting, while it retreats to cost of address
>> expression if auto-increment addressing mode is unavailable.
>> For example, on ARM target:
>> 1) the cost of "[reg]" (which is 6) is used for address expression "[reg],
>> #off";
>> 2) the cost of "[reg+off]" (which is 2) is used for address expression
>> "[reg, #off]!";
>>
>> This causes:
>> 1) cost of non-auto increment address expression is used for auto-increment
>> address expression;
>> 2) different address costs are used for pre/post increment address
>> expressions.
>> This patch fixes the problem by computing, caching and using the cost of
>> auto-increment address expressions.
>>
>> Bootstrap and test on x86/arm.  Is it OK?
>
> But don't you need to adjust
>
> static bool
> determine_use_iv_cost_address (struct ivopts_data *data,
>struct iv_use *use, struct iv_cand *cand)
> {
>   bitmap depends_on;
>   bool can_autoinc;
>   int inv_expr_id = -1;
>   comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
>  &can_autoinc, &inv_expr_id);
>
>   if (cand->ainc_use == use)
> {
>   if (can_autoinc)
> cost.cost -= cand->cost_step;
>
> this which seems to try to compensate for your issue?
That's where the problem gets complicated, depending on how the backend
defines address costs.  For back ends that define costs approximating
the true cost of each addressing mode, the cost of an auto-increment
addressing mode doesn't take the saved stepping instruction into
consideration, so this code is necessary.
Moreover, according to the GCC internals documentation of
TARGET_ADDRESS_COST, RISC machines may define different address costs
for addressing modes that actually execute identically at the
micro-architecture level (like ARM does now).  The problems are:
1) Though address costs are defined in this "discriminated" way, it's
unlikely that the saved stepping instruction is considered either.  The
address cost of an auto-increment address expression shouldn't go that
far.
2) We think the "discriminated" address cost model was established
before the GIMPLE passes and is outdated.  The true micro-architecture
address cost (or the cost normalized with COSTS_N_INSNS) should be used
in GCC nowadays.  RTL passes like fwprop_addr that use address cost as
heuristic information should be refined... but that's a separate
problem (I am investigating it).

So overall, I think the stepping cost should be subtracted here.
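As a concrete illustration of the saving in question (plain C, not GCC internals): in a loop like the one below, post-increment addressing folds the pointer step into the memory access itself ("ldr r3, [r1], #4" on ARM), so the separate add instruction, the stepping instruction, disappears.  An address cost model that charges the plain "[reg]" cost for "[reg], #4" misses exactly that saving.

```c
#include <assert.h>

/* A loop where IVOPTs chooses between "[reg+off]" addressing with a
   shared index and auto-increment pointers; the while-loop body below is
   a natural candidate for post-increment addressing modes.  */
void
copy_words (int *dst, const int *src, int n)
{
  while (n-- > 0)
    *dst++ = *src++;
}
```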

>
> Or maybe I don't understand.
>
> CCing Bernd who implemented this IIRC.
Any suggestions appreciated.

Thanks.
bin

-- 
Best Regards.


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
On Mon, Nov 4, 2013 at 3:34 AM, H.J. Lu  wrote:
> Y
> On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford
>  wrote:
>> "H.J. Lu"  writes:
>>> emit_block_move_via_movmem and set_storage_via_setmem have
>>>
>>>   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
>>>mode = GET_MODE_WIDER_MODE (mode))
>>> {
>>>   enum insn_code code = direct_optab_handler (movmem_optab, mode);
>>>
>>>   if (code != CODE_FOR_nothing
>>>   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
>>>  here because if SIZE is less than the mode mask, as it is
>>>  returned by the macro, it will definitely be less than the
>>>  actual mode mask.  */
>>>   && ((CONST_INT_P (size)
>>>&& ((unsigned HOST_WIDE_INT) INTVAL (size)
>>><= (GET_MODE_MASK (mode) >> 1)))
>>>   || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
>>> {
>>>
>>> A backend may assume that the mode of size in the movmem and setmem
>>> expanders is no wider than Pmode, since size is within the Pmode
>>> address space.  The x86
>>> backend expand_set_or_movmem_prologue_epilogue_by_misaligned has
>>>
>>>   rtx saveddest = *destptr;
>>>
>>>   gcc_assert (desired_align <= size);
>>>   /* Align destptr up, place it to new register.  */
>>>   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
>>>   GEN_INT (prolog_size),
>>>   NULL_RTX, 1, OPTAB_DIRECT);
>>>   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
>>>   GEN_INT (-desired_align),
>>>   *destptr, 1, OPTAB_DIRECT);
>>>   /* See how many bytes we skipped.  */
>>>   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, 
>>> saveddest,
>>>*destptr,
>>>saveddest, 1, OPTAB_DIRECT);
>>>   /* Adjust srcptr and count.  */
>>>   if (!issetmem)
>>> *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
>>> saveddest,
>>> *srcptr, 1, OPTAB_DIRECT);
>>>   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
>>> saveddest, *count, 1, OPTAB_DIRECT);
>>>
>>> saveddest is a negative number in Pmode and *count is in word_mode.  For
>>> x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
>>> leads to overflow.  We could fix it by using mode of saveddest to compute
>>> saveddest + *count.  But it leads to extra conversions and other backends
>>> may run into the same problem.  A better fix is to limit mode of size in
>>> movmem and setmem expanders to Pmode.  It generates better and correct
>>> memcpy and memset for x32.
>>>
>>> There is also a typo in comments.  It should be BITS_PER_WORD, not
>>> BITS_PER_HOST_WIDE_INT.
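The failure mode described above can be sketched in plain C; the concrete values are made up, but the mode widths match x32 (Pmode is SImode, word_mode is DImode):

```cpp
#include <assert.h>
#include <stdint.h>

/* saveddest is a small negative adjustment computed in Pmode (32 bits),
   while *count lives in word_mode (64 bits).  Zero-extending the
   negative 32-bit value before the 64-bit add produces a huge bogus
   count instead of the intended subtraction.  */
uint64_t
adjust_count_buggy (uint64_t count, uint32_t saveddest)
{
  return count + saveddest;                      /* zero-extends: wrong */
}

uint64_t
adjust_count_fixed (uint64_t count, uint32_t saveddest)
{
  return count + (int64_t) (int32_t) saveddest;  /* sign-extend first */
}
```

Limiting the pattern's size operand to Pmode sidesteps the need for such explicit extensions in every backend.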
>>
>> I don't think it's a typo.  It's explaining why we don't have to worry about:
>>
>> GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT
>>
>> in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
>> will be an all-1 HOST_WIDE_INT, even though that's narrower than the
>> real mask).
>
> Thanks for explanation.
>
>> I don't think the current comment covers the BITS_PER_WORD test at all.
>> AIUI it's there because the pattern is defined as taking a length of
>> at most word_mode, so we should stop once we reach it.
>
> I see.
>
>> FWIW, I agree Pmode makes more sense on face value.  But shouldn't
>> we replace the BITS_PER_WORD test instead of adding to it?  Having both
>> would only make a difference for Pmode > word_mode targets, which might
>> be able to handle full Pmode lengths.
>
> Do we ever have a target where Pmode is wider than
> word_mode?  If not, we can check Pmode instead.
>
>> Either way, the md.texi documentation should be updated too.
>>

This is the updated patch with the md.texi change.  The testcase is the same.
Tested on x32.  OK to install?

Thanks.

-- 
H.J.
---
gcc/

2013-11-04  H.J. Lu  

PR middle-end/58981
* doc/md.texi (@code{movmem@var{m}}): Specify Pmode as mode of
pattern, instead of word_mode.

* expr.c (emit_block_move_via_movmem): Don't use mode wider than
Pmode for size.
(set_storage_via_setmem): Likewise.

gcc/testsuite/

2013-11-04  H.J. Lu  

PR middle-end/58981
* gcc.dg/pr58981.c: New test.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ac10a0a..1e22b88 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5291,12 +5291,13 @@ are the first two operands, and both are
@code{mem:BLK}s with an
 address in mode @code{Pmode}.

 The number of bytes to move is the third operand, in mode @var{m}.
-Usually, you specify @code{word_mode} for @var{m}.  However, if you can
+Usually, you specify @code{Pmode} for @var{m}.  However, if you can
 generate better code knowing the range of valid leng

Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 06:42 PM, David Malcolm wrote:



!   gimple_statement_call *call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
  gimple_call_set_lhs (call, atree);

that is looking much better :-)

I'd love to make use of compile-time type-safety like this, but does
this have to gate this initial step?
no, I wasn't implying it did... just mapping out what I think we can go 
on to do with it.





And eventually we can pull the accessor routines and others into the
class itself:
 gimple_call call;
 call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 call->set_lhs (atree);

Nice.  It's readable (IMHO), and the type-checking (that it's a call) is
enforced at compile-time, rather than at run-time (and only in
ENABLE_CHECKING builds).

Sadly, this may need some further gengtype work: the simple inheritance
support I'm using in this patch series will barf if it sees methods
within a class.  So I can have a look at supporting methods in gengtype,
I guess.   Though the change above of gimple_call_set_lhs to accepting a
subclass ptr gives a middle-ground.
Eventually this is going to be an issue elsewhere as well.  What's the 
problem with methods in gengtype?  Does it just not parse them, or is it 
something deeper and more insidious?
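As a toy model of the direction sketched earlier in this thread (not GCC code; the names merely mimic the proposal), the compile-time-checked variant would look something like:

```cpp
#include <cassert>

/* Toy sketch: each statement kind becomes a subclass, and accessors
   become member functions, so "is this a call?" is a property of the
   pointer type checked at compile time rather than a runtime
   GIMPLE_CHECK.  */
struct gimple_statement_base
{
  int code;                       /* GIMPLE_CALL, GIMPLE_ASSIGN, ...  */
};

struct gimple_statement_call : gimple_statement_base
{
  const char *lhs = nullptr;      /* stand-in for a tree */
  void set_lhs (const char *t) { lhs = t; }
};

/* A builder returning the subclass type; callers never need a cast or
   a runtime check before using call-specific accessors.  */
inline gimple_statement_call *
build_call ()
{
  return new gimple_statement_call ();
}
```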

Which results in a similar feel to the new gimple_type, gimple_decl,
etc. classes with my upcoming tree wrappers.  Changing gimple statements
to behave something like this is actually mentioned in the plan, but out
in the future once trees have been taken care of.

I would also plan to create instances for each of the gimple statements

By "instances" presumably you meant "subclasses"?

yeah.

That all said, this change enables that work to proceed if someone wants
to do it.

My question is: Is anyone going to do it, and if so,  who and when? :-)

I'd be up for working on a followup patch that adds such subclasses, and
I'd be up for changing the accessors to give up compile-time
type-safety.  I'd do it one accessor at a time.
It's probably pushing it for 4.9... and it doesn't really serve much 
purpose to try to do any of this followup work now.  It's probably a 
better task for the next stage 1 anyway, when there is time to fully 
flesh it out and prove the various improvements have full support from 
everyone :-)

Again, as noted in the earlier patch series, the names of the structs
are rather verbose.  I would prefer to also rename them all to eliminate
the "_statement" component:
"gimple_statement_base" -> "gimple_base"
"gimple_statement_phi"  -> "gimple_phi"
"gimple_statement_omp"  -> "gimple_omp"
etc, but I didn't do this to mimimize the patch size.  But if the core
maintainers are up for that, I can redo the patch series with that
change also, or do that as a followup.

As mentioned, I'd rather see the short names be typedefs for pointers to
the long names since we're typically using the pointers.

[FWIW, I'm not a fan of such typedefs, but I'll defer to you in this]


Just expressing my opinion :-)


Presumably we'd want typedefs for the extra subclasses as well?

Also, I take it we'd use the "printable_name" from gimple.def - though
would we use it for the subclass name, or for the ptr typedef?


I would think a typedef for any type that we would directly refer to in 
the code, like a gimple_omp_clause, gimple_assign, or whatever.



Other than stomping on the same bits at the moment (less so once
gimple-stmt.[ch] is split out), these changes are completely orthogonal,
but in line with my direction.  I think it should be done sooner or
later.  My preference is to also see the followup work carried out
as well, or at least a plan for it.

So you like the approach, provided I commit to the followup work? [1]

OK, I'll try to create some followup patches next week.  I'd prefer to
get the conversion to inheritance into trunk sooner rather than later,
though.

Thanks for looking at this (and for your cleanup work!)

Dave
[1] and the patches still need formal review.


I'm not the one that has to be sold on this; I'm actually ambivalent 
about the change this close to the end of stage 1...  I'm pointing out 
what we can eventually do with it in case it sways opinion...


I would have considered something similar in a while, when the other 
refactoring work reached that point.  In some ways it might be better 
for the next stage 1, but perhaps you have outlined enough benefits for 
it to be attractive to someone today.  It seems to be pretty neutral in 
most other respects.



Andrew


[Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed

2013-11-04 Thread Senthil Kumar Selvaraj
The AVR backend does not generate position independent code, yet it
happily accepts -fpic, -fPIC, -fpie and -fPIE. The generated code
doesn't change at all. Also, it accepts the -shared option to generate a
shared library, without really doing anything with it.

This causes one of the regression tests 
(gcc.dg/lto/pr54709 c_lto_pr54709_0.o-c_lto_pr54709_1.o link) to fail with 
an 'undefined reference to main' error, when the test is trying to build 
a shared object.

The attached patch generates a warning if one of the -f{pic,PIC,pie,PIE} 
options is provided, and an error if -shared is provided (
config/mep/mep.c and config/s390/tpf.h already do something very similar).

Regression tested with no new failures.  Tests which exercise PIC now
report as unsupported.

If ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog
2013-11-04  Senthil Kumar Selvaraj  

* config/avr/avr.c (avr_option_override): Warn if asked to generate
position independent code.
* config/avr/avr.h: Modify LINK_SPEC to reject -shared.


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index e7e1c2f..cf4b8ca 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -310,6 +310,15 @@ avr_option_override (void)
   flag_omit_frame_pointer = 0;
 }
 
+  if (flag_pic == 1)
+warning (OPT_fpic, "-fpic is not supported");
+  if (flag_pic == 2)
+warning (OPT_fPIC, "-fPIC is not supported");
+  if (flag_pie == 1)
+warning (OPT_fpie, "-fpie is not supported");
+  if (flag_pie == 2)
+warning (OPT_fPIE, "-fPIE is not supported");
+
   avr_current_device = &avr_mcu_types[avr_mcu_index];
   avr_current_arch = &avr_arch_types[avr_current_device->arch];
 
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index f223a61..1eff5be 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -522,7 +522,8 @@ extern const char *avr_device_to_sp8 (int argc, const char 
**argv);
mmcu=at90can64*|\
mmcu=at90usb64*:--pmem-wrap-around=64k}}}\
 %:device_to_ld(%{mmcu=*:%*})\
-%:device_to_data_start(%{mmcu=*:%*})"
+%:device_to_data_start(%{mmcu=*:%*})\
+%{shared:%eshared is not supported}"
 
 #define LIB_SPEC \
   
"%{!mmcu=at90s1*:%{!mmcu=attiny11:%{!mmcu=attiny12:%{!mmcu=attiny15:%{!mmcu=attiny28:
 -lc }"


Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 7:11 AM, Jeff Law  wrote:
>
> I've incorporated the various suggestions from Marc and Richi, except for
> Richi's to integrate this into jump threading.
>
> I've also made the following changes since the last version:
>
>   1. Added more testcases.
>
>   2. Use infer_nonnull_range, moving it from tree-vrp.c
>   into gimple.c.  Minor improvements to infer_nonnull_range
>   to make it handle more cases we care about and avoid using
>   unnecessary routines from tree-ssa.c (which can now be removed)
>
>   3. Multiple undefined statements in a block are handled in the
>   logical way.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for the
> trunk?
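For context, a hypothetical illustration of the class of code the pass targets (the function is made up; it is not one of the new testcases):

```cpp
#include <assert.h>

/* On any path where P is NULL, the dereference below invokes undefined
   behaviour.  A pass like the one described above may therefore isolate
   that path, turn the dereference into a trap there, and let the main
   flow assume P is non-NULL -- this is the kind of fact
   infer_nonnull_range derives from a load/store through the pointer.  */
int
bump (int *p)
{
  int was_null = (p == 0);
  *p += 1;            /* p must be non-NULL for this to be defined */
  return was_null;    /* provably 0 on every well-defined execution */
}
```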

Comments inline

> Thanks,
> Jeff
>
> * Makefile.in (OBJS): Add gimple-ssa-isolate-paths.o
> * common.opt (-fisolate-erroneous-paths): Add option and
> documentation.
> * gimple-ssa-isolate-paths.c: New file.
> * gimple.c (check_loadstore): New function.
> (infer_nonnull_range): Moved into gimple.c from tree-vrp.c
> Verify OP is in the argument list and the argument corresponding
> to OP is a pointer type.  Use operand_equal_p rather than
> pointer equality when testing if OP is on the nonnull list.
> Use check_loadstore rather than count_ptr_derefs.  Handle
> GIMPLE_RETURN statements.
> * tree-vrp.c (infer_nonnull_range): Remove.
> * gimple.h (infer_nonnull_range): Declare.
> (gsi_start_nondebug_after_labels): New function.
> * opts.c (default_options_table): Add OPT_fisolate_erroneous_paths.
> * passes.def: Add pass_isolate_erroneous_paths.
> * timevar.def (TV_ISOLATE_ERRONEOUS_PATHS): New timevar.
> * tree-pass.h (make_pass_isolate_erroneous_paths): Declare.
> * tree-ssa.c (struct count_ptr_d): Remove.
> (count_ptr_derefs, count_uses_and_derefs): Remove.
> * tree-ssa.h (count_uses_and_derefs): Remove.
>
>
>
> * gcc.dg/pr38984.c: Add -fno-isolate-erroneous-paths.
> * gcc.dg/tree-ssa/20030711-3.c: Update expected output.
> * gcc.dg/tree-ssa/isolate-1.c: New test.
> * gcc.dg/tree-ssa/isolate-2.c: New test.
> * gcc.dg/tree-ssa/isolate-3.c: New test.
> * gcc.dg/tree-ssa/isolate-4.c: New test.
>
>
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 29609fd..7e9a702 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1233,6 +1233,7 @@ OBJS = \
> gimple-fold.o \
> gimple-low.o \
> gimple-pretty-print.o \
> +   gimple-ssa-isolate-paths.o \
> gimple-ssa-strength-reduction.o \
> gimple-streamer-in.o \
> gimple-streamer-out.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index deeb3f2..6db9f56 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2104,6 +2104,12 @@ foptimize-strlen
>  Common Report Var(flag_optimize_strlen) Optimization
>  Enable string length optimizations on trees
>
> +fisolate-erroneous-paths
> +Common Report Var(flag_isolate_erroneous_paths) Init(1) Optimization

Drop Init(1) (see below)

> +Detect paths which trigger erroneous or undefined behaviour.  Isolate those
> +paths from the main control flow and turn the statement with erroneous or
> +undefined behaviour into a trap.
> +
>  ftree-loop-distribution
>  Common Report Var(flag_tree_loop_distribution) Optimization
>  Enable loop distribution on trees
> diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
> new file mode 100644
> index 000..aa526cc
> --- /dev/null
> +++ b/gcc/gimple-ssa-isolate-paths.c
> @@ -0,0 +1,332 @@
> +/* Detect paths through the CFG which can never be executed in a conforming
> +   program and isolate them.
> +
> +   Copyright (C) 2013
> +   Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "flags.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "tree-ssa.h"
> +#include "gimple-ssa.h"
> +#include "tree-ssa-operands.h"
> +#include "tree-phinodes.h"
> +#include "ssa-iterators.h"
> +#include "cfgloop.h"
> +#include "tree-pass.h"
> +
> +
> +static bool cfg_altered;
> +
> +/* BB when reached via incoming edge E will exhibit undefined behaviour
> +   at S

Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 06:58 PM, David Malcolm wrote:

On Fri, 2013-11-01 at 22:57 +0100, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote:

On 11/01/2013 05:41 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);

The checking you are removing here.


What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual "does my parameter match" that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,
I expect you'd do as_a which is not only further uglification, but has
runtime cost also for --enable-checking=release.

I can have a look next week at every call to gimple_call_set_lhs in the
tree, and see to what extent we know at compile-time that the initial
arg is indeed a call (of the ones I quickly grepped just now, most are
from gimple_build_call and friends, but one was from a gimple_copy).

FWIW I did some performance testing of the is_a/as_a code in the earlier
version of the patch, and it didn't have a noticable runtime cost
compared to the GIMPLE_CHECK in the existing code:
Size of compiler executable:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01920.html
Compile times:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00171.html
I actually really dislike as_a<> and is_a<>, and think code needs to be 
restructured rather than using them, other than possibly at the very 
bottom level when we're allocating memory or something like that, or in 
some kind of emergency :-)...  If we require frequent uses of those, 
I'd be against it; I find them quite ugly.


Like I said in the other reply, no rush; I don't think any of this 
followup is appropriate this late in stage 1.  It would be more of an 
"interest" examination right now... at least in my opinion.  I suspect 
things like gimple_assign are more complex cases, but without looking 
it's hard to tell for sure.


Andrew



Re: Pre-Patch RFC: proposed changes to option-lookup

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 11:56 PM, Joseph S. Myers
 wrote:
> On Wed, 30 Oct 2013, David Malcolm wrote:
>
>> My idea is to introduce a GCC_OPTION macro, and replace the above with:
>>
>>   static bool
>>   gate_vrp (void)
>>   {
>> return GCC_OPTION (flag_tree_vrp) != 0;
>>   }
>
> That's only slightly shorter than the full expansion using global_options;
> I'd prefer using the full expansion.
>
> (Of course the ideal is to use explicit options pointers instead of
> global_options.  For example, if such a pointer were associated with the
> current function, it might make function-specific options handling a bit
> less fragile.)

Seconded - ideally the 'struct function' a pass is supposed to work
on would be passed as argument here and the options in effect
should be able to be extracted from it.  [that is, we shouldn't have
push/pop_cfun or cfun or current_function_decl at all]

Note that I think expanding this to global_options.x_flag_tree_vrp
wouldn't be a step forward.

Richard.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com
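For concreteness, a minimal model of the GCC_OPTION proposal quoted above; the struct and field names are reconstructed for illustration (GCC's real options structure is generated by the options machinery):

```cpp
#include <assert.h>

/* GCC_OPTION(flag_tree_vrp) would expand to the underlying field
   global_options.x_flag_tree_vrp, hiding the x_ prefix and the global
   behind the macro.  */
struct gcc_options
{
  int x_flag_tree_vrp;
};

static struct gcc_options global_options = { 1 };

#define GCC_OPTION(NAME) (global_options.x_##NAME)

static int
gate_vrp ()
{
  return GCC_OPTION (flag_tree_vrp) != 0;
}
```

Joseph's point is that this saves little over writing `global_options.x_flag_tree_vrp` directly, and neither form helps with function-specific options the way an explicit options pointer would.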


[PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian

2013-11-04 Thread Bill Schmidt
Hi,

This cleans up another case where a vector-pack operation needs to
reverse its operand order for little endian.  This fixes the last
remaining vector test failure in the test suite when building for little
endian.

Next I'll be spending some time looking for additional little endian
issues for which we don't yet have test coverage.  As an example, there
are several patterns similar to vect_pack_trunc_v2df that probably need
the same treatment as this patch, but we don't have any test cases that
expose a problem.  I need to verify those are broken and add test cases
when fixing them.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


2013-11-04  Bill Schmidt  

* config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
little endian.


Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 204320)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -830,7 +830,12 @@
 
   emit_insn (gen_vsx_xvcvdpsp (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpsp (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
 })
 




Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:02 AM, Ilya Enkovich  wrote:
> Hi,
>
> Here is a patch which handles the problem with optimization of 
> BUILT_IN_CHKP_ARG_BND calls.  Pointer Bounds Checker expects that the 
> argument of this call is a default SSA_NAME of the PARM_DECL whose 
> bounds we want to get.  The problem is in optimizations which may 
> replace the arg with its copy or a known value.  This patch suppresses 
> such modifications.

This doesn't seem like a good fix.  I suppose you require the same on
RTL, that is, have the incoming arg reg coalesced with the use?
In that case better set SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Richard.

> Thanks,
> Ilya
> --
>
> gcc/
>
> 2013-10-28  Ilya Enkovich  
>
> * tree-into-ssa.c: Include "target.h"
> (rewrite_update_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.
> * tree-ssa-dom.c: Include "target.h"
> (cprop_into_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.
>
>
> diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
> index 981e9f4..8d48f6d 100644
> --- a/gcc/tree-into-ssa.c
> +++ b/gcc/tree-into-ssa.c
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "diagnostic-core.h"
>  #include "tree-into-ssa.h"
> +#include "target.h"
>
>
>  /* This file builds the SSA form for a function as described in:
> @@ -1921,8 +1922,14 @@ rewrite_update_stmt (gimple stmt, gimple_stmt_iterator 
> gsi)
>  }
>
>/* Rewrite USES included in OLD_SSA_NAMES and USES whose underlying
> - symbol is marked for renaming.  */
> -  if (rewrite_uses_p (stmt))
> + symbol is marked for renaming.
> + Skip calls to BUILT_IN_CHKP_ARG_BND whose arg should never be
> + renamed.  */
> +  if (rewrite_uses_p (stmt)
> +  && !(flag_check_pointer_bounds
> +  && (gimple_code (stmt) == GIMPLE_CALL)
> +  && gimple_call_fndecl (stmt)
> +  == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND)))
>  {
>if (is_gimple_debug (stmt))
> {
> diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
> index 211bfcf..445278a 100644
> --- a/gcc/tree-ssa-dom.c
> +++ b/gcc/tree-ssa-dom.c
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "tree-ssa-threadedge.h"
>  #include "tree-ssa-dom.h"
> +#include "target.h"
>
>  /* This file implements optimizations on the dominator tree.  */
>
> @@ -2266,6 +2267,16 @@ cprop_into_stmt (gimple stmt)
>use_operand_p op_p;
>ssa_op_iter iter;
>
> +  /* Call used to obtain bounds of input arg by Pointer Bounds Checker
> + should not be optimized.  Argument of the call is a default
> + SSA_NAME of PARM_DECL.  It should never be replaced by value.  */
> +  if (flag_check_pointer_bounds && gimple_code (stmt) == GIMPLE_CALL)
> +{
> +  tree fndecl = gimple_call_fndecl (stmt);
> +  if (fndecl == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND))
> +   return;
> +}
> +
>FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE)
>  cprop_operand (stmt, op_p);
>  }


Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:24 AM, Ilya Enkovich  wrote:
> Hi,
>
> Here is a patch to support of instrumented code in calls verifiers and calls 
> copy with skipped args.
>
> Thanks,
> Ilya
> --
>
> gcc/
>
> 2013-10-29  Ilya Enkovich  
>
> * cgraph.c (gimple_check_call_args): Handle bound args.
> * gimple.c (gimple_call_copy_skip_args): Likewise.
> (validate_call): Likewise.
>
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 52d9ab0..9d7ae85 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -3030,40 +3030,54 @@ gimple_check_call_args (gimple stmt, tree fndecl, 
> bool args_count_match)
>  {
>for (i = 0, p = DECL_ARGUMENTS (fndecl);
>i < nargs;
> -  i++, p = DECL_CHAIN (p))
> +  i++)
> {
> - tree arg;
> + tree arg = gimple_call_arg (stmt, i);
> +
> + /* Skip bound args inserted by Pointer Bounds Checker.  */
> + if (POINTER_BOUNDS_P (arg))
> +   continue;
> +
>   /* We cannot distinguish a varargs function from the case
>  of excess parameters, still deferring the inlining decision
>  to the callee is possible.  */
>   if (!p)
> break;
> - arg = gimple_call_arg (stmt, i);
> +
>   if (p == error_mark_node
>   || arg == error_mark_node
>   || (!types_compatible_p (DECL_ARG_TYPE (p), TREE_TYPE (arg))
>   && !fold_convertible_p (DECL_ARG_TYPE (p), arg)))
>  return false;
> +
> + p = DECL_CHAIN (p);
> }
>if (args_count_match && p)
> return false;
>  }
>else if (parms)
>  {
> -  for (i = 0, p = parms; i < nargs; i++, p = TREE_CHAIN (p))
> +  for (i = 0, p = parms; i < nargs; i++)
> {
> - tree arg;
> + tree arg = gimple_call_arg (stmt, i);
> +
> + /* Skip bound args inserted by Pointer Bounds Checker.  */
> + if (POINTER_BOUNDS_P (arg))
> +   continue;
> +
>   /* If this is a varargs function defer inlining decision
>  to callee.  */
>   if (!p)
> break;
> - arg = gimple_call_arg (stmt, i);
> +
>   if (TREE_VALUE (p) == error_mark_node
>   || arg == error_mark_node
>   || TREE_CODE (TREE_VALUE (p)) == VOID_TYPE
>   || (!types_compatible_p (TREE_VALUE (p), TREE_TYPE (arg))
>   && !fold_convertible_p (TREE_VALUE (p), arg)))
>  return false;
> +
> + p = TREE_CHAIN (p);
> }
>  }
>else
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 20f6010..dc85bf8 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -3048,15 +3048,20 @@ canonicalize_cond_expr_cond (tree t)
>  gimple
>  gimple_call_copy_skip_args (gimple stmt, bitmap args_to_skip)
>  {
> -  int i;
> +  int i, bit;
>int nargs = gimple_call_num_args (stmt);
>vec vargs;
>vargs.create (nargs);
>gimple new_stmt;
>
> -  for (i = 0; i < nargs; i++)
> -if (!bitmap_bit_p (args_to_skip, i))
> -  vargs.quick_push (gimple_call_arg (stmt, i));
> +  for (i = 0, bit = 0; i < nargs; i++, bit++)
> +  if (POINTER_BOUNDS_P (gimple_call_arg (stmt, i)))
> +   {
> + if (!bitmap_bit_p (args_to_skip, --bit))
> +   vargs.quick_push (gimple_call_arg (stmt, i));
> +   }
> +  else if (!bitmap_bit_p (args_to_skip, bit))
> + vargs.quick_push (gimple_call_arg (stmt, i));

The new code is completely confusing.  You need to update
comments with what bits args_to_skip refers to and what happens
with pointer bounds.  I suppose the bitmap also contains
bound parameters as they are in calls (but not in fndecls -- I think
this discrepancy is still a recipe for disaster).

>if (gimple_call_internal_p (stmt))
>  new_stmt = gimple_build_call_internal_vec (gimple_call_internal_fn 
> (stmt),
> @@ -3702,6 +3707,9 @@ validate_call (gimple stmt, tree fndecl)
>if (!targs)
> return true;
>tree arg = gimple_call_arg (stmt, i);
> +  /* Skip bounds.  */
> +  if (flag_check_pointer_bounds && POINTER_BOUNDS_P (arg))
> +   continue;

Why check flag_check_pointer_bounds here but not elsewhere?

>if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
>   && INTEGRAL_TYPE_P (TREE_VALUE (targs)))
> ;


[Patch, testsuite] Require scheduling support in gcc.dg/superblock.c

2013-11-04 Thread Senthil Kumar Selvaraj
Hi,

gcc.dg/superblock.c uses the -fdump-rtl-sched2 option and then scans the 
resulting dump. This fails for the AVR target, as it doesn't have any
instruction scheduling support.

The attached patch makes this test require scheduling support in the
target. If ok, could someone commit please? I do not have commit access.

Regards
Senthil

gcc/testsuite

2013-11-04  Senthil Kumar Selvaraj  

* gcc.dg/superblock.c: Require scheduling support

diff --git gcc/testsuite/gcc.dg/superblock.c gcc/testsuite/gcc.dg/superblock.c
index 2b9aedf..272d161 100644
--- gcc/testsuite/gcc.dg/superblock.c
+++ gcc/testsuite/gcc.dg/superblock.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fno-asynchronous-unwind-tables -fsched2-use-superblocks 
-fdump-rtl-sched2 -fdump-rtl-bbro" } */
+/* { dg-require-effective-target scheduling } */
 
 typedef int aligned __attribute__ ((aligned (64)));
 extern void abort (void);


Re: Aliasing: look through pointer's def stmt

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 6:11 PM, Marc Glisse  wrote:
> On Wed, 30 Oct 2013, Richard Biener wrote:
>
>>> --- gcc/tree-ssa-alias.c(revision 204188)
>>> +++ gcc/tree-ssa-alias.c(working copy)
>>> @@ -567,20 +567,29 @@ void
>>>  ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
>>>  {
>>>HOST_WIDE_INT t1, t2;
>>>ref->ref = NULL_TREE;
>>>if (TREE_CODE (ptr) == SSA_NAME)
>>>  {
>>>gimple stmt = SSA_NAME_DEF_STMT (ptr);
>>>if (gimple_assign_single_p (stmt)
>>>   && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
>>> ptr = gimple_assign_rhs1 (stmt);
>>> +  else if (is_gimple_assign (stmt)
>>> +  && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
>>> +  && host_integerp (gimple_assign_rhs2 (stmt), 0)
>>> +  && (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) >= 0)
>>
>>
>> No need to restrict this to positive offsets I think.
>
>
> Here is the follow-up patch to remove this restriction (bootstrap+testsuite
> on x86_64-unknown-linux-gnu).
> I don't really know how safe it is.
>
> I couldn't use host_integerp(,1) since it returns false for (size_t)(-1) on
> x86_64, and host_integerp(,0) seemed strange, but I could use it if you
> want.

Well, host_integer_p (, 0) looks correct to me.  But ...

> Index: gcc/tree-ssa-alias.h
> ===
> --- gcc/tree-ssa-alias.h(revision 204267)
> +++ gcc/tree-ssa-alias.h(working copy)
> @@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct
>
>  extern void dump_pta_stats (FILE *);
>
>  extern GTY(()) struct pt_solution ipa_escaped_pt;
>
>  /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2]
> overlap.  SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the
> range is open-ended.  Otherwise return false.  */
>
>  static inline bool
> -ranges_overlap_p (unsigned HOST_WIDE_INT pos1,
> - unsigned HOST_WIDE_INT size1,
> - unsigned HOST_WIDE_INT pos2,
> - unsigned HOST_WIDE_INT size2)
> +ranges_overlap_p (HOST_WIDE_INT pos1,
> + HOST_WIDE_INT size1,
> + HOST_WIDE_INT pos2,
> + HOST_WIDE_INT size2)

I think size[12] should still be unsigned (we don't allow negative
sizes but only the special value -1U which we special-case only to
avoid pos + size to wrap)

Otherwise ok.

Thanks,
Richard.

>  {
>if (pos1 >= pos2
> -  && (size2 == (unsigned HOST_WIDE_INT)-1
> +  && (size2 == -1
>   || pos1 < (pos2 + size2)))
>  return true;
>if (pos2 >= pos1
> -  && (size1 == (unsigned HOST_WIDE_INT)-1
> +  && (size1 == -1
>   || pos2 < (pos1 + size1)))
>  return true;
>
>return false;
>  }



>
>
>  #endif /* TREE_SSA_ALIAS_H  */
>
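A standalone copy of the predicate, with the signedness Richard suggests (signed positions, unsigned sizes, `(unsigned)-1` meaning open-ended), behaves like this; `hwi`/`uhwi` stand in for `HOST_WIDE_INT`:

```cpp
#include <assert.h>

typedef long long hwi;               /* stand-in for HOST_WIDE_INT */
typedef unsigned long long uhwi;

/* Return true if [POS1, POS1+SIZE1) and [POS2, POS2+SIZE2) overlap.
   A size of (uhwi)-1 means the range is open-ended; special-casing it
   avoids pos + size wrapping around.  */
static bool
ranges_overlap_p (hwi pos1, uhwi size1, hwi pos2, uhwi size2)
{
  if (pos1 >= pos2 && (size2 == (uhwi) -1 || pos1 < pos2 + (hwi) size2))
    return true;
  if (pos2 >= pos1 && (size1 == (uhwi) -1 || pos2 < pos1 + (hwi) size1))
    return true;
  return false;
}
```

Keeping the positions signed is what lets the POINTER_PLUS_EXPR change accept negative offsets, while the sizes stay unsigned as before.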


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 05:57 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote:

On 11/01/2013 05:41 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);

The checking you are removing here.


What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual "does my parameter match" that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,


But when you convert all the source base for gimple calls, then you have 
context pretty much everywhere... you won't be doing it from a base 
class...  at least pretty infrequently.

I expect you'd do as_a which is not only further uglification, but has
runtime cost also for --enable-checking=release.


I'm not a fan of as_a<> or is_a<> at all.  I really dislike them.  They 
should be kept to the barest minimum, or even nonexistent if 
possible.  If you need them, you probably need to restructure the source; 
i.e., functions which work with switches on gimple_code() need to be 
restructured...  And the end result should look cleaner.


Of course, that's what makes it a big project too :-)

Andrew


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote:
> >>What checking?  There ought to be no checking at all in this
> >>example...  gimple_build_call_vec returns a gimple_call, and
> >>gimple_call_set_lhs()  doesn't have to check anything because it
> >>only accepts gimple_call's.. so there is no checking other than the
> >>usual "does my parameter match" that the compiler has to do...
> >and want to replace it by checking of the types at compile time.
> >The problem is that it uglifies the source too much, and, when you
> >actually don't have a gimple_call but supposedly a base class of it,
> 
> But when you convert all the source base for gimple calls, then you
> have context pretty much everywhere.. you wont be doing it from a
> base class...  at least pretty infrequently.

Usually you just have IL which contains various kinds of statements,
it can be a call, assign, dozens of other gimple stmt forms.
If you need to modify something in these, you'd need to uglify with
as_a/is_a if you require everybody using these setters/getters to use
gimple_call * rather than gimple, pointer to any kind of stmt.

Jakub
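
For readers following along, the checked downcast being debated looks
roughly like this. This is a toy hierarchy, not GCC's actual gimple
classes; is_a/as_a here are minimal stand-ins for the real templates.

```cpp
#include <assert.h>

/* Toy model of the proposal: statements form a class hierarchy, the
   typed setter needs no runtime GIMPLE_CHECK, and a caller holding only
   the base pointer pays for an as_a<> checked cast instead.  */
struct gimple_stmt { int code; };
struct gimple_call : gimple_stmt { void *lhs; };

enum { GIMPLE_CALL = 1 };

template <typename T> bool is_a (gimple_stmt *);

template <> bool
is_a<gimple_call> (gimple_stmt *gs)
{
  return gs->code == GIMPLE_CALL;
}

template <typename T> T *
as_a (gimple_stmt *gs)
{
  assert (is_a<T> (gs));        /* the runtime check  */
  return static_cast<T *> (gs);
}

/* The typed setter: the compiler enforces the argument type, so the
   body needs no check at all.  */
static void
gimple_call_set_lhs (gimple_call *gs, void *lhs)
{
  gs->lhs = lhs;
}
```

A caller that only has a `gimple_stmt *` must write
`gimple_call_set_lhs (as_a<gimple_call> (gs), lhs)`, which is the
uglification (and, if the check survives into release builds, the
runtime cost) being discussed above.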


Re: [PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian

2013-11-04 Thread David Edelsohn
On Mon, Nov 4, 2013 at 8:28 AM, Bill Schmidt
 wrote:
> Hi,
>
> This cleans up another case where a vector-pack operation needs to
> reverse its operand order for little endian.  This fixes the last
> remaining vector test failure in the test suite when building for little
> endian.
>
> Next I'll be spending some time looking for additional little endian
> issues for which we don't yet have test coverage.  As an example, there
> are several patterns similar to vect_pack_trunc_v2df that probably need
> the same treatment as this patch, but we don't have any test cases that
> expose a problem.  I need to verify those are broken and add test cases
> when fixing them.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2013-11-04  Bill Schmidt  
>
> * config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
> little endian.

This patch is okay.

Thanks, David


Re: [PATCH, rs6000] Re-permute source register for postreload splits of VSX LE stores

2013-11-04 Thread David Edelsohn
On Sat, Nov 2, 2013 at 7:44 PM, Bill Schmidt
 wrote:
> Hi,
>
> When I created the patch to split VSX loads and stores to add permutes
> so the register image was correctly little endian, I forgot to implement
> a known requirement.  When a VSX store is split after reload, we must
> reuse the source register for the permute, which leaves it in a swapped
> state after the store.  If the register remains live, this is invalid.
> After the store, we need to permute the register back to its original
> value.
>
> For each of the little endian VSX store patterns, I've replaced the
> define_insn_and_split with a define_insn and two define_splits, one for
> prior to reload and one for after reload.  The post-reload split has the
> extra behavior.
>
> I don't know of a way to set the insn's length attribute conditionally,
> so I'm optimistically setting this to 8, though it will be 12 in the
> post-reload case.  Is there any concern with that?  Is there a way to
> make it fully accurate?
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  This fixes two failing test cases.  Is this ok for trunk?
>
> Thanks!
> Bill
>
>
> 2013-11-02  Bill Schmidt  
>
> * config/rs6000/vsx.md (*vsx_le_perm_store_ for VSX_D):
> Replace the define_insn_and_split with a define_insn and two
> define_splits, with the split after reload re-permuting the source
> register to its original value.
> (*vsx_le_perm_store_ for VSX_W): Likewise.
> (*vsx_le_perm_store_v8hi): Likewise.
> (*vsx_le_perm_store_v16qi): Likewise.

This is okay, but you must change the insn length to the conservative,
larger value.  insn length is used to reverse branches when the
displacement is too large.

Thanks, David


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/04/2013 09:00 AM, Jakub Jelinek wrote:

On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote:

What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual "does my parameter match" that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,

But when you convert all the source base for gimple calls, then you
have context pretty much everywhere.. you wont be doing it from a
base class...  at least pretty infrequently.

Usually you just have IL which contains various kinds of statements,
it can be a call, assign, dozens of other gimple stmt forms.
If you need to modify something in these, you'd need to uglify with
as_a/is_a if you require everybody using these setters/getters to use
gimple_call * rather than gimple, pointer to any kind of stmt.

Jakub
In many cases splitting out the case body and typing the functions will 
surprisingly cover many of the uses required.
Proper subclassing will cover a lot too if the get/set methods are in 
the right classes.  In cases where none of this works...  well, there is 
an appropriate time for a virtual function...  That's really all those 
kinds of switches are implementing, but I won't get into that meta 
discussion right now :-).   We can save that for someday when there is 
actually a real proposal on the table.


Andrew



Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?

2013-11-04 Thread David Edelsohn
Hi, Steven

Thanks for investigating this. This presumably was the reason that
Vlad changed the constraint modifier for that pattern in his patch for
LRA.  I don't think that using memory is an improvement, but Mike is
the best person to comment.

Thanks, David

On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher  wrote:
> Hello,
>
> Today's powerpc64-linux gcc has 2 extra failures with -mlra vs. reload
> (i.e. svn unpatched).
>
> (I'm excluding guality failure differences here because there are too
> many of them that seem to fail at random after minimal changes
> anywhere in the compiler...).
>
> Test results are posted here:
> reload: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00128.html
> lra: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00129.html
>
> The new failures and total score is as follows (+=lra, -=reload):
> +FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
> +FAIL: gcc.target/powerpc/pr58330.c scan-assembler-not stwbrx
>
> === gcc Summary ===
>
> -# of expected passes   97887
> -# of unexpected failures   536
> +# of expected passes   97903
> +# of unexpected failures   538
>  # of unexpected successes  38
>  # of expected failures 244
> -# of unsupported tests 1910
> +# of unsupported tests 1892
>
>
> The failure of pr53199.c is because of different instruction selection
> for bswap. Test case is reduced to just one function:
>
> /* { dg-options "-O2 -mcpu=power6 -mavoid-indexed-addresses" } */
> long long
> reg_reverse (long long x)
> {
>   return __builtin_bswap64 (x);
> }
>
> Reload left vs. LRA right:
> reg_reverse:   reg_reverse:
> srdi 8,3,32  | addi 8,1,-16
> rlwinm 7,3,8,0x  | srdi 10,3,32
> rlwinm 9,8,8,0x  | addi 9,8,4
> rlwimi 7,3,24,0,7| stwbrx 3,0,8
> rlwimi 7,3,24,16,23  | stwbrx 10,0,9
> rlwimi 9,8,24,0,7| ld 3,-16(1)
> rlwimi 9,8,24,16,23  <
> sldi 7,7,32  <
> or 7,7,9 <
> mr 3,7   <
> blrblr
>
> This same difference is responsible for the failure of pr58330.c which
> also uses __builtin_bswap64().
>
> The difference in RTL for the test case is this (after reload vs. after LRA):
> -   11: {%7:DI=bswap(%3:DI);clobber %8:DI;clobber %9:DI;clobber %10:DI;}
> -   20: %3:DI=%7:DI
> +   20: %8:DI=%1:DI-0x10
> +   21: %8:DI=%8:DI  // stupid no-op move
> +   11: {[%8:DI]=bswap(%3:DI);clobber %9:DI;clobber %10:DI;clobber scratch;}
> +   19: %3:DI=[%1:DI-0x10]
>
> So LRA believes going through memory is better than using a register,
> even though obviously there are plenty of registers available.
>
> What LRA does:
>   Creating newreg=129
> Removing SCRATCH in insn #11 (nop 2)
>   Creating newreg=130
> Removing SCRATCH in insn #11 (nop 3)
>   Creating newreg=131
> Removing SCRATCH in insn #11 (nop 4)
> // at this point the insn would be a bswapdi2_64bit:
> //   11: {%3:DI=bswap(%3:DI);clobber r129;clobber r130;clobber r131;}
> // cost calculation for the insn alternatives:
> 0 Early clobber: reject++
> 1 Non-pseudo reload: reject+=2
> 1 Spill pseudo in memory: reject+=3
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
> 4 Scratch win: reject+=2
>   alt=0,overall=18,losers=1,rld_nregs=0
> 0 Non-pseudo reload: reject+=2
> 0 Spill pseudo in memory: reject+=3
> 0 Non input pseudo reload: reject++
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
>   alt=1,overall=16,losers=1,rld_nregs=0
> Staticly defined alt reject+=12
> 0 Early clobber: reject++
> 2 Scratch win: reject+=2
> 3 Scratch win: reject+=2
> 4 Scratch win: reject+=2
> 0 Conflict early clobber reload: reject--
>   alt=2,overall=24,losers=1,rld_nregs=0
>  Choosing alt 1 in insn 11:  (0) Z  (1) r  (2) &b  (3) &r  (4)
> X {*bswapdi2_64bit}
>   Change to class BASE_REGS for r129
>   Change to class GENERAL_REGS for r130
>   Creating newreg=132 from oldreg=3, assigning class NO_REGS to r132
>   Change to class NO_REGS for r131
>11: {r132:DI=bswap(%3:DI);clobber r129:DI;clobber r130:DI;clobber r131:DI;}
>   REG_UNUSED r131:DI
>   REG_UNUSED r130:DI
>   REG_UNUSED r129:DI
>
> LRA selects alternative 1 (Z,r,&b,&r,X) which seems to be the right
> choice, from looking at the constraints. Reload selects alternative 2
> which is slightly^2 discouraged: (??&r,r,&r,&r,&r).
>
> Is this an improvement or a regression? If it's an improvement then
> these two test cases should be adjusted :-)
>
> Ciao!
> Steven
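
For reference, the byte reversal that __builtin_bswap64 performs in the
reg_reverse test above can be written out generically. This sketch is
unrelated to either generated sequence in the diff; it only shows the
operation both of them implement.

```cpp
#include <assert.h>
#include <stdint.h>

/* Portable byte reversal of a 64-bit value: swap adjacent bytes, then
   adjacent 16-bit units, then the two 32-bit halves.  */
static uint64_t
bswap64 (uint64_t x)
{
  x = ((x & 0x00ff00ff00ff00ffULL) << 8)
      | ((x >> 8) & 0x00ff00ff00ff00ffULL);
  x = ((x & 0x0000ffff0000ffffULL) << 16)
      | ((x >> 16) & 0x0000ffff0000ffffULL);
  return (x << 32) | (x >> 32);
}
```

The rlwinm/rlwimi sequence on the reload side and the stwbrx
(store-word-byte-reversed) sequence on the LRA side are just two machine
encodings of this same function; the dispute is purely about which one
register allocation should pick.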


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Kenneth Zadeck

On 11/04/2013 04:12 AM, Richard Biener wrote:

On Fri, 1 Nov 2013, Richard Sandiford wrote:


I'm building one target for each supported CPU and comparing the wide-int
assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
output from the merge point.  This patch removes all the differences I saw
for alpha-linux-gnu in gcc.c-torture.

Hunk 1: Preserve the current trunk behaviour in which the shift count
is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
inspection after hunk 5.

Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
types and where we weren't extending according to the sign of the source.
We should probably assert that the input is at least as wide as the type...

Hunk 4: The "&" in:

  if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
  && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)

had got dropped during the conversion.

Hunk 5: The old code was:

  if (shift > 0)
{
  *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
  *val = r1val.llshift (shift, TYPE_PRECISION (type));
}
  else if (shift < 0)
{
  shift = -shift;
  *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
  *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
}

and these precision arguments had two purposes: to control which
bits of the first argument were shifted, and to control the truncation
mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
for the second.

(BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
end of the !GENERATOR_FILE list in rtl.def, since the current position caused
some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
on code, RTL PRE creates registers in the hash order of the associated
expressions, RA uses register numbers as a tie-breaker during ordering,
and so the order of rtx_code can indirectly influence register allocation.
First time I'd realised that could happen, so just thought I'd mention it.
I think we should keep rtl.def in the current (logical) order though.)

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?

Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
from double-int.c on the trunk?  If not then putting this at the
callers of wi::rshift and friends is clearly asking for future
mistakes.

Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
than in machine instruction patterns was a mistake in the past.

I disagree.  The programmer has the right to expect that the optimizer 
transforms the program in a manner that is consistent, but faster, with 
the underlying machine.   Saying you are going to do it in the patterns 
is not realistic.


I should point out that I had originally put the shift-count truncation 
inside wide-int and you (richi) told me to push it back out to the 
callers.   I personally think that it should be returned to wide-int to 
assure consistent behavior.


However, I would also say that it would be nice if it were actually done 
right.   There are (as far as I know) actually three ways of handling shift 
amounts that have been used in the past:


(1) mask the shift amount by (bitsize - 1).   This is the most popular 
and is what is done if you set SHIFT_COUNT_TRUNCATED.
(2) mask the shift amount by ((2*bitsize) -1).   There are a couple of 
architectures that do this including the ppc.   However, gcc has never 
done it correctly for these platforms.
(3) assume that the shift amount can be a big number.  The only 
machine that i know of that ever did this was the Knuth's mmix, and 
AFAIK, it has never "known" silicon.   (it turns out that this is almost 
impossible to build efficiently).   And yet, we support this if you do 
not set SHIFT_COUNT_TRUNCATED.
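
The three behaviors can be mimicked for 32-bit values as follows. These
are illustrative helpers, not GCC code, and the masks are written out
explicitly so the C++ itself stays well defined.

```cpp
#include <assert.h>
#include <stdint.h>

/* (1) SHIFT_COUNT_TRUNCATED style: the count is reduced mod the
   bitsize, as e.g. x86's 32-bit shifts do.  */
static uint32_t
shl_mask_bits (uint32_t x, unsigned n)
{
  return x << (n & 31);
}

/* (2) Mask by (2 * bitsize - 1): counts in [32, 63] are representable
   and shift every bit out, roughly what PowerPC's slw does.  */
static uint32_t
shl_mask_2bits (uint32_t x, unsigned n)
{
  n &= 63;
  return n >= 32 ? 0 : x << n;
}

/* (3) "Big number" semantics: any count >= bitsize yields zero,
   modelled here as the mathematically natural result.  */
static uint32_t
shl_big (uint32_t x, unsigned n)
{
  return n >= 32 ? 0 : x << n;
}
```

The three are observably different: for a count of 33, (1) gives a shift
by 1 while (2) and (3) give zero, and (2) diverges from (3) again at
counts of 64 and beyond, where its mask wraps back into range.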
I will, if you like, fix this properly.   But I want to see a plan that 
at least richi and richard are happy with first.  My gut feeling is to 
provide a target hook that takes a bitsize and a shift value as a wide-int 
(or possibly a hwi) and returns a machine-specific shift amount.  Then 
I would put that inside the shift routines of wide-int.


kenny







Re: libsanitizer merge from upstream r191666

2013-11-04 Thread Konstantin Serebryany
+glider, our Mac expert.

Hi Jack,

This patch has not been tested on Mac and we generally do not claim
that gcc-asan is supported on Mac.
clang-asan *is* supported on Mac and our bots are green (this patch is
a merge of the sources which are regularly tested on Mac,
but the build procedure is different).

The failing assertion is weird, I haven't seen it since last year
(https://code.google.com/p/address-sanitizer/issues/detail?id=117),
but Alexander should know more.

What is your version of Mac OS?

--kcc

On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth  wrote:
> On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote:
>> Jakub,
>>
>> This works!
>> New patch attached.
>
> Konstantin,
>This patch, when applied on top of r204305, bootstraps fine on 
> x86_64-apple-darwin12 but
> produces a lot of regressions with...
>
> make -k check RUNTESTFLAGS="asan.exp --target_board=unix'{-m64}'"
>
> Native configuration is x86_64-apple-darwin12.5.0
>
> === g++ tests ===
>
>
> Running target unix/-m64
> FAIL: c-c++-common/asan/global-overflow-1.c  -O0  execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O1  execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O2  execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -fomit-frame-pointer  
> execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -Os  execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto -flto-partition=none  
> execution test
> FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O0  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O1  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O2  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -fomit-frame-pointer  
> execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -Os  execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto -flto-partition=none  
> execution test
> FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O0  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O1  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O2  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O3 -fomit-frame-pointer  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -Os  execution test
> FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto -flto-partition=none  execution 
> test
> FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O0  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O1  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O2  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O3 -fomit-frame-pointer  execution 
> test
> FAIL: c-c++-common/asan/null-deref-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -Os  execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto -flto-partition=none  
> execution test
> FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O0  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O1  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -fomit-frame-pointer  
> execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -Os  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto 
> -flto-partition=none  execution test
> FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2  execution test
> FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2 -flto 
> -flto-partition=none  execution test
> FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O0  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O1  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O2  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O3 -fomit-frame-pointer  
> execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O3 -g  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -Os  execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O2 -flto -flto-partition=none  
> execution test
> FAIL: c-c++-common/asan/stack-overflow-1.c  -O2 -flto  execution test
> FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2  execution test
> FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2 -flto -flto-partition=none 
>  execution test
> FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2 -flto 

Ping^3 Re: Add predefined macros for library use in defining __STDC_IEC_559*

2013-11-04 Thread Joseph S. Myers
Ping^3.  The addition of a new target hook in this patch 
 is still pending 
review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Ping^3 Re: Add predefined macros for library use in defining __STDC_IEC_559*

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 02:51:37PM +, Joseph S. Myers wrote:
> Ping^3.  The addition of a new target hook in this patch 
>  is still pending 
> review.

Ok.

Jakub


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Kenneth Zadeck  writes:
> However, i would also say that it would be nice if were actually done 
> right.   There are (as far as i know) actually 3 ways handling shift 
> amounts that have been used in the past:
>
> (1) mask the shift amount by (bitsize - 1).   This is the most popular 
> and is what is done if you set SHIFT_COUNT_TRUNCATED.
> (2) mask the shift amount by ((2*bitsize) -1).   There are a couple of 
> architectures that do this including the ppc.   However, gcc has never 
> done it correctly for these platforms.

There's TARGET_SHIFT_TRUNCATION_MASK for these cases.  It applies only
to the shift instruction patterns though, not to any random rtl shift
expression.

Part of the reason for this mess is that the things represented as
rtl shifts didn't necessarily start out as shifts.  E.g. during combine,
extractions are rewritten as shifts, optimised, then converted back.
So since SHIFT_COUNT_TRUNCATED applies to all rtl shift expressions,
it also has to apply to all bitfield operations, since bitfield operations
temporarily appear as shifts (see tm.texi).  TARGET_SHIFT_TRUNCATION_MASK
deliberately limits the scope to the shift instruction patterns because
that's easier to contain.

E.g. knowing that shifts operate to a certain mask allows you
to generate more efficient doubleword shift sequences, which was why
TARGET_SHIFT_TRUNCATION_MASK was added in the first place.
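
As a sketch of why that matters: a 64-bit shift synthesized from 32-bit
operations needs explicit count masking to be well defined, and on a
target whose shifter is known to truncate counts to 5 bits those masks
come for free. This is a hypothetical illustration, not an actual GCC
expansion sequence.

```cpp
#include <assert.h>
#include <stdint.h>

/* Shift the 64-bit value HI:LO left by N (0 <= N < 64) using only
   32-bit shifts.  Where "& 31" appears, a backend that knows the
   hardware shifter already truncates the count can omit the masking
   instruction -- the saving TARGET_SHIFT_TRUNCATION_MASK advertises.  */
static uint64_t
shl64_on_32 (uint32_t hi, uint32_t lo, unsigned n)
{
  uint32_t nhi, nlo;
  if (n & 32)
    {
      /* Counts 32..63: the low word supplies the whole high word.  */
      nhi = lo << (n & 31);
      nlo = 0;
    }
  else if (n)
    {
      /* Counts 1..31: the high word gets bits from both inputs.  */
      nhi = (hi << (n & 31)) | (lo >> (32 - n));
      nlo = lo << (n & 31);
    }
  else
    {
      nhi = hi;
      nlo = lo;
    }
  return ((uint64_t) nhi << 32) | nlo;
}
```
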

> (3) assume that the shift amount can be a big number.The only 
> machine that i know of that ever did this was the Knuth's mmix, and 
> AFAIK, it has never "known" silicon.   (it turns out that this is almost 
> impossible to build efficiently).   And yet, we support this if you do 
> not set SHIFT_COUNT_TRUNCATED.

Well, we support it in the sense that we make no assumptions about what
happens, at least at the rtl level.  We just punt on any out-of-range shift
and leave it to be evaluated at run time.

Thanks,
Richard



Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Bernd Edlinger  writes:


> I see another "read_line" at gcov.c, which seems to be a copy.
>
> Maybe this should be changed too?

I have seen it as well.

I'd rather have the patch be reviewed and everything, and only then
propose to share the implementation with the gcov module.

-- 
Dodji


[omp4]

2013-11-04 Thread Aldy Hernandez

Hi.

While looking over some of your testcases I noticed that array 
subscripts are not being properly adjusted:


foo(int i) {
array[i] = 555;
}

The `i' should use the simd array magic instead of ICEing :).

Is the attached patch OK for the branch?

Aldy
gcc/ChangeLog.gomp

* omp-low.c (ipa_simd_modify_function_body): Adjust tree operands
on the LHS of an assign.
(ipa_simd_modify_function_body_ops_1): New.
(ipa_simd_modify_function_body_ops): New.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d30fb17..94058af 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11049,6 +11049,35 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
   return seq;
 }
 
+static tree
+ipa_simd_modify_function_body_ops_1 (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  ipa_parm_adjustment_vec *adjustments = (ipa_parm_adjustment_vec *) wi->info;
+  tree t = *tp;
+
+  if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
+return (tree) sra_ipa_modify_expr (tp, true, *adjustments);
+  else
+*walk_subtrees = 1;
+  return NULL_TREE;
+}
+
+/* Helper function for ipa_simd_modify_function_body.  Make any
+   necessary adjustments for tree operators.  */
+
+static bool
+ipa_simd_modify_function_body_ops (tree *tp,
+  ipa_parm_adjustment_vec *adjustments)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = adjustments;
+  bool res = (bool) walk_tree (tp, ipa_simd_modify_function_body_ops_1,
+  &wi, NULL);
+  return res;
+}
+
 /* Traverse the function body and perform all modifications as
described in ADJUSTMENTS.  At function return, ADJUSTMENTS will be
modified such that the replacement/reduction value will now be an
@@ -11121,6 +11150,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
case GIMPLE_ASSIGN:
  t = gimple_assign_lhs_ptr (stmt);
  modified |= sra_ipa_modify_expr (t, false, adjustments);
+
+ /* The LHS may have operands that also need adjusting
+(e.g. `foo' in array[foo]).  */
+ modified |= ipa_simd_modify_function_body_ops (t, &adjustments);
+
  for (i = 0; i < gimple_num_ops (stmt); ++i)
{
  t = gimple_op_ptr (stmt, i);
diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c
new file mode 100644
index 000..8818594
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp" } */
+
+/* Test that array subscripts are properly adjusted.  */
+
+int array[1000];
+#pragma omp declare simd notinbranch simdlen(4)
+void foo (int i)
+{
+  array[i] = 555;
+}


Re: [omp4] fix array subscripts in simd clones

2013-11-04 Thread Aldy Hernandez

On 11/04/13 08:19, Aldy Hernandez wrote:

Hi.

While looking over some of your testcases I noticed that array
subscripts are not being properly adjusted:

foo(int i) {
 array[i] = 555;
}

The `i' should use the simd array magic instead of ICEing :).

Is the attached patch OK for the branch?

Aldy


Ughh, my apologies for the lack of subject.  Fixed in this followup.


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jason Merrill

On 11/04/2013 07:41 AM, Richard Biener wrote:

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...


I found it to be useless noise when not bootstrapping, since my system 
compiler isn't likely to recognize recently added % codes any time soon.


Jason



Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Jason Merrill

On 11/04/2013 05:50 AM, Marek Polacek wrote:

  case RO_ARROW:
+case RO_ARROW_STAR:
error_at (loc,
"invalid type argument of %<->%> (have %qT)",
type);


I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->.

Jason



Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Jakub Jelinek  writes:

[...]

> Eventually, I think using int for sizes is just a ticking bomb, what if
> somebody uses > 2GB long lines?  Surely, on 32-bit hosts we are unlikely to
> handle it, but why couldn't a 64-bit host handle it?  Column info may be bogus
> in there, sure, but at least we should not crash or overflow buffers on it
> ;).  Anyway, not something needed to be fixed right now, but in the future
> it would be nicer to use size_t and/or ssize_t here.

Yes.  I initially tried to use size_t but found that I'd need to modify
several other places to shut down many warnings because these places were
using int :-(.  So I felt that would be a battle for later.

But I am adding this to my TODO.  I'll send a patch later that
changes this to size_t then, and adjusts the other places that need it
as well.

[...]

>>context->last_location = diagnostic->location;
>>s = expand_location_to_spelling_point (diagnostic->location);
>> -  line = location_get_source_line (s);
>> +  line = location_get_source_line (s, line_width);
>
> I think richi didn't like C++ reference arguments to be used that way (and
> perhaps guidelines don't either), because it isn't immediately obvious
> that line_width is modified by the call.  Can you change it to a pointer
> argument instead and pass &line_width?

Sure.  I have done the change in the patch below.  Sorry for this
reflex.  I tend to use pointers like these only in places where we can
allow them to be NULL.

> XNEWVEC or XRESIZEVEC will never return NULL though, so it doesn't have
> to be tested.  Though, the question is if that is what we want, caret
> diagnostics should be optional, if we can't print it, we just won't.

Hmmh.  This particular bug was noticed because of the explicit OOM
message displayed by XNEWVEC/XRESIZEVEC; otherwise, I bet this could
have just fallen through the cracks for a little longer.  So I'd say let's
just use XNEWVEC/XRESIZEVEC and remove the test, as you first
suggested.  The caret diagnostics functionality as a whole can be
disabled with -fno-diagnostic-show-caret.
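
The NUL-robust line reading being discussed can be sketched in
isolation. This is a simplified stand-in, not the patch's actual
read_line/get_line; the function name and the trailing-NUL caveat below
are mine.

```cpp
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line with fgets and return its length even when it contains
   embedded '\0' bytes -- strlen would stop at the first NUL and
   under-report.  Returns -1 at EOF.  */
static long
get_line_len (char *buf, size_t bufsize, FILE *fp)
{
  memset (buf, 0, bufsize);          /* known state before fgets  */
  if (!fgets (buf, (int) bufsize, fp))
    return -1;
  /* fgets stops after '\n' (or at EOF); search the whole buffer for
     the newline, skipping past any embedded NULs.  */
  char *nl = (char *) memchr (buf, '\n', bufsize);
  if (nl)
    return (long) (nl - buf) + 1;
  /* No newline: length is the offset past the last non-NUL byte.
     Caveat: a final newline-less line that genuinely ends in NUL
     bytes is still under-counted here, which is one reason the real
     patch uses a dedicated get_line helper instead.  */
  long i = (long) bufsize;
  while (i > 0 && buf[i - 1] == '\0')
    i--;
  return i;
}
```
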


[...]

> Otherwise, LGTM.

Thanks.

So here is the patch that bootstraps.

gcc/ChangeLog:

* input.h (location_get_source_line): Take an additional line_size
parameter by reference.
* input.c (get_line): New static function definition.
(read_line): Take an additional line_length output parameter to be
set to the size of the line.  Use the new get_line function to
compute the size of the line returned by fgets, rather than using
strlen.  Ensure that the buffer is initially zeroed; ensure that
when growing the buffer too.
(location_get_source_line): Take an additional output line_len
parameter.  Update the use of read_line to pass it the line_len
parameter.
* diagnostic.c (adjust_line): Take an additional input parameter
for the length of the line, rather than calculating it with
strlen.
(diagnostic_show_locus): Adjust the use of
location_get_source_line and adjust_line with respect to their new
signature.  While displaying a line now, do not stop at the first
null byte.  Rather, display the zero byte as a space and keep
going until we reach the size of the line.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c   |  17 ++--
 gcc/input.c| 100 ++---
 gcc/input.h|   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 99 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
MAX_WIDTH by some margin, then adjust the start of the line such
that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
   const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context,

Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Paolo Carlini

On 11/04/2013 04:34 PM, Jason Merrill wrote:

On 11/04/2013 07:41 AM, Richard Biener wrote:

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...


I found it to be useless noise when not bootstrapping, since my system 
compiler isn't likely to recognize recently added % codes any time soon.

Likewise here.

Paolo.


Re: [omp4]

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 08:19:12AM -0700, Aldy Hernandez wrote:
> Hi.
> 
> While looking over some of your testcases I noticed that array
> subscripts are not being properly adjusted:
> 
> foo(int i) {
>   array[i] = 
> }
> 
> The `i' should use the simd array magic instead of ICEing :).
> 
> Is the attached patch OK for the branch?

I guess short time yes, but I wonder if it wouldn't be better to use
walk_gimple_op and do all changes in the callback.  Instead of passing
adjustments pass around a struct containing the adjustments, current stmt
and the modified flag.  You can use the val_only and is_lhs to determine
what you need to do (probably need to reset those two for the subtrees
to val_only = true, is_lhs = false and not walk subtrees of types) and you
could (if not val_only) immediately gimplify it properly (insert temporary
SSA_NAME setter before resp. store after depending on is_lhs).
Then you could avoid the regimplification.

Jakub


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 10:34:55AM -0500, Jason Merrill wrote:
> On 11/04/2013 07:41 AM, Richard Biener wrote:
> >Please make sure the warning is preserved when not bootstrapping,
> >it's useful to have them in your dev-tree and not notice errors only
> >during -Werror bootstrap ...
> 
> I found it to be useless noise when not bootstrapping, since my
> system compiler isn't likely to recognize recently added % codes any
> time soon.

That's true, but it is easy to ignore those, while without -Wformat
you won't notice until you bootstrap that you actually passed incorrect
parameters to some *printf etc. family function.  Generally, in
non-bootstrap compilers I tend to ignore warnings except for the
file(s) I've been hacking on.

Jakub


Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian

2013-11-04 Thread Richard Sandiford
Bill Schmidt  writes:
> +   /* We need this to be vmulouh for both big and little endian,
> +  but for little endian we would swap this, so avoid that.  */
> +   if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> +   else
> + emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));

FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
define_expands and have define_insns for the underlying instructions.
E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
depending on endianness.

Then the unspec name would always match the instruction, and you could
also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.

It probably works out as more code overall, but maybe it means jumping
through fewer mental hoops...

Thanks,
Richard



Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Marek Polacek
On Mon, Nov 04, 2013 at 10:36:35AM -0500, Jason Merrill wrote:
> On 11/04/2013 05:50 AM, Marek Polacek wrote:
> >  case RO_ARROW:
> >+case RO_ARROW_STAR:
> >error_at (loc,
> > "invalid type argument of %<->%> (have %qT)",
> > type);
> 
> I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->.

Ah, sorry about that.  What about this one?

2013-11-04  Marek Polacek  

PR c++/58979
c-family/
* c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case.
testsuite/
* g++.dg/diagnostic/pr58979.C: New test.

--- gcc/c-family/c-common.c.mp  2013-11-04 16:44:48.674023654 +0100
+++ gcc/c-family/c-common.c 2013-11-04 16:45:52.632267904 +0100
@@ -9934,6 +9934,11 @@ invalid_indirection_error (location_t lo
"invalid type argument of %<->%> (have %qT)",
type);
   break;
+case RO_ARROW_STAR:
+  error_at (loc,
+   "invalid type argument of %<->*%> (have %qT)",
+   type);
+  break;
 case RO_IMPLICIT_CONVERSION:
   error_at (loc,
"invalid type argument of implicit conversion (have %qT)",
--- gcc/c-family/c-common.c.mp — (see above); testsuite file follows:
--- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2   2013-11-04 11:09:30.515553664 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr58979.C   2013-11-04 11:09:26.505538785 +0100
@@ -0,0 +1,4 @@
+// PR c++/58979
+// { dg-do compile }
+
+int i = 0->*0; // { dg-error "invalid type argument of" }

Marek


Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Jason Merrill

OK.

Jason


Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-11-04 Thread David Edelsohn
Balaji,

I am seeing a large number of libcilkrts failures on AIX.  These all
are of the form:

Executing on host: /tmp/20131103/gcc/xgcc -B/tmp/20131103/gcc/ /nasfarm/edelsohn
/src/src/gcc/testsuite/c-c++-common/cilk-plus/CK/sync_wo_spawn.c  -fno-diagnosti
cs-show-caret -fdiagnostics-color=never   -O0 -flto -g -fcilkplus -L/tmp/2013110
3/powerpc-ibm-aix7.1.0.0/./libcilkrts/.libs -fcilkplus -S  -o sync_wo_spawn.s
 (timeout = 300)
cc1: error: LTO support has not been enabled in this configuration
compiler exited with status 1

If the feature / functionality requires LTO, the tests should check
that LTO is supported with check_effective_target_lto, either in each
relevant test or in cilk-plus.exp.

Thanks, David


Re: [omp4]

2013-11-04 Thread Aldy Hernandez

On 11/04/13 08:44, Jakub Jelinek wrote:

On Mon, Nov 04, 2013 at 08:19:12AM -0700, Aldy Hernandez wrote:

Hi.

While looking over some of your testcases I noticed that array
subscripts are not being properly adjusted:

foo(int i) {
array[i] = 
}

The `i' should use the simd array magic instead of ICEing :).

Is the attached patch OK for the branch?


I guess short time yes, but I wonder if it wouldn't be better to use
walk_gimple_op and do all changes in the callback.  Instead of passing
adjustments pass around a struct containing the adjustments, current stmt
and the modified flag.  You can use the val_only and is_lhs to determine
what you need to do (probably need to reset those two for the subtrees
to val_only = true, is_lhs = false and not walk subtrees of types) and you
could (if not val_only) immediately gimplify it properly (insert temporary
SSA_NAME setter before resp. store after depending on is_lhs).
Then you could avoid the regimplification.



You mean rewrite the entire ipa_simd_modify_function_body() with 
walk_gimple_op and friends, or just the bits I suggested with the above 
patch (the LHS operands in a GIMPLE_ASSIGN)?


Aldy



patch to fix pr58968

2013-11-04 Thread Vladimir Makarov
  The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58968


  It is actually the 3rd try to fix the problem.  Removing no-op moves
may result in mode-switching crashes on x86/x86-64 as it expects a
specific code (by the way, reload has the same problem on x86/x86-64).

  The patch was successfully bootstrapped and tested on x86/x86-64 and
ppc64 (with LRA).

  Committed as rev. 204353.

2013-11-04  Vladimir Makarov  

PR rtl-optimization/58968
* lra-spills.c (return_regno_p): New function.
  (lra_final_code_change): Use it.

2013-11-04  Vladimir Makarov  

PR rtl-optimization/58968
* gfortran.dg/pr58968.f: New

Index: lra-spills.c
===================================================================
--- lra-spills.c	(revision 204352)
+++ lra-spills.c	(working copy)
@@ -618,6 +618,33 @@ alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
+/* Return true if REGNO is used for return in the current
+   function.  */
+static bool
+return_regno_p (unsigned int regno)
+{
+  rtx outgoing = crtl->return_rtx;
+
+  if (! outgoing)
+return false;
+
+  if (REG_P (outgoing))
+return REGNO (outgoing) == regno;
+  else if (GET_CODE (outgoing) == PARALLEL)
+{
+  int i;
+
+  for (i = 0; i < XVECLEN (outgoing, 0); i++)
+	{
+	  rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
+
+	  if (REG_P (x) && REGNO (x) == regno)
+	return true;
+	}
+}
+  return false;
+}
+
 /* Final change of pseudos got hard registers into the corresponding
hard registers and removing temporary clobbers.  */
 void
@@ -625,7 +652,7 @@ lra_final_code_change (void)
 {
   int i, hard_regno;
   basic_block bb;
-  rtx insn, curr, set;
+  rtx insn, curr;
   int max_regno = max_reg_num ();
 
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
@@ -636,7 +663,6 @@ lra_final_code_change (void)
 FOR_BB_INSNS_SAFE (bb, insn, curr)
   if (INSN_P (insn))
 	{
-	  bool change_p;
 	  rtx pat = PATTERN (insn);
 
 	  if (GET_CODE (pat) == CLOBBER && LRA_TEMP_CLOBBER_P (pat))
@@ -649,12 +675,24 @@ lra_final_code_change (void)
 	  continue;
 	}
 
-	  set = single_set (insn);
-	  change_p = (set != NULL
-		  && REG_P (SET_SRC (set)) && REG_P (SET_DEST (set))
-		  && REGNO (SET_SRC (set)) >= FIRST_PSEUDO_REGISTER
-		  && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER);
-	  
+	  /* IRA can generate move insns involving pseudos.  It is
+	 better remove them earlier to speed up compiler a bit.
+	 It is also better to do it here as they might not pass
+	 final RTL check in LRA, (e.g. insn moving a control
+	 register into itself).  So remove a useless move insn
+	 unless next insn is USE marking the return reg (we should
+	 save this as some subsequent optimizations assume that
+	 such original insns are saved).  */
+	  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
+	  && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
+	  && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
+	  && ! return_regno_p (REGNO (SET_SRC (pat))))
+	{
+	  lra_invalidate_insn_data (insn);
+	  delete_insn (insn);
+	  continue;
+	}
+	
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
 	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
@@ -668,20 +706,5 @@ lra_final_code_change (void)
 	  }
 	  if (insn_change_p)
 	lra_update_operator_dups (id);
-
-	  if (change_p && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set)))
-	{
-	  /* Remove an useless move insn but only involving
-		 pseudos as some subsequent optimizations are based on
-		 that move insns involving originally hard registers
-		 are preserved.  IRA can generate move insns involving
-		 pseudos.  It is better remove them earlier to speed
-		 up compiler a bit.  It is also better to do it here
-		 as they might not pass final RTL check in LRA,
-		 (e.g. insn moving a control register into
-		 itself).  */
-	  lra_invalidate_insn_data (insn);
-	  delete_insn (insn);
-	}
 	}
 }
Index: testsuite/gfortran.dg/pr58968.f
===================================================================
--- testsuite/gfortran.dg/pr58968.f	(revision 0)
+++ testsuite/gfortran.dg/pr58968.f	(working copy)
@@ -0,0 +1,96 @@
+C PR rtl-optimization/58968.f
+C { dg-do compile { target powerpc*-*-*} }
+C { dg-options "-mcpu=power7 -O3 -w -ffast-math  -funroll-loops" }
+  SUBROUTINE MAKTABS(IW,SOME,LBOX1,LBOX2,LBOX3,NSPACE,NA,NB,
+ *LBST,X,
+ *NX,IAMA,IAMI,IBMA,IBMI,MNUM,IDIM,MSTA,IBO,
+ *IDSYM,ISYM1,NSYM,
+ *NACT,LWRK,KTAB,LGMUL,
+ *LCON,LCOA,LCOB,
+ *LANDET,LBNDET,NAST,NBST,LSYMA,LSYMB,LGCOM,
+ *MINI,MAXI,LSPA,LSPB,LDISB,
+ *LSAS,LSBS,LSAC,LSBC,
+ *ITGA,ITGB,IAST,IBST,NCI,NA1EX,NB1EX,FDIRCT)
+  IMPLICIT DOUBLE PRECISION(A-H,O-Z)
+  LOGICAL SOME
+  DIMENSION

Re: [omp4]

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 09:22:39AM -0700, Aldy Hernandez wrote:
> Ok, then let's leave this for later.  I'm following what we've done
> in tree-sra.c to be consistent.  Perhaps later we could even clean
> up ipa_sra_modify_function_body in tree-sra.c.  But for now I think
> we can concentrate on the basic functionality.

Well, ipa_sra_modify_function_body doesn't need to regimplify, unlike
the omp-low.c version thereof (supposedly because SRA only modifies
loads/stores, but doesn't change SSA_NAMEs on arithmetics etc.).
But yeah, this isn't requirement for hacking on this on gomp-4_0-branch, but
it would be nice to clean it up before merging to trunk.

> Committing to omp4 branch.

Thanks.

Jakub


Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian

2013-11-04 Thread Bill Schmidt
On Mon, 2013-11-04 at 15:48 +, Richard Sandiford wrote:
> Bill Schmidt  writes:
> > +   /* We need this to be vmulouh for both big and little endian,
> > +  but for little endian we would swap this, so avoid that.  */
> > +   if (BYTES_BIG_ENDIAN)
> > + emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> > +   else
> > + emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));
> 
> FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
> define_expands and have define_insns for the underlying instructions.
> E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
> depending on endianness.
> 
> Then the unspec name would always match the instruction, and you could
> also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.
> 
> It probably works out as more code overall, but maybe it means jumping
> through fewer mental hoops...

Good idea.  I'll have a look and produce a new patch set shortly.

Thanks,
Bill

> 
> Thanks,
> Richard



Re: Aliasing: look through pointer's def stmt

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


Well, host_integer_p (, 0) looks correct to me.  But ...


Ok, I'll put it back.


Index: gcc/tree-ssa-alias.h
===================================================================
--- gcc/tree-ssa-alias.h(revision 204267)
+++ gcc/tree-ssa-alias.h(working copy)
@@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct

 extern void dump_pta_stats (FILE *);

 extern GTY(()) struct pt_solution ipa_escaped_pt;

 /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2]
overlap.  SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the
range is open-ended.  Otherwise return false.  */

 static inline bool
-ranges_overlap_p (unsigned HOST_WIDE_INT pos1,
- unsigned HOST_WIDE_INT size1,
- unsigned HOST_WIDE_INT pos2,
- unsigned HOST_WIDE_INT size2)
+ranges_overlap_p (HOST_WIDE_INT pos1,
+ HOST_WIDE_INT size1,
+ HOST_WIDE_INT pos2,
+ HOST_WIDE_INT size2)


I think size[12] should still be unsigned (we don't allow negative
sizes, only the special value -1U, which we special-case solely to
avoid pos + size wrapping)


But all we do with size[12] is compare it to -1 and perform arithmetic 
with pos[12]. At least for the second one, we need to cast size to a 
signed type so the comparisons make sense (unless we decide to use a 
double_int there). So I thought I would do the cast in the argument. 
Otherwise, I'll start the function with:

HOST_WIDE_INT ssize1 = (HOST_WIDE_INT)size1;
and replace size1 with ssize1 throughout the function.

Is the reason of keeping size[12] unsigned for documentation? Or am I 
missing a reason why I may be breaking things?


--
Marc Glisse


[C++ Patch, Java related/RFC] PR 11006

2013-11-04 Thread Paolo Carlini

Hi,

we have got this very old bug, where we ICE in build_java_class_ref 
because TYPE is an INTEGER_TYPE and we try to use TYPE_FIELDS on it. 
Shall we apply something like the below and resolve it for good? 
Alternately, we could maybe provide the same message we currently 
provide in release-builds, when we get to:


if (!field)
  {
    error ("can%'t find %<class$%> in %qT", type);
return error_mark_node;
  }

Thanks!
Paolo.

//

Index: cp/init.c
===================================================================
--- cp/init.c   (revision 204353)
+++ cp/init.c   (working copy)
@@ -2461,9 +2461,16 @@ build_new_1 (vec<tree, va_gc> **placement, tree ty
   if (vec_safe_is_empty (*placement) && TYPE_FOR_JAVA (elt_type))
 {
   tree class_addr;
-  tree class_decl = build_java_class_ref (elt_type);
+  tree class_decl;
   static const char alloc_name[] = "_Jv_AllocObject";
 
+  if (!MAYBE_CLASS_TYPE_P (elt_type))
+   {
+ error ("%qT isn%'t a valid Java class type", elt_type);
+ return error_mark_node;
+   }
+
+  class_decl = build_java_class_ref (elt_type);
   if (class_decl == error_mark_node)
return error_mark_node;
 
Index: testsuite/g++.dg/other/java3.C
===================================================================
--- testsuite/g++.dg/other/java3.C  (revision 0)
+++ testsuite/g++.dg/other/java3.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/11006
+
+typedef int* jclass;
+
+void foo () {
+  new __java_boolean;  // { dg-error "valid" }
+}


Re: libsanitizer merge from upstream r191666

2013-11-04 Thread Jack Howarth
On Mon, Nov 04, 2013 at 06:47:13AM -0800, Konstantin Serebryany wrote:
> +glider, our Mac expert.
> 
> Hi Jack,
> 
> This patch has not been tested on Mac and we generally do not claim
> that gcc-asan is supported on Mac.
> clang-asan *is* supported on Mac and our bots are green (this patch is
> a merge of the sources which are regularly tested on Mac,
> but the build procedure is different).
> 
> The failing assertion is weird, I haven't seen it since last year
> (https://code.google.com/p/address-sanitizer/issues/detail?id=117),
> but Alexander should know more.
> 
> What is your version of Mac OS?

I am testing on MacOS X 10.8.5 with FSF gcc trunk built as...

% gcc-fsf-4.9 -v
Using built-in specs.
COLLECT_GCC=gcc-fsf-4.9
COLLECT_LTO_WRAPPER=/sw/lib/gcc4.9/libexec/gcc/x86_64-apple-darwin12.5.0/4.9.0/lto-wrapper
Target: x86_64-apple-darwin12.5.0
Configured with: ../gcc-4.9-20131101/configure --prefix=/sw 
--prefix=/sw/lib/gcc4.9 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.9/info 
--enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw 
--with-libiconv-prefix=/sw --with-isl=/sw --with-cloog=/sw --with-mpc=/sw 
--with-system-zlib --enable-checking=yes --x-includes=/usr/X11R6/include 
--x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.9
Thread model: posix
gcc version 4.9.0 20131101 (experimental) (GCC) 

> 
> --kcc
> 
> On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth  wrote:
> > On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote:
> >> Jukub,
> >>
> >> This works!
> >> New patch attached.
> >
> > Konstantin,
> >This patch, when applied on top of r204305, bootstraps fine on 
> > x86_64-apple-darwin12 but
> > produces a lot of regressions with...
> >
> > make -k check RUNTESTFLAGS="asan.exp --target_board=unix'{-m64}'"
> >
> > Native configuration is x86_64-apple-darwin12.5.0
> >
> > === g++ tests ===
> >
> >
> > Running target unix/-m64
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O0  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O1  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O2  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -fomit-frame-pointer  
> > execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -g  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -Os  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto -flto-partition=none 
> >  execution test
> > FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O0  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O1  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O2  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -fomit-frame-pointer  
> > execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -g  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -Os  execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto -flto-partition=none  
> > execution test
> > FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O0  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O1  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O2  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O3 -fomit-frame-pointer  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O3 -g  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -Os  execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto -flto-partition=none  
> > execution test
> > FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O0  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O1  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O2  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O3 -fomit-frame-pointer  execution 
> > test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O3 -g  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -Os  execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto -flto-partition=none  
> > execution test
> > FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O0  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O1  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -fomit-frame-pointer  
> > execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -g  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -Os  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto 
> > -flto-partition=none  execution test
> > FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto  execution test
> > FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2  execution test
> > FAIL

Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse  wrote:

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
 wrote:


On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse 
wrote:


Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.



Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-&ref->offset, &t1, &t2);
+{
+  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+  &t, 0, &safe);
+  ref->offset = BITS_PER_UNIT * t;
+}

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a
wrong
size then it's the callers error.



I think one issue is that the above code uses get_ref_base_and_extent
on an address.  It should have used get_addr_base_and_unit_offset.



Isn't that what my patch does? Except that get_addr_base_and_unit_offset
often gives up and returns NULL_TREE, whereas I believe we still want a base
even if we have trouble determining a constant offset, so I modified
get_addr_base_and_unit_offset_1 a bit.

(not saying the result is pretty...)


I don't think we want get_addr_base_and_unit_offset to return non-NULL
for a non-constant offset.  But yes, inside your patch is the correct fix,
so if you drop changing get_addr_base_and_unit_offset ...


That means that in a number of cases, ao_ref_init_from_ptr_and_size will 
produce a meaningless ao_ref (both .ref and .base are 0). I expect that 
will cause regressions in code quality at least. Or did you mean something 
else?


get_inner_reference might be an option too (we have quite a few functions 
doing similar things).


--
Marc Glisse


Re: [C++ Patch, Java related/RFC] PR 11006

2013-11-04 Thread Jason Merrill
Surely it should be valid to allocate a Java boolean type.  Andrew, how 
should that work?


Jason


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jason Merrill

On 11/04/2013 10:50 AM, Jakub Jelinek wrote:

That's true, but it is easy to ignore those, while without -Wformat
you won't notice until you bootstrap that you actually passed incorrect
parameters to some *printf etc. family function.


Sure, but that is much less common than the false positives, and will be 
caught anyway when you actually bootstrap (or rebuild stage 3).


Jason



Re: [Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed

2013-11-04 Thread Joerg Wunsch
As Senthil Kumar Selvaraj wrote:

> If ok, could someone commit please? I don't have commit access.

I don't have commit access either, but the changes look reasonable to me.
-- 
Joerg Wunsch * Development engineer, Dresden, Germany

Atmel Automotive GmbH, Theresienstrasse 2, D-74027 Heilbronn
Management: Steven Laub, Steve Skaggs, Dr. Matthias Kästner
District Court of Stuttgart, Registration HRB 106594


Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian

2013-11-04 Thread Bill Schmidt
Per Richard S's suggestion, I'm reworking parts 1 and 3 of the patch
set, but this one will remain unchanged and is ready for review.

Thanks,
Bill

On Sun, 2013-11-03 at 23:34 -0600, Bill Schmidt wrote:
> Hi,
> 
> This patch fixes the widening multiply high/low operations to work
> correctly in the presence of the first patch of this series, which
> reverses the meanings of multiply even/odd instructions.  Here we
> reorder the input operands to the vector merge low/high instructions.
> 
> The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on;
> that is, we need to reverse the usage of merge high and merge low, and
> also swap their inputs, to obtain the same semantics.  In this case we
> are only swapping the inputs, because the reversed usage of high and low
> has already been done for us in the generic handling code for
> VEC_WIDEN_MULT_LO_EXPR.
> 
> Bootstrapped and tested with the rest of the patch set on
> powerpc64{,le}-unknown-linux-gnu, with no regressions.  Is this ok for
> trunk?
> 
> Thanks,
> Bill
> 
> 
> 2013-11-03  Bill Schmidt  
> 
>   * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
>   arguments to merge instruction for little endian.
>   (vec_widen_umult_lo_v16qi): Likewise.
>   (vec_widen_smult_hi_v16qi): Likewise.
>   (vec_widen_smult_lo_v16qi): Likewise.
>   (vec_widen_umult_hi_v8hi): Likewise.
>   (vec_widen_umult_lo_v8hi): Likewise.
>   (vec_widen_smult_hi_v8hi): Likewise.
>   (vec_widen_smult_lo_v8hi): Likewise.
> 
> 
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md  (revision 204192)
> +++ gcc/config/rs6000/altivec.md  (working copy)
> @@ -2185,7 +2235,10 @@
>
>emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2202,7 +2255,10 @@
>
>emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2219,7 +2275,10 @@
>
>emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2236,7 +2295,10 @@
>
>emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2253,7 +2315,10 @@
>
>emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2270,7 +2335,10 @@
>
>emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2287,7 +2355,10 @@
>
>emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>DONE;
>  }")
> 
> @@ -2304,7 +2375,10 @@
>
>emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
>

Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw

2013-11-04 Thread Richard Biener
Ian Lance Taylor  wrote:
>On Mon, Nov 4, 2013 at 3:52 AM, Richard Biener
> wrote:
>> On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor 
>wrote:
>>> The middle-end currently marks all the sync builtins as nothrow. 
>That
>>> is incorrect for most of them when using -fnon-call-exceptions.  The
>>> -fnon-call-exceptions option means that instructions that trap may
>throw
>>> exceptions.  Most of the sync builtins access memory via pointers
>passed
>>> by user code.  Therefore, they may trap.  (I noticed this because Go
>>> uses -fnon-call-exceptions.)
>>>
>>> This patch fixes the problem by introducing a new internal function
>>> attribute "nothrow call".  The new attribute is exactly like
>"nothrow",
>>> except that it is ignored when using -fnon-call-exceptions.
>>>
>>> It is an internal attribute because there is no reason for a user to
>use
>>> this.  The problem only arises when the middle-end sees a function
>call
>>> that the backend turns into an instruction sequence that directly
>>> references memory.  That can only happen for functions that are
>built in
>>> to the compiler.
>>>
>>> This requires changes to the C family, LTO, and Ada frontends.  I
>think
>>> I'm handling the option correctly for LTO, but it would be good if
>>> somebody could check the logic for me.
>>>
>>> Bootstrapped and ran tests on x86_64-unknown-linux-gnu.  OK for
>>> mainline?
>>
>> Hmm.  I think you can handle this in a simpler way by just
>> doing sth similar to
>>
>> #define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
>> ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)
>>
>> that is, conditionally drop the _NOTHROW part.
>
>Good point.
>
>> Btw, I also think
>> that if it can throw then it isn't LEAF (it may return exceptionally
>> into another function in the unit).
>
>According to the docs, a "leaf" function is permitted to return by
>throwing an exception.
>
>> Also this whole conditional-on-a-flag thing doesn't work well with
>> using -fnon-call-exceptions in the optimize attribute or with
>> different -fnon-call-exceptions settings in different TUs we LTO
>> together.  Because we only have a single builtin decl (so for
>> "proper" operation you'd have to clone builtin decls based on
>> the matrix of flags that generate different attributes and use the
>> correct one depending on context).
>
>Yes, but as you know this problem already exists with -fexceptions.
>I'm not really making it worse.
>
>> That said, I'd prefer a simpler approach for now, mimicking
>> flag_errno_math handling.
>
>Done.  Attached.  Bootstrapped on x86_64-unknown-linux-gnu, running
>tests now.  OK for mainline if the tests pass?

Ok

Thanks,
Richard

>Ian
>
>
>gcc/ChangeLog:
>
>2013-11-04  Ian Lance Taylor  
>
>   * builtins.def (ATTR_NOTHROWCALL_LEAF_LIST): Define.
>   * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync
>   builtins that take pointers.
>   * lto-opts.c (lto_write_options): Write -fnon-call-exceptions
>   if set.
>   * lto-wrapper.c (merge_and_complain): Collect
>   OPT_fnon_call_exceptions.
>   (run_gcc): Pass -fnon-call-exceptions.
>
>gcc/testsuite/ChangeLog:
>
>2013-11-03  Ian Lance Taylor  
>
>   * g++.dg/ext/sync-4.C: New test.




Re: lto-plugin: mismatch between ld's architecture and GCC's configure --host

2013-11-04 Thread Cary Coutant
>> Ping.  To sum it up, with these patches applied, there are no changes for
>> a "regular" build (not using the new configure options).  On the other
>> hand, configuring GCC as described, it is possible to use the 32-bit x86
>> linker for/with an x86_64 build, and get the very same GCC test results as
>> when using a x86_64 linker.

Sorry, I've been hoping someone with more global approval authority
would respond.

> Allow overriding the libiberty used for building the LTO plugin.
>
> lto-plugin/
> * configure.ac (--with-libiberty): New configure option.
> * configure: Regenerate.
> * Makefile.am (libiberty, libiberty_pic): New variables.
> (liblto_plugin_la_LIBADD, liblto_plugin_la_LDFLAGS)
> (liblto_plugin_la_DEPENDENCIES): Use them.
> * Makefile.in: Regenerate.

These look OK to me.

> Allow for overriding a module's srcdir.
>
> * Makefile.tpl (configure-[+prefix+][+module+])
> (configure-stage[+id+]-[+prefix+][+module+]): Allow for
> overriding a module's srcdir.
> * Makefile.in: Regenerate.

These look OK, but I think a global maintainer or build machinery
maintainer should give approval.

> Non-host system configuration for linker plugins.
>
> * configure.ac (--enable-linker-plugin-flags)
> (--enable-linker-plugin-configure-flags): New flags.
> (configdirs): Conditionally add libiberty-linker-plugin.
> * configure: Regenerate.
> * Makefile.def (host_modules): Add libiberty-linker-plugin.
> (host_modules) : Pay attention to
> @extra_linker_plugin_flags@ and
> @extra_linker_plugin_configure_flags@.
> (all-lto-plugin): Also depend on all-libiberty-linker-plugin.
> * Makefile.in: Regenerate.
> gcc/
> * doc/install.texi (--enable-linker-plugin-flags)
> (--enable-linker-plugin-configure-flags): Document new flags.

Same here.

> GNU ld can use linker plugins, too.
>
> gcc/
> * doc/sourcebuild.texi (Top Level) : GNU ld can use
> linker plugins, too.

OK.

> Fix typo.
>
> * Makefile.tpl: Fix typo.
> * Makefile.in: Regenerate.

OK (qualifies as trivial and obvious).

-cary


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Jeff Law

On 11/01/13 15:41, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);
 gimple_set_op (gs, 0, lhs);
to
 static inline void
! gimple_call_set_lhs (gimple_statement_call *gs, tree lhs)
   {
 gimple_set_op (gs, 0, lhs);


but then every location that calls it needs an appropriate change:

!   gimple call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 gimple_call_set_lhs (call, atree);

--- 1518,1524 

!   gimple_statement_call *call;
!   call = as_a <gimple_statement_call *> (gimple_build_call_vec
(build_fold_addr_expr_loc (0, alias), vargs));
 gimple_call_set_lhs (call, atree);

And in fact there is a ripple effect to then change
gimple_build_call_vec to simply return a gimple_statement_call *...
Then this doesn't look as ugly either...

!   gimple_statement_call *call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 gimple_call_set_lhs (call, atree);

that is looking much better :-)


Do you seriously think this is an improvement?  The cost of changing
the --enable-checking=yes cost to compile time checking in either
cases sounds way too high to me.  Please don't.
?!?  One of the things we're really trying to do here is be type safe 
and use the type system to check for things at compile time rather than 
relying on runtime checks.


jeff


Re: [PATCH] Try to avoid vector mode punning in SET_DEST on i?86

2013-11-04 Thread James Greenhalgh
On Thu, Oct 31, 2013 at 04:49:47PM +, Jakub Jelinek wrote:

> 2013-10-31  Jakub Jelinek  
> 
> * optabs.c (expand_vec_perm): Avoid vector mode punning
> SUBREGs in SET_DEST.
> * expmed.c (store_bit_field_1): Likewise.
> * config/i386/sse.md (movdi_to_sse, vec_pack_sfix_trunc_v2df,
> vec_pack_sfix_v2df, vec_shl_, vec_shr_,
> vec_interleave_high, vec_interleave_low): Likewise.
> * config/i386/i386.c (ix86_expand_vector_move_misalign,
> ix86_expand_sse_movcc, ix86_expand_int_vcond, ix86_expand_vec_perm,
> ix86_expand_sse_unpack, ix86_expand_args_builtin,
> ix86_expand_vector_init_duplicate, ix86_expand_vector_set,
> emit_reduc_half, expand_vec_perm_blend, expand_vec_perm_pshufb,
> expand_vec_perm_interleave2, expand_vec_perm_pshufb2,
> expand_vec_perm_vpshufb2_vpermq,
> expand_vec_perm_vpshufb2_vpermq_even_odd, expand_vec_perm_even_odd_1,
> expand_vec_perm_broadcast_1, expand_vec_perm_vpshufb4_vpermq2,
> ix86_expand_sse2_mulv4si3, ix86_expand_pinsr): Likewise.
> (expand_vec_perm_palignr): Likewise.  Modify a copy of *d rather
> than *d itself.
> 
> --- gcc/optabs.c.jj 2013-10-29 09:25:45.0 +0100
> +++ gcc/optabs.c2013-10-31 13:20:40.384808642 +0100
> @@ -6674,7 +6674,7 @@ expand_vec_perm (enum machine_mode mode,
> }
>tmp = gen_rtx_CONST_VECTOR (qimode, vec);
>sel = gen_lowpart (qimode, sel);
> -  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> +  sel = expand_vec_perm (qimode, gen_reg_rtx (qimode), sel, tmp, NULL);
>gcc_assert (sel != NULL);
> 
>/* Add the byte offset to each byte element.  */

This hunk causes issues on AArch64 and ARM.

We look to see which permute operation we should generate in
aarch64_expand_vec_perm_const. If we notice that all elements in 
would select from op0 we copy op0 to op1 and generate appropriate code.

With this hunk applied we end up selecting from the register
generated by gen_reg_rtx (qimode), rather than from 'sel' as we
intended. Thus we lose the value of 'sel' and everything starts to
go wrong!

The hunk looks suspicious to me (why do we pick a register out of
thin air?), and reverting it fixes the problems I see. Could you
give me a pointer as to what this hunk fixes on i?86?

I don't think I can fix this up in the expander very easily?

Thanks,
James




Re: [Patch, ObjC, ObjC++, Darwin] Remove "-fobjc-sjlj-exceptions"

2013-11-04 Thread Joseph S. Myers
On Sun, 3 Nov 2013, Iain Sandoe wrote:

> gcc/c-family:
> 
>   * c-opts.c (c_common_post_options): Remove handling of
>   flag_objc_sjlj_exceptions.
>   * c.opt (fobjc-sjlj-exceptions): Issue a warning if this
>   option is given.

The c-family changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [driver] Force options Wa, Wl, Wp, to take a mandatory argument

2013-11-04 Thread Joseph S. Myers
On Mon, 4 Nov 2013, Mingjie Xing wrote:

> Hello,
> 
> This patch forces options Wa, Wl, Wp, to take a mandatory argument,
> which can fix the bug
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55651.
> 
> Tested on i686-pc-linux-gnu.
> 
> 2013-11-04  Mingjie Xing  
> 
> * common.opt (Wa, Wl, Wp,): Change JoinedOrMissing to Joined.
> 
> Is it OK?

As discussed in the thread starting at 
, this is 
incorrect; these options should pass through empty strings unchanged.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-11-04 Thread Cong Hou
On Mon, Nov 4, 2013 at 2:06 AM, James Greenhalgh
 wrote:
> On Fri, Nov 01, 2013 at 04:48:53PM +, Cong Hou wrote:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 2a5a2e1..8f5d39a 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
>> Operand 3 is of a mode equal or
>>  wider than the mode of the product. The result is placed in operand 0, which
>>  is of the same mode as operand 3.
>>
>> +@cindex @code{ssad@var{m}} instruction pattern
>> +@item @samp{ssad@var{m}}
>> +@cindex @code{usad@var{m}} instruction pattern
>> +@item @samp{usad@var{m}}
>> +Compute the sum of absolute differences of two signed/unsigned elements.
>> +Operand 1 and operand 2 are of the same mode. Their absolute difference, 
>> which
>> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a 
>> mode
>> +equal or wider than the mode of the absolute difference. The result is 
>> placed
>> +in operand 0, which is of the same mode as operand 3.
>> +
>>  @cindex @code{ssum_widen@var{m3}} instruction pattern
>>  @item @samp{ssum_widen@var{m3}}
>>  @cindex @code{usum_widen@var{m3}} instruction pattern
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index 4975a64..1db8a49 100644
>
> I'm not sure I follow, and if I do - I don't think it matches what
> you have implemented for i386.
>
> From your text description I would guess the series of operations to be:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   operands[0] = v3 + operands[3]
>
> But if I understand the behaviour of PSADBW correctly, what you have
> actually implemented is:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   v4 = reduce_plus (v3)
>   operands[0] = v4 + operands[3]
>
> To my mind, synthesizing the reduce_plus step will be wasteful for targets
> who do not get this for free with their Absolute Difference step. Imagine a
> simple loop where we have synthesized the reduce_plus, we compute partial
> sums each loop iteration, though we would be better to leave the reduce_plus
> step until after the loop. "REDUC_PLUS_EXPR" would be the appropriate
> Tree code for this.

What do you mean when you use "synthesizing" here? For each pattern,
the only synthesized operation is the one being returned from the
pattern recognizer. In this case, it is USAD_EXPR. The recognition of
reduce sum is necessary as we need corresponding prolog and epilog for
reductions, which is already done before pattern recognition. Note
that reduction is not a pattern but is a type of vector definition. A
vectorization pattern can still be a reduction operation as long as
STMT_VINFO_RELATED_STMT of this pattern is a reduction operation. You
can check the other two reduction patterns: widen_sum_pattern and
dot_prod_pattern for reference.

Thank you for your comment!


Cong

>
> I would prefer to see this Tree code not imply the reduce_plus.
>
> Thanks,
> James
>


Re: changing a collision resolution strategy of the symbol table of identifiers

2013-11-04 Thread Roman Gareev
2013/10/20 Ondřej Bílka :
> On Sun, Oct 20, 2013 at 06:55:40PM +0600, Roman Gareev wrote:
>> Dear gcc contributors,
>>
>> Recently I came across the list of "ideas for speeding up GCC"
>> (http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was
>> suggested to replace identifier hash table with other data structure.
>>
>> Please find attached a patch, that switches the resolution strategy of
>> a hash table used in the symbol table of identifiers from open
>> addressing to separate chaining with linked list. I would be very
>> grateful for your comments, feedback and ideas about further
>> enhancements to gcc, in which I could take a part. I am pursuing
>> master of science and looking for thesis theme, that is compiler
>> related.
>>
>> Below are results of testing and the patch.
>>
>> During testing of the linux kernel (3.8.8) compilation time, the
>> acquired results were the following: increasing by 0.17% for the
>> version 4.8.0, increasing by 1.12% for the version 4.8.1, decreasing
>> by 0.598% for trunk (these are average values).
>>
>>
>> Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk.
>>
>>
>> diff --git a/gcc/stringpool.c b/gcc/stringpool.c
>>
>> index f4d0dae..cc696f7 100644
>>
>> --- a/gcc/stringpool.c
>>
>> +++ b/gcc/stringpool.c
>>
>> @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator
>> op, void *cookie)
>>
>> /* SPD is saved in the PCH file and holds the information needed
>>
>> to restore the string pool. */
>>
>> -struct GTY(()) string_pool_data {
>>
>> - ht_identifier_ptr *
>>
>> - GTY((length ("%h.nslots"),
>>
> Mail client broke it?
Sorry, an error occurred somewhere. Below is an attachment with the patch.


patch
Description: Binary data


[PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-04 Thread Yufeng Zhang

Hi,

This patch extends the slsr pass to optionally use an alternative base 
expression in finding basis for CAND_REFs.  Currently the pass uses 
hash-based algorithm to match the base_expr in a candidate.  Given a 
test case like the following, slsr will not be able to recognize the two 
CAND_REFs have the same basis, as their base_expr are of different 
SSA_NAMEs:


typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i][j] = 1;
  a2[i + 10][j] = 2;
}

The gimple dump before slsr is like the following (using an 
arm-none-eabi gcc):


  i.0_2 = (unsigned int) i_1(D);
  _3 = i.0_2 * 80;
  _5 = a2_4(D) + _3;
  *_5[j_7(D)] = 1;  <
  _9 = _3 + 800;
  _10 = a2_4(D) + _9;
  *_10[j_7(D)] = 2; <

Here are the dumps for the two CAND_REFs generated for the two 
statements pointed by the arrows:



  4  [2] _5 = a2_4(D) + _3;
 ADD  : a2_4(D) + (80 * i_1(D)) : int[20] *
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

  8  [2] *_10[j_7(D)] = 2;
 REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
 basis: 5  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

As mentioned previously, slsr cannot establish that candidate 4 is the 
basis for the candidate 8, as they have different base_exprs: a2_4(D) 
and _10, respectively.  However, the two references actually only differ 
by an immediate offset (800).


This patch uses the tree affine combination facilities to create an 
optional alternative base expression to be used in finding (as well as 
recording) the basis.  It calls tree_to_aff_combination_expand on 
base_expr, reset the offset field of the generated aff_tree to 0 and 
generate a tree from it by calling aff_combination_to_tree.


The new tree is recorded as a potential basis, and when 
find_basis_for_candidate fails to find a basis for a CAND_REF in its 
normal approach, it searches again using a tree expanded in such way. 
Such an expanded tree usually discloses the expression behind an 
SSA_NAME.  In the example above, instead of seeing the strength 
reduction candidate chains like this:


  _5 -> 5
  _10 -> 8

we are now having:

  _5 -> 5
  _10 -> 8
  a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8

With the candidates 5 and 8 linked to the same tree expression (a2_4(D) 
+ (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the 
basis of 8.


The patch doesn't attempt to change the content of any CAND_REF though. 
 It only enables CAND_REFs which (1) have the same stride and (2) have 
the underlying expressions of their base_expr only differ in immediate 
offsets,  to be recognized to have the same basis.  The statements with 
such CAND_REFs will be lowered to MEM_REFs, and later on the RTL 
expander shall be able to fold and re-associate the immediate offsets to 
the rightmost side of the addressing expression, and therefore exposes 
the common sub-expression successfully.


The code-gen difference of the example code on arm with -O2 
-mcpu=cortex-15 is:


mov r3, r1, asl #6
-   add ip, r0, r2, asl #2
str lr, [sp, #-4]!
+   mov ip, #1
+   mov lr, #2
add r1, r3, r1, asl #4
-   mov lr, #1
-   mov r3, #2
add r0, r0, r1
-   add r0, r0, #800
-   str lr, [ip, r1]
-   str r3, [r0, r2, asl #2]
+   add r3, r0, r2, asl #2
+   str ip, [r0, r2, asl #2]
+   str lr, [r3, #800]
ldr pc, [sp], #4

One fewer instruction in this simple case.

The example used in illustration is too simple to show code-gen 
difference on x86_64, but the included test case will show the benefit 
of the patch quite obviously.


The patch has passed

* bootstrapping on arm and x86_64
* regtest on arm-none-eabi,  aarch64-none-elf and x86_64

There is no regression in SPEC2K on arm or x86_64.

OK to commit to the trunk?

Any comment is welcomed!

Thanks,
Yufeng


gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(find_basis_for_base_expr): Update comment.
(find_basis_for_candidate): Add new parameter 'alt_base_expr' of
type 'tree'.  Optionally call find_basis_for_base_expr with
'alt_base_expr'.
(record_potential_basis): Add new parameter 'alt_base_expr' of
type 'tree'; set node->base_expr with 'alt_base_expr' if it is
not NULL.
(name_expansions): New static variable.
(get_alternative_base): New function.
(alloc_cand_and_find_basis): Call get_alternative_base for 
CAND_REF.
Update calls to find_basis_for_candidate and 
record_potential_basis.

(execute_strength_reduction): Call free_affine_expand_cache with
&name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.

diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c
index 9a5072c..3150046 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@
