Re: patch to canonize unsigned tree-csts

2013-10-06 Thread Richard Sandiford
Kenneth Zadeck  writes:
> On 10/04/2013 01:00 PM, Richard Sandiford wrote:
>> I was hoping Richard would weigh in here.  In case not...
>>
>> Kenneth Zadeck  writes:
>> I was thinking that we should always be able to use the constant as-is
>> for max_wide_int-based and addr_wide_int-based operations.  The 
>> small_prec
> again, you can get edge-cased to death here.   I think it would work
> for max because that really is bigger than anything else, but it is
> possible (though unlikely) to have something big converted to an address
> by truncation.
 But I'd have expected that conversion to be represented by an explicit
 CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
 something that isn't bit- or byte-address-sized.  It'd be the C equivalent
 of int + long -> int rather than the expected int + long -> long.

 Same goes for wide_int.  If we're doing arithmetic at a specific
 precision, it seems odd for one of the inputs to be wider and yet
 not have an explicit truncation.
>>> you miss the second reason why we needed addr-wide-int.   A large amount
>>> of the places where the addressing arithmetic is done are not "type
>>> safe".Only the gimple and rtl that is translated from the source
>>> code is really type safe. In passes like the strength reduction
>>> where they just "grab things from all over", the addr-wide-int or the
>>> max-wide-int are safe haven structures that are guaranteed to be large
>>> enough to not matter.   So what I fear here is something like a very
>>> wide loop counter being grabbed into some address calculation.
>> It still seems really dangerous to be implicitly truncating a wider type
>> to addr_wide_int.  It's not something we'd ever do in mainline, because
>> uses of addr_wide_int are replacing uses of double_int, and double_int
>> is our current maximum-width representation.
>>
>> Using addr_wide_int rather than max_wide_int is an optimisation.
>> IMO part of implementing that optimisation should be to introduce
>> explicit truncations whenever we try to use addr_wide_int to operate
>> on inputs that might be wider than addr_wide_int.
>>
>> So I still think the assert is the way to go.
> addr_wide_int is not as much of an optimization as it is documentation 
> of what you are doing - i.e. this is addressing arithmetic.  My 
> justification for putting it in was that we wanted a sort of an abstract 
> type to say that this was not just user math, it was addressing 
> arithmetic and that the ultimate result is going to be slammed into a 
> target pointer.
>
> I was only using that as an example to try to indicate that I did not 
> think that it was wrong if someone did truncate.   In particular, would 
> you want the assert to be that the value was truncated or that the type 
> of the value would allow numbers that would be truncated?   I actually 
> think neither.

I'm arguing for:

gcc_assert (precision >= xprecision);

in wi::int_traits <const_tree>::decompose.

IIRC one of the reasons for wanting addr_wide_int rather than wide_int
was that we wanted a type that could handle both bit and byte sizes.
And we wanted to be able to convert between bits and bytes seamlessly.
That means that shifting is a valid operation for addr_wide_int.  But if
we also implicitly (and that's the key word) used addr_wide_int
directly on tree constants that are wider than addr_wide_int, and say
shifted the result right, the answer would be different from what you'd
get if you did the shift in max_wide_int.  That seems like new behaviour,
since all address arithmetic is effectively done to maximum precision on
mainline.  It's just that the maximum on mainline is rather small.

If code is looking through casts to see wider-than-addr_wide_int types,
I think it's reasonable for that code to have to explicitly force the
tree to addr_wide_int size, via addr_wide_int::from.  Leaving it implicit
seems too subtle and also means that every caller to
wi::int_traits <const_tree>::decompose does a check that is usually unnecessary.
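
To make the contrast concrete, a sketch only (the exact signature of from()
on the branch may differ):

  addr_wide_int x = t;   /* implicit: would trip the proposed assert
                            if T is wider than addr_wide_int.  */

  addr_wide_int y
    = addr_wide_int::from (t, TYPE_SIGN (TREE_TYPE (t)));
                         /* explicit: the caller documents that the
                            truncation is deliberate and safe here.  */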

> If a programmer uses a long long on a 32 bit machine for some index 
> variable and slams that into a pointer, he either knows what he is doing 
> or has made a mistake.   Do you really think that the compiler should ICE?

No, I'm saying that passes that operate on addr_wide_ints while "grabbing
trees from all over" (still not sure what that means in practice) should
explicitly mark places where a truncation is deliberately allowed.
Those places then guarantee that the dropped bits wouldn't affect any of
the later calculations, which is something only the pass itself can know.

We already forbid direct assignments like:

   addr_wide_int x = max_wide_int(...);

at compile time, for similar reasons.

Thanks,
Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (1).

2013-10-06 Thread Richard Sandiford
Thanks for the updates.

Chung-Ju Wu  writes:
> On 9/29/13 7:25 PM, Richard Sandiford wrote:
>> Chung-Ju Wu  writes:
>> 
>>> +  /* We need to provide a customized rtx which contains
>>> + necessary information for data analysis,
>>> + so we create a parallel rtx like this:
>>> + (parallel [(unspec [(reg: Rb)
>>> + (reg: Re)
>>> + (const_int En4)] UNSPEC_STACK_PUSH_MULTIPLE)
>>> +(use (reg:SI Rb))
>>> +(use (reg:SI Re))
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -4)))
>>> + (reg:SI LP_REGNUM))
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -8)))
>>> + (reg:SI GP_REGNUM))
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -12)))
>>> + (reg:SI FP_REGNUM))
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -16)))
>>> + (reg:SI Re))
>>> +...
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -28)))
>>> + (reg:SI Rb+1))
>>> +(set (mem (plus (reg:SI SP_REGNUM) (const_int -32)))
>>> + (reg:SI Rb))
>>> +(set (reg:SI SP_REGNUM)
>>> + (plus (reg:SI SP_REGNUM) (const_int -32)))]) */
>> 
>> The "use"s here should be redundant, since the "set"s say that the
>> registers are used.  Same comment for:
>> 
>>> +  /* We need to provide a customized rtx which contains
>>> + necessary information for data analysis,
>>> + so we create a parallel rtx like this:
>>> + (NOTE: We have to clearly claim that we are using/clobbering Rb and 
>>> Re,
>>> +otherwise it may be renamed by optimization.)
>>> + (parallel [(unspec [(reg: Rb)
>>> + (reg: Re)
>>> + (const_int En4)] UNSPEC_STACK_POP_MULTIPLE)
>>> +(use (reg:SI Rb))
>>> +(use (reg:SI Re))
>>> +(clobber (reg:SI Rb))
>>> +(clobber (reg:SI Re))
>>> +(set (reg:SI Rb)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 0))))
>>> +(set (reg:SI Rb+1)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 4))))
>>> +...
>>> +(set (reg:SI Re)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 16))))
>>> +(set (reg:SI FP_REGNUM)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 20))))
>>> +(set (reg:SI GP_REGNUM)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 24))))
>>> +(set (reg:SI LP_REGNUM)
>>> + (mem (plus (reg:SI SP_REGNUM) (const_int 28))))
>>> +(set (reg:SI SP_REGNUM)
>>> + (plus (reg:SI SP_REGNUM) (const_int 32)))]) */
>> 
>> the "use"s and "clobber"s here.  As far as the note goes, adding clobbers
>> isn't guaranteed to stop renaming, and it's not really valid for an rtx
>> to assign to the same register twice.
>> 
>> The best way of preventing renaming is to have a match_parallel that
>> checks every "set", a bit like you already did for the load/store multiple.
>> That should make the "unspec" unnecessary too.
>> 
>
> Now we remove all "use"s and "clobber"s from parallel rtx and
> use predicate function to check stack push/pop operation.
> Furthermore, once I remove unspec rtx as you suggested,
> I notice gcc is aware of the def-use dependency and it wouldn't
> perform unexpected register renaming.  So I think it was my
> faulty design of placing unspec/use/clobber within parallel rtx.

FWIW, it'd probably be better for nds32_valid_stack_push_pop to check
all the SETs in the PARALLEL, like nds32_valid_multiple_load_store does.
They could share a subroutine that checks for consecutive loads and stores.
I.e. something like:

static bool
nds32_valid_multiple_load_store_1 (rtx op, bool load_p, int start, int end)
{
  ...
}

bool
nds32_valid_multiple_load_store (rtx op, bool load_p)
{
  return nds32_valid_multiple_load_store_1 (op, load_p, 0, XVECLEN (op, 0));
}

bool
nds32_valid_multiple_push_pop (rtx op, bool load_p)
{
  ...handle $lp, $gp and $fp cases, based on cframe->...
  if (!nds32_valid_multiple_load_store_1 (op, load_p, i, count - 1))
    return false;
  ...check last element...
}

Thanks,
Richard


Re: [Patch] Fix COMPILE error in regex and remove default parameter in function definition

2013-10-06 Thread Paolo Carlini

On 10/06/2013 07:46 AM, Tim Shen wrote:
> Stupid errors hidden in some large commit.

Ok, thanks.

Paolo.


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Richard Sandiford
Chung-Ju Wu  writes:
> On 10/2/13 1:31 AM, Richard Sandiford wrote:
>> Chung-Ju Wu  writes:
>>> +  /* Use $r15, if the value is NOT in the range of Is20,
>>> + we must output "sethi + ori" directly since
>>> + we may already passed the split stage.  */
>>> +  return "sethi\t%0, hi20(%1)\;ori\t%0, %0, lo12(%1)";
>>> +case 17:
>>> +  return "#";
>> 
>> I don't really understand the comment for case 16.  Returning "#"
>> (like for case 17) forces a split even at the output stage.
>> 
>> In this case it might not be worth forcing a split though, so I don't
>> see any need to change the code.  I think the comment should be changed
>> to give a different reason though.
>> 
>
> Sorry for the misleading comment.
>
> For case 17, we were trying to split large constant into two individual
> rtx patterns into "sethi" + "addi" so that we can have chance to match
> "addi" pattern with 16-bit instruction.
>
> But case 16 is different.
> This case is only produced at prologue/epilogue phase, using a temporary
> register $r15 to hold a large constant for adjusting stack pointer. 
> Since prologue/epilogue is after split1/split2 phase, we can only
> output "sethi" + "ori" directly.
> (The "addi" instruction with $r15 is a 32-bit instruction.)

But this code is in the output template of the define_insn.  That code
is only executed during final, after all passes have been run.  If the
template returns "#", final will split the instruction itself, which is
possible even at that late stage.  "#" doesn't have any effect on the
passes themselves.

(FWIW, there's also a split3 pass that runs after prologue/epilogue
generation but before sched2.)

However, ISTR there is/was a rule that prologue instructions shouldn't
be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
Maybe you hit an ICE because of that?

Another way to handle this would be to have the movsi expander split
large constant moves.  When can_create_pseudo_p (), the intermediate
results can be stored in new registers, otherwise they should reuse
operands[0].  Two advantages to doing it that way are that high parts
can be shared before RA, and that calls to emit_move_insn from the
prologue code will split the move automatically.  I think many ports
do it that way (including MIPS FWIW).
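
For concreteness, a rough sketch of such an expander (hand-written here,
not from the patch; satisfies_constraint_Is20 and gen_iorsi3 are
assumptions about what the nds32 port provides):

(define_expand "movsi"
  [(set (match_operand:SI 0 "general_operand" "")
	(match_operand:SI 1 "general_operand" ""))]
  ""
{
  /* Split moves of large constants at expand time.  */
  if (CONST_INT_P (operands[1])
      && !satisfies_constraint_Is20 (operands[1]))
    {
      HOST_WIDE_INT val = INTVAL (operands[1]);
      /* Before RA, use a fresh pseudo for the high part so it can be
	 shared; after RA, reuse operands[0].  */
      rtx high = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];

      /* The "sethi" part: assumes a movsi alternative that accepts
	 hi20 constants.  */
      emit_insn (gen_rtx_SET (VOIDmode, high,
			      GEN_INT (val & ~(HOST_WIDE_INT) 0xfff)));
      /* The "ori" part fills in the low 12 bits.  */
      emit_insn (gen_iorsi3 (operands[0], high, GEN_INT (val & 0xfff)));
      DONE;
    }
})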

Thanks,
Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Lin Tang
On 2013/10/6 05:57 PM, Richard Sandiford wrote:
>> > But case 16 is different.
>> > This case is only produced at prologue/epilogue phase, using a temporary
>> > register $r15 to hold a large constant for adjusting stack pointer. 
>> > Since prologue/epilogue is after split1/split2 phase, we can only
>> > output "sethi" + "ori" directly.
>> > (The "addi" instruction with $r15 is a 32-bit instruction.)
> But this code is in the output template of the define_insn.  That code
> is only executed during final, after all passes have been run.  If the
> template returns "#", final will split the instruction itself, which is
> possible even at that late stage.  "#" doesn't have any effect on the
> passes themselves.
> 
> (FWIW, there's also a split3 pass that runs after prologue/epilogue
> generation but before sched2.)
> 
> However, ISTR there is/was a rule that prologue instructions shouldn't
> be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
> Maybe you hit an ICE because of that?
> 
> Another way to handle this would be to have the movsi expander split
> large constant moves.  When can_create_pseudo_p (), the intermediate
> results can be stored in new registers, otherwise they should reuse
> operands[0].  Two advantages to doing it that way are that high parts
> can be shared before RA, and that calls to emit_move_insn from the
> prologue code will split the move automatically.  I think many ports
> do it that way (including MIPS FWIW).

FWIW, most ports usually just handle such "large adjustment" cases in
the prologue/epilogue code manually; either multiple SP-adjustments, or
use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
You might be able to get it to work, but trying to rely on the splitter
does not seem like best practice...

Chung-Lin



Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Richard Sandiford
Chung-Lin Tang  writes:
> On 2013/10/6 05:57 PM, Richard Sandiford wrote:
>>> > But case 16 is different.
>>> > This case is only produced at prologue/epilogue phase, using a temporary
>>> > register $r15 to hold a large constant for adjusting stack pointer. 
>>> > Since prologue/epilogue is after split1/split2 phase, we can only
>>> > output "sethi" + "ori" directly.
>>> > (The "addi" instruction with $r15 is a 32-bit instruction.)
>> But this code is in the output template of the define_insn.  That code
>> is only executed during final, after all passes have been run.  If the
>> template returns "#", final will split the instruction itself, which is
>> possible even at that late stage.  "#" doesn't have any effect on the
>> passes themselves.
>> 
>> (FWIW, there's also a split3 pass that runs after prologue/epilogue
>> generation but before sched2.)
>> 
>> However, ISTR there is/was a rule that prologue instructions shouldn't
>> be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
>> Maybe you hit an ICE because of that?
>> 
>> Another way to handle this would be to have the movsi expander split
>> large constant moves.  When can_create_pseudo_p (), the intermediate
>> results can be stored in new registers, otherwise they should reuse
>> operands[0].  Two advantages to doing it that way are that high parts
>> can be shared before RA, and that calls to emit_move_insn from the
>> prologue code will split the move automatically.  I think many ports
>> do it that way (including MIPS FWIW).
>
> FWIW, most ports usually just handle such "large adjustment" cases in
> the prologue/epilogue code manually; either multiple SP-adjustments, or
> use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
> You might be able to get it to work, but trying to rely on the splitter
> does not seem like best practice...

To be clear, I wasn't talking about relying on the splitter in the
define_split sense.  I was saying that the move expanders could
split large constants.

MIPS prologue code does use emit_move_insn to move large constants,
which automatically produces a split form from the outset.  I don't
really agree that it's bad practice.

Thanks,
Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Lin Tang
On 2013/10/6 06:33 PM, Richard Sandiford wrote:
> Chung-Lin Tang  writes:
>> On 2013/10/6 05:57 PM, Richard Sandiford wrote:
> But case 16 is different.
> This case is only produced at prologue/epilogue phase, using a temporary
> register $r15 to hold a large constant for adjusting stack pointer. 
> Since prologue/epilogue is after split1/split2 phase, we can only
> output "sethi" + "ori" directly.
> (The "addi" instruction with $r15 is a 32-bit instruction.)
>>> But this code is in the output template of the define_insn.  That code
>>> is only executed during final, after all passes have been run.  If the
>>> template returns "#", final will split the instruction itself, which is
>>> possible even at that late stage.  "#" doesn't have any effect on the
>>> passes themselves.
>>>
>>> (FWIW, there's also a split3 pass that runs after prologue/epilogue
>>> generation but before sched2.)
>>>
>>> However, ISTR there is/was a rule that prologue instructions shouldn't
>>> be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
>>> Maybe you hit an ICE because of that?
>>>
>>> Another way to handle this would be to have the movsi expander split
>>> large constant moves.  When can_create_pseudo_p (), the intermediate
>>> results can be stored in new registers, otherwise they should reuse
>>> operands[0].  Two advantages to doing it that way are that high parts
>>> can be shared before RA, and that calls to emit_move_insn from the
>>> prologue code will split the move automatically.  I think many ports
>>> do it that way (including MIPS FWIW).
>>
>> FWIW, most ports usually just handle such "large adjustment" cases in
>> the prologue/epilogue code manually; either multiple SP-adjustments, or
>> use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
>> You might be able to get it to work, but trying to rely on the splitter
>> does not seem like best practice...
> 
> To be clear, I wasn't talking about relying on the splitter in the
> define_split sense.  I was saying that the move expanders could
> split large constants.

Okay, I sort of missed the context.

> MIPS prologue code does use emit_move_insn to move large constants,
> which automatically produces a split form from the outset.  I don't
> really agree that it's bad practice.

I think that's mostly the same as what I meant by "manually"; it seems
that there's lots of MIPS backend machinery starting from
mips_legitimize_move(), so it's not really "automatic" ;)

Chung-Lin



Re: [C++ Patch] PR 56060

2013-10-06 Thread Jason Merrill
For EXPR_PACK_EXPANSION we can just return true; a pack expansion is 
always dependent, on the number of arguments if nothing else.
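
An illustration (mine, not from the patch):

  template <typename... Ts>
  int f (Ts... args)
  {
    // The expansion "args..." is dependent even before we know anything
    // about g: how many arguments the call receives depends on
    // sizeof...(Ts), which is unknown until instantiation.
    return g (args...);
  }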


Jason


Re: [C++ Patch] PR 56060

2013-10-06 Thread Paolo Carlini


Hi,

Jason Merrill wrote:
>For EXPR_PACK_EXPANSION we can just return true; a pack expansion is
>always dependent, on the number of arguments if nothing else.

Thanks. I suspected that ;) Then I'm going to test the corresponding very 
simple patch and commit it.

Thanks again!
Paolo



RE: [PING] [PATCH] PR58143/58227 wrong code at -O3

2013-10-06 Thread Bernd Edlinger
Ping!


How should I proceed with this patch? Is it OK?

The latest version was posted at: 
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html

Thanks,
Bernd.

>
> ping...
>
> On Wed, 4 Sep 2013 18:45:39, Bernd Edlinger wrote:
>>
>> On Tue, 3 Sep 2013 12:31:50, Richard Biener wrote:
>>> On Fri, Aug 30, 2013 at 6:43 PM, Bernd Edlinger
>>>  wrote:
 Now I think this is good opportunity for a simple heuristic:

 If a statement is at loop level 1 we can move it out of the loop,
 regardless of the fact that it may invoke undefined behavior, because if
 it is moved, it is then out of any loops, and thus it cannot become an
 induction variable and cause problems with aggressive loop optimizations
 later.
>>>
>>> VRP can still cause wrong-code issues (it's probably hard to generate a
>>> testcase though). Also a less conservative check would be to see
>>> whether we hoist _into_ loop level 0 (though we cannot check that at
>>> the point where you placed the check).
>>
>> Well, then I should better revert this heuristic again.
>>
 Only statements with possible undefined behavior in nested loops can become
 induction variables if lim moves them from one loop into an outer loop.

 Therefore, with a great deal of luck, the test case will pass unmodified.
 I tried it, and the patch passes bootstrap and causes zero regressions
 with this heuristic.

 Ok for trunk now?
>>>
>>> Jakub mentioned another possibility - make sure the moved expression
>>> does not invoke undefined behavior by computing in an unsigned type.
>>
>> That is a possibility, but on the other hand, that would obscure the
>> undefined behavior and thus prevent other possible optimizations later.
>>
>> Another possibility would be to move the statement together with the
>> enclosing if-statement, thus really preserving the execution.
>>
>>> That said, for the sake of backporting we need a patch as simple as
>>> possible - so it would be interesting to see whether the patch without
>>> the loop 1 heuristic has any effect on say SPEC CPU 2006 performance.
>>
>> I do not have access to that test, but on the dhrystone benchmark this patch
>> has no influence whatsoever.
>>
>> Attached you'll find the latest version of my patch without the heuristic.
>> Bootstrapped on i686-pc-linux-gnu and regression tested again.
>>
>> Ok for trunk and 4.8 branch?   

Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Ju Wu
2013/10/6 Richard Sandiford :
> Chung-Ju Wu  writes:
>> On 10/2/13 1:31 AM, Richard Sandiford wrote:
>>> Chung-Ju Wu  writes:
 +  /* Use $r15, if the value is NOT in the range of Is20,
 + we must output "sethi + ori" directly since
 + we may already passed the split stage.  */
 +  return "sethi\t%0, hi20(%1)\;ori\t%0, %0, lo12(%1)";
 +case 17:
 +  return "#";
>>>
>>> I don't really understand the comment for case 16.  Returning "#"
>>> (like for case 17) forces a split even at the output stage.
>>>
>>> In this case it might not be worth forcing a split though, so I don't
>>> see any need to change the code.  I think the comment should be changed
>>> to give a different reason though.
>>>
>>
>> Sorry for the misleading comment.
>>
>> For case 17, we were trying to split large constant into two individual
>> rtx patterns into "sethi" + "addi" so that we can have chance to match
>> "addi" pattern with 16-bit instruction.
>>
>> But case 16 is different.
>> This case is only produced at prologue/epilogue phase, using a temporary
>> register $r15 to hold a large constant for adjusting stack pointer.
>> Since prologue/epilogue is after split1/split2 phase, we can only
>> output "sethi" + "ori" directly.
>> (The "addi" instruction with $r15 is a 32-bit instruction.)
>
> But this code is in the output template of the define_insn.  That code
> is only executed during final, after all passes have been run.  If the
> template returns "#", final will split the instruction itself, which is
> possible even at that late stage.  "#" doesn't have any effect on the
> passes themselves.
>
> (FWIW, there's also a split3 pass that runs after prologue/epilogue
> generation but before sched2.)
>
> However, ISTR there is/was a rule that prologue instructions shouldn't
> be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
> Maybe you hit an ICE because of that?
>

Ah... yes, you are right.  In the nds32_force_addi_stack_int(),
I move a large constant to a temp register for stack pointer adjustment:

+  /* $r15 is going to be temporary register to hold the value.  */
+  tmp_reg = gen_rtx_REG (SImode, TA_REGNUM);
+
+  /* Create one more instruction to move value
+     into the temporary register.  */
+  value_move_insn = emit_move_insn (tmp_reg, GEN_INT (full_value));
+
+  /* At prologue, we need to tell GCC that this is frame related insn,
+     so that we can consider this instruction to output debug information.
+     If full_value is NEGATIVE, it means this function
+     is invoked by expand_prologue.  */
+  if (full_value < 0)
+    RTX_FRAME_RELATED_P (value_move_insn) = 1;
+
+  /* Create new 'add' rtx.  */
+  sp_adjust_insn = gen_addsi3 (stack_pointer_rtx,
+                               stack_pointer_rtx,
+                               tmp_reg);
+  /* Emit rtx into insn list and receive its transformed insn rtx.  */
+  sp_adjust_insn = emit_insn (sp_adjust_insn);
+
+  /* At prologue, we need to tell GCC that this is frame related insn,
+     so that we can consider this instruction to output debug information.
+     If full_value is NEGATIVE, it means this function
+     is invoked by expand_prologue.  */
+  if (full_value < 0)
+    RTX_FRAME_RELATED_P (sp_adjust_insn) = 1;

If there is a rule to avoid splitting instructions with RTX_FRAME_RELATED_P,
I think that is why I hit an ICE of an unrecognized insn for
'value_move_insn'.

It seems that my comment to case 16 is incorrect.
Thanks for clarifying it.

> Another way to handle this would be to have the movsi expander split
> large constant moves.  When can_create_pseudo_p (), the intermediate
> results can be stored in new registers, otherwise they should reuse
> operands[0].  Two advantages to doing it that way are that high parts
> can be shared before RA, and that calls to emit_move_insn from the
> prologue code will split the move automatically.  I think many ports
> do it that way (including MIPS FWIW).
>

Do you mean that I should split large constants myself in movsi
(or starting from movsi) for both case 16 and case 17?

Thanks for the suggestion.  I'll try to implement it. :)


Best regards,
jasonwucj


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Ju Wu
2013/10/6 Chung-Lin Tang :
> On 2013/10/6 06:33 PM, Richard Sandiford wrote:
>> Chung-Lin Tang  writes:
>>> On 2013/10/6 05:57 PM, Richard Sandiford wrote:
 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).
>>>
>>> FWIW, most ports usually just handle such "large adjustment" cases in
>>> the prologue/epilogue code manually; either multiple SP-adjustments, or
>>> use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
>>> You might be able to get it to work, but trying to rely on the splitter
>>> does not seem like best practice...
>>
>> To be clear, I wasn't talking about relying on the splitter in the
>> define_split sense.  I was saying that the move expanders could
>> split large constants.
>
>> MIPS prologue code does use emit_move_insn to move large constants,
>> which automatically produces a split form from the outset.  I don't
>> really agree that it's bad practice.
>
> I think that's mostly the same as what I meant by "manually"; it seems
> that there's lots of MIPS backend machinery starting from
> mips_legitimize_move(), so it's not really "automatic" ;)
>
> Chung-Lin
>

Hi, Chung-Lin,

Thanks for the hint. ^_^

I will follow Richard's and your suggestion to split large constants
via movsi manually, so that they will be automatically split whenever
emit_move_insn() is used. :)


Best regards,
jasonwucj


Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-06 Thread Jan Hubicka
> 2013-10-03  Teresa Johnson  
> 
> * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.

I would not mention HOT in the parameter name.  Blocks are hot/normal/unlikely
and we have HOT_BB_FREQUENCY_FRACTION.

So perhaps UNLIKELY_BB_COUNT_FRACTION with an explanation that it is relative
to number of train runs?
> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
> +"min-hot-run-ratio",
> + "The minimum ratio of profile runs to basic block execution count "
> + "for the block to be considered hot",
Perhaps as:
> + "The maximum ratio of profile runs to basic block execution count "
> + "for the block to be considered unlikely",

> +4, 0, 1)

lower bound should be 1 or we divide by 0.
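
I.e. something like (a sketch combining both suggestions; DEFPARAM's last
three arguments are default, min, max):

DEFPARAM (PARAM_UNLIKELY_BB_COUNT_FRACTION,
	  "unlikely-bb-count-fraction",
	  "The maximum ratio of profile runs to basic block execution count "
	  "for the block to be considered unlikely, relative to the number "
	  "of train runs",
	  4, 1, 0)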
> Index: predict.c
> ===
> --- predict.c   (revision 203152)
> +++ predict.c   (working copy)
> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>gcc_checking_assert (fun);
>if (profile_status_for_function (fun) == PROFILE_READ)
>  {
> -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
> +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)

This way, if count is 1, runs needs to be min_run_ratio * 2.
So perhaps if (count * min_run_ratio >= profile_info->runs) instead of the division.
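
(Concretely, taking RDIV (a, b) as the usual rounding division (a + b/2) / b:
with count = 1, min_run_ratio = 4 and runs = 8, RDIV (1 * 4, 8) = 1 > 0, so
the block still counts as executed; the rounded test only classifies it as
unlikely once runs exceeds 2 * count * min_run_ratio.  The direct comparison
1 * 4 >= 8 already fails at runs = 8 and classifies the block as unlikely.)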
> -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +  if (ENTRY_BLOCK_PTR->count)
> {
> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -   ENTRY_BLOCK_PTR->frequency)
> - < REG_BR_PROB_BASE / 4);
> +  gcov_type scaled_count
> +  = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
> +  gcov_type computed_count;
> +  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
> + be large enough to do the division first without losing much
> + precision.  */
> +  if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)

I do not like the undefined behaviour triggered before the check (sure, unsigned
arithmetic would fix that, but it seems all strange).  What about guarding the
whole code?

if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
For large counts the roundoff problems in the first test should not be a problem.
> +{
> +  computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> + ENTRY_BLOCK_PTR->frequency);
> +  computed_count *= frequency * min_run_ratio;
> +}
> +  else
> +computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
> +  if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)

So on the non-overflow path you compute

   ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency)
     * min_run_ratio / runs > 0.5

I think you want

   (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs

If entry_bb_frequency is 0, you can just return true iff frequency is > 
min_run_ratio.

I think safe way to compute this is

if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
  scaled_count = (frequency * entry_bb_count * min_run_ratio)
                 / entry_bb_frequency;
else
  scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency)
                 * min_run_ratio;

In the first path, we have at most 10000*10000*10000*10000, which still fits
in a 64-bit value.

In the second path the base of the fixed-point arithmetic is large enough that
the division should not matter.  We never get over entry_count * 10000, which
is safe.  The first part of the statement however just computes a value that
should match count, so count > 1 unless the profile got misupdated, and the
low-count roundoff errors should not matter.  So perhaps the whole code can
execute only if
if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
&& ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
and we do not need any conditionals for scaled_count

Does such code work with the ratio changed to 16?  I think the double
multiplication in your patch may have caused problems.

I will check the second part of patch separately.
Thanks!
Honza


[v3] libstdc++/58625

2013-10-06 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed to mainline.

Paolo.

///
2013-10-06  Oleg Endo  
Paolo Carlini  

PR libstdc++/58625
* include/c_global/cmath (signbit): Use __builtin_signbitf and
__builtin_signbitl.
Index: include/c_global/cmath
===
--- include/c_global/cmath  (revision 203227)
+++ include/c_global/cmath  (working copy)
@@ -650,9 +650,10 @@
 isnormal(_Tp __x)
 { return __x != 0 ? true : false; }
 
+  // The front-end doesn't provide a type generic builtin (libstdc++/58625).
   constexpr bool
   signbit(float __x)
-  { return __builtin_signbit(__x); }
+  { return __builtin_signbitf(__x); }
 
   constexpr bool
   signbit(double __x)
@@ -660,7 +661,7 @@
 
   constexpr bool
   signbit(long double __x)
-  { return __builtin_signbit(__x); }
+  { return __builtin_signbitl(__x); }
 
  template<typename _Tp>
 constexpr typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
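
A quick sanity check for the fixed overloads (my own snippet, not part of
the commit):

#include <cmath>
#include <cassert>

int main()
{
  assert (std::signbit (-0.0f));   // float -> __builtin_signbitf
  assert (std::signbit (-0.0));    // double -> __builtin_signbit
  assert (std::signbit (-0.0L));   // long double -> __builtin_signbitl
  assert (!std::signbit (0.0L));
  return 0;
}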


Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-06 Thread Jan Hubicka
> The second part ensures that frequencies are correctly updated after
> inlining. The problem is that after inlining the frequencies were
> being recomputed directly from the corresponding bb counts in
> rebuild_frequencies. If the counts had been truncated to 0, then the
> recomputed frequencies would become 0 as well (often leading to
> inconsistencies in the frequencies between blocks). Then the above
> change to probably_never_executed would not help identify these blocks
> as non-cold from the frequencies. The fix was to use the existing
> logic used during static profile rebuilding to also
> recompute/propagate the frequencies from the branch probabilities in
> the profile feedback case. I also renamed a number of estimate_*
> routines to compute_* and updated the comments to reflect the fact
> that these routines are not doing estimation (from a static profile),
> but in fact recomputing/propagating frequencies from the existing
> (either guessed or profile-feedback-based) probabilities.

I see what your plan is - you want frequencies to be computed purely from the
probabilities that are there.  This however will lead to unnecessary loss of
precision for functions with large counts: probabilities are truncated in
precision and the propagation won't give correct answers and has serious
truncation issues.

So perhaps we want to execute this only if maximal count in function is low?
(i.e. less than REG_BR_PROB_BASE or even REG_BR_PROB_BASE/10 - if counts
are close enough the roundoff errors of count updating should be less terrible
than errors introduced by the propagation).
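
Something like (a hand-written sketch, using compute_bb_frequencies from
your patch):

  gcov_type max_count = 0;
  basic_block bb;
  FOR_EACH_BB (bb)
    max_count = MAX (max_count, bb->count);
  /* With small counts, propagation roundoff is less terrible than the
     count-update roundoff; with large counts, keep the counts.  */
  if (max_count < REG_BR_PROB_BASE)
    compute_bb_frequencies (true);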

> -/* Estimate probabilities of loopback edges in loops at same nest level.  */
> +/* Compute frequencies in loops at same nest level.  */
> 
>  static void
> -estimate_loops_at_level (struct loop *first_loop)
> +compute_loops_at_level (struct loop *first_loop)

I would keep "estimate".  The whole logic is not safe and it will lead to
mistakes for irreducible loops and many other cases...

> @@ -3027,18 +3042,16 @@ void
>  rebuild_frequencies (void)
>  {
>timevar_push (TV_REBUILD_FREQUENCIES);
> -  if (profile_status == PROFILE_GUESSED)
> +  if (profile_status == PROFILE_GUESSED || profile_status == PROFILE_READ)
>  {
>loop_optimizer_init (0);
>add_noreturn_fake_exit_edges ();
>mark_irreducible_loops ();
>connect_infinite_loops_to_exit ();
> -  estimate_bb_frequencies ();
> +  compute_bb_frequencies (true);

Here please add the check about the maximal count in the function and a comment
(especially so we will remember to remove this hack if we switch to sreal
based counts/frequencies/probabilities)

OK, with those changes (if it fixes the problem)

Thanks,
Honza


[PATCH, rs6000] Correct vector permute for little endian

2013-10-06 Thread Bill Schmidt
This patch corrects the expansion of vec_perm_constv16qi for
powerpc64le.  The explanation of the problem with a detailed example
appears in the commentary, as this corrects for what I found to be
surprising behavior in the implementation of the vperm instruction, and
I don't want any of us to spend time figuring that out again.  (We may
want to add a programming note in the next version of the ISA.)

This corrects 18 failing tests in the test suite for the powerpc64le
target, without affecting the big-endian targets.  Bootstrapped and
tested with no new regressions on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


2013-10-06  Bill Schmidt  

* config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
(altivec_expand_vec_perm_const): Call it.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 203018)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -28426,6 +28526,88 @@ rs6000_emit_parity (rtx dst, rtx src)
 }
 }
 
+/* Expand an Altivec constant permutation for little endian mode.
+   There are two issues: First, the two input operands must be
+   swapped so that together they form a double-wide array in LE
+   order.  Second, the vperm instruction has surprising behavior
+   in LE mode:  it interprets the elements of the source vectors
+   in BE mode ("left to right") and interprets the elements of
+   the destination vector in LE mode ("right to left").  To
+   correct for this, we must subtract each element of the permute
+   control vector from 31.
+
+   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+   serve as the permute control vector.  Then, in BE mode,
+
+ vperm 9,10,11,12
+
+   places the desired result in vr9.  However, in LE mode the 
+   vector contents will be
+
+ vr10 = 00000003 00000002 00000001 00000000
+ vr11 = 00000007 00000006 00000005 00000004
+
+   The result of the vperm using the same permute control vector is
+
+ vr9  = 05000000 07000000 01000000 03000000
+
+   That is, the leftmost 4 bytes of vr10 are interpreted as the
+   source for the rightmost 4 bytes of vr9, and so on.
+
+   If we change the permute control vector to
+
+ vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+   and issue
+
+ vperm 9,11,10,12
+
+   we get the desired
+
+   vr9  = 00000006 00000004 00000002 00000000.  */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+  unsigned int i;
+  rtx perm[16];
+  rtx constv, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+
+  /* Unpack and adjust the constant selector.  */
+  for (i = 0; i < 16; ++i)
+    {
+      rtx e = XVECEXP (sel, 0, i);
+      unsigned int elt = 31 - (INTVAL (e) & 31);
+      perm[i] = GEN_INT (elt);
+    }
+
+  /* Expand to a permute, swapping the inputs and using the
+     adjusted selector.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+  constv = force_reg (V16QImode, constv);
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+                           UNSPEC_VPERM);
+  if (!REG_P (target))
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
an efficient implementation; false to fall back to VPERM.  */
 
@@ -28606,6 +28788,12 @@ altivec_expand_vec_perm_const (rtx operands[4])
}
 }
 
+  if (!BYTES_BIG_ENDIAN)
+    {
+      altivec_expand_vec_perm_const_le (operands);
+      return true;
+    }
+
   return false;
 }
 




Re: [Patch] Internal functions for testsuite in regex

2013-10-06 Thread Tim Shen
On Sat, Oct 5, 2013 at 7:45 AM, Paolo Carlini  wrote:
> Patch looks great to me. If you don't get more comments over the next day or 
> so, please go ahead.

Committed :)


-- 
Tim Shen


Re: [Patch] Fix COMPILE error in regex and remove default parameter in function definition

2013-10-06 Thread Tim Shen
On Sun, Oct 6, 2013 at 5:38 AM, Paolo Carlini  wrote:
> Ok, thanks.

Committed :)


-- 
Tim Shen


Re: [GOOGLE] Iterative AutoFDO

2013-10-06 Thread Xinliang David Li
Adding an additional early inline + value transform (calling them as
utility functions) is an 'unconventional' way of invoking passes.  It
would be helpful to document this more heavily, providing more examples
and explaining why simply augmenting the indirect target info for the
promoted icall site with inlined call stack profile data and relying on
the existing value profile transform to kick in is not enough.

The patch also misses documentation for many functions at the
definition site.  There are also a couple of coding style problems, such
as a function name not starting on a new line (but following the return
type).



> /* Return the combined relative location for STMT.  */

> static unsigned
> get_relative_location_for_stmt (gimple stmt)
> {

More explanation on 'relative' -- it is not mentioned in other places either.


> /* Program behavior changed, original promoted (and inlined) target is not
>  hot any more. Will avoid promote the original target.  */
>  if (total >= info->first * 0.5)
>return false;

This needs more explanation.

> /* For a given BB, return its execution count, and annotate value profile
>   if a stmt is not in PROMOTED. Add the location of annotated stmt to
>   ANNOTATED.  */

More explanation on why skipping promoted ones.

 >  || promoted_stmts->find (stmt) != promoted_stmts->end ())
 >   continue;
 > promoted_stmts->insert (stmt);

It is not quite clear how 'promoted' stmts are identified here.

> /* First do indirect call promotion and early inline to make the
> IR match the profiled binary before actual annotation.  */
>  autofdo::stmt_set promoted_stmts;
>  for (int i = 0; i < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS); 
> i++)
>{
>  early_inliner ();
>  if (!flag_value_profile_transformations
>  || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
>break;
>}

This needs heavy documentation.  Also, why put early inline before
value transform?

David




On Mon, Sep 23, 2013 at 10:51 AM, Dehao Chen  wrote:
> This patch fixes the issue of indirect call promotion while using
> AutoFDO optimized binary to collect profile, and use the new profile
> to re-optimize the binary. Before the patch, the indirect call is
> promoted (and likely inlined) in the profiled binary, and will not be
> promoted in the new iteration. With the patch, a new iterative
> vpt+einline pass is added before annotation to make sure hot promoted
> indirect calls are still promoted before annotation.
>
> Bootstrapped and passed regression test and benchmark test.
>
> OK for google-4_8 branch?
>
> Thanks,
> Dehao
>
> http://codereview.appspot.com/13492045


[C++ Patch] PR 58126

2013-10-06 Thread Paolo Carlini

Hi,

it seems to me that in order to fix this accepts-invalid (for the 
testcase we don't reject the declaration 'B b;') we have simply to 
propagate from bases to derived the CLASSTYPE_READONLY_FIELDS_NEED_INIT 
and CLASSTYPE_REF_FIELDS_NEED_INIT flags. Thus, the few class.c lines in 
the patch below. Then I think we can also make the error messages in 
diagnose_uninitialized_cst_or_ref_member_1 more clear and explicitly 
mention the base to which the problematic member belongs.
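
The shape of the accepts-invalid is roughly this (a sketch of the problem,
not the exact committed testcase):

struct A
{
  const int i;  // const member, no initializer
  int& r;       // reference member, no initializer
};

struct B : A { };

A a;  // already rejected
B b;  // was wrongly accepted: the NEED_INIT flags were not propagated to B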


Tested x86_64-linux.

Thanks,
Paolo.

PS: I struggled a bit with "%qT" vs "%q#T": I would slightly prefer
simply talking about, eg, "base class 'A'"; we do that in other cases,
but here we are already using #, which automatically picks class vs
struct.  Thus I think we should use # for the base too... or remove all
the #s and just talk about, eg, "class 'B'" and "base class 'A'" for
these structs too?


//
/cp
2013-10-06  Paolo Carlini  

PR c++/58126
* class.c (check_bases): Propagate CLASSTYPE_READONLY_FIELDS_NEED_INIT
and CLASSTYPE_REF_FIELDS_NEED_INIT from bases to derived.
* init.c (diagnose_uninitialized_cst_or_ref_member_1): Extend error
messages about uninitialized const and references members to mention
the base class.

/testsuite
2013-10-06  Paolo Carlini  

PR c++/58126
* g++.dg/init/uninitialized1.C: New.
Index: cp/class.c
===
--- cp/class.c  (revision 203228)
+++ cp/class.c  (working copy)
@@ -1517,6 +1517,12 @@ check_bases (tree t,
 	|= CLASSTYPE_CONTAINS_EMPTY_CLASS_P (basetype);
       TYPE_HAS_COMPLEX_DFLT (t) |= (!TYPE_HAS_DEFAULT_CONSTRUCTOR (basetype)
 				    || TYPE_HAS_COMPLEX_DFLT (basetype));
+      SET_CLASSTYPE_READONLY_FIELDS_NEED_INIT
+	(t, CLASSTYPE_READONLY_FIELDS_NEED_INIT (t)
+	    | CLASSTYPE_READONLY_FIELDS_NEED_INIT (basetype));
+      SET_CLASSTYPE_REF_FIELDS_NEED_INIT
+	(t, CLASSTYPE_REF_FIELDS_NEED_INIT (t)
+	    | CLASSTYPE_REF_FIELDS_NEED_INIT (basetype));
 
   /*  A standard-layout class is a class that:
  ...
Index: cp/init.c
===
--- cp/init.c   (revision 203228)
+++ cp/init.c   (working copy)
@@ -2120,11 +2120,24 @@ diagnose_uninitialized_cst_or_ref_member_1 (tree t
 	  ++ error_count;
 	  if (complain)
 	    {
-	      if (using_new)
-		error ("uninitialized reference member in %q#T "
-		       "using %<new%> without new-initializer", origin);
+	      if (DECL_CONTEXT (field) == origin)
+		{
+		  if (using_new)
+		    error ("uninitialized reference member in %q#T "
+			   "using %<new%> without new-initializer", origin);
+		  else
+		    error ("uninitialized reference member in %q#T", origin);
+		}
 	      else
-		error ("uninitialized reference member in %q#T", origin);
+		{
+		  if (using_new)
+		    error ("uninitialized reference member in base %q#T "
+			   "of %q#T using %<new%> without new-initializer",
+			   DECL_CONTEXT (field), origin);
+		  else
+		    error ("uninitialized reference member in base %q#T "
+			   "of %q#T", DECL_CONTEXT (field), origin);
+		}
 	      inform (DECL_SOURCE_LOCATION (field),
 		      "%qD should be initialized", field);
 	    }
@@ -2135,11 +2148,24 @@ diagnose_uninitialized_cst_or_ref_member_1 (tree t
 	  ++ error_count;
 	  if (complain)
 	    {
-	      if (using_new)
-		error ("uninitialized const member in %q#T "
-		       "using %<new%> without new-initializer", origin);
+	      if (DECL_CONTEXT (field) == origin)
+		{
+		  if (using_new)
+		    error ("uninitialized const member in %q#T "
+			   "using %<new%> without new-initializer", origin);
+		  else
+		    error ("uninitialized const member in %q#T", origin);
+		}
 	      else
-		error ("uninitialized const member in %q#T", origin);
+		{
+		  if (using_new)
+		    error ("uninitialized const member in base %q#T "
+			   "of %q#T using %<new%> without new-initializer",
+			   DECL_CONTEXT (field), origin);
+		  else
+		    error ("uninitialized const member in base %q#T "
+			   "of %q#T", DECL_CONTEXT (field), origin);
+		}
 	      inform (DECL_SOURCE_LOCATION (field),
 		      "%qD should be initialized", field);
 	    }
Index: testsuite/g++.dg/init/uninitialized1.C
===

Re: [C++ Patch] PR 58126

2013-10-06 Thread Jason Merrill

On 10/06/2013 03:35 PM, Paolo Carlini wrote:

PS: I struggled a bit with "%qT" vs "%q#T": I would slightly prefer
simply talking about, eg, "base class 'A'", we do that in other cases,
but here we are already using # which automatically picks class vs
struct. Thus I think we should use # for the base too...


That seems fine.

The patch is OK.

Jason




Optimize callers using nonnull attribute

2013-10-06 Thread Marc Glisse

Hello,

this patch asserts that when we call a function with the nonnull 
attribute, the corresponding argument is not zero, just like when we 
dereference a pointer. Everything is under a check for 
flag_delete_null_pointer_checks.


Note that this function currently gives up if the statement may throw 
(because in some languages indirections may throw?), but this could 
probably be relaxed a bit so my example would still work when compiled 
with g++, without having to mark f1 and f2 as throw().
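
For example, with the current restriction a C++ variant of the testcase
needs the calls marked non-throwing, along these lines (hand-written
illustration, not part of the patch):

extern void eliminate ();
extern void *f1 (void *a, void *b) throw () __attribute__ ((nonnull));

void g1 (void *p, void *q)
{
  f1 (q, p);      // cannot throw, so VRP may infer p != 0 past the call
  if (p == 0)
    eliminate (); // folded away
}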


Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.

2013-10-08  Marc Glisse  

PR tree-optimization/58480
gcc/
* tree-vrp.c (infer_value_range): Look for a nonnull attribute.

gcc/testsuite/
* gcc.dg/tree-ssa/pr58480.c: New file.

--
Marc Glisse

Index: testsuite/gcc.dg/tree-ssa/pr58480.c
===
--- testsuite/gcc.dg/tree-ssa/pr58480.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr58480.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+extern void eliminate (void);
+extern void* f1 (void *a, void *b) __attribute__((nonnull));
+extern void* f2 (void *a, void *b) __attribute__((nonnull(2)));
+void g1 (void*p, void*q){
+  f1 (q, p);
+  if (p == 0)
+eliminate ();
+}
+void g2 (void*p, void*q){
+  f2 (q, p);
+  if (p == 0)
+eliminate ();
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate\[^\\n\]*to 0" 2 "vrp1" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */

Property changes on: testsuite/gcc.dg/tree-ssa/pr58480.c
___
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: tree-vrp.c
===
--- tree-vrp.c  (revision 203229)
+++ tree-vrp.c  (working copy)
@@ -4493,28 +4493,58 @@ infer_value_range (gimple stmt, tree op,
   /* We can only assume that a pointer dereference will yield
  non-NULL if -fdelete-null-pointer-checks is enabled.  */
   if (flag_delete_null_pointer_checks
   && POINTER_TYPE_P (TREE_TYPE (op))
   && gimple_code (stmt) != GIMPLE_ASM)
 {
   unsigned num_uses, num_loads, num_stores;
 
   count_uses_and_derefs (op, stmt, &num_uses, &num_loads, &num_stores);
   if (num_loads + num_stores > 0)
+      goto nonnull;
+
+      if (gimple_code (stmt) == GIMPLE_CALL)
        {
-         *val_p = build_int_cst (TREE_TYPE (op), 0);
-         *comp_code_p = NE_EXPR;
-         return true;
+         tree fntype = gimple_call_fntype (stmt);
+         tree attrs = TYPE_ATTRIBUTES (fntype);
+         for (; attrs; attrs = TREE_CHAIN (attrs))
+           {
+             attrs = lookup_attribute ("nonnull", attrs);
+
+             /* If "nonnull" wasn't specified, we know nothing about
+                the argument.  */
+             if (attrs == NULL_TREE)
+               return false;
+
+             /* If "nonnull" applies to all the arguments, then ARG
+                is non-null.  */
+             if (TREE_VALUE (attrs) == NULL_TREE)
+               goto nonnull;
+
+             /* Now see if op appears in the nonnull list.  */
+             for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t))
+               {
+                 int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1;
+                 tree arg = gimple_call_arg (stmt, idx);
+                 if (op == arg)
+                   goto nonnull;
+               }
+           }
        }
     }
 
   return false;
+
+nonnull:
+  *val_p = build_int_cst (TREE_TYPE (op), 0);
+  *comp_code_p = NE_EXPR;
+  return true;
 }
 
 
 void dump_asserts_for (FILE *, tree);
 void debug_asserts_for (tree);
 void dump_all_asserts (FILE *);
 void debug_all_asserts (void);
 
 /* Dump all the registered assertions for NAME to FILE.  */
 


[Patch] Optimize _BFSExecutor in regex

2013-10-06 Thread Tim Shen
Here's a simple piece of code
https://gist.github.com/innocentim/6849759 that reveals _BFSExecutor's
inefficiency.  Some optimizations are here to reduce the unnecessary
time complexity from O(n^2) to O(n).

I'll do a bootstrap and full testing before committing.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-06 Thread Kaz Kojima
Oleg Endo  wrote:
> Forgot to handle a case in function can_remove_cstore, thanks for
> catching it.  Fixed in the attached patch and also added test cases.
> Retested as before without new failures.

Ok for trunk.

> Yeah, right.  I've changed 'ifcvt_sh' to 'sh_ifcvt'.

>+  register_pass (make_pass_sh_ifcvt (g, false, "ifcvt1_sh"),
>+   PASS_POS_INSERT_AFTER, "ce1", 1);
>+*/

s/ifcvt1_sh/sh_ifcvt1/ might be better even in a comment.

Regards,
kaz


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-06 Thread Paolo Carlini

On 10/07/2013 12:43 AM, Tim Shen wrote:
> Here's a simple piece of code
> https://gist.github.com/innocentim/6849759 that reveals _BFSExecutor's
> inefficiency.

... which we want in testsuite/performance/28_regex!

Thanks!
Paolo.


Re: [PATCH, PowerPC] Change code generation for VSX loads and stores for little endian

2013-10-06 Thread David Edelsohn
On Mon, Sep 30, 2013 at 10:04 PM, Bill Schmidt
 wrote:
> This patch implements support for VSX vector loads and stores in little
> endian mode.  VSX loads and stores permute the register image with
> respect to storage in a unique manner that is not truly little endian.
> This can cause problems (for example, when a vector appears in a union
> with a different type).  This patch adds an explicit permute to each VSX
> load and store instruction so that the register image is true little
> endian.
>
> It is desirable to remove redundant pairs of permutes where legal to do
> so.  This work is delayed to a later patch.
>
> This patch currently has no effect on generated code because -mvsx is
> disabled in little endian mode, pending fixes of additional problems
> with little endian code generation.
>
> I tested this by enabling -mvsx in little endian mode and running the
> regression bucket.  Using a GCC code base from August 5, I observed that
> this patch corrected 187 failures and exposed a new regression.
> Investigation showed that the regression is not directly related to the
> patch.
>
> Unfortunately the results are not as good on current trunk.  It appears
> we have introduced some more problems for little endian code generation
> since August 5th, which hides the effectiveness of the patch; most of
> the VSX vector tests still fail with the patch applied to current trunk.
> There are a handful of additional regressions, which again are not
> directly related to the patch.  I feel that the patch is well-tested by
> the August 5 results, and would like to commit it before continuing to
> investigate the recently introduced problems.
>
> I also bootstrapped and tested the patch on a big-endian machine
> (powerpc64-unknown-linux-gnu) to verify that I introduced no regressions
> in that environment.
>
> Ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2013-09-30  Bill Schmidt  
>
> * config/rs6000/vector.md (mov<mode>): Emit permuted move
> sequences for LE VSX loads and stores at expand time.
> * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
> prototype.
> * config/rs6000/rs6000.c (rs6000_const_vec): New.
> (rs6000_gen_le_vsx_permute): New.
> (rs6000_gen_le_vsx_load): New.
> (rs6000_gen_le_vsx_store): New.
> (rs6000_gen_le_vsx_move): New.
> * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
> (*vsx_le_perm_load_v4si): New.
> (*vsx_le_perm_load_v8hi): New.
> (*vsx_le_perm_load_v16qi): New.
> (*vsx_le_perm_store_v2di): New.
> (*vsx_le_perm_store_v4si): New.
> (*vsx_le_perm_store_v8hi): New.
> (*vsx_le_perm_store_v16qi): New.
> (*vsx_xxpermdi2_le_<mode>): New.
> (*vsx_xxpermdi4_le_<mode>): New.
> (*vsx_xxpermdi8_le_V8HI): New.
> (*vsx_xxpermdi16_le_V16QI): New.
> (*vsx_lxvd2x2_le_<mode>): New.
> (*vsx_lxvd2x4_le_<mode>): New.
> (*vsx_lxvd2x8_le_V8HI): New.
> (*vsx_lxvd2x16_le_V16QI): New.
> (*vsx_stxvd2x2_le_<mode>): New.
> (*vsx_stxvd2x4_le_<mode>): New.
> (*vsx_stxvd2x8_le_V8HI): New.
> (*vsx_stxvd2x16_le_V16QI): New.
>
> gcc/testsuite:
>
> 2013-09-30  Bill Schmidt  
>
> * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
> * gcc.target/powerpc/fusion.c: Likewise.

Okay.

Thanks, David


Re: [PATCH, rs6000] Correct vector permute for little endian

2013-10-06 Thread David Edelsohn
On Sun, Oct 6, 2013 at 1:32 PM, Bill Schmidt
 wrote:
> This patch corrects the expansion of vec_perm_constv16qi for
> powerpc64le.  The explanation of the problem with a detailed example
> appears in the commentary, as this corrects for what I found to be
> surprising behavior in the implementation of the vperm instruction, and
> I don't want any of us to spend time figuring that out again.  (We may
> want to add a programming note in the next version of the ISA.)
>
> This corrects 18 failing tests in the test suite for the powerpc64le
> target, without affecting the big-endian targets.  Bootstrapped and
> tested with no new regressions on powerpc64le-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> 2013-10-06  Bill Schmidt  
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
> (altivec_expand_vec_perm_const): Call it.

Okay.

Thanks, David


Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-06 Thread Yury Gribov

Hi Jakub,

>> I've recently submitted a bug report regarding invalid unpoisoning
>> of stack frame redzones
>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take
>> a look at the proposed patch (a simple one-liner) and check whether
>> it's ok for commit?

>
> Can you please be more verbose

Do you think the proposed patch is ok or should I provide some 
additional info?


-Y