Re: current NEON status

2014-08-12 Thread Richard Earnshaw
On 12/08/14 07:49, Andrew Pinski wrote:
> On Mon, Aug 11, 2014 at 11:44 PM, Marat Zakirov  wrote:
>> Hi Vladimir!
>>
>> I think you are as the main IRA contributor would be appropriate person to
>> answer question bellow. Please confirm or refute my statement about
>> unsplittable register ranges in GCC IRA.
>>
>>
>> On 07/30/2014 05:38 PM, Marat Zakirov wrote:
>>>
>>> Hi there!
>>>
>>> My question came from bug
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725. I found that GCC
>>> considers NEON register ranges as unsplittable. So any subregister may be
>>> used only after whole chunk is dead. This issue leads to redundant spill
>>> fills which is performance trouble.
>>>
>>> Example 1: RAL trouble
>>>
>>> #include 
>>> #include 
>>>
>>> extern  uint16x8x4_t m0;
>>> extern  uint16x8x4_t m1;
>>> extern  uint16x8x4_t m2;
>>> extern  uint16x8x4_t m3;
>>> extern  uint16x8_t   m4;
>>>
>>> void foo1(uint16_t * in_ptr)
>>> {
>>>  uint16x8x4_t t0, t1, t2, t3;
>>>  t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]);
>>>  t1 = vld4q_u16((uint16_t *)&in_ptr[64]);
>>>  t2 = vld4q_u16((uint16_t *)&in_ptr[128]);
>>>  t3 = vld4q_u16((uint16_t *)&in_ptr[192]);
>>>  m4 = t0.val[3];
>>>  m4 = m4 * 3;   <<< *
>>>  t0.val[3] = t1.val[3];
>>>  m0 = t3;
>>>  m1 = t2;
>>>  m2 = t1;
>>>  m3 = t0;
>>> }
>>>
>>> Here test uses all NEON registers. No spill is needed. Because
>>> multiplication requires one Q register which may be obtained from dead
>>> t0.val[3] subregister. But GCC makes spill if multiplication (*) exists
>>> because of issue described above.
>>>
>>> Example 2: CSE makes trouble for IRA
>>>
>>> #include 
>>> #include 
>>>
>>> extern  uint16x8x4_t m0;
>>> extern  uint16x8x4_t m1;
>>>
>>> void foo2(uint16_t * in_ptr)
>>> {
>>>  uint16x8x4_t t0, t1;
>>>  t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]);
>>>  t1 = vld4q_u16((uint16_t *)&in_ptr[64]);
>>>  t0.val[0] *= 333;
>>>  t0.val[1] *= 333;
>>>  t0.val[2] *= 333;
>>>  t0.val[3] *= 333;
>>>  t1.val[0] *= 333;
>>>  t1.val[1] *= 333;
>>>  t1.val[2] *= 333;
>>>  t1.val[3] *= 333;
>>>  m0 = t0;
>>>  m1 = t1;
>>> }
>>>
>>> Here test uses only half NEON + one Q for '333' factor. But GCC makes
>>> spills here too! Briefly speak problem is in partial CSE. GCC generates rtl
>>> with the listed bellow form:
>>>
>>> Before CSE:
>>>
>>> a = b
>>> a0 = a0 * 3
>>> a1 = a1 * 3
>>> a2 = a2 * 3
>>> a3 = a3 * 3
>>>
>>> After:
>>>
>>> a = b
>>> a0 = b0 * 3
>>> a1 = a1 * 3 <<< *
>>> a2 = a2 * 3
>>> a3 = a3 * 3
>>>
>>> CSE do not substitute b1 to a1 because at the moment (*) a0 was already
>>> defined so actually a != b. Yes but a1 = b1, unfortunately CSE also do not
>>> handle register-ranges parts as RA does. Strange thing here is that even if
>>> we fix CSE, so CSE could propagate register-ranges subregs, this will make
>>> trouble to RAL also because of the same reason: IRA do not handle precisely
>>> register ranges parts. I attached a demo patch which forbids partial CSE
>>> propagation and removes spills from Ex2. Is this patch OK? Or maybe CSE
>>> should be fixed in a different way? Or maybe partial substitution is OK?
>>>
>>> Main question: Are there any plans to fix/upgrade IRA?
>>>
>>> --Marat
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2014-07-30  Marat Zakirov  
>>
>> * cse.c (canon_reg): Forbid partial CSE.
>> * fwprop.c (forward_propagate_and_simplify): Likewise.
>>
>> diff --git a/gcc/cse.c b/gcc/cse.c
>> index 34f9364..a9e0442 100644
>> --- a/gcc/cse.c
>> +++ b/gcc/cse.c
>> @@ -2862,6 +2862,9 @@ canon_reg (rtx x, rtx insn)
>> || ! REGNO_QTY_VALID_P (REGNO (x)))
>>   return x;
>>
>> +if (GET_MODE (x) == XImode)
>> +  return x;
> 
> This patch is wrong and even more wrong.  XImode is not defined in all 
> targets.
>

Even if it were, this still wouldn't be the right fix.  What if a
machine had a native XImode?  Then you'd be arbitrarily disabling parts
of the compiler.

R.


> Maybe the better fix is to have lower subreg come along and split up
> the moves for a = b and then a pass after reload comes along and
> stitches it back together.
> 
> Thanks,
> Andrew
> 
> 
>> +
>> q = REG_QTY (REGNO (x));
>> ent = &qty_table[q];
>> first = ent->first_reg;
>> diff --git a/gcc/fwprop.c b/gcc/fwprop.c
>> index 547fcd6..eadc729 100644
>> --- a/gcc/fwprop.c
>> +++ b/gcc/fwprop.c
>> @@ -1317,6 +1317,9 @@ forward_propagate_and_simplify (df_ref use, rtx
>> def_insn, rtx def_set)
>>if (!new_rtx)
>>  return false;
>>
>> +  if (GET_MODE (reg) == XImode)
>> +return false;
>> +
>>return try_fwprop_subst (use, loc, new_rtx, def_insn, set_reg_equal);
>>  }
>>
>>
> 




Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Richard Biener
On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:
> On 08/11/14 07:41, Kyrill Tkachov wrote:
>>
>>
>> I haven't been able to get combine to match the comparison+xor+neg+plus
>> RTL and it seems like it would be just a workaround to undo the
>> tree-level transformation.
>
> Yea, it'd just be a workaround, but it's probably the easiest way to deal
> with this problem.  Can you describe in further detail why you weren't able
> to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.

OTOH a suitable place to "undo" would be smarter RTL expansion
that detects this pattern (and hope for TER to still apply - thus
no CSE opportunities going in your way).  For your testcase TER
allows

r_5 replace with --> r_5 = (int) _4;

_8 replace with --> _8 = _4 != 0;

_10 replace with --> _10 = -_9;

_11 replace with --> _11 = _10 ^ r_5;

_12 replace with --> _12 = _11 + _9;

which unfortunately already "breaks" because of the multi-use _9
and _4.

Now - the way out here is a GIMPLE pass right before expansion
that performs pattern detection and generates target specific code suitable
for optimal expand.  Fortunately we already have that with
pass_optimize_widening_mul (which does widen-mul and fma detection).

This is the place where you'd deal with such pattern, generating
a suitable compound operation (or two, dependent on expansion
and insn pattern details).  This of course usually requires new
tree codes or internal functions.  For more arcane stuff like
your conditional negate I'd prefer an internal function.

Of course you then have to add an optab or a target hook to
do custom internal function expansion.  I suppose for internal
functions a target hook would be nicer than a generic optab?

Thanks,
Richard.



>
>>
>> What is the most acceptable way of disabling this transformation for a
>> target that has a conditional negation instruction?
>
> In general, we don't want target dependencies in the gimple/ssa optimizers.
>
> Jeff


Re: writing patterns

2014-08-12 Thread Richard Biener
On Tue, Aug 12, 2014 at 12:47 AM, Prathamesh Kulkarni
 wrote:
> On Mon, Aug 11, 2014 at 5:52 PM, Richard Biener
>  wrote:
>> On Mon, Aug 11, 2014 at 1:06 AM, Prathamesh Kulkarni
>>  wrote:
>>> On Mon, Aug 11, 2014 at 2:53 AM, Prathamesh Kulkarni
>>>  wrote:
 On Tue, Aug 5, 2014 at 3:15 PM, Richard Biener
  wrote:
> On Mon, Aug 4, 2014 at 4:44 PM, Prathamesh Kulkarni
>  wrote:
>> On Mon, Aug 4, 2014 at 2:59 PM, Richard Biener
> No, not changing tree.def but its use as you did in your first proposed
> patch.
 I have got it done except for capturing optional converts.
 for instance (negate (convert1@0 @1)) does not work.
 This is because of insertion in decision tree (insert::operand).
>>
>> That's because you lower it to
>>
>>  (negate @0@1)
>>
>> right?  a capture with ->what being a capture itself?  That's reasonable
>> btw.
>>
 Since captures were always predicate or expressions,
 ie, the corresponding dt-node->type was dt_node::DT_OPERAND,
 i could hard-code for a search on dt_node::DT_OPERAND.
 This changes when capture itself can be captured 
 (dt_node::DT_TRUE/DT_MATCH),
 and there's no easy way right now to obtain corresponding
 dt-node from a given operand node.

 Should we keep a mapping (hash_map/std::map) from operand * to dt_node * ?
 That will also get rid of dt_node::parent (but we need to then
 introduce parent in AST).

 Other cases where convert1, convert2 are not captured work
 fine.
 (bit_not (convert1 @0) (convert2 @1))
 I have attached patch for reference (don't intend to get it committed).
>>>
>>> I have fixed it in a hackish way in the current patch, so capturing
>>> optional convert works..
>>> Since we don't know type corresponding to capture in decision tree
>>> :DT_TRUE/DT_MATCH (unless
>>> we get hold of dt-node corresponding to the capture), I tried both the
>>> possibilities.
>>
>> That looks reasonable to me though I manage to hit the assert with
>>
>> (simplify
>>(plus (convert1@1 (plus@2 INTEGER_CST_P INTEGER_CST_P)) @3)
>>(if (useless_type_conversion_p (TREE_TYPE (@1), TREE_TYPE (@2
>>@2)
>>
>> when ->what is an expression.
> oops, this leads to multiple captures -:)
> in this case we have 2-captures: @1 capturing @2 capturing inner plus
> expression.
> we could have n-captures by introducing arbitrary number of continuous 
> convert1.
> for instance:
> (plus (convert1@1 (convert1@2 (plus@3 INTEGER_CST_P INTEGER_CST_P))) @4)
> which is a 3-captures (@1:@2:@3:plus)
>
> This patch attempts to handle multiple number of captures.
>
> * genmatch.c (decision_tree::insert_operand): Adjust to allow multiple 
> capture.

Thanks - applied.

> Thanks,
> Prathamesh
>
>>
>> I'd also like the various compares against CONVERT1/2 to be not done
>> as string compares but using enum tree_code CONVERT1
>> equality checks.
>>
>> I have fixed that (but not the above insertion issue) and committed the
>> patch.
>>
>> Note that I'd still like the syntax to include a question mark, thus
>> convert?, convert1? or convert2? should be conditional converts.
>> Requires a parser hack though.
> hmm, but since we use convert1, convert2 explicitly to denote
> conditional convert,
> so probably no need for a marker like '?'  ?

Well, a '?' is so much clearer than a '1' - and being able to write
convert? is nicer than convert1 (so we need convert1 and convert2
only to form groups).

Well, I give it a quick try in a second ;)

Thanks,
Richard.

> Thanks,
> Prathamesh
>>
>> Thanks,
>> Richard.
>>
>>> * genmatch.c (enum tree_code): Add new values CONVERT1, CONVERT2.
>>> (lower_opt_convert): New function.
>>> (lower_opt_convert): New function overloaded.
>>> (lower_opt_convert): Likewise.
>>> (lower_opt_convert): Likewise.
>>> (remove_opt_convert): New function.
>>> (has_opt_convert): Likewise.
>>> (decision_tree::insert_operand): Adjust for capturing a capture.
>>> (parse_for): Reject CONVERT1, CONVERT2.
>>> (main): call add_operator.
>>>call lower_opt_convert.
>>>
>>> Thanks,
>>> Prathamesh

 Thanks,
 Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>>>
>>> Richard.
>>>

 * genmatch.c (enum tree_code): New value OPT_CONVERT.
   (e_operation): New member unsigned group_index.
   (e_operation::e_operation): Add new default argument.
   Adjust to changes in 
 e_operation.
   (remove_opt_convert): New function.
   (has_opt_convert): Likewise.
   (lower_opt_convert): Likewise.
   (lower_opt_convert): New overloaded function.
   (lower_opt_convert): Likewise.
   (parse_expr): Adjust to parse opt_convert.
   (parse_for): Reject opt_convert in opers.
   (main): Call lower_opt_convert.

 Thanks,

Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Kyrill Tkachov


On 12/08/14 10:39, Richard Biener wrote:

On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:

On 08/11/14 07:41, Kyrill Tkachov wrote:


I haven't been able to get combine to match the comparison+xor+neg+plus
RTL and it seems like it would be just a workaround to undo the
tree-level transformation.

Yea, it'd just be a workaround, but it's probably the easiest way to deal
with this problem.  Can you describe in further detail why you weren't able
to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.


From the combine dump I can see that it tried to combine up to:
(set (reg/i:SI 0 x0)
(plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
(reg:SI 73 [ D.2564 ]))
(reg:SI 84 [ D.2565 ])))

What I need is for that reg 84 to be the result of the comparison, 
something like:

(ne (cc_reg) (const_int 0)) which I couldn't get combine to shove in there.


On the other hand, I did manage to write a peephole2 that detected the 
sequence of compare+neg+xor+plus and transformed it into the 
if_then_else form that our current conditional negation pattern has, 
although I'm not sure how flexible this is.




OTOH a suitable place to "undo" would be smarter RTL expansion
that detects this pattern (and hope for TER to still apply - thus
no CSE opportunities going in your way).  For your testcase TER
allows

r_5 replace with --> r_5 = (int) _4;

_8 replace with --> _8 = _4 != 0;

_10 replace with --> _10 = -_9;

_11 replace with --> _11 = _10 ^ r_5;

_12 replace with --> _12 = _11 + _9;

which unfortunately already "breaks" because of the multi-use _9
and _4.

Now - the way out here is a GIMPLE pass right before expansion
that performs pattern detection and generates target specific code suitable
for optimal expand.  Fortunately we already have that with
pass_optimize_widening_mul (which does widen-mul and fma detection).


Maybe we can piggyback on this pass then? (probably renaming it to 
something more generic in the process...)




This is the place where you'd deal with such pattern, generating
a suitable compound operation (or two, dependent on expansion
and insn pattern details).  This of course usually requires new
tree codes or internal functions.  For more arcane stuff like
your conditional negate I'd prefer an internal function.


What is an internal function in this context? Is that like a 
target-specific builtin?




Of course you then have to add an optab or a target hook to
do custom internal function expansion.  I suppose for internal
functions a target hook would be nicer than a generic optab?


So something like TARGET_EXPAND_CONDITIONAL_NEGATION that gets called 
from the aforementioned GIMPLE pass when it finds a an appropriate 
sequence of compare+neg+xor+plus ?


Thanks for the pointers,
Kyrill



Thanks,
Richard.




What is the most acceptable way of disabling this transformation for a
target that has a conditional negation instruction?

In general, we don't want target dependencies in the gimple/ssa optimizers.

Jeff





Re: Could you please clarify about GCC optimizations?

2014-08-12 Thread Evgeniya Maenkova
Got it. Thank you.

On Fri, Aug 8, 2014 at 11:07 PM, Jeff Law  wrote:
> On 08/08/14 06:18, Evgeniya Maenkova wrote:
>>
>> As far as I know, there are so many configurations (frontends x
>> backends x applications(benchmarks) x etc), that the same optimization
>> could improve performance in one configuration and degrade at other
>> conditions.
>
> Correct.
>
>
>> What performance tests do you perform before including an optimization
>> in GCC? How do you aggregate the results (taking into account
>> different behavior on different frontend/backend/benchmarks)?
>> Excuse me, if some information is available in GCC documentation,
>> didn’t found so far.
>
> Each developer makes their own determination as to what performance tests
> are appropriate to run and on what platforms to run those tests. Some rely
> largely on SPEC, others utilize large desktop applications such as firefox
> and others are more focused on EEMBC, etc.  It really depends on each
> developer's focus.
>
> In general optimizations on GIMPLE/SSA are in large designed to eliminate as
> much redundancy as possible independent of the target processor.  There are
> exceptions, but as a guiding principle that is correct.
>
> When GIMPLE is lowered to RTL, the expanders query the backend for a
> information to guide lowering to RTL in a target dependent way. Similarly
> the RTL optimizers are designed to query the backend for information to
> guide low level aspects of code generation and optimization.
>
> When optimizations are submitted for inclusion, there's a review process
> where the code reviewers may ask questions or ask for further benchmarks,
> etc.  The reviewers also use their experience to guide submissions in the
> right direction.
>
> So there's no single simple answer.  It varies based on many factors.
>
> jeff



-- 
Thanks,

Evgeniya

perfstories.wordpress.com


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Marc Glisse

On Mon, 11 Aug 2014, Kyrill Tkachov wrote:


The aarch64 target has a conditional negation instruction
CSNEG Rd, Rs1, Rs2, cond

with semantics Rd = if cond then Rs1 else -Rs2.

This, however doesn't get end up getting matched for code such as:
int
foo2 (unsigned a, unsigned b)
{
 int r = 0;
 r = a & b;
 if (a & b)
   return -r;
 return r;
}


Note that in this particular case, we should just return -(a&b) like llvm 
does.


--
Marc Glisse


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Richard Biener
On Tue, Aug 12, 2014 at 12:31 PM, Kyrill Tkachov  wrote:
>
> On 12/08/14 10:39, Richard Biener wrote:
>>
>> On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:
>>>
>>> On 08/11/14 07:41, Kyrill Tkachov wrote:


 I haven't been able to get combine to match the comparison+xor+neg+plus
 RTL and it seems like it would be just a workaround to undo the
 tree-level transformation.
>>>
>>> Yea, it'd just be a workaround, but it's probably the easiest way to deal
>>> with this problem.  Can you describe in further detail why you weren't
>>> able
>>> to get this to work?
>>
>> Too many instructions to combine I guess.  You might want to add
>> intermediate "combine" insn-and-splits.  If that's still a no-go then
>> read on.
>
>
> From the combine dump I can see that it tried to combine up to:
> (set (reg/i:SI 0 x0)
> (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
> (reg:SI 73 [ D.2564 ]))
> (reg:SI 84 [ D.2565 ])))
>
> What I need is for that reg 84 to be the result of the comparison, something
> like:
> (ne (cc_reg) (const_int 0)) which I couldn't get combine to shove in there.
>
>
> On the other hand, I did manage to write a peephole2 that detected the
> sequence of compare+neg+xor+plus and transformed it into the if_then_else
> form that our current conditional negation pattern has, although I'm not
> sure how flexible this is.
>
>
>>
>> OTOH a suitable place to "undo" would be smarter RTL expansion
>> that detects this pattern (and hope for TER to still apply - thus
>> no CSE opportunities going in your way).  For your testcase TER
>> allows
>>
>> r_5 replace with --> r_5 = (int) _4;
>>
>> _8 replace with --> _8 = _4 != 0;
>>
>> _10 replace with --> _10 = -_9;
>>
>> _11 replace with --> _11 = _10 ^ r_5;
>>
>> _12 replace with --> _12 = _11 + _9;
>>
>> which unfortunately already "breaks" because of the multi-use _9
>> and _4.
>>
>> Now - the way out here is a GIMPLE pass right before expansion
>> that performs pattern detection and generates target specific code
>> suitable
>> for optimal expand.  Fortunately we already have that with
>> pass_optimize_widening_mul (which does widen-mul and fma detection).
>
>
> Maybe we can piggyback on this pass then? (probably renaming it to something
> more generic in the process...)
>
>
>>
>> This is the place where you'd deal with such pattern, generating
>> a suitable compound operation (or two, dependent on expansion
>> and insn pattern details).  This of course usually requires new
>> tree codes or internal functions.  For more arcane stuff like
>> your conditional negate I'd prefer an internal function.
>
>
> What is an internal function in this context? Is that like a target-specific
> builtin?
>
>
>>
>> Of course you then have to add an optab or a target hook to
>> do custom internal function expansion.  I suppose for internal
>> functions a target hook would be nicer than a generic optab?
>
>
> So something like TARGET_EXPAND_CONDITIONAL_NEGATION that gets called from
> the aforementioned GIMPLE pass when it finds a an appropriate sequence of
> compare+neg+xor+plus ?

More like TARGET_CAN_EXPAND_IFN with the new GIMPLE internal-fn
and a TARGET_EXPAND_IFN to actually perform the expansion.

Richard.

> Thanks for the pointers,
> Kyrill
>
>
>>
>> Thanks,
>> Richard.
>>
>>
>>
 What is the most acceptable way of disabling this transformation for a
 target that has a conditional negation instruction?
>>>
>>> In general, we don't want target dependencies in the gimple/ssa
>>> optimizers.
>>>
>>> Jeff
>
>
>


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Jeff Law

On 08/12/14 04:31, Kyrill Tkachov wrote:


On 12/08/14 10:39, Richard Biener wrote:

On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:

On 08/11/14 07:41, Kyrill Tkachov wrote:


I haven't been able to get combine to match the comparison+xor+neg+plus
RTL and it seems like it would be just a workaround to undo the
tree-level transformation.

Yea, it'd just be a workaround, but it's probably the easiest way to
deal
with this problem.  Can you describe in further detail why you
weren't able
to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.

My guess was too many insns as well..  But that's often solvable.



 From the combine dump I can see that it tried to combine up to:
(set (reg/i:SI 0 x0)
 (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
 (reg:SI 73 [ D.2564 ]))
 (reg:SI 84 [ D.2565 ])))
And did it find a match for this?   What happens if (just for testing 
purposes), you create a pattern for this?  Does combine then try 
something even more complex, possibly getting your conditional negation?





On the other hand, I did manage to write a peephole2 that detected the
sequence of compare+neg+xor+plus and transformed it into the
if_then_else form that our current conditional negation pattern has,
although I'm not sure how flexible this is.
Probably not very.  We really should be looking at combine.  In fact, I 
would argue that we should be looking at combine regardless of whether 
or not we twiddle expansion as humans or machine generated code could 
look like this...


jeff



Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Kyrill Tkachov


On 12/08/14 15:22, Jeff Law wrote:

On 08/12/14 04:31, Kyrill Tkachov wrote:

On 12/08/14 10:39, Richard Biener wrote:

On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:

On 08/11/14 07:41, Kyrill Tkachov wrote:

I haven't been able to get combine to match the comparison+xor+neg+plus
RTL and it seems like it would be just a workaround to undo the
tree-level transformation.

Yea, it'd just be a workaround, but it's probably the easiest way to
deal
with this problem.  Can you describe in further detail why you
weren't able
to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.

My guess was too many insns as well..  But that's often solvable.


  From the combine dump I can see that it tried to combine up to:
(set (reg/i:SI 0 x0)
  (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
  (reg:SI 73 [ D.2564 ]))
  (reg:SI 84 [ D.2565 ])))

And did it find a match for this?   What happens if (just for testing
purposes), you create a pattern for this?  Does combine then try
something even more complex, possibly getting your conditional negation?


I managed to get combine to recognise this pattern:
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 
"register_operand" "r"))

   (match_operand:GPI 2 "register_operand" "r"))
  (match_dup 1)))

Now what I need is for operand 1 to instead be a cc_reg comparison, but 
when I change operand 1 to this pattern:


(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 
"aarch64_comparison_operator"
[(match_operand:CC 2 "cc_register" "") 
(const_int 0)]))

   (match_operand:GPI 3 "register_operand" "r"))
  (match_dup 1)))

This doesn't match. Is there any way to express this in a combineable 
pattern?


Thanks,
Kyrill






On the other hand, I did manage to write a peephole2 that detected the
sequence of compare+neg+xor+plus and transformed it into the
if_then_else form that our current conditional negation pattern has,
although I'm not sure how flexible this is.

Probably not very.  We really should be looking at combine.  In fact, I
would argue that we should be looking at combine regardless of whether
or not we twiddle expansion as humans or machine generated code could
look like this...

jeff







Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Kyrill Tkachov


On 12/08/14 16:11, Kyrill Tkachov wrote:

On 12/08/14 15:22, Jeff Law wrote:

On 08/12/14 04:31, Kyrill Tkachov wrote:

On 12/08/14 10:39, Richard Biener wrote:

On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:

On 08/11/14 07:41, Kyrill Tkachov wrote:

I haven't been able to get combine to match the comparison+xor+neg+plus
RTL and it seems like it would be just a workaround to undo the
tree-level transformation.

Yea, it'd just be a workaround, but it's probably the easiest way to
deal
with this problem.  Can you describe in further detail why you
weren't able
to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.

My guess was too many insns as well..  But that's often solvable.


   From the combine dump I can see that it tried to combine up to:
(set (reg/i:SI 0 x0)
   (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
   (reg:SI 73 [ D.2564 ]))
   (reg:SI 84 [ D.2565 ])))

And did it find a match for this?   What happens if (just for testing
purposes), you create a pattern for this?  Does combine then try
something even more complex, possibly getting your conditional negation?

I managed to get combine to recognise this pattern:
(set (match_operand:GPI 0 "register_operand" "=r")
  (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1
"register_operand" "r"))
 (match_operand:GPI 2 "register_operand" "r"))
(match_dup 1)))

Now what I need is for operand 1 to instead be a cc_reg comparison, but
when I change operand 1 to this pattern:

(set (match_operand:GPI 0 "register_operand" "=r")
  (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1
"aarch64_comparison_operator"
  [(match_operand:CC 2 "cc_register" "")
(const_int 0)]))
 (match_operand:GPI 3 "register_operand" "r"))
(match_dup 1)))

This doesn't match. Is there any way to express this in a combineable
pattern?


argh, Thunderbird enforced the 80 character limit...

The pattern that matched was:
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (xor:GPI
(neg:GPI (match_operand:GPI 1 "register_operand" "r"))
(match_operand:GPI 2 "register_operand" "r"))
  (match_dup 1)))


And the one that failed to combine (with operand 1 substituted for a 
cc_reg comparison) was:


(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (xor:GPI
(neg:GPI
  (match_operator:GPI 1 "aarch64_comparison_operator"
[(match_operand:CC_ZERO 2 "cc_register" "")
   (const_int 0)]))
  (match_operand:GPI 3 "register_operand" "r"))
  (match_dup 1)))



Thanks,
Kyrill



On the other hand, I did manage to write a peephole2 that detected the
sequence of compare+neg+xor+plus and transformed it into the
if_then_else form that our current conditional negation pattern has,
although I'm not sure how flexible this is.

Probably not very.  We really should be looking at combine.  In fact, I
would argue that we should be looking at combine regardless of whether
or not we twiddle expansion as humans or machine generated code could
look like this...

jeff










Re: current NEON status

2014-08-12 Thread Marat Zakirov


On 08/12/2014 12:20 PM, Richard Earnshaw wrote:

On 12/08/14 07:49, Andrew Pinski wrote:

On Mon, Aug 11, 2014 at 11:44 PM, Marat Zakirov  wrote:

Hi Vladimir!

I think you are as the main IRA contributor would be appropriate person to
answer question bellow. Please confirm or refute my statement about
unsplittable register ranges in GCC IRA.


On 07/30/2014 05:38 PM, Marat Zakirov wrote:

Hi there!

My question came from bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725. I found that GCC
considers NEON register ranges as unsplittable. So any subregister may be
used only after whole chunk is dead. This issue leads to redundant spill
fills which is performance trouble.

Example 1: RAL trouble

#include 
#include 

extern  uint16x8x4_t m0;
extern  uint16x8x4_t m1;
extern  uint16x8x4_t m2;
extern  uint16x8x4_t m3;
extern  uint16x8_t   m4;

void foo1(uint16_t * in_ptr)
{
  uint16x8x4_t t0, t1, t2, t3;
  t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]);
  t1 = vld4q_u16((uint16_t *)&in_ptr[64]);
  t2 = vld4q_u16((uint16_t *)&in_ptr[128]);
  t3 = vld4q_u16((uint16_t *)&in_ptr[192]);
  m4 = t0.val[3];
  m4 = m4 * 3;   <<< *
  t0.val[3] = t1.val[3];
  m0 = t3;
  m1 = t2;
  m2 = t1;
  m3 = t0;
}

Here test uses all NEON registers. No spill is needed. Because
multiplication requires one Q register which may be obtained from dead
t0.val[3] subregister. But GCC makes spill if multiplication (*) exists
because of issue described above.

Example 2: CSE makes trouble for IRA

#include 
#include 

extern  uint16x8x4_t m0;
extern  uint16x8x4_t m1;

void foo2(uint16_t * in_ptr)
{
  uint16x8x4_t t0, t1;
  t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]);
  t1 = vld4q_u16((uint16_t *)&in_ptr[64]);
  t0.val[0] *= 333;
  t0.val[1] *= 333;
  t0.val[2] *= 333;
  t0.val[3] *= 333;
  t1.val[0] *= 333;
  t1.val[1] *= 333;
  t1.val[2] *= 333;
  t1.val[3] *= 333;
  m0 = t0;
  m1 = t1;
}

Here test uses only half NEON + one Q for '333' factor. But GCC makes
spills here too! Briefly speak problem is in partial CSE. GCC generates rtl
with the listed bellow form:

Before CSE:

a = b
a0 = a0 * 3
a1 = a1 * 3
a2 = a2 * 3
a3 = a3 * 3

After:

a = b
a0 = b0 * 3
a1 = a1 * 3 <<< *
a2 = a2 * 3
a3 = a3 * 3

CSE do not substitute b1 to a1 because at the moment (*) a0 was already
defined so actually a != b. Yes but a1 = b1, unfortunately CSE also do not
handle register-ranges parts as RA does. Strange thing here is that even if
we fix CSE, so CSE could propagate register-ranges subregs, this will make
trouble to RAL also because of the same reason: IRA do not handle precisely
register ranges parts. I attached a demo patch which forbids partial CSE
propagation and removes spills from Ex2. Is this patch OK? Or maybe CSE
should be fixed in a different way? Or maybe partial substitution is OK?

Main question: Are there any plans to fix/upgrade IRA?

--Marat



This patch is wrong and even more wrong.  XImode is not defined in all targets.


Even if it were, this still wouldn't be the right fix.  What if a
machine had a native XImode?  Then you'd be arbitrarily disabling parts
of the compiler.

R.



Maybe the better fix is to have lower subreg come along and split up
the moves for a = b and then a pass after reload comes along and
stitches it back together.

Thanks,
Andrew



Hi all,

First, thank you to pay attention for issue which is totally untrivial.

I wrote above that my patch is only a demo which illustrates issue with 
register ranges. My question is about conception not implementation. 
This demo patch switch off CSE partial substitution allowing next 
compiler passes substitute whole b and then throw away a at all which 
dramatically decreases register pressure in the example above. It is 
strange that disabling some compiler functionality gives you a better 
result. Anyway I believe that only way to really fix the issue - is to 
upgrade IRA pass. Upgraded version of IRA should work with parts of 
range register as they are separate ones, they should have separate Live 
Ranges, etc.


Could I propose appropriate version of my patch (which will do the same 
conceptually - forbidding partial CSE)? If not what would be the proper 
way to improve the code in my case?


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Kyrill Tkachov


On 12/08/14 16:16, Kyrill Tkachov wrote:

On 12/08/14 16:11, Kyrill Tkachov wrote:

On 12/08/14 15:22, Jeff Law wrote:

On 08/12/14 04:31, Kyrill Tkachov wrote:

On 12/08/14 10:39, Richard Biener wrote:

On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:

On 08/11/14 07:41, Kyrill Tkachov wrote:

I haven't been able to get combine to match the comparison+xor+neg+plus
RTL and it seems like it would be just a workaround to undo the
tree-level transformation.

Yea, it'd just be a workaround, but it's probably the easiest way to
deal
with this problem.  Can you describe in further detail why you
weren't able
to get this to work?

Too many instructions to combine I guess.  You might want to add
intermediate "combine" insn-and-splits.  If that's still a no-go then
read on.

My guess was too many insns as well..  But that's often solvable.


From the combine dump I can see that it tried to combine up to:
(set (reg/i:SI 0 x0)
(plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
(reg:SI 73 [ D.2564 ]))
(reg:SI 84 [ D.2565 ])))

And did it find a match for this?   What happens if (just for testing
purposes), you create a pattern for this?  Does combine then try
something even more complex, possibly getting your conditional negation?

I managed to get combine to recognise this pattern:
(set (match_operand:GPI 0 "register_operand" "=r")
   (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1
"register_operand" "r"))
  (match_operand:GPI 2 "register_operand" "r"))
 (match_dup 1)))

Now what I need is for operand 1 to instead be a cc_reg comparison, but
when I change operand 1 to this pattern:

(set (match_operand:GPI 0 "register_operand" "=r")
   (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1
"aarch64_comparison_operator"
   [(match_operand:CC 2 "cc_register" "")
(const_int 0)]))
  (match_operand:GPI 3 "register_operand" "r"))
 (match_dup 1)))

This doesn't match. Is there any way to express this in a combineable
pattern?

argh, Thunderbird enforced the 80 character limit...

The pattern that matched was:
(set (match_operand:GPI 0 "register_operand" "=r")
  (plus:GPI
(xor:GPI
  (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
  (match_operand:GPI 2 "register_operand" "r"))
(match_dup 1)))


If I match that to a define_and_split and split it into two sets:
(set (reg1) (neg reg2))
(set (plus (xor (reg1) (reg3))
(reg2)))

and then add a define_insn with the full pattern:

(set (match_operand:GPI 0 "register_operand" "=r")
 (plus:GPI
   (xor:GPI
 (neg:GPI
   (match_operator:GPI 1 "aarch64_comparison_operator"
 [(match_operand:CC 2 "cc_register" "")
(const_int 0)]))
   (match_operand:GPI 3 "register_operand" "r"))
   (match_dup 1)))

Then this does manage to match the full thing that I want. But I had to 
add another define_split to break up the plus+xor Frankenstein's monster 
back into two separate insns for the cases where it picks up 
non-conditional-negate patterns.


Does that seem reasonable or too much of a hack? Any plus+xor rtxs that 
get left after combine should be split up again relatively quickly in 
split1 and shouldn't inhibit further optimisation too badly, no?



Kyrill


And the one that failed to combine (with operand 1 substituted for a
cc_reg comparison) was:

(set (match_operand:GPI 0 "register_operand" "=r")
  (plus:GPI
(xor:GPI
  (neg:GPI
(match_operator:GPI 1 "aarch64_comparison_operator"
  [(match_operand:CC_ZERO 2 "cc_register" "")
 (const_int 0)]))
(match_operand:GPI 3 "register_operand" "r"))
(match_dup 1)))


Thanks,
Kyrill



On the other hand, I did manage to write a peephole2 that detected the
sequence of compare+neg+xor+plus and transformed it into the
if_then_else form that our current conditional negation pattern has,
although I'm not sure how flexible this is.

Probably not very.  We really should be looking at combine.  In fact, I
would argue that we should be looking at combine regardless of whether
or not we twiddle expansion as humans or machine generated code could
look like this...

jeff












Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Segher Boessenkool
On Tue, Aug 12, 2014 at 04:16:34PM +0100, Kyrill Tkachov wrote:
> >I managed to get combine to recognise this pattern:
> >(set (match_operand:GPI 0 "register_operand" "=r")
> >  (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1
> >"register_operand" "r"))
> > (match_operand:GPI 2 "register_operand" "r"))
> >(match_dup 1)))
> >
> >Now what I need is for operand 1 to instead be a cc_reg comparison,

You can add
  "nonzero_bits (operands[1], mode) == 1"
to this pattern's condition.


Segher


survey on C++ exception handling

2014-08-12 Thread Rodrigo Bonifacio
Dear all, I am conducting a survey about the use of exception handling
constructs in C++. I would really appreciate if you could contribute
to this research by answering a few questions on the subject.

The survey is available on-line:

https://pt.surveymonkey.com/s/exceptionHandling

All the best,

Rodrigo.