Re: current NEON status
On 12/08/14 07:49, Andrew Pinski wrote: > On Mon, Aug 11, 2014 at 11:44 PM, Marat Zakirov wrote: >> Hi Vladimir! >> >> I think you are as the main IRA contributor would be appropriate person to >> answer question bellow. Please confirm or refute my statement about >> unsplittable register ranges in GCC IRA. >> >> >> On 07/30/2014 05:38 PM, Marat Zakirov wrote: >>> >>> Hi there! >>> >>> My question came from bug >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725. I found that GCC >>> considers NEON register ranges as unsplittable. So any subregister may be >>> used only after whole chunk is dead. This issue leads to redundant spill >>> fills which is performance trouble. >>> >>> Example 1: RAL trouble >>> >>> #include >>> #include >>> >>> extern uint16x8x4_t m0; >>> extern uint16x8x4_t m1; >>> extern uint16x8x4_t m2; >>> extern uint16x8x4_t m3; >>> extern uint16x8_t m4; >>> >>> void foo1(uint16_t * in_ptr) >>> { >>> uint16x8x4_t t0, t1, t2, t3; >>> t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]); >>> t1 = vld4q_u16((uint16_t *)&in_ptr[64]); >>> t2 = vld4q_u16((uint16_t *)&in_ptr[128]); >>> t3 = vld4q_u16((uint16_t *)&in_ptr[192]); >>> m4 = t0.val[3]; >>> m4 = m4 * 3; <<< * >>> t0.val[3] = t1.val[3]; >>> m0 = t3; >>> m1 = t2; >>> m2 = t1; >>> m3 = t0; >>> } >>> >>> Here test uses all NEON registers. No spill is needed. Because >>> multiplication requires one Q register which may be obtained from dead >>> t0.val[3] subregister. But GCC makes spill if multiplication (*) exists >>> because of issue described above. >>> >>> Example 2: CSE makes trouble for IRA >>> >>> #include >>> #include >>> >>> extern uint16x8x4_t m0; >>> extern uint16x8x4_t m1; >>> >>> void foo2(uint16_t * in_ptr) >>> { >>> uint16x8x4_t t0, t1; >>> t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]); >>> t1 = vld4q_u16((uint16_t *)&in_ptr[64]); >>> t0.val[0] *= 333; >>> t0.val[1] *= 333; >>> t0.val[2] *= 333; >>> t0.val[3] *= 333; >>> t1.val[0] *= 333; >>> t1.val[1] *= 333; >>> t1.val[2] *= 333; >>> t1.val[3] *= 333; >>> m0 = t0; >>> m1 = t1; >>> } >>> >>> Here test uses only half NEON + one Q for '333' factor. But GCC makes >>> spills here too! Briefly speak problem is in partial CSE. GCC generates rtl >>> with the listed bellow form: >>> >>> Before CSE: >>> >>> a = b >>> a0 = a0 * 3 >>> a1 = a1 * 3 >>> a2 = a2 * 3 >>> a3 = a3 * 3 >>> >>> After: >>> >>> a = b >>> a0 = b0 * 3 >>> a1 = a1 * 3 <<< * >>> a2 = a2 * 3 >>> a3 = a3 * 3 >>> >>> CSE do not substitute b1 to a1 because at the moment (*) a0 was already >>> defined so actually a != b. Yes but a1 = b1, unfortunately CSE also do not >>> handle register-ranges parts as RA does. Strange thing here is that even if >>> we fix CSE, so CSE could propagate register-ranges subregs, this will make >>> trouble to RAL also because of the same reason: IRA do not handle precisely >>> register ranges parts. I attached a demo patch which forbids partial CSE >>> propagation and removes spills from Ex2. Is this patch OK? Or maybe CSE >>> should be fixed in a different way? Or maybe partial substitution is OK? >>> >>> Main question: Are there any plans to fix/upgrade IRA? >>> >>> --Marat >> >> >> >> gcc/ChangeLog: >> >> 2014-07-30 Marat Zakirov >> >> * cse.c (canon_reg): Forbid partial CSE. >> * fwprop.c (forward_propagate_and_simplify): Likewise. >> >> diff --git a/gcc/cse.c b/gcc/cse.c >> index 34f9364..a9e0442 100644 >> --- a/gcc/cse.c >> +++ b/gcc/cse.c >> @@ -2862,6 +2862,9 @@ canon_reg (rtx x, rtx insn) >> || ! REGNO_QTY_VALID_P (REGNO (x))) >> return x; >> >> +if (GET_MODE (x) == XImode) >> + return x; > > This patch is wrong and even more wrong. XImode is not defined in all > targets. > Even if it were, this still wouldn't be the right fix. What if a machine had a native XImode? Then you'd be arbitrarily disabling parts of the compiler. R. > Maybe the better fix is to have lower subreg come along and split up > the moves for a = b and then a pass after reload comes along and > stitches it back together. > > Thanks, > Andrew > > >> + >> q = REG_QTY (REGNO (x)); >> ent = &qty_table[q]; >> first = ent->first_reg; >> diff --git a/gcc/fwprop.c b/gcc/fwprop.c >> index 547fcd6..eadc729 100644 >> --- a/gcc/fwprop.c >> +++ b/gcc/fwprop.c >> @@ -1317,6 +1317,9 @@ forward_propagate_and_simplify (df_ref use, rtx >> def_insn, rtx def_set) >>if (!new_rtx) >> return false; >> >> + if (GET_MODE (reg) == XImode) >> +return false; >> + >>return try_fwprop_subst (use, loc, new_rtx, def_insn, set_reg_equal); >> } >> >> >
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: > On 08/11/14 07:41, Kyrill Tkachov wrote: >> >> >> I haven't been able to get combine to match the comparison+xor+neg+plus >> RTL and it seems like it would be just a workaround to undo the >> tree-level transformation. > > Yea, it'd just be a workaround, but it's probably the easiest way to deal > with this problem. Can you describe in further detail why you weren't able > to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. OTOH a suitable place to "undo" would be smarter RTL expansion that detects this pattern (and hope for TER to still apply - thus no CSE opportunities going in your way). For your testcase TER allows r_5 replace with --> r_5 = (int) _4; _8 replace with --> _8 = _4 != 0; _10 replace with --> _10 = -_9; _11 replace with --> _11 = _10 ^ r_5; _12 replace with --> _12 = _11 + _9; which unfortunately already "breaks" because of the multi-use _9 and _4. Now - the way out here is a GIMPLE pass right before expansion that performs pattern detection and generates target specific code suitable for optimal expand. Fortunately we already have that with pass_optimize_widening_mul (which does widen-mul and fma detection). This is the place where you'd deal with such pattern, generating a suitable compound operation (or two, dependent on expansion and insn pattern details). This of course usually requires new tree codes or internal functions. For more arcane stuff like your conditional negate I'd prefer an internal function. Of course you then have to add an optab or a target hook to do custom internal function expansion. I suppose for internal functions a target hook would be nicer than a generic optab? Thanks, Richard. > >> >> What is the most acceptable way of disabling this transformation for a >> target that has a conditional negation instruction? > > In general, we don't want target dependencies in the gimple/ssa optimizers. > > Jeff
Re: writing patterns
On Tue, Aug 12, 2014 at 12:47 AM, Prathamesh Kulkarni wrote: > On Mon, Aug 11, 2014 at 5:52 PM, Richard Biener > wrote: >> On Mon, Aug 11, 2014 at 1:06 AM, Prathamesh Kulkarni >> wrote: >>> On Mon, Aug 11, 2014 at 2:53 AM, Prathamesh Kulkarni >>> wrote: On Tue, Aug 5, 2014 at 3:15 PM, Richard Biener wrote: > On Mon, Aug 4, 2014 at 4:44 PM, Prathamesh Kulkarni > wrote: >> On Mon, Aug 4, 2014 at 2:59 PM, Richard Biener > No, not changing tree.def but its use as you did in your first proposed > patch. I have got it done except for capturing optional converts. for instance (negate (convert1@0 @1)) does not work. This is because of insertion in decision tree (insert::operand). >> >> That's because you lower it to >> >> (negate @0@1) >> >> right? a capture with ->what being a capture itself? That's reasonable >> btw. >> Since captures were always predicate or expressions, ie, the corresponding dt-node->type was dt_node::DT_OPERAND, i could hard-code for a search on dt_node::DT_OPERAND. This changes when capture itself can be captured (dt_node::DT_TRUE/DT_MATCH), and there's no easy way right now to obtain corresponding dt-node from a given operand node. Should we keep a mapping (hash_map/std::map) from operand * to dt_node * ? That will also get rid of dt_node::parent (but we need to then introduce parent in AST). Other cases where convert1, convert2 are not captured work fine. (bit_not (convert1 @0) (convert2 @1)) I have attached patch for reference (don't intend to get it committed). >>> >>> I have fixed it in a hackish way in the current patch, so capturing >>> optional convert works.. >>> Since we don't know type corresponding to capture in decision tree >>> :DT_TRUE/DT_MATCH (unless >>> we get hold of dt-node corresponding to the capture), I tried both the >>> possibilities. >> >> That looks reasonable to me though I manage to hit the assert with >> >> (simplify >>(plus (convert1@1 (plus@2 INTEGER_CST_P INTEGER_CST_P)) @3) >>(if (useless_type_conversion_p (TREE_TYPE (@1), TREE_TYPE (@2 >>@2) >> >> when ->what is an expression. > oops, this leads to multiple captures -:) > in this case we have 2-captures: @1 capturing @2 capturing inner plus > expression. > we could have n-captures by introducing arbitrary number of continuous > convert1. > for instance: > (plus (convert1@1 (convert1@2 (plus@3 INTEGER_CST_P INTEGER_CST_P))) @4) > which is a 3-captures (@1:@2:@3:plus) > > This patch attempts to handle multiple number of captures. > > * genmatch.c (decision_tree::insert_operand): Adjust to allow multiple > capture. Thanks - applied. > Thanks, > Prathamesh > >> >> I'd also like the various compares against CONVERT1/2 to be not done >> as string compares but using enum tree_code CONVERT1 >> equality checks. >> >> I have fixed that (but not the above insertion issue) and committed the >> patch. >> >> Note that I'd still like the syntax to include a question mark, thus >> convert?, convert1? or convert2? should be conditional converts. >> Requires a parser hack though. > hmm, but since we use convert1, convert2 explicitly to denote > conditional convert, > so probably no need for a marker like '?' ? Well, a '?' is so much clearer than a '1' - and being able to write convert? is nicer than convert1 (so we need convert1 and convert2 only to form groups). Well, I give it a quick try in a second ;) Thanks, Richard. > Thanks, > Prathamesh >> >> Thanks, >> Richard. >> >>> * genmatch.c (enum tree_code): Add new values CONVERT1, CONVERT2. >>> (lower_opt_convert): New function. >>> (lower_opt_convert): New function overloaded. >>> (lower_opt_convert): Likewise. >>> (lower_opt_convert): Likewise. >>> (remove_opt_convert): New function. >>> (has_opt_convert): Likewise. >>> (decision_tree::insert_operand): Adjust for capturing a capture. >>> (parse_for): Reject CONVERT1, CONVERT2. >>> (main): call add_operator. >>>call lower_opt_convert. >>> >>> Thanks, >>> Prathamesh Thanks, Prathamesh > > Richard. > >> Thanks, >> Prathamesh >>> >>> Richard. >>> * genmatch.c (enum tree_code): New value OPT_CONVERT. (e_operation): New member unsigned group_index. (e_operation::e_operation): Add new default argument. Adjust to changes in e_operation. (remove_opt_convert): New function. (has_opt_convert): Likewise. (lower_opt_convert): Likewise. (lower_opt_convert): New overloaded function. (lower_opt_convert): Likewise. (parse_expr): Adjust to parse opt_convert. (parse_for): Reject opt_convert in opers. (main): Call lower_opt_convert. Thanks,
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 12/08/14 10:39, Richard Biener wrote: On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. Yea, it'd just be a workaround, but it's probably the easiest way to deal with this problem. Can you describe in further detail why you weren't able to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. From the combine dump I can see that it tried to combine up to: (set (reg/i:SI 0 x0) (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) (reg:SI 73 [ D.2564 ])) (reg:SI 84 [ D.2565 ]))) What I need is for that reg 84 to be the result of the comparison, something like: (ne (cc_reg) (const_int 0)) which I couldn't get combine to shove in there. On the other hand, I did manage to write a peephole2 that detected the sequence of compare+neg+xor+plus and transformed it into the if_then_else form that our current conditional negation pattern has, although I'm not sure how flexible this is. OTOH a suitable place to "undo" would be smarter RTL expansion that detects this pattern (and hope for TER to still apply - thus no CSE opportunities going in your way). For your testcase TER allows r_5 replace with --> r_5 = (int) _4; _8 replace with --> _8 = _4 != 0; _10 replace with --> _10 = -_9; _11 replace with --> _11 = _10 ^ r_5; _12 replace with --> _12 = _11 + _9; which unfortunately already "breaks" because of the multi-use _9 and _4. Now - the way out here is a GIMPLE pass right before expansion that performs pattern detection and generates target specific code suitable for optimal expand. Fortunately we already have that with pass_optimize_widening_mul (which does widen-mul and fma detection). Maybe we can piggyback on this pass then? (probably renaming it to something more generic in the process...) This is the place where you'd deal with such pattern, generating a suitable compound operation (or two, dependent on expansion and insn pattern details). This of course usually requires new tree codes or internal functions. For more arcane stuff like your conditional negate I'd prefer an internal function. What is an internal function in this context? Is that like a target-specific builtin? Of course you then have to add an optab or a target hook to do custom internal function expansion. I suppose for internal functions a target hook would be nicer than a generic optab? So something like TARGET_EXPAND_CONDITIONAL_NEGATION that gets called from the aforementioned GIMPLE pass when it finds a an appropriate sequence of compare+neg+xor+plus ? Thanks for the pointers, Kyrill Thanks, Richard. What is the most acceptable way of disabling this transformation for a target that has a conditional negation instruction? In general, we don't want target dependencies in the gimple/ssa optimizers. Jeff
Re: Could you please clarify about GCC optimizations?
Got it. Thank you. On Fri, Aug 8, 2014 at 11:07 PM, Jeff Law wrote: > On 08/08/14 06:18, Evgeniya Maenkova wrote: >> >> As far as I know, there are so many configurations (frontends x >> backends x applications(benchmarks) x etc), that the same optimization >> could improve performance in one configuration and degrade at other >> conditions. > > Correct. > > >> What performance tests do you perform before including an optimization >> in GCC? How do you aggregate the results (taking into account >> different behavior on different frontend/backend/benchmarks)? >> Excuse me, if some information is available in GCC documentation, >> didn’t found so far. > > Each developer makes their own determination as to what performance tests > are appropriate to run and on what platforms to run those tests. Some rely > largely on SPEC, others utilize large desktop applications such as firefox > and others are more focused on EEMBC, etc. It really depends on each > developer's focus. > > In general optimizations on GIMPLE/SSA are in large designed to eliminate as > much redundancy as possible independent of the target processor. There are > exceptions, but as a guiding principle that is correct. > > When GIMPLE is lowered to RTL, the expanders query the backend for a > information to guide lowering to RTL in a target dependent way. Similarly > the RTL optimizers are designed to query the backend for information to > guide low level aspects of code generation and optimization. > > When optimizations are submitted for inclusion, there's a review process > where the code reviewers may ask questions or ask for further benchmarks, > etc. The reviewers also use their experience to guide submissions in the > right direction. > > So there's no single simple answer. It varies based on many factors. > > jeff -- Thanks, Evgeniya perfstories.wordpress.com
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Mon, 11 Aug 2014, Kyrill Tkachov wrote: The aarch64 target has a conditional negation instruction CSNEG Rd, Rs1, Rs2, cond with semantics Rd = if cond then Rs1 else -Rs2. This, however doesn't get end up getting matched for code such as: int foo2 (unsigned a, unsigned b) { int r = 0; r = a & b; if (a & b) return -r; return r; } Note that in this particular case, we should just return -(a&b) like llvm does. -- Marc Glisse
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Tue, Aug 12, 2014 at 12:31 PM, Kyrill Tkachov wrote: > > On 12/08/14 10:39, Richard Biener wrote: >> >> On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: >>> >>> On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. >>> >>> Yea, it'd just be a workaround, but it's probably the easiest way to deal >>> with this problem. Can you describe in further detail why you weren't >>> able >>> to get this to work? >> >> Too many instructions to combine I guess. You might want to add >> intermediate "combine" insn-and-splits. If that's still a no-go then >> read on. > > > From the combine dump I can see that it tried to combine up to: > (set (reg/i:SI 0 x0) > (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) > (reg:SI 73 [ D.2564 ])) > (reg:SI 84 [ D.2565 ]))) > > What I need is for that reg 84 to be the result of the comparison, something > like: > (ne (cc_reg) (const_int 0)) which I couldn't get combine to shove in there. > > > On the other hand, I did manage to write a peephole2 that detected the > sequence of compare+neg+xor+plus and transformed it into the if_then_else > form that our current conditional negation pattern has, although I'm not > sure how flexible this is. > > >> >> OTOH a suitable place to "undo" would be smarter RTL expansion >> that detects this pattern (and hope for TER to still apply - thus >> no CSE opportunities going in your way). For your testcase TER >> allows >> >> r_5 replace with --> r_5 = (int) _4; >> >> _8 replace with --> _8 = _4 != 0; >> >> _10 replace with --> _10 = -_9; >> >> _11 replace with --> _11 = _10 ^ r_5; >> >> _12 replace with --> _12 = _11 + _9; >> >> which unfortunately already "breaks" because of the multi-use _9 >> and _4. >> >> Now - the way out here is a GIMPLE pass right before expansion >> that performs pattern detection and generates target specific code >> suitable >> for optimal expand. Fortunately we already have that with >> pass_optimize_widening_mul (which does widen-mul and fma detection). > > > Maybe we can piggyback on this pass then? (probably renaming it to something > more generic in the process...) > > >> >> This is the place where you'd deal with such pattern, generating >> a suitable compound operation (or two, dependent on expansion >> and insn pattern details). This of course usually requires new >> tree codes or internal functions. For more arcane stuff like >> your conditional negate I'd prefer an internal function. > > > What is an internal function in this context? Is that like a target-specific > builtin? > > >> >> Of course you then have to add an optab or a target hook to >> do custom internal function expansion. I suppose for internal >> functions a target hook would be nicer than a generic optab? > > > So something like TARGET_EXPAND_CONDITIONAL_NEGATION that gets called from > the aforementioned GIMPLE pass when it finds a an appropriate sequence of > compare+neg+xor+plus ? More like TARGET_CAN_EXPAND_IFN with the new GIMPLE internal-fn and a TARGET_EXPAND_IFN to actually perform the expansion. Richard. > Thanks for the pointers, > Kyrill > > >> >> Thanks, >> Richard. >> >> >> What is the most acceptable way of disabling this transformation for a target that has a conditional negation instruction? >>> >>> In general, we don't want target dependencies in the gimple/ssa >>> optimizers. >>> >>> Jeff > > >
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 08/12/14 04:31, Kyrill Tkachov wrote: On 12/08/14 10:39, Richard Biener wrote: On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. Yea, it'd just be a workaround, but it's probably the easiest way to deal with this problem. Can you describe in further detail why you weren't able to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. My guess was too many insns as well.. But that's often solvable. From the combine dump I can see that it tried to combine up to: (set (reg/i:SI 0 x0) (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) (reg:SI 73 [ D.2564 ])) (reg:SI 84 [ D.2565 ]))) And did it find a match for this? What happens if (just for testing purposes), you create a pattern for this? Does combine then try something even more complex, possibly getting your conditional negation? On the other hand, I did manage to write a peephole2 that detected the sequence of compare+neg+xor+plus and transformed it into the if_then_else form that our current conditional negation pattern has, although I'm not sure how flexible this is. Probably not very. We really should be looking at combine. In fact, I would argue that we should be looking at combine regardless of whether or not we twiddle expansion as humans or machine generated code could look like this... jeff
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 12/08/14 15:22, Jeff Law wrote: On 08/12/14 04:31, Kyrill Tkachov wrote: On 12/08/14 10:39, Richard Biener wrote: On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. Yea, it'd just be a workaround, but it's probably the easiest way to deal with this problem. Can you describe in further detail why you weren't able to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. My guess was too many insns as well.. But that's often solvable. From the combine dump I can see that it tried to combine up to: (set (reg/i:SI 0 x0) (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) (reg:SI 73 [ D.2564 ])) (reg:SI 84 [ D.2565 ]))) And did it find a match for this? What happens if (just for testing purposes), you create a pattern for this? Does combine then try something even more complex, possibly getting your conditional negation? I managed to get combine to recognise this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 "register_operand" "r")) (match_operand:GPI 2 "register_operand" "r")) (match_dup 1))) Now what I need is for operand 1 to instead be a cc_reg comparison, but when I change operand 1 to this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) This doesn't match. Is there any way to express this in a combineable pattern? Thanks, Kyrill On the other hand, I did manage to write a peephole2 that detected the sequence of compare+neg+xor+plus and transformed it into the if_then_else form that our current conditional negation pattern has, although I'm not sure how flexible this is. Probably not very. We really should be looking at combine. In fact, I would argue that we should be looking at combine regardless of whether or not we twiddle expansion as humans or machine generated code could look like this... jeff
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 12/08/14 16:11, Kyrill Tkachov wrote: On 12/08/14 15:22, Jeff Law wrote: On 08/12/14 04:31, Kyrill Tkachov wrote: On 12/08/14 10:39, Richard Biener wrote: On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. Yea, it'd just be a workaround, but it's probably the easiest way to deal with this problem. Can you describe in further detail why you weren't able to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. My guess was too many insns as well.. But that's often solvable. From the combine dump I can see that it tried to combine up to: (set (reg/i:SI 0 x0) (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) (reg:SI 73 [ D.2564 ])) (reg:SI 84 [ D.2565 ]))) And did it find a match for this? What happens if (just for testing purposes), you create a pattern for this? Does combine then try something even more complex, possibly getting your conditional negation? I managed to get combine to recognise this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 "register_operand" "r")) (match_operand:GPI 2 "register_operand" "r")) (match_dup 1))) Now what I need is for operand 1 to instead be a cc_reg comparison, but when I change operand 1 to this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) This doesn't match. Is there any way to express this in a combineable pattern? argh, Thunderbird enforced the 80 character limit... The pattern that matched was: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 "register_operand" "r")) (match_operand:GPI 2 "register_operand" "r")) (match_dup 1))) And the one that failed to combine (with operand 1 substituted for a cc_reg comparison) was: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC_ZERO 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) Thanks, Kyrill On the other hand, I did manage to write a peephole2 that detected the sequence of compare+neg+xor+plus and transformed it into the if_then_else form that our current conditional negation pattern has, although I'm not sure how flexible this is. Probably not very. We really should be looking at combine. In fact, I would argue that we should be looking at combine regardless of whether or not we twiddle expansion as humans or machine generated code could look like this... jeff
Re: current NEON status
On 08/12/2014 12:20 PM, Richard Earnshaw wrote: On 12/08/14 07:49, Andrew Pinski wrote: On Mon, Aug 11, 2014 at 11:44 PM, Marat Zakirov wrote: Hi Vladimir! I think you are as the main IRA contributor would be appropriate person to answer question bellow. Please confirm or refute my statement about unsplittable register ranges in GCC IRA. On 07/30/2014 05:38 PM, Marat Zakirov wrote: Hi there! My question came from bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725. I found that GCC considers NEON register ranges as unsplittable. So any subregister may be used only after whole chunk is dead. This issue leads to redundant spill fills which is performance trouble. Example 1: RAL trouble #include #include extern uint16x8x4_t m0; extern uint16x8x4_t m1; extern uint16x8x4_t m2; extern uint16x8x4_t m3; extern uint16x8_t m4; void foo1(uint16_t * in_ptr) { uint16x8x4_t t0, t1, t2, t3; t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]); t1 = vld4q_u16((uint16_t *)&in_ptr[64]); t2 = vld4q_u16((uint16_t *)&in_ptr[128]); t3 = vld4q_u16((uint16_t *)&in_ptr[192]); m4 = t0.val[3]; m4 = m4 * 3; <<< * t0.val[3] = t1.val[3]; m0 = t3; m1 = t2; m2 = t1; m3 = t0; } Here test uses all NEON registers. No spill is needed. Because multiplication requires one Q register which may be obtained from dead t0.val[3] subregister. But GCC makes spill if multiplication (*) exists because of issue described above. Example 2: CSE makes trouble for IRA #include #include extern uint16x8x4_t m0; extern uint16x8x4_t m1; void foo2(uint16_t * in_ptr) { uint16x8x4_t t0, t1; t0 = vld4q_u16((uint16_t *)&in_ptr[0 ]); t1 = vld4q_u16((uint16_t *)&in_ptr[64]); t0.val[0] *= 333; t0.val[1] *= 333; t0.val[2] *= 333; t0.val[3] *= 333; t1.val[0] *= 333; t1.val[1] *= 333; t1.val[2] *= 333; t1.val[3] *= 333; m0 = t0; m1 = t1; } Here test uses only half NEON + one Q for '333' factor. But GCC makes spills here too! Briefly speak problem is in partial CSE. GCC generates rtl with the listed bellow form: Before CSE: a = b a0 = a0 * 3 a1 = a1 * 3 a2 = a2 * 3 a3 = a3 * 3 After: a = b a0 = b0 * 3 a1 = a1 * 3 <<< * a2 = a2 * 3 a3 = a3 * 3 CSE do not substitute b1 to a1 because at the moment (*) a0 was already defined so actually a != b. Yes but a1 = b1, unfortunately CSE also do not handle register-ranges parts as RA does. Strange thing here is that even if we fix CSE, so CSE could propagate register-ranges subregs, this will make trouble to RAL also because of the same reason: IRA do not handle precisely register ranges parts. I attached a demo patch which forbids partial CSE propagation and removes spills from Ex2. Is this patch OK? Or maybe CSE should be fixed in a different way? Or maybe partial substitution is OK? Main question: Are there any plans to fix/upgrade IRA? --Marat This patch is wrong and even more wrong. XImode is not defined in all targets. Even if it were, this still wouldn't be the right fix. What if a machine had a native XImode? Then you'd be arbitrarily disabling parts of the compiler. R. Maybe the better fix is to have lower subreg come along and split up the moves for a = b and then a pass after reload comes along and stitches it back together. Thanks, Andrew Hi all, First, thank you to pay attention for issue which is totally untrivial. I wrote above that my patch is only a demo which illustrates issue with register ranges. My question is about conception not implementation. This demo patch switch off CSE partial substitution allowing next compiler passes substitute whole b and then throw away a at all which dramatically decreases register pressure in the example above. It is strange that disabling some compiler functionality gives you a better result. Anyway I believe that only way to really fix the issue - is to upgrade IRA pass. Upgraded version of IRA should work with parts of range register as they are separate ones, they should have separate Live Ranges, etc. Could I propose appropriate version of my patch (which will do the same conceptually - forbidding partial CSE)? If not what would be the proper way to improve the code in my case?
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 12/08/14 16:16, Kyrill Tkachov wrote: On 12/08/14 16:11, Kyrill Tkachov wrote: On 12/08/14 15:22, Jeff Law wrote: On 08/12/14 04:31, Kyrill Tkachov wrote: On 12/08/14 10:39, Richard Biener wrote: On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law wrote: On 08/11/14 07:41, Kyrill Tkachov wrote: I haven't been able to get combine to match the comparison+xor+neg+plus RTL and it seems like it would be just a workaround to undo the tree-level transformation. Yea, it'd just be a workaround, but it's probably the easiest way to deal with this problem. Can you describe in further detail why you weren't able to get this to work? Too many instructions to combine I guess. You might want to add intermediate "combine" insn-and-splits. If that's still a no-go then read on. My guess was too many insns as well.. But that's often solvable. From the combine dump I can see that it tried to combine up to: (set (reg/i:SI 0 x0) (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ])) (reg:SI 73 [ D.2564 ])) (reg:SI 84 [ D.2565 ]))) And did it find a match for this? What happens if (just for testing purposes), you create a pattern for this? Does combine then try something even more complex, possibly getting your conditional negation? I managed to get combine to recognise this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 "register_operand" "r")) (match_operand:GPI 2 "register_operand" "r")) (match_dup 1))) Now what I need is for operand 1 to instead be a cc_reg comparison, but when I change operand 1 to this pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) This doesn't match. Is there any way to express this in a combineable pattern? argh, Thunderbird enforced the 80 character limit... The pattern that matched was: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 "register_operand" "r")) (match_operand:GPI 2 "register_operand" "r")) (match_dup 1))) If I match that to a define_and_split and split it into two sets: (set (reg1) (neg reg2)) (set (plus (xor (reg1) (reg3)) (reg2))) and then add a define_insn with the full pattern: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) Then this does manage to match the full thing that I want. But I had to add another define_split to break up the plus+xor Frankenstein's monster back into two separate insns for the cases where it picks up non-conditional-negate patterns. Does that seem reasonable or too much of a hack? Any plus+xor rtxs that get left after combine should be split up again relatively quickly in split1 and shouldn't inhibit further optimisation too badly, no? Kyrill And the one that failed to combine (with operand 1 substituted for a cc_reg comparison) was: (set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1 "aarch64_comparison_operator" [(match_operand:CC_ZERO 2 "cc_register" "") (const_int 0)])) (match_operand:GPI 3 "register_operand" "r")) (match_dup 1))) Thanks, Kyrill On the other hand, I did manage to write a peephole2 that detected the sequence of compare+neg+xor+plus and transformed it into the if_then_else form that our current conditional negation pattern has, although I'm not sure how flexible this is. Probably not very. We really should be looking at combine. In fact, I would argue that we should be looking at combine regardless of whether or not we twiddle expansion as humans or machine generated code could look like this... jeff
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Tue, Aug 12, 2014 at 04:16:34PM +0100, Kyrill Tkachov wrote: > >I managed to get combine to recognise this pattern: > >(set (match_operand:GPI 0 "register_operand" "=r") > > (plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1 > >"register_operand" "r")) > > (match_operand:GPI 2 "register_operand" "r")) > >(match_dup 1))) > > > >Now what I need is for operand 1 to instead be a cc_reg comparison, You can add "nonzero_bits (operands[1], mode) == 1" to this pattern's condition. Segher
survey on C++ exception handling
Dear all, I am conducting a survey about the use of exception handling constructs in C++. I would really appreciate if you could contribute to this research by answering a few questions on the subject. The survey is available on-line: https://pt.surveymonkey.com/s/exceptionHandling All the best, Rodrigo.