On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni
<bilbotheelffri...@gmail.com> wrote:
> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni
>> <bilbotheelffri...@gmail.com> wrote:
>>> I have a few questions regarding genmatch:
>>>
>>> a) Why is 4 hard-coded here,
>>> in write_nary_simplifiers:
>>>  fprintf (f, "      tree captures[4] = {};\n");
>>
>> Magic number (this must be big enough for all cases ...).  Honestly
>> this should be improved (but requires another scan over the matcher IL
>> to figure out the max N used in @N).
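>>
>> As a rough sketch only (operand/expr/capture are the AST classes from
>> the patch; the accessors and the 'where' field below are invented),
>> that scan could look like:
>>
>>   /* Return 1 + the largest capture index @N used below OP, so that
>>      the caller can size the captures[] array exactly.  */
>>   static unsigned
>>   num_captures (operand *op)
>>   {
>>     if (capture *c = op->as_capture ())  /* invented accessor */
>>       return c->where + 1;               /* 'where' holds the N from @N */
>>     unsigned n = 0;
>>     if (expr *e = op->as_expr ())        /* invented accessor */
>>       for (unsigned i = 0; i < e->ops.length (); ++i)
>>         n = MAX (n, num_captures (e->ops[i]));
>>     return n;
>>   }
>>
>> (a capture that wraps a sub-expression would additionally need its
>> wrapped operand scanned), and write_nary_simplifiers would then emit
>> "tree captures[%u] = {};" with that value instead of the hard-coded 4.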
>>
>>> b) Should we add syntax for a symbol to denote multiple operators ?
>>> For example, in simplify_rotate:
>>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^  (CNT1 + CNT2 ==
>>> bitsize of type of X).
>>
>> Something to enhance the IL with, yes.  I'd say we support
>>
>> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>>
>> thus,
>>
>> (define_op <operator-name> op...)
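>>
>> with the idea (just to illustrate the intended use, not committed
>> syntax) that a pattern written against the operator-name expands to
>> one pattern per listed operator:
>>
>> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>> (match_and_simplify
>>   (additive @0 integer_zerop)
>>   @0)
>>
>> standing for three match_and_simplify patterns, one each for
>> PLUS_EXPR, MINUS_EXPR and POINTER_PLUS_EXPR.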
>>
>>> c) Remove the code for parsing a capture in parse_expr, since we reject
>>> outermost captured expressions?
>>
>> but parse_expr is also used for inner expressions, no?
>>
>> (plus (minus@2 @0 @1) @3)
>>
>> should still work
>>
>>> d) I am not able to follow this comment in match.pd:
>>> /* Patterns required to avoid SCCVN testsuite regressions.  */
>>>
>>> /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
>>>    complicated here, it does
>>>      Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>>      (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>>      if the new mask might be further optimized.  */
>>> (match_and_simplify
>>>   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
>>>   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
>>>   @0)
>>
>> The comment is literally copied from the case I extracted the
>> (simplified) variant from fold-const.c.  See lines 11961-12056 in 
>> fold-const.c.
>> It'll be a challenge to implement the equivalent in a pattern ;)
>>
>>>
>>> Decision Tree.
>>>     I have tried to come up with a prototype for decision tree (patch 
>>> attached).
>>> For simplicity, it handles patterns involving only unary operators and
>>> no predicates
>>> and returns false when the pattern fails to match (no goto to match
>>> another pattern).
>>> I meant to post it only for illustration, and I have not really paid
>>> attention to code quality (bad formatting, memory leaks, etc.).
>>>
>>> * Basic Idea
>>> A pattern consists of the following parts: match, ifexpr and result.
>>> Let's call <ifexpr, result> the "simplification" operand.
>>> The common prefix between different match operands would be represented
>>> by the same nodes in the decision tree.
>>>
>>> Example:
>>> (negate (bit_not @0))
>>> S1
>>>
>>> (negate (negate @0))
>>> S2
>>>
>>> S1, S2 denote simplifications for the above patterns respectively.
>>>
>>> The decision tree would look something like
>>> (that's the way it gets constructed with the patch):
>>>
>>>            dummy/root
>>>                 |
>>>            NEGATE_EXPR
>>>            /          \
>>>       BIT_NOT      NEGATE_EXPR
>>>          |              |
>>>         @0             @0
>>>          |              |
>>>         S1             S2
>>>
>>> a) The children of an internal node are the decisions that
>>> can be taken at that node. In the above case it's 2 for the outer NEGATE_EXPR.
>>> b) Simplification operands form the leaves of the decision tree.
>>> c) Instead of having a list of heads, I have added one dummy node,
>>> and the heads become children of this dummy root node.
>>> d) Code-gen for non-simplification operands involves generating
>>> "matching" code, and for simplification operands it involves generating
>>> "transform" code.
>>>
>>> * Overall Flow:
>>> I guess we would build the decision tree from the AST.
>>> So the flow would be like:
>>> source -> struct simplify (match, ifexpr, result) -> decision tree -> C code.
>>>
>>> Something like (in main):
>>> decision_tree dt;
>>> while (there is another pattern)
>>> {
>>>   simplify *s = parse_match_and_simplify ();
>>>   insert s into decision tree;
>>> };
>>> So parsing routines are concerned with parsing and building the AST 
>>> (operand),
>>> and not with the decision tree. Is that fine ?
>>
>> Yes, that's good.
>>
>>> * Representation of decision tree.
>>> A decision tree would need a way to represent language constructs
>>> like capture, predicate, etc., so in some ways it would be similar to the AST.
>>> In the patch, I have created the following hierarchy:
>>> dt_operand: represents a general base "operand" in the decision tree
>>> dt_expr: for representing expression. Expression contains operation
>>> to be performed (e_operation).
>>> dt_capture: analogous to capture.
>>> dt_simplify: for representing the "simplification" operand.
>>> A simplification consists of an ifexpr and a result.
>>> dt_head: to represent "dummy" root. Maybe a separate class is not needed.
>>>
>>> * Constructing decision tree from AST
>>> The algorithm I have used is similar to inserting a string into a trie,
>>> as outlined here: http://en.wikipedia.org/wiki/Trie
>>> The difference is that we traverse the AST depth-first rather than
>>> traversing a string.
>>> Apart from that I guess it would be the same (naturally the find and
>>> compare operations would be different).
>>> I haven't given this much thought yet. Currently, I construct the
>>> decision tree only for patterns with unary operators, in an
>>> ugly way: "flattening" the AST by walking it in pre-order and
>>> storing the nodes in a vector in the order they are visited.
>>>
>>> So to construct decision tree from the following AST:
>>>           negate
>>>               |
>>>           bit_not
>>>               |
>>>              @0
>>>
>>> The AST is flattened by walking it in preorder (walk_operand_preorder) and
>>> stored as: [ expr, expr, capture ],
>>> and this vector is used to construct the decision tree.
>>> I did this in a "quick and dirty" way; it has to be changed,
>>> and it won't work for patterns with n-ary operators.
>>>
>>> We should visit each node of the AST in preorder, and add that node
>>> to the decision tree during the traversal. I am not yet clear on the way of doing that.
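>>>
>>> One possible shape for it, as a sketch only (the constructor and helper
>>> functions on dt_operand below are invented; only the class names are
>>> from the patch):
>>>
>>> /* Insert the AST rooted at OP below decision-tree node PARENT,
>>>    reusing an existing child if an equal one is already there
>>>    (see "Comparing operands" below).  Returns the node under which
>>>    the next operand in pre-order should be chained.  */
>>> dt_operand *
>>> decision_tree::insert_operand (dt_operand *parent, operand *op)
>>> {
>>>   dt_operand *node = parent->find_child (op);    /* uses cmp_operand */
>>>   if (!node)
>>>     node = parent->append_child (new dt_operand (op));
>>>   if (expr *e = op->as_expr ())
>>>     for (unsigned i = 0; i < e->ops.length (); ++i)
>>>       node = insert_operand (node, e->ops[i]);
>>>   return node;
>>> }
>>>
>>> i.e. the recursion itself performs the preorder visit, and the node
>>> returned for the root is where the simplification leaf gets attached.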
>>>
>>> * Comparing operands
>>> How do we compare operands? We need to do this while inserting
>>> into the decision tree, since if an equal operand is already inserted we do not create
>>> a new node.
>>> In the patch (cmp_operand), I have used the following rules:
>>> a) if types not equal, they are clearly not equal.
>>> b) if type is expr, compare operation->op->id
>>> c) if type is capture, compare the number.
>>> d) for predicate, compare on ident ?
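>>>
>>> As a sketch, with invented member names ('kind' being whatever field
>>> distinguishes the AST node types), that boils down to:
>>>
>>> bool
>>> cmp_operand (operand *o1, operand *o2)
>>> {
>>>   if (o1->kind != o2->kind)                       /* rule a) */
>>>     return false;
>>>   if (o1->kind == OPERAND_EXPR)                   /* rule b) */
>>>     return (static_cast<expr *> (o1)->operation->op->id
>>>             == static_cast<expr *> (o2)->operation->op->id);
>>>   if (o1->kind == OPERAND_CAPTURE)                /* rule c) */
>>>     return (static_cast<capture *> (o1)->where
>>>             == static_cast<capture *> (o2)->where);
>>>   if (o1->kind == OPERAND_PREDICATE)              /* rule d) */
>>>     return strcmp (static_cast<predicate *> (o1)->ident,
>>>                    static_cast<predicate *> (o2)->ident) == 0;
>>>   return false;
>>> }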
>>>
>>> * Representing patterns with n-ary operators.
>>> Consider following two match operands:
>>> (plus (minus @0 @1) @1)
>>> (plus (negate @0) @1)
>>>
>>> In decision tree, it would be represented as:
>>>
>>>                  plus
>>>            /     |     |     \
>>>        minus    @1   negate   @1
>>>        /   \            |
>>>      @0     @1         @0
>>>
>>> plus has got 4 children - (minus @0 @1), @1, (negate @0), @1.
>>> However (minus @0 @1) and @1 are not separate "decisions", but
>>> children of plus within the same expression.
>>>
>>> For calculating the number of decisions at an expression node:
>>> number of decisions = number of children / arity of the operator.
>>> In the above case, number of decisions = 4 / 2 = 2.
>>>
>>> For other cases, for instance a capture node (whose children would be
>>> simplification operands),
>>> number of decisions = number of children.
>>
>> Hmm.  I think it should be only two children from the plus,
>> as pre-order for the first pattern is for example
>> [plus minus @0 @1 @1] so you'd have
>>
>>      plus
>>     /    \
>>  minus  negate
>>    |       |
>>   @0      @0
>>    |       |
>>   @1      @1
>>    |
>>   @1
>>
>> instead?
>>
>> Note that you probably have to deal with non-matching capture
>> IDs by re-writing them on-the-fly.  That is, the two
>> pre-oder traversals [plus minus @1 ...] and [plus minus @0 ...]
>> should commonize with renaming the non-matching capture
>> somehow.
> Um, could you please elaborate on that (non-matching capture ?),
> I am not sure if I understood it fully, thanks.

I'm worried about

(plus (minus@1 @2 @3) @2)

vs.

(plus (minus @1 @2) @1)

(just made-up examples, of course).  Here we'd like to
commonize the path checking for (plus (minus ...) ... but
of course we can't easily process the captures at that point
as the first has two captures (it captures the minus itself while
the 2nd doesn't) and the captures that are at the same position
use different indexes.  To finally match the minus we have to
verify that in one case @2 == @2 and in the other case that
@1 == @1.

Re-numbering capture indexes in pre-order would improve things
for the case where capture positions are in the same places but
the indexes do not match.  It won't handle the above case
of course.
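
The renaming itself could be done while producing the pre-order vector,
along these lines (sketch; 'where' stands for whatever field holds the
capture index and MAX_CAPTURES is made up):

  /* Map each original capture index to a new one, assigned in the
     order the captures are first seen during the pre-order walk.  */
  int remap[MAX_CAPTURES];
  memset (remap, -1, sizeof remap);
  unsigned next_id = 0;
  ...
  if (capture *c = op->as_capture ())
    {
      if (remap[c->where] == -1)
        remap[c->where] = next_id++;
      c->where = remap[c->where];
    }

so that for example (plus (minus @1 @2) @1) and (plus (minus @3 @4) @3)
both become (plus (minus @0 @1) @0) and share one decision-tree path.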

> I have attached a patch for patterns with binary operators (in principle
> it should work for n-ary operators,
> but I have only tested up to 2) that creates a decision tree for
> patterns (excluding predicates and multiple matching). I don't intend
> to commit this (it contains many hacks!).
>
> * Constructing decision tree from AST
> We can define our decision tree to store prefixes of preorder
> traversals of different ASTs.
> Insertion happens as follows (decision_tree::insert)
> a) Get preorder traversal of AST into vector.
> b) Insert AST nodes from vector into decision tree.
> Currently, I obtain the preorder traversal into a vector. We can later
> change it to compute
> the next preorder successor lazily.

Just make the code easy to understand; optimizing it should be
secondary.

> * Code-gen
> For n-ary operators, the patch generates code as follows:
> if (code == <expr code>)
> {
>    match operand 0
>      match operand 1
>         ....
>           match operand n-1
>             transform
>             return true;
> }
>
> * Adding parent, level, pos fields to AST (operand).
> One change (hack) I have made to code-gen is the naming of operands that are
> assigned from gimple_assign_rhs* in the code-gen of expressions.
> I am not sure if this is really needed.
>
> In the AST, we have the following representation for expressions:
>            expr
>           /    \
>        left    right
>      operand  operand
>
> code-gen off AST (expr::gen_gimple_match) produces code like:
> {
>   tree op = gimple_assign_rhs1 (def_stmt);
>   // code-gen for left operand
> }
> {
>   tree op = gimple_assign_rhs2 (def_stmt);
>   // code-gen for right operand
> }
>
> We can do this since, in expr::gen_gimple_match, we call
> left_operand->gen_gimple_match,
> come back to expr::gen_gimple_match (after
> left_operand->gen_gimple_match() returns), and
> then call right_operand->gen_gimple_match().
>
> However in the decision tree, the expression gets represented as follows:
>            expr
>              |
>            left-operand
>              |
>            right-operand
>
> dt_expr::gen_gimple calls left_operand->gen_gimple, which calls
> right_operand->gen_gimple,
> so we return to dt_expr::gen_gimple only after code has been generated for the
> whole subtree of expr.
>
> So I assign operands at the start before generating code for the subtree:
> tree op00 = gimple_assign_rhs1 (def_stmt);
> tree op10 = gimple_assign_rhs2 (def_stmt);
> <code for left-operand>
>   <code for right-operand>
> each of the operands knows its position, so it knows which operand
> to use (get_op_name ()).
> The names are generated as
> op<position><level>, where position = index of the operand in its parent
> expr's vector (vec<operand *> ops),
> and level = the level of the AST at which the operand is stored.
>
> For this I added three fields to operand (unsigned pos, unsigned
> level, operand *parent), and accordingly
> made changes to expr::append_op. In a way this is abusing the AST.
> This probably works (it works with the test-cases I tried), but I don't like it much.
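>
> For reference, the name generation itself is then just (sketch; the
> actual helper in the patch may differ):
>
> /* Build "op<position><level>", e.g. op00 for gimple_assign_rhs1 and
>    op10 for gimple_assign_rhs2 of the outermost expression.  */
> const char *
> operand::get_op_name ()
> {
>   char *name = XNEWVEC (char, 32);
>   snprintf (name, 32, "op%u%u", pos, level);
>   return name;
> }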

The naming is ok, but indeed it doesn't feel very clean.  Somehow this
belongs to the decision tree instead... (but then you need to look up
the decision tree operand info from code-gen which happens off the AST...).

> * Order of matching.
> Currently the order of matching operands is strictly 0, 1, ..., n - 1
> for an n-ary operator.
> However this might not be the best choice.
>
> For example, consider the following patterns:
> (operator  op1  op) S1
> (operator  op2 op)  S2
>
> Both have the same operator and the same 1st operand; they differ in the 0th operand.
> With patch, the code generated is:
> if (code == <expr-code>)
> {
>   match op1
>      match op
>        S1
>
>   match op2
>     match op
>       S2
> }
>
> A better ordering would be to match the common 1st operand first and then the
> respective 0th operands of both patterns:
>
> if (code == <expr-code>)
> {
>   match op
>     match op1
>       S1
>     match op2
>       S2
> }
>
> This complicates insertion into decision tree.

I'd say we leave "optimizing" the decision tree to a later stage - it's a global
optimization problem after all (I believe you can't do this optimally with just
looking at the present decision tree and the AST you want to insert).

> * hack: only one generated gimple_match_and_simplify function
> In the patch, I generate code only for the 3-operand version of
> gimple_match_and_simplify (op0, op1, op2),
> and the other two (unary and binary) versions call it with NULL_TREE
> for the extra operands.
> I will soon change that.
>
> * Testing patterns.
> I think I have written most of the test-cases wrongly in match-2.c.
> For example, this pattern doesn't match
> (I haven't checked in the "fix testsuite fallout" patch, so this might not be true
> anymore):
>
> (match_and_simplify
>   (plus (minus @0 @1) @1)
>   @0)
>
> for this test-case:
> /* (x - y) + y -> x */
> int f6(int x, int y)
> {
>   int t1 = x - y;
>   return t1 + y;
> }
> /* { dg-final { scan-tree-dump "gimple_match_and_simplified to
> \[^\n\r\]*= x_\\d\+\\(D\\)" "forwprop1" } } */
>
> I get following output (tree-forwprop-details):
>
> ;; Function f6 (f, funcdef_no=0, decl_uid=1744, symbol_order=0)
>
> gimple_match_and_simplified to _4 = y_2(D) + t1_3;
> f (int x, int y)
> {
>   int t1;
>   int _4;
>
>   <bb 2>:
>   t1_3 = x_1(D) - y_2(D);
>   _4 = x_1(D);
>   return _4;
>
> }
>
> Shouldn't it show: gimple_match_and_simplified to _4 = x_1(D) instead ?

Yeah.  I suppose you hit the issue that we don't commutate patterns,
so try with a commutated pattern inserted.
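
That is, for the f6 case above, try also inserting (by hand, for now)
the commutated form of the pattern, e.g.

(match_and_simplify
  (plus @1 (minus @0 @1))
  @0)

and check whether the dump then reports the expected _4 = x_1(D).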

> The issue is that this test-case fails in isolation; however, it passes when
> it is placed with other
> test cases that PASS and have the same regex in scan-tree-dump as this test-case:
> scan-tree-dump "gimple_match_and_simplified to \[^\n\r\]*= x_\\d\+\\(D\\)"
>
> Summary:
> I think we need to (at least) deal with the following problems for the decision tree:

I have to leave for a while now; I'll come back later and have a detailed
look at the patch to try to answer the questions below.

Richard.

> a) Representation and construction of the decision tree from the AST -
> we do this by storing prefixes of preorder traversals of the ASTs in the
> decision tree. Shall we finalize on that?
> b) Order of operand matching
> c) Generating patterns for built-in functions (I guess this will be
> similar to expr-gen).
> d) Multiple matching patterns
> e) Commutative ops
>
> Pattern Enhancements:
> Some of these were already discussed; I am jotting them down here:
> a) Symbol for multiple operators
> b) grouping patterns by ifexpr
> c) should we have MATCH_FAILED or FAIL to explicitly denote failure in
> manual transform (c_expr) ?
> d) marking operators as commutative.
>
> Once the decision tree is done, I can try working on these.
> By Sunday, I will attempt to have another prototype that covers
> most of the patterns in match.pd
> (except for cond_expr, and those requiring GENERIC support), with a
> correct accompanying test-case
> for each pattern, and next week start working on a fair patch.
>
> Thanks and Regards,
> Prathamesh
>>
>> Richard.
>>
>>> * Code generation.
>>> Code shall be generated by walking the decision tree.
>>> The way it's constructed, there's no difference between code generation
>>> for "matching" and code generation for "transform". For non-simplificaton
>>> operands, "matching" code is generated, and for "simplification" operands,
>>> "transform" code is generated. The tree shall be walked twice,
>>> once to generate GIMPLE code and second time for GENERIC.
>>> For simplicity, I currently return false whenever there's a fail in match,
>>> instead of trying to match the next pattern.
>>>
>>> Code-gen for capture - same as capture::gen_gimple_match.
>>>
>>> Code-gen for predicate -  I haven't added support for predicate in
>>> decision tree yet, but I guess that would be the same as
>>> predicate::gen_gimple_match
>>>
>>> Code-gen for expr.
>>> There are two types of code-gen for expr.
>>> The patch generates following code:
>>> Type 1 - expr is a child of the root node.
>>> The only code that gets generated is (in decision_tree::gen_gimple):
>>> if (code == <expr code>)
>>> {
>>> tree captures[4] = {};
>>> <generated code for children>
>>> }
>>> This is similar to generating matching code in write_nary_simplifiers.
>>>
>>> Type 2 - expr_1 is a child of another expr_0 node.
>>> The code gets generated as follows (dt_expr::gen_gimple):
>>> {
>>> gimple def_stmt = SSA_NAME_DEF_STMT (op);
>>> if (is_gimple_assign (def_stmt)
>>>     && gimple_assign_rhs_code (def_stmt) == <expr_1-code>)
>>> {
>>> tree op = gimple_assign_rhs1 (def_stmt);
>>> if (valueize && TREE_CODE (op) == SSA_NAME)
>>> {
>>>   op = valueize (op);
>>>   if (!op) return false;
>>> }
>>> <code-gen for children of expr_1 node>
>>> }
>>>
>>> Example:
>>> (negate (bit_not @0))
>>> S1
>>>
>>> (negate (negate @0))
>>> S2
>>>
>>> decision tree:
>>>
>>>            dummy/root
>>>                 |
>>>            NEGATE_EXPR
>>>            /          \
>>>       BIT_NOT      NEGATE_EXPR
>>>          |              |
>>>         @0             @0
>>>          |              |
>>>         S1             S2
>>>
>>> // code-gen for NEGATE_EXPR (child of root):
>>> if (code == NEGATE_EXPR)
>>> {
>>> tree captures[4] = {};
>>> // code gen for BIT_NOT_EXPR
>>> {
>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0);
>>> if (is_gimple_assign (def_stmt)
>>>     && gimple_assign_rhs_code (def_stmt) == BIT_NOT_EXPR)
>>> {
>>> tree op = gimple_assign_rhs1 (def_stmt);
>>> if (valueize && TREE_CODE (op) == SSA_NAME)
>>> {
>>>   op = valueize (op);
>>>   if (!op) return false;
>>> }
>>>
>>> // code-gen for @0, child of BIT_NOT_EXPR
>>> if (!captures[0])
>>>   captures[0] = op;
>>> else if (captures[0] != op)
>>>   return false;
>>>
>>> // code-gen for S1, child of @0
>>> < same as code generated by .gen_gimple_transform >
>>> return true;
>>> }
>>> // code gen for inner NEGATE_EXPR
>>> {
>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0);
>>> if (is_gimple_assign (def_stmt)
>>>     && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR)
>>> <rest similar to the BIT_NOT case>
>>> }
>>>
>>> The following gets duplicated with the patch:
>>> {
>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0);
>>> if (TREE_CODE (op0) != SSA_NAME)
>>>   return false;
>>> if (is_gimple_assign (def_stmt)
>>>     && gimple_assign_rhs_code (def_stmt) == <expr-code>)
>>> ...
>>> }
>>>
>>> Improving code-gen for expr:
>>> "gimple def_stmt = ..." and "if (TREE_CODE (op0)" get duplicated,
>>> while they could be factored out, similar to this:
>>>
>>> {
>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0);
>>> if (TREE_CODE (op0) != SSA_NAME)
>>>   return false;
>>> if (!is_gimple_assign (def_stmt))
>>>   return false;
>>> if (gimple_assign_rhs_code (def_stmt) == BIT_NOT_EXPR)
>>> {
>>>   // code-gen for BIT_NOT_EXPR subtree
>>> }
>>> else if (gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR)
>>> {
>>>   // code-gen for NEGATE_EXPR subtree
>>> }
>>> }
>>>
>>> To factor out "gimple def_stmt ..." and "if (TREE_CODE (op0) != SSA_NAME)",
>>> we could have them generated at the parent of the expr's node rather than
>>> at the expr. However that would be incorrect for cases where not all children
>>> of a node are expressions:
>>>
>>> Example:
>>> // patterns only for illustration
>>> (negate (bit_not @0))
>>> (negate @0)
>>>
>>>          root
>>>            |
>>>         negate
>>>         /     \
>>>     bit_not    @0
>>>        |
>>>       @0
>>>
>>> we cannot have the above code generated at negate,
>>> since it's not applicable to negate's 2nd child (@0).
>>>
>>> This can be done by grouping together children that are expressions.
>>> However the patch does not do that.
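>>>
>>> A possible shape for that grouping at code-gen time (sketch only; the
>>> dt_operand members and emit_* helpers are invented):
>>>
>>> /* Emit the def_stmt preamble once, then dispatch on the assign code
>>>    for every child that is an expression; capture/predicate children
>>>    keep getting their code emitted separately, as before.  */
>>> auto_vec<dt_operand *> expr_kids;
>>> for (unsigned i = 0; i < kids.length (); ++i)
>>>   if (kids[i]->is_expr ())
>>>     expr_kids.safe_push (kids[i]);
>>> if (!expr_kids.is_empty ())
>>>   {
>>>     emit_def_stmt_preamble (f);   /* TREE_CODE check + SSA_NAME_DEF_STMT */
>>>     for (unsigned i = 0; i < expr_kids.length (); ++i)
>>>       emit_if_on_assign_code (f, expr_kids[i]);
>>>   }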
>>>
>>> * Code-gen for simplification operand
>>> This involves code-gen for the ifexpr and the result of the pattern.
>>> It calls gen_gimple_transform of the ifexpr and the result (dt_simplify::gen_gimple),
>>> so this is really code-gen off the AST.
>>
>> Right (modulo replacing captures with their replacements).
>>
>>> * Matching multiple patterns
>>> A pattern has the following parts: match, ifexpr and result.
>>> If a pattern fails in the match operand, I guess we can safely return false?
>>> We "club" together patterns that have the same match operand,
>>> and if one of them fails in its (ifexpr/result), we use a goto to the
>>> (ifexpr/result) of the next pattern.
>>>
>>> Example:
>>> /* x & 0 -> 0 */
>>> (match_and_simplify
>>>   (bit_and @0 @1)
>>>   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && (@1 == integer_zero_node))
>>>   { integer_zero_node; })
>>>
>>> /* x & -1 -> x */
>>> (match_and_simplify
>>>   (bit_and @0 @1)
>>>   if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>>      && (@1 == integer_minus_one_node))
>>>   @0)
>>>
>>> For both patterns match is same.
>>>
>>> Decision Tree:
>>>        bit_and
>>>        /     \
>>>      @0       @1
>>>                |
>>>             S1, S2
>>
>> I think it's worth adding a diagnostic to genmatch whenever exactly the
>> same matches appear.  But I'd say generally we'd support this
>> by testing the ifexpr of the next pattern.
>>
>>> S1 represents the <ifexpr, result> of pattern-1, and S2 represents the
>>> <ifexpr, result> of pattern-2.
>>> S1 and S2 would be stored as children of @1 (the last operand of the n-ary
>>> operator),
>>> in the dt_operand::kids vector.
>>>
>>> The code would be generated as:
>>>
>>> matching code.
>>> if (! pattern-1 ifexpr condition)
>>>   goto simplify2;  // next pattern with the same "match" operand.
>>> transform code for pattern 1
>>>
>>> simplify2:
>>> if (! pattern-2 ifexpr condition)
>>>   return false;  // last pattern
>>> transform code for pattern 2.
>>>
>>> If matching itself fails, that is, neither of the decisions gets matched,
>>> I believe we can then return false, as it cannot match any other pattern?
>>>
>>> * patterns needing hacks like cond_expr or GENERIC support
>>> I haven't given much thought to this yet. I suppose we can look at handling
>>> these after adding support for GENERIC. Instead of generating GENERIC
>>> matching in
>>> gimple_match_and_simplify, could we then call generic_match_and_simplify 
>>> from
>>> within gimple_match_and_simplify ?
>>
>> Yes (that's what's currently done btw).
>>
>>> * Tests
>>> The patch transformed the following patterns:
>>>
>>> (match_and_simplify
>>>   (negate (bit_not @0))
>>>   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>>   (plus @0 { build_int_cst (TREE_TYPE (@0), 1); }))
>>>
>>> (match_and_simplify
>>>   (negate (negate @0))
>>>   @0)
>>>
>>> I have attached the test-case I tried it with (negate.c).
>>>
>>> * Conclusion
>>> Does it sound reasonable ? I am going to be re-writing the
>>> decision tree from scratch, but is the basic idea fine ? Or should we
>>> take a different approach ?
>>>
>>> Thanks and Regards,
>>> Prathamesh
