On Thu, 2017-12-14 at 13:43 -0700, Jeff Law wrote:
> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> > On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> > > On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > > > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > > So, the story of this very small patch starts with me
> > > > > > adding
> > > > > > patterns
> > > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > > 
> > > > > >   [(set (pc)
> > > > > >     (if_then_else
> > > > > >       (and
> > > > > >          (ne (match_operand:P 1 "register_operand"
> > > > > > "c,*b,*b,*b")
> > > > > >              (const_int 1))
> > > > > >          (match_operator 3 "branch_comparison_operator"
> > > > > >                   [(match_operand 4 "cc_reg_operand"
> > > > > > "y,y,y,y")
> > > > > >                    (const_int 0)]))
> > > > > >                   (label_ref (match_operand 0))
> > > > > >                   (pc)))
> > > > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > > > "=1,*r,m,*d*wi*c*l")
> > > > > >     (plus:P (match_dup 1)
> > > > > >             (const_int -1)))
> > > > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > > > 
> > > > > > However when this gets to the loop_doloop pass, we get an
> > > > > > assert
> > > > > > fail
> > > > > > in iv_number_of_iterations():
> > > > > > 
> > > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > > 
> > > > > > This is happening because this branch insn tests two things
> > > > > > ANDed
> > > > > > together so the and is at the top of the expression, not a
> > > > > > comparison.
> > > > > 
> > > > > Is this something you've created for an existing
> > > > > loop?  Presumably an
> > > > > existing loop that previously was a simple loop?
> > > > 
> > > > The rtl to use this instruction is generated by new code I'm
> > > > working on
> > > > to do a builtin expansion of memcmp using a loop. I call
> > > > gen_bdnztf_di() to generate rtl for the insn. It would be nice
> > > > to
> > > > be
> > > > able to generate this instruction from doloop conversion but
> > > > that
> > > > is
> > > > beyond the scope of what I'm working on presently.
> > > 
> > > Understood.
> > > 
> > > So what I think (and I'm hoping you can confirm one way or the
> > > other)
> > > is
> > > that by generating this instruction you're turing a loop which
> > > previously was considered a simple loop by the IV code and
> > > turning it
> > > into something the IV bits no longer think is a simple loop.
> > > 
> > > I think that's problematical as when the loop is thought to be a
> > > simple
> > > loop, it has to have a small number of forms for its loop
> > > back/exit
> > > loop
> > > tests and whether or not a loop is a simple loop is cached in the
> > > loop
> > > structure.
> > > 
> > > I think we need to dig into that first.  If my suspicion is
> > > correct
> > > then
> > > this patch is really just papering over that deeper problem.  So
> > > I
> > > think
> > > you need to dig a big deeper into why you're getting into the
> > > code in
> > > question (canonicalize_condition) and whether or not the call
> > > chain
> > > makes any sense given the changes you've made to the loop.
> > > 
> > 
> > Jeff,
> >   There is no existing loop structure. This starts with a memcmp()
> > call
> > and then goes down through the builtin expansion mechanism, which
> > is
> > ultimately expanding the pattern cmpmemsi which is where my code is
> > generating a loop that finishes with bdnzt. The code that's
> > ultimately
> > generated looks like this:
> 
> Understood.  But what I still struggle with is how you're getting
> into
> check_simple_exit to begin with and whether or not that should be
> happening.
> 
> 
> The only way to get into check_simple_exit is via find_simple_exit
> which
> is only called from get_simple_loop_desc.
> 
> And if you're calling get_simple_loop_desc, then there is some kind
> of
> loop structure in place AFAICT that contains this insn which is
> rather
> surprising.
> 
> > 
> > I really think the ultimate problem here is that both
> > canonicalize_condition and get_condition promise in their
> > documenting
> > comments that they will return something that has a cond at the
> > root of
> > the rtx, or 0 if they don't understand what they're given. In this
> > case
> > they do not understand the rtx of bdnzt and are returning rtx
> > rooted
> > with an and, not a cond. This may seem like papering over the
> > problem,
> > but I think it is legitimate for these functions to return 0 when
> > the
> > branch insn in question does not have a simple cond at the heart of
> > it.
> > And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> > Ultimately, yes something better ought to be done here.
> 
> 
> 
> Your pattern has the form:
> 
>   [(set (pc)
>       (if_then_else
>         (and
>            (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>                (const_int 1))
>            (match_operator 3 "branch_comparison_operator"
>                     [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>                      (const_int 0)]))
>                     (label_ref (match_operand 0))
>                     (pc)))
>    (set (match_operand:P 2 "nonimmediate_operand"
> "=1,*r,m,*d*wi*c*l")
>       (plus:P (match_dup 1)
>               (const_int -1)))
>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> 
> 
> 
> That's a form that get_condition knows how to parse.  It's going to
> pull
> out the condition which looks like this:
> 
> 
>         (and
>            (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>                (const_int 1))
>            (match_operator 3 "branch_comparison_operator"
>                     [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>                      (const_int 0)]))
> 
> ANd pass that down to canonicalize_condition.  That doesn't look like
> something canonicalize_condition should handle and thus it ought to
> be
> returning NULL_RTX.
> 
> However, I'm still concerned about how we got to a point where this
> is
> happening.  So while we can fix canonicalize_condition to reject this
> form (and you can argue we should and I'd generally agree with you),
> it
> could well be papering over a problem earlier.
> 

Jeff,
  How we end up there is that I do builtin expansion on a memcmp() call
and generate the RTL form for code such as this (35 byte memcmp,
ppc64le power8):

             li 8,2
             mtctr 8
             li 6,0
             li 5,8
     .L13:
             ldbrx 7,3,6
             ldbrx 9,10,6
             ldbrx 0,3,5
             ldbrx 4,10,5
             addi 6,6,16
             addi 5,5,16
             subfc. 9,9,7
             bne 0,.L10
             subfc. 9,4,0
             bdnzt 2,.L13
             bne 0,.L10
             add 3,3,6
             add 10,10,6
             addi 9,3,-5
             ldbrx 7,0,9
             addi 9,10,-5
             ldbrx 9,0,9
             subfc 9,9,7
     .L10:
             popcntd 9,9
             subfe 10,10,10
             or 9,9,10

This happens at expand. Then later down the line, the doloop pass
(doloop_optimize_loops) calls doloop_optimize on each loop. That in
turn calls get_simple_loop_desc().

All I did here was to replace the memcmp() call with a bunch of insns
and a couple labels adding a loop to the code. If I just used another
bne and bdnz I think everything would go through fine. But my
performance testing shows that there is a definite advantage to using
bdnzt here.

To summarize:
 * Input code is a call to memcmp()
 * builtin expansion happens
 * I generate insns forming a loop ending with bdnzt.
 * This loop is processed by doloop_optimize_loops.
 * canonicalize_condition() doesn't understand the RTL for bdnzt.

The beauty of doing stuff like this for builtin expansion is that the
expanded RTL code gets exposed to all of the back end optimizations.
CSE, constant propagation, and other things can do weird and wonderful
things to the code that was once a memcmp(). So if I generate a loop in
the expansion, of course that will be processed by doloop. I don't see
why we would want it any other way.

The function expand_compare_loop() that does this expansion and
generates the loop is in this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00623.html

Thanks,
    Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Reply via email to