On Wed, May 15, 2019 at 10:53:43AM +0200, Richard Biener wrote:
> I wonder if making the doloop patterns (tried to find them in rs6000.md,
> but I only see define_expands with no predicates/alternatives...)

"doloop_end" --> "ctr<mode>" --> "<bd>_<mode>" (all consecutive in
rs6000.md btw.)  Alternative 0 in "<bd>_<mode>" are the actual looping
instructions; the other alternatives are for the uncommon case where we
ended up not being able to use this insn after all.

> accept any counter register, just have a preference on that special
> counter reg and have the define_insn deal with RA allocating another
> one by emitting a regular update & branch-on-zero?

That is what those other alternatives do.  It is expensive, and cannot
even *work* in all cases.

We have no generic "branch on (not) zero" in Power, btw.  Archs that do
can use that as a doloop, if they choose IVs that end the loop at 0.

> That is, the penalty of doing that shouldn't be too big and thus
> we can more optimistically cost & handle "doloops"?

We have done that for ages, in the RTL level doloop handling.  With
newer hardware it becomes more and more expensive to guess wrong.

> I guess
> the doloop.c checks are quite too strict because we have to
> rely on RA being able to allocate that reg and as soon as we
> need to spill it using a general reg with update & branch-on-zero
> will be cheaper anyways?

(Update, compare, branch, for us).

We can predict quite well where the count register will be unavailable
to our doloops.  The cost if we are allocated a GPR isn't so bad: it
costs an insn or maybe two more than if we made optimal code (without
doloop).  But we can be allocated a floating point register, or memory,
instead.  That is heavily discouraged (by making it more expensive), but
it can still happen.  This is a jump_insn so it cannot get any reloads,
either; but even if it could, that is an *expensive* thing to do.


Segher
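
For illustration only, here is the loop shape under discussion, written as a source-level sketch in C.  The function names are made up for this example and nothing here is taken from rs6000.md; GCC does the real transformation on RTL, and whether a given loop ends up as a single decrement-and-branch insn (bdnz on Power, with the count in CTR) depends on the checks and the register allocation discussed above.

```c
#include <stddef.h>

/* Count-up form: loop control is an update, a compare and a branch
   on every iteration.  (Hypothetical example function.)  */
long sum_count_up(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Doloop-shaped form: the only loop-control value is a trip count that
   runs down to zero, so one decrement-and-branch-if-nonzero insn could
   close the loop if the count ends up in the count register.
   (Hypothetical example function.)  */
long sum_doloop_shape(const long *a, size_t n)
{
    long sum = 0;
    size_t count = n;           /* candidate for the count register */

    if (count == 0)             /* a do-while body runs at least once */
        return sum;
    do {
        sum += *a++;
    } while (--count != 0);     /* decrement, branch if not zero */
    return sum;
}
```

Both functions return the same sum for any array and length; the second just makes explicit the "IV that ends the loop at 0" form that a doloop needs.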