On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst <karolher...@gmail.com> wrote:
> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>>
>>
>> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>>
>>> looks great, a few comments below
>>
>>
>> Thanks!
>>
>>>
>>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>>>>
>>>> total instructions in shared programs : 2286901 -> 2284473 (-0.11%)
>>>> total gprs used in shared programs    : 335256 -> 335273 (0.01%)
>>>> total local used in shared programs   : 31968 -> 31968 (0.00%)
>>>>
>>>>             local     gpr      inst     bytes
>>>>     helped      0      41       852       852
>>>>       hurt      0      44        23        23
>>>>
>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
>>>> ---
>>>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 94 ++++++++++++++++++++++
>>>>  1 file changed, 94 insertions(+)
>>>>
>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> index 6efb29e..caf5d1d 100644
>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>>>
>>>>  // =============================================================================
>>>>
>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>>> +class LateAlgebraicOpt : public Pass
>>>> +{
>>>> +private:
>>>> +   virtual bool visit(BasicBlock *);
>>>> +
>>>> +   void handleADD(Instruction *);
>>>> +   bool tryADDToSHLADD(Instruction *);
>>>> +
>>>> +   BuildUtil bld;
>>>> +};
>>>> +
>>>> +void
>>>> +LateAlgebraicOpt::handleADD(Instruction *add)
>>>> +{
>>>> +   Value *src0 = add->getSrc(0);
>>>> +   Value *src1 = add->getSrc(1);
>>>> +
>>>> +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
>>>> +      return;
>>>> +
>>>> +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
>>>> +      tryADDToSHLADD(add);
>>>> +}
>>>> +
>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>>> +bool
>>>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
>>>> +{
>>>> +   Value *src0 = add->getSrc(0);
>>>> +   Value *src1 = add->getSrc(1);
>>>> +   Modifier mod[2];
>>>> +   Value *src;
>>>> +   int s;
>>>> +
>>>> +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
>>>> +      return false;
>>>> +
>>>> +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
>>>> +      s = 0;
>>>> +   else
>>>> +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
>>>> +      s = 1;
>>>> +   else
>>>> +      return false;
>>>> +
>>>> +   src = add->getSrc(s);
>>>> +
>>>> +   if (src->getUniqueInsn()->bb != add->bb)
>>>> +      return false;
>>>
>>>
>>> is there a reason to check for the bb here?
>>
>>
>> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (i.e.
>> inside the same basic block). I just apply the same rule here.
>>
>>>
>>>> +
>>>> +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
>>>> +      return false;
>>>> +
>>>> +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
>>>> +      return false;
>>>> +
>>>> +   mod[0] = add->src(0).mod;
>>>> +   mod[1] = add->src(1).mod;
>>>> +
>>>> +   add->op = OP_SHLADD;
>>>> +   add->setSrc(2, add->src(s ? 0 : 1));
>>>
>>>
>>> I usually use "s ^ 1" instead.
>>
>>
>> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).
>>
>>>
>>>> +   add->src(2).mod = mod[s];
>>>> +
>>>> +   add->setSrc(0, src->getInsn()->getSrc(0));
>>>> +   add->src(0).mod = mod[0];
>>>> +   add->setSrc(1, src->getInsn()->getSrc(1));
>>>> +   add->src(1).mod = Modifier(0);
>>>> +
>>>> +   return true;
>>>> +}
>>>> +
>>>> +bool
>>>> +LateAlgebraicOpt::visit(BasicBlock *bb)
>>>> +{
>>>> +   Instruction *next;
>>>> +
>>>> +   for (Instruction *i = bb->getEntry(); i; i = next) {
>>>> +      next = i->next;
>>>> +      switch (i->op) {
>>>> +      case OP_ADD:
>>>> +         handleADD(i);
>>>> +         break;
>>>> +      default:
>>>> +         break;
>>>> +      }
>>>> +   }
>>>
>>>
>>> you can also just implement visit(Instruction *) and return true; then
>>> you won't need that silly loop.
>>
>>
>> True.
>>
>>>
>>>> +
>>>> +   return true;
>>>> +}
>>>> +
>>>> +// =============================================================================
>>>> +
>>>>  static inline void
>>>>  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>>>>  {
>>>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
>>>>     RUN_PASS(2, AlgebraicOpt, run);
>>>>     RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
>>>>     RUN_PASS(1, ConstantFolding, foldAll);
>>>> +   RUN_PASS(2, LateAlgebraicOpt, run);
>>>
>>>
>>> What is the reason to add a new AlgebraicOpt pass for this?
>>
>>
>> Because a bunch of optimizations are applied in ConstantFolding, which
>> means this one has to be done *after*. My first attempt was to do it in
>> AlgebraicOpt, but it's actually not the best place. :-)
>>
>
> Yeah, but this applies to many other opts as well. I think in the end
> we need to run those passes inside a loop and iterate like 2 or 3
> times, like I do here:
> https://github.com/karolherbst/mesa/commit/e2ae247aa27ddb150ff4ccd1c0f19baa470134ef
>
> The first line is cut off, but the benefit is around 0.5% fewer
> instructions and a decent speed-up in pixmark_piano.
This is different. The loop helps achieve a fixed point. However this pass
does not enable any further optimizations, so it can easily be placed
outside such a fixed-point loop.

The general reason why we don't have a loop at all is that the
optimizations are carefully ordered s.t. such a loop only provides a
minimal improvement.

  -ilia
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev