2016-10-09 21:34 GMT+02:00 Ilia Mirkin <imir...@alum.mit.edu>: > On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst <karolher...@gmail.com> wrote: >> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>: >>> >>> >>> On 10/08/2016 10:04 PM, Karol Herbst wrote: >>>> >>>> looks great, a few comments below >>> >>> >>> Thanks! >>> >>>> >>>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>: >>>>> >>>>> total instructions in shared programs :2286901 -> 2284473 (-0.11%) >>>>> total gprs used in shared programs :335256 -> 335273 (0.01%) >>>>> total local used in shared programs :31968 -> 31968 (0.00%) >>>>> >>>>> local gpr inst bytes >>>>> helped 0 41 852 852 >>>>> hurt 0 44 23 23 >>>>> >>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com> >>>>> --- >>>>> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 94 >>>>> ++++++++++++++++++++++ >>>>> 1 file changed, 94 insertions(+) >>>>> >>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>>> index 6efb29e..caf5d1d 100644 >>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb) >>>>> >>>>> // >>>>> ============================================================================= >>>>> >>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c) >>>>> +class LateAlgebraicOpt : public Pass >>>>> +{ >>>>> +private: >>>>> + virtual bool visit(BasicBlock *); >>>>> + >>>>> + void handleADD(Instruction *); >>>>> + bool tryADDToSHLADD(Instruction *); >>>>> + >>>>> + BuildUtil bld; >>>>> +}; >>>>> + >>>>> +void >>>>> +LateAlgebraicOpt::handleADD(Instruction *add) >>>>> +{ >>>>> + Value *src0 = add->getSrc(0); >>>>> + Value *src1 = add->getSrc(1); >>>>> + >>>>> + if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR) >>>>> + return; >>>>> + >>>>> + if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType)) >>>>> + tryADDToSHLADD(add); >>>>> +} >>>>> + >>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c) >>>>> +bool >>>>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add) >>>>> +{ >>>>> + Value *src0 = add->getSrc(0); >>>>> + Value *src1 = add->getSrc(1); >>>>> + Modifier mod[2]; >>>>> + Value *src; >>>>> + int s; >>>>> + >>>>> + if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8) >>>>> + return false; >>>>> + >>>>> + if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL) >>>>> + s = 0; >>>>> + else >>>>> + if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL) >>>>> + s = 1; >>>>> + else >>>>> + return false; >>>>> + >>>>> + src = add->getSrc(s); >>>>> + >>>>> + if (src->getUniqueInsn()->bb != add->bb) >>>>> + return false; >>>> >>>> >>>> is there a reason to check for the bb here? >>> >>> >>> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie. >>> inside the same basic block). I just apply the same rule here. >>> >>>> >>>>> + >>>>> + if (src->getInsn()->usesFlags() || src->getInsn()->subOp) >>>>> + return false; >>>>> + >>>>> + if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE) >>>>> + return false; >>>>> + >>>>> + mod[0] = add->src(0).mod; >>>>> + mod[1] = add->src(1).mod; >>>>> + >>>>> + add->op = OP_SHLADD; >>>>> + add->setSrc(2, add->src(s ? 0 : 1)); >>>> >>>> >>>> I usually use "s ^ 1" instead. >>> >>> >>> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole). >>> >>>> >>>>> + add->src(2).mod = mod[s]; >>>>> + >>>>> + add->setSrc(0, src->getInsn()->getSrc(0)); >>>>> + add->src(0).mod = mod[0]; >>>>> + add->setSrc(1, src->getInsn()->getSrc(1)); >>>>> + add->src(1).mod = Modifier(0); >>>>> + >>>>> + return true; >>>>> +} >>>>> + >>>>> +bool >>>>> +LateAlgebraicOpt::visit(BasicBlock *bb) >>>>> +{ >>>>> + Instruction *next; >>>>> + >>>>> + for (Instruction *i = bb->getEntry(); i; i = next) { >>>>> + next = i->next; >>>>> + switch (i->op) { >>>>> + case OP_ADD: >>>>> + handleADD(i); >>>>> + break; >>>>> + default: >>>>> + break; >>>>> + } >>>>> + } >>>> >>>> >>>> you can also just implement visit(Instruction *) and return true; then >>>> you won't need that silly loop. >>> >>> >>> True. >>> >>>> >>>>> + >>>>> + return true; >>>>> +} >>>>> + >>>>> +// >>>>> ============================================================================= >>>>> + >>>>> static inline void >>>>> updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn) >>>>> { >>>>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level) >>>>> RUN_PASS(2, AlgebraicOpt, run); >>>>> RUN_PASS(2, ModifierFolding, run); // before load propagation -> less >>>>> checks >>>>> RUN_PASS(1, ConstantFolding, foldAll); >>>>> + RUN_PASS(2, LateAlgebraicOpt, run); >>>> >>>> >>>> What is the reason to add a new AlgebraicOpt pass for this? >>> >>> >>> Because a bunch of optimizations are applied in ConstantFolding which means >>> this one has to be done *after*. My first attempt was to do it in >>> AlgebraicOpt, but it's actually not the best place. :-) >>> >> >> Yeah but this applies to many other opts as well. I think in the end >> we need to run those passes inside a loop and iterate like 2 or 3 >> times like I do here: >> https://github.com/karolherbst/mesa/commit/e2ae247aa27ddb150ff4ccd1c0f19baa470134ef >> >> The first line is cut of, but the benefit is around 0.5% less >> instructions and a decent speed up in pixmark_piano. > > This is different. The loop helps achieve a fixed point. However this > pass does not enable any further optimizations from taking place, so > it can easily be placed outside such a fixed-point loop. The general > reason why we don't have a loop at all is that the optimizations are > carefully ordered s.t. such a loop only provides minimal improvement. > > -ilia
right, there is still a pending fix for that on my side: https://github.com/karolherbst/mesa/commit/2cc9062e886175246d0f93041ea8e953928ba069 but there are other issues as well _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev