2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>
>
> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>
>> looks great, a few comments below
>
>
> Thanks!
>
>>
>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>>>
>>> total instructions in shared programs :2286901 -> 2284473 (-0.11%)
>>> total gprs used in shared programs :335256 -> 335273 (0.01%)
>>> total local used in shared programs :31968 -> 31968 (0.00%)
>>>
>>>             local        gpr       inst      bytes
>>>    helped       0         41        852        852
>>>    hurt         0         44         23         23
>>>
>>> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
>>> ---
>>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 94 ++++++++++++++++++++++
>>>  1 file changed, 94 insertions(+)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 6efb29e..caf5d1d 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>>
>>>  // =============================================================================
>>>
>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>> +class LateAlgebraicOpt : public Pass
>>> +{
>>> +private:
>>> +   virtual bool visit(BasicBlock *);
>>> +
>>> +   void handleADD(Instruction *);
>>> +   bool tryADDToSHLADD(Instruction *);
>>> +
>>> +   BuildUtil bld;
>>> +};
>>> +
>>> +void
>>> +LateAlgebraicOpt::handleADD(Instruction *add)
>>> +{
>>> +   Value *src0 = add->getSrc(0);
>>> +   Value *src1 = add->getSrc(1);
>>> +
>>> +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
>>> +      return;
>>> +
>>> +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
>>> +      tryADDToSHLADD(add);
>>> +}
>>> +
>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>> +bool
>>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
>>> +{
>>> +   Value *src0 = add->getSrc(0);
>>> +   Value *src1 = add->getSrc(1);
>>> +   Modifier mod[2];
>>> +   Value *src;
>>> +   int s;
>>> +
>>> +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
>>> +      return false;
>>> +
>>> +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
>>> +      s = 0;
>>> +   else
>>> +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
>>> +      s = 1;
>>> +   else
>>> +      return false;
>>> +
>>> +   src = add->getSrc(s);
>>> +
>>> +   if (src->getUniqueInsn()->bb != add->bb)
>>> +      return false;
>>
>> is there a reason to check for the bb here?
>
> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie.
> inside the same basic block). I just apply the same rule here.
>
>>
>>> +
>>> +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
>>> +      return false;
>>> +
>>> +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
>>> +      return false;
>>> +
>>> +   mod[0] = add->src(0).mod;
>>> +   mod[1] = add->src(1).mod;
>>> +
>>> +   add->op = OP_SHLADD;
>>> +   add->setSrc(2, add->src(s ? 0 : 1));
>>
>> I usually use "s ^ 1" instead.
>
> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).
>
>>
>>> +   add->src(2).mod = mod[s];
>>> +
>>> +   add->setSrc(0, src->getInsn()->getSrc(0));
>>> +   add->src(0).mod = mod[0];
>>> +   add->setSrc(1, src->getInsn()->getSrc(1));
>>> +   add->src(1).mod = Modifier(0);
>>> +
>>> +   return true;
>>> +}
>>> +
>>> +bool
>>> +LateAlgebraicOpt::visit(BasicBlock *bb)
>>> +{
>>> +   Instruction *next;
>>> +
>>> +   for (Instruction *i = bb->getEntry(); i; i = next) {
>>> +      next = i->next;
>>> +      switch (i->op) {
>>> +      case OP_ADD:
>>> +         handleADD(i);
>>> +         break;
>>> +      default:
>>> +         break;
>>> +      }
>>> +   }
>>
>> you can also just implement visit(Instruction *) and return true; then
>> you won't need that silly loop.
>
> True.
>
>>
>>> +
>>> +   return true;
>>> +}
>>> +
>>> +// =============================================================================
>>> +
>>>  static inline void
>>>  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>>>  {
>>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
>>>     RUN_PASS(2, AlgebraicOpt, run);
>>>     RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
>>>     RUN_PASS(1, ConstantFolding, foldAll);
>>> +   RUN_PASS(2, LateAlgebraicOpt, run);
>>
>> What is the reason to add a new AlgebraicOpt pass for this?
>
> Because a bunch of optimizations are applied in ConstantFolding which means
> this one has to be done *after*. My first attempt was to do it in
> AlgebraicOpt, but it's actually not the best place. :-)
>
Yeah, but this applies to many other opts as well. I think in the end we
need to run those passes inside a loop and iterate like 2 or 3 times, like
I do here:
https://github.com/karolherbst/mesa/commit/e2ae247aa27ddb150ff4ccd1c0f19baa470134ef

The first line is cut off, but the benefit is around 0.5% fewer
instructions and a decent speedup in pixmark_piano.

>>
>>>     RUN_PASS(1, LoadPropagation, run);
>>>     RUN_PASS(1, IndirectPropagation, run);
>>>     RUN_PASS(2, MemoryOpt, run);
>>> --
>>> 2.10.0
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev