So, I don't know much about how nv50 ir works, but to me this just seems like a pretty slow implementation of a very limited instruction scheduler. In addition to the runtime complexity problems you mentioned, you're going to get a lot more benefit even from a very simple list scheduler compared to this, and it generally only takes a few hundred lines to write one. I'd send you some references, but I don't have any good explanations handy -- any decent compiler book should cover this, though. You could start at the wikipedia page for instruction scheduling to give you some idea of the main concepts [1], although it isn't great; for example, it doesn't mention the crucial difference between top-down and bottom-up scheduling. To give a concrete example, here's the skeleton of a basic list scheduler I wrote for freedreno a while ago [2]. The i965 scheduler right now isn't really great, and although I had a patch series to improve it a bit, it's a bit more complex than you need right now since it's designed to run pre-RA as well as post-RA so it needs to worry about register pressure. Nevertheless, you can find the end-result of my i965 scheduler series here [3]. If you ignore all the !post_ra stuff, it's a good example of what your scheduling heuristic should look like, particularly these lines [4]. Also, I'm here if you have any questions!
[1] https://en.wikipedia.org/wiki/Instruction_scheduling [2] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/gallium/drivers/freedreno/ir3/ir3_sched.c?h=ir3-sched [3] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp?h=i965-sched-v3&id=f5ff9c87b06a7eae0a08d381c02d0e0e131212ad [4] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp?h=i965-sched-v3&id=f5ff9c87b06a7eae0a08d381c02d0e0e131212ad#n1544 On Sat, Aug 13, 2016 at 6:02 AM, Karol Herbst <karolher...@gmail.com> wrote: > this paths reorders instructions in a way, so that the hardware is able to > dual issue more instructions leading to higher performance. > > with all previos commits put together > changes for ./GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 > /benchmark_duration_ms=60000 /width=1024 /height=640: > > inst_executed: 1.03G > inst_issued1: 614M -> 500M > inst_issued2: 213M -> 271M > > score: 1021 -> 1056 > > Signed-off-by: Karol Herbst <karolher...@gmail.com> > --- > .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 59 > ++++++++++++++++++++++ > 1 file changed, 59 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > index a9172f8..88354c9 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > @@ -3385,6 +3385,62 @@ DeadCodeElim::checkSplitLoad(Instruction *ld1) > > // > ============================================================================= > > +// Tries to improve dual issueing trivially > +// > +// the PASS checks for every instruction A,B with A->next == B and > +// !canDualIssue(A,B), if there is a Instruction C in the same BB which can > be > +// moved between A and B and canDualIssue(A,C) is true > +class PostRADualIssue : public Pass > +{ > +private: > + virtual bool visit(BasicBlock *); > +}; > + > +static bool > +isChainedCommutationLegal(Instruction *a, Instruction *b) > +{ > + Instruction *check = a; > + do { > + if (!check->isCommutationLegal(b) || (check->op == OP_TEX && check->op > == OP_TEX)) > + return false; > + check = check->next; > + } while (check != b); > + return true; > +} > + > +bool > +PostRADualIssue::visit(BasicBlock *bb) > +{ > + const Target *target = prog->getTarget(); > + Instruction *i, *next, *check; > + > + for (i = bb->getEntry(); i; i = next) { > + next = i->next; > + > + // check next > + if (!next || next->fixed || next->join || next->asFlow() || > target->canDualIssue(i, next)) > + continue; > + > + check = next->next; > + // we can't move fixed, flow instructions and instruction marked as > join > + while (check && !check->fixed && !check->join && !check->asFlow() && > check->prev->bb == check->bb && isChainedCommutationLegal(next, check)) { > + if (target->canDualIssue(i, check)) { > + // move the instruction after i step by step > + while (check->prev != i) > + bb->permuteAdjacent(check->prev, check); > + > + // we have to always continue with the currently next instruction > + next = i->next; > + break; > + } > + check = check->next; > + } > + } > + return true; > +} > + > +// > ============================================================================= > + > #define RUN_PASS(l, n, f) \ > if (level >= (l)) { \ > if (dbgFlags & NV50_IR_DEBUG_VERBOSE) \ > @@ -3420,6 +3476,9 @@ Program::optimizePostRA(int level) > RUN_PASS(2, FlatteningPass, run); > if (getTarget()->getChipset() < 0xc0) > RUN_PASS(2, NV50PostRaConstantFolding, run); > + // should be last > + if (getTarget()->hasDualIssueing()) > + RUN_PASS(2, PostRADualIssue, run); > > return true; > } > -- > 2.9.2 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev