Thanks for all your work on Nouveau, and I look forward to your contributions to radeonsi.
On Thu, 22 Dec 2016 at 23:16 Samuel Pitoiset <samuel.pitoi...@gmail.com> wrote:
> Hello,
>
> This series makes use of the scheduling control codes in order to improve
> instruction pipelining on Maxwell GPUs.
>
> Starting with the Kepler architecture, where a control instruction has to be
> inserted every 7 instructions, Maxwell added additional control codes and the
> control instruction now has to be inserted every 3 instructions. Maxwell
> control codes are really powerful and well documented [1]. By the way, I would
> like to thank Scott Gray, who did awesome reverse engineering work, although I
> had to figure out the missing parts myself.
>
> On Maxwell, control codes are mainly used for setting stall counts and for
> producing/consuming dependency barriers in order to avoid hazards. I'm not
> going to explain in detail how they work because the documentation is quite
> good and because I added explanations here and there in the source code. But
> the main thing to understand is that the previous control code used by default
> (i.e. st 0x0) means "wait for all dependencies and stall the pipeline for 15
> cycles, which is the maximum".
> Which is quite bad...
>
> Now, let's have a look at the (impressive) performance improvements. :-)
> I measured on a GeForce GTX 750 Ti (GM107) reclocked to the highest perf level,
> with and without the control codes (NV50_PROG_SCHED=0/1).
>
> app: number of FPS without -> number of FPS with (+gain%)
>
> FurMark: 13 -> 42 (+223%)
> Pixmark Piano: 2 -> 7 (+250%)
> Pixmark Volplosion: 6 -> 20 (+233%)
> Julia F32: 61 -> 219 (+259%)
> LightMarks: 352 -> 685 (+94%)
> Heaven (low): 51 -> 102 (+100%)
> Heaven (ultra): 14 -> 27 (+93%)
> Valley (low): 30 -> 68 (+126%)
> Valley (ultra): 18 -> 39 (+100%)
> Talos (low): 32 -> 50 (+56%)
> Talos (ultra): 7 -> 14 (+100%)
> Shadow of Mordor (lowest): 13 -> 20 (+53%)
>
> That's it! I think it's enough to understand the power of Maxwell control
> codes.
> We may get additional numbers from Phoronix (wink, wink, Michael).
> As I said in the main patch, the control codes can be disabled with
> 'export NV50_PROG_SCHED=0'.
>
> Now, let's have a look at how nouveau performs compared to NVIDIA's blob.
>
> FurMark: 42 -> 59 (+40%)
> Pixmark Piano: 7 -> 13 (+85%)
> Pixmark Volplosion: 20 -> 42 (+110%)
> Julia F32: 219 -> 351 (+60%)
> LightMarks: 685 -> 1192 (+74%)
> Heaven (low): 102 -> 144 (+41%)
> Heaven (ultra): 27 -> 46 (+70%)
> Valley (low): 68 -> 94 (+38%)
> Valley (ultra): 39 -> 60 (+53%)
> Talos (low): 50 -> 128 (+156%)
> Talos (ultra): 14 -> 30 (+114%)
> Shadow of Mordor (lowest): 20 -> 77 (+285%)
>
> Nouveau is still far away from the blob, but now I think Maxwell is actually
> in roughly the same shape as Kepler in terms of performance and features.
> Speaking of which, I will enable OpenGL 4.3 on Maxwell in a separate patch
> later on.
>
> The overhead at compile time added by this series is rather small. For a full
> shader-db run with my private repository of shaders, it takes approximately
> 208s to compile 25k shaders before the series and approximately 211s after.
> That is less than 2% of overhead, and it's comparable to a full shader-db run
> on Kepler.
>
> No regressions with either piglit or dEQP (tested multiple times), and all
> benchmarks/games I have tried render fine and seem to be quite stable.
>
> Due to a lack of time, some parts are still left to do and some others could
> be improved. With the following ideas implemented I'm pretty sure we can
> improve performance significantly.
>
> * Add support for the yield flag. This seems to be a hint to the hardware for
>   improving how the work is balanced between the warps. I didn't figure out
>   how and where to use it without breaking a bunch of things. Need time and
>   patience.
>
> * Add support for dual-issue. The rules are pretty different from Kepler's,
>   especially because of the dependency barriers.
>   Note that the yield flag has to be set, otherwise the hardware won't
>   dual-issue and in fact it will wait for all dependencies (i.e. st 0x0),
>   which is really different from what you are looking for.
>
> * Reduce stall counts. A bunch of instructions have a read latency, which is
>   the number of cycles before they can actually read the sources. This should
>   be fairly easy to implement but will require some reverse engineering to
>   completely understand the idea.
>
> This is my last contribution to the Nouveau driver for a while because I have
> been hired by Valve to work on radeonsi. Do not expect such perf improvements
> with radeonsi because it already performs really well, unlike Nouveau. But
> with time and patience we can do better. :-)
>
> This series is also available from my fdo account:
> https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=gm107_scheduler
>
> Please, review!
> Thanks.
>
> [1] https://github.com/NervanaSystems/maxas/wiki/Control-Codes
>
> Samuel Pitoiset (5):
>   nv50/ir: do not insert texture barriers on gm107
>   nv50/ir: improve instruction pipelining on gm107
>   nv50/ir: use sched control codes for gm107 builtins
>   nvc0: use sched control codes for gm107 blitter shader
>   nvc0: use sched control codes for gm107 MP counters code
>
>  src/gallium/drivers/nouveau/codegen/lib/gm107.asm  |  40 +-
>  .../drivers/nouveau/codegen/lib/gm107.asm.h        |  40 +-
>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 771 ++++++++++++++++++++-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   3 +-
>  .../nouveau/codegen/nv50_ir_target_gm107.cpp       | 253 +++++++
>  .../drivers/nouveau/codegen/nv50_ir_target_gm107.h |   7 +
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c        |  88 +--
>  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c    |  20 +-
>  8 files changed, 1127 insertions(+), 95 deletions(-)
>
> --
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
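
For readers who have not looked at the control codes before, here is a rough sketch of how the fields mentioned in the quoted message (stall count, yield flag, dependency barriers) could be packed, with three control codes sharing one 64-bit control instruction. The field widths and bit positions follow my reading of the documentation cited as [1] and should be treated as assumptions; the names ControlCode, NO_BARRIER and pack_sched are made up for this illustration and are not taken from the nv50/ir code. It only models the bit layout, not how the hardware interprets any particular value.

```python
from dataclasses import dataclass

# Assumption: barrier index 7 is used to mean "no barrier is set".
NO_BARRIER = 7


@dataclass
class ControlCode:
    """One 21-bit control code for a single instruction (assumed layout)."""
    stall: int = 0                # bits 0-3: stall count, 0..15 cycles
    yield_flag: int = 0           # bit 4: yield hint
    write_bar: int = NO_BARRIER   # bits 5-7: barrier produced on write
    read_bar: int = NO_BARRIER    # bits 8-10: barrier produced on read
    wait_mask: int = 0            # bits 11-16: one bit per barrier to wait on
    reuse: int = 0                # bits 17-20: operand reuse flags

    def encode(self) -> int:
        fields = (
            (self.stall, 4, 0),
            (self.yield_flag, 1, 4),
            (self.write_bar, 3, 5),
            (self.read_bar, 3, 8),
            (self.wait_mask, 6, 11),
            (self.reuse, 4, 17),
        )
        code = 0
        for value, width, shift in fields:
            # Reject values that do not fit in their field.
            if not 0 <= value < (1 << width):
                raise ValueError(f"value {value} does not fit in {width} bits")
            code |= value << shift
        return code


def pack_sched(codes):
    """Pack the control codes of 3 instructions into one 64-bit word."""
    if len(codes) != 3:
        raise ValueError("a control instruction covers exactly 3 instructions")
    word = 0
    for slot, cc in enumerate(codes):
        word |= cc.encode() << (21 * slot)
    return word


if __name__ == "__main__":
    # Hypothetical producer/consumer pair: the first instruction sets
    # barrier 0 when its result is written, the second waits on barrier 0.
    producer = ControlCode(stall=1, write_bar=0)
    consumer = ControlCode(stall=1, wait_mask=0b000001)
    filler = ControlCode(stall=1)
    print(hex(pack_sched([producer, consumer, filler])))
```

The point of the sketch is only that a consumer waits on exactly the barriers it needs instead of on everything, which is what the series exploits.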