With the shader cache, compilation time matters less. As a result, we can afford to run more optimization passes and produce better code.

total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs    : 481460 -> 481680 (0.05%)
total local used in shared programs   : 27481 -> 26761 (-2.62%)
total bytes used in shared programs   : 36032672 -> 35902648 (-0.36%)

                local        gpr       inst      bytes
    helped         48        133       3843       3843
      hurt          1        295         75         75

Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0de84fe9fc..505de08573 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3729,12 +3729,17 @@ Program::optimizeSSA(int level)
    RUN_PASS(1, CopyPropagation, run);
    RUN_PASS(1, MergeSplits, run);
    RUN_PASS(2, GlobalCSE, run);
-   RUN_PASS(1, LocalCSE, run);
-   RUN_PASS(2, AlgebraicOpt, run);
-   RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
-   RUN_PASS(1, ConstantFolding, foldAll);
-   RUN_PASS(2, LateAlgebraicOpt, run);
-   RUN_PASS(1, Split64BitOpPreRA, run);
+   for (int i = 0; i < 2; ++i) {
+      RUN_PASS(1, LocalCSE, run);
+      RUN_PASS(2, AlgebraicOpt, run);
+      RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
+      RUN_PASS(1, ConstantFolding, foldAll);
+      RUN_PASS(2, LateAlgebraicOpt, run);
+      // only once
+      if (i == 0)
+         RUN_PASS(1, Split64BitOpPreRA, run);
+      RUN_PASS(1, DeadCodeElim, buryAll);
+   }
    RUN_PASS(1, LoadPropagation, run);
    RUN_PASS(1, IndirectPropagation, run);
    RUN_PASS(2, MemoryOpt, run);
-- 
2.12.2
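
For illustration only, and not part of the patch: a minimal sketch of why running the same group of passes a second time can pay off. Everything below is hypothetical (the toy IR, the pass functions and the optimize() helper); none of it is nv50_ir code, and it assumes every operand refers to an earlier instruction. The point is only that one pass can create work for a pass that already ran earlier in the same round.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: operands are indices of earlier instructions.
enum class Op { Input, Imm, Add, Sub };

struct Inst {
   Op op;
   int src0;   // index of first operand, -1 if none
   int src1;   // index of second operand, -1 if none
   int imm;    // value, only meaningful for Op::Imm
   bool live;  // cleared by dead code elimination
};

using Program = std::vector<Inst>;

static void makeImm(Inst &i, int value)
{
   i.op = Op::Imm;
   i.src0 = i.src1 = -1;
   i.imm = value;
}

// Folds Add/Sub when both operands are already immediates.
void constantFolding(Program &p)
{
   for (Inst &i : p) {
      if (!i.live || (i.op != Op::Add && i.op != Op::Sub))
         continue;
      const Inst &a = p[i.src0];
      const Inst &b = p[i.src1];
      if (a.op != Op::Imm || b.op != Op::Imm)
         continue;
      const int value = (i.op == Op::Add) ? a.imm + b.imm : a.imm - b.imm;
      makeImm(i, value);
   }
}

// Rewrites "x - x" into the immediate 0.
void algebraicOpt(Program &p)
{
   for (Inst &i : p) {
      if (i.live && i.op == Op::Sub && i.src0 == i.src1)
         makeImm(i, 0);
   }
}

// Marks everything not reachable from the root instruction as dead.
void deadCodeElim(Program &p, int root)
{
   assert(root >= 0 && root < static_cast<int>(p.size()));
   std::vector<bool> used(p.size(), false);
   used[root] = true;
   for (int i = static_cast<int>(p.size()) - 1; i >= 0; --i) {
      if (!used[i]) {
         p[i].live = false;
         continue;
      }
      if (p[i].src0 >= 0) used[p[i].src0] = true;
      if (p[i].src1 >= 0) used[p[i].src1] = true;
   }
}

// Runs the pass group a fixed number of times, like the loop in the patch.
void optimize(Program &p, int root, int rounds)
{
   for (int r = 0; r < rounds; ++r) {
      constantFolding(p);
      algebraicOpt(p);
      deadCodeElim(p, root);
   }
}

std::size_t countLive(const Program &p)
{
   std::size_t n = 0;
   for (const Inst &i : p)
      n += i.live ? 1 : 0;
   return n;
}

// %0 = input; %1 = sub %0, %0; %2 = imm 5; %3 = add %1, %2 (root is %3)
Program makeExample()
{
   return Program{
      {Op::Input, -1, -1, 0, true},
      {Op::Sub,    0,  0, 0, true},
      {Op::Imm,   -1, -1, 5, true},
      {Op::Add,    1,  2, 0, true},
   };
}
```

With one round, optimize(p, 3, 1) leaves three live instructions: constant folding runs before the algebraic pass has turned %1 into an immediate, so %3 cannot be folded yet. With two rounds, %3 becomes an immediate 5 and dead code elimination leaves it as the only live instruction.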