Hi Marek,

Series is
Tested-by: Edmondo Tommasina <edmondo.tommas...@gmail.com>

I've merged your series of patches on top of mesa git master and tested
on a Radeon RX 470. No regressions found.

OpenGL renderer string: Gallium 0.4 on AMD POLARIS10 (DRM 3.3.0 / 4.8.0-rc6, LLVM 3.9.0)
OpenGL core profile version string: 4.3 (Core Profile) Mesa 12.1.0-devel (git-e076df4)

Games tested:
* OpenMW
* The Witcher 2
* The Talos Principle
* Wasteland 2

Thanks and regards
edmondo

On Sat, Oct 8, 2016 at 12:58 PM, Marek Olšák <mar...@gmail.com> wrote:
> Hi,
>
> This patch series reduces the number of malloc calls in the GLSL
> compiler by 63%. That leads to better compile times and less heap
> thrashing.
>
> It's done by switching memory allocations in the GLSL compiler to my
> new linear allocator that allocates out of a fixed-sized buffer with
> a monotonically increasing offset. If more buffers are needed, it
> chains them.
>
> The new allocator is used in all places where short-lived allocations
> are used with a high number of malloc calls. The series also contains
> other improvements not related to the new allocator that also improve
> compile times. The results are below.
>
> I tested my shader-db with shaders only being compiled to TGSI.
> (noop gallium driver)
>
>
> master + libc's malloc:
>
> real    0m54.182s
> user    3m33.640s
> sys     0m0.620s
> maxmem  275 MB
>
>
> master + jemalloc preloaded:
>
> real    0m45.044s
> user    2m56.356s
> sys     0m1.652s
> maxmem  284 MB
>
>
> the series + libc's malloc:
>
> real    0m46.221s
> user    3m2.080s
> sys     0m0.544s
> maxmem  270 MB
>
>
> the series + jemalloc preloaded:
>
> real    0m40.729s
> user    2m39.564s
> sys     0m1.232s
> maxmem  284 MB
>
>
> The series without jemalloc almost caught up with jemalloc + master.
> However, jemalloc also benefits.
>
> Current Mesa needs 54.182s and it drops to 40.729s with my series and
> jemalloc. The total change in compile time is -25% if we incorporate
> both. Without jemalloc, the difference is only -14.7%.
>
> With radeonsi, the improvement is approx.
> slightly more than 1/2 of that
> (if you add the LLVM time). However, radeonsi also has asynchronous
> shader compilation hiding LLVM overhead in some cases, so it depends.
>
> Drivers with faster compiler backends will benefit more than radeonsi,
> but will probably not reach -25% or -14.7% (except softpipe, which uses
> TGSI as-is).
>
> The memory usage looks reasonable in all tested cases.
>
> Note: One of the first patches moves memset from ralloc to rzalloc.
> I tested and fixed the GLSL source -> TGSI path, but other codepaths
> may break, and you need to use valgrind to find all uninitialized
> variables that relied on ralloc doing memset (if there are any).
>
> You can also find it here:
> https://cgit.freedesktop.org/~mareko/mesa/log/?h=glsl-alloc-rework
>
> Please review.
>
> src/compiler/glsl/ast.h                             |   4 +-
> src/compiler/glsl/ast_to_hir.cpp                    |   4 +-
> src/compiler/glsl/ast_type.cpp                      |  13 ++-
> src/compiler/glsl/glcpp/glcpp-lex.l                 |   2 +-
> src/compiler/glsl/glcpp/glcpp-parse.y               | 203 +++++++++++++++++--------------------
> src/compiler/glsl/glcpp/glcpp.h                     |   1 +
> src/compiler/glsl/glsl_lexer.ll                     |  16 +--
> src/compiler/glsl/glsl_parser.yy                    | 202 +++++++++++++++++++------------------
> src/compiler/glsl/glsl_parser_extras.cpp            |   6 +-
> src/compiler/glsl/glsl_parser_extras.h              |   4 +-
> src/compiler/glsl/glsl_symbol_table.cpp             |  19 ++--
> src/compiler/glsl/glsl_symbol_table.h               |   1 +
> src/compiler/glsl/ir.cpp                            |   4 +
> src/compiler/glsl/ir.h                              |  13 ++-
> src/compiler/glsl/link_uniform_blocks.cpp           |   2 +-
> src/compiler/glsl/list.h                            |   2 +-
> src/compiler/glsl/lower_packed_varyings.cpp         |   8 +-
> src/compiler/glsl/opt_constant_propagation.cpp      |  14 ++-
> src/compiler/glsl/opt_copy_propagation.cpp          |   7 +-
> src/compiler/glsl/opt_copy_propagation_elements.cpp |  19 ++--
> src/compiler/glsl/opt_dead_code_local.cpp           |  12 ++-
> src/compiler/glsl_types.cpp                         |  38 +------
> src/compiler/glsl_types.h                           |   6 +-
> src/compiler/nir/nir.c                              |   8 +-
> src/compiler/spirv/vtn_variables.c                  |   3 +-
> src/gallium/drivers/freedreno/ir3/ir3.c             |   2 +-
> src/gallium/drivers/vc4/vc4_cl.c                    |   2 +-
> src/gallium/drivers/vc4/vc4_program.c               |   2 +-
> src/gallium/drivers/vc4/vc4_simulator.c             |   5 +-
> src/mesa/drivers/dri/i965/brw_state_batch.c         |   5 +-
> src/util/ralloc.c                                   | 392 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> src/util/ralloc.h                                   |  93 ++++++++++++++++--
> 32 files changed, 782 insertions(+), 330 deletions(-)
>
> Marek
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
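
For anyone skimming the thread, the allocator described in the quoted cover letter (a fixed-size buffer, a monotonically increasing offset, and chaining when more space is needed) can be pictured with the rough C sketch below. It is not the code from the series and not Mesa's ralloc API. Every name in it (lin_ctx, lin_buf, lin_alloc, lin_free_all, LIN_BUF_SIZE, LIN_ALIGN), the 2048-byte buffer size and the 8-byte alignment are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes; not taken from the patch series. */
#define LIN_BUF_SIZE 2048u  /* assumed size of one fixed-size buffer */
#define LIN_ALIGN    8u     /* assumed allocation granularity */

struct lin_buf {
    struct lin_buf *next;   /* older, already filled buffer in the chain */
    size_t offset;          /* bump offset; only ever grows */
    size_t capacity;        /* usable bytes in data[] */
    unsigned char data[];   /* storage handed out to callers */
};

struct lin_ctx {
    struct lin_buf *head;   /* buffer currently being filled */
    size_t num_bufs;        /* how many buffers have been chained */
};

static struct lin_buf *lin_buf_new(size_t capacity, struct lin_buf *next)
{
    struct lin_buf *buf = malloc(sizeof(*buf) + capacity);
    if (!buf)
        return NULL;
    buf->next = next;
    buf->offset = 0;
    buf->capacity = capacity;
    return buf;
}

/* Hand out `size` bytes by bumping the offset of the current buffer.
 * A new buffer is chained only when the current one cannot fit the
 * request, so most calls never reach malloc. */
static void *lin_alloc(struct lin_ctx *ctx, size_t size)
{
    assert(size > 0);
    /* Round up so every returned offset is a multiple of LIN_ALIGN. */
    size = (size + LIN_ALIGN - 1) & ~(size_t)(LIN_ALIGN - 1);

    struct lin_buf *buf = ctx->head;
    if (!buf || buf->capacity - buf->offset < size) {
        /* Oversized requests get a buffer of their own size. */
        size_t cap = size > LIN_BUF_SIZE ? size : LIN_BUF_SIZE;
        buf = lin_buf_new(cap, ctx->head);
        if (!buf)
            return NULL;
        ctx->head = buf;
        ctx->num_bufs++;
    }

    void *ptr = buf->data + buf->offset;
    buf->offset += size;
    return ptr;
}

/* Individual allocations are never freed; the whole chain is released
 * at once, which suits short-lived compiler data. */
static void lin_free_all(struct lin_ctx *ctx)
{
    struct lin_buf *buf = ctx->head;
    while (buf) {
        struct lin_buf *next = buf->next;
        free(buf);
        buf = next;
    }
    ctx->head = NULL;
    ctx->num_bufs = 0;
}
```

In this sketch, two 10-byte requests against a zero-initialized lin_ctx come out of the same buffer 16 bytes apart and cost a single malloc between them, which is the kind of saving the quoted malloc-count reduction refers to. The trade-off is that nothing can be freed individually, so a scheme like this only fits allocations that all die together.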