Results for single-threaded shader-db (using shaders from one game only) including LLVM compilation:

Default:
real    0m59.606s
user    0m59.488s
sys     0m0.104s

Only ralloc is using jemalloc:
real    0m55.284s (7.2% less time)
user    0m55.032s
sys     0m0.244s

Ralloc is using my linear allocator:
real    0m53.418s (10.4% less time)
user    0m53.200s
sys     0m0.208s

Marek

On Tue, Aug 30, 2016 at 11:51 AM, Marek Olšák <mar...@gmail.com> wrote:
> Hi,
>
> Recently I discovered that our GLSL compiler spends a lot of time in
> rzalloc_size, so I looked at possible options to optimize that. It's
> worth noting that too many existing allocations slow down subsequent
> malloc calls, which in turn slows down the GLSL compiler. When I kept
> 5 instances of LLVMContext alive between compilations (I wanted to
> reuse them), the GLSL compiler slowed down. That shows that the GLSL
> compiler performance is too dependent on the size and complexity of
> the heap.
>
> So I decided to write my own linear allocator and then compared it
> with jemalloc preloaded by LD, and jemalloc linked statically and used
> by ralloc only.
>
> The test was shader-db using AMD's shader collection. The command line was:
> time GALLIUM_NOOP=1 shader-db/run shaders
> The noop driver ensures the compilation process ends with TGSI.
>
>
> Default Mesa:
> real    0m58.343s
> user    3m48.828s
> sys     0m0.760s
>
> Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> real    0m48.550s (17% less time)
> user    3m9.544s
> sys     0m1.700s
>
> Ralloc using _mesa_je_{calloc, realloc, free} and Mesa links against
> my libmesa_jemalloc_pic.a:
> real    0m49.580s (15% less time)
> user    3m14.452s
> sys     0m0.996s
>
> Ralloc using my own linear allocator that allocates out of 32KB
> buffers for 512b and smaller allocations:
> real    0m46.521s (20% less time)
> user    3m1.304s
> sys     0m1.740s
>
>
> Now let's test complete compilation down to GCN bytecode:
>
> Default Mesa:
> real    1m57.634s
> user    7m41.692s
> sys     0m1.824s
>
> Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> real    1m42.604s (13% less time)
> user    6m39.776s
> sys     0m3.828s
>
> Ralloc using _mesa_je_{calloc, realloc, free} and Mesa links against
> my libmesa_jemalloc_pic.a:
> real    1m44.413s (11% less time)
> user    6m48.808s
> sys     0m2.480s
>
> Ralloc using my own linear allocator:
> real    1m40.486s (14.6% less time)
> user    6m34.456s
> sys     0m2.224s
>
>
> The linear allocator that I wrote has a very high memory usage due to
> the inability to free 32KB blocks if those blocks have at least one
> living allocation. The workaround would be to do realloc() when
> changing a ralloc parent in order to "defragment" the memory, but
> that's more involved.
>
> I don't know much about glibc, but it's hard to believe that glibc
> people have been purposely ignoring jemalloc for so long. There must
> be some anti-performance politics going on, but enough of
> speculations.
>
> If we don't care about memory usage, let's use my allocator. If we do,
> let's import jemalloc into the Mesa tree and use it for ralloc. That
> "11% less time" spent in the shader compiler (which includes LLVM)
> would be nice to have.
>
> Opinions?
>
> Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
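
For readers who want to picture the trade-off described in the mail, here is a minimal sketch of a linear (bump) allocator that serves small requests out of 32KB blocks and falls back to malloc for anything larger. It is not the patch discussed above. Every name in it (lin_ctx, lin_alloc, lin_free, lin_ctx_finish), the per-allocation header, the alignment choice, and the reading of "512b" as 512 bytes are assumptions made for illustration. What it does show is the memory-usage issue from the mail: a block can only be released once its last living allocation is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sizes taken from the mail; everything else here is a hypothetical sketch. */
#define LIN_BLOCK_SIZE ((size_t)32 * 1024)   /* 32KB buffers */
#define LIN_MAX_ALLOC  ((size_t)512)         /* larger requests use malloc */
#define LIN_ALIGN      (_Alignof(max_align_t))
#define LIN_ALIGN_UP(n) (((n) + LIN_ALIGN - 1) & ~(size_t)(LIN_ALIGN - 1))

struct lin_block {
   unsigned live;    /* allocations in this block that are still alive */
   size_t offset;    /* bump pointer: next free byte within the block */
};

struct lin_header {          /* stored in front of every allocation */
   struct lin_block *block;  /* NULL means the allocation came from malloc */
};

struct lin_ctx {
   struct lin_block *cur;    /* block currently being filled */
};

#define LIN_BLOCK_HDR_SIZE LIN_ALIGN_UP(sizeof(struct lin_block))
#define LIN_HEADER_SIZE    LIN_ALIGN_UP(sizeof(struct lin_header))

static void *lin_alloc(struct lin_ctx *ctx, size_t size)
{
   struct lin_header *h;

   if (size > LIN_MAX_ALLOC) {
      /* Large request: bypass the linear blocks entirely. */
      h = malloc(LIN_HEADER_SIZE + size);
      if (!h)
         return NULL;
      h->block = NULL;
      return (char *)h + LIN_HEADER_SIZE;
   }

   size_t need = LIN_HEADER_SIZE + LIN_ALIGN_UP(size);

   if (!ctx->cur || ctx->cur->offset + need > LIN_BLOCK_SIZE) {
      /* Start a new block. The old one stays alive until its last
       * allocation is freed, which is the memory cost noted in the mail. */
      struct lin_block *b = malloc(LIN_BLOCK_SIZE);
      if (!b)
         return NULL;
      b->live = 0;
      b->offset = LIN_BLOCK_HDR_SIZE;
      ctx->cur = b;
   }

   h = (struct lin_header *)((char *)ctx->cur + ctx->cur->offset);
   h->block = ctx->cur;
   ctx->cur->offset += need;
   ctx->cur->live++;
   return (char *)h + LIN_HEADER_SIZE;
}

static void lin_free(struct lin_ctx *ctx, void *ptr)
{
   if (!ptr)
      return;

   struct lin_header *h =
      (struct lin_header *)((char *)ptr - LIN_HEADER_SIZE);
   struct lin_block *b = h->block;

   if (!b) {                 /* malloc-backed allocation */
      free(h);
      return;
   }

   assert(b->live > 0);
   if (--b->live == 0) {
      if (b == ctx->cur)
         b->offset = LIN_BLOCK_HDR_SIZE;  /* empty current block: reuse it */
      else
         free(b);                         /* retired block: release it */
   }
}

/* Release the current block once nothing in it is alive any more. */
static void lin_ctx_finish(struct lin_ctx *ctx)
{
   if (ctx->cur && ctx->cur->live == 0)
      free(ctx->cur);
   ctx->cur = NULL;
}
```

A caller would zero-initialize a struct lin_ctx, call lin_alloc() and lin_free() in place of malloc() and free(), and call lin_ctx_finish() when done. Individual frees inside a block cost almost nothing, but a single long-lived 16-byte allocation keeps its whole 32KB block resident, which is why the mail suggests copying allocations (the realloc() on reparenting idea) as a way to "defragment".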