Hi,

Recently I discovered that our GLSL compiler spends a lot of time in rzalloc_size, so I looked at possible options for optimizing that. It's worth noting that having too many live allocations slows down subsequent malloc calls, which in turn slows down the GLSL compiler. When I kept 5 instances of LLVMContext alive between compilations (I wanted to reuse them), the GLSL compiler got measurably slower. That shows that GLSL compiler performance depends too much on the size and complexity of the heap.
So I decided to write my own linear allocator and compared it against jemalloc preloaded via LD_PRELOAD, and against jemalloc linked statically and used by ralloc only. The test was shader-db with AMD's shader collection. The command line was:

time GALLIUM_NOOP=1 shader-db/run shaders

The noop driver ensures the compilation process ends with TGSI.

Default Mesa:
real 0m58.343s
user 3m48.828s
sys 0m0.760s

Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
real 0m48.550s (17% less time)
user 3m9.544s
sys 0m1.700s

Ralloc using _mesa_je_{calloc,realloc,free}, with Mesa linked against my libmesa_jemalloc_pic.a:
real 0m49.580s (15% less time)
user 3m14.452s
sys 0m0.996s

Ralloc using my own linear allocator, which allocates out of 32KB buffers for allocations of 512 bytes and smaller:
real 0m46.521s (20% less time)
user 3m1.304s
sys 0m1.740s

Now let's test complete compilation down to GCN bytecode:

Default Mesa:
real 1m57.634s
user 7m41.692s
sys 0m1.824s

Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
real 1m42.604s (13% less time)
user 6m39.776s
sys 0m3.828s

Ralloc using _mesa_je_{calloc,realloc,free}, with Mesa linked against my libmesa_jemalloc_pic.a:
real 1m44.413s (11% less time)
user 6m48.808s
sys 0m2.480s

Ralloc using my own linear allocator:
real 1m40.486s (14.6% less time)
user 6m34.456s
sys 0m2.224s

The linear allocator I wrote has very high memory usage, because a 32KB block can't be freed while it still contains at least one live allocation. A workaround would be to do a realloc() when changing a ralloc parent in order to "defragment" the memory, but that's more involved.

I don't know much about glibc, but it's hard to believe that glibc people have been purposely ignoring jemalloc for so long. There must be some anti-performance politics going on, but enough speculation.

If we don't care about memory usage, let's use my allocator. If we do, let's import jemalloc into the Mesa tree and use it for ralloc. That "11% less time" spent in the shader compiler (which includes LLVM) would be nice to have.

Opinions?

Marek
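
P.S. For those curious how the linear allocator works, here is a minimal sketch of the scheme described above. It is illustrative only, not the actual patch: all names are made up, and error handling, locking, and the ralloc integration are omitted.

#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE (32 * 1024)   /* each slab is 32KB */
#define SMALL_MAX  512           /* larger requests bypass the slabs */

struct block {
   size_t offset;   /* bump pointer into data[] */
   size_t live;     /* allocations in this block that are still alive */
   char   data[];
};

/* every allocation is preceded by a pointer back to its block
 * (NULL for large allocations that came straight from malloc) */
struct header {
   struct block *block;
};

static struct block *cur;

void *lin_alloc(size_t size)
{
   /* round the bump increment so headers stay 8-byte aligned; good
    * enough for a sketch (real code would guarantee 16) */
   size_t total = (sizeof(struct header) + size + 7) & ~(size_t)7;
   struct header *hdr;

   if (size > SMALL_MAX) {
      /* large allocation: plain malloc with a NULL block marker */
      hdr = malloc(sizeof(struct header) + size);
      if (!hdr)
         return NULL;
      hdr->block = NULL;
      return hdr + 1;
   }

   if (!cur || cur->offset + total > BLOCK_SIZE - sizeof(struct block)) {
      struct block *b = malloc(BLOCK_SIZE);
      if (!b)
         return NULL;
      b->offset = 0;
      b->live = 0;
      /* the full block is reclaimed now only if nothing in it is
       * alive; otherwise it lingers until its last allocation dies,
       * which is where the high memory usage comes from */
      if (cur && cur->live == 0)
         free(cur);
      cur = b;
   }

   hdr = (struct header *)(cur->data + cur->offset);
   cur->offset += total;
   cur->live++;
   hdr->block = cur;
   return hdr + 1;
}

void lin_free(void *ptr)
{
   struct header *hdr;

   if (!ptr)
      return;

   hdr = (struct header *)ptr - 1;
   if (!hdr->block) {
      free(hdr);   /* was a plain malloc */
      return;
   }

   /* a 32KB block can only go away once its last allocation is gone */
   if (--hdr->block->live == 0 && hdr->block != cur)
      free(hdr->block);
}

Allocation is just a bump of cur->offset, which is why it beats a general-purpose malloc, and freeing a small allocation never returns memory to the OS until the whole block drains.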
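
The jemalloc variant needs no allocator code on our side at all: ralloc's three entry points into the C runtime are simply redirected to the statically linked jemalloc. The _mesa_je_ names are what you would get from building jemalloc with --with-jemalloc-prefix=_mesa_je_; the redirection below is illustrative, not necessarily the exact setup:

void *_mesa_je_calloc(size_t n, size_t size);
void *_mesa_je_realloc(void *ptr, size_t size);
void  _mesa_je_free(void *ptr);

/* in ralloc.c, before any use of the libc names: */
#define calloc(n, sz)  _mesa_je_calloc(n, sz)
#define realloc(p, sz) _mesa_je_realloc(p, sz)
#define free(p)        _mesa_je_free(p)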