glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf

Vladislav Egorov Sun, 01 Jan 2017 02:56:12 -0800


On 01.01.2017 06:41, Kenneth Graunke wrote:

On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:

From: Marek Olšák <marek.ol...@amd.com>

This reduces compile times by 4.5% with the Gallium noop driver and
gl_constants::GLSLOptimizeConservatively == true.

Compile times of...what exactly?  Do you have any statistics for this
by itself?

Assuming we add your helper, this patch looks reasonable.
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>

BTW, I suspect you could get some additional speed up by changing

    parser->output = ralloc_strdup(parser, "");

to something like:

    parser->output = ralloc_size(parser, strlen(orig_concatenated_src) + 1);
    parser->output[0] = '\0';

to try and avoid reallocations.  rewrite_tail will realloc just enough
space every time it allocates, which means once you reallocate, you're
going to be calling realloc on every single token.  Yuck!

ralloc/talloc's string libraries were never meant for serious string
processing like the preprocessor does.  They're meant for convenience
when constructing debug messages which don't need to be that efficient.

Perhaps a better approach would be to have the preprocessor do this
itself.  Just ralloc_size() output and initialize the null byte.
reralloc to double the size if you need more space.  At the end of
preprocessing, reralloc to output_length to free any waste
from doubling.
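The approach described above (allocate once, initialize the null byte, double the capacity when more space is needed, shrink to fit at the end) can be sketched as follows. This is a minimal illustration, not Mesa's code: plain malloc()/realloc() stand in for ralloc_size()/reralloc(), and struct string_buffer, sb_init(), sb_append() and sb_finish() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; realloc() stands in for reralloc(). */
struct string_buffer {
    char *buf;            /* always NUL-terminated */
    size_t length;        /* bytes used, excluding the NUL */
    size_t capacity;      /* bytes allocated, including the NUL */
    unsigned grow_count;  /* growth reallocs so far, for illustration only */
};

static bool sb_init(struct string_buffer *sb, size_t capacity_hint)
{
    sb->capacity = capacity_hint < 16 ? 16 : capacity_hint;
    sb->buf = malloc(sb->capacity);
    if (!sb->buf)
        return false;
    sb->buf[0] = '\0';    /* "initialize the null byte" */
    sb->length = 0;
    sb->grow_count = 0;
    return true;
}

static bool sb_append(struct string_buffer *sb, const char *s, size_t n)
{
    assert(sb->buf != NULL);
    size_t needed = sb->length + n + 1;
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity;
        while (new_cap < needed)
            new_cap *= 2;            /* double until the new text fits */
        char *p = realloc(sb->buf, new_cap);
        if (!p)
            return false;            /* old buffer is still valid */
        sb->buf = p;
        sb->capacity = new_cap;
        sb->grow_count++;
    }
    memcpy(sb->buf + sb->length, s, n);
    sb->length += n;
    sb->buf[sb->length] = '\0';
    return true;
}

/* Shrink to length + 1 to release the slack left over from doubling. */
static void sb_finish(struct string_buffer *sb)
{
    char *p = realloc(sb->buf, sb->length + 1);
    if (p) {                         /* on failure, keep the larger block */
        sb->buf = p;
        sb->capacity = sb->length + 1;
    }
}
```

For example, appending "ab" 1000 times to a buffer that starts at 16 bytes triggers only 7 growth reallocs (16, 32, 64, ..., 2048) instead of one per append, and sb_finish() then trims the block to 2001 bytes.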

I suspect that would be a *lot* more efficient, and is probably what
we should have done in the first place...

I have a similar patch (I may need 1-2 days to clean it up), and I've tested both variants. A string in an exponentially growing (by +50%) string buffer works better, but not *THAT* much better; I expected more. It seems that in a sequence like str = realloc(str, 1001); str = realloc(str, 1002); str = realloc(str, 1003), etc., most of the reallocs will be non-moving in both glibc's allocator and jemalloc. For example, jemalloc has size classes that already grow exponentially by 15-25%: ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, ..., 4M, 5M, ... realloc will just test whether the requested size belongs to the same size class and do nothing. Reallocs within the same size class will always be non-moving and almost free.

Overall, avoiding formatted printing (DOUBLE formatted printing, which is entirely avoidable too) gives the single largest boost to the preprocessor.
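To illustrate why growing a string by one byte per realloc is cheaper than it looks, here is a toy model of the size classes quoted above. It is an assumption, not jemalloc's actual code: each power-of-two range is split into four equal steps (which reproduces 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, ...), small size classes are ignored, and model_size_class() and model_class_changes() are hypothetical names. Real behaviour depends on the allocator and its version.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the large size classes listed above (not jemalloc's code):
 * each range [2^k, 2^(k+1)] is split into four equal steps.
 * Only meaningful for sizes >= 4096. */
static size_t model_size_class(size_t n)
{
    assert(n >= 4096);
    size_t p = 4096;
    while (p * 2 <= n)                    /* largest power of two <= n */
        p *= 2;
    size_t step = p / 4;
    return (n + step - 1) / step * step;  /* round up to the next step */
}

/* Count how often growing a buffer one byte at a time, from `from` to `to`
 * bytes, crosses into a new size class. In this model, only those crossings
 * can require realloc() to move the block; every other call stays within
 * the current class. */
static unsigned model_class_changes(size_t from, size_t to)
{
    unsigned changes = 0;
    size_t prev = model_size_class(from);
    for (size_t n = from + 1; n <= to; n++) {
        size_t cls = model_size_class(n);
        if (cls != prev) {
            changes++;
            prev = cls;
        }
    }
    return changes;
}
```

Under this model, growing from 4 KiB to 1 MiB one byte at a time issues 1,044,480 realloc calls but crosses only 32 size-class boundaries (four per doubling, over eight doublings).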

Benchmark on my shader-db (glcpp and shader-db's run smashed together to do only preprocessing). Note that I used the old jemalloc from Ubuntu 16.04, which can be important, because jemalloc has changed its size class strategy since then.

perf stat --repeat 10
master                    8.91s
master+jemalloc           8.60s
Marek's patch             5.50s
Marek's patch+jemalloc    5.03s
my string_buffer          4.57s
my string_buffer+jemalloc 4.43s
my series                 3.83s
my series+jemalloc        3.68s

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
