On Jan 1, 2017 11:55 AM, "Vladislav Egorov" <vegorov...@gmail.com> wrote:
> 01.01.2017 06:41, Kenneth Graunke wrote:
>> On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:
>>> From: Marek Olšák <marek.ol...@amd.com>
>>>
>>> This reduces compile times by 4.5% with the Gallium noop driver and
>>> gl_constants::GLSLOptimizeConservatively == true.
>>
>> Compile times of...what exactly?  Do you have any statistics for this
>> by itself?
>>
>> Assuming we add your helper, this patch looks reasonable.
>> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
>>
>> BTW, I suspect you could get some additional speed up by changing
>>
>>    parser->output = ralloc_strdup(parser, "");
>>
>> to something like:
>>
>>    parser->output = ralloc_size(parser, strlen(orig_concatenated_src));
>>    parser->output[0] = '\0';
>>
>> to try and avoid reallocations.  rewrite_tail will realloc just enough
>> space every time it allocates, which means once you reallocate, you're
>> going to be calling realloc on every single token.  Yuck!
>>
>> ralloc/talloc's string libraries were never meant for serious string
>> processing like the preprocessor does.  They're meant for convenience
>> when constructing debug messages which don't need to be that efficient.
>>
>> Perhaps a better approach would be to have the preprocessor do this
>> itself.  Just ralloc_size() output and initialize the null byte.
>> reralloc to double the size if you need more space.  At the end of
>> preprocessing, reralloc to output_length to free any waste from
>> doubling.
>>
>> I suspect that would be a *lot* more efficient, and is probably what
>> we should have done in the first place...
>
> I have a similar patch (I may need 1-2 days to clean it up), and I've
> tested both variants. A string in an exponentially growing (by +50%)
> string buffer works better, but not *THAT* much better than I expected.
> It seems that in a sequence like
>
>    str = realloc(str, 1001);
>    str = realloc(str, 1002);
>    str = realloc(str, 1003);
>
> etc., most of the reallocs will be non-moving in both glibc's allocator
> and jemalloc. For example, jemalloc has size classes that already grow
> exponentially by 15-25%: ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K,
> 20K, 24K, ..., 4M, 5M, ... realloc will just test whether the requested
> size belongs to the same size class and do nothing. Reallocs inside the
> same size class will always be non-moving and almost free.
>
> Overall, avoiding formatted printing (DOUBLE formatted printing, which
> is entirely avoidable too) gives the single largest boost to the
> preprocessor.
>
> Benchmark on my shader-db (glcpp and shader-db's run smashed together
> to do only preprocessing). Note that I used the old jemalloc from
> Ubuntu 16.04, which may matter, because jemalloc has changed its size
> class strategy since then.
>
> perf stat --repeat 10
>
>    master                      8.91s
>    master+jemalloc             8.60s
>    Marek's patch               5.50s
>    Marek's patch+jemalloc      5.03s
>    my string_buffer            4.57s
>    my string_buffer+jemalloc   4.43s
>    my series                   3.83s
>    my series+jemalloc          3.68s

Since you are further than me, let's merge your work instead.

Marek
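
For reference, the growing-buffer idea from this thread (allocate once up front, grow geometrically when space runs out, trim once at the end) might look roughly like the sketch below. It is only an illustration under some assumptions: it uses plain malloc/realloc instead of ralloc, it grows by about +50% as in the measurements quoted in the thread, and the names strbuf, strbuf_init, strbuf_append and strbuf_finish are hypothetical. This is not the string_buffer patch mentioned in the thread, and it does not guard against size_t overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; not Mesa's actual code. */
struct strbuf {
   char *buf;        /* NUL-terminated contents */
   size_t length;    /* bytes used, excluding the NUL */
   size_t capacity;  /* bytes allocated, including room for the NUL */
};

/* Allocate `initial` bytes up front (at least 1, for the NUL).
 * Returns 0 on success, -1 on allocation failure. */
static int strbuf_init(struct strbuf *sb, size_t initial)
{
   assert(sb != NULL);
   if (initial == 0)
      initial = 1;
   sb->buf = malloc(initial);
   if (sb->buf == NULL)
      return -1;
   sb->buf[0] = '\0';
   sb->length = 0;
   sb->capacity = initial;
   return 0;
}

/* Append n bytes of s. Capacity grows by roughly +50% when needed, so
 * realloc is called O(log n) times rather than once per token. */
static int strbuf_append(struct strbuf *sb, const char *s, size_t n)
{
   assert(sb != NULL && sb->buf != NULL);
   size_t needed = sb->length + n + 1;
   if (needed > sb->capacity) {
      size_t new_cap = sb->capacity + sb->capacity / 2 + 1;
      if (new_cap < needed)
         new_cap = needed;
      char *p = realloc(sb->buf, new_cap);
      if (p == NULL)
         return -1;   /* old buffer is still valid */
      sb->buf = p;
      sb->capacity = new_cap;
   }
   memcpy(sb->buf + sb->length, s, n);
   sb->length += n;
   sb->buf[sb->length] = '\0';
   return 0;
}

/* Trim the waste left by geometric growth and hand the string to the
 * caller, who must free() it. The buffer is reset to empty. */
static char *strbuf_finish(struct strbuf *sb)
{
   assert(sb != NULL && sb->buf != NULL);
   char *p = realloc(sb->buf, sb->length + 1);
   if (p == NULL)
      p = sb->buf;    /* shrinking failed; keep the larger block */
   sb->buf = NULL;
   sb->length = 0;
   sb->capacity = 0;
   return p;
}
```

In the preprocessor case, the initial capacity would presumably be something like the length of the concatenated source plus one, so that many inputs never trigger a realloc at all.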
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev