Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
01.01.2017 06:41, Kenneth Graunke пишет: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: From: Marek Olšák This reduces compile times by 4.5% with the Gallium noop driver and gl_constants::GLSLOptimizeConservatively == true. Compile times of...what exactly? Do you have any statistics for this by itself? Assuming we add your helper, this patch looks reasonable. Reviewed-by: Kenneth Graunke BTW, I suspect you could get some additional speed up by changing parser->output = ralloc_strdup(parser, ""); to something like: parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); parser->output[0] = '\0'; to try and avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means once you reallocate, you're going to be calling realloc on every single token. Yuck! ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient. Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length at the end of free any waste from doubling. I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place... I have similar patch (maybe need 1-2 days to clean it up), and I've tested both variants. String in exponentially growing (by +50%) string buffer works better, but not *THAT* much better as I expected. It seems that in the sequence of str = realloc(str, 1001); str = realloc(str, 1002); str = realloc(str, 1003), etc. most of reallocs will be non-moving in both glibc's allocator and jemalloc. For example, jemalloc have size classes that already grow exponentially by 15-25% - ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ... realloc will just test if the requested size belongs to the same size class and do nothing. Reallocs inside of the same size class will be always non-moving and almost free. Overall avoiding formatted printing (DOUBLE formatted printing, which is entirely avoidable too) gives the single largest boost to the pre-processor. Benchmark on my shader-db (glcpp and shader-db's run smashed together to do only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, which can be important, because jemalloc changed its size class strategy since then. perf stat --repeat 10 master8.91s master+jemalloc 8.60s Marek's patch 5.50s Marek's patch+jemalloc5.03s my string_buffer 4.57s my string_buffer+jemalloc 4.43s my series 3.83s my series+jemalloc3.68s ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Jan 1, 2017 4:41 AM, "Kenneth Graunke" wrote: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: > From: Marek Olšák > > This reduces compile times by 4.5% with the Gallium noop driver and > gl_constants::GLSLOptimizeConservatively == true. Compile times of...what exactly? Do you have any statistics for this by itself? I always use my shader-db with almost 30k shaders and I used the time command. Assuming we add your helper, this patch looks reasonable. Reviewed-by: Kenneth Graunke BTW, I suspect you could get some additional speed up by changing parser->output = ralloc_strdup(parser, ""); to something like: parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); parser->output[0] = '\0'; to try and avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means once you reallocate, you're going to be calling realloc on every single token. Yuck! ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient. Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length at the end of free any waste from doubling. I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place... Sure. However, realloc is a no-op when there is free space after the allocation. It's a useless call, but it doesn't look too bad in a profiler. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Jan 1, 2017 11:55 AM, "Vladislav Egorov" wrote: 01.01.2017 06:41, Kenneth Graunke пишет: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: > >> From: Marek Olšák >> >> This reduces compile times by 4.5% with the Gallium noop driver and >> gl_constants::GLSLOptimizeConservatively == true. >> > Compile times of...what exactly? Do you have any statistics for this > by itself? > > Assuming we add your helper, this patch looks reasonable. > Reviewed-by: Kenneth Graunke > > BTW, I suspect you could get some additional speed up by changing > > parser->output = ralloc_strdup(parser, ""); > > to something like: > > parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); > parser->output[0] = '\0'; > > to try and avoid reallocations. rewrite_tail will realloc just enough > space every time it allocates, which means once you reallocate, you're > going to be calling realloc on every single token. Yuck! > > ralloc/talloc's string libraries were never meant for serious string > processing like the preprocessor does. They're meant for convenience > when constructing debug messages which don't need to be that efficient. > > Perhaps a better approach would be to have the preprocessor do this > itself. Just ralloc_size() output and initialize the null byte. > reralloc to double the size if you need more space. At the end of > preprocessing, reralloc to output_length at the end of free any waste > from doubling. > > I suspect that would be a *lot* more efficient, and is probably what > we should have done in the first place... > I have similar patch (maybe need 1-2 days to clean it up), and I've tested both variants. String in exponentially growing (by +50%) string buffer works better, but not *THAT* much better as I expected. It seems that in the sequence of str = realloc(str, 1001); str = realloc(str, 1002); str = realloc(str, 1003), etc. most of reallocs will be non-moving in both glibc's allocator and jemalloc. For example, jemalloc have size classes that already grow exponentially by 15-25% - ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ... realloc will just test if the requested size belongs to the same size class and do nothing. Reallocs inside of the same size class will be always non-moving and almost free. Overall avoiding formatted printing (DOUBLE formatted printing, which is entirely avoidable too) gives the single largest boost to the pre-processor. Benchmark on my shader-db (glcpp and shader-db's run smashed together to do only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, which can be important, because jemalloc changed its size class strategy since then. perf stat --repeat 10 master8.91s master+jemalloc 8.60s Marek's patch 5.50s Marek's patch+jemalloc5.03s my string_buffer 4.57s my string_buffer+jemalloc 4.43s my series 3.83s my series+jemalloc3.68s Since you are further than me, let's merge your work instead. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.
https://bugs.freedesktop.org/show_bug.cgi?id=99240 Bug ID: 99240 Summary: glxinfo reports different extensions for wayland vs xorg. Product: Mesa Version: 13.0 Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: pandiculationfi...@gmail.com QA Contact: mesa-dev@lists.freedesktop.org Created attachment 128704 --> https://bugs.freedesktop.org/attachment.cgi?id=128704&action=edit wayland glxinfo output glxinfo reports different opengl extensions under wayland when compared to xorg. seems to possibly cause stellaris to fail to launch. I'm not sure if this is a wayland bug, or a mesa bug, maybe even a SDL bug? wayland stellaris errors: ==> system.log <== [08:53:21][gfx_opengl.cpp:3187]: Loading settings for adapter -1, fullscreen=1, multisample=15 [08:53:21][gfx_opengl.cpp:3257]: Checking for multi-sampling support: [08:53:21][gfx_opengl.cpp:3282]: MaxTextureWidth: 16384 [08:53:21][gfx_opengl.cpp:3283]: MaxTextureHeight: 16384 [08:53:21][graphicssettings.cpp:149]: 2 samples supported. [08:53:21][graphicssettings.cpp:149]: 4 samples supported. [08:53:21][graphicssettings.cpp:149]: 8 samples supported. [08:53:21][graphicssettings.cpp:180]: Support for Anisotropic filtering found. [08:53:21][graphicssettings.cpp:183]: Max Anisotropic filtering supported: 16 [08:53:21][graphicssettings.cpp:184]: Anisotropic filtering set to: 16 [08:53:21][graphicssettings.cpp:669]: Using multisampling: 4 [08:53:21][graphicssettings.cpp:669]: Using multisampling: 4 [08:53:21][graphics.cpp:762]: Creating device with: HARDWARE_VERTEXPROCESSING Fullscreen=no Resolution=0x0 ==> error.log <== [08:53:21][gfx_opengl.cpp:1066]: SDL_GetWindowFromID failed: Invalid window [08:53:21][gfx_opengl.cpp:1077]: SDL_GL_CreateContext failed: Invalid window [08:53:21][gfx_opengl.cpp:]: glewInit failed: Missing GL version ==> system.log <== [08:53:21][gfx_opengl.cpp:1028]: OpenGL Version: [08:53:21][graphics.cpp:797]: Done creating device. ==> error.log <== [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/text.shader(Text) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/text.shader(Text3D) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Up) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Down) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Disable) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Over) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(TextUp) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(TextDown) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(TextDisable) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(TextOver) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Up) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Down) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Disable) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Over) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Up) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Down) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Disable) [08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader gfx/FX/buttonstate_onlydisable.shader(Over) -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.
https://bugs.freedesktop.org/show_bug.cgi?id=99240 --- Comment #1 from pandiculationfi...@gmail.com --- Created attachment 128705 --> https://bugs.freedesktop.org/attachment.cgi?id=128705&action=edit xorg glxinfo output -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.
https://bugs.freedesktop.org/show_bug.cgi?id=99240 --- Comment #2 from pandiculationfi...@gmail.com --- diff of the first 50 or so lines of glxinfo. diff wayland-glxinfo.txt xorg-glxinfo.txt | head -50 7,8c7,9 < GLX_ARB_create_context, GLX_ARB_create_context_profile, < GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample, --- > GLX_ARB_create_context, GLX_ARB_create_context_profile, > GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, > GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample, 12,14c13,15 < GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, GLX_OML_swap_method, < GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, < GLX_SGIX_visual_select_group, GLX_SGI_make_current_read --- > GLX_EXT_visual_rating, GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer, > GLX_OML_swap_method, GLX_SGIS_multisample, GLX_SGIX_fbconfig, > GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group, GLX_SGI_swap_control 34,39c35,41 < GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, < GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_buffer_age, < GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile, < GLX_EXT_fbconfig_packed_float, GLX_EXT_framebuffer_sRGB, < GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, < GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, --- > GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, > GLX_ARB_framebuffer_sRGB, GLX_ARB_get_proc_address, GLX_ARB_multisample, > GLX_EXT_buffer_age, GLX_EXT_create_context_es2_profile, > GLX_EXT_create_context_es_profile, GLX_EXT_fbconfig_packed_float, > GLX_EXT_framebuffer_sRGB, GLX_EXT_import_context, > GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, GLX_EXT_visual_rating, > GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer, 44c46 < GLX_SGI_video_sync --- > GLX_SGI_swap_control, GLX_SGI_video_sync 50c52 < Video memory: 7994MB --- > Video memory: 8039MB 300c302 < 192 GLX Visuals --- > 480 GLX Visuals -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/14] glsl: Don't rehash the values when copying to new table
Really, we should have some kind of function for copying the whole table, but this will work for now. --- src/compiler/glsl/opt_copy_propagation.cpp | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/compiler/glsl/opt_copy_propagation.cpp b/src/compiler/glsl/opt_copy_propagation.cpp index 247c4988ed..e9f82e0644 100644 --- a/src/compiler/glsl/opt_copy_propagation.cpp +++ b/src/compiler/glsl/opt_copy_propagation.cpp @@ -210,7 +210,8 @@ ir_copy_propagation_visitor::handle_if_block(exec_list *instructions) /* Populate the initial acp with a copy of the original */ struct hash_entry *entry; hash_table_foreach(orig_acp, entry) { - _mesa_hash_table_insert(acp, entry->key, entry->data); + _mesa_hash_table_insert_pre_hashed(acp, entry->hash, + entry->key, entry->data); } visit_list_elements(this, instructions); @@ -259,7 +260,8 @@ ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp) if (keep_acp) { struct hash_entry *entry; hash_table_foreach(orig_acp, entry) { - _mesa_hash_table_insert(acp, entry->key, entry->data); + _mesa_hash_table_insert_pre_hashed(acp, entry->hash, +entry->key, entry->data); } } -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation
Hi all. This is a series I've put together to lower the overhead of the copy propagation pass in glsl. The series is not refined, so please treat it as a proof of concept / RFC. The first five patches might be merge-ready, but their benefit is quite low compared to the actual major change here. The series goes on to copy our existing hash table implementation, and modify it in steps to allow insertion of multiple data entries with the same key. I've kept the series in steps corresponding to the way I worked when modifying it, as I found it nice for clarity. There is some obvious cleanups and squashing that will need to happen before this can be considered for merging. The major change is the last patch. This uses the new hash table in our copy propagation pass. This effectively makes the algorithm O(2n) instead of O(n^2), which is a big win. I tested with the shader mentioned in [1] but took the shaders and made a shader_test that I used in conjunction with shader-db to do testing. I cut down on the ridiculous amount of expression some, to make it even compile in a reasonable time on my poor laptop. So, on to the results. Compilation of the aforementioned shader went from 98 to 75 seconds. Perf shows that previously we spent 19% of the time walking the acp-table with _mesa_hash_table_next_entry. After this series, the hash table is practically gone from the profile. This is a major win, and I'm guite happy with the results. There is obviously a lot more work to be done though. Please leave critical feedback and ideas. This was merely a test from my part, but it seems there is some opportunities here. https://bugs.freedesktop.org/show_bug.cgi?id=94477 Thomas Helland (14): glsl: Don't rehash the values when copying to new table glsl: Don't compute the hash when we already have it available nir: Remove unused include of nir_array nir: Move nir_array to util, and rename to dyn_array glsl: Use dyn_array instead of the exec_list util: Make a copy of the existing hash table implementation util: Add field for pointing to next data util: Implement the insertion functionality util: Implement deletion of entries util: Add a comment about ordering of entries with matching keys util: Implement returning the last added entry on search util: Rename functions and structs util: Use hashing functions from hash_table.h glsl: Restructure copy propagation to use the non replacing hash table src/compiler/Makefile.sources | 1 - src/compiler/glsl/opt_copy_propagation.cpp | 112 +++-- .../glsl/opt_copy_propagation_elements.cpp | 6 +- src/compiler/nir/nir_lower_locals_to_regs.c| 1 - src/compiler/spirv/vtn_cfg.c | 6 +- src/compiler/spirv/vtn_private.h | 4 +- src/util/Makefile.sources | 3 + src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +- src/util/non_replacing_hash_table.c| 513 + src/util/non_replacing_hash_table.h| 135 ++ 10 files changed, 743 insertions(+), 56 deletions(-) rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%) create mode 100644 src/util/non_replacing_hash_table.c create mode 100644 src/util/non_replacing_hash_table.h -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/14] nir: Remove unused include of nir_array
--- src/compiler/nir/nir_lower_locals_to_regs.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/compiler/nir/nir_lower_locals_to_regs.c b/src/compiler/nir/nir_lower_locals_to_regs.c index f1af237b5e..d0667bc504 100644 --- a/src/compiler/nir/nir_lower_locals_to_regs.c +++ b/src/compiler/nir/nir_lower_locals_to_regs.c @@ -26,7 +26,6 @@ */ #include "nir.h" -#include "nir_array.h" struct locals_to_regs_state { nir_shader *shader; -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/14] glsl: Don't compute the hash when we already have it available
We should really have a function to copy the table contents, but this will at least get us somewhere in the meantime. --- src/compiler/glsl/opt_copy_propagation_elements.cpp | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp b/src/compiler/glsl/opt_copy_propagation_elements.cpp index 9f79fa9202..3d11d028c5 100644 --- a/src/compiler/glsl/opt_copy_propagation_elements.cpp +++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp @@ -143,11 +143,13 @@ public: struct hash_entry *entry; hash_table_foreach(lhs, entry) { - _mesa_hash_table_insert(lhs_ht, entry->key, entry->data); + _mesa_hash_table_insert_pre_hashed(lhs_ht, entry->hash, +entry->key, entry->data); } hash_table_foreach(rhs, entry) { - _mesa_hash_table_insert(rhs_ht, entry->key, entry->data); + _mesa_hash_table_insert_pre_hashed(rhs_ht, entry->hash, +entry->key, entry->data); } } -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/14] util: Add field for pointing to next data
--- src/util/non_replacing_hash_table.h | 5 + 1 file changed, 5 insertions(+) diff --git a/src/util/non_replacing_hash_table.h b/src/util/non_replacing_hash_table.h index b35ee871bb..7550400539 100644 --- a/src/util/non_replacing_hash_table.h +++ b/src/util/non_replacing_hash_table.h @@ -42,6 +42,11 @@ struct hash_entry { uint32_t hash; const void *key; void *data; + /* +* If 0, no more data for this key. +* If anything else, it is position of the next data + 1. +*/ + uint32_t next_data; }; struct hash_table { -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/14] util: Make a copy of the existing hash table implementation
--- src/util/Makefile.sources | 2 + src/util/non_replacing_hash_table.c | 504 src/util/non_replacing_hash_table.h | 159 3 files changed, 665 insertions(+) create mode 100644 src/util/non_replacing_hash_table.c create mode 100644 src/util/non_replacing_hash_table.h diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources index b24611a363..6d7f0884fb 100644 --- a/src/util/Makefile.sources +++ b/src/util/Makefile.sources @@ -20,6 +20,8 @@ MESA_UTIL_FILES :=\ macros.h \ mesa-sha1.c \ mesa-sha1.h \ + non_replacing_hash_table.c \ + non_replacing_hash_table.h \ ralloc.c \ ralloc.h \ register_allocate.c \ diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c new file mode 100644 index 00..1730b1ee0d --- /dev/null +++ b/src/util/non_replacing_hash_table.c @@ -0,0 +1,504 @@ +/* + * Copyright © 2009,2012 Intel Corporation + * Copyright © 1988-2004 Keith Packard and Bart Massey. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Except as contained in this notice, the names of the authors + * or their institutions shall not be used in advertising or + * otherwise to promote the sale, use or other dealings in this + * Software without prior written authorization from the + * authors. + * + * Authors: + *Eric Anholt + *Keith Packard + */ + +/** + * Implements an open-addressing, linear-reprobing hash table. + * + * For more information, see: + * + * http://cgit.freedesktop.org/~anholt/hash_table/tree/README + */ + +#include +#include +#include + +#include "non_replacing_hash_table.h" +#include "ralloc.h" +#include "macros.h" + +static const uint32_t deleted_key_value; + +/** + * From Knuth -- a good choice for hash/rehash values is p, p-2 where + * p and p-2 are both prime. These tables are sized to have an extra 10% + * free to avoid exponential performance degradation as the hash table fills + */ +static const struct { + uint32_t max_entries, size, rehash; +} hash_sizes[] = { + { 2,5, 3 }, + { 4,7, 5 }, + { 8,13, 11}, + { 16, 19, 17}, + { 32, 43, 41}, + { 64, 73, 71}, + { 128, 151,149 }, + { 256, 283,281 }, + { 512, 571,569 }, + { 1024, 1153, 1151 }, + { 2048, 2269, 2267 }, + { 4096, 4519, 4517 }, + { 8192, 9013, 9011 }, + { 16384,18043, 18041 }, + { 32768,36109, 36107 }, + { 65536,72091, 72089 }, + { 131072, 144409, 144407}, + { 262144, 288361, 288359}, + { 524288, 576883, 576881}, + { 1048576, 1153459,1153457 }, + { 2097152, 2307163,2307161 }, + { 4194304, 4613893,4613891 }, + { 8388608, 9227641,9227639 }, + { 16777216, 18455029, 18455027 }, + { 33554432, 36911011, 36911009 }, + { 67108864, 73819861, 73819859 }, + { 134217728,147639589, 147639587 }, + { 268435456,295279081, 295279079 }, + { 536870912,590559793, 590559791 }, + { 1073741824, 1181116273, 1181116271}, + { 2147483648ul, 2362232233ul, 2362232231ul} +}; + +static int +entry_is_free(const struct hash_entry *entry) +{ + return entry->key == NULL; +} + +static int +entry_
[Mesa-dev] [PATCH 05/14] glsl: Use dyn_array instead of the exec_list
This should give us lower memory consumption, allocation overhead, better cache locality, and all this nice stuff. --- src/compiler/glsl/opt_copy_propagation.cpp | 57 +- 1 file changed, 25 insertions(+), 32 deletions(-) diff --git a/src/compiler/glsl/opt_copy_propagation.cpp b/src/compiler/glsl/opt_copy_propagation.cpp index e9f82e0644..aa5e813553 100644 --- a/src/compiler/glsl/opt_copy_propagation.cpp +++ b/src/compiler/glsl/opt_copy_propagation.cpp @@ -38,24 +38,10 @@ #include "ir_optimization.h" #include "compiler/glsl_types.h" #include "util/hash_table.h" +#include "util/dyn_array.h" namespace { -class kill_entry : public exec_node -{ -public: - /* override operator new from exec_node */ - DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(kill_entry) - - kill_entry(ir_variable *var) - { - assert(var); - this->var = var; - } - - ir_variable *var; -}; - class ir_copy_propagation_visitor : public ir_hierarchical_visitor { public: ir_copy_propagation_visitor() @@ -65,7 +51,8 @@ public: lin_ctx = linear_alloc_parent(mem_ctx, 0); acp = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer, _mesa_key_pointer_equal); - this->kills = new(mem_ctx) exec_list; + this->kills = ralloc(mem_ctx, dyn_array); + dyn_array_init(this->kills, mem_ctx); killed_all = false; } ~ir_copy_propagation_visitor() @@ -89,10 +76,9 @@ public: /** Hash of lhs->rhs: The available copies to propagate */ hash_table *acp; /** -* List of kill_entry: The variables whose values were killed in this -* block. +* List of the variables whose values were killed in this block. */ - exec_list *kills; + dyn_array *kills; bool progress; @@ -112,12 +98,13 @@ ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir) * main() at link time, so they're irrelevant to us. */ hash_table *orig_acp = this->acp; - exec_list *orig_kills = this->kills; + dyn_array *orig_kills = this->kills; bool orig_killed_all = this->killed_all; acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); - this->kills = new(mem_ctx) exec_list; + this->kills = ralloc(mem_ctx, dyn_array); + dyn_array_init(this->kills, mem_ctx); this->killed_all = false; visit_list_elements(this, &ir->body); @@ -199,12 +186,13 @@ void ir_copy_propagation_visitor::handle_if_block(exec_list *instructions) { hash_table *orig_acp = this->acp; - exec_list *orig_kills = this->kills; + dyn_array *orig_kills = this->kills; bool orig_killed_all = this->killed_all; acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); - this->kills = new(mem_ctx) exec_list; + this->kills = ralloc(mem_ctx, dyn_array); + dyn_array_init(this->kills, mem_ctx); this->killed_all = false; /* Populate the initial acp with a copy of the original */ @@ -220,16 +208,18 @@ ir_copy_propagation_visitor::handle_if_block(exec_list *instructions) _mesa_hash_table_clear(orig_acp, NULL); } - exec_list *new_kills = this->kills; + dyn_array *new_kills = this->kills; this->kills = orig_kills; _mesa_hash_table_destroy(acp, NULL); this->acp = orig_acp; this->killed_all = this->killed_all || orig_killed_all; - foreach_in_list(kill_entry, k, new_kills) { - kill(k->var); + dyn_array_foreach(new_kills, ir_variable *, var_ptr) { + ir_variable *var = *var_ptr; + kill(var); } + dyn_array_fini(new_kills); ralloc_free(new_kills); } @@ -249,12 +239,13 @@ void ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp) { hash_table *orig_acp = this->acp; - exec_list *orig_kills = this->kills; + dyn_array *orig_kills = this->kills; bool orig_killed_all = this->killed_all; acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); - this->kills = new(mem_ctx) exec_list; + this->kills = ralloc(mem_ctx, dyn_array); + dyn_array_init(this->kills, mem_ctx); this->killed_all = false; if (keep_acp) { @@ -271,16 +262,18 @@ ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp) _mesa_hash_table_clear(orig_acp, NULL); } - exec_list *new_kills = this->kills; + dyn_array *new_kills = this->kills; this->kills = orig_kills; _mesa_hash_table_destroy(acp, NULL); this->acp = orig_acp; this->killed_all = this->killed_all || orig_killed_all; - foreach_in_list(kill_entry, k, new_kills) { - kill(k->var); + dyn_array_foreach(new_kills, ir_variable *, var_ptr) { + ir_variable *var = *var_ptr; + kill(var); } + dyn_array_fini(new_kills); ralloc_free(new_kills); } @@ -320,7 +313,7 @@ ir_copy_propagation_visitor::kill(ir_variable *var) /* Add the LHS varia
[Mesa-dev] [PATCH 04/14] nir: Move nir_array to util, and rename to dyn_array
-- We might want to merge u_vector and this array, or just replace this implementation completely. Having two separate implementations, where one of them has one sole user seems a bit wierd. --- src/compiler/Makefile.sources | 1 - src/compiler/spirv/vtn_cfg.c | 6 +++--- src/compiler/spirv/vtn_private.h | 4 ++-- src/util/Makefile.sources | 1 + src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +- 5 files changed, 15 insertions(+), 15 deletions(-) rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%) diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index be2215cc10..9c8f802cd2 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -178,7 +178,6 @@ NIR_GENERATED_FILES = \ NIR_FILES = \ nir/nir.c \ nir/nir.h \ - nir/nir_array.h \ nir/nir_builder.h \ nir/nir_clone.c \ nir/nir_constant_expressions.h \ diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c index 62b9056990..14255798cb 100644 --- a/src/compiler/spirv/vtn_cfg.c +++ b/src/compiler/spirv/vtn_cfg.c @@ -183,7 +183,7 @@ vtn_add_case(struct vtn_builder *b, struct vtn_switch *swtch, list_inithead(&c->body); c->start_block = case_block; c->fallthrough = NULL; - nir_array_init(&c->values, b); + dyn_array_init(&c->values, b); c->is_default = false; c->visited = false; @@ -195,7 +195,7 @@ vtn_add_case(struct vtn_builder *b, struct vtn_switch *swtch, if (is_default) { case_block->switch_case->is_default = true; } else { - nir_array_add(&case_block->switch_case->values, uint32_t, val); + dyn_array_add(&case_block->switch_case->values, uint32_t, val); } } @@ -722,7 +722,7 @@ vtn_emit_cf_list(struct vtn_builder *b, struct list_head *cf_list, } nir_ssa_def *cond = NULL; -nir_array_foreach(&cse->values, uint32_t, val) { +dyn_array_foreach(&cse->values, uint32_t, val) { nir_ssa_def *is_val = nir_ieq(&b->nb, sel, nir_imm_int(&b->nb, *val)); diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h index 9302611803..9e0a2137e2 100644 --- a/src/compiler/spirv/vtn_private.h +++ b/src/compiler/spirv/vtn_private.h @@ -27,7 +27,7 @@ #include "nir/nir.h" #include "nir/nir_builder.h" -#include "nir/nir_array.h" +#include "util/dyn_array.h" #include "nir_spirv.h" #include "spirv.h" @@ -112,7 +112,7 @@ struct vtn_case { struct vtn_case *fallthrough; /* The uint32_t values that map to this case */ - nir_array values; + dyn_array values; /* True if this is the default case */ bool is_default; diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources index d2ae5e734d..b24611a363 100644 --- a/src/util/Makefile.sources +++ b/src/util/Makefile.sources @@ -8,6 +8,7 @@ MESA_UTIL_FILES := \ debug.h \ disk_cache.c \ disk_cache.h \ + dyn_array.h \ format_r11g11b10f.h \ format_rgb9e5.h \ format_srgb.h \ diff --git a/src/compiler/nir/nir_array.h b/src/util/dyn_array.h similarity index 86% rename from src/compiler/nir/nir_array.h rename to src/util/dyn_array.h index 1db4e8cea3..8a06990ab4 100644 --- a/src/compiler/nir/nir_array.h +++ b/src/util/dyn_array.h @@ -36,10 +36,10 @@ typedef struct { size_t size; size_t alloc; void *data; -} nir_array; +} dyn_array; static inline void -nir_array_init(nir_array *arr, void *mem_ctx) +dyn_array_init(dyn_array *arr, void *mem_ctx) { arr->mem_ctx = mem_ctx; arr->size = 0; @@ -48,7 +48,7 @@ nir_array_init(nir_array *arr, void *mem_ctx) } static inline void -nir_array_fini(nir_array *arr) +dyn_array_fini(dyn_array *arr) { if (arr->mem_ctx) ralloc_free(arr->data); @@ -56,18 +56,18 @@ nir_array_fini(nir_array *arr) free(arr->data); } -#define NIR_ARRAY_INITIAL_SIZE 64 +#define DYN_ARRAY_INITIAL_SIZE 64 /* Increments the size of the array by the given ammount and returns a * pointer to the beginning of the newly added space. */ static inline void * -nir_array_grow(nir_array *arr, size_t additional) +dyn_array_grow(dyn_array *arr, size_t additional) { size_t new_size = arr->size + additional; if (new_size > arr->alloc) { if (arr->alloc == 0) - arr->alloc = NIR_ARRAY_INITIAL_SIZE; + arr->alloc = DYN_ARRAY_INITIAL_SIZE; while (new_size > arr->alloc) arr->alloc *= 2; @@ -84,10 +84,10 @@ nir_array_grow(nir_array *arr, size_t additional) return ptr; } -#define nir_array_add(arr, type, elem) \ - *(type *)nir_array_grow(arr, sizeof(type)) = (elem) +#define dyn_array_add(arr, type, elem) \ + *(type *)dyn_array_grow(arr, sizeof(type)) = (elem) -#define nir_array_foreach(arr, type, elem) \ +#define dyn_array_foreac
[Mesa-dev] [PATCH 10/14] util: Add a comment about ordering of entries with matching keys
--- src/util/non_replacing_hash_table.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index 33c93452c4..1128388ab8 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -34,10 +34,11 @@ /** * Implements an open-addressing, linear-reprobing hash table. - * - * For more information, see: - * - * http://cgit.freedesktop.org/~anholt/hash_table/tree/README + * This hash table does not implement replacement. Instead it + * allows for inserting multiple data entries for a single key. + * It is important to note that the table does not guarantee that + * the order of data with matching keys will not be scrambled + * during a rehashing of the table during a resize. */ #include -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/14] util: Implement returning the last added entry on search
--- src/util/non_replacing_hash_table.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index 1128388ab8..48f42aa7c9 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -222,7 +222,23 @@ hash_table_search(struct hash_table *ht, uint32_t hash, const void *key) return NULL; } else if (entry_is_present(ht, entry) && entry->hash == hash) { if (ht->key_equals_function(key, entry->key)) { -return entry; +struct hash_entry *previous_entry = entry; + +/* Follow the chain of matching key's to the end */ +while (previous_entry->next_data != 0 ) { + entry = ht->table + previous_entry->next_data; + /* We might have deleted the last entry, and have a +* dangling reference, so check that the keys match +* before we continue on traversing through the chain. +* If the key's don't match we return the previous entry. +*/ + if (!ht->key_equals_function(key, entry->key)) + return previous_entry; + + previous_entry = entry; +} + +return previous_entry; } } -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/14] util: Use hashing functions from hash_table.h
--- src/util/non_replacing_hash_table.c | 47 - src/util/non_replacing_hash_table.h | 35 +-- 2 files changed, 1 insertion(+), 81 deletions(-) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index ecec94e2d6..64dc5c9c18 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -511,50 +511,3 @@ _mesa_non_replacing_hash_table_random_entry(struct non_replacing_hash_table *ht, return NULL; } - - -/** - * Quick FNV-1a hash implementation based on: - * http://www.isthe.com/chongo/tech/comp/fnv/ - * - * FNV-1a is not be the best hash out there -- Jenkins's lookup3 is supposed - * to be quite good, and it probably beats FNV. But FNV has the advantage - * that it involves almost no code. For an improvement on both, see Paul - * Hsieh's http://www.azillionmonkeys.com/qed/hash.html - */ -uint32_t -_mesa_hash_data(const void *data, size_t size) -{ - return _mesa_fnv32_1a_accumulate_block(_mesa_fnv32_1a_offset_bias, - data, size); -} - -/** FNV-1a string hash implementation */ -uint32_t -_mesa_hash_string(const char *key) -{ - uint32_t hash = _mesa_fnv32_1a_offset_bias; - - while (*key != 0) { - hash = _mesa_fnv32_1a_accumulate(hash, *key); - key++; - } - - return hash; -} - -/** - * String compare function for use as the comparison callback in - * _mesa_non_replacing_hash_table_create(). - */ -bool -_mesa_key_string_equal(const void *a, const void *b) -{ - return strcmp(a, b) == 0; -} - -bool -_mesa_key_pointer_equal(const void *a, const void *b) -{ - return a == b; -} diff --git a/src/util/non_replacing_hash_table.h b/src/util/non_replacing_hash_table.h index f047ad5c11..e8a29635f6 100644 --- a/src/util/non_replacing_hash_table.h +++ b/src/util/non_replacing_hash_table.h @@ -33,6 +33,7 @@ #include #include "c99_compat.h" #include "macros.h" +#include "util/hash_table.h" #ifdef __cplusplus extern "C" { @@ -101,40 +102,6 @@ struct non_replacing_hash_entry * _mesa_non_replacing_hash_table_random_entry(struct non_replacing_hash_table *ht, bool (*predicate)(struct non_replacing_hash_entry *entry)); -uint32_t _mesa_hash_data(const void *data, size_t size); -uint32_t _mesa_hash_string(const char *key); -bool _mesa_key_string_equal(const void *a, const void *b); -bool _mesa_key_pointer_equal(const void *a, const void *b); - -static inline uint32_t _mesa_key_hash_string(const void *key) -{ - return _mesa_hash_string((const char *)key); -} - -static inline uint32_t _mesa_hash_pointer(const void *pointer) -{ - return _mesa_hash_data(&pointer, sizeof(pointer)); -} - -enum { - _mesa_fnv32_1a_offset_bias = 2166136261u, -}; - -static inline uint32_t -_mesa_fnv32_1a_accumulate_block(uint32_t hash, const void *data, -size_t size) -{ - const uint8_t *bytes = (const uint8_t *)data; - - while (size-- != 0) { - hash ^= *bytes; - hash = hash * 0x01000193; - bytes++; - } - - return hash; -} - #define _mesa_fnv32_1a_accumulate(hash, expr) \ _mesa_fnv32_1a_accumulate_block(hash, &(expr), sizeof(expr)) -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/14] util: Implement the insertion functionality
--- src/util/non_replacing_hash_table.c | 41 +++-- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index 1730b1ee0d..a4c57e9349 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -317,12 +317,14 @@ hash_table_insert(struct hash_table *ht, uint32_t hash, break; } - /* Implement replacement when another insert happens - * with a matching key. This is a relatively common - * feature of hash tables, with the alternative - * generally being "insert the new value as well, and - * return it first when the key is searched for". - * + /* When another insert happens with a matching key, + * we check if there is a chain of matchin keys. + * If there is, we follow this chain until the end. + * We then do a linear probe until we find an empty spot. + * We insert the entry, and update the entry at our past + * matching key's position to point to the positon of + * the newly added data. + * * Note that the hash table doesn't have a delete * callback. If freeing of old data pointers is * required to avoid memory leaks, perform a search @@ -331,8 +333,35 @@ hash_table_insert(struct hash_table *ht, uint32_t hash, if (!entry_is_deleted(ht, entry) && entry->hash == hash && ht->key_equals_function(key, entry->key)) { + struct hash_entry *old_entry = entry; + + /* Follow the chain of matching key's to the end */ + while (old_entry->next_data != 0) { +entry = ht->table + old_entry->next_data; +/* We might have deleted the last entry, and have a + * dangling reference, so check that the keys match + * before we continue on traversing through the chain. + * If the key's don't match we undo our change of entry. + */ +if (!ht->key_equals_function(key, entry->key)) { + entry = old_entry; + continue; +} +old_entry = entry; + } + + /* Do a linear search for an empty spot in the table */ + while (!entry_is_free(entry)) { +entry = ht->table + ((entry - ht->table + 1) % ht->size); + } + + /* Now that we have an empty spot we add our data */ entry->key = key; entry->data = data; + ht->entries++; + + /* And update the matching key chain */ + old_entry->next_data = entry - ht->table; return entry; } -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/14] glsl: Restructure copy propagation to use the non replacing hash table
This way we can build one hash table in each direction. This allows us to move the algorithm from O(n^2) to O(2*n). The downside is an effective doubling of memory consumption --- src/compiler/glsl/opt_copy_propagation.cpp | 49 +++--- 1 file changed, 45 insertions(+), 4 deletions(-) diff --git a/src/compiler/glsl/opt_copy_propagation.cpp b/src/compiler/glsl/opt_copy_propagation.cpp index aa5e813553..34e59b10ac 100644 --- a/src/compiler/glsl/opt_copy_propagation.cpp +++ b/src/compiler/glsl/opt_copy_propagation.cpp @@ -38,6 +38,7 @@ #include "ir_optimization.h" #include "compiler/glsl_types.h" #include "util/hash_table.h" +#include "util/non_replacing_hash_table.h" #include "util/dyn_array.h" namespace { @@ -51,6 +52,8 @@ public: lin_ctx = linear_alloc_parent(mem_ctx, 0); acp = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer, _mesa_key_pointer_equal); + rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, _mesa_hash_pointer, + _mesa_key_pointer_equal); this->kills = ralloc(mem_ctx, dyn_array); dyn_array_init(this->kills, mem_ctx); killed_all = false; @@ -75,6 +78,13 @@ public: /** Hash of lhs->rhs: The available copies to propagate */ hash_table *acp; + + /** +* Hash table with possibility for multiple data per key. +* This is used as an inverse of the acp table, to make the +* algorithm O(n) instead of O(n^2). +*/ + non_replacing_hash_table *rev_acp; /** * List of the variables whose values were killed in this block. */ @@ -98,11 +108,14 @@ ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir) * main() at link time, so they're irrelevant to us. */ hash_table *orig_acp = this->acp; + non_replacing_hash_table *orig_rev_acp = this->rev_acp; dyn_array *orig_kills = this->kills; bool orig_killed_all = this->killed_all; acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); + rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, _mesa_hash_pointer, + _mesa_key_pointer_equal); this->kills = ralloc(mem_ctx, dyn_array); dyn_array_init(this->kills, mem_ctx); this->killed_all = false; @@ -110,10 +123,12 @@ ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir) visit_list_elements(this, &ir->body); _mesa_hash_table_destroy(acp, NULL); + _mesa_non_replacing_hash_table_destroy(rev_acp, NULL); ralloc_free(this->kills); this->kills = orig_kills; this->acp = orig_acp; + this->rev_acp = orig_rev_acp; this->killed_all = orig_killed_all; return visit_continue_with_parent; @@ -177,6 +192,7 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir) * this call. So kill all copies. */ _mesa_hash_table_clear(acp, NULL); + _mesa_non_replacing_hash_table_clear(rev_acp, NULL); this->killed_all = true; return visit_continue_with_parent; @@ -186,11 +202,14 @@ void ir_copy_propagation_visitor::handle_if_block(exec_list *instructions) { hash_table *orig_acp = this->acp; + non_replacing_hash_table *orig_rev_acp = this->rev_acp; dyn_array *orig_kills = this->kills; bool orig_killed_all = this->killed_all; acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); + rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, _mesa_hash_pointer, + _mesa_key_pointer_equal); this->kills = ralloc(mem_ctx, dyn_array); dyn_array_init(this->kills, mem_ctx); this->killed_all = false; @@ -201,17 +220,25 @@ ir_copy_propagation_visitor::handle_if_block(exec_list *instructions) _mesa_hash_table_insert_pre_hashed(acp, entry->hash, entry->key, entry->data); } + struct non_replacing_hash_entry *nr_entry; + non_replacing_hash_table_foreach(orig_rev_acp, nr_entry) { + _mesa_non_replacing_hash_table_insert_pre_hashed(rev_acp, nr_entry->hash, + nr_entry->key, nr_entry->data); + } visit_list_elements(this, instructions); if (this->killed_all) { _mesa_hash_table_clear(orig_acp, NULL); + _mesa_non_replacing_hash_table_clear(orig_rev_acp, NULL); } dyn_array *new_kills = this->kills; this->kills = orig_kills; _mesa_hash_table_destroy(acp, NULL); + _mesa_non_replacing_hash_table_destroy(rev_acp, NULL); this->acp = orig_acp; + this->rev_acp = orig_rev_acp; this->killed_all = this->killed_all || orig_killed_all; dyn_array_foreach(new_kills, ir_variable *, var_ptr) { @@ -239,11 +266,14 @@ void ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp) { hash_table *orig_acp = thi
[Mesa-dev] [PATCH 09/14] util: Implement deletion of entries
--- src/util/non_replacing_hash_table.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index a4c57e9349..33c93452c4 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -183,6 +183,7 @@ _mesa_hash_table_clear(struct hash_table *ht, delete_function(entry); entry->key = NULL; + entry->next_data = 0; } ht->entries = 0; @@ -413,6 +414,10 @@ _mesa_hash_table_insert_pre_hashed(struct hash_table *ht, uint32_t hash, * * Note that deletion doesn't otherwise modify the table, so an iteration over * the table deleting entries is safe. + * + * IMPORTANT: The table only allows deleting the last entry of a chain of + * entries with matching keys. If an entry in the middle of the chain is deleted + * the chain will be broken, and the table will have unpredictable results. */ void _mesa_hash_table_remove(struct hash_table *ht, @@ -424,6 +429,7 @@ _mesa_hash_table_remove(struct hash_table *ht, entry->key = ht->deleted_key; ht->entries--; ht->deleted_entries++; + entry->next_data = 0; } /** -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/14] util: Rename functions and structs
--- src/util/non_replacing_hash_table.c | 112 +++- src/util/non_replacing_hash_table.h | 102 2 files changed, 111 insertions(+), 103 deletions(-) diff --git a/src/util/non_replacing_hash_table.c b/src/util/non_replacing_hash_table.c index 48f42aa7c9..ecec94e2d6 100644 --- a/src/util/non_replacing_hash_table.c +++ b/src/util/non_replacing_hash_table.c @@ -93,32 +93,32 @@ static const struct { }; static int -entry_is_free(const struct hash_entry *entry) +entry_is_free(const struct non_replacing_hash_entry *entry) { return entry->key == NULL; } static int -entry_is_deleted(const struct hash_table *ht, struct hash_entry *entry) +entry_is_deleted(const struct non_replacing_hash_table *ht, struct non_replacing_hash_entry *entry) { return entry->key == ht->deleted_key; } static int -entry_is_present(const struct hash_table *ht, struct hash_entry *entry) +entry_is_present(const struct non_replacing_hash_table *ht, struct non_replacing_hash_entry *entry) { return entry->key != NULL && entry->key != ht->deleted_key; } -struct hash_table * -_mesa_hash_table_create(void *mem_ctx, +struct non_replacing_hash_table * +_mesa_non_replacing_hash_table_create(void *mem_ctx, uint32_t (*key_hash_function)(const void *key), bool (*key_equals_function)(const void *a, const void *b)) { - struct hash_table *ht; + struct non_replacing_hash_table *ht; - ht = ralloc(mem_ctx, struct hash_table); + ht = ralloc(mem_ctx, struct non_replacing_hash_table); if (ht == NULL) return NULL; @@ -128,7 +128,7 @@ _mesa_hash_table_create(void *mem_ctx, ht->max_entries = hash_sizes[ht->size_index].max_entries; ht->key_hash_function = key_hash_function; ht->key_equals_function = key_equals_function; - ht->table = rzalloc_array(ht, struct hash_entry, ht->size); + ht->table = rzalloc_array(ht, struct non_replacing_hash_entry, ht->size); ht->entries = 0; ht->deleted_entries = 0; ht->deleted_key = &deleted_key_value; @@ -148,16 +148,16 @@ _mesa_hash_table_create(void *mem_ctx, * freeing. */ void -_mesa_hash_table_destroy(struct hash_table *ht, - void (*delete_function)(struct hash_entry *entry)) +_mesa_non_replacing_hash_table_destroy(struct non_replacing_hash_table *ht, + void (*delete_function)(struct non_replacing_hash_entry *entry)) { if (!ht) return; if (delete_function) { - struct hash_entry *entry; + struct non_replacing_hash_entry *entry; - hash_table_foreach(ht, entry) { + non_replacing_hash_table_foreach(ht, entry) { delete_function(entry); } } @@ -171,10 +171,10 @@ _mesa_hash_table_destroy(struct hash_table *ht, * If delete_function is passed, it gets called on each entry present. */ void -_mesa_hash_table_clear(struct hash_table *ht, - void (*delete_function)(struct hash_entry *entry)) +_mesa_non_replacing_hash_table_clear(struct non_replacing_hash_table *ht, + void (*delete_function)(struct non_replacing_hash_entry *entry)) { - struct hash_entry *entry; + struct non_replacing_hash_entry *entry; for (entry = ht->table; entry != ht->table + ht->size; entry++) { if (entry->key == NULL) @@ -202,13 +202,14 @@ _mesa_hash_table_clear(struct hash_table *ht, * This must be called before any keys are actually deleted from the table. */ void -_mesa_hash_table_set_deleted_key(struct hash_table *ht, const void *deleted_key) +_mesa_non_replacing_hash_table_set_deleted_key(struct non_replacing_hash_table *ht, + const void *deleted_key) { ht->deleted_key = deleted_key; } -static struct hash_entry * -hash_table_search(struct hash_table *ht, uint32_t hash, const void *key) +static struct non_replacing_hash_entry * +hash_table_search(struct non_replacing_hash_table *ht, uint32_t hash, const void *key) { uint32_t start_hash_address = hash % ht->size; uint32_t hash_address = start_hash_address; @@ -216,13 +217,13 @@ hash_table_search(struct hash_table *ht, uint32_t hash, const void *key) do { uint32_t double_hash; - struct hash_entry *entry = ht->table + hash_address; + struct non_replacing_hash_entry *entry = ht->table + hash_address; if (entry_is_free(entry)) { return NULL; } else if (entry_is_present(ht, entry) && entry->hash == hash) { if (ht->key_equals_function(key, entry->key)) { -struct hash_entry *previous_entry = entry; +struct non_replacing_hash_entry *previous_entry = entry; /* Follow the chain of matching key's to the end */ while (previous_entry->next_data != 0 ) { @@ -256,35 +257,36 @@ hash_table_search(struct hash_table *ht, uint32_t ha
Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation
Oh, and an important disclaimer. I have not ran this through picking, nor have I done a complete run of shader-db for performance comparisons on more realistic workloads. I'll see what time I can find this evening, or some time tomorrow. On Jan 1, 2017 19:38, "Thomas Helland" wrote: Hi all. This is a series I've put together to lower the overhead of the copy propagation pass in glsl. The series is not refined, so please treat it as a proof of concept / RFC. The first five patches might be merge-ready, but their benefit is quite low compared to the actual major change here. The series goes on to copy our existing hash table implementation, and modify it in steps to allow insertion of multiple data entries with the same key. I've kept the series in steps corresponding to the way I worked when modifying it, as I found it nice for clarity. There is some obvious cleanups and squashing that will need to happen before this can be considered for merging. The major change is the last patch. This uses the new hash table in our copy propagation pass. This effectively makes the algorithm O(2n) instead of O(n^2), which is a big win. I tested with the shader mentioned in [1] but took the shaders and made a shader_test that I used in conjunction with shader-db to do testing. I cut down on the ridiculous amount of expression some, to make it even compile in a reasonable time on my poor laptop. So, on to the results. Compilation of the aforementioned shader went from 98 to 75 seconds. Perf shows that previously we spent 19% of the time walking the acp-table with _mesa_hash_table_next_entry. After this series, the hash table is practically gone from the profile. This is a major win, and I'm guite happy with the results. There is obviously a lot more work to be done though. Please leave critical feedback and ideas. This was merely a test from my part, but it seems there is some opportunities here. https://bugs.freedesktop.org/show_bug.cgi?id=94477 Thomas Helland (14): glsl: Don't rehash the values when copying to new table glsl: Don't compute the hash when we already have it available nir: Remove unused include of nir_array nir: Move nir_array to util, and rename to dyn_array glsl: Use dyn_array instead of the exec_list util: Make a copy of the existing hash table implementation util: Add field for pointing to next data util: Implement the insertion functionality util: Implement deletion of entries util: Add a comment about ordering of entries with matching keys util: Implement returning the last added entry on search util: Rename functions and structs util: Use hashing functions from hash_table.h glsl: Restructure copy propagation to use the non replacing hash table src/compiler/Makefile.sources | 1 - src/compiler/glsl/opt_copy_propagation.cpp | 112 +++-- .../glsl/opt_copy_propagation_elements.cpp | 6 +- src/compiler/nir/nir_lower_locals_to_regs.c| 1 - src/compiler/spirv/vtn_cfg.c | 6 +- src/compiler/spirv/vtn_private.h | 4 +- src/util/Makefile.sources | 3 + src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +- src/util/non_replacing_hash_table.c| 513 + src/util/non_replacing_hash_table.h| 135 ++ 10 files changed, 743 insertions(+), 56 deletions(-) rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%) create mode 100644 src/util/non_replacing_hash_table.c create mode 100644 src/util/non_replacing_hash_table.h -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Patch for freedreno features
On Sat, Dec 31, 2016 at 1:31 PM, Romain Failliot wrote: > I tried to use git send-email but it doesn't seem to work (although > the output says otherwise). > > So eventually it's simpler to just copy/paste the patch generated by > git format-patch: well, I was trying to not loose authorship/etc from the patch (although tbh I've lost track of who the original author was).. http://www.mesa3d.org/devinfo.html#submitting has some suggestions. BR, -R > --- > docs/features.txt | 38 +++--- > 1 file changed, 19 insertions(+), 19 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index c27d521..63b45af 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions. > Feature Status > --- > > > -GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > llvmpipe, softpipe, swr > +GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, > radeonsi, llvmpipe, softpipe, swr > >glBindFragDataLocation, glGetFragDataLocation DONE >GL_NV_conditional_render (Conditional rendering) DONE () > @@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, > r600, radeonsi, llvmpipe, soft >glVertexAttribI commands DONE >Depth format cube texturesDONE () >GLX_ARB_create_context (GLX 1.4 is required) DONE > - Multisample anti-aliasing DONE > (llvmpipe (*), softpipe (*), swr (*)) > + Multisample anti-aliasing DONE > (freedreno (*), llvmpipe (*), softpipe (*), swr (*)) > > -(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support > +(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample > anti-aliasing support > > > -GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > llvmpipe, softpipe, swr > +GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, > radeonsi, llvmpipe, softpipe, swr > >Forward compatible context support/deprecations DONE () >GL_ARB_draw_instanced (Instanced drawing) DONE () > @@ -82,34 +82,34 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, > r600, radeonsi, llvmpipe, soft > >Core/compatibility profiles DONE >Geometry shaders DONE () > - GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (swr) > - GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr) > - GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr) > - GL_ARB_provoking_vertex (Provoking vertex)DONE (swr) > - GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (swr) > + GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (freedreno, swr) > + GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno, swr) > + GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno, swr) > + GL_ARB_provoking_vertex (Provoking vertex)DONE (freedreno, swr) > + GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (freedreno, swr) >GL_ARB_texture_multisample (Multisample textures) DONE (swr) > - GL_ARB_depth_clamp (Frag depth clamp) DONE (swr) > - GL_ARB_sync (Fence objects) DONE (swr) > + GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno, swr) > + GL_ARB_sync (Fence objects) DONE (freedreno, swr) >GLX_ARB_create_context_profileDONE > > > GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > llvmpipe, softpipe > > - GL_ARB_blend_func_extendedDONE (swr) > + GL_ARB_blend_func_extendedDONE > (freedreno/a3xx, swr) >GL_ARB_explicit_attrib_location DONE (all > drivers that support GLSL) > - GL_ARB_occlusion_query2 DONE (swr) > + GL_ARB_occlusion_query2 DONE (freedreno, swr) >GL_ARB_sampler_objectsDONE (all drivers) > - GL_ARB_shader_bit_encodingDONE (swr) > - GL_ARB_texture_rgb10_a2ui DONE (swr) > - GL_ARB_texture_swizzleDONE (swr) > + GL_ARB_shader_bit_encodingDONE (freedreno, swr) > + GL_ARB_texture_rgb10_a2ui DONE (freedreno, swr) > + GL_ARB_texture_swizzleDONE (freedreno, swr) >GL_ARB_timer_queryDONE (swr) > - GL_ARB_instanced_arrays DONE (swr) > - GL_ARB_vertex_type_2_10_10_10_rev DONE (swr) > + GL_ARB_instan
[Mesa-dev] [PATCH] gallium/hud: add a path separator between dump directory and filename
It's more user friendly and it avoids to write files in unexpected places. --- src/gallium/auxiliary/hud/hud_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 779c116..cccdb26 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -872,9 +872,10 @@ hud_graph_set_dump_file(struct hud_graph *gr) char *dump_file; if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) { - dump_file = malloc(strlen(hud_dump_dir) + sizeof(gr->name)); + dump_file = malloc(strlen(hud_dump_dir) + sizeof("/") + sizeof(gr->name)); if (dump_file) { strcpy(dump_file, hud_dump_dir); + strcat(dump_file, "/"); strcat(dump_file, gr->name); gr->fd = fopen(dump_file, "w+"); free(dump_file); -- 2.10.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation
Darn it. There appears to be a functional change with patch 14. My changes where meant to make no functional change, so something is fishy. An example from the public shader-db: cycles HURT: ./shaders/tesseract/205.shader_test VS vec4: 54 -> 100 (85.19%) I'll see if I can find some time to debug this during the week. If anyone immediately sees the issue, please let me know. I guess I should write some make-check tests for the new hash table, as I suspect that that is where the issue is. - Thomas 2017-01-01 20:41 GMT+01:00 Thomas Helland : > Oh, and an important disclaimer. > I have not ran this through picking, nor have I done a complete > run of shader-db for performance comparisons on more realistic workloads. > I'll see what time I can find this evening, or some time tomorrow. > > > On Jan 1, 2017 19:38, "Thomas Helland" wrote: > > Hi all. > > This is a series I've put together to lower the overhead of > the copy propagation pass in glsl. The series is not refined, > so please treat it as a proof of concept / RFC. > > The first five patches might be merge-ready, but their benefit is > quite low compared to the actual major change here. > > The series goes on to copy our existing hash table implementation, > and modify it in steps to allow insertion of multiple data entries > with the same key. I've kept the series in steps corresponding to > the way I worked when modifying it, as I found it nice for clarity. > There is some obvious cleanups and squashing that will need to happen > before this can be considered for merging. > > The major change is the last patch. This uses the new hash table in > our copy propagation pass. This effectively makes the algorithm > O(2n) instead of O(n^2), which is a big win. I tested with the > shader mentioned in [1] but took the shaders and made a shader_test > that I used in conjunction with shader-db to do testing. > I cut down on the ridiculous amount of expression some, to make it > even compile in a reasonable time on my poor laptop. > > So, on to the results. Compilation of the aforementioned shader > went from 98 to 75 seconds. Perf shows that previously we spent > 19% of the time walking the acp-table with _mesa_hash_table_next_entry. > After this series, the hash table is practically gone from the profile. > This is a major win, and I'm guite happy with the results. > There is obviously a lot more work to be done though. > > Please leave critical feedback and ideas. This was merely a test > from my part, but it seems there is some opportunities here. > > https://bugs.freedesktop.org/show_bug.cgi?id=94477 > > Thomas Helland (14): > glsl: Don't rehash the values when copying to new table > glsl: Don't compute the hash when we already have it available > nir: Remove unused include of nir_array > nir: Move nir_array to util, and rename to dyn_array > glsl: Use dyn_array instead of the exec_list > util: Make a copy of the existing hash table implementation > util: Add field for pointing to next data > util: Implement the insertion functionality > util: Implement deletion of entries > util: Add a comment about ordering of entries with matching keys > util: Implement returning the last added entry on search > util: Rename functions and structs > util: Use hashing functions from hash_table.h > glsl: Restructure copy propagation to use the non replacing hash table > > src/compiler/Makefile.sources | 1 - > src/compiler/glsl/opt_copy_propagation.cpp | 112 +++-- > .../glsl/opt_copy_propagation_elements.cpp | 6 +- > src/compiler/nir/nir_lower_locals_to_regs.c| 1 - > src/compiler/spirv/vtn_cfg.c | 6 +- > src/compiler/spirv/vtn_private.h | 4 +- > src/util/Makefile.sources | 3 + > src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +- > src/util/non_replacing_hash_table.c| 513 > + > src/util/non_replacing_hash_table.h| 135 ++ > 10 files changed, 743 insertions(+), 56 deletions(-) > rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%) > create mode 100644 src/util/non_replacing_hash_table.c > create mode 100644 src/util/non_replacing_hash_table.h > > -- > 2.11.0 > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 20/27] i965: Pretend that CCS modified images are two planes
On 16-12-10 16:05:12, Pohjolainen, Topi wrote: On Thu, Dec 01, 2016 at 02:10:01PM -0800, Ben Widawsky wrote: From: Ben Widawsky Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/intel_screen.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 91eb7ec..f40761a 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -747,7 +747,7 @@ intel_query_image(__DRIimage *image, int attrib, int *value) case __DRI_IMAGE_ATTRIB_FOURCC: return intel_lookup_fourcc(image->dri_format, value); case __DRI_IMAGE_ATTRIB_NUM_PLANES: - *value = 1; + *value = image->aux_offset ? 2: 1; return true; case __DRI_IMAGE_ATTRIB_OFFSET: *value = image->offset; @@ -956,9 +956,17 @@ intel_from_planar(__DRIimage *parent, int plane, void *loaderPrivate) int width, height, offset, stride, dri_format, index; struct intel_image_format *f; __DRIimage *image; - -if (parent == NULL || parent->planar_format == NULL) -return NULL; +bool is_aux = parent->aux_offset && plane == 1; + +if (parent == NULL || parent->planar_format == NULL) { + if (is_aux) { + offset = parent->aux_offset; + stride = ALIGN(parent->pitch / 32, 128); + dri_format = parent->dri_format; + goto done; + } + return NULL; +} f = parent->planar_format; @@ -972,11 +980,13 @@ intel_from_planar(__DRIimage *parent, int plane, void *loaderPrivate) offset = parent->offsets[index]; stride = parent->strides[index]; +done: image = intel_allocate_image(parent->screen, dri_format, loaderPrivate); if (image == NULL) return NULL; -if (offset + height * stride > parent->bo->size) { +if (!is_aux && +offset + height * stride > parent->bo->size) { This means that parent->bo->size is set to the size of the color region and not to the full size of color+aux. I don't think I saw yet logic in this series doing this... Anyway, small comment would probably prevent other people wondering about the same thing. What do you think? No. You caught a bug. parent->bo->size should be the full size (it's allocated in create_image_with_modifier). Originally I had height defined as well so I could keep the assertion. I think I should go back and set height and keep the assertion as it was. diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 49adb1d2ff..501520bbf0 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -975,6 +975,7 @@ intel_from_planar(__DRIimage *parent, int plane, void *loaderPrivate) if (is_aux) { offset = parent->aux_offset; stride = ALIGN(parent->pitch / 32, 128); + height = ALIGN(DIV_ROUND_UP(parent->height, 16), 32); dri_format = parent->dri_format; goto done; } @@ -998,8 +999,7 @@ done: if (image == NULL) return NULL; -if (!is_aux && -offset + height * stride > parent->bo->size) { +if (offset + height * stride > parent->bo->size) { _mesa_warning(NULL, "intel_create_sub_image: subimage out of bounds"); free(image); return NULL; _mesa_warning(NULL, "intel_create_sub_image: subimage out of bounds"); free(image); return NULL; -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation
So I've discovered the issue. I wasn't inserting into both tables. So we never did any copy propagation. So that was stupid. Now it assertion fails in _insert_pre_hashed as hashing the key does not return the same result as the supplied hash. I'll look more into this some time during the week. 2017-01-01 22:53 GMT+01:00 Thomas Helland : > Darn it. There appears to be a functional change with patch 14. > My changes where meant to make no functional change, > so something is fishy. An example from the public shader-db: > > cycles HURT: ./shaders/tesseract/205.shader_test VS vec4: 54 -> > 100 (85.19%) > > I'll see if I can find some time to debug this during the week. > If anyone immediately sees the issue, please let me know. > > I guess I should write some make-check tests for the new > hash table, as I suspect that that is where the issue is. > > - Thomas > 2017-01-01 20:41 GMT+01:00 Thomas Helland : >> Oh, and an important disclaimer. >> I have not ran this through picking, nor have I done a complete >> run of shader-db for performance comparisons on more realistic workloads. >> I'll see what time I can find this evening, or some time tomorrow. >> >> >> On Jan 1, 2017 19:38, "Thomas Helland" wrote: >> >> Hi all. >> >> This is a series I've put together to lower the overhead of >> the copy propagation pass in glsl. The series is not refined, >> so please treat it as a proof of concept / RFC. >> >> The first five patches might be merge-ready, but their benefit is >> quite low compared to the actual major change here. >> >> The series goes on to copy our existing hash table implementation, >> and modify it in steps to allow insertion of multiple data entries >> with the same key. I've kept the series in steps corresponding to >> the way I worked when modifying it, as I found it nice for clarity. >> There is some obvious cleanups and squashing that will need to happen >> before this can be considered for merging. >> >> The major change is the last patch. This uses the new hash table in >> our copy propagation pass. This effectively makes the algorithm >> O(2n) instead of O(n^2), which is a big win. I tested with the >> shader mentioned in [1] but took the shaders and made a shader_test >> that I used in conjunction with shader-db to do testing. >> I cut down on the ridiculous amount of expression some, to make it >> even compile in a reasonable time on my poor laptop. >> >> So, on to the results. Compilation of the aforementioned shader >> went from 98 to 75 seconds. Perf shows that previously we spent >> 19% of the time walking the acp-table with _mesa_hash_table_next_entry. >> After this series, the hash table is practically gone from the profile. >> This is a major win, and I'm guite happy with the results. >> There is obviously a lot more work to be done though. >> >> Please leave critical feedback and ideas. This was merely a test >> from my part, but it seems there is some opportunities here. >> >> https://bugs.freedesktop.org/show_bug.cgi?id=94477 >> >> Thomas Helland (14): >> glsl: Don't rehash the values when copying to new table >> glsl: Don't compute the hash when we already have it available >> nir: Remove unused include of nir_array >> nir: Move nir_array to util, and rename to dyn_array >> glsl: Use dyn_array instead of the exec_list >> util: Make a copy of the existing hash table implementation >> util: Add field for pointing to next data >> util: Implement the insertion functionality >> util: Implement deletion of entries >> util: Add a comment about ordering of entries with matching keys >> util: Implement returning the last added entry on search >> util: Rename functions and structs >> util: Use hashing functions from hash_table.h >> glsl: Restructure copy propagation to use the non replacing hash table >> >> src/compiler/Makefile.sources | 1 - >> src/compiler/glsl/opt_copy_propagation.cpp | 112 +++-- >> .../glsl/opt_copy_propagation_elements.cpp | 6 +- >> src/compiler/nir/nir_lower_locals_to_regs.c| 1 - >> src/compiler/spirv/vtn_cfg.c | 6 +- >> src/compiler/spirv/vtn_private.h | 4 +- >> src/util/Makefile.sources | 3 + >> src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +- >> src/util/non_replacing_hash_table.c| 513 >> + >> src/util/non_replacing_hash_table.h| 135 ++ >> 10 files changed, 743 insertions(+), 56 deletions(-) >> rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%) >> create mode 100644 src/util/non_replacing_hash_table.c >> create mode 100644 src/util/non_replacing_hash_table.h >> >> -- >> 2.11.0 >> >> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Patch for freedreno features
On Sun, 2017-01-01 at 16:15 -0500, Rob Clark wrote: > On Sat, Dec 31, 2016 at 1:31 PM, Romain Failliot > wrote: > > I tried to use git send-email but it doesn't seem to work (although > > the output says otherwise). > > > > So eventually it's simpler to just copy/paste the patch generated by > > git format-patch: > > well, I was trying to not loose authorship/etc from the patch > (although tbh I've lost track of who the original author was).. just FYI you can use git commit --author="foo bar " to set arbitrary authorship at commit time. works with git commit --amend as well. Jan > http://www.mesa3d.org/devinfo.html#submitting has some suggestions. > > BR, > -R > > > --- > > docs/features.txt | 38 +++--- > > 1 file changed, 19 insertions(+), 19 deletions(-) > > > > diff --git a/docs/features.txt b/docs/features.txt > > index c27d521..63b45af 100644 > > --- a/docs/features.txt > > +++ b/docs/features.txt > > @@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions. > > Feature Status > > --- > > > > > > -GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > > llvmpipe, softpipe, swr > > +GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, > > radeonsi, llvmpipe, softpipe, swr > > > >glBindFragDataLocation, glGetFragDataLocation DONE > >GL_NV_conditional_render (Conditional rendering) DONE () > > @@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, > > r600, radeonsi, llvmpipe, soft > >glVertexAttribI commands DONE > >Depth format cube texturesDONE () > >GLX_ARB_create_context (GLX 1.4 is required) DONE > > - Multisample anti-aliasing DONE > > (llvmpipe (*), softpipe (*), swr (*)) > > + Multisample anti-aliasing DONE > > (freedreno (*), llvmpipe (*), softpipe (*), swr (*)) > > > > -(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support > > +(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample > > anti-aliasing support > > > > > > -GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > > llvmpipe, softpipe, swr > > +GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, > > radeonsi, llvmpipe, softpipe, swr > > > >Forward compatible context support/deprecations DONE () > >GL_ARB_draw_instanced (Instanced drawing) DONE () > > @@ -82,34 +82,34 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, > > r600, radeonsi, llvmpipe, soft > > > >Core/compatibility profiles DONE > >Geometry shaders DONE () > > - GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (swr) > > - GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr) > > - GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr) > > - GL_ARB_provoking_vertex (Provoking vertex)DONE (swr) > > - GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (swr) > > + GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (freedreno, > > swr) > > + GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno, > > swr) > > + GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno, > > swr) > > + GL_ARB_provoking_vertex (Provoking vertex)DONE (freedreno, > > swr) > > + GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (freedreno, > > swr) > >GL_ARB_texture_multisample (Multisample textures) DONE (swr) > > - GL_ARB_depth_clamp (Frag depth clamp) DONE (swr) > > - GL_ARB_sync (Fence objects) DONE (swr) > > + GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno, > > swr) > > + GL_ARB_sync (Fence objects) DONE (freedreno, > > swr) > >GLX_ARB_create_context_profileDONE > > > > > > GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, > > llvmpipe, softpipe > > > > - GL_ARB_blend_func_extendedDONE (swr) > > + GL_ARB_blend_func_extendedDONE > > (freedreno/a3xx, swr) > >GL_ARB_explicit_attrib_location DONE (all > > drivers that support GLSL) > > - GL_ARB_occlusion_query2 DONE (swr) > > + GL_ARB_occlusion_query2 DONE (freedreno, > > swr) > >GL_ARB_sampler_objectsDONE (all drivers) > > - GL_ARB_shader_bit_encodingDONE (swr) > > - GL_ARB_texture_rgb10_a2ui DONE (swr) > > - GL_ARB_texture_swizzleDONE (swr) > > + GL_ARB_shader_bit_encoding
Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation
2017-01-01 23:20 GMT+01:00 Thomas Helland : > So I've discovered the issue. I wasn't inserting into both tables. > So we never did any copy propagation. So that was stupid. > Now it assertion fails in _insert_pre_hashed as hashing the > key does not return the same result as the supplied hash. > I'll look more into this some time during the week. > Couldn't sleep until I'd found the issue. I was not setting the hash when hitting the new insertion codepath. Adding this makes things work, and reduces the functional changes to just two shaders from tf2 that are seeing an increase of one single instruction in their vertex shaders. But that's for another day. As for the compile time; I've only given it a shader-db run on normal shaders without disabling validation or nir optimizations. (Apart from the aforementioned unrealistic shader from the bug). It appears to give no noticeable difference, if something it might regress performance ever so slightly. I'd have to do more testing to verify if it has any major impact. It would not be unlikely that it would cause some regressions on smaller shaders though, as we are balancing hashing-overhead, and the fact we are inserting into two tables, with the overhead of array-traversal in the hash table. Some thorough testing will be needed. > 2017-01-01 22:53 GMT+01:00 Thomas Helland : >> Darn it. There appears to be a functional change with patch 14. >> My changes where meant to make no functional change, >> so something is fishy. An example from the public shader-db: >> >> cycles HURT: ./shaders/tesseract/205.shader_test VS vec4: 54 -> >> 100 (85.19%) >> >> I'll see if I can find some time to debug this during the week. >> If anyone immediately sees the issue, please let me know. >> >> I guess I should write some make-check tests for the new >> hash table, as I suspect that that is where the issue is. >> >> - Thomas >> 2017-01-01 20:41 GMT+01:00 Thomas Helland : >>> Oh, and an important disclaimer. >>> I have not ran this through picking, nor have I done a complete >>> run of shader-db for performance comparisons on more realistic workloads. >>> I'll see what time I can find this evening, or some time tomorrow. >>> >>> >>> On Jan 1, 2017 19:38, "Thomas Helland" wrote: >>> >>> Hi all. >>> >>> This is a series I've put together to lower the overhead of >>> the copy propagation pass in glsl. The series is not refined, >>> so please treat it as a proof of concept / RFC. >>> >>> The first five patches might be merge-ready, but their benefit is >>> quite low compared to the actual major change here. >>> >>> The series goes on to copy our existing hash table implementation, >>> and modify it in steps to allow insertion of multiple data entries >>> with the same key. I've kept the series in steps corresponding to >>> the way I worked when modifying it, as I found it nice for clarity. >>> There is some obvious cleanups and squashing that will need to happen >>> before this can be considered for merging. >>> >>> The major change is the last patch. This uses the new hash table in >>> our copy propagation pass. This effectively makes the algorithm >>> O(2n) instead of O(n^2), which is a big win. I tested with the >>> shader mentioned in [1] but took the shaders and made a shader_test >>> that I used in conjunction with shader-db to do testing. >>> I cut down on the ridiculous amount of expression some, to make it >>> even compile in a reasonable time on my poor laptop. >>> >>> So, on to the results. Compilation of the aforementioned shader >>> went from 98 to 75 seconds. Perf shows that previously we spent >>> 19% of the time walking the acp-table with _mesa_hash_table_next_entry. >>> After this series, the hash table is practically gone from the profile. >>> This is a major win, and I'm guite happy with the results. >>> There is obviously a lot more work to be done though. >>> >>> Please leave critical feedback and ideas. This was merely a test >>> from my part, but it seems there is some opportunities here. >>> >>> https://bugs.freedesktop.org/show_bug.cgi?id=94477 >>> >>> Thomas Helland (14): >>> glsl: Don't rehash the values when copying to new table >>> glsl: Don't compute the hash when we already have it available >>> nir: Remove unused include of nir_array >>> nir: Move nir_array to util, and rename to dyn_array >>> glsl: Use dyn_array instead of the exec_list >>> util: Make a copy of the existing hash table implementation >>> util: Add field for pointing to next data >>> util: Implement the insertion functionality >>> util: Implement deletion of entries >>> util: Add a comment about ordering of entries with matching keys >>> util: Implement returning the last added entry on search >>> util: Rename functions and structs >>> util: Use hashing functions from hash_table.h >>> glsl: Restructure copy propagation to use the non replacing hash table >>> >>> src/compiler/Makefile.sources | 1 - >>> src/compiler/glsl/opt_c
Re: [Mesa-dev] [PATCH 25/27] i965: Use partial resolves for CCS buffers being scanned out
On 16-12-10 16:24:46, Pohjolainen, Topi wrote: On Thu, Dec 01, 2016 at 02:10:06PM -0800, Ben Widawsky wrote: From: Ben Widawsky On Gen9 hardware, the display engine is able to scanout a compressed framebuffer by providing an offset to auxiliary compression information. Unfortunately, the hardware is incapable of doing the same thing for the fast clear color. To mitigate this, the hardware introduced a new resolve type called a partial resolve. The partial resolve will only do a resolve of the fast clear color and leave the rest of the compressed data alone. This patch enables using this resolve type for cases where the framebuffer will be passed along to the kernel for display. Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_context.c | 3 ++- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index cfd7a54..2bc2e94 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -1359,7 +1359,8 @@ intel_resolve_for_dri2_flush(struct brw_context *brw, if (rb->mt->num_samples <= 1) { assert(rb->mt_layer == 0 && rb->mt_level == 0 && rb->layer_count == 1); - intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1, 0); I'm now in a territory that I don't know that well and just need to ask random questions. How do we know this is destined for display directly and not for non-gpu compositing? I think that's always the case with intel_resolve_for_dri2_flush(). The current callers are: intel_flush_front() - from glFlush and intel_dri2_flush_with_flags - which is only called for drawables. I wasn't entirely sure myself, but I assumed it was correct because it passed all the CTS and piglit tests. + intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1, + INTEL_RESOLVE_HINT_CLEAR_COLOR); } else { intel_renderbuffer_downsample(brw, rb); } diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index dd71a06..b79de08 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -2399,8 +2399,8 @@ intel_miptree_make_shareable(struct brw_context *brw, assert(mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE || mt->num_samples <= 1); if (mt->mcs_buf) { Shouldn't we here at least check for scanout? Hmm. This probably makes sense. How about: - intel_miptree_all_slices_resolve_color(brw, mt, - INTEL_RESOLVE_HINT_CLEAR_COLOR); + intel_miptree_all_slices_resolve_color(brw, mt, mt->is_scanout ? + INTEL_RESOLVE_HINT_CLEAR_COLOR : + INTEL_RESOLVE_HINT_FULL); - intel_miptree_all_slices_resolve_color(brw, mt, 0); - mt->no_ccs = true; + intel_miptree_all_slices_resolve_color(brw, mt, + INTEL_RESOLVE_HINT_CLEAR_COLOR); } } -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 26/27] i965: Remove scanout restriction from lossless compression
On 16-12-11 10:05:25, Pohjolainen, Topi wrote: On Thu, Dec 01, 2016 at 02:10:07PM -0800, Ben Widawsky wrote: From: Ben Widawsky Cc: Topi Pohjolainen Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index b79de08..b297f79 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -157,7 +157,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw, if (mt->disable_aux_buffers) return false; - if (mt->is_scanout) + if (mt->is_scanout && mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS) I guess the same question I had in previous becomes more general. How do we know that the external consumer is prepared for compression (mt->is_scanout just tells that the buffer goes out from the driver, right)? Preparedness should have been determined by the graphics application (perhaps via the display server). The graphics application, using get_plane2 or some other magic, will determine that the external consumer can handle compression, and will then use the CCS modifiers upon buffer creation. This is the only way is_scanout && msaa_layout == INTEL_MSAA_LAYOUT_CMS, see create_ccs_buf_for_image(). return false; /* This function applies only to non-multisampled render targets. */ @@ -528,10 +528,6 @@ intel_miptree_create_layout(struct brw_context *brw, const UNUSED bool is_lossless_compressed_aux = brw->gen >= 9 && num_samples == 1 && mt->format == MESA_FORMAT_R_UINT32; - - /* For now, nothing else has this requirement */ - assert(is_lossless_compressed_aux || - (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0); Why do we need to drop this? And if we do, then we can drop the entire else-branch, "is_lossless_compressed_aux" is only used in the assertion. This might have been because of my bug for not having the layout flags set appropriately in an earlier version of my patches. I think the assertion should stay here, and I put it back. Thanks. } brw_miptree_layout(brw, mt, layout_flags); @@ -752,11 +748,9 @@ intel_miptree_create(struct brw_context *brw, * resolves. */ const bool lossless_compression_disabled = INTEL_DEBUG & DEBUG_NO_RBC; - assert(!mt->is_scanout); const bool is_lossless_compressed = unlikely(!lossless_compression_disabled) && - brw->gen >= 9 && !mt->is_scanout && - intel_miptree_supports_lossless_compressed(brw, mt); + brw->gen >= 9 && intel_miptree_supports_lossless_compressed(brw, mt); if (is_lossless_compressed) { intel_miptree_alloc_non_msrt_mcs(brw, mt, is_lossless_compressed); @@ -1043,7 +1037,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt) drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo); free((*mt)->hiz_buf); } - if ((*mt)->mcs_buf && !(*mt)->is_scanout) { + if ((*mt)->mcs_buf) { drm_intel_bo_unreference((*mt)->mcs_buf->bo); free((*mt)->mcs_buf); } -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Replace bool aux disable with enum
As CCS buffers are passed to KMS, it becomes useful to be able to determine exactly what type of aux buffers are disabled. This was previously not entirely needed (though the code was a little more confusing), however it becomes very desirable after a recent patch from Chad: commit 1c8be049bea786c2c054a770025976beba5b8636 Author: Chad Versace Date: Fri Dec 9 16:18:11 2016 -0800 i965/mt: Disable aux surfaces after making miptree shareable The next patch will handle CCS and get rid of no_ccs. Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 24 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 +- 2 files changed, 21 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 11ad31d761..e99d8a128b 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -64,7 +64,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw, */ static enum intel_msaa_layout compute_msaa_layout(struct brw_context *brw, mesa_format format, -bool disable_aux_buffers) +enum intel_aux_disable aux_disable) { /* Prior to Gen7, all MSAA surfaces used IMS layout. */ if (brw->gen < 7) @@ -90,7 +90,7 @@ compute_msaa_layout(struct brw_context *brw, mesa_format format, */ if (brw->gen == 7 && _mesa_get_format_datatype(format) == GL_INT) { return INTEL_MSAA_LAYOUT_UMS; - } else if (disable_aux_buffers) { + } else if (aux_disable & INTEL_AUX_DISABLE_MCS) { /* We can't use the CMS layout because it uses an aux buffer, the MCS * buffer. So fallback to UMS, which is identical to CMS without the * MCS. */ @@ -149,7 +149,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw, if (brw->gen < 7) return false; - if (mt->disable_aux_buffers) + if (mt->aux_disable & INTEL_AUX_DISABLE_MCS) return false; /* This function applies only to non-multisampled render targets. */ @@ -322,7 +322,8 @@ intel_miptree_create_layout(struct brw_context *brw, mt->logical_width0 = width0; mt->logical_height0 = height0; mt->logical_depth0 = depth0; - mt->disable_aux_buffers = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0; + mt->aux_disable = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0 ? + INTEL_AUX_DISABLE_ALL : INTEL_AUX_DISABLE_NONE; mt->no_ccs = true; mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0; exec_list_make_empty(&mt->hiz_map); @@ -336,8 +337,7 @@ intel_miptree_create_layout(struct brw_context *brw, int depth_multiply = 1; if (num_samples > 1) { /* Adjust width/height/depth for MSAA */ - mt->msaa_layout = compute_msaa_layout(brw, format, -mt->disable_aux_buffers); + mt->msaa_layout = compute_msaa_layout(brw, format, mt->aux_disable); if (mt->msaa_layout == INTEL_MSAA_LAYOUT_IMS) { /* From the Ivybridge PRM, Volume 1, Part 1, page 108: * "If the surface is multisampled and it is a depth or stencil @@ -528,7 +528,7 @@ intel_miptree_create_layout(struct brw_context *brw, brw_miptree_layout(brw, mt, layout_flags); - if (mt->disable_aux_buffers) + if (mt->aux_disable & INTEL_AUX_DISABLE_MCS) assert(mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS); return mt; @@ -1524,7 +1524,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw, { assert(brw->gen >= 7); /* MCS only used on Gen7+ */ assert(mt->mcs_buf == NULL); - assert(!mt->disable_aux_buffers); + assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0); /* Choose the correct format for the MCS buffer. All that really matters * is that we allocate the right buffer size, since we'll always be @@ -1583,7 +1583,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw, bool is_lossless_compressed) { assert(mt->mcs_buf == NULL); - assert(!mt->disable_aux_buffers); + assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0); assert(!mt->no_ccs); struct isl_surf temp_main_surf; @@ -1922,7 +1922,7 @@ intel_miptree_wants_hiz_buffer(struct brw_context *brw, if (mt->hiz_buf != NULL) return false; - if (mt->disable_aux_buffers) + if (mt->aux_disable & INTEL_AUX_DISABLE_HIZ) return false; switch (mt->format) { @@ -1942,7 +1942,7 @@ intel_miptree_alloc_hiz(struct brw_context *brw, struct intel_mipmap_tree *mt) { assert(mt->hiz_buf == NULL); - assert(!mt->disable_aux_buffers); + assert((mt->aux_disable & INTEL_AUX_DISABLE_HIZ) == 0); if (brw->gen == 7) { mt->hiz_buf = intel_gen7_hiz_buf_create(brw, mt); @@ -2346,7 +2346,7 @@ intel_miptree_make_shareable(struct brw_context *brw, mt->hiz_buf = NULL; } - mt->disable_aux_b
[Mesa-dev] [PATCH 2/2] i965/miptree: Create a disable CCS flag
Cc: Topi Pohjolainen Cc: Chad Versace Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_blorp.c | 2 +- src/mesa/drivers/dri/i965/brw_draw.c | 3 ++- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 21 ++--- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 +++--- 4 files changed, 16 insertions(+), 20 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 43ac3be2b9..8d58616f59 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -808,7 +808,7 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb, if (set_write_disables(irb, ctx->Color.ColorMask[buf], color_write_disable)) can_fast_clear = false; - if (irb->mt->no_ccs || + if (irb->mt->aux_disable & INTEL_AUX_DISABLE_CCS || !brw_is_color_fast_clear_compatible(brw, irb->mt, &ctx->Color.ClearColor)) can_fast_clear = false; diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index d4cc2235a4..940ce70c64 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -417,7 +417,8 @@ brw_predraw_set_aux_buffers(struct brw_context *brw) * while each layer may have its own fast clear color value. For * compressed buffers color value is available in the color buffer. */ - if (irb->layer_count > 1 && !irb->mt->no_ccs && + if (irb->layer_count > 1 && + !(irb->mt->aux_disable & INTEL_AUX_DISABLE_CCS) && !intel_miptree_is_lossless_compressed(brw, irb->mt)) { assert(brw->gen >= 8); diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index e99d8a128b..a81905f3a4 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -324,7 +324,7 @@ intel_miptree_create_layout(struct brw_context *brw, mt->logical_depth0 = depth0; mt->aux_disable = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0 ? INTEL_AUX_DISABLE_ALL : INTEL_AUX_DISABLE_NONE; - mt->no_ccs = true; + mt->aux_disable |= INTEL_AUX_DISABLE_CCS; mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0; exec_list_make_empty(&mt->hiz_map); exec_list_make_empty(&mt->color_resolve_map); @@ -736,7 +736,7 @@ intel_miptree_create(struct brw_context *brw, */ if (intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) && intel_miptree_supports_non_msrt_fast_clear(brw, mt)) { - mt->no_ccs = false; + mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS; assert(brw->gen < 8 || mt->halign == 16 || num_samples <= 1); /* On Gen9+ clients are not currently capable of consuming compressed @@ -858,7 +858,7 @@ intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, */ if (intel_tiling_supports_non_msrt_mcs(intel, singlesample_mt->tiling) && intel_miptree_supports_non_msrt_fast_clear(intel, singlesample_mt)) { - singlesample_mt->no_ccs = false; + singlesample_mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS; } if (num_samples == 0) { @@ -1583,8 +1583,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw, bool is_lossless_compressed) { assert(mt->mcs_buf == NULL); - assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0); - assert(!mt->no_ccs); + assert((mt->aux_disable & (INTEL_AUX_DISABLE_MCS|INTEL_AUX_DISABLE_CCS)) == 0); struct isl_surf temp_main_surf; struct isl_surf temp_ccs_surf; @@ -2160,7 +2159,8 @@ intel_miptree_check_color_resolve(const struct brw_context *brw, const struct intel_mipmap_tree *mt, unsigned level, unsigned layer) { - if (mt->no_ccs || !mt->mcs_buf) + + if ((mt->aux_disable & INTEL_AUX_DISABLE_CCS) || !mt->mcs_buf) return; /* Fast color clear is supported for mipmapped surfaces only on Gen8+. */ @@ -2240,7 +2240,7 @@ intel_miptree_needs_color_resolve(const struct brw_context *brw, const struct intel_mipmap_tree *mt, int flags) { - if (mt->no_ccs) + if (mt->aux_disable & INTEL_AUX_DISABLE_CCS) return false; const bool is_lossless_compressed = @@ -2334,19 +2334,18 @@ intel_miptree_make_shareable(struct brw_context *brw, if (mt->mcs_buf) { intel_miptree_all_slices_resolve_color(brw, mt, 0); - mt->no_ccs = true; + mt->aux_disable |= (INTEL_AUX_DISABLE_CCS | INTEL_AUX_DISABLE_MCS); drm_intel_bo_unreference(mt->mcs_buf->bo); free(mt->mcs_buf); mt->mcs_buf = NULL; } if (mt->hiz_buf) { + mt->aux_disable |= INTEL_AUX_DISABLE_HIZ; intel_miptree_all_slices_resolve_depth(brw, mt); intel_miptree_hiz_buffer_free(mt->hiz_buf); mt->hiz_buf = NULL;
[Mesa-dev] [Bug 99244] hud_context.c:874:45: error: ‘W_OK’ undeclared
https://bugs.freedesktop.org/show_bug.cgi?id=99244 Bug ID: 99244 Summary: hud_context.c:874:45: error: ‘W_OK’ undeclared Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Keywords: bisected, regression Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: v...@freedesktop.org QA Contact: mesa-dev@lists.freedesktop.org CC: edmo...@eriadon.com, mar...@gmail.com mesa: 3f5fba8a7be61bfc0f46a5ea058108f6e0e1c268 (master 13.1.0-devel) Compiling src/gallium/auxiliary/hud/hud_context.c ... src/gallium/auxiliary/hud/hud_context.c: In function ‘hud_graph_set_dump_file’: src/gallium/auxiliary/hud/hud_context.c:874:24: warning: implicit declaration of function ‘access’ [-Wimplicit-function-declaration] if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) { ^~ src/gallium/auxiliary/hud/hud_context.c:874:45: error: ‘W_OK’ undeclared (first use in this function) if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) { ^~~~ 22cd9040da75cac6e1c61b821e1ac6906ac4a8ac is the first bad commit commit 22cd9040da75cac6e1c61b821e1ac6906ac4a8ac Author: Edmondo Tommasina Date: Wed Dec 21 22:58:09 2016 +0100 gallium/hud: dump hud_driver_query values to files Dump values for every selected data source in GALLIUM_HUD. Every data source has its own file and the filename is equal to the data source identifier. Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory. No values are dumped if the environment variable is not set, the directory doesn't exist or the user doesn't have write access. Signed-off-by: Marek Olšák :04 04 c269ea1aa72e829368e14dc3f923521c2d2de355 e297a9c27c8397302ab9ba61578eacab98cb8bc3 M src bisect run success -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] glsl: avoid treating fb fetches as output reads to be lowered
Signed-off-by: Ilia Mirkin --- src/compiler/glsl/lower_output_reads.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/lower_output_reads.cpp b/src/compiler/glsl/lower_output_reads.cpp index 851078b..bd3accb 100644 --- a/src/compiler/glsl/lower_output_reads.cpp +++ b/src/compiler/glsl/lower_output_reads.cpp @@ -90,7 +90,7 @@ output_read_remover::~output_read_remover() ir_visitor_status output_read_remover::visit(ir_dereference_variable *ir) { - if (ir->var->data.mode != ir_var_shader_out) + if (ir->var->data.mode != ir_var_shader_out || ir->var->data.fb_fetch_output) return visit_continue; hash_entry *entry = _mesa_hash_table_search(replacements, ir->var); -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] st/mesa: add support for advanced blend when fb can be fetched from
This implements support for emitting FBFETCH ops, using the existing lowering pass for advanced blend logic, and disabling hw blend when advanced blending is enabled. Signed-off-by: Ilia Mirkin --- src/mesa/state_tracker/st_atom_blend.c| 2 +- src/mesa/state_tracker/st_cb_texturebarrier.c | 13 + src/mesa/state_tracker/st_extensions.c| 2 ++ src/mesa/state_tracker/st_glsl_to_tgsi.cpp| 28 --- 4 files changed, 37 insertions(+), 8 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_blend.c b/src/mesa/state_tracker/st_atom_blend.c index b8d65bd..f76cfab 100644 --- a/src/mesa/state_tracker/st_atom_blend.c +++ b/src/mesa/state_tracker/st_atom_blend.c @@ -205,7 +205,7 @@ update_blend( struct st_context *st ) blend->logicop_enable = 1; blend->logicop_func = translate_logicop(ctx->Color.LogicOp); } - else if (ctx->Color.BlendEnabled) { + else if (ctx->Color.BlendEnabled && !ctx->Color._AdvancedBlendMode) { /* blending enabled */ for (i = 0, j = 0; i < num_state; i++) { diff --git a/src/mesa/state_tracker/st_cb_texturebarrier.c b/src/mesa/state_tracker/st_cb_texturebarrier.c index 7fd1cbd..29cd37c 100644 --- a/src/mesa/state_tracker/st_cb_texturebarrier.c +++ b/src/mesa/state_tracker/st_cb_texturebarrier.c @@ -55,6 +55,18 @@ st_TextureBarrier(struct gl_context *ctx) /** + * Called via ctx->Driver.BlendBarrier() + */ +static void +st_BlendBarrier(struct gl_context *ctx) +{ + struct pipe_context *pipe = st_context(ctx)->pipe; + + pipe->texture_barrier(pipe, PIPE_TEXTURE_BARRIER_FRAMEBUFFER); +} + + +/** * Called via ctx->Driver.MemoryBarrier() */ static void @@ -118,5 +130,6 @@ st_MemoryBarrier(struct gl_context *ctx, GLbitfield barriers) void st_init_texture_barrier_functions(struct dd_function_table *functions) { functions->TextureBarrier = st_TextureBarrier; + functions->BlendBarrier = st_BlendBarrier; functions->MemoryBarrier = st_MemoryBarrier; } diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index df6ad08..4921d37 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -614,6 +614,8 @@ void st_init_extensions(struct pipe_screen *screen, { o(ARB_transform_feedback2), PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME }, { o(ARB_transform_feedback3), PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS }, + { o(KHR_blend_equation_advanced), PIPE_CAP_TGSI_FS_FBFETCH }, + { o(EXT_blend_equation_separate), PIPE_CAP_BLEND_EQUATION_SEPARATE }, { o(EXT_depth_bounds_test),PIPE_CAP_DEPTH_BOUNDS_TEST }, { o(EXT_draw_buffers2),PIPE_CAP_INDEP_BLEND_ENABLE }, diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 9599296..42c9b87 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -2500,10 +2500,19 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir) else decl->size = type_size(var->type); - entry = new(mem_ctx) variable_storage(var, - PROGRAM_OUTPUT, - decl->mesa_index, - decl->array_id); + if (var->data.fb_fetch_output) { +st_dst_reg dst = st_dst_reg(get_temp(var->type)); +st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index, +var->type, component, decl->array_id); +emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src); +entry = new(mem_ctx) variable_storage(var, dst.file, dst.index, + dst.array_id); + } else { +entry = new(mem_ctx) variable_storage(var, + PROGRAM_OUTPUT, + decl->mesa_index, + decl->array_id); + } entry->component = component; this->variables.push_tail(entry); @@ -6788,8 +6797,9 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog) continue; bool progress; - exec_list *ir = prog->_LinkedShaders[i]->ir; - gl_shader_stage stage = prog->_LinkedShaders[i]->Stage; + struct gl_linked_shader *shader = prog->_LinkedShaders[i]; + exec_list *ir = shader->ir; + gl_shader_stage stage = shader->Stage; const struct gl_shader_compiler_options *options = &ctx->Const.ShaderCompilerOptions[stage]; enum pipe_shader_type ptarget = st_shader_stage_to_ptarget(stage); @@ -6805,7 +6815,7 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
[Mesa-dev] [PATCH 5/7] gallium: add flags parameter to texture barrier
This is so that we can differentiate between flushing any framebuffer reading caches from regular sampler caches. Signed-off-by: Ilia Mirkin --- This felt too simple and silly to create an extra callback for, especially since the implementations that rely on texture sampling to retrieve fb contents will have them be identical. src/gallium/docs/source/context.rst | 3 ++- src/gallium/drivers/ddebug/dd_context.c | 4 ++-- src/gallium/drivers/ilo/ilo_draw.c | 2 +- src/gallium/drivers/nouveau/nv50/nv50_context.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 2 +- src/gallium/drivers/r300/r300_state.c | 2 +- src/gallium/drivers/r600/r600_state_common.c| 2 +- src/gallium/drivers/radeonsi/si_state.c | 2 +- src/gallium/drivers/softpipe/sp_flush.c | 4 ++-- src/gallium/drivers/softpipe/sp_flush.h | 2 +- src/gallium/drivers/trace/tr_context.c | 5 +++-- src/gallium/include/pipe/p_context.h| 2 +- src/gallium/include/pipe/p_defines.h| 6 ++ src/mesa/state_tracker/st_cb_texturebarrier.c | 2 +- 14 files changed, 24 insertions(+), 16 deletions(-) diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst index 35f51a0..e9dcccf 100644 --- a/src/gallium/docs/source/context.rst +++ b/src/gallium/docs/source/context.rst @@ -578,7 +578,8 @@ texture_barrier %%% This function flushes all pending writes to the currently-set surfaces and -invalidates all read caches of the currently-set samplers. +invalidates all read caches of the currently-set samplers. This can be used +for both regular textures as well as for framebuffers read via FBREAD. diff --git a/src/gallium/drivers/ddebug/dd_context.c b/src/gallium/drivers/ddebug/dd_context.c index edcbf2c..0b1dfbb 100644 --- a/src/gallium/drivers/ddebug/dd_context.c +++ b/src/gallium/drivers/ddebug/dd_context.c @@ -676,11 +676,11 @@ dd_context_texture_subdata(struct pipe_context *_pipe, */ static void -dd_context_texture_barrier(struct pipe_context *_pipe) +dd_context_texture_barrier(struct pipe_context *_pipe, unsigned flags) { struct pipe_context *pipe = dd_context(_pipe)->pipe; - pipe->texture_barrier(pipe); + pipe->texture_barrier(pipe, flags); } static void diff --git a/src/gallium/drivers/ilo/ilo_draw.c b/src/gallium/drivers/ilo/ilo_draw.c index 6831d2c..bef238a 100644 --- a/src/gallium/drivers/ilo/ilo_draw.c +++ b/src/gallium/drivers/ilo/ilo_draw.c @@ -603,7 +603,7 @@ ilo_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info) } static void -ilo_texture_barrier(struct pipe_context *pipe) +ilo_texture_barrier(struct pipe_context *pipe, unsigned flags) { struct ilo_context *ilo = ilo_context(pipe); diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c b/src/gallium/drivers/nouveau/nv50/nv50_context.c index fc852d7..ece7da9 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_context.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c @@ -43,7 +43,7 @@ nv50_flush(struct pipe_context *pipe, } static void -nv50_texture_barrier(struct pipe_context *pipe) +nv50_texture_barrier(struct pipe_context *pipe, unsigned flags) { struct nouveau_pushbuf *push = nv50_context(pipe)->base.pushbuf; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c index c711cb0..8f2b974 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c @@ -44,7 +44,7 @@ nvc0_flush(struct pipe_context *pipe, } static void -nvc0_texture_barrier(struct pipe_context *pipe) +nvc0_texture_barrier(struct pipe_context *pipe, unsigned flags) { struct nouveau_pushbuf *push = nvc0_context(pipe)->base.pushbuf; diff --git a/src/gallium/drivers/r300/r300_state.c b/src/gallium/drivers/r300/r300_state.c index 196c0df..8c49bfd 100644 --- a/src/gallium/drivers/r300/r300_state.c +++ b/src/gallium/drivers/r300/r300_state.c @@ -2068,7 +2068,7 @@ static void r300_set_constant_buffer(struct pipe_context *pipe, } } -static void r300_texture_barrier(struct pipe_context *pipe) +static void r300_texture_barrier(struct pipe_context *pipe, unsigned flags) { struct r300_context *r300 = r300_context(pipe); diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 60490b0..a9dbc8b 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -91,7 +91,7 @@ void r600_emit_alphatest_state(struct r600_context *rctx, struct r600_atom *atom radeon_set_context_reg(cs, R_028438_SX_ALPHA_REF, alpha_ref); } -static void r600_texture_barrier(struct pipe_context *ctx) +static void r600_texture_barrier(struct pipe_context *ctx, unsigned flags) { struct r600_context *rctx = (struct r600_context *)ctx; diff --git a/src/gallium/drivers/radeonsi/si_st
[Mesa-dev] [PATCH 2/7] mesa: allow BlendBarrier to be used without support for full fb fetch
The extension spec is not currently published, so it's a bit premature to require it for BlendBarrier usage. Signed-off-by: Ilia Mirkin --- src/mesa/main/barrier.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/barrier.c b/src/mesa/main/barrier.c index 2f5b451..e55d13c 100644 --- a/src/mesa/main/barrier.c +++ b/src/mesa/main/barrier.c @@ -114,7 +114,8 @@ _mesa_BlendBarrier(void) { GET_CURRENT_CONTEXT(ctx); - if (!ctx->Extensions.MESA_shader_framebuffer_fetch_non_coherent) { + if (!ctx->Extensions.MESA_shader_framebuffer_fetch_non_coherent && + !ctx->Extensions.KHR_blend_equation_advanced) { _mesa_error(ctx, GL_INVALID_OPERATION, "glBlendBarrier(not supported)"); return; -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] gallium: add PIPE_CAP_TGSI_FS_FBFETCH
Signed-off-by: Ilia Mirkin --- src/gallium/docs/source/screen.rst | 2 ++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/swr/swr_screen.cpp | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/drivers/virgl/virgl_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + 17 files changed, 18 insertions(+) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 86aa259..5e561ca 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -366,6 +366,8 @@ The integer capabilities: ARB_transform_feedback3. * ``PIPE_CAP_TGSI_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read from the output file. +* ``PIPE_CAP_TGSI_FS_FBFETCH``: Whether a fragment shader can use the FBFETCH + opcode to retrieve the current value in the framebuffer. .. _pipe_capf: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index d84cd82..4d0981e 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -295,6 +295,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS: case PIPE_CAP_TGSI_ARRAY_COMPONENTS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 14f4271..1f13e4d 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -296,6 +296,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: case PIPE_CAP_NATIVE_FENCE_FD: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index c3fad73..93509bb 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -519,6 +519,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_TGSI_ARRAY_COMPONENTS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: case PIPE_CAP_NATIVE_FENCE_FD: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 83045cd..3d8acdc 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -340,6 +340,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: case PIPE_CAP_NATIVE_FENCE_FD: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; } /* should only get here on unhandled cases */ diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index beffeac..759fc23 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -205,6 +205,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TGSI_ARRAY_COMPONENTS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: case PIPE_CAP_NATIVE_FENCE_FD: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 22da3c9..0cfdb3c 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -259,6 +259,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS: case PIPE_CAP_TGSI_CAN_READ_OUTPUTS: case PIPE_CAP_NATIVE_FENCE_FD: + case PIPE_CAP_TGSI_FS_FBFETCH: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index 2526a14..fe637af 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_s
[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value
Signed-off-by: Ilia Mirkin --- src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +- src/gallium/docs/source/tgsi.rst | 11 +++ src/gallium/include/pipe/p_shader_tokens.h | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index 37549aa..e34b8c7 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP }, { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS }, { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB }, - { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 }, /* removed */ + { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH }, { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV }, { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 }, { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL }, diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index d2d30b4..accbe1d 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -2561,6 +2561,17 @@ Resource Access Opcodes image, while .w will contain the number of samples for multi-sampled images. +.. opcode:: FBFETCH - Load data from framebuffer + + Syntax: ``FBFETCH dst, output`` + + Example: ``FBFETCH TEMP[0], OUT[0]`` + + Returns the color of the current position in the framebuffer from + before this fragment shader invocation. Always returns the same + value from multiple calls for a particular output within a single + invocation. + .. _threadsyncopcodes: diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index ee59df0..328768f 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -397,7 +397,7 @@ struct tgsi_property_data { #define TGSI_OPCODE_CMP 66 #define TGSI_OPCODE_SCS 67 #define TGSI_OPCODE_TXB 68 -/* gap */ +#define TGSI_OPCODE_FBFETCH 69 #define TGSI_OPCODE_DIV 70 #define TGSI_OPCODE_DP2 71 #define TGSI_OPCODE_TXL 72 -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] nvc0: enable FBFETCH with a special slot for color buffer 0
We don't need to support all the color buffers for advanced blend, just cb0. For Fermi, we use the special binding slots so that we don't overlap with user textures, while Kepler+ gets a dedicated position for the fb handle in the driver constbuf. This logic is only triggered when a FBREAD is actually present so it should be a no-op most of the time. Signed-off-by: Ilia Mirkin --- Firstly, I'm not 100% sure that the default_tsc is needed. I had temporary failures with srgb decoding that went away on later runs that I was unable to reproduce even by forcing it to have srgb conversion disabled in the sampler. And TXF shouldn't need samplers. However I remember a weird situation where TXF was producing failures without a sampler bound. Secondly this needs to get testing on Fermi. All the dEQP advanced blend tests pass on Kepler though. docs/features.txt | 2 +- docs/relnotes/13.1.0.html | 1 + .../drivers/nouveau/codegen/nv50_ir_driver.h | 2 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 41 ++ .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++-- src/gallium/drivers/nouveau/nvc0/nvc0_context.h| 4 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 ++ src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 6 +- src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 2 + .../drivers/nouveau/nvc0/nvc0_state_validate.c | 92 +- 11 files changed, 170 insertions(+), 8 deletions(-) diff --git a/docs/features.txt b/docs/features.txt index c27d521..60cb68c 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -253,7 +253,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+ GL_EXT_color_buffer_float DONE (all drivers) - GL_KHR_blend_equation_advancedDONE (i965) + GL_KHR_blend_equation_advancedDONE (i965, nvc0) GL_KHR_debug DONE (all drivers) GL_KHR_robustness DONE (i965, nvc0, radeonsi) GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+) diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html index 4dce843..be2c206 100644 --- a/docs/relnotes/13.1.0.html +++ b/docs/relnotes/13.1.0.html @@ -45,6 +45,7 @@ Note: some of the new features are only available with certain drivers. GL_ARB_post_depth_coverage on i965/gen9+ +GL_KHR_blend_equation_advanced on nvc0 GL_INTEL_conservative_rasterization on i965/gen9+ GL_NV_image_formats on any driver supporting GL_ARB_shader_image_load_store (i965, nvc0, radeonsi, softpipe) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h index 9fdabcc..fbb692d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h @@ -146,6 +146,7 @@ struct nv50_ir_prog_info bool usesDiscard; bool persampleInvocation; bool usesSampleMaskIn; + bool readsFramebuffer; } fp; struct { uint32_t inputOffset; /* base address for user args */ @@ -179,6 +180,7 @@ struct nv50_ir_prog_info bool nv50styleSurfaces;/* generate gX[] access for raw buffers */ bool halfPixelCenter; /* externally set half pixel center state */ uint16_t texBindBase; /* base address for tex handles (nve4) */ + uint16_t fbtexBindBase;/* base address for fbtex handle (nve4) */ uint16_t suInfoBase; /* base address for surface info (nve4) */ uint16_t bufInfoBase; /* base address for buffer info */ uint16_t sampleInfoBase; /* base address for sample positions */ diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index d4e29a0..84d04d1 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -1460,6 +1460,9 @@ bool Source::scanInstruction(const struct tgsi_full_instruction *inst) if (insn.getOpcode() == TGSI_OPCODE_BARRIER) info->numBarriers = 1; + if (insn.getOpcode() == TGSI_OPCODE_FBFETCH) + info->prop.fp.readsFramebuffer = true; + if (insn.dstCount()) { Instruction::DstRegister dst = insn.getDst(0); @@ -1575,6 +1578,7 @@ private: void handleTEX(Value *dst0[4], int R, int S, int L, int C, int Dx, int Dy); void handleTXF(Value *dst0[4], int R, int L_M); void handleTXQ(Value *dst0[4], enum TexQuery, int R); + void handleFBFETCH(Value *dst0[4]); void handleLIT(Value *dst0[4]); void handleUserClipPlanes(); @@ -2294,6 +2298,40 @@ Converter::handleTXF(Value *dst[4], int R, int L_M) } void +Converter::hand
Re: [Mesa-dev] [PATCH 18/27] i965/miptree: Add a return for updating of winsys
On Sat, Dec 31, 2016 at 02:40:42PM -0800, Ben Widawsky wrote: > On 16-12-10 15:39:12, Pohjolainen, Topi wrote: > > On Thu, Dec 01, 2016 at 02:09:59PM -0800, Ben Widawsky wrote: > > > From: Ben Widawsky > > > > > > There is nothing particularly useful to do currently if the update > > > fails, but there is no point carrying on either. As a result, this has a > > > behavior change. > > > > > > Signed-off-by: Ben Widawsky > > > --- > > > src/mesa/drivers/dri/i965/brw_context.c | 14 -- > > > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++--- > > > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 2 +- > > > 3 files changed, 12 insertions(+), 10 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c > > > b/src/mesa/drivers/dri/i965/brw_context.c > > > index b928f94..593fa67 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_context.c > > > +++ b/src/mesa/drivers/dri/i965/brw_context.c > > > @@ -1645,9 +1645,10 @@ intel_process_dri2_buffer(struct brw_context *brw, > > >return; > > > } > > > > > > - intel_update_winsys_renderbuffer_miptree(brw, rb, bo, > > > -drawable->w, drawable->h, > > > -buffer->pitch); > > > + if (intel_update_winsys_renderbuffer_miptree(brw, rb, bo, > > > +drawable->w, drawable->h, > > > +buffer->pitch)) > > > + return; > > > > > > if (_mesa_is_front_buffer_drawing(fb) && > > > (buffer->attachment == __DRI_BUFFER_FRONT_LEFT || > > > @@ -1703,9 +1704,10 @@ intel_update_image_buffer(struct brw_context > > > *intel, > > > if (last_mt && last_mt->bo == buffer->bo) > > >return; > > > > > > - intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo, > > > -buffer->width, > > > buffer->height, > > > -buffer->pitch); > > > + if (intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo, > > > +buffer->width, > > > buffer->height, > > > +buffer->pitch)) > > > + return; > > > > > > if (_mesa_is_front_buffer_drawing(fb) && > > > buffer_type == __DRI_IMAGE_BUFFER_FRONT && > > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > index d002546..74db507 100644 > > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > @@ -908,7 +908,7 @@ intel_miptree_create_for_image(struct brw_context > > > *intel, > > > * that will contain the actual rendering (which is lazily resolved to > > > * irb->singlesample_mt). > > > */ > > > -void > > > +int > > > > We don't seem to use "zero for success"-style at least in i965. Could you > > change this to bool and flip the check earlier for consistency? > > > > What do you mean by flip the check earlier? Just the intel_update_winsys_renderbuffer_miptree() return value check, sorry for confusing (that is obvious of course without saying). > > > > intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, > > > struct intel_renderbuffer *irb, > > > drm_intel_bo *bo, > > > @@ -974,12 +974,12 @@ intel_update_winsys_renderbuffer_miptree(struct > > > brw_context *intel, > > > irb->mt = multisample_mt; > > >} > > > } > > > - return; > > > + return 0; > > > > > > fail: > > > intel_miptree_release(&irb->singlesample_mt); > > > intel_miptree_release(&irb->mt); > > > - return; > > > + return -1; > > > } > > > > > > struct intel_mipmap_tree* > > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > > index 7b9a7be..85fe118 100644 > > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > > @@ -726,7 +726,7 @@ intel_miptree_create_for_image(struct brw_context > > > *intel, > > > uint32_t pitch, > > > uint32_t layout_flags); > > > > > > -void > > > +int > > > intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, > > > struct intel_renderbuffer *irb, > > > drm_intel_bo *bo, > > > -- > > > 2.10.2 > > > > > > ___ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/me