Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf

2017-01-01 Thread Vladislav Egorov



01.01.2017 06:41, Kenneth Graunke пишет:

On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:

From: Marek Olšák 

This reduces compile times by 4.5% with the Gallium noop driver and
gl_constants::GLSLOptimizeConservatively == true.

Compile times of...what exactly?  Do you have any statistics for this
by itself?

Assuming we add your helper, this patch looks reasonable.
Reviewed-by: Kenneth Graunke 

BTW, I suspect you could get some additional speed up by changing

parser->output = ralloc_strdup(parser, "");

to something like:

parser->output = ralloc_size(parser, strlen(orig_concatenated_src));
parser->output[0] = '\0';

to try and avoid reallocations.  rewrite_tail will realloc just enough
space every time it allocates, which means once you reallocate, you're
going to be calling realloc on every single token.  Yuck!

ralloc/talloc's string libraries were never meant for serious string
processing like the preprocessor does.  They're meant for convenience
when constructing debug messages which don't need to be that efficient.

Perhaps a better approach would be to have the preprocessor do this
itself.  Just ralloc_size() output and initialize the null byte.
reralloc to double the size if you need more space.  At the end of
preprocessing, reralloc to output_length at the end of free any waste
from doubling.

I suspect that would be a *lot* more efficient, and is probably what
we should have done in the first place...
I have similar patch (maybe need 1-2 days to clean it up), and I've 
tested both variants. String in exponentially growing (by +50%) string 
buffer works better, but not *THAT* much better as I expected. It seems 
that in the sequence of str = realloc(str, 1001); str = realloc(str, 
1002); str = realloc(str, 1003), etc. most of reallocs will be 
non-moving in both glibc's allocator and jemalloc. For example, jemalloc 
have size classes that already grow exponentially by 15-25% - ..., 4K, 
5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ...  realloc 
will just test if the requested size belongs to the same size class and 
do nothing. Reallocs inside of the same size class will be always 
non-moving and almost free. Overall avoiding formatted printing (DOUBLE 
formatted printing, which is entirely avoidable too) gives the single 
largest boost to the pre-processor.


Benchmark on my shader-db (glcpp and shader-db's run smashed together to 
do only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, 
which can be important, because jemalloc changed its size class strategy 
since then.

perf stat --repeat 10
master8.91s
master+jemalloc   8.60s
Marek's patch 5.50s
Marek's patch+jemalloc5.03s
my string_buffer  4.57s
my string_buffer+jemalloc 4.43s
my series 3.83s
my series+jemalloc3.68s

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf

2017-01-01 Thread Marek Olšák
On Jan 1, 2017 4:41 AM, "Kenneth Graunke"  wrote:

On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:
> From: Marek Olšák 
>
> This reduces compile times by 4.5% with the Gallium noop driver and
> gl_constants::GLSLOptimizeConservatively == true.

Compile times of...what exactly?  Do you have any statistics for this
by itself?


I always use my shader-db with almost 30k shaders and I used the time
command.


Assuming we add your helper, this patch looks reasonable.
Reviewed-by: Kenneth Graunke 

BTW, I suspect you could get some additional speed up by changing

   parser->output = ralloc_strdup(parser, "");

to something like:

   parser->output = ralloc_size(parser, strlen(orig_concatenated_src));
   parser->output[0] = '\0';

to try and avoid reallocations.  rewrite_tail will realloc just enough
space every time it allocates, which means once you reallocate, you're
going to be calling realloc on every single token.  Yuck!

ralloc/talloc's string libraries were never meant for serious string
processing like the preprocessor does.  They're meant for convenience
when constructing debug messages which don't need to be that efficient.

Perhaps a better approach would be to have the preprocessor do this
itself.  Just ralloc_size() output and initialize the null byte.
reralloc to double the size if you need more space.  At the end of
preprocessing, reralloc to output_length at the end of free any waste
from doubling.

I suspect that would be a *lot* more efficient, and is probably what
we should have done in the first place...


Sure. However, realloc is a no-op when there is free space after the
allocation. It's a useless call, but it doesn't look too bad in a profiler.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf

2017-01-01 Thread Marek Olšák
On Jan 1, 2017 11:55 AM, "Vladislav Egorov"  wrote:



01.01.2017 06:41, Kenneth Graunke пишет:

On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:
>
>> From: Marek Olšák 
>>
>> This reduces compile times by 4.5% with the Gallium noop driver and
>> gl_constants::GLSLOptimizeConservatively == true.
>>
> Compile times of...what exactly?  Do you have any statistics for this
> by itself?
>
> Assuming we add your helper, this patch looks reasonable.
> Reviewed-by: Kenneth Graunke 
>
> BTW, I suspect you could get some additional speed up by changing
>
> parser->output = ralloc_strdup(parser, "");
>
> to something like:
>
> parser->output = ralloc_size(parser, strlen(orig_concatenated_src));
> parser->output[0] = '\0';
>
> to try and avoid reallocations.  rewrite_tail will realloc just enough
> space every time it allocates, which means once you reallocate, you're
> going to be calling realloc on every single token.  Yuck!
>
> ralloc/talloc's string libraries were never meant for serious string
> processing like the preprocessor does.  They're meant for convenience
> when constructing debug messages which don't need to be that efficient.
>
> Perhaps a better approach would be to have the preprocessor do this
> itself.  Just ralloc_size() output and initialize the null byte.
> reralloc to double the size if you need more space.  At the end of
> preprocessing, reralloc to output_length at the end of free any waste
> from doubling.
>
> I suspect that would be a *lot* more efficient, and is probably what
> we should have done in the first place...
>
I have similar patch (maybe need 1-2 days to clean it up), and I've tested
both variants. String in exponentially growing (by +50%) string buffer
works better, but not *THAT* much better as I expected. It seems that in
the sequence of str = realloc(str, 1001); str = realloc(str, 1002); str =
realloc(str, 1003), etc. most of reallocs will be non-moving in both
glibc's allocator and jemalloc. For example, jemalloc have size classes
that already grow exponentially by 15-25% - ..., 4K, 5K, 6K, 7K, 8K, 10K,
12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ...  realloc will just test if the
requested size belongs to the same size class and do nothing. Reallocs
inside of the same size class will be always non-moving and almost free.
Overall avoiding formatted printing (DOUBLE formatted printing, which is
entirely avoidable too) gives the single largest boost to the pre-processor.

Benchmark on my shader-db (glcpp and shader-db's run smashed together to do
only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, which
can be important, because jemalloc changed its size class strategy since
then.
perf stat --repeat 10
master8.91s
master+jemalloc   8.60s
Marek's patch 5.50s
Marek's patch+jemalloc5.03s
my string_buffer  4.57s
my string_buffer+jemalloc 4.43s
my series 3.83s
my series+jemalloc3.68s


Since you are further than me, let's merge your work instead.

Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.

2017-01-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99240

Bug ID: 99240
   Summary: glxinfo reports different extensions for wayland vs
xorg.
   Product: Mesa
   Version: 13.0
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: pandiculationfi...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 128704
  --> https://bugs.freedesktop.org/attachment.cgi?id=128704&action=edit
wayland glxinfo output

glxinfo reports different opengl extensions under wayland when compared to
xorg.

seems to possibly cause stellaris to fail to launch. I'm not sure if this is a
wayland bug, or a mesa bug, maybe even a SDL bug?

wayland stellaris errors:
==> system.log <==
[08:53:21][gfx_opengl.cpp:3187]: Loading settings for adapter -1, fullscreen=1,
multisample=15
[08:53:21][gfx_opengl.cpp:3257]: Checking for multi-sampling support:
[08:53:21][gfx_opengl.cpp:3282]: MaxTextureWidth: 16384
[08:53:21][gfx_opengl.cpp:3283]: MaxTextureHeight: 16384
[08:53:21][graphicssettings.cpp:149]:   2 samples supported.
[08:53:21][graphicssettings.cpp:149]:   4 samples supported.
[08:53:21][graphicssettings.cpp:149]:   8 samples supported.
[08:53:21][graphicssettings.cpp:180]: Support for Anisotropic filtering found.
[08:53:21][graphicssettings.cpp:183]:   Max Anisotropic filtering supported: 16
[08:53:21][graphicssettings.cpp:184]:   Anisotropic filtering set to: 16
[08:53:21][graphicssettings.cpp:669]: Using multisampling: 4
[08:53:21][graphicssettings.cpp:669]: Using multisampling: 4
[08:53:21][graphics.cpp:762]: Creating device with: HARDWARE_VERTEXPROCESSING
Fullscreen=no
Resolution=0x0

==> error.log <==
[08:53:21][gfx_opengl.cpp:1066]: SDL_GetWindowFromID failed: Invalid window
[08:53:21][gfx_opengl.cpp:1077]: SDL_GL_CreateContext failed: Invalid window
[08:53:21][gfx_opengl.cpp:]: glewInit failed: Missing GL version

==> system.log <==
[08:53:21][gfx_opengl.cpp:1028]:
OpenGL Version: [08:53:21][graphics.cpp:797]: Done creating device.

==> error.log <==
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/text.shader(Text)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/text.shader(Text3D)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Up)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Down)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Disable)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Over)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(TextUp)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(TextDown)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(TextDisable)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(TextOver)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Up)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Down)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Disable)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Over)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Up)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Down)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Disable)
[08:53:21][pdxshaderparser.cpp:1362]: Failed linking shader
gfx/FX/buttonstate_onlydisable.shader(Over)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.

2017-01-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99240

--- Comment #1 from pandiculationfi...@gmail.com ---
Created attachment 128705
  --> https://bugs.freedesktop.org/attachment.cgi?id=128705&action=edit
xorg glxinfo output

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99240] glxinfo reports different extensions for wayland vs xorg.

2017-01-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99240

--- Comment #2 from pandiculationfi...@gmail.com ---
diff of the first 50 or so lines of glxinfo.
diff wayland-glxinfo.txt xorg-glxinfo.txt | head -50
7,8c7,9
< GLX_ARB_create_context, GLX_ARB_create_context_profile,
< GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample,
---
> GLX_ARB_create_context, GLX_ARB_create_context_profile, 
> GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float,
> GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample,
12,14c13,15
< GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, GLX_OML_swap_method,
< GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer,
< GLX_SGIX_visual_select_group, GLX_SGI_make_current_read
---
> GLX_EXT_visual_rating, GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer,
> GLX_OML_swap_method, GLX_SGIS_multisample, GLX_SGIX_fbconfig,
> GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group, GLX_SGI_swap_control
34,39c35,41
< GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB,
< GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_buffer_age,
< GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile,
< GLX_EXT_fbconfig_packed_float, GLX_EXT_framebuffer_sRGB,
< GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info,
< GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer,
---
> GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float,
> GLX_ARB_framebuffer_sRGB, GLX_ARB_get_proc_address, GLX_ARB_multisample,
> GLX_EXT_buffer_age, GLX_EXT_create_context_es2_profile,
> GLX_EXT_create_context_es_profile, GLX_EXT_fbconfig_packed_float,
> GLX_EXT_framebuffer_sRGB, GLX_EXT_import_context,
> GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, GLX_EXT_visual_rating,
> GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer,
44c46
< GLX_SGI_video_sync
---
> GLX_SGI_swap_control, GLX_SGI_video_sync
50c52
< Video memory: 7994MB
---
> Video memory: 8039MB
300c302
< 192 GLX Visuals
---
> 480 GLX Visuals

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/14] glsl: Don't rehash the values when copying to new table

2017-01-01 Thread Thomas Helland
Really, we should have some kind of function for copying the whole
table, but this will work for now.
---
 src/compiler/glsl/opt_copy_propagation.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation.cpp 
b/src/compiler/glsl/opt_copy_propagation.cpp
index 247c4988ed..e9f82e0644 100644
--- a/src/compiler/glsl/opt_copy_propagation.cpp
+++ b/src/compiler/glsl/opt_copy_propagation.cpp
@@ -210,7 +210,8 @@ ir_copy_propagation_visitor::handle_if_block(exec_list 
*instructions)
/* Populate the initial acp with a copy of the original */
struct hash_entry *entry;
hash_table_foreach(orig_acp, entry) {
-  _mesa_hash_table_insert(acp, entry->key, entry->data);
+  _mesa_hash_table_insert_pre_hashed(acp, entry->hash,
+ entry->key, entry->data);
}
 
visit_list_elements(this, instructions);
@@ -259,7 +260,8 @@ ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool 
keep_acp)
if (keep_acp) {
   struct hash_entry *entry;
   hash_table_foreach(orig_acp, entry) {
- _mesa_hash_table_insert(acp, entry->key, entry->data);
+ _mesa_hash_table_insert_pre_hashed(acp, entry->hash,
+entry->key, entry->data);
   }
}
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation

2017-01-01 Thread Thomas Helland
Hi all.

This is a series I've put together to lower the overhead of
the copy propagation pass in glsl. The series is not refined,
so please treat it as a proof of concept / RFC.

The first five patches might be merge-ready, but their benefit is
quite low compared to the actual major change here.

The series goes on to copy our existing hash table implementation,
and modify it in steps to allow insertion of multiple data entries
with the same key. I've kept the series in steps corresponding to
the way I worked when modifying it, as I found it nice for clarity.
There is some obvious cleanups and squashing that will need to happen
before this can be considered for merging.

The major change is the last patch. This uses the new hash table in
our copy propagation pass. This effectively makes the algorithm
O(2n) instead of O(n^2), which is a big win. I tested with the
shader mentioned in [1] but took the shaders and made a shader_test
that I used in conjunction with shader-db to do testing.
I cut down on the ridiculous amount of expression some, to make it
even compile in a reasonable time on my poor laptop.

So, on to the results. Compilation of the aforementioned shader
went from 98 to 75 seconds. Perf shows that previously we spent
19% of the time walking the acp-table with _mesa_hash_table_next_entry.
After this series, the hash table is practically gone from the profile.
This is a major win, and I'm guite happy with the results.
There is obviously a lot more work to be done though.

Please leave critical feedback and ideas. This was merely a test
from my part, but it seems there is some opportunities here.

https://bugs.freedesktop.org/show_bug.cgi?id=94477

Thomas Helland (14):
  glsl: Don't rehash the values when copying to new table
  glsl: Don't compute the hash when we already have it available
  nir: Remove unused include of nir_array
  nir: Move nir_array to util, and rename to dyn_array
  glsl: Use dyn_array instead of the exec_list
  util: Make a copy of the existing hash table implementation
  util: Add field for pointing to next data
  util: Implement the insertion functionality
  util: Implement deletion of entries
  util: Add a comment about ordering of entries with matching keys
  util: Implement returning the last added entry on search
  util: Rename functions and structs
  util: Use hashing functions from hash_table.h
  glsl: Restructure copy propagation to use the non replacing hash table

 src/compiler/Makefile.sources  |   1 -
 src/compiler/glsl/opt_copy_propagation.cpp | 112 +++--
 .../glsl/opt_copy_propagation_elements.cpp |   6 +-
 src/compiler/nir/nir_lower_locals_to_regs.c|   1 -
 src/compiler/spirv/vtn_cfg.c   |   6 +-
 src/compiler/spirv/vtn_private.h   |   4 +-
 src/util/Makefile.sources  |   3 +
 src/{compiler/nir/nir_array.h => util/dyn_array.h} |  18 +-
 src/util/non_replacing_hash_table.c| 513 +
 src/util/non_replacing_hash_table.h| 135 ++
 10 files changed, 743 insertions(+), 56 deletions(-)
 rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%)
 create mode 100644 src/util/non_replacing_hash_table.c
 create mode 100644 src/util/non_replacing_hash_table.h

-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/14] nir: Remove unused include of nir_array

2017-01-01 Thread Thomas Helland
---
 src/compiler/nir/nir_lower_locals_to_regs.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/compiler/nir/nir_lower_locals_to_regs.c 
b/src/compiler/nir/nir_lower_locals_to_regs.c
index f1af237b5e..d0667bc504 100644
--- a/src/compiler/nir/nir_lower_locals_to_regs.c
+++ b/src/compiler/nir/nir_lower_locals_to_regs.c
@@ -26,7 +26,6 @@
  */
 
 #include "nir.h"
-#include "nir_array.h"
 
 struct locals_to_regs_state {
nir_shader *shader;
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/14] glsl: Don't compute the hash when we already have it available

2017-01-01 Thread Thomas Helland
We should really have a function to copy the table contents,
but this will at least get us somewhere in the meantime.
---
 src/compiler/glsl/opt_copy_propagation_elements.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp 
b/src/compiler/glsl/opt_copy_propagation_elements.cpp
index 9f79fa9202..3d11d028c5 100644
--- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
+++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
@@ -143,11 +143,13 @@ public:
   struct hash_entry *entry;
 
   hash_table_foreach(lhs, entry) {
- _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
+ _mesa_hash_table_insert_pre_hashed(lhs_ht, entry->hash,
+entry->key, entry->data);
   }
 
   hash_table_foreach(rhs, entry) {
- _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
+ _mesa_hash_table_insert_pre_hashed(rhs_ht, entry->hash,
+entry->key, entry->data);
   }
}
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/14] util: Add field for pointing to next data

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/util/non_replacing_hash_table.h 
b/src/util/non_replacing_hash_table.h
index b35ee871bb..7550400539 100644
--- a/src/util/non_replacing_hash_table.h
+++ b/src/util/non_replacing_hash_table.h
@@ -42,6 +42,11 @@ struct hash_entry {
uint32_t hash;
const void *key;
void *data;
+   /*
+* If 0, no more data for this key.
+* If anything else, it is position of the next data + 1.
+*/
+   uint32_t next_data;
 };
 
 struct hash_table {
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/14] util: Make a copy of the existing hash table implementation

2017-01-01 Thread Thomas Helland
---
 src/util/Makefile.sources   |   2 +
 src/util/non_replacing_hash_table.c | 504 
 src/util/non_replacing_hash_table.h | 159 
 3 files changed, 665 insertions(+)
 create mode 100644 src/util/non_replacing_hash_table.c
 create mode 100644 src/util/non_replacing_hash_table.h

diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
index b24611a363..6d7f0884fb 100644
--- a/src/util/Makefile.sources
+++ b/src/util/Makefile.sources
@@ -20,6 +20,8 @@ MESA_UTIL_FILES :=\
macros.h \
mesa-sha1.c \
mesa-sha1.h \
+   non_replacing_hash_table.c \
+   non_replacing_hash_table.h \
ralloc.c \
ralloc.h \
register_allocate.c \
diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
new file mode 100644
index 00..1730b1ee0d
--- /dev/null
+++ b/src/util/non_replacing_hash_table.c
@@ -0,0 +1,504 @@
+/*
+ * Copyright © 2009,2012 Intel Corporation
+ * Copyright © 1988-2004 Keith Packard and Bart Massey.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Except as contained in this notice, the names of the authors
+ * or their institutions shall not be used in advertising or
+ * otherwise to promote the sale, use or other dealings in this
+ * Software without prior written authorization from the
+ * authors.
+ *
+ * Authors:
+ *Eric Anholt 
+ *Keith Packard 
+ */
+
+/**
+ * Implements an open-addressing, linear-reprobing hash table.
+ *
+ * For more information, see:
+ *
+ * http://cgit.freedesktop.org/~anholt/hash_table/tree/README
+ */
+
+#include 
+#include 
+#include 
+
+#include "non_replacing_hash_table.h"
+#include "ralloc.h"
+#include "macros.h"
+
+static const uint32_t deleted_key_value;
+
+/**
+ * From Knuth -- a good choice for hash/rehash values is p, p-2 where
+ * p and p-2 are both prime.  These tables are sized to have an extra 10%
+ * free to avoid exponential performance degradation as the hash table fills
+ */
+static const struct {
+   uint32_t max_entries, size, rehash;
+} hash_sizes[] = {
+   { 2,5,  3 },
+   { 4,7,  5 },
+   { 8,13, 11},
+   { 16,   19, 17},
+   { 32,   43, 41},
+   { 64,   73, 71},
+   { 128,  151,149   },
+   { 256,  283,281   },
+   { 512,  571,569   },
+   { 1024, 1153,   1151  },
+   { 2048, 2269,   2267  },
+   { 4096, 4519,   4517  },
+   { 8192, 9013,   9011  },
+   { 16384,18043,  18041 },
+   { 32768,36109,  36107 },
+   { 65536,72091,  72089 },
+   { 131072,   144409, 144407},
+   { 262144,   288361, 288359},
+   { 524288,   576883, 576881},
+   { 1048576,  1153459,1153457   },
+   { 2097152,  2307163,2307161   },
+   { 4194304,  4613893,4613891   },
+   { 8388608,  9227641,9227639   },
+   { 16777216, 18455029,   18455027  },
+   { 33554432, 36911011,   36911009  },
+   { 67108864, 73819861,   73819859  },
+   { 134217728,147639589,  147639587 },
+   { 268435456,295279081,  295279079 },
+   { 536870912,590559793,  590559791 },
+   { 1073741824,   1181116273, 1181116271},
+   { 2147483648ul, 2362232233ul,   2362232231ul}
+};
+
+static int
+entry_is_free(const struct hash_entry *entry)
+{
+   return entry->key == NULL;
+}
+
+static int
+entry_

[Mesa-dev] [PATCH 05/14] glsl: Use dyn_array instead of the exec_list

2017-01-01 Thread Thomas Helland
This should give us lower memory consumption, allocation overhead,
better cache locality, and all this nice stuff.
---
 src/compiler/glsl/opt_copy_propagation.cpp | 57 +-
 1 file changed, 25 insertions(+), 32 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation.cpp 
b/src/compiler/glsl/opt_copy_propagation.cpp
index e9f82e0644..aa5e813553 100644
--- a/src/compiler/glsl/opt_copy_propagation.cpp
+++ b/src/compiler/glsl/opt_copy_propagation.cpp
@@ -38,24 +38,10 @@
 #include "ir_optimization.h"
 #include "compiler/glsl_types.h"
 #include "util/hash_table.h"
+#include "util/dyn_array.h"
 
 namespace {
 
-class kill_entry : public exec_node
-{
-public:
-   /* override operator new from exec_node */
-   DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(kill_entry)
-
-   kill_entry(ir_variable *var)
-   {
-  assert(var);
-  this->var = var;
-   }
-
-   ir_variable *var;
-};
-
 class ir_copy_propagation_visitor : public ir_hierarchical_visitor {
 public:
ir_copy_propagation_visitor()
@@ -65,7 +51,8 @@ public:
   lin_ctx = linear_alloc_parent(mem_ctx, 0);
   acp = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
 _mesa_key_pointer_equal);
-  this->kills = new(mem_ctx) exec_list;
+  this->kills = ralloc(mem_ctx, dyn_array);
+  dyn_array_init(this->kills, mem_ctx);
   killed_all = false;
}
~ir_copy_propagation_visitor()
@@ -89,10 +76,9 @@ public:
/** Hash of lhs->rhs: The available copies to propagate */
hash_table *acp;
/**
-* List of kill_entry: The variables whose values were killed in this
-* block.
+* List of the variables whose values were killed in this block.
 */
-   exec_list *kills;
+   dyn_array *kills;
 
bool progress;
 
@@ -112,12 +98,13 @@ 
ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir)
 * main() at link time, so they're irrelevant to us.
 */
hash_table *orig_acp = this->acp;
-   exec_list *orig_kills = this->kills;
+   dyn_array *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
-   this->kills = new(mem_ctx) exec_list;
+   this->kills = ralloc(mem_ctx, dyn_array);
+   dyn_array_init(this->kills, mem_ctx);
this->killed_all = false;
 
visit_list_elements(this, &ir->body);
@@ -199,12 +186,13 @@ void
 ir_copy_propagation_visitor::handle_if_block(exec_list *instructions)
 {
hash_table *orig_acp = this->acp;
-   exec_list *orig_kills = this->kills;
+   dyn_array *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
-   this->kills = new(mem_ctx) exec_list;
+   this->kills = ralloc(mem_ctx, dyn_array);
+   dyn_array_init(this->kills, mem_ctx);
this->killed_all = false;
 
/* Populate the initial acp with a copy of the original */
@@ -220,16 +208,18 @@ ir_copy_propagation_visitor::handle_if_block(exec_list 
*instructions)
   _mesa_hash_table_clear(orig_acp, NULL);
}
 
-   exec_list *new_kills = this->kills;
+   dyn_array *new_kills = this->kills;
this->kills = orig_kills;
_mesa_hash_table_destroy(acp, NULL);
this->acp = orig_acp;
this->killed_all = this->killed_all || orig_killed_all;
 
-   foreach_in_list(kill_entry, k, new_kills) {
-  kill(k->var);
+   dyn_array_foreach(new_kills, ir_variable *, var_ptr) {
+  ir_variable *var = *var_ptr;
+  kill(var);
}
 
+   dyn_array_fini(new_kills);
ralloc_free(new_kills);
 }
 
@@ -249,12 +239,13 @@ void
 ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp)
 {
hash_table *orig_acp = this->acp;
-   exec_list *orig_kills = this->kills;
+   dyn_array *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
-   this->kills = new(mem_ctx) exec_list;
+   this->kills = ralloc(mem_ctx, dyn_array);
+   dyn_array_init(this->kills, mem_ctx);
this->killed_all = false;
 
if (keep_acp) {
@@ -271,16 +262,18 @@ ir_copy_propagation_visitor::handle_loop(ir_loop *ir, 
bool keep_acp)
   _mesa_hash_table_clear(orig_acp, NULL);
}
 
-   exec_list *new_kills = this->kills;
+   dyn_array *new_kills = this->kills;
this->kills = orig_kills;
_mesa_hash_table_destroy(acp, NULL);
this->acp = orig_acp;
this->killed_all = this->killed_all || orig_killed_all;
 
-   foreach_in_list(kill_entry, k, new_kills) {
-  kill(k->var);
+   dyn_array_foreach(new_kills, ir_variable *, var_ptr) {
+  ir_variable *var = *var_ptr;
+  kill(var);
}
 
+   dyn_array_fini(new_kills);
ralloc_free(new_kills);
 }
 
@@ -320,7 +313,7 @@ ir_copy_propagation_visitor::kill(ir_variable *var)
 
/* Add the LHS varia

[Mesa-dev] [PATCH 04/14] nir: Move nir_array to util, and rename to dyn_array

2017-01-01 Thread Thomas Helland
--
We might want to merge u_vector and this array, or just replace this
implementation completely. Having two separate implementations, where
one of them has one sole user seems a bit wierd.
---
 src/compiler/Makefile.sources  |  1 -
 src/compiler/spirv/vtn_cfg.c   |  6 +++---
 src/compiler/spirv/vtn_private.h   |  4 ++--
 src/util/Makefile.sources  |  1 +
 src/{compiler/nir/nir_array.h => util/dyn_array.h} | 18 +-
 5 files changed, 15 insertions(+), 15 deletions(-)
 rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%)

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index be2215cc10..9c8f802cd2 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -178,7 +178,6 @@ NIR_GENERATED_FILES = \
 NIR_FILES = \
nir/nir.c \
nir/nir.h \
-   nir/nir_array.h \
nir/nir_builder.h \
nir/nir_clone.c \
nir/nir_constant_expressions.h \
diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 62b9056990..14255798cb 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -183,7 +183,7 @@ vtn_add_case(struct vtn_builder *b, struct vtn_switch 
*swtch,
   list_inithead(&c->body);
   c->start_block = case_block;
   c->fallthrough = NULL;
-  nir_array_init(&c->values, b);
+  dyn_array_init(&c->values, b);
   c->is_default = false;
   c->visited = false;
 
@@ -195,7 +195,7 @@ vtn_add_case(struct vtn_builder *b, struct vtn_switch 
*swtch,
if (is_default) {
   case_block->switch_case->is_default = true;
} else {
-  nir_array_add(&case_block->switch_case->values, uint32_t, val);
+  dyn_array_add(&case_block->switch_case->values, uint32_t, val);
}
 }
 
@@ -722,7 +722,7 @@ vtn_emit_cf_list(struct vtn_builder *b, struct list_head 
*cf_list,
 }
 
 nir_ssa_def *cond = NULL;
-nir_array_foreach(&cse->values, uint32_t, val) {
+dyn_array_foreach(&cse->values, uint32_t, val) {
nir_ssa_def *is_val =
   nir_ieq(&b->nb, sel, nir_imm_int(&b->nb, *val));
 
diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 9302611803..9e0a2137e2 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -27,7 +27,7 @@
 
 #include "nir/nir.h"
 #include "nir/nir_builder.h"
-#include "nir/nir_array.h"
+#include "util/dyn_array.h"
 #include "nir_spirv.h"
 #include "spirv.h"
 
@@ -112,7 +112,7 @@ struct vtn_case {
struct vtn_case *fallthrough;
 
/* The uint32_t values that map to this case */
-   nir_array values;
+   dyn_array values;
 
/* True if this is the default case */
bool is_default;
diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
index d2ae5e734d..b24611a363 100644
--- a/src/util/Makefile.sources
+++ b/src/util/Makefile.sources
@@ -8,6 +8,7 @@ MESA_UTIL_FILES :=  \
debug.h \
disk_cache.c \
disk_cache.h \
+   dyn_array.h \
format_r11g11b10f.h \
format_rgb9e5.h \
format_srgb.h \
diff --git a/src/compiler/nir/nir_array.h b/src/util/dyn_array.h
similarity index 86%
rename from src/compiler/nir/nir_array.h
rename to src/util/dyn_array.h
index 1db4e8cea3..8a06990ab4 100644
--- a/src/compiler/nir/nir_array.h
+++ b/src/util/dyn_array.h
@@ -36,10 +36,10 @@ typedef struct {
size_t size;
size_t alloc;
void *data;
-} nir_array;
+} dyn_array;
 
 static inline void
-nir_array_init(nir_array *arr, void *mem_ctx)
+dyn_array_init(dyn_array *arr, void *mem_ctx)
 {
arr->mem_ctx = mem_ctx;
arr->size = 0;
@@ -48,7 +48,7 @@ nir_array_init(nir_array *arr, void *mem_ctx)
 }
 
 static inline void
-nir_array_fini(nir_array *arr)
+dyn_array_fini(dyn_array *arr)
 {
if (arr->mem_ctx)
   ralloc_free(arr->data);
@@ -56,18 +56,18 @@ nir_array_fini(nir_array *arr)
   free(arr->data);
 }
 
-#define NIR_ARRAY_INITIAL_SIZE 64
+#define DYN_ARRAY_INITIAL_SIZE 64
 
 /* Increments the size of the array by the given ammount and returns a
  * pointer to the beginning of the newly added space.
  */
 static inline void *
-nir_array_grow(nir_array *arr, size_t additional)
+dyn_array_grow(dyn_array *arr, size_t additional)
 {
size_t new_size = arr->size + additional;
if (new_size > arr->alloc) {
   if (arr->alloc == 0)
- arr->alloc = NIR_ARRAY_INITIAL_SIZE;
+ arr->alloc = DYN_ARRAY_INITIAL_SIZE;
 
   while (new_size > arr->alloc)
  arr->alloc *= 2;
@@ -84,10 +84,10 @@ nir_array_grow(nir_array *arr, size_t additional)
return ptr;
 }
 
-#define nir_array_add(arr, type, elem) \
-   *(type *)nir_array_grow(arr, sizeof(type)) = (elem)
+#define dyn_array_add(arr, type, elem) \
+   *(type *)dyn_array_grow(arr, sizeof(type)) = (elem)
 
-#define nir_array_foreach(arr, type, elem) \
+#define dyn_array_foreac

[Mesa-dev] [PATCH 10/14] util: Add a comment about ordering of entries with matching keys

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index 33c93452c4..1128388ab8 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -34,10 +34,11 @@
 
 /**
  * Implements an open-addressing, linear-reprobing hash table.
- *
- * For more information, see:
- *
- * http://cgit.freedesktop.org/~anholt/hash_table/tree/README
+ * This hash table does not implement replacement. Instead it
+ * allows for inserting multiple data entries for a single key.
+ * It is important to note that the table does not guarantee that
+ * the order of data with matching keys will not be scrambled
+ * during a rehashing of the table during a resize.
  */
 
 #include 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/14] util: Implement returning the last added entry on search

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index 1128388ab8..48f42aa7c9 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -222,7 +222,23 @@ hash_table_search(struct hash_table *ht, uint32_t hash, 
const void *key)
  return NULL;
   } else if (entry_is_present(ht, entry) && entry->hash == hash) {
  if (ht->key_equals_function(key, entry->key)) {
-return entry;
+struct hash_entry *previous_entry = entry;
+
+/* Follow the chain of matching key's to the end */
+while (previous_entry->next_data != 0 ) {
+   entry = ht->table + previous_entry->next_data;
+   /* We might have deleted the last entry, and have a
+* dangling reference, so check that the keys match
+* before we continue on traversing through the chain.
+* If the key's don't match we return the previous entry.
+*/
+   if (!ht->key_equals_function(key, entry->key))
+  return previous_entry;
+
+   previous_entry = entry;
+}
+
+return previous_entry;
  }
   }
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 13/14] util: Use hashing functions from hash_table.h

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 47 -
 src/util/non_replacing_hash_table.h | 35 +--
 2 files changed, 1 insertion(+), 81 deletions(-)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index ecec94e2d6..64dc5c9c18 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -511,50 +511,3 @@ _mesa_non_replacing_hash_table_random_entry(struct 
non_replacing_hash_table *ht,
 
return NULL;
 }
-
-
-/**
- * Quick FNV-1a hash implementation based on:
- * http://www.isthe.com/chongo/tech/comp/fnv/
- *
- * FNV-1a is not be the best hash out there -- Jenkins's lookup3 is supposed
- * to be quite good, and it probably beats FNV.  But FNV has the advantage
- * that it involves almost no code.  For an improvement on both, see Paul
- * Hsieh's http://www.azillionmonkeys.com/qed/hash.html
- */
-uint32_t
-_mesa_hash_data(const void *data, size_t size)
-{
-   return _mesa_fnv32_1a_accumulate_block(_mesa_fnv32_1a_offset_bias,
-  data, size);
-}
-
-/** FNV-1a string hash implementation */
-uint32_t
-_mesa_hash_string(const char *key)
-{
-   uint32_t hash = _mesa_fnv32_1a_offset_bias;
-
-   while (*key != 0) {
-  hash = _mesa_fnv32_1a_accumulate(hash, *key);
-  key++;
-   }
-
-   return hash;
-}
-
-/**
- * String compare function for use as the comparison callback in
- * _mesa_non_replacing_hash_table_create().
- */
-bool
-_mesa_key_string_equal(const void *a, const void *b)
-{
-   return strcmp(a, b) == 0;
-}
-
-bool
-_mesa_key_pointer_equal(const void *a, const void *b)
-{
-   return a == b;
-}
diff --git a/src/util/non_replacing_hash_table.h 
b/src/util/non_replacing_hash_table.h
index f047ad5c11..e8a29635f6 100644
--- a/src/util/non_replacing_hash_table.h
+++ b/src/util/non_replacing_hash_table.h
@@ -33,6 +33,7 @@
 #include 
 #include "c99_compat.h"
 #include "macros.h"
+#include "util/hash_table.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -101,40 +102,6 @@ struct non_replacing_hash_entry *
 _mesa_non_replacing_hash_table_random_entry(struct non_replacing_hash_table 
*ht,
   bool (*predicate)(struct 
non_replacing_hash_entry *entry));
 
-uint32_t _mesa_hash_data(const void *data, size_t size);
-uint32_t _mesa_hash_string(const char *key);
-bool _mesa_key_string_equal(const void *a, const void *b);
-bool _mesa_key_pointer_equal(const void *a, const void *b);
-
-static inline uint32_t _mesa_key_hash_string(const void *key)
-{
-   return _mesa_hash_string((const char *)key);
-}
-
-static inline uint32_t _mesa_hash_pointer(const void *pointer)
-{
-   return _mesa_hash_data(&pointer, sizeof(pointer));
-}
-
-enum {
-   _mesa_fnv32_1a_offset_bias = 2166136261u,
-};
-
-static inline uint32_t
-_mesa_fnv32_1a_accumulate_block(uint32_t hash, const void *data,
-size_t size)
-{
-   const uint8_t *bytes = (const uint8_t *)data;
-
-   while (size-- != 0) {
-  hash ^= *bytes;
-  hash = hash * 0x01000193;
-  bytes++;
-   }
-
-   return hash;
-}
-
 #define _mesa_fnv32_1a_accumulate(hash, expr) \
_mesa_fnv32_1a_accumulate_block(hash, &(expr), sizeof(expr))
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/14] util: Implement the insertion functionality

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 41 +++--
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index 1730b1ee0d..a4c57e9349 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -317,12 +317,14 @@ hash_table_insert(struct hash_table *ht, uint32_t hash,
 break;
   }
 
-  /* Implement replacement when another insert happens
-   * with a matching key.  This is a relatively common
-   * feature of hash tables, with the alternative
-   * generally being "insert the new value as well, and
-   * return it first when the key is searched for".
-   *
+  /* When another insert happens with a matching key,
+   * we check if there is a chain of matchin keys.
+   * If there is, we follow this chain until the end.
+   * We then do a linear probe until we find an empty spot.
+   * We insert the entry, and update the entry at our past
+   * matching key's position to point to the positon of
+   * the newly added data.
+   * 
* Note that the hash table doesn't have a delete
* callback.  If freeing of old data pointers is
* required to avoid memory leaks, perform a search
@@ -331,8 +333,35 @@ hash_table_insert(struct hash_table *ht, uint32_t hash,
   if (!entry_is_deleted(ht, entry) &&
   entry->hash == hash &&
   ht->key_equals_function(key, entry->key)) {
+ struct hash_entry *old_entry = entry;
+
+ /* Follow the chain of matching key's to the end */
+ while (old_entry->next_data != 0) {
+entry = ht->table + old_entry->next_data;
+/* We might have deleted the last entry, and have a
+ * dangling reference, so check that the keys match
+ * before we continue on traversing through the chain.
+ * If the key's don't match we undo our change of entry.
+ */
+if (!ht->key_equals_function(key, entry->key)) {
+   entry = old_entry;
+   continue;
+}
+old_entry = entry;
+ }
+
+ /* Do a linear search for an empty spot in the table */
+ while (!entry_is_free(entry)) {
+entry = ht->table + ((entry - ht->table + 1) % ht->size);
+ }
+
+ /* Now that we have an empty spot we add our data */
  entry->key = key;
  entry->data = data;
+ ht->entries++;
+
+ /* And update the matching key chain */
+ old_entry->next_data = entry - ht->table;
  return entry;
   }
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 14/14] glsl: Restructure copy propagation to use the non replacing hash table

2017-01-01 Thread Thomas Helland
This way we can build one hash table in each direction.
This allows us to move the algorithm from O(n^2) to O(2*n).
The downside is an effective doubling of memory consumption
---
 src/compiler/glsl/opt_copy_propagation.cpp | 49 +++---
 1 file changed, 45 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation.cpp 
b/src/compiler/glsl/opt_copy_propagation.cpp
index aa5e813553..34e59b10ac 100644
--- a/src/compiler/glsl/opt_copy_propagation.cpp
+++ b/src/compiler/glsl/opt_copy_propagation.cpp
@@ -38,6 +38,7 @@
 #include "ir_optimization.h"
 #include "compiler/glsl_types.h"
 #include "util/hash_table.h"
+#include "util/non_replacing_hash_table.h"
 #include "util/dyn_array.h"
 
 namespace {
@@ -51,6 +52,8 @@ public:
   lin_ctx = linear_alloc_parent(mem_ctx, 0);
   acp = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
 _mesa_key_pointer_equal);
+  rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, 
_mesa_hash_pointer,
+  _mesa_key_pointer_equal);
   this->kills = ralloc(mem_ctx, dyn_array);
   dyn_array_init(this->kills, mem_ctx);
   killed_all = false;
@@ -75,6 +78,13 @@ public:
 
/** Hash of lhs->rhs: The available copies to propagate */
hash_table *acp;
+
+   /**
+* Hash table with possibility for multiple data per key.
+* This is used as an inverse of the acp table, to make the
+* algorithm O(n) instead of O(n^2).
+*/
+   non_replacing_hash_table *rev_acp;
/**
 * List of the variables whose values were killed in this block.
 */
@@ -98,11 +108,14 @@ 
ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir)
 * main() at link time, so they're irrelevant to us.
 */
hash_table *orig_acp = this->acp;
+   non_replacing_hash_table *orig_rev_acp = this->rev_acp;
dyn_array *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
+   rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
this->kills = ralloc(mem_ctx, dyn_array);
dyn_array_init(this->kills, mem_ctx);
this->killed_all = false;
@@ -110,10 +123,12 @@ 
ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir)
visit_list_elements(this, &ir->body);
 
_mesa_hash_table_destroy(acp, NULL);
+   _mesa_non_replacing_hash_table_destroy(rev_acp, NULL);
ralloc_free(this->kills);
 
this->kills = orig_kills;
this->acp = orig_acp;
+   this->rev_acp = orig_rev_acp;
this->killed_all = orig_killed_all;
 
return visit_continue_with_parent;
@@ -177,6 +192,7 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir)
 * this call.  So kill all copies.
 */
_mesa_hash_table_clear(acp, NULL);
+   _mesa_non_replacing_hash_table_clear(rev_acp, NULL);
this->killed_all = true;
 
return visit_continue_with_parent;
@@ -186,11 +202,14 @@ void
 ir_copy_propagation_visitor::handle_if_block(exec_list *instructions)
 {
hash_table *orig_acp = this->acp;
+   non_replacing_hash_table *orig_rev_acp = this->rev_acp;
dyn_array *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
acp = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
+   rev_acp = _mesa_non_replacing_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
this->kills = ralloc(mem_ctx, dyn_array);
dyn_array_init(this->kills, mem_ctx);
this->killed_all = false;
@@ -201,17 +220,25 @@ ir_copy_propagation_visitor::handle_if_block(exec_list 
*instructions)
   _mesa_hash_table_insert_pre_hashed(acp, entry->hash,
  entry->key, entry->data);
}
+   struct non_replacing_hash_entry *nr_entry;
+   non_replacing_hash_table_foreach(orig_rev_acp, nr_entry) {
+  _mesa_non_replacing_hash_table_insert_pre_hashed(rev_acp, nr_entry->hash,
+   nr_entry->key, 
nr_entry->data);
+   }
 
visit_list_elements(this, instructions);
 
if (this->killed_all) {
   _mesa_hash_table_clear(orig_acp, NULL);
+  _mesa_non_replacing_hash_table_clear(orig_rev_acp, NULL);
}
 
dyn_array *new_kills = this->kills;
this->kills = orig_kills;
_mesa_hash_table_destroy(acp, NULL);
+   _mesa_non_replacing_hash_table_destroy(rev_acp, NULL);
this->acp = orig_acp;
+   this->rev_acp = orig_rev_acp;
this->killed_all = this->killed_all || orig_killed_all;
 
dyn_array_foreach(new_kills, ir_variable *, var_ptr) {
@@ -239,11 +266,14 @@ void
 ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp)
 {
hash_table *orig_acp = thi

[Mesa-dev] [PATCH 09/14] util: Implement deletion of entries

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index a4c57e9349..33c93452c4 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -183,6 +183,7 @@ _mesa_hash_table_clear(struct hash_table *ht,
  delete_function(entry);
 
   entry->key = NULL;
+  entry->next_data = 0;
}
 
ht->entries = 0;
@@ -413,6 +414,10 @@ _mesa_hash_table_insert_pre_hashed(struct hash_table *ht, 
uint32_t hash,
  *
  * Note that deletion doesn't otherwise modify the table, so an iteration over
  * the table deleting entries is safe.
+ *
+ * IMPORTANT: The table only allows deleting the last entry of a chain of
+ * entries with matching keys. If an entry in the middle of the chain is 
deleted
+ * the chain will be broken, and the table will have unpredictable results.
  */
 void
 _mesa_hash_table_remove(struct hash_table *ht,
@@ -424,6 +429,7 @@ _mesa_hash_table_remove(struct hash_table *ht,
entry->key = ht->deleted_key;
ht->entries--;
ht->deleted_entries++;
+   entry->next_data = 0;
 }
 
 /**
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/14] util: Rename functions and structs

2017-01-01 Thread Thomas Helland
---
 src/util/non_replacing_hash_table.c | 112 +++-
 src/util/non_replacing_hash_table.h | 102 
 2 files changed, 111 insertions(+), 103 deletions(-)

diff --git a/src/util/non_replacing_hash_table.c 
b/src/util/non_replacing_hash_table.c
index 48f42aa7c9..ecec94e2d6 100644
--- a/src/util/non_replacing_hash_table.c
+++ b/src/util/non_replacing_hash_table.c
@@ -93,32 +93,32 @@ static const struct {
 };
 
 static int
-entry_is_free(const struct hash_entry *entry)
+entry_is_free(const struct non_replacing_hash_entry *entry)
 {
return entry->key == NULL;
 }
 
 static int
-entry_is_deleted(const struct hash_table *ht, struct hash_entry *entry)
+entry_is_deleted(const struct non_replacing_hash_table *ht, struct 
non_replacing_hash_entry *entry)
 {
return entry->key == ht->deleted_key;
 }
 
 static int
-entry_is_present(const struct hash_table *ht, struct hash_entry *entry)
+entry_is_present(const struct non_replacing_hash_table *ht, struct 
non_replacing_hash_entry *entry)
 {
return entry->key != NULL && entry->key != ht->deleted_key;
 }
 
-struct hash_table *
-_mesa_hash_table_create(void *mem_ctx,
+struct non_replacing_hash_table *
+_mesa_non_replacing_hash_table_create(void *mem_ctx,
 uint32_t (*key_hash_function)(const void *key),
 bool (*key_equals_function)(const void *a,
 const void *b))
 {
-   struct hash_table *ht;
+   struct non_replacing_hash_table *ht;
 
-   ht = ralloc(mem_ctx, struct hash_table);
+   ht = ralloc(mem_ctx, struct non_replacing_hash_table);
if (ht == NULL)
   return NULL;
 
@@ -128,7 +128,7 @@ _mesa_hash_table_create(void *mem_ctx,
ht->max_entries = hash_sizes[ht->size_index].max_entries;
ht->key_hash_function = key_hash_function;
ht->key_equals_function = key_equals_function;
-   ht->table = rzalloc_array(ht, struct hash_entry, ht->size);
+   ht->table = rzalloc_array(ht, struct non_replacing_hash_entry, ht->size);
ht->entries = 0;
ht->deleted_entries = 0;
ht->deleted_key = &deleted_key_value;
@@ -148,16 +148,16 @@ _mesa_hash_table_create(void *mem_ctx,
  * freeing.
  */
 void
-_mesa_hash_table_destroy(struct hash_table *ht,
- void (*delete_function)(struct hash_entry *entry))
+_mesa_non_replacing_hash_table_destroy(struct non_replacing_hash_table *ht,
+ void (*delete_function)(struct 
non_replacing_hash_entry *entry))
 {
if (!ht)
   return;
 
if (delete_function) {
-  struct hash_entry *entry;
+  struct non_replacing_hash_entry *entry;
 
-  hash_table_foreach(ht, entry) {
+  non_replacing_hash_table_foreach(ht, entry) {
  delete_function(entry);
   }
}
@@ -171,10 +171,10 @@ _mesa_hash_table_destroy(struct hash_table *ht,
  * If delete_function is passed, it gets called on each entry present.
  */
 void
-_mesa_hash_table_clear(struct hash_table *ht,
-   void (*delete_function)(struct hash_entry *entry))
+_mesa_non_replacing_hash_table_clear(struct non_replacing_hash_table *ht,
+   void (*delete_function)(struct non_replacing_hash_entry 
*entry))
 {
-   struct hash_entry *entry;
+   struct non_replacing_hash_entry *entry;
 
for (entry = ht->table; entry != ht->table + ht->size; entry++) {
   if (entry->key == NULL)
@@ -202,13 +202,14 @@ _mesa_hash_table_clear(struct hash_table *ht,
  * This must be called before any keys are actually deleted from the table.
  */
 void
-_mesa_hash_table_set_deleted_key(struct hash_table *ht, const void 
*deleted_key)
+_mesa_non_replacing_hash_table_set_deleted_key(struct non_replacing_hash_table 
*ht,
+   const void *deleted_key)
 {
ht->deleted_key = deleted_key;
 }
 
-static struct hash_entry *
-hash_table_search(struct hash_table *ht, uint32_t hash, const void *key)
+static struct non_replacing_hash_entry *
+hash_table_search(struct non_replacing_hash_table *ht, uint32_t hash, const 
void *key)
 {
uint32_t start_hash_address = hash % ht->size;
uint32_t hash_address = start_hash_address;
@@ -216,13 +217,13 @@ hash_table_search(struct hash_table *ht, uint32_t hash, 
const void *key)
do {
   uint32_t double_hash;
 
-  struct hash_entry *entry = ht->table + hash_address;
+  struct non_replacing_hash_entry *entry = ht->table + hash_address;
 
   if (entry_is_free(entry)) {
  return NULL;
   } else if (entry_is_present(ht, entry) && entry->hash == hash) {
  if (ht->key_equals_function(key, entry->key)) {
-struct hash_entry *previous_entry = entry;
+struct non_replacing_hash_entry *previous_entry = entry;
 
 /* Follow the chain of matching key's to the end */
 while (previous_entry->next_data != 0 ) {
@@ -256,35 +257,36 @@ hash_table_search(struct hash_table *ht, uint32_t ha

Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation

2017-01-01 Thread Thomas Helland
Oh, and an important disclaimer.
I have not ran this through picking, nor have I done a complete
run of shader-db for performance comparisons on more realistic workloads.
I'll see what time I can find this evening, or some time tomorrow.

On Jan 1, 2017 19:38, "Thomas Helland"  wrote:

Hi all.

This is a series I've put together to lower the overhead of
the copy propagation pass in glsl. The series is not refined,
so please treat it as a proof of concept / RFC.

The first five patches might be merge-ready, but their benefit is
quite low compared to the actual major change here.

The series goes on to copy our existing hash table implementation,
and modify it in steps to allow insertion of multiple data entries
with the same key. I've kept the series in steps corresponding to
the way I worked when modifying it, as I found it nice for clarity.
There is some obvious cleanups and squashing that will need to happen
before this can be considered for merging.

The major change is the last patch. This uses the new hash table in
our copy propagation pass. This effectively makes the algorithm
O(2n) instead of O(n^2), which is a big win. I tested with the
shader mentioned in [1] but took the shaders and made a shader_test
that I used in conjunction with shader-db to do testing.
I cut down on the ridiculous amount of expression some, to make it
even compile in a reasonable time on my poor laptop.

So, on to the results. Compilation of the aforementioned shader
went from 98 to 75 seconds. Perf shows that previously we spent
19% of the time walking the acp-table with _mesa_hash_table_next_entry.
After this series, the hash table is practically gone from the profile.
This is a major win, and I'm guite happy with the results.
There is obviously a lot more work to be done though.

Please leave critical feedback and ideas. This was merely a test
from my part, but it seems there is some opportunities here.

https://bugs.freedesktop.org/show_bug.cgi?id=94477

Thomas Helland (14):
  glsl: Don't rehash the values when copying to new table
  glsl: Don't compute the hash when we already have it available
  nir: Remove unused include of nir_array
  nir: Move nir_array to util, and rename to dyn_array
  glsl: Use dyn_array instead of the exec_list
  util: Make a copy of the existing hash table implementation
  util: Add field for pointing to next data
  util: Implement the insertion functionality
  util: Implement deletion of entries
  util: Add a comment about ordering of entries with matching keys
  util: Implement returning the last added entry on search
  util: Rename functions and structs
  util: Use hashing functions from hash_table.h
  glsl: Restructure copy propagation to use the non replacing hash table

 src/compiler/Makefile.sources  |   1 -
 src/compiler/glsl/opt_copy_propagation.cpp | 112 +++--
 .../glsl/opt_copy_propagation_elements.cpp |   6 +-
 src/compiler/nir/nir_lower_locals_to_regs.c|   1 -
 src/compiler/spirv/vtn_cfg.c   |   6 +-
 src/compiler/spirv/vtn_private.h   |   4 +-
 src/util/Makefile.sources  |   3 +
 src/{compiler/nir/nir_array.h => util/dyn_array.h} |  18 +-
 src/util/non_replacing_hash_table.c| 513
+
 src/util/non_replacing_hash_table.h| 135 ++
 10 files changed, 743 insertions(+), 56 deletions(-)
 rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%)
 create mode 100644 src/util/non_replacing_hash_table.c
 create mode 100644 src/util/non_replacing_hash_table.h

--
2.11.0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Patch for freedreno features

2017-01-01 Thread Rob Clark
On Sat, Dec 31, 2016 at 1:31 PM, Romain Failliot
 wrote:
> I tried to use git send-email but it doesn't seem to work (although
> the output says otherwise).
>
> So eventually it's simpler to just copy/paste the patch generated by
> git format-patch:

well, I was trying to not loose authorship/etc from the patch
(although tbh I've lost track of who the original author was)..
http://www.mesa3d.org/devinfo.html#submitting has some suggestions.

BR,
-R

> ---
>  docs/features.txt | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/docs/features.txt b/docs/features.txt
> index c27d521..63b45af 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions.
>  Feature Status
>  ---
> 
>
> -GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> llvmpipe, softpipe, swr
> +GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600,
> radeonsi, llvmpipe, softpipe, swr
>
>glBindFragDataLocation, glGetFragDataLocation DONE
>GL_NV_conditional_render (Conditional rendering)  DONE ()
> @@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0,
> r600, radeonsi, llvmpipe, soft
>glVertexAttribI commands  DONE
>Depth format cube texturesDONE ()
>GLX_ARB_create_context (GLX 1.4 is required)  DONE
> -  Multisample anti-aliasing DONE
> (llvmpipe (*), softpipe (*), swr (*))
> +  Multisample anti-aliasing DONE
> (freedreno (*), llvmpipe (*), softpipe (*), swr (*))
>
> -(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
> +(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample
> anti-aliasing support
>
>
> -GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> llvmpipe, softpipe, swr
> +GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600,
> radeonsi, llvmpipe, softpipe, swr
>
>Forward compatible context support/deprecations   DONE ()
>GL_ARB_draw_instanced (Instanced drawing) DONE ()
> @@ -82,34 +82,34 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0,
> r600, radeonsi, llvmpipe, soft
>
>Core/compatibility profiles   DONE
>Geometry shaders  DONE ()
> -  GL_ARB_vertex_array_bgra (BGRA vertex order)  DONE (swr)
> -  GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr)
> -  GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr)
> -  GL_ARB_provoking_vertex (Provoking vertex)DONE (swr)
> -  GL_ARB_seamless_cube_map (Seamless cubemaps)  DONE (swr)
> +  GL_ARB_vertex_array_bgra (BGRA vertex order)  DONE (freedreno, swr)
> +  GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno, swr)
> +  GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno, swr)
> +  GL_ARB_provoking_vertex (Provoking vertex)DONE (freedreno, swr)
> +  GL_ARB_seamless_cube_map (Seamless cubemaps)  DONE (freedreno, swr)
>GL_ARB_texture_multisample (Multisample textures) DONE (swr)
> -  GL_ARB_depth_clamp (Frag depth clamp) DONE (swr)
> -  GL_ARB_sync (Fence objects)   DONE (swr)
> +  GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno, swr)
> +  GL_ARB_sync (Fence objects)   DONE (freedreno, swr)
>GLX_ARB_create_context_profileDONE
>
>
>  GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> llvmpipe, softpipe
>
> -  GL_ARB_blend_func_extendedDONE (swr)
> +  GL_ARB_blend_func_extendedDONE
> (freedreno/a3xx, swr)
>GL_ARB_explicit_attrib_location   DONE (all
> drivers that support GLSL)
> -  GL_ARB_occlusion_query2   DONE (swr)
> +  GL_ARB_occlusion_query2   DONE (freedreno, swr)
>GL_ARB_sampler_objectsDONE (all drivers)
> -  GL_ARB_shader_bit_encodingDONE (swr)
> -  GL_ARB_texture_rgb10_a2ui DONE (swr)
> -  GL_ARB_texture_swizzleDONE (swr)
> +  GL_ARB_shader_bit_encodingDONE (freedreno, swr)
> +  GL_ARB_texture_rgb10_a2ui DONE (freedreno, swr)
> +  GL_ARB_texture_swizzleDONE (freedreno, swr)
>GL_ARB_timer_queryDONE (swr)
> -  GL_ARB_instanced_arrays   DONE (swr)
> -  GL_ARB_vertex_type_2_10_10_10_rev DONE (swr)
> +  GL_ARB_instan

[Mesa-dev] [PATCH] gallium/hud: add a path separator between dump directory and filename

2017-01-01 Thread Edmondo Tommasina
It's more user friendly and it avoids to write files in unexpected
places.
---
 src/gallium/auxiliary/hud/hud_context.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c 
b/src/gallium/auxiliary/hud/hud_context.c
index 779c116..cccdb26 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -872,9 +872,10 @@ hud_graph_set_dump_file(struct hud_graph *gr)
char *dump_file;
 
if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) {
-  dump_file = malloc(strlen(hud_dump_dir) + sizeof(gr->name));
+  dump_file = malloc(strlen(hud_dump_dir) + sizeof("/") + 
sizeof(gr->name));
   if (dump_file) {
  strcpy(dump_file, hud_dump_dir);
+ strcat(dump_file, "/");
  strcat(dump_file, gr->name);
  gr->fd = fopen(dump_file, "w+");
  free(dump_file);
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation

2017-01-01 Thread Thomas Helland
Darn it. There appears to be a functional change with patch 14.
My changes where meant to make no functional change,
so something is fishy. An example from the public shader-db:

cycles HURT:   ./shaders/tesseract/205.shader_test VS vec4:  54 ->
100 (85.19%)

I'll see if I can find some time to debug this during the week.
If anyone immediately sees the issue, please let me know.

I guess I should write some make-check tests for the new
hash table, as I suspect that that is where the issue is.

- Thomas
2017-01-01 20:41 GMT+01:00 Thomas Helland :
> Oh, and an important disclaimer.
> I have not ran this through picking, nor have I done a complete
> run of shader-db for performance comparisons on more realistic workloads.
> I'll see what time I can find this evening, or some time tomorrow.
>
>
> On Jan 1, 2017 19:38, "Thomas Helland"  wrote:
>
> Hi all.
>
> This is a series I've put together to lower the overhead of
> the copy propagation pass in glsl. The series is not refined,
> so please treat it as a proof of concept / RFC.
>
> The first five patches might be merge-ready, but their benefit is
> quite low compared to the actual major change here.
>
> The series goes on to copy our existing hash table implementation,
> and modify it in steps to allow insertion of multiple data entries
> with the same key. I've kept the series in steps corresponding to
> the way I worked when modifying it, as I found it nice for clarity.
> There is some obvious cleanups and squashing that will need to happen
> before this can be considered for merging.
>
> The major change is the last patch. This uses the new hash table in
> our copy propagation pass. This effectively makes the algorithm
> O(2n) instead of O(n^2), which is a big win. I tested with the
> shader mentioned in [1] but took the shaders and made a shader_test
> that I used in conjunction with shader-db to do testing.
> I cut down on the ridiculous amount of expression some, to make it
> even compile in a reasonable time on my poor laptop.
>
> So, on to the results. Compilation of the aforementioned shader
> went from 98 to 75 seconds. Perf shows that previously we spent
> 19% of the time walking the acp-table with _mesa_hash_table_next_entry.
> After this series, the hash table is practically gone from the profile.
> This is a major win, and I'm guite happy with the results.
> There is obviously a lot more work to be done though.
>
> Please leave critical feedback and ideas. This was merely a test
> from my part, but it seems there is some opportunities here.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=94477
>
> Thomas Helland (14):
>   glsl: Don't rehash the values when copying to new table
>   glsl: Don't compute the hash when we already have it available
>   nir: Remove unused include of nir_array
>   nir: Move nir_array to util, and rename to dyn_array
>   glsl: Use dyn_array instead of the exec_list
>   util: Make a copy of the existing hash table implementation
>   util: Add field for pointing to next data
>   util: Implement the insertion functionality
>   util: Implement deletion of entries
>   util: Add a comment about ordering of entries with matching keys
>   util: Implement returning the last added entry on search
>   util: Rename functions and structs
>   util: Use hashing functions from hash_table.h
>   glsl: Restructure copy propagation to use the non replacing hash table
>
>  src/compiler/Makefile.sources  |   1 -
>  src/compiler/glsl/opt_copy_propagation.cpp | 112 +++--
>  .../glsl/opt_copy_propagation_elements.cpp |   6 +-
>  src/compiler/nir/nir_lower_locals_to_regs.c|   1 -
>  src/compiler/spirv/vtn_cfg.c   |   6 +-
>  src/compiler/spirv/vtn_private.h   |   4 +-
>  src/util/Makefile.sources  |   3 +
>  src/{compiler/nir/nir_array.h => util/dyn_array.h} |  18 +-
>  src/util/non_replacing_hash_table.c| 513
> +
>  src/util/non_replacing_hash_table.h| 135 ++
>  10 files changed, 743 insertions(+), 56 deletions(-)
>  rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%)
>  create mode 100644 src/util/non_replacing_hash_table.c
>  create mode 100644 src/util/non_replacing_hash_table.h
>
> --
> 2.11.0
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 20/27] i965: Pretend that CCS modified images are two planes

2017-01-01 Thread Ben Widawsky

On 16-12-10 16:05:12, Pohjolainen, Topi wrote:

On Thu, Dec 01, 2016 at 02:10:01PM -0800, Ben Widawsky wrote:

From: Ben Widawsky 

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 91eb7ec..f40761a 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -747,7 +747,7 @@ intel_query_image(__DRIimage *image, int attrib, int *value)
case __DRI_IMAGE_ATTRIB_FOURCC:
   return intel_lookup_fourcc(image->dri_format, value);
case __DRI_IMAGE_ATTRIB_NUM_PLANES:
-  *value = 1;
+  *value = image->aux_offset ? 2: 1;
   return true;
case __DRI_IMAGE_ATTRIB_OFFSET:
   *value = image->offset;
@@ -956,9 +956,17 @@ intel_from_planar(__DRIimage *parent, int plane, void 
*loaderPrivate)
 int width, height, offset, stride, dri_format, index;
 struct intel_image_format *f;
 __DRIimage *image;
-
-if (parent == NULL || parent->planar_format == NULL)
-return NULL;
+bool is_aux = parent->aux_offset && plane == 1;
+
+if (parent == NULL || parent->planar_format == NULL) {
+   if (is_aux) {
+  offset = parent->aux_offset;
+  stride = ALIGN(parent->pitch / 32, 128);
+  dri_format = parent->dri_format;
+  goto done;
+   }
+   return NULL;
+}

 f = parent->planar_format;

@@ -972,11 +980,13 @@ intel_from_planar(__DRIimage *parent, int plane, void 
*loaderPrivate)
 offset = parent->offsets[index];
 stride = parent->strides[index];

+done:
 image = intel_allocate_image(parent->screen, dri_format, loaderPrivate);
 if (image == NULL)
return NULL;

-if (offset + height * stride > parent->bo->size) {
+if (!is_aux &&
+offset + height * stride > parent->bo->size) {


This means that parent->bo->size is set to the size of the color region and
not to the full size of color+aux. I don't think I saw yet logic in this
series doing this...

Anyway, small comment would probably prevent other people wondering about the
same thing. What do you think?



No. You caught a bug. parent->bo->size should be the full size (it's allocated
in create_image_with_modifier). Originally I had height defined as well so I
could keep the assertion. I think I should go back and set height and keep the
assertion as it was.

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 49adb1d2ff..501520bbf0 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -975,6 +975,7 @@ intel_from_planar(__DRIimage *parent, int plane, void 
*loaderPrivate)
   if (is_aux) {
  offset = parent->aux_offset;
  stride = ALIGN(parent->pitch / 32, 128);
+  height = ALIGN(DIV_ROUND_UP(parent->height, 16), 32);
  dri_format = parent->dri_format;
  goto done;
   }
@@ -998,8 +999,7 @@ done:
if (image == NULL)
   return NULL;

-if (!is_aux &&
-offset + height * stride > parent->bo->size) {
+if (offset + height * stride > parent->bo->size) {
   _mesa_warning(NULL, "intel_create_sub_image: subimage out of bounds");
   free(image);
   return NULL;


_mesa_warning(NULL, "intel_create_sub_image: subimage out of bounds");
free(image);
return NULL;
--
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation

2017-01-01 Thread Thomas Helland
So I've discovered the issue. I wasn't inserting into both tables.
So we never did any copy propagation. So that was stupid.
Now it assertion fails in _insert_pre_hashed as hashing the
key does not return the same result as the supplied hash.
I'll look more into this some time during the week.

2017-01-01 22:53 GMT+01:00 Thomas Helland :
> Darn it. There appears to be a functional change with patch 14.
> My changes where meant to make no functional change,
> so something is fishy. An example from the public shader-db:
>
> cycles HURT:   ./shaders/tesseract/205.shader_test VS vec4:  54 ->
> 100 (85.19%)
>
> I'll see if I can find some time to debug this during the week.
> If anyone immediately sees the issue, please let me know.
>
> I guess I should write some make-check tests for the new
> hash table, as I suspect that that is where the issue is.
>
> - Thomas
> 2017-01-01 20:41 GMT+01:00 Thomas Helland :
>> Oh, and an important disclaimer.
>> I have not ran this through picking, nor have I done a complete
>> run of shader-db for performance comparisons on more realistic workloads.
>> I'll see what time I can find this evening, or some time tomorrow.
>>
>>
>> On Jan 1, 2017 19:38, "Thomas Helland"  wrote:
>>
>> Hi all.
>>
>> This is a series I've put together to lower the overhead of
>> the copy propagation pass in glsl. The series is not refined,
>> so please treat it as a proof of concept / RFC.
>>
>> The first five patches might be merge-ready, but their benefit is
>> quite low compared to the actual major change here.
>>
>> The series goes on to copy our existing hash table implementation,
>> and modify it in steps to allow insertion of multiple data entries
>> with the same key. I've kept the series in steps corresponding to
>> the way I worked when modifying it, as I found it nice for clarity.
>> There is some obvious cleanups and squashing that will need to happen
>> before this can be considered for merging.
>>
>> The major change is the last patch. This uses the new hash table in
>> our copy propagation pass. This effectively makes the algorithm
>> O(2n) instead of O(n^2), which is a big win. I tested with the
>> shader mentioned in [1] but took the shaders and made a shader_test
>> that I used in conjunction with shader-db to do testing.
>> I cut down on the ridiculous amount of expression some, to make it
>> even compile in a reasonable time on my poor laptop.
>>
>> So, on to the results. Compilation of the aforementioned shader
>> went from 98 to 75 seconds. Perf shows that previously we spent
>> 19% of the time walking the acp-table with _mesa_hash_table_next_entry.
>> After this series, the hash table is practically gone from the profile.
>> This is a major win, and I'm guite happy with the results.
>> There is obviously a lot more work to be done though.
>>
>> Please leave critical feedback and ideas. This was merely a test
>> from my part, but it seems there is some opportunities here.
>>
>> https://bugs.freedesktop.org/show_bug.cgi?id=94477
>>
>> Thomas Helland (14):
>>   glsl: Don't rehash the values when copying to new table
>>   glsl: Don't compute the hash when we already have it available
>>   nir: Remove unused include of nir_array
>>   nir: Move nir_array to util, and rename to dyn_array
>>   glsl: Use dyn_array instead of the exec_list
>>   util: Make a copy of the existing hash table implementation
>>   util: Add field for pointing to next data
>>   util: Implement the insertion functionality
>>   util: Implement deletion of entries
>>   util: Add a comment about ordering of entries with matching keys
>>   util: Implement returning the last added entry on search
>>   util: Rename functions and structs
>>   util: Use hashing functions from hash_table.h
>>   glsl: Restructure copy propagation to use the non replacing hash table
>>
>>  src/compiler/Makefile.sources  |   1 -
>>  src/compiler/glsl/opt_copy_propagation.cpp | 112 +++--
>>  .../glsl/opt_copy_propagation_elements.cpp |   6 +-
>>  src/compiler/nir/nir_lower_locals_to_regs.c|   1 -
>>  src/compiler/spirv/vtn_cfg.c   |   6 +-
>>  src/compiler/spirv/vtn_private.h   |   4 +-
>>  src/util/Makefile.sources  |   3 +
>>  src/{compiler/nir/nir_array.h => util/dyn_array.h} |  18 +-
>>  src/util/non_replacing_hash_table.c| 513
>> +
>>  src/util/non_replacing_hash_table.h| 135 ++
>>  10 files changed, 743 insertions(+), 56 deletions(-)
>>  rename src/{compiler/nir/nir_array.h => util/dyn_array.h} (86%)
>>  create mode 100644 src/util/non_replacing_hash_table.c
>>  create mode 100644 src/util/non_replacing_hash_table.h
>>
>> --
>> 2.11.0
>>
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Patch for freedreno features

2017-01-01 Thread Jan Vesely
On Sun, 2017-01-01 at 16:15 -0500, Rob Clark wrote:
> On Sat, Dec 31, 2016 at 1:31 PM, Romain Failliot
>  wrote:
> > I tried to use git send-email but it doesn't seem to work (although
> > the output says otherwise).
> > 
> > So eventually it's simpler to just copy/paste the patch generated by
> > git format-patch:
> 
> well, I was trying to not loose authorship/etc from the patch
> (although tbh I've lost track of who the original author was)..

just FYI
you can use git commit --author="foo bar " to set arbitrary
authorship at commit time. works with git commit --amend as well.

Jan

> http://www.mesa3d.org/devinfo.html#submitting has some suggestions.
> 
> BR,
> -R
> 
> > ---
> >  docs/features.txt | 38 +++---
> >  1 file changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/docs/features.txt b/docs/features.txt
> > index c27d521..63b45af 100644
> > --- a/docs/features.txt
> > +++ b/docs/features.txt
> > @@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions.
> >  Feature Status
> >  ---
> > 
> > 
> > -GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> > llvmpipe, softpipe, swr
> > +GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600,
> > radeonsi, llvmpipe, softpipe, swr
> > 
> >glBindFragDataLocation, glGetFragDataLocation DONE
> >GL_NV_conditional_render (Conditional rendering)  DONE ()
> > @@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0,
> > r600, radeonsi, llvmpipe, soft
> >glVertexAttribI commands  DONE
> >Depth format cube texturesDONE ()
> >GLX_ARB_create_context (GLX 1.4 is required)  DONE
> > -  Multisample anti-aliasing DONE
> > (llvmpipe (*), softpipe (*), swr (*))
> > +  Multisample anti-aliasing DONE
> > (freedreno (*), llvmpipe (*), softpipe (*), swr (*))
> > 
> > -(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
> > +(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample
> > anti-aliasing support
> > 
> > 
> > -GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> > llvmpipe, softpipe, swr
> > +GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600,
> > radeonsi, llvmpipe, softpipe, swr
> > 
> >Forward compatible context support/deprecations   DONE ()
> >GL_ARB_draw_instanced (Instanced drawing) DONE ()
> > @@ -82,34 +82,34 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0,
> > r600, radeonsi, llvmpipe, soft
> > 
> >Core/compatibility profiles   DONE
> >Geometry shaders  DONE ()
> > -  GL_ARB_vertex_array_bgra (BGRA vertex order)  DONE (swr)
> > -  GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr)
> > -  GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr)
> > -  GL_ARB_provoking_vertex (Provoking vertex)DONE (swr)
> > -  GL_ARB_seamless_cube_map (Seamless cubemaps)  DONE (swr)
> > +  GL_ARB_vertex_array_bgra (BGRA vertex order)  DONE (freedreno, 
> > swr)
> > +  GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno, 
> > swr)
> > +  GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno, 
> > swr)
> > +  GL_ARB_provoking_vertex (Provoking vertex)DONE (freedreno, 
> > swr)
> > +  GL_ARB_seamless_cube_map (Seamless cubemaps)  DONE (freedreno, 
> > swr)
> >GL_ARB_texture_multisample (Multisample textures) DONE (swr)
> > -  GL_ARB_depth_clamp (Frag depth clamp) DONE (swr)
> > -  GL_ARB_sync (Fence objects)   DONE (swr)
> > +  GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno, 
> > swr)
> > +  GL_ARB_sync (Fence objects)   DONE (freedreno, 
> > swr)
> >GLX_ARB_create_context_profileDONE
> > 
> > 
> >  GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi,
> > llvmpipe, softpipe
> > 
> > -  GL_ARB_blend_func_extendedDONE (swr)
> > +  GL_ARB_blend_func_extendedDONE
> > (freedreno/a3xx, swr)
> >GL_ARB_explicit_attrib_location   DONE (all
> > drivers that support GLSL)
> > -  GL_ARB_occlusion_query2   DONE (swr)
> > +  GL_ARB_occlusion_query2   DONE (freedreno, 
> > swr)
> >GL_ARB_sampler_objectsDONE (all drivers)
> > -  GL_ARB_shader_bit_encodingDONE (swr)
> > -  GL_ARB_texture_rgb10_a2ui DONE (swr)
> > -  GL_ARB_texture_swizzleDONE (swr)
> > +  GL_ARB_shader_bit_encoding

Re: [Mesa-dev] [PATCH 00/14] [RFC] Optimizations on copy propagation

2017-01-01 Thread Thomas Helland
2017-01-01 23:20 GMT+01:00 Thomas Helland :
> So I've discovered the issue. I wasn't inserting into both tables.
> So we never did any copy propagation. So that was stupid.
> Now it assertion fails in _insert_pre_hashed as hashing the
> key does not return the same result as the supplied hash.
> I'll look more into this some time during the week.
>

Couldn't sleep until I'd found the issue. I was not setting the
hash when hitting the new insertion codepath. Adding this
makes things work, and reduces the functional changes to
just two shaders from tf2 that are seeing an increase of one
single instruction in their vertex shaders. But that's for another day.
As for the compile time; I've only given it a shader-db run on
normal shaders without disabling validation or nir optimizations.
(Apart from the aforementioned unrealistic shader from the bug).
It appears to give no noticeable difference, if something it
might regress performance ever so slightly. I'd have to do more
testing to verify if it has any major impact. It would not be
unlikely that it would cause some regressions on smaller
shaders though, as we are balancing hashing-overhead,
and the fact we are inserting into two tables,
with the overhead of array-traversal in the hash table.
Some thorough testing will be needed.

> 2017-01-01 22:53 GMT+01:00 Thomas Helland :
>> Darn it. There appears to be a functional change with patch 14.
>> My changes where meant to make no functional change,
>> so something is fishy. An example from the public shader-db:
>>
>> cycles HURT:   ./shaders/tesseract/205.shader_test VS vec4:  54 ->
>> 100 (85.19%)
>>
>> I'll see if I can find some time to debug this during the week.
>> If anyone immediately sees the issue, please let me know.
>>
>> I guess I should write some make-check tests for the new
>> hash table, as I suspect that that is where the issue is.
>>
>> - Thomas
>> 2017-01-01 20:41 GMT+01:00 Thomas Helland :
>>> Oh, and an important disclaimer.
>>> I have not ran this through picking, nor have I done a complete
>>> run of shader-db for performance comparisons on more realistic workloads.
>>> I'll see what time I can find this evening, or some time tomorrow.
>>>
>>>
>>> On Jan 1, 2017 19:38, "Thomas Helland"  wrote:
>>>
>>> Hi all.
>>>
>>> This is a series I've put together to lower the overhead of
>>> the copy propagation pass in glsl. The series is not refined,
>>> so please treat it as a proof of concept / RFC.
>>>
>>> The first five patches might be merge-ready, but their benefit is
>>> quite low compared to the actual major change here.
>>>
>>> The series goes on to copy our existing hash table implementation,
>>> and modify it in steps to allow insertion of multiple data entries
>>> with the same key. I've kept the series in steps corresponding to
>>> the way I worked when modifying it, as I found it nice for clarity.
>>> There is some obvious cleanups and squashing that will need to happen
>>> before this can be considered for merging.
>>>
>>> The major change is the last patch. This uses the new hash table in
>>> our copy propagation pass. This effectively makes the algorithm
>>> O(2n) instead of O(n^2), which is a big win. I tested with the
>>> shader mentioned in [1] but took the shaders and made a shader_test
>>> that I used in conjunction with shader-db to do testing.
>>> I cut down on the ridiculous amount of expression some, to make it
>>> even compile in a reasonable time on my poor laptop.
>>>
>>> So, on to the results. Compilation of the aforementioned shader
>>> went from 98 to 75 seconds. Perf shows that previously we spent
>>> 19% of the time walking the acp-table with _mesa_hash_table_next_entry.
>>> After this series, the hash table is practically gone from the profile.
>>> This is a major win, and I'm guite happy with the results.
>>> There is obviously a lot more work to be done though.
>>>
>>> Please leave critical feedback and ideas. This was merely a test
>>> from my part, but it seems there is some opportunities here.
>>>
>>> https://bugs.freedesktop.org/show_bug.cgi?id=94477
>>>
>>> Thomas Helland (14):
>>>   glsl: Don't rehash the values when copying to new table
>>>   glsl: Don't compute the hash when we already have it available
>>>   nir: Remove unused include of nir_array
>>>   nir: Move nir_array to util, and rename to dyn_array
>>>   glsl: Use dyn_array instead of the exec_list
>>>   util: Make a copy of the existing hash table implementation
>>>   util: Add field for pointing to next data
>>>   util: Implement the insertion functionality
>>>   util: Implement deletion of entries
>>>   util: Add a comment about ordering of entries with matching keys
>>>   util: Implement returning the last added entry on search
>>>   util: Rename functions and structs
>>>   util: Use hashing functions from hash_table.h
>>>   glsl: Restructure copy propagation to use the non replacing hash table
>>>
>>>  src/compiler/Makefile.sources  |   1 -
>>>  src/compiler/glsl/opt_c

Re: [Mesa-dev] [PATCH 25/27] i965: Use partial resolves for CCS buffers being scanned out

2017-01-01 Thread Ben Widawsky

On 16-12-10 16:24:46, Pohjolainen, Topi wrote:

On Thu, Dec 01, 2016 at 02:10:06PM -0800, Ben Widawsky wrote:

From: Ben Widawsky 

On Gen9 hardware, the display engine is able to scanout a compressed
framebuffer by providing an offset to auxiliary compression information.
Unfortunately, the hardware is incapable of doing the same thing for the
fast clear color.

To mitigate this, the hardware introduced a new resolve type called a
partial resolve. The partial resolve will only do a resolve of the fast
clear color and leave the rest of the compressed data alone.

This patch enables using this resolve type for cases where the
framebuffer will be passed along to the kernel for display.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_context.c   | 3 ++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index cfd7a54..2bc2e94 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1359,7 +1359,8 @@ intel_resolve_for_dri2_flush(struct brw_context *brw,
   if (rb->mt->num_samples <= 1) {
  assert(rb->mt_layer == 0 && rb->mt_level == 0 &&
 rb->layer_count == 1);
- intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1, 0);


I'm now in a territory that I don't know that well and just need to ask
random questions. How do we know this is destined for display directly and not
for non-gpu compositing?



I think that's always the case with intel_resolve_for_dri2_flush(). The current
callers are:
intel_flush_front() - from glFlush
and
intel_dri2_flush_with_flags - which is only called for drawables.

I wasn't entirely sure myself, but I assumed it was correct because it passed
all the CTS and piglit tests.


+ intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1,
+ INTEL_RESOLVE_HINT_CLEAR_COLOR);
   } else {
  intel_renderbuffer_downsample(brw, rb);
   }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index dd71a06..b79de08 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2399,8 +2399,8 @@ intel_miptree_make_shareable(struct brw_context *brw,
assert(mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE || mt->num_samples <= 1);

if (mt->mcs_buf) {


Shouldn't we here at least check for scanout?



Hmm. This probably makes sense.

How about:
-  intel_miptree_all_slices_resolve_color(brw, mt,
- INTEL_RESOLVE_HINT_CLEAR_COLOR);
+  intel_miptree_all_slices_resolve_color(brw, mt, mt->is_scanout ?
+ INTEL_RESOLVE_HINT_CLEAR_COLOR :
+ INTEL_RESOLVE_HINT_FULL);


-  intel_miptree_all_slices_resolve_color(brw, mt, 0);
-  mt->no_ccs = true;
+  intel_miptree_all_slices_resolve_color(brw, mt,
+ INTEL_RESOLVE_HINT_CLEAR_COLOR);
}
 }

--
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 26/27] i965: Remove scanout restriction from lossless compression

2017-01-01 Thread Ben Widawsky

On 16-12-11 10:05:25, Pohjolainen, Topi wrote:

On Thu, Dec 01, 2016 at 02:10:07PM -0800, Ben Widawsky wrote:

From: Ben Widawsky 

Cc: Topi Pohjolainen 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index b79de08..b297f79 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -157,7 +157,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
if (mt->disable_aux_buffers)
   return false;

-   if (mt->is_scanout)
+   if (mt->is_scanout && mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS)


I guess the same question I had in previous becomes more general. How do we
know that the external consumer is prepared for compression (mt->is_scanout
just tells that the buffer goes out from the driver, right)?



Preparedness should have been determined by the graphics application (perhaps
via the display server). The graphics application, using get_plane2 or some
other magic, will determine that the external consumer can handle compression,
and will then use the CCS modifiers upon buffer creation. This is the only way
is_scanout && msaa_layout == INTEL_MSAA_LAYOUT_CMS, see
create_ccs_buf_for_image().


   return false;

/* This function applies only to non-multisampled render targets. */
@@ -528,10 +528,6 @@ intel_miptree_create_layout(struct brw_context *brw,
   const UNUSED bool is_lossless_compressed_aux =
  brw->gen >= 9 && num_samples == 1 &&
  mt->format == MESA_FORMAT_R_UINT32;
-
-  /* For now, nothing else has this requirement */
-  assert(is_lossless_compressed_aux ||
- (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);


Why do we need to drop this? And if we do, then we can drop the entire
else-branch, "is_lossless_compressed_aux" is only used in the assertion.



This might have been because of my bug for not having the layout flags set
appropriately in an earlier version of my patches. I think the assertion should
stay here, and I put it back.

Thanks.


}

brw_miptree_layout(brw, mt, layout_flags);
@@ -752,11 +748,9 @@ intel_miptree_create(struct brw_context *brw,
* resolves.
*/
   const bool lossless_compression_disabled = INTEL_DEBUG & DEBUG_NO_RBC;
-  assert(!mt->is_scanout);
   const bool is_lossless_compressed =
  unlikely(!lossless_compression_disabled) &&
- brw->gen >= 9 && !mt->is_scanout &&
- intel_miptree_supports_lossless_compressed(brw, mt);
+ brw->gen >= 9 && intel_miptree_supports_lossless_compressed(brw, mt);

   if (is_lossless_compressed) {
  intel_miptree_alloc_non_msrt_mcs(brw, mt, is_lossless_compressed);
@@ -1043,7 +1037,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
 drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo);
  free((*mt)->hiz_buf);
   }
-  if ((*mt)->mcs_buf && !(*mt)->is_scanout) {
+  if ((*mt)->mcs_buf) {
  drm_intel_bo_unreference((*mt)->mcs_buf->bo);
  free((*mt)->mcs_buf);
   }
--
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Replace bool aux disable with enum

2017-01-01 Thread Ben Widawsky
As CCS buffers are passed to KMS, it becomes useful to be able to
determine exactly what type of aux buffers are disabled. This was
previously not entirely needed (though the code was a little more
confusing), however it becomes very desirable after a recent patch from
Chad:

commit 1c8be049bea786c2c054a770025976beba5b8636
Author: Chad Versace 
Date:   Fri Dec 9 16:18:11 2016 -0800

i965/mt: Disable aux surfaces after making miptree shareable

The next patch will handle CCS and get rid of no_ccs.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 24 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 +-
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 11ad31d761..e99d8a128b 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -64,7 +64,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
  */
 static enum intel_msaa_layout
 compute_msaa_layout(struct brw_context *brw, mesa_format format,
-bool disable_aux_buffers)
+enum intel_aux_disable aux_disable)
 {
/* Prior to Gen7, all MSAA surfaces used IMS layout. */
if (brw->gen < 7)
@@ -90,7 +90,7 @@ compute_msaa_layout(struct brw_context *brw, mesa_format 
format,
*/
   if (brw->gen == 7 && _mesa_get_format_datatype(format) == GL_INT) {
  return INTEL_MSAA_LAYOUT_UMS;
-  } else if (disable_aux_buffers) {
+  } else if (aux_disable & INTEL_AUX_DISABLE_MCS) {
  /* We can't use the CMS layout because it uses an aux buffer, the MCS
   * buffer. So fallback to UMS, which is identical to CMS without the
   * MCS. */
@@ -149,7 +149,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
if (brw->gen < 7)
   return false;
 
-   if (mt->disable_aux_buffers)
+   if (mt->aux_disable & INTEL_AUX_DISABLE_MCS)
   return false;
 
/* This function applies only to non-multisampled render targets. */
@@ -322,7 +322,8 @@ intel_miptree_create_layout(struct brw_context *brw,
mt->logical_width0 = width0;
mt->logical_height0 = height0;
mt->logical_depth0 = depth0;
-   mt->disable_aux_buffers = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0;
+   mt->aux_disable = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0 ?
+  INTEL_AUX_DISABLE_ALL : INTEL_AUX_DISABLE_NONE;
mt->no_ccs = true;
mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0;
exec_list_make_empty(&mt->hiz_map);
@@ -336,8 +337,7 @@ intel_miptree_create_layout(struct brw_context *brw,
int depth_multiply = 1;
if (num_samples > 1) {
   /* Adjust width/height/depth for MSAA */
-  mt->msaa_layout = compute_msaa_layout(brw, format,
-mt->disable_aux_buffers);
+  mt->msaa_layout = compute_msaa_layout(brw, format, mt->aux_disable);
   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_IMS) {
  /* From the Ivybridge PRM, Volume 1, Part 1, page 108:
   * "If the surface is multisampled and it is a depth or stencil
@@ -528,7 +528,7 @@ intel_miptree_create_layout(struct brw_context *brw,
 
brw_miptree_layout(brw, mt, layout_flags);
 
-   if (mt->disable_aux_buffers)
+   if (mt->aux_disable & INTEL_AUX_DISABLE_MCS)
   assert(mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS);
 
return mt;
@@ -1524,7 +1524,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
 {
assert(brw->gen >= 7); /* MCS only used on Gen7+ */
assert(mt->mcs_buf == NULL);
-   assert(!mt->disable_aux_buffers);
+   assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0);
 
/* Choose the correct format for the MCS buffer.  All that really matters
 * is that we allocate the right buffer size, since we'll always be
@@ -1583,7 +1583,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
  bool is_lossless_compressed)
 {
assert(mt->mcs_buf == NULL);
-   assert(!mt->disable_aux_buffers);
+   assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0);
assert(!mt->no_ccs);
 
struct isl_surf temp_main_surf;
@@ -1922,7 +1922,7 @@ intel_miptree_wants_hiz_buffer(struct brw_context *brw,
if (mt->hiz_buf != NULL)
   return false;
 
-   if (mt->disable_aux_buffers)
+   if (mt->aux_disable & INTEL_AUX_DISABLE_HIZ)
   return false;
 
switch (mt->format) {
@@ -1942,7 +1942,7 @@ intel_miptree_alloc_hiz(struct brw_context *brw,
struct intel_mipmap_tree *mt)
 {
assert(mt->hiz_buf == NULL);
-   assert(!mt->disable_aux_buffers);
+   assert((mt->aux_disable & INTEL_AUX_DISABLE_HIZ) == 0);
 
if (brw->gen == 7) {
   mt->hiz_buf = intel_gen7_hiz_buf_create(brw, mt);
@@ -2346,7 +2346,7 @@ intel_miptree_make_shareable(struct brw_context *brw,
   mt->hiz_buf = NULL;
}
 
-   mt->disable_aux_b

[Mesa-dev] [PATCH 2/2] i965/miptree: Create a disable CCS flag

2017-01-01 Thread Ben Widawsky
Cc: Topi Pohjolainen 
Cc: Chad Versace 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_blorp.c |  2 +-
 src/mesa/drivers/dri/i965/brw_draw.c  |  3 ++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 21 ++---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 +++---
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 43ac3be2b9..8d58616f59 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -808,7 +808,7 @@ do_single_blorp_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
if (set_write_disables(irb, ctx->Color.ColorMask[buf], color_write_disable))
   can_fast_clear = false;
 
-   if (irb->mt->no_ccs ||
+   if (irb->mt->aux_disable & INTEL_AUX_DISABLE_CCS ||
!brw_is_color_fast_clear_compatible(brw, irb->mt, 
&ctx->Color.ClearColor))
   can_fast_clear = false;
 
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index d4cc2235a4..940ce70c64 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -417,7 +417,8 @@ brw_predraw_set_aux_buffers(struct brw_context *brw)
* while each layer may have its own fast clear color value. For
* compressed buffers color value is available in the color buffer.
*/
-  if (irb->layer_count > 1 && !irb->mt->no_ccs &&
+  if (irb->layer_count > 1 &&
+  !(irb->mt->aux_disable & INTEL_AUX_DISABLE_CCS) &&
   !intel_miptree_is_lossless_compressed(brw, irb->mt)) {
  assert(brw->gen >= 8);
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e99d8a128b..a81905f3a4 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -324,7 +324,7 @@ intel_miptree_create_layout(struct brw_context *brw,
mt->logical_depth0 = depth0;
mt->aux_disable = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0 ?
   INTEL_AUX_DISABLE_ALL : INTEL_AUX_DISABLE_NONE;
-   mt->no_ccs = true;
+   mt->aux_disable |= INTEL_AUX_DISABLE_CCS;
mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0;
exec_list_make_empty(&mt->hiz_map);
exec_list_make_empty(&mt->color_resolve_map);
@@ -736,7 +736,7 @@ intel_miptree_create(struct brw_context *brw,
 */
if (intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) &&
intel_miptree_supports_non_msrt_fast_clear(brw, mt)) {
-  mt->no_ccs = false;
+  mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS;
   assert(brw->gen < 8 || mt->halign == 16 || num_samples <= 1);
 
   /* On Gen9+ clients are not currently capable of consuming compressed
@@ -858,7 +858,7 @@ intel_update_winsys_renderbuffer_miptree(struct brw_context 
*intel,
 */
if (intel_tiling_supports_non_msrt_mcs(intel, singlesample_mt->tiling) &&
intel_miptree_supports_non_msrt_fast_clear(intel, singlesample_mt)) {
-  singlesample_mt->no_ccs = false;
+  singlesample_mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS;
}
 
if (num_samples == 0) {
@@ -1583,8 +1583,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
  bool is_lossless_compressed)
 {
assert(mt->mcs_buf == NULL);
-   assert((mt->aux_disable & INTEL_AUX_DISABLE_MCS) == 0);
-   assert(!mt->no_ccs);
+   assert((mt->aux_disable & (INTEL_AUX_DISABLE_MCS|INTEL_AUX_DISABLE_CCS)) == 
0);
 
struct isl_surf temp_main_surf;
struct isl_surf temp_ccs_surf;
@@ -2160,7 +2159,8 @@ intel_miptree_check_color_resolve(const struct 
brw_context *brw,
   const struct intel_mipmap_tree *mt,
   unsigned level, unsigned layer)
 {
-   if (mt->no_ccs || !mt->mcs_buf)
+
+   if ((mt->aux_disable & INTEL_AUX_DISABLE_CCS) || !mt->mcs_buf)
   return;
 
/* Fast color clear is supported for mipmapped surfaces only on Gen8+. */
@@ -2240,7 +2240,7 @@ intel_miptree_needs_color_resolve(const struct 
brw_context *brw,
   const struct intel_mipmap_tree *mt,
   int flags)
 {
-   if (mt->no_ccs)
+   if (mt->aux_disable & INTEL_AUX_DISABLE_CCS)
   return false;
 
const bool is_lossless_compressed =
@@ -2334,19 +2334,18 @@ intel_miptree_make_shareable(struct brw_context *brw,
 
if (mt->mcs_buf) {
   intel_miptree_all_slices_resolve_color(brw, mt, 0);
-  mt->no_ccs = true;
+  mt->aux_disable |= (INTEL_AUX_DISABLE_CCS | INTEL_AUX_DISABLE_MCS);
   drm_intel_bo_unreference(mt->mcs_buf->bo);
   free(mt->mcs_buf);
   mt->mcs_buf = NULL;
}
 
if (mt->hiz_buf) {
+  mt->aux_disable |= INTEL_AUX_DISABLE_HIZ;
   intel_miptree_all_slices_resolve_depth(brw, mt);
   intel_miptree_hiz_buffer_free(mt->hiz_buf);
   mt->hiz_buf = NULL;

[Mesa-dev] [Bug 99244] hud_context.c:874:45: error: ‘W_OK’ undeclared

2017-01-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99244

Bug ID: 99244
   Summary: hud_context.c:874:45: error: ‘W_OK’ undeclared
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Keywords: bisected, regression
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: v...@freedesktop.org
QA Contact: mesa-dev@lists.freedesktop.org
CC: edmo...@eriadon.com, mar...@gmail.com

mesa: 3f5fba8a7be61bfc0f46a5ea058108f6e0e1c268 (master 13.1.0-devel)

  Compiling src/gallium/auxiliary/hud/hud_context.c ...
src/gallium/auxiliary/hud/hud_context.c: In function ‘hud_graph_set_dump_file’:
src/gallium/auxiliary/hud/hud_context.c:874:24: warning: implicit declaration
of function ‘access’ [-Wimplicit-function-declaration]
if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) {
^~
src/gallium/auxiliary/hud/hud_context.c:874:45: error: ‘W_OK’ undeclared (first
use in this function)
if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) {
 ^~~~

22cd9040da75cac6e1c61b821e1ac6906ac4a8ac is the first bad commit
commit 22cd9040da75cac6e1c61b821e1ac6906ac4a8ac
Author: Edmondo Tommasina 
Date:   Wed Dec 21 22:58:09 2016 +0100

gallium/hud: dump hud_driver_query values to files

Dump values for every selected data source in GALLIUM_HUD.

Every data source has its own file and the filename is
equal to the data source identifier.

Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory.

No values are dumped if the environment variable is not set, the
directory doesn't exist or the user doesn't have write access.

Signed-off-by: Marek Olšák 

:04 04 c269ea1aa72e829368e14dc3f923521c2d2de355
e297a9c27c8397302ab9ba61578eacab98cb8bc3 M  src
bisect run success

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] glsl: avoid treating fb fetches as output reads to be lowered

2017-01-01 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/compiler/glsl/lower_output_reads.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/lower_output_reads.cpp 
b/src/compiler/glsl/lower_output_reads.cpp
index 851078b..bd3accb 100644
--- a/src/compiler/glsl/lower_output_reads.cpp
+++ b/src/compiler/glsl/lower_output_reads.cpp
@@ -90,7 +90,7 @@ output_read_remover::~output_read_remover()
 ir_visitor_status
 output_read_remover::visit(ir_dereference_variable *ir)
 {
-   if (ir->var->data.mode != ir_var_shader_out)
+   if (ir->var->data.mode != ir_var_shader_out || 
ir->var->data.fb_fetch_output)
   return visit_continue;
 
hash_entry *entry = _mesa_hash_table_search(replacements, ir->var);
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/7] st/mesa: add support for advanced blend when fb can be fetched from

2017-01-01 Thread Ilia Mirkin
This implements support for emitting FBFETCH ops, using the existing
lowering pass for advanced blend logic, and disabling hw blend when
advanced blending is enabled.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_atom_blend.c|  2 +-
 src/mesa/state_tracker/st_cb_texturebarrier.c | 13 +
 src/mesa/state_tracker/st_extensions.c|  2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp| 28 ---
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_blend.c 
b/src/mesa/state_tracker/st_atom_blend.c
index b8d65bd..f76cfab 100644
--- a/src/mesa/state_tracker/st_atom_blend.c
+++ b/src/mesa/state_tracker/st_atom_blend.c
@@ -205,7 +205,7 @@ update_blend( struct st_context *st )
   blend->logicop_enable = 1;
   blend->logicop_func = translate_logicop(ctx->Color.LogicOp);
}
-   else if (ctx->Color.BlendEnabled) {
+   else if (ctx->Color.BlendEnabled && !ctx->Color._AdvancedBlendMode) {
   /* blending enabled */
   for (i = 0, j = 0; i < num_state; i++) {
 
diff --git a/src/mesa/state_tracker/st_cb_texturebarrier.c 
b/src/mesa/state_tracker/st_cb_texturebarrier.c
index 7fd1cbd..29cd37c 100644
--- a/src/mesa/state_tracker/st_cb_texturebarrier.c
+++ b/src/mesa/state_tracker/st_cb_texturebarrier.c
@@ -55,6 +55,18 @@ st_TextureBarrier(struct gl_context *ctx)
 
 
 /**
+ * Called via ctx->Driver.BlendBarrier()
+ */
+static void
+st_BlendBarrier(struct gl_context *ctx)
+{
+   struct pipe_context *pipe = st_context(ctx)->pipe;
+
+   pipe->texture_barrier(pipe, PIPE_TEXTURE_BARRIER_FRAMEBUFFER);
+}
+
+
+/**
  * Called via ctx->Driver.MemoryBarrier()
  */
 static void
@@ -118,5 +130,6 @@ st_MemoryBarrier(struct gl_context *ctx, GLbitfield 
barriers)
 void st_init_texture_barrier_functions(struct dd_function_table *functions)
 {
functions->TextureBarrier = st_TextureBarrier;
+   functions->BlendBarrier = st_BlendBarrier;
functions->MemoryBarrier = st_MemoryBarrier;
 }
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index df6ad08..4921d37 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -614,6 +614,8 @@ void st_init_extensions(struct pipe_screen *screen,
   { o(ARB_transform_feedback2),  
PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME   },
   { o(ARB_transform_feedback3),  
PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS },
 
+  { o(KHR_blend_equation_advanced),  PIPE_CAP_TGSI_FS_FBFETCH  
},
+
   { o(EXT_blend_equation_separate),  PIPE_CAP_BLEND_EQUATION_SEPARATE  
},
   { o(EXT_depth_bounds_test),PIPE_CAP_DEPTH_BOUNDS_TEST
},
   { o(EXT_draw_buffers2),PIPE_CAP_INDEP_BLEND_ENABLE   
},
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 9599296..42c9b87 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -2500,10 +2500,19 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
  else
 decl->size = type_size(var->type);
 
- entry = new(mem_ctx) variable_storage(var,
-   PROGRAM_OUTPUT,
-   decl->mesa_index,
-   decl->array_id);
+ if (var->data.fb_fetch_output) {
+st_dst_reg dst = st_dst_reg(get_temp(var->type));
+st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
+var->type, component, decl->array_id);
+emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
+entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
+  dst.array_id);
+ } else {
+entry = new(mem_ctx) variable_storage(var,
+  PROGRAM_OUTPUT,
+  decl->mesa_index,
+  decl->array_id);
+ }
  entry->component = component;
 
  this->variables.push_tail(entry);
@@ -6788,8 +6797,9 @@ st_link_shader(struct gl_context *ctx, struct 
gl_shader_program *prog)
  continue;
 
   bool progress;
-  exec_list *ir = prog->_LinkedShaders[i]->ir;
-  gl_shader_stage stage = prog->_LinkedShaders[i]->Stage;
+  struct gl_linked_shader *shader = prog->_LinkedShaders[i];
+  exec_list *ir = shader->ir;
+  gl_shader_stage stage = shader->Stage;
   const struct gl_shader_compiler_options *options =
 &ctx->Const.ShaderCompilerOptions[stage];
   enum pipe_shader_type ptarget = st_shader_stage_to_ptarget(stage);
@@ -6805,7 +6815,7 @@ st_link_shader(struct gl_context *ctx, struct 
gl_shader_program *prog)
   

[Mesa-dev] [PATCH 5/7] gallium: add flags parameter to texture barrier

2017-01-01 Thread Ilia Mirkin
This is so that we can differentiate between flushing any framebuffer
reading caches from regular sampler caches.

Signed-off-by: Ilia Mirkin 
---

This felt too simple and silly to create an extra callback for, especially
since the implementations that rely on texture sampling to retrieve fb
contents will have them be identical.

 src/gallium/docs/source/context.rst | 3 ++-
 src/gallium/drivers/ddebug/dd_context.c | 4 ++--
 src/gallium/drivers/ilo/ilo_draw.c  | 2 +-
 src/gallium/drivers/nouveau/nv50/nv50_context.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 2 +-
 src/gallium/drivers/r300/r300_state.c   | 2 +-
 src/gallium/drivers/r600/r600_state_common.c| 2 +-
 src/gallium/drivers/radeonsi/si_state.c | 2 +-
 src/gallium/drivers/softpipe/sp_flush.c | 4 ++--
 src/gallium/drivers/softpipe/sp_flush.h | 2 +-
 src/gallium/drivers/trace/tr_context.c  | 5 +++--
 src/gallium/include/pipe/p_context.h| 2 +-
 src/gallium/include/pipe/p_defines.h| 6 ++
 src/mesa/state_tracker/st_cb_texturebarrier.c   | 2 +-
 14 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/src/gallium/docs/source/context.rst 
b/src/gallium/docs/source/context.rst
index 35f51a0..e9dcccf 100644
--- a/src/gallium/docs/source/context.rst
+++ b/src/gallium/docs/source/context.rst
@@ -578,7 +578,8 @@ texture_barrier
 %%%
 
 This function flushes all pending writes to the currently-set surfaces and
-invalidates all read caches of the currently-set samplers.
+invalidates all read caches of the currently-set samplers. This can be used
+for both regular textures as well as for framebuffers read via FBREAD.
 
 
 
diff --git a/src/gallium/drivers/ddebug/dd_context.c 
b/src/gallium/drivers/ddebug/dd_context.c
index edcbf2c..0b1dfbb 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -676,11 +676,11 @@ dd_context_texture_subdata(struct pipe_context *_pipe,
  */
 
 static void
-dd_context_texture_barrier(struct pipe_context *_pipe)
+dd_context_texture_barrier(struct pipe_context *_pipe, unsigned flags)
 {
struct pipe_context *pipe = dd_context(_pipe)->pipe;
 
-   pipe->texture_barrier(pipe);
+   pipe->texture_barrier(pipe, flags);
 }
 
 static void
diff --git a/src/gallium/drivers/ilo/ilo_draw.c 
b/src/gallium/drivers/ilo/ilo_draw.c
index 6831d2c..bef238a 100644
--- a/src/gallium/drivers/ilo/ilo_draw.c
+++ b/src/gallium/drivers/ilo/ilo_draw.c
@@ -603,7 +603,7 @@ ilo_draw_vbo(struct pipe_context *pipe, const struct 
pipe_draw_info *info)
 }
 
 static void
-ilo_texture_barrier(struct pipe_context *pipe)
+ilo_texture_barrier(struct pipe_context *pipe, unsigned flags)
 {
struct ilo_context *ilo = ilo_context(pipe);
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c 
b/src/gallium/drivers/nouveau/nv50/nv50_context.c
index fc852d7..ece7da9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c
@@ -43,7 +43,7 @@ nv50_flush(struct pipe_context *pipe,
 }
 
 static void
-nv50_texture_barrier(struct pipe_context *pipe)
+nv50_texture_barrier(struct pipe_context *pipe, unsigned flags)
 {
struct nouveau_pushbuf *push = nv50_context(pipe)->base.pushbuf;
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
index c711cb0..8f2b974 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
@@ -44,7 +44,7 @@ nvc0_flush(struct pipe_context *pipe,
 }
 
 static void
-nvc0_texture_barrier(struct pipe_context *pipe)
+nvc0_texture_barrier(struct pipe_context *pipe, unsigned flags)
 {
struct nouveau_pushbuf *push = nvc0_context(pipe)->base.pushbuf;
 
diff --git a/src/gallium/drivers/r300/r300_state.c 
b/src/gallium/drivers/r300/r300_state.c
index 196c0df..8c49bfd 100644
--- a/src/gallium/drivers/r300/r300_state.c
+++ b/src/gallium/drivers/r300/r300_state.c
@@ -2068,7 +2068,7 @@ static void r300_set_constant_buffer(struct pipe_context 
*pipe,
 }
 }
 
-static void r300_texture_barrier(struct pipe_context *pipe)
+static void r300_texture_barrier(struct pipe_context *pipe, unsigned flags)
 {
 struct r300_context *r300 = r300_context(pipe);
 
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 60490b0..a9dbc8b 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -91,7 +91,7 @@ void r600_emit_alphatest_state(struct r600_context *rctx, 
struct r600_atom *atom
radeon_set_context_reg(cs, R_028438_SX_ALPHA_REF, alpha_ref);
 }
 
-static void r600_texture_barrier(struct pipe_context *ctx)
+static void r600_texture_barrier(struct pipe_context *ctx, unsigned flags)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
diff --git a/src/gallium/drivers/radeonsi/si_st

[Mesa-dev] [PATCH 2/7] mesa: allow BlendBarrier to be used without support for full fb fetch

2017-01-01 Thread Ilia Mirkin
The extension spec is not currently published, so it's a bit premature
to require it for BlendBarrier usage.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/main/barrier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/barrier.c b/src/mesa/main/barrier.c
index 2f5b451..e55d13c 100644
--- a/src/mesa/main/barrier.c
+++ b/src/mesa/main/barrier.c
@@ -114,7 +114,8 @@ _mesa_BlendBarrier(void)
 {
GET_CURRENT_CONTEXT(ctx);
 
-   if (!ctx->Extensions.MESA_shader_framebuffer_fetch_non_coherent) {
+   if (!ctx->Extensions.MESA_shader_framebuffer_fetch_non_coherent &&
+   !ctx->Extensions.KHR_blend_equation_advanced) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glBlendBarrier(not supported)");
   return;
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] gallium: add PIPE_CAP_TGSI_FS_FBFETCH

2017-01-01 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/gallium/docs/source/screen.rst   | 2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c   | 1 +
 src/gallium/drivers/ilo/ilo_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
 src/gallium/drivers/softpipe/sp_screen.c | 1 +
 src/gallium/drivers/svga/svga_screen.c   | 1 +
 src/gallium/drivers/swr/swr_screen.cpp   | 1 +
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/drivers/virgl/virgl_screen.c | 1 +
 src/gallium/include/pipe/p_defines.h | 1 +
 17 files changed, 18 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 86aa259..5e561ca 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -366,6 +366,8 @@ The integer capabilities:
   ARB_transform_feedback3.
 * ``PIPE_CAP_TGSI_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read
   from the output file.
+* ``PIPE_CAP_TGSI_FS_FBFETCH``: Whether a fragment shader can use the FBFETCH
+  opcode to retrieve the current value in the framebuffer.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index d84cd82..4d0981e 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -295,6 +295,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
return 0;
 
case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 14f4271..1f13e4d 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -296,6 +296,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
   return 0;
 
case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index c3fad73..93509bb 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -519,6 +519,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 83045cd..3d8acdc 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -340,6 +340,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
   return 0;
}
/* should only get here on unhandled cases */
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index beffeac..759fc23 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -205,6 +205,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 22da3c9..0cfdb3c 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -259,6 +259,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_TGSI_FS_FBFETCH:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 2526a14..fe637af 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_s

[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value

2017-01-01 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/gallium/auxiliary/tgsi/tgsi_info.c |  2 +-
 src/gallium/docs/source/tgsi.rst   | 11 +++
 src/gallium/include/pipe/p_shader_tokens.h |  2 +-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 37549aa..e34b8c7 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -106,7 +106,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
{ 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
{ 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
{ 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index d2d30b4..accbe1d 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2561,6 +2561,17 @@ Resource Access Opcodes
   image, while .w will contain the number of samples for multi-sampled
   images.
 
+.. opcode:: FBFETCH - Load data from framebuffer
+
+  Syntax: ``FBFETCH dst, output``
+
+  Example: ``FBFETCH TEMP[0], OUT[0]``
+
+  Returns the color of the current position in the framebuffer from
+  before this fragment shader invocation. Always returns the same
+  value from multiple calls for a particular output within a single
+  invocation.
+
 
 .. _threadsyncopcodes:
 
diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
b/src/gallium/include/pipe/p_shader_tokens.h
index ee59df0..328768f 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -397,7 +397,7 @@ struct tgsi_property_data {
 #define TGSI_OPCODE_CMP 66
 #define TGSI_OPCODE_SCS 67
 #define TGSI_OPCODE_TXB 68
-/* gap */
+#define TGSI_OPCODE_FBFETCH 69
 #define TGSI_OPCODE_DIV 70
 #define TGSI_OPCODE_DP2 71
 #define TGSI_OPCODE_TXL 72
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] nvc0: enable FBFETCH with a special slot for color buffer 0

2017-01-01 Thread Ilia Mirkin
We don't need to support all the color buffers for advanced blend, just
cb0. For Fermi, we use the special binding slots so that we don't
overlap with user textures, while Kepler+ gets a dedicated position for
the fb handle in the driver constbuf.

This logic is only triggered when a FBREAD is actually present so it
should be a no-op most of the time.

Signed-off-by: Ilia Mirkin 
---

Firstly, I'm not 100% sure that the default_tsc is needed. I had temporary
failures with srgb decoding that went away on later runs that I was unable
to reproduce even by forcing it to have srgb conversion disabled in the
sampler. And TXF shouldn't need samplers. However I remember a weird situation
where TXF was producing failures without a sampler bound.

Secondly this needs to get testing on Fermi. All the dEQP advanced blend
tests pass on Kepler though.

 docs/features.txt  |  2 +-
 docs/relnotes/13.1.0.html  |  1 +
 .../drivers/nouveau/codegen/nv50_ir_driver.h   |  2 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 41 ++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 21 +++--
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  4 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|  6 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_program.h|  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  6 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.h |  2 +
 .../drivers/nouveau/nvc0/nvc0_state_validate.c | 92 +-
 11 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index c27d521..60cb68c 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -253,7 +253,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
 GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+
 
   GL_EXT_color_buffer_float DONE (all drivers)
-  GL_KHR_blend_equation_advancedDONE (i965)
+  GL_KHR_blend_equation_advancedDONE (i965, nvc0)
   GL_KHR_debug  DONE (all drivers)
   GL_KHR_robustness DONE (i965, nvc0, 
radeonsi)
   GL_KHR_texture_compression_astc_ldr   DONE (i965/gen9+)
diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html
index 4dce843..be2c206 100644
--- a/docs/relnotes/13.1.0.html
+++ b/docs/relnotes/13.1.0.html
@@ -45,6 +45,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 GL_ARB_post_depth_coverage on i965/gen9+
+GL_KHR_blend_equation_advanced on nvc0
 GL_INTEL_conservative_rasterization on i965/gen9+
 GL_NV_image_formats on any driver supporting 
GL_ARB_shader_image_load_store (i965, nvc0, radeonsi, softpipe)
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index 9fdabcc..fbb692d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -146,6 +146,7 @@ struct nv50_ir_prog_info
  bool usesDiscard;
  bool persampleInvocation;
  bool usesSampleMaskIn;
+ bool readsFramebuffer;
   } fp;
   struct {
  uint32_t inputOffset; /* base address for user args */
@@ -179,6 +180,7 @@ struct nv50_ir_prog_info
   bool nv50styleSurfaces;/* generate gX[] access for raw buffers */
   bool halfPixelCenter;  /* externally set half pixel center state */
   uint16_t texBindBase;  /* base address for tex handles (nve4) */
+  uint16_t fbtexBindBase;/* base address for fbtex handle (nve4) */
   uint16_t suInfoBase;   /* base address for surface info (nve4) */
   uint16_t bufInfoBase;  /* base address for buffer info */
   uint16_t sampleInfoBase;   /* base address for sample positions */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index d4e29a0..84d04d1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1460,6 +1460,9 @@ bool Source::scanInstruction(const struct 
tgsi_full_instruction *inst)
if (insn.getOpcode() == TGSI_OPCODE_BARRIER)
   info->numBarriers = 1;
 
+   if (insn.getOpcode() == TGSI_OPCODE_FBFETCH)
+  info->prop.fp.readsFramebuffer = true;
+
if (insn.dstCount()) {
   Instruction::DstRegister dst = insn.getDst(0);
 
@@ -1575,6 +1578,7 @@ private:
void handleTEX(Value *dst0[4], int R, int S, int L, int C, int Dx, int Dy);
void handleTXF(Value *dst0[4], int R, int L_M);
void handleTXQ(Value *dst0[4], enum TexQuery, int R);
+   void handleFBFETCH(Value *dst0[4]);
void handleLIT(Value *dst0[4]);
void handleUserClipPlanes();
 
@@ -2294,6 +2298,40 @@ Converter::handleTXF(Value *dst[4], int R, int L_M)
 }
 
 void
+Converter::hand

Re: [Mesa-dev] [PATCH 18/27] i965/miptree: Add a return for updating of winsys

2017-01-01 Thread Pohjolainen, Topi
On Sat, Dec 31, 2016 at 02:40:42PM -0800, Ben Widawsky wrote:
> On 16-12-10 15:39:12, Pohjolainen, Topi wrote:
> > On Thu, Dec 01, 2016 at 02:09:59PM -0800, Ben Widawsky wrote:
> > > From: Ben Widawsky 
> > > 
> > > There is nothing particularly useful to do currently if the update
> > > fails, but there is no point carrying on either. As a result, this has a
> > > behavior change.
> > > 
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.c   | 14 --
> > >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  6 +++---
> > >  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  2 +-
> > >  3 files changed, 12 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> > > b/src/mesa/drivers/dri/i965/brw_context.c
> > > index b928f94..593fa67 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > > @@ -1645,9 +1645,10 @@ intel_process_dri2_buffer(struct brw_context *brw,
> > >return;
> > > }
> > > 
> > > -   intel_update_winsys_renderbuffer_miptree(brw, rb, bo,
> > > -drawable->w, drawable->h,
> > > -buffer->pitch);
> > > +   if (intel_update_winsys_renderbuffer_miptree(brw, rb, bo,
> > > +drawable->w, drawable->h,
> > > +buffer->pitch))
> > > +  return;
> > > 
> > > if (_mesa_is_front_buffer_drawing(fb) &&
> > > (buffer->attachment == __DRI_BUFFER_FRONT_LEFT ||
> > > @@ -1703,9 +1704,10 @@ intel_update_image_buffer(struct brw_context 
> > > *intel,
> > > if (last_mt && last_mt->bo == buffer->bo)
> > >return;
> > > 
> > > -   intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo,
> > > -buffer->width, 
> > > buffer->height,
> > > -buffer->pitch);
> > > +   if (intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo,
> > > +buffer->width, 
> > > buffer->height,
> > > +buffer->pitch))
> > > +  return;
> > > 
> > > if (_mesa_is_front_buffer_drawing(fb) &&
> > > buffer_type == __DRI_IMAGE_BUFFER_FRONT &&
> > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > index d002546..74db507 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > @@ -908,7 +908,7 @@ intel_miptree_create_for_image(struct brw_context 
> > > *intel,
> > >   * that will contain the actual rendering (which is lazily resolved to
> > >   * irb->singlesample_mt).
> > >   */
> > > -void
> > > +int
> > 
> > We don't seem to use "zero for success"-style at least in i965. Could you
> > change this to bool and flip the check earlier for consistency?
> > 
> 
> What do you mean by flip the check earlier?

Just the intel_update_winsys_renderbuffer_miptree() return value check, sorry
for confusing (that is obvious of course without saying).

> 
> > >  intel_update_winsys_renderbuffer_miptree(struct brw_context *intel,
> > >   struct intel_renderbuffer *irb,
> > >   drm_intel_bo *bo,
> > > @@ -974,12 +974,12 @@ intel_update_winsys_renderbuffer_miptree(struct 
> > > brw_context *intel,
> > >   irb->mt = multisample_mt;
> > >}
> > > }
> > > -   return;
> > > +   return 0;
> > > 
> > >  fail:
> > > intel_miptree_release(&irb->singlesample_mt);
> > > intel_miptree_release(&irb->mt);
> > > -   return;
> > > +   return -1;
> > >  }
> > > 
> > >  struct intel_mipmap_tree*
> > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > > index 7b9a7be..85fe118 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > > @@ -726,7 +726,7 @@ intel_miptree_create_for_image(struct brw_context 
> > > *intel,
> > > uint32_t pitch,
> > > uint32_t layout_flags);
> > > 
> > > -void
> > > +int
> > >  intel_update_winsys_renderbuffer_miptree(struct brw_context *intel,
> > >   struct intel_renderbuffer *irb,
> > >   drm_intel_bo *bo,
> > > --
> > > 2.10.2
> > > 
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/me