date:20160926

Re: [Mesa-dev] [llvm] r282237 - [InstCombine] Fix for PR29124: reduce insertelements to shufflevector

2016-09-26 Thread Michel Dänzer


Hi Alexey,


On 23/09/16 06:14 PM, Alexey Bataev via llvm-commits wrote:
> Author: abataev
> Date: Fri Sep 23 04:14:08 2016
> New Revision: 282237
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=282237&view=rev
> Log:
> [InstCombine] Fix for PR29124: reduce insertelements to shufflevector

This change introduced failures with the Mesa llvmpipe driver unit test
lp_test_format. See below for information about the CPU, and the
attachment for the IR and results of the failing sub-tests. Let me know
if you need more information.
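
For reference, the "expected" rows in the attached results are consistent with a fetch that copies the three packed floats into x/y/z and sets w to 1. In the "obtained" rows the second lane is always 1 and the fourth is always 0. A minimal C sketch of the expected behaviour follows; the function name is hypothetical and this is not the llvmpipe code, only an illustration that assumes the packed data is in host byte order:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Mesa code: expand one packed R32G32B32_FLOAT
 * texel to RGBA. Assumes the packed bytes are in host byte order. */
static void fetch_r32g32b32_float(const uint8_t *packed, float rgba[4])
{
    /* x, y, z come straight from the three packed floats */
    memcpy(rgba, packed, 3 * sizeof(float));
    /* the format has no alpha channel, so w is filled with 1.0 */
    rgba[3] = 1.0f;
}
```

With a packed texel of (1, 0, 0) this yields 1 0 0 1, matching the "expected" line of the second failing sub-test.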


processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 21
model   : 48
model name  : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
stepping: 1
microcode   : 0x6003106
cpu MHz : 4100.000
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 16
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm
sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce
nodeid_msr tbm topoext perfctr_core perfctr_nb bpext arat cpb hw_pstate
npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold vmmcall fsgsbase bmi1 xsaveopt
bugs: fxsave_leak sysret_ss_attrs
bogomips: 8200.55
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
define void @fetch_r32g32b32_float_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
entry:
  %5 = getelementptr i8, i8* %1, i32 0
  %6 = bitcast i8* %5 to <3 x float>*
  %7 = load <3 x float>, <3 x float>* %6, align 4
  %8 = shufflevector <3 x float> %7, <3 x float> undef, <4 x i32> 
  %9 = shufflevector <4 x float> %8, <4 x float> , <4 x i32> 
  store <4 x float> %9, <4 x float>* %0
  ret void
}

Testing PIPE_FORMAT_R32G32B32_FLOAT (float) ...
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 0 0 obtained
  0 0 0 1 expected
FAILED
  Packed: 00 00 80 3f
  Unpacked (0,0): 1 1 0 0 obtained
  1 0 0 1 expected
FAILED
  Packed: 00 00 80 bf
  Unpacked (0,0): -1 1 0 0 obtained
  -1 0 0 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 0 0 obtained
  0 1 0 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 0 0 obtained
  0 -1 0 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 1 0 obtained
  0 0 1 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 -1 0 obtained
  0 0 -1 1 expected
FAILED
  Packed: 00 00 80 3f
  Unpacked (0,0): 1 1 1 0 obtained
  1 1 1 1 expected

[...]

define void @fetch_r32g32b32_unorm_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
entry:
  %5 = getelementptr i8, i8* %1, i32 0
  %6 = bitcast i8* %5 to <3 x i32>*
  %7 = load <3 x i32>, <3 x i32>* %6, align 4
  %8 = shufflevector <3 x i32> %7, <3 x i32> undef, <4 x i32> 
  %9 = lshr <4 x i32> %8, 
  %10 = or <4 x i32> %9, 
  %11 = bitcast <4 x i32> %10 to <4 x float>
  %12 = fsub <4 x float> %11, 
  %13 = fmul <4 x float> %12, 
  %14 = shufflevector <4 x float> %13, <4 x float> , <4 x i32> 
  store <4 x float> %14, <4 x float>* %0
  ret void
}

Testing PIPE_FORMAT_R32G32B32_UNORM (float) ...
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 0 0 obtained
  0 0 0 1 expected
FAILED
  Packed: ff ff ff ff
  Unpacked (0,0): 1 1 0 0 obtained
  1 0 0 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 0 0 obtained
  0 1 0 1 expected
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 0 1 1 0 obtained
  0 0 1 1 expected
FAILED
  Packed: ff ff ff ff
  Unpacked (0,0): 1 1 1 0 obtained
  1 1 1 1 expected

[...]

define void @fetch_r32g32b32_uscaled_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
entry:
  %5 = getelementptr i8, i8* %1, i32 0
  %6 = bitcast i8* %5 to <3 x i32>*
  %7 = load <3 x i32>, <3 x i32>* %6, align 4
  %8 = shufflevector <3 x i32> %7, <3 x i32> undef, <4 x i32> 
  %9 = sitofp <4 x i32> %8 to <4 x float>
  %10 = shufflevector <4 x float> %9, <4 x float> , <4 x i32> 
  store <4 x float> %10, <4 x float>* %0
  ret void
}

Testing PIPE_FORMAT_R32G32B32_USCALED (float) ...
FAILED
  Packed: 00 00 00 00
  Unpacke

[Mesa-dev] [RFC] egl: stop claiming support for pbuffer + msaa (RFC)

2016-09-26 Thread Tapani Pälli

This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test
and same crash in many dEQP EGL tests.

I also found that some Qt example did a workaround because of this
crash: https://bugreports.qt.io/browse/QTBUG-47509

Signed-off-by: Tapani Pälli 
---

This is RFC as I'm not sure if we are supposed to support this. I tried
to verify overall pbuffer situation with some mesa-demos using pbuffer
but those are not working for me at all with or without my patch.

 src/egl/main/eglconfig.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/egl/main/eglconfig.c b/src/egl/main/eglconfig.c
index 6161d26..20cf9d4 100644
--- a/src/egl/main/eglconfig.c
+++ b/src/egl/main/eglconfig.c
@@ -407,6 +407,11 @@ _eglValidateConfig(const _EGLConfig *conf, EGLBoolean 
for_matching)
   return EGL_FALSE;
}
 
+   /* pbuffer with MSAA not supported */
+   if (conf->SurfaceType & EGL_PBUFFER_BIT && conf->Samples) {
+  return EGL_FALSE;
+   }
+
if (!(conf->SurfaceType & EGL_WINDOW_BIT)) {
   if (conf->NativeVisualID != 0 || conf->NativeVisualType != EGL_NONE)
  valid = EGL_FALSE;
-- 
2.7.4
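
The check added by the patch can be read as a small predicate over the config. The sketch below is a standalone illustration and not the Mesa code: the struct, the function and the SKETCH_* constants are hypothetical stand-ins for _EGLConfig, _eglValidateConfig and the EGL surface-type bits.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the EGL surface-type bits. */
#define SKETCH_PBUFFER_BIT 0x0001
#define SKETCH_WINDOW_BIT  0x0004

/* Hypothetical, reduced view of a config: only the two fields the
 * new check looks at. */
struct sketch_config {
    int surface_type; /* bitmask of SKETCH_*_BIT */
    int samples;      /* 0 means no multisampling */
};

/* Returns false for any config that advertises pbuffer support
 * together with a non-zero sample count. */
static bool sketch_validate_config(const struct sketch_config *conf)
{
    if ((conf->surface_type & SKETCH_PBUFFER_BIT) && conf->samples)
        return false;
    return true;
}
```

Note that, read this way, a multisampled config that advertises both the window and the pbuffer bit is rejected as a whole, not just for pbuffer use.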

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] v2 st/va Avoid VBR bitrate calculation overflow

2016-09-26 Thread Andy Furniss

VBR bitrate calc needs 64 bits at high rates.
v2 use float.

Signed-off-by: Andy Furniss 
---
 src/gallium/state_trackers/va/picture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 7f3d96d..399667f 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -322,7 +322,7 @@ handleVAEncMiscParameterTypeRateControl(vlVaContext 
*context, VAEncMiscParameter
PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT)
   context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second;
else
-  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * rc->target_percentage / 100;
+  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * (rc->target_percentage / 100.0);
context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second;
if (context->desc.h264enc.rate_ctrl.target_bitrate < 200)
   context->desc.h264enc.rate_ctrl.vbv_buffer_size = 
MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75), 200);
-- 
2.7.0
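
To illustrate the overflow this patch avoids, here is a small standalone C sketch. It assumes both fields are 32-bit unsigned values; the function names are hypothetical and only mirror the arithmetic in the diff. At 50 Mbit/s and 90%, the intermediate product is 4,500,000,000, which does not fit in 32 bits.

```c
#include <stdint.h>

/* Original arithmetic: the 32-bit product wraps before the division. */
static uint32_t target_bitrate_u32(uint32_t bits_per_second,
                                   uint32_t target_percentage)
{
    return bits_per_second * target_percentage / 100;
}

/* v2 arithmetic: scale by a floating-point fraction, so no large
 * intermediate integer product is formed. */
static uint32_t target_bitrate_float(uint32_t bits_per_second,
                                     uint32_t target_percentage)
{
    return (uint32_t)(bits_per_second * (target_percentage / 100.0));
}

/* Alternative mentioned in the commit message: widen to 64 bits. */
static uint32_t target_bitrate_u64(uint32_t bits_per_second,
                                   uint32_t target_percentage)
{
    return (uint32_t)((uint64_t)bits_per_second * target_percentage / 100);
}
```

For the inputs above, the first function returns 2050327 while the other two return 45000000.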


Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode

2016-09-26 Thread Andy Furniss


Andy Furniss wrote:

Andy Furniss wrote:

Andy Furniss wrote:


https://patchwork.freedesktop.org/patch/112040/


Hmm, that got mangled. I'll try again later; I'm going to be AFK for a while.


This one worked.

https://patchwork.freedesktop.org/patch/112069/


Or maybe a version that uses float - I don't know what is best.

https://patchwork.freedesktop.org/patch/112161/

Something else I noticed: all the logic that chooses the vbv buffer size
just below the patched line is simply overwritten by 2000 in
getEncParamPreset. Is this deliberate?
In testing I see that too big a value (between 85M and 90M) is bad, so
setting it from the bitrate can be bad, though it could be capped, I guess.

Are the units in bits, like the spec/libx264?


Re: [Mesa-dev] [PATCH mesa 4/4] nir/spirv: add spirv2nir binary to .gitignore

2016-09-26 Thread Eric Engestrom

On Sun, Sep 25, 2016 at 10:49:29AM -0700, Jason Ekstrand wrote:
> I hope you realize that this is the only truly useful change in the series.
> :-). Still, no reason why our silly little helpers shouldn't be correct.

Yeah, I know :P
I got the Coverity report like everyone and thought we might as well
print real error messages, esp. since asserts are gone in release builds
(but who would use spirv2nir in a release build? ^^)

> Series is
> 
> Reviewed-by: Jason Ekstrand 

Thanks! Can you push it for me?

Cheers,
  Eric

> 
> On Sep 25, 2016 6:50 PM, "Eric Engestrom"  wrote:
> 
> > Signed-off-by: Eric Engestrom 
> > ---
> >  src/compiler/.gitignore | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/compiler/.gitignore b/src/compiler/.gitignore
> > index c0e6299..c4f17be 100644
> > --- a/src/compiler/.gitignore
> > +++ b/src/compiler/.gitignore
> > @@ -3,3 +3,4 @@ subtest-cr
> >  subtest-cr-lf
> >  subtest-lf
> >  subtest-lf-cr
> > +spirv2nir
> > --
> > Cheers,
> >   Eric
> >
> >

Re: [Mesa-dev] [PATCH 10/15] glsl/standalone: Optimize dead variable declarations

2016-09-26 Thread Tapani Pälli




On 09/16/2016 01:12 AM, Ian Romanick wrote:

From: Ian Romanick 

We didn't bother with this in the regular compiler because it doesn't
change the generated code.  In the stand-alone compiler, this can
clutter the output with useless variables.  It's especially bad after
functions are inlined but the foo_retval declarations remain.

Signed-off-by: Ian Romanick 
---
 src/compiler/glsl/standalone.cpp | 63 
 1 file changed, 63 insertions(+)

diff --git a/src/compiler/glsl/standalone.cpp b/src/compiler/glsl/standalone.cpp
index c4b6854..f7e1055 100644
--- a/src/compiler/glsl/standalone.cpp
+++ b/src/compiler/glsl/standalone.cpp
@@ -37,6 +37,7 @@
 #include "standalone_scaffolding.h"
 #include "standalone.h"
 #include "util/string_to_uint_map.h"
+#include "util/set.h"

 class add_neg_to_sub_visitor : public ir_hierarchical_visitor {
 public:
@@ -69,6 +70,64 @@ public:
}
 };

+class dead_variable_visitor : public ir_hierarchical_visitor {
+public:
+   dead_variable_visitor()
+   {
+  variables = _mesa_set_create(NULL,
+   _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
+   virtual ~dead_variable_visitor()
+   {
+  _mesa_set_destroy(variables, NULL);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *ir)
+   {
+  /* If the variable is auto or temp, add it to the set of variables that
+   * are candidates for removal.
+   */
+  if (ir->data.mode != ir_var_auto && ir->data.mode != ir_var_temporary)
+ return visit_continue;
+
+  _mesa_set_add(variables, ir);
+
+  return visit_continue;
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+  struct set_entry *entry = _mesa_set_search(variables, ir->var);
+
+  /* If a variable is dereferenced at all, remove it from the set of
+   * variables that are candidates for removal.
+   */
+  if (entry != NULL)
+ _mesa_set_remove(variables, entry);
+
+  return visit_continue;
+   }
+
+   void remove_dead_variables()
+   {
+  struct set_entry *entry;
+
+  for (entry = _mesa_set_next_entry(variables, NULL);
+   entry != NULL;
+   entry = _mesa_set_next_entry(variables, entry)) {


Please use the set_foreach() macro here. With that fixed:

Reviewed-by: Tapani Pälli 


+ ir_variable *ir = (ir_variable *) entry->key;
+
+ assert(ir->ir_type == ir_type_variable);
+ ir->remove();
+  }
+   }
+
+private:
+   set *variables;
+};
+
 static const struct standalone_options *options;

 static void
@@ -471,6 +530,10 @@ standalone_compile_shader(const struct standalone_options 
*_options,
  add_neg_to_sub_visitor v;
  visit_list_elements(&v, shader->ir);

+ dead_variable_visitor dv;
+ visit_list_elements(&dv, shader->ir);
+ dv.remove_dead_variables();
+
  shader->Program = rzalloc(shader, gl_program);
  init_gl_program(shader->Program, shader->Stage);
   }
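
The approach in the patch above is a collect-then-sweep pass: declarations of auto/temporary variables become removal candidates, any dereference withdraws the candidate, and whatever is left is removed. A plain C sketch of the same idea follows. It is not the Mesa visitor; the node layout, the integer variable ids and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_kind { SKETCH_DECL, SKETCH_DEREF };

/* Hypothetical flattened IR node. */
struct sketch_node {
    enum sketch_kind kind;
    size_t var;     /* id of the variable declared or dereferenced */
    bool removable; /* declaration is "auto" or "temporary" */
    bool removed;   /* set by the sweep */
};

/* candidate[] is caller-provided scratch with num_vars entries.
 * Returns the number of declarations removed. */
static size_t sketch_remove_dead_variables(struct sketch_node *ir,
                                           size_t count,
                                           bool *candidate,
                                           size_t num_vars)
{
    size_t i, removed = 0;

    for (i = 0; i < num_vars; i++)
        candidate[i] = false;

    /* Walk: declarations add candidates, dereferences withdraw them. */
    for (i = 0; i < count; i++) {
        if (ir[i].var >= num_vars)
            continue;
        if (ir[i].kind == SKETCH_DECL && ir[i].removable)
            candidate[ir[i].var] = true;
        else if (ir[i].kind == SKETCH_DEREF)
            candidate[ir[i].var] = false;
    }

    /* Sweep: drop declarations that are still candidates. */
    for (i = 0; i < count; i++) {
        if (ir[i].kind == SKETCH_DECL && ir[i].var < num_vars &&
            candidate[ir[i].var]) {
            ir[i].removed = true;
            removed++;
        }
    }
    return removed;
}
```

Like the visitor, the walk relies on a declaration appearing before any dereference of the same variable.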



[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2016-09-26 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #20 from Eero Tamminen  ---
Apitrace's own CPU overhead is so high that it's not very good for identifying
CPU bottlenecks.

Best would be to do (e.g. from SSH console):
  # perf record -a
  ^C

During the game freeze.

And provide profile report here:
  # perf report

(+ install debug symbols for anything that misses symbols in the report.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] v2 st/va Avoid VBR bitrate calculation overflow

2016-09-26 Thread Christian König


Am 26.09.2016 um 11:44 schrieb Andy Furniss:

VBR bitrate calc needs 64 bits at high rates.
v2 use float.

Signed-off-by: Andy Furniss 


Reviewed-by: Christian König .

Since Leo is on vacation I will probably collect all remaining mesa 
patches and commit them later today.


Christian.


---
  src/gallium/state_trackers/va/picture.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 7f3d96d..399667f 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -322,7 +322,7 @@ handleVAEncMiscParameterTypeRateControl(vlVaContext 
*context, VAEncMiscParameter
 PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT)
context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second;
 else
-  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * rc->target_percentage / 100;
+  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * (rc->target_percentage / 100.0);
 context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second;
 if (context->desc.h264enc.rate_ctrl.target_bitrate < 200)
context->desc.h264enc.rate_ctrl.vbv_buffer_size = 
MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75), 200);




Re: [Mesa-dev] [PATCH 03/88] glsl: Add initial functions to implement an on-disk cache

2016-09-26 Thread Brian Paul


On 09/23/2016 11:24 PM, Timothy Arceri wrote:

From: Carl Worth 

This code provides for an on-disk cache of objects. Objects are stored
and retrieved via names that are arbitrary 20-byte sequences,
(intended to be SHA-1 hashes of something identifying for the
content). The directory used for the cache can be specified by means
of environment variables in the following priority order:

$MESA_GLSL_CACHE_DIR
$XDG_CACHE_HOME/mesa
/.cache/mesa
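
The priority order above could be sketched as follows. This is an illustration, not the code from cache.c: the function name is hypothetical, the three values are passed in rather than read from the environment, an empty string is treated like an unset variable, and the last candidate is assumed to be relative to the user's home directory.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: pick the cache directory from the three
 * candidates in priority order. Returns 0 on success, -1 if no
 * candidate is available or the result does not fit in out. */
static int sketch_resolve_cache_dir(const char *cache_dir_env,
                                    const char *xdg_cache_home,
                                    const char *home,
                                    char *out, size_t out_size)
{
    int n;

    if (cache_dir_env && cache_dir_env[0])
        n = snprintf(out, out_size, "%s", cache_dir_env);
    else if (xdg_cache_home && xdg_cache_home[0])
        n = snprintf(out, out_size, "%s/mesa", xdg_cache_home);
    else if (home && home[0])
        n = snprintf(out, out_size, "%s/.cache/mesa", home);
    else
        return -1;

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A caller would typically pass the results of getenv() for the first two arguments.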

By default the cache will be limited to a maximum size of 1GB. The
environment variable:

$MESA_GLSL_CACHE_MAX_SIZE

can be set (at the time of GL context creation) to choose some other
size. This variable is a number that can optionally be followed by
'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
gigabytes. By default, an unadorned value will be interpreted as
gigabytes.
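
As a rough sketch of the size parsing described above (not the code from cache.c): the function name is hypothetical, the units are assumed to be powers of 1024, and unparsable input or an unknown suffix falls back to the 1GB default, which is an assumption of this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turn a MESA_GLSL_CACHE_MAX_SIZE-style string
 * into a byte count. No overflow or sign checking is done here. */
static uint64_t sketch_parse_cache_max_size(const char *value)
{
    const uint64_t kib = 1024;
    const uint64_t default_size = kib * kib * kib; /* 1GB */
    unsigned long long n;
    char *end;

    if (!value || !value[0])
        return default_size;

    n = strtoull(value, &end, 10);
    if (end == value)
        return default_size; /* no digits at all */

    switch (*end) {
    case 'K':
        return n * kib;
    case 'M':
        return n * kib * kib;
    case 'G':
    case '\0': /* an unadorned number means gigabytes */
        return n * kib * kib * kib;
    default:
        return default_size; /* unknown suffix: assumption of this sketch */
    }
}
```

For example, "512M" gives 536870912 bytes and a bare "2" gives 2147483648 bytes.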

The cache will be entirely disabled at runtime if the variable
MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.

Many thanks to Kristian Høgsberg  for the initial
implementation of code that led to this patch. In particular, the idea
of using an mmapped file, (indexed by a portion of the SHA-1), for the
efficient implementation of cache_has_key was entirely his
idea. Kristian also provided some very helpful advice in discussions
regarding various race conditions to be avoided in this code.

Signed-off-by: Timothy Arceri 
---
  configure.ac |   3 +
  docs/envvars.html|  11 +
  src/compiler/Makefile.glsl.am|  10 +
  src/compiler/Makefile.sources|   4 +
  src/compiler/glsl/cache.c| 709 +++
  src/compiler/glsl/cache.h| 172 +
  src/compiler/glsl/tests/.gitignore   |   1 +
  src/compiler/glsl/tests/cache_test.c | 416 
  8 files changed, 1326 insertions(+)
  create mode 100644 src/compiler/glsl/cache.c
  create mode 100644 src/compiler/glsl/cache.h
  create mode 100644 src/compiler/glsl/tests/cache_test.c

diff --git a/configure.ac b/configure.ac
index 0604ad9..7db31e4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1305,6 +1305,9 @@ if test "x$with_sha1" = "x"; then
  fi
  fi
  AM_CONDITIONAL([ENABLE_SHADER_CACHE], [test x$enable_shader_cache = xyes])
+if test "x$enable_shader_cache" = "xyes"; then
+   AC_DEFINE([ENABLE_SHADER_CACHE], [1], [Enable shader cache])
+fi

  case "$host_os" in
  linux*)
diff --git a/docs/envvars.html b/docs/envvars.html
index cf57ca5..2375145 100644
--- a/docs/envvars.html
+++ b/docs/envvars.html
@@ -112,6 +112,17 @@ glGetString(GL_VERSION) for OpenGL ES.
  glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are integers, such as
  "130".  Mesa will not really implement all the features of the given language 
version
  if it's higher than what's normally reported. (for developers only)
+MESA_GLSL_CACHE_DISABLE - if set, disables the GLSL shader cache
+MESA_GLSL_CACHE_MAX_SIZE - if set, determines the maximum size of
+the on-disk cache of compiled GLSL programs. Should be set to a number
+optionally followed by 'K', 'M', or 'G' to specify a size in
+kilobytes, megabytes, or gigabytes. By default, gigabytes will be
+assumed. And if unset, a maximum size of 1GB will be used.
+MESA_GLSL_CACHE_DIR - if set, determines the directory to be used
+for the on-disk cache of compiled GLSL programs. If this variable is
+not set, then the cache will be stored in $XDG_CACHE_HOME/.mesa (if
+that variable is set), or else within .cache/mesa within the user's
+home directory.
  MESA_GLSL - shading language compiler 
options
  MESA_NO_MINMAX_CACHE - when set, the minmax index cache is globally 
disabled.
  
diff --git a/src/compiler/Makefile.glsl.am b/src/compiler/Makefile.glsl.am
index b8225cb..80dfb73 100644
--- a/src/compiler/Makefile.glsl.am
+++ b/src/compiler/Makefile.glsl.am
@@ -33,6 +33,7 @@ EXTRA_DIST += glsl/tests glsl/glcpp/tests glsl/README \
  TESTS += glsl/glcpp/tests/glcpp-test  \
glsl/glcpp/tests/glcpp-test-cr-lf   \
glsl/tests/blob-test\
+   glsl/tests/cache-test   \
glsl/tests/general-ir-test  \
glsl/tests/optimization-test\
glsl/tests/sampler-types-test   \
@@ -47,6 +48,7 @@ check_PROGRAMS += \
glsl/glcpp/glcpp\
glsl/glsl_test  \
glsl/tests/blob-test\
+   glsl/tests/cache-test   \
glsl/tests/general-ir-test  \
glsl/tests/sampler-types-test   \
glsl/tests/uniform-initializer-test
@@ -58,6 +60,11 @@ glsl_tests_blob_test_SOURCES =   
\
  glsl_tests_blob_test_LDADD =  \
glsl/libglsl.la

+glsl_tests_ca

[Mesa-dev] [PATCH 1/2] i965: drop copy of NumImages

2016-09-26 Thread Lionel Landwerlin

We can access this value through gl_shader_program.

Signed-off-by: Lionel Landwerlin 
Cc: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_compiler.h  | 1 -
 src/mesa/drivers/dri/i965/brw_cs.c| 1 -
 src/mesa/drivers/dri/i965/brw_gs.c| 1 -
 src/mesa/drivers/dri/i965/brw_tcs.c   | 1 -
 src/mesa/drivers/dri/i965/brw_tes.c   | 1 -
 src/mesa/drivers/dri/i965/brw_vs.c| 5 +
 src/mesa/drivers/dri/i965/brw_wm.c| 4 +---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 5 -
 8 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 445c166..437528b 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -344,7 +344,6 @@ struct brw_stage_prog_data {
 
GLuint nr_params;   /**< number of float params/constants */
GLuint nr_pull_params;
-   unsigned nr_image_params;
 
unsigned curb_read_length;
unsigned total_scratch;
diff --git a/src/mesa/drivers/dri/i965/brw_cs.c 
b/src/mesa/drivers/dri/i965/brw_cs.c
index 4e746fe..febf53a 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -106,7 +106,6 @@ brw_codegen_cs_prog(struct brw_context *brw,
prog_data.base.image_param =
   rzalloc_array(NULL, struct brw_image_param, cs->base.NumImages);
prog_data.base.nr_params = param_count;
-   prog_data.base.nr_image_params = cs->base.NumImages;
 
brw_nir_setup_glsl_uniforms(cp->program.Base.nir, prog, &cp->program.Base,
&prog_data.base, true);
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 741216c..486416a 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -128,7 +128,6 @@ brw_codegen_gs_prog(struct brw_context *brw,
prog_data.base.base.image_param =
   rzalloc_array(NULL, struct brw_image_param, gs->NumImages);
prog_data.base.base.nr_params = param_count;
-   prog_data.base.base.nr_image_params = gs->NumImages;
 
brw_nir_setup_glsl_uniforms(gp->program.Base.nir, prog, &gp->program.Base,
&prog_data.base.base,
diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c 
b/src/mesa/drivers/dri/i965/brw_tcs.c
index 7e6c69a..88df595 100644
--- a/src/mesa/drivers/dri/i965/brw_tcs.c
+++ b/src/mesa/drivers/dri/i965/brw_tcs.c
@@ -216,7 +216,6 @@ brw_codegen_tcs_prog(struct brw_context *brw,
 
   prog_data.base.base.image_param =
  rzalloc_array(NULL, struct brw_image_param, tcs->NumImages);
-  prog_data.base.base.nr_image_params = tcs->NumImages;
 
   brw_nir_setup_glsl_uniforms(nir, shader_prog, &tcp->program.Base,
   &prog_data.base.base,
diff --git a/src/mesa/drivers/dri/i965/brw_tes.c 
b/src/mesa/drivers/dri/i965/brw_tes.c
index 87ada17..88739b9 100644
--- a/src/mesa/drivers/dri/i965/brw_tes.c
+++ b/src/mesa/drivers/dri/i965/brw_tes.c
@@ -161,7 +161,6 @@ brw_codegen_tes_prog(struct brw_context *brw,
prog_data.base.base.image_param =
   rzalloc_array(NULL, struct brw_image_param, tes->NumImages);
prog_data.base.base.nr_params = param_count;
-   prog_data.base.base.nr_image_params = tes->NumImages;
 
prog_data.base.cull_distance_mask =
   ((1 << tep->program.Base.CullDistanceArraySize) - 1) <<
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index ba7315e..c242190 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -123,9 +123,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
 */
int param_count = vp->program.Base.nir->num_uniforms / 4;
 
-   if (vs)
-  prog_data.base.base.nr_image_params = vs->base.NumImages;
-
/* vec4_visitor::setup_uniform_clipplane_values() also uploads user clip
 * planes as uniforms.
 */
@@ -137,7 +134,7 @@ brw_codegen_vs_prog(struct brw_context *brw,
   rzalloc_array(NULL, const gl_constant_value *, param_count);
stage_prog_data->image_param =
   rzalloc_array(NULL, struct brw_image_param,
-stage_prog_data->nr_image_params);
+vs ? vs->base.NumImages : 0);
stage_prog_data->nr_params = param_count;
 
if (prog) {
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index 6ffe7c8..1af6bf7 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -106,8 +106,6 @@ brw_codegen_wm_prog(struct brw_context *brw,
 * by the state cache.
 */
int param_count = fp->program.Base.nir->num_uniforms / 4;
-   if (fs)
-  prog_data.base.nr_image_params = fs->base.NumImages;
/* The backend also sometimes adds params for texture size. */
param_count += 2 * 
ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
prog_data.base.param =
@@ -116,7 +114,7 @@ brw_codegen_wm_prog(struct brw_context *brw,

[Mesa-dev] [PATCH 2/2] i965: use L3 data cache for SSBOs

2016-09-26 Thread Lionel Landwerlin

Anv programs the hardware to use the L3 data cache if either SSBOs or
images are used in the shaders; we can program i965 the same way.

gl_shader_program has a confusingly named field, 'NumAtomicBuffers'. It
does not tell how many buffers are accessed atomically by the shader, but
rather the number of atomic counters manipulated by the shader.

Signed-off-by: Lionel Landwerlin 
Cc: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 92e8788..fdaea81 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -55,7 +55,8 @@ get_pipeline_state_l3_weights(const struct brw_context *brw)
  prog ? prog->_LinkedShaders[stage_states[i]->stage] : NULL;
   const struct brw_stage_prog_data *prog_data = stage_states[i]->prog_data;
 
-  needs_dc |= (prog && prog->NumAtomicBuffers) ||
+  needs_dc |= (prog && (prog->NumAtomicBuffers ||
+prog->NumShaderStorageBlocks)) ||
  (shader && shader->NumImages) ||
  (prog_data && prog_data->total_scratch);
   needs_slm |= prog_data && prog_data->total_shared;
-- 
2.9.3
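
The resulting condition can be summarised with a small standalone sketch. It is a simplification, not the i965 code: the struct and function names are hypothetical, and the per-program and per-stage counters from the diff are flattened into one record per stage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stage summary of the counters the diff tests. */
struct sketch_stage {
    unsigned num_atomic_buffers;
    unsigned num_shader_storage_blocks; /* the condition added here */
    unsigned num_images;
    unsigned total_scratch;
};

/* The data cache is wanted as soon as any bound stage uses atomic
 * counters, SSBOs, images or scratch space. */
static bool sketch_needs_data_cache(const struct sketch_stage *stages,
                                    size_t count)
{
    bool needs_dc = false;
    size_t i;

    for (i = 0; i < count; i++) {
        needs_dc |= stages[i].num_atomic_buffers ||
                    stages[i].num_shader_storage_blocks ||
                    stages[i].num_images ||
                    stages[i].total_scratch;
    }
    return needs_dc;
}
```

Before the patch, a pipeline whose only data-cache user was an SSBO would not have set needs_dc.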


Re: [Mesa-dev] Was: Re: [PATCH] r600g: Add support for PK2H/UP2H

2016-09-26 Thread Marek Olšák

Pushed. Thanks for the reminder.

Marek

On Wed, Sep 21, 2016 at 11:20 PM, Dieter Nützel  wrote:
> Ping. - Again.
>
> Ilia and Marek voted for it.
>
> Any progress?
> Anyone, Marek, Nicolai?
> Should I rebase?
>
> Dieter
>
>> [Mesa-dev] [PATCH] r600g: Add support for PK2H/UP2H
>>
>> Glenn Kennard glenn.kennard at gmail.com
>> Sun Jan 3 14:47:18 PST 2016
>> Previous message: [Mesa-dev] [PATCH 1/2] WIP gallivm: add support for
>> PK2H/UP2H Next message: [Mesa-dev] [PATCH] mesa: use gl_shader_variable in
>> program resource list Messages sorted by: [ date ] [ thread ] [ subject ]
>> [
>> author ]
>> Based off of Ilia's original patch, but with output values replicated so
>> that it matches the TGSI semantics.
>>
>> Signed-off-by: Glenn Kennard 
>> ---
>>
>>  src/gallium/drivers/r600/r600_pipe.c   |   2 +-
>>  src/gallium/drivers/r600/r600_shader.c | 107
>>  +++-- 2 files changed, 104 insertions(+), 5
>>  deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_pipe.c
>> b/src/gallium/drivers/r600/r600_pipe.c index d71082f..3b5d26c 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.c
>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> @@ -328,6 +328,7 @@ static int r600_get_param(struct pipe_screen* pscreen,
>> enum pipe_cap param)>
>> case PIPE_CAP_TEXTURE_QUERY_LOD:
>> case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
>>
>> case PIPE_CAP_SAMPLER_VIEW_TARGET:
>> +   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
>> return family >= CHIP_CEDAR ? 1 : 0;
>> case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
>> return family >= CHIP_CEDAR ? 4 : 0;
>>
>> @@ -349,7 +350,6 @@ static int r600_get_param(struct pipe_screen* pscreen,
>> enum pipe_cap param)>
>> case PIPE_CAP_SHAREABLE_SHADERS:
>> case PIPE_CAP_CLEAR_TEXTURE:
>>
>> case PIPE_CAP_DRAW_PARAMETERS:
>> -   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
>> return 0;
>> case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>> b/src/gallium/drivers/r600/r600_shader.c index 9c040ae..7b1eade 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -8960,6 +8960,105 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
>>
>> return 0;
>>
>>  }
>>
>> +static int tgsi_pk2h(struct r600_shader_ctx *ctx)
>> +{
>> +   struct tgsi_full_instruction *inst =
>> &ctx->parse.FullToken.FullInstruction;
>> +   struct r600_bytecode_alu alu;
>> +   int r, i;
>> +   int lasti =
>> tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>> +
>> +   /* temp.xy = f32_to_f16(src) */
>> +   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>> +   alu.op = ALU_OP1_FLT32_TO_FLT16;
>> +   alu.dst.chan = 0;
>> +   alu.dst.sel = ctx->temp_reg;
>> +   alu.dst.write = 1;
>> +   r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
>> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
>> +   if (r)
>> +   return r;
>> +   alu.dst.chan = 1;
>> +   r600_bytecode_src(&alu.src[0], &ctx->src[0], 1);
>> +   alu.last = 1;
>> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
>> +   if (r)
>> +   return r;
>> +
>> +   /* dst.x = temp.y * 0x1 + temp.x */
>> +   for (i = 0; i < lasti + 1; i++) {
>> +   if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
>> +   continue;
>> +
>> +   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>> +   alu.op = ALU_OP3_MULADD_UINT24;
>> +   alu.is_op3 = 1;
>> +   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
>> +   alu.last = i == lasti;
>> +   alu.src[0].sel = ctx->temp_reg;
>> +   alu.src[0].chan = 1;
>> +   alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
>> +   alu.src[1].value = 0x1;
>> +   alu.src[2].sel = ctx->temp_reg;
>> +   alu.src[2].chan = 0;
>> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
>> +   if (r)
>> +   return r;
>> +   }
>> +
>> +   return 0;
>> +}
>> +
>> +static int tgsi_up2h(struct r600_shader_ctx *ctx)
>> +{
>> +   struct tgsi_full_instruction *inst =
>> &ctx->parse.FullToken.FullInstruction;
>> +   struct r600_bytecode_alu alu;
>> +   int r, i;
>> +   int lasti =
>> tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>> +
>> +   /* temp.x = src.x */
>> +   /* note: no need to mask out the high bits */
>> +   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>> +   alu.op = ALU_OP1_MOV;
>> +   alu.dst.chan = 0;
>> +   alu.dst.sel = ctx->temp_reg;
>> +   alu.dst.write = 1;
>> +   r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
>> +   r = r600_bytecode_add_alu(ctx->bc, &alu);
>> +   if (r)
>> +   return r;
>> +
>> +   /* temp.y = src.x >> 16 */
>> +   memset(&al

Re: [Mesa-dev] [RFC] egl: stop claiming support for pbuffer + msaa (RFC)

2016-09-26 Thread Marek Olšák

Sounds good to me. I think only legacy applications would use
pbuffers. There is no reason to use pbuffers on anything that has
GL_ARB_framebuffer_object (pbuffers were used to do render-to-texture
when FBOs didn't exist).

Reviewed-by: Marek Olšák 

Marek

On Mon, Sep 26, 2016 at 9:41 AM, Tapani Pälli  wrote:
> This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test
> and same crash in many dEQP EGL tests.
>
> I also found that some Qt example did a workaround because of this
> crash: https://bugreports.qt.io/browse/QTBUG-47509
>
> Signed-off-by: Tapani Pälli 
> ---
>
> This is RFC as I'm not sure if we are supposed to support this. I tried
> to verify overall pbuffer situation with some mesa-demos using pbuffer
> but those are not working for me at all with or without my patch.
>
>  src/egl/main/eglconfig.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/egl/main/eglconfig.c b/src/egl/main/eglconfig.c
> index 6161d26..20cf9d4 100644
> --- a/src/egl/main/eglconfig.c
> +++ b/src/egl/main/eglconfig.c
> @@ -407,6 +407,11 @@ _eglValidateConfig(const _EGLConfig *conf, EGLBoolean 
> for_matching)
>return EGL_FALSE;
> }
>
> +   /* pbuffer with MSAA not supported */
> +   if (conf->SurfaceType & EGL_PBUFFER_BIT && conf->Samples) {
> +  return EGL_FALSE;
> +   }
> +
> if (!(conf->SurfaceType & EGL_WINDOW_BIT)) {
>if (conf->NativeVisualID != 0 || conf->NativeVisualType != EGL_NONE)
>   valid = EGL_FALSE;
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH 2/2] st/mesa: enable ARB_ES3_2_compatibility when enough available

2016-09-26 Thread Marek Olšák

For the series:

Acked-by: Marek Olšák 

Marek

On Fri, Sep 23, 2016 at 2:52 AM, Ilia Mirkin  wrote:
> ping
>
> On Tue, Sep 13, 2016 at 8:54 PM, Ilia Mirkin  wrote:
>> Signed-off-by: Ilia Mirkin 
>> ---
>>  src/mesa/state_tracker/st_extensions.c | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
>> index 4d54928..55019d7 100644
>> --- a/src/mesa/state_tracker/st_extensions.c
>> +++ b/src/mesa/state_tracker/st_extensions.c
>> @@ -1246,4 +1246,24 @@ void st_init_extensions(struct pipe_screen *screen,
>>extensions->OES_texture_buffer &&
>>extensions->OES_texture_cube_map_array &&
>>extensions->EXT_texture_sRGB_decode;
>> +
>> +   /* Same deal as for ARB_ES3_1_compatibility - this has to be computed
>> +* before overall versions are selected. Also it's actually a subset of ES
>> +* 3.2, since it doesn't require ASTC or advanced blending.
>> +*/
>> +   extensions->ARB_ES3_2_compatibility =
>> +  extensions->ARB_ES3_1_compatibility &&
>> +  extensions->KHR_robustness &&
>> +  extensions->ARB_copy_image &&
>> +  extensions->ARB_draw_buffers_blend &&
>> +  extensions->ARB_draw_elements_base_vertex &&
>> +  extensions->OES_geometry_shader &&
>> +  extensions->ARB_gpu_shader5 &&
>> +  extensions->ARB_sample_shading &&
>> +  extensions->ARB_tessellation_shader &&
>> +  extensions->ARB_texture_border_clamp &&
>> +  extensions->OES_texture_buffer &&
>> +  extensions->ARB_texture_cube_map_array &&
>> +  extensions->ARB_texture_stencil8 &&
>> +  extensions->ARB_texture_multisample;
>>  }
>> --
>> 2.7.3
>>

Re: [Mesa-dev] [PATCH 03/88] glsl: Add initial functions to

2016-09-26 Thread Eric Anholt

Timothy Arceri  writes:

> On Sun, 2016-09-25 at 13:26 -0700, Eric Anholt wrote:
>> Timothy Arceri  writes:
>> > +static void
>> > +test_put_key_and_get_key(void)
>> > +{
>> > +   struct program_cache *cache;
>> > +   bool result;
>> > +
>> > +   uint8_t key_a[20] = {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
>> > +   10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
>> > +   uint8_t key_b[20] = { 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>> > +   30, 33, 32, 33, 34, 35, 36, 37, 38, 39};
>> > +   uint8_t key_a_collide[20] =
>> > +{ 0,  1, 42, 43, 44, 45, 46, 47, 48, 49,
>> > +   50, 55, 52, 53, 54, 55, 56, 57, 58, 59};
>> > +
>> > +   cache = cache_create();
>> > +
>> > +   /* First test that cache_has_key returns false before cache_put_key */
>> > +   result = cache_has_key(cache, key_a);
>> > +   expect_equal(result, 0, "cache_has_key before key added");
>> > +
>> > +   /* Then a couple of tests of cache_put_key followed by cache_has_key */
>> > +   cache_put_key(cache, key_a);
>> > +   result = cache_has_key(cache, key_a);
>> > +   expect_equal(result, 1, "cache_has_key after key added");
>> > +
>> > +   cache_put_key(cache, key_b);
>> > +   result = cache_has_key(cache, key_b);
>> > +   expect_equal(result, 1, "2nd cache_has_key after key added");
>> > +
>> > +   /* Test that a key with the same two bytes as an existing key
>> > +* forces an eviction.
>> > +*/
>> > +   cache_put_key(cache, key_a_collide);
>> > +   result = cache_has_key(cache, key_a_collide);
>> > +   expect_equal(result, 1, "put_key of a colliding key lands in the cache");
>> > +
>> > +   result = cache_has_key(cache, key_a);
>> > +   expect_equal(result, 0, "put_key of a colliding key evicts from the cache");
>> > +
>> > +   /* And finally test that we can re-add the original key to re-evict
>> > +* the colliding key.
>> > +*/
>> > +   cache_put_key(cache, key_a);
>> > +   result = cache_has_key(cache, key_a);
>> > +   expect_equal(result, 1, "put_key of original key lands again");
>> > +
>> > +   result = cache_has_key(cache, key_a_collide);
>> > +   expect_equal(result, 0, "put_key of oiginal key evicts the colliding key");
>> 
>> "original"
>> 
>> I haven't yet figured out what the purpose of
>> cache_put_key()/cache_has_key() is.  I suppose I'll find out later in
>> the series.
>
> Since we cache a program rather than individual shaders we set a cache
> key for each shader and opportunistically skip compiling it next time
> we see the shader.
>
> If we happen to be using the shader to create a program we haven't seen
> before we end up having to fall back to compiling the shader later. 

Does that work out better than always skipping the shader compile until
link time, when you find out that you need it?
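
For readers following along, here is a minimal sketch of the behaviour the
quoted test exercises: a direct-mapped set of 20-byte keys in which a new key
evicts whatever already occupies its slot. It assumes the slot is chosen from
the first two key bytes, which is what the "same two bytes" comment in the test
suggests. The names key_cache, key_cache_put and key_cache_has are hypothetical;
this is not the code from the series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE 20      /* key length used by the quoted test */
#define SLOT_COUNT 65536 /* one slot per value of the first two bytes (assumption) */

/* Hypothetical direct-mapped key set: each slot holds at most one key. */
struct key_cache {
   uint8_t keys[SLOT_COUNT][KEY_SIZE];
   bool used[SLOT_COUNT];
};

static struct key_cache *
key_cache_create(void)
{
   /* calloc zeroes the table, so every slot starts out unused. */
   return calloc(1, sizeof(struct key_cache));
}

/* Map a key to its slot using only the first two bytes. */
static unsigned
key_cache_slot(const uint8_t *key)
{
   return ((unsigned) key[0] << 8) | key[1];
}

/* Store a key, overwriting (evicting) any key already in the same slot. */
static void
key_cache_put(struct key_cache *cache, const uint8_t *key)
{
   unsigned slot = key_cache_slot(key);

   memcpy(cache->keys[slot], key, KEY_SIZE);
   cache->used[slot] = true;
}

/* A key is present only if its slot is in use and all 20 bytes match. */
static bool
key_cache_has(const struct key_cache *cache, const uint8_t *key)
{
   unsigned slot = key_cache_slot(key);

   return cache->used[slot] &&
          memcmp(cache->keys[slot], key, KEY_SIZE) == 0;
}
```

With this layout, putting a key that shares its first two bytes with a stored
key replaces it, and re-adding the original key evicts the colliding one again,
which is the sequence the test walks through.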



[Mesa-dev] [PATCH v3 02/14] mesa/main: add support for ARB_compute_variable_groups_size

2016-09-26 Thread Samuel Pitoiset

v2: - update formatting spec quotations (Ian)
- move the total_invocations check outside of the loop (Ian)

Signed-off-by: Samuel Pitoiset 
---
 src/mesa/main/api_validate.c | 96 
 src/mesa/main/api_validate.h |  4 ++
 src/mesa/main/compute.c  | 17 +++
 src/mesa/main/context.c  |  6 +++
 src/mesa/main/dd.h   |  9 
 src/mesa/main/extensions_table.h |  1 +
 src/mesa/main/get.c  | 12 +
 src/mesa/main/get_hash_params.py |  3 ++
 src/mesa/main/mtypes.h   | 24 +-
 src/mesa/main/shaderapi.c|  1 +
 src/mesa/main/shaderobj.c|  2 +
 11 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 6cb626a..fa24854 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -1096,6 +1096,7 @@ GLboolean
 _mesa_validate_DispatchCompute(struct gl_context *ctx,
const GLuint *num_groups)
 {
+   struct gl_shader_program *prog;
int i;
FLUSH_CURRENT(ctx, 0);
 
@@ -1128,6 +1129,88 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx,
   }
}
 
+   /* The ARB_compute_variable_group_size spec says:
+*
+* "An INVALID_OPERATION error is generated by DispatchCompute if the active
+* program for the compute shader stage has a variable work group size."
+*/
+   prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
+   if (prog->Comp.LocalSizeVariable) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "glDispatchCompute(variable work group size forbidden)");
+  return GL_FALSE;
+   }
+
+   return GL_TRUE;
+}
+
+GLboolean
+_mesa_validate_DispatchComputeGroupSizeARB(struct gl_context *ctx,
+   const GLuint *num_groups,
+   const GLuint *group_size)
+{
+   struct gl_shader_program *prog;
+   GLuint total_invocations = 1;
+   int i;
+
+   FLUSH_CURRENT(ctx, 0);
+
+   if (!check_valid_to_compute(ctx, "glDispatchComputeGroupSizeARB"))
+  return GL_FALSE;
+
+   for (i = 0; i < 3; i++) {
+  /* The ARB_compute_variable_group_size spec says:
+   *
+   * "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if
+   * any of <group_size_x>, <group_size_y>, or <group_size_z> is less than
+   * or equal to zero or greater than the maximum local work group size for
+   * compute shaders with variable group size
+   * (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in the corresponding dimension."
+   *
+   * However, the "less than" is a spec bug because they are declared as
+   * unsigned integers.
+   */
+  if (group_size[i] == 0 ||
+  group_size[i] > ctx->Const.MaxComputeVariableGroupSize[i]) {
+ _mesa_error(ctx, GL_INVALID_VALUE,
+ "glDispatchComputeGroupSizeARB(group_size_%c)", 'x' + i);
+ return GL_FALSE;
+  }
+
+  /* The ARB_compute_variable_group_size spec says:
+   *
+   * "An INVALID_OPERATION error is generated by
+   * DispatchComputeGroupSizeARB if the active program for the compute
+   * shader stage has a fixed work group size."
+   */
+  prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
+  if (prog->Comp.LocalSize[i] != 0) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+ "glDispatchComputeGroupSizeARB(fixed work group size "
+ "forbidden)");
+ return GL_FALSE;
+  }
+
+  total_invocations *= group_size[i];
+   }
+
+   /* The ARB_compute_variable_group_size spec says:
+*
+* "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if
+* the product of <group_size_x>, <group_size_y>, and <group_size_z> exceeds
+* the implementation-dependent maximum local work group invocation count
+* for compute shaders with variable group size
+* (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB)."
+*/
+   if (total_invocations > ctx->Const.MaxComputeVariableGroupInvocations) {
+  _mesa_error(ctx, GL_INVALID_VALUE,
+  "glDispatchComputeGroupSizeARB(product of local_sizes "
+  "exceeds MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB "
+  "(%d > %d))", total_invocations,
+  ctx->Const.MaxComputeVariableGroupInvocations);
+  return GL_FALSE;
+   }
+
return GL_TRUE;
 }
 
@@ -1137,6 +1220,7 @@ valid_dispatch_indirect(struct gl_context *ctx,
 GLsizei size, const char *name)
 {
const uint64_t end = (uint64_t) indirect + size;
+   struct gl_shader_program *prog;
 
if (!check_valid_to_compute(ctx, name))
   return GL_FALSE;
@@ -1182,6 +1266,18 @@ valid_dispatch_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
+   /* The ARB_compute_variable_group_size spec says:
+*
+* "An INVALID_OPERATION error is generated if the active program for the
+* compute shader stage has a variable work group size.

[Mesa-dev] [PATCH v3 04/14] glsl: process local_size_variable input qualifier

2016-09-26 Thread Samuel Pitoiset

This is the new layout qualifier introduced by
ARB_compute_variable_group_size, which allows a compute shader to use a
variable work group size.
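
As an illustration only, a compute shader using the qualifier might look like
the GLSL below, held here as a C string the way an application would pass it to
glShaderSource(). The buffer block name and the indexing scheme are made up for
the example; gl_LocalGroupSizeARB is the built-in added later in this series.

```c
/* Hypothetical shader source: the local size is left unspecified here and
 * is supplied at dispatch time via glDispatchComputeGroupSizeARB().
 */
static const char *const cs_source =
   "#version 430\n"
   "#extension GL_ARB_compute_variable_group_size : require\n"
   "layout(local_size_variable) in;\n"
   "layout(std430, binding = 0) buffer Out { uint data[]; };\n"
   "void main()\n"
   "{\n"
   "   uint i = gl_WorkGroupID.x * gl_LocalGroupSizeARB.x +\n"
   "            gl_LocalInvocationID.x;\n"
   "   data[i] = i;\n"
   "}\n";
```

Declaring local_size_x/y/z in the same shader would be a compile-time error
according to the extension, so the sketch uses only the variable qualifier.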

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ian Romanick 
---
 src/compiler/glsl/ast.h  |  5 +
 src/compiler/glsl/ast_type.cpp   |  6 ++
 src/compiler/glsl/glsl_parser.yy | 13 +
 src/compiler/glsl/glsl_parser_extras.cpp |  6 ++
 src/compiler/glsl/glsl_parser_extras.h   |  6 ++
 5 files changed, 36 insertions(+)

diff --git a/src/compiler/glsl/ast.h b/src/compiler/glsl/ast.h
index 4c648d0..55f009a 100644
--- a/src/compiler/glsl/ast.h
+++ b/src/compiler/glsl/ast.h
@@ -553,6 +553,11 @@ struct ast_type_qualifier {
   */
  unsigned local_size:3;
 
+/** \name Layout qualifiers for ARB_compute_variable_group_size. */
+/** \{ */
+unsigned local_size_variable:1;
+/** \} */
+
 /** \name Layout and memory qualifiers for ARB_shader_image_load_store. */
 /** \{ */
 unsigned early_fragment_tests:1;
diff --git a/src/compiler/glsl/ast_type.cpp b/src/compiler/glsl/ast_type.cpp
index f3f6b29..3f19f1f 100644
--- a/src/compiler/glsl/ast_type.cpp
+++ b/src/compiler/glsl/ast_type.cpp
@@ -503,6 +503,7 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
  state->in_qualifier->flags.q.local_size == 0;
 
   valid_in_mask.flags.q.local_size = 7;
+  valid_in_mask.flags.q.local_size_variable = 1;
   break;
default:
   _mesa_glsl_error(loc, state,
@@ -580,6 +581,10 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
   this->point_mode = q.point_mode;
}
 
+   if (q.flags.q.local_size_variable) {
+  state->cs_input_local_size_variable_specified = true;
+   }
+
if (create_node) {
   if (create_gs_ast) {
  node = new(mem_ctx) ast_gs_input_layout(*loc, q.prim_type);
@@ -653,6 +658,7 @@ ast_type_qualifier::validate_flags(YYLTYPE *loc,
 bad.flags.q.prim_type ? " prim_type" : "",
 bad.flags.q.max_vertices ? " max_vertices" : "",
 bad.flags.q.local_size ? " local_size" : "",
+bad.flags.q.local_size_variable ? " local_size_variable" : "",
 bad.flags.q.early_fragment_tests ? " early_fragment_tests" : "",
 bad.flags.q.explicit_image_format ? " image_format" : "",
 bad.flags.q.coherent ? " coherent" : "",
diff --git a/src/compiler/glsl/glsl_parser.yy b/src/compiler/glsl/glsl_parser.yy
index 9e1fd9e..38cbd3f 100644
--- a/src/compiler/glsl/glsl_parser.yy
+++ b/src/compiler/glsl/glsl_parser.yy
@@ -1491,6 +1491,19 @@ layout_qualifier_id:
  }
   }
 
+  /* Layout qualifiers for ARB_compute_variable_group_size. */
+  if (!$$.flags.i) {
+ if (match_layout_qualifier($1, "local_size_variable", state) == 0) {
+$$.flags.q.local_size_variable = 1;
+ }
+
+ if ($$.flags.i && !state->ARB_compute_variable_group_size_enable) {
+_mesa_glsl_error(& @1, state,
+ "qualifier `local_size_variable` requires "
+ "ARB_compute_variable_group_size");
+ }
+  }
+
   if (!$$.flags.i) {
  _mesa_glsl_error(& @1, state, "unrecognized layout identifier "
   "`%s'", $1);
diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index eff5235..e200b7d 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -297,6 +297,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
   sizeof(this->atomic_counter_offsets));
this->allow_extension_directive_midshader =
   ctx->Const.AllowGLSLExtensionDirectiveMidShader;
+
+   this->cs_input_local_size_variable_specified = false;
 }
 
 /**
@@ -1676,6 +1678,7 @@ set_shader_inout_layout(struct gl_shader *shader,
if (shader->Stage != MESA_SHADER_COMPUTE) {
   /* Should have been prevented by the parser. */
   assert(!state->cs_input_local_size_specified);
+  assert(!state->cs_input_local_size_variable_specified);
}
 
if (shader->Stage != MESA_SHADER_FRAGMENT) {
@@ -1791,6 +1794,9 @@ set_shader_inout_layout(struct gl_shader *shader,
  for (int i = 0; i < 3; i++)
 shader->info.Comp.LocalSize[i] = 0;
   }
+
+  shader->info.Comp.LocalSizeVariable =
+ state->cs_input_local_size_variable_specified;
   break;
 
case MESA_SHADER_FRAGMENT:
diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h
index 7528df7..127edbc 100644
--- a/src/compiler/glsl/glsl_parser_extras.h
+++ b/src/compiler/glsl/glsl_parser_extras.h
@@ -405,6 +405,12 @@ struct _mesa_glsl_parse_state {
unsigned cs_input_local_size[3];
 
/**
+* True if a compute shader input local variable size was specified using
+* a layout d

[Mesa-dev] [PATCH v3 00/14] add support for ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset

v3: - use a new case statement in r600_pipe_common.c
- fix compilation with softpipe
- initialize max_variable_threads_per_block to 0

v2: - update formatting spec quotations
- add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
- expose the ext based on that new cap
- add missing relnotes
- various cosmetic changes

From original cover-letter:

Hi,

This series implements ARB_compute_variable_group_size, written against GL 4.3.
This extension allows dispatching with a variable work group size via a new
function called glDispatchComputeGroupSizeARB().
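
A rough sketch of the group-size checks the extension asks for at dispatch time
follows. It is a simplified stand-in rather than the validation code from the
series: the struct and function names are hypothetical, and the limits stand in
for MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two implementation-dependent limits. */
struct variable_group_limits {
   uint32_t max_size[3];     /* per-dimension maximum */
   uint32_t max_invocations; /* maximum product of the three sizes */
};

/* Returns true when a requested local group size is acceptable:
 * every dimension is non-zero and within its limit, and the total
 * number of invocations does not exceed the limit.
 */
static bool
group_size_is_valid(const struct variable_group_limits *limits,
                    const uint32_t group_size[3])
{
   /* 64-bit accumulator so the product cannot wrap around. */
   uint64_t total_invocations = 1;

   for (int i = 0; i < 3; i++) {
      if (group_size[i] == 0 || group_size[i] > limits->max_size[i])
         return false;

      total_invocations *= group_size[i];
   }

   return total_invocations <= limits->max_invocations;
}
```

With limits of 512 x 512 x 64 and 512 invocations, a request of 8 x 8 x 8 is
accepted (exactly 512 invocations), while 16 x 16 x 4 is rejected because its
product is 1024.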

Because this extension is pretty similar to ARB_compute_shader, all Gallium
drivers which already support compute shaders will expose
ARB_compute_variable_group_size with this series.

I did write a bunch of piglit tests, have a look here if you want:
https://lists.freedesktop.org/archives/piglit/2016-September/020755.html

All tests pass on Fermi (GF119), as do all previous compute shader tests.

Marek, Nicolai and other AMD folks, I don't know if radeonsi will need a fix
somewhere to handle a variable work group size, and as I don't have the
hardware, I can't test. Let me know if something needs to be slightly updated.

Please review,
Thanks!

Samuel Pitoiset (14):
  glapi: add entry points for GL_ARB_compute_variable_group_size
  mesa/main: add support for ARB_compute_variable_groups_size
  glsl: add enable flags for ARB_compute_variable_group_size
  glsl: process local_size_variable input qualifier
  glsl: reject compute shaders with fixed and variable local size
  glsl/linker: handle errors when a variable local size is used
  glsl: add gl_LocalGroupSizeARB as a system value
  gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
  st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE
  st/mesa: add support for dispatching a variable local size
  st/mesa: expose ARB_compute_variable_group_size
  nv50/ir: use 1024 threads/block for variable local size
  nvc0: expose ARB_compute_variable_group_size
  docs: mark ARB_compute_variable_group_size as done for nvc0

 docs/features.txt  |  2 +-
 docs/relnotes/12.1.0.html  |  1 +
 src/compiler/glsl/ast.h|  5 ++
 src/compiler/glsl/ast_to_hir.cpp   | 14 
 src/compiler/glsl/ast_type.cpp |  6 ++
 src/compiler/glsl/builtin_variables.cpp|  6 ++
 src/compiler/glsl/glsl_parser.yy   | 13 +++
 src/compiler/glsl/glsl_parser_extras.cpp   |  7 ++
 src/compiler/glsl/glsl_parser_extras.h |  8 ++
 src/compiler/glsl/linker.cpp   | 25 +-
 src/compiler/glsl/standalone.cpp   |  4 +
 src/compiler/glsl/standalone_scaffolding.cpp   |  5 ++
 src/compiler/shader_enums.h|  1 +
 src/gallium/docs/source/screen.rst |  4 +
 src/gallium/drivers/ilo/ilo_screen.c   |  2 +
 .../drivers/nouveau/codegen/nv50_ir_target.h   |  3 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  2 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  1 +
 src/gallium/drivers/radeon/r600_pipe_common.c  |  2 +
 src/gallium/drivers/softpipe/sp_screen.c   |  1 +
 src/gallium/include/pipe/p_defines.h   |  3 +-
 .../glapi/gen/ARB_compute_variable_group_size.xml  | 25 ++
 src/mapi/glapi/gen/Makefile.am |  1 +
 src/mapi/glapi/gen/gl_API.xml  |  4 +-
 src/mesa/main/api_validate.c   | 96 ++
 src/mesa/main/api_validate.h   |  4 +
 src/mesa/main/compute.c| 25 ++
 src/mesa/main/compute.h|  5 ++
 src/mesa/main/context.c|  6 ++
 src/mesa/main/dd.h |  9 ++
 src/mesa/main/extensions_table.h   |  1 +
 src/mesa/main/get.c| 12 +++
 src/mesa/main/get_hash_params.py   |  3 +
 src/mesa/main/mtypes.h | 24 +-
 src/mesa/main/shaderapi.c  |  1 +
 src/mesa/main/shaderobj.c  |  2 +
 src/mesa/main/tests/dispatch_sanity.cpp|  3 +
 src/mesa/state_tracker/st_cb_compute.c | 15 +++-
 src/mesa/state_tracker/st_extensions.c | 22 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  2 +
 40 files changed, 365 insertions(+), 10 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml

-- 
2.10.0


[Mesa-dev] [PATCH v3 01/14] glapi: add entry points for GL_ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset

v2: - correctly sort that new extension (Ian)
- fix up the comment (Ian)

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ian Romanick 
---
 .../glapi/gen/ARB_compute_variable_group_size.xml  | 25 ++
 src/mapi/glapi/gen/Makefile.am |  1 +
 src/mapi/glapi/gen/gl_API.xml  |  4 +++-
 src/mesa/main/compute.c|  8 +++
 src/mesa/main/compute.h|  5 +
 src/mesa/main/tests/dispatch_sanity.cpp|  3 +++
 6 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml

diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
new file mode 100644
index 000..b21c52f
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+
+  
+  
+  
+  
+
+  
+
+
+
+
+
+
+  
+
+
+
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index ba5d144..bd51157 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -120,6 +120,7 @@ API_XML = \
ARB_color_buffer_float.xml \
ARB_compressed_texture_pixel_storage.xml \
ARB_compute_shader.xml \
+   ARB_compute_variable_group_size.xml \
ARB_copy_buffer.xml \
ARB_copy_image.xml \
ARB_debug_output.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 17c59db..5998ccf 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8254,7 +8254,9 @@
 
 
 
-
+
+
+http://www.w3.org/2001/XInclude"/>
 
 http://www.w3.org/2001/XInclude"/>
 
diff --git a/src/mesa/main/compute.c b/src/mesa/main/compute.c
index b71430f..b052bae 100644
--- a/src/mesa/main/compute.c
+++ b/src/mesa/main/compute.c
@@ -60,3 +60,11 @@ _mesa_DispatchComputeIndirect(GLintptr indirect)
 
ctx->Driver.DispatchComputeIndirect(ctx, indirect);
 }
+
+void GLAPIENTRY
+_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y,
+  GLuint num_groups_z, GLuint group_size_x,
+  GLuint group_size_y, GLuint group_size_z)
+{
+
+}
diff --git a/src/mesa/main/compute.h b/src/mesa/main/compute.h
index 0cc034f..8018bbb 100644
--- a/src/mesa/main/compute.h
+++ b/src/mesa/main/compute.h
@@ -35,4 +35,9 @@ _mesa_DispatchCompute(GLuint num_groups_x,
 extern void GLAPIENTRY
 _mesa_DispatchComputeIndirect(GLintptr indirect);
 
+extern void GLAPIENTRY
+_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y,
+  GLuint num_groups_z, GLuint group_size_x,
+  GLuint group_size_y, GLuint group_size_z);
+
 #endif
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
index 0d3b6ab..3fdd80a 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -942,6 +942,9 @@ const struct function common_desktop_functions_possible[] = {
{ "glDispatchCompute", 43, -1 },
{ "glDispatchComputeIndirect", 43, -1 },
 
+   /* GL_ARB_compute_variable_group_size */
+   { "glDispatchComputeGroupSizeARB", 43, -1 },
+
/* GL_EXT_polygon_offset_clamp */
{ "glPolygonOffsetClampEXT", 11, -1 },
 
-- 
2.10.0


[Mesa-dev] [PATCH v3 06/14] glsl/linker: handle errors when a variable local size is used

2016-09-26 Thread Samuel Pitoiset

Compute shaders can now include a fixed local size as defined by
ARB_compute_shader or a variable size as defined by
ARB_compute_variable_group_size.
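
The link-time rule can be sketched as below. This is a simplified, standalone
illustration under assumed names (cs_layout, merge_cs_layouts and the
link_result values are hypothetical), not the linker code itself; in
particular it leaves out the check that fixed sizes declared by several
shaders agree with each other.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-shader layout info: a fixed size of 0 means "not declared". */
struct cs_layout {
   unsigned local_size[3];
   bool local_size_variable;
};

enum link_result {
   LINK_OK,
   LINK_ERROR_MIXED,   /* both fixed and variable sizes were declared */
   LINK_ERROR_MISSING  /* neither a fixed nor a variable size was declared */
};

/* Combine the layouts of all compute shaders attached to one program. */
static enum link_result
merge_cs_layouts(const struct cs_layout *shaders, size_t count,
                 struct cs_layout *out)
{
   bool has_fixed = false;
   bool has_variable = false;

   *out = (struct cs_layout){ { 0, 0, 0 }, false };

   for (size_t i = 0; i < count; i++) {
      if (shaders[i].local_size[0] != 0) {
         has_fixed = true;
         for (int j = 0; j < 3; j++)
            out->local_size[j] = shaders[i].local_size[j];
      }
      if (shaders[i].local_size_variable)
         has_variable = true;
   }

   if (has_fixed && has_variable)
      return LINK_ERROR_MIXED;
   if (!has_fixed && !has_variable)
      return LINK_ERROR_MISSING;

   out->local_size_variable = has_variable;
   return LINK_OK;
}
```

Collecting both flags before deciding means the mixed case is reported
regardless of the order in which the shaders were attached.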

v2: - update formatting spec quotations (Ian)
- various cosmetic changes (Ian)

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ian Romanick 
---
 src/compiler/glsl/linker.cpp | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 929a653..b76a950 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -1989,6 +1989,8 @@ link_cs_input_layout_qualifiers(struct gl_shader_program *prog,
for (int i = 0; i < 3; i++)
   linked_shader->info.Comp.LocalSize[i] = 0;
 
+   linked_shader->info.Comp.LocalSizeVariable = false;
+
/* This function is called for all shader stages, but it only has an effect
 * for compute shaders.
 */
@@ -2023,6 +2025,20 @@ link_cs_input_layout_qualifiers(struct gl_shader_program *prog,
 linked_shader->info.Comp.LocalSize[i] =
shader->info.Comp.LocalSize[i];
  }
+  } else if (shader->info.Comp.LocalSizeVariable) {
+ if (linked_shader->info.Comp.LocalSize[0] != 0) {
+/* The ARB_compute_variable_group_size spec says:
+ *
+ * If one compute shader attached to a program declares a
+ * variable local group size and a second compute shader
+ * attached to the same program declares a fixed local group
+ * size, a link-time error results.
+ */
+linker_error(prog, "compute shader defined with both fixed and "
+ "variable local group size\n");
+return;
+ }
+ linked_shader->info.Comp.LocalSizeVariable = true;
   }
}
 
@@ -2030,12 +2046,17 @@ link_cs_input_layout_qualifiers(struct gl_shader_program *prog,
 * since we already know we're in the right type of shader program
 * for doing it.
 */
-   if (linked_shader->info.Comp.LocalSize[0] == 0) {
-  linker_error(prog, "compute shader didn't declare local size\n");
+   if (linked_shader->info.Comp.LocalSize[0] == 0 &&
+   !linked_shader->info.Comp.LocalSizeVariable) {
+  linker_error(prog, "compute shader must contain a fixed or a variable "
+ "local group size\n");
   return;
}
for (int i = 0; i < 3; i++)
   prog->Comp.LocalSize[i] = linked_shader->info.Comp.LocalSize[i];
+
+   prog->Comp.LocalSizeVariable =
+  linked_shader->info.Comp.LocalSizeVariable;
 }
 
 
-- 
2.10.0


[Mesa-dev] [PATCH v3 05/14] glsl: reject compute shaders with fixed and variable local size

2016-09-26 Thread Samuel Pitoiset

The ARB_compute_variable_group_size specification explains that
when a compute shader includes both a fixed and a variable local
size, a compile-time error occurs.

v2: - update formatting spec quotations (Ian)

Signed-off-by: Samuel Pitoiset 
---
 src/compiler/glsl/ast_to_hir.cpp | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
index 9de8454..392da40 100644
--- a/src/compiler/glsl/ast_to_hir.cpp
+++ b/src/compiler/glsl/ast_to_hir.cpp
@@ -8041,6 +8041,20 @@ ast_cs_input_layout::hir(exec_list *instructions,
   }
}
 
+   /* The ARB_compute_variable_group_size spec says:
+*
+* If a compute shader including a *local_size_variable* qualifier also
+* declares a fixed local group size using the *local_size_x*,
+* *local_size_y*, or *local_size_z* qualifiers, a compile-time error
+* results
+*/
+   if (state->cs_input_local_size_variable_specified) {
+  _mesa_glsl_error(&loc, state,
+   "compute shader can't include both a variable and a "
+   "fixed local group size");
+  return NULL;
+   }
+
state->cs_input_local_size_specified = true;
for (int i = 0; i < 3; i++)
   state->cs_input_local_size[i] = qual_local_size[i];
-- 
2.10.0


[Mesa-dev] [PATCH v3 07/14] glsl: add gl_LocalGroupSizeARB as a system value

2016-09-26 Thread Samuel Pitoiset

v2: - only add it if the ext is enabled (Ilia)

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ian Romanick 
---
 src/compiler/glsl/builtin_variables.cpp | 6 ++
 src/compiler/shader_enums.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/src/compiler/glsl/builtin_variables.cpp b/src/compiler/glsl/builtin_variables.cpp
index 36a8667..10a8750 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -1249,6 +1249,12 @@ builtin_variable_generator::generate_cs_special_vars()
 "gl_LocalInvocationID");
add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID");
add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, "gl_NumWorkGroups");
+
+   if (state->ARB_compute_variable_group_size_enable) {
+  add_system_value(SYSTEM_VALUE_LOCAL_GROUP_SIZE,
+   uvec3_t, "gl_LocalGroupSizeARB");
+   }
+
if (state->ctx->Const.LowerCsDerivedVariables) {
   add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
   add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h
index c3a62e0..b6e048e 100644
--- a/src/compiler/shader_enums.h
+++ b/src/compiler/shader_enums.h
@@ -472,6 +472,7 @@ typedef enum
SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
SYSTEM_VALUE_WORK_GROUP_ID,
SYSTEM_VALUE_NUM_WORK_GROUPS,
+   SYSTEM_VALUE_LOCAL_GROUP_SIZE,
/*@}*/
 
/**
-- 
2.10.0


[Mesa-dev] [PATCH v3 13/14] nvc0: expose ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset

Let's return the same number of threads per block for both fixed and
variable sizes.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index df6c6af..6540c31 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -446,6 +446,7 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
   }
case PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE:
   RET(((uint64_t []) { 1024, 1024, 64 }));
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
case PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK:
   RET((uint64_t []) { 1024 });
case PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE: /* g[] */
@@ -478,8 +479,6 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
   RET((uint32_t []) { 512 }); /* FIXME: arbitrary limit */
case PIPE_COMPUTE_CAP_ADDRESS_BITS:
   RET((uint32_t []) { 64 });
-   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
-  RET((uint64_t []) { 0 });
default:
   return 0;
}
-- 
2.10.0


[Mesa-dev] [PATCH v3 10/14] st/mesa: add support for dispatching a variable local size

2016-09-26 Thread Samuel Pitoiset

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Marek Olšák 
---
 src/mesa/state_tracker/st_cb_compute.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_compute.c b/src/mesa/state_tracker/st_cb_compute.c
index 88c1ee2..ccc5dc2 100644
--- a/src/mesa/state_tracker/st_cb_compute.c
+++ b/src/mesa/state_tracker/st_cb_compute.c
@@ -36,6 +36,7 @@
 
 static void st_dispatch_compute_common(struct gl_context *ctx,
const GLuint *num_groups,
+   const GLuint *group_size,
struct pipe_resource *indirect,
GLintptr indirect_offset)
 {
@@ -56,7 +57,7 @@ static void st_dispatch_compute_common(struct gl_context *ctx,
   st_validate_state(st, ST_PIPELINE_COMPUTE);
 
for (unsigned i = 0; i < 3; i++) {
-  info.block[i] = prog->Comp.LocalSize[i];
+  info.block[i] = group_size ? group_size[i] : prog->Comp.LocalSize[i];
   info.grid[i]  = num_groups ? num_groups[i] : 0;
}
 
@@ -71,7 +72,7 @@ static void st_dispatch_compute_common(struct gl_context *ctx,
 static void st_dispatch_compute(struct gl_context *ctx,
 const GLuint *num_groups)
 {
-   st_dispatch_compute_common(ctx, num_groups, NULL, 0);
+   st_dispatch_compute_common(ctx, num_groups, NULL, NULL, 0);
 }
 
 static void st_dispatch_compute_indirect(struct gl_context *ctx,
@@ -80,11 +81,19 @@ static void st_dispatch_compute_indirect(struct gl_context *ctx,
struct gl_buffer_object *indirect_buffer = ctx->DispatchIndirectBuffer;
struct pipe_resource *indirect = st_buffer_object(indirect_buffer)->buffer;
 
-   st_dispatch_compute_common(ctx, NULL, indirect, indirect_offset);
+   st_dispatch_compute_common(ctx, NULL, NULL, indirect, indirect_offset);
+}
+
+static void st_dispatch_compute_group_size(struct gl_context *ctx,
+   const GLuint *num_groups,
+   const GLuint *group_size)
+{
+   st_dispatch_compute_common(ctx, num_groups, group_size, NULL, 0);
 }
 
 void st_init_compute_functions(struct dd_function_table *functions)
 {
functions->DispatchCompute = st_dispatch_compute;
functions->DispatchComputeIndirect = st_dispatch_compute_indirect;
+   functions->DispatchComputeGroupSize = st_dispatch_compute_group_size;
 }
-- 
2.10.0


[Mesa-dev] [PATCH v3 03/14] glsl: add enable flags for ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset

This also initializes the default values for the standalone compiler.

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ian Romanick 
---
 src/compiler/glsl/glsl_parser_extras.cpp | 1 +
 src/compiler/glsl/glsl_parser_extras.h   | 2 ++
 src/compiler/glsl/standalone.cpp | 4 
 src/compiler/glsl/standalone_scaffolding.cpp | 5 +
 4 files changed, 12 insertions(+)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index b108afd..eff5235 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -590,6 +590,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
EXT(ARB_ES3_2_compatibility),
EXT(ARB_arrays_of_arrays),
EXT(ARB_compute_shader),
+   EXT(ARB_compute_variable_group_size),
EXT(ARB_conservative_depth),
EXT(ARB_cull_distance),
EXT(ARB_derivative_control),
diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h
index f4050e3..7528df7 100644
--- a/src/compiler/glsl/glsl_parser_extras.h
+++ b/src/compiler/glsl/glsl_parser_extras.h
@@ -576,6 +576,8 @@ struct _mesa_glsl_parse_state {
bool ARB_arrays_of_arrays_warn;
bool ARB_compute_shader_enable;
bool ARB_compute_shader_warn;
+   bool ARB_compute_variable_group_size_enable;
+   bool ARB_compute_variable_group_size_warn;
bool ARB_conservative_depth_enable;
bool ARB_conservative_depth_warn;
bool ARB_cull_distance_enable;
diff --git a/src/compiler/glsl/standalone.cpp b/src/compiler/glsl/standalone.cpp
index d6e6829..90847a9 100644
--- a/src/compiler/glsl/standalone.cpp
+++ b/src/compiler/glsl/standalone.cpp
@@ -58,6 +58,10 @@ initialize_context(struct gl_context *ctx, gl_api api)
ctx->Const.MaxComputeWorkGroupSize[2] = 64;
ctx->Const.MaxComputeWorkGroupInvocations = 1024;
ctx->Const.MaxComputeSharedMemorySize = 32768;
+   ctx->Const.MaxComputeVariableGroupSize[0] = 512;
+   ctx->Const.MaxComputeVariableGroupSize[1] = 512;
+   ctx->Const.MaxComputeVariableGroupSize[2] = 64;
+   ctx->Const.MaxComputeVariableGroupInvocations = 512;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents = 1024;
diff --git a/src/compiler/glsl/standalone_scaffolding.cpp 
b/src/compiler/glsl/standalone_scaffolding.cpp
index b0fb4b7..decff5f 100644
--- a/src/compiler/glsl/standalone_scaffolding.cpp
+++ b/src/compiler/glsl/standalone_scaffolding.cpp
@@ -144,6 +144,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, 
gl_api api)
ctx->Extensions.dummy_false = false;
ctx->Extensions.dummy_true = true;
ctx->Extensions.ARB_compute_shader = true;
+   ctx->Extensions.ARB_compute_variable_group_size = true;
ctx->Extensions.ARB_conservative_depth = true;
ctx->Extensions.ARB_draw_instanced = true;
ctx->Extensions.ARB_ES2_compatibility = true;
@@ -207,6 +208,10 @@ void initialize_context_to_defaults(struct gl_context 
*ctx, gl_api api)
ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
ctx->Const.MaxComputeWorkGroupSize[2] = 64;
ctx->Const.MaxComputeWorkGroupInvocations = 1024;
+   ctx->Const.MaxComputeVariableGroupSize[0] = 512;
+   ctx->Const.MaxComputeVariableGroupSize[1] = 512;
+   ctx->Const.MaxComputeVariableGroupSize[2] = 64;
+   ctx->Const.MaxComputeVariableGroupInvocations = 512;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /* not used 
*/
-- 
2.10.0


[Mesa-dev] [PATCH v3 11/14] st/mesa: expose ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset

This extension is only exposed if the underlying driver supports
ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK
is set.

v3: - initialize max_variable_threads_per_block to 0
v2: - expose the ext based on that new cap

Signed-off-by: Samuel Pitoiset 
---
 src/mesa/state_tracker/st_extensions.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index ef17aba..024dba8 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1196,6 +1196,28 @@ void st_init_extensions(struct pipe_screen *screen,
  extensions->ARB_compute_shader =
   extensions->ARB_shader_image_load_store 
&&
   extensions->ARB_shader_atomic_counters;
+
+ if (extensions->ARB_compute_shader) {
+uint64_t max_variable_threads_per_block = 0;
+
+screen->get_compute_param(screen, PIPE_SHADER_IR_TGSI,
+  
PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK,
+  &max_variable_threads_per_block);
+
+for (i = 0; i < 3; i++) {
+   /* Clamp the values to avoid having a local work group size
+* greater than the maximum number of invocations.
+*/
+   consts->MaxComputeVariableGroupSize[i] =
+  MIN2(consts->MaxComputeWorkGroupSize[i],
+   max_variable_threads_per_block);
+}
+consts->MaxComputeVariableGroupInvocations =
+   max_variable_threads_per_block;
+
+extensions->ARB_compute_variable_group_size =
+   max_variable_threads_per_block > 0;
+ }
   }
}
 
-- 
2.10.0
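
For readers skimming the series, the clamping in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions: `struct sketch_limits`, `sketch_min_u64` and `sketch_derive_variable_limits` are hypothetical names standing in for the gl_constants fields and MIN2, not Mesa API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant gl_constants fields. */
struct sketch_limits {
   uint32_t max_work_group_size[3];     /* fixed-size limits, per dimension */
   uint32_t max_variable_group_size[3]; /* derived below */
   uint32_t max_variable_group_invocations;
};

static uint64_t
sketch_min_u64(uint64_t a, uint64_t b)
{
   return a < b ? a : b;
}

/* Clamp each dimension so that no single dimension can exceed the total
 * number of invocations allowed for a variable-size group.  Returns true
 * when the extension would be exposed, i.e. when the reported limit is
 * non-zero. */
static bool
sketch_derive_variable_limits(struct sketch_limits *l,
                              uint64_t max_variable_threads_per_block)
{
   for (int i = 0; i < 3; i++) {
      l->max_variable_group_size[i] =
         (uint32_t)sketch_min_u64(l->max_work_group_size[i],
                                  max_variable_threads_per_block);
   }
   l->max_variable_group_invocations =
      (uint32_t)max_variable_threads_per_block;

   return max_variable_threads_per_block > 0;
}
```

With fixed limits of { 1024, 1024, 64 } and a reported cap of 512, the derived per-dimension limits come out as { 512, 512, 64 }; with a cap of 0 the extension stays disabled.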


[Mesa-dev] [PATCH v3 14/14] docs: mark ARB_compute_variable_group_size as done for nvc0

2016-09-26 Thread Samuel Pitoiset

Signed-off-by: Samuel Pitoiset 
---
 docs/features.txt | 2 +-
 docs/relnotes/12.1.0.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/features.txt b/docs/features.txt
index fbb3952..6cc429a 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -279,7 +279,7 @@ Khronos, ARB, and OES extensions that are not part of any 
OpenGL or OpenGL ES ve
 
   GL_ARB_bindless_texture   started (airlied)
   GL_ARB_cl_event   not started
-  GL_ARB_compute_variable_group_sizenot started
+  GL_ARB_compute_variable_group_sizeDONE (nvc0)
   GL_ARB_ES3_2_compatibilityDONE (i965/gen8+)
   GL_ARB_fragment_shader_interlock  not started
   GL_ARB_gl_spirv   not started
diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html
index cdd8909..01ba057 100644
--- a/docs/relnotes/12.1.0.html
+++ b/docs/relnotes/12.1.0.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_ES3_1_compatibility on i965
 GL_ARB_ES3_2_compatibility on i965/gen8+
 GL_ARB_clear_texture on r600, radeonsi
+GL_ARB_compute_variable_group_size on nvc0
 GL_ARB_cull_distance on radeonsi
 GL_ARB_enhanced_layouts on i965
 GL_ARB_indirect_parameters on radeonsi
-- 
2.10.0


[Mesa-dev] [PATCH v3 09/14] st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE

2016-09-26 Thread Samuel Pitoiset

gl_LocalGroupSizeARB can be translated into TGSI_SEMANTIC_BLOCK_SIZE
which represents the block size in threads.

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Marek Olšák 
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 507a782..429f4b0 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5235,6 +5235,8 @@ _mesa_sysval_to_semantic(unsigned sysval)
   return TGSI_SEMANTIC_BLOCK_ID;
case SYSTEM_VALUE_NUM_WORK_GROUPS:
   return TGSI_SEMANTIC_GRID_SIZE;
+   case SYSTEM_VALUE_LOCAL_GROUP_SIZE:
+  return TGSI_SEMANTIC_BLOCK_SIZE;
 
/* Unhandled */
case SYSTEM_VALUE_LOCAL_INVOCATION_INDEX:
-- 
2.10.0


[Mesa-dev] [PATCH v3 08/14] gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK

2016-09-26 Thread Samuel Pitoiset

v3: - use a new case statement in r600_pipe_common.c
- fix compilation of softpipe...

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/docs/source/screen.rst | 4 
 src/gallium/drivers/ilo/ilo_screen.c   | 2 ++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 ++
 src/gallium/drivers/radeon/r600_pipe_common.c  | 2 ++
 src/gallium/drivers/softpipe/sp_screen.c   | 1 +
 src/gallium/include/pipe/p_defines.h   | 3 ++-
 7 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 5dff650..cfc0a1b 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -498,6 +498,10 @@ pipe_screen::get_compute_param.
   threads. Also known as wavefront size, warp size or SIMD width.
 * ``PIPE_COMPUTE_CAP_ADDRESS_BITS``: The default compute device address space
   size specified as an unsigned integer value in bits.
+* ``PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK``: Maximum variable number
+  of threads that a single block can contain. This is similar to
+  PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK, except that the variable size is not
+  known at compile-time but at dispatch-time.
 
 .. _pipe_bind:
 
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index b9e5ad6..85357fa 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -303,6 +303,8 @@ ilo_get_compute_param(struct pipe_screen *screen,
   ptr = &val.subgroup_size;
   size = sizeof(val.subgroup_size);
   break;
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
+  /* fallthrough */
default:
   ptr = NULL;
   size = 0;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 1ec791d..6eb18ea 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -418,6 +418,8 @@ nv50_screen_get_compute_param(struct pipe_screen *pscreen,
   RET((uint32_t []) { 512 }); /* FIXME: arbitrary limit */
case PIPE_COMPUTE_CAP_ADDRESS_BITS:
   RET((uint32_t []) { 32 });
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
+  RET((uint64_t []) { 0 });
default:
   return 0;
}
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 1757cbb..df6c6af 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -478,6 +478,8 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
   RET((uint32_t []) { 512 }); /* FIXME: arbitrary limit */
case PIPE_COMPUTE_CAP_ADDRESS_BITS:
   RET((uint32_t []) { 64 });
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
+  RET((uint64_t []) { 0 });
default:
   return 0;
}
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index b0d9813..61b1b0e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -966,6 +966,8 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
*subgroup_size = r600_wavefront_size(rscreen->family);
}
return sizeof(uint32_t);
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
+   return 0;
}
 
 fprintf(stderr, "unknown PIPE_COMPUTE_CAP %d\n", param);
diff --git a/src/gallium/drivers/softpipe/sp_screen.c 
b/src/gallium/drivers/softpipe/sp_screen.c
index cd4269f..6ffc777 100644
--- a/src/gallium/drivers/softpipe/sp_screen.c
+++ b/src/gallium/drivers/softpipe/sp_screen.c
@@ -522,6 +522,7 @@ softpipe_get_compute_param(struct pipe_screen *_screen,
case PIPE_COMPUTE_CAP_IMAGES_SUPPORTED:
case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
case PIPE_COMPUTE_CAP_ADDRESS_BITS:
+   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
   break;
}
return 0;
diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 88aa050..655995e 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -847,7 +847,8 @@ enum pipe_compute_cap
PIPE_COMPUTE_CAP_MAX_CLOCK_FREQUENCY,
PIPE_COMPUTE_CAP_MAX_COMPUTE_UNITS,
PIPE_COMPUTE_CAP_IMAGES_SUPPORTED,
-   PIPE_COMPUTE_CAP_SUBGROUP_SIZE
+   PIPE_COMPUTE_CAP_SUBGROUP_SIZE,
+   PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK,
 };
 
 /**
-- 
2.10.0
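
A rough sketch of the query convention this patch relies on, since several drivers above simply return 0 for the new cap without writing anything. Everything named `sketch_*` or `SKETCH_*` is hypothetical and only mirrors the shape of get_compute_param; the value 1024 is a placeholder, not a claim about any driver.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down cap enum for illustration only. */
enum sketch_compute_cap {
   SKETCH_CAP_MAX_THREADS_PER_BLOCK,
   SKETCH_CAP_MAX_VARIABLE_THREADS_PER_BLOCK,
};

/* Driver side: copy the value into ret (if non-NULL) and return its size
 * in bytes.  For an unsupported cap, return 0 and leave ret untouched. */
static int
sketch_get_compute_param(int supports_variable,
                         enum sketch_compute_cap cap, void *ret)
{
   uint64_t value;

   switch (cap) {
   case SKETCH_CAP_MAX_THREADS_PER_BLOCK:
      value = 1024; /* placeholder limit */
      break;
   case SKETCH_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
      if (!supports_variable)
         return 0;
      value = 1024; /* placeholder limit */
      break;
   default:
      return 0;
   }

   if (ret)
      memcpy(ret, &value, sizeof(value));
   return (int)sizeof(value);
}

/* Caller side: the result must start at 0, because a driver that does not
 * support the cap may never write to it. */
static uint64_t
sketch_query_variable_threads(int supports_variable)
{
   uint64_t max_variable_threads_per_block = 0;

   sketch_get_compute_param(supports_variable,
                            SKETCH_CAP_MAX_VARIABLE_THREADS_PER_BLOCK,
                            &max_variable_threads_per_block);
   return max_variable_threads_per_block;
}
```

This is why the state tracker side initializes its local to 0 before querying: a zero result then reliably means "not supported".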


[Mesa-dev] [PATCH v3 12/14] nv50/ir: use 1024 threads/block for variable local size

2016-09-26 Thread Samuel Pitoiset

When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
index 4a701f7..0bb14ec 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
@@ -174,7 +174,8 @@ public:
virtual void getBuiltinCode(const uint32_t **code, uint32_t *size) const = 
0;
 
virtual void parseDriverInfo(const struct nv50_ir_prog_info *info) {
-  threads = info->prop.cp.numThreads;
+  threads =
+ info->prop.cp.numThreads == 0 ? 1024 : info->prop.cp.numThreads;
}
 
virtual bool runLegalizePass(Program *, CGStage stage) const = 0;
-- 
2.10.0
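
To make the failure mode concrete, here is a small sketch of the arithmetic, not the actual codegen. It assumes the register budget per thread is a register-file size divided by the thread count; `SKETCH_REGFILE_SIZE` (32768) and both function names are hypothetical, chosen only because they are consistent with the "32 regs" figure raised in review of 13/14.

```c
#include <stdint.h>

/* Assumed register-file size; a hypothetical value for illustration. */
#define SKETCH_REGFILE_SIZE 32768u

/* Mirrors the fix: a thread count of 0 means "variable local size", so
 * fall back to 1024 instead of dividing by zero later on. */
static uint32_t
sketch_effective_threads(uint32_t num_threads)
{
   return num_threads == 0 ? 1024 : num_threads;
}

/* Register budget per thread under the assumption above. */
static uint32_t
sketch_max_regs_per_thread(uint32_t num_threads)
{
   return SKETCH_REGFILE_SIZE / sketch_effective_threads(num_threads);
}
```

Under these assumptions a variable-size shader gets the same budget as a fixed 1024-thread one (32 registers), while a 512-thread limit would double that to 64.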


Re: [Mesa-dev] [PATCH v3 13/14] nvc0: expose ARB_compute_variable_group_size

2016-09-26 Thread Ilia Mirkin

FWIW this limits it to 32 regs on Fermi. IMO that's pretty limiting,
esp given how shitty our RA is. I think we should do 512 for Fermi and
1024 for Kepler+. [A matching adjustment will be needed in codegen.]

On Mon, Sep 26, 2016 at 1:23 PM, Samuel Pitoiset
 wrote:
> Let's return the same number of threads per block for both fixed and
> variable sizes.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index df6c6af..6540c31 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -446,6 +446,7 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
>}
> case PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE:
>RET(((uint64_t []) { 1024, 1024, 64 }));
> +   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
> case PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK:
>RET((uint64_t []) { 1024 });
> case PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE: /* g[] */
> @@ -478,8 +479,6 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
>RET((uint32_t []) { 512 }); /* FIXME: arbitrary limit */
> case PIPE_COMPUTE_CAP_ADDRESS_BITS:
>RET((uint32_t []) { 64 });
> -   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
> -  RET((uint64_t []) { 0 });
> default:
>return 0;
> }
> --
> 2.10.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V2] anv/blorp: Handle zero width/height blits in blorp_copy()

2016-09-26 Thread Anuj Phogat

V2: Move the check from copy_buffer_to_image() to blorp_copy(). (Nanley)

Signed-off-by: Anuj Phogat 
Cc: Nanley Chery 
---
 src/intel/blorp/blorp_blit.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index af46389..0c3ee72 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1838,8 +1838,11 @@ blorp_copy(struct blorp_batch *batch,
uint32_t src_width, uint32_t src_height)
 {
struct blorp_params params;
-   blorp_params_init(&params);
 
+   if (src_width == 0 || src_height == 0)
+  return;
+
+   blorp_params_init(&params);
brw_blorp_surface_info_init(batch->blorp, &params.src, src_surf, src_level,
src_layer, ISL_FORMAT_UNSUPPORTED, false);
brw_blorp_surface_info_init(batch->blorp, &params.dst, dst_surf, dst_level,
-- 
2.5.5


Re: [Mesa-dev] [PATCH V2] anv/blorp: Handle zero width/height blits in blorp_copy()

2016-09-26 Thread Nanley Chery

On Mon, Sep 26, 2016 at 10:22:43AM -0700, Anuj Phogat wrote:
> V2: Move the check from copy_buffer_to_image() to blorp_copy(). (Nanley)
> 
> Signed-off-by: Anuj Phogat 
> Cc: Nanley Chery 
> ---

This patch is
Reviewed-by: Nanley Chery 

>  src/intel/blorp/blorp_blit.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
> index af46389..0c3ee72 100644
> --- a/src/intel/blorp/blorp_blit.c
> +++ b/src/intel/blorp/blorp_blit.c
> @@ -1838,8 +1838,11 @@ blorp_copy(struct blorp_batch *batch,
> uint32_t src_width, uint32_t src_height)
>  {
> struct blorp_params params;
> -   blorp_params_init(&params);
>  
> +   if (src_width == 0 || src_height == 0)
> +  return;
> +
> +   blorp_params_init(&params);
> brw_blorp_surface_info_init(batch->blorp, &params.src, src_surf, 
> src_level,
> src_layer, ISL_FORMAT_UNSUPPORTED, false);
> brw_blorp_surface_info_init(batch->blorp, &params.dst, dst_surf, 
> dst_level,
> -- 
> 2.5.5
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2016-09-26 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #21 from Silvan Jegen  ---
(In reply to Eero Tamminen from comment #20)
> Best would be to do (e.g. from SSH console):
>   # perf record -a
>   ^C
> 
> During the game freeze.

I have a dual screen setup and ran 'perf record -a' on a terminal on the screen
that the game was not running on. I captured events for a short amount of time
during which the game was mostly stalling/freezing.


> And provide profile report here:
>   # perf report
> 
> (+ install debug symbols for anything that misses symbols in the report.)

I uploaded the perf report.

Most of the CPU time seems to be spent in RocketLeague itself for which there
are no symbols available.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2016-09-26 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #22 from Silvan Jegen  ---
Created attachment 126796
  --> https://bugs.freedesktop.org/attachment.cgi?id=126796&action=edit
perf report of RocketLeague stalling/freezing

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH 01/13] anv: Use blorp for VkCmdFillBuffer

2016-09-26 Thread Nanley Chery

On Sun, Sep 25, 2016 at 09:59:00AM -0700, Jason Ekstrand wrote:
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/vulkan/anv_blorp.c  | 106 +
>  src/intel/vulkan/anv_meta_clear.c | 120 
> --
>  2 files changed, 96 insertions(+), 130 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index cb61070..f5a6c40 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -480,6 +480,20 @@ void anv_CmdBlitImage(
> blorp_batch_finish(&batch);
>  }
>  
> +static enum isl_format
> +isl_format_for_size(unsigned size_B)
> +{
> +   switch (size_B) {
> +   case 1:  return ISL_FORMAT_R8_UINT;
> +   case 2:  return ISL_FORMAT_R8G8_UINT;
> +   case 4:  return ISL_FORMAT_R8G8B8A8_UINT;
> +   case 8:  return ISL_FORMAT_R16G16B16A16_UINT;
> +   case 16: return ISL_FORMAT_R32G32B32A32_UINT;
> +   default:
> +  unreachable("Not a power-of-two format size");
> +   }
> +}
> +
>  static void
>  do_buffer_copy(struct blorp_batch *batch,
> struct anv_bo *src, uint64_t src_offset,
> @@ -491,16 +505,7 @@ do_buffer_copy(struct blorp_batch *batch,
> /* The actual format we pick doesn't matter as blorp will throw it away.
>  * The only thing that actually matters is the size.
>  */
> -   enum isl_format format;
> -   switch (block_size) {
> -   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> -   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> -   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> -   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> -   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> -   default:
> -  unreachable("Not a power-of-two format size");
> -   }
> +   enum isl_format format = isl_format_for_size(block_size);
>  
> struct isl_surf surf;
> isl_surf_init(&device->isl_dev, &surf,
> @@ -667,6 +672,87 @@ void anv_CmdUpdateBuffer(
> blorp_batch_finish(&batch);
>  }
>  
> +void anv_CmdFillBuffer(
> +VkCommandBuffer commandBuffer,
> +VkBufferdstBuffer,
> +VkDeviceSizedstOffset,
> +VkDeviceSizefillSize,
> +uint32_tdata)
> +{
> +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> +   struct blorp_surf surf;
> +   struct isl_surf isl_surf;
> +
> +   struct blorp_batch batch;
> +   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer);
> +
> +   if (fillSize == VK_WHOLE_SIZE) {
> +  fillSize = dst_buffer->size - dstOffset;
> +  /* Make sure fillSize is a multiple of 4 */
> +  fillSize &= ~3ull;
> +   }
> +
> +   /* First, we compute the biggest format that can be used with the
> +* given offsets and size.
> +*/
> +   int bs = 16;
> +   bs = gcd_pow2_u64(bs, dstOffset);
> +   bs = gcd_pow2_u64(bs, fillSize);
> +   enum isl_format isl_format = isl_format_for_size(bs);
> +
> +   union isl_color_value color = {
> +  .u32 = { data, data, data, data },
> +   };
> +
> +   const uint64_t max_fill_size = MAX_SURFACE_DIM * MAX_SURFACE_DIM * bs;
> +   while (fillSize > max_fill_size) {
  ^
  This should be '>='.

> +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> +dst_buffer, dstOffset,
> +MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> +MAX_SURFACE_DIM * bs, isl_format,
> +&surf, &isl_surf);
> +
> +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> +  color, NULL);
> +  fillSize -= max_fill_size;
> +  dstOffset += max_fill_size;
> +   }
> +
> +   uint64_t height = fillSize / (MAX_SURFACE_DIM * bs);
> +   assert(height < MAX_SURFACE_DIM);
> +   if (height != 0) {
> +  const uint64_t rect_fill_size = height * MAX_SURFACE_DIM * bs;
> +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> +dst_buffer, dstOffset,
> +MAX_SURFACE_DIM, height,
> +MAX_SURFACE_DIM * bs, isl_format,
> +&surf, &isl_surf);
> +
> +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, height,
> +  color, NULL);
> +  fillSize -= rect_fill_size;
> +  dstOffset += rect_fill_size;
> +   }
> +
> +   if (fillSize != 0) {
> +  const uint32_t width = fillSize / bs;
> +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> +dst_buffer, dstOffset,
> +
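
To illustrate the '>=' remark above, here is a standalone sketch of how the fill gets split, written under assumptions rather than taken from the driver. `SKETCH_MAX_SURFACE_DIM` is assumed to be 1 << 14, `sketch_gcd_pow2` is a stand-in for gcd_pow2_u64(), and the `inclusive` flag exists only to compare the two loop conditions.

```c
#include <stdint.h>

#define SKETCH_MAX_SURFACE_DIM (1ull << 14) /* assumed value */

/* Largest power of two dividing both a and b, i.e. the lowest set bit of
 * (a | b).  A zero argument imposes no constraint.  The caller always
 * passes a non-zero first argument. */
static uint64_t
sketch_gcd_pow2(uint64_t a, uint64_t b)
{
   uint64_t bits = a | b;
   return bits & (~bits + 1);
}

struct sketch_fill_plan {
   uint64_t block_size;  /* bytes per texel: 1, 2, 4, 8 or 16 */
   uint64_t full_rects;  /* number of MAX_DIM x MAX_DIM clears */
   uint64_t rows;        /* height of the MAX_DIM-wide partial rect */
   uint64_t tail_texels; /* width of the final single-row clear */
};

static struct sketch_fill_plan
sketch_plan_fill(uint64_t dst_offset, uint64_t fill_size, int inclusive)
{
   struct sketch_fill_plan p = { 0 };

   /* Biggest texel size compatible with both the offset and the size. */
   uint64_t bs = 16;
   bs = sketch_gcd_pow2(bs, dst_offset);
   bs = sketch_gcd_pow2(bs, fill_size);
   p.block_size = bs;

   const uint64_t row_bytes = SKETCH_MAX_SURFACE_DIM * bs;
   const uint64_t max_fill_size = SKETCH_MAX_SURFACE_DIM * row_bytes;

   /* inclusive selects '>=' (the reviewed suggestion) over '>'. */
   while (inclusive ? fill_size >= max_fill_size
                    : fill_size > max_fill_size) {
      p.full_rects++;
      fill_size -= max_fill_size;
   }

   p.rows = fill_size / row_bytes;
   fill_size -= p.rows * row_bytes;
   p.tail_texels = fill_size / bs;
   return p;
}
```

When fillSize is exactly max_fill_size, the '>' variant skips the loop and computes a height equal to MAX_SURFACE_DIM, which would trip the assert(height < MAX_SURFACE_DIM) in the patch. The '>=' variant emits one full rectangle and leaves a height of 0.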

Re: [Mesa-dev] [PATCH v3 13/14] nvc0: expose ARB_compute_variable_group_size

2016-09-26 Thread Samuel Pitoiset




On 09/26/2016 07:27 PM, Ilia Mirkin wrote:
> FWIW this limits it to 32 regs on Fermi. IMO that's pretty limiting,
> esp given how shitty our RA is. I think we should do 512 for Fermi and
> 1024 for Kepler+. [A matching adjustment will be needed in codegen.]

Yep, I will improve it, but this can be done just after the mesa/gallium
bits are upstream. :)

> On Mon, Sep 26, 2016 at 1:23 PM, Samuel Pitoiset
>  wrote:
>> Let's return the same number of threads per block for both fixed and
>> variable sizes.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
>> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index df6c6af..6540c31 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -446,6 +446,7 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
>>    }
>> case PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE:
>>    RET(((uint64_t []) { 1024, 1024, 64 }));
>> +   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
>> case PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK:
>>    RET((uint64_t []) { 1024 });
>> case PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE: /* g[] */
>> @@ -478,8 +479,6 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
>>    RET((uint32_t []) { 512 }); /* FIXME: arbitrary limit */
>> case PIPE_COMPUTE_CAP_ADDRESS_BITS:
>>    RET((uint32_t []) { 64 });
>> -   case PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK:
>> -  RET((uint64_t []) { 0 });
>> default:
>>    return 0;
>> }
>> --
>> 2.10.0



[Mesa-dev] [PATCH] i965: Only emit 1 viewport when possible.

2016-09-26 Thread Kenneth Graunke

In core profile, we support up to 16 viewports.  However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.

Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor.  This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.

This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case.  The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.

According to Eric Anholt, this reduces the CPU overhead of scissor and
viewport state changes in Glamor from 2.5% or so to 0.8% or so.

Cc: Eric Anholt 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_cc.c  | 10 +++---
 src/mesa/drivers/dri/i965/brw_context.c |  1 +
 src/mesa/drivers/dri/i965/brw_context.h |  9 +
 src/mesa/drivers/dri/i965/brw_gs_state.c|  6 --
 src/mesa/drivers/dri/i965/brw_state_upload.c| 11 +++
 src/mesa/drivers/dri/i965/gen6_clip_state.c | 16 +++-
 src/mesa/drivers/dri/i965/gen6_scissor_state.c  | 10 +++---
 src/mesa/drivers/dri/i965/gen6_viewport_state.c | 22 +++---
 src/mesa/drivers/dri/i965/gen7_viewport_state.c | 10 +++---
 src/mesa/drivers/dri/i965/gen8_viewport_state.c | 10 +++---
 10 files changed, 75 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cc.c 
b/src/mesa/drivers/dri/i965/brw_cc.c
index 5c58b44..b11d7c8 100644
--- a/src/mesa/drivers/dri/i965/brw_cc.c
+++ b/src/mesa/drivers/dri/i965/brw_cc.c
@@ -44,12 +44,15 @@ brw_upload_cc_vp(struct brw_context *brw)
struct gl_context *ctx = &brw->ctx;
struct brw_cc_viewport *ccv;
 
+   /* BRW_NEW_VIEWPORT_COUNT */
+   const unsigned viewport_count = brw->clip.viewport_count;
+
ccv = brw_state_batch(brw, AUB_TRACE_CC_VP_STATE,
-sizeof(*ccv) * ctx->Const.MaxViewports, 32,
+sizeof(*ccv) * viewport_count, 32,
  &brw->cc.vp_offset);
 
/* _NEW_TRANSFORM */
-   for (unsigned i = 0; i < ctx->Const.MaxViewports; i++) {
+   for (unsigned i = 0; i < viewport_count; i++) {
   if (ctx->Transform.DepthClamp) {
  /* _NEW_VIEWPORT */
  ccv[i].min_depth = MIN2(ctx->ViewportArray[i].Near,
@@ -77,7 +80,8 @@ const struct brw_tracked_state brw_cc_vp = {
   .mesa = _NEW_TRANSFORM |
   _NEW_VIEWPORT,
   .brw = BRW_NEW_BATCH |
- BRW_NEW_BLORP,
+ BRW_NEW_BLORP |
+ BRW_NEW_VIEWPORT_COUNT,
},
.emit = brw_upload_cc_vp
 };
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 6efad78..b0eec16 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1085,6 +1085,7 @@ brwCreateContext(gl_api api,
brw->prim_restart.enable_cut_index = false;
brw->gs.enabled = false;
brw->sf.viewport_transform_enable = true;
+   brw->clip.viewport_count = 1;
 
brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 00f0adc..b27fe51 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -226,6 +226,7 @@ enum brw_state_id {
BRW_STATE_URB_SIZE,
BRW_STATE_CC_STATE,
BRW_STATE_BLORP,
+   BRW_STATE_VIEWPORT_COUNT,
BRW_NUM_STATE_BITS
 };
 
@@ -294,6 +295,7 @@ enum brw_state_id {
 #define BRW_NEW_PROGRAM_CACHE   (1ull << BRW_STATE_PROGRAM_CACHE)
 #define BRW_NEW_STATE_BASE_ADDRESS  (1ull << BRW_STATE_STATE_BASE_ADDRESS)
 #define BRW_NEW_VUE_MAP_GEOM_OUT(1ull << BRW_STATE_VUE_MAP_GEOM_OUT)
+#define BRW_NEW_VIEWPORT_COUNT  (1ull << BRW_STATE_VIEWPORT_COUNT)
 #define BRW_NEW_TRANSFORM_FEEDBACK  (1ull << BRW_STATE_TRANSFORM_FEEDBACK)
 #define BRW_NEW_RASTERIZER_DISCARD  (1ull << BRW_STATE_RASTERIZER_DISCARD)
 #define BRW_NEW_STATS_WM(1ull << BRW_STATE_STATS_WM)
@@ -1160,6 +1162,13 @@ struct brw_context
* instead of vp_bo.
*/
   uint32_t vp_offset;
+
+  /**
+   * The number of viewports to use.  If gl_ViewportIndex is written,
+   * we can have up to ctx->Const.MaxViewports viewports.  If not,
+   * the viewport index is always 0, so we can only emit one.
+   */
+  uint8_t viewport_count;
} clip;
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_gs_state.c 
b/src/mesa/drivers/dri/i965/brw_gs_state.c
index 1757201..8e3bf1e 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_state.c
+++ b/s
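
The tracking described in the commit message can be sketched as follows. This is an illustration only, not the code from the patch: `struct sketch_ctx`, `SKETCH_NEW_VIEWPORT_COUNT` and `sketch_update_viewport_count` are hypothetical names, and the real driver derives "gl_ViewportIndex is written" from its own shader state.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NEW_VIEWPORT_COUNT (1ull << 0) /* hypothetical dirty bit */

struct sketch_ctx {
   uint8_t  max_viewports;  /* e.g. 16 in core profile */
   uint8_t  viewport_count; /* number of viewports currently emitted */
   uint64_t dirty;          /* pending state flags */
};

/* Pick 1 viewport unless the last pre-rasterizer stage writes
 * gl_ViewportIndex, and flag dependent state only when the count actually
 * changes, so the common case never re-emits anything. */
static void
sketch_update_viewport_count(struct sketch_ctx *c,
                             bool writes_viewport_index)
{
   const uint8_t count = writes_viewport_index ? c->max_viewports : 1;

   if (c->viewport_count != count) {
      c->viewport_count = count;
      c->dirty |= SKETCH_NEW_VIEWPORT_COUNT;
   }
}
```

State atoms that size their uploads by viewport_count (CC viewport, scissor, SF/clip viewport) would then listen for the new flag instead of looping over all MaxViewports entries.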

Re: [Mesa-dev] [PATCH 02/13] anv/meta: Roll clear_image into CmdClearDepthStencilImage

2016-09-26 Thread Nanley Chery

On Sun, Sep 25, 2016 at 09:59:01AM -0700, Jason Ekstrand wrote:
> It is now the only caller so there's no sense in keeping things split out.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/vulkan/anv_meta_clear.c | 84 
> +--
>  1 file changed, 28 insertions(+), 56 deletions(-)

This patch is
Reviewed-by: Nanley Chery 

> 
> diff --git a/src/intel/vulkan/anv_meta_clear.c 
> b/src/intel/vulkan/anv_meta_clear.c
> index 5579454..11b471f 100644
> --- a/src/intel/vulkan/anv_meta_clear.c
> +++ b/src/intel/vulkan/anv_meta_clear.c
> @@ -752,28 +752,24 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer 
> *cmd_buffer)
> meta_clear_end(&saved_state, cmd_buffer);
>  }
>  
> -static void
> -anv_cmd_clear_image(struct anv_cmd_buffer *cmd_buffer,
> -struct anv_image *image,
> -VkImageLayout image_layout,
> -VkClearValue clear_value,
> -uint32_t range_count,
> -const VkImageSubresourceRange *ranges)
> +void anv_CmdClearDepthStencilImage(
> +VkCommandBuffer commandBuffer,
> +VkImage image_h,
> +VkImageLayout   imageLayout,
> +const VkClearDepthStencilValue* pDepthStencil,
> +uint32_trangeCount,
> +const VkImageSubresourceRange*  pRanges)
>  {
> -   VkDevice device_h = anv_device_to_handle(cmd_buffer->device);
> +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> +   ANV_FROM_HANDLE(anv_image, image, image_h);
> +   struct anv_meta_saved_state saved_state;
>  
> -   VkFormat vk_format = image->vk_format;
> -   if (vk_format == VK_FORMAT_E5B9G9R9_UFLOAT_PACK32) {
> -  /* We can't actually render to this format so we have to work around it
> -   * by manually unpacking and using R32_UINT.
> -   */
> -  clear_value.color.uint32[0] =
> - float3_to_rgb9e5(clear_value.color.float32);
> -  vk_format = VK_FORMAT_R32_UINT;
> -   }
> +   meta_clear_begin(&saved_state, cmd_buffer);
> +
> +   VkDevice device_h = anv_device_to_handle(cmd_buffer->device);
>  
> -   for (uint32_t r = 0; r < range_count; r++) {
> -  const VkImageSubresourceRange *range = &ranges[r];
> +   for (uint32_t r = 0; r < rangeCount; r++) {
> +  const VkImageSubresourceRange *range = &pRanges[r];
>for (uint32_t l = 0; l < anv_get_levelCount(image, range); ++l) {
>   const uint32_t layer_count = image->type == VK_IMAGE_TYPE_3D ?
>anv_minify(image->extent.depth, l) :
> @@ -785,7 +781,7 @@ anv_cmd_clear_image(struct anv_cmd_buffer *cmd_buffer,
>.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
>.image = anv_image_to_handle(image),
>.viewType = anv_meta_get_view_type(image),
> -  .format = vk_format,
> +  .format = image->vk_format,
>.subresourceRange = {
>   .aspectMask = range->aspectMask,
>   .baseMipLevel = range->baseMipLevel + l,
> @@ -812,13 +808,18 @@ anv_cmd_clear_image(struct anv_cmd_buffer *cmd_buffer,
> &fb);
>  
>  VkAttachmentDescription att_desc = {
> -   .format = vk_format,
> +   .format = image->vk_format,
> .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
> .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
> .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
> .stencilStoreOp = VK_ATTACHMENT_STORE_OP_STORE,
> -   .initialLayout = image_layout,
> -   .finalLayout = image_layout,
> +   .initialLayout = imageLayout,
> +   .finalLayout = imageLayout,
> +};
> +
> +const VkAttachmentReference att_ref = {
> +   .attachment = 0,
> +   .layout = imageLayout,
>  };
>  
>  VkSubpassDescription subpass_desc = {
> @@ -827,23 +828,11 @@ anv_cmd_clear_image(struct anv_cmd_buffer *cmd_buffer,
> .colorAttachmentCount = 0,
> .pColorAttachments = NULL,
> .pResolveAttachments = NULL,
> -   .pDepthStencilAttachment = NULL,
> +   .pDepthStencilAttachment = &att_ref,
> .preserveAttachmentCount = 0,
> .pPreserveAttachments = NULL,
>  };
>  
> -const VkAttachmentReference att_ref = {
> -   .attachment = 0,
> -   .layout = image_layout,
> -};
> -
> -if (range->aspectMask & VK_IMAGE_ASPECT_COLOR_BIT) {
> -   subpass_desc.colorAttachmentCount = 1;
> -   subpass_desc.pColorAttachments = &att_ref;
> -} else {
> -   subpass_desc.pDepthStencilAttachment = &att_ref;
> -}

[Mesa-dev] [PATCH] intel/blorp_blit: Simplify uncompressed level0 extent assignment

2016-09-26 Thread Nanley Chery

These values are the same. Avoid the extra computation.

Signed-off-by: Nanley Chery 
---
 src/intel/blorp/blorp_blit.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index af46389..1c878e8 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1774,15 +1774,12 @@ surf_convert_to_uncompressed(const struct isl_device 
*isl_dev,
*x /= fmtl->bw;
*y /= fmtl->bh;
 
-   info->surf.logical_level0_px.width =
-  DIV_ROUND_UP(info->surf.logical_level0_px.width, fmtl->bw);
-   info->surf.logical_level0_px.height =
-  DIV_ROUND_UP(info->surf.logical_level0_px.height, fmtl->bh);
-
assert(info->surf.phys_level0_sa.width % fmtl->bw == 0);
assert(info->surf.phys_level0_sa.height % fmtl->bh == 0);
info->surf.phys_level0_sa.width /= fmtl->bw;
info->surf.phys_level0_sa.height /= fmtl->bh;
+   info->surf.logical_level0_px.width = info->surf.phys_level0_sa.width;
+   info->surf.logical_level0_px.height = info->surf.phys_level0_sa.height;
 
assert(info->tile_x_sa % fmtl->bw == 0);
assert(info->tile_y_sa % fmtl->bh == 0);
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
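
The claim that "these values are the same" can be checked with a small sketch. It assumes that the physical level-0 extent is the logical extent padded up to a multiple of the block dimension, which is what the asserts in the patch suggest. The helper names are hypothetical and are not part of blorp or isl.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division, same meaning as Mesa's DIV_ROUND_UP macro. */
static uint32_t div_round_up(uint32_t a, uint32_t b)
{
   return (a + b - 1) / b;
}

/* Hypothetical helper: pad a logical extent up to a whole number of blocks.
 * This stands in for how phys_level0_sa is assumed to be derived. */
static uint32_t align_to_block(uint32_t logical, uint32_t block)
{
   return div_round_up(logical, block) * block;
}

/* Old computation: round the logical extent up to whole blocks. */
static uint32_t blocks_from_logical(uint32_t logical, uint32_t block)
{
   return div_round_up(logical, block);
}

/* New computation: divide the already padded physical extent exactly. */
static uint32_t blocks_from_physical(uint32_t logical, uint32_t block)
{
   uint32_t phys = align_to_block(logical, block);
   assert(phys % block == 0);
   return phys / block;
}
```

Under that assumption both functions return the same block count for any extent and block size, so the rounded division on the logical extent is redundant.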

Re: [Mesa-dev] [PATCH 1/5] glsl: move some uniform linking code to new link_setup_uniform_remap_tables()

2016-09-26 Thread Kenneth Graunke

On Sunday, September 25, 2016 10:50:24 PM PDT Timothy Arceri wrote:
> This makes link_assign_uniform_locations() easier to follow.
> ---
>  src/compiler/glsl/link_uniforms.cpp | 330 
> +++-
>  src/compiler/glsl/linker.cpp|   4 +-
>  src/compiler/glsl/linker.h  |   5 +-
>  3 files changed, 177 insertions(+), 162 deletions(-)
> 
> diff --git a/src/compiler/glsl/link_uniforms.cpp 
> b/src/compiler/glsl/link_uniforms.cpp
> index 4d3fc6d..11204fc 100644
> --- a/src/compiler/glsl/link_uniforms.cpp
> +++ b/src/compiler/glsl/link_uniforms.cpp
> @@ -997,12 +997,168 @@ find_empty_block(struct gl_shader_program *prog,
> return -1;
>  }
>  
> +static void
> +link_setup_uniform_remap_tables(struct gl_context *ctx,

   const struct gl_context *ctx,

> +struct gl_shader_program *prog,
> +unsigned num_explicit_uniform_locs)
> +{
[snip]
> +}
>  void
>  link_assign_uniform_locations(struct gl_shader_program *prog,
> -  unsigned int boolean_true,
> -  unsigned int num_explicit_uniform_locs,
> -  unsigned int max_uniform_locs)
> +  struct gl_context *ctx,

 const struct gl_context *ctx,

> +  unsigned int num_explicit_uniform_locs)
>  {
> +   unsigned int boolean_true = ctx->Const.UniformBooleanTrue;
> +
> ralloc_free(prog->UniformStorage);
> prog->UniformStorage = NULL;
> prog->NumUniformStorage = 0;
> @@ -1067,7 +1223,7 @@ link_assign_uniform_locations(struct gl_shader_program 
> *prog,
>}
> }
>  
> -   const unsigned num_uniforms = uniform_size.num_active_uniforms;
> +   prog->NumUniformStorage = uniform_size.num_active_uniforms;

You can delete the earlier prog->NumUniformStorage = 0 if you like.
Then again, it isn't hurting anything, either.

Patches 1-2 are:
Reviewed-by: Kenneth Graunke 

I don't understand patches 3-4.



Re: [Mesa-dev] [PATCH 1/2] i965: drop copy of NumImages

2016-09-26 Thread Jason Ekstrand

Not a big fan. This makes the prog_data structures less self-contained.
You shouldn't have to look up an almost unrelated structure in order to
figure out how big this one is.  Also, I've been trying to move us in the
direction of *more* stuff in prog_data, not less, so that we aren't looking
up the GL data structures in state setup any more than we have to.

--Jason

On Sep 26, 2016 7:31 AM, "Lionel Landwerlin"  wrote:

> We can access this value through gl_shader_program.
>
> Signed-off-by: Lionel Landwerlin 
> Cc: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/brw_compiler.h  | 1 -
>  src/mesa/drivers/dri/i965/brw_cs.c| 1 -
>  src/mesa/drivers/dri/i965/brw_gs.c| 1 -
>  src/mesa/drivers/dri/i965/brw_tcs.c   | 1 -
>  src/mesa/drivers/dri/i965/brw_tes.c   | 1 -
>  src/mesa/drivers/dri/i965/brw_vs.c| 5 +
>  src/mesa/drivers/dri/i965/brw_wm.c| 4 +---
>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 5 -
>  8 files changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h
> b/src/mesa/drivers/dri/i965/brw_compiler.h
> index 445c166..437528b 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -344,7 +344,6 @@ struct brw_stage_prog_data {
>
> GLuint nr_params;   /**< number of float params/constants */
> GLuint nr_pull_params;
> -   unsigned nr_image_params;
>
> unsigned curb_read_length;
> unsigned total_scratch;
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.c
> b/src/mesa/drivers/dri/i965/brw_cs.c
> index 4e746fe..febf53a 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.c
> +++ b/src/mesa/drivers/dri/i965/brw_cs.c
> @@ -106,7 +106,6 @@ brw_codegen_cs_prog(struct brw_context *brw,
> prog_data.base.image_param =
>rzalloc_array(NULL, struct brw_image_param, cs->base.NumImages);
> prog_data.base.nr_params = param_count;
> -   prog_data.base.nr_image_params = cs->base.NumImages;
>
> brw_nir_setup_glsl_uniforms(cp->program.Base.nir, prog,
> &cp->program.Base,
> &prog_data.base, true);
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index 741216c..486416a 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -128,7 +128,6 @@ brw_codegen_gs_prog(struct brw_context *brw,
> prog_data.base.base.image_param =
>rzalloc_array(NULL, struct brw_image_param, gs->NumImages);
> prog_data.base.base.nr_params = param_count;
> -   prog_data.base.base.nr_image_params = gs->NumImages;
>
> brw_nir_setup_glsl_uniforms(gp->program.Base.nir, prog,
> &gp->program.Base,
> &prog_data.base.base,
> diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c
> b/src/mesa/drivers/dri/i965/brw_tcs.c
> index 7e6c69a..88df595 100644
> --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> @@ -216,7 +216,6 @@ brw_codegen_tcs_prog(struct brw_context *brw,
>
>prog_data.base.base.image_param =
>   rzalloc_array(NULL, struct brw_image_param, tcs->NumImages);
> -  prog_data.base.base.nr_image_params = tcs->NumImages;
>
>brw_nir_setup_glsl_uniforms(nir, shader_prog, &tcp->program.Base,
>&prog_data.base.base,
> diff --git a/src/mesa/drivers/dri/i965/brw_tes.c
> b/src/mesa/drivers/dri/i965/brw_tes.c
> index 87ada17..88739b9 100644
> --- a/src/mesa/drivers/dri/i965/brw_tes.c
> +++ b/src/mesa/drivers/dri/i965/brw_tes.c
> @@ -161,7 +161,6 @@ brw_codegen_tes_prog(struct brw_context *brw,
> prog_data.base.base.image_param =
>rzalloc_array(NULL, struct brw_image_param, tes->NumImages);
> prog_data.base.base.nr_params = param_count;
> -   prog_data.base.base.nr_image_params = tes->NumImages;
>
> prog_data.base.cull_distance_mask =
>((1 << tep->program.Base.CullDistanceArraySize) - 1) <<
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index ba7315e..c242190 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -123,9 +123,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
>  */
> int param_count = vp->program.Base.nir->num_uniforms / 4;
>
> -   if (vs)
> -  prog_data.base.base.nr_image_params = vs->base.NumImages;
> -
> /* vec4_visitor::setup_uniform_clipplane_values() also uploads user
> clip
>  * planes as uniforms.
>  */
> @@ -137,7 +134,7 @@ brw_codegen_vs_prog(struct brw_context *brw,
>rzalloc_array(NULL, const gl_constant_value *, param_count);
> stage_prog_data->image_param =
>rzalloc_array(NULL, struct brw_image_param,
> -stage_prog_data->nr_image_params);
> +vs ? vs->base.NumImages : 0);
> stage_prog_data->nr_params = param_count;
>
> if (prog) {
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c
> b/src/mesa/drive

Re: [Mesa-dev] [PATCH 2/2] i965: use L3 data cache for SSBOs

2016-09-26 Thread Jason Ekstrand

Looks good to me. Curro, do you see anything wrong with this?

--Jason

On Sep 26, 2016 7:31 AM, "Lionel Landwerlin"  wrote:

> Anv programs the hardware to use the L3 data cache if the shaders use
> either SSBOs or images; we can program i965 the same way.
>
> gl_shader_program has a somewhat confusingly named field,
> 'NumAtomicBuffers'. It does not tell how many buffers the shader
> accesses atomically; it is the number of atomic counters manipulated
> by the shader.
>
> Signed-off-by: Lionel Landwerlin 
> Cc: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c
> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> index 92e8788..fdaea81 100644
> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -55,7 +55,8 @@ get_pipeline_state_l3_weights(const struct brw_context
> *brw)
>   prog ? prog->_LinkedShaders[stage_states[i]->stage] : NULL;
>const struct brw_stage_prog_data *prog_data =
> stage_states[i]->prog_data;
>
> -  needs_dc |= (prog && prog->NumAtomicBuffers) ||
> +  needs_dc |= (prog && (prog->NumAtomicBuffers ||
> +prog->NumShaderStorageBlocks)) ||
>   (shader && shader->NumImages) ||
>   (prog_data && prog_data->total_scratch);
>needs_slm |= prog_data && prog_data->total_shared;
> --
> 2.9.3
>
>
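
A minimal sketch of the predicate after this change, written against a hypothetical trimmed-down struct rather than the real gl_shader_program, gl_linked_shader and brw_stage_prog_data types. Only the four fields the quoted hunk reads are modelled, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-stage data the quoted hunk inspects. */
struct sketch_stage_info {
   unsigned num_atomic_buffers;        /* prog->NumAtomicBuffers */
   unsigned num_shader_storage_blocks; /* prog->NumShaderStorageBlocks */
   unsigned num_images;                /* shader->NumImages */
   unsigned total_scratch;             /* prog_data->total_scratch */
};

/* One stage needs the L3 data cache if it uses atomics, SSBOs, images
 * or scratch space. The SSBO term is the one the patch adds. */
static bool stage_needs_dc(const struct sketch_stage_info *s)
{
   return s->num_atomic_buffers != 0 ||
          s->num_shader_storage_blocks != 0 ||
          s->num_images != 0 ||
          s->total_scratch != 0;
}

/* The pipeline needs the data cache if any of its stages does. */
static bool pipeline_needs_dc(const struct sketch_stage_info *stages, size_t n)
{
   bool needs_dc = false;
   for (size_t i = 0; i < n; i++)
      needs_dc |= stage_needs_dc(&stages[i]);
   return needs_dc;
}
```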

Re: [Mesa-dev] [PATCH 01/13] anv: Use blorp for VkCmdFillBuffer

2016-09-26 Thread Jason Ekstrand

On Sep 26, 2016 11:16 AM, "Nanley Chery"  wrote:
>
> On Sun, Sep 25, 2016 at 09:59:00AM -0700, Jason Ekstrand wrote:
> > Signed-off-by: Jason Ekstrand 
> > ---
> >  src/intel/vulkan/anv_blorp.c  | 106
+
> >  src/intel/vulkan/anv_meta_clear.c | 120
--
> >  2 files changed, 96 insertions(+), 130 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > index cb61070..f5a6c40 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -480,6 +480,20 @@ void anv_CmdBlitImage(
> > blorp_batch_finish(&batch);
> >  }
> >
> > +static enum isl_format
> > +isl_format_for_size(unsigned size_B)
> > +{
> > +   switch (size_B) {
> > +   case 1:  return ISL_FORMAT_R8_UINT;
> > +   case 2:  return ISL_FORMAT_R8G8_UINT;
> > +   case 4:  return ISL_FORMAT_R8G8B8A8_UINT;
> > +   case 8:  return ISL_FORMAT_R16G16B16A16_UINT;
> > +   case 16: return ISL_FORMAT_R32G32B32A32_UINT;
> > +   default:
> > +  unreachable("Not a power-of-two format size");
> > +   }
> > +}
> > +
> >  static void
> >  do_buffer_copy(struct blorp_batch *batch,
> > struct anv_bo *src, uint64_t src_offset,
> > @@ -491,16 +505,7 @@ do_buffer_copy(struct blorp_batch *batch,
> > /* The actual format we pick doesn't matter as blorp will throw it
away.
> >  * The only thing that actually matters is the size.
> >  */
> > -   enum isl_format format;
> > -   switch (block_size) {
> > -   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> > -   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> > -   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> > -   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> > -   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> > -   default:
> > -  unreachable("Not a power-of-two format size");
> > -   }
> > +   enum isl_format format = isl_format_for_size(block_size);
> >
> > struct isl_surf surf;
> > isl_surf_init(&device->isl_dev, &surf,
> > @@ -667,6 +672,87 @@ void anv_CmdUpdateBuffer(
> > blorp_batch_finish(&batch);
> >  }
> >
> > +void anv_CmdFillBuffer(
> > +VkCommandBuffer commandBuffer,
> > +VkBufferdstBuffer,
> > +VkDeviceSizedstOffset,
> > +VkDeviceSizefillSize,
> > +uint32_tdata)
> > +{
> > +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> > +   struct blorp_surf surf;
> > +   struct isl_surf isl_surf;
> > +
> > +   struct blorp_batch batch;
> > +   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer);
> > +
> > +   if (fillSize == VK_WHOLE_SIZE) {
> > +  fillSize = dst_buffer->size - dstOffset;
> > +  /* Make sure fillSize is a multiple of 4 */
> > +  fillSize &= ~3ull;
> > +   }
> > +
> > +   /* First, we compute the biggest format that can be used with the
> > +* given offsets and size.
> > +*/
> > +   int bs = 16;
> > +   bs = gcd_pow2_u64(bs, dstOffset);
> > +   bs = gcd_pow2_u64(bs, fillSize);
> > +   enum isl_format isl_format = isl_format_for_size(bs);
> > +
> > +   union isl_color_value color = {
> > +  .u32 = { data, data, data, data },
> > +   };
> > +
> > +   const uint64_t max_fill_size = MAX_SURFACE_DIM * MAX_SURFACE_DIM *
bs;
> > +   while (fillSize > max_fill_size) {
>   ^
>   This should be '>='.

Sure.  Both work but >= is a bit clearer.  Fixed locally.
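
For reference, the block-size selection quoted above can be sketched as follows. gcd_pow2_sketch() is a hypothetical stand-in for gcd_pow2_u64(), not its actual implementation: it returns the largest power of two dividing both arguments, and treats zero as imposing no limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gcd_pow2_u64(): the largest power of two
 * that divides both a and b.  Zero is divisible by every power of two,
 * so a zero argument does not constrain the result. */
static uint64_t gcd_pow2_sketch(uint64_t a, uint64_t b)
{
   uint64_t bits = a | b;
   assert(bits != 0);
   /* Isolate the lowest set bit of (a | b). */
   return bits & (~bits + 1);
}

/* Largest block size, capped at 16 bytes, that evenly divides both the
 * destination offset and the fill size. */
static unsigned fill_block_size(uint64_t dst_offset, uint64_t fill_size)
{
   uint64_t bs = 16;
   bs = gcd_pow2_sketch(bs, dst_offset);
   bs = gcd_pow2_sketch(bs, fill_size);
   return (unsigned)bs;
}
```

Because the fill size is a multiple of 4, the result in this sketch is 4, 8 or 16 whenever the offset is also 4-byte aligned.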

> > +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> > +dst_buffer, dstOffset,
> > +MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > +MAX_SURFACE_DIM * bs, isl_format,
> > +&surf, &isl_surf);
> > +
> > +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> > +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > +  color, NULL);
> > +  fillSize -= max_fill_size;
> > +  dstOffset += max_fill_size;
> > +   }
> > +
> > +   uint64_t height = fillSize / (MAX_SURFACE_DIM * bs);
> > +   assert(height < MAX_SURFACE_DIM);
> > +   if (height != 0) {
> > +  const uint64_t rect_fill_size = height * MAX_SURFACE_DIM * bs;
> > +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> > +dst_buffer, dstOffset,
> > +MAX_SURFACE_DIM, height,
> > +MAX_SURFACE_DIM * bs, isl_format,
> > +&surf, &isl_surf);
> > +
> > +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> > +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, height,
> > +

Re: [Mesa-dev] [PATCH 01/13] anv: Use blorp for VkCmdFillBuffer

2016-09-26 Thread Nanley Chery

On Mon, Sep 26, 2016 at 12:12:32PM -0700, Jason Ekstrand wrote:
> On Sep 26, 2016 11:16 AM, "Nanley Chery"  wrote:
> >
> > On Sun, Sep 25, 2016 at 09:59:00AM -0700, Jason Ekstrand wrote:
> > > Signed-off-by: Jason Ekstrand 
> > > ---
> > >  src/intel/vulkan/anv_blorp.c  | 106
> +
> > >  src/intel/vulkan/anv_meta_clear.c | 120
> --
> > >  2 files changed, 96 insertions(+), 130 deletions(-)
> > >
> > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > > index cb61070..f5a6c40 100644
> > > --- a/src/intel/vulkan/anv_blorp.c
> > > +++ b/src/intel/vulkan/anv_blorp.c
> > > @@ -480,6 +480,20 @@ void anv_CmdBlitImage(
> > > blorp_batch_finish(&batch);
> > >  }
> > >
> > > +static enum isl_format
> > > +isl_format_for_size(unsigned size_B)
> > > +{
> > > +   switch (size_B) {
> > > +   case 1:  return ISL_FORMAT_R8_UINT;
> > > +   case 2:  return ISL_FORMAT_R8G8_UINT;
> > > +   case 4:  return ISL_FORMAT_R8G8B8A8_UINT;
> > > +   case 8:  return ISL_FORMAT_R16G16B16A16_UINT;
> > > +   case 16: return ISL_FORMAT_R32G32B32A32_UINT;
> > > +   default:
> > > +  unreachable("Not a power-of-two format size");
> > > +   }
> > > +}
> > > +
> > >  static void
> > >  do_buffer_copy(struct blorp_batch *batch,
> > > struct anv_bo *src, uint64_t src_offset,
> > > @@ -491,16 +505,7 @@ do_buffer_copy(struct blorp_batch *batch,
> > > /* The actual format we pick doesn't matter as blorp will throw it
> away.
> > >  * The only thing that actually matters is the size.
> > >  */
> > > -   enum isl_format format;
> > > -   switch (block_size) {
> > > -   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> > > -   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> > > -   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> > > -   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> > > -   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> > > -   default:
> > > -  unreachable("Not a power-of-two format size");
> > > -   }
> > > +   enum isl_format format = isl_format_for_size(block_size);
> > >
> > > struct isl_surf surf;
> > > isl_surf_init(&device->isl_dev, &surf,
> > > @@ -667,6 +672,87 @@ void anv_CmdUpdateBuffer(
> > > blorp_batch_finish(&batch);
> > >  }
> > >
> > > +void anv_CmdFillBuffer(
> > > +VkCommandBuffer commandBuffer,
> > > +VkBufferdstBuffer,
> > > +VkDeviceSizedstOffset,
> > > +VkDeviceSizefillSize,
> > > +uint32_tdata)
> > > +{
> > > +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> > > +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> > > +   struct blorp_surf surf;
> > > +   struct isl_surf isl_surf;
> > > +
> > > +   struct blorp_batch batch;
> > > +   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer);
> > > +
> > > +   if (fillSize == VK_WHOLE_SIZE) {
> > > +  fillSize = dst_buffer->size - dstOffset;
> > > +  /* Make sure fillSize is a multiple of 4 */
> > > +  fillSize &= ~3ull;
> > > +   }
> > > +
> > > +   /* First, we compute the biggest format that can be used with the
> > > +* given offsets and size.
> > > +*/
> > > +   int bs = 16;
> > > +   bs = gcd_pow2_u64(bs, dstOffset);
> > > +   bs = gcd_pow2_u64(bs, fillSize);
> > > +   enum isl_format isl_format = isl_format_for_size(bs);
> > > +
> > > +   union isl_color_value color = {
> > > +  .u32 = { data, data, data, data },
> > > +   };
> > > +
> > > +   const uint64_t max_fill_size = MAX_SURFACE_DIM * MAX_SURFACE_DIM *
> bs;
> > > +   while (fillSize > max_fill_size) {
> >   ^
> >   This should be '>='.
> 
> Sure.  Both work but >= is a bit clearer.  Fixed locally.
> 

I don't see how both could work. Wouldn't the assertion for height below fail
if fillSize == max_fill_size?
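
A small sketch of that boundary case, with the blorp calls stripped out and only the arithmetic kept. SKETCH_MAX_SURFACE_DIM is assumed to be 1 << 14 here; the real MAX_SURFACE_DIM value is not shown in this thread, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SURFACE_DIM (1ull << 14) /* assumed value */

/* Runs the "full-size clear" loop with either '>' or '>=' and returns
 * the height computed for the leftover rectangle. */
static uint64_t leftover_height(uint64_t fill_size, uint64_t bs, bool inclusive)
{
   const uint64_t max_fill_size =
      SKETCH_MAX_SURFACE_DIM * SKETCH_MAX_SURFACE_DIM * bs;

   while (inclusive ? fill_size >= max_fill_size
                    : fill_size > max_fill_size)
      fill_size -= max_fill_size;

   return fill_size / (SKETCH_MAX_SURFACE_DIM * bs);
}
```

With '>' and fillSize == max_fill_size the loop body never runs, so the computed height equals the maximum dimension, which assert(height < MAX_SURFACE_DIM) rejects. With '>=' the loop consumes the whole fill and the height is 0.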

> > > +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> > > +dst_buffer, dstOffset,
> > > +MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > > +MAX_SURFACE_DIM * bs, isl_format,
> > > +&surf, &isl_surf);
> > > +
> > > +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> > > +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > > +  color, NULL);
> > > +  fillSize -= max_fill_size;
> > > +  dstOffset += max_fill_size;
> > > +   }
> > > +
> > > +   uint64_t height = fillSize / (MAX_SURFACE_DIM * bs);
> > > +   assert(height < MAX_SURFACE_DIM);
> > > +   if (height != 0) {
> > > +  const uint64_t rect_fill_size = height * MAX_SURFACE_DIM * bs;
> > > +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> > > +

Re: [Mesa-dev] [PATCH] i965: Only emit 1 viewport when possible.

2016-09-26 Thread Eric Anholt

Kenneth Graunke  writes:

> In core profile, we support up to 16 viewports.  However, in the
> majority of cases, only 1 of them is actually used - we only need
> the others if the last shader stage prior to the rasterizer writes
> gl_ViewportIndex.
>
> Processing all 16 viewports adds additional CPU overhead, which hurts
> CPU-intensive workloads such as Glamor.  This meant that switching to
> core profile actually penalized Glamor to an extent, which is
> unfortunate.
>
> This patch tracks the number of relevant viewports, switching between
> 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
> BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
> viewport state when switching, but hopefully this is offset by doing
> 1/16th of the work in the common case.  The new flag is also lighter
> weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
>
> According to Eric Anholt, this reduces the CPU overhead of scissor and
> viewport state changes in Glamor from 2.5% or so to 0.8% or so.
>
> Cc: Eric Anholt 
> Signed-off-by: Kenneth Graunke 

If this is the same patch I tested on IRC earlier, x11perf -copypixwin10
performance improves by 11.5094% +/- 3.10841% (n=10) on my Skylake.
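
The tracking described in the commit message can be sketched roughly as below. The struct and the dirty-bit value are hypothetical stand-ins for brw_context and BRW_NEW_VIEWPORT_COUNT; the only point is that the flag is raised when the count actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NEW_VIEWPORT_COUNT (1ull << 0) /* hypothetical bit position */

/* Hypothetical, trimmed-down context. */
struct sketch_ctx {
   unsigned max_viewports;  /* ctx->Const.MaxViewports, e.g. 16 */
   unsigned viewport_count; /* brw->clip.viewport_count */
   uint64_t dirty;          /* pending state flags */
};

/* Use every viewport only if the last pre-rasterizer stage writes
 * gl_ViewportIndex; otherwise a single viewport is enough. */
static void update_viewport_count(struct sketch_ctx *c,
                                  bool writes_viewport_index)
{
   unsigned count = writes_viewport_index ? c->max_viewports : 1;

   if (c->viewport_count != count) {
      c->viewport_count = count;
      c->dirty |= SKETCH_NEW_VIEWPORT_COUNT;
   }
}
```

State atoms that loop over viewports would then iterate up to viewport_count instead of max_viewports, as the brw_cc.c hunk in the patch does.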



Re: [Mesa-dev] [PATCH] i965: Only emit 1 viewport when possible.

2016-09-26 Thread Anuj Phogat

On Mon, Sep 26, 2016 at 11:23 AM, Kenneth Graunke  wrote:
> In core profile, we support up to 16 viewports.  However, in the
> majority of cases, only 1 of them is actually used - we only need
> the others if the last shader stage prior to the rasterizer writes
> gl_ViewportIndex.
>
> Processing all 16 viewports adds additional CPU overhead, which hurts
> CPU-intensive workloads such as Glamor.  This meant that switching to
> core profile actually penalized Glamor to an extent, which is
> unfortunate.
>
> This patch tracks the number of relevant viewports, switching between
> 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
> BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
> viewport state when switching, but hopefully this is offset by doing
> 1/16th of the work in the common case.  The new flag is also lighter
> weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
>
> According to Eric Anholt, this reduces the CPU overhead of scissor and
> viewport state changes in Glamor from 2.5% or so to 0.8% or so.
>
> Cc: Eric Anholt 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_cc.c  | 10 +++---
>  src/mesa/drivers/dri/i965/brw_context.c |  1 +
>  src/mesa/drivers/dri/i965/brw_context.h |  9 +
>  src/mesa/drivers/dri/i965/brw_gs_state.c|  6 --
>  src/mesa/drivers/dri/i965/brw_state_upload.c| 11 +++
>  src/mesa/drivers/dri/i965/gen6_clip_state.c | 16 +++-
>  src/mesa/drivers/dri/i965/gen6_scissor_state.c  | 10 +++---
>  src/mesa/drivers/dri/i965/gen6_viewport_state.c | 22 +++---
>  src/mesa/drivers/dri/i965/gen7_viewport_state.c | 10 +++---
>  src/mesa/drivers/dri/i965/gen8_viewport_state.c | 10 +++---
>  10 files changed, 75 insertions(+), 30 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cc.c 
> b/src/mesa/drivers/dri/i965/brw_cc.c
> index 5c58b44..b11d7c8 100644
> --- a/src/mesa/drivers/dri/i965/brw_cc.c
> +++ b/src/mesa/drivers/dri/i965/brw_cc.c
> @@ -44,12 +44,15 @@ brw_upload_cc_vp(struct brw_context *brw)
> struct gl_context *ctx = &brw->ctx;
> struct brw_cc_viewport *ccv;
>
> +   /* BRW_NEW_VIEWPORT_COUNT */
> +   const unsigned viewport_count = brw->clip.viewport_count;
> +
> ccv = brw_state_batch(brw, AUB_TRACE_CC_VP_STATE,
> -sizeof(*ccv) * ctx->Const.MaxViewports, 32,
> +sizeof(*ccv) * viewport_count, 32,
>   &brw->cc.vp_offset);
>
> /* _NEW_TRANSFORM */
> -   for (unsigned i = 0; i < ctx->Const.MaxViewports; i++) {
> +   for (unsigned i = 0; i < viewport_count; i++) {
>if (ctx->Transform.DepthClamp) {
>   /* _NEW_VIEWPORT */
>   ccv[i].min_depth = MIN2(ctx->ViewportArray[i].Near,
> @@ -77,7 +80,8 @@ const struct brw_tracked_state brw_cc_vp = {
>.mesa = _NEW_TRANSFORM |
>_NEW_VIEWPORT,
>.brw = BRW_NEW_BATCH |
> - BRW_NEW_BLORP,
> + BRW_NEW_BLORP |
> + BRW_NEW_VIEWPORT_COUNT,
> },
> .emit = brw_upload_cc_vp
>  };
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 6efad78..b0eec16 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -1085,6 +1085,7 @@ brwCreateContext(gl_api api,
> brw->prim_restart.enable_cut_index = false;
> brw->gs.enabled = false;
> brw->sf.viewport_transform_enable = true;
> +   brw->clip.viewport_count = 1;
>
> brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 00f0adc..b27fe51 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -226,6 +226,7 @@ enum brw_state_id {
> BRW_STATE_URB_SIZE,
> BRW_STATE_CC_STATE,
> BRW_STATE_BLORP,
> +   BRW_STATE_VIEWPORT_COUNT,
> BRW_NUM_STATE_BITS
>  };
>
> @@ -294,6 +295,7 @@ enum brw_state_id {
>  #define BRW_NEW_PROGRAM_CACHE   (1ull << BRW_STATE_PROGRAM_CACHE)
>  #define BRW_NEW_STATE_BASE_ADDRESS  (1ull << 
> BRW_STATE_STATE_BASE_ADDRESS)
>  #define BRW_NEW_VUE_MAP_GEOM_OUT(1ull << BRW_STATE_VUE_MAP_GEOM_OUT)
> +#define BRW_NEW_VIEWPORT_COUNT  (1ull << BRW_STATE_VIEWPORT_COUNT)
>  #define BRW_NEW_TRANSFORM_FEEDBACK  (1ull << 
> BRW_STATE_TRANSFORM_FEEDBACK)
>  #define BRW_NEW_RASTERIZER_DISCARD  (1ull << 
> BRW_STATE_RASTERIZER_DISCARD)
>  #define BRW_NEW_STATS_WM(1ull << BRW_STATE_STATS_WM)
> @@ -1160,6 +1162,13 @@ struct brw_context
> * instead of vp_bo.
> */
>uint32_t vp_offset;
> +
> +  /**
> +   * The number of viewports to use.  If gl_ViewportIndex is written,
> +   * we can have up to ctx->Const.MaxViewports viewports.  If not,
> +   * the viewport index is a

[Mesa-dev] [PATCH v2 6/6] nv50/ir: teach insnCanLoad() about SHLADD

2016-09-26 Thread Samuel Pitoiset

Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.

This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.

GF100/GK104:

total instructions in shared programs : 2838045 -> 2834712 (-0.12%)
total gprs used in shared programs    : 396684 -> 396386 (-0.08%)
total local used in shared programs   : 34416 -> 34416 (0.00%)

                local        gpr       inst      bytes
    helped          0        326       1105       1105
      hurt          0         55          3          3
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index d8fa285..9bc5b8d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -334,6 +334,8 @@ TargetNVC0::insnCanLoad(const Instruction *i, int s,
   if (i->src(k).getFile() == FILE_IMMEDIATE) {
  if (k == 2 && i->op == OP_SUCLAMP) // special case
 continue;
+ if (k == 1 && i->op == OP_SHLADD) // special case
+continue;
  if (i->getSrc(k)->reg.data.u64 != 0)
 return false;
   } else
-- 
2.10.0


[Mesa-dev] [PATCH v2 4/6] nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)

2016-09-26 Thread Samuel Pitoiset

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index c9d5b5f..cbbe34d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -907,6 +907,14 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
&imm2)
  return;
   }
   break;
+   case OP_SHLADD:
+  if (imm2.isInteger(0)) {
+ i->op = OP_SHL;
+ i->setSrc(2, NULL);
+ foldCount++;
+ return;
+  }
+  break;
default:
   return;
}
-- 
2.10.0


[Mesa-dev] [PATCH v2 1/6] nv50/ir: add preliminary support for SHLADD

2016-09-26 Thread Samuel Pitoiset

This instruction has been available since SM20 (Fermi) and computes
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
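
As a reference for the semantics described above, here is a minimal sketch in plain C using 32-bit wrapping arithmetic. The function names are illustrative only and do not correspond to anything in nv50/ir.

```c
#include <assert.h>
#include <stdint.h>

/* SHLADD as described in the commit message: (a << b) + c. */
static uint32_t shladd(uint32_t a, uint32_t b, uint32_t c)
{
   assert(b < 32); /* keep the C shift well defined */
   return (a << b) + c;
}

/* IMAD: integer multiply-add, a * b + c, wrapping modulo 2^32. */
static uint32_t imad(uint32_t a, uint32_t b, uint32_t c)
{
   return a * b + c;
}

/* The two-instruction sequence SHLADD can replace: SHL, then ADD. */
static uint32_t shl_then_add(uint32_t a, uint32_t b, uint32_t c)
{
   uint32_t shifted = a << b; /* SHL */
   return shifted + c;        /* ADD */
}
```

When the multiplier is 1 << k, imad(a, 1 << k, c) and shladd(a, k, c) give the same result, including on overflow, since both wrap modulo 2^32.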

v2: - fix up the commutative table on nv50/ir

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   | 6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  | 5 +++--
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 7 +--
 6 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index d6011d9..bedbdcc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -57,6 +57,7 @@ enum operation
OP_MAD,
OP_FMA,
OP_SAD, // abs(src0 - src1) + src2
+   OP_SHLADD,
OP_ABS,
OP_NEG,
OP_NOT,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 22f2f5d..dbd0f7d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] =
"mad",
"fma",
"sad",
+   "shladd",
"abs",
"neg",
"not",
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 7d7b315..273ec34 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -30,7 +30,7 @@ const uint8_t Target::operationSrcNr[] =
0, 0,   // NOP, PHI
0, 0, 0, 0, // UNION, SPLIT, MERGE, CONSTRAINT
1, 1, 2,// MOV, LOAD, STORE
-   2, 2, 2, 2, 2, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD
+   2, 2, 2, 2, 2, 3, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD, SHLADD
1, 1, 1,// ABS, NEG, NOT
2, 2, 2, 2, 2,  // AND, OR, XOR, SHL, SHR
2, 2, 1,// MAX, MIN, SAT
@@ -70,10 +70,10 @@ const OpClass Target::operationClass[] =
OPCLASS_MOVE,
OPCLASS_LOAD,
OPCLASS_STORE,
-   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD
+   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD, SHLADD
OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
OPCLASS_ARITH, OPCLASS_ARITH,
-   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
+   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
// ABS, NEG; NOT, AND, OR, XOR; SHL, SHR
OPCLASS_CONVERT, OPCLASS_CONVERT,
OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
index 6b8f767..cf8a08f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
@@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
case OP_DIV:
case OP_MOD:
   return false;
+   case OP_SHLADD:
+  if (isFloatType(ty))
+ return false;
+  break;
default:
   break;
}
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 1246cc6..83b4102 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -115,12 +115,12 @@ void TargetNV50::initOpInfo()
{
   // ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN, SET_AND, SET_OR, SET_XOR,
   // SET, SELP, SLCT
-  0x0670ca00, 0x0000003f, 0x00000000, 0x00000000
+  0x0ce0ca00, 0x0000007e, 0x00000000, 0x00000000
};
static const uint32_t shortForm[(OP_LAST + 31) / 32] =
{
   // MOV, ADD, SUB, MUL, MAD, SAD, RCP, L/PINTERP, TEX, TXF
-  0x00014e40, 0x00000040, 0x00000930, 0x00000000
+  0x00014e40, 0x00000080, 0x00001260, 0x00000000
};
static const operation noDestList[] =
{
@@ -438,6 +438,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
case OP_EXTBF:
case OP_EXIT: // want exit modifier instead (on NOP if required)
case OP_MEMBAR:
+   case OP_SHLADD:
   return false;
case OP_SAD:
   return ty == TYPE_S32;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index f75e395..d8fa285 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -105,6 +105,7 @@ static const struct opProperties _initProps[] =
{ OP_

[Mesa-dev] [PATCH v2 3/6] nv50/ir: optimize IMAD to SHLADD in presence of power of 2

2016-09-26 Thread Samuel Pitoiset

We can replace IMAD by SHLADD if and only if src1 is a power of 2.

v2: - use non-negative values and use applyLog2()

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 74a5a85..c9d5b5f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -915,6 +915,7 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue &imm2)
 void
 ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 {
+   const Target *target = prog->getTarget();
const int t = !s;
const operation op = i->op;
Instruction *newi = i;
@@ -1016,6 +1017,12 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
  i->src(1).mod = i->src(2).mod;
  i->setSrc(2, NULL);
  i->op = OP_ADD;
+  } else
+  if (s == 1 && !imm0.isNegative() && imm0.isPow2() &&
+  target->isOpSupported(i->op, i->dType)) {
+ i->op = OP_SHLADD;
+ imm0.applyLog2();
+ i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32));
   }
   break;
case OP_ADD:
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
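
For readers following the series: the rewrite in 3/6 rests on the identity a * 2^k + c == (a << k) + c in wrapping 32-bit arithmetic. Below is a minimal sketch of that identity, not the driver code. The names imad, shladd and log2IfPow2 are made up for illustration, the operands are plain unsigned integers, and the patch's separate non-negative check on the immediate is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics assumed for the two operations (32-bit, wrapping).
static uint32_t imad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }
static uint32_t shladd(uint32_t a, uint32_t shift, uint32_t c)
{
   assert(shift < 32);   // larger shifts are undefined behaviour in C++
   return (a << shift) + c;
}

// Hypothetical helper: returns log2(v) if v is a power of two, else -1.
static int log2IfPow2(uint32_t v)
{
   if (v == 0 || (v & (v - 1)) != 0)
      return -1;
   int k = 0;
   while ((v >> k) != 1u)
      ++k;
   return k;
}

// Evaluates a * b + c, going through the shift-and-add form when b allows it.
static uint32_t imadViaShladd(uint32_t a, uint32_t b, uint32_t c)
{
   const int k = log2IfPow2(b);
   return k >= 0 ? shladd(a, static_cast<uint32_t>(k), c) : imad(a, b, c);
}
```

Both paths give the same result for every input; the only difference is that the power-of-two case needs no multiplier.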

[Mesa-dev] [PATCH v2 2/6] nvc0/ir: add emission for SHLADD

2016-09-26 Thread Samuel Pitoiset

Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.

v2: - s/emitSHLADD/emitISCADD for GM107 emitter

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 53 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 32 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 44 ++
 3 files changed, 129 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 61c450b..2c4e3a7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -96,6 +96,7 @@ private:
void emitDMUL(const Instruction *);
void emitIMAD(const Instruction *);
void emitISAD(const Instruction *);
+   void emitSHLADD(const Instruction *);
void emitFMAD(const Instruction *);
void emitDMAD(const Instruction *);
void emitMADSP(const Instruction *i);
@@ -757,6 +758,55 @@ CodeEmitterGK110::emitISAD(const Instruction *i)
 }
 
 void
+CodeEmitterGK110::emitSHLADD(const Instruction *i)
+{
+   uint8_t addOp =
+  (i->src(2).mod.neg() << 1) | (i->src(0).mod.neg() ^ i->src(1).mod.neg());
+   const ImmediateValue *imm = i->src(1).get()->asImm();
+   assert(imm);
+
+   if (i->src(2).getFile() == FILE_IMMEDIATE) {
+  code[0] = 0x1;
+  code[1] = 0xc0c << 20;
+   } else {
+  code[0] = 0x2;
+  code[1] = 0x20c << 20;
+   }
+   code[1] |= addOp << 19;
+
+   emitPredicate(i);
+
+   defId(i->def(0), 2);
+   srcId(i->src(0), 10);
+
+   if (i->flagsDef >= 0)
+  code[1] |= 1 << 18;
+
+   assert(!(imm->reg.data.u32 & 0xffffffe0));
+   code[1] |= imm->reg.data.u32 << 10;
+
+   switch (i->src(2).getFile()) {
+   case FILE_GPR:
+  assert(code[0] & 0x2);
+  code[1] |= 0xc << 28;
+  srcId(i->src(2), 23);
+  break;
+   case FILE_MEMORY_CONST:
+  assert(code[0] & 0x2);
+  code[1] |= 0x4 << 28;
+  setCAddress14(i->src(2));
+  break;
+   case FILE_IMMEDIATE:
+  assert(code[0] & 0x1);
+  setShortImmediate(i, 2);
+  break;
+   default:
+  assert(!"bad src2 file");
+  break;
+   }
+}
+
+void
 CodeEmitterGK110::emitNOT(const Instruction *i)
 {
code[0] = 0x0003fc02; // logop(mov2) dst, 0, not src
@@ -2403,6 +2453,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn)
case OP_SAD:
   emitISAD(insn);
   break;
+   case OP_SHLADD:
+  emitSHLADD(insn);
+  break;
case OP_NOT:
   emitNOT(insn);
   break;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index cfde66c..3fedafd 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -152,6 +152,7 @@ private:
void emitIADD();
void emitIMUL();
void emitIMAD();
+   void emitISCADD();
void emitIMNMX();
void emitICMP();
void emitISET();
@@ -1813,6 +1814,34 @@ CodeEmitterGM107::emitIMAD()
 }
 
 void
+CodeEmitterGM107::emitISCADD()
+{
+   switch (insn->src(2).getFile()) {
+   case FILE_GPR:
+  emitInsn(0x5c18);
+  emitGPR (0x14, insn->src(2));
+  break;
+   case FILE_MEMORY_CONST:
+  emitInsn(0x4c18);
+  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(2));
+  break;
+   case FILE_IMMEDIATE:
+  emitInsn(0x3818);
+  emitIMMD(0x14, 19, insn->src(2));
+  break;
+   default:
+  assert(!"bad src2 file");
+  break;
+   }
+   emitNEG (0x31, insn->src(0));
+   emitNEG (0x30, insn->src(2));
+   emitCC  (0x2f);
+   emitIMMD(0x27, 5, insn->src(1));
+   emitGPR (0x08, insn->src(0));
+   emitGPR (0x00, insn->def(0));
+}
+
+void
 CodeEmitterGM107::emitIMNMX()
 {
switch (insn->src(1).getFile()) {
@@ -3098,6 +3127,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
  emitIMAD();
   }
   break;
+   case OP_SHLADD:
+  emitISCADD();
+  break;
case OP_MIN:
case OP_MAX:
   if (isFloatType(insn->dType)) {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index d8ca6ab..c874b86 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -101,6 +101,7 @@ private:
void emitDMUL(const Instruction *);
void emitIMAD(const Instruction *);
void emitISAD(const Instruction *);
+   void emitSHLADD(const Instruction *a);
void emitFMAD(const Instruction *);
void emitDMAD(const Instruction *);
void emitMADSP(const Instruction *);
@@ -759,6 +760,46 @@ CodeEmitterNVC0::emitIMAD(const Instruction *i)
 }
 
 void
+CodeEmitterNVC0::emitSHLADD(const Instruction *i)
+{
+   uint8_t addOp =
+  (i->src(2).mod.neg() << 1) | (i->src(0).mod.neg() ^ i->src(1).mod.neg());
+   const ImmediateValue *

[Mesa-dev] [PATCH v2 5/6] nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)

2016-09-26 Thread Samuel Pitoiset

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index cbbe34d..9875738 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -778,6 +778,9 @@ ConstantFolding::expr(Instruction *i,
   }
   break;
}
+   case OP_SHLADD:
+  res.data.u32 = (a->data.u32 << b->data.u32) + c->data.u32;
+  break;
default:
   return;
}
-- 
2.10.0
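
A note on the folded expression in 5/6: once all three sources are constants, the result is just (a << b) + c computed at compile time. The sketch below mirrors that on plain integers. foldShladd is a hypothetical name, and the b < 32 precondition is an assumption added here because shifting a uint32_t by 32 or more is undefined in C++; the patch itself does not show such a check.

```cpp
#include <cassert>
#include <cstdint>

// Folds SHLADD when all three sources are known 32-bit constants.
// Assumption for this sketch: the shift amount is below 32.
static uint32_t foldShladd(uint32_t a, uint32_t b, uint32_t c)
{
   assert(b < 32);
   return (a << b) + c;   // unsigned arithmetic wraps modulo 2^32
}
```

For example, foldShladd(3, 4, 5) evaluates to 48 + 5 = 53, and bits shifted past bit 31 are simply discarded.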


Re: [Mesa-dev] [PATCH v2 1/6] nv50/ir: add preliminary support for SHLADD

2016-09-26 Thread Ilia Mirkin

IMHO I'd drop the isFloatType() bs in isOpSupported() - that can never
be true, if it is, you're using the instruction very wrong. Otherwise
this is

Reviewed-by: Ilia Mirkin 

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> This instruction has been available since SM20 (Fermi) and computes
> (a << b) + c in one shot. In some situations, IMAD should be
> replaced by SHLADD when b is a power of 2, and ADD+SHL should be
> replaced by SHLADD as well.
>
> v2: - fix up the commutative table on nv50/ir
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   | 6 +++---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  | 5 +++--
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 7 +--
>  6 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> index d6011d9..bedbdcc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> @@ -57,6 +57,7 @@ enum operation
> OP_MAD,
> OP_FMA,
> OP_SAD, // abs(src0 - src1) + src2
> +   OP_SHLADD,
> OP_ABS,
> OP_NEG,
> OP_NOT,
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> index 22f2f5d..dbd0f7d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> @@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] =
> "mad",
> "fma",
> "sad",
> +   "shladd",
> "abs",
> "neg",
> "not",
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> index 7d7b315..273ec34 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> @@ -30,7 +30,7 @@ const uint8_t Target::operationSrcNr[] =
> 0, 0,   // NOP, PHI
> 0, 0, 0, 0, // UNION, SPLIT, MERGE, CONSTRAINT
> 1, 1, 2,// MOV, LOAD, STORE
> -   2, 2, 2, 2, 2, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD
> +   2, 2, 2, 2, 2, 3, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD, 
> SHLADD
> 1, 1, 1,// ABS, NEG, NOT
> 2, 2, 2, 2, 2,  // AND, OR, XOR, SHL, SHR
> 2, 2, 1,// MAX, MIN, SAT
> @@ -70,10 +70,10 @@ const OpClass Target::operationClass[] =
> OPCLASS_MOVE,
> OPCLASS_LOAD,
> OPCLASS_STORE,
> -   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD
> +   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD, SHLADD
> OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
> OPCLASS_ARITH, OPCLASS_ARITH,
> -   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
> +   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
> // ABS, NEG; NOT, AND, OR, XOR; SHL, SHR
> OPCLASS_CONVERT, OPCLASS_CONVERT,
> OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC,
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> index 6b8f767..cf8a08f 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> @@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
> case OP_DIV:
> case OP_MOD:
>return false;
> +   case OP_SHLADD:
> +  if (isFloatType(ty))
> + return false;
> +  break;
> default:
>break;
> }
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> index 1246cc6..83b4102 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> @@ -115,12 +115,12 @@ void TargetNV50::initOpInfo()
> {
>// ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN, SET_AND, SET_OR, 
> SET_XOR,
>// SET, SELP, SLCT
> -  0x0670ca00, 0x0000003f, 0x00000000, 0x00000000
> +  0x0ce0ca00, 0x0000007e, 0x00000000, 0x00000000
> };
> static const uint32_t shortForm[(OP_LAST + 31) / 32] =
> {
>// MOV, ADD, SUB, MUL, MAD, SAD, RCP, L/PINTERP, TEX, TXF
> -  0x00014e40, 0x00000040, 0x00000930, 0x00000000
> +  0x00014e40, 0x00000080, 0x00001260, 0x00000000
> };
> static const operation noDestList[] =
> {
> @@ -438,6 +438,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
> case OP_EXTBF:
> case OP_EXIT: // want exit modifier instead (on NOP if required)
> case OP_MEMBAR:
> +

Re: [Mesa-dev] [PATCH 01/13] anv: Use blorp for VkCmdFillBuffer

2016-09-26 Thread Jason Ekstrand

On Sep 26, 2016 12:26 PM, "Nanley Chery"  wrote:
>
> On Mon, Sep 26, 2016 at 12:12:32PM -0700, Jason Ekstrand wrote:
> > On Sep 26, 2016 11:16 AM, "Nanley Chery"  wrote:
> > >
> > > On Sun, Sep 25, 2016 at 09:59:00AM -0700, Jason Ekstrand wrote:
> > > > Signed-off-by: Jason Ekstrand 
> > > > ---
> > > > >  src/intel/vulkan/anv_blorp.c  | 106 +
> > > > >  src/intel/vulkan/anv_meta_clear.c | 120 --
> > > >  2 files changed, 96 insertions(+), 130 deletions(-)
> > > >
> > > > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > > > index cb61070..f5a6c40 100644
> > > > --- a/src/intel/vulkan/anv_blorp.c
> > > > +++ b/src/intel/vulkan/anv_blorp.c
> > > > @@ -480,6 +480,20 @@ void anv_CmdBlitImage(
> > > > blorp_batch_finish(&batch);
> > > >  }
> > > >
> > > > +static enum isl_format
> > > > +isl_format_for_size(unsigned size_B)
> > > > +{
> > > > +   switch (size_B) {
> > > > +   case 1:  return ISL_FORMAT_R8_UINT;
> > > > +   case 2:  return ISL_FORMAT_R8G8_UINT;
> > > > +   case 4:  return ISL_FORMAT_R8G8B8A8_UINT;
> > > > +   case 8:  return ISL_FORMAT_R16G16B16A16_UINT;
> > > > +   case 16: return ISL_FORMAT_R32G32B32A32_UINT;
> > > > +   default:
> > > > +  unreachable("Not a power-of-two format size");
> > > > +   }
> > > > +}
> > > > +
> > > >  static void
> > > >  do_buffer_copy(struct blorp_batch *batch,
> > > > struct anv_bo *src, uint64_t src_offset,
> > > > @@ -491,16 +505,7 @@ do_buffer_copy(struct blorp_batch *batch,
> > > > > /* The actual format we pick doesn't matter as blorp will throw it away.
> > > >  * The only thing that actually matters is the size.
> > > >  */
> > > > -   enum isl_format format;
> > > > -   switch (block_size) {
> > > > -   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> > > > -   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> > > > -   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> > > > -   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> > > > -   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> > > > -   default:
> > > > -  unreachable("Not a power-of-two format size");
> > > > -   }
> > > > +   enum isl_format format = isl_format_for_size(block_size);
> > > >
> > > > struct isl_surf surf;
> > > > isl_surf_init(&device->isl_dev, &surf,
> > > > @@ -667,6 +672,87 @@ void anv_CmdUpdateBuffer(
> > > > blorp_batch_finish(&batch);
> > > >  }
> > > >
> > > > +void anv_CmdFillBuffer(
> > > > +VkCommandBuffer commandBuffer,
> > > > > +VkBuffer dstBuffer,
> > > > > +VkDeviceSize dstOffset,
> > > > > +VkDeviceSize fillSize,
> > > > > +uint32_t data)
> > > > +{
> > > > +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> > > > +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> > > > +   struct blorp_surf surf;
> > > > +   struct isl_surf isl_surf;
> > > > +
> > > > +   struct blorp_batch batch;
> > > > > +   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer);
> > > > +
> > > > +   if (fillSize == VK_WHOLE_SIZE) {
> > > > +  fillSize = dst_buffer->size - dstOffset;
> > > > +  /* Make sure fillSize is a multiple of 4 */
> > > > +  fillSize &= ~3ull;
> > > > +   }
> > > > +
> > > > > +   /* First, we compute the biggest format that can be used with the
> > > > +* given offsets and size.
> > > > +*/
> > > > +   int bs = 16;
> > > > +   bs = gcd_pow2_u64(bs, dstOffset);
> > > > +   bs = gcd_pow2_u64(bs, fillSize);
> > > > +   enum isl_format isl_format = isl_format_for_size(bs);
> > > > +
> > > > +   union isl_color_value color = {
> > > > +  .u32 = { data, data, data, data },
> > > > +   };
> > > > +
> > > > > +   const uint64_t max_fill_size = MAX_SURFACE_DIM * MAX_SURFACE_DIM * bs;
> > > > +   while (fillSize > max_fill_size) {
> > >   ^
> > >   This should be '>='.
> >
> > Sure.  Both work but >= is a bit clearer.  Fixed locally.
> >
>
> I don't see how both could work. Wouldn't the assertion for height below
> fail if fillSize == max_fill_size?

Right. The assertion would trigger but the release build logic is correct.
In any case, it doesn't matter since I made the change.
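
To make the two points above concrete, here is a small sketch, not the anv code. pickBlockSize stands in for the two gcd_pow2_u64() calls, rowsAfterFullClears models the loop plus an assumed follow-up that computes a row count (the quoted patch is cut off before that part), and kMaxSurfaceDim is an assumed value chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration; the driver's MAX_SURFACE_DIM may differ.
constexpr uint64_t kMaxSurfaceDim = 1ull << 14;

// Largest power of two, capped at 16 bytes, that divides both the offset and
// the size. The lowest set bit of (16 | offset | size) is exactly that value.
static uint64_t pickBlockSize(uint64_t dstOffset, uint64_t fillSize)
{
   const uint64_t bits = 16 | dstOffset | fillSize;
   return bits & (~bits + 1);   // isolate the lowest set bit
}

// Rows left for a rectangular clear after the loop that issues full
// kMaxSurfaceDim x kMaxSurfaceDim clears. `inclusive` selects between the
// `>=` and `>` loop conditions discussed in this thread.
static uint64_t rowsAfterFullClears(uint64_t fillSize, uint64_t bs,
                                    bool inclusive)
{
   const uint64_t maxFillSize = kMaxSurfaceDim * kMaxSurfaceDim * bs;
   while (inclusive ? fillSize >= maxFillSize : fillSize > maxFillSize)
      fillSize -= maxFillSize;
   return fillSize / (kMaxSurfaceDim * bs);
}
```

With fillSize exactly equal to max_fill_size, the `>` form leaves kMaxSurfaceDim rows for the follow-up clear, which would trip an assertion of the form height < MAX_SURFACE_DIM, while the `>=` form leaves zero rows.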

> > > > > +  get_blorp_surf_for_anv_buffer(cmd_buffer->device,
> > > > > +dst_buffer, dstOffset,
> > > > > +MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > > > > +MAX_SURFACE_DIM * bs, isl_format,
> > > > > +&surf, &isl_surf);
> > > > +
> > > > +  blorp_clear(&batch, &surf, isl_format, ISL_SWIZZLE_IDENTITY,
> > > > +  0, 0, 1, 0, 0, MAX_SURFACE_DIM, MAX_SURFACE_DIM,
> > > > +

Re: [Mesa-dev] [PATCH v2 2/6] nvc0/ir: add emission for SHLADD

2016-09-26 Thread Ilia Mirkin

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> Unfortunately, we can't use the emit helpers for GF100/GK110
> because src1 and src2 are swapped.
>
> v2: - s/emitSHLADD/emitISCADD for GM107 emitter
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 53 
> ++
>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 32 +
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 44 ++
>  3 files changed, 129 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> index 61c450b..2c4e3a7 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> @@ -96,6 +96,7 @@ private:
> void emitDMUL(const Instruction *);
> void emitIMAD(const Instruction *);
> void emitISAD(const Instruction *);
> +   void emitSHLADD(const Instruction *);
> void emitFMAD(const Instruction *);
> void emitDMAD(const Instruction *);
> void emitMADSP(const Instruction *i);
> @@ -757,6 +758,55 @@ CodeEmitterGK110::emitISAD(const Instruction *i)
>  }
>
>  void
> +CodeEmitterGK110::emitSHLADD(const Instruction *i)
> +{
> +   uint8_t addOp =
> +  (i->src(2).mod.neg() << 1) | (i->src(0).mod.neg() ^ 
> i->src(1).mod.neg());
> +   const ImmediateValue *imm = i->src(1).get()->asImm();
> +   assert(imm);
> +
> +   if (i->src(2).getFile() == FILE_IMMEDIATE) {
> +  code[0] = 0x1;
> +  code[1] = 0xc0c << 20;
> +   } else {
> +  code[0] = 0x2;
> +  code[1] = 0x20c << 20;
> +   }
> +   code[1] |= addOp << 19;
> +
> +   emitPredicate(i);
> +
> +   defId(i->def(0), 2);
> +   srcId(i->src(0), 10);
> +
> +   if (i->flagsDef >= 0)
> +  code[1] |= 1 << 18;
> +
> +   assert(!(imm->reg.data.u32 & 0xffffffe0));
> +   code[1] |= imm->reg.data.u32 << 10;
> +
> +   switch (i->src(2).getFile()) {
> +   case FILE_GPR:
> +  assert(code[0] & 0x2);
> +  code[1] |= 0xc << 28;
> +  srcId(i->src(2), 23);
> +  break;
> +   case FILE_MEMORY_CONST:
> +  assert(code[0] & 0x2);
> +  code[1] |= 0x4 << 28;
> +  setCAddress14(i->src(2));
> +  break;
> +   case FILE_IMMEDIATE:
> +  assert(code[0] & 0x1);
> +  setShortImmediate(i, 2);
> +  break;
> +   default:
> +  assert(!"bad src2 file");
> +  break;
> +   }
> +}
> +
> +void
>  CodeEmitterGK110::emitNOT(const Instruction *i)
>  {
> code[0] = 0x0003fc02; // logop(mov2) dst, 0, not src
> @@ -2403,6 +2453,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn)
> case OP_SAD:
>emitISAD(insn);
>break;
> +   case OP_SHLADD:
> +  emitSHLADD(insn);
> +  break;
> case OP_NOT:
>emitNOT(insn);
>break;
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index cfde66c..3fedafd 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -152,6 +152,7 @@ private:
> void emitIADD();
> void emitIMUL();
> void emitIMAD();
> +   void emitISCADD();
> void emitIMNMX();
> void emitICMP();
> void emitISET();
> @@ -1813,6 +1814,34 @@ CodeEmitterGM107::emitIMAD()
>  }
>
>  void
> +CodeEmitterGM107::emitISCADD()
> +{
> +   switch (insn->src(2).getFile()) {
> +   case FILE_GPR:
> +  emitInsn(0x5c18);
> +  emitGPR (0x14, insn->src(2));
> +  break;
> +   case FILE_MEMORY_CONST:
> +  emitInsn(0x4c18);
> +  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(2));
> +  break;
> +   case FILE_IMMEDIATE:
> +  emitInsn(0x3818);
> +  emitIMMD(0x14, 19, insn->src(2));
> +  break;
> +   default:
> +  assert(!"bad src2 file");
> +  break;
> +   }
> +   emitNEG (0x31, insn->src(0));
> +   emitNEG (0x30, insn->src(2));
> +   emitCC  (0x2f);
> +   emitIMMD(0x27, 5, insn->src(1));
> +   emitGPR (0x08, insn->src(0));
> +   emitGPR (0x00, insn->def(0));
> +}
> +
> +void
>  CodeEmitterGM107::emitIMNMX()
>  {
> switch (insn->src(1).getFile()) {
> @@ -3098,6 +3127,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
>   emitIMAD();
>}
>break;
> +   case OP_SHLADD:
> +  emitISCADD();
> +  break;
> case OP_MIN:
> case OP_MAX:
>if (isFloatType(insn->dType)) {
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index d8ca6ab..c874b86 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -101,6 +101,7 @@ private:
> void emitDMUL(const Instruction *);
> void emitIMAD(const Instruction *);
> void emitISAD(const Instruction *);
> +   void emitSHLADD(const Instruction *a);
> vo

Re: [Mesa-dev] [PATCH v2 1/6] nv50/ir: add preliminary support for SHLADD

2016-09-26 Thread Ilia Mirkin

On Mon, Sep 26, 2016 at 5:09 PM, Ilia Mirkin  wrote:
> IMHO I'd drop the isFloatType() bs in isOpSupported() - that can never
> be true, if it is, you're using the instruction very wrong. Otherwise
> this is
>
> Reviewed-by: Ilia Mirkin 
>
> On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
>  wrote:
>> This instruction has been available since SM20 (Fermi) and computes
>> (a << b) + c in one shot. In some situations, IMAD should be
>> replaced by SHLADD when b is a power of 2, and ADD+SHL should be
>> replaced by SHLADD as well.
>>
>> v2: - fix up the commutative table on nv50/ir
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 +
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   | 6 +++---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  | 5 +++--
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 7 +--
>>  6 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> index d6011d9..bedbdcc 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> @@ -57,6 +57,7 @@ enum operation
>> OP_MAD,
>> OP_FMA,
>> OP_SAD, // abs(src0 - src1) + src2
>> +   OP_SHLADD,
>> OP_ABS,
>> OP_NEG,
>> OP_NOT,
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>> index 22f2f5d..dbd0f7d 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>> @@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] =
>> "mad",
>> "fma",
>> "sad",
>> +   "shladd",
>> "abs",
>> "neg",
>> "not",
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
>> index 7d7b315..273ec34 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
>> @@ -30,7 +30,7 @@ const uint8_t Target::operationSrcNr[] =
>> 0, 0,   // NOP, PHI
>> 0, 0, 0, 0, // UNION, SPLIT, MERGE, CONSTRAINT
>> 1, 1, 2,// MOV, LOAD, STORE
>> -   2, 2, 2, 2, 2, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD
>> +   2, 2, 2, 2, 2, 3, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD, 
>> SHLADD
>> 1, 1, 1,// ABS, NEG, NOT
>> 2, 2, 2, 2, 2,  // AND, OR, XOR, SHL, SHR
>> 2, 2, 1,// MAX, MIN, SAT
>> @@ -70,10 +70,10 @@ const OpClass Target::operationClass[] =
>> OPCLASS_MOVE,
>> OPCLASS_LOAD,
>> OPCLASS_STORE,
>> -   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD
>> +   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD, SHLADD
>> OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
>> OPCLASS_ARITH, OPCLASS_ARITH,
>> -   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
>> +   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
>> // ABS, NEG; NOT, AND, OR, XOR; SHL, SHR
>> OPCLASS_CONVERT, OPCLASS_CONVERT,
>> OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC,
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>> index 6b8f767..cf8a08f 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>> @@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) 
>> const
>> case OP_DIV:
>> case OP_MOD:
>>return false;
>> +   case OP_SHLADD:
>> +  if (isFloatType(ty))
>> + return false;
>> +  break;
>> default:
>>break;
>> }
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>> index 1246cc6..83b4102 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>> @@ -115,12 +115,12 @@ void TargetNV50::initOpInfo()
>> {
>>// ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN, SET_AND, SET_OR, 
>> SET_XOR,
>>// SET, SELP, SLCT
>> -  0x0670ca00, 0x003f, 0x, 0x
>> +  0x0ce0ca00, 0x007e, 0x, 0x
>> };
>> static const uint32_t shortForm[(OP_LAST + 31) / 32] =
>> {
>>// MOV, ADD, SUB, MUL, MAD, SAD, RCP, L/PINTERP, TEX, TXF
>> -  0x00014e40, 0x0040, 0x0930, 0x
>> +  0x00014e40, 0x0080, 0x1260, 0x
>> };
>> static const operation noDestList[] =
>> {
>> @@ -438,6 +438,7 @@

Re: [Mesa-dev] [PATCH v2 3/6] nv50/ir: optimize IMAD to SHLADD in presence of power of 2

2016-09-26 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> We can replace IMAD by SHLADD if and only if src1 is a power of 2.
>
> v2: - use non-negative values and use applyLog2()
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 74a5a85..c9d5b5f 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -915,6 +915,7 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
> &imm2)
>  void
>  ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>  {
> +   const Target *target = prog->getTarget();
> const int t = !s;
> const operation op = i->op;
> Instruction *newi = i;
> @@ -1016,6 +1017,12 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
> &imm0, int s)
>   i->src(1).mod = i->src(2).mod;
>   i->setSrc(2, NULL);
>   i->op = OP_ADD;
> +  } else
> +  if (s == 1 && !imm0.isNegative() && imm0.isPow2() &&
> +  target->isOpSupported(i->op, i->dType)) {
> + i->op = OP_SHLADD;
> + imm0.applyLog2();
> + i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32));
>}
>break;
> case OP_ADD:
> --
> 2.10.0

Re: [Mesa-dev] [PATCH v2 4/6] nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)

2016-09-26 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index c9d5b5f..cbbe34d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -907,6 +907,14 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
> &imm2)
>   return;
>}
>break;
> +   case OP_SHLADD:
> +  if (imm2.isInteger(0)) {
> + i->op = OP_SHL;
> + i->setSrc(2, NULL);
> + foldCount++;
> + return;
> +  }
> +  break;
> default:
>return;
> }
> --
> 2.10.0
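
As an illustration of the fold reviewed here: when the addend is the constant 0, SHLADD(a, b, 0) and SHL(a, b) compute the same value, so the third source can be dropped. The sketch below uses a made-up Insn struct with constant operands only; it is a simplified stand-in, not the nv50_ir Instruction class.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Shl, Shladd };

// Hypothetical, heavily simplified stand-in for an IR instruction:
// an opcode, a source count and up to three constant operands.
struct Insn {
   Op op;
   int srcCount;
   uint32_t src[3];
};

// Rewrites SHLADD(a, b, 0) to SHL(a, b); returns true if it changed anything.
static bool foldZeroAddend(Insn &i)
{
   if (i.op != Op::Shladd || i.src[2] != 0)
      return false;
   i.op = Op::Shl;
   i.srcCount = 2;
   return true;
}

// Evaluates the instruction on its constant operands.
static uint32_t evaluate(const Insn &i)
{
   assert(i.src[1] < 32);   // keep the shift well defined in C++
   const uint32_t shifted = i.src[0] << i.src[1];
   return i.op == Op::Shladd ? shifted + i.src[2] : shifted;
}
```

The value of the instruction is the same before and after the rewrite, which is what makes the fold safe.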

Re: [Mesa-dev] [PATCH v2 5/6] nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)

2016-09-26 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index cbbe34d..9875738 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -778,6 +778,9 @@ ConstantFolding::expr(Instruction *i,
>}
>break;
> }
> +   case OP_SHLADD:
> +  res.data.u32 = (a->data.u32 << b->data.u32) + c->data.u32;
> +  break;
> default:
>return;
> }
> --
> 2.10.0

Re: [Mesa-dev] [PATCH v2 6/6] nv50/ir: teach insnCanLoad() about SHLADD

2016-09-26 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

On Mon, Sep 26, 2016 at 5:02 PM, Samuel Pitoiset
 wrote:
> Commutativity is not allowed with SHLADD, but src2 can accept
> loads. To allow the load propagation pass to do its job, add a
> special case like for SUCLAMP because src1 is always an immediate.
>
> This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
> Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
> Warrior.
>
> GF100/GK104:
>
> total instructions in shared programs :2838045 -> 2834712 (-0.12%)
> total gprs used in shared programs:396684 -> 396386 (-0.08%)
> total local used in shared programs   :34416 -> 34416 (0.00%)
>
>                 local        gpr       inst      bytes
>     helped           0        326       1105       1105
>       hurt           0         55          3          3
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> index d8fa285..9bc5b8d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> @@ -334,6 +334,8 @@ TargetNVC0::insnCanLoad(const Instruction *i, int s,
>if (i->src(k).getFile() == FILE_IMMEDIATE) {
>   if (k == 2 && i->op == OP_SUCLAMP) // special case
>  continue;
> + if (k == 1 && i->op == OP_SHLADD) // special case
> +continue;
>   if (i->getSrc(k)->reg.data.u64 != 0)
>  return false;
>} else
> --
> 2.10.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
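
The commit message above mentions turning IMAD into SHLADD. The arithmetic identity behind that replacement, `a * 2^n + c == (a << n) + c` modulo 2^32, can be checked with a short C sketch. All helper names here are hypothetical, and the sketch says nothing about how the nv50 IR pass itself is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v has exactly one bit set, i.e. v == 2^n for some n. */
static bool
is_pow2(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Position of the single set bit of a power of two. */
static unsigned
log2_pow2(uint32_t v)
{
   unsigned n = 0;
   assert(is_pow2(v));
   while ((v >>= 1) != 0)
      n++;
   return n;
}

/* Multiply-add on 32-bit unsigned values (wraps modulo 2^32). */
static uint32_t
imad(uint32_t a, uint32_t b, uint32_t c)
{
   return a * b + c;
}

/* Shift-add form; only a valid replacement when b is a power of two. */
static uint32_t
shladd(uint32_t a, unsigned shift, uint32_t c)
{
   assert(shift < 32);
   return (a << shift) + c;
}
```

For a multiplier `b` that passes `is_pow2`, `imad(a, b, c)` and `shladd(a, log2_pow2(b), c)` give the same result for every `a` and `c`.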

Re: [Mesa-dev] [PATCH] intel/blorp_blit: Simplify uncompressed level0 extent assignment

2016-09-26 Thread Jason Ekstrand

I think this is correct given that this function is never called on a
multisampled image.  We should add an assert(samples == 1) somewhere just
to be clear.

On Sep 26, 2016 11:53 AM, "Nanley Chery"  wrote:

> These values are the same. Avoid the extra computation.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/blorp/blorp_blit.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
> index af46389..1c878e8 100644
> --- a/src/intel/blorp/blorp_blit.c
> +++ b/src/intel/blorp/blorp_blit.c
> @@ -1774,15 +1774,12 @@ surf_convert_to_uncompressed(const struct
> isl_device *isl_dev,
> *x /= fmtl->bw;
> *y /= fmtl->bh;
>
> -   info->surf.logical_level0_px.width =
> -  DIV_ROUND_UP(info->surf.logical_level0_px.width, fmtl->bw);
> -   info->surf.logical_level0_px.height =
> -  DIV_ROUND_UP(info->surf.logical_level0_px.height, fmtl->bh);
> -
> assert(info->surf.phys_level0_sa.width % fmtl->bw == 0);
> assert(info->surf.phys_level0_sa.height % fmtl->bh == 0);
> info->surf.phys_level0_sa.width /= fmtl->bw;
> info->surf.phys_level0_sa.height /= fmtl->bh;
> +   info->surf.logical_level0_px.width = info->surf.phys_level0_sa.width;
> +   info->surf.logical_level0_px.height = info->surf.phys_level0_sa.
> height;
>
> assert(info->tile_x_sa % fmtl->bw == 0);
> assert(info->tile_y_sa % fmtl->bh == 0);
> --
> 2.10.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode

2016-09-26 Thread Zhang, Boyuan

Hi Andy,

For the VBR target/max bit-rate, yes, this is gstreamer-vaapi's current design. 
The user-typed bit-rate is actually the max bit-rate, not the actual bit-rate, 
which is a bit confusing.

For the overflow concern, unsigned int can handle about 4294Mbit/s, which we 
thought is big enough for real life cases, right?

Regards,
Boyuan

-Original Message-
From: Andy Furniss [mailto:adf.li...@gmail.com] 
Sent: September-25-16 6:46 AM
To: Liu, Leo; Christian König; Zhang, Boyuan; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode

Leo Liu wrote:
>
>
> On 09/07/2016 03:02 AM, Christian König wrote:
>> On 06.09.2016 at 22:39, boyuan.zh...@amd.com wrote:
>>> From: Boyuan Zhang 
>>>
>>> This patch enables variable bit-rate for vaapi encoding. According 
>>> to va.h, target bit-rate equals to maximum bit-rate multiplies by 
>>> target_percentage.
>>>
>>> Signed-off-by: Boyuan Zhang 
>>
>> That was astonishing simple to fix :)

Bit late on this but I am not sure this is VBR as it's constrained and that has 
its own setting.

Maybe I am taking too much notice of what libx264/broadcast encoders do, but it 
would be good if VCE could throw way more bits at IDR frames as they really 
need it to avoid pulsing with default gop 30.

Of course I don't know how things work/what's possible.

The names help of course but not always eg. what is -

peak_bits_picture_fraction seems to be always 0.

and

rate_ctrl.vbv_buf_lv 0 or 48 is this some level in a spec, or is it initial 
buffer fullness for vbv?

One issue I found and sent a patch for is the bitrate calc can overflow.

Generally as a user asking for a rate and getting 70% seems a bit strange 
anyway but maybe that's a different discussion.

https://patchwork.freedesktop.org/patch/112040/

>>
>> Patch is Reviewed-by: Christian König .
>>
>> Leo do you want to push it or should I take care of this?
>
> I will take care of it.
>
> Regards,
> Leo
>
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>   src/gallium/state_trackers/va/config.c  | 2 +-
>>>   src/gallium/state_trackers/va/picture.c | 2 +-
>>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/state_trackers/va/config.c
>>> b/src/gallium/state_trackers/va/config.c
>>> index 84bf913..4052316 100644
>>> --- a/src/gallium/state_trackers/va/config.c
>>> +++ b/src/gallium/state_trackers/va/config.c
>>> @@ -120,7 +120,7 @@ vlVaGetConfigAttributes(VADriverContextP ctx, 
>>> VAProfile profile, VAEntrypoint en
>>>value = VA_RT_FORMAT_YUV420;
>>>break;
>>> case VAConfigAttribRateControl:
>>> - value = VA_RC_CQP | VA_RC_CBR;
>>> + value = VA_RC_CQP | VA_RC_CBR | VA_RC_VBR;
>>>break;
>>> default:
>>>value = VA_ATTRIB_NOT_SUPPORTED;
>>> diff --git a/src/gallium/state_trackers/va/picture.c
>>> b/src/gallium/state_trackers/va/picture.c
>>> index a283e83..7f3d96d 100644
>>> --- a/src/gallium/state_trackers/va/picture.c
>>> +++ b/src/gallium/state_trackers/va/picture.c
>>> @@ -322,7 +322,7 @@
>>> handleVAEncMiscParameterTypeRateControl(vlVaContext *context, 
>>> VAEncMiscParameter
>>>  PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT)
>>> context->desc.h264enc.rate_ctrl.target_bitrate =
>>> rc->bits_per_second;
>>>  else
>>> -  context->desc.h264enc.rate_ctrl.target_bitrate =
>>> rc->bits_per_second * rc->target_percentage;
>>> +  context->desc.h264enc.rate_ctrl.target_bitrate =
>>> rc->bits_per_second * rc->target_percentage / 100;
>>>  context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second;
>>>  if (context->desc.h264enc.rate_ctrl.target_bitrate < 200)
>>> context->desc.h264enc.rate_ctrl.vbv_buffer_size = 
>>> MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75), 
>>> 200);
>>
>>

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] intel/blorp_blit: Simplify uncompressed level0 extent assignment

2016-09-26 Thread Nanley Chery

From: Nanley Chery 

These values are the same. Avoid the extra computation.

Signed-off-by: Nanley Chery 
---

v2: Add a sample count assertion (Jason)

 src/intel/blorp/blorp_blit.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index af46389..8fa3e8d 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1774,16 +1774,15 @@ surf_convert_to_uncompressed(const struct isl_device 
*isl_dev,
*x /= fmtl->bw;
*y /= fmtl->bh;
 
-   info->surf.logical_level0_px.width =
-  DIV_ROUND_UP(info->surf.logical_level0_px.width, fmtl->bw);
-   info->surf.logical_level0_px.height =
-  DIV_ROUND_UP(info->surf.logical_level0_px.height, fmtl->bh);
-
assert(info->surf.phys_level0_sa.width % fmtl->bw == 0);
assert(info->surf.phys_level0_sa.height % fmtl->bh == 0);
info->surf.phys_level0_sa.width /= fmtl->bw;
info->surf.phys_level0_sa.height /= fmtl->bh;
 
+   assert(info->surf.samples == 1);
+   info->surf.logical_level0_px.width = info->surf.phys_level0_sa.width;
+   info->surf.logical_level0_px.height = info->surf.phys_level0_sa.height;
+
assert(info->tile_x_sa % fmtl->bw == 0);
assert(info->tile_y_sa % fmtl->bh == 0);
info->tile_x_sa /= fmtl->bw;
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
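
The claim that the two values are the same can be illustrated with plain integer arithmetic. The sketch below assumes (this is not spelled out in the thread) that for a single-sampled surface the physical level-0 extent is the logical extent rounded up to a whole number of compression blocks. Under that assumption, dividing the physical extent by the block size gives the same result as rounding the logical extent up. `align_up` and the two `blocks_from_*` functions are names made up for the illustration, and `DIV_ROUND_UP` is a local macro written for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Round v up to the next multiple of a (a need not be a power of two). */
static uint32_t
align_up(uint32_t v, uint32_t a)
{
   return DIV_ROUND_UP(v, a) * a;
}

/* Old computation: round the logical extent up to whole blocks. */
static uint32_t
blocks_from_logical(uint32_t logical_px, uint32_t block)
{
   return DIV_ROUND_UP(logical_px, block);
}

/* New computation: the physical extent is already block-aligned, so an
 * exact division suffices. */
static uint32_t
blocks_from_physical(uint32_t phys_sa, uint32_t block)
{
   assert(phys_sa % block == 0);
   return phys_sa / block;
}
```

With a multisampled surface the physical extent would also be scaled by the sample layout, which is why the equality is only expected to hold when the sample count is 1.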

Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode

2016-09-26 Thread Andy Furniss


Zhang, Boyuan wrote:
> Hi Andy,
>
> For the VBR target/max bit-rate, yes, this is gstreamer-vaapi's
> current design. The user-typed bit-rate is actually the max bit-rate,
> not the actual bit-rate, which is a bit confusing.

Fair enough on the bitrate, though I am still a bit confused on the VBR
being constrained as both va.h and gstreamer vaapi have vbr and
vbr_constrained.

Any thoughts on my other questions?

> For the overflow concern, unsigned int can handle about 4294Mbit/s,
> which we thought is big enough for real life cases, right?

Yea, but it gets x 100 and my vce can do at least 2160p so for baseline
higher than 42.94 mbit is not that extreme.

Anyway Christian acked my v2 patch -

https://patchwork.freedesktop.org/patch/112161/




> Regards,
> Boyuan
>
> -Original Message-
> From: Andy Furniss [mailto:adf.li...@gmail.com]
> Sent: September-25-16 6:46 AM
> To: Liu, Leo; Christian König; Zhang, Boyuan; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode
>
> Leo Liu wrote:
> > On 09/07/2016 03:02 AM, Christian König wrote:
> >> On 06.09.2016 at 22:39, boyuan.zh...@amd.com wrote:
> >>> From: Boyuan Zhang
> >>>
> >>> This patch enables variable bit-rate for vaapi encoding.
> >>> According to va.h, target bit-rate equals to maximum bit-rate
> >>> multiplies by target_percentage.
> >>>
> >>> Signed-off-by: Boyuan Zhang
> >>
> >> That was astonishing simple to fix :)
>
> Bit late on this but I am not sure this is VBR as it's constrained
> and that has its own setting.
>
> Maybe I am taking too much notice of what libx264/broadcast encoders
> do, but it would be good if VCE could throw way more bits at IDR
> frames as they really need it to avoid pulsing with default gop 30.
>
> Of course I don't know how things work/what's possible.
>
> The names help of course but not always eg. what is -
>
> peak_bits_picture_fraction seems to be always 0.
>
> and
>
> rate_ctrl.vbv_buf_lv 0 or 48 is this some level in a spec, or is it
> initial buffer fullness for vbv?
>
> One issue I found and sent a patch for is the bitrate calc can
> overflow.
>
> Generally as a user asking for a rate and getting 70% seems a bit
> strange anyway but maybe that's a different discussion.
>
> https://patchwork.freedesktop.org/patch/112040/
>
> >> Patch is Reviewed-by: Christian König .
> >>
> >> Leo do you want to push it or should I take care of this?
> >
> > I will take care of it.
> >
> > Regards,
> > Leo
> >
> >> Regards,
> >> Christian.
> >>
> >>> ---
> >>>   src/gallium/state_trackers/va/config.c  | 2 +-
> >>>   src/gallium/state_trackers/va/picture.c | 2 +-
> >>>   2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/src/gallium/state_trackers/va/config.c
> >>> b/src/gallium/state_trackers/va/config.c
> >>> index 84bf913..4052316 100644
> >>> --- a/src/gallium/state_trackers/va/config.c
> >>> +++ b/src/gallium/state_trackers/va/config.c
> >>> @@ -120,7 +120,7 @@ vlVaGetConfigAttributes(VADriverContextP ctx,
> >>> VAProfile profile, VAEntrypoint en
> >>>value = VA_RT_FORMAT_YUV420;
> >>>break;
> >>> case VAConfigAttribRateControl:
> >>> - value = VA_RC_CQP | VA_RC_CBR;
> >>> + value = VA_RC_CQP | VA_RC_CBR | VA_RC_VBR;
> >>>break;
> >>> default:
> >>>value = VA_ATTRIB_NOT_SUPPORTED;
> >>> diff --git a/src/gallium/state_trackers/va/picture.c
> >>> b/src/gallium/state_trackers/va/picture.c
> >>> index a283e83..7f3d96d 100644
> >>> --- a/src/gallium/state_trackers/va/picture.c
> >>> +++ b/src/gallium/state_trackers/va/picture.c
> >>> @@ -322,7 +322,7 @@
> >>> handleVAEncMiscParameterTypeRateControl(vlVaContext *context,
> >>> VAEncMiscParameter
> >>>  PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT)
> >>> context->desc.h264enc.rate_ctrl.target_bitrate =
> >>> rc->bits_per_second;
> >>>  else
> >>> -  context->desc.h264enc.rate_ctrl.target_bitrate =
> >>> rc->bits_per_second * rc->target_percentage;
> >>> +  context->desc.h264enc.rate_ctrl.target_bitrate =
> >>> rc->bits_per_second * rc->target_percentage / 100;
> >>>  context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second;
> >>>  if (context->desc.h264enc.rate_ctrl.target_bitrate < 200)
> >>> context->desc.h264enc.rate_ctrl.vbv_buffer_size =
> >>> MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75),
> >>> 200);




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
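
The overflow discussed above comes from evaluating `bits_per_second * target_percentage` in 32 bits before dividing by 100. One way to avoid it is to widen the product to 64 bits. The sketch below is an illustration only: it assumes both fields are 32-bit unsigned values, the function names are hypothetical, and it is not necessarily what the linked patch does.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit evaluation: the product wraps modulo 2^32 once
 * bits_per_second * percent exceeds UINT32_MAX. */
static uint32_t
target_bitrate_32(uint32_t bits_per_second, uint32_t percent)
{
   return bits_per_second * percent / 100;
}

/* Widened evaluation: the intermediate product fits in 64 bits, and the
 * result fits in 32 bits again whenever percent <= 100. */
static uint32_t
target_bitrate_64(uint32_t bits_per_second, uint32_t percent)
{
   assert(percent <= 100);
   return (uint32_t)((uint64_t)bits_per_second * percent / 100);
}
```

At 20 Mbit/s and 70% both variants agree; at 100 Mbit/s and 70% only the widened one returns 70 Mbit/s.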

[Mesa-dev] [PATCH] st/va: Fix vaSyncSurface with no outstanding operation

2016-09-26 Thread Mark Thompson

---
A simple fix to the problem described here: 
.

With this applied, the driver no longer hangs/crashes when vaSyncSurface() is 
called in places other than for the first time after an encode operation 
(including a second call on the same surface).

Thanks,

- Mark


 src/gallium/state_trackers/va/surface.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 75db650..5e92980 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -111,6 +111,12 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID 
render_target)
   return VA_STATUS_ERROR_INVALID_SURFACE;
}

+   if (!surf->feedback) {
+  // No outstanding operation: nothing to do.
+  pipe_mutex_unlock(drv->mutex);
+  return VA_STATUS_SUCCESS;
+   }
+
context = handle_table_get(drv->htab, surf->ctx);
if (!context) {
   pipe_mutex_unlock(drv->mutex);
@@ -126,6 +132,7 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID 
render_target)
   if (frame_diff < 2)
  context->decoder->flush(context->decoder);
   context->decoder->get_feedback(context->decoder, surf->feedback, 
&(surf->coded_buf->coded_size));
+  surf->feedback = NULL;
}
pipe_mutex_unlock(drv->mutex);
return VA_STATUS_SUCCESS;
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
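
The shape of this fix can be modelled without any of the VA state tracker types: return early, releasing the lock, when the surface has no pending feedback handle, and clear the handle once it has been consumed so that a second sync on the same surface takes the early path. Every type and name below is a stand-in invented for the sketch, and the value written to `coded_size` is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the driver mutex: a counter lets the caller check that
 * every lock is paired with an unlock. */
struct fake_mutex { int depth; };

struct fake_surface {
   void *feedback;        /* non-NULL while an encode result is pending */
   unsigned coded_size;   /* filled in when the feedback is collected */
};

static void lock(struct fake_mutex *m)   { m->depth++; }
static void unlock(struct fake_mutex *m) { m->depth--; }

/* Returns 1 if feedback was collected, 0 if there was nothing to wait for. */
static int
sync_surface(struct fake_mutex *m, struct fake_surface *surf)
{
   lock(m);

   if (!surf->feedback) {
      /* No outstanding operation: nothing to do. */
      unlock(m);
      return 0;
   }

   surf->coded_size = 1234;  /* placeholder for the real get_feedback() */
   surf->feedback = NULL;    /* consumed; a repeated sync is now a no-op */

   unlock(m);
   return 1;
}
```

Calling `sync_surface` twice on the same surface collects the result once and returns immediately the second time, with the lock depth back at zero in both cases.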

Re: [Mesa-dev] [PATCH] st/va: enable vbr rate control for vaapi encode

2016-09-26 Thread Andy Furniss


Andy Furniss wrote:
> Zhang, Boyuan wrote:
>
>> For the overflow concern, unsigned int can handle about 4294Mbit/s,
>> which we thought is big enough for real life cases, right?
>
> Yea, but it gets x 100 and my vce can do at least 2160p so for baseline
> higher than 42.94 mbit is not that extreme.

OK so x 70 currently but then I guess ffmpeg may have target_percent as
a param sometime so it could be set higher.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
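
The figures quoted in this exchange follow from dividing the largest 32-bit value by the multiplier: with a multiplier of 100 the product overflows above roughly 42.94 Mbit/s, and with 70 the limit is a little over 61 Mbit/s. A small sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Largest bits_per_second for which bits_per_second * multiplier still
 * fits in an unsigned 32-bit value. */
static uint32_t
max_safe_bitrate(uint32_t multiplier)
{
   assert(multiplier != 0);
   return UINT32_MAX / multiplier;
}
```

`max_safe_bitrate(100)` is 42949672 and `max_safe_bitrate(70)` is 61356675, both in bits per second.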

Re: [Mesa-dev] [PATCH 10/12] genX/cmd_buffer: Enable rendering to HiZ

2016-09-26 Thread Nanley Chery

On Mon, Sep 19, 2016 at 01:49:09PM -0700, Nanley Chery wrote:
> On Fri, Sep 02, 2016 at 03:16:21PM -0700, Chad Versace wrote:
> > On Wed 31 Aug 2016, Nanley Chery wrote:
> > > From: Chad Versace 
> > > 
> > > Nanley Chery:
> > > (rebase)
> > >  - Resolve conflicts with new anv_batch_emit macro
> > > (amend)
> > >  - Remove wip! tag and handle a QPitch TODO
> > >  - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
> > >  - Only use HiZ for single-subpass renderpasses
> > >  - Emit the HiZ instruction before the stencil instruction to follow the
> > >optimized clear sequence specified in the PRMs
> > >  - Don't modify clear params
> > >  - Enable resolves when a HiZ buffer is used to ensure depth buffer 
> > > validity
> > > 
> > > Provides an FPS increase of ~15% on the Sascha triangle and multisampling
> > > demos.
> > > 
> > > Signed-off-by: Nanley Chery 
> > > ---
> > >  src/intel/vulkan/gen8_cmd_buffer.c |  4 
> > >  src/intel/vulkan/genX_cmd_buffer.c | 41 
> > > ++
> > >  2 files changed, 41 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
> > > b/src/intel/vulkan/gen8_cmd_buffer.c
> > > index 4f27350..7f65fe2 100644
> > > --- a/src/intel/vulkan/gen8_cmd_buffer.c
> > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> > > @@ -414,6 +414,10 @@ genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer 
> > > *cmd_buffer, enum anv_hz_op op)
> > > if (iview == NULL || !anv_image_has_hiz(iview->image))
> > >return;
> > >  
> > > +   /* FIXME: Implement multi-subpass HiZ */
> > > +   if (cmd_buffer->state.pass->subpass_count > 1)
> > > +  return;
> > > +
> > > const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
> > > const bool full_surface_op =
> > >   cmd_state->render_area.extent.width == iview->extent.width 
> > > &&
> > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > index 95ed5f2..349d2a4 100644
> > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > @@ -1040,6 +1040,7 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
> > > *cmd_buffer)
> > >anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
> > > const struct anv_image *image = iview ? iview->image : NULL;
> > > const bool has_depth = image && (image->aspects & 
> > > VK_IMAGE_ASPECT_DEPTH_BIT);
> > > +   const bool has_hiz = image != NULL && anv_image_has_hiz(image);
> > > const bool has_stencil =
> > >image && (image->aspects & VK_IMAGE_ASPECT_STENCIL_BIT);
> > 
> > >  
> > > @@ -1052,7 +1053,12 @@ cmd_buffer_emit_depth_stencil(struct 
> > > anv_cmd_buffer *cmd_buffer)
> > >   db.SurfaceType   = SURFTYPE_2D;
> > >   db.DepthWriteEnable  = true;
> > >   db.StencilWriteEnable= has_stencil;
> > > - db.HierarchicalDepthBufferEnable = false;
> > > +
> > > + if (cmd_buffer->state.pass->subpass_count == 1) {
> > > +db.HierarchicalDepthBufferEnable = has_hiz;
> > > + } else {
> > > +anv_finishme("Multiple-subpass HiZ not implemented");
> > > + }
> > >  
> > >   db.SurfaceFormat = isl_surf_get_depth_format(&device->isl_dev,
> > >
> > > &image->depth_surface.isl);
> > > @@ -1104,6 +1110,34 @@ cmd_buffer_emit_depth_stencil(struct 
> > > anv_cmd_buffer *cmd_buffer)
> > >}
> > > }
> > >  
> > > +   if (has_hiz) {
> > 
> > Note: This codepath is hit sometimes when
> > 3DSTATE_DEPTH_BUFFER.HierarchicalDepthBufferEnable is false.
> > Specifically, when subpass_count > 1. It's weird, but I doubt it causes
> > any harm. After all, all the surface data programmed by
> > 3DSTATE_HIER_BUFFER is valid here regardless of the value of
> > HierarchicalDepthBufferEnable.
> > 
> > > +  anv_batch_emit(&cmd_buffer->batch, 
> > > GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb) {
> > > + hdb.HierarchicalDepthBufferObjectControlState = GENX(MOCS);
> > > + hdb.SurfacePitch = image->hiz_surface.isl.row_pitch - 1;
> > > + hdb.SurfaceBaseAddress = (struct anv_address) {
> > > +.bo = image->bo,
> > > +.offset = image->offset + image->hiz_surface.offset,
> > > + };
> > > +#if GEN_GEN >= 8
> > > + /* From the SKL PRM Vol2a:
> > > +  *
> > > +  *The interpretation of this field is dependent on Surface 
> > > Type
> > > +  *as follows:
> > > +  *- SURFTYPE_1D: distance in pixels between array slices
> > > +  *- SURFTYPE_2D/CUBE: distance in rows between array slices
> > > +  *- SURFTYPE_3D: distance in rows between R - slices
> > > +  *
> > > +  * ISL implements HiZ surfaces for 1D depth buffers as 2D. 
> > > Therefore
> > > +  * the depth buffer needs to be checked for the dimension.
> > > +  */

Re: [Mesa-dev] [PATCH 03/88] glsl: Add initial functions to

2016-09-26 Thread Timothy Arceri

On Mon, 2016-09-26 at 08:42 -0700, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > 
> > On Sun, 2016-09-25 at 13:26 -0700, Eric Anholt wrote:
> > > 
> > > Timothy Arceri  writes:
> > > > 
> > > > +static void
> > > > +test_put_key_and_get_key(void)
> > > > +{
> > > > +   struct program_cache *cache;
> > > > +   bool result;
> > > > +
> > > > +   uint8_t key_a[20] =
> > > > {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
> > > > +    10, 11, 12, 13, 14, 15, 16, 17, 18,
> > > > 19};
> > > > +   uint8_t key_b[20] = { 20, 21, 22, 23, 24, 25, 26, 27, 28,
> > > > 29,
> > > > +    30, 33, 32, 33, 34, 35, 36, 37, 38,
> > > > 39};
> > > > +   uint8_t key_a_collide[20] =
> > > > +{ 0,  1, 42, 43, 44, 45, 46, 47, 48,
> > > > 49,
> > > > +    50, 55, 52, 53, 54, 55, 56, 57, 58,
> > > > 59};
> > > > +
> > > > +   cache = cache_create();
> > > > +
> > > > +   /* First test that cache_has_key returns false before
> > > > cache_put_key */
> > > > +   result = cache_has_key(cache, key_a);
> > > > +   expect_equal(result, 0, "cache_has_key before key added");
> > > > +
> > > > +   /* Then a couple of tests of cache_put_key followed by
> > > > cache_has_key */
> > > > +   cache_put_key(cache, key_a);
> > > > +   result = cache_has_key(cache, key_a);
> > > > +   expect_equal(result, 1, "cache_has_key after key added");
> > > > +
> > > > +   cache_put_key(cache, key_b);
> > > > +   result = cache_has_key(cache, key_b);
> > > > +   expect_equal(result, 1, "2nd cache_has_key after key
> > > > added");
> > > > +
> > > > +   /* Test that a key with the same two bytes as an existing
> > > > key
> > > > +* forces an eviction.
> > > > +*/
> > > > +   cache_put_key(cache, key_a_collide);
> > > > +   result = cache_has_key(cache, key_a_collide);
> > > > +   expect_equal(result, 1, "put_key of a colliding key lands
> > > > in
> > > > the cache");
> > > > +
> > > > +   result = cache_has_key(cache, key_a);
> > > > +   expect_equal(result, 0, "put_key of a colliding key evicts
> > > > from
> > > > the cache");
> > > > +
> > > > +   /* And finally test that we can re-add the original key to
> > > > re-
> > > > evict
> > > > +* the colliding key.
> > > > +*/
> > > > +   cache_put_key(cache, key_a);
> > > > +   result = cache_has_key(cache, key_a);
> > > > +   expect_equal(result, 1, "put_key of original key lands
> > > > again");
> > > > +
> > > > +   result = cache_has_key(cache, key_a_collide);
> > > > +   expect_equal(result, 0, "put_key of oiginal key evicts the
> > > > colliding key");
> > > 
> > > "original"
> > > 
> > > I haven't yet figured out what the purpose of
> > > cache_put_key()/cache_has_key() are.  I suppose I'll find out
> > > later
> > > in
> > > the series.
> > 
> > Since we cache a program rather than individual shaders we set a
> > cache
> > key for each shader and opportunistically skip compiling it next
> > time
> > we see the shader.
> > 
> > If we happen to be using the shader to create a program we haven't
> > seen
> > before we end up having to fall back to compiling the shader
> > later. 
> 
> That works out better than just always skipping shader compile until
> link time and you find that you need it?

Good question. I haven't really done any profiling, this is the way
Carl and Kristian came up with doing it before I came along. For the
initial implementation I've been focusing on correctness rather than
speed. I'd really like to leave performance tweaks until after we land
something that's working.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
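
The behaviour the quoted test exercises, where a key sharing its first two bytes with a stored key evicts it, is what a direct-mapped index would produce. The sketch below is an in-memory model of such an index, assuming one slot per value of the leading two key bytes. The exact layout of the cache under review is not shown in this thread, so every name and size here is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE   20
#define SLOT_COUNT 65536   /* one slot per value of the first two bytes */

struct key_index {
   uint8_t keys[SLOT_COUNT][KEY_SIZE];
   bool used[SLOT_COUNT];
};

static struct key_index *
key_index_create(void)
{
   return calloc(1, sizeof(struct key_index));
}

static unsigned
slot_for(const uint8_t key[KEY_SIZE])
{
   return ((unsigned)key[0] << 8) | key[1];
}

/* Store the key, overwriting whatever occupied its slot. */
static void
key_index_put(struct key_index *idx, const uint8_t key[KEY_SIZE])
{
   unsigned slot = slot_for(key);
   memcpy(idx->keys[slot], key, KEY_SIZE);
   idx->used[slot] = true;
}

/* True only if the slot holds exactly this key. */
static bool
key_index_has(const struct key_index *idx, const uint8_t key[KEY_SIZE])
{
   unsigned slot = slot_for(key);
   return idx->used[slot] && memcmp(idx->keys[slot], key, KEY_SIZE) == 0;
}
```

Because a put never searches for a free slot, both operations are constant-time, at the cost of false misses when two keys collide; a miss only means the shader is compiled again.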

Re: [Mesa-dev] [PATCH v2 7/7] intel/isl: Add a detailed comment about multisampling with HiZ

2016-09-26 Thread Nanley Chery

On Mon, Sep 12, 2016 at 05:58:24PM -0700, Jason Ekstrand wrote:
> Signed-off-by: Jason Ekstrand 
> Reviewed-by: Chad Versace 
> ---
>  src/intel/isl/isl.c | 60 
> +++--
>  1 file changed, 58 insertions(+), 2 deletions(-)

This patch is
Reviewed-by: Nanley Chery 

> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index ee5330e..749d228 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -1288,6 +1288,63 @@ isl_surf_get_hiz_surf(const struct isl_device *dev,
> assert(surf->msaa_layout == ISL_MSAA_LAYOUT_NONE ||
>surf->msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED);
>  
> +   /* From the Broadwell PRM Vol. 7, "Hierarchical Depth Buffer":
> +*
> +*"The Surface Type, Height, Width, Depth, Minimum Array Element, 
> Render
> +*Target View Extent, and Depth Coordinate Offset X/Y of the
> +*hierarchical depth buffer are inherited from the depth buffer. The
> +*height and width of the hierarchical depth buffer that must be
> +*allocated are computed by the following formulas, where HZ is the
> +*hierarchical depth buffer and Z is the depth buffer. The Z_Height,
> +*Z_Width, and Z_Depth values given in these formulas are those 
> present
> +*in 3DSTATE_DEPTH_BUFFER incremented by one.
> +*
> +*"The value of Z_Height and Z_Width must each be multiplied by 2 
> before
> +*being applied to the table below if Number of Multisamples is set to
> +*NUMSAMPLES_4. The value of Z_Height must be multiplied by 2 and
> +*Z_Width must be multiplied by 4 before being applied to the table
> +*below if Number of Multisamples is set to NUMSAMPLES_8."
> +*
> +* In the Sky Lake PRM, the second paragraph is replaced with this:
> +*
> +*"The Z_Height and Z_Width values must equal those present in
> +*3DSTATE_DEPTH_BUFFER incremented by one."
> +*
> +* In other words, on Sandy Bridge through Broadwell, each 128-bit HiZ
> +* block corresponds to a region of 8x4 samples in the primary depth
> +* surface.  On Sky Lake, on the other hand, each HiZ block corresponds to
> +* a region of 8x4 pixels in the primary depth surface regardless of the
> +* number of samples.  The dimensions of a HiZ block in both pixels and
> +* samples are given in the table below:
> +*
> +*           | SNB - BDW |    SKL+
> +*     ------+-----------+-------------
> +*       1x  |  8 x 4 sa |   8 x 4 sa
> +*      MSAA |  8 x 4 px |   8 x 4 px
> +*     ------+-----------+-------------
> +*       2x  |  8 x 4 sa |  16 x 4 sa
> +*      MSAA |  4 x 4 px |   8 x 4 px
> +*     ------+-----------+-------------
> +*       4x  |  8 x 4 sa |  16 x 8 sa
> +*      MSAA |  4 x 2 px |   8 x 4 px
> +*     ------+-----------+-------------
> +*       8x  |  8 x 4 sa |  32 x 8 sa
> +*      MSAA |  2 x 2 px |   8 x 4 px
> +*     ------+-----------+-------------
> +*      16x  |    N/A    | 32 x 16 sa
> +*      MSAA |    N/A    |  8 x  4 px
> +*     ------+-----------+-------------
> +* There are a number of different ways that this discrepency could be
> +* handled.  The way we have chosen is to simply make MSAA HiZ have the
> +* same number of samples as the parent surface pre-Sky Lake and always be
> +* single-sampled on Sky Lake and above.  Since the block sizes of
> +* compressed formats are given in samples, this neatly handles everything
> +* without the need for additional HiZ formats with different block sizes
> +* on SKL+.
> +*/
> +   const unsigned samples = ISL_DEV_GEN(dev) >= 9 ? 1 : surf->samples;
> +
> isl_surf_init(dev, hiz_surf,
>   .dim = ISL_SURF_DIM_2D,
>   .format = ISL_FORMAT_HIZ,
> @@ -1296,8 +1353,7 @@ isl_surf_get_hiz_surf(const struct isl_device *dev,
>   .depth = 1,
>   .levels = surf->levels,
>   .array_len = surf->logical_level0_px.array_len,
> - /* On SKL+, HiZ is always single-sampled */
> - .samples = ISL_DEV_GEN(dev) >= 9 ? 1 : surf->samples,
> + .samples = samples,
>   .usage = ISL_SURF_USAGE_HIZ_BIT,
>   .tiling_flags = ISL_TILING_HIZ_BIT);
>  }
> -- 
> 2.5.0.400.gff86faf
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
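
The table in the quoted comment can be turned into a small lookup. The sketch below derives the HiZ block size in pixels and in samples from the sample count and the hardware generation, using the per-sample-count scale factors read off the table (2x: 2x1, 4x: 2x2, 8x: 4x2, 16x: 4x4). The struct and function names are invented for the illustration; the generation threshold of 9 for Sky Lake follows the `ISL_DEV_GEN(dev) >= 9` check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct extent2d { unsigned w, h; };

/* How many samples a pixel spans in x and y, as implied by the table.
 * Returns false for sample counts the table does not list. */
static bool
msaa_scale(unsigned samples, struct extent2d *scale)
{
   switch (samples) {
   case 1:  *scale = (struct extent2d){ 1, 1 }; return true;
   case 2:  *scale = (struct extent2d){ 2, 1 }; return true;
   case 4:  *scale = (struct extent2d){ 2, 2 }; return true;
   case 8:  *scale = (struct extent2d){ 4, 2 }; return true;
   case 16: *scale = (struct extent2d){ 4, 4 }; return true;
   default: return false;
   }
}

/* Fills in the HiZ block size in pixels (px) and in samples (sa). */
static bool
hiz_block_size(unsigned gen, unsigned samples,
               struct extent2d *px, struct extent2d *sa)
{
   struct extent2d scale;

   if (!msaa_scale(samples, &scale))
      return false;
   if (gen < 9 && samples == 16)
      return false;   /* listed as N/A before Sky Lake */

   if (gen >= 9) {
      /* Block is 8x4 pixels regardless of sample count. */
      *px = (struct extent2d){ 8, 4 };
      *sa = (struct extent2d){ 8 * scale.w, 4 * scale.h };
   } else {
      /* Block is 8x4 samples; the pixel footprint shrinks with MSAA. */
      *sa = (struct extent2d){ 8, 4 };
      *px = (struct extent2d){ 8 / scale.w, 4 / scale.h };
   }
   return true;
}
```

For example, generation 8 with 4x MSAA yields a 4 x 2 px block of 8 x 4 sa, while generation 9 with 8x MSAA yields an 8 x 4 px block of 32 x 8 sa, matching the corresponding table rows.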

Re: [Mesa-dev] [PATCH 10/12] genX/cmd_buffer: Enable rendering to HiZ

2016-09-26 Thread Chad Versace

On Mon 26 Sep 2016, Nanley Chery wrote:
> On Mon, Sep 19, 2016 at 01:49:09PM -0700, Nanley Chery wrote:
> > On Fri, Sep 02, 2016 at 03:16:21PM -0700, Chad Versace wrote:
> > > On Wed 31 Aug 2016, Nanley Chery wrote:
> > > > From: Chad Versace 
> > > > 
> > > > Nanley Chery:
> > > > (rebase)
> > > >  - Resolve conflicts with new anv_batch_emit macro
> > > > (amend)
> > > >  - Remove wip! tag and handle a QPitch TODO
> > > >  - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
> > > >  - Only use HiZ for single-subpass renderpasses
> > > >  - Emit the HiZ instruction before the stencil instruction to follow the
> > > >optimized clear sequence specified in the PRMs
> > > >  - Don't modify clear params
> > > >  - Enable resolves when a HiZ buffer is used to ensure depth buffer 
> > > > validity
> > > > 
> > > > Provides an FPS increase of ~15% on the Sascha triangle and 
> > > > multisampling
> > > > demos.
> > > > 
> > > > Signed-off-by: Nanley Chery 
> > > > ---
> > > >  src/intel/vulkan/gen8_cmd_buffer.c |  4 
> > > >  src/intel/vulkan/genX_cmd_buffer.c | 41 
> > > > ++
> > > >  2 files changed, 41 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
> > > > b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > index 4f27350..7f65fe2 100644
> > > > --- a/src/intel/vulkan/gen8_cmd_buffer.c
> > > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > @@ -414,6 +414,10 @@ genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer 
> > > > *cmd_buffer, enum anv_hz_op op)
> > > > if (iview == NULL || !anv_image_has_hiz(iview->image))
> > > >return;
> > > >  
> > > > +   /* FIXME: Implement multi-subpass HiZ */
> > > > +   if (cmd_buffer->state.pass->subpass_count > 1)
> > > > +  return;
> > > > +
> > > > const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
> > > > const bool full_surface_op =
> > > >   cmd_state->render_area.extent.width == 
> > > > iview->extent.width &&
> > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > index 95ed5f2..349d2a4 100644
> > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > @@ -1040,6 +1040,7 @@ cmd_buffer_emit_depth_stencil(struct 
> > > > anv_cmd_buffer *cmd_buffer)
> > > >anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
> > > > const struct anv_image *image = iview ? iview->image : NULL;
> > > > const bool has_depth = image && (image->aspects & 
> > > > VK_IMAGE_ASPECT_DEPTH_BIT);
> > > > +   const bool has_hiz = image != NULL && anv_image_has_hiz(image);
> > > > const bool has_stencil =
> > > >image && (image->aspects & VK_IMAGE_ASPECT_STENCIL_BIT);
> > > 
> > > >  
> > > > @@ -1052,7 +1053,12 @@ cmd_buffer_emit_depth_stencil(struct 
> > > > anv_cmd_buffer *cmd_buffer)
> > > >   db.SurfaceType   = SURFTYPE_2D;
> > > >   db.DepthWriteEnable  = true;
> > > >   db.StencilWriteEnable= has_stencil;
> > > > - db.HierarchicalDepthBufferEnable = false;
> > > > +
> > > > + if (cmd_buffer->state.pass->subpass_count == 1) {
> > > > +db.HierarchicalDepthBufferEnable = has_hiz;
> > > > + } else {
> > > > +anv_finishme("Multiple-subpass HiZ not implemented");
> > > > + }
> > > >  
> > > >   db.SurfaceFormat = isl_surf_get_depth_format(&device->isl_dev,
> > > >
> > > > &image->depth_surface.isl);
> > > > @@ -1104,6 +1110,34 @@ cmd_buffer_emit_depth_stencil(struct 
> > > > anv_cmd_buffer *cmd_buffer)
> > > >}
> > > > }
> > > >  
> > > > +   if (has_hiz) {
> > > 
> > > Note: This codepath is hit sometimes when
> > > 3DSTATE_DEPTH_BUFFER.HierarchicalDepthBufferEnable is false.
> > > Specifically, when subpass_count > 1. It's weird, but I doubt it causes
> > > any harm. After all, all the surface data programmed by
> > > 3DSTATE_HIER_BUFFER is valid here regardless of the value of
> > > HierarchicalDepthBufferEnable.
> > > 
> > > > +  anv_batch_emit(&cmd_buffer->batch, 
> > > > GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb) {
> > > > + hdb.HierarchicalDepthBufferObjectControlState = GENX(MOCS);
> > > > + hdb.SurfacePitch = image->hiz_surface.isl.row_pitch - 1;
> > > > + hdb.SurfaceBaseAddress = (struct anv_address) {
> > > > +.bo = image->bo,
> > > > +.offset = image->offset + image->hiz_surface.offset,
> > > > + };
> > > > +#if GEN_GEN >= 8
> > > > + /* From the SKL PRM Vol2a:
> > > > +  *
> > > > +  *The interpretation of this field is dependent on 
> > > > Surface Type
> > > > +  *as follows:
> > > > +  *- SURFTYPE_1D: distance in pixels between array slices
> > > > +  *- SURFTYPE_2D/CUBE: distance in rows between array 
> > > > slices
> >

Re: [Mesa-dev] [PATCH 03/88] glsl: Add initial functions to implement an on-disk cache

2016-09-26 Thread Timothy Arceri

On Mon, 2016-09-26 at 08:29 -0600, Brian Paul wrote:
> On 09/23/2016 11:24 PM, Timothy Arceri wrote:
> > 
> > From: Carl Worth 
> > 
> > This code provides for an on-disk cache of objects. Objects are
> > stored
> > and retrieved via names that are arbitrary 20-byte sequences,
> > (intended to be SHA-1 hashes of something identifying for the
> > content). The directory used for the cache can be specified by
> > means
> > of environment variables in the following priority order:
> > 
> > $MESA_GLSL_CACHE_DIR
> > $XDG_CACHE_HOME/mesa
> > /.cache/mesa
> > 
> > By default the cache will be limited to a maximum size of 1GB. The
> > environment variable:
> > 
> > $MESA_GLSL_CACHE_MAX_SIZE
> > 
> > can be set (at the time of GL context creation) to choose some
> > other
> > size. This variable is a number that can optionally be followed by
> > 'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
> > gigabytes. By default, an unadorned value will be interpreted as
> > gigabytes.
> > 
> > The cache will be entirely disabled at runtime if the variable
> > MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.
> > 
> > > Many thanks to Kristian Høgsberg for the
> > initial
> > implementation of code that led to this patch. In particular, the
> > idea
> > of using an mmapped file, (indexed by a portion of the SHA-1), for
> > the
> > > efficient implementation of cache_has_key was entirely his
> > idea. Kristian also provided some very helpful advice in
> > discussions
> > regarding various race conditions to be avoided in this code.
> > 
> > Signed-off-by: Timothy Arceri 
> > ---
> >   configure.ac |   3 +
> >   docs/envvars.html|  11 +
> >   src/compiler/Makefile.glsl.am|  10 +
> >   src/compiler/Makefile.sources|   4 +
> >   src/compiler/glsl/cache.c| 709
> > +++
> >   src/compiler/glsl/cache.h| 172 +
> >   src/compiler/glsl/tests/.gitignore   |   1 +
> >   src/compiler/glsl/tests/cache_test.c | 416 
> >   8 files changed, 1326 insertions(+)
> >   create mode 100644 src/compiler/glsl/cache.c
> >   create mode 100644 src/compiler/glsl/cache.h
> >   create mode 100644 src/compiler/glsl/tests/cache_test.c
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 0604ad9..7db31e4 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1305,6 +1305,9 @@ if test "x$with_sha1" = "x"; then
> >   fi
> >   fi
> >   AM_CONDITIONAL([ENABLE_SHADER_CACHE], [test x$enable_shader_cache
> > = xyes])
> > +if test "x$enable_shader_cache" = "xyes"; then
> > +   AC_DEFINE([ENABLE_SHADER_CACHE], [1], [Enable shader cache])
> > +fi
> > 
> >   case "$host_os" in
> >   linux*)
> > diff --git a/docs/envvars.html b/docs/envvars.html
> > index cf57ca5..2375145 100644
> > --- a/docs/envvars.html
> > +++ b/docs/envvars.html
> > @@ -112,6 +112,17 @@ glGetString(GL_VERSION) for OpenGL ES.
> >   glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are
> > integers, such as
> >   "130".  Mesa will not really implement all the features of the
> > given language version
> >   if it's higher than what's normally reported. (for developers
> > only)
> > +MESA_GLSL_CACHE_DISABLE - if set, disables the GLSL shader
> > cache
> > +MESA_GLSL_CACHE_MAX_SIZE - if set, determines the maximum size
> > of
> > +the on-disk cache of compiled GLSL programs. Should be set to a
> > number
> > +optionally followed by 'K', 'M', or 'G' to specify a size in
> > +kilobytes, megabytes, or gigabytes. By default, gigabytes will be
> > > +assumed. And if unset, a maximum size of 1GB will be used.
> > +MESA_GLSL_CACHE_DIR - if set, determines the directory to be
> > used
> > +for the on-disk cache of compiled GLSL programs. If this variable
> > is
> > +not set, then the cache will be stored in $XDG_CACHE_HOME/.mesa
> > (if
> > +that variable is set), or else within .cache/mesa within the
> > user's
> > +home directory.
> >   MESA_GLSL - shading language
> > compiler options
> >   MESA_NO_MINMAX_CACHE - when set, the minmax index cache is
> > globally disabled.
> >   
> > diff --git a/src/compiler/Makefile.glsl.am
> > b/src/compiler/Makefile.glsl.am
> > index b8225cb..80dfb73 100644
> > --- a/src/compiler/Makefile.glsl.am
> > +++ b/src/compiler/Makefile.glsl.am
> > @@ -33,6 +33,7 @@ EXTRA_DIST += glsl/tests glsl/glcpp/tests
> > glsl/README \
> >   TESTS += glsl/glcpp/tests/glcpp-test  \
> >     glsl/glcpp/tests/glcpp-test-cr-lf   \
> >     glsl/tests/blob-test\
> > +   glsl/tests/cache-test   \
> >     glsl/tests/general-ir-test  \
> >     glsl/tests/optimization-test\
> >     glsl/tests/sampler-types-test   \
> > @@ -47,6 +48,7 @@ check_PROGRAMS += 
> > \
> >     glsl/glcpp/glcpp\
> >     glsl/g

Re: [Mesa-dev] [PATCH] st/va: Fix vaSyncSurface with no outstanding operation

2016-09-26 Thread Andy Furniss


Mark Thompson wrote:

---
A simple fix to the problem described here: 
.

With this applied, the driver no longer hangs/crashes when vaSyncSurface() is 
called in places other than for the first time after an encode operation 
(including a second call on the same surface).


I could once get ffmpeg (patched) or avconv to roughly work (before the
dual instance commit), but I can't get either to work now: both produce an
unreadable file.


Testing with git avconv I am trying -

./avconv -vaapi_device :0 -f rawvideo -framerate 50 -s 2560x1440 
-pix_fmt nv12 -i /mnt/ramdisk/trees-1440p50.nv12 -vframes 5 -vf 
'hwupload' -c:v h264_vaapi -profile:v 66 -b:v 40M  -bf 0 -g 30  -f h264 
-y /mnt/ramdisk/out.264


but debugging printfs show refs = 2 and bframes enabled (I also notice 
with your baseline patch that -profile:v 66 fails).


Do you have an example that works for you with avconv + this patch?

TIA



Thanks,

- Mark


  src/gallium/state_trackers/va/surface.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 75db650..5e92980 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -111,6 +111,12 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID 
render_target)
return VA_STATUS_ERROR_INVALID_SURFACE;
 }

+   if (!surf->feedback) {
+  // No outstanding operation: nothing to do.
+  pipe_mutex_unlock(drv->mutex);
+  return VA_STATUS_SUCCESS;
+   }
+
 context = handle_table_get(drv->htab, surf->ctx);
 if (!context) {
pipe_mutex_unlock(drv->mutex);
@@ -126,6 +132,7 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID 
render_target)
if (frame_diff < 2)
   context->decoder->flush(context->decoder);
context->decoder->get_feedback(context->decoder, surf->feedback, 
&(surf->coded_buf->coded_size));
+  surf->feedback = NULL;
 }
 pipe_mutex_unlock(drv->mutex);
 return VA_STATUS_SUCCESS;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
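
The idea behind the vaSyncSurface patch above can be shown with a small
stand-alone mock: the feedback handle is consumed on the first sync, so a
second sync on the same surface returns early instead of querying the decoder
again. All names here (mock_va_surface, mock_sync_surface, the call counter)
are invented for illustration; the real driver also takes a mutex and looks up
the context, which this sketch leaves out.

```c
#include <stddef.h>

/* Mock surface: feedback is non-NULL only while an encode result is
 * still outstanding. */
struct mock_va_surface {
   void *feedback;
   unsigned coded_size;
};

/* Counts how often the mock decoder is asked for feedback. */
static int get_feedback_calls;

static void
mock_get_feedback(void *feedback, unsigned *size)
{
   (void)feedback;
   get_feedback_calls++;
   *size = 1234; /* arbitrary placeholder value */
}

/* Returns 0 on success, like VA_STATUS_SUCCESS. */
static int
mock_sync_surface(struct mock_va_surface *surf)
{
   if (!surf->feedback)
      return 0; /* no outstanding operation: nothing to do */

   mock_get_feedback(surf->feedback, &surf->coded_size);
   surf->feedback = NULL; /* consumed, so a repeated sync is a no-op */
   return 0;
}
```

Calling mock_sync_surface() twice on the same surface queries the mock decoder
only once.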

[Mesa-dev] [PATCH V2 01/11] isl: Correct a comment in the isl_format enum

2016-09-26 Thread Nanley Chery

From: Nanley Chery 

HiZ is not a color surface, but an auxiliary depth surface.

Signed-off-by: Nanley Chery 
Reviewed-by: Chad Versace 
Reviewed-by: Jason Ekstrand 
---
 src/intel/isl/isl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index d2f0e16..007dd80 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -358,7 +358,7 @@ enum isl_format {
 * actual hardware formats *must* come before these in the list.
 */
 
-   /* Formats for color compression surfaces */
+   /* Formats for auxiliary surfaces */
ISL_FORMAT_HIZ,
ISL_FORMAT_MCS_2X,
ISL_FORMAT_MCS_4X,
-- 
2.10.0


[Mesa-dev] [PATCH V2 03/11] anv: Add anv_image::hiz_surface

2016-09-26 Thread Nanley Chery

From: Chad Versace 

Unused.

Signed-off-by: Nanley Chery 
Reviewed-by: Jason Ekstrand 
---
 src/intel/vulkan/anv_private.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 443c31f..7e08786 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1651,6 +1651,7 @@ anv_pipeline_setup_l3_config(struct anv_pipeline 
*pipeline, bool needs_slm);
  * Subsurface of an anv_image.
  */
 struct anv_surface {
+   /** Valid only if isl_surf::size > 0. */
struct isl_surf isl;
 
/**
@@ -1697,6 +1698,7 @@ struct anv_image {
 
   struct {
  struct anv_surface depth_surface;
+ struct anv_surface hiz_surface;
  struct anv_surface stencil_surface;
   };
};
-- 
2.10.0


[Mesa-dev] [PATCH V2 00/11] anv: Implement HiZ for basic cases

2016-09-26 Thread Nanley Chery

This series is the second revision of the series found here:
https://lists.freedesktop.org/archives/mesa-dev/2016-September/127687.html

Comments from the first were addressed and the code was rebased onto
the upstream master.

Cc: Chad Versace 
Cc: Jason Ekstrand 

Chad Versace (4):
  anv: Add anv_image::hiz_surface
  anv: Add func anv_image_has_hiz()
  anv: Allocate hiz surface
  genX/cmd_buffer: Enable rendering to HiZ

Jason Ekstrand (2):
  anv: Move BindImageMemory to anv_image.c
  anv/image: Memset hiz surfaces to 0 when binding memory

Nanley Chery (5):
  isl: Correct a comment in the isl_format enum
  isl: Update isl_surf_get_hiz_surf()
  anv/cmd_buffer: Add code for performing HZ operations
  genX/cmd_buffer: Enable fast depth clears
  anv/TODO: Update the HiZ task

 src/intel/isl/isl.c|  39 ++--
 src/intel/isl/isl.h|   2 +-
 src/intel/vulkan/TODO  |   2 +-
 src/intel/vulkan/anv_device.c  |  20 -
 src/intel/vulkan/anv_genX.h|   3 +
 src/intel/vulkan/anv_image.c   |  88 +-
 src/intel/vulkan/anv_pass.c|  13 +++
 src/intel/vulkan/anv_private.h |  12 +++
 src/intel/vulkan/gen7_cmd_buffer.c |   6 ++
 src/intel/vulkan/gen8_cmd_buffer.c | 177 +
 src/intel/vulkan/genX_cmd_buffer.c |  47 --
 11 files changed, 371 insertions(+), 38 deletions(-)

-- 
2.10.0


[Mesa-dev] [PATCH V2 02/11] isl: Update isl_surf_get_hiz_surf()

2016-09-26 Thread Nanley Chery

From: Nanley Chery 

Modify extents and dimensions to match the PRMs more closely. Besides letting
us create the correct 3D surface, this avoids working with multisampled
compressed textures.

Signed-off-by: Nanley Chery 
Reviewed-by: Chad Versace 
---

Note: This patch will have to be rebased if/when the following series
lands upstream before this series:

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128761.html

 src/intel/isl/isl.c | 39 +--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index b6e88ad..c68ab45 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1287,27 +1287,54 @@ isl_surf_get_tile_info(const struct isl_device *dev,
isl_tiling_get_info(dev, surf->tiling, fmtl->bpb, tile_info);
 }
 
+/**
+ * \todo Implement the correct dimensions pre-BDW.
+ */
 void
 isl_surf_get_hiz_surf(const struct isl_device *dev,
   const struct isl_surf *surf,
   struct isl_surf *hiz_surf)
 {
+   assert(surf->usage & ISL_SURF_USAGE_DEPTH_BIT);
+   assert(!(surf->usage & ISL_SURF_USAGE_DISABLE_AUX_BIT));
+
assert(ISL_DEV_GEN(dev) >= 5 && ISL_DEV_USE_SEPARATE_STENCIL(dev));
 
/* Multisampled depth is always interleaved */
assert(surf->msaa_layout == ISL_MSAA_LAYOUT_NONE ||
   surf->msaa_layout == ISL_MSAA_LAYOUT_INTERLEAVED);
 
+   uint32_t width, height;
+
+   /* On SKL+, one HiZ sample maps to one depth pixel and
+* the miplayout is recalculated.
+*/
+   if (ISL_DEV_GEN(dev) >= 9) {
+  width = surf->logical_level0_px.width;
+  height = surf->logical_level0_px.height;
+   } else {
+  /* On BDW+, one HiZ sample maps to one depth sample and
+   * the miplayout is recalculated.
+   */
+  width = surf->phys_level0_sa.width;
+  height = surf->phys_level0_sa.height;
+   }
+
isl_surf_init(dev, hiz_surf,
- .dim = ISL_SURF_DIM_2D,
+ /* The layout of a 2D HiZ surface is identical to that of a
+  * 1D HiZ surface HSW+. Since ISL doesn't support compressed
+  * 1D surfaces currently and it is not yet needed, change the
+  * dimension for now.
+  */
+ .dim = surf->dim == ISL_SURF_DIM_1D ?
+ISL_SURF_DIM_2D : surf->dim,
  .format = ISL_FORMAT_HIZ,
- .width = surf->logical_level0_px.width,
- .height = surf->logical_level0_px.height,
- .depth = 1,
+ .width = width,
+ .height = height,
+ .depth = surf->logical_level0_px.depth,
  .levels = surf->levels,
  .array_len = surf->logical_level0_px.array_len,
- /* On SKL+, HiZ is always single-sampled */
- .samples = ISL_DEV_GEN(dev) >= 9 ? 1 : surf->samples,
+ .samples = 1,
  .usage = ISL_SURF_USAGE_HIZ_BIT,
  .tiling_flags = ISL_TILING_HIZ_BIT);
 }
-- 
2.10.0
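
As a reading aid for the isl_surf_get_hiz_surf() change above, the parameter
selection can be restated on mock types: SKL+ sizes the HiZ surface from the
logical (pixel) extent, older gens from the physical (sample) extent, and a 1D
depth surface gets a 2D HiZ surface. The struct and function names below are
invented for this sketch and are not ISL API.

```c
#include <stdint.h>

enum mock_dim { MOCK_DIM_1D, MOCK_DIM_2D, MOCK_DIM_3D };

/* Mock of the few depth-surface fields the selection depends on. */
struct mock_depth_surf {
   enum mock_dim dim;
   uint32_t logical_w, logical_h; /* level-0 extent in pixels */
   uint32_t phys_w, phys_h;       /* level-0 extent in samples */
};

struct mock_hiz_params {
   enum mock_dim dim;
   uint32_t width, height;
};

static struct mock_hiz_params
mock_get_hiz_params(int gen, const struct mock_depth_surf *surf)
{
   struct mock_hiz_params p;

   if (gen >= 9) {
      /* SKL+: one HiZ sample per depth pixel */
      p.width = surf->logical_w;
      p.height = surf->logical_h;
   } else {
      /* before SKL: one HiZ sample per depth sample */
      p.width = surf->phys_w;
      p.height = surf->phys_h;
   }

   /* 1D depth buffers are given a 2D HiZ surface */
   p.dim = surf->dim == MOCK_DIM_1D ? MOCK_DIM_2D : surf->dim;
   return p;
}
```

For a depth surface that is 100x50 pixels and 200x100 samples, this sketch
picks 100x50 on gen9 and 200x100 on gen8.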


[Mesa-dev] [PATCH V2 09/11] genX/cmd_buffer: Enable rendering to HiZ

2016-09-26 Thread Nanley Chery

From: Chad Versace 

Nanley Chery:
(rebase)
 - Resolve conflicts with new anv_batch_emit macro
(amend)
 - Handle a QPitch TODO
 - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
 - Only use HiZ for single-subpass renderpasses
 - Emit the HiZ instruction before the stencil instruction to follow the
   optimized clear sequence specified in the PRMs
 - Don't modify clear params
 - Enable resolves when a HiZ buffer is used to ensure depth buffer validity

Provides an FPS increase of ~15% on the Sascha triangle and multisampling
demos.

Signed-off-by: Nanley Chery 

---

v2: Emit zero'ed 3DSTATE_HIER_DEPTH_BUFFER when hiz is disabled
(Jason, Chad)

 src/intel/vulkan/gen8_cmd_buffer.c |  4 
 src/intel/vulkan/genX_cmd_buffer.c | 43 ++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index a13413c..14e6a7b 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -417,6 +417,10 @@ genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
if (iview == NULL || !anv_image_has_hiz(iview->image))
   return;
 
+   /* FIXME: Implement multi-subpass HiZ */
+   if (cmd_buffer->state.pass->subpass_count > 1)
+  return;
+
const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
const bool full_surface_op =
  cmd_state->render_area.extent.width == iview->extent.width &&
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 6a84383..2cb1539 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1199,6 +1199,7 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
   anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
const struct anv_image *image = iview ? iview->image : NULL;
const bool has_depth = image && (image->aspects & 
VK_IMAGE_ASPECT_DEPTH_BIT);
+   const bool has_hiz = image != NULL && anv_image_has_hiz(image);
const bool has_stencil =
   image && (image->aspects & VK_IMAGE_ASPECT_STENCIL_BIT);
 
@@ -1211,7 +1212,12 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
  db.SurfaceType   = SURFTYPE_2D;
  db.DepthWriteEnable  = true;
  db.StencilWriteEnable= has_stencil;
- db.HierarchicalDepthBufferEnable = false;
+
+ if (cmd_buffer->state.pass->subpass_count == 1) {
+db.HierarchicalDepthBufferEnable = has_hiz;
+ } else {
+anv_finishme("Multiple-subpass HiZ not implemented");
+ }
 
  db.SurfaceFormat = isl_surf_get_depth_format(&device->isl_dev,
   
&image->depth_surface.isl);
@@ -1263,6 +1269,36 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
   }
}
 
+   if (has_hiz) {
+  anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb) 
{
+ hdb.HierarchicalDepthBufferObjectControlState = GENX(MOCS);
+ hdb.SurfacePitch = image->hiz_surface.isl.row_pitch - 1;
+ hdb.SurfaceBaseAddress = (struct anv_address) {
+.bo = image->bo,
+.offset = image->offset + image->hiz_surface.offset,
+ };
+#if GEN_GEN >= 8
+ /* From the SKL PRM Vol2a:
+  *
+  *The interpretation of this field is dependent on Surface Type
+  *as follows:
+  *- SURFTYPE_1D: distance in pixels between array slices
+  *- SURFTYPE_2D/CUBE: distance in rows between array slices
+  *- SURFTYPE_3D: distance in rows between R - slices
+  *
+  * ISL implements HiZ surfaces for 1D depth buffers as 2D. Therefore
+  * the depth buffer needs to be checked for the dimension.
+  */
+ hdb.SurfaceQPitch =
+image->depth_surface.isl.dim == ISL_SURF_DIM_1D ?
+   isl_surf_get_array_pitch_el(&image->hiz_surface.isl) >> 2 :
+   isl_surf_get_array_pitch_el_rows(&image->hiz_surface.isl) >> 2;
+#endif
+  }
+   } else {
+  anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb);
+   }
+
/* Emit 3DSTATE_STENCIL_BUFFER */
if (has_stencil) {
   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_STENCIL_BUFFER), sb) {
@@ -1285,9 +1321,6 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_STENCIL_BUFFER), sb);
}
 
-   /* Disable hierarchial depth buffers. */
-   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_HIER_DEPTH_BUFFER), hz);
-
/* Clear the clear params. */
anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp);
 }
@@ -1323,6 +1356,7 @@ void genX(CmdBeginRenderPass)(
genX(flush_pipeline_select_3d)(cmd_buffer);
 
genX(cmd_buffer_set_subpass)(cmd_buffer, pass->subpasses);
+   genX(cmd_buffer_do_
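
The decisions made in the cmd_buffer_emit_depth_stencil() hunks above boil down
to a few small computations, restated here as a stand-alone sketch. The helper
names are invented; the only facts taken from the patch are that HiZ is enabled
in 3DSTATE_DEPTH_BUFFER only for single-subpass render passes, that
SurfacePitch is programmed as row_pitch - 1, and that the QPitch value is the
array pitch shifted right by two.

```c
#include <stdbool.h>
#include <stdint.h>

/* HierarchicalDepthBufferEnable: only when the image has a HiZ surface
 * and the render pass has exactly one subpass. */
static bool
mock_hiz_enable(bool has_hiz, uint32_t subpass_count)
{
   return has_hiz && subpass_count == 1;
}

/* SurfacePitch is written as the row pitch minus one. */
static uint32_t
mock_encode_surface_pitch(uint32_t row_pitch)
{
   return row_pitch - 1;
}

/* SurfaceQPitch is written as the array pitch divided by four. */
static uint32_t
mock_encode_qpitch(uint32_t array_pitch)
{
   return array_pitch >> 2;
}
```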

[Mesa-dev] [PATCH V2 08/11] anv/cmd_buffer: Add code for performing HZ operations

2016-09-26 Thread Nanley Chery

Create a function that performs one of three HiZ operations:
depth/stencil clears, HiZ resolves, and depth resolves.

Signed-off-by: Nanley Chery 

---

v2. Add documentation
Fix the alignment check
Don't minify clear rectangle (Jason)
Use blorp enums (Jason)
Enable depth stalls and flushes
Use full RT rectangle for resolve ops
Add stencil clear todo

 src/intel/vulkan/anv_genX.h|   3 +
 src/intel/vulkan/gen7_cmd_buffer.c |   6 ++
 src/intel/vulkan/gen8_cmd_buffer.c | 167 +
 3 files changed, 176 insertions(+)

diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index 02e79c2..ad3bec9 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -58,6 +58,9 @@ genX(emit_urb_setup)(struct anv_device *device, struct 
anv_batch *batch,
  unsigned vs_entry_size, unsigned gs_entry_size,
  const struct gen_l3_config *l3_config);
 
+void genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer *cmd_buffer,
+   enum blorp_hiz_op op);
+
 VkResult
 genX(graphics_pipeline_create)(VkDevice _device,
struct anv_pipeline_cache *cache,
diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index b627ef0..78b5ac7 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -323,6 +323,12 @@ genX(cmd_buffer_flush_dynamic_state)(struct anv_cmd_buffer 
*cmd_buffer)
cmd_buffer->state.dirty = 0;
 }
 
+void
+genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer *cmd_buffer,
+  enum blorp_hiz_op op)
+{
+}
+
 void genX(CmdSetEvent)(
 VkCommandBuffer commandBuffer,
 VkEvent event,
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index 7058608..a13413c 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -399,6 +399,173 @@ genX(cmd_buffer_flush_compute_state)(struct 
anv_cmd_buffer *cmd_buffer)
genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
 }
 
+
+/**
+ * Emit the HZ_OP packet in the sequence specified by the BDW PRM section
+ * entitled: "Optimized Depth Buffer Clear and/or Stencil Buffer Clear."
+ *
+ * \todo Enable Stencil Buffer-only clears
+ */
+void
+genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer *cmd_buffer,
+  enum blorp_hiz_op op)
+{
+   struct anv_cmd_state *cmd_state = &cmd_buffer->state;
+   const struct anv_image_view *iview =
+  anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
+
+   if (iview == NULL || !anv_image_has_hiz(iview->image))
+  return;
+
+   const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
+   const bool full_surface_op =
+ cmd_state->render_area.extent.width == iview->extent.width &&
+ cmd_state->render_area.extent.height == iview->extent.height;
+
+   /* Validate that we can perform the HZ operation and that it's necessary. */
+   switch (op) {
+   case BLORP_HIZ_OP_DEPTH_CLEAR:
+  if (cmd_buffer->state.pass->attachments[ds].load_op !=
+  VK_ATTACHMENT_LOAD_OP_CLEAR)
+ return;
+
+  /* Apply alignment restrictions. Despite the BDW PRM mentioning this is
+   * only needed for a depth buffer surface type of D16_UNORM, testing
+   * showed it to be necessary for other depth formats as well
+   * (e.g., D32_FLOAT).
+   */
+  if (!full_surface_op) {
+
+ struct isl_extent2d px_dim;
+#if GEN_GEN == 8
+ /* Pre-SKL, HiZ has an 8x4 sample block. As the number of samples
+  * increases, the number of pixels representable by this block
+  * decreases by a factor of the sample dimensions. Sample dimensions
+  * scale following the MSAA interleaved pattern.
+  *
+  * Sample|Sample|Pixel
+  * Count |Dim   |Dim
+  * ===
+  *1  | 1x1  | 8x4
+  *2  | 2x1  | 4x4
+  *4  | 2x2  | 4x2
+  *8  | 4x2  | 2x2
+  *   16  | 4x4  | 2x1
+  *
+  * Table: Pixel Dimensions in a HiZ Sample Block Pre-SKL
+  */
+ const struct isl_extent2d sa_dim =
+isl_get_interleaved_msaa_px_size_sa(iview->image->samples);
+ px_dim.w = 8 / sa_dim.w;
+ px_dim.h = 4 / sa_dim.h;
+#else
+ /* SKL+, the sample block becomes a "pixel block" so the expected
+  * pixel dimension is a constant 8x4 px for all sample counts.
+  */
+ px_dim = (struct isl_extent2d) { .w = 8, .h = 4};
+#endif
+
+ /* Fast depth clears clear an entire sample block at a time. As a
+  * result, the rectangle must be aligned to the pixel dimensions of
+  * a sample block for a successful operation.
+  */
+ if (cmd_state->render_area.offset.x % px_dim.w ||
+ cmd_state->re
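
The alignment rule from the comment table above can be tried out with a small
stand-alone sketch. The sample dimensions are copied from the table in the
patch; the type and function names are invented, and checking all four of
offset.x, offset.y, extent.width and extent.height is an assumption, since the
quoted hunk is cut off after the first condition.

```c
#include <stdbool.h>
#include <stdint.h>

struct mock_extent2d { uint32_t w, h; };
struct mock_rect { uint32_t x, y, w, h; };

/* Sample dimensions of the interleaved MSAA pattern, per the table. */
static struct mock_extent2d
mock_msaa_sample_dim(uint32_t samples)
{
   switch (samples) {
   case 2:  return (struct mock_extent2d) { 2, 1 };
   case 4:  return (struct mock_extent2d) { 2, 2 };
   case 8:  return (struct mock_extent2d) { 4, 2 };
   case 16: return (struct mock_extent2d) { 4, 4 };
   default: return (struct mock_extent2d) { 1, 1 };
   }
}

/* Pixel dimensions covered by one HiZ block. */
static struct mock_extent2d
mock_hiz_block_px(int gen, uint32_t samples)
{
   if (gen >= 9) /* SKL+: constant 8x4 pixel block */
      return (struct mock_extent2d) { 8, 4 };

   /* Before SKL: an 8x4 sample block, shrunk by the sample dimensions. */
   struct mock_extent2d sa = mock_msaa_sample_dim(samples);
   return (struct mock_extent2d) { 8 / sa.w, 4 / sa.h };
}

/* A partial fast clear is only attempted on block-aligned rectangles. */
static bool
mock_fast_clear_aligned(struct mock_rect r, struct mock_extent2d block)
{
   return r.x % block.w == 0 && r.y % block.h == 0 &&
          r.w % block.w == 0 && r.h % block.h == 0;
}
```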

[Mesa-dev] [PATCH V2 05/11] anv: Allocate hiz surface

2016-09-26 Thread Nanley Chery

From: Chad Versace 

Nanley Chery:
(rebase)
 - Use isl_surf_get_hiz_surf()
(amend)
 - Only add a HiZ surface onto a depth/stencil attachment
 - Add comment above HiZ surface addition
 - Hide HiZ behind INTEL_VK_HIZ prior to BDW
 - Disable HiZ for untested cases
 - Remove DISABLE_AUX_BIT instead of preventing it from being added

Signed-off-by: Nanley Chery 
Reviewed-by: Jason Ekstrand 
Reviewed-by: Chad Versace  (v1)

---

v2: Disable certain HiZ cases here (Jason)

 src/intel/vulkan/anv_image.c | 39 ---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index f6e8672..d408819 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -28,6 +28,7 @@
 #include 
 
 #include "anv_private.h"
+#include "util/debug.h"
 
 #include "vk_format_info.h"
 
@@ -60,6 +61,7 @@ choose_isl_surf_usage(VkImageUsageFlags vk_usage,
   default:
  unreachable("bad VkImageAspect");
   case VK_IMAGE_ASPECT_DEPTH_BIT:
+ isl_usage &= ~ISL_SURF_USAGE_DISABLE_AUX_BIT;
  isl_usage |= ISL_SURF_USAGE_DEPTH_BIT;
  break;
   case VK_IMAGE_ASPECT_STENCIL_BIT:
@@ -99,6 +101,16 @@ get_surface(struct anv_image *image, VkImageAspectFlags 
aspect)
}
 }
 
+static void
+add_surface(struct anv_image *image, struct anv_surface *surf)
+{
+   assert(surf->isl.size > 0); /* isl surface must be initialized */
+
+   surf->offset = align_u32(image->size, surf->isl.alignment);
+   image->size = surf->offset + surf->isl.size;
+   image->alignment = MAX(image->alignment, surf->isl.alignment);
+}
+
 /**
  * Initialize the anv_image::*_surface selected by \a aspect. Then update the
  * image's memory requirements (that is, the image's size and alignment).
@@ -160,9 +172,30 @@ make_surface(const struct anv_device *dev,
 */
assert(ok);
 
-   anv_surf->offset = align_u32(image->size, anv_surf->isl.alignment);
-   image->size = anv_surf->offset + anv_surf->isl.size;
-   image->alignment = MAX(image->alignment, anv_surf->isl.alignment);
+   add_surface(image, anv_surf);
+
+   /* Allow the user to control HiZ enabling. Disable by default on gen7
+* because resolves are not currently implemented pre-BDW.
+*/
+   if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >= 8)) {
+  anv_finishme("Implement gen7 HiZ");
+  return VK_SUCCESS;
+   } else if (vk_info->mipLevels > 1) {
+  anv_finishme("Test multi-LOD HiZ");
+  return VK_SUCCESS;
+   } else if (dev->info.gen == 8 && vk_info->samples > 1) {
+  anv_finishme("Test gen8 multisampled HiZ");
+  return VK_SUCCESS;
+   }
+
+   /* Add a HiZ surface to a depth buffer that will be used for rendering.
+*/
+   if (aspect == VK_IMAGE_ASPECT_DEPTH_BIT &&
+   (image->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
+  isl_surf_get_hiz_surf(&dev->isl_dev, &image->depth_surface.isl,
+&image->hiz_surface.isl);
+  add_surface(image, &image->hiz_surface);
+   }
 
return VK_SUCCESS;
 }
-- 
2.10.0
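
To make the add_surface() bookkeeping above concrete, here is the same
arithmetic on mock structs: each surface is placed at the image's current size
rounded up to the surface's alignment, and the image then grows to cover it.
The mock_* names are invented, and the align helper assumes power-of-two
alignments.

```c
#include <stdint.h>

struct mock_sub_surface {
   uint32_t size;      /* bytes needed by the surface */
   uint32_t alignment; /* required alignment, a power of two */
   uint32_t offset;    /* filled in: offset within the image */
};

struct mock_image {
   uint32_t size;
   uint32_t alignment;
};

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t
mock_align_u32(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

static void
mock_add_surface(struct mock_image *image, struct mock_sub_surface *surf)
{
   surf->offset = mock_align_u32(image->size, surf->alignment);
   image->size = surf->offset + surf->size;
   if (surf->alignment > image->alignment)
      image->alignment = surf->alignment;
}
```

For example, adding a 10000-byte depth surface and then an 8192-byte HiZ
surface, both 4096-aligned, to an empty image puts the HiZ surface at offset
12288 and gives a total image size of 20480 bytes.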


[Mesa-dev] [PATCH V2 11/11] anv/TODO: Update the HiZ task

2016-09-26 Thread Nanley Chery

From: Nanley Chery 

Signed-off-by: Nanley Chery 

---

v2. Add untested HiZ cases

 src/intel/vulkan/TODO | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/TODO b/src/intel/vulkan/TODO
index 8fac370..9ac63eb 100644
--- a/src/intel/vulkan/TODO
+++ b/src/intel/vulkan/TODO
@@ -19,7 +19,7 @@ Code sharing with GL:
  - Generalize blorp to use ISL and be sharable between the two drivers
 
 Performance:
- - HiZ (Nanley)
+ - Multi-{sampled/gen8,LOD,subpass} HiZ
  - Fast color clears (after HiZ?)
  - Compressed multisample support
  - Renderbuffer compression (SKL+)
-- 
2.10.0


[Mesa-dev] [PATCH V2 04/11] anv: Add func anv_image_has_hiz()

2016-09-26 Thread Nanley Chery

From: Chad Versace 

Signed-off-by: Nanley Chery 
Reviewed-by: Jason Ekstrand 

---

v2. Check aspect instead of usage (Chad, Jason)

 src/intel/vulkan/anv_private.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 7e08786..5f925df 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1762,6 +1762,16 @@ const struct anv_surface *
 anv_image_get_surface_for_aspect_mask(const struct anv_image *image,
   VkImageAspectFlags aspect_mask);
 
+static inline bool
+anv_image_has_hiz(const struct anv_image *image)
+{
+   /* We must check the aspect because anv_image::hiz_surface belongs to
+* a union.
+*/
+   return (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
+  image->hiz_surface.isl.size > 0;
+}
+
 void anv_image_view_init(struct anv_image_view *view,
  struct anv_device *device,
  const VkImageViewCreateInfo* pCreateInfo,
-- 
2.10.0


[Mesa-dev] [PATCH V2 07/11] anv/image: Memset hiz surfaces to 0 when binding memory

2016-09-26 Thread Nanley Chery

From: Jason Ekstrand 

Nanley Chery (amend):
 - Change memset value from 0xff to 0 (a defined value for HiZ).

Signed-off-by: Nanley Chery 

---

v2. Add asserts (Jason)
Handle NULL return value of the mmap

 src/intel/vulkan/anv_image.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 57b2014..1cdb71f 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -319,11 +319,12 @@ anv_DestroyImage(VkDevice _device, VkImage _image,
 }
 
 VkResult anv_BindImageMemory(
-VkDevicedevice,
+VkDevice_device,
 VkImage _image,
 VkDeviceMemory  _memory,
 VkDeviceSizememoryOffset)
 {
+   ANV_FROM_HANDLE(anv_device, device, _device);
ANV_FROM_HANDLE(anv_device_memory, mem, _memory);
ANV_FROM_HANDLE(anv_image, image, _image);
 
@@ -335,6 +336,34 @@ VkResult anv_BindImageMemory(
   image->offset = 0;
}
 
+   if (anv_image_has_hiz(image)) {
+
+  /* The offset and size must be a multiple of 4K or else the
+   * anv_gem_mmap call below will return NULL.
+   */
+  assert((image->offset + image->hiz_surface.offset) % 4096 == 0);
+  assert(image->hiz_surface.isl.size % 4096 == 0);
+
+  /* HiZ surfaces need to have their memory cleared to 0 before they
+   * can be used.  If we let it have garbage data, it can cause GPU
+   * hangs on some hardware.
+   */
+  void *map = anv_gem_mmap(device, image->bo->gem_handle,
+   image->offset + image->hiz_surface.offset,
+   image->hiz_surface.isl.size,
+   device->info.has_llc ? 0 : I915_MMAP_WC);
+
+  /* If anv_gem_mmap returns NULL, it's likely that the kernel was
+   * not able to find space on the host to create a proper mapping.
+   */
+  if (map == NULL)
+ return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+  memset(map, 0, image->hiz_surface.isl.size);
+
+  anv_gem_munmap(map, image->hiz_surface.isl.size);
+   }
+
return VK_SUCCESS;
 }
 
-- 
2.10.0
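
The clearing step in the patch above can be mimicked without any GEM calls. The
sketch below works on an ordinary byte buffer standing in for the mapped BO: it
refuses ranges that are not 4K-aligned (the patch asserts this because the mmap
would fail otherwise) and zeroes the HiZ range. The function name, the bounds
check, and the use of a plain buffer instead of anv_gem_mmap() are all
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096u

/* Zero the HiZ range [offset, offset + size) inside a buffer that stands
 * in for the mapped BO. Returns false if the range is not page-aligned
 * or does not fit in the buffer. */
static bool
mock_clear_hiz(uint8_t *bo, size_t bo_size, size_t offset, size_t size)
{
   if (offset % MOCK_PAGE_SIZE != 0 || size % MOCK_PAGE_SIZE != 0)
      return false;

   if (offset > bo_size || size > bo_size - offset)
      return false;

   /* 0 is a defined HiZ value, whereas garbage may hang the GPU */
   memset(bo + offset, 0, size);
   return true;
}
```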


[Mesa-dev] [PATCH V2 06/11] anv: Move BindImageMemory to anv_image.c

2016-09-26 Thread Nanley Chery

From: Jason Ekstrand 

Signed-off-by: Nanley Chery 
Reviewed-by: Chad Versace 
Reviewed-by: Jason Ekstrand 
---
 src/intel/vulkan/anv_device.c | 20 
 src/intel/vulkan/anv_image.c  | 20 
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index f786ebe..bc623e7 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1421,26 +1421,6 @@ VkResult anv_BindBufferMemory(
return VK_SUCCESS;
 }
 
-VkResult anv_BindImageMemory(
-VkDevicedevice,
-VkImage _image,
-VkDeviceMemory  _memory,
-VkDeviceSizememoryOffset)
-{
-   ANV_FROM_HANDLE(anv_device_memory, mem, _memory);
-   ANV_FROM_HANDLE(anv_image, image, _image);
-
-   if (mem) {
-  image->bo = &mem->bo;
-  image->offset = memoryOffset;
-   } else {
-  image->bo = NULL;
-  image->offset = 0;
-   }
-
-   return VK_SUCCESS;
-}
-
 VkResult anv_QueueBindSparse(
 VkQueue queue,
 uint32_tbindInfoCount,
diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index d408819..57b2014 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -318,6 +318,26 @@ anv_DestroyImage(VkDevice _device, VkImage _image,
anv_free2(&device->alloc, pAllocator, anv_image_from_handle(_image));
 }
 
+VkResult anv_BindImageMemory(
+VkDevicedevice,
+VkImage _image,
+VkDeviceMemory  _memory,
+VkDeviceSizememoryOffset)
+{
+   ANV_FROM_HANDLE(anv_device_memory, mem, _memory);
+   ANV_FROM_HANDLE(anv_image, image, _image);
+
+   if (mem) {
+  image->bo = &mem->bo;
+  image->offset = memoryOffset;
+   } else {
+  image->bo = NULL;
+  image->offset = 0;
+   }
+
+   return VK_SUCCESS;
+}
+
 static void
 anv_surface_get_subresource_layout(struct anv_image *image,
struct anv_surface *surface,
-- 
2.10.0


[Mesa-dev] [PATCH V2 10/11] genX/cmd_buffer: Enable fast depth clears

2016-09-26 Thread Nanley Chery

From: Nanley Chery 

Provides an FPS increase of ~30% on the Sascha triangle and multisampling
demos.

Clears that happen within a render pass via vkCmdClearAttachments are safe
even if the clear color changes. This is because the meta implementation does
not use LOAD_OP_CLEAR, which avoids any conflicts with 3DSTATE_CLEAR_PARAMS.

Signed-off-by: Nanley Chery 
Reviewed-by: Jason Ekstrand 

---

v2. Update granularity comment for accuracy

 src/intel/vulkan/anv_pass.c| 13 +
 src/intel/vulkan/gen8_cmd_buffer.c |  6 ++
 src/intel/vulkan/genX_cmd_buffer.c |  4 +---
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
index 69c3c7e..595c2ea 100644
--- a/src/intel/vulkan/anv_pass.c
+++ b/src/intel/vulkan/anv_pass.c
@@ -155,5 +155,18 @@ void anv_GetRenderAreaGranularity(
 VkRenderPassrenderPass,
 VkExtent2D* pGranularity)
 {
+   ANV_FROM_HANDLE(anv_render_pass, pass, renderPass);
+
+   /* This granularity satisfies HiZ fast clear alignment requirements
+* for all sample counts.
+*/
+   for (unsigned i = 0; i < pass->subpass_count; ++i) {
+  if (pass->subpasses[i].depth_stencil_attachment !=
+  VK_ATTACHMENT_UNUSED) {
+ *pGranularity = (VkExtent2D) { .width = 8, .height = 4 };
+ return;
+  }
+   }
+
*pGranularity = (VkExtent2D) { 1, 1 };
 }
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c b/src/intel/vulkan/gen8_cmd_buffer.c
index 14e6a7b..96e972c 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -479,6 +479,12 @@ genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer *cmd_buffer,
  cmd_state->render_area.extent.height % px_dim.h)
 return;
   }
+
+  anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp) {
+ cp.DepthClearValueValid = true;
+ cp.DepthClearValue =
+cmd_buffer->state.attachments[ds].clear_value.depthStencil.depth;
+  }
   break;
case BLORP_HIZ_OP_DEPTH_RESOLVE:
   if (cmd_buffer->state.pass->attachments[ds].store_op !=
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 2cb1539..290fefc 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1320,9 +1320,6 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
} else {
   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_STENCIL_BUFFER), sb);
}
-
-   /* Clear the clear params. */
-   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp);
 }
 
 /**
@@ -1357,6 +1354,7 @@ void genX(CmdBeginRenderPass)(
 
genX(cmd_buffer_set_subpass)(cmd_buffer, pass->subpasses);
genX(cmd_buffer_do_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
+   genX(cmd_buffer_do_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_CLEAR);
anv_cmd_buffer_clear_subpass(cmd_buffer);
 }
 
-- 
2.10.0
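
The granularity rule in the anv_pass.c hunk above can be sketched on its own, together with the kind of alignment test that genX(cmd_buffer_do_hz_op) applies before a fast clear. This is only an illustration: the struct and function names below are hypothetical stand-ins rather than the driver's types, and the alignment test is a simplification modelled on the single modulo check visible in the patch context, not the driver's full set of conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VK_ATTACHMENT_UNUSED (hypothetical local name). */
#define ATTACHMENT_UNUSED UINT32_MAX

/* Hypothetical minimal types; the real driver uses VkExtent2D / VkRect2D. */
struct extent2d { uint32_t width, height; };
struct rect2d { uint32_t x, y; struct extent2d extent; };

/* Report 8x4 as soon as any subpass has a depth/stencil attachment,
 * otherwise 1x1, mirroring the loop added to anv_GetRenderAreaGranularity.
 */
static struct extent2d
render_area_granularity(const uint32_t *depth_attachments,
                        unsigned subpass_count)
{
   for (unsigned i = 0; i < subpass_count; ++i) {
      if (depth_attachments[i] != ATTACHMENT_UNUSED)
         return (struct extent2d) { .width = 8, .height = 4 };
   }

   return (struct extent2d) { .width = 1, .height = 1 };
}

/* Simplified assumption: a render area qualifies only if its offset and
 * extent are all multiples of the granularity.
 */
static bool
render_area_is_aligned(struct rect2d area, struct extent2d g)
{
   assert(g.width > 0 && g.height > 0);

   return area.x % g.width == 0 &&
          area.y % g.height == 0 &&
          area.extent.width % g.width == 0 &&
          area.extent.height % g.height == 0;
}
```

An application that rounds its render area to the reported granularity would pass this test, which is the case the fast clear path is meant to cover.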


Re: [Mesa-dev] [PATCH] st/va: Fix vaSyncSurface with no outstanding operation

2016-09-26 Thread Mark Thompson

On 27/09/16 00:49, Andy Furniss wrote:
> Mark Thompson wrote:
>> ---
>> A simple fix to the problem described here: 
>> .
>>
>> With this applied, the driver no longer hangs/crashes when vaSyncSurface() 
>> is called in places other than for the first time after an encode operation 
>> (including a second call on the same surface).
> 
> Once I could get ffmpeg (patched) or avconv to roughly work (before the dual 
> instance commit), but I can't get either to work now = produces unreadable 
> file.
> 
> Testing with git avconv I am trying -
> 
> ./avconv -vaapi_device :0 -f rawvideo -framerate 50 -s 2560x1440 -pix_fmt 
> nv12 -i /mnt/ramdisk/trees-1440p50.nv12 -vframes 5 -vf 'hwupload' -c:v 
> h264_vaapi -profile:v 66 -b:v 40M  -bf 0 -g 30  -f h264 -y 
> /mnt/ramdisk/out.264
> 
> but debugging printfs show refs = 2 and bframes enabled (I also notice with 
> your baseline patch that -profile:v 66 fails).
> 
> Do you have an example that works for you with avconv + this patch?

Yes: this patch 
 is 
also required to match the vaSyncSurface() change.  The rest of that series 
to libav and the one to mesa for config setup make it all a bit more sensible 
(doesn't submit a load of packed headers which are ignored), but it does mostly 
work without.

With all of those, the commands:

./avconv -y -vaapi_device /dev/dri/renderD129 -i in.mp4 -an -vf 
'format=nv12,hwupload' -c:v h264_vaapi -bf 0 out.mp4

./avconv -y -vaapi_device /dev/dri/renderD129 -hwaccel vaapi 
-hwaccel_output_format vaapi -i in.mp4 -an -c:v h264_vaapi -bf 0 out.mp4

./avconv -y -vaapi_device /dev/dri/renderD129 -hwaccel vaapi 
-hwaccel_output_format vaapi -i in.mp4 -an -vf 'scale_vaapi=w=1280:h=720' -c:v 
h264_vaapi -bf 0 out.mp4

work sensibly for me (also with -b for CBR, -qp for CQP, -g for GOP size); I 
imagine raw video as in your example would also be fine.  On profile, 
constrained baseline on the command line is 578 (== 66 | 0x200, for 
constraint_set1_flag).
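
As a quick check of that arithmetic, here is a small sketch. The constant and function names are made up for illustration and are not the libav identifiers; only the values 66 and 0x200 come from the paragraph above.

```c
#include <assert.h>

/* Hypothetical local names; only the numeric values come from the mail. */
enum {
   H264_BASELINE_IDC     = 66,     /* profile_idc for Baseline */
   H264_CONSTRAINED_FLAG = 0x200,  /* extra bit standing for constraint_set1_flag */
};

/* Value to pass as -profile:v for constrained baseline. */
static int
h264_constrained_baseline(void)
{
   return H264_BASELINE_IDC | H264_CONSTRAINED_FLAG;
}

/* Strip the extra bit to recover the plain profile_idc. */
static int
h264_profile_idc(int profile)
{
   return profile & ~H264_CONSTRAINED_FLAG;
}

/* Non-zero if the constrained bit is set in the profile value. */
static int
h264_is_constrained(int profile)
{
   return (profile & H264_CONSTRAINED_FLAG) != 0;
}
```

So -profile:v 578 asks for profile_idc 66 with the constrained bit set, while a bare 66 means plain baseline.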

Thanks,

- Mark


Re: [Mesa-dev] [llvm] r282237 - [InstCombine] Fix for PR29124: reduce insertelements to shufflevector

2016-09-26 Thread Michel Dänzer

On 26/09/16 10:28 PM, Alexey Bataev wrote:
> Michael, fixed this bug in r282401

I can confirm it's fixed, thanks!


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

[Mesa-dev] [PATCH] glsl: remove remaining tabs in glsl_parser_extras.h

2016-09-26 Thread Timothy Arceri

---
 src/compiler/glsl/glsl_parser_extras.h | 60 +-
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h
index f4050e3..b9c9a1a 100644
--- a/src/compiler/glsl/glsl_parser_extras.h
+++ b/src/compiler/glsl/glsl_parser_extras.h
@@ -69,12 +69,12 @@ typedef struct YYLTYPE {
 # define YYLTYPE_IS_TRIVIAL 1
 
 extern void _mesa_glsl_error(YYLTYPE *locp, _mesa_glsl_parse_state *state,
-const char *fmt, ...);
+ const char *fmt, ...);
 
 
 struct _mesa_glsl_parse_state {
_mesa_glsl_parse_state(struct gl_context *_ctx, gl_shader_stage stage,
- void *mem_ctx);
+  void *mem_ctx);
 
DECLARE_RALLOC_CXX_OPERATORS(_mesa_glsl_parse_state);
 
@@ -816,23 +816,23 @@ struct _mesa_glsl_parse_state {
unsigned clip_dist_size, cull_dist_size;
 };
 
-# define YYLLOC_DEFAULT(Current, Rhs, N)   \
-do {   \
-   if (N)  \
-   {   \
-  (Current).first_line   = YYRHSLOC(Rhs, 1).first_line;\
-  (Current).first_column = YYRHSLOC(Rhs, 1).first_column;  \
-  (Current).last_line= YYRHSLOC(Rhs, N).last_line; \
-  (Current).last_column  = YYRHSLOC(Rhs, N).last_column;   \
-   }   \
-   else\
-   {   \
-  (Current).first_line   = (Current).last_line =   \
-YYRHSLOC(Rhs, 0).last_line;\
-  (Current).first_column = (Current).last_column = \
-YYRHSLOC(Rhs, 0).last_column;  \
-   }   \
-   (Current).source = 0;   \
+# define YYLLOC_DEFAULT(Current, Rhs, N)\
+do {\
+   if (N)   \
+   {\
+  (Current).first_line   = YYRHSLOC(Rhs, 1).first_line; \
+  (Current).first_column = YYRHSLOC(Rhs, 1).first_column;   \
+  (Current).last_line= YYRHSLOC(Rhs, N).last_line;  \
+  (Current).last_column  = YYRHSLOC(Rhs, N).last_column;\
+   }\
+   else \
+   {\
+  (Current).first_line   = (Current).last_line =\
+ YYRHSLOC(Rhs, 0).last_line;\
+  (Current).first_column = (Current).last_column =  \
+ YYRHSLOC(Rhs, 0).last_column;  \
+   }\
+   (Current).source = 0;\
 } while (0)
 
 /**
@@ -841,11 +841,11 @@ do { \
  * \sa _mesa_glsl_error
  */
 extern void _mesa_glsl_warning(const YYLTYPE *locp,
-  _mesa_glsl_parse_state *state,
-  const char *fmt, ...);
+   _mesa_glsl_parse_state *state,
+   const char *fmt, ...);
 
 extern void _mesa_glsl_lexer_ctor(struct _mesa_glsl_parse_state *state,
- const char *string);
+  const char *string);
 
 extern void _mesa_glsl_lexer_dtor(struct _mesa_glsl_parse_state *state);
 
@@ -863,9 +863,9 @@ extern int _mesa_glsl_parse(struct _mesa_glsl_parse_state *);
  * \c false is returned.
  */
 extern bool _mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp,
-const char *behavior,
-YYLTYPE *behavior_locp,
-_mesa_glsl_parse_state *state);
+ const char *behavior,
+ YYLTYPE *behavior_locp,
+ _mesa_glsl_parse_state *state);
 
 #endif /* __cplusplus */
 
@@ -880,11 +880,11 @@ extern "C" {
 struct glcpp_parser;
 
 typedef void (*glcpp_extension_iterator)(
-   struct _mesa_glsl_parse_state *state,
-   void (*add_builtin_define)(struct glcpp_parser *, const char *, int),
-   struct glcpp_parser *data,
-   unsigned version,
-   bool es);
+  struct _mesa_glsl_parse_state *state,
+  void (*add_builtin_def

[Mesa-dev] [Bug 96953] dri2_wl_swrast crashes on 64 bit, but not on 32 bit

2016-09-26 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=96953

--- Comment #3 from n3rdopolis  ---
I tried a recent recompile; this still appears to be happening, on 64 bit only.
32 bit is fine.

