On 03/31/2016 06:27 PM, Ilia Mirkin wrote:
On Thu, Mar 31, 2016 at 12:08 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:
Hi,

This series adds support for ARB_compute_shader on GK104 and GM107+, but not on
GK110, where one test (related to texelFetch) fails miserably for really weird
reasons. Anyway, this is not going to break anything because NVF0_COMPUTE is
still required for using compute on GK110. I will have a deeper look at this
failure later.

Almost all dEQP compute tests pass. As usual, the failures are listed below.
As for piglit, only two tests fail, and both are related to image support.

I don't update GL3.txt in this series because compute shaders are not really
useful without image support.

ARB_shader_image_load_store and ARB_shader_image_size are a work in progress
and should be ready in a couple of weeks.

Please review,
Thanks!

Samuel Pitoiset (13):
   nvc0: bind driver cb for compute on c7[] for Kepler
   nvc0: bind shader buffers for compute on Kepler
   nvc0: bind user uniforms for compute on Kepler
   nvc0: reserve an area for ubos info in the driver constbuf
   nvc0: store ubo info to the driver constbuf on Kepler
   nvc0: reduce likelihood of collision for real buffers on Kepler
   nvc0: add indirect compute support on Kepler
   nvc0/ir: add support for compute UBOs on Kepler
   nvc0/ir: fix wrong pred emission for ld lock on GK104
   nvc0/ir: add atomics support on shared memory for Kepler
   nvc0/ir: do not lower shared+atomics on GM107+
   nvc0: bump the maximum number of UBOs for compute on Kepler
   nvc0: enable compute shaders on GK104 and GM107+

  .../drivers/nouveau/codegen/nv50_ir_driver.h       |   1 +
  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |   5 +-
  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      | 181 +++++++++++++-
  .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   4 +
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c    |   4 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h    |  15 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c    |  16 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   6 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h     |   1 -
  .../drivers/nouveau/nvc0/nvc0_state_validate.c     |   8 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c        |   2 +-
  src/gallium/drivers/nouveau/nvc0/nve4_compute.c    | 260 ++++++++++++++++-----
  src/gallium/drivers/nouveau/nvc0/nve4_compute.h    |  44 +---
  13 files changed, 421 insertions(+), 126 deletions(-)

--
2.7.4

** dEQP **

deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/scalar:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec2:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec3:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec4:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec2:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec4:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/scalar:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec2:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec3:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec4:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/scalar:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec2:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec3:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec4:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/scalar:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec2:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec3:
 fail
deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec4:
 fail

These are all expected(ish). IIRC I looked into atan2 and it was
returning numbers outside of the expected range, so we could use a
clamp on that maybe. I suspect the issue with tanh is similar. These
are all done in the GLSL IR anyways, and, as it happens, also fail on
i965. So I wouldn't worry about those.
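
For what it's worth, the clamp idea above could look roughly like the sketch
below. It is only an illustration: `clamp_atan2_result` and `kPi` are made-up
names, the real fix would live in the GLSL IR lowering rather than in
standalone C++, and the [-pi, pi] bounds assume the usual atan2 range.

```cpp
#include <algorithm>

// Hypothetical constant and helper, for illustration only (not Mesa code).
constexpr float kPi = 3.14159265358979323846f;

// Clamp an approximated atan2 result into [-pi, pi], the range assumed here.
// Values already inside the range are returned unchanged.
inline float clamp_atan2_result(float approx)
{
    return std::min(std::max(approx, -kPi), kPi);
}
```

An approximation that overshoots slightly, say 3.2, would come back as `kPi`,
while an in-range value such as 1.0 passes through untouched.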

Yeah, these failures are unrelated to my work.


deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2darrayshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2dshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/samplercubeshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2darrayshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/samplercubeshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2darrayshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2dshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/samplercubeshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2darrayshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2dshadow:
 fail
deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/samplercubeshadow:
 fail

These, OTOH, are not. This leads me to believe that I've missed out on
some bit of subtlety wrt ordering or placement of the shadow argument
on Kepler. Please trace the simplest one of these (I'm thinking
gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow)
on the blob, and see if it orders some arguments differently.

Sure, but my plan is to fix them later. :-)

Check your mailbox for the trace.


Note that the current state of the art wrt tex argument ordering
knowledge is at:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n636

Note that I may have made wild assumptions about the similarity of
SM30 and SM35 argument ordering that may have been unwarranted. Were
you seeing this on SM30 or SM35? Since they reordered *something* on
every other ISA change, it seems a little odd that they would have
left things alone for SM30 -> SM35. Perhaps the hw guys had a moment of
weakness :)

Not included in that description is the splitting up of the (up to) 8
potential arguments between 2 (implicitly) quad register arguments.
This logic is available here, in code form only:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2126

For nve0 (and, implicitly, nvf0), it just cuts them up at the 4th
argument. However, the cutting for nvc0 and gm107 is a little more
sophisticated than that, and it's entirely possible some bit of
subtlety was missed there. [And also, entirely possible that some
wrong way works sometimes even though it's wrong.]
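
To make the nve0 rule concrete, here is a minimal sketch of "cut at the 4th
argument". It is not the code from nv50_ir_ra.cpp: `split_tex_args_at_fourth`
is a hypothetical helper, and the arguments are modelled as a plain vector
rather than as IR sources. It only shows how up to 8 arguments would be
divided between the two quad register groups under that rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; the real logic lives in nv50_ir_ra.cpp.
// Splits a tex argument list into two groups, cutting after the 4th
// argument. The caller is expected to pass at most 8 arguments.
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
split_tex_args_at_fourth(const std::vector<T>& args)
{
    // The first group takes up to 4 arguments, the second takes the rest.
    const std::size_t cut = std::min<std::size_t>(args.size(), 4);
    std::vector<T> first(args.begin(), args.begin() + cut);
    std::vector<T> second(args.begin() + cut, args.end());
    return {first, second};
}
```

With six arguments, the first group holds four and the second holds the
remaining two; with three arguments, the second group is empty. The nvc0 and
gm107 rules are more involved and are deliberately not modelled here.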

I'll have a look at the MMT trace to see if something is wrong.
Thanks for your explanation.


   -ilia
