On 09/12/2016 05:19 PM, Nicolai Hähnle wrote:
On 11.09.2016 20:45, Samuel Pitoiset wrote:
Signed-off-by: Samuel Pitoiset
---
src/gallium/docs/source/screen.rst | 4
src/gallium/drivers/ilo/ilo_screen.c | 2 ++
src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2
On 09/12/2016 05:26 PM, Nicolai Hähnle wrote:
On 11.09.2016 20:45, Samuel Pitoiset wrote:
This extension is only exposed if the underlying driver supports
ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK
is set.
v2: - expose the ext based on that new cap
Signed-off-by
When multiple GPUs are plugged into the same box, we might want to
use /dev/dri/renderD129 without updating or recompiling the code. This
doesn't change the existing behaviour.
---
run.c | 23 +++
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/run.c b/run.c
index c7f0b
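A minimal sketch of a run-time device override, assuming a hypothetical `-d <path>` option and a default of `/dev/dri/renderD128`. The actual option name used by the run.c patch is not visible in this excerpt. The point is that when no override is given, the program falls back to the previous fixed node, so existing invocations behave as before.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed default render node; the real run.c default is not shown here. */
#define DEFAULT_RENDER_NODE "/dev/dri/renderD128"

/* Scan argv for a hypothetical "-d <path>" option. Without it, keep the
 * old hardcoded node so the existing behaviour is unchanged. */
static const char *
select_render_node(int argc, char **argv)
{
   assert(argc == 0 || argv != NULL);
   for (int i = 1; i + 1 < argc; i++) {
      if (strcmp(argv[i], "-d") == 0)
         return argv[i + 1];
   }
   return DEFAULT_RENDER_NODE;
}
```

The returned path would then be passed to `open(path, O_RDWR)` in place of the hardcoded string.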
On 10/18/2016 05:53 AM, Ilia Mirkin wrote:
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted by one argument position.
Signed-off-by: Ilia Mirkin
Cc: mesa-sta...@lists.freedesktop.org
---
src
->tex.rIndirectSrc >= 0) {
emitInsn (0xdeb8);
- emitField(0x35, 2, lodm);
+ emitField(0x25, 2, lodm);
The length should be 3, but as we don't use lba/lla, it's fine.
Reviewed-by: Samuel Pitoiset
emitField(0x24, 1, insn->tex.useOffsets == 1);
} els
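To make the `0x35` → `0x25` fix and the "length should be 3" remark concrete, here is a small sketch of a bitfield emitter in the spirit of `emitField(pos, len, val)`. The name and signature are illustrative, not the actual nv50_ir code. It inserts the low `len` bits of `val` at bit `pos` of a 64-bit instruction word. `0x25` is bit 37 and `0x35` is bit 53, so the typo wrote LOD mode 16 bits too high. A 3-bit field at `0x25` would cover bits 37 to 39.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bitfield emitter: OR the low `len` bits of `val` into
 * `*code`, starting at bit `pos`. */
static void
emit_field(uint64_t *code, int pos, int len, uint64_t val)
{
   assert(len > 0 && len < 64 && pos >= 0 && pos + len <= 64);
   uint64_t mask = (1ULL << len) - 1;
   assert((val & ~mask) == 0); /* the value must fit in the field */
   *code |= (val & mask) << pos;
}
```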
On 10/18/2016 10:50 AM, Samuel Pitoiset wrote:
On 10/18/2016 05:53 AM, Ilia Mirkin wrote:
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted by one argument position.
Signed-off-by: Ilia
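The ordering rule from this commit message can be sketched as follows. The enum and the function are hypothetical and much simpler than the real codegen representation. Coordinates come first, then the indirect handle immediately after them, then any optional arguments such as bias, depth compare or offsets. If the handle were appended at the end instead, every optional argument would sit one slot earlier than the hardware expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument kinds for a texture instruction. */
enum tex_arg { ARG_COORD, ARG_INDIRECT_HANDLE, ARG_BIAS, ARG_DEPTH_CMP, ARG_OFFSET };

/* Build the argument list in hardware order: coordinates, then the
 * indirect handle (if any), then optional extras. Returns the count. */
static size_t
build_tex_args(enum tex_arg *out, size_t ncoords, int indirect,
               const enum tex_arg *extras, size_t nextras)
{
   size_t n = 0;
   assert(out != NULL);
   for (size_t i = 0; i < ncoords; i++)
      out[n++] = ARG_COORD;
   if (indirect)
      out[n++] = ARG_INDIRECT_HANDLE; /* must directly follow the coords */
   for (size_t i = 0; i < nextras; i++)
      out[n++] = extras[i];
   return n;
}
```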
On 10/18/2016 12:33 PM, Samuel Pitoiset wrote:
On 10/18/2016 10:50 AM, Samuel Pitoiset wrote:
On 10/18/2016 05:53 AM, Ilia Mirkin wrote:
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted
Found that informational message while replaying a trace from
Metro 2033 Redux. Mark that property as unused for now.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/drivers/nouveau
Ah, I probably forgot to remove it in the later revisions of my
ARB_compute_variable_group_size series.
Thanks.
Reviewed-by: Samuel Pitoiset
On 10/19/2016 01:51 AM, Timothy Arceri wrote:
Cc: Samuel Pitoiset
Cc: Kenneth Graunke
---
src/mesa/main/mtypes.h| 5 -
src/mesa/main
radeonsi does the same check, so this seems correct.
How did you catch this? Does this fix a CTS test or something else?
Reviewed-by: Samuel Pitoiset
On 10/19/2016 06:08 AM, Ilia Mirkin wrote:
The state tracker tries to attach the info to the wrong shader. This is
easy enough to protect against
I'm aware of that CTS failure, and according to the GLSL shader this makes
sense.
I would like to run a quick piglit pass before you push it, but I'm confident
in that change. :-)
Reviewed-by: Samuel Pitoiset
On 10/19/2016 07:22 AM, Ilia Mirkin wrote:
With ARB_gpu_shader5, texture o
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index dbd0f7d..0c143e5 100644
--- a
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
The predicate is always CC_NOT_P as defined in
processSurfaceCoordsNVE4(), so we only want to emit OR.
Signed-off-by: Samuel Pitoiset
---
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp| 20 ++--
1 file changed, 6 insertions(+), 14 deletions(-)
diff --git a/src/gallium
, Oct 19, 2016 at 5:21 PM, Samuel Pitoiset
wrote:
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b/src/ga
On 10/19/2016 11:41 PM, Ilia Mirkin wrote:
On Wed, Oct 19, 2016 at 5:33 PM, Samuel Pitoiset
wrote:
On 10/19/2016 11:29 PM, Ilia Mirkin wrote:
It avoids creating a ton of symbols unnecessarily during the lifetime
of the pass. Does it hurt anything?
I think either we use that symbol
be to use two different channels, one for 3D and one
for CP.
This fixes a bunch of regressions pinpointed by piglit.
Fixes: "nvc0: fix up image support for allowing multiple samples"
Cc: "12.0"
Signed-off-by: Samuel Pitoiset
---
This will require different piglit runs on bot
On 10/20/2016 12:55 AM, Ilia Mirkin wrote:
On Wed, Oct 19, 2016 at 6:46 PM, Samuel Pitoiset
wrote:
Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.
I still don't hav
On 10/20/2016 12:46 AM, Samuel Pitoiset wrote:
Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.
I still don't have the faintest clue what is going on, but
this seems to
Reviewed-by: Samuel Pitoiset
On 10/20/2016 04:44 AM, Ilia Mirkin wrote:
This reverts commits 1af0641db345209c076e9b1ba4dca7524541671a and
a6ad49cbbd599aec054d0a3163fff5ad724f2b18.
st/mesa adjusts the rasterizer state for us now.
Signed-off-by: Ilia Mirkin
---
v1 -> v2: also revert the n
This makes shader-db report results for compute shaders.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 4
1 file changed, 4 insertions(+)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index
On Thu, Oct 20, 2016 at 12:08 PM, Samuel Pitoiset
wrote:
This makes shader-db report results for compute shaders.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 4
1 file changed, 4 insertions(+)
diff --git a/src/gallium/drivers/nouveau/nvc0
Reviewed-by: Samuel Pitoiset
On 10/21/2016 08:30 AM, Ilia Mirkin wrote:
radeonsi also does the same thing. I suspect that this is likely to be a
no-op in reality, but it brings nouveau code closer to what the blob
produces. Plus it makes sense to not try to do auto-derivatives on this.
Signed
Reviewed-by: Samuel Pitoiset
On 10/21/2016 08:30 AM, Ilia Mirkin wrote:
Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b
Reviewed-by: Samuel Pitoiset
On 10/21/2016 08:30 AM, Ilia Mirkin wrote:
nvdisasm does not print a .S even though the bit is set.
Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/drivers
On 10/21/2016 11:18 AM, Samuel Pitoiset wrote:
Reviewed-by: Samuel Pitoiset
On 10/21/2016 08:30 AM, Ilia Mirkin wrote:
nvdisasm does not print a .S even though the bit is set.
Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 1 +
1 file changed, 1
This patch breaks a bunch of piglit tests; here is a short list:
bin/arb_texture_barrier-blending-in-shader 512 42 1 128 7 -auto -fbo
bin/arb_texture_buffer_object-formats vs core -auto -fbo
bin/texelFetch 140 vs sampler2DRect -auto -fbo
bin/mesa_pack_invert-readpixels -auto -fbo
...
Around 15
This affects GF100:GK110 chipsets, but not GM107+ where the
logic is a bit different. The emitters tried to emit sub
instead of subr when src0 has a NEG modifier.
This fixes the following piglit tests: glsl-fs-loop-nested
and glsl-vs-loop-nested.
Signed-off-by: Samuel Pitoiset
Cc: "
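A sketch of why a NEG on src0 calls for subr rather than sub, using hypothetical names for the three add forms. With a = src0 and b = src1, `-a + b` equals `b - a`, which is the reversed subtract, while a NEG on src1 gives `a - b`, the plain subtract. Negating both sources is left out of this sketch.

```c
#include <assert.h>

/* Hypothetical names for the three forms of an integer add. */
enum add_form { FORM_ADD, FORM_SUB, FORM_SUBR };

static enum add_form
select_add_form(int neg_src0, int neg_src1)
{
   /* Negating both sources is out of scope for this sketch. */
   assert(!(neg_src0 && neg_src1));
   if (neg_src0)
      return FORM_SUBR; /* -a + b == b - a: reversed subtract */
   if (neg_src1)
      return FORM_SUB;  /* a + -b == a - b: plain subtract */
   return FORM_ADD;
}

/* Reference semantics of each form, with a = src0 and b = src1. */
static int
eval_add_form(enum add_form f, int a, int b)
{
   switch (f) {
   case FORM_SUB:  return a - b;
   case FORM_SUBR: return b - a;
   default:        return a + b;
   }
}
```

Emitting `FORM_SUB` for a negated src0 would compute `a - b` instead of `b - a`, which is the class of bug described above.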
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 1 -
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 1 -
2 files changed, 2 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
b/src/gallium/drivers
Shared memory is local to the CTA, thus we should only wait for
prior memory writes which are visible to other threads in
the same CTA, and not at the global level. This should speed up
compute shaders which use shared memory.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen
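The scope choice can be sketched as: if the only writes being ordered are to shared memory, a CTA-level barrier is enough, otherwise a global-level one is required. The flag and enum names below are hypothetical and only echo the MEMBAR.CTA / MEMBAR.GL distinction; the real codegen logic is not shown in this excerpt.

```c
#include <assert.h>

/* Hypothetical flags for the memory spaces written before the barrier. */
#define SPACE_SHARED (1u << 0)
#define SPACE_GLOBAL (1u << 1)

/* Barrier scopes, narrowest first. */
enum membar_scope { SCOPE_CTA, SCOPE_GL };

static enum membar_scope
pick_membar_scope(unsigned spaces)
{
   assert(spaces != 0);
   /* Shared memory is only visible inside the CTA, so a CTA-level
    * barrier suffices when nothing else was written. */
   if ((spaces & ~SPACE_SHARED) == 0)
      return SCOPE_CTA;
   return SCOPE_GL;
}
```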
On 10/24/2016 04:35 PM, Ilia Mirkin wrote:
On Mon, Oct 24, 2016 at 10:29 AM, Samuel Pitoiset
wrote:
Shared memory is local to the CTA, thus we should only wait for
prior memory writes which are visible to other threads in
the same CTA, and not at the global level. This should speed up
compute shaders
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 0c143e5..78c0757 100644
--- a
Shared memory is local to the CTA, thus we should only wait for
prior memory writes which are visible to other threads in
the same CTA, and not at the global level. This should speed up
compute shaders which use shared memory.
v2: - do not use ==
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers
On 10/08/2016 06:58 PM, Samuel Pitoiset wrote:
This breaks a bunch of things, like:
spec/glsl-4.30/execution/built-in-functions/cs-all-bvec2-using-if: fail
spec/glsl-4.30/execution/built-in-functions/cs-all-bvec3-using-if: fail
spec/glsl-4.30/execution/built-in-functions/cs-all-bvec4-using-if
Signed-off-by: Samuel Pitoiset
Cc: "12.0 13.0"
---
src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c
index cbc270d..e57391e 100644
e and 3d instead of writing compiler optimizations.
I didn't see any regressions with a full piglit run on GK110, but I will
launch a new run on Fermi.
Please review,
Thanks!
Samuel Pitoiset (7):
nvc0: reduce the number of PUSH_SPACE in draw path
nvc0: only update primitive restart for ind
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 14 ++
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
index 138e24d..11fd7eb 100644
--- a/src
This might help CPU-bound applications but should not have
any real effect on GPU-bound ones.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 8 +++-
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0
This is especially useful when switching from compute to 3D.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 11 +++
src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 14 ++
src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 11
It is unnecessary to update it on every draw call, especially for
non-indexed draws. This is similar to what nv50 already does.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers
Loosely based on radeonsi, thanks Nicolai!
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nv50/nv50_miptree.c | 13
src/gallium/drivers/nouveau/nv50/nv50_resource.h | 3 ++-
src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c | 6 --
src/gallium/drivers/nouveau
It's not particularly useful to store commands which are
going to be sent a few lines later.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 17 -
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/gallium/drivers/nouveau
MEM_BARRIER seems to be similar to FLUSH, thus bit 0 is for
flushing code while bit 12 is for constant buffers.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 +-
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 2 +-
2 files changed, 2 insertions(+), 2
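Following the commit message's reading of the MEM_BARRIER bits (bit 0 flushes code, bit 12 flushes constant buffers), the combined mask is `0x1001`. The macro and function names below are hypothetical. For comparison, the blob value `0x1011` mentioned in the review additionally sets bit 4, whose meaning is not given in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit meanings as described in the commit message; names are hypothetical. */
#define MEM_BARRIER_CODE     (1u << 0)
#define MEM_BARRIER_CONSTANT (1u << 12)

static uint32_t
mem_barrier_mask(int flush_code, int flush_cb)
{
   uint32_t mask = 0;
   if (flush_code)
      mask |= MEM_BARRIER_CODE;
   if (flush_cb)
      mask |= MEM_BARRIER_CONSTANT;
   return mask;
}
```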
On 10/25/2016 09:49 PM, Ilia Mirkin wrote:
What if instance_count = 1M? (It can happen.)
We allocate a giant space in the pushbuf in one shot. Well, anyway, this
is not the optimization of the year, so I can drop it. :-)
On Tue, Oct 25, 2016 at 3:41 PM, Samuel Pitoiset
wrote:
This
%
sure it's correct (and blob doesn't always use 0x1011).
On Tue, Oct 25, 2016 at 3:41 PM, Samuel Pitoiset
wrote:
MEM_BARRIER seems to be similar to FLUSH, thus bit 0 is for
flushing code while bit 12 is for constant buffers.
Signed-off-by: Samuel Pitoiset
---
src/gallium/driv
On 10/25/2016 09:59 PM, Ilia Mirkin wrote:
On Tue, Oct 25, 2016 at 3:41 PM, Samuel Pitoiset
wrote:
This is especially useful when switching from compute to 3D.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 11 +++
src/gallium/drivers/nouveau
On 10/25/2016 09:57 PM, Ilia Mirkin wrote:
It's useful because it lets you avoid having to send a bunch of begins. NAK.
Well okay, we should probably add more consistency then, because it's
the only place where we do something like that.
On Tue, Oct 25, 2016 at 3:41 PM, Samue
You forgot to add emission of the CC flag, i.e.:
if (i->flagsDef >= 0)
code[1] |= 1 << 23;
Few other comments below.
On 10/09/2016 11:04 AM, Karol Herbst wrote:
v2: renamed commit
reordered modifiers
add assert(dst == src2)
Signed-off-by: Karol Herbst
---
.../drivers/nouveau/codege
On 10/09/2016 11:04 AM, Karol Herbst wrote:
v2: renamed commit
reordered modifiers
add assert(dst == src2)
Signed-off-by: Karol Herbst
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 35 --
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/sr
I'm definitely in favour of my first solution, ie.:
if (postRA)
return post_ra_dead(this);
On 10/09/2016 11:04 AM, Karol Herbst wrote:
Signed-off-by: Karol Herbst
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h| 2 +-
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20
e any
improvements with Elemental but this might help in some cases.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 12 +-
src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 7 +-
src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 159 ++--
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 14 ++
src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 10 +-
src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 4 ++--
3 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/src
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 72 ++---
1 file changed, 16 insertions(+), 56 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
index d661c00
The emitter tried to emit sub instead of subr when src0 actually
has a NEG modifier.
Signed-off-by: Samuel Pitoiset
Cc: "11.0 12.0 13.0"
---
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 +-
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 +-
2 files
nitor more queries than
supported.
This breaks amd_performance_monitor_measure, but this is expected.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nv50/nv50_query.c | 14 ++
src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 14 ++
2 files changed, 12
Reviewed-by: Samuel Pitoiset
One minor comment below.
On 10/09/2016 11:04 AM, Karol Herbst wrote:
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
Signed-off-by: Karol Herbst
---
.../drivers/nouveau/codegen
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 50 +++---
1 file changed, 7 insertions(+), 43 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 18 --
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
On 10/31/2016 02:21 PM, Ilia Mirkin wrote:
Does that work? Won't you just end up with 0 all the time?
Forgot to return double instead of uint64_t...
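Why the integer version reads 0: with unsigned integer arithmetic, a ratio below 1 truncates to 0 before the `* 100`, so a percentage metric is almost always 0. The function names are hypothetical; the real metric code in nvc0_query_hw_metric.c is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Integer version: num / den truncates to 0 whenever num < den. */
static uint64_t
ratio_percent_int(uint64_t num, uint64_t den)
{
   assert(den != 0);
   return num / den * 100;
}

/* Double version keeps the fractional part of the ratio. */
static double
ratio_percent_double(uint64_t num, uint64_t den)
{
   assert(den != 0);
   return (double)num / (double)den * 100.0;
}
```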
On Mon, Oct 31, 2016 at 10:16 AM, Samuel Pitoiset
wrote:
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau
v2: - forgot to return double instead of uint64_t
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 26 +-
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src
On 10/31/2016 02:41 PM, Ilia Mirkin wrote:
Is that worth it? Now you're getting potentially imprecise return
results for int64 counters, just to remove a few *100?
We don't have any int64 counters; we only use UINT64.
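The precision concern raised above can be checked directly. A double has a 53-bit significand, so a `uint64_t` counter survives a round trip through `double` only up to 2^53 (about 9.0e15). Above that, odd values get rounded. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v converts to double and back without loss. */
static int
survives_double_roundtrip(uint64_t v)
{
   return (uint64_t)(double)v == v;
}
```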
On Mon, Oct 31, 2016 at 10:37 AM, Samuel Pitoiset
This is a nice refactoring.
Reviewed-by: Samuel Pitoiset
On 11/02/2016 05:38 AM, Ilia Mirkin wrote:
The GM107 had a bunch of prepareEmission needlessly duplicated because
the sched block size is different. Move that knowledge into the target,
and generalize the existing code.
Signed-off-by
This seems redundant, and since the GM107 emitter already has a
bunch of emitXXX() helpers, how about adding emitTARG()? Something like:
void
CodeEmitterGM107::emitTARG()
{
int32_t pos = insn->target.bb->binPos;
if (writeIssueDelays && !(pos & 0x1f))
pos += 8;
emitField(0x14, 24, pos - (c
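The position adjustment at the start of the suggested `emitTARG()` can be isolated as below. On GM107 the code is grouped into 32-byte blocks whose first 8 bytes hold the scheduling control word, so a branch target that falls exactly on a block boundary has to skip past it. This sketch only covers that step. The rest of the offset computation is truncated in the quoted mail and is not reproduced. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Skip the 8-byte control word when a target lands on a 32-byte
 * boundary and issue delays (control codes) are being written. */
static int32_t
adjust_branch_target(int32_t pos, bool write_issue_delays)
{
   assert(pos >= 0);
   if (write_issue_delays && !(pos & 0x1f))
      pos += 8;
   return pos;
}
```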
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
index 0d7ead3..03a3ff0 100644
--- a
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
index 36534ba
!
Samuel Pitoiset (6):
nvc0: sort performance metrics alphabetically
nvc0: respect 80-chars for perf metrics descriptions
nvc0: add new warp_execution_efficiency metric on SM30+
nvc0: do not expose metric-inst_issued twice on SM35
nvc0: add missing metric-issue_slot on SM35
nvc0: add new
Event not_predicated_off_thread_inst_executed is SM35+.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 37 +-
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 1 +
2 files changed, 37 insertions(+), 1 deletion(-)
diff --git a
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 23 ++
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 1 +
2 files changed, 24 insertions(+)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
index e5034f7..0d7ead3 100644
--- a/src
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
index 6f02be3
gk106 returned the correct value.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 29 +-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
b/src/gallium/drivers
On 11/04/2016 07:21 PM, Pierre Moreau wrote:
Are reductions doable on shared atomics as well?
AFAIK, no.
Pierre
On 08:08 pm - Nov 04 2016, Samuel Pitoiset wrote:
This is similar to NVC0 and GK110 emitters where we emit
reduction operations instead of atomic operations when the
Instead, hardcode group sigsel because there are a bunch of unknown
groups, especially on SM50/SM52.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 112 ++---
1 file changed, 56 insertions(+), 56 deletions(-)
diff --git a/src/gallium
On 11/05/2016 06:06 PM, Ilia Mirkin wrote:
On Sat, Nov 5, 2016 at 12:56 PM, Samuel Pitoiset
wrote:
Instead, hardcode group sigsel because there are a bunch of unknown
groups, especially on SM50/SM52.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_sm.c
s, it's more fine-grained texture flushes. I will run piglit on a few
cards and check Elemental on Fermi/Kepler to be sure the validation is
still correct.
On Wed, Oct 26, 2016 at 4:00 PM, Samuel Pitoiset
wrote:
The first goal is to reduce code duplication between 3d and
compute and increase r
ce GL 4.5
On Tue, Oct 25, 2016 at 3:41 PM, Samuel Pitoiset
wrote:
It is unnecessary to update it on every draw call, especially for
non-indexed draws. This is similar to what nv50 already does.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 5 +++--
1 file change
On 11/07/2016 04:32 AM, Ilia Mirkin wrote:
On Wed, Oct 26, 2016 at 4:14 AM, Samuel Pitoiset
wrote:
On 10/25/2016 09:49 PM, Ilia Mirkin wrote:
What if instance_count = 1M? (It can happen.)
We allocate a giant space in the pushbuf in one shot. Well, anyway, this is
not the optimization
On 11/07/2016 04:36 AM, Ilia Mirkin wrote:
On Tue, Oct 25, 2016 at 3:41 PM, Samuel Pitoiset
wrote:
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 14 ++
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/src/gallium/drivers/nouveau
This is especially useful when switching from compute to 3D.
v2: - get rid of one loop with 'x |= (1ULL << y) - 1' instead
Signed-off-by: Samuel Pitoiset
---
Tested with Elemental on GK208, works fine.
src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 6 +++---
src/gallium
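The v2 note replaces a loop with `x |= (1ULL << y) - 1`, which sets bits 0 through y-1 in one step. The patch's own variables are not shown here, so this only illustrates the idiom, comparing it with the loop it replaces. The expression is only valid for y < 64 because shifting a 64-bit value by 64 is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Loop version: set bits 0..n-1 one at a time. */
static uint64_t
low_bits_loop(unsigned n)
{
   uint64_t x = 0;
   for (unsigned i = 0; i < n; i++)
      x |= 1ULL << i;
   return x;
}

/* Single-expression version; n must be below 64. */
static uint64_t
low_bits_mask(unsigned n)
{
   assert(n < 64);
   return (1ULL << n) - 1;
}
```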
This could still be improved by adding textures/samplers_valid[6] to
the context.
On 11/07/2016 11:13 PM, Samuel Pitoiset wrote:
This is especially useful when switching from compute to 3D.
v2: - get rid of one loop with 'x |= (1ULL << y) - 1' instead
Signed-off-by:
egressions with piglit on GF108 and GM107.
Heaven, Valley and Shadow of Mordor look fine as well.
On Wed, Oct 26, 2016 at 4:00 PM, Samuel Pitoiset
wrote:
The first goal is to reduce code duplication between 3d and
compute and increase readability of that area.
This refactoring also tries to redu
For all instructions with 3 sources (like OP_SLCT), src2 needs
to be destroyed because srcExists(2) will return true although
it's actually undefined.
Spotted with my ADD3 series.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 1 +
1 file ch
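A simplified model of the problem, using a hypothetical instruction struct in which a NULL source slot means "absent". When a 3-source instruction (such as a select) is rewritten into a 2-source one, the third slot must be cleared as well. Otherwise `src_exists(i, 2)` keeps reporting an operand that is no longer meaningful. This does not reproduce the nv50_ir `Instruction` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRCS 3

/* Hypothetical, simplified instruction; NULL means "no source". */
struct insn {
   const void *src[MAX_SRCS];
};

static bool
src_exists(const struct insn *i, int s)
{
   return s >= 0 && s < MAX_SRCS && i->src[s] != NULL;
}

/* Rewrite into a 2-source form, clearing the stale third source. */
static void
shrink_to_two_sources(struct insn *i, const void *a, const void *b)
{
   assert(i != NULL);
   i->src[0] = a;
   i->src[1] = b;
   i->src[2] = NULL;
}
```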
On 11/09/2016 03:58 PM, Ilia Mirkin wrote:
On Wed, Nov 9, 2016 at 9:20 AM, Samuel Pitoiset
wrote:
For all instructions with 3 sources (like OP_SLCT), src2 needs
to be destroyed because srcExists(2) will return true although
it's actually undefined.
Spotted with my ADD3 series.
Sounds
On 11/09/2016 04:19 PM, Ilia Mirkin wrote:
On Wed, Nov 9, 2016 at 10:10 AM, Samuel Pitoiset
wrote:
On 11/09/2016 03:58 PM, Ilia Mirkin wrote:
On Wed, Nov 9, 2016 at 9:20 AM, Samuel Pitoiset
wrote:
For all instructions with 3 sources (like OP_SLCT), src2 needs
to be destroyed because
hey work fine on the different cards. I have tested
with nouveau-git this week, nothing changed. I will report the issue.
Please review,
Thanks!
Samuel Pitoiset (1):
nvc0: support MP performance counters on Maxwell
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 26 +-
.../drivers/nouveau/nvc
This adds some performance counters/metrics for SM50/SM52.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 26 +-
.../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 740 -
.../drivers/nouveau/nvc0/nvc0_query_hw_sm.h| 13
On 11/10/2016 03:42 AM, Ilia Mirkin wrote:
Signed-off-by: Ilia Mirkin
---
v1 -> v2:
Move to handling this at SSA time. This is a lot more fragile since the
texture arguments have been reordered already, but it's still easy enough
to find the LOD argument.
.../nouveau/codegen/nv50_ir_l
On 11/14/2016 06:53 PM, Ilia Mirkin wrote:
On Mon, Nov 14, 2016 at 12:39 PM, Samuel Pitoiset
wrote:
On 11/10/2016 03:42 AM, Ilia Mirkin wrote:
Signed-off-by: Ilia Mirkin
---
v1 -> v2:
Move to handling this at SSA time. This is a lot more fragile since the
texture arguments h
This is not allowed for indirect accesses because the source
GPR might be overwritten by a subsequent instruction (WaR hazard)
if we don't emit a read dependency barrier.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp | 16
.../drivers/no
iform accesses. I should do
something similar when loading from the driver constant buffer
but it seems a bit tricky to handle for now.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff
Found by inspection, probably a typo because a surface store is
definitely not an atomic operation.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/codegen
Yes, IMUL/IMAD require dependency barriers and we should
definitely replace these instructions with XMAD, but the
different flags still need to be figured out. Note that XMAD only
supports 16-bit integers.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/lib/gm107.asm | 40
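Because XMAD operates on 16-bit halves, a 32-bit multiply has to be assembled from 16-bit partial products. The arithmetic behind such a sequence is shown below in plain C. This is not an XMAD encoding, and the flags the commit mentions are not modelled. With a = ah·2^16 + al and b = bh·2^16 + bl, the term ah·bh only affects bits at or above 32, so modulo 2^32 the product is al·bl + ((ah·bl + al·bh) << 16).

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> low 32 bits, built from 16-bit partial products. */
static uint32_t
mul32_from_16bit_parts(uint32_t a, uint32_t b)
{
   uint32_t alo = a & 0xffff, ahi = a >> 16;
   uint32_t blo = b & 0xffff, bhi = b >> 16;
   uint32_t lo = alo * blo;                /* fits in 32 bits */
   uint32_t cross = ahi * blo + alo * bhi; /* may wrap; only low 16 bits matter */
   return lo + (cross << 16);              /* ahi * bhi only affects bits >= 32 */
}
```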
) and real games
like Shadow of Mordor and they all work fine.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 11 ---
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
b/src/gallium/drivers/no
h perf improvements
with radeonsi because it already performs really well, unlike Nouveau. But
with time and patience we can do better. :-)
This series is also available from my fdo account:
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=gm107_scheduler
Please, review!
Thanks.
[1] https
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 88 +++---
1 file changed, 44 insertions(+), 44 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index
Signed-off-by: Samuel Pitoiset
Acked-by: Ilia Mirkin
---
src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 20 ++--
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
corner case somewhere. That way, the scheduler
is enabled by default but it can be deactivated by using
NV50_PROG_SCHED=0.
Thanks to Scott Gray for the reverse engineering work available from
https://github.com/NervanaSystems/maxas/wiki/Control-Codes.
Signed-off-by: Samuel Pitoiset
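A sketch of the opt-out described above, assuming that only the exact value `"0"` disables the scheduler. The real driver's parsing of `NV50_PROG_SCHED` may differ, and the function names are hypothetical. The scheduler stays enabled when the variable is unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Enabled by default; only an explicit "0" turns it off (assumption). */
static bool
sched_enabled_from(const char *value)
{
   if (value == NULL)
      return true;
   return strcmp(value, "0") != 0;
}

bool
sched_enabled(void)
{
   return sched_enabled_from(getenv("NV50_PROG_SCHED"));
}
```

Passing the variable's value as a parameter keeps the decision testable without touching the process environment.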
It's actually useless to insert those texture barriers post RA
because the current control code (ie. st 0x0) will wait for all
dependencies before issuing a new instruction.
Signed-off-by: Samuel Pitoiset
Reviewed-by: Ilia Mirkin
---
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering
I saw those warnings a few weeks ago when I updated to GCC 6 as well (but
was too lazy to fix them). Thanks.
Reviewed-by: Samuel Pitoiset
On 06/29/2016 02:38 PM, Hans de Goede wrote:
Signed-off-by: Hans de Goede
---
src/gallium/drivers/nouveau/codegen/nv50_ir_util.h | 4
1 file changed, 4
On 06/29/2016 03:33 PM, Ilia Mirkin wrote:
Since you're the 75th person to send these, I'm going to break down
and say "fine, whtvr". I really hate this "pander to stupid compilers"
things. If the warning is wrong, my natural inclination would be to
disable it (after my even more natural inclin