On 19.07.2016 at 00:43, Boyuan Zhang wrote:
VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as the entry point for the encoding
case. We save this encode entry point in the config. Previously, config_id was
used as the profile. Now the config has both a profile and an entrypoint field,
and config_id is used to get the
On 19.07.2016 at 00:43, Boyuan Zhang wrote:
Add an environment variable to disable interlace mode. At the VAAPI decoding
stage, the driver cannot distinguish between the pure decoding case and the
transcoding case. Since interlaced encoding is not supported, we have to
disable interlace for the transcoding case.
On 19.07.2016 at 00:43, Boyuan Zhang wrote:
Add necessary functions/changes for VAAPI encoding to buffer and picture. These
changes will allow the driver to handle all VAAPI encode related operations. This
patch doesn't change the VAAPI decode behaviour.
Signed-off-by: Boyuan Zhang
---
src/gal
Hi,
sorry for being late but this patch doesn't mention that all those
symbols should be exported in libGL.so too [1].
If you look at the history of static_data.py it was mentioned that
this list of functions should never grow [2].
Thanks,
Andreas
[1]
https://anonscm.debian.org/cgit/pkg-xorg/li
And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.
v2: - use moveSources()
- allow ADD3 -> ADD when srcFlags is set
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/gallium/
And ADD3(d, a, b, c) to ADD(d, b, a + c) as well.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 62 ++
1 file changed, 62 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouv
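The two ADD3 simplifications named in the snippets above can be sketched roughly as follows. This is an illustrative Python model, not the nv50_ir peephole code: the function name `simplify_add3` and the operand encoding (ints for immediates, strings for registers) are hypothetical, and it assumes the second rewrite applies when two of the sources are immediates.

```python
def simplify_add3(a, b, c):
    """Model of the ADD3 peephole rewrites (hypothetical helper, not Mesa code).

    Immediates are Python ints, registers are strings.
    Returns an (opcode, sources) tuple.
    """
    srcs = [a, b, c]

    # ADD3(d, a, 0x0, c) -> ADD(d, a, c): a zero immediate contributes nothing.
    if 0 in srcs:
        srcs.remove(0)
        return ("ADD", srcs)

    # ADD3(d, a, b, c) -> ADD(d, b, a + c): assumed to apply when two sources
    # are immediates, which are then folded with 32-bit wraparound.
    imms = [s for s in srcs if isinstance(s, int)]
    regs = [s for s in srcs if not isinstance(s, int)]
    if len(imms) == 2:
        return ("ADD", regs + [(imms[0] + imms[1]) & 0xFFFFFFFF])

    # Nothing to simplify.
    return ("ADD3", srcs)
```

For instance, `simplify_add3("a", 0, "c")` drops the zero source, while three register sources are left as an ADD3.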
This instruction is new since SM50 (Maxwell) and allows performing
an add with three sources. Unfortunately, it only supports integers.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
This is similar to what we already do for MAD/FMA.
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/d
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index ceb9718..77bac82
Signed-off-by: Samuel Pitoiset
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8
1 file changed, 8 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 77bac82..ec6418b 10064
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 ++
1 file changed, 34 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index f1ba27a.
Signed-off-by: Samuel Pitoiset
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 55 ++
1 file changed, 55 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3fc1abf..ceb
On 19.07.2016 at 00:43, Boyuan Zhang wrote:
Add entrypoint to distinguish H.264 decode and encode. For example, in patch 5/11, when calling
"VaCreateContext", "pps" and "sps" shouldn't be allocated for H.264 encoding.
So we need to use the entry_point to determine whether this is H.264 decode or H.2
Yeah, already done so.
Regards,
Christian.
On 18.07.2016 at 19:17, Nayan Deshmukh wrote:
Hi Guys,
I don't have the push access. Can anyone please push the patch.
Thanks,
Nayan.
On Thu, Jul 14, 2016 at 10:36 AM, Nayan Deshmukh
<nayan26deshm...@gmail.com> wrote:
Reviewed-by: Nay
RepCtrl=1 does not work with 64-bit operands so we need to use RepCtrl=0.
In that situation, the regioning generated for the sources seems to be
equivalent to <4,4,1>:DF, so it will only work for components XY, which
means that we have to move any other swizzle to a temporary so that we can
source
The help string wasn't updated in cbc37f7.
Fixes: cbc37f7 ("anv: install the intel_icd.json to ${datarootdir} by
default")
Signed-off-by: Andreas Boll
---
configure.ac | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/configure.ac b/configure.ac
index 54416b4..6ea1f2c 100644
-
The meaning of the swizzles and writemasks involved is different,
so replacing the source would lead to different semantics.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propag
We need to emit two 32-bit load messages to load a full dvec4. If only
1 or 2 double components are needed, dead-code elimination will remove
the second one.
We also need to shuffle the result of the 32-bit messages to form
valid 64-bit SIMD4x2 data.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |
Same requirements as for UBO loads.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 32 --
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 172bf48..5bc1fd5 100
From: Connor Abbott
v2: Also check if the instruction source target is 64-bit. (Samuel)
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propag
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this res
In this case we need to shuffle the 64-bit data before we write it
to memory, source from reg_offset + 1 to write components Z and W
and consider that each DF channel is twice as big.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 --
1 file changed, 32 insertions(
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 454ad03..6e09778 100644
--- a/src/mesa/drivers/dri/i965/brw
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/driv
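The register region restriction quoted in the snippet above can be written as a simple predicate. This is a minimal sketch of the rule as stated there; the function name is hypothetical and it is not how the i965 backend checks regions.

```python
def violates_region_restriction(exec_size, width, hstride, vstride):
    """Return True if a source region breaks the rule from the snippet:
    when execsize == width and hstride != 0, the vstride can't be 0.
    """
    return exec_size == width and hstride != 0 and vstride == 0
```

Under this rule a region with execsize 4, width 4, hstride 1 and vstride 0 is invalid, while the same region with width 2 is not caught by this particular restriction.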
From: Samuel Iglesias Gonsálvez
Sometimes we emit code that has subnr > 0 to select the second half
of a DF register (components Z or W). For example, the 64-bit
shuffling code does this. For that code to work properly we need to
make sure that we use a vstride=0 on these source registers to
From: Samuel Iglesias Gonsálvez
This means we would copy propagate partial reads or writes and that can affect
the result.
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/driver
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e204d81..b4a22d1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers
A vec4 is 16 bytes and a dvec4 is 32 bytes so for doubles we have
to multiply the reladdr by 2. The reg_offset part is in units of 16
bytes and is used to select the low/high 16-byte chunk of a full
dvec4, so we don't want to multiply that part of the address.
---
src/mesa/drivers/dri/i965/brw_vec
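The addressing rule described in the snippet above can be illustrated with a small calculation. This is only a sketch: the function name is hypothetical, and it assumes that reladdr counts whole elements (vec4 or dvec4) and that the result is expressed in 16-byte units.

```python
def scaled_offset(reladdr, reg_offset, is_double):
    """Offset in 16-byte units (hypothetical helper).

    A vec4 is 16 bytes and a dvec4 is 32 bytes, so the relative address is
    doubled for doubles. reg_offset already counts 16-byte chunks (it picks
    the low or high half of a dvec4), so it is added unscaled.
    """
    scale = 2 if is_double else 1
    return reladdr * scale + reg_offset
```

For example, element 3 of a dvec4 array with reg_offset 1 lands at 16-byte unit 7, i.e. the high half of the fourth dvec4.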
SIMD4x2 64bit data is stored in register space like this:
r0.0:DF x0 y0 z0 w0
r0.1:DF x1 y1 z1 w1
When we need to write data such as this to memory using 32-bit write
messages we need to shuffle it in this fashion:
r0.0:DF x0 y0 x1 y1
r0.1:DF z0 w0 z1 w1
and emit two 32-bit write messages,
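The shuffle shown in the layout above can be modelled on plain lists. This is an illustrative sketch with a hypothetical function name; each input list stands for one register holding the dvec4 of one vertex.

```python
def shuffle_64bit_for_write(r0, r1):
    """Rearrange SIMD4x2 64-bit data for 32-bit write messages.

    r0 = [x0, y0, z0, w0] and r1 = [x1, y1, z1, w1]. The result groups the
    XY pairs of both vertices in the first register and the ZW pairs in the
    second, matching the layout in the snippet.
    """
    first = r0[0:2] + r1[0:2]   # x0 y0 x1 y1
    second = r0[2:4] + r1[2:4]  # z0 w0 z1 w1
    return first, second
```

Each of the two resulting registers would then feed one of the two 32-bit write messages.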
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 19 +++
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri
FIXME: We need to fix the case where not all the attributes fit
in the push constant buffer
---
src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 63 +++---
1 file changed, 48 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
b/src/mesa/dri
64-bit scratch reads/writes require shuffling data around, so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/mesa/drivers/dri/
This way callers don't need to know about 64-bit particularities and
we reuse some code.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++-
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++
2 files changed, 50 insertions(+), 53 deletions(-)
diff --git a
Use a width of 2 with 64-bit attributes. Also, if we have a dvec
split across two registers such that components XY are stored in
the second half of a register and components ZW are stored in the
first half of the next register, fix up the regioning parameters
for channels ZW.
---
src/mesa/drivers
This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i
---
src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 29 +++--
1 file changed, 27 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index 70f81a0..cdfcefa 100644
--- a/src/mesa/drivers/dri/i965/b
We can implement them directly. Also, document other possible improvements
for future reference.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 46 +-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/driv
From: Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 13 -
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index f92abe3..1
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index d7fbb5d..5c7a07a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 441a450..40ba648 100644
--- a/src/mesa/drivers/dri/i965
The tessellation evaluation stage generates source regions with a vstride=0
for these so they hit the gen7 hardware decompression bug. Split them to
prevent this.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 13 -
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/mesa/d
Mostly the same stuff as usual: we need to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++
1 file chang
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +-
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 99b30ce..d7fbb5d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec
These can happen, for example, in tessellation evaluation when it maps
incoming attributes to FIXED_GRF registers. In this case, just as with
VGRFs, we need to make sure we have vstride=0 for these to work.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +--
1 file changed, 1 insertion(+), 2 delet
From: Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 43 +--
1 file changed, 28 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
b/src/mesa/drivers/dri/i965
ARB_gpu_shader_fp64 was the last piece missing. Notice that some
hardware and kernel combinations do not support pipelined register
writes, which are required for some OpenGL 4.0 features, in which
case the driver won't expose 4.0.
---
src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++
src/mesa/
---
src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 27 ---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index f61c612..70f81a0 100644
--- a/src/mesa/drivers/dri/i965/brw
---
src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
b/src/mesa/drivers/dri/i965/intel_extensions.c
index c557137..6ba44b8 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drive
Now that we are letting some instructions through without being
fully scalarized we have to make sure that we do scalarize any
that have XY / ZW writemasks, since these don't have native support.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +-
1 file changed, 9 insertions(+), 1 deletion
In gen < 8 instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/sr
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 6662a1e..1f8fa80 100644
--- a/src/mesa/drivers/dri/i965/brw_ve
From: Samuel Iglesias Gonsálvez
max_vector_size is used in the vec4 backend to pad out the uniform
components to match a size that is a multiple of a vec4. Double and dvec2
uniforms only require a single vec4 slot, not two.
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quir
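The slot counting behind the snippet above can be sketched as follows. The helper name is hypothetical; the only assumption is that a vec4 slot holds four 32-bit components and that each double component takes two of them.

```python
def vec4_slots(num_components, is_double):
    """Number of vec4 slots a uniform needs (hypothetical helper).

    A vec4 slot holds four 32-bit components; a double takes two of them.
    """
    dwords = num_components * (2 if is_double else 1)
    return (dwords + 3) // 4  # round up to whole vec4 slots
```

With this count a double and a dvec2 both fit in a single slot, while a dvec3 or dvec4 needs two.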
From: Connor Abbott
Less duplication, one less case to handle for doubles and support
for sized NIR types.
v2: Fix call to get_instance by swapping rows and columns params (Iago)
Signed-off-by: Iago Toral Quiroga
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 ++--
1 file
Basically, this involves considering the bit-size information to set
the appropriate type on both operands and destination.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 162b481..bf6701e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers
For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).
For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index dd06a32..cf35f2e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 1f8fa80..c5b9715 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/s
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 5a7ee0b..df927e7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src
From: "Juan A. Suarez Romero"
Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to access 64-bit data in
These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.
We can't do this in Align16 because we would need data to cross the
vec4 boundary.
---
src/mesa/drivers/dri/i965/brw_defines.h | 2 ++
src/mesa/drivers
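As a rough model of what "set the low/high 32-bit in each 64-bit data element" means for packDouble2x32, the sketch below works on a single 64-bit value. The helper names are hypothetical and this is not the generator code. It relies on the GLSL definition that the first component supplies the 32 least significant bits.

```python
import struct

def set_low_32(bits64, value):
    """Replace the low 32 bits of a 64-bit pattern, keeping the high half."""
    return (bits64 & 0xFFFFFFFF00000000) | (value & 0xFFFFFFFF)

def set_high_32(bits64, value):
    """Replace the high 32 bits of a 64-bit pattern, keeping the low half."""
    return (bits64 & 0x00000000FFFFFFFF) | ((value & 0xFFFFFFFF) << 32)

def pack_double_2x32(lo, hi):
    """Build a double from two 32-bit halves via two partial writes."""
    bits = set_high_32(set_low_32(0, lo), hi)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Both partial writes target the same 64-bit element, so neither can be dropped without changing the result.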
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 70 ++
1 file changed, 52 insertions(+), 18 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index cbb4dae..1d33fb2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't
want the copy propagation pass to mess with them.
---
.../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_
---
src/mesa/drivers/dri/i965/brw_disasm.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c
b/src/mesa/drivers/dri/i965/brw_disasm.c
index d74d5d5..c8bdeab 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 7b8e30d..65fa057 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mes
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 3043147..afcf31e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -79,
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 095a27d..cbb4dae 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4
On 19 July 2016 at 04:21, Tomasz Figa wrote:
> On Tue, Jul 19, 2016 at 2:35 AM, Emil Velikov
> wrote:
>> On 18 July 2016 at 16:38, Tomasz Figa wrote:
>>> On Mon, Jul 18, 2016 at 11:58 PM, Emil Velikov
>>> wrote:
>>>> On 18 July 2016 at 13:02, Tomasz Figa wrote:
>>>>> On Mon, Jul 18, 2016 at
Hi,
this series implements initial support for Haswell align16 FP64, and with that
we can enable FP64 and OpenGL 4.0 in Haswell. Gen8+ is now fully scalar, so the
patches focus on gen7 and Haswell specifically (although they do mention what
things are not expected to work outside gen7 for futur
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12
1 file changed, 12 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 25fd1fe..82bf927 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/dr
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_reg.h | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h
b/src/mesa/drivers/dri/i965/brw_reg.h
index 38cf8e3..e4c3e7a 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index fde7b60..7b8e30d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
A source region like <2,2,1>.xyxy:DF selects XXZZ of a dvec4. If we
have code such as:
mov g2.z g4.x
This creates a problem because we end up writing g4.z in g2.z. To fix
this we want to generate a region and we can do that by exploiting
again the vstride=0 behavior of the hardware in gen7.
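To see why a 32-bit swizzle such as .xyxy on a <2,2,1>:DF region ends up selecting XXZZ, the following sketch models a dvec4 as eight dwords and applies the swizzle within each 16-byte chunk. It is a simplified model built from the description above, with a hypothetical function name, not a description of the hardware datapath.

```python
def select_df_channels(swizzle):
    """Return the dvec4 components read through a 32-bit swizzle on <2,2,1>:DF."""
    # A dvec4 occupies 8 dwords: each DF component is a (low, high) dword pair.
    dwords = [(comp, half) for comp in "XYZW" for half in ("lo", "hi")]
    selected = []
    # vstride=2, width=2: each row covers 2 DF elements, i.e. one 16-byte chunk.
    for row_start in (0, 4):
        row = dwords[row_start:row_start + 4]
        # The swizzle indexes 32-bit channels inside the 16-byte chunk.
        picked = [row["xyzw".index(ch)] for ch in swizzle]
        # Consecutive dword pairs form the DF channels the instruction sees.
        for lo, hi in (picked[0:2], picked[2:4]):
            is_whole_df = lo[0] == hi[0] and (lo[1], hi[1]) == ("lo", "hi")
            selected.append(lo[0] if is_whole_df else "?")
    return "".join(selected)
```

In this model `select_df_channels("xyxy")` yields XXZZ, matching the snippet, and the identity swizzle xyzw yields XYZW only because the second row supplies Z and W.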
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we
let the bulk of the vec4 backend ignore this fact and we fix the
swizzles right before code
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.
Signed-off-by: Samuel I
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 829b7d3..88bf895 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
We will use this in cases where we want to force the vstride of a src_reg
to 0 to exploit a particular behavior of the hardware. It will come in
handy to implement access to components Z/W.
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++
2 files c
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++
1 file changed, 14 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 81389a9..1525a3d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/
From the HSW PRM, Command Reference, QtrCtrl:
"NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float)
source or destination type."
v2 (Samuel): Assert that the type is DF.
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 9
From: Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 8a45fd
From the BDW PRM, Workarounds chapter:
"DF->f format conversion for Align16 has wrong emask calculation when
source is immediate."
So detect the case and move the immediate source to a VGRF before we attempt
the conversion.
Notice that Broadwell and later are strictly scalar at the moment
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 57 ++
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 4014020..c9b8edf 100644
--- a/src/mesa/drivers/dri/i965
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, which is why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain
The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.
---
src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
In the vec4 backend the generator sets the execution size for all
instructions to 8. However, we will have to split certain DF instructions
to have an execution size of 4, so we need to indicate this explicitly in the
IR for the generator to set the right execution size for them.
We will use this
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index df927e7..095a27d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index cf35f2e..fde7b60 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/s
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 53 +++---
2 files changed, 38 insertions(+), 20 d
These opcodes do partial writes of 64-bit data. The problem is that we intend
to use them to write on the same register to implement packDouble2x32 and
from the point of view of DCE, since both opcodes write to the same register,
only the last one stands and decides to eliminate the first, which is
The hardware only supports 32-bit swizzles, which means that a swizzle
like XYZW only selects channels XY of a DF, making access to channels ZW
more difficult, especially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
random swiz
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 82bf927..dd06a32 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/dri
These need to be emitted as align1 MOV's, since they need to have a
stride of 2 on the float register (whether src or dest) so that data
from another thread doesn't cross the middle of a SIMD8 register.
v2 (Iago):
- The float-to-double needs to align 32-bit data to 64-bit before doing the
conversi
The pass does not support doubles in its current form. I'm not even sure that
it should, since it would basically change the type of the operation and that
could have implications for things like SSBO writes, etc.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
1 file changed, 1 insertion(+)
di
These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to implement things like
unpackDouble2x32.
We can't do this in Align16 because we would need data to cross the
vec4 boundary.
---
src/mesa/drivers/dri/i965/brw_defines.h | 2 +
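Picking the low or high 32 bits of a 64-bit element, as the snippet above describes for unpackDouble2x32, can be illustrated on a single double. The helper names are hypothetical and the sketch only mirrors the GLSL semantics (first result component is the least significant half), not the Align1 code the patch emits.

```python
import struct

def _double_bits(value):
    """Reinterpret a double as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def pick_low_32(value):
    """Low 32 bits of the double's bit pattern."""
    return _double_bits(value) & 0xFFFFFFFF

def pick_high_32(value):
    """High 32 bits of the double's bit pattern."""
    return _double_bits(value) >> 32

def unpack_double_2x32(value):
    """Return (low, high) 32-bit halves, in unpackDouble2x32 order."""
    return pick_low_32(value), pick_high_32(value)
```

For 1.0, whose bit pattern is 0x3FF0000000000000, the low half is 0 and the high half is 0x3FF00000.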
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++
1 file changed, 18 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 1525a3d..4014020 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/m
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c55d594..8316691 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i9
We need to consider the fact that dvec3/4 require two vec4 slots.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 1b190ab..95b408e 10064
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9400baa..a366548 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp