---
src/mesa/drivers/dri/i965/brw_context.h | 8 +++-
src/mesa/drivers/dri/i965/brw_gs_surface_state.c | 25
src/mesa/drivers/dri/i965/brw_state.h | 3 ++
src/mesa/drivers/dri/i965/brw_state_upload.c | 10 +
src/mesa/drivers/dri/i965/brw_vs_surface_state.
---
src/mesa/drivers/dri/i965/brw_gs.c | 1 +
src/mesa/drivers/dri/i965/brw_vs.c | 4 ++--
src/mesa/drivers/dri/i965/brw_wm.c | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c
b/src/mesa/drivers/dri/i965/brw_gs.c
index ce3cba4..bfb64f3 1006
This moves most of the surface state set-up logic that can be shared
between textures and shader images to a separate function.
---
src/mesa/drivers/dri/i965/brw_context.h | 11 ++
src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 124 +-
2 files changed, 83 insert
---
src/mesa/drivers/dri/i965/brw_tex_layout.c | 45 +--
src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 18 +++
2 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c
b/src/mesa/drivers/dri/i965/brw_tex_layou
Null surfaces are going to be useful as something to point
unbound image units at, as the ARB_shader_image_load_store extension
requires us to behave deterministically in cases where some shader
tries to access an unbound image unit: invalid stores and atomics are
supposed to be discarded and
---
src/mesa/drivers/dri/i965/brw_context.h | 2 +
src/mesa/drivers/dri/i965/brw_surface_formats.c | 111 +++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 77
3 files changed, 190 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_cont
---
src/mesa/drivers/dri/i965/brw_defines.h | 3 +++
src/mesa/drivers/dri/i965/gen7_wm_state.c | 7 +++
src/mesa/drivers/dri/i965/gen8_depth_state.c | 12
src/mesa/drivers/dri/i965/gen8_ps_state.c | 13 +
4 files changed, 31 insertions(+), 4 deletions(-)
This moves most of the surface state set-up logic that can be shared
between textures and shader images to a separate function.
---
src/mesa/drivers/dri/i965/gen8_surface_state.c | 136 ++---
1 file changed, 77 insertions(+), 59 deletions(-)
diff --git a/src/mesa/drivers/dri/i
---
src/mesa/drivers/dri/i965/brw_defines.h | 3 +++
src/mesa/drivers/dri/i965/gen7_gs_state.c | 4 +++-
src/mesa/drivers/dri/i965/gen7_vs_state.c | 13 -
src/mesa/drivers/dri/i965/gen7_wm_state.c | 3 +++
src/mesa/drivers/dri/i965/gen8_gs_state.c | 4 +++-
src/mesa/drivers/dri/i
Hey Matt,
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> MRFs cannot be read from anyway so they cannot possibly be a valid
>> source of LOAD_PAYLOAD.
>> ---
>
> The function only seems to test inst->dst.file == MRF. I don
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> Scalar registers are required to have zero stride, fix the
>> regs_written calculation not to assume that the instruction writes
>> zero registers in that case.
>> ---
>> src/mes
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> If the source type differs from the original type of the constant we
>> need to bit-cast it before propagating, otherwise the original type
>> information will be lost. If the constant was
Kenneth Graunke writes:
> On Friday, February 06, 2015 07:23:15 PM Francisco Jerez wrote:
>> It doesn't really improve locality of texture fetches, quite the
>> opposite it's a waste of memory bandwidth and space due to tile
>> alignment.
>> ---
>> sr
Kenneth Graunke writes:
> On Friday, February 06, 2015 07:23:16 PM Francisco Jerez wrote:
>> Reviewed-by: Paul Berry
>> ---
>> src/mesa/drivers/dri/i965/brw_context.h | 5 +
>> src/mesa/drivers/dri/i965/brw_shader.cpp | 7 +++
>> 2 files changed, 12
Kenneth Graunke writes:
> On Saturday, February 07, 2015 02:10:19 AM Francisco Jerez wrote:
>> Matt Turner writes:
>>
>> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez
>> > wrote:
>> >> Scalar registers are required to have zero stride, fix the
Matt Turner writes:
> On Fri, Feb 6, 2015 at 4:17 PM, Francisco Jerez wrote:
>> Matt Turner writes:
>>
>>> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez
>>> wrote:
>>>> If the source type differs from the original type of the constant w
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:43 AM, Francisco Jerez wrote:
>> ---
>> src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> Using 'ralloc*(this, ...)' is wrong if the object has automatic
>> storage or was allocated through any other means. Use normal dynamic
>> memory instead.
>> ---
>
> I
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> Fixes rewrite by the register coalesce pass of references to
>> individual halves of 16-wide coalesced registers.
>> ---
>> src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 +++
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> ---
>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> b
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez wrote:
>> So regs_written gets initialized with a sensible value.
>> ---
>> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +--
>> 1 file changed, 5 insertions(+), 6 deletions(-)
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:43 AM, Francisco Jerez wrote:
>> ---
>> src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
>> src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 15 ++-
>> src/mesa/dr
Some instruction bits don't have a mapping defined to any compacted
instruction field. If they're ever set and we end up compacting the
instruction they will be forced to zero. Avoid using compaction in such
cases.
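The rule can be sketched as a mask test: any instruction bit outside the set that the compacted encoding can represent makes the instruction ineligible for compaction. The 64-bit word and the `COMPACTABLE_BITS` mask below are hypothetical placeholders, not the real i965 compaction tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bits of one 64-bit instruction word that some compacted
 * field can represent. The real hardware uses several lookup tables. */
#define COMPACTABLE_BITS UINT64_C(0x00000000ffffffff)

/* Compaction would force any set bit without a compacted mapping to
 * zero, so only compact when no such bit is set. */
bool can_compact(uint64_t insn_word)
{
   return (insn_word & ~COMPACTABLE_BITS) == 0;
}

/* Hypothetical compaction step: keeps only the representable bits.
 * Calling it on an ineligible instruction would lose information. */
uint64_t compact(uint64_t insn_word)
{
   assert(can_compact(insn_word));
   return insn_word & COMPACTABLE_BITS;
}
```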
v2: Align multiple lines of an expression to the same column. Change
conditi
Right now virtual GRF bookkeeping and allocation are performed in each
visitor class separately (among a hundred other things),
leading to duplicated logic in each visitor and preventing layering, as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depen
Some people have complained that code using the CEILING() macro is
difficult to understand because it's not immediately obvious what it
is supposed to do until you go and look up its definition. Use a more
descriptive name that matches the similar utility macro in the Linux
kernel.
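For reference, the kernel-style macro performs integer division rounding the quotient up, i.e. it returns the smallest q with q * d >= n. A minimal sketch, assuming non-negative operands and a nonzero divisor; the helper `regs_for_bytes()` and its 32-byte register size are illustrative only.

```c
#include <assert.h>

/* Integer division rounding up. Assumes n >= 0 and d > 0; like the
 * Linux kernel macro of the same name, it evaluates d twice. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of 32-byte registers covering `bytes`. */
unsigned regs_for_bytes(unsigned bytes)
{
   assert(bytes <= 0xffffffffu - 31u); /* avoid wrap-around in n + d - 1 */
   return DIV_ROUND_UP(bytes, 32u);
}
```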
---
src/mesa/d
Scalar registers are required to have zero stride; fix the
regs_written calculation so it doesn't assume that the instruction
writes zero registers in that case.
v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken)
Reviewed-by: Kenneth Graunke
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
1 file ch
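A sketch of why a zero stride needs care: if the written size is computed as width times stride times type size, a scalar (stride 0) destination yields zero bytes and therefore zero registers. Clamping the element count to at least one avoids that. The function `dst_regs_written()` and its parameters are hypothetical; the actual brw_fs.cpp change is not reproduced here. It assumes 32-byte GRF registers.

```c
#include <assert.h>

#define REG_SIZE 32u /* bytes per GRF register on i965 */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical: registers written by a destination region of `width`
 * channels, `stride` elements apart, each `type_size` bytes wide. */
unsigned dst_regs_written(unsigned width, unsigned stride, unsigned type_size)
{
   assert(width > 0 && type_size > 0);
   unsigned elems = width * stride;
   /* A scalar destination has stride 0 but still writes one element. */
   if (elems == 0)
      elems = 1;
   return DIV_ROUND_UP(elems * type_size, REG_SIZE);
}
```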
So the i965 driver can expose 32 image uniforms per shader stage.
---
src/mesa/main/config.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/main/config.h b/src/mesa/main/config.h
index 4ec4b75..08e1a14 100644
--- a/src/mesa/main/config.h
+++ b/src/mesa/main/config.h
@
Kenneth Graunke writes:
> On Saturday, February 07, 2015 03:03:44 AM Francisco Jerez wrote:
>> Kenneth Graunke writes:
>>
>> > On Friday, February 06, 2015 07:23:16 PM Francisco Jerez wrote:
>> >> Reviewed-by: Paul Berry
>> >> ---
>
It doesn't really improve locality of texture fetches; quite the
opposite, it's a waste of memory bandwidth and space due to tile
alignment.
v2: Check mt->logical_height0 instead of mt->target (Ken). Add short
comment explaining why they shouldn't be tiled.
---
src/mesa/drivers/dri/i965/intel
Kristian Høgsberg writes:
> On Fri, Feb 6, 2015 at 9:23 AM, Francisco Jerez wrote:
>> ---
>> src/mesa/drivers/dri/i965/brw_program.c | 40 +
>> src/mesa/drivers/dri/i965/intel_reg.h | 1 +
>> 2 files changed, 41 insertio
Kenneth Graunke writes:
> On Friday, February 06, 2015 07:23:25 PM Francisco Jerez wrote:
>> Shaders with image uniforms may have side effects. Make sure that
>> fragment shader threads are dispatched if the shader has any image
>> uniforms.
>> ---
>> src/mesa
Shaders with image uniforms may have side effects. Make sure that
fragment shader threads are dispatched if the shader has any image
uniforms.
v2: Use brw_stage_state::nr_image_params to find out if the shader has
image uniforms instead of checking core mesa data structures (Ken).
---
src/me
v2: Set the PS UAV-only bit on HSW (Ken).
---
src/mesa/drivers/dri/i965/brw_defines.h | 4
src/mesa/drivers/dri/i965/gen7_gs_state.c | 4 +++-
src/mesa/drivers/dri/i965/gen7_vs_state.c | 13 -
src/mesa/drivers/dri/i965/gen7_wm_state.c | 9 +
src/mesa/drivers/dri/i965/
v2: Store early fragment test mode in brw_wm_prog_data instead of
getting it from core mesa data structures (Ken).
---
src/mesa/drivers/dri/i965/brw_context.h | 1 +
src/mesa/drivers/dri/i965/brw_defines.h | 3 +++
src/mesa/drivers/dri/i965/brw_wm.c | 2 ++
src/mesa/drivers
Matt Turner writes:
> On Mon, Feb 9, 2015 at 6:08 AM, Francisco Jerez wrote:
>> Some instruction bits don't have a mapping defined to any compacted
>> instruction field. If they're ever set and we end up compacting the
>> instruction they will be forced to zero.
Some instruction bits don't have a mapping defined to any compacted
instruction field. If they're ever set and we end up compacting the
instruction they will be forced to zero. Avoid using compaction in such
cases.
v2: Align multiple lines of an expression to the same column. Change
conditi
s = new unsigned[capacity];
> +memcpy(tmp_offsets, offsets, count * sizeof(unsigned));
> +delete[] offsets;
> + offsets = tmp_offsets;
> }
>
> sizes[count] = size;
> --
> 1.8.5.1
Looks OK to me,
Re
Matt Turner writes:
> On Wed, Feb 11, 2015 at 6:37 AM, Juha-Pekka Heikkila
> wrote:
>> There is no error path available thus instead of giving
>> realloc possibility to fail use new which will never
>> return null pointer and throws bad_alloc on failure.
>
> The problem was that we weren't check
Matt Turner writes:
> On Wed, Feb 11, 2015 at 9:16 AM, Francisco Jerez
> wrote:
>> Matt Turner writes:
>>
>>> On Wed, Feb 11, 2015 at 6:37 AM, Juha-Pekka Heikkila
>>> wrote:
>>>> There is no error path available thus instead of giving
>>>
Matt Turner writes:
>[...]
> Indeed. And another thing to consider is that we've discussed
> compiling with -fno-exceptions.
>
Heh, the benefit you get from doing that is virtually zero. And in
cases like this where failure would have to be handled many levels up in
the stack and require redesig
Francisco Jerez writes:
> Kenneth Graunke writes:
>
>> On Sunday, January 18, 2015 01:04:02 AM Francisco Jerez wrote:
>>> This is the first part of a series meant to improve our usage of the L3
>>> cache.
>>> Currently it's far from ideal s
Matt Turner writes:
> On Fri, Feb 6, 2015 at 6:43 AM, Francisco Jerez wrote:
>> ---
>
> We don't have any operations today that return more than a single
> register in the vec4 backend, do we? Presumably this is partly
> preparation for image_load_store?
>
Yeah,
Matt Turner writes:
> On Mon, Feb 9, 2015 at 11:25 AM, Matt Turner wrote:
>> On Fri, Feb 6, 2015 at 2:40 PM, Matt Turner wrote:
>>> 8 - Sent a question
>>> 9 - Like mine better?
>>> 10 - Looks wrong to me
>>> 11-13 - Asked Jason to review
>>> 14 - Asked for an example showing the problem
>>> 15
Francisco Jerez writes:
> Matt Turner writes:
>
>> On Mon, Feb 9, 2015 at 11:25 AM, Matt Turner wrote:
>>> On Fri, Feb 6, 2015 at 2:40 PM, Matt Turner wrote:
>>>> 8 - Sent a question
>>>> 9 - Like mine better?
>>>> 10 - Looks wrong to
This fixes a regression in the running time of Piglit introduced by
commit 78e9043475d4bed8b50f7e413963c960fa0935bb, which increased the
number of register allocation classes set up by the VEC4 back-end
from 2 to 16. The algorithm used by ra_set_finalize() to calculate
them is unnecessarily expens
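For register classes made of contiguous runs of registers, the q value of a pair of classes has a simple closed form. The sketch below assumes a simplified model (a class-b register is a run of b consecutive hardware registers, and two runs conflict when they overlap) and checks a brute-force q(B, C), the worst case over size-c runs of how many size-b runs conflict with it, against b + c - 1. The model, names, and register-file size are illustrative assumptions, not the ra_set_finalize() code.

```c
#include <assert.h>

#define NUM_REGS 16 /* hypothetical register file size */

/* Runs [s0, s0 + n0) and [s1, s1 + n1) share at least one register. */
static int runs_overlap(int s0, int n0, int s1, int n1)
{
   return s0 < s1 + n1 && s1 < s0 + n0;
}

/* Brute-force q(B, C): O(NUM_REGS^2) work per pair of classes. */
int q_brute_force(int b, int c)
{
   assert(b > 0 && c > 0 && b <= NUM_REGS && c <= NUM_REGS);
   int max_conflicts = 0;
   for (int sc = 0; sc + c <= NUM_REGS; sc++) {
      int conflicts = 0;
      for (int sb = 0; sb + b <= NUM_REGS; sb++)
         conflicts += runs_overlap(sb, b, sc, c);
      if (conflicts > max_conflicts)
         max_conflicts = conflicts;
   }
   return max_conflicts;
}

/* Closed form, valid when NUM_REGS >= 2b + c - 2 so that some size-c run
 * has b - 1 registers of room on each side. */
int q_closed_form(int b, int c)
{
   return b + c - 1;
}
```

Precomputing values this way replaces the nested loops over every register and conflict with a constant-time expression per class pair.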
This line was removed by accident in commit
16b911257440afbd77a6eb762e28df62e3c19bc7, causing a regression in the
ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert Khronos conformance
test. It's necessary because the swizzle_result() code below expects
all four components of the vector to be valid.
Connor Abbott writes:
> I'll ask the same question I asked Jason when he did this for FS...
> did you verify that the new q_values is the same as the old one?
>
Yeah, I did.
> On Fri, Feb 13, 2015 at 8:02 AM, Francisco Jerez
> wrote:
>> This fixes a regression in
emit(MOV(component(sources[0], 7), brw_flag_reg(0, 1)))
> ->force_writemask_all = true;
> + } else if (stage == MESA_SHADER_VERTEX) {
> + emit(MOV(component(sources[0], 7),
> + brw_imm_ud(0xff)))->force_
Hi Ian,
Ian Romanick writes:
> Please tag the commit with
>
> Cc: "10.5"
>
I don't think that's necessary, the commit that caused this regression
isn't part of 10.5.
> On 02/13/2015 05:03 AM, Francisco Jerez wrote:
>>
Jason Ekstrand writes:
> On Feb 15, 2015 11:55 PM, "Ben Widawsky"
> wrote:
>>
>> The short version: we need to set bits in R0.7 which provide a mask to be used
>> for PS kill samples/pixels. Since the VS has no such concept, we just need to
>> set all 1.
>>
>> The longer version...
>> Execut
The round-robin allocation strategy is expected to decrease the number
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around. On
the other hand, it has the disadvantage of increasing fragmentation and
decreasing the num
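The difference between the two strategies can be sketched with a toy allocator over a table of free registers: the default picks the lowest free register, while round-robin resumes the search just after the register it handed out last, so a value allocated right after another one was freed tends to land in a different register. The `struct toy_ra` type and its functions are hypothetical and ignore interference, register classes, and spilling.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 8

struct toy_ra {
   bool used[NUM_REGS];
   int last; /* register returned by the previous allocation */
};

/* Lowest-numbered free register, or -1 if none is free. */
int alloc_lowest(struct toy_ra *ra)
{
   for (int r = 0; r < NUM_REGS; r++) {
      if (!ra->used[r]) {
         ra->used[r] = true;
         ra->last = r;
         return r;
      }
   }
   return -1;
}

/* Round-robin: start scanning just after the last register handed out. */
int alloc_round_robin(struct toy_ra *ra)
{
   for (int i = 1; i <= NUM_REGS; i++) {
      int r = (ra->last + i) % NUM_REGS;
      if (!ra->used[r]) {
         ra->used[r] = true;
         ra->last = r;
         return r;
      }
   }
   return -1;
}

void free_reg(struct toy_ra *ra, int r)
{
   assert(r >= 0 && r < NUM_REGS && ra->used[r]);
   ra->used[r] = false;
}
```

With `last` initialized to `NUM_REGS - 1`, allocating, freeing, and allocating again yields register 0 twice with the lowest-first strategy (a potential false dependency), but registers 0 and then 1 with round-robin.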
Jason Ekstrand writes:
> On Feb 16, 2015 8:35 AM, "Francisco Jerez" wrote:
>>
>> The round-robin allocation strategy is expected to decrease the amount
>> of false dependencies created by the register allocator and give the
>> post-RA scheduling pass mor
Jason Ekstrand writes:
> On Feb 16, 2015 9:34 AM, "Francisco Jerez" wrote:
>>
>> Jason Ekstrand writes:
>>
>> > On Feb 16, 2015 8:35 AM, "Francisco Jerez" wrote:
>> >>
>> >> The round-robin allocation strategy
Matt Turner writes:
> On Mon, Feb 16, 2015 at 10:40 AM, Francisco Jerez
> wrote:
>> My intuition is that the huge performance improvement Matt observed by
>> disabling the third scheduling heuristic is more likely to have been
>> caused by a decrease in the amount of c
Jason Ekstrand writes:
> On Mon, Feb 16, 2015 at 10:40 AM, Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > On Feb 16, 2015 9:34 AM, "Francisco Jerez" wrote:
>> >>
>> >> Jason Ekstrand writes
lead to any
measurable improvement in any of the other cases (actually I would be
very surprised if that's the case), so this suggestion seems a bit of a
premature optimization to me. How about we KISS for now?
>
>
> On Mon, Feb 16, 2015 at 11:39 AM, Francisco Jerez
> wrote:
The round-robin allocation strategy is expected to decrease the number
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around. On
the other hand, it has the disadvantage of increasing fragmentation and
decreasing the num
Tom Stellard writes:
> On Tue, Feb 17, 2015 at 03:23:05PM +0200, Francisco Jerez wrote:
>> The round-robin allocation strategy is expected to decrease the amount
>> of false dependencies created by the register allocator and give the
>> post-RA scheduling pass more freedom
Jason Ekstrand writes:
> On Mon, Feb 16, 2015 at 11:39 AM, Francisco Jerez
> wrote:
>
>> The round-robin allocation strategy is expected to decrease the amount
>> of false dependencies created by the register allocator and give the
>> post-RA scheduling pass more f
Connor Abbott writes:
> On Tue, Feb 17, 2015 at 8:15 AM, Francisco Jerez
> wrote:
>> Connor Abbott writes:
>>
>>> Hi Francisco,
>>>
>> Hi Connor, and thank you for your feedback.
>>
>>> A few comments:
>>>
>>> 1
e will fix cases that write atomics, such as
> atomicCounterIncrement, and this change will fix cases that only read
> atomics, such as atomicCounter.
>
> Signed-off-by: Jordan Justen
> Cc: Ben Widawsky
> Cc: Francisco Jerez
> ---
> src/mesa/drivers/dri/i965/brw_fs_visito
Connor Abbott writes:
> On Tue, Feb 17, 2015 at 3:04 PM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Mon, Feb 16, 2015 at 11:39 AM, Francisco Jerez
>>> wrote:
>>>
>>>> The round-robin allocation strategy is expected
Jason Ekstrand writes:
> On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez
> wrote:
>
>> Hey Matt,
>>
>> Matt Turner writes:
>>
>> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez
>> wrote:
>> >> MRFs cannot be read from anyway so
Jason Ekstrand writes:
> On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez
>> > wrote:
>> >
>> >> Hey Matt,
>> >>
>> >
s, so we set all bits to enabled.
>> >
>> > Note: this mask is ANDed with the execution mask, so some channels may not
>> > end up issuing the atomic operation.
>> >
>> > Signed-off-by: Jordan Justen
>> > Cc: Ben Widawsky
>>
v/2015-February/076097.html
[4] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html
> --Jason
>
> On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand
> wrote:
>
>>
>>
>> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez
>> wrote:
>>
Jason Ekstrand writes:
> On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > I'm still a little pensive. But
>> >
>> > Reviewed-by: Jason Ekstrand
>> >
>> Thanks.
>>
>>
image_load_store
code that ends up using byte types with stride=4 for some image formats.
>
> On Fri, Feb 6, 2015 at 9:42 AM, Francisco Jerez
> wrote:
>
>> ---
>> src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>
Jason Ekstrand writes:
> On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez
>> > wrote:
>> >
>> >> Jason Ekstrand writes:
>> >>
;t be handled in roughly the same way you do it
now? Recognize when src[i + 4] is the same 8-wide register as src[i]
shifted by 8 and emit a COMPR4 copy in that case?
> On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand
> wrote:
>
>>
>>
>> On Fri, Feb 20, 2015 at 1:09 PM, Fra
ARB_gpu_shader5 requires sampler array indexing expressions to be
dynamically uniform; however, this doesn't imply that the control flow
leading to the evaluation of that expression is uniform. Use
emit_uniformize() to obtain an arbitrary live value from
the binding table index
---
src/mesa/drivers/dri/i965/brw_fs.h | 2 ++
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 12
src/mesa/drivers/dri/i965/brw_vec4.h | 3 +++
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++
4 files changed, 28 insertions(+)
diff --git a/sr
The BROADCAST instruction picks the channel of its first source
given by an index passed in as the second source. This will be used in
situations where all channels from the same SIMD thread have to agree
on the value of something, e.g. a surface binding table index.
---
src/mesa/drivers/dri/i965/b
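Semantically, BROADCAST can be modelled per SIMD thread as copying one channel of the first source into every channel of the destination, with the channel chosen by the second source. A scalar C model, assuming SIMD8 and 32-bit values; the function name is hypothetical and the hardware lowering is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 8

/* Model of BROADCAST: every destination channel receives src[index]. */
void broadcast(uint32_t dst[SIMD_WIDTH], const uint32_t src[SIMD_WIDTH],
               unsigned index)
{
   assert(index < SIMD_WIDTH);
   for (unsigned c = 0; c < SIMD_WIDTH; c++)
      dst[c] = src[index];
}
```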
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++
src/mesa/drivers/dri/i965/brw_fs.h | 1 +
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 +
src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
src/mesa/drivers/dri/i965/brw_vec4.h
ARB_gpu_shader5 requires UBO array indexing expressions to be
dynamically uniform; however, this doesn't imply that the control flow
leading to the evaluation of that expression is uniform. Use
emit_uniformize() to obtain an arbitrary live value from
the binding table index calc
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 15 +++
src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 1 +
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 +
src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 +++
src/mesa/drivers/d
This instruction calculates the index of an arbitrary channel enabled
in the current execution mask. It's expected to be used as input for
the BROADCAST opcode, but it's implemented as a separate instruction
rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has
no dependencies so it
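A scalar model of FIND_LIVE_CHANNEL: given the execution mask, return the index of some enabled channel, here the lowest one. Reading the value at that index on behalf of every channel is what turns a dynamically uniform value into a truly uniform one. The function names and the choice of the lowest enabled channel are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest enabled channel in exec_mask (bit i = channel i).
 * Callers must ensure at least one channel is enabled. */
unsigned find_live_channel(uint32_t exec_mask)
{
   assert(exec_mask != 0);
   unsigned c = 0;
   while (!(exec_mask & (UINT32_C(1) << c)))
      c++;
   return c;
}

/* Hypothetical "uniformize": take the value held by one live channel so
 * that all channels agree on it (e.g. a binding table index). */
uint32_t uniformize(const uint32_t *values, uint32_t exec_mask)
{
   return values[find_live_channel(exec_mask)];
}
```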
The same
opcodes will be used for dynamically uniform indexing of image arrays
too.
> On 02/20/2015 11:48 AM, Francisco Jerez wrote:
>> The BROADCAST instruction picks the channel from its first source
>> given by an index passed in as second source. This will be used in
>
Tom Stellard writes:
> This should be done by the frontend for devices that support this
> extension.
Reviewed-by: Francisco Jerez
> ---
> src/gallium/state_trackers/clover/llvm/invocation.cpp | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/src/gallium/state
This is currently not a problem because the vec4 visitor happens to
mask out unused components from the destination, but it might become
an issue when we start using atomics without writeback message. In
any case it seems sensible to set it again here because the
consequences of setting the wrong
---
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 4e695f5..48ee
Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
surface index as a register instead of a constant and to use
brw_send_indirect_message() to emit the indirect variant of send with
a dynamically calculated message descriptor. This will be required to
support variable indexing
---
src/mesa/drivers/dri/i965/brw_defines.h | 2 +
src/mesa/drivers/dri/i965/brw_eu.h | 4 ++
src/mesa/drivers/dri/i965/brw_eu_emit.c | 70
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 ++
src/mesa/drivers/dri/i965/brw_shader.cpp
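Conceptually, the indirect variant of send ORs a surface index computed at run time into the low bits of an otherwise constant message descriptor. A sketch assuming the binding table index occupies bits 7:0 of the descriptor, as for i965 data port messages; `indirect_desc()`, `DESC_BTI_MASK`, and the example descriptor values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_BTI_MASK 0xffu /* assumed: binding table index in bits 7:0 */

/* Combine the static part of a message descriptor with a surface index
 * only known at run time, as an indirect send would do in a register. */
uint32_t indirect_desc(uint32_t base_desc, uint32_t surface_index)
{
   assert((base_desc & DESC_BTI_MASK) == 0); /* low bits reserved for index */
   return base_desc | (surface_index & DESC_BTI_MASK);
}
```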
---
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 5 +++--
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 --
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
4 files changed, 10 insertions(+), 6 deletions(-)
diff --
This is consistent with the untyped surface read opcode. From now on
all typed and untyped surface access opcodes will follow the same
pattern: src[0] will be the message payload, src[1] will be the
surface index and src[2] will be a control immediate (atomic operation
for atomic opcodes and numbe
This doesn't actually enable untyped surface message sends from GRF
yet, the upcoming atomic counter and image intrinsic lowering code
will.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 7 ---
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 +++-
src/mesa/drivers/d
And calculate the message response size based on the number of
components rather than the other way around. This simplifies their
interface somewhat and allows the caller to request a writeback
message with more than one vector component in SIMD4x2 mode.
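The direction of that calculation can be sketched as follows: the caller states how many 32-bit components it wants back, and the response length in registers follows from the execution mode. This assumes SIMD8 needs one 32-byte register per component, SIMD16 two, and SIMD4x2 fits up to four components for both channels in a single register; `response_length()` and `enum exec_mode` are hypothetical.

```c
#include <assert.h>

enum exec_mode { MODE_SIMD4X2, MODE_SIMD8, MODE_SIMD16 };

/* Hypothetical: registers needed to return `num_components` 32-bit
 * components per channel. */
unsigned response_length(enum exec_mode mode, unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);
   switch (mode) {
   case MODE_SIMD4X2:
      return 1;                  /* 2 channels x 4 components x 4 B = 32 B */
   case MODE_SIMD8:
      return num_components;     /* 8 channels x 4 B = one register each */
   case MODE_SIMD16:
      return 2 * num_components; /* 16 channels need two registers each */
   }
   return 0; /* not reached */
}
```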
---
src/mesa/drivers/dri/i965/brw_eu.h
This was telling the sampler to do texture fetches for *all* channels
in the non-constant surface index case, which could have reduced
throughput unnecessarily when some of the channels were disabled by
control flow.
---
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 ++--
src/mesa/d
The generate_untyped_*() methods do nothing useful other than calling
the corresponding function from brw_eu_emit.c. The calls to
brw_mark_surface_used() will go away too in a future commit.
---
src/mesa/drivers/dri/i965/brw_fs.h | 11 --
src/mesa/drivers/dri/i965/brw_fs_generat
---
src/mesa/drivers/dri/i965/brw_eu.h | 19 ++--
src/mesa/drivers/dri/i965/brw_eu_emit.c | 58 ++--
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 55 +-
src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 37 ---
4
---
src/mesa/drivers/dri/i965/brw_defines.h | 4 +
src/mesa/drivers/dri/i965/brw_eu.h | 24 +++
src/mesa/drivers/dri/i965/brw_eu_emit.c | 169 +
src/mesa/drivers/dri/i965/brw_fs.cpp | 12 ++
src/mesa/drivers/dri/i965/brw_f
---
src/mesa/drivers/dri/i965/brw_defines.h | 1 +
src/mesa/drivers/dri/i965/brw_eu.h | 7 +++
src/mesa/drivers/dri/i965/brw_eu_emit.c | 51 ++
src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++
src/mesa/drivers/dri/i965/brw_fs_g
Tom Stellard writes:
> ---
> src/gallium/state_trackers/clover/api/device.cpp | 3 +--
> src/gallium/state_trackers/clover/core/device.cpp | 6 ++
> src/gallium/state_trackers/clover/core/device.hpp | 1 +
> 3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/state
ice query function from cl_khr_fp86() to
> has_doubles().
>
> v3:
> - Return 0 for device::doubled_fp_confg() when doubles aren't
> supported.
>
> v4:
> - Remove device query for double fp_config.
Reviewed-by: Francisco Jerez
> ---
> s
Tom Stellard writes:
> This means dropping CL_FP_DENORM from the current return value.
> ---
> src/gallium/state_trackers/clover/api/device.cpp | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/device.cpp
> b/src/gallium/state_trac
"Pohjolainen, Topi" writes:
> On Fri, Mar 06, 2015 at 10:37:06AM +0200, Pohjolainen, Topi wrote:
>> On Fri, Feb 27, 2015 at 05:34:44PM +0200, Francisco Jerez wrote:
>> >[..]
>> > +/**
>> > + * Send message to shared unit \p sfid with a possibly indir
"Pohjolainen, Topi" writes:
> On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote:
>> Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
>> surface index as a register instead of a constant and to use
>> brw_send_indirect_message()
"Pohjolainen, Topi" writes:
> On Fri, Mar 06, 2015 at 02:29:15PM +0200, Francisco Jerez wrote:
>> "Pohjolainen, Topi" writes:
>>
>> > On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote:
>> >> Change brw_untyped_atomic()
"Pohjolainen, Topi" writes:
> On Fri, Feb 27, 2015 at 05:34:47PM +0200, Francisco Jerez wrote:
>> This is currently not a problem because the vec4 visitor happens to
>> mask out unused components from the destination, but it might become
>> an issue when we start