Jordan Justen writes:
> This prevents an assertion from being hit with SIMD16:
>
> Assertion `inst->exec_size == dispatch_width() || force_writemask_all' failed.
>
> Signed-off-by: Jordan Justen
> Cc: Francisco Jerez
> ---
> src/mesa/drivers/dri/i965/brw_
Jason Ekstrand writes:
> We want to move these into the builder so that they know the current
> builder's dispatch width. This will be needed by a later commit.
I very much like the idea of this series, but why do you need to move
these register manipulators into the builder? The builder is a
Jason Ekstrand writes:
> On Tue, Jun 23, 2015 at 9:22 AM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> We want to move these into the builder so that they know the current
>>> builder's dispatch width. This will be needed by a later c
Iago Toral writes:
> On Wed, 2015-06-17 at 17:20 -0700, Jordan Justen wrote:
>> I wanted to question whether this was required, based on this text
>> from the extension spec:
>>
>> "The ability to write to buffer objects creates the potential for
>> multiple independent shader invocations to re
Zoltan Gilian writes:
> Image attributes are passed to the kernel as hidden parameters after the
> image attribute itself. An LLVM pass replaces the getter builtins with
> the appropriate parameters.
This seems to be doing essentially the same thing as v1? Is it the
right patch?
> ---
> src/gal
Jason Ekstrand writes:
> On Jun 24, 2015 4:29 AM, "Francisco Jerez" wrote:
>>
>> Jason Ekstrand writes:
>>
>> > On Tue, Jun 23, 2015 at 9:22 AM, Francisco Jerez
> wrote:
>> >> Jason Ekstrand writes:
>> >>
>>
Jason Ekstrand writes:
> On Jun 24, 2015 6:29 AM, "Francisco Jerez" wrote:
>>
>> Jason Ekstrand writes:
>>
>> > On Jun 24, 2015 4:29 AM, "Francisco Jerez"
> wrote:
>> >>
>> >> Jason Ekstrand writes:
>> >
Jason Ekstrand writes:
> On Wed, Jun 24, 2015 at 6:44 AM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Jun 24, 2015 6:29 AM, "Francisco Jerez" wrote:
>>>>
>>>> Jason Ekstrand writes:
>>>>
>>>
Davin McCall writes:
> On 26/06/15 11:08, Erik Faye-Lund wrote:
>> On Thu, Jun 25, 2015 at 1:48 AM, Davin McCall wrote:
>>> This is an alternative to my earlier patch [1] (and it is now constructed
>>> properly using git format-patch).
>>>
>>> Quick background:
>>> There is a problem in exec_lis
Davin McCall writes:
> On 26/06/15 13:18, Francisco Jerez wrote:
>> Davin McCall writes:
>>
>>> On 26/06/15 11:08, Erik Faye-Lund wrote:
>>>> On Thu, Jun 25, 2015 at 1:48 AM, Davin McCall wrote:
>>>>> This is an alternative to my earlier
Davin McCall writes:
> On 26/06/15 14:31, Eirik Byrkjeflot Anonsen wrote:
>> Erik Faye-Lund writes:
>>
>>> On Fri, Jun 26, 2015 at 1:23 PM, Davin McCall wrote:
On 26/06/15 12:03, Davin McCall wrote:
> ... The stored value of 'n' is not accessed by any other type than the
> type of
Erik Faye-Lund writes:
> On Fri, Jun 26, 2015 at 4:16 PM, Davin McCall wrote:
>> On 26/06/15 14:53, Erik Faye-Lund wrote:
>>>
>>> On Fri, Jun 26, 2015 at 3:05 PM, Davin McCall wrote:
On 26/06/15 12:55, Erik Faye-Lund wrote:
On Fri, Jun 26, 2015 at 1:23 PM, Davin McCall wrot
y we want now.
>
> 08: New. It's just moving code around so it should be trivial.
>
> 09: New. This is a complete replacement of patch 07 from the previous
> series.
>
> Cc: Topi Pohjolainen
> Cc: Iago Toral Quiroga
> Cc: Francisco Jerez
> Cc: Neil Ro
Erik Faye-Lund writes:
> On Fri, Jun 26, 2015 at 4:53 PM, Francisco Jerez
> wrote:
>> Erik Faye-Lund writes:
>>
>>> On Fri, Jun 26, 2015 at 4:16 PM, Davin McCall wrote:
>>>> On 26/06/15 14:53, Erik Faye-Lund wrote:
>>>>>
>
Erik Faye-Lund writes:
> On Fri, Jun 26, 2015 at 4:01 PM, Francisco Jerez
> wrote:
>> Davin McCall writes:
>>
>>> On 26/06/15 14:31, Eirik Byrkjeflot Anonsen wrote:
>>>> Erik Faye-Lund writes:
>>>>
>>>>> On Fri, Jun 26, 201
delta * MAX2(reg.width * reg.stride, 1) *
> + delta * bld.dispatch_width() * reg.stride *
Er... This doesn't look right for stride == 0. If you keep the
MAX2(.., 1) expression this patch is:
Reviewed-by: Francisco Jerez
> type_sz(r
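The stride == 0 concern can be illustrated with a minimal sketch. It assumes the expression computes the byte offset of the delta-th component of a register region; `reg_region`, `offset_unclamped` and `offset_clamped` are hypothetical names for illustration and not the actual i965 code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a register region (not the real fs_reg).
struct reg_region {
    unsigned stride;     // elements between consecutive channels, 0 for a scalar
    unsigned type_size;  // bytes per element
};

// Offset computed from width * stride alone: it collapses to zero when
// stride == 0, so every component of a scalar region lands on the first one.
unsigned offset_unclamped(const reg_region &r, unsigned width, unsigned delta)
{
    assert(r.type_size > 0);
    return delta * width * r.stride * r.type_size;
}

// Offset with the product clamped to at least one element, in the spirit of
// the MAX2(.., 1) expression: a scalar region still advances per component.
unsigned offset_clamped(const reg_region &r, unsigned width, unsigned delta)
{
    assert(r.type_size > 0);
    return delta * std::max(width * r.stride, 1u) * r.type_size;
}
```

For a region with stride 1 both variants agree; they only differ for stride 0, which is the case the review comment is about.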
g to a kernel's
>> resource usage, but that's a possible optimization for the future.
>
> Ping?
>
> This is rather simple, but I'd like an Rb, if possible. That also goes
> for the Gallium support patch.
>
For this patch:
Reviewed-by: Francisco Jerez
Tha
Grigori Goronzy writes:
> On 2015-06-09 22:52, Francisco Jerez wrote:
>>> +
>>> + if (blocking)
>>> + hev().wait();
>>> +
>>
>> hard_event::wait() may fail, so this should probably be done before the
>> ret_object() call to a
Jason Ekstrand writes:
> On Fri, Jun 26, 2015 at 8:52 AM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> Reviewed-by: Topi Pohjolainen
>>> ---
>>> src/mesa/drivers/dri/i965/brw_fs.h | 2 +-
>>> 1 file changed, 1 insertion(+
Jason Ekstrand writes:
> In C, if you partially initialize a structure, the rest of the struct gets
> set to 0. C++, however, does not have this rule so GCC throws warnings
> whenever NIR_SRC_INIT or NIR_DEST_INIT is used in C++.
I don't think that's right, in C++ initializers missing from an
ag
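For reference, a small sketch of the C++ behaviour under discussion: in aggregate initialization, members without an explicit initializer are value-initialized, which for scalar members means zero. `example_src` is a hypothetical struct for illustration, not the real NIR type. GCC may still warn about the omitted members when -Wmissing-field-initializers is enabled, even though the result is well defined.

```cpp
#include <cassert>

// Hypothetical aggregate standing in for a NIR source descriptor.
struct example_src {
    const void *ssa;
    int         base_offset;
    bool        is_ssa;
};

// Only the first member gets an explicit initializer. The remaining
// members are value-initialized, i.e. set to 0 and false respectively.
example_src make_partial()
{
    example_src s = { nullptr };
    assert(s.base_offset == 0 && !s.is_ssa);
    return s;
}
```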
Jason Ekstrand writes:
> On Fri, Jun 26, 2015 at 12:08 PM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> In C, if you partially initialize a structure, the rest of the struct gets
>>> set to 0. C++, however, does not have this rule so GCC thro
Jason Ekstrand writes:
> On Fri, Jun 26, 2015 at 3:03 PM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Fri, Jun 26, 2015 at 12:08 PM, Francisco Jerez
>>> wrote:
>>>> Jason Ekstrand writes:
>>>>
>>>>>
Jason Ekstrand writes:
> On Fri, Jun 26, 2015 at 3:34 PM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Fri, Jun 26, 2015 at 3:03 PM, Francisco Jerez
>>> wrote:
>>>> Jason Ekstrand writes:
>>>>
>>>
Grigori Goronzy writes:
> We need this to implement OpenCL's
> CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
Reviewed-by: Francisco Jerez
Thanks.
> ---
> src/gallium/docs/source/screen.rst | 2 ++
> src/gallium/drivers/ilo/ilo_screen.c | 8 +++
ves/mesa-dev/2015-June/086049.html
>
> All patches applied on master:
> http://cgit.freedesktop.org/~tpalli/mesa/log/?h=unroll_loops
>
Looks good to me, for the series:
Reviewed-by: Francisco Jerez
> Thanks;
>
> Tapani Pälli (3):
> i965: use EmitNoIndirectSampler for ge
Davin McCall writes:
> On 26/06/15 14:53, Francisco Jerez wrote:
>
>> [...]
>>
>> Your first approach seemed quite reasonable IMHO. Were you able to
>> measure any performance regression from it?
>>
>> Thanks.
>>
>
> Wh
Davin McCall writes:
> On 29/06/15 10:40, Francisco Jerez wrote:
>> Davin McCall writes:
>>
>>> On 26/06/15 14:53, Francisco Jerez wrote:
>>>
>>>> [...]
>>>>
>>>> Your first approach seemed quite reasonable IM
Jason Ekstrand writes:
> Reviewed-by: Iago Toral Quiroga
> Reviewed-by: Topi Pohjolainen
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
> 1 file changed, 6 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 589b74c..6
Jason Ekstrand writes:
> Reviewed-by: Topi Pohjolainen
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index d1e253a..4f56865 100644
> -
Jason Ekstrand writes:
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 42
> src/mesa/drivers/dri/i965/brw_fs.h | 2 +-
> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +-
> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 58 +--
> src/mesa/drivers/dri/i
this->pixel_y = vgrf(glsl_type::float_type);
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
From 09f6cb08cd9951d8618dea7360aa7619cc80698
Instead of relying on hardware defaults the i915 kernel driver is
going to program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to overri
Ben Widawsky writes:
> On Tue, Jun 30, 2015 at 11:25:42PM +0300, Francisco Jerez wrote:
>> Instead of relying on hardware defaults the i915 kernel driver is
>> going to program custom MOCS tables system-wide on Gen9 hardware. The
>> "WT" entry previously used fo
Ben Widawsky writes:
> On Wed, Jul 01, 2015 at 12:33:54AM +0300, Francisco Jerez wrote:
>> Ben Widawsky writes:
>>
>> > On Tue, Jun 30, 2015 at 11:25:42PM +0300, Francisco Jerez wrote:
>> >> Instead of relying on hardware defaults the i915 kernel drive
abld.exec_all().group(dispatch_width * 2, 0);
The abld32 name seems misleading because this can actually be a 16 or 32
wide builder depending on dispatch_width. I suggest "dbld" (d for
double), or just expand the definition in its only user and get rid of
the temporary. With that fixed:
Re
Instead of relying on hardware defaults the i915 kernel driver is
going to program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to overri
Follow-up to "i965/gen9: Use custom MOCS entries set up by the
kernel.", sent as a separate patch to make the SKL change easier to
back-port to stable branches.
---
This change depends on Ville's "[PATCH 1/2] i965: House MOCS settings
in brw_context/brw_device_info":
http://lists.freedesktop.org/a
Jason Ekstrand writes:
> On Fri, Jun 26, 2015 at 11:51 AM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Fri, Jun 26, 2015 at 8:52 AM, Francisco Jerez
>>> wrote:
>>>> Jason Ekstrand writes:
>>>>
>>>>>
Samuel Iglesias Gonsálvez writes:
> On 29/06/15 09:11, Jordan Justen wrote:
>> On 2015-06-24 07:36:24, Iago Toral wrote:
>>> On Wed, 2015-06-24 at 15:43 +0300, Francisco Jerez wrote:
>>>> AFAICT the reason why this (and many of the other changes in GLSL
>>
Neil Roberts writes:
> There was a comment saying that in SIMD16 mode the pixel interpolator
> returns coords interleaved 8 channels at a time and that this requires
> extra work to support. However, this interleaved format is exactly
> what the PLN instruction requires so I don't think anything
Hi EdB, a bunch of comments inline,
EdB writes:
> ---
> src/gallium/state_trackers/clover/api/program.cpp | 6 +-
> .../state_trackers/clover/core/compiler.hpp| 7 +-
> src/gallium/state_trackers/clover/core/error.hpp | 21 ++
> src/gallium/state_trackers/clover/core/program.cpp
How about "[...] from an intrusive reference to a Clover object
[...]"?
With that fixed:
Reviewed-by: Francisco Jerez
> + template
> + typename T::descriptor_type *
> + ret_object(const intrusive_ref &v) {
> + v().retain();
> + return desc
EdB writes:
> On Sunday 05 July 2015 18:15:33 Francisco Jerez wrote:
>>[...]
>> > --- a/src/gallium/state_trackers/clover/core/error.hpp
>> > +++ b/src/gallium/state_trackers/clover/core/error.hpp
>> > @@ -68,10 +68,31 @@ namespace clover {
>> >
Hi Matt,
Matt Turner writes:
> On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez wrote:
>> Heh, I happened to come across this comment yesterday while looking for
>> the remaining no16 calls and wondered why on earth it couldn't do the
>> same that the normal interpolat
ribed in my reply to v1, it would be acceptable to
implement it for the time being using a workaround similar to
llvm/invocation.cpp:433 -- Hint: you'll need new
module::argument::semantic enums.
Thanks.
> On Wed, Jun 24, 2015 at 2:48 PM, Francisco Jerez
> wrote:
>> Zoltan Gilian
The hardware docs don't mention explicitly what these fields should
be, but I've verified experimentally on ILK that using a GRF as
destination causes the register to be corrupted when the execution
size of an ENDIF instruction is higher than 8 -- and because the
destination we were using was g0, e
From the hardware docs for the DO instruction:
"Execution size is ignored for this instruction."
My observation on ILK hardware contradicts the spec, though: channels
over the execution size of a DO instruction won't enter the loop, and
channels over the execution size of a WHILE instruction will
This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec. The documentation for most control flow
instructions mentions among other restrictions:
"Instruction compression is not allowed."
Thi
Matt Turner writes:
> On Sun, Jul 5, 2015 at 4:45 PM, Francisco Jerez wrote:
>> Hi Matt,
>>
>> Matt Turner writes:
>>
>>> On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez
>>> wrote:
>>>> Heh, I happened to come across this comment yest
ipe_loader_sw_probe_xlib) to using loader_open_device() over
>> open(), with the former caring about CLOEXEC.
>>
> Francisco, Tom,
>
> Can you guys please take a look at the series. Even an Ack would be
> greatly appreciated.
>
Looks OK to me, assuming that Tom is OK with th
Instead of relying on hardware defaults the i915 kernel driver is
going to program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to overri
Ben Widawsky writes:
> On Tue, Jul 07, 2015 at 10:21:28PM +0300, Francisco Jerez wrote:
>> Instead of relying on hardware defaults the i915 kernel driver is
>> going to program custom MOCS tables system-wide on Gen9 hardware. The
>> "WT" entry previously used fo
We were passing src0 alpha and oMask in reverse order. There seems to
be no good way to pass them in the correct order to the new-style
LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while
oMask is not. Just split src0 alpha in fixed-width registers and pass
them to LOAD_PAYLOAD
Aside from the trivial GRF underallocation problem in the
"devinfo->gen < 6 && is_rect" if-block, the texrect scale uniform
look-up code was assuming a one-to-one mapping between UNIFORM
register indices and the param array, which only holds during the
SIMD8 run.
It seems dubious that this needs t
This gets rid of two no16() fall-backs and should allow better
scheduling of the generated IR. There are no uses of usubBorrow() or
uaddCarry() in shader-db so no changes are expected. However the
"arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
"arb_gpu_shader5/execution/built-in
pproach tomorrow.
> On Thu, Jul 9, 2015 at 3:51 PM, Francisco Jerez wrote:
>> This gets rid of two no16() fall-backs and should allow better
>> scheduling of the generated IR. There are no uses of usubBorrow() or
>> uaddCarry() in shader-db so no changes are expected. Ho
Jason Ekstrand writes:
> On Jul 9, 2015 7:57 AM, "Francisco Jerez" wrote:
>>
>> We were passing src0 alpha and oMask in reverse order. There seems to
>> be no good way to pass them in the correct order to the new-style
>> LOAD_PAYLOAD (how surprising) be
Jason Ekstrand writes:
> On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez
> wrote:
>> Jason Ekstrand writes:
>>
>>> On Jul 9, 2015 7:57 AM, "Francisco Jerez" wrote:
>>>>
>>>> We were passing src0 alpha and oMask in reverse o
Booleans are represented as 0/-1 on modern hardware which means we can
just negate them to convert them into a numeric type. Negation has
the benefit that it can be implemented using a source modifier which
can easily be propagated into some other instruction. shader-db
results on HSW:
total in
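The negation trick can be sketched on the CPU side as follows. This assumes the 0/-1 boolean encoding described above; `hw_bool`, `b2i` and `b2f` are hypothetical helper names, and the float case assumes the negated integer is then converted to the destination type.

```cpp
#include <cassert>
#include <cstdint>

// Encode a C++ bool the way the message describes: 0 for false,
// -1 (all bits set) for true.
int32_t hw_bool(bool b)
{
    return b ? -1 : 0;
}

// Boolean to integer: negating 0/-1 yields 0/1.
int32_t b2i(int32_t b)
{
    assert(b == 0 || b == -1);
    return -b;
}

// Boolean to float: negate, then convert to the destination type.
float b2f(int32_t b)
{
    assert(b == 0 || b == -1);
    return static_cast<float>(-b);
}
```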
This gets rid of two no16() fall-backs and should allow better
scheduling of the generated IR. There are no uses of usubBorrow() or
uaddCarry() in shader-db so no changes are expected. However the
"arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
"arb_gpu_shader5/execution/built-in
Zoltan Gilian writes:
> Read-only and write-only image arguments are recognized and
> distinguished.
> Attributes of the image arguments are passed to the kernel as implicit
> arguments.
Thanks, this looks much better. One thing that still seems kind of
unfortunate is the fact that you've added
Matt Turner writes:
> On Fri, Jul 10, 2015 at 10:06 AM, Francisco Jerez
> wrote:
>> Booleans are represented as 0/-1 on modern hardware which means we can
>> just negate them to convert them into a numeric type. Negation has
>> the benefit that it can be implemented
eads.
>
> v2:
> - Don't use wrapper for pipe_screen.
>
> CC: 10.6
Thanks, this patch is:
Reviewed-by: Francisco Jerez
> ---
> src/gallium/state_trackers/clover/core/queue.cpp | 2 ++
> src/gallium/targets/opencl/Makefile.am | 4 +++-
> 2 files changed, 5 i
Tom Stellard writes:
> pipe_context::flush() can return a NULL fence if the queue is already
> empty, so we should not assume that an event with a NULL fence
> has the status of CL_QUEUED.
>
This seems suspicious... On the one hand it doesn't seem to be a
documented "feature" of pipe_context::f
Jason Ekstrand writes:
> Now that the old GLSL IR visitor code is gone, having the remap is silly.
> ---
> src/mesa/drivers/dri/i965/brw_fs.h | 12 +--
> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 18 +---
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135
> ++
> }
>
> bool
> --
> 2.4.3
>
From f09181eadd3ff1cd10f1afeee13e6c4bb86caa91 Mon Sep 17 00:00:00 2001
Fr
> input buffer) objectionable? Do you have any suggestions on how to
> overcome this problem, so the metadata could be passed interleaved?
>
> On Fri, Jul 10, 2015 at 8:08 PM, Francisco Jerez
> wrote:
>> Zoltan Gilian writes:
>>
>>> Read-only and write-only image a
Connor Abbott writes:
> On Tue, Jul 14, 2015 at 6:02 AM, Francisco Jerez
> wrote:
>> Connor Abbott writes:
>>
>>> sources with file == HW_REG get all their information from the
>>> fixed_hw_reg field, so we need to get the stride and type from there
>>
When the width field was removed from fs_reg the BROADCAST handling
code in opt_algebraic() started to miss a number of trivial
optimization cases, resulting in the ugly indirect-addressing sequence
being emitted unnecessarily for some variable-indexed texturing and
UBO loads regardless of one of th
This is essentially the same problem fixed in an earlier patch for
immediates. Setting the stride to zero will be particularly useful
for my future SIMD lowering pass, because we will be able to just
check whether the stride of a source register is zero and skip
emitting the copies required to unz
This fixes essentially the same problem as for immediates. Registers
of the UNIFORM file are typically accessed according to the formula:
read_uniform(r, channel_index, array_index) =
read_element(r, channel_index * 0 + array_index * 1)
Which matches the general direct addressing formula fo
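A minimal sketch of the read_uniform formula quoted above, modelling the register file as a flat array. The `std::vector` storage and the bounds check are assumptions made for illustration; only the index expression is taken from the message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat register file: one float per element.
float read_element(const std::vector<float> &file, unsigned index)
{
    assert(index < file.size());
    return file[index];
}

// The channel index is multiplied by zero, so every channel reads the
// same element; only the array index selects a different location.
float read_uniform(const std::vector<float> &file,
                   unsigned channel_index, unsigned array_index)
{
    return read_element(file, channel_index * 0 + array_index * 1);
}
```

Reading the same array index from any channel returns the same value, which is what a stride of zero expresses.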
This pass will house ad-hoc lowering code for several send
message-like virtual opcodes that will represent their logically
independent arguments as separate instruction sources rather than as a
single payload blob. This pass will basically just take the separate
arguments that are supposed to be
And start using it in fs_builder::LOAD_PAYLOAD(). This will be used
to emit logical send message opcodes which have an unusually large
number of arguments.
---
src/mesa/drivers/dri/i965/brw_fs_builder.h | 15 ---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/mesa/
This lowering pass implements an algorithm to expand SIMDN
instructions into a sequence of SIMDM instructions in cases where the
hardware doesn't support the original execution size natively for some
particular instruction. The most important use-cases are:
- Lowering send message instructions t
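The core idea of expanding one SIMDN instruction into several SIMDM instructions can be sketched as a split of the channel range. This is only an illustration under the assumption that N is a multiple of M; `simd_group` and `split_simd` are hypothetical names, and the sketch ignores how sources and destinations are copied between the original and the lowered instructions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one lowered instruction: which channels
// of the original instruction it covers.
struct simd_group {
    unsigned first_channel;
    unsigned width;
};

// Split an n-wide instruction into n / m groups of m channels each.
std::vector<simd_group> split_simd(unsigned n, unsigned m)
{
    assert(m > 0 && n % m == 0);
    std::vector<simd_group> groups;
    for (unsigned first = 0; first < n; first += m)
        groups.push_back({first, m});
    return groups;
}
```

For example, a SIMD16 instruction lowered to SIMD8 yields two groups covering channels 0 to 7 and 8 to 15.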
Typically BAD_FILE sources are used to mark a source as not present,
which implies that no registers are read. This will become much more
frequent with logical send opcodes which have a large number of
sources, many of them optionally used and marked as BAD_FILE when they
aren't applicable. It will
In cases where the color0 argument wasn't being provided,
emit_single_fb_writes() would take the alpha channel directly from the
visitor state instead of taking it from its arguments. This sort of
hack didn't fit nicely into the logical send-message approach because
all parameters of the instructi
---
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 --
src/mesa/drivers/dri/i965/brw_wm.c | 3 ++-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 08d9abf..c489010 100644
-
We were previously guessing the half based on the EOT flag which seems
rather gross.
---
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
b/src/mesa/drivers/dri/i965/brw_fs_generat
The logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects the arguments
that make up the payload separately as individual sources, like:
fb_write_logical null, color0, color1, src0_alpha,
src_depth, dst_depth,
It's surprising that we weren't checking for this already. A future
patch will cause code like the following to be emitted:
MOV(16) tmp<1>:uw, src
MOV(8) dst<1>:ud, tmp<8,8,1>:ud
The second MOV comes from the expansion of a LOAD_PAYLOAD header copy,
so I don't have control over its types. Cop
This is now unused.
---
src/mesa/drivers/dri/i965/brw_defines.h| 1 -
src/mesa/drivers/dri/i965/brw_fs.h | 4 ---
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 35 --
src/mesa/drivers/dri/i965/brw_shader.cpp | 2 --
4 files changed, 42 deleti
This shouldn't have any effect because we don't emit logical
framebuffer writes yet.
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index ae050b7..70fdc5e 100644
---
There's no need to initialize the wrong half of oMask in the payload
when we're doing an 8-wide framebuffer write because it will be
ignored by the hardware anyway. By doing it this way we can let the
SIMD lowering pass split the sample_mask source as a regular
per-channel source, otherwise we wou
This does essentially the same thing as
fs_visitor::emit_single_fb_write(), with some slight differences:
- We don't have to worry about exec_size and use_2nd_half anymore,
16-wide sources have already been lowered to 8-wide thanks to the
previous commit and the manual argument unzipping is
Flatten the if ladder to match the way that the ordering of these
fields is specified in the hardware documentation a bit more closely.
---
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 28
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/src/mesa/drive
The only non-trivial thing it still has to do is figure out where to
take the src/dst depth values from and predicate the instruction if
discard is in use. The manual SIMD unrolling logic in the dual-source
case goes away because this is now handled transparently by the SIMD
lowering pass.
---
sr
And update the comment.
---
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 20 +++-
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index c489010..b5a42b1 100644
--- a/src/me
Samuel Iglesias Gonsálvez writes:
> On 16/07/15 17:33, Francisco Jerez wrote:
>> When the width field was removed from fs_reg the BROADCAST handling
>> code in opt_algebraic() started to miss a number of trivial
>> optimization cases resulting in the ugly indirect-address
Samuel Iglesias Gonsálvez writes:
> On Fri, 2015-07-17 at 16:33 +0300, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez writes:
>>
>> > On 16/07/15 17:33, Francisco Jerez wrote:
>> >> When the width field was removed from fs_reg the BROADCAST handling
>
Michel Dänzer writes:
> On 17.07.2015 06:03, Marek Olšák wrote:
>> From: Marek Olšák
>>
>> An alternative (and ugly) solution to the current clover issue.
>
> How about something like this instead? (Compile tested only)
>
I'm rather unfamiliar with the radeonsi pipe driver code so I should
pro
dispatch_width is global for a single compilation and doesn't
necessarily match the desired execution width if we had to lower the
original full-width instruction due to hardware limitations. These
were all inside a Gen4-specific branch so this patch shouldn't have
any effect on more recent hardwa
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6afb9fe..c31a0e1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/b
This should be largely equivalent to emit_texture_gen5() except for
slight code style changes and the use of i965 opcodes instead of the
ir_texture_opcode enum, see "i965/fs: Implement lowering of logical
texturing opcodes on Gen7+." for the mapping between them.
---
src/mesa/drivers/dri/i965/brw_fs.c
This should match the set of cases in which we currently call fail()
or no16() from the emit_texture_*() methods and the ones in which
emit_texture_gen4() enables the SIMD16 workaround.
Hint for reviewers: It's not a big deal if I happen to have missed
some case here, it will just lead to an asser
These weren't being handled by emit_texture_gen7() but we can easily
lower them here for consistency with other texturing opcodes.
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 16 +++-
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
b/
Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects the arguments
separately as individual sources, like:
tex_logical dst, coordinates, shadow_c, lod, lod2,
sample_index, mcs, sampler, offset,
This should be largely equivalent to emit_texture_gen7() except that
we now get i965 sampling opcodes directly rather than
ir_texture_opcode enum values. The mapping is as follows:
- ir_tex -> SHADER_OPCODE_TEX
- ir_txb -> FS_OPCODE_TXB
- ir_txl -> SHADER_OPCODE_TXL
- ir_txd -> SHADER_OPCODE_
So that it's left uninitialized by LOAD_PAYLOAD, we only need to
reserve space for it in the message since it will be initialized
implicitly by the generator.
---
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/mesa/driver
Unlike its Gen5 and Gen7 counterparts this patch isn't a plain
refactor of the previous Gen4 texturing code, it's more of a rewrite
largely based on emit_texture_gen4_simd16(). The reason is that on
the one hand the original emit_texture_gen4() code didn't seem easily
fixable to be SIMD width-inva
---
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 66 +---
1 file changed, 49 insertions(+), 17 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 89fcc49..4011639 100644
--- a/src/mesa/drivers/dri/