I just realised I made regular cube map textures stop working via the
blit path with this patch. Here is a v2 which just adds
GL_TEXTURE_CUBE_MAP to the switch in intel_try_pbo_upload. I've tested
that it still works with a hacky tweak to the piglit test case.
--- >8 --- (use git a
er path for Jason if he's already working on that.
Regards,
- Neil
Chris Forbes writes:
> Are there some performance numbers to go with this?
>
> On Tue, Dec 23, 2014 at 12:08 PM, Neil Roberts wrote:
>> Here are some patches to make the i965 driver use the blit pipeline
>>
This patch looks really good. I have some comments below.
Jason Ekstrand writes:
> This meta path, designed for use with PBO's, creates a temporary texture
> out of the PBO and uses BlitFramebuffers to do the actual texture upload.
> ---
> src/mesa/Makefile.sources | 1 +
>
Daniel Vetter writes:
> Oh, I guess my earlier mail was too late. One issue still is picking
> the numbers, since you seem to assume here that ver >= 2 means the
> stuff actually works. But like Ken said the cmd parser in upstream
> isn't really enabled yet.
The patch only enables the predicate
Jason Ekstrand writes:
> This improves texture upload performance on the PBO upload test available
> at http://www.songho.ca/opengl/gl_pbo.html by 80% for the non-PBO case (due
> to avoiding a buffer stall) and 500% for the PBO case.
Just for reference, if I run this branch against the little te
u change that it looks good to me.
Reviewed-by: Neil Roberts
Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
This patch and the rest of the series (apart from the comment for patch
4) look good to me and are
Reviewed-by: Neil Roberts
- Neil
When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:
x * 32 / 256
This works as if it's taking a value from a range where 256
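To make the 8-to-5-bit example concrete, here is a minimal sketch comparing the shift with one common alternative: rescaling by the maximum representable values and rounding to nearest. The function names are hypothetical and this is not necessarily the exact expression used in the patch; the 64-bit intermediates are an assumption made so that widths up to 32 bits do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating conversion: for 8 -> 5 bits this is x * 32 / 256.
 * (Hypothetical helper, for illustration only.) */
static uint32_t unorm_shift(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
    assert(dst_bits >= 1 && dst_bits <= src_bits && src_bits <= 32);
    return x >> (src_bits - dst_bits);
}

/* Rescale by the maximum values of each width and round to nearest.
 * 64-bit intermediates keep the multiply from overflowing at 32 bits.
 * (Hypothetical helper, assumed rather than taken from the patch.) */
static uint32_t unorm_rescale(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
    assert(dst_bits >= 1 && dst_bits <= src_bits && src_bits <= 32);
    uint64_t src_max = (UINT64_C(1) << src_bits) - 1;
    uint64_t dst_max = (UINT64_C(1) << dst_bits) - 1;
    return (uint32_t)((x * dst_max + src_max / 2) / src_max);
}
```

With these definitions an 8-bit input of 5 becomes 0 when shifted but 1 when rescaled, since 5 / 255 * 31 is about 0.61.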
Neil Roberts writes:
> + assert(src_bits + dst_bits <= sizeof(x) * 8);
Erm, actually I didn't realise there were places calling this with
dst_bits set to 32, so this isn't going to work. I probably should have
waited for Piglit to finish before sending the patch
Jason Ekstrand writes:
> This looks fine to me. We should probably also do this for snorm formats.
> I don't care if that's part of this or in a separate patch.
> --Jason
The snorm formats are a bit more fiddly because the hardware doesn't
quite seem to be doing what I'd expect. For example, wh
) = (x >> n) + (x >> 2*n) + ...
>
> See also
>
> http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/gallivm/lp_bld_arit.c#n851
>
> Jose
>
> (*) it can be expanded as shifts too, but it wouldn't be worthwhile
>
>
> ___
Marius, the ‘Reviewed-by’ tag should only be added if someone explicitly
replies to your patch and says that you can add it with their name. It's
supposed to mean that the person is happy for the patch to be pushed to
master. I did not do this, I only looked at a previous version of the
patch brief
Hi,
The COPY_CLEAN_4V_TYPE_AS_FLOAT still doesn't look right because as the
last step it calls COPY_SZ_4V which will copy its float arguments using
floating-point registers. It seems the piglit test case is still failing
and if I step through with GDB I can see that it is hitting this code
and usi
Jason Ekstrand writes:
> ---
> src/mesa/main/dd.h | 15 +++
> 1 file changed, 15 insertions(+)
>
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index 2f40915..eb30847 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -415,6 +415,21 @@ struct dd_function_tabl
Jason Ekstrand writes:
> Since the meta path can do strictly more than the blitter path, we just
> remove the blitter path entirely.
> ---
> src/mesa/drivers/dri/i965/intel_pixel_read.c | 130
> ++-
> 1 file changed, 6 insertions(+), 124 deletions(-)
>
> diff --git a/src
Jason Ekstrand writes:
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index 688a919..a64a5f4 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -172,15 +58,11 @@ int
Jason Ekstrand writes:
> - }
> + if (_mesa_meta_pbo_GetTexSubImage(ctx, 3, texImage, 0, 0, 0,
> + texImage->Width, texImage->Height,
> + texImage->Depth, format, type,
> + pixels, &ctx-
This series looks really good to me. I can confirm it gives a 241%
transfer rate increase in that little pboUnpack test on BayTrail.
Assuming the minor comments I made are fixed and the v2 patch for the
pthread_once thingy is used then the series is:
Reviewed-by: Neil Roberts
Regards,
- Neil
any data, it would
just have the height wrong in the sampler state.
Reviewed-by: Neil Roberts
Regards,
- Neil
It's legal to call glTexSubImage with zero values for the width,
height or depth. Previously this was breaking the PBO access
validation because it tries to work out the last pixel accessed by
getting the pixel at height-1 and depth-1 which would end up with
bogus values.
This was causing GL error
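The zero-size case described above can be illustrated with a small bounds check. This is only a sketch under simplifying assumptions (tightly packed rows, no skip or alignment parameters, no overflow handling), and `access_fits` is a hypothetical helper, not the Mesa function being patched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does an access of width x height x depth pixels,
 * starting at `offset`, stay inside a buffer of `buf_size` bytes?
 * Assumes tightly packed rows and ignores integer overflow. */
static bool access_fits(size_t buf_size, size_t offset, size_t bytes_per_pixel,
                        size_t width, size_t height, size_t depth)
{
    assert(bytes_per_pixel > 0);

    /* An empty region touches no memory, so it is always valid. Without
     * this check, height - 1 or depth - 1 below would wrap around. */
    if (width == 0 || height == 0 || depth == 0)
        return true;

    size_t row = width * bytes_per_pixel;
    size_t image = row * height;

    /* One past the last byte of the last pixel, which sits in row
     * height - 1 of image depth - 1. */
    size_t end = offset + (depth - 1) * image + (height - 1) * row + row;
    return end <= buf_size;
}
```

Returning early for any zero dimension keeps the "last pixel" arithmetic from ever seeing a value of -1.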
Ian Romanick writes:
> It seems like it should be handled in the core, and it looks like
> _mesa_tex_sub_image is already doing that. Note the "if (width > 0 &&
> height > 0 && depth > 0)" check. What is the callstack that gets here
> with height or depth as zero? That seems fishy.
This funct
Ilia Mirkin writes:
>> - end = _mesa_image_offset(dimensions, pack, width, height,
>> - format, type, depth-1, height-1, width);
>> + if (depth == 0 || height == 0)
>
> Why not width == 0 as well? You could probably just do
>
> return GL_TRUE;
>
> in that case a
Jason Ekstrand writes:
> We can probably just bail higher up in the stack and never call the
> driver hook if we have a zero dimension. That would also protect us
> from silly zero-dim bugs that may exist.
Yes, that already does happen. As mentioned elsewhere in the thread,
_mesa_validate_pbo_ac
Makes sense.
Is this the only use of the currentTexUnitSave variable? Could be good
to remove it if so.
Reviewed-by: Neil Roberts
- Neil
Ian Romanick writes:
> From: Ian Romanick
>
> We may have been called from glGenerateTextureMipmap with CurrentUnit
> still set to 0, so w
I ran the series through Piglit but there are some issues.
The texelFetch test appears to be broken for sample counts > 10 and
needs this patch to work:
http://patchwork.freedesktop.org/patch/59485/
The accuracy tests are failing but I think the problem is just that it
is too strict. I've writte
From: Kenneth Graunke
Signed-off-by: Kenneth Graunke
Reviewed-by: Neil Roberts
---
src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
b/src/mesa/drivers/dri/i965
This is the standard pattern used by the other 3D graphics API.
BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.
The comment above for the 8x sample posi
The maximum message length for a send message is 11. Some of the
sampler message types have more than 5 arguments which means when they
are doubled to accommodate the SIMD16 register size then the message is
too long. This is important for the ld2dms_w message which will be
used in a later patch bec
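The arithmetic behind the length limit can be sketched as follows. The helper names are hypothetical, and the model assumes that each argument takes one register in SIMD8 and two in SIMD16, with an optional header adding one more register; it is not the driver's actual lowering code.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum send message length in registers, as stated above. */
#define MAX_SEND_MESSAGE_LENGTH 11u

/* Hypothetical model: registers needed for a sampler message. */
static unsigned message_length(unsigned num_args, unsigned simd_width,
                               bool has_header)
{
    assert(simd_width == 8 || simd_width == 16);
    return (has_header ? 1u : 0u) + num_args * (simd_width / 8);
}

/* Widest execution size whose message still fits within the limit. */
static unsigned widest_simd_width(unsigned num_args, bool has_header)
{
    return message_length(num_args, 16, has_header) <= MAX_SEND_MESSAGE_LENGTH
               ? 16u : 8u;
}
```

Under this model five arguments need 10 registers in SIMD16 and still fit, while six need 12 and have to be issued at SIMD8 width.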
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. This patch makes it allocate a
register that is twice as big for the MCS data and then always use
the wider version.
---
src/mesa/drivers/dri/i965/brw_defines.h| 4
src/mesa/
I'm not too sure about the expression used to index into sample_map in
the shader. It looks like if fract(coord.x) and fract(coord.y) are
close to 1.0 then it would index outside of the array. However the
code for 4 and 8 has the same problem and the results seem to look
reasonable. It might make
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.
---
src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 22 +-
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/src/mesa/drivers/dri/
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwh
In order to accommodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
b/src/mesa/drivers/dri/i965/brw_fs.cpp
ind
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.
---
src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.
---
src/mesa/drivers/dri/i965/brw_context.c | 6 ++
src/mesa/drivers/dri/i965/intel_screen.c | 5 -
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_context.c
b/src/mesa/drivers/dri/i965/brw_context.c
index 7c1c133..c05fb74 100644
--- a/src/mes
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp
When 16 samples are used the MCS buffer needs 64 bits per pixel.
---
src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 0cb0632..9faafb4 100644
Francisco Jerez writes:
> NAK, these cases are already handled without disabling SIMD16 by
> lowering the SIMD16 message into SIMD8 halves. You just need to add a
> case to get_lowered_simd_width() so that the SIMD lowering pass knows
> what the maximum execution size is for your new sampler mess
Ben Widawsky writes:
>>} else if (op == ir_txf_ms) {
>> emit(MOV(dst_reg(MRF, param_base + 1, sample_index.type,
>> WRITEMASK_X),
>>sample_index));
>> - if (devinfo->gen >= 7) {
>> + if (opcode == SHADER_OPCODE_TXF_CMS_W) {
>> +/*
Ben Widawsky writes:
> On Thu, Sep 17, 2015 at 05:00:08PM +0100, Neil Roberts wrote:
>> When 16x MSAA is used for sampling with texelFetch the compiler needs
>> to use a different instruction which passes more arguments for the MCS
>> data. Previously on skl+ it was uncon
Ben Widawsky writes:
>> + /* On Gen9+ we'll use lcd2ms_w instead which has two registers for
>> + * the MCS data.
>> + */
>> + if (op == SHADER_OPCODE_TXF_CMS_W) {
>> +bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD),
>> +mcs.
Ben Widawsky writes:
> Hmm. As I read it, it sounded like you didn't have to send LOD it's
> implied to be 0 if you don't send it. If I am wrong about that, then I
> agree with you completely.
I'm a bit lost. You're right that it's not necessary to send the LOD
when it's zero. In fact Mesa never
v2: Fix the x_scale in the shader. Remove the doubts in the commit
message.
---
After some helpful explanation from Anuj and reading the code a bit
more, I think I understand this a bit better and I no longer think
there is an issue with the sample map array having out-of-bounds
indices. The t
Anuj Phogat writes:
> As per docs we're supposed to get the per slot SampleID written to
> 15:0 bits in R1.0. I used SSPI to compute the SampleID because I never
> got anything useful in these bits on IVB. Things might have changed on
> later platforms. So, I think it's worth trying to do what do
Neil Roberts writes:
> The following tests are failing but on my SKL device the corresponding
> tests with 8 samples are also failing. As far as I understand these
> aren't known regressions for other people so it may be something to do
> with my device being pre-production. It
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to roundin
Ilia Mirkin writes:
> A couple of fairly generic comments:
>
> - It is not at all clear to me why it's OK to interpolate at sample 0
Yes, this was cheating a little bit. At least on Intel hardware the
samples are supposed to be sorted by order of distance from the centre
so sample 0 will be the
I think this implementation will have problems if the string being
copied is not null terminated. It's not clear from the man pages whether
that is an allowed way to use the function but a quick Google shows up a
few similar patches where they have later been fixed by using strnlen.
It looks like s
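A bounded copy that never reads past the given maximum might look like the sketch below. It uses `memchr` to get the same result as `strnlen` without depending on POSIX, and `malloc` stands in for the ralloc allocator; `bounded_len` and `dup_n` are hypothetical names, not the ralloc functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of s, looking at no more than max bytes (same result as
 * POSIX strnlen). Safe when s is not null terminated. */
static size_t bounded_len(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}

/* Hypothetical strndup-style helper: copies at most max bytes and
 * always terminates the result. Returns NULL on allocation failure. */
static char *dup_n(const char *s, size_t max)
{
    assert(s != NULL);
    size_t n = bounded_len(s, max);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}
```

Calling `strlen` first would scan past the end of an unterminated buffer, whereas the bounded search stops after `max` bytes.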
Looks good to me. Thanks for doing that.
Reviewed-by: Neil Roberts
- Neil
Samuel Iglesias Gonsalvez writes:
> If the string being copied is not NULL-terminated the result of
> strlen() is undefined.
>
> Signed-off-by: Samuel Iglesias Gonsalvez
> ---
> src/util/ralloc.c
Francisco Jerez writes:
> Sigh, it's really awful that our hardware only supports a single sample
> index for the whole SIMD thread... I was thinking though that there
> might be a better alternative to running the sample-index interpolator
> query in a loop: The "Per Slot Offset" interpolator q
Matt Turner writes:
>> +static fs_reg
>> +get_num_samples_reg(fs_visitor *v)
>> +{
>> + struct gl_program_parameter_list *params = v->prog->Parameters;
>> + static gl_state_index tokens[STATE_LENGTH] = {
>
> I suspect this isn't thread-safe.
Do you mean because the tokens array is static? I
It is possible to directly predicate the WHILE instruction. In this
case there will be a second successor block because the execution can
resume from the instruction after the loop. This will be used in a
subsequent patch.
---
src/mesa/drivers/dri/i965/brw_cfg.cpp | 4
1 file changed, 4 inser
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.
The generator will try to
Previously the name of the nir shader was being freed prematurely
during nir_sweep. Since 756613ed35d the name was later being used to
generate filenames for the optimiser debug output and these would end
up with garbage from the dangling pointer.
---
src/glsl/nir/nir_sweep.c | 3 +++
1 file chang
Oops, I just made a similar patch without noticing this one. Feel free
to take the commit message from my patch if you want. Either way this
one is:
Reviewed-by: Neil Roberts
http://patchwork.freedesktop.org/patch/61369/
Sorry for the noise.
Regards,
- Neil
Jason Ekstrand writes
Seems like a good idea to me. Series is
Reviewed-by: Neil Roberts
- Neil
Chad Versace writes:
> This series lives at
> git://github.com/chadversary/mesa refs/tags/skl-fast-clear-v08.01
>
> No Piglit regressions on:
> - Skylake 0x1912 (rev 06)
> - linux 4.3-rc4
>
Bump. Anyone fancy reviewing this small patch? I think it would be good
to have because it makes the code a bit simpler as well as fixing a
corner case and making it more robust.
- Neil
Neil Roberts writes:
> When programming the fast clear color there was previously a chunk of
> code
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA an
According to the GL 1.4 spec section 3.8.10, a cubemap texture is only
complete if:
• The level base arrays of each of the six texture images making up
the cube map have identical, positive, and square dimensions.
• The level base arrays were each specified with the same internal
format.
• The
The texture mipmap completeness checking code was checking whether all
of the faces have the same size. However this is pointless because the
code just above it checks whether the face has the expected size
calculated for the mipmap level anyway so the error condition could
never be reached. This p
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
---
src/mesa/main/get.c | 4
src/mesa/main/get_hash_params.py | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 04348
Previously the framebuffer default sample count was taken directly
from the value given by the application. On the i965 driver on HSW if
the value wasn't one that is supported by the hardware it would hit an
assert when it tried to program the state for it. This patch fixes it
by adding a derived s
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
---
src/mesa/program/prog_statevars.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/program/prog_statevars.c
b/src/mesa/program/prog_statevars.c
index 12490d0..eed2412 1
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
---
src/mesa/main/get.c | 3 +++
src/mesa/main/get_hash_params.py | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 307059
The opt_sampler_eot optimisation of fs_visitor effectively assumes
that it is running on a fragment shader because it casts the program
key to a brw_wm_prog_key. However on Skylake fs_visitor can also be
used for vertex shaders. It looks like this usually works anyway
because the optimisation is sk
If a send message is emitted with a message length that is less than
required for the message then the remaining parameters default to
zero. We can take advantage of this to save a register when a shader
passes constant zeroes as the final coordinates to the sample
function.
I think this might be
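The idea of dropping trailing zero parameters can be sketched like this. The `coord_arg` type, the `trimmed_length` helper and the `min_len` parameter are all hypothetical and only model the counting, not the compiler's instruction representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one message parameter. */
struct coord_arg {
    bool is_const;  /* true if the value is a compile-time constant */
    float value;    /* only meaningful when is_const is true */
};

/* Number of parameters that actually need to be sent: trailing
 * constant zeroes are dropped because omitted parameters default to
 * zero. min_len is an assumed lower bound the message must keep. */
static int trimmed_length(const struct coord_arg *args, int n, int min_len)
{
    assert(n >= 0 && min_len >= 0);
    while (n > min_len && args[n - 1].is_const && args[n - 1].value == 0.0f)
        n--;
    return n;
}
```

Only zeroes at the end can be removed; a constant zero followed by a non-zero parameter still has to be sent to keep the later parameter in the right slot.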
Sorry for the really long delay in replying! This patch is still needed
in order to fix a number of Piglit tests so it would be good to get it
landed.
Ben Widawsky writes:
> Sorry for the delay, but I put this off initially because I wasn't
> sure which part of the docs this was addressing. I se
Jason Ekstrand writes:
> +#define list_for_each_entry(type, pos, head, member)\
> + for (type *pos = container_of((head)->next, pos, member);\
> + &pos->member != (head); \
> + pos = container_of(pos->member.next, p
Jason Ekstrand writes:
> +static inline bool list_empty(struct list_head *list)
> +{
> + return list->next == list;
> +}
It would be good if list.h also included stdbool.h in order to get the
declaration of bool. However, will that cause problems on MSVC? Is the
Gallium code compiled on MSVC i
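For reference, a self-contained sketch of a kernel-style list with a single sentinel node, including the stdbool.h header that `list_empty` relies on. Only `list_empty` mirrors the quoted patch; `sketch_list_init` and `sketch_list_add_tail` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdbool.h>  /* provides the declaration of bool */

struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

/* Hypothetical helper: an empty list is a sentinel pointing at itself. */
static inline void sketch_list_init(struct list_head *list)
{
    list->prev = list;
    list->next = list;
}

/* Hypothetical helper: link item just before the sentinel. */
static inline void sketch_list_add_tail(struct list_head *item,
                                        struct list_head *list)
{
    assert(item != list);
    item->next = list;
    item->prev = list->prev;
    list->prev->next = item;
    list->prev = item;
}

static inline bool list_empty(const struct list_head *list)
{
    return list->next == list;
}
```

Because the sentinel is part of the ring, the empty check is a single pointer comparison and insertion needs no special case for an empty list.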
Jason Ekstrand writes:
> +static inline void list_validate(struct list_head *list)
> +{
> + assert(list->next->prev == list && list->prev->next == list);
> + for (struct list_head *node = list->next; node != list; node = node->next)
> + assert(node->next->prev == node && node->prev->next
Hi,
This optimisation doesn't seem to work with textureGather so a bunch of
Piglit tests are failing for me. I'm not sure why it didn't get picked
up by your Jenkins run.
I can't find anything in the bspec nor a known workaround to suggest
that this shouldn't work so I'm not really sure what to d
opt_sampler_eot enables a direct write to framebuffer from a sample.
In order to do this the sample message needs to have a message header
so if there wasn't one already then the function adds one. In addition
the function sets the destination register to null because it's no
longer used. However i
Commit 94ee908448 added a header size parameter to the function to
create the LOAD_PAYLOAD instruction. However this broke
opt_sampler_eot which manually constructs the instruction and so
wasn't setting the header_size. This ends up making the parameters for
the send message all have the wrong loca
The opt_sampler_eot optimisation seems to break when the last
instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
this so it causes a lot of regressions. I can't find any documentation
or known workarounds to indicate that this is expected behaviour, but
considering that this is
I thought it might be a good idea to try posting these patches again
since it's been 6 months since they were originally posted. The
patches are a lot more useful now since the command parser in the
kernel is working correctly for Haswell. This means the functionality
is no longer restricted to onl
EMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Aut
In order to detect whether the predicate source registers can be used
in a later patch we will need to know the version number for the
command parser. This patch just adds a member to intel_screen and does
an ioctl to get the version.
Reviewed-by: Kenneth Graunke
---
src/mesa/drivers/dri/i965/in
Kenneth Graunke writes:
> It might be nice to create a brw_load_register_mem64 function, for
> symmetry with brw_store_register_mem64 - we might want to reuse it
> elsewhere someday.
Ok, that sounds sensible.
> One interesting quirk: the two halves of your register write may land
> in two separ
Ian Romanick writes:
>> For what it's worth, I'm strongly in favour of using these
>> kernel-style lists instead of exec_list. The kernel ones seem much
>> less confusing.
>
> Huh? They're practically identical. The only difference is the
> kernel-style lists have a single sentinel node, and that
This is required for the I915_PARAM_REVISION macro. Previously this
define was directly copied into the Mesa source.
---
configure.ac | 2 +-
src/mesa/drivers/dri/i965/intel_screen.c | 5 -
2 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/configure.ac b
Jason Ekstrand writes:
> I think *most* of that code *should* already be there. In theory,
> it's all keyed off of the block size provided by formats.csv.
> However, given some of the rendering errors we're currently seeing, it
> looks like it may need a little patching here and there. :-)
inte
When using SIMD4x2 on Skylake, the sampler instructions need a message
header to select the correct mode. This was added for most sample
instructions in 0ac4c2727 but the TXF_MCS instruction is emitted
separately and it was missed.
This fixes a bunch of Piglit tests which test texelFetch in a geom
atches out there to handle this. Please ignore if
> this has already been sent by someone. (Direct me to it and I will
> review it).
>
> Cc: Matt Turner
> Cc: Neil Roberts
> Cc: Mark Janes
> Signed-off-by: Ben Widawsky
> ---
> src/mesa/drivers/dri/i965/brw_eu_compact.c |
Previously when setting up the sample instruction for an indirect
sampler the vec4 backend was directly passing the pseudo opcode's
src0. However this isn't actually set to a valid register because
instead the MRF registers are used as the source so it would end up
passing null as src0.
This patch
Previously when generating the send instruction for a sample
instruction with an indirect sampler it would use the destination
register as a temporary store. This breaks when used in combination
with the opt_sampler_eot optimisation because that forces the
destination to be null. This patch fixes t
When calculating the binding table index for non-constant sampler
array indexing it needs to add the base binding table index which is a
constant within the generated code. Often this base is zero so we can
avoid a redundant instruction in that case.
It looks like nothing in shader-db is doing non
Many thanks for all the reviews and testing. I've pushed the two
patches.
The remaining sampler_array_indexing tests that fail on SKL (the gs
ones) are because of a separate problem described in this patch:
http://patchwork.freedesktop.org/patch/50676/
I'm not really sure whether that's the clea
Looks good to me.
Reviewed-by: Neil Roberts
- Neil
Anuj Phogat writes:
> Adding Neil to Cc who committed 4ab8d59.
>
> Reviewed-by: Anuj Phogat
querying the block height anyway. The later patch about
combining the two functions would need to be changed too.
Regards,
- Neil
Neil Roberts writes:
> Looks good to me.
>
> Reviewed-by: Neil Roberts
>
> - Neil
>
> Anuj Phogat writes:
>
>> Adding Neil to Cc who
Looks good to me. Thanks for fixing this. I guess I still have more to
learn about the ISA.
However, should we not also fix the vec4 version? With that,
Reviewed-by: Neil Roberts
If we wanted to play safe and avoid the MUL, we could change it to this
and still avoid having a temporary
Matt Turner writes:
> I don't know why I was confused by this patch -- after arriving at the
> same conclusion independently I see that all of the analysis I needed
> was right there.
Yes sorry, I probably didn't explain it very well. Your explanation is a
lot clearer.
> To sum up, vec4_visitor
Both patches look good to me and I can confirm they make the Piglit
tests pass on Skylake.
Reviewed-by: Neil Roberts
My original assumption of the problem was that the implied writes from
the SCRATCH_WRITE instruction aren't taken into account when calculating
the liveness of the regi
Jason Ekstrand writes:
> The only place when the fact that the MRFs are virtual matters is in
> register allocation. Implied MRF writes are taken into account in
> setup_mrf_hack_interference. We figure out what MRFs are used and
> then mark them as conflicting with *all* of the VGRFs. We also
A freshly constructed instruction defaults to having a base_mrf of 0
which means that if nothing disables it it will default to using
send-from-MRF. Previously this didn't matter because the constant load
instructions on Gen7 were ignoring the base_mrf anyway. However in the
next patch the brw_send
Matt Turner writes:
>> I'll have another look at moving it into brw_send_indirect_message.
>
> Thanks. I'm not really sure what the right solution is, so if you
> decide this patch is good as is, that's fine with me.
Here's what the patches would look like if we made
brw_send_indirect_message lo