On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:
> Added on do_untyped_vector_read, that is used on the following
> intrinsics:
>    * nir_intrinsic_load_shared
>    * nir_intrinsic_load_ubo
>    * nir_intrinsic_load_ssbo
>
> v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index 13c16fc912..d2f2e17b70 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -2268,6 +2268,18 @@ do_untyped_vector_read(const fs_builder &bld,
>
>           bld.ADD(read_offset, read_offset, brw_imm_ud(16));
>        }
> +   } else if (type_sz(dest.type) == 2) {
> +      for (unsigned i = 0; i < num_components; i++) {
> +         fs_reg base_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
> +         bld.ADD(base_offset,
> +                 offset_reg,
> +                 brw_imm_ud(i * type_sz(dest.type)));
> +         fs_reg read_reg = emit_byte_scattered_read(bld, surf_index, base_offset,
> +                                                    1 /* dims */,
> +                                                    1,
> +                                                    BRW_PREDICATE_NONE);
> +         bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
> +      }

This looks fine, though I'm not sure byte scattered reads are actually
going to buy us anything over doing a DWORD read and unpacking the
16-bit values from the result (rough sketch of what I mean at the
bottom of this mail).

>    } else {
>       unreachable("Unsupported type");
>    }
> @@ -3911,10 +3923,14 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>        if (const_offset == NULL) {
>           fs_reg base_offset = retype(get_nir_src(instr->src[1]),
>                                       BRW_REGISTER_TYPE_UD);
> -
> -         for (int i = 0; i < instr->num_components; i++)
> -            VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
> -                                       base_offset, i * type_sz(dest.type));
> +         if (type_sz(dest.type) == 2) {
> +            do_untyped_vector_read(bld, dest, surf_index, base_offset,
> +                                   instr->num_components);
> +         } else {
> +            for (int i = 0; i < instr->num_components; i++)
> +               VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
> +                                          base_offset, i * type_sz(dest.type));
> +         }

I don't like lumping this change in with the other, and I'm not sure I
like the change at all. It silently switches us from the sampler cache
over to the data cache based on the component size. I would rather we
either switch 100% over to the byte/oword scattered messages at some
hardware generation, or just keep the sampler operation we're doing
today. From my reading of the SKL docs, it looks like we can't use the
constant cache with the byte scattered messages, but we could with the
DWORD scattered messages, which may be a good change in the future
(though Curro may have some reason we don't want to do that). Unless we
have a very good reason why we need the byte messages, let's use
something that goes through the constant or sampler cache. This is for
UBO constant data pulls, so there's no real problem with reading too
much data.

I think the thing to do in the short term is just do a
VARYING_PULL_CONSTANT_LOAD for each pair of components and then split
them out into 16-bit chunks (second sketch at the bottom of this mail).
Also, that should be its own patch.

>        } else {
>           /* Even if we are loading doubles, a pull constant load will load
>            * a 32-bit vec4, so should only reserve vgrf space for that. If we
> --
> 2.13.6
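For what it's worth, here's roughly what I have in mind for the
DWORD-read-and-unpack version of do_untyped_vector_read. This is an
untested sketch, not a reviewed implementation: it assumes offset_reg
is DWORD-aligned, and it leans on the emit_untyped_read() and
subscript() helpers already used elsewhere in this function.

   } else if (type_sz(dest.type) == 2) {
      /* Read whole DWORDs through the untyped surface read message and
       * unpack two 16-bit components out of each DWORD returned.
       * Assumes offset_reg is DWORD-aligned.
       */
      const unsigned num_dwords = DIV_ROUND_UP(num_components, 2);
      fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
                                             1 /* dims */, num_dwords,
                                             BRW_PREDICATE_NONE);
      for (unsigned i = 0; i < num_components; i++) {
         bld.MOV(offset(dest, bld, i),
                 subscript(offset(read_result, bld, i / 2), dest.type, i % 2));
      }
   }

For odd component counts this reads one extra 16-bit value; I don't
think that matters for buffer surfaces, but it's worth double-checking.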
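And for the UBO pull-constant path, the short-term plan I'm suggesting
looks something like the following. Again a completely untested sketch:
it pulls each pair of 16-bit components as one 32-bit DWORD through
VARYING_PULL_CONSTANT_LOAD (so we stay on the sampler/constant cache
path) and then splits the halves out with subscript(). The tmp register
and the DIV_ROUND_UP bookkeeping are mine, not from the patch.

   if (type_sz(dest.type) == 2) {
      /* Pull each pair of 16-bit components as a single 32-bit DWORD,
       * then split every DWORD into its two 16-bit halves.  Over-reading
       * is fine here; this is UBO constant data.
       */
      const unsigned num_dwords = DIV_ROUND_UP(instr->num_components, 2);
      fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, num_dwords);
      for (unsigned i = 0; i < num_dwords; i++) {
         VARYING_PULL_CONSTANT_LOAD(bld, offset(tmp, bld, i),
                                    surf_index, base_offset, i * 4);
      }
      for (unsigned i = 0; i < instr->num_components; i++) {
         bld.MOV(offset(dest, bld, i),
                 subscript(offset(tmp, bld, i / 2), dest.type, i % 2));
      }
   }

The shuffle at the end is just a couple of strided MOVs, so the cost is
all in the loads, and those stay on the same cache path as the 32-bit
case.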
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev