Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > On Wed, 2016-05-11 at 17:12 +0200, Samuel Iglesias Gonsálvez wrote: >> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote: >> > >> > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> > >> > > >> > > >> > > From: Iago Toral Quiroga <ito...@igalia.com> >> > > >> > > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 >> > > starting at a >> > > constant offset that is 16-byte aligned. If we need to access an >> > > unaligned >> > > offset we emit a load with an aligned offset and use the >> > > remaining >> > > constant >> > > offset to select the component into the vec4 result that we are >> > > interested >> > > in. This component must be computed in units of the type size, >> > > since that >> > > is what fs_reg::set_smear expects. >> > > >> > > This patch does this change in the two places where we use this >> > > message: >> > > In demote_pull_constants when we lower uniform access with >> > > constant >> > > offset >> > > into the pull constant buffer and in UBO loads with constant >> > > offset. >> > > --- >> > > src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++- >> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++- >> > > 2 files changed, 5 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > index 0e69be8..dff13ea 100644 >> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads() >> > > inst->src[i].file = VGRF; >> > > inst->src[i].nr = dst.nr; >> > > inst->src[i].reg_offset = 0; >> > > - inst->src[i].set_smear(pull_index & 3); >> > > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) / >> > > 4); >> > > + inst->src[i].set_smear((pull_index & 3) / type_slots); >> > > >> > This cannot be right, why should we care what the destination type >> > of >> > the instruction is while lowering a uniform source? Also I don't >> > think >> > the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < >> > 1 >> > you'll force type_slots to 1 and end up interpreting the pull_index >> > in >> > the wrong units. How about: >> > >> > > >> > > >> > > inst->src[i].set_smear((pull_index & 3) * 4 / >> > > type_sz(inst->src[i].type)); >> > > >> OK >> >> > >> > > >> > > brw_mark_surface_used(prog_data, index); >> > > } >> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > > index 4cd219a..532ca65 100644 >> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const >> > > fs_builder &bld, nir_intrinsic_instr *instr >> > > bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, >> > > packed_consts, >> > > surf_index, const_offset_reg); >> > > >> > > + unsigned component_base = >> > > + (const_offset->u32[0] % 16) / MAX2(1, >> > > type_sz(dest.type)); >> > Rather than dividing by the type size only to let set_smear >> > multiply >> > by >> > the type size again, I think it would be cleaner to do something >> > like: >> > >> > > >> > > >> > > const fs_reg consts = byte_offset(packed_consts, >> > > const_offset->u32[0] % 16); >> > > >> > > for (unsigned i = 0; i < instr->num_components; i++) { >> > then here: >> > >> > > >> > > >> > > bld.MOV(offset(dest, bld, i), component(consts, i)); >> > and then remove the rest of the loop. >> > >> I am having troubles with adapting patch 13/23 to this way because >> the >> following assert in component() is failing for some tests: >> >> assert(reg.subreg_offset == 0); >> >> consts.subreg is not zero thanks to byte_offset() call. >> >> So I prefer to go to a mixed solution: keep set_smear() usage, then: >> >> bld.MOV(offset(dest, bld, i), packed_consts); >> > > Looking at patch 13, offset(dest, bld, i) needs to be adjusted to save > the remaining components, so I think the MOV is clearer as it is now > than the proposed change. > I won't pretend I understand everything going on in PATCH 13 :P, but feel free to keep the induction on the destination register if it turns out to be convenient later on.
> Sam > >> and remove the rest of the loop. >> >> Sam >> >> > >> > > >> > > >> > > - packed_consts.set_smear(const_offset->u32[0] % 16 / >> > > 4 >> > > + i); >> > > + packed_consts.set_smear(component_base + i); >> > > >> > > /* The std140 packing rules don't allow vectors to >> > > cross 16-byte >> > > * boundaries, and a reg is 32 bytes. >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev