Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > It is tested empirically that IVB/BYT don't support indirect addressing > with doubles but it is not documented in the PRM. > > This patch applies the same solution than for Cherryview/Broxton and > takes into account that we cannot double the stride, since the > hardware will do it internally. > > v2: > - Fix assert to take into account Indirect DF MOVs in IVB and HSW. > > Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > --- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 > ++++++++++++++++++-------- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 ++++++----- > 2 files changed, 25 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index 487f2e90224..dd6cbab773c 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > @@ -418,18 +418,29 @@ fs_generator::generate_mov_indirect(fs_inst *inst, > brw_MOV(p, dst, reg); > } else { > /* Prior to Broadwell, there are only 8 address registers. */ > - assert(inst->exec_size == 8 || devinfo->gen >= 8); > + assert(inst->exec_size == 8 || devinfo->gen >= 8 || > + (devinfo->gen == 7 && inst->exec_size < 8 && > + (type_sz(reg.type) == 8 || type_sz(dst.type) == 8))); > > /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */ > struct brw_reg addr = vec8(brw_address_reg(0)); > > - /* The destination stride of an instruction (in bytes) must be greater > - * than or equal to the size of the rest of the instruction. Since the > - * address register is of type UW, we can't use a D-type instruction. > - * In order to get around this, re retype to UW and use a stride. > - */ > - indirect_byte_offset = > - retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW); > + if (devinfo->gen != 7 || devinfo->is_haswell || type_sz(reg.type) != > 8) { > + /* The destination stride of an instruction (in bytes) must be > greater > + * than or equal to the size of the rest of the instruction. Since > the > + * address register is of type UW, we can't use a D-type > instruction. > + * In order to get around this, re retype to UW and use a stride. > + */ > + indirect_byte_offset = > + retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW); > + } else { > + /* In Ivybridge/Baytrail, when it operates with DF operands, we > + * cannot double the stride, since the hardware will do it > + * internally. Tested empirically. > + */ > + indirect_byte_offset = > + retype(indirect_byte_offset, BRW_REGISTER_TYPE_UW); > + }
This doesn't make any sense, indirect_byte_offset is a 32-bit integer type regardless of the type of the data you're copying, the hardware won't do any stride doubling for you here. I've verified on IVB hardware that indirect addressing works for 64-bit moves, but getting it to do what you want is a bit trickier than on more recent hardware -- I'll reply with a patch that gets it working. > > /* There are a number of reasons why we don't use the base offset here. > * One reason is that the field is only 9 bits which means we can only > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 8f745dff440..85f43b7b144 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -3822,17 +3822,18 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, > nir_intrinsic_instr *instr > (instr->num_components - 1) * type_sz(dest.type); > > fs_reg indirect_chv_high_32bit; > - bool is_chv_bxt_64bit = > - (devinfo->is_cherryview || devinfo->is_broxton) && > - type_sz(dest.type) == 8; > - if (is_chv_bxt_64bit) { > + bool is_ivb_byt_chv_bxt_64bit = Let's use a more sensible name here like "supports_64bit_indirects" and mark as const... Or just remove the variable altogether and fold this into the condition of the second if statement below. > + (devinfo->is_cherryview || devinfo->is_broxton || > + devinfo->is_ivybridge || devinfo->is_baytrail) && > + type_sz(dest.type) == 8; > + if (is_ivb_byt_chv_bxt_64bit) { > indirect_chv_high_32bit = vgrf(glsl_type::uint_type); > /* Calculate indirect address to read high 32 bits */ > bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4)); The ADD and temporary seem redundant, you could take into account the additional 32bit offset by applying an immediate offset to the 'src' region used below. > } > > for (unsigned j = 0; j < instr->num_components; j++) { > - if (!is_chv_bxt_64bit) { > + if (!is_ivb_byt_chv_bxt_64bit) { The code below seems pretty broken... You lower a 64-bit to 64-bit indirect move into two 64-bit to UD type-converting moves, which is unlikely to do what you want. You need to lower it into two raw 32-bit to 32bit moves (e.g. by using subscript() in the source as well as on the destination). Please fix this (and CC the patch to mesa-stable) regardless of my patch to get 64-bit MOV_INDIRECT working, CHV/BXT are unlikely to work as expected except by luck. > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > offset(dest, bld, j), offset(src, bld, j), > indirect, brw_imm_ud(read_size)); > -- > 2.11.0 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev