On Tue, 2019-01-22 at 16:36 -0800, Matt Turner wrote: > On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <ito...@igalia.com > > wrote: > > > > We use ALign16 mode for this, since it is more convenient, but the > > PRM > > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register > > region > > restrictions', Section '1. Special Restrictions': > > > > "In Align16 mode, the channel selects and channel enables apply > > to a > > pair of half-floats, because these parameters are defined for > > DWord > > elements ONLY. This is applicable when both source and > > destination > > are half-floats." > > > > This means that we cannot select individual HF elements using > > swizzles > > like we do with 32-bit floats so we can't implement the required > > regioning for this. > > > > Use the gen11 path for this instead, which uses Align1 mode. > > > > The restriction is not present in gen9 or gen10, where the Align16 > > implementation seems to work just fine. > > > > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > > --- > > src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/src/intel/compiler/brw_fs_generator.cpp > > b/src/intel/compiler/brw_fs_generator.cpp > > index d0cc4a6d231..4310f0b7fdc 100644 > > --- a/src/intel/compiler/brw_fs_generator.cpp > > +++ b/src/intel/compiler/brw_fs_generator.cpp > > @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst > > *inst, > > const uint32_t type_size = type_sz(src.type); > > > > if (inst->opcode == FS_OPCODE_DDY_FINE) { > > - /* produce accurate derivatives */ > > - if (devinfo->gen >= 11) { > > + /* produce accurate derivatives. We can do this easily in > > Align16 > > + * but this is not supported in gen11+ and gen8 Align16 > > swizzles > > + * for Half-Float operands work in units of 32-bit and > > always > > + * select pairs of consecutive half-float elements, so we > > can't use > > + * use it for this. > > + */ > > Let's break this comment up and include (or move) the BSpec text from > the commit message here. I wouldn't mention the "this is not > supported > in gen11+" because it's slightly unclear whether you're talking about > "accurate derivatives" or "Align16". How about: > > > /* produce accurate derivatives. > * > * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU) > * "Register Region Restrictions", Section "1. Special > Restrictions": > * > * "In Align16 mode, the channel selects and channel enables > apply to > * a pair of half-floats, because these parameters are > defined for > * DWord elements ONLY. This is applicable when both source > and > * destination are half-floats." > * > * So for half-float operations we use the Gen11+ Align1 path. > CHV > * inherits its FP16 hardware from SKL, so it is not affected. > */ > > > > + if (devinfo->gen >= 11 || > > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) > > { > > > The docs are bad about telling us whether a BDW restriction applies > to > CHV as well, but in this case I suspect the answer is no. CHV seems > to > inherit its FP16 hw from SKL, which doesn't have the restriction as > you say. > > So I suspect you want devinfo->is_broadwell instead of devinfo->gen > == 8.
I have just confirmed that your suspicion is correct, thanks for bringing this up! > With that (and confirmation that CHV isn't affected), this patch is > > Reviewed-by: Matt Turner <matts...@gmail.com> > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev