fs: add a stride helper

Iago Toral Thu, 05 May 2016 00:55:51 -0700

On Thu, 2016-05-05 at 09:15 +0200, Iago Toral wrote:
> On Wed, 2016-05-04 at 13:59 -0700, Francisco Jerez wrote:
> > Iago Toral <ito...@igalia.com> writes:
> > 
> > > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
> > >> Iago Toral <ito...@igalia.com> writes:
> > >> 
> > >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
> > >> >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
> > >> >> 
> > >> >> > From: Connor Abbott <connor.w.abb...@intel.com>
> > >> >> >
> > >> >> > Similar to retype() and offset().
> > >> >> > ---
> > >> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 ++++++++
> > >> >> >  1 file changed, 8 insertions(+)
> > >> >> >
> > >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
> > >> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > index e4f20f4..abda2c3 100644
> > >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
> > >> >> >  }
> > >> >> >  
> > >> >> >  static inline fs_reg
> > >> >> > +stride(fs_reg reg, unsigned stride)
> > >> >> > +{
> > >> >> > +   if (reg.stride != 0)
> > >> >> > +      reg.stride = stride;
> > >> >> > +   return reg;
> > >> >> > +}
> > >> >> > +
> > >> >> 
> > >> >> This only works if reg.stride == 0 or 1, we need to honour the stride 
> > >> >> of
> > >> >> the original register (e.g. by doing reg.stride *= stride) or you'll 
> > >> >> end
> > >> >> up taking components not part of the region given as argument.  It 
> > >> >> gets
> > >> >> messy with ARF and HW registers because they represent the stride
> > >> >> differently...  I suggest that instead of fixing this you take the
> > >> >> following patch from my SIMD32 branch that introduces a somewhat 
> > >> >> easier
> > >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), 
> > >> >> i),
> > >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of 
> > >> >> type t
> > >> >> of the original register you would just do 'subscript(reg, t, i)'. 
> > >> >
> > >> > Actually, I think this is not what we want in a number of places.
> > >> >
> > >> > Sometimes we just want to apply a stride to a register, no type changes
> > >> > or anything. That is, we just want to do something like stride(reg, 2).
> > >> > Of course, we can code that with subscript as subscript(reg, 
> > >> > retype(reg,
> > >> > type_that is_half_the_size), 0) but that is certainly forced. For
> > >> > example, we have code such as:
> > >> >
> > >> > bld.MOV(tmp, stride(component_i, 2));
> > >> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
> > >> >         stride(horiz_offset(component_i, 1), 2));
> > >> >
> > >> Oh no, this is precisely the use-case I had in mind: component_i is a
> > >> vector of 64-bit channels laid out contiguously, and you want to extract
> > >> each 32-bit field from each channel separately.  You happen to have
> > >> declared component_i as a 32-bit type but that's kind of a lie because
> > >> each component of "component_i" only represents half a channel, it would
> > >> definitely make more sense to make it a DF because that's what it
> > >> contains...
> > >
> > > I think that whether component_i is really 64-bit or 32-bit is not
> > > relevant, in the sense that what matters is how the algorithm
> > > needs/wants to interpret the data it has to manipulate. The way I
> > > thought about this is that we have our 32-bit channels mixed up and we
> > > need to re-arrange them, so from my perspective, this is all about
> > > 32-bit data that we need to move around in a particular fashion.
> > 
> > Keep in mind that channel masking is applied to component_i in multiples
> > of 64 bits regardless of the type you declare it with, so if you forget
> > the fact that component_i is a vector of elements 64 bits apart from
> > each other and pretend it's a vector of 32 bit channels, you'll end up
> > crossing channel boundaries illegally and miscompiling the program (or
> > working it around with ->force_writemask_all hacks like you do in the
> > SHUFFLE functions...). 
> 
> Yes, that is true.
> 
> >  subscript() guarantees that you never have to
> > break channel boundaries not even temporarily -- If the register you
> > give it as argument was channel-aligned the result is guaranteed
> > channel-aligned.  stride() OTOH gives you a channel-misaligned result
> > you need to compensate for with a matching retype(), so pretty much
> > every use of the result from stride() other than as argument for
> > retype() indicates a bug, that's why it makes sense to do both things in
> > the same operation.  Otherwise you need to deal with a temporary value
> > which is basically meaningless, and you need to manually make sure that
> > the value provided as stride multiplier matches the ratio of the type
> > sizes exactly (the snippet you paste below from PACK lowering
> > illustrates the problem since it would miscompile the program silently
> > if the instruction didn't have the exact number of sources you
> > expected).
> 
> Ok, I see your point, that sounds like a good argument :)
> 
> > > That's why I find the stride() helper a lot more natural to use, where
> > > I just tell the access type and the stride explicitly, while with
> > > subscript() I have to think about a register type and an access type
> > > instead and the stride is a consequence of that relation. I admit this
> > > is very subjective though.
> > >
> > Sorry, but this wasn't just a codestyle suggestion, most of the uses of
> > stride() and horiz_offset() you have in both series will be broken for a
> > certain input of fully well-formed IR due to at least one out of three
> > different reasons (horiz_offset() not being intended (or able except by
> > luck) to move you out of the original SIMD vector, the argument of
> > stride() not being guaranteed to match the intended ratio of types
> > exactly and your stride() implementation being broken for all input
> > strides except 0 or 1).
> > 
> > > In any case, let's change it, as you say we should be able to use
> > > subscript with some minor changes so let's do that. I just wanted to
> > > explain why I personally think that stride() might be more
> > > flexible/natural in some scenarios, at least for some people like me :)
> > >
> > I agree that stride() is more flexible -- Almost too flexible.
> > subscript() is precisely about distilling off the subset of
> > functionality from stride() that you can use safely without breaking
> > channel boundaries -- Not saying that stride() couldn't conceivably be
> > useful for some more exotic purpose, but I've grepped your whole branch
> > and I haven't found a single instance which wasn't basically an
> > open-coded and arguably buggy version of subscript().  My SIMD32 branch
> > makes more intensive use of subscript() (even for things which are not
> > per-channel as you can guess from the ARF/HW REG handling), and I
> > haven't yet found a case where the extra flexibility of stride() would
> > be necessary or useful.
> 
> There is the case where we convert from DF to F which I think is a bit
> special. There we have this code:
> 
>       fs_reg temp = ibld.vgrf(inst->dst.type, 2);
>       ibld.MOV(stride(temp, 2), inst->src[0]);
>       ibld.MOV(dst, stride(temp, 2));
> 
> Basically, what is going on here is that the first MOV does the
> conversion but it writes elements of 64-bit anyway where the hi 32-bit
> of each element is trash so we need the second MOV to re-combine the low
> 32-bit regions to produce a proper 32-bit result.
> 
> With subscript we need to play around with the types we use everywhere
> to achieve the same thing. I think the resulting code is a bit more
> confusing:
> 
>       fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>       fs_reg strided_temp =
>          retype(subscript(temp, inst->dst.type, 0), inst->dst.type);
>       ibld.MOV(strided_temp, inst->src[0]);
>       ibld.MOV(dst, strided_temp);


Never mind this, the confusion was only mine :).

The retype is not necessary of course, I guess I did something wrong
while playing with this.

Iago

> The issue here is that we need to do the allocation of temp with the
> larger type so we can then subscript into it with the smaller type, but
> then we really need to write to and read from temp using the smaller
> type, so we need to retype anyway.
> 
> I don't think this alone justifies that we keep the stride helper
> though, just pointing out a case where using subscript is a bit more
> forced IMHO.
> 
> > > For the record, Ken suggested we land things as they are now (with
> > > horiz_offset/stride) and fix these things later to use subscript()
> > > instead, so we will do that.
> > >
> > >> > where component_i is of type F. Of course, we could retype component_i
> > >> > to DF when we pass it to subscript, and use type F for the type
> > >> > parameter, so that subscript does what we want, but it is certainly not
> > >> > very natural. Subscript is all about accessing data with a different
> > >> > type and we have some places where we want to do that and subscript 
> > >> > does
> > >> > what we want, but we also want to use a "normal" stride sometimes.
> > >> >
> > >> > Another example, in lowering of PACK, we do something like this:
> > >> >
> > >> > ibld.MOV(stride(horiz_offset(retype(dst, inst->src[i].type), i), 
> > >> >                 inst->sources),
> > >> >          inst->src[i]);
> > >> >
> > >> Same here.  If inst->sources hadn't matched the ratio between the two
> > >> types you'd have ended up reading cross channel boundaries and
> > >> miscompiling the program.  Using subscript() instead would guarantee
> > >> that this is never the case even if e.g. one of the sources happens to
> > >> be missing.
> > >> 
> > >> >
> > >> > Here the stride we want is not based on any relation between types 
> > >> > (even
> > >> > if I think in this case it would work because inst->sources will be 2
> > >> > and the type of inst->src[i] will be half of the type of the dst).
> > >> >
> > >> > I think that even if subscript can be useful in some of our cases, we
> > >> > still want the stride helper for a bunch of other cases.
> > >> >
> > >> >>  I
> > >> >> think I've looked through all the uses of stride() in your branch and
> > >> >> they are all sort of an open-coded version of subscript().  This will
> > >> >> also address the issue Ken pointed out in patch 29 related to the use 
> > >> >> of
> > >> >> horiz_offset() on uniforms (or anything with stride other than one,
> > >> >> really) -- See the attached patch (yes, correct ARF/HW reg handling 
> > >> >> will
> > >> >> be required for SIMD32...).
> > >> >> 
> > >> >> > +static inline fs_reg
> > >> >> >  byte_offset(fs_reg reg, unsigned delta)
> > >> >> >  {
> > >> >> >     switch (reg.file) {
> > >> >> > -- 
> > >> >> > 2.5.0
> > >> >> >
> > >> >> > _______________________________________________
> > >> >> > mesa-dev mailing list
> > >> >> > mesa-dev@lists.freedesktop.org
> > >> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >> >> 
> > >> >> _______________________________________________
> > >> >> mesa-dev mailing list
> > >> >> mesa-dev@lists.freedesktop.org
> > >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

Reply via email to