On Thu, 2016-05-05 at 09:15 +0200, Iago Toral wrote: > On Wed, 2016-05-04 at 13:59 -0700, Francisco Jerez wrote: > > Iago Toral <ito...@igalia.com> writes: > > > > > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote: > > >> Iago Toral <ito...@igalia.com> writes: > > >> > > >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote: > > >> >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > >> >> > > >> >> > From: Connor Abbott <connor.w.abb...@intel.com> > > >> >> > > > >> >> > Similar to retype() and offset(). > > >> >> > --- > > >> >> > src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 ++++++++ > > >> >> > 1 file changed, 8 insertions(+) > > >> >> > > > >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h > > >> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h > > >> >> > index e4f20f4..abda2c3 100644 > > >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h > > >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h > > >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type) > > >> >> > } > > >> >> > > > >> >> > static inline fs_reg > > >> >> > +stride(fs_reg reg, unsigned stride) > > >> >> > +{ > > >> >> > + if (reg.stride != 0) > > >> >> > + reg.stride = stride; > > >> >> > + return reg; > > >> >> > +} > > >> >> > + > > >> >> > > >> >> This only works if reg.stride == 0 or 1, we need to honour the stride > > >> >> of > > >> >> the original register (e.g. by doing reg.stride *= stride) or you'll > > >> >> end > > >> >> up taking components not part of the region given as argument. It > > >> >> gets > > >> >> messy with ARF and HW registers because they represent the stride > > >> >> differently... I suggest that instead of fixing this you take the > > >> >> following patch from my SIMD32 branch that introduces a somewhat > > >> >> easier > > >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), > > >> >> i), > > >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of > > >> >> type t > > >> >> of the original register you would just do 'subscript(reg, t, i)'. > > >> > > > >> > Actually, I think this is not what we want in a number of places. > > >> > > > >> > Sometimes we just want to apply a stride to a register, no type changes > > >> > or anything. That is, we just want to do something like stride(reg, 2). > > >> > Of course, we can code that with subscript as subscript(reg, > > >> > retype(reg, > > >> > type_that is_half_the_size), 0) but that is certainly forced. For > > >> > example, we have code such as: > > >> > > > >> > bld.MOV(tmp, stride(component_i, 2)); > > >> > bld.MOV(horiz_offset(tmp, 8 * multiplier), > > >> > stride(horiz_offset(component_i, 1), 2)); > > >> > > > >> Oh no, this is precisely the use-case I had in mind: component_i is a > > >> vector of 64-bit channels laid out contiguously, and you want to extract > > >> each 32-bit field from each channel separately. You happen to have > > >> declared component_i as a 32-bit type but that's kind of a lie because > > >> each component of "component_i" only represents half a channel, it would > > >> definitely make more sense to make it a DF because that's what it > > >> contains... > > > > > > I think that whether component_i is really 64-bit or 32-bit is not > > > relevant, in the sense that what matters is how the algorithm > > > needs/wants to interpret the data it has to manipulate. The way I > > > thought about this is that we have our 32-bit channels mixed up and we > > > need to re-arrange them, so from my perspective, this is all about > > > 32-bit data that we need to move around in a particular fashion. > > > > Keep in mind that channel masking is applied to component_i in multiples > > of 64 bits regardless of the type you declare it with, so if you forget > > the fact that component_i is a vector of elements 64 bits apart from > > each other and pretend it's a vector of 32 bit channels, you'll end up > > crossing channel boundaries illegally and miscompiling the program (or > > working it around with ->force_writemask_all hacks like you do in the > > SHUFFLE functions...). > > Yes, that is true. > > > subscript() guarantees that you never have to > > break channel boundaries not even temporarily -- If the register you > > give it as argument was channel-aligned the result is guaranteed > > channel-aligned. stride() OTOH gives you a channel-misaligned result > > you need to compensate for with a matching retype(), so pretty much > > every use of the result from stride() other than as argument for > > retype() indicates a bug, that's why it makes sense to do both things in > > the same operation. Otherwise you need to deal with a temporary value > > which is basically meaningless, and you need to manually make sure that > > the value provided as stride multiplier matches the ratio of the type > > sizes exactly (the snippet you paste below from PACK lowering > > illustrates the problem since it would miscompile the program silently > > if the instruction didn't have the exact number of sources you > > expected). > > Ok, I see your point, that sounds like a good argument :) > > > > That's why I find the stride() helper a lot more natural to use, where > > > I just tell the access type and the stride explicitly, while with > > > subscript() I have to think about a register type and an access type > > > instead and the stride is a consequence of that relation. I admit this > > > is very subjective though. > > > > > Sorry, but this wasn't just a codestyle suggestion, most of the uses of > > stride() and horiz_offset() you have in both series will be broken for a > > certain input of fully well-formed IR due to at least one out of three > > different reasons (horiz_offset() not being intended (or able except by > > luck) to move you out of the original SIMD vector, the argument of > > stride() not being guaranteed to match the intended ratio of types > > exactly and your stride() implementation being broken for all input > > strides except 0 or 1). > > > > > In any case, let's change it, as you say we should be able to use > > > subscript with some minor changes so let's do that. I just wanted to > > > explain why I personally think that stride() might be more > > > flexible/natural in some scenarios, at least for some people like me :) > > > > > I agree that stride() is more flexible -- Almost too flexible. > > subscript() is precisely about distilling off the subset of > > functionality from stride() that you can use safely without breaking > > channel boundaries -- Not saying that stride() couldn't conceivably be > > useful for some more exotic purpose, but I've grepped your whole branch > > and I haven't found a single instance which wasn't basically an > > open-coded and arguably buggy version of subscript(). My SIMD32 branch > > makes more intensive use of subscript() (even for things which are not > > per-channel as you can guess from the ARF/HW REG handling), and I > > haven't yet found a case where the extra flexibility of stride() would > > be necessary or useful. > > There is the case where we convert from DF to F which I think is a bit > special. There we have this code: > > fs_reg temp = ibld.vgrf(inst->dst.type, 2); > ibld.MOV(stride(temp, 2), inst->src[0]); > ibld.MOV(dst, stride(temp, 2)); > > Basically, what is going on here is that the first MOV does the > conversion but it writes elements of 64-bit anyway where the hi 32-bit > of each element is trash so we need the second MOV to re-combine the low > 32-bit regions to produce a proper 32-bit result. > > With subscript we need to play around with the types we use everywhere > to achieve the same thing. I think the resulting code is a bit more > confusing: > > fs_reg temp = ibld.vgrf(inst->src[0].type, 1); > fs_reg strided_temp = > retype(subscript(temp, inst->dst.type, 0), inst->dst.type); > ibld.MOV(strided_temp, inst->src[0]); > ibld.MOV(dst, strided_temp);
Never mind this, the confusion was only mine :). The retype is not necessary of course, I guess I did something wrong while playing with this. Iago > The issue here is that we need to do the allocation of temp with the > larger type so we can then subscript into it with the smaller type, but > then we really need to write to and read from temp using the smaller > type, so we need to retype anyway. > > I don't think this alone justifies that we keep the stride helper > though, just pointing out a case where using subscript is a bit more > forced IMHO. > > > > For the record, Ken suggested we land things as they are now (with > > > horiz_offset/stride) and fix these things later to use subscript() > > > instead, so we will do that. > > > > > >> > where component_i is of type F. Of course, we could retype component_i > > >> > to DF when we pass it to subscript, and use type F for the type > > >> > parameter, so that subscript does what we want, but it is certainly not > > >> > very natural. Subscript is all about accessing data with a different > > >> > type and we have some places where we want to do that and subscript > > >> > does > > >> > what we want, but we also want to use a "normal" stride sometimes. > > >> > > > >> > Another example, in lowering of PACK, we do something like this: > > >> > > > >> > ibld.MOV(stride(horiz_offset(retype(dst, inst->src[i].type), i), > > >> > inst->sources), > > >> > inst->src[i]); > > >> > > > >> Same here. If inst->sources hadn't matched the ratio between the two > > >> types you'd have ended up reading cross channel boundaries and > > >> miscompiling the program. Using subscript() instead would guarantee > > >> that this is never the case even if e.g. one of the sources happens to > > >> be missing. > > >> > > >> > > > >> > Here the stride we want is not based on any relation between types > > >> > (even > > >> > if I think in this case it would work because inst->sources will be 2 > > >> > and the type of inst->src[i] will be half of the type of the dst). > > >> > > > >> > I think that even if subscript can be useful in some of our cases, we > > >> > still want the stride helper for a bunch of other cases. > > >> > > > >> >> I > > >> >> think I've looked through all the uses of stride() in your branch and > > >> >> they are all sort of an open-coded version of subscript(). This will > > >> >> also address the issue Ken pointed out in patch 29 related to the use > > >> >> of > > >> >> horiz_offset() on uniforms (or anything with stride other than one, > > >> >> really) -- See the attached patch (yes, correct ARF/HW reg handling > > >> >> will > > >> >> be required for SIMD32...). > > >> >> > > >> >> > +static inline fs_reg > > >> >> > byte_offset(fs_reg reg, unsigned delta) > > >> >> > { > > >> >> > switch (reg.file) { > > >> >> > -- > > >> >> > 2.5.0 > > >> >> > > > >> >> > _______________________________________________ > > >> >> > mesa-dev mailing list > > >> >> > mesa-dev@lists.freedesktop.org > > >> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > >> >> > > >> >> _______________________________________________ > > >> >> mesa-dev mailing list > > >> >> mesa-dev@lists.freedesktop.org > > >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev