On Fri, 2017-01-13 at 14:36 -0800, Matt Turner wrote:
> On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
> <sigles...@igalia.com> wrote:
> > From: "Juan A. Suarez Romero" <jasua...@igalia.com>
> >
> > On Ivybridge/Valleyview, when converting a float (F) to a double
> > precision float (DF), the hardware automatically duplicates the
> > source horizontal stride, hence converting only the values in odd
> > positions.
> >
> > This commit adds a new lowering step, exclusively for IVB/VLV,
> > where the sources are first copied in a temporal register with
> > stride 2, and then converted from this temporal register. Thus,
> > we do not lose any value.
>
> Curro explained how he thinks the hardware works to me. I'll try to
> reproduce that description here.
>
> The FPU channels are 32-bits wide on IVB/BYT. Normally, for example
> when operating on 8 float channels, the FPU is given a channel of the
> source register to operate on, and each FPU channel produces a value
> which is written to the channels of the destination.
>
> But when operating on doubles, each *pair* of FPU channels operates
> on one (double-precision) value. Unfortunately the hardware designers
> didn't seem to update the input and output logic, so for instance
> every pair of float channels from the source region are given as
> input to the FPU, even though only the low (or even numbered) channel
> will be used. This is why it appears that the hardware doubles the
> stride, but it's really just ignoring all of the odd channels.
>
> A similar thing happens on output. The output elements are 64-bits
> (even if the output type is float), and so a destination stride of 1
> means the writes are strided by 64-bits. This explains the strange
> looking behavior you discovered of an instruction like
> mov(8) gX<1>F gY<8,8,1>DF.
>
> With that understanding, we actually can read consecutive float
> channels and convert them to doubles in one instruction -- by using a
> <1,2,0> region. Each float channel is read twice, and the second read
> will be ignored by the FPU.
>
> So we can replace this patch with the one I have attached. A nice
> side effect of this is that we can simplify VEC4_OPCODE_TO_DOUBLE.
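
To make the region arithmetic in the explanation above concrete, here is
a minimal Python sketch. It only models the behaviour as described in
this thread, not the real hardware or any Mesa code. The helper names
region_offsets and ivb_f_to_df_sources are made up for illustration, and
the sketch assumes the usual <VertStride,Width,HorzStride> addressing,
where channel i reads element (i / Width) * VertStride +
(i % Width) * HorzStride.

```python
def region_offsets(vstride, width, hstride, channels):
    """Element offsets read by a <vstride,width,hstride> source region."""
    return [(ch // width) * vstride + (ch % width) * hstride
            for ch in range(channels)]


def ivb_f_to_df_sources(vstride, width, hstride, num_doubles):
    """Float elements that actually get converted on IVB/BYT (modeled).

    Assumption taken from the description above: the input logic fetches
    one float channel per 32-bit FPU channel, i.e. two per double, and
    the FPU uses only the even-numbered channel of each pair.
    """
    fetched = region_offsets(vstride, width, hstride, 2 * num_doubles)
    return fetched[0::2]  # odd-numbered channels are ignored


# Packed <8,8,1> source: every other float is skipped, so the
# horizontal stride appears to be doubled.
packed = ivb_f_to_df_sources(8, 8, 1, 4)

# <1,2,0> source: each float is fetched twice and the second fetch is
# ignored, so consecutive floats get converted.
replicated = ivb_f_to_df_sources(1, 2, 0, 4)

print(packed)      # [0, 2, 4, 6]
print(replicated)  # [0, 1, 2, 3]
```

With four doubles, the packed region converts floats 0, 2, 4 and 6,
while the <1,2,0> region converts floats 0, 1, 2 and 3, which is why a
single instruction is enough.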

Oh, thanks a lot for this explanation! It helps us a lot to understand
how IvyBridge works :-)

Thanks for the patch, I will apply it to our -rc2 branch.

Sam
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev