On Mon, Dec 5, 2016 at 2:20 PM, Connor Abbott <[email protected]> wrote:
> On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <[email protected]> wrote:
>> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <[email protected]> wrote:
>>> On 12/05, Matt Turner wrote:
>>>>
>>>> On 11/28, Ian Romanick wrote:
>>>>>
>>>>> From: Ian Romanick <[email protected]>
>>>>> Patches 42 through 50 enable the extension on BDW+.
>>>>
>>>>
>>>> 42-48 are
>>>>
>>>> Reviewed-by: Matt Turner <[email protected]>
>>>>
>>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>>> fp64 works.
>>>
>>>
>>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>>> emit two CMP instructions, one with 1Q and one with 2Q:
>>>
>>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g11<4,4,1>DF { align1 1Q };
>>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g7<4,4,1>DF { align1 2Q };
>>>
>>> (from fs-op-add-double-double.shader_test)
>>>
>>> Makes sense to me. 49 is
>>>
>>> Reviewed-by: Matt Turner <[email protected]>
>>
>> Actually, it's something a little different. The splitting you're
>> talking about is handled just fine by curro's SIMD lowering pass. The
>> issue here is that if you don't specify a null destination register
>> (in which case this is a moot point), CMP will always output the same
>> destination bitsize as the source bitsize. That is, if you compare two
>> registers with 8 doubles each (two SIMD8 registers each), the result
>> will take up two SIMD8 registers instead of one as you'd expect. I
>> couldn't track this down in the PRM, but I definitely remember having
>> to implement it and getting wrong results without it. The end result
>> is that you have to use a strided move to get the low 32 bits of each
>> 64-bit destination channel, which is what subscript() does. This
>> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
>> Of course, in this case some backend optimizations have managed to
>> remove the destination register, so that's why you don't see it here,
>> but if you do something trickier, like store the result to a buffer,
>> the strided mov will be there.
>>
>> Anyways, that's what I remember of it... it's been a while.
>
> Although, the example you gave has a bug, since the second CMP
> overwrites the result of the previous one... it looks like
> lower_simd_width isn't offsetting the flag register correctly when
> splitting the CMP.
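
For anyone following along, here is a rough Python model of the strided
read described above. It is only a sketch, not Mesa code: the function
names are hypothetical, and the all-ones 64-bit "true" value per channel
is an assumption about what CMP writes to a non-null DF destination.

```python
# Hypothetical model of the 64-bit CMP result layout; not Mesa code.
MASK32 = 0xFFFFFFFF
TRUE64 = 0xFFFFFFFFFFFFFFFF  # assumed per-channel "true" value


def cmp_ge_df(a, b):
    """Model a CMP with a non-null DF destination: one 64-bit result per channel."""
    return [TRUE64 if x >= y else 0 for x, y in zip(a, b)]


def to_dwords(qwords):
    """Lay the 64-bit channels out as 32-bit dwords, low dword first."""
    out = []
    for q in qwords:
        out.append(q & MASK32)
        out.append((q >> 32) & MASK32)
    return out


def strided_low_dwords(qwords):
    """Read every second dword, i.e. the low 32 bits of each 64-bit channel."""
    return to_dwords(qwords)[0::2]


if __name__ == "__main__":
    a = [abs(v) for v in (-3.0, 0.5, 2.0, -1.0, 4.0, 0.0, -2.5, 1.5)]
    b = [2.0] * 8  # scalar operand broadcast to all 8 channels
    wide = cmp_ge_df(a, b)             # 8 x 64-bit results: twice the dwords
    narrow = strided_low_dwords(wide)  # 8 x 32-bit results
    print([hex(v) for v in narrow])
```

The point is just that the 8 results occupy 16 dwords, so a packed
32-bit consumer has to step over every other dword to pick them up.
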
I assumed that quarter control would select which flag subregister to
write... I sure hope that's how the hardware works.
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
