On Mon, Dec 5, 2016 at 5:48 PM, Matt Turner <matts...@gmail.com> wrote:
> On Mon, Dec 5, 2016 at 2:20 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
>> On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
>>> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <matts...@gmail.com> wrote:
>>>> On 12/05, Matt Turner wrote:
>>>>>
>>>>> On 11/28, Ian Romanick wrote:
>>>>>>
>>>>>> From: Ian Romanick <ian.d.roman...@intel.com>
>>>>>> Patches 42 through 50 enable the extension on BDW+.
>>>>>
>>>>>
>>>>> 42-48 are
>>>>>
>>>>> Reviewed-by: Matt Turner <matts...@gmail.com>
>>>>>
>>>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>>>> fp64 works.
>>>>
>>>>
>>>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>>>> emit two CMP instructions, one with 1Q and one with 2Q:
>>>>
>>>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g11<4,4,1>DF { align1 1Q };
>>>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g7<4,4,1>DF { align1 2Q };
>>>>
>>>> (from fs-op-add-double-double.shader_test)
>>>>
>>>> Makes sense to me. 49 is
>>>>
>>>> Reviewed-by: Matt Turner <matts...@gmail.com>
>>>
>>> Actually, it's something a little different. The splitting you're
>>> talking about is handled just fine by curro's SIMD lowering pass. The
>>> issue here is that if you don't specify a null destination register
>>> (in which case this is a moot point), CMP will always output the same
>>> destination bitsize as the source bitsize. That is, if you compare two
>>> registers with 8 doubles each (two SIMD8 registers each), the result
>>> will take up two SIMD8 registers instead of one as you'd expect. I
>>> couldn't track this down in the PRM, but I definitely remember having
>>> to implement it and getting wrong results without it. The end result
>>> is that you have to use a strided move to get the low 32 bits of each
>>> 64-bit destination channel, which is what subscript() does. This
>>> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
>>> Of course, in this case some backend optimizations have managed to
>>> remove the destination register, so that's why you don't see it here,
>>> but if you do something trickier, like store the result to a buffer,
>>> the strided mov will be there.
>>>
>>> Anyways, that's what I remember of it... it's been a while.
>>
>> Although, the example you gave has a bug, since the second CMP
>> overwrites the result of the previous one... it looks like
>> lower_simd_width isn't offsetting the flag register correctly when
>> splitting the CMP.
>
> I assumed that quarter control would select which flag subregister to
> write... I sure hope that's how the hardware works.

Ah, yeah, that's right.
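
For anyone following along, here is a rough toy model in Python of the two behaviours discussed above. It is only a sketch, not Mesa code and not a hardware-accurate emulator. The function names (cmp_ge_df_dest, subscript_dwords, flag_bits) are made up for illustration, the all-ones/all-zeros encoding of the comparison result in both dwords is an assumption, and the flag layout simply follows what the thread concludes: 1Q writes the bits for channels 0-7 and 2Q writes the bits for channels 8-15.

```python
# Toy model only: hypothetical helper names, not Mesa or hardware code.

def cmp_ge_df_dest(a, b):
    """Model CMP.ge on doubles with a non-null destination.

    Per the thread, each result channel is as wide as the 64-bit
    source, so every channel occupies two 32-bit dwords. Writing the
    same all-ones/all-zeros value to both dwords is an assumption.
    """
    dwords = []
    for x, y in zip(a, b):
        value = 0xFFFFFFFF if x >= y else 0x00000000
        dwords += [value, value]  # low dword, then high dword
    return dwords


def subscript_dwords(dwords, index=0, ratio=2):
    """Strided read: pick dword `index` out of every `ratio` dwords.

    This mimics what the strided MOV does when it extracts the low
    32 bits of each 64-bit destination channel.
    """
    return dwords[index::ratio]


def flag_bits(results, quarter):
    """Flag bits written by one SIMD8 CMP under quarter control.

    quarter=0 models 1Q (channels 0-7), quarter=1 models 2Q
    (channels 8-15), so the two halves land in different bits.
    """
    mask = 0
    for i, r in enumerate(results):
        if r:
            mask |= 1 << (8 * quarter + i)
    return mask
```

With 8 doubles per operand, cmp_ge_df_dest() returns 16 dwords (two SIMD8 registers' worth in this model), and subscript_dwords() brings that back down to the 8 dwords you would expect. Calling flag_bits() once per quarter and OR-ing the two masks shows why the second CMP in the example does not clobber the first one if quarter control behaves as assumed.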