Paul Berry <stereotype...@gmail.com> writes:

> Previously, we computed dFdy() using the following instruction:
>
>   add(8) dst<1>F src<4,4,0>F -src.2<4,4,0>F { align1 1Q }
>
> That had the disadvantage that it computed the same value for all 4
> pixels of a 2x2 subspan, which meant that it was less accurate than
> dFdx().  This patch changes it to the following instruction when
> c->key.high_quality_derivatives is set:
>
>   add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
>
> This gives it comparable accuracy to dFdx().
>
> Unfortunately, for some reason the SIMD16 version of this instruction:
>
>   add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }
>
> Doesn't seem to work reliably (presumably the hardware designers never
> validated the combination of align16 mode with compressed
> instructions), so we unroll it to:

From the gen4 PRM vol4, page 340:

    "A compressed instruction must be in Align1 access mode. Align16
    mode instructions cannot be compressed."

Other than updating the comment about compressed instructions to cite
the PRM, this is:

Reviewed-by: Eric Anholt <e...@anholt.net>

Thanks for figuring this out.
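
For readers following along, here is a rough scalar model of what the two
add(8) forms quoted above compute. It is only a sketch, not code from the
patch. It assumes each group of four channels holds one 2x2 subspan in the
order top-left, top-right, bottom-left, bottom-right, and the names
dfdy_coarse, dfdy_fine and dfdy_selfcheck are made up for illustration. The
sign convention simply follows the instructions as written.

```c
#include <assert.h>

enum { SUBSPAN = 4, SIMD8 = 8 };

/* Models: add(8) dst<1>F src<4,4,0>F -src.2<4,4,0>F { align1 1Q }
 * A <4,4,0> region replicates one element across each group of four,
 * so every pixel of a subspan gets (channel 0 - channel 2). */
static void dfdy_coarse(const float src[SIMD8], float dst[SIMD8])
{
    for (int i = 0; i < SIMD8; i++) {
        int base = (i / SUBSPAN) * SUBSPAN;
        dst[i] = src[base + 0] - src[base + 2];
    }
}

/* Models: add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
 * The .xyxy and .zwzw swizzles pair each pixel with the one in the same
 * column of the other row, so each column gets its own difference. */
static void dfdy_fine(const float src[SIMD8], float dst[SIMD8])
{
    for (int i = 0; i < SIMD8; i++) {
        int base = (i / SUBSPAN) * SUBSPAN;
        int col = i % 2; /* 0 selects x/z, 1 selects y/w */
        dst[i] = src[base + col] - src[base + 2 + col];
    }
}

/* Small check on one register holding two subspans (assumed layout). */
static void dfdy_selfcheck(void)
{
    const float src[SIMD8] = { 1, 2, 4, 8,   10, 20, 10, 25 };
    float coarse[SIMD8], fine[SIMD8];

    dfdy_coarse(src, coarse);
    dfdy_fine(src, fine);

    /* Coarse: one value per subspan. */
    assert(coarse[0] == coarse[1] && coarse[1] == coarse[3]);
    /* Fine: the two columns of the first subspan differ. */
    assert(fine[0] == -3.0f && fine[1] == -6.0f);
}
```

With the coarse form, the right-hand column of a subspan reuses the
left-hand column's difference, which is the accuracy loss the patch
description mentions. The fine form costs the same single instruction in
SIMD8 but needs align16 swizzles.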

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev