On 29.10.2015 at 20:33, Oded Gabbay wrote:
> On Thu, Oct 29, 2015 at 9:25 PM, Roland Scheidegger <srol...@vmware.com>
> wrote:
>> On 29.10.2015 at 20:10, Oded Gabbay wrote:
>>> On Thu, Oct 29, 2015 at 9:02 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>> On Thu, Oct 29, 2015 at 2:44 PM, Oded Gabbay <oded.gab...@gmail.com> wrote:
>>>>> However, I would hate to keep the situation as is, meaning the test
>>>>> passes on x86-64 and fails on ppc64le.
>>>>
>>>> Sounds like it'd actually be a difference between AVX and SSE4.2 as
>>>> well -- what happens if you run on your x86_64 chip with
>>>> LP_NATIVE_VECTOR_WIDTH=128 ? It fails for me on my HSW chip, looking
>>>> at the results visually it's mostly good but there's a sprinkling of
>>>> red pixels.
>>>>
>>>> -ilia
>>>
>>> It fails on my Haswell chip - definitely sprinkling of red pixels.
>>> Also the error seems to be greater than 5e-7. Even with 1.6e-6 as
>>> failure point, it still fails, while on ppc64le it passes.
>>> Only when I went for 2e-6, the test passes.
>>>
>>> As I said and Roland explained, the calculation method is inherently
>>> less accurate in the two-stages path. Although I don't know why on
>>> SSE4.2 the deviation is a bit larger than on VMX
>>>
>>
>> Does that have fma and does it auto-fuse mul/add to fma? Albeit I don't
>> think it should right now... Other than that, I'm not sure why the
>> results would be different - albeit on x86 we explicitly disable denorms
>> which could cause different results, for this example I don't think this
>> should be an issue.
>>
>> Roland
>>
>>
>
> You asked about ppc64le ?
> If so, It does have fma instructions in its ISA, both for floating
> point and for vector. However, if I look in the LLVM IR, I don't see
> any fma intrinsic and if I look at the disassemble, I don't see any
> instructions of multiply-and-add.
>
> So in theory it exists but it isn't used in the tests I mentioned.

Ok that's as expected then.
It just isn't always very obvious whether auto-fusing is allowed or not (unsafe math of course always allows it). Maybe there are some more differences in the FPU environments that I can't remember...
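
For illustration only, here is a minimal sketch of why fusing mul/add into an fma can change results at all. It is not taken from the llvmpipe code or the failing test: the function names and input values are made up, and the fused path is emulated with exact rational arithmetic rather than a hardware fma instruction.

```python
from fractions import Fraction


def unfused_mul_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    product = a * b      # first rounding (to double)
    return product + c   # second rounding


def fused_mul_add(a, b, c):
    """Emulated fma (assumption: stands in for a hardware fma).

    a*b+c is computed exactly as a rational number and rounded to double once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding


# Hypothetical inputs chosen so that the exact product, 1 - 2**-60,
# is not representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused_mul_add(a, b, c))  # the rounded product cancels against c
print("fused:  ", fused_mul_add(a, b, c))    # the tiny exact residue survives
```

With these inputs the unfused path returns exactly 0.0 while the emulated fused path returns -2**-60, so the same expression can differ between targets depending on whether the compiler is allowed to contract it.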

Roland
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev