On Thu, Oct 29, 2015 at 9:25 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 29.10.2015 at 20:10, Oded Gabbay wrote:
>> On Thu, Oct 29, 2015 at 9:02 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>> On Thu, Oct 29, 2015 at 2:44 PM, Oded Gabbay <oded.gab...@gmail.com> wrote:
>>>> However, I would hate to keep the situation as is, meaning the test
>>>> passes on x86-64 and fails on ppc64le.
>>>
>>> Sounds like it'd actually be a difference between AVX and SSE4.2 as
>>> well -- what happens if you run on your x86_64 chip with
>>> LP_NATIVE_VECTOR_WIDTH=128 ? It fails for me on my HSW chip, looking
>>> at the results visually it's mostly good but there's a sprinkling of
>>> red pixels.
>>>
>>> -ilia
>>
>> It fails on my Haswell chip - definitely sprinkling of red pixels.
>> Also the error seems to be greater than 5e-7. Even with 1.6e-6 as
>> failure point, it still fails, while on ppc64le it passes.
>> Only when I went for 2e-6, the test passes.
>>
>> As I said and Roland explained, the calculation method is inherently
>> less accurate in the two-stages path. Although I don't know why on
>> SSE4.2 the deviation is a bit larger than on VMX
>>
>
> Does that have fma and does it auto-fuse mul/add to fma? Albeit I don't
> think it should right now... Other than that, I'm not sure why the
> results would be different - albeit on x86 we explicitly disable denorms
> which could cause different results, for this example I don't think this
> should be an issue.
>
> Roland
You asked about ppc64le? If so, it does have FMA instructions in its ISA, both for scalar floating point and for vector. However, I don't see any fma intrinsic in the LLVM IR, and the disassembly shows no multiply-add instructions. So in theory FMA is available, but it isn't used in the tests I mentioned.

Oded

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev