On 23 August 2017 at 14:31, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 23.08.2017 15:26, Emil Velikov wrote:
>>
>> On 23 August 2017 at 13:23, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>>>
>>> On 23.08.2017 13:07, Elie Tournier wrote:
>>>>
>>>> From: Elie Tournier <elie.tourn...@collabora.com>
>>>>
>>>> TL;DR
>>>> This series is a "status update" on my work adding fp64 support
>>>> to r600g.
>>>> One of the biggest issues is the lack of accuracy of the rcp
>>>> implementation.
>>>> Divide relies on rcp.
>>>>
>>>> A branch is available at
>>>> https://github.com/Hopetech/mesa/tree/glsl_arb_gpu_shader_fp64_v3
>>>> Comments and reviews are welcome.
>>>>
>>>> Patches 1-18:
>>>> These few patches implement the basic fp64 operations.
>>>>
>>>> Patches 19-47:
>>>> Lower operations using the builtin functions previously implemented.
>>>>
>>>> Known issues:
>>>> - operations on matrices crash the system.
>>>> - sqrt and d2f are not accurate enough, so the piglit tests are
>>>>   failing.
>>>>   But sqrt and d2f work correctly using softpipe.
>>>>   However, implementing sqrt64 as f2d(sqrt32(d2f(x))) seems to be
>>>>   good enough for Piglit.
>>>> - rcp is defined as pow(pow(x, -0.5), 2)
>>>>   NIR and NV convert the input to fp32, perform an rcp, convert back
>>>>   to fp64 and perform some Newton-Raphson steps.
>>>>   This is not possible with GLSL IR because using fma will generate a
>>>>   massive builtin_float64.h file.
>>>
>>>
>>> I don't understand this part. You need multiplication and addition
>>> anyway.
>>> So if it's only fma which is the problem (why?), then why not just use
>>> non-fused multiply-add? It may end up being slightly less accurate, but
>>> we don't give any strong guarantees about rcp accuracy anyway, do we?
>>>
>> Pardon for dropping it like that. I'll try to explain things in a
>> slightly different way.
>>
>> Due to the fp64 <> fp32 conversion the accuracy of RCP is pretty bad.
>>
>> Thus a couple of Newton-Raphson steps are used. Each one is implemented
>> via fma.
>> There's no native fma, so we use a normal multiply and add.
>>
>> As those get added to the generated file of built-ins
>> (builtin_float64.h), it grows by ~20k LoC, making compilation/linking
>> quite slow.
>> It also noticeably bloats the final binary size (Elie has some crazy
>> numbers from the very first experiments).
>
>
> Oh, I think I get it now. The issue is that the mul+add gets inlined into
> the rcp in builtin_float64.h?
Precisely. Note that pretty much _everything_ gets inlined, which is
why the file is so big at the moment (~20k LoC).

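[Editorial sketch, not part of the original mail.] The refinement scheme
discussed above (fp32 seed, then Newton-Raphson steps built from a plain
multiply and add rather than a fused fma) can be illustrated roughly as
follows. This is only a minimal sketch in Python, not the Mesa or
builtin_float64.h code. The helper names to_f32 and rcp_refined are
hypothetical, fp32 is emulated by rounding through the struct module, and
inputs are assumed to be finite, non-zero and within fp32 range.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rcp_refined(x: float, steps: int = 2) -> float:
    """Approximate 1/x from an fp32 seed refined by Newton-Raphson steps in fp64.

    Assumes x is finite, non-zero and representable in fp32 range.
    """
    # Seed: convert the input to fp32, take the reciprocal, round back to fp32.
    y = to_f32(1.0 / to_f32(x))
    for _ in range(steps):
        # One Newton-Raphson step for f(y) = 1/y - x:
        #   y <- y + y * (1 - x * y)
        # written with separate multiplies and adds (no fused multiply-add).
        e = 1.0 - x * y
        y = y + y * e
    return y


if __name__ == "__main__":
    for steps in (0, 1, 2):
        approx = rcp_refined(3.0, steps)
        print(steps, approx, abs(approx * 3.0 - 1.0))
```

Each step roughly squares the relative error, so a seed that is only good
to fp32 precision gets close to fp64 precision after two steps in this
sketch. Whether two steps are enough for the Piglit tolerances on r600g is
not something this sketch can answer.
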
> Can that be avoided?
AFAICT that's not possible atm.

> Although I guess that
> just bloats the final shader, to questionable effects...
>
Haven't looked at the final shader - Elie should have some numbers here.
At some point the binary size of generate_ir.cpp (the one that
includes builtin_float64.h) was ~1/3 of the total driver size.

> Thanks for helping me get it :)
>
Yw. I'm pretty sure Elie will correct me, since I'm no expert in
this stuff.
Just helping him see the light [at the end of the tunnel].

-Emil
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev