On 23.08.2017 15:26, Emil Velikov wrote:
On 23 August 2017 at 13:23, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
On 23.08.2017 13:07, Elie Tournier wrote:

From: Elie Tournier <elie.tourn...@collabora.com>

TL;DR
This series is a "status update" of my work on adding fp64 support
to r600g.
One of the biggest issues is the lack of accuracy of the rcp
implementation, and division relies on rcp.
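Here is a minimal Python sketch of why the rcp accuracy matters for division. It assumes that division is lowered as a * rcp(b) and that rcp is only good to fp32 precision. The helper names (to_f32, rcp_fp32, div_via_rcp) are hypothetical and do not come from the series. d2f is emulated by round-tripping through struct's 32-bit float format.

```python
import struct


def to_f32(x: float) -> float:
    """Round an fp64 value to the nearest fp32 value (emulates d2f)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rcp_fp32(x: float) -> float:
    """Hypothetical rcp that is only accurate to fp32, widened back to fp64."""
    return to_f32(1.0 / to_f32(x))


def div_via_rcp(a: float, b: float) -> float:
    """Division lowered as a * rcp(b), so it inherits rcp's error."""
    return a * rcp_fp32(b)


q = div_via_rcp(1.0, 3.0)
rel_err = abs(q - 1.0 / 3.0) / (1.0 / 3.0)
print(f"q = {q!r}, relative error = {rel_err:.1e}")
```

With b = 3.0, the quotient has a relative error of roughly 3e-8. That is fp32-level accuracy, far from the roughly 1e-16 an fp64 division should deliver.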

A branch is available on
https://github.com/Hopetech/mesa/tree/glsl_arb_gpu_shader_fp64_v3
Comments and reviews are welcome.

Patches 1-18:
These few patches implement the basic fp64 operations.

Patches 19-47:
Lower operations using the builtin functions previously implemented.

Known issues:
- operations on matrices crash the system.
- sqrt and d2f are not accurate enough, so the piglit tests are failing.
    But sqrt and d2f work correctly using softpipe.
    However, implementing sqrt64 as f2d(sqrt32(d2f(x))) seems to be good
enough for Piglit.
- rcp is defined as pow(pow(x, -0.5), 2).
    NIR and NV convert the input to fp32, compute an rcp, convert back to
fp64 and then apply some Newton-Raphson steps.
    This is not possible with GLSL IR because using fma would generate a
massive builtin_float64.h file.
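The two workarounds from the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: math.sqrt and math.pow stand in for the hardware sqrt32 and pow, d2f/f2d are emulated with struct, and the function names are hypothetical. The sketch shows that the sqrt64 workaround keeps only about fp32 precision. It also shows that the pow-based identity 1/x = (x^-0.5)^2 only holds for x > 0. For a negative base with exponent -0.5, Python's math.pow raises ValueError.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round an fp64 value to the nearest fp32 value (emulates d2f)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt64_via_fp32(x: float) -> float:
    """sqrt64 as f2d(sqrt32(d2f(x))): only about fp32 precision survives."""
    return to_f32(math.sqrt(to_f32(x)))


def rcp_via_pow(x: float) -> float:
    """rcp as pow(pow(x, -0.5), 2); the identity only holds for x > 0."""
    return math.pow(math.pow(x, -0.5), 2)


s = sqrt64_via_fp32(2.0)
print(f"sqrt64_via_fp32(2.0) relative error = {abs(s - math.sqrt(2.0)) / math.sqrt(2.0):.1e}")
print(f"rcp_via_pow(4.0) = {rcp_via_pow(4.0)!r}")
try:
    rcp_via_pow(-4.0)
except ValueError:
    print("rcp_via_pow(-4.0) is undefined: negative base with exponent -0.5")
```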


I don't understand this part. You need multiplication and addition anyway.
So if it's only fma which is the problem (why?), then why not just use
non-fused multiply-add? It may end up being slightly less accurate, but we
don't give any strong guarantees about rcp accuracy anyway, do we?
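Here is a minimal Python sketch of the NIR/NV-style approach using a non-fused multiply-add, as suggested above. It assumes an fp32 reciprocal seed, f2d(rcp32(d2f(x))), refined with the usual iteration r = r + r * (1 - x * r). Each step roughly squares the relative error, so two steps take the roughly 24-bit seed to fp64 precision. The name rcp64 and the default step count are illustrative and are not taken from the branch.

```python
import struct


def to_f32(x: float) -> float:
    """Round an fp64 value to the nearest fp32 value (emulates d2f)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rcp64(x: float, steps: int = 2) -> float:
    """fp64 reciprocal: fp32 seed refined by Newton-Raphson with non-fused mul/add.

    Seed: f2d(rcp32(d2f(x))), roughly 24 correct bits.
    Step: r = r + r * (1 - x * r); each step roughly doubles the correct bits.
    """
    r = to_f32(1.0 / to_f32(x))  # fp32 seed, widened back to fp64
    for _ in range(steps):
        e = 1.0 - x * r  # residual; x * r is rounded on its own (no fma)
        r = r + r * e    # second multiply and add, also rounded separately
    return r


for x in (3.0, 0.1, 7.5, 123456.789):
    for steps in (0, 1, 2):
        rel_err = abs(rcp64(x, steps) - 1.0 / x) * abs(x)
        print(f"x={x!r} steps={steps} relative error={rel_err:.1e}")
```

In this sketch, two non-fused steps bring the relative error below 1e-15 for the sample inputs. Note that the residual uses the original fp64 x, not the rounded fp32 input.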

Pardon me for dropping it like that. I'll try to explain things in a
slightly different way.

Due to the fp64 <> fp32 conversion, the accuracy of RCP is pretty bad.

Thus a couple of Newton-Raphson steps are used, each one implemented via fma.
There's no native fma, so we use a normal multiply and add.
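Here is a small Python sketch of what is lost without a native fma. It compares the Newton-Raphson residual 1 - x * r computed with a separate multiply and add against the same residual computed with a single rounding. Python only gained math.fma in 3.13, so the fused version is emulated exactly with fractions.Fraction. The helper names are hypothetical.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """Reference fused multiply-add: a * b + c with a single rounding.

    Emulated with exact rational arithmetic; hardware without a native fma
    has to round the product before the add instead.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def residual_mul_add(x: float, r: float) -> float:
    """1 - x * r with separate multiply and add (two roundings)."""
    return 1.0 - x * r


def residual_fma(x: float, r: float) -> float:
    """1 - x * r with one rounding, as fma(-x, r, 1.0) would give."""
    return fma_exact(-x, r, 1.0)


x = 3.0
r = 1.0 / x  # correctly rounded fp64 reciprocal of 3
print(residual_mul_add(x, r))  # product rounds to exactly 1.0, residual is lost
print(residual_fma(x, r))      # keeps the tiny residual 2**-54
```

For x = 3.0 and r = 1/3 rounded to fp64, the exact product is 1 - 2**-54, which rounds to 1.0, so the non-fused residual is 0.0. The fused residual keeps 2**-54. Near convergence, the non-fused steps therefore lose the last correction bits, which is the "slightly less accurate" trade-off discussed above.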

As those get added to the generated file of built-ins
(builtin_float64.h), it grows by ~20k LoC, making compilation/linking
quite slow.
It noticeably bloats the final binary size as well (Elie has some crazy
numbers from the very first experiments).

Oh, I think I get it now. The issue is that the mul+add gets inlined into the rcp in builtin_float64.h? Can that be avoided? Although I guess that just bloats the final shader, to questionable effects...

Thanks for helping me get it :)

Cheers,
Nicolai


-Emil



--
Learn what the world is really like,
but never forget how it ought to be.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
