This is the second version of the fp64 precision series, including
fixes as per Ilia's advice.

The first patch should be functionally equivalent to the previous
version. The changes mostly focus on code cleanup and rewording
comments. The second patch fixes a case where the original patch would
generate inaccurate rsq results for some small normal inputs. The third
one is unchanged.

I ran more tests on these two algorithms, comparing their results with
the CPU implementation. I have never seen more than a 1ulp difference
in rcp. In rsq, there were some cases (~500ppm) with a 2ulp difference.
However, analysis with mpfr shows that in all of those cases both sides
(the GPU and the CPU results) had 1ulp error. So the precision should
now satisfy the requirement.

The assembly uses an instruction format that is yet to be merged into
the upstream envytools assembler. I'll get that merged soon.

Boyan Ding (3):
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Add rsq f64 implementation
  gk110/ir: Use the new rcp/rsq in library

 src/gallium/drivers/nouveau/codegen/lib/gk110.asm  | 219 ++++++++++++++++++++-
 .../drivers/nouveau/codegen/lib/gk110.asm.h        | 127 +++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |  32 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   1 +
 4 files changed, 375 insertions(+), 4 deletions(-)

-- 
2.12.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
