On 12.01.2017 09:24, Samuel Pitoiset wrote:
On 01/12/2017 02:12 AM, Marek Olšák wrote:
On Thu, Jan 12, 2017 at 12:33 AM, Ilia Mirkin <imir...@alum.mit.edu>
wrote:
On Wed, Jan 11, 2017 at 4:00 PM, Roland Scheidegger
<srol...@vmware.com> wrote:
On 11.01.2017 at 21:08, Samuel Pitoiset wrote:
On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
I don't think there's any GLSL specification, ES or otherwise, which
would require denorms (since obviously lots of hw can't do it, and d3d10
forbids them), with any precision qualifier. Hence these look like bugs
in the test suite to me?
(Irrespective of whether it's a good idea or not to enable denormals,
which I don't really know.)
That test works on NVIDIA hw (with both the blob and nouveau) and IIRC it
also works on Intel hw. I don't think it's buggy there.
The question then is why it needs denorms on radeons...
I spent some time with Samuel looking at this. So, this is pretty
funny... (or at least feels that way after staring at floating point
for a while)
dEQP is, in fact, feeding denorms to the min/max functions. But it's
smart enough to know that flushing denorms to 0 is OK, and so it
treats a 0 as a pass. (And obviously it treats the "right answer" as a
pass.) So that's why enabling denorm processing fixes it - that causes
the hw to return the correct answer and all is well.
However, the issue is that without denorm processing, the hw is
returning the *wrong* answer. At first I thought that max was being
lowered into something like
if (a > b) { x = a; } else { x = b; }
which would end up with potentially wrong results if a and b are being
flushed as inputs into the comparison but not into the assignments.
But that's not (explicitly) what's happening - the v_max_f32_e32
instruction is being used. Perhaps that's what it does internally? If
so, that means that the results of affected float operations in LLVM
need explicit flushing before being stored.
FWIW the specific values triggering the issue are:
in0=-0x0.000002p-126, in1=-0x0.fffffep-126, out0=-0x0.fffffep-126 ->
FAIL
With denorm processing, it correctly reports out0=-0x0.000002p-126,
while nouveau with denorm flushing enabled reports out0=0.0 which also
passes.
The denorm configuration has 2 bits:
- flush (0) or allow (1) input denorms
- flush (0) or allow (1) output denorms
In the case of v_max, it looks like output denorms are not flushed and
it behaves almost like you said:
if (a >= b) { x = a; } else { x = b; }
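To make that concrete, here is a host-side C model of those suspected
semantics (only a sketch, not the actual hardware implementation; ftz()
stands in for the input-flush stage, and the tie-break/operand order is
a guess):

#include <math.h>

/* Flush denorms to zero, preserving the sign (models the input stage). */
static float ftz(float v)
{
    return fpclassify(v) == FP_SUBNORMAL ? copysignf(0.0f, v) : v;
}

/* v_max_f32 with input flushing but no output flushing: the comparison
 * sees the flushed values, but the selected operand is returned as-is. */
static float model_v_max_f32(float a, float b)
{
    return ftz(a) >= ftz(b) ? a : b;
}

With the failing inputs above, both operands flush to -0.0, so the
comparison ties, and whichever operand the tie-break selects is returned
un-flushed. If that happens to be in1 (e.g. due to operand order in the
instruction), you get exactly the reported out0=-0x0.fffffep-126 instead
of the correct -0x0.000002p-126.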
Should we adjust the denorm mode with s_setreg for v_max_f32/v_min_f32?
That might eliminate some optimization opportunities, so let's first see
if another fix is possible.
I haven't run the test, but from the description the most plausible
explanation is that v_max_f32/v_min_f32 flushes input denorms, but
doesn't flush output denorms for some stupid reason. Perhaps we could
change the fp_denorm setting to:
- allow input denorms
- flush output denorms
Then min/max will preserve the denorms, but other operations will flush
denorms to zero.
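As a sketch (reusing ftz() from the model above, and assuming that
v_max/v_min never flush their output while ordinary VALU ops honor the
output-flush bit):

/* Proposed mode: allow input denorms, flush output denorms. */
static float model_v_add_f32(float a, float b)
{
    return ftz(a + b);     /* ordinary op: result is flushed */
}

static float model_v_max_f32_newmode(float a, float b)
{
    return a >= b ? a : b; /* inputs seen exactly; output not flushed */
}

The comparison then sees the true values, -0x0.000002p-126 >
-0x0.fffffep-126 holds, and the correct denorm operand is passed
through untouched.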
Do you know how that affects v_mad_f32? If we just change the register
without telling LLVM about it, LLVM will still happily emit v_mad_f32,
and perhaps that produces incorrect results when denorms are passed in
from uniforms?
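For example (hypothetical values, assuming v_mad_f32 honors the
input-denorm bit): a denorm scaled by a large constant underflows to
zero in flush-input mode, but becomes a perfectly normal number in
allow-input mode, so output flushing wouldn't hide the difference:

#include <stdio.h>

int main(void)
{
    float d = 0x1p-149f;              /* denorm, e.g. loaded from a uniform */
    float flushed = 0x1p100f * 0.0f;  /* flush-input mode: d flushed to 0 first */
    float allowed = 0x1p100f * d;     /* allow-input mode */
    printf("%a vs %a\n", flushed, allowed); /* 0x0p+0 vs 0x1p-49 */
    return 0;
}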
If this register setting doesn't work, then yes, it looks like s_setreg
may be needed. Unless there's a cheap way to flush denorms from loads as
well? But I don't think there is.
Nicolai
Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev