I certainly don't want to deter you from pushing a quick fix now.

If possible I'd just like any further information you might have collected while debugging this issue in the hope we can figure out a more comprehensive solution should be.

Because I'm sure other drivers (including VMware's svga and llvmpipe) will face the exact same issue.

Jose


On 04/12/14 19:49, Marek Olšák wrote:
I agree that this should be fixed properly, but my time is limited at
the moment. We know there's a problem with our shader backend, because
there are issues with other games too. We can fix it later and we
certainly won't forget about it.

Note that r600g is using a lot of these non-standard legacy
instructions (like dx9 MUL where x*0=0, clamped RCP, clamped RSQ),
while radeonsi is mostly using the IEEE versions. I'd say radeonsi is
not so horrible yet, but yeah the situation is very messy.

If we want a driver to use IEEE behavior for all instructions, the
upper layers that also transform and optimize shaders must be ready
for it and must handle NaNs and Infs correctly. For starters, !(x > y)
is not equivalent to (x <= y), so the few comparison instructions that
we have currently aren't enough.

Marek

On Thu, Dec 4, 2014 at 8:03 PM, Roland Scheidegger <srol...@vmware.com> wrote:
The problem is, while glsl was quite lenient in older versions wrt
precision or things like div by zero, starting with glsl 4.10 you are
_required_ to return +-/Inf for divs by zero. Hence if you use hacks
here to make some app run, you _will_ have to fix that properly in the
future in any case.
(That said, somewhat related coming up from time to time, we don't have
a way to express "legacy" vs. "modern" behavior in tgsi - glsl may have
been indifferent at least in theory, but d3d9 was not and legacy
arb_fp/vp also might have expected the no-nans, no-infs, rcp(0) = 0
(though the spec doesn't tell that) and so on - it is unclear if this
should be done in tgsi (different instructions? shader property?) or
just translated away in state trackers if legacy behavior is expected,
though the latter is no good for older drivers. But just expecting
legacy behavior in backends is NOT feasible. RCP might be something of a
special case because modern apis (be it glsl, opencl, d3d10) don't have
it so legacy behavior may be expcted, but since it's used to lower other
instructions this isn't really true neither.)

RCP_CLAMP I guess looks somewhat better (because it's "closer" to the
real result) but does not solve that issue neither.

Like I said though, I have no idea if this issue is really coming from
the app, or somehow the div lowering or other optimization somewhere (it
looks like since both returning zero or +MAX_FLT work I guess it's
really the NaN turning up at some point when probably doing a mul of it
with zero or something like that either in the shader or even blend is
the problem why things are screwed in the end).

I don't really object this patch though, you can do in the driver
whatever you want, but this is something which I feel should be figured
out for real.

Roland

Am 04.12.2014 um 18:52 schrieb Marek Olšák:
Returning FLT_MAX instead of 0 also works, which is similar to another
hw instruction: V_RCP_CLAMP_F32.

The Unreal engine is a pretty big target with a lot of apps out there.
I'm afraid a driconf option isn't feasible.

Marek

On Thu, Dec 4, 2014 at 5:39 PM, Roland Scheidegger <srol...@vmware.com> wrote:
Hmm I have to say I'm not really convinced of that solution. Because all
divs are lowered, this will screw the results of all divs (if the rcp
would come from a legacy arb_fp rcp that would be different and quite
possible some apps depending on it, problems like that are very common
for d3d9 apps too). But really there's some expectations stuff conforms
to ieee754 rules these days, and making divs by zero return 0 ain't so hot.
Maybe it's the div lowering itself which causes this, in which case it
should probably be disabled? Might be a good idea anyway (if the driver
supports native div) since rcp isn't accurate usually.
Difficult to tell though without seeing the glsl and tgsi shader. But if
it was really the app expecting zero out of a div by zero (but I have
doubts about that), I'd certainly classify that as an app bug, and any
workarounds only be enabled by some dri conf option.

Roland



Am 04.12.2014 um 13:34 schrieb Marek Olšák:
From: Marek Olšák <marek.ol...@amd.com>

Discussion: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D83510-23c8&d=AAIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=HWErIxbQoflHYrNCrKP9DUUVY69hrn9DXmH6mjo7TRE&s=djKOxL0S3qmyGMXZQcCFe2SkcTCdbpIrvB_EaeA2Xbc&e=
---
  src/gallium/drivers/radeonsi/si_shader.c | 27 +++++++++++++++++++++++++++
  1 file changed, 27 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 973bac2..e0799c9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2744,6 +2744,32 @@ static int si_generate_gs_copy_shader(struct si_screen 
*sscreen,
       return r;
  }

+/**
+ * This emulates V_RCP_LEGACY_F32, which has the following rule for division
+ * by zero: 1 / 0 = 0
+ *
+ * V_RCP_F32(x) = 1 / x
+ * V_RCP_LEGACY_F32(x) = (x != +-0) ? V_RCP_F32(x) : 0.
+ */
+static void si_llvm_emit_rcp_legacy(const struct lp_build_tgsi_action * action,
+                                 struct lp_build_tgsi_context * bld_base,
+                                 struct lp_build_emit_data * emit_data)
+{
+     LLVMValueRef cmp =
+             lp_build_cmp(&bld_base->base,
+                          PIPE_FUNC_NOTEQUAL,
+                          emit_data->args[0],
+                          bld_base->base.zero);
+
+     LLVMValueRef div =
+             lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
+                                       bld_base->base.one,
+                                       emit_data->args[0]);
+
+     emit_data->output[emit_data->chan] =
+             lp_build_select(&bld_base->base, cmp, div, bld_base->base.zero);
+}
+
  int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
  {
       struct si_shader_selector *sel = shader->selector;
@@ -2798,6 +2824,7 @@ int si_shader_create(struct si_screen *sscreen, struct 
si_shader *shader)
               bld_base->op_actions[TGSI_OPCODE_MIN].emit = 
build_tgsi_intrinsic_nomem;
               bld_base->op_actions[TGSI_OPCODE_MIN].intr_name = 
"llvm.minnum.f32";
       }
+     bld_base->op_actions[TGSI_OPCODE_RCP].emit = si_llvm_emit_rcp_legacy;

       si_shader_ctx.radeon_bld.load_system_value = declare_system_value;
       si_shader_ctx.tokens = sel->tokens;



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=AAIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=zfmBZnnVGHeYde45pMKNnVyzeaZbdIqVLprmZCM2zzE&m=FiE6oNGxIBhAhWYTizfUU0U-CMLJWgwTBa8HHlEwmYk&s=ix4zRbhUVrkjpEX2HelK0L1X-NFxAVdGALq9vyUv7eo&e=


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to