I'm also concerned this sort of ad-hoc re-interpretation of opcode semantics will come back to bite us later, as different state trackers might want different semantics.

I think we might need to redefine the TGSI_OPCODE_RCP opcode or introduce a TGSI_OPCODE_RCP_LEGACY opcode.

Also, do we know exactly what shape this division by zero takes in the incoming GLSL shader? For example, could this be a 0/0 caused by calling GLSL's normalize() on a zero-length vector, and should this special RCP be used exclusively in the lowering of that built-in function?

Jose


On 04/12/14 17:52, Marek Olšák wrote:
Returning FLT_MAX instead of 0 also works, which is similar to another
hw instruction: V_RCP_CLAMP_F32.
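
A rough C sketch of what a clamped reciprocal could look like, assuming IEEE 754 floats and assuming "clamp" means that an infinite result is replaced by +-FLT_MAX. The exact V_RCP_CLAMP_F32 hardware behaviour is not verified here, and rcp_clamp and rcp_clamp_demo are illustrative names only. One possible reason this also works is that 0 * FLT_MAX is 0, whereas 0 * inf is NaN.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Assumed semantics: compute 1/x, then replace +-inf by +-FLT_MAX. */
static float rcp_clamp(float x)
{
    float r = 1.0f / x;
    if (isinf(r))
        return copysignf(FLT_MAX, r);
    return r;
}

/* Usage: the result stays finite, so multiplying it by 0 gives 0. */
void rcp_clamp_demo(void)
{
    float zero = 0.0f;

    assert(rcp_clamp(zero) == FLT_MAX);
    assert(zero * rcp_clamp(zero) == 0.0f);
    assert(isnan(zero * (1.0f / zero))); /* unclamped: 0 * inf = NaN */
}
```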

The Unreal engine is a pretty big target with a lot of apps out there.
I'm afraid a driconf option isn't feasible.

Marek

On Thu, Dec 4, 2014 at 5:39 PM, Roland Scheidegger <srol...@vmware.com> wrote:
Hmm, I have to say I'm not really convinced by that solution. Because all
divs are lowered, this will screw up the results of all divs. (If the rcp
came from a legacy arb_fp rcp, that would be different, and quite
possibly some apps depend on it; problems like that are very common
for d3d9 apps too.) But these days there is some expectation that things
conform to IEEE 754 rules, and making divs by zero return 0 isn't so hot.
Maybe it's the div lowering itself which causes this, in which case it
should probably be disabled? That might be a good idea anyway (if the driver
supports native div), since rcp usually isn't accurate.
It is difficult to tell, though, without seeing the GLSL and TGSI shaders. But if
it really was the app expecting zero out of a div by zero (and I have
doubts about that), I'd certainly classify that as an app bug, and any
workaround should only be enabled by some driconf option.
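
A small C sketch of the concern about lowered divs, assuming IEEE 754 floats and a lowering of a / b to a * rcp(b). The helper names div_lowered and div_lowering_demo are hypothetical; rcp_legacy follows the rule in the patch comment.

```c
#include <assert.h>
#include <math.h>

/* Legacy rule from the patch comment: 1/0 = 0. */
static float rcp_legacy(float x) { return x != 0.0f ? 1.0f / x : 0.0f; }

/* Division lowered to a multiply by the reciprocal. */
static float div_lowered(float a, float b) { return a * rcp_legacy(b); }

/* Usage: with the legacy reciprocal every division by zero yields 0,
 * where IEEE 754 division would yield +-inf or NaN. */
void div_lowering_demo(void)
{
    float zero = 0.0f;

    assert(isinf(1.0f / zero));              /* IEEE 754: 1/0 = +inf */
    assert(isnan(zero / zero));              /* IEEE 754: 0/0 = NaN  */
    assert(div_lowered(1.0f, zero) == 0.0f); /* lowered:  1/0 = 0    */
    assert(div_lowered(zero, zero) == 0.0f); /* lowered:  0/0 = 0    */
    assert(div_lowered(8.0f, 2.0f) == 4.0f); /* ordinary case        */
}
```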

Roland



On 04.12.2014 at 13:34, Marek Olšák wrote:
From: Marek Olšák <marek.ol...@amd.com>

Discussion: https://bugs.freedesktop.org/show_bug.cgi?id=83510#c8
---
  src/gallium/drivers/radeonsi/si_shader.c | 27 +++++++++++++++++++++++++++
  1 file changed, 27 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 973bac2..e0799c9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2744,6 +2744,32 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
       return r;
  }

+/**
+ * This emulates V_RCP_LEGACY_F32, which has the following rule for division
+ * by zero: 1 / 0 = 0
+ *
+ * V_RCP_F32(x) = 1 / x
+ * V_RCP_LEGACY_F32(x) = (x != +-0) ? V_RCP_F32(x) : 0.
+ */
+static void si_llvm_emit_rcp_legacy(const struct lp_build_tgsi_action * action,
+                                 struct lp_build_tgsi_context * bld_base,
+                                 struct lp_build_emit_data * emit_data)
+{
+     LLVMValueRef cmp =
+             lp_build_cmp(&bld_base->base,
+                          PIPE_FUNC_NOTEQUAL,
+                          emit_data->args[0],
+                          bld_base->base.zero);
+
+     LLVMValueRef div =
+             lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
+                                       bld_base->base.one,
+                                       emit_data->args[0]);
+
+     emit_data->output[emit_data->chan] =
+             lp_build_select(&bld_base->base, cmp, div, bld_base->base.zero);
+}
+
  int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
  {
       struct si_shader_selector *sel = shader->selector;
@@ -2798,6 +2824,7 @@ int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
               bld_base->op_actions[TGSI_OPCODE_MIN].emit = build_tgsi_intrinsic_nomem;
               bld_base->op_actions[TGSI_OPCODE_MIN].intr_name = "llvm.minnum.f32";
       }
+     bld_base->op_actions[TGSI_OPCODE_RCP].emit = si_llvm_emit_rcp_legacy;

       si_shader_ctx.radeon_bld.load_system_value = declare_system_value;
       si_shader_ctx.tokens = sel->tokens;


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

