On Mon, Mar 2, 2015 at 4:55 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 02.03.2015 at 12:52, Marek Olšák wrote:
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> Needed by ARB_gpu_shader5.
>> ---
>>  src/gallium/auxiliary/gallivm/lp_bld_limits.h    |  1 +
>>  src/gallium/auxiliary/tgsi/tgsi_exec.h           |  1 +
>>  src/gallium/auxiliary/tgsi/tgsi_info.c           |  2 +-
>>  src/gallium/auxiliary/tgsi/tgsi_util.c           |  1 +
>>  src/gallium/docs/source/screen.rst               |  1 +
>>  src/gallium/docs/source/tgsi.rst                 | 23 +++++++++++++++++++++++
>>  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
>>  src/gallium/drivers/i915/i915_screen.c           |  1 +
>>  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  2 ++
>>  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
>>  src/gallium/drivers/r300/r300_screen.c           |  2 ++
>>  src/gallium/drivers/r600/r600_pipe.c             |  1 +
>>  src/gallium/drivers/r600/r600_shader.c           |  6 +++---
>>  src/gallium/drivers/radeonsi/si_pipe.c           |  1 +
>>  src/gallium/drivers/svga/svga_screen.c           |  2 ++
>>  src/gallium/drivers/vc4/vc4_screen.c             |  1 +
>>  src/gallium/include/pipe/p_defines.h             |  1 +
>>  src/gallium/include/pipe/p_shader_tokens.h       |  2 +-
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp       | 12 ++++++++----
>>  20 files changed, 54 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>> index 2962360..c5c51c1 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>> @@ -129,6 +129,7 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
>>     case PIPE_SHADER_CAP_DOUBLES:
>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>        return 0;
>>     }
>>     /* if we get here, we missed a shader cap above (and should have seen
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>> index 609c81b..0e59b88 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>> @@ -459,6 +459,7 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>>        return 1;
>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>        return 0;
>>     }
>>     /* if we get here, we missed a shader cap above (and should have seen
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> index 4d838fd..e6e0a60 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> @@ -56,7 +56,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
>>     { 1, 3, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
>>     { 1, 2, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
>>     { 1, 3, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP },
>> -   { 0, 0, 0, 0, 0, 0, NONE, "", 19 }, /* removed */
>> +   { 1, 3, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
>>     { 1, 1, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT },
>>     { 1, 3, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A },
>>     { 0, 0, 0, 0, 0, 0, NONE, "", 22 }, /* removed */
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c
>> index d572ff0..e5b8427 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
>> @@ -193,6 +193,7 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst,
>>     case TGSI_OPCODE_MAD:
>>     case TGSI_OPCODE_SUB:
>>     case TGSI_OPCODE_LRP:
>> +   case TGSI_OPCODE_FMA:
>>     case TGSI_OPCODE_FRC:
>>     case TGSI_OPCODE_CEIL:
>>     case TGSI_OPCODE_CLAMP:
>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>> index e0fd1a2..dd7a012 100644
>> --- a/src/gallium/docs/source/screen.rst
>> +++ b/src/gallium/docs/source/screen.rst
>> @@ -336,6 +336,7 @@ to be 0.
>>    is supported. If it is, DTRUNC/DCEIL/DFLR/DROUND opcodes may be used.
>>  * ``PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED``: Whether DFRACEXP and
>>    DLDEXP are supported.
>> +* ``PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED``: Whether TGSI_OPCODE_FMA is supported.
>>
>>
>>  .. _pipe_compute_cap:
>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>> index b0a975a..6871676 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -272,6 +272,29 @@ This instruction replicates its result.
>>    dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
>>
>>
>> +.. opcode:: FMA - Fused Multiply-Add
>> +
>> +The results may not be identical to evaluating the expression (a*b)+c,
>> +because the computation may be performed in a single operation with
>> +intermediate precision different from that used to compute a non-FMA
>> +expression.
>> +
>> +The results of FMA are guaranteed to be invariant given fixed inputs
>> +<src0>, <src1>, and <src2>. That means the implementation is not allowed
>> +to expand the opcode to MUL+ADD and apply algebraic optimizations affecting
>> +the floating-point results.
> I think these paragraphs are slightly confusing, especially "because
> the computation may be performed in a single operation with intermediate
> precision different from that used to compute a non-FMA expression".
> Would be more obvious to say something along the lines that (in contrast
> to MAD) no intermediate rounding is happening. Otherwise this sounds
> like it would be allowed to do some sort of intermediate rounding, as
> long as the intermediate precision is larger than what you'd get by
> separate mul+mad, which I don't think is what you wanted.

Well, it's partially copied from the extension spec and it just states
that the intermediate precision is different. I guess the main point
is that the result is invariant with regard to inputs.

> (FWIW I don't think we really clarified MAD wrt intermediate rounding, I
> particularly like opencl convention that FMA = no rounding, MUL + ADD =
> rounding, MAD = do whatever is fastest (because optimizing backends can
> fuse back MUL+ADD back into a MAD themselves if the hw can do that with
> intermediate rounding) but traditionally of course MAD always did
> intermediate rounding.)

Also MAD doesn't support denormals (on radeon), while FMA does. IIRC,
FMA is the slower one of the two.

Marek