On Mon, Mar 2, 2015 at 5:48 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 02.03.2015 at 17:12, Marek Olšák wrote:
>> On Mon, Mar 2, 2015 at 4:55 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>>> On 02.03.2015 at 12:52, Marek Olšák wrote:
>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>
>>>> Needed by ARB_gpu_shader5.
>>>> ---
>>>>  src/gallium/auxiliary/gallivm/lp_bld_limits.h    |  1 +
>>>>  src/gallium/auxiliary/tgsi/tgsi_exec.h           |  1 +
>>>>  src/gallium/auxiliary/tgsi/tgsi_info.c           |  2 +-
>>>>  src/gallium/auxiliary/tgsi/tgsi_util.c           |  1 +
>>>>  src/gallium/docs/source/screen.rst               |  1 +
>>>>  src/gallium/docs/source/tgsi.rst                 | 23 +++++++++++++++++++++++
>>>>  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
>>>>  src/gallium/drivers/i915/i915_screen.c           |  1 +
>>>>  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  2 ++
>>>>  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
>>>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
>>>>  src/gallium/drivers/r300/r300_screen.c           |  2 ++
>>>>  src/gallium/drivers/r600/r600_pipe.c             |  1 +
>>>>  src/gallium/drivers/r600/r600_shader.c           |  6 +++---
>>>>  src/gallium/drivers/radeonsi/si_pipe.c           |  1 +
>>>>  src/gallium/drivers/svga/svga_screen.c           |  2 ++
>>>>  src/gallium/drivers/vc4/vc4_screen.c             |  1 +
>>>>  src/gallium/include/pipe/p_defines.h             |  1 +
>>>>  src/gallium/include/pipe/p_shader_tokens.h       |  2 +-
>>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp       | 12 ++++++++----
>>>>  20 files changed, 54 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>>> index 2962360..c5c51c1 100644
>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>>> @@ -129,6 +129,7 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
>>>>     case PIPE_SHADER_CAP_DOUBLES:
>>>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>>>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>>>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>>>        return 0;
>>>>     }
>>>>     /* if we get here, we missed a shader cap above (and should have seen
>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>>> index 609c81b..0e59b88 100644
>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>>> @@ -459,6 +459,7 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
>>>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>>>>        return 1;
>>>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>>>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>>>        return 0;
>>>>     }
>>>>     /* if we get here, we missed a shader cap above (and should have seen
>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> index 4d838fd..e6e0a60 100644
>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> @@ -56,7 +56,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
>>>>     { 1, 3, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
>>>>     { 1, 2, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
>>>>     { 1, 3, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP },
>>>> -   { 0, 0, 0, 0, 0, 0, NONE, "", 19 }, /* removed */
>>>> +   { 1, 3, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
>>>>     { 1, 1, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT },
>>>>     { 1, 3, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A },
>>>>     { 0, 0, 0, 0, 0, 0, NONE, "", 22 }, /* removed */
>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c
>>>> index d572ff0..e5b8427 100644
>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
>>>> @@ -193,6 +193,7 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst,
>>>>     case TGSI_OPCODE_MAD:
>>>>     case TGSI_OPCODE_SUB:
>>>>     case TGSI_OPCODE_LRP:
>>>> +   case TGSI_OPCODE_FMA:
>>>>     case TGSI_OPCODE_FRC:
>>>>     case TGSI_OPCODE_CEIL:
>>>>     case TGSI_OPCODE_CLAMP:
>>>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>>>> index e0fd1a2..dd7a012 100644
>>>> --- a/src/gallium/docs/source/screen.rst
>>>> +++ b/src/gallium/docs/source/screen.rst
>>>> @@ -336,6 +336,7 @@ to be 0.
>>>>    is supported. If it is, DTRUNC/DCEIL/DFLR/DROUND opcodes may be used.
>>>>  * ``PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED``: Whether DFRACEXP and
>>>>    DLDEXP are supported.
>>>> +* ``PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED``: Whether TGSI_OPCODE_FMA is supported.
>>>>
>>>>
>>>>  .. _pipe_compute_cap:
>>>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>>>> index b0a975a..6871676 100644
>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>> @@ -272,6 +272,29 @@ This instruction replicates its result.
>>>>    dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
>>>>
>>>>
>>>> +.. opcode:: FMA - Fused Multiply-Add
>>>> +
>>>> +The results may not be identical to evaluating the expression (a*b)+c,
>>>> +because the computation may be performed in a single operation with
>>>> +intermediate precision different from that used to compute a non-FMA
>>>> +expression.
>>>> +
>>>> +The results of FMA are guaranteed to be invariant given fixed inputs
>>>> +<src0>, <src1>, and <src2>. That means the implementation is not allowed
>>>> +to expand the opcode to MUL+ADD and apply algebraic optimizations affecting
>>>> +the floating-point results.
>>> I think these paragraphs are slightly confusing, especially "because
>>> the computation may be performed in a single operation with intermediate
>>> precision different from that used to compute a non-FMA expression".
>>> Would be more obvious to say something along the lines that (in contrast
>>> to MAD) no intermediate rounding is happening. Otherwise this sounds
>>> like it would be allowed to do some sort of intermediate rounding, as
>>> long as the intermediate precision is larger than what you'd get by
>>> separate mul+mad, which I don't think is what you wanted.
>>
>> Well, it's partially copied from the extension spec and it just states
>> that the intermediate precision is different. I guess the main point
>> is that the result is invariant with regard to inputs.
> Hmm frankly I find the wording confusing, spec or not. Makes me think
> though it was worded on purpose like that, maybe not quite all chips can
> actually guarantee "correct" fma results (correct as in opencl fma
> specification which is a lot better imho ("Returns the correctly rounded
> floating-point representation of the sum of c with the infinitely
> precise product of a and b. Rounding of intermediate products shall not
> occur. Edge case behavior is per the IEEE 754-2008 standard.")).
> glsl also has a quite different wording but there the meaning is
> somewhat different - https://www.opengl.org/sdk/docs/man/html/fma.xhtml.
> In other words, if you don't have precise attribute, it's just the same
> as a MAD. With precise though it seems to imply I think (because it's
> considered a single operation, not "may be performed in a single
> operation" like in arb_gpu_shader5) that there's no intermediate
> rounding, just as what opencl expects.
>
> Roland
>
>>
>>> (FWIW I don't think we really clarified MAD wrt intermediate rounding, I
>>> particularly like opencl convention that FMA = no rounding, MUL + ADD =
>>> rounding, MAD = do whatever is fastest (because optimizing backends can
>>> fuse back MUL+ADD back into a MAD themselves if the hw can do that with
>>> intermediate rounding) but traditionally of course MAD always did
>>> intermediate rounding.)
>>
>> Also MAD doesn't support denormals (on radeon), while FMA does. IIRC,
>> FMA is the slower one of the two.
>>
>
> Interesting. I thought most gpus wouldn't handle denorms at all for
> single precision floats for all operations, hence there wouldn't be much
> point supporting it for just fma. Or can you enable that explicitly for
> most operations just not for MAD?

Yeah, there is a global switch that sets the initial behavior and a
special shader instruction that can change it.

Marek