On Fri, Jan 6, 2017 at 6:05 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote: > On 05.01.2017 17:59, Ilia Mirkin wrote: >> >> On Thu, Jan 5, 2017 at 11:30 AM, Nicolai Hähnle <nhaeh...@gmail.com> >> wrote: >>> >>> On 05.01.2017 17:02, Ilia Mirkin wrote: >>>> >>>> >>>> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaeh...@gmail.com> >>>> wrote: >>>>> >>>>> >>>>> On 02.01.2017 21:41, Marek Olšák wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imir...@alum.mit.edu> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> >>>>>>> --- >>>>>>> src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +- >>>>>>> src/gallium/docs/source/tgsi.rst | 11 +++++++++++ >>>>>>> src/gallium/include/pipe/p_shader_tokens.h | 2 +- >>>>>>> 3 files changed, 13 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c >>>>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c >>>>>>> index 37549aa..e34b8c7 100644 >>>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c >>>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c >>>>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info >>>>>>> opcode_info[TGSI_OPCODE_LAST] = >>>>>>> { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP }, >>>>>>> { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS }, >>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB }, >>>>>>> - { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 }, /* removed */ >>>>>>> + { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH }, >>>>>>> { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV }, >>>>>>> { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 }, >>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL }, >>>>>>> diff --git a/src/gallium/docs/source/tgsi.rst >>>>>>> b/src/gallium/docs/source/tgsi.rst >>>>>>> index d2d30b4..accbe1d 100644 >>>>>>> --- a/src/gallium/docs/source/tgsi.rst >>>>>>> +++ b/src/gallium/docs/source/tgsi.rst >>>>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes >>>>>>> image, while .w will contain the number of samples for >>>>>>> multi-sampled >>>>>>> images. >>>>>>> >>>>>>> +.. opcode:: FBFETCH - Load data from framebuffer >>>>>>> + >>>>>>> + Syntax: ``FBFETCH dst, output`` >>>>>>> + >>>>>>> + Example: ``FBFETCH TEMP[0], OUT[0]`` >>>>>>> + >>>>>>> + Returns the color of the current position in the framebuffer from >>>>>>> + before this fragment shader invocation. Always returns the same >>>>>>> + value from multiple calls for a particular output within a single >>>>>>> + invocation. >>>>> >>>>> >>>>> >>>>> >>>>> I'm not a fan of this last sentence. It's true that we could somehow >>>>> bend >>>>> things in the compiler to make the sentence true, but >>>>> >>>>> (a) The statement is clearly false with a straight-forward >>>>> implementation >>>>> of >>>>> the instruction: multiple fragment shaders can be simultaneously >>>>> in-flight >>>>> on the same pixel/sample. A second FBFETCH could happen after an >>>>> earlier >>>>> invocation on the same pixel finished and get the new framebuffer >>>>> value. >>>>> >>>>> (b) I'm not aware of an API that actually requires this guarantee. >>>>> >>>>> >>>>>> If the value is always the same, can it be declared as a system value >>>>>> instead? >>>>> >>>>> >>>>> >>>>> >>>>> I don't know. I'd remove the statement about the value always being the >>>>> same >>>>> to begin with. And with an eye to how this actually ends up being >>>>> implemented, and possible interactions with >>>>> ARB_fragment_shader_interlock, >>>>> I'd say it makes sense (and our lives easier) for the TGSI to define >>>>> _when_ >>>>> the framebuffer value is supposed to be read, and for that it makes >>>>> sense >>>>> to >>>>> have an instruction for it. >>>> >>>> >>>> >>>> In case it's not obvious, this is primarily for >>>> KHR_blend_equation_advanced. It's illegal to use it with overdraw >>>> without a barrier first. >>> >>> >>> >>> Well, you get an undefined value :) >>> >>> My point is that there's a plausible implementation of FBFETCH as an >>> instruction in which "the same value will be returned within a single >>> invocation" is only guaranteed if the application follows the rules >>> involving BlendBarrier. >>> >>> >>>> There's also >>>> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch, >>>> which do allow overdraw. I'm not sure how the ordering is specified >>>> (or, how it's specified for regular blending). The gl_LastFragData >>>> stuff are a non-writable space, and as such, it makes sense that >>>> they'd be computed once on invocation and then kept constant >>>> throughout the shader run. >>> >>> >>> >>> It still leaves open the question of _when_ they should be computed, >>> especially in the future when guarantees about the order of fragment >>> shaders >>> are required (i.e. KHR_blend_equation_advanced_coherent): If you load the >>> data too early, you may lose some potential for parallelism because you >>> have >>> to spend more time waiting for earlier invocations to finish. This >>> obviously >>> depends on the details of the hardware, but ARB_fragment_shader_interlock >>> suggests that it is (or will be?) quite common to have synchronization >>> that >>> is rather fine-grained. >> >> >> This is a related to the implementation issues I had on nouveau with >> the sysval - basically the code looks like >> >> if (advanced blend is enabled) { >> MOV TEMP[0], SV >> IF >> use(sv) >> ELSE >> use(sv) >> use(sv some more) >> } >> >> And then the st_glsl_to_tgsi copy propagation logic pushes those SV's >> down to the uses. This is further complicated by the fact that the >> xyzw values are logically independent at the TGSI level. And now I'm >> stuck either >> >> (a) getting the value once at the top of the shader, even if advanced >> blend might be disabled entirely >> (b) fetching the texture once per bb >> (c) figuring out some code motion heuristics which presently don't >> exist in nouveau >> >> So... the operation makes the most sense to me. How about "once you >> call FBFETCH, all further invocations should return identical values"? > > > I don't know about "should". What about "may"? This would allow the driver > to "cache" the fbfetch result or fetch it only once at the start of the > shader, but doesn't force it to do that. So things are straightforward for > the driver both when a simple texture-fetch-based implementation should be > used _and_ when there really is fixed function hardware. How about this:
I like may. > >> Example: ``FBFETCH TEMP[0], OUT[0]`` >> >> Returns the current color of the current position in the framebuffer. >> The result is undefined if the current position is rendered to twice >> without a blend barrier in between. > > I think that's pretty much the condition at the API level, isn't it? Yes. I just want to point out that this opcode is tied to the API operation. Marek, are you good with this non-SV approach? If so, I'll respin my original series to take Nicolai's suggestions into account. Cheers, -ilia _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev