On Thu, Jun 22, 2017 at 6:13 PM, Alex Smith <asm...@feralinteractive.com> wrote:
> On 22 June 2017 at 15:52, Roland Scheidegger <srol...@vmware.com> wrote:
>> On 22.06.2017 at 13:09, Nicolai Hähnle wrote:
>>> On 22.06.2017 10:14, Michel Dänzer wrote:
>>>> On 22/06/17 04:34 PM, Nicolai Hähnle wrote:
>>>>> On 22.06.2017 03:38, Rob Clark wrote:
>>>>>> On Wed, Jun 21, 2017 at 8:15 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>>>>> On Wed, Jun 21, 2017 at 10:37 PM, Rob Clark <robdcl...@gmail.com> wrote:
>>>>>>>> On Tue, Jun 20, 2017 at 6:54 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series updates pipe loaders so that flags such as drirc options
>>>>>>>>> can be passed to create_screen(). I have compile-tested everything
>>>>>>>>> except clover.
>>>>>>>>>
>>>>>>>>> The first pipe_screen flag is a drirc option to fix incorrect grass
>>>>>>>>> rendering in Rocket League for radeonsi. Rocket League expects
>>>>>>>>> DirectX behavior for partial derivative computations after
>>>>>>>>> discard/kill, but radeonsi implements the more efficient but
>>>>>>>>> stricter OpenGL behavior, and that will remain our default
>>>>>>>>> behavior. The new screen flag forces radeonsi to use the DX
>>>>>>>>> behavior for that game.
>>>>>>>>
>>>>>>>> do we really want this to be a *global* option for the screen?
>>>>>>>
>>>>>>> Yes. Shaders are pipe_screen (global) objects in radeonsi, so a
>>>>>>> compiler option also has to be global. We can't look at the context
>>>>>>> during the TGSI->LLVM translation.
>>>>>>
>>>>>> well, I didn't really mean per-screen vs per-context, as much as
>>>>>> per-screen vs per-shader (or maybe more per-screen vs
>>>>>> per-instruction?)
>>>>>
>>>>> I honestly don't think it's worth the trouble. Applications that are
>>>>> properly coded against GLSL can benefit from the relaxed semantics,
>>>>> and applications that get it wrong in one shader are rather likely
>>>>> to get it wrong everywhere.
>>>>>
>>>>> Since GLSL simply says derivatives are undefined after non-uniform
>>>>> discard, and this option makes them defined instead, setting this
>>>>> flag can never break the behavior of a correctly written shader.
>>>>
>>>> BTW, how expensive is the radeonsi workaround when it isn't needed?
>>>>
>>>> I'm starting to wonder if we shouldn't just make it always safe and
>>>> call it a day, saving the trouble of identifying broken apps and
>>>> plumbing the info through the API layers...
>>>
>>> As-is, the workaround can be *very* expensive in the worst case. A
>>> large number of pixels could be disabled by a discard early in the
>>> shader, and we're now moving the discard down, which means a lot of
>>> unnecessary texture fetches may be happening.
>>>
>>> Also, I think I spoke too soon about this flag not having negative
>>> effects: if a shader has an image/buffer write after a discard, that
>>> write is now no longer disabled.
>>>
>>> A more efficient workaround can be done at the LLVM level by doing the
>>> discard early, but then re-enabling WQM "relative to" the new set of
>>> active pixels. It's a bit involved, especially when the discard itself
>>> happens in a branch, and still a little more expensive, but it's an
>>> option.
>>
>> I'm wondering what your driver for the other OS does (afaik dx10 is
>> really the odd man out, all of glsl, spir-v, even metal have undefined
>> derivatives after non-uniform discards). Thinking surely there must be
>> something clever you could do...
>
> I'm wondering the same.
>
> This is an issue we come across from time to time, where a game's
> shaders are expecting the D3D behaviour of derivatives remaining
> defined post-discard. For this we usually do essentially what this
> workaround is doing, just postpone the discard until the very end of
> the shader.
>
> However it seems like doing this is less performant than the original
> shaders running on D3D. One case I've seen had a big performance loss
> against D3D when doing a delayed discard (which was being used early
> in a complex shader to cull a lot of unneeded pixels), on both AMD and
> NVIDIA.
>
> Given that, I've wondered whether there's something clever that the
> D3D drivers are doing to optimise this. Maybe, for example, discarding
> immediately if all pixels in a quad used for derivative calculations
> get discarded? Is something like that possible on AMD hardware?
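To make the semantics question above concrete, here is a minimal hypothetical GLSL fragment shader showing the pattern the whole thread is about: a non-uniform discard followed by a texture fetch that needs implicit derivatives. The uniform and varying names are made up for illustration; this is not taken from the game's actual shaders.

```glsl
#version 330 core

// Hypothetical names, purely for illustration.
uniform sampler2D u_grass;
uniform float u_cutoff;

in vec2 v_uv;
in float v_alpha;
out vec4 o_color;

void main()
{
    // Non-uniform discard: only some pixels of a 2x2 quad may be killed.
    if (v_alpha < u_cutoff)
        discard;

    // Implicit-LOD sample: needs derivatives of v_uv, which are taken
    // from neighbouring pixels of the quad. GLSL says these are
    // undefined once a neighbour has taken a non-uniform discard; D3D
    // keeps the killed lanes alive as helpers, so shaders ported from
    // HLSL often rely on them being defined.
    o_color = texture(u_grass, v_uv);
}
```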
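The delayed discard that both the radeonsi flag and the description above refer to then amounts to roughly the following source-level rewrite. This is only a sketch of the idea; the real change happens during shader compilation, not in GLSL.

```glsl
#version 330 core

// Same hypothetical shader, with the discard deferred.
uniform sampler2D u_grass;
uniform float u_cutoff;

in vec2 v_uv;
in float v_alpha;
out vec4 o_color;

void main()
{
    // Remember the kill instead of executing it immediately...
    bool kill = v_alpha < u_cutoff;

    // ...so every pixel of the quad still reaches the implicit-LOD
    // sample with well-defined derivatives...
    o_color = texture(u_grass, v_uv);

    // ...and only actually discard at the very end of the shader.
    if (kill)
        discard;
}
```

This also makes the cost argument visible: pixels that would have died on the first line now execute the entire shader, and any image or buffer write sitting between the original and the deferred discard is no longer masked out.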
Yes, it's possible but not implemented in LLVM yet.

Marek
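The quad-granular early-out discussed at the end of the thread can be expressed at source level roughly as follows. This is only a sketch of the idea: the subgroup quad operations (GL_KHR_shader_subgroup_quad) are used here purely to make it readable, while an actual implementation would live in the LLVM/radeonsi backend, which is the part noted above as not implemented yet.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_quad : require

// Same hypothetical shader; quad ops used only to illustrate the idea.
uniform sampler2D u_grass;
uniform float u_cutoff;

in vec2 v_uv;
in float v_alpha;
out vec4 o_color;

void main()
{
    bool kill = v_alpha < u_cutoff;

    // Gather the kill flags of the other three pixels in the 2x2
    // derivative quad. Control flow is still uniform here, so all four
    // invocations participate.
    bool k_h = subgroupQuadSwapHorizontal(kill);
    bool k_v = subgroupQuadSwapVertical(kill);
    bool k_d = subgroupQuadSwapDiagonal(kill);

    // If all four pixels of the quad want to die, nothing can ever ask
    // this quad for derivatives again, so the discard can be taken
    // immediately instead of being deferred to the end of the shader.
    if (kill && k_h && k_v && k_d)
        discard;

    // Mixed quads fall back to the deferred-discard path so the
    // surviving pixels still get valid derivatives.
    o_color = texture(u_grass, v_uv);
    if (kill)
        discard;
}
```

The point is that once every pixel in a derivative quad wants to discard, the kill no longer has to be deferred; only quads with a mix of live and killed pixels pay the full delayed-discard cost.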