On Fri, Dec 15, 2017 at 1:08 PM, Nicolai Hähnle <nicolai.haeh...@amd.com> wrote: > On 15.12.2017 12:37, Rob Clark wrote: >> >> On Fri, Dec 15, 2017 at 4:41 AM, Nicolai Hähnle <nicolai.haeh...@amd.com> >> wrote: >>> >>> On 15.12.2017 00:56, Rob Clark wrote: >>>> >>>> >>>> On Wed, Dec 6, 2017 at 3:31 PM, Ian Romanick <i...@freedesktop.org> >>>> wrote: >>>>> >>>>> >>>>> On 12/05/2017 08:25 AM, Ilia Mirkin wrote: >>>>>> >>>>>> >>>>>> On Tue, Dec 5, 2017 at 8:18 AM, Emil Velikov >>>>>> <emil.l.veli...@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rob, >>>>>>> >>>>>>> On 5 December 2017 at 12:54, Rob Clark <robdcl...@gmail.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> This is a bit sad/annoying. But with current GPU firmware (at least >>>>>>>> on >>>>>>>> a5xx) we can support both draw-indirect and base-instance. But we >>>>>>>> can't >>>>>>>> support draw-indirect with a non-zero base-instance specified. So >>>>>>>> add >>>>>>>> a >>>>>>>> driconf option to hide the extension from games that are known to >>>>>>>> use >>>>>>>> both. >>>>>>>> >>>>>>>> Signed-off-by: Rob Clark <robdcl...@gmail.com> >>>>>>>> --- >>>>>>>> Tbh, I'm also not really sure what to do when/if we got updated >>>>>>>> firmware >>>>>>>> which handled draw-indirect with base-instance, since we'd need to >>>>>>>> make >>>>>>>> this option conditional on fw version. For STK that probably isn't >>>>>>>> a >>>>>>>> big deal since it doesn't use draw-indirect in a particularly useful >>>>>>>> way >>>>>>>> (the indirect buffer is generated on CPU). >>>>>>>> >>>>>>> Couldn't freedreno just return 0 for PIPE_CAP_DRAW_INDIRECT (aka >>>>>>> disable the extension) as it detects buggy FW? >>>>>>> This is what radeons have been doing as they encounter iffy firmware >>>>>>> or >>>>>>> LLVM. >>>>>>> >>>>>>> AFAICT freedreno doesn't do GL 4.0 or GLES 3.1 so one should be safe. >>>>>> >>>>>> >>>>>> >>>>>> Rob is this -><- close to ES 3.1, so that's not a great option. >>>>> >>>>> >>>>> >>>>> And I don't suppose there's a way to get updated firmware? i965 has >>>>> similar sorts of cases where higher versions are disabled due to >>>>> missing >>>>> kernel features. >>>>> >>>> >>>> so after r/e the instruction set for the CP microcontrollers and >>>> writing a disassembler and assembler[1], and figuring out how the fw >>>> handles CP_DRAW_INDIRECT and CP_DRAW_INDX_INDIRECT packets, I've come >>>> to the conclusion that the issue isn't actually with draw-indirect vs >>>> base-instance (at least not w/ the fw from my pixel2 which md5sum >>>> claims is the same as what is in linux-firmware.. it is possible that >>>> I was using an earlier version of the fw before when I came to this >>>> conclusion). On the plus side, the PFP/ME microcontrollers that parse >>>> the cmdstream are pretty neat and I learned some useful stuff along >>>> the way. >>>> >>>> But thinking a bit about how stk is using GL_MAP_PERSISTENT_BIT to map >>>> and update the draw-indirect buffers, it seems to me there are plenty >>>> of ways this can go wrong w/ tilers (and even more when you throw >>>> re-ordering into the mix). Possibly I should disable reordering when >>>> the indirect buffer is mapped w/ PERSISTENT bit, although for games >>>> like stk this is probably counter-productive vs just hiding the >>>> draw-indirect extension.. for games that actually use the GPU to write >>>> the draw-indirect buffer it shouldn't be a problem. So I think a >>>> driconf patch like this probably still ends up being useful in the >>>> end. >>> >>> >>> >>> Can you detail a bit what you think could go wrong? I believe that the >>> intention of the GL spec is that reordering in tilers should be possible >>> at >>> least for buffers that are mapped PERSISTENT but not COHERENT. >>> >>> You may only have to block reordering if the buffer is mapped both >>> PERSISTENT *and* COHERENT -- and even then, reordering is probably >>> possible. >>> >>> Granted, the spec is unclear as usual when it comes to these memory >>> synchronization issues -- the description of the MAP_COHERENT_BIT in >>> section >>> 6.2 does not mention WAR hazards (in particular, Write by client after >>> Read >>> by server) -- but perhaps that can be fixed. >>> >>> To go into a bit more detail, what I suspect you're worried about is >>> applications doing stuff like: >>> >>> 1. Write to indirect buffer (persistently & coherently mapped) >>> 2. Draw*Indirect >>> 3. Write to the same location in the indirect buffer >>> 4. Draw*Indirect >>> >>> ... but this is bound to fail with "normal" GPUs (like ours) as well. >>> Perhaps you have a different scenario in mind? >> >> >> yeah, this was basically the scenario I had in mind.. although I'm >> perhaps more aggressive in deferring rendering, to the point of >> re-ordering draws if unnecessary fbo switches are made. Normally I >> track which buffers are read and written in a given batch (draw pass) >> in order to preserve correctness (and in some cases shadowing or doing >> a staging transfer to update buffers/textures to avoid splitting a >> batch). Perhaps it is only an issue w/ persistent+coherent, but w/ >> cpu updating buffer without driver knowing when is kind of >> sub-optimal. >> >> I'm thinking I do need to keep track when there are outstanding >> coherent+persistent transfers mapped and switch off some of the >> cleverness. > > > I'm not convinced that this is actually necessary. You probably only need to > break for things like glFlushMappedBufferRange() and glFenceSync(). > > The reasoning is basically this: unless one of those synchronizing commands > is used, every desktop GPU will effectively re-order a sequence of: > > 1. Write #1 to persistent-mapped buffer > 2. Draw #1 > 3. Write #2 to persistent-mapped buffer > 4. Draw #2 > > to: > > 1. Write #1 to persistent-mapped buffer > 2. Write #2 to persistent-mapped buffer > 3. Draw #1 > 4. Draw #2 > > Correct me if I'm wrong, but the way I understand tiled GPUs, you might in > addition do some multi-passing that ends up as: > > 1. Write #1 to persistent-mapped buffer > 2. Write #2 to persistent-mapped buffer > 3.0. Draw #1, tile 0 > 4.0. Draw #2, tile 0 > 3.1. Draw #1, tile 1 > 4.1. Draw #2, tile 1 > ... etc. ... >
I guess if you throw in FBO switches and draw-reordering things can be happening a bit more out of order than this. Maybe not an issue w/ a correctly written game/app.. Anyways, I'll have to look more closely at stk to understand how it is expecting CPU writes to this indirect-params buffer to be synchronized. I'm not quite sure what is going on, but I think the main diff (depending on whether draw-indirect + base-instance is exposed or not) is using glDrawElementsBaseVertex() vs glDrawElementsIndirect()/glMultiDrawElementsIndirect(). And somehow it goes fabulously wrong. I guess I should try mapping/unmapping each time the buffer is updated to ensure it is synchronized w/ gpu.. BR, -R > So as far as interactions with persistent-mapped buffers are concerned, > tiling just doesn't make a difference. > > >> All the same, the way stk uses draw-indirect is not useful and it >> would be better to shut it off, at least for me. Although perhaps >> instead of adding things like this to driconf extension by extension, >> it would be more useful to have a way to provide a default >> MESA_EXTENSION_OVERRIDE via driconf? > > > Yeah, that makes sense. > > Cheers, > Nicolai > > >> >> BR, >> -R >> > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev