On Tuesday, November 28, 2017 6:34:20 PM PST Ian Romanick wrote: > On 11/28/2017 04:13 PM, Kenneth Graunke wrote: > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > > index b30ffe5d7f6..12d165d7236 100644 > > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > > @@ -183,6 +183,8 @@ intel_batchbuffer_reset(struct brw_context *brw) > > batch->last_bo = batch->batch.bo; > > > > batch->batch.bo = brw_bo_alloc(bufmgr, "batchbuffer", BATCH_SZ, 4096); > > + batch->batch.partial_bo = 0; > > Use 0 or NULL in both places (*.partial_bo in this hunk and the next).
Oops, meant to use NULL. Fixed locally, thanks. > > + /* Exchange the two BOs...without breaking pointers to the old BO. > > + * > > + * Consider this scenario: > > + * > > + * 1. Somebody calls brw_state_batch() to get a region of memory, and > > + * and then creates a brw_address pointing to brw->batch.state.bo. > > + * 2. They then call brw_state_batch() a second time, which happens to > > + * grow and replace the state buffer. They then try to emit a > > + * relocation to their first section of memory. > > + * > > + * If we replace the brw->batch.state.bo pointer at step 2, we would > > + * break the address created in step 1. They'd have a pointer to the > > + * old destroyed BO. Emitting a relocation would add this dead BO to > > + * the validation list...causing /both/ statebuffers to be in the list, > > + * and all kinds of disasters. > > + * > > + * This is not a contrived case - BLORP vertex data upload hits this. > > Is this the source of the DiRT GPU hangs? I'm unclear on that. I tracked this down by hacking brw_state_batch to "grow" the buffer on every call, so that we hit this path all the time. I then used glsl-algebraic-add-zero and glxgears as test cases, and found the BLORP vertex buffer problem there. I worked around that problem and ran into others, though, that I didn't track down. That's why I went this route of "make it work" instead of trying to adjust every caller to do a safe thing. It's hard to find them all and ensure they follow a non-obvious set of ordering constraints. When I'd fixed the problem, it fixed DiRT. But I'm not sure which exact piece of state was hitting the issue. > > + * > > + * There are worse scenarios too. Fences for GL sync objects reference > > + * brw->batch.batch.bo. If we replaced the batch pointer when growing, > > + * we'd need to chase down every fence and update it to point to the > > + * new BO. Otherwise, it would refer to a "batch" that never actually > > + * gets submitted, and would fail to trigger. > > The BLORP case seems really, really hard to test in a reliable way. > This, however, seems more testable. Do you think a piglit could be > created that could trigger this? Possibly...but it's a bit tricky to cause the batch to grow. You have to issue a draw call that emits a lot of commands at just the right moment (near the end of the batch). State (like lots of textures) doesn't help either...you need commands. That's sort of why I added the debugging code in patch 6 - by hacking the driver to go through the motions of growing on every batch, we turn every program into a stress test. Hacking the driver isn't ideal though... Overflowing the state buffer is easier, we could probably do that with a test that draws using a maximal number of surfaces. I think I've seen the CTS hit this already, depending how exactly it was sharded.
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev