On Thu, Jun 28, 2018 at 3:58 AM, Iago Toral <ito...@igalia.com> wrote:
> On Thu, 2018-06-28 at 08:47 +0200, Iago Toral wrote: > > On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote: > > On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral <ito...@igalia.com> wrote: > > On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote: > > On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga <ito...@igalia.com> > wrote: > > Storage images require to patch push constant stateto work, which happens > during > binding table emision. In the scenario where our pipeline and descriptors > are > not dirty, we don't re-emit the binding table, however, if our push > constant > state is dirty, we will re-emit the push constant state, trashing storage > image setup. > > While that scenario is probably not very likely to happen in practice, > there > are some CTS tests that trigger this by clearing storage images and buffers > and dispatching a compute shader in a loop. The clearing of the images > and buffers will trigger a blorp execution which will dirty our push > constant > state, however, because we don't alter the descriptors or the compute > dispatch > at all in the loop (we are basically execution the same program in a loop), > our pipeline and descriptor state is not dirty. If the shader uses a > storage > image, then any iteration after the first will re-emit push constant state > without re-emitting binding tables and the storage image will not be > properly > setup any more. > > > I don't see why that is a problem. The only thing flush_descriptor_sets > does is fill out the binding and sampler tables and fill in the push > constant data for storage images/buffers. The actual HW packets are filled > out by flush_push_constants and emit_descriptor_pointers. Yes, blorp > trashes our descriptor pointers but the descriptor sets should be fine. > For push constants, it does emit 3DSTATE_CONSTANT_* but it doesn't actually > modify anv_cmd_state::push_constants. > > Are secondary command buffers involved? I could see something funny going > on with those. > > > No, no secondaries are involved. I did some more investigation and I think > my explanation of the problem was not good, this is what is really > happening: > > First, I found the problem in the compute pipeline and I only extended the > fix to the graphics pipeline because it looked like the same rationale > would apply, so I'll explain what happens in compute and then we can > discuss whether the same problem applies to graphics. > > The test does something like this: > > for (...) { > clear ssbos / storage images > dispatch compute > } > > The first iteration of this loop will find that the compute pipeline and > descriptors are dirty and proceed to emit binding tables. We have storage > images, so during that process the push constant buffer is amended to > include storage images. Specifically, we call > anv_cmd_buffer_ensure_push_constants_size() > for the images field. This gives us a size of 624. > > We move on to the second iteration of the loop. When we clear images and > ssbos via blorp, we again mark the push constant buffer as dirty. Now we > execute the compute dispatch and the first thing we do there is > anv_cmd_buffer_push_base_group_id() which calls > anv_cmd_buffer_ensure_push_constants_size() for the base group id, which > gives as a size of 144. This is smaller than what we computed in the > previous iteration, because we haven't called the same function for the > images field yet. Unfortunately, we will never call that again, because we > only do that during binding table emission and we only do that if the > compute pipeline is dirty (it is not) or our descriptors are dirty (they > are not). So we don't re-emit binding table and we don't ensure push > constant space for the image data, but because we come from a blorp > execution our push constant dirty flag is true, so we re-emit push constant > data, only that this time we won't emit the push constant data we need for > the storage images, which leads to the problem. > > > The intention has always been that anv_cmd_buffer_ensure_push_constants_size > would only ever grow the push constants and never shrink them. The most > obvious bug is in anv_cmd_buffer_ensure_push_constants_size. > > > I thought that maybe making anv_cmd_buffer_ensure_push_constants_size() > only update the size if we alloc or realloc would fix this, but that can > cause GPU hangs in some cases when I run multiple tests in parallel, so I > guess it isn't that simple. > > > Ugh... that makes things more interesting. That does look like the right > fix and now I'm wondering why it leads to a hang. > > In the compute case, flush_compute_descriptor_sets emits > MEDIA_INTERFACE_DESCRIPTOR_LOAD. My feeling is that not emitting that > packet is the real bug. In GL, we just re-emit all 4 compute packets all > the time and don't try to track dirty bits. I had patches to do that in > Vulkan somewhere. I rebased them and pushed them here: > > https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/anv-cs-packets > > Is that plus your patch to fix ensure_push_constants_size() enough to fix > the bug? If so, then I think what's going on is that we have to re-emit > all the compute packets after a switch from 3D to compute and we're not > doing that. > > > No, with this we still run into a GPU hang some times and a bunch of tests > keep failing consistently. The annoying part is that all this only happens > when we run the tests in group (i.e. deqp-vk -n dEQP-VK.path.to.tests.*). > If I run any of the failing tests alone (with our without your patches, but > with the patch for ensure_push_constants_size()), they always pass. I'll > see if I can find out more about what is happening. > > > Ok, I think I know what the problem is. The loop I mention above has 200 > iterations and the problem starts showing up only after iteration 31. With > 31 or less the tests pss just fine (with or without your patches but with > the patch for ensure_push_constants_size()). So that got me thinking that > something was growing too big and that was causing our binding table to be > invalidated. > > I think the problem is that we allocate too many binding tables (each > iteration of the loop emits blorp clears, and each blorp clear allocates > binding tables), so there is a moment where blorp tries to allocate a new > binding table and we run out of space in the block, so we hit the fallback > where we need to allocate a new block *and* we call > anv_cmd_buffer_emit_state_base_address(), which I understand is going to > make the binding table used with the compute pipeline (which we only emit > once in the first iteration of the loop), invalid, since it is was > allocated in the other block. > > So what I think we need to do is mark descriptors as dirty every time we > grab a new binding table block and emit a new state base address. FWIW, > adding "cmd_buffer->state.descriptors_dirty |= ~0" to > anv_cmd_buffer_alloc_blorp_binding_table() when we need to allocate a new > block fixes the problem. I think we would need to do this every time we > allocate a new binding table block, not just for blorp allocations. > > Does this make sense? > Yes, much more so. Thanks for digging deeper! We should flag descriptors_dirty for everything whenever we re-emit STATE_BASE_ADDRESS or something like that. --Jason > Iago > > --Jason > > > > I hope I am making more sense now. > > Iago > > --Jason > > > > Fixes multiple failures in some new CTS tests. > --- > src/intel/vulkan/genX_cmd_buffer.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c > b/src/intel/vulkan/genX_cmd_buffer.c > index 97b321ccaeb..6e48aaedb9b 100644 > --- a/src/intel/vulkan/genX_cmd_buffer.c > +++ b/src/intel/vulkan/genX_cmd_buffer.c > @@ -2554,7 +2554,8 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer > *cmd_buffer) > * 3DSTATE_BINDING_TABLE_POINTER_* for the push constants to take > effect. > */ > uint32_t dirty = 0; > - if (cmd_buffer->state.descriptors_dirty) > + if (cmd_buffer->state.descriptors_dirty || > + cmd_buffer->state.push_constants_dirty) > dirty = flush_descriptor_sets(cmd_buffer); > > if (dirty || cmd_buffer->state.push_constants_dirty) { > @@ -2988,7 +2989,13 @@ genX(cmd_buffer_flush_compute_state)(struct > anv_cmd_buffer *cmd_buffer) > anv_batch_emit_batch(&cmd_buffer->batch, &pipeline->batch); > } > > + /* Storage images require push constant data, which is setup during the > + * binding table emision. If we have dirty push constants, we need to > + * re-emit the binding table so we get the push constant storage image > setup > + * done, otherwise we trash it when we emit push constants below. > + */ > if ((cmd_buffer->state.descriptors_dirty & > VK_SHADER_STAGE_COMPUTE_BIT) || > + (cmd_buffer->state.push_constants_dirty & > VK_SHADER_STAGE_COMPUTE_BIT) || > cmd_buffer->state.compute.pipeline_dirty) { > /* FIXME: figure out descriptors for gen7 */ > result = flush_compute_descriptor_set(cmd_buffer); > -- > 2.14.1 > > > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev