On Wed, Apr 05, 2017 at 11:02:18AM -0700, Jason Ekstrand wrote: > On Wed, Apr 5, 2017 at 10:45 AM, Chris Wilson > <[1]ch...@chris-wilson.co.uk> wrote: > > On Wed, Apr 05, 2017 at 10:28:53AM -0700, Jason Ekstrand wrote: > > Before, we were just looking at whether or not the user wanted us to > > wait and waiting on the BO. Some clients, such as the Serious engine, > > use a single query pool for hundreds of individual query results where > > the writes for those queries may be split across several command > > buffers. In this scenario, the individual query we're looking for may > > become available long before the BO is idle so waiting on the query > pool > > BO to be finished is wasteful. This commit makes us instead busy-loop > on > > each query until it's available. > > > > This significantly reduces pipeline bubbles and improves performance > of > > The Talos Principle on medium settings (where the GPU isn't overloaded > > with drawing) by around 20% on my SkyLake gt4. > > --- > > src/intel/vulkan/genX_query.c | 52 > ++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 46 insertions(+), 6 deletions(-) > > > > diff --git a/src/intel/vulkan/genX_query.c > b/src/intel/vulkan/genX_query.c > > index 7ea9404..0d303a6 100644 > > --- a/src/intel/vulkan/genX_query.c > > +++ b/src/intel/vulkan/genX_query.c > > @@ -131,6 +131,44 @@ cpu_write_query_result(void *dst_slot, > VkQueryResultFlags flags, > > } > > } > > > > +static bool > > +query_is_available(struct anv_device *device, volatile uint64_t > *slot) > > +{ > > + if (!device->info.has_llc) > > + __builtin_ia32_clflush(slot); > > + > > + return slot[0]; > > +} > > + > > +static VkResult > > +wait_for_available(struct anv_device *device, > > + struct anv_query_pool *pool, uint64_t *slot) > > +{ > > + while (true) { > > + if (query_is_available(device, slot)) > > + return VK_SUCCESS; > > + > > + VkResult result = anv_device_bo_busy(device, &pool->bo); > > Ah, but you can use the simpler check here because you follow up with a > query_is_available() so you know whether or not the hang clobbered the > result. > > If the query is not available but the bo is idle, you might then went to > check for a reset in case it was due to a lost device. GEM_BUSY is > lockless, but GEM_RESET_STATS currently takes the big struct_mutex and > so has non-deterministic and often quite large latencies. > > anv_device_bo_busy does that for us. :-) In particular, it queries > GEM_RESET_STATS but only if the BO is not busy. This way, so long as the > BO is busy, we only do GEM_BUSY but as soon as it is not busy we do a > single GEM_RESET_STATS and return VK_ERROR_DEVICE_LOST if we've gotten a > hang. The theory (the same for wait) is that, so long as the BO is still > busy, we either haven't hung or we haven't figured out that we've hung. > We could, in theory have two batches where the first one hangs and the > second is ok, so this isn't quite true. However, the important thing is > to never return VK_SUCCESS to the user when results are invalid. Both > anv_device_wait and anv_device_bo_busy only do an early return of > VK_NOT_READY or VK_TIMEOUT and don't return VK_SUCCESS until after they've > called GEM_RESET_STATS to ensuring that we haven't hung.
It's the ordering though. Currently it is !query_is_available !busy get_reset_stats <-- slow return query_is_available. I'm arguing that you want !query_is_available !busy if (query_is_available) return SUCCESS return get_reset_stats <-- slow. with the observation that if the gpu completed the available before the hang, it is not affected by the unlikely hang. You really, really want to avoid get_reset_stats on low latency paths (for the next few months at least). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev