On Wed, Apr 05, 2017 at 10:28:53AM -0700, Jason Ekstrand wrote:
> Before, we were just looking at whether or not the user wanted us to
> wait and waiting on the BO. Some clients, such as the Serious engine,
> use a single query pool for hundreds of individual query results where
> the writes for those queries may be split across several command
> buffers. In this scenario, the individual query we're looking for may
> become available long before the BO is idle so waiting on the query pool
> BO to be finished is wasteful. This commit makes us instead busy-loop on
> each query until it's available.
>
> This significantly reduces pipeline bubbles and improves performance of
> The Talos Principle on medium settings (where the GPU isn't overloaded
> with drawing) by around 20% on my SkyLake gt4.
> ---
>  src/intel/vulkan/genX_query.c | 52 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
> index 7ea9404..0d303a6 100644
> --- a/src/intel/vulkan/genX_query.c
> +++ b/src/intel/vulkan/genX_query.c
> @@ -131,6 +131,44 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
>     }
>  }
>
> +static bool
> +query_is_available(struct anv_device *device, volatile uint64_t *slot)
> +{
> +   if (!device->info.has_llc)
> +      __builtin_ia32_clflush(slot);
> +
> +   return slot[0];
> +}
> +
> +static VkResult
> +wait_for_available(struct anv_device *device,
> +                   struct anv_query_pool *pool, uint64_t *slot)
> +{
> +   while (true) {
> +      if (query_is_available(device, slot))
> +         return VK_SUCCESS;
> +
> +      VkResult result = anv_device_bo_busy(device, &pool->bo);

Ah, but you can use the simpler check here because you follow up with a
query_is_available(), so you know whether or not the hang clobbered the
result.

If the query is not available but the BO is idle, you might then want to
check for a reset in case it was due to a lost device. GEM_BUSY is
lockless, but GEM_RESET_STATS currently takes the big struct_mutex and so
has non-deterministic and often quite large latencies.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
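
For illustration only, the ordering suggested in the reply above could be sketched roughly as follows. This is not the code from the patch: struct fake_device, fake_bo_busy(), fake_was_reset(), wait_for_available_sketch() and the wait_result codes are all hypothetical stand-ins, and no real ioctl is issued. The cheap helper plays the role of the lockless GEM_BUSY check and the expensive one plays the role of a GEM_RESET_STATS query. Returning a "not ready" code when the BO is idle, the query is unavailable and no reset is seen is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes standing in for VkResult values. */
enum wait_result { WAIT_SUCCESS, WAIT_NOT_READY, WAIT_DEVICE_LOST };

/* Hypothetical scripted device; a real driver would ask the kernel. */
struct fake_device {
   unsigned busy_polls_left; /* busy check reports "busy" this many times */
   bool     hung;            /* what the expensive reset query reports */
   unsigned reset_queries;   /* how many times the expensive query ran */
};

/* Stand-in for the cheap, lockless busy check. */
static bool
fake_bo_busy(struct fake_device *dev)
{
   if (dev->busy_polls_left == 0)
      return false;
   dev->busy_polls_left--;
   return true;
}

/* Stand-in for the expensive reset query. */
static bool
fake_was_reset(struct fake_device *dev)
{
   dev->reset_queries++;
   return dev->hung;
}

static enum wait_result
wait_for_available_sketch(struct fake_device *dev, volatile uint64_t *slot)
{
   assert(dev != NULL && slot != NULL);

   while (true) {
      /* Fast path: the availability word has been written. */
      if (slot[0])
         return WAIT_SUCCESS;

      /* Cheap check: keep spinning while the BO is still busy. */
      if (fake_bo_busy(dev))
         continue;

      /* The BO went idle; the result may have landed since the last look. */
      if (slot[0])
         return WAIT_SUCCESS;

      /* Idle but unavailable: only now pay for the expensive reset query. */
      if (fake_was_reset(dev))
         return WAIT_DEVICE_LOST;

      /* Assumption of this sketch: report "not ready" instead of spinning. */
      return WAIT_NOT_READY;
   }
}
```

The point of the ordering is that the expensive query runs at most once per wait, and only after the cheap checks have ruled out both "still running" and "already done".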