On Mon, Apr 3, 2017 at 3:45 PM, Chad Versace <chadvers...@chromium.org> wrote:
> On Mon 03 Apr 2017, Jason Ekstrand wrote: > > On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace <chadvers...@chromium.org> > > wrote: > > > > > On Fri 31 Mar 2017, Chad Versace wrote: > > > > On Wed 15 Mar 2017, Jason Ekstrand wrote: > > > > > This cache allows us to easily ensure that we have a unique anv_bo > for > > > > > each gem handle. We'll need this in order to support > multiple-import > > > of > > > > > memory objects and semaphores. > > > > > > > > > > v2 (Jason Ekstrand): > > > > > - Reject BO imports if the size doesn't match the prime fd size as > > > > > reported by lseek(). > > > > > > > > > > v3 (Jason Ekstrand): > > > > > - Fix reference counting around cache_release (Chris Willson) > > > > > - Move the mutex_unlock() later in cache_release > > > > > --- > > > > > src/intel/vulkan/anv_allocator.c | 261 > ++++++++++++++++++++++++++++++ > > > +++++++++ > > > > > src/intel/vulkan/anv_private.h | 26 ++++ > > > > > 2 files changed, 287 insertions(+) > > > > > > > > > > > > > +static uint32_t > > > > > +hash_uint32_t(const void *key) > > > > > +{ > > > > > + return (uint32_t)(uintptr_t)key; > > > > > +} > > > > > > > > This hash function does not appear hashy. > > > > > > > > If I correctly understand the details of Mesa's struct hash_table, > > > > choosing the identify function for the hash function causes unwanted > > > > clustering when inserting consecutive gem handles. Since the kernel > does > > > > allocate gem handles consecutively, the problem is real. > > > > > > > > For proof, consider the following: > > > > > > > > - Suppose a long-running process (maybe the compositor) has > thrashed > > > on the > > > > hash table long enough that its bucket count > > > > is ht->size = hash_sizes[7].size = 283. Suppose a spike of > > > > compositor activity raises the hash table's density to about > 0.5. > > > > And suppose the hash table buckets are filled with the > consecutive > > > gem > > > > handles > > > > > > > > {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 } > > > > > > > > The exact density is (128 - 4 + 1) / 283 = 0.4417. > > > > > > > > - Next, some other in-process activity (maybe OpenGL) generated > > > > a lot of gem handles after Vulkan's most recently imported > > > > gem handle, 128. > > > > > > This point in the example---the reason why the gem handles in the > > > anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem > *is* > > > real for multiple in-process OpenGL contexts derived from the same > > > EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context > > > shares the same intel_screen, and therefore the same drm device fd. But > > > in Vulkan, each VkDevice opens its own drm device id. So, bogus > example. > > > > > > BUT, that leads to a new question... > > > > > > Since each VkDevice has a unique drm device fd, and since the kernel > > > allocates gem handles consecutively on the fd, and since struct > > > hash_table only grows and never shrinks, and since patch 8/18 inserts > > > every VkDeviceMemory into the cache... I believe no collisions are > > > possible in anv_bo_cache. > > > > > > > Does this fall under the category of unbreakable kernel ABI or is it > just a > > side-effect of the implementation? If not, then I'm reluctant to trust > it. > > I'm not certain. But krh, sitting beside me, says "It's ABI at this point. > The kernel uses a 'idr' structure which guarantees that behavior". > If we think we can rely on it, I'm happy to make it into a flat array but it'll basically be a simplified hash table at that point. > > > If there are no collisions, then the hash table is only adding > overhead, > > > > Sure, but a no-collision hash table is pretty cheap... > > > > > and we should use a direct-addressing lookup table. The bo cache should > > > look like this: > > > > > > struct anv_bo_cache { > > > /* The array indices are gem handles. Null entries are legal. */ > > > struct anv_bo **bos; > > > > > > /* Length of the array. Because the array can have holes, this > > > * is *not* the number of gem handles in the array. > > > */ > > > size_t len; > > > > > > pthread_mutex_t mutex; > > > }; > > > > > > struct anv_bo * > > > anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t > gem_handle) > > > { > > > struct anv_bo *bo = NULL; > > > > > > pthread_mutex_lock(&cache->mutex); > > > > > > if (gem_handle < cache->len) > > > bo = cache->entries[gem_handle]; > > > > > > pthread_mutex_unlock(&cache->mutex); > > > > > > return bo; > > > > > > } > > > > > > BUT, that leads to yet another question... > > > > > > Why is patch 8/18 inserting every VkDeviceMemory into the cache? If > > > I understand things correctly, we only *need* to insert a > VkDeviceMemory > > > into the cache if, in vkAllocateMemory, either (1) > > > VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2) > > > VkMemoryAllocateInfo's pNext chain contains an import structure. > > > > > > > Because I'm lazy. In order to start using the bo cache, > > anv_device_memory::bo needs to be a pointer (well, it's makes the BO > cache > > API simpler and more efficient if it's a pointer). This would mean that > we > > would have to allocate an additional chunk of memory or go through some > > other hoops in order to make it work. At the end of the day, just > stuffing > > everything in the cache was simpler and kept us to a single path. > > > Huh? I don't understand how pointers-or-not-pointers affect the decision > to use an array or a hash table. In the lookup function I wrote above, > the anv_bo_cache holds pointers to anv_bo's, just like the anv_bo_cache > in the actual patch. (I kept it that way so the API would be the same as > in your patch series). > That was in response to the question about inserting every VkDeviceMemory into the cache. > > > If we insert into the cache only those VkDeviceMemories that are > > > imported or that will be exported, then the bo cache remains small, and > > > we *should* use a hash table. > > > > > > > Maybe. But the client isn't supposed to be allocating hundreds of > > VkDeviceMemory objects. It's supposed to allocate a few and then > > suballocate from those. If the client allocates so many memory objects > > that they start hitting hash table performance issues, that's their own > > fault. > > > > Also, please remember that vkAllocateMemory is considered to be a *very* > > heavy-weight function in Vulkan. Compared to an ioctl which allocates > > memory, a hash table insert is trivial. I'm reasonably happy to make a > few > > changes here or there to make it more efficient if any of this proves to > be > > a problem. However, I think we're working way too hard to micro-optimize > > something that takes < 0.001% of runtime. > > True. The difference between time needed for an array lookup and a hash > table lookup is tiny compared to the ioctl. I wasn't suggesting to use > an array for that 0.001% though. I suggested an array because struct > hash_table, as used in this patch, is just a resizable array with an > unneeded vtable. It just felt weird. > > We've discussed all the things :) I'll drop the complaint about > hash-vs-array. >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev