On Mon 03 Apr 2017, Jason Ekstrand wrote: > On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace <chadvers...@chromium.org> > wrote: > > > On Fri 31 Mar 2017, Chad Versace wrote: > > > On Wed 15 Mar 2017, Jason Ekstrand wrote: > > > > This cache allows us to easily ensure that we have a unique anv_bo for > > > > each gem handle. We'll need this in order to support multiple-import > > of > > > > memory objects and semaphores. > > > > > > > > v2 (Jason Ekstrand): > > > > - Reject BO imports if the size doesn't match the prime fd size as > > > > reported by lseek(). > > > > > > > > v3 (Jason Ekstrand): > > > > - Fix reference counting around cache_release (Chris Willson) > > > > - Move the mutex_unlock() later in cache_release > > > > --- > > > > src/intel/vulkan/anv_allocator.c | 261 ++++++++++++++++++++++++++++++ > > +++++++++ > > > > src/intel/vulkan/anv_private.h | 26 ++++ > > > > 2 files changed, 287 insertions(+) > > > > > > > > > > +static uint32_t > > > > +hash_uint32_t(const void *key) > > > > +{ > > > > + return (uint32_t)(uintptr_t)key; > > > > +} > > > > > > This hash function does not appear hashy. > > > > > > If I correctly understand the details of Mesa's struct hash_table, > > > choosing the identify function for the hash function causes unwanted > > > clustering when inserting consecutive gem handles. Since the kernel does > > > allocate gem handles consecutively, the problem is real. > > > > > > For proof, consider the following: > > > > > > - Suppose a long-running process (maybe the compositor) has thrashed > > on the > > > hash table long enough that its bucket count > > > is ht->size = hash_sizes[7].size = 283. Suppose a spike of > > > compositor activity raises the hash table's density to about 0.5. > > > And suppose the hash table buckets are filled with the consecutive > > gem > > > handles > > > > > > {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 } > > > > > > The exact density is (128 - 4 + 1) / 283 = 0.4417. > > > > > > - Next, some other in-process activity (maybe OpenGL) generated > > > a lot of gem handles after Vulkan's most recently imported > > > gem handle, 128. > > > > This point in the example---the reason why the gem handles in the > > anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem *is* > > real for multiple in-process OpenGL contexts derived from the same > > EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context > > shares the same intel_screen, and therefore the same drm device fd. But > > in Vulkan, each VkDevice opens its own drm device id. So, bogus example. > > > > BUT, that leads to a new question... > > > > Since each VkDevice has a unique drm device fd, and since the kernel > > allocates gem handles consecutively on the fd, and since struct > > hash_table only grows and never shrinks, and since patch 8/18 inserts > > every VkDeviceMemory into the cache... I believe no collisions are > > possible in anv_bo_cache. > > > > Does this fall under the category of unbreakable kernel ABI or is it just a > side-effect of the implementation? If not, then I'm reluctant to trust it.
I'm not certain. But krh, sitting beside me, says "It's ABI at this point. The kernel uses a 'idr' structure which guarantees that behavior". > > If there are no collisions, then the hash table is only adding overhead, > > Sure, but a no-collision hash table is pretty cheap... > > > and we should use a direct-addressing lookup table. The bo cache should > > look like this: > > > > struct anv_bo_cache { > > /* The array indices are gem handles. Null entries are legal. */ > > struct anv_bo **bos; > > > > /* Length of the array. Because the array can have holes, this > > * is *not* the number of gem handles in the array. > > */ > > size_t len; > > > > pthread_mutex_t mutex; > > }; > > > > struct anv_bo * > > anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle) > > { > > struct anv_bo *bo = NULL; > > > > pthread_mutex_lock(&cache->mutex); > > > > if (gem_handle < cache->len) > > bo = cache->entries[gem_handle]; > > > > pthread_mutex_unlock(&cache->mutex); > > > > return bo; > > > > } > > > > BUT, that leads to yet another question... > > > > Why is patch 8/18 inserting every VkDeviceMemory into the cache? If > > I understand things correctly, we only *need* to insert a VkDeviceMemory > > into the cache if, in vkAllocateMemory, either (1) > > VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2) > > VkMemoryAllocateInfo's pNext chain contains an import structure. > > > > Because I'm lazy. In order to start using the bo cache, > anv_device_memory::bo needs to be a pointer (well, it's makes the BO cache > API simpler and more efficient if it's a pointer). This would mean that we > would have to allocate an additional chunk of memory or go through some > other hoops in order to make it work. At the end of the day, just stuffing > everything in the cache was simpler and kept us to a single path. Huh? I don't understand how pointers-or-not-pointers affect the decision to use an array or a hash table. In the lookup function I wrote above, the anv_bo_cache holds pointers to anv_bo's, just like the anv_bo_cache in the actual patch. (I kept it that way so the API would be the same as in your patch series). > > If we insert into the cache only those VkDeviceMemories that are > > imported or that will be exported, then the bo cache remains small, and > > we *should* use a hash table. > > > > Maybe. But the client isn't supposed to be allocating hundreds of > VkDeviceMemory objects. It's supposed to allocate a few and then > suballocate from those. If the client allocates so many memory objects > that they start hitting hash table performance issues, that's their own > fault. > > Also, please remember that vkAllocateMemory is considered to be a *very* > heavy-weight function in Vulkan. Compared to an ioctl which allocates > memory, a hash table insert is trivial. I'm reasonably happy to make a few > changes here or there to make it more efficient if any of this proves to be > a problem. However, I think we're working way too hard to micro-optimize > something that takes < 0.001% of runtime. True. The difference between time needed for an array lookup and a hash table lookup is tiny compared to the ioctl. I wasn't suggesting to use an array for that 0.001% though. I suggested an array because struct hash_table, as used in this patch, is just a resizable array with an unneeded vtable. It just felt weird. We've discussed all the things :) I'll drop the complaint about hash-vs-array. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev