On 21.08.25 16:06, Thomas Hellström wrote:
>> What are you referring to?
>
> https://lore.kernel.org/intel-xe/a004736315d77837172418eb196d5b5f80b74e6c.ca...@linux.intel.com/

Thanks, that one never made it into my inbox as far as I can see.

> A couple of questions on the design direction here:
>
> IIRC both xe and i915 has checks to consider objects with a 0 gem
> refcount as zombies requiring special treatment or skipping, when
> encountered in TTM callbacks. We need to double-check that.

I think I've found all of those. The ones in i915 were actually not TTM
specific but try to catch the same problem on the GEM refcount.

> But I wonder,
> first this practice of resurrecting refcounts seem a bit unusual, I
> wonder if we can get rid of that somehow?

I was also going back and forth on whether that is a good idea or not.

The usual solution to this kind of issue is to use two reference counts,
so that you get a multi-stage cleanup approach, e.g. one count for the
backing store and one for the object, like what mm_struct is using as
well.

The problem was simply that TTM/GEM ended up having *four* reference
counts for the same object; each was doing something different and they
didn't work well together at all.

> Furthermore, it seems the problem with drm_exec is related only to the
> LRU walk. What about adding a struct completion to the object, that is
> signaled when the object has freed its final backing-store. The LRU
> walk would then check if the object is a zombie, and if so just wait on
> the struct completion. (Need of course to carefully set up locking
> orders). Then we wouldn't need to resurrect the gem refcount, nor use
> drm_exec locking for zombies.

I had a similar idea; waiting is already possible by waiting for the
BO's work item. But I abandoned that idea because I couldn't see how we
could solve the locking.
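
To make the two-reference-count idea above a bit more concrete, here is
a minimal user-space sketch of the pattern (mm_struct does the same
split with mm_users and mm_count). This is only an illustration under
assumptions: struct two_stage_obj, the obj_*() helpers and the two demo
counters are hypothetical names and are not existing TTM or GEM code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object, not a real TTM/GEM structure. */
struct two_stage_obj {
	atomic_int users;  /* stage 1: references that need the backing store */
	atomic_int count;  /* stage 2: references that only need the struct */
	void *backing;     /* stands in for the backing store */
};

/* Demo-only counters so the two cleanup stages can be observed. */
static int backing_released;
static int structs_freed;

static struct two_stage_obj *obj_create(size_t size)
{
	struct two_stage_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->backing = malloc(size);
	if (!obj->backing) {
		free(obj);
		return NULL;
	}
	atomic_init(&obj->users, 1);
	/* All users together hold exactly one structure reference. */
	atomic_init(&obj->count, 1);
	return obj;
}

/* Take a structure-only reference, e.g. for something like an LRU walker. */
static void obj_grab(struct two_stage_obj *obj)
{
	atomic_fetch_add(&obj->count, 1);
}

/* Drop a structure reference; the last one frees the struct. */
static void obj_drop(struct two_stage_obj *obj)
{
	if (atomic_fetch_sub(&obj->count, 1) == 1) {
		free(obj);
		structs_freed++;
	}
}

/* Take a backing store reference; the caller must already hold one. */
static void obj_get(struct two_stage_obj *obj)
{
	/* No resurrection: users must never go from zero back up. */
	assert(atomic_load(&obj->users) > 0);
	atomic_fetch_add(&obj->users, 1);
}

/* Drop a user reference; the last one releases the backing store. */
static void obj_put(struct two_stage_obj *obj)
{
	if (atomic_fetch_sub(&obj->users, 1) == 1) {
		free(obj->backing);
		obj->backing = NULL;
		backing_released++;
		/* Drop the structure reference held on behalf of all users. */
		obj_drop(obj);
	}
}
```

The point of the split is that a holder of a structure-only reference
can still safely look at the object after the last user is gone and the
backing store has been released, without ever having to bring a
zero refcount back to life.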

> We would still need some form of refcounting while waiting on the
> struct completion, but if we restricted the TTM refcount to *only* be
> used internally for that sole purpose, and also replaced the final
> ttm_bo_put() with the ttm_bo_finalize() that you suggest we wouldn't
> need to resurrect that refcount since it wouldn't drop to zero until
> the object is ready for final free.
>
> Ideas, comments?

Ideally I think we would use the handle_count as the backing store
reference and drm_gem_object->refcount as the structure reference. But
that means a massive rework of the GEM handling/drivers/TTM.

Alternatively, we could just grab a reference to an unsignaled fence
when we encounter a dead BO on the LRU. What do you think of that idea?

Regards,
Christian.