On 12/09/2013 06:28 PM, Daniel Vetter wrote:
> On Thu, Dec 05, 2013 at 11:45:03AM -0500, Jerome Glisse wrote:
>> On Thu, Dec 05, 2013 at 05:22:54PM +0100, Maarten Lankhorst wrote:
>>> On 05-12-13 16:49, Jerome Glisse wrote:
>>>> On Thu, Dec 05, 2013 at 11:26:46AM +0100, Thomas Hellstrom wrote:
>>>>> Hi!
>>>>>
>>>>> On 12/05/2013 10:36 AM, Lauri Kasanen wrote:
>>>>>> Hi list, Thomas,
>>>>>>
>>>>>> I will be investigating the use of a hotness score for each bo, to
>>>>>> replace the ping-pong-causing LRU eviction in radeon*.
>>>>>>
>>>>>> The goal is to put all bos that fit in VRAM there, in order of
>>>>>> hotness; a new bo should only be placed there if its hotness score
>>>>>> is greater than the lowest VRAM bo's. Then the lowest-hotness bos
>>>>>> in VRAM should be evicted until the new bo fits. This should
>>>>>> result in a more stable set with less ping-pong.
>>>>>>
>>>>>> Jerome advised that the bo placement should be done entirely
>>>>>> outside TTM. As I'm not (yet) too familiar with that side of the
>>>>>> kernel, what is the opinion of TTM folks?
>>>>> There are a couple of things to be considered:
>>>>> 1) You need to decide where a bo to be validated should be placed.
>>>>> The driver can give a list of possible placements to TTM and let
>>>>> TTM decide, trying each placement in turn. A driver that thinks
>>>>> this isn't sufficient can come up with its own strategy and give
>>>>> only a single placement to TTM. If TTM can't satisfy that, it will
>>>>> give you an error back, and the driver will need to validate with
>>>>> an alternative placement. I think Radeon already does this? vmwgfx
>>>>> does it to some extent.
>>>>>
>>>>> 2) As you say, TTM evicts strictly on an LRU basis, maintaining one
>>>>> LRU list per memory type, and also a global swap LRU list for
>>>>> buffers that are backed by system pages (not VRAM). I guess what
>>>>> you would want to do is to replace the VRAM LRU list with a
>>>>> priority queue where bos are continuously sorted based on hotness.
>>>>> As long as you obey the locking rules:
>>>>> *) Locking order is bo::reserve -> lru-lock
>>>>> *) When walking the queue with the lru-lock held, you must
>>>>> therefore tryreserve if you want to reserve an object on the queue
>>>>> *) bos need to be removed from the queue as soon as they are
>>>>> reserved
>>>>> *) Don't remove a bo from the queue unless it is reserved
>>>>> Nothing stops you from doing this in the driver, but OTOH if this
>>>>> ends up being useful for other drivers I'd prefer we put it into
>>>>> TTM.
>>>> It will be useful to others; the point I am making is that others
>>>> might not use TTM at all, and there is nothing about bo placement
>>>> that needs to be TTM-specific.
>>>>
>>>> Avoiding bo eviction from the LRU list is just a matter of the
>>>> driver never over-committing bos on a pool of memory and doing the
>>>> eviction itself, i.e. deciding on a new placement for a bo and
>>>> moving it before moving in another bo, which can be done outside
>>>> TTM.
>>>>
>>>> The only thing that needs modification in TTM is the work done to
>>>> control memory fragmentation, but this should not be enforced on
>>>> all TTM users and should be a runtime decision. GPUs with a virtual
>>>> address space can scatter a bo through VRAM using VRAM pages,
>>>> making memory fragmentation pretty much a non-issue (some GPUs
>>>> still need contiguous memory for scanout buffers or other specific
>>>> buffers).
>>>>
>>> You're correct it COULD be done like that, but that's a nasty
>>> workaround.
>>> Simply assign a priority to each buffer, then modify
>>> ttm_bo_add_to_lru, ttm_bo_swapout and ttm_mem_evict_first, and be
>>> done with it.
>>>
>>> Memory management is exactly the kind of thing that should be done
>>> in TTM, so why have something 'generic' for something that's little
>>> more than a renamed priority queue?
>> The end score and the use of the score for placement decisions can be
>> done in TTM, but the whole score computation and the heuristics
>> related to it should not.
> btw another thing to look at is the eviction roster in drm_mm. It's
> completely standalone; the only thing it requires is that you have a
> deterministic order in which to add objects to it and unroll them
> (but that can always be solved by putting objects on a temporary
> list).
>
> That way, if you have some big objects and a highly fragmented VRAM,
> you don't end up evicting a big load of data, but just a perfectly
> sized hole. All the scanning is linear, but IME with the
> implementation in i915.ko that's not a real-world issue. The drm_mm
> roster supports all the same features as the normal block allocator,
> so range-restricted allocations (and everything else) also work. See
> evict_something in i915_gem_evict.c for how it all works (yeah, no
> docs, but writing those for drm_mm.c is on my todo somewhere).
> -Daniel
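As a concrete illustration of the hotness-based placement Lauri proposes
above, here is a minimal standalone userspace model (not kernel code):
the VRAM list is kept sorted by a driver-computed hotness score instead
of LRU order, and a new bo only displaces colder residents. The names
model_bo, vram_pool and try_place_in_vram are made up for the sketch; in
a real driver this policy would live in the paths Maarten names
(ttm_bo_add_to_lru, ttm_mem_evict_first), under the reserve/lru-lock
rules Thomas lists.

/*
 * Standalone userspace model of hotness-ordered VRAM placement.
 * All names are hypothetical; this only demonstrates the policy.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_bo {
    unsigned long hotness;     /* driver-computed score, higher == hotter */
    size_t size;
    struct model_bo *next;     /* list link, kept in ascending hotness */
};

struct vram_pool {
    size_t total, used;
    struct model_bo *coldest;  /* list head == first eviction victim */
};

/* Keep the list sorted by hotness so eviction always starts at the head. */
static void pool_insert(struct vram_pool *pool, struct model_bo *bo)
{
    struct model_bo **p = &pool->coldest;

    while (*p && (*p)->hotness < bo->hotness)
        p = &(*p)->next;
    bo->next = *p;
    *p = bo;
    pool->used += bo->size;
}

/*
 * Place @bo in VRAM only if there is room or it is hotter than the
 * coldest resident bo, evicting coldest-first until it fits.
 */
static bool try_place_in_vram(struct vram_pool *pool, struct model_bo *bo)
{
    if (pool->used + bo->size > pool->total &&
        pool->coldest && pool->coldest->hotness >= bo->hotness)
        return false;                  /* caller falls back to GTT/system */

    while (pool->used + bo->size > pool->total) {
        struct model_bo *victim = pool->coldest;

        if (!victim)
            return false;              /* bo larger than the whole pool */
        pool->coldest = victim->next;
        pool->used -= victim->size;
        printf("evicting bo with hotness %lu\n", victim->hotness);
    }
    pool_insert(pool, bo);
    return true;
}

int main(void)
{
    struct vram_pool pool = { .total = 256 };
    struct model_bo bos[] = {
        { .hotness = 10, .size = 100 },
        { .hotness = 50, .size = 100 },
        { .hotness = 5,  .size = 100 },  /* colder than the coldest: refused */
        { .hotness = 80, .size = 100 },  /* displaces the hotness-10 bo */
    };

    for (size_t i = 0; i < sizeof(bos) / sizeof(bos[0]); i++)
        printf("bo %zu placed in VRAM: %d\n", i,
               try_place_in_vram(&pool, &bos[i]));
    return 0;
}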
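For the drm_mm eviction roster Daniel points at, the usage pattern
(modelled loosely on evict_something in i915_gem_evict.c) is roughly the
untested sketch below. Only the drm_mm_* calls are the real interface as
it looked around this time, and their signatures may differ on other
kernel versions; struct my_bo, the lru list and the unbind() hook are
hypothetical driver-side names.

#include <linux/errno.h>
#include <linux/list.h>
#include <drm/drm_mm.h>

struct my_bo {                          /* hypothetical driver bo wrapper */
    struct drm_mm_node mm_node;
    struct list_head lru_link;          /* on the memory type's LRU */
    struct list_head scan_link;         /* temporary, for the scan roster */
};

static void unbind(struct my_bo *bo);   /* hypothetical: actually evict */

static int evict_for_hole(struct drm_mm *mm, struct list_head *lru,
                          unsigned long size, unsigned alignment)
{
    struct my_bo *bo, *next;
    LIST_HEAD(unwind);
    LIST_HEAD(evict);
    bool found = false;

    drm_mm_init_scan(mm, size, alignment, 0);

    /* Feed candidates to the scanner in LRU order until a hole appears. */
    list_for_each_entry(bo, lru, lru_link) {
        list_add(&bo->scan_link, &unwind);
        if (drm_mm_scan_add_block(&bo->mm_node)) {
            found = true;
            break;
        }
    }

    /*
     * Every scanned node must be removed again, in reverse order of
     * addition (which is what walking the unwind list gives us).  Nodes
     * that are part of the hole are queued for eviction; the rest stay
     * exactly where they were.
     */
    list_for_each_entry_safe(bo, next, &unwind, scan_link) {
        if (drm_mm_scan_remove_block(&bo->mm_node))
            list_move(&bo->scan_link, &evict);
        else
            list_del(&bo->scan_link);
    }

    /* Only now is it safe to modify the allocator again. */
    list_for_each_entry_safe(bo, next, &evict, scan_link) {
        list_del(&bo->scan_link);
        unbind(bo);
    }

    return found ? 0 : -ENOSPC;
}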
The problem with combining this with TTM is that eviction by default
doesn't take place under a mutex, so multiple threads may be traversing
the LRU list more or less at the same time, evicting stuff. However,
when it comes to eviction, that's not really a behaviour we need to
preserve. It would, IMO, be OK to take a "big" per-memory-type mutex
around eviction, but then one would have to sort out how/whether
swapping and delayed destruction would need to wait on that mutex as
well...

/Thomas
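For the serialization question above, a minimal sketch of such a "big"
per-memory-type eviction mutex could look like the following, assuming a
hypothetical per-type manager struct and an evict_first_locked() helper;
whether the swapout and delayed-destroy paths would also have to take
the mutex is exactly the open question.

#include <linux/mutex.h>
#include <linux/types.h>

struct my_mem_type_manager {            /* hypothetical per-type manager */
    struct mutex evict_mutex;           /* serializes eviction for this type */
    /* ... lru/priority queue, lru-lock, ... */
};

/* hypothetical helper that picks and evicts one bo from this type */
static int evict_first_locked(struct my_mem_type_manager *man);

static int evict_one(struct my_mem_type_manager *man, bool interruptible)
{
    int ret;

    if (interruptible) {
        ret = mutex_lock_interruptible(&man->evict_mutex);
        if (ret)
            return ret;
    } else {
        mutex_lock(&man->evict_mutex);
    }

    /*
     * With the mutex held, only one thread walks this memory type's
     * queue at a time, so a hotness-sorted queue cannot be reshuffled
     * underneath the walker.
     */
    ret = evict_first_locked(man);

    mutex_unlock(&man->evict_mutex);
    return ret;
}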