Re: [PATCH v4] drm/ttm: Clarify that the TTM_PL_SYSTEM is under TTMs control

Christian König Mon, 22 Nov 2021 06:15:45 -0800

Am 16.11.21 um 16:53 schrieb Zack Rusin:

On Nov 16, 2021, at 03:20, Christian König <christian.koe...@amd.com> wrote:


Am 16.11.21 um 08:43 schrieb Thomas Hellström:

On 11/16/21 08:19, Christian König wrote:

Am 13.11.21 um 12:26 schrieb Thomas Hellström:

Hi, Zack,

On 11/11/21 17:44, Zack Rusin wrote:

On Wed, 2021-11-10 at 09:50 -0500, Zack Rusin wrote:

TTM takes full control over TTM_PL_SYSTEM placed buffers. This makes
driver internal usage of TTM_PL_SYSTEM prone to errors because it
requires the drivers to manually handle all interactions between TTM
which can swap out those buffers whenever it thinks it's the right
thing to do and driver.

CPU buffers which need to be fenced and shared with accelerators
should
be placed in driver specific placements that can explicitly handle
CPU/accelerator buffer fencing.
Currently, apart, from things silently failing nothing is enforcing
that requirement which means that it's easy for drivers and new
developers to get this wrong. To avoid the confusion we can document
this requirement and clarify the solution.

This came up during a discussion on dri-devel:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F232f45e9-8748-1243-09bf-56763e6668b3%40amd.com&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C57658f299a72436627e608d9a9194209%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637726748229186505%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=UbE43u8a0MPNgXtzcqkJJfSe0I%2BA5Yojz7yoh7e6Oyo%3D&amp;reserved=0

I took a slightly deeper look into this. I think we need to formalize this a bit more to understand
pros and cons and what the restrictions are really all about. Anybody looking at the prevous
discussion will mostly see arguments similar to "this is stupid and difficult" and
"it has always been this way" which are not really constructive.

First disregarding all accounting stuff, I think this all boils down to
TTM_PL_SYSTEM having three distinct states:
1) POPULATED
2) LIMBO (Or whatever we want to call it. No pages present)
3) SWAPPED.

The ttm_bo_move_memcpy() helper understands these, and any standalone driver
implementation of the move() callback _currently_ needs to understand these as
well, unless using the ttm_bo_move_memcpy() helper.

Now using a bounce domain to proxy SYSTEM means that the driver can forget
about the SWAPPED state, it's automatically handled by the move setup code.
However, another pitfall is LIMBO, in that if when you move from SYSTEM/LIMBO
to your bounce domain, the BO will be populated. So any naive accelerated
move() implementation creating a 1GB BO in fixed memory, like VRAM, will
needlessly allocate and free 1GB of system memory in the process instead of
just performing a clear operation. Looks like amdgpu suffers from this?

I think what is really needed is either

a) A TTM helper that helps move callback implementations resolve the issues
populating system from LIMBO or SWAP, and then also formalize driver
notification for swapping. At a minimum, I think the swap_notify() callback
needs to be able to return a late error.

b) Make LIMBO and SWAPPED distinct memory regions. (I think I'd vote for this
without looking into it in detail).

In both these cases, we should really make SYSTEM bindable by GPU, otherwise
we'd just be trading one pitfall for another related without really resolving
the root problem.

As for fencing not being supported by SYSTEM, I'm not sure why we don't want
this, because it would for example prohibit async ttm_move_memcpy(), and also,
async unbinding of ttm_tt memory like MOB on vmgfx. (I think it's still sync).

There might be an accounting issue related to this as well, but I guess
Christian would need to chime in on this. If so, I think it needs to be well
understood and documented (in TTM, not in AMD drivers).

I think the problem goes deeper than what has been mentioned here so far.

Having fences attached to BOs in the system domain is probably ok, but the key 
point is that the BOs in the system domain are under TTMs control and should 
not be touched by the driver.

What we have now is that TTMs internals like the allocation state of BOs in 
system memory (the populated, limbo, swapped you mentioned above) is leaking 
into the drivers and I think exactly that is the part which doesn't work 
reliable here. You can of course can get that working, but that requires 
knowledge of the internal state which in my eyes was always illegal.

Well, I tend to agree to some extent, but then, like said above even disregarding 
swap will cause trouble with the limbo state, because the driver's move callback 
would need knowledge of that to implement moves limbo -> vram efficiently.

Well my long term plan is to audit the code base once more and remove the limbo 
state from the SYSTEM domain.

E.g. instead of a SYSTEM BO without pages you allocate a BO without a resource in 
general which is now possible since bo->resource is a pointer.

This would still allow us to allocate "empty shell" BOs. But a validation of 
those BOs doesn't cause a move, but rather just allocates the resource for the first time.

The problem so far was just that we access bo->resource way to often without 
checking it.

So all in all this would be a two step process 1) Eliminate the “limbo” state, 
2) Split PL_SYSTEM into PL_SYSTEM, that drivers could use for really anything, 
and PL_SWAP which would be under complete TTM control, yes? If that’s the case 
that would certainly make my life a lot easier (because the drivers wouldn’t 
need to introduce/manage their own system placements) and I think it would make 
the code a lot easier to understand.


We also have a couple of other use cases for this, yes.

That’s a bit of work though so the question what happens until this lands comes 
to mind. Is introduction of VMW_PL_SYSTEM and documenting that PL_SYSTEM is 
under complete TTM control (like this patch does) the way to go or do we want 
to start working on the above immediately? Because I’d love to be able to 
unload vmwgfx without kernel oopsing =)

I think documenting and getting into a clean state should beprerequisite for new development (even if the fix for the unload istrivial).


Regards,
Christian.

Re: [PATCH v4] drm/ttm: Clarify that the TTM_PL_SYSTEM is under TTMs control

Reply via email to