On 08.10.25 15:50, Tvrtko Ursulin wrote:
> 
> On 08/10/2025 13:35, Christian König wrote:
>> On 08.10.25 13:53, Tvrtko Ursulin wrote:
>>> Disclaimer:
>>> Please note that, as this series includes a patch which touches a good
>>> number of drivers, I will only copy everyone on the cover letter and the
>>> respective patch. The assumption is that people are subscribed to dri-devel
>>> and so can look at the whole series there. I know someone is bound to
>>> complain either way, whether because everyone is copied on everything and
>>> gets too much email, or because of this other approach, so please be
>>> flexible.
>>>
>>> Description:
>>>
>>> All drivers which use the TTM pool allocator end up requesting large order
>>> allocations when allocating large buffers. Those can be slow due to memory
>>> pressure and so add latency to buffer creation. But there is often also a
>>> size limit above which contiguous blocks do not bring any performance
>>> benefits. This series allows drivers to say when it is okay for TTM to try
>>> a bit less hard.
>>>
>>> We do this by allowing drivers to specify this cut-off point when creating
>>> the TTM device and pools. Allocations above this size will skip direct
>>> reclaim, so under memory pressure the worst case latency will improve.
>>> Background reclaim is still kicked off, and all the TTM pool buckets remain
>>> in use exactly as they are today, both before and after any memory
>>> pressure.
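
To illustrate the mechanism, here is a minimal, hypothetical sketch of the
gfp-level idea only; it is not the code from the series, which plumbs the
threshold through the TTM pool itself, and the helper name is made up:

#include <linux/gfp.h>

/* Sketch: skip direct reclaim above the driver-provided threshold. */
static struct page *pool_alloc_sketch(unsigned int order,
				      unsigned int max_beneficial_order)
{
	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;

	/*
	 * Above the threshold, drop direct reclaim so the allocation fails
	 * fast under memory pressure instead of stalling. __GFP_KSWAPD_RECLAIM
	 * stays set, so background reclaim is still woken up, and lower order
	 * buckets are unaffected.
	 */
	if (order > max_beneficial_order)
		gfp &= ~__GFP_DIRECT_RECLAIM;

	return alloc_pages(gfp, order);
}
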
>>>
>>> This is especially interesting if someone has configured MAX_PAGE_ORDER to
>>> something higher than the default. And even with the default, with amdgpu
>>> for example, the last patch in the series makes use of the new feature by
>>> telling TTM that above 2MiB we do not expect performance benefits, which
>>> makes TTM not try direct reclaim for the top bucket (4MiB).
>>>
>>> The end result is that TTM drivers become slightly nicer mm citizens and
>>> users benefit from better worst case buffer creation latencies. As a side
>>> benefit we get rid of two instances of those often very unreadable function
>>> signatures with multiple nameless booleans.
>>>
>>> If this sounds interesting and gets merged, the individual drivers can
>>> follow up with patches configuring their thresholds.
>>>
>>> v2:
>>>   * Christian suggested passing in the new data by changing the function
>>>     signatures.
>>>
>>> v3:
>>>   * Moved ttm pool helpers into new ttm_pool_internal.h. (Christian)
>>
>> Patch #3 is Acked-by: Christian König <[email protected]>.
>>
>> The rest is Reviewed-by: Christian König <[email protected]>
> 
> Thank you!
> 
> So I think now I need acks to merge via drm-misc for all the drivers which
> have their own trees, which seems to be just xe.

I think you should ping the XE guys for their opinion, but since there 
shouldn't be any functional change for them you can probably go ahead and merge 
the patches to drm-misc-next if there is no reply in time.

> Also interesting for other drivers is that when this lands folks can start 
> passing in their "max size which leads to performance gains" via 
> TTM_POOL_BENEFICIAL_ORDER and get the worst case allocation latency 
> improvements.

Yeah, as said before, if any other driver says they don't need this behavior we
should certainly add something.
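
For reference, a purely hypothetical sketch of what that could look like on the
driver side. The TTM_POOL_BENEFICIAL_ORDER name is from the series, but the
exact macro semantics (order vs. size) and how the value ends up in the init
flags are assumptions here:

#include <linux/mm.h>
#include <linux/sizes.h>

/*
 * Hypothetical: a driver which sees no performance benefit from contiguous
 * blocks above 2 MiB encodes that as a pool order (get_order(SZ_2M) == 9
 * with 4K pages) and ORs it into the flags it passes at TTM init time.
 */
static u32 my_drv_ttm_pool_flags(u32 flags)
{
	return flags | TTM_POOL_BENEFICIAL_ORDER(get_order(SZ_2M));
}
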

> I am thinking xe also maxes out at 2MiB pages, for others I don't know.

For AMDGPU this may actually change on future HW generations, so
having it configurable is certainly the right approach.

Regards,
Christian.

> 
> Regards,
> 
> Tvrtko
> 
>>> v1 thread:
>>> https://lore.kernel.org/dri-devel/[email protected]/
>>>
>>> Cc: Alex Deucher <[email protected]>
>>> Cc: Christian König <[email protected]>
>>> Cc: Danilo Krummrich <[email protected]>
>>> Cc: Dave Airlie <[email protected]>
>>> Cc: Gerd Hoffmann <[email protected]>
>>> Cc: Joonas Lahtinen <[email protected]>
>>> Cc: Lucas De Marchi <[email protected]>
>>> Cc: Lyude Paul <[email protected]>
>>> Cc: Maarten Lankhorst <[email protected]>
>>> Cc: Maxime Ripard <[email protected]>
>>> Cc: Rodrigo Vivi <[email protected]>
>>> Cc: Sui Jingfeng <[email protected]>
>>> Cc: Thadeu Lima de Souza Cascardo <[email protected]>
>>> Cc: Thomas Hellström <[email protected]>
>>> Cc: Thomas Zimmermann <[email protected]>
>>> Cc: Zack Rusin <[email protected]>
>>>
>>> Tvrtko Ursulin (5):
>>>    drm/ttm: Add getter for some pool properties
>>>    drm/ttm: Replace multiple booleans with flags in pool init
>>>    drm/ttm: Replace multiple booleans with flags in device init
>>>    drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
>>>    drm/amdgpu: Configure max beneficial TTM pool allocation order
>>>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  7 +--
>>>   drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
>>>   drivers/gpu/drm/i915/intel_region_ttm.c       |  2 +-
>>>   drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_ttm.c         |  4 +-
>>>   drivers/gpu/drm/qxl/qxl_ttm.c                 |  2 +-
>>>   drivers/gpu/drm/radeon/radeon_ttm.c           |  4 +-
>>>   drivers/gpu/drm/ttm/tests/ttm_bo_test.c       | 16 +++----
>>>   .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  2 +-
>>>   drivers/gpu/drm/ttm/tests/ttm_device_test.c   | 31 +++++--------
>>>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 22 ++++-----
>>>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |  7 +--
>>>   drivers/gpu/drm/ttm/tests/ttm_pool_test.c     | 23 +++++-----
>>>   drivers/gpu/drm/ttm/ttm_device.c              |  7 ++-
>>>   drivers/gpu/drm/ttm/ttm_pool.c                | 45 +++++++++++--------
>>>   drivers/gpu/drm/ttm/ttm_pool_internal.h       | 24 ++++++++++
>>>   drivers/gpu/drm/ttm/ttm_tt.c                  | 10 +++--
>>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |  4 +-
>>>   drivers/gpu/drm/xe/xe_device.c                |  2 +-
>>>   include/drm/ttm/ttm_device.h                  |  2 +-
>>>   include/drm/ttm/ttm_pool.h                    | 13 +++---
>>>   21 files changed, 125 insertions(+), 106 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/ttm/ttm_pool_internal.h
>>>
>>
> 
