[PATCH 0/9] Replace use of radeon_sa with a new sub allocator

Christian König Wed, 31 Dec 2014 18:07:16 +0100

> The long-term solution
That was the part that I missed in the description. Please note 
somewhere that we still need to improve this.


Apart from that the patches look fine to me, but I need more time to 
review them in detail.

Regards,
Christian.

On 31.12.2014 at 15:06, Oded Gabbay wrote:
>
> On 12/31/2014 03:49 PM, Christian König wrote:
>> On 31.12.2014 at 14:39, Oded Gabbay wrote:
>>> Background:
>>>
>>> amdkfd needs GART memory for several things, such as runlist packets,
>>> MQDs, HPDs and more. Unfortunately, all of this memory must be always
>>> pinned (due to several reasons which were discussed during the
>>> initial review of amdkfd).
>> In general this seems to be a good idea, but so far I still haven't seen a
>> good explanation of why all that memory must be pinned. So please summarize
>> that once more.
>>
>> Regards,
>> Christian.
>>
> ok, once more :)
>
> The bulk of the allocations in the GART is for MQDs. MQDs represent active
> user-mode queues, which are on the current runlist. It is important to
> remember that an active queue is not necessarily a scheduled/running
> queue, especially if there is over-subscription of queues or more than a
> single HSA process.
>
> Because the scheduling of the user-mode queues is done by the CP firmware,
> amdkfd has no indication of whether a queue is scheduled or not. If the
> CP tries to schedule a queue whose MQD is not present, the CP will
> probably hang permanently, because it will load garbage from the GART
> (the address of the MQD is given to the CP inside the runlist packet).
>
> In addition, there are a couple of small allocations which should also
> always be pinned: runlist packets (2 packets) and HPDs. Runlist packets can
> be quite large, depending on the number of processes and queues.
>
> A few solutions were proposed, but in the end Jerome agreed there is no harm
> in limiting the total memory consumption to around 4MB.
>
> The long-term solution, which I will be working on, hopefully soon, is to
> create a mechanism through which radeon/ttm can ask amdkfd to clear
> GART/VRAM memory due to memory pressure. Then, amdkfd will preempt the
> running queues and wait until the memory pressure is over. Then it will
> reschedule the queues. But I'm getting ahead of myself. I hope to send an
> RFC about that in the next couple of weeks.
>
>       Oded
>
>
>
>>> Current Solution:
>>>
>>> The current (short/mid-term) solution, which was proposed by Jerome G., is
>>> to limit the amount of memory to a small size, roughly 4MB, and allocate
>>> this buffer at the start of the GART. To accommodate this, amdkfd has
>>> two kernel module parameters, maximum number of HSA processes and
>>> maximum number of queues per process, which require under 4MB of GART
>>> memory when using their defaults, 32 and 128 respectively.
>>>
>>> Until now, amdkfd used the radeon sub-allocator module (radeon_sa)
>>> to handle the sub-allocation of memory from this large buffer to
>>> different modules inside the amdkfd.
>>>
>>> However, while running OpenCL conformance test suite, we found that
>>> radeon_sa module is not suitable for this kind of task, due to its
>>> design:
>>> 1. Every allocation increments its internal pointer so the next
>>> allocation is *always* done ahead of the previous allocation. This
>>> causes the internal pointer to wrap-around when it reaches the end of
>>> the buffer.
>>>
>>> 2. When encountering an area that is already allocated, the module
>>> waits for that area to be freed. If it is not freed in a timely manner
>>> (or has no fence), the allocation fails. Simply put, it can't "skip"
>>> the allocated area.
>>>
>>> Now, this is most probably good for graphics, but for amdkfd's needs,
>>> the combination of the two behaviors mentioned above eventually causes
>>> a denial-of-service. This is because some memory allocations
>>> are *always* present and *never* freed (such as HPDs).
>>> Therefore, given enough time and workload, the radeon_sa eventually
>>> wraps around, encounters an already allocated area and gets stuck.
>>>
>>> Proposed new solution:
>>>
>>> To solve this, I have written a simple sub-allocator module inside
>>> amdkfd. It allocates fixed-size contiguous chunks (1 or more) and uses
>>> a bitmap to manage the allocations. The search for the next allocation
>>> always begins at the start of the GART buffer, and the module
>>> knows how to skip allocated chunks.
>>>
>>> Because most allocations are MQDs, and MQDs are 512 Bytes in size, I
>>> set the default chunk size to be 512 Bytes.
>>>
>>> The basic GART memory allocation is still being done in the
>>> amdkfd <--> radeon interface, and it still occupies less than 4MB.
>>>
>>> I have chosen to implement a new allocator instead of changing
>>> radeon_sa because the behavior of radeon_sa is very appropriate for
>>> graphics, where allocations do not stay forever. Also, amdkfd doesn't
>>> actually need the flexibility and features radeon_sa provides.
>>>
>>>      Oded
>>>
>>> Oded Gabbay (9):
>>>     drm/amd: Add new kfd-->kgd interface for gart usage
>>>     drm/radeon: Impl. new gtt allocate/free functions
>>>     drm/amdkfd: Add gtt sa related data to kfd_dev struct
>>>     drm/amdkfd: Add kfd gtt sub-allocator functions
>>>     drm/amdkfd: Fixed calculation of gart buffer size
>>>     drm/amdkfd: Allocate gart memory using new interface
>>>     drm/amdkfd: Using new gtt sa in amdkfd
>>>     drm/radeon: Remove old radeon_sa usage from kfd-->kgd interface
>>>     drm/amd: Remove old radeon_sa funcs from kfd-->kgd interface
>>>
>>>    drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 217 ++++++++++++++++++++-
>>>    .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  23 +--
>>>    drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |  41 ++--
>>>    drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |  16 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  10 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  28 ++-
>>>    drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  23 +--
>>>    drivers/gpu/drm/radeon/radeon_kfd.c                | 128 ++++++------
>>>    8 files changed, 329 insertions(+), 157 deletions(-)
>>>
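
For illustration only, the bitmap scheme described in the quoted cover
letter (fixed-size chunks, a search that always starts at the beginning of
the buffer, and the ability to skip chunks that are in use) could be
sketched roughly as below. This is not the code from the patch series; all
names (chunk_sa, chunk_sa_alloc, chunk_sa_free, chunk_sa_selftest) and the
tiny 64-chunk buffer are hypothetical, and locking is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define CHUNK_SIZE    512u	/* bytes per chunk (one MQD fits in one chunk) */
#define NUM_CHUNKS    64u	/* deliberately tiny buffer for the example */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((NUM_CHUNKS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct chunk_sa {
	unsigned long bitmap[BITMAP_WORDS];	/* bit i set: chunk i in use */
};

static int chunk_in_use(const struct chunk_sa *sa, size_t i)
{
	return (int)((sa->bitmap[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1ul);
}

static void chunk_set(struct chunk_sa *sa, size_t i, int in_use)
{
	unsigned long mask = 1ul << (i % BITS_PER_WORD);

	if (in_use)
		sa->bitmap[i / BITS_PER_WORD] |= mask;
	else
		sa->bitmap[i / BITS_PER_WORD] &= ~mask;
}

/*
 * First-fit search that always starts at chunk 0. Returns the byte offset
 * of a run of free, contiguous chunks large enough for 'size', or -1.
 */
static long chunk_sa_alloc(struct chunk_sa *sa, size_t size)
{
	size_t need, run = 0, i, j;

	if (size == 0 || size > (size_t)NUM_CHUNKS * CHUNK_SIZE)
		return -1;
	need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;	/* round up to chunks */

	for (i = 0; i < NUM_CHUNKS; i++) {
		if (chunk_in_use(sa, i)) {
			run = 0;	/* skip the used chunk, restart the run */
			continue;
		}
		if (++run == need) {
			for (j = i + 1 - need; j <= i; j++)
				chunk_set(sa, j, 1);
			return (long)((i + 1 - need) * CHUNK_SIZE);
		}
	}
	return -1;	/* no free run of the requested length */
}

/* Releases the chunks covering [offset, offset + size). */
static void chunk_sa_free(struct chunk_sa *sa, long offset, size_t size)
{
	size_t first = (size_t)offset / CHUNK_SIZE;
	size_t count = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
	size_t i;

	for (i = first; i < first + count && i < NUM_CHUNKS; i++)
		chunk_set(sa, i, 0);
}

/* A never-freed allocation at the start does not block later requests. */
int chunk_sa_selftest(void)
{
	struct chunk_sa sa = { { 0 } };
	long pinned, a, b, c;

	pinned = chunk_sa_alloc(&sa, 512);	/* chunk 0, held forever */
	a = chunk_sa_alloc(&sa, 1024);		/* chunks 1-2 */
	b = chunk_sa_alloc(&sa, 512);		/* chunk 3 */
	chunk_sa_free(&sa, a, 1024);
	c = chunk_sa_alloc(&sa, 512);		/* reuses chunk 1 */

	return (pinned == 0 && a == 512 && b == 1536 && c == 512) ? 0 : -1;
}
```

In this sketch the permanently held chunk 0 is simply skipped on every
search, which is the property the cover letter says radeon_sa lacks. As a
size check under the stated defaults, and assuming exactly one 512-byte
MQD per queue, 32 processes x 128 queues x 512 bytes = 2 MiB, which leaves
room for runlist packets and HPDs under the 4MB limit.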
