> The long-term solution

That was the part that I missed in the description. Please note somewhere
that we still need to improve this.
Apart from that the patches look fine to me, but I need more time to review
them in detail.

Regards,
Christian.

On 31.12.2014 at 15:06, Oded Gabbay wrote:
>
> On 12/31/2014 03:49 PM, Christian König wrote:
>> On 31.12.2014 at 14:39, Oded Gabbay wrote:
>>> Background:
>>>
>>> amdkfd needs GART memory for several things, such as runlist packets,
>>> MQDs, HPDs and more. Unfortunately, all of this memory must always be
>>> pinned (for several reasons which were discussed during the
>>> initial review of amdkfd).
>> In general this seems to be a good idea, but so far I still haven't seen a
>> good explanation of why all that memory must be pinned. So please summarize
>> that once more.
>>
>> Regards,
>> Christian.
>>
> ok, once more :)
>
> The bulk of the allocations in the GART is for MQDs. MQDs represent active
> user-mode queues, which are on the current runlist. It is important to
> remember that active queues are not necessarily scheduled/running
> queues, especially if there is over-subscription of queues or more than a
> single HSA process.
>
> Because the scheduling of the user-mode queues is done by the CP firmware,
> amdkfd doesn't have any indication of whether a queue is scheduled or not.
> If the CP tries to schedule a queue and its MQD is not present, this will
> probably hang the CP permanently, as it will load garbage from the GART
> (the address of the MQD is given to the CP inside the runlist packet).
>
> In addition, there are a couple of small allocations which should also
> always be pinned - runlist packets (2 packets) and HPDs. runlist packets can
> be quite large, depending on the number of processes and queues.
>
> A few solutions were proposed, but in the end Jerome agreed there is no harm
> in limiting the total memory consumption to around 4MB.
>
> The long-term solution, which I will be working on, hopefully soon, is to
> create a mechanism through which radeon/ttm can ask amdkfd to clear
> GART/VRAM memory due to memory pressure. Then, amdkfd will preempt the
> running queues and wait until the memory pressure is over. Then it will
> reschedule the queues. But I'm getting ahead of myself. I hope to send an
> RFC about that in the next couple of weeks.
>
> Oded
>
>
>
>>> Current Solution:
>>>
>>> The current (short/mid-term) solution, which was proposed by Jerome.G, is
>>> to limit the amount of memory to a small size, roughly 4MB, and allocate
>>> this buffer at the start of the GART. To accommodate this, amdkfd has
>>> two kernel module parameters, maximum number of HSA processes and
>>> maximum number of queues per process, which require under 4MB of GART
>>> memory when using their defaults, 32 and 128 respectively.
>>>
>>> Until now, amdkfd used the radeon sub-allocator module (radeon_sa)
>>> to handle the sub-allocation of memory from this large buffer to
>>> different modules inside amdkfd.
>>>
>>> However, while running the OpenCL conformance test suite, we found that
>>> the radeon_sa module is not suitable for this kind of task, due to its
>>> design:
>>> 1. Every allocation increments its internal pointer so the next
>>> allocation is *always* done ahead of the previous allocation. This
>>> causes the internal pointer to wrap around when it reaches the end of
>>> the buffer.
>>>
>>> 2. When encountering an area that is already allocated, the module
>>> waits for that area to be freed. If it is not freed in a timely manner
>>> (or has no fence), the allocation fails. Simply put, it can't "skip"
>>> the allocated area.
>>>
>>> Now, this is most probably good for graphics, but for amdkfd's needs,
>>> the combination of the two behaviors mentioned above eventually causes
>>> a denial-of-service. This is because some memory allocations
>>> are *always* present and *never* freed (such as HPDs).
>>> Therefore, given enough time and workload, radeon_sa eventually
>>> wraps around, encounters an already allocated area and gets stuck.
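
As a rough illustration of the failure mode described in points 1 and 2
above, here is a toy model in C. It is not the radeon_sa code: ring_sa,
ring_alloc, ring_free, churn_until_stuck and the 8-chunk buffer are all made
up for the example, and the fence wait is collapsed into an immediate failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CHUNKS 8u              /* hypothetical buffer size, in chunks */

struct ring_sa {
    bool used[RING_CHUNKS];         /* true while a chunk is held */
    size_t next;                    /* cursor: only moves forward, wraps at the end */
};

/*
 * Allocate one chunk at the cursor. Returns the chunk index, or -1 if the
 * chunk under the cursor is still held. A real allocator would first wait
 * on a fence; for a chunk that is never freed the outcome is the same.
 */
static int ring_alloc(struct ring_sa *r)
{
    size_t idx = r->next;

    if (r->used[idx])
        return -1;                  /* cannot skip over the held chunk */
    r->used[idx] = true;
    r->next = (idx + 1) % RING_CHUNKS;
    return (int)idx;
}

static void ring_free(struct ring_sa *r, int idx)
{
    assert(idx >= 0 && (size_t)idx < RING_CHUNKS);
    r->used[idx] = false;
}

/*
 * Pin one chunk forever (like an HPD), then churn short-lived allocations.
 * Returns how many short-lived allocations succeed before the allocator
 * gets stuck.
 */
static int churn_until_stuck(void)
{
    struct ring_sa r = { .next = 0 };
    int pinned = ring_alloc(&r);    /* never freed */
    int ok = 0;

    assert(pinned == 0);
    for (;;) {
        int idx = ring_alloc(&r);

        if (idx < 0)
            break;                  /* cursor wrapped onto the pinned chunk */
        ring_free(&r, idx);         /* short-lived: freed right away */
        ok++;
    }
    return ok;
}
```

In this model the allocator fails as soon as the cursor wraps back onto the
pinned chunk, even though every other chunk is free at that moment.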
>>>
>>> Proposed new solution:
>>>
>>> To solve this, I have written a simple sub-allocator module inside
>>> amdkfd. It allocates fixed-size contiguous chunks (1 or more) and uses
>>> a bitmap to manage the allocations. The search for the next allocation
>>> always begins at the start of the GART buffer, and the module
>>> knows how to skip allocated chunks.
>>>
>>> Because most allocations are MQDs, and MQDs are 512 Bytes in size, I
>>> set the default chunk size to be 512 Bytes.
>>>
>>> The basic GART memory allocation is still being done in the
>>> amdkfd <--> radeon interface, and it still occupies less than 4MB.
>>>
>>> I have chosen to implement a new allocator instead of changing
>>> radeon_sa because the behavior of radeon_sa is very appropriate for
>>> graphics, where allocations do not stay forever. Also, amdkfd doesn't
>>> actually need the flexibility and features radeon_sa provides.
>>>
>>> Oded
>>>
>>> Oded Gabbay (9):
>>>   drm/amd: Add new kfd-->kgd interface for gart usage
>>>   drm/radeon: Impl. new gtt allocate/free functions
>>>   drm/amdkfd: Add gtt sa related data to kfd_dev struct
>>>   drm/amdkfd: Add kfd gtt sub-allocator functions
>>>   drm/amdkfd: Fixed calculation of gart buffer size
>>>   drm/amdkfd: Allocate gart memory using new interface
>>>   drm/amdkfd: Using new gtt sa in amdkfd
>>>   drm/radeon: Remove old radeon_sa usage from kfd-->kgd interface
>>>   drm/amd: Remove old radeon_sa funcs from kfd-->kgd interface
>>>
>>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 217 ++++++++++++++++++++-
>>>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  23 +--
>>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |  41 ++--
>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |  16 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  10 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  28 ++-
>>>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  23 +--
>>>  drivers/gpu/drm/radeon/radeon_kfd.c                | 128 ++++++------
>>>  8 files changed, 329 insertions(+), 157 deletions(-)
>>>
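
For readers who want a picture of the sub-allocator described under
"Proposed new solution" above, a minimal user-space sketch in C follows. It
only mirrors the behaviour stated in the cover letter (512-byte chunks, a
bitmap, first-fit search from the start of the buffer, skipping allocated
chunks). The names sketch_sa, sketch_alloc and sketch_free, and the 64-chunk
buffer, are assumptions made for the example and are not taken from the
patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHUNK_SIZE 512u      /* chunk size quoted in the cover letter */
#define SKETCH_NUM_CHUNKS 64u       /* hypothetical, kept small for the example */

struct sketch_sa {
    uint8_t bitmap[SKETCH_NUM_CHUNKS / 8];  /* bit i set => chunk i in use */
};

static bool chunk_used(const struct sketch_sa *sa, size_t i)
{
    return ((sa->bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}

static void chunk_mark(struct sketch_sa *sa, size_t i, bool used)
{
    uint8_t mask = (uint8_t)(1u << (i % 8));

    if (used)
        sa->bitmap[i / 8] |= mask;
    else
        sa->bitmap[i / 8] &= (uint8_t)~mask;
}

/*
 * First-fit allocation: scan from chunk 0 for a run of free chunks large
 * enough for `size` bytes, skipping over chunks that are in use.
 * Returns the byte offset into the buffer, or -1 if no run fits.
 */
static long sketch_alloc(struct sketch_sa *sa, size_t size)
{
    size_t need = (size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
    size_t run = 0;

    assert(size > 0);
    for (size_t i = 0; i < SKETCH_NUM_CHUNKS; i++) {
        run = chunk_used(sa, i) ? 0 : run + 1;
        if (run == need) {
            size_t first = i + 1 - need;

            for (size_t j = first; j <= i; j++)
                chunk_mark(sa, j, true);
            return (long)(first * SKETCH_CHUNK_SIZE);
        }
    }
    return -1;
}

/* Release the chunks covering [offset, offset + size). */
static void sketch_free(struct sketch_sa *sa, long offset, size_t size)
{
    size_t need = (size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
    size_t first;

    assert(offset >= 0 && (size_t)offset % SKETCH_CHUNK_SIZE == 0);
    first = (size_t)offset / SKETCH_CHUNK_SIZE;
    assert(first + need <= SKETCH_NUM_CHUNKS);
    for (size_t j = first; j < first + need; j++)
        chunk_mark(sa, j, false);
}
```

Because every search restarts at chunk 0 and simply steps over chunks that
are in use, a long-lived allocation such as an HPD only costs its own chunks;
it does not block later allocations the way it does with a forward-only
cursor.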