Patch against current amd-staging-4.6 is attached. Regards, Felix
On 16-08-13 05:25 AM, Christian König wrote: > Am 13.08.2016 um 01:22 schrieb Felix Kuehling: >> [CC Kent FYI] >> >> On 16-08-11 04:31 PM, Deucher, Alexander wrote: >>>> -----Original Message----- >>>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf >>>> Of Felix Kuehling >>>> Sent: Thursday, August 11, 2016 3:52 PM >>>> To: Michel Dänzer; Christian König >>>> Cc: amd-gfx@lists.freedesktop.org >>>> Subject: Reverted another change to fix buffer move hangs (was Re: >>>> [PATCH] drm/ttm: partial revert "cleanup ttm_tt_(unbind|destroy)" v2) >>>> >>>> We had to revert another change on the KFD branch to fix a buffer move >>>> problem: 8b6b79f43801f00ddcdc10a4d5719eba4b2e32aa (drm/amdgpu: >>>> group BOs >>>> by log2 of the size on the LRU v2 >>> That makes sense. I think you may want a different LRU scheme for >>> KFD or at least special handling for KFD buffers. >> [FK] But I think the patch shouldn't cause hangs, regardless. >> >> I eventually found what the problem was. The "group BOs by log2 of the >> size on the LRU v2" patch exposed a latent bug related to the GART size. >> On our KFD branch, we calculate the GART size differently, and it can >> easily go above 4GB. I think on amd-staging-4.6 the GART size can also >> go above 4GB on cards with lots of VRAM. >> >> However, the offset parameter in amdgpu_gart_bind and unbind is only >> 32-bit. With the patch our test ended up using GART offsets beyond 4GB >> for the first time. Changing the offset parameter to uint64_t fixes the >> problem. > > Nice catch, please provide a patch to fix this. > >> Our test also demonstrates a potential flaw in the log2 grouping patch: >> When a buffer of a previously unused size is added to the LRU, it gets >> added to the front of the list, rather than the tail. So an application >> that allocates a very large buffer after a bunch of smaller buffers, is >> very likely to have that buffer evicted over and over again before any >> smaller buffers are considered for eviction. I believe, this can result >> in thrashing of large buffers. >> >> Some other observations: When the last BO of a given size is removed >> from the LRU list, the LRU tail for that size is left "floating" in the >> middle of the LRU list. So the next BO of that size that is added, will >> be added at an arbitrary position in the list. It may even end up in the >> middle of a block of pages of a different size. So a log2 grouping may >> end up being split. > > Yeah, those are more or less known issues. > > Keep in mind that we only added the grouping by log2 of the size to > have a justification to push the TTM changes upstream for the coming > KFD fences. > > E.g. so that we are able to have this upstream before we try to push > on the fence code. > > I will take a look at fixing those issues when I have time, shouldn't > be to complicated to set the entries to zero when they aren't used or > adjust other entries as well when some are added. > > Regards, > Christian. > >> >> Regards, >> Felix >> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >
>From 34b9f9e33fc83e375c291e8d3480f8faff150183 Mon Sep 17 00:00:00 2001 From: Felix Kuehling <felix.kuehl...@amd.com> Date: Fri, 12 Aug 2016 19:25:21 -0400 Subject: [PATCH] drm/amdgpu: Change GART offset to 64-bit The GART aperture size can be bigger than 4GB. Therefore the offset used in amdgpu_gart_bind and amdgpu_gart_unbind must be 64-bit. Signed-off-by: Felix Kuehling <felix.kuehl...@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index d4ab35a..a55eb6f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -662,9 +662,9 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); -void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset, +void amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); -int amdgpu_gart_bind(struct amdgpu_device *adev, unsigned offset, +int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset, int pages, struct page **pagelist, dma_addr_t *dma_addr, uint32_t flags); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 921bce2..0feea34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -221,7 +221,7 @@ void amdgpu_gart_table_vram_free(struct amdgpu_device *adev) * Unbinds the requested pages from the gart page table and * replaces them with the dummy page (all asics). */ -void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset, +void amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages) { unsigned t; @@ -268,7 +268,7 @@ void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset, * (all asics). * Returns 0 for success, -EINVAL for failure. */ -int amdgpu_gart_bind(struct amdgpu_device *adev, unsigned offset, +int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset, int pages, struct page **pagelist, dma_addr_t *dma_addr, uint32_t flags) { -- 1.9.1
_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx