On Wed, Oct 13, 2021 at 10:08:21PM +0200, Salvatore Bonaccorso wrote: > Hi, > > On Mon, Oct 11, 2021 at 10:30:21AM +0200, Christian König wrote: > > Am 10.10.21 um 16:14 schrieb Xi Ruoyao: > > > On Sun, 2021-10-10 at 14:46 +0100, Nathaniel Filardo wrote: > > > > It occurs to me, quite belatedly, that it may be worth asking the > > > > author, reviewers, and signers of the change in question their > > > > thoughts on this bug report: > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.debian.org%2Fcgi-bin%2Fbugreport.cgi%3Fbug%3D990279&data=04%7C01%7Cchristian.koenig%40amd.com%7C915628061dd746062c5408d98bf84df9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637694721282436279%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=V4R4BPCHNQzx2bF6STDzfjW%2BQezZg89w8%2FEeRpuRVnM%3D&reserved=0 > > > > > > > > In particular, on ppc64 systems, Linux typically is configured to use > > > > a 64KiB page (i.e., shift 16) rather than 4KiB (shift 12) page. It > > > > looks, however, that AMDGPU_GPU_PAGE_SIZE is always 4096, and so > > > > something (perhaps in userspace, even, eek?) is requesting > > > > 4KiB-but-not-64KiB alignment of this buffer. > > > Christian told me the buffer should be aligned to *CPU* page boundary, > > > or the page table in AMDGPU driver will be corrupted: > > > > Yeah, that's indeed correct. And that intentionally breaks because otherwise > > we can corrupt the page tables and potentially cause much worse trouble. > > > > Question is more why userspace isn't told the correct value in your branch. > > > > > > > > > the value of num_entries must always be a multiple of > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. > > > > You need to identify the root cause of this, most likely start or last > > > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > > IMO f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable > > > dev_info.gart_page_size" should be backported along with this, which > > > makes the kernel to provide the CPU page size to libdrm and mesa and > > > correct userspace behavior. I'm not sure why only one is backported. > > > > > > Yes, exactly that sounds like the correct fix to me as well. > > So, the 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo > map") was backported to several stable series 4.14.229, 4.19.185, > 5.4.110, 5.10.28 and 5.11.12 but not the > f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable > dev_info.gart_page_size". > > What is confusely is that all of those backports reference as upstream > commit e3512fb67093fabdf27af303066627b921ee9bd8 and not > 9a89a721b41b23c6da8f8a6dd0e382966a850dcf which might be in part source > of the confusion? > > Can any of you request to backport > f4d3da72a76a9ce5f57bba64788931686a9dc333 as well for those stable > series where relevant?
Here is the proposed change. Should this be submitted to stable for 5.10.y? Regards, Salvatore >From 02c987eb2ab0cdfd536d08bf812f4e37d3cc150a Mon Sep 17 00:00:00 2001 From: Huacai Chen <che...@lemote.com> Date: Tue, 30 Mar 2021 23:33:33 +0800 Subject: [PATCH] drm/amdgpu: Set a suitable dev_info.gart_page_size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit f4d3da72a76a9ce5f57bba64788931686a9dc333 upstream. In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE). Signed-off-by: Rui Wang <wa...@lemote.com> Signed-off-by: Huacai Chen <che...@lemote.com> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao <xry...@mengyan1223.wang> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <d...@danny.cz> Reviewed-by: Christian König <christian.koe...@amd.com> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com> [Salvatore Bonaccorso: Backport to 5.10.y which does not contain a5a52a43eac0 ("drm/amd/amdgpu/amdgpu_kms: Remove 'struct drm_amdgpu_info_device dev_info' from the stack") which removes dev_info from the stack and places it on the heap.] Signed-off-by: Salvatore Bonaccorso <car...@debian.org> --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index efda38349a03..917b94002f4b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -766,9 +766,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file dev_info.high_va_offset = AMDGPU_GMC_HOLE_END; dev_info.high_va_max = AMDGPU_GMC_HOLE_END | vm_size; } - dev_info.virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); + dev_info.virtual_address_alignment = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info.pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; - dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE; + dev_info.gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info.cu_active_number = adev->gfx.cu_info.number; dev_info.cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; dev_info.ce_ram_size = adev->gfx.ce_ram_size; -- 2.33.0