[PATCH] drm/amd/display: Fix memory leak in dm_sw_fini()
After destroying dmub_srv, the memory associated with it is not freed,
causing a memory leak:

unreferenced object 0x896302b45800 (size 1024):
  comm "(udev-worker)", pid 222, jiffies 4294894636
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  backtrace (crc 6265fd77):
    [] kmalloc_trace+0x29d/0x340
    [] dm_dmub_sw_init+0xb4/0x450 [amdgpu]
    [] dm_sw_init+0x15/0x2b0 [amdgpu]
    [] amdgpu_device_init+0x1417/0x24e0 [amdgpu]
    [] amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
    [] amdgpu_pci_probe+0x187/0x4e0 [amdgpu]
    [] local_pci_probe+0x3e/0x90
    [] pci_device_probe+0xc3/0x230
    [] really_probe+0xe2/0x480
    [] __driver_probe_device+0x78/0x160
    [] driver_probe_device+0x1f/0x90
    [] __driver_attach+0xce/0x1c0
    [] bus_for_each_dev+0x70/0xc0
    [] bus_add_driver+0x112/0x210
    [] driver_register+0x55/0x100
    [] do_one_initcall+0x41/0x300

Fix this by freeing dmub_srv after destroying it.

Fixes: 743b9786b14a ("drm/amd/display: Hook up the DMUB service in DM")
Signed-off-by: Armin Wolf
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 59d2eee72a32..9cbfc8d39dee 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2287,6 +2287,7 @@ static int dm_sw_fini(void *handle)
 
 	if (adev->dm.dmub_srv) {
 		dmub_srv_destroy(adev->dm.dmub_srv);
+		kfree(adev->dm.dmub_srv);
 		adev->dm.dmub_srv = NULL;
 	}
 
-- 
2.39.2
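The pattern behind this fix can be illustrated with a small userspace sketch. It is not the amdgpu code: every name below (fake_srv, fake_dm, live_allocs and the helper functions) is hypothetical, and calloc()/free() stand in for the kernel allocator. The point it tries to show is that a "destroy" routine that only resets internal state leaves the allocation alive, so the teardown path still has to free it before clearing the pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a service object owned by a driver. */
struct fake_srv {
	int hw_ready;	/* internal state that "destroy" resets */
};

/* Hypothetical owner, playing the role of the display manager. */
struct fake_dm {
	struct fake_srv *srv;
};

/* Number of allocations that have not been freed yet. */
int live_allocs;

struct fake_srv *fake_srv_alloc(void)
{
	struct fake_srv *srv = calloc(1, sizeof(*srv));

	if (srv) {
		srv->hw_ready = 1;
		live_allocs++;
	}
	return srv;
}

/* Resets internal state only; the memory itself stays allocated. */
void fake_srv_destroy(struct fake_srv *srv)
{
	srv->hw_ready = 0;
}

/* Releases the memory obtained from fake_srv_alloc(). */
void fake_srv_free(struct fake_srv *srv)
{
	if (!srv)
		return;
	free(srv);
	live_allocs--;
}

/* Teardown in the order the patch uses: destroy, free, clear pointer. */
void fake_dm_fini(struct fake_dm *dm)
{
	if (dm->srv) {
		fake_srv_destroy(dm->srv);
		fake_srv_free(dm->srv);
		dm->srv = NULL;
	}
}
```

Dropping the fake_srv_free() call from fake_dm_fini() would leave live_allocs at 1 after teardown, which is the userspace analogue of the kmemleak report above. Clearing the pointer afterwards makes a second teardown call harmless.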
Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"
On 04.06.24 at 20:28, Deucher, Alexander wrote:

[AMD Official Use Only - AMD Internal Distribution Only]

-----Original Message-----
From: Kuehling, Felix
Sent: Tuesday, June 4, 2024 2:25 PM
To: Armin Wolf; Deucher, Alexander; Koenig, Christian; Pan, Xinhui; gre...@linuxfoundation.org; sas...@kernel.org
Cc: sta...@vger.kernel.org; bkau...@gmail.com; Zhang, Yifan; Liang, Prike; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

On 2024-06-03 18:19, Armin Wolf wrote:

On 23.05.24 at 19:30, Armin Wolf wrote:

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Hi,

what is the status of this?

Which branch is this for? This patch won't apply to anything after Linux 6.5.

It's applicable to 5.15 stable only. The original patch caused a regression on 5.15 so probably should not have been applied there.

Alex

Correct, and I would be very grateful if this regression could be resolved in the near future.

The user already wrote a blog post about the whole issue, see here: https://bkhome.org/news/202405/kernel-amd-gpu-disaster-fixed.html

Thanks,
Armin Wolf

Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:

commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
Author: Alex Deucher
Date:   Fri Jul 28 12:20:12 2023 -0400

    drm/amdkfd: drop IOMMUv2 support

    Now that we use the dGPU path for all APUs, drop the
    IOMMUv2 support.

    v2: drop the now unused queue manager functions for gfx7/8 APUs

    Reviewed-by: Felix Kuehling
    Acked-by: Christian König
    Tested-by: Mike Lothian
    Signed-off-by: Alex Deucher

Regards,
Felix

Armin Wolf

Reported-by: Barry Kauler
Signed-off-by: Armin Wolf
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
-- 
2.39.2
Re: Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree
On 12.06.24 at 14:45, gre...@linuxfoundation.org wrote:

This is a note to let you know that I've just added the patch titled

    Revert "drm/amdgpu: init iommu after amdkfd device init"

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
and it can be found in the queue-5.15 subdirectory.

Thank you :)

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.

From w_ar...@gmx.de Wed Jun 12 14:43:21 2024
From: Armin Wolf
Date: Thu, 23 May 2024 19:30:31 +0200
Subject: Revert "drm/amdgpu: init iommu after amdkfd device init"
To: alexander.deuc...@amd.com, christian.koe...@amd.com, xinhui@amd.com, gre...@linuxfoundation.org, sas...@kernel.org
Cc: sta...@vger.kernel.org, bkau...@gmail.com, yifan1.zh...@amd.com, prike.li...@amd.com, dri-de...@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Message-ID: <20240523173031.4212-1-w_ar...@gmx.de>

From: Armin Wolf

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler
Signed-off-by: Armin Wolf
Link: https://lore.kernel.org/r/20240523173031.4212-1-w_ar...@gmx.de
Signed-off-by: Greg Kroah-Hartman
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:


Patches currently in stable-queue which might be from w_ar...@gmx.de are

queue-5.15/revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU
On 23.05.24 at 18:29, Greg KH wrote:

On Thu, May 23, 2024 at 05:59:39PM +0200, Armin Wolf wrote:

On 23.05.24 at 15:13, Barry Kauler wrote:

On Wed, May 22, 2024 at 12:58 AM Armin Wolf wrote:

On 20.05.24 at 18:22, Alex Deucher wrote:

On Sat, May 18, 2024 at 8:17 PM Armin Wolf wrote:

On 17.05.24 at 03:30, Barry Kauler wrote:

Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I posted the result to you guys, on May 9. It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled 5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up to you guys to fix it, Yifan especially, as you are the person who has created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers, so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first place. The IOMMU requirements have been dropped from the driver for the last few kernel versions so it is no longer relevant on newer kernels.

Alex

Barry, can you verify that the latest upstream kernel works on your device? If yes, then the commit itself is ok and just the backporting itself was wrong.

Thanks,
Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok.
I presume that patch was applied long before 6.8.1 got released and only got backported to 5.15.x recently.

Regards,
Barry

Great to hear, that means we only have to revert commit 56b522f46681 ("drm/amdgpu: init iommu after amdkfd device init") from the 5.15.y series.

I CCed the stable mailing list so that they can revert the offending commit.

Please submit the patch/revert that you wish to have applied to the tree so we can have the correct information in it. I have no idea what to do here with this deep response thread as-is, sorry.

thanks,

greg k-h

Hi,

the new 5.15.161 kernel finally contains the necessary patch (many thanks to the stable team :)).

Barry, can you test this kernel version and report if the issue is now gone?

Thanks,
Armin Wolf
Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU
On 17.05.24 at 03:30, Barry Kauler wrote:

Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I posted the result to you guys, on May 9. It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled 5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up to you guys to fix it, Yifan especially, as you are the person who has created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers, so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Armin Wolf

On Thu, May 9, 2024 at 4:08 PM Barry Kauler wrote:

On Fri, May 3, 2024 at 9:03 PM Armin Wolf wrote:

...
# lspci | grep VGA
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c2)
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
# lspci -n -k
...
05:00.0 0300: 1002:15d8 (rev c2)
	Subsystem: 1025:1456
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
...

thanks for informing us of this regression.

Since there are four commits affecting amdgpu in 5.15.150, I suggest that you use "git bisect" to find the faulty commits, see https://docs.kernel.org/admin-guide/bug-bisect.html for details.

I think you can speed up the bisecting process by limiting yourself to the AMD DRM driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the man page of "git bisect" for details.

Thanks,
Armin Wolf

Armin,
Thanks for the advice. I am unfamiliar with git on the command line. Previously I only used the SmartGit GUI.
EasyOS requires an aufs patch, and for a few days I tried to figure out how to use that with git bisect, then gave up. I changed to testing with my "QV" distro, which is more conventional and doesn't need any kernel patches.
I managed to get it down to one commit. Here are the steps I followed:

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
# cd linux-stable
# git tag -l | grep '5\.15\.150'
v5.15.150
# git checkout -b my5.15.150 v5.15.150
Updating files: 100% (65776/65776), done.
Switched to a new branch 'my5.15.150'

Copied in my .config then...

# make menuconfig
# git bisect start -- drivers/gpu/drm/amd
# git bisect bad
# git bisect good v5.15.149
Bisecting: 1 revision left to test after this (roughly 1 step)
[b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for s3 suspend abort case
# make
# rm -rf /boot2
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync

...QV on Acer laptop, with amdgpu, works!

# git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu after amdkfd device init
# make
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync

...QV on Acer laptop, black screen!

# git bisect bad
56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
commit 56b522f4668167096a50c39446d6263c96219f5f
Author: Yifan Zhang
Date:   Tue Sep 28 15:42:35 2021 +0800

    drm/amdgpu: init iommu after amdkfd device init

    [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]

    This patch is to fix clinfo failure in Raven/Picasso:

    Number of platforms: 1
      Platform Profile: FULL_PROFILE
      Platform Version: OpenCL 2.2 AMD-APP (3364.0)
      Platform Name: AMD Accelerated Parallel Processing
      Platform Vendor: Advanced Micro Devices, Inc.
      Platform Extensions: cl_khr_icd cl_amd_event_callback

      Platform Name: AMD Accelerated Parallel Processing
    Number of devices: 0

    Signed-off-by: Yifan Zhang
    Reviewed-by: James Zhu
    Tested-by: James Zhu
    Acked-by: Felix Kuehling
    Signed-off-by: Alex Deucher
    Signed-off-by: Sasha Levin

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Anything else I should do, to identify what in this commit is the likely culprit?

Regards,
Barry Kauler
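The search that "git bisect" performs in the log above can be approximated, for a linear history, by a plain binary search between a known-good and a known-bad revision. The sketch below is only that simplified model, not git's actual algorithm (git works on a commit graph and picks test points differently). The names first_bad, is_bad_fn and bad_from are hypothetical, and the sketch assumes that every revision after the first bad one is also bad.

```c
#include <stddef.h>

/* Returns nonzero when the revision at index i shows the regression. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/*
 * Binary search for the first bad revision in a linear history.
 * Preconditions: index `good` is known good, index `bad` is known bad,
 * good < bad, and badness is monotonic between them.
 * If `tests` is non-NULL it is incremented once per revision tested.
 */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx,
		 unsigned int *tests)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (tests)
			(*tests)++;
		if (is_bad(mid, ctx))
			bad = mid;	/* culprit is mid or earlier */
		else
			good = mid;	/* culprit is after mid */
	}
	return bad;
}

/* Hypothetical oracle: every index at or after *ctx is bad. */
int bad_from(size_t i, void *ctx)
{
	return i >= *(size_t *)ctx;
}
```

In this model each call to is_bad() corresponds to one build, install and boot cycle, which is why narrowing the candidate range (for example with a path restriction, as suggested in the thread) reduces the manual work.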
Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU
On 20.05.24 at 18:22, Alex Deucher wrote:

On Sat, May 18, 2024 at 8:17 PM Armin Wolf wrote:

On 17.05.24 at 03:30, Barry Kauler wrote:

Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I posted the result to you guys, on May 9. It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled 5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up to you guys to fix it, Yifan especially, as you are the person who has created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers, so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first place. The IOMMU requirements have been dropped from the driver for the last few kernel versions so it is no longer relevant on newer kernels.

Alex

Barry, can you verify that the latest upstream kernel works on your device? If yes, then the commit itself is ok and just the backporting itself was wrong.

Thanks,
Armin Wolf

Armin Wolf

On Thu, May 9, 2024 at 4:08 PM Barry Kauler wrote:

On Fri, May 3, 2024 at 9:03 PM Armin Wolf wrote:

...
# lspci | grep VGA
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c2)
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
# lspci -n -k
...
05:00.0 0300: 1002:15d8 (rev c2)
	Subsystem: 1025:1456
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
...
Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU
On 23.05.24 at 15:13, Barry Kauler wrote:

On Wed, May 22, 2024 at 12:58 AM Armin Wolf wrote:

On 20.05.24 at 18:22, Alex Deucher wrote:

On Sat, May 18, 2024 at 8:17 PM Armin Wolf wrote:

On 17.05.24 at 03:30, Barry Kauler wrote:

Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I posted the result to you guys, on May 9. It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled 5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up to you guys to fix it, Yifan especially, as you are the person who has created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers, so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first place. The IOMMU requirements have been dropped from the driver for the last few kernel versions so it is no longer relevant on newer kernels.

Alex

Barry, can you verify that the latest upstream kernel works on your device? If yes, then the commit itself is ok and just the backporting itself was wrong.

Thanks,
Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok.
I presume that patch was applied long before 6.8.1 got released and only got backported to 5.15.x recently.

Regards,
Barry

Great to hear, that means we only have to revert commit 56b522f46681 ("drm/amdgpu: init iommu after amdkfd device init") from the 5.15.y series.

I CCed the stable mailing list so that they can revert the offending commit.
Thanks,
Armin Wolf
[PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler
Signed-off-by: Armin Wolf
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
-- 
2.39.2
Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"
On 23.05.24 at 19:30, Armin Wolf wrote:

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Hi,

what is the status of this?

Armin Wolf

Reported-by: Barry Kauler
Signed-off-by: Armin Wolf
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
-- 
2.39.2
Re: [PATCH v2 02/10] sysfs: introduce callback attribute_group::bin_size
On 03.11.24 at 18:03, Thomas Weißschuh wrote:

Several drivers need to dynamically calculate the size of a binary
attribute. Currently this is done by assigning attr->size from the
is_bin_visible() callback.

Hi,

I really like your idea of introducing this new callback, it will be very useful for the wmi-bmof driver :).

Thanks,
Armin Wolf

This has drawbacks:
* It is not documented.
* A single attribute can be instantiated multiple times, overwriting the
  shared size field.
* It prevents the structure from being moved to read-only memory.

Introduce a new dedicated callback to calculate the size of the
attribute.

Signed-off-by: Thomas Weißschuh
---
 fs/sysfs/group.c      | 2 ++
 include/linux/sysfs.h | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 45b2e92941da1f49dcc71af3781317c61480c956..8b01a7eda5fb3239e138372417d01967c7a3f122 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -98,6 +98,8 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 				if (!mode)
 					continue;
 			}
+			if (grp->bin_size)
+				size = grp->bin_size(kobj, *bin_attr, i);
 
 			WARN(mode & ~(SYSFS_PREALLOC | 0664),
 			     "Attribute %s: Invalid permissions 0%o\n",
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c4e64dc112063f7cb89bf66059d0338716089e87..4746cccb95898b24df6f53de9421ea7649b5568f 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -87,6 +87,11 @@ do {	\
  *		SYSFS_GROUP_VISIBLE() when assigning this callback to
  *		specify separate _group_visible() and _attr_visible()
  *		handlers.
+ * @bin_size:
+ *		Optional: Function to return the size of a binary attribute
+ *		of the group. Will be called repeatedly for each binary
+ *		attribute in the group. Overwrites the size field embedded
+ *		inside the attribute itself.
  * @attrs:	Pointer to NULL terminated list of attributes.
  * @bin_attrs:	Pointer to NULL terminated list of binary attributes.
  *		Either attrs or bin_attrs or both must be provided.
@@ -97,6 +102,9 @@ struct attribute_group {
 					    struct attribute *, int);
 	umode_t			(*is_bin_visible)(struct kobject *,
 						  struct bin_attribute *, int);
+	size_t			(*bin_size)(struct kobject *,
+					    const struct bin_attribute *,
+					    int);
 	struct attribute	**attrs;
 	struct bin_attribute	**bin_attrs;
 };
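To make the motivation for the callback concrete, here is a minimal userspace model of the idea. It does not use the kernel's sysfs types: fake_bin_attr, fake_group, fake_device, effective_size and blob_size are hypothetical names, and only the shape of the callback (owner object, const attribute, index) follows the patch. The sketch shows how a per-instance size can be computed without ever writing to the shared attribute, which is what allows the attribute to stay const.

```c
#include <stddef.h>

/* Hypothetical model of a binary attribute with an embedded size. */
struct fake_bin_attr {
	const char *name;
	size_t size;	/* static size, shared by all instances */
};

/* Hypothetical model of an attribute group with an optional callback. */
struct fake_group {
	/* Optional: per-instance size; overrides the embedded size. */
	size_t (*bin_size)(void *owner, const struct fake_bin_attr *attr,
			   int index);
	const struct fake_bin_attr *const *bin_attrs;	/* NULL-terminated */
};

/* Mirrors the lookup the patch adds: embedded size unless overridden. */
size_t effective_size(const struct fake_group *grp, void *owner, int index)
{
	const struct fake_bin_attr *attr = grp->bin_attrs[index];
	size_t size = attr->size;

	if (grp->bin_size)
		size = grp->bin_size(owner, attr, index);
	return size;
}

/* Hypothetical device whose blob length is only known at runtime. */
struct fake_device {
	size_t blob_len;
};

size_t blob_size(void *owner, const struct fake_bin_attr *attr, int index)
{
	(void)attr;
	(void)index;
	return ((struct fake_device *)owner)->blob_len;
}

/* One shared, read-only attribute used by every device instance. */
const struct fake_bin_attr blob_attr = { "blob", 0 };
const struct fake_bin_attr *const blob_attrs[] = { &blob_attr, NULL };
const struct fake_group blob_group = { blob_size, blob_attrs };
const struct fake_group plain_group = { NULL, blob_attrs };
```

With this layout, two fake_device instances with different blob_len values report different sizes through blob_group while blob_attr.size is never modified. A group without the callback (plain_group) falls back to the embedded size.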
Re: amdgpu 4k@120Hz / HDMI 2.1
On 09.01.25 at 10:19, Mischa Baars wrote:

On Mon, Jan 6, 2025 at 4:30 AM Mario Limonciello wrote:

When new specifications are made available it's not like the old one suddenly becomes "open", so I don't see any reason that a new specification would change anything.

I paid about €3000 for my new PC, including €300 for the graphics card with HDMI 2.1 output and about €2000 for my new Samsung OLED TV with 4 HDMI 2.1 inputs, and now you are telling me that I will not be able to utilize them fully because the cable specification has not been made publicly available?

Did someone forget to pay the people that design the cables? Because that is what it sounds like. Why does Linux stay behind?

Sadly the HDMI forum only provides the HDMI specification under a special license which prohibits implementing it in open source drivers.

Since membership inside the HDMI forum costs 15000$ annually, I suspect that the HDMI forum is abusing its power to force people to join (and pay).

I can feel your disappointment, but there is nothing we can do which does not land us in court :(.

Thanks,
Armin Wolf

On Mon, Jan 6, 2025 at 4:41 PM Michel Dänzer wrote:

On 2024-12-31 13:42, Mischa Baars wrote:

In the meantime I also checked the framerate synchronization through glxgears at different resolutions and framerates. This does function as expected. Although I haven't yet inspected the glxgears source code in detail, the OpenGL double buffering must be functional up to some level. This means that the problem must be confined to GTK and the GtkGLArea widget. Using GDK_BACKEND=x11 I do get a double buffered context, but the default buffer does not alternate between GL_FRONT and GL_BACK.

Yeah, that's not how double-buffering works in GL. The draw buffer is always GL_BACK, SwapBuffers doesn't affect that (it just may internally change which actual buffer GL_BACK refers to).

I don't see more context about the issue you're investigating, any pointers?

-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
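Michel's point about double buffering can be pictured with a toy model. This is not OpenGL code and does not describe how any particular driver implements a swap (an implementation may copy instead of flipping). The names fake_surface, fake_draw, fake_swap and fake_displayed are hypothetical. The model only illustrates that the logical draw target can stay "back" across every swap while the physical buffer behind that name changes.

```c
/* Logical draw target, analogous in spirit to GL_FRONT / GL_BACK. */
enum draw_target { TARGET_FRONT, TARGET_BACK };

/* Toy surface: two physical buffers holding one "pixel" each. */
struct fake_surface {
	int storage[2];
	int back_index;		/* physical buffer currently acting as back */
	enum draw_target draw;	/* logical target; swaps never change it */
};

/* Writes a pixel to whichever physical buffer the logical target names. */
void fake_draw(struct fake_surface *s, int pixel)
{
	int idx = (s->draw == TARGET_BACK) ? s->back_index
					   : 1 - s->back_index;

	s->storage[idx] = pixel;
}

/* Flips the roles of the two physical buffers. */
void fake_swap(struct fake_surface *s)
{
	s->back_index = 1 - s->back_index;
}

/* The front buffer is what would be on screen. */
int fake_displayed(const struct fake_surface *s)
{
	return s->storage[1 - s->back_index];
}
```

In this model an application never needs to alternate its draw target between front and back. It keeps drawing to the back target and calls the swap, so a query of the current draw target returning the same value every frame is the expected behaviour rather than a sign of a problem.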