[PATCH] drm/amd/display: Fix memory leak in dm_sw_fini()

2024-02-13 Thread Armin Wolf
After destroying dmub_srv, the memory associated with it is
not freed, causing a memory leak:

unreferenced object 0x896302b45800 (size 1024):
  comm "(udev-worker)", pid 222, jiffies 4294894636
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace (crc 6265fd77):
[] kmalloc_trace+0x29d/0x340
[] dm_dmub_sw_init+0xb4/0x450 [amdgpu]
[] dm_sw_init+0x15/0x2b0 [amdgpu]
[] amdgpu_device_init+0x1417/0x24e0 [amdgpu]
[] amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
[] amdgpu_pci_probe+0x187/0x4e0 [amdgpu]
[] local_pci_probe+0x3e/0x90
[] pci_device_probe+0xc3/0x230
[] really_probe+0xe2/0x480
[] __driver_probe_device+0x78/0x160
[] driver_probe_device+0x1f/0x90
[] __driver_attach+0xce/0x1c0
[] bus_for_each_dev+0x70/0xc0
[] bus_add_driver+0x112/0x210
[] driver_register+0x55/0x100
[] do_one_initcall+0x41/0x300

Fix this by freeing dmub_srv after destroying it.

Fixes: 743b9786b14a ("drm/amd/display: Hook up the DMUB service in DM")
Signed-off-by: Armin Wolf 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 59d2eee72a32..9cbfc8d39dee 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2287,6 +2287,7 @@ static int dm_sw_fini(void *handle)

 	if (adev->dm.dmub_srv) {
 		dmub_srv_destroy(adev->dm.dmub_srv);
+		kfree(adev->dm.dmub_srv);
 		adev->dm.dmub_srv = NULL;
 	}

--
2.39.2
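
The leak described above follows a common pattern: a "destroy" call resets an object without releasing the allocation that holds it. The following is a minimal user-space sketch of that pattern, not the amdgpu code. The names fake_srv, fake_srv_create(), fake_srv_destroy(), fake_srv_free(), teardown() and the live_allocations counter are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the service object; not the real amdgpu type. */
struct fake_srv {
	int hw_ready;
};

/* Crude allocation counter so the sketch can observe a leak. */
static int live_allocations;

static struct fake_srv *fake_srv_create(void)
{
	struct fake_srv *srv = calloc(1, sizeof(*srv));

	if (srv) {
		srv->hw_ready = 1;
		live_allocations++;
	}
	return srv;
}

/* Resets the object state but, like the destroy call in the patch, frees nothing. */
static void fake_srv_destroy(struct fake_srv *srv)
{
	srv->hw_ready = 0;
}

/* Releases the allocation itself. */
static void fake_srv_free(struct fake_srv *srv)
{
	free(srv);
	live_allocations--;
}

/*
 * Teardown path. With also_free == 0 it models the code before the patch,
 * with also_free != 0 the code after it. Returns the number of allocations
 * that are still live afterwards.
 */
static int teardown(struct fake_srv **slot, int also_free)
{
	assert(slot != NULL);
	if (*slot) {
		fake_srv_destroy(*slot);
		if (also_free)
			fake_srv_free(*slot);
		*slot = NULL;
	}
	return live_allocations;
}
```

Calling teardown() with also_free set to 0 leaves the counter at 1 even though the pointer was cleared, which corresponds to the situation the leak report above describes.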



Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-11 Thread Armin Wolf

Am 04.06.24 um 20:28 schrieb Deucher, Alexander:




-Original Message-
From: Kuehling, Felix 
Sent: Tuesday, June 4, 2024 2:25 PM
To: Armin Wolf ; Deucher, Alexander
; Koenig, Christian
; Pan, Xinhui ;
gre...@linuxfoundation.org; sas...@kernel.org
Cc: sta...@vger.kernel.org; bkau...@gmail.com; Zhang, Yifan
; Liang, Prike ; dri-
de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
init"


On 2024-06-03 18:19, Armin Wolf wrote:

Am 23.05.24 um 19:30 schrieb Armin Wolf:


This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the
problematic commit and verified that by reverting it the notebook works
again.

He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Hi,

what is the status of this?

Which branch is this for? This patch won't apply to anything after Linux 6.5.

It's applicable to 5.15 stable only.  The original patch caused a regression on 
5.15 so probably should not have been applied there.

Alex


Correct, and I would be very grateful if this regression could be resolved in
the near future.
The user already wrote a blog post about the whole issue, see here:

https://bkhome.org/news/202405/kernel-amd-gpu-disaster-fixed.html

Thanks,
Armin Wolf


Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:

commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
Author: Alex Deucher 
Date:   Fri Jul 28 12:20:12 2023 -0400

  drm/amdkfd: drop IOMMUv2 support

  Now that we use the dGPU path for all APUs, drop the
  IOMMUv2 support.

  v2: drop the now unused queue manager functions for gfx7/8 APUs

  Reviewed-by: Felix Kuehling 
  Acked-by: Christian König 
  Tested-by: Mike Lothian 
  Signed-off-by: Alex Deucher 

Regards,
    Felix



Armin Wolf


Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
amdgpu_device *adev)
   if (r)
   goto init_failed;

+r = amdgpu_amdkfd_resume_iommu(adev);
+if (r)
+goto init_failed;
+
   r = amdgpu_device_ip_hw_init_phase1(adev);
   if (r)
   goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
amdgpu_device *adev)
   if (!adev->gmc.xgmi.pending_reset)
   amdgpu_amdkfd_device_init(adev);

-r = amdgpu_amdkfd_resume_iommu(adev);
-if (r)
-goto init_failed;
-
   amdgpu_fru_get_product_info(adev);

   init_failed:
--
2.39.2




Re: Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree

2024-06-14 Thread Armin Wolf

Am 12.06.24 um 14:45 schrieb gre...@linuxfoundation.org:


This is a note to let you know that I've just added the patch titled

 Revert "drm/amdgpu: init iommu after amdkfd device init"

to the 5.15-stable tree which can be found at:
 
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
  revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
and it can be found in the queue-5.15 subdirectory.


Thank you :)



If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


 From w_ar...@gmx.de  Wed Jun 12 14:43:21 2024
From: Armin Wolf 
Date: Thu, 23 May 2024 19:30:31 +0200
Subject: Revert "drm/amdgpu: init iommu after amdkfd device init"
To: alexander.deuc...@amd.com, christian.koe...@amd.com, xinhui@amd.com, 
gre...@linuxfoundation.org, sas...@kernel.org
Cc: sta...@vger.kernel.org, bkau...@gmail.com, yifan1.zh...@amd.com, 
prike.li...@amd.com, dri-de...@lists.freedesktop.org, 
amd-gfx@lists.freedesktop.org
Message-ID: <20240523173031.4212-1-w_ar...@gmx.de>

From: Armin Wolf 

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
Link: https://lore.kernel.org/r/20240523173031.4212-1-w_ar...@gmx.de
Signed-off-by: Greg Kroah-Hartman 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |8 
  1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
if (r)
goto init_failed;

+   r = amdgpu_amdkfd_resume_iommu(adev);
+   if (r)
+   goto init_failed;
+
r = amdgpu_device_ip_hw_init_phase1(adev);
if (r)
goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);

-   r = amdgpu_amdkfd_resume_iommu(adev);
-   if (r)
-   goto init_failed;
-
amdgpu_fru_get_product_info(adev);

  init_failed:


Patches currently in stable-queue which might be from w_ar...@gmx.de are

queue-5.15/revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch


Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-06-17 Thread Armin Wolf

Am 23.05.24 um 18:29 schrieb Greg KH:


On Thu, May 23, 2024 at 05:59:39PM +0200, Armin Wolf wrote:

Am 23.05.24 um 15:13 schrieb Barry Kauler:


On Wed, May 22, 2024 at 12:58 AM Armin Wolf  wrote:

Am 20.05.24 um 18:22 schrieb Alex Deucher:


On Sat, May 18, 2024 at 8:17 PM Armin Wolf  wrote:

Am 17.05.24 um 03:30 schrieb Barry Kauler:


Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I
posted the result to you guys, on May 9.
It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled
5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up
to you guys to fix it, Yifan especially, as you are the person who has
created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers,
so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that
other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first
place.  The IOMMU requirements have been dropped from the driver for
the last few kernel versions so it is no longer relevant on newer
kernels.

Alex

Barry, can you verify that the latest upstream kernel works on your device?
If yes, then the commit itself is ok and only the backport was wrong.

Thanks,
Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok.
I presume that patch was applied long before 6.8.1 got released and
only got backported to 5.15.x recently.

Regards,
Barry


Great to hear, that means we only have to revert commit 56b522f46681
("drm/amdgpu: init iommu after amdkfd device init") from the 5.15.y series.

I CCed the stable mailing list so that they can revert the offending commit.

Please submit the patch/revert that you wish to have applied to the tree
so we can have the correct information in it.  I have no idea what to do
here with this deep response thread as-is, sorry.

thanks,

greg k-h


Hi,

the new 5.15.161 kernel finally contains the necessary patch (many thanks to 
the stable team :)).

Barry, can you test this kernel version and report if the issue is now gone?

Thanks,
Armin Wolf



Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-05-19 Thread Armin Wolf

Am 17.05.24 um 03:30 schrieb Barry Kauler:


Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I
posted the result to you guys, on May 9.
It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled
5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up
to you guys to fix it, Yifan especially, as you are the person who has
created the regression.
I will attach my patch.
Regards,
Barry Kauler


Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers,
so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that
other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.
Armin Wolf



On Thu, May 9, 2024 at 4:08 PM Barry Kauler  wrote:

On Fri, May 3, 2024 at 9:03 PM Armin Wolf  wrote:

...
# lspci | grep VGA
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
Series] (rev c2)
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
[AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver

# lspci -n -k
...
05:00.0 0300: 1002:15d8 (rev c2)
Subsystem: 1025:1456
Kernel driver in use: amdgpu
Kernel modules: amdgpu
...

thanks for informing us of this regression. Since there are four commits affecting
amdgpu in 5.15.150, I suggest that you use "git bisect" to find the faulty commits,
see https://docs.kernel.org/admin-guide/bug-bisect.html for details.

I think you can speed up the bisecting process by limiting yourself to the AMD DRM
driver directory with "git bisect start -- drivers/gpu/drm/amd"; take a look at the
man page of "git bisect" for details.

Thanks,
Armin Wolf

Armin,
Thanks for the advice. I am unfamiliar with git on the commandline.
Previously only used SmartGit gui.
EasyOS requires aufs patch, and for a few days tried to figure out how
to use that with git bisect, then gave up. Changed to testing with my
"QV" distro, which is more conventional, doesn't need any kernel
patches. Managed to get it down to one commit. Here are the steps I
followed:

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
# cd linux-stable
# git tag -l | grep '5\.15\.150'
v5.15.150
# git checkout -b my5.15.150 v5.15.150
Updating files: 100% (65776/65776), done.
Switched to a new branch 'my5.15.150'

Copied in my .config then...

# make menuconfig
# git bisect start -- drivers/gpu/drm/amd
# git bisect bad
# git bisect good v5.15.149
Bisecting: 1 revision left to test after this (roughly 1 step)
[b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
s3 suspend abort case
# make
# rm -rf /boot2
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync
...QV on Acer laptop, with amdgpu, works!
# git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
after amdkfd device init
# make
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync
...QV on Acer laptop, black screen!

# git bisect bad
56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
commit 56b522f4668167096a50c39446d6263c96219f5f
Author: Yifan Zhang 
Date:   Tue Sep 28 15:42:35 2021 +0800

 drm/amdgpu: init iommu after amdkfd device init

 [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]

 This patch is to fix clinfo failure in Raven/Picasso:

 Number of platforms: 1
   Platform Profile: FULL_PROFILE
   Platform Version: OpenCL 2.2 AMD-APP (3364.0)
   Platform Name: AMD Accelerated Parallel Processing
   Platform Vendor: Advanced Micro Devices, Inc.
   Platform Extensions: cl_khr_icd cl_amd_event_callback

   Platform Name: AMD Accelerated Parallel Processing Number of devices: 0

 Signed-off-by: Yifan Zhang 
 Reviewed-by: James Zhu 
 Tested-by: James Zhu 
 Acked-by: Felix Kuehling 
 Signed-off-by: Alex Deucher 
 Signed-off-by: Sasha Levin 

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

Anything else I should do, to identify what in this commit is the
likely culprit?
Regards,
Barry Kauler
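
The bisection steps above amount to a binary search between a known-good and a known-bad revision. Below is a minimal sketch of that idea in C. It assumes a linear history indexed from oldest to newest, and it assumes that every commit after the first bad one is also bad. The real "git bisect" works on the commit graph and chooses its test points differently. The names first_bad_commit(), is_bad_fn and bad_from() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: returns nonzero when the commit at idx is bad. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Commits are indexed oldest to newest. "good" must be known good and "bad"
 * known bad, with good < bad. Returns the index of the first bad commit and
 * stores the number of builds that had to be tested in *steps.
 */
static size_t first_bad_commit(size_t good, size_t bad,
			       is_bad_fn is_bad, void *ctx, int *steps)
{
	assert(good < bad);
	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*steps)++;
		if (is_bad(mid, ctx))
			bad = mid;	/* regression is at mid or earlier */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;
}

/* Example hook: everything from *ctx onwards counts as bad. */
static int bad_from(size_t idx, void *ctx)
{
	size_t first_bad = *(size_t *)ctx;

	return idx >= first_bad;
}
```

With index 0 as the good release and index 4 as the bad one, two test builds are enough to single out the offending commit, in the same way the transcript above needed only two kernel builds.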


Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-05-22 Thread Armin Wolf

Am 20.05.24 um 18:22 schrieb Alex Deucher:


On Sat, May 18, 2024 at 8:17 PM Armin Wolf  wrote:

Am 17.05.24 um 03:30 schrieb Barry Kauler:


Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I
posted the result to you guys, on May 9.
It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled
5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up
to you guys to fix it, Yifan especially, as you are the person who has
created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers,
so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that
other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first
place.  The IOMMU requirements have been dropped from the driver for
the last few kernel versions so it is no longer relevant on newer
kernels.

Alex


Barry, can you verify that the latest upstream kernel works on your device?
If yes, then the commit itself is ok and only the backport was wrong.

Thanks,
Armin Wolf



Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-05-23 Thread Armin Wolf

Am 23.05.24 um 15:13 schrieb Barry Kauler:


On Wed, May 22, 2024 at 12:58 AM Armin Wolf  wrote:

Am 20.05.24 um 18:22 schrieb Alex Deucher:


On Sat, May 18, 2024 at 8:17 PM Armin Wolf  wrote:

Am 17.05.24 um 03:30 schrieb Barry Kauler:


Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I
posted the result to you guys, on May 9.
It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled
5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up
to you guys to fix it, Yifan especially, as you are the person who has
created the regression.
I will attach my patch.
Regards,
Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers,
so I hoped one of the amdgpu developers would handle this.

I CCed dri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that
other amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first
place.  The IOMMU requirements have been dropped from the driver for
the last few kernel versions so it is no longer relevant on newer
kernels.

Alex

Barry, can you verify that the latest upstream kernel works on your device?
If yes, then the commit itself is ok and only the backport was wrong.

Thanks,
Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok.
I presume that patch was applied long before 6.8.1 got released and
only got backported to 5.15.x recently.

Regards,
Barry


Great to hear, that means we only have to revert commit 56b522f46681
("drm/amdgpu: init iommu after amdkfd device init") from the 5.15.y series.

I CCed the stable mailing list so that they can revert the offending commit.

Thanks,
Armin Wolf



[PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-05-24 Thread Armin Wolf
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;

+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);

-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);

 init_failed:
--
2.39.2
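
The revert only changes the order in which existing calls run, which can be hard to see in a diff. The following user-space sketch records the call order for both variants. It is a simplified model, not driver code: the enum values merely mirror the function names visible in the diff, error handling is left out, and init_log, record(), run_init() and position_of() are hypothetical helpers.

```c
#include <assert.h>

/* Step names that mirror the calls visible in the diff above. */
enum step {
	STEP_RESUME_IOMMU,
	STEP_HW_INIT_PHASE1,
	STEP_KFD_DEVICE_INIT,
};

struct init_log {
	enum step order[3];
	int count;
};

static void record(struct init_log *log, enum step s)
{
	assert(log->count < 3);
	log->order[log->count++] = s;
}

/*
 * iommu_early != 0 models the 5.15 tree with the revert applied,
 * iommu_early == 0 models the tree with the backported commit.
 */
static void run_init(struct init_log *log, int iommu_early)
{
	log->count = 0;
	if (iommu_early)
		record(log, STEP_RESUME_IOMMU);
	record(log, STEP_HW_INIT_PHASE1);
	record(log, STEP_KFD_DEVICE_INIT);
	if (!iommu_early)
		record(log, STEP_RESUME_IOMMU);
}

/* Position of a step in the recorded sequence, or -1 if it never ran. */
static int position_of(const struct init_log *log, enum step s)
{
	for (int i = 0; i < log->count; i++)
		if (log->order[i] == s)
			return i;
	return -1;
}
```

With the revert applied, the IOMMU step is recorded first, before the phase 1 hardware init. With the backported commit it is recorded last, after the amdkfd device init.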



Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-04 Thread Armin Wolf

Am 23.05.24 um 19:30 schrieb Armin Wolf:


This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.


Hi,

what is the status of this?

Armin Wolf



Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
if (r)
goto init_failed;

+   r = amdgpu_amdkfd_resume_iommu(adev);
+   if (r)
+   goto init_failed;
+
r = amdgpu_device_ip_hw_init_phase1(adev);
if (r)
goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);

-   r = amdgpu_amdkfd_resume_iommu(adev);
-   if (r)
-   goto init_failed;
-
amdgpu_fru_get_product_info(adev);

  init_failed:
--
2.39.2




Re: [PATCH v2 02/10] sysfs: introduce callback attribute_group::bin_size

2024-11-06 Thread Armin Wolf

Am 03.11.24 um 18:03 schrieb Thomas Weißschuh:


Several drivers need to dynamically calculate the size of a binary
attribute. Currently this is done by assigning attr->size from the
is_bin_visible() callback.


Hi,

I really like your idea of introducing this new callback; it will be very
useful for the wmi-bmof driver :).

Thanks,
Armin Wolf



This has drawbacks:
* It is not documented.
* A single attribute can be instantiated multiple times, overwriting the
   shared size field.
* It prevents the structure from being moved to read-only memory.

Introduce a new dedicated callback to calculate the size of the
attribute.

Signed-off-by: Thomas Weißschuh 
---
  fs/sysfs/group.c  | 2 ++
  include/linux/sysfs.h | 8 
  2 files changed, 10 insertions(+)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 45b2e92941da1f49dcc71af3781317c61480c956..8b01a7eda5fb3239e138372417d01967c7a3f122 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -98,6 +98,8 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
if (!mode)
continue;
}
+   if (grp->bin_size)
+   size = grp->bin_size(kobj, *bin_attr, i);

WARN(mode & ~(SYSFS_PREALLOC | 0664),
 "Attribute %s: Invalid permissions 0%o\n",
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c4e64dc112063f7cb89bf66059d0338716089e87..4746cccb95898b24df6f53de9421ea7649b5568f 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -87,6 +87,11 @@ do { \
   *SYSFS_GROUP_VISIBLE() when assigning this callback to
   *specify separate _group_visible() and _attr_visible()
   *handlers.
+ * @bin_size:
+ * Optional: Function to return the size of a binary attribute
+ * of the group. Will be called repeatedly for each binary
+ * attribute in the group. Overwrites the size field embedded
+ * inside the attribute itself.
   * @attrs:Pointer to NULL terminated list of attributes.
   * @bin_attrs:Pointer to NULL terminated list of binary attributes.
   *Either attrs or bin_attrs or both must be provided.
@@ -97,6 +102,9 @@ struct attribute_group {
  struct attribute *, int);
umode_t (*is_bin_visible)(struct kobject *,
  struct bin_attribute *, int);
+   size_t  (*bin_size)(struct kobject *,
+   const struct bin_attribute *,
+   int);
struct attribute**attrs;
struct bin_attribute**bin_attrs;
  };
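
To illustrate why a per-group callback avoids the shared size field problem, here is a small user-space model of the lookup. The struct definitions are heavily simplified stand-ins, not the kernel ones. The sketch also assumes that the size falls back to the value stored in the attribute when no callback is set. effective_size() and demo_bin_size() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins; these are not the kernel definitions. */
struct kobject {
	size_t payload_len;	/* hypothetical per-device data length */
};

struct bin_attribute {
	const char *name;
	size_t size;
};

struct attribute_group {
	size_t (*bin_size)(struct kobject *,
			   const struct bin_attribute *,
			   int);
	struct bin_attribute **bin_attrs;
};

/* Models the two added lines in create_files(): the callback overrides attr->size. */
static size_t effective_size(const struct attribute_group *grp,
			     struct kobject *kobj, int i)
{
	const struct bin_attribute *attr = grp->bin_attrs[i];
	size_t size;

	assert(attr != NULL);
	size = attr->size;
	if (grp->bin_size)
		size = grp->bin_size(kobj, attr, i);
	return size;
}

/* Hypothetical driver callback: the size depends on the device, not the attribute. */
static size_t demo_bin_size(struct kobject *kobj,
			    const struct bin_attribute *attr, int i)
{
	(void)attr;
	(void)i;
	return kobj->payload_len;
}
```

Because the callback receives the kobject, two devices that share one attribute definition can report different sizes, and the attribute itself is never written to, so it could live in read-only memory.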





Re: amdgpu 4k@120Hz / HDMI 2.1

2025-01-10 Thread Armin Wolf

Am 09.01.25 um 10:19 schrieb Mischa Baars:

On Mon, Jan 6, 2025 at 4:30 AM Mario Limonciello
 wrote:


When new specifications are made available it's not like the old one
suddenly becomes "open", so I don't see any reason that a new
specification would change anything.

I paid about €3000 for my new PC, including €300 for the graphics card
with HDMI 2.1 output and about €2000 for my new Samsung OLED TV with 4
HDMI 2.1 inputs, and now you are telling me that I will not be able to
utilize them fully because the cable specification has not been made
publicly available?

Did someone forget to pay the people that design the cables? Because
that is what it sounds like. Why does Linux stay behind?


Sadly the HDMI Forum only provides the HDMI specification under a special license
which prohibits implementing it in open source drivers.

Since membership in the HDMI Forum costs $15,000 annually, I suspect that the HDMI
Forum is abusing its power to force people to join (and pay).

I can feel your disappointment, but there is nothing we can do that would not land
us in court :(.

Thanks,
Armin Wolf


On Mon, Jan 6, 2025 at 4:41 PM Michel Dänzer  wrote:

On 2024-12-31 13:42, Mischa Baars wrote:

In the meantime I also checked the framerate synchronization through
glxgears at different resolutions and framerates. This does function
as expected. Although I haven't yet inspected the glxgears source
codes in detail, the OpenGL double buffering must be functional up to
some level. This means that the problem must be confined to GTK and
the GtkGLArea widget. Using GDK_BACKEND=x11 I do get a double buffered
context, but the default buffer does not alternate between GL_FRONT
and GL_BACK.

Yeah, that's not how double-buffering works in GL. The draw buffer is always 
GL_BACK, SwapBuffers doesn't affect that (it just may internally change which 
actual buffer GL_BACK refers to).
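
The point about GL_BACK can be pictured with a toy model. This is not OpenGL code and it calls no GL function. It only models the case where a swap exchanges the roles of two physical buffers, which is one possible implementation. The names surface, draw(), swap_buffers() and displayed() are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a double-buffered surface; not an OpenGL API. */
struct surface {
	int pixels[2];	/* two physical buffers, one int each for brevity */
	int back;	/* index of the buffer currently acting as "back" */
};

/* Drawing always targets the back buffer, whichever physical buffer that is. */
static void draw(struct surface *s, int value)
{
	s->pixels[s->back] = value;
}

/* A swap changes which physical buffer "back" names; the draw target stays "back". */
static void swap_buffers(struct surface *s)
{
	assert(s->back == 0 || s->back == 1);
	s->back ^= 1;
}

/* What is shown on screen: the buffer that is not the back buffer. */
static int displayed(const struct surface *s)
{
	return s->pixels[s->back ^ 1];
}
```

In this model the application never selects the front buffer. It keeps drawing to "back" and the swap makes the result visible.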

I don't see more context about the issue you're investigating, any pointers?


--
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast