Hi, it has been quite a while since I first started experiencing the bug I am about to describe. Suffice it to say that during my Easter holiday I finally had the time to dig into it. It all started with an update of the Linux LTS kernel from 6.6 to 6.12.
I use the sway tiling window manager and have written a small utility to manage my display configuration across different setups. As an added twist, I wrote some code that determines which monitor input is currently in use via the monitor's command interface (DDC/CI). The interesting detail is that, starting with kernel 6.12, I ran into the following problem. With my display management daemon running, attaching my laptop to an external display would freeze the internal display. There was no way to bring it back short of power cycling the entire device. Without the daemon running this did not happen, but then I had to configure my display setup manually. Further investigation into what triggers the freeze led me to the part of the code that enumerates attached displays and tries to match `i2c` devices to their corresponding display.
More specifically, the procedure is as follows. Using udev, enumerate all `i2c` buses and filter them based on heuristics such as the device name and whether the device has a DRM/graphics device as a parent. Sadly, this alone is not enough to match an `i2c` command interface to its monitor. In many cases it is necessary to read the EDID manually via the `i2c` interface and compare it against the known attached displays to find the match. This is where the trigger for the display freeze lies.
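For reference, the EDID read described above is the standard DDC access: select slave address 0x50 on the bus, write the offset byte 0, then read the 128-byte base block. Below is a minimal C sketch of that step using the kernel's i2c-dev interface. It assumes a `/dev/i2c-N` node for the bus and only compares base blocks. The helper names (`read_edid_i2c`, `edid_block_valid`, `edid_same_display`) are hypothetical, and this is not the libmonitor implementation. On an affected kernel, pointing it at the eDP panel's `ddc` bus is presumably the same kind of access that triggers the freeze.

```c
#include <fcntl.h>
#include <linux/i2c-dev.h> /* I2C_SLAVE */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EDID_BLOCK_LEN 128
#define EDID_I2C_ADDR 0x50 /* standard DDC slave address for EDID */

/* Checks the fixed 8-byte EDID header and that all 128 bytes sum to 0 mod 256. */
int edid_block_valid(const uint8_t block[EDID_BLOCK_LEN])
{
	static const uint8_t header[8] = {0x00, 0xff, 0xff, 0xff,
					  0xff, 0xff, 0xff, 0x00};
	uint8_t sum = 0;

	if (memcmp(block, header, sizeof(header)) != 0)
		return 0;
	for (int i = 0; i < EDID_BLOCK_LEN; i++)
		sum += block[i];
	return sum == 0;
}

/* Reads the base EDID block from an i2c-dev node such as /dev/i2c-3.
 * Returns 0 on success and -1 on any failure (open, ioctl, write or read). */
int read_edid_i2c(const char *dev_path, uint8_t out[EDID_BLOCK_LEN])
{
	uint8_t offset = 0;
	int fd = open(dev_path, O_RDWR);

	if (fd < 0)
		return -1;
	if (ioctl(fd, I2C_SLAVE, EDID_I2C_ADDR) < 0 ||
	    write(fd, &offset, 1) != 1 ||
	    read(fd, out, EDID_BLOCK_LEN) != EDID_BLOCK_LEN) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

/* Simplified match: a valid block read over i2c equals a known display's
 * base EDID block, e.g. the first 128 bytes of the connector's sysfs `edid`. */
int edid_same_display(const uint8_t from_i2c[EDID_BLOCK_LEN],
		      const uint8_t known[EDID_BLOCK_LEN])
{
	return edid_block_valid(from_i2c) &&
	       memcmp(from_i2c, known, EDID_BLOCK_LEN) == 0;
}
```

Matching only the base block keeps the sketch short. A real matcher might also compare extension blocks or fall back to other identifiers.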
Here is the output of listing the sysfs directory of my internal laptop display:
```
# ls -al /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0/drm/card1/card1-eDP-1
total 0
drwxr-xr-x  6 root root    0 22. Apr 18:07 .
drwxr-xr-x 11 root root    0 22. Apr 18:07 ..
drwxr-xr-x  3 root root    0 22. Apr 18:07 amdgpu_bl1
-r--r--r--  1 root root 4096 22. Apr 18:07 connector_id
lrwxrwxrwx  1 root root    0 22. Apr 18:07 ddc -> ../../../i2c-3
lrwxrwxrwx  1 root root    0 22. Apr 18:07 device -> ../../card1
-r--r--r--  1 root root 4096 22. Apr 18:07 dpms
drwxr-xr-x  3 root root    0 22. Apr 18:07 drm_dp_aux0
-r--r--r--  1 root root    0 22. Apr 18:07 edid
-r--r--r--  1 root root 4096 22. Apr 18:07 enabled
drwxr-xr-x  4 root root    0 22. Apr 18:07 i2c-11
-r--r--r--  1 root root 4096 22. Apr 18:07 modes
drwxr-xr-x  2 root root    0 22. Apr 18:07 power
-rw-r--r--  1 root root 4096 22. Apr 18:07 status
lrwxrwxrwx  1 root root    0 22. Apr 18:07 subsystem -> ../../../../../../../class/drm
-rw-r--r--  1 root root 4096 22. Apr 18:07 uevent
```
As can be seen, two i2c devices are present: i2c-3 (via the `ddc` symlink) and i2c-11. From udev's perspective, i2c-11 has card1-eDP-1 as its parent, while i2c-3 has the DRM device itself as its parent.

More importantly, I cannot rule out i2c-3 as a valid command interface. In some cases valid command channels are never assigned to the corresponding display output and only live directly on the DRM device. This is especially true when monitors are attached via a docking station rather than directly. So I have to examine each i2c device on its own.

The freeze is triggered by trying to read the EDID from i2c-3. This is the code snippet I used to trigger the bug, with the target i2c interface adjusted accordingly: https://github.com/ju6ge/libmonitor/blob/918b2543eafb96aca29f66debc70fd18fa21ee11/examples/via-i2c-dev.rs

To be absolutely clear, i2c-3 is not an i2c device that is expected to work. In every attempt, with every kernel from 6.6 to 6.12, I get the following error message: `DdcError(CommunicationError(ReceiveError(EIO: I/O error)))`. That is expected, since internal laptop displays usually do not support the command interface anyway. What I do not expect is for my laptop screen to freeze! Since this did not happen with kernel 6.6 but started happening with 6.12, it appears to be a software issue and therefore a regression!
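The two candidate buses in the listing can also be told apart from plain sysfs. The `ddc` symlink points to the bus that hangs off the DRM device, while `i2c-*` entries below the connector are the connector's own buses. This is a minimal C sketch under two assumptions: it reads sysfs directly instead of going through udev as the daemon does, and the function names `connector_ddc_bus` and `connector_child_buses` are hypothetical.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SYSFS_PATH_LEN 4096
#define BUS_NAME_LEN 64

/* Resolves the connector's `ddc` symlink and copies its last path component
 * (e.g. "i2c-3") into `out`. Returns 0 on success, -1 if there is no link. */
int connector_ddc_bus(const char *connector_dir, char *out, size_t out_len)
{
	char link_path[SYSFS_PATH_LEN];
	char target[SYSFS_PATH_LEN];
	const char *base;
	ssize_t n;

	snprintf(link_path, sizeof(link_path), "%s/ddc", connector_dir);
	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0'; /* readlink() does not NUL-terminate */
	base = strrchr(target, '/');
	snprintf(out, out_len, "%s", base ? base + 1 : target);
	return 0;
}

/* Collects entries named "i2c-*" directly below the connector directory
 * (e.g. "i2c-11"). Returns the number found, or -1 if the directory
 * cannot be opened. */
int connector_child_buses(const char *connector_dir,
			  char names[][BUS_NAME_LEN], int max_names)
{
	DIR *dir = opendir(connector_dir);
	struct dirent *entry;
	int count = 0;

	if (!dir)
		return -1;
	while (count < max_names && (entry = readdir(dir)) != NULL) {
		if (strncmp(entry->d_name, "i2c-", 4) == 0) {
			snprintf(names[count], BUS_NAME_LEN, "%s", entry->d_name);
			count++;
		}
	}
	closedir(dir);
	return count;
}
```

For the card1-eDP-1 directory shown above, this should report `i2c-3` for `ddc` and `i2c-11` as the connector's child bus. As described, neither result alone rules a bus out as a command interface.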
Next, I bisected the kernel between 6.6 and 6.12 to determine which commit introduced this regression. The full bisect log is attached to this email ;)
The offending commit seems to be: [58a261bfc96763a851cb48b203ed57da37e157b8] drm/amd/display: use a more lax vblank enable policy for older ASICs
Since this is quite a small commit, I validated the result by reverting it on a newer kernel version (patch attached as well). Testing shows that reverting the change does resolve the screen freezing for me.
I am not deep enough into graphics drivers to claim that simply reverting the commit should be considered a valid fix. I can only say that the change is definitely responsible for the screen now freezing, as opposed to before.
So what should be done here? I can test any other suggested fixes on my setup or provide more information if needed.
Kind regards,
Felix Richter

#regzbot introduced: v6.6..v6.12
git bisect start
# status: waiting for both good and bad commits
# good: [ffc253263a1375a65fa6c9f62a893e9767fbebfa] Linux 6.6
git bisect good ffc253263a1375a65fa6c9f62a893e9767fbebfa
# bad: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect bad adc218676eef25575469234709c2d87185ca223a
# good: [7ee04901215b3cab8fa35aa5bf4692d7aa312e36] Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel
git bisect good 7ee04901215b3cab8fa35aa5bf4692d7aa312e36
# good: [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup guard
git bisect good 280e36f0d5b997173d014c07484c03a7f7750668
# good: [26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4] Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux
git bisect good 26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4
# bad: [431844b65f4c1b988ccd886f2ed29c138f7bb262] sched_ext: Provide a sysfs enable_seq counter
git bisect bad 431844b65f4c1b988ccd886f2ed29c138f7bb262
# good: [3a7101e9b27fe97240c2fd430c71e61262447dd1] Merge tag 'powerpc-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 3a7101e9b27fe97240c2fd430c71e61262447dd1
# bad: [ae2c6d8b3b88c176dff92028941a4023f1b4cb91] Merge tag 'drm-xe-next-fixes-2024-09-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
git bisect bad ae2c6d8b3b88c176dff92028941a4023f1b4cb91
# good: [34bb7b813ab398106f700b0a6b218509bb0b904c] drm/xe: Use xe_pm_runtime_get in xe_bo_move() if reclaim-safe.
git bisect good 34bb7b813ab398106f700b0a6b218509bb0b904c
# good: [988bfa0bc67d7220ff8d9e2ba3a425727aa98af3] drm/amd/display: Make core_dcn4_g6_temp_read_blackout_table static
git bisect good 988bfa0bc67d7220ff8d9e2ba3a425727aa98af3
# bad: [2bb3fc536d692d43cd55396ecff73c7691eeae85] Merge drm/drm-next into drm-intel-next
git bisect bad 2bb3fc536d692d43cd55396ecff73c7691eeae85
# good: [4461e9e5c374f8c11fee8e4a0e3290b072cfd538] Merge v6.11-rc5 into drm-next
git bisect good 4461e9e5c374f8c11fee8e4a0e3290b072cfd538
# good: [21bb04152a18ac2314ef4186b6dcd46f1b847354] drm/i915/dsb: Convert dewake_scanline to a hw scanline number earlier
git bisect good 21bb04152a18ac2314ef4186b6dcd46f1b847354
# bad: [b290af0500f09577ad40b9f716d551fd65ceff25] drm/tegra: hub: Use fn parameter directly to fix Coccinelle warning
git bisect bad b290af0500f09577ad40b9f716d551fd65ceff25
# bad: [51394119f640423858a2f04076d6f1c3e83fa715] drm/panel-edp: add BOE NE140WUM-N6G panel entry
git bisect bad 51394119f640423858a2f04076d6f1c3e83fa715
# good: [e45b6716de4bf06b628a9f3559f7fc8dd5e94d58] drm/amd/display: use a more lax vblank enable policy for DCN35+
git bisect good e45b6716de4bf06b628a9f3559f7fc8dd5e94d58
# bad: [e794b7b9b92977365c693760a259f8eef940c536] drm: omapdrm: Add missing check for alloc_ordered_workqueue
git bisect bad e794b7b9b92977365c693760a259f8eef940c536
# skip: [6729c73103bd7a0e60b0c980b51b5434010b4502] drm/ttm: fix kernel-doc typo for @trylock_only
git bisect skip 6729c73103bd7a0e60b0c980b51b5434010b4502
# bad: [58a261bfc96763a851cb48b203ed57da37e157b8] drm/amd/display: use a more lax vblank enable policy for older ASICs
git bisect bad 58a261bfc96763a851cb48b203ed57da37e157b8
# first bad commit: [58a261bfc96763a851cb48b203ed57da37e157b8] drm/amd/display: use a more lax vblank enable policy for older ASICs
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8488,10 +8488,11 @@

 	if (acrtc_state) {
 		if (amdgpu_ip_version(adev, DCE_HWIP, 0) <
-		    IP_VERSION(3, 5, 0) ||
-		    acrtc_state->stream->link->psr_settings.psr_version <
-		    DC_PSR_VERSION_UNSUPPORTED ||
-		    !(adev->flags & AMD_IS_APU)) {
+		    IP_VERSION(3, 5, 0)) {
+			drm_crtc_vblank_on(&acrtc->base);
+		} else if (acrtc_state->stream->link->psr_settings.psr_version <
+			   DC_PSR_VERSION_UNSUPPORTED ||
+			   !(adev->flags & AMD_IS_APU)) {
 			timing = &acrtc_state->stream->timing;

 			/* at least 2 frames */
@@ -8501,12 +8502,14 @@
 						      timing->pix_clk_100hz);

 			config.offdelay_ms = offdelay ?: 30;
+			drm_crtc_vblank_on_config(&acrtc->base,
+						  &config);
 		} else {
 			config.disable_immediate = true;
+			drm_crtc_vblank_on_config(&acrtc->base,
+						  &config);
 		}

-		drm_crtc_vblank_on_config(&acrtc->base,
-					  &config);
 	} else {
 		drm_crtc_vblank_off(&acrtc->base);
 	}
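The following small userspace model makes the behavioral difference in the revert hunk explicit. All enum, function and parameter names are hypothetical. `dcn_before_3_5` stands for the `IP_VERSION(3, 5, 0)` comparison and `psr_supported` stands for `psr_version < DC_PSR_VERSION_UNSUPPORTED`. With the commit applied, ASICs older than DCN 3.5 take the `drm_crtc_vblank_on_config()` path with a short off-delay. After the revert they go back to plain `drm_crtc_vblank_on()`, which uses DRM's default off-delay. The off-delay helper is a sketch, not the driver code. It assumes that the lines elided from the hunk compute "at least 2 frames" in milliseconds from the stream timing, with `pix_clk_100hz` being the pixel clock in units of 100 Hz. Under that assumption, one frame lasts `10 * h_total * v_total / pix_clk_100hz` ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three branches in the hunk. */
enum vblank_policy {
	VBLANK_ON_DEFAULT,           /* plain drm_crtc_vblank_on() */
	VBLANK_ON_OFFDELAY,          /* ..._on_config() with config.offdelay_ms */
	VBLANK_ON_DISABLE_IMMEDIATE, /* ..._on_config() with disable_immediate */
};

/* Branch selection with commit 58a261bfc967 applied (the "-" lines). */
enum vblank_policy policy_with_commit(bool dcn_before_3_5, bool psr_supported,
				      bool is_apu)
{
	if (dcn_before_3_5 || psr_supported || !is_apu)
		return VBLANK_ON_OFFDELAY;
	return VBLANK_ON_DISABLE_IMMEDIATE;
}

/* Branch selection after the revert (the "+" lines). */
enum vblank_policy policy_reverted(bool dcn_before_3_5, bool psr_supported,
				   bool is_apu)
{
	if (dcn_before_3_5)
		return VBLANK_ON_DEFAULT;
	if (psr_supported || !is_apu)
		return VBLANK_ON_OFFDELAY;
	return VBLANK_ON_DISABLE_IMMEDIATE;
}

/* Two frame periods in ms, rounded up: 20 * h_total * v_total / pix_clk_100hz.
 * Falls back to 30 ms like `offdelay ?: 30` (and avoids dividing by zero). */
uint32_t two_frame_offdelay_ms(uint32_t h_total, uint32_t v_total,
			       uint32_t pix_clk_100hz)
{
	uint64_t num, ms;

	if (pix_clk_100hz == 0)
		return 30;
	num = 20ULL * v_total * h_total;
	ms = (num + pix_clk_100hz - 1) / pix_clk_100hz;
	return ms ? (uint32_t)ms : 30;
}
```

Take a typical 1080p60 timing (h_total 2200, v_total 1125, 148.5 MHz pixel clock, i.e. `pix_clk_100hz` = 1485000). For it the helper yields 34 ms, so with the commit vblank interrupts on older ASICs may be switched off after a few frames instead of after DRM's default delay. The model does not establish why that interacts with the failing i2c transfer.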