On 7/17/2025 2:42 PM, Felix Richter wrote:
Hi,
I just tested, and this bug still exists in kernel version 6.16-rc6. The
example trigger from my previous mail still causes the screen to freeze
shortly after invocation.
I also learned that setting the kernel parameter `amdgpu.dcdebugmask=0x10`
works as a workaround.
Kind regards,
Felix Richter
On 4/22/25 21:44, Felix Richter wrote:
Hi,
It has been quite a while since I first started experiencing the bug I
am about to describe; suffice it to say that during my Easter holiday I
finally had the time to dig into it. It all started with an update of
the Linux LTS kernel from 6.6 to 6.12.
I use the sway tiling window manager and have written a small utility
to manage my display configuration across different setups, with the
added twist that it determines which monitor input is currently in use
via the monitor command interface. The interesting detail is that,
starting with kernel 6.12, I ran into the following problem: with my
display management daemon running, attaching my laptop to an external
display would make the internal display freeze, with no way to bring it
back short of power cycling the entire device. When the daemon was not
running this did not happen, but I then had to configure my display
setup manually. Investigating what triggers the display freeze led me
to the part of the code where I enumerate attached displays and try to
match `i2c` devices to their corresponding display.
More specifically, the procedure is as follows: using udev, enumerate
all `i2c` buses and filter them based on heuristics such as the device
name and whether the parent device is a DRM/graphics device. Sadly,
this is not enough to match an `i2c` command interface to the
corresponding monitor; in many cases the EDID has to be read manually
over the `i2c` interface and compared with the known attached displays
to find the match. And this is where the trigger for the display freeze
is to be found.
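The matching step described above, comparing an EDID read over `i2c` against the displays already known, can be sketched as follows. This is a minimal illustration, not the code from libmonitor: the helper names (`is_valid_base_block`, `manufacturer_id`, `match_display`) are hypothetical, and the sketch assumes the known EDIDs come from some other source (for example the connectors' sysfs `edid` files) and that comparing the 128-byte base block is enough to identify a display.

```python
from __future__ import annotations

# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_LEN = 128


def is_valid_base_block(edid: bytes) -> bool:
    """True if the first 128 bytes carry the EDID header and a valid checksum."""
    if len(edid) < EDID_BLOCK_LEN:
        return False
    block = edid[:EDID_BLOCK_LEN]
    # All 128 bytes, including the checksum byte, must sum to 0 modulo 256.
    return block[:8] == EDID_HEADER and sum(block) % 256 == 0


def manufacturer_id(edid: bytes) -> str:
    """Decode the three-letter PNP ID packed into bytes 8-9 (5 bits per letter, 1 == 'A')."""
    word = (edid[8] << 8) | edid[9]
    return "".join(chr(((word >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0))


def match_display(edid_from_i2c: bytes, known: dict[str, bytes]) -> str | None:
    """Return the name of the known display whose base block equals the one read over I2C."""
    if not is_valid_base_block(edid_from_i2c):
        return None  # garbage or a failed read: never guess a match
    block = edid_from_i2c[:EDID_BLOCK_LEN]
    for name, known_edid in known.items():
        if known_edid[:EDID_BLOCK_LEN] == block:
            return name
    return None
```

Validating the header and checksum first means a short or corrupted read is rejected instead of being compared byte by byte.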
Here is the output when scanning sysfs for my internal laptop display:
```
# ls -al /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0/drm/card1/card1-eDP-1
total 0
drwxr-xr-x 6 root root 0 22. Apr 18:07 .
drwxr-xr-x 11 root root 0 22. Apr 18:07 ..
drwxr-xr-x 3 root root 0 22. Apr 18:07 amdgpu_bl1
-r--r--r-- 1 root root 4096 22. Apr 18:07 connector_id
lrwxrwxrwx 1 root root 0 22. Apr 18:07 ddc -> ../../../i2c-3
lrwxrwxrwx 1 root root 0 22. Apr 18:07 device -> ../../card1
-r--r--r-- 1 root root 4096 22. Apr 18:07 dpms
drwxr-xr-x 3 root root 0 22. Apr 18:07 drm_dp_aux0
-r--r--r-- 1 root root 0 22. Apr 18:07 edid
-r--r--r-- 1 root root 4096 22. Apr 18:07 enabled
drwxr-xr-x 4 root root 0 22. Apr 18:07 i2c-11
-r--r--r-- 1 root root 4096 22. Apr 18:07 modes
drwxr-xr-x 2 root root 0 22. Apr 18:07 power
-rw-r--r-- 1 root root 4096 22. Apr 18:07 status
lrwxrwxrwx 1 root root 0 22. Apr 18:07 subsystem -> ../../../../../../../class/drm
-rw-r--r-- 1 root root 4096 22. Apr 18:07 uevent
```
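The two buses in this listing can also be collected programmatically. A minimal sketch, assuming the connector layout shown above (a `ddc` symlink plus `i2c-*` subdirectories); the function name `connector_i2c_buses` and the labels it returns are hypothetical, and a real tool would likely go through udev rather than walking sysfs by hand:

```python
from __future__ import annotations

import os
from pathlib import Path


def connector_i2c_buses(connector_dir: Path) -> dict[str, str]:
    """Map each I2C adapter name found in a DRM connector directory to how it is linked."""
    buses: dict[str, str] = {}
    ddc = connector_dir / "ddc"
    if ddc.is_symlink():
        # `ddc -> ../../../i2c-3`: the adapter itself lives outside the connector directory.
        buses[Path(os.readlink(ddc)).name] = "ddc"
    for child in sorted(connector_dir.glob("i2c-*")):
        # Real subdirectories such as `i2c-11` are adapters parented to the connector.
        if child.is_dir() and not child.is_symlink():
            buses[child.name] = "child"
    return buses
```

On the machine above, `connector_i2c_buses(Path("/sys/class/drm/card1-eDP-1"))` would be expected to report `i2c-3` as `ddc` and `i2c-11` as `child`; both still have to be probed, for the reasons given below.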
As can be seen, there are two i2c devices present: i2c-3 (as the `ddc`
symlink) and i2c-11. From the perspective of udev, i2c-11 has its parent
set to card1-eDP-1, while i2c-3 has its parent set to the DRM device
itself. More importantly, I cannot rule out i2c-3 as a valid command
interface, because in some cases valid command channels are never
assigned to the corresponding display output but live directly on the
DRM device; this is especially true when monitors are attached via a
docking station rather than directly. So I have to look at each i2c
device on its own. The freeze is triggered by trying to read the EDID
from i2c-3. This is the code snippet I used to trigger the bug (with the
target i2c interface adjusted accordingly):
https://github.com/ju6ge/libmonitor/blob/918b2543eafb96aca29f66debc70fd18fa21ee11/examples/via-i2c-dev.rs
To be absolutely clear: this is not an i2c device that is expected to
work. In every attempt with kernels 6.6 through 6.12 I get the following
error message: DdcError(CommunicationError(ReceiveError(EIO: I/O error))).
That is expected; internal laptop displays do not support the command
interface in most cases anyway. What I do not expect is that my laptop
screen freezes! And since this did not happen with kernel 6.6 but
started happening with 6.12, this seems to be a software issue and, with
that, a regression!
Next I bisected the kernel from 6.6 to 6.12 to determine when this
regression was introduced. I attached the full bisect log to the email ;)
The offending commit seems to be:
[58a261bfc96763a851cb48b203ed57da37e157b8] drm/amd/display: use a more
lax vblank enable policy for older ASICs
Since this is quite a small commit, I validated it by reverting the
change on a newer kernel version (patch attached as well). Testing shows
that reverting the change resolves the screen freezing behavior for me.
Now, I am not deep enough into graphics drivers to claim that simply
reverting the commit should be considered a valid fix, only that this
change is definitely responsible for the screen freezing that now
occurs and did not before.
So what should be done here? I can validate any other suggested fixes
against my setup or provide more information if need be.
Kind regards,
Felix Richter
#regzbot introduced: v6.6..v6.12
At least to me, this issue sounds like a case of multiple entities
trying to communicate with the panel at the same time.
By setting dcdebugmask=0x10, what you're essentially doing is stopping
the display hardware from trying to put the panel into PSR. So there is
"less" I2C traffic to fight with.
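To check whether a running system has this workaround active, one can look for the parameter on the kernel command line. A minimal sketch, assuming the parameter is passed as `amdgpu.dcdebugmask=...` in `/proc/cmdline` (a value set through modprobe configuration would not show up there) and that a later occurrence overrides an earlier one; the function names are hypothetical, and the meaning of bit 0x10 is taken from the explanation above:

```python
from __future__ import annotations

# Per the explanation above, bit 0x10 of amdgpu.dcdebugmask keeps the display hardware out of PSR.
PSR_DISABLE_BIT = 0x10


def dcdebugmask_from_cmdline(cmdline: str) -> int | None:
    """Return the amdgpu.dcdebugmask value from a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "amdgpu.dcdebugmask":
            # Base 0 accepts both "0x10" and "16"; a later occurrence replaces an earlier one.
            value = int(raw, 0)
    return value


def psr_disabled_via_cmdline(path: str = "/proc/cmdline") -> bool:
    """True if the command line stored at `path` sets the PSR-disable bit of dcdebugmask."""
    with open(path) as f:
        mask = dcdebugmask_from_cmdline(f.read())
    return mask is not None and bool(mask & PSR_DISABLE_BIT)
```

Testing the single bit rather than comparing against exactly `0x10` keeps the check correct when other debug bits are set at the same time.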
*Why* are you using I2C to read the EDID like this? Could you instead
use /sys/class/drm/cardX-inputY/edid? Or even better - can you use the
information from drm_info to make decisions?
I think the less I2C traffic done directly from userspace, the better
when it comes to synchronization issues.
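Reading the EDID the kernel has already cached avoids issuing any I2C transfers from userspace. A minimal sketch of the sysfs route suggested above, assuming connector directories under `/sys/class/drm` follow the `card<N>-<connector>` naming seen earlier in the thread (e.g. `card1-eDP-1`) and that connectors without a display expose an empty `edid` file; `sysfs_edids` is a hypothetical helper name:

```python
from __future__ import annotations

from pathlib import Path


def sysfs_edids(drm_root: Path = Path("/sys/class/drm")) -> dict[str, bytes]:
    """Collect the EDID the kernel cached for each connector, keyed by connector name."""
    edids: dict[str, bytes] = {}
    # Connector entries look like `card1-eDP-1`; plain `card1` or `renderD128` do not match.
    for edid_file in sorted(drm_root.glob("card*-*/edid")):
        data = edid_file.read_bytes()
        if data:  # an empty file means no EDID is available (e.g. nothing connected)
            edids[edid_file.parent.name] = data
    return edids
```

This only identifies connectors and their displays; it does not by itself tell which `i2c` bus carries the command interface, but it removes the need to read EDID over I2C just to learn which displays are attached.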