The latest upstream firmware is stable, so I reverted to 1.197 in order to
test only the picasso* files.
Before doing that, I decided to run for a while with pristine
linux-firmware 1.197 to double-check that the bug happens quickly.
Unexpectedly, 1.197 is now reliable too! I have been running it for about
half a day (which is more than what was possible before) and it is fine.
The only thing that changed was that flatpak's org.freedesktop.Platform and
org.freedesktop.Platform.GL.default were updated; I'm not sure whether that
was yesterday or the day before.
This is relevant because I use Firefox from flathub.
I suspect that the instability comes from the combination of
linux-firmware 1.197 + a particular version of some userspace component
(Mesa, I guess) that was in org.freedesktop.Platform{,.GL.default}.
I'll try reverting the flatpak update to see if I can get back to the
unstable state to confirm the hypothesis.
--
https://bugs.launchpad.net/bugs/1928393
Title:
linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
retry page fault"
Status in linux-firmware package in Ubuntu:
Incomplete
Bug description:
After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the
upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent
and severe GPU instability. When this happens, I see this error in
dmesg:
[20061.061069] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
[20061.061103] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800000401000 from client 27
[20061.061135] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[20061.061147] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[20061.061157] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[20061.061167] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[20061.061174] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[20061.061183] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[20061.061189] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
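For anyone cross-checking the log, the decoded fields can be reproduced from the raw status word 0x00101031. This is only a minimal Python sketch: the bit layout in FIELDS is my assumption about the GFX9 VM_L2_PROTECTION_FAULT_STATUS register (it is not stated anywhere in this report), and decode_fault_status is a made-up helper name. The layout is at least consistent with the values the kernel printed above.

```python
# Assumed GFX9 layout of VM_L2_PROTECTION_FAULT_STATUS as (name, shift, width).
# This table is an assumption for illustration, not taken from the bug report.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),   # UTCL2 client ID (0x8 is reported as TCP in the log)
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status):
    """Split the raw 32-bit status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


decoded = decode_fault_status(0x00101031)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

If the assumed layout is right, CID comes out as 0x8 and VMID as 0x1, matching the "TCP (0x8)" and "vmid:1" entries in the dmesg output.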
I'll attach a couple of full dmesgs that I collected.
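To compare how often the fault occurs between firmware versions, a saved dmesg can be scanned for the fault header line. The following is a rough Python sketch, assuming each kernel message sits on a single unwrapped line in the saved file; FAULT_RE, find_retry_faults and SAMPLE are illustrative names of mine, and the regular expression is modelled only on the one message quoted above, so other kernels may format it differently.

```python
import re

# Pattern modelled on the single fault header quoted in this report.
FAULT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] amdgpu \S+ amdgpu: \[(?P<hub>\w+)\] retry page fault"
    r".*?for process (?P<proc>\S+) pid (?P<pid>\d+)"
)


def find_retry_faults(dmesg_text):
    """Return (timestamp, hub, process, pid) for every retry page fault line."""
    faults = []
    for line in dmesg_text.splitlines():
        m = FAULT_RE.match(line)
        if m:
            faults.append((float(m["ts"]), m["hub"], m["proc"], int(m["pid"])))
    return faults


# Sample input: the header line from this report plus one unrelated line.
SAMPLE = (
    "[20061.061069] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault "
    "(src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 "
    "thread Xorg:cs0 pid 1236)\n"
    "[20061.061157] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1\n"
)

faults = find_retry_faults(SAMPLE)
print(len(faults), "retry page fault(s) found")
```

Running it over the output of "dmesg" saved to a file (read with open(path).read()) would give a count and the first timestamp, which is a quick way to tell whether a given firmware/flatpak combination has hit the bug yet.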
Often when this happens, the screen and keyboard freeze irreversibly
(I tried waiting for more than 30 minutes, but it doesn't help). I can
still log in via ssh though. When there's no freeze, I can continue
using the computer normally, but the laptop fans are always running
and the battery depletes fast. There's probably something stuck in a
permanent loop either in the kernel or in the GPU.
This bug happens several times a day, rendering the machine so
unstable as to be almost unusable. It is a severe regression and I'm
aghast that it passed AMD's Quality Assurance.
After downgrading back to linux-firmware 1.190.5, the machine is back
to the previous, mostly-reliable state. Which is to say, this bug is
gone and I'm just left with the other amdgpu suspend bug I've learned
to live with since I bought this computer.
Please revert the amdgpu firmware in this package as soon as possible.
This is unbearable.
Relevant information:
Ubuntu version: 21.04
Linux kernel: 5.11.0-17-generic x86_64
CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c1)
Laptop model: Lenovo Ideapad S145