On Saturday, May 15, 2021, at 11:24:17 -03, Thiago Jung Bauermann
wrote:
> Unexpectedly, 1.197 is now reliable too! I have been running it for about
> half a day (which is more than what was possible before) and it is fine.
After 4 days of stability I just had the retry page fault problem again,
with stock linux-firmware 1.197 and kernel 5.11.0-17-generic.
> The only thing that changed was that flatpak's org.freedesktop.Platform
> and org.freedesktop.Platform.GL.default were updated, not sure if
> yesterday or the day before.
>
> This is relevant because I use Firefox from flathub.
>
> I'm suspecting that the instability comes from the combination of
> linux-firmware 1.197 + a particular version of some userspace component
> (Mesa I guess) that was in org.freedesktop.Platform{,.GL.default}.
So apparently the flatpak update made the problem less likely to happen,
but it still happens.
> I'll try reverting the flatpak update to see if I can get back to the
> unstable state to confirm the hypothesis.
Life got in the way and I wasn't able to do this yet.
With it taking 4 days to reproduce the problem with the current stack, I
think it still makes sense to revert the flatpak update.
Unfortunately "flatpak history" seems to be broken on my system:
$ flatpak history
error: appstream2/x86_64 is not application or runtime
I'll try using ostree directly. I don't think I'll be able to do it today,
but hopefully within the next couple of days.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/1928393
Title:
linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
retry page fault"
Status in linux-firmware package in Ubuntu:
Incomplete
Bug description:
After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the
upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent
and severe GPU instability. When this happens, I see this error in
dmesg:
[20061.061069] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
[20061.061103] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800000401000 from client 27
[20061.061135] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[20061.061147] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[20061.061157] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[20061.061167] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[20061.061174] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[20061.061183] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[20061.061189] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
I'll attach a couple of full dmesgs that I collected.
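The fields that the kernel prints after VM_L2_PROTECTION_FAULT_STATUS are bit fields of that one status word, so they can be recomputed from the raw value when triaging other dmesg captures. The sketch below is only illustrative: the bit positions are an assumption (taken to match the gfx9 register layout) and have been checked only against the values shown in the excerpt above. The names FIELDS and decode_fault_status are hypothetical helpers, not part of any kernel or amdgpu tool.

```python
# Minimal sketch: split a VM_L2_PROTECTION_FAULT_STATUS word into the
# fields that amdgpu prints in dmesg.
#
# ASSUMPTION: the (shift, width) pairs below follow the gfx9 register
# layout. They are verified here only against the single log excerpt in
# this report (status 0x00101031), not against other GPU generations.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed by the kernel as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict:
    """Return a mapping of field name to value extracted from the status word."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Status value copied from the dmesg excerpt in this bug report.
    for name, value in decode_fault_status(0x00101031).items():
        print(f"{name}: {hex(value)}")
```

For the status word in this report, the decoded values agree with the lines the kernel printed: MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0, client ID 0x8, RW 0x0.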
Many of the times when this happens, the screen and keyboard freeze
irreversibly (I tried waiting for more than 30 minutes, but it doesn't
help). I can still log in via ssh though. When there's no freeze, I
can continue using the computer normally, but the laptop fans are
always running and the battery depletes fast. There's
probably something on a permanent loop either in the kernel or in the
GPU.
This bug happens several times a day, rendering the machine so
unstable as to be almost unusable. It is a severe regression and I'm
aghast that it passed AMD's Quality Assurance.
After downgrading back to linux-firmware 1.190.5, the machine is back
to the previous, mostly-reliable state. Which is to say, this bug is
gone and I'm just left with the other amdgpu suspend bug I've learned
to live with since I bought this computer.
Please revert the amdgpu firmware in this package as soon as possible.
This is unbearable.
Relevant information:
Ubuntu version: 21.04
Linux kernel: 5.11.0-17-generic x86_64
CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Picasso (rev c1)
Laptop model: Lenovo Ideapad S145