Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-24 Thread Mikhail Gavrilov
On Mon, Aug 5, 2024 at 11:05 PM Mikhail Gavrilov wrote: > > Hi, > After commit 1b04dcca4fb1, launching some RenPy games causes computer hang. > After the hang, even Alt + sysrq + REISUB can't reboot the computer! > And no trace in the kernel log! > For demonstration, I'm

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-02 Thread Mikhail Gavrilov
On Sun, Aug 25, 2024 at 2:12 AM Mikhail Gavrilov wrote: > > Hi, > Is anyone trying to look into it? > I continue to reproduce this issue on fresh kernel builds 6.11-rc4+. > In addition to the RenPy engine, the problem also reproduces on games > from Ubisoft, such as Far Cry 4.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
On Wed, Sep 4, 2024 at 4:15 AM Leo Li wrote: > Hi Mike, > > Super sorry for the ridiculous wait. Your first two emails slipped by my > inbox, > which is really silly, given I'm first in the to field... > > Thanks for bisecting and finding a free game to reproduce it on. I did not > have > luck r

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
tch was definitely not enough. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-08 Thread Mikhail Gavrilov
On Sat, Sep 7, 2024 at 12:47 AM Leo Li wrote: > > > Hi Mikhail, > > I've tried to align my system with yours as best as I can, but so far, I've > had > no luck reproducing the hang. A video of what I'm doing: > https://youtu.be/VeD-LPCnfWM?si=b2baF8MyDBuU4jRH > (Under the hood, the W7900 and 7900

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Mikhail Gavrilov
tely hangs without any messages in kernel logs. On Wed, Sep 11, 2024 at 2:11 AM Leo Li wrote: > > Hi Mikhail, > > Can you give this patch a try to see if it helps? > https://gist.github.com/leeonadoh/3271e90ec95d768424c572c970ada743 > Thanks, with this patch, the issue is not r

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-05 Thread Mikhail Gavrilov
On Sun, May 26, 2024 at 7:06 PM Mikhail Gavrilov wrote: > > Hi, > Day before yesterday I replaced 7900XTX to 6900XT for got clear in > which kernel first time appeared warning message "DMA-API: amdgpu > :0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't

Re: 6.10/regression/bisected - commit a68c7eaa7a8f cause *ERROR* Trying to clear memory with ring turned off in amdgpu_fill_buffer.

2024-06-09 Thread Mikhail Gavrilov
On Fri, May 17, 2024 at 8:59 PM Mikhail Gavrilov wrote: > > Thanks, Arun. > With the patch this error did not appear anymore. > Tested-by: Mikhail Gavrilov on 7900XTX > hardware. > I see that this patch do the same but more correctly: https://gitlab.freedesktop.org

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-09 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 6:39 PM Alex Deucher wrote: > > --- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > @@ -944,7 +944,7 @@ void optc1_set_drr( > OTG_V_TOTAL_MAX_SEL, 1, >

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-10 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 5:29 PM Linux regression tracking (Thorsten Leemhuis) wrote: > > [CCing the other amd drm maintainers] > > Mikhail: are those details in any way relevant? Then in the future best > leave them out (or make things easier to follow), they make the bug > report confusing and sou

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-21 Thread Mikhail Gavrilov
On Fri, Jun 21, 2024 at 12:56 PM Linux regression tracking (Thorsten Leemhuis) wrote: > Hmmm, I might have missed something, but it looks like nothing happened > here since then. What's the status? Is the issue still happening? Yes. Tested on e5b3efbe1ab1. I spotted that the problem disappears a

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-29 Thread Mikhail Gavrilov
On Sat, Jun 29, 2024 at 9:46 PM Rodrigo Siqueira Jordao wrote: > Hi Mikhail, > > I'm trying to reproduce this issue, but until now, I've been unable to > reproduce it. I tried some different scenarios with the following > components: > > 1. Displays: I tried with one and two displays > - 4k@120

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
On Tue, Jul 9, 2024 at 7:48 PM Rodrigo Siqueira Jordao wrote: > Hi, > > I also tried it with 6900XT. I got the same results on my side. This is weird. > Anyway, I could not reproduce the issue with the below components. I may > be missing something that will trigger this bug; in this sense, coul

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
On Wed, Jul 10, 2024 at 12:01 PM Mikhail Gavrilov wrote: > > On Tue, Jul 9, 2024 at 7:48 PM Rodrigo Siqueira Jordao > wrote: > > Hi, > > > > I also tried it with 6900XT. I got the same results on my side. > > This is weird. > > > Anyway, I could not rep

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-16 Thread Mikhail Gavrilov
On Tue, Jul 16, 2024 at 10:10 PM Alex Deucher wrote: > > Does the attached partial revert fix it? > > Alex > Yes, thanks. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-24 Thread Mikhail Gavrilov
On Tue, Jul 23, 2024 at 2:34 AM Alex Deucher wrote: > Do either of these patches help? > https://patchwork.freedesktop.org/patch/605437/ Unfortunately, this patch didn't help. Please see the attached kernel log. > https://patchwork.freedesktop.org/patch/605201/ For which kernel is this patch int

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-08-02 Thread Mikhail Gavrilov
On Wed, Jul 24, 2024 at 10:16 PM Mikhail Gavrilov wrote: > > https://patchwork.freedesktop.org/patch/605201/ > For which kernel is this patch intended? The patch is not applied on > top of d67978318827. I am able to apply this patch on top of e4fc196f5ba3 and the issue is gone

6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-05 Thread Mikhail Gavrilov
Hi, After commit 1b04dcca4fb1, launching some RenPy games causes computer hang. After the hang, even Alt + sysrq + REISUB can't reboot the computer! And no trace in the kernel log! For demonstration, I'm going to use the game "Find the Orange Narwhal" because it is free and has 100% reproducibility f

Re: 6.10/regression/bisected - commit a68c7eaa7a8f cause *ERROR* Trying to clear memory with ring turned off in amdgpu_fill_buffer.

2024-05-17 Thread Mikhail Gavrilov
Thanks, Arun. With the patch this error did not appear anymore. Tested-by: Mikhail Gavrilov on 7900XTX hardware. -- Best Regards, Mike Gavrilov.

6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-05-26 Thread Mikhail Gavrilov
Hi, Day before yesterday I replaced 7900XTX to 6900XT for got clear in which kernel first time appeared warning message "DMA-API: amdgpu :0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported". The kernel 6.3 and older won't boot on a computer with Radeon 7900XTX. When I boo

regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-14 Thread Mikhail Gavrilov
Hi, Yesterday came the 6.7-rc1 kernel. And surprisingly it turned out it is not working with my LG C3. I use this OLED TV as my primary monitor. After login to GNOME I see a horizontal flashing bar with a picture of the desktop background on white screen. Demonstration: https://youtu.be/7F76VfRkrVo

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Tue, Nov 14, 2023 at 11:03 PM Mikhail Gavrilov wrote: > > On Tue, Nov 14, 2023 at 3:55 PM Mikhail Gavrilov > wrote: > > > > Hi, > > Yesterday came the 6.7-rc1 kernel. > > And surprisingly it turned out it is not working with my LG C3. > > I use this O

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Wed, Nov 15, 2023 at 11:14 PM Hamza Mahfooz wrote: > > What version of DMUB firmware are you on? > The easiest way to find out would be using the following: > > # dmesg | grep DMUB > Sapphire AMD Radeon RX 7900 XTX PULSE OC: ❯ dmesg | grep DMUB [ 14.341362] [drm] Loading DMUB firmware via PS

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Wed, Nov 15, 2023 at 11:39 PM Lee, Alvin wrote: > > This change has a DMCUB dependency - are you able to update your DMCUB > version as well? > I can confirm this issue was gone after updating firmware. ❯ dmesg | grep DMUB [ 11.496679] [drm] Loading DMUB firmware via PSP: version=0x0700230
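The `dmesg | grep DMUB` check in this thread can also be scripted. The following is only a sketch, assuming the log line has the form shown in the snippets above (`Loading DMUB firmware via PSP: version=0x...`); the function name `dmub_versions` and the sample line are illustrative and not taken from the thread. The sample version value is the "old fw" version mentioned later in the same thread.

```python
import re

# Assumed log format, based on the truncated snippets in this thread:
#   "[drm] Loading DMUB firmware via PSP: version=0x..."
DMUB_RE = re.compile(r"Loading DMUB firmware via PSP: version=(0x[0-9A-Fa-f]+)")


def dmub_versions(log_text):
    """Return every DMUB firmware version found in the log text, as integers."""
    return [int(match.group(1), 16) for match in DMUB_RE.finditer(log_text)]


# Illustrative line only (hypothetical, not copied from a real log).
sample = "[drm] Loading DMUB firmware via PSP: version=0x07002100\n"

for version in dmub_versions(sample):
    print(f"DMUB firmware version: {version:#010x}")
```

Feeding it the output of `dmesg` (for example, read from standard input) would give the same information as the grep, but in a form that can be compared numerically before and after a firmware update.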

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-17 Thread Mikhail Gavrilov
he first one patch is enough. Tested-on: 7900XTX, 6900XT and 6800M. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-15 Thread Mikhail Gavrilov
On Tue, Feb 28, 2023 at 5:43 PM Christian König wrote: > > The point is it doesn't need to talk to the amdgpu hardware. What it > does is that it talks to the good old VGA/VESA emulation and that just > happens to be still enabled by the BIOS/GRUB. > > And that VGA/VESA emulation doesn't need any

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-12-18 Thread Mikhail Gavrilov
On Fri, Dec 15, 2023 at 9:14 PM Hamza Mahfooz wrote: > > Can you try the following patch with old fw (version 0x07002100 should > be fine)?: https://patchwork.freedesktop.org/patch/572298/ > Tested-by: Mikhail Gavrilov on 7900XTX hardware. Can I ask? What does SubVP actually d

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-19 Thread Mikhail Gavrilov
On Fri, Dec 15, 2023 at 5:37 PM Christian König wrote: > > I have no idea :) > > From the logs I can see that the AMDGPU now has the proper BARs assigned: > > [5.722015] pci :03:00.0: [1002:73df] type 00 class 0x038000 > [5.722051] pci :03:00.0: reg 0x10: [mem > 0xf8-0xfbf

Re: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities

2024-01-24 Thread Mikhail Gavrilov
On Wed, Jan 24, 2024 at 7:19 AM Mikhail Gavrilov wrote: > > Who could dig into it, please? You decided to revert it? https://lkml.org/lkml/2024/1/22/1866 Also I forgot to attach the kernel build .config in the previous message. I'm going to fix it here. It may be useful for reprodu

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-09-14 Thread Mikhail Gavrilov
On Wed, 14 Apr 2021 at 11:48, Christian König < ckoenig.leichtzumer...@gmail.com> wrote: > > That is expected behavior, the application is just buggy and causing a > page fault on the GPU. > > The kernel should just not crash with a backtrace. > > Regards, > Christian. > If after it GPU hangs wit

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-09-15 Thread Mikhail Gavrilov
On Wed, 15 Sept 2021 at 14:55, Christian König wrote: > > Yes, absolutely. You should see GPU resets and recovery in the system log > after that. Unfortunately, not one DE will survive a GPU reset. All applications will terminate abnormally; in fact, this would be equivalent to a reboot (and denial

Unexpected multihop in swaput - likely driver bug.

2021-04-07 Thread Mikhail Gavrilov
Hi! During the 5.12 testing cycle I observed a repeatable bug when launching heavy graphics applications. The kernel log is flooded with the message "Unexpected multihop in swaput - likely driver bug.". Trace: [ 8707.814899] ------------[ cut here ]------------ [ 8707.814920] Unexpected multihop

Re: Unexpected multihop in swaput - likely driver bug.

2021-04-07 Thread Mikhail Gavrilov
On Wed, 7 Apr 2021 at 15:46, Christian König wrote: > > What hardware are you using $ inxi -bM System:Host: fedora Kernel: 5.12.0-0.rc6.184.fc35.x86_64+debug x86_64 bits: 64 Desktop: GNOME 40.0 Distro: Fedora release 35 (Rawhide) Machine: Type: Desktop Mobo: ASUSTeK model: ROG ST

[BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-12 Thread Mikhail Gavrilov
Video demonstration: https://youtu.be/3nkvUeB0GSw How the kernel traces look: 1. [ 7315.156460] amdgpu :0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread obs:cs0 pid 23977) [ 7315.156490] amdgpu :0b:00.0: amdgpu: in page starting at a

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Mikhail Gavrilov
On Tue, 13 Apr 2021 at 12:29, Christian König wrote: > > Hi Mikhail, > > the crash is a known issue and should be fixed by: > > commit f63da9ae7584280582cbc834b20cc18bfb203b14 > Author: Philip Yang > Date: Thu Apr 1 00:22:23 2021 -0400 > > drm/amdgpu: reserve fence slot to update page tabl

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Mikhail Gavrilov
On Tue, 13 Apr 2021 at 04:55, Leo Liu wrote: > > >It curious why ffmpeg does not cause such issues. > >For example such command not cause kernel panic: > >$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf > >'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec > >h264

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-14 Thread Mikhail Gavrilov
On Wed, 14 Apr 2021 at 03:22, Leo Liu wrote: > > This is decode command line, are you seeing issue with encode or > decode? I meant that the kernel panic described above happens only when OBS records or streams video with the VAAPI encoder. Grabbing and encoding video with ffmpeg (given command exa

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-20 Thread Mikhail Gavrilov
On Wed, 14 Apr 2021 at 11:48, Christian König wrote: > > >> commit f63da9ae7584280582cbc834b20cc18bfb203b14 > >> Author: Philip Yang > >> Date: Thu Apr 1 00:22:23 2021 -0400 > >> > >> drm/amdgpu: reserve fence slot to update page table > >> > > That is expected behavior, the application i

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-21 Thread Mikhail Gavrilov
On Wed, 21 Apr 2021 at 11:42, Christian König wrote: > I can try, but I'm not sure if we even have the full page fault handling > for Navi in 5.12. > It would be great. For me this patch is working as expected and I already for several days didn't see the panic "kernel BUG at drivers/dma-buf/dma-

[bugreport] [5.10] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww) We 'forgot' to unlock everything else first?

2020-10-17 Thread Mikhail Gavrilov
Hi folks. I have observed this issue since 5.3 and it still happens with 5.10 git. This warning is 100% reproducible when I launch "Wolfenstein: Youngblood"; the version of Mesa doesn't matter. [73690.883948] ------------[ cut here ]------------ [73690.883953] DEBUG_LOCKS_WARN_ON(ww_ctx->contend

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
Hi Christian, On Tue, 12 Jan 2021 at 01:45, Christian König wrote: > > Hi Mike, > > Unfortunately not, that's DC stuff. Easiest is to assign this as a bug > tracker to our DC team. Ok > At least some progress. Any objections that I add your e-mail address as > tested-by tag? Yes, feel free add m

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-13 Thread Mikhail Gavrilov
On Tue, 12 Jan 2021 at 01:45, Christian König wrote: > > But what you have in your logs so far are only unrelated symptoms, the > root of the problem is that somebody is leaking memory. > > What you could do as well is to try to enable kmemleak I captured some memleaks. Do they contain any useful

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-14 Thread Mikhail Gavrilov
On Thu, 14 Jan 2021 at 18:56, Christian König wrote: > Unfortunately not of hand. > > I also don't see any bug reports from other people and can't reproduce > the last backtrace you send out TTM here. Because only the most desperate will install kernels with enabled debug flags and then load the

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-19 Thread Mikhail Gavrilov
On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov wrote: > In rc4, the number of warnings has dropped dramatically. No more errors "kasan slab-out-of-bounds" and no "DMA-API device driver failed to check map error". But still not fixed "sleeping function called from inva

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-24 Thread Mikhail Gavrilov
On Thu, 21 Jan 2021 at 18:27, Christian König wrote: > > I still have no idea what's going on here. > > The KASAN messages from the DC code are completely unrelated. > > Please add the full dmesg to your bug report. > I did it. https://gitlab.freedesktop.org/drm/amd/-/issues/1439#note_776267 --

[bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]

2021-01-30 Thread Mikhail Gavrilov
The 5.11-rc5 (git 76c057c84d28) brought a new issue. Now the kernel log is flooded with the message "page allocation failure". Trace: msedge:cs0: page allocation failure: order:10, mode:0x190cc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 18 PID: 4540

Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]

2021-02-06 Thread Mikhail Gavrilov
On Sun, 31 Jan 2021 at 22:22, Christian König wrote: > > > Yeah, known issue. I already pushed Michel's fix to drm-misc-fixes. > Should land in the next -rc by the weekend. > > Regards, > Christian. I checked this patch [1] for several days. And I can confirm that the reported issue was gone. [1

Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]

2021-02-09 Thread Mikhail Gavrilov
On Mon, 8 Feb 2021 at 14:18, Christian König wrote: > > Are the other problems gone as well? > Yes and no. The issue with the monitor turning off was gone after rc6 (git 3aaf0a27ffc2). But both traces 1) BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 (kernel 5.11 s

[bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure

2020-11-03 Thread Mikhail Gavrilov
Hi folks. I observed a hard-to-reproduce set of bugs. It always starts as 1) kworker/u64:2: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 It continues as: 2) WARNING: CPU: 21 PID: 806649 at drivers/gpu/drm/amd/amdgpu/../d

Re: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

2020-12-27 Thread Mikhail Gavrilov
On Sun, 27 Dec 2020 at 21:39, Mikhail Gavrilov wrote: > I suppose the root of cause my problem here: > > [3.961326] amdgpu :0b:00.0: Direct firmware load for > amdgpu/sienna_cichlid_sos.bin failed with error -2 > [3.961359] amdgpu :0b:00.0: amdgpu: failed to in

Re: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

2020-12-30 Thread Mikhail Gavrilov
On Tue, 29 Dec 2020 at 20:15, Deucher, Alexander wrote: > > It looks like the driver is not able to access the firmware for some reason. > Please make sure it is available in your initrd or compiled into the kernel > depending on your config. Exactly! Thanks! # lsinitrd /boot/initramfs-5.10.
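The diagnosis in this thread rests on one kernel log line, "Direct firmware load for ... failed with error -2". As a rough sketch of how such lines could be collected from a log before checking the initrd contents, the snippet below extracts the firmware path and error code. The function name `failed_firmware` is an illustrative assumption; the sample line is the one quoted in the thread, with the PCI address left exactly as the archive shows it.

```python
import errno
import re

# Matches the kernel message quoted in the thread:
#   "Direct firmware load for <path> failed with error <code>"
FW_FAIL_RE = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware(log_text):
    """Return (firmware_path, error_code) pairs for every failed firmware load."""
    return [(m.group(1), int(m.group(2))) for m in FW_FAIL_RE.finditer(log_text)]


# Line as it appears in the archived message above.
sample = (
    "amdgpu :0b:00.0: Direct firmware load for "
    "amdgpu/sienna_cichlid_sos.bin failed with error -2"
)

for path, code in failed_firmware(sample):
    # The kernel reports negative errno values; -2 corresponds to ENOENT.
    name = errno.errorcode.get(-code, "unknown")
    print(f"{path}: error {code} ({name})")
```

Each reported path could then be looked up in the initramfs listing (the thread uses `lsinitrd` for that) to confirm whether the file is actually missing from the image.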

BUG: key ffff8b521bda9148 has not been registered!

2021-01-09 Thread Mikhail Gavrilov
Hi folks! I started to see this message every boot after replacing Radeon VII to 6900XT. $ journalctl | grep "BUG: key" Dec 31 05:19:42 localhost.localdomain kernel: BUG: key 98b59ab01148 has not been registered! Dec 31 05:25:44 localhost.localdomain kernel: BUG: key 8d425ba01148 has not b

[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-10 Thread Mikhail Gavrilov
Hi folks, today I joined the testing of kernel 5.11 and saw that the kernel log was flooded with BUG messages: BUG: sleeping function called from invalid context at mm/vmalloc.c:1756 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0 INFO: lockdep is turned off. CPU: 15 PID: 266

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
On Mon, 11 Jan 2021 at 19:01, Christian König wrote: > Changing the page table attributes while releasing memory might sleep. > So we can't use a spinlock here. > > Thanks for the report, a patch to fix this is on the mailing list now. Can you look also the first trace? Here a same error message

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Mon, Nov 6, 2023 at 8:29 PM Alex Deucher wrote: > > Already fixed in this commit: > https://gitlab.freedesktop.org/agd5f/linux/-/commit/d1d4c0b7b65b7fab2bc6f97af9e823b1c42ccdb0 > Which is included in last week's PR. > Thanks, it fixed the issue above. But, unfortunately this is not the only

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Wed, Nov 8, 2023 at 12:12 AM Alex Deucher wrote: > > The attached patch should fix it. Not sure why your GPU shows up as > busy. The AGP aperture was just disabled. Tested-by: Mikhail Gavrilov Thanks, after applying the patch GPU loading meets expectations. Games are working so ov

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-09-19 Thread Mikhail Gavrilov
Hi! Unfortunately the use-after-free issue still happens on the 6.0-rc5 kernel. The issue became hard to repeat. I spent the whole day at the computer when use-after-free again happened, I was playing the game Tiny Tina's Wonderlands. Therefore, forget about repeatability. It remains only to hope f

[regression][6.1] After commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86 system randomly hungs

2022-10-11 Thread Mikhail Gavrilov
Hi! The hangs occur randomly, but I found a reliable scenario to reproduce them (running the campaign in the game Halo Infinite). The backtrace looks like this: [ 147.260971] BUG: kernel NULL pointer dereference, address: 0088 [ 147.260987] ------------[ cut here ]------------ [ 147.

[regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2022-10-13 Thread Mikhail Gavrilov
Hi! I bisected an issue in the 6.0 kernel which started happening after 6.0-rc7 on all my machines. The backtrace of this issue looks like this: [ 2807.339439] ------------[ cut here ]------------ [ 2807.339445] WARNING: CPU: 11 PID: 2061 at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-10-17 Thread Mikhail Gavrilov
On Wed, May 11, 2022 at 5:01 PM Christian König wrote: > > > We have implemented a workaround, but still don't know the exact root cause. > > If anybody wants to look into this it would be rather helpful to be able > to reproduce the issue. > > Regards, > Christian. I see that issue was returned

[6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
Hi! I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6. dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 Author: Christian König Date: Thu Jul 14 10:23:38

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
On Fri, Oct 21, 2022 at 1:33 PM Christian König wrote: > > Hi, > > yes Bas already reported this issue, but I couldn't reproduce it. Need > to come up with a patch to narrow this down further. > > Can I send you something to test? I would appreciate to test any patches and ideas. -- Best Regard

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-30 Thread Mikhail Gavrilov
On Wed, Oct 26, 2022 at 12:29 PM Christian König wrote: > > Attached is the original test patch rebased on current amd-staging-drm-next. > > Can you test if this is enough to make sure that the games start without > crashing by fetching the userptrs? 1. Over the past week the list of games affect

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-02 Thread Mikhail Gavrilov
On Tue, Nov 1, 2022 at 10:52 PM Christian König wrote: > > Let's focus on one problem at a time. > > The issue here is that somehow userptr handling became racy after we > removed the lock, but I don't see why. > > We need to fix this ASAP since it is probably a much wider problem and > the additi

[Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games

2022-04-03 Thread Mikhail Gavrilov
Hi, Between commits ed4643521e6a and 34af78c4e616 something was broken. I noted that kernel log flooded with warning message "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" when some games are running: "Resident Evil Village", "Marvel's Aven

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
Hi Christian > those are two independent and already known problems. > > The warning triggered from the sync_file is already fixed in > drm-misc-next-fixes, but so far I couldn't figure out why the games > suddenly doesn't work any more. I thought that these warnings are related to the stuck of t

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
On Fri, 8 Apr 2022 at 16:13, Christian König wrote: > I owe you a beer. > > I still don't know what happens here, but that makes at least a bit more > sense than a patch which only changes comments :) > > Looks like we are missing something here. Can I send you a patch to try > something later to

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
ers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" has gone. Thanks. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-14 Thread Mikhail Gavrilov
On Sat, Apr 9, 2022 at 7:27 PM Christian König wrote: > > That's unfortunately not the end of the story. > > This is fixing your problem, but reintroducing the original problem that > we call the syncobj with a lock held which can crash badly as well. > > Going to take a closer look on Monday. I h

[Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-06-28 Thread Mikhail Gavrilov
Hi guys. Between commits fdaf9a5840ac and babf0bb978e3 the GPU stopped entering graphics mode; instead I see a black screen with a constantly glowing cursor. Demonstration: https://youtu.be/rGL4LsHMae4 In the kernel logs there are references to hung processes: [ 149.363465] rfkill: input handler disabled

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-06 Thread Mikhail Gavrilov
On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov wrote: > Christian can you look why drm_aperture_remove_conflicting_pci_framebuffers cause this kernel bug on my machine? [6.822385] amdgpu: Ignoring ACPI CRAT on non-APU system [6.822462] amdgpu: Virtual CRAT table created for

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-09 Thread Mikhail Gavrilov
On Thu, Jul 7, 2022 at 2:50 PM Christian König wrote: > > Am 07.07.22 um 02:20 schrieb Mikhail Gavrilov: > > On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov > > wrote: > > Christian can you look why > > drm_aperture_remove_conflicting_pci_framebuffers cause thi

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-13 Thread Mikhail Gavrilov
On Sat, Jul 9, 2022 at 5:10 PM Mikhail Gavrilov wrote: > Hi Christian, > if you read my initial post, you should see that I tried to bisect the issue. > But it is very problematic because on each step I see different symptoms. > And if mark different symptoms with skip step we got a

Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-18 Thread Mikhail Gavrilov
Hi guys, I continue testing 5.19 rc7 and found a bug. Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu. Here is the trace: [ 1320.203332] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 1320.203338] #PF: supervisor read access

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-18 Thread Mikhail Gavrilov
On Wed, Jul 13, 2022 at 5:38 PM Mikhail Gavrilov wrote: > # first bad commit: [9cbbd694a58bdf24def2462276514c90cab7cf80] Merge > drm/drm-next into drm-misc-next > Don't know who to thank but the issue disappeared in 5.19 rc7. -- Best Regards, Mike Gavrilov.

Re: Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-19 Thread Mikhail Gavrilov
On Tue, Jul 19, 2022 at 1:40 PM Mike Lothian wrote: > > I was told that this patch replaces the patch you mentioned > https://patchwork.freedesktop.org/series/106078/ and it the one > that'll hopefully land in Linus's tree > Great, I confirm that both patches solve the issue. As I understand the

Re: Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-19 Thread Mikhail Gavrilov
On Tue, Jul 19, 2022 at 4:26 PM Mikhail Gavrilov wrote: > In the kernel log there is no error so it is most likely a user space issue , > but I am not > sure about it. But I am confused by the message in the kernel log: [ 1962.000909] amdgpu: HIQ MQD's queue_doorbell_id0 i

[BUG][5.20] refcount_t: underflow; use-after-free

2022-08-14 Thread Mikhail Gavrilov
Hi folks. Joined testing 5.20 today (7ebfc85e2cd7). I encountered frequent GPU freezes, after which a message appears in the kernel logs: [ 220.280990] ------------[ cut here ]------------ [ 220.281000] refcount_t: underflow; use-after-free. [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcoun

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal wrote: > > Hi Mikhail > > Looks like this use-after-free problem was introduced on > 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems > like: if amdgpu_cs_vm_handling return r != 0, then it will unlock > bo_list_mutex inside the fun

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-16 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov wrote: > > Thanks, I tested this patch. > But with this patch use-after-free problem happening in another place: Does anyone have an idea why the second use-after-free happened? From the trace I don't understand which code is

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen wrote: > > Hi Mikhail, > > IIUC, you got this second use-after-free by applying the first version > of Maíra's patch, right? So, that version was adding another unbalanced > unlock to the cs ioctl flow, but it was solved in the latest version, > that yo

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal wrote: > > Hi Mikhail, > > Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial > revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the > error. Try reverting it and check if the use-after-free still happens. Thanks,

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-24 Thread Mikhail Gavrilov
On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal wrote: > > Hi Mikhail, > > Could you please specify the steps to reproduce this use-after-free? I > will try to reproduce it on the RX5700 XT and bisect the issue. > Hi Maíra, thanks for help. I'm afraid that it will be unrealistic to reproduce, becaus

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-09 Thread Mikhail Gavrilov
9be6ebc61e will stop these warnings. I also attached fresh logs from 6.2.0-0.rc6. 6.2-rc7 I started to build without commit b261509952bc19d1012cf732f853659be6ebc61e to avoid these warnings. On Thu, Oct 13, 2022 at 6:36 PM Mikhail Gavrilov > > Hi! > I bisected an issue of the 6.0 kernel whic

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-13 Thread Mikhail Gavrilov
drop_locks no longer appears anymore. I hope this patch will have time to be merged in 6.2 before release. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov. uptime.tar.xz Description: application/xz

Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2023-02-16 Thread Mikhail Gavrilov
On Fri, Dec 9, 2022 at 7:37 PM Leo Liu wrote: > > Please try the latest AMDGPU driver: > > https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/ > Sorry Leo, I missed your message. This issue is still present in 6.2-rc8. In my first message I was mistaken. > Before kernel 5.1

Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2023-02-17 Thread Mikhail Gavrilov
On Fri, Feb 17, 2023 at 8:30 PM Alex Deucher wrote: > > On Fri, Feb 17, 2023 at 1:10 AM Mikhail Gavrilov > wrote: > > > > On Fri, Dec 9, 2022 at 7:37 PM Leo Liu wrote: > > > > > > Please try the latest AMDGPU driver: > > > > > > https:/

amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-23 Thread Mikhail Gavrilov
Hi, I have a laptop ASUS ROG Strix G15 Advantage Edition G513QY-HQ007. But it is impossible to use without AC power because the system loses the nvme when I disconnect the power adapter. Messages from kernel log when it happens: nvme nvme0: controller is down; will reset: CSTS=0x, PCI_STATUS=0

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 12:13 PM Christian König wrote: > > Hi Mikhail, > > this is pretty clearly a problem with the system and/or it's BIOS and > not the GPU hw or the driver. > > The option pci=nocrs makes the kernel ignore additional resource windows > the BIOS reports through ACPI. This then

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 8:31 PM Christian König wrote: > > Sorry I totally missed that you attached the full dmesg to your original > mail. > > Yeah, the driver did fail gracefully. But then X doesn't come up and > then gdm just dies. Are you sure that these messages should be present when the dr

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-28 Thread Mikhail Gavrilov
On Mon, Feb 27, 2023 at 3:22 PM Christian König > > Unfortunately yes. We could clean that up a bit more so that you don't > run into a BUG() assertion, but what essentially happens here is that we > completely fail to talk to the hardware. > > In this situation we can't even re-enable vesa or text

[6.3][regression] commit a4e771729a51168bc36317effaa9962e336d4f5e lead to flood kernel logs with warning messages "at kernel/workqueue.c:3167 __flush_work+0x472/0x500"

2023-03-08 Thread Mikhail Gavrilov
Hi, I didn't face the drm_bridge_hpd_enable+0x94/0x9c [drm] issue, but the fix for it leads to warning messages on my laptop, an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007, which has two AMD GPUs: a discrete Radeon 6800M and a Vega 8 integrated in the Cezanne CPU. I found the bad commit by bisecting: ❯ git

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-03-23 Thread Mikhail Gavrilov
On Tue, Mar 21, 2023 at 11:47 PM Christian König wrote: > > Hi Mikhail, > > That looks like a reference counting issue to me. > > I'm going to take a look, but we have already fixed one of those recently. > > Probably best that you try this on drm-fixes, just to double check that > this isn't the

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-04-04 Thread Mikhail Gavrilov
On Fri, Mar 24, 2023 at 7:37 PM Christian König wrote: > > Yeah, that one > > Thanks for the info, looks like this isn't fixed. > > Christian. > Hi, glad to see that "BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0" was fixed in 6.3-rc5. For history it would be good to kn

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-14 Thread Mikhail Gavrilov
On Tue, Apr 11, 2023 at 10:40 PM Mikhail Gavrilov wrote: > > Hi, > KASAN continues to find problems in the drm_sched_job_cleanup code in 6.3-rc6. > I have not gotten any feedback in the thread > https://lore.kernel.org/lkml/cabxgcsmvub2ra4d+k5cna0_2521tox++d4nmoukki4x2-q_...@mail.gmail.com/

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
Christian? ❯ /usr/src/kernels/6.3.0-0.rc7.56.fc39.x86_64/scripts/faddr2line /lib/debug/lib/modules/6.3.0-0.rc7.56.fc39.x86_64/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.debug drm_sched_job_cleanup+0x9a drm_sched_job_cleanup+0x9a/0x130: drm_sched_job_cleanup at /usr/src/debug/kernel-6.3-rc7/linu

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
On Wed, Apr 19, 2023 at 1:12 PM Christian König wrote: > > I'm already looking into this, but can't figure out why we run into > problems here. > > What happens is that a CS is aborted without sending the job to the > scheduler and in this case the cleanup function doesn't seem to work. > > Christ

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > > Could you try drm-misc-next as well? > > Going to give drm-fixes another round of testing. > > Thanks, > Christian. Important don't give up. https://youtu.be/25zhHBGIHJ8 [40 min] https://youtu.be/utnDR26eYBY [50 min] https://youtu.be/DJQ_

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > Could you try drm-misc-next as well? If as I assume I cloned right repo $ git clone -b drm-misc-next git://anongit.freedesktop.org/drm/drm-misc linux-drm-misc-next for my hardware last commit on this branch is turned out completely unworkin

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-25 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov wrote: > > Important don't give up. > https://youtu.be/25zhHBGIHJ8 [40 min] > https://youtu.be/utnDR26eYBY [50 min] > https://youtu.be/DJQ_tiimW6g [12 min] > https://youtu.be/Y6AH1oJKivA [6 min] > Yes the issue is everyth
