Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Mikhail Gavrilov
tely hangs without any messages in kernel logs. On Wed, Sep 11, 2024 at 2:11 AM Leo Li wrote: > > Hi Mikhail, > > Can you give this patch a try to see if it helps? > https://gist.github.com/leeonadoh/3271e90ec95d768424c572c970ada743 > Thanks, with this patch, the issue is not r

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-08 Thread Mikhail Gavrilov
On Sat, Sep 7, 2024 at 12:47 AM Leo Li wrote: > > > Hi Mikhail, > > I've tried to align my system with yours as best as I can, but so far, I've > had > no luck reproducing the hang. A video of what I'm doing: > https://youtu.be/VeD-LPCnfWM?si=b2baF8MyDBuU4jRH > (Under the hood, the W7900 and 7900

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
tch was definitely not enough. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
On Wed, Sep 4, 2024 at 4:15 AM Leo Li wrote: > Hi Mike, > > Super sorry for the ridiculous wait. Your first two emails slipped by my > inbox, > which is really silly, given I'm first in the to field... > > Thanks for bisecting and finding a free game to reproduce it on. I did not > have > luck r

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-02 Thread Mikhail Gavrilov
On Sun, Aug 25, 2024 at 2:12 AM Mikhail Gavrilov wrote: > > Hi, > Is anyone trying to look into it? > I continue to reproduce this issue on fresh kernel builds 6.11-rc4+. > In addition to the RenPy engine, the problem also reproduces on games > from Ubisoft, such as Far Cry 4.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-24 Thread Mikhail Gavrilov
On Mon, Aug 5, 2024 at 11:05 PM Mikhail Gavrilov wrote: > > Hi, > After commit 1b04dcca4fb1, launching some RenPy games causes computer hang. > After the hang, even Alt + sysrq + REISUB can't reboot the computer! > And no trace in the kernel log! > For demonstration, I'

6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-05 Thread Mikhail Gavrilov
Hi, After commit 1b04dcca4fb1, launching some RenPy games causes a computer hang. After the hang, even Alt + SysRq + REISUB can't reboot the computer! And there is no trace in the kernel log! For demonstration, I'm going to use the game "Find the Orange Narwhal" because it is free and has 100% reproducibility f

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-08-02 Thread Mikhail Gavrilov
On Wed, Jul 24, 2024 at 10:16 PM Mikhail Gavrilov wrote: > > https://patchwork.freedesktop.org/patch/605201/ > For which kernel is this patch intended? The patch is not applied on > top of d67978318827. I am able to apply this patch on top of e4fc196f5ba3 and the issue is gone

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-24 Thread Mikhail Gavrilov
On Tue, Jul 23, 2024 at 2:34 AM Alex Deucher wrote: > Do either of these patches help? > https://patchwork.freedesktop.org/patch/605437/ Unfortunately, this patch didn't help. Please see the attached kernel log. > https://patchwork.freedesktop.org/patch/605201/ For which kernel is this patch int

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-16 Thread Mikhail Gavrilov
On Tue, Jul 16, 2024 at 10:10 PM Alex Deucher wrote: > > Does the attached partial revert fix it? > > Alex > Yes, thanks. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
On Wed, Jul 10, 2024 at 12:01 PM Mikhail Gavrilov wrote: > > On Tue, Jul 9, 2024 at 7:48 PM Rodrigo Siqueira Jordao > wrote: > > Hi, > > > > I also tried it with 6900XT. I got the same results on my side. > > This is weird. > > > Anyway, I could not rep

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
On Tue, Jul 9, 2024 at 7:48 PM Rodrigo Siqueira Jordao wrote: > Hi, > > I also tried it with 6900XT. I got the same results on my side. This is weird. > Anyway, I could not reproduce the issue with the below components. I may > be missing something that will trigger this bug; in this sense, coul

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-29 Thread Mikhail Gavrilov
On Sat, Jun 29, 2024 at 9:46 PM Rodrigo Siqueira Jordao wrote: > Hi Mikhail, > > I'm trying to reproduce this issue, but until now, I've been unable to > reproduce it. I tried some different scenarios with the following > components: > > 1. Displays: I tried with one and two displays > - 4k@120

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-21 Thread Mikhail Gavrilov
On Fri, Jun 21, 2024 at 12:56 PM Linux regression tracking (Thorsten Leemhuis) wrote: > Hmmm, I might have missed something, but it looks like nothing happened > here since then. What's the status? Is the issue still happening? Yes. Tested on e5b3efbe1ab1. I spotted that the problem disappears a

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-10 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 5:29 PM Linux regression tracking (Thorsten Leemhuis) wrote: > > [CCing the other amd drm maintainers] > > Mikhail: are those details in any way relevant? Then in the future best > leave them out (or make things easier to follow), they make the bug > report confusing and sou

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-09 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 6:39 PM Alex Deucher wrote: > > --- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > @@ -944,7 +944,7 @@ void optc1_set_drr( > OTG_V_TOTAL_MAX_SEL, 1, >

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-05 Thread Mikhail Gavrilov
On Sun, May 26, 2024 at 7:06 PM Mikhail Gavrilov wrote: > > Hi, > Day before yesterday I replaced 7900XTX to 6900XT for got clear in > which kernel first time appeared warning message "DMA-API: amdgpu > 0000:0f:00.0: cacheline tracking EEXIST, overlapping mappings aren'

6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-05-26 Thread Mikhail Gavrilov
Hi, the day before yesterday I replaced the 7900XTX with a 6900XT to find out in which kernel the warning message "DMA-API: amdgpu 0000:0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported" first appeared. Kernels 6.3 and older won't boot on a computer with a Radeon 7900XTX. When I boo

Re: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities

2024-01-24 Thread Mikhail Gavrilov
On Wed, Jan 24, 2024 at 7:19 AM Mikhail Gavrilov wrote: > > Who could dig into it, please? Did you decide to revert it? https://lkml.org/lkml/2024/1/22/1866 Also, I forgot to attach the kernel build .config to the previous message, so I'm attaching it here. It may be useful for reprodu

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-19 Thread Mikhail Gavrilov
On Fri, Dec 15, 2023 at 5:37 PM Christian König wrote: > > I have no idea :) > > From the logs I can see that the AMDGPU now has the proper BARs assigned: > > [5.722015] pci 0000:03:00.0: [1002:73df] type 00 class 0x038000 > [5.722051] pci 0000:03:00.0: reg 0x10: [mem > 0xf8-0xfbf

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-15 Thread Mikhail Gavrilov
On Tue, Feb 28, 2023 at 5:43 PM Christian König wrote: > > The point is it doesn't need to talk to the amdgpu hardware. What it > does is that it talks to the good old VGA/VESA emulation and that just > happens to be still enabled by the BIOS/GRUB. > > And that VGA/VESA emulation doesn't need any

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-17 Thread Mikhail Gavrilov
he first one patch is enough. Tested-on: 7900XTX, 6900XT and 6800M. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Wed, Nov 8, 2023 at 12:12 AM Alex Deucher wrote: > > The attached patch should fix it. Not sure why your GPU shows up as > busy. The AGP aperture was just disabled. Tested-by: Mikhail Gavrilov Thanks, after applying the patch, GPU load meets expectations. Games are working so ov

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Mon, Nov 6, 2023 at 8:29 PM Alex Deucher wrote: > > Already fixed in this commit: > https://gitlab.freedesktop.org/agd5f/linux/-/commit/d1d4c0b7b65b7fab2bc6f97af9e823b1c42ccdb0 > Which is in included in last weeks PR. > Thanks, it fixed the issue above. But unfortunately, this is not the only

Re: [PATCH] drm/ttm: check null pointer before accessing when swapping

2023-07-27 Thread Mikhail Gavrilov
On Thu, Jul 27, 2023 at 12:33 PM Chen, Guchun wrote: > > Reviewed-by: Christian König > > > > Has this already been pushed to drm-misc-next? > > > > Thanks, > > Christian. > > Not yet, Christian, as I don't have push permission. I saw you were on > vacation, so I would expect to ping you to push

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-25 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov wrote: > > Important don't give up. > https://youtu.be/25zhHBGIHJ8 [40 min] > https://youtu.be/utnDR26eYBY [50 min] > https://youtu.be/DJQ_tiimW6g [12 min] > https://youtu.be/Y6AH1oJKivA [6 min] > Yes the issue is everyth

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > Could you try drm-misc-next as well? Assuming I cloned the right repo: $ git clone -b drm-misc-next git://anongit.freedesktop.org/drm/drm-misc linux-drm-misc-next For my hardware, the last commit on this branch turned out to be completely unworkin

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > > Could you try drm-misc-next as well? > > Going to give drm-fixes another round of testing. > > Thanks, > Christian. Important: don't give up. https://youtu.be/25zhHBGIHJ8 [40 min] https://youtu.be/utnDR26eYBY [50 min] https://youtu.be/DJQ_

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
On Wed, Apr 19, 2023 at 1:12 PM Christian König wrote: > > I'm already looking into this, but can't figure out why we run into > problems here. > > What happens is that a CS is aborted without sending the job to the > scheduler and in this case the cleanup function doesn't seem to work. > > Christ
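
Christian's description (a CS is aborted without the job ever reaching the scheduler, and the cleanup function then misbehaves) points at a general requirement: a cleanup routine must cope with a job that was initialized but never armed. The following is a minimal userspace sketch of that pattern only. `struct toy_job`, `toy_job_init()`, `toy_job_arm()` and `toy_job_cleanup()` are hypothetical names, not the drm_sched API, and the sketch makes no claim about how the actual fix was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified job; not the real struct drm_sched_job. */
struct toy_job {
    bool armed;  /* set once the job has been handed to the scheduler */
    int *fence;  /* stands in for a resource created only when armed */
    int *deps;   /* allocated at init time, whether or not the job is armed */
};

int toy_job_init(struct toy_job *job)
{
    job->armed = false;
    job->fence = NULL;
    job->deps = calloc(4, sizeof(*job->deps));
    return job->deps ? 0 : -1;
}

int toy_job_arm(struct toy_job *job)
{
    assert(!job->armed);  /* arming twice would be a caller bug */
    job->fence = malloc(sizeof(*job->fence));
    if (!job->fence)
        return -1;
    *job->fence = 1;
    job->armed = true;
    return 0;
}

/* Safe on both paths: a job that was armed, and a job whose submission
 * was aborted before it was ever armed. */
void toy_job_cleanup(struct toy_job *job)
{
    if (job->armed) {
        free(job->fence);
        job->fence = NULL;
        job->armed = false;
    }
    free(job->deps);
    job->deps = NULL;
}
```

The key design point is that cleanup inspects the job's state instead of assuming every job went through the full submission path.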

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
Christian? ❯ /usr/src/kernels/6.3.0-0.rc7.56.fc39.x86_64/scripts/faddr2line /lib/debug/lib/modules/6.3.0-0.rc7.56.fc39.x86_64/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.debug drm_sched_job_cleanup+0x9a drm_sched_job_cleanup+0x9a/0x130: drm_sched_job_cleanup at /usr/src/debug/kernel-6.3-rc7/linu

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-14 Thread Mikhail Gavrilov
On Tue, Apr 11, 2023 at 10:40 PM Mikhail Gavrilov wrote: > > Hi, > KASAN continues to find problems in the drm_sched_job_cleanup code at 6.3rc6. > I not got any feedback in the thread > https://lore.kernel.org/lkml/cabxgcsmvub2ra4d+k5cna0_2521tox++d4nmoukki4x2-q_...@mail.gmail.com/

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-04-04 Thread Mikhail Gavrilov
On Fri, Mar 24, 2023 at 7:37 PM Christian König wrote: > > Yeah, that one > > Thanks for the info, looks like this isn't fixed. > > Christian. > Hi, glad to see that "BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0" was fixed in 6.3-rc5. For the record, it would be good to kn

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-03-23 Thread Mikhail Gavrilov
On Tue, Mar 21, 2023 at 11:47 PM Christian König wrote: > > Hi Mikhail, > > That looks like a reference counting issue to me. > > I'm going to take a look, but we have already fixed one of those recently. > > Probably best that you try this on drm-fixes, just to double check that > this isn't the

[6.3][regression] commit a4e771729a51168bc36317effaa9962e336d4f5e lead to flood kernel logs with warning messages "at kernel/workqueue.c:3167 __flush_work+0x472/0x500"

2023-03-08 Thread Mikhail Gavrilov
Hi, I didn't run into the drm_bridge_hpd_enable+0x94/0x9c [drm] issue, but the fix for it leads to warning messages on my laptop, an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007, which has two AMD GPUs: a discrete Radeon 6800M and a Cezanne Vega 8 integrated into the CPU. I found the bad commit by bisecting: ❯ git

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-28 Thread Mikhail Gavrilov
On Mon, Feb 27, 2023 at 3:22 PM Christian König > > Unfortunately yes. We could clean that up a bit more so that you don't > run into a BUG() assertion, but what essentially happens here is that we > completely fail to talk to the hardware. > > In this situation we can't even re-enable vesa or text

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 8:31 PM Christian König wrote: > > Sorry I totally missed that you attached the full dmesg to your original > mail. > > Yeah, the driver did fail gracefully. But then X doesn't come up and > then gdm just dies. Are you sure that these messages should be present when the dr

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 12:13 PM Christian König wrote: > > Hi Mikhail, > > this is pretty clearly a problem with the system and/or it's BIOS and > not the GPU hw or the driver. > > The option pci=nocrs makes the kernel ignore additional resource windows > the BIOS reports through ACPI. This then

amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-23 Thread Mikhail Gavrilov
Hi, I have an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007 laptop, but it is impossible to use without AC power because the system loses the NVMe drive when I disconnect the power adapter. Messages from the kernel log when this happens: nvme nvme0: controller is down; will reset: CSTS=0x, PCI_STATUS=0

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-13 Thread Mikhail Gavrilov
drop_locks no longer appears. I hope this patch can be merged before the 6.2 release. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov. uptime.tar.xz Description: application/xz

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-09 Thread Mikhail Gavrilov
9be6ebc61e will stop these warnings. I also attached fresh logs from 6.2.0-0.rc6. I started building 6.2-rc7 without commit b261509952bc19d1012cf732f853659be6ebc61e to avoid these warnings. On Thu, Oct 13, 2022 at 6:36 PM Mikhail Gavrilov > > Hi! > I bisected an issue of the 6.0 kernel whic

[6.2][regression] looks like commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b leads to use-after-free and random computer hangs

2022-12-18 Thread Mikhail Gavrilov
Hi, The 6.2 kernel preparation cycle has begun, and after the kernel was updated on my Fedora Rawhide I started receiving use-after-free errors with complete computer hangs. A good reproducer of this behaviour is launching the game "Marvel's Avengers". The backtrace of the issue looks lik

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-28 Thread Mikhail Gavrilov
On Tue, Nov 22, 2022 at 12:16 PM Christian König wrote: > > Ah, thanks a lot for this. I've already pushed the patches into our > internal branch, but getting this confirmation is still great! > > This was quite some fundamental bug in the handling and I hope to get > this completely reworked at s

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2022-11-22 Thread Mikhail Gavrilov
On Thu, Oct 13, 2022 at 6:36 PM Mikhail Gavrilov wrote: > > Hi! > I bisected an issue of the 6.0 kernel which started happening after > 6.0-rc7 on all my machines. > > Backtrace of this issue looks like as: > > [ 2807.339439] [ cut here ] > [ 28

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-21 Thread Mikhail Gavrilov
letely gone. All known broken games are working. Tested-by: Mikhail Gavrilov The only thing I don't like is the kernel logs being flooded with the message "WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70", but this is not related to the patches bei

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-02 Thread Mikhail Gavrilov
On Tue, Nov 1, 2022 at 10:52 PM Christian König wrote: > > Let's focus on one problem at a time. > > The issue here is that somehow userptr handling became racy after we > removed the lock, but I don't see why. > > We need to fix this ASAP since it is probably a much wider problem and > the additi

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-30 Thread Mikhail Gavrilov
On Wed, Oct 26, 2022 at 12:29 PM Christian König wrote: > > Attached is the original test patch rebased on current amd-staging-drm-next. > > Can you test if this is enough to make sure that the games start without > crashing by fetching the userptrs? 1. Over the past week the list of games affect

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
On Fri, Oct 21, 2022 at 1:33 PM Christian König wrote: > > Hi, > > yes Bas already reported this issue, but I couldn't reproduce it. Need > to come up with a patch to narrow this down further. > > Can I send you something to test? I would be glad to test any patches and ideas. -- Best Regard

[6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
Hi! I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6. dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 Author: Christian König Date: Thu Jul 14 10:23:38

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-10-17 Thread Mikhail Gavrilov
On Wed, May 11, 2022 at 5:01 PM Christian König wrote: > > > We have implemented a workaround, but still don't know the exact root cause. > > If anybody wants to look into this it would be rather helpful to be able > to reproduce the issue. > > Regards, > Christian. I see that the issue has returned

[regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2022-10-13 Thread Mikhail Gavrilov
Hi! I bisected an issue in the 6.0 kernel which started happening after 6.0-rc7 on all my machines. The backtrace of this issue looks like this: [ 2807.339439] [ cut here ] [ 2807.339445] WARNING: CPU: 11 PID: 2061 at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks

[regression][6.1] After commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86 system randomly hungs

2022-10-11 Thread Mikhail Gavrilov
Hi! The hangs occur randomly, but I found a good reproduction scenario (running the campaign in the game Halo Infinite). The backtrace looks like this: [ 147.260971] BUG: kernel NULL pointer dereference, address: 0088 [ 147.260987] [ cut here ] [ 147.

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-09-19 Thread Mikhail Gavrilov
Hi! Unfortunately, the use-after-free issue still happens on the 6.0-rc5 kernel. The issue has become hard to reproduce: I had spent the whole day at the computer before the use-after-free happened again, while I was playing the game Tiny Tina's Wonderlands. So reliable reproduction is out of the question. All that remains is to hope f

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-24 Thread Mikhail Gavrilov
On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal wrote: > > Hi Mikhail, > > Could you please specify the steps to reproduce this use-after-free? I > will try to reproduce it on the RX5700 XT and bisect the issue. > Hi Maíra, thanks for the help. I'm afraid that it will be unrealistic to reproduce, becaus

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal wrote: > > Hi Mikhail, > > Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial > revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the > error. Try reverting it and check if the use-after-free still happens. Thanks,

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen wrote: > > Hi Mikhail, > > IIUC, you got this second user-after-free by applying the first version > of Maíra's patch, right? So, that version was adding another unbalanced > unlock to the cs ioctl flow, but it was solved in the latest version, > that yo

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-16 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov wrote: > > Thanks, I tested this patch. > But with this patch use-after-free problem happening in another place: Does anyone have an idea why the second use-after-free happened? From the trace I don't understand which code is

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal wrote: > > Hi Mikhail > > Looks like this use-after-free problem was introduced on > 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems > like: if amdgpu_cs_vm_handling return r != 0, then it will unlock > bo_list_mutex inside the fun
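
Maíra's analysis points at a classic lock-ownership bug: if a callee releases a mutex on its error path and the caller releases it again on exit, the lock is released twice. A minimal userspace sketch of that pattern, and of the usual fix (only the code that took the lock releases it), follows. `struct toy_mutex`, `step_buggy()`, `step_fixed()` and `submit()` are hypothetical stand-ins, not amdgpu code; the counting lock exists only so the double unlock can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that records unlocks of a lock that is not held.
 * Hypothetical stand-in for a mutex such as bo_list_mutex. */
struct toy_mutex {
    bool held;
    int bad_unlocks;
};

static void toy_lock(struct toy_mutex *m)
{
    assert(!m->held);
    m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
    if (!m->held)
        m->bad_unlocks++;  /* a real mutex would be corrupted here */
    m->held = false;
}

/* Buggy pattern: the callee drops the caller's lock on its error path. */
static int step_buggy(struct toy_mutex *m, int fail)
{
    if (fail) {
        toy_unlock(m);
        return -1;
    }
    return 0;
}

/* Fixed pattern: the callee never touches the lock; the caller owns it. */
static int step_fixed(struct toy_mutex *m, int fail)
{
    (void)m;
    return fail ? -1 : 0;
}

/* The caller unlocks on every exit path, so with step_buggy() the
 * error path releases the lock a second time. */
int submit(struct toy_mutex *m, int fail, bool use_fixed)
{
    int r;

    toy_lock(m);
    r = use_fixed ? step_fixed(m, fail) : step_buggy(m, fail);
    toy_unlock(m);
    return r;
}
```

Keeping lock and unlock in the same function makes this kind of imbalance much easier to spot in review.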

[BUG][5.20] refcount_t: underflow; use-after-free

2022-08-14 Thread Mikhail Gavrilov
Hi folks. I joined testing 5.20 today (7ebfc85e2cd7) and encountered frequent GPU freezes, after which this message appears in the kernel logs: [ 220.280990] [ cut here ] [ 220.281000] refcount_t: underflow; use-after-free. [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcoun
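
The "refcount_t: underflow; use-after-free." warning means a reference was dropped on an object whose count had already reached zero. As a rough illustration only, here is a single-threaded toy counter that detects the same condition and then saturates instead of wrapping around. It is loosely inspired by the kernel's refcount_t behaviour and is not its implementation (the real type is atomic and uses its own saturation value); `struct toy_ref`, `toy_ref_get()`, `toy_ref_put()` and `TOY_SATURATED` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical "poisoned" marker; once set, the counter stays stuck. */
#define TOY_SATURATED (-1)

struct toy_ref {
    int refs;
    int warnings;
};

void toy_ref_get(struct toy_ref *r)
{
    if (r->refs == TOY_SATURATED)
        return;  /* once saturated, stay saturated */
    r->refs++;
}

/* Returns true when the last reference was dropped (the object may be freed). */
bool toy_ref_put(struct toy_ref *r)
{
    if (r->refs == TOY_SATURATED)
        return false;
    if (r->refs == 0) {
        /* Dropping a reference on an object that already reached zero:
         * someone still uses memory that was (or will be) freed. */
        fprintf(stderr, "toy_ref: underflow; use-after-free.\n");
        r->warnings++;
        r->refs = TOY_SATURATED;
        return false;
    }
    return --r->refs == 0;
}
```

Saturating rather than wrapping to a large value means a buggy extra put can never make the object look alive again.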

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-24 Thread Mikhail Gavrilov
On Thu, 21 Jan 2021 at 18:27, Christian König wrote: > > I still have no idea what's going on here. > > The KASAN messages from the DC code are completely unrelated. > > Please add the full dmesg to your bug report. > I did it. https://gitlab.freedesktop.org/drm/amd/-/issues/1439#note_776267 --

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-19 Thread Mikhail Gavrilov
On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov wrote: > In rc4, the number of warnings has dropped dramatically. No more errors "kasan slab-out-of-bounds" and no "DMA-API device driver failed to check map error". But still not fixed "sleeping function called from inva

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-14 Thread Mikhail Gavrilov
On Thu, 14 Jan 2021 at 18:56, Christian König wrote: > Unfortunately not of hand. > > I also don't see any bug reports from other people and can't reproduce > the last backtrace you send out TTM here. That's because only the most desperate will install kernels with debug flags enabled and then load the

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-13 Thread Mikhail Gavrilov
On Tue, 12 Jan 2021 at 01:45, Christian König wrote: > > But what you have in your logs so far are only unrelated symptoms, the > root of the problem is that somebody is leaking memory. > > What you could do as well is to try to enable kmemleak I captured some memleaks. Do they contain any useful

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
Hi Christian, On Tue, 12 Jan 2021 at 01:45, Christian König wrote: > > Hi Mike, > > Unfortunately not, that's DC stuff. Easiest is to assign this as a bug > tracker to our DC team. Ok > At least some progress. Any objections that I add your e-mail address as > tested-by tag? Yes, feel free to add m

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
On Mon, 11 Jan 2021 at 19:01, Christian König wrote: > Changing the page table attributes while releasing memory might sleep. > So we can't use a spinlock here. > > Thanks for the report, a patch to fix this is on the mailing list now. Can you also look at the first trace? It has the same error message
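
Christian's explanation (changing page table attributes while releasing memory might sleep, so a spinlock cannot be held there) is the rule behind the "BUG: sleeping function called from invalid context" messages that started this thread. A toy, single-threaded model of that check is sketched below; `toy_spin_lock()`, `toy_might_sleep()`, `toy_release_pages()` and the other names are hypothetical and only mimic the idea of the kernel's might_sleep() debugging check.

```c
#include <stdio.h>

/* Holding a spinlock puts code in atomic context, where nothing that
 * may sleep is allowed. */
static int atomic_depth;  /* > 0 while any toy spinlock is held */
static int violations;    /* how many times the rule was broken */

static void toy_spin_lock(void)   { atomic_depth++; }
static void toy_spin_unlock(void) { atomic_depth--; }

/* Stand-in for might_sleep(): checks the context, never actually sleeps. */
static void toy_might_sleep(const char *where)
{
    if (atomic_depth > 0) {
        fprintf(stderr, "BUG: sleeping function called from invalid context at %s\n", where);
        violations++;
    }
}

/* Stand-in for releasing memory whose page attributes must be changed
 * first, a step that may sleep. */
static void toy_release_pages(void)
{
    toy_might_sleep("toy_release_pages");
}

/* Wrong: the possibly-sleeping release runs under a spinlock. */
void release_with_spinlock(void)
{
    toy_spin_lock();
    toy_release_pages();
    toy_spin_unlock();
}

/* Right: a sleeping lock (e.g. a mutex) does not enter atomic context,
 * so the check passes; the mutex itself is omitted from this model. */
void release_with_mutex(void)
{
    toy_release_pages();
}

int toy_violations(void) { return violations; }
```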

[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-10 Thread Mikhail Gavrilov
Hi folks, today I joined testing kernel 5.11 and saw that the kernel log was flooded with BUG messages: BUG: sleeping function called from invalid context at mm/vmalloc.c:1756 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0 INFO: lockdep is turned off. CPU: 15 PID: 266

Re: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

2020-12-30 Thread Mikhail Gavrilov
On Tue, 29 Dec 2020 at 20:15, Deucher, Alexander wrote: > > It looks like the driver is not able to access the firmware for some reason. > Please make sure it is available in your initrd or compiled into the kernel > depending on your config. Exactly! Thanks! # lsinitrd /boot/initramfs-5.10.

Re: [bug] Radeon 3900XT not switch to graphic mode on kernel 5.10

2020-12-27 Thread Mikhail Gavrilov
On Sun, 27 Dec 2020 at 21:39, Mikhail Gavrilov wrote: > I suppose the root of cause my problem here: > > [3.961326] amdgpu 0000:0b:00.0: Direct firmware load for > amdgpu/sienna_cichlid_sos.bin failed with error -2 > [3.961359] amdgpu 0000:0b:00.0: amdgpu: failed to in

[bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure

2020-11-03 Thread Mikhail Gavrilov
Hi folks. I observed a hard-to-reproduce set of bugs. It always starts as 1) kworker/u64:2: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 and continues as: 2) WARNING: CPU: 21 PID: 806649 at drivers/gpu/drm/amd/amdgpu/../d

[BUG] general protection fault, probably for non-canonical address 0xfe5d6f0af7831e5e: 0000 [#1] SMP NOPTI (5.7RC4 GIT 79dede78c057)

2020-05-11 Thread Mikhail Gavrilov
Hi folks. I didn’t do anything unusual, I just restarted the computer after the update, launched all the applications that I usually launch and went to drink tea. When I returned, I found that the monitor was on (it should have turned off since I had set the energy-saving mode for 5 minutes in DE)

Re: BUG: kernel NULL pointer dereference, address: 0000000000000026 after switching to 5.7 kernel

2020-04-18 Thread Mikhail Gavrilov
On Sat, 11 Apr 2020 at 14:56, Christian König wrote: > > Yeah, that is a known issue. > > You could try the attached patch, but please be aware that it is not > even compile tested because of the Easter holidays here. > Looks good to me, so it's a pity that this patch did not exist in the pull requ

BUG: kernel NULL pointer dereference, address: 0000000000000026 after switching to 5.7 kernel

2020-04-10 Thread Mikhail Gavrilov
Hi folks. After upgrading the kernel to 5.7, I see the following error messages in the kernel log on every boot: [2.569513] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19 [2.569538] [drm] PSP loading UVD firmware [2.570038] BUG: kernel NULL pointer dereference, address: 0000000000000026 [2

BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 60s!

2020-01-11 Thread Mikhail Gavrilov
Hi folks, I just wanted to share my logs via a paste service but didn't check how large they were. I opened the file in Geany, pressed Ctrl + A, Ctrl + C, then went to the Chrome tab with pastebin.com open and pressed Ctrl + V. I did not expect such an action to hang the system GUI. I conne

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-15 Thread Mikhail Gavrilov
On Mon, 9 Sep 2019 at 14:15, Koenig, Christian wrote: > > I agree with Daniels analysis. > > It looks like the problem is simply that PM turns of a block before all > work is done on that block. > > Have you opened a bug report yet? If not then that would certainly help > cause it is really hard t

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-08 Thread Mikhail Gavrilov
p 2019 at 12:58, Daniel Vetter wrote: > > On Thu, Sep 5, 2019 at 12:27 AM Mikhail Gavrilov > wrote: > > > > On Wed, 4 Sep 2019 at 13:37, Daniel Vetter wrote: > > > > > > Extend your backtrac warning slightly like > > > > > > WARN(r,

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-04 Thread Mikhail Gavrilov
On Wed, 4 Sep 2019 at 13:37, Daniel Vetter wrote: > > Extend your backtrac warning slightly like > > WARN(r, "we're stuck on fence %pS\n", fence->ops); > > Also adding Harry and Alex, I'm not really working on amdgpu ... [ 3511.998320] [ cut here ] [ 3511.998714] w

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-03 Thread Mikhail Gavrilov
On Tue, 3 Sep 2019 at 13:21, Hillf Danton wrote: > > Describe the problems you are experiencing please. > Say is the screen locked up? Machine lockedup? > Anything unnormal after you see the warning? > According to my observations, every "gnome shell stuck warning" happened when I was not sitting at t

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-02 Thread Mikhail Gavrilov
On Fri, 30 Aug 2019 at 08:30, Hillf Danton wrote: > > Add a warning to show if it makes sense in field: neither regression nor > problem will have been observed with the warning printed. > I caught the problem. [21793.094289] [ cut here ] [21793.094296] gnome shell stuck

gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-08-25 Thread Mikhail Gavrilov
Hi folks, I left gnome-shell unlocked at noon, and when I returned in the evening I discovered that the monitor was not sleeping and was showing the open GNOME Activities overview. At first, I thought that some application was preventing the system from going to sleep. But when I tried to move the mouse, I realized that the system had hung.

Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

2019-08-10 Thread Mikhail Gavrilov
On Fri, 9 Aug 2019 at 23:55, Mikhail Gavrilov wrote: > Finally initial problem "gnome-shell: page allocation failure: > order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), > nodemask=(null),cpuset=/,mems_allowed=0" did not happens anymore with > latest version of the patch (I tes

Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

2019-08-05 Thread Mikhail Gavrilov
On Mon, 5 Aug 2019 at 08:21, Hillf Danton wrote: > > > > Try to fix the failure above using vmalloc + kmalloc. > > --- a/drivers/gpu/drm/amd/display/dc/core/dc.c > +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c > @@ -1174,8 +1174,12 @@ struct dc_state *dc_create_state(struct > struct dc_st
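
Hillf's suggestion targets the order:4 GFP_KERNEL|__GFP_COMP page allocation failure by combining kmalloc and vmalloc; the diff itself is truncated here. The general idea (try a physically contiguous allocation first, fall back to a virtually contiguous one, as the kernel's kvmalloc() family does) can be modeled in userspace. In this sketch the "contiguous" allocator is simulated to refuse anything above order 3, so an order-4 request takes the fallback path. `toy_kvzalloc()`, `TOY_MAX_ORDER_OK` and `TOY_PAGE_SIZE` are assumptions for illustration, and both paths use calloc() underneath.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_ORDER_OK 3u  /* pretend higher orders fail under fragmentation */

enum toy_src { TOY_NONE, TOY_CONTIG, TOY_VIRT };

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned toy_order(size_t size)
{
    unsigned order = 0;
    size_t span = TOY_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

static void *toy_contig_alloc(size_t size)
{
    if (toy_order(size) > TOY_MAX_ORDER_OK)
        return NULL;  /* simulated high-order page allocation failure */
    return calloc(1, size);
}

static void *toy_virt_alloc(size_t size)
{
    return calloc(1, size);  /* virtual contiguity is enough for CPU access */
}

/* Try the contiguous allocator first, then fall back. Reports which one
 * succeeded so a real caller could pick the matching free routine. */
void *toy_kvzalloc(size_t size, enum toy_src *src)
{
    void *p = toy_contig_alloc(size);

    if (p) {
        *src = TOY_CONTIG;
        return p;
    }
    p = toy_virt_alloc(size);
    *src = p ? TOY_VIRT : TOY_NONE;
    return p;
}
```

The fallback suits buffers such as driver state that the CPU accesses but the hardware never DMAs from as one physically contiguous block.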