On Thu, Mar 20, 2025 at 4:29 AM Feng, Kenneth <kenneth.f...@amd.com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Alex,
> The call trace is generated when the gdm is launched, as below.
> I tried running on a standalone workqueue but still see the workqueue is
> flushed.
I think that should be fixed by this patch:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de35994ecd2dd6148ab5a6c5050a1670a04dec77

Alex

> Thanks.
>
> [ 21.558439] ------------[ cut here ]------------
> [ 21.558443] workqueue: WQ_MEM_RECLAIM gfx_0.0.0:drm_sched_run_job_work [amd_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_gfx_profile_idle_work_handler [amdgpu]
> [ 21.558716] WARNING: CPU: 0 PID: 115 at kernel/workqueue.c:3706 check_flush_dependency+0x151/0x180
> [ 21.558724] Modules linked in: snd_seq_dummy snd_hrtimer qrtr sunrpc amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd snd_intel_sdw_acpi snd_usb_audio snd_hda_codec kvm_amd snd_usbmidi_lib snd_hda_core snd_ump mc snd_hwdep snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul snd_rawmidi polyval_clmulni polyval_generic ghash_clmulni_intel spd5118 sha256_ssse3 sha1_ssse3 snd_seq aesni_intel crypto_simd cryptd snd_seq_device snd_timer rapl wmi_bmof ccp snd i2c_piix4 k10temp i2c_smbus soundcore input_leds joydev gpio_amdpt mac_hid binfmt_misc sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) drm_exec drm_suballoc_helper amd_sched(OE) amdkcl(OE) drm_display_helper cec rc_core nvme i2c_algo_bit drm_ttm_helper crc32_pclmul r8169 xhci_pci nvme_core ahci ttm xhci_pci_renesas libahci realtek nvme_auth video wmi
> [ 21.558817] CPU: 0 UID: 0 PID: 115 Comm: kworker/u64:1 Tainted: G OE 6.11.0-17-generic #17~24.04.2-Ubuntu
> [ 21.558822] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 21.558823] Hardware name: Micro-Star International Co., Ltd. MS-7D76/MAG B650M MORTAR WIFI (MS-7D76), BIOS A.J0 12/17/2024
> [ 21.558825] Workqueue: gfx_0.0.0 drm_sched_run_job_work [amd_sched]
> [ 21.558830] RIP: 0010:check_flush_dependency+0x151/0x180
> [ 21.558833] Code: 56 18 4d 89 e0 48 8d 8b c0 00 00 00 48 c7 c7 e8 88 09 a1 c6 05 e8 4d 8d 02 01 48 8b 70 08 48 81 c6 c0 00 00 00 e8 6f 54 fd ff <0f> 0b e9 d2 fe ff ff 44 0f b6 3d ca 4d 8d 02 41 80 ff 01 77 0f 41
> [ 21.558836] RSP: 0018:ffffae930051fbe8 EFLAGS: 00010046
> [ 21.558838] RAX: 0000000000000000 RBX: ffff9abf80201400 RCX: 0000000000000000
> [ 21.558840] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 21.558842] RBP: ffffae930051fc10 R08: 0000000000000000 R09: 0000000000000000
> [ 21.558843] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0992ad0
> [ 21.558844] R13: 0000000000000000 R14: ffff9abf8030d440 R15: ffffae930051fc40
> [ 21.558846] FS: 0000000000000000(0000) GS:ffff9ace9d800000(0000) knlGS:0000000000000000
> [ 21.558848] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 21.558850] CR2: 0000073bf2b6c000 CR3: 000000004623e000 CR4: 0000000000f50ef0
> [ 21.558852] PKRU: 55555554
> [ 21.558853] Call Trace:
> [ 21.558855] <TASK>
> [ 21.558859] ? show_regs+0x6c/0x80
> [ 21.558864] ? __warn+0x88/0x140
> [ 21.558867] ? check_flush_dependency+0x151/0x180
> [ 21.558870] ? report_bug+0x182/0x1b0
> [ 21.558875] ? handle_bug+0x6e/0xb0
> [ 21.558880] ? exc_invalid_op+0x18/0x80
> [ 21.558883] ? asm_exc_invalid_op+0x1b/0x20
> [ 21.558888] ? __pfx_amdgpu_gfx_profile_idle_work_handler+0x10/0x10 [amdgpu]
> [ 21.559113] ? check_flush_dependency+0x151/0x180
> [ 21.559116] ? check_flush_dependency+0x151/0x180
> [ 21.559120] __flush_work+0x238/0x310
> [ 21.559124] ? __mod_timer+0x122/0x340
> [ 21.559129] cancel_delayed_work_sync+0x76/0x80
> [ 21.559133] amdgpu_gfx_profile_ring_begin_use+0x34/0xa0 [amdgpu]
> [ 21.559341] gfx_v12_0_ring_begin_use+0x12/0x30 [amdgpu]
> [ 21.559531] amdgpu_ring_alloc+0x40/0x70 [amdgpu]
> [ 21.559675] amdgpu_ib_schedule+0x172/0x830 [amdgpu]
> [ 21.559821] amdgpu_job_run+0x8d/0x200 [amdgpu]
> [ 21.559994] drm_sched_run_job_work+0x2bb/0x450 [amd_sched]
> [ 21.559997] process_one_work+0x178/0x3d0
> [ 21.560000] worker_thread+0x2de/0x410
> [ 21.560002] ? __pfx_worker_thread+0x10/0x10
> [ 21.560004] kthread+0xe1/0x110
> [ 21.560006] ? __pfx_kthread+0x10/0x10
> [ 21.560008] ret_from_fork+0x44/0x70
> [ 21.560010] ? __pfx_kthread+0x10/0x10
> [ 21.560012] ret_from_fork_asm+0x1a/0x30
> [ 21.560017] </TASK>
> [ 21.560017] ---[ end trace 0000000000000000 ]---
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeuc...@gmail.com>
> Sent: Wednesday, March 19, 2025 8:54 PM
> To: Feng, Kenneth <kenneth.f...@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Wang, Yang(Kevin) <kevinyang.w...@amd.com>
> Subject: Re: [PATCH] drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout"
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Wed, Mar 19, 2025 at 2:38 AM Kenneth Feng <kenneth.f...@amd.com> wrote:
> >
> > This reverts commit b00fb9765ea4b05198d67256118445c6f13f9ddf.
> >
> > Reason for revert: this causes some tests fail with call trace.
>
> Do you have a copy of the call trace? I can't see how this would be an issue?
>
> Alex
>
> >
> > Signed-off-by: Kenneth Feng <kenneth.f...@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index a6d3a4554caa..75af4f25a133 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -57,8 +57,8 @@ enum amdgpu_gfx_pipe_priority {
> >  #define AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM 0
> >  #define AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM 15
> >
> > -/* 10 millisecond timeout */
> > -#define GFX_PROFILE_IDLE_TIMEOUT msecs_to_jiffies(10)
> > +/* 1 second timeout */
> > +#define GFX_PROFILE_IDLE_TIMEOUT msecs_to_jiffies(1000)
> >
> >  enum amdgpu_gfx_partition {
> >  AMDGPU_SPX_PARTITION_MODE = 0,
> > --
> > 2.34.1
> >