On Thu, Mar 20, 2025 at 4:29 AM Feng, Kenneth <kenneth.f...@amd.com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Alex,
> The call trace below is generated when GDM is launched.
> I tried running the work on a standalone workqueue, but I still see the
> workqueue being flushed.

I think that should be fixed by this patch:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de35994ecd2dd6148ab5a6c5050a1670a04dec77

Alex

> Thanks.
>
> [   21.558439] ------------[ cut here ]------------
> [   21.558443] workqueue: WQ_MEM_RECLAIM gfx_0.0.0:drm_sched_run_job_work [amd_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_gfx_profile_idle_work_handler [amdgpu]
> [   21.558716] WARNING: CPU: 0 PID: 115 at kernel/workqueue.c:3706 check_flush_dependency+0x151/0x180
> [   21.558724] Modules linked in: snd_seq_dummy snd_hrtimer qrtr sunrpc 
> amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel 
> snd_intel_dspcfg edac_mce_amd snd_intel_sdw_acpi snd_usb_audio snd_hda_codec 
> kvm_amd snd_usbmidi_lib snd_hda_core snd_ump mc snd_hwdep snd_pcm kvm 
> snd_seq_midi snd_seq_midi_event crct10dif_pclmul snd_rawmidi polyval_clmulni 
> polyval_generic ghash_clmulni_intel spd5118 sha256_ssse3 sha1_ssse3 snd_seq 
> aesni_intel crypto_simd cryptd snd_seq_device snd_timer rapl wmi_bmof ccp snd 
> i2c_piix4 k10temp i2c_smbus soundcore input_leds joydev gpio_amdpt mac_hid 
> binfmt_misc sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink 
> dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid amdgpu(OE) 
> amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) drm_exec 
> drm_suballoc_helper amd_sched(OE) amdkcl(OE) drm_display_helper cec rc_core 
> nvme i2c_algo_bit drm_ttm_helper crc32_pclmul r8169 xhci_pci nvme_core ahci 
> ttm xhci_pci_renesas libahci realtek nvme_auth video wmi
> [   21.558817] CPU: 0 UID: 0 PID: 115 Comm: kworker/u64:1 Tainted: G           OE      6.11.0-17-generic #17~24.04.2-Ubuntu
> [   21.558822] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [   21.558823] Hardware name: Micro-Star International Co., Ltd. MS-7D76/MAG B650M MORTAR WIFI (MS-7D76), BIOS A.J0 12/17/2024
> [   21.558825] Workqueue: gfx_0.0.0 drm_sched_run_job_work [amd_sched]
> [   21.558830] RIP: 0010:check_flush_dependency+0x151/0x180
> [   21.558833] Code: 56 18 4d 89 e0 48 8d 8b c0 00 00 00 48 c7 c7 e8 88 09 a1 c6 05 e8 4d 8d 02 01 48 8b 70 08 48 81 c6 c0 00 00 00 e8 6f 54 fd ff <0f> 0b e9 d2 fe ff ff 44 0f b6 3d ca 4d 8d 02 41 80 ff 01 77 0f 41
> [   21.558836] RSP: 0018:ffffae930051fbe8 EFLAGS: 00010046
> [   21.558838] RAX: 0000000000000000 RBX: ffff9abf80201400 RCX: 0000000000000000
> [   21.558840] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [   21.558842] RBP: ffffae930051fc10 R08: 0000000000000000 R09: 0000000000000000
> [   21.558843] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0992ad0
> [   21.558844] R13: 0000000000000000 R14: ffff9abf8030d440 R15: ffffae930051fc40
> [   21.558846] FS:  0000000000000000(0000) GS:ffff9ace9d800000(0000) knlGS:0000000000000000
> [   21.558848] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   21.558850] CR2: 0000073bf2b6c000 CR3: 000000004623e000 CR4: 0000000000f50ef0
> [   21.558852] PKRU: 55555554
> [   21.558853] Call Trace:
> [   21.558855]  <TASK>
> [   21.558859]  ? show_regs+0x6c/0x80
> [   21.558864]  ? __warn+0x88/0x140
> [   21.558867]  ? check_flush_dependency+0x151/0x180
> [   21.558870]  ? report_bug+0x182/0x1b0
> [   21.558875]  ? handle_bug+0x6e/0xb0
> [   21.558880]  ? exc_invalid_op+0x18/0x80
> [   21.558883]  ? asm_exc_invalid_op+0x1b/0x20
> [   21.558888]  ? __pfx_amdgpu_gfx_profile_idle_work_handler+0x10/0x10 [amdgpu]
> [   21.559113]  ? check_flush_dependency+0x151/0x180
> [   21.559116]  ? check_flush_dependency+0x151/0x180
> [   21.559120]  __flush_work+0x238/0x310
> [   21.559124]  ? __mod_timer+0x122/0x340
> [   21.559129]  cancel_delayed_work_sync+0x76/0x80
> [   21.559133]  amdgpu_gfx_profile_ring_begin_use+0x34/0xa0 [amdgpu]
> [   21.559341]  gfx_v12_0_ring_begin_use+0x12/0x30 [amdgpu]
> [   21.559531]  amdgpu_ring_alloc+0x40/0x70 [amdgpu]
> [   21.559675]  amdgpu_ib_schedule+0x172/0x830 [amdgpu]
> [   21.559821]  amdgpu_job_run+0x8d/0x200 [amdgpu]
> [   21.559994]  drm_sched_run_job_work+0x2bb/0x450 [amd_sched]
> [   21.559997]  process_one_work+0x178/0x3d0
> [   21.560000]  worker_thread+0x2de/0x410
> [   21.560002]  ? __pfx_worker_thread+0x10/0x10
> [   21.560004]  kthread+0xe1/0x110
> [   21.560006]  ? __pfx_kthread+0x10/0x10
> [   21.560008]  ret_from_fork+0x44/0x70
> [   21.560010]  ? __pfx_kthread+0x10/0x10
> [   21.560012]  ret_from_fork_asm+0x1a/0x30
> [   21.560017]  </TASK>
> [   21.560017] ---[ end trace 0000000000000000 ]---
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeuc...@gmail.com>
> Sent: Wednesday, March 19, 2025 8:54 PM
> To: Feng, Kenneth <kenneth.f...@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Wang, Yang(Kevin) <kevinyang.w...@amd.com>
> Subject: Re: [PATCH] drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout"
>
>
> On Wed, Mar 19, 2025 at 2:38 AM Kenneth Feng <kenneth.f...@amd.com> wrote:
> >
> > This reverts commit b00fb9765ea4b05198d67256118445c6f13f9ddf.
> >
> > Reason for revert: this causes some tests to fail with a call trace.
>
> Do you have a copy of the call trace?  I can't see how this would be an issue?
>
> Alex
>
> >
> > Signed-off-by: Kenneth Feng <kenneth.f...@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index a6d3a4554caa..75af4f25a133 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -57,8 +57,8 @@ enum amdgpu_gfx_pipe_priority {
> >  #define AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM  0
> >  #define AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM  15
> >
> > -/* 10 millisecond timeout */
> > -#define GFX_PROFILE_IDLE_TIMEOUT       msecs_to_jiffies(10)
> > +/* 1 second timeout */
> > +#define GFX_PROFILE_IDLE_TIMEOUT       msecs_to_jiffies(1000)
> >
> >  enum amdgpu_gfx_partition {
> >         AMDGPU_SPX_PARTITION_MODE = 0,
> > --
> > 2.34.1
> >
