On 5/5/26 21:01, Matthew Brost wrote:
> On Tue, May 05, 2026 at 10:18:14AM +0300, Mika Penttilä wrote:
>> On 5/5/26 10:09, Alistair Popple wrote:
>>
>>> Thanks for doing this work Mika. I've been meaning to take a look at this
>>> series for a while. I'm currently at LSFMM but will try and take a look
>>> this week or next as it sounds quite useful.
>>>
>>> - Alistair
>> Thanks Alistair and no problem, appreciate your insights whenever you have
>> time.
>>
> It looks like this series is breaking Intel's CI [1]. Looks like
> something in RCU is blowing up:
>
> <4> [212.361418] ------------[ cut here ]------------
> <4> [212.361431] Voluntary context switch within RCU read-side critical section!
> <4> [212.361432] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x82/0x780, CPU#11: kworker/u65:5/2352
> <4> [212.361440] Modules linked in: snd_hda_codec_intelhdmi snd_hda_codec_hdmi mei_lb mei_gsc_proxy mtd_intel_dg mei_gsc xe drm_gpuvm drm_gpusvm_helper drm_buddy gpu_sched drm_ttm_helper ttm drm_suballoc_helper drm_exec drm_display_helper cec rc_core drm_kunit_helpers i2c_algo_bit kunit overlay intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp eeepc_wmi cmdlinepart asus_wmi binfmt_misc sparse_keymap spi_nor mei_hdcp mei_pxp mtd wmi_bmof kvm_intel kvm irqbypass aesni_intel gf128mul r8169 usbhid rapl hid intel_cstate realtek snd_hda_intel phy_package snd_intel_dspcfg intel_pmc_core snd_hda_codec idma64 nls_iso8859_1 pmt_telemetry snd_hda_core video snd_hwdep pmt_discovery snd_pcm i2c_i801 pinctrl_alderlake pmt_class snd_timer i2c_mux intel_pmc_ssram_telemetry acpi_tad acpi_pad mei_me snd i2c_smbus spi_intel_pci soundcore mei spi_intel wmi intel_vsec dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
> <4> [212.361711] CPU: 11 UID: 0 PID: 2352 Comm: kworker/u65:5 Tainted: G S U 7.1.0-rc2-lgci-xe-xe-pw-165953v1-debug+ #1 PREEMPT(lazy)
> <4> [212.361715] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
> <4> [212.361716] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
> <4> [212.361718] Workqueue: xe_page_fault_work_queue xe_pagefault_queue_work [xe]
> <4> [212.361833] RIP: 0010:rcu_note_context_switch+0x82/0x780
> <4> [212.361838] Code: 45 85 c0 74 0f 65 8b 05 24 84 ab 02 85 c0 0f 84 8d 01 00 00 45 84 ed 75 16 8b 83 bc 08 00 00 85 c0 7e 0c 48 8d 3d de ad 4d 02 <67> 48 0f b9 3a 8b 83 bc 08 00 00 85 c0 7e 0d 80 bb c0 08 00 00 00
> <4> [212.361840] RSP: 0018:ffffc9000186f4a0 EFLAGS: 00010002
> <4> [212.361843] RAX: 0000000000000001 RBX: ffff88810a3a8040 RCX: 0000000000000000
> <4> [212.361845] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff839bcea0
> <4> [212.361846] RBP: ffffc9000186f4e8 R08: 0000000000000001 R09: 0000000000000000
> <4> [212.361848] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88885f1b6a00
> <4> [212.361849] R13: 0000000000000000 R14: ffffffff83248312 R15: ffffc9000186f630
> <4> [212.361851] FS: 0000000000000000(0000) GS:ffff8888db203000(0000) knlGS:0000000000000000
> <4> [212.361853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [212.361854] CR2: 00007fe433b2f088 CR3: 000000000344a000 CR4: 0000000000f52ef0
> <4> [212.361856] PKRU: 55555554
> <4> [212.361858] Call Trace:
> <4> [212.361859] <TASK>
> <4> [212.361862] ? lock_is_held_type+0xa3/0x130
> <4> [212.361868] __schedule+0x103/0x1f70
> <4> [212.361870] ? lock_acquire+0xc4/0x300
> <4> [212.361874] ? find_held_lock+0x31/0x90
> <4> [212.361877] ? schedule+0x10e/0x180
> <4> [212.361880] ? lock_release+0xd0/0x2b0
> <4> [212.361885] schedule+0x3a/0x180
> <4> [212.361888] io_schedule+0x4c/0x80
> <4> [212.361890] ? softleaf_entry_wait_on_locked+0x147/0x2b0
> <4> [212.361894] softleaf_entry_wait_on_locked+0x24f/0x2b0
> <4> [212.361899] ? __pfx_wake_page_function+0x10/0x10
> <4> [212.361904] migration_entry_wait+0xff/0x190
> <4> [212.361909] hmm_vma_handle_pte+0x440/0x790
> <4> [212.361914] hmm_vma_walk_pmd+0x5c8/0x1360
> <4> [212.361918] ? xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362015] walk_pgd_range+0x57f/0xd70
> <4> [212.362017] ? lock_is_held_type+0xa3/0x130
> <4> [212.362028] __walk_page_range+0x8e/0x290
> <4> [212.362034] walk_page_range_mm_unsafe+0x19e/0x270
> <4> [212.362036] ? trace_hardirqs_on+0x22/0xf0
> <4> [212.362043] walk_page_range+0x2a/0x40
> <4> [212.362045] hmm_range_fault+0x94/0x190
> <4> [212.362053] drm_gpusvm_get_pages+0x269/0xa30 [drm_gpusvm_helper]
> <4> [212.362067] drm_gpusvm_range_get_pages+0x2e/0x50 [drm_gpusvm_helper]
> <4> [212.362071] __xe_svm_handle_pagefault+0x3e0/0xef0 [xe]
> <4> [212.362181] ? __lock_acquire+0x43e/0x2790
> <4> [212.362188] ? lock_is_held_type+0xa3/0x130
> <4> [212.362193] ? lock_is_held_type+0xa3/0x130
> <4> [212.362197] ? xe_vm_find_overlapping_vma+0x57/0x1e0 [xe]
> <4> [212.362304] xe_svm_handle_pagefault+0x3d/0xb0 [xe]
> <4> [212.362412] xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362509] process_one_work+0x239/0x740
> <4> [212.362518] worker_thread+0x200/0x3f0
> <4> [212.362521] ? __pfx_worker_thread+0x10/0x10
> <4> [212.362524] kthread+0x10d/0x150
> <4> [212.362527] ? __pfx_kthread+0x10/0x10
> <4> [212.362530] ret_from_fork+0x3bd/0x470
> <4> [212.362533] ? __pfx_kthread+0x10/0x10
> <4> [212.362536] ret_from_fork_asm+0x1a/0x30
> <4> [212.362546] </TASK>
> <4> [212.362547] irq event stamp: 2057044
>
> I’ll be out this Thursday for five weeks, but assuming you can sort this
> part out, I’m fine with the series moving forward. I’ve looked at this
> several times, and it seems sane enough to me.
>
> On our list we also have the Sashiko setup [2], which I’ve found to be
> incredibly helpful for series that do deep MM work. I’m not sure why
> Sashiko is saying this series didn’t apply, since it applied cleanly to
> our CI branches. If you can get Sashiko to run on it, that might be
> helpful as well.
>
> Matt

Yes there seemed to be a missing pte_unmap() before migration_entry_wait()... fixed and sent v10.

--Mika

> [1]
> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-165953v1/shard-bmg-4/igt@xe_exec_system_alloca...@process-many-stride-mmap-race-nomemset.html
> [2]
> https://sashiko.dev/#/patchset/20260505051658.2219537-1-mpenttil%40redhat.com
