Hi,
On 9/1/26 17:22, Matthew Brost wrote:
On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
Hi,
On 9/1/26 11:31, Matthew Brost wrote:
On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
Hi,
On 8/1/26 16:42, Jordan Niethe wrote:
Hi,
On 8/1/26 13:25, Jordan Niethe wrote:
Hi,
On 8/1/26 05:36, Matthew Brost wrote:
Thanks for the series. For some reason Intel's CI couldn't apply this
series to drm-tip to get results [1]. I'll manually apply it, run all
our SVM tests, get back to you with the results, and review the changes
here. For future reference, if you want to use our CI system the series
must apply to drm-tip; feel free to rebase this series and just send it
to the intel-xe list if you want CI results.
Thanks, I'll rebase on drm-tip and send to the intel-xe list.
For reference, the rebase on drm-tip is on the intel-xe list:
https://patchwork.freedesktop.org/series/159738/
I'll watch the CI results.
The series causes some failures in the intel-xe tests:
https://patchwork.freedesktop.org/series/159738/#rev4
Working through the failures now.
Yea, I saw the failures. I haven't had time to look at the patches on
my end quite yet. Scrambling to get a few things into the 6.20/7.0 PR,
so I may not have bandwidth to look in depth until mid next week, but
digging in is on my TODO list.
Sure, that's completely fine. The failures seem pretty directly related
to the series, so I think I'll be able to make good progress.
For example:
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@[email protected]
It looks like I missed that xe_pagemap_destroy_work() needs to be
updated to remove the call to devm_release_mem_region() now that we are
no longer reserving a mem region.
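Roughly what I have in mind (a sketch from memory against this series;
the struct and field names here are approximate, not exact):

	static void xe_pagemap_destroy_work(struct work_struct *work)
	{
		/* Struct/field names approximate, per this series. */
		struct xe_pagemap *xpagemap =
			container_of(work, struct xe_pagemap, destroy_work);

		/*
		 * The devm_release_mem_region() call that used to live
		 * here goes away: memremap_device_private_pagemap() no
		 * longer reserves a mem region, so there is no resource
		 * left to release.
		 */

		/* ... rest of the teardown unchanged ... */
	}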
+1
So this is the one I'd be most concerned about [1].
xe_exec_system_allocator is our SVM test, which does almost all the
ridiculous things possible in user space to stress SVM. It's blowing up
in the core MM, but the source of the bug could be anywhere (e.g., Xe
SVM, GPU SVM, migrate device layer, or core MM). I'll try to help when
I have bandwidth.
Matt
[1]
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_alloca...@threads-many-large-execqueues-free-nomemset.html
A similar fault in lruvec_stat_mod_folio() can be repro'd if
memremap_device_private_pagemap() is called with NUMA_NO_NODE instead
of (say) numa_node_id() for the nid parameter.
The xe_svm driver uses devm_memremap_device_private_pagemap(), which
uses dev_to_node() for the nid parameter. I suspect this is causing
something similar to happen.
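i.e. only the nid argument matters here (other arguments elided, as I'm
quoting the call shape from memory):

	/* Faults later in lruvec_stat_mod_folio() (sketch): */
	memremap_device_private_pagemap(..., NUMA_NO_NODE);

	/* Works (sketch): */
	memremap_device_private_pagemap(..., numa_node_id());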
When memremap_pages() calls pagemap_range(), we have the following
logic:

	if (nid < 0)
		nid = numa_mem_id();

I think we might need to add this to memremap_device_private_pagemap()
to handle the NUMA_NO_NODE case. Still confirming.
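Something along these lines (sketch only; I've abbreviated the argument
list since I don't have the exact signature in front of me):

	/* Sketch: argument list abbreviated, not the exact signature. */
	int memremap_device_private_pagemap(struct dev_pagemap *pgmap, int nid)
	{
		/*
		 * Mirror pagemap_range(): callers can legitimately pass
		 * NUMA_NO_NODE (e.g. dev_to_node() on a device without
		 * NUMA affinity), so fall back to the local memory node
		 * before nid is used for any node-indexed allocation.
		 */
		if (nid < 0)
			nid = numa_mem_id();

		/* ... rest of the function unchanged ... */
		return 0;
	}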
Thanks,
Jordan.
I was also wondering if Nvidia could help review one of our core MM
patches [2], which is gating the enabling of 2M device pages?
Matt
[1] https://patchwork.freedesktop.org/series/159738/
[2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1