On Fri, 17 May 2024 11:14:41 +0100 Jonathan Cameron <jonathan.came...@huawei.com> wrote:
> On Fri, 17 May 2024 18:07:07 +0800
> Yuquan Wang <wangyuquan1...@phytium.com.cn> wrote:
>
> > On Fri, May 10, 2024 at 06:16:46PM +0100, Jonathan Cameron wrote:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jic23/cxl-staging.git/log/?h=arm-numa-fixes
> > >
> > Thank you :)
> >
> > > I've run out of time to sort out cover letters and things + just before
> > > the merge window is never a good time to get anyone to pay attention to
> > > potentially controversial patches. So for now I've thrown up a branch on
> > > kernel.org with Robert's series of fixes of related code (that's queued
> > > in the ACPI tree for the merge window) and Dan Williams' patches (from
> > > several years ago) + my additions that 'work' (lightly tested) on
> > > qemu/arm64 with the generic port patches etc.
> > >
> > > I'll send out an RFC in a couple of weeks. In the meantime let me know
> > > if you run into any problems or have suggestions to improve them.
> > >
> > > Jonathan
> > >
> > With the latest commit (d077bf9) in 'arm-numa-fixes', the qemu virt
> > machine can create a cxl region with a new numa node (node 2) just like
> > x86. At this stage (the first time the cxl region is created), everything
> > works fine.
> >
> > However, if I use the commands below to delete the created cxl region:
> >
> > `daxctl offline-memory dax0.0`
> > `cxl disable-region region0`
> > `cxl destroy-region region0`
> >
> > and then recreate it with `cxl create-region -d decoder0.0 -t ram`, the
> > kernel cannot create numa node 2 again, and it prints:
> >
> > [  589.458971] Fallback order for Node 0: 0 1
> > [  589.459136] Fallback order for Node 1: 1 0
> > [  589.459175] Fallback order for Node 2: 0 1
> > [  589.459213] Built 2 zonelists, mobility grouping on.  Total pages: 1009890
> > [  589.459284] Policy zone: Normal
>
> I'll see if I can figure out what is happening there.

So I know what is happening, but I'm not sure of the solution yet.

The issue is that on unbind of the region there is a call to
try_remove_memory(), which calls memblock_phys_free(). That removes the
reserved memblocks being used for tracking the numa node, so when you bind
a region at that HPA again, there is no tracking information.

So far I haven't figured out why that call is there in the first place,
which isn't helping me solve this.
https://elixir.bootlin.com/linux/v6.9.1/source/mm/memory_hotplug.c#L2286
(A rough sketch of that call site is appended at the end of this mail.)

Until I get this code out there it is kind of hard to ask the mm folk -
for now I may just have to say it only works once and point at that line
as the problem in an RFC.

Long shot, but Dan, did you run into this when you were doing your
[PATCH v2 08/22] memblock: Introduce a generic phys_addr_to_target_node()
stuff? I assume that ultimately called try_remove_memory() in a remove
path somewhere and, similarly to this, if you tried putting it back it
would be missing. Or alternatively, any idea what that memblock_phys_free()
call is balancing?

Jonathan

> > Meanwhile, qemu reports that:
> >
> > "qemu-system-aarch64: virtio: bogus descriptor or out of resources"
>
> That sounds like another TCG issue, or possibly the DMA bounce buffer
> problem resurfacing. It's not directly related to this NUMA aspect unless
> something very odd is going on. I'm even more confused because I think
> you are not using kmem with the above commands, so we shouldn't be using
> the CXL memory for virtio.
>
> Just to check, you aren't running with KVM I hope? That opens a much
> bigger problem set. :(
>
> Jonathan
>
> > Many thanks
> > Yuquan
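
P.S. For anyone following along, this is a rough sketch of the tail of
try_remove_memory() in mm/memory_hotplug.c that the elixir link above
points at - reconstructed from memory rather than copied verbatim, so
treat the surrounding lines as approximate; the memblock_phys_free() /
memblock_remove() pair under CONFIG_ARCH_KEEP_MEMBLOCK is the part that
matters here:

    static int __ref try_remove_memory(u64 start, u64 size)
    {
        ...
        /* memory block devices / direct map teardown happens above */

        if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
            /*
             * The calls in question: free the reserved memblock that was
             * tracking the hot-added range and drop the range from
             * memblock entirely. This is what loses the NUMA node
             * association for the HPA range, so a second bind at the
             * same HPA has nothing left to look up.
             */
            memblock_phys_free(start, size);
            memblock_remove(start, size);
        }

        release_mem_region_adjustable(start, size);

        if (nid != NUMA_NO_NODE)
            try_offline_node(nid);

        mem_hotplug_done();
        return 0;
    }

So after a destroy-region / create-region cycle there is no reserved
memblock left for that HPA range, which matches the "only works once"
behaviour described above.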