On Tue, Jan 09, 2024 at 11:36:03AM -0800, Dan Williams wrote:
> Jason Gunthorpe wrote:
> > On Tue, Jan 09, 2024 at 06:02:03PM +0100, David Hildenbrand wrote:
> > > > Given that, an alternative proposal that I think would work
> > > > for you would be to add a 'placeholder' memory node definition
> > > > in SRAT (so allow 0 size explicitly - might need a new SRAT
> > > > entry to avoid backwards compat issues).
> > >
> > > Putting all the PCI/GI/... complexity aside, I'll just raise again
> > > that for virtio-mem something simple like that might be helpful as
> > > well, IIUC.
> > >
> > >     -numa node,nodeid=2 \
> > >     ...
> > >     -device virtio-mem-pci,node=2,... \
> > >
> > > All we need is the OS to prepare for an empty node that will get
> > > populated with memory later.
> >
> > That is all this is doing too, the NUMA relationship of the actual
> > memory is described already by the PCI device since it is a BAR on the
> > device.
> >
> > The only purpose is to get the empty nodes into Linux :(
> >
> > > So if that's what a "placeholder" node definition in srat could
> > > achieve as well, even without all of the other
> > > acpi-generic-initiator stuff, that would be great.
> >
> > Seems like there are two quite similar use cases.. virtio-mem is going
> > to be calling the same family of kernel API I suspect :)
>
> It seems sad that we, as an industry, went through all of this trouble
> to define a dynamically enumerable CXL device model only to turn around
> and require static ACPI tables to tell us how to enumerate it.
>
> A similar problem exists on the memory target side and the approach
> taken there was to have Linux statically reserve at least enough numa
> node numbers for all the platform CXL memory ranges (defined in the
> ACPI.CEDT.CFMWS), but with the promise to come back and broach the
> dynamic node creation problem "if the need arises".
>
> This initiator-node enumeration case seems like that occasion where the
> need has arisen to get Linux out of the mode of needing to declare all
> possible numa nodes early in boot. Allow for nodes to be discoverable
> post NUMA-init.
>
> One strawman scheme that comes to mind is instead of "add nodes early" in
> boot, "delete unused nodes late" in boot after the device topology has
> been enumerated. Otherwise, requiring static ACPI tables to further
> enumerate an industry-standard dynamically enumerated bus seems to be
> going in the wrong direction.

Fully agree, and I think this will get increasingly painful as we go
down the CXL road.

Jason