On Wed, Jan 10, 2024 at 03:19:05PM -0800, Dan Williams wrote:
> David Hildenbrand wrote:
> > On 09.01.24 17:52, Jonathan Cameron wrote:
> > > On Thu, 4 Jan 2024 10:39:41 -0700
> > > Alex Williamson <alex.william...@redhat.com> wrote:
> > >
> > >> On Thu, 4 Jan 2024 16:40:39 +0000
> > >> Ankit Agrawal <ank...@nvidia.com> wrote:
> > >>
> > >>> Had a discussion with RH folks, summary follows:
> > >>>
> > >>> 1. To align with the current spec description pointed out by
> > >>>    Jonathan, we first do a separate object instance per GI node as
> > >>>    suggested by Jonathan, i.e. an acpi-generic-initiator would only
> > >>>    link one node to the device. To associate a set of nodes, that
> > >>>    number of object instances should be created.
> > >>> 2. In parallel, we work to get the spec updated. After the update,
> > >>>    we switch to the current implementation to link a PCI device
> > >>>    with a set of NUMA nodes.
> > >>>
> > >>> Alex/Jonathan, does this sound fine?
> > >>>
> > >>
> > >> Yes, as I understand Jonathan's comments, the acpi-generic-initiator
> > >> object should currently define a single device:node relationship to
> > >> match the ACPI definition.
> > >
> > > Doesn't matter for this, but it's a many_device:single_node
> > > relationship as currently defined. We should be able to support that
> > > in any new interfaces for QEMU.
> > >
> > >> Separately, a clarification of the spec could be pursued that could
> > >> allow us to reinstate a node list option for the
> > >> acpi-generic-initiator object. In the interim, a user can define
> > >> multiple 1:1 objects to create the 1:N relationship that's
> > >> ultimately required here. Thanks,
> > >
> > > Yes, a spec clarification would work; it probably needs some text to
> > > say a GI might not be an initiator as well. My worry is theoretical
> > > backwards compatibility with a (probably nonexistent) OS that
> > > assumes the N:1 mapping.
> > > So you may be in new SRAT entry territory.
> > >
> > > Given that, an alternative proposal that I think would work for you
> > > would be to add a 'placeholder' memory node definition in SRAT
> > > (i.e. explicitly allow a size of 0 - though that might need a new
> > > SRAT entry type to avoid backwards compatibility issues).
> >
> > Putting all the PCI/GI/... complexity aside, I'll just raise again
> > that for virtio-mem something simple like that might be helpful as
> > well, IIUC:
> >
> > -numa node,nodeid=2 \
> > ...
> > -device virtio-mem-pci,node=2,... \
> >
> > All we need is for the OS to prepare for an empty node that will get
> > populated with memory later.
> >
> > So if that's what a "placeholder" node definition in SRAT could
> > achieve as well, even without all of the other acpi-generic-initiator
> > stuff, that would be great.
>
> Please, no "placeholder" definitions in SRAT. One of the main thrusts
> of CXL is to move away from static ACPI tables describing
> vendor-specific memory topology, towards an industry-standard device
> enumeration.
>
> Platform firmware enumerates the platform CXL "windows" (ACPI CEDT
> CFMWS) and the relative performance of CPU access to a CXL port (ACPI
> HMAT Generic Port); everything else is CXL standard enumeration.
I assume memory topology and so on applies as well, right? E.g. PMTT etc.
Just making sure.

> It is strictly OS policy how many NUMA nodes it imagines it wants to
> define within that playground. The current OS policy is one node per
> "window". If a solution believes Linux should be creating more than
> that, I submit that's a discussion with OS policy developers, not a
> trip to the BIOS team to please sprinkle in more placeholders. Linux
> can fully own the policy here. The painful bit is just that it never
> had to before.
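
For anyone following along, the interim 1:1 approach Alex describes
above would presumably look something like the sketch below on the QEMU
command line. This is illustrative only: the acpi-generic-initiator
object and its pci-dev/node properties follow the series under
discussion and may change before merge, and the device id "dev0", the
host BDF, and the node IDs are made-up values.

```shell
# Sketch: one acpi-generic-initiator object per NUMA node, all
# pointing at the same PCI device, to get the 1:N device:node
# association out of 1:1 objects. All ids/addresses are examples.
qemu-system-aarch64 ... \
  -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
  -device vfio-pci,host=0009:01:00.0,id=dev0 \
  -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
  -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
  -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4
```

Each -object line would then emit one SRAT Generic Initiator Affinity
Structure tying the same device handle to a different proximity domain,
which matches the single device:node relationship per entry that the
spec currently describes.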