Gregory Price wrote:
> On Fri, Jan 20, 2023 at 09:38:13AM -0800, Dan Williams wrote:
> > As it stands currently that dax device and the cxl device are not
> > related since a default dax-device is loaded just based on the presence
> > of an EFI_MEMORY_SP address range in the address map. With the new ram
> > enabling that default device will be elided and CXL will register a
> > dax-device parented by a cxl region.
> >
> > > - The memory *does not* auto-online, instead the dax device can be
> > > onlined as system-ram *manually* via ndctl and friends
> >
> > That *manually* part is the problem that needs distro help to solve. It
> > should be the case that by default all Linux distributions auto-online
> > all dax-devices. If that happens to online memory that is too slow for
> > general use, or too high-performance / precious for general purpose use
> > then the administrator can set policy after the fact. Unfortunately user
> > policy can not be applied if these memory ranges were onlined by the
> > kernel at boot, so that's why the kernel policy defaults to not-online.
> >
> > In other words, there is no guarantee that memory that was assigned to
> > the general purpose pool at boot can be removed. The only guaranteed
> > behavior is to never give the memory to the core kernel in the first
> > instance and always let user policy route the memory.
> >
> > > 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
> > > of the type-3 device configuration (pmem-only or vmem-only)
> >
> > Correct, the top-level bus code (cxl_acpi) and the endpoint code
> > (cxl_mem, cxl_port) need to handshake before establishing regions. For
> > pmem regions the platform needs to claim the availability of a pmem
> > capable CXL window.
> >
> > > 4) As you can see above, multiple decoders are registered. I'm not sure
> > > if that's correct or not, but it does seem odd given there's only one
> > > cxl type-3 device. Odd that decoder0.0 shows up when CFMW is there,
> > > but not when it isn't.
> >
> > CXL windows are modeled as decoders hanging off the CXL root device
> > (ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
> > selection of that window.
> >
> > > Don't know why I haven't thought of this until now, but is the CFMW code
> > > reporting something odd about what's behind it? Is it assuming the
> > > devices are pmem?
> >
> > No, the cxl_acpi code is just advertising platform decode possibilities
> > independent of what devices show up. Think of this like the PCI MMIO
> > space that gets allocated to a root bridge at the beginning of time.
> > That space may or may not get consumed based on what devices show up
> > downstream.
>
> Thank you for the explanation Dan, and thank you for your patience
> @JCameron. I'm fairly sure I grok it now.
>
> Summarizing to make sure: the cxl driver is providing what would be the
> CXL.io (control) path, and the CXL.mem path is basically being simulated
> by what otherwise would be a traditional PCI memory region. This explains
> why turning off Legacy mode drops the dax devices, and why the topology
> looks strange - the devices are basically attached in 2 different ways.
>
> Might there be interest from the QEMU community to implement this
> legacy-style setup in the short term, in an effort to test the
> control path of type-3 devices while we wait for the kernel to catch up?
>
> Or should we forget this mode ever existed and just barrel forward
> with HDM decoders and writing the kernel code to hook up the underlying
> devices in drivers/cxl?
Which mode are you referring to? The next steps for the kernel enabling
relevant to this thread are:

* ram region discovery (platform firmware or kexec established)
* ram region creation
* pmem region discovery (from labels)
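For readers following along, here is a toy Python model of the window/decoder relationship Dan describes: a CXL window (CFMW) is advertised as a root decoder whether or not any device uses it, and an endpoint decoder maps a slice of it, much like an MMIO aperture handed to a root bridge. This is only a sketch under stated assumptions, not the kernel's data structures. `RootDecoder`, `EndpointDecoder`, `allocate()` and the `caps` set are hypothetical names. The `"ram"`/`"pmem"` flags loosely stand in for the CFMWS window-restriction bits, and the simple bump allocator is a simplification of real region assembly.

```python
from dataclasses import dataclass, field

# Hypothetical model: a root decoder stands for one platform CXL window (CFMW).
# It advertises an address range and what it may be used for, regardless of
# which devices appear downstream (like an MMIO aperture on a root bridge).
@dataclass
class RootDecoder:
    name: str
    base: int
    size: int
    caps: frozenset  # hypothetical capability flags, e.g. {"ram", "pmem"}
    _next: int = field(default=0, init=False)  # offset of the next free byte

    def allocate(self, size: int, mode: str) -> "EndpointDecoder":
        """Carve a sub-range of this window for an endpoint (bump allocator)."""
        if mode not in self.caps:
            # e.g. a pmem region needs a pmem-capable window
            raise ValueError(f"{self.name} is not {mode}-capable")
        if self._next + size > self.size:
            raise ValueError(f"{self.name} has insufficient free space")
        start = self.base + self._next
        self._next += size
        return EndpointDecoder(parent=self.name, start=start, size=size, mode=mode)


@dataclass
class EndpointDecoder:
    parent: str  # name of the root decoder (window) this range comes from
    start: int
    size: int
    mode: str


GiB = 1 << 30

# One window, one type-3 device: the window exists even if nothing maps it.
window = RootDecoder("decoder0.0", base=0x10_0000_0000, size=4 * GiB,
                     caps=frozenset({"ram"}))
ep = window.allocate(1 * GiB, "ram")
print(hex(ep.start), ep.size // GiB)  # 0x1000000000 1

try:
    window.allocate(1 * GiB, "pmem")
except ValueError as err:
    print(err)  # decoder0.0 is not pmem-capable
```

The ordering is the point of the model: the root decoder appears as soon as the platform describes the window, and whether any of it is consumed depends entirely on which endpoints show up later.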