Smita Koralahalli wrote:
> This series introduces the ability to manage SOFT RESERVED iomem
> resources, enabling the CXL driver to remove any portions that
> intersect with created CXL regions.
>
> The current approach of leaving SOFT RESERVED entries as is can result
> in failures during device hotplug such as CXL because the address range
> remains reserved and unavailable for reuse even after region teardown.

I will go through the patches, but the main concern here is not
hotplug, it is region assembly failure. We have a constant drip of
surprising platform behaviors that trip up the driver and leave memory
stranded.

Specifically, device-dax defers to CXL to assemble the region
representing the soft-reserve range, CXL fails to complete that
assembly because it is confused by the platform, and the end user
wonders why their platform BIOS sees memory capacity that Linux does
not see.

So the priority order of solutions needed here is:

1/ Fix all shipping platform "quirks", and try to prevent new ones from
   being created. I.e. ideally, long term, Linux does not need a
   soft-reserve fallback and just always ignores Soft Reserve in CXL
   Windows because the CXL subsystem will handle it.

2/ In the near term, for all yet to be solved or yet to be discovered
   platform quirks, provide a device-dax fallback to recover baseline
   device-dax behavior (equivalent to putting cxl_acpi on a modprobe
   deny-list).

3/ For hotplug, remove the conflicting resource.

> To address this, the CXL driver now uses a background worker that waits
> for cxl_mem driver probe to complete before scanning for intersecting
> resources. Then the driver walks through created CXL regions to trim any
> intersections with SOFT RESERVED resources in the iomem tree.

The precision of this gives me pause. I think it is fine to make this
more coarse because any mismatch between Soft Reserve and a CXL Window
resource should be cause to give up on the CXL side.

If a Soft Reserve range straddles a CXL window and "System RAM", give
up on trying to use the CXL driver on that system.

If CXL does not completely cover a soft-reserve region, give up on
trying to use the CXL driver on that system.

Effectively, any time we detect unexpected platform shenanigans it is
likely an indication of missing understanding in the Linux driver.

> The following scenarios have been tested:

Nice! Appreciate you including the test case results.

[..]

> Example 3: No alignment
> |---------- "Soft Reserved" ----------|
>         |---- "Region #" ----|

Per above, the CXL subsystem should completely give up in this
scenario. The BIOS said that all of the range is Conventional memory
and CXL is only creating a region for part of it. Somebody is wrong.
Given that non-CXL-aware OSes would try to use the entirety of the
Soft Reserved range, this scenario is "disable CXL, it clearly does
not understand this platform".
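
To make the coarse policy concrete, here is a rough user-space sketch of
the check I have in mind. It is not kernel code and not a proposed
patch: struct span, enum sr_verdict, and classify() are hypothetical
names made up for illustration, and it assumes the region list is sorted
by start address and non-overlapping. It also only checks that the
Soft Reserve range is fully covered; whether a region that extends
beyond the Soft Reserve range should also count as a mismatch is left
open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range (hypothetical helper type). */
struct span {
	uint64_t start;
	uint64_t end;
};

enum sr_verdict {
	SR_UNRELATED,	/* Soft Reserve does not touch any CXL window */
	SR_CXL_OWNS,	/* inside one window and fully covered by regions */
	SR_GIVE_UP,	/* any mismatch: fall back to plain device-dax */
};

static bool overlaps(struct span a, struct span b)
{
	return a.start <= b.end && b.start <= a.end;
}

static bool contains(struct span outer, struct span inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}

/* Assumes regions[] is sorted by start and non-overlapping. */
static bool regions_cover(struct span soft, const struct span *regions,
			  size_t nreg)
{
	uint64_t next = soft.start;	/* first address not yet covered */

	for (size_t i = 0; i < nreg; i++) {
		if (!overlaps(regions[i], soft))
			continue;
		if (regions[i].start > next)
			return false;	/* hole in front of this region */
		if (regions[i].end >= soft.end)
			return true;	/* reached the end of the range */
		next = regions[i].end + 1;
	}
	return false;			/* ran out of regions too early */
}

static enum sr_verdict classify(struct span soft,
				const struct span *windows, size_t nwin,
				const struct span *regions, size_t nreg)
{
	bool touched = false;

	assert(soft.start <= soft.end);

	for (size_t i = 0; i < nwin; i++) {
		if (!overlaps(windows[i], soft))
			continue;
		touched = true;
		/* Straddles a window boundary, e.g. into "System RAM". */
		if (!contains(windows[i], soft))
			return SR_GIVE_UP;
	}
	if (!touched)
		return SR_UNRELATED;

	return regions_cover(soft, regions, nreg) ? SR_CXL_OWNS : SR_GIVE_UP;
}
```

With this shape, Example 3 falls out as SR_GIVE_UP because the single
region starts after the start of the Soft Reserve range, and there is no
per-intersection trimming to get wrong.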