On Sat, Mar 21, 2026 at 10:40:21AM -0700, Andrew Morton wrote:
> On Sat, 21 Mar 2026 11:03:56 -0400 Gregory Price <[email protected]> wrote:
>
> > The dax kmem driver currently onlines memory during probe using the
> > system default policy, with no way to control or query the region state
> > at runtime - other than by inspecting the state of individual blocks.
> >
> > Offlining and removing an entire region requires operating on individual
> > memory blocks, creating race conditions where external entities can
> > interfere between the offline and remove steps.
> >
> > The problem was discussed specifically in the LPC2025 device memory
> > sessions - https://lpc.events/event/19/contributions/2016/ - where
> > it was discussed how the non-atomic interface for dax hotplug is causing
> > issues in some distributions which have competing userland controllers
> > that interfere with each other.
> >
> > This series adds a sysfs "hotplug" attribute for atomic whole-device
> > hotplug control, along with the mm and dax plumbing to support it.
>
> AI review (which hasn't completed at this time) has a lot to say:
>
> https://sashiko.dev/#/patchset/[email protected]

Looking at the results - I mucked up a UAF during the rebase that I didn't
catch during testing. Will clean that up.

I also just realized I left an extern in one of the patches that I thought
I had removed.

So I owe a respin on this in more ways than one.

But on the AI review comment for non-trivial stuff

---

Much of the remaining commentary is about either the pre-existing code
race conditions, or design questions in the space of that race condition.

Specifically: userland can still try to twiddle the memoryN/state bits
while the dax device loops over non-contiguous regions.

I dropped this commit:

https://lore.kernel.org/all/[email protected]/

from the series, because the feedback here:

https://lore.kernel.org/linux-mm/[email protected]/

suggested that offline_and_remove_memory() would resolve the race
condition problem - but the patch proposed actually solved two issues:

1) Inconsistent hotplug state issue
   (user is still using the old per-block offlining pattern)

2) The old offline pattern calling BUG() instead of WARN() when trying
   to unbind while things are still online.

But this goes to the issue of:

If the race condition in userland has been around for many years, is it
to be considered a feature we should not break - or on what time scale
should we consider breaking it?

I don't know the answer, David will have to weigh in on that.

~Gregory

