On Tue, 3 Mar 2026 10:19:52 +0000 Lorenzo Stoakes <[email protected]> wrote:
> > > Setting VM_IO just to trigger a failure case in madvise() feels like a > > > hack? I > > > guess it'd do the trick though, but you're not going to be able to > > > reclaim that > > > memory, and you might get some unexpected behaviour in code paths that > > > assume > > > VM_IO means it's memory-mapped I/O... (for instance GUP will stop > > > working, if > > > you need that). > > > > Well, we don't reclaim that memory anyway. > > OK so maybe not such an issue then! As long as GUP not working with it > wouldn't > break anything? Yeah, we don't ever use get_user_page() for this memory. It's pretty much all handled on the kernel side. User space just has it mapped as read only. The kernel doesn't care what user space does with it. > > Yeah, right now the accounting gets screwed up as the mappings get out of > > sync when it is forked. > > Ack, is that in a way that could screw up things from a kernel point of view > at > all? Presumably there was some report that triggered this work, like an assert > fail or something, or a warning? Yes, it triggered a warning. The start of this thread added a patch to allow madvise to remove DONTCOPY from this memory without screwing things up (by adding an open() handler for the vm_operations_struct). > > If a user is explicitly going to an ftrace buffer and madvise()'ing it in > random > ways they've only themselves to blame for being so stupid :) > > As long as it's not exploitable in some way I don't think that's too much of > an > issue? > > It'd be nice to keep the semantics of 'don't copy on fork' if we could, even > if > some crazy users might override it with madvise(). OK, so should we add the VM_IO flag? > > Kind of separate from the question, but I mean if it's kernel-allocated memory > which you're managing separately it should surely be VM_PFNMAP? OK, I'm a bit confused. What do you mean "managing separately"? You mean managing the user space side of things? if so then yes. Hmm, I haven't heard of VM_PFNMAP, can you explain more its use case. > > It depends if you want to have a refcounted folio underneath though. If you do > then yeah don't do that :) I have no idea what a refcounted folio would do here :-p > > In general I'd suggest supporting the fork case if you can, or just let things > be wrong if a user does something crazy and undoes the VM_DONTCOPY flag. OK, so don't add these flags and just allow the forking to happen as this patch does (if they screw it up, it's their problem). This patch is here: https://lore.kernel.org/all/[email protected]/ I mean, we allow two separate tasks to mmap the same buffer, and they will have the same issues as a fork would have. Thus, I guess the answer is to apply this patch? -- Steve
