On Sat, May 16, 2026 at 07:21:26PM +0100, Pedro Falcato wrote:
> Since the advent of vulns like Dirty Pipe, Dirty Frag, Copy Fail
> and Fragnasia, splicing a read-only file is fundamentally unsafe.
>
> As such, as a mitigation, add a way for users to block splice() for
> files they cannot write to. This eliminates this whole class of exploits
> that use splice()+confusion in pipe/net/etc code to gain write-access to
> files they can only read.
>
> Users can simply toggle fs.splice_needs_write=1 and suddenly splice() will
> refuse perfectly legal splice() calls from files they can only read, but not write.
>
> For vmsplice(), make do with the address_space attached to the folio. Care
> is taken to make sure the operation isn't too slowed down with locks. The check
> itself isn't entirely equivalent (the mapping's host can be the internal bdev
> inode, etc., and not the one in /dev against which permissions are checked),
> but doing it in a more correct way would require dropping from GUP-fast to
> GUP, and that would be too slow.
>
> Signed-off-by: Pedro Falcato <[email protected]>
> ---
>
> Hello,
>
> sending this out as an RFC so I can get better opinions from VFS & security
> folks upstream. I wrote this out as a way to harden against all the page
> cache attacks we've seen lately, that bottom out to splice() from a file
> they cannot write + confusion elsewhere on the net stack/pipes/etc.
>
> This is _obviously_ not perfect and not complete. My first (unsent) version
> straight up returned -EPERM on splice() for these files. This one attempts
> to retain some compatibility by only blocking the page splicing operation,
> but still issuing the operation with normal copies (kindly suggested by Jan).
> vmsplice() is a complicated issue, because gup_fast does not allow us access
> to the VMA's vm_file. I tried hacking around it but it's not perfect (e.g. you
> cannot grab the mnt_idmap for the file, since we only have access to the
> address_space + its host).
>
> I'm also not a fan of having somewhat hairy MM code in the middle of
> fs/splice.c but that's something we can simply hoist elsewhere as this gets
> un-RFC'd. It's also missing the external-facing docs for the sysctl.
>
> My big questions are:
> 1) Is this a viable way forward?
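For readers outside VFS, the primitive under discussion is an ordinary, unprivileged splice() from a file opened read-only into a pipe. The sketch below assumes Linux and glibc (`_GNU_SOURCE` for the splice() prototype); `splice_readonly_into_pipe()` is a hypothetical helper name, not code from the patch. On current kernels the call succeeds without any write permission on the file, and for regular files on most filesystems the pipe buffers end up referencing the file's page-cache folios rather than private copies. Per the cover letter, with the RFC's fs.splice_needs_write=1 the intent is that such a call is served with normal copies instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path read-only and splice up to len bytes of it
 * into the write end of a pipe. No write permission on the file is needed.
 * For regular files on most filesystems the kernel fills the pipe with
 * references to page-cache folios instead of copying the data.
 */
static ssize_t splice_readonly_into_pipe(const char *path, int pipe_wr, size_t len)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    ssize_t n = splice(fd, NULL, pipe_wr, NULL, len, 0);
    int saved_errno = errno;

    /* The pipe buffers hold their own page references, so the fd can go. */
    close(fd);
    errno = saved_errno;
    return n;
}
```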
I think that splice() and vmsplice() are pretty wonky APIs. Ignoring its
recent prominent role in page cache attacks, it suffers from weird issues
due to its interactions with pipe_lock(). For example:

- Bug with splice to a pipe preventing a process exit
  [email protected]
  Sendfile holding pipe->mutex blocks the peer's pipe_release() from
  do_exit().

- Change in splice() behaviour after 5.10? (LTP splice07)
  [email protected]

- [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion
  [email protected]
  Pending splice from tty/socket/FIFO holds pipe->mutex indefinitely,
  blocking all other FIFO ops incl. read(O_NONBLOCK).

- splice: prevent deadlock when splicing a file to itself
  [email protected]
  do_splice_direct_actor() still lacks a file_inode(in) == file_inode(out)
  guard.

- AF_UNIX/zerocopy/pipe/vmsplice/splice vs FOLL_PIN
  [email protected]
  vmsplice/splice into AF_UNIX/pipe doesn't FOLL_PIN the source memory.

My main gripe with the patch as written is that I find it really hard to
figure out who would deploy this. It half-cripples splice() and vmsplice()
for some use-cases but leaves them intact for others. At that point you
could just as well make splice() and vmsplice() fail with ENOSYS via
seccomp and force a fallback to the non-splice codepaths that userspace
has to have anyway, as splice() isn't supported unconditionally.

It feels like a knee-jerk reaction to an exploit class originating in
buggy modules that we have little control over, and we would be extending
to users an API that is really difficult to use.

What might make more sense is to add a splice-specific security_*() hook
so that an LSM can deny usage of splice in whatever way it wants to, be it
a BPF LSM or an in-tree LSM. Then we don't have to have all this gunk in
the VFS layer that will be annoying to maintain, with little long-term
value.

So I'm not very likely to pick this up as is.
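The seccomp route mentioned in the reply needs nothing new in the kernel. A minimal sketch, assuming Linux on x86_64 or aarch64 with the uapi headers installed; `deny_splice_enosys()`, `copy_rw()` and `file_to_pipe()` are hypothetical names for illustration only. A classic-BPF filter makes splice() and vmsplice() fail with ENOSYS while allowing every other syscall, and the copy routine treats ENOSYS (or EINVAL, which splice() returns when the fds don't support splicing) as the signal to use plain read()/write().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Install a filter: splice()/vmsplice() -> ENOSYS, everything else allowed. */
static int deny_splice_enosys(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the audit arch differs from the build arch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Match the syscall number against splice and vmsplice. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_splice, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_vmsplice, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Plain read()/write() copy of up to len bytes: the path userspace keeps anyway. */
static ssize_t copy_rw(int in, int out, size_t len)
{
    char buf[4096];
    size_t total = 0;

    while (total < len) {
        size_t want = len - total < sizeof(buf) ? len - total : sizeof(buf);
        ssize_t n = read(in, buf, want);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* EOF */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/*
 * Move up to len bytes from in to the pipe write end pipe_wr. Try splice()
 * first; if it is blocked (ENOSYS) or unsupported for these fds (EINVAL),
 * fall back to copying. *spliced reports whether splice() did the work.
 */
static ssize_t file_to_pipe(int in, int pipe_wr, size_t len, int *spliced)
{
    ssize_t n = splice(in, NULL, pipe_wr, NULL, len, 0);
    if (n >= 0 || (errno != ENOSYS && errno != EINVAL)) {
        *spliced = (n >= 0);
        return n;
    }
    *spliced = 0;
    return copy_rw(in, pipe_wr, len);
}
```

Seccomp filters are inherited across fork() and kept across execve(), so a launcher can apply this to a whole process tree without touching any global knob.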

