On Mon, May 18, 2026 at 02:20:30PM +0200, Christian Brauner wrote:
> On Sat, May 16, 2026 at 07:21:26PM +0100, Pedro Falcato wrote:
> > Since the advent of vulns like Dirty Pipe, Dirty Frag, Copy Fail
> > and Fragnasia, splicing a read-only file is fundamentally unsafe.
> > 
> > As such, as a mitigation, add a way for users to block splice() for
> > files they cannot write to. This eliminates this whole class of exploits
> > that use splice()+confusion in pipe/net/etc code to gain write-access to
> > files they can only read.
> > 
> > Users can simply toggle fs.splice_needs_write=1 and suddenly splice() will
> > refuse perfectly legal splices() from files it can only read, but not write.
> > 
> > For vmsplice(), make do with the address_space attached to the folio. Care
> > is taken to make sure the operation isn't slowed down too much by locks. The
> > check itself isn't entirely equivalent (the mapping's host can be the
> > internal bdev inode, etc, and not the one in /dev against which permissions
> > are checked), but doing it in a more correct way would require dropping
> > from GUP-fast to GUP, and that would be too slow.
> > 
> > Signed-off-by: Pedro Falcato <[email protected]>
> > ---
> > 
> > Hello,
> > 
> > sending this out as an RFC so I can get better opinions from VFS & security
> > folks upstream. I wrote this out as a way to harden against all the page
> > cache attacks we've seen lately, that bottom out to splice() from a file
> > they cannot write + confusion elsewhere on the net stack/pipes/etc.
> > 
> > This is _obviously_ not perfect and not complete. My first (unsent) version
> > straight up returned -EPERM on splice() for these files. This one attempts
> > to retain some compatibility by only blocking the page splicing operation,
> > but still issuing the operation with normal copies (kindly suggested by
> > Jan).
> > vmsplice() is a complicated issue, because gup_fast does not allow us access
> > to the VMA's vm_file. I tried hacking around it but it's not perfect (e.g.
> > you cannot grab the mnt_idmap for the file, since we only have access to the
> > address_space + its host).
> > I'm also not a fan of having somewhat hairy MM code in the middle of
> > fs/splice.c but that's something we can simply hoist elsewhere as this gets
> > un-RFC'd. It's also missing the external-facing docs for the sysctl.
> > 
> > My big questions are:
> > 1) Is this a viable way forward?
> 
> I think that splice and vmsplice() are pretty wonky apis. Ignoring its
> recent prominent role in page cache attacks it suffers from weird issues
> due to its interactions with pipe_lock().
> 
> Bug with splice to a pipe preventing a process exit
> [email protected]
> Sendfile holding pipe->mutex blocks the peer's pipe_release() from do_exit().
> 
> Change in splice() behaviour after 5.10? (LTP splice07)
> [email protected]
> 
> [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion
> [email protected]
> Pending splice from tty/socket/FIFO holds pipe->mutex indefinitely, blocking
> all other FIFO ops incl. read(O_NONBLOCK)
> 
> splice: prevent deadlock when splicing a file to itself
> [email protected]
> do_splice_direct_actor() still lacks file_inode(in) == file_inode(out) guard
> 
> AF_UNIX/zerocopy/pipe/vmsplice/splice vs FOLL_PIN
> [email protected]
> vmsplice/splice into AF_UNIX/pipe doesn't FOLL_PIN the source memory
> 
> My main gripe with the patch as written is that I find it really hard to
> figure out who would deploy this. It half-cripples splice() and
> vmsplice() for some use-cases but leaves it intact for others.

Not just splice() and vmsplice(), but sendfile(), copy_file_range() too.
My bet (perhaps not informed enough) is that there simply aren't that many
users doing splice-like operations from files they do not own in some way.

(maybe not true for copy_file_range(), I admit)

> 
> At that point you can also just ENOSYS splice() and vmsplice() via
> seccomp and force a fallback on non-splice codepaths that userspace has
> to have anyway as splice() isn't supported unconditionally.

IIRC GNU grep is one simple example where they assume splice() from a pipe
to /dev/null Just Works(tm) and it exits(1) otherwise.
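
For illustration only, here is a minimal userspace sketch of the kind of
fallback being discussed: try splice() first, and drop to plain read()/write()
if it is refused. It is Python rather than C purely for brevity. The helper
name copy_with_fallback and the set of errnos treated as "splice refused" are
assumptions made for this sketch, not something taken from the patch or from
any existing tool.

```python
import errno
import os

# Errnos after which the caller retries with read()/write(). Which errno a
# seccomp filter or a restricted kernel would return is an assumption here.
FALLBACK_ERRNOS = {errno.ENOSYS, errno.EINVAL, errno.EPERM}


def copy_with_fallback(src_fd, dst_fd, length, chunk=65536):
    """Move up to `length` bytes from src_fd to dst_fd.

    Tries os.splice() first (at least one fd must be a pipe) and switches
    to read()/write() if splice is unavailable or refused.
    Returns the number of bytes moved.
    """
    moved = 0
    use_splice = hasattr(os, "splice")  # Linux only, Python >= 3.10
    while moved < length:
        want = min(chunk, length - moved)
        if use_splice:
            try:
                n = os.splice(src_fd, dst_fd, want)
            except OSError as exc:
                if exc.errno not in FALLBACK_ERRNOS:
                    raise
                use_splice = False  # stay on the copy path from now on
                continue
        else:
            buf = os.read(src_fd, want)
            n = len(buf)
            view = memoryview(buf)
            while view:  # handle short writes
                written = os.write(dst_fd, view)
                view = view[written:]
        if n == 0:  # EOF on the source
            break
        moved += n
    return moved
```

A caller would pass the read end of a pipe as src_fd and any writable fd as
dst_fd. A program that lacks the second branch is the kind that breaks when
splice() starts failing.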

> It feels like a knee-jerk reaction to an exploit class originating in
> buggy modules that we have little control over and we would extend an
> API to users that is really difficult to use.
> 
> What might make more sense is to add a splice specific security_*() hook
> into the code so that an LSM can deny usage of splice in whatever way it
> wants to - bpf lsm or in-tree lsm.

I don't dislike that option, but I don't love leaving hardening to LSMs. The
kernel quite literally gets a new splice-related vulnerability every week now,
where userspace gets to pass pages it has no business passing to funky
codepaths that then write on these pages. I feel like natively restricting
what you can pass is simply a natural way forward.

> 
> Then we don't have to have all this gunk in the VFS layer that will be
> annoying to maintain with little value in the long-term. So I'm not very
> likely to pick this up as is.

Totally. That's what the RFC tag is for :)

-- 
Pedro
