On 2026-05-19 12:51 +0200, Mateusz Guzik wrote:
> On Tue, May 19, 2026 at 11:49 AM Christian Brauner <[email protected]> wrote:
> >
> > On 2026-05-18 20:59 +0200, Jann Horn wrote:
> > > I feel like a sysctl for "disable all the splice-like interfaces and
> > > zerocopy TX" would be reasonable to have? Either by blocking such
> > > operations, or better, silently downgrading all such operations to
> > > normal copies.
> > > [..]
> > I think blocking isn't going to be useful as it will make it harder for
> > distros to turn this on. So we should degrade.
> > > [..]
> > Let's discuss the other aggressive alternative: Can we try and
> > unconditionally degrade to copy. This would affect sendfile(), splice(),
> > and vmsplice(). Worst-case we would have to introduce the sysctl
> > retroactively.
> >
>
> I know at least nginx uses sendfile, but I never benchmarked how much it buys.
>
> The original patch as proposed filters by rw perms on the file, which
> I expect to exclude nginx.
>
> While kernel-internal copy is still going to beat a userspace-based
> read/write loop, this is still going to be a hit and I expect people
> are going to complain. Afterwards you may end up with tutorials how to
> re-enable pre-patch behavior, partially defeating the point.
>
> How about denial of splice usage or degradation to copy are still on
> the table, but based on a different criterion: whether code involved
> is "known good" for lack of a better description. iow the kernel would
> maintain a whitelist of "safe" cases. Random-ass AF_NOBODYEVERHEARDOF
> does not make the cut.

I had thought about that too but I felt a bit iffy about it. You could
envision an FOP_* flag for this:

/* Module may use splice-like APIs */
#define FOP_MAY_SPLICE ((__force fop_flags_t)(1 << 8))

But that doesn't address how fundamentally broken vmsplice(), for
example, really is, and that probably no one should get to use it in
its current form.

> Common-case usage would have to be audited of course, but this sounds
> rather actionable and would provide hardening without much friction.

And that's the usual problem where a rando module will just raise the
flag. Maybe that's fine and we will keep up.

> I can't stress enough that mucking around splice (even if worthwhile)
> is merely addressing the currently popular attack vector and not the
> general problem.
>
> The general problem is that the kernel is expected to be able to run
> with untrusted unprivileged users, while it avoidably exposes a huge
> attack surface. Of course there is no way around providing a bunch of
> syscalls to users, so *some* danger will always be there and one has
> to expect that even core code has bugs which will be discovered by
> LLMs in the coming months. Even then, there is tons of code which is
> currently being audited by third parties and which has no use in most
> setups. Instead it gets autoloaded in response to an exploit wishing
> to take advantage of its bugs.
>
> The huge attack surface was always a problematic position to be in,
> but with the advent of LLMs any unskilled person can drop a 0day and
> the position is straight up untenable. In the long run there is no way
> around blocking access to code by default, way beyond the current
> splice proposal.

I see this is the "let's become goat-farmers" portion of the message.
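
To make the FOP_MAY_SPLICE idea from earlier in this mail a bit more
concrete, here is a rough userspace model of "degrade to copy unless the
fops opted in". This is a sketch only, not kernel code and not a patch:
struct model_fops, enum xfer_mode, model_splice_mode() and
model_transfer() are made-up names for illustration, the sparse __force
annotation is dropped because it means nothing outside the kernel, and I
have not checked whether bit 8 is actually free.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's fop_flags_t. */
typedef unsigned int fop_flags_t;

/* Bit value as written in the mail; availability of bit 8 is an assumption. */
#define FOP_MAY_SPLICE ((fop_flags_t)(1u << 8))

/* Hypothetical, heavily reduced stand-in for struct file_operations. */
struct model_fops {
	fop_flags_t fop_flags;
};

enum xfer_mode { XFER_ZEROCOPY, XFER_COPY };

/* Only fops that explicitly opted in keep the page-sharing path. */
static enum xfer_mode model_splice_mode(const struct model_fops *fops)
{
	return (fops->fop_flags & FOP_MAY_SPLICE) ? XFER_ZEROCOPY : XFER_COPY;
}

/*
 * Model of a splice-like transfer. Returns the memory the receiver ends
 * up looking at: the source itself when sharing is allowed, otherwise a
 * private copy in buf. Returns NULL if the copy would not fit.
 */
static const char *model_transfer(const struct model_fops *fops,
				  const char *src, size_t len,
				  char *buf, size_t buflen)
{
	if (model_splice_mode(fops) == XFER_ZEROCOPY)
		return src;	/* receiver aliases the source pages */

	if (len > buflen)
		return NULL;
	memcpy(buf, src, len);	/* silent downgrade to a normal copy */
	return buf;
}
```

The caller never sees an error in the downgraded case, which is the
"degrade, don't block" behaviour discussed above. It does nothing about
the "rando module just sets the flag" problem, and nothing about
vmsplice() itself.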

