> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -445,6 +445,16 @@ int sk_msg_read_core(struct sock *sk, struct sk_psock
> *psock,
> copy = actor(actor_arg, page,
> sge->offset, copy);
> if (!copy) {
> + /*
> + * The loop processes msg_rx->sg entries
> + * sequentially and prior entries may
> + * already be consumed. Advance sg.start
> + * so the next call resumes at the correct
> + * entry, otherwise it would revisit
> + * zero-length entries and return -EFAULT.
> + */
> + if (!peek)
> + msg_rx->sg.start = i;
> copied = copied ? copied : -EFAULT;
> goto out;
The sg.start fix looks correct for the mid-way failure case.
When the splice actor (below) fails on the very first sg entry, copied
is still 0, so this returns -EFAULT to the caller. For the recvmsg
actor, -EFAULT is the right error because copy_page_to_iter() returning
0 means a genuine fault. But for the splice actor, 0 can mean the pipe
is full (add_to_pipe() returned -EAGAIN) or memory allocation failed.
Would it make sense for sk_msg_splice_actor() to propagate the actual
error, or for the callers to translate -EFAULT into the appropriate
errno for their context? Right now the splice path returns -EFAULT to
userspace for a pipe-full condition, where standard tcp_splice_read()
would return -EAGAIN:
tcp_bpf_splice_read()
-> __tcp_bpf_recvmsg()
-> sk_msg_read_core() with sk_msg_splice_actor
-> add_to_pipe() returns -EAGAIN (pipe full)
-> actor returns 0
-> copied = 0 ? 0 : -EFAULT
returns -EFAULT
returns -EFAULT to userspace
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
[ ... ]
> +static int sk_msg_splice_actor(void *arg, struct page *page,
> + unsigned int offset, size_t len)
> +{
> + struct tcp_bpf_splice_ctx *ctx = arg;
> + struct pipe_buffer buf = {
> + .ops = &nosteal_pipe_buf_ops,
> + };
> + ssize_t ret;
> +
> + buf.page = alloc_page(GFP_KERNEL);
> + if (!buf.page)
> + return 0;
> +
> + memcpy(page_address(buf.page), page_address(page) + offset, len);
> + buf.offset = 0;
> + buf.len = len;
> +
> + /*
> + * add_to_pipe() calls pipe_buf_release() on failure, which
> + * handles put_page() via nosteal_pipe_buf_ops, so no explicit
> + * cleanup is needed here.
> + */
> + ret = add_to_pipe(ctx->pipe, &buf);
> + if (ret <= 0)
> + return 0;
> + return ret;
> +}
Both the alloc_page failure and the add_to_pipe failure return 0 here.
Since sk_msg_read_core() converts actor-returning-0 into -EFAULT when
no bytes have been copied yet, this means userspace can see -EFAULT for
what is actually -ENOMEM (allocation failure) or -EAGAIN (pipe full).
The alloc_page case under memory pressure seems particularly confusing,
as -EFAULT gives no indication that backing off might help.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22658462977