> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -445,6 +445,16 @@ int sk_msg_read_core(struct sock *sk, struct sk_psock 
> *psock,
>                               copy = actor(actor_arg, page,
>                                            sge->offset, copy);
>                       if (!copy) {
> +                             /*
> +                              * The loop processes msg_rx->sg entries
> +                              * sequentially and prior entries may
> +                              * already be consumed. Advance sg.start
> +                              * so the next call resumes at the correct
> +                              * entry, otherwise it would revisit
> +                              * zero-length entries and return -EFAULT.
> +                              */
> +                             if (!peek)
> +                                     msg_rx->sg.start = i;
>                               copied = copied ? copied : -EFAULT;
>                               goto out;

The sg.start fix looks correct for the mid-way failure case.

When the splice actor (below) fails on the very first sg entry, copied
is still 0, so this returns -EFAULT to the caller. For the recvmsg
actor, -EFAULT is the right error because copy_page_to_iter() returning
0 means a genuine fault. But for the splice actor, 0 can mean the pipe
is full (add_to_pipe() returned -EAGAIN) or memory allocation failed.

Would it make sense for sk_msg_splice_actor() to propagate the actual
error, or for the callers to translate -EFAULT into the appropriate
errno for their context? Right now the splice path returns -EFAULT to
userspace for a pipe-full condition, where standard tcp_splice_read()
would return -EAGAIN:

tcp_bpf_splice_read()
  -> __tcp_bpf_recvmsg()
    -> sk_msg_read_core() with sk_msg_splice_actor
      -> add_to_pipe() returns -EAGAIN (pipe full)
      -> actor returns 0
      -> copied = 0 ? 0 : -EFAULT
    returns -EFAULT
  returns -EFAULT to userspace

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c

[ ... ]

> +static int sk_msg_splice_actor(void *arg, struct page *page,
> +                            unsigned int offset, size_t len)
> +{
> +     struct tcp_bpf_splice_ctx *ctx = arg;
> +     struct pipe_buffer buf = {
> +             .ops = &nosteal_pipe_buf_ops,
> +     };
> +     ssize_t ret;
> +
> +     buf.page = alloc_page(GFP_KERNEL);
> +     if (!buf.page)
> +             return 0;
> +
> +     memcpy(page_address(buf.page), page_address(page) + offset, len);
> +     buf.offset = 0;
> +     buf.len = len;
> +
> +     /*
> +      * add_to_pipe() calls pipe_buf_release() on failure, which
> +      * handles put_page() via nosteal_pipe_buf_ops, so no explicit
> +      * cleanup is needed here.
> +      */
> +     ret = add_to_pipe(ctx->pipe, &buf);
> +     if (ret <= 0)
> +             return 0;
> +     return ret;
> +}

Both the alloc_page failure and the add_to_pipe failure return 0 here.
Since sk_msg_read_core() converts actor-returning-0 into -EFAULT when
no bytes have been copied yet, this means userspace can see -EFAULT for
what is actually -ENOMEM (allocation failure) or -EAGAIN (pipe full).
The alloc_page case under memory pressure seems particularly confusing,
as -EFAULT gives no indication that backing off might help.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22658462977

Reply via email to