Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size

Yuyang Huang Mon, 25 May 2026 00:22:25 -0700

Thanks for the review comment. To better explain the problem, please
bear with me through this long email reply.



>AI is wrong.
>The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?

Thanks for pointing it out. I think the correct Fixes tags should be
as follows, and I will split them individually in the v2 patches:

Fixes: 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs")
Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs")

>Please provide more details.
>I'm still not convinced that what you're saying... is a kernel bug.
>Looks like user space is querying new attachment types with old structure.

I created a small C program using BPF_PROG_QUERY to query
BPF_CGROUP_INET_INGRESS (not the BPF_TCX_EGRESS) to demonstrate the
full problem , hope it clarifies:

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>

#define __NR_bpf 321
#define BPF_PROG_QUERY 16
#define BPF_CGROUP_INET_INGRESS 0
int main() {
    char buf[64];
    memset(buf, 0xAA, 64); // Set 0xAA signature
    memset(buf, 0, 40);    // Zero out legacy 40-byte struct
    *(int*)(buf + 0) = open("/sys/fs/cgroup", O_RDONLY);
    *(int*)(buf + 4) = BPF_CGROUP_INET_INGRESS; // Explicitly set attach_type

    // Call BPF_PROG_QUERY (16) with legacy size (40) on x86_64 (syscall 321)
    long ret = syscall(__NR_bpf, BPF_PROG_QUERY, buf, 40);
    printf("Syscall return value: %ld\n", ret);
    printf("Revision (offsets 56-63): %016llx\n", *(unsigned long
long*)(buf + 56));
    close(*(int*)(buf + 0)); // Clean up cgroup FD
    return 0;
}
```

Before the patch:
  Syscall return value: 0
  Revision (offsets 56-63): 0000000000000001   <-- Stack/Heap Corruption!

After the patch (Safe / revision writeback is gated):
  Syscall return value: 0
  Revision (offsets 56-63): aaaaaaaaaaaaaaaa   <-- Safe (Signature intact)

For this example, the regression lies in legacy CGROUP queries:
1. BPF_PROG_QUERY for cgroups is not a new feature; it has been in the
kernel since 2017.
2. There are existing legacy userspace applications and language
wrappers (like our android net-tests) written years ago with older
structure layouts (passing size = 40, ending at
query.prog_attach_flags) that query cgroups.
3. In June 2025, Commit 120933984460 backported "revision" support to
cgroup queries, introducing an unconditional writeback in
`__cgroup_bpf_query()` at offset 56:
```c
// In __cgroup_bpf_query() (kernel/bpf/cgroup.c) - Introduced in 120933984460:
if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
        return -EFAULT;
```

Now lets talk about BPF_TCX_EGRESS:

> bpf_mprog_query() and tcx_prog_query() were only introduced there.
> How come your userspace passes shorter uatter to query newer
> BPF_TCX_EGRESS attachment?

I understand your point, but two common cases exist where userspace
will legitimately query BPF_TCX_EGRESS with a 40-byte structure:

First, a generic BPF querying tool (assume it called `./bpf-query`)
compiled in 2020 (when the query buffer size was 40 bytes) might
accept the attach type dynamically from the command line:
  ./bpf-query --type 47 --fd 3  # 47 is BPF_TCX_EGRESS

The unmodified 2020 binary populates the struct dynamically and
invokes the syscall with size = 40:
```c
   ....
   attr.query.attach_type = 47; // BPF_TCX_EGRESS (passed dynamically
at runtime)
   syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40);
```

Without the size check, the unconditional writeback of revision at
offset 56 would result in an out-of-bounds write on the stack.

Second, in dynamic language environments (like Python), UAPI
structures are often declared manually. The developer might only
declare the subset of fields the application actually uses (40 bytes
in this case) to optimize memory overhead. Even with this subset
declaration, we believe the kernel should ideally respect the declared
size limit to prevent accidental out-of-bounds writes into adjacent
userspace memory.

> Looks like user space is querying new attachment types
> with old structure.
> Potentially the check can only be around:
>         case BPF_TCX_INGRESS:
>         case BPF_TCX_EGRESS:
>                 return tcx_prog_query(attr, uattr);
>
> to be nice to broken user space,
> but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS
> with a smaller uattr.

Add gating to the TCX path, will not be able to fix the legacy cgroup
query path. As shown above, the cgroup query path has the exact same
OOB writeback regression affecting legacy userspace. Since we must
plumb uattr_size to cgroup.c to resolve the cgroup regression safely,
applying the same consistent size-gating in bpf_mprog_query() seemed
like the most consistent and robust architectural path.

Feel free to let us know your thoughts.

Thanks,
Yuyang

On Sun, May 24, 2026 at 8:22 PM Alexei Starovoitov
<[email protected]> wrote:
>
> On Thu, May 21, 2026 at 1:41 PM Yuyang Huang <[email protected]> wrote:
> >
> > >This looks like a bug fix for an out-of-bounds write vulnerability.
> > >Should this include a Fixes: tag to identify which commit introduced the
> > >unconditional copy_to_user() calls for query.revision without checking
> > >the user-provided buffer size?
> >
> >  > Fixes: 3fe213c040b3 ("adding ci files")
> >
> > I completely missed this checkpatch comment. I will add the Fixes tag
> > in v2 if there is a consensus that this patch is the right approach to
> > resolve the issue.
>
> AI is wrong.
> The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?
>
> bpf_mprog_query() and tcx_prog_query() were only introduced there.
> How come your userspace passes shorter uatter to query newer
> BPF_TCX_EGRESS attachment?
>
> Please provide more details.
>
> I'm still not convinced that what you're saying:
> "
> For example, our Android net_test  suite uses this legacy 40-byte
> Python ctypes struct layout:
>
>   # legacy 40-byte layout (pre-revision field)
>   BpfAttrProgQuery = cstruct.Struct(
>     "bpf_attr_prog_query", "=IIIIQI4xQ",
>     "target_fd attach_type query_flags attach_flags prog_ids_ptr
> prog_cnt prog_attach_flags"
>   )
>   # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40)
> "
> is a kernel bug.
>
> Looks like user space is querying new attachment types
> with old structure.
> Potentially the check can only be around:
>         case BPF_TCX_INGRESS:
>         case BPF_TCX_EGRESS:
>                 return tcx_prog_query(attr, uattr);
>
> to be nice to broken user space,
> but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS
> with a smaller uattr.
>
> pw-bot: cr

Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size

Reply via email to