On 5/12/26 4:58 AM, Minh Nguyen wrote:
> vmci_transport_recv_connecting_server() jumps to its destroy: label
> and performs an unconditional sock_put(pending) to release the
> explicit sock_hold() taken by vmci_transport_recv_listen() before
> schedule_delayed_work().  The existing comment claimed this was safe
> because the listen handler removes pending from the pending list on
> the way out, which would prevent vsock_pending_work() from dropping
> the same reference later.
> 
> That assumption breaks for a peer RST.  The default arm of the packet
> switch sets:
> 
>       err = pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST ? 0 : -EINVAL;
> 
> and vmci_transport_recv_listen() only calls vsock_remove_pending()
> when err < 0:
> 
>       if (err < 0)
>               vsock_remove_pending(sk, pending);
> 
> For RST (err == 0) the socket stays on the pending list, so when
> vsock_pending_work() fires it takes the is_pending=true path and
> drops all three references itself: the pending-list reference via
> vsock_remove_pending(), then the two trailing sock_put(sk) calls.
> The unconditional sock_put() in destroy: had already dropped the
> explicit sock_hold() reference, so the second trailing sock_put(sk)
> in vsock_pending_work() is a write into the freed AF_VSOCK slab
> object.  KASAN reports a slab-use-after-free write of 4 bytes from
> refcount_warn_saturate() on the workqueue path:
> 
>   BUG: KASAN: slab-use-after-free in refcount_warn_saturate
>   Write of size 4 at addr ffff88800b1cac80 by task kworker
>   Workqueue: events vsock_pending_work
>   Call Trace:
>    refcount_warn_saturate
>    vsock_pending_work
>    process_one_work
>    worker_thread
> 
> Triggering the bug requires only the ability to open a VSOCK
> connection to the target and send a RST before the listener accepts.
> 
> Skip the sock_put() in destroy: when err == 0 so it only compensates
> the cases where vmci_transport_recv_listen() actually calls
> vsock_remove_pending().  RST is the only path that reaches destroy:
> with err == 0; every other path produces a negative value, so their
> behaviour is unchanged.
> 
> Verified on lts-6.12.79 with KASAN enabled (CONFIG_KASAN_INLINE=y,
> kasan_multi_shot): same trigger binary, same VM, 100 iterations:
> without this patch 52 KASAN slab-use-after-free reports fire; with
> this patch applied, 0 reports.
> 
> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
> Cc: [email protected]
> Signed-off-by: Minh Nguyen <[email protected]>
> Assisted-by: Claude:claude-opus-4-7
> ---
> v2:
>   - Resubmit to netdev per Stefano Garzarella's request after v1 review.
>   - Retested the PoC with the patch applied on lts-6.12.79 with KASAN
>     enabled: 52/100 unpatched -> 0/100 patched (same trigger binary,
>     same VM, 100 iterations); test summary captured in the commit
>     message.
>   - Changed Cc: [email protected] -> [email protected] now that the
>     bug is no longer embargoed.
>   - Rebased onto net/main (no functional change to the diff).
> 
> v1 was sent to [email protected] on 2026-05-10 (not on lore archives;
> no public link available).  v1 review summary, for reference:
>   - Stefano Garzarella (vsock maintainer): "Overall LGTM, but I'd wait
>     vmware guys on this that know this code better."  Asked for retest
>     and resubmission via the net tree workflow.
>   - Bryan Tan (VMCI maintainer): "Thanks for the fix, it looks good to
>     me."  Also noted that no modern VMware product allows guest-to-guest
>     VMCI communication, so the practical attack surface is host -> guest.
> 
>  net/vmw_vsock/vmci_transport.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 4296ca1..88d7128 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -1269,14 +1269,16 @@ vmci_transport_recv_connecting_server(struct sock *listener,
>  destroy:
>       pending->sk_err = skerr;
>       pending->sk_state = TCP_CLOSE;
> -     /* As long as we drop our reference, all necessary cleanup will handle
> -      * when the cleanup function drops its reference and our destruct
> -      * implementation is called.  Note that since the listen handler will
> -      * remove pending from the pending list upon our failure, the cleanup
> -      * function won't drop the additional reference, which is why we do it
> -      * here.
> +     /* Drop the reference taken by vmci_transport_recv_listen() before
> +      * schedule_delayed_work() only on real errors.  For a peer RST
> +      * (err == 0) the listener leaves pending on the pending list, and
> +      * vsock_pending_work() will drop that reference itself when it
> +      * later cleans the socket up.  Calling sock_put() here in that
> +      * case would be a double-put and free the socket while
> +      * vsock_pending_work() still holds it.
>        */
> -     sock_put(pending);
> +     if (err < 0)
> +             sock_put(pending);

Sashiko says:

---
Could this change lead to a socket memory leak if another packet arrives
before vsock_pending_work() executes?
If a peer RST is received (err == 0), the socket stays on the
pending_links list with its state set to TCP_CLOSE, and the base
reference is kept.
If the peer then sends another packet (such as another RST) within the
delay window before vsock_pending_work() runs,
vmci_transport_get_pending() might find this same socket.
Since its state is TCP_CLOSE, vmci_transport_recv_listen() would hit the
default switch case, set err = -EINVAL, and call vsock_remove_pending().
This removes the socket from the list and drops the list reference, but
it bypasses vmci_transport_recv_connecting_server(), meaning the base
reference is never dropped.
When vsock_pending_work() runs later, vsock_is_pending() evaluates to false.
This sets cleanup = false and bypasses the sock_put(sk) call, leaking
the pending socket.
While not introduced by this patch, does this error path leak
sk_ack_backlog slots on failed handshakes?
If a handshake fails due to an error, vmci_transport_recv_listen()
handles it by calling vsock_remove_pending(). This removes the socket
from the pending_links list but does not call sk_acceptq_removed(sk).
When vsock_pending_work() runs later, vsock_is_pending() evaluates to
false because the socket is no longer in the list. This causes the work
function to skip its own sk_acceptq_removed(listener) call, meaning the
listener's sk_ack_backlog is never decremented.
---

It looks like the above is trading a UAF for a leak?!

