On Thu, 14 May 2026 at 17:16, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, May 14, 2026 at 04:57:16PM +0200, Stefano Garzarella wrote:
> > On Wed, May 13, 2026 at 12:54:16PM +0200, Stefano Garzarella wrote:
> > > From: Stefano Garzarella <[email protected]>
> > >
> > > When there is no more space to queue an incoming packet, the packet is
> > > silently dropped. This causes data loss without any notification to
> > > either peer, since there is no retransmission.
> > >
> > > Under normal circumstances, this should never happen. However, it could
> > > happen if the other peer doesn't respect the credit, or if the skb
> > > overhead, which we recently began to take into account with commit
> > > 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue"),
> > > is too high.
> > >
> > > Fix this by resetting the connection and setting the local socket error
> > > to ENOBUFS when virtio_transport_recv_enqueue() can no longer queue a
> > > packet, so both peers are explicitly notified of the failure rather than
> > > silently losing data.
> > >
> > > Fixes: ae6fcfbf5f03 ("vsock/virtio: discard packets if credit is not
respected")
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > > net/vmw_vsock/virtio_transport_common.c | 19 ++++++++++++++-----
> > > 1 file changed, 14 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
> > > index 989cc252d3d3..4a4ac69d1ad1 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -1350,7 +1350,7 @@ virtio_transport_recv_connecting(struct sock *sk,
> > > return err;
> > > }
> > >
> > > -static void
> > > +static bool
> > > virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> > > struct sk_buff *skb)
> > > {
> > > @@ -1365,10 +1365,8 @@ virtio_transport_recv_enqueue(struct vsock_sock
*vsk,
> > > spin_lock_bh(&vvs->rx_lock);
> > >
> > > can_enqueue = virtio_transport_inc_rx_pkt(vvs, len);
> > > - if (!can_enqueue) {
> > > - free_pkt = true;
> > > + if (!can_enqueue)
> > > goto out;
> > > - }
> > >
> > > if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
> > > vvs->msg_count++;
> > > @@ -1408,6 +1406,8 @@ virtio_transport_recv_enqueue(struct vsock_sock
*vsk,
> > > spin_unlock_bh(&vvs->rx_lock);
> > > if (free_pkt)
> > > kfree_skb(skb);
> > > +
> > > + return can_enqueue;
> > > }
> > >
> > > static int
> > > @@ -1420,7 +1420,16 @@ virtio_transport_recv_connected(struct sock *sk,
> > >
> > > switch (le16_to_cpu(hdr->op)) {
> > > case VIRTIO_VSOCK_OP_RW:
> > > - virtio_transport_recv_enqueue(vsk, skb);
> > > + if (!virtio_transport_recv_enqueue(vsk, skb)) {
> > > + /* There is no more space to queue the packet, so
let's
> > > + * close the connection; otherwise, we'll lose data.
> > > + */
> > > + (void)virtio_transport_reset(vsk, skb);
> > > + sk->sk_state = TCP_CLOSE;
> > > + sk->sk_err = ENOBUFS;
> > > + sk_error_report(sk);
> >
> > sashiko reported some issues related to setting TCP_CLOSE state and not
> > removing the socket from the connect table:
> > https://sashiko.dev/#/patchset/20260513105417.56761-1-sgarzare%40redhat.com
> >
> > I'll change this by calling virtio_transport_do_close() and
> > vsock_remove_sock() in the next version.
> >
> > Stefano
> >
> > > + break;
> > > + }
> > > vsock_data_ready(sk);
> > > return err;
> > > case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
> > > --
> > > 2.54.0
> > >
>
>
> And so the bag of hacks grows. I feel this is energy not well spent.
> Please, let us fix this properly *first*. And then worry about how to
> backport. Maybe it will not be so terrible to backport after all.
>
TBH I don't think this is an hack, but an issue we should fix in any case.
Regarding the second patch, I see your point, but it's a big change
that worries me. I'd like some more time to fix it properly without
rushing. Staying calm without realizing that userspace is broken like
we are now without this series :-(
That said, evaluating further, I think we have a similar issue also
with STREAM on the host side where the skb usually doesn't free space,
so we need a merge strategy also there.
So, I'd like to have time to fix both definitely. If you have time and
want to go ahead, please do.
Thanks,
Stefano