> On Wed, May 21, 2025 at 10:06:13AM +0800, Xuewei Niu wrote:
> >> On Mon, May 19, 2025 at 03:06:48PM +0800, Xuewei Niu wrote:
> >> >The virtio_vsock_sock has a new field called bytes_unread as the return
> >> >value of the SIOCINQ ioctl.
> >> >
> >> >Though the rx_bytes exists, we introduce a bytes_unread field to the
> >> >virtio_vsock_sock struct. The reason is that it will not be updated
> >> >until the skbuff is fully consumed, which causes inconsistency.
> >> >
> >> >The byte_unread is increased by the length of the skbuff when skbuff is
> >> >enqueued, and it is decreased when dequeued.
> >> >
> >> >Signed-off-by: Xuewei Niu <niuxuewei....@antgroup.com>
> >> >---
> >> > drivers/vhost/vsock.c                   |  1 +
> >> > include/linux/virtio_vsock.h            |  2 ++
> >> > net/vmw_vsock/virtio_transport.c        |  1 +
> >> > net/vmw_vsock/virtio_transport_common.c | 17 +++++++++++++++++
> >> > net/vmw_vsock/vsock_loopback.c          |  1 +
> >> > 5 files changed, 22 insertions(+)
> >> >
> >> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >> >index 802153e23073..0f20af6e5036 100644
> >> >--- a/drivers/vhost/vsock.c
> >> >+++ b/drivers/vhost/vsock.c
> >> >@@ -452,6 +452,7 @@ static struct virtio_transport vhost_transport = {
> >> >          .notify_set_rcvlowat      = 
> >> > virtio_transport_notify_set_rcvlowat,
> >> >
> >> >          .unsent_bytes             = virtio_transport_unsent_bytes,
> >> >+         .unread_bytes             = virtio_transport_unread_bytes,
> >> >
> >> >          .read_skb = virtio_transport_read_skb,
> >> >  },
> >> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> >> >index 0387d64e2c66..0a7bd240113a 100644
> >> >--- a/include/linux/virtio_vsock.h
> >> >+++ b/include/linux/virtio_vsock.h
> >> >@@ -142,6 +142,7 @@ struct virtio_vsock_sock {
> >> >  u32 buf_alloc;
> >> >  struct sk_buff_head rx_queue;
> >> >  u32 msg_count;
> >> >+ size_t bytes_unread;
> >>
> >> Can we just use `rx_bytes` field we already have?
> >>
> >> Thanks,
> >> Stefano
> >
> >I perfer not. The `rx_bytes` won't be updated until the skbuff is fully
> >consumed, causing inconsistency issues. If it is acceptable to you, I'll
> >reuse the field instead.
> 
> I think here we found a little pre-existing issue that should be related
> also to what Arseniy (CCed) is trying to fix (low_rx_bytes).
> 
> We basically have 2 counters:
> - rx_bytes, which we use internally to see if there are bytes to read
>    and for sock_rcvlowat
> - fwd_cnt, which we use instead for the credit mechanism and informing
>    the other peer whether we have space or not
> 
> These are updated with virtio_transport_dec_rx_pkt() and 
> virtio_transport_inc_rx_pkt()
> 
> As far as I can see, from the beginning, we call 
> virtio_transport_dec_rx_pkt() only when we consume the entire packet.
> This makes sense for `fwd_cnt`, because we still have occupied space in 
> memory and we don't want to update the credit until we free all the 
> space, but I think it makes no sense for `rx_bytes`, which is only used 
> internally and should reflect the current situation of bytes to read.
> 
> So in my opinion we should fix it this way (untested):
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c 
> b/net/vmw_vsock/virtio_transport_common.c
> index 11eae88c60fc..ee70cb114328 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -449,10 +449,10 @@ static bool virtio_transport_inc_rx_pkt(struct 
> virtio_vsock_sock *vvs,
>   }
> 
>   static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> -                                     u32 len)
> +                                     u32 bytes_read, u32 bytes_dequeued)
>   {
> -     vvs->rx_bytes -= len;
> -     vvs->fwd_cnt += len;
> +     vvs->rx_bytes -= bytes_read;
> +     vvs->fwd_cnt += bytes_dequeued;
>   }
> 
>   void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct 
> sk_buff *skb)
> @@ -581,11 +581,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock 
> *vsk,
>                                  size_t len)
>   {
>       struct virtio_vsock_sock *vvs = vsk->trans;
> -     size_t bytes, total = 0;
>       struct sk_buff *skb;
>       u32 fwd_cnt_delta;
>       bool low_rx_bytes;
>       int err = -EFAULT;
> +     size_t total = 0;
>       u32 free_space;
> 
>       spin_lock_bh(&vvs->rx_lock);
> @@ -597,6 +597,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>       }
> 
>       while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
> +             size_t bytes, dequeued = 0;
> +
>               skb = skb_peek(&vvs->rx_queue);
> 
>               bytes = min_t(size_t, len - total,
> @@ -620,12 +622,12 @@ virtio_transport_stream_do_dequeue(struct vsock_sock 
> *vsk,
>               VIRTIO_VSOCK_SKB_CB(skb)->offset += bytes;
> 
>               if (skb->len == VIRTIO_VSOCK_SKB_CB(skb)->offset) {
> -                     u32 pkt_len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
> -
> -                     virtio_transport_dec_rx_pkt(vvs, pkt_len);
> +                     dequeued = le32_to_cpu(virtio_vsock_hdr(skb)->len);
>                       __skb_unlink(skb, &vvs->rx_queue);
>                       consume_skb(skb);
>               }
> +
> +             virtio_transport_dec_rx_pkt(vvs, bytes, dequeued);
>       }
> 
>       fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
> @@ -782,7 +784,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct 
> vsock_sock *vsk,
>                               msg->msg_flags |= MSG_EOR;
>               }
> 
> -             virtio_transport_dec_rx_pkt(vvs, pkt_len);
> +             virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
>               vvs->bytes_unread -= pkt_len;
>               kfree_skb(skb);
>       }
> @@ -1752,6 +1754,7 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, 
> skb_read_actor_t recv_acto
>       struct sock *sk = sk_vsock(vsk);
>       struct virtio_vsock_hdr *hdr;
>       struct sk_buff *skb;
> +     u32 pkt_len;
>       int off = 0;
>       int err;
> 
> @@ -1769,7 +1772,8 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, 
> skb_read_actor_t recv_acto
>       if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
>               vvs->msg_count--;
> 
> -     virtio_transport_dec_rx_pkt(vvs, le32_to_cpu(hdr->len));
> +     pkt_len = le32_to_cpu(hdr->len);
> +     virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
>       spin_unlock_bh(&vvs->rx_lock);
> 
>       virtio_transport_send_credit_update(vsk);
> 
> @Arseniy WDYT?
> I will test it and send a proper patch.
> 
> @Xuewei with that fixed, I think you can use `rx_bytes`, right?

I've seen your patch, and looks good to me. This will greatly simplify the
SIOCINQ ioctl implementation. I'll rework after your patch gets merged.

Thanks,
Xuewei

> Also because you missed for example `virtio_transport_read_skb()` used 
> by ebpf (see commit 3543152f2d33 ("vsock: Update rx_bytes on 
> read_skb()")).
> 
> Thanks,
> Stefano

Reply via email to