> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Friday, April 24, 2020 8:30 PM
> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
> Wang, Zhihong <zhihong.w...@intel.com>
> Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haa...@intel.com>
> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
>
>
>
> On 4/24/20 11:24 AM, Marvin Liu wrote:
> > Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
>
> s/alike/like/ ?
>
> > single Tx functions. Batch function is further optimized by AVX512
> > instructions.
> >
> > Signed-off-by: Marvin Liu <yong....@intel.com>
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> > index 5c112cac7..b7d52d497 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> > uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> > uint16_t nb_pkts);
> >
> > +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> > + uint16_t nb_pkts);
> > +
> > int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
> >
> > void virtio_interrupt_handler(void *param);
> > diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> > index cf18fe564..f82fe8d64 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue __rte_unused,
> > {
> > return 0;
> > }
> > +
> > +__rte_weak uint16_t
> > +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
> > + struct rte_mbuf **tx_pkts __rte_unused,
> > + uint16_t nb_pkts __rte_unused)
> > +{
> > + return 0;
> > +}
> > diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > index 8a7b459eb..c023ace4e 100644
> > --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > @@ -23,6 +23,24 @@
> > #define PACKED_FLAGS_MASK ((0ULL | VRING_PACKED_DESC_F_AVAIL_USED) << \
> > FLAGS_BITS_OFFSET)
> >
> > +/* reference count offset in mbuf rearm data */
> > +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> > + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +/* segment number offset in mbuf rearm data */
> > +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> > + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +
> > +/* default rearm data */
> > +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> > + 1ULL << REFCNT_BITS_OFFSET)
> > +
> > +/* id bits offset in packed ring desc higher 64bits */
> > +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
> > + offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
> > +
> > +/* net hdr short size mask */
> > +#define NET_HDR_MASK 0x3F
> > +
> > #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> > sizeof(struct vring_packed_desc))
> > #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > @@ -47,6 +65,48 @@
> > for (iter = val; iter < num; iter++)
> > #endif
> >
> > +static inline void
> > +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
> > +{
> > + struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
> > + struct vq_desc_extra *dxp;
> > + uint16_t used_idx, id, curr_id, free_cnt = 0;
> > + uint16_t size = vq->vq_nentries;
> > + struct rte_mbuf *mbufs[size];
> > + uint16_t nb_mbuf = 0, i;
> > +
> > + used_idx = vq->vq_used_cons_idx;
> > +
> > + if (!desc_is_used(&desc[used_idx], vq))
> > + return;
> > +
> > + id = desc[used_idx].id;
> > +
> > + do {
> > + curr_id = used_idx;
> > + dxp = &vq->vq_descx[used_idx];
> > + used_idx += dxp->ndescs;
> > + free_cnt += dxp->ndescs;
> > +
> > + if (dxp->cookie != NULL) {
> > + mbufs[nb_mbuf] = dxp->cookie;
> > + dxp->cookie = NULL;
> > + nb_mbuf++;
> > + }
> > +
> > + if (used_idx >= size) {
> > + used_idx -= size;
> > + vq->vq_packed.used_wrap_counter ^= 1;
> > + }
> > + } while (curr_id != id);
> > +
> > + for (i = 0; i < nb_mbuf; i++)
> > + rte_pktmbuf_free(mbufs[i]);
> > +
> > + vq->vq_used_cons_idx = used_idx;
> > + vq->vq_free_cnt += free_cnt;
> > +}
> > +
>
>
> I think you can re-use the inlined non-vectorized cleanup function here,
> or use your implementation in the non-vectorized path.
> BTW, do you know why we have to pass the num argument in the non-vectorized
> case? I don't remember for sure.
>
Maxime,
This is a simplified version of the xmit cleanup function. It is based on the
assumption that the backend updates used ids in bursts, which also matches the
frontend's requirement.
I just found that the original version works better in the loopback case. Will
adapt it in the next version.
Thanks,
Marvin
> Maxime