On 4/24/20 3:33 PM, Liu, Yong wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coque...@redhat.com>
>> Sent: Friday, April 24, 2020 8:30 PM
>> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
>> Wang, Zhihong <zhihong.w...@intel.com>
>> Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haa...@intel.com>
>> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
>>
>>
>>
>> On 4/24/20 11:24 AM, Marvin Liu wrote:
>>> Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
>>
>> s/alike/like/ ?
>>
>>> single Tx functions. Batch function is further optimized by AVX512
>>> instructions.
>>>
>>> Signed-off-by: Marvin Liu <yong....@intel.com>
>>>
>>> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
>>> index 5c112cac7..b7d52d497 100644
>>> --- a/drivers/net/virtio/virtio_ethdev.h
>>> +++ b/drivers/net/virtio/virtio_ethdev.h
>>> @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>> uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>> 		uint16_t nb_pkts);
>>>
>>> +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>> +		uint16_t nb_pkts);
>>> +
>>> int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
>>>
>>> void virtio_interrupt_handler(void *param);
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>>> index cf18fe564..f82fe8d64 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue __rte_unused,
>>> {
>>> 	return 0;
>>> }
>>> +
>>> +__rte_weak uint16_t
>>> +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
>>> +			struct rte_mbuf **tx_pkts __rte_unused,
>>> +			uint16_t nb_pkts __rte_unused)
>>> +{
>>> +	return 0;
>>> +}
>>> diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> index 8a7b459eb..c023ace4e 100644
>>> --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> @@ -23,6 +23,24 @@
>>> #define PACKED_FLAGS_MASK ((0ULL | VRING_PACKED_DESC_F_AVAIL_USED) << \
>>> 				FLAGS_BITS_OFFSET)
>>>
>>> +/* reference count offset in mbuf rearm data */
>>> +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
>>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
>>> +/* segment number offset in mbuf rearm data */
>>> +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
>>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
>>> +
>>> +/* default rearm data */
>>> +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
>>> +	1ULL << REFCNT_BITS_OFFSET)
>>> +
>>> +/* id bits offset in packed ring desc higher 64bits */
>>> +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
>>> +	offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
>>> +
>>> +/* net hdr short size mask */
>>> +#define NET_HDR_MASK 0x3F
>>> +
>>> #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
>>> 	sizeof(struct vring_packed_desc))
>>> #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
>>> @@ -47,6 +65,48 @@
>>> 	for (iter = val; iter < num; iter++)
>>> #endif
>>>
>>> +static inline void
>>> +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
>>> +{
>>> +	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
>>> +	struct vq_desc_extra *dxp;
>>> +	uint16_t used_idx, id, curr_id, free_cnt = 0;
>>> +	uint16_t size = vq->vq_nentries;
>>> +	struct rte_mbuf *mbufs[size];
>>> +	uint16_t nb_mbuf = 0, i;
>>> +
>>> +	used_idx = vq->vq_used_cons_idx;
>>> +
>>> +	if (!desc_is_used(&desc[used_idx], vq))
>>> +		return;
>>> +
>>> +	id = desc[used_idx].id;
>>> +
>>> +	do {
>>> +		curr_id = used_idx;
>>> +		dxp = &vq->vq_descx[used_idx];
>>> +		used_idx += dxp->ndescs;
>>> +		free_cnt += dxp->ndescs;
>>> +
>>> +		if (dxp->cookie != NULL) {
>>> +			mbufs[nb_mbuf] = dxp->cookie;
>>> +			dxp->cookie = NULL;
>>> +			nb_mbuf++;
>>> +		}
>>> +
>>> +		if (used_idx >= size) {
>>> +			used_idx -= size;
>>> +			vq->vq_packed.used_wrap_counter ^= 1;
>>> +		}
>>> +	} while (curr_id != id);
>>> +
>>> +	for (i = 0; i < nb_mbuf; i++)
>>> +		rte_pktmbuf_free(mbufs[i]);
>>> +
>>> +	vq->vq_used_cons_idx = used_idx;
>>> +	vq->vq_free_cnt += free_cnt;
>>> +}
>>> +
>>
>>
>> I think you can re-use the inlined non-vectorized cleanup function here,
>> or use your implementation in the non-vectorized path.
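>>
>> Something like this, just as an untested sketch; it assumes the existing
>> virtio_xmit_cleanup_normal_packed() gets moved to a header so the
>> vectorized file can call it:
>>
>>     static inline void
>>     virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
>>     {
>>             /* upper bound: every in-flight descriptor */
>>             uint16_t num = vq->vq_nentries - vq->vq_free_cnt;
>>
>>             /* reuse the scalar cleanup instead of duplicating it */
>>             virtio_xmit_cleanup_normal_packed(vq, num);
>>     }
>>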
>> BTW, do you know why we have to pass the num argument in the
>> non-vectorized case? I don't remember.
>>
>
> Maxime,
> This is a simplified version of the xmit cleanup function. It is based on the
> assumption that the backend will update used ids in bursts, which also
> matches the frontend's requirement.
And what if the backend doesn't follow that assumption?
Is it just slower, or broken?
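
Just to check my understanding of the do/while: say each chain is 2 descs
and the backend flagged a burst of 3 chains whose heads sit at ring indices
0, 2 and 4 (and, as I believe is the case here, the frontend uses the
chain's head index as its buffer id), so desc[0].id == 4:

    /* iteration 1: curr_id = 0, used_idx 0 -> 2  (0 != 4, continue) */
    /* iteration 2: curr_id = 2, used_idx 2 -> 4  (2 != 4, continue) */
    /* iteration 3: curr_id = 4, used_idx 4 -> 6  (4 == 4, stop)     */

If the backend instead flags one chain at a time, desc[0].id == 0 and the
loop exits after a single iteration, reclaiming one chain per call: slower,
but still correct as far as I can see. If the backend completes chains out
of order, though, the written id won't match the head indices the loop
walks over, so it could keep going and reclaim chains the device still owns.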
> I just found that the original version works better in the loopback case.
> Will adapt it in the next version.
>
> Thanks,
> Marvin
>
>> Maxime
>