On 2/2/2024 1:19 AM, lon...@linuxonhyperv.com wrote:
> From: Long Li <lon...@microsoft.com>
> 
> Instead of allocating mbufs one by one during RX, use
> rte_pktmbuf_alloc_bulk() to allocate them in a batch.
> 
> With this patch, there are no measurable performance improvements in
> benchmarks. However, this patch should improve CPU cycles and reduce
> potential locking conflicts in real-world applications.
> 
> Signed-off-by: Long Li <lon...@microsoft.com>

<...>

> @@ -120,20 +113,33 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
>  /*
>   * Post work requests for a Rx queue.
>   */
> +#define MANA_MBUF_BULK 32u
>  static int
> -mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq)
> +mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq, uint32_t count)
>  {
>       int ret;
> -     uint32_t i;
> +     uint32_t i, batch_count;
> +     struct rte_mbuf *mbufs[MANA_MBUF_BULK];
> +
> +more_mbufs:
> +     batch_count = RTE_MIN(count, MANA_MBUF_BULK);
> +     ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, batch_count);
> +     if (ret) {
> +             DP_LOG(ERR, "failed to allocate mbufs for RX");
> +             rxq->stats.nombuf += count;
> +
> +             /* Bail out to ring doorbell for posted packets */
> +             goto out;
> +     }
>  
>  #ifdef RTE_ARCH_32
>       rxq->wqe_cnt_to_short_db = 0;
>  #endif
> -     for (i = 0; i < rxq->num_desc; i++) {
> -             ret = mana_alloc_and_post_rx_wqe(rxq);
> +     for (i = 0; i < batch_count; i++) {
> +             ret = mana_post_rx_wqe(rxq, mbufs[i]);
>               if (ret) {
>                       DP_LOG(ERR, "failed to post RX ret = %d", ret);
> -                     return ret;
> +                     break;
>

Hi Long,

Assume that if "count > MANA_MBUF_BULK", and int the first iteration of
the loop 'mana_post_rx_wqe()' failed, but in second iteration it is
successful, this will cause function to return success in spite of
failure in first iteration.

As mbufs posted Rx queue, it may be OK to consider above case as
success, but since 'count' number not posted this may be misleading.
I just want to double check if this is done intentionally.


With the limitation of VLA code become more complex, and if there is no
performance benefit of the allocating 'mbufs' array from stack, you may
prefer to switch back to allocating dynamic memory, up to you.


Reply via email to