On 2/2/2024 1:19 AM, lon...@linuxonhyperv.com wrote: > From: Long Li <lon...@microsoft.com> > > Instead of allocating mbufs one by one during RX, use > rte_pktmbuf_alloc_bulk() to allocate them in a batch. > > With this patch, there are no measurable performance improvements in > benchmarks. However, this patch should improve CPU cycles and reduce > potential locking conflicts in real-world applications. > > Signed-off-by: Long Li <lon...@microsoft.com>
<...> > @@ -120,20 +113,33 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq) > /* > * Post work requests for a Rx queue. > */ > +#define MANA_MBUF_BULK 32u > static int > -mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq) > +mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq, uint32_t count) > { > int ret; > - uint32_t i; > + uint32_t i, batch_count; > + struct rte_mbuf *mbufs[MANA_MBUF_BULK]; > + > +more_mbufs: > + batch_count = RTE_MIN(count, MANA_MBUF_BULK); > + ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, batch_count); > + if (ret) { > + DP_LOG(ERR, "failed to allocate mbufs for RX"); > + rxq->stats.nombuf += count; > + > + /* Bail out to ring doorbell for posted packets */ > + goto out; > + } > > #ifdef RTE_ARCH_32 > rxq->wqe_cnt_to_short_db = 0; > #endif > - for (i = 0; i < rxq->num_desc; i++) { > - ret = mana_alloc_and_post_rx_wqe(rxq); > + for (i = 0; i < batch_count; i++) { > + ret = mana_post_rx_wqe(rxq, mbufs[i]); > if (ret) { > DP_LOG(ERR, "failed to post RX ret = %d", ret); > - return ret; > + break; > Hi Long, Assume that if "count > MANA_MBUF_BULK", and int the first iteration of the loop 'mana_post_rx_wqe()' failed, but in second iteration it is successful, this will cause function to return success in spite of failure in first iteration. As mbufs posted Rx queue, it may be OK to consider above case as success, but since 'count' number not posted this may be misleading. I just want to double check if this is done intentionally. With the limitation of VLA code become more complex, and if there is no performance benefit of the allocating 'mbufs' array from stack, you may prefer to switch back to allocating dynamic memory, up to you.