> Can you please quantify the performance improvement (as percentage), this
> clarifies the impact of the modification.

I didn't see any meaningful performance improvement in benchmarks. However, this should save CPU cycles and reduce potential locking conflicts in real-world applications.

Using batch allocation was one of the review comments during the initial driver submission, suggested by Stephen Hemminger. I promised to fix it at that time. Sorry it took a while to submit this patch.

>
> <...>
>
> > @@ -121,19 +115,32 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
> >   * Post work requests for a Rx queue.
> >   */
> >  static int
> > -mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq)
> > +mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq, uint32_t count)
> >  {
> >  	int ret;
> >  	uint32_t i;
> > +	struct rte_mbuf **mbufs;
> > +
> > +	mbufs = rte_calloc_socket("mana_rx_mbufs", count, sizeof(struct rte_mbuf *),
> > +				  0, rxq->mp->socket_id);
> > +	if (!mbufs)
> > +		return -ENOMEM;
> >
>
> 'mbufs' is temporary storage for allocated mbuf pointers, why not allocate it
> from stack instead, can be faster and easier to manage:
> "struct rte_mbuf *mbufs[count]"
>
> > +
> > +	ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, count);
> > +	if (ret) {
> > +		DP_LOG(ERR, "failed to allocate mbufs for RX");
> > +		rxq->stats.nombuf += count;
> > +		goto fail;
> > +	}
> >
> >  #ifdef RTE_ARCH_32
> >  	rxq->wqe_cnt_to_short_db = 0;
> >  #endif
> > -	for (i = 0; i < rxq->num_desc; i++) {
> > -		ret = mana_alloc_and_post_rx_wqe(rxq);
> > +	for (i = 0; i < count; i++) {
> > +		ret = mana_post_rx_wqe(rxq, mbufs[i]);
> >  		if (ret) {
> >  			DP_LOG(ERR, "failed to post RX ret = %d", ret);
> > -			return ret;
> > +			goto fail;
> >
>
> This may leak memory. There are allocated mbufs, if exit from loop here and
> free 'mbufs' variable, how will the remaining mbufs be freed?

Mbufs are always freed after fail:

fail:
	rte_free(mbufs);
>
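
For reference, here is a minimal self-contained sketch of the cleanup the question above is about: allocate a batch, post buffers one at a time, and on a post failure release the buffers that were never handed to the queue before freeing the pointer array. It is an illustration only, not the driver code. `struct buf`, `struct queue`, `alloc_bulk()`, `post_one()` and `alloc_and_post()` are hypothetical stand-ins for the mbuf, the Rx queue, rte_pktmbuf_alloc_bulk(), mana_post_rx_wqe() and mana_alloc_and_post_rx_wqes(). In the driver the release step would presumably be something like rte_pktmbuf_free_bulk() on the unposted tail of the array.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the mbuf and Rx queue; not DPDK types. */
struct buf { int unused; };

struct queue {
	uint32_t free_slots;  /* posts succeed while this is non-zero */
	uint32_t posted;      /* buffers handed over to the queue */
	uint32_t outstanding; /* allocated, but neither posted nor freed */
};

/* Stand-in for a bulk allocator: all-or-nothing, like the real one. */
static int alloc_bulk(struct queue *q, struct buf **bufs, uint32_t count)
{
	uint32_t i;

	for (i = 0; i < count; i++) {
		bufs[i] = malloc(sizeof(*bufs[i]));
		if (bufs[i] == NULL) {
			while (i > 0) {
				free(bufs[--i]);
				q->outstanding--;
			}
			return -ENOMEM;
		}
		q->outstanding++;
	}
	return 0;
}

/* Stand-in for posting one buffer; the queue takes ownership on success. */
static int post_one(struct queue *q, struct buf *b)
{
	if (q->free_slots == 0)
		return -ENOSPC;
	q->free_slots--;
	q->posted++;
	q->outstanding--;
	free(b); /* mock only: nothing consumes the buffer later */
	return 0;
}

static int alloc_and_post(struct queue *q, uint32_t count)
{
	struct buf **bufs;
	uint32_t i;
	int ret;

	assert(q != NULL);
	if (count == 0)
		return 0;

	bufs = calloc(count, sizeof(*bufs));
	if (bufs == NULL)
		return -ENOMEM;

	ret = alloc_bulk(q, bufs, count);
	if (ret)
		goto out;

	for (i = 0; i < count; i++) {
		ret = post_one(q, bufs[i]);
		if (ret)
			break;
	}

	/* Release bufs[i..count-1], which were never posted.
	 * When the loop completed, i == count and nothing is freed. */
	for (; i < count; i++) {
		free(bufs[i]);
		q->outstanding--;
	}
out:
	free(bufs); /* frees the pointer array only, not the buffers */
	return ret;
}
```

With room for 3 buffers and a request for 5, the sketch posts 3, frees the remaining 2, and returns the post error, so nothing stays allocated outside the queue.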