Hello Connor,

Thank you for the questions and comments. I will repeat each question, followed by my answer.
Q: Could you be more detailed? Why is mbuf pool caching not needed?

A: The short answer: under certain conditions, we can run out of buffers from that small LACPDU mempool. We actually saw this occur in production, on mostly-idle links.

For the long explanation, assume the following:

1. 1 tx-queue per bond and underlying ethdev ports.
2. 256 tx-descriptors (per ethdev port).
3. 257 mbufs in each port's LACPDU pool, as computed by bond_mode_8023ad_activate_slave(), and cache size 32.
4. The "app" xmits zero packets to this bond for a long time.
5. In EAL intr thread context, LACP tx_machine() allocates 1 mbuf (LACPDU) per second from the pool, and puts it into the LACP tx-ring.
6. Every second, another thread (call it the tx-core) calls tx-burst (with zero packets to xmit), finds 1 mbuf on the LACP tx-ring, and the underlying ethdev PMD puts the mbuf data into a tx-desc.
7. The PMD tx-burst is configured not to clean up used tx-descs until there are almost none free, e.g., fewer than the pool's cache-size * CACHE_FLUSH_THRESH_MULTIPLIER (1.5).
8. When cleaning up tx-descs, we may leave up to 47 mbufs in the tx-core's LACPDU-pool cache (not accessible from the intr thread).

When the number of used tx-descs (0..255) plus the number of mbufs in the cache (0..47) reaches 257, allocation fails.

If I understand the LACP tx-burst code correctly, it would be worse if nb_tx_queues > 1, because (assuming multiple tx-cores) any queue/lcore could xmit an LACPDU. Thus, up to nb_tx_queues * 47 mbufs could be cached and inaccessible from tx_machine().

You would not see this problem if the app xmits other (non-LACP) mbufs on a regular basis, expediting the clean-up of tx-descs, including LACPDU mbufs (unless the nb_tx_queues tx-core caches could hold all LACPDU mbufs).

If we make the mempool's cache size 0, allocation will not fail.
A mempool cache for LACPDUs does not offer much additional speed anyway: during alloc, the intr thread does not have default mempool caches (AFAIK), and the average time between frees is either 1 second (LACP short timeouts) or 10 seconds (long timeouts), i.e., infrequent.

--------

Q: Why reserve one additional slot in the rx and tx rings?

A: rte_ring_create() requires the ring size N to be a power of 2, but the ring can only store N-1 items. Thus, if we want to store X items, we need to ask for (at least) X+1. The original code fails when the real desired size is a power of 2, because in that case align32pow2 does not round up.

For example, say we want a ring to hold 4:

  rte_ring_create(... rte_align32pow2(4) ...)

rte_align32pow2(4) returns 4, and we end up with a ring that stores only 3 items.

  rte_ring_create(... rte_align32pow2(4 + 1) ...)

rte_align32pow2(5) returns 8, and we end up with a ring that stores up to 7 items, more than we need, but acceptable.

--------

Q: I found that the comment for BOND_MODE_8023AX_SLAVE_RX_PKTS is wrong. Could you fix it in this patch?

A: Yes, I will fix it in the next version of the patch.

--
Regards,
Robert Sanford

On 12/16/21, 4:01 AM, "Min Hu (Connor)" <humi...@huawei.com> wrote:

Hi, Robert,

On 2021/12/16 2:19, Robert Sanford wrote:
> - Turn off mbuf pool caching to avoid mbufs lingering in pool caches.
>   At most, we transmit one LACPDU per second, per port.

Could you be more detailed? Why is mbuf pool caching not needed?

> - Fix calculation of ring sizes, taking into account that a ring of
>   size N holds up to N-1 items.

Similarly, why reserve another item?

By the way, I found that the comment for BOND_MODE_8023AX_SLAVE_RX_PKTS is wrong. Could you fix it in this patch?
> Signed-off-by: Robert Sanford <rsanf...@akamai.com>
> ---
>  drivers/net/bonding/rte_eth_bond_8023ad.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> index 43231bc..83d3938 100644
> --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> @@ -1101,9 +1101,7 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>  	}
>
>  	snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
> -	port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc,
> -		RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> -			32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> +	port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc, 0,
>  		0, element_size, socket_id);
>
>  	/* Any memory allocation failure in initialization is critical because
> @@ -1113,19 +1111,23 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>  			slave_id, mem_name, rte_strerror(rte_errno));
>  	}
>
> +	/* Add one extra because ring reserves one. */
>  	snprintf(mem_name, RTE_DIM(mem_name), "slave_%u_rx", slave_id);
>  	port->rx_ring = rte_ring_create(mem_name,
> -		rte_align32pow2(BOND_MODE_8023AX_SLAVE_RX_PKTS), socket_id, 0);
> +		rte_align32pow2(BOND_MODE_8023AX_SLAVE_RX_PKTS + 1),
> +		socket_id, 0);
>
>  	if (port->rx_ring == NULL) {
>  		rte_panic("Slave %u: Failed to create rx ring '%s': %s\n", slave_id,
>  			mem_name, rte_strerror(rte_errno));
>  	}
>
> -	/* TX ring is at least one pkt longer to make room for marker packet. */
> +	/* TX ring is at least one pkt longer to make room for marker packet.
> +	 * Add one extra because ring reserves one.
> +	 */
>  	snprintf(mem_name, RTE_DIM(mem_name), "slave_%u_tx", slave_id);
>  	port->tx_ring = rte_ring_create(mem_name,
> -		rte_align32pow2(BOND_MODE_8023AX_SLAVE_TX_PKTS + 1), socket_id, 0);
> +		rte_align32pow2(BOND_MODE_8023AX_SLAVE_TX_PKTS + 2),
> +		socket_id, 0);
>
>  	if (port->tx_ring == NULL) {
>  		rte_panic("Slave %u: Failed to create tx ring '%s': %s\n", slave_id,
>