Hi Olivier,

On 8/23/16, 11:09 AM, "Olivier MATZ" <olivier.matz at 6wind.com> wrote:

    Hi Robert,

    On 08/01/2016 10:42 PM, Robert Sanford wrote:
    > The following log message may appear after a slave is idle (or nearly
    > idle) for a few minutes: "PMD: Failed to allocate LACP packet from
    > pool".
    >
    > Problem: All mbufs from a slave's private pool (used exclusively for
    > transmitting LACPDUs) have been allocated and are still sitting in
    > the device's tx descriptor ring and other cores' mempool caches.
    >
    > Solution: Ensure that each slave's tx (LACPDU) mempool owns more than
    > n-tx-queues * (n-tx-descriptors + per-core-mempool-flush-threshold)
    > mbufs.
    >
    > Note that the LACP tx machine function is the only code that allocates
    > from a slave's private pool. It runs in the context of the interrupt
    > thread, and thus it has no mempool cache of its own.
    >
    > Signed-off-by: Robert Sanford <rsanford at akamai.com>
    > ---
    >   drivers/net/bonding/rte_eth_bond_8023ad.c |   10 +++++++---
    >   1 files changed, 7 insertions(+), 3 deletions(-)
    >
    > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
    > index 2f7ae70..1207896 100644
    > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
    > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
    > @@ -854,6 +854,8 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
    >           char mem_name[RTE_ETH_NAME_MAX_LEN];
    >           int socket_id;
    >           unsigned element_size;
    > +         unsigned cache_size;
    > +         unsigned cache_flushthresh;
    >           uint32_t total_tx_desc;
    >           struct bond_tx_queue *bd_tx_q;
    >           uint16_t q_id;
    > @@ -890,19 +892,21 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
    >
    >           element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf)
    >                                   + RTE_PKTMBUF_HEADROOM;
    > +         cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
    > +                 32 : RTE_MEMPOOL_CACHE_MAX_SIZE;
    > +         cache_flushthresh = RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);
    >
    >           /* The size of the mempool should be at least:
    >            * the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */
    >           total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS;
    >           for (q_id = 0; q_id < bond_dev->data->nb_tx_queues; q_id++) {
    >                   bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id];
    > -                 total_tx_desc += bd_tx_q->nb_tx_desc;
    > +                 total_tx_desc += bd_tx_q->nb_tx_desc + cache_flushthresh;
    >           }
    >
    >           snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
    >           port->mbuf_pool = rte_mempool_create(mem_name,
    > -                 total_tx_desc, element_size,
    > -                 RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
    > +                 total_tx_desc, element_size, cache_size,
    >                   sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
    >                   NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD);
    >
    >

    I'm not very familiar with bonding code, so maybe your patch is correct. 
    I think the size of the mempool should be:

       BOND_MODE_8023AX_SLAVE_TX_PKTS +
         n_cores * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size)

    With n_cores = number of cores that can dequeue from the mempool.

    The safest thing to do would be to have n_cores = RTE_MAX_LCORE. I don't 
    know if bond_dev->data->nb_tx_queue corresponds to this definition, if 
    yes you can ignore my comment ;)


    Regards,
    Olivier

--

Only one, non-EAL thread dequeues from a per-slave, private (LACPDU) mempool. 
The thread calls mbuf-alloc in the context of an rte_eal_alarm_set( ) callback 
function, and enqueues the mbuf to a multi-consumer ring.

The consumer(s) of the mbufs can be any core(s) that invoke the bond PMD's 
tx-burst function. As we know, mbufs can sit around on TX descriptor rings long 
after we transmit them, often until we need to reuse the descriptors for 
subsequent TX packets. When the TX cores finally free some of the mbufs, some 
of the mbufs are left in a pool's per-core cache. So, the maximum number of 
mbufs that may be lying around doing nothing is the "done" mbufs in the TX 
desc ring plus those in the per-core cache, times the number of TX cores, or 
approximately nb_tx_queues * (nb_tx_desc + CACHE_FLUSHTHRESH(cache_size)). (I 
probably should also update the comment above the for-loop that calculates 
total_tx_desc.)

When we include normal TX mbufs and private LACPDU mbufs displacing each other 
in the TX desc ring, there are many (if not infinite) possibilities. I wrote a 
program to simulate all this interaction, trying different values for 
cache_size, nb_tx_desc, etc., because it took a long time to run out of LACPDU 
mbufs (or it never happened) just using long-interval pings.

Anyway, I hope that this is a nice, long way of saying that I will ignore your 
comment. ;-)


Another idea, related to mempools (but not this LACP patch) ...
Back when I first investigated the LACP pool problem, I got an idea related to 
the mempool per-core caches. It seems like a waste that in some usage patterns 
(like above), a core may "put" objects to a pool, but rarely/never "get" 
objects from the pool, the result being that it rarely/never reuses objects 
stuck in the per-core cache. We could recognize this kind of behavior by adding 
a small counter to the rte_mempool_cache struct. During a put, we increment the 
counter, and during a get, we reset it to zero. If, during a put that is 
going to flush some of the objects to the global part of the pool, the counter 
is at or above some threshold (8, 16, 32, or whatever), then we flush ALL of 
the objects, since this core hasn't been using them. What do you think?
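
To make the idea concrete, here is a toy model of that heuristic, detached 
from DPDK entirely. The struct, the FLUSH_LEN/IDLE_LIMIT constants, and the 
keep-half policy on a normal flush are all invented for illustration; only the 
puts_since_get counter is the proposal itself:

```c
#include <string.h>

#define FLUSH_LEN   32   /* hypothetical cache flush threshold */
#define IDLE_LIMIT  16   /* puts with no intervening get => flush everything */

struct toy_cache {
    unsigned len;            /* objects currently in the per-core cache */
    unsigned puts_since_get; /* the proposed counter */
    unsigned flushed;        /* objects returned to the global pool */
};

static void cache_put(struct toy_cache *c)
{
    c->len++;
    c->puts_since_get++;
    if (c->len >= FLUSH_LEN) {
        /* Normally keep some objects for reuse; but if this core has
         * not done a get in a while, it is put-only, so flush ALL. */
        unsigned keep = (c->puts_since_get >= IDLE_LIMIT) ? 0 : FLUSH_LEN / 2;
        c->flushed += c->len - keep;
        c->len = keep;
    }
}

static int cache_get(struct toy_cache *c)
{
    c->puts_since_get = 0;   /* any get marks the cache as "in use" */
    if (c->len == 0)
        return -1;           /* would fall back to the global pool */
    c->len--;
    return 0;
}
```

A core that only puts (like a bond TX core freeing LACPDU mbufs) trips the 
IDLE_LIMIT check and drains its cache completely on each flush, so those mbufs 
make it back to the global pool instead of idling in the cache.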


Regards,
Robert

