March 2, 2026 at 16:10, "Sebastian Andrzej Siewior" <[email protected] 
mailto:[email protected]?to=%22Sebastian%20Andrzej%20Siewior%22%20%3Cbigeasy%40linutronix.de%3E
 > wrote:


> 
> On 2026-02-28 03:36:24 [+0000], Jiayuan Chen wrote:
> 
> > 
> > My only concern is that this will waste a percpu u32 per bond
> >  device for the majority of bonding use cases (which use modes other than
> >  balance-rr), which could be a few hundred bytes on a large machine.
> >  
> >  Does everything work reliably if the rr_tx_counter allocation
> >  happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
> >  well as in bond_option_mode_set?
> > 
> …
> 
> > 
> > An alternative would be to allocate conditionally in bond_init() (since the 
> > default mode is round-robin)
> >  and manage allocation/deallocation in bond_option_mode_set() when the mode 
> > changes.
> > 
> This sounds reasonable.
> 
> > 
> > This is a trade-off between the added complexity of conditional alloc/free 
> > across multiple code
> >  paths and saving a per-CPU u32 for non-round-robin bonds.
> >  
> >  For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond 
> > device — and machines with
> >  that many CPUs tend to have plenty of memory to match.
> > 
> 4 bytes is the minimum allocation for per-CPU memory. The memory is
> already "there" it is just not assigned. So for the 4 byte allocation it
> is needed to find a single area (the smallest allocation size).
> In case there no free block, a new block will be allocated and mapped
> for each CPU which the part that costs memory.
> That said, we should not waste memory but it is not _that_ expensive
> either for a bond device. Things change if here are hundreds of devices.


Hi Jay, Sebastian,

Sorry, the conditional alloc/free approach in bond_option_mode_set()
I suggested earlier doesn't work well on closer inspection.

Since bond_option_mode_set() requires the bond to be down, and as this
bug shows, the XDP redirect path can still reach a downed bond device,
we need to carefully order the operations:

when switching to RR, alloc rr_tx_counter before setting mode;
when switching away, set mode before freeing rr_tx_counter.

Strictly speaking, this also requires smp_wmb()/smp_rmb() pairs to
guarantee the ordering is visible to other CPUs — the write side in
bond_option_mode_set() and the read side in the XDP path.

In practice the race window is very small and unlikely to trigger, but
leaving out the barriers would look incorrect, and adding them to the
XDP hot path feels wrong for saving only 4 bytes per CPU per bond device.


smp_wmb()/smp_rmb() (no test): 

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
        struct list_head *iter;
        struct slave *slave;

-     if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
-             bond->rr_tx_counter = alloc_percpu(u32);
-             if (!bond->rr_tx_counter)
-                     return -ENOMEM;
-     }
-
        /* reset slave->backup and slave->inactive */
        if (bond_has_slaves(bond)) {
                        bond_for_each_slave(bond, slave, iter) {
@@ -5532,6 +5526,8 @@ bond_xdp_get_xmit_slave(struct net_device *bond_dev, 
struct xdp_buff *xdp)

        switch (BOND_MODE(bond)) {
        case BOND_MODE_ROUNDROBIN:
+             /* Pairs with smp_wmb() in bond_option_mode_set() */
+             smp_rmb();
                        slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
                        break;

@@ -6411,6 +6407,14 @@ static int bond_init(struct net_device *bond_dev)
        if (!bond->wq)
                        return -ENOMEM;

+     /* Default mode is round-robin, allocate rr_tx_counter for it.
+      * For mode changes, bond_option_mode_set() manages the lifecycle.
+      */
+     bond->rr_tx_counter = alloc_percpu(u32);
+     if (!bond->rr_tx_counter) {
+             destroy_workqueue(bond->wq);
+             return -ENOMEM;
+     }
+
        bond->notifier_ctx = false;

diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -918,7 +918,27 @@ static int bond_option_mode_set(struct bonding *bond,
        /* don't cache arp_validate between modes */
        bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
-     bond->params.mode = newval->value;
+
+     if (newval->value == BOND_MODE_ROUNDROBIN) {
+             /* Switching to round-robin: allocate before setting mode,
+              * so XDP path seeing BOND_MODE_ROUNDROBIN always finds
+              * rr_tx_counter allocated.
+              */
+             if (!bond->rr_tx_counter) {
+                     bond->rr_tx_counter = alloc_percpu(u32);
+                     if (!bond->rr_tx_counter)
+                             return -ENOMEM;
+             }
+             /* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+             smp_wmb();
+             bond->params.mode = newval->value;
+     } else {
+             /* Switching away: set mode first so XDP no longer
+              * enters RR branch before we free rr_tx_counter.
+              */
+             bond->params.mode = newval->value;
+             /* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+             smp_wmb();
+             free_percpu(bond->rr_tx_counter);
+             bond->rr_tx_counter = NULL;
+     }

Thanks,
Jiayuan


> > 
> > Thanks
> >  
> >  -J
> > 
> Sebastian
>

Reply via email to