On 4/9/25 17:41, Cosmin Ratiu wrote:
> This patch series was motivated by fixing a few bugs in the bonding
> driver related to xfrm state migration on device failover.
>
> struct xfrm_dev_offload has two net_device pointers: dev and real_dev.
> The first one is the device the xfrm_state is offloaded on and the
> second one is used by the bonding driver to manage the underlying device
> xfrm_states are actually offloaded on. When bonding isn't used, the two
> pointers are the same.
>
> This causes confusion in drivers: Which device pointer should they use?
> If they want to support bonding, they need to only use real_dev and
> never look at dev.
>
> Furthermore, real_dev is used without proper locking from multiple code
> paths and changing it is dangerous. See commit [1] for example.
>
> This patch series clears things out by removing all uses of real_dev
> from outside the bonding driver.
> Then, the bonding driver is refactored to fix a couple of long standing
> races and the original bug which motivated this patch series.
>
> [1] commit f8cde9805981 ("bonding: fix xfrm real_dev null pointer
> dereference")
>
> v1 -> v2:
> Added missing kdoc for various functions.
> Made bond_ipsec_del_sa() use xso.real_dev instead of curr_active_slave.
>
> Cosmin Ratiu (6):
> Cleaning up unnecessary uses of xso.real_dev:
> net/mlx5: Avoid using xso.real_dev unnecessarily
> xfrm: Use xdo.dev instead of xdo.real_dev
> xfrm: Remove unneeded device check from validate_xmit_xfrm
> Refactoring device operations to get an explicit device pointer:
> xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
> Fixing a bonding xfrm state migration bug:
> bonding: Mark active offloaded xfrm_states
> Fixing long standing races in bonding:
> bonding: Fix multiple long standing offload races
>
> Documentation/networking/xfrm_device.rst | 10 +-
> drivers/net/bonding/bond_main.c | 113 +++++++++---------
> .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 ++--
> .../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++-
> .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 41 ++++---
> drivers/net/ethernet/intel/ixgbevf/ipsec.c | 21 ++--
> .../marvell/octeontx2/nic/cn10k_ipsec.c | 18 +--
> .../mellanox/mlx5/core/en_accel/ipsec.c | 28 ++---
> .../mellanox/mlx5/core/en_accel/ipsec.h | 1 +
> .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +-
> drivers/net/netdevsim/ipsec.c | 15 ++-
> include/linux/netdevice.h | 10 +-
> include/net/xfrm.h | 8 ++
> net/xfrm/xfrm_device.c | 13 +-
> net/xfrm/xfrm_state.c | 16 +--
> 15 files changed, 182 insertions(+), 161 deletions(-)
>
Thank you for following up on this. It's definitely not getting cleaner with a
bond
ptr in xfrm protected by two locks in different drivers, but it should do
AFAICT.
In case there is another version I'd add a big comment of the locking
expectations
for real_dev, and maybe in the future it can fully move to the bonding as well.
For the set:
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Cheers,
Nik