On Wed, Oct 16, 2024 at 11:30 AM Aleksandr Loktionov <aleksandr.loktio...@intel.com> wrote: > > Fix a race condition in the i40e driver that leads to MAC/VLAN filters > becoming corrupted and leaking. Address the issue that occurs under > heavy load when multiple threads are concurrently modifying MAC/VLAN > filters by setting mac and port VLAN. > > 1. Thread T0 allocates a filter in i40e_add_filter() within > i40e_ndo_set_vf_port_vlan(). > 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within > i40e_ndo_set_vf_mac(). > 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which > refers to the already freed filter memory, causing corruption.
I think an important detail is missing from the race description. I am guessing it could happen like this: 1. A thread allocates a filter with i40e_add_filter(). 2. i40e_service_task() calls i40e_sync_vsi_filters(), which adds an entry to its local tmp_add_list referencing the filter. It releases vsi->mac_filter_hash_lock before it processes tmp_add_list. 3. A thread frees the filter in __i40e_del_filter(). This is while holding vsi->mac_filter_hash_lock. 4. The service task processes tmp_add_list and dereferences the already freed filter. Do I understand the race right? Michal > Reproduction steps: > 1. Spawn multiple VFs. > 2. Apply a concurrent heavy load by running parallel operations to change > MAC addresses on the VFs and change port VLANs on the host. > 3. Observe errors in dmesg: > "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX, > please set promiscuous on manually for VF XX". > > Exact code for stable reproduction Intel can't open-source now. > > The fix involves implementing a new intermediate filter state, > I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list. > These filters cannot be deleted from the hash list directly but > must be removed using the full process. > > Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC > Address as key") > Signed-off-by: Aleksandr Loktionov <aleksandr.loktio...@intel.com> > --- > v1->v2 change commit title, removed RESERVED state byt request in review > --- > drivers/net/ethernet/intel/i40e/i40e.h | 2 ++ > drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 1 + > drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++++++++++-- > 3 files changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/i40e/i40e.h > b/drivers/net/ethernet/intel/i40e/i40e.h > index 2089a0e..2e205218 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e.h > +++ b/drivers/net/ethernet/intel/i40e/i40e.h > @@ -755,6 +755,8 @@ enum i40e_filter_state { > I40E_FILTER_ACTIVE, /* Added to switch by FW */ > I40E_FILTER_FAILED, /* Rejected by FW */ > I40E_FILTER_REMOVE, /* To be removed */ > + I40E_FILTER_NEW_SYNC, /* New, not sent yet, is in > + i40e_sync_vsi_filters() */ > /* There is no 'removed' state; the filter struct is freed */ > }; > struct i40e_mac_filter { > diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c > b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c > index abf624d..208c2f0 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c > +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c > @@ -89,6 +89,7 @@ static char *i40e_filter_state_string[] = { > "ACTIVE", > "FAILED", > "REMOVE", > + "NEW_SYNC", > }; > > /** > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c > b/drivers/net/ethernet/intel/i40e/i40e_main.c > index 25295ae..55fb362 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c > @@ -1255,6 +1255,7 @@ int i40e_count_filters(struct i40e_vsi *vsi) > > hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) { > if (f->state == I40E_FILTER_NEW || > + f->state == I40E_FILTER_NEW_SYNC || > f->state == I40E_FILTER_ACTIVE) > ++cnt; > } > @@ -1441,6 +1442,8 @@ static int i40e_correct_mac_vlan_filters(struct > i40e_vsi *vsi, > > new->f = add_head; > new->state = add_head->state; > + if (add_head->state == I40E_FILTER_NEW) > + add_head->state = I40E_FILTER_NEW_SYNC; > > /* Add the new filter to the tmp list */ > hlist_add_head(&new->hlist, tmp_add_list); > @@ -1550,6 +1553,8 @@ static int i40e_correct_vf_mac_vlan_filters(struct > i40e_vsi *vsi, > return -ENOMEM; > new_mac->f = add_head; > new_mac->state = add_head->state; > + if (add_head->state == I40E_FILTER_NEW) > + add_head->state = I40E_FILTER_NEW_SYNC; > > /* Add the new filter to the tmp list */ > hlist_add_head(&new_mac->hlist, tmp_add_list); > @@ -2437,7 +2442,8 @@ static int > i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char *vsi_name, > struct i40e_mac_filter *f) > { > - bool enable = f->state == I40E_FILTER_NEW; > + bool enable = f->state == I40E_FILTER_NEW || > + f->state == I40E_FILTER_NEW_SYNC; > struct i40e_hw *hw = &vsi->back->hw; > int aq_ret; > > @@ -2611,6 +2617,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) > > /* Add it to the hash list */ > hlist_add_head(&new->hlist, &tmp_add_list); > + f->state = I40E_FILTER_NEW_SYNC; > } > > /* Count the number of active (current and new) VLAN > @@ -2762,7 +2769,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) > spin_lock_bh(&vsi->mac_filter_hash_lock); > hlist_for_each_entry_safe(new, h, &tmp_add_list, hlist) { > /* Only update the state if we're still NEW */ > - if (new->f->state == I40E_FILTER_NEW) > + if (new->f->state == I40E_FILTER_NEW || > + new->f->state == I40E_FILTER_NEW_SYNC) > new->f->state = new->state; > hlist_del(&new->hlist); > netdev_hw_addr_refcnt(new->f, vsi->netdev, -1); > -- > 2.25.1 >