Re: [Intel-wired-lan] [PATCH iwl-net v2 1/2] ice: Fix entering Safe Mode

2024-09-23 Thread Marcin Szycik



On 20.09.2024 19:14, Brett Creeley wrote:
> 
> 
> On 9/20/2024 9:59 AM, Marcin Szycik wrote:
>> Caution: This message originated from an External Source. Use proper caution 
>> when opening attachments, clicking links, or responding.
>>
>>
>> If DDP package is missing or corrupted, the driver should enter Safe Mode.
>> Instead, an error is returned and probe fails.
>>
>> Don't check return value of ice_init_ddp_config() to fix this.
>>
>> Change ice_init_ddp_config() type to void, as now its return is never
>> checked.
>>
>> Repro:
>> * Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
>> * Load ice
>>
>> Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
>> Reviewed-by: Przemek Kitszel 
>> Signed-off-by: Marcin Szycik 
>> ---
>> v2: Change ice_init_ddp_config() type to void
>> ---
>>   drivers/net/ethernet/intel/ice/ice_main.c | 15 +++
>>   1 file changed, 3 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
>> b/drivers/net/ethernet/intel/ice/ice_main.c
>> index 0f5c9d347806..aeebf4ae25ae 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>> @@ -4548,34 +4548,27 @@ ice_init_tx_topology(struct ice_hw *hw, const struct 
>> firmware *firmware)
>>    *
>>    * This function loads DDP file from the disk, then initializes Tx
>>    * topology. At the end DDP package is loaded on the card.
>> - *
>> - * Return: zero when init was successful, negative values otherwise.
>>    */
>> -static int ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf)
>> +static void ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf)
>>   {
>>  struct device *dev = ice_pf_to_dev(pf);
>>  const struct firmware *firmware = NULL;
>>  int err;
>>
>>  err = ice_request_fw(pf, &firmware);
>> -   if (err) {
>> +   if (err)
>>  dev_err(dev, "Fail during requesting FW: %d\n", err);
>> -   return err;
>> -   }
>>
>>  err = ice_init_tx_topology(hw, firmware);
>>  if (err) {
>>  dev_err(dev, "Fail during initialization of Tx topology: 
>> %d\n",
>>  err);
>>  release_firmware(firmware);
>> -   return err;
>>  }
>>
>>  /* Download firmware to device */
>>  ice_load_pkg(firmware, pf);
>>  release_firmware(firmware);
>> -
>> -   return 0;
>>   }
>>
>>   /**
>> @@ -4748,9 +4741,7 @@ int ice_init_dev(struct ice_pf *pf)
>>
>>  ice_init_feature_support(pf);
>>
>> -   err = ice_init_ddp_config(hw, pf);
>> -   if (err)
>> -   return err;
>> +   ice_init_ddp_config(hw, pf);
> 
> I just commented this on v1 as I didn't expect it to be resent. I'm also okay 
> with Maciej's suggestion, but I wanted to offer an alternative option.
> 
> As an alternative solution you could potentially do the following, which
> would make the flow more readable:
> 
> err = ice_init_ddp_config(hw, pf);
> if (err || ice_is_safe_mode(pf))
>    ice_set_safe_mode_caps(hw);

This sounds reasonable, I'll change it if there will be no more comments.

> Also, should there be some sort of messaging if the device goes into
> safe mode? I wonder if a dev_dbg() would be better than nothing. If 
> ice_init_ddp_config() fails, then it will print an error message, so maybe a 
> dev_warn/info() is warranted if (err)? Of course this would depend on 
> ice_init_ddp_config() to return a non-void value.

ice_request_fw() already prints a dev_err() message when entering safe mode, so 
I don't think it's necessary here.

Thanks,
Marcin
> 
> Thanks,
> 
> Brett
> 
>>
>>  /* if ice_init_ddp_config fails, ICE_FLAG_ADV_FEATURES bit won't be
>>   * set in pf->state, which will cause ice_is_safe_mode to return
>> -- 
>> 2.45.0
>>
>>



[Intel-wired-lan] [PATCH iwl-net v1] i40e: Fix macvlan leak by synchronizing access to mac_filter_hash

2024-09-23 Thread Aleksandr Loktionov
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.

To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.

Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.

Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.

This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.

Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski 
Signed-off-by: Aleksandr Loktionov 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 03205eb..25295ae 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1734,6 +1734,7 @@ struct i40e_mac_filter *i40e_add_mac_filter(struct 
i40e_vsi *vsi,
struct hlist_node *h;
int bkt;
 
+   lockdep_assert_held(&vsi->mac_filter_hash_lock);
if (vsi->info.pvid)
return i40e_add_filter(vsi, macaddr,
   le16_to_cpu(vsi->info.pvid));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 662622f..dfa785e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2213,8 +2213,10 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf 
*vf, u8 *msg)
vfres->vsi_res[0].qset_handle
  = le16_to_cpu(vsi->info.qs_handle[0]);
if (!(vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO) && 
!vf->pf_set_mac) {
+   spin_lock_bh(&vsi->mac_filter_hash_lock);
i40e_del_mac_filter(vsi, vf->default_lan_addr.addr);
eth_zero_addr(vf->default_lan_addr.addr);
+   spin_unlock_bh(&vsi->mac_filter_hash_lock);
}
ether_addr_copy(vfres->vsi_res[0].default_mac_addr,
vf->default_lan_addr.addr);
-- 
2.25.1



Re: [Intel-wired-lan] [PATCH] ice: Unbind the workqueue

2024-09-23 Thread Przemek Kitszel

On 9/23/24 00:24, Frederic Weisbecker wrote:

The ice workqueue doesn't seem to rely on any CPU locality and should
therefore be able to run on any CPU. In practice this is already
happening through the unbound ice_service_timer that may fire anywhere
and queue the workqueue accordingly to any CPU.

Make this official so that the ice workqueue is only ever queued to
housekeeping CPUs on nohz_full.

Signed-off-by: Frederic Weisbecker 
---
  drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index ea780d468579..70990f42ac05 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5924,7 +5924,7 @@ static int __init ice_module_init(void)
  
  	ice_adv_lnk_speed_maps_init();
  
-	ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME);

+   ice_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, KBUILD_MODNAME);
if (!ice_wq) {
pr_err("Failed to create workqueue\n");
return status;


Thank you for the patch, it would make sense for our iwl-next tree,
with such assumption:
Reviewed-by: Przemek Kitszel 

@Tony, do you want it resent with target tree in the subject?


Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock

2024-09-23 Thread Przemek Kitszel

On 9/21/24 14:52, Paul Menzel wrote:

Dear Wander,


Thank you for your patch.

Am 20.09.24 um 20:59 schrieb Wander Lairson Costa:

tx_queue_lock and stats_lock are declared and initialized, but never
used. Remove them.

Signed-off-by: Wander Lairson Costa 


It’d be great if you added a Fixes: tag.


Alternatively you could split this series into two, and send this patch
to iwl-next tree, without the fixes tag. For me this patch is just
a cleanup, not a fix.





[...]



With that addressed:

Reviewed-by: Paul Menzel 


Kind regards,

Paul




Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock

2024-09-23 Thread Wander Lairson Costa
On Mon, Sep 23, 2024 at 6:04 AM Przemek Kitszel
 wrote:
>
> On 9/21/24 14:52, Paul Menzel wrote:
> > Dear Wander,
> >
> >
> > Thank you for your patch.
> >
> > Am 20.09.24 um 20:59 schrieb Wander Lairson Costa:
> >> tx_queue_lock and stats_lock are declared and initialized, but never
> >> used. Remove them.
> >>
> >> Signed-off-by: Wander Lairson Costa 
> >
> > It’d be great if you added a Fixes: tag.
>
> Alternatively you could split this series into two, and send this patch
> to iwl-next tree, without the fixes tag. For me this patch is just
> a cleanup, not a fix.
>
> >
>

Should I send a new version of the patches separately?

> [...]
>
> >
> > With that addressed:
> >
> > Reviewed-by: Paul Menzel 
> >
> >
> > Kind regards,
> >
> > Paul
>



Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

2024-09-23 Thread Jason Gunthorpe
On Fri, Sep 20, 2024 at 02:14:02PM +0800, Yunsheng Lin wrote:

> I am not sure what dose the API that allows netdev to "give" struct device
> to page_pool look like or how to implement the API yet, but the obvious way
> to stall the calling of device_del() is to wait for the inflight
> page to

It is not device_del() you need to stall, but the remove() function of
the device driver.

Once all drivers have been unbound the DMA API can be reconfigured and
all existing DMA mappings must be concluded before this happens,
otherwise there will be problems.

So, stalling something like unregister_netdevice() would be a better
target - though stalling forever on driver unbind would not be
acceptable.

Jason


Re: [Intel-wired-lan] ice: Use common error handling code in two functions

2024-09-23 Thread Tony Nguyen




On 9/20/2024 12:05 AM, Markus Elfring wrote:

Add jump targets so that a bit of exception handling can be better reused
at the end of two function implementations.


Thank you for contribution, the change is fine,


Thanks for this positive feedback.



 but not as a bugfix.


Would you like to qualify my update suggestion as a correction for
a coding style issue?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v6.11#n526


It can go to -next to correct the coding style, however, for -net it 
should be user visible bugs.



Please send as a [iwl-next], when the submission window opens.


Will a patch resend really be needed for the proposed adjustment?


I'll go ahead and apply this to iwl-next without a re-send, but please 
keep Przemek's comments in mind for future submissions.


Thanks,
Tony


Re: [Intel-wired-lan] [PATCH] ice: Unbind the workqueue

2024-09-23 Thread Tony Nguyen




On 9/23/2024 1:57 AM, Przemek Kitszel wrote:

On 9/23/24 00:24, Frederic Weisbecker wrote:

The ice workqueue doesn't seem to rely on any CPU locality and should
therefore be able to run on any CPU. In practice this is already
happening through the unbound ice_service_timer that may fire anywhere
and queue the workqueue accordingly to any CPU.

Make this official so that the ice workqueue is only ever queued to
housekeeping CPUs on nohz_full.

Signed-off-by: Frederic Weisbecker 
---
  drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c

index ea780d468579..70990f42ac05 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5924,7 +5924,7 @@ static int __init ice_module_init(void)
  ice_adv_lnk_speed_maps_init();
-    ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME);
+    ice_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, KBUILD_MODNAME);
  if (!ice_wq) {
  pr_err("Failed to create workqueue\n");
  return status;


Thank you for the patch, it would make sense for our iwl-next tree,
with such assumption:
Reviewed-by: Przemek Kitszel 

@Tony, do you want it resent with target tree in the subject?


No, I can apply this as-is but please remember to designate a tree for 
future patches.


Thanks,
Tony


Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

2024-09-23 Thread Yunsheng Lin
On 2024/9/19 18:54, Yunsheng Lin wrote:
> On 2024/9/19 1:06, Ilias Apalodimas wrote:
>> Hi Yunsheng,
>>
>> Thanks for looking into this!
>>
>> On Wed, 18 Sept 2024 at 14:24, Yunsheng Lin  wrote:
>>>
>>> Networking driver with page_pool support may hand over page
>>> still with dma mapping to network stack and try to reuse that
>>> page after network stack is done with it and passes it back
>>> to page_pool to avoid the penalty of dma mapping/unmapping.
>>
>> I think you can shorten this to "If recycling and DMA mapping are
>> enabled during the pool creation"
> 
> I am not sure if I understand the 'recycling' part here. Is the
> 'recycling' part referring to whether skb_mark_for_recycle() is
> called to enable recycling for the skb? Is there still any driver
> with page_pool support but doesn't call skb_mark_for_recycle()
> when handing over page to network stack?
> 
> For the 'DMA mapping' part, as there is no space in 'struct
> page' to track the inflight pages, so 'pp' in 'struct page'
> is renamed to 'pp_item' to enable the tracking of inflight
> page. I tried shortening this for 'pool->dma_map being false'
> when coding, but it seems differentiating the same field in
> 'struct page' doesn't make much sense according to 'pool->dma_map'
> as it means we might need to add an union in 'struct page' for
> that to work and add additional checking to decide if it is 'pp'
> or 'pp_item'.
> 
>>
>>> With all the caching in the network stack, some pages may be
>>> held in the network stack without returning to the page_pool
>>> soon enough, and with VF disable causing the driver unbound,
>>> the page_pool does not stop the driver from doing it's
>>> unbounding work, instead page_pool uses workqueue to check
>>> if there is some pages coming back from the network stack
>>> periodically, if there is any, it will do the dma unmmapping
>>> related cleanup work.
>>>
>>> As mentioned in [1], attempting DMA unmaps after the driver
>>> has already unbound may leak resources or at worst corrupt
>>> memory. Fundamentally, the page pool code cannot allow DMA
>>> mappings to outlive the driver they belong to.
>>>
>>> Currently it seems there are at least two cases that the page
>>> is not released fast enough causing dma unmmapping done after
>>> driver has already unbound:
>>> 1. ipv4 packet defragmentation timeout: this seems to cause
>>>delay up to 30 secs:
>>>
>>> 2. skb_defer_free_flush(): this may cause infinite delay if
>>>there is no triggering for net_rx_action().
>>>
>>> In order not to do the dma unmmapping after driver has already
>>> unbound and stall the unloading of the networking driver, add
>>> the pool->items array to record all the pages including the ones
>>> which are handed over to network stack, so the page_pool can
>>> do the dma unmmapping for those pages when page_pool_destroy()
>>> is called.
>>
>> So, I was thinking of a very similar idea. But what do you mean by
>> "all"? The pages that are still in caches (slow or fast) of the pool
>> will be unmapped during page_pool_destroy().
> 
> Yes, it includes the one in pool->alloc and pool->ring.

It worths mentioning that there is a semantics changing here:
Before this patch, there can be almost unlimited inflight pages used by
driver and network stack, as page_pool doesn't really track those pages.
After this patch, as we use a fixed-size pool->items array to track the
inflight pages, the inflight pages is limited by the pool->items, currently
the size of pool->items array is calculated as below in this patch:

+#define PAGE_POOL_MIN_ITEM_CNT 512
+   unsigned int item_cnt = (params->pool_size ? : 1024) +
+   PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT;

Personally I would consider it is an advantage to limit how many pages which
are used by the driver and network stack, the problem seems to how to decide
the limited number of page used by network stack so that performance is not
impacted.

> 
>> Don't we 'just' need a list of the inflight packets and their pages or
>> fragments? What we could do is go through that list and unmap these
>> pages during page_pool_destroy().
> 
> The main reason for that is to avoid the overhead of page_pool_item_del()
> and page_pool_item_add() when allocing/freeing page from/to pool->alloc
> and pool->ring.
> 
> Yes, including the pages in pool->ring seems to make the pool->ring
> somewhat duplicated, maybe we can remove pool->ring if we can make
> and prove 'pool->items' is performing better than pool->ring in the
> future?
> 
>>
>> I'll have a closer look at the patch tomorrow
> 
> Thanks for the reviewing.
> 
>>
>> Thanks!
>> /Ilias
>>
> 


Re: [Intel-wired-lan] [PATCH 1/2] igb: Disable threaded IRQ for igb_msix_other

2024-09-23 Thread Przemek Kitszel

On 9/20/24 20:59, Wander Lairson Costa wrote:

During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.

The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx.

To resolve this, we set IRQF_NO_THREAD for the affected interrupt,
ensuring that the kernel handles it immediately, thereby preventing
the aforementioned error.

Reproducer:

 #!/bin/bash

 # echo 2 > /sys/class/net/ens14f0/device/sriov_numvfs
 ipaddr_vlan=3
 nic_test=ens14f0
 vf=${nic_test}v0

 while true; do
ip link set ${nic_test} mtu 1500
ip link set ${vf} mtu 1500
ip link set $vf up
ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
if ! ip link show $vf | grep 'state UP'; then
echo 'Error found'
break
fi
ip link set $vf down
 done

Signed-off-by: Wander Lairson Costa 
Reported-by: Yuying Ma 
---
  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 1ef4cb871452..8a1696d7289f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -907,7 +907,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
int i, err = 0, vector = 0, free_vector = 0;
  
  	err = request_irq(adapter->msix_entries[vector].vector,

- igb_msix_other, 0, netdev->name, adapter);
+ igb_msix_other, IRQF_NO_THREAD, netdev->name, 
adapter);
if (err)
goto err_out;
  


Thank you for small, localized fix with a good description.
Our VAL will check it also on non-RT OS.
Reviewed-by: Przemek Kitszel 

PS: for future intel ethernet submissions please split out fixes and
refactors, and tag each commit with the [iwl-net] or [iwl-next] tags


Re: [Intel-wired-lan] [PATCH] igb: Do not bring the device up after non-fatal error

2024-09-23 Thread Jacob Keller



On 9/23/2024 2:22 PM, Mohamed Khalfella wrote:
> Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
> changed igb_io_error_detected() to ignore non-fatal pcie errors in order
> to avoid hung task that can happen when igb_down() is called multiple
> times. This caused an issue when processing transient non-fatal errors.
> igb_io_resume(), which is called after igb_io_error_detected(), assumes
> that device is brought down by igb_io_error_detected() if the interface
> is up. This resulted in panic with stacktrace below.
> 
> [ T3256] igb :09:00.0 haeth0: igb: haeth0 NIC Link is Down
> [  T292] pcieport :00:1c.5: AER: Uncorrected (Non-Fatal) error received: 
> :09:00.0
> [  T292] igb :09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), 
> type=Transaction Layer, (Requester ID)
> [  T292] igb :09:00.0:   device [8086:1537] error 
> status/mask=4000/
> [  T292] igb :09:00.0:[14] CmpltTO [  200.105524,009][  T292] igb 
> :09:00.0: AER:   TLP Header:    
> [  T292] pcieport :00:1c.5: AER: broadcast error_detected message
> [  T292] igb :09:00.0: Non-correctable non-fatal error reported.
> [  T292] pcieport :00:1c.5: AER: broadcast mmio_enabled message
> [  T292] pcieport :00:1c.5: AER: broadcast resume message
> [  T292] [ cut here ]
> [  T292] kernel BUG at net/core/dev.c:6539!
> [  T292] invalid opcode:  [#1] PREEMPT SMP
> [  T292] RIP: 0010:napi_enable+0x37/0x40
> [  T292] Call Trace:
> [  T292]  
> [  T292]  ? die+0x33/0x90
> [  T292]  ? do_trap+0xdc/0x110
> [  T292]  ? napi_enable+0x37/0x40
> [  T292]  ? do_error_trap+0x70/0xb0
> [  T292]  ? napi_enable+0x37/0x40
> [  T292]  ? napi_enable+0x37/0x40
> [  T292]  ? exc_invalid_op+0x4e/0x70
> [  T292]  ? napi_enable+0x37/0x40
> [  T292]  ? asm_exc_invalid_op+0x16/0x20
> [  T292]  ? napi_enable+0x37/0x40
> [  T292]  igb_up+0x41/0x150
> [  T292]  igb_io_resume+0x25/0x70
> [  T292]  report_resume+0x54/0x70
> [  T292]  ? report_frozen_detected+0x20/0x20
> [  T292]  pci_walk_bus+0x6c/0x90
> [  T292]  ? aer_print_port_info+0xa0/0xa0
> [  T292]  pcie_do_recovery+0x22f/0x380
> [  T292]  aer_process_err_devices+0x110/0x160
> [  T292]  aer_isr+0x1c1/0x1e0
> [  T292]  ? disable_irq_nosync+0x10/0x10
> [  T292]  irq_thread_fn+0x1a/0x60
> [  T292]  irq_thread+0xe3/0x1a0
> [  T292]  ? irq_set_affinity_notifier+0x120/0x120
> [  T292]  ? irq_affinity_notify+0x100/0x100
> [  T292]  kthread+0xe2/0x110
> [  T292]  ? kthread_complete_and_exit+0x20/0x20
> [  T292]  ret_from_fork+0x2d/0x50
> [  T292]  ? kthread_complete_and_exit+0x20/0x20
> [  T292]  ret_from_fork_asm+0x11/0x20
> [  T292]  
> 
> To fix this issue igb_io_resume() checks if the interface is running and
> the device is not down this means igb_io_error_detected() did not bring
> the device down and there is no need to bring it up.
> 
> Signed-off-by: Mohamed Khalfella 
> Reviewed-by: Yuanyuan Zhong
> Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 1ef4cb871452..8c6bc3db9a3d 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -9651,6 +9651,10 @@ static void igb_io_resume(struct pci_dev *pdev)
>   struct igb_adapter *adapter = netdev_priv(netdev);
>  
>   if (netif_running(netdev)) {
> + if (!test_bit(__IGB_DOWN, &adapter->state)) {
> + dev_info(&pdev->dev, "Resuming from non-fatal error, do 
> nothing.\n");
> + return;

I'm not sure this needs to be a dev_info.

Reviewed-by: Jacob Keller 


Re: [Intel-wired-lan] [PATCH] igb: Do not bring the device up after non-fatal error

2024-09-23 Thread Jacob Keller



On 9/23/2024 4:11 PM, Jacob Keller wrote:
> 
> 
> On 9/23/2024 2:22 PM, Mohamed Khalfella wrote:
>> Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
>> changed igb_io_error_detected() to ignore non-fatal pcie errors in order
>> to avoid hung task that can happen when igb_down() is called multiple
>> times. This caused an issue when processing transient non-fatal errors.
>> igb_io_resume(), which is called after igb_io_error_detected(), assumes
>> that device is brought down by igb_io_error_detected() if the interface
>> is up. This resulted in panic with stacktrace below.
>>
>> [ T3256] igb :09:00.0 haeth0: igb: haeth0 NIC Link is Down
>> [  T292] pcieport :00:1c.5: AER: Uncorrected (Non-Fatal) error received: 
>> :09:00.0
>> [  T292] igb :09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), 
>> type=Transaction Layer, (Requester ID)
>> [  T292] igb :09:00.0:   device [8086:1537] error 
>> status/mask=4000/
>> [  T292] igb :09:00.0:[14] CmpltTO [  200.105524,009][  T292] igb 
>> :09:00.0: AER:   TLP Header:    
>> [  T292] pcieport :00:1c.5: AER: broadcast error_detected message
>> [  T292] igb :09:00.0: Non-correctable non-fatal error reported.
>> [  T292] pcieport :00:1c.5: AER: broadcast mmio_enabled message
>> [  T292] pcieport :00:1c.5: AER: broadcast resume message
>> [  T292] [ cut here ]
>> [  T292] kernel BUG at net/core/dev.c:6539!
>> [  T292] invalid opcode:  [#1] PREEMPT SMP
>> [  T292] RIP: 0010:napi_enable+0x37/0x40
>> [  T292] Call Trace:
>> [  T292]  
>> [  T292]  ? die+0x33/0x90
>> [  T292]  ? do_trap+0xdc/0x110
>> [  T292]  ? napi_enable+0x37/0x40
>> [  T292]  ? do_error_trap+0x70/0xb0
>> [  T292]  ? napi_enable+0x37/0x40
>> [  T292]  ? napi_enable+0x37/0x40
>> [  T292]  ? exc_invalid_op+0x4e/0x70
>> [  T292]  ? napi_enable+0x37/0x40
>> [  T292]  ? asm_exc_invalid_op+0x16/0x20
>> [  T292]  ? napi_enable+0x37/0x40
>> [  T292]  igb_up+0x41/0x150
>> [  T292]  igb_io_resume+0x25/0x70
>> [  T292]  report_resume+0x54/0x70
>> [  T292]  ? report_frozen_detected+0x20/0x20
>> [  T292]  pci_walk_bus+0x6c/0x90
>> [  T292]  ? aer_print_port_info+0xa0/0xa0
>> [  T292]  pcie_do_recovery+0x22f/0x380
>> [  T292]  aer_process_err_devices+0x110/0x160
>> [  T292]  aer_isr+0x1c1/0x1e0
>> [  T292]  ? disable_irq_nosync+0x10/0x10
>> [  T292]  irq_thread_fn+0x1a/0x60
>> [  T292]  irq_thread+0xe3/0x1a0
>> [  T292]  ? irq_set_affinity_notifier+0x120/0x120
>> [  T292]  ? irq_affinity_notify+0x100/0x100
>> [  T292]  kthread+0xe2/0x110
>> [  T292]  ? kthread_complete_and_exit+0x20/0x20
>> [  T292]  ret_from_fork+0x2d/0x50
>> [  T292]  ? kthread_complete_and_exit+0x20/0x20
>> [  T292]  ret_from_fork_asm+0x11/0x20
>> [  T292]  
>>
>> To fix this issue igb_io_resume() checks if the interface is running and
>> the device is not down this means igb_io_error_detected() did not bring
>> the device down and there is no need to bring it up.
>>
>> Signed-off-by: Mohamed Khalfella 
>> Reviewed-by: Yuanyuan Zhong
>> Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
>> b/drivers/net/ethernet/intel/igb/igb_main.c
>> index 1ef4cb871452..8c6bc3db9a3d 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -9651,6 +9651,10 @@ static void igb_io_resume(struct pci_dev *pdev)
>>  struct igb_adapter *adapter = netdev_priv(netdev);
>>  
>>  if (netif_running(netdev)) {
>> +if (!test_bit(__IGB_DOWN, &adapter->state)) {
>> +dev_info(&pdev->dev, "Resuming from non-fatal error, do 
>> nothing.\n");
>> +return;
> 
> I'm not sure this needs to be a dev_info.
> 

I was thinking dev_dbg, because I don't really see why its relevant to
inform the user we did nothing. Seems like its log spam to me.

> Reviewed-by: Jacob Keller 
> 



Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock

2024-09-23 Thread Tony Nguyen




On 9/23/2024 9:46 AM, Wander Lairson Costa wrote:

On Mon, Sep 23, 2024 at 6:04 AM Przemek Kitszel
 wrote:


On 9/21/24 14:52, Paul Menzel wrote:

Dear Wander,


Thank you for your patch.

Am 20.09.24 um 20:59 schrieb Wander Lairson Costa:

tx_queue_lock and stats_lock are declared and initialized, but never
used. Remove them.

Signed-off-by: Wander Lairson Costa 


It’d be great if you added a Fixes: tag.


Alternatively you could split this series into two, and send this patch
to iwl-next tree, without the fixes tag. For me this patch is just
a cleanup, not a fix.







Should I send a new version of the patches separately?


The patches apply to the respective trees when split out so I can take 
these without a re-send. Patch 1 will need a Fixes: for it though...


I'm seeing it as: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet 
driver")?


Thanks,
Tony


[...]



With that addressed:

Reviewed-by: Paul Menzel 


Kind regards,

Paul






Re: [Intel-wired-lan] [PATCH iwl-net v1] i40e: Fix macvlan leak by synchronizing access to mac_filter_hash

2024-09-23 Thread Simon Horman
On Mon, Sep 23, 2024 at 11:12:19AM +0200, Aleksandr Loktionov wrote:
> This patch addresses a macvlan leak issue in the i40e driver caused by
> concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
> threads attempt to modify the mac_filter_hash simultaneously, leading to
> inconsistent state and potential memory leaks.
> 
> To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
> vf->default_lan_addr.addr with 
> spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
> ensuring atomic operations and preventing concurrent access.
> 
> Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
> i40e_add_mac_filter() to help catch similar issues in the future.
> 
> Reproduction steps:
> 1. Spawn VFs and configure port vlan on them.
> 2. Trigger concurrent macvlan operations (e.g., adding and deleting
>   portvlan and/or mac filters).
> 3. Observe the potential memory leak and inconsistent state in the
>   mac_filter_hash.
> 
> This synchronization ensures the integrity of the mac_filter_hash and prevents
> the described leak.
> 
> Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
> Reviewed-by: Arkadiusz Kubalewski 
> Signed-off-by: Aleksandr Loktionov 

Thanks Aleksandr,

I see that:

1) All calls to i40e_add_mac_filter() and all other calls
   to i40e_del_mac_filter() are already protected by
   vsi->mac_filter_hash_lock.

2) i40e_del_mac_filter() already asserts that
   vsi->mac_filter_hash_lock is held.

So this looks good to me.

Reviewed-by: Simon Horman 

...


Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

2024-09-23 Thread Ilias Apalodimas
Hi Jason,

On Mon, 23 Sept 2024 at 20:52, Jason Gunthorpe  wrote:
>
> On Fri, Sep 20, 2024 at 02:14:02PM +0800, Yunsheng Lin wrote:
>
> > I am not sure what dose the API that allows netdev to "give" struct device
> > to page_pool look like or how to implement the API yet, but the obvious way
> > to stall the calling of device_del() is to wait for the inflight
> > page to
>
> It is not device_del() you need to stall, but the remove() function of
> the device driver.
>
> Once all drivers have been unbound the DMA API can be reconfigured and
> all existing DMA mappings must be concluded before this happens,
> otherwise there will be problems.
>
> So, stalling something like unregister_netdevice() would be a better
> target - though stalling forever on driver unbind would not be
> acceptable.

TBH, I have doubts that even stalling it for small amounts of time is
going to disrupt userspace and people are going to yell at us.
I am gonna repeat myself here, but I think keeping a list of the
inflight SKBs that we need to unmap when the interface goes down, is
the most complex, but less disruptive solution

Thanks
/Ilias
>
> Jason