Re: [Intel-wired-lan] [PATCH iwl-net v2 1/2] ice: Fix entering Safe Mode
On 20.09.2024 19:14, Brett Creeley wrote: > > > On 9/20/2024 9:59 AM, Marcin Szycik wrote: >> Caution: This message originated from an External Source. Use proper caution >> when opening attachments, clicking links, or responding. >> >> >> If DDP package is missing or corrupted, the driver should enter Safe Mode. >> Instead, an error is returned and probe fails. >> >> Don't check return value of ice_init_ddp_config() to fix this. >> >> Change ice_init_ddp_config() type to void, as now its return is never >> checked. >> >> Repro: >> * Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg) >> * Load ice >> >> Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology") >> Reviewed-by: Przemek Kitszel >> Signed-off-by: Marcin Szycik >> --- >> v2: Change ice_init_ddp_config() type to void >> --- >> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++ >> 1 file changed, 3 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c >> b/drivers/net/ethernet/intel/ice/ice_main.c >> index 0f5c9d347806..aeebf4ae25ae 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_main.c >> +++ b/drivers/net/ethernet/intel/ice/ice_main.c >> @@ -4548,34 +4548,27 @@ ice_init_tx_topology(struct ice_hw *hw, const struct >> firmware *firmware) >> * >> * This function loads DDP file from the disk, then initializes Tx >> * topology. At the end DDP package is loaded on the card. >> - * >> - * Return: zero when init was successful, negative values otherwise. >> */ >> -static int ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf) >> +static void ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf) >> { >> struct device *dev = ice_pf_to_dev(pf); >> const struct firmware *firmware = NULL; >> int err; >> >> err = ice_request_fw(pf, &firmware); >> - if (err) { >> + if (err) >> dev_err(dev, "Fail during requesting FW: %d\n", err); >> - return err; >> - } >> >> err = ice_init_tx_topology(hw, firmware); >> if (err) { >> dev_err(dev, "Fail during initialization of Tx topology: >> %d\n", >> err); >> release_firmware(firmware); >> - return err; >> } >> >> /* Download firmware to device */ >> ice_load_pkg(firmware, pf); >> release_firmware(firmware); >> - >> - return 0; >> } >> >> /** >> @@ -4748,9 +4741,7 @@ int ice_init_dev(struct ice_pf *pf) >> >> ice_init_feature_support(pf); >> >> - err = ice_init_ddp_config(hw, pf); >> - if (err) >> - return err; >> + ice_init_ddp_config(hw, pf); > > I just commented this on v1 as I didn't expect it to be resent. I'm also okay > with Maciej's suggestion, but I wanted to offer an alternative option. > > As an alternative solution you could potentially do the following, which > would make the flow more readable: > > err = ice_init_ddp_config(hw, pf); > if (err || ice_is_safe_mode(pf)) > ice_set_safe_mode_caps(hw); This sounds reasonable, I'll change it if there will be no more comments. > Also, should there be some sort of messaging if the device goes into > safe mode? I wonder if a dev_dbg() would be better than nothing. If > ice_init_ddp_config() fails, then it will print an error message, so maybe a > dev_warn/info() is warranted if (err)? Of course this would depend on > ice_init_ddp_config() to return a non-void value. ice_request_fw() already prints a dev_err() message when entering safe mode, so I don't think it's necessary here. Thanks, Marcin > > Thanks, > > Brett > >> >> /* if ice_init_ddp_config fails, ICE_FLAG_ADV_FEATURES bit won't be >> * set in pf->state, which will cause ice_is_safe_mode to return >> -- >> 2.45.0 >> >>
[Intel-wired-lan] [PATCH iwl-net v1] i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
This patch addresses a macvlan leak issue in the i40e driver caused by concurrent access to vsi->mac_filter_hash. The leak occurs when multiple threads attempt to modify the mac_filter_hash simultaneously, leading to inconsistent state and potential memory leaks. To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock), ensuring atomic operations and preventing concurrent access. Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in i40e_add_mac_filter() to help catch similar issues in the future. Reproduction steps: 1. Spawn VFs and configure port vlan on them. 2. Trigger concurrent macvlan operations (e.g., adding and deleting portvlan and/or mac filters). 3. Observe the potential memory leak and inconsistent state in the mac_filter_hash. This synchronization ensures the integrity of the mac_filter_hash and prevents the described leak. Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM") Reviewed-by: Arkadiusz Kubalewski Signed-off-by: Aleksandr Loktionov --- drivers/net/ethernet/intel/i40e/i40e_main.c| 1 + drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 03205eb..25295ae 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1734,6 +1734,7 @@ struct i40e_mac_filter *i40e_add_mac_filter(struct i40e_vsi *vsi, struct hlist_node *h; int bkt; + lockdep_assert_held(&vsi->mac_filter_hash_lock); if (vsi->info.pvid) return i40e_add_filter(vsi, macaddr, le16_to_cpu(vsi->info.pvid)); diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 662622f..dfa785e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2213,8 +2213,10 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg) vfres->vsi_res[0].qset_handle = le16_to_cpu(vsi->info.qs_handle[0]); if (!(vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO) && !vf->pf_set_mac) { + spin_lock_bh(&vsi->mac_filter_hash_lock); i40e_del_mac_filter(vsi, vf->default_lan_addr.addr); eth_zero_addr(vf->default_lan_addr.addr); + spin_unlock_bh(&vsi->mac_filter_hash_lock); } ether_addr_copy(vfres->vsi_res[0].default_mac_addr, vf->default_lan_addr.addr); -- 2.25.1
Re: [Intel-wired-lan] [PATCH] ice: Unbind the workqueue
On 9/23/24 00:24, Frederic Weisbecker wrote: The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index ea780d468579..70990f42ac05 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5924,7 +5924,7 @@ static int __init ice_module_init(void) ice_adv_lnk_speed_maps_init(); - ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME); + ice_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, KBUILD_MODNAME); if (!ice_wq) { pr_err("Failed to create workqueue\n"); return status; Thank you for the patch, it would make sense for our iwl-next tree, with such assumption: Reviewed-by: Przemek Kitszel @Tony, do you want it resent with target tree in the subject?
Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock
On 9/21/24 14:52, Paul Menzel wrote: Dear Wander, Thank you for your patch. Am 20.09.24 um 20:59 schrieb Wander Lairson Costa: tx_queue_lock and stats_lock are declared and initialized, but never used. Remove them. Signed-off-by: Wander Lairson Costa It’d be great if you added a Fixes: tag. Alternatively you could split this series into two, and send this patch to iwl-next tree, without the fixes tag. For me this patch is just a cleanup, not a fix. [...] With that addressed: Reviewed-by: Paul Menzel Kind regards, Paul
Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock
On Mon, Sep 23, 2024 at 6:04 AM Przemek Kitszel wrote: > > On 9/21/24 14:52, Paul Menzel wrote: > > Dear Wander, > > > > > > Thank you for your patch. > > > > Am 20.09.24 um 20:59 schrieb Wander Lairson Costa: > >> tx_queue_lock and stats_lock are declared and initialized, but never > >> used. Remove them. > >> > >> Signed-off-by: Wander Lairson Costa > > > > It’d be great if you added a Fixes: tag. > > Alternatively you could split this series into two, and send this patch > to iwl-next tree, without the fixes tag. For me this patch is just > a cleanup, not a fix. > > > > Should I send a new version of the patches separately? > [...] > > > > > With that addressed: > > > > Reviewed-by: Paul Menzel > > > > > > Kind regards, > > > > Paul >
Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
On Fri, Sep 20, 2024 at 02:14:02PM +0800, Yunsheng Lin wrote: > I am not sure what dose the API that allows netdev to "give" struct device > to page_pool look like or how to implement the API yet, but the obvious way > to stall the calling of device_del() is to wait for the inflight > page to It is not device_del() you need to stall, but the remove() function of the device driver. Once all drivers have been unbound the DMA API can be reconfigured and all existing DMA mappings must be concluded before this happens, otherwise there will be problems. So, stalling something like unregister_netdevice() would be a better target - though stalling forever on driver unbind would not be acceptable. Jason
Re: [Intel-wired-lan] ice: Use common error handling code in two functions
On 9/20/2024 12:05 AM, Markus Elfring wrote: Add jump targets so that a bit of exception handling can be better reused at the end of two function implementations. Thank you for contribution, the change is fine, Thanks for this positive feedback. but not as a bugfix. Would you like to qualify my update suggestion as a correction for a coding style issue? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v6.11#n526 It can go to -next to correct the coding style, however, for -net it should be user visible bugs. Please send as a [iwl-next], when the submission window opens. Will a patch resend really be needed for the proposed adjustment? I'll go ahead and apply this to iwl-next without a re-send, but please keep Przemek's comments in mind for future submissions. Thanks, Tony
Re: [Intel-wired-lan] [PATCH] ice: Unbind the workqueue
On 9/23/2024 1:57 AM, Przemek Kitszel wrote: On 9/23/24 00:24, Frederic Weisbecker wrote: The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index ea780d468579..70990f42ac05 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5924,7 +5924,7 @@ static int __init ice_module_init(void) ice_adv_lnk_speed_maps_init(); - ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME); + ice_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, KBUILD_MODNAME); if (!ice_wq) { pr_err("Failed to create workqueue\n"); return status; Thank you for the patch, it would make sense for our iwl-next tree, with such assumption: Reviewed-by: Przemek Kitszel @Tony, do you want it resent with target tree in the subject? No, I can apply this as-is but please remember to designate a tree for future patches. Thanks, Tony
Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
On 2024/9/19 18:54, Yunsheng Lin wrote: > On 2024/9/19 1:06, Ilias Apalodimas wrote: >> Hi Yunsheng, >> >> Thanks for looking into this! >> >> On Wed, 18 Sept 2024 at 14:24, Yunsheng Lin wrote: >>> >>> Networking driver with page_pool support may hand over page >>> still with dma mapping to network stack and try to reuse that >>> page after network stack is done with it and passes it back >>> to page_pool to avoid the penalty of dma mapping/unmapping. >> >> I think you can shorten this to "If recycling and DMA mapping are >> enabled during the pool creation" > > I am not sure if I understand the 'recycling' part here. Is the > 'recycling' part referring to whether skb_mark_for_recycle() is > called to enable recycling for the skb? Is there still any driver > with page_pool support but doesn't call skb_mark_for_recycle() > when handing over page to network stack? > > For the 'DMA mapping' part, as there is no space in 'struct > page' to track the inflight pages, so 'pp' in 'struct page' > is renamed to 'pp_item' to enable the tracking of inflight > page. I tried shortening this for 'pool->dma_map being false' > when coding, but it seems differentiating the same field in > 'struct page' doesn't make much sense according to 'pool->dma_map' > as it means we might need to add an union in 'struct page' for > that to work and add additional checking to decide if it is 'pp' > or 'pp_item'. > >> >>> With all the caching in the network stack, some pages may be >>> held in the network stack without returning to the page_pool >>> soon enough, and with VF disable causing the driver unbound, >>> the page_pool does not stop the driver from doing it's >>> unbounding work, instead page_pool uses workqueue to check >>> if there is some pages coming back from the network stack >>> periodically, if there is any, it will do the dma unmmapping >>> related cleanup work. >>> >>> As mentioned in [1], attempting DMA unmaps after the driver >>> has already unbound may leak resources or at worst corrupt >>> memory. Fundamentally, the page pool code cannot allow DMA >>> mappings to outlive the driver they belong to. >>> >>> Currently it seems there are at least two cases that the page >>> is not released fast enough causing dma unmmapping done after >>> driver has already unbound: >>> 1. ipv4 packet defragmentation timeout: this seems to cause >>>delay up to 30 secs: >>> >>> 2. skb_defer_free_flush(): this may cause infinite delay if >>>there is no triggering for net_rx_action(). >>> >>> In order not to do the dma unmmapping after driver has already >>> unbound and stall the unloading of the networking driver, add >>> the pool->items array to record all the pages including the ones >>> which are handed over to network stack, so the page_pool can >>> do the dma unmmapping for those pages when page_pool_destroy() >>> is called. >> >> So, I was thinking of a very similar idea. But what do you mean by >> "all"? The pages that are still in caches (slow or fast) of the pool >> will be unmapped during page_pool_destroy(). > > Yes, it includes the one in pool->alloc and pool->ring. It worths mentioning that there is a semantics changing here: Before this patch, there can be almost unlimited inflight pages used by driver and network stack, as page_pool doesn't really track those pages. After this patch, as we use a fixed-size pool->items array to track the inflight pages, the inflight pages is limited by the pool->items, currently the size of pool->items array is calculated as below in this patch: +#define PAGE_POOL_MIN_ITEM_CNT 512 + unsigned int item_cnt = (params->pool_size ? : 1024) + + PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT; Personally I would consider it is an advantage to limit how many pages which are used by the driver and network stack, the problem seems to how to decide the limited number of page used by network stack so that performance is not impacted. > >> Don't we 'just' need a list of the inflight packets and their pages or >> fragments? What we could do is go through that list and unmap these >> pages during page_pool_destroy(). > > The main reason for that is to avoid the overhead of page_pool_item_del() > and page_pool_item_add() when allocing/freeing page from/to pool->alloc > and pool->ring. > > Yes, including the pages in pool->ring seems to make the pool->ring > somewhat duplicated, maybe we can remove pool->ring if we can make > and prove 'pool->items' is performing better than pool->ring in the > future? > >> >> I'll have a closer look at the patch tomorrow > > Thanks for the reviewing. > >> >> Thanks! >> /Ilias >> >
Re: [Intel-wired-lan] [PATCH 1/2] igb: Disable threaded IRQ for igb_msix_other
On 9/20/24 20:59, Wander Lairson Costa wrote: During testing of SR-IOV, Red Hat QE encountered an issue where the ip link up command intermittently fails for the igbvf interfaces when using the PREEMPT_RT variant. Investigation revealed that e1000_write_posted_mbx returns an error due to the lack of an ACK from e1000_poll_for_ack. The underlying issue arises from the fact that IRQs are threaded by default under PREEMPT_RT. While the exact hardware details are not available, it appears that the IRQ handled by igb_msix_other must be processed before e1000_poll_for_ack times out. However, e1000_write_posted_mbx is called with preemption disabled, leading to a scenario where the IRQ is serviced only after the failure of e1000_write_posted_mbx. To resolve this, we set IRQF_NO_THREAD for the affected interrupt, ensuring that the kernel handles it immediately, thereby preventing the aforementioned error. Reproducer: #!/bin/bash # echo 2 > /sys/class/net/ens14f0/device/sriov_numvfs ipaddr_vlan=3 nic_test=ens14f0 vf=${nic_test}v0 while true; do ip link set ${nic_test} mtu 1500 ip link set ${vf} mtu 1500 ip link set $vf up ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan} ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf} ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf} if ! ip link show $vf | grep 'state UP'; then echo 'Error found' break fi ip link set $vf down done Signed-off-by: Wander Lairson Costa Reported-by: Yuying Ma --- drivers/net/ethernet/intel/igb/igb_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 1ef4cb871452..8a1696d7289f 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -907,7 +907,7 @@ static int igb_request_msix(struct igb_adapter *adapter) int i, err = 0, vector = 0, free_vector = 0; err = request_irq(adapter->msix_entries[vector].vector, - igb_msix_other, 0, netdev->name, adapter); + igb_msix_other, IRQF_NO_THREAD, netdev->name, adapter); if (err) goto err_out; Thank you for small, localized fix with a good description. Our VAL will check it also on non-RT OS. Reviewed-by: Przemek Kitszel PS: for future intel ethernet submissions please split out fixes and refactors, and tag each commit with the [iwl-net] or [iwl-next] tags
Re: [Intel-wired-lan] [PATCH] igb: Do not bring the device up after non-fatal error
On 9/23/2024 2:22 PM, Mohamed Khalfella wrote: > Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal") > changed igb_io_error_detected() to ignore non-fatal pcie errors in order > to avoid hung task that can happen when igb_down() is called multiple > times. This caused an issue when processing transient non-fatal errors. > igb_io_resume(), which is called after igb_io_error_detected(), assumes > that device is brought down by igb_io_error_detected() if the interface > is up. This resulted in panic with stacktrace below. > > [ T3256] igb :09:00.0 haeth0: igb: haeth0 NIC Link is Down > [ T292] pcieport :00:1c.5: AER: Uncorrected (Non-Fatal) error received: > :09:00.0 > [ T292] igb :09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), > type=Transaction Layer, (Requester ID) > [ T292] igb :09:00.0: device [8086:1537] error > status/mask=4000/ > [ T292] igb :09:00.0:[14] CmpltTO [ 200.105524,009][ T292] igb > :09:00.0: AER: TLP Header: > [ T292] pcieport :00:1c.5: AER: broadcast error_detected message > [ T292] igb :09:00.0: Non-correctable non-fatal error reported. > [ T292] pcieport :00:1c.5: AER: broadcast mmio_enabled message > [ T292] pcieport :00:1c.5: AER: broadcast resume message > [ T292] [ cut here ] > [ T292] kernel BUG at net/core/dev.c:6539! > [ T292] invalid opcode: [#1] PREEMPT SMP > [ T292] RIP: 0010:napi_enable+0x37/0x40 > [ T292] Call Trace: > [ T292] > [ T292] ? die+0x33/0x90 > [ T292] ? do_trap+0xdc/0x110 > [ T292] ? napi_enable+0x37/0x40 > [ T292] ? do_error_trap+0x70/0xb0 > [ T292] ? napi_enable+0x37/0x40 > [ T292] ? napi_enable+0x37/0x40 > [ T292] ? exc_invalid_op+0x4e/0x70 > [ T292] ? napi_enable+0x37/0x40 > [ T292] ? asm_exc_invalid_op+0x16/0x20 > [ T292] ? napi_enable+0x37/0x40 > [ T292] igb_up+0x41/0x150 > [ T292] igb_io_resume+0x25/0x70 > [ T292] report_resume+0x54/0x70 > [ T292] ? report_frozen_detected+0x20/0x20 > [ T292] pci_walk_bus+0x6c/0x90 > [ T292] ? aer_print_port_info+0xa0/0xa0 > [ T292] pcie_do_recovery+0x22f/0x380 > [ T292] aer_process_err_devices+0x110/0x160 > [ T292] aer_isr+0x1c1/0x1e0 > [ T292] ? disable_irq_nosync+0x10/0x10 > [ T292] irq_thread_fn+0x1a/0x60 > [ T292] irq_thread+0xe3/0x1a0 > [ T292] ? irq_set_affinity_notifier+0x120/0x120 > [ T292] ? irq_affinity_notify+0x100/0x100 > [ T292] kthread+0xe2/0x110 > [ T292] ? kthread_complete_and_exit+0x20/0x20 > [ T292] ret_from_fork+0x2d/0x50 > [ T292] ? kthread_complete_and_exit+0x20/0x20 > [ T292] ret_from_fork_asm+0x11/0x20 > [ T292] > > To fix this issue igb_io_resume() checks if the interface is running and > the device is not down this means igb_io_error_detected() did not bring > the device down and there is no need to bring it up. > > Signed-off-by: Mohamed Khalfella > Reviewed-by: Yuanyuan Zhong > Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal") > --- > drivers/net/ethernet/intel/igb/igb_main.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index 1ef4cb871452..8c6bc3db9a3d 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -9651,6 +9651,10 @@ static void igb_io_resume(struct pci_dev *pdev) > struct igb_adapter *adapter = netdev_priv(netdev); > > if (netif_running(netdev)) { > + if (!test_bit(__IGB_DOWN, &adapter->state)) { > + dev_info(&pdev->dev, "Resuming from non-fatal error, do > nothing.\n"); > + return; I'm not sure this needs to be a dev_info. Reviewed-by: Jacob Keller
Re: [Intel-wired-lan] [PATCH] igb: Do not bring the device up after non-fatal error
On 9/23/2024 4:11 PM, Jacob Keller wrote: > > > On 9/23/2024 2:22 PM, Mohamed Khalfella wrote: >> Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal") >> changed igb_io_error_detected() to ignore non-fatal pcie errors in order >> to avoid hung task that can happen when igb_down() is called multiple >> times. This caused an issue when processing transient non-fatal errors. >> igb_io_resume(), which is called after igb_io_error_detected(), assumes >> that device is brought down by igb_io_error_detected() if the interface >> is up. This resulted in panic with stacktrace below. >> >> [ T3256] igb :09:00.0 haeth0: igb: haeth0 NIC Link is Down >> [ T292] pcieport :00:1c.5: AER: Uncorrected (Non-Fatal) error received: >> :09:00.0 >> [ T292] igb :09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), >> type=Transaction Layer, (Requester ID) >> [ T292] igb :09:00.0: device [8086:1537] error >> status/mask=4000/ >> [ T292] igb :09:00.0:[14] CmpltTO [ 200.105524,009][ T292] igb >> :09:00.0: AER: TLP Header: >> [ T292] pcieport :00:1c.5: AER: broadcast error_detected message >> [ T292] igb :09:00.0: Non-correctable non-fatal error reported. >> [ T292] pcieport :00:1c.5: AER: broadcast mmio_enabled message >> [ T292] pcieport :00:1c.5: AER: broadcast resume message >> [ T292] [ cut here ] >> [ T292] kernel BUG at net/core/dev.c:6539! >> [ T292] invalid opcode: [#1] PREEMPT SMP >> [ T292] RIP: 0010:napi_enable+0x37/0x40 >> [ T292] Call Trace: >> [ T292] >> [ T292] ? die+0x33/0x90 >> [ T292] ? do_trap+0xdc/0x110 >> [ T292] ? napi_enable+0x37/0x40 >> [ T292] ? do_error_trap+0x70/0xb0 >> [ T292] ? napi_enable+0x37/0x40 >> [ T292] ? napi_enable+0x37/0x40 >> [ T292] ? exc_invalid_op+0x4e/0x70 >> [ T292] ? napi_enable+0x37/0x40 >> [ T292] ? asm_exc_invalid_op+0x16/0x20 >> [ T292] ? napi_enable+0x37/0x40 >> [ T292] igb_up+0x41/0x150 >> [ T292] igb_io_resume+0x25/0x70 >> [ T292] report_resume+0x54/0x70 >> [ T292] ? report_frozen_detected+0x20/0x20 >> [ T292] pci_walk_bus+0x6c/0x90 >> [ T292] ? aer_print_port_info+0xa0/0xa0 >> [ T292] pcie_do_recovery+0x22f/0x380 >> [ T292] aer_process_err_devices+0x110/0x160 >> [ T292] aer_isr+0x1c1/0x1e0 >> [ T292] ? disable_irq_nosync+0x10/0x10 >> [ T292] irq_thread_fn+0x1a/0x60 >> [ T292] irq_thread+0xe3/0x1a0 >> [ T292] ? irq_set_affinity_notifier+0x120/0x120 >> [ T292] ? irq_affinity_notify+0x100/0x100 >> [ T292] kthread+0xe2/0x110 >> [ T292] ? kthread_complete_and_exit+0x20/0x20 >> [ T292] ret_from_fork+0x2d/0x50 >> [ T292] ? kthread_complete_and_exit+0x20/0x20 >> [ T292] ret_from_fork_asm+0x11/0x20 >> [ T292] >> >> To fix this issue igb_io_resume() checks if the interface is running and >> the device is not down this means igb_io_error_detected() did not bring >> the device down and there is no need to bring it up. >> >> Signed-off-by: Mohamed Khalfella >> Reviewed-by: Yuanyuan Zhong >> Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal") >> --- >> drivers/net/ethernet/intel/igb/igb_main.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c >> b/drivers/net/ethernet/intel/igb/igb_main.c >> index 1ef4cb871452..8c6bc3db9a3d 100644 >> --- a/drivers/net/ethernet/intel/igb/igb_main.c >> +++ b/drivers/net/ethernet/intel/igb/igb_main.c >> @@ -9651,6 +9651,10 @@ static void igb_io_resume(struct pci_dev *pdev) >> struct igb_adapter *adapter = netdev_priv(netdev); >> >> if (netif_running(netdev)) { >> +if (!test_bit(__IGB_DOWN, &adapter->state)) { >> +dev_info(&pdev->dev, "Resuming from non-fatal error, do >> nothing.\n"); >> +return; > > I'm not sure this needs to be a dev_info. > I was thinking dev_dbg, because I don't really see why its relevant to inform the user we did nothing. Seems like its log spam to me. > Reviewed-by: Jacob Keller >
Re: [Intel-wired-lan] [PATCH 2/2] igbvf: remove unused spinlock
On 9/23/2024 9:46 AM, Wander Lairson Costa wrote: On Mon, Sep 23, 2024 at 6:04 AM Przemek Kitszel wrote: On 9/21/24 14:52, Paul Menzel wrote: Dear Wander, Thank you for your patch. Am 20.09.24 um 20:59 schrieb Wander Lairson Costa: tx_queue_lock and stats_lock are declared and initialized, but never used. Remove them. Signed-off-by: Wander Lairson Costa It’d be great if you added a Fixes: tag. Alternatively you could split this series into two, and send this patch to iwl-next tree, without the fixes tag. For me this patch is just a cleanup, not a fix. Should I send a new version of the patches separately? The patches apply to the respective trees when split out so I can take these without a re-send. Patch 1 will need a Fixes: for it though... I'm seeing it as: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")? Thanks, Tony [...] With that addressed: Reviewed-by: Paul Menzel Kind regards, Paul
Re: [Intel-wired-lan] [PATCH iwl-net v1] i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
On Mon, Sep 23, 2024 at 11:12:19AM +0200, Aleksandr Loktionov wrote: > This patch addresses a macvlan leak issue in the i40e driver caused by > concurrent access to vsi->mac_filter_hash. The leak occurs when multiple > threads attempt to modify the mac_filter_hash simultaneously, leading to > inconsistent state and potential memory leaks. > > To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing > vf->default_lan_addr.addr with > spin_lock/unlock_bh(&vsi->mac_filter_hash_lock), > ensuring atomic operations and preventing concurrent access. > > Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in > i40e_add_mac_filter() to help catch similar issues in the future. > > Reproduction steps: > 1. Spawn VFs and configure port vlan on them. > 2. Trigger concurrent macvlan operations (e.g., adding and deleting > portvlan and/or mac filters). > 3. Observe the potential memory leak and inconsistent state in the > mac_filter_hash. > > This synchronization ensures the integrity of the mac_filter_hash and prevents > the described leak. > > Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM") > Reviewed-by: Arkadiusz Kubalewski > Signed-off-by: Aleksandr Loktionov Thanks Aleksandr, I see that: 1) All calls to i40e_add_mac_filter() and all other calls to i40e_del_mac_filter() are already protected by vsi->mac_filter_hash_lock. 2) i40e_del_mac_filter() already asserts that vsi->mac_filter_hash_lock is held. So this looks good to me. Reviewed-by: Simon Horman ...
Re: [Intel-wired-lan] [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
Hi Jason, On Mon, 23 Sept 2024 at 20:52, Jason Gunthorpe wrote: > > On Fri, Sep 20, 2024 at 02:14:02PM +0800, Yunsheng Lin wrote: > > > I am not sure what dose the API that allows netdev to "give" struct device > > to page_pool look like or how to implement the API yet, but the obvious way > > to stall the calling of device_del() is to wait for the inflight > > page to > > It is not device_del() you need to stall, but the remove() function of > the device driver. > > Once all drivers have been unbound the DMA API can be reconfigured and > all existing DMA mappings must be concluded before this happens, > otherwise there will be problems. > > So, stalling something like unregister_netdevice() would be a better > target - though stalling forever on driver unbind would not be > acceptable. TBH, I have doubts that even stalling it for small amounts of time is going to disrupt userspace and people are going to yell at us. I am gonna repeat myself here, but I think keeping a list of the inflight SKBs that we need to unmap when the interface goes down, is the most complex, but less disruptive solution Thanks /Ilias > > Jason