Re: [Intel-wired-lan] [PATCH net-next v6 0/8] fix two bugs related to page_pool
On Mon, 6 Jan 2025 21:01:08 +0800 Yunsheng Lin wrote:
> This patchset fix a possible time window problem for page_pool and
> the dma API misuse problem as mentioned in [1], and try to avoid the
> overhead of the fixing using some optimization.
>
> From the below performance data, the overhead is not so obvious
> due to performance variations for time_bench_page_pool01_fast_path()
> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
> for time_bench_page_pool03_slow() for fixing the bug.

This appears to make the selftest from the drivers/net target implode.

[ 20.227775][ T218] BUG: KASAN: use-after-free in page_pool_item_uninit+0x100/0x130

Running the ping.py tests should be enough to repro.
-- 
pw-bot: cr
Re: [Intel-wired-lan] [PATCH net-next 0/9] i40e deadcoding
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski:

On Thu, 2 Jan 2025 17:37:08 + you wrote:
> From: "Dr. David Alan Gilbert"
>
> Hi,
> This is a bunch of deadcoding of functions that
> are entirely uncalled in the i40e driver.
>
> Build tested only.
>
> [...]

Here is the summary with links:
  - [net-next,1/9] i40e: Deadcode i40e_aq_*
    https://git.kernel.org/netdev/net-next/c/59ec698d01eb
  - [net-next,2/9] i40e: Remove unused i40e_blink_phy_link_led
    https://git.kernel.org/netdev/net-next/c/39cabb01d26d
  - [net-next,3/9] i40e: Remove unused i40e_(read|write)_phy_register
    https://git.kernel.org/netdev/net-next/c/8cc51e28ecce
  - [net-next,4/9] i40e: Deadcode profile code
    https://git.kernel.org/netdev/net-next/c/81d6bb2012e1
  - [net-next,5/9] i40e: Remove unused i40e_get_cur_guaranteed_fd_count
    https://git.kernel.org/netdev/net-next/c/3eb24a9e0af3
  - [net-next,6/9] i40e: Remove unused i40e_del_filter
    https://git.kernel.org/netdev/net-next/c/38dfb07d9a65
  - [net-next,7/9] i40e: Remove unused i40e_commit_partition_bw_setting
    https://git.kernel.org/netdev/net-next/c/a324484ac855
  - [net-next,8/9] i40e: Remove unused i40e_asq_send_command_v2
    https://git.kernel.org/netdev/net-next/c/d424b93f35a6
  - [net-next,9/9] i40e: Remove unused i40e_dcb_hw_get_num_tc
    https://git.kernel.org/netdev/net-next/c/47ea5d4e6f40

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [Intel-wired-lan] [PATCH net-next 0/3] igc deadcoding
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski:

On Thu, 2 Jan 2025 17:41:39 + you wrote:
> From: "Dr. David Alan Gilbert"
>
> Hi,
> This set removes some functions that are entirely unused
> and have been since ~2018.
>
> Build tested.
>
> [...]

Here is the summary with links:
  - [net-next,1/3] igc: Remove unused igc_acquire/release_nvm
    https://git.kernel.org/netdev/net-next/c/b37dba891b17
  - [net-next,2/3] igc: Remove unused igc_read/write_pci_cfg wrappers
    https://git.kernel.org/netdev/net-next/c/121c3c6bc661
  - [net-next,3/3] igc: Remove unused igc_read/write_pcie_cap_reg
    https://git.kernel.org/netdev/net-next/c/c75889081366

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [Intel-wired-lan] [PATCH net-next v3 3/6] net: napi: add CPU affinity to napi_config
Hi Ahmed,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Ahmed-Zaki/net-move-ARFS-rmap-management-to-core/20250104-084501
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250104004314.208259-4-ahmed.zaki%40intel.com
patch subject: [Intel-wired-lan] [PATCH net-next v3 3/6] net: napi: add CPU affinity to napi_config
config: i386-randconfig-141-20250104 (https://download.01.org/0day-ci/archive/20250105/202501050625.ny1c97ex-...@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Reported-by: Dan Carpenter
| Closes: https://lore.kernel.org/r/202501050625.ny1c97ex-...@intel.com/

smatch warnings:
net/core/dev.c:6835 napi_restore_config() warn: variable dereferenced before check 'n->config' (see line 6831)
net/core/dev.c:6855 napi_save_config() warn: variable dereferenced before check 'n->config' (see line 6850)

vim +6835 net/core/dev.c

86e25f40aa1e9e5 Joe Damato     2024-10-11  6829  static void napi_restore_config(struct napi_struct *n)
86e25f40aa1e9e5 Joe Damato     2024-10-11  6830  {
86e25f40aa1e9e5 Joe Damato     2024-10-11 @6831      n->defer_hard_irqs = n->config->defer_hard_irqs;
86e25f40aa1e9e5 Joe Damato     2024-10-11  6832      n->gro_flush_timeout = n->config->gro_flush_timeout;
5dc51ec86df6e22 Martin Karsten 2024-11-09  6833      n->irq_suspend_timeout = n->config->irq_suspend_timeout;
    ^ These lines all dereference n->config.

d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6834
d6b43b8a2e5297b Ahmed Zaki     2025-01-03 @6835      if (n->irq > 0 && n->config && n->dev->irq_affinity_auto)
    ^ This code assumes it can be NULL

d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6836          irq_set_affinity(n->irq, &n->config->affinity_mask);
d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6837
86e25f40aa1e9e5 Joe Damato     2024-10-11  6838      /* a NAPI ID might be stored in the config, if so use it. if not, use
86e25f40aa1e9e5 Joe Damato     2024-10-11  6839       * napi_hash_add to generate one for us. It will be saved to the config
86e25f40aa1e9e5 Joe Damato     2024-10-11  6840       * in napi_disable.
86e25f40aa1e9e5 Joe Damato     2024-10-11  6841       */
86e25f40aa1e9e5 Joe Damato     2024-10-11  6842      if (n->config->napi_id)
86e25f40aa1e9e5 Joe Damato     2024-10-11  6843          napi_hash_add_with_id(n, n->config->napi_id);
86e25f40aa1e9e5 Joe Damato     2024-10-11  6844      else
86e25f40aa1e9e5 Joe Damato     2024-10-11  6845          napi_hash_add(n);
86e25f40aa1e9e5 Joe Damato     2024-10-11  6846  }
86e25f40aa1e9e5 Joe Damato     2024-10-11  6847
86e25f40aa1e9e5 Joe Damato     2024-10-11  6848  static void napi_save_config(struct napi_struct *n)
86e25f40aa1e9e5 Joe Damato     2024-10-11  6849  {
86e25f40aa1e9e5 Joe Damato     2024-10-11 @6850      n->config->defer_hard_irqs = n->defer_hard_irqs;
86e25f40aa1e9e5 Joe Damato     2024-10-11  6851      n->config->gro_flush_timeout = n->gro_flush_timeout;
5dc51ec86df6e22 Martin Karsten 2024-11-09  6852      n->config->irq_suspend_timeout = n->irq_suspend_timeout;
86e25f40aa1e9e5 Joe Damato     2024-10-11  6853      n->config->napi_id = n->napi_id;
d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6854
d6b43b8a2e5297b Ahmed Zaki     2025-01-03 @6855      if (n->irq > 0 && n->config && n->dev->irq_affinity_auto)
    Same

d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6856          irq_set_affinity_notifier(n->irq, NULL);
d6b43b8a2e5297b Ahmed Zaki     2025-01-03  6857
86e25f40aa1e9e5 Joe Damato     2024-10-11  6858      napi_hash_del(n);
86e25f40aa1e9e5 Joe Damato     2024-10-11  6859  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
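The two smatch warnings in this report describe the same pattern: a pointer is dereferenced first and only tested for NULL afterwards, so the test cannot protect anything. A minimal user-space sketch of the flagged shape and one possible reordering is given here. The struct and function names are hypothetical stand-ins, not the kernel definitions, and whether n->config can really be NULL at that point is a question for the patch author.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's napi_config / napi_struct. */
struct cfg {
	int defer_hard_irqs;
	int affinity;
};

struct napi {
	int irq;
	int defer_hard_irqs;
	struct cfg *config;
};

/* Flagged shape: config is dereferenced, then tested for NULL.
 * If config were NULL, the first statement would already have crashed.
 */
int restore_flagged(struct napi *n)
{
	n->defer_hard_irqs = n->config->defer_hard_irqs; /* dereference first */
	if (n->irq > 0 && n->config)                      /* check comes too late */
		return n->config->affinity;
	return -1;
}

/* One option, assuming config may legitimately be NULL:
 * test it once, before any dereference.
 */
int restore_checked(struct napi *n)
{
	if (!n->config)
		return -1;
	n->defer_hard_irqs = n->config->defer_hard_irqs;
	return n->irq > 0 ? n->config->affinity : -1;
}
```

The other way to silence the warning, if every caller guarantees a non-NULL config, is to drop the redundant test from the condition instead of adding an early return.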
Re: [Intel-wired-lan] [RFC net-next 1/9] i40e: Deadcode i40e_aq_*
On Sat, Dec 21, 2024 at 06:42:39PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> i40e_aq_add_mirrorrule(), i40e_aq_delete_mirrorrule() and
> i40e_aq_set_vsi_vlan_promisc() were added in 2016 by
> commit 7bd6875bef70 ("i40e: APIs to Add/remove port mirroring rules")
> but haven't been used.
>
> They were the last user of i40e_mirrorrule_op().
>
> i40e_aq_rearrange_nvm() was added in 2018 by
> commit f05798b4ff82 ("i40e: Add AQ command for rearrange NVM structure")
> but hasn't been used.
>
> i40e_aq_restore_lldp() was added in 2019 by
> commit c65e78f87f81 ("i40e: Further implementation of LLDP")
> but hasn't been used.
>
> Remove them.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 4/9] i40e: Deadcode profile code
On Sat, Dec 21, 2024 at 06:42:42PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> i40e_add_pinfo_to_list() was added in 2017 by
> commit 1d5c960c5ef5 ("i40e: new AQ commands")
>
> i40e_find_section_in_profile() was added in 2019 by
> commit cdc594e00370 ("i40e: Implement DDP support in i40e driver")
>
> Neither have been used.
>
> Remove them.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 3/9] i40e: Remove unused i40e_(read|write)_phy_register
On Sat, Dec 21, 2024 at 06:42:41PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> i40e_read_phy_register() and i40e_write_phy_register() were added in
> 2016 by
> commit f62ba91458b5 ("i40e: Add functions which apply correct PHY access
> method for read and write operation")
>
> but haven't been used.
>
> Remove them.
>
> (There are more specific _clause* variants of these functions
> that are still used.)
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 8/9] i40e: Remove unused i40e_asq_send_command_v2
On Sat, Dec 21, 2024 at 06:42:46PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> i40e_asq_send_command_v2() was added in 2022 by
> commit 74073848b0d7 ("i40e: Add new versions of send ASQ command
> functions")
> but hasn't been used.
>
> Remove it.
>
> (The _atomic_v2 version of the function is used, so leave it).
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 6/9] i40e: Remove unused i40e_del_filter
On Sat, Dec 21, 2024 at 06:42:44PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> The last use of i40e_del_filter() was removed in 2016 by
> commit 9569a9a4547d ("i40e: when adding or removing MAC filters, correctly
> handle VLANs")
>
> Remove it.
>
> Fix up a comment that referenced it.
>
> Note: The __ version of this function is still used.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [PATCH iwl-next v6] ice: Add E830 checksum offload support
On Wed, Dec 18, 2024 at 04:11:45AM -0500, Paul Greenwalt wrote:
> E830 supports raw receive and generic transmit checksum offloads.
>
> Raw receive checksum support is provided by hardware calculating the
> checksum over the whole packet, regardless of type. The calculated
> checksum is provided to driver in the Rx flex descriptor. Then the driver
> assigns the checksum to skb->csum and sets skb->ip_summed to
> CHECKSUM_COMPLETE.
>
> Generic transmit checksum support is provided by hardware calculating the
> checksum given two offsets: the start offset to begin checksum calculation,
> and the offset to insert the calculated checksum in the packet. Support is
> advertised to the stack using NETIF_F_HW_CSUM feature.
>
> E830 has the following limitations when both generic transmit checksum
> offload and TCP Segmentation Offload (TSO) are enabled:
>
> 1. Inner packet header modification is not supported. This restriction
>    includes the inability to alter TCP flags, such as the push flag. As a
>    result, this limitation can impact the receiver's ability to coalesce
>    packets, potentially degrading network throughput.
> 2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents
>    support of Maximum Transmission Unit (MTU) greater than 1063 bytes.
>
> Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually
> exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not
> enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the
> defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and
> NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity
> of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().
>
> When NETIF_F_HW_CSUM is requested the provided skb->csum_start and
> skb->csum_offset are passed to hardware in the Tx context descriptor
> generic checksum (GCS) parameters. Hardware calculates the 1's complement
> from skb->csum_start to the end of the packet, and inserts the result in
> the packet at skb->csum_offset.
>
> Co-developed-by: Alice Michael
> Signed-off-by: Alice Michael
> Co-developed-by: Eric Joyner
> Signed-off-by: Eric Joyner
> Signed-off-by: Paul Greenwalt

Reviewed-by: Simon Horman
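The generic transmit checksum described in the quoted commit message can be modelled in a few lines of user-space C. This is only an illustrative sketch, not the E830 hardware logic or the ice driver code: the function names are hypothetical, and it assumes the usual skb convention that csum_offset is counted from csum_start. It sums 16-bit big-endian words from csum_start to the end of the packet, folds the carries, and stores the complement at csum_start + csum_offset.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum over buf[0..len), big-endian words. */
uint16_t ones_complement_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* odd trailing byte, zero padded */

	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Hypothetical model of the offload: checksum everything from csum_start
 * to the end of the packet and write the result at csum_start + csum_offset.
 * Whatever the checksum field holds beforehand (zero here, a pseudo-header
 * seed in a real stack) is included in the sum.
 */
void generic_tx_csum(uint8_t *pkt, size_t len,
		     size_t csum_start, size_t csum_offset)
{
	uint16_t csum = (uint16_t)~ones_complement_sum(pkt + csum_start,
							len - csum_start);

	pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

A receiver can check the result with the same helper: when the checksum field was zero before the call, summing the covered region again yields 0xffff.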
Re: [Intel-wired-lan] [PATCH net-next 0/9] i40e deadcoding
On 1/4/2025 8:16 AM, Jakub Kicinski wrote:
> On Thu, 2 Jan 2025 17:37:08 + li...@treblig.org wrote:
>> This is a bunch of deadcoding of functions that
>> are entirely uncalled in the i40e driver.
>>
>> Build tested only.
>
> Intel folks, is it okay if we take this (and the igc series)
> in directly? Seems very unlikely to require testing...

It's fine to take directly. I don't think this needs testing either.

I believe this will get picked up from here:

Reviewed-by: Tony Nguyen

Thanks,
Tony
Re: [Intel-wired-lan] [PATCH net-next 0/3] igc deadcoding
On 1/2/2025 9:41 AM, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> Hi,
> This set removes some functions that are entirely unused
> and have been since ~2018.
>
> Build tested.
>
> Signed-off-by: Dr. David Alan Gilbert
>
> (Repost now netdev is open)

Reviewed-by: Tony Nguyen

> Dr. David Alan Gilbert (3):
>   igc: Remove unused igc_acquire/release_nvm
>   igc: Remove unused igc_read/write_pci_cfg wrappers
>   igc: Remove unused igc_read/write_pcie_cap_reg
>
>  drivers/net/ethernet/intel/igc/igc_hw.h | 5 ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 39 --
>  drivers/net/ethernet/intel/igc/igc_nvm.c | 50 ---
>  drivers/net/ethernet/intel/igc/igc_nvm.h | 2 -
>  4 files changed, 96 deletions(-)
Re: [Intel-wired-lan] [PATCH iwl-net v3] ice: fix ice_parser_rt::bst_key array size
On Thu, Dec 19, 2024 at 12:55:16PM +0100, Przemek Kitszel wrote:
> Fix &ice_parser_rt::bst_key size. It was wrongly set to 10 instead of 20
> in the initial impl commit (see Fixes tag). All usage code assumed it was
> of size 20. That was also the initial size present up to v2 of the intro
> series [2], but halved by v3 [3] refactor described as "Replace magic
> hardcoded values with macros." The introducing series was so big that
> some ugliness was unnoticed, same for bugs :/
>
> ICE_BST_KEY_TCAM_SIZE and ICE_BST_TCAM_KEY_SIZE were differing by one.
> There was tmp variable @j in the scope of edited function, but was not
> used in all places. This ugliness is now gone.
> I'm moving ice_parser_rt::pg_prio a few positions up, to fill up one of
> the holes in order to compensate for the added 10 bytes to the ::bst_key,
> resulting in the same size of the whole as prior to the fix, and miminal
> changes in the offsets of the fields.
>
> Extend also the debug dump print of the key to cover all bytes. To not
> have string with 20 "%02x" and 20 params, switch to
> ice_debug_array_w_prefix().
>
> This fix obsoletes Ahmed's attempt at [1].
>
> [1]
> https://lore.kernel.org/intel-wired-lan/20240823230847.172295-1-ahmed.z...@intel.com
> [2]
> https://lore.kernel.org/intel-wired-lan/20230605054641.2865142-13-junfeng@intel.com
> [3]
> https://lore.kernel.org/intel-wired-lan/20230817093442.2576997-13-junfeng@intel.com
>
> Reported-by: Dan Carpenter
> Closes:
> https://lore.kernel.org/intel-wired-lan/b1fb6ff9-b69e-4026-9988-3c783d86c2e0@stanley.mountain
> Fixes: 9a4c07aaa0f5 ("ice: add parser execution main loop")
> CC: Ahmed Zaki
> Reviewed-by: Larysa Zaremba
> Signed-off-by: Przemek Kitszel
> ---
> v3: mention printing change in commit msg, separate prefix from the debug log
> (Simon)
>
> v2: same as v3, but lacks code change :(
>
> v1:
> https://lore.kernel.org/intel-wired-lan/20241216170548.gi780...@kernel.org/T/#mbf984a0faa12a5bdb53460b150201fdd7cc1826a

Thanks for the updates, much appreciated.

Reviewed-by: Simon Horman
[Intel-wired-lan] [PATCH bpf-next v4 0/4] xsk: TX metadata Launch Time support
This series expands the XDP TX metadata framework to allow user
applications to pass a per-packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.

Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.

Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, I did not include a clock ID to indicate the
clock source.

To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool so that it sets the launch time based on the offset provided by the
user and the value of the Receive Hardware Timestamp, which is against the
PHC. This eliminates the need to discipline the System Clock with the PHC
and then use clock_gettime() to get the time.

Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to be familiar with the launch time horizon of the device they use and not
request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier than the
requested launch time.

Although there is no feedback mechanism for the launch time request for
now, users can still check whether the requested launch time is working
by requesting the Transmit Completion Hardware Timestamp.
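The remark about modulo computation can be made concrete with a small model. The sketch below is a deliberately simplified, hypothetical picture of a device that only sees the launch time modulo its horizon. It is not the igc or stmmac implementation, and the function name and the 1 s horizon used in the note after it are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical model: the hardware compares "now" and the requested
 * launch time only modulo horizon_ns, so any request further out than
 * the horizon wraps around and fires early.
 * Returns the delay the device would actually apply, in nanoseconds.
 */
uint64_t effective_delay_ns(uint64_t now_ns, uint64_t launch_ns,
			    uint64_t horizon_ns)
{
	uint64_t hw_now = now_ns % horizon_ns;
	uint64_t hw_launch = launch_ns % horizon_ns;

	/* forward distance from hw_now to hw_launch inside the window */
	return (hw_launch + horizon_ns - hw_now) % horizon_ns;
}
```

With an assumed horizon of 1 s, a request 0.1 s ahead is honoured as 0.1 s, while a request 1.5 s ahead wraps to 0.5 s, which is the early transmission described above. Comparing the launch time with the Transmit Completion Hardware Timestamp is how an application could notice such a wrap.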
Changes since v1:
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)

Changes since v2:
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
  because some NICs do not support such a large future time.

Changes since v3:
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
  it is against (Willem)

v1: https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-1-yoong.siang.s...@intel.com/
v2: https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768-1-yoong.siang.s...@intel.com/
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20231203165129.1740512-1-yoong.siang.s...@intel.com/

Song Yoong Siang (4):
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add Launch Time request to xdp_hw_metadata
  net: stmmac: Add launch time support to XDP ZC
  igc: Add launch time support to XDP ZC

 Documentation/netlink/specs/netdev.yaml | 4 +
 Documentation/networking/xsk-tx-metadata.rst | 64 +++
 drivers/net/ethernet/intel/igc/igc_main.c | 78 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13
 include/net/xdp_sock.h | 10 +++
 include/net/xdp_sock_drv.h | 1 +
 include/uapi/linux/if_xdp.h | 10 +++
 include/uapi/linux/netdev.h | 3 +
 net/core/netdev-genl.c | 2 +
 net/xdp/xsk.c | 3 +
 tools/include/uapi/linux/if_xdp.h | 10 +++
 tools/include/uapi/linux/netdev.h | 3 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 30 ++-
 14 files changed, 208 insertions(+), 25 deletions(-)

-- 
2.34.1
Re: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG
On Thu, Jan 02, 2025 at 08:07:52PM +0100, Marcin Szycik wrote:
> Ever since removing switchdev control VSI and using PF for port
> representor Tx/Rx, switchdev slow-path has been working improperly after
> failover in SR-IOV LAG. LAG assumes that the first uplink to be added to
> the aggregate will own VFs and have switchdev configured. After
> failing-over to the other uplink, representors are still configured to
> Tx through the uplink they are set up on, which fails because that
> uplink is now down.
>
> On failover, update all PRs on primary uplink to use the currently
> active uplink for Tx. Call netif_keep_dst(), as the secondary uplink
> might not be in switchdev mode. Also make sure to call
> ice_eswitch_set_target_vsi() if uplink is in LAG.
>
> On the Rx path, representors are already working properly, because
> default Tx from VFs is set to PF owning the eswitch. After failover the
> same PF is receiving traffic from VFs, even though link is down.
>
> Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> Reviewed-by: Michal Swiatkowski
> Signed-off-by: Marcin Szycik

Reviewed-by: Simon Horman
[Intel-wired-lan] [PATCH bpf-next v4 2/4] selftests/bpf: Add Launch Time request to xdp_hw_metadata
Add Launch Time hw offload request to xdp_hw_metadata. User can configure the delta of launch time to HW RX-time by using "-l" argument. The default delta is 100,000,000 nanosecond. Signed-off-by: Song Yoong Siang --- tools/testing/selftests/bpf/xdp_hw_metadata.c | 30 +-- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c index 6f7b15d6c6ed..795c1d14e02d 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -13,6 +13,7 @@ * - UDP 9091 packets trigger TX reply * - TX HW timestamp is requested and reported back upon completion * - TX checksum is requested + * - TX launch time HW offload is requested for transmission */ #include @@ -64,6 +65,8 @@ int rxq; bool skip_tx; __u64 last_hw_rx_timestamp; __u64 last_xdp_rx_timestamp; +__u64 last_launch_time; +__u64 launch_time_delta_to_hw_rx_timestamp = 1; /* 0.1 second */ void test__fail(void) { /* for network_helpers.c */ } @@ -298,6 +301,8 @@ static bool complete_tx(struct xsk *xsk, clockid_t clock_id) if (meta->completion.tx_timestamp) { __u64 ref_tstamp = gettime(clock_id); + print_tstamp_delta("HW Launch-time", "HW TX-complete-time", + last_launch_time, meta->completion.tx_timestamp); print_tstamp_delta("HW TX-complete-time", "User TX-complete-time", meta->completion.tx_timestamp, ref_tstamp); print_tstamp_delta("XDP RX-time", "User TX-complete-time", @@ -395,6 +400,14 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id) xsk, ntohs(udph->check), ntohs(want_csum), meta->request.csum_start, meta->request.csum_offset); + /* Set the value of launch time */ + meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME; + meta->request.launch_time = last_hw_rx_timestamp + + launch_time_delta_to_hw_rx_timestamp; + last_launch_time = meta->request.launch_time; + print_tstamp_delta("HW RX-time", "HW Launch-time", last_hw_rx_timestamp, + 
meta->request.launch_time); + memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */ tx_desc->options |= XDP_TX_METADATA; tx_desc->len = len; @@ -402,10 +415,14 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id) xsk_ring_prod__submit(&xsk->tx, 1); } +#define SLEEP_PER_ITERATION_IN_US 10 +#define SLEEP_PER_ITERATION_IN_NS (SLEEP_PER_ITERATION_IN_US * 1000) +#define MAX_ITERATION(x) (((x) / SLEEP_PER_ITERATION_IN_NS) + 500) static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id) { const struct xdp_desc *rx_desc; struct pollfd fds[rxq + 1]; + int max_iterations; __u64 comp_addr; __u64 addr; __u32 idx = 0; @@ -418,6 +435,9 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t fds[i].revents = 0; } + /* Calculate max iterations to wait for transmit completion */ + max_iterations = MAX_ITERATION(launch_time_delta_to_hw_rx_timestamp); + fds[rxq].fd = server_fd; fds[rxq].events = POLLIN; fds[rxq].revents = 0; @@ -477,10 +497,10 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t if (ret) printf("kick_tx ret=%d\n", ret); - for (int j = 0; j < 500; j++) { + for (int j = 0; j < max_iterations; j++) { if (complete_tx(xsk, clock_id)) break; - usleep(10); + usleep(SLEEP_PER_ITERATION_IN_US); } } } @@ -608,6 +628,7 @@ static void print_usage(void) " -hDisplay this help and exit\n\n" " -mEnable multi-buffer XDP for larger MTU\n" " -rDon't generate AF_XDP reply (rx metadata only)\n" + " -lDelta of launch time to HW RX-time in ns (default: 100,000,000ns)\n" "Generate test packets on the other machine with:\n" " echo -n xdp | nc -u -q1 9091\n"; @@ -618,7 +639,7 @@ static void read_args(int argc, char *argv[]) { int opt; - while ((opt = getopt(argc, argv, "chmr")) != -1) { + while ((opt = getopt(argc, argv, "chmrl:")) != -1) { switch (opt) { case 'c':
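The completion-wait loop in this patch scales its iteration count with the launch time delta. A small worked sketch of that arithmetic follows. The three macros are copied from the diff, while max_wait_us() is a hypothetical helper added only to express the nominal wait in microseconds; real usleep() calls may sleep longer than requested, so the figure is a lower bound.

```c
#include <stdint.h>

/* Copied from the patch above. */
#define SLEEP_PER_ITERATION_IN_US 10
#define SLEEP_PER_ITERATION_IN_NS (SLEEP_PER_ITERATION_IN_US * 1000)
#define MAX_ITERATION(x) (((x) / SLEEP_PER_ITERATION_IN_NS) + 500)

/* Hypothetical helper: nominal time spent polling for Tx completion,
 * i.e. iterations multiplied by the per-iteration sleep.
 */
uint64_t max_wait_us(uint64_t launch_time_delta_ns)
{
	return (uint64_t)MAX_ITERATION(launch_time_delta_ns) *
	       SLEEP_PER_ITERATION_IN_US;
}
```

For the documented default delta of 100,000,000 ns this gives 100000000 / 10000 + 500 = 10500 iterations, or nominally 105 ms: the 100 ms launch delay plus the 5 ms (500 x 10 us) budget the loop had before the change. With a delta of 0 the loop falls back to the original 500 iterations.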
[Intel-wired-lan] [PATCH bpf-next v4 4/4] igc: Add launch time support to XDP ZC
Enable Launch Time Control (LTC) support to XDP zero copy via XDP Tx metadata framework. This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel Tiger Lake platform. Below are the test steps and result. Test Steps: 1. Add mqprio qdisc: $ sudo tc qdisc add dev enp2s0 handle 8001: parent root mqprio num_tc 4 map 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 1@2 1@3 hw 0 2. Enable launch time hardware offload on hardware queue 1: $ sudo tc qdisc replace dev enp2s0 parent 8001:2 etf offload clockid CLOCK_TAI delta 50 3. Change RSS to route all incoming IP packets into hardware queue 1: $ sudo ethtool -X enp2s0 start 1 equal 1 4. Start xdp_hw_metadata selftest application: $ sudo ./xdp_hw_metadata enp2s0 -l 10 5. Send an UDP packet to port 9091 of DUT. $ echo -n xdp | nc -u -q0 169.254.1.1 9091 When launch time is set to 1s in the future, the delta between launch time and transmit hardware timestamp is equal to 0.016us, as shown in result below: 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP rx_hash: 0xE343384 with RSS type:0x1 HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec) XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec) No rx_vlan_tci or rx_vlan_proto, err=-95 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6 HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1. (100.000 usec) 0x562ff5dc8880: complete tx idx=4 addr=4018 HW Launch-time: 1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0. (0.016 usec) HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0. (32.546 usec) XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0. (29.768 usec) HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1. 
(100.016 usec) 0x562ff5dc8880: complete rx idx=132 addr=84110 Signed-off-by: Song Yoong Siang --- drivers/net/ethernet/intel/igc/igc_main.c | 78 --- 1 file changed, 56 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 27872bdea9bd..6857f5f5b4b2 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s return false; } +static void igc_insert_empty_packet(struct igc_ring *tx_ring) +{ + struct igc_tx_buffer *empty_info; + struct sk_buff *empty; + void *data; + + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use]; + empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC); + if (!empty) + return; + + data = skb_put(empty, IGC_EMPTY_FRAME_SIZE); + memset(data, 0, IGC_EMPTY_FRAME_SIZE); + + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0); + + if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0) + dev_kfree_skb_any(empty); +} + static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb, struct igc_ring *tx_ring) { @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb, skb->tstamp = ktime_set(0, 0); launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty); - if (insert_empty) { - struct igc_tx_buffer *empty_info; - struct sk_buff *empty; - void *data; - - empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use]; - empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC); - if (!empty) - goto done; - - data = skb_put(empty, IGC_EMPTY_FRAME_SIZE); - memset(data, 0, IGC_EMPTY_FRAME_SIZE); - - igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0); - - if (igc_init_tx_empty_descriptor(tx_ring, -empty, -empty_info) < 0) - dev_kfree_skb_any(empty); - } + if (insert_empty) + igc_insert_empty_packet(tx_ring); done: /* record the location of the first descriptor for this packet */ @@ -2955,9 +2957,33 @@ 
static u64 igc_xsk_fill_timestamp(void *_priv) return *(u64 *)_priv; } +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv) +{ + struct igc_metadata_request *meta_req = _priv; + struct igc_ring *tx_ring = meta_req->tx_ring; + __le32 launch_time_offset; + bool insert_empty = false; + bool first_flag = false; + + if (!tx_ring-
[Intel-wired-lan] [PATCH bpf-next v4 3/4] net: stmmac: Add launch time support to XDP ZC
Enable launch time (Time-Based Scheduling) support to XDP zero copy via XDP
Tx metadata framework.

This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on
Intel Tiger Lake platform. Below are the test steps and result.

Test Steps:
1. Add mqprio qdisc:
   $ sudo tc qdisc add dev enp0s30f4 handle 8001: parent root mqprio num_tc
     4 map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 queues 1@0 1@1 1@2 1@3 hw 0

2. Enable launch time hardware offload on hardware queue 1:
   $ sudo tc qdisc replace dev enp0s30f4 parent 8001:2 etf offload clockid
     CLOCK_TAI delta 50

3. Add an ingress qdisc:
   $ sudo tc qdisc add dev enp0s30f4 ingress

4. Add a flower filter to route incoming packet with VLAN priority 1 into
   hardware queue 1:
   $ sudo tc filter add dev enp0s30f4 parent : protocol 802.1Q flower
     vlan_prio 1 hw_tc 1

5. Enable VLAN tag stripping:
   $ sudo ethtool -K enp0s30f4 rxvlan on

6. Start xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp0s30f4 -l 10

7. Send a UDP packet with VLAN priority 1 to port 9091 of DUT.

When launch time is set to 1s in the future, the delta between launch time
and transmit hardware timestamp is equal to 16.963us, as shown in result
below:
  0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
  No rx_hash, err=-95
  HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to User RX-time sec:0.0004 (375.624 usec)
  XDP RX-time: 1734579065768004454 (sec:1734579065.7680) delta to User RX-time sec:0.0001 (88.498 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x55b5864717a8: ping-pong with csum=5619 (want ) csum_start=34 csum_offset=6
  HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to HW Launch-time sec:1. (100.000 usec)
  0x55b5864717a8: complete tx idx=4 addr=4018
  HW Launch-time: 1734579066767717328 (sec:1734579066.7677) delta to HW TX-complete-time sec:0. (16.963 usec)
  HW TX-complete-time: 1734579066767734291 (sec:1734579066.7677) delta to User TX-complete-time sec:0.0001 (130.408 usec)
  XDP RX-time: 1734579065768004454 (sec:1734579065.7680) delta to User TX-complete-time sec:0. (999860.245 usec)
  HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to HW TX-complete-time sec:1. (116.963 usec)
  0x55b5864717a8: complete rx idx=132 addr=88100

Signed-off-by: Song Yoong Siang
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 1d86439b8a14..c80462d42989 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -106,6 +106,8 @@ struct stmmac_metadata_request {
 	struct stmmac_priv *priv;
 	struct dma_desc *tx_desc;
 	bool *set_ic;
+	struct dma_edesc *edesc;
+	int tbs;
 };
 
 struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c81ea8cdfe6e..3a083e3684ed 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2445,9 +2445,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
 	return 0;
 }
 
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+	struct stmmac_metadata_request *meta_req = _priv;
+	struct timespec64 ts = ns_to_timespec64(launch_time);
+
+	if (meta_req->tbs & STMMAC_TBS_EN)
+		stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+				    ts.tv_nsec);
+}
+
 static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
 	.tmo_request_timestamp		= stmmac_xsk_request_timestamp,
 	.tmo_fill_timestamp		= stmmac_xsk_fill_timestamp,
+	.tmo_request_launch_time	= stmmac_xsk_request_launch_time,
 };
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2531,6 +2542,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 		meta_req.priv = priv;
 		meta_req.tx_desc = tx_desc;
 		meta_req.set_ic = &set_ic;
+		meta_req.tbs = tx_q->tbs;
+		meta_req.edesc = &tx_q->dma_entx[entry];
 		xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops,
 					&meta_req);
 		if (set_ic) {
-- 
2.34.1
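The timestamps in the test log of the stmmac patch above can be cross-checked by hand. The following is a minimal user-space sketch, not driver code: it assumes only the nanosecond values printed in the log, and `split_ns()` and `delta_ns()` are hypothetical helpers. `split_ns()` mimics the seconds/nanoseconds split that the new callback obtains from ns_to_timespec64() for non-negative inputs.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Values copied from the test log in the commit message. */
static const uint64_t hw_rx_time = 1734579065767717328ULL;
static const uint64_t hw_launch_time = 1734579066767717328ULL;
static const uint64_t hw_tx_complete_time = 1734579066767734291ULL;

/* Hypothetical stand-in for the sec/nsec pair the callback hands on. */
struct ts_parts {
	uint64_t sec;
	uint32_t nsec;
};

/* Split an absolute nanosecond timestamp into seconds and nanoseconds. */
static struct ts_parts split_ns(uint64_t ns)
{
	struct ts_parts ts = {
		.sec = ns / NSEC_PER_SEC,
		.nsec = (uint32_t)(ns % NSEC_PER_SEC),
	};

	return ts;
}

/* Difference between two timestamps; the caller passes them in order. */
static uint64_t delta_ns(uint64_t earlier, uint64_t later)
{
	assert(later >= earlier);
	return later - earlier;
}
```

With these values, `delta_ns(hw_rx_time, hw_launch_time)` is exactly one second, which matches the "launch time is set to 1s in the future" setup, and `delta_ns(hw_launch_time, hw_tx_complete_time)` is 16963 ns, which is the 16.963 usec reported in the log.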
[Intel-wired-lan] [PATCH bpf-next v4 1/4] xsk: Add launch time hardware offload support to XDP Tx metadata
Extend the XDP Tx metadata framework so that users can request launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to the Ethernet driver via the
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev
Signed-off-by: Song Yoong Siang
---
 Documentation/netlink/specs/netdev.yaml      |  4 ++
 Documentation/networking/xsk-tx-metadata.rst | 64
 include/net/xdp_sock.h                       | 10 +++
 include/net/xdp_sock_drv.h                   |  1 +
 include/uapi/linux/if_xdp.h                  | 10 +++
 include/uapi/linux/netdev.h                  |  3 +
 net/core/netdev-genl.c                       |  2 +
 net/xdp/xsk.c                                |  3 +
 tools/include/uapi/linux/if_xdp.h            | 10 +++
 tools/include/uapi/linux/netdev.h            |  3 +
 10 files changed, 110 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..e59c8a14f7d1 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
         name: tx-checksum
         doc:
           L3 checksum HW offload is supported by the driver.
+      -
+        name: tx-launch-time
+        doc:
+          Launch time HW offload is supported by the driver.
   -
     name: queue-type
     type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..3cec089747ce 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
   checksum. ``csum_start`` specifies byte offset of where the checksumming
   should start and ``csum_offset`` specifies byte offset where the
   device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+  packet for transmission at a pre-determined time called launch time. The
+  value of launch time is indicated by ``launch_time`` field of
+  ``union xsk_tx_metadata``.
 
 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
 is calculated on the CPU. Do not enable this option in production because it
 will negatively affect performance.
 
+Launch Time
+===========
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. In scenarios where the launch time offload feature is
+disabled, the device driver is expected to disregard the launch time
+request. For correct interpretation and meaningful operation, the launch
+time should never be set to a value larger than the farthest programmable
+time in the future (the horizon). Different devices have different hardware
+limitations on the launch time offload feature.
+
+stmmac driver
+-------------
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+The stmmac driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
+igc driver
+----------
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues open at all times.
+
+The value of the launch time that is programmed in the Advanced Transmit
+Context Descriptor is a relative offset t
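The stmmac launch time format described in the documentation hunk above can be illustrated with plain arithmetic. This is a sketch under the assumption that the 32-bit value is laid out exactly as the text states (seconds in the top 8 bits, 256 ns units in the low 24 bits). The names `tbs_sec_field()`, `tbs_frac_field()`, `tbs_pack()` and `tbs_within_horizon()` are hypothetical, and the real descriptor may split these fields across words differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define TBS_WINDOW_SEC	256ULL			/* 8-bit seconds field rolls over here */
#define TBS_HORIZON_SEC	(TBS_WINDOW_SEC / 2)	/* 128 s, as stated in the text */

/* Seconds field: low 8 bits of the absolute seconds count. */
static uint32_t tbs_sec_field(uint64_t launch_ns)
{
	return (uint32_t)((launch_ns / NSEC_PER_SEC) & 0xff);
}

/* Sub-second field: nanoseconds within the second, in 256 ns units. */
static uint32_t tbs_frac_field(uint64_t launch_ns)
{
	return (uint32_t)((launch_ns % NSEC_PER_SEC) >> 8);
}

/* Pack both fields into one 32-bit value (assumed layout, see above). */
static uint32_t tbs_pack(uint64_t launch_ns)
{
	uint32_t frac = tbs_frac_field(launch_ns);

	assert(frac <= 0xffffff);	/* must fit in 24 bits */
	return (tbs_sec_field(launch_ns) << 24) | frac;
}

/* A launch time is usable only if it lies at most 128 s ahead of now.
 * Launch times in the past are treated as not schedulable in this sketch. */
static bool tbs_within_horizon(uint64_t now_ns, uint64_t launch_ns)
{
	if (launch_ns < now_ns)
		return false;
	return launch_ns - now_ns <= TBS_HORIZON_SEC * NSEC_PER_SEC;
}
```

Because only 8 bits of seconds are kept, two launch times that differ by a multiple of 256 seconds pack to the same value. Limiting requests to half of that window is what lets the hardware comparison tell "soon" from "long ago", which is presumably why the text gives a 128-second horizon.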
Re: [Intel-wired-lan] [PATCH v2 RESEND net-next] e1000e: makes e1000_watchdog_task use queue_delayed_work
On 1/5/2025 1:38 PM, Dmitrii Ermakov wrote: Replaces watchdog timer with delayed_work as advised in the driver's TODO comment. Signed-off-by: Dmitrii Ermakov --- V1 -> V2: Removed redundant line wraps, renamed e1000_watchdog to e1000_watchdog_work drivers/net/ethernet/intel/e1000e/e1000.h | 4 +-- drivers/net/ethernet/intel/e1000e/netdev.c | 42 -- 2 files changed, 16 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h index ba9c19e6994c..5a60372d2158 100644 --- a/drivers/net/ethernet/intel/e1000e/e1000.h +++ b/drivers/net/ethernet/intel/e1000e/e1000.h @@ -189,12 +189,12 @@ struct e1000_phy_regs { /* board specific private data structure */ struct e1000_adapter { - struct timer_list watchdog_timer; struct timer_list phy_info_timer; struct timer_list blink_timer; + struct delayed_work watchdog_work; + struct work_struct reset_task; - struct work_struct watchdog_task; const struct e1000_info *ei; diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 286155efcedf..cb68662cdc3a 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1778,7 +1778,7 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data) } /* guard against interrupt when we're going down */ if (!test_bit(__E1000_DOWN, &adapter->state)) - mod_timer(&adapter->watchdog_timer, jiffies + 1); + queue_delayed_work(system_wq, &adapter->watchdog_work, 1); } /* Reset on uncorrectable ECC error */ @@ -1857,7 +1857,7 @@ static irqreturn_t e1000_intr(int __always_unused irq, void *data) } /* guard against interrupt when we're going down */ if (!test_bit(__E1000_DOWN, &adapter->state)) - mod_timer(&adapter->watchdog_timer, jiffies + 1); + queue_delayed_work(system_wq, &adapter->watchdog_work, 1); } /* Reset on uncorrectable ECC error */ @@ -1901,7 +1901,7 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, 
void *data) hw->mac.get_link_status = true; /* guard against interrupt when we're going down */ if (!test_bit(__E1000_DOWN, &adapter->state)) - mod_timer(&adapter->watchdog_timer, jiffies + 1); + queue_delayed_work(system_wq, &adapter->watchdog_work, 1); } if (!test_bit(__E1000_DOWN, &adapter->state)) @@ -4287,7 +4287,8 @@ void e1000e_down(struct e1000_adapter *adapter, bool reset) napi_synchronize(&adapter->napi); - del_timer_sync(&adapter->watchdog_timer); + cancel_delayed_work_sync(&adapter->watchdog_work); + del_timer_sync(&adapter->phy_info_timer); spin_lock(&adapter->stats64_lock); @@ -5169,25 +5170,12 @@ static void e1000e_check_82574_phy_workaround(struct e1000_adapter *adapter) } } -/** - * e1000_watchdog - Timer Call-back - * @t: pointer to timer_list containing private info adapter - **/ -static void e1000_watchdog(struct timer_list *t) +static void e1000_watchdog_work(struct work_struct *work) { - struct e1000_adapter *adapter = from_timer(adapter, t, watchdog_timer); - - /* Do the rest outside of interrupt context */ - schedule_work(&adapter->watchdog_task); - - /* TODO: make this use queue_delayed_work() */ -} - -static void e1000_watchdog_task(struct work_struct *work) -{ - struct e1000_adapter *adapter = container_of(work, -struct e1000_adapter, -watchdog_task); + struct delayed_work *dwork = + container_of(work, struct delayed_work, work); + struct e1000_adapter *adapter = + container_of(dwork, struct e1000_adapter, watchdog_work); struct net_device *netdev = adapter->netdev; struct e1000_mac_info *mac = &adapter->hw.mac; struct e1000_phy_info *phy = &adapter->hw.phy; @@ -5416,8 +5404,8 @@ static void e1000_watchdog_task(struct work_struct *work) /* Reset the timer */ if (!test_bit(__E1000_DOWN, &adapter->state)) - mod_timer(&adapter->watchdog_timer, - round_jiffies(jiffies + 2 * HZ)); + queue_delayed_work(system_wq, &adapter->watchdog_work, + round_jiffies(2 * HZ)); } #define E1000_TX_FLAGS_CSUM 0x0001 @@ -7596,11 +7584,10 @@ static int 
e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent) goto err_eeprom; } - timer_setup(&adapter->watchdog_timer, e1000_watchdog, 0); timer_setup(&adapter->phy_info_timer, e1000_upd
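The two-step container_of() lookup in the new e1000_watchdog_work() above is the usual way to get from a work item back to the structure that embeds a delayed_work. The following user-space sketch shows only that pointer arithmetic. The struct definitions are simplified stand-ins rather than the kernel's, and `struct adapter` and `adapter_from_work()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the real ones have more fields. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct delayed_work {
	struct work_struct work;
	unsigned long delay;	/* placeholder for the embedded timer */
};

/* Hypothetical driver-private structure embedding the delayed work. */
struct adapter {
	int id;
	struct delayed_work watchdog_work;
};

/* User-space equivalent of the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the adapter from the work pointer passed to the handler:
 * work_struct -> enclosing delayed_work -> enclosing adapter. */
static struct adapter *adapter_from_work(struct work_struct *work)
{
	struct delayed_work *dwork;

	assert(work != NULL);
	dwork = container_of(work, struct delayed_work, work);
	return container_of(dwork, struct adapter, watchdog_work);
}
```

In the kernel, the first step can also be written with the to_delayed_work() helper from linux/workqueue.h. The patch spells out both container_of() calls, which is equivalent.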
Re: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix transaction timeouts on reset
On Thu, Dec 19, 2024 at 06:09:32PM -0800, Emil Tantilov wrote:
> Restore the call to idpf_vc_xn_shutdown() at the beginning of
> idpf_vc_core_deinit() provided the function is not called on remove.
> In the reset path the mailbox is destroyed, leading to all transactions
> timing out.
>
> Fixes: 09d0fb5cb30e ("idpf: deinit virtchnl transaction manager after vport
> and vectors")
> Reviewed-by: Larysa Zaremba
> Signed-off-by: Emil Tantilov
> ---
> Changelog:
> v2:
> - Assigned the current state of REMOVE_IN_PROG flag to a boolean
>   variable, to be checked instead of reading the flag twice.
> - Updated the description to clarify the reason for the timeouts on
>   reset is due to the mailbox being destroyed.
>
> v1:
> https://lore.kernel.org/intel-wired-lan/20241218014417.3786-1-emil.s.tanti...@intel.com/
>
> Testing hints:
> echo 1 > /sys/class/net//device/reset

Thanks for the update,

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [PATCH iwl-next v4] e1000e: Fix real-time violations on link up
On Thu, Dec 19, 2024 at 08:27:43PM +0100, Gerhard Engleder wrote:
> From: Gerhard Engleder
>
> Link down and up triggers update of MTA table. This update executes many
> PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer
> from delay in the range of 50us. This results in timing violations on
> real-time systems during link down and up of e1000e in combination with
> an Intel i3-2310E Sandy Bridge CPU.
>
> The i3-2310E is quite old. Launched 2011 by Intel but still in use as
> robot controller. The exact root cause of the problem is unclear and
> this situation won't change as Intel support for this CPU has ended
> years ago. Our experience is that the number of posted PCIe writes needs
> to be limited at least for real-time systems. With posted PCIe writes a
> much higher throughput can be generated than with PCIe reads which
> cannot be posted. Thus, the load on the interconnect is much higher.
> Additionally, a PCIe read waits until all posted PCIe writes are done.
> Therefore, the PCIe read can block the CPU for much more than 10us if a
> lot of PCIe writes were posted before. Both issues are the reason why we
> are limiting the number of posted PCIe writes in a row in general for our
> real-time systems, not only for this driver.
>
> A flush after a low enough number of posted PCIe writes eliminates the
> delay but also increases the time needed for MTA table update. The
> following measurements were done on i3-2310E with e1000e for 128 MTA
> table entries:
>
> Single flush after all writes: 106us
> Flush after every write:       429us
> Flush after every 2nd write:   266us
> Flush after every 4th write:   180us
> Flush after every 8th write:   141us
> Flush after every 16th write:  121us
>
> A flush after every 8th write delays the link up by 35us and the
> negative impact to DMA transfers of other targets is still tolerable.
>
> Execute a flush after every 8th write. This prevents overloading the
> interconnect with posted writes.
>
> Signed-off-by: Gerhard Engleder
> Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0...@lunn.ch/T/
> CC: Vitaly Lifshits
> Reviewed-by: Przemek Kitszel
> Tested-by: Avigail Dahan
> ---
> v4:
> - add PREEMPT_RT dependency again (Vitaly Lifshits)
> - fix comment style (Alexander Lobakin)
> - add to comment each 8th and explain why (Alexander Lobakin)
> - simplify check for every 8th write (Alexander Lobakin)
>
> v3:
> - mention problematic platform explicitly (Bjorn Helgaas)
> - improve comment (Paul Menzel)
>
> v2:
> - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel)

Reviewed-by: Simon Horman
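The "flush after every 8th write" idea from the quoted commit message can be sketched with a mock device that only counts operations. This is an illustration of the batching pattern under stated assumptions, not the driver's actual loop: `struct mock_dev`, `mock_write()`, `mock_flush()` and `update_table()` are hypothetical, and the real patch additionally depends on PREEMPT_RT according to its changelog.

```c
#include <assert.h>
#include <stdint.h>

#define MTA_ENTRIES	128	/* table size used for the measurements */
#define FLUSH_INTERVAL	8	/* interval chosen in the commit message */

/* Mock device: counts posted writes and flushes instead of touching PCIe. */
struct mock_dev {
	uint32_t mta[MTA_ENTRIES];
	unsigned int writes;
	unsigned int flushes;
	unsigned int unflushed;		/* writes posted since the last flush */
	unsigned int max_unflushed;	/* longest run of posted writes seen */
};

static void mock_write(struct mock_dev *dev, unsigned int idx, uint32_t val)
{
	assert(idx < MTA_ENTRIES);
	dev->mta[idx] = val;
	dev->writes++;
	if (++dev->unflushed > dev->max_unflushed)
		dev->max_unflushed = dev->unflushed;
}

static void mock_flush(struct mock_dev *dev)
{
	dev->flushes++;
	dev->unflushed = 0;
}

/* Write the whole table, flushing after every `interval` writes and once
 * more at the end if any writes are still posted. */
static void update_table(struct mock_dev *dev, const uint32_t *shadow,
			 unsigned int interval)
{
	assert(interval > 0);
	for (unsigned int i = 0; i < MTA_ENTRIES; i++) {
		mock_write(dev, i, shadow[i]);
		if ((i + 1) % interval == 0)
			mock_flush(dev);
	}
	if (dev->unflushed)
		mock_flush(dev);
}
```

With 128 entries and an interval of 8, at most 8 writes are ever posted back to back, at the cost of 16 flushes instead of 1. In the measurements quoted above, that trade-off corresponds to 141us versus 106us, which is the 35us delay the commit message mentions.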
[Intel-wired-lan] [PATCH net-next v6 0/8] fix two bugs related to page_pool
This patchset fixes a possible time window problem for page_pool and
the DMA API misuse problem mentioned in [1], and tries to avoid the
overhead of the fix using some optimizations.

From the performance data below, the overhead is not obvious due to
performance variations for time_bench_page_pool01_fast_path() and
time_bench_page_pool02_ptr_ring, and there is about 20ns overhead for
time_bench_page_pool03_slow() for fixing the bug.

Before this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[ 323.367627] bench_page_pool_simple: Loaded
[ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:1 tsc_interval:7699707)
[ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:1 tsc_interval:134685507)
[ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:1000 tsc_interval:15010120)
[ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:1 tsc_interval:65421294)
[ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:1000 tsc_interval:29633814)
[ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:1000 tsc_interval:57391174)
[ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:1000 tsc_interval:181849581)
[ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:1000 tsc_interval:29632785)
[ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:1000 tsc_interval:79523650)
[ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:1000 tsc_interval:190104743)

After this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[ 138.634758] bench_page_pool_simple: Loaded
[ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:1 tsc_interval:7697265)
[ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:1 tsc_interval:134673531)
[ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:1000 tsc_interval:15005497)
[ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:1 tsc_interval:65412493)
[ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:1000 tsc_interval:30159812)
[ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:1000 tsc_interval:70140573)
[ 141.994933] bench_page_pool_simple: time_
[Intel-wired-lan] [PATCH net-next v6 1/8] page_pool: introduce page_pool_get_pp() API
introduce page_pool_get_pp() API to avoid caller accessing page->pp directly. Signed-off-by: Yunsheng Lin --- drivers/net/ethernet/freescale/fec_main.c | 8 +--- .../net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 2 +- drivers/net/ethernet/intel/iavf/iavf_txrx.c| 6 -- drivers/net/ethernet/intel/idpf/idpf_txrx.c| 14 +- drivers/net/ethernet/intel/libeth/rx.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 ++- drivers/net/netdevsim/netdev.c | 6 -- drivers/net/wireless/mediatek/mt76/mt76.h | 2 +- include/net/libeth/rx.h| 3 ++- include/net/page_pool/helpers.h| 5 + 10 files changed, 34 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index b2daed55bf6c..18d2119dbec1 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -1009,7 +1009,8 @@ static void fec_enet_bd_init(struct net_device *dev) struct page *page = txq->tx_buf[i].buf_p; if (page) - page_pool_put_page(page->pp, page, 0, false); + page_pool_put_page(page_pool_get_pp(page), + page, 0, false); } txq->tx_buf[i].buf_p = NULL; @@ -1549,7 +1550,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, int budget) xdp_return_frame_rx_napi(xdpf); } else { /* recycle pages of XDP_TX frames */ /* The dma_sync_size = 0 as XDP_TX has already synced DMA for_device */ - page_pool_put_page(page->pp, page, 0, true); + page_pool_put_page(page_pool_get_pp(page), page, 0, true); } txq->tx_buf[index].buf_p = NULL; @@ -3307,7 +3308,8 @@ static void fec_enet_free_buffers(struct net_device *ndev) } else { struct page *page = txq->tx_buf[i].buf_p; - page_pool_put_page(page->pp, page, 0, false); + page_pool_put_page(page_pool_get_pp(page), + page, 0, false); } txq->tx_buf[i].buf_p = NULL; diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c index 403f0f335ba6..87422b8828ff 100644 --- 
a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c @@ -210,7 +210,7 @@ void gve_free_to_page_pool(struct gve_rx_ring *rx, if (!page) return; - page_pool_put_full_page(page->pp, page, allow_direct); + page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct); buf_state->page_info.page = NULL; } diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c index 26b424fd6718..e1bf5554f6e3 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c @@ -1050,7 +1050,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb, const struct libeth_fqe *rx_buffer, unsigned int size) { - u32 hr = rx_buffer->page->pp->p.offset; + struct page_pool *pool = page_pool_get_pp(rx_buffer->page); + u32 hr = pool->p.offset; skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page, rx_buffer->offset + hr, size, rx_buffer->truesize); @@ -1067,7 +1068,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb, static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer, unsigned int size) { - u32 hr = rx_buffer->page->pp->p.offset; + struct page_pool *pool = page_pool_get_pp(rx_buffer->page); + u32 hr = pool->p.offset; struct sk_buff *skb; void *va; diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c index 2fa9c36e33c9..04f2347716ca 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c @@ -385,7 +385,8 @@ static void idpf_rx_page_rel(struct libeth_fqe *rx_buf) if (unlikely(!rx_buf->page)) return; - page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false); + page_pool_put_full_page(page_pool_get_pp(rx_buf->page), rx_buf->page, + false); rx_buf->page = NULL; rx_buf->offset = 0; @@ -3098,7 +3099,8 @@ idpf_rx_process_skb_fields(struct idpf_rx_queu
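The call-site conversions in the page_pool_get_pp() patch above all follow one pattern: ask an accessor for the pool instead of dereferencing page->pp directly. The sketch below shows that pattern in user-space C with simplified stand-in structures. The helper's body is not visible in this excerpt, so returning `page->pp` is an assumption inferred from the converted call sites, and `rx_headroom()` is a hypothetical caller modeled on the iavf hunks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct page_pool_params {
	unsigned int offset;
};

struct page_pool {
	struct page_pool_params p;
};

struct page {
	struct page_pool *pp;
};

/* Assumed shape of the accessor: the only place that knows where the
 * pool pointer is stored. */
static inline struct page_pool *page_pool_get_pp(const struct page *page)
{
	assert(page != NULL);
	return page->pp;
}

/* Caller-side pattern: fetch the pool once, then read the headroom from
 * its parameters. */
static unsigned int rx_headroom(const struct page *page)
{
	const struct page_pool *pool = page_pool_get_pp(page);

	return pool->p.offset;
}
```

Routing every access through one helper means a later change to how the pool is looked up only has to touch the accessor, not each driver.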
Re: [Intel-wired-lan] [RFC net-next] ixgbevf: Remove unused ixgbevf_hv_mbx_ops
On Thu, Dec 26, 2024 at 02:09:23PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> The const struct ixgbevf_hv_mbx_ops was added in 2016 as part of
> commit c6d45171d706 ("ixgbevf: Support Windows hosts (Hyper-V)")
>
> but has remained unused.
>
> The functions it references are still referenced elsewhere.
>
> Remove it.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 1/3] igc: Remove unused igc_acquire/release_nvm
On Thu, Dec 26, 2024 at 04:52:13PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> igc_acquire_nvm() and igc_release_nvm() were added in 2018 as part of
> commit ab4056126813 ("igc: Add NVM support")
>
> but never used.
>
> Remove them.
>
> igc_i225.c has its own specific implementations.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 2/3] igc: Remove unused igc_read/write_pci_cfg wrappers
On Thu, Dec 26, 2024 at 04:52:14PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> igc_read_pci_cfg() and igc_write_pci_cfg() were added in 2018 as part of
> commit 146740f9abc4 ("igc: Add support for PF")
> but have remained unused.
>
> Remove them.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman
Re: [Intel-wired-lan] [RFC net-next 3/3] igc: Remove unused igc_read/write_pcie_cap_reg
On Thu, Dec 26, 2024 at 04:52:15PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert"
>
> The last uses of igc_read_pcie_cap_reg() and igc_write_pcie_cap_reg()
> were removed in 2019 by
> commit 16ecd8d9af26 ("igc: Remove the obsolete workaround")
>
> Remove them.
>
> Signed-off-by: Dr. David Alan Gilbert

Reviewed-by: Simon Horman