Re: [Intel-wired-lan] [PATCH net-next v6 0/8] fix two bugs related to page_pool

2025-01-06 Thread Jakub Kicinski
On Mon, 6 Jan 2025 21:01:08 +0800 Yunsheng Lin wrote:
> This patchset fix a possible time window problem for page_pool and
> the dma API misuse problem as mentioned in [1], and try to avoid the
> overhead of the fixing using some optimization.
> 
> From the below performance data, the overhead is not so obvious
> due to performance variations for time_bench_page_pool01_fast_path()
> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
> for time_bench_page_pool03_slow() for fixing the bug.

This appears to make the selftest from the drivers/net target implode.

[   20.227775][  T218] BUG: KASAN: use-after-free in page_pool_item_uninit+0x100/0x130

Running the ping.py tests should be enough to repro.
-- 
pw-bot: cr


Re: [Intel-wired-lan] [PATCH net-next 0/9] i40e deadcoding

2025-01-06 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Thu,  2 Jan 2025 17:37:08 + you wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   This is a bunch of deadcoding of functions that
> are entirely uncalled in the i40e driver.
> 
>   Build tested only.
> 
> [...]

Here is the summary with links:
  - [net-next,1/9] i40e: Deadcode i40e_aq_*
https://git.kernel.org/netdev/net-next/c/59ec698d01eb
  - [net-next,2/9] i40e: Remove unused i40e_blink_phy_link_led
https://git.kernel.org/netdev/net-next/c/39cabb01d26d
  - [net-next,3/9] i40e: Remove unused i40e_(read|write)_phy_register
https://git.kernel.org/netdev/net-next/c/8cc51e28ecce
  - [net-next,4/9] i40e: Deadcode profile code
https://git.kernel.org/netdev/net-next/c/81d6bb2012e1
  - [net-next,5/9] i40e: Remove unused i40e_get_cur_guaranteed_fd_count
https://git.kernel.org/netdev/net-next/c/3eb24a9e0af3
  - [net-next,6/9] i40e: Remove unused i40e_del_filter
https://git.kernel.org/netdev/net-next/c/38dfb07d9a65
  - [net-next,7/9] i40e: Remove unused i40e_commit_partition_bw_setting
https://git.kernel.org/netdev/net-next/c/a324484ac855
  - [net-next,8/9] i40e: Remove unused i40e_asq_send_command_v2
https://git.kernel.org/netdev/net-next/c/d424b93f35a6
  - [net-next,9/9] i40e: Remove unused i40e_dcb_hw_get_num_tc
https://git.kernel.org/netdev/net-next/c/47ea5d4e6f40

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [Intel-wired-lan] [PATCH net-next 0/3] igc deadcoding

2025-01-06 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Thu,  2 Jan 2025 17:41:39 + you wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   This set removes some functions that are entirely unused
> and have been since ~2018.
> 
> Build tested.
> 
> [...]

Here is the summary with links:
  - [net-next,1/3] igc: Remove unused igc_acquire/release_nvm
https://git.kernel.org/netdev/net-next/c/b37dba891b17
  - [net-next,2/3] igc: Remove unused igc_read/write_pci_cfg wrappers
https://git.kernel.org/netdev/net-next/c/121c3c6bc661
  - [net-next,3/3] igc: Remove unused igc_read/write_pcie_cap_reg
https://git.kernel.org/netdev/net-next/c/c75889081366

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [Intel-wired-lan] [PATCH net-next v3 3/6] net: napi: add CPU affinity to napi_config

2025-01-06 Thread Dan Carpenter
Hi Ahmed,

kernel test robot noticed the following build warnings:

url:
https://github.com/intel-lab-lkp/linux/commits/Ahmed-Zaki/net-move-ARFS-rmap-management-to-core/20250104-084501
base:   net-next/main
patch link:
https://lore.kernel.org/r/20250104004314.208259-4-ahmed.zaki%40intel.com
patch subject: [Intel-wired-lan] [PATCH net-next v3 3/6] net: napi: add CPU affinity to napi_config
config: i386-randconfig-141-20250104 (https://download.01.org/0day-ci/archive/20250105/202501050625.ny1c97ex-...@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Reported-by: Dan Carpenter 
| Closes: https://lore.kernel.org/r/202501050625.ny1c97ex-...@intel.com/

smatch warnings:
net/core/dev.c:6835 napi_restore_config() warn: variable dereferenced before check 'n->config' (see line 6831)
net/core/dev.c:6855 napi_save_config() warn: variable dereferenced before check 'n->config' (see line 6850)

vim +6835 net/core/dev.c

86e25f40aa1e9e5 Joe Damato 2024-10-11  6829  static void napi_restore_config(struct napi_struct *n)
86e25f40aa1e9e5 Joe Damato 2024-10-11  6830  {
86e25f40aa1e9e5 Joe Damato 2024-10-11 @6831          n->defer_hard_irqs = n->config->defer_hard_irqs;
86e25f40aa1e9e5 Joe Damato 2024-10-11  6832          n->gro_flush_timeout = n->config->gro_flush_timeout;
5dc51ec86df6e22 Martin Karsten 2024-11-09  6833          n->irq_suspend_timeout = n->config->irq_suspend_timeout;
                                                                                  ^
These lines all dereference n->config.

d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6834
d6b43b8a2e5297b Ahmed Zaki 2025-01-03 @6835          if (n->irq > 0 && n->config && n->dev->irq_affinity_auto)
                                                                       ^
This code assumes it can be NULL

d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6836                  irq_set_affinity(n->irq, &n->config->affinity_mask);
d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6837
86e25f40aa1e9e5 Joe Damato 2024-10-11  6838          /* a NAPI ID might be stored in the config, if so use it. if not, use
86e25f40aa1e9e5 Joe Damato 2024-10-11  6839           * napi_hash_add to generate one for us. It will be saved to the config
86e25f40aa1e9e5 Joe Damato 2024-10-11  6840           * in napi_disable.
86e25f40aa1e9e5 Joe Damato 2024-10-11  6841           */
86e25f40aa1e9e5 Joe Damato 2024-10-11  6842          if (n->config->napi_id)
86e25f40aa1e9e5 Joe Damato 2024-10-11  6843                  napi_hash_add_with_id(n, n->config->napi_id);
86e25f40aa1e9e5 Joe Damato 2024-10-11  6844          else
86e25f40aa1e9e5 Joe Damato 2024-10-11  6845                  napi_hash_add(n);
86e25f40aa1e9e5 Joe Damato 2024-10-11  6846  }
86e25f40aa1e9e5 Joe Damato 2024-10-11  6847
86e25f40aa1e9e5 Joe Damato 2024-10-11  6848  static void napi_save_config(struct napi_struct *n)
86e25f40aa1e9e5 Joe Damato 2024-10-11  6849  {
86e25f40aa1e9e5 Joe Damato 2024-10-11 @6850          n->config->defer_hard_irqs = n->defer_hard_irqs;
86e25f40aa1e9e5 Joe Damato 2024-10-11  6851          n->config->gro_flush_timeout = n->gro_flush_timeout;
5dc51ec86df6e22 Martin Karsten 2024-11-09  6852          n->config->irq_suspend_timeout = n->irq_suspend_timeout;
86e25f40aa1e9e5 Joe Damato 2024-10-11  6853          n->config->napi_id = n->napi_id;
d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6854
d6b43b8a2e5297b Ahmed Zaki 2025-01-03 @6855          if (n->irq > 0 && n->config && n->dev->irq_affinity_auto)

Same

d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6856                  irq_set_affinity_notifier(n->irq, NULL);
d6b43b8a2e5297b Ahmed Zaki 2025-01-03  6857
86e25f40aa1e9e5 Joe Damato 2024-10-11  6858          napi_hash_del(n);
86e25f40aa1e9e5 Joe Damato 2024-10-11  6859  }
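
The pattern smatch flags here can be reproduced outside the kernel. What follows is a user-space sketch, not the net/core/dev.c code: struct napi_model, struct napi_config_model and napi_restore_config_model() are hypothetical stand-ins carrying only the fields needed to show the ordering. It assumes n->config may legitimately be NULL; if it never can be, the simpler resolution would be to drop the n->config test from the new condition instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down models of the kernel structures. */
struct napi_config_model {
	unsigned int defer_hard_irqs;
	unsigned long gro_flush_timeout;
};

struct napi_model {
	struct napi_config_model *config;
	unsigned int defer_hard_irqs;
	unsigned long gro_flush_timeout;
};

/*
 * Check the pointer once, before the first dereference, so that every
 * later access is covered by the same test. Returns false and leaves
 * *n untouched when there is no config to restore from.
 */
static bool napi_restore_config_model(struct napi_model *n)
{
	if (!n->config)
		return false;

	n->defer_hard_irqs = n->config->defer_hard_irqs;
	n->gro_flush_timeout = n->config->gro_flush_timeout;
	return true;
}
```

Either direction (an early NULL check, or no check at all) makes the function internally consistent, which is all the warning asks for.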

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [Intel-wired-lan] [RFC net-next 1/9] i40e: Deadcode i40e_aq_*

2025-01-06 Thread Simon Horman
On Sat, Dec 21, 2024 at 06:42:39PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> i40e_aq_add_mirrorrule(), i40e_aq_delete_mirrorrule() and
> i40e_aq_set_vsi_vlan_promisc() were added in 2016 by
> commit 7bd6875bef70 ("i40e: APIs to Add/remove port mirroring rules")
> but haven't been used.
> 
> They were the last user of i40e_mirrorrule_op().
> 
> i40e_aq_rearrange_nvm() was added in 2018 by
> commit f05798b4ff82 ("i40e: Add AQ command for rearrange NVM structure")
> but hasn't been used.
> 
> i40e_aq_restore_lldp() was added in 2019 by
> commit c65e78f87f81 ("i40e: Further implementation of LLDP")
> but hasn't been used.
> 
> Remove them.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 4/9] i40e: Deadcode profile code

2025-01-06 Thread Simon Horman
On Sat, Dec 21, 2024 at 06:42:42PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> i40e_add_pinfo_to_list() was added in 2017 by
> commit 1d5c960c5ef5 ("i40e: new AQ commands")
> 
> i40e_find_section_in_profile() was added in 2019 by
> commit cdc594e00370 ("i40e: Implement DDP support in i40e driver")
> 
> Neither have been used.
> 
> Remove them.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 3/9] i40e: Remove unused i40e_(read|write)_phy_register

2025-01-06 Thread Simon Horman
On Sat, Dec 21, 2024 at 06:42:41PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> i40e_read_phy_register() and i40e_write_phy_register() were added in
> 2016 by
> commit f62ba91458b5 ("i40e: Add functions which apply correct PHY access
> method for read and write operation")
> 
> but haven't been used.
> 
> Remove them.
> 
> (There are more specific _clause* variants of these functions
> that are still used.)
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 8/9] i40e: Remove unused i40e_asq_send_command_v2

2025-01-06 Thread Simon Horman
On Sat, Dec 21, 2024 at 06:42:46PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> i40e_asq_send_command_v2() was added in 2022 by
> commit 74073848b0d7 ("i40e: Add new versions of send ASQ command
> functions")
> but hasn't been used.
> 
> Remove it.
> 
> (The _atomic_v2 version of the function is used, so leave it).
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 6/9] i40e: Remove unused i40e_del_filter

2025-01-06 Thread Simon Horman
On Sat, Dec 21, 2024 at 06:42:44PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The last use of i40e_del_filter() was removed in 2016 by
> commit 9569a9a4547d ("i40e: when adding or removing MAC filters, correctly
> handle VLANs")
> 
> Remove it.
> 
> Fix up a comment that referenced it.
> 
> Note: The __ version of this function is still used.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [PATCH iwl-next v6] ice: Add E830 checksum offload support

2025-01-06 Thread Simon Horman
On Wed, Dec 18, 2024 at 04:11:45AM -0500, Paul Greenwalt wrote:
> E830 supports raw receive and generic transmit checksum offloads.
> 
> Raw receive checksum support is provided by hardware calculating the
> checksum over the whole packet, regardless of type. The calculated
> checksum is provided to driver in the Rx flex descriptor. Then the driver
> assigns the checksum to skb->csum and sets skb->ip_summed to
> CHECKSUM_COMPLETE.
> 
> Generic transmit checksum support is provided by hardware calculating the
> checksum given two offsets: the start offset to begin checksum calculation,
> and the offset to insert the calculated checksum in the packet. Support is
> advertised to the stack using NETIF_F_HW_CSUM feature.
> 
> E830 has the following limitations when both generic transmit checksum
> offload and TCP Segmentation Offload (TSO) are enabled:
> 
> 1. Inner packet header modification is not supported. This restriction
>includes the inability to alter TCP flags, such as the push flag. As a
>result, this limitation can impact the receiver's ability to coalesce
>packets, potentially degrading network throughput.
> 2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents
>support of Maximum Transmission Unit (MTU) greater than 1063 bytes.
> 
> Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually
> exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not
> enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the
> defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and
> NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity
> of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().
> 
> When NETIF_F_HW_CSUM is requested the provided skb->csum_start and
> skb->csum_offset are passed to hardware in the Tx context descriptor
> generic checksum (GCS) parameters. Hardware calculates the 1's complement
> from skb->csum_start to the end of the packet, and inserts the result in
> the packet at skb->csum_offset.
> 
> Co-developed-by: Alice Michael 
> Signed-off-by: Alice Michael 
> Co-developed-by: Eric Joyner 
> Signed-off-by: Eric Joyner 
> Signed-off-by: Paul Greenwalt 

Reviewed-by: Simon Horman 
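
For readers following the GCS description in the quoted commit message, the calculation can be modelled in software. This is a sketch of the described behaviour only, not ice driver code: csum_fold_sum() and gcs_model_insert() are hypothetical names, the packet is a plain byte buffer, and csum_offset is taken relative to csum_start (as with skb->csum_offset) and assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement sum of buf[0..len) read as big-endian 16-bit words,
 * folded to 16 bits. Suitable for buffers well under 128 KiB.
 */
static uint16_t csum_fold_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad odd tail with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* end-around carry */
	return (uint16_t)sum;
}

/*
 * Model of the generic Tx checksum: sum from csum_start to the end of
 * the packet, complement, and store at csum_start + csum_offset.
 * Returns 0 on success, -1 if the offsets do not fit the packet.
 */
static int gcs_model_insert(uint8_t *pkt, size_t len,
			    size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t csum;

	if (csum_start > len || field + 2 > len || (csum_offset & 1))
		return -1;

	csum = (uint16_t)~csum_fold_sum(pkt + csum_start, len - csum_start);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

With the checksum field zeroed beforehand, re-summing the same range after insertion yields 0xffff, which is the usual receiver-side validity check. In the real stack the field is pre-seeded with the pseudo-header sum, so the same property holds once the pseudo-header is included.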



Re: [Intel-wired-lan] [PATCH net-next 0/9] i40e deadcoding

2025-01-06 Thread Tony Nguyen




On 1/4/2025 8:16 AM, Jakub Kicinski wrote:
> On Thu,  2 Jan 2025 17:37:08 + li...@treblig.org wrote:
>>    This is a bunch of deadcoding of functions that
>> are entirely uncalled in the i40e driver.
>>
>>    Build tested only.
>
> Intel folks, is it okay if we take this (and the igc series)
> in directly? Seems very unlikely to require testing...


It's fine to take directly. I don't think this needs testing either.

I believe this will get picked up from here:
Reviewed-by: Tony Nguyen 

Thanks,
Tony



Re: [Intel-wired-lan] [PATCH net-next 0/3] igc deadcoding

2025-01-06 Thread Tony Nguyen




On 1/2/2025 9:41 AM, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
>
> Hi,
>    This set removes some functions that are entirely unused
> and have been since ~2018.
>
> Build tested.
>
> Signed-off-by: Dr. David Alan Gilbert 
> (Repost now netdev is open)

Reviewed-by: Tony Nguyen 

> Dr. David Alan Gilbert (3):
>    igc: Remove unused igc_acquire/release_nvm
>    igc: Remove unused igc_read/write_pci_cfg wrappers
>    igc: Remove unused igc_read/write_pcie_cap_reg
>
>   drivers/net/ethernet/intel/igc/igc_hw.h   |  5 ---
>   drivers/net/ethernet/intel/igc/igc_main.c | 39 --
>   drivers/net/ethernet/intel/igc/igc_nvm.c  | 50 ---
>   drivers/net/ethernet/intel/igc/igc_nvm.h  |  2 -
>   4 files changed, 96 deletions(-)





Re: [Intel-wired-lan] [PATCH iwl-net v3] ice: fix ice_parser_rt::bst_key array size

2025-01-06 Thread Simon Horman
On Thu, Dec 19, 2024 at 12:55:16PM +0100, Przemek Kitszel wrote:
> Fix &ice_parser_rt::bst_key size. It was wrongly set to 10 instead of 20
> in the initial impl commit (see Fixes tag). All usage code assumed it was
> of size 20. That was also the initial size present up to v2 of the intro
> series [2], but halved by v3 [3] refactor described as "Replace magic
> hardcoded values with macros." The introducing series was so big that
> some ugliness was unnoticed, same for bugs :/
> 
> ICE_BST_KEY_TCAM_SIZE and ICE_BST_TCAM_KEY_SIZE were differing by one.
> There was tmp variable @j in the scope of edited function, but was not
> used in all places. This ugliness is now gone.
> I'm moving ice_parser_rt::pg_prio a few positions up, to fill up one of
> the holes in order to compensate for the added 10 bytes to the ::bst_key,
> resulting in the same size of the whole as prior to the fix, and miminal
> changes in the offsets of the fields.
> 
> Extend also the debug dump print of the key to cover all bytes. To not
> have string with 20 "%02x" and 20 params, switch to
> ice_debug_array_w_prefix().
> 
> This fix obsoletes Ahmed's attempt at [1].
> 
> [1] 
> https://lore.kernel.org/intel-wired-lan/20240823230847.172295-1-ahmed.z...@intel.com
> [2] 
> https://lore.kernel.org/intel-wired-lan/20230605054641.2865142-13-junfeng@intel.com
> [3] 
> https://lore.kernel.org/intel-wired-lan/20230817093442.2576997-13-junfeng@intel.com
> 
> Reported-by: Dan Carpenter 
> Closes: 
> https://lore.kernel.org/intel-wired-lan/b1fb6ff9-b69e-4026-9988-3c783d86c2e0@stanley.mountain
> Fixes: 9a4c07aaa0f5 ("ice: add parser execution main loop")
> CC: Ahmed Zaki 
> Reviewed-by: Larysa Zaremba 
> Signed-off-by: Przemek Kitszel 
> ---
> v3: mention printing change in commit msg, separate prefix from the debug log 
> (Simon)
> 
> v2: same as v3, but lacks code change :(
> 
> v1: 
> https://lore.kernel.org/intel-wired-lan/20241216170548.gi780...@kernel.org/T/#mbf984a0faa12a5bdb53460b150201fdd7cc1826a

Thanks for the updates, much appreciated.

Reviewed-by: Simon Horman 



[Intel-wired-lan] [PATCH bpf-next v4 0/4] xsk: TX metadata Launch Time support

2025-01-06 Thread Song Yoong Siang
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.

Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.

Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, I did not include a clock ID to indicate the
clock source.

To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool so that it sets the launch time from the offset provided by the user
plus the value of the Receive Hardware Timestamp, which is taken against
the PHC. This eliminates the need to discipline the system clock with the
PHC and then use clock_gettime() to get the time.

Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to be familiar with the launch time horizon of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier than the
requested launch time.
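
The early-transmit effect of a modulo computation can be illustrated with a small model. This is a sketch under stated assumptions, not how stmmac or igc are implemented: MODEL_HORIZON_NS is a made-up 1 s horizon (real horizons are device specific and documented per driver), and model_effective_delay_ns() is a hypothetical helper that simply reduces the requested delay modulo that horizon.

```c
#include <stdint.h>

/* Hypothetical horizon, chosen only to make the arithmetic easy to follow. */
#define MODEL_HORIZON_NS 1000000000ULL

/*
 * Delay the modelled device would actually apply for a requested launch
 * time. Requests in the past are sent immediately. Requests beyond the
 * horizon wrap around and therefore go out early.
 */
static uint64_t model_effective_delay_ns(uint64_t now_ns, uint64_t launch_time_ns)
{
	if (launch_time_ns <= now_ns)
		return 0;
	return (launch_time_ns - now_ns) % MODEL_HORIZON_NS;
}
```

Under this model a request 0.1 s ahead is honoured as is, while a request 1.5 s ahead is sent after only 0.5 s, which is the failure mode the paragraph above warns about.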

Although there is no feedback mechanism for the launch time request
for now, users can still check whether the requested launch time is
working by requesting the Transmit Completion Hardware Timestamp.

Changes since v1:
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)

Changes since v2:
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
  because some NICs do not support such a large future time.

Changes since v3:
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
  it is against (Willem)

v1: 
https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-1-yoong.siang.s...@intel.com/
v2: 
https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768-1-yoong.siang.s...@intel.com/
v3: 
https://patchwork.kernel.org/project/netdevbpf/cover/20231203165129.1740512-1-yoong.siang.s...@intel.com/

Song Yoong Siang (4):
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add Launch Time request to xdp_hw_metadata
  net: stmmac: Add launch time support to XDP ZC
  igc: Add launch time support to XDP ZC

 Documentation/netlink/specs/netdev.yaml   |  4 +
 Documentation/networking/xsk-tx-metadata.rst  | 64 +++
 drivers/net/ethernet/intel/igc/igc_main.c | 78 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 +
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 
 include/net/xdp_sock.h| 10 +++
 include/net/xdp_sock_drv.h|  1 +
 include/uapi/linux/if_xdp.h   | 10 +++
 include/uapi/linux/netdev.h   |  3 +
 net/core/netdev-genl.c|  2 +
 net/xdp/xsk.c |  3 +
 tools/include/uapi/linux/if_xdp.h | 10 +++
 tools/include/uapi/linux/netdev.h |  3 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 30 ++-
 14 files changed, 208 insertions(+), 25 deletions(-)

-- 
2.34.1



Re: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG

2025-01-06 Thread Simon Horman
On Thu, Jan 02, 2025 at 08:07:52PM +0100, Marcin Szycik wrote:
> Ever since removing switchdev control VSI and using PF for port
> representor Tx/Rx, switchdev slow-path has been working improperly after
> failover in SR-IOV LAG. LAG assumes that the first uplink to be added to
> the aggregate will own VFs and have switchdev configured. After
> failing-over to the other uplink, representors are still configured to
> Tx through the uplink they are set up on, which fails because that
> uplink is now down.
> 
> On failover, update all PRs on primary uplink to use the currently
> active uplink for Tx. Call netif_keep_dst(), as the secondary uplink
> might not be in switchdev mode. Also make sure to call
> ice_eswitch_set_target_vsi() if uplink is in LAG.
> 
> On the Rx path, representors are already working properly, because
> default Tx from VFs is set to PF owning the eswitch. After failover the
> same PF is receiving traffic from VFs, even though link is down.
> 
> Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> Reviewed-by: Michal Swiatkowski 
> Signed-off-by: Marcin Szycik 

Reviewed-by: Simon Horman 



[Intel-wired-lan] [PATCH bpf-next v4 2/4] selftests/bpf: Add Launch Time request to xdp_hw_metadata

2025-01-06 Thread Song Yoong Siang
Add Launch Time hw offload request to xdp_hw_metadata. Users can configure
the delta of launch time to HW RX-time by using the "-l" argument. The default
delta is 100,000,000 nanoseconds.

Signed-off-by: Song Yoong Siang 
---
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 30 +--
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 6f7b15d6c6ed..795c1d14e02d 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -13,6 +13,7 @@
  * - UDP 9091 packets trigger TX reply
  * - TX HW timestamp is requested and reported back upon completion
  * - TX checksum is requested
+ * - TX launch time HW offload is requested for transmission
  */
 
 #include 
@@ -64,6 +65,8 @@ int rxq;
 bool skip_tx;
 __u64 last_hw_rx_timestamp;
 __u64 last_xdp_rx_timestamp;
+__u64 last_launch_time;
+__u64 launch_time_delta_to_hw_rx_timestamp = 100000000; /* 0.1 second */
 
 void test__fail(void) { /* for network_helpers.c */ }
 
@@ -298,6 +301,8 @@ static bool complete_tx(struct xsk *xsk, clockid_t clock_id)
if (meta->completion.tx_timestamp) {
__u64 ref_tstamp = gettime(clock_id);
 
+   print_tstamp_delta("HW Launch-time", "HW TX-complete-time",
+  last_launch_time, meta->completion.tx_timestamp);
    print_tstamp_delta("HW TX-complete-time", "User TX-complete-time",
   meta->completion.tx_timestamp, ref_tstamp);
print_tstamp_delta("XDP RX-time", "User TX-complete-time",
@@ -395,6 +400,14 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id)
   xsk, ntohs(udph->check), ntohs(want_csum),
   meta->request.csum_start, meta->request.csum_offset);
 
+   /* Set the value of launch time */
+   meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME;
+   meta->request.launch_time = last_hw_rx_timestamp +
+   launch_time_delta_to_hw_rx_timestamp;
+   last_launch_time = meta->request.launch_time;
+   print_tstamp_delta("HW RX-time", "HW Launch-time", last_hw_rx_timestamp,
+  meta->request.launch_time);
+
    memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */
tx_desc->options |= XDP_TX_METADATA;
tx_desc->len = len;
@@ -402,10 +415,14 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id)
xsk_ring_prod__submit(&xsk->tx, 1);
 }
 
+#define SLEEP_PER_ITERATION_IN_US 10
+#define SLEEP_PER_ITERATION_IN_NS (SLEEP_PER_ITERATION_IN_US * 1000)
+#define MAX_ITERATION(x) (((x) / SLEEP_PER_ITERATION_IN_NS) + 500)
 static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
 {
const struct xdp_desc *rx_desc;
struct pollfd fds[rxq + 1];
+   int max_iterations;
__u64 comp_addr;
__u64 addr;
__u32 idx = 0;
@@ -418,6 +435,9 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
fds[i].revents = 0;
}
 
+   /* Calculate max iterations to wait for transmit completion */
+   max_iterations = MAX_ITERATION(launch_time_delta_to_hw_rx_timestamp);
+
fds[rxq].fd = server_fd;
fds[rxq].events = POLLIN;
fds[rxq].revents = 0;
@@ -477,10 +497,10 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
if (ret)
printf("kick_tx ret=%d\n", ret);
 
-   for (int j = 0; j < 500; j++) {
+   for (int j = 0; j < max_iterations; j++) {
        if (complete_tx(xsk, clock_id))
            break;
-   usleep(10);
+   usleep(SLEEP_PER_ITERATION_IN_US);
}
}
}
@@ -608,6 +628,7 @@ static void print_usage(void)
"  -hDisplay this help and exit\n\n"
"  -mEnable multi-buffer XDP for larger MTU\n"
"  -rDon't generate AF_XDP reply (rx metadata only)\n"
+   "  -lDelta of launch time to HW RX-time in ns (default: 100,000,000ns)\n"
"Generate test packets on the other machine with:\n"
"  echo -n xdp | nc -u -q1  9091\n";
 
@@ -618,7 +639,7 @@ static void read_args(int argc, char *argv[])
 {
int opt;
 
-   while ((opt = getopt(argc, argv, "chmr")) != -1) {
+   while ((opt = getopt(argc, argv, "chmrl:")) != -1) {
switch (opt) {
case 'c':
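
The wait bound this patch introduces can be checked in isolation. In the sketch below the three macros are copied from the diff above; launch_time_from_rx(), max_wait_iterations() and max_wait_us() are hypothetical wrappers added only so the numbers can be exercised outside the selftest.

```c
#include <stdint.h>

/* Copied from the patch. */
#define SLEEP_PER_ITERATION_IN_US 10
#define SLEEP_PER_ITERATION_IN_NS (SLEEP_PER_ITERATION_IN_US * 1000)
#define MAX_ITERATION(x) (((x) / SLEEP_PER_ITERATION_IN_NS) + 500)

/* Launch time requested by ping_pong(): HW RX timestamp plus the delta. */
static uint64_t launch_time_from_rx(uint64_t hw_rx_ns, uint64_t delta_ns)
{
	return hw_rx_ns + delta_ns;
}

/* Number of complete_tx() polls allowed for a given delta. */
static uint64_t max_wait_iterations(uint64_t delta_ns)
{
	return MAX_ITERATION(delta_ns);
}

/* Upper bound on time spent sleeping between polls, in microseconds. */
static uint64_t max_wait_us(uint64_t delta_ns)
{
	return max_wait_iterations(delta_ns) * SLEEP_PER_ITERATION_IN_US;
}
```

For the default delta of 100,000,000 ns this gives 10,500 polls, roughly 105 ms of sleeping: the launch delay itself plus the 5 ms (500 x 10 us) budget the loop had before the change.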
   

[Intel-wired-lan] [PATCH bpf-next v4 4/4] igc: Add launch time support to XDP ZC

2025-01-06 Thread Song Yoong Siang
Enable Launch Time Control (LTC) support for XDP zero copy via the XDP Tx
metadata framework.

This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on
an Intel Tiger Lake platform. The test steps and result are below.

Test Steps:
1. Add mqprio qdisc:
   $ sudo tc qdisc add dev enp2s0 handle 8001: parent root mqprio num_tc 4
 map 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 1@2 1@3 hw 0

2. Enable launch time hardware offload on hardware queue 1:
   $ sudo tc qdisc replace dev enp2s0 parent 8001:2 etf offload clockid
 CLOCK_TAI delta 50

3. Change RSS to route all incoming IP packets into hardware queue 1:
   $ sudo ethtool -X enp2s0 start 1 equal 1

4. Start xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp2s0 -l 10

5. Send a UDP packet to port 9091 of the DUT:
   $ echo -n xdp | nc -u -q0 169.254.1.1 9091

When launch time is set to 1s in the future, the delta between launch time
and transmit hardware timestamp is equal to 0.016us, as shown in the result
below:
  0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
  rx_hash: 0xE343384 with RSS type:0x1
  HW RX-time:   1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6
  HW RX-time:   1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1. (100.000 usec)
  0x562ff5dc8880: complete tx idx=4 addr=4018
  HW Launch-time:   1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0. (0.016 usec)
  HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0. (32.546 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0. (29.768 usec)
  HW RX-time:   1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1. (100.016 usec)
  0x562ff5dc8880: complete rx idx=132 addr=84110
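
The deltas in the log above can be re-derived from the raw nanosecond timestamps it prints. A minimal sketch, assuming only the values shown in the log; delta_ns() and ns_to_usec() are hypothetical helpers, not functions from xdp_hw_metadata.c. The subtraction is done on integers first because timestamps of this magnitude do not fit exactly in a double.

```c
#include <stdint.h>

/* Signed difference between two nanosecond timestamps. */
static int64_t delta_ns(uint64_t from_ns, uint64_t to_ns)
{
	return (int64_t)(to_ns - from_ns);
}

/* Convert a small nanosecond delta to microseconds for display. */
static double ns_to_usec(int64_t ns)
{
	return (double)ns / 1000.0;
}
```

With the logged values, HW RX-time to HW Launch-time is 1,000,000,000 ns (the 1 s request), and HW Launch-time to HW TX-complete-time is 16 ns, matching the 0.016 usec quoted in the commit message.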

Signed-off-by: Song Yoong Siang 
---
 drivers/net/ethernet/intel/igc/igc_main.c | 78 ---
 1 file changed, 56 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 27872bdea9bd..6857f5f5b4b2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
return false;
 }
 
+static void igc_insert_empty_packet(struct igc_ring *tx_ring)
+{
+   struct igc_tx_buffer *empty_info;
+   struct sk_buff *empty;
+   void *data;
+
+   empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+   empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
+   if (!empty)
+   return;
+
+   data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
+   memset(data, 0, IGC_EMPTY_FRAME_SIZE);
+
+   igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
+
+   if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
+   dev_kfree_skb_any(empty);
+}
+
 static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
   struct igc_ring *tx_ring)
 {
@@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
skb->tstamp = ktime_set(0, 0);
    launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
 
-   if (insert_empty) {
-   struct igc_tx_buffer *empty_info;
-   struct sk_buff *empty;
-   void *data;
-
-   empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
-   empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
-   if (!empty)
-   goto done;
-
-   data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
-   memset(data, 0, IGC_EMPTY_FRAME_SIZE);
-
-   igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
-
-   if (igc_init_tx_empty_descriptor(tx_ring,
-empty,
-empty_info) < 0)
-   dev_kfree_skb_any(empty);
-   }
+   if (insert_empty)
+   igc_insert_empty_packet(tx_ring);
 
 done:
/* record the location of the first descriptor for this packet */
@@ -2955,9 +2957,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
return *(u64 *)_priv;
 }
 
+static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+   struct igc_metadata_request *meta_req = _priv;
+   struct igc_ring *tx_ring = meta_req->tx_ring;
+   __le32 launch_time_offset;
+   bool insert_empty = false;
+   bool first_flag = false;
+
+   if (!tx_ring-

[Intel-wired-lan] [PATCH bpf-next v4 3/4] net: stmmac: Add launch time support to XDP ZC

2025-01-06 Thread Song Yoong Siang
Enable launch time (Time-Based Scheduling) support for XDP zero copy via the
XDP Tx metadata framework.

This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on
an Intel Tiger Lake platform. The test steps and result are below.

Test Steps:
1. Add mqprio qdisc:
   $ sudo tc qdisc add dev enp0s30f4 handle 8001: parent root mqprio num_tc
 4 map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 queues 1@0 1@1 1@2 1@3 hw 0

2. Enable launch time hardware offload on hardware queue 1:
   $ sudo tc qdisc replace dev enp0s30f4 parent 8001:2 etf offload clockid
 CLOCK_TAI delta 50

3. Add an ingress qdisc:
   $ sudo tc qdisc add dev enp0s30f4 ingress

4. Add a flower filter to route incoming packet with VLAN priority 1 into
   hardware queue 1:
   $ sudo tc filter add dev enp0s30f4 parent : protocol 802.1Q flower
 vlan_prio 1 hw_tc 1

5. Enable VLAN tag stripping:
   $ sudo ethtool -K enp0s30f4 rxvlan on

6. Start xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp0s30f4 -l 10

7. Send a UDP packet with VLAN priority 1 to port 9091 of the DUT.

When launch time is set to 1s in the future, the delta between launch time
and transmit hardware timestamp is equal to 16.963us, as shown in the result
below:
  0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
  No rx_hash, err=-95
  HW RX-time:   1734579065767717328 (sec:1734579065.7677) delta to User RX-time sec:0.0004 (375.624 usec)
  XDP RX-time:   1734579065768004454 (sec:1734579065.7680) delta to User RX-time sec:0.0001 (88.498 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x55b5864717a8: ping-pong with csum=5619 (want ) csum_start=34 csum_offset=6
  HW RX-time:   1734579065767717328 (sec:1734579065.7677) delta to HW Launch-time sec:1. (100.000 usec)
  0x55b5864717a8: complete tx idx=4 addr=4018
  HW Launch-time:   1734579066767717328 (sec:1734579066.7677) delta to HW TX-complete-time sec:0. (16.963 usec)
  HW TX-complete-time:   1734579066767734291 (sec:1734579066.7677) delta to User TX-complete-time sec:0.0001 (130.408 usec)
  XDP RX-time:   1734579065768004454 (sec:1734579065.7680) delta to User TX-complete-time sec:0. (999860.245 usec)
  HW RX-time:   1734579065767717328 (sec:1734579065.7677) delta to HW TX-complete-time sec:1. (116.963 usec)
  0x55b5864717a8: complete rx idx=132 addr=88100
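
The deltas quoted above can be rechecked from the raw nanosecond timestamps
in the log, which is useful because some of the "sec:" fields in the archived
output appear to have lost digits. The following is only a minimal user-space
sketch; the helper names ts_delta_ns() and ns_to_usec() are made up for
illustration and are not part of xdp_hw_metadata or the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two nanosecond timestamps,
 * returned as a signed value so an "earlier" that is actually later
 * shows up as a negative delta.
 */
int64_t ts_delta_ns(uint64_t later_ns, uint64_t earlier_ns)
{
	return (int64_t)(later_ns - earlier_ns);
}

/* Hypothetical helper: nanoseconds to microseconds, for display. */
double ns_to_usec(int64_t ns)
{
	return (double)ns / 1000.0;
}
```

With the values from the log, HW TX-complete-time minus HW Launch-time is
16963 ns (16.963 us), and HW Launch-time minus HW RX-time is exactly
1000000000 ns, which matches the 1 s launch offset used in the test.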

Signed-off-by: Song Yoong Siang 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 1d86439b8a14..c80462d42989 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -106,6 +106,8 @@ struct stmmac_metadata_request {
struct stmmac_priv *priv;
struct dma_desc *tx_desc;
bool *set_ic;
+   struct dma_edesc *edesc;
+   int tbs;
 };
 
 struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c81ea8cdfe6e..3a083e3684ed 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2445,9 +2445,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
return 0;
 }
 
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+   struct stmmac_metadata_request *meta_req = _priv;
+   struct timespec64 ts = ns_to_timespec64(launch_time);
+
+   if (meta_req->tbs & STMMAC_TBS_EN)
+   stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+   ts.tv_nsec);
+}
+
 static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
.tmo_request_timestamp  = stmmac_xsk_request_timestamp,
.tmo_fill_timestamp = stmmac_xsk_fill_timestamp,
+   .tmo_request_launch_time= stmmac_xsk_request_launch_time,
 };
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2531,6 +2542,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, 
u32 queue, u32 budget)
meta_req.priv = priv;
meta_req.tx_desc = tx_desc;
meta_req.set_ic = &set_ic;
+   meta_req.tbs = tx_q->tbs;
+   meta_req.edesc = &tx_q->dma_entx[entry];
xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops,
&meta_req);
if (set_ic) {
-- 
2.34.1



[Intel-wired-lan] [PATCH bpf-next v4 1/4] xsk: Add launch time hardware offload support to XDP Tx metadata

2025-01-06 Thread Song Yoong Siang
Extend the XDP Tx metadata framework so that users can request launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev 
Signed-off-by: Song Yoong Siang 
---
 Documentation/netlink/specs/netdev.yaml  |  4 ++
 Documentation/networking/xsk-tx-metadata.rst | 64 
 include/net/xdp_sock.h   | 10 +++
 include/net/xdp_sock_drv.h   |  1 +
 include/uapi/linux/if_xdp.h  | 10 +++
 include/uapi/linux/netdev.h  |  3 +
 net/core/netdev-genl.c   |  2 +
 net/xdp/xsk.c|  3 +
 tools/include/uapi/linux/if_xdp.h| 10 +++
 tools/include/uapi/linux/netdev.h|  3 +
 10 files changed, 110 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml 
b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..e59c8a14f7d1 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
 name: tx-checksum
 doc:
   L3 checksum HW offload is supported by the driver.
+  -
+name: tx-launch-time
+doc:
+  Launch time HW offload is supported by the driver.
   -
 name: queue-type
 type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst 
b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..3cec089747ce 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
   checksum. ``csum_start`` specifies byte offset of where the checksumming
   should start and ``csum_offset`` specifies byte offset where the
   device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+  packet for transmission at a pre-determined time called launch time. The
+  value of launch time is indicated by ``launch_time`` field of
+  ``struct xsk_tx_metadata``.
 
 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
 is calculated on the CPU. Do not enable this option in production because
 it will negatively affect performance.
 
+Launch Time
+===========
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. In scenarios where the launch time offload feature is
+disabled, the device driver is expected to disregard the launch time
+request. For correct interpretation and meaningful operation, the launch
+time should never be set to a value larger than the farthest programmable
+time in the future (the horizon). Different devices have different hardware
+limitations on the launch time offload feature.
+
+stmmac driver
+-------------
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+The stmmac driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
+igc driver
+----------
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues open at all times.
+
+The value of the launch time that is programmed in the Advanced Transmit
+Context Descriptor is a relative offset t

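The stmmac paragraph in the documentation hunk above describes a 32-bit
launch time value: the top 8 bits hold seconds, the low 24 bits hold
nanoseconds in 256 ns units, the value wraps every 256 seconds, and the
horizon is 128 seconds. A rough user-space sketch of that arithmetic follows.
It is not the driver's code; the function and macro names are hypothetical,
and treating a launch time in the past as invalid is an assumption made here
for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define LT_HORIZON_SEC	128ULL	/* half of the 256 s wrap period */

/*
 * Hypothetical helper: pack an absolute launch time (ns) into the
 * 32-bit layout described above: bits [31:24] = seconds modulo 256,
 * bits [23:0] = nanoseconds in 256 ns increments.
 */
uint32_t launch_time_encode(uint64_t launch_ns)
{
	uint64_t sec = launch_ns / NSEC_PER_SEC;
	uint64_t nsec = launch_ns % NSEC_PER_SEC;

	return (uint32_t)((sec & 0xff) << 24) | (uint32_t)(nsec >> 8);
}

/*
 * Hypothetical helper: true if launch_ns is not in the past and is no
 * further than the 128 s horizon from now_ns (assumed policy).
 */
bool launch_time_within_horizon(uint64_t now_ns, uint64_t launch_ns)
{
	return launch_ns >= now_ns &&
	       launch_ns - now_ns <= LT_HORIZON_SEC * NSEC_PER_SEC;
}
```

Because only the low 8 bits of the seconds survive, a launch time of 256 s
encodes to the same value as 0 s, which is why requests beyond the horizon
cannot be interpreted meaningfully by the hardware.
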
Re: [Intel-wired-lan] [PATCH v2 RESEND net-next] e1000e: makes e1000_watchdog_task use queue_delayed_work

2025-01-06 Thread Lifshits, Vitaly




On 1/5/2025 1:38 PM, Dmitrii Ermakov wrote:

Replace the watchdog timer with delayed_work, as advised
in the driver's TODO comment.

Signed-off-by: Dmitrii Ermakov 
---
V1 -> V2: Removed redundant line wraps, renamed e1000_watchdog to 
e1000_watchdog_work

  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 +--
  drivers/net/ethernet/intel/e1000e/netdev.c | 42 --
  2 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index ba9c19e6994c..5a60372d2158 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -189,12 +189,12 @@ struct e1000_phy_regs {
  
  /* board specific private data structure */

  struct e1000_adapter {
-   struct timer_list watchdog_timer;
struct timer_list phy_info_timer;
struct timer_list blink_timer;
  
+	struct delayed_work watchdog_work;

+
struct work_struct reset_task;
-   struct work_struct watchdog_task;
  
  	const struct e1000_info *ei;
  
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c

index 286155efcedf..cb68662cdc3a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1778,7 +1778,7 @@ static irqreturn_t e1000_intr_msi(int __always_unused 
irq, void *data)
}
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
-   mod_timer(&adapter->watchdog_timer, jiffies + 1);
+   queue_delayed_work(system_wq, &adapter->watchdog_work, 
1);
}
  
  	/* Reset on uncorrectable ECC error */

@@ -1857,7 +1857,7 @@ static irqreturn_t e1000_intr(int __always_unused irq, 
void *data)
}
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
-   mod_timer(&adapter->watchdog_timer, jiffies + 1);
+   queue_delayed_work(system_wq, &adapter->watchdog_work, 
1);
}
  
  	/* Reset on uncorrectable ECC error */

@@ -1901,7 +1901,7 @@ static irqreturn_t e1000_msix_other(int __always_unused 
irq, void *data)
hw->mac.get_link_status = true;
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
-   mod_timer(&adapter->watchdog_timer, jiffies + 1);
+   queue_delayed_work(system_wq, &adapter->watchdog_work, 
1);
}
  
  	if (!test_bit(__E1000_DOWN, &adapter->state))

@@ -4287,7 +4287,8 @@ void e1000e_down(struct e1000_adapter *adapter, bool 
reset)
  
  	napi_synchronize(&adapter->napi);
  
-	del_timer_sync(&adapter->watchdog_timer);

+   cancel_delayed_work_sync(&adapter->watchdog_work);
+
del_timer_sync(&adapter->phy_info_timer);
  
  	spin_lock(&adapter->stats64_lock);

@@ -5169,25 +5170,12 @@ static void e1000e_check_82574_phy_workaround(struct 
e1000_adapter *adapter)
}
  }
  
-/**

- * e1000_watchdog - Timer Call-back
- * @t: pointer to timer_list containing private info adapter
- **/
-static void e1000_watchdog(struct timer_list *t)
+static void e1000_watchdog_work(struct work_struct *work)
  {
-   struct e1000_adapter *adapter = from_timer(adapter, t, watchdog_timer);
-
-   /* Do the rest outside of interrupt context */
-   schedule_work(&adapter->watchdog_task);
-
-   /* TODO: make this use queue_delayed_work() */
-}
-
-static void e1000_watchdog_task(struct work_struct *work)
-{
-   struct e1000_adapter *adapter = container_of(work,
-struct e1000_adapter,
-watchdog_task);
+   struct delayed_work *dwork =
+   container_of(work, struct delayed_work, work);
+   struct e1000_adapter *adapter =
+   container_of(dwork, struct e1000_adapter, watchdog_work);
struct net_device *netdev = adapter->netdev;
struct e1000_mac_info *mac = &adapter->hw.mac;
struct e1000_phy_info *phy = &adapter->hw.phy;
@@ -5416,8 +5404,8 @@ static void e1000_watchdog_task(struct work_struct *work)
  
  	/* Reset the timer */

if (!test_bit(__E1000_DOWN, &adapter->state))
-   mod_timer(&adapter->watchdog_timer,
- round_jiffies(jiffies + 2 * HZ));
+   queue_delayed_work(system_wq, &adapter->watchdog_work,
+  round_jiffies(2 * HZ));
  }
  
  #define E1000_TX_FLAGS_CSUM		0x0001

@@ -7596,11 +7584,10 @@ static int e1000_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_eeprom;
}
  
-	timer_setup(&adapter->watchdog_timer, e1000_watchdog, 0);

timer_setup(&adapter->phy_info_timer, e1000_upd

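The new e1000_watchdog_work() in the patch above recovers the adapter from
the work_struct pointer in two container_of() steps: first the enclosing
delayed_work, then the enclosing adapter. The following user-space mock-up
illustrates only that pointer arithmetic. The struct definitions are
simplified stand-ins, not the kernel's, and mock_adapter and
adapter_from_work() are hypothetical names.

```c
#include <stddef.h>

/* Simplified user-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the kernel types; only the embedding matters here. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; unsigned long delay; };
struct mock_adapter { int id; struct delayed_work watchdog_work; };

/* Walk from the innermost member back out to the owning adapter. */
struct mock_adapter *adapter_from_work(struct work_struct *work)
{
	struct delayed_work *dwork =
		container_of(work, struct delayed_work, work);

	return container_of(dwork, struct mock_adapter, watchdog_work);
}
```

In kernel code the first step is commonly written with the to_delayed_work()
helper from linux/workqueue.h, which performs the same container_of().
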
Re: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix transaction timeouts on reset

2025-01-06 Thread Simon Horman
On Thu, Dec 19, 2024 at 06:09:32PM -0800, Emil Tantilov wrote:
> Restore the call to idpf_vc_xn_shutdown() at the beginning of
> idpf_vc_core_deinit() provided the function is not called on remove.
> In the reset path the mailbox is destroyed, leading to all transactions
> timing out.
> 
> Fixes: 09d0fb5cb30e ("idpf: deinit virtchnl transaction manager after vport 
> and vectors")
> Reviewed-by: Larysa Zaremba 
> Signed-off-by: Emil Tantilov 
> ---
> Changelog:
> v2:
> - Assigned the current state of REMOVE_IN_PROG flag to a boolean
>   variable, to be checked instead of reading the flag twice.
> - Updated the description to clarify the reason for the timeouts on
>   reset is due to the mailbox being destroyed.
> 
> v1:
> https://lore.kernel.org/intel-wired-lan/20241218014417.3786-1-emil.s.tanti...@intel.com/
> 
> Testing hints:
> echo 1 > /sys/class/net//device/reset

Thanks for the update,

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [PATCH iwl-next v4] e1000e: Fix real-time violations on link up

2025-01-06 Thread Simon Horman
On Thu, Dec 19, 2024 at 08:27:43PM +0100, Gerhard Engleder wrote:
> From: Gerhard Engleder 
> 
> Link down and up triggers update of MTA table. This update executes many
> PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer
> from delay in the range of 50us. This results in timing violations on
> real-time systems during link down and up of e1000e in combination with
> an Intel i3-2310E Sandy Bridge CPU.
> 
> The i3-2310E is quite old. It was launched by Intel in 2011 but is still in
> use as a robot controller. The exact root cause of the problem is unclear,
> and this situation won't change, as Intel support for this CPU ended
> years ago. Our experience is that the number of posted PCIe writes needs
> to be limited at least for real-time systems. With posted PCIe writes a
> much higher throughput can be generated than with PCIe reads which
> cannot be posted. Thus, the load on the interconnect is much higher.
> Additionally, a PCIe read waits until all posted PCIe writes are done.
> Therefore, the PCIe read can block the CPU for much more than 10us if a
> lot of PCIe writes were posted before. Both issues are the reason why we
> are limiting the number of posted PCIe writes in a row in general for our
> real-time systems, not only for this driver.
> 
> A flush after a low enough number of posted PCIe writes eliminates the
> delay but also increases the time needed for MTA table update. The
> following measurements were done on i3-2310E with e1000e for 128 MTA
> table entries:
> 
> Single flush after all writes: 106us
> Flush after every write:   429us
> Flush after every 2nd write:   266us
> Flush after every 4th write:   180us
> Flush after every 8th write:   141us
> Flush after every 16th write:  121us
> 
> A flush after every 8th write delays the link up by 35us and the
> negative impact to DMA transfers of other targets is still tolerable.
> 
> Execute a flush after every 8th write. This prevents overloading the
> interconnect with posted writes.
> 
> Signed-off-by: Gerhard Engleder 
> Link: 
> https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0...@lunn.ch/T/
> CC: Vitaly Lifshits 
> Reviewed-by: Przemek Kitszel 
> Tested-by: Avigail Dahan 
> ---
> v4:
> - add PREEMPT_RT dependency again (Vitaly Lifshits)
> - fix comment style (Alexander Lobakin)
> - add to comment each 8th and explain why (Alexander Lobakin)
> - simplify check for every 8th write (Alexander Lobakin)
> 
> v3:
> - mention problematic platform explicitly (Bjorn Helgaas)
> - improve comment (Paul Menzel)
> 
> v2:
> - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel)

Reviewed-by: Simon Horman 
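
For readers following along, the batching idea in the quoted commit message
(flush after every 8th posted write instead of once at the end) can be
modelled in user space as below. This is a counting sketch only, not the
e1000e code: update_table() and struct flush_stats are hypothetical, and the
"final flush if the last batch was not flushed" rule is an assumption of
this example.

```c
#include <stddef.h>

struct flush_stats {
	size_t writes;	/* posted writes issued */
	size_t flushes;	/* flushes issued */
};

/*
 * Model a table update of `entries` posted writes with a flush after
 * every `flush_every` writes. flush_every == 0 means "single flush
 * after all writes".
 */
struct flush_stats update_table(size_t entries, size_t flush_every)
{
	struct flush_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < entries; i++) {
		st.writes++;		/* stands in for one posted write */
		if (flush_every && (i + 1) % flush_every == 0)
			st.flushes++;	/* stands in for a read-back flush */
	}

	/* Flush whatever is still pending after the loop. */
	if (entries && (!flush_every || entries % flush_every != 0))
		st.flushes++;

	return st;
}
```

For the 128 MTA entries in the measurements, a flush interval of 8 yields 16
flushes, so at most 8 writes are ever queued behind a read. Per the quoted
table, that costs 141us - 106us = 35us over the single-flush case.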



[Intel-wired-lan] [PATCH net-next v6 0/8] fix two bugs related to page_pool

2025-01-06 Thread Yunsheng Lin
This patchset fixes a possible time window problem for page_pool and
the DMA API misuse problem mentioned in [1], and tries to limit the
overhead of the fixes with some optimizations.

From the performance data below, the overhead is not obvious
due to performance variations for time_bench_page_pool01_fast_path()
and time_bench_page_pool02_ptr_ring(), and there is about 20ns overhead
for time_bench_page_pool03_slow() for fixing the bug.

Before this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[  323.367627] bench_page_pool_simple: Loaded
[  323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns 
(step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - 
(invoke count:1 tsc_interval:7699707)
[  324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns 
(step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - 
(invoke count:1 tsc_interval:134685507)
[  324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) 
- (measurement period time:0.150101270 sec time_interval:150101270) - (invoke 
count:1000 tsc_interval:15010120)
[  325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - 
(measurement period time:0.654213000 sec time_interval:654213000) - (invoke 
count:1 tsc_interval:65421294)
[  325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): 
Cannot use page_pool fast-path
[  325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 
29.633 ns (step:0) - (measurement period time:0.296338200 sec 
time_interval:296338200) - (invoke count:1000 tsc_interval:29633814)
[  325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): 
Cannot use page_pool fast-path
[  326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 
57.391 ns (step:0) - (measurement period time:0.573911820 sec 
time_interval:573911820) - (invoke count:1000 tsc_interval:57391174)
[  326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot 
use page_pool fast-path
[  328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 
181.849 ns (step:0) - (measurement period time:1.818495880 sec 
time_interval:1818495880) - (invoke count:1000 tsc_interval:181849581)
[  328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq 
fast-path
[  328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): 
in_serving_softirq fast-path
[  328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 
cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec 
time_interval:296327910) - (invoke count:1000 tsc_interval:29632785)
[  328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): 
in_serving_softirq fast-path
[  329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 
cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec 
time_interval:795236560) - (invoke count:1000 tsc_interval:79523650)
[  329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): 
in_serving_softirq fast-path
[  331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 
cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec 
time_interval:1901047510) - (invoke count:1000 tsc_interval:190104743)

After this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[  138.634758] bench_page_pool_simple: Loaded
[  138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns 
(step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - 
(invoke count:1 tsc_interval:7697265)
[  140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns 
(step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - 
(invoke count:1 tsc_interval:134673531)
[  140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) 
- (measurement period time:0.150055080 sec time_interval:150055080) - (invoke 
count:1000 tsc_interval:15005497)
[  140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - 
(measurement period time:0.654125000 sec time_interval:654125000) - (invoke 
count:1 tsc_interval:65412493)
[  140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): 
Cannot use page_pool fast-path
[  141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 
30.159 ns (step:0) - (measurement period time:0.301598160 sec 
time_interval:301598160) - (invoke count:1000 tsc_interval:30159812)
[  141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): 
Cannot use page_pool fast-path
[  141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 
70.140 ns (step:0) - (measurement period time:0.701405780 sec 
time_interval:701405780) - (invoke count:1000 tsc_interval:70140573)
[  141.994933] bench_page_pool_simple: time_

[Intel-wired-lan] [PATCH net-next v6 1/8] page_pool: introduce page_pool_get_pp() API

2025-01-06 Thread Yunsheng Lin
Introduce the page_pool_get_pp() API so that callers avoid accessing
page->pp directly.
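
The general pattern is a small accessor that hides the field, so the
representation behind it can change later without touching every driver.
The sketch below shows that pattern with mock types only. It is not the
helper added to include/net/page_pool/helpers.h, and mock_page,
mock_pool, mock_page_get_pp() and mock_headroom() are hypothetical names.

```c
/* Mock stand-ins; the real struct page and struct page_pool differ. */
struct mock_pool { unsigned int offset; };
struct mock_page { struct mock_pool *pp; };

/* Accessor: the only place that knows where the pool pointer lives. */
struct mock_pool *mock_page_get_pp(const struct mock_page *page)
{
	return page->pp;
}

/* Caller side: goes through the accessor instead of page->pp. */
unsigned int mock_headroom(const struct mock_page *page)
{
	return mock_page_get_pp(page)->offset;
}
```

The driver conversions in the diff below follow the same shape, for example
replacing page->pp with page_pool_get_pp(page) in the page_pool_put_page()
calls.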

Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/freescale/fec_main.c  |  8 +---
 .../net/ethernet/google/gve/gve_buffer_mgmt_dqo.c  |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c|  6 --
 drivers/net/ethernet/intel/idpf/idpf_txrx.c| 14 +-
 drivers/net/ethernet/intel/libeth/rx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |  3 ++-
 drivers/net/netdevsim/netdev.c |  6 --
 drivers/net/wireless/mediatek/mt76/mt76.h  |  2 +-
 include/net/libeth/rx.h|  3 ++-
 include/net/page_pool/helpers.h|  5 +
 10 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index b2daed55bf6c..18d2119dbec1 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1009,7 +1009,8 @@ static void fec_enet_bd_init(struct net_device *dev)
struct page *page = txq->tx_buf[i].buf_p;
 
if (page)
-   page_pool_put_page(page->pp, page, 0, 
false);
+   
page_pool_put_page(page_pool_get_pp(page),
+  page, 0, false);
}
 
txq->tx_buf[i].buf_p = NULL;
@@ -1549,7 +1550,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, 
int budget)
xdp_return_frame_rx_napi(xdpf);
} else { /* recycle pages of XDP_TX frames */
/* The dma_sync_size = 0 as XDP_TX has already synced 
DMA for_device */
-   page_pool_put_page(page->pp, page, 0, true);
+   page_pool_put_page(page_pool_get_pp(page), page, 0, 
true);
}
 
txq->tx_buf[index].buf_p = NULL;
@@ -3307,7 +3308,8 @@ static void fec_enet_free_buffers(struct net_device *ndev)
} else {
struct page *page = txq->tx_buf[i].buf_p;
 
-   page_pool_put_page(page->pp, page, 0, false);
+   page_pool_put_page(page_pool_get_pp(page),
+  page, 0, false);
}
 
txq->tx_buf[i].buf_p = NULL;
diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c 
b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 403f0f335ba6..87422b8828ff 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -210,7 +210,7 @@ void gve_free_to_page_pool(struct gve_rx_ring *rx,
if (!page)
return;
 
-   page_pool_put_full_page(page->pp, page, allow_direct);
+   page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct);
buf_state->page_info.page = NULL;
 }
 
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c 
b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 26b424fd6718..e1bf5554f6e3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1050,7 +1050,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 const struct libeth_fqe *rx_buffer,
 unsigned int size)
 {
-   u32 hr = rx_buffer->page->pp->p.offset;
+   struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+   u32 hr = pool->p.offset;
 
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->offset + hr, size, rx_buffer->truesize);
@@ -1067,7 +1068,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
  unsigned int size)
 {
-   u32 hr = rx_buffer->page->pp->p.offset;
+   struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+   u32 hr = pool->p.offset;
struct sk_buff *skb;
void *va;
 
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c 
b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 2fa9c36e33c9..04f2347716ca 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -385,7 +385,8 @@ static void idpf_rx_page_rel(struct libeth_fqe *rx_buf)
if (unlikely(!rx_buf->page))
return;
 
-   page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
+   page_pool_put_full_page(page_pool_get_pp(rx_buf->page), rx_buf->page,
+   false);
 
rx_buf->page = NULL;
rx_buf->offset = 0;
@@ -3098,7 +3099,8 @@ idpf_rx_process_skb_fields(struct idpf_rx_queu

Re: [Intel-wired-lan] [RFC net-next] ixgbevf: Remove unused ixgbevf_hv_mbx_ops

2025-01-06 Thread Simon Horman
On Thu, Dec 26, 2024 at 02:09:23PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The const struct ixgbevf_hv_mbx_ops was added in 2016 as part of
> commit c6d45171d706 ("ixgbevf: Support Windows hosts (Hyper-V)")
> 
> but has remained unused.
> 
> The functions it references are still referenced elsewhere.
> 
> Remove it.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 1/3] igc: Remove unused igc_acquire/release_nvm

2025-01-06 Thread Simon Horman
On Thu, Dec 26, 2024 at 04:52:13PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> igc_acquire_nvm() and igc_release_nvm() were added in 2018 as part of
> commit ab4056126813 ("igc: Add NVM support")
> 
> but never used.
> 
> Remove them.
> 
> The igc_i225.c has its own specific implementations.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 2/3] igc: Remove unused igc_read/write_pci_cfg wrappers

2025-01-06 Thread Simon Horman
On Thu, Dec 26, 2024 at 04:52:14PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> igc_read_pci_cfg() and igc_write_pci_cfg() were added in 2018 as part of
> commit 146740f9abc4 ("igc: Add support for PF")
> but have remained unused.
> 
> Remove them.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman 



Re: [Intel-wired-lan] [RFC net-next 3/3] igc: Remove unused igc_read/write_pcie_cap_reg

2025-01-06 Thread Simon Horman
On Thu, Dec 26, 2024 at 04:52:15PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The last uses of igc_read_pcie_cap_reg() and igc_write_pcie_cap_reg()
> were removed in 2019 by
> commit 16ecd8d9af26 ("igc: Remove the obsolete workaround")
> 
> Remove them.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Simon Horman