date:20250206

[Intel-wired-lan] [PATCH iwl-next v1 1/3] ice: Add sync delay for E825C

2025-02-06 Thread Grzegorz Nitka

From: Karol Kolacinski 

Implement setting GLTSYN_SYNC_DLAY for E825C products.
This is the execution delay compensation of SYNC command between
PHC and PHY.
Also, refactor the code by changing ice_ptp_init_phc_eth56g function
name to ice_ptp_init_phc_e825, to be consistent with the naming pattern
for other devices.

Reviewed-by: Przemek Kitszel 
Signed-off-by: Karol Kolacinski 
Signed-off-by: Grzegorz Nitka 
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 14 +++---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h |  3 +++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c 
b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 3e824f7b30c0..c3dea6d61de4 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -2650,18 +2650,18 @@ static void ice_sb_access_ena_eth56g(struct ice_hw *hw, 
bool enable)
 }
 
 /**
- * ice_ptp_init_phc_eth56g - Perform E82X specific PHC initialization
+ * ice_ptp_init_phc_e825 - Perform E825 specific PHC initialization
  * @hw: pointer to HW struct
  *
- * Perform PHC initialization steps specific to E82X devices.
+ * Perform E825-specific PTP hardware clock initialization steps.
  *
- * Return:
- * * %0 - success
- * * %other - failed to initialize CGU
+ * Return: 0 on success, negative error code otherwise.
  */
-static int ice_ptp_init_phc_eth56g(struct ice_hw *hw)
+static int ice_ptp_init_phc_e825(struct ice_hw *hw)
 {
ice_sb_access_ena_eth56g(hw, true);
+   ice_ptp_cfg_sync_delay(hw, ICE_E825_SYNC_DELAY);
+
/* Initialize the Clock Generation Unit */
return ice_init_cgu_e82x(hw);
 }
@@ -6123,7 +6123,7 @@ int ice_ptp_init_phc(struct ice_hw *hw)
case ICE_MAC_GENERIC:
return ice_ptp_init_phc_e82x(hw);
case ICE_MAC_GENERIC_3K_E825:
-   return ice_ptp_init_phc_eth56g(hw);
+   return ice_ptp_init_phc_e825(hw);
default:
return -EOPNOTSUPP;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h 
b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 8442d1d60351..10295fa9a383 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -324,7 +324,10 @@ extern const struct ice_vernier_info_e82x 
e822_vernier[NUM_ICE_PTP_LNK_SPD];
  */
 #define ICE_E810_PLL_FREQ  81250
 #define ICE_PTP_NOMINAL_INCVAL_E8100x13b13b13bULL
+
+/* PHC to PHY synchronization delay definitions */
 #define ICE_E810_E830_SYNC_DELAY   0
+#define ICE_E825_SYNC_DELAY6
 
 /* Device agnostic functions */
 u8 ice_get_ptp_src_clock_index(struct ice_hw *hw);
-- 
2.39.3

[Intel-wired-lan] [PATCH iwl-next v1 2/3] ice: Refactor E825C PHY registers info struct

2025-02-06 Thread Grzegorz Nitka

From: Karol Kolacinski 

Simplify ice_phy_reg_info_eth56g struct definition to include base
address for the very first quad. Use base address info and 'step'
value to determine address for specific PHY quad.

Reviewed-by: Przemek Kitszel 
Signed-off-by: Karol Kolacinski 
Signed-off-by: Grzegorz Nitka 
---
 .../net/ethernet/intel/ice/ice_ptp_consts.h   | 75 ---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   |  6 +-
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   |  2 +-
 3 files changed, 19 insertions(+), 64 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h 
b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
index ac46d1183300..003cdfada3ca 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
@@ -10,70 +10,25 @@
 /* Constants defined for the PTP 1588 clock hardware. */
 
 const struct ice_phy_reg_info_eth56g eth56g_phy_res[NUM_ETH56G_PHY_RES] = {
-   /* ETH56G_PHY_REG_PTP */
-   {
-   /* base_addr */
-   {
-   0x092000,
-   0x126000,
-   0x1BA000,
-   0x24E000,
-   0x2E2000,
-   },
-   /* step */
-   0x98,
+   [ETH56G_PHY_REG_PTP] = {
+   .base_addr = 0x092000,
+   .step = 0x98,
},
-   /* ETH56G_PHY_MEM_PTP */
-   {
-   /* base_addr */
-   {
-   0x093000,
-   0x127000,
-   0x1BB000,
-   0x24F000,
-   0x2E3000,
-   },
-   /* step */
-   0x200,
+   [ETH56G_PHY_MEM_PTP] = {
+   .base_addr = 0x093000,
+   .step = 0x200,
},
-   /* ETH56G_PHY_REG_XPCS */
-   {
-   /* base_addr */
-   {
-   0x00,
-   0x009400,
-   0x128000,
-   0x1BC000,
-   0x25,
-   },
-   /* step */
-   0x21000,
+   [ETH56G_PHY_REG_XPCS] = {
+   .base_addr = 0x00,
+   .step = 0x21000,
},
-   /* ETH56G_PHY_REG_MAC */
-   {
-   /* base_addr */
-   {
-   0x085000,
-   0x119000,
-   0x1AD000,
-   0x241000,
-   0x2D5000,
-   },
-   /* step */
-   0x1000,
+   [ETH56G_PHY_REG_MAC] = {
+   .base_addr = 0x085000,
+   .step = 0x1000,
},
-   /* ETH56G_PHY_REG_GPCS */
-   {
-   /* base_addr */
-   {
-   0x084000,
-   0x118000,
-   0x1AC000,
-   0x24,
-   0x2D4000,
-   },
-   /* step */
-   0x400,
+   [ETH56G_PHY_REG_GPCS] = {
+   .base_addr = 0x084000,
+   .step = 0x400,
},
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c 
b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index c3dea6d61de4..5c61bc3a2c25 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -1010,7 +1010,7 @@ static int ice_phy_res_address_eth56g(struct ice_hw *hw, 
u8 lane,
 
/* Lanes 4..7 are in fact 0..3 on a second PHY */
lane %= hw->ptp.ports_per_phy;
-   *addr = eth56g_phy_res[res_type].base[0] +
+   *addr = eth56g_phy_res[res_type].base_addr +
lane * eth56g_phy_res[res_type].step + offset;
 
return 0;
@@ -1240,7 +1240,7 @@ static int ice_write_quad_ptp_reg_eth56g(struct ice_hw 
*hw, u8 port,
if (port >= hw->ptp.num_lports)
return -EIO;
 
-   addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base[0] + offset;
+   addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base_addr + offset;
 
return ice_write_phy_eth56g(hw, port, addr, val);
 }
@@ -1265,7 +1265,7 @@ static int ice_read_quad_ptp_reg_eth56g(struct ice_hw 
*hw, u8 port,
if (port >= hw->ptp.num_lports)
return -EIO;
 
-   addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base[0] + offset;
+   addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base_addr + offset;
 
return ice_read_phy_eth56g(hw, port, addr, val);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h 
b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 10295fa9a383..63a63ef64aaa 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -72,7 +72,7 @@ enum ice_eth56g_link_spd {
  * ETH56G devices
  */
 struct ice_phy_reg_info_eth56g {
-   u32 base[NUM_ETH56G_PHY_RES];
+   u32 base_addr;
u32 step;
 };
 
-- 
2.39.3

[Intel-wired-lan] [PATCH iwl-next v1 3/3] ice: E825C PHY register cleanup

2025-02-06 Thread Grzegorz Nitka

From: Karol Kolacinski 

Minor PTP register refactor, including logical grouping E825C 1-step
timestamping registers. Remove unused register definitions
(PHY_REG_GPCS_BITSLIP, PHY_REG_REVISION).
Also, apply preferred GENMASK macro (instead of ICE_M) for register
fields definition affected by this patch.

Reviewed-by: Przemek Kitszel 
Signed-off-by: Karol Kolacinski 
Signed-off-by: Grzegorz Nitka 
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 31 ++---
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h 
b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 63a63ef64aaa..6ca1561ec5e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -783,36 +783,19 @@ static inline bool ice_is_dual(struct ice_hw *hw)
 #define PHY_MAC_XIF_TS_SFD_ENA_M   ICE_M(0x1, 20)
 #define PHY_MAC_XIF_GMII_TS_SEL_M  ICE_M(0x1, 21)
 
-/* GPCS config register */
-#define PHY_GPCS_CONFIG_REG0   0x268
-#define PHY_GPCS_CONFIG_REG0_TX_THR_M  ICE_M(0xF, 24)
-#define PHY_GPCS_BITSLIP   0x5C
-
 #define PHY_TS_INT_CONFIG_THRESHOLD_M  ICE_M(0x3F, 0)
 #define PHY_TS_INT_CONFIG_ENA_MBIT(6)
 
-/* 1-step PTP config */
-#define PHY_PTP_1STEP_CONFIG   0x270
-#define PHY_PTP_1STEP_T1S_UP64_M   ICE_M(0xF, 4)
-#define PHY_PTP_1STEP_T1S_DELTA_M  ICE_M(0xF, 8)
-#define PHY_PTP_1STEP_PEER_DELAY(_port)(0x274 + 4 * (_port))
-#define PHY_PTP_1STEP_PD_ADD_PD_M  ICE_M(0x1, 0)
-#define PHY_PTP_1STEP_PD_DELAY_M   ICE_M(0x3fff, 1)
-#define PHY_PTP_1STEP_PD_DLY_V_M   ICE_M(0x1, 31)
-
 /* Macros to derive offsets for TimeStampLow and TimeStampHigh */
 #define PHY_TSTAMP_L(x) (((x) * 8) + 0)
 #define PHY_TSTAMP_U(x) (((x) * 8) + 4)
 
-#define PHY_REG_REVISION   0x85000
-
 #define PHY_REG_DESKEW_0   0x94
 #define PHY_REG_DESKEW_0_RLEVELGENMASK(6, 0)
 #define PHY_REG_DESKEW_0_RLEVEL_FRAC   GENMASK(9, 7)
 #define PHY_REG_DESKEW_0_RLEVEL_FRAC_W 3
 #define PHY_REG_DESKEW_0_VALID GENMASK(10, 10)
 
-#define PHY_REG_GPCS_BITSLIP   0x5C
 #define PHY_REG_SD_BIT_SLIP(_port_offset)  (0x29C + 4 * (_port_offset))
 #define PHY_REVISION_ETH56G0x10200
 #define PHY_VENDOR_TXLANE_THRESH   0x2000C
@@ -832,7 +815,21 @@ static inline bool ice_is_dual(struct ice_hw *hw)
 #define PHY_MAC_BLOCKTIME  0x50
 #define PHY_MAC_MARKERTIME 0x54
 #define PHY_MAC_TX_OFFSET  0x58
+#define PHY_GPCS_BITSLIP   0x5C
 
 #define PHY_PTP_INT_STATUS 0x7FD140
 
+/* ETH56G registers shared per quad */
+/* GPCS config register */
+#define PHY_GPCS_CONFIG_REG0   0x268
+#define PHY_GPCS_CONFIG_REG0_TX_THR_M  GENMASK(27, 24)
+/* 1-step PTP config */
+#define PHY_PTP_1STEP_CONFIG   0x270
+#define PHY_PTP_1STEP_T1S_UP64_M   GENMASK(7, 4)
+#define PHY_PTP_1STEP_T1S_DELTA_M  GENMASK(11, 8)
+#define PHY_PTP_1STEP_PEER_DELAY(_quad_lane)   (0x274 + 4 * (_quad_lane))
+#define PHY_PTP_1STEP_PD_ADD_PD_M  BIT(0)
+#define PHY_PTP_1STEP_PD_DELAY_M   GENMASK(30, 1)
+#define PHY_PTP_1STEP_PD_DLY_V_M   BIT(31)
+
 #endif /* _ICE_PTP_HW_H_ */
-- 
2.39.3

[Intel-wired-lan] [PATCH iwl-next v1 0/3] E825C PTP cleanup

2025-02-06 Thread Grzegorz Nitka

This patch series simplifies PTP code related to E825C products by
simplifying PHY register info definition.
Cleanup the code by removing unused register definitions.
Also, add sync delay compensation between PHC and PHY for E825C.

Karol Kolacinski (3):
  ice: Add sync delay for E825C
  ice: Refactor E825C PHY registers info struct
  ice: E825C PHY register cleanup

 .../net/ethernet/intel/ice/ice_ptp_consts.h   | 75 ---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 20 ++---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   | 36 -
 3 files changed, 43 insertions(+), 88 deletions(-)


base-commit: 70bdf16570c2c207a562e996833ff196a4bd7029
-- 
2.39.3

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

2025-02-06 Thread Leon Romanovsky

On Thu, Feb 06, 2025 at 02:16:08PM +0530, Bharat Bhushan wrote:
> Hi Leon,
> 
> On Wed, Feb 5, 2025 at 11:50 PM Leon Romanovsky  wrote:
> >
> > From: Leon Romanovsky 
> >
> > XFRM offload path is probed even if offload isn't needed at all. Let's
> > make sure that x->type_offload pointer stays NULL for such path to
> > reduce ambiguity.
> >
> > Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.")
> > Signed-off-by: Leon Romanovsky 
> > ---
> >  include/net/xfrm.h | 12 +++-
> >  net/xfrm/xfrm_device.c | 14 +-
> >  net/xfrm/xfrm_state.c  | 22 +-
> >  net/xfrm/xfrm_user.c   |  2 +-
> >  4 files changed, 30 insertions(+), 20 deletions(-)

<...>

> > +   x->type_offload = xfrm_get_type_offload(x->id.proto, 
> > x->props.family);
> > +   if (!x->type_offload) {

<...>

> > +   xfrm_put_type_offload(x->type_offload);
> > +   x->type_offload = NULL;
> 
> We always set type_offload to NULL. Can we move type_offload = NULL in
> xfrm_put_type_offload() ?

We can, but it will require change to xfrm_get_type_offload() too,
otherwise we will get asymmetrical get/put.

Do you want something like that?
int xfrm_get_type_offload(struct xfrm_state *x);
void xfrm_put_type_offload(struct xfrm_state *x);

Thansk

> 
> Thanks
> -Bharat
> 
> > /* User explicitly requested packet offload mode and 
> > configured
> >  * policy in addition to the XFRM state. So be civil to 
> > users,
> >  * and return an error instead of taking fallback path.
> > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > index ad2202fa82f3..568fe8df7741 100644
> > --- a/net/xfrm/xfrm_state.c
> > +++ b/net/xfrm/xfrm_state.c
> > @@ -424,11 +424,12 @@ void xfrm_unregister_type_offload(const struct 
> > xfrm_type_offload *type,
> >  }
> >  EXPORT_SYMBOL(xfrm_unregister_type_offload);
> >
> > -static const struct xfrm_type_offload *
> > -xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
> > +const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto,
> > + unsigned short family)
> >  {
> > const struct xfrm_type_offload *type = NULL;
> > struct xfrm_state_afinfo *afinfo;
> > +   bool try_load = true;
> >
> >  retry:
> > afinfo = xfrm_state_get_afinfo(family);
> > @@ -456,11 +457,7 @@ xfrm_get_type_offload(u8 proto, unsigned short family, 
> > bool try_load)
> >
> > return type;
> >  }
> > -
> > -static void xfrm_put_type_offload(const struct xfrm_type_offload *type)
> > -{
> > -   module_put(type->owner);
> > -}
> > +EXPORT_SYMBOL(xfrm_get_type_offload);
> >
> >  static const struct xfrm_mode xfrm4_mode_map[XFRM_MODE_MAX] = {
> > [XFRM_MODE_BEET] = {
> > @@ -609,8 +606,6 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
> > kfree(x->coaddr);
> > kfree(x->replay_esn);
> > kfree(x->preplay_esn);
> > -   if (x->type_offload)
> > -   xfrm_put_type_offload(x->type_offload);
> > if (x->type) {
> > x->type->destructor(x);
> > xfrm_put_type(x->type);
> > @@ -784,6 +779,9 @@ void xfrm_dev_state_free(struct xfrm_state *x)
> > struct xfrm_dev_offload *xso = &x->xso;
> > struct net_device *dev = READ_ONCE(xso->dev);
> >
> > +   xfrm_put_type_offload(x->type_offload);
> > +   x->type_offload = NULL;
> > +
> > if (dev && dev->xfrmdev_ops) {
> > spin_lock_bh(&xfrm_state_dev_gc_lock);
> > if (!hlist_unhashed(&x->dev_gclist))
> > @@ -3122,7 +3120,7 @@ u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
> >  }
> >  EXPORT_SYMBOL_GPL(xfrm_state_mtu);
> >
> > -int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
> > +int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
> >   struct netlink_ext_ack *extack)
> >  {
> > const struct xfrm_mode *inner_mode;
> > @@ -3178,8 +3176,6 @@ int __xfrm_init_state(struct xfrm_state *x, bool 
> > init_replay, bool offload,
> > goto error;
> > }
> >
> > -   x->type_offload = xfrm_get_type_offload(x->id.proto, family, 
> > offload);
> > -
> > err = x->type->init_state(x, extack);
> > if (err)
> > goto error;
> > @@ -3229,7 +3225,7 @@ int xfrm_init_state(struct xfrm_state *x)
> >  {
> > int err;
> >
> > -   err = __xfrm_init_state(x, true, false, NULL);
> > +   err = __xfrm_init_state(x, true, NULL);
> > if (!err)
> > x->km.state = XFRM_STATE_VALID;
> >
> > diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> > index 08c6d6f0179f..82a768500999 100644
> > --- a/net/xfrm/xfrm_user.c
> > +++ b/net/xfrm/xfrm_user.c
> > @@ -907,7 +907,7 @@ static struct xfrm_state *xfrm_state_construct(struct 
> > net *net,
> > goto error;
> > }
> >
> > -   err = __x

Re: [Intel-wired-lan] [PATCH bpf-next v9 4/5] igc: Refactor empty frame insertion for launch time support

2025-02-06 Thread Maciej Fijalkowski

On Thu, Feb 06, 2025 at 02:04:07PM +0800, Song Yoong Siang wrote:
> Refactor the code for inserting an empty frame into a new function
> igc_insert_empty_frame(). This change extracts the logic for inserting
> an empty packet from igc_xmit_frame_ring() into a separate function,
> allowing it to be reused in future implementations, such as the XDP
> zero copy transmit function.
> 
> Remove the igc_desc_unused() checking in igc_init_tx_empty_descriptor()
> because the number of descriptors needed is guaranteed.
> 
> Ensure that skb allocation and DMA mapping work for the empty frame,
> before proceeding to fill in igc_tx_buffer info, context descriptor,
> and data descriptor.
> 
> Rate limit the error messages for skb allocation and DMA mapping failures.
> 
> Update the comment to indicate that the 2 descriptors needed by the empty
> frame are already taken into consideration in igc_xmit_frame_ring().
> 
> Handle the case where the insertion of an empty frame fails and explain
> the reason behind this handling.
> 
> Signed-off-by: Song Yoong Siang 

Reviewed-by: Maciej Fijalkowski 

with one nit below.

> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 84 ++-
>  1 file changed, 52 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
> b/drivers/net/ethernet/intel/igc/igc_main.c
> index 84307bb7313e..3df608601a4b 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1092,7 +1092,9 @@ static int igc_init_empty_frame(struct igc_ring *ring,
>  
>   dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
>   if (dma_mapping_error(ring->dev, dma)) {
> - netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
> + if (net_ratelimit())
> + netdev_err(ring->netdev,
> +"DMA mapping error for empty frame\n");

This is open-coded net_err_ratelimited(), no?

>   return -ENOMEM;
>   }
>  
> @@ -1108,20 +1110,12 @@ static int igc_init_empty_frame(struct igc_ring *ring,
>   return 0;
>  }
>  
> -static int igc_init_tx_empty_descriptor(struct igc_ring *ring,
> - struct sk_buff *skb,
> - struct igc_tx_buffer *first)
> +static void igc_init_tx_empty_descriptor(struct igc_ring *ring,
> +  struct sk_buff *skb,
> +  struct igc_tx_buffer *first)
>  {
>   union igc_adv_tx_desc *desc;
>   u32 cmd_type, olinfo_status;
> - int err;
> -
> - if (!igc_desc_unused(ring))
> - return -EBUSY;
> -
> - err = igc_init_empty_frame(ring, first, skb);
> - if (err)
> - return err;
>  
>   cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
>  IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
> @@ -1140,8 +1134,6 @@ static int igc_init_tx_empty_descriptor(struct igc_ring 
> *ring,
>   ring->next_to_use++;
>   if (ring->next_to_use == ring->count)
>   ring->next_to_use = 0;
> -
> - return 0;
>  }
>  
>  #define IGC_EMPTY_FRAME_SIZE 60
> @@ -1567,6 +1559,41 @@ static bool igc_request_tx_tstamp(struct igc_adapter 
> *adapter, struct sk_buff *s
>   return false;
>  }
>  
> +static int igc_insert_empty_frame(struct igc_ring *tx_ring)
> +{
> + struct igc_tx_buffer *empty_info;
> + struct sk_buff *empty_skb;
> + void *data;
> + int ret;
> +
> + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + empty_skb = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> + if (unlikely(!empty_skb)) {
> + if (net_ratelimit())
> + netdev_err(tx_ring->netdev,
> +"skb alloc error for empty frame\n");

ditto

> + return -ENOMEM;
> + }
> +
> + data = skb_put(empty_skb, IGC_EMPTY_FRAME_SIZE);
> + memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> +
> + /* Prepare DMA mapping and Tx buffer information */
> + ret = igc_init_empty_frame(tx_ring, empty_info, empty_skb);
> + if (unlikely(ret)) {
> + dev_kfree_skb_any(empty_skb);
> + return ret;
> + }
> +
> + /* Prepare advanced context descriptor for empty packet */
> + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> +
> + /* Prepare advanced data descriptor for empty packet */
> + igc_init_tx_empty_descriptor(tx_ring, empty_skb, empty_info);
> +
> + return 0;
> +}
> +
>  static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>  struct igc_ring *tx_ring)
>  {
> @@ -1586,6 +1613,7 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff 
> *skb,
>*  + 1 desc for skb_headlen/IGC_MAX_DATA_PER_TXD,
>*  + 2 desc gap to keep tail from touching head,
>*  + 1 desc for context descriptor,
> +  *  + 2 desc for inserting an em

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

2025-02-06 Thread Jagielski, Jedrzej

From: Andrew Lunn  
Sent: Tuesday, February 4, 2025 2:09 PM
>On Tue, Feb 04, 2025 at 08:17:00AM +0100, Jedrzej Jagielski wrote:
>> E610 NICs unlike the previous devices utilising ixgbe driver
>> are notified in the case of overheatning by the FW ACI event.
>> 
>> In event of overheat when treshold is exceeded, FW suspends all
>> traffic and sends overtemp event to the driver. Then driver
>> logs appropriate message and closes the adapter instance.
>> The card remains in that state until the platform is rebooted.
>
>There is also an HWMON temp[1-*]_emergency_alarm you can set. I
>_think_ that should also cause a udev event, so user space knows the
>print^h^h^h^h^hnetwork is on fire.
>
>   Andrew

I am not sure whether HWMON is applicable in that case.
Driver receives an async notification from the FW that an overheating
occurred, so has to handle it. In that case - by printing msg
and making the interface disabled for the user.
FW is responsible for monitoring temperature itself.
There's even no possibility to read temperature by the driver

Thanks
Jedrek

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

2025-02-06 Thread Andrew Lunn

On Thu, Feb 06, 2025 at 01:05:27PM +, Jagielski, Jedrzej wrote:
> From: Andrew Lunn  
> Sent: Tuesday, February 4, 2025 2:09 PM
> >On Tue, Feb 04, 2025 at 08:17:00AM +0100, Jedrzej Jagielski wrote:
> >> E610 NICs unlike the previous devices utilising ixgbe driver
> >> are notified in the case of overheatning by the FW ACI event.
> >> 
> >> In event of overheat when treshold is exceeded, FW suspends all
> >> traffic and sends overtemp event to the driver. Then driver
> >> logs appropriate message and closes the adapter instance.
> >> The card remains in that state until the platform is rebooted.
> >
> >There is also an HWMON temp[1-*]_emergency_alarm you can set. I
> >_think_ that should also cause a udev event, so user space knows the
> >print^h^h^h^h^hnetwork is on fire.
> >
> > Andrew
> 
> I am not sure whether HWMON is applicable in that case.
> Driver receives an async notification from the FW that an overheating
> occurred, so has to handle it. In that case - by printing msg
> and making the interface disabled for the user.
> FW is responsible for monitoring temperature itself.
> There's even no possibility to read temperature by the driver

https://elixir.bootlin.com/linux/v6.13.1/source/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c#L27

ixgbe_hwmon_show_temp() is some other temperature sensor? Which you do
have HWMON support for?

Or is the E610 not really an ixgbe, it has a different architecture,
more stuff pushed into firmware, less visibility from the kernel, no
temperature monitoring, just a NIC on fire indication?

Andrew

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

2025-02-06 Thread Jagielski, Jedrzej

From: Andrew Lunn  
Sent: Thursday, February 6, 2025 2:59 PM
>On Thu, Feb 06, 2025 at 01:05:27PM +, Jagielski, Jedrzej wrote:
>> From: Andrew Lunn  
>> Sent: Tuesday, February 4, 2025 2:09 PM
>> >On Tue, Feb 04, 2025 at 08:17:00AM +0100, Jedrzej Jagielski wrote:
>> >> E610 NICs unlike the previous devices utilising ixgbe driver
>> >> are notified in the case of overheatning by the FW ACI event.
>> >> 
>> >> In event of overheat when treshold is exceeded, FW suspends all
>> >> traffic and sends overtemp event to the driver. Then driver
>> >> logs appropriate message and closes the adapter instance.
>> >> The card remains in that state until the platform is rebooted.
>> >
>> >There is also an HWMON temp[1-*]_emergency_alarm you can set. I
>> >_think_ that should also cause a udev event, so user space knows the
>> >print^h^h^h^h^hnetwork is on fire.
>> >
>> >Andrew
>> 
>> I am not sure whether HWMON is applicable in that case.
>> Driver receives an async notification from the FW that an overheating
>> occurred, so has to handle it. In that case - by printing msg
>> and making the interface disabled for the user.
>> FW is responsible for monitoring temperature itself.
>> There's even no possibility to read temperature by the driver
>
>https://elixir.bootlin.com/linux/v6.13.1/source/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c#L27
>
>ixgbe_hwmon_show_temp() is some other temperature sensor? Which you do
>have HWMON support for?

This feature is not supported for E610 which has no support for reading
temperature

hw->mac.ops.get_thermal_sensor_data() callback used in
ixgbe_hwmon_show_temp has no implementation for E610, as there is no
such support from the FW side

>
>Or is the E610 not really an ixgbe, it has a different architecture,

ixgbe is used by several adapters, each is slightly different
in this case monitoring stuff is pushed into FW

>more stuff pushed into firmware, less visibility from the kernel, no
>temperature monitoring, just a NIC on fire indication?

yeah, right

Jedrek

Re: [Intel-wired-lan] [PATCH iwl-next v2 5/9] igc: Add support for frame preemption verification

2025-02-06 Thread Abdul Rahim, Faizal




Hi Vladimir,

Thanks for the quick review, appreciate your help.

On 6/2/2025 1:12 am, Vladimir Oltean wrote:

On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote:

This patch implements the "ethtool --set-mm" callback to trigger the
frame preemption verification handshake.

Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool
to perform the verification handshake for igc.
The structure fpe.mmsv is set by mmsv in ethtool and should remain
read-only for the driver.

igc does not use two mmsv callbacks:
a) configure_tx()
- igc lacks registers to configure FPE in the transmit direction.


Yes, maybe, but it's still important to handle this. It tells you when
the preemptible traffic classes should be sent as preemptible on the wire
(i.e. when the verification is either disabled, or it succeeded).

There is a selftest called manual_failed_verification() which supposedly
tests this exact condition: if verification fails, then packets sent to
TC0 are supposed to bump the eMAC's TX counters, even though TC0 is
configured as preemptible. Otherwise stated: even if the tc program says
that a certain traffic class is preemptible, you don't want to actually
send preemptible packets if you haven't verified the link partner can
handle them, since it will likely drop them on RX otherwise.


Even though fpe in tx direction isn't set in igc, it still checks 
ethtool_mmsv_is_tx_active() before setting a queue as preemptible.


This is done in :
igc_tsn_enable_offload(struct igc_adapter *adapter) {
{

if (ethtool_mmsv_is_tx_active(&adapter->fpe.mmsv) &&
ring->preemptible)
txqctl |= IGC_TXQCTL_PREEMPTIBLE;


Wouldn't this handle the situation mentioned ?
Sorry if I miss something here.


The problem with that selftest is that it relies on the driver's ability
to report split ethtool counters for eMAC/pMAC, rather than just aggregate.
In lack of that, you need to know through some other mechanism whether
the link partner received those packets as preemptible or express, and
that's kind of hard to add to a selftest.


b) configure_pmac()
- this callback dynamically controls pmac_enabled at runtime. For
  example, mmsv calls configure_pmac() and disables pmac_enabled when
  the link partner goes down, even if the user previously enabled it.
  The intention is to save power but it is not feasible in igc
  because it causes an endless adapter reset loop:

 1) Board A and Board B complete the verification handshake.
 Tx mode register for both boards are in TSN mode.
 2) Board B link goes down.

  On Board A:
 3) mmsv calls configure_pmac() with pmac_enabled = false.
 4) configure_pmac() in igc updates a new field based on
 pmac_enabled. Driver uses this field in igc_tsn_new_flags()
 to indicate that the user enabled/disabled FPE.
 5) configure_pmac() in igc calls igc_tsn_offload_apply() to check
 whether an adapter reset is needed. Calls existing logic in
 igc_tsn_will_tx_mode_change() and igc_tsn_new_flags().
 6) Since pmac_enabled is now disabled and no other TSN feature
 is active, igc_tsn_will_tx_mode_change() evaluates to true
 because Tx mode will switch from TSN to Legacy.
 7) Driver resets the adapter.
 8) Registers are set, and Tx mode switches to Legacy.
 9) When link partner is up, steps 3–8 repeat, but this time
 with pmac_enabled = true, reactivating TSN.
 igc_tsn_will_tx_mode_change() evaluates to true again,
 since Tx mode will switch from Legacy to TSN.
10) Driver resets the adapter.
11) Rest adapter completes, registers are set, and Tx mode
switches to TSN.

 On Board B:
12) Adapter reset on Board A at step 10 causes it to detect its
 link partner as down.


Is this using fiber/DAC, or twisted pair (RJ45 PHY)?


It's using RJ45 PHY.


13) Repeats steps 3–8.
14) Once reset adapter on Board A is completed at step 11, it
 detects its link partner as up.
15) Repeats steps 9–11.

- this cycle repeats indefinitely. To avoid this issue, IGC only uses
  mmsv.pmac_enabled to track whether FPE is enabled or disabled.


Not that I couldn't eventually tolerate this workaround, but I figure
it's worth asking anyway. Isn't there a way to do an adapter reset
without losing link in the PHY (assuming it's the PHY that loses link)?
Is that a consequence of software design that's not careful enough, or
of hardware design? I assume the PHY is a discrete component. Avoiding
the need to retrigger auto-negotiation saves a few seconds of waiting,
so it's worth pursuing if it's possible at all.



I briefly checked the driver code and the i226 SW User Manual.

The code calls igc_reset_task() → igc_reset() → igc_reinit_locked() → 
igc_down() → igc_reset

[Intel-wired-lan] [tnguy-next-queue:for-next] BUILD SUCCESS d67627e7b53203ca150e54723abbed81a0716286

2025-02-06 Thread kernel test robot

tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git for-next
branch HEAD: d67627e7b53203ca150e54723abbed81a0716286  ice: init flow director 
before RDMA

elapsed time: 1444m

configs tested: 99
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfiggcc-14.2.0
alphaallyesconfiggcc-14.2.0
arc  allmodconfiggcc-13.2.0
arc   allnoconfiggcc-13.2.0
arc  allyesconfiggcc-13.2.0
arc   randconfig-001-20250206gcc-13.2.0
arc   randconfig-002-20250206gcc-13.2.0
arm  allmodconfiggcc-14.2.0
arm   allnoconfigclang-17
arm  allyesconfiggcc-14.2.0
arm   randconfig-001-20250206clang-21
arm   randconfig-002-20250206clang-19
arm   randconfig-003-20250206clang-21
arm   randconfig-004-20250206gcc-14.2.0
arm64allmodconfigclang-18
arm64 allnoconfiggcc-14.2.0
arm64 randconfig-001-20250206clang-21
arm64 randconfig-002-20250206clang-21
arm64 randconfig-003-20250206clang-15
arm64 randconfig-004-20250206gcc-14.2.0
csky  allnoconfiggcc-14.2.0
csky  randconfig-001-20250206gcc-14.2.0
csky  randconfig-002-20250206gcc-14.2.0
hexagon  allmodconfigclang-21
hexagon   allnoconfigclang-21
hexagon  allyesconfigclang-18
hexagon   randconfig-001-20250206clang-21
hexagon   randconfig-002-20250206clang-18
i386 allmodconfiggcc-12
i386  allnoconfiggcc-12
i386 allyesconfiggcc-12
i386buildonly-randconfig-001-20250206clang-19
i386buildonly-randconfig-002-20250206gcc-12
i386buildonly-randconfig-003-20250206clang-19
i386buildonly-randconfig-004-20250206clang-19
i386buildonly-randconfig-005-20250206clang-19
i386buildonly-randconfig-006-20250206gcc-12
i386defconfigclang-19
loongarch allnoconfiggcc-14.2.0
loongarch randconfig-001-20250206gcc-14.2.0
loongarch randconfig-002-20250206gcc-14.2.0
m68k allmodconfiggcc-14.2.0
m68k  allnoconfiggcc-14.2.0
m68k allyesconfiggcc-14.2.0
microblazeallnoconfiggcc-14.2.0
mips  allnoconfiggcc-14.2.0
nios2 allnoconfiggcc-14.2.0
nios2 randconfig-001-20250206gcc-14.2.0
nios2 randconfig-002-20250206gcc-14.2.0
openrisc  allnoconfiggcc-14.2.0
openrisc allyesconfiggcc-14.2.0
parisc   allmodconfiggcc-14.2.0
pariscallnoconfiggcc-14.2.0
parisc   allyesconfiggcc-14.2.0
pariscrandconfig-001-20250206gcc-14.2.0
pariscrandconfig-002-20250206gcc-14.2.0
powerpc   allnoconfiggcc-14.2.0
powerpc   randconfig-001-20250206clang-21
powerpc   randconfig-002-20250206clang-19
powerpc   randconfig-003-20250206clang-21
powerpc64 randconfig-001-20250206clang-15
powerpc64 randconfig-002-20250206clang-17
powerpc64 randconfig-003-20250206clang-19
riscv allnoconfiggcc-14.2.0
riscv randconfig-001-20250206clang-19
riscv randconfig-002-20250206clang-17
s390 allmodconfigclang-19
s390  allnoconfigclang-21
s390 allyesconfiggcc-14.2.0
s390  randconfig-001-20250206clang-21
s390  randconfig-002-20250206gcc-14.2.0
sh   allmodconfiggcc-14.2.0
shallnoconfiggcc-14.2.0
sh   allyesconfiggcc-14.2.0
shrandconfig-001-20250206gcc-14.2.0
shrandconfig-002-20250206gcc-14.2.0
sparcallmodconfiggcc-14.2.0
sparc allnoconfiggcc-14.2.0
sparc randconfig

Re: [Intel-wired-lan] [PATCH bpf-next v9 3/5] net: stmmac: Add launch time support to XDP ZC

2025-02-06 Thread Maciej Fijalkowski

On Thu, Feb 06, 2025 at 02:04:06PM +0800, Song Yoong Siang wrote:
> Enable launch time (Time-Based Scheduling) support for XDP zero copy via
> the XDP Tx metadata framework.
> 
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel Tiger Lake platform. Below are the test steps and result.
> 
> Test 1: Send a single packet with the launch time set to 1 s in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>$ sudo ./xdp_hw_metadata enp0s30f4 -l 10 -L 1
> 
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>of the DUT.
> 
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 16.963 us, as shown in
> printout of the xdp_hw_metadata application below.
>   0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
>   No rx_hash, err=-95
>   HW RX-time:   1734579065767717328 (sec:1734579065.7677)
> delta to User RX-time sec:0.0004 (375.624 usec)
>   XDP RX-time:   1734579065768004454 (sec:1734579065.7680)
>  delta to User RX-time sec:0.0001 (88.498 usec)
>   No rx_vlan_tci or rx_vlan_proto, err=-95
>   0x55b5864717a8: ping-pong with csum=5619 (want )
>   csum_start=34 csum_offset=6
>   HW RX-time:   1734579065767717328 (sec:1734579065.7677)
> delta to HW Launch-time sec:1. (100.000 usec)
>   0x55b5864717a8: complete tx idx=4 addr=4018
>   HW Launch-time:   1734579066767717328 (sec:1734579066.7677)
> delta to HW TX-complete-time sec:0. (16.963 usec)
>   HW TX-complete-time:   1734579066767734291 (sec:1734579066.7677)
>  delta to User TX-complete-time sec:0.0001
>  (130.408 usec)
>   XDP RX-time:   1734579065768004454 (sec:1734579065.7680)
>  delta to User TX-complete-time sec:0.
> (999860.245 usec)
>   HW RX-time:   1734579065767717328 (sec:1734579065.7677)
> delta to HW TX-complete-time sec:1. (116.963 usec)
>   0x55b5864717a8: complete rx idx=132 addr=88100
> 
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> 500 us in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>$ sudo chrt -f 99 ./xdp_hw_metadata enp0s30f4 -l 50 -L 1 > \
>  /dev/shm/result.log
> 
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>VLAN priority 1 to port 9091 of the DUT.
> 
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 13.854 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
>   Min delta: 08.410 us
>   Avr delta: 13.854 us
>   Max delta: 17.076 us
>   Total packets forwarded: 1000
> 
> Reviewed-by: Choong Yong Liang 
> Signed-off-by: Song Yoong Siang 

Sorry haven't looked here in previous review approaches :/

> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 ++
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index f05cae103d83..925d8b97a42b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -106,6 +106,8 @@ struct stmmac_metadata_request {
>   struct stmmac_priv *priv;
>   struct dma_desc *tx_desc;
>   bool *set_ic;
> + struct dma_edesc *edesc;
> + int tbs;

wanted to comment you're introducing holes here but set_ic is a ptr:)

>  };
>  
>  struct stmmac_xsk_tx_complete {
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d04543e5697b..5e5d24924ce7 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2514,9 +2514,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
>   return 0;
>  }
>  
> +static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> + struct stmmac_metadata_request *meta_req = _priv;
> + struct timespec64 ts = ns_to_timespec64(launch_time);

nit: Apply reverse christmas tree rule here (order the variable
definitions from longest to shortest), but it's not a showstopper I
guess...

> +
> + if (meta_req->tbs & STMMAC_TBS_EN)
> + stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
> + ts.tv_nsec);
> +}
> +
>  static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
>   .tmo_request_timestamp  = stmmac_xsk_request_timestamp,
>   .tmo_fill_timestamp

Re: [Intel-wired-lan] [PATCH bpf-next v9 5/5] igc: Add launch time support to XDP ZC

2025-02-06 Thread Maciej Fijalkowski

On Thu, Feb 06, 2025 at 02:04:08PM +0800, Song Yoong Siang wrote:
> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> metadata framework.
> 
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> 
> Test 1: Send a single packet with the launch time set to 1 s in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>$ sudo ./xdp_hw_metadata enp2s0 -l 10 -L 1
> 
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>of the DUT.
> 
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> printout of the xdp_hw_metadata application below.
>   0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>   rx_hash: 0xE343384 with RSS type:0x1
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> delta to User RX-time sec:0.0002 (183.103 usec)
>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>  delta to User RX-time sec:0.0001 (80.309 usec)
>   No rx_vlan_tci or rx_vlan_proto, err=-95
>   0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>   csum_start=34 csum_offset=6
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> delta to HW Launch-time sec:1. (100.000 usec)
>   0x562ff5dc8880: complete tx idx=4 addr=4018
>   HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
> delta to HW TX-complete-time sec:0. (0.016 usec)
>   HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
>  delta to User TX-complete-time sec:0.
>  (32.546 usec)
>   XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
>  delta to User TX-complete-time sec:0.
>  (29.768 usec)
>   HW RX-time:   1734578015467548904 (sec:1734578015.4675)
> delta to HW TX-complete-time sec:1. (100.016 usec)
>   0x562ff5dc8880: complete rx idx=132 addr=84110
> 
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> 500 us in the future.
> 
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
>$ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 50 -L 1 > \
>  /dev/shm/result.log
> 
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>VLAN priority 1 to port 9091 of the DUT.
> 
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 0.016 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
>   Min delta: 0.005 us
>   Avr delta: 0.016 us
>   Max delta: 0.031 us
>   Total packets forwarded: 1000
> 
> Signed-off-by: Song Yoong Siang 

Reviewed-by: Maciej Fijalkowski 

also with one nit. Thanks!

> ---
>  drivers/net/ethernet/intel/igc/igc.h  |  1 +
>  drivers/net/ethernet/intel/igc/igc_main.c | 57 ++-
>  2 files changed, 56 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h 
> b/drivers/net/ethernet/intel/igc/igc.h
> index b8111ad9a9a8..cd1d7b6c1782 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -579,6 +579,7 @@ struct igc_metadata_request {
>   struct xsk_tx_metadata *meta;
>   struct igc_ring *tx_ring;
>   u32 cmd_type;
> + u16 used_desc;
>  };
>  
>  struct igc_q_vector {
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
> b/drivers/net/ethernet/intel/igc/igc_main.c
> index 3df608601a4b..f239f744247d 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2973,9 +2973,44 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>   return *(u64 *)_priv;
>  }
>  
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> + struct igc_metadata_request *meta_req = _priv;
> + struct igc_ring *tx_ring = meta_req->tx_ring;
> + __le32 launch_time_offset;
> + bool insert_empty = false;
> + bool first_flag = false;
> +
> + if (!tx_ring->launchtime_enable)
> + return;
> +
> + launch_time_offset = igc_tx_launchtime(tx_ring,
> +ns_to_ktime(launch_time),
> +&first_flag, &insert_empty);
> + if (insert_empty) {
> + /* Disregard the launch time request if the required empty frame
> +  * fails to be inserted.
> +  */
> + if (igc_insert_empty_frame(tx_ring))
> + return;
> +
> + meta_req->tx_buffer =
> + &tx_ring->

Re: [Intel-wired-lan] suspend/resume broken of igc driver broken on 6.12

2025-02-06 Thread Stephen Hemminger

On Thu, 6 Feb 2025 15:17:00 +0200
"Lifshits, Vitaly"  wrote:

> On 2/6/2025 6:13 AM, Stephen Hemminger wrote:
> > On Wed, 5 Feb 2025 12:36:31 +0200
> > "Lifshits, Vitaly"  wrote:
> >   
> >> On 1/31/2025 3:21 AM, Stephen Hemminger wrote:  
> >>> On Thu, 30 Jan 2025 21:17:30 +0200
> >>> "Lifshits, Vitaly"  wrote:
> >>>  
>  On 1/30/2025 7:11 PM, Stephen Hemminger wrote:  
> > I am using:
> >
> > 5a:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
> > I226-LM (rev 04)
> > Subsystem: Intel Corporation Device 
> > Flags: bus master, fast devsel, latency 0, IRQ 19, IOMMU group 
> > 20
> > Memory at 6c50 (32-bit, non-prefetchable) [size=1M]
> > Memory at 6c60 (32-bit, non-prefetchable) [size=16K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > Capabilities: [a0] Express Endpoint, IntMsgNum 0
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [140] Device Serial Number 58-47-ca-ff-ff-7a-98-3d
> > Capabilities: [1c0] Latency Tolerance Reporting
> > Capabilities: [1f0] Precision Time Measurement
> > Capabilities: [1e0] L1 PM Substates
> > Kernel driver in use: igc
> > Kernel modules: igc
> >
> >
> > Using both Debian testing and my own kernel built from 6.12, the igc
> > driver appears broken after resume.  
> 
> From which system state are you resuming?
>  
> >
> > After resuming the device is down and no address present.
> > Attempts to set link up manually fail.  
> 
>  Did you get any errors in the dmesg log?
>  What is the firmware version on your device (you can get it by running
>  ethtool -i)?
>  
> > If I do rmmod/modprobe of igc it comes back.
> >
> > Doing a bit of bisectting but it is slow going.  
> 
>  Meanwhile, we'll also try to reproduce this issue in our lab.  Could you
>  share more details about your system so we can create a similar setup?  
> >>>
> >>> Given that error reported is -ENODEV, might be a generic netdev problem 
> >>> not
> >>> just for igc device.
> >>>  
> >>
> >> We weren't able to reproduce this issue on our systems, even though we
> >> tried several suspend-resume cycles on different kernels and different
> >> systems.
> >>
> >> However, a few days ago we received a comment in a BZ about an issue
> >> similar to yours. In there adding a short delay in igc_resume function
> >> https://bugzilla.kernel.org/show_bug.cgi?id=219143
> >> https://bugzilla.kernel.org/show_bug.cgi?id=219143#c123
> >>
> >>
> >>
> >> Can you try to see if it fixes your issue as well?  
> > 
> > I tried the proposed delay and it had no impact.
> > Any idea of other things to instrument?
> >   
> 
> 
> Has the adapter worked with a different kernel? Can you try to reproduce 
> the issue over kernel 6.9?
> 
> Is the LAN cable connected to the igc adapter? Does it maintain link 
> during suspend?
> 
> Also, I saw that on your board you have three more adapters, I assume 
> that enp2s0f0np0 and enp2s0f0np1 are i40e adapters. Does this issue also 
> happen to enp87s0?

This is a new machine, and not sure if it ever worked.
I can boot some older distro via USB if that helps.

The LAN cable is always connected (it is a desktop box), and the
10G NIC's are not used; they are connected by a loopback cable and
used for DPDK testing occasionally.

It does work in Windows...

Re: [Intel-wired-lan] suspend/resume broken of igc driver broken on 6.12

2025-02-06 Thread Lifshits, Vitaly





On 2/6/2025 6:13 AM, Stephen Hemminger wrote:

On Wed, 5 Feb 2025 12:36:31 +0200
"Lifshits, Vitaly"  wrote:


On 1/31/2025 3:21 AM, Stephen Hemminger wrote:

On Thu, 30 Jan 2025 21:17:30 +0200
"Lifshits, Vitaly"  wrote:
   

On 1/30/2025 7:11 PM, Stephen Hemminger wrote:

I am using:

5a:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-LM (rev 
04)
Subsystem: Intel Corporation Device 
Flags: bus master, fast devsel, latency 0, IRQ 19, IOMMU group 20
Memory at 6c50 (32-bit, non-prefetchable) [size=1M]
Memory at 6c60 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Capabilities: [a0] Express Endpoint, IntMsgNum 0
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 58-47-ca-ff-ff-7a-98-3d
Capabilities: [1c0] Latency Tolerance Reporting
Capabilities: [1f0] Precision Time Measurement
Capabilities: [1e0] L1 PM Substates
Kernel driver in use: igc
Kernel modules: igc


Using both Debian testing and my own kernel built from 6.12, the igc
driver appears broken after resume.


   From which system state are you resuming?
  


After resuming the device is down and no address present.
Attempts to set link up manually fail.


Did you get any errors in the dmesg log?
What is the firmware version on your device (you can get it by running
ethtool -i)?
  

If I do rmmod/modprobe of igc it comes back.

Doing a bit of bisectting but it is slow going.


Meanwhile, we'll also try to reproduce this issue in our lab.  Could you
share more details about your system so we can create a similar setup?


Given that error reported is -ENODEV, might be a generic netdev problem not
just for igc device.
   


We weren't able to reproduce this issue on our systems, even though we
tried several suspend-resume cycles on different kernels and different
systems.

However, a few days ago we received a comment in a BZ about an issue
similar to yours. In there adding a short delay in igc_resume function
https://bugzilla.kernel.org/show_bug.cgi?id=219143
https://bugzilla.kernel.org/show_bug.cgi?id=219143#c123



Can you try to see if it fixes your issue as well?


I tried the proposed delay and it had no impact.
Any idea of other things to instrument?




Has the adapter worked with a different kernel? Can you try to reproduce 
the issue over kernel 6.9?


Is the LAN cable connected to the igc adapter? Does it maintain link 
during suspend?


Also, I saw that on your board you have three more adapters, I assume 
that enp2s0f0np0 and enp2s0f0np1 are i40e adapters. Does this issue also 
happen to enp87s0?

Re: [Intel-wired-lan] [PATCH iwl-next v1 01/13] ixgbe: add initial devlink support

2025-02-06 Thread Tony Nguyen





On 2/3/2025 7:03 AM, Jedrzej Jagielski wrote:

Add an initial support for devlink interface to ixgbe driver.

Similarly to i40e driver the implementation doesn't enable
devlink to manage device-wide configuration. Devlink instance
is created for each physical function of PCIe device.

Create separate directory for devlink related ixgbe files
and use naming scheme similar to the one used in the ice driver.

Add a stub for Documentation, to be extended by further patches.

Reviewed-by: Mateusz Polchlopek 
Signed-off-by: Jedrzej Jagielski 


...


@@ -11283,6 +11284,10 @@ static int ixgbe_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_ioremap;
}
  
+	err = ixgbe_allocate_devlink(adapter);

+   if (err)
+   goto err_ioremap;


Looks like the unroll has an issue.

Smatch reports:
../drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:11844 ixgbe_probe() 
warn: 'hw->hw_addr' from ioremap() not released on lines: 11844.


Thanks,
Tony


+
netdev->netdev_ops = &ixgbe_netdev_ops;
ixgbe_set_ethtool_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;

Re: [Intel-wired-lan] [PATCH iwl-net v2] ice: health.c: fix compilation on gcc 7.5

2025-02-06 Thread Zhuo, Qiuxu

> From: Kitszel, Przemyslaw 
> [...]
> Subject: [PATCH iwl-net v2] ice: health.c: fix compilation on gcc 7.5
> 
> GCC 7 is not as good as GCC 8+ in telling what is a compile-time const, and
> thus could be used for static storage.
> Fortunately keeping strings as const arrays is enough to make old gcc happy.
> 
> Excerpt from the report:
> My GCC is: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.
> 
>   CC [M]  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o
> drivers/net/ethernet/intel/ice/devlink/health.c:35:3: error: initializer 
> element
> is not constant
>ice_common_port_solutions, {ice_port_number_label}},
>^
> drivers/net/ethernet/intel/ice/devlink/health.c:35:3: note: (near 
> initialization
> for 'ice_health_status_lookup[0].solution')
> drivers/net/ethernet/intel/ice/devlink/health.c:35:31: error: initializer
> element is not constant
>ice_common_port_solutions, {ice_port_number_label}},
>^
> drivers/net/ethernet/intel/ice/devlink/health.c:35:31: note: (near 
> initialization
> for 'ice_health_status_lookup[0].data_label[0]')
> drivers/net/ethernet/intel/ice/devlink/health.c:37:46: error: initializer
> element is not constant
>"Change or replace the module or cable.", {ice_port_number_label}},
>   ^
> drivers/net/ethernet/intel/ice/devlink/health.c:37:46: note: (near 
> initialization
> for 'ice_health_status_lookup[1].data_label[0]')
> drivers/net/ethernet/intel/ice/devlink/health.c:39:3: error: initializer 
> element
> is not constant
>ice_common_port_solutions, {ice_port_number_label}},
>^
> 
> Fixes: 85d6164ec56d ("ice: add fw and port health reporters")
> Reported-by: Qiuxu Zhuo 
> Closes:
> https://lore.kernel.org/netdev/CY8PR11MB7134BF7A46D71E50D25FA7A989F
> 7...@cy8pr11mb7134.namprd11.prod.outlook.com
> Reviewed-by: Michal Swiatkowski 
> Suggested-by: Simon Horman 
> Signed-off-by: Przemek Kitszel 

   Tested-by: Qiuxu Zhuo 

Thanks!

[Intel-wired-lan] [PATCH bpf-next v10 5/5] igc: Add launch time support to XDP ZC

2025-02-06 Thread Song Yoong Siang

Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
metadata framework.

This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel I225-LM Ethernet controller. Below are the test steps and result.

Test 1: Send a single packet with the launch time set to 1 s in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp2s0 -l 10 -L 1

2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
   of the DUT.

Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 0.016 us, as shown in
printout of the xdp_hw_metadata application below.
  0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
  rx_hash: 0xE343384 with RSS type:0x1
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
delta to User RX-time sec:0.0002 (183.103 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
 delta to User RX-time sec:0.0001 (80.309 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
  csum_start=34 csum_offset=6
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
delta to HW Launch-time sec:1. (100.000 usec)
  0x562ff5dc8880: complete tx idx=4 addr=4018
  HW Launch-time:   1734578016467548904 (sec:1734578016.4675)
delta to HW TX-complete-time sec:0. (0.016 usec)
  HW TX-complete-time:   1734578016467548920 (sec:1734578016.4675)
 delta to User TX-complete-time sec:0.
 (32.546 usec)
  XDP RX-time:   1734578015467651698 (sec:1734578015.4677)
 delta to User TX-complete-time sec:0.
 (29.768 usec)
  HW RX-time:   1734578015467548904 (sec:1734578015.4675)
delta to HW TX-complete-time sec:1. (100.016 usec)
  0x562ff5dc8880: complete rx idx=132 addr=84110

Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
500 us in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 50 -L 1 > \
 /dev/shm/result.log

2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
   VLAN priority 1 to port 9091 of the DUT.

Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 0.016 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
  Min delta: 0.005 us
  Avr delta: 0.016 us
  Max delta: 0.031 us
  Total packets forwarded: 1000

Reviewed-by: Maciej Fijalkowski 
Signed-off-by: Song Yoong Siang 
---
 drivers/net/ethernet/intel/igc/igc.h  |  1 +
 drivers/net/ethernet/intel/igc/igc_main.c | 61 ++-
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h 
b/drivers/net/ethernet/intel/igc/igc.h
index b8111ad9a9a8..cd1d7b6c1782 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -579,6 +579,7 @@ struct igc_metadata_request {
struct xsk_tx_metadata *meta;
struct igc_ring *tx_ring;
u32 cmd_type;
+   u16 used_desc;
 };
 
 struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 1bfa71545e37..3044392e8ded 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2971,9 +2971,48 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
return *(u64 *)_priv;
 }
 
+static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+   struct igc_metadata_request *meta_req = _priv;
+   struct igc_ring *tx_ring = meta_req->tx_ring;
+   __le32 launch_time_offset;
+   bool insert_empty = false;
+   bool first_flag = false;
+   u16 used_desc = 0;
+
+   if (!tx_ring->launchtime_enable)
+   return;
+
+   launch_time_offset = igc_tx_launchtime(tx_ring,
+  ns_to_ktime(launch_time),
+  &first_flag, &insert_empty);
+   if (insert_empty) {
+   /* Disregard the launch time request if the required empty frame
+* fails to be inserted.
+*/
+   if (igc_insert_empty_frame(tx_ring))
+   return;
+
+   meta_req->tx_buffer =
+   &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+   /* Inserting an empty packet requires two descriptors:
+* one data descriptor and one context descriptor.
+*/
+   used_desc += 2;
+   }
+
+   /* Us

[Intel-wired-lan] [PATCH iwl-net v2] ice: health.c: fix compilation on gcc 7.5

2025-02-06 Thread Przemek Kitszel

GCC 7 is not as good as GCC 8+ in telling what is a compile-time
const, and thus could be used for static storage.
Fortunately keeping strings as const arrays is enough to make old
gcc happy.

Excerpt from the report:
My GCC is: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.

  CC [M]  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o
drivers/net/ethernet/intel/ice/devlink/health.c:35:3: error: initializer 
element is not constant
   ice_common_port_solutions, {ice_port_number_label}},
   ^
drivers/net/ethernet/intel/ice/devlink/health.c:35:3: note: (near 
initialization for 'ice_health_status_lookup[0].solution')
drivers/net/ethernet/intel/ice/devlink/health.c:35:31: error: initializer 
element is not constant
   ice_common_port_solutions, {ice_port_number_label}},
   ^
drivers/net/ethernet/intel/ice/devlink/health.c:35:31: note: (near 
initialization for 'ice_health_status_lookup[0].data_label[0]')
drivers/net/ethernet/intel/ice/devlink/health.c:37:46: error: initializer 
element is not constant
   "Change or replace the module or cable.", {ice_port_number_label}},
  ^
drivers/net/ethernet/intel/ice/devlink/health.c:37:46: note: (near 
initialization for 'ice_health_status_lookup[1].data_label[0]')
drivers/net/ethernet/intel/ice/devlink/health.c:39:3: error: initializer 
element is not constant
   ice_common_port_solutions, {ice_port_number_label}},
   ^

Fixes: 85d6164ec56d ("ice: add fw and port health reporters")
Reported-by: Qiuxu Zhuo 
Closes: 
https://lore.kernel.org/netdev/cy8pr11mb7134bf7a46d71e50d25fa7a989...@cy8pr11mb7134.namprd11.prod.outlook.com
Reviewed-by: Michal Swiatkowski 
Suggested-by: Simon Horman 
Signed-off-by: Przemek Kitszel 
---
v2: use static const char[] instead of #define - Simon
+added RB tag from Michal, but not adding TB tag from Qiuxu

v1:
 
https://lore.kernel.org/netdev/20250205104252.30464-2-przemyslaw.kits...@intel.com

CC: Kees Cook 
CC: Jiri Slaby 
---
 drivers/net/ethernet/intel/ice/devlink/health.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c 
b/drivers/net/ethernet/intel/ice/devlink/health.c
index ea40f7941259..19c3d37aa768 100644
--- a/drivers/net/ethernet/intel/ice/devlink/health.c
+++ b/drivers/net/ethernet/intel/ice/devlink/health.c
@@ -25,10 +25,10 @@ struct ice_health_status {
  * The below lookup requires to be sorted by code.
  */
 
-static const char *const ice_common_port_solutions =
+static const char ice_common_port_solutions[] =
"Check your cable connection. Change or replace the module or cable. 
Manually set speed and duplex.";
-static const char *const ice_port_number_label = "Port Number";
-static const char *const ice_update_nvm_solution = "Update to the latest NVM 
image.";
+static const char ice_port_number_label[] = "Port Number";
+static const char ice_update_nvm_solution[] = "Update to the latest NVM 
image.";
 
 static const struct ice_health_status ice_health_status_lookup[] = {
{ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT, "An unsupported module 
was detected.",

base-commit: 4241a702e0d0c2ca9364cfac08dbf134264962de
-- 
2.46.0

[Intel-wired-lan] [PATCH bpf-next v10 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata

2025-02-06 Thread Song Yoong Siang

Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev 
Acked-by: Stanislav Fomichev 
Signed-off-by: Song Yoong Siang 
---
 Documentation/netlink/specs/netdev.yaml  |  4 ++
 Documentation/networking/xsk-tx-metadata.rst | 62 
 include/net/xdp_sock.h   | 10 
 include/net/xdp_sock_drv.h   |  1 +
 include/uapi/linux/if_xdp.h  | 10 
 include/uapi/linux/netdev.h  |  3 +
 net/core/netdev-genl.c   |  2 +
 net/xdp/xsk.c|  3 +
 tools/include/uapi/linux/if_xdp.h| 10 
 tools/include/uapi/linux/netdev.h|  3 +
 10 files changed, 108 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml 
b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..901b5afb3df0 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
 name: tx-checksum
 doc:
   L3 checksum HW offload is supported by the driver.
+  -
+name: tx-launch-time-fifo
+doc:
+  Launch time HW offload is supported by the driver.
   -
 name: queue-type
 type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst 
b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..df53a10ccac3 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
   checksum. ``csum_start`` specifies byte offset of where the checksumming
   should start and ``csum_offset`` specifies byte offset where the
   device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+  packet for transmission at a pre-determined time called launch time. The
+  value of launch time is indicated by ``launch_time`` field of
+  ``union xsk_tx_metadata``.
 
 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,63 @@ In this case, when running in ``XDK_COPY`` mode, the TX 
checksum
 is calculated on the CPU. Do not enable this option in production because
 it will negatively affect performance.
 
+Launch Time
+===
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. Since the driver maintains FIFO behavior and does not perform
+packet reordering, a packet with a launch time request will block other
+packets in the same Tx Queue until it is sent. Therefore, it is recommended
+to allocate separate queue for scheduling traffic that is intended for
+future transmission.
+
+In scenarios where the launch time offload feature is disabled, the device
+driver is expected to disregard the launch time request. For correct
+interpretation and meaningful operation, the launch time should never be
+set to a value larger than the farthest programmable time in the future
+(the horizon). Different devices have different hardware limitations on the
+launch time offload feature.
+
+stmmac driver
+-
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+igc driver
+--
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues

[Intel-wired-lan] [PATCH bpf-next v10 0/5] xsk: TX metadata Launch Time support

2025-02-06 Thread Song Yoong Siang

This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.

Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.

Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.

To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.

Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.

Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.

v10:
  - use net_err_ratelimited(), instead of net_ratelimit() (Maciej)
  - accumulate the amount of used descs in local variable and update the
igc_metadata_request::used_desc once (Maciej)
  - Ensure reverse christmas tree rule (Maciej)

V9: 
https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.s...@intel.com/
  - Remove the igc_desc_unused() checking (Maciej)
  - Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
  - Rate limit the error messages (Maciej)
  - Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
  - Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
  - put self SOB tag as last tag (Maciej)

V8: 
https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.s...@intel.com/
  - check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
  - submit another bug fix patch to set the buffer type for empty frame 
(Maciej):

https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.s...@intel.com/

V7: 
https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.s...@intel.com/
  - split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
  - add explanation on why the value "4" is used as igc transmit budget
(Faizal)
  - perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)

V6: 
https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.s...@intel.com/
  - fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)

V5: 
https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.s...@intel.com/
  - change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
  - improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
  - add launch time setup steps into xdp_hw_metadata app (Stanislav)

V4: 
https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.s...@intel.com/
  - added XDP launch time support to the igc driver (Jesper & Florian)
  - added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
  - added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
  - added step to enable launch time in the commit message (Jesper & Willem)
  - explicitly documented the type of launch_time and which clock source
it is against (Willem)

V3: 
https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.s...@intel.com/
  - renamed to use launch time (Jesper & Willem)
  - changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
beca

[Intel-wired-lan] [PATCH bpf-next v10 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata

2025-02-06 Thread Song Yoong Siang

Add launch time hardware offload request to xdp_hw_metadata. Users can
configure the delta of launch time relative to HW RX-time using the "-l"
argument. By default, the delta is set to 0 ns, which means the launch time
is disabled. By setting the delta to a non-zero value, the launch time
hardware offload feature will be enabled and requested. Additionally, users
can configure the Tx Queue to be enabled with the launch time hardware
offload using the "-L" argument. By default, Tx Queue 0 will be used.

Acked-by: Stanislav Fomichev 
Signed-off-by: Song Yoong Siang 
---
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +-
 1 file changed, 163 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c 
b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 6f7b15d6c6ed..3d8de0d4c96a 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -13,6 +13,7 @@
  * - UDP 9091 packets trigger TX reply
  * - TX HW timestamp is requested and reported back upon completion
  * - TX checksum is requested
+ * - TX launch time HW offload is requested for transmission
  */
 
 #include 
@@ -37,6 +38,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include "xdp_metadata.h"
 
@@ -64,6 +74,18 @@ int rxq;
 bool skip_tx;
 __u64 last_hw_rx_timestamp;
 __u64 last_xdp_rx_timestamp;
+__u64 last_launch_time;
+__u64 launch_time_delta_to_hw_rx_timestamp;
+int launch_time_queue;
+
+#define run_command(cmd, ...)  \
+({ \
+   char command[1024]; \
+   memset(command, 0, sizeof(command));\
+   snprintf(command, sizeof(command), cmd, ##__VA_ARGS__); \
+   fprintf(stderr, "Running: %s\n", command);  \
+   system(command);\
+})
 
 void test__fail(void) { /* for network_helpers.c */ }
 
@@ -298,6 +320,12 @@ static bool complete_tx(struct xsk *xsk, clockid_t 
clock_id)
if (meta->completion.tx_timestamp) {
__u64 ref_tstamp = gettime(clock_id);
 
+   if (launch_time_delta_to_hw_rx_timestamp) {
+   print_tstamp_delta("HW Launch-time",
+  "HW TX-complete-time",
+  last_launch_time,
+  meta->completion.tx_timestamp);
+   }
print_tstamp_delta("HW TX-complete-time", "User 
TX-complete-time",
   meta->completion.tx_timestamp, ref_tstamp);
print_tstamp_delta("XDP RX-time", "User TX-complete-time",
@@ -395,6 +423,17 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, 
clockid_t clock_id)
   xsk, ntohs(udph->check), ntohs(want_csum),
   meta->request.csum_start, meta->request.csum_offset);
 
+   /* Set the value of launch time */
+   if (launch_time_delta_to_hw_rx_timestamp) {
+   meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME;
+   meta->request.launch_time = last_hw_rx_timestamp +
+   
launch_time_delta_to_hw_rx_timestamp;
+   last_launch_time = meta->request.launch_time;
+   print_tstamp_delta("HW RX-time", "HW Launch-time",
+  last_hw_rx_timestamp,
+  meta->request.launch_time);
+   }
+
memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity 
*/
tx_desc->options |= XDP_TX_METADATA;
tx_desc->len = len;
@@ -407,6 +446,7 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int 
server_fd, clockid_t
const struct xdp_desc *rx_desc;
struct pollfd fds[rxq + 1];
__u64 comp_addr;
+   __u64 deadline;
__u64 addr;
__u32 idx = 0;
int ret;
@@ -477,9 +517,15 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, 
int server_fd, clockid_t
if (ret)
printf("kick_tx ret=%d\n", ret);
 
-   for (int j = 0; j < 500; j++) {
+   /* wait 1 second + cover launch time */
+   deadline = gettime(clock_id) +
+  NANOSEC_PER_SEC +
+  
launch_time_delta_to_hw_rx_timestamp;
+   while (true) {
if (complete_tx(xsk, clock_id))
break;
+   if (gettime(clock_id) >= 
deadline)
+

[Intel-wired-lan] [PATCH bpf-next v10 3/5] net: stmmac: Add launch time support to XDP ZC

2025-02-06 Thread Song Yoong Siang

Enable launch time (Time-Based Scheduling) support for XDP zero copy via
the XDP Tx metadata framework.

This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel Tiger Lake platform. Below are the test steps and result.

Test 1: Send a single packet with the launch time set to 1 s in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp0s30f4 -l 10 -L 1

2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
   of the DUT.

Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 16.963 us, as shown in
printout of the xdp_hw_metadata application below.
  0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
  No rx_hash, err=-95
  HW RX-time:   1734579065767717328 (sec:1734579065.7677)
delta to User RX-time sec:0.0004 (375.624 usec)
  XDP RX-time:   1734579065768004454 (sec:1734579065.7680)
 delta to User RX-time sec:0.0001 (88.498 usec)
  No rx_vlan_tci or rx_vlan_proto, err=-95
  0x55b5864717a8: ping-pong with csum=5619 (want )
  csum_start=34 csum_offset=6
  HW RX-time:   1734579065767717328 (sec:1734579065.7677)
delta to HW Launch-time sec:1. (100.000 usec)
  0x55b5864717a8: complete tx idx=4 addr=4018
  HW Launch-time:   1734579066767717328 (sec:1734579066.7677)
delta to HW TX-complete-time sec:0. (16.963 usec)
  HW TX-complete-time:   1734579066767734291 (sec:1734579066.7677)
 delta to User TX-complete-time sec:0.0001
 (130.408 usec)
  XDP RX-time:   1734579065768004454 (sec:1734579065.7680)
 delta to User TX-complete-time sec:0.
(999860.245 usec)
  HW RX-time:   1734579065767717328 (sec:1734579065.7677)
delta to HW TX-complete-time sec:1. (116.963 usec)
  0x55b5864717a8: complete rx idx=132 addr=88100

Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
500 us in the future.

Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
   $ sudo chrt -f 99 ./xdp_hw_metadata enp0s30f4 -l 50 -L 1 > \
 /dev/shm/result.log

2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
   VLAN priority 1 to port 9091 of the DUT.

Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 13.854 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
  Min delta: 08.410 us
  Avr delta: 13.854 us
  Max delta: 17.076 us
  Total packets forwarded: 1000

Reviewed-by: Choong Yong Liang 
Signed-off-by: Song Yoong Siang 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index f05cae103d83..925d8b97a42b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -106,6 +106,8 @@ struct stmmac_metadata_request {
struct stmmac_priv *priv;
struct dma_desc *tx_desc;
bool *set_ic;
+   struct dma_edesc *edesc;
+   int tbs;
 };
 
 struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d04543e5697b..ff67dc4ecf72 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2514,9 +2514,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
return 0;
 }
 
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+   struct timespec64 ts = ns_to_timespec64(launch_time);
+   struct stmmac_metadata_request *meta_req = _priv;
+
+   if (meta_req->tbs & STMMAC_TBS_EN)
+   stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+   ts.tv_nsec);
+}
+
 static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
.tmo_request_timestamp  = stmmac_xsk_request_timestamp,
.tmo_fill_timestamp = stmmac_xsk_fill_timestamp,
+   .tmo_request_launch_time= stmmac_xsk_request_launch_time,
 };
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2600,6 +2611,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, 
u32 queue, u32 budget)
meta_req.priv = priv;
meta_req.tx_desc = tx_desc;
meta_req.set_ic = &set_ic;
+   meta_req.tbs = tx_q->tbs;
+   meta_req.edesc = &tx_q->dma_entx[entry];

[Intel-wired-lan] [PATCH bpf-next v10 4/5] igc: Refactor empty frame insertion for launch time support

2025-02-06 Thread Song Yoong Siang

Refactor the code for inserting an empty frame into a new function
igc_insert_empty_frame(). This change extracts the logic for inserting
an empty packet from igc_xmit_frame_ring() into a separate function,
allowing it to be reused in future implementations, such as the XDP
zero copy transmit function.

Remove the igc_desc_unused() checking in igc_init_tx_empty_descriptor()
because the number of descriptors needed is guaranteed.

Ensure that skb allocation and DMA mapping work for the empty frame,
before proceeding to fill in igc_tx_buffer info, context descriptor,
and data descriptor.

Rate limit the error messages for skb allocation and DMA mapping failures.

Update the comment to indicate that the 2 descriptors needed by the empty
frame are already taken into consideration in igc_xmit_frame_ring().

Handle the case where the insertion of an empty frame fails and explain
the reason behind this handling.

Reviewed-by: Maciej Fijalkowski 
Signed-off-by: Song Yoong Siang 
---
 drivers/net/ethernet/intel/igc/igc_main.c | 82 ++-
 1 file changed, 50 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 84307bb7313e..1bfa71545e37 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1092,7 +1092,8 @@ static int igc_init_empty_frame(struct igc_ring *ring,
 
dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
if (dma_mapping_error(ring->dev, dma)) {
-   netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
+   net_err_ratelimited("%s: DMA mapping error for empty frame\n",
+   netdev_name(ring->netdev));
return -ENOMEM;
}
 
@@ -1108,20 +1109,12 @@ static int igc_init_empty_frame(struct igc_ring *ring,
return 0;
 }
 
-static int igc_init_tx_empty_descriptor(struct igc_ring *ring,
-   struct sk_buff *skb,
-   struct igc_tx_buffer *first)
+static void igc_init_tx_empty_descriptor(struct igc_ring *ring,
+struct sk_buff *skb,
+struct igc_tx_buffer *first)
 {
union igc_adv_tx_desc *desc;
u32 cmd_type, olinfo_status;
-   int err;
-
-   if (!igc_desc_unused(ring))
-   return -EBUSY;
-
-   err = igc_init_empty_frame(ring, first, skb);
-   if (err)
-   return err;
 
cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
@@ -1140,8 +1133,6 @@ static int igc_init_tx_empty_descriptor(struct igc_ring 
*ring,
ring->next_to_use++;
if (ring->next_to_use == ring->count)
ring->next_to_use = 0;
-
-   return 0;
 }
 
 #define IGC_EMPTY_FRAME_SIZE 60
@@ -1567,6 +1558,40 @@ static bool igc_request_tx_tstamp(struct igc_adapter 
*adapter, struct sk_buff *s
return false;
 }
 
+static int igc_insert_empty_frame(struct igc_ring *tx_ring)
+{
+   struct igc_tx_buffer *empty_info;
+   struct sk_buff *empty_skb;
+   void *data;
+   int ret;
+
+   empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+   empty_skb = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
+   if (unlikely(!empty_skb)) {
+   net_err_ratelimited("%s: skb alloc error for empty frame\n",
+   netdev_name(tx_ring->netdev));
+   return -ENOMEM;
+   }
+
+   data = skb_put(empty_skb, IGC_EMPTY_FRAME_SIZE);
+   memset(data, 0, IGC_EMPTY_FRAME_SIZE);
+
+   /* Prepare DMA mapping and Tx buffer information */
+   ret = igc_init_empty_frame(tx_ring, empty_info, empty_skb);
+   if (unlikely(ret)) {
+   dev_kfree_skb_any(empty_skb);
+   return ret;
+   }
+
+   /* Prepare advanced context descriptor for empty packet */
+   igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
+
+   /* Prepare advanced data descriptor for empty packet */
+   igc_init_tx_empty_descriptor(tx_ring, empty_skb, empty_info);
+
+   return 0;
+}
+
 static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
   struct igc_ring *tx_ring)
 {
@@ -1586,6 +1611,7 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff 
*skb,
 *  + 1 desc for skb_headlen/IGC_MAX_DATA_PER_TXD,
 *  + 2 desc gap to keep tail from touching head,
 *  + 1 desc for context descriptor,
+*  + 2 desc for inserting an empty packet for launch time,
 * otherwise try next time
 */
for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
@@ -1605,24 +1631,16 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff 
*skb,
launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, 
&insert_empty);
 
if

Re: [Intel-wired-lan] [PATCH net-next v7 1/5] net: move ARFS rmap management to core

2025-02-06 Thread Jakub Kicinski

On Tue,  4 Feb 2025 15:06:18 -0700 Ahmed Zaki wrote:
> +void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
> +{
> + int rc;
> +
> + /* Remove existing rmap entries */
> + if (napi->dev->rx_cpu_rmap_auto &&
> + napi->irq != irq && napi->irq > 0)

this condition gets a bit hairy by the end of the series.
could you add a napi state bit that indicates that a notifier is
installed? Then here:

if (napi->irq == irq)
return;

if (test_and_clear_bit(NAPI_STATE_HAS_NOTIFIER, &napi->state))
irq_set_affinity_notifier(napi->irq, NULL);
if (irq < 0)
return;

And you can similarly simplify napi_disable_locked().

Speaking of which, why do the auto-removal in napi_disable()
rather than netif_napi_del() ? We don't reinstall on napi_enable()
and doing a disable() + enable() is fairly common during driver
reconfig.

> + irq_set_affinity_notifier(napi->irq, NULL);
> +
> + napi->irq = irq;
> + if (irq > 0) {
> + rc = napi_irq_cpu_rmap_add(napi, irq);
> + if (rc)
> + netdev_warn(napi->dev, "Unable to update ARFS map 
> (%d)\n",

nit: not sure I'd grasp this message as a user, maybe:

"Unable to install aRFS CPU to Rx queue mapping"

? Not great either, I guess.

Re: [Intel-wired-lan] [PATCH net-next v7 2/5] net: napi: add CPU affinity to napi_config

2025-02-06 Thread Jakub Kicinski

On Wed, 5 Feb 2025 08:20:20 -0700 Ahmed Zaki wrote:
> >> +  if (napi->dev->rx_cpu_rmap_auto) {
> >>rc = napi_irq_cpu_rmap_add(napi, irq);
> >>if (rc)
> >>netdev_warn(napi->dev, "Unable to update ARFS map 
> >> (%d)\n",
> >>rc);
> >> +  } else if (napi->config && napi->dev->irq_affinity_auto) {
> >> +  napi->notify.notify = netif_napi_irq_notify;
> >> +  napi->notify.release = netif_napi_affinity_release;
> >> +
> >> +  rc = irq_set_affinity_notifier(irq, &napi->notify);
> >> +  if (rc)
> >> +  netdev_warn(napi->dev, "Unable to set IRQ notifier 
> >> (%d)\n",
> >> +  rc);
> >>}  
> > 
> > Should there be a WARN_ON or WARN_ON_ONCE in here somewhere if the
> > driver calls netif_napi_set_irq_locked but did not link NAPI config
> > with a call to netif_napi_add_config?
> > 
> > It seems like in that case the driver is buggy and a warning might
> > be helpful.
> >   
> 
> I think that is a good idea, if there is a new version I can add this in 
> the second part of the if:
> 
> 
> if (WARN_ON_ONCE(!napi->config))
>   return;

To be clear, this will make it illegal to set IRQ on a NAPI instance
before it's listed. Probably for the best if we also have auto-remove
in netif_napi_del().

Re: [Intel-wired-lan] [PATCH net-next v7 2/5] net: napi: add CPU affinity to napi_config

2025-02-06 Thread Jakub Kicinski

On Tue,  4 Feb 2025 15:06:19 -0700 Ahmed Zaki wrote:
> + *   @irq_affinity_auto: driver wants the core to manage the IRQ affinity.

"manage" is probably too strong? "store" or "remember" ?
Your commit message explains it quite nicely.

> + *   Set by netif_enable_irq_affinity(), then driver must
> + *   create persistent napi by netif_napi_add_config()
> + *   and finally bind napi to IRQ (netif_napi_set_irq).
> + *
>   *   @rx_cpu_rmap_auto: driver wants the core to manage the ARFS rmap.
>   *  Set by calling netif_enable_cpu_rmap().
>   *
> @@ -2402,6 +2406,7 @@ struct net_device {
>   struct lock_class_key   *qdisc_tx_busylock;
>   boolproto_down;
>   boolthreaded;
> + boolirq_affinity_auto;
>   boolrx_cpu_rmap_auto;
>  
>   /* priv_flags_slow, ungrouped to save space */
> @@ -2662,6 +2667,11 @@ static inline void netdev_set_ml_priv(struct 
> net_device *dev,
>   dev->ml_priv_type = type;
>  }
>  
> +static inline void netif_enable_irq_affinity(struct net_device *dev)

Similar here, "enable affinity" is a bit strong.

netif_remember_irq_affinity() would be more accurate IMHO

> +{
> + dev->irq_affinity_auto = true;
> +}

[Intel-wired-lan] [tnguy-next-queue:100GbE] BUILD SUCCESS d67c484647606b83c9e5cc7d445ff7d7d2012fa2

2025-02-06 Thread kernel test robot

tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git 100GbE
branch HEAD: d67c484647606b83c9e5cc7d445ff7d7d2012fa2  ice: Implement PTP 
support for E830 devices

elapsed time: 1448m

configs tested: 93
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfiggcc-14.2.0
alphaallyesconfiggcc-14.2.0
arc  allmodconfiggcc-13.2.0
arc   allnoconfiggcc-13.2.0
arc  allyesconfiggcc-13.2.0
arc   randconfig-001-20250206gcc-13.2.0
arc   randconfig-002-20250206gcc-13.2.0
arm  allmodconfiggcc-14.2.0
arm   allnoconfigclang-17
arm  allyesconfiggcc-14.2.0
arm   randconfig-001-20250206clang-21
arm   randconfig-002-20250206clang-19
arm   randconfig-003-20250206clang-21
arm   randconfig-004-20250206gcc-14.2.0
arm64allmodconfigclang-18
arm64 allnoconfiggcc-14.2.0
arm64 randconfig-001-20250206clang-21
arm64 randconfig-002-20250206clang-21
arm64 randconfig-003-20250206clang-15
arm64 randconfig-004-20250206gcc-14.2.0
csky  allnoconfiggcc-14.2.0
csky  randconfig-001-20250206gcc-14.2.0
csky  randconfig-002-20250206gcc-14.2.0
hexagon  allmodconfigclang-21
hexagon   allnoconfigclang-21
hexagon  allyesconfigclang-18
hexagon   randconfig-001-20250206clang-21
hexagon   randconfig-002-20250206clang-18
i386 allmodconfiggcc-12
i386  allnoconfiggcc-12
i386 allyesconfiggcc-12
i386buildonly-randconfig-001-20250206clang-19
i386buildonly-randconfig-002-20250206gcc-12
i386buildonly-randconfig-003-20250206clang-19
i386buildonly-randconfig-004-20250206clang-19
i386buildonly-randconfig-005-20250206clang-19
i386buildonly-randconfig-006-20250206gcc-12
i386defconfigclang-19
loongarch allnoconfiggcc-14.2.0
loongarch randconfig-001-20250206gcc-14.2.0
loongarch randconfig-002-20250206gcc-14.2.0
m68k  allnoconfiggcc-14.2.0
microblazeallnoconfiggcc-14.2.0
mips  allnoconfiggcc-14.2.0
nios2 allnoconfiggcc-14.2.0
nios2 randconfig-001-20250206gcc-14.2.0
nios2 randconfig-002-20250206gcc-14.2.0
openrisc allyesconfiggcc-14.2.0
openriscdefconfiggcc-14.2.0
parisc   allmodconfiggcc-14.2.0
parisc   allyesconfiggcc-14.2.0
parisc  defconfiggcc-14.2.0
pariscrandconfig-001-20250206gcc-14.2.0
pariscrandconfig-002-20250206gcc-14.2.0
powerpc   randconfig-001-20250206clang-21
powerpc   randconfig-002-20250206clang-19
powerpc   randconfig-003-20250206clang-21
powerpc64 randconfig-001-20250206clang-15
powerpc64 randconfig-002-20250206clang-17
powerpc64 randconfig-003-20250206clang-19
riscv randconfig-001-20250206clang-19
riscv randconfig-002-20250206clang-17
s390 allmodconfigclang-19
s390 allyesconfiggcc-14.2.0
s390  randconfig-001-20250206clang-21
s390  randconfig-002-20250206gcc-14.2.0
sh   allmodconfiggcc-14.2.0
shallnoconfiggcc-14.2.0
sh   allyesconfiggcc-14.2.0
shrandconfig-001-20250206gcc-14.2.0
shrandconfig-002-20250206gcc-14.2.0
sparcallmodconfiggcc-14.2.0
sparc allnoconfiggcc-14.2.0
sparc randconfig-001-20250206gcc-14.2.0
sparc randconfig-002-20250206gcc-14.2.0
sparc64   randconfig-001-20250206gcc-14.2.0
sparc64   randconfig-002-20250206gcc-14.2.0
um   allmodconfigclang-21
um

Re: [Intel-wired-lan] [PATCH net-next v7 0/5] net: napi: add CPU affinity to napi->config

2025-02-06 Thread Joe Damato

On Tue, Feb 04, 2025 at 03:06:17PM -0700, Ahmed Zaki wrote:
> Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
> after reset. However, since there can be only one IRQ affinity notifier
> for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
> management in the core (which also registers separate IRQ affinity
> notifiers).   
> 
> Move the IRQ affinity management to the napi struct. This way we can have
> a unified IRQ notifier to re-apply the user-set affinity and also manage
> the ARFS rmaps. The first patch  moves the ARFS rmap management to CORE.
> The second patch adds the IRQ affinity mask to napi_config and re-applies
> the mask after reset. Patches 3-5 use the new API for bnxt, ice and idpf
> drivers.

If there's another version maybe adding this to netdevsim might be
good?

Was just thinking that if one day in the distant future netdev-genl
was extended to expose the per NAPI affinity mask, a test could
probably be written.

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

2025-02-06 Thread Bharat Bhushan

On Thu, Feb 6, 2025 at 2:24 PM Leon Romanovsky  wrote:
>
> On Thu, Feb 06, 2025 at 02:16:08PM +0530, Bharat Bhushan wrote:
> > Hi Leon,
> >
> > On Wed, Feb 5, 2025 at 11:50 PM Leon Romanovsky  wrote:
> > >
> > > From: Leon Romanovsky 
> > >
> > > XFRM offload path is probed even if offload isn't needed at all. Let's
> > > make sure that x->type_offload pointer stays NULL for such path to
> > > reduce ambiguity.
> > >
> > > Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.")
> > > Signed-off-by: Leon Romanovsky 
> > > ---
> > >  include/net/xfrm.h | 12 +++-
> > >  net/xfrm/xfrm_device.c | 14 +-
> > >  net/xfrm/xfrm_state.c  | 22 +-
> > >  net/xfrm/xfrm_user.c   |  2 +-
> > >  4 files changed, 30 insertions(+), 20 deletions(-)
>
> <...>
>
> > > +   x->type_offload = xfrm_get_type_offload(x->id.proto, 
> > > x->props.family);
> > > +   if (!x->type_offload) {
>
> <...>
>
> > > +   xfrm_put_type_offload(x->type_offload);
> > > +   x->type_offload = NULL;
> >
> > We always set type_offload to NULL. Can we move type_offload = NULL in
> > xfrm_put_type_offload() ?
>
> We can, but it will require change to xfrm_get_type_offload() too,
> otherwise we will get asymmetrical get/put.

"x->type_offload = NULL" is always set after the put() function. so I
thought that maybe moving "x->type_offload = NULL" to the put()
function would simplify.
Yes, get/put will be asymmetric. Maybe setting "x->type_offload" can
be done in get/put().
Anyway it is not a major comment. ignore if this does not simplify.

Thanks
-Bharat

>
> Do you want something like that?
> int xfrm_get_type_offload(struct xfrm_state *x);
> void xfrm_put_type_offload(struct xfrm_state *x);
>
> Thansk
>
> >
> > Thanks
> > -Bharat
> >
> > > /* User explicitly requested packet offload mode and 
> > > configured
> > >  * policy in addition to the XFRM state. So be civil to 
> > > users,
> > >  * and return an error instead of taking fallback path.
> > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > > index ad2202fa82f3..568fe8df7741 100644
> > > --- a/net/xfrm/xfrm_state.c
> > > +++ b/net/xfrm/xfrm_state.c
> > > @@ -424,11 +424,12 @@ void xfrm_unregister_type_offload(const struct 
> > > xfrm_type_offload *type,
> > >  }
> > >  EXPORT_SYMBOL(xfrm_unregister_type_offload);
> > >
> > > -static const struct xfrm_type_offload *
> > > -xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
> > > +const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto,
> > > + unsigned short 
> > > family)
> > >  {
> > > const struct xfrm_type_offload *type = NULL;
> > > struct xfrm_state_afinfo *afinfo;
> > > +   bool try_load = true;
> > >
> > >  retry:
> > > afinfo = xfrm_state_get_afinfo(family);
> > > @@ -456,11 +457,7 @@ xfrm_get_type_offload(u8 proto, unsigned short 
> > > family, bool try_load)
> > >
> > > return type;
> > >  }
> > > -
> > > -static void xfrm_put_type_offload(const struct xfrm_type_offload *type)
> > > -{
> > > -   module_put(type->owner);
> > > -}
> > > +EXPORT_SYMBOL(xfrm_get_type_offload);
> > >
> > >  static const struct xfrm_mode xfrm4_mode_map[XFRM_MODE_MAX] = {
> > > [XFRM_MODE_BEET] = {
> > > @@ -609,8 +606,6 @@ static void ___xfrm_state_destroy(struct xfrm_state 
> > > *x)
> > > kfree(x->coaddr);
> > > kfree(x->replay_esn);
> > > kfree(x->preplay_esn);
> > > -   if (x->type_offload)
> > > -   xfrm_put_type_offload(x->type_offload);
> > > if (x->type) {
> > > x->type->destructor(x);
> > > xfrm_put_type(x->type);
> > > @@ -784,6 +779,9 @@ void xfrm_dev_state_free(struct xfrm_state *x)
> > > struct xfrm_dev_offload *xso = &x->xso;
> > > struct net_device *dev = READ_ONCE(xso->dev);
> > >
> > > +   xfrm_put_type_offload(x->type_offload);
> > > +   x->type_offload = NULL;
> > > +
> > > if (dev && dev->xfrmdev_ops) {
> > > spin_lock_bh(&xfrm_state_dev_gc_lock);
> > > if (!hlist_unhashed(&x->dev_gclist))
> > > @@ -3122,7 +3120,7 @@ u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
> > >  }
> > >  EXPORT_SYMBOL_GPL(xfrm_state_mtu);
> > >
> > > -int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool 
> > > offload,
> > > +int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
> > >   struct netlink_ext_ack *extack)
> > >  {
> > > const struct xfrm_mode *inner_mode;
> > > @@ -3178,8 +3176,6 @@ int __xfrm_init_state(struct xfrm_state *x, bool 
> > > init_replay, bool offload,
> > > goto error;
> > > }
> > >
> > > -   x->type_offload = xfrm_get_type_offload(x->id.proto, family, 
> > > offload);
> > > -
> > > err = x->type->init_state(x, extack);
> > > if (err)
> > >

Re: [Intel-wired-lan] [PATCH iwl-net] ice: health.c: fix compilation on gcc 7.5

2025-02-06 Thread David Laight

On Wed, 5 Feb 2025 20:45:46 +
Simon Horman  wrote:

> + Jiri
> 
> On Wed, Feb 05, 2025 at 11:42:12AM +0100, Przemek Kitszel wrote:
> > GCC 7 is not as good as GCC 8+ in telling what is a compile-time const,
> > and thus could be used for static storage. So we could not use variables
> > for that, no matter how much "const" keyword is sprinkled around.
> > 
> > Excerpt from the report:
> > My GCC is: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.
> > 
> >   CC [M]  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o
> > drivers/net/ethernet/intel/ice/devlink/health.c:35:3: error: initializer 
> > element is not constant
> >ice_common_port_solutions, {ice_port_number_label}},
> >^
> > drivers/net/ethernet/intel/ice/devlink/health.c:35:3: note: (near 
> > initialization for 'ice_health_status_lookup[0].solution')
> > drivers/net/ethernet/intel/ice/devlink/health.c:35:31: error: initializer 
> > element is not constant
> >ice_common_port_solutions, {ice_port_number_label}},
> >^
> > drivers/net/ethernet/intel/ice/devlink/health.c:35:31: note: (near 
> > initialization for 'ice_health_status_lookup[0].data_label[0]')
> > drivers/net/ethernet/intel/ice/devlink/health.c:37:46: error: initializer 
> > element is not constant
> >"Change or replace the module or cable.", {ice_port_number_label}},
> >   ^
> > drivers/net/ethernet/intel/ice/devlink/health.c:37:46: note: (near 
> > initialization for 'ice_health_status_lookup[1].data_label[0]')
> > drivers/net/ethernet/intel/ice/devlink/health.c:39:3: error: initializer 
> > element is not constant
> >ice_common_port_solutions, {ice_port_number_label}},
> >^
> > 
> > Fixes: 85d6164ec56d ("ice: add fw and port health reporters")
> > Reported-by: Qiuxu Zhuo 
> > Closes: 
> > https://lore.kernel.org/netdev/cy8pr11mb7134bf7a46d71e50d25fa7a989...@cy8pr11mb7134.namprd11.prod.outlook.com
> > Signed-off-by: Przemek Kitszel 
> > ---
> > I would really like to bump min gcc to 8.5 (RH 8 family),
> > instead of supporting old Ubuntu. However SLES 15 is also stuck with gcc 
> > 7.5 :(
> > 
> > CC: Linus Torvalds 
> > CC: Kees Cook 
> > CC: Nick Desaulniers   
> 
> Hi Prezemek,
> 
> I ran into a similar problem not so long ago and I'm wondering if
> the following, based on a suggestion by Jiri Slaby, resolves your
> problem.

I'm sure I remember from somewhere that although the variables are
'static const' they have to be real variables because they can still
be patched.

Which stops you using their contents as initialisers.

Maybe I'm mis-remembering it.

David

> 
> diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c 
> b/drivers/net/ethernet/intel/ice/devlink/health.c
> index ea40f7941259..19c3d37aa768 100644
> --- a/drivers/net/ethernet/intel/ice/devlink/health.c
> +++ b/drivers/net/ethernet/intel/ice/devlink/health.c
> @@ -25,10 +25,10 @@ struct ice_health_status {
>   * The below lookup requires to be sorted by code.
>   */
>  
> -static const char *const ice_common_port_solutions =
> +static const char ice_common_port_solutions[] =
>   "Check your cable connection. Change or replace the module or cable. 
> Manually set speed and duplex.";
> -static const char *const ice_port_number_label = "Port Number";
> -static const char *const ice_update_nvm_solution = "Update to the latest NVM 
> image.";
> +static const char ice_port_number_label[] = "Port Number";
> +static const char ice_update_nvm_solution[] = "Update to the latest NVM 
> image.";
>  
>  static const struct ice_health_status ice_health_status_lookup[] = {
>   {ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT, "An unsupported module 
> was detected.",
> 
> 
> Link: 
> https://lore.kernel.org/netdev/485dbc5a-a04b-40c2-9481-955eaa5ce...@kernel.org/
> Link: https://git.kernel.org/netdev/net-next/c/36fb51479e3c
>

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

2025-02-06 Thread Bharat Bhushan

Hi Leon,

On Wed, Feb 5, 2025 at 11:50 PM Leon Romanovsky  wrote:
>
> From: Leon Romanovsky 
>
> XFRM offload path is probed even if offload isn't needed at all. Let's
> make sure that x->type_offload pointer stays NULL for such path to
> reduce ambiguity.
>
> Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.")
> Signed-off-by: Leon Romanovsky 
> ---
>  include/net/xfrm.h | 12 +++-
>  net/xfrm/xfrm_device.c | 14 +-
>  net/xfrm/xfrm_state.c  | 22 +-
>  net/xfrm/xfrm_user.c   |  2 +-
>  4 files changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index ed4b83696c77..28355a5be5b9 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -464,6 +464,16 @@ struct xfrm_type_offload {
>
>  int xfrm_register_type_offload(const struct xfrm_type_offload *type, 
> unsigned short family);
>  void xfrm_unregister_type_offload(const struct xfrm_type_offload *type, 
> unsigned short family);
> +const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto,
> + unsigned short family);
> +static inline void xfrm_put_type_offload(const struct xfrm_type_offload 
> *type)
> +{
> +   if (!type)
> +   return;
> +
> +   module_put(type->owner);
> +}
> +
>
>  /**
>   * struct xfrm_mode_cbs - XFRM mode callbacks
> @@ -1760,7 +1770,7 @@ void xfrm_spd_getinfo(struct net *net, struct 
> xfrmk_spdinfo *si);
>  u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
>  int xfrm_init_replay(struct xfrm_state *x, struct netlink_ext_ack *extack);
>  u32 xfrm_state_mtu(struct xfrm_state *x, int mtu);
> -int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
> +int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
>   struct netlink_ext_ack *extack);
>  int xfrm_init_state(struct xfrm_state *x);
>  int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
> diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
> index d1fa94e52cea..e01a7f5a4c75 100644
> --- a/net/xfrm/xfrm_device.c
> +++ b/net/xfrm/xfrm_device.c
> @@ -244,11 +244,6 @@ int xfrm_dev_state_add(struct net *net, struct 
> xfrm_state *x,
> xfrm_address_t *daddr;
> bool is_packet_offload;
>
> -   if (!x->type_offload) {
> -   NL_SET_ERR_MSG(extack, "Type doesn't support offload");
> -   return -EINVAL;
> -   }
> -
> if (xuo->flags &
> ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND | 
> XFRM_OFFLOAD_PACKET)) {
> NL_SET_ERR_MSG(extack, "Unrecognized flags in offload 
> request");
> @@ -310,6 +305,13 @@ int xfrm_dev_state_add(struct net *net, struct 
> xfrm_state *x,
> return -EINVAL;
> }
>
> +   x->type_offload = xfrm_get_type_offload(x->id.proto, x->props.family);
> +   if (!x->type_offload) {
> +   NL_SET_ERR_MSG(extack, "Type doesn't support offload");
> +   dev_put(dev);
> +   return -EINVAL;
> +   }
> +
> xso->dev = dev;
> netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
> xso->real_dev = dev;
> @@ -332,6 +334,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state 
> *x,
> netdev_put(dev, &xso->dev_tracker);
> xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
>
> +   xfrm_put_type_offload(x->type_offload);
> +   x->type_offload = NULL;

We always set type_offload to NULL. Can we move type_offload = NULL in
xfrm_put_type_offload() ?

Thanks
-Bharat

> /* User explicitly requested packet offload mode and 
> configured
>  * policy in addition to the XFRM state. So be civil to users,
>  * and return an error instead of taking fallback path.
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index ad2202fa82f3..568fe8df7741 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -424,11 +424,12 @@ void xfrm_unregister_type_offload(const struct 
> xfrm_type_offload *type,
>  }
>  EXPORT_SYMBOL(xfrm_unregister_type_offload);
>
> -static const struct xfrm_type_offload *
> -xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
> +const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto,
> + unsigned short family)
>  {
> const struct xfrm_type_offload *type = NULL;
> struct xfrm_state_afinfo *afinfo;
> +   bool try_load = true;
>
>  retry:
> afinfo = xfrm_state_get_afinfo(family);
> @@ -456,11 +457,7 @@ xfrm_get_type_offload(u8 proto, unsigned short family, 
> bool try_load)
>
> return type;
>  }
> -
> -static void xfrm_put_type_offload(const struct xfrm_type_offload *type)
> -{
> -   module_put(type->owner);
> -}
> +EXPORT_SYMBOL(xfrm_get_type_offload);
>
>  static const struct xfrm_mode xfrm4_mode_map[XFRM_MOD

Re: [Intel-wired-lan] [PATCH] igc: Fix HW RX timestamp when passed by ZC XDP

2025-02-06 Thread Mor Bar-Gabay


On 28/01/2025 14:26, Zdenek Bouska wrote:

Fixes HW RX timestamp in the following scenario:
- AF_PACKET socket with enabled HW RX timestamps is created
- AF_XDP socket with enabled zero copy is created
- frame is forwarded to the BPF program, where the timestamp should
   still be readable (extracted by igc_xdp_rx_timestamp(), kfunc
   behind bpf_xdp_metadata_rx_timestamp())
- the frame got XDP_PASS from BPF program, redirecting to the stack
- AF_PACKET socket receives the frame with HW RX timestamp

Moves the skb timestamp setting from igc_dispatch_skb_zc() to
igc_construct_skb_zc() so that igc_construct_skb_zc() is similar to
igc_construct_skb().

This issue can also be reproduced by running:
  # tools/testing/selftests/bpf/xdp_hw_metadata enp1s0
When a frame with the wrong port 9092 (instead of 9091) is used:
  # echo -n xdp | nc -u -q1 192.168.10.9 9092
then the RX timestamp is missing and xdp_hw_metadata prints:
  skb hwtstamp is not found!

With this fix or when copy mode is used:
  # tools/testing/selftests/bpf/xdp_hw_metadata -c enp1s0
then RX timestamp is found and xdp_hw_metadata prints:
  found skb hwtstamp = 1736509937.852786132

Fixes: 069b142f5819 ("igc: Add support for PTP .getcyclesx64()")
Signed-off-by: Zdenek Bouska 
Acked-by: Vinicius Costa Gomes 
Reviewed-by: Simon Horman 
Reviewed-by: Florian Bezdeka 
Reviewed-by: Song Yoong Siang 
---
  drivers/net/ethernet/intel/igc/igc_main.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)


Tested-by: Mor Bar-Gabay

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

2025-02-06 Thread Leon Romanovsky

On Thu, Feb 06, 2025 at 07:29:13PM +0530, Bharat Bhushan wrote:
> On Thu, Feb 6, 2025 at 2:24 PM Leon Romanovsky  wrote:
> >
> > On Thu, Feb 06, 2025 at 02:16:08PM +0530, Bharat Bhushan wrote:
> > > Hi Leon,
> > >
> > > On Wed, Feb 5, 2025 at 11:50 PM Leon Romanovsky  wrote:
> > > >
> > > > From: Leon Romanovsky 
> > > >
> > > > XFRM offload path is probed even if offload isn't needed at all. Let's
> > > > make sure that x->type_offload pointer stays NULL for such path to
> > > > reduce ambiguity.
> > > >
> > > > Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.")
> > > > Signed-off-by: Leon Romanovsky 
> > > > ---
> > > >  include/net/xfrm.h | 12 +++-
> > > >  net/xfrm/xfrm_device.c | 14 +-
> > > >  net/xfrm/xfrm_state.c  | 22 +-
> > > >  net/xfrm/xfrm_user.c   |  2 +-
> > > >  4 files changed, 30 insertions(+), 20 deletions(-)
> >
> > <...>
> >
> > > > +   x->type_offload = xfrm_get_type_offload(x->id.proto, 
> > > > x->props.family);
> > > > +   if (!x->type_offload) {
> >
> > <...>
> >
> > > > +   xfrm_put_type_offload(x->type_offload);
> > > > +   x->type_offload = NULL;
> > >
> > > We always set type_offload to NULL. Can we move type_offload = NULL in
> > > xfrm_put_type_offload() ?
> >
> > We can, but it will require change to xfrm_get_type_offload() too,
> > otherwise we will get asymmetrical get/put.
> 
> "x->type_offload = NULL" is always set after the put() function. so I
> thought that maybe moving "x->type_offload = NULL" to the put()
> function would simplify.
> Yes, get/put will be asymmetric. Maybe setting "x->type_offload" can
> be done in get/put().
> Anyway it is not a major comment. ignore if this does not simplify.

Thanks, let's wait for other comments. If I need respin the series, I'll
change the functions to the proposed below.

Thanks


> 
> Thanks
> -Bharat
> 
> >
> > Do you want something like that?
> > int xfrm_get_type_offload(struct xfrm_state *x);
> > void xfrm_put_type_offload(struct xfrm_state *x);
> >
> > Thansk
> >
> > >
> > > Thanks
> > > -Bharat
> > >
> > > > /* User explicitly requested packet offload mode and 
> > > > configured
> > > >  * policy in addition to the XFRM state. So be civil to 
> > > > users,
> > > >  * and return an error instead of taking fallback path.
> > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > > > index ad2202fa82f3..568fe8df7741 100644
> > > > --- a/net/xfrm/xfrm_state.c
> > > > +++ b/net/xfrm/xfrm_state.c
> > > > @@ -424,11 +424,12 @@ void xfrm_unregister_type_offload(const struct 
> > > > xfrm_type_offload *type,
> > > >  }
> > > >  EXPORT_SYMBOL(xfrm_unregister_type_offload);
> > > >
> > > > -static const struct xfrm_type_offload *
> > > > -xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
> > > > +const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto,
> > > > + unsigned short 
> > > > family)
> > > >  {
> > > > const struct xfrm_type_offload *type = NULL;
> > > > struct xfrm_state_afinfo *afinfo;
> > > > +   bool try_load = true;
> > > >
> > > >  retry:
> > > > afinfo = xfrm_state_get_afinfo(family);
> > > > @@ -456,11 +457,7 @@ xfrm_get_type_offload(u8 proto, unsigned short 
> > > > family, bool try_load)
> > > >
> > > > return type;
> > > >  }
> > > > -
> > > > -static void xfrm_put_type_offload(const struct xfrm_type_offload *type)
> > > > -{
> > > > -   module_put(type->owner);
> > > > -}
> > > > +EXPORT_SYMBOL(xfrm_get_type_offload);
> > > >
> > > >  static const struct xfrm_mode xfrm4_mode_map[XFRM_MODE_MAX] = {
> > > > [XFRM_MODE_BEET] = {
> > > > @@ -609,8 +606,6 @@ static void ___xfrm_state_destroy(struct xfrm_state 
> > > > *x)
> > > > kfree(x->coaddr);
> > > > kfree(x->replay_esn);
> > > > kfree(x->preplay_esn);
> > > > -   if (x->type_offload)
> > > > -   xfrm_put_type_offload(x->type_offload);
> > > > if (x->type) {
> > > > x->type->destructor(x);
> > > > xfrm_put_type(x->type);
> > > > @@ -784,6 +779,9 @@ void xfrm_dev_state_free(struct xfrm_state *x)
> > > > struct xfrm_dev_offload *xso = &x->xso;
> > > > struct net_device *dev = READ_ONCE(xso->dev);
> > > >
> > > > +   xfrm_put_type_offload(x->type_offload);
> > > > +   x->type_offload = NULL;
> > > > +
> > > > if (dev && dev->xfrmdev_ops) {
> > > > spin_lock_bh(&xfrm_state_dev_gc_lock);
> > > > if (!hlist_unhashed(&x->dev_gclist))
> > > > @@ -3122,7 +3120,7 @@ u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(xfrm_state_mtu);
> > > >
> > > > -int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool 
> > > > offload,
> > > > +int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
> > > >

Re: [Intel-wired-lan] [PATCH iwl-next v2 5/9] igc: Add support for frame preemption verification

2025-02-06 Thread Vladimir Oltean

On Thu, Feb 06, 2025 at 10:40:11PM +0800, Abdul Rahim, Faizal wrote:
> 
> Hi Vladimir,
> 
> Thanks for the quick review, appreciate your help.
> 
> On 6/2/2025 1:12 am, Vladimir Oltean wrote:
> > On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote:
> > > This patch implements the "ethtool --set-mm" callback to trigger the
> > > frame preemption verification handshake.
> > > 
> > > Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool
> > > to perform the verification handshake for igc.
> > > The structure fpe.mmsv is set by mmsv in ethtool and should remain
> > > read-only for the driver.
> > > 
> > > igc does not use two mmsv callbacks:
> > > a) configure_tx()
> > > - igc lacks registers to configure FPE in the transmit direction.
> > 
> > Yes, maybe, but it's still important to handle this. It tells you when
> > the preemptible traffic classes should be sent as preemptible on the wire
> > (i.e. when the verification is either disabled, or it succeeded).
> > 
> > There is a selftest called manual_failed_verification() which supposedly
> > tests this exact condition: if verification fails, then packets sent to
> > TC0 are supposed to bump the eMAC's TX counters, even though TC0 is
> > configured as preemptible. Otherwise stated: even if the tc program says
> > that a certain traffic class is preemptible, you don't want to actually
> > send preemptible packets if you haven't verified the link partner can
> > handle them, since it will likely drop them on RX otherwise.
> 
> Even though fpe in tx direction isn't set in igc, it still checks
> ethtool_mmsv_is_tx_active() before setting a queue as preemptible.
> 
> This is done in :
> igc_tsn_enable_offload(struct igc_adapter *adapter) {
> {
>   
>   if (ethtool_mmsv_is_tx_active(&adapter->fpe.mmsv) &&
> ring->preemptible)
>   txqctl |= IGC_TXQCTL_PREEMPTIBLE;
> 
> 
> Wouldn't this handle the situation mentioned ?
> Sorry if I miss something here.

And what if tx_active becomes true after you had already configured the
queues with tc (and the above check caused IGC_TXQCTL_PREEMPTIBLE to not
be set)? Shouldn't you set IGC_TXQCTL_PREEMPTIBLE now? Isn't
ethtool_mmsv_configure_tx() exactly the function that notifies you of
changes to tx_active, and hence, aren't you interested in setting up a
callback for it?

Or is igc_tsn_reset() -> igc_tsn_enable_offload() called through some
other path, after verification succeeds, that I'm not seeing? I don't
think so. Maybe coincidentally, but not guaranteed.

> I briefly checked the driver code and the i226 SW User Manual.
> 
> The code calls igc_reset_task() → igc_reset() → igc_reinit_locked() →
> igc_down() → igc_reset() → igc_power_down_phy_copper_base().
> 
> I suspect igc_power_down_phy_copper_base() contributes to the link loss, but
> there are likely other factors as well.
> 
> The SW User Manual states that a software reset (CTRL.DEV_RST) affects
> several components, including "Port Configuration Autoload from NVM, Data
> Path, On-die Memories, MAC, PCS, Auto Negotiation and other Port-related
> Logic, Function Queue Enable, Function Interrupt and Statistics registers,
> Wake-up Management Registers, and Memory Configuration Registers."
> 
> Given this, right now, I’m unsure about the feasibility of making this
> change, though I see the benefits mentioned.
> Validating this would require significant exploration—i.e., investigating
> the code, running experiments, reviewing commit history, confirming hardware
> expectations from the SW manual, and consulting other hardware SMEs.
> 
> Resetting the adapter is a common mechanism in igc that relies on shared
> functions, which can be triggered in various scenarios. Modifying this
> behavior (if feasible) could introduce some risks and would likely require
> additional testing across those scenarios. Focusing on this right now might
> delay the current series, which is primarily aimed at enabling Qbu on
> i225/6.
> 
> Would it be alright if I explore this separately from this Qbu series?

Ok - as I said, it's not as if I couldn't eventually tolerate the workaround
you chose. We'd be putting a dependency of this feature on some unrelated
thing with a high degree of uncertainty. For example, I asked if this is
fiber or a twisted pair PHY, because while for the copper PHY the issue
might be more tractable (clause 28 autoneg on the media side is completely
independent of what the host side NIC is doing - it may reset, it may do
whatever - unless the MAC provides a clock that is important for PHY
operation), I do wonder if anything at all could be done to avoid the
link partner seeing a link loss over fiber, because there, the connection
is direct PCS-to-PCS. If you have to reset, you have to reset, and then
pick up the pieces, I understand.

It's good you're committing to look into this in more depth, though.

[Intel-wired-lan] [PATCH iwl-net] ixgbe: fix media cage present detection for E610 device

2025-02-06 Thread Piotr Kwapulinski

The commit 23c0e5a16bcc ("ixgbe: Add link management support for E610
device") introduced incorrect checking of media cage presence for E610
device. Fix it.

Fixes: 23c0e5a16bcc ("ixgbe: Add link management support for E610 device")
Reported-by: Dan Carpenter 
Closes: 
https://lore.kernel.org/all/e7d73b32-f12a-49d1-8b60-1ef83359ec13@stanley.mountain/
Reviewed-by: Michal Swiatkowski 
Reviewed-by: Przemek Kitszel 
Signed-off-by: Piotr Kwapulinski 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
index 683c668..cb07ecd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
@@ -1122,7 +1122,7 @@ static bool ixgbe_is_media_cage_present(struct ixgbe_hw 
*hw)
 * returns error (ENOENT), then no cage present. If no cage present then
 * connection type is backplane or BASE-T.
 */
-   return ixgbe_aci_get_netlist_node(hw, cmd, NULL, NULL);
+   return !ixgbe_aci_get_netlist_node(hw, cmd, NULL, NULL);
 }
 
 /**
-- 
2.43.0

[Intel-wired-lan] [PATCH iwl-next v1 1/3] ice: Add sync delay for E825C

[Intel-wired-lan] [PATCH iwl-next v1 2/3] ice: Refactor E825C PHY registers info struct

[Intel-wired-lan] [PATCH iwl-next v1 3/3] ice: E825C PHY register cleanup

[Intel-wired-lan] [PATCH iwl-next v1 0/3] E825C PTP cleanup

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

Re: [Intel-wired-lan] [PATCH bpf-next v9 4/5] igc: Refactor empty frame insertion for launch time support

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: add support for thermal sensor event reception

Re: [Intel-wired-lan] [PATCH iwl-next v2 5/9] igc: Add support for frame preemption verification

[Intel-wired-lan] [tnguy-next-queue:for-next] BUILD SUCCESS d67627e7b53203ca150e54723abbed81a0716286

Re: [Intel-wired-lan] [PATCH bpf-next v9 3/5] net: stmmac: Add launch time support to XDP ZC

Re: [Intel-wired-lan] [PATCH bpf-next v9 5/5] igc: Add launch time support to XDP ZC

Re: [Intel-wired-lan] suspend/resume broken of igc driver broken on 6.12

Re: [Intel-wired-lan] suspend/resume broken of igc driver broken on 6.12

Re: [Intel-wired-lan] [PATCH iwl-next v1 01/13] ixgbe: add initial devlink support

Re: [Intel-wired-lan] [PATCH iwl-net v2] ice: health.c: fix compilation on gcc 7.5

[Intel-wired-lan] [PATCH bpf-next v10 5/5] igc: Add launch time support to XDP ZC

[Intel-wired-lan] [PATCH iwl-net v2] ice: health.c: fix compilation on gcc 7.5

[Intel-wired-lan] [PATCH bpf-next v10 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata

[Intel-wired-lan] [PATCH bpf-next v10 0/5] xsk: TX metadata Launch Time support

[Intel-wired-lan] [PATCH bpf-next v10 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata

[Intel-wired-lan] [PATCH bpf-next v10 3/5] net: stmmac: Add launch time support to XDP ZC

[Intel-wired-lan] [PATCH bpf-next v10 4/5] igc: Refactor empty frame insertion for launch time support

Re: [Intel-wired-lan] [PATCH net-next v7 1/5] net: move ARFS rmap management to core

Re: [Intel-wired-lan] [PATCH net-next v7 2/5] net: napi: add CPU affinity to napi_config

Re: [Intel-wired-lan] [PATCH net-next v7 2/5] net: napi: add CPU affinity to napi_config

[Intel-wired-lan] [tnguy-next-queue:100GbE] BUILD SUCCESS d67c484647606b83c9e5cc7d445ff7d7d2012fa2

Re: [Intel-wired-lan] [PATCH net-next v7 0/5] net: napi: add CPU affinity to napi->config

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

Re: [Intel-wired-lan] [PATCH iwl-net] ice: health.c: fix compilation on gcc 7.5

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

Re: [Intel-wired-lan] [PATCH] igc: Fix HW RX timestamp when passed by ZC XDP

Re: [Intel-wired-lan] [PATCH ipsec-next 1/5] xfrm: delay initialization of offload path till its actually requested

Re: [Intel-wired-lan] [PATCH iwl-next v2 5/9] igc: Add support for frame preemption verification

[Intel-wired-lan] [PATCH iwl-net] ixgbe: fix media cage present detection for E610 device

36 matches

Site Navigation

Mail list logo

Footer information