Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: Convert ret val type from s32 to int

2024-01-10 Thread Jagielski, Jedrzej
From: Nguyen, Anthony L  
Sent: Friday, January 5, 2024 10:50 PM

>On 1/5/2024 2:31 AM, Jedrzej Jagielski wrote:
>> Currently big amount of the functions returning standard error codes
>> are of type s32. Convert them to regular ints as typdefs here are
>> not necessary to return standard error codes.
>> 
>> Suggested-by: Jacob Keller 
>> Reviewed-by: Jacob Keller 
>> Signed-off-by: Jedrzej Jagielski 
>> ---
>> v2: fix smatch warnings,
>
>These changes aren't mentioned in the commit message, however, it should 
>probably be in it's own commit to keep the changes separate and easier 
>to see/review (since they pre-existed this patch).

OK, i agree it's better to create a separate commit for that

>
>Also, not seeing changes/fixes to the second warning [1]
>
>New smatch warnings:
>...
>drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c:3130 
>ixgbe_enter_lplu_t_x550em() warn: missing error code? 'status'
>
>> alter the commit msg
>> ---
>
>...
>
>> @@ -93,11 +93,11 @@ static s32 ixgbe_get_invariants_82598(struct ixgbe_hw 
>> *hw)
>>*  not known.  Perform the SFP init if necessary.
>>*
>>**/
>> -static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
>> +static int ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
>>   {
>>  struct ixgbe_mac_info *mac = &hw->mac;
>>  struct ixgbe_phy_info *phy = &hw->phy;
>> -s32 ret_val;
>> +int ret_val;
>>  u16 list_offset, data_offset;
>
>Should we RCT what we're touching? (many other instances in the patch as 
>well)

I am not sure if that's still in the scope of that patch

>
>...
>
>> @@ -1866,9 +1866,9 @@ static s32 ixgbe_enable_rx_dma_82599(struct ixgbe_hw 
>> *hw, u32 regval)
>>*  Returns IXGBE_ERR_EEPROM_VERSION if the FW is not present or
>>*  if the FW version is not supported.
>>**/
>> -static s32 ixgbe_verify_fw_version_82599(struct ixgbe_hw *hw)
>> +static int ixgbe_verify_fw_version_82599(struct ixgbe_hw *hw)
>>   {
>> -s32 status = IXGBE_ERR_EEPROM_VERSION;
>> +int status = IXGBE_ERR_EEPROM_VERSION;
>
>Does IXGBE_ERR_EEPROM_VERSION exist anymore after your other patch? [2]

Sure, need to rebase it

>
>>  u16 fw_offset, fw_ptp_cfg_offset;
>>  u16 offset;
>>  u16 fw_version = 0;
>
>...
>
>> @@ -2062,12 +2062,12 @@ static s32 ixgbe_reset_pipeline_82599(struct 
>> ixgbe_hw *hw)
>>*  Performs byte read operation to SFP module's EEPROM over I2C interface 
>> at
>>*  a specified device address.
>>**/
>> -static s32 ixgbe_read_i2c_byte_82599(struct ixgbe_hw *hw, u8 byte_offset,
>> +static int ixgbe_read_i2c_byte_82599(struct ixgbe_hw *hw, u8 byte_offset,
>>   u8 dev_addr, u8 *data)
>>   {
>>  u32 esdp;
>> -s32 status;
>> -s32 timeout = 200;
>> +int status;
>> +int timeout = 200;
>
>unrelated change (at least a few others in this patch as well)

Thank you for pointing this out, need to take a look on that once again

>
>>   
>>  if (hw->phy.qsfp_shared_i2c_bus == true) {
>>  /* Acquire I2C bus ownership. */
>> @@ -2116,12 +2116,12 @@ static s32 ixgbe_read_i2c_byte_82599(struct ixgbe_hw 
>> *hw, u8 byte_offset,
>>*  Performs byte write operation to SFP module's EEPROM over I2C 
>> interface at
>>*  a specified device address.
>>**/
>> -static s32 ixgbe_write_i2c_byte_82599(struct ixgbe_hw *hw, u8 byte_offset,
>> +static int ixgbe_write_i2c_byte_82599(struct ixgbe_hw *hw, u8 byte_offset,
>>u8 dev_addr, u8 data)
>>   {
>>  u32 esdp;
>> -s32 status;
>> -s32 timeout = 200;
>> +int status;
>> +int timeout = 200;
>>   
>>  if (hw->phy.qsfp_shared_i2c_bus == true) {
>>  /* Acquire I2C bus ownership. */
>
>Thanks,
>Tony
>
>[1] 
>https://lore.kernel.org/intel-wired-lan/08d8b75e-af80-438b-8006-9121b8444f49@moroto.mountain/
>[2] 
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit?id=5795f533f30a80aa0473652876296ebc9129e33a


Re: [Intel-wired-lan] [REGRESSION] Intel ICE Ethernet driver in linux >= 6.6.9 triggers extra memory consumption and cause continous kswapd* usage and continuous swapping

2024-01-10 Thread Jaroslav Pulchart
>
> Hello
>
> I would like to report a regression triggered by recent change in
> Intel ICE Ethernet driver in the 6.6.9 linux kernel. The problem was
> bisected and the regression is triggered by
> fc4d6d136d42fab207b3ce20a8ebfd61a13f931f "ice: alter feature support
> check for SRIOV and LAG" commit and originally reported as part of
> https://lore.kernel.org/linux-mm/cak8ffz4dy+gtba40pm7nn5xchy+51w3sfxpqkqpqaksxyyx...@mail.gmail.com/T/#m5217c62beb03b3bc75d7dd4b1d9bab64a3e68826
> thread.
>
> > However, after the following patch we see that more NUMA nodes have
> > such a low amount of memory and  that is causing constant reclaiming
> > of memory because it looks like something inside of the kernel ate all
> > the memory. This is right after the start of the system as well.
>
>  I'm reporting it here as it is a different problem than the original
> thread. The commit introduces a low memory problem per each numa node
> of the first socket (node0 .. node3 in our case) and cause constant
> kswapd* 100% CPU usage. See attached 6.6.9-kswapd_usage.png. The low
> memory issue is nicely visible in "numastat -m", see attached files:
> * numastat_m-6.6.10_28GB_HP_ice_revert.txt   >= 6.6.9 with reverted ice commit
> * numastat_m-6.6.10_28GB_HP_no_revert.txt>= 6.6.9 vanilla
> the server "is fresh" (after reboot), without running any application load.
>
> $ grep MemFree numastat_m-6.6.10_28GB_HP_ice_revert.txt
> numastat_m-6.6.10_28GB_HP_no_revert.txt
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 2756.89 2754.86  100.39 2278.43 < ice
> fix is reverted, we have ~2GB free per numa, except one, like before
> == no issue
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 3551.29 1530.52 2212.04 3488.09
> ...
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 127.52   66.49  120.23  263.47   <
> ice fix is present, we see just few MB free per each node, this will
> cause kswapd utilization!
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 3322.18 3134.47  195.55  879.17
> ...
>
> If you have some hints on how to debug what is actually occupying all
> that memory and some fix of the problem will be nice. We can provide
> testing and more reports if needed to analyze the issue. We reverted
> the commit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f as a workaround
> till we know a proper fix.
>
> Best regards,
> Jaroslav Pulchart

Hello everyone

FYI: I try to use linuxk-6.7(.0) kernel and the issue is there as well.

Best regards,
Jaroslav Pulchart


Re: [Intel-wired-lan] [REGRESSION] Intel ICE Ethernet driver in linux >= 6.6.9 triggers extra memory consumption and cause continous kswapd* usage and continuous swapping

2024-01-10 Thread Jesse Brandeburg
On 1/8/2024 2:49 AM, Jaroslav Pulchart wrote:
> Hello

First, thank you for your work trying to chase this!

> 
> I would like to report a regression triggered by recent change in
> Intel ICE Ethernet driver in the 6.6.9 linux kernel. The problem was
> bisected and the regression is triggered by
> fc4d6d136d42fab207b3ce20a8ebfd61a13f931f "ice: alter feature support
> check for SRIOV and LAG" commit and originally reported as part of
> https://lore.kernel.org/linux-mm/cak8ffz4dy+gtba40pm7nn5xchy+51w3sfxpqkqpqaksxyyx...@mail.gmail.com/T/#m5217c62beb03b3bc75d7dd4b1d9bab64a3e68826
> thread.

I think that's a bad bisect. There is no reason I could understand for
that change to cause a continuous or large leak, it really doesn't make
any sense. Reverting it consistently helps? You're not just rewinding
the tree back to that point, right? just running 6.6.9 without that
patch? (sorry for being pedantic, just trying to be certain)


>> However, after the following patch we see that more NUMA nodes have
>> such a low amount of memory and  that is causing constant reclaiming
>> of memory because it looks like something inside of the kernel ate all
>> the memory. This is right after the start of the system as well.
> 
>  I'm reporting it here as it is a different problem than the original
> thread. The commit introduces a low memory problem per each numa node
> of the first socket (node0 .. node3 in our case) and cause constant
> kswapd* 100% CPU usage. See attached 6.6.9-kswapd_usage.png. The low
> memory issue is nicely visible in "numastat -m", see attached files:
> * numastat_m-6.6.10_28GB_HP_ice_revert.txt   >= 6.6.9 with reverted ice commit
> * numastat_m-6.6.10_28GB_HP_no_revert.txt>= 6.6.9 vanilla
> the server "is fresh" (after reboot), without running any application load.

OK, so the initial allocations of your system is running your system out
of memory.

Are you running jumbo frames on your ethernet interfaces?

Do you have /proc/slabinfo output from working/non-working boot?

> 
> $ grep MemFree numastat_m-6.6.10_28GB_HP_ice_revert.txt
> numastat_m-6.6.10_28GB_HP_no_revert.txt
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 2756.89 2754.86  100.39 2278.43 < ice
> fix is reverted, we have ~2GB free per numa, except one, like before
> == no issue
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 3551.29 1530.52 2212.04 3488.09
> ...
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 127.52   66.49  120.23  263.47   <


> ice fix is present, we see just few MB free per each node, this will
> cause kswapd utilization!
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 3322.18 3134.47  195.55  879.17
> ...
> 
> If you have some hints on how to debug what is actually occupying all
> that memory and some fix of the problem will be nice. We can provide
> testing and more reports if needed to analyze the issue. We reverted
> the commit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f as a workaround
> till we know a proper fix.

My first suspicion is that we're contributing to the problem by running
out of receive descriptors memory.

Can we see the ethtool -S stats from the freshly booted system that's
running out of memory or doing OOM? Also, all the standard debugging
info (at least once please), devlink dev info, any other configuration
specifics? What networking config (bonding? anything else?)

Do you have a bugzilla.kernel.org bug yet where you can upload larger
files like dmesg and others?

Also, I'm curious if your problem goes away if you change / reduce the
number of queues per port. use ethtool -L eth0 combined 4 ?

You also said something about reproducing when launching / destroying
virtual machines with VF passthrough?

Can you reproduce the issue without starting qemu (just doing bare-metal
SR-IOV instance creation/destruction via
/sys/class/net/eth0/device/sriov_numvfs ?)

Thanks


Re: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: Convert ret val type from s32 to int

2024-01-10 Thread Tony Nguyen




On 1/10/2024 3:51 AM, Jagielski, Jedrzej wrote:

From: Nguyen, Anthony L 
Sent: Friday, January 5, 2024 10:50 PM


On 1/5/2024 2:31 AM, Jedrzej Jagielski wrote:


...


@@ -93,11 +93,11 @@ static s32 ixgbe_get_invariants_82598(struct ixgbe_hw *hw)
*  not known.  Perform the SFP init if necessary.
*
**/
-static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
+static int ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
   {
struct ixgbe_mac_info *mac = &hw->mac;
struct ixgbe_phy_info *phy = &hw->phy;
-   s32 ret_val;
+   int ret_val;
u16 list_offset, data_offset;


Should we RCT what we're touching? (many other instances in the patch as
well)


I am not sure if that's still in the scope of that patch


Not 100% the same situation, but it could be in the patch [1] or a 
separate one [2]. I prefer the latter as that's what I did :)


Thanks,
Tony

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5180ff1364bc26c031b54a68a80aa90ce0028b70
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5518ac2a64423f226e922b6719cf0eb62c31e141


[Intel-wired-lan] [PATCH iwl-net] i40e: Include types.h to some headers

2024-01-10 Thread Tony Nguyen
Commit 56df345917c0 ("i40e: Remove circular header dependencies and fix
headers") redistributed a number of includes from one large header file
to the locations they were needed. In some environments, types.h is not
included and causing compile issues. The driver should not rely on
implicit inclusion from other locations; explicitly include it to these
files.

Snippet of issue. Entire log can be seen through the Closes: link.

In file included from drivers/net/ethernet/intel/i40e/i40e_diag.h:7,
 from drivers/net/ethernet/intel/i40e/i40e_diag.c:4:
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:33:9: error: unknown type 
name '__le16'
   33 | __le16 flags;
  | ^~
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:34:9: error: unknown type 
name '__le16'
   34 | __le16 opcode;
  | ^~
...
drivers/net/ethernet/intel/i40e/i40e_diag.h:22:9: error: unknown type name 'u32'
   22 | u32 elements;   /* number of elements if array */
  | ^~~
drivers/net/ethernet/intel/i40e/i40e_diag.h:23:9: error: unknown type name 'u32'
   23 | u32 stride; /* bytes between each element */

Reported-by: Martin Zaharinov 
Closes: 
https://lore.kernel.org/netdev/21bbd62a-f874-4e42-b347-93087eea8...@gmail.com/
Fixes: 56df345917c0 ("i40e: Remove circular header dependencies and fix 
headers")
Reviewed-by: Jesse Brandeburg 
Signed-off-by: Tony Nguyen 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 1 +
 drivers/net/ethernet/intel/i40e/i40e_diag.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 18a1c3b6d72c..c8f35d4de271 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -5,6 +5,7 @@
 #define _I40E_ADMINQ_CMD_H_
 
 #include 
+#include 
 
 /* This header file defines the i40e Admin Queue commands and is shared between
  * i40e Firmware and Software.
diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.h 
b/drivers/net/ethernet/intel/i40e/i40e_diag.h
index ece3a6b9a5c6..ab20202a3da3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_diag.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_diag.h
@@ -4,6 +4,7 @@
 #ifndef _I40E_DIAG_H_
 #define _I40E_DIAG_H_
 
+#include 
 #include "i40e_adminq_cmd.h"
 
 /* forward-declare the HW struct for the compiler */
-- 
2.41.0