This is the log from the HWE kernel:

[33219.508873] ------------[ cut here ]------------
[33219.508877] NETDEV WATCHDOG: enp161s0f1 (ice): transmit queue 35 timed out
[33219.508932] WARNING: CPU: 48 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x21f/0x230
[33219.508940] Modules linked in: sch_ingress nf_conntrack_netlink geneve 
ip6_udp_tunnel udp_tunnel xt_CT dm_crypt scsi_transport_iscsi veth 
nfnetlink_cttimeout openvswitch nsh nf_conncount unix_diag nft_masq zfs(PO) 
zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) 
vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock 
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp 
nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
nf_tables nfnetlink bridge sunrpc nvme_fabrics 8021q garp mrp stp llc bonding 
tls binfmt_misc ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac 
edac_mce_amd dell_wmi kvm_amd video ledtrig_audio nls_iso8859_1 irdma 
sparse_keymap kvm i40e irqbypass dell_smbios dcdbas ib_uverbs rapl 
dell_wmi_descriptor wmi_bmof ib_core ccp ptdma k10temp acpi_ipmi ipmi_si 
ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops
[33219.509051]  reed_solomon pstore_blk pstore_zone efi_pstore ip_tables 
x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear cdc_ether usbnet mii mgag200 i2c_algo_bit drm_shmem_helper 
drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect sysimgblt crc32_pclmul 
bcache polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 nvme 
aesni_intel crypto_simd nvme_core ahci xhci_pci cryptd ice tg3 libahci drm 
megaraid_sas i2c_piix4 xhci_pci_renesas nvme_common wmi
[33219.509114] CPU: 48 PID: 0 Comm: swapper/48 Tainted: P           O       6.2.0-32-generic #32~22.04.1-Ubuntu
[33219.509116] Hardware name: Dell Inc. PowerEdge R7525/03WYW4, BIOS 2.12.4 07/26/2023
[33219.509118] RIP: 0010:dev_watchdog+0x21f/0x230
[33219.509122] Code: 00 e9 31 ff ff ff 4c 89 e7 c6 05 66 83 78 01 01 e8 56 00 
f8 ff 44 89 f1 4c 89 e6 48 c7 c7 08 4f e4 b7 48 89 c2 e8 61 df 2b ff <0f> 0b e9 
22 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[33219.509123] RSP: 0018:ffffb42719fd0e70 EFLAGS: 00010246
[33219.509125] RAX: 0000000000000000 RBX: ffff9bd91b3e74c8 RCX: 0000000000000000
[33219.509126] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[33219.509127] RBP: ffffb42719fd0e98 R08: 0000000000000000 R09: 0000000000000000
[33219.509128] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9bd91b3e7000
[33219.509129] R13: ffff9bd91b3e741c R14: 0000000000000023 R15: 0000000000000000
[33219.509130] FS:  0000000000000000(0000) GS:ffff9b573de00000(0000) knlGS:0000000000000000
[33219.509132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[33219.509133] CR2: 000055fd64034000 CR3: 0000010273ae2004 CR4: 0000000000770ee0
[33219.509135] PKRU: 55555554
[33219.509135] Call Trace:
[33219.509137]  <IRQ>
[33219.509140]  ? show_regs+0x72/0x90
[33219.509145]  ? dev_watchdog+0x21f/0x230
[33219.509147]  ? __warn+0x8d/0x160
[33219.509151]  ? dev_watchdog+0x21f/0x230
[33219.509154]  ? report_bug+0x1bb/0x1d0
[33219.509158]  ? handle_bug+0x46/0x90
[33219.509162]  ? exc_invalid_op+0x19/0x80
[33219.509165]  ? asm_exc_invalid_op+0x1b/0x20
[33219.509171]  ? dev_watchdog+0x21f/0x230
[33219.509174]  ? __pfx_dev_watchdog+0x10/0x10
[33219.509176]  call_timer_fn+0x2c/0x160
[33219.509180]  ? __pfx_dev_watchdog+0x10/0x10
[33219.509182]  __run_timers.part.0+0x1fb/0x2b0
[33219.509185]  ? ktime_get+0x46/0xc0
[33219.509187]  ? __pfx_tick_sched_timer+0x10/0x10
[33219.509191]  ? native_apic_msr_write+0x46/0x70
[33219.509194]  ? lapic_next_event+0x20/0x30
[33219.509197]  ? clockevents_program_event+0xb5/0x140
[33219.509200]  run_timer_softirq+0x2a/0x60
[33219.509202]  __do_softirq+0xdd/0x330
[33219.509205]  ? hrtimer_interrupt+0x12b/0x250
[33219.509208]  __irq_exit_rcu+0xa2/0xd0
[33219.509210]  irq_exit_rcu+0xe/0x20
[33219.509212]  sysvec_apic_timer_interrupt+0x96/0xb0
[33219.509215]  </IRQ>
[33219.509216]  <TASK>
[33219.509216]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[33219.509219] RIP: 0010:mwait_idle+0x55/0x90
[33219.509222] Code: 31 d2 48 89 d1 65 48 8b 04 25 40 18 03 00 0f 01 c8 48 8b 
00 a8 08 75 14 eb 07 0f 00 2d 24 d2 35 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 
0f 1f 44 00 00 65 48 8b 04 25 40 18 03 00 f0 80 60 02 df
[33219.509224] RSP: 0018:ffffb42700587e80 EFLAGS: 00000246
[33219.509225] RAX: 0000000000000000 RBX: ffff9ad9ccd999c0 RCX: 0000000000000000
[33219.509226] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[33219.509227] RBP: ffffb42700587e80 R08: 0000000000000000 R09: 0000000000000000
[33219.509229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[33219.509230] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[33219.509232]  arch_cpu_idle+0x15/0x20
[33219.509235]  default_idle_call+0x4a/0x120
[33219.509237]  cpuidle_idle_call+0x185/0x1e0
[33219.509241]  do_idle+0x82/0x110
[33219.509243]  cpu_startup_entry+0x20/0x30
[33219.509245]  start_secondary+0x122/0x160
[33219.509248]  secondary_startup_64_no_verify+0xe5/0xeb
[33219.509253]  </TASK>
[33219.509254] ---[ end trace 0000000000000000 ]---
[33220.417178] ice 0000:a1:00.1 enp161s0f1: tx_timeout: VSI_num: 8, Q 35, NTC: 0x42, HW_HEAD: 0x41, NTU: 0x42, INT: 0x0
[33220.417186] ice 0000:a1:00.1 enp161s0f1: tx_timeout recovery level 1, txqueue 35
[33223.905010] bond0: (slave enp161s0f1): link status definitely down, disabling slave
[33223.905018] bond0: active interface up!
[33224.344729] ice 0000:a1:00.1: PTP reset successful
[33655.093659] ice 0000:a1:00.1: VSI rebuilt. VSI index 0, type ICE_VSI_PF
[33655.104975] ice 0000:a1:00.1: VSI rebuilt. VSI index 383, type ICE_VSI_CTRL
[33655.217315] bond0: (slave enp161s0f1): link status definitely up, 25000 Mbps full duplex
[33666.895550] ice 0000:a1:00.1 enp161s0f1: tx_timeout: VSI_num: 8, Q 92, NTC: 0x17, HW_HEAD: 0x25, NTU: 0x26, INT: 0x0
[33666.895557] ice 0000:a1:00.1 enp161s0f1: tx_timeout recovery level 1, txqueue 92
[33670.816422] bond0: (slave enp161s0f1): link status definitely down, disabling slave
[33671.261841] ice 0000:a1:00.1: PTP reset successful
[33961.392293] ice 0000:a1:00.1: VSI rebuilt. VSI index 0, type ICE_VSI_PF
[33961.410920] ice 0000:a1:00.1: VSI rebuilt. VSI index 383, type ICE_VSI_CTRL
[33961.476136] bond0: (slave enp161s0f1): link status definitely up, 25000 Mbps full duplex
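
For anyone triaging similar logs, here is a rough sketch of how the timed-out queues and the length of each bond outage could be pulled out of the messages above. It is only an illustration: the `LOG` excerpt is copied from this log with the wrapped lines rejoined, and the `summarize` helper and both regular expressions are my own assumptions about the message layout, not anything provided by the ice driver or the bonding module.

```python
import re

# Excerpt of the kernel log above, with the mail-wrapped lines rejoined.
LOG = """\
[33220.417178] ice 0000:a1:00.1 enp161s0f1: tx_timeout: VSI_num: 8, Q 35, NTC: 0x42, HW_HEAD: 0x41, NTU: 0x42, INT: 0x0
[33223.905010] bond0: (slave enp161s0f1): link status definitely down, disabling slave
[33655.217315] bond0: (slave enp161s0f1): link status definitely up, 25000 Mbps full duplex
[33666.895550] ice 0000:a1:00.1 enp161s0f1: tx_timeout: VSI_num: 8, Q 92, NTC: 0x17, HW_HEAD: 0x25, NTU: 0x26, INT: 0x0
[33670.816422] bond0: (slave enp161s0f1): link status definitely down, disabling slave
[33961.476136] bond0: (slave enp161s0f1): link status definitely up, 25000 Mbps full duplex
"""

# Hypothetical patterns, written to match the lines shown in this report only.
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] ice \S+ (\S+): tx_timeout: VSI_num: \d+, Q (\d+),"
)
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] bond0: \(slave (\S+)\): link status definitely (down|up)"
)


def summarize(log):
    """Return (timeouts, outages) extracted from a dmesg-style log string."""
    # Each timeout is (timestamp in seconds, interface name, queue number).
    timeouts = [(float(t), dev, int(q)) for t, dev, q in TIMEOUT_RE.findall(log)]

    # Pair every "down" with the next "up" for the same slave interface.
    outages = []
    down_at = {}
    for t, dev, state in LINK_RE.findall(log):
        if state == "down":
            down_at[dev] = float(t)
        elif dev in down_at:
            outages.append((dev, round(float(t) - down_at.pop(dev), 1)))
    return timeouts, outages


timeouts, outages = summarize(LOG)
for t, dev, q in timeouts:
    print(f"{t:.1f}s: {dev} queue {q} timed out")
for dev, seconds in outages:
    print(f"{dev} was out of the bond for {seconds} s")
```

On this excerpt the two outages come out at roughly 431 s and 291 s, so each reset keeps the slave out of the bond for several minutes.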

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2036239

Title:
  Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

Status in linux package in Ubuntu:
  New

Bug description:
  
  I'm having issues with an Intel E810-XXV card on a Dell server under
  Ubuntu Jammy.

  Details:

  - hardware --> a1:00.0 Ethernet controller: Intel Corporation Ethernet
  Controller E810-XXV for SFP (rev 02)

  - tested with both GA and HWE kernels (`5.15.0-83-generic #92` and
  `6.2.0-32-generic #32~22.04.1-Ubuntu`) with the same results.

  - using a bond over the two ports of the same card, at 25Gbps, to two
  different switches. The bond uses LACP with hash layer3+4 and fast
  timeout, but I believe the bug is not directly related to bonding, as
  the problem seems to be in the interface.

  - machine installed by maas. No issues during installation, but at
  that time the bond is not formed yet. Later, when linux is booted, the
  bond is formed and works without issues for a while

  - it works fine for about 2 to 3 hours, then the issue starts (it may
  or may not be related to network load, but it seems to be triggered
  by some tests that I run after openstack finishes installing)

  - one of the legs of the bond freezes and everything that would go
  through that leg is discarded, in and out; pings to random external
  hosts start losing every second packet

  - after some time the kernel log shows messages such as "NETDEV
  WATCHDOG: enp161s0f0 (ice): transmit queue 166 timed out" and a stack
  trace

  - the switch does log that the bond is flapping

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
