Your message dated Wed, 19 Feb 2025 15:49:25 +0100 (CET)
with message-id <20250219144925.7c335be2...@eldamar.lan>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #929359,
regarding linux: instability on arm64 MP30-AR1 servers
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
929359: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929359
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Source: linux
Version: 4.9.168-1
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ad...@lists.debian.org
User: debian-ad...@lists.debian.org
Usertags: needed-by-DSA-Team

Hi,

ever since the 9.9 point release conova-node01.debian.org and
conova-node02.debian.org have been unstable.  They run for an hour or
three, and then things go bad.  Rebooting back to 4.9.144-3.1 makes them
stable again.

Latest example:

May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: PingAck did not arrive in time.
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) 
pdsk( UpToDate -> DUnknown ) 
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: new current UUID 
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: ack_receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Terminating drbd_a_resource
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Connection closed
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( NetworkFailure -> Unconnected ) 
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Restarting receiver thread
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: receiver (re)started
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( Unconnected -> WFConnection ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Handshake successful: Agreed network protocol version 101
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC 
WRITE_SAME.
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Peer authenticated using 16 bytes HMAC
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( WFConnection -> WFReportParams ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Starting ack_recv thread (from drbd_r_resource [8449])
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: drbd_sync_handshake:
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: self 
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5 bits:4 
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: peer 
0BEBDA613EA56FD6:0000000000000000:D5BF70E0AA6560C4:D5BE70E0AA6560C5 bits:0 
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: uuid_compare()=1 by rule 70
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) 
pdsk( DUnknown -> Consistent ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), total 
28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), 
total 28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: helper command: /bin/true before-resync-source minor-3
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: helper command: /bin/true before-resync-source minor-3 exit code 0 
(0x0)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: Began resync as SyncSource (will sync 16 KB [4 bits set]).
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: updated sync UUID 
3EA2D1FA6B3ACD47:0BECDA613EA56FD7:0BEBDA613EA56FD7:D5BF70E0AA6560C5
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: Resync done (total 1 sec; paused 0 sec; 16 K/sec)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: updated UUIDs 
3EA2D1FA6B3ACD47:0000000000000000:0BECDA613EA56FD7:0BEBDA613EA56FD7
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
May 22 04:17:48 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI set_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad 
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Internal error: Oops - bad mode: 0 [#1] SMP
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Modules linked in: openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat binfmt_misc 
nls_ascii nls_cp437 vfat fat dm_mod ip6t_REJECT nf_reject_ipv6
 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT 
nf_reject_ipv4 xt_NFLOG nfnetlink_log nfnetlink xt_tcpudp nf_conntrack_ipv4 
nf_defrag_ipv4 xt_hashlimit xt_multiport xt_conntrack nf_conntrack
iptable_filter ast ttm drm_kms_helper xgene_hwmon efi_pstore drm 
i2c_algo_bit xgene_edac edac_core xgene_dma joydev evdev chaoskey 
mailbox_xgene_slimpro sg xgene_rng rng_core efivars tun drbd lru_cache efivarfs 
ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
crc32c_generic libcrc32c raid0 multipath linear raid1 hid_generic md_mod usbhid 
hid sd_mod
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:  
i2c_xgene_slimpro ahci_xgene libahci_platform libahci xhci_plat_hcd xgene_enet 
xhci_hcd libata phy_xgene marvell usbcore scsi_mod mdio_xgene of_mdio fixed_phy 
libphy usb_common gpio_xgene_sb
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: CPU: 
0 PID: 1410 Comm: ovsdb-server Tainted: G        W I     4.9.0-9-arm64 #1 
Debian 4.9.168-1+deb9u2
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Hardware name: GIGABYTE R120-P31/MP30-AR1, BIOS D7b 08/26/2016
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
task: ffff807ff9d54380 task.stack: ffff807f95c94000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: PC 
is at 0xffffa10dbf00
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: LR 
is at 0xffffa13d221c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: pc : 
[<0000ffffa10dbf00>] lr : [<0000ffffa13d221c>] pstate: a0000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: sp : 
0000fffff72e8970
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x29: 
0000fffff72e8970 x28: 0000000000000000 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x27: 
0000aaaafa714d90 x26: 0000aaaafa7354c8 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x25: 
0000aaaafa6eaed0 x24: 0000000000000018 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x23: 
0000aaaafa72c660 x22: 0000aaaafa711b80 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x21: 
0000000000000004 x20: 000000000000000c 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x19: 
0000aaaafa702b90 x18: 00000000002597a9 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x17: 
0000ffffa10dbec0 x16: 0000ffffa14837a0 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x15: 
ffffffffffffffff x14: 0000000000000010 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x13: 
33613a63353a3834 x12: 3a66373a63613a36 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x11: 
0101010101010101 x10: 0000000066666666 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x9 : 
7f7f7f7f7f7f7f7f x8 : 0101010101010101 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x7 : 
7f7fffffff7f7f7f x6 : feffa9a9f970ff72 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x5 : 
8080000000008000 x4 : 0080000000008080 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x3 : 
0000aaaafa720073 x2 : 726f7272655f7874 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x1 : 
0000aaaafa711c20 x0 : 0000000000000008 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Process ovsdb-server (pid: 1410, stack limit = 0xffff807f95c94020)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: ---[ 
end trace 1fdaa7d4350a5508 ]---
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad 
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
INFO: rcu_bh detected stalls on CPUs/tasks:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:      
0-...: (1 GPs behind) idle=1fd/140000000000000/0 softirq=736283/736285 fqs=2434 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:      
(detected by 2, t=5255 jiffies, g=15038, c=15037, q=8)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Task 
dump for CPU 0:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
ovsdb-server    R  running task        0  1410   1409 0x0000000a
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Call 
trace:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008086190>] __switch_to+0x90/0xd8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b804>] bad_mode+0x6c/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000000021dc9afc>] 0x21dc9afc
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000000021db79b8>] 0x21db79b8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008610748>] virt_efi_set_variable.part.6+0x68/0xb0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008610898>] virt_efi_set_variable+0x78/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000860f020>] efivar_entry_set_safe+0xc8/0x200
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff0000010574b8>] efi_pstore_write+0x158/0x1b0 [efi_pstore]
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000830cdbc>] pstore_dump+0x17c/0x388
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008132a54>] kmsg_dump+0xac/0xd0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff0000080cf5cc>] oops_exit+0x2c/0x38
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b0a4>] die+0xdc/0x1c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b818>] bad_mode+0x80/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000ffffa13d221c>] 0xffffa13d221c

I don't know whether the drbd messages are related to the Oops.  I suspect
not, as I see similar messages before things break.  In any case, after
that point the network is down.  The network driver is xgene-enet.

/etc/network/interfaces:

  # The loopback network interface
  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet manual
        pre-up    echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        pre-up    ip link set dev $IFACE up
        post-down ip link set dev $IFACE down

  # The primary network interface
  allow-hotplug br-inet
  iface br-inet inet static
        address 217.196.149.227/28
        gateway 217.196.149.238
  iface br-inet inet6 static
        address 2a02:16a8:dc41:100::227/64
        gateway 2a02:16a8:dc41:100::def

  auto eth1
  iface eth1 inet static
        address 172.29.186.11/24

  auto eth2
  iface eth2 inet static
        address 172.29.184.11/24

bridge config:

  # ovs-vsctl show
  91934a25-b86f-4d3a-a598-19f915404192
      Bridge br-inet
          Port "tap0"
              Interface "tap0"
          Port "eth0"
              Interface "eth0"
          Port br-inet
              Interface br-inet
                  type: internal
          Port "tap2"
              Interface "tap2"
                  error: "could not open network device tap2 (No such device)"
          Port "tap1"
              Interface "tap1"
      ovs_version: "2.6.2"

(the tap interfaces are for qemu VMs)

Cheers,
Julien

--- End Message ---
--- Begin Message ---
Hi

This bug was filed against a very old kernel, or the bug itself is old
and has had no resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---
