Source: linux
Version: 4.9.168-1
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ad...@lists.debian.org
User: debian-ad...@lists.debian.org
Usertags: needed-by-DSA-Team

Hi,

Ever since the 9.9 point release, conova-node01.debian.org and
conova-node02.debian.org have been unstable.  They run for an hour or
three, and then things go bad.  Rebooting back to 4.9.144-3.1 makes them
stable again.

Latest example:

May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: PingAck did not arrive in time.
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) 
pdsk( UpToDate -> DUnknown ) 
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: new current UUID 
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: ack_receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Terminating drbd_a_resource
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Connection closed
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( NetworkFailure -> Unconnected ) 
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Restarting receiver thread
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: receiver (re)started
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( Unconnected -> WFConnection ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Handshake successful: Agreed network protocol version 101
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC 
WRITE_SAME.
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Peer authenticated using 16 bytes HMAC
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: conn( WFConnection -> WFReportParams ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd 
resource3: Starting ack_recv thread (from drbd_r_resource [8449])
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: drbd_sync_handshake:
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: self 
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5 bits:4 
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: peer 
0BEBDA613EA56FD6:0000000000000000:D5BF70E0AA6560C4:D5BE70E0AA6560C5 bits:0 
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: uuid_compare()=1 by rule 70
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) 
pdsk( DUnknown -> Consistent ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), total 
28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), 
total 28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: helper command: /bin/true before-resync-source minor-3
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: helper command: /bin/true before-resync-source minor-3 exit code 0 
(0x0)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: Began resync as SyncSource (will sync 16 KB [4 bits set]).
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: updated sync UUID 
3EA2D1FA6B3ACD47:0BECDA613EA56FD7:0BEBDA613EA56FD7:D5BF70E0AA6560C5
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: Resync done (total 1 sec; paused 0 sec; 16 K/sec)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: updated UUIDs 
3EA2D1FA6B3ACD47:0000000000000000:0BECDA613EA56FD7:0BEBDA613EA56FD7
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
block drbd3: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
May 22 04:17:48 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI set_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad 
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Internal error: Oops - bad mode: 0 [#1] SMP
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Modules linked in: openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat binfmt_misc 
nls_ascii nls_cp437 vfat fat dm_mod ip6t_REJECT nf_reject_ipv6
 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT 
nf_reject_ipv4 xt_NFLOG nfnetlink_log nfnetlink xt_tcpudp nf_conntrack_ipv4 
nf_defrag_ipv4 xt_hashlimit xt_multiport xt_conntrack nf_conntr
ack iptable_filter ast ttm drm_kms_helper xgene_hwmon efi_pstore drm 
i2c_algo_bit xgene_edac edac_core xgene_dma joydev evdev chaoskey 
mailbox_xgene_slimpro sg xgene_rng rng_core efivars tun drbd lru_cache efivarfs 
ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
crc32c_generic libcrc32c raid0 multipath linear raid1 hid_generic md_mod usbhid 
hid sd_mod
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:  
i2c_xgene_slimpro ahci_xgene libahci_platform libahci xhci_plat_hcd xgene_enet 
xhci_hcd libata phy_xgene marvell usbcore scsi_mod mdio_xgene of_mdio fixed_phy 
libphy usb_common gpio_xgene_sb
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: CPU: 
0 PID: 1410 Comm: ovsdb-server Tainted: G        W I     4.9.0-9-arm64 #1 
Debian 4.9.168-1+deb9u2
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Hardware name: GIGABYTE R120-P31/MP30-AR1, BIOS D7b 08/26/2016
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
task: ffff807ff9d54380 task.stack: ffff807f95c94000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: PC 
is at 0xffffa10dbf00
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: LR 
is at 0xffffa13d221c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: pc : 
[<0000ffffa10dbf00>] lr : [<0000ffffa13d221c>] pstate: a0000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: sp : 
0000fffff72e8970
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x29: 
0000fffff72e8970 x28: 0000000000000000 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x27: 
0000aaaafa714d90 x26: 0000aaaafa7354c8 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x25: 
0000aaaafa6eaed0 x24: 0000000000000018 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x23: 
0000aaaafa72c660 x22: 0000aaaafa711b80 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x21: 
0000000000000004 x20: 000000000000000c 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x19: 
0000aaaafa702b90 x18: 00000000002597a9 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x17: 
0000ffffa10dbec0 x16: 0000ffffa14837a0 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x15: 
ffffffffffffffff x14: 0000000000000010 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x13: 
33613a63353a3834 x12: 3a66373a63613a36 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x11: 
0101010101010101 x10: 0000000066666666 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x9 : 
7f7f7f7f7f7f7f7f x8 : 0101010101010101 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x7 : 
7f7fffffff7f7f7f x6 : feffa9a9f970ff72 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x5 : 
8080000000008000 x4 : 0080000000008080 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x3 : 
0000aaaafa720073 x2 : 726f7272655f7874 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x1 : 
0000aaaafa711c20 x0 : 0000000000000008 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
Process ovsdb-server (pid: 1410, stack limit = 0xffff807f95c94020)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: ---[ 
end trace 1fdaa7d4350a5508 ]---
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad 
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
INFO: rcu_bh detected stalls on CPUs/tasks:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:      
0-...: (1 GPs behind) idle=1fd/140000000000000/0 softirq=736283/736285 fqs=2434 
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:      
(detected by 2, t=5255 jiffies, g=15038, c=15037, q=8)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Task 
dump for CPU 0:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
ovsdb-server    R  running task        0  1410   1409 0x0000000a
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Call 
trace:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008086190>] __switch_to+0x90/0xd8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b804>] bad_mode+0x6c/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000000021dc9afc>] 0x21dc9afc
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000000021db79b8>] 0x21db79b8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008610748>] virt_efi_set_variable.part.6+0x68/0xb0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008610898>] virt_efi_set_variable+0x78/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000860f020>] efivar_entry_set_safe+0xc8/0x200
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff0000010574b8>] efi_pstore_write+0x158/0x1b0 [efi_pstore]
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000830cdbc>] pstore_dump+0x17c/0x388
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff000008132a54>] kmsg_dump+0xac/0xd0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff0000080cf5cc>] oops_exit+0x2c/0x38
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b0a4>] die+0xdc/0x1c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<ffff00000808b818>] bad_mode+0x80/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 
[<0000ffffa13d221c>] 0xffffa13d221c
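
The excerpt above is hard-wrapped, so each syslog record is split over
two or more lines.  As a rough sketch only (not something that was run
as part of this report; the names rejoin() and interesting() are made up
for illustration), Python along these lines could rejoin the wrapped
records and pick out the firmware-bug, Oops and RCU-stall entries:

```python
import re

# Assumption: every record starts like the ones in the excerpt, i.e.
# "May 22 04:17:37 <host field> kernel:".
RECORD_START = re.compile(
    r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ kernel:"
)

# Substrings taken from the log above that mark the notable events.
KEYWORDS = ("Firmware Bug", "Bad mode", "Oops", "rcu_bh detected stalls")


def rejoin(lines):
    """Merge wrapped continuation lines back into one record each."""
    records = []
    for line in lines:
        line = line.rstrip("\n")  # keep trailing spaces, they separate words
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += line
    return records


def interesting(records):
    """Return only the records that contain one of KEYWORDS."""
    return [r for r in records if any(k in r for k in KEYWORDS)]
```

Continuation lines are appended without an extra separator because the
wrapped lines already carry their own trailing or leading space.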

I don't know whether the drbd messages are related to the Oops; I guess
they may not be (as I see similar messages before things break).  In any
case, after that point the network is down.  The network driver is
xgene-enet.

/etc/network/interfaces:

  # The loopback network interface
  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet manual
        pre-up    echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        pre-up    ip link set dev $IFACE up
        post-down ip link set dev $IFACE down

  # The primary network interface
  allow-hotplug br-inet
  iface br-inet inet static
        address 217.196.149.227/28
        gateway 217.196.149.238
  iface br-inet inet6 static
        address 2a02:16a8:dc41:100::227/64
        gateway 2a02:16a8:dc41:100::def

  auto eth1
  iface eth1 inet static
        address 172.29.186.11/24

  auto eth2
  iface eth2 inet static
        address 172.29.184.11/24

Bridge config:

  # ovs-vsctl show
  91934a25-b86f-4d3a-a598-19f915404192
      Bridge br-inet
          Port "tap0"
              Interface "tap0"
          Port "eth0"
              Interface "eth0"
          Port br-inet
              Interface br-inet
                  type: internal
          Port "tap2"
              Interface "tap2"
                  error: "could not open network device tap2 (No such device)"
          Port "tap1"
              Interface "tap1"
      ovs_version: "2.6.2"

(The tap interfaces are for QEMU VMs.)

Cheers,
Julien
