Source: linux
Version: 4.9.168-1
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ad...@lists.debian.org
User: debian-ad...@lists.debian.org
Usertags: needed-by-DSA-Team

Hi,

ever since the 9.9 point release conova-node01.debian.org and conova-node02.debian.org have been unstable. They run for an hour or three, and then things go bad. Rebooting back to 4.9.144-3.1 makes them stable again.

Latest example:

May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: PingAck did not arrive in time.
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: new current UUID 3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: ack_receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Terminating drbd_a_resource
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Connection closed
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: conn( NetworkFailure -> Unconnected )
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Restarting receiver thread
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: receiver (re)started
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: conn( Unconnected -> WFConnection )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Handshake successful: Agreed network protocol version 101
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Peer authenticated using 16 bytes HMAC
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: conn( WFConnection -> WFReportParams )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd resource3: Starting ack_recv thread (from drbd_r_resource [8449])
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: drbd_sync_handshake:
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: self 3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5 bits:4 flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: peer 0BEBDA613EA56FD6:0000000000000000:D5BF70E0AA6560C4:D5BE70E0AA6560C5 bits:0 flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: uuid_compare()=1 by rule 70
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), total 28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), total 28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: helper command: /bin/true before-resync-source minor-3
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: helper command: /bin/true before-resync-source minor-3 exit code 0 (0x0)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: Began resync as SyncSource (will sync 16 KB [4 bits set]).
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: updated sync UUID 3EA2D1FA6B3ACD47:0BECDA613EA56FD7:0BEBDA613EA56FD7:D5BF70E0AA6560C5
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: Resync done (total 1 sec; paused 0 sec; 16 K/sec)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: updated UUIDs 3EA2D1FA6B3ACD47:0000000000000000:0BECDA613EA56FD7:0BEBDA613EA56FD7
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: block drbd3: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
May 22 04:17:48 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: [Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: [Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI set_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi: [Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Internal error: Oops - bad mode: 0 [#1] SMP
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Modules linked in: openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat binfmt_misc nls_ascii nls_cp437 vfat fat dm_mod ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log nfnetlink xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_hashlimit xt_multiport xt_conntrack nf_conntrack iptable_filter ast ttm drm_kms_helper xgene_hwmon efi_pstore drm i2c_algo_bit xgene_edac edac_core xgene_dma joydev evdev chaoskey mailbox_xgene_slimpro sg xgene_rng rng_core efivars tun drbd lru_cache efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq crc32c_generic libcrc32c raid0 multipath linear raid1 hid_generic md_mod usbhid hid sd_mod
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: i2c_xgene_slimpro ahci_xgene libahci_platform libahci xhci_plat_hcd xgene_enet xhci_hcd libata phy_xgene marvell usbcore scsi_mod mdio_xgene of_mdio fixed_phy libphy usb_common gpio_xgene_sb
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: CPU: 0 PID: 1410 Comm: ovsdb-server Tainted: G W I 4.9.0-9-arm64 #1 Debian 4.9.168-1+deb9u2
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Hardware name: GIGABYTE R120-P31/MP30-AR1, BIOS D7b 08/26/2016
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: task: ffff807ff9d54380 task.stack: ffff807f95c94000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: PC is at 0xffffa10dbf00
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: LR is at 0xffffa13d221c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: pc : [<0000ffffa10dbf00>] lr : [<0000ffffa13d221c>] pstate: a0000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: sp : 0000fffff72e8970
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x29: 0000fffff72e8970 x28: 0000000000000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x27: 0000aaaafa714d90 x26: 0000aaaafa7354c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x25: 0000aaaafa6eaed0 x24: 0000000000000018
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x23: 0000aaaafa72c660 x22: 0000aaaafa711b80
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x21: 0000000000000004 x20: 000000000000000c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x19: 0000aaaafa702b90 x18: 00000000002597a9
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x17: 0000ffffa10dbec0 x16: 0000ffffa14837a0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x15: ffffffffffffffff x14: 0000000000000010
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x13: 33613a63353a3834 x12: 3a66373a63613a36
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x11: 0101010101010101 x10: 0000000066666666
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x9 : 7f7f7f7f7f7f7f7f x8 : 0101010101010101
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x7 : 7f7fffffff7f7f7f x6 : feffa9a9f970ff72
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x5 : 8080000000008000 x4 : 0080000000008080
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x3 : 0000aaaafa720073 x2 : 726f7272655f7874
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x1 : 0000aaaafa711c20 x0 : 0000000000000008
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Process ovsdb-server (pid: 1410, stack limit = 0xffff807f95c94020)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: ---[ end trace 1fdaa7d4350a5508 ]---
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: INFO: rcu_bh detected stalls on CPUs/tasks:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: 0-...: (1 GPs behind) idle=1fd/140000000000000/0 softirq=736283/736285 fqs=2434
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: (detected by 2, t=5255 jiffies, g=15038, c=15037, q=8)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Task dump for CPU 0:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: ovsdb-server R running task 0 1410 1409 0x0000000a
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Call trace:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff000008086190>] __switch_to+0x90/0xd8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff00000808b804>] bad_mode+0x6c/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<0000000021dc9afc>] 0x21dc9afc
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<0000000021db79b8>] 0x21db79b8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff000008610748>] virt_efi_set_variable.part.6+0x68/0xb0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff000008610898>] virt_efi_set_variable+0x78/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff00000860f020>] efivar_entry_set_safe+0xc8/0x200
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff0000010574b8>] efi_pstore_write+0x158/0x1b0 [efi_pstore]
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff00000830cdbc>] pstore_dump+0x17c/0x388
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff000008132a54>] kmsg_dump+0xac/0xd0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff0000080cf5cc>] oops_exit+0x2c/0x38
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff00000808b0a4>] die+0xdc/0x1c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<ffff00000808b818>] bad_mode+0x80/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: [<0000ffffa13d221c>] 0xffffa13d221c

I don't know if the drbd stuff is related to the Oops, I guess it may not be (as I see similar messages before things break). In any case after that point the network is down.

The network driver is xgene-enet.

/etc/network/interfaces:

    # The loopback network interface
    auto lo
    iface lo inet loopback

    auto eth0
    iface eth0 inet manual
        pre-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        pre-up ip link set dev $IFACE up
        post-down ip link set dev $IFACE down

    # The primary network interface
    allow-hotplug br-inet
    iface br-inet inet static
        address 217.196.149.227/28
        gateway 217.196.149.238

    iface br-inet inet6 static
        address 2a02:16a8:dc41:100::227/64
        gateway 2a02:16a8:dc41:100::def

    auto eth1
    iface eth1 inet static
        address 172.29.186.11/24

    auto eth2
    iface eth2 inet static
        address 172.29.184.11/24

bridge config:

    # ovs-vsctl show
    91934a25-b86f-4d3a-a598-19f915404192
        Bridge br-inet
            Port "tap0"
                Interface "tap0"
            Port "eth0"
                Interface "eth0"
            Port br-inet
                Interface br-inet
                    type: internal
            Port "tap2"
                Interface "tap2"
                    error: "could not open network device tap2 (No such device)"
            Port "tap1"
                Interface "tap1"
        ovs_version: "2.6.2"

(the tap interfaces are for qemu VMs)

Cheers,
Julien
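
Note on the "code 0x56000000" value in the "Bad mode" lines of the log: it looks like an exception syndrome (ESR) value. Below is a minimal sketch of how it could be decoded, assuming the standard ARMv8 ESR_ELx layout (exception class in bits 31:26, instruction-length flag in bit 25, syndrome in bits 24:0). The function name `decode_esr` and the small lookup table are hypothetical and only for illustration; the table is deliberately incomplete and is not taken from the kernel source.

```python
# Illustrative decoder for the "code 0x56000000" value in the "Bad mode" line.
# Assumption: ARMv8 ESR_ELx layout, EC = bits 31:26, IL = bit 25, ISS = bits 24:0.

# Deliberately small table of exception classes (assumed ARMv8-A encodings).
EC_NAMES = {
    0x15: "SVC (AArch64)",
    0x20: "IABT (lower EL)",
    0x24: "DABT (lower EL)",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F      # exception class, 6 bits
    il = (esr >> 25) & 0x1       # 1 means a 32-bit instruction trapped
    iss = esr & 0x1FFFFFF        # instruction-specific syndrome, 25 bits
    return {"ec": ec, "il": il, "iss": iss, "name": EC_NAMES.get(ec, "unknown")}


if __name__ == "__main__":
    fields = decode_esr(0x56000000)
    print("EC=0x%02x IL=%d ISS=0x%x (%s)"
          % (fields["ec"], fields["il"], fields["iss"], fields["name"]))
```

Under these assumptions the value decodes to exception class 0x15, which agrees with the "SVC (AArch64)" label the kernel printed next to it. This only explains the number; it says nothing about why the handler was entered.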