Hi Johan,

Could you check whether you see the following in your dmesg output or messages log file?

[1123306.014288] ------------[ cut here ]------------
[1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()
[1123306.014306] : caps=(0x0000000200004849, 0x0000000000000000) len=330 data_len=276 gso_size=276 gso_type=1 ip_summed=1
[1123306.014308] Modules linked in: vhost_net macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebt_arp ebtable_nat ebtables tun scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dm_service_time intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei wmi ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp bridge stp llc bonding dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel igb drm ahci ixgbe i2c_algo_bit libahci libata mdio i2c_core ptp megaraid_sas pps_core dca dm_mirror dm_region_hash dm_log dm_mod
[1123306.014360] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G        W   --------------   3.10.0-229.1.2.el7.x86_64 #1
[1123306.014362] Hardware name: Supermicro SYS-2028TP-HC1TR/X10DRT-PT, BIOS 1.1 08/03/2015
[1123306.014364]  ffff881fffc439a8 5326fb90ad1041ea ffff881fffc43960 ffffffff81604afa
[1123306.014371]  ffff881fffc43998 ffffffff8106e34b ffff881fcebb0500 ffff881fce88c000
[1123306.014376]  0000000000000001 0000000000000001 ffff881fcebb0500 ffff881fffc43a00
[1123306.014381] Call Trace:
[1123306.014383]  <IRQ>  [<ffffffff81604afa>] dump_stack+0x19/0x1b
[1123306.014396]  [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
[1123306.014399]  [<ffffffff8106e3ec>] warn_slowpath_fmt+0x5c/0x80
[1123306.014405]  [<ffffffff812db093>] ? ___ratelimit+0x93/0x100
[1123306.014409]  [<ffffffff816076c3>] skb_warn_bad_offload+0xcd/0xda
[1123306.014425]  [<ffffffff814fdeb9>] __skb_gso_segment+0x79/0xb0
[1123306.014429]  [<ffffffff814fe1c2>] dev_hard_start_xmit+0x1a2/0x580
[1123306.014438]  [<ffffffffa0168790>] ? deliver_clone+0x50/0x50 [bridge]
[1123306.014443]  [<ffffffff8151df1e>] sch_direct_xmit+0xee/0x1c0
[1123306.014447]  [<ffffffff814fe798>] dev_queue_xmit+0x1f8/0x4a0
[1123306.014453]  [<ffffffffa016880b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[1123306.014458]  [<ffffffffa0168a22>] br_forward_finish+0x22/0x60 [bridge]
[1123306.014464]  [<ffffffffa0168ae0>] __br_forward+0x80/0xf0 [bridge]
[1123306.014469]  [<ffffffffa0168ebb>] br_forward+0x8b/0xa0 [bridge]
[1123306.014476]  [<ffffffffa0169e65>] br_handle_frame_finish+0x175/0x410 [bridge]
[1123306.014481]  [<ffffffffa016a275>] br_handle_frame+0x175/0x260 [bridge]
[1123306.014485]  [<ffffffff814fc112>] __netif_receive_skb_core+0x282/0x870
[1123306.014490]  [<ffffffff8101b589>] ? read_tsc+0x9/0x10
[1123306.014493]  [<ffffffff814fc718>] __netif_receive_skb+0x18/0x60
[1123306.014497]  [<ffffffff814fc7a0>] netif_receive_skb+0x40/0xd0
[1123306.014500]  [<ffffffff814fd2b0>] napi_gro_receive+0x80/0xb0
[1123306.014512]  [<ffffffffa00cde2c>] ixgbe_clean_rx_irq+0x7ac/0xb30 [ixgbe]
[1123306.014519]  [<ffffffffa00cf07b>] ixgbe_poll+0x4bb/0x930 [ixgbe]
[1123306.014524]  [<ffffffff814fcb62>] net_rx_action+0x152/0x240
[1123306.014528]  [<ffffffff81077bf7>] __do_softirq+0xf7/0x290
[1123306.014533]  [<ffffffff8161635c>] call_softirq+0x1c/0x30
[1123306.014539]  [<ffffffff81015de5>] do_softirq+0x55/0x90
[1123306.014543]  [<ffffffff81077f95>] irq_exit+0x115/0x120
[1123306.014546]  [<ffffffff81616ef8>] do_IRQ+0x58/0xf0
[1123306.014551]  [<ffffffff8160c0ed>] common_interrupt+0x6d/0x6d
[1123306.014553]  <EOI>  [<ffffffff814aa6d2>] ? cpuidle_enter_state+0x52/0xc0
[1123306.014561]  [<ffffffff814aa6c8>] ? cpuidle_enter_state+0x48/0xc0
[1123306.014565]  [<ffffffff814aa805>] cpuidle_idle_call+0xc5/0x200
[1123306.014569]  [<ffffffff8101d21e>] arch_cpu_idle+0xe/0x30
[1123306.014574]  [<ffffffff810c6945>] cpu_startup_entry+0xf5/0x290
[1123306.014580]  [<ffffffff810423ca>] start_secondary+0x1ba/0x230
[1123306.014582] ---[ end trace 4d5a1bc838e1fcc0 ]---
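
A quick way to check for this is to count the warning headers in the kernel log. This is only a rough sketch: the function name count_bad_offload is made up for illustration, and /var/log/messages is assumed to be where syslog keeps kernel messages on your host.

```shell
#!/bin/sh
# Count skb_warn_bad_offload warnings in kernel log text read from stdin.
# Only the "WARNING:" header line is matched, so the repeated symbol in the
# call trace is not counted a second time.
# (count_bad_offload is a hypothetical helper name, not an existing tool.)
count_bad_offload() {
    grep -c 'WARNING: .*skb_warn_bad_offload'
}

# Typical use, as root:
#   dmesg | count_bad_offload
#   count_bad_offload < /var/log/messages   # assumed log location
```

A count above zero means the host has logged the same warning.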

If so, then could you try the following:

ethtool -K <nic name> lro off

Do this for all the 10G Intel NICs and check whether the problem still exists.
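
If there are several ports, something like the following could apply the setting to every interface bound to the ixgbe driver. Treat it as a sketch under assumptions: the helper names ifaces_for_driver and disable_lro_all and the SYSFS_NET override are invented here, and it assumes sysfs exposes the bound driver as the /sys/class/net/<nic name>/device/driver symlink.

```shell
#!/bin/sh
# List network interfaces bound to the given driver (default: ixgbe).
# SYSFS_NET is a hypothetical override that only exists so the function
# can be tried without real hardware.
ifaces_for_driver() {
    driver=${1:-ixgbe}
    for dev in "${SYSFS_NET:-/sys/class/net}"/*; do
        # Virtual interfaces (lo, bridges, bonds) have no device/driver link.
        [ -L "$dev/device/driver" ] || continue
        if [ "$(basename "$(readlink "$dev/device/driver")")" = "$driver" ]; then
            basename "$dev"
        fi
    done
}

# Turn LRO off on every matching interface, then print the resulting state.
disable_lro_all() {
    for nic in $(ifaces_for_driver "${1:-ixgbe}"); do
        ethtool -K "$nic" lro off
        ethtool -k "$nic" | grep large-receive-offload
    done
}

# Typical use, as root:
#   disable_lro_all ixgbe
```

Note that a setting made with ethtool -K does not survive a reboot, so it would have to be reapplied or made persistent in the network configuration if it turns out to help.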


Kind regards,

Jurriën Bloemen

On 17-03-16 09:49, Johan Kooijman wrote:
Hi all,

Since we upgraded to the latest oVirt Node running 7.2, we're seeing that nodes 
become unavailable after a while. A node runs fine, with a couple of VMs on 
it, until it becomes unresponsive. At that moment it doesn't even respond to 
ICMP. It comes back by itself after a while, but oVirt fences the machine 
before that time and restarts the VMs elsewhere.

Engine tells me this message:

VDSM host09 command failed: Message timeout which can be caused by 
communication issues

Is anyone else experiencing these issues with ixgbe drivers? I'm running on 
Intel X540-AT2 cards.

--
Met vriendelijke groeten / With kind regards,
Johan Kooijman



_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

