Hi,

We're seeing what appears to be the same problem on one of our
Pacemaker clusters. It is causing us serious trouble, because the
affected nodes are rebooted whenever the problem appears.

Has any progress been made in identifying a cause for this and/or curing
the problem?

From dmesg:

> Dec 28 23:16:32 tyne kernel: [418756.268195] WARNING: at 
> /build/linux-rrsxby/linux-3.2.51/net/sched/sch_generic.c:256 
> dev_watchdog+0xf2/0x151()
> Dec 28 23:16:32 tyne kernel: [418756.382761] Hardware name: X9DRD-iF
> Dec 28 23:16:32 tyne kernel: [418756.496392] NETDEV WATCHDOG: eth1 (igb): 
> transmit queue 1 timed out
> Dec 28 23:16:33 tyne kernel: [418756.607364] Modules linked in: hmac dlm sctp 
> libcrc32c configfs ip6table_filter ebtable_nat ebtables act_police cls_basic 
> cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq 
> xt_statistic xt_CT xt_time xt_connlimit xt_realm xt_addrtype iptable_raw 
> xt_comment 
> xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP 
> ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp 
> nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre 
> nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda 
> nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip
> nf_conntrack_proto_udplite
> nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre 
> nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast 
> nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core 
> ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype
> xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_
> Dec 28 23:16:34 tyne kernel: mac xt_limit xt_length xt_iprange xt_helper 
> xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY 
> xt_AUDIT ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 
> nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink ib_iser rdma_cm ib_cm 
> iw_cm ib_sa ib_mad
> ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
> iptable_filter ip_tables x_tables nfsd nfs lockd fscache auth_rpcgss nfs_acl 
> sunrpc bonding sha1_ssse3 sha1_generic ipmi_poweroff ipmi_devintf ipmi_si 
> ipmi_msghandler vhost_net macvtap macvlan tun drbd lru_cache bridge stp loop 
> kvm_intel kvm snd_pcm
> snd_timer coretemp snd soundcore acpi_cpufreq crc32c_intel ghash_clmulni_intel
> mperf aesni_intel psmouse snd_page_alloc cryptd iTCO_wdt sb_edac processor 
> i2c_i801 serio_raw aes_x86_64 ioatdma pcspkr iTCO_vendor_support aes_generic 
> thermal_sys i2c_core joydev edac_core evdev container button acpi_pad ext4 
> crc16 jbd2
> mbcache dm_mod raid1 md_mod microcode usbhid hid sg sd_mod crc_t10dif ahci lib
> Dec 28 23:16:34 tyne kernel: ahci isci libsas libata ehci_hcd 
> scsi_transport_sas usbcore igb scsi_mod usb_common dca [last unloaded: 
> scsi_wait_scan]
> Dec 28 23:16:34 tyne kernel: [418758.541550] Pid: 0, comm: swapper/0 Not 
> tainted 3.2.0-4-amd64 #1 Debian 3.2.51-1
> Dec 28 23:16:34 tyne kernel: [418758.652098] Call Trace:
> Dec 28 23:16:35 tyne kernel: [418758.761884]  <IRQ>  [<ffffffff81046cbd>] ? 
> warn_slowpath_common+0x78/0x8c
> Dec 28 23:16:35 tyne kernel: [418758.869948]  [<ffffffff81046d69>] ? 
> warn_slowpath_fmt+0x45/0x4a
> Dec 28 23:16:35 tyne kernel: [418758.977593]  [<ffffffff812a6f11>] ? 
> netif_tx_lock+0x40/0x75
> Dec 28 23:16:35 tyne kernel: [418759.082681]  [<ffffffff812a7081>] ? 
> dev_watchdog+0xf2/0x151
> Dec 28 23:16:35 tyne kernel: [418759.186240]  [<ffffffff81052480>] ? 
> run_timer_softirq+0x19a/0x261
> Dec 28 23:16:35 tyne kernel: [418759.287841]  [<ffffffff812a6f8f>] ? 
> netif_tx_unlock+0x49/0x49
> Dec 28 23:16:35 tyne kernel: [418759.387569]  [<ffffffff8104c2f8>] ? 
> __do_softirq+0xb9/0x177
> Dec 28 23:16:35 tyne kernel: [418759.486351]  [<ffffffff81096529>] ? 
> rcu_needs_cpu+0x50/0x1bb
> Dec 28 23:16:35 tyne kernel: [418759.583008]  [<ffffffff8135646c>] ? 
> call_softirq+0x1c/0x30
> Dec 28 23:16:35 tyne kernel: [418759.677333]  [<ffffffff8100f8cd>] ? 
> do_softirq+0x3c/0x7b
> Dec 28 23:16:36 tyne kernel: [418759.770142]  [<ffffffff8104c560>] ? 
> irq_exit+0x3c/0x99
> Dec 28 23:16:36 tyne kernel: [418759.860906]  [<ffffffff8100f5fd>] ? 
> do_IRQ+0x82/0x98
> Dec 28 23:16:36 tyne kernel: [418759.954639]  [<ffffffff8134f4ee>] ? 
> common_interrupt+0x6e/0x6e
> Dec 28 23:16:36 tyne kernel: [418760.048124]  <EOI>  [<ffffffff811ee07d>] ? 
> intel_idle+0xea/0x119
> Dec 28 23:16:36 tyne kernel: [418760.137012]  [<ffffffff811ee05c>] ? 
> intel_idle+0xc9/0x119
> Dec 28 23:16:36 tyne kernel: [418760.222705]  [<ffffffff8126febd>] ? 
> cpuidle_idle_call+0xec/0x179
> Dec 28 23:16:36 tyne kernel: [418760.306317]  [<ffffffff8100d243>] ? 
> cpu_idle+0xa5/0xf2
> Dec 28 23:16:36 tyne kernel: [418760.388391]  [<ffffffff816abb36>] ? 
> start_kernel+0x3b8/0x3c3
> Dec 28 23:16:36 tyne kernel: [418760.470137]  [<ffffffff816ab140>] ? 
> early_idt_handlers+0x140/0x140
> Dec 28 23:16:36 tyne kernel: [418760.548953]  [<ffffffff816ab3c4>] ? 
> x86_64_start_kernel+0x104/0x111
> Dec 28 23:16:36 tyne kernel: [418760.626209] ---[ end trace 25448d4e9ff0e259 
> ]---
> Dec 28 23:16:37 tyne kernel: [418760.710249] igb 0000:06:00.1: eth1: Reset 
> adapter
> Dec 28 23:16:37 tyne kernel: [418760.814181] igb 0000:06:00.0: eth0: Reset 
> adapter
- and -
> Dec 28 23:16:32 tees kernel: [419013.476706] WARNING: at 
> /build/linux-rrsxby/linux-3.2.51/net/sched/sch_generic.c:256 
> dev_watchdog+0xf2/0x151()
> Dec 28 23:16:33 tees kernel: [419013.591003] Hardware name: X9DRD-iF
> Dec 28 23:16:33 tees kernel: [419013.705052] NETDEV WATCHDOG: eth1 (igb): 
> transmit queue 3 timed out
> Dec 28 23:16:34 tees kernel: [419013.817376] Modules linked in: hmac dlm sctp 
> libcrc32c configfs ip6table_filter ebtable_nat ebtables act_police cls_basic 
> cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq 
> xt_statistic xt_CT xt_time xt_connlimit xt_realm xt_addrtype iptable_raw 
> xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP 
> ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp 
> nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre 
> nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda 
> nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip 
> nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp 
> nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns 
> nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp 
> xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype 
> xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_
> Dec 28 23:16:34 tees kernel: mac xt_limit xt_length xt_iprange xt_helper 
> xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY 
> xt_AUDIT ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 
> nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink ib_iser rdma_cm ib_cm 
> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi iptable_filter ip_tables x_tables nfsd nfs lockd fscache 
> auth_rpcgss nfs_acl sunrpc bonding sha1_ssse3 sha1_generic ipmi_poweroff 
> ipmi_devintf ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun drbd 
> lru_cache bridge stp loop kvm_intel kvm snd_pcm snd_timer snd i2c_i801 
> coretemp crc32c_intel iTCO_wdt soundcore ghash_clmulni_intel acpi_cpufreq 
(this is as far as that server got before being STONITHed)
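
In case it helps anyone compare their own logs against the two traces
above, here is a minimal Python sketch that pulls the watchdog timeout
events out of a kernel log. The function name and the way it undoes
mail-style line wrapping are my own assumptions for illustration; it is
not part of the igb driver or of any tool we run.

```python
import re

# Matches the kernel's transmit-timeout warning, e.g.
# "NETDEV WATCHDOG: eth1 (igb): transmit queue 1 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG:\s+(?P<iface>\S+)\s+\((?P<driver>[^)]+)\):\s+"
    r"transmit queue\s+(?P<queue>\d+)\s+timed out"
)


def find_watchdog_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout found.

    Hypothetical helper: it first rejoins lines that a mail client has
    wrapped (optionally prefixed with a ">" quote marker), then scans
    the flattened text for the warning message.
    """
    flat = re.sub(r"\s*\n>?\s*", " ", log_text)
    return [
        (m["iface"], m["driver"], int(m["queue"]))
        for m in WATCHDOG_RE.finditer(flat)
    ]
```

Feeding it the text of /var/log/kern.log (or the quoted traces above)
should give one tuple per timeout, which makes it easier to see whether
the same interface and queue are involved each time.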

Both servers have Supermicro X9DRD-iF motherboards and are running
linux-image-3.2.0-4-amd64 3.2.51-1.

lspci -vvv for one of the ports in question (eth1 on tyne) is:
> 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
> Connection (rev 01)
>       Subsystem: Super Micro Computer Inc Device 1521
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
>       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort- >SERR- <PERR- INTx-
>       Latency: 0, Cache Line Size: 64 bytes
>       Interrupt: pin B routed to IRQ 17
>       Region 0: Memory at fbd00000 (32-bit, non-prefetchable) [size=128K]
>       Region 2: I/O ports at d000 [size=32]
>       Region 3: Memory at fbdc0000 (32-bit, non-prefetchable) [size=16K]
>       Capabilities: [40] Power Management version 3
>               Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>               Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>       Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>               Address: 0000000000000000  Data: 0000
>               Masking: 00000000  Pending: 00000000
>       Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
>               Vector table: BAR=3 offset=00000000
>               PBA: BAR=3 offset=00002000
>       Capabilities: [a0] Express (v2) Endpoint, MSI 00
>               DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, 
> L1 <64us
>                       ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
>               DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ 
> Unsupported+
>                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                       MaxPayload 128 bytes, MaxReadReq 512 bytes
>               DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ 
> TransPend-
>               LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 
> <4us, L1 <32us
>                       ClockPM- Surprise- LLActRep- BwNot-
>               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>               LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- 
> BWMgmt- ABWMgmt-
>               DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
>               DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
>               LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- 
> SpeedDis-, Selectable De-emphasis: -6dB
>                        Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
>                        Compliance De-emphasis: -6dB
>               LnkSta2: Current De-emphasis Level: -6dB, 
> EqualizationComplete-, EqualizationPhase1-
>                        EqualizationPhase2-, EqualizationPhase3-, 
> LinkEqualizationRequest-
>       Capabilities: [100 v2] Advanced Error Reporting
>               UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>               UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>               UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>               CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>               CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>               AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>       Capabilities: [140 v1] Device Serial Number 00-25-90-ff-ff-4e-ae-18
>       Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
>               ARICap: MFVC- ACS-, Next Function: 0
>               ARICtl: MFVC- ACS-, Function Group: 0
>       Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>               IOVCap: Migration-, Interrupt Message Number: 000
>               IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
>               IOVSta: Migration-
>               Initial VFs: 8, Total VFs: 8, Number of VFs: 8, Function 
> Dependency Link: 01
>               VF offset: 384, stride: 4, Device ID: 1520
>               Supported Page Size: 00000553, System Page Size: 00000001
>               Region 0: Memory at fbd60000 (32-bit, non-prefetchable)
>               Region 3: Memory at fbd40000 (32-bit, non-prefetchable)
>               VF Migration: offset: 00000000, BIR: 0
>       Capabilities: [1a0 v1] Transaction Processing Hints
>               Device specific mode supported
>               Steering table in TPH capability structure
>       Capabilities: [1d0 v1] Access Control Services
>               ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- 
> EgressCtrl- DirectTrans-
>               ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- 
> EgressCtrl- DirectTrans-
>       Kernel driver in use: igb

Please let me know if I can provide any further information.

Best regards,
Chris

-- 
Chris Boot
Tiger Computing Ltd
"Linux for Business"

Tel: 01600 483 484
Web: http://www.tiger-computing.co.uk
Follow us on Facebook: http://www.facebook.com/TigerComputing

Registered in England. Company number: 3389961
Registered address: Wyastone Business Park,
 Wyastone Leys, Monmouth, NP25 3SR

