apport information

** Tags added: apport-collected

** Description changed:

Our mail server has its backups on a storage array which is mounted via iSCSI. Starting April 11th, there have been 5 events where the iSCSI connection has been lost and the filesystem has been automatically remounted read-only as a result. The server and storage array are directly connected by gigabit Ethernet; no switch in between. We changed the cable and this is still happening.

On April 9th we upgraded to linux-image-2.6.32-46-server 2.6.32-46.107 and these events happened on April 11, 14, 15. On April 19th we upgraded to linux-image-2.6.32-46-server 2.6.32-46.108 and these events happened April 22 and 25.

The dedicated connection is more or less idle except when a backup is happening. The timing of the events doesn't appear to correspond to high load; they have happened in the middle of the day as well as overnight. The backups run every night, so it's not happening every time. I'll work more on trying to reproduce.

Restarting the interface (ifdown eth1 && ifup eth1) returns everything to normal; a reboot is not necessary.

http://bugs.centos.org/view.php?id=6249 seems like it might be similar. The reporter said it was resolved in CentOS 6.4.

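In the kernel messages quoted in this report, the iSCSI ping timeout is logged several minutes before the bnx2 transmit queue warning. To check whether other occurrences show the same gap, something like the following Python sketch could be run over saved syslog lines. It assumes the lines have the same format as the ones quoted here; the function name `seconds_between` and the matched substrings are illustrative choices, not part of any existing tool.

```python
import re

# Kernel timestamp in square brackets, e.g. "[313179.870010]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

# Substrings taken from the messages quoted in this report (assumed stable).
PING_TIMEOUT = "ping timeout of"
TX_TIMEOUT = "NETDEV WATCHDOG"


def seconds_between(lines):
    """Return seconds from the first iSCSI ping timeout to the first
    NETDEV WATCHDOG message after it, or None if either is missing."""
    start = None
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue
        stamp = float(match.group(1))
        if start is None and PING_TIMEOUT in line:
            start = stamp
        elif start is not None and TX_TIMEOUT in line:
            return stamp - start
    return None


sample = [
    "Apr 25 21:02:31 mail kernel: [313179.870010] connection1:0: "
    "ping timeout of 5 secs expired, recv timeout 5, "
    "last rx 4326254283, last ping 4326254783, now 4326255283",
    "Apr 25 21:08:24 mail kernel: [313532.861966] "
    "NETDEV WATCHDOG: eth1 (bnx2): transmit queue 0 timed out",
]
gap = seconds_between(sample)
print(f"{gap:.1f} s between iSCSI ping timeout and bnx2 watchdog")
```

The kernel uptime stamp is used instead of the syslog date because it has sub-second resolution and needs no year or timezone handling.
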
The kernel says:

Apr 25 21:02:31 mail kernel: [313179.870010] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4326254283, last ping 4326254783, now 4326255283
Apr 25 21:02:31 mail kernel: [313179.870466] connection1:0: detected conn error (1011)
Apr 25 21:04:31 mail kernel: [313300.124386] session1: session recovery timed out after 120 secs
Apr 25 21:08:24 mail kernel: [313532.861947] ------------[ cut here ]------------
Apr 25 21:08:24 mail kernel: [313532.861961] WARNING: at /build/buildd/linux-2.6.32/net/sched/sch_generic.c:261 dev_watchdog+0x262/0x270()
Apr 25 21:08:24 mail kernel: [313532.861964] Hardware name: ProLiant DL380 G5
Apr 25 21:08:24 mail kernel: [313532.861966] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 0 timed out
Apr 25 21:08:24 mail kernel: [313532.861968] Modules linked in: crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm psmouse drm_kms_helper bnx2 serio_raw drm lp shpchp usbhid i2c_algo_bit ipmi_si i5000_edac hid edac_core i5k_amb ipmi_msghandler hpilo parport cciss
Apr 25 21:08:24 mail kernel: [313532.862003] Pid: 22, comm: ksoftirqd/6 Not tainted 2.6.32-46-server #108-Ubuntu
Apr 25 21:08:24 mail kernel: [313532.862006] Call Trace:
Apr 25 21:08:24 mail kernel: [313532.862009] <IRQ> [<ffffffff81067d1b>] warn_slowpath_common+0x7b/0xc0
Apr 25 21:08:24 mail kernel: [313532.862018] [<ffffffff81067dc1>] warn_slowpath_fmt+0x41/0x50
Apr 25 21:08:24 mail kernel: [313532.862022] [<ffffffff814936f2>] dev_watchdog+0x262/0x270
Apr 25 21:08:24 mail kernel: [313532.862026] [<ffffffff8104b70f>] ? enqueue_task+0x5f/0x70
Apr 25 21:08:24 mail kernel: [313532.862030] [<ffffffff8104ce4c>] ? resched_task+0x2c/0x90
Apr 25 21:08:24 mail kernel: [313532.862033] [<ffffffff81493490>] ? dev_watchdog+0x0/0x270
Apr 25 21:08:24 mail kernel: [313532.862037] [<ffffffff810787db>] run_timer_softirq+0x19b/0x340
Apr 25 21:08:24 mail kernel: [313532.862042] [<ffffffff8106f3b7>] __do_softirq+0xb7/0x1f0
Apr 25 21:08:24 mail kernel: [313532.862046] [<ffffffff810142ac>] call_softirq+0x1c/0x30
Apr 25 21:08:24 mail kernel: [313532.862048] <EOI> [<ffffffff81015c75>] do_softirq+0x65/0xa0
Apr 25 21:08:24 mail kernel: [313532.862053] [<ffffffff8106ee60>] ksoftirqd+0x80/0x110
Apr 25 21:08:24 mail kernel: [313532.862056] [<ffffffff8106ede0>] ? ksoftirqd+0x0/0x110
Apr 25 21:08:24 mail kernel: [313532.862060] [<ffffffff810862d6>] kthread+0x96/0xa0
Apr 25 21:08:24 mail kernel: [313532.862064] [<ffffffff810141aa>] child_rip+0xa/0x20
Apr 25 21:08:24 mail kernel: [313532.862067] [<ffffffff81086240>] ? kthread+0x0/0xa0
Apr 25 21:08:24 mail kernel: [313532.862070] [<ffffffff810141a0>] ? child_rip+0x0/0x20
Apr 25 21:08:24 mail kernel: [313532.862072] ---[ end trace 004141e95911ce82 ]---
Apr 25 21:08:24 mail kernel: [313532.990994] bnx2: eth1 NIC Copper Link is Down

and the iSCSI device stops responding, causing a lot of following SCSI and ext4 errors.

linux 2.6.32-46.107 only had a few changes. The one that seems most likely to be related is 267a6bfc746dfb906076ba6634007a1f4d29848a. I'm going to try rolling back to 2.6.32-46.105, and if that fixes the bug I'll try to build a kernel with that commit backed out.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-46-server 2.6.32-46.108
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.32-46.108-server 2.6.32.60+drm33.26
Uname: Linux 2.6.32-46-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Fri Apr 26 08:03:58 2013
Frequency: Once a week.
InstallationMedia: Ubuntu-Server 10.04.4 LTS "Lucid Lynx" - Release amd64 (20120214.2)
MachineType: HP ProLiant DL380 G5
PciMultimedia:
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-46-server root=UUID=0d239486-7826-43d6-a5dc-ed1e2377ad2b ro quiet
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 05/02/2011
dmi.bios.vendor: HP
dmi.bios.version: P56
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP56:bd05/02/2011:svnHP:pnProLiantDL380G5:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL380 G5
dmi.sys.vendor: HP
+ ---
+ AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
+ AplayDevices: Error: [Errno 2] No such file or directory
+ Architecture: amd64
+ ArecordDevices: Error: [Errno 2] No such file or directory
+ CurrentDmesg:
+  [ 17.654167] eth0: no IPv6 routers present
+  [ 17.912468] eth1: no IPv6 routers present
+ DistroRelease: Ubuntu 10.04
+ Frequency: Once every few days.
+ InstallationMedia: Ubuntu-Server 10.04.4 LTS "Lucid Lynx" - Release amd64 (20120214.2)
+ MachineType: HP ProLiant DL380 G5
+ Package: linux (not installed)
+ PciMultimedia:
+
+ ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-46-server root=UUID=0d239486-7826-43d6-a5dc-ed1e2377ad2b ro quiet
+ ProcEnviron:
+  PATH=(custom, no user)
+  LANG=en_CA.UTF-8
+  SHELL=/bin/bash
+ ProcVersionSignature: Ubuntu 2.6.32-46.105-server 2.6.32.60+drm33.26
+ Regression: Yes
+ Reproducible: No
+ Tags: lucid networking regression-update needs-upstream-testing
+ Uname: Linux 2.6.32-46-server x86_64
+ UserGroups:
+
+ dmi.bios.date: 05/02/2011
+ dmi.bios.vendor: HP
+ dmi.bios.version: P56
+ dmi.chassis.type: 23
+ dmi.chassis.vendor: HP
+ dmi.modalias: dmi:bvnHP:bvrP56:bd05/02/2011:svnHP:pnProLiantDL380G5:pvr:cvnHP:ct23:cvr:
+ dmi.product.name: ProLiant DL380 G5
+ dmi.sys.vendor: HP

** Attachment added: "BootDmesg.txt"
   https://bugs.edge.launchpad.net/bugs/1173253/+attachment/3670407/+files/BootDmesg.txt

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1173253

Title:
  bnx2 transmit queue timeouts with linux 2.6.32-46.107

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1173253/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs