Re: rdump stuck in sbwait state (RELENG_7)
On 2008-Dec-29 20:28:41 -0500, Terry Kennedy wrote:
> I upgraded a box (Dell Poweredge 1550, dual PIII processors) from a kernel +
> world of December 8th to one from today (December 29th) and I am experiencing
> a new problem with rdump.
...
> A tcpdump on both the sending and receiving systems shows no packets
> between them from the rdump processes. However, I can rshell both ways
> and get the expected output, so the link isn't down.

This is probably the critical piece of information - the TCP connection has stopped transferring data for some reason and the rdump is blocked waiting to send.

Unfortunately, you need the last packets that were exchanged in order to identify which end has the problem (and hopefully provide some pointers as to why). If possible, can you repeat the dump whilst you run a tcpdump on the rdump flow and then post the last dozen or so packets in each direction.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock
While debugging I noticed that sys/netgraph/ng_base.c#rev1.131 was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.

But this depends on sys/kern/subr_witness.c#rev1.227, which was not MFCed, and that triggers the panic (in the subject) if the kernel is built with WITNESS.

--
wbr, pluknet
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock
2008/12/30 Alexander Motin:
> pluknet wrote:
>> While debugging I noticed that sys/netgraph/ng_base.c#rev1.131
>> was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.
>>
>> But this depends on sys/kern/subr_witness.c#rev1.227 which was
>> not MFCed, and that is triggering panic (in subj) if kernel is built
>> with WITNESS.
>
> Merged.
>
> --
> Alexander Motin

Many thanks!

--
wbr, pluknet
Re: panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock
pluknet wrote:
> While debugging I noticed that sys/netgraph/ng_base.c#rev1.131
> was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.
>
> But this depends on sys/kern/subr_witness.c#rev1.227 which was
> not MFCed, and that is triggering panic (in subj) if kernel is built
> with WITNESS.

Merged.

--
Alexander Motin
Re: rdump stuck in sbwait state (RELENG_7)
> Unfortunately, you need the last packets that were exchanged in order
> to identify which end has the problem (and hopefully provide some
> pointers as to why). If possible, can you repeat the dump whilst you
> run a tcpdump on the rdump flow and then post the last dozen or so
> packets in each direction.

That could be pretty unpleasant - this happens at a random point while dumping 4GB or so. If I have to, I'll do it but I was hoping there was a better way.

Shouldn't this get torn down by a keepalive at some point? It has been sitting for 9 hours or so at this point...

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
Re: rdump stuck in sbwait state (RELENG_7)
On 2008-Dec-30 05:48:26 -0500, Terry Kennedy wrote:
>> Unfortunately, you need the last packets that were exchanged in order
>> to identify which end has the problem (and hopefully provide some
>> pointers as to why). If possible, can you repeat the dump whilst you
>> run a tcpdump on the rdump flow and then post the last dozen or so
>> packets in each direction.
>
> That could be pretty unpleasant - this happens at a random point while
> dumping 4GB or so. If I have to, I'll do it but I was hoping there was
> a better way.

Sorry, I can't think of any - by the time you see it hung, whatever went wrong has already happened. You might glean some insight from the TCP socket state (on the FreeBSD side, use 'netstat -A' to print the PCB address and gdb to dump the contents, but I'm not sure how to get this data out of OpenVMS). The '-C' and '-W' options to tcpdump will help: together they rotate the capture through a fixed number of size-limited files, so a long capture stays bounded.

> Shouldn't this get torn down by a keepalive at some point? It has been
> sitting for 9 hours or so at this point...

On FreeBSD, keepalives are off by default. You can change the default with sysctl net.inet.tcp.always_keepalive but I think that only affects new connections.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
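Independently of the system-wide sysctl mentioned above, an application can ask for keepalive probes on an individual TCP socket with the standard SO_KEEPALIVE socket option. The Python sketch below only illustrates that option; rdump is a C program and nothing in this thread says whether it sets SO_KEEPALIVE, and the helper name make_keepalive_socket is made up for the example.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with keepalive requested (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_KEEPALIVE asks the kernel to probe an idle connection, so that a
    # dead peer is eventually noticed and the connection is torn down.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    # A non-zero value means keepalive is enabled on this socket.
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

How long an idle connection waits before the first probe is a kernel setting, so this alone does not say when a hung transfer would be dropped.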
Re: rdump stuck in sbwait state (RELENG_7)
I'm pretty sure it's caused by FreeBSD. It can very well be related to PR 117603, a real nasty dump(8) bug that was introduced in 7.0 on SMP systems. But it should have been patched back in March by this:

  jeff        2008-03-13 00:46:12 UTC

  FreeBSD src repository

  Modified files:
    sys/kern             subr_sleepqueue.c
  Log:
  PR 117603
   - Close a sleepqueue signal race by interlocking with the per-process
     spinlock. This was mistakenly omitted from the thread_lock patch and
     has been a race since.

  MFC After:    1 week
  PR:           bin/117603
  Reported by:  Danny Braniss

  Revision  Changes    Path
  1.48      +5 -2      src/sys/kern/subr_sleepqueue.c

So I'm real surprised it shows up again. We have a pretty large backup environment with dump(8) being a critical element of it. I just hope the problem will be resolved before 7.1-RELEASE hits the streets.

Terry, please file a bug report on this and get in touch with iedowse@ who was implementing the aforementioned patch.

Andy Kosela
Re: SATA hotplug and AHCI
Andrey V. Elsukov wrote:
> ...
> Linux's libata driver has a quirk for VIA AHCI:
>   /* vt8251 doesn't clear BSY on signature FIS reception,
>    * request follow-up softreset.
>    */
> If i right understand it issues softreset for VIA controllers
> just after hardreset. And after softreset it is trying to read device
> signature. FreeBSD CURRENT has similar code, but it is disabled by default.
> You can try install CURRENT and rebuild ata_ahci driver with AHCI_PM option.
> May be it will help..

I'm glad this came up. When I asked a few weeks ago about SATA hotplug support, I was asking because of a board with a VIA SATA controller I was planning to add drives to, on a JBOD basis.

Perhaps this hack can be backported to 7.x to actually make VIA controllers useful?

P.S. VIA's SATA RAID BIOS is a pile of poop, don't bother using VIA for RAID.

cheers
BMS
Re: SATA hotplug and AHCI
Bruce M. Simpson wrote:
> Andrey V. Elsukov wrote:
>> ...
>> Linux's libata driver has a quirk for VIA AHCI:
>>   /* vt8251 doesn't clear BSY on signature FIS reception,
>>    * request follow-up softreset.
>>    */
>> If i right understand it issues softreset for VIA controllers
>> just after hardreset. And after softreset it is trying to read device
>> signature. FreeBSD CURRENT has similar code, but it is disabled by default.
>> You can try install CURRENT and rebuild ata_ahci driver with AHCI_PM option.
>> May be it will help..
>
> I'm glad this came up. When I asked a few weeks ago about SATA Hotplug
> support, I was asking because of a board with a VIA SATA controller I
> was planning to add drives too, on a JBOD basis.
>
> Perhaps this hack can be backported to 7.x to actually make VIA
> controllers useful?

I'm *probably* going to wait for the next release and hope they enable the fix. Having to run atacontrol attach/detach is a little annoying, but it seems to work, so for now, I might just say that's good enough.

> P.S. VIA's SATA RAID BIOS is a pile of poop, don't bother using VIA for
> RAID.

I'd say the entire BIOS is. I had problems getting it to detect boot devices for the F11 boot menu. They were more or less resolved after rebooting (so the hardware was no longer new), but still...
TCP packet out-of-order problem
Dear listers,

We recently found that our new FreeBSD server (located in some foreign region) has poor network performance. After doing some tcpdump and iperf testing, we found that out-of-order TCP packets are not inserted into the queue.

This is a 100Mbps line, and TSO is disabled.

% uname -a
FreeBSD bsd 7.1-RC2 FreeBSD 7.1-RC2 #2: Wed Dec 31 03:12:39 CST 2008 r...@bsd:/usr/obj/usr/src/sys/KERNEL amd64

% iperf -c 10.1.1.250
Client connecting to office, TCP port 5001
TCP window size: 3.07 MByte (default)
[  4] local 10.1.1.210 port 61488 connected with 10.1.1.250 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.2 sec  5.74 MBytes  4.74 Mbits/sec

03:47:21.146397 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 159305:160753(1448) ack 1 win 1040
03:47:21.146409 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 160753 win 12568
03:47:21.146473 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 160753:162201(1448) ack 1 win 1040
03:47:21.146485 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 162201 win 12568
03:47:21.146972 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 163649:165097(1448) ack 1 win 1040
03:47:21.146983 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 162201 win 12573
03:47:21.146985 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 162201:163649(1448) ack 1 win 1040
03:47:21.146996 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12568
03:47:21.146998 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 165097:166545(1448) ack 1 win 1040
03:47:21.147006 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12573
03:47:21.147009 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 166545:167993(1448) ack 1 win 1040
03:47:21.147017 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12573
03:47:21.147019 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 167993:169441(1448) ack 1 win 1040

* You can see "ack 163649" repeating, even though segment 163649:165097 had already been transmitted (it arrived before 162201:163649).

% cat /etc/sysctl.conf
# $FreeBSD: src/etc/sysctl.conf,v 1.8 2003/03/13 18:43:50 mux Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
debug.bootverbose=1
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
kern.maxprocperuid=65536
net.inet.ip.fastforwarding=1
net.inet.tcp.delayed_ack=0
vm.pmap.shpgperproc=2000
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968

Is our configuration wrong? Or is it a known bug? I have searched the stable and net lists, but found no similar discussion.

Thank you all in advance!
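The trace in the message above can be checked mechanically. The sketch below is only an illustration and is not FreeBSD's TCP code. It replays the data segments in arrival order through a toy receiver and computes the cumulative ACK twice: once keeping out-of-order segments in a reassembly queue, and once discarding them. The function name cumulative_acks and the tuple layout are made up for this example; the sequence ranges and ACK numbers are copied from the tcpdump output.

```python
from typing import Dict, List, Tuple

# (start, end) sequence ranges of the data segments, in arrival order,
# copied from the tcpdump output. The last data segment in the trace
# (167993:169441) is left out because no ACK for it is shown.
SEGMENTS: List[Tuple[int, int]] = [
    (159305, 160753),
    (160753, 162201),
    (163649, 165097),  # arrives before 162201:163649, i.e. out of order
    (162201, 163649),
    (165097, 166545),
    (166545, 167993),
]

# ACK numbers sent by 10.1.1.250 after each of the segments above.
OBSERVED: List[int] = [160753, 162201, 162201, 163649, 163649, 163649]


def cumulative_acks(segments: List[Tuple[int, int]], first_seq: int,
                    keep_out_of_order: bool) -> List[int]:
    """Return the ACK number a toy receiver sends after each segment."""
    expected = first_seq
    queue: Dict[int, int] = {}  # start -> end of buffered segments
    acks: List[int] = []
    for start, end in segments:
        if start == expected:
            expected = end
            # Drain buffered segments that have become contiguous.
            while expected in queue:
                expected = queue.pop(expected)
        elif start > expected and keep_out_of_order:
            queue[start] = end
        # Anything else (old data, or a gap while not buffering) is ignored.
        acks.append(expected)
    return acks


if __name__ == "__main__":
    with_queue = cumulative_acks(SEGMENTS, 159305, keep_out_of_order=True)
    without_queue = cumulative_acks(SEGMENTS, 159305, keep_out_of_order=False)
    print("with reassembly queue:   ", with_queue)
    print("without reassembly queue:", without_queue)
    print("observed in the trace:   ", OBSERVED)
    print("trace matches 'without': ", without_queue == OBSERVED)
```

In this toy model the ACKs in the trace match the variant that discards out-of-order segments. That is consistent with the poster's reading, though it says nothing about why the receiver would behave that way.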
7.1-RC2 : ACPI warning and errors ACPI Error (psparse-0626)
Hello,

With 7.1-RC2:

Dec 30 18:10:38 client1 kernel: FreeBSD 7.1-RC2 #0: Tue Dec 23 11:42:13 UTC 2008
Dec 30 18:10:38 client1 kernel: r...@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Dec 30 18:10:38 client1 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 30 18:10:38 client1 kernel: CPU: Intel(R) Core(TM)2 CPU 4300 @ 1.80GHz (1800.01-MHz K8-class CPU)
Dec 30 18:10:38 client1 kernel: Origin = "GenuineIntel" Id = 0x6f2 Stepping = 2

I found the following ACPI warning and errors:

Dec 30 18:10:38 client1 kernel: cpu0: on acpi0
Dec 30 18:10:38 client1 kernel: ACPI Warning (tbutils-0243): Incorrect checksum in table [ASF!] - 77, should be 32 [20070320]
Dec 30 18:10:38 client1 kernel: ACPI Error (psparse-0626): Method parse/execution failed [\_PR_.CPU0._OSC] (Node 0xff0001264aa0), AE_ALREADY_EXISTS
Dec 30 18:10:38 client1 kernel: est0: Control> on cpu0
Dec 30 18:10:38 client1 kernel: p4tcc0: on cpu0
Dec 30 18:10:38 client1 kernel: cpu1: on acpi0
Dec 30 18:10:38 client1 kernel: ACPI Error (psparse-0626): Method parse/execution failed [\_PR_.CPU1._OSC] (Node 0xff0001264a00), AE_ALREADY_EXISTS
Dec 30 18:10:38 client1 kernel: est1: Control> on cpu1
Dec 30 18:10:38 client1 kernel: p4tcc1: on cpu1
Dec 30 18:10:38 client1 kernel: acpi_hpet0: iomem 0xfed0-0xfed003ff on acpi0
Dec 30 18:10:38 client1 kernel: device_attach: acpi_hpet0 attach returned 12

I don't see any obviously wrong behaviour on the system.

Is anybody interested in more details?

Best regards,
--
Bernard DUGAS Mobile +33 615 333 770
Re: rdump stuck in sbwait state (RELENG_7)
> I'm pretty sure it's caused by FreeBSD. It can very well be related to
> PR 117603, a real nasty dump(8) bug that was introduced in 7.0 on SMP
> systems. But it should have been patched back in March by this:
[...]
> So I'm real surprised it shows up again. We got a pretty large backup
> environment with dump(8) being a critical element of it. I just hope
> the problem will be resolved before 7.1-RELEASE hit the streets.
>
> Terry, please file a bug report on this and get in touch with iedowse@
> who was implementing the aforementioned patch.

I don't think my hang is related to that problem - mine seems to be in the TCP code while that problem seems to be in the kernel / filesystem code (or at least that's what I recall of it from prior discussions).

Plus, my problem just showed up in a recent build. The last time subr_sleepqueue was touched seems to have been back in September.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
7.1RC2 - Sendmail : Segmentation fault (core dumped)
I had 7.0 installed and did a binary upgrade to 7.1RC2. Everything seemed okay until I went to check my mailq and got:

Segmentation fault (core dumped)

I am running Postfix as my mail server and it does not seem to be affected. I can't even run sendmail, as I get the same error.

--
Peter Sprokkelenburg mailto:pet...@netreconsys.com
Lock enabling onboard lan (Attansic L1 GbE) on 7.1-PRERELEASE
Hello,

one of my motherboards has an onboard Attansic network interface, I think an AR8121.

# pciconf -lcv
no...@pci0:4:0:0: class=0x02 card=0x82261043 chip=0x10481969 rev=0xb0 hdr=0x00
    vendor     = 'Attansic (Now owned by Atheros)'
    device     = 'L1 Gigabit Ethernet 10/100/1000Base-T Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 2 supports D0 D3 current D0
    cap 05[48] = MSI supports 1 message, 64 bit
    cap 10[58] = PCI-Express 1 endpoint
    cap 03[6c] = VPD

Today I decided to give it a try. But if I try loading the if_age module, the system prints the following lines and then it freezes.

age0: mem 0xfbdc-0xfbdf irq 36 at device 0.0 on pci4
age0: PCI device revision : 0x00b0
age0: Chip id/revision : 0x9006
age0: 1280 Tx FIFO, 2364 Rx FIFO
age0: MSIX count : 0
age0: MSI count : 1
age0: Using 1 MSI messages.
age0: Read request size : 512 bytes.
age0: TLP payload size : 128 bytes.
Re: 7.1RC2 - Sendmail : Segmentation fault (core dumped)
On Dec 30, 2008, at 20:16, Peter Sprokkelenburg wrote:
> I had 7.0 installed and did a binary upgrade to 7.1RC2. Everything
> seemed okay until I went to check my mailq and got:
>
> Segmentation fault (core dumped)
>
> I am running Postfix as my mail server and it does not seem to be
> affected. I can't even run sendmail, as I get the same error.

Could you run truss on the process please and attach the log?

Thanks,
-Garrett