Re: rdump stuck in sbwait state (RELENG_7)

2008-12-30 Thread Peter Jeremy
On 2008-Dec-29 20:28:41 -0500, Terry Kennedy  wrote:
>  I upgraded a box (Dell Poweredge 1550, dual PIII processors) from a kernel +
>world of December 8th to one from today (December 29th) and I am experiencing
>a new problem with rdump.
...
>  A tcpdump on both the sending and receiving systems shows no packets
>between them from the rdump processes. However, I can rshell both ways
>and get the expected output, so the link isn't down.

This is probably the critical piece of information - the TCP connection
has stopped transferring data for some reason and the rdump is blocked
waiting to send.

Unfortunately, you need the last packets that were exchanged in order
to identify which end has the problem (and hopefully provide some
pointers as to why).  If possible, can you repeat the dump whilst you
run a tcpdump on the rdump flow and then post the last dozen or so
packets in each direction.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock

2008-12-30 Thread pluknet
While debugging I noticed that sys/netgraph/ng_base.c#rev1.131
was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.

But this depends on sys/kern/subr_witness.c#rev1.227, which was
not MFCed, and that triggers the panic (in subj) if the kernel is built
with WITNESS.

-- 
wbr,
pluknet
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock

2008-12-30 Thread pluknet
2008/12/30 Alexander Motin :
> pluknet wrote:
>> While debugging I noticed that sys/netgraph/ng_base.c#rev1.131
>> was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.
>>
>> But this depends on sys/kern/subr_witness.c#rev1.227, which was
>> not MFCed, and that triggers the panic (in subj) if the kernel is built
>> with WITNESS.
>
> Merged.
>
> --
> Alexander Motin
>

many thanks!

-- 
wbr,
pluknet


Re: panic: lock (ng_worklist) sleep mutex does not match earlier (spin mutex) lock

2008-12-30 Thread Alexander Motin
pluknet wrote:
> While debugging I noticed that sys/netgraph/ng_base.c#rev1.131
> was MFCed to RELENG_6 in between 6.3 and 6.4 by mav as 1.102.2.15.
>
> But this depends on sys/kern/subr_witness.c#rev1.227, which was
> not MFCed, and that triggers the panic (in subj) if the kernel is built
> with WITNESS.

Merged.

-- 
Alexander Motin


Re: rdump stuck in sbwait state (RELENG_7)

2008-12-30 Thread Terry Kennedy
> Unfortunately, you need the last packets that were exchanged in order
> to identify which end has the problem (and hopefully provide some
> pointers as to why).  If possible, can you repeat the dump whilst you
> run a tcpdump on the rdump flow and then post the last dozen or so
> packets in each direction.

  That could be pretty unpleasant - this happens at a random point while
dumping 4GB or so. If I have to, I'll do it but I was hoping there was
a better way.

  Shouldn't this get torn down by a keepalive at some point? It has been
sitting for 9 hours or so at this point...

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA


Re: rdump stuck in sbwait state (RELENG_7)

2008-12-30 Thread Peter Jeremy
On 2008-Dec-30 05:48:26 -0500, Terry Kennedy  wrote:
>> Unfortunately, you need the last packets that were exchanged in order
>> to identify which end has the problem (and hopefully provide some
>> pointers as to why).  If possible, can you repeat the dump whilst you
>> run a tcpdump on the rdump flow and then post the last dozen or so
>> packets in each direction.
>
>  That could be pretty unpleasant - this happens at a random point while
>dumping 4GB or so. If I have to, I'll do it but I was hoping there was
>a better way.

Sorry, I can't think of any - by the time you see it hung, whatever
went wrong has already happened.  You might glean some insight from
the TCP socket state (on the FreeBSD side, use 'netstat -A' to print
the PCB address and gdb to dump the contents but I'm not sure how to
get this data out of OpenVMS).  The '-C' and '-W' options to tcpdump
will help.
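
Since only the last packets before the stall matter, one way to make a long capture less painful is to keep just the tail of the conversation. A minimal Python sketch, assuming tcpdump's text output (for example from `tcpdump -l -n`) is fed in line by line and that each packet is a single line containing `IP SRC > DST:`. The function name and the default of twelve packets per direction are illustrative and do not come from this thread:

```python
import re
from collections import defaultdict, deque

# Matches the "IP SRC > DST:" part of a tcpdump text line.
FLOW_RE = re.compile(r"IP (\S+) > (\S+):")

def last_packets(lines, keep=12):
    """Return the last `keep` lines seen for each (src, dst) direction."""
    # A bounded deque silently discards the oldest line once it is full,
    # so memory use stays constant however long the dump runs.
    tails = defaultdict(lambda: deque(maxlen=keep))
    for line in lines:
        match = FLOW_RE.search(line)
        if match:
            tails[match.groups()].append(line.rstrip("\n"))
    return {flow: list(tail) for flow, tail in tails.items()}
```

Passing `sys.stdin` (or an open text file of saved tcpdump output) as `lines` yields a dictionary keyed by direction, each holding at most `keep` of the most recent packets.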

>  Shouldn't this get torn down by a keepalive at some point? It has been
>sitting for 9 hours or so at this point...

On FreeBSD, keepalives are off by default.  You can change the
default with sysctl net.inet.tcp.always_keepalive but I think that
only affects new connections.
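
Independently of the system-wide default, an application can request keepalives on its own socket with the standard `SO_KEEPALIVE` option. A minimal Python sketch of that call follows. It only illustrates the socket option and says nothing about whether rdump or the remote tape server actually sets it:

```python
import socket

def make_keepalive_socket():
    """Create a TCP socket with keepalive probes requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe the peer when the connection sits idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

sock = make_keepalive_socket()
# Read the option back to confirm the kernel accepted it.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

How long an idle connection waits before the first probe is still decided by the operating system's keepalive timers, not by this call.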

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: rdump stuck in sbwait state (RELENG_7)

2008-12-30 Thread Andy Kosela
I'm pretty sure it's caused by FreeBSD.  It can very well be related to
PR 117603, a real nasty dump(8) bug that was introduced in 7.0 on SMP
systems.  But it should have been patched back in March by this:

jeff 2008-03-13 00:46:12 UTC

FreeBSD src repository

Modified files:
sys/kern subr_sleepqueue.c
Log:
PR 117603
- Close a sleepqueue signal race by interlocking with the per-process
spinlock. This was mistakenly omitted from the thread_lock patch and
has been a race since.

MFC After: 1 week
PR: bin/117603
Reported by: Danny Braniss 

Revision Changes Path
1.48 +5 -2 src/sys/kern/subr_sleepqueue.c

So I'm real surprised it shows up again. We got a pretty large backup
environment with dump(8) being a critical element of it.  I just hope
the problem will be resolved before 7.1-RELEASE hits the streets.

Terry, please file a bug report on this and get in touch with iedowse@
who was implementing the aforementioned patch.

Andy Kosela


Re: SATA hotplug and AHCI

2008-12-30 Thread Bruce M. Simpson

Andrey V. Elsukov wrote:
> ...
> Linux's libata driver has a quirk for VIA AHCI:
>
> /* vt8251 doesn't clear BSY on signature FIS reception,
>  * request follow-up softreset.
>  */
>
> If I understand it right, it issues a softreset for VIA controllers just
> after the hardreset, and after the softreset it tries to read the device
> signature.
>
> FreeBSD CURRENT has similar code, but it is disabled by default.
> You can try installing CURRENT and rebuilding the ata_ahci driver with
> the AHCI_PM option.
>
> Maybe it will help.

I'm glad this came up. When I asked a few weeks ago about SATA hotplug
support, I was asking because of a board with a VIA SATA controller I
was planning to add drives to, on a JBOD basis.

Perhaps this hack can be backported to 7.x to actually make VIA
controllers useful?

P.S. VIA's SATA RAID BIOS is a pile of poop, don't bother using VIA for
RAID.

cheers
BMS



Re: SATA hotplug and AHCI

2008-12-30 Thread David Ehrmann

Bruce M. Simpson wrote:
> Andrey V. Elsukov wrote:
>> ...
>> Linux's libata driver has a quirk for VIA AHCI:
>>
>> /* vt8251 doesn't clear BSY on signature FIS reception,
>>  * request follow-up softreset.
>>  */
>>
>> If I understand it right, it issues a softreset for VIA controllers just
>> after the hardreset, and after the softreset it tries to read the device
>> signature.
>>
>> FreeBSD CURRENT has similar code, but it is disabled by default.
>> You can try installing CURRENT and rebuilding the ata_ahci driver with
>> the AHCI_PM option.
>>
>> Maybe it will help.
>
> I'm glad this came up. When I asked a few weeks ago about SATA hotplug
> support, I was asking because of a board with a VIA SATA controller I
> was planning to add drives to, on a JBOD basis.
>
> Perhaps this hack can be backported to 7.x to actually make VIA
> controllers useful?

I'm *probably* going to wait for the next release and hope they enable
the fix.  Having to run atacontrol attach/detach is a little annoying,
but it seems to work, so for now, I might just say that's good enough.

> P.S. VIA's SATA RAID BIOS is a pile of poop, don't bother using VIA
> for RAID.

I'd say the entire BIOS is.  I had problems getting it to detect boot
devices for the F11 boot menu.  They were more or less resolved after
rebooting (so the hardware was no longer new), but still...



TCP packet out-of-order problem

2008-12-30 Thread Lin Jui-Nan Eric
Dear listers,

We recently found that our new FreeBSD server (located in a foreign
region) has poor network performance. After some tcpdump and
iperf testing, we found that out-of-order TCP packets are not inserted
into the reassembly queue.

This is a 100 Mbps line, and TSO is disabled.

% uname -a
FreeBSD bsd 7.1-RC2 FreeBSD 7.1-RC2 #2: Wed Dec 31 03:12:39 CST 2008
  r...@bsd:/usr/obj/usr/src/sys/KERNEL  amd64

% iperf -c 10.1.1.250

Client connecting to office, TCP port 5001
TCP window size: 3.07 MByte (default)

[  4] local 10.1.1.210 port 61488 connected with 10.1.1.250 port 5001
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.2 sec  5.74 MBytes  4.74 Mbits/sec

03:47:21.146397 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 159305:160753(1448) ack 1 win 1040
03:47:21.146409 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 160753 win 12568
03:47:21.146473 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 160753:162201(1448) ack 1 win 1040
03:47:21.146485 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 162201 win 12568
03:47:21.146972 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 163649:165097(1448) ack 1 win 1040
03:47:21.146983 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 162201 win 12573
03:47:21.146985 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 162201:163649(1448) ack 1 win 1040
03:47:21.146996 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12568
03:47:21.146998 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 165097:166545(1448) ack 1 win 1040
03:47:21.147006 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12573
03:47:21.147009 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 166545:167993(1448) ack 1 win 1040
03:47:21.147017 IP 10.1.1.250.5001 > 10.1.1.210.54919: . ack 163649 win 12573
03:47:21.147019 IP 10.1.1.210.54919 > 10.1.1.250.5001: . 167993:169441(1448) ack 1 win 1040

* You can see "ack 163649" repeating, even though segment 163649:165097
had already been transmitted (it arrived ahead of 162201:163649).
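
The acknowledgement pattern in the trace can be checked with a small simulation. The Python sketch below is a simplified model, not the kernel's TCP code: it computes the cumulative ACK a receiver would send after each segment, once with out-of-order segments queued for reassembly and once with them discarded. The segment ranges and observed ACKs are the first six data segments from the trace and the ACKs that follow them. The function name and the `queue_out_of_order` flag are hypothetical:

```python
def cumulative_acks(segments, start, queue_out_of_order=True):
    """Return the cumulative ACK sent after each (start, end) segment."""
    next_seq = start
    held = {}  # start -> end of segments that arrived ahead of next_seq
    acks = []
    for seg_start, seg_end in segments:
        if seg_start == next_seq:
            next_seq = seg_end
            # Drain queued segments that have now become contiguous.
            while next_seq in held:
                next_seq = held.pop(next_seq)
        elif seg_start > next_seq and queue_out_of_order:
            held[seg_start] = seg_end
        acks.append(next_seq)
    return acks

# Data segments in arrival order, and the ACK seen after each one.
TRACE = [
    (159305, 160753), (160753, 162201), (163649, 165097),
    (162201, 163649), (165097, 166545), (166545, 167993),
]
OBSERVED_ACKS = [160753, 162201, 162201, 163649, 163649, 163649]

with_queue = cumulative_acks(TRACE, 159305, queue_out_of_order=True)
without_queue = cumulative_acks(TRACE, 159305, queue_out_of_order=False)
```

Under this model, `without_queue` reproduces `OBSERVED_ACKS`, while `with_queue` jumps to 165097 as soon as 162201:163649 fills the gap. That is consistent with the early segment being dropped rather than queued.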

% cat /etc/sysctl.conf
# $FreeBSD: src/etc/sysctl.conf,v 1.8 2003/03/13 18:43:50 mux Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
debug.bootverbose=1
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
kern.maxprocperuid=65536
net.inet.ip.fastforwarding=1
net.inet.tcp.delayed_ack=0
vm.pmap.shpgperproc=2000
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968

Is our configuration wrong? Or is it a known bug? I have searched the
stable and net lists, but found no similar discussion.
Thank you all in advance!


7.1-RC2 : ACPI warning and errors ACPI Error (psparse-0626)

2008-12-30 Thread Bernard Dugas

Hello,

With 7.1-RC2 :

Dec 30 18:10:38 client1 kernel: FreeBSD 7.1-RC2 #0: Tue Dec 23 11:42:13 UTC 2008
Dec 30 18:10:38 client1 kernel: r...@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Dec 30 18:10:38 client1 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 30 18:10:38 client1 kernel: CPU: Intel(R) Core(TM)2 CPU 4300  @ 1.80GHz (1800.01-MHz K8-class CPU)
Dec 30 18:10:38 client1 kernel: Origin = "GenuineIntel"  Id = 0x6f2 Stepping = 2


I have found the following ACPI warning and errors:

Dec 30 18:10:38 client1 kernel: cpu0:  on acpi0
Dec 30 18:10:38 client1 kernel: ACPI Warning (tbutils-0243): Incorrect checksum in table [ASF!] -  77, should be 32 [20070320]
Dec 30 18:10:38 client1 kernel: ACPI Error (psparse-0626): Method parse/execution failed [\_PR_.CPU0._OSC] (Node 0xff0001264aa0), AE_ALREADY_EXISTS
Dec 30 18:10:38 client1 kernel: est0: Control> on cpu0
Dec 30 18:10:38 client1 kernel: p4tcc0:  on cpu0

Dec 30 18:10:38 client1 kernel: cpu1:  on acpi0
Dec 30 18:10:38 client1 kernel: ACPI Error (psparse-0626): Method parse/execution failed [\_PR_.CPU1._OSC] (Node 0xff0001264a00), AE_ALREADY_EXISTS
Dec 30 18:10:38 client1 kernel: est1: Control> on cpu1
Dec 30 18:10:38 client1 kernel: p4tcc1:  on cpu1
Dec 30 18:10:38 client1 kernel: acpi_hpet0:  iomem 0xfed0-0xfed003ff on acpi0

Dec 30 18:10:38 client1 kernel: device_attach: acpi_hpet0 attach returned 12

I don't see any direct wrong behaviour on the system.
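
For context on the tbutils warning: an ACPI table passes its checksum test when all of its bytes, including the checksum byte at offset 9 of the standard table header, sum to zero modulo 256. The two numbers in the warning appear to be the stored checksum byte and the value that would make the sum come out to zero. A minimal Python sketch of that rule follows. It uses a made-up 36-byte table, not the real [ASF!] contents, and the helper names are hypothetical:

```python
CHECKSUM_OFFSET = 9  # position of the checksum byte in the standard header

def table_sum(table):
    """Sum of every byte in the table, modulo 256 (0 means valid)."""
    return sum(table) % 256

def expected_checksum(table):
    """Checksum byte that would make the whole table sum to zero."""
    stored = table[CHECKSUM_OFFSET]
    return (stored - table_sum(table)) % 256

# Hypothetical table: a signature followed by zero padding, with a
# deliberately wrong checksum byte stored at offset 9.
table = bytearray(b"ASF!" + bytes(32))
table[CHECKSUM_OFFSET] = 0x77
fixed = expected_checksum(table)
table[CHECKSUM_OFFSET] = fixed
```

After the stored byte is replaced with `fixed`, `table_sum(table)` returns 0, which is the condition the firmware's table failed to meet.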

Is anybody interested in more details ?

Best regards,
--
Bernard DUGAS Mobile +33 615 333 770


Re: rdump stuck in sbwait state (RELENG_7)

2008-12-30 Thread Terry Kennedy
> I'm pretty sure it's caused by FreeBSD.  It can very well be related to
> PR 117603, a real nasty dump(8) bug that was introduced in 7.0 on SMP
> systems.  But it should have been patched back in March by this:
[...]
> So I'm real surprised it shows up again. We got a pretty large backup
> environment with dump(8) being a critical element of it.  I just hope
> the problem will be resolved before 7.1-RELEASE hits the streets.
>
> Terry, please file a bug report on this and get in touch with iedowse@
> who was implementing the aforementioned patch.

  I don't think my hang is related to that problem - mine seems to be in
the TCP code while that problem seems to be in the kernel / filesystem
code (or at least that's what I recall of it from prior discussions).

  Plus, my problem just showed up in a recent build. The last time
subr_sleepqueue was touched seems to have been back in September.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA


7.1RC2 - Sendmail : Segmentation fault (core dumped)

2008-12-30 Thread Peter Sprokkelenburg

I had 7.0 installed and did a binary upgrade to 7.1RC2.

Everything seemed okay until I went to check my mailq and got:

Segmentation fault (core dumped)

I am running Postfix as my mail server and it does not seem to be
affected.

I can't even run sendmail, as I get the same error.


--
Peter Sprokkelenburg
mailto:pet...@netreconsys.com





Lock enabling onboard lan (Attansic L1 GbE) on 7.1-PRERELEASE

2008-12-30 Thread Barbara
Hello,
one of my motherboards has an onboard Attansic network interface, I
think an AR8121.

# pciconf -lcv
no...@pci0:4:0:0:   class=0x02 card=0x82261043 chip=0x10481969 rev=0xb0 hdr=0x00
    vendor     = 'Attansic (Now owned by Atheros)'
    device     = 'L1 Gigabit Ethernet 10/100/1000Base-T Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 2  supports D0 D3  current D0
    cap 05[48] = MSI supports 1 message, 64 bit
    cap 10[58] = PCI-Express 1 endpoint
    cap 03[6c] = VPD


Today I decided to give it a try.
But if I try loading the if_age module, the system prints the following
lines and then it freezes.

age0:  mem 0xfbdc-0xfbdf irq 36 at device 0.0 on pci4
age0: PCI device revision : 0x00b0
age0: Chip id/revision : 0x9006
age0: 1280 Tx FIFO, 2364 Rx FIFO
age0: MSIX count : 0
age0: MSI count : 1
age0: Using 1 MSI messages.
age0: Read request size : 512 bytes.
age0: TLP payload size : 128 bytes.




Re: 7.1RC2 - Sendmail : Segmentation fault (core dumped)

2008-12-30 Thread Garrett Cooper
On Dec 30, 2008, at 20:16, Peter Sprokkelenburg wrote:

> I had 7.0 installed and did a binary upgrade to 7.1RC2.
>
> Everything seemed okay until I went to check my mailq and got:
>
> Segmentation fault (core dumped)
>
> I am running Postfix as my mail server and it does not seem to be
> affected.
>
> I can't even run sendmail, as I get the same error.


Could you run truss on the process please and attach the log?
Thanks,
-Garrett