Re: ZFS MFC heads down

2009-05-22 Thread Pertti Kosunen

Kirk Strauser wrote:
> So far so good here (amd64, Core2 Duo, ICH9 SATA) but I'm too chicken to
> upgrade the on-disk format yet.

Me too, upgraded pool to v13 yesterday and everything is still OK. I also
removed all loader.conf tunables. Many thanks to the FreeBSD team.


(Tyan Tank GT20, 2GB memory, ICH7, amd64)
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS MFC heads down

2009-05-22 Thread Alberto Villa
On Fri, May 22, 2009 at 11:45 AM, Pertti Kosunen wrote:
> Me too, upgraded pool to v13 yesterday and everything is still OK. I also
> removed all loader.conf tunables. Many thanks to the FreeBSD team.

what about i386? does it still need tunables?
-- 
Alberto Villa 


[releng_7 tinderbox] failure on amd64/amd64

2009-05-22 Thread FreeBSD Tinderbox
TB --- 2009-05-22 11:45:27 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2009-05-22 11:45:27 - starting RELENG_7 tinderbox run for amd64/amd64
TB --- 2009-05-22 11:45:27 - cleaning the object tree
TB --- 2009-05-22 11:45:48 - cvsupping the source tree
TB --- 2009-05-22 11:45:48 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s /tinderbox/RELENG_7/amd64/amd64/supfile
TB --- 2009-05-22 11:45:57 - building world
TB --- 2009-05-22 11:45:57 - MAKEOBJDIRPREFIX=/obj
TB --- 2009-05-22 11:45:57 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2009-05-22 11:45:57 - TARGET=amd64
TB --- 2009-05-22 11:45:57 - TARGET_ARCH=amd64
TB --- 2009-05-22 11:45:57 - TZ=UTC
TB --- 2009-05-22 11:45:57 - __MAKE_CONF=/dev/null
TB --- 2009-05-22 11:45:57 - cd /src
TB --- 2009-05-22 11:45:57 - /usr/bin/make -B buildworld
>>> World build started on Fri May 22 11:45:59 UTC 2009
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> stage 5.1: building 32 bit shim libraries
>>> World build completed on Fri May 22 13:19:43 UTC 2009
TB --- 2009-05-22 13:19:43 - generating LINT kernel config
TB --- 2009-05-22 13:19:43 - cd /src/sys/amd64/conf
TB --- 2009-05-22 13:19:43 - /usr/bin/make -B LINT
TB --- 2009-05-22 13:19:43 - building LINT kernel
TB --- 2009-05-22 13:19:43 - MAKEOBJDIRPREFIX=/obj
TB --- 2009-05-22 13:19:43 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2009-05-22 13:19:43 - TARGET=amd64
TB --- 2009-05-22 13:19:43 - TARGET_ARCH=amd64
TB --- 2009-05-22 13:19:43 - TZ=UTC
TB --- 2009-05-22 13:19:43 - __MAKE_CONF=/dev/null
TB --- 2009-05-22 13:19:43 - cd /src
TB --- 2009-05-22 13:19:43 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri May 22 13:19:43 UTC 2009
>>> stage 1: configuring the kernel
[...]
WARNING: kernel contains GPL contaminated emu10k1 headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated maestro3 headers
WARNING: kernel contains GPL contaminated ext2fs filesystem
WARNING: kernel contains GPL contaminated ReiserFS filesystem
WARNING: kernel contains GPL contaminated xfs filesystem
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2009-05-22 13:19:44 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2009-05-22 13:19:44 - ERROR: failed to build lint kernel
TB --- 2009-05-22 13:19:44 - 4543.37 user 512.22 system 5656.57 real


http://tinderbox.des.no/tinderbox-releng_7-RELENG_7-amd64-amd64.full


Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-22 Thread Joe Karthauser

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I 
still get the same problem. Very confusing. :(.


Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having
someone with insight into ATA provide us with help in adding
diagnostics.

Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip


On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser  wrote:

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7
server, which now doesn't boot. :(.

So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5 disks
(gmirror on 500Mb partition on each of five disks, and zraid2 over the rest
of each drive).

What I did was to update the userland, and then reboot. I didn't upgrade the
kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a LABEL
message or ZFS pool/spool message. I _can_ get it to boot if I boot single
user with acpi switched off. When I do that I can manually start zfs, and
mount all the partitions. However, one of the disks is missing... more on
that next.

The machine is running a gigabyte motherboard (domestic gamer P35 board,
similar to this
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
although it might be a DS4 variant).  I've got 5 of the 6 sata ports wired
to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3 5-1/4" bays
kind of thing).

Now, because of the gmirror I can boot the system on any disk, or
combination of plugged in disks. I should be able to succeed with the
kernel probe up to the attempt to mount the root filesystem irrespective of
any zfs pool, etc. And, indeed, this has been working fine for about two
years.

But, now it hangs in the same place no matter what disk I boot on (I've
tried every bay).

But, without ACPI enabled it does appear to boot ok... what's going on here?
Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing.
If I unplug it I get a disconnect message from the ata device, and a
reconnect and reinit attempt when I plug it back in, but no device appears
on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol
attach sata4' and the device reappears. This happens on the other buses, but
not on the last one. It's not the disk, because if I swap it into another
bay, it comes up and appears on the bus. On the other hand it doesn't appear
to be that controller or slot in the drive bay because if I unplug all the
other disks the system will boot that disk and get as far as the hang...
hmm.

Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe


Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-22 Thread Kip Macy
Motin is your best bet in tracking down ATA problems.

Cheers,
Kip


On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser  wrote:
> Hi Kip,
>
> I seriously don't understand what has happened. If I boot kernel.old I still
> get the same problem. Very confusing. :(.
>
> Joe



-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

Edmund Burke


Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching sata cables around next to see if the problem 
goes away if I disconnect some combination of bays.


Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:

Motin is your best bet in tracking down ATA problems.

Cheers,
Kip




Re: net.inet.tcp.tso=1 still neceesary with fxp was Re: TCP differences in 7.2 vs 7.1

2009-05-22 Thread Michael L. Squires



On Thu, 21 May 2009, Pyun YongHyeon wrote:

> On Wed, May 20, 2009 at 05:55:29PM -0400, Michael L. Squires wrote:
>> I started having speed problems after shifting from 7.1-STABLE to
>> 7.1-PRERELEASE.  They have continued with 7.2-STABLE.
>>
>> Reverting to the 7.1-STABLE kernel eliminated the problem.
>>
>> After downloading 7.2-STABLE from cvsup.freebsd.org at about 10:40 AM EST
>> on 5/20/2009, doing a buildworld/buildkernel/installkernel/installworld
>> cycle I still need to execute "net.inet.tcp.tso=1" to eliminate throughput
>> problems between my home system (on Comcast) and my office PC (connected
>> via a Time-Warner connection).  This also affects connections to other
>> systems; downloading Web pages (ebay.com) speeds up after I change the TSO
>> entry.
>>
>> The box in question runs NAT and has an fxp (Intel Pro100) interface
>> connected to a Comcast cable modem and an em (Intel Pro1000) interface
>> connected to the internal network.
>>
>> There are no network errors in "netstat -i" on either interface.
>>
>> The "if_fxp.c" code appears to be the May 7 version.
>
> You should have cvs rev. 1.266.2.15 of if_fxp.c.
>
>> This is the dmesg entry for the card in question.  The system is a dual Xeon
>> Supermicro 1U box, 1GB RAM, single 300GB IDE hard drive.
>>
>> fxp0:  port 0xe400-0xe43f mem
>> 0xfebfd000-0xfebfdfff,0xfeb8-0xfeb9 irq 27 at device 7.0 on pci0
>> miibus0:  on fxp0
>
> Since you use both em(4) and fxp(4) I'd like to know which driver
> has the issue. Instead of disabling TSO of network stack try
> disabling TSO for each interface. For instance,
> 1. Disable TSO of em(4) and check you see the same issue
>    (ifconfig em0 -tso).
> 2. Disable TSO of fxp(4) and check you see the same issue
>    (ifconfig fxp0 -tso).



The version of if_fxp.c is in fact 1.266.2.15.

Connecting to the FreeBSD box from a PC with a bash shell under XP 
SP3/Cygwin OpenSSH I find


(1)  disabling "tso" on the internal "em0" interface has no effect; but

(2)  disabling "tso" on the external "fxp0" interface eliminates the 
throughput problem.  The effect appears to be the same as using sysctl to 
disable tso on all interfaces.


With "tso" enabled on the "fxp0" interface the connection (reading email 
using "pine" in a large window) hung completely.


There are no errors in "netstat -i" nor in /var/log/messages.

"netstat -e" on the XP PC shows no discards or errors; however, I don't
think I've ever seen a PC under Windows admit to network errors.

The fxp0 interface connects to a Comcast cable modem, which eventually 
connects to my office PC which is in the "iga.in.gov" domain hosted by 
TimeWarner.


I'll be happy to run anything else you want.

Mike Squires
UN*X at home
Since 1985




devd panic on i386 7.2 Release with CARP

2009-05-22 Thread Ken Menzel


I am having a problem with one of my FreeBSD 7.2R boxes panicking on 
start of devd after upgrading to 7.2R.  It is an old DELL 2400 dual 
processor.  This is a build from completely refreshed sources.


- generic kernel does not panic (built by me)
- custom kernel does not panic with devd_enable="NO" set in rc.conf, and 
  I can start devd by hand at the command prompt AFTER booting!
- custom kernel (carp and more memory) does panic if devd is started 
  automatically by rc.d scripts (the default behaviour).

Do I really need devd for anything if I am not using USB?  Anyone have 
any idea of how to fix this?


My kernel config is pretty simple,  I am building a test i386 box with a 
carp kernel to try and repro this on another box, but that box is really 
slow.


After booting I just run
kes# devd
devd: Setting hw.bus.devctl_disable to 0
kes#

And it does NOT panic.  Weird, huh?

Kernel config followed by backtrace:


#
# SMP -- Generic kernel configuration file for FreeBSD/i386 SMP
#Use this for multi-processor machines
#
# $FreeBSD: src/sys/i386/conf/SMP,v 1.5.6.1 2005/09/18 03:37:58 scottl Exp $

include GENERIC

ident   WHI

# To make an SMP kernel, the next line is needed
#options SMP # Symmetric MultiProcessor Kernel
#options ASR_COMPAT

options MAXDSIZ="(1536*1024*1024)"
options MAXSSIZ="(512*1024*1024)"
options DFLDSIZ="(1536*1024*1024)"
device  carp
kes#

kgdb back trace of core here

<118>
<118>#
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting:
<118> interrupts
<118> ethernet
<118> point_to_point
<118> kickstart
<118>.
<118>swapon: adding /dev/da0s1b as swap device
<118>Fast boot: skipping disk checks.
GEOM_LABEL: Label for provider da0s1f is ufsid/3ec3641041d090a9.
<118>Setting hostuuid: 44454c4c-5a9b-1059-8057-b8c04f303031.
<118>Setting hostid: 0xd1c205d3.
<118>Mounting local file systems:
GEOM_LABEL: Label ufsid/3ec3641041d090a9 removed.
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
<118>.
<118>Setting hostname: kes.icarz.com.
<118>net.inet6.ip6.auto_linklocal:
<118>1
<118> ->
<118>0
<118>
<118>kern.maxfilesperproc:
<118>11095
<118> ->
<118>19000
<118>
<118>kern.maxfiles:
<118>12328
<118> ->
<118>2
<118>
<118>lo0: flags=8049 metric 0 mtu 16384
<118>   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
<118>   inet6 ::1 prefixlen 128
<118>   inet 127.0.0.1 netmask 0xff00
<118>fxp0: flags=8843 metric 0 mtu 1500
<118>   options=2009
<118>   ether 00:b0:d0:3e:c7:19
<118>   inet 207.99.22.32 netmask 0xff80 broadcast 207.99.22.127
<118>   media: Ethernet autoselect (100baseTX )
<118>   status: active
<118>add net default: gateway 207.99.22.1
<118>Additional routing options:
<118>.
<118>Starting devd.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address   = 0x0
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc0874488
stack pointer   = 0x28:0xf7bd0b68
frame pointer   = 0x28:0xf7bd0b68
code segment= base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 388 (devd)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 2m12s
Physical memory: 2035 MB
Dumping 68 MB: 53 37 21 5

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from 
/boot/kernel/acpi.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc07e2a07 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07e2cd9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0ae895c in trap_fatal (frame=0xf7bd0b28, eva=0)
   at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0ae8be0 in trap_pfault (frame=0xf7bd0b28, usermode=0, eva=0)
   at /usr/src/sys/i386/i386/trap.c:852
#5  0xc0ae958c in trap (frame=0xf7bd0b28) at 
/usr/src/sys/i386/i386/trap.c:530

#6  0xc0acdc9b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc0874488 in strlen (str=0x0) at /usr/src/sys/libkern/strlen.c:41
#8  0xc080a46c in devread (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
   at /usr/src/sys/kern/subr_bus.c:458
#9  0xc07a6039 in giant_read (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
   at /usr/src/sys/kern/kern_conf.c:414
#10 0xc076cecd in devfs_read_f (fp=0xc58ba260, uio=0xf7bd0c60,
   cred=0xc5470300, flags=0, td=0xc56288c0)
   at /usr/src/sys/fs/devfs/devfs_vnops.c:1007
#11 0xc081be86 in dofileread (td=0xc56288c0, fd=3, fp=0xc58ba260,

Re: devd panic on i386 7.2 Release with CARP

2009-05-22 Thread Kostik Belousov
On Fri, May 22, 2009 at 03:26:51PM -0400, Ken Menzel wrote:
> 
> I am having a problem with one of my FreeBSD 7.2R boxes panicking on 
> start of devd after upgrading to 7.2R.  It is an old DELL 2400 dual 
> processor.  This is a build from completely refreshed sources.
> 
> - generic kernel does not panic (built by me)
> - custom kernel does not panic with devd_enable="NO" set in rc.conf, and 
>   I can start devd by hand at the command prompt AFTER booting!
> - custom kernel (carp and more memory) does panic if devd is started 
>   automatically by rc.d scripts (the default behaviour). 
> 
> Do I really need devd for anything if I am not using USB?  Anyone have 
> any idea of how to fix this?
> 
> My kernel config is pretty simple,  I am building a test i386 box with a 
> carp kernel to try and repro this on another box, but that box is really 
> slow.
> 
> After booting I just run
> kes# devd
> devd: Setting hw.bus.devctl_disable to 0
> kes#
...
> <118>lo0: flags=8049 metric 0 mtu 16384
> <118>   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> <118>   inet6 ::1 prefixlen 128
> <118>   inet 127.0.0.1 netmask 0xff00
> <118>fxp0: flags=8843 metric 0 
> mtu 1500
> <118>   options=2009
> <118>   ether 00:b0:d0:3e:c7:19
> <118>   inet 207.99.22.32 netmask 0xff80 broadcast 207.99.22.127
> <118>   media: Ethernet autoselect (100baseTX )
> <118>   status: active
> <118>add net default: gateway 207.99.22.1
> <118>Additional routing options:
> <118>.
> <118>Starting devd.
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 00
> fault virtual address   = 0x0
> fault code  = supervisor read, page not present
> instruction pointer = 0x20:0xc0874488
> stack pointer   = 0x28:0xf7bd0b68
> frame pointer   = 0x28:0xf7bd0b68
> code segment= base 0x0, limit 0xf, type 0x1b
>= DPL 0, pres 1, def32 1, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 388 (devd)
> trap number = 12
> panic: page fault
> cpuid = 1
> Uptime: 2m12s
> Physical memory: 2035 MB
> Dumping 68 MB: 53 37 21 5
> 
> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from 
> /boot/kernel/acpi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/acpi.ko
> #0  doadump () at pcpu.h:196
> 196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) backtrace
> #0  doadump () at pcpu.h:196
> #1  0xc07e2a07 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2  0xc07e2cd9 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3  0xc0ae895c in trap_fatal (frame=0xf7bd0b28, eva=0)
>at /usr/src/sys/i386/i386/trap.c:939
> #4  0xc0ae8be0 in trap_pfault (frame=0xf7bd0b28, usermode=0, eva=0)
>at /usr/src/sys/i386/i386/trap.c:852
> #5  0xc0ae958c in trap (frame=0xf7bd0b28) at 
> /usr/src/sys/i386/i386/trap.c:530
> #6  0xc0acdc9b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
> #7  0xc0874488 in strlen (str=0x0) at /usr/src/sys/libkern/strlen.c:41
> #8  0xc080a46c in devread (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
>at /usr/src/sys/kern/subr_bus.c:458
> #9  0xc07a6039 in giant_read (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
>at /usr/src/sys/kern/kern_conf.c:414
> #10 0xc076cecd in devfs_read_f (fp=0xc58ba260, uio=0xf7bd0c60,
>cred=0xc5470300, flags=0, td=0xc56288c0)
>at /usr/src/sys/fs/devfs/devfs_vnops.c:1007
> #11 0xc081be86 in dofileread (td=0xc56288c0, fd=3, fp=0xc58ba260,
>auio=0xf7bd0c60, offset=-1, flags=0) at file.h:245
> #12 0xc081c1f8 in kern_readv (td=0xc56288c0, fd=3, auio=0xf7bd0c60)
>at /usr/src/sys/kern/sys_generic.c:193
> #13 0xc081c2df in read (td=0xc56288c0, uap=0xf7bd0cfc)
>at /usr/src/sys/kern/sys_generic.c:109
> ---Type  to continue, or q  to quit---
> #14 0xc0ae8f35 in syscall (frame=0xf7bd0d38)
>at /usr/src/sys/i386/i386/trap.c:1090
> #15 0xc0acdd00 in Xint0x80_syscall () at 
> /usr/src/sys/i386/i386/exception.s:255
> #16 0x0033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

strlen() was passed a NULL pointer, which means that n1->dei_data
is NULL. A brief look over the RELENG_7 code does not reveal any
caller of devctl_queue_data() outside subr_bus.c, and all uses inside
subr_bus.c seem to be safe.

The options added in the config cannot affect this behaviour, I believe.
You may add a check at the start of devctl_queue_data() to verify
that data != NULL, and panic when it is NULL. This way, we will see
where it happens.
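
To illustrate the kind of check suggested above, here is a minimal userland
sketch. It is not the RELENG_7 code: the structures and the names
queue_data() and read_event_len() are hypothetical stand-ins for
devctl_queue_data() and the devread() path, and the sketch returns an error
where the kernel version would panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userland stand-in for one queued devctl event. */
struct event {
	char *data;		/* plays the role of dei_data */
	struct event *next;
};

/* Hypothetical FIFO of pending events. */
struct event_queue {
	struct event *head;
	struct event *tail;
};

/*
 * Enqueue an event. NULL data is rejected here, at the producer side,
 * so the offending caller is visible immediately instead of showing up
 * later as a fault in the reader. Returns 0 on success, -1 for NULL
 * data, -2 if allocation fails.
 */
int
queue_data(struct event_queue *q, char *data)
{
	struct event *ev;

	assert(q != NULL);
	if (data == NULL)
		return (-1);	/* the kernel version would panic here */
	ev = malloc(sizeof(*ev));
	if (ev == NULL)
		return (-2);
	ev->data = data;
	ev->next = NULL;
	if (q->tail != NULL)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	return (0);
}

/*
 * Dequeue the oldest event and return the length of its data,
 * or -1 if the queue is empty.
 */
long
read_event_len(struct event_queue *q)
{
	struct event *ev;
	long len;

	assert(q != NULL);
	ev = q->head;
	if (ev == NULL)
		return (-1);
	q->head = ev->next;
	if (q->head == NULL)
		q->tail = NULL;
	len = (long)strlen(ev->data);	/* data was checked on enqueue */
	free(ev);
	return (len);
}
```

With the check on the enqueue side, the backtrace of the resulting panic
would point at whoever passed the NULL, rather than at the read() from devd.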




Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all ok after that.


Strange, so it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe though.


Joe

on 22/05/2009 20:40 Joe Karthauser said the following:

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching sata cables around next to see if the problem
goes away if I disconnect some combination of bays.

Thanks,
Joe



RE: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Larry Rosenman

I saw really strange stuff with one bad SATA cable on my 6 drive ZFS array.
It would work most of the time, but
the scrub would either cough up CRC's or hang.

I wound up replacing the disk *AND* the cable, and it's been fine since. 

This is on a SuperMicro chassis with Intel chips.

YMMV
-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

-----Original Message-----
From: owner-freebsd-sta...@freebsd.org
[mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Joe Karthauser
Sent: Friday, May 22, 2009 3:45 PM
To: Alexander Motin
Cc: freebsd-stable@freebsd.org; Kip Macy
Subject: Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at
kernel boot now, but didn't before... (Re: ZFS MFC heads up))

This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all ok after that.
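
The detach/attach and scrub steps described above can be sketched as a small script. This is only an illustration under assumptions: the channel name sata4 is taken as written earlier in the thread (check `atacontrol list` for the real name on a given machine), the pool name "tank" is a placeholder, and the run/DRY_RUN wrapper is a hypothetical helper that prints the commands rather than executing them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: re-attach an ATA channel, then scrub the pool.
# CHANNEL and POOL are placeholders, not values confirmed for this hardware.
# With DRY_RUN=1 (the default) each command is printed instead of run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reattach_and_scrub() {
    channel=$1
    pool=$2
    run atacontrol detach "$channel"   # drop the channel
    run sleep 1                        # give the bus a moment
    run atacontrol attach "$channel"   # re-probe for the drive
    run zpool scrub "$pool"            # verify all data after the resilver
    run zpool status "$pool"           # report scrub progress and errors
}

reattach_and_scrub "${CHANNEL:-sata4}" "${POOL:-tank}"
```

Running it with the defaults only prints the five commands in order, which makes it easy to review them before trying them for real as root.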

Strange, so it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe though.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
> Hi Alexander,
>
> I'd love it if you were able to provide some insight into this problem.
>
> I'm going to try switching sata cables around next to see if the problem
> goes away if I disconnect some combination of bays.
>
> Thanks,
> Joe
>
> on 22/05/2009 19:39 Kip Macy said the following:
>> Motin is your best bet in tracking down ATA problems.
>>
>> Cheers,
>> Kip
>>
>>
>> On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
>>> Hi Kip,
>>>
>>> I seriously don't understand what has happened. If I boot kernel.old
>>> I still get the same problem. Very confusing. :(.
>>>
>>> Joe
>>>
>>> on 21/05/2009 19:28 Kip Macy said the following:
>>>> I have no idea what is happening. I think our best bet is having
>>>> someone with insight into ATA provide us with help in adding
>>>> diagnostics.
>>>>
>>>> Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
>>>>
>>>> Cheers,
>>>> Kip
>>>>
>>>>
>>>> On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
>>>>> Hmm, I've had a bit of a miserable afternoon trying to fight my
>>>>> RELENG_7 server, which now doesn't boot. :(.
>>>>>
>>>>> So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5
>>>>> disks (gmirror on 500Mb partition on each of five disks, and zraid2
>>>>> over the rest of each drive).
>>>>>
>>>>> What I did was to update the userland, and then reboot. I didn't
>>>>> upgrade the kernel (but I've subsequently done that and have the same
>>>>> problem).
>>>>>
>>>>> What happens is that the kernel hangs booting just after displaying a
>>>>> LABEL message or ZFS pool/spool message. I _can_ get it to boot if I
>>>>> boot single user with acpi switched off. When I do that I can manually
>>>>> start zfs, and mount all the partitions. However, one of the disks is
>>>>> missing... more on that next.
>>>>>
>>>>> The machine is running a Gigabyte motherboard (domestic gamer P35
>>>>> board, similar to this
>>>>>
>>>>> http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
>>>>>
>>>>> although it might be a DS4 variant). I've got 5 of the 6 sata ports
>>>>> wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into
>>>>> 3 5-1/4" bays kind of thing).
>>>>>
>>>>> Now, because of the gmirror I can boot the system on any disk, or
>>>>> combination of plugged in disks. I should be able to succeed with the
>>>>> kernel probe up to the attempt to mount the root filesystem
>>>>> irrespective of any zfs pool, etc. And, indeed, this has been working
>>>>> fine for about two years.
>>>>>
>>>>> But, now it hangs in the same place no matter what disk I boot on
>>>>> (I've tried every bay).
>>>>>
>>>>> But, without ACPI enabled it does appear to boot ok... what's going on
>>>>> here? Is it possible that the machine has developed a hardware fault?
>>>>>
>>>>> Ok, finally, if I boot with ACPI disabled then one of the disks is
>>>>> missing. If I unplug it I get a disconnect message from the ata
>>>>> device, and a reconnect and reinit attempt when I plug it back in, but
>>>>> no device appears on the bus. Usually I can do a 'atacontrol detach
>>>>> sata4; sleep 1; atacontrol attach sata4' and the device reappears.
>>>>> This happens on the other buses, but not on the last one. It's not the
>>>>> disk, because if I swap it into another bay, it comes up and appears
>>>>> on the bus. On the other hand it


Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
I spoke too soon. It must have just randomly booted, because it is now 
hanging again. No amount of jiggling cables has made any difference.


:(.

Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: net.inet.tcp.tso=1 still neceesary with fxp was Re: TCP differences in 7.2 vs 7.1

2009-05-22 Thread Pyun YongHyeon
On Fri, May 22, 2009 at 03:50:07PM -0400, Michael L. Squires wrote:
> 
> 
> On Thu, 21 May 2009, Pyun YongHyeon wrote:
> 
> >On Wed, May 20, 2009 at 05:55:29PM -0400, Michael L. Squires wrote:
> >>I started having speed problems after shifting from 7.1-STABLE to
> >>7.1-PRERELEASE.  They have continued with 7.2-STABLE.
> >>
> >>Reverting to the 7.1-STABLE kernel eliminated the problem.
> >>
> >>After downloading 7.2-STABLE from cvsup.freebsd.org at about 10:40 AM EST
> >>on 5/20/2009, doing a buildworld/buildkernel/installkernel/installworld
> >>cycle I still need to execute "sysctl net.inet.tcp.tso=0" to eliminate throughput
> >>problems between my home system (on Comcast) and my office PC (connected
> >>via a Time-Warner connection).  This also affects connections to other
> >>systems; downloading Web pages (ebay.com) speeds up after I change the TSO
> >>entry.
> >>
> >>The box in question runs NAT and has an fxp (Intel Pro100) interface
> >>connected to a Comcast cable modem and an em (Intel Pro1000) interface
> >>connected to the internal network.
> >>
> >>There are no network errors in "netstat -i" on either interface.
> >>
> >>The "if_fxp.c" code appears to be the May 7 version.
> >>
> >
> >You should have cvs rev. 1.266.2.15 of if_fxp.c.
> >
> >>This is the dmesg entry for the card in question.  The system is a dual 
> >>Xeon
> >>Supermicro 1U box, 1GB RAM, single 300GB IDE hard drive.
> >>
> >>fxp0:  port 0xe400-0xe43f mem
> >>0xfebfd000-0xfebfdfff,0xfeb8-0xfeb9 irq 27 at device 7.0 on pci0
> >>miibus0:  on fxp0
> >>
> >
> >Since you use both em(4) and fxp(4) I'd like to know which driver
> >has the issue. Instead of disabling TSO of network stack try
> >disabling TSO for each interface. For instance,
> >1. Disable TSO of em(4) and check whether you see the same issue
> >   (ifconfig em0 -tso).
> >2. Disable TSO of fxp(4) and check whether you see the same issue
> >   (ifconfig fxp0 -tso).
> >
> 
> The version of if_fxp.c is in fact 1.266.2.15.
> 
> Connecting to the FreeBSD box from a PC with a bash shell under XP 
> SP3/Cygwin OpenSSH I find
> 
> (1)  disabling "tso" on the internal "em0" interface has no effect; but
> 
> (2)  disabling "tso" on the external "fxp0" interface eliminates the 
> throughput problem.  The effect appears to be the same as using sysctl to 
> disable tso on all interfaces.
> 
> With "tso" enabled on the "fxp0" interface the connection (reading email 
> using "pine" in a large window) hung completely.
> 
> There are no errors in "netstat -i" nor in /var/log/messages.
> 
> "netstat -e" on the XP PC shows no discards or errors; however, I don't
> think I've ever seen a PC under Windows admit to network errors.
> 
> The fxp0 interface connects to a Comcast cable modem, which eventually 
> connects to my office PC which is in the "iga.in.gov" domain hosted by 
> TimeWarner.
> 
> I'll be happy to run anything else you want.
> 

Would you capture the failing TCP session with tcpdump and mail me
the URL of the captured file (off-list)?
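
A rough sketch of how the per-interface TSO toggle and the capture might be scripted follows. It is an illustration, not a tested procedure: the peer address 192.0.2.10 and the output path are placeholders, set_tso and capture are hypothetical helper names, and the run/DRY_RUN wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: re-enable TSO on the suspect interface and record the session.
# With DRY_RUN=1 (the default) each command is printed instead of run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Toggle TSO on one interface, e.g. "set_tso fxp0 off".
set_tso() {
    case $2 in
        on)  run ifconfig "$1" tso ;;
        off) run ifconfig "$1" -tso ;;
        *)   echo "usage: set_tso <interface> on|off" >&2; return 1 ;;
    esac
}

# Capture full-size packets to and from one peer into a file.
# Arguments: interface, peer address, output file.
capture() {
    run tcpdump -i "$1" -s 0 -w "$3" host "$2"
}

set_tso fxp0 on                                   # reproduce the failing case
capture fxp0 192.0.2.10 /tmp/fxp0-tso.pcap        # placeholder peer and path
```

The capture has to run while TSO is enabled on fxp0, since that is the configuration in which the session stalls.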
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Alexander Motin

Hi.

Joe Karthauser wrote:
> I spoke too soon. It must have just randomly booted, because it is now 
> hanging again. No amount of jiggling cables has made any difference.


Can you provide verbose boot messages of your system from the beginning 
up to the problem? Especially everything related to ATA.


Do you have AHCI mode enabled in the BIOS, or are you using legacy ATA emulation?
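
One possible way to collect the ATA-related lines, assuming a boot that completes (for example with ACPI disabled) after entering "boot -v" at the loader prompt, is to filter the saved boot log. The function name ata_lines and the list of device prefixes are illustrative guesses, not a complete set:

```shell
#!/bin/sh
# Sketch only: print ATA/AHCI-related lines from a saved boot log.
# Defaults to /var/run/dmesg.boot; pass another file as the first argument.
ata_lines() {
    grep -iE '^(ata|atapci|ad|acd)[0-9]+:|ahci' "${1:-/var/run/dmesg.boot}"
}

# Usage: ata_lines > /tmp/ata-boot.txt
```

If the hang happens before the log is written, the messages would have to be captured some other way, such as a serial console.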

--
Alexander Motin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"